* [PATCH 00/62] XArray November 2017 Edition
@ 2017-11-22 21:06 ` Matthew Wilcox
  0 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:06 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

I've lost count of the number of times I've posted the XArray before,
so it's time for a new numbering scheme.  Here are two earlier versions:
https://lkml.org/lkml/2017/3/17/724
https://lwn.net/Articles/715948/ (this one's more loquacious in its
description of the things that are better about the XArray API than the
radix tree API).

This time around, I've gone for an approach of many small changes.
Unfortunately, that means you get 62 moderately sized patches instead of
a dozen big ones.

Some of these can and should go in independently of whether the XArray
is a good idea.  The first four fix some things in the test suite.
Patch 6 changes the API for cyclic IDR allocation.  Patch 7 rewrites the
IDR 'extended' API that was recently added without my review.

Patch 15 is messy and needs rebasing any time anybody touches the page
cache, so I'd like it to go in soon.  All it does is remove the page
cache's private spinlock and make the page cache explicitly use the
spinlock now embedded in the xarray.
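
A rough before/after sketch of that conversion; the field names here
(tree_lock, page_tree) and the xa_lock_irq() spelling are illustrative
assumptions rather than lines quoted from the patch:

        /* Before: the page cache guards its tree with a private lock. */
        static void delete_entry_old(struct address_space *mapping, pgoff_t index)
        {
                spin_lock_irq(&mapping->tree_lock);
                radix_tree_delete(&mapping->page_tree, index);
                spin_unlock_irq(&mapping->tree_lock);
        }

        /* After: take the spinlock embedded in the tree head instead. */
        static void delete_entry_new(struct address_space *mapping, pgoff_t index)
        {
                xa_lock_irq(&mapping->page_tree);
                radix_tree_delete(&mapping->page_tree, index);
                xa_unlock_irq(&mapping->page_tree);
        }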

Patches 16-30 are all the fun infrastructure, adding pieces of the
xarray API.  This is probably a good place to focus review, particularly
if you're intent on critiquing the API.
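
A minimal sketch of the sort of calling convention these patches add
(the exact prototypes are in the patches themselves; the signatures and
the xa_err() helper here are assumptions):

        #include <linux/xarray.h>

        static struct xarray cache;

        static void cache_init(void)
        {
                xa_init(&cache);
        }

        static int cache_store(unsigned long index, void *item)
        {
                /* Locking and node allocation happen inside xa_store(). */
                void *old = xa_store(&cache, index, item, GFP_KERNEL);

                return xa_err(old);     /* assumed error-extraction helper */
        }

        static void *cache_lookup(unsigned long index)
        {
                return xa_load(&cache, index);
        }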

Patches 31 onwards start actually using the new API, taking advantage
of the built-in locking and memory allocation to get rid of the preload
API.  This is probably where there's the most scope for bugs; I've tried
to understand the locking schemes used in thirty different places, and
I may have got some of them wrong.
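
The shape of most of those conversions is to collapse the preload dance
into a single allocating call.  A sketch with made-up names (foo_idr,
foo_lock); the real hunks are in the individual patches:

        static DEFINE_SPINLOCK(foo_lock);
        static DEFINE_IDR(foo_idr);

        /* Before: pre-allocate nodes with GFP_KERNEL, then allocate the
         * ID under a private spinlock with GFP_NOWAIT. */
        static int foo_get_id_old(void *foo)
        {
                int id;

                idr_preload(GFP_KERNEL);
                spin_lock(&foo_lock);
                id = idr_alloc(&foo_idr, foo, 0, 0, GFP_NOWAIT);
                spin_unlock(&foo_lock);
                idr_preload_end();
                return id;
        }

        /* After: the XArray-backed IDR takes its own lock and allocates
         * memory internally, so the caller simply passes GFP_KERNEL. */
        static int foo_get_id_new(void *foo)
        {
                return idr_alloc(&foo_idr, foo, 0, 0, GFP_KERNEL);
        }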

There will need to be many more patches.  Dozens, maybe hundreds.  Once
we've converted all the calls to radix_tree_insert() over to
xa_store(), we can get rid of the GFP flags stored in the xarray head.
That'll make the mm people a little happier.  Once we've got rid of
idr_preload(), we can get rid of the per-CPU cache of radix_tree_nodes.
I have a lot more enhancements on my todo list, but I'd like to get
something merged in the next cycle, and if I wait until my todo list is
empty, I will get nothing merged.
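
On the GFP point: the radix tree fixes its allocation flags when the
tree is initialised, while xa_store() takes them per call, so nothing
needs to live in the head.  A fragmentary sketch (the xa_store()
prototype is an assumption):

        /* Radix tree: flags chosen once, stored in the root. */
        INIT_RADIX_TREE(&foo_tree, GFP_KERNEL);
        err = radix_tree_insert(&foo_tree, index, item);

        /* XArray: each caller passes the flags it can tolerate. */
        old = xa_store(&foo_xa, index, item, GFP_NOFS);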

If you looked at earlier iterations, you'll have noticed much more
support for huge pages.  That's not in this revision; I couldn't debug
the problems I was seeing.  I still have all that code and intend to
put it back in; it's just a bit further down the todo list right now.

Matthew Wilcox (62):
  tools: Make __test_and_clear_bit available
  radix tree test suite: Remove ARRAY_SIZE
  radix tree test suite: Check reclaim bit
  idr test suite: Fix ida_test_random()
  radix tree: Add a missing cast to gfp_t
  idr: Make cursor explicit for cyclic allocation
  idr: Rewrite extended IDR API
  Explicitly include radix-tree.h
  arm64: Turn flush_dcache_mmap_lock into a no-op
  unicore32: Turn flush_dcache_mmap_lock into a no-op
  Export __set_page_dirty
  xfs: Rename xa_ elements to ail_
  fscache: Use appropriate radix tree accessors
  xarray: Add the xa_lock to the radix_tree_root
  page cache: Use xa_lock
  xarray: Replace exceptional entries
  xarray: Change definition of sibling entries
  xarray: Add definition of struct xarray
  xarray: Define struct xa_node
  xarray: Add xa_load
  xarray: Add xa_get_tag, xa_set_tag and xa_clear_tag
  xarray: Add xa_store
  xarray: Add xa_cmpxchg
  xarray: Add xa_for_each
  xarray: Add xa_init
  xarray: Add xas_for_each_tag
  xarray: Add xa_get_entries and xa_get_tagged
  xarray: Add xa_destroy
  xarray: Add xas_prev_any
  xarray: Add xas_find_any / xas_next_any
  Convert IDR to use xarray
  ida: Convert to using xarray
  page cache: Convert page_cache_next_hole to XArray
  page cache: Use xarray for adding pages
  page cache: Convert page_cache_tree_delete to xarray
  page cache: Convert find_get_entry to xarray
  shmem: Convert replace to xarray
  shmem: Convert shmem_confirm_swap to XArray
  shmem: Convert find_swap_entry to XArray
  shmem: Convert shmem_tag_pins to XArray
  shmem: Convert shmem_wait_for_pins to XArray
  vmalloc: Convert to xarray
  brd: Convert to XArray
  xfs: Convert m_perag_tree to XArray
  xfs: Convert pag_ici_root to XArray
  xfs: Convert xfs dquot to XArray
  xfs: Convert mru cache to XArray
  block: Remove IDR preloading
  rxrpc: Remove IDR preloading
  cgroup: Remove IDR wrappers
  dca: Remove idr_preload calls
  ipc: Remove call to idr_preload
  irq: Remove call to idr_preload
  scsi: Remove idr_preload in st driver
  firewire: Remove call to idr_preload
  drm: Remove drm_minor_lock and idr_preload
  drm: Remove drm_syncobj_fd_to_handle
  drm: Remove qxl driver IDR locks
  drm: Replace virtio IDRs with IDAs
  drm: Replace vmwgfx IDRs with IDAs
  net: Redesign act_api use of IDR
  mm: Convert page-writeback to XArray

 Documentation/cgroup-v1/memory.txt              |    2 +-
 Documentation/vm/page_migration                 |   14 +-
 arch/arm/include/asm/cacheflush.h               |    6 +-
 arch/arm64/include/asm/cacheflush.h             |    6 +-
 arch/nios2/include/asm/cacheflush.h             |    6 +-
 arch/parisc/include/asm/cacheflush.h            |    6 +-
 arch/powerpc/include/asm/book3s/64/pgtable.h    |    4 +-
 arch/powerpc/include/asm/nohash/64/pgtable.h    |    4 +-
 arch/powerpc/platforms/cell/spufs/sched.c       |    2 +-
 arch/unicore32/include/asm/cacheflush.h         |    6 +-
 block/genhd.c                                   |   23 +-
 drivers/block/brd.c                             |   87 +-
 drivers/dca/dca-sysfs.c                         |   22 +-
 drivers/firewire/core-cdev.c                    |   20 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c         |    8 +-
 drivers/gpu/drm/drm_dp_aux_dev.c                |    5 +-
 drivers/gpu/drm/drm_drv.c                       |   25 +-
 drivers/gpu/drm/drm_gem.c                       |   27 +-
 drivers/gpu/drm/drm_syncobj.c                   |   23 +-
 drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c    |    6 +-
 drivers/gpu/drm/i915/i915_debugfs.c             |    4 +-
 drivers/gpu/drm/i915/i915_gem.c                 |   17 +-
 drivers/gpu/drm/msm/msm_gem_submit.c            |   10 +-
 drivers/gpu/drm/qxl/qxl_cmd.c                   |   26 +-
 drivers/gpu/drm/qxl/qxl_drv.h                   |    2 -
 drivers/gpu/drm/qxl/qxl_kms.c                   |    2 -
 drivers/gpu/drm/qxl/qxl_release.c               |   12 +-
 drivers/gpu/drm/vc4/vc4_gem.c                   |    4 +-
 drivers/gpu/drm/virtio/virtgpu_drv.h            |    6 +-
 drivers/gpu/drm/virtio/virtgpu_kms.c            |   18 +-
 drivers/gpu/drm/virtio/virtgpu_vq.c             |   12 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c             |    6 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h             |    2 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_resource.c        |   28 +-
 drivers/infiniband/core/cm.c                    |    5 +-
 drivers/infiniband/hw/mlx4/cm.c                 |    3 +-
 drivers/infiniband/hw/mlx4/mlx4_ib.h            |    1 +
 drivers/rapidio/rio_cm.c                        |    3 +-
 drivers/rpmsg/qcom_glink_native.c               |    7 +-
 drivers/scsi/st.c                               |   18 +-
 drivers/staging/lustre/lustre/llite/glimpse.c   |    2 +-
 drivers/staging/lustre/lustre/mdc/mdc_request.c |   10 +-
 drivers/target/target_core_device.c             |    4 +-
 drivers/usb/host/xhci.h                         |    1 +
 fs/afs/write.c                                  |    2 +-
 fs/btrfs/compression.c                          |    4 +-
 fs/btrfs/extent_io.c                            |    8 +-
 fs/btrfs/inode.c                                |    4 +-
 fs/buffer.c                                     |   13 +-
 fs/cifs/file.c                                  |    2 +-
 fs/dax.c                                        |  204 ++--
 fs/f2fs/data.c                                  |    6 +-
 fs/f2fs/dir.c                                   |    6 +-
 fs/f2fs/gc.c                                    |    2 +-
 fs/f2fs/inline.c                                |    6 +-
 fs/f2fs/node.c                                  |    8 +-
 fs/fs-writeback.c                               |   18 +-
 fs/fscache/cookie.c                             |    2 +-
 fs/fscache/object.c                             |    2 +-
 fs/inode.c                                      |   11 +-
 fs/kernfs/dir.c                                 |    5 +-
 fs/nfsd/nfs4state.c                             |    4 +-
 fs/nfsd/state.h                                 |    1 +
 fs/nilfs2/btnode.c                              |   20 +-
 fs/nilfs2/page.c                                |   22 +-
 fs/notify/inotify/inotify_user.c                |   15 +-
 fs/proc/loadavg.c                               |    2 +-
 fs/proc/task_mmu.c                              |    2 +-
 fs/xfs/libxfs/xfs_sb.c                          |   11 +-
 fs/xfs/libxfs/xfs_sb.h                          |    2 +-
 fs/xfs/xfs_aops.c                               |   15 +-
 fs/xfs/xfs_buf_item.c                           |   10 +-
 fs/xfs/xfs_dquot.c                              |   37 +-
 fs/xfs/xfs_dquot_item.c                         |   11 +-
 fs/xfs/xfs_icache.c                             |  142 +--
 fs/xfs/xfs_icache.h                             |   10 +-
 fs/xfs/xfs_inode.c                              |   24 +-
 fs/xfs/xfs_inode_item.c                         |   22 +-
 fs/xfs/xfs_log.c                                |    6 +-
 fs/xfs/xfs_log_recover.c                        |   80 +-
 fs/xfs/xfs_mount.c                              |   22 +-
 fs/xfs/xfs_mount.h                              |    6 +-
 fs/xfs/xfs_mru_cache.c                          |   71 +-
 fs/xfs/xfs_qm.c                                 |   32 +-
 fs/xfs/xfs_qm.h                                 |   18 +-
 fs/xfs/xfs_trans.c                              |   18 +-
 fs/xfs/xfs_trans_ail.c                          |  152 +--
 fs/xfs/xfs_trans_buf.c                          |    4 +-
 fs/xfs/xfs_trans_priv.h                         |   42 +-
 include/drm/drm_file.h                          |    7 +-
 include/linux/backing-dev.h                     |   12 +-
 include/linux/fs.h                              |   68 +-
 include/linux/fscache.h                         |    1 +
 include/linux/fsnotify_backend.h                |    1 +
 include/linux/idr.h                             |  204 ++--
 include/linux/kernfs.h                          |    1 +
 include/linux/mm.h                              |    3 +-
 include/linux/pagemap.h                         |    4 +-
 include/linux/pid_namespace.h                   |    1 +
 include/linux/radix-tree.h                      |  111 +-
 include/linux/swapops.h                         |   19 +-
 include/linux/xarray.h                          |  717 +++++++++++
 include/net/act_api.h                           |   27 +-
 include/net/sctp/sctp.h                         |    1 +
 ipc/util.c                                      |   28 +-
 kernel/bpf/syscall.c                            |    8 +-
 kernel/cgroup/cgroup.c                          |   63 +-
 kernel/irq/timings.c                            |    4 +-
 kernel/pid.c                                    |    6 +-
 kernel/pid_namespace.c                          |   12 +-
 lib/Makefile                                    |    2 +-
 lib/dma-debug.c                                 |    1 +
 lib/idr.c                                       |  470 ++++----
 lib/radix-tree.c                                |  348 ++----
 lib/xarray.c                                    | 1444 +++++++++++++++++++++++
 mm/filemap.c                                    |  306 ++---
 mm/huge_memory.c                                |   10 +-
 mm/khugepaged.c                                 |   51 +-
 mm/madvise.c                                    |    2 +-
 mm/memcontrol.c                                 |    4 +-
 mm/migrate.c                                    |   31 +-
 mm/mincore.c                                    |    2 +-
 mm/page-writeback.c                             |   79 +-
 mm/readahead.c                                  |    4 +-
 mm/rmap.c                                       |    4 +-
 mm/shmem.c                                      |  196 ++-
 mm/swap.c                                       |    2 +-
 mm/swap_state.c                                 |   17 +-
 mm/truncate.c                                   |   34 +-
 mm/vmalloc.c                                    |   37 +-
 mm/vmscan.c                                     |   12 +-
 mm/workingset.c                                 |   34 +-
 net/rxrpc/af_rxrpc.c                            |    2 +-
 net/rxrpc/ar-internal.h                         |    1 +
 net/rxrpc/conn_client.c                         |   21 +-
 net/sched/act_api.c                             |  135 +--
 net/sched/cls_basic.c                           |   20 +-
 net/sched/cls_bpf.c                             |   20 +-
 net/sched/cls_flower.c                          |   32 +-
 net/sched/cls_u32.c                             |   44 +-
 net/sctp/associola.c                            |    3 +-
 net/sctp/protocol.c                             |    1 +
 tools/include/asm-generic/bitops.h              |    1 +
 tools/include/asm-generic/bitops/atomic.h       |    9 -
 tools/include/asm-generic/bitops/non-atomic.h   |  109 ++
 tools/include/linux/spinlock.h                  |    2 +
 tools/testing/radix-tree/.gitignore             |    2 +
 tools/testing/radix-tree/Makefile               |   14 +-
 tools/testing/radix-tree/idr-test.c             |   28 +-
 tools/testing/radix-tree/linux.c                |    2 +-
 tools/testing/radix-tree/linux/kernel.h         |    2 -
 tools/testing/radix-tree/linux/rcupdate.h       |    2 +
 tools/testing/radix-tree/linux/xarray.h         |    2 +
 tools/testing/radix-tree/multiorder.c           |   53 +-
 tools/testing/radix-tree/test.c                 |    8 +-
 tools/testing/radix-tree/xarray-test.c          |  122 ++
 156 files changed, 4178 insertions(+), 2454 deletions(-)
 create mode 100644 include/linux/xarray.h
 create mode 100644 lib/xarray.c
 create mode 100644 tools/include/asm-generic/bitops/non-atomic.h
 create mode 100644 tools/testing/radix-tree/linux/xarray.h
 create mode 100644 tools/testing/radix-tree/xarray-test.c

-- 
2.15.0

^ permalink raw reply	[flat|nested] 159+ messages in thread

* [PATCH 01/62] tools: Make __test_and_clear_bit available
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:06   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:06 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

Copy over non-atomic.h and delete the duplicate definitions which had
been pasted into atomic.h.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 tools/include/asm-generic/bitops.h            |   1 +
 tools/include/asm-generic/bitops/atomic.h     |   9 ---
 tools/include/asm-generic/bitops/non-atomic.h | 109 ++++++++++++++++++++++++++
 3 files changed, 110 insertions(+), 9 deletions(-)
 create mode 100644 tools/include/asm-generic/bitops/non-atomic.h

diff --git a/tools/include/asm-generic/bitops.h b/tools/include/asm-generic/bitops.h
index 9bce3b56b5e7..5d2ab38965cc 100644
--- a/tools/include/asm-generic/bitops.h
+++ b/tools/include/asm-generic/bitops.h
@@ -27,5 +27,6 @@
 #include <asm-generic/bitops/hweight.h>
 
 #include <asm-generic/bitops/atomic.h>
+#include <asm-generic/bitops/non-atomic.h>
 
 #endif /* __TOOLS_ASM_GENERIC_BITOPS_H */
diff --git a/tools/include/asm-generic/bitops/atomic.h b/tools/include/asm-generic/bitops/atomic.h
index 21c41ccd1266..2f6ea28764a7 100644
--- a/tools/include/asm-generic/bitops/atomic.h
+++ b/tools/include/asm-generic/bitops/atomic.h
@@ -15,13 +15,4 @@ static inline void clear_bit(int nr, unsigned long *addr)
 	addr[nr / __BITS_PER_LONG] &= ~(1UL << (nr % __BITS_PER_LONG));
 }
 
-static __always_inline int test_bit(unsigned int nr, const unsigned long *addr)
-{
-	return ((1UL << (nr % __BITS_PER_LONG)) &
-		(((unsigned long *)addr)[nr / __BITS_PER_LONG])) != 0;
-}
-
-#define __set_bit(nr, addr)	set_bit(nr, addr)
-#define __clear_bit(nr, addr)	clear_bit(nr, addr)
-
 #endif /* _TOOLS_LINUX_ASM_GENERIC_BITOPS_ATOMIC_H_ */
diff --git a/tools/include/asm-generic/bitops/non-atomic.h b/tools/include/asm-generic/bitops/non-atomic.h
new file mode 100644
index 000000000000..7e10c4b50c5d
--- /dev/null
+++ b/tools/include/asm-generic/bitops/non-atomic.h
@@ -0,0 +1,109 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
+#define _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
+
+#include <asm/types.h>
+
+/**
+ * __set_bit - Set a bit in memory
+ * @nr: the bit to set
+ * @addr: the address to start counting from
+ *
+ * Unlike set_bit(), this function is non-atomic and may be reordered.
+ * If it's called on the same region of memory simultaneously, the effect
+ * may be that only one operation succeeds.
+ */
+static inline void __set_bit(int nr, volatile unsigned long *addr)
+{
+	unsigned long mask = BIT_MASK(nr);
+	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
+
+	*p  |= mask;
+}
+
+static inline void __clear_bit(int nr, volatile unsigned long *addr)
+{
+	unsigned long mask = BIT_MASK(nr);
+	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
+
+	*p &= ~mask;
+}
+
+/**
+ * __change_bit - Toggle a bit in memory
+ * @nr: the bit to change
+ * @addr: the address to start counting from
+ *
+ * Unlike change_bit(), this function is non-atomic and may be reordered.
+ * If it's called on the same region of memory simultaneously, the effect
+ * may be that only one operation succeeds.
+ */
+static inline void __change_bit(int nr, volatile unsigned long *addr)
+{
+	unsigned long mask = BIT_MASK(nr);
+	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
+
+	*p ^= mask;
+}
+
+/**
+ * __test_and_set_bit - Set a bit and return its old value
+ * @nr: Bit to set
+ * @addr: Address to count from
+ *
+ * This operation is non-atomic and can be reordered.
+ * If two examples of this operation race, one can appear to succeed
+ * but actually fail.  You must protect multiple accesses with a lock.
+ */
+static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
+{
+	unsigned long mask = BIT_MASK(nr);
+	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
+	unsigned long old = *p;
+
+	*p = old | mask;
+	return (old & mask) != 0;
+}
+
+/**
+ * __test_and_clear_bit - Clear a bit and return its old value
+ * @nr: Bit to clear
+ * @addr: Address to count from
+ *
+ * This operation is non-atomic and can be reordered.
+ * If two examples of this operation race, one can appear to succeed
+ * but actually fail.  You must protect multiple accesses with a lock.
+ */
+static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
+{
+	unsigned long mask = BIT_MASK(nr);
+	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
+	unsigned long old = *p;
+
+	*p = old & ~mask;
+	return (old & mask) != 0;
+}
+
+/* WARNING: non atomic and it can be reordered! */
+static inline int __test_and_change_bit(int nr,
+					    volatile unsigned long *addr)
+{
+	unsigned long mask = BIT_MASK(nr);
+	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
+	unsigned long old = *p;
+
+	*p = old ^ mask;
+	return (old & mask) != 0;
+}
+
+/**
+ * test_bit - Determine whether a bit is set
+ * @nr: bit number to test
+ * @addr: Address to start counting from
+ */
+static inline int test_bit(int nr, const volatile unsigned long *addr)
+{
+	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
+}
+
+#endif /* _ASM_GENERIC_BITOPS_NON_ATOMIC_H_ */
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 02/62] radix tree test suite: Remove ARRAY_SIZE
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:06   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:06 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

ARRAY_SIZE is now defined in tools/include/linux/kernel.h, so our local
definition generates a redefinition warning.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 tools/testing/radix-tree/linux/kernel.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/tools/testing/radix-tree/linux/kernel.h b/tools/testing/radix-tree/linux/kernel.h
index c3bc3f364f68..426f32f28547 100644
--- a/tools/testing/radix-tree/linux/kernel.h
+++ b/tools/testing/radix-tree/linux/kernel.h
@@ -17,6 +17,4 @@
 #define pr_debug printk
 #define pr_cont printk
 
-#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
-
 #endif /* _KERNEL_H */
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 03/62] radix tree test suite: Check reclaim bit
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:06   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:06 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

In order to test the memory allocation failure paths, the radix tree
test suite fails allocations if __GFP_NOWARN is set.  That happens to
work for the radix tree implementation, but the semantics we really
want are to fail any allocation which is not GFP_KERNEL.  Do this by
failing allocations which don't have the __GFP_DIRECT_RECLAIM bit set.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 tools/testing/radix-tree/linux.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/radix-tree/linux.c b/tools/testing/radix-tree/linux.c
index 6903ccf35595..f7f3caed3650 100644
--- a/tools/testing/radix-tree/linux.c
+++ b/tools/testing/radix-tree/linux.c
@@ -29,7 +29,7 @@ void *kmem_cache_alloc(struct kmem_cache *cachep, int flags)
 {
 	struct radix_tree_node *node;
 
-	if (flags & __GFP_NOWARN)
+	if (!(flags & __GFP_DIRECT_RECLAIM))
 		return NULL;
 
 	pthread_mutex_lock(&cachep->lock);
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 04/62] idr test suite: Fix ida_test_random()
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:06   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:06 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

The test was checking the wrong errno: ida_get_new_above() returns
-EAGAIN, not -ENOMEM, on memory allocation failure.  Double the number
of threads to increase the chance that we actually exercise this path
during the test suite (it was a bit sporadic before).

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 tools/testing/radix-tree/idr-test.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/radix-tree/idr-test.c b/tools/testing/radix-tree/idr-test.c
index 30cd0b296f1a..193450b29bf0 100644
--- a/tools/testing/radix-tree/idr-test.c
+++ b/tools/testing/radix-tree/idr-test.c
@@ -380,7 +380,7 @@ void ida_check_random(void)
 			do {
 				ida_pre_get(&ida, GFP_KERNEL);
 				err = ida_get_new_above(&ida, bit, &id);
-			} while (err == -ENOMEM);
+			} while (err == -EAGAIN);
 			assert(!err);
 			assert(id == bit);
 		}
@@ -489,7 +489,7 @@ static void *ida_random_fn(void *arg)
 
 void ida_thread_tests(void)
 {
-	pthread_t threads[10];
+	pthread_t threads[20];
 	int i;
 
 	for (i = 0; i < ARRAY_SIZE(threads); i++)
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 05/62] radix tree: Add a missing cast to gfp_t
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:06   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:06 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

sparse complains about an invalid type assignment.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 lib/radix-tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index c8d55565fafa..f00303e0b216 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -178,7 +178,7 @@ static inline void root_tag_clear(struct radix_tree_root *root, unsigned tag)
 
 static inline void root_tag_clear_all(struct radix_tree_root *root)
 {
-	root->gfp_mask &= (1 << ROOT_TAG_SHIFT) - 1;
+	root->gfp_mask &= (__force gfp_t)((1 << ROOT_TAG_SHIFT) - 1);
 }
 
 static inline int root_tag_get(const struct radix_tree_root *root, unsigned tag)
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 06/62] idr: Make cursor explicit for cyclic allocation
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:06   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:06 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

The struct idr was 24 bytes (on 64-bit architectures) due to the
idr_next element, which was only used by the very few idr_alloc_cyclic()
users.  Save 8 bytes per struct idr by making idr_alloc_cyclic() take a
pointer to a cursor instead of using idr_next.  This also lets us remove
the idr_get_cursor() / idr_set_cursor() API, since users can now manage
their own cursor however they like.
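
As the hunks below show, each caller now keeps a cursor next to its IDR
and passes a pointer to it.  A minimal sketch with made-up names
(foo_idr, foo_cursor):

        static DEFINE_IDR(foo_idr);
        static int foo_cursor;          /* caller-owned allocation hint */

        static int foo_alloc_id(void *foo)
        {
                /* The cursor pointer now follows the idr argument. */
                return idr_alloc_cyclic(&foo_idr, &foo_cursor, foo,
                                        0, 0, GFP_KERNEL);
        }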

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 arch/powerpc/platforms/cell/spufs/sched.c |  2 +-
 drivers/gpu/drm/drm_dp_aux_dev.c          |  5 +++--
 drivers/infiniband/core/cm.c              |  5 ++++-
 drivers/infiniband/hw/mlx4/cm.c           |  3 ++-
 drivers/infiniband/hw/mlx4/mlx4_ib.h      |  1 +
 drivers/rapidio/rio_cm.c                  |  3 ++-
 drivers/rpmsg/qcom_glink_native.c         |  7 +++++--
 drivers/target/target_core_device.c       |  4 +++-
 fs/kernfs/dir.c                           |  5 +++--
 fs/nfsd/nfs4state.c                       |  4 +++-
 fs/nfsd/state.h                           |  1 +
 fs/notify/inotify/inotify_user.c          | 15 +++++++--------
 fs/proc/loadavg.c                         |  2 +-
 include/linux/fsnotify_backend.h          |  1 +
 include/linux/idr.h                       | 31 ++-----------------------------
 include/linux/kernfs.h                    |  1 +
 include/linux/pid_namespace.h             |  1 +
 include/net/sctp/sctp.h                   |  1 +
 kernel/bpf/syscall.c                      |  8 ++++++--
 kernel/cgroup/cgroup.c                    |  4 +++-
 kernel/pid.c                              |  4 ++--
 kernel/pid_namespace.c                    | 12 +++++-------
 lib/idr.c                                 | 23 ++++++++++++++---------
 net/rxrpc/af_rxrpc.c                      |  2 +-
 net/rxrpc/ar-internal.h                   |  1 +
 net/rxrpc/conn_client.c                   | 20 ++++++++------------
 net/sctp/associola.c                      |  3 ++-
 net/sctp/protocol.c                       |  1 +
 tools/testing/radix-tree/idr-test.c       | 18 +++++++++++++++---
 29 files changed, 100 insertions(+), 88 deletions(-)

diff --git a/arch/powerpc/platforms/cell/spufs/sched.c b/arch/powerpc/platforms/cell/spufs/sched.c
index e47761cdcb98..d2c3078472ab 100644
--- a/arch/powerpc/platforms/cell/spufs/sched.c
+++ b/arch/powerpc/platforms/cell/spufs/sched.c
@@ -1093,7 +1093,7 @@ static int show_spu_loadavg(struct seq_file *s, void *private)
 		LOAD_INT(c), LOAD_FRAC(c),
 		count_active_contexts(),
 		atomic_read(&nr_spu_contexts),
-		idr_get_cursor(&task_active_pid_ns(current)->idr));
+		task_active_pid_ns(current)->next_pid - 1);
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/drm_dp_aux_dev.c b/drivers/gpu/drm/drm_dp_aux_dev.c
index 053044201e31..3ce890908ede 100644
--- a/drivers/gpu/drm/drm_dp_aux_dev.c
+++ b/drivers/gpu/drm/drm_dp_aux_dev.c
@@ -50,6 +50,7 @@ struct drm_dp_aux_dev {
 #define DRM_AUX_MINORS	256
 #define AUX_MAX_OFFSET	(1 << 20)
 static DEFINE_IDR(aux_idr);
+static int aux_cursor;
 static DEFINE_MUTEX(aux_idr_mutex);
 static struct class *drm_dp_aux_dev_class;
 static int drm_dev_major = -1;
@@ -80,8 +81,8 @@ static struct drm_dp_aux_dev *alloc_drm_dp_aux_dev(struct drm_dp_aux *aux)
 	kref_init(&aux_dev->refcount);
 
 	mutex_lock(&aux_idr_mutex);
-	index = idr_alloc_cyclic(&aux_idr, aux_dev, 0, DRM_AUX_MINORS,
-				 GFP_KERNEL);
+	index = idr_alloc_cyclic(&aux_idr, &aux_cursor, aux_dev, 0,
+				DRM_AUX_MINORS, GFP_KERNEL);
 	mutex_unlock(&aux_idr_mutex);
 	if (index < 0) {
 		kfree(aux_dev);
diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index f6b159d79977..42d7b37382c4 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -125,6 +125,7 @@ static struct ib_cm {
 	struct rb_root remote_id_table;
 	struct rb_root remote_sidr_table;
 	struct idr local_id_table;
+	int local_id_cursor;
 	__be32 random_id_operand;
 	struct list_head timewait_list;
 	struct workqueue_struct *wq;
@@ -519,7 +520,8 @@ static int cm_alloc_id(struct cm_id_private *cm_id_priv)
 	idr_preload(GFP_KERNEL);
 	spin_lock_irqsave(&cm.lock, flags);
 
-	id = idr_alloc_cyclic(&cm.local_id_table, cm_id_priv, 0, 0, GFP_NOWAIT);
+	id = idr_alloc_cyclic(&cm.local_id_table, &cm.local_id_cursor,
+				cm_id_priv, 0, 0, GFP_NOWAIT);
 
 	spin_unlock_irqrestore(&cm.lock, flags);
 	idr_preload_end();
@@ -4322,6 +4324,7 @@ static int __init ib_cm_init(void)
 	cm.remote_qp_table = RB_ROOT;
 	cm.remote_sidr_table = RB_ROOT;
 	idr_init(&cm.local_id_table);
+	cm.local_id_cursor = 0;
 	get_random_bytes(&cm.random_id_operand, sizeof cm.random_id_operand);
 	INIT_LIST_HEAD(&cm.timewait_list);
 
diff --git a/drivers/infiniband/hw/mlx4/cm.c b/drivers/infiniband/hw/mlx4/cm.c
index fedaf8260105..288a796b9f14 100644
--- a/drivers/infiniband/hw/mlx4/cm.c
+++ b/drivers/infiniband/hw/mlx4/cm.c
@@ -259,7 +259,8 @@ id_map_alloc(struct ib_device *ibdev, int slave_id, u32 sl_cm_id)
 	idr_preload(GFP_KERNEL);
 	spin_lock(&to_mdev(ibdev)->sriov.id_map_lock);
 
-	ret = idr_alloc_cyclic(&sriov->pv_id_table, ent, 0, 0, GFP_NOWAIT);
+	ret = idr_alloc_cyclic(&sriov->pv_id_table, &sriov->pv_id_cursor,
+				ent, 0, 0, GFP_NOWAIT);
 	if (ret >= 0) {
 		ent->pv_cm_id = (u32)ret;
 		sl_id_map_add(ibdev, ent);
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index e14919c15b06..dfa8727254b7 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -501,6 +501,7 @@ struct mlx4_ib_sriov {
 	spinlock_t id_map_lock;
 	struct rb_root sl_id_map;
 	struct idr pv_id_table;
+	int pv_id_cursor;
 };
 
 struct gid_cache_context {
diff --git a/drivers/rapidio/rio_cm.c b/drivers/rapidio/rio_cm.c
index bad0e0ea4f30..56ff8e1a0b09 100644
--- a/drivers/rapidio/rio_cm.c
+++ b/drivers/rapidio/rio_cm.c
@@ -238,6 +238,7 @@ static int riocm_ch_close(struct rio_channel *ch);
 
 static DEFINE_SPINLOCK(idr_lock);
 static DEFINE_IDR(ch_idr);
+static int ch_cursor;
 
 static LIST_HEAD(cm_dev_list);
 static DECLARE_RWSEM(rdev_sem);
@@ -1307,7 +1308,7 @@ static struct rio_channel *riocm_ch_alloc(u16 ch_num)
 
 	idr_preload(GFP_KERNEL);
 	spin_lock_bh(&idr_lock);
-	id = idr_alloc_cyclic(&ch_idr, ch, start, end, GFP_NOWAIT);
+	id = idr_alloc_cyclic(&ch_idr, &ch_cursor, ch, start, end, GFP_NOWAIT);
 	spin_unlock_bh(&idr_lock);
 	idr_preload_end();
 
diff --git a/drivers/rpmsg/qcom_glink_native.c b/drivers/rpmsg/qcom_glink_native.c
index 40d76d2a5eff..bd8709b06c8a 100644
--- a/drivers/rpmsg/qcom_glink_native.c
+++ b/drivers/rpmsg/qcom_glink_native.c
@@ -116,6 +116,7 @@ struct qcom_glink {
 	struct mutex tx_lock;
 
 	spinlock_t idr_lock;
+	int lccursor;
 	struct idr lcids;
 	struct idr rcids;
 	unsigned long features;
@@ -169,6 +170,7 @@ struct glink_channel {
 	unsigned int rcid;
 
 	spinlock_t intent_lock;
+	int licursor;
 	struct idr liids;
 	struct idr riids;
 	struct work_struct intent_work;
@@ -394,7 +396,7 @@ static int qcom_glink_send_open_req(struct qcom_glink *glink,
 	kref_get(&channel->refcount);
 
 	spin_lock_irqsave(&glink->idr_lock, flags);
-	ret = idr_alloc_cyclic(&glink->lcids, channel,
+	ret = idr_alloc_cyclic(&glink->lcids, &glink->lccursor, channel,
 			       RPM_GLINK_CID_MIN, RPM_GLINK_CID_MAX,
 			       GFP_ATOMIC);
 	spin_unlock_irqrestore(&glink->idr_lock, flags);
@@ -644,7 +646,8 @@ qcom_glink_alloc_intent(struct qcom_glink *glink,
 		goto free_intent;
 
 	spin_lock_irqsave(&channel->intent_lock, flags);
-	ret = idr_alloc_cyclic(&channel->liids, intent, 1, -1, GFP_ATOMIC);
+	ret = idr_alloc_cyclic(&channel->liids, &channel->licursor, intent,
+				1, -1, GFP_ATOMIC);
 	if (ret < 0) {
 		spin_unlock_irqrestore(&channel->intent_lock, flags);
 		goto free_data;
diff --git a/drivers/target/target_core_device.c b/drivers/target/target_core_device.c
index e8dd6da164b2..f25dcad8d4d8 100644
--- a/drivers/target/target_core_device.c
+++ b/drivers/target/target_core_device.c
@@ -52,6 +52,7 @@
 static DEFINE_MUTEX(device_mutex);
 static LIST_HEAD(device_list);
 static DEFINE_IDR(devices_idr);
+static int devices_cursor;
 
 static struct se_hba *lun0_hba;
 /* not static, needed by tpg.c */
@@ -968,7 +969,8 @@ int target_configure_device(struct se_device *dev)
 	 * Use cyclic to try and avoid collisions with devices
 	 * that were recently removed.
 	 */
-	id = idr_alloc_cyclic(&devices_idr, dev, 0, INT_MAX, GFP_KERNEL);
+	id = idr_alloc_cyclic(&devices_idr, &devices_cursor, dev, 0, INT_MAX,
+				GFP_KERNEL);
 	mutex_unlock(&device_mutex);
 	if (id < 0) {
 		ret = -ENOMEM;
diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 89d1dc19340b..843f93de4b88 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -636,8 +636,9 @@ static struct kernfs_node *__kernfs_new_node(struct kernfs_root *root,
 
 	idr_preload(GFP_KERNEL);
 	spin_lock(&kernfs_idr_lock);
-	cursor = idr_get_cursor(&root->ino_idr);
-	ret = idr_alloc_cyclic(&root->ino_idr, kn, 1, 0, GFP_ATOMIC);
+	cursor = root->ino_cursor;
+	ret = idr_alloc_cyclic(&root->ino_idr, &root->ino_cursor, kn,
+				1, 0, GFP_ATOMIC);
 	if (ret >= 0 && ret < cursor)
 		root->next_generation++;
 	gen = root->next_generation;
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index b82817767b9d..2b243f086838 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -645,7 +645,8 @@ struct nfs4_stid *nfs4_alloc_stid(struct nfs4_client *cl, struct kmem_cache *sla
 
 	idr_preload(GFP_KERNEL);
 	spin_lock(&cl->cl_lock);
-	new_id = idr_alloc_cyclic(&cl->cl_stateids, stid, 0, 0, GFP_NOWAIT);
+	new_id = idr_alloc_cyclic(&cl->cl_stateids, &cl->cl_cursor, stid,
+					0, 0, GFP_NOWAIT);
 	spin_unlock(&cl->cl_lock);
 	idr_preload_end();
 	if (new_id < 0)
@@ -1771,6 +1772,7 @@ static struct nfs4_client *alloc_client(struct xdr_netobj name)
 	clp->cl_name.len = name.len;
 	INIT_LIST_HEAD(&clp->cl_sessions);
 	idr_init(&clp->cl_stateids);
+	clp->cl_cursor = 0;
 	atomic_set(&clp->cl_refcount, 0);
 	clp->cl_cb_state = NFSD4_CB_UNKNOWN;
 	INIT_LIST_HEAD(&clp->cl_idhash);
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index f3772ea8ba0d..5525df3ef826 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -300,6 +300,7 @@ struct nfs4_client {
 	struct list_head	*cl_ownerstr_hashtbl;
 	struct list_head	cl_openowners;
 	struct idr		cl_stateids;	/* stateid lookup */
+	int			cl_cursor;
 	struct list_head	cl_delegations;
 	struct list_head	cl_revoked;	/* unacknowledged, revoked 4.1 state */
 	struct list_head        cl_lru;         /* tail queue */
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index d3c20e0bb046..cb84886230c2 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -341,22 +341,23 @@ static int inotify_find_inode(const char __user *dirname, struct path *path, uns
 	return error;
 }
 
-static int inotify_add_to_idr(struct idr *idr, spinlock_t *idr_lock,
-			      struct inotify_inode_mark *i_mark)
+static int inotify_add_to_idr(struct inotify_group_private_data *group,
+				struct inotify_inode_mark *i_mark)
 {
 	int ret;
 
 	idr_preload(GFP_KERNEL);
-	spin_lock(idr_lock);
+	spin_lock(&group->idr_lock);
 
-	ret = idr_alloc_cyclic(idr, i_mark, 1, 0, GFP_NOWAIT);
+	ret = idr_alloc_cyclic(&group->idr, &group->idr_cursor, i_mark, 1, 0,
+				GFP_NOWAIT);
 	if (ret >= 0) {
 		/* we added the mark to the idr, take a reference */
 		i_mark->wd = ret;
 		fsnotify_get_mark(&i_mark->fsn_mark);
 	}
 
-	spin_unlock(idr_lock);
+	spin_unlock(&group->idr_lock);
 	idr_preload_end();
 	return ret < 0 ? ret : 0;
 }
@@ -539,8 +540,6 @@ static int inotify_new_watch(struct fsnotify_group *group,
 	struct inotify_inode_mark *tmp_i_mark;
 	__u32 mask;
 	int ret;
-	struct idr *idr = &group->inotify_data.idr;
-	spinlock_t *idr_lock = &group->inotify_data.idr_lock;
 
 	mask = inotify_arg_to_mask(arg);
 
@@ -552,7 +551,7 @@ static int inotify_new_watch(struct fsnotify_group *group,
 	tmp_i_mark->fsn_mark.mask = mask;
 	tmp_i_mark->wd = -1;
 
-	ret = inotify_add_to_idr(idr, idr_lock, tmp_i_mark);
+	ret = inotify_add_to_idr(&group->inotify_data, tmp_i_mark);
 	if (ret)
 		goto out_err;
 
diff --git a/fs/proc/loadavg.c b/fs/proc/loadavg.c
index a000d7547479..84b628f4056b 100644
--- a/fs/proc/loadavg.c
+++ b/fs/proc/loadavg.c
@@ -24,7 +24,7 @@ static int loadavg_proc_show(struct seq_file *m, void *v)
 		LOAD_INT(avnrun[1]), LOAD_FRAC(avnrun[1]),
 		LOAD_INT(avnrun[2]), LOAD_FRAC(avnrun[2]),
 		nr_running(), nr_threads,
-		idr_get_cursor(&task_active_pid_ns(current)->idr));
+		task_active_pid_ns(current)->next_pid - 1);
 	return 0;
 }
 
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 067d52e95f02..fb97158f3b97 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -178,6 +178,7 @@ struct fsnotify_group {
 #ifdef CONFIG_INOTIFY_USER
 		struct inotify_group_private_data {
 			spinlock_t	idr_lock;
+			int		idr_cursor;
 			struct idr      idr;
 			struct ucounts *ucounts;
 		} inotify_data;
diff --git a/include/linux/idr.h b/include/linux/idr.h
index 7c3a365f7e12..10bfe62423df 100644
--- a/include/linux/idr.h
+++ b/include/linux/idr.h
@@ -18,7 +18,6 @@
 
 struct idr {
 	struct radix_tree_root	idr_rt;
-	unsigned int		idr_next;
 };
 
 /*
@@ -36,32 +35,6 @@ struct idr {
 }
 #define DEFINE_IDR(name)	struct idr name = IDR_INIT
 
-/**
- * idr_get_cursor - Return the current position of the cyclic allocator
- * @idr: idr handle
- *
- * The value returned is the value that will be next returned from
- * idr_alloc_cyclic() if it is free (otherwise the search will start from
- * this position).
- */
-static inline unsigned int idr_get_cursor(const struct idr *idr)
-{
-	return READ_ONCE(idr->idr_next);
-}
-
-/**
- * idr_set_cursor - Set the current position of the cyclic allocator
- * @idr: idr handle
- * @val: new position
- *
- * The next call to idr_alloc_cyclic() will return @val if it is free
- * (otherwise the search will start from this position).
- */
-static inline void idr_set_cursor(struct idr *idr, unsigned int val)
-{
-	WRITE_ONCE(idr->idr_next, val);
-}
-
 /**
  * DOC: idr sync
  * idr synchronization (stolen from radix-tree.h)
@@ -130,7 +103,8 @@ static inline int idr_alloc_ext(struct idr *idr, void *ptr,
 	return idr_alloc_cmn(idr, ptr, index, start, end, gfp, true);
 }
 
-int idr_alloc_cyclic(struct idr *, void *entry, int start, int end, gfp_t);
+int idr_alloc_cyclic(struct idr *, int *cursor, void *entry,
+			int start, int end, gfp_t);
 int idr_for_each(const struct idr *,
 		 int (*fn)(int id, void *p, void *data), void *data);
 void *idr_get_next(struct idr *, int *nextid);
@@ -152,7 +126,6 @@ static inline void *idr_remove(struct idr *idr, int id)
 static inline void idr_init(struct idr *idr)
 {
 	INIT_RADIX_TREE(&idr->idr_rt, IDR_RT_MARKER);
-	idr->idr_next = 0;
 }
 
 static inline bool idr_is_empty(const struct idr *idr)
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index ab25c8b6d9e3..2e74b1b361aa 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -185,6 +185,7 @@ struct kernfs_root {
 
 	/* private fields, do not use outside kernfs proper */
 	struct idr		ino_idr;
+	int			ino_cursor;
 	u32			next_generation;
 	struct kernfs_syscall_ops *syscall_ops;
 
diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
index 49538b172483..dce073d7b2fa 100644
--- a/include/linux/pid_namespace.h
+++ b/include/linux/pid_namespace.h
@@ -25,6 +25,7 @@ struct pid_namespace {
 	struct kref kref;
 	struct idr idr;
 	struct rcu_head rcu;
+	int next_pid;
 	unsigned int pid_allocated;
 	struct task_struct *child_reaper;
 	struct kmem_cache *pid_cachep;
diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index 749a42882437..b920f6890bf3 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -505,6 +505,7 @@ void sctp_put_port(struct sock *sk);
 
 extern struct idr sctp_assocs_id;
 extern spinlock_t sctp_assocs_id_lock;
+extern int sctp_assocs_cursor;
 
 /* Static inline functions. */
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 09badc37e864..9c09f042c351 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -38,8 +38,10 @@
 
 DEFINE_PER_CPU(int, bpf_prog_active);
 static DEFINE_IDR(prog_idr);
+static int prog_cursor;
 static DEFINE_SPINLOCK(prog_idr_lock);
 static DEFINE_IDR(map_idr);
+static int map_cursor;
 static DEFINE_SPINLOCK(map_idr_lock);
 
 int sysctl_unprivileged_bpf_disabled __read_mostly;
@@ -178,7 +180,8 @@ static int bpf_map_alloc_id(struct bpf_map *map)
 	int id;
 
 	spin_lock_bh(&map_idr_lock);
-	id = idr_alloc_cyclic(&map_idr, map, 1, INT_MAX, GFP_ATOMIC);
+	id = idr_alloc_cyclic(&map_idr, &map_cursor, map, 1, INT_MAX,
+				GFP_ATOMIC);
 	if (id > 0)
 		map->id = id;
 	spin_unlock_bh(&map_idr_lock);
@@ -893,7 +896,8 @@ static int bpf_prog_alloc_id(struct bpf_prog *prog)
 	int id;
 
 	spin_lock_bh(&prog_idr_lock);
-	id = idr_alloc_cyclic(&prog_idr, prog, 1, INT_MAX, GFP_ATOMIC);
+	id = idr_alloc_cyclic(&prog_idr, &prog_cursor, prog, 1, INT_MAX,
+				GFP_ATOMIC);
 	if (id > 0)
 		prog->aux->id = id;
 	spin_unlock_bh(&prog_idr_lock);
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 0b1ffe147f24..351b355336d4 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -173,6 +173,7 @@ static int cgroup_root_count;
 
 /* hierarchy ID allocation and mapping, protected by cgroup_mutex */
 static DEFINE_IDR(cgroup_hierarchy_idr);
+static int cgroup_hierarchy_cursor;
 
 /*
  * Assign a monotonically increasing serial number to csses.  It guarantees
@@ -1213,7 +1214,8 @@ static int cgroup_init_root_id(struct cgroup_root *root)
 
 	lockdep_assert_held(&cgroup_mutex);
 
-	id = idr_alloc_cyclic(&cgroup_hierarchy_idr, root, 0, 0, GFP_KERNEL);
+	id = idr_alloc_cyclic(&cgroup_hierarchy_idr, &cgroup_hierarchy_cursor,
+				root, 0, 0, GFP_KERNEL);
 	if (id < 0)
 		return id;
 
diff --git a/kernel/pid.c b/kernel/pid.c
index b13b624e2c49..aae5f3307c35 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -170,14 +170,14 @@ struct pid *alloc_pid(struct pid_namespace *ns)
 		 * init really needs pid 1, but after reaching the maximum
 		 * wrap back to RESERVED_PIDS
 		 */
-		if (idr_get_cursor(&tmp->idr) > RESERVED_PIDS)
+		if (tmp->next_pid > RESERVED_PIDS)
 			pid_min = RESERVED_PIDS;
 
 		/*
 		 * Store a null pointer so find_pid_ns does not find
 		 * a partially initialized PID (see below).
 		 */
-		nr = idr_alloc_cyclic(&tmp->idr, NULL, pid_min,
+		nr = idr_alloc_cyclic(&tmp->idr, &tmp->next_pid, NULL, pid_min,
 				      pid_max, GFP_ATOMIC);
 		spin_unlock_irq(&pidmap_lock);
 		idr_preload_end();
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index 0b53eef7d34b..8246d92adc56 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -287,24 +287,22 @@ static int pid_ns_ctl_handler(struct ctl_table *table, int write,
 {
 	struct pid_namespace *pid_ns = task_active_pid_ns(current);
 	struct ctl_table tmp = *table;
-	int ret, next;
+	int ret, prev;
 
 	if (write && !ns_capable(pid_ns->user_ns, CAP_SYS_ADMIN))
 		return -EPERM;
 
 	/*
 	 * Writing directly to ns' last_pid field is OK, since this field
-	 * is volatile in a living namespace anyway and a code writing to
+	 * is volatile in a living namespace anyway and code writing to
 	 * it should synchronize its usage with external means.
 	 */
 
-	next = idr_get_cursor(&pid_ns->idr) - 1;
-
-	tmp.data = &next;
+	prev = pid_ns->next_pid - 1;
+	tmp.data = &prev;
 	ret = proc_dointvec_minmax(&tmp, write, buffer, lenp, ppos);
 	if (!ret && write)
-		idr_set_cursor(&pid_ns->idr, next + 1);
-
+		pid_ns->next_pid = prev + 1;
 	return ret;
 }
 
diff --git a/lib/idr.c b/lib/idr.c
index 2593ce513a18..26cb99412b8f 100644
--- a/lib/idr.c
+++ b/lib/idr.c
@@ -17,6 +17,9 @@ int idr_alloc_cmn(struct idr *idr, void *ptr, unsigned long *index,
 	if (WARN_ON_ONCE(radix_tree_is_internal_node(ptr)))
 		return -EINVAL;
 
+	if (WARN_ON_ONCE(!(idr->idr_rt.gfp_mask & ROOT_IS_IDR)))
+		idr->idr_rt.gfp_mask |= IDR_RT_MARKER;
+
 	radix_tree_iter_init(&iter, start);
 	if (ext)
 		slot = idr_get_free_ext(&idr->idr_rt, &iter, gfp, end);
@@ -35,20 +38,22 @@ int idr_alloc_cmn(struct idr *idr, void *ptr, unsigned long *index,
 EXPORT_SYMBOL_GPL(idr_alloc_cmn);
 
 /**
- * idr_alloc_cyclic - allocate new idr entry in a cyclical fashion
- * @idr: idr handle
- * @ptr: pointer to be associated with the new id
- * @start: the minimum id (inclusive)
- * @end: the maximum id (exclusive)
- * @gfp: memory allocation flags
+ * idr_alloc_cyclic() - Allocate new idr entry in a cyclical fashion.
+ * @idr: idr handle.
+ * @cursor: A pointer to the next ID to allocate.
+ * @ptr: Pointer to be associated with the new id.
+ * @start: The minimum id (inclusive).
+ * @end: The maximum id (exclusive).
+ * @gfp: Memory allocation flags.
  *
  * Allocates an ID larger than the last ID allocated if one is available.
  * If not, it will attempt to allocate the smallest ID that is larger or
  * equal to @start.
  */
-int idr_alloc_cyclic(struct idr *idr, void *ptr, int start, int end, gfp_t gfp)
+int idr_alloc_cyclic(struct idr *idr, int *cursor, void *ptr,
+			int start, int end, gfp_t gfp)
 {
-	int id, curr = idr->idr_next;
+	int id, curr = *cursor;
 
 	if (curr < start)
 		curr = start;
@@ -58,7 +63,7 @@ int idr_alloc_cyclic(struct idr *idr, void *ptr, int start, int end, gfp_t gfp)
 		id = idr_alloc(idr, ptr, start, curr, gfp);
 
 	if (id >= 0)
-		idr->idr_next = id + 1U;
+		*cursor = id + 1U;
 
 	return id;
 }
diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index 9b5c46b052fd..e10f35c2b8c3 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -959,7 +959,7 @@ static int __init af_rxrpc_init(void)
 	tmp &= 0x3fffffff;
 	if (tmp == 0)
 		tmp = 1;
-	idr_set_cursor(&rxrpc_client_conn_ids, tmp);
+	rxrpc_client_conn_cursor = tmp;
 
 	ret = -ENOMEM;
 	rxrpc_call_jar = kmem_cache_create(
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index b2151993d384..a6fb30b08a30 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -806,6 +806,7 @@ extern unsigned int rxrpc_reap_client_connections;
 extern unsigned int rxrpc_conn_idle_client_expiry;
 extern unsigned int rxrpc_conn_idle_client_fast_expiry;
 extern struct idr rxrpc_client_conn_ids;
+extern int rxrpc_client_conn_cursor;
 
 void rxrpc_destroy_client_conn_ids(void);
 int rxrpc_connect_call(struct rxrpc_call *, struct rxrpc_conn_parameters *,
diff --git a/net/rxrpc/conn_client.c b/net/rxrpc/conn_client.c
index 5f9624bd311c..7e8bf10fec86 100644
--- a/net/rxrpc/conn_client.c
+++ b/net/rxrpc/conn_client.c
@@ -91,8 +91,9 @@ __read_mostly unsigned int rxrpc_conn_idle_client_fast_expiry = 2 * HZ;
 /*
  * We use machine-unique IDs for our client connections.
  */
-DEFINE_IDR(rxrpc_client_conn_ids);
 static DEFINE_SPINLOCK(rxrpc_conn_id_lock);
+int rxrpc_client_conn_cursor;
+DEFINE_IDR(rxrpc_client_conn_ids);
 
 static void rxrpc_cull_active_client_conns(struct rxrpc_net *);
 
@@ -112,14 +113,12 @@ static int rxrpc_get_client_connection_id(struct rxrpc_connection *conn,
 
 	idr_preload(gfp);
 	spin_lock(&rxrpc_conn_id_lock);
-
-	id = idr_alloc_cyclic(&rxrpc_client_conn_ids, conn,
-			      1, 0x40000000, GFP_NOWAIT);
-	if (id < 0)
-		goto error;
-
+	id = idr_alloc_cyclic(&rxrpc_client_conn_ids, &rxrpc_client_conn_cursor,
+				conn, 1, 0x40000000, GFP_NOWAIT);
 	spin_unlock(&rxrpc_conn_id_lock);
 	idr_preload_end();
+	if (id < 0)
+		goto error;
 
 	conn->proto.epoch = rxnet->epoch;
 	conn->proto.cid = id << RXRPC_CIDSHIFT;
@@ -128,8 +127,6 @@ static int rxrpc_get_client_connection_id(struct rxrpc_connection *conn,
 	return 0;
 
 error:
-	spin_unlock(&rxrpc_conn_id_lock);
-	idr_preload_end();
 	_leave(" = %d", id);
 	return id;
 }
@@ -238,7 +235,7 @@ rxrpc_alloc_client_connection(struct rxrpc_conn_parameters *cp, gfp_t gfp)
 static bool rxrpc_may_reuse_conn(struct rxrpc_connection *conn)
 {
 	struct rxrpc_net *rxnet = conn->params.local->rxnet;
-	int id_cursor, id, distance, limit;
+	int id, distance, limit;
 
 	if (test_bit(RXRPC_CONN_DONT_REUSE, &conn->flags))
 		goto dont_reuse;
@@ -252,9 +249,8 @@ static bool rxrpc_may_reuse_conn(struct rxrpc_connection *conn)
 	 * times the maximum number of client conns away from the current
 	 * allocation point to try and keep the IDs concentrated.
 	 */
-	id_cursor = idr_get_cursor(&rxrpc_client_conn_ids);
 	id = conn->proto.cid >> RXRPC_CIDSHIFT;
-	distance = id - id_cursor;
+	distance = id - rxrpc_client_conn_cursor;
 	if (distance < 0)
 		distance = -distance;
 	limit = max(rxrpc_max_client_connections * 4, 1024U);
diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index 69394f4d6091..77178d7456fd 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -1622,7 +1622,8 @@ int sctp_assoc_set_id(struct sctp_association *asoc, gfp_t gfp)
 		idr_preload(gfp);
 	spin_lock_bh(&sctp_assocs_id_lock);
 	/* 0 is not a valid assoc_id, must be >= 1 */
-	ret = idr_alloc_cyclic(&sctp_assocs_id, asoc, 1, 0, GFP_NOWAIT);
+	ret = idr_alloc_cyclic(&sctp_assocs_id, &sctp_assocs_cursor,
+				asoc, 1, 0, GFP_NOWAIT);
 	spin_unlock_bh(&sctp_assocs_id_lock);
 	if (preload)
 		idr_preload_end();
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index f5172c21349b..13b2c34c9502 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -67,6 +67,7 @@ struct sctp_globals sctp_globals __read_mostly;
 
 struct idr sctp_assocs_id;
 DEFINE_SPINLOCK(sctp_assocs_id_lock);
+int sctp_assocs_cursor;
 
 static struct sctp_pf *sctp_pf_inet6_specific;
 static struct sctp_pf *sctp_pf_inet_specific;
diff --git a/tools/testing/radix-tree/idr-test.c b/tools/testing/radix-tree/idr-test.c
index 193450b29bf0..1dff94c15da5 100644
--- a/tools/testing/radix-tree/idr-test.c
+++ b/tools/testing/radix-tree/idr-test.c
@@ -42,9 +42,10 @@ void idr_alloc_test(void)
 {
 	unsigned long i;
 	DEFINE_IDR(idr);
+	int cursor = 0;
 
-	assert(idr_alloc_cyclic(&idr, DUMMY_PTR, 0, 0x4000, GFP_KERNEL) == 0);
-	assert(idr_alloc_cyclic(&idr, DUMMY_PTR, 0x3ffd, 0x4000, GFP_KERNEL) == 0x3ffd);
+	assert(idr_alloc_cyclic(&idr, &cursor, DUMMY_PTR, 0, 0x4000, GFP_KERNEL) == 0);
+	assert(idr_alloc_cyclic(&idr, &cursor, DUMMY_PTR, 0x3ffd, 0x4000, GFP_KERNEL) == 0x3ffd);
 	idr_remove(&idr, 0x3ffd);
 	idr_remove(&idr, 0);
 
@@ -57,7 +58,18 @@ void idr_alloc_test(void)
 		else
 			item = item_create(i - 0x3fff, 0);
 
-		id = idr_alloc_cyclic(&idr, item, 1, 0x4000, GFP_KERNEL);
+		id = idr_alloc_cyclic(&idr, &cursor, item, 1, 0x4000, GFP_KERNEL);
+		assert(id == item->index);
+	}
+
+	idr_for_each(&idr, item_idr_free, &idr);
+	idr_destroy(&idr);
+
+	cursor = 0x7ffffffe;
+	for (i = 0x7ffffffe; i < 0x80000003; i++) {
+		struct item *item = item_create(i & 0x7fffffff, 0);
+		int id = idr_alloc_cyclic(&idr, &cursor, item, 0, 0,
+								GFP_KERNEL);
 		assert(id == item->index);
 	}
 
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 159+ messages in thread
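
To make the new calling convention concrete, here is a minimal sketch of a
caller after this change.  It is illustrative only: the idr, cursor and lock
names below do not appear in the patch, and the pattern simply mirrors the
idr_preload()/GFP_NOWAIT conversions in the diff above.

#include <linux/idr.h>
#include <linux/gfp.h>
#include <linux/spinlock.h>

static DEFINE_IDR(example_idr);
static int example_cursor;		/* the cursor now lives with the caller */
static DEFINE_SPINLOCK(example_lock);

static int example_alloc(void *entry)
{
	int id;

	idr_preload(GFP_KERNEL);
	spin_lock(&example_lock);
	/* was: idr_alloc_cyclic(&example_idr, entry, 1, 0, GFP_NOWAIT) */
	id = idr_alloc_cyclic(&example_idr, &example_cursor, entry,
			      1, 0, GFP_NOWAIT);
	spin_unlock(&example_lock);
	idr_preload_end();

	return id;	/* the new ID, or a negative errno */
}

Whatever lock already serialises allocations from the idr should also cover
the cursor, since idr_alloc_cyclic() reads and writes *cursor without any
locking of its own.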

* [PATCH 07/62] idr: Rewrite extended IDR API
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:06   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:06 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

Rename the API to be 'ul' for unsigned long instead of 'ext'.  This fits
better with other usage in the Linux kernel.

 - idr_replace(), idr_remove() and idr_find() can simply take an unsigned
   long index instead of an int.
 - idr_alloc() turns back into a regular function.
 - idr_alloc_ul() takes 'nextid' as an in-out parameter like idr_get_next(),
   instead of having 'index' as an out-only parameter.
 - idr_alloc_ul() needs a __must_check to ensure that users don't look at
   the result without checking whether the function succeeded.
 - idr_alloc_ul() takes 'max' rather than 'end'; otherwise it would be
   impossible to allocate the ULONG_MAX id.  This differs from idr_alloc(),
   but so do many other things.
 - We don't need separate idr_get_free() and idr_get_free_ext().
 - Add kerneldoc for idr_alloc_ul().
 - Add an idr_alloc_u32() helper for the majority of existing users.
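
As a hedged illustration of the renamed calls (the idr and entry names here
are made up for the example, and nothing below is part of the patch):

#include <linux/kernel.h>
#include <linux/idr.h>
#include <linux/gfp.h>

static DEFINE_IDR(example_idr);

static int example_store(void *entry, unsigned long *out_id)
{
	unsigned long id = 1;		/* lowest ID we are prepared to use */
	int err;

	/* @max is inclusive, so ULONG_MAX permits the whole ID space */
	err = idr_alloc_ul(&example_idr, entry, &id, ULONG_MAX, GFP_KERNEL);
	if (err)
		return err;		/* -ENOMEM or -ENOSPC */

	*out_id = id;			/* idr_alloc_ul() stored the new ID here */
	return 0;
}

Callers that keep 32-bit IDs would use the idr_alloc_u32() helper in the same
way, passing a u32 * and letting the helper round-trip through idr_alloc_ul().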

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 include/linux/idr.h        |  80 ++++++++----------------------------
 include/linux/radix-tree.h |  17 +-------
 lib/idr.c                  | 100 +++++++++++++++++++++++++++++++++------------
 lib/radix-tree.c           |   2 +-
 net/sched/act_api.c        |  60 +++++++++++++--------------
 net/sched/cls_basic.c      |  20 ++++-----
 net/sched/cls_bpf.c        |  20 ++++-----
 net/sched/cls_flower.c     |  32 ++++++---------
 net/sched/cls_u32.c        |  44 +++++++++-----------
 9 files changed, 171 insertions(+), 204 deletions(-)

diff --git a/include/linux/idr.h b/include/linux/idr.h
index 10bfe62423df..87ae635fcee6 100644
--- a/include/linux/idr.h
+++ b/include/linux/idr.h
@@ -54,73 +54,30 @@ struct idr {
 
 void idr_preload(gfp_t gfp_mask);
 
-int idr_alloc_cmn(struct idr *idr, void *ptr, unsigned long *index,
-		  unsigned long start, unsigned long end, gfp_t gfp,
-		  bool ext);
-
-/**
- * idr_alloc - allocate an id
- * @idr: idr handle
- * @ptr: pointer to be associated with the new id
- * @start: the minimum id (inclusive)
- * @end: the maximum id (exclusive)
- * @gfp: memory allocation flags
- *
- * Allocates an unused ID in the range [start, end).  Returns -ENOSPC
- * if there are no unused IDs in that range.
- *
- * Note that @end is treated as max when <= 0.  This is to always allow
- * using @start + N as @end as long as N is inside integer range.
- *
- * Simultaneous modifications to the @idr are not allowed and should be
- * prevented by the user, usually with a lock.  idr_alloc() may be called
- * concurrently with read-only accesses to the @idr, such as idr_find() and
- * idr_for_each_entry().
- */
-static inline int idr_alloc(struct idr *idr, void *ptr,
-			    int start, int end, gfp_t gfp)
-{
-	unsigned long id;
-	int ret;
-
-	if (WARN_ON_ONCE(start < 0))
-		return -EINVAL;
-
-	ret = idr_alloc_cmn(idr, ptr, &id, start, end, gfp, false);
-
-	if (ret)
-		return ret;
-
-	return id;
-}
-
-static inline int idr_alloc_ext(struct idr *idr, void *ptr,
-				unsigned long *index,
-				unsigned long start,
-				unsigned long end,
-				gfp_t gfp)
-{
-	return idr_alloc_cmn(idr, ptr, index, start, end, gfp, true);
-}
-
+int idr_alloc(struct idr *, void *, int start, int end, gfp_t);
+int __must_check idr_alloc_ul(struct idr *, void *, unsigned long *nextid,
+			unsigned long max, gfp_t);
 int idr_alloc_cyclic(struct idr *, int *cursor, void *entry,
 			int start, int end, gfp_t);
 int idr_for_each(const struct idr *,
 		 int (*fn)(int id, void *p, void *data), void *data);
 void *idr_get_next(struct idr *, int *nextid);
-void *idr_get_next_ext(struct idr *idr, unsigned long *nextid);
-void *idr_replace(struct idr *, void *, int id);
-void *idr_replace_ext(struct idr *idr, void *ptr, unsigned long id);
+void *idr_get_next_ul(struct idr *, unsigned long *nextid);
+void *idr_replace(struct idr *, void *, unsigned long id);
 void idr_destroy(struct idr *);
 
-static inline void *idr_remove_ext(struct idr *idr, unsigned long id)
+static inline int __must_check idr_alloc_u32(struct idr *idr, void *ptr,
+				u32 *nextid, unsigned long max, gfp_t gfp)
 {
-	return radix_tree_delete_item(&idr->idr_rt, id, NULL);
+	unsigned long tmp = *nextid;
+	int ret = idr_alloc_ul(idr, ptr, &tmp, max, gfp);
+	*nextid = tmp;
+	return ret;
 }
 
-static inline void *idr_remove(struct idr *idr, int id)
+static inline void *idr_remove(struct idr *idr, unsigned long id)
 {
-	return idr_remove_ext(idr, id);
+	return radix_tree_delete_item(&idr->idr_rt, id, NULL);
 }
 
 static inline void idr_init(struct idr *idr)
@@ -157,16 +114,11 @@ static inline void idr_preload_end(void)
  * This function can be called under rcu_read_lock(), given that the leaf
  * pointers lifetimes are correctly managed.
  */
-static inline void *idr_find_ext(const struct idr *idr, unsigned long id)
+static inline void *idr_find(const struct idr *idr, unsigned long id)
 {
 	return radix_tree_lookup(&idr->idr_rt, id);
 }
 
-static inline void *idr_find(const struct idr *idr, int id)
-{
-	return idr_find_ext(idr, id);
-}
-
 /**
  * idr_for_each_entry - iterate over an idr's elements of a given type
  * @idr:     idr handle
@@ -179,8 +131,8 @@ static inline void *idr_find(const struct idr *idr, int id)
  */
 #define idr_for_each_entry(idr, entry, id)			\
 	for (id = 0; ((entry) = idr_get_next(idr, &(id))) != NULL; ++id)
-#define idr_for_each_entry_ext(idr, entry, id)			\
-	for (id = 0; ((entry) = idr_get_next_ext(idr, &(id))) != NULL; ++id)
+#define idr_for_each_entry_ul(idr, entry, id)			\
+	for (id = 0; ((entry) = idr_get_next_ul(idr, &(id))) != NULL; ++id)
 
 /**
  * idr_for_each_entry_continue - continue iteration over an idr's elements of a given type
diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index 23a9c89c7ad9..fc55ff31eca7 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -356,24 +356,9 @@ int radix_tree_split(struct radix_tree_root *, unsigned long index,
 int radix_tree_join(struct radix_tree_root *, unsigned long index,
 			unsigned new_order, void *);
 
-void __rcu **idr_get_free_cmn(struct radix_tree_root *root,
+void __rcu **idr_get_free(struct radix_tree_root *root,
 			      struct radix_tree_iter *iter, gfp_t gfp,
 			      unsigned long max);
-static inline void __rcu **idr_get_free(struct radix_tree_root *root,
-					struct radix_tree_iter *iter,
-					gfp_t gfp,
-					int end)
-{
-	return idr_get_free_cmn(root, iter, gfp, end > 0 ? end - 1 : INT_MAX);
-}
-
-static inline void __rcu **idr_get_free_ext(struct radix_tree_root *root,
-					    struct radix_tree_iter *iter,
-					    gfp_t gfp,
-					    unsigned long end)
-{
-	return idr_get_free_cmn(root, iter, gfp, end - 1);
-}
 
 enum {
 	RADIX_TREE_ITER_TAG_MASK = 0x0f,	/* tag index in lower nybble */
diff --git a/lib/idr.c b/lib/idr.c
index 26cb99412b8f..35678388e134 100644
--- a/lib/idr.c
+++ b/lib/idr.c
@@ -7,9 +7,26 @@
 DEFINE_PER_CPU(struct ida_bitmap *, ida_bitmap);
 static DEFINE_SPINLOCK(simple_ida_lock);
 
-int idr_alloc_cmn(struct idr *idr, void *ptr, unsigned long *index,
-		  unsigned long start, unsigned long end, gfp_t gfp,
-		  bool ext)
+/**
+ * idr_alloc_ul() - allocate a large ID
+ * @idr: idr handle
+ * @ptr: pointer to be associated with the new ID
+ * @nextid: Pointer to minimum ID to allocate
+ * @max: the maximum ID (inclusive)
+ * @gfp: memory allocation flags
+ *
+ * Allocates an unused ID in the range [*nextid, @max] and stores it in
+ * @nextid.  Note that @max differs from the @end parameter to idr_alloc().
+ *
+ * Simultaneous modifications to the @idr are not allowed and should be
+ * prevented by the user, usually with a lock.  idr_alloc_ul() may be called
+ * concurrently with read-only accesses to the @idr, such as idr_find() and
+ * idr_for_each_entry().
+ *
+ * Return: 0 on success or a negative errno on failure (ENOMEM or ENOSPC)
+ */
+int idr_alloc_ul(struct idr *idr, void *ptr, unsigned long *nextid,
+			unsigned long max, gfp_t gfp)
 {
 	struct radix_tree_iter iter;
 	void __rcu **slot;
@@ -20,22 +37,54 @@ int idr_alloc_cmn(struct idr *idr, void *ptr, unsigned long *index,
 	if (WARN_ON_ONCE(!(idr->idr_rt.gfp_mask & ROOT_IS_IDR)))
 		idr->idr_rt.gfp_mask |= IDR_RT_MARKER;
 
-	radix_tree_iter_init(&iter, start);
-	if (ext)
-		slot = idr_get_free_ext(&idr->idr_rt, &iter, gfp, end);
-	else
-		slot = idr_get_free(&idr->idr_rt, &iter, gfp, end);
+	radix_tree_iter_init(&iter, *nextid);
+	slot = idr_get_free(&idr->idr_rt, &iter, gfp, max);
 	if (IS_ERR(slot))
 		return PTR_ERR(slot);
 
 	radix_tree_iter_replace(&idr->idr_rt, &iter, slot, ptr);
 	radix_tree_iter_tag_clear(&idr->idr_rt, &iter, IDR_FREE);
 
-	if (index)
-		*index = iter.index;
+	*nextid = iter.index;
 	return 0;
 }
-EXPORT_SYMBOL_GPL(idr_alloc_cmn);
+EXPORT_SYMBOL_GPL(idr_alloc_ul);
+
+/**
+ * idr_alloc - allocate an id
+ * @idr: idr handle
+ * @ptr: pointer to be associated with the new id
+ * @start: the minimum id (inclusive)
+ * @end: the maximum id (exclusive)
+ * @gfp: memory allocation flags
+ *
+ * Allocates an unused ID in the range [start, end).  Returns -ENOSPC
+ * if there are no unused IDs in that range.
+ *
+ * Note that @end is treated as max when <= 0.  This is to always allow
+ * using @start + N as @end as long as N is inside integer range.
+ *
+ * Simultaneous modifications to the @idr are not allowed and should be
+ * prevented by the user, usually with a lock.  idr_alloc() may be called
+ * concurrently with read-only accesses to the @idr, such as idr_find() and
+ * idr_for_each_entry().
+ */
+int idr_alloc(struct idr *idr, void *ptr, int start, int end, gfp_t gfp)
+{
+	unsigned long id = start;
+	int ret;
+
+	if (WARN_ON_ONCE(start < 0))
+		return -EINVAL;
+
+	ret = idr_alloc_ul(idr, ptr, &id, end > 0 ? end - 1 : INT_MAX, gfp);
+
+	if (ret)
+		return ret;
+
+	return id;
+}
+EXPORT_SYMBOL_GPL(idr_alloc);
 
 /**
  * idr_alloc_cyclic() - Allocate new idr entry in a cyclical fashion.
@@ -126,7 +175,17 @@ void *idr_get_next(struct idr *idr, int *nextid)
 }
 EXPORT_SYMBOL(idr_get_next);
 
-void *idr_get_next_ext(struct idr *idr, unsigned long *nextid)
+/**
+ * idr_get_next_ul - Find next populated entry
+ * @idr: idr handle
+ * @nextid: Pointer to lowest possible ID to return
+ *
+ * Returns the next populated entry in the tree with an ID greater than
+ * or equal to the value pointed to by @nextid.  On exit, @nextid is updated
+ * to the ID of the found value.  To use in a loop, the value pointed to by
+ * nextid must be incremented by the user.
+ */
+void *idr_get_next_ul(struct idr *idr, unsigned long *nextid)
 {
 	struct radix_tree_iter iter;
 	void __rcu **slot;
@@ -138,7 +197,7 @@ void *idr_get_next_ext(struct idr *idr, unsigned long *nextid)
 	*nextid = iter.index;
 	return rcu_dereference_raw(*slot);
 }
-EXPORT_SYMBOL(idr_get_next_ext);
+EXPORT_SYMBOL(idr_get_next_ul);
 
 /**
  * idr_replace - replace pointer for given id
@@ -154,16 +213,7 @@ EXPORT_SYMBOL(idr_get_next_ext);
  * Returns: the old value on success.  %-ENOENT indicates that @id was not
  * found.  %-EINVAL indicates that @id or @ptr were not valid.
  */
-void *idr_replace(struct idr *idr, void *ptr, int id)
-{
-	if (id < 0)
-		return ERR_PTR(-EINVAL);
-
-	return idr_replace_ext(idr, ptr, id);
-}
-EXPORT_SYMBOL(idr_replace);
-
-void *idr_replace_ext(struct idr *idr, void *ptr, unsigned long id)
+void *idr_replace(struct idr *idr, void *ptr, unsigned long id)
 {
 	struct radix_tree_node *node;
 	void __rcu **slot = NULL;
@@ -180,7 +230,7 @@ void *idr_replace_ext(struct idr *idr, void *ptr, unsigned long id)
 
 	return entry;
 }
-EXPORT_SYMBOL(idr_replace_ext);
+EXPORT_SYMBOL(idr_replace);
 
 /**
  * DOC: IDA description
@@ -240,7 +290,7 @@ EXPORT_SYMBOL(idr_replace_ext);
  * bitmap, which is excessive.
  */
 
-#define IDA_MAX (0x80000000U / IDA_BITMAP_BITS)
+#define IDA_MAX (0x80000000U / IDA_BITMAP_BITS - 1)
 
 /**
  * ida_get_new_above - allocate new ID above or equal to a start id
diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index f00303e0b216..6d29ca4c8db0 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -2135,7 +2135,7 @@ int ida_pre_get(struct ida *ida, gfp_t gfp)
 }
 EXPORT_SYMBOL(ida_pre_get);
 
-void __rcu **idr_get_free_cmn(struct radix_tree_root *root,
+void __rcu **idr_get_free(struct radix_tree_root *root,
 			      struct radix_tree_iter *iter, gfp_t gfp,
 			      unsigned long max)
 {
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 4d33a50a8a6d..a6aa606b5e99 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -78,7 +78,7 @@ static void free_tcf(struct tc_action *p)
 static void tcf_idr_remove(struct tcf_idrinfo *idrinfo, struct tc_action *p)
 {
 	spin_lock_bh(&idrinfo->lock);
-	idr_remove_ext(&idrinfo->action_idr, p->tcfa_index);
+	idr_remove(&idrinfo->action_idr, p->tcfa_index);
 	spin_unlock_bh(&idrinfo->lock);
 	gen_kill_estimator(&p->tcfa_rate_est);
 	free_tcf(p);
@@ -124,7 +124,7 @@ static int tcf_dump_walker(struct tcf_idrinfo *idrinfo, struct sk_buff *skb,
 
 	s_i = cb->args[0];
 
-	idr_for_each_entry_ext(idr, p, id) {
+	idr_for_each_entry_ul(idr, p, id) {
 		index++;
 		if (index < s_i)
 			continue;
@@ -181,7 +181,7 @@ static int tcf_del_walker(struct tcf_idrinfo *idrinfo, struct sk_buff *skb,
 	if (nla_put_string(skb, TCA_KIND, ops->kind))
 		goto nla_put_failure;
 
-	idr_for_each_entry_ext(idr, p, id) {
+	idr_for_each_entry_ul(idr, p, id) {
 		ret = __tcf_idr_release(p, false, true);
 		if (ret == ACT_P_DELETED) {
 			module_put(ops->owner);
@@ -219,10 +219,10 @@ EXPORT_SYMBOL(tcf_generic_walker);
 
 static struct tc_action *tcf_idr_lookup(u32 index, struct tcf_idrinfo *idrinfo)
 {
-	struct tc_action *p = NULL;
+	struct tc_action *p;
 
 	spin_lock_bh(&idrinfo->lock);
-	p = idr_find_ext(&idrinfo->action_idr, index);
+	p = idr_find(&idrinfo->action_idr, index);
 	spin_unlock_bh(&idrinfo->lock);
 
 	return p;
@@ -274,7 +274,6 @@ int tcf_idr_create(struct tc_action_net *tn, u32 index, struct nlattr *est,
 	struct tcf_idrinfo *idrinfo = tn->idrinfo;
 	struct idr *idr = &idrinfo->action_idr;
 	int err = -ENOMEM;
-	unsigned long idr_index;
 
 	if (unlikely(!p))
 		return -ENOMEM;
@@ -284,45 +283,34 @@ int tcf_idr_create(struct tc_action_net *tn, u32 index, struct nlattr *est,
 
 	if (cpustats) {
 		p->cpu_bstats = netdev_alloc_pcpu_stats(struct gnet_stats_basic_cpu);
-		if (!p->cpu_bstats) {
-err1:
-			kfree(p);
-			return err;
-		}
-		p->cpu_qstats = alloc_percpu(struct gnet_stats_queue);
-		if (!p->cpu_qstats) {
-err2:
-			free_percpu(p->cpu_bstats);
+		if (!p->cpu_bstats)
 			goto err1;
-		}
+		p->cpu_qstats = alloc_percpu(struct gnet_stats_queue);
+		if (!p->cpu_qstats)
+			goto err2;
 	}
 	spin_lock_init(&p->tcfa_lock);
 	/* user doesn't specify an index */
 	if (!index) {
+		index = 1;
 		idr_preload(GFP_KERNEL);
 		spin_lock_bh(&idrinfo->lock);
-		err = idr_alloc_ext(idr, NULL, &idr_index, 1, 0,
-				    GFP_ATOMIC);
+		err = idr_alloc_u32(idr, NULL, &index, UINT_MAX, GFP_ATOMIC);
 		spin_unlock_bh(&idrinfo->lock);
 		idr_preload_end();
-		if (err) {
-err3:
-			free_percpu(p->cpu_qstats);
-			goto err2;
-		}
-		p->tcfa_index = idr_index;
+		if (err)
+			goto err3;
 	} else {
 		idr_preload(GFP_KERNEL);
 		spin_lock_bh(&idrinfo->lock);
-		err = idr_alloc_ext(idr, NULL, NULL, index, index + 1,
-				    GFP_ATOMIC);
+		err = idr_alloc_u32(idr, NULL, &index, index, GFP_ATOMIC);
 		spin_unlock_bh(&idrinfo->lock);
 		idr_preload_end();
 		if (err)
 			goto err3;
-		p->tcfa_index = index;
 	}
 
+	p->tcfa_index = index;
 	p->tcfa_tm.install = jiffies;
 	p->tcfa_tm.lastuse = jiffies;
 	p->tcfa_tm.firstuse = 0;
@@ -330,9 +318,8 @@ int tcf_idr_create(struct tc_action_net *tn, u32 index, struct nlattr *est,
 		err = gen_new_estimator(&p->tcfa_bstats, p->cpu_bstats,
 					&p->tcfa_rate_est,
 					&p->tcfa_lock, NULL, est);
-		if (err) {
-			goto err3;
-		}
+		if (err)
+			goto err4;
 	}
 
 	p->idrinfo = idrinfo;
@@ -340,6 +327,15 @@ int tcf_idr_create(struct tc_action_net *tn, u32 index, struct nlattr *est,
 	INIT_LIST_HEAD(&p->list);
 	*a = p;
 	return 0;
+err4:
+	idr_remove(idr, index);
+err3:
+	free_percpu(p->cpu_qstats);
+err2:
+	free_percpu(p->cpu_bstats);
+err1:
+	kfree(p);
+	return err;
 }
 EXPORT_SYMBOL(tcf_idr_create);
 
@@ -348,7 +344,7 @@ void tcf_idr_insert(struct tc_action_net *tn, struct tc_action *a)
 	struct tcf_idrinfo *idrinfo = tn->idrinfo;
 
 	spin_lock_bh(&idrinfo->lock);
-	idr_replace_ext(&idrinfo->action_idr, a, a->tcfa_index);
+	idr_replace(&idrinfo->action_idr, a, a->tcfa_index);
 	spin_unlock_bh(&idrinfo->lock);
 }
 EXPORT_SYMBOL(tcf_idr_insert);
@@ -361,7 +357,7 @@ void tcf_idrinfo_destroy(const struct tc_action_ops *ops,
 	int ret;
 	unsigned long id = 1;
 
-	idr_for_each_entry_ext(idr, p, id) {
+	idr_for_each_entry_ul(idr, p, id) {
 		ret = __tcf_idr_release(p, false, true);
 		if (ret == ACT_P_DELETED)
 			module_put(ops->owner);
diff --git a/net/sched/cls_basic.c b/net/sched/cls_basic.c
index 5f169ded347e..c2ff835639ed 100644
--- a/net/sched/cls_basic.c
+++ b/net/sched/cls_basic.c
@@ -120,7 +120,7 @@ static void basic_destroy(struct tcf_proto *tp)
 	list_for_each_entry_safe(f, n, &head->flist, link) {
 		list_del_rcu(&f->link);
 		tcf_unbind_filter(tp, &f->res);
-		idr_remove_ext(&head->handle_idr, f->handle);
+		idr_remove(&head->handle_idr, f->handle);
 		if (tcf_exts_get_net(&f->exts))
 			call_rcu(&f->rcu, basic_delete_filter);
 		else
@@ -137,7 +137,7 @@ static int basic_delete(struct tcf_proto *tp, void *arg, bool *last)
 
 	list_del_rcu(&f->link);
 	tcf_unbind_filter(tp, &f->res);
-	idr_remove_ext(&head->handle_idr, f->handle);
+	idr_remove(&head->handle_idr, f->handle);
 	tcf_exts_get_net(&f->exts);
 	call_rcu(&f->rcu, basic_delete_filter);
 	*last = list_empty(&head->flist);
@@ -182,7 +182,6 @@ static int basic_change(struct net *net, struct sk_buff *in_skb,
 	struct nlattr *tb[TCA_BASIC_MAX + 1];
 	struct basic_filter *fold = (struct basic_filter *) *arg;
 	struct basic_filter *fnew;
-	unsigned long idr_index;
 
 	if (tca[TCA_OPTIONS] == NULL)
 		return -EINVAL;
@@ -208,30 +207,29 @@ static int basic_change(struct net *net, struct sk_buff *in_skb,
 	if (handle) {
 		fnew->handle = handle;
 		if (!fold) {
-			err = idr_alloc_ext(&head->handle_idr, fnew, &idr_index,
-					    handle, handle + 1, GFP_KERNEL);
+			err = idr_alloc_u32(&head->handle_idr, fnew, &handle,
+						handle, GFP_KERNEL);
 			if (err)
 				goto errout;
 		}
 	} else {
-		err = idr_alloc_ext(&head->handle_idr, fnew, &idr_index,
-				    1, 0x7FFFFFFF, GFP_KERNEL);
-		if (err)
+		err = idr_alloc(&head->handle_idr, fnew, 1, 0, GFP_KERNEL);
+		if (err < 0)
 			goto errout;
-		fnew->handle = idr_index;
+		fnew->handle = err;
 	}
 
 	err = basic_set_parms(net, tp, fnew, base, tb, tca[TCA_RATE], ovr);
 	if (err < 0) {
 		if (!fold)
-			idr_remove_ext(&head->handle_idr, fnew->handle);
+			idr_remove(&head->handle_idr, fnew->handle);
 		goto errout;
 	}
 
 	*arg = fnew;
 
 	if (fold) {
-		idr_replace_ext(&head->handle_idr, fnew, fnew->handle);
+		idr_replace(&head->handle_idr, fnew, fnew->handle);
 		list_replace_rcu(&fold->link, &fnew->link);
 		tcf_unbind_filter(tp, &fold->res);
 		tcf_exts_get_net(&fold->exts);
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index fb680dafac5a..b575d41543df 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -294,7 +294,7 @@ static void __cls_bpf_delete(struct tcf_proto *tp, struct cls_bpf_prog *prog)
 {
 	struct cls_bpf_head *head = rtnl_dereference(tp->root);
 
-	idr_remove_ext(&head->handle_idr, prog->handle);
+	idr_remove(&head->handle_idr, prog->handle);
 	cls_bpf_stop_offload(tp, prog);
 	list_del_rcu(&prog->link);
 	tcf_unbind_filter(tp, &prog->res);
@@ -469,7 +469,6 @@ static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
 	struct cls_bpf_prog *oldprog = *arg;
 	struct nlattr *tb[TCA_BPF_MAX + 1];
 	struct cls_bpf_prog *prog;
-	unsigned long idr_index;
 	int ret;
 
 	if (tca[TCA_OPTIONS] == NULL)
@@ -496,15 +495,14 @@ static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
 	}
 
 	if (handle == 0) {
-		ret = idr_alloc_ext(&head->handle_idr, prog, &idr_index,
-				    1, 0x7FFFFFFF, GFP_KERNEL);
-		if (ret)
+		ret = idr_alloc(&head->handle_idr, prog, 1, 0, GFP_KERNEL);
+		if (ret < 0)
 			goto errout;
-		prog->handle = idr_index;
+		prog->handle = ret;
 	} else {
 		if (!oldprog) {
-			ret = idr_alloc_ext(&head->handle_idr, prog, &idr_index,
-					    handle, handle + 1, GFP_KERNEL);
+			ret = idr_alloc_u32(&head->handle_idr, prog, &handle,
+						handle, GFP_KERNEL);
 			if (ret)
 				goto errout;
 		}
@@ -518,7 +516,7 @@ static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
 	ret = cls_bpf_offload(tp, prog, oldprog);
 	if (ret) {
 		if (!oldprog)
-			idr_remove_ext(&head->handle_idr, prog->handle);
+			idr_remove(&head->handle_idr, prog->handle);
 		__cls_bpf_delete_prog(prog);
 		return ret;
 	}
@@ -527,7 +525,7 @@ static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
 		prog->gen_flags |= TCA_CLS_FLAGS_NOT_IN_HW;
 
 	if (oldprog) {
-		idr_replace_ext(&head->handle_idr, prog, handle);
+		idr_replace(&head->handle_idr, prog, handle);
 		list_replace_rcu(&oldprog->link, &prog->link);
 		tcf_unbind_filter(tp, &oldprog->res);
 		tcf_exts_get_net(&oldprog->exts);
@@ -541,7 +539,7 @@ static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
 
 errout_idr:
 	if (!oldprog)
-		idr_remove_ext(&head->handle_idr, prog->handle);
+		idr_remove(&head->handle_idr, prog->handle);
 errout:
 	tcf_exts_destroy(&prog->exts);
 	kfree(prog);
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 543a3e875d05..0867fa7bef5b 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -283,7 +283,7 @@ static void __fl_delete(struct tcf_proto *tp, struct cls_fl_filter *f)
 {
 	struct cls_fl_head *head = rtnl_dereference(tp->root);
 
-	idr_remove_ext(&head->handle_idr, f->handle);
+	idr_remove(&head->handle_idr, f->handle);
 	list_del_rcu(&f->list);
 	if (!tc_skip_hw(f->flags))
 		fl_hw_destroy_filter(tp, f);
@@ -329,7 +329,7 @@ static void *fl_get(struct tcf_proto *tp, u32 handle)
 {
 	struct cls_fl_head *head = rtnl_dereference(tp->root);
 
-	return idr_find_ext(&head->handle_idr, handle);
+	return idr_find(&head->handle_idr, handle);
 }
 
 static const struct nla_policy fl_policy[TCA_FLOWER_MAX + 1] = {
@@ -858,7 +858,6 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 	struct cls_fl_filter *fnew;
 	struct nlattr **tb;
 	struct fl_flow_mask mask = {};
-	unsigned long idr_index;
 	int err;
 
 	if (!tca[TCA_OPTIONS])
@@ -889,21 +888,15 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 		goto errout;
 
 	if (!handle) {
-		err = idr_alloc_ext(&head->handle_idr, fnew, &idr_index,
-				    1, 0x80000000, GFP_KERNEL);
-		if (err)
-			goto errout;
-		fnew->handle = idr_index;
-	}
-
-	/* user specifies a handle and it doesn't exist */
-	if (handle && !fold) {
-		err = idr_alloc_ext(&head->handle_idr, fnew, &idr_index,
-				    handle, handle + 1, GFP_KERNEL);
-		if (err)
-			goto errout;
-		fnew->handle = idr_index;
+		err = idr_alloc(&head->handle_idr, fnew, 1, 0, GFP_KERNEL);
+	} else if (!fold) {
+		/* user specifies a handle and it doesn't exist */
+		err = idr_alloc_u32(&head->handle_idr, fnew, &handle,
+				    handle, GFP_KERNEL);
 	}
+	if (err < 0)
+		goto errout;
+	fnew->handle = handle;
 
 	if (tb[TCA_FLOWER_FLAGS]) {
 		fnew->flags = nla_get_u32(tb[TCA_FLOWER_FLAGS]);
@@ -957,8 +950,7 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 	*arg = fnew;
 
 	if (fold) {
-		fnew->handle = handle;
-		idr_replace_ext(&head->handle_idr, fnew, fnew->handle);
+		idr_replace(&head->handle_idr, fnew, fnew->handle);
 		list_replace_rcu(&fold->list, &fnew->list);
 		tcf_unbind_filter(tp, &fold->res);
 		tcf_exts_get_net(&fold->exts);
@@ -972,7 +964,7 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 
 errout_idr:
 	if (fnew->handle)
-		idr_remove_ext(&head->handle_idr, fnew->handle);
+		idr_remove(&head->handle_idr, fnew->handle);
 errout:
 	tcf_exts_destroy(&fnew->exts);
 	kfree(fnew);
diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index ac152b4f4247..9260f7d14ab9 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -316,19 +316,16 @@ static void *u32_get(struct tcf_proto *tp, u32 handle)
 	return u32_lookup_key(ht, handle);
 }
 
+/*
+ * This is only used inside rtnl lock it is safe to increment
+ * without read _copy_ update semantics
+ */
 static u32 gen_new_htid(struct tc_u_common *tp_c, struct tc_u_hnode *ptr)
 {
-	unsigned long idr_index;
-	int err;
-
-	/* This is only used inside rtnl lock it is safe to increment
-	 * without read _copy_ update semantics
-	 */
-	err = idr_alloc_ext(&tp_c->handle_idr, ptr, &idr_index,
-			    1, 0x7FF, GFP_KERNEL);
-	if (err)
+	int id = idr_alloc(&tp_c->handle_idr, ptr, 1, 0x7FF, GFP_KERNEL);
+	if (id < 0)
 		return 0;
-	return (u32)(idr_index | 0x800) << 20;
+	return ((u32)id | 0x800) << 20;
 }
 
 static struct hlist_head *tc_u_common_hash;
@@ -591,7 +588,7 @@ static void u32_clear_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht)
 					 rtnl_dereference(n->next));
 			tcf_unbind_filter(tp, &n->res);
 			u32_remove_hw_knode(tp, n->handle);
-			idr_remove_ext(&ht->handle_idr, n->handle);
+			idr_remove(&ht->handle_idr, n->handle);
 			if (tcf_exts_get_net(&n->exts))
 				call_rcu(&n->rcu, u32_delete_key_freepf_rcu);
 			else
@@ -617,7 +614,7 @@ static int u32_destroy_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht)
 		if (phn == ht) {
 			u32_clear_hw_hnode(tp, ht);
 			idr_destroy(&ht->handle_idr);
-			idr_remove_ext(&tp_c->handle_idr, ht->handle);
+			idr_remove(&tp_c->handle_idr, ht->handle);
 			RCU_INIT_POINTER(*hn, ht->next);
 			kfree_rcu(ht, rcu);
 			return 0;
@@ -736,15 +733,15 @@ static int u32_delete(struct tcf_proto *tp, void *arg, bool *last)
 
 static u32 gen_new_kid(struct tc_u_hnode *ht, u32 htid)
 {
-	unsigned long idr_index;
 	u32 start = htid | 0x800;
 	u32 max = htid | 0xFFF;
 	u32 min = htid;
+	unsigned long idr_index = start;
 
-	if (idr_alloc_ext(&ht->handle_idr, NULL, &idr_index,
-			  start, max + 1, GFP_KERNEL)) {
-		if (idr_alloc_ext(&ht->handle_idr, NULL, &idr_index,
-				  min + 1, max + 1, GFP_KERNEL))
+	if (idr_alloc_ul(&ht->handle_idr, NULL, &idr_index, max, GFP_KERNEL)) {
+		idr_index = min + 1;
+		if (idr_alloc_ul(&ht->handle_idr, NULL, &idr_index, max,
+					GFP_KERNEL))
 			return max;
 	}
 
@@ -833,7 +830,7 @@ static void u32_replace_knode(struct tcf_proto *tp, struct tc_u_common *tp_c,
 		if (pins->handle == n->handle)
 			break;
 
-	idr_replace_ext(&ht->handle_idr, n, n->handle);
+	idr_replace(&ht->handle_idr, n, n->handle);
 	RCU_INIT_POINTER(n->next, pins->next);
 	rcu_assign_pointer(*ins, n);
 }
@@ -976,8 +973,8 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
 				return -ENOMEM;
 			}
 		} else {
-			err = idr_alloc_ext(&tp_c->handle_idr, ht, NULL,
-					    handle, handle + 1, GFP_KERNEL);
+			err = idr_alloc_u32(&tp_c->handle_idr, ht, &handle,
+						handle, GFP_KERNEL);
 			if (err) {
 				kfree(ht);
 				return err;
@@ -992,7 +989,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
 
 		err = u32_replace_hw_hnode(tp, ht, flags);
 		if (err) {
-			idr_remove_ext(&tp_c->handle_idr, handle);
+			idr_remove(&tp_c->handle_idr, handle);
 			kfree(ht);
 			return err;
 		}
@@ -1026,8 +1023,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
 		if (TC_U32_HTID(handle) && TC_U32_HTID(handle^htid))
 			return -EINVAL;
 		handle = htid | TC_U32_NODE(handle);
-		err = idr_alloc_ext(&ht->handle_idr, NULL, NULL,
-				    handle, handle + 1,
+		err = idr_alloc_u32(&ht->handle_idr, NULL, &handle, handle,
 				    GFP_KERNEL);
 		if (err)
 			return err;
@@ -1120,7 +1116,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
 #endif
 	kfree(n);
 erridr:
-	idr_remove_ext(&ht->handle_idr, handle);
+	idr_remove(&ht->handle_idr, handle);
 	return err;
 }
 
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 07/62] idr: Rewrite extended IDR API
@ 2017-11-22 21:06   ` Matthew Wilcox
  0 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:06 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

Rename the API to be 'ul' for unsigned long instead of 'ext'.  This fits
better with other usage in the Linux kernel.

 - idr_replace(), idr_remove() and idr_find() can simply take an unsigned
   long index instead of an int.
 - idr_alloc() turns back into a regular function.
 - idr_alloc_ul() takes 'nextid' as an in-out parameter like idr_get_next(),
   instead of having 'index' as an out-only parameter.
 - idr_alloc_ul() needs a __must_check to ensure that users don't look at
   the result without checking whether the function succeeded.
 - idr_alloc_ul() takes 'max' rather than 'end'; otherwise it would be
   impossible to allocate the ULONG_MAX id.  This differs from idr_alloc(),
   but so do many other things.
 - We don't need separate idr_get_free() and idr_get_free_ext().
 - Add kerneldoc for idr_alloc_ul().
 - Add an idr_alloc_u32() helper for the majority of existing users
   (a short usage sketch of the renamed calls follows below).
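
Not part of the patch, just a reviewer's sketch of how a caller moves to
the renamed interface.  The 'foo' structure, 'foo_idr' and foo_insert()
are made up for illustration; the calls follow the declarations added by
this patch:

	#include <linux/idr.h>

	struct foo {
		u32 id;
		/* ... */
	};

	static DEFINE_IDR(foo_idr);

	static int foo_insert(struct foo *foo, gfp_t gfp)
	{
		u32 id = 1;	/* minimum ID; set to the allocated ID on success */
		int err;

		/* was: idr_alloc_ext(&foo_idr, foo, &index, 1, 0, gfp) */
		err = idr_alloc_u32(&foo_idr, foo, &id, UINT_MAX, gfp);
		if (err)	/* -ENOMEM or -ENOSPC */
			return err;
		foo->id = id;
		return 0;
	}

Callers that need the full unsigned long space call idr_alloc_ul()
directly, with an unsigned long cursor and ULONG_MAX as 'max'.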

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 include/linux/idr.h        |  80 ++++++++----------------------------
 include/linux/radix-tree.h |  17 +-------
 lib/idr.c                  | 100 +++++++++++++++++++++++++++++++++------------
 lib/radix-tree.c           |   2 +-
 net/sched/act_api.c        |  60 +++++++++++++--------------
 net/sched/cls_basic.c      |  20 ++++-----
 net/sched/cls_bpf.c        |  20 ++++-----
 net/sched/cls_flower.c     |  32 ++++++---------
 net/sched/cls_u32.c        |  44 +++++++++-----------
 9 files changed, 171 insertions(+), 204 deletions(-)

diff --git a/include/linux/idr.h b/include/linux/idr.h
index 10bfe62423df..87ae635fcee6 100644
--- a/include/linux/idr.h
+++ b/include/linux/idr.h
@@ -54,73 +54,30 @@ struct idr {
 
 void idr_preload(gfp_t gfp_mask);
 
-int idr_alloc_cmn(struct idr *idr, void *ptr, unsigned long *index,
-		  unsigned long start, unsigned long end, gfp_t gfp,
-		  bool ext);
-
-/**
- * idr_alloc - allocate an id
- * @idr: idr handle
- * @ptr: pointer to be associated with the new id
- * @start: the minimum id (inclusive)
- * @end: the maximum id (exclusive)
- * @gfp: memory allocation flags
- *
- * Allocates an unused ID in the range [start, end).  Returns -ENOSPC
- * if there are no unused IDs in that range.
- *
- * Note that @end is treated as max when <= 0.  This is to always allow
- * using @start + N as @end as long as N is inside integer range.
- *
- * Simultaneous modifications to the @idr are not allowed and should be
- * prevented by the user, usually with a lock.  idr_alloc() may be called
- * concurrently with read-only accesses to the @idr, such as idr_find() and
- * idr_for_each_entry().
- */
-static inline int idr_alloc(struct idr *idr, void *ptr,
-			    int start, int end, gfp_t gfp)
-{
-	unsigned long id;
-	int ret;
-
-	if (WARN_ON_ONCE(start < 0))
-		return -EINVAL;
-
-	ret = idr_alloc_cmn(idr, ptr, &id, start, end, gfp, false);
-
-	if (ret)
-		return ret;
-
-	return id;
-}
-
-static inline int idr_alloc_ext(struct idr *idr, void *ptr,
-				unsigned long *index,
-				unsigned long start,
-				unsigned long end,
-				gfp_t gfp)
-{
-	return idr_alloc_cmn(idr, ptr, index, start, end, gfp, true);
-}
-
+int idr_alloc(struct idr *, void *, int start, int end, gfp_t);
+int __must_check idr_alloc_ul(struct idr *, void *, unsigned long *nextid,
+			unsigned long max, gfp_t);
 int idr_alloc_cyclic(struct idr *, int *cursor, void *entry,
 			int start, int end, gfp_t);
 int idr_for_each(const struct idr *,
 		 int (*fn)(int id, void *p, void *data), void *data);
 void *idr_get_next(struct idr *, int *nextid);
-void *idr_get_next_ext(struct idr *idr, unsigned long *nextid);
-void *idr_replace(struct idr *, void *, int id);
-void *idr_replace_ext(struct idr *idr, void *ptr, unsigned long id);
+void *idr_get_next_ul(struct idr *, unsigned long *nextid);
+void *idr_replace(struct idr *, void *, unsigned long id);
 void idr_destroy(struct idr *);
 
-static inline void *idr_remove_ext(struct idr *idr, unsigned long id)
+static inline int __must_check idr_alloc_u32(struct idr *idr, void *ptr,
+				u32 *nextid, unsigned long max, gfp_t gfp)
 {
-	return radix_tree_delete_item(&idr->idr_rt, id, NULL);
+	unsigned long tmp = *nextid;
+	int ret = idr_alloc_ul(idr, ptr, &tmp, max, gfp);
+	*nextid = tmp;
+	return ret;
 }
 
-static inline void *idr_remove(struct idr *idr, int id)
+static inline void *idr_remove(struct idr *idr, unsigned long id)
 {
-	return idr_remove_ext(idr, id);
+	return radix_tree_delete_item(&idr->idr_rt, id, NULL);
 }
 
 static inline void idr_init(struct idr *idr)
@@ -157,16 +114,11 @@ static inline void idr_preload_end(void)
  * This function can be called under rcu_read_lock(), given that the leaf
  * pointers lifetimes are correctly managed.
  */
-static inline void *idr_find_ext(const struct idr *idr, unsigned long id)
+static inline void *idr_find(const struct idr *idr, unsigned long id)
 {
 	return radix_tree_lookup(&idr->idr_rt, id);
 }
 
-static inline void *idr_find(const struct idr *idr, int id)
-{
-	return idr_find_ext(idr, id);
-}
-
 /**
  * idr_for_each_entry - iterate over an idr's elements of a given type
  * @idr:     idr handle
@@ -179,8 +131,8 @@ static inline void *idr_find(const struct idr *idr, int id)
  */
 #define idr_for_each_entry(idr, entry, id)			\
 	for (id = 0; ((entry) = idr_get_next(idr, &(id))) != NULL; ++id)
-#define idr_for_each_entry_ext(idr, entry, id)			\
-	for (id = 0; ((entry) = idr_get_next_ext(idr, &(id))) != NULL; ++id)
+#define idr_for_each_entry_ul(idr, entry, id)			\
+	for (id = 0; ((entry) = idr_get_next_ul(idr, &(id))) != NULL; ++id)
 
 /**
  * idr_for_each_entry_continue - continue iteration over an idr's elements of a given type
diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index 23a9c89c7ad9..fc55ff31eca7 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -356,24 +356,9 @@ int radix_tree_split(struct radix_tree_root *, unsigned long index,
 int radix_tree_join(struct radix_tree_root *, unsigned long index,
 			unsigned new_order, void *);
 
-void __rcu **idr_get_free_cmn(struct radix_tree_root *root,
+void __rcu **idr_get_free(struct radix_tree_root *root,
 			      struct radix_tree_iter *iter, gfp_t gfp,
 			      unsigned long max);
-static inline void __rcu **idr_get_free(struct radix_tree_root *root,
-					struct radix_tree_iter *iter,
-					gfp_t gfp,
-					int end)
-{
-	return idr_get_free_cmn(root, iter, gfp, end > 0 ? end - 1 : INT_MAX);
-}
-
-static inline void __rcu **idr_get_free_ext(struct radix_tree_root *root,
-					    struct radix_tree_iter *iter,
-					    gfp_t gfp,
-					    unsigned long end)
-{
-	return idr_get_free_cmn(root, iter, gfp, end - 1);
-}
 
 enum {
 	RADIX_TREE_ITER_TAG_MASK = 0x0f,	/* tag index in lower nybble */
diff --git a/lib/idr.c b/lib/idr.c
index 26cb99412b8f..35678388e134 100644
--- a/lib/idr.c
+++ b/lib/idr.c
@@ -7,9 +7,26 @@
 DEFINE_PER_CPU(struct ida_bitmap *, ida_bitmap);
 static DEFINE_SPINLOCK(simple_ida_lock);
 
-int idr_alloc_cmn(struct idr *idr, void *ptr, unsigned long *index,
-		  unsigned long start, unsigned long end, gfp_t gfp,
-		  bool ext)
+/**
+ * idr_alloc_ul() - allocate a large ID
+ * @idr: idr handle
+ * @ptr: pointer to be associated with the new ID
+ * @nextid: Pointer to minimum ID to allocate
+ * @max: the maximum ID (inclusive)
+ * @gfp: memory allocation flags
+ *
+ * Allocates an unused ID in the range [*nextid, end] and stores it in
+ * @nextid.  Note that @max differs from the @end parameter to idr_alloc().
+ *
+ * Simultaneous modifications to the @idr are not allowed and should be
+ * prevented by the user, usually with a lock.  idr_alloc_ul() may be called
+ * concurrently with read-only accesses to the @idr, such as idr_find() and
+ * idr_for_each_entry().
+ *
+ * Return: 0 on success or a negative errno on failure (ENOMEM or ENOSPC)
+ */
+int idr_alloc_ul(struct idr *idr, void *ptr, unsigned long *nextid,
+			unsigned long max, gfp_t gfp)
 {
 	struct radix_tree_iter iter;
 	void __rcu **slot;
@@ -20,22 +37,54 @@ int idr_alloc_cmn(struct idr *idr, void *ptr, unsigned long *index,
 	if (WARN_ON_ONCE(!(idr->idr_rt.gfp_mask & ROOT_IS_IDR)))
 		idr->idr_rt.gfp_mask |= IDR_RT_MARKER;
 
-	radix_tree_iter_init(&iter, start);
-	if (ext)
-		slot = idr_get_free_ext(&idr->idr_rt, &iter, gfp, end);
-	else
-		slot = idr_get_free(&idr->idr_rt, &iter, gfp, end);
+	radix_tree_iter_init(&iter, *nextid);
+	slot = idr_get_free(&idr->idr_rt, &iter, gfp, max);
 	if (IS_ERR(slot))
 		return PTR_ERR(slot);
 
 	radix_tree_iter_replace(&idr->idr_rt, &iter, slot, ptr);
 	radix_tree_iter_tag_clear(&idr->idr_rt, &iter, IDR_FREE);
 
-	if (index)
-		*index = iter.index;
+	*nextid = iter.index;
 	return 0;
 }
-EXPORT_SYMBOL_GPL(idr_alloc_cmn);
+EXPORT_SYMBOL_GPL(idr_alloc_ul);
+
+/**
+ * idr_alloc - allocate an id
+ * @idr: idr handle
+ * @ptr: pointer to be associated with the new id
+ * @start: the minimum id (inclusive)
+ * @end: the maximum id (exclusive)
+ * @gfp: memory allocation flags
+ *
+ * Allocates an unused ID in the range [start, end).  Returns -ENOSPC
+ * if there are no unused IDs in that range.
+ *
+ * Note that @end is treated as max when <= 0.  This is to always allow
+ * using @start + N as @end as long as N is inside integer range.
+ *
+ * Simultaneous modifications to the @idr are not allowed and should be
+ * prevented by the user, usually with a lock.  idr_alloc() may be called
+ * concurrently with read-only accesses to the @idr, such as idr_find() and
+ * idr_for_each_entry().
+ */
+int idr_alloc(struct idr *idr, void *ptr, int start, int end, gfp_t gfp)
+{
+	unsigned long id = start;
+	int ret;
+
+	if (WARN_ON_ONCE(start < 0))
+		return -EINVAL;
+
+	ret = idr_alloc_ul(idr, ptr, &id, end > 0 ? end - 1 : INT_MAX, gfp);
+
+	if (ret)
+		return ret;
+
+	return id;
+}
+EXPORT_SYMBOL_GPL(idr_alloc);
 
 /**
  * idr_alloc_cyclic() - Allocate new idr entry in a cyclical fashion.
@@ -126,7 +175,17 @@ void *idr_get_next(struct idr *idr, int *nextid)
 }
 EXPORT_SYMBOL(idr_get_next);
 
-void *idr_get_next_ext(struct idr *idr, unsigned long *nextid)
+/**
+ * idr_get_next_ul - Find next populated entry
+ * @idr: idr handle
+ * @nextid: Pointer to lowest possible ID to return
+ *
+ * Returns the next populated entry in the tree with an ID greater than
+ * or equal to the value pointed to by @nextid.  On exit, @nextid is updated
+ * to the ID of the found value.  To use in a loop, the value pointed to by
+ * nextid must be incremented by the user.
+ */
+void *idr_get_next_ul(struct idr *idr, unsigned long *nextid)
 {
 	struct radix_tree_iter iter;
 	void __rcu **slot;
@@ -138,7 +197,7 @@ void *idr_get_next_ext(struct idr *idr, unsigned long *nextid)
 	*nextid = iter.index;
 	return rcu_dereference_raw(*slot);
 }
-EXPORT_SYMBOL(idr_get_next_ext);
+EXPORT_SYMBOL(idr_get_next_ul);
 
 /**
  * idr_replace - replace pointer for given id
@@ -154,16 +213,7 @@ EXPORT_SYMBOL(idr_get_next_ext);
  * Returns: the old value on success.  %-ENOENT indicates that @id was not
  * found.  %-EINVAL indicates that @id or @ptr were not valid.
  */
-void *idr_replace(struct idr *idr, void *ptr, int id)
-{
-	if (id < 0)
-		return ERR_PTR(-EINVAL);
-
-	return idr_replace_ext(idr, ptr, id);
-}
-EXPORT_SYMBOL(idr_replace);
-
-void *idr_replace_ext(struct idr *idr, void *ptr, unsigned long id)
+void *idr_replace(struct idr *idr, void *ptr, unsigned long id)
 {
 	struct radix_tree_node *node;
 	void __rcu **slot = NULL;
@@ -180,7 +230,7 @@ void *idr_replace_ext(struct idr *idr, void *ptr, unsigned long id)
 
 	return entry;
 }
-EXPORT_SYMBOL(idr_replace_ext);
+EXPORT_SYMBOL(idr_replace);
 
 /**
  * DOC: IDA description
@@ -240,7 +290,7 @@ EXPORT_SYMBOL(idr_replace_ext);
  * bitmap, which is excessive.
  */
 
-#define IDA_MAX (0x80000000U / IDA_BITMAP_BITS)
+#define IDA_MAX (0x80000000U / IDA_BITMAP_BITS - 1)
 
 /**
  * ida_get_new_above - allocate new ID above or equal to a start id
diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index f00303e0b216..6d29ca4c8db0 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -2135,7 +2135,7 @@ int ida_pre_get(struct ida *ida, gfp_t gfp)
 }
 EXPORT_SYMBOL(ida_pre_get);
 
-void __rcu **idr_get_free_cmn(struct radix_tree_root *root,
+void __rcu **idr_get_free(struct radix_tree_root *root,
 			      struct radix_tree_iter *iter, gfp_t gfp,
 			      unsigned long max)
 {
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 4d33a50a8a6d..a6aa606b5e99 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -78,7 +78,7 @@ static void free_tcf(struct tc_action *p)
 static void tcf_idr_remove(struct tcf_idrinfo *idrinfo, struct tc_action *p)
 {
 	spin_lock_bh(&idrinfo->lock);
-	idr_remove_ext(&idrinfo->action_idr, p->tcfa_index);
+	idr_remove(&idrinfo->action_idr, p->tcfa_index);
 	spin_unlock_bh(&idrinfo->lock);
 	gen_kill_estimator(&p->tcfa_rate_est);
 	free_tcf(p);
@@ -124,7 +124,7 @@ static int tcf_dump_walker(struct tcf_idrinfo *idrinfo, struct sk_buff *skb,
 
 	s_i = cb->args[0];
 
-	idr_for_each_entry_ext(idr, p, id) {
+	idr_for_each_entry_ul(idr, p, id) {
 		index++;
 		if (index < s_i)
 			continue;
@@ -181,7 +181,7 @@ static int tcf_del_walker(struct tcf_idrinfo *idrinfo, struct sk_buff *skb,
 	if (nla_put_string(skb, TCA_KIND, ops->kind))
 		goto nla_put_failure;
 
-	idr_for_each_entry_ext(idr, p, id) {
+	idr_for_each_entry_ul(idr, p, id) {
 		ret = __tcf_idr_release(p, false, true);
 		if (ret == ACT_P_DELETED) {
 			module_put(ops->owner);
@@ -219,10 +219,10 @@ EXPORT_SYMBOL(tcf_generic_walker);
 
 static struct tc_action *tcf_idr_lookup(u32 index, struct tcf_idrinfo *idrinfo)
 {
-	struct tc_action *p = NULL;
+	struct tc_action *p;
 
 	spin_lock_bh(&idrinfo->lock);
-	p = idr_find_ext(&idrinfo->action_idr, index);
+	p = idr_find(&idrinfo->action_idr, index);
 	spin_unlock_bh(&idrinfo->lock);
 
 	return p;
@@ -274,7 +274,6 @@ int tcf_idr_create(struct tc_action_net *tn, u32 index, struct nlattr *est,
 	struct tcf_idrinfo *idrinfo = tn->idrinfo;
 	struct idr *idr = &idrinfo->action_idr;
 	int err = -ENOMEM;
-	unsigned long idr_index;
 
 	if (unlikely(!p))
 		return -ENOMEM;
@@ -284,45 +283,34 @@ int tcf_idr_create(struct tc_action_net *tn, u32 index, struct nlattr *est,
 
 	if (cpustats) {
 		p->cpu_bstats = netdev_alloc_pcpu_stats(struct gnet_stats_basic_cpu);
-		if (!p->cpu_bstats) {
-err1:
-			kfree(p);
-			return err;
-		}
-		p->cpu_qstats = alloc_percpu(struct gnet_stats_queue);
-		if (!p->cpu_qstats) {
-err2:
-			free_percpu(p->cpu_bstats);
+		if (!p->cpu_bstats)
 			goto err1;
-		}
+		p->cpu_qstats = alloc_percpu(struct gnet_stats_queue);
+		if (!p->cpu_qstats)
+			goto err2;
 	}
 	spin_lock_init(&p->tcfa_lock);
 	/* user doesn't specify an index */
 	if (!index) {
+		index = 1;
 		idr_preload(GFP_KERNEL);
 		spin_lock_bh(&idrinfo->lock);
-		err = idr_alloc_ext(idr, NULL, &idr_index, 1, 0,
-				    GFP_ATOMIC);
+		err = idr_alloc_u32(idr, NULL, &index, UINT_MAX, GFP_ATOMIC);
 		spin_unlock_bh(&idrinfo->lock);
 		idr_preload_end();
-		if (err) {
-err3:
-			free_percpu(p->cpu_qstats);
-			goto err2;
-		}
-		p->tcfa_index = idr_index;
+		if (err)
+			goto err3;
 	} else {
 		idr_preload(GFP_KERNEL);
 		spin_lock_bh(&idrinfo->lock);
-		err = idr_alloc_ext(idr, NULL, NULL, index, index + 1,
-				    GFP_ATOMIC);
+		err = idr_alloc_u32(idr, NULL, &index, index, GFP_ATOMIC);
 		spin_unlock_bh(&idrinfo->lock);
 		idr_preload_end();
 		if (err)
 			goto err3;
-		p->tcfa_index = index;
 	}
 
+	p->tcfa_index = index;
 	p->tcfa_tm.install = jiffies;
 	p->tcfa_tm.lastuse = jiffies;
 	p->tcfa_tm.firstuse = 0;
@@ -330,9 +318,8 @@ int tcf_idr_create(struct tc_action_net *tn, u32 index, struct nlattr *est,
 		err = gen_new_estimator(&p->tcfa_bstats, p->cpu_bstats,
 					&p->tcfa_rate_est,
 					&p->tcfa_lock, NULL, est);
-		if (err) {
-			goto err3;
-		}
+		if (err)
+			goto err4;
 	}
 
 	p->idrinfo = idrinfo;
@@ -340,6 +327,15 @@ int tcf_idr_create(struct tc_action_net *tn, u32 index, struct nlattr *est,
 	INIT_LIST_HEAD(&p->list);
 	*a = p;
 	return 0;
+err4:
+	idr_remove(idr, index);
+err3:
+	free_percpu(p->cpu_qstats);
+err2:
+	free_percpu(p->cpu_bstats);
+err1:
+	kfree(p);
+	return err;
 }
 EXPORT_SYMBOL(tcf_idr_create);
 
@@ -348,7 +344,7 @@ void tcf_idr_insert(struct tc_action_net *tn, struct tc_action *a)
 	struct tcf_idrinfo *idrinfo = tn->idrinfo;
 
 	spin_lock_bh(&idrinfo->lock);
-	idr_replace_ext(&idrinfo->action_idr, a, a->tcfa_index);
+	idr_replace(&idrinfo->action_idr, a, a->tcfa_index);
 	spin_unlock_bh(&idrinfo->lock);
 }
 EXPORT_SYMBOL(tcf_idr_insert);
@@ -361,7 +357,7 @@ void tcf_idrinfo_destroy(const struct tc_action_ops *ops,
 	int ret;
 	unsigned long id = 1;
 
-	idr_for_each_entry_ext(idr, p, id) {
+	idr_for_each_entry_ul(idr, p, id) {
 		ret = __tcf_idr_release(p, false, true);
 		if (ret == ACT_P_DELETED)
 			module_put(ops->owner);
diff --git a/net/sched/cls_basic.c b/net/sched/cls_basic.c
index 5f169ded347e..c2ff835639ed 100644
--- a/net/sched/cls_basic.c
+++ b/net/sched/cls_basic.c
@@ -120,7 +120,7 @@ static void basic_destroy(struct tcf_proto *tp)
 	list_for_each_entry_safe(f, n, &head->flist, link) {
 		list_del_rcu(&f->link);
 		tcf_unbind_filter(tp, &f->res);
-		idr_remove_ext(&head->handle_idr, f->handle);
+		idr_remove(&head->handle_idr, f->handle);
 		if (tcf_exts_get_net(&f->exts))
 			call_rcu(&f->rcu, basic_delete_filter);
 		else
@@ -137,7 +137,7 @@ static int basic_delete(struct tcf_proto *tp, void *arg, bool *last)
 
 	list_del_rcu(&f->link);
 	tcf_unbind_filter(tp, &f->res);
-	idr_remove_ext(&head->handle_idr, f->handle);
+	idr_remove(&head->handle_idr, f->handle);
 	tcf_exts_get_net(&f->exts);
 	call_rcu(&f->rcu, basic_delete_filter);
 	*last = list_empty(&head->flist);
@@ -182,7 +182,6 @@ static int basic_change(struct net *net, struct sk_buff *in_skb,
 	struct nlattr *tb[TCA_BASIC_MAX + 1];
 	struct basic_filter *fold = (struct basic_filter *) *arg;
 	struct basic_filter *fnew;
-	unsigned long idr_index;
 
 	if (tca[TCA_OPTIONS] == NULL)
 		return -EINVAL;
@@ -208,30 +207,29 @@ static int basic_change(struct net *net, struct sk_buff *in_skb,
 	if (handle) {
 		fnew->handle = handle;
 		if (!fold) {
-			err = idr_alloc_ext(&head->handle_idr, fnew, &idr_index,
-					    handle, handle + 1, GFP_KERNEL);
+			err = idr_alloc_u32(&head->handle_idr, fnew, &handle,
+						handle, GFP_KERNEL);
 			if (err)
 				goto errout;
 		}
 	} else {
-		err = idr_alloc_ext(&head->handle_idr, fnew, &idr_index,
-				    1, 0x7FFFFFFF, GFP_KERNEL);
-		if (err)
+		err = idr_alloc(&head->handle_idr, fnew, 1, 0, GFP_KERNEL);
+		if (err < 0)
 			goto errout;
-		fnew->handle = idr_index;
+		fnew->handle = err;
 	}
 
 	err = basic_set_parms(net, tp, fnew, base, tb, tca[TCA_RATE], ovr);
 	if (err < 0) {
 		if (!fold)
-			idr_remove_ext(&head->handle_idr, fnew->handle);
+			idr_remove(&head->handle_idr, fnew->handle);
 		goto errout;
 	}
 
 	*arg = fnew;
 
 	if (fold) {
-		idr_replace_ext(&head->handle_idr, fnew, fnew->handle);
+		idr_replace(&head->handle_idr, fnew, fnew->handle);
 		list_replace_rcu(&fold->link, &fnew->link);
 		tcf_unbind_filter(tp, &fold->res);
 		tcf_exts_get_net(&fold->exts);
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index fb680dafac5a..b575d41543df 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -294,7 +294,7 @@ static void __cls_bpf_delete(struct tcf_proto *tp, struct cls_bpf_prog *prog)
 {
 	struct cls_bpf_head *head = rtnl_dereference(tp->root);
 
-	idr_remove_ext(&head->handle_idr, prog->handle);
+	idr_remove(&head->handle_idr, prog->handle);
 	cls_bpf_stop_offload(tp, prog);
 	list_del_rcu(&prog->link);
 	tcf_unbind_filter(tp, &prog->res);
@@ -469,7 +469,6 @@ static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
 	struct cls_bpf_prog *oldprog = *arg;
 	struct nlattr *tb[TCA_BPF_MAX + 1];
 	struct cls_bpf_prog *prog;
-	unsigned long idr_index;
 	int ret;
 
 	if (tca[TCA_OPTIONS] == NULL)
@@ -496,15 +495,14 @@ static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
 	}
 
 	if (handle == 0) {
-		ret = idr_alloc_ext(&head->handle_idr, prog, &idr_index,
-				    1, 0x7FFFFFFF, GFP_KERNEL);
-		if (ret)
+		ret = idr_alloc(&head->handle_idr, prog, 1, 0, GFP_KERNEL);
+		if (ret < 0)
 			goto errout;
-		prog->handle = idr_index;
+		prog->handle = ret;
 	} else {
 		if (!oldprog) {
-			ret = idr_alloc_ext(&head->handle_idr, prog, &idr_index,
-					    handle, handle + 1, GFP_KERNEL);
+			ret = idr_alloc_u32(&head->handle_idr, prog, &handle,
+						handle, GFP_KERNEL);
 			if (ret)
 				goto errout;
 		}
@@ -518,7 +516,7 @@ static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
 	ret = cls_bpf_offload(tp, prog, oldprog);
 	if (ret) {
 		if (!oldprog)
-			idr_remove_ext(&head->handle_idr, prog->handle);
+			idr_remove(&head->handle_idr, prog->handle);
 		__cls_bpf_delete_prog(prog);
 		return ret;
 	}
@@ -527,7 +525,7 @@ static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
 		prog->gen_flags |= TCA_CLS_FLAGS_NOT_IN_HW;
 
 	if (oldprog) {
-		idr_replace_ext(&head->handle_idr, prog, handle);
+		idr_replace(&head->handle_idr, prog, handle);
 		list_replace_rcu(&oldprog->link, &prog->link);
 		tcf_unbind_filter(tp, &oldprog->res);
 		tcf_exts_get_net(&oldprog->exts);
@@ -541,7 +539,7 @@ static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
 
 errout_idr:
 	if (!oldprog)
-		idr_remove_ext(&head->handle_idr, prog->handle);
+		idr_remove(&head->handle_idr, prog->handle);
 errout:
 	tcf_exts_destroy(&prog->exts);
 	kfree(prog);
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 543a3e875d05..0867fa7bef5b 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -283,7 +283,7 @@ static void __fl_delete(struct tcf_proto *tp, struct cls_fl_filter *f)
 {
 	struct cls_fl_head *head = rtnl_dereference(tp->root);
 
-	idr_remove_ext(&head->handle_idr, f->handle);
+	idr_remove(&head->handle_idr, f->handle);
 	list_del_rcu(&f->list);
 	if (!tc_skip_hw(f->flags))
 		fl_hw_destroy_filter(tp, f);
@@ -329,7 +329,7 @@ static void *fl_get(struct tcf_proto *tp, u32 handle)
 {
 	struct cls_fl_head *head = rtnl_dereference(tp->root);
 
-	return idr_find_ext(&head->handle_idr, handle);
+	return idr_find(&head->handle_idr, handle);
 }
 
 static const struct nla_policy fl_policy[TCA_FLOWER_MAX + 1] = {
@@ -858,7 +858,6 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 	struct cls_fl_filter *fnew;
 	struct nlattr **tb;
 	struct fl_flow_mask mask = {};
-	unsigned long idr_index;
 	int err;
 
 	if (!tca[TCA_OPTIONS])
@@ -889,21 +888,15 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 		goto errout;
 
 	if (!handle) {
-		err = idr_alloc_ext(&head->handle_idr, fnew, &idr_index,
-				    1, 0x80000000, GFP_KERNEL);
-		if (err)
-			goto errout;
-		fnew->handle = idr_index;
-	}
-
-	/* user specifies a handle and it doesn't exist */
-	if (handle && !fold) {
-		err = idr_alloc_ext(&head->handle_idr, fnew, &idr_index,
-				    handle, handle + 1, GFP_KERNEL);
-		if (err)
-			goto errout;
-		fnew->handle = idr_index;
+		err = idr_alloc(&head->handle_idr, fnew, 1, 0, GFP_KERNEL);
+	} else if (!fold) {
+		/* user specifies a handle and it doesn't exist */
+		err = idr_alloc_u32(&head->handle_idr, fnew, &handle,
+				    handle, GFP_KERNEL);
 	}
+	if (err < 0)
+		goto errout;
+	fnew->handle = handle;
 
 	if (tb[TCA_FLOWER_FLAGS]) {
 		fnew->flags = nla_get_u32(tb[TCA_FLOWER_FLAGS]);
@@ -957,8 +950,7 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 	*arg = fnew;
 
 	if (fold) {
-		fnew->handle = handle;
-		idr_replace_ext(&head->handle_idr, fnew, fnew->handle);
+		idr_replace(&head->handle_idr, fnew, fnew->handle);
 		list_replace_rcu(&fold->list, &fnew->list);
 		tcf_unbind_filter(tp, &fold->res);
 		tcf_exts_get_net(&fold->exts);
@@ -972,7 +964,7 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 
 errout_idr:
 	if (fnew->handle)
-		idr_remove_ext(&head->handle_idr, fnew->handle);
+		idr_remove(&head->handle_idr, fnew->handle);
 errout:
 	tcf_exts_destroy(&fnew->exts);
 	kfree(fnew);
diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index ac152b4f4247..9260f7d14ab9 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -316,19 +316,16 @@ static void *u32_get(struct tcf_proto *tp, u32 handle)
 	return u32_lookup_key(ht, handle);
 }
 
+/*
+ * This is only used inside rtnl lock it is safe to increment
+ * without read _copy_ update semantics
+ */
 static u32 gen_new_htid(struct tc_u_common *tp_c, struct tc_u_hnode *ptr)
 {
-	unsigned long idr_index;
-	int err;
-
-	/* This is only used inside rtnl lock it is safe to increment
-	 * without read _copy_ update semantics
-	 */
-	err = idr_alloc_ext(&tp_c->handle_idr, ptr, &idr_index,
-			    1, 0x7FF, GFP_KERNEL);
-	if (err)
+	int id = idr_alloc(&tp_c->handle_idr, ptr, 1, 0x7FF, GFP_KERNEL);
+	if (id < 0)
 		return 0;
-	return (u32)(idr_index | 0x800) << 20;
+	return ((u32)id | 0x800) << 20;
 }
 
 static struct hlist_head *tc_u_common_hash;
@@ -591,7 +588,7 @@ static void u32_clear_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht)
 					 rtnl_dereference(n->next));
 			tcf_unbind_filter(tp, &n->res);
 			u32_remove_hw_knode(tp, n->handle);
-			idr_remove_ext(&ht->handle_idr, n->handle);
+			idr_remove(&ht->handle_idr, n->handle);
 			if (tcf_exts_get_net(&n->exts))
 				call_rcu(&n->rcu, u32_delete_key_freepf_rcu);
 			else
@@ -617,7 +614,7 @@ static int u32_destroy_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht)
 		if (phn == ht) {
 			u32_clear_hw_hnode(tp, ht);
 			idr_destroy(&ht->handle_idr);
-			idr_remove_ext(&tp_c->handle_idr, ht->handle);
+			idr_remove(&tp_c->handle_idr, ht->handle);
 			RCU_INIT_POINTER(*hn, ht->next);
 			kfree_rcu(ht, rcu);
 			return 0;
@@ -736,15 +733,15 @@ static int u32_delete(struct tcf_proto *tp, void *arg, bool *last)
 
 static u32 gen_new_kid(struct tc_u_hnode *ht, u32 htid)
 {
-	unsigned long idr_index;
 	u32 start = htid | 0x800;
 	u32 max = htid | 0xFFF;
 	u32 min = htid;
+	unsigned long idr_index = start;
 
-	if (idr_alloc_ext(&ht->handle_idr, NULL, &idr_index,
-			  start, max + 1, GFP_KERNEL)) {
-		if (idr_alloc_ext(&ht->handle_idr, NULL, &idr_index,
-				  min + 1, max + 1, GFP_KERNEL))
+	if (idr_alloc_ul(&ht->handle_idr, NULL, &idr_index, max, GFP_KERNEL)) {
+		idr_index = min + 1;
+		if (idr_alloc_ul(&ht->handle_idr, NULL, &idr_index, max,
+					GFP_KERNEL))
 			return max;
 	}
 
@@ -833,7 +830,7 @@ static void u32_replace_knode(struct tcf_proto *tp, struct tc_u_common *tp_c,
 		if (pins->handle == n->handle)
 			break;
 
-	idr_replace_ext(&ht->handle_idr, n, n->handle);
+	idr_replace(&ht->handle_idr, n, n->handle);
 	RCU_INIT_POINTER(n->next, pins->next);
 	rcu_assign_pointer(*ins, n);
 }
@@ -976,8 +973,8 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
 				return -ENOMEM;
 			}
 		} else {
-			err = idr_alloc_ext(&tp_c->handle_idr, ht, NULL,
-					    handle, handle + 1, GFP_KERNEL);
+			err = idr_alloc_u32(&tp_c->handle_idr, ht, &handle,
+						handle, GFP_KERNEL);
 			if (err) {
 				kfree(ht);
 				return err;
@@ -992,7 +989,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
 
 		err = u32_replace_hw_hnode(tp, ht, flags);
 		if (err) {
-			idr_remove_ext(&tp_c->handle_idr, handle);
+			idr_remove(&tp_c->handle_idr, handle);
 			kfree(ht);
 			return err;
 		}
@@ -1026,8 +1023,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
 		if (TC_U32_HTID(handle) && TC_U32_HTID(handle^htid))
 			return -EINVAL;
 		handle = htid | TC_U32_NODE(handle);
-		err = idr_alloc_ext(&ht->handle_idr, NULL, NULL,
-				    handle, handle + 1,
+		err = idr_alloc_u32(&ht->handle_idr, NULL, &handle, handle,
 				    GFP_KERNEL);
 		if (err)
 			return err;
@@ -1120,7 +1116,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
 #endif
 	kfree(n);
 erridr:
-	idr_remove_ext(&ht->handle_idr, handle);
+	idr_remove(&ht->handle_idr, handle);
 	return err;
 }
 
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 08/62] Explicitly include radix-tree.h
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:06   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:06 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

These three files use the radix tree, but rely on an implicit include
of radix-tree.h through fs.h.
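
(Illustration only, not taken from any of the three files: the point is
that a header which embeds a radix tree should include the definition
itself instead of relying on whatever fs.h happens to pull in.)

	#include <linux/radix-tree.h>	/* defines struct radix_tree_root */

	struct foo_cache {		/* hypothetical structure */
		struct radix_tree_root	entries;
	};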

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 drivers/usb/host/xhci.h | 1 +
 include/linux/fscache.h | 1 +
 lib/dma-debug.c         | 1 +
 3 files changed, 3 insertions(+)

diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index 99a014a920d3..054ce74524af 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -15,6 +15,7 @@
 #include <linux/usb.h>
 #include <linux/timer.h>
 #include <linux/kernel.h>
+#include <linux/radix-tree.h>
 #include <linux/usb/hcd.h>
 #include <linux/io-64-nonatomic-lo-hi.h>
 
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index f4ff47d4a893..6a2f631a913f 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -22,6 +22,7 @@
 #include <linux/list.h>
 #include <linux/pagemap.h>
 #include <linux/pagevec.h>
+#include <linux/radix-tree.h>
 
 #if defined(CONFIG_FSCACHE) || defined(CONFIG_FSCACHE_MODULE)
 #define fscache_available() (1)
diff --git a/lib/dma-debug.c b/lib/dma-debug.c
index 1b34d210452c..fb4af570ce04 100644
--- a/lib/dma-debug.c
+++ b/lib/dma-debug.c
@@ -22,6 +22,7 @@
 #include <linux/dma-mapping.h>
 #include <linux/sched/task.h>
 #include <linux/stacktrace.h>
+#include <linux/radix-tree.h>
 #include <linux/dma-debug.h>
 #include <linux/spinlock.h>
 #include <linux/vmalloc.h>
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 09/62] arm64: Turn flush_dcache_mmap_lock into a no-op
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:06   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:06 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

ARM64 doesn't walk the VMA tree in its flush_dcache_page()
implementation, so has no need to take the tree_lock.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/cacheflush.h | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h
index 76d1cc85d5b1..568f83533f12 100644
--- a/arch/arm64/include/asm/cacheflush.h
+++ b/arch/arm64/include/asm/cacheflush.h
@@ -130,10 +130,8 @@ static inline void __flush_icache_all(void)
 	dsb(ish);
 }
 
-#define flush_dcache_mmap_lock(mapping) \
-	spin_lock_irq(&(mapping)->tree_lock)
-#define flush_dcache_mmap_unlock(mapping) \
-	spin_unlock_irq(&(mapping)->tree_lock)
+#define flush_dcache_mmap_lock(mapping)		do { } while (0)
+#define flush_dcache_mmap_unlock(mapping)	do { } while (0)
 
 /*
  * We don't appear to need to do anything here.  In fact, if we did, we'd
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 10/62] unicore32: Turn flush_dcache_mmap_lock into a no-op
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:06   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:06 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

Unicore doesn't walk the VMA tree in its flush_dcache_page()
implementation, so has no need to take the tree_lock.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 arch/unicore32/include/asm/cacheflush.h | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/unicore32/include/asm/cacheflush.h b/arch/unicore32/include/asm/cacheflush.h
index a5e08e2d5d6d..1d9132b66039 100644
--- a/arch/unicore32/include/asm/cacheflush.h
+++ b/arch/unicore32/include/asm/cacheflush.h
@@ -170,10 +170,8 @@ extern void flush_cache_page(struct vm_area_struct *vma,
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
 extern void flush_dcache_page(struct page *);
 
-#define flush_dcache_mmap_lock(mapping)			\
-	spin_lock_irq(&(mapping)->tree_lock)
-#define flush_dcache_mmap_unlock(mapping)		\
-	spin_unlock_irq(&(mapping)->tree_lock)
+#define flush_dcache_mmap_lock(mapping)		do { } while (0)
+#define flush_dcache_mmap_unlock(mapping)	do { } while (0)
 
 #define flush_icache_user_range(vma, page, addr, len)	\
 	flush_dcache_page(page)
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 11/62] Export __set_page_dirty
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:06   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:06 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

XFS currently contains a copy-and-paste of __set_page_dirty().  Export
it from buffer.c instead.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
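[Usage sketch, illustrative only -- it mirrors the XFS hunk below rather than
adding anything new.  The exported helper keeps the contract of the old static
one: the caller holds lock_page_memcg(), and warn=1 asks for the WARN_ON_ONCE
on a !PageUptodate page.]

/* Illustrative only: a filesystem marking a buffered page dirty
 * with the newly exported helper.
 */
static void example_set_page_dirty(struct page *page,
				   struct address_space *mapping)
{
	bool newly_dirty;

	lock_page_memcg(page);
	newly_dirty = !TestSetPageDirty(page);
	if (newly_dirty)
		__set_page_dirty(page, mapping, 1);	/* 1 => warn on !Uptodate */
	unlock_page_memcg(page);
	if (newly_dirty)
		__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
}
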
 fs/buffer.c        |  3 ++-
 fs/xfs/xfs_aops.c  | 15 ++-------------
 include/linux/mm.h |  1 +
 3 files changed, 5 insertions(+), 14 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 0736a6a2e2f0..fd894c2ae284 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -601,7 +601,7 @@ EXPORT_SYMBOL(mark_buffer_dirty_inode);
  *
  * The caller must hold lock_page_memcg().
  */
-static void __set_page_dirty(struct page *page, struct address_space *mapping,
+void __set_page_dirty(struct page *page, struct address_space *mapping,
 			     int warn)
 {
 	unsigned long flags;
@@ -615,6 +615,7 @@ static void __set_page_dirty(struct page *page, struct address_space *mapping,
 	}
 	spin_unlock_irqrestore(&mapping->tree_lock, flags);
 }
+EXPORT_SYMBOL_GPL(__set_page_dirty);
 
 /*
  * Add a page to the dirty page list.
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index a3eeaba156c5..48b17d522fa1 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -1459,19 +1459,8 @@ xfs_vm_set_page_dirty(
 	newly_dirty = !TestSetPageDirty(page);
 	spin_unlock(&mapping->private_lock);
 
-	if (newly_dirty) {
-		/* sigh - __set_page_dirty() is static, so copy it here, too */
-		unsigned long flags;
-
-		spin_lock_irqsave(&mapping->tree_lock, flags);
-		if (page->mapping) {	/* Race with truncate? */
-			WARN_ON_ONCE(!PageUptodate(page));
-			account_page_dirtied(page, mapping);
-			radix_tree_tag_set(&mapping->page_tree,
-					page_index(page), PAGECACHE_TAG_DIRTY);
-		}
-		spin_unlock_irqrestore(&mapping->tree_lock, flags);
-	}
+	if (newly_dirty)
+		__set_page_dirty(page, mapping, 1);
 	unlock_page_memcg(page);
 	if (newly_dirty)
 		__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index ee073146aaa7..c692451245be 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1434,6 +1434,7 @@ extern int try_to_release_page(struct page * page, gfp_t gfp_mask);
 extern void do_invalidatepage(struct page *page, unsigned int offset,
 			      unsigned int length);
 
+void __set_page_dirty(struct page *, struct address_space *, int warn);
 int __set_page_dirty_nobuffers(struct page *page);
 int __set_page_dirty_no_writeback(struct page *page);
 int redirty_page_for_writepage(struct writeback_control *wbc,
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 12/62] xfs: Rename xa_ elements to ail_
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:06   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:06 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

This is a simple rename, except that xa_ail becomes ail_head.  Moving XFS's
AIL fields off the xa_ prefix avoids confusion with the xa_ prefix the rest
of this series introduces for the XArray.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 fs/xfs/xfs_buf_item.c    |  10 ++--
 fs/xfs/xfs_dquot.c       |   4 +-
 fs/xfs/xfs_dquot_item.c  |  11 ++--
 fs/xfs/xfs_inode_item.c  |  22 +++----
 fs/xfs/xfs_log.c         |   6 +-
 fs/xfs/xfs_log_recover.c |  80 ++++++++++++-------------
 fs/xfs/xfs_trans.c       |  18 +++---
 fs/xfs/xfs_trans_ail.c   | 152 +++++++++++++++++++++++------------------------
 fs/xfs/xfs_trans_buf.c   |   4 +-
 fs/xfs/xfs_trans_priv.h  |  42 ++++++-------
 10 files changed, 175 insertions(+), 174 deletions(-)

diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index e0a0af0946f2..6c5035544a93 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -459,7 +459,7 @@ xfs_buf_item_unpin(
 			bp->b_fspriv = NULL;
 			bp->b_iodone = NULL;
 		} else {
-			spin_lock(&ailp->xa_lock);
+			spin_lock(&ailp->ail_lock);
 			xfs_trans_ail_delete(ailp, lip, SHUTDOWN_LOG_IO_ERROR);
 			xfs_buf_item_relse(bp);
 			ASSERT(bp->b_fspriv == NULL);
@@ -1056,13 +1056,13 @@ xfs_buf_do_callbacks_fail(
 	struct xfs_log_item	*lip = bp->b_fspriv;
 	struct xfs_ail		*ailp = lip->li_ailp;
 
-	spin_lock(&ailp->xa_lock);
+	spin_lock(&ailp->ail_lock);
 	for (; lip; lip = next) {
 		next = lip->li_bio_list;
 		if (lip->li_ops->iop_error)
 			lip->li_ops->iop_error(lip, bp);
 	}
-	spin_unlock(&ailp->xa_lock);
+	spin_unlock(&ailp->ail_lock);
 }
 
 static bool
@@ -1215,7 +1215,7 @@ xfs_buf_iodone(
 	 *
 	 * Either way, AIL is useless if we're forcing a shutdown.
 	 */
-	spin_lock(&ailp->xa_lock);
+	spin_lock(&ailp->ail_lock);
 	xfs_trans_ail_delete(ailp, lip, SHUTDOWN_CORRUPT_INCORE);
 	xfs_buf_item_free(BUF_ITEM(lip));
 }
@@ -1236,7 +1236,7 @@ xfs_buf_resubmit_failed_buffers(
 	/*
 	 * Clear XFS_LI_FAILED flag from all items before resubmit
 	 *
-	 * XFS_LI_FAILED set/clear is protected by xa_lock, caller  this
+	 * XFS_LI_FAILED set/clear is protected by ail_lock, caller  this
 	 * function already have it acquired
 	 */
 	for (; lip; lip = next) {
diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index d57c2db64e59..fbf37b326050 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -973,11 +973,11 @@ xfs_qm_dqflush_done(
 	    lip->li_lsn == qip->qli_flush_lsn) {
 
 		/* xfs_trans_ail_delete() drops the AIL lock. */
-		spin_lock(&ailp->xa_lock);
+		spin_lock(&ailp->ail_lock);
 		if (lip->li_lsn == qip->qli_flush_lsn)
 			xfs_trans_ail_delete(ailp, lip, SHUTDOWN_CORRUPT_INCORE);
 		else
-			spin_unlock(&ailp->xa_lock);
+			spin_unlock(&ailp->ail_lock);
 	}
 
 	/*
diff --git a/fs/xfs/xfs_dquot_item.c b/fs/xfs/xfs_dquot_item.c
index 2c7a1629e064..14951799d920 100644
--- a/fs/xfs/xfs_dquot_item.c
+++ b/fs/xfs/xfs_dquot_item.c
@@ -140,8 +140,9 @@ xfs_qm_dqunpin_wait(
 STATIC uint
 xfs_qm_dquot_logitem_push(
 	struct xfs_log_item	*lip,
-	struct list_head	*buffer_list) __releases(&lip->li_ailp->xa_lock)
-					      __acquires(&lip->li_ailp->xa_lock)
+	struct list_head	*buffer_list)
+		__releases(&lip->li_ailp->ail_lock)
+		__acquires(&lip->li_ailp->ail_lock)
 {
 	struct xfs_dquot	*dqp = DQUOT_ITEM(lip)->qli_dquot;
 	struct xfs_buf		*bp = NULL;
@@ -173,7 +174,7 @@ xfs_qm_dquot_logitem_push(
 		goto out_unlock;
 	}
 
-	spin_unlock(&lip->li_ailp->xa_lock);
+	spin_unlock(&lip->li_ailp->ail_lock);
 
 	error = xfs_qm_dqflush(dqp, &bp);
 	if (error) {
@@ -185,7 +186,7 @@ xfs_qm_dquot_logitem_push(
 		xfs_buf_relse(bp);
 	}
 
-	spin_lock(&lip->li_ailp->xa_lock);
+	spin_lock(&lip->li_ailp->ail_lock);
 out_unlock:
 	xfs_dqunlock(dqp);
 	return rval;
@@ -367,7 +368,7 @@ xfs_qm_qoffend_logitem_committed(
 	 * Delete the qoff-start logitem from the AIL.
 	 * xfs_trans_ail_delete() drops the AIL lock.
 	 */
-	spin_lock(&ailp->xa_lock);
+	spin_lock(&ailp->ail_lock);
 	xfs_trans_ail_delete(ailp, &qfs->qql_item, SHUTDOWN_LOG_IO_ERROR);
 
 	kmem_free(qfs->qql_item.li_lv_shadow);
diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index 6ee5c3bf19ad..071acd4249a0 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -501,8 +501,8 @@ STATIC uint
 xfs_inode_item_push(
 	struct xfs_log_item	*lip,
 	struct list_head	*buffer_list)
-		__releases(&lip->li_ailp->xa_lock)
-		__acquires(&lip->li_ailp->xa_lock)
+		__releases(&lip->li_ailp->ail_lock)
+		__acquires(&lip->li_ailp->ail_lock)
 {
 	struct xfs_inode_log_item *iip = INODE_ITEM(lip);
 	struct xfs_inode	*ip = iip->ili_inode;
@@ -561,7 +561,7 @@ xfs_inode_item_push(
 	ASSERT(iip->ili_fields != 0 || XFS_FORCED_SHUTDOWN(ip->i_mount));
 	ASSERT(iip->ili_logged == 0 || XFS_FORCED_SHUTDOWN(ip->i_mount));
 
-	spin_unlock(&lip->li_ailp->xa_lock);
+	spin_unlock(&lip->li_ailp->ail_lock);
 
 	error = xfs_iflush(ip, &bp);
 	if (!error) {
@@ -570,7 +570,7 @@ xfs_inode_item_push(
 		xfs_buf_relse(bp);
 	}
 
-	spin_lock(&lip->li_ailp->xa_lock);
+	spin_lock(&lip->li_ailp->ail_lock);
 out_unlock:
 	xfs_iunlock(ip, XFS_ILOCK_SHARED);
 	return rval;
@@ -774,7 +774,7 @@ xfs_iflush_done(
 		bool			mlip_changed = false;
 
 		/* this is an opencoded batch version of xfs_trans_ail_delete */
-		spin_lock(&ailp->xa_lock);
+		spin_lock(&ailp->ail_lock);
 		for (blip = lip; blip; blip = blip->li_bio_list) {
 			if (INODE_ITEM(blip)->ili_logged &&
 			    blip->li_lsn == INODE_ITEM(blip)->ili_flush_lsn)
@@ -785,15 +785,15 @@ xfs_iflush_done(
 		}
 
 		if (mlip_changed) {
-			if (!XFS_FORCED_SHUTDOWN(ailp->xa_mount))
-				xlog_assign_tail_lsn_locked(ailp->xa_mount);
-			if (list_empty(&ailp->xa_ail))
-				wake_up_all(&ailp->xa_empty);
+			if (!XFS_FORCED_SHUTDOWN(ailp->ail_mount))
+				xlog_assign_tail_lsn_locked(ailp->ail_mount);
+			if (list_empty(&ailp->ail_head))
+				wake_up_all(&ailp->ail_empty);
 		}
-		spin_unlock(&ailp->xa_lock);
+		spin_unlock(&ailp->ail_lock);
 
 		if (mlip_changed)
-			xfs_log_space_wake(ailp->xa_mount);
+			xfs_log_space_wake(ailp->ail_mount);
 	}
 
 	/*
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 38d4227895ae..c3f549e6ad23 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -1148,7 +1148,7 @@ xlog_assign_tail_lsn_locked(
 	struct xfs_log_item	*lip;
 	xfs_lsn_t		tail_lsn;
 
-	assert_spin_locked(&mp->m_ail->xa_lock);
+	assert_spin_locked(&mp->m_ail->ail_lock);
 
 	/*
 	 * To make sure we always have a valid LSN for the log tail we keep
@@ -1171,9 +1171,9 @@ xlog_assign_tail_lsn(
 {
 	xfs_lsn_t		tail_lsn;
 
-	spin_lock(&mp->m_ail->xa_lock);
+	spin_lock(&mp->m_ail->ail_lock);
 	tail_lsn = xlog_assign_tail_lsn_locked(mp);
-	spin_unlock(&mp->m_ail->xa_lock);
+	spin_unlock(&mp->m_ail->ail_lock);
 
 	return tail_lsn;
 }
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 87b1c331f9eb..0183355cdeaf 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -3423,7 +3423,7 @@ xlog_recover_efi_pass2(
 	}
 	atomic_set(&efip->efi_next_extent, efi_formatp->efi_nextents);
 
-	spin_lock(&log->l_ailp->xa_lock);
+	spin_lock(&log->l_ailp->ail_lock);
 	/*
 	 * The EFI has two references. One for the EFD and one for EFI to ensure
 	 * it makes it into the AIL. Insert the EFI into the AIL directly and
@@ -3466,7 +3466,7 @@ xlog_recover_efd_pass2(
 	 * Search for the EFI with the id in the EFD format structure in the
 	 * AIL.
 	 */
-	spin_lock(&ailp->xa_lock);
+	spin_lock(&ailp->ail_lock);
 	lip = xfs_trans_ail_cursor_first(ailp, &cur, 0);
 	while (lip != NULL) {
 		if (lip->li_type == XFS_LI_EFI) {
@@ -3476,9 +3476,9 @@ xlog_recover_efd_pass2(
 				 * Drop the EFD reference to the EFI. This
 				 * removes the EFI from the AIL and frees it.
 				 */
-				spin_unlock(&ailp->xa_lock);
+				spin_unlock(&ailp->ail_lock);
 				xfs_efi_release(efip);
-				spin_lock(&ailp->xa_lock);
+				spin_lock(&ailp->ail_lock);
 				break;
 			}
 		}
@@ -3486,7 +3486,7 @@ xlog_recover_efd_pass2(
 	}
 
 	xfs_trans_ail_cursor_done(&cur);
-	spin_unlock(&ailp->xa_lock);
+	spin_unlock(&ailp->ail_lock);
 
 	return 0;
 }
@@ -3519,7 +3519,7 @@ xlog_recover_rui_pass2(
 	}
 	atomic_set(&ruip->rui_next_extent, rui_formatp->rui_nextents);
 
-	spin_lock(&log->l_ailp->xa_lock);
+	spin_lock(&log->l_ailp->ail_lock);
 	/*
 	 * The RUI has two references. One for the RUD and one for RUI to ensure
 	 * it makes it into the AIL. Insert the RUI into the AIL directly and
@@ -3559,7 +3559,7 @@ xlog_recover_rud_pass2(
 	 * Search for the RUI with the id in the RUD format structure in the
 	 * AIL.
 	 */
-	spin_lock(&ailp->xa_lock);
+	spin_lock(&ailp->ail_lock);
 	lip = xfs_trans_ail_cursor_first(ailp, &cur, 0);
 	while (lip != NULL) {
 		if (lip->li_type == XFS_LI_RUI) {
@@ -3569,9 +3569,9 @@ xlog_recover_rud_pass2(
 				 * Drop the RUD reference to the RUI. This
 				 * removes the RUI from the AIL and frees it.
 				 */
-				spin_unlock(&ailp->xa_lock);
+				spin_unlock(&ailp->ail_lock);
 				xfs_rui_release(ruip);
-				spin_lock(&ailp->xa_lock);
+				spin_lock(&ailp->ail_lock);
 				break;
 			}
 		}
@@ -3579,7 +3579,7 @@ xlog_recover_rud_pass2(
 	}
 
 	xfs_trans_ail_cursor_done(&cur);
-	spin_unlock(&ailp->xa_lock);
+	spin_unlock(&ailp->ail_lock);
 
 	return 0;
 }
@@ -3635,7 +3635,7 @@ xlog_recover_cui_pass2(
 	}
 	atomic_set(&cuip->cui_next_extent, cui_formatp->cui_nextents);
 
-	spin_lock(&log->l_ailp->xa_lock);
+	spin_lock(&log->l_ailp->ail_lock);
 	/*
 	 * The CUI has two references. One for the CUD and one for CUI to ensure
 	 * it makes it into the AIL. Insert the CUI into the AIL directly and
@@ -3676,7 +3676,7 @@ xlog_recover_cud_pass2(
 	 * Search for the CUI with the id in the CUD format structure in the
 	 * AIL.
 	 */
-	spin_lock(&ailp->xa_lock);
+	spin_lock(&ailp->ail_lock);
 	lip = xfs_trans_ail_cursor_first(ailp, &cur, 0);
 	while (lip != NULL) {
 		if (lip->li_type == XFS_LI_CUI) {
@@ -3686,9 +3686,9 @@ xlog_recover_cud_pass2(
 				 * Drop the CUD reference to the CUI. This
 				 * removes the CUI from the AIL and frees it.
 				 */
-				spin_unlock(&ailp->xa_lock);
+				spin_unlock(&ailp->ail_lock);
 				xfs_cui_release(cuip);
-				spin_lock(&ailp->xa_lock);
+				spin_lock(&ailp->ail_lock);
 				break;
 			}
 		}
@@ -3696,7 +3696,7 @@ xlog_recover_cud_pass2(
 	}
 
 	xfs_trans_ail_cursor_done(&cur);
-	spin_unlock(&ailp->xa_lock);
+	spin_unlock(&ailp->ail_lock);
 
 	return 0;
 }
@@ -3754,7 +3754,7 @@ xlog_recover_bui_pass2(
 	}
 	atomic_set(&buip->bui_next_extent, bui_formatp->bui_nextents);
 
-	spin_lock(&log->l_ailp->xa_lock);
+	spin_lock(&log->l_ailp->ail_lock);
 	/*
 	 * The RUI has two references. One for the RUD and one for RUI to ensure
 	 * it makes it into the AIL. Insert the RUI into the AIL directly and
@@ -3795,7 +3795,7 @@ xlog_recover_bud_pass2(
 	 * Search for the BUI with the id in the BUD format structure in the
 	 * AIL.
 	 */
-	spin_lock(&ailp->xa_lock);
+	spin_lock(&ailp->ail_lock);
 	lip = xfs_trans_ail_cursor_first(ailp, &cur, 0);
 	while (lip != NULL) {
 		if (lip->li_type == XFS_LI_BUI) {
@@ -3805,9 +3805,9 @@ xlog_recover_bud_pass2(
 				 * Drop the BUD reference to the BUI. This
 				 * removes the BUI from the AIL and frees it.
 				 */
-				spin_unlock(&ailp->xa_lock);
+				spin_unlock(&ailp->ail_lock);
 				xfs_bui_release(buip);
-				spin_lock(&ailp->xa_lock);
+				spin_lock(&ailp->ail_lock);
 				break;
 			}
 		}
@@ -3815,7 +3815,7 @@ xlog_recover_bud_pass2(
 	}
 
 	xfs_trans_ail_cursor_done(&cur);
-	spin_unlock(&ailp->xa_lock);
+	spin_unlock(&ailp->ail_lock);
 
 	return 0;
 }
@@ -4648,9 +4648,9 @@ xlog_recover_process_efi(
 	if (test_bit(XFS_EFI_RECOVERED, &efip->efi_flags))
 		return 0;
 
-	spin_unlock(&ailp->xa_lock);
+	spin_unlock(&ailp->ail_lock);
 	error = xfs_efi_recover(mp, efip);
-	spin_lock(&ailp->xa_lock);
+	spin_lock(&ailp->ail_lock);
 
 	return error;
 }
@@ -4666,9 +4666,9 @@ xlog_recover_cancel_efi(
 
 	efip = container_of(lip, struct xfs_efi_log_item, efi_item);
 
-	spin_unlock(&ailp->xa_lock);
+	spin_unlock(&ailp->ail_lock);
 	xfs_efi_release(efip);
-	spin_lock(&ailp->xa_lock);
+	spin_lock(&ailp->ail_lock);
 }
 
 /* Recover the RUI if necessary. */
@@ -4688,9 +4688,9 @@ xlog_recover_process_rui(
 	if (test_bit(XFS_RUI_RECOVERED, &ruip->rui_flags))
 		return 0;
 
-	spin_unlock(&ailp->xa_lock);
+	spin_unlock(&ailp->ail_lock);
 	error = xfs_rui_recover(mp, ruip);
-	spin_lock(&ailp->xa_lock);
+	spin_lock(&ailp->ail_lock);
 
 	return error;
 }
@@ -4706,9 +4706,9 @@ xlog_recover_cancel_rui(
 
 	ruip = container_of(lip, struct xfs_rui_log_item, rui_item);
 
-	spin_unlock(&ailp->xa_lock);
+	spin_unlock(&ailp->ail_lock);
 	xfs_rui_release(ruip);
-	spin_lock(&ailp->xa_lock);
+	spin_lock(&ailp->ail_lock);
 }
 
 /* Recover the CUI if necessary. */
@@ -4728,9 +4728,9 @@ xlog_recover_process_cui(
 	if (test_bit(XFS_CUI_RECOVERED, &cuip->cui_flags))
 		return 0;
 
-	spin_unlock(&ailp->xa_lock);
+	spin_unlock(&ailp->ail_lock);
 	error = xfs_cui_recover(mp, cuip);
-	spin_lock(&ailp->xa_lock);
+	spin_lock(&ailp->ail_lock);
 
 	return error;
 }
@@ -4746,9 +4746,9 @@ xlog_recover_cancel_cui(
 
 	cuip = container_of(lip, struct xfs_cui_log_item, cui_item);
 
-	spin_unlock(&ailp->xa_lock);
+	spin_unlock(&ailp->ail_lock);
 	xfs_cui_release(cuip);
-	spin_lock(&ailp->xa_lock);
+	spin_lock(&ailp->ail_lock);
 }
 
 /* Recover the BUI if necessary. */
@@ -4768,9 +4768,9 @@ xlog_recover_process_bui(
 	if (test_bit(XFS_BUI_RECOVERED, &buip->bui_flags))
 		return 0;
 
-	spin_unlock(&ailp->xa_lock);
+	spin_unlock(&ailp->ail_lock);
 	error = xfs_bui_recover(mp, buip);
-	spin_lock(&ailp->xa_lock);
+	spin_lock(&ailp->ail_lock);
 
 	return error;
 }
@@ -4786,9 +4786,9 @@ xlog_recover_cancel_bui(
 
 	buip = container_of(lip, struct xfs_bui_log_item, bui_item);
 
-	spin_unlock(&ailp->xa_lock);
+	spin_unlock(&ailp->ail_lock);
 	xfs_bui_release(buip);
-	spin_lock(&ailp->xa_lock);
+	spin_lock(&ailp->ail_lock);
 }
 
 /* Is this log item a deferred action intent? */
@@ -4834,7 +4834,7 @@ xlog_recover_process_intents(
 #endif
 
 	ailp = log->l_ailp;
-	spin_lock(&ailp->xa_lock);
+	spin_lock(&ailp->ail_lock);
 	lip = xfs_trans_ail_cursor_first(ailp, &cur, 0);
 #if defined(DEBUG) || defined(XFS_WARN)
 	last_lsn = xlog_assign_lsn(log->l_curr_cycle, log->l_curr_block);
@@ -4879,7 +4879,7 @@ xlog_recover_process_intents(
 	}
 out:
 	xfs_trans_ail_cursor_done(&cur);
-	spin_unlock(&ailp->xa_lock);
+	spin_unlock(&ailp->ail_lock);
 	return error;
 }
 
@@ -4897,7 +4897,7 @@ xlog_recover_cancel_intents(
 	struct xfs_ail		*ailp;
 
 	ailp = log->l_ailp;
-	spin_lock(&ailp->xa_lock);
+	spin_lock(&ailp->ail_lock);
 	lip = xfs_trans_ail_cursor_first(ailp, &cur, 0);
 	while (lip != NULL) {
 		/*
@@ -4931,7 +4931,7 @@ xlog_recover_cancel_intents(
 	}
 
 	xfs_trans_ail_cursor_done(&cur);
-	spin_unlock(&ailp->xa_lock);
+	spin_unlock(&ailp->ail_lock);
 	return error;
 }
 
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index a87f657f59c9..756e01999c24 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -781,8 +781,8 @@ xfs_log_item_batch_insert(
 {
 	int	i;
 
-	spin_lock(&ailp->xa_lock);
-	/* xfs_trans_ail_update_bulk drops ailp->xa_lock */
+	spin_lock(&ailp->ail_lock);
+	/* xfs_trans_ail_update_bulk drops ailp->ail_lock */
 	xfs_trans_ail_update_bulk(ailp, cur, log_items, nr_items, commit_lsn);
 
 	for (i = 0; i < nr_items; i++) {
@@ -825,9 +825,9 @@ xfs_trans_committed_bulk(
 	struct xfs_ail_cursor	cur;
 	int			i = 0;
 
-	spin_lock(&ailp->xa_lock);
+	spin_lock(&ailp->ail_lock);
 	xfs_trans_ail_cursor_last(ailp, &cur, commit_lsn);
-	spin_unlock(&ailp->xa_lock);
+	spin_unlock(&ailp->ail_lock);
 
 	/* unpin all the log items */
 	for (lv = log_vector; lv; lv = lv->lv_next ) {
@@ -847,7 +847,7 @@ xfs_trans_committed_bulk(
 		 * object into the AIL as we are in a shutdown situation.
 		 */
 		if (aborted) {
-			ASSERT(XFS_FORCED_SHUTDOWN(ailp->xa_mount));
+			ASSERT(XFS_FORCED_SHUTDOWN(ailp->ail_mount));
 			lip->li_ops->iop_unpin(lip, 1);
 			continue;
 		}
@@ -861,11 +861,11 @@ xfs_trans_committed_bulk(
 			 * not affect the AIL cursor the bulk insert path is
 			 * using.
 			 */
-			spin_lock(&ailp->xa_lock);
+			spin_lock(&ailp->ail_lock);
 			if (XFS_LSN_CMP(item_lsn, lip->li_lsn) > 0)
 				xfs_trans_ail_update(ailp, lip, item_lsn);
 			else
-				spin_unlock(&ailp->xa_lock);
+				spin_unlock(&ailp->ail_lock);
 			lip->li_ops->iop_unpin(lip, 0);
 			continue;
 		}
@@ -883,9 +883,9 @@ xfs_trans_committed_bulk(
 	if (i)
 		xfs_log_item_batch_insert(ailp, &cur, log_items, i, commit_lsn);
 
-	spin_lock(&ailp->xa_lock);
+	spin_lock(&ailp->ail_lock);
 	xfs_trans_ail_cursor_done(&cur);
-	spin_unlock(&ailp->xa_lock);
+	spin_unlock(&ailp->ail_lock);
 }
 
 /*
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index cef89f7127d3..d4a2445215e6 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -40,7 +40,7 @@ xfs_ail_check(
 {
 	xfs_log_item_t	*prev_lip;
 
-	if (list_empty(&ailp->xa_ail))
+	if (list_empty(&ailp->ail_head))
 		return;
 
 	/*
@@ -48,11 +48,11 @@ xfs_ail_check(
 	 */
 	ASSERT((lip->li_flags & XFS_LI_IN_AIL) != 0);
 	prev_lip = list_entry(lip->li_ail.prev, xfs_log_item_t, li_ail);
-	if (&prev_lip->li_ail != &ailp->xa_ail)
+	if (&prev_lip->li_ail != &ailp->ail_head)
 		ASSERT(XFS_LSN_CMP(prev_lip->li_lsn, lip->li_lsn) <= 0);
 
 	prev_lip = list_entry(lip->li_ail.next, xfs_log_item_t, li_ail);
-	if (&prev_lip->li_ail != &ailp->xa_ail)
+	if (&prev_lip->li_ail != &ailp->ail_head)
 		ASSERT(XFS_LSN_CMP(prev_lip->li_lsn, lip->li_lsn) >= 0);
 
 
@@ -69,10 +69,10 @@ static xfs_log_item_t *
 xfs_ail_max(
 	struct xfs_ail  *ailp)
 {
-	if (list_empty(&ailp->xa_ail))
+	if (list_empty(&ailp->ail_head))
 		return NULL;
 
-	return list_entry(ailp->xa_ail.prev, xfs_log_item_t, li_ail);
+	return list_entry(ailp->ail_head.prev, xfs_log_item_t, li_ail);
 }
 
 /*
@@ -84,7 +84,7 @@ xfs_ail_next(
 	struct xfs_ail  *ailp,
 	xfs_log_item_t  *lip)
 {
-	if (lip->li_ail.next == &ailp->xa_ail)
+	if (lip->li_ail.next == &ailp->ail_head)
 		return NULL;
 
 	return list_first_entry(&lip->li_ail, xfs_log_item_t, li_ail);
@@ -105,11 +105,11 @@ xfs_ail_min_lsn(
 	xfs_lsn_t	lsn = 0;
 	xfs_log_item_t	*lip;
 
-	spin_lock(&ailp->xa_lock);
+	spin_lock(&ailp->ail_lock);
 	lip = xfs_ail_min(ailp);
 	if (lip)
 		lsn = lip->li_lsn;
-	spin_unlock(&ailp->xa_lock);
+	spin_unlock(&ailp->ail_lock);
 
 	return lsn;
 }
@@ -124,11 +124,11 @@ xfs_ail_max_lsn(
 	xfs_lsn_t       lsn = 0;
 	xfs_log_item_t  *lip;
 
-	spin_lock(&ailp->xa_lock);
+	spin_lock(&ailp->ail_lock);
 	lip = xfs_ail_max(ailp);
 	if (lip)
 		lsn = lip->li_lsn;
-	spin_unlock(&ailp->xa_lock);
+	spin_unlock(&ailp->ail_lock);
 
 	return lsn;
 }
@@ -146,7 +146,7 @@ xfs_trans_ail_cursor_init(
 	struct xfs_ail_cursor	*cur)
 {
 	cur->item = NULL;
-	list_add_tail(&cur->list, &ailp->xa_cursors);
+	list_add_tail(&cur->list, &ailp->ail_cursors);
 }
 
 /*
@@ -194,7 +194,7 @@ xfs_trans_ail_cursor_clear(
 {
 	struct xfs_ail_cursor	*cur;
 
-	list_for_each_entry(cur, &ailp->xa_cursors, list) {
+	list_for_each_entry(cur, &ailp->ail_cursors, list) {
 		if (cur->item == lip)
 			cur->item = (struct xfs_log_item *)
 					((uintptr_t)cur->item | 1);
@@ -222,7 +222,7 @@ xfs_trans_ail_cursor_first(
 		goto out;
 	}
 
-	list_for_each_entry(lip, &ailp->xa_ail, li_ail) {
+	list_for_each_entry(lip, &ailp->ail_head, li_ail) {
 		if (XFS_LSN_CMP(lip->li_lsn, lsn) >= 0)
 			goto out;
 	}
@@ -241,7 +241,7 @@ __xfs_trans_ail_cursor_last(
 {
 	xfs_log_item_t		*lip;
 
-	list_for_each_entry_reverse(lip, &ailp->xa_ail, li_ail) {
+	list_for_each_entry_reverse(lip, &ailp->ail_head, li_ail) {
 		if (XFS_LSN_CMP(lip->li_lsn, lsn) <= 0)
 			return lip;
 	}
@@ -310,7 +310,7 @@ xfs_ail_splice(
 	if (lip)
 		list_splice(list, &lip->li_ail);
 	else
-		list_splice(list, &ailp->xa_ail);
+		list_splice(list, &ailp->ail_head);
 }
 
 /*
@@ -335,17 +335,17 @@ xfsaild_push_item(
 	 * If log item pinning is enabled, skip the push and track the item as
 	 * pinned. This can help induce head-behind-tail conditions.
 	 */
-	if (XFS_TEST_ERROR(false, ailp->xa_mount, XFS_ERRTAG_LOG_ITEM_PIN))
+	if (XFS_TEST_ERROR(false, ailp->ail_mount, XFS_ERRTAG_LOG_ITEM_PIN))
 		return XFS_ITEM_PINNED;
 
-	return lip->li_ops->iop_push(lip, &ailp->xa_buf_list);
+	return lip->li_ops->iop_push(lip, &ailp->ail_buf_list);
 }
 
 static long
 xfsaild_push(
 	struct xfs_ail		*ailp)
 {
-	xfs_mount_t		*mp = ailp->xa_mount;
+	xfs_mount_t		*mp = ailp->ail_mount;
 	struct xfs_ail_cursor	cur;
 	xfs_log_item_t		*lip;
 	xfs_lsn_t		lsn;
@@ -360,30 +360,30 @@ xfsaild_push(
 	 * buffers the last time we ran, force the log first and wait for it
 	 * before pushing again.
 	 */
-	if (ailp->xa_log_flush && ailp->xa_last_pushed_lsn == 0 &&
-	    (!list_empty_careful(&ailp->xa_buf_list) ||
+	if (ailp->ail_log_flush && ailp->ail_last_pushed_lsn == 0 &&
+	    (!list_empty_careful(&ailp->ail_buf_list) ||
 	     xfs_ail_min_lsn(ailp))) {
-		ailp->xa_log_flush = 0;
+		ailp->ail_log_flush = 0;
 
 		XFS_STATS_INC(mp, xs_push_ail_flush);
 		xfs_log_force(mp, XFS_LOG_SYNC);
 	}
 
-	spin_lock(&ailp->xa_lock);
+	spin_lock(&ailp->ail_lock);
 
-	/* barrier matches the xa_target update in xfs_ail_push() */
+	/* barrier matches the ail_target update in xfs_ail_push() */
 	smp_rmb();
-	target = ailp->xa_target;
-	ailp->xa_target_prev = target;
+	target = ailp->ail_target;
+	ailp->ail_target_prev = target;
 
-	lip = xfs_trans_ail_cursor_first(ailp, &cur, ailp->xa_last_pushed_lsn);
+	lip = xfs_trans_ail_cursor_first(ailp, &cur, ailp->ail_last_pushed_lsn);
 	if (!lip) {
 		/*
 		 * If the AIL is empty or our push has reached the end we are
 		 * done now.
 		 */
 		xfs_trans_ail_cursor_done(&cur);
-		spin_unlock(&ailp->xa_lock);
+		spin_unlock(&ailp->ail_lock);
 		goto out_done;
 	}
 
@@ -404,7 +404,7 @@ xfsaild_push(
 			XFS_STATS_INC(mp, xs_push_ail_success);
 			trace_xfs_ail_push(lip);
 
-			ailp->xa_last_pushed_lsn = lsn;
+			ailp->ail_last_pushed_lsn = lsn;
 			break;
 
 		case XFS_ITEM_FLUSHING:
@@ -423,7 +423,7 @@ xfsaild_push(
 			trace_xfs_ail_flushing(lip);
 
 			flushing++;
-			ailp->xa_last_pushed_lsn = lsn;
+			ailp->ail_last_pushed_lsn = lsn;
 			break;
 
 		case XFS_ITEM_PINNED:
@@ -431,7 +431,7 @@ xfsaild_push(
 			trace_xfs_ail_pinned(lip);
 
 			stuck++;
-			ailp->xa_log_flush++;
+			ailp->ail_log_flush++;
 			break;
 		case XFS_ITEM_LOCKED:
 			XFS_STATS_INC(mp, xs_push_ail_locked);
@@ -468,10 +468,10 @@ xfsaild_push(
 		lsn = lip->li_lsn;
 	}
 	xfs_trans_ail_cursor_done(&cur);
-	spin_unlock(&ailp->xa_lock);
+	spin_unlock(&ailp->ail_lock);
 
-	if (xfs_buf_delwri_submit_nowait(&ailp->xa_buf_list))
-		ailp->xa_log_flush++;
+	if (xfs_buf_delwri_submit_nowait(&ailp->ail_buf_list))
+		ailp->ail_log_flush++;
 
 	if (!count || XFS_LSN_CMP(lsn, target) >= 0) {
 out_done:
@@ -481,7 +481,7 @@ xfsaild_push(
 		 * AIL before we start the next scan from the start of the AIL.
 		 */
 		tout = 50;
-		ailp->xa_last_pushed_lsn = 0;
+		ailp->ail_last_pushed_lsn = 0;
 	} else if (((stuck + flushing) * 100) / count > 90) {
 		/*
 		 * Either there is a lot of contention on the AIL or we are
@@ -494,7 +494,7 @@ xfsaild_push(
 		 * the restart to issue a log force to unpin the stuck items.
 		 */
 		tout = 20;
-		ailp->xa_last_pushed_lsn = 0;
+		ailp->ail_last_pushed_lsn = 0;
 	} else {
 		/*
 		 * Assume we have more work to do in a short while.
@@ -536,26 +536,26 @@ xfsaild(
 			break;
 		}
 
-		spin_lock(&ailp->xa_lock);
+		spin_lock(&ailp->ail_lock);
 
 		/*
 		 * Idle if the AIL is empty and we are not racing with a target
 		 * update. We check the AIL after we set the task to a sleep
-		 * state to guarantee that we either catch an xa_target update
+		 * state to guarantee that we either catch an ail_target update
 		 * or that a wake_up resets the state to TASK_RUNNING.
 		 * Otherwise, we run the risk of sleeping indefinitely.
 		 *
-		 * The barrier matches the xa_target update in xfs_ail_push().
+		 * The barrier matches the ail_target update in xfs_ail_push().
 		 */
 		smp_rmb();
 		if (!xfs_ail_min(ailp) &&
-		    ailp->xa_target == ailp->xa_target_prev) {
-			spin_unlock(&ailp->xa_lock);
+		    ailp->ail_target == ailp->ail_target_prev) {
+			spin_unlock(&ailp->ail_lock);
 			freezable_schedule();
 			tout = 0;
 			continue;
 		}
-		spin_unlock(&ailp->xa_lock);
+		spin_unlock(&ailp->ail_lock);
 
 		if (tout)
 			freezable_schedule_timeout(msecs_to_jiffies(tout));
@@ -592,8 +592,8 @@ xfs_ail_push(
 	xfs_log_item_t	*lip;
 
 	lip = xfs_ail_min(ailp);
-	if (!lip || XFS_FORCED_SHUTDOWN(ailp->xa_mount) ||
-	    XFS_LSN_CMP(threshold_lsn, ailp->xa_target) <= 0)
+	if (!lip || XFS_FORCED_SHUTDOWN(ailp->ail_mount) ||
+	    XFS_LSN_CMP(threshold_lsn, ailp->ail_target) <= 0)
 		return;
 
 	/*
@@ -601,10 +601,10 @@ xfs_ail_push(
 	 * the XFS_AIL_PUSHING_BIT.
 	 */
 	smp_wmb();
-	xfs_trans_ail_copy_lsn(ailp, &ailp->xa_target, &threshold_lsn);
+	xfs_trans_ail_copy_lsn(ailp, &ailp->ail_target, &threshold_lsn);
 	smp_wmb();
 
-	wake_up_process(ailp->xa_task);
+	wake_up_process(ailp->ail_task);
 }
 
 /*
@@ -630,18 +630,18 @@ xfs_ail_push_all_sync(
 	struct xfs_log_item	*lip;
 	DEFINE_WAIT(wait);
 
-	spin_lock(&ailp->xa_lock);
+	spin_lock(&ailp->ail_lock);
 	while ((lip = xfs_ail_max(ailp)) != NULL) {
-		prepare_to_wait(&ailp->xa_empty, &wait, TASK_UNINTERRUPTIBLE);
-		ailp->xa_target = lip->li_lsn;
-		wake_up_process(ailp->xa_task);
-		spin_unlock(&ailp->xa_lock);
+		prepare_to_wait(&ailp->ail_empty, &wait, TASK_UNINTERRUPTIBLE);
+		ailp->ail_target = lip->li_lsn;
+		wake_up_process(ailp->ail_task);
+		spin_unlock(&ailp->ail_lock);
 		schedule();
-		spin_lock(&ailp->xa_lock);
+		spin_lock(&ailp->ail_lock);
 	}
-	spin_unlock(&ailp->xa_lock);
+	spin_unlock(&ailp->ail_lock);
 
-	finish_wait(&ailp->xa_empty, &wait);
+	finish_wait(&ailp->ail_empty, &wait);
 }
 
 /*
@@ -672,7 +672,7 @@ xfs_trans_ail_update_bulk(
 	struct xfs_ail_cursor	*cur,
 	struct xfs_log_item	**log_items,
 	int			nr_items,
-	xfs_lsn_t		lsn) __releases(ailp->xa_lock)
+	xfs_lsn_t		lsn) __releases(ailp->ail_lock)
 {
 	xfs_log_item_t		*mlip;
 	int			mlip_changed = 0;
@@ -705,13 +705,13 @@ xfs_trans_ail_update_bulk(
 		xfs_ail_splice(ailp, cur, &tmp, lsn);
 
 	if (mlip_changed) {
-		if (!XFS_FORCED_SHUTDOWN(ailp->xa_mount))
-			xlog_assign_tail_lsn_locked(ailp->xa_mount);
-		spin_unlock(&ailp->xa_lock);
+		if (!XFS_FORCED_SHUTDOWN(ailp->ail_mount))
+			xlog_assign_tail_lsn_locked(ailp->ail_mount);
+		spin_unlock(&ailp->ail_lock);
 
-		xfs_log_space_wake(ailp->xa_mount);
+		xfs_log_space_wake(ailp->ail_mount);
 	} else {
-		spin_unlock(&ailp->xa_lock);
+		spin_unlock(&ailp->ail_lock);
 	}
 }
 
@@ -756,13 +756,13 @@ void
 xfs_trans_ail_delete(
 	struct xfs_ail		*ailp,
 	struct xfs_log_item	*lip,
-	int			shutdown_type) __releases(ailp->xa_lock)
+	int			shutdown_type) __releases(ailp->ail_lock)
 {
-	struct xfs_mount	*mp = ailp->xa_mount;
+	struct xfs_mount	*mp = ailp->ail_mount;
 	bool			mlip_changed;
 
 	if (!(lip->li_flags & XFS_LI_IN_AIL)) {
-		spin_unlock(&ailp->xa_lock);
+		spin_unlock(&ailp->ail_lock);
 		if (!XFS_FORCED_SHUTDOWN(mp)) {
 			xfs_alert_tag(mp, XFS_PTAG_AILDELETE,
 	"%s: attempting to delete a log item that is not in the AIL",
@@ -776,13 +776,13 @@ xfs_trans_ail_delete(
 	if (mlip_changed) {
 		if (!XFS_FORCED_SHUTDOWN(mp))
 			xlog_assign_tail_lsn_locked(mp);
-		if (list_empty(&ailp->xa_ail))
-			wake_up_all(&ailp->xa_empty);
+		if (list_empty(&ailp->ail_head))
+			wake_up_all(&ailp->ail_empty);
 	}
 
-	spin_unlock(&ailp->xa_lock);
+	spin_unlock(&ailp->ail_lock);
 	if (mlip_changed)
-		xfs_log_space_wake(ailp->xa_mount);
+		xfs_log_space_wake(ailp->ail_mount);
 }
 
 int
@@ -795,16 +795,16 @@ xfs_trans_ail_init(
 	if (!ailp)
 		return -ENOMEM;
 
-	ailp->xa_mount = mp;
-	INIT_LIST_HEAD(&ailp->xa_ail);
-	INIT_LIST_HEAD(&ailp->xa_cursors);
-	spin_lock_init(&ailp->xa_lock);
-	INIT_LIST_HEAD(&ailp->xa_buf_list);
-	init_waitqueue_head(&ailp->xa_empty);
+	ailp->ail_mount = mp;
+	INIT_LIST_HEAD(&ailp->ail_head);
+	INIT_LIST_HEAD(&ailp->ail_cursors);
+	spin_lock_init(&ailp->ail_lock);
+	INIT_LIST_HEAD(&ailp->ail_buf_list);
+	init_waitqueue_head(&ailp->ail_empty);
 
-	ailp->xa_task = kthread_run(xfsaild, ailp, "xfsaild/%s",
-			ailp->xa_mount->m_fsname);
-	if (IS_ERR(ailp->xa_task))
+	ailp->ail_task = kthread_run(xfsaild, ailp, "xfsaild/%s",
+			ailp->ail_mount->m_fsname);
+	if (IS_ERR(ailp->ail_task))
 		goto out_free_ailp;
 
 	mp->m_ail = ailp;
@@ -821,6 +821,6 @@ xfs_trans_ail_destroy(
 {
 	struct xfs_ail	*ailp = mp->m_ail;
 
-	kthread_stop(ailp->xa_task);
+	kthread_stop(ailp->ail_task);
 	kmem_free(ailp);
 }
diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c
index 3ba7a96a8abd..b8871bcfe00b 100644
--- a/fs/xfs/xfs_trans_buf.c
+++ b/fs/xfs/xfs_trans_buf.c
@@ -429,8 +429,8 @@ xfs_trans_brelse(xfs_trans_t	*tp,
 	 * If the fs has shutdown and we dropped the last reference, it may fall
 	 * on us to release a (possibly dirty) bli if it never made it to the
 	 * AIL (e.g., the aborted unpin already happened and didn't release it
-	 * due to our reference). Since we're already shutdown and need xa_lock,
-	 * just force remove from the AIL and release the bli here.
+	 * due to our reference). Since we're already shutdown and need
+	 * ail_lock, just force remove from the AIL and release the bli here.
 	 */
 	if (XFS_FORCED_SHUTDOWN(tp->t_mountp) && freed) {
 		xfs_trans_ail_remove(&bip->bli_item, SHUTDOWN_LOG_IO_ERROR);
diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h
index b317a3644c00..be24b0c8a332 100644
--- a/fs/xfs/xfs_trans_priv.h
+++ b/fs/xfs/xfs_trans_priv.h
@@ -65,17 +65,17 @@ struct xfs_ail_cursor {
  * Eventually we need to drive the locking in here as well.
  */
 struct xfs_ail {
-	struct xfs_mount	*xa_mount;
-	struct task_struct	*xa_task;
-	struct list_head	xa_ail;
-	xfs_lsn_t		xa_target;
-	xfs_lsn_t		xa_target_prev;
-	struct list_head	xa_cursors;
-	spinlock_t		xa_lock;
-	xfs_lsn_t		xa_last_pushed_lsn;
-	int			xa_log_flush;
-	struct list_head	xa_buf_list;
-	wait_queue_head_t	xa_empty;
+	struct xfs_mount	*ail_mount;
+	struct task_struct	*ail_task;
+	struct list_head	ail_head;
+	xfs_lsn_t		ail_target;
+	xfs_lsn_t		ail_target_prev;
+	struct list_head	ail_cursors;
+	spinlock_t		ail_lock;
+	xfs_lsn_t		ail_last_pushed_lsn;
+	int			ail_log_flush;
+	struct list_head	ail_buf_list;
+	wait_queue_head_t	ail_empty;
 };
 
 /*
@@ -84,7 +84,7 @@ struct xfs_ail {
 void	xfs_trans_ail_update_bulk(struct xfs_ail *ailp,
 				struct xfs_ail_cursor *cur,
 				struct xfs_log_item **log_items, int nr_items,
-				xfs_lsn_t lsn) __releases(ailp->xa_lock);
+				xfs_lsn_t lsn) __releases(ailp->ail_lock);
 /*
  * Return a pointer to the first item in the AIL.  If the AIL is empty, then
  * return NULL.
@@ -93,7 +93,7 @@ static inline struct xfs_log_item *
 xfs_ail_min(
 	struct xfs_ail  *ailp)
 {
-	return list_first_entry_or_null(&ailp->xa_ail, struct xfs_log_item,
+	return list_first_entry_or_null(&ailp->ail_head, struct xfs_log_item,
 					li_ail);
 }
 
@@ -101,14 +101,14 @@ static inline void
 xfs_trans_ail_update(
 	struct xfs_ail		*ailp,
 	struct xfs_log_item	*lip,
-	xfs_lsn_t		lsn) __releases(ailp->xa_lock)
+	xfs_lsn_t		lsn) __releases(ailp->ail_lock)
 {
 	xfs_trans_ail_update_bulk(ailp, NULL, &lip, 1, lsn);
 }
 
 bool xfs_ail_delete_one(struct xfs_ail *ailp, struct xfs_log_item *lip);
 void xfs_trans_ail_delete(struct xfs_ail *ailp, struct xfs_log_item *lip,
-		int shutdown_type) __releases(ailp->xa_lock);
+		int shutdown_type) __releases(ailp->ail_lock);
 
 static inline void
 xfs_trans_ail_remove(
@@ -117,12 +117,12 @@ xfs_trans_ail_remove(
 {
 	struct xfs_ail		*ailp = lip->li_ailp;
 
-	spin_lock(&ailp->xa_lock);
+	spin_lock(&ailp->ail_lock);
 	/* xfs_trans_ail_delete() drops the AIL lock */
 	if (lip->li_flags & XFS_LI_IN_AIL)
 		xfs_trans_ail_delete(ailp, lip, shutdown_type);
 	else
-		spin_unlock(&ailp->xa_lock);
+		spin_unlock(&ailp->ail_lock);
 }
 
 void			xfs_ail_push(struct xfs_ail *, xfs_lsn_t);
@@ -149,9 +149,9 @@ xfs_trans_ail_copy_lsn(
 	xfs_lsn_t	*src)
 {
 	ASSERT(sizeof(xfs_lsn_t) == 8);	/* don't lock if it shrinks */
-	spin_lock(&ailp->xa_lock);
+	spin_lock(&ailp->ail_lock);
 	*dst = *src;
-	spin_unlock(&ailp->xa_lock);
+	spin_unlock(&ailp->ail_lock);
 }
 #else
 static inline void
@@ -172,7 +172,7 @@ xfs_clear_li_failed(
 	struct xfs_buf	*bp = lip->li_buf;
 
 	ASSERT(lip->li_flags & XFS_LI_IN_AIL);
-	lockdep_assert_held(&lip->li_ailp->xa_lock);
+	lockdep_assert_held(&lip->li_ailp->ail_lock);
 
 	if (lip->li_flags & XFS_LI_FAILED) {
 		lip->li_flags &= ~XFS_LI_FAILED;
@@ -186,7 +186,7 @@ xfs_set_li_failed(
 	struct xfs_log_item	*lip,
 	struct xfs_buf		*bp)
 {
-	lockdep_assert_held(&lip->li_ailp->xa_lock);
+	lockdep_assert_held(&lip->li_ailp->ail_lock);
 
 	if (!(lip->li_flags & XFS_LI_FAILED)) {
 		xfs_buf_hold(bp);
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 159+ messages in thread

 					((uintptr_t)cur->item | 1);
@@ -222,7 +222,7 @@ xfs_trans_ail_cursor_first(
 		goto out;
 	}
 
-	list_for_each_entry(lip, &ailp->xa_ail, li_ail) {
+	list_for_each_entry(lip, &ailp->ail_head, li_ail) {
 		if (XFS_LSN_CMP(lip->li_lsn, lsn) >= 0)
 			goto out;
 	}
@@ -241,7 +241,7 @@ __xfs_trans_ail_cursor_last(
 {
 	xfs_log_item_t		*lip;
 
-	list_for_each_entry_reverse(lip, &ailp->xa_ail, li_ail) {
+	list_for_each_entry_reverse(lip, &ailp->ail_head, li_ail) {
 		if (XFS_LSN_CMP(lip->li_lsn, lsn) <= 0)
 			return lip;
 	}
@@ -310,7 +310,7 @@ xfs_ail_splice(
 	if (lip)
 		list_splice(list, &lip->li_ail);
 	else
-		list_splice(list, &ailp->xa_ail);
+		list_splice(list, &ailp->ail_head);
 }
 
 /*
@@ -335,17 +335,17 @@ xfsaild_push_item(
 	 * If log item pinning is enabled, skip the push and track the item as
 	 * pinned. This can help induce head-behind-tail conditions.
 	 */
-	if (XFS_TEST_ERROR(false, ailp->xa_mount, XFS_ERRTAG_LOG_ITEM_PIN))
+	if (XFS_TEST_ERROR(false, ailp->ail_mount, XFS_ERRTAG_LOG_ITEM_PIN))
 		return XFS_ITEM_PINNED;
 
-	return lip->li_ops->iop_push(lip, &ailp->xa_buf_list);
+	return lip->li_ops->iop_push(lip, &ailp->ail_buf_list);
 }
 
 static long
 xfsaild_push(
 	struct xfs_ail		*ailp)
 {
-	xfs_mount_t		*mp = ailp->xa_mount;
+	xfs_mount_t		*mp = ailp->ail_mount;
 	struct xfs_ail_cursor	cur;
 	xfs_log_item_t		*lip;
 	xfs_lsn_t		lsn;
@@ -360,30 +360,30 @@ xfsaild_push(
 	 * buffers the last time we ran, force the log first and wait for it
 	 * before pushing again.
 	 */
-	if (ailp->xa_log_flush && ailp->xa_last_pushed_lsn == 0 &&
-	    (!list_empty_careful(&ailp->xa_buf_list) ||
+	if (ailp->ail_log_flush && ailp->ail_last_pushed_lsn == 0 &&
+	    (!list_empty_careful(&ailp->ail_buf_list) ||
 	     xfs_ail_min_lsn(ailp))) {
-		ailp->xa_log_flush = 0;
+		ailp->ail_log_flush = 0;
 
 		XFS_STATS_INC(mp, xs_push_ail_flush);
 		xfs_log_force(mp, XFS_LOG_SYNC);
 	}
 
-	spin_lock(&ailp->xa_lock);
+	spin_lock(&ailp->ail_lock);
 
-	/* barrier matches the xa_target update in xfs_ail_push() */
+	/* barrier matches the ail_target update in xfs_ail_push() */
 	smp_rmb();
-	target = ailp->xa_target;
-	ailp->xa_target_prev = target;
+	target = ailp->ail_target;
+	ailp->ail_target_prev = target;
 
-	lip = xfs_trans_ail_cursor_first(ailp, &cur, ailp->xa_last_pushed_lsn);
+	lip = xfs_trans_ail_cursor_first(ailp, &cur, ailp->ail_last_pushed_lsn);
 	if (!lip) {
 		/*
 		 * If the AIL is empty or our push has reached the end we are
 		 * done now.
 		 */
 		xfs_trans_ail_cursor_done(&cur);
-		spin_unlock(&ailp->xa_lock);
+		spin_unlock(&ailp->ail_lock);
 		goto out_done;
 	}
 
@@ -404,7 +404,7 @@ xfsaild_push(
 			XFS_STATS_INC(mp, xs_push_ail_success);
 			trace_xfs_ail_push(lip);
 
-			ailp->xa_last_pushed_lsn = lsn;
+			ailp->ail_last_pushed_lsn = lsn;
 			break;
 
 		case XFS_ITEM_FLUSHING:
@@ -423,7 +423,7 @@ xfsaild_push(
 			trace_xfs_ail_flushing(lip);
 
 			flushing++;
-			ailp->xa_last_pushed_lsn = lsn;
+			ailp->ail_last_pushed_lsn = lsn;
 			break;
 
 		case XFS_ITEM_PINNED:
@@ -431,7 +431,7 @@ xfsaild_push(
 			trace_xfs_ail_pinned(lip);
 
 			stuck++;
-			ailp->xa_log_flush++;
+			ailp->ail_log_flush++;
 			break;
 		case XFS_ITEM_LOCKED:
 			XFS_STATS_INC(mp, xs_push_ail_locked);
@@ -468,10 +468,10 @@ xfsaild_push(
 		lsn = lip->li_lsn;
 	}
 	xfs_trans_ail_cursor_done(&cur);
-	spin_unlock(&ailp->xa_lock);
+	spin_unlock(&ailp->ail_lock);
 
-	if (xfs_buf_delwri_submit_nowait(&ailp->xa_buf_list))
-		ailp->xa_log_flush++;
+	if (xfs_buf_delwri_submit_nowait(&ailp->ail_buf_list))
+		ailp->ail_log_flush++;
 
 	if (!count || XFS_LSN_CMP(lsn, target) >= 0) {
 out_done:
@@ -481,7 +481,7 @@ xfsaild_push(
 		 * AIL before we start the next scan from the start of the AIL.
 		 */
 		tout = 50;
-		ailp->xa_last_pushed_lsn = 0;
+		ailp->ail_last_pushed_lsn = 0;
 	} else if (((stuck + flushing) * 100) / count > 90) {
 		/*
 		 * Either there is a lot of contention on the AIL or we are
@@ -494,7 +494,7 @@ xfsaild_push(
 		 * the restart to issue a log force to unpin the stuck items.
 		 */
 		tout = 20;
-		ailp->xa_last_pushed_lsn = 0;
+		ailp->ail_last_pushed_lsn = 0;
 	} else {
 		/*
 		 * Assume we have more work to do in a short while.
@@ -536,26 +536,26 @@ xfsaild(
 			break;
 		}
 
-		spin_lock(&ailp->xa_lock);
+		spin_lock(&ailp->ail_lock);
 
 		/*
 		 * Idle if the AIL is empty and we are not racing with a target
 		 * update. We check the AIL after we set the task to a sleep
-		 * state to guarantee that we either catch an xa_target update
+		 * state to guarantee that we either catch an ail_target update
 		 * or that a wake_up resets the state to TASK_RUNNING.
 		 * Otherwise, we run the risk of sleeping indefinitely.
 		 *
-		 * The barrier matches the xa_target update in xfs_ail_push().
+		 * The barrier matches the ail_target update in xfs_ail_push().
 		 */
 		smp_rmb();
 		if (!xfs_ail_min(ailp) &&
-		    ailp->xa_target == ailp->xa_target_prev) {
-			spin_unlock(&ailp->xa_lock);
+		    ailp->ail_target == ailp->ail_target_prev) {
+			spin_unlock(&ailp->ail_lock);
 			freezable_schedule();
 			tout = 0;
 			continue;
 		}
-		spin_unlock(&ailp->xa_lock);
+		spin_unlock(&ailp->ail_lock);
 
 		if (tout)
 			freezable_schedule_timeout(msecs_to_jiffies(tout));
@@ -592,8 +592,8 @@ xfs_ail_push(
 	xfs_log_item_t	*lip;
 
 	lip = xfs_ail_min(ailp);
-	if (!lip || XFS_FORCED_SHUTDOWN(ailp->xa_mount) ||
-	    XFS_LSN_CMP(threshold_lsn, ailp->xa_target) <= 0)
+	if (!lip || XFS_FORCED_SHUTDOWN(ailp->ail_mount) ||
+	    XFS_LSN_CMP(threshold_lsn, ailp->ail_target) <= 0)
 		return;
 
 	/*
@@ -601,10 +601,10 @@ xfs_ail_push(
 	 * the XFS_AIL_PUSHING_BIT.
 	 */
 	smp_wmb();
-	xfs_trans_ail_copy_lsn(ailp, &ailp->xa_target, &threshold_lsn);
+	xfs_trans_ail_copy_lsn(ailp, &ailp->ail_target, &threshold_lsn);
 	smp_wmb();
 
-	wake_up_process(ailp->xa_task);
+	wake_up_process(ailp->ail_task);
 }
 
 /*
@@ -630,18 +630,18 @@ xfs_ail_push_all_sync(
 	struct xfs_log_item	*lip;
 	DEFINE_WAIT(wait);
 
-	spin_lock(&ailp->xa_lock);
+	spin_lock(&ailp->ail_lock);
 	while ((lip = xfs_ail_max(ailp)) != NULL) {
-		prepare_to_wait(&ailp->xa_empty, &wait, TASK_UNINTERRUPTIBLE);
-		ailp->xa_target = lip->li_lsn;
-		wake_up_process(ailp->xa_task);
-		spin_unlock(&ailp->xa_lock);
+		prepare_to_wait(&ailp->ail_empty, &wait, TASK_UNINTERRUPTIBLE);
+		ailp->ail_target = lip->li_lsn;
+		wake_up_process(ailp->ail_task);
+		spin_unlock(&ailp->ail_lock);
 		schedule();
-		spin_lock(&ailp->xa_lock);
+		spin_lock(&ailp->ail_lock);
 	}
-	spin_unlock(&ailp->xa_lock);
+	spin_unlock(&ailp->ail_lock);
 
-	finish_wait(&ailp->xa_empty, &wait);
+	finish_wait(&ailp->ail_empty, &wait);
 }
 
 /*
@@ -672,7 +672,7 @@ xfs_trans_ail_update_bulk(
 	struct xfs_ail_cursor	*cur,
 	struct xfs_log_item	**log_items,
 	int			nr_items,
-	xfs_lsn_t		lsn) __releases(ailp->xa_lock)
+	xfs_lsn_t		lsn) __releases(ailp->ail_lock)
 {
 	xfs_log_item_t		*mlip;
 	int			mlip_changed = 0;
@@ -705,13 +705,13 @@ xfs_trans_ail_update_bulk(
 		xfs_ail_splice(ailp, cur, &tmp, lsn);
 
 	if (mlip_changed) {
-		if (!XFS_FORCED_SHUTDOWN(ailp->xa_mount))
-			xlog_assign_tail_lsn_locked(ailp->xa_mount);
-		spin_unlock(&ailp->xa_lock);
+		if (!XFS_FORCED_SHUTDOWN(ailp->ail_mount))
+			xlog_assign_tail_lsn_locked(ailp->ail_mount);
+		spin_unlock(&ailp->ail_lock);
 
-		xfs_log_space_wake(ailp->xa_mount);
+		xfs_log_space_wake(ailp->ail_mount);
 	} else {
-		spin_unlock(&ailp->xa_lock);
+		spin_unlock(&ailp->ail_lock);
 	}
 }
 
@@ -756,13 +756,13 @@ void
 xfs_trans_ail_delete(
 	struct xfs_ail		*ailp,
 	struct xfs_log_item	*lip,
-	int			shutdown_type) __releases(ailp->xa_lock)
+	int			shutdown_type) __releases(ailp->ail_lock)
 {
-	struct xfs_mount	*mp = ailp->xa_mount;
+	struct xfs_mount	*mp = ailp->ail_mount;
 	bool			mlip_changed;
 
 	if (!(lip->li_flags & XFS_LI_IN_AIL)) {
-		spin_unlock(&ailp->xa_lock);
+		spin_unlock(&ailp->ail_lock);
 		if (!XFS_FORCED_SHUTDOWN(mp)) {
 			xfs_alert_tag(mp, XFS_PTAG_AILDELETE,
 	"%s: attempting to delete a log item that is not in the AIL",
@@ -776,13 +776,13 @@ xfs_trans_ail_delete(
 	if (mlip_changed) {
 		if (!XFS_FORCED_SHUTDOWN(mp))
 			xlog_assign_tail_lsn_locked(mp);
-		if (list_empty(&ailp->xa_ail))
-			wake_up_all(&ailp->xa_empty);
+		if (list_empty(&ailp->ail_head))
+			wake_up_all(&ailp->ail_empty);
 	}
 
-	spin_unlock(&ailp->xa_lock);
+	spin_unlock(&ailp->ail_lock);
 	if (mlip_changed)
-		xfs_log_space_wake(ailp->xa_mount);
+		xfs_log_space_wake(ailp->ail_mount);
 }
 
 int
@@ -795,16 +795,16 @@ xfs_trans_ail_init(
 	if (!ailp)
 		return -ENOMEM;
 
-	ailp->xa_mount = mp;
-	INIT_LIST_HEAD(&ailp->xa_ail);
-	INIT_LIST_HEAD(&ailp->xa_cursors);
-	spin_lock_init(&ailp->xa_lock);
-	INIT_LIST_HEAD(&ailp->xa_buf_list);
-	init_waitqueue_head(&ailp->xa_empty);
+	ailp->ail_mount = mp;
+	INIT_LIST_HEAD(&ailp->ail_head);
+	INIT_LIST_HEAD(&ailp->ail_cursors);
+	spin_lock_init(&ailp->ail_lock);
+	INIT_LIST_HEAD(&ailp->ail_buf_list);
+	init_waitqueue_head(&ailp->ail_empty);
 
-	ailp->xa_task = kthread_run(xfsaild, ailp, "xfsaild/%s",
-			ailp->xa_mount->m_fsname);
-	if (IS_ERR(ailp->xa_task))
+	ailp->ail_task = kthread_run(xfsaild, ailp, "xfsaild/%s",
+			ailp->ail_mount->m_fsname);
+	if (IS_ERR(ailp->ail_task))
 		goto out_free_ailp;
 
 	mp->m_ail = ailp;
@@ -821,6 +821,6 @@ xfs_trans_ail_destroy(
 {
 	struct xfs_ail	*ailp = mp->m_ail;
 
-	kthread_stop(ailp->xa_task);
+	kthread_stop(ailp->ail_task);
 	kmem_free(ailp);
 }
diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c
index 3ba7a96a8abd..b8871bcfe00b 100644
--- a/fs/xfs/xfs_trans_buf.c
+++ b/fs/xfs/xfs_trans_buf.c
@@ -429,8 +429,8 @@ xfs_trans_brelse(xfs_trans_t	*tp,
 	 * If the fs has shutdown and we dropped the last reference, it may fall
 	 * on us to release a (possibly dirty) bli if it never made it to the
 	 * AIL (e.g., the aborted unpin already happened and didn't release it
-	 * due to our reference). Since we're already shutdown and need xa_lock,
-	 * just force remove from the AIL and release the bli here.
+	 * due to our reference). Since we're already shutdown and need
+	 * ail_lock, just force remove from the AIL and release the bli here.
 	 */
 	if (XFS_FORCED_SHUTDOWN(tp->t_mountp) && freed) {
 		xfs_trans_ail_remove(&bip->bli_item, SHUTDOWN_LOG_IO_ERROR);
diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h
index b317a3644c00..be24b0c8a332 100644
--- a/fs/xfs/xfs_trans_priv.h
+++ b/fs/xfs/xfs_trans_priv.h
@@ -65,17 +65,17 @@ struct xfs_ail_cursor {
  * Eventually we need to drive the locking in here as well.
  */
 struct xfs_ail {
-	struct xfs_mount	*xa_mount;
-	struct task_struct	*xa_task;
-	struct list_head	xa_ail;
-	xfs_lsn_t		xa_target;
-	xfs_lsn_t		xa_target_prev;
-	struct list_head	xa_cursors;
-	spinlock_t		xa_lock;
-	xfs_lsn_t		xa_last_pushed_lsn;
-	int			xa_log_flush;
-	struct list_head	xa_buf_list;
-	wait_queue_head_t	xa_empty;
+	struct xfs_mount	*ail_mount;
+	struct task_struct	*ail_task;
+	struct list_head	ail_head;
+	xfs_lsn_t		ail_target;
+	xfs_lsn_t		ail_target_prev;
+	struct list_head	ail_cursors;
+	spinlock_t		ail_lock;
+	xfs_lsn_t		ail_last_pushed_lsn;
+	int			ail_log_flush;
+	struct list_head	ail_buf_list;
+	wait_queue_head_t	ail_empty;
 };
 
 /*
@@ -84,7 +84,7 @@ struct xfs_ail {
 void	xfs_trans_ail_update_bulk(struct xfs_ail *ailp,
 				struct xfs_ail_cursor *cur,
 				struct xfs_log_item **log_items, int nr_items,
-				xfs_lsn_t lsn) __releases(ailp->xa_lock);
+				xfs_lsn_t lsn) __releases(ailp->ail_lock);
 /*
  * Return a pointer to the first item in the AIL.  If the AIL is empty, then
  * return NULL.
@@ -93,7 +93,7 @@ static inline struct xfs_log_item *
 xfs_ail_min(
 	struct xfs_ail  *ailp)
 {
-	return list_first_entry_or_null(&ailp->xa_ail, struct xfs_log_item,
+	return list_first_entry_or_null(&ailp->ail_head, struct xfs_log_item,
 					li_ail);
 }
 
@@ -101,14 +101,14 @@ static inline void
 xfs_trans_ail_update(
 	struct xfs_ail		*ailp,
 	struct xfs_log_item	*lip,
-	xfs_lsn_t		lsn) __releases(ailp->xa_lock)
+	xfs_lsn_t		lsn) __releases(ailp->ail_lock)
 {
 	xfs_trans_ail_update_bulk(ailp, NULL, &lip, 1, lsn);
 }
 
 bool xfs_ail_delete_one(struct xfs_ail *ailp, struct xfs_log_item *lip);
 void xfs_trans_ail_delete(struct xfs_ail *ailp, struct xfs_log_item *lip,
-		int shutdown_type) __releases(ailp->xa_lock);
+		int shutdown_type) __releases(ailp->ail_lock);
 
 static inline void
 xfs_trans_ail_remove(
@@ -117,12 +117,12 @@ xfs_trans_ail_remove(
 {
 	struct xfs_ail		*ailp = lip->li_ailp;
 
-	spin_lock(&ailp->xa_lock);
+	spin_lock(&ailp->ail_lock);
 	/* xfs_trans_ail_delete() drops the AIL lock */
 	if (lip->li_flags & XFS_LI_IN_AIL)
 		xfs_trans_ail_delete(ailp, lip, shutdown_type);
 	else
-		spin_unlock(&ailp->xa_lock);
+		spin_unlock(&ailp->ail_lock);
 }
 
 void			xfs_ail_push(struct xfs_ail *, xfs_lsn_t);
@@ -149,9 +149,9 @@ xfs_trans_ail_copy_lsn(
 	xfs_lsn_t	*src)
 {
 	ASSERT(sizeof(xfs_lsn_t) == 8);	/* don't lock if it shrinks */
-	spin_lock(&ailp->xa_lock);
+	spin_lock(&ailp->ail_lock);
 	*dst = *src;
-	spin_unlock(&ailp->xa_lock);
+	spin_unlock(&ailp->ail_lock);
 }
 #else
 static inline void
@@ -172,7 +172,7 @@ xfs_clear_li_failed(
 	struct xfs_buf	*bp = lip->li_buf;
 
 	ASSERT(lip->li_flags & XFS_LI_IN_AIL);
-	lockdep_assert_held(&lip->li_ailp->xa_lock);
+	lockdep_assert_held(&lip->li_ailp->ail_lock);
 
 	if (lip->li_flags & XFS_LI_FAILED) {
 		lip->li_flags &= ~XFS_LI_FAILED;
@@ -186,7 +186,7 @@ xfs_set_li_failed(
 	struct xfs_log_item	*lip,
 	struct xfs_buf		*bp)
 {
-	lockdep_assert_held(&lip->li_ailp->xa_lock);
+	lockdep_assert_held(&lip->li_ailp->ail_lock);
 
 	if (!(lip->li_flags & XFS_LI_FAILED)) {
 		xfs_buf_hold(bp);
-- 
2.15.0

* [PATCH 13/62] fscache: Use appropriate radix tree accessors
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:06   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:06 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

Don't open-code accesses to data structure internals.
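
For illustration only (not part of the patch; the helper name below is made
up): the change swaps an open-coded peek at the root's ->rnode for the
radix_tree_empty() accessor, so fscache no longer depends on the layout of
struct radix_tree_root.  A minimal sketch:

	/*
	 * Sketch: both checks are equivalent today, but only the accessor
	 * survives changes to the radix tree internals.
	 */
	static bool cookie_has_no_stores(struct fscache_cookie *cookie)
	{
		/* open-coded: return cookie->stores.rnode == NULL; */
		return radix_tree_empty(&cookie->stores);
	}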

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 fs/fscache/cookie.c | 2 +-
 fs/fscache/object.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index ff84258132bb..e9054e0c1a49 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -608,7 +608,7 @@ void __fscache_relinquish_cookie(struct fscache_cookie *cookie, bool retire)
 	/* Clear pointers back to the netfs */
 	cookie->netfs_data	= NULL;
 	cookie->def		= NULL;
-	BUG_ON(cookie->stores.rnode);
+	BUG_ON(!radix_tree_empty(&cookie->stores));
 
 	if (cookie->parent) {
 		ASSERTCMP(atomic_read(&cookie->parent->usage), >, 0);
diff --git a/fs/fscache/object.c b/fs/fscache/object.c
index 7a182c87f378..aa0e71f02c33 100644
--- a/fs/fscache/object.c
+++ b/fs/fscache/object.c
@@ -956,7 +956,7 @@ static const struct fscache_state *_fscache_invalidate_object(struct fscache_obj
 	 * retire the object instead.
 	 */
 	if (!fscache_use_cookie(object)) {
-		ASSERT(object->cookie->stores.rnode == NULL);
+		ASSERT(radix_tree_empty(&object->cookie->stores));
 		set_bit(FSCACHE_OBJECT_RETIRED, &object->flags);
 		_leave(" [no cookie]");
 		return transit_to(KILL_OBJECT);
-- 
2.15.0

* [PATCH 14/62] xarray: Add the xa_lock to the radix_tree_root
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:06   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:06 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

Adding the spinlock results in no change in structure size on 64-bit x86,
as it fits in the padding between the gfp_t and the void *.
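
As a usage sketch (not taken from the patch; the tree and function names are
invented for illustration), a statically defined radix tree now passes its own
name to the initialiser so the embedded lock can be set up, and the xa_lock
helpers from <linux/xarray.h> operate on that lock:

	#include <linux/radix-tree.h>
	#include <linux/xarray.h>

	/* RADIX_TREE() now expands to RADIX_TREE_INIT(demo_tree, GFP_KERNEL). */
	static RADIX_TREE(demo_tree, GFP_KERNEL);

	static void *demo_erase(unsigned long index)
	{
		void *entry;

		xa_lock(&demo_tree);		/* takes demo_tree.xa_lock */
		entry = radix_tree_delete(&demo_tree, index);
		xa_unlock(&demo_tree);

		return entry;
	}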

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 fs/f2fs/gc.c                   |  2 +-
 include/linux/idr.h            | 12 ++++++------
 include/linux/radix-tree.h     |  7 +++++--
 include/linux/xarray.h         | 34 ++++++++++++++++++++++++++++++++++
 kernel/pid.c                   |  2 +-
 tools/include/linux/spinlock.h |  1 +
 6 files changed, 48 insertions(+), 10 deletions(-)
 create mode 100644 include/linux/xarray.h

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 5d5bba462f26..2cf52b92db5c 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -991,7 +991,7 @@ int f2fs_gc(struct f2fs_sb_info *sbi, bool sync,
 	unsigned int init_segno = segno;
 	struct gc_inode_list gc_list = {
 		.ilist = LIST_HEAD_INIT(gc_list.ilist),
-		.iroot = RADIX_TREE_INIT(GFP_NOFS),
+		.iroot = RADIX_TREE_INIT(gc_list.iroot, GFP_NOFS),
 	};
 
 	trace_f2fs_gc_begin(sbi->sb, sync, background,
diff --git a/include/linux/idr.h b/include/linux/idr.h
index 87ae635fcee6..3c3346441cbc 100644
--- a/include/linux/idr.h
+++ b/include/linux/idr.h
@@ -29,11 +29,11 @@ struct idr {
 /* Set the IDR flag and the IDR_FREE tag */
 #define IDR_RT_MARKER		((__force gfp_t)(3 << __GFP_BITS_SHIFT))
 
-#define IDR_INIT							\
+#define IDR_INIT(name)							\
 {									\
-	.idr_rt = RADIX_TREE_INIT(IDR_RT_MARKER)			\
+	.idr_rt = RADIX_TREE_INIT(name, IDR_RT_MARKER)			\
 }
-#define DEFINE_IDR(name)	struct idr name = IDR_INIT
+#define DEFINE_IDR(name)	struct idr name = IDR_INIT(name)
 
 /**
  * DOC: idr sync
@@ -166,10 +166,10 @@ struct ida {
 	struct radix_tree_root	ida_rt;
 };
 
-#define IDA_INIT	{						\
-	.ida_rt = RADIX_TREE_INIT(IDR_RT_MARKER | GFP_NOWAIT),		\
+#define IDA_INIT(name)	{						\
+	.ida_rt = RADIX_TREE_INIT(name, IDR_RT_MARKER | GFP_NOWAIT),	\
 }
-#define DEFINE_IDA(name)	struct ida name = IDA_INIT
+#define DEFINE_IDA(name)	struct ida name = IDA_INIT(name)
 
 int ida_pre_get(struct ida *ida, gfp_t gfp_mask);
 int ida_get_new_above(struct ida *ida, int starting_id, int *p_id);
diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index fc55ff31eca7..b4d60295decd 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -109,20 +109,23 @@ struct radix_tree_node {
 #define ROOT_TAG_SHIFT	(__GFP_BITS_SHIFT + 1)
 
 struct radix_tree_root {
+	spinlock_t		xa_lock;
 	gfp_t			gfp_mask;
 	struct radix_tree_node	__rcu *rnode;
 };
 
-#define RADIX_TREE_INIT(mask)	{					\
+#define RADIX_TREE_INIT(name, mask)	{				\
+	.xa_lock = __SPIN_LOCK_UNLOCKED(name.xa_lock),			\
 	.gfp_mask = (mask),						\
 	.rnode = NULL,							\
 }
 
 #define RADIX_TREE(name, mask) \
-	struct radix_tree_root name = RADIX_TREE_INIT(mask)
+	struct radix_tree_root name = RADIX_TREE_INIT(name, mask)
 
 #define INIT_RADIX_TREE(root, mask)					\
 do {									\
+	(root)->xa_lock = __SPIN_LOCK_UNLOCKED(root.xa_lock);		\
 	(root)->gfp_mask = (mask);					\
 	(root)->rnode = NULL;						\
 } while (0)
diff --git a/include/linux/xarray.h b/include/linux/xarray.h
new file mode 100644
index 000000000000..a5a933925b85
--- /dev/null
+++ b/include/linux/xarray.h
@@ -0,0 +1,34 @@
+#ifndef _LINUX_XARRAY_H
+#define _LINUX_XARRAY_H
+/*
+ * eXtensible Arrays
+ * Copyright (c) 2017 Microsoft Corporation
+ * Author: Matthew Wilcox <mawilcox@microsoft.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/spinlock.h>
+
+#define xa_trylock(xa)		spin_trylock(&(xa)->xa_lock)
+#define xa_lock(xa)		spin_lock(&(xa)->xa_lock)
+#define xa_unlock(xa)		spin_unlock(&(xa)->xa_lock)
+#define xa_lock_bh(xa)		spin_lock_bh(&(xa)->xa_lock)
+#define xa_unlock_bh(xa)	spin_unlock_bh(&(xa)->xa_lock)
+#define xa_lock_irq(xa)		spin_lock_irq(&(xa)->xa_lock)
+#define xa_unlock_irq(xa)	spin_unlock_irq(&(xa)->xa_lock)
+#define xa_lock_irqsave(xa, flags) \
+				spin_lock_irqsave(&(xa)->xa_lock, flags)
+#define xa_unlock_irqrestore(xa, flags) \
+				spin_unlock_irqrestore(&(xa)->xa_lock, flags)
+#define xa_lock_held(xa)	lockdep_is_held(&(xa)->xa_lock)
+
+#endif /* _LINUX_XARRAY_H */
diff --git a/kernel/pid.c b/kernel/pid.c
index aae5f3307c35..1df901f32b54 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -58,7 +58,7 @@ int pid_max_max = PID_MAX_LIMIT;
  */
 struct pid_namespace init_pid_ns = {
 	.kref = KREF_INIT(2),
-	.idr = IDR_INIT,
+	.idr = IDR_INIT(init_pid_ns.idr),
 	.pid_allocated = PIDNS_ADDING,
 	.level = 0,
 	.child_reaper = &init_task,
diff --git a/tools/include/linux/spinlock.h b/tools/include/linux/spinlock.h
index 4ed569fcb139..b21b586b9854 100644
--- a/tools/include/linux/spinlock.h
+++ b/tools/include/linux/spinlock.h
@@ -7,6 +7,7 @@
 
 #define spinlock_t		pthread_mutex_t
 #define DEFINE_SPINLOCK(x)	pthread_mutex_t x = PTHREAD_MUTEX_INITIALIZER;
+#define __SPIN_LOCK_UNLOCKED(x)	(pthread_mutex_t)PTHREAD_MUTEX_INITIALIZER
 
 #define spin_lock_irqsave(x, f)		(void)f, pthread_mutex_lock(x)
 #define spin_unlock_irqrestore(x, f)	(void)f, pthread_mutex_unlock(x)
-- 
2.15.0

* [PATCH 15/62] page cache: Use xa_lock
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:06   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:06 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

Remove the address_space ->tree_lock and use the xa_lock newly added to
the radix_tree_root.  Rename the address_space ->page_tree to ->pages,
since we don't really care that it's a tree.  Take the opportunity to
rearrange the elements of address_space to pack them better on 64-bit,
and make the comments more useful.
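
The recurring conversion pattern looks roughly like this (a sketch, not a hunk
from the patch; mapping is an address_space and index a page index in scope):

	/* Before: the page cache carried its own spinlock. */
	spin_lock_irq(&mapping->tree_lock);
	radix_tree_tag_clear(&mapping->page_tree, index, PAGECACHE_TAG_DIRTY);
	spin_unlock_irq(&mapping->tree_lock);

	/* After: use the lock embedded in the renamed ->pages tree. */
	xa_lock_irq(&mapping->pages);
	radix_tree_tag_clear(&mapping->pages, index, PAGECACHE_TAG_DIRTY);
	xa_unlock_irq(&mapping->pages);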

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 Documentation/cgroup-v1/memory.txt              |   2 +-
 Documentation/vm/page_migration                 |  14 ++--
 arch/arm/include/asm/cacheflush.h               |   6 +-
 arch/nios2/include/asm/cacheflush.h             |   6 +-
 arch/parisc/include/asm/cacheflush.h            |   6 +-
 drivers/staging/lustre/lustre/llite/glimpse.c   |   2 +-
 drivers/staging/lustre/lustre/mdc/mdc_request.c |   8 +-
 fs/afs/write.c                                  |   2 +-
 fs/btrfs/compression.c                          |   2 +-
 fs/btrfs/extent_io.c                            |   8 +-
 fs/btrfs/inode.c                                |   2 +-
 fs/buffer.c                                     |  10 +--
 fs/cifs/file.c                                  |   2 +-
 fs/dax.c                                        | 106 ++++++++++++------------
 fs/f2fs/data.c                                  |   6 +-
 fs/f2fs/dir.c                                   |   6 +-
 fs/f2fs/inline.c                                |   6 +-
 fs/f2fs/node.c                                  |   8 +-
 fs/fs-writeback.c                               |  18 ++--
 fs/inode.c                                      |  11 ++-
 fs/nilfs2/btnode.c                              |  20 ++---
 fs/nilfs2/page.c                                |  22 ++---
 include/linux/backing-dev.h                     |  12 +--
 include/linux/fs.h                              |  17 ++--
 include/linux/mm.h                              |   2 +-
 include/linux/pagemap.h                         |   4 +-
 mm/filemap.c                                    |  83 +++++++++----------
 mm/huge_memory.c                                |  10 +--
 mm/khugepaged.c                                 |  49 +++++------
 mm/memcontrol.c                                 |   2 +-
 mm/migrate.c                                    |  31 ++++---
 mm/page-writeback.c                             |  42 +++++-----
 mm/readahead.c                                  |   2 +-
 mm/rmap.c                                       |   4 +-
 mm/shmem.c                                      |  60 +++++++-------
 mm/swap_state.c                                 |  17 ++--
 mm/truncate.c                                   |  22 ++---
 mm/vmscan.c                                     |  12 +--
 mm/workingset.c                                 |  22 ++---
 39 files changed, 322 insertions(+), 342 deletions(-)

diff --git a/Documentation/cgroup-v1/memory.txt b/Documentation/cgroup-v1/memory.txt
index cefb63639070..1d17fb0405ef 100644
--- a/Documentation/cgroup-v1/memory.txt
+++ b/Documentation/cgroup-v1/memory.txt
@@ -262,7 +262,7 @@ When oom event notifier is registered, event will be delivered.
 2.6 Locking
 
    lock_page_cgroup()/unlock_page_cgroup() should not be called under
-   mapping->tree_lock.
+   mapping xa_lock.
 
    Other lock order is following:
    PG_locked.
diff --git a/Documentation/vm/page_migration b/Documentation/vm/page_migration
index 0478ae2ad44a..faf849596a85 100644
--- a/Documentation/vm/page_migration
+++ b/Documentation/vm/page_migration
@@ -90,7 +90,7 @@ Steps:
 
 1. Lock the page to be migrated
 
-2. Insure that writeback is complete.
+2. Ensure that writeback is complete.
 
 3. Lock the new page that we want to move to. It is locked so that accesses to
    this (not yet uptodate) page immediately lock while the move is in progress.
@@ -100,8 +100,8 @@ Steps:
    mapcount is not zero then we do not migrate the page. All user space
    processes that attempt to access the page will now wait on the page lock.
 
-5. The radix tree lock is taken. This will cause all processes trying
-   to access the page via the mapping to block on the radix tree spinlock.
+5. The address space xa_lock is taken. This will cause all processes trying
+   to access the page via the mapping to block on the spinlock.
 
 6. The refcount of the page is examined and we back out if references remain
    otherwise we know that we are the only one referencing this page.
@@ -114,12 +114,12 @@ Steps:
 
 9. The radix tree is changed to point to the new page.
 
-10. The reference count of the old page is dropped because the radix tree
+10. The reference count of the old page is dropped because the address space
     reference is gone. A reference to the new page is established because
-    the new page is referenced to by the radix tree.
+    the new page is referenced by the address space.
 
-11. The radix tree lock is dropped. With that lookups in the mapping
-    become possible again. Processes will move from spinning on the tree_lock
+11. The address space xa_lock is dropped. With that lookups in the mapping
+    become possible again. Processes will move from spinning on the xa_lock
     to sleeping on the locked new page.
 
 12. The page contents are copied to the new page.
diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index 74504b154256..f4ead9a74b7d 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -318,10 +318,8 @@ static inline void flush_anon_page(struct vm_area_struct *vma,
 #define ARCH_HAS_FLUSH_KERNEL_DCACHE_PAGE
 extern void flush_kernel_dcache_page(struct page *);
 
-#define flush_dcache_mmap_lock(mapping) \
-	spin_lock_irq(&(mapping)->tree_lock)
-#define flush_dcache_mmap_unlock(mapping) \
-	spin_unlock_irq(&(mapping)->tree_lock)
+#define flush_dcache_mmap_lock(mapping)		xa_lock_irq(&mapping->pages)
+#define flush_dcache_mmap_unlock(mapping)	xa_unlock_irq(&mapping->pages)
 
 #define flush_icache_user_range(vma,page,addr,len) \
 	flush_dcache_page(page)
diff --git a/arch/nios2/include/asm/cacheflush.h b/arch/nios2/include/asm/cacheflush.h
index 55e383c173f7..7a6eda381964 100644
--- a/arch/nios2/include/asm/cacheflush.h
+++ b/arch/nios2/include/asm/cacheflush.h
@@ -46,9 +46,7 @@ extern void copy_from_user_page(struct vm_area_struct *vma, struct page *page,
 extern void flush_dcache_range(unsigned long start, unsigned long end);
 extern void invalidate_dcache_range(unsigned long start, unsigned long end);
 
-#define flush_dcache_mmap_lock(mapping) \
-	spin_lock_irq(&(mapping)->tree_lock)
-#define flush_dcache_mmap_unlock(mapping) \
-	spin_unlock_irq(&(mapping)->tree_lock)
+#define flush_dcache_mmap_lock(mapping)		xa_lock_irq(&mapping->pages)
+#define flush_dcache_mmap_unlock(mapping)	xa_unlock_irq(&mapping->pages)
 
 #endif /* _ASM_NIOS2_CACHEFLUSH_H */
diff --git a/arch/parisc/include/asm/cacheflush.h b/arch/parisc/include/asm/cacheflush.h
index 3742508cc534..b772dd320118 100644
--- a/arch/parisc/include/asm/cacheflush.h
+++ b/arch/parisc/include/asm/cacheflush.h
@@ -54,10 +54,8 @@ void invalidate_kernel_vmap_range(void *vaddr, int size);
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
 extern void flush_dcache_page(struct page *page);
 
-#define flush_dcache_mmap_lock(mapping) \
-	spin_lock_irq(&(mapping)->tree_lock)
-#define flush_dcache_mmap_unlock(mapping) \
-	spin_unlock_irq(&(mapping)->tree_lock)
+#define flush_dcache_mmap_lock(mapping)		xa_lock_irq(&mapping->pages)
+#define flush_dcache_mmap_unlock(mapping)	xa_unlock_irq(&mapping->pages)
 
 #define flush_icache_page(vma,page)	do { 		\
 	flush_kernel_dcache_page(page);			\
diff --git a/drivers/staging/lustre/lustre/llite/glimpse.c b/drivers/staging/lustre/lustre/llite/glimpse.c
index c43ac574274c..5f2843da911c 100644
--- a/drivers/staging/lustre/lustre/llite/glimpse.c
+++ b/drivers/staging/lustre/lustre/llite/glimpse.c
@@ -69,7 +69,7 @@ blkcnt_t dirty_cnt(struct inode *inode)
 	void	      *results[1];
 
 	if (inode->i_mapping)
-		cnt += radix_tree_gang_lookup_tag(&inode->i_mapping->page_tree,
+		cnt += radix_tree_gang_lookup_tag(&inode->i_mapping->pages,
 						  results, 0, 1,
 						  PAGECACHE_TAG_DIRTY);
 	if (cnt == 0 && atomic_read(&vob->vob_mmap_cnt) > 0)
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_request.c b/drivers/staging/lustre/lustre/mdc/mdc_request.c
index 03e55bca4ada..45dcf9f958d4 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_request.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_request.c
@@ -937,14 +937,14 @@ static struct page *mdc_page_locate(struct address_space *mapping, __u64 *hash,
 	struct page *page;
 	int found;
 
-	spin_lock_irq(&mapping->tree_lock);
-	found = radix_tree_gang_lookup(&mapping->page_tree,
+	xa_lock_irq(&mapping->pages);
+	found = radix_tree_gang_lookup(&mapping->pages,
 				       (void **)&page, offset, 1);
 	if (found > 0 && !radix_tree_exceptional_entry(page)) {
 		struct lu_dirpage *dp;
 
 		get_page(page);
-		spin_unlock_irq(&mapping->tree_lock);
+		xa_unlock_irq(&mapping->pages);
 		/*
 		 * In contrast to find_lock_page() we are sure that directory
 		 * page cannot be truncated (while DLM lock is held) and,
@@ -992,7 +992,7 @@ static struct page *mdc_page_locate(struct address_space *mapping, __u64 *hash,
 			page = ERR_PTR(-EIO);
 		}
 	} else {
-		spin_unlock_irq(&mapping->tree_lock);
+		xa_unlock_irq(&mapping->pages);
 		page = NULL;
 	}
 	return page;
diff --git a/fs/afs/write.c b/fs/afs/write.c
index 18e46e31523c..eae4c80ec09d 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -563,7 +563,7 @@ static int afs_writepages_region(struct address_space *mapping,
 
 		_debug("wback %lx", page->index);
 
-		/* at this point we hold neither mapping->tree_lock nor lock on
+		/* at this point we hold neither mapping xa_lock nor lock on
 		 * the page itself: the page may be truncated or invalidated
 		 * (changing page->mapping to NULL), or even swizzled back from
 		 * swapper_space to tmpfs file mapping
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index b35ce16b3df3..541bd536c815 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -449,7 +449,7 @@ static noinline int add_ra_bio_pages(struct inode *inode,
 			break;
 
 		rcu_read_lock();
-		page = radix_tree_lookup(&mapping->page_tree, pg_index);
+		page = radix_tree_lookup(&mapping->pages, pg_index);
 		rcu_read_unlock();
 		if (page && !radix_tree_exceptional_entry(page)) {
 			misses++;
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 16045ea86fc1..184096a709a9 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3966,7 +3966,7 @@ static int extent_write_cache_pages(struct address_space *mapping,
 
 			done_index = page->index;
 			/*
-			 * At this point we hold neither mapping->tree_lock nor
+			 * At this point we hold neither mapping xa_lock nor
 			 * lock on the page itself: the page may be truncated or
 			 * invalidated (changing page->mapping to NULL), or even
 			 * swizzled back from swapper_space to tmpfs file
@@ -5196,13 +5196,13 @@ void clear_extent_buffer_dirty(struct extent_buffer *eb)
 		WARN_ON(!PagePrivate(page));
 
 		clear_page_dirty_for_io(page);
-		spin_lock_irq(&page->mapping->tree_lock);
+		xa_lock_irq(&page->mapping->pages);
 		if (!PageDirty(page)) {
-			radix_tree_tag_clear(&page->mapping->page_tree,
+			radix_tree_tag_clear(&page->mapping->pages,
 						page_index(page),
 						PAGECACHE_TAG_DIRTY);
 		}
-		spin_unlock_irq(&page->mapping->tree_lock);
+		xa_unlock_irq(&page->mapping->pages);
 		ClearPageError(page);
 		unlock_page(page);
 	}
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b93fe05a39c7..9ceee45ed513 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7527,7 +7527,7 @@ noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len,
 
 bool btrfs_page_exists_in_range(struct inode *inode, loff_t start, loff_t end)
 {
-	struct radix_tree_root *root = &inode->i_mapping->page_tree;
+	struct radix_tree_root *root = &inode->i_mapping->pages;
 	bool found = false;
 	void **pagep = NULL;
 	struct page *page = NULL;
diff --git a/fs/buffer.c b/fs/buffer.c
index fd894c2ae284..33c08624d45b 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -195,7 +195,7 @@ EXPORT_SYMBOL(end_buffer_write_sync);
  * Hack idea: for the blockdev mapping, i_bufferlist_lock contention
  * may be quite high.  This code could TryLock the page, and if that
  * succeeds, there is no need to take private_lock. (But if
- * private_lock is contended then so is mapping->tree_lock).
+ * private_lock is contended then so is mapping xa_lock).
  */
 static struct buffer_head *
 __find_get_block_slow(struct block_device *bdev, sector_t block)
@@ -606,14 +606,14 @@ void __set_page_dirty(struct page *page, struct address_space *mapping,
 {
 	unsigned long flags;
 
-	spin_lock_irqsave(&mapping->tree_lock, flags);
+	xa_lock_irqsave(&mapping->pages, flags);
 	if (page->mapping) {	/* Race with truncate? */
 		WARN_ON_ONCE(warn && !PageUptodate(page));
 		account_page_dirtied(page, mapping);
-		radix_tree_tag_set(&mapping->page_tree,
+		radix_tree_tag_set(&mapping->pages,
 				page_index(page), PAGECACHE_TAG_DIRTY);
 	}
-	spin_unlock_irqrestore(&mapping->tree_lock, flags);
+	xa_unlock_irqrestore(&mapping->pages, flags);
 }
 EXPORT_SYMBOL_GPL(__set_page_dirty);
 
@@ -1102,7 +1102,7 @@ __getblk_slow(struct block_device *bdev, sector_t block,
  * inode list.
  *
  * mark_buffer_dirty() is atomic.  It takes bh->b_page->mapping->private_lock,
- * mapping->tree_lock and mapping->host->i_lock.
+ * mapping xa_lock and mapping->host->i_lock.
  */
 void mark_buffer_dirty(struct buffer_head *bh)
 {
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 3a85df2a9baf..ca7084825622 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1987,7 +1987,7 @@ wdata_prepare_pages(struct cifs_writedata *wdata, unsigned int found_pages,
 	for (i = 0; i < found_pages; i++) {
 		page = wdata->pages[i];
 		/*
-		 * At this point we hold neither mapping->tree_lock nor
+		 * At this point we hold neither mapping xa_lock nor
 		 * lock on the page itself: the page may be truncated or
 		 * invalidated (changing page->mapping to NULL), or even
 		 * swizzled back from swapper_space to tmpfs file
diff --git a/fs/dax.c b/fs/dax.c
index 95981591977a..93e6b79687af 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -158,7 +158,7 @@ static int wake_exceptional_entry_func(wait_queue_entry_t *wait, unsigned int mo
 }
 
 /*
- * We do not necessarily hold the mapping->tree_lock when we call this
+ * We do not necessarily hold the mapping xa_lock when we call this
  * function so it is possible that 'entry' is no longer a valid item in the
  * radix tree.  This is okay because all we really need to do is to find the
  * correct waitqueue where tasks might be waiting for that old 'entry' and
@@ -174,7 +174,7 @@ static void dax_wake_mapping_entry_waiter(struct address_space *mapping,
 
 	/*
 	 * Checking for locked entry and prepare_to_wait_exclusive() happens
-	 * under mapping->tree_lock, ditto for entry handling in our callers.
+	 * under mapping xa_lock, ditto for entry handling in our callers.
 	 * So at this point all tasks that could have seen our entry locked
 	 * must be in the waitqueue and the following check will see them.
 	 */
@@ -184,40 +184,40 @@ static void dax_wake_mapping_entry_waiter(struct address_space *mapping,
 
 /*
  * Check whether the given slot is locked. The function must be called with
- * mapping->tree_lock held
+ * mapping xa_lock held
  */
 static inline int slot_locked(struct address_space *mapping, void **slot)
 {
 	unsigned long entry = (unsigned long)
-		radix_tree_deref_slot_protected(slot, &mapping->tree_lock);
+		radix_tree_deref_slot_protected(slot, &mapping->pages.xa_lock);
 	return entry & RADIX_DAX_ENTRY_LOCK;
 }
 
 /*
  * Mark the given slot is locked. The function must be called with
- * mapping->tree_lock held
+ * mapping xa_lock held
  */
 static inline void *lock_slot(struct address_space *mapping, void **slot)
 {
 	unsigned long entry = (unsigned long)
-		radix_tree_deref_slot_protected(slot, &mapping->tree_lock);
+		radix_tree_deref_slot_protected(slot, &mapping->pages.xa_lock);
 
 	entry |= RADIX_DAX_ENTRY_LOCK;
-	radix_tree_replace_slot(&mapping->page_tree, slot, (void *)entry);
+	radix_tree_replace_slot(&mapping->pages, slot, (void *)entry);
 	return (void *)entry;
 }
 
 /*
  * Mark the given slot is unlocked. The function must be called with
- * mapping->tree_lock held
+ * mapping xa_lock held
  */
 static inline void *unlock_slot(struct address_space *mapping, void **slot)
 {
 	unsigned long entry = (unsigned long)
-		radix_tree_deref_slot_protected(slot, &mapping->tree_lock);
+		radix_tree_deref_slot_protected(slot, &mapping->pages.xa_lock);
 
 	entry &= ~(unsigned long)RADIX_DAX_ENTRY_LOCK;
-	radix_tree_replace_slot(&mapping->page_tree, slot, (void *)entry);
+	radix_tree_replace_slot(&mapping->pages, slot, (void *)entry);
 	return (void *)entry;
 }
 
@@ -228,7 +228,7 @@ static inline void *unlock_slot(struct address_space *mapping, void **slot)
  * put_locked_mapping_entry() when he locked the entry and now wants to
  * unlock it.
  *
- * The function must be called with mapping->tree_lock held.
+ * The function must be called with mapping xa_lock held.
  */
 static void *get_unlocked_mapping_entry(struct address_space *mapping,
 					pgoff_t index, void ***slotp)
@@ -241,7 +241,7 @@ static void *get_unlocked_mapping_entry(struct address_space *mapping,
 	ewait.wait.func = wake_exceptional_entry_func;
 
 	for (;;) {
-		entry = __radix_tree_lookup(&mapping->page_tree, index, NULL,
+		entry = __radix_tree_lookup(&mapping->pages, index, NULL,
 					  &slot);
 		if (!entry ||
 		    WARN_ON_ONCE(!radix_tree_exceptional_entry(entry)) ||
@@ -254,10 +254,10 @@ static void *get_unlocked_mapping_entry(struct address_space *mapping,
 		wq = dax_entry_waitqueue(mapping, index, entry, &ewait.key);
 		prepare_to_wait_exclusive(wq, &ewait.wait,
 					  TASK_UNINTERRUPTIBLE);
-		spin_unlock_irq(&mapping->tree_lock);
+		xa_unlock_irq(&mapping->pages);
 		schedule();
 		finish_wait(wq, &ewait.wait);
-		spin_lock_irq(&mapping->tree_lock);
+		xa_lock_irq(&mapping->pages);
 	}
 }
 
@@ -266,15 +266,15 @@ static void dax_unlock_mapping_entry(struct address_space *mapping,
 {
 	void *entry, **slot;
 
-	spin_lock_irq(&mapping->tree_lock);
-	entry = __radix_tree_lookup(&mapping->page_tree, index, NULL, &slot);
+	xa_lock_irq(&mapping->pages);
+	entry = __radix_tree_lookup(&mapping->pages, index, NULL, &slot);
 	if (WARN_ON_ONCE(!entry || !radix_tree_exceptional_entry(entry) ||
 			 !slot_locked(mapping, slot))) {
-		spin_unlock_irq(&mapping->tree_lock);
+		xa_unlock_irq(&mapping->pages);
 		return;
 	}
 	unlock_slot(mapping, slot);
-	spin_unlock_irq(&mapping->tree_lock);
+	xa_unlock_irq(&mapping->pages);
 	dax_wake_mapping_entry_waiter(mapping, index, entry, false);
 }
 
@@ -331,7 +331,7 @@ static void *grab_mapping_entry(struct address_space *mapping, pgoff_t index,
 	void *entry, **slot;
 
 restart:
-	spin_lock_irq(&mapping->tree_lock);
+	xa_lock_irq(&mapping->pages);
 	entry = get_unlocked_mapping_entry(mapping, index, &slot);
 
 	if (WARN_ON_ONCE(entry && !radix_tree_exceptional_entry(entry))) {
@@ -363,12 +363,12 @@ static void *grab_mapping_entry(struct address_space *mapping, pgoff_t index,
 		if (pmd_downgrade) {
 			/*
 			 * Make sure 'entry' remains valid while we drop
-			 * mapping->tree_lock.
+			 * mapping xa_lock.
 			 */
 			entry = lock_slot(mapping, slot);
 		}
 
-		spin_unlock_irq(&mapping->tree_lock);
+		xa_unlock_irq(&mapping->pages);
 		/*
 		 * Besides huge zero pages the only other thing that gets
 		 * downgraded are empty entries which don't need to be
@@ -385,26 +385,26 @@ static void *grab_mapping_entry(struct address_space *mapping, pgoff_t index,
 				put_locked_mapping_entry(mapping, index);
 			return ERR_PTR(err);
 		}
-		spin_lock_irq(&mapping->tree_lock);
+		xa_lock_irq(&mapping->pages);
 
 		if (!entry) {
 			/*
-			 * We needed to drop the page_tree lock while calling
+			 * We needed to drop the pages lock while calling
 			 * radix_tree_preload() and we didn't have an entry to
 			 * lock.  See if another thread inserted an entry at
 			 * our index during this time.
 			 */
-			entry = __radix_tree_lookup(&mapping->page_tree, index,
+			entry = __radix_tree_lookup(&mapping->pages, index,
 					NULL, &slot);
 			if (entry) {
 				radix_tree_preload_end();
-				spin_unlock_irq(&mapping->tree_lock);
+				xa_unlock_irq(&mapping->pages);
 				goto restart;
 			}
 		}
 
 		if (pmd_downgrade) {
-			radix_tree_delete(&mapping->page_tree, index);
+			radix_tree_delete(&mapping->pages, index);
 			mapping->nrexceptional--;
 			dax_wake_mapping_entry_waiter(mapping, index, entry,
 					true);
@@ -412,11 +412,11 @@ static void *grab_mapping_entry(struct address_space *mapping, pgoff_t index,
 
 		entry = dax_radix_locked_entry(0, size_flag | RADIX_DAX_EMPTY);
 
-		err = __radix_tree_insert(&mapping->page_tree, index,
+		err = __radix_tree_insert(&mapping->pages, index,
 				dax_radix_order(entry), entry);
 		radix_tree_preload_end();
 		if (err) {
-			spin_unlock_irq(&mapping->tree_lock);
+			xa_unlock_irq(&mapping->pages);
 			/*
 			 * Our insertion of a DAX entry failed, most likely
 			 * because we were inserting a PMD entry and it
@@ -429,12 +429,12 @@ static void *grab_mapping_entry(struct address_space *mapping, pgoff_t index,
 		}
 		/* Good, we have inserted empty locked entry into the tree. */
 		mapping->nrexceptional++;
-		spin_unlock_irq(&mapping->tree_lock);
+		xa_unlock_irq(&mapping->pages);
 		return entry;
 	}
 	entry = lock_slot(mapping, slot);
  out_unlock:
-	spin_unlock_irq(&mapping->tree_lock);
+	xa_unlock_irq(&mapping->pages);
 	return entry;
 }
 
@@ -443,22 +443,22 @@ static int __dax_invalidate_mapping_entry(struct address_space *mapping,
 {
 	int ret = 0;
 	void *entry;
-	struct radix_tree_root *page_tree = &mapping->page_tree;
+	struct radix_tree_root *pages = &mapping->pages;
 
-	spin_lock_irq(&mapping->tree_lock);
+	xa_lock_irq(&mapping->pages);
 	entry = get_unlocked_mapping_entry(mapping, index, NULL);
 	if (!entry || WARN_ON_ONCE(!radix_tree_exceptional_entry(entry)))
 		goto out;
 	if (!trunc &&
-	    (radix_tree_tag_get(page_tree, index, PAGECACHE_TAG_DIRTY) ||
-	     radix_tree_tag_get(page_tree, index, PAGECACHE_TAG_TOWRITE)))
+	    (radix_tree_tag_get(pages, index, PAGECACHE_TAG_DIRTY) ||
+	     radix_tree_tag_get(pages, index, PAGECACHE_TAG_TOWRITE)))
 		goto out;
-	radix_tree_delete(page_tree, index);
+	radix_tree_delete(pages, index);
 	mapping->nrexceptional--;
 	ret = 1;
 out:
 	put_unlocked_mapping_entry(mapping, index, entry);
-	spin_unlock_irq(&mapping->tree_lock);
+	xa_unlock_irq(&mapping->pages);
 	return ret;
 }
 /*
@@ -528,7 +528,7 @@ static void *dax_insert_mapping_entry(struct address_space *mapping,
 				      void *entry, sector_t sector,
 				      unsigned long flags, bool dirty)
 {
-	struct radix_tree_root *page_tree = &mapping->page_tree;
+	struct radix_tree_root *pages = &mapping->pages;
 	void *new_entry;
 	pgoff_t index = vmf->pgoff;
 
@@ -546,7 +546,7 @@ static void *dax_insert_mapping_entry(struct address_space *mapping,
 					PAGE_SIZE, 0);
 	}
 
-	spin_lock_irq(&mapping->tree_lock);
+	xa_lock_irq(&mapping->pages);
 	new_entry = dax_radix_locked_entry(sector, flags);
 
 	if (dax_is_zero_entry(entry) || dax_is_empty_entry(entry)) {
@@ -562,17 +562,17 @@ static void *dax_insert_mapping_entry(struct address_space *mapping,
 		void **slot;
 		void *ret;
 
-		ret = __radix_tree_lookup(page_tree, index, &node, &slot);
+		ret = __radix_tree_lookup(pages, index, &node, &slot);
 		WARN_ON_ONCE(ret != entry);
-		__radix_tree_replace(page_tree, node, slot,
+		__radix_tree_replace(pages, node, slot,
 				     new_entry, NULL);
 		entry = new_entry;
 	}
 
 	if (dirty)
-		radix_tree_tag_set(page_tree, index, PAGECACHE_TAG_DIRTY);
+		radix_tree_tag_set(pages, index, PAGECACHE_TAG_DIRTY);
 
-	spin_unlock_irq(&mapping->tree_lock);
+	xa_unlock_irq(&mapping->pages);
 	return entry;
 }
 
@@ -662,7 +662,7 @@ static int dax_writeback_one(struct block_device *bdev,
 		struct dax_device *dax_dev, struct address_space *mapping,
 		pgoff_t index, void *entry)
 {
-	struct radix_tree_root *page_tree = &mapping->page_tree;
+	struct radix_tree_root *pages = &mapping->pages;
 	void *entry2, **slot, *kaddr;
 	long ret = 0, id;
 	sector_t sector;
@@ -677,7 +677,7 @@ static int dax_writeback_one(struct block_device *bdev,
 	if (WARN_ON(!radix_tree_exceptional_entry(entry)))
 		return -EIO;
 
-	spin_lock_irq(&mapping->tree_lock);
+	xa_lock_irq(&mapping->pages);
 	entry2 = get_unlocked_mapping_entry(mapping, index, &slot);
 	/* Entry got punched out / reallocated? */
 	if (!entry2 || WARN_ON_ONCE(!radix_tree_exceptional_entry(entry2)))
@@ -696,7 +696,7 @@ static int dax_writeback_one(struct block_device *bdev,
 	}
 
 	/* Another fsync thread may have already written back this entry */
-	if (!radix_tree_tag_get(page_tree, index, PAGECACHE_TAG_TOWRITE))
+	if (!radix_tree_tag_get(pages, index, PAGECACHE_TAG_TOWRITE))
 		goto put_unlocked;
 	/* Lock the entry to serialize with page faults */
 	entry = lock_slot(mapping, slot);
@@ -704,11 +704,11 @@ static int dax_writeback_one(struct block_device *bdev,
 	 * We can clear the tag now but we have to be careful so that concurrent
 	 * dax_writeback_one() calls for the same index cannot finish before we
 	 * actually flush the caches. This is achieved as the calls will look
-	 * at the entry only under tree_lock and once they do that they will
+	 * at the entry only under xa_lock and once they do that they will
 	 * see the entry locked and wait for it to unlock.
 	 */
-	radix_tree_tag_clear(page_tree, index, PAGECACHE_TAG_TOWRITE);
-	spin_unlock_irq(&mapping->tree_lock);
+	radix_tree_tag_clear(pages, index, PAGECACHE_TAG_TOWRITE);
+	xa_unlock_irq(&mapping->pages);
 
 	/*
 	 * Even if dax_writeback_mapping_range() was given a wbc->range_start
@@ -726,7 +726,7 @@ static int dax_writeback_one(struct block_device *bdev,
 		goto dax_unlock;
 
 	/*
-	 * dax_direct_access() may sleep, so cannot hold tree_lock over
+	 * dax_direct_access() may sleep, so cannot hold xa_lock over
 	 * its invocation.
 	 */
 	ret = dax_direct_access(dax_dev, pgoff, size / PAGE_SIZE, &kaddr, &pfn);
@@ -746,9 +746,9 @@ static int dax_writeback_one(struct block_device *bdev,
 	 * the pfn mappings are writeprotected and fault waits for mapping
 	 * entry lock.
 	 */
-	spin_lock_irq(&mapping->tree_lock);
-	radix_tree_tag_clear(page_tree, index, PAGECACHE_TAG_DIRTY);
-	spin_unlock_irq(&mapping->tree_lock);
+	xa_lock_irq(&mapping->pages);
+	radix_tree_tag_clear(pages, index, PAGECACHE_TAG_DIRTY);
+	xa_unlock_irq(&mapping->pages);
 	trace_dax_writeback_one(mapping->host, index, size >> PAGE_SHIFT);
  dax_unlock:
 	dax_read_unlock(id);
@@ -757,7 +757,7 @@ static int dax_writeback_one(struct block_device *bdev,
 
  put_unlocked:
 	put_unlocked_mapping_entry(mapping, index, entry2);
-	spin_unlock_irq(&mapping->tree_lock);
+	xa_unlock_irq(&mapping->pages);
 	return ret;
 }
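
The pattern in fs/dax.c above repeats in every file that follows: each
tree_lock acquisition becomes the matching xa_lock helper operating on
the spinlock that the xarray patch earlier in this series embeds in the
struct.  For reference, the shape those helpers are assumed to have is
a set of thin wrappers along these lines (the real definitions live in
that earlier patch, not here):

/* Sketch only -- assumed shape of the helpers used throughout. */
#define xa_lock_irq(xa)		spin_lock_irq(&(xa)->xa_lock)
#define xa_unlock_irq(xa)	spin_unlock_irq(&(xa)->xa_lock)
#define xa_lock_irqsave(xa, flags) \
				spin_lock_irqsave(&(xa)->xa_lock, flags)
#define xa_unlock_irqrestore(xa, flags) \
				spin_unlock_irqrestore(&(xa)->xa_lock, flags)

No locking semantics change in these conversions; only the name of the
lock and the structure it lives in do.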
 
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 516fa0d3ff9c..8f51ac47b77f 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -2172,12 +2172,12 @@ void f2fs_set_page_dirty_nobuffers(struct page *page)
 	SetPageDirty(page);
 	spin_unlock(&mapping->private_lock);
 
-	spin_lock_irqsave(&mapping->tree_lock, flags);
+	xa_lock_irqsave(&mapping->pages, flags);
 	WARN_ON_ONCE(!PageUptodate(page));
 	account_page_dirtied(page, mapping);
-	radix_tree_tag_set(&mapping->page_tree,
+	radix_tree_tag_set(&mapping->pages,
 			page_index(page), PAGECACHE_TAG_DIRTY);
-	spin_unlock_irqrestore(&mapping->tree_lock, flags);
+	xa_unlock_irqrestore(&mapping->pages, flags);
 	unlock_page_memcg(page);
 
 	__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
index 2d98d877c09d..b5515ea6bb2f 100644
--- a/fs/f2fs/dir.c
+++ b/fs/f2fs/dir.c
@@ -739,10 +739,10 @@ void f2fs_delete_entry(struct f2fs_dir_entry *dentry, struct page *page,
 
 	if (bit_pos == NR_DENTRY_IN_BLOCK &&
 			!truncate_hole(dir, page->index, page->index + 1)) {
-		spin_lock_irqsave(&mapping->tree_lock, flags);
-		radix_tree_tag_clear(&mapping->page_tree, page_index(page),
+		xa_lock_irqsave(&mapping->pages, flags);
+		radix_tree_tag_clear(&mapping->pages, page_index(page),
 				     PAGECACHE_TAG_DIRTY);
-		spin_unlock_irqrestore(&mapping->tree_lock, flags);
+		xa_unlock_irqrestore(&mapping->pages, flags);
 
 		clear_page_dirty_for_io(page);
 		ClearPagePrivate(page);
diff --git a/fs/f2fs/inline.c b/fs/f2fs/inline.c
index 90e38d8ea688..7858b8e15f33 100644
--- a/fs/f2fs/inline.c
+++ b/fs/f2fs/inline.c
@@ -226,10 +226,10 @@ int f2fs_write_inline_data(struct inode *inode, struct page *page)
 	kunmap_atomic(src_addr);
 	set_page_dirty(dn.inode_page);
 
-	spin_lock_irqsave(&mapping->tree_lock, flags);
-	radix_tree_tag_clear(&mapping->page_tree, page_index(page),
+	xa_lock_irqsave(&mapping->pages, flags);
+	radix_tree_tag_clear(&mapping->pages, page_index(page),
 			     PAGECACHE_TAG_DIRTY);
-	spin_unlock_irqrestore(&mapping->tree_lock, flags);
+	xa_unlock_irqrestore(&mapping->pages, flags);
 
 	set_inode_flag(inode, FI_APPEND_WRITE);
 	set_inode_flag(inode, FI_DATA_EXIST);
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index d3322752426f..6b64a3009d55 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -91,11 +91,11 @@ static void clear_node_page_dirty(struct page *page)
 	unsigned int long flags;
 
 	if (PageDirty(page)) {
-		spin_lock_irqsave(&mapping->tree_lock, flags);
-		radix_tree_tag_clear(&mapping->page_tree,
+		xa_lock_irqsave(&mapping->pages, flags);
+		radix_tree_tag_clear(&mapping->pages,
 				page_index(page),
 				PAGECACHE_TAG_DIRTY);
-		spin_unlock_irqrestore(&mapping->tree_lock, flags);
+		xa_unlock_irqrestore(&mapping->pages, flags);
 
 		clear_page_dirty_for_io(page);
 		dec_page_count(F2FS_M_SB(mapping), F2FS_DIRTY_NODES);
@@ -1143,7 +1143,7 @@ void ra_node_page(struct f2fs_sb_info *sbi, nid_t nid)
 	f2fs_bug_on(sbi, check_nid_range(sbi, nid));
 
 	rcu_read_lock();
-	apage = radix_tree_lookup(&NODE_MAPPING(sbi)->page_tree, nid);
+	apage = radix_tree_lookup(&NODE_MAPPING(sbi)->pages, nid);
 	rcu_read_unlock();
 	if (apage)
 		return;
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 08f5debd07d1..50b94da97e4f 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -347,9 +347,9 @@ static void inode_switch_wbs_work_fn(struct work_struct *work)
 	 * By the time control reaches here, RCU grace period has passed
 	 * since I_WB_SWITCH assertion and all wb stat update transactions
 	 * between unlocked_inode_to_wb_begin/end() are guaranteed to be
-	 * synchronizing against mapping->tree_lock.
+	 * synchronizing against the mapping's xa_lock.
 	 *
-	 * Grabbing old_wb->list_lock, inode->i_lock and mapping->tree_lock
+	 * Grabbing old_wb->list_lock, inode->i_lock and the mapping's xa_lock
 	 * gives us exclusion against all wb related operations on @inode
 	 * including IO list manipulations and stat updates.
 	 */
@@ -361,7 +361,7 @@ static void inode_switch_wbs_work_fn(struct work_struct *work)
 		spin_lock_nested(&old_wb->list_lock, SINGLE_DEPTH_NESTING);
 	}
 	spin_lock(&inode->i_lock);
-	spin_lock_irq(&mapping->tree_lock);
+	xa_lock_irq(&mapping->pages);
 
 	/*
 	 * Once I_FREEING is visible under i_lock, the eviction path owns
@@ -375,20 +375,20 @@ static void inode_switch_wbs_work_fn(struct work_struct *work)
 	 * to possibly dirty pages while PAGECACHE_TAG_WRITEBACK points to
 	 * pages actually under writeback.
 	 */
-	radix_tree_for_each_tagged(slot, &mapping->page_tree, &iter, 0,
+	radix_tree_for_each_tagged(slot, &mapping->pages, &iter, 0,
 				   PAGECACHE_TAG_DIRTY) {
 		struct page *page = radix_tree_deref_slot_protected(slot,
-							&mapping->tree_lock);
+						&mapping->pages.xa_lock);
 		if (likely(page) && PageDirty(page)) {
 			dec_wb_stat(old_wb, WB_RECLAIMABLE);
 			inc_wb_stat(new_wb, WB_RECLAIMABLE);
 		}
 	}
 
-	radix_tree_for_each_tagged(slot, &mapping->page_tree, &iter, 0,
+	radix_tree_for_each_tagged(slot, &mapping->pages, &iter, 0,
 				   PAGECACHE_TAG_WRITEBACK) {
 		struct page *page = radix_tree_deref_slot_protected(slot,
-							&mapping->tree_lock);
+						&mapping->pages.xa_lock);
 		if (likely(page)) {
 			WARN_ON_ONCE(!PageWriteback(page));
 			dec_wb_stat(old_wb, WB_WRITEBACK);
@@ -430,7 +430,7 @@ static void inode_switch_wbs_work_fn(struct work_struct *work)
 	 */
 	smp_store_release(&inode->i_state, inode->i_state & ~I_WB_SWITCH);
 
-	spin_unlock_irq(&mapping->tree_lock);
+	xa_unlock_irq(&mapping->pages);
 	spin_unlock(&inode->i_lock);
 	spin_unlock(&new_wb->list_lock);
 	spin_unlock(&old_wb->list_lock);
@@ -507,7 +507,7 @@ static void inode_switch_wbs(struct inode *inode, int new_wb_id)
 	/*
 	 * In addition to synchronizing among switchers, I_WB_SWITCH tells
 	 * the RCU protected stat update paths to grab the mapping's
-	 * tree_lock so that stat transfer can synchronize against them.
+	 * xa_lock so that stat transfer can synchronize against them.
 	 * Let's continue after I_WB_SWITCH is guaranteed to be visible.
 	 */
 	call_rcu(&isw->rcu_head, inode_switch_wbs_rcu_fn);
diff --git a/fs/inode.c b/fs/inode.c
index fd401028a309..5dff2f1004e3 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -348,8 +348,7 @@ EXPORT_SYMBOL(inc_nlink);
 void address_space_init_once(struct address_space *mapping)
 {
 	memset(mapping, 0, sizeof(*mapping));
-	INIT_RADIX_TREE(&mapping->page_tree, GFP_ATOMIC | __GFP_ACCOUNT);
-	spin_lock_init(&mapping->tree_lock);
+	INIT_RADIX_TREE(&mapping->pages, GFP_ATOMIC | __GFP_ACCOUNT);
 	init_rwsem(&mapping->i_mmap_rwsem);
 	INIT_LIST_HEAD(&mapping->private_list);
 	spin_lock_init(&mapping->private_lock);
@@ -499,14 +498,14 @@ void clear_inode(struct inode *inode)
 {
 	might_sleep();
 	/*
-	 * We have to cycle tree_lock here because reclaim can be still in the
+	 * We have to cycle the xa_lock here because reclaim can be in the
 	 * process of removing the last page (in __delete_from_page_cache())
-	 * and we must not free mapping under it.
+	 * and we must not free the mapping under it.
 	 */
-	spin_lock_irq(&inode->i_data.tree_lock);
+	xa_lock_irq(&inode->i_data.pages);
 	BUG_ON(inode->i_data.nrpages);
 	BUG_ON(inode->i_data.nrexceptional);
-	spin_unlock_irq(&inode->i_data.tree_lock);
+	xa_unlock_irq(&inode->i_data.pages);
 	BUG_ON(!list_empty(&inode->i_data.private_list));
 	BUG_ON(!(inode->i_state & I_FREEING));
 	BUG_ON(inode->i_state & I_CLEAR);
diff --git a/fs/nilfs2/btnode.c b/fs/nilfs2/btnode.c
index c21e0b4454a6..9e2a00207436 100644
--- a/fs/nilfs2/btnode.c
+++ b/fs/nilfs2/btnode.c
@@ -193,9 +193,9 @@ int nilfs_btnode_prepare_change_key(struct address_space *btnc,
 				       (unsigned long long)oldkey,
 				       (unsigned long long)newkey);
 
-		spin_lock_irq(&btnc->tree_lock);
-		err = radix_tree_insert(&btnc->page_tree, newkey, obh->b_page);
-		spin_unlock_irq(&btnc->tree_lock);
+		xa_lock_irq(&btnc->pages);
+		err = radix_tree_insert(&btnc->pages, newkey, obh->b_page);
+		xa_unlock_irq(&btnc->pages);
 		/*
 		 * Note: page->index will not change to newkey until
 		 * nilfs_btnode_commit_change_key() will be called.
@@ -251,11 +251,11 @@ void nilfs_btnode_commit_change_key(struct address_space *btnc,
 				       (unsigned long long)newkey);
 		mark_buffer_dirty(obh);
 
-		spin_lock_irq(&btnc->tree_lock);
-		radix_tree_delete(&btnc->page_tree, oldkey);
-		radix_tree_tag_set(&btnc->page_tree, newkey,
+		xa_lock_irq(&btnc->pages);
+		radix_tree_delete(&btnc->pages, oldkey);
+		radix_tree_tag_set(&btnc->pages, newkey,
 				   PAGECACHE_TAG_DIRTY);
-		spin_unlock_irq(&btnc->tree_lock);
+		xa_unlock_irq(&btnc->pages);
 
 		opage->index = obh->b_blocknr = newkey;
 		unlock_page(opage);
@@ -283,9 +283,9 @@ void nilfs_btnode_abort_change_key(struct address_space *btnc,
 		return;
 
 	if (nbh == NULL) {	/* blocksize == pagesize */
-		spin_lock_irq(&btnc->tree_lock);
-		radix_tree_delete(&btnc->page_tree, newkey);
-		spin_unlock_irq(&btnc->tree_lock);
+		xa_lock_irq(&btnc->pages);
+		radix_tree_delete(&btnc->pages, newkey);
+		xa_unlock_irq(&btnc->pages);
 		unlock_page(ctxt->bh->b_page);
 	} else
 		brelse(nbh);
diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c
index 68241512d7c1..1c6703efde9e 100644
--- a/fs/nilfs2/page.c
+++ b/fs/nilfs2/page.c
@@ -331,15 +331,15 @@ void nilfs_copy_back_pages(struct address_space *dmap,
 			struct page *page2;
 
 			/* move the page to the destination cache */
-			spin_lock_irq(&smap->tree_lock);
-			page2 = radix_tree_delete(&smap->page_tree, offset);
+			xa_lock_irq(&smap->pages);
+			page2 = radix_tree_delete(&smap->pages, offset);
 			WARN_ON(page2 != page);
 
 			smap->nrpages--;
-			spin_unlock_irq(&smap->tree_lock);
+			xa_unlock_irq(&smap->pages);
 
-			spin_lock_irq(&dmap->tree_lock);
-			err = radix_tree_insert(&dmap->page_tree, offset, page);
+			xa_lock_irq(&dmap->pages);
+			err = radix_tree_insert(&dmap->pages, offset, page);
 			if (unlikely(err < 0)) {
 				WARN_ON(err == -EEXIST);
 				page->mapping = NULL;
@@ -348,11 +348,11 @@ void nilfs_copy_back_pages(struct address_space *dmap,
 				page->mapping = dmap;
 				dmap->nrpages++;
 				if (PageDirty(page))
-					radix_tree_tag_set(&dmap->page_tree,
+					radix_tree_tag_set(&dmap->pages,
 							   offset,
 							   PAGECACHE_TAG_DIRTY);
 			}
-			spin_unlock_irq(&dmap->tree_lock);
+			xa_unlock_irq(&dmap->pages);
 		}
 		unlock_page(page);
 	}
@@ -474,15 +474,15 @@ int __nilfs_clear_page_dirty(struct page *page)
 	struct address_space *mapping = page->mapping;
 
 	if (mapping) {
-		spin_lock_irq(&mapping->tree_lock);
+		xa_lock_irq(&mapping->pages);
 		if (test_bit(PG_dirty, &page->flags)) {
-			radix_tree_tag_clear(&mapping->page_tree,
+			radix_tree_tag_clear(&mapping->pages,
 					     page_index(page),
 					     PAGECACHE_TAG_DIRTY);
-			spin_unlock_irq(&mapping->tree_lock);
+			xa_unlock_irq(&mapping->pages);
 			return clear_page_dirty_for_io(page);
 		}
-		spin_unlock_irq(&mapping->tree_lock);
+		xa_unlock_irq(&mapping->pages);
 		return 0;
 	}
 	return TestClearPageDirty(page);
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index e54e7e0033eb..9038f6c1eeda 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -329,7 +329,7 @@ static inline bool inode_to_wb_is_valid(struct inode *inode)
  * @inode: inode of interest
  *
  * Returns the wb @inode is currently associated with.  The caller must be
- * holding either @inode->i_lock, @inode->i_mapping->tree_lock, or the
+ * holding either @inode->i_lock, @inode->i_mapping->pages.xa_lock, or the
  * associated wb's list_lock.
  */
 static inline struct bdi_writeback *inode_to_wb(struct inode *inode)
@@ -337,7 +337,7 @@ static inline struct bdi_writeback *inode_to_wb(struct inode *inode)
 #ifdef CONFIG_LOCKDEP
 	WARN_ON_ONCE(debug_locks &&
 		     (!lockdep_is_held(&inode->i_lock) &&
-		      !lockdep_is_held(&inode->i_mapping->tree_lock) &&
+		      !xa_lock_held(&inode->i_mapping->pages) &&
 		      !lockdep_is_held(&inode->i_wb->list_lock)));
 #endif
 	return inode->i_wb;
@@ -349,7 +349,7 @@ static inline struct bdi_writeback *inode_to_wb(struct inode *inode)
  * @lockedp: temp bool output param, to be passed to the end function
  *
  * The caller wants to access the wb associated with @inode but isn't
- * holding inode->i_lock, mapping->tree_lock or wb->list_lock.  This
+ * holding inode->i_lock, mapping->pages.xa_lock or wb->list_lock.  This
  * function determines the wb associated with @inode and ensures that the
  * association doesn't change until the transaction is finished with
  * unlocked_inode_to_wb_end().
@@ -370,10 +370,10 @@ unlocked_inode_to_wb_begin(struct inode *inode, bool *lockedp)
 	*lockedp = smp_load_acquire(&inode->i_state) & I_WB_SWITCH;
 
 	if (unlikely(*lockedp))
-		spin_lock_irq(&inode->i_mapping->tree_lock);
+		xa_lock_irq(&inode->i_mapping->pages);
 
 	/*
-	 * Protected by either !I_WB_SWITCH + rcu_read_lock() or tree_lock.
+	 * Protected by either !I_WB_SWITCH + rcu_read_lock() or xa_lock.
 	 * inode_to_wb() will bark.  Deref directly.
 	 */
 	return inode->i_wb;
@@ -387,7 +387,7 @@ unlocked_inode_to_wb_begin(struct inode *inode, bool *lockedp)
 static inline void unlocked_inode_to_wb_end(struct inode *inode, bool locked)
 {
 	if (unlikely(locked))
-		spin_unlock_irq(&inode->i_mapping->tree_lock);
+		xa_unlock_irq(&inode->i_mapping->pages);
 
 	rcu_read_unlock();
 }
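
The lockdep assertion in inode_to_wb() above relies on an
xa_lock_held() helper; the sketch below shows the shape it is assumed
to take -- a thin wrapper over lockdep_is_held() on the embedded lock
(see the xarray side of the series for the real definition):

/* Assumed shape of the lockdep helper used in inode_to_wb() above. */
#define xa_lock_held(xa)	lockdep_is_held(&(xa)->xa_lock)
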
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2995a271ec46..265340149265 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -13,6 +13,7 @@
 #include <linux/list_lru.h>
 #include <linux/llist.h>
 #include <linux/radix-tree.h>
+#include <linux/xarray.h>
 #include <linux/rbtree.h>
 #include <linux/init.h>
 #include <linux/pid.h>
@@ -390,23 +391,21 @@ int pagecache_write_end(struct file *, struct address_space *mapping,
 
 struct address_space {
 	struct inode		*host;		/* owner: inode, block_device */
-	struct radix_tree_root	page_tree;	/* radix tree of all pages */
-	spinlock_t		tree_lock;	/* and lock protecting it */
+	struct radix_tree_root	pages;		/* cached pages */
+	gfp_t			gfp_mask;	/* for allocating pages */
 	atomic_t		i_mmap_writable;/* count VM_SHARED mappings */
 	struct rb_root_cached	i_mmap;		/* tree of private and shared mappings */
 	struct rw_semaphore	i_mmap_rwsem;	/* protect tree, count, list */
-	/* Protected by tree_lock together with the radix tree */
+	/* Protected by pages.xa_lock */
 	unsigned long		nrpages;	/* number of total pages */
-	/* number of shadow or DAX exceptional entries */
-	unsigned long		nrexceptional;
+	unsigned long		nrexceptional;	/* shadow or DAX entries */
 	pgoff_t			writeback_index;/* writeback starts here */
 	const struct address_space_operations *a_ops;	/* methods */
 	unsigned long		flags;		/* error bits */
+	errseq_t		wb_err;
 	spinlock_t		private_lock;	/* for use by the address_space */
-	gfp_t			gfp_mask;	/* implicit gfp mask for allocations */
-	struct list_head	private_list;	/* for use by the address_space */
+	struct list_head	private_list;	/* ditto */
 	void			*private_data;	/* ditto */
-	errseq_t		wb_err;
 } __attribute__((aligned(sizeof(long)))) __randomize_layout;
 	/*
 	 * On most architectures that alignment is already the case; but
@@ -1977,7 +1976,7 @@ static inline void init_sync_kiocb(struct kiocb *kiocb, struct file *filp)
  *
  * I_WB_SWITCH		Cgroup bdi_writeback switching in progress.  Used to
  *			synchronize competing switching instances and to tell
- *			wb stat updates to grab mapping->tree_lock.  See
+ *			wb stat updates to grab mapping->pages.xa_lock.  See
  *			inode_switch_wb_work_fn() for details.
  *
  * I_OVL_INUSE		Used by overlayfs to get exclusive ownership on upper
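
With the lock folded into 'pages', callers either go through the
xa_lock_*() wrappers or, where an API wants the raw spinlock (as
radix_tree_deref_slot_protected() does in several hunks of this patch),
name &mapping->pages.xa_lock directly.  A minimal sketch of the
combined idiom -- sketch_lookup_locked() is a made-up name, purely
illustrative:

/* Illustrative only; not part of the patch. */
static struct page *sketch_lookup_locked(struct address_space *mapping,
					 pgoff_t index)
{
	void **slot;
	struct page *page = NULL;

	xa_lock_irq(&mapping->pages);
	slot = radix_tree_lookup_slot(&mapping->pages, index);
	if (slot)
		page = radix_tree_deref_slot_protected(slot,
						&mapping->pages.xa_lock);
	xa_unlock_irq(&mapping->pages);
	return page;
}
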
diff --git a/include/linux/mm.h b/include/linux/mm.h
index c692451245be..4a6e147dcf9d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -737,7 +737,7 @@ int finish_mkwrite_fault(struct vm_fault *vmf);
  * refcount. Each user mapping also has a reference to the page.
  *
  * The pagecache pages are stored in a per-mapping radix tree, which is
- * rooted at mapping->page_tree, and indexed by offset.
+ * rooted at mapping->pages, and indexed by offset.
  * Where 2.4 and early 2.6 kernels kept dirty/clean pages in per-address_space
  * lists, we instead now tag pages as dirty/writeback in the radix tree.
  *
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 34ce3ebf97d5..80a6149152d4 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -144,7 +144,7 @@ void release_pages(struct page **pages, int nr);
  * 3. check the page is still in pagecache (if no, goto 1)
  *
  * Remove-side that cares about stability of _refcount (eg. reclaim) has the
- * following (with tree_lock held for write):
+ * following (with pages.xa_lock held):
  * A. atomically check refcount is correct and set it to 0 (atomic_cmpxchg)
  * B. remove page from pagecache
  * C. free the page
@@ -157,7 +157,7 @@ void release_pages(struct page **pages, int nr);
  *
  * It is possible that between 1 and 2, the page is removed then the exact same
  * page is inserted into the same position in pagecache. That's OK: the
- * old find_get_page using tree_lock could equally have run before or after
+ * old find_get_page using a lock could equally have run before or after
  * such a re-insertion, depending on order that locks are granted.
  *
  * Lookups racing against pagecache insertion isn't a big problem: either 1
diff --git a/mm/filemap.c b/mm/filemap.c
index ee83baaf855d..5c8f22fe4e62 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -67,7 +67,7 @@
  *  ->i_mmap_rwsem		(truncate_pagecache)
  *    ->private_lock		(__free_pte->__set_page_dirty_buffers)
  *      ->swap_lock		(exclusive_swap_page, others)
- *        ->mapping->tree_lock
+ *        ->mapping->pages.xa_lock
  *
  *  ->i_mutex
  *    ->i_mmap_rwsem		(truncate->unmap_mapping_range)
@@ -75,7 +75,7 @@
  *  ->mmap_sem
  *    ->i_mmap_rwsem
  *      ->page_table_lock or pte_lock	(various, mainly in memory.c)
- *        ->mapping->tree_lock	(arch-dependent flush_dcache_mmap_lock)
+ *        ->mapping->pages.xa_lock	(arch-dependent flush_dcache_mmap_lock)
  *
  *  ->mmap_sem
  *    ->lock_page		(access_process_vm)
@@ -85,7 +85,7 @@
  *
  *  bdi->wb.list_lock
  *    sb_lock			(fs/fs-writeback.c)
- *    ->mapping->tree_lock	(__sync_single_inode)
+ *    ->mapping->pages.xa_lock	(__sync_single_inode)
  *
  *  ->i_mmap_rwsem
  *    ->anon_vma.lock		(vma_adjust)
@@ -96,11 +96,11 @@
  *  ->page_table_lock or pte_lock
  *    ->swap_lock		(try_to_unmap_one)
  *    ->private_lock		(try_to_unmap_one)
- *    ->tree_lock		(try_to_unmap_one)
+ *    ->pages.xa_lock		(try_to_unmap_one)
  *    ->zone_lru_lock(zone)	(follow_page->mark_page_accessed)
  *    ->zone_lru_lock(zone)	(check_pte_range->isolate_lru_page)
  *    ->private_lock		(page_remove_rmap->set_page_dirty)
- *    ->tree_lock		(page_remove_rmap->set_page_dirty)
+ *    ->pages.xa_lock		(page_remove_rmap->set_page_dirty)
  *    bdi.wb->list_lock		(page_remove_rmap->set_page_dirty)
  *    ->inode->i_lock		(page_remove_rmap->set_page_dirty)
  *    ->memcg->move_lock	(page_remove_rmap->lock_page_memcg)
@@ -119,14 +119,14 @@ static int page_cache_tree_insert(struct address_space *mapping,
 	void **slot;
 	int error;
 
-	error = __radix_tree_create(&mapping->page_tree, page->index, 0,
+	error = __radix_tree_create(&mapping->pages, page->index, 0,
 				    &node, &slot);
 	if (error)
 		return error;
 	if (*slot) {
 		void *p;
 
-		p = radix_tree_deref_slot_protected(slot, &mapping->tree_lock);
+		p = radix_tree_deref_slot_protected(slot, &mapping->pages.xa_lock);
 		if (!radix_tree_exceptional_entry(p))
 			return -EEXIST;
 
@@ -134,7 +134,7 @@ static int page_cache_tree_insert(struct address_space *mapping,
 		if (shadowp)
 			*shadowp = p;
 	}
-	__radix_tree_replace(&mapping->page_tree, node, slot, page,
+	__radix_tree_replace(&mapping->pages, node, slot, page,
 			     workingset_lookup_update(mapping));
 	mapping->nrpages++;
 	return 0;
@@ -156,13 +156,13 @@ static void page_cache_tree_delete(struct address_space *mapping,
 		struct radix_tree_node *node;
 		void **slot;
 
-		__radix_tree_lookup(&mapping->page_tree, page->index + i,
+		__radix_tree_lookup(&mapping->pages, page->index + i,
 				    &node, &slot);
 
 		VM_BUG_ON_PAGE(!node && nr != 1, page);
 
-		radix_tree_clear_tags(&mapping->page_tree, node, slot);
-		__radix_tree_replace(&mapping->page_tree, node, slot, shadow,
+		radix_tree_clear_tags(&mapping->pages, node, slot);
+		__radix_tree_replace(&mapping->pages, node, slot, shadow,
 				workingset_lookup_update(mapping));
 	}
 
@@ -254,7 +254,7 @@ static void unaccount_page_cache_page(struct address_space *mapping,
 /*
  * Delete a page from the page cache and free it. Caller has to make
  * sure the page is locked and that nobody else uses it - or that usage
- * is safe.  The caller must hold the mapping's tree_lock.
+ * is safe.  The caller must hold the mapping's xa_lock.
  */
 void __delete_from_page_cache(struct page *page, void *shadow)
 {
@@ -297,9 +297,9 @@ void delete_from_page_cache(struct page *page)
 	unsigned long flags;
 
 	BUG_ON(!PageLocked(page));
-	spin_lock_irqsave(&mapping->tree_lock, flags);
+	xa_lock_irqsave(&mapping->pages, flags);
 	__delete_from_page_cache(page, NULL);
-	spin_unlock_irqrestore(&mapping->tree_lock, flags);
+	xa_unlock_irqrestore(&mapping->pages, flags);
 
 	page_cache_free_page(mapping, page);
 }
@@ -310,14 +310,14 @@ EXPORT_SYMBOL(delete_from_page_cache);
  * @mapping: the mapping to which pages belong
  * @pvec: pagevec with pages to delete
  *
- * The function walks over mapping->page_tree and removes pages passed in @pvec
- * from the radix tree. The function expects @pvec to be sorted by page index.
- * It tolerates holes in @pvec (radix tree entries at those indices are not
+ * The function walks over mapping->pages and removes pages passed in @pvec
+ * from the mapping. The function expects @pvec to be sorted by page index.
+ * It tolerates holes in @pvec (mapping entries at those indices are not
  * modified). The function expects only THP head pages to be present in the
- * @pvec and takes care to delete all corresponding tail pages from the radix
- * tree as well.
+ * @pvec and takes care to delete all corresponding tail pages from the
+ * mapping as well.
  *
- * The function expects mapping->tree_lock to be held.
+ * The function expects the mapping's xa_lock to be held.
  */
 static void
 page_cache_tree_delete_batch(struct address_space *mapping,
@@ -331,11 +331,11 @@ page_cache_tree_delete_batch(struct address_space *mapping,
 	pgoff_t start;
 
 	start = pvec->pages[0]->index;
-	radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) {
+	radix_tree_for_each_slot(slot, &mapping->pages, &iter, start) {
 		if (i >= pagevec_count(pvec) && !tail_pages)
 			break;
 		page = radix_tree_deref_slot_protected(slot,
-						       &mapping->tree_lock);
+						       &mapping->pages.xa_lock);
 		if (radix_tree_exceptional_entry(page))
 			continue;
 		if (!tail_pages) {
@@ -358,8 +358,8 @@ page_cache_tree_delete_batch(struct address_space *mapping,
 		} else {
 			tail_pages--;
 		}
-		radix_tree_clear_tags(&mapping->page_tree, iter.node, slot);
-		__radix_tree_replace(&mapping->page_tree, iter.node, slot, NULL,
+		radix_tree_clear_tags(&mapping->pages, iter.node, slot);
+		__radix_tree_replace(&mapping->pages, iter.node, slot, NULL,
 				workingset_lookup_update(mapping));
 		total_pages++;
 	}
@@ -375,14 +375,14 @@ void delete_from_page_cache_batch(struct address_space *mapping,
 	if (!pagevec_count(pvec))
 		return;
 
-	spin_lock_irqsave(&mapping->tree_lock, flags);
+	xa_lock_irqsave(&mapping->pages, flags);
 	for (i = 0; i < pagevec_count(pvec); i++) {
 		trace_mm_filemap_delete_from_page_cache(pvec->pages[i]);
 
 		unaccount_page_cache_page(mapping, pvec->pages[i]);
 	}
 	page_cache_tree_delete_batch(mapping, pvec);
-	spin_unlock_irqrestore(&mapping->tree_lock, flags);
+	xa_unlock_irqrestore(&mapping->pages, flags);
 
 	for (i = 0; i < pagevec_count(pvec); i++)
 		page_cache_free_page(mapping, pvec->pages[i]);
@@ -799,7 +799,7 @@ int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask)
 		new->mapping = mapping;
 		new->index = offset;
 
-		spin_lock_irqsave(&mapping->tree_lock, flags);
+		xa_lock_irqsave(&mapping->pages, flags);
 		__delete_from_page_cache(old, NULL);
 		error = page_cache_tree_insert(mapping, new, NULL);
 		BUG_ON(error);
@@ -811,7 +811,7 @@ int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask)
 			__inc_node_page_state(new, NR_FILE_PAGES);
 		if (PageSwapBacked(new))
 			__inc_node_page_state(new, NR_SHMEM);
-		spin_unlock_irqrestore(&mapping->tree_lock, flags);
+		xa_unlock_irqrestore(&mapping->pages, flags);
 		mem_cgroup_migrate(old, new);
 		radix_tree_preload_end();
 		if (freepage)
@@ -853,7 +853,7 @@ static int __add_to_page_cache_locked(struct page *page,
 	page->mapping = mapping;
 	page->index = offset;
 
-	spin_lock_irq(&mapping->tree_lock);
+	xa_lock_irq(&mapping->pages);
 	error = page_cache_tree_insert(mapping, page, shadowp);
 	radix_tree_preload_end();
 	if (unlikely(error))
@@ -862,7 +862,7 @@ static int __add_to_page_cache_locked(struct page *page,
 	/* hugetlb pages do not participate in page cache accounting. */
 	if (!huge)
 		__inc_node_page_state(page, NR_FILE_PAGES);
-	spin_unlock_irq(&mapping->tree_lock);
+	xa_unlock_irq(&mapping->pages);
 	if (!huge)
 		mem_cgroup_commit_charge(page, memcg, false, false);
 	trace_mm_filemap_add_to_page_cache(page);
@@ -870,7 +870,7 @@ static int __add_to_page_cache_locked(struct page *page,
 err_insert:
 	page->mapping = NULL;
 	/* Leave page->index set: truncation relies upon it */
-	spin_unlock_irq(&mapping->tree_lock);
+	xa_unlock_irq(&mapping->pages);
 	if (!huge)
 		mem_cgroup_cancel_charge(page, memcg, false);
 	put_page(page);
@@ -1354,7 +1354,7 @@ pgoff_t page_cache_next_hole(struct address_space *mapping,
 	for (i = 0; i < max_scan; i++) {
 		struct page *page;
 
-		page = radix_tree_lookup(&mapping->page_tree, index);
+		page = radix_tree_lookup(&mapping->pages, index);
 		if (!page || radix_tree_exceptional_entry(page))
 			break;
 		index++;
@@ -1395,7 +1395,7 @@ pgoff_t page_cache_prev_hole(struct address_space *mapping,
 	for (i = 0; i < max_scan; i++) {
 		struct page *page;
 
-		page = radix_tree_lookup(&mapping->page_tree, index);
+		page = radix_tree_lookup(&mapping->pages, index);
 		if (!page || radix_tree_exceptional_entry(page))
 			break;
 		index--;
@@ -1428,7 +1428,7 @@ struct page *find_get_entry(struct address_space *mapping, pgoff_t offset)
 	rcu_read_lock();
 repeat:
 	page = NULL;
-	pagep = radix_tree_lookup_slot(&mapping->page_tree, offset);
+	pagep = radix_tree_lookup_slot(&mapping->pages, offset);
 	if (pagep) {
 		page = radix_tree_deref_slot(pagep);
 		if (unlikely(!page))
@@ -1634,7 +1634,7 @@ unsigned find_get_entries(struct address_space *mapping,
 		return 0;
 
 	rcu_read_lock();
-	radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) {
+	radix_tree_for_each_slot(slot, &mapping->pages, &iter, start) {
 		struct page *head, *page;
 repeat:
 		page = radix_tree_deref_slot(slot);
@@ -1711,7 +1711,7 @@ unsigned find_get_pages_range(struct address_space *mapping, pgoff_t *start,
 		return 0;
 
 	rcu_read_lock();
-	radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, *start) {
+	radix_tree_for_each_slot(slot, &mapping->pages, &iter, *start) {
 		struct page *head, *page;
 
 		if (iter.index > end)
@@ -1796,7 +1796,7 @@ unsigned find_get_pages_contig(struct address_space *mapping, pgoff_t index,
 		return 0;
 
 	rcu_read_lock();
-	radix_tree_for_each_contig(slot, &mapping->page_tree, &iter, index) {
+	radix_tree_for_each_contig(slot, &mapping->pages, &iter, index) {
 		struct page *head, *page;
 repeat:
 		page = radix_tree_deref_slot(slot);
@@ -1876,8 +1876,7 @@ unsigned find_get_pages_range_tag(struct address_space *mapping, pgoff_t *index,
 		return 0;
 
 	rcu_read_lock();
-	radix_tree_for_each_tagged(slot, &mapping->page_tree,
-				   &iter, *index, tag) {
+	radix_tree_for_each_tagged(slot, &mapping->pages, &iter, *index, tag) {
 		struct page *head, *page;
 
 		if (iter.index > end)
@@ -1970,8 +1969,7 @@ unsigned find_get_entries_tag(struct address_space *mapping, pgoff_t start,
 		return 0;
 
 	rcu_read_lock();
-	radix_tree_for_each_tagged(slot, &mapping->page_tree,
-				   &iter, start, tag) {
+	radix_tree_for_each_tagged(slot, &mapping->pages, &iter, start, tag) {
 		struct page *head, *page;
 repeat:
 		page = radix_tree_deref_slot(slot);
@@ -2625,8 +2623,7 @@ void filemap_map_pages(struct vm_fault *vmf,
 	struct page *head, *page;
 
 	rcu_read_lock();
-	radix_tree_for_each_slot(slot, &mapping->page_tree, &iter,
-			start_pgoff) {
+	radix_tree_for_each_slot(slot, &mapping->pages, &iter, start_pgoff) {
 		if (iter.index > end_pgoff)
 			break;
 repeat:
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 86fe697e8bfb..1fe17f3b7473 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2468,7 +2468,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 	} else {
 		/* Additional pin to radix tree */
 		page_ref_add(head, 2);
-		spin_unlock(&head->mapping->tree_lock);
+		xa_unlock(&head->mapping->pages);
 	}
 
 	spin_unlock_irqrestore(zone_lru_lock(page_zone(head)), flags);
@@ -2676,15 +2676,15 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 	if (mapping) {
 		void **pslot;
 
-		spin_lock(&mapping->tree_lock);
-		pslot = radix_tree_lookup_slot(&mapping->page_tree,
+		xa_lock(&mapping->pages);
+		pslot = radix_tree_lookup_slot(&mapping->pages,
 				page_index(head));
 		/*
 		 * Check if the head page is present in radix tree.
 		 * We assume all tail are present too, if head is there.
 		 */
 		if (radix_tree_deref_slot_protected(pslot,
-					&mapping->tree_lock) != head)
+					&mapping->pages.xa_lock) != head)
 			goto fail;
 	}
 
@@ -2718,7 +2718,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 		}
 		spin_unlock(&pgdata->split_queue_lock);
 fail:		if (mapping)
-			spin_unlock(&mapping->tree_lock);
+			xa_unlock(&mapping->pages);
 		spin_unlock_irqrestore(zone_lru_lock(page_zone(head)), flags);
 		unfreeze_page(head);
 		ret = -EBUSY;
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index ea4ff259b671..cb4d199bf328 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1339,8 +1339,8 @@ static void collapse_shmem(struct mm_struct *mm,
 	 */
 
 	index = start;
-	spin_lock_irq(&mapping->tree_lock);
-	radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) {
+	xa_lock_irq(&mapping->pages);
+	radix_tree_for_each_slot(slot, &mapping->pages, &iter, start) {
 		int n = min(iter.index, end) - index;
 
 		/*
@@ -1353,7 +1353,7 @@ static void collapse_shmem(struct mm_struct *mm,
 		}
 		nr_none += n;
 		for (; index < min(iter.index, end); index++) {
-			radix_tree_insert(&mapping->page_tree, index,
+			radix_tree_insert(&mapping->pages, index,
 					new_page + (index % HPAGE_PMD_NR));
 		}
 
@@ -1362,16 +1362,16 @@ static void collapse_shmem(struct mm_struct *mm,
 			break;
 
 		page = radix_tree_deref_slot_protected(slot,
-				&mapping->tree_lock);
+				&mapping->pages.xa_lock);
 		if (radix_tree_exceptional_entry(page) || !PageUptodate(page)) {
-			spin_unlock_irq(&mapping->tree_lock);
+			xa_unlock_irq(&mapping->pages);
 			/* swap in or instantiate fallocated page */
 			if (shmem_getpage(mapping->host, index, &page,
 						SGP_NOHUGE)) {
 				result = SCAN_FAIL;
 				goto tree_unlocked;
 			}
-			spin_lock_irq(&mapping->tree_lock);
+			xa_lock_irq(&mapping->pages);
 		} else if (trylock_page(page)) {
 			get_page(page);
 		} else {
@@ -1380,7 +1380,7 @@ static void collapse_shmem(struct mm_struct *mm,
 		}
 
 		/*
-		 * The page must be locked, so we can drop the tree_lock
+		 * The page must be locked, so we can drop the xa_lock
 		 * without racing with truncate.
 		 */
 		VM_BUG_ON_PAGE(!PageLocked(page), page);
@@ -1391,7 +1391,7 @@ static void collapse_shmem(struct mm_struct *mm,
 			result = SCAN_TRUNCATED;
 			goto out_unlock;
 		}
-		spin_unlock_irq(&mapping->tree_lock);
+		xa_unlock_irq(&mapping->pages);
 
 		if (isolate_lru_page(page)) {
 			result = SCAN_DEL_PAGE_LRU;
@@ -1402,11 +1402,11 @@ static void collapse_shmem(struct mm_struct *mm,
 			unmap_mapping_range(mapping, index << PAGE_SHIFT,
 					PAGE_SIZE, 0);
 
-		spin_lock_irq(&mapping->tree_lock);
+		xa_lock_irq(&mapping->pages);
 
-		slot = radix_tree_lookup_slot(&mapping->page_tree, index);
+		slot = radix_tree_lookup_slot(&mapping->pages, index);
 		VM_BUG_ON_PAGE(page != radix_tree_deref_slot_protected(slot,
-					&mapping->tree_lock), page);
+					&mapping->pages.xa_lock), page);
 		VM_BUG_ON_PAGE(page_mapped(page), page);
 
 		/*
@@ -1427,14 +1427,14 @@ static void collapse_shmem(struct mm_struct *mm,
 		list_add_tail(&page->lru, &pagelist);
 
 		/* Finally, replace with the new page. */
-		radix_tree_replace_slot(&mapping->page_tree, slot,
+		radix_tree_replace_slot(&mapping->pages, slot,
 				new_page + (index % HPAGE_PMD_NR));
 
 		slot = radix_tree_iter_resume(slot, &iter);
 		index++;
 		continue;
 out_lru:
-		spin_unlock_irq(&mapping->tree_lock);
+		xa_unlock_irq(&mapping->pages);
 		putback_lru_page(page);
 out_isolate_failed:
 		unlock_page(page);
@@ -1460,14 +1460,14 @@ static void collapse_shmem(struct mm_struct *mm,
 		}
 
 		for (; index < end; index++) {
-			radix_tree_insert(&mapping->page_tree, index,
+			radix_tree_insert(&mapping->pages, index,
 					new_page + (index % HPAGE_PMD_NR));
 		}
 		nr_none += n;
 	}
 
 tree_locked:
-	spin_unlock_irq(&mapping->tree_lock);
+	xa_unlock_irq(&mapping->pages);
 tree_unlocked:
 
 	if (result == SCAN_SUCCEED) {
@@ -1516,9 +1516,8 @@ static void collapse_shmem(struct mm_struct *mm,
 	} else {
 		/* Something went wrong: rollback changes to the radix-tree */
 		shmem_uncharge(mapping->host, nr_none);
-		spin_lock_irq(&mapping->tree_lock);
-		radix_tree_for_each_slot(slot, &mapping->page_tree, &iter,
-				start) {
+		xa_lock_irq(&mapping->pages);
+		radix_tree_for_each_slot(slot, &mapping->pages, &iter, start) {
 			if (iter.index >= end)
 				break;
 			page = list_first_entry_or_null(&pagelist,
@@ -1528,8 +1527,7 @@ static void collapse_shmem(struct mm_struct *mm,
 					break;
 				nr_none--;
 				/* Put holes back where they were */
-				radix_tree_delete(&mapping->page_tree,
-						  iter.index);
+				radix_tree_delete(&mapping->pages, iter.index);
 				continue;
 			}
 
@@ -1538,16 +1536,15 @@ static void collapse_shmem(struct mm_struct *mm,
 			/* Unfreeze the page. */
 			list_del(&page->lru);
 			page_ref_unfreeze(page, 2);
-			radix_tree_replace_slot(&mapping->page_tree,
-						slot, page);
+			radix_tree_replace_slot(&mapping->pages, slot, page);
 			slot = radix_tree_iter_resume(slot, &iter);
-			spin_unlock_irq(&mapping->tree_lock);
+			xa_unlock_irq(&mapping->pages);
 			putback_lru_page(page);
 			unlock_page(page);
-			spin_lock_irq(&mapping->tree_lock);
+			xa_lock_irq(&mapping->pages);
 		}
 		VM_BUG_ON(nr_none);
-		spin_unlock_irq(&mapping->tree_lock);
+		xa_unlock_irq(&mapping->pages);
 
 		/* Unfreeze new_page, caller would take care about freeing it */
 		page_ref_unfreeze(new_page, 1);
@@ -1575,7 +1572,7 @@ static void khugepaged_scan_shmem(struct mm_struct *mm,
 	swap = 0;
 	memset(khugepaged_node_load, 0, sizeof(khugepaged_node_load));
 	rcu_read_lock();
-	radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) {
+	radix_tree_for_each_slot(slot, &mapping->pages, &iter, start) {
 		if (iter.index >= start + HPAGE_PMD_NR)
 			break;
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 50e6906314f8..953dfed9e780 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6034,7 +6034,7 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
 
 	/*
 	 * Interrupts should be disabled here because the caller holds the
-	 * mapping->tree_lock lock which is taken with interrupts-off. It is
+	 * mapping->pages.xa_lock which is taken with interrupts-off. It is
 	 * important here to have the interrupts disabled because it is the
 	 * only synchronisation we have for updating the per-CPU variables.
 	 */
diff --git a/mm/migrate.c b/mm/migrate.c
index 4d0be47a322a..59f18c571120 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -466,20 +466,20 @@ int migrate_page_move_mapping(struct address_space *mapping,
 	oldzone = page_zone(page);
 	newzone = page_zone(newpage);
 
-	spin_lock_irq(&mapping->tree_lock);
+	xa_lock_irq(&mapping->pages);
 
-	pslot = radix_tree_lookup_slot(&mapping->page_tree,
+	pslot = radix_tree_lookup_slot(&mapping->pages,
  					page_index(page));
 
 	expected_count += 1 + page_has_private(page);
 	if (page_count(page) != expected_count ||
-		radix_tree_deref_slot_protected(pslot, &mapping->tree_lock) != page) {
-		spin_unlock_irq(&mapping->tree_lock);
+		radix_tree_deref_slot_protected(pslot, &mapping->pages.xa_lock) != page) {
+		xa_unlock_irq(&mapping->pages);
 		return -EAGAIN;
 	}
 
 	if (!page_ref_freeze(page, expected_count)) {
-		spin_unlock_irq(&mapping->tree_lock);
+		xa_unlock_irq(&mapping->pages);
 		return -EAGAIN;
 	}
 
@@ -493,7 +493,7 @@ int migrate_page_move_mapping(struct address_space *mapping,
 	if (mode == MIGRATE_ASYNC && head &&
 			!buffer_migrate_lock_buffers(head, mode)) {
 		page_ref_unfreeze(page, expected_count);
-		spin_unlock_irq(&mapping->tree_lock);
+		xa_unlock_irq(&mapping->pages);
 		return -EAGAIN;
 	}
 
@@ -521,7 +521,7 @@ int migrate_page_move_mapping(struct address_space *mapping,
 		SetPageDirty(newpage);
 	}
 
-	radix_tree_replace_slot(&mapping->page_tree, pslot, newpage);
+	radix_tree_replace_slot(&mapping->pages, pslot, newpage);
 
 	/*
 	 * Drop cache reference from old page by unfreezing
@@ -530,7 +530,7 @@ int migrate_page_move_mapping(struct address_space *mapping,
 	 */
 	page_ref_unfreeze(page, expected_count - 1);
 
-	spin_unlock(&mapping->tree_lock);
+	xa_unlock(&mapping->pages);
 	/* Leave irq disabled to prevent preemption while updating stats */
 
 	/*
@@ -573,20 +573,19 @@ int migrate_huge_page_move_mapping(struct address_space *mapping,
 	int expected_count;
 	void **pslot;
 
-	spin_lock_irq(&mapping->tree_lock);
+	xa_lock_irq(&mapping->pages);
 
-	pslot = radix_tree_lookup_slot(&mapping->page_tree,
-					page_index(page));
+	pslot = radix_tree_lookup_slot(&mapping->pages, page_index(page));
 
 	expected_count = 2 + page_has_private(page);
 	if (page_count(page) != expected_count ||
-		radix_tree_deref_slot_protected(pslot, &mapping->tree_lock) != page) {
-		spin_unlock_irq(&mapping->tree_lock);
+		radix_tree_deref_slot_protected(pslot, &mapping->pages.xa_lock) != page) {
+		xa_unlock_irq(&mapping->pages);
 		return -EAGAIN;
 	}
 
 	if (!page_ref_freeze(page, expected_count)) {
-		spin_unlock_irq(&mapping->tree_lock);
+		xa_unlock_irq(&mapping->pages);
 		return -EAGAIN;
 	}
 
@@ -595,11 +594,11 @@ int migrate_huge_page_move_mapping(struct address_space *mapping,
 
 	get_page(newpage);
 
-	radix_tree_replace_slot(&mapping->page_tree, pslot, newpage);
+	radix_tree_replace_slot(&mapping->pages, pslot, newpage);
 
 	page_ref_unfreeze(page, expected_count - 1);
 
-	spin_unlock_irq(&mapping->tree_lock);
+	xa_unlock_irq(&mapping->pages);
 
 	return MIGRATEPAGE_SUCCESS;
 }
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 8a1551154285..6a5b92629727 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2101,7 +2101,7 @@ void __init page_writeback_init(void)
  * so that it can tag pages faster than a dirtying process can create them).
  */
 /*
- * We tag pages in batches of WRITEBACK_TAG_BATCH to reduce tree_lock latency.
+ * We tag pages in batches of WRITEBACK_TAG_BATCH to reduce xa_lock latency.
  */
 void tag_pages_for_writeback(struct address_space *mapping,
 			     pgoff_t start, pgoff_t end)
@@ -2111,22 +2111,22 @@ void tag_pages_for_writeback(struct address_space *mapping,
 	struct radix_tree_iter iter;
 	void **slot;
 
-	spin_lock_irq(&mapping->tree_lock);
-	radix_tree_for_each_tagged(slot, &mapping->page_tree, &iter, start,
+	xa_lock_irq(&mapping->pages);
+	radix_tree_for_each_tagged(slot, &mapping->pages, &iter, start,
 							PAGECACHE_TAG_DIRTY) {
 		if (iter.index > end)
 			break;
-		radix_tree_iter_tag_set(&mapping->page_tree, &iter,
+		radix_tree_iter_tag_set(&mapping->pages, &iter,
 							PAGECACHE_TAG_TOWRITE);
 		tagged++;
 		if ((tagged % WRITEBACK_TAG_BATCH) != 0)
 			continue;
 		slot = radix_tree_iter_resume(slot, &iter);
-		spin_unlock_irq(&mapping->tree_lock);
+		xa_unlock_irq(&mapping->pages);
 		cond_resched();
-		spin_lock_irq(&mapping->tree_lock);
+		xa_lock_irq(&mapping->pages);
 	}
-	spin_unlock_irq(&mapping->tree_lock);
+	xa_unlock_irq(&mapping->pages);
 }
 EXPORT_SYMBOL(tag_pages_for_writeback);
 
@@ -2469,13 +2469,13 @@ int __set_page_dirty_nobuffers(struct page *page)
 			return 1;
 		}
 
-		spin_lock_irqsave(&mapping->tree_lock, flags);
+		xa_lock_irqsave(&mapping->pages, flags);
 		BUG_ON(page_mapping(page) != mapping);
 		WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page));
 		account_page_dirtied(page, mapping);
-		radix_tree_tag_set(&mapping->page_tree, page_index(page),
+		radix_tree_tag_set(&mapping->pages, page_index(page),
 				   PAGECACHE_TAG_DIRTY);
-		spin_unlock_irqrestore(&mapping->tree_lock, flags);
+		xa_unlock_irqrestore(&mapping->pages, flags);
 		unlock_page_memcg(page);
 
 		if (mapping->host) {
@@ -2720,11 +2720,10 @@ int test_clear_page_writeback(struct page *page)
 		struct backing_dev_info *bdi = inode_to_bdi(inode);
 		unsigned long flags;
 
-		spin_lock_irqsave(&mapping->tree_lock, flags);
+		xa_lock_irqsave(&mapping->pages, flags);
 		ret = TestClearPageWriteback(page);
 		if (ret) {
-			radix_tree_tag_clear(&mapping->page_tree,
-						page_index(page),
+			radix_tree_tag_clear(&mapping->pages, page_index(page),
 						PAGECACHE_TAG_WRITEBACK);
 			if (bdi_cap_account_writeback(bdi)) {
 				struct bdi_writeback *wb = inode_to_wb(inode);
@@ -2738,7 +2737,7 @@ int test_clear_page_writeback(struct page *page)
 						     PAGECACHE_TAG_WRITEBACK))
 			sb_clear_inode_writeback(mapping->host);
 
-		spin_unlock_irqrestore(&mapping->tree_lock, flags);
+		xa_unlock_irqrestore(&mapping->pages, flags);
 	} else {
 		ret = TestClearPageWriteback(page);
 	}
@@ -2768,7 +2767,7 @@ int __test_set_page_writeback(struct page *page, bool keep_write)
 		struct backing_dev_info *bdi = inode_to_bdi(inode);
 		unsigned long flags;
 
-		spin_lock_irqsave(&mapping->tree_lock, flags);
+		xa_lock_irqsave(&mapping->pages, flags);
 		ret = TestSetPageWriteback(page);
 		if (!ret) {
 			bool on_wblist;
@@ -2776,8 +2775,7 @@ int __test_set_page_writeback(struct page *page, bool keep_write)
 			on_wblist = mapping_tagged(mapping,
 						   PAGECACHE_TAG_WRITEBACK);
 
-			radix_tree_tag_set(&mapping->page_tree,
-						page_index(page),
+			radix_tree_tag_set(&mapping->pages, page_index(page),
 						PAGECACHE_TAG_WRITEBACK);
 			if (bdi_cap_account_writeback(bdi))
 				inc_wb_stat(inode_to_wb(inode), WB_WRITEBACK);
@@ -2791,14 +2789,12 @@ int __test_set_page_writeback(struct page *page, bool keep_write)
 				sb_mark_inode_writeback(mapping->host);
 		}
 		if (!PageDirty(page))
-			radix_tree_tag_clear(&mapping->page_tree,
-						page_index(page),
+			radix_tree_tag_clear(&mapping->pages, page_index(page),
 						PAGECACHE_TAG_DIRTY);
 		if (!keep_write)
-			radix_tree_tag_clear(&mapping->page_tree,
-						page_index(page),
+			radix_tree_tag_clear(&mapping->pages, page_index(page),
 						PAGECACHE_TAG_TOWRITE);
-		spin_unlock_irqrestore(&mapping->tree_lock, flags);
+		xa_unlock_irqrestore(&mapping->pages, flags);
 	} else {
 		ret = TestSetPageWriteback(page);
 	}
@@ -2818,7 +2814,7 @@ EXPORT_SYMBOL(__test_set_page_writeback);
  */
 int mapping_tagged(struct address_space *mapping, int tag)
 {
-	return radix_tree_tagged(&mapping->page_tree, tag);
+	return radix_tree_tagged(&mapping->pages, tag);
 }
 EXPORT_SYMBOL(mapping_tagged);
 
diff --git a/mm/readahead.c b/mm/readahead.c
index c4ca70239233..514188fd2489 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -175,7 +175,7 @@ int __do_page_cache_readahead(struct address_space *mapping, struct file *filp,
 			break;
 
 		rcu_read_lock();
-		page = radix_tree_lookup(&mapping->page_tree, page_offset);
+		page = radix_tree_lookup(&mapping->pages, page_offset);
 		rcu_read_unlock();
 		if (page && !radix_tree_exceptional_entry(page))
 			continue;
diff --git a/mm/rmap.c b/mm/rmap.c
index 47db27f8049e..87c1ca0cf1a3 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -32,11 +32,11 @@
  *                 mmlist_lock (in mmput, drain_mmlist and others)
  *                 mapping->private_lock (in __set_page_dirty_buffers)
  *                   mem_cgroup_{begin,end}_page_stat (memcg->move_lock)
- *                     mapping->tree_lock (widely used)
+ *                     mapping->pages.xa_lock (widely used)
  *                 inode->i_lock (in set_page_dirty's __mark_inode_dirty)
  *                 bdi.wb->list_lock (in set_page_dirty's __mark_inode_dirty)
  *                   sb_lock (within inode_lock in fs/fs-writeback.c)
- *                   mapping->tree_lock (widely used, in set_page_dirty,
+ *                   mapping->pages.xa_lock (widely used, in set_page_dirty,
  *                             in arch-dependent flush_dcache_mmap_lock,
  *                             within bdi.wb->list_lock in __sync_single_inode)
  *
diff --git a/mm/shmem.c b/mm/shmem.c
index 4aa9307feab0..b1396ed0c9b1 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -332,12 +332,12 @@ static int shmem_radix_tree_replace(struct address_space *mapping,
 
 	VM_BUG_ON(!expected);
 	VM_BUG_ON(!replacement);
-	item = __radix_tree_lookup(&mapping->page_tree, index, &node, &pslot);
+	item = __radix_tree_lookup(&mapping->pages, index, &node, &pslot);
 	if (!item)
 		return -ENOENT;
 	if (item != expected)
 		return -ENOENT;
-	__radix_tree_replace(&mapping->page_tree, node, pslot,
+	__radix_tree_replace(&mapping->pages, node, pslot,
 			     replacement, NULL);
 	return 0;
 }
@@ -355,7 +355,7 @@ static bool shmem_confirm_swap(struct address_space *mapping,
 	void *item;
 
 	rcu_read_lock();
-	item = radix_tree_lookup(&mapping->page_tree, index);
+	item = radix_tree_lookup(&mapping->pages, index);
 	rcu_read_unlock();
 	return item == swp_to_radix_entry(swap);
 }
@@ -581,14 +581,14 @@ static int shmem_add_to_page_cache(struct page *page,
 	page->mapping = mapping;
 	page->index = index;
 
-	spin_lock_irq(&mapping->tree_lock);
+	xa_lock_irq(&mapping->pages);
 	if (PageTransHuge(page)) {
 		void __rcu **results;
 		pgoff_t idx;
 		int i;
 
 		error = 0;
-		if (radix_tree_gang_lookup_slot(&mapping->page_tree,
+		if (radix_tree_gang_lookup_slot(&mapping->pages,
 					&results, &idx, index, 1) &&
 				idx < index + HPAGE_PMD_NR) {
 			error = -EEXIST;
@@ -596,14 +596,14 @@ static int shmem_add_to_page_cache(struct page *page,
 
 		if (!error) {
 			for (i = 0; i < HPAGE_PMD_NR; i++) {
-				error = radix_tree_insert(&mapping->page_tree,
+				error = radix_tree_insert(&mapping->pages,
 						index + i, page + i);
 				VM_BUG_ON(error);
 			}
 			count_vm_event(THP_FILE_ALLOC);
 		}
 	} else if (!expected) {
-		error = radix_tree_insert(&mapping->page_tree, index, page);
+		error = radix_tree_insert(&mapping->pages, index, page);
 	} else {
 		error = shmem_radix_tree_replace(mapping, index, expected,
 								 page);
@@ -615,10 +615,10 @@ static int shmem_add_to_page_cache(struct page *page,
 			__inc_node_page_state(page, NR_SHMEM_THPS);
 		__mod_node_page_state(page_pgdat(page), NR_FILE_PAGES, nr);
 		__mod_node_page_state(page_pgdat(page), NR_SHMEM, nr);
-		spin_unlock_irq(&mapping->tree_lock);
+		xa_unlock_irq(&mapping->pages);
 	} else {
 		page->mapping = NULL;
-		spin_unlock_irq(&mapping->tree_lock);
+		xa_unlock_irq(&mapping->pages);
 		page_ref_sub(page, nr);
 	}
 	return error;
@@ -634,13 +634,13 @@ static void shmem_delete_from_page_cache(struct page *page, void *radswap)
 
 	VM_BUG_ON_PAGE(PageCompound(page), page);
 
-	spin_lock_irq(&mapping->tree_lock);
+	xa_lock_irq(&mapping->pages);
 	error = shmem_radix_tree_replace(mapping, page->index, page, radswap);
 	page->mapping = NULL;
 	mapping->nrpages--;
 	__dec_node_page_state(page, NR_FILE_PAGES);
 	__dec_node_page_state(page, NR_SHMEM);
-	spin_unlock_irq(&mapping->tree_lock);
+	xa_unlock_irq(&mapping->pages);
 	put_page(page);
 	BUG_ON(error);
 }
@@ -653,9 +653,9 @@ static int shmem_free_swap(struct address_space *mapping,
 {
 	void *old;
 
-	spin_lock_irq(&mapping->tree_lock);
-	old = radix_tree_delete_item(&mapping->page_tree, index, radswap);
-	spin_unlock_irq(&mapping->tree_lock);
+	xa_lock_irq(&mapping->pages);
+	old = radix_tree_delete_item(&mapping->pages, index, radswap);
+	xa_unlock_irq(&mapping->pages);
 	if (old != radswap)
 		return -ENOENT;
 	free_swap_and_cache(radix_to_swp_entry(radswap));
@@ -666,7 +666,7 @@ static int shmem_free_swap(struct address_space *mapping,
  * Determine (in bytes) how many of the shmem object's pages mapped by the
  * given offsets are swapped out.
  *
- * This is safe to call without i_mutex or mapping->tree_lock thanks to RCU,
+ * This is safe to call without i_mutex or mapping->pages.xa_lock thanks to RCU,
  * as long as the inode doesn't go away and racy results are not a problem.
  */
 unsigned long shmem_partial_swap_usage(struct address_space *mapping,
@@ -679,7 +679,7 @@ unsigned long shmem_partial_swap_usage(struct address_space *mapping,
 
 	rcu_read_lock();
 
-	radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) {
+	radix_tree_for_each_slot(slot, &mapping->pages, &iter, start) {
 		if (iter.index >= end)
 			break;
 
@@ -708,7 +708,7 @@ unsigned long shmem_partial_swap_usage(struct address_space *mapping,
  * Determine (in bytes) how many of the shmem object's pages mapped by the
  * given vma is swapped out.
  *
- * This is safe to call without i_mutex or mapping->tree_lock thanks to RCU,
+ * This is safe to call without i_mutex or mapping->pages.xa_lock thanks to RCU,
  * as long as the inode doesn't go away and racy results are not a problem.
  */
 unsigned long shmem_swap_usage(struct vm_area_struct *vma)
@@ -1123,7 +1123,7 @@ static int shmem_unuse_inode(struct shmem_inode_info *info,
 	int error = 0;
 
 	radswap = swp_to_radix_entry(swap);
-	index = find_swap_entry(&mapping->page_tree, radswap);
+	index = find_swap_entry(&mapping->pages, radswap);
 	if (index == -1)
 		return -EAGAIN;	/* tell shmem_unuse we found nothing */
 
@@ -1436,7 +1436,7 @@ static struct page *shmem_alloc_hugepage(gfp_t gfp,
 
 	hindex = round_down(index, HPAGE_PMD_NR);
 	rcu_read_lock();
-	if (radix_tree_gang_lookup_slot(&mapping->page_tree, &results, &idx,
+	if (radix_tree_gang_lookup_slot(&mapping->pages, &results, &idx,
 				hindex, 1) && idx < hindex + HPAGE_PMD_NR) {
 		rcu_read_unlock();
 		return NULL;
@@ -1549,14 +1549,14 @@ static int shmem_replace_page(struct page **pagep, gfp_t gfp,
 	 * Our caller will very soon move newpage out of swapcache, but it's
 	 * a nice clean interface for us to replace oldpage by newpage there.
 	 */
-	spin_lock_irq(&swap_mapping->tree_lock);
+	xa_lock_irq(&swap_mapping->pages);
 	error = shmem_radix_tree_replace(swap_mapping, swap_index, oldpage,
 								   newpage);
 	if (!error) {
 		__inc_node_page_state(newpage, NR_FILE_PAGES);
 		__dec_node_page_state(oldpage, NR_FILE_PAGES);
 	}
-	spin_unlock_irq(&swap_mapping->tree_lock);
+	xa_unlock_irq(&swap_mapping->pages);
 
 	if (unlikely(error)) {
 		/*
@@ -2622,7 +2622,7 @@ static void shmem_tag_pins(struct address_space *mapping)
 	start = 0;
 	rcu_read_lock();
 
-	radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) {
+	radix_tree_for_each_slot(slot, &mapping->pages, &iter, start) {
 		page = radix_tree_deref_slot(slot);
 		if (!page || radix_tree_exception(page)) {
 			if (radix_tree_deref_retry(page)) {
@@ -2630,10 +2630,10 @@ static void shmem_tag_pins(struct address_space *mapping)
 				continue;
 			}
 		} else if (page_count(page) - page_mapcount(page) > 1) {
-			spin_lock_irq(&mapping->tree_lock);
-			radix_tree_tag_set(&mapping->page_tree, iter.index,
+			xa_lock_irq(&mapping->pages);
+			radix_tree_tag_set(&mapping->pages, iter.index,
 					   SHMEM_TAG_PINNED);
-			spin_unlock_irq(&mapping->tree_lock);
+			xa_unlock_irq(&mapping->pages);
 		}
 
 		if (need_resched()) {
@@ -2665,7 +2665,7 @@ static int shmem_wait_for_pins(struct address_space *mapping)
 
 	error = 0;
 	for (scan = 0; scan <= LAST_SCAN; scan++) {
-		if (!radix_tree_tagged(&mapping->page_tree, SHMEM_TAG_PINNED))
+		if (!radix_tree_tagged(&mapping->pages, SHMEM_TAG_PINNED))
 			break;
 
 		if (!scan)
@@ -2675,7 +2675,7 @@ static int shmem_wait_for_pins(struct address_space *mapping)
 
 		start = 0;
 		rcu_read_lock();
-		radix_tree_for_each_tagged(slot, &mapping->page_tree, &iter,
+		radix_tree_for_each_tagged(slot, &mapping->pages, &iter,
 					   start, SHMEM_TAG_PINNED) {
 
 			page = radix_tree_deref_slot(slot);
@@ -2701,10 +2701,10 @@ static int shmem_wait_for_pins(struct address_space *mapping)
 				error = -EBUSY;
 			}
 
-			spin_lock_irq(&mapping->tree_lock);
-			radix_tree_tag_clear(&mapping->page_tree,
+			xa_lock_irq(&mapping->pages);
+			radix_tree_tag_clear(&mapping->pages,
 					     iter.index, SHMEM_TAG_PINNED);
-			spin_unlock_irq(&mapping->tree_lock);
+			xa_unlock_irq(&mapping->pages);
 continue_resched:
 			if (need_resched()) {
 				slot = radix_tree_iter_resume(slot, &iter);
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 39ae7cfad90f..3f95e8fc4cb2 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -124,10 +124,10 @@ int __add_to_swap_cache(struct page *page, swp_entry_t entry)
 	SetPageSwapCache(page);
 
 	address_space = swap_address_space(entry);
-	spin_lock_irq(&address_space->tree_lock);
+	xa_lock_irq(&address_space->pages);
 	for (i = 0; i < nr; i++) {
 		set_page_private(page + i, entry.val + i);
-		error = radix_tree_insert(&address_space->page_tree,
+		error = radix_tree_insert(&address_space->pages,
 					  idx + i, page + i);
 		if (unlikely(error))
 			break;
@@ -145,13 +145,13 @@ int __add_to_swap_cache(struct page *page, swp_entry_t entry)
 		VM_BUG_ON(error == -EEXIST);
 		set_page_private(page + i, 0UL);
 		while (i--) {
-			radix_tree_delete(&address_space->page_tree, idx + i);
+			radix_tree_delete(&address_space->pages, idx + i);
 			set_page_private(page + i, 0UL);
 		}
 		ClearPageSwapCache(page);
 		page_ref_sub(page, nr);
 	}
-	spin_unlock_irq(&address_space->tree_lock);
+	xa_unlock_irq(&address_space->pages);
 
 	return error;
 }
@@ -188,7 +188,7 @@ void __delete_from_swap_cache(struct page *page)
 	address_space = swap_address_space(entry);
 	idx = swp_offset(entry);
 	for (i = 0; i < nr; i++) {
-		radix_tree_delete(&address_space->page_tree, idx + i);
+		radix_tree_delete(&address_space->pages, idx + i);
 		set_page_private(page + i, 0);
 	}
 	ClearPageSwapCache(page);
@@ -272,9 +272,9 @@ void delete_from_swap_cache(struct page *page)
 	entry.val = page_private(page);
 
 	address_space = swap_address_space(entry);
-	spin_lock_irq(&address_space->tree_lock);
+	xa_lock_irq(&address_space->pages);
 	__delete_from_swap_cache(page);
-	spin_unlock_irq(&address_space->tree_lock);
+	xa_unlock_irq(&address_space->pages);
 
 	put_swap_page(page, entry);
 	page_ref_sub(page, hpage_nr_pages(page));
@@ -612,12 +612,11 @@ int init_swap_address_space(unsigned int type, unsigned long nr_pages)
 		return -ENOMEM;
 	for (i = 0; i < nr; i++) {
 		space = spaces + i;
-		INIT_RADIX_TREE(&space->page_tree, GFP_ATOMIC|__GFP_NOWARN);
+		INIT_RADIX_TREE(&space->pages, GFP_ATOMIC|__GFP_NOWARN);
 		atomic_set(&space->i_mmap_writable, 0);
 		space->a_ops = &swap_aops;
 		/* swap cache doesn't use writeback related tags */
 		mapping_set_no_writeback_tags(space);
-		spin_lock_init(&space->tree_lock);
 	}
 	nr_swapper_spaces[type] = nr;
 	rcu_assign_pointer(swapper_spaces[type], spaces);
diff --git a/mm/truncate.c b/mm/truncate.c
index e4b4cf0f4070..094158f2e447 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -36,11 +36,11 @@ static inline void __clear_shadow_entry(struct address_space *mapping,
 	struct radix_tree_node *node;
 	void **slot;
 
-	if (!__radix_tree_lookup(&mapping->page_tree, index, &node, &slot))
+	if (!__radix_tree_lookup(&mapping->pages, index, &node, &slot))
 		return;
 	if (*slot != entry)
 		return;
-	__radix_tree_replace(&mapping->page_tree, node, slot, NULL,
+	__radix_tree_replace(&mapping->pages, node, slot, NULL,
 			     workingset_update_node);
 	mapping->nrexceptional--;
 }
@@ -48,9 +48,9 @@ static inline void __clear_shadow_entry(struct address_space *mapping,
 static void clear_shadow_entry(struct address_space *mapping, pgoff_t index,
 			       void *entry)
 {
-	spin_lock_irq(&mapping->tree_lock);
+	xa_lock_irq(&mapping->pages);
 	__clear_shadow_entry(mapping, index, entry);
-	spin_unlock_irq(&mapping->tree_lock);
+	xa_unlock_irq(&mapping->pages);
 }
 
 /*
@@ -79,7 +79,7 @@ static void truncate_exceptional_pvec_entries(struct address_space *mapping,
 	dax = dax_mapping(mapping);
 	lock = !dax && indices[j] < end;
 	if (lock)
-		spin_lock_irq(&mapping->tree_lock);
+		xa_lock_irq(&mapping->pages);
 
 	for (i = j; i < pagevec_count(pvec); i++) {
 		struct page *page = pvec->pages[i];
@@ -102,7 +102,7 @@ static void truncate_exceptional_pvec_entries(struct address_space *mapping,
 	}
 
 	if (lock)
-		spin_unlock_irq(&mapping->tree_lock);
+		xa_unlock_irq(&mapping->pages);
 	pvec->nr = j;
 }
 
@@ -522,8 +522,8 @@ void truncate_inode_pages_final(struct address_space *mapping)
 		 * modification that does not see AS_EXITING is
 		 * completed before starting the final truncate.
 		 */
-		spin_lock_irq(&mapping->tree_lock);
-		spin_unlock_irq(&mapping->tree_lock);
+		xa_lock_irq(&mapping->pages);
+		xa_unlock_irq(&mapping->pages);
 
 		truncate_inode_pages(mapping, 0);
 	}
@@ -631,13 +631,13 @@ invalidate_complete_page2(struct address_space *mapping, struct page *page)
 	if (page_has_private(page) && !try_to_release_page(page, GFP_KERNEL))
 		return 0;
 
-	spin_lock_irqsave(&mapping->tree_lock, flags);
+	xa_lock_irqsave(&mapping->pages, flags);
 	if (PageDirty(page))
 		goto failed;
 
 	BUG_ON(page_has_private(page));
 	__delete_from_page_cache(page, NULL);
-	spin_unlock_irqrestore(&mapping->tree_lock, flags);
+	xa_unlock_irqrestore(&mapping->pages, flags);
 
 	if (mapping->a_ops->freepage)
 		mapping->a_ops->freepage(page);
@@ -645,7 +645,7 @@ invalidate_complete_page2(struct address_space *mapping, struct page *page)
 	put_page(page);	/* pagecache ref */
 	return 1;
 failed:
-	spin_unlock_irqrestore(&mapping->tree_lock, flags);
+	xa_unlock_irqrestore(&mapping->pages, flags);
 	return 0;
 }
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c02c850ea349..96316bd91f91 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -674,7 +674,7 @@ static int __remove_mapping(struct address_space *mapping, struct page *page,
 	BUG_ON(!PageLocked(page));
 	BUG_ON(mapping != page_mapping(page));
 
-	spin_lock_irqsave(&mapping->tree_lock, flags);
+	xa_lock_irqsave(&mapping->pages, flags);
 	/*
 	 * The non racy check for a busy page.
 	 *
@@ -698,7 +698,7 @@ static int __remove_mapping(struct address_space *mapping, struct page *page,
 	 * load is not satisfied before that of page->_refcount.
 	 *
 	 * Note that if SetPageDirty is always performed via set_page_dirty,
-	 * and thus under tree_lock, then this ordering is not required.
+	 * and thus under xa_lock, then this ordering is not required.
 	 */
 	if (unlikely(PageTransHuge(page)) && PageSwapCache(page))
 		refcount = 1 + HPAGE_PMD_NR;
@@ -716,7 +716,7 @@ static int __remove_mapping(struct address_space *mapping, struct page *page,
 		swp_entry_t swap = { .val = page_private(page) };
 		mem_cgroup_swapout(page, swap);
 		__delete_from_swap_cache(page);
-		spin_unlock_irqrestore(&mapping->tree_lock, flags);
+		xa_unlock_irqrestore(&mapping->pages, flags);
 		put_swap_page(page, swap);
 	} else {
 		void (*freepage)(struct page *);
@@ -737,13 +737,13 @@ static int __remove_mapping(struct address_space *mapping, struct page *page,
 		 * only page cache pages found in these are zero pages
 		 * covering holes, and because we don't want to mix DAX
 		 * exceptional entries and shadow exceptional entries in the
-		 * same page_tree.
+		 * same address_space.
 		 */
 		if (reclaimed && page_is_file_cache(page) &&
 		    !mapping_exiting(mapping) && !dax_mapping(mapping))
 			shadow = workingset_eviction(mapping, page);
 		__delete_from_page_cache(page, shadow);
-		spin_unlock_irqrestore(&mapping->tree_lock, flags);
+		xa_unlock_irqrestore(&mapping->pages, flags);
 
 		if (freepage != NULL)
 			freepage(page);
@@ -752,7 +752,7 @@ static int __remove_mapping(struct address_space *mapping, struct page *page,
 	return 1;
 
 cannot_free:
-	spin_unlock_irqrestore(&mapping->tree_lock, flags);
+	xa_unlock_irqrestore(&mapping->pages, flags);
 	return 0;
 }
 
diff --git a/mm/workingset.c b/mm/workingset.c
index b7d616a3bbbe..2d071f0df3af 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -202,7 +202,7 @@ static void unpack_shadow(void *shadow, int *memcgidp, pg_data_t **pgdat,
  * @mapping: address space the page was backing
  * @page: the page being evicted
  *
- * Returns a shadow entry to be stored in @mapping->page_tree in place
+ * Returns a shadow entry to be stored in @mapping->pages in place
  * of the evicted @page so that a later refault can be detected.
  */
 void *workingset_eviction(struct address_space *mapping, struct page *page)
@@ -348,7 +348,7 @@ void workingset_update_node(struct radix_tree_node *node)
 	 *
 	 * Avoid acquiring the list_lru lock when the nodes are
 	 * already where they should be. The list_empty() test is safe
-	 * as node->private_list is protected by &mapping->tree_lock.
+	 * as node->private_list is protected by &mapping->pages.xa_lock.
 	 */
 	if (node->count && node->count == node->exceptional) {
 		if (list_empty(&node->private_list))
@@ -366,7 +366,7 @@ static unsigned long count_shadow_nodes(struct shrinker *shrinker,
 	unsigned long nodes;
 	unsigned long cache;
 
-	/* list_lru lock nests inside IRQ-safe mapping->tree_lock */
+	/* list_lru lock nests inside IRQ-safe mapping->pages.xa_lock */
 	local_irq_disable();
 	nodes = list_lru_shrink_count(&shadow_nodes, sc);
 	local_irq_enable();
@@ -419,21 +419,21 @@ static enum lru_status shadow_lru_isolate(struct list_head *item,
 
 	/*
 	 * Page cache insertions and deletions synchroneously maintain
-	 * the shadow node LRU under the mapping->tree_lock and the
+	 * the shadow node LRU under the mapping->pages.xa_lock and the
 	 * lru_lock.  Because the page cache tree is emptied before
 	 * the inode can be destroyed, holding the lru_lock pins any
 	 * address_space that has radix tree nodes on the LRU.
 	 *
-	 * We can then safely transition to the mapping->tree_lock to
+	 * We can then safely transition to the mapping->pages.xa_lock to
 	 * pin only the address_space of the particular node we want
 	 * to reclaim, take the node off-LRU, and drop the lru_lock.
 	 */
 
 	node = container_of(item, struct radix_tree_node, private_list);
-	mapping = container_of(node->root, struct address_space, page_tree);
+	mapping = container_of(node->root, struct address_space, pages);
 
 	/* Coming from the list, invert the lock order */
-	if (!spin_trylock(&mapping->tree_lock)) {
+	if (!xa_trylock(&mapping->pages)) {
 		spin_unlock(lru_lock);
 		ret = LRU_RETRY;
 		goto out;
@@ -468,11 +468,11 @@ static enum lru_status shadow_lru_isolate(struct list_head *item,
 	if (WARN_ON_ONCE(node->exceptional))
 		goto out_invalid;
 	inc_lruvec_page_state(virt_to_page(node), WORKINGSET_NODERECLAIM);
-	__radix_tree_delete_node(&mapping->page_tree, node,
+	__radix_tree_delete_node(&mapping->pages, node,
 				 workingset_lookup_update(mapping));
 
 out_invalid:
-	spin_unlock(&mapping->tree_lock);
+	xa_unlock(&mapping->pages);
 	ret = LRU_REMOVED_RETRY;
 out:
 	local_irq_enable();
@@ -487,7 +487,7 @@ static unsigned long scan_shadow_nodes(struct shrinker *shrinker,
 {
 	unsigned long ret;
 
-	/* list_lru lock nests inside IRQ-safe mapping->tree_lock */
+	/* list_lru lock nests inside IRQ-safe mapping->xa_lock */
 	local_irq_disable();
 	ret = list_lru_shrink_walk(&shadow_nodes, sc, shadow_lru_isolate, NULL);
 	local_irq_enable();
@@ -503,7 +503,7 @@ static struct shrinker workingset_shadow_shrinker = {
 
 /*
  * Our list_lru->lock is IRQ-safe as it nests inside the IRQ-safe
- * mapping->tree_lock.
+ * mapping->pages.xa_lock.
  */
 static struct lock_class_key shadow_nodes_key;
 
-- 
2.15.0


* [PATCH 15/62] page cache: Use xa_lock
@ 2017-11-22 21:06   ` Matthew Wilcox
  0 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:06 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

Remove the address_space ->tree_lock and use the xa_lock newly added to
the radix_tree_root.  Rename the address_space ->page_tree to ->pages,
since we don't really care that it's a tree.  Take the opportunity to
rearrange the elements of address_space to pack them better on 64-bit,
and make the comments more useful.
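
A minimal before/after sketch of the conversion pattern the diff applies
(page_index() and PACECACHE_TAG_DIRTY spelled as in the existing callers
below; the surrounding caller is illustrative, not a function added by
this patch):

	/* before: radix tree guarded by the separate tree_lock */
	spin_lock_irq(&mapping->tree_lock);
	radix_tree_tag_clear(&mapping->page_tree, page_index(page),
			     PAGECACHE_TAG_DIRTY);
	spin_unlock_irq(&mapping->tree_lock);

	/* after: take the xa_lock embedded in the (renamed) root */
	xa_lock_irq(&mapping->pages);
	radix_tree_tag_clear(&mapping->pages, page_index(page),
			     PAGECACHE_TAG_DIRTY);
	xa_unlock_irq(&mapping->pages);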

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 Documentation/cgroup-v1/memory.txt              |   2 +-
 Documentation/vm/page_migration                 |  14 ++--
 arch/arm/include/asm/cacheflush.h               |   6 +-
 arch/nios2/include/asm/cacheflush.h             |   6 +-
 arch/parisc/include/asm/cacheflush.h            |   6 +-
 drivers/staging/lustre/lustre/llite/glimpse.c   |   2 +-
 drivers/staging/lustre/lustre/mdc/mdc_request.c |   8 +-
 fs/afs/write.c                                  |   2 +-
 fs/btrfs/compression.c                          |   2 +-
 fs/btrfs/extent_io.c                            |   8 +-
 fs/btrfs/inode.c                                |   2 +-
 fs/buffer.c                                     |  10 +--
 fs/cifs/file.c                                  |   2 +-
 fs/dax.c                                        | 106 ++++++++++++------------
 fs/f2fs/data.c                                  |   6 +-
 fs/f2fs/dir.c                                   |   6 +-
 fs/f2fs/inline.c                                |   6 +-
 fs/f2fs/node.c                                  |   8 +-
 fs/fs-writeback.c                               |  18 ++--
 fs/inode.c                                      |  11 ++-
 fs/nilfs2/btnode.c                              |  20 ++---
 fs/nilfs2/page.c                                |  22 ++---
 include/linux/backing-dev.h                     |  12 +--
 include/linux/fs.h                              |  17 ++--
 include/linux/mm.h                              |   2 +-
 include/linux/pagemap.h                         |   4 +-
 mm/filemap.c                                    |  83 +++++++++----------
 mm/huge_memory.c                                |  10 +--
 mm/khugepaged.c                                 |  49 +++++------
 mm/memcontrol.c                                 |   2 +-
 mm/migrate.c                                    |  31 ++++---
 mm/page-writeback.c                             |  42 +++++-----
 mm/readahead.c                                  |   2 +-
 mm/rmap.c                                       |   4 +-
 mm/shmem.c                                      |  60 +++++++-------
 mm/swap_state.c                                 |  17 ++--
 mm/truncate.c                                   |  22 ++---
 mm/vmscan.c                                     |  12 +--
 mm/workingset.c                                 |  22 ++---
 39 files changed, 322 insertions(+), 342 deletions(-)

diff --git a/Documentation/cgroup-v1/memory.txt b/Documentation/cgroup-v1/memory.txt
index cefb63639070..1d17fb0405ef 100644
--- a/Documentation/cgroup-v1/memory.txt
+++ b/Documentation/cgroup-v1/memory.txt
@@ -262,7 +262,7 @@ When oom event notifier is registered, event will be delivered.
 2.6 Locking
 
    lock_page_cgroup()/unlock_page_cgroup() should not be called under
-   mapping->tree_lock.
+   mapping xa_lock.
 
    Other lock order is following:
    PG_locked.
diff --git a/Documentation/vm/page_migration b/Documentation/vm/page_migration
index 0478ae2ad44a..faf849596a85 100644
--- a/Documentation/vm/page_migration
+++ b/Documentation/vm/page_migration
@@ -90,7 +90,7 @@ Steps:
 
 1. Lock the page to be migrated
 
-2. Insure that writeback is complete.
+2. Ensure that writeback is complete.
 
 3. Lock the new page that we want to move to. It is locked so that accesses to
    this (not yet uptodate) page immediately lock while the move is in progress.
@@ -100,8 +100,8 @@ Steps:
    mapcount is not zero then we do not migrate the page. All user space
    processes that attempt to access the page will now wait on the page lock.
 
-5. The radix tree lock is taken. This will cause all processes trying
-   to access the page via the mapping to block on the radix tree spinlock.
+5. The address space xa_lock is taken. This will cause all processes trying
+   to access the page via the mapping to block on the spinlock.
 
 6. The refcount of the page is examined and we back out if references remain
    otherwise we know that we are the only one referencing this page.
@@ -114,12 +114,12 @@ Steps:
 
 9. The radix tree is changed to point to the new page.
 
-10. The reference count of the old page is dropped because the radix tree
+10. The reference count of the old page is dropped because the address space
     reference is gone. A reference to the new page is established because
-    the new page is referenced to by the radix tree.
+    the new page is referenced by the address space.
 
-11. The radix tree lock is dropped. With that lookups in the mapping
-    become possible again. Processes will move from spinning on the tree_lock
+11. The address space xa_lock is dropped. With that lookups in the mapping
+    become possible again. Processes will move from spinning on the xa_lock
     to sleeping on the locked new page.
 
 12. The page contents are copied to the new page.
diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index 74504b154256..f4ead9a74b7d 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -318,10 +318,8 @@ static inline void flush_anon_page(struct vm_area_struct *vma,
 #define ARCH_HAS_FLUSH_KERNEL_DCACHE_PAGE
 extern void flush_kernel_dcache_page(struct page *);
 
-#define flush_dcache_mmap_lock(mapping) \
-	spin_lock_irq(&(mapping)->tree_lock)
-#define flush_dcache_mmap_unlock(mapping) \
-	spin_unlock_irq(&(mapping)->tree_lock)
+#define flush_dcache_mmap_lock(mapping)		xa_lock_irq(&mapping->pages)
+#define flush_dcache_mmap_unlock(mapping)	xa_unlock_irq(&mapping->pages)
 
 #define flush_icache_user_range(vma,page,addr,len) \
 	flush_dcache_page(page)
diff --git a/arch/nios2/include/asm/cacheflush.h b/arch/nios2/include/asm/cacheflush.h
index 55e383c173f7..7a6eda381964 100644
--- a/arch/nios2/include/asm/cacheflush.h
+++ b/arch/nios2/include/asm/cacheflush.h
@@ -46,9 +46,7 @@ extern void copy_from_user_page(struct vm_area_struct *vma, struct page *page,
 extern void flush_dcache_range(unsigned long start, unsigned long end);
 extern void invalidate_dcache_range(unsigned long start, unsigned long end);
 
-#define flush_dcache_mmap_lock(mapping) \
-	spin_lock_irq(&(mapping)->tree_lock)
-#define flush_dcache_mmap_unlock(mapping) \
-	spin_unlock_irq(&(mapping)->tree_lock)
+#define flush_dcache_mmap_lock(mapping)		xa_lock_irq(&mapping->pages)
+#define flush_dcache_mmap_unlock(mapping)	xa_unlock_irq(&mapping->pages)
 
 #endif /* _ASM_NIOS2_CACHEFLUSH_H */
diff --git a/arch/parisc/include/asm/cacheflush.h b/arch/parisc/include/asm/cacheflush.h
index 3742508cc534..b772dd320118 100644
--- a/arch/parisc/include/asm/cacheflush.h
+++ b/arch/parisc/include/asm/cacheflush.h
@@ -54,10 +54,8 @@ void invalidate_kernel_vmap_range(void *vaddr, int size);
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
 extern void flush_dcache_page(struct page *page);
 
-#define flush_dcache_mmap_lock(mapping) \
-	spin_lock_irq(&(mapping)->tree_lock)
-#define flush_dcache_mmap_unlock(mapping) \
-	spin_unlock_irq(&(mapping)->tree_lock)
+#define flush_dcache_mmap_lock(mapping)		xa_lock_irq(&mapping->pages)
+#define flush_dcache_mmap_unlock(mapping)	xa_unlock_irq(&mapping->pages)
 
 #define flush_icache_page(vma,page)	do { 		\
 	flush_kernel_dcache_page(page);			\
diff --git a/drivers/staging/lustre/lustre/llite/glimpse.c b/drivers/staging/lustre/lustre/llite/glimpse.c
index c43ac574274c..5f2843da911c 100644
--- a/drivers/staging/lustre/lustre/llite/glimpse.c
+++ b/drivers/staging/lustre/lustre/llite/glimpse.c
@@ -69,7 +69,7 @@ blkcnt_t dirty_cnt(struct inode *inode)
 	void	      *results[1];
 
 	if (inode->i_mapping)
-		cnt += radix_tree_gang_lookup_tag(&inode->i_mapping->page_tree,
+		cnt += radix_tree_gang_lookup_tag(&inode->i_mapping->pages,
 						  results, 0, 1,
 						  PAGECACHE_TAG_DIRTY);
 	if (cnt == 0 && atomic_read(&vob->vob_mmap_cnt) > 0)
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_request.c b/drivers/staging/lustre/lustre/mdc/mdc_request.c
index 03e55bca4ada..45dcf9f958d4 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_request.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_request.c
@@ -937,14 +937,14 @@ static struct page *mdc_page_locate(struct address_space *mapping, __u64 *hash,
 	struct page *page;
 	int found;
 
-	spin_lock_irq(&mapping->tree_lock);
-	found = radix_tree_gang_lookup(&mapping->page_tree,
+	xa_lock_irq(&mapping->pages);
+	found = radix_tree_gang_lookup(&mapping->pages,
 				       (void **)&page, offset, 1);
 	if (found > 0 && !radix_tree_exceptional_entry(page)) {
 		struct lu_dirpage *dp;
 
 		get_page(page);
-		spin_unlock_irq(&mapping->tree_lock);
+		xa_unlock_irq(&mapping->pages);
 		/*
 		 * In contrast to find_lock_page() we are sure that directory
 		 * page cannot be truncated (while DLM lock is held) and,
@@ -992,7 +992,7 @@ static struct page *mdc_page_locate(struct address_space *mapping, __u64 *hash,
 			page = ERR_PTR(-EIO);
 		}
 	} else {
-		spin_unlock_irq(&mapping->tree_lock);
+		xa_unlock_irq(&mapping->pages);
 		page = NULL;
 	}
 	return page;
diff --git a/fs/afs/write.c b/fs/afs/write.c
index 18e46e31523c..eae4c80ec09d 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -563,7 +563,7 @@ static int afs_writepages_region(struct address_space *mapping,
 
 		_debug("wback %lx", page->index);
 
-		/* at this point we hold neither mapping->tree_lock nor lock on
+		/* at this point we hold neither mapping xa_lock nor lock on
 		 * the page itself: the page may be truncated or invalidated
 		 * (changing page->mapping to NULL), or even swizzled back from
 		 * swapper_space to tmpfs file mapping
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index b35ce16b3df3..541bd536c815 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -449,7 +449,7 @@ static noinline int add_ra_bio_pages(struct inode *inode,
 			break;
 
 		rcu_read_lock();
-		page = radix_tree_lookup(&mapping->page_tree, pg_index);
+		page = radix_tree_lookup(&mapping->pages, pg_index);
 		rcu_read_unlock();
 		if (page && !radix_tree_exceptional_entry(page)) {
 			misses++;
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 16045ea86fc1..184096a709a9 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3966,7 +3966,7 @@ static int extent_write_cache_pages(struct address_space *mapping,
 
 			done_index = page->index;
 			/*
-			 * At this point we hold neither mapping->tree_lock nor
+			 * At this point we hold neither mapping xa_lock nor
 			 * lock on the page itself: the page may be truncated or
 			 * invalidated (changing page->mapping to NULL), or even
 			 * swizzled back from swapper_space to tmpfs file
@@ -5196,13 +5196,13 @@ void clear_extent_buffer_dirty(struct extent_buffer *eb)
 		WARN_ON(!PagePrivate(page));
 
 		clear_page_dirty_for_io(page);
-		spin_lock_irq(&page->mapping->tree_lock);
+		xa_lock_irq(&page->mapping->pages);
 		if (!PageDirty(page)) {
-			radix_tree_tag_clear(&page->mapping->page_tree,
+			radix_tree_tag_clear(&page->mapping->pages,
 						page_index(page),
 						PAGECACHE_TAG_DIRTY);
 		}
-		spin_unlock_irq(&page->mapping->tree_lock);
+		xa_unlock_irq(&page->mapping->pages);
 		ClearPageError(page);
 		unlock_page(page);
 	}
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b93fe05a39c7..9ceee45ed513 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7527,7 +7527,7 @@ noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len,
 
 bool btrfs_page_exists_in_range(struct inode *inode, loff_t start, loff_t end)
 {
-	struct radix_tree_root *root = &inode->i_mapping->page_tree;
+	struct radix_tree_root *root = &inode->i_mapping->pages;
 	bool found = false;
 	void **pagep = NULL;
 	struct page *page = NULL;
diff --git a/fs/buffer.c b/fs/buffer.c
index fd894c2ae284..33c08624d45b 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -195,7 +195,7 @@ EXPORT_SYMBOL(end_buffer_write_sync);
  * Hack idea: for the blockdev mapping, i_bufferlist_lock contention
  * may be quite high.  This code could TryLock the page, and if that
  * succeeds, there is no need to take private_lock. (But if
- * private_lock is contended then so is mapping->tree_lock).
+ * private_lock is contended then so is mapping xa_lock).
  */
 static struct buffer_head *
 __find_get_block_slow(struct block_device *bdev, sector_t block)
@@ -606,14 +606,14 @@ void __set_page_dirty(struct page *page, struct address_space *mapping,
 {
 	unsigned long flags;
 
-	spin_lock_irqsave(&mapping->tree_lock, flags);
+	xa_lock_irqsave(&mapping->pages, flags);
 	if (page->mapping) {	/* Race with truncate? */
 		WARN_ON_ONCE(warn && !PageUptodate(page));
 		account_page_dirtied(page, mapping);
-		radix_tree_tag_set(&mapping->page_tree,
+		radix_tree_tag_set(&mapping->pages,
 				page_index(page), PAGECACHE_TAG_DIRTY);
 	}
-	spin_unlock_irqrestore(&mapping->tree_lock, flags);
+	xa_unlock_irqrestore(&mapping->pages, flags);
 }
 EXPORT_SYMBOL_GPL(__set_page_dirty);
 
@@ -1102,7 +1102,7 @@ __getblk_slow(struct block_device *bdev, sector_t block,
  * inode list.
  *
  * mark_buffer_dirty() is atomic.  It takes bh->b_page->mapping->private_lock,
- * mapping->tree_lock and mapping->host->i_lock.
+ * mapping xa_lock and mapping->host->i_lock.
  */
 void mark_buffer_dirty(struct buffer_head *bh)
 {
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 3a85df2a9baf..ca7084825622 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1987,7 +1987,7 @@ wdata_prepare_pages(struct cifs_writedata *wdata, unsigned int found_pages,
 	for (i = 0; i < found_pages; i++) {
 		page = wdata->pages[i];
 		/*
-		 * At this point we hold neither mapping->tree_lock nor
+		 * At this point we hold neither mapping xa_lock nor
 		 * lock on the page itself: the page may be truncated or
 		 * invalidated (changing page->mapping to NULL), or even
 		 * swizzled back from swapper_space to tmpfs file
diff --git a/fs/dax.c b/fs/dax.c
index 95981591977a..93e6b79687af 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -158,7 +158,7 @@ static int wake_exceptional_entry_func(wait_queue_entry_t *wait, unsigned int mo
 }
 
 /*
- * We do not necessarily hold the mapping->tree_lock when we call this
+ * We do not necessarily hold the mapping xa_lock when we call this
  * function so it is possible that 'entry' is no longer a valid item in the
  * radix tree.  This is okay because all we really need to do is to find the
  * correct waitqueue where tasks might be waiting for that old 'entry' and
@@ -174,7 +174,7 @@ static void dax_wake_mapping_entry_waiter(struct address_space *mapping,
 
 	/*
 	 * Checking for locked entry and prepare_to_wait_exclusive() happens
-	 * under mapping->tree_lock, ditto for entry handling in our callers.
+	 * under mapping xa_lock, ditto for entry handling in our callers.
 	 * So at this point all tasks that could have seen our entry locked
 	 * must be in the waitqueue and the following check will see them.
 	 */
@@ -184,40 +184,40 @@ static void dax_wake_mapping_entry_waiter(struct address_space *mapping,
 
 /*
  * Check whether the given slot is locked. The function must be called with
- * mapping->tree_lock held
+ * mapping xa_lock held
  */
 static inline int slot_locked(struct address_space *mapping, void **slot)
 {
 	unsigned long entry = (unsigned long)
-		radix_tree_deref_slot_protected(slot, &mapping->tree_lock);
+		radix_tree_deref_slot_protected(slot, &mapping->pages.xa_lock);
 	return entry & RADIX_DAX_ENTRY_LOCK;
 }
 
 /*
  * Mark the given slot is locked. The function must be called with
- * mapping->tree_lock held
+ * mapping xa_lock held
  */
 static inline void *lock_slot(struct address_space *mapping, void **slot)
 {
 	unsigned long entry = (unsigned long)
-		radix_tree_deref_slot_protected(slot, &mapping->tree_lock);
+		radix_tree_deref_slot_protected(slot, &mapping->pages.xa_lock);
 
 	entry |= RADIX_DAX_ENTRY_LOCK;
-	radix_tree_replace_slot(&mapping->page_tree, slot, (void *)entry);
+	radix_tree_replace_slot(&mapping->pages, slot, (void *)entry);
 	return (void *)entry;
 }
 
 /*
  * Mark the given slot is unlocked. The function must be called with
- * mapping->tree_lock held
+ * mapping xa_lock held
  */
 static inline void *unlock_slot(struct address_space *mapping, void **slot)
 {
 	unsigned long entry = (unsigned long)
-		radix_tree_deref_slot_protected(slot, &mapping->tree_lock);
+		radix_tree_deref_slot_protected(slot, &mapping->pages.xa_lock);
 
 	entry &= ~(unsigned long)RADIX_DAX_ENTRY_LOCK;
-	radix_tree_replace_slot(&mapping->page_tree, slot, (void *)entry);
+	radix_tree_replace_slot(&mapping->pages, slot, (void *)entry);
 	return (void *)entry;
 }
 
@@ -228,7 +228,7 @@ static inline void *unlock_slot(struct address_space *mapping, void **slot)
  * put_locked_mapping_entry() when he locked the entry and now wants to
  * unlock it.
  *
- * The function must be called with mapping->tree_lock held.
+ * The function must be called with mapping xa_lock held.
  */
 static void *get_unlocked_mapping_entry(struct address_space *mapping,
 					pgoff_t index, void ***slotp)
@@ -241,7 +241,7 @@ static void *get_unlocked_mapping_entry(struct address_space *mapping,
 	ewait.wait.func = wake_exceptional_entry_func;
 
 	for (;;) {
-		entry = __radix_tree_lookup(&mapping->page_tree, index, NULL,
+		entry = __radix_tree_lookup(&mapping->pages, index, NULL,
 					  &slot);
 		if (!entry ||
 		    WARN_ON_ONCE(!radix_tree_exceptional_entry(entry)) ||
@@ -254,10 +254,10 @@ static void *get_unlocked_mapping_entry(struct address_space *mapping,
 		wq = dax_entry_waitqueue(mapping, index, entry, &ewait.key);
 		prepare_to_wait_exclusive(wq, &ewait.wait,
 					  TASK_UNINTERRUPTIBLE);
-		spin_unlock_irq(&mapping->tree_lock);
+		xa_unlock_irq(&mapping->pages);
 		schedule();
 		finish_wait(wq, &ewait.wait);
-		spin_lock_irq(&mapping->tree_lock);
+		xa_lock_irq(&mapping->pages);
 	}
 }
 
@@ -266,15 +266,15 @@ static void dax_unlock_mapping_entry(struct address_space *mapping,
 {
 	void *entry, **slot;
 
-	spin_lock_irq(&mapping->tree_lock);
-	entry = __radix_tree_lookup(&mapping->page_tree, index, NULL, &slot);
+	xa_lock_irq(&mapping->pages);
+	entry = __radix_tree_lookup(&mapping->pages, index, NULL, &slot);
 	if (WARN_ON_ONCE(!entry || !radix_tree_exceptional_entry(entry) ||
 			 !slot_locked(mapping, slot))) {
-		spin_unlock_irq(&mapping->tree_lock);
+		xa_unlock_irq(&mapping->pages);
 		return;
 	}
 	unlock_slot(mapping, slot);
-	spin_unlock_irq(&mapping->tree_lock);
+	xa_unlock_irq(&mapping->pages);
 	dax_wake_mapping_entry_waiter(mapping, index, entry, false);
 }
 
@@ -331,7 +331,7 @@ static void *grab_mapping_entry(struct address_space *mapping, pgoff_t index,
 	void *entry, **slot;
 
 restart:
-	spin_lock_irq(&mapping->tree_lock);
+	xa_lock_irq(&mapping->pages);
 	entry = get_unlocked_mapping_entry(mapping, index, &slot);
 
 	if (WARN_ON_ONCE(entry && !radix_tree_exceptional_entry(entry))) {
@@ -363,12 +363,12 @@ static void *grab_mapping_entry(struct address_space *mapping, pgoff_t index,
 		if (pmd_downgrade) {
 			/*
 			 * Make sure 'entry' remains valid while we drop
-			 * mapping->tree_lock.
+			 * mapping xa_lock.
 			 */
 			entry = lock_slot(mapping, slot);
 		}
 
-		spin_unlock_irq(&mapping->tree_lock);
+		xa_unlock_irq(&mapping->pages);
 		/*
 		 * Besides huge zero pages the only other thing that gets
 		 * downgraded are empty entries which don't need to be
@@ -385,26 +385,26 @@ static void *grab_mapping_entry(struct address_space *mapping, pgoff_t index,
 				put_locked_mapping_entry(mapping, index);
 			return ERR_PTR(err);
 		}
-		spin_lock_irq(&mapping->tree_lock);
+		xa_lock_irq(&mapping->pages);
 
 		if (!entry) {
 			/*
-			 * We needed to drop the page_tree lock while calling
+			 * We needed to drop the pages lock while calling
 			 * radix_tree_preload() and we didn't have an entry to
 			 * lock.  See if another thread inserted an entry at
 			 * our index during this time.
 			 */
-			entry = __radix_tree_lookup(&mapping->page_tree, index,
+			entry = __radix_tree_lookup(&mapping->pages, index,
 					NULL, &slot);
 			if (entry) {
 				radix_tree_preload_end();
-				spin_unlock_irq(&mapping->tree_lock);
+				xa_unlock_irq(&mapping->pages);
 				goto restart;
 			}
 		}
 
 		if (pmd_downgrade) {
-			radix_tree_delete(&mapping->page_tree, index);
+			radix_tree_delete(&mapping->pages, index);
 			mapping->nrexceptional--;
 			dax_wake_mapping_entry_waiter(mapping, index, entry,
 					true);
@@ -412,11 +412,11 @@ static void *grab_mapping_entry(struct address_space *mapping, pgoff_t index,
 
 		entry = dax_radix_locked_entry(0, size_flag | RADIX_DAX_EMPTY);
 
-		err = __radix_tree_insert(&mapping->page_tree, index,
+		err = __radix_tree_insert(&mapping->pages, index,
 				dax_radix_order(entry), entry);
 		radix_tree_preload_end();
 		if (err) {
-			spin_unlock_irq(&mapping->tree_lock);
+			xa_unlock_irq(&mapping->pages);
 			/*
 			 * Our insertion of a DAX entry failed, most likely
 			 * because we were inserting a PMD entry and it
@@ -429,12 +429,12 @@ static void *grab_mapping_entry(struct address_space *mapping, pgoff_t index,
 		}
 		/* Good, we have inserted empty locked entry into the tree. */
 		mapping->nrexceptional++;
-		spin_unlock_irq(&mapping->tree_lock);
+		xa_unlock_irq(&mapping->pages);
 		return entry;
 	}
 	entry = lock_slot(mapping, slot);
  out_unlock:
-	spin_unlock_irq(&mapping->tree_lock);
+	xa_unlock_irq(&mapping->pages);
 	return entry;
 }
 
@@ -443,22 +443,22 @@ static int __dax_invalidate_mapping_entry(struct address_space *mapping,
 {
 	int ret = 0;
 	void *entry;
-	struct radix_tree_root *page_tree = &mapping->page_tree;
+	struct radix_tree_root *pages = &mapping->pages;
 
-	spin_lock_irq(&mapping->tree_lock);
+	xa_lock_irq(&mapping->pages);
 	entry = get_unlocked_mapping_entry(mapping, index, NULL);
 	if (!entry || WARN_ON_ONCE(!radix_tree_exceptional_entry(entry)))
 		goto out;
 	if (!trunc &&
-	    (radix_tree_tag_get(page_tree, index, PAGECACHE_TAG_DIRTY) ||
-	     radix_tree_tag_get(page_tree, index, PAGECACHE_TAG_TOWRITE)))
+	    (radix_tree_tag_get(pages, index, PAGECACHE_TAG_DIRTY) ||
+	     radix_tree_tag_get(pages, index, PAGECACHE_TAG_TOWRITE)))
 		goto out;
-	radix_tree_delete(page_tree, index);
+	radix_tree_delete(pages, index);
 	mapping->nrexceptional--;
 	ret = 1;
 out:
 	put_unlocked_mapping_entry(mapping, index, entry);
-	spin_unlock_irq(&mapping->tree_lock);
+	xa_unlock_irq(&mapping->pages);
 	return ret;
 }
 /*
@@ -528,7 +528,7 @@ static void *dax_insert_mapping_entry(struct address_space *mapping,
 				      void *entry, sector_t sector,
 				      unsigned long flags, bool dirty)
 {
-	struct radix_tree_root *page_tree = &mapping->page_tree;
+	struct radix_tree_root *pages = &mapping->pages;
 	void *new_entry;
 	pgoff_t index = vmf->pgoff;
 
@@ -546,7 +546,7 @@ static void *dax_insert_mapping_entry(struct address_space *mapping,
 					PAGE_SIZE, 0);
 	}
 
-	spin_lock_irq(&mapping->tree_lock);
+	xa_lock_irq(&mapping->pages);
 	new_entry = dax_radix_locked_entry(sector, flags);
 
 	if (dax_is_zero_entry(entry) || dax_is_empty_entry(entry)) {
@@ -562,17 +562,17 @@ static void *dax_insert_mapping_entry(struct address_space *mapping,
 		void **slot;
 		void *ret;
 
-		ret = __radix_tree_lookup(page_tree, index, &node, &slot);
+		ret = __radix_tree_lookup(pages, index, &node, &slot);
 		WARN_ON_ONCE(ret != entry);
-		__radix_tree_replace(page_tree, node, slot,
+		__radix_tree_replace(pages, node, slot,
 				     new_entry, NULL);
 		entry = new_entry;
 	}
 
 	if (dirty)
-		radix_tree_tag_set(page_tree, index, PAGECACHE_TAG_DIRTY);
+		radix_tree_tag_set(pages, index, PAGECACHE_TAG_DIRTY);
 
-	spin_unlock_irq(&mapping->tree_lock);
+	xa_unlock_irq(&mapping->pages);
 	return entry;
 }
 
@@ -662,7 +662,7 @@ static int dax_writeback_one(struct block_device *bdev,
 		struct dax_device *dax_dev, struct address_space *mapping,
 		pgoff_t index, void *entry)
 {
-	struct radix_tree_root *page_tree = &mapping->page_tree;
+	struct radix_tree_root *pages = &mapping->pages;
 	void *entry2, **slot, *kaddr;
 	long ret = 0, id;
 	sector_t sector;
@@ -677,7 +677,7 @@ static int dax_writeback_one(struct block_device *bdev,
 	if (WARN_ON(!radix_tree_exceptional_entry(entry)))
 		return -EIO;
 
-	spin_lock_irq(&mapping->tree_lock);
+	xa_lock_irq(&mapping->pages);
 	entry2 = get_unlocked_mapping_entry(mapping, index, &slot);
 	/* Entry got punched out / reallocated? */
 	if (!entry2 || WARN_ON_ONCE(!radix_tree_exceptional_entry(entry2)))
@@ -696,7 +696,7 @@ static int dax_writeback_one(struct block_device *bdev,
 	}
 
 	/* Another fsync thread may have already written back this entry */
-	if (!radix_tree_tag_get(page_tree, index, PAGECACHE_TAG_TOWRITE))
+	if (!radix_tree_tag_get(pages, index, PAGECACHE_TAG_TOWRITE))
 		goto put_unlocked;
 	/* Lock the entry to serialize with page faults */
 	entry = lock_slot(mapping, slot);
@@ -704,11 +704,11 @@ static int dax_writeback_one(struct block_device *bdev,
 	 * We can clear the tag now but we have to be careful so that concurrent
 	 * dax_writeback_one() calls for the same index cannot finish before we
 	 * actually flush the caches. This is achieved as the calls will look
-	 * at the entry only under tree_lock and once they do that they will
+	 * at the entry only under xa_lock and once they do that they will
 	 * see the entry locked and wait for it to unlock.
 	 */
-	radix_tree_tag_clear(page_tree, index, PAGECACHE_TAG_TOWRITE);
-	spin_unlock_irq(&mapping->tree_lock);
+	radix_tree_tag_clear(pages, index, PAGECACHE_TAG_TOWRITE);
+	xa_unlock_irq(&mapping->pages);
 
 	/*
 	 * Even if dax_writeback_mapping_range() was given a wbc->range_start
@@ -726,7 +726,7 @@ static int dax_writeback_one(struct block_device *bdev,
 		goto dax_unlock;
 
 	/*
-	 * dax_direct_access() may sleep, so cannot hold tree_lock over
+	 * dax_direct_access() may sleep, so cannot hold xa_lock over
 	 * its invocation.
 	 */
 	ret = dax_direct_access(dax_dev, pgoff, size / PAGE_SIZE, &kaddr, &pfn);
@@ -746,9 +746,9 @@ static int dax_writeback_one(struct block_device *bdev,
 	 * the pfn mappings are writeprotected and fault waits for mapping
 	 * entry lock.
 	 */
-	spin_lock_irq(&mapping->tree_lock);
-	radix_tree_tag_clear(page_tree, index, PAGECACHE_TAG_DIRTY);
-	spin_unlock_irq(&mapping->tree_lock);
+	xa_lock_irq(&mapping->pages);
+	radix_tree_tag_clear(pages, index, PAGECACHE_TAG_DIRTY);
+	xa_unlock_irq(&mapping->pages);
 	trace_dax_writeback_one(mapping->host, index, size >> PAGE_SHIFT);
  dax_unlock:
 	dax_read_unlock(id);
@@ -757,7 +757,7 @@ static int dax_writeback_one(struct block_device *bdev,
 
  put_unlocked:
 	put_unlocked_mapping_entry(mapping, index, entry2);
-	spin_unlock_irq(&mapping->tree_lock);
+	xa_unlock_irq(&mapping->pages);
 	return ret;
 }
 
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 516fa0d3ff9c..8f51ac47b77f 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -2172,12 +2172,12 @@ void f2fs_set_page_dirty_nobuffers(struct page *page)
 	SetPageDirty(page);
 	spin_unlock(&mapping->private_lock);
 
-	spin_lock_irqsave(&mapping->tree_lock, flags);
+	xa_lock_irqsave(&mapping->pages, flags);
 	WARN_ON_ONCE(!PageUptodate(page));
 	account_page_dirtied(page, mapping);
-	radix_tree_tag_set(&mapping->page_tree,
+	radix_tree_tag_set(&mapping->pages,
 			page_index(page), PAGECACHE_TAG_DIRTY);
-	spin_unlock_irqrestore(&mapping->tree_lock, flags);
+	xa_unlock_irqrestore(&mapping->pages, flags);
 	unlock_page_memcg(page);
 
 	__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
index 2d98d877c09d..b5515ea6bb2f 100644
--- a/fs/f2fs/dir.c
+++ b/fs/f2fs/dir.c
@@ -739,10 +739,10 @@ void f2fs_delete_entry(struct f2fs_dir_entry *dentry, struct page *page,
 
 	if (bit_pos == NR_DENTRY_IN_BLOCK &&
 			!truncate_hole(dir, page->index, page->index + 1)) {
-		spin_lock_irqsave(&mapping->tree_lock, flags);
-		radix_tree_tag_clear(&mapping->page_tree, page_index(page),
+		xa_lock_irqsave(&mapping->pages, flags);
+		radix_tree_tag_clear(&mapping->pages, page_index(page),
 				     PAGECACHE_TAG_DIRTY);
-		spin_unlock_irqrestore(&mapping->tree_lock, flags);
+		xa_unlock_irqrestore(&mapping->pages, flags);
 
 		clear_page_dirty_for_io(page);
 		ClearPagePrivate(page);
diff --git a/fs/f2fs/inline.c b/fs/f2fs/inline.c
index 90e38d8ea688..7858b8e15f33 100644
--- a/fs/f2fs/inline.c
+++ b/fs/f2fs/inline.c
@@ -226,10 +226,10 @@ int f2fs_write_inline_data(struct inode *inode, struct page *page)
 	kunmap_atomic(src_addr);
 	set_page_dirty(dn.inode_page);
 
-	spin_lock_irqsave(&mapping->tree_lock, flags);
-	radix_tree_tag_clear(&mapping->page_tree, page_index(page),
+	xa_lock_irqsave(&mapping->pages, flags);
+	radix_tree_tag_clear(&mapping->pages, page_index(page),
 			     PAGECACHE_TAG_DIRTY);
-	spin_unlock_irqrestore(&mapping->tree_lock, flags);
+	xa_unlock_irqrestore(&mapping->pages, flags);
 
 	set_inode_flag(inode, FI_APPEND_WRITE);
 	set_inode_flag(inode, FI_DATA_EXIST);
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index d3322752426f..6b64a3009d55 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -91,11 +91,11 @@ static void clear_node_page_dirty(struct page *page)
 	unsigned int long flags;
 
 	if (PageDirty(page)) {
-		spin_lock_irqsave(&mapping->tree_lock, flags);
-		radix_tree_tag_clear(&mapping->page_tree,
+		xa_lock_irqsave(&mapping->pages, flags);
+		radix_tree_tag_clear(&mapping->pages,
 				page_index(page),
 				PAGECACHE_TAG_DIRTY);
-		spin_unlock_irqrestore(&mapping->tree_lock, flags);
+		xa_unlock_irqrestore(&mapping->pages, flags);
 
 		clear_page_dirty_for_io(page);
 		dec_page_count(F2FS_M_SB(mapping), F2FS_DIRTY_NODES);
@@ -1143,7 +1143,7 @@ void ra_node_page(struct f2fs_sb_info *sbi, nid_t nid)
 	f2fs_bug_on(sbi, check_nid_range(sbi, nid));
 
 	rcu_read_lock();
-	apage = radix_tree_lookup(&NODE_MAPPING(sbi)->page_tree, nid);
+	apage = radix_tree_lookup(&NODE_MAPPING(sbi)->pages, nid);
 	rcu_read_unlock();
 	if (apage)
 		return;
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 08f5debd07d1..50b94da97e4f 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -347,9 +347,9 @@ static void inode_switch_wbs_work_fn(struct work_struct *work)
 	 * By the time control reaches here, RCU grace period has passed
 	 * since I_WB_SWITCH assertion and all wb stat update transactions
 	 * between unlocked_inode_to_wb_begin/end() are guaranteed to be
-	 * synchronizing against mapping->tree_lock.
+	 * synchronizing against mapping xa_lock.
 	 *
-	 * Grabbing old_wb->list_lock, inode->i_lock and mapping->tree_lock
+	 * Grabbing old_wb->list_lock, inode->i_lock and mapping xa_lock
 	 * gives us exclusion against all wb related operations on @inode
 	 * including IO list manipulations and stat updates.
 	 */
@@ -361,7 +361,7 @@ static void inode_switch_wbs_work_fn(struct work_struct *work)
 		spin_lock_nested(&old_wb->list_lock, SINGLE_DEPTH_NESTING);
 	}
 	spin_lock(&inode->i_lock);
-	spin_lock_irq(&mapping->tree_lock);
+	xa_lock_irq(&mapping->pages);
 
 	/*
 	 * Once I_FREEING is visible under i_lock, the eviction path owns
@@ -375,20 +375,20 @@ static void inode_switch_wbs_work_fn(struct work_struct *work)
 	 * to possibly dirty pages while PAGECACHE_TAG_WRITEBACK points to
 	 * pages actually under underwriteback.
 	 */
-	radix_tree_for_each_tagged(slot, &mapping->page_tree, &iter, 0,
+	radix_tree_for_each_tagged(slot, &mapping->pages, &iter, 0,
 				   PAGECACHE_TAG_DIRTY) {
 		struct page *page = radix_tree_deref_slot_protected(slot,
-							&mapping->tree_lock);
+						&mapping->pages.xa_lock);
 		if (likely(page) && PageDirty(page)) {
 			dec_wb_stat(old_wb, WB_RECLAIMABLE);
 			inc_wb_stat(new_wb, WB_RECLAIMABLE);
 		}
 	}
 
-	radix_tree_for_each_tagged(slot, &mapping->page_tree, &iter, 0,
+	radix_tree_for_each_tagged(slot, &mapping->pages, &iter, 0,
 				   PAGECACHE_TAG_WRITEBACK) {
 		struct page *page = radix_tree_deref_slot_protected(slot,
-							&mapping->tree_lock);
+						&mapping->pages.xa_lock);
 		if (likely(page)) {
 			WARN_ON_ONCE(!PageWriteback(page));
 			dec_wb_stat(old_wb, WB_WRITEBACK);
@@ -430,7 +430,7 @@ static void inode_switch_wbs_work_fn(struct work_struct *work)
 	 */
 	smp_store_release(&inode->i_state, inode->i_state & ~I_WB_SWITCH);
 
-	spin_unlock_irq(&mapping->tree_lock);
+	xa_unlock_irq(&mapping->pages);
 	spin_unlock(&inode->i_lock);
 	spin_unlock(&new_wb->list_lock);
 	spin_unlock(&old_wb->list_lock);
@@ -507,7 +507,7 @@ static void inode_switch_wbs(struct inode *inode, int new_wb_id)
 	/*
 	 * In addition to synchronizing among switchers, I_WB_SWITCH tells
 	 * the RCU protected stat update paths to grab the mapping's
-	 * tree_lock so that stat transfer can synchronize against them.
+	 * xa_lock so that stat transfer can synchronize against them.
 	 * Let's continue after I_WB_SWITCH is guaranteed to be visible.
 	 */
 	call_rcu(&isw->rcu_head, inode_switch_wbs_rcu_fn);
diff --git a/fs/inode.c b/fs/inode.c
index fd401028a309..5dff2f1004e3 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -348,8 +348,7 @@ EXPORT_SYMBOL(inc_nlink);
 void address_space_init_once(struct address_space *mapping)
 {
 	memset(mapping, 0, sizeof(*mapping));
-	INIT_RADIX_TREE(&mapping->page_tree, GFP_ATOMIC | __GFP_ACCOUNT);
-	spin_lock_init(&mapping->tree_lock);
+	INIT_RADIX_TREE(&mapping->pages, GFP_ATOMIC | __GFP_ACCOUNT);
 	init_rwsem(&mapping->i_mmap_rwsem);
 	INIT_LIST_HEAD(&mapping->private_list);
 	spin_lock_init(&mapping->private_lock);
@@ -499,14 +498,14 @@ void clear_inode(struct inode *inode)
 {
 	might_sleep();
 	/*
-	 * We have to cycle tree_lock here because reclaim can be still in the
+	 * We have to cycle the xa_lock here because reclaim can be in the
 	 * process of removing the last page (in __delete_from_page_cache())
-	 * and we must not free mapping under it.
+	 * and we must not free the mapping under it.
 	 */
-	spin_lock_irq(&inode->i_data.tree_lock);
+	xa_lock_irq(&inode->i_data.pages);
 	BUG_ON(inode->i_data.nrpages);
 	BUG_ON(inode->i_data.nrexceptional);
-	spin_unlock_irq(&inode->i_data.tree_lock);
+	xa_unlock_irq(&inode->i_data.pages);
 	BUG_ON(!list_empty(&inode->i_data.private_list));
 	BUG_ON(!(inode->i_state & I_FREEING));
 	BUG_ON(inode->i_state & I_CLEAR);
diff --git a/fs/nilfs2/btnode.c b/fs/nilfs2/btnode.c
index c21e0b4454a6..9e2a00207436 100644
--- a/fs/nilfs2/btnode.c
+++ b/fs/nilfs2/btnode.c
@@ -193,9 +193,9 @@ int nilfs_btnode_prepare_change_key(struct address_space *btnc,
 				       (unsigned long long)oldkey,
 				       (unsigned long long)newkey);
 
-		spin_lock_irq(&btnc->tree_lock);
-		err = radix_tree_insert(&btnc->page_tree, newkey, obh->b_page);
-		spin_unlock_irq(&btnc->tree_lock);
+		xa_lock_irq(&btnc->pages);
+		err = radix_tree_insert(&btnc->pages, newkey, obh->b_page);
+		xa_unlock_irq(&btnc->pages);
 		/*
 		 * Note: page->index will not change to newkey until
 		 * nilfs_btnode_commit_change_key() will be called.
@@ -251,11 +251,11 @@ void nilfs_btnode_commit_change_key(struct address_space *btnc,
 				       (unsigned long long)newkey);
 		mark_buffer_dirty(obh);
 
-		spin_lock_irq(&btnc->tree_lock);
-		radix_tree_delete(&btnc->page_tree, oldkey);
-		radix_tree_tag_set(&btnc->page_tree, newkey,
+		xa_lock_irq(&btnc->pages);
+		radix_tree_delete(&btnc->pages, oldkey);
+		radix_tree_tag_set(&btnc->pages, newkey,
 				   PAGECACHE_TAG_DIRTY);
-		spin_unlock_irq(&btnc->tree_lock);
+		xa_unlock_irq(&btnc->pages);
 
 		opage->index = obh->b_blocknr = newkey;
 		unlock_page(opage);
@@ -283,9 +283,9 @@ void nilfs_btnode_abort_change_key(struct address_space *btnc,
 		return;
 
 	if (nbh == NULL) {	/* blocksize == pagesize */
-		spin_lock_irq(&btnc->tree_lock);
-		radix_tree_delete(&btnc->page_tree, newkey);
-		spin_unlock_irq(&btnc->tree_lock);
+		xa_lock_irq(&btnc->pages);
+		radix_tree_delete(&btnc->pages, newkey);
+		xa_unlock_irq(&btnc->pages);
 		unlock_page(ctxt->bh->b_page);
 	} else
 		brelse(nbh);
diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c
index 68241512d7c1..1c6703efde9e 100644
--- a/fs/nilfs2/page.c
+++ b/fs/nilfs2/page.c
@@ -331,15 +331,15 @@ void nilfs_copy_back_pages(struct address_space *dmap,
 			struct page *page2;
 
 			/* move the page to the destination cache */
-			spin_lock_irq(&smap->tree_lock);
-			page2 = radix_tree_delete(&smap->page_tree, offset);
+			xa_lock_irq(&smap->pages);
+			page2 = radix_tree_delete(&smap->pages, offset);
 			WARN_ON(page2 != page);
 
 			smap->nrpages--;
-			spin_unlock_irq(&smap->tree_lock);
+			xa_unlock_irq(&smap->pages);
 
-			spin_lock_irq(&dmap->tree_lock);
-			err = radix_tree_insert(&dmap->page_tree, offset, page);
+			xa_lock_irq(&dmap->pages);
+			err = radix_tree_insert(&dmap->pages, offset, page);
 			if (unlikely(err < 0)) {
 				WARN_ON(err == -EEXIST);
 				page->mapping = NULL;
@@ -348,11 +348,11 @@ void nilfs_copy_back_pages(struct address_space *dmap,
 				page->mapping = dmap;
 				dmap->nrpages++;
 				if (PageDirty(page))
-					radix_tree_tag_set(&dmap->page_tree,
+					radix_tree_tag_set(&dmap->pages,
 							   offset,
 							   PAGECACHE_TAG_DIRTY);
 			}
-			spin_unlock_irq(&dmap->tree_lock);
+			xa_unlock_irq(&dmap->pages);
 		}
 		unlock_page(page);
 	}
@@ -474,15 +474,15 @@ int __nilfs_clear_page_dirty(struct page *page)
 	struct address_space *mapping = page->mapping;
 
 	if (mapping) {
-		spin_lock_irq(&mapping->tree_lock);
+		xa_lock_irq(&mapping->pages);
 		if (test_bit(PG_dirty, &page->flags)) {
-			radix_tree_tag_clear(&mapping->page_tree,
+			radix_tree_tag_clear(&mapping->pages,
 					     page_index(page),
 					     PAGECACHE_TAG_DIRTY);
-			spin_unlock_irq(&mapping->tree_lock);
+			xa_unlock_irq(&mapping->pages);
 			return clear_page_dirty_for_io(page);
 		}
-		spin_unlock_irq(&mapping->tree_lock);
+		xa_unlock_irq(&mapping->pages);
 		return 0;
 	}
 	return TestClearPageDirty(page);
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index e54e7e0033eb..9038f6c1eeda 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -329,7 +329,7 @@ static inline bool inode_to_wb_is_valid(struct inode *inode)
  * @inode: inode of interest
  *
  * Returns the wb @inode is currently associated with.  The caller must be
- * holding either @inode->i_lock, @inode->i_mapping->tree_lock, or the
+ * holding either @inode->i_lock, @inode->i_mapping->pages.xa_lock, or the
  * associated wb's list_lock.
  */
 static inline struct bdi_writeback *inode_to_wb(struct inode *inode)
@@ -337,7 +337,7 @@ static inline struct bdi_writeback *inode_to_wb(struct inode *inode)
 #ifdef CONFIG_LOCKDEP
 	WARN_ON_ONCE(debug_locks &&
 		     (!lockdep_is_held(&inode->i_lock) &&
-		      !lockdep_is_held(&inode->i_mapping->tree_lock) &&
+		      !xa_lock_held(&inode->i_mapping->pages) &&
 		      !lockdep_is_held(&inode->i_wb->list_lock)));
 #endif
 	return inode->i_wb;
@@ -349,7 +349,7 @@ static inline struct bdi_writeback *inode_to_wb(struct inode *inode)
  * @lockedp: temp bool output param, to be passed to the end function
  *
  * The caller wants to access the wb associated with @inode but isn't
- * holding inode->i_lock, mapping->tree_lock or wb->list_lock.  This
+ * holding inode->i_lock, mapping->pages.xa_lock or wb->list_lock.  This
  * function determines the wb associated with @inode and ensures that the
  * association doesn't change until the transaction is finished with
  * unlocked_inode_to_wb_end().
@@ -370,10 +370,10 @@ unlocked_inode_to_wb_begin(struct inode *inode, bool *lockedp)
 	*lockedp = smp_load_acquire(&inode->i_state) & I_WB_SWITCH;
 
 	if (unlikely(*lockedp))
-		spin_lock_irq(&inode->i_mapping->tree_lock);
+		xa_lock_irq(&inode->i_mapping->pages);
 
 	/*
-	 * Protected by either !I_WB_SWITCH + rcu_read_lock() or tree_lock.
+	 * Protected by either !I_WB_SWITCH + rcu_read_lock() or xa_lock.
 	 * inode_to_wb() will bark.  Deref directly.
 	 */
 	return inode->i_wb;
@@ -387,7 +387,7 @@ unlocked_inode_to_wb_begin(struct inode *inode, bool *lockedp)
 static inline void unlocked_inode_to_wb_end(struct inode *inode, bool locked)
 {
 	if (unlikely(locked))
-		spin_unlock_irq(&inode->i_mapping->tree_lock);
+		xa_unlock_irq(&inode->i_mapping->pages);
 
 	rcu_read_unlock();
 }
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2995a271ec46..265340149265 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -13,6 +13,7 @@
 #include <linux/list_lru.h>
 #include <linux/llist.h>
 #include <linux/radix-tree.h>
+#include <linux/xarray.h>
 #include <linux/rbtree.h>
 #include <linux/init.h>
 #include <linux/pid.h>
@@ -390,23 +391,21 @@ int pagecache_write_end(struct file *, struct address_space *mapping,
 
 struct address_space {
 	struct inode		*host;		/* owner: inode, block_device */
-	struct radix_tree_root	page_tree;	/* radix tree of all pages */
-	spinlock_t		tree_lock;	/* and lock protecting it */
+	struct radix_tree_root	pages;		/* cached pages */
+	gfp_t			gfp_mask;	/* for allocating pages */
 	atomic_t		i_mmap_writable;/* count VM_SHARED mappings */
 	struct rb_root_cached	i_mmap;		/* tree of private and shared mappings */
 	struct rw_semaphore	i_mmap_rwsem;	/* protect tree, count, list */
-	/* Protected by tree_lock together with the radix tree */
+	/* Protected by pages.xa_lock */
 	unsigned long		nrpages;	/* number of total pages */
-	/* number of shadow or DAX exceptional entries */
-	unsigned long		nrexceptional;
+	unsigned long		nrexceptional;	/* shadow or DAX entries */
 	pgoff_t			writeback_index;/* writeback starts here */
 	const struct address_space_operations *a_ops;	/* methods */
 	unsigned long		flags;		/* error bits */
+	errseq_t		wb_err;
 	spinlock_t		private_lock;	/* for use by the address_space */
-	gfp_t			gfp_mask;	/* implicit gfp mask for allocations */
-	struct list_head	private_list;	/* for use by the address_space */
+	struct list_head	private_list;	/* ditto */
 	void			*private_data;	/* ditto */
-	errseq_t		wb_err;
 } __attribute__((aligned(sizeof(long)))) __randomize_layout;
 	/*
 	 * On most architectures that alignment is already the case; but
@@ -1977,7 +1976,7 @@ static inline void init_sync_kiocb(struct kiocb *kiocb, struct file *filp)
  *
  * I_WB_SWITCH		Cgroup bdi_writeback switching in progress.  Used to
  *			synchronize competing switching instances and to tell
- *			wb stat updates to grab mapping->tree_lock.  See
+ *			wb stat updates to grab mapping->pages.xa_lock.  See
  *			inode_switch_wb_work_fn() for details.
  *
  * I_OVL_INUSE		Used by overlayfs to get exclusive ownership on upper
diff --git a/include/linux/mm.h b/include/linux/mm.h
index c692451245be..4a6e147dcf9d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -737,7 +737,7 @@ int finish_mkwrite_fault(struct vm_fault *vmf);
  * refcount. The each user mapping also has a reference to the page.
  *
  * The pagecache pages are stored in a per-mapping radix tree, which is
- * rooted at mapping->page_tree, and indexed by offset.
+ * rooted at mapping->pages, and indexed by offset.
  * Where 2.4 and early 2.6 kernels kept dirty/clean pages in per-address_space
  * lists, we instead now tag pages as dirty/writeback in the radix tree.
  *
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 34ce3ebf97d5..80a6149152d4 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -144,7 +144,7 @@ void release_pages(struct page **pages, int nr);
  * 3. check the page is still in pagecache (if no, goto 1)
  *
  * Remove-side that cares about stability of _refcount (eg. reclaim) has the
- * following (with tree_lock held for write):
+ * following (with pages.xa_lock held):
  * A. atomically check refcount is correct and set it to 0 (atomic_cmpxchg)
  * B. remove page from pagecache
  * C. free the page
@@ -157,7 +157,7 @@ void release_pages(struct page **pages, int nr);
  *
  * It is possible that between 1 and 2, the page is removed then the exact same
  * page is inserted into the same position in pagecache. That's OK: the
- * old find_get_page using tree_lock could equally have run before or after
+ * old find_get_page using a lock could equally have run before or after
  * such a re-insertion, depending on order that locks are granted.
  *
  * Lookups racing against pagecache insertion isn't a big problem: either 1
diff --git a/mm/filemap.c b/mm/filemap.c
index ee83baaf855d..5c8f22fe4e62 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -67,7 +67,7 @@
  *  ->i_mmap_rwsem		(truncate_pagecache)
  *    ->private_lock		(__free_pte->__set_page_dirty_buffers)
  *      ->swap_lock		(exclusive_swap_page, others)
- *        ->mapping->tree_lock
+ *        ->mapping->pages.xa_lock
  *
  *  ->i_mutex
  *    ->i_mmap_rwsem		(truncate->unmap_mapping_range)
@@ -75,7 +75,7 @@
  *  ->mmap_sem
  *    ->i_mmap_rwsem
  *      ->page_table_lock or pte_lock	(various, mainly in memory.c)
- *        ->mapping->tree_lock	(arch-dependent flush_dcache_mmap_lock)
+ *        ->mapping->pages.xa_lock	(arch-dependent flush_dcache_mmap_lock)
  *
  *  ->mmap_sem
  *    ->lock_page		(access_process_vm)
@@ -85,7 +85,7 @@
  *
  *  bdi->wb.list_lock
  *    sb_lock			(fs/fs-writeback.c)
- *    ->mapping->tree_lock	(__sync_single_inode)
+ *    ->mapping->pages.xa_lock	(__sync_single_inode)
  *
  *  ->i_mmap_rwsem
  *    ->anon_vma.lock		(vma_adjust)
@@ -96,11 +96,11 @@
  *  ->page_table_lock or pte_lock
  *    ->swap_lock		(try_to_unmap_one)
  *    ->private_lock		(try_to_unmap_one)
- *    ->tree_lock		(try_to_unmap_one)
+ *    ->pages.xa_lock		(try_to_unmap_one)
  *    ->zone_lru_lock(zone)	(follow_page->mark_page_accessed)
  *    ->zone_lru_lock(zone)	(check_pte_range->isolate_lru_page)
  *    ->private_lock		(page_remove_rmap->set_page_dirty)
- *    ->tree_lock		(page_remove_rmap->set_page_dirty)
+ *    ->pages.xa_lock		(page_remove_rmap->set_page_dirty)
  *    bdi.wb->list_lock		(page_remove_rmap->set_page_dirty)
  *    ->inode->i_lock		(page_remove_rmap->set_page_dirty)
  *    ->memcg->move_lock	(page_remove_rmap->lock_page_memcg)
@@ -119,14 +119,14 @@ static int page_cache_tree_insert(struct address_space *mapping,
 	void **slot;
 	int error;
 
-	error = __radix_tree_create(&mapping->page_tree, page->index, 0,
+	error = __radix_tree_create(&mapping->pages, page->index, 0,
 				    &node, &slot);
 	if (error)
 		return error;
 	if (*slot) {
 		void *p;
 
-		p = radix_tree_deref_slot_protected(slot, &mapping->tree_lock);
+		p = radix_tree_deref_slot_protected(slot, &mapping->pages.xa_lock);
 		if (!radix_tree_exceptional_entry(p))
 			return -EEXIST;
 
@@ -134,7 +134,7 @@ static int page_cache_tree_insert(struct address_space *mapping,
 		if (shadowp)
 			*shadowp = p;
 	}
-	__radix_tree_replace(&mapping->page_tree, node, slot, page,
+	__radix_tree_replace(&mapping->pages, node, slot, page,
 			     workingset_lookup_update(mapping));
 	mapping->nrpages++;
 	return 0;
@@ -156,13 +156,13 @@ static void page_cache_tree_delete(struct address_space *mapping,
 		struct radix_tree_node *node;
 		void **slot;
 
-		__radix_tree_lookup(&mapping->page_tree, page->index + i,
+		__radix_tree_lookup(&mapping->pages, page->index + i,
 				    &node, &slot);
 
 		VM_BUG_ON_PAGE(!node && nr != 1, page);
 
-		radix_tree_clear_tags(&mapping->page_tree, node, slot);
-		__radix_tree_replace(&mapping->page_tree, node, slot, shadow,
+		radix_tree_clear_tags(&mapping->pages, node, slot);
+		__radix_tree_replace(&mapping->pages, node, slot, shadow,
 				workingset_lookup_update(mapping));
 	}
 
@@ -254,7 +254,7 @@ static void unaccount_page_cache_page(struct address_space *mapping,
 /*
  * Delete a page from the page cache and free it. Caller has to make
  * sure the page is locked and that nobody else uses it - or that usage
- * is safe.  The caller must hold the mapping's tree_lock.
+ * is safe.  The caller must hold the mapping's xa_lock.
  */
 void __delete_from_page_cache(struct page *page, void *shadow)
 {
@@ -297,9 +297,9 @@ void delete_from_page_cache(struct page *page)
 	unsigned long flags;
 
 	BUG_ON(!PageLocked(page));
-	spin_lock_irqsave(&mapping->tree_lock, flags);
+	xa_lock_irqsave(&mapping->pages, flags);
 	__delete_from_page_cache(page, NULL);
-	spin_unlock_irqrestore(&mapping->tree_lock, flags);
+	xa_unlock_irqrestore(&mapping->pages, flags);
 
 	page_cache_free_page(mapping, page);
 }
@@ -310,14 +310,14 @@ EXPORT_SYMBOL(delete_from_page_cache);
  * @mapping: the mapping to which pages belong
  * @pvec: pagevec with pages to delete
  *
- * The function walks over mapping->page_tree and removes pages passed in @pvec
- * from the radix tree. The function expects @pvec to be sorted by page index.
- * It tolerates holes in @pvec (radix tree entries at those indices are not
+ * The function walks over mapping->pages and removes pages passed in @pvec
+ * from the mapping. The function expects @pvec to be sorted by page index.
+ * It tolerates holes in @pvec (mapping entries at those indices are not
  * modified). The function expects only THP head pages to be present in the
- * @pvec and takes care to delete all corresponding tail pages from the radix
- * tree as well.
+ * @pvec and takes care to delete all corresponding tail pages from the
+ * mapping as well.
  *
- * The function expects mapping->tree_lock to be held.
+ * The function expects xa_lock to be held.
  */
 static void
 page_cache_tree_delete_batch(struct address_space *mapping,
@@ -331,11 +331,11 @@ page_cache_tree_delete_batch(struct address_space *mapping,
 	pgoff_t start;
 
 	start = pvec->pages[0]->index;
-	radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) {
+	radix_tree_for_each_slot(slot, &mapping->pages, &iter, start) {
 		if (i >= pagevec_count(pvec) && !tail_pages)
 			break;
 		page = radix_tree_deref_slot_protected(slot,
-						       &mapping->tree_lock);
+						       &mapping->pages.xa_lock);
 		if (radix_tree_exceptional_entry(page))
 			continue;
 		if (!tail_pages) {
@@ -358,8 +358,8 @@ page_cache_tree_delete_batch(struct address_space *mapping,
 		} else {
 			tail_pages--;
 		}
-		radix_tree_clear_tags(&mapping->page_tree, iter.node, slot);
-		__radix_tree_replace(&mapping->page_tree, iter.node, slot, NULL,
+		radix_tree_clear_tags(&mapping->pages, iter.node, slot);
+		__radix_tree_replace(&mapping->pages, iter.node, slot, NULL,
 				workingset_lookup_update(mapping));
 		total_pages++;
 	}
@@ -375,14 +375,14 @@ void delete_from_page_cache_batch(struct address_space *mapping,
 	if (!pagevec_count(pvec))
 		return;
 
-	spin_lock_irqsave(&mapping->tree_lock, flags);
+	xa_lock_irqsave(&mapping->pages, flags);
 	for (i = 0; i < pagevec_count(pvec); i++) {
 		trace_mm_filemap_delete_from_page_cache(pvec->pages[i]);
 
 		unaccount_page_cache_page(mapping, pvec->pages[i]);
 	}
 	page_cache_tree_delete_batch(mapping, pvec);
-	spin_unlock_irqrestore(&mapping->tree_lock, flags);
+	xa_unlock_irqrestore(&mapping->pages, flags);
 
 	for (i = 0; i < pagevec_count(pvec); i++)
 		page_cache_free_page(mapping, pvec->pages[i]);
@@ -799,7 +799,7 @@ int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask)
 		new->mapping = mapping;
 		new->index = offset;
 
-		spin_lock_irqsave(&mapping->tree_lock, flags);
+		xa_lock_irqsave(&mapping->pages, flags);
 		__delete_from_page_cache(old, NULL);
 		error = page_cache_tree_insert(mapping, new, NULL);
 		BUG_ON(error);
@@ -811,7 +811,7 @@ int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask)
 			__inc_node_page_state(new, NR_FILE_PAGES);
 		if (PageSwapBacked(new))
 			__inc_node_page_state(new, NR_SHMEM);
-		spin_unlock_irqrestore(&mapping->tree_lock, flags);
+		xa_unlock_irqrestore(&mapping->pages, flags);
 		mem_cgroup_migrate(old, new);
 		radix_tree_preload_end();
 		if (freepage)
@@ -853,7 +853,7 @@ static int __add_to_page_cache_locked(struct page *page,
 	page->mapping = mapping;
 	page->index = offset;
 
-	spin_lock_irq(&mapping->tree_lock);
+	xa_lock_irq(&mapping->pages);
 	error = page_cache_tree_insert(mapping, page, shadowp);
 	radix_tree_preload_end();
 	if (unlikely(error))
@@ -862,7 +862,7 @@ static int __add_to_page_cache_locked(struct page *page,
 	/* hugetlb pages do not participate in page cache accounting. */
 	if (!huge)
 		__inc_node_page_state(page, NR_FILE_PAGES);
-	spin_unlock_irq(&mapping->tree_lock);
+	xa_unlock_irq(&mapping->pages);
 	if (!huge)
 		mem_cgroup_commit_charge(page, memcg, false, false);
 	trace_mm_filemap_add_to_page_cache(page);
@@ -870,7 +870,7 @@ static int __add_to_page_cache_locked(struct page *page,
 err_insert:
 	page->mapping = NULL;
 	/* Leave page->index set: truncation relies upon it */
-	spin_unlock_irq(&mapping->tree_lock);
+	xa_unlock_irq(&mapping->pages);
 	if (!huge)
 		mem_cgroup_cancel_charge(page, memcg, false);
 	put_page(page);
@@ -1354,7 +1354,7 @@ pgoff_t page_cache_next_hole(struct address_space *mapping,
 	for (i = 0; i < max_scan; i++) {
 		struct page *page;
 
-		page = radix_tree_lookup(&mapping->page_tree, index);
+		page = radix_tree_lookup(&mapping->pages, index);
 		if (!page || radix_tree_exceptional_entry(page))
 			break;
 		index++;
@@ -1395,7 +1395,7 @@ pgoff_t page_cache_prev_hole(struct address_space *mapping,
 	for (i = 0; i < max_scan; i++) {
 		struct page *page;
 
-		page = radix_tree_lookup(&mapping->page_tree, index);
+		page = radix_tree_lookup(&mapping->pages, index);
 		if (!page || radix_tree_exceptional_entry(page))
 			break;
 		index--;
@@ -1428,7 +1428,7 @@ struct page *find_get_entry(struct address_space *mapping, pgoff_t offset)
 	rcu_read_lock();
 repeat:
 	page = NULL;
-	pagep = radix_tree_lookup_slot(&mapping->page_tree, offset);
+	pagep = radix_tree_lookup_slot(&mapping->pages, offset);
 	if (pagep) {
 		page = radix_tree_deref_slot(pagep);
 		if (unlikely(!page))
@@ -1634,7 +1634,7 @@ unsigned find_get_entries(struct address_space *mapping,
 		return 0;
 
 	rcu_read_lock();
-	radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) {
+	radix_tree_for_each_slot(slot, &mapping->pages, &iter, start) {
 		struct page *head, *page;
 repeat:
 		page = radix_tree_deref_slot(slot);
@@ -1711,7 +1711,7 @@ unsigned find_get_pages_range(struct address_space *mapping, pgoff_t *start,
 		return 0;
 
 	rcu_read_lock();
-	radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, *start) {
+	radix_tree_for_each_slot(slot, &mapping->pages, &iter, *start) {
 		struct page *head, *page;
 
 		if (iter.index > end)
@@ -1796,7 +1796,7 @@ unsigned find_get_pages_contig(struct address_space *mapping, pgoff_t index,
 		return 0;
 
 	rcu_read_lock();
-	radix_tree_for_each_contig(slot, &mapping->page_tree, &iter, index) {
+	radix_tree_for_each_contig(slot, &mapping->pages, &iter, index) {
 		struct page *head, *page;
 repeat:
 		page = radix_tree_deref_slot(slot);
@@ -1876,8 +1876,7 @@ unsigned find_get_pages_range_tag(struct address_space *mapping, pgoff_t *index,
 		return 0;
 
 	rcu_read_lock();
-	radix_tree_for_each_tagged(slot, &mapping->page_tree,
-				   &iter, *index, tag) {
+	radix_tree_for_each_tagged(slot, &mapping->pages, &iter, *index, tag) {
 		struct page *head, *page;
 
 		if (iter.index > end)
@@ -1970,8 +1969,7 @@ unsigned find_get_entries_tag(struct address_space *mapping, pgoff_t start,
 		return 0;
 
 	rcu_read_lock();
-	radix_tree_for_each_tagged(slot, &mapping->page_tree,
-				   &iter, start, tag) {
+	radix_tree_for_each_tagged(slot, &mapping->pages, &iter, start, tag) {
 		struct page *head, *page;
 repeat:
 		page = radix_tree_deref_slot(slot);
@@ -2625,8 +2623,7 @@ void filemap_map_pages(struct vm_fault *vmf,
 	struct page *head, *page;
 
 	rcu_read_lock();
-	radix_tree_for_each_slot(slot, &mapping->page_tree, &iter,
-			start_pgoff) {
+	radix_tree_for_each_slot(slot, &mapping->pages, &iter, start_pgoff) {
 		if (iter.index > end_pgoff)
 			break;
 repeat:
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 86fe697e8bfb..1fe17f3b7473 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2468,7 +2468,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 	} else {
 		/* Additional pin to radix tree */
 		page_ref_add(head, 2);
-		spin_unlock(&head->mapping->tree_lock);
+		xa_unlock(&head->mapping->pages);
 	}
 
 	spin_unlock_irqrestore(zone_lru_lock(page_zone(head)), flags);
@@ -2676,15 +2676,15 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 	if (mapping) {
 		void **pslot;
 
-		spin_lock(&mapping->tree_lock);
-		pslot = radix_tree_lookup_slot(&mapping->page_tree,
+		xa_lock(&mapping->pages);
+		pslot = radix_tree_lookup_slot(&mapping->pages,
 				page_index(head));
 		/*
 		 * Check if the head page is present in radix tree.
 		 * We assume all tail are present too, if head is there.
 		 */
 		if (radix_tree_deref_slot_protected(pslot,
-					&mapping->tree_lock) != head)
+					&mapping->pages.xa_lock) != head)
 			goto fail;
 	}
 
@@ -2718,7 +2718,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 		}
 		spin_unlock(&pgdata->split_queue_lock);
 fail:		if (mapping)
-			spin_unlock(&mapping->tree_lock);
+			xa_unlock(&mapping->pages);
 		spin_unlock_irqrestore(zone_lru_lock(page_zone(head)), flags);
 		unfreeze_page(head);
 		ret = -EBUSY;
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index ea4ff259b671..cb4d199bf328 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1339,8 +1339,8 @@ static void collapse_shmem(struct mm_struct *mm,
 	 */
 
 	index = start;
-	spin_lock_irq(&mapping->tree_lock);
-	radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) {
+	xa_lock_irq(&mapping->pages);
+	radix_tree_for_each_slot(slot, &mapping->pages, &iter, start) {
 		int n = min(iter.index, end) - index;
 
 		/*
@@ -1353,7 +1353,7 @@ static void collapse_shmem(struct mm_struct *mm,
 		}
 		nr_none += n;
 		for (; index < min(iter.index, end); index++) {
-			radix_tree_insert(&mapping->page_tree, index,
+			radix_tree_insert(&mapping->pages, index,
 					new_page + (index % HPAGE_PMD_NR));
 		}
 
@@ -1362,16 +1362,16 @@ static void collapse_shmem(struct mm_struct *mm,
 			break;
 
 		page = radix_tree_deref_slot_protected(slot,
-				&mapping->tree_lock);
+				&mapping->pages.xa_lock);
 		if (radix_tree_exceptional_entry(page) || !PageUptodate(page)) {
-			spin_unlock_irq(&mapping->tree_lock);
+			xa_unlock_irq(&mapping->pages);
 			/* swap in or instantiate fallocated page */
 			if (shmem_getpage(mapping->host, index, &page,
 						SGP_NOHUGE)) {
 				result = SCAN_FAIL;
 				goto tree_unlocked;
 			}
-			spin_lock_irq(&mapping->tree_lock);
+			xa_lock_irq(&mapping->pages);
 		} else if (trylock_page(page)) {
 			get_page(page);
 		} else {
@@ -1380,7 +1380,7 @@ static void collapse_shmem(struct mm_struct *mm,
 		}
 
 		/*
-		 * The page must be locked, so we can drop the tree_lock
+		 * The page must be locked, so we can drop the xa_lock
 		 * without racing with truncate.
 		 */
 		VM_BUG_ON_PAGE(!PageLocked(page), page);
@@ -1391,7 +1391,7 @@ static void collapse_shmem(struct mm_struct *mm,
 			result = SCAN_TRUNCATED;
 			goto out_unlock;
 		}
-		spin_unlock_irq(&mapping->tree_lock);
+		xa_unlock_irq(&mapping->pages);
 
 		if (isolate_lru_page(page)) {
 			result = SCAN_DEL_PAGE_LRU;
@@ -1402,11 +1402,11 @@ static void collapse_shmem(struct mm_struct *mm,
 			unmap_mapping_range(mapping, index << PAGE_SHIFT,
 					PAGE_SIZE, 0);
 
-		spin_lock_irq(&mapping->tree_lock);
+		xa_lock_irq(&mapping->pages);
 
-		slot = radix_tree_lookup_slot(&mapping->page_tree, index);
+		slot = radix_tree_lookup_slot(&mapping->pages, index);
 		VM_BUG_ON_PAGE(page != radix_tree_deref_slot_protected(slot,
-					&mapping->tree_lock), page);
+					&mapping->pages.xa_lock), page);
 		VM_BUG_ON_PAGE(page_mapped(page), page);
 
 		/*
@@ -1427,14 +1427,14 @@ static void collapse_shmem(struct mm_struct *mm,
 		list_add_tail(&page->lru, &pagelist);
 
 		/* Finally, replace with the new page. */
-		radix_tree_replace_slot(&mapping->page_tree, slot,
+		radix_tree_replace_slot(&mapping->pages, slot,
 				new_page + (index % HPAGE_PMD_NR));
 
 		slot = radix_tree_iter_resume(slot, &iter);
 		index++;
 		continue;
 out_lru:
-		spin_unlock_irq(&mapping->tree_lock);
+		xa_unlock_irq(&mapping->pages);
 		putback_lru_page(page);
 out_isolate_failed:
 		unlock_page(page);
@@ -1460,14 +1460,14 @@ static void collapse_shmem(struct mm_struct *mm,
 		}
 
 		for (; index < end; index++) {
-			radix_tree_insert(&mapping->page_tree, index,
+			radix_tree_insert(&mapping->pages, index,
 					new_page + (index % HPAGE_PMD_NR));
 		}
 		nr_none += n;
 	}
 
 tree_locked:
-	spin_unlock_irq(&mapping->tree_lock);
+	xa_unlock_irq(&mapping->pages);
 tree_unlocked:
 
 	if (result == SCAN_SUCCEED) {
@@ -1516,9 +1516,8 @@ static void collapse_shmem(struct mm_struct *mm,
 	} else {
 		/* Something went wrong: rollback changes to the radix-tree */
 		shmem_uncharge(mapping->host, nr_none);
-		spin_lock_irq(&mapping->tree_lock);
-		radix_tree_for_each_slot(slot, &mapping->page_tree, &iter,
-				start) {
+		xa_lock_irq(&mapping->pages);
+		radix_tree_for_each_slot(slot, &mapping->pages, &iter, start) {
 			if (iter.index >= end)
 				break;
 			page = list_first_entry_or_null(&pagelist,
@@ -1528,8 +1527,7 @@ static void collapse_shmem(struct mm_struct *mm,
 					break;
 				nr_none--;
 				/* Put holes back where they were */
-				radix_tree_delete(&mapping->page_tree,
-						  iter.index);
+				radix_tree_delete(&mapping->pages, iter.index);
 				continue;
 			}
 
@@ -1538,16 +1536,15 @@ static void collapse_shmem(struct mm_struct *mm,
 			/* Unfreeze the page. */
 			list_del(&page->lru);
 			page_ref_unfreeze(page, 2);
-			radix_tree_replace_slot(&mapping->page_tree,
-						slot, page);
+			radix_tree_replace_slot(&mapping->pages, slot, page);
 			slot = radix_tree_iter_resume(slot, &iter);
-			spin_unlock_irq(&mapping->tree_lock);
+			xa_unlock_irq(&mapping->pages);
 			putback_lru_page(page);
 			unlock_page(page);
-			spin_lock_irq(&mapping->tree_lock);
+			xa_lock_irq(&mapping->pages);
 		}
 		VM_BUG_ON(nr_none);
-		spin_unlock_irq(&mapping->tree_lock);
+		xa_unlock_irq(&mapping->pages);
 
 		/* Unfreeze new_page, caller would take care about freeing it */
 		page_ref_unfreeze(new_page, 1);
@@ -1575,7 +1572,7 @@ static void khugepaged_scan_shmem(struct mm_struct *mm,
 	swap = 0;
 	memset(khugepaged_node_load, 0, sizeof(khugepaged_node_load));
 	rcu_read_lock();
-	radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) {
+	radix_tree_for_each_slot(slot, &mapping->pages, &iter, start) {
 		if (iter.index >= start + HPAGE_PMD_NR)
 			break;
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 50e6906314f8..953dfed9e780 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6034,7 +6034,7 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
 
 	/*
 	 * Interrupts should be disabled here because the caller holds the
-	 * mapping->tree_lock lock which is taken with interrupts-off. It is
+	 * mapping->pages xa_lock which is taken with interrupts-off. It is
 	 * important here to have the interrupts disabled because it is the
 	 * only synchronisation we have for udpating the per-CPU variables.
 	 */
diff --git a/mm/migrate.c b/mm/migrate.c
index 4d0be47a322a..59f18c571120 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -466,20 +466,20 @@ int migrate_page_move_mapping(struct address_space *mapping,
 	oldzone = page_zone(page);
 	newzone = page_zone(newpage);
 
-	spin_lock_irq(&mapping->tree_lock);
+	xa_lock_irq(&mapping->pages);
 
-	pslot = radix_tree_lookup_slot(&mapping->page_tree,
+	pslot = radix_tree_lookup_slot(&mapping->pages,
  					page_index(page));
 
 	expected_count += 1 + page_has_private(page);
 	if (page_count(page) != expected_count ||
-		radix_tree_deref_slot_protected(pslot, &mapping->tree_lock) != page) {
-		spin_unlock_irq(&mapping->tree_lock);
+		radix_tree_deref_slot_protected(pslot, &mapping->pages.xa_lock) != page) {
+		xa_unlock_irq(&mapping->pages);
 		return -EAGAIN;
 	}
 
 	if (!page_ref_freeze(page, expected_count)) {
-		spin_unlock_irq(&mapping->tree_lock);
+		xa_unlock_irq(&mapping->pages);
 		return -EAGAIN;
 	}
 
@@ -493,7 +493,7 @@ int migrate_page_move_mapping(struct address_space *mapping,
 	if (mode == MIGRATE_ASYNC && head &&
 			!buffer_migrate_lock_buffers(head, mode)) {
 		page_ref_unfreeze(page, expected_count);
-		spin_unlock_irq(&mapping->tree_lock);
+		xa_unlock_irq(&mapping->pages);
 		return -EAGAIN;
 	}
 
@@ -521,7 +521,7 @@ int migrate_page_move_mapping(struct address_space *mapping,
 		SetPageDirty(newpage);
 	}
 
-	radix_tree_replace_slot(&mapping->page_tree, pslot, newpage);
+	radix_tree_replace_slot(&mapping->pages, pslot, newpage);
 
 	/*
 	 * Drop cache reference from old page by unfreezing
@@ -530,7 +530,7 @@ int migrate_page_move_mapping(struct address_space *mapping,
 	 */
 	page_ref_unfreeze(page, expected_count - 1);
 
-	spin_unlock(&mapping->tree_lock);
+	xa_unlock(&mapping->pages);
 	/* Leave irq disabled to prevent preemption while updating stats */
 
 	/*
@@ -573,20 +573,19 @@ int migrate_huge_page_move_mapping(struct address_space *mapping,
 	int expected_count;
 	void **pslot;
 
-	spin_lock_irq(&mapping->tree_lock);
+	xa_lock_irq(&mapping->pages);
 
-	pslot = radix_tree_lookup_slot(&mapping->page_tree,
-					page_index(page));
+	pslot = radix_tree_lookup_slot(&mapping->pages, page_index(page));
 
 	expected_count = 2 + page_has_private(page);
 	if (page_count(page) != expected_count ||
-		radix_tree_deref_slot_protected(pslot, &mapping->tree_lock) != page) {
-		spin_unlock_irq(&mapping->tree_lock);
+		radix_tree_deref_slot_protected(pslot, &mapping->pages.xa_lock) != page) {
+		xa_unlock_irq(&mapping->pages);
 		return -EAGAIN;
 	}
 
 	if (!page_ref_freeze(page, expected_count)) {
-		spin_unlock_irq(&mapping->tree_lock);
+		xa_unlock_irq(&mapping->pages);
 		return -EAGAIN;
 	}
 
@@ -595,11 +594,11 @@ int migrate_huge_page_move_mapping(struct address_space *mapping,
 
 	get_page(newpage);
 
-	radix_tree_replace_slot(&mapping->page_tree, pslot, newpage);
+	radix_tree_replace_slot(&mapping->pages, pslot, newpage);
 
 	page_ref_unfreeze(page, expected_count - 1);
 
-	spin_unlock_irq(&mapping->tree_lock);
+	xa_unlock_irq(&mapping->pages);
 
 	return MIGRATEPAGE_SUCCESS;
 }
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 8a1551154285..6a5b92629727 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2101,7 +2101,7 @@ void __init page_writeback_init(void)
  * so that it can tag pages faster than a dirtying process can create them).
  */
 /*
- * We tag pages in batches of WRITEBACK_TAG_BATCH to reduce tree_lock latency.
+ * We tag pages in batches of WRITEBACK_TAG_BATCH to reduce xa_lock latency.
  */
 void tag_pages_for_writeback(struct address_space *mapping,
 			     pgoff_t start, pgoff_t end)
@@ -2111,22 +2111,22 @@ void tag_pages_for_writeback(struct address_space *mapping,
 	struct radix_tree_iter iter;
 	void **slot;
 
-	spin_lock_irq(&mapping->tree_lock);
-	radix_tree_for_each_tagged(slot, &mapping->page_tree, &iter, start,
+	xa_lock_irq(&mapping->pages);
+	radix_tree_for_each_tagged(slot, &mapping->pages, &iter, start,
 							PAGECACHE_TAG_DIRTY) {
 		if (iter.index > end)
 			break;
-		radix_tree_iter_tag_set(&mapping->page_tree, &iter,
+		radix_tree_iter_tag_set(&mapping->pages, &iter,
 							PAGECACHE_TAG_TOWRITE);
 		tagged++;
 		if ((tagged % WRITEBACK_TAG_BATCH) != 0)
 			continue;
 		slot = radix_tree_iter_resume(slot, &iter);
-		spin_unlock_irq(&mapping->tree_lock);
+		xa_unlock_irq(&mapping->pages);
 		cond_resched();
-		spin_lock_irq(&mapping->tree_lock);
+		xa_lock_irq(&mapping->pages);
 	}
-	spin_unlock_irq(&mapping->tree_lock);
+	xa_unlock_irq(&mapping->pages);
 }
 EXPORT_SYMBOL(tag_pages_for_writeback);
 
@@ -2469,13 +2469,13 @@ int __set_page_dirty_nobuffers(struct page *page)
 			return 1;
 		}
 
-		spin_lock_irqsave(&mapping->tree_lock, flags);
+		xa_lock_irqsave(&mapping->pages, flags);
 		BUG_ON(page_mapping(page) != mapping);
 		WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page));
 		account_page_dirtied(page, mapping);
-		radix_tree_tag_set(&mapping->page_tree, page_index(page),
+		radix_tree_tag_set(&mapping->pages, page_index(page),
 				   PAGECACHE_TAG_DIRTY);
-		spin_unlock_irqrestore(&mapping->tree_lock, flags);
+		xa_unlock_irqrestore(&mapping->pages, flags);
 		unlock_page_memcg(page);
 
 		if (mapping->host) {
@@ -2720,11 +2720,10 @@ int test_clear_page_writeback(struct page *page)
 		struct backing_dev_info *bdi = inode_to_bdi(inode);
 		unsigned long flags;
 
-		spin_lock_irqsave(&mapping->tree_lock, flags);
+		xa_lock_irqsave(&mapping->pages, flags);
 		ret = TestClearPageWriteback(page);
 		if (ret) {
-			radix_tree_tag_clear(&mapping->page_tree,
-						page_index(page),
+			radix_tree_tag_clear(&mapping->pages, page_index(page),
 						PAGECACHE_TAG_WRITEBACK);
 			if (bdi_cap_account_writeback(bdi)) {
 				struct bdi_writeback *wb = inode_to_wb(inode);
@@ -2738,7 +2737,7 @@ int test_clear_page_writeback(struct page *page)
 						     PAGECACHE_TAG_WRITEBACK))
 			sb_clear_inode_writeback(mapping->host);
 
-		spin_unlock_irqrestore(&mapping->tree_lock, flags);
+		xa_unlock_irqrestore(&mapping->pages, flags);
 	} else {
 		ret = TestClearPageWriteback(page);
 	}
@@ -2768,7 +2767,7 @@ int __test_set_page_writeback(struct page *page, bool keep_write)
 		struct backing_dev_info *bdi = inode_to_bdi(inode);
 		unsigned long flags;
 
-		spin_lock_irqsave(&mapping->tree_lock, flags);
+		xa_lock_irqsave(&mapping->pages, flags);
 		ret = TestSetPageWriteback(page);
 		if (!ret) {
 			bool on_wblist;
@@ -2776,8 +2775,7 @@ int __test_set_page_writeback(struct page *page, bool keep_write)
 			on_wblist = mapping_tagged(mapping,
 						   PAGECACHE_TAG_WRITEBACK);
 
-			radix_tree_tag_set(&mapping->page_tree,
-						page_index(page),
+			radix_tree_tag_set(&mapping->pages, page_index(page),
 						PAGECACHE_TAG_WRITEBACK);
 			if (bdi_cap_account_writeback(bdi))
 				inc_wb_stat(inode_to_wb(inode), WB_WRITEBACK);
@@ -2791,14 +2789,12 @@ int __test_set_page_writeback(struct page *page, bool keep_write)
 				sb_mark_inode_writeback(mapping->host);
 		}
 		if (!PageDirty(page))
-			radix_tree_tag_clear(&mapping->page_tree,
-						page_index(page),
+			radix_tree_tag_clear(&mapping->pages, page_index(page),
 						PAGECACHE_TAG_DIRTY);
 		if (!keep_write)
-			radix_tree_tag_clear(&mapping->page_tree,
-						page_index(page),
+			radix_tree_tag_clear(&mapping->pages, page_index(page),
 						PAGECACHE_TAG_TOWRITE);
-		spin_unlock_irqrestore(&mapping->tree_lock, flags);
+		xa_unlock_irqrestore(&mapping->pages, flags);
 	} else {
 		ret = TestSetPageWriteback(page);
 	}
@@ -2818,7 +2814,7 @@ EXPORT_SYMBOL(__test_set_page_writeback);
  */
 int mapping_tagged(struct address_space *mapping, int tag)
 {
-	return radix_tree_tagged(&mapping->page_tree, tag);
+	return radix_tree_tagged(&mapping->pages, tag);
 }
 EXPORT_SYMBOL(mapping_tagged);
 
diff --git a/mm/readahead.c b/mm/readahead.c
index c4ca70239233..514188fd2489 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -175,7 +175,7 @@ int __do_page_cache_readahead(struct address_space *mapping, struct file *filp,
 			break;
 
 		rcu_read_lock();
-		page = radix_tree_lookup(&mapping->page_tree, page_offset);
+		page = radix_tree_lookup(&mapping->pages, page_offset);
 		rcu_read_unlock();
 		if (page && !radix_tree_exceptional_entry(page))
 			continue;
diff --git a/mm/rmap.c b/mm/rmap.c
index 47db27f8049e..87c1ca0cf1a3 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -32,11 +32,11 @@
  *                 mmlist_lock (in mmput, drain_mmlist and others)
  *                 mapping->private_lock (in __set_page_dirty_buffers)
  *                   mem_cgroup_{begin,end}_page_stat (memcg->move_lock)
- *                     mapping->tree_lock (widely used)
+ *                     mapping->pages.xa_lock (widely used)
  *                 inode->i_lock (in set_page_dirty's __mark_inode_dirty)
  *                 bdi.wb->list_lock (in set_page_dirty's __mark_inode_dirty)
  *                   sb_lock (within inode_lock in fs/fs-writeback.c)
- *                   mapping->tree_lock (widely used, in set_page_dirty,
+ *                   mapping->pages.xa_lock (widely used, in set_page_dirty,
  *                             in arch-dependent flush_dcache_mmap_lock,
  *                             within bdi.wb->list_lock in __sync_single_inode)
  *
diff --git a/mm/shmem.c b/mm/shmem.c
index 4aa9307feab0..b1396ed0c9b1 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -332,12 +332,12 @@ static int shmem_radix_tree_replace(struct address_space *mapping,
 
 	VM_BUG_ON(!expected);
 	VM_BUG_ON(!replacement);
-	item = __radix_tree_lookup(&mapping->page_tree, index, &node, &pslot);
+	item = __radix_tree_lookup(&mapping->pages, index, &node, &pslot);
 	if (!item)
 		return -ENOENT;
 	if (item != expected)
 		return -ENOENT;
-	__radix_tree_replace(&mapping->page_tree, node, pslot,
+	__radix_tree_replace(&mapping->pages, node, pslot,
 			     replacement, NULL);
 	return 0;
 }
@@ -355,7 +355,7 @@ static bool shmem_confirm_swap(struct address_space *mapping,
 	void *item;
 
 	rcu_read_lock();
-	item = radix_tree_lookup(&mapping->page_tree, index);
+	item = radix_tree_lookup(&mapping->pages, index);
 	rcu_read_unlock();
 	return item == swp_to_radix_entry(swap);
 }
@@ -581,14 +581,14 @@ static int shmem_add_to_page_cache(struct page *page,
 	page->mapping = mapping;
 	page->index = index;
 
-	spin_lock_irq(&mapping->tree_lock);
+	xa_lock_irq(&mapping->pages);
 	if (PageTransHuge(page)) {
 		void __rcu **results;
 		pgoff_t idx;
 		int i;
 
 		error = 0;
-		if (radix_tree_gang_lookup_slot(&mapping->page_tree,
+		if (radix_tree_gang_lookup_slot(&mapping->pages,
 					&results, &idx, index, 1) &&
 				idx < index + HPAGE_PMD_NR) {
 			error = -EEXIST;
@@ -596,14 +596,14 @@ static int shmem_add_to_page_cache(struct page *page,
 
 		if (!error) {
 			for (i = 0; i < HPAGE_PMD_NR; i++) {
-				error = radix_tree_insert(&mapping->page_tree,
+				error = radix_tree_insert(&mapping->pages,
 						index + i, page + i);
 				VM_BUG_ON(error);
 			}
 			count_vm_event(THP_FILE_ALLOC);
 		}
 	} else if (!expected) {
-		error = radix_tree_insert(&mapping->page_tree, index, page);
+		error = radix_tree_insert(&mapping->pages, index, page);
 	} else {
 		error = shmem_radix_tree_replace(mapping, index, expected,
 								 page);
@@ -615,10 +615,10 @@ static int shmem_add_to_page_cache(struct page *page,
 			__inc_node_page_state(page, NR_SHMEM_THPS);
 		__mod_node_page_state(page_pgdat(page), NR_FILE_PAGES, nr);
 		__mod_node_page_state(page_pgdat(page), NR_SHMEM, nr);
-		spin_unlock_irq(&mapping->tree_lock);
+		xa_unlock_irq(&mapping->pages);
 	} else {
 		page->mapping = NULL;
-		spin_unlock_irq(&mapping->tree_lock);
+		xa_unlock_irq(&mapping->pages);
 		page_ref_sub(page, nr);
 	}
 	return error;
@@ -634,13 +634,13 @@ static void shmem_delete_from_page_cache(struct page *page, void *radswap)
 
 	VM_BUG_ON_PAGE(PageCompound(page), page);
 
-	spin_lock_irq(&mapping->tree_lock);
+	xa_lock_irq(&mapping->pages);
 	error = shmem_radix_tree_replace(mapping, page->index, page, radswap);
 	page->mapping = NULL;
 	mapping->nrpages--;
 	__dec_node_page_state(page, NR_FILE_PAGES);
 	__dec_node_page_state(page, NR_SHMEM);
-	spin_unlock_irq(&mapping->tree_lock);
+	xa_unlock_irq(&mapping->pages);
 	put_page(page);
 	BUG_ON(error);
 }
@@ -653,9 +653,9 @@ static int shmem_free_swap(struct address_space *mapping,
 {
 	void *old;
 
-	spin_lock_irq(&mapping->tree_lock);
-	old = radix_tree_delete_item(&mapping->page_tree, index, radswap);
-	spin_unlock_irq(&mapping->tree_lock);
+	xa_lock_irq(&mapping->pages);
+	old = radix_tree_delete_item(&mapping->pages, index, radswap);
+	xa_unlock_irq(&mapping->pages);
 	if (old != radswap)
 		return -ENOENT;
 	free_swap_and_cache(radix_to_swp_entry(radswap));
@@ -666,7 +666,7 @@ static int shmem_free_swap(struct address_space *mapping,
  * Determine (in bytes) how many of the shmem object's pages mapped by the
  * given offsets are swapped out.
  *
- * This is safe to call without i_mutex or mapping->tree_lock thanks to RCU,
+ * This is safe to call without i_mutex or mapping->pages.xa_lock thanks to RCU,
  * as long as the inode doesn't go away and racy results are not a problem.
  */
 unsigned long shmem_partial_swap_usage(struct address_space *mapping,
@@ -679,7 +679,7 @@ unsigned long shmem_partial_swap_usage(struct address_space *mapping,
 
 	rcu_read_lock();
 
-	radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) {
+	radix_tree_for_each_slot(slot, &mapping->pages, &iter, start) {
 		if (iter.index >= end)
 			break;
 
@@ -708,7 +708,7 @@ unsigned long shmem_partial_swap_usage(struct address_space *mapping,
  * Determine (in bytes) how many of the shmem object's pages mapped by the
  * given vma is swapped out.
  *
- * This is safe to call without i_mutex or mapping->tree_lock thanks to RCU,
+ * This is safe to call without i_mutex or mapping->pages.xa_lock thanks to RCU,
  * as long as the inode doesn't go away and racy results are not a problem.
  */
 unsigned long shmem_swap_usage(struct vm_area_struct *vma)
@@ -1123,7 +1123,7 @@ static int shmem_unuse_inode(struct shmem_inode_info *info,
 	int error = 0;
 
 	radswap = swp_to_radix_entry(swap);
-	index = find_swap_entry(&mapping->page_tree, radswap);
+	index = find_swap_entry(&mapping->pages, radswap);
 	if (index == -1)
 		return -EAGAIN;	/* tell shmem_unuse we found nothing */
 
@@ -1436,7 +1436,7 @@ static struct page *shmem_alloc_hugepage(gfp_t gfp,
 
 	hindex = round_down(index, HPAGE_PMD_NR);
 	rcu_read_lock();
-	if (radix_tree_gang_lookup_slot(&mapping->page_tree, &results, &idx,
+	if (radix_tree_gang_lookup_slot(&mapping->pages, &results, &idx,
 				hindex, 1) && idx < hindex + HPAGE_PMD_NR) {
 		rcu_read_unlock();
 		return NULL;
@@ -1549,14 +1549,14 @@ static int shmem_replace_page(struct page **pagep, gfp_t gfp,
 	 * Our caller will very soon move newpage out of swapcache, but it's
 	 * a nice clean interface for us to replace oldpage by newpage there.
 	 */
-	spin_lock_irq(&swap_mapping->tree_lock);
+	xa_lock_irq(&swap_mapping->pages);
 	error = shmem_radix_tree_replace(swap_mapping, swap_index, oldpage,
 								   newpage);
 	if (!error) {
 		__inc_node_page_state(newpage, NR_FILE_PAGES);
 		__dec_node_page_state(oldpage, NR_FILE_PAGES);
 	}
-	spin_unlock_irq(&swap_mapping->tree_lock);
+	xa_unlock_irq(&swap_mapping->pages);
 
 	if (unlikely(error)) {
 		/*
@@ -2622,7 +2622,7 @@ static void shmem_tag_pins(struct address_space *mapping)
 	start = 0;
 	rcu_read_lock();
 
-	radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) {
+	radix_tree_for_each_slot(slot, &mapping->pages, &iter, start) {
 		page = radix_tree_deref_slot(slot);
 		if (!page || radix_tree_exception(page)) {
 			if (radix_tree_deref_retry(page)) {
@@ -2630,10 +2630,10 @@ static void shmem_tag_pins(struct address_space *mapping)
 				continue;
 			}
 		} else if (page_count(page) - page_mapcount(page) > 1) {
-			spin_lock_irq(&mapping->tree_lock);
-			radix_tree_tag_set(&mapping->page_tree, iter.index,
+			xa_lock_irq(&mapping->pages);
+			radix_tree_tag_set(&mapping->pages, iter.index,
 					   SHMEM_TAG_PINNED);
-			spin_unlock_irq(&mapping->tree_lock);
+			xa_unlock_irq(&mapping->pages);
 		}
 
 		if (need_resched()) {
@@ -2665,7 +2665,7 @@ static int shmem_wait_for_pins(struct address_space *mapping)
 
 	error = 0;
 	for (scan = 0; scan <= LAST_SCAN; scan++) {
-		if (!radix_tree_tagged(&mapping->page_tree, SHMEM_TAG_PINNED))
+		if (!radix_tree_tagged(&mapping->pages, SHMEM_TAG_PINNED))
 			break;
 
 		if (!scan)
@@ -2675,7 +2675,7 @@ static int shmem_wait_for_pins(struct address_space *mapping)
 
 		start = 0;
 		rcu_read_lock();
-		radix_tree_for_each_tagged(slot, &mapping->page_tree, &iter,
+		radix_tree_for_each_tagged(slot, &mapping->pages, &iter,
 					   start, SHMEM_TAG_PINNED) {
 
 			page = radix_tree_deref_slot(slot);
@@ -2701,10 +2701,10 @@ static int shmem_wait_for_pins(struct address_space *mapping)
 				error = -EBUSY;
 			}
 
-			spin_lock_irq(&mapping->tree_lock);
-			radix_tree_tag_clear(&mapping->page_tree,
+			xa_lock_irq(&mapping->pages);
+			radix_tree_tag_clear(&mapping->pages,
 					     iter.index, SHMEM_TAG_PINNED);
-			spin_unlock_irq(&mapping->tree_lock);
+			xa_unlock_irq(&mapping->pages);
 continue_resched:
 			if (need_resched()) {
 				slot = radix_tree_iter_resume(slot, &iter);
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 39ae7cfad90f..3f95e8fc4cb2 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -124,10 +124,10 @@ int __add_to_swap_cache(struct page *page, swp_entry_t entry)
 	SetPageSwapCache(page);
 
 	address_space = swap_address_space(entry);
-	spin_lock_irq(&address_space->tree_lock);
+	xa_lock_irq(&address_space->pages);
 	for (i = 0; i < nr; i++) {
 		set_page_private(page + i, entry.val + i);
-		error = radix_tree_insert(&address_space->page_tree,
+		error = radix_tree_insert(&address_space->pages,
 					  idx + i, page + i);
 		if (unlikely(error))
 			break;
@@ -145,13 +145,13 @@ int __add_to_swap_cache(struct page *page, swp_entry_t entry)
 		VM_BUG_ON(error == -EEXIST);
 		set_page_private(page + i, 0UL);
 		while (i--) {
-			radix_tree_delete(&address_space->page_tree, idx + i);
+			radix_tree_delete(&address_space->pages, idx + i);
 			set_page_private(page + i, 0UL);
 		}
 		ClearPageSwapCache(page);
 		page_ref_sub(page, nr);
 	}
-	spin_unlock_irq(&address_space->tree_lock);
+	xa_unlock_irq(&address_space->pages);
 
 	return error;
 }
@@ -188,7 +188,7 @@ void __delete_from_swap_cache(struct page *page)
 	address_space = swap_address_space(entry);
 	idx = swp_offset(entry);
 	for (i = 0; i < nr; i++) {
-		radix_tree_delete(&address_space->page_tree, idx + i);
+		radix_tree_delete(&address_space->pages, idx + i);
 		set_page_private(page + i, 0);
 	}
 	ClearPageSwapCache(page);
@@ -272,9 +272,9 @@ void delete_from_swap_cache(struct page *page)
 	entry.val = page_private(page);
 
 	address_space = swap_address_space(entry);
-	spin_lock_irq(&address_space->tree_lock);
+	xa_lock_irq(&address_space->pages);
 	__delete_from_swap_cache(page);
-	spin_unlock_irq(&address_space->tree_lock);
+	xa_unlock_irq(&address_space->pages);
 
 	put_swap_page(page, entry);
 	page_ref_sub(page, hpage_nr_pages(page));
@@ -612,12 +612,11 @@ int init_swap_address_space(unsigned int type, unsigned long nr_pages)
 		return -ENOMEM;
 	for (i = 0; i < nr; i++) {
 		space = spaces + i;
-		INIT_RADIX_TREE(&space->page_tree, GFP_ATOMIC|__GFP_NOWARN);
+		INIT_RADIX_TREE(&space->pages, GFP_ATOMIC|__GFP_NOWARN);
 		atomic_set(&space->i_mmap_writable, 0);
 		space->a_ops = &swap_aops;
 		/* swap cache doesn't use writeback related tags */
 		mapping_set_no_writeback_tags(space);
-		spin_lock_init(&space->tree_lock);
 	}
 	nr_swapper_spaces[type] = nr;
 	rcu_assign_pointer(swapper_spaces[type], spaces);
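
Dropping the explicit spin_lock_init() here relies on INIT_RADIX_TREE()
initialising the embedded lock itself.  That is an assumption about the
earlier xa_lock patch rather than something shown in this hunk; conceptually
it amounts to something like:

/* Assumption, not the actual macro from this series: INIT_RADIX_TREE()
 * is taken to initialise the embedded xa_lock along with the old fields. */
#define INIT_RADIX_TREE(root, mask)			\
do {							\
	spin_lock_init(&(root)->xa_lock);		\
	(root)->gfp_mask = (mask);			\
	(root)->rnode = NULL;				\
} while (0)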
diff --git a/mm/truncate.c b/mm/truncate.c
index e4b4cf0f4070..094158f2e447 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -36,11 +36,11 @@ static inline void __clear_shadow_entry(struct address_space *mapping,
 	struct radix_tree_node *node;
 	void **slot;
 
-	if (!__radix_tree_lookup(&mapping->page_tree, index, &node, &slot))
+	if (!__radix_tree_lookup(&mapping->pages, index, &node, &slot))
 		return;
 	if (*slot != entry)
 		return;
-	__radix_tree_replace(&mapping->page_tree, node, slot, NULL,
+	__radix_tree_replace(&mapping->pages, node, slot, NULL,
 			     workingset_update_node);
 	mapping->nrexceptional--;
 }
@@ -48,9 +48,9 @@ static inline void __clear_shadow_entry(struct address_space *mapping,
 static void clear_shadow_entry(struct address_space *mapping, pgoff_t index,
 			       void *entry)
 {
-	spin_lock_irq(&mapping->tree_lock);
+	xa_lock_irq(&mapping->pages);
 	__clear_shadow_entry(mapping, index, entry);
-	spin_unlock_irq(&mapping->tree_lock);
+	xa_unlock_irq(&mapping->pages);
 }
 
 /*
@@ -79,7 +79,7 @@ static void truncate_exceptional_pvec_entries(struct address_space *mapping,
 	dax = dax_mapping(mapping);
 	lock = !dax && indices[j] < end;
 	if (lock)
-		spin_lock_irq(&mapping->tree_lock);
+		xa_lock_irq(&mapping->pages);
 
 	for (i = j; i < pagevec_count(pvec); i++) {
 		struct page *page = pvec->pages[i];
@@ -102,7 +102,7 @@ static void truncate_exceptional_pvec_entries(struct address_space *mapping,
 	}
 
 	if (lock)
-		spin_unlock_irq(&mapping->tree_lock);
+		xa_unlock_irq(&mapping->pages);
 	pvec->nr = j;
 }
 
@@ -522,8 +522,8 @@ void truncate_inode_pages_final(struct address_space *mapping)
 		 * modification that does not see AS_EXITING is
 		 * completed before starting the final truncate.
 		 */
-		spin_lock_irq(&mapping->tree_lock);
-		spin_unlock_irq(&mapping->tree_lock);
+		xa_lock_irq(&mapping->pages);
+		xa_unlock_irq(&mapping->pages);
 
 		truncate_inode_pages(mapping, 0);
 	}
@@ -631,13 +631,13 @@ invalidate_complete_page2(struct address_space *mapping, struct page *page)
 	if (page_has_private(page) && !try_to_release_page(page, GFP_KERNEL))
 		return 0;
 
-	spin_lock_irqsave(&mapping->tree_lock, flags);
+	xa_lock_irqsave(&mapping->pages, flags);
 	if (PageDirty(page))
 		goto failed;
 
 	BUG_ON(page_has_private(page));
 	__delete_from_page_cache(page, NULL);
-	spin_unlock_irqrestore(&mapping->tree_lock, flags);
+	xa_unlock_irqrestore(&mapping->pages, flags);
 
 	if (mapping->a_ops->freepage)
 		mapping->a_ops->freepage(page);
@@ -645,7 +645,7 @@ invalidate_complete_page2(struct address_space *mapping, struct page *page)
 	put_page(page);	/* pagecache ref */
 	return 1;
 failed:
-	spin_unlock_irqrestore(&mapping->tree_lock, flags);
+	xa_unlock_irqrestore(&mapping->pages, flags);
 	return 0;
 }
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c02c850ea349..96316bd91f91 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -674,7 +674,7 @@ static int __remove_mapping(struct address_space *mapping, struct page *page,
 	BUG_ON(!PageLocked(page));
 	BUG_ON(mapping != page_mapping(page));
 
-	spin_lock_irqsave(&mapping->tree_lock, flags);
+	xa_lock_irqsave(&mapping->pages, flags);
 	/*
 	 * The non racy check for a busy page.
 	 *
@@ -698,7 +698,7 @@ static int __remove_mapping(struct address_space *mapping, struct page *page,
 	 * load is not satisfied before that of page->_refcount.
 	 *
 	 * Note that if SetPageDirty is always performed via set_page_dirty,
-	 * and thus under tree_lock, then this ordering is not required.
+	 * and thus under xa_lock, then this ordering is not required.
 	 */
 	if (unlikely(PageTransHuge(page)) && PageSwapCache(page))
 		refcount = 1 + HPAGE_PMD_NR;
@@ -716,7 +716,7 @@ static int __remove_mapping(struct address_space *mapping, struct page *page,
 		swp_entry_t swap = { .val = page_private(page) };
 		mem_cgroup_swapout(page, swap);
 		__delete_from_swap_cache(page);
-		spin_unlock_irqrestore(&mapping->tree_lock, flags);
+		xa_unlock_irqrestore(&mapping->pages, flags);
 		put_swap_page(page, swap);
 	} else {
 		void (*freepage)(struct page *);
@@ -737,13 +737,13 @@ static int __remove_mapping(struct address_space *mapping, struct page *page,
 		 * only page cache pages found in these are zero pages
 		 * covering holes, and because we don't want to mix DAX
 		 * exceptional entries and shadow exceptional entries in the
-		 * same page_tree.
+		 * same address_space.
 		 */
 		if (reclaimed && page_is_file_cache(page) &&
 		    !mapping_exiting(mapping) && !dax_mapping(mapping))
 			shadow = workingset_eviction(mapping, page);
 		__delete_from_page_cache(page, shadow);
-		spin_unlock_irqrestore(&mapping->tree_lock, flags);
+		xa_unlock_irqrestore(&mapping->pages, flags);
 
 		if (freepage != NULL)
 			freepage(page);
@@ -752,7 +752,7 @@ static int __remove_mapping(struct address_space *mapping, struct page *page,
 	return 1;
 
 cannot_free:
-	spin_unlock_irqrestore(&mapping->tree_lock, flags);
+	xa_unlock_irqrestore(&mapping->pages, flags);
 	return 0;
 }
 
diff --git a/mm/workingset.c b/mm/workingset.c
index b7d616a3bbbe..2d071f0df3af 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -202,7 +202,7 @@ static void unpack_shadow(void *shadow, int *memcgidp, pg_data_t **pgdat,
  * @mapping: address space the page was backing
  * @page: the page being evicted
  *
- * Returns a shadow entry to be stored in @mapping->page_tree in place
+ * Returns a shadow entry to be stored in @mapping->pages in place
  * of the evicted @page so that a later refault can be detected.
  */
 void *workingset_eviction(struct address_space *mapping, struct page *page)
@@ -348,7 +348,7 @@ void workingset_update_node(struct radix_tree_node *node)
 	 *
 	 * Avoid acquiring the list_lru lock when the nodes are
 	 * already where they should be. The list_empty() test is safe
-	 * as node->private_list is protected by &mapping->tree_lock.
+	 * as node->private_list is protected by &mapping->pages.xa_lock.
 	 */
 	if (node->count && node->count == node->exceptional) {
 		if (list_empty(&node->private_list))
@@ -366,7 +366,7 @@ static unsigned long count_shadow_nodes(struct shrinker *shrinker,
 	unsigned long nodes;
 	unsigned long cache;
 
-	/* list_lru lock nests inside IRQ-safe mapping->tree_lock */
+	/* list_lru lock nests inside IRQ-safe mapping->pages.xa_lock */
 	local_irq_disable();
 	nodes = list_lru_shrink_count(&shadow_nodes, sc);
 	local_irq_enable();
@@ -419,21 +419,21 @@ static enum lru_status shadow_lru_isolate(struct list_head *item,
 
 	/*
 	 * Page cache insertions and deletions synchroneously maintain
-	 * the shadow node LRU under the mapping->tree_lock and the
+	 * the shadow node LRU under the mapping->pages.xa_lock and the
 	 * lru_lock.  Because the page cache tree is emptied before
 	 * the inode can be destroyed, holding the lru_lock pins any
 	 * address_space that has radix tree nodes on the LRU.
 	 *
-	 * We can then safely transition to the mapping->tree_lock to
+	 * We can then safely transition to the mapping->pages.xa_lock to
 	 * pin only the address_space of the particular node we want
 	 * to reclaim, take the node off-LRU, and drop the lru_lock.
 	 */
 
 	node = container_of(item, struct radix_tree_node, private_list);
-	mapping = container_of(node->root, struct address_space, page_tree);
+	mapping = container_of(node->root, struct address_space, pages);
 
 	/* Coming from the list, invert the lock order */
-	if (!spin_trylock(&mapping->tree_lock)) {
+	if (!xa_trylock(&mapping->pages)) {
 		spin_unlock(lru_lock);
 		ret = LRU_RETRY;
 		goto out;
@@ -468,11 +468,11 @@ static enum lru_status shadow_lru_isolate(struct list_head *item,
 	if (WARN_ON_ONCE(node->exceptional))
 		goto out_invalid;
 	inc_lruvec_page_state(virt_to_page(node), WORKINGSET_NODERECLAIM);
-	__radix_tree_delete_node(&mapping->page_tree, node,
+	__radix_tree_delete_node(&mapping->pages, node,
 				 workingset_lookup_update(mapping));
 
 out_invalid:
-	spin_unlock(&mapping->tree_lock);
+	xa_unlock(&mapping->pages);
 	ret = LRU_REMOVED_RETRY;
 out:
 	local_irq_enable();
@@ -487,7 +487,7 @@ static unsigned long scan_shadow_nodes(struct shrinker *shrinker,
 {
 	unsigned long ret;
 
-	/* list_lru lock nests inside IRQ-safe mapping->tree_lock */
+	/* list_lru lock nests inside IRQ-safe mapping->pages.xa_lock */
 	local_irq_disable();
 	ret = list_lru_shrink_walk(&shadow_nodes, sc, shadow_lru_isolate, NULL);
 	local_irq_enable();
@@ -503,7 +503,7 @@ static struct shrinker workingset_shadow_shrinker = {
 
 /*
  * Our list_lru->lock is IRQ-safe as it nests inside the IRQ-safe
- * mapping->tree_lock.
+ * mapping->pages.xa_lock.
  */
 static struct lock_class_key shadow_nodes_key;
 
-- 
2.15.0

--


^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 16/62] xarray: Replace exceptional entries
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:06   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:06 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

Introduce xarray value entries to replace the radix tree exceptional
entry code.  This is a slight change in encoding to allow the use of an
extra bit (we can now store BITS_PER_LONG - 1 bits in a value entry).
It is also a change in emphasis; exceptional entries are intimidating
and different.  As the comment explains, you can choose to store values
or pointers in the xarray and they are both first-class citizens.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
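As a rough illustration of the encoding change described above: exceptional
entries reserved two low bits of the pointer, while a value entry only needs
one, which is where the extra usable bit comes from.  A sketch of the assumed
helpers, not the exact code this patch adds to include/linux/xarray.h:

/* Sketch of the value-entry encoding assumed here. */
static inline void *xa_mk_value(unsigned long v)
{
	return (void *)((v << 1) | 1);	/* one tag bit, BITS_PER_LONG - 1 payload bits */
}

static inline unsigned long xa_to_value(const void *entry)
{
	return (unsigned long)entry >> 1;
}

static inline bool xa_is_value(const void *entry)
{
	return (unsigned long)entry & 1;
}

/* Old encoding, for comparison (see the i915 hunk below):
 * RADIX_TREE_EXCEPTIONAL_ENTRY | (idx << RADIX_TREE_EXCEPTIONAL_SHIFT),
 * i.e. two low bits consumed instead of one. */
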
 arch/powerpc/include/asm/book3s/64/pgtable.h    |   4 +-
 arch/powerpc/include/asm/nohash/64/pgtable.h    |   4 +-
 drivers/gpu/drm/i915/i915_gem.c                 |  17 ++--
 drivers/staging/lustre/lustre/mdc/mdc_request.c |   2 +-
 fs/btrfs/compression.c                          |   2 +-
 fs/btrfs/inode.c                                |   2 +-
 fs/dax.c                                        | 108 ++++++++++++------------
 fs/proc/task_mmu.c                              |   2 +-
 include/linux/fs.h                              |  48 +++++++----
 include/linux/radix-tree.h                      |  36 ++------
 include/linux/swapops.h                         |  19 ++---
 include/linux/xarray.h                          |  66 +++++++++++++++
 lib/idr.c                                       |  63 ++++++--------
 lib/radix-tree.c                                |  21 ++---
 mm/filemap.c                                    |  10 +--
 mm/khugepaged.c                                 |   2 +-
 mm/madvise.c                                    |   2 +-
 mm/memcontrol.c                                 |   2 +-
 mm/mincore.c                                    |   2 +-
 mm/readahead.c                                  |   2 +-
 mm/shmem.c                                      |  10 +--
 mm/swap.c                                       |   2 +-
 mm/truncate.c                                   |  12 +--
 mm/workingset.c                                 |  12 ++-
 tools/testing/radix-tree/idr-test.c             |   6 +-
 tools/testing/radix-tree/linux/radix-tree.h     |   1 +
 tools/testing/radix-tree/multiorder.c           |  47 +++++------
 tools/testing/radix-tree/test.c                 |   2 +-
 28 files changed, 270 insertions(+), 236 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 9a677cd5997f..ff96663d8455 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -649,9 +649,7 @@ static inline bool pte_user(pte_t pte)
 	BUILD_BUG_ON(_PAGE_HPTEFLAGS & (0x1f << _PAGE_BIT_SWAP_TYPE)); \
 	BUILD_BUG_ON(_PAGE_HPTEFLAGS & _PAGE_SWP_SOFT_DIRTY);	\
 	} while (0)
-/*
- * on pte we don't need handle RADIX_TREE_EXCEPTIONAL_SHIFT;
- */
+
 #define SWP_TYPE_BITS 5
 #define __swp_type(x)		(((x).val >> _PAGE_BIT_SWAP_TYPE) \
 				& ((1UL << SWP_TYPE_BITS) - 1))
diff --git a/arch/powerpc/include/asm/nohash/64/pgtable.h b/arch/powerpc/include/asm/nohash/64/pgtable.h
index abddf5830ad5..f711773568d7 100644
--- a/arch/powerpc/include/asm/nohash/64/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/64/pgtable.h
@@ -329,9 +329,7 @@ static inline void __ptep_set_access_flags(struct mm_struct *mm,
 	 */							\
 	BUILD_BUG_ON(_PAGE_HPTEFLAGS & (0x1f << _PAGE_BIT_SWAP_TYPE)); \
 	} while (0)
-/*
- * on pte we don't need handle RADIX_TREE_EXCEPTIONAL_SHIFT;
- */
+
 #define SWP_TYPE_BITS 5
 #define __swp_type(x)		(((x).val >> _PAGE_BIT_SWAP_TYPE) \
 				& ((1UL << SWP_TYPE_BITS) - 1))
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 3a140eedfc83..7b10ec3694a8 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -5375,7 +5375,8 @@ i915_gem_object_get_sg(struct drm_i915_gem_object *obj,
 	count = __sg_page_count(sg);
 
 	while (idx + count <= n) {
-		unsigned long exception, i;
+		void *entry;
+		unsigned long i;
 		int ret;
 
 		/* If we cannot allocate and insert this entry, or the
@@ -5390,12 +5391,9 @@ i915_gem_object_get_sg(struct drm_i915_gem_object *obj,
 		if (ret && ret != -EEXIST)
 			goto scan;
 
-		exception =
-			RADIX_TREE_EXCEPTIONAL_ENTRY |
-			idx << RADIX_TREE_EXCEPTIONAL_SHIFT;
+		entry = xa_mk_value(idx);
 		for (i = 1; i < count; i++) {
-			ret = radix_tree_insert(&iter->radix, idx + i,
-						(void *)exception);
+			ret = radix_tree_insert(&iter->radix, idx + i, entry);
 			if (ret && ret != -EEXIST)
 				goto scan;
 		}
@@ -5433,15 +5431,14 @@ i915_gem_object_get_sg(struct drm_i915_gem_object *obj,
 	GEM_BUG_ON(!sg);
 
 	/* If this index is in the middle of multi-page sg entry,
-	 * the radixtree will contain an exceptional entry that points
+	 * the radixtree will contain a data entry that points
 	 * to the start of that range. We will return the pointer to
 	 * the base page and the offset of this page within the
 	 * sg entry's range.
 	 */
 	*offset = 0;
-	if (unlikely(radix_tree_exception(sg))) {
-		unsigned long base =
-			(unsigned long)sg >> RADIX_TREE_EXCEPTIONAL_SHIFT;
+	if (unlikely(xa_is_value(sg))) {
+		unsigned long base = xa_to_value(sg);
 
 		sg = radix_tree_lookup(&iter->radix, base);
 		GEM_BUG_ON(!sg);
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_request.c b/drivers/staging/lustre/lustre/mdc/mdc_request.c
index 45dcf9f958d4..2ec79a6b17da 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_request.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_request.c
@@ -940,7 +940,7 @@ static struct page *mdc_page_locate(struct address_space *mapping, __u64 *hash,
 	xa_lock_irq(&mapping->pages);
 	found = radix_tree_gang_lookup(&mapping->pages,
 				       (void **)&page, offset, 1);
-	if (found > 0 && !radix_tree_exceptional_entry(page)) {
+	if (found > 0 && !xa_is_value(page)) {
 		struct lu_dirpage *dp;
 
 		get_page(page);
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 541bd536c815..9446c58fa0ec 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -451,7 +451,7 @@ static noinline int add_ra_bio_pages(struct inode *inode,
 		rcu_read_lock();
 		page = radix_tree_lookup(&mapping->pages, pg_index);
 		rcu_read_unlock();
-		if (page && !radix_tree_exceptional_entry(page)) {
+		if (page && !xa_is_value(page)) {
 			misses++;
 			if (misses > 4)
 				break;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 9ceee45ed513..dcad35f7c699 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7562,7 +7562,7 @@ bool btrfs_page_exists_in_range(struct inode *inode, loff_t start, loff_t end)
 			}
 			/*
 			 * Otherwise, shmem/tmpfs must be storing a swap entry
-			 * here as an exceptional entry: so return it without
+			 * here as a data entry: so return it without
 			 * attempting to raise page count.
 			 */
 			page = NULL;
diff --git a/fs/dax.c b/fs/dax.c
index 93e6b79687af..e6ed216d3760 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -58,7 +58,7 @@ static int __init init_dax_wait_table(void)
 fs_initcall(init_dax_wait_table);
 
 /*
- * We use lowest available bit in exceptional entry for locking, one bit for
+ * We use lowest available bit in data entry for locking, one bit for
  * the entry size (PMD) and two more to tell us if the entry is a zero page or
  * an empty entry that is just used for locking.  In total four special bits.
  *
@@ -66,49 +66,48 @@ fs_initcall(init_dax_wait_table);
  * and EMPTY bits aren't set the entry is a normal DAX entry with a filesystem
  * block allocation.
  */
-#define RADIX_DAX_SHIFT		(RADIX_TREE_EXCEPTIONAL_SHIFT + 4)
-#define RADIX_DAX_ENTRY_LOCK	(1 << RADIX_TREE_EXCEPTIONAL_SHIFT)
-#define RADIX_DAX_PMD		(1 << (RADIX_TREE_EXCEPTIONAL_SHIFT + 1))
-#define RADIX_DAX_ZERO_PAGE	(1 << (RADIX_TREE_EXCEPTIONAL_SHIFT + 2))
-#define RADIX_DAX_EMPTY		(1 << (RADIX_TREE_EXCEPTIONAL_SHIFT + 3))
+#define DAX_SHIFT	(4)
+#define DAX_ENTRY_LOCK	(1UL << 0)
+#define DAX_PMD		(1UL << 1)
+#define DAX_ZERO_PAGE	(1UL << 2)
+#define DAX_EMPTY	(1UL << 3)
 
 static unsigned long dax_radix_sector(void *entry)
 {
-	return (unsigned long)entry >> RADIX_DAX_SHIFT;
+	return xa_to_value(entry) >> DAX_SHIFT;
 }
 
 static void *dax_radix_locked_entry(sector_t sector, unsigned long flags)
 {
-	return (void *)(RADIX_TREE_EXCEPTIONAL_ENTRY | flags |
-			((unsigned long)sector << RADIX_DAX_SHIFT) |
-			RADIX_DAX_ENTRY_LOCK);
+	return xa_mk_value(flags | ((unsigned long)sector << DAX_SHIFT) |
+			DAX_ENTRY_LOCK);
 }
 
 static unsigned int dax_radix_order(void *entry)
 {
-	if ((unsigned long)entry & RADIX_DAX_PMD)
+	if (xa_to_value(entry) & DAX_PMD)
 		return PMD_SHIFT - PAGE_SHIFT;
 	return 0;
 }
 
 static int dax_is_pmd_entry(void *entry)
 {
-	return (unsigned long)entry & RADIX_DAX_PMD;
+	return xa_to_value(entry) & DAX_PMD;
 }
 
 static int dax_is_pte_entry(void *entry)
 {
-	return !((unsigned long)entry & RADIX_DAX_PMD);
+	return !(xa_to_value(entry) & DAX_PMD);
 }
 
 static int dax_is_zero_entry(void *entry)
 {
-	return (unsigned long)entry & RADIX_DAX_ZERO_PAGE;
+	return xa_to_value(entry) & DAX_ZERO_PAGE;
 }
 
 static int dax_is_empty_entry(void *entry)
 {
-	return (unsigned long)entry & RADIX_DAX_EMPTY;
+	return xa_to_value(entry) & DAX_EMPTY;
 }
 
 /*
@@ -188,9 +187,9 @@ static void dax_wake_mapping_entry_waiter(struct address_space *mapping,
  */
 static inline int slot_locked(struct address_space *mapping, void **slot)
 {
-	unsigned long entry = (unsigned long)
-		radix_tree_deref_slot_protected(slot, &mapping->pages.xa_lock);
-	return entry & RADIX_DAX_ENTRY_LOCK;
+	unsigned long entry = xa_to_value(
+		radix_tree_deref_slot_protected(slot, &mapping->pages.xa_lock));
+	return entry & DAX_ENTRY_LOCK;
 }
 
 /*
@@ -199,12 +198,11 @@ static inline int slot_locked(struct address_space *mapping, void **slot)
  */
 static inline void *lock_slot(struct address_space *mapping, void **slot)
 {
-	unsigned long entry = (unsigned long)
-		radix_tree_deref_slot_protected(slot, &mapping->pages.xa_lock);
-
-	entry |= RADIX_DAX_ENTRY_LOCK;
-	radix_tree_replace_slot(&mapping->pages, slot, (void *)entry);
-	return (void *)entry;
+	unsigned long v = xa_to_value(
+		radix_tree_deref_slot_protected(slot, &mapping->pages.xa_lock));
+	void *entry = xa_mk_value(v | DAX_ENTRY_LOCK);
+	radix_tree_replace_slot(&mapping->pages, slot, entry);
+	return entry;
 }
 
 /*
@@ -213,17 +211,16 @@ static inline void *lock_slot(struct address_space *mapping, void **slot)
  */
 static inline void *unlock_slot(struct address_space *mapping, void **slot)
 {
-	unsigned long entry = (unsigned long)
-		radix_tree_deref_slot_protected(slot, &mapping->pages.xa_lock);
-
-	entry &= ~(unsigned long)RADIX_DAX_ENTRY_LOCK;
-	radix_tree_replace_slot(&mapping->pages, slot, (void *)entry);
-	return (void *)entry;
+	unsigned long v = xa_to_value(
+		radix_tree_deref_slot_protected(slot, &mapping->pages.xa_lock));
+	void *entry = xa_mk_value(v & ~DAX_ENTRY_LOCK);
+	radix_tree_replace_slot(&mapping->pages, slot, entry);
+	return entry;
 }
 
 /*
  * Lookup entry in radix tree, wait for it to become unlocked if it is
- * exceptional entry and return it. The caller must call
+ * a data entry and return it. The caller must call
  * put_unlocked_mapping_entry() when he decided not to lock the entry or
  * put_locked_mapping_entry() when he locked the entry and now wants to
  * unlock it.
@@ -244,7 +241,7 @@ static void *get_unlocked_mapping_entry(struct address_space *mapping,
 		entry = __radix_tree_lookup(&mapping->pages, index, NULL,
 					  &slot);
 		if (!entry ||
-		    WARN_ON_ONCE(!radix_tree_exceptional_entry(entry)) ||
+		    WARN_ON_ONCE(!xa_is_value(entry)) ||
 		    !slot_locked(mapping, slot)) {
 			if (slotp)
 				*slotp = slot;
@@ -268,7 +265,7 @@ static void dax_unlock_mapping_entry(struct address_space *mapping,
 
 	xa_lock_irq(&mapping->pages);
 	entry = __radix_tree_lookup(&mapping->pages, index, NULL, &slot);
-	if (WARN_ON_ONCE(!entry || !radix_tree_exceptional_entry(entry) ||
+	if (WARN_ON_ONCE(!entry || !xa_is_value(entry) ||
 			 !slot_locked(mapping, slot))) {
 		xa_unlock_irq(&mapping->pages);
 		return;
@@ -299,12 +296,11 @@ static void put_unlocked_mapping_entry(struct address_space *mapping,
 }
 
 /*
- * Find radix tree entry at given index. If it points to an exceptional entry,
- * return it with the radix tree entry locked. If the radix tree doesn't
- * contain given index, create an empty exceptional entry for the index and
- * return with it locked.
+ * Find radix tree entry at given index. If it is a data value, return it
+ * with the radix tree entry locked. If the radix tree doesn't contain the
+ * given index, create an empty value for the index and return with it locked.
  *
- * When requesting an entry with size RADIX_DAX_PMD, grab_mapping_entry() will
+ * When requesting an entry with size DAX_PMD, grab_mapping_entry() will
  * either return that locked entry or will return an error.  This error will
  * happen if there are any 4k entries within the 2MiB range that we are
  * requesting.
@@ -334,13 +330,13 @@ static void *grab_mapping_entry(struct address_space *mapping, pgoff_t index,
 	xa_lock_irq(&mapping->pages);
 	entry = get_unlocked_mapping_entry(mapping, index, &slot);
 
-	if (WARN_ON_ONCE(entry && !radix_tree_exceptional_entry(entry))) {
+	if (WARN_ON_ONCE(entry && !xa_is_value(entry))) {
 		entry = ERR_PTR(-EIO);
 		goto out_unlock;
 	}
 
 	if (entry) {
-		if (size_flag & RADIX_DAX_PMD) {
+		if (size_flag & DAX_PMD) {
 			if (dax_is_pte_entry(entry)) {
 				put_unlocked_mapping_entry(mapping, index,
 						entry);
@@ -410,7 +406,7 @@ static void *grab_mapping_entry(struct address_space *mapping, pgoff_t index,
 					true);
 		}
 
-		entry = dax_radix_locked_entry(0, size_flag | RADIX_DAX_EMPTY);
+		entry = dax_radix_locked_entry(0, size_flag | DAX_EMPTY);
 
 		err = __radix_tree_insert(&mapping->pages, index,
 				dax_radix_order(entry), entry);
@@ -447,7 +443,7 @@ static int __dax_invalidate_mapping_entry(struct address_space *mapping,
 
 	xa_lock_irq(&mapping->pages);
 	entry = get_unlocked_mapping_entry(mapping, index, NULL);
-	if (!entry || WARN_ON_ONCE(!radix_tree_exceptional_entry(entry)))
+	if (!entry || WARN_ON_ONCE(!xa_is_value(entry)))
 		goto out;
 	if (!trunc &&
 	    (radix_tree_tag_get(pages, index, PAGECACHE_TAG_DIRTY) ||
@@ -462,7 +458,7 @@ static int __dax_invalidate_mapping_entry(struct address_space *mapping,
 	return ret;
 }
 /*
- * Delete exceptional DAX entry at @index from @mapping. Wait for radix tree
+ * Delete DAX data entry at @index from @mapping. Wait for radix tree
  * entry to get unlocked before deleting it.
  */
 int dax_delete_mapping_entry(struct address_space *mapping, pgoff_t index)
@@ -473,7 +469,7 @@ int dax_delete_mapping_entry(struct address_space *mapping, pgoff_t index)
 	 * This gets called from truncate / punch_hole path. As such, the caller
 	 * must hold locks protecting against concurrent modifications of the
 	 * radix tree (usually fs-private i_mmap_sem for writing). Since the
-	 * caller has seen exceptional entry for this index, we better find it
+	 * caller has seen a data entry for this index, we better find it
 	 * at that index as well...
 	 */
 	WARN_ON_ONCE(!ret);
@@ -481,7 +477,7 @@ int dax_delete_mapping_entry(struct address_space *mapping, pgoff_t index)
 }
 
 /*
- * Invalidate exceptional DAX entry if it is clean.
+ * Invalidate DAX data entry if it is clean.
  */
 int dax_invalidate_mapping_entry_sync(struct address_space *mapping,
 				      pgoff_t index)
@@ -535,7 +531,7 @@ static void *dax_insert_mapping_entry(struct address_space *mapping,
 	if (dirty)
 		__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
 
-	if (dax_is_zero_entry(entry) && !(flags & RADIX_DAX_ZERO_PAGE)) {
+	if (dax_is_zero_entry(entry) && !(flags & DAX_ZERO_PAGE)) {
 		/* we are replacing a zero page with block mapping */
 		if (dax_is_pmd_entry(entry))
 			unmap_mapping_range(mapping,
@@ -674,13 +670,13 @@ static int dax_writeback_one(struct block_device *bdev,
 	 * A page got tagged dirty in DAX mapping? Something is seriously
 	 * wrong.
 	 */
-	if (WARN_ON(!radix_tree_exceptional_entry(entry)))
+	if (WARN_ON(!xa_is_value(entry)))
 		return -EIO;
 
 	xa_lock_irq(&mapping->pages);
 	entry2 = get_unlocked_mapping_entry(mapping, index, &slot);
 	/* Entry got punched out / reallocated? */
-	if (!entry2 || WARN_ON_ONCE(!radix_tree_exceptional_entry(entry2)))
+	if (!entry2 || WARN_ON_ONCE(!xa_is_value(entry2)))
 		goto put_unlocked;
 	/*
 	 * Entry got reallocated elsewhere? No need to writeback. We have to
@@ -886,7 +882,7 @@ static int dax_load_hole(struct address_space *mapping, void *entry,
 	}
 
 	entry2 = dax_insert_mapping_entry(mapping, vmf, entry, 0,
-			RADIX_DAX_ZERO_PAGE, false);
+			DAX_ZERO_PAGE, false);
 	if (IS_ERR(entry2)) {
 		ret = VM_FAULT_SIGBUS;
 		goto out;
@@ -1292,7 +1288,7 @@ static int dax_pmd_load_hole(struct vm_fault *vmf, struct iomap *iomap,
 		goto fallback;
 
 	ret = dax_insert_mapping_entry(mapping, vmf, entry, 0,
-			RADIX_DAX_PMD | RADIX_DAX_ZERO_PAGE, false);
+			DAX_PMD | DAX_ZERO_PAGE, false);
 	if (IS_ERR(ret))
 		goto fallback;
 
@@ -1377,7 +1373,7 @@ static int dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
 	 * is already in the tree, for instance), it will return -EEXIST and
 	 * we just fall back to 4k entries.
 	 */
-	entry = grab_mapping_entry(mapping, pgoff, RADIX_DAX_PMD);
+	entry = grab_mapping_entry(mapping, pgoff, DAX_PMD);
 	if (IS_ERR(entry))
 		goto fallback;
 
@@ -1416,7 +1412,7 @@ static int dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
 
 		entry = dax_insert_mapping_entry(mapping, vmf, entry,
 						dax_iomap_sector(&iomap, pos),
-						RADIX_DAX_PMD, write && !sync);
+						DAX_PMD, write && !sync);
 		if (IS_ERR(entry))
 			goto finish_iomap;
 
@@ -1528,21 +1524,21 @@ static int dax_insert_pfn_mkwrite(struct vm_fault *vmf,
 	pgoff_t index = vmf->pgoff;
 	int vmf_ret, error;
 
-	spin_lock_irq(&mapping->tree_lock);
+	xa_lock_irq(&mapping->pages);
 	entry = get_unlocked_mapping_entry(mapping, index, &slot);
 	/* Did we race with someone splitting entry or so? */
 	if (!entry ||
 	    (pe_size == PE_SIZE_PTE && !dax_is_pte_entry(entry)) ||
 	    (pe_size == PE_SIZE_PMD && !dax_is_pmd_entry(entry))) {
 		put_unlocked_mapping_entry(mapping, index, entry);
-		spin_unlock_irq(&mapping->tree_lock);
+		xa_unlock_irq(&mapping->pages);
 		trace_dax_insert_pfn_mkwrite_no_entry(mapping->host, vmf,
 						      VM_FAULT_NOPAGE);
 		return VM_FAULT_NOPAGE;
 	}
-	radix_tree_tag_set(&mapping->page_tree, index, PAGECACHE_TAG_DIRTY);
+	radix_tree_tag_set(&mapping->pages, index, PAGECACHE_TAG_DIRTY);
 	entry = lock_slot(mapping, slot);
-	spin_unlock_irq(&mapping->tree_lock);
+	xa_unlock_irq(&mapping->pages);
 	switch (pe_size) {
 	case PE_SIZE_PTE:
 		error = vm_insert_mixed_mkwrite(vmf->vma, vmf->address, pfn);
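
To make the new fs/dax.c encoding concrete: the four flag bits now sit in
bits 0-3 of the stored value and the sector starts at DAX_SHIFT, with
xa_mk_value() adding the type tag on top.  The following is a minimal
userspace model of that packing, not part of the patch; it assumes a
64-bit long and borrows only the constants defined above.

#include <assert.h>
#include <stdio.h>

#define DAX_SHIFT	4
#define DAX_ENTRY_LOCK	(1UL << 0)
#define DAX_PMD		(1UL << 1)

/* Userspace stand-ins for the xarray value helpers. */
static void *xa_mk_value(unsigned long v) { return (void *)((v << 1) | 1); }
static unsigned long xa_to_value(void *entry) { return (unsigned long)entry >> 1; }

/* Mirrors dax_radix_locked_entry(): flags in the low bits, sector above them. */
static void *dax_locked_entry(unsigned long sector, unsigned long flags)
{
	return xa_mk_value(flags | (sector << DAX_SHIFT) | DAX_ENTRY_LOCK);
}

int main(void)
{
	void *entry = dax_locked_entry(0x1234, DAX_PMD);

	assert((unsigned long)entry & 1);			/* tagged as a value entry */
	assert(xa_to_value(entry) & DAX_ENTRY_LOCK);		/* lock bit is bit 0 */
	assert((xa_to_value(entry) >> DAX_SHIFT) == 0x1234);	/* sector round-trips */
	printf("dax value entry model ok\n");
	return 0;
}
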
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 339e4c1c044d..fadc6dbe17d6 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -553,7 +553,7 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr,
 		if (!page)
 			return;
 
-		if (radix_tree_exceptional_entry(page))
+		if (xa_is_value(page))
 			mss->swap += PAGE_SIZE;
 		else
 			put_page(page);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 265340149265..a5c105d292a7 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -389,23 +389,41 @@ int pagecache_write_end(struct file *, struct address_space *mapping,
 				loff_t pos, unsigned len, unsigned copied,
 				struct page *page, void *fsdata);
 
+/**
+ * struct address_space - Contents of a cachable, mappable object
+ *
+ * @host: Owner, either the inode or the block_device
+ * @pages: Cached pages
+ * @gfp_mask: Memory allocation flags to use for allocating pages
+ * @i_mmap_writable: count VM_SHARED mappings
+ * @i_mmap: tree of private and shared mappings
+ * @i_mmap_rwsem: Protects @i_mmap and @i_mmap_writable
+ * @nrpages: Number of total pages, protected by pages.xa_lock
+ * @nrexceptional: Shadow or DAX entries, protected by pages.xa_lock
+ * @writeback_index: writeback starts here
+ * @a_ops: methods
+ * @flags: Error bits and flags (AS_*)
+ * @wb_err: The most recent error which has occurred
+ * @private_lock: For use by the owner of the address_space
+ * @private_list: For use by the owner of the address space
+ * @private_data: For use by the owner of the address space
+ */
 struct address_space {
-	struct inode		*host;		/* owner: inode, block_device */
-	struct radix_tree_root	pages;		/* cached pages */
-	gfp_t			gfp_mask;	/* for allocating pages */
-	atomic_t		i_mmap_writable;/* count VM_SHARED mappings */
-	struct rb_root_cached	i_mmap;		/* tree of private and shared mappings */
-	struct rw_semaphore	i_mmap_rwsem;	/* protect tree, count, list */
-	/* Protected by pages.xa_lock */
-	unsigned long		nrpages;	/* number of total pages */
-	unsigned long		nrexceptional;	/* shadow or DAX entries */
-	pgoff_t			writeback_index;/* writeback starts here */
-	const struct address_space_operations *a_ops;	/* methods */
-	unsigned long		flags;		/* error bits */
+	struct inode		*host;
+	struct radix_tree_root	pages;
+	gfp_t			gfp_mask;
+	atomic_t		i_mmap_writable;
+	struct rb_root_cached	i_mmap;
+	struct rw_semaphore	i_mmap_rwsem;
+	unsigned long		nrpages;
+	unsigned long		nrexceptional;
+	pgoff_t			writeback_index;
+	const struct address_space_operations *a_ops;
+	unsigned long		flags;
 	errseq_t		wb_err;
-	spinlock_t		private_lock;	/* for use by the address_space */
-	struct list_head	private_list;	/* ditto */
-	void			*private_data;	/* ditto */
+	spinlock_t		private_lock;
+	struct list_head	private_list;
+	void			*private_data;
 } __attribute__((aligned(sizeof(long)))) __randomize_layout;
 	/*
 	 * On most architectures that alignment is already the case; but
diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index b4d60295decd..ee9aad11d472 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -28,34 +28,26 @@
 #include <linux/rcupdate.h>
 #include <linux/spinlock.h>
 #include <linux/types.h>
+#include <linux/xarray.h>
 
 /*
  * The bottom two bits of the slot determine how the remaining bits in the
  * slot are interpreted:
  *
  * 00 - data pointer
- * 01 - internal entry
- * 10 - exceptional entry
- * 11 - this bit combination is currently unused/reserved
+ * 10 - internal entry
+ * x1 - inline data entry
  *
  * The internal entry may be a pointer to the next level in the tree, a
  * sibling entry, or an indicator that the entry in this slot has been moved
  * to another location in the tree and the lookup should be restarted.  While
  * NULL fits the 'data pointer' pattern, it means that there is no entry in
  * the tree for this index (no matter what level of the tree it is found at).
- * This means that you cannot store NULL in the tree as a value for the index.
+ * This means that storing NULL in the tree as a value for the index is the
+ * same as deleting the index from the tree.
  */
 #define RADIX_TREE_ENTRY_MASK		3UL
-#define RADIX_TREE_INTERNAL_NODE	1UL
-
-/*
- * Most users of the radix tree store pointers but shmem/tmpfs stores swap
- * entries in the same tree.  They are marked as exceptional entries to
- * distinguish them from pointers to struct page.
- * EXCEPTIONAL_ENTRY tests the bit, EXCEPTIONAL_SHIFT shifts content past it.
- */
-#define RADIX_TREE_EXCEPTIONAL_ENTRY	2
-#define RADIX_TREE_EXCEPTIONAL_SHIFT	2
+#define RADIX_TREE_INTERNAL_NODE	2UL
 
 static inline bool radix_tree_is_internal_node(void *ptr)
 {
@@ -83,11 +75,10 @@ static inline bool radix_tree_is_internal_node(void *ptr)
 
 /*
  * @count is the count of every non-NULL element in the ->slots array
- * whether that is an exceptional entry, a retry entry, a user pointer,
+ * whether that is a data entry, a retry entry, a user pointer,
  * a sibling entry or a pointer to the next level of the tree.
  * @exceptional is the count of every element in ->slots which is
- * either radix_tree_exceptional_entry() or is a sibling entry for an
- * exceptional entry.
+ * either a data entry or a sibling entry for data.
  */
 struct radix_tree_node {
 	unsigned char	shift;		/* Bits remaining in each slot */
@@ -267,17 +258,6 @@ static inline int radix_tree_deref_retry(void *arg)
 	return unlikely(radix_tree_is_internal_node(arg));
 }
 
-/**
- * radix_tree_exceptional_entry	- radix_tree_deref_slot gave exceptional entry?
- * @arg:	value returned by radix_tree_deref_slot
- * Returns:	0 if well-aligned pointer, non-0 if exceptional entry.
- */
-static inline int radix_tree_exceptional_entry(void *arg)
-{
-	/* Not unlikely because radix_tree_exception often tested first */
-	return (unsigned long)arg & RADIX_TREE_EXCEPTIONAL_ENTRY;
-}
-
 /**
  * radix_tree_exception	- radix_tree_deref_slot returned either exception?
  * @arg:	value returned by radix_tree_deref_slot
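
Under the revised two-bit scheme documented above, an entry's class can be
told from its low bits alone: bit 0 set means a value entry, the pattern 10
means an internal entry, and 00 with a non-NULL pointer means a user
pointer.  A small userspace sketch of those three tests (illustrative only;
RADIX_TREE_RETRY and sibling entries are ignored):

#include <stdbool.h>
#include <stdio.h>

#define ENTRY_MASK	3UL
#define INTERNAL_NODE	2UL	/* mirrors RADIX_TREE_INTERNAL_NODE above */

static bool is_value(const void *e)	{ return (unsigned long)e & 1UL; }	/* x1 */
static bool is_internal(const void *e)	{ return ((unsigned long)e & ENTRY_MASK) == INTERNAL_NODE; }	/* 10 */
static bool is_pointer(const void *e)	{ return e && !((unsigned long)e & ENTRY_MASK); }	/* 00 */

int main(void)
{
	long obj;

	printf("%d %d %d\n", is_pointer(&obj), is_internal((void *)2UL),
	       is_value((void *)((5UL << 1) | 1)));	/* prints "1 1 1" */
	return 0;
}
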
diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index 9c5a2628d6ce..5e93c7b500da 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -17,9 +17,8 @@
  *
  * swp_entry_t's are *never* stored anywhere in their arch-dependent format.
  */
-#define SWP_TYPE_SHIFT(e)	((sizeof(e.val) * 8) - \
-			(MAX_SWAPFILES_SHIFT + RADIX_TREE_EXCEPTIONAL_SHIFT))
-#define SWP_OFFSET_MASK(e)	((1UL << SWP_TYPE_SHIFT(e)) - 1)
+#define SWP_TYPE_SHIFT	(BITS_PER_XA_VALUE - MAX_SWAPFILES_SHIFT)
+#define SWP_OFFSET_MASK	((1UL << SWP_TYPE_SHIFT) - 1)
 
 /*
  * Store a type+offset into a swp_entry_t in an arch-independent format
@@ -28,8 +27,7 @@ static inline swp_entry_t swp_entry(unsigned long type, pgoff_t offset)
 {
 	swp_entry_t ret;
 
-	ret.val = (type << SWP_TYPE_SHIFT(ret)) |
-			(offset & SWP_OFFSET_MASK(ret));
+	ret.val = (type << SWP_TYPE_SHIFT) | (offset & SWP_OFFSET_MASK);
 	return ret;
 }
 
@@ -39,7 +37,7 @@ static inline swp_entry_t swp_entry(unsigned long type, pgoff_t offset)
  */
 static inline unsigned swp_type(swp_entry_t entry)
 {
-	return (entry.val >> SWP_TYPE_SHIFT(entry));
+	return (entry.val >> SWP_TYPE_SHIFT);
 }
 
 /*
@@ -48,7 +46,7 @@ static inline unsigned swp_type(swp_entry_t entry)
  */
 static inline pgoff_t swp_offset(swp_entry_t entry)
 {
-	return entry.val & SWP_OFFSET_MASK(entry);
+	return entry.val & SWP_OFFSET_MASK;
 }
 
 #ifdef CONFIG_MMU
@@ -89,16 +87,13 @@ static inline swp_entry_t radix_to_swp_entry(void *arg)
 {
 	swp_entry_t entry;
 
-	entry.val = (unsigned long)arg >> RADIX_TREE_EXCEPTIONAL_SHIFT;
+	entry.val = xa_to_value(arg);
 	return entry;
 }
 
 static inline void *swp_to_radix_entry(swp_entry_t entry)
 {
-	unsigned long value;
-
-	value = entry.val << RADIX_TREE_EXCEPTIONAL_SHIFT;
-	return (void *)(value | RADIX_TREE_EXCEPTIONAL_ENTRY);
+	return xa_mk_value(entry.val);
 }
 
 #if IS_ENABLED(CONFIG_DEVICE_PRIVATE)
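
Because the value-entry tag costs a single bit, the swap packing above
becomes pure compile-time arithmetic: with a 64-bit long and a
MAX_SWAPFILES_SHIFT of 5, SWP_TYPE_SHIFT is 63 - 5 = 58, so the offset
lives in bits 0-57 and the type in bits 58-62 of the 63-bit value.  A
hedged userspace round-trip of the same arithmetic (the two constants are
assumptions stated here, not taken from this patch):

#include <assert.h>

#define BITS_PER_XA_VALUE	63	/* 64-bit long assumed */
#define MAX_SWAPFILES_SHIFT	5	/* assumed for illustration */
#define SWP_TYPE_SHIFT		(BITS_PER_XA_VALUE - MAX_SWAPFILES_SHIFT)
#define SWP_OFFSET_MASK		((1UL << SWP_TYPE_SHIFT) - 1)

int main(void)
{
	unsigned long type = 3, offset = 0xabcdef;
	/* swp_entry() packing as in the hunk above */
	unsigned long val = (type << SWP_TYPE_SHIFT) | (offset & SWP_OFFSET_MASK);
	/* swp_to_radix_entry() / radix_to_swp_entry() round-trip */
	void *entry = (void *)((val << 1) | 1);		/* xa_mk_value() */
	unsigned long back = (unsigned long)entry >> 1;	/* xa_to_value() */

	assert((back >> SWP_TYPE_SHIFT) == type);
	assert((back & SWP_OFFSET_MASK) == offset);
	return 0;
}
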
diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index a5a933925b85..b1da8021b5fa 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -16,7 +16,73 @@
  * GNU General Public License for more details.
  */
 
+/**
+ * An eXtensible Array is an array of entries indexed by an unsigned
+ * long.  The XArray takes care of its own locking, using an irqsafe
+ * spinlock for operations that modify the XArray and RCU for operations
+ * which only read from the XArray.
+ *
+ * The XArray can store pointers which are aligned to a multiple of 4.
+ * This includes all objects allocated via SLAB, but you cannot store
+ * pointers to an arbitrary offset within an object.  The XArray can also
+ * store data values between 0 and LONG_MAX by converting them into entries
+ * using xa_mk_value().  The XArray does not support storing IS_ERR()
+ * pointers; some conflict with data values and others conflict with
+ * entries the XArray uses for its own purposes.
+ *
+ * A freshly initialised XArray is full of NULL pointers.  You can set the
+ * entry at any index by calling xa_store(), and get the value by calling
+ * xa_load().  There is no difference between an entry which has never been
+ * stored to and an entry which has most recently had NULL stored to it.  You
+ * can conditionally update the value of an entry by calling xa_cmpxchg().
+ * Each entry which isn't NULL can be tagged with up to three bits of extra
+ * information, accessed through xa_get_tag(), xa_set_tag() and
+ * xa_clear_tag().  You can copy batches of entries out of the array by
+ * calling xa_get_entries() or xa_get_tagged().  You can iterate over
+ * non-NULL entries in the array by calling xa_find(), xa_next() or
+ * using the xa_for_each() iterator.
+ *
+ * There are two levels of API provided.  Normal and Advanced.
+ * The advanced API is more flexible but has fewer safeguards.
+ */
 #include <linux/spinlock.h>
+#include <linux/types.h>
+
+#define BITS_PER_XA_VALUE	(BITS_PER_LONG - 1)
+
+/**
+ * xa_mk_value() - Create an XArray entry from a data value.
+ * @v: Value to store in XArray.
+ *
+ * Return: An entry suitable for storing in the XArray.
+ */
+static inline void *xa_mk_value(unsigned long v)
+{
+	WARN_ON((long)v < 0);
+	return (void *)((v << 1) | 1);
+}
+
+/**
+ * xa_to_value() - Get value stored in an XArray entry.
+ * @entry: XArray entry.
+ *
+ * Return: The value stored in the XArray entry.
+ */
+static inline unsigned long xa_to_value(void *entry)
+{
+	return (unsigned long)entry >> 1;
+}
+
+/**
+ * xa_is_value() - Determine if an entry is a data value.
+ * @entry: XArray entry.
+ *
+ * Return: True if the entry is a data value, false if it is a pointer.
+ */
+static inline bool xa_is_value(void *entry)
+{
+	return (unsigned long)entry & 1;
+}
 
 #define xa_trylock(xa)		spin_trylock(&(xa)->xa_lock)
 #define xa_lock(xa)		spin_lock(&(xa)->xa_lock)
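
As a quick usage note for the three helpers added above: a value survives
the round-trip unchanged and is distinguishable from an ordinary
(4-byte-aligned) pointer by bit 0.  A self-contained userspace check,
duplicating the helpers so it compiles outside the kernel:

#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

static void *xa_mk_value(unsigned long v) { return (void *)((v << 1) | 1); }
static unsigned long xa_to_value(void *entry) { return (unsigned long)entry >> 1; }
static bool xa_is_value(void *entry)      { return (unsigned long)entry & 1; }

int main(void)
{
	void *entry = xa_mk_value(LONG_MAX);	/* largest storable value */
	void *ptr = malloc(32);			/* at least 4-byte aligned */

	assert(xa_is_value(entry) && xa_to_value(entry) == LONG_MAX);
	assert(!xa_is_value(ptr));		/* pointers keep bit 0 clear */
	free(ptr);
	return 0;
}
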
diff --git a/lib/idr.c b/lib/idr.c
index 35678388e134..afcdff365037 100644
--- a/lib/idr.c
+++ b/lib/idr.c
@@ -3,6 +3,7 @@
 #include <linux/idr.h>
 #include <linux/slab.h>
 #include <linux/spinlock.h>
+#include <linux/xarray.h>
 
 DEFINE_PER_CPU(struct ida_bitmap *, ida_bitmap);
 static DEFINE_SPINLOCK(simple_ida_lock);
@@ -271,11 +272,8 @@ EXPORT_SYMBOL(idr_replace);
  * by the number of bits in the leaf bitmap before doing a radix tree lookup.
  *
  * As an optimisation, if there are only a few low bits set in any given
- * leaf, instead of allocating a 128-byte bitmap, we use the 'exceptional
- * entry' functionality of the radix tree to store BITS_PER_LONG - 2 bits
- * directly in the entry.  By being really tricksy, we could store
- * BITS_PER_LONG - 1 bits, but there're diminishing returns after optimising
- * for 0-3 allocated IDs.
+ * leaf, instead of allocating a 128-byte bitmap, we store the data
+ * directly in the entry.
  *
  * We allow the radix tree 'exceptional' count to get out of date.  Nothing
  * in the IDA nor the radix tree code checks it.  If it becomes important
@@ -317,12 +315,11 @@ int ida_get_new_above(struct ida *ida, int start, int *id)
 	struct radix_tree_iter iter;
 	struct ida_bitmap *bitmap;
 	unsigned long index;
-	unsigned bit, ebit;
+	unsigned bit;
 	int new;
 
 	index = start / IDA_BITMAP_BITS;
 	bit = start % IDA_BITMAP_BITS;
-	ebit = bit + RADIX_TREE_EXCEPTIONAL_SHIFT;
 
 	slot = radix_tree_iter_init(&iter, index);
 	for (;;) {
@@ -337,26 +334,25 @@ int ida_get_new_above(struct ida *ida, int start, int *id)
 				return PTR_ERR(slot);
 			}
 		}
-		if (iter.index > index) {
+		if (iter.index > index)
 			bit = 0;
-			ebit = RADIX_TREE_EXCEPTIONAL_SHIFT;
-		}
 		new = iter.index * IDA_BITMAP_BITS;
 		bitmap = rcu_dereference_raw(*slot);
-		if (radix_tree_exception(bitmap)) {
-			unsigned long tmp = (unsigned long)bitmap;
-			ebit = find_next_zero_bit(&tmp, BITS_PER_LONG, ebit);
-			if (ebit < BITS_PER_LONG) {
-				tmp |= 1UL << ebit;
-				rcu_assign_pointer(*slot, (void *)tmp);
-				*id = new + ebit - RADIX_TREE_EXCEPTIONAL_SHIFT;
+		if (xa_is_value(bitmap)) {
+			unsigned long tmp = xa_to_value(bitmap);
+			int vbit = find_next_zero_bit(&tmp, BITS_PER_XA_VALUE,
+							bit);
+			if (vbit < BITS_PER_XA_VALUE) {
+				tmp |= 1UL << vbit;
+				rcu_assign_pointer(*slot, xa_mk_value(tmp));
+				*id = new + vbit;
 				return 0;
 			}
 			bitmap = this_cpu_xchg(ida_bitmap, NULL);
 			if (!bitmap)
 				return -EAGAIN;
 			memset(bitmap, 0, sizeof(*bitmap));
-			bitmap->bitmap[0] = tmp >> RADIX_TREE_EXCEPTIONAL_SHIFT;
+			bitmap->bitmap[0] = tmp;
 			rcu_assign_pointer(*slot, bitmap);
 		}
 
@@ -377,19 +373,15 @@ int ida_get_new_above(struct ida *ida, int start, int *id)
 			new += bit;
 			if (new < 0)
 				return -ENOSPC;
-			if (ebit < BITS_PER_LONG) {
-				bitmap = (void *)((1UL << ebit) |
-						RADIX_TREE_EXCEPTIONAL_ENTRY);
-				radix_tree_iter_replace(root, &iter, slot,
-						bitmap);
-				*id = new;
-				return 0;
+			if (bit < BITS_PER_XA_VALUE) {
+				bitmap = xa_mk_value(1UL << bit);
+			} else {
+				bitmap = this_cpu_xchg(ida_bitmap, NULL);
+				if (!bitmap)
+					return -EAGAIN;
+				memset(bitmap, 0, sizeof(*bitmap));
+				__set_bit(bit, bitmap->bitmap);
 			}
-			bitmap = this_cpu_xchg(ida_bitmap, NULL);
-			if (!bitmap)
-				return -EAGAIN;
-			memset(bitmap, 0, sizeof(*bitmap));
-			__set_bit(bit, bitmap->bitmap);
 			radix_tree_iter_replace(root, &iter, slot, bitmap);
 		}
 
@@ -420,9 +412,9 @@ void ida_remove(struct ida *ida, int id)
 		goto err;
 
 	bitmap = rcu_dereference_raw(*slot);
-	if (radix_tree_exception(bitmap)) {
+	if (xa_is_value(bitmap)) {
 		btmp = (unsigned long *)slot;
-		offset += RADIX_TREE_EXCEPTIONAL_SHIFT;
+		offset += 1; /* Intimate knowledge of the value entry encoding */
 		if (offset >= BITS_PER_LONG)
 			goto err;
 	} else {
@@ -433,9 +425,8 @@ void ida_remove(struct ida *ida, int id)
 
 	__clear_bit(offset, btmp);
 	radix_tree_iter_tag_set(&ida->ida_rt, &iter, IDR_FREE);
-	if (radix_tree_exception(bitmap)) {
-		if (rcu_dereference_raw(*slot) ==
-					(void *)RADIX_TREE_EXCEPTIONAL_ENTRY)
+	if (xa_is_value(bitmap)) {
+		if (xa_to_value(rcu_dereference_raw(*slot)) == 0)
 			radix_tree_iter_delete(&ida->ida_rt, &iter, slot);
 	} else if (bitmap_empty(btmp, IDA_BITMAP_BITS)) {
 		kfree(bitmap);
@@ -463,7 +454,7 @@ void ida_destroy(struct ida *ida)
 
 	radix_tree_for_each_slot(slot, &ida->ida_rt, &iter, 0) {
 		struct ida_bitmap *bitmap = rcu_dereference_raw(*slot);
-		if (!radix_tree_exception(bitmap))
+		if (!xa_is_value(bitmap))
 			kfree(bitmap);
 		radix_tree_iter_delete(&ida->ida_rt, &iter, slot);
 	}
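
The practical effect of the ida_get_new_above() rewrite is that the first
BITS_PER_XA_VALUE bits of a leaf live inline in the slot as a value entry,
and only an ID at or beyond that threshold forces a spill into a real
struct ida_bitmap, with the inline bits carried over as word 0 (the
bitmap->bitmap[0] = tmp line above).  A rough userspace model of just that
spill, with the radix tree, the per-CPU preload and IDA_BITMAP_BITS elided:

#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define BITS_PER_XA_VALUE 63	/* 64-bit long assumed */

struct bitmap { unsigned long w[16]; };	/* stand-in for struct ida_bitmap */

/* Set @bit, keeping the storage inline while it still fits in a value entry. */
static void ida_set_bit(unsigned long *inline_bits, struct bitmap *bm,
			bool *spilled, unsigned int bit)
{
	if (!*spilled && bit < BITS_PER_XA_VALUE) {
		*inline_bits |= 1UL << bit;
		return;
	}
	if (!*spilled) {			/* first oversized bit: spill */
		memset(bm, 0, sizeof(*bm));
		bm->w[0] = *inline_bits;	/* as bitmap->bitmap[0] = tmp */
		*spilled = true;
	}
	bm->w[bit / (8 * sizeof(unsigned long))] |=
		1UL << (bit % (8 * sizeof(unsigned long)));
}

int main(void)
{
	unsigned long inline_bits = 0;
	struct bitmap bm;
	bool spilled = false;

	ida_set_bit(&inline_bits, &bm, &spilled, 0);
	ida_set_bit(&inline_bits, &bm, &spilled, 62);
	ida_set_bit(&inline_bits, &bm, &spilled, 63);	/* forces the spill */
	assert(spilled && (bm.w[0] & 1UL) && (bm.w[0] & (1UL << 62)));
	assert(bm.w[0] & (1UL << 63));
	return 0;
}
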
diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index 6d29ca4c8db0..30e49b89aa3b 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -339,14 +339,12 @@ static void dump_ida_node(void *entry, unsigned long index)
 		for (i = 0; i < RADIX_TREE_MAP_SIZE; i++)
 			dump_ida_node(node->slots[i],
 					index | (i << node->shift));
-	} else if (radix_tree_exceptional_entry(entry)) {
+	} else if (xa_is_value(entry)) {
 		pr_debug("ida excp: %p offset %d indices %lu-%lu data %lx\n",
 				entry, (int)(index & RADIX_TREE_MAP_MASK),
 				index * IDA_BITMAP_BITS,
-				index * IDA_BITMAP_BITS + BITS_PER_LONG -
-					RADIX_TREE_EXCEPTIONAL_SHIFT,
-				(unsigned long)entry >>
-					RADIX_TREE_EXCEPTIONAL_SHIFT);
+				index * IDA_BITMAP_BITS + BITS_PER_XA_VALUE,
+				xa_to_value(entry));
 	} else {
 		struct ida_bitmap *bitmap = entry;
 
@@ -655,7 +653,7 @@ static int radix_tree_extend(struct radix_tree_root *root, gfp_t gfp,
 		BUG_ON(shift > BITS_PER_LONG);
 		if (radix_tree_is_internal_node(entry)) {
 			entry_to_node(entry)->parent = node;
-		} else if (radix_tree_exceptional_entry(entry)) {
+		} else if (xa_is_value(entry)) {
 			/* Moving an exceptional root->rnode to a node */
 			node->exceptional = 1;
 		}
@@ -946,12 +944,12 @@ static inline int insert_entries(struct radix_tree_node *node,
 					!is_sibling_entry(node, old) &&
 					(old != RADIX_TREE_RETRY))
 			radix_tree_free_nodes(old);
-		if (radix_tree_exceptional_entry(old))
+		if (xa_is_value(old))
 			node->exceptional--;
 	}
 	if (node) {
 		node->count += n;
-		if (radix_tree_exceptional_entry(item))
+		if (xa_is_value(item))
 			node->exceptional += n;
 	}
 	return n;
@@ -965,7 +963,7 @@ static inline int insert_entries(struct radix_tree_node *node,
 	rcu_assign_pointer(*slot, item);
 	if (node) {
 		node->count++;
-		if (radix_tree_exceptional_entry(item))
+		if (xa_is_value(item))
 			node->exceptional++;
 	}
 	return 1;
@@ -1182,8 +1180,7 @@ void __radix_tree_replace(struct radix_tree_root *root,
 			  radix_tree_update_node_t update_node)
 {
 	void *old = rcu_dereference_raw(*slot);
-	int exceptional = !!radix_tree_exceptional_entry(item) -
-				!!radix_tree_exceptional_entry(old);
+	int exceptional = !!xa_is_value(item) - !!xa_is_value(old);
 	int count = calculate_count(root, node, slot, item, old);
 
 	/*
@@ -1986,7 +1983,7 @@ static bool __radix_tree_delete(struct radix_tree_root *root,
 				struct radix_tree_node *node, void __rcu **slot)
 {
 	void *old = rcu_dereference_raw(*slot);
-	int exceptional = radix_tree_exceptional_entry(old) ? -1 : 0;
+	int exceptional = xa_is_value(old) ? -1 : 0;
 	unsigned offset = get_slot_offset(node, slot);
 	int tag;
 
diff --git a/mm/filemap.c b/mm/filemap.c
index 5c8f22fe4e62..1d012dd3629e 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -127,7 +127,7 @@ static int page_cache_tree_insert(struct address_space *mapping,
 		void *p;
 
 		p = radix_tree_deref_slot_protected(slot, &mapping->pages.xa_lock);
-		if (!radix_tree_exceptional_entry(p))
+		if (!xa_is_value(p))
 			return -EEXIST;
 
 		mapping->nrexceptional--;
@@ -336,7 +336,7 @@ page_cache_tree_delete_batch(struct address_space *mapping,
 			break;
 		page = radix_tree_deref_slot_protected(slot,
 						       &mapping->pages.xa_lock);
-		if (radix_tree_exceptional_entry(page))
+		if (xa_is_value(page))
 			continue;
 		if (!tail_pages) {
 			/*
@@ -1355,7 +1355,7 @@ pgoff_t page_cache_next_hole(struct address_space *mapping,
 		struct page *page;
 
 		page = radix_tree_lookup(&mapping->pages, index);
-		if (!page || radix_tree_exceptional_entry(page))
+		if (!page || xa_is_value(page))
 			break;
 		index++;
 		if (index == 0)
@@ -1396,7 +1396,7 @@ pgoff_t page_cache_prev_hole(struct address_space *mapping,
 		struct page *page;
 
 		page = radix_tree_lookup(&mapping->pages, index);
-		if (!page || radix_tree_exceptional_entry(page))
+		if (!page || xa_is_value(page))
 			break;
 		index--;
 		if (index == ULONG_MAX)
@@ -1539,7 +1539,7 @@ struct page *pagecache_get_page(struct address_space *mapping, pgoff_t offset,
 
 repeat:
 	page = find_get_entry(mapping, offset);
-	if (radix_tree_exceptional_entry(page))
+	if (xa_is_value(page))
 		page = NULL;
 	if (!page)
 		goto no_page;
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index cb4d199bf328..55ade70c33bb 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1363,7 +1363,7 @@ static void collapse_shmem(struct mm_struct *mm,
 
 		page = radix_tree_deref_slot_protected(slot,
 				&mapping->pages.xa_lock);
-		if (radix_tree_exceptional_entry(page) || !PageUptodate(page)) {
+		if (xa_is_value(page) || !PageUptodate(page)) {
 			xa_unlock_irq(&mapping->pages);
 			/* swap in or instantiate fallocated page */
 			if (shmem_getpage(mapping->host, index, &page,
diff --git a/mm/madvise.c b/mm/madvise.c
index 375cf32087e4..85f5d4f66cdd 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -251,7 +251,7 @@ static void force_shm_swapin_readahead(struct vm_area_struct *vma,
 		index = ((start - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
 
 		page = find_get_entry(mapping, index);
-		if (!radix_tree_exceptional_entry(page)) {
+		if (!xa_is_value(page)) {
 			if (page)
 				put_page(page);
 			continue;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 953dfed9e780..1b8dc671e32c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4526,7 +4526,7 @@ static struct page *mc_handle_file_pte(struct vm_area_struct *vma,
 	/* shmem/tmpfs may report page out on swap: account for that too. */
 	if (shmem_mapping(mapping)) {
 		page = find_get_entry(mapping, pgoff);
-		if (radix_tree_exceptional_entry(page)) {
+		if (xa_is_value(page)) {
 			swp_entry_t swp = radix_to_swp_entry(page);
 			if (do_memsw_account())
 				*entry = swp;
diff --git a/mm/mincore.c b/mm/mincore.c
index fc37afe226e6..4985965aa20a 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -66,7 +66,7 @@ static unsigned char mincore_page(struct address_space *mapping, pgoff_t pgoff)
 		 * shmem/tmpfs may return swap: account for swapcache
 		 * page too.
 		 */
-		if (radix_tree_exceptional_entry(page)) {
+		if (xa_is_value(page)) {
 			swp_entry_t swp = radix_to_swp_entry(page);
 			page = find_get_page(swap_address_space(swp),
 					     swp_offset(swp));
diff --git a/mm/readahead.c b/mm/readahead.c
index 514188fd2489..4851f002605f 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -177,7 +177,7 @@ int __do_page_cache_readahead(struct address_space *mapping, struct file *filp,
 		rcu_read_lock();
 		page = radix_tree_lookup(&mapping->pages, page_offset);
 		rcu_read_unlock();
-		if (page && !radix_tree_exceptional_entry(page))
+		if (page && !xa_is_value(page))
 			continue;
 
 		page = __page_cache_alloc(gfp_mask);
diff --git a/mm/shmem.c b/mm/shmem.c
index b1396ed0c9b1..21bf42f14ee2 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -690,7 +690,7 @@ unsigned long shmem_partial_swap_usage(struct address_space *mapping,
 			continue;
 		}
 
-		if (radix_tree_exceptional_entry(page))
+		if (xa_is_value(page))
 			swapped++;
 
 		if (need_resched()) {
@@ -805,7 +805,7 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 			if (index >= end)
 				break;
 
-			if (radix_tree_exceptional_entry(page)) {
+			if (xa_is_value(page)) {
 				if (unfalloc)
 					continue;
 				nr_swaps_freed += !shmem_free_swap(mapping,
@@ -902,7 +902,7 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 			if (index >= end)
 				break;
 
-			if (radix_tree_exceptional_entry(page)) {
+			if (xa_is_value(page)) {
 				if (unfalloc)
 					continue;
 				if (shmem_free_swap(mapping, index, page)) {
@@ -1614,7 +1614,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
 repeat:
 	swap.val = 0;
 	page = find_lock_entry(mapping, index);
-	if (radix_tree_exceptional_entry(page)) {
+	if (xa_is_value(page)) {
 		swap = radix_to_swp_entry(page);
 		page = NULL;
 	}
@@ -2547,7 +2547,7 @@ static pgoff_t shmem_seek_hole_data(struct address_space *mapping,
 				index = indices[i];
 			}
 			page = pvec.pages[i];
-			if (page && !radix_tree_exceptional_entry(page)) {
+			if (page && !xa_is_value(page)) {
 				if (!PageUptodate(page))
 					page = NULL;
 			}
diff --git a/mm/swap.c b/mm/swap.c
index 38e1b6374a97..8d7773cb2c3f 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -953,7 +953,7 @@ void pagevec_remove_exceptionals(struct pagevec *pvec)
 
 	for (i = 0, j = 0; i < pagevec_count(pvec); i++) {
 		struct page *page = pvec->pages[i];
-		if (!radix_tree_exceptional_entry(page))
+		if (!xa_is_value(page))
 			pvec->pages[j++] = page;
 	}
 	pvec->nr = j;
diff --git a/mm/truncate.c b/mm/truncate.c
index 094158f2e447..69bb743dd7e5 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -70,7 +70,7 @@ static void truncate_exceptional_pvec_entries(struct address_space *mapping,
 		return;
 
 	for (j = 0; j < pagevec_count(pvec); j++)
-		if (radix_tree_exceptional_entry(pvec->pages[j]))
+		if (xa_is_value(pvec->pages[j]))
 			break;
 
 	if (j == pagevec_count(pvec))
@@ -85,7 +85,7 @@ static void truncate_exceptional_pvec_entries(struct address_space *mapping,
 		struct page *page = pvec->pages[i];
 		pgoff_t index = indices[i];
 
-		if (!radix_tree_exceptional_entry(page)) {
+		if (!xa_is_value(page)) {
 			pvec->pages[j++] = page;
 			continue;
 		}
@@ -351,7 +351,7 @@ void truncate_inode_pages_range(struct address_space *mapping,
 			if (index >= end)
 				break;
 
-			if (radix_tree_exceptional_entry(page))
+			if (xa_is_value(page))
 				continue;
 
 			if (!trylock_page(page))
@@ -446,7 +446,7 @@ void truncate_inode_pages_range(struct address_space *mapping,
 				break;
 			}
 
-			if (radix_tree_exceptional_entry(page))
+			if (xa_is_value(page))
 				continue;
 
 			lock_page(page);
@@ -565,7 +565,7 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
 			if (index > end)
 				break;
 
-			if (radix_tree_exceptional_entry(page)) {
+			if (xa_is_value(page)) {
 				invalidate_exceptional_entry(mapping, index,
 							     page);
 				continue;
@@ -696,7 +696,7 @@ int invalidate_inode_pages2_range(struct address_space *mapping,
 			if (index > end)
 				break;
 
-			if (radix_tree_exceptional_entry(page)) {
+			if (xa_is_value(page)) {
 				if (!invalidate_exceptional_entry2(mapping,
 								   index, page))
 					ret = -EBUSY;
diff --git a/mm/workingset.c b/mm/workingset.c
index 2d071f0df3af..0a3465700d5f 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -155,8 +155,8 @@
  * refault distance will immediately activate the refaulting page.
  */
 
-#define EVICTION_SHIFT	(RADIX_TREE_EXCEPTIONAL_ENTRY + \
-			 NODES_SHIFT +	\
+#define EVICTION_SHIFT	((BITS_PER_LONG - BITS_PER_XA_VALUE) +	\
+			 NODES_SHIFT +				\
 			 MEM_CGROUP_ID_SHIFT)
 #define EVICTION_MASK	(~0UL >> EVICTION_SHIFT)
 
@@ -175,18 +175,16 @@ static void *pack_shadow(int memcgid, pg_data_t *pgdat, unsigned long eviction)
 	eviction >>= bucket_order;
 	eviction = (eviction << MEM_CGROUP_ID_SHIFT) | memcgid;
 	eviction = (eviction << NODES_SHIFT) | pgdat->node_id;
-	eviction = (eviction << RADIX_TREE_EXCEPTIONAL_SHIFT);
 
-	return (void *)(eviction | RADIX_TREE_EXCEPTIONAL_ENTRY);
+	return xa_mk_value(eviction);
 }
 
 static void unpack_shadow(void *shadow, int *memcgidp, pg_data_t **pgdat,
 			  unsigned long *evictionp)
 {
-	unsigned long entry = (unsigned long)shadow;
+	unsigned long entry = xa_to_value(shadow);
 	int memcgid, nid;
 
-	entry >>= RADIX_TREE_EXCEPTIONAL_SHIFT;
 	nid = entry & ((1UL << NODES_SHIFT) - 1);
 	entry >>= NODES_SHIFT;
 	memcgid = entry & ((1UL << MEM_CGROUP_ID_SHIFT) - 1);
@@ -453,7 +451,7 @@ static enum lru_status shadow_lru_isolate(struct list_head *item,
 		goto out_invalid;
 	for (i = 0; i < RADIX_TREE_MAP_SIZE; i++) {
 		if (node->slots[i]) {
-			if (WARN_ON_ONCE(!radix_tree_exceptional_entry(node->slots[i])))
+			if (WARN_ON_ONCE(!xa_is_value(node->slots[i])))
 				goto out_invalid;
 			if (WARN_ON_ONCE(!node->exceptional))
 				goto out_invalid;
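
The arithmetic in the new EVICTION_SHIFT is worth spelling out:
BITS_PER_LONG - BITS_PER_XA_VALUE is exactly 1, so a shadow entry now gives
up one bit to the entry type where the exceptional encoding cost two, and
the memcg id, node id and eviction counter still pack in the same order.  A
compact userspace model of pack_shadow()/unpack_shadow() with the bucket
rounding left out (the two SHIFT constants are illustrative assumptions):

#include <assert.h>

#define NODES_SHIFT		6	/* assumed for illustration */
#define MEM_CGROUP_ID_SHIFT	16	/* assumed for illustration */

static void *pack_shadow(int memcgid, int nid, unsigned long eviction)
{
	eviction = (eviction << MEM_CGROUP_ID_SHIFT) | memcgid;
	eviction = (eviction << NODES_SHIFT) | nid;
	return (void *)((eviction << 1) | 1);		/* xa_mk_value() */
}

static void unpack_shadow(void *shadow, int *memcgid, int *nid,
			  unsigned long *eviction)
{
	unsigned long entry = (unsigned long)shadow >> 1;	/* xa_to_value() */

	*nid = entry & ((1UL << NODES_SHIFT) - 1);
	entry >>= NODES_SHIFT;
	*memcgid = entry & ((1UL << MEM_CGROUP_ID_SHIFT) - 1);
	*eviction = entry >> MEM_CGROUP_ID_SHIFT;
}

int main(void)
{
	int memcgid, nid;
	unsigned long eviction;

	unpack_shadow(pack_shadow(42, 3, 123456), &memcgid, &nid, &eviction);
	assert(memcgid == 42 && nid == 3 && eviction == 123456);
	return 0;
}
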
diff --git a/tools/testing/radix-tree/idr-test.c b/tools/testing/radix-tree/idr-test.c
index 1dff94c15da5..1720bd90ece0 100644
--- a/tools/testing/radix-tree/idr-test.c
+++ b/tools/testing/radix-tree/idr-test.c
@@ -19,7 +19,7 @@
 
 #include "test.h"
 
-#define DUMMY_PTR	((void *)0x12)
+#define DUMMY_PTR	((void *)0x10)
 
 int item_idr_free(int id, void *p, void *data)
 {
@@ -332,11 +332,11 @@ void ida_check_conv(void)
 	for (i = 0; i < 1000000; i++) {
 		int err = ida_get_new(&ida, &id);
 		if (err == -EAGAIN) {
-			assert((i % IDA_BITMAP_BITS) == (BITS_PER_LONG - 2));
+			assert((i % IDA_BITMAP_BITS) == (BITS_PER_LONG - 1));
 			assert(ida_pre_get(&ida, GFP_KERNEL));
 			err = ida_get_new(&ida, &id);
 		} else {
-			assert((i % IDA_BITMAP_BITS) != (BITS_PER_LONG - 2));
+			assert((i % IDA_BITMAP_BITS) != (BITS_PER_LONG - 1));
 		}
 		assert(!err);
 		assert(id == i);
diff --git a/tools/testing/radix-tree/linux/radix-tree.h b/tools/testing/radix-tree/linux/radix-tree.h
index 24f13d27a8da..de3f655caca3 100644
--- a/tools/testing/radix-tree/linux/radix-tree.h
+++ b/tools/testing/radix-tree/linux/radix-tree.h
@@ -4,6 +4,7 @@
 
 #include "generated/map-shift.h"
 #include "../../../../include/linux/radix-tree.h"
+#include <linux/xarray.h>
 
 extern int kmalloc_verbose;
 extern int test_verbose;
diff --git a/tools/testing/radix-tree/multiorder.c b/tools/testing/radix-tree/multiorder.c
index 59245b3d587c..684e76f79f4a 100644
--- a/tools/testing/radix-tree/multiorder.c
+++ b/tools/testing/radix-tree/multiorder.c
@@ -38,12 +38,11 @@ static void __multiorder_tag_test(int index, int order)
 
 	/*
 	 * Verify we get collisions for covered indices.  We try and fail to
-	 * insert an exceptional entry so we don't leak memory via
+	 * insert a data entry so we don't leak memory via
 	 * item_insert_order().
 	 */
 	for_each_index(i, base, order) {
-		err = __radix_tree_insert(&tree, i, order,
-				(void *)(0xA0 | RADIX_TREE_EXCEPTIONAL_ENTRY));
+		err = __radix_tree_insert(&tree, i, order, xa_mk_value(0xA0));
 		assert(err == -EEXIST);
 	}
 
@@ -379,8 +378,8 @@ static void multiorder_join1(unsigned long index,
 }
 
 /*
- * Check that the accounting of exceptional entries is handled correctly
- * by joining an exceptional entry to a normal pointer.
+ * Check that the accounting of inline data entries is handled correctly
+ * by joining a data entry to a normal pointer.
  */
 static void multiorder_join2(unsigned order1, unsigned order2)
 {
@@ -390,9 +389,9 @@ static void multiorder_join2(unsigned order1, unsigned order2)
 	void *item2;
 
 	item_insert_order(&tree, 0, order2);
-	radix_tree_insert(&tree, 1 << order2, (void *)0x12UL);
+	radix_tree_insert(&tree, 1 << order2, xa_mk_value(5));
 	item2 = __radix_tree_lookup(&tree, 1 << order2, &node, NULL);
-	assert(item2 == (void *)0x12UL);
+	assert(item2 == xa_mk_value(5));
 	assert(node->exceptional == 1);
 
 	item2 = radix_tree_lookup(&tree, 0);
@@ -406,7 +405,7 @@ static void multiorder_join2(unsigned order1, unsigned order2)
 }
 
 /*
- * This test revealed an accounting bug for exceptional entries at one point.
+ * This test revealed an accounting bug for inline data entries at one point.
  * Nodes were being freed back into the pool with an elevated exception count
  * by radix_tree_join() and then radix_tree_split() was failing to zero the
  * count of exceptional entries.
@@ -420,16 +419,16 @@ static void multiorder_join3(unsigned int order)
 	unsigned long i;
 
 	for (i = 0; i < (1 << order); i++) {
-		radix_tree_insert(&tree, i, (void *)0x12UL);
+		radix_tree_insert(&tree, i, xa_mk_value(5));
 	}
 
-	radix_tree_join(&tree, 0, order, (void *)0x16UL);
+	radix_tree_join(&tree, 0, order, xa_mk_value(7));
 	rcu_barrier();
 
 	radix_tree_split(&tree, 0, 0);
 
 	radix_tree_for_each_slot(slot, &tree, &iter, 0) {
-		radix_tree_iter_replace(&tree, &iter, slot, (void *)0x12UL);
+		radix_tree_iter_replace(&tree, &iter, slot, xa_mk_value(5));
 	}
 
 	__radix_tree_lookup(&tree, 0, &node, NULL);
@@ -516,10 +515,10 @@ static void __multiorder_split2(int old_order, int new_order)
 	struct radix_tree_node *node;
 	void *item;
 
-	__radix_tree_insert(&tree, 0, old_order, (void *)0x12);
+	__radix_tree_insert(&tree, 0, old_order, xa_mk_value(5));
 
 	item = __radix_tree_lookup(&tree, 0, &node, NULL);
-	assert(item == (void *)0x12);
+	assert(item == xa_mk_value(5));
 	assert(node->exceptional > 0);
 
 	radix_tree_split(&tree, 0, new_order);
@@ -529,7 +528,7 @@ static void __multiorder_split2(int old_order, int new_order)
 	}
 
 	item = __radix_tree_lookup(&tree, 0, &node, NULL);
-	assert(item != (void *)0x12);
+	assert(item != xa_mk_value(5));
 	assert(node->exceptional == 0);
 
 	item_kill_tree(&tree);
@@ -543,40 +542,40 @@ static void __multiorder_split3(int old_order, int new_order)
 	struct radix_tree_node *node;
 	void *item;
 
-	__radix_tree_insert(&tree, 0, old_order, (void *)0x12);
+	__radix_tree_insert(&tree, 0, old_order, xa_mk_value(5));
 
 	item = __radix_tree_lookup(&tree, 0, &node, NULL);
-	assert(item == (void *)0x12);
+	assert(item == xa_mk_value(5));
 	assert(node->exceptional > 0);
 
 	radix_tree_split(&tree, 0, new_order);
 	radix_tree_for_each_slot(slot, &tree, &iter, 0) {
-		radix_tree_iter_replace(&tree, &iter, slot, (void *)0x16);
+		radix_tree_iter_replace(&tree, &iter, slot, xa_mk_value(7));
 	}
 
 	item = __radix_tree_lookup(&tree, 0, &node, NULL);
-	assert(item == (void *)0x16);
+	assert(item == xa_mk_value(7));
 	assert(node->exceptional > 0);
 
 	item_kill_tree(&tree);
 
-	__radix_tree_insert(&tree, 0, old_order, (void *)0x12);
+	__radix_tree_insert(&tree, 0, old_order, xa_mk_value(5));
 
 	item = __radix_tree_lookup(&tree, 0, &node, NULL);
-	assert(item == (void *)0x12);
+	assert(item == xa_mk_value(5));
 	assert(node->exceptional > 0);
 
 	radix_tree_split(&tree, 0, new_order);
 	radix_tree_for_each_slot(slot, &tree, &iter, 0) {
 		if (iter.index == (1 << new_order))
 			radix_tree_iter_replace(&tree, &iter, slot,
-						(void *)0x16);
+						xa_mk_value(7));
 		else
 			radix_tree_iter_replace(&tree, &iter, slot, NULL);
 	}
 
 	item = __radix_tree_lookup(&tree, 1 << new_order, &node, NULL);
-	assert(item == (void *)0x16);
+	assert(item == xa_mk_value(7));
 	assert(node->count == node->exceptional);
 	do {
 		node = node->parent;
@@ -609,13 +608,13 @@ static void multiorder_account(void)
 
 	item_insert_order(&tree, 0, 5);
 
-	__radix_tree_insert(&tree, 1 << 5, 5, (void *)0x12);
+	__radix_tree_insert(&tree, 1 << 5, 5, xa_mk_value(5));
 	__radix_tree_lookup(&tree, 0, &node, NULL);
 	assert(node->count == node->exceptional * 2);
 	radix_tree_delete(&tree, 1 << 5);
 	assert(node->exceptional == 0);
 
-	__radix_tree_insert(&tree, 1 << 5, 5, (void *)0x12);
+	__radix_tree_insert(&tree, 1 << 5, 5, xa_mk_value(5));
 	__radix_tree_lookup(&tree, 1 << 5, &node, &slot);
 	assert(node->count == node->exceptional * 2);
 	__radix_tree_replace(&tree, node, slot, NULL, NULL);
diff --git a/tools/testing/radix-tree/test.c b/tools/testing/radix-tree/test.c
index 5978ab1f403d..0d69c49177c6 100644
--- a/tools/testing/radix-tree/test.c
+++ b/tools/testing/radix-tree/test.c
@@ -276,7 +276,7 @@ void item_kill_tree(struct radix_tree_root *root)
 	int nfound;
 
 	radix_tree_for_each_slot(slot, root, &iter, 0) {
-		if (radix_tree_exceptional_entry(*slot))
+		if (xa_is_value(*slot))
 			radix_tree_delete(root, iter.index);
 	}
 
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 16/62] xarray: Replace exceptional entries
@ 2017-11-22 21:06   ` Matthew Wilcox
  0 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:06 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

Introduce xarray value entries to replace the radix tree exceptional
entry code.  This is a slight change in encoding to allow the use of an
extra bit (we can now store BITS_PER_LONG - 1 bits in a value entry).
It is also a change in emphasis; exceptional entries are intimidating
and different.  As the comment explains, you can choose to store values
or pointers in the xarray and they are both first-class citizens.
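
Concretely, the old exceptional encoding was (v << 2) | 2, leaving
BITS_PER_LONG - 2 usable bits, while a value entry is (v << 1) | 1, leaving
BITS_PER_LONG - 1; that is the extra bit referred to above.  A two-assert
userspace illustration (64-bit long assumed):

#include <assert.h>

int main(void)
{
	unsigned long v = 0x123456789abcdefUL;

	/* Old: exceptional entry, bit 1 is the tag, payload shifted by 2. */
	void *exceptional = (void *)((v << 2) | 2);
	/* New: value entry, bit 0 is the tag, payload shifted by 1. */
	void *value = (void *)((v << 1) | 1);

	assert(((unsigned long)exceptional >> 2) == v);	/* 62 usable bits */
	assert(((unsigned long)value >> 1) == v);	/* 63 usable bits */
	return 0;
}
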

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 arch/powerpc/include/asm/book3s/64/pgtable.h    |   4 +-
 arch/powerpc/include/asm/nohash/64/pgtable.h    |   4 +-
 drivers/gpu/drm/i915/i915_gem.c                 |  17 ++--
 drivers/staging/lustre/lustre/mdc/mdc_request.c |   2 +-
 fs/btrfs/compression.c                          |   2 +-
 fs/btrfs/inode.c                                |   2 +-
 fs/dax.c                                        | 108 ++++++++++++------------
 fs/proc/task_mmu.c                              |   2 +-
 include/linux/fs.h                              |  48 +++++++----
 include/linux/radix-tree.h                      |  36 ++------
 include/linux/swapops.h                         |  19 ++---
 include/linux/xarray.h                          |  66 +++++++++++++++
 lib/idr.c                                       |  63 ++++++--------
 lib/radix-tree.c                                |  21 ++---
 mm/filemap.c                                    |  10 +--
 mm/khugepaged.c                                 |   2 +-
 mm/madvise.c                                    |   2 +-
 mm/memcontrol.c                                 |   2 +-
 mm/mincore.c                                    |   2 +-
 mm/readahead.c                                  |   2 +-
 mm/shmem.c                                      |  10 +--
 mm/swap.c                                       |   2 +-
 mm/truncate.c                                   |  12 +--
 mm/workingset.c                                 |  12 ++-
 tools/testing/radix-tree/idr-test.c             |   6 +-
 tools/testing/radix-tree/linux/radix-tree.h     |   1 +
 tools/testing/radix-tree/multiorder.c           |  47 +++++------
 tools/testing/radix-tree/test.c                 |   2 +-
 28 files changed, 270 insertions(+), 236 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 9a677cd5997f..ff96663d8455 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -649,9 +649,7 @@ static inline bool pte_user(pte_t pte)
 	BUILD_BUG_ON(_PAGE_HPTEFLAGS & (0x1f << _PAGE_BIT_SWAP_TYPE)); \
 	BUILD_BUG_ON(_PAGE_HPTEFLAGS & _PAGE_SWP_SOFT_DIRTY);	\
 	} while (0)
-/*
- * on pte we don't need handle RADIX_TREE_EXCEPTIONAL_SHIFT;
- */
+
 #define SWP_TYPE_BITS 5
 #define __swp_type(x)		(((x).val >> _PAGE_BIT_SWAP_TYPE) \
 				& ((1UL << SWP_TYPE_BITS) - 1))
diff --git a/arch/powerpc/include/asm/nohash/64/pgtable.h b/arch/powerpc/include/asm/nohash/64/pgtable.h
index abddf5830ad5..f711773568d7 100644
--- a/arch/powerpc/include/asm/nohash/64/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/64/pgtable.h
@@ -329,9 +329,7 @@ static inline void __ptep_set_access_flags(struct mm_struct *mm,
 	 */							\
 	BUILD_BUG_ON(_PAGE_HPTEFLAGS & (0x1f << _PAGE_BIT_SWAP_TYPE)); \
 	} while (0)
-/*
- * on pte we don't need handle RADIX_TREE_EXCEPTIONAL_SHIFT;
- */
+
 #define SWP_TYPE_BITS 5
 #define __swp_type(x)		(((x).val >> _PAGE_BIT_SWAP_TYPE) \
 				& ((1UL << SWP_TYPE_BITS) - 1))
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 3a140eedfc83..7b10ec3694a8 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -5375,7 +5375,8 @@ i915_gem_object_get_sg(struct drm_i915_gem_object *obj,
 	count = __sg_page_count(sg);
 
 	while (idx + count <= n) {
-		unsigned long exception, i;
+		void *entry;
+		unsigned long i;
 		int ret;
 
 		/* If we cannot allocate and insert this entry, or the
@@ -5390,12 +5391,9 @@ i915_gem_object_get_sg(struct drm_i915_gem_object *obj,
 		if (ret && ret != -EEXIST)
 			goto scan;
 
-		exception =
-			RADIX_TREE_EXCEPTIONAL_ENTRY |
-			idx << RADIX_TREE_EXCEPTIONAL_SHIFT;
+		entry = xa_mk_value(idx);
 		for (i = 1; i < count; i++) {
-			ret = radix_tree_insert(&iter->radix, idx + i,
-						(void *)exception);
+			ret = radix_tree_insert(&iter->radix, idx + i, entry);
 			if (ret && ret != -EEXIST)
 				goto scan;
 		}
@@ -5433,15 +5431,14 @@ i915_gem_object_get_sg(struct drm_i915_gem_object *obj,
 	GEM_BUG_ON(!sg);
 
 	/* If this index is in the middle of multi-page sg entry,
-	 * the radixtree will contain an exceptional entry that points
+	 * the radixtree will contain a data entry that points
 	 * to the start of that range. We will return the pointer to
 	 * the base page and the offset of this page within the
 	 * sg entry's range.
 	 */
 	*offset = 0;
-	if (unlikely(radix_tree_exception(sg))) {
-		unsigned long base =
-			(unsigned long)sg >> RADIX_TREE_EXCEPTIONAL_SHIFT;
+	if (unlikely(xa_is_value(sg))) {
+		unsigned long base = xa_to_value(sg);
 
 		sg = radix_tree_lookup(&iter->radix, base);
 		GEM_BUG_ON(!sg);
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_request.c b/drivers/staging/lustre/lustre/mdc/mdc_request.c
index 45dcf9f958d4..2ec79a6b17da 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_request.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_request.c
@@ -940,7 +940,7 @@ static struct page *mdc_page_locate(struct address_space *mapping, __u64 *hash,
 	xa_lock_irq(&mapping->pages);
 	found = radix_tree_gang_lookup(&mapping->pages,
 				       (void **)&page, offset, 1);
-	if (found > 0 && !radix_tree_exceptional_entry(page)) {
+	if (found > 0 && !xa_is_value(page)) {
 		struct lu_dirpage *dp;
 
 		get_page(page);
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 541bd536c815..9446c58fa0ec 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -451,7 +451,7 @@ static noinline int add_ra_bio_pages(struct inode *inode,
 		rcu_read_lock();
 		page = radix_tree_lookup(&mapping->pages, pg_index);
 		rcu_read_unlock();
-		if (page && !radix_tree_exceptional_entry(page)) {
+		if (page && !xa_is_value(page)) {
 			misses++;
 			if (misses > 4)
 				break;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 9ceee45ed513..dcad35f7c699 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7562,7 +7562,7 @@ bool btrfs_page_exists_in_range(struct inode *inode, loff_t start, loff_t end)
 			}
 			/*
 			 * Otherwise, shmem/tmpfs must be storing a swap entry
-			 * here as an exceptional entry: so return it without
+			 * here as a data entry: so return it without
 			 * attempting to raise page count.
 			 */
 			page = NULL;
diff --git a/fs/dax.c b/fs/dax.c
index 93e6b79687af..e6ed216d3760 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -58,7 +58,7 @@ static int __init init_dax_wait_table(void)
 fs_initcall(init_dax_wait_table);
 
 /*
- * We use lowest available bit in exceptional entry for locking, one bit for
+ * We use the lowest available bit in a data entry for locking, one bit for
  * the entry size (PMD) and two more to tell us if the entry is a zero page or
  * an empty entry that is just used for locking.  In total four special bits.
  *
@@ -66,49 +66,48 @@ fs_initcall(init_dax_wait_table);
  * and EMPTY bits aren't set the entry is a normal DAX entry with a filesystem
  * block allocation.
  */
-#define RADIX_DAX_SHIFT		(RADIX_TREE_EXCEPTIONAL_SHIFT + 4)
-#define RADIX_DAX_ENTRY_LOCK	(1 << RADIX_TREE_EXCEPTIONAL_SHIFT)
-#define RADIX_DAX_PMD		(1 << (RADIX_TREE_EXCEPTIONAL_SHIFT + 1))
-#define RADIX_DAX_ZERO_PAGE	(1 << (RADIX_TREE_EXCEPTIONAL_SHIFT + 2))
-#define RADIX_DAX_EMPTY		(1 << (RADIX_TREE_EXCEPTIONAL_SHIFT + 3))
+#define DAX_SHIFT	(4)
+#define DAX_ENTRY_LOCK	(1UL << 0)
+#define DAX_PMD		(1UL << 1)
+#define DAX_ZERO_PAGE	(1UL << 2)
+#define DAX_EMPTY	(1UL << 3)
 
 static unsigned long dax_radix_sector(void *entry)
 {
-	return (unsigned long)entry >> RADIX_DAX_SHIFT;
+	return xa_to_value(entry) >> DAX_SHIFT;
 }
 
 static void *dax_radix_locked_entry(sector_t sector, unsigned long flags)
 {
-	return (void *)(RADIX_TREE_EXCEPTIONAL_ENTRY | flags |
-			((unsigned long)sector << RADIX_DAX_SHIFT) |
-			RADIX_DAX_ENTRY_LOCK);
+	return xa_mk_value(flags | ((unsigned long)sector << DAX_SHIFT) |
+			DAX_ENTRY_LOCK);
 }
 
 static unsigned int dax_radix_order(void *entry)
 {
-	if ((unsigned long)entry & RADIX_DAX_PMD)
+	if (xa_to_value(entry) & DAX_PMD)
 		return PMD_SHIFT - PAGE_SHIFT;
 	return 0;
 }
 
 static int dax_is_pmd_entry(void *entry)
 {
-	return (unsigned long)entry & RADIX_DAX_PMD;
+	return xa_to_value(entry) & DAX_PMD;
 }
 
 static int dax_is_pte_entry(void *entry)
 {
-	return !((unsigned long)entry & RADIX_DAX_PMD);
+	return !(xa_to_value(entry) & DAX_PMD);
 }
 
 static int dax_is_zero_entry(void *entry)
 {
-	return (unsigned long)entry & RADIX_DAX_ZERO_PAGE;
+	return xa_to_value(entry) & DAX_ZERO_PAGE;
 }
 
 static int dax_is_empty_entry(void *entry)
 {
-	return (unsigned long)entry & RADIX_DAX_EMPTY;
+	return xa_to_value(entry) & DAX_EMPTY;
 }
 
 /*
@@ -188,9 +187,9 @@ static void dax_wake_mapping_entry_waiter(struct address_space *mapping,
  */
 static inline int slot_locked(struct address_space *mapping, void **slot)
 {
-	unsigned long entry = (unsigned long)
-		radix_tree_deref_slot_protected(slot, &mapping->pages.xa_lock);
-	return entry & RADIX_DAX_ENTRY_LOCK;
+	unsigned long entry = xa_to_value(
+		radix_tree_deref_slot_protected(slot, &mapping->pages.xa_lock));
+	return entry & DAX_ENTRY_LOCK;
 }
 
 /*
@@ -199,12 +198,11 @@ static inline int slot_locked(struct address_space *mapping, void **slot)
  */
 static inline void *lock_slot(struct address_space *mapping, void **slot)
 {
-	unsigned long entry = (unsigned long)
-		radix_tree_deref_slot_protected(slot, &mapping->pages.xa_lock);
-
-	entry |= RADIX_DAX_ENTRY_LOCK;
-	radix_tree_replace_slot(&mapping->pages, slot, (void *)entry);
-	return (void *)entry;
+	unsigned long v = xa_to_value(
+		radix_tree_deref_slot_protected(slot, &mapping->pages.xa_lock));
+	void *entry = xa_mk_value(v | DAX_ENTRY_LOCK);
+	radix_tree_replace_slot(&mapping->pages, slot, entry);
+	return entry;
 }
 
 /*
@@ -213,17 +211,16 @@ static inline void *lock_slot(struct address_space *mapping, void **slot)
  */
 static inline void *unlock_slot(struct address_space *mapping, void **slot)
 {
-	unsigned long entry = (unsigned long)
-		radix_tree_deref_slot_protected(slot, &mapping->pages.xa_lock);
-
-	entry &= ~(unsigned long)RADIX_DAX_ENTRY_LOCK;
-	radix_tree_replace_slot(&mapping->pages, slot, (void *)entry);
-	return (void *)entry;
+	unsigned long v = xa_to_value(
+		radix_tree_deref_slot_protected(slot, &mapping->pages.xa_lock));
+	void *entry = xa_mk_value(v & ~DAX_ENTRY_LOCK);
+	radix_tree_replace_slot(&mapping->pages, slot, entry);
+	return entry;
 }
 
 /*
  * Lookup entry in radix tree, wait for it to become unlocked if it is
- * exceptional entry and return it. The caller must call
+ * a data entry and return it. The caller must call
  * put_unlocked_mapping_entry() when he decided not to lock the entry or
  * put_locked_mapping_entry() when he locked the entry and now wants to
  * unlock it.
@@ -244,7 +241,7 @@ static void *get_unlocked_mapping_entry(struct address_space *mapping,
 		entry = __radix_tree_lookup(&mapping->pages, index, NULL,
 					  &slot);
 		if (!entry ||
-		    WARN_ON_ONCE(!radix_tree_exceptional_entry(entry)) ||
+		    WARN_ON_ONCE(!xa_is_value(entry)) ||
 		    !slot_locked(mapping, slot)) {
 			if (slotp)
 				*slotp = slot;
@@ -268,7 +265,7 @@ static void dax_unlock_mapping_entry(struct address_space *mapping,
 
 	xa_lock_irq(&mapping->pages);
 	entry = __radix_tree_lookup(&mapping->pages, index, NULL, &slot);
-	if (WARN_ON_ONCE(!entry || !radix_tree_exceptional_entry(entry) ||
+	if (WARN_ON_ONCE(!entry || !xa_is_value(entry) ||
 			 !slot_locked(mapping, slot))) {
 		xa_unlock_irq(&mapping->pages);
 		return;
@@ -299,12 +296,11 @@ static void put_unlocked_mapping_entry(struct address_space *mapping,
 }
 
 /*
- * Find radix tree entry at given index. If it points to an exceptional entry,
- * return it with the radix tree entry locked. If the radix tree doesn't
- * contain given index, create an empty exceptional entry for the index and
- * return with it locked.
+ * Find radix tree entry at given index. If it is a data value, return it
+ * with the radix tree entry locked. If the radix tree doesn't contain the
+ * given index, create an empty value for the index and return with it locked.
  *
- * When requesting an entry with size RADIX_DAX_PMD, grab_mapping_entry() will
+ * When requesting an entry with size DAX_PMD, grab_mapping_entry() will
  * either return that locked entry or will return an error.  This error will
  * happen if there are any 4k entries within the 2MiB range that we are
  * requesting.
@@ -334,13 +330,13 @@ static void *grab_mapping_entry(struct address_space *mapping, pgoff_t index,
 	xa_lock_irq(&mapping->pages);
 	entry = get_unlocked_mapping_entry(mapping, index, &slot);
 
-	if (WARN_ON_ONCE(entry && !radix_tree_exceptional_entry(entry))) {
+	if (WARN_ON_ONCE(entry && !xa_is_value(entry))) {
 		entry = ERR_PTR(-EIO);
 		goto out_unlock;
 	}
 
 	if (entry) {
-		if (size_flag & RADIX_DAX_PMD) {
+		if (size_flag & DAX_PMD) {
 			if (dax_is_pte_entry(entry)) {
 				put_unlocked_mapping_entry(mapping, index,
 						entry);
@@ -410,7 +406,7 @@ static void *grab_mapping_entry(struct address_space *mapping, pgoff_t index,
 					true);
 		}
 
-		entry = dax_radix_locked_entry(0, size_flag | RADIX_DAX_EMPTY);
+		entry = dax_radix_locked_entry(0, size_flag | DAX_EMPTY);
 
 		err = __radix_tree_insert(&mapping->pages, index,
 				dax_radix_order(entry), entry);
@@ -447,7 +443,7 @@ static int __dax_invalidate_mapping_entry(struct address_space *mapping,
 
 	xa_lock_irq(&mapping->pages);
 	entry = get_unlocked_mapping_entry(mapping, index, NULL);
-	if (!entry || WARN_ON_ONCE(!radix_tree_exceptional_entry(entry)))
+	if (!entry || WARN_ON_ONCE(!xa_is_value(entry)))
 		goto out;
 	if (!trunc &&
 	    (radix_tree_tag_get(pages, index, PAGECACHE_TAG_DIRTY) ||
@@ -462,7 +458,7 @@ static int __dax_invalidate_mapping_entry(struct address_space *mapping,
 	return ret;
 }
 /*
- * Delete exceptional DAX entry at @index from @mapping. Wait for radix tree
+ * Delete DAX data entry at @index from @mapping. Wait for radix tree
  * entry to get unlocked before deleting it.
  */
 int dax_delete_mapping_entry(struct address_space *mapping, pgoff_t index)
@@ -473,7 +469,7 @@ int dax_delete_mapping_entry(struct address_space *mapping, pgoff_t index)
 	 * This gets called from truncate / punch_hole path. As such, the caller
 	 * must hold locks protecting against concurrent modifications of the
 	 * radix tree (usually fs-private i_mmap_sem for writing). Since the
-	 * caller has seen exceptional entry for this index, we better find it
+	 * caller has seen a data entry for this index, we better find it
 	 * at that index as well...
 	 */
 	WARN_ON_ONCE(!ret);
@@ -481,7 +477,7 @@ int dax_delete_mapping_entry(struct address_space *mapping, pgoff_t index)
 }
 
 /*
- * Invalidate exceptional DAX entry if it is clean.
+ * Invalidate DAX data entry if it is clean.
  */
 int dax_invalidate_mapping_entry_sync(struct address_space *mapping,
 				      pgoff_t index)
@@ -535,7 +531,7 @@ static void *dax_insert_mapping_entry(struct address_space *mapping,
 	if (dirty)
 		__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
 
-	if (dax_is_zero_entry(entry) && !(flags & RADIX_DAX_ZERO_PAGE)) {
+	if (dax_is_zero_entry(entry) && !(flags & DAX_ZERO_PAGE)) {
 		/* we are replacing a zero page with block mapping */
 		if (dax_is_pmd_entry(entry))
 			unmap_mapping_range(mapping,
@@ -674,13 +670,13 @@ static int dax_writeback_one(struct block_device *bdev,
 	 * A page got tagged dirty in DAX mapping? Something is seriously
 	 * wrong.
 	 */
-	if (WARN_ON(!radix_tree_exceptional_entry(entry)))
+	if (WARN_ON(!xa_is_value(entry)))
 		return -EIO;
 
 	xa_lock_irq(&mapping->pages);
 	entry2 = get_unlocked_mapping_entry(mapping, index, &slot);
 	/* Entry got punched out / reallocated? */
-	if (!entry2 || WARN_ON_ONCE(!radix_tree_exceptional_entry(entry2)))
+	if (!entry2 || WARN_ON_ONCE(!xa_is_value(entry2)))
 		goto put_unlocked;
 	/*
 	 * Entry got reallocated elsewhere? No need to writeback. We have to
@@ -886,7 +882,7 @@ static int dax_load_hole(struct address_space *mapping, void *entry,
 	}
 
 	entry2 = dax_insert_mapping_entry(mapping, vmf, entry, 0,
-			RADIX_DAX_ZERO_PAGE, false);
+			DAX_ZERO_PAGE, false);
 	if (IS_ERR(entry2)) {
 		ret = VM_FAULT_SIGBUS;
 		goto out;
@@ -1292,7 +1288,7 @@ static int dax_pmd_load_hole(struct vm_fault *vmf, struct iomap *iomap,
 		goto fallback;
 
 	ret = dax_insert_mapping_entry(mapping, vmf, entry, 0,
-			RADIX_DAX_PMD | RADIX_DAX_ZERO_PAGE, false);
+			DAX_PMD | DAX_ZERO_PAGE, false);
 	if (IS_ERR(ret))
 		goto fallback;
 
@@ -1377,7 +1373,7 @@ static int dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
 	 * is already in the tree, for instance), it will return -EEXIST and
 	 * we just fall back to 4k entries.
 	 */
-	entry = grab_mapping_entry(mapping, pgoff, RADIX_DAX_PMD);
+	entry = grab_mapping_entry(mapping, pgoff, DAX_PMD);
 	if (IS_ERR(entry))
 		goto fallback;
 
@@ -1416,7 +1412,7 @@ static int dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
 
 		entry = dax_insert_mapping_entry(mapping, vmf, entry,
 						dax_iomap_sector(&iomap, pos),
-						RADIX_DAX_PMD, write && !sync);
+						DAX_PMD, write && !sync);
 		if (IS_ERR(entry))
 			goto finish_iomap;
 
@@ -1528,21 +1524,21 @@ static int dax_insert_pfn_mkwrite(struct vm_fault *vmf,
 	pgoff_t index = vmf->pgoff;
 	int vmf_ret, error;
 
-	spin_lock_irq(&mapping->tree_lock);
+	xa_lock_irq(&mapping->pages);
 	entry = get_unlocked_mapping_entry(mapping, index, &slot);
 	/* Did we race with someone splitting entry or so? */
 	if (!entry ||
 	    (pe_size == PE_SIZE_PTE && !dax_is_pte_entry(entry)) ||
 	    (pe_size == PE_SIZE_PMD && !dax_is_pmd_entry(entry))) {
 		put_unlocked_mapping_entry(mapping, index, entry);
-		spin_unlock_irq(&mapping->tree_lock);
+		xa_unlock_irq(&mapping->pages);
 		trace_dax_insert_pfn_mkwrite_no_entry(mapping->host, vmf,
 						      VM_FAULT_NOPAGE);
 		return VM_FAULT_NOPAGE;
 	}
-	radix_tree_tag_set(&mapping->page_tree, index, PAGECACHE_TAG_DIRTY);
+	radix_tree_tag_set(&mapping->pages, index, PAGECACHE_TAG_DIRTY);
 	entry = lock_slot(mapping, slot);
-	spin_unlock_irq(&mapping->tree_lock);
+	xa_unlock_irq(&mapping->pages);
 	switch (pe_size) {
 	case PE_SIZE_PTE:
 		error = vm_insert_mixed_mkwrite(vmf->vma, vmf->address, pfn);
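
For reference, here is a stand-alone sketch of the entry-lock scheme that
slot_locked()/lock_slot()/unlock_slot() above implement: the lock is a single
flag bit kept inside the value entry itself.  The DAX_ENTRY_LOCK bit position
and the lock_entry()/unlock_entry() helper names are assumptions made purely
for illustration; the real DAX_* definitions live earlier in this patch, and
xa_mk_value()/xa_to_value() are the helpers added in include/linux/xarray.h
further down.

#include <assert.h>

static void *xa_mk_value(unsigned long v) { return (void *)((v << 1) | 1); }
static unsigned long xa_to_value(void *e) { return (unsigned long)e >> 1; }

#define DAX_ENTRY_LOCK	(1UL << 0)	/* assumed bit position */

/* Return a copy of @entry with the lock bit set */
static void *lock_entry(void *entry)
{
	return xa_mk_value(xa_to_value(entry) | DAX_ENTRY_LOCK);
}

/* Return a copy of @entry with the lock bit cleared */
static void *unlock_entry(void *entry)
{
	return xa_mk_value(xa_to_value(entry) & ~DAX_ENTRY_LOCK);
}

int main(void)
{
	void *entry = xa_mk_value(0x1000);	/* some unlocked DAX entry */
	void *locked = lock_entry(entry);

	assert(xa_to_value(locked) & DAX_ENTRY_LOCK);
	assert(unlock_entry(locked) == entry);
	return 0;
}
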
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 339e4c1c044d..fadc6dbe17d6 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -553,7 +553,7 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr,
 		if (!page)
 			return;
 
-		if (radix_tree_exceptional_entry(page))
+		if (xa_is_value(page))
 			mss->swap += PAGE_SIZE;
 		else
 			put_page(page);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 265340149265..a5c105d292a7 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -389,23 +389,41 @@ int pagecache_write_end(struct file *, struct address_space *mapping,
 				loff_t pos, unsigned len, unsigned copied,
 				struct page *page, void *fsdata);
 
+/**
+ * struct address_space - Contents of a cacheable, mappable object
+ *
+ * @host: Owner, either the inode or the block_device
+ * @pages: Cached pages
+ * @gfp_mask: Memory allocation flags to use for allocating pages
+ * @i_mmap_writable: count VM_SHARED mappings
+ * @i_mmap: tree of private and shared mappings
+ * @i_mmap_rwsem: Protects @i_mmap and @i_mmap_writable
+ * @nrpages: Number of total pages, protected by pages.xa_lock
+ * @nrexceptional: Shadow or DAX entries, protected by pages.xa_lock
+ * @writeback_index: writeback starts here
+ * @a_ops: methods
+ * @flags: Error bits and flags (AS_*)
+ * @wb_err: The most recent error which has occurred
+ * @private_lock: For use by the owner of the address_space
+ * @private_list: For use by the owner of the address space
+ * @private_data: For use by the owner of the address space
+ */
 struct address_space {
-	struct inode		*host;		/* owner: inode, block_device */
-	struct radix_tree_root	pages;		/* cached pages */
-	gfp_t			gfp_mask;	/* for allocating pages */
-	atomic_t		i_mmap_writable;/* count VM_SHARED mappings */
-	struct rb_root_cached	i_mmap;		/* tree of private and shared mappings */
-	struct rw_semaphore	i_mmap_rwsem;	/* protect tree, count, list */
-	/* Protected by pages.xa_lock */
-	unsigned long		nrpages;	/* number of total pages */
-	unsigned long		nrexceptional;	/* shadow or DAX entries */
-	pgoff_t			writeback_index;/* writeback starts here */
-	const struct address_space_operations *a_ops;	/* methods */
-	unsigned long		flags;		/* error bits */
+	struct inode		*host;
+	struct radix_tree_root	pages;
+	gfp_t			gfp_mask;
+	atomic_t		i_mmap_writable;
+	struct rb_root_cached	i_mmap;
+	struct rw_semaphore	i_mmap_rwsem;
+	unsigned long		nrpages;
+	unsigned long		nrexceptional;
+	pgoff_t			writeback_index;
+	const struct address_space_operations *a_ops;
+	unsigned long		flags;
 	errseq_t		wb_err;
-	spinlock_t		private_lock;	/* for use by the address_space */
-	struct list_head	private_list;	/* ditto */
-	void			*private_data;	/* ditto */
+	spinlock_t		private_lock;
+	struct list_head	private_list;
+	void			*private_data;
 } __attribute__((aligned(sizeof(long)))) __randomize_layout;
 	/*
 	 * On most architectures that alignment is already the case; but
diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index b4d60295decd..ee9aad11d472 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -28,34 +28,26 @@
 #include <linux/rcupdate.h>
 #include <linux/spinlock.h>
 #include <linux/types.h>
+#include <linux/xarray.h>
 
 /*
  * The bottom two bits of the slot determine how the remaining bits in the
  * slot are interpreted:
  *
  * 00 - data pointer
- * 01 - internal entry
- * 10 - exceptional entry
- * 11 - this bit combination is currently unused/reserved
+ * 10 - internal entry
+ * x1 - inline data entry
  *
  * The internal entry may be a pointer to the next level in the tree, a
  * sibling entry, or an indicator that the entry in this slot has been moved
  * to another location in the tree and the lookup should be restarted.  While
  * NULL fits the 'data pointer' pattern, it means that there is no entry in
  * the tree for this index (no matter what level of the tree it is found at).
- * This means that you cannot store NULL in the tree as a value for the index.
+ * This means that storing NULL in the tree as a value for the index is the
+ * same as deleting the index from the tree.
  */
 #define RADIX_TREE_ENTRY_MASK		3UL
-#define RADIX_TREE_INTERNAL_NODE	1UL
-
-/*
- * Most users of the radix tree store pointers but shmem/tmpfs stores swap
- * entries in the same tree.  They are marked as exceptional entries to
- * distinguish them from pointers to struct page.
- * EXCEPTIONAL_ENTRY tests the bit, EXCEPTIONAL_SHIFT shifts content past it.
- */
-#define RADIX_TREE_EXCEPTIONAL_ENTRY	2
-#define RADIX_TREE_EXCEPTIONAL_SHIFT	2
+#define RADIX_TREE_INTERNAL_NODE	2UL
 
 static inline bool radix_tree_is_internal_node(void *ptr)
 {
@@ -83,11 +75,10 @@ static inline bool radix_tree_is_internal_node(void *ptr)
 
 /*
  * @count is the count of every non-NULL element in the ->slots array
- * whether that is an exceptional entry, a retry entry, a user pointer,
+ * whether that is a data entry, a retry entry, a user pointer,
  * a sibling entry or a pointer to the next level of the tree.
  * @exceptional is the count of every element in ->slots which is
- * either radix_tree_exceptional_entry() or is a sibling entry for an
- * exceptional entry.
+ * either a data entry or a sibling entry for data.
  */
 struct radix_tree_node {
 	unsigned char	shift;		/* Bits remaining in each slot */
@@ -267,17 +258,6 @@ static inline int radix_tree_deref_retry(void *arg)
 	return unlikely(radix_tree_is_internal_node(arg));
 }
 
-/**
- * radix_tree_exceptional_entry	- radix_tree_deref_slot gave exceptional entry?
- * @arg:	value returned by radix_tree_deref_slot
- * Returns:	0 if well-aligned pointer, non-0 if exceptional entry.
- */
-static inline int radix_tree_exceptional_entry(void *arg)
-{
-	/* Not unlikely because radix_tree_exception often tested first */
-	return (unsigned long)arg & RADIX_TREE_EXCEPTIONAL_ENTRY;
-}
-
 /**
  * radix_tree_exception	- radix_tree_deref_slot returned either exception?
  * @arg:	value returned by radix_tree_deref_slot
diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index 9c5a2628d6ce..5e93c7b500da 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -17,9 +17,8 @@
  *
  * swp_entry_t's are *never* stored anywhere in their arch-dependent format.
  */
-#define SWP_TYPE_SHIFT(e)	((sizeof(e.val) * 8) - \
-			(MAX_SWAPFILES_SHIFT + RADIX_TREE_EXCEPTIONAL_SHIFT))
-#define SWP_OFFSET_MASK(e)	((1UL << SWP_TYPE_SHIFT(e)) - 1)
+#define SWP_TYPE_SHIFT	(BITS_PER_XA_VALUE - MAX_SWAPFILES_SHIFT)
+#define SWP_OFFSET_MASK	((1UL << SWP_TYPE_SHIFT) - 1)
 
 /*
  * Store a type+offset into a swp_entry_t in an arch-independent format
@@ -28,8 +27,7 @@ static inline swp_entry_t swp_entry(unsigned long type, pgoff_t offset)
 {
 	swp_entry_t ret;
 
-	ret.val = (type << SWP_TYPE_SHIFT(ret)) |
-			(offset & SWP_OFFSET_MASK(ret));
+	ret.val = (type << SWP_TYPE_SHIFT) | (offset & SWP_OFFSET_MASK);
 	return ret;
 }
 
@@ -39,7 +37,7 @@ static inline swp_entry_t swp_entry(unsigned long type, pgoff_t offset)
  */
 static inline unsigned swp_type(swp_entry_t entry)
 {
-	return (entry.val >> SWP_TYPE_SHIFT(entry));
+	return (entry.val >> SWP_TYPE_SHIFT);
 }
 
 /*
@@ -48,7 +46,7 @@ static inline unsigned swp_type(swp_entry_t entry)
  */
 static inline pgoff_t swp_offset(swp_entry_t entry)
 {
-	return entry.val & SWP_OFFSET_MASK(entry);
+	return entry.val & SWP_OFFSET_MASK;
 }
 
 #ifdef CONFIG_MMU
@@ -89,16 +87,13 @@ static inline swp_entry_t radix_to_swp_entry(void *arg)
 {
 	swp_entry_t entry;
 
-	entry.val = (unsigned long)arg >> RADIX_TREE_EXCEPTIONAL_SHIFT;
+	entry.val = xa_to_value(arg);
 	return entry;
 }
 
 static inline void *swp_to_radix_entry(swp_entry_t entry)
 {
-	unsigned long value;
-
-	value = entry.val << RADIX_TREE_EXCEPTIONAL_SHIFT;
-	return (void *)(value | RADIX_TREE_EXCEPTIONAL_ENTRY);
+	return xa_mk_value(entry.val);
 }
 
 #if IS_ENABLED(CONFIG_DEVICE_PRIVATE)
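
Since the swap entry layout is now arch-independent and only reserves the
single value-entry tag bit instead of the two exceptional-entry bits, a quick
user-space sketch of the packing may help review the arithmetic.
MAX_SWAPFILES_SHIFT is assumed to be 5 here; everything else follows the
definitions in the hunk above, and the result always fits in
BITS_PER_XA_VALUE bits so swp_to_radix_entry() can hand it to xa_mk_value().

#include <assert.h>
#include <stdio.h>

#define BITS_PER_LONG		(sizeof(long) * 8)
#define BITS_PER_XA_VALUE	(BITS_PER_LONG - 1)
#define MAX_SWAPFILES_SHIFT	5		/* assumption for this sketch */

#define SWP_TYPE_SHIFT	(BITS_PER_XA_VALUE - MAX_SWAPFILES_SHIFT)
#define SWP_OFFSET_MASK	((1UL << SWP_TYPE_SHIFT) - 1)

typedef struct { unsigned long val; } swp_entry_t;

static swp_entry_t swp_entry(unsigned long type, unsigned long offset)
{
	swp_entry_t ret;

	/* type in the top bits, offset in the low SWP_TYPE_SHIFT bits */
	ret.val = (type << SWP_TYPE_SHIFT) | (offset & SWP_OFFSET_MASK);
	return ret;
}

static unsigned swp_type(swp_entry_t entry)
{
	return entry.val >> SWP_TYPE_SHIFT;
}

static unsigned long swp_offset(swp_entry_t entry)
{
	return entry.val & SWP_OFFSET_MASK;
}

int main(void)
{
	swp_entry_t e = swp_entry(3, 0x1234);

	assert(swp_type(e) == 3);
	assert(swp_offset(e) == 0x1234);
	assert((e.val >> BITS_PER_XA_VALUE) == 0);	/* fits in a value entry */
	printf("type=%u offset=%#lx raw=%#lx\n", swp_type(e), swp_offset(e), e.val);
	return 0;
}
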
diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index a5a933925b85..b1da8021b5fa 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -16,7 +16,73 @@
  * GNU General Public License for more details.
  */
 
+/**
+ * An eXtensible Array is an array of entries indexed by an unsigned
+ * long.  The XArray takes care of its own locking, using an irqsafe
+ * spinlock for operations that modify the XArray and RCU for operations
+ * which only read from the XArray.
+ *
+ * The XArray can store pointers which are aligned to a multiple of 4.
+ * This includes all objects allocated via SLAB, but you cannot store
+ * pointers to an arbitrary offset within an object.  The XArray can also
+ * store data values between 0 and LONG_MAX by converting them into entries
+ * using xa_mk_value().  The XArray does not support storing IS_ERR()
+ * pointers; some conflict with data values and others conflict with
+ * entries the XArray uses for its own purposes.
+ *
+ * A freshly initialised XArray is full of NULL pointers.  You can set the
+ * entry at any index by calling xa_store(), and get the value by calling
+ * xa_load().  There is no difference between an entry which has never been
+ * stored to and an entry which has most recently had NULL stored to it.  You
+ * can conditionally update the value of an entry by calling xa_cmpxchg().
+ * Each entry which isn't NULL can be tagged with up to three bits of extra
+ * information, accessed through xa_get_tag(), xa_set_tag() and
+ * xa_clear_tag().  You can copy batches of entries out of the array by
+ * calling xa_get_entries() or xa_get_tagged().  You can iterate over
+ * non-NULL entries in the array by calling xa_find(), xa_next() or
+ * using the xa_for_each() iterator.
+ *
+ * There are two levels of API provided.  Normal and Advanced.
+ * The advanced API is more flexible but has fewer safeguards.
+ */
 #include <linux/spinlock.h>
+#include <linux/types.h>
+
+#define BITS_PER_XA_VALUE	(BITS_PER_LONG - 1)
+
+/**
+ * xa_mk_value() - Create an XArray entry from a data value.
+ * @v: Value to store in XArray.
+ *
+ * Return: An entry suitable for storing in the XArray.
+ */
+static inline void *xa_mk_value(unsigned long v)
+{
+	WARN_ON((long)v < 0);
+	return (void *)((v << 1) | 1);
+}
+
+/**
+ * xa_to_value() - Get value stored in an XArray entry.
+ * @entry: XArray entry.
+ *
+ * Return: The value stored in the XArray entry.
+ */
+static inline unsigned long xa_to_value(void *entry)
+{
+	return (unsigned long)entry >> 1;
+}
+
+/**
+ * xa_is_value() - Determine if an entry is a data value.
+ * @entry: XArray entry.
+ *
+ * Return: True if the entry is a data value, false if it is a pointer.
+ */
+static inline bool xa_is_value(void *entry)
+{
+	return (unsigned long)entry & 1;
+}
 
 #define xa_trylock(xa)		spin_trylock(&(xa)->xa_lock)
 #define xa_lock(xa)		spin_lock(&(xa)->xa_lock)
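
To make the encoding above concrete, here is a minimal user-space round trip
through the three helpers; the bodies are copied from the hunk above, with a
plain assert() standing in for WARN_ON() so it builds outside the kernel.

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static inline void *xa_mk_value(unsigned long v)
{
	assert((long)v >= 0);		/* values are limited to BITS_PER_XA_VALUE bits */
	return (void *)((v << 1) | 1);	/* shift up, set bit 0 to mark a value */
}

static inline unsigned long xa_to_value(void *entry)
{
	return (unsigned long)entry >> 1;
}

static inline bool xa_is_value(void *entry)
{
	return (unsigned long)entry & 1;
}

int main(void)
{
	struct { int x; } obj, *ptr = &obj;
	void *entry = xa_mk_value(42);

	assert(xa_is_value(entry));		/* tagged entries are recognised */
	assert(xa_to_value(entry) == 42);	/* the round trip preserves the value */
	assert(!xa_is_value(ptr));		/* aligned pointers keep bit 0 clear */
	printf("42 encodes as %p\n", entry);
	return 0;
}
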
diff --git a/lib/idr.c b/lib/idr.c
index 35678388e134..afcdff365037 100644
--- a/lib/idr.c
+++ b/lib/idr.c
@@ -3,6 +3,7 @@
 #include <linux/idr.h>
 #include <linux/slab.h>
 #include <linux/spinlock.h>
+#include <linux/xarray.h>
 
 DEFINE_PER_CPU(struct ida_bitmap *, ida_bitmap);
 static DEFINE_SPINLOCK(simple_ida_lock);
@@ -271,11 +272,8 @@ EXPORT_SYMBOL(idr_replace);
  * by the number of bits in the leaf bitmap before doing a radix tree lookup.
  *
  * As an optimisation, if there are only a few low bits set in any given
- * leaf, instead of allocating a 128-byte bitmap, we use the 'exceptional
- * entry' functionality of the radix tree to store BITS_PER_LONG - 2 bits
- * directly in the entry.  By being really tricksy, we could store
- * BITS_PER_LONG - 1 bits, but there're diminishing returns after optimising
- * for 0-3 allocated IDs.
+ * leaf, instead of allocating a 128-byte bitmap, we store the data
+ * directly in the entry.
  *
  * We allow the radix tree 'exceptional' count to get out of date.  Nothing
  * in the IDA nor the radix tree code checks it.  If it becomes important
@@ -317,12 +315,11 @@ int ida_get_new_above(struct ida *ida, int start, int *id)
 	struct radix_tree_iter iter;
 	struct ida_bitmap *bitmap;
 	unsigned long index;
-	unsigned bit, ebit;
+	unsigned bit;
 	int new;
 
 	index = start / IDA_BITMAP_BITS;
 	bit = start % IDA_BITMAP_BITS;
-	ebit = bit + RADIX_TREE_EXCEPTIONAL_SHIFT;
 
 	slot = radix_tree_iter_init(&iter, index);
 	for (;;) {
@@ -337,26 +334,25 @@ int ida_get_new_above(struct ida *ida, int start, int *id)
 				return PTR_ERR(slot);
 			}
 		}
-		if (iter.index > index) {
+		if (iter.index > index)
 			bit = 0;
-			ebit = RADIX_TREE_EXCEPTIONAL_SHIFT;
-		}
 		new = iter.index * IDA_BITMAP_BITS;
 		bitmap = rcu_dereference_raw(*slot);
-		if (radix_tree_exception(bitmap)) {
-			unsigned long tmp = (unsigned long)bitmap;
-			ebit = find_next_zero_bit(&tmp, BITS_PER_LONG, ebit);
-			if (ebit < BITS_PER_LONG) {
-				tmp |= 1UL << ebit;
-				rcu_assign_pointer(*slot, (void *)tmp);
-				*id = new + ebit - RADIX_TREE_EXCEPTIONAL_SHIFT;
+		if (xa_is_value(bitmap)) {
+			unsigned long tmp = xa_to_value(bitmap);
+			int vbit = find_next_zero_bit(&tmp, BITS_PER_XA_VALUE,
+							bit);
+			if (vbit < BITS_PER_XA_VALUE) {
+				tmp |= 1UL << vbit;
+				rcu_assign_pointer(*slot, xa_mk_value(tmp));
+				*id = new + vbit;
 				return 0;
 			}
 			bitmap = this_cpu_xchg(ida_bitmap, NULL);
 			if (!bitmap)
 				return -EAGAIN;
 			memset(bitmap, 0, sizeof(*bitmap));
-			bitmap->bitmap[0] = tmp >> RADIX_TREE_EXCEPTIONAL_SHIFT;
+			bitmap->bitmap[0] = tmp;
 			rcu_assign_pointer(*slot, bitmap);
 		}
 
@@ -377,19 +373,15 @@ int ida_get_new_above(struct ida *ida, int start, int *id)
 			new += bit;
 			if (new < 0)
 				return -ENOSPC;
-			if (ebit < BITS_PER_LONG) {
-				bitmap = (void *)((1UL << ebit) |
-						RADIX_TREE_EXCEPTIONAL_ENTRY);
-				radix_tree_iter_replace(root, &iter, slot,
-						bitmap);
-				*id = new;
-				return 0;
+			if (bit < BITS_PER_XA_VALUE) {
+				bitmap = xa_mk_value(1UL << bit);
+			} else {
+				bitmap = this_cpu_xchg(ida_bitmap, NULL);
+				if (!bitmap)
+					return -EAGAIN;
+				memset(bitmap, 0, sizeof(*bitmap));
+				__set_bit(bit, bitmap->bitmap);
 			}
-			bitmap = this_cpu_xchg(ida_bitmap, NULL);
-			if (!bitmap)
-				return -EAGAIN;
-			memset(bitmap, 0, sizeof(*bitmap));
-			__set_bit(bit, bitmap->bitmap);
 			radix_tree_iter_replace(root, &iter, slot, bitmap);
 		}
 
@@ -420,9 +412,9 @@ void ida_remove(struct ida *ida, int id)
 		goto err;
 
 	bitmap = rcu_dereference_raw(*slot);
-	if (radix_tree_exception(bitmap)) {
+	if (xa_is_value(bitmap)) {
 		btmp = (unsigned long *)slot;
-		offset += RADIX_TREE_EXCEPTIONAL_SHIFT;
+		offset += 1; /* Intimate knowledge of the xa_mk_value() encoding */
 		if (offset >= BITS_PER_LONG)
 			goto err;
 	} else {
@@ -433,9 +425,8 @@ void ida_remove(struct ida *ida, int id)
 
 	__clear_bit(offset, btmp);
 	radix_tree_iter_tag_set(&ida->ida_rt, &iter, IDR_FREE);
-	if (radix_tree_exception(bitmap)) {
-		if (rcu_dereference_raw(*slot) ==
-					(void *)RADIX_TREE_EXCEPTIONAL_ENTRY)
+	if (xa_is_value(bitmap)) {
+		if (xa_to_value(rcu_dereference_raw(*slot)) == 0)
 			radix_tree_iter_delete(&ida->ida_rt, &iter, slot);
 	} else if (bitmap_empty(btmp, IDA_BITMAP_BITS)) {
 		kfree(bitmap);
@@ -463,7 +454,7 @@ void ida_destroy(struct ida *ida)
 
 	radix_tree_for_each_slot(slot, &ida->ida_rt, &iter, 0) {
 		struct ida_bitmap *bitmap = rcu_dereference_raw(*slot);
-		if (!radix_tree_exception(bitmap))
+		if (!xa_is_value(bitmap))
 			kfree(bitmap);
 		radix_tree_iter_delete(&ida->ida_rt, &iter, slot);
 	}
diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index 6d29ca4c8db0..30e49b89aa3b 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -339,14 +339,12 @@ static void dump_ida_node(void *entry, unsigned long index)
 		for (i = 0; i < RADIX_TREE_MAP_SIZE; i++)
 			dump_ida_node(node->slots[i],
 					index | (i << node->shift));
-	} else if (radix_tree_exceptional_entry(entry)) {
+	} else if (xa_is_value(entry)) {
 		pr_debug("ida excp: %p offset %d indices %lu-%lu data %lx\n",
 				entry, (int)(index & RADIX_TREE_MAP_MASK),
 				index * IDA_BITMAP_BITS,
-				index * IDA_BITMAP_BITS + BITS_PER_LONG -
-					RADIX_TREE_EXCEPTIONAL_SHIFT,
-				(unsigned long)entry >>
-					RADIX_TREE_EXCEPTIONAL_SHIFT);
+				index * IDA_BITMAP_BITS + BITS_PER_XA_VALUE,
+				xa_to_value(entry));
 	} else {
 		struct ida_bitmap *bitmap = entry;
 
@@ -655,7 +653,7 @@ static int radix_tree_extend(struct radix_tree_root *root, gfp_t gfp,
 		BUG_ON(shift > BITS_PER_LONG);
 		if (radix_tree_is_internal_node(entry)) {
 			entry_to_node(entry)->parent = node;
-		} else if (radix_tree_exceptional_entry(entry)) {
+		} else if (xa_is_value(entry)) {
 			/* Moving an exceptional root->rnode to a node */
 			node->exceptional = 1;
 		}
@@ -946,12 +944,12 @@ static inline int insert_entries(struct radix_tree_node *node,
 					!is_sibling_entry(node, old) &&
 					(old != RADIX_TREE_RETRY))
 			radix_tree_free_nodes(old);
-		if (radix_tree_exceptional_entry(old))
+		if (xa_is_value(old))
 			node->exceptional--;
 	}
 	if (node) {
 		node->count += n;
-		if (radix_tree_exceptional_entry(item))
+		if (xa_is_value(item))
 			node->exceptional += n;
 	}
 	return n;
@@ -965,7 +963,7 @@ static inline int insert_entries(struct radix_tree_node *node,
 	rcu_assign_pointer(*slot, item);
 	if (node) {
 		node->count++;
-		if (radix_tree_exceptional_entry(item))
+		if (xa_is_value(item))
 			node->exceptional++;
 	}
 	return 1;
@@ -1182,8 +1180,7 @@ void __radix_tree_replace(struct radix_tree_root *root,
 			  radix_tree_update_node_t update_node)
 {
 	void *old = rcu_dereference_raw(*slot);
-	int exceptional = !!radix_tree_exceptional_entry(item) -
-				!!radix_tree_exceptional_entry(old);
+	int exceptional = !!xa_is_value(item) - !!xa_is_value(old);
 	int count = calculate_count(root, node, slot, item, old);
 
 	/*
@@ -1986,7 +1983,7 @@ static bool __radix_tree_delete(struct radix_tree_root *root,
 				struct radix_tree_node *node, void __rcu **slot)
 {
 	void *old = rcu_dereference_raw(*slot);
-	int exceptional = radix_tree_exceptional_entry(old) ? -1 : 0;
+	int exceptional = xa_is_value(old) ? -1 : 0;
 	unsigned offset = get_slot_offset(node, slot);
 	int tag;
 
diff --git a/mm/filemap.c b/mm/filemap.c
index 5c8f22fe4e62..1d012dd3629e 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -127,7 +127,7 @@ static int page_cache_tree_insert(struct address_space *mapping,
 		void *p;
 
 		p = radix_tree_deref_slot_protected(slot, &mapping->pages.xa_lock);
-		if (!radix_tree_exceptional_entry(p))
+		if (!xa_is_value(p))
 			return -EEXIST;
 
 		mapping->nrexceptional--;
@@ -336,7 +336,7 @@ page_cache_tree_delete_batch(struct address_space *mapping,
 			break;
 		page = radix_tree_deref_slot_protected(slot,
 						       &mapping->pages.xa_lock);
-		if (radix_tree_exceptional_entry(page))
+		if (xa_is_value(page))
 			continue;
 		if (!tail_pages) {
 			/*
@@ -1355,7 +1355,7 @@ pgoff_t page_cache_next_hole(struct address_space *mapping,
 		struct page *page;
 
 		page = radix_tree_lookup(&mapping->pages, index);
-		if (!page || radix_tree_exceptional_entry(page))
+		if (!page || xa_is_value(page))
 			break;
 		index++;
 		if (index == 0)
@@ -1396,7 +1396,7 @@ pgoff_t page_cache_prev_hole(struct address_space *mapping,
 		struct page *page;
 
 		page = radix_tree_lookup(&mapping->pages, index);
-		if (!page || radix_tree_exceptional_entry(page))
+		if (!page || xa_is_value(page))
 			break;
 		index--;
 		if (index == ULONG_MAX)
@@ -1539,7 +1539,7 @@ struct page *pagecache_get_page(struct address_space *mapping, pgoff_t offset,
 
 repeat:
 	page = find_get_entry(mapping, offset);
-	if (radix_tree_exceptional_entry(page))
+	if (xa_is_value(page))
 		page = NULL;
 	if (!page)
 		goto no_page;
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index cb4d199bf328..55ade70c33bb 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1363,7 +1363,7 @@ static void collapse_shmem(struct mm_struct *mm,
 
 		page = radix_tree_deref_slot_protected(slot,
 				&mapping->pages.xa_lock);
-		if (radix_tree_exceptional_entry(page) || !PageUptodate(page)) {
+		if (xa_is_value(page) || !PageUptodate(page)) {
 			xa_unlock_irq(&mapping->pages);
 			/* swap in or instantiate fallocated page */
 			if (shmem_getpage(mapping->host, index, &page,
diff --git a/mm/madvise.c b/mm/madvise.c
index 375cf32087e4..85f5d4f66cdd 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -251,7 +251,7 @@ static void force_shm_swapin_readahead(struct vm_area_struct *vma,
 		index = ((start - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
 
 		page = find_get_entry(mapping, index);
-		if (!radix_tree_exceptional_entry(page)) {
+		if (!xa_is_value(page)) {
 			if (page)
 				put_page(page);
 			continue;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 953dfed9e780..1b8dc671e32c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4526,7 +4526,7 @@ static struct page *mc_handle_file_pte(struct vm_area_struct *vma,
 	/* shmem/tmpfs may report page out on swap: account for that too. */
 	if (shmem_mapping(mapping)) {
 		page = find_get_entry(mapping, pgoff);
-		if (radix_tree_exceptional_entry(page)) {
+		if (xa_is_value(page)) {
 			swp_entry_t swp = radix_to_swp_entry(page);
 			if (do_memsw_account())
 				*entry = swp;
diff --git a/mm/mincore.c b/mm/mincore.c
index fc37afe226e6..4985965aa20a 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -66,7 +66,7 @@ static unsigned char mincore_page(struct address_space *mapping, pgoff_t pgoff)
 		 * shmem/tmpfs may return swap: account for swapcache
 		 * page too.
 		 */
-		if (radix_tree_exceptional_entry(page)) {
+		if (xa_is_value(page)) {
 			swp_entry_t swp = radix_to_swp_entry(page);
 			page = find_get_page(swap_address_space(swp),
 					     swp_offset(swp));
diff --git a/mm/readahead.c b/mm/readahead.c
index 514188fd2489..4851f002605f 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -177,7 +177,7 @@ int __do_page_cache_readahead(struct address_space *mapping, struct file *filp,
 		rcu_read_lock();
 		page = radix_tree_lookup(&mapping->pages, page_offset);
 		rcu_read_unlock();
-		if (page && !radix_tree_exceptional_entry(page))
+		if (page && !xa_is_value(page))
 			continue;
 
 		page = __page_cache_alloc(gfp_mask);
diff --git a/mm/shmem.c b/mm/shmem.c
index b1396ed0c9b1..21bf42f14ee2 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -690,7 +690,7 @@ unsigned long shmem_partial_swap_usage(struct address_space *mapping,
 			continue;
 		}
 
-		if (radix_tree_exceptional_entry(page))
+		if (xa_is_value(page))
 			swapped++;
 
 		if (need_resched()) {
@@ -805,7 +805,7 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 			if (index >= end)
 				break;
 
-			if (radix_tree_exceptional_entry(page)) {
+			if (xa_is_value(page)) {
 				if (unfalloc)
 					continue;
 				nr_swaps_freed += !shmem_free_swap(mapping,
@@ -902,7 +902,7 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 			if (index >= end)
 				break;
 
-			if (radix_tree_exceptional_entry(page)) {
+			if (xa_is_value(page)) {
 				if (unfalloc)
 					continue;
 				if (shmem_free_swap(mapping, index, page)) {
@@ -1614,7 +1614,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
 repeat:
 	swap.val = 0;
 	page = find_lock_entry(mapping, index);
-	if (radix_tree_exceptional_entry(page)) {
+	if (xa_is_value(page)) {
 		swap = radix_to_swp_entry(page);
 		page = NULL;
 	}
@@ -2547,7 +2547,7 @@ static pgoff_t shmem_seek_hole_data(struct address_space *mapping,
 				index = indices[i];
 			}
 			page = pvec.pages[i];
-			if (page && !radix_tree_exceptional_entry(page)) {
+			if (page && !xa_is_value(page)) {
 				if (!PageUptodate(page))
 					page = NULL;
 			}
diff --git a/mm/swap.c b/mm/swap.c
index 38e1b6374a97..8d7773cb2c3f 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -953,7 +953,7 @@ void pagevec_remove_exceptionals(struct pagevec *pvec)
 
 	for (i = 0, j = 0; i < pagevec_count(pvec); i++) {
 		struct page *page = pvec->pages[i];
-		if (!radix_tree_exceptional_entry(page))
+		if (!xa_is_value(page))
 			pvec->pages[j++] = page;
 	}
 	pvec->nr = j;
diff --git a/mm/truncate.c b/mm/truncate.c
index 094158f2e447..69bb743dd7e5 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -70,7 +70,7 @@ static void truncate_exceptional_pvec_entries(struct address_space *mapping,
 		return;
 
 	for (j = 0; j < pagevec_count(pvec); j++)
-		if (radix_tree_exceptional_entry(pvec->pages[j]))
+		if (xa_is_value(pvec->pages[j]))
 			break;
 
 	if (j == pagevec_count(pvec))
@@ -85,7 +85,7 @@ static void truncate_exceptional_pvec_entries(struct address_space *mapping,
 		struct page *page = pvec->pages[i];
 		pgoff_t index = indices[i];
 
-		if (!radix_tree_exceptional_entry(page)) {
+		if (!xa_is_value(page)) {
 			pvec->pages[j++] = page;
 			continue;
 		}
@@ -351,7 +351,7 @@ void truncate_inode_pages_range(struct address_space *mapping,
 			if (index >= end)
 				break;
 
-			if (radix_tree_exceptional_entry(page))
+			if (xa_is_value(page))
 				continue;
 
 			if (!trylock_page(page))
@@ -446,7 +446,7 @@ void truncate_inode_pages_range(struct address_space *mapping,
 				break;
 			}
 
-			if (radix_tree_exceptional_entry(page))
+			if (xa_is_value(page))
 				continue;
 
 			lock_page(page);
@@ -565,7 +565,7 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
 			if (index > end)
 				break;
 
-			if (radix_tree_exceptional_entry(page)) {
+			if (xa_is_value(page)) {
 				invalidate_exceptional_entry(mapping, index,
 							     page);
 				continue;
@@ -696,7 +696,7 @@ int invalidate_inode_pages2_range(struct address_space *mapping,
 			if (index > end)
 				break;
 
-			if (radix_tree_exceptional_entry(page)) {
+			if (xa_is_value(page)) {
 				if (!invalidate_exceptional_entry2(mapping,
 								   index, page))
 					ret = -EBUSY;
diff --git a/mm/workingset.c b/mm/workingset.c
index 2d071f0df3af..0a3465700d5f 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -155,8 +155,8 @@
  * refault distance will immediately activate the refaulting page.
  */
 
-#define EVICTION_SHIFT	(RADIX_TREE_EXCEPTIONAL_ENTRY + \
-			 NODES_SHIFT +	\
+#define EVICTION_SHIFT	((BITS_PER_LONG - BITS_PER_XA_VALUE) +	\
+			 NODES_SHIFT +				\
 			 MEM_CGROUP_ID_SHIFT)
 #define EVICTION_MASK	(~0UL >> EVICTION_SHIFT)
 
@@ -175,18 +175,16 @@ static void *pack_shadow(int memcgid, pg_data_t *pgdat, unsigned long eviction)
 	eviction >>= bucket_order;
 	eviction = (eviction << MEM_CGROUP_ID_SHIFT) | memcgid;
 	eviction = (eviction << NODES_SHIFT) | pgdat->node_id;
-	eviction = (eviction << RADIX_TREE_EXCEPTIONAL_SHIFT);
 
-	return (void *)(eviction | RADIX_TREE_EXCEPTIONAL_ENTRY);
+	return xa_mk_value(eviction);
 }
 
 static void unpack_shadow(void *shadow, int *memcgidp, pg_data_t **pgdat,
 			  unsigned long *evictionp)
 {
-	unsigned long entry = (unsigned long)shadow;
+	unsigned long entry = xa_to_value(shadow);
 	int memcgid, nid;
 
-	entry >>= RADIX_TREE_EXCEPTIONAL_SHIFT;
 	nid = entry & ((1UL << NODES_SHIFT) - 1);
 	entry >>= NODES_SHIFT;
 	memcgid = entry & ((1UL << MEM_CGROUP_ID_SHIFT) - 1);
@@ -453,7 +451,7 @@ static enum lru_status shadow_lru_isolate(struct list_head *item,
 		goto out_invalid;
 	for (i = 0; i < RADIX_TREE_MAP_SIZE; i++) {
 		if (node->slots[i]) {
-			if (WARN_ON_ONCE(!radix_tree_exceptional_entry(node->slots[i])))
+			if (WARN_ON_ONCE(!xa_is_value(node->slots[i])))
 				goto out_invalid;
 			if (WARN_ON_ONCE(!node->exceptional))
 				goto out_invalid;
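
The shadow entry packing above now gives up only the single bit that
xa_mk_value() needs, rather than the two bits of the old exceptional-entry
scheme.  A user-space sketch of the round trip, with NODES_SHIFT and
MEM_CGROUP_ID_SHIFT assumed to be 6 and 16, bucket_order taken as 0 and a
plain node id standing in for the pg_data_t pointer:

#include <assert.h>
#include <stdio.h>

#define BITS_PER_LONG		(sizeof(long) * 8)
#define BITS_PER_XA_VALUE	(BITS_PER_LONG - 1)
#define NODES_SHIFT		6	/* assumption */
#define MEM_CGROUP_ID_SHIFT	16	/* assumption */

#define EVICTION_SHIFT	((BITS_PER_LONG - BITS_PER_XA_VALUE) +	\
			 NODES_SHIFT +				\
			 MEM_CGROUP_ID_SHIFT)
#define EVICTION_MASK	(~0UL >> EVICTION_SHIFT)

static void *xa_mk_value(unsigned long v) { return (void *)((v << 1) | 1); }
static unsigned long xa_to_value(void *e) { return (unsigned long)e >> 1; }

static void *pack_shadow(int memcgid, int nid, unsigned long eviction)
{
	eviction = (eviction << MEM_CGROUP_ID_SHIFT) | memcgid;
	eviction = (eviction << NODES_SHIFT) | nid;
	return xa_mk_value(eviction);
}

static void unpack_shadow(void *shadow, int *memcgidp, int *nidp,
			  unsigned long *evictionp)
{
	unsigned long entry = xa_to_value(shadow);

	*nidp = entry & ((1UL << NODES_SHIFT) - 1);
	entry >>= NODES_SHIFT;
	*memcgidp = entry & ((1UL << MEM_CGROUP_ID_SHIFT) - 1);
	entry >>= MEM_CGROUP_ID_SHIFT;
	*evictionp = entry;
}

int main(void)
{
	unsigned long eviction = 12345UL & EVICTION_MASK, ev;
	int memcgid, nid;

	unpack_shadow(pack_shadow(7, 3, eviction), &memcgid, &nid, &ev);
	assert(memcgid == 7 && nid == 3 && ev == eviction);
	printf("shadow round trip ok, EVICTION_SHIFT=%d\n", (int)EVICTION_SHIFT);
	return 0;
}
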
diff --git a/tools/testing/radix-tree/idr-test.c b/tools/testing/radix-tree/idr-test.c
index 1dff94c15da5..1720bd90ece0 100644
--- a/tools/testing/radix-tree/idr-test.c
+++ b/tools/testing/radix-tree/idr-test.c
@@ -19,7 +19,7 @@
 
 #include "test.h"
 
-#define DUMMY_PTR	((void *)0x12)
+#define DUMMY_PTR	((void *)0x10)
 
 int item_idr_free(int id, void *p, void *data)
 {
@@ -332,11 +332,11 @@ void ida_check_conv(void)
 	for (i = 0; i < 1000000; i++) {
 		int err = ida_get_new(&ida, &id);
 		if (err == -EAGAIN) {
-			assert((i % IDA_BITMAP_BITS) == (BITS_PER_LONG - 2));
+			assert((i % IDA_BITMAP_BITS) == (BITS_PER_LONG - 1));
 			assert(ida_pre_get(&ida, GFP_KERNEL));
 			err = ida_get_new(&ida, &id);
 		} else {
-			assert((i % IDA_BITMAP_BITS) != (BITS_PER_LONG - 2));
+			assert((i % IDA_BITMAP_BITS) != (BITS_PER_LONG - 1));
 		}
 		assert(!err);
 		assert(id == i);
diff --git a/tools/testing/radix-tree/linux/radix-tree.h b/tools/testing/radix-tree/linux/radix-tree.h
index 24f13d27a8da..de3f655caca3 100644
--- a/tools/testing/radix-tree/linux/radix-tree.h
+++ b/tools/testing/radix-tree/linux/radix-tree.h
@@ -4,6 +4,7 @@
 
 #include "generated/map-shift.h"
 #include "../../../../include/linux/radix-tree.h"
+#include <linux/xarray.h>
 
 extern int kmalloc_verbose;
 extern int test_verbose;
diff --git a/tools/testing/radix-tree/multiorder.c b/tools/testing/radix-tree/multiorder.c
index 59245b3d587c..684e76f79f4a 100644
--- a/tools/testing/radix-tree/multiorder.c
+++ b/tools/testing/radix-tree/multiorder.c
@@ -38,12 +38,11 @@ static void __multiorder_tag_test(int index, int order)
 
 	/*
 	 * Verify we get collisions for covered indices.  We try and fail to
-	 * insert an exceptional entry so we don't leak memory via
+	 * insert a data entry so we don't leak memory via
 	 * item_insert_order().
 	 */
 	for_each_index(i, base, order) {
-		err = __radix_tree_insert(&tree, i, order,
-				(void *)(0xA0 | RADIX_TREE_EXCEPTIONAL_ENTRY));
+		err = __radix_tree_insert(&tree, i, order, xa_mk_value(0xA0));
 		assert(err == -EEXIST);
 	}
 
@@ -379,8 +378,8 @@ static void multiorder_join1(unsigned long index,
 }
 
 /*
- * Check that the accounting of exceptional entries is handled correctly
- * by joining an exceptional entry to a normal pointer.
+ * Check that the accounting of inline data entries is handled correctly
+ * by joining a data entry to a normal pointer.
  */
 static void multiorder_join2(unsigned order1, unsigned order2)
 {
@@ -390,9 +389,9 @@ static void multiorder_join2(unsigned order1, unsigned order2)
 	void *item2;
 
 	item_insert_order(&tree, 0, order2);
-	radix_tree_insert(&tree, 1 << order2, (void *)0x12UL);
+	radix_tree_insert(&tree, 1 << order2, xa_mk_value(5));
 	item2 = __radix_tree_lookup(&tree, 1 << order2, &node, NULL);
-	assert(item2 == (void *)0x12UL);
+	assert(item2 == xa_mk_value(5));
 	assert(node->exceptional == 1);
 
 	item2 = radix_tree_lookup(&tree, 0);
@@ -406,7 +405,7 @@ static void multiorder_join2(unsigned order1, unsigned order2)
 }
 
 /*
- * This test revealed an accounting bug for exceptional entries at one point.
+ * This test revealed an accounting bug for inline data entries at one point.
  * Nodes were being freed back into the pool with an elevated exception count
  * by radix_tree_join() and then radix_tree_split() was failing to zero the
  * count of exceptional entries.
@@ -420,16 +419,16 @@ static void multiorder_join3(unsigned int order)
 	unsigned long i;
 
 	for (i = 0; i < (1 << order); i++) {
-		radix_tree_insert(&tree, i, (void *)0x12UL);
+		radix_tree_insert(&tree, i, xa_mk_value(5));
 	}
 
-	radix_tree_join(&tree, 0, order, (void *)0x16UL);
+	radix_tree_join(&tree, 0, order, xa_mk_value(7));
 	rcu_barrier();
 
 	radix_tree_split(&tree, 0, 0);
 
 	radix_tree_for_each_slot(slot, &tree, &iter, 0) {
-		radix_tree_iter_replace(&tree, &iter, slot, (void *)0x12UL);
+		radix_tree_iter_replace(&tree, &iter, slot, xa_mk_value(5));
 	}
 
 	__radix_tree_lookup(&tree, 0, &node, NULL);
@@ -516,10 +515,10 @@ static void __multiorder_split2(int old_order, int new_order)
 	struct radix_tree_node *node;
 	void *item;
 
-	__radix_tree_insert(&tree, 0, old_order, (void *)0x12);
+	__radix_tree_insert(&tree, 0, old_order, xa_mk_value(5));
 
 	item = __radix_tree_lookup(&tree, 0, &node, NULL);
-	assert(item == (void *)0x12);
+	assert(item == xa_mk_value(5));
 	assert(node->exceptional > 0);
 
 	radix_tree_split(&tree, 0, new_order);
@@ -529,7 +528,7 @@ static void __multiorder_split2(int old_order, int new_order)
 	}
 
 	item = __radix_tree_lookup(&tree, 0, &node, NULL);
-	assert(item != (void *)0x12);
+	assert(item != xa_mk_value(5));
 	assert(node->exceptional == 0);
 
 	item_kill_tree(&tree);
@@ -543,40 +542,40 @@ static void __multiorder_split3(int old_order, int new_order)
 	struct radix_tree_node *node;
 	void *item;
 
-	__radix_tree_insert(&tree, 0, old_order, (void *)0x12);
+	__radix_tree_insert(&tree, 0, old_order, xa_mk_value(5));
 
 	item = __radix_tree_lookup(&tree, 0, &node, NULL);
-	assert(item == (void *)0x12);
+	assert(item == xa_mk_value(5));
 	assert(node->exceptional > 0);
 
 	radix_tree_split(&tree, 0, new_order);
 	radix_tree_for_each_slot(slot, &tree, &iter, 0) {
-		radix_tree_iter_replace(&tree, &iter, slot, (void *)0x16);
+		radix_tree_iter_replace(&tree, &iter, slot, xa_mk_value(7));
 	}
 
 	item = __radix_tree_lookup(&tree, 0, &node, NULL);
-	assert(item == (void *)0x16);
+	assert(item == xa_mk_value(7));
 	assert(node->exceptional > 0);
 
 	item_kill_tree(&tree);
 
-	__radix_tree_insert(&tree, 0, old_order, (void *)0x12);
+	__radix_tree_insert(&tree, 0, old_order, xa_mk_value(5));
 
 	item = __radix_tree_lookup(&tree, 0, &node, NULL);
-	assert(item == (void *)0x12);
+	assert(item == xa_mk_value(5));
 	assert(node->exceptional > 0);
 
 	radix_tree_split(&tree, 0, new_order);
 	radix_tree_for_each_slot(slot, &tree, &iter, 0) {
 		if (iter.index == (1 << new_order))
 			radix_tree_iter_replace(&tree, &iter, slot,
-						(void *)0x16);
+						xa_mk_value(7));
 		else
 			radix_tree_iter_replace(&tree, &iter, slot, NULL);
 	}
 
 	item = __radix_tree_lookup(&tree, 1 << new_order, &node, NULL);
-	assert(item == (void *)0x16);
+	assert(item == xa_mk_value(7));
 	assert(node->count == node->exceptional);
 	do {
 		node = node->parent;
@@ -609,13 +608,13 @@ static void multiorder_account(void)
 
 	item_insert_order(&tree, 0, 5);
 
-	__radix_tree_insert(&tree, 1 << 5, 5, (void *)0x12);
+	__radix_tree_insert(&tree, 1 << 5, 5, xa_mk_value(5));
 	__radix_tree_lookup(&tree, 0, &node, NULL);
 	assert(node->count == node->exceptional * 2);
 	radix_tree_delete(&tree, 1 << 5);
 	assert(node->exceptional == 0);
 
-	__radix_tree_insert(&tree, 1 << 5, 5, (void *)0x12);
+	__radix_tree_insert(&tree, 1 << 5, 5, xa_mk_value(5));
 	__radix_tree_lookup(&tree, 1 << 5, &node, &slot);
 	assert(node->count == node->exceptional * 2);
 	__radix_tree_replace(&tree, node, slot, NULL, NULL);
diff --git a/tools/testing/radix-tree/test.c b/tools/testing/radix-tree/test.c
index 5978ab1f403d..0d69c49177c6 100644
--- a/tools/testing/radix-tree/test.c
+++ b/tools/testing/radix-tree/test.c
@@ -276,7 +276,7 @@ void item_kill_tree(struct radix_tree_root *root)
 	int nfound;
 
 	radix_tree_for_each_slot(slot, root, &iter, 0) {
-		if (radix_tree_exceptional_entry(*slot))
+		if (xa_is_value(*slot))
 			radix_tree_delete(root, iter.index);
 	}
 
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 17/62] xarray: Change definition of sibling entries
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:06   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:06 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

Instead of storing a pointer to the slot containing the canonical entry,
store the offset of the slot.  Produces slightly more efficient code
(~300 bytes) and simplifies the implementation.
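
A minimal user-space sketch of the new encoding may help review: the helpers
below are copied from the include/linux/xarray.h hunk in this patch, assuming
XA_CHUNK_SHIFT of 6 (!CONFIG_BASE_SMALL) and CONFIG_RADIX_TREE_MULTIORDER
enabled.  A sibling slot now decodes straight to the offset of the canonical
slot instead of carrying a pointer into the node.

#include <assert.h>
#include <stdio.h>

#define XA_CHUNK_SHIFT	6			/* assumes !CONFIG_BASE_SMALL */
#define XA_CHUNK_SIZE	(1UL << XA_CHUNK_SHIFT)

static inline void *xa_mk_internal(unsigned long v)
{
	return (void *)((v << 2) | 2);		/* bottom bits 10 mark an internal entry */
}

static inline unsigned long xa_to_internal(void *entry)
{
	return (unsigned long)entry >> 2;
}

static inline int xa_is_internal(void *entry)
{
	return ((unsigned long)entry & 3) == 2;
}

static inline void *xa_mk_sibling(unsigned int offset)
{
	return xa_mk_internal(offset);
}

static inline unsigned long xa_to_sibling(void *entry)
{
	return xa_to_internal(entry);
}

static inline int xa_is_sibling(void *entry)
{
	return xa_is_internal(entry) &&
		(entry < xa_mk_sibling(XA_CHUNK_SIZE - 1));
}

int main(void)
{
	/* e.g. an order-2 entry canonical at slot 8 has siblings in slots 9-11 */
	void *sib = xa_mk_sibling(8);

	assert(xa_is_sibling(sib));
	assert(xa_to_sibling(sib) == 8);	/* decodes to the canonical offset */
	printf("sibling of slot 8 encodes as %p\n", sib);
	return 0;
}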

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 include/linux/xarray.h | 64 +++++++++++++++++++++++++++++++++++++++++++++++++
 lib/radix-tree.c       | 65 ++++++++++++++------------------------------------
 2 files changed, 82 insertions(+), 47 deletions(-)

diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index b1da8021b5fa..b9e0350b9e90 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -84,6 +84,8 @@ static inline bool xa_is_value(void *entry)
 	return (unsigned long)entry & 1;
 }
 
+/* Everything below here is the Advanced API.  Proceed with caution. */
+
 #define xa_trylock(xa)		spin_trylock(&(xa)->xa_lock)
 #define xa_lock(xa)		spin_lock(&(xa)->xa_lock)
 #define xa_unlock(xa)		spin_unlock(&(xa)->xa_lock)
@@ -97,4 +99,66 @@ static inline bool xa_is_value(void *entry)
 				spin_unlock_irqrestore(&(xa)->xa_lock, flags)
 #define xa_lock_held(xa)	lockdep_is_held(&(xa)->xa_lock)
 
+/*
+ * The xarray is constructed out of a set of 'chunks' of pointers.  Choosing
+ * the best chunk size requires some tradeoffs.  A power of two recommends
+ * itself so that we can walk the tree based purely on shifts and masks.
+ * Generally, the larger the better; as the number of slots per level of the
+ * tree increases, the less tall the tree needs to be.  But that needs to be
+ * balanced against the memory consumption of each node.  On a 64-bit system,
+ * xa_node is currently 576 bytes, and we get 7 of them per 4kB page.  If we
+ * doubled the number of slots per node, we'd get only 3 nodes per 4kB page.
+ */
+#ifndef XA_CHUNK_SHIFT
+#define XA_CHUNK_SHIFT		(CONFIG_BASE_SMALL ? 4 : 6)
+#endif
+#define XA_CHUNK_SIZE		(1UL << XA_CHUNK_SHIFT)
+#define XA_CHUNK_MASK		(XA_CHUNK_SIZE - 1)
+
+/*
+ * Internal entries have the bottom two bits set to the value 10b.  Most
+ * internal entries are pointers to the next node in the tree.  Since the
+ * kernel unmaps page 0 to trap NULL pointer dereferences, we can store up
+ * to 1024 distinct values in the tree.  Values 0-62 are used for sibling
+ * entries.  The retry entry is value 256.
+ */
+static inline void *xa_mk_internal(unsigned long v)
+{
+	return (void *)((v << 2) | 2);
+}
+
+static inline unsigned long xa_to_internal(void *entry)
+{
+	return (unsigned long)entry >> 2;
+}
+
+static inline bool xa_is_internal(void *entry)
+{
+	return ((unsigned long)entry & 3) == 2;
+}
+
+static inline bool xa_is_node(void *entry)
+{
+	return xa_is_internal(entry) && (unsigned long)entry > 4096;
+}
+
+static inline void *xa_mk_sibling(unsigned int offset)
+{
+	return xa_mk_internal(offset);
+}
+
+static inline unsigned long xa_to_sibling(void *entry)
+{
+	return xa_to_internal(entry);
+}
+
+static inline bool xa_is_sibling(void *entry)
+{
+	return IS_ENABLED(CONFIG_RADIX_TREE_MULTIORDER) &&
+		xa_is_internal(entry) &&
+		(entry < xa_mk_sibling(XA_CHUNK_SIZE - 1));
+}
+
+#define XA_RETRY_ENTRY		xa_mk_internal(256)
+
 #endif /* _LINUX_XARRAY_H */
diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index 30e49b89aa3b..4a1091e31932 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -37,6 +37,7 @@
 #include <linux/rcupdate.h>
 #include <linux/slab.h>
 #include <linux/string.h>
+#include <linux/xarray.h>
 
 
 /* Number of nodes in fully populated tree of given height */
@@ -97,24 +98,7 @@ static inline void *node_to_entry(void *ptr)
 	return (void *)((unsigned long)ptr | RADIX_TREE_INTERNAL_NODE);
 }
 
-#define RADIX_TREE_RETRY	node_to_entry(NULL)
-
-#ifdef CONFIG_RADIX_TREE_MULTIORDER
-/* Sibling slots point directly to another slot in the same node */
-static inline
-bool is_sibling_entry(const struct radix_tree_node *parent, void *node)
-{
-	void __rcu **ptr = node;
-	return (parent->slots <= ptr) &&
-			(ptr < parent->slots + RADIX_TREE_MAP_SIZE);
-}
-#else
-static inline
-bool is_sibling_entry(const struct radix_tree_node *parent, void *node)
-{
-	return false;
-}
-#endif
+#define RADIX_TREE_RETRY	XA_RETRY_ENTRY
 
 static inline unsigned long
 get_slot_offset(const struct radix_tree_node *parent, void __rcu **slot)
@@ -128,16 +112,10 @@ static unsigned int radix_tree_descend(const struct radix_tree_node *parent,
 	unsigned int offset = (index >> parent->shift) & RADIX_TREE_MAP_MASK;
 	void __rcu **entry = rcu_dereference_raw(parent->slots[offset]);
 
-#ifdef CONFIG_RADIX_TREE_MULTIORDER
-	if (radix_tree_is_internal_node(entry)) {
-		if (is_sibling_entry(parent, entry)) {
-			void __rcu **sibentry;
-			sibentry = (void __rcu **) entry_to_node(entry);
-			offset = get_slot_offset(parent, sibentry);
-			entry = rcu_dereference_raw(*sibentry);
-		}
+	if (xa_is_sibling(entry)) {
+		offset = xa_to_sibling(entry);
+		entry = rcu_dereference_raw(parent->slots[offset]);
 	}
-#endif
 
 	*nodep = (void *)entry;
 	return offset;
@@ -299,10 +277,10 @@ static void dump_node(struct radix_tree_node *node, unsigned long index)
 		} else if (!radix_tree_is_internal_node(entry)) {
 			pr_debug("radix entry %p offset %ld indices %lu-%lu parent %p\n",
 					entry, i, first, last, node);
-		} else if (is_sibling_entry(node, entry)) {
+		} else if (xa_is_sibling(entry)) {
 			pr_debug("radix sblng %p offset %ld indices %lu-%lu parent %p val %p\n",
 					entry, i, first, last, node,
-					*(void **)entry_to_node(entry));
+					node->slots[xa_to_sibling(entry)]);
 		} else {
 			dump_node(entry_to_node(entry), first);
 		}
@@ -872,8 +850,7 @@ static void radix_tree_free_nodes(struct radix_tree_node *node)
 
 	for (;;) {
 		void *entry = rcu_dereference_raw(child->slots[offset]);
-		if (radix_tree_is_internal_node(entry) &&
-					!is_sibling_entry(child, entry)) {
+		if (xa_is_node(entry)) {
 			child = entry_to_node(entry);
 			offset = 0;
 			continue;
@@ -895,7 +872,7 @@ static void radix_tree_free_nodes(struct radix_tree_node *node)
 static inline int insert_entries(struct radix_tree_node *node,
 		void __rcu **slot, void *item, unsigned order, bool replace)
 {
-	struct radix_tree_node *child;
+	void *sibling;
 	unsigned i, n, tag, offset, tags = 0;
 
 	if (node) {
@@ -913,7 +890,7 @@ static inline int insert_entries(struct radix_tree_node *node,
 		offset = offset & ~(n - 1);
 		slot = &node->slots[offset];
 	}
-	child = node_to_entry(slot);
+	sibling = xa_mk_sibling(offset);
 
 	for (i = 0; i < n; i++) {
 		if (slot[i]) {
@@ -930,7 +907,7 @@ static inline int insert_entries(struct radix_tree_node *node,
 	for (i = 0; i < n; i++) {
 		struct radix_tree_node *old = rcu_dereference_raw(slot[i]);
 		if (i) {
-			rcu_assign_pointer(slot[i], child);
+			rcu_assign_pointer(slot[i], sibling);
 			for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++)
 				if (tags & (1 << tag))
 					tag_clear(node, tag, offset + i);
@@ -940,9 +917,7 @@ static inline int insert_entries(struct radix_tree_node *node,
 				if (tags & (1 << tag))
 					tag_set(node, tag, offset);
 		}
-		if (radix_tree_is_internal_node(old) &&
-					!is_sibling_entry(node, old) &&
-					(old != RADIX_TREE_RETRY))
+		if (xa_is_node(old))
 			radix_tree_free_nodes(old);
 		if (xa_is_value(old))
 			node->exceptional--;
@@ -1101,10 +1076,10 @@ static inline void replace_sibling_entries(struct radix_tree_node *node,
 				void __rcu **slot, int count, int exceptional)
 {
 #ifdef CONFIG_RADIX_TREE_MULTIORDER
-	void *ptr = node_to_entry(slot);
-	unsigned offset = get_slot_offset(node, slot) + 1;
+	unsigned offset = get_slot_offset(node, slot);
+	void *ptr = xa_mk_sibling(offset);
 
-	while (offset < RADIX_TREE_MAP_SIZE) {
+	while (++offset < RADIX_TREE_MAP_SIZE) {
 		if (rcu_dereference_raw(node->slots[offset]) != ptr)
 			break;
 		if (count < 0) {
@@ -1112,7 +1087,6 @@ static inline void replace_sibling_entries(struct radix_tree_node *node,
 			node->count--;
 		}
 		node->exceptional += exceptional;
-		offset++;
 	}
 #endif
 }
@@ -1311,8 +1285,7 @@ int radix_tree_split(struct radix_tree_root *root, unsigned long index,
 			tags |= 1 << tag;
 
 	for (end = offset + 1; end < RADIX_TREE_MAP_SIZE; end++) {
-		if (!is_sibling_entry(parent,
-				rcu_dereference_raw(parent->slots[end])))
+		if (!xa_is_sibling(rcu_dereference_raw(parent->slots[end])))
 			break;
 		for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++)
 			if (tags & (1 << tag))
@@ -1608,11 +1581,9 @@ static void set_iter_tags(struct radix_tree_iter *iter,
 static void __rcu **skip_siblings(struct radix_tree_node **nodep,
 			void __rcu **slot, struct radix_tree_iter *iter)
 {
-	void *sib = node_to_entry(slot - 1);
-
 	while (iter->index < iter->next_index) {
 		*nodep = rcu_dereference_raw(*slot);
-		if (*nodep && *nodep != sib)
+		if (*nodep && !xa_is_sibling(*nodep))
 			return slot;
 		slot++;
 		iter->index = __radix_tree_iter_add(iter, 1);
@@ -1763,7 +1734,7 @@ void __rcu **radix_tree_next_chunk(const struct radix_tree_root *root,
 				while (++offset	< RADIX_TREE_MAP_SIZE) {
 					void *slot = rcu_dereference_raw(
 							node->slots[offset]);
-					if (is_sibling_entry(node, slot))
+					if (xa_is_sibling(slot))
 						continue;
 					if (slot)
 						break;
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 17/62] xarray: Change definition of sibling entries
@ 2017-11-22 21:06   ` Matthew Wilcox
  0 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:06 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

Instead of storing a pointer to the slot containing the canonical entry,
store the offset of the slot.  Produces slightly more efficient code
(~300 bytes) and simplifies the implementation.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 include/linux/xarray.h | 64 +++++++++++++++++++++++++++++++++++++++++++++++++
 lib/radix-tree.c       | 65 ++++++++++++++------------------------------------
 2 files changed, 82 insertions(+), 47 deletions(-)

diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index b1da8021b5fa..b9e0350b9e90 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -84,6 +84,8 @@ static inline bool xa_is_value(void *entry)
 	return (unsigned long)entry & 1;
 }
 
+/* Everything below here is the Advanced API.  Proceed with caution. */
+
 #define xa_trylock(xa)		spin_trylock(&(xa)->xa_lock)
 #define xa_lock(xa)		spin_lock(&(xa)->xa_lock)
 #define xa_unlock(xa)		spin_unlock(&(xa)->xa_lock)
@@ -97,4 +99,66 @@ static inline bool xa_is_value(void *entry)
 				spin_unlock_irqrestore(&(xa)->xa_lock, flags)
 #define xa_lock_held(xa)	lockdep_is_held(&(xa)->xa_lock)
 
+/*
+ * The xarray is constructed out of a set of 'chunks' of pointers.  Choosing
+ * the best chunk size requires some tradeoffs.  A power of two recommends
+ * itself so that we can walk the tree based purely on shifts and masks.
+ * Generally, the larger the better; as the number of slots per level of the
+ * tree increases, the less tall the tree needs to be.  But that needs to be
+ * balanced against the memory consumption of each node.  On a 64-bit system,
+ * xa_node is currently 576 bytes, and we get 7 of them per 4kB page.  If we
+ * doubled the number of slots per node, we'd get only 3 nodes per 4kB page.
+ */
+#ifndef XA_CHUNK_SHIFT
+#define XA_CHUNK_SHIFT		(CONFIG_BASE_SMALL ? 4 : 6)
+#endif
+#define XA_CHUNK_SIZE		(1UL << XA_CHUNK_SHIFT)
+#define XA_CHUNK_MASK		(XA_CHUNK_SIZE - 1)
+
+/*
+ * Internal entries have the bottom two bits set to the value 10b.  Most
+ * internal entries are pointers to the next node in the tree.  Since the
+ * kernel unmaps page 0 to trap NULL pointer dereferences, we can store up
+ * to 1024 distinct values in the tree.  Values 0-62 are used for sibling
+ * entries.  The retry entry is value 256.
+ */
+static inline void *xa_mk_internal(unsigned long v)
+{
+	return (void *)((v << 2) | 2);
+}
+
+static inline unsigned long xa_to_internal(void *entry)
+{
+	return (unsigned long)entry >> 2;
+}
+
+static inline bool xa_is_internal(void *entry)
+{
+	return ((unsigned long)entry & 3) == 2;
+}
+
+static inline bool xa_is_node(void *entry)
+{
+	return xa_is_internal(entry) && (unsigned long)entry > 4096;
+}
+
+static inline void *xa_mk_sibling(unsigned int offset)
+{
+	return xa_mk_internal(offset);
+}
+
+static inline unsigned long xa_to_sibling(void *entry)
+{
+	return xa_to_internal(entry);
+}
+
+static inline bool xa_is_sibling(void *entry)
+{
+	return IS_ENABLED(CONFIG_RADIX_TREE_MULTIORDER) &&
+		xa_is_internal(entry) &&
+		(entry < xa_mk_sibling(XA_CHUNK_SIZE - 1));
+}
+
+#define XA_RETRY_ENTRY		xa_mk_internal(256)
+
 #endif /* _LINUX_XARRAY_H */
diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index 30e49b89aa3b..4a1091e31932 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -37,6 +37,7 @@
 #include <linux/rcupdate.h>
 #include <linux/slab.h>
 #include <linux/string.h>
+#include <linux/xarray.h>
 
 
 /* Number of nodes in fully populated tree of given height */
@@ -97,24 +98,7 @@ static inline void *node_to_entry(void *ptr)
 	return (void *)((unsigned long)ptr | RADIX_TREE_INTERNAL_NODE);
 }
 
-#define RADIX_TREE_RETRY	node_to_entry(NULL)
-
-#ifdef CONFIG_RADIX_TREE_MULTIORDER
-/* Sibling slots point directly to another slot in the same node */
-static inline
-bool is_sibling_entry(const struct radix_tree_node *parent, void *node)
-{
-	void __rcu **ptr = node;
-	return (parent->slots <= ptr) &&
-			(ptr < parent->slots + RADIX_TREE_MAP_SIZE);
-}
-#else
-static inline
-bool is_sibling_entry(const struct radix_tree_node *parent, void *node)
-{
-	return false;
-}
-#endif
+#define RADIX_TREE_RETRY	XA_RETRY_ENTRY
 
 static inline unsigned long
 get_slot_offset(const struct radix_tree_node *parent, void __rcu **slot)
@@ -128,16 +112,10 @@ static unsigned int radix_tree_descend(const struct radix_tree_node *parent,
 	unsigned int offset = (index >> parent->shift) & RADIX_TREE_MAP_MASK;
 	void __rcu **entry = rcu_dereference_raw(parent->slots[offset]);
 
-#ifdef CONFIG_RADIX_TREE_MULTIORDER
-	if (radix_tree_is_internal_node(entry)) {
-		if (is_sibling_entry(parent, entry)) {
-			void __rcu **sibentry;
-			sibentry = (void __rcu **) entry_to_node(entry);
-			offset = get_slot_offset(parent, sibentry);
-			entry = rcu_dereference_raw(*sibentry);
-		}
+	if (xa_is_sibling(entry)) {
+		offset = xa_to_sibling(entry);
+		entry = rcu_dereference_raw(parent->slots[offset]);
 	}
-#endif
 
 	*nodep = (void *)entry;
 	return offset;
@@ -299,10 +277,10 @@ static void dump_node(struct radix_tree_node *node, unsigned long index)
 		} else if (!radix_tree_is_internal_node(entry)) {
 			pr_debug("radix entry %p offset %ld indices %lu-%lu parent %p\n",
 					entry, i, first, last, node);
-		} else if (is_sibling_entry(node, entry)) {
+		} else if (xa_is_sibling(entry)) {
 			pr_debug("radix sblng %p offset %ld indices %lu-%lu parent %p val %p\n",
 					entry, i, first, last, node,
-					*(void **)entry_to_node(entry));
+					node->slots[xa_to_sibling(entry)]);
 		} else {
 			dump_node(entry_to_node(entry), first);
 		}
@@ -872,8 +850,7 @@ static void radix_tree_free_nodes(struct radix_tree_node *node)
 
 	for (;;) {
 		void *entry = rcu_dereference_raw(child->slots[offset]);
-		if (radix_tree_is_internal_node(entry) &&
-					!is_sibling_entry(child, entry)) {
+		if (xa_is_node(entry)) {
 			child = entry_to_node(entry);
 			offset = 0;
 			continue;
@@ -895,7 +872,7 @@ static void radix_tree_free_nodes(struct radix_tree_node *node)
 static inline int insert_entries(struct radix_tree_node *node,
 		void __rcu **slot, void *item, unsigned order, bool replace)
 {
-	struct radix_tree_node *child;
+	void *sibling;
 	unsigned i, n, tag, offset, tags = 0;
 
 	if (node) {
@@ -913,7 +890,7 @@ static inline int insert_entries(struct radix_tree_node *node,
 		offset = offset & ~(n - 1);
 		slot = &node->slots[offset];
 	}
-	child = node_to_entry(slot);
+	sibling = xa_mk_sibling(offset);
 
 	for (i = 0; i < n; i++) {
 		if (slot[i]) {
@@ -930,7 +907,7 @@ static inline int insert_entries(struct radix_tree_node *node,
 	for (i = 0; i < n; i++) {
 		struct radix_tree_node *old = rcu_dereference_raw(slot[i]);
 		if (i) {
-			rcu_assign_pointer(slot[i], child);
+			rcu_assign_pointer(slot[i], sibling);
 			for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++)
 				if (tags & (1 << tag))
 					tag_clear(node, tag, offset + i);
@@ -940,9 +917,7 @@ static inline int insert_entries(struct radix_tree_node *node,
 				if (tags & (1 << tag))
 					tag_set(node, tag, offset);
 		}
-		if (radix_tree_is_internal_node(old) &&
-					!is_sibling_entry(node, old) &&
-					(old != RADIX_TREE_RETRY))
+		if (xa_is_node(old))
 			radix_tree_free_nodes(old);
 		if (xa_is_value(old))
 			node->exceptional--;
@@ -1101,10 +1076,10 @@ static inline void replace_sibling_entries(struct radix_tree_node *node,
 				void __rcu **slot, int count, int exceptional)
 {
 #ifdef CONFIG_RADIX_TREE_MULTIORDER
-	void *ptr = node_to_entry(slot);
-	unsigned offset = get_slot_offset(node, slot) + 1;
+	unsigned offset = get_slot_offset(node, slot);
+	void *ptr = xa_mk_sibling(offset);
 
-	while (offset < RADIX_TREE_MAP_SIZE) {
+	while (++offset < RADIX_TREE_MAP_SIZE) {
 		if (rcu_dereference_raw(node->slots[offset]) != ptr)
 			break;
 		if (count < 0) {
@@ -1112,7 +1087,6 @@ static inline void replace_sibling_entries(struct radix_tree_node *node,
 			node->count--;
 		}
 		node->exceptional += exceptional;
-		offset++;
 	}
 #endif
 }
@@ -1311,8 +1285,7 @@ int radix_tree_split(struct radix_tree_root *root, unsigned long index,
 			tags |= 1 << tag;
 
 	for (end = offset + 1; end < RADIX_TREE_MAP_SIZE; end++) {
-		if (!is_sibling_entry(parent,
-				rcu_dereference_raw(parent->slots[end])))
+		if (!xa_is_sibling(rcu_dereference_raw(parent->slots[end])))
 			break;
 		for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++)
 			if (tags & (1 << tag))
@@ -1608,11 +1581,9 @@ static void set_iter_tags(struct radix_tree_iter *iter,
 static void __rcu **skip_siblings(struct radix_tree_node **nodep,
 			void __rcu **slot, struct radix_tree_iter *iter)
 {
-	void *sib = node_to_entry(slot - 1);
-
 	while (iter->index < iter->next_index) {
 		*nodep = rcu_dereference_raw(*slot);
-		if (*nodep && *nodep != sib)
+		if (*nodep && !xa_is_sibling(*nodep))
 			return slot;
 		slot++;
 		iter->index = __radix_tree_iter_add(iter, 1);
@@ -1763,7 +1734,7 @@ void __rcu **radix_tree_next_chunk(const struct radix_tree_root *root,
 				while (++offset	< RADIX_TREE_MAP_SIZE) {
 					void *slot = rcu_dereference_raw(
 							node->slots[offset]);
-					if (is_sibling_entry(node, slot))
+					if (xa_is_sibling(slot))
 						continue;
 					if (slot)
 						break;
-- 
2.15.0
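
To make the encoding described in the xarray.h comment above concrete,
here is a small user-space sketch (illustrative only, not part of the
patch; it assumes the common XA_CHUNK_SHIFT of 6 and drops the
CONFIG_RADIX_TREE_MULTIORDER check):

        #include <stdio.h>

        #define XA_CHUNK_SHIFT  6
        #define XA_CHUNK_SIZE   (1UL << XA_CHUNK_SHIFT)

        /* Internal entries have the bottom two bits set to 10b. */
        static void *xa_mk_internal(unsigned long v)
        {
                return (void *)((v << 2) | 2);
        }

        static int xa_is_internal(void *entry)
        {
                return ((unsigned long)entry & 3) == 2;
        }

        /* Sibling entries are the internal values 0 to XA_CHUNK_SIZE - 2. */
        static int xa_is_sibling(void *entry)
        {
                return xa_is_internal(entry) &&
                        (unsigned long)entry <
                        (unsigned long)xa_mk_internal(XA_CHUNK_SIZE - 1);
        }

        int main(void)
        {
                void *sib = xa_mk_internal(5);          /* sibling of slot 5 */
                void *retry = xa_mk_internal(256);      /* XA_RETRY_ENTRY */

                /* Both land in page 0, so they can never be valid pointers. */
                printf("sibling: %p is_sibling=%d\n", sib, xa_is_sibling(sib));
                printf("retry:   %p is_sibling=%d\n", retry, xa_is_sibling(retry));
                return 0;
        }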


^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 18/62] xarray: Add definition of struct xarray
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:06   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:06 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

This is a direct replacement for struct radix_tree_root.  Some of the
struct members have changed name; convert those, and use a #define so
that radix_tree users continue to work without change.
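
To illustrate (a sketch only, not part of the patch; the array names are
invented), the old and new spellings now declare the same structure, so
existing radix tree helpers keep working on an xarray:

        #include <linux/gfp.h>
        #include <linux/radix-tree.h>
        #include <linux/xarray.h>

        static DEFINE_XARRAY(new_pages);                /* new spelling */
        static RADIX_TREE(old_pages, GFP_KERNEL);       /* old spelling, now a struct xarray */

        static bool both_empty(void)
        {
                /* radix_tree_empty() now reads ->xa_head under the covers */
                return radix_tree_empty(&new_pages) &&
                       radix_tree_empty(&old_pages);
        }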

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 include/linux/radix-tree.h            | 26 +++++-------
 include/linux/xarray.h                | 29 ++++++++++++++
 lib/idr.c                             |  4 +-
 lib/radix-tree.c                      | 75 +++++++++++++++++------------------
 tools/include/linux/spinlock.h        |  1 +
 tools/testing/radix-tree/Makefile     |  2 +-
 tools/testing/radix-tree/multiorder.c |  6 +--
 tools/testing/radix-tree/test.c       |  6 +--
 8 files changed, 86 insertions(+), 63 deletions(-)

diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index ee9aad11d472..015bc1bdc3d2 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -30,6 +30,9 @@
 #include <linux/types.h>
 #include <linux/xarray.h>
 
+/* Keep unconverted code working */
+#define radix_tree_root		xarray
+
 /*
  * The bottom two bits of the slot determine how the remaining bits in the
  * slot are interpreted:
@@ -59,10 +62,7 @@ static inline bool radix_tree_is_internal_node(void *ptr)
 
 #define RADIX_TREE_MAX_TAGS 3
 
-#ifndef RADIX_TREE_MAP_SHIFT
-#define RADIX_TREE_MAP_SHIFT	(CONFIG_BASE_SMALL ? 4 : 6)
-#endif
-
+#define RADIX_TREE_MAP_SHIFT	XA_CHUNK_SHIFT
 #define RADIX_TREE_MAP_SIZE	(1UL << RADIX_TREE_MAP_SHIFT)
 #define RADIX_TREE_MAP_MASK	(RADIX_TREE_MAP_SIZE-1)
 
@@ -95,20 +95,14 @@ struct radix_tree_node {
 	unsigned long	tags[RADIX_TREE_MAX_TAGS][RADIX_TREE_TAG_LONGS];
 };
 
-/* The top bits of gfp_mask are used to store the root tags and the IDR flag */
+/* The top bits of xa_flags are used to store the root tags and the IDR flag */
 #define ROOT_IS_IDR	((__force gfp_t)(1 << __GFP_BITS_SHIFT))
 #define ROOT_TAG_SHIFT	(__GFP_BITS_SHIFT + 1)
 
-struct radix_tree_root {
-	spinlock_t		xa_lock;
-	gfp_t			gfp_mask;
-	struct radix_tree_node	__rcu *rnode;
-};
-
 #define RADIX_TREE_INIT(name, mask)	{				\
 	.xa_lock = __SPIN_LOCK_UNLOCKED(name.xa_lock),			\
-	.gfp_mask = (mask),						\
-	.rnode = NULL,							\
+	.xa_flags = (mask),						\
+	.xa_head = NULL,						\
 }
 
 #define RADIX_TREE(name, mask) \
@@ -117,13 +111,13 @@ struct radix_tree_root {
 #define INIT_RADIX_TREE(root, mask)					\
 do {									\
 	(root)->xa_lock = __SPIN_LOCK_UNLOCKED(root.xa_lock);		\
-	(root)->gfp_mask = (mask);					\
-	(root)->rnode = NULL;						\
+	(root)->xa_flags = (mask);					\
+	(root)->xa_head = NULL;						\
 } while (0)
 
 static inline bool radix_tree_empty(const struct radix_tree_root *root)
 {
-	return root->rnode == NULL;
+	return root->xa_head == NULL;
 }
 
 /**
diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index b9e0350b9e90..03d430ec3bce 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -45,9 +45,38 @@
  * There are two levels of API provided.  Normal and Advanced.
  * The advanced API is more flexible but has fewer safeguards.
  */
+
+#include <linux/compiler.h>
 #include <linux/spinlock.h>
 #include <linux/types.h>
 
+/**
+ * struct xarray - The anchor of the XArray
+ * @xa_lock: Lock protecting writes to the array.
+ * @xa_flags: Internal XArray flags.
+ * @xa_head: The first pointer in the array.
+ *
+ * If all of the pointers in the array are NULL, @xa_head is a NULL pointer.
+ * If the only non-NULL pointer in the array is at index 0, @xa_head is that
+ * pointer.  If any other pointer in the array is non-NULL, @xa_head points
+ * to an @xa_node.
+ */
+struct xarray {
+	spinlock_t xa_lock;
+	gfp_t xa_flags;
+	void __rcu *xa_head;
+};
+
+#define __XARRAY_INIT(name, flags) {				\
+	.xa_lock = __SPIN_LOCK_UNLOCKED(name.xa_lock),		\
+	.xa_flags = flags,					\
+	.xa_head = NULL,					\
+}
+
+#define XARRAY_INIT(name) __XARRAY_INIT(name, 0)
+
+#define DEFINE_XARRAY(name) struct xarray name = XARRAY_INIT(name)
+
 #define BITS_PER_XA_VALUE	(BITS_PER_LONG - 1)
 
 /**
diff --git a/lib/idr.c b/lib/idr.c
index afcdff365037..50201b5c46e9 100644
--- a/lib/idr.c
+++ b/lib/idr.c
@@ -35,8 +35,8 @@ int idr_alloc_ul(struct idr *idr, void *ptr, unsigned long *nextid,
 	if (WARN_ON_ONCE(radix_tree_is_internal_node(ptr)))
 		return -EINVAL;
 
-	if (WARN_ON_ONCE(!(idr->idr_rt.gfp_mask & ROOT_IS_IDR)))
-		idr->idr_rt.gfp_mask |= IDR_RT_MARKER;
+	if (WARN_ON_ONCE(!(idr->idr_rt.xa_flags & ROOT_IS_IDR)))
+		idr->idr_rt.xa_flags |= IDR_RT_MARKER;
 
 	radix_tree_iter_init(&iter, *nextid);
 	slot = idr_get_free(&idr->idr_rt, &iter, gfp, max);
diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index 4a1091e31932..930eb7d298d7 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -123,7 +123,7 @@ static unsigned int radix_tree_descend(const struct radix_tree_node *parent,
 
 static inline gfp_t root_gfp_mask(const struct radix_tree_root *root)
 {
-	return root->gfp_mask & __GFP_BITS_MASK;
+	return root->xa_flags & __GFP_BITS_MASK;
 }
 
 static inline void tag_set(struct radix_tree_node *node, unsigned int tag,
@@ -146,32 +146,32 @@ static inline int tag_get(const struct radix_tree_node *node, unsigned int tag,
 
 static inline void root_tag_set(struct radix_tree_root *root, unsigned tag)
 {
-	root->gfp_mask |= (__force gfp_t)(1 << (tag + ROOT_TAG_SHIFT));
+	root->xa_flags |= (__force gfp_t)(1 << (tag + ROOT_TAG_SHIFT));
 }
 
 static inline void root_tag_clear(struct radix_tree_root *root, unsigned tag)
 {
-	root->gfp_mask &= (__force gfp_t)~(1 << (tag + ROOT_TAG_SHIFT));
+	root->xa_flags &= (__force gfp_t)~(1 << (tag + ROOT_TAG_SHIFT));
 }
 
 static inline void root_tag_clear_all(struct radix_tree_root *root)
 {
-	root->gfp_mask &= (__force gfp_t)((1 << ROOT_TAG_SHIFT) - 1);
+	root->xa_flags &= (__force gfp_t)((1 << ROOT_TAG_SHIFT) - 1);
 }
 
 static inline int root_tag_get(const struct radix_tree_root *root, unsigned tag)
 {
-	return (__force int)root->gfp_mask & (1 << (tag + ROOT_TAG_SHIFT));
+	return (__force int)root->xa_flags & (1 << (tag + ROOT_TAG_SHIFT));
 }
 
 static inline unsigned root_tags_get(const struct radix_tree_root *root)
 {
-	return (__force unsigned)root->gfp_mask >> ROOT_TAG_SHIFT;
+	return (__force unsigned)root->xa_flags >> ROOT_TAG_SHIFT;
 }
 
 static inline bool is_idr(const struct radix_tree_root *root)
 {
-	return !!(root->gfp_mask & ROOT_IS_IDR);
+	return !!(root->xa_flags & ROOT_IS_IDR);
 }
 
 /*
@@ -290,12 +290,12 @@ static void dump_node(struct radix_tree_node *node, unsigned long index)
 /* For debug */
 static void radix_tree_dump(struct radix_tree_root *root)
 {
-	pr_debug("radix root: %p rnode %p tags %x\n",
-			root, root->rnode,
-			root->gfp_mask >> ROOT_TAG_SHIFT);
-	if (!radix_tree_is_internal_node(root->rnode))
+	pr_debug("radix root: %p xa_head %p tags %x\n",
+			root, root->xa_head,
+			root->xa_flags >> ROOT_TAG_SHIFT);
+	if (!radix_tree_is_internal_node(root->xa_head))
 		return;
-	dump_node(entry_to_node(root->rnode), 0);
+	dump_node(entry_to_node(root->xa_head), 0);
 }
 
 static void dump_ida_node(void *entry, unsigned long index)
@@ -339,9 +339,9 @@ static void dump_ida_node(void *entry, unsigned long index)
 static void ida_dump(struct ida *ida)
 {
 	struct radix_tree_root *root = &ida->ida_rt;
-	pr_debug("ida: %p node %p free %d\n", ida, root->rnode,
-				root->gfp_mask >> ROOT_TAG_SHIFT);
-	dump_ida_node(root->rnode, 0);
+	pr_debug("ida: %p node %p free %d\n", ida, root->xa_head,
+				root->xa_flags >> ROOT_TAG_SHIFT);
+	dump_ida_node(root->xa_head, 0);
 }
 #endif
 
@@ -575,7 +575,7 @@ int radix_tree_maybe_preload_order(gfp_t gfp_mask, int order)
 static unsigned radix_tree_load_root(const struct radix_tree_root *root,
 		struct radix_tree_node **nodep, unsigned long *maxindex)
 {
-	struct radix_tree_node *node = rcu_dereference_raw(root->rnode);
+	struct radix_tree_node *node = rcu_dereference_raw(root->xa_head);
 
 	*nodep = node;
 
@@ -604,7 +604,7 @@ static int radix_tree_extend(struct radix_tree_root *root, gfp_t gfp,
 	while (index > shift_maxindex(maxshift))
 		maxshift += RADIX_TREE_MAP_SHIFT;
 
-	entry = rcu_dereference_raw(root->rnode);
+	entry = rcu_dereference_raw(root->xa_head);
 	if (!entry && (!is_idr(root) || root_tag_get(root, IDR_FREE)))
 		goto out;
 
@@ -632,7 +632,7 @@ static int radix_tree_extend(struct radix_tree_root *root, gfp_t gfp,
 		if (radix_tree_is_internal_node(entry)) {
 			entry_to_node(entry)->parent = node;
 		} else if (xa_is_value(entry)) {
-			/* Moving an exceptional root->rnode to a node */
+			/* Moving an exceptional root->xa_head to a node */
 			node->exceptional = 1;
 		}
 		/*
@@ -641,7 +641,7 @@ static int radix_tree_extend(struct radix_tree_root *root, gfp_t gfp,
 		 */
 		node->slots[0] = (void __rcu *)entry;
 		entry = node_to_entry(node);
-		rcu_assign_pointer(root->rnode, entry);
+		rcu_assign_pointer(root->xa_head, entry);
 		shift += RADIX_TREE_MAP_SHIFT;
 	} while (shift <= maxshift);
 out:
@@ -658,7 +658,7 @@ static inline bool radix_tree_shrink(struct radix_tree_root *root,
 	bool shrunk = false;
 
 	for (;;) {
-		struct radix_tree_node *node = rcu_dereference_raw(root->rnode);
+		struct radix_tree_node *node = rcu_dereference_raw(root->xa_head);
 		struct radix_tree_node *child;
 
 		if (!radix_tree_is_internal_node(node))
@@ -686,9 +686,9 @@ static inline bool radix_tree_shrink(struct radix_tree_root *root,
 		 * moving the node from one part of the tree to another: if it
 		 * was safe to dereference the old pointer to it
 		 * (node->slots[0]), it will be safe to dereference the new
-		 * one (root->rnode) as far as dependent read barriers go.
+		 * one (root->xa_head) as far as dependent read barriers go.
 		 */
-		root->rnode = (void __rcu *)child;
+		root->xa_head = (void __rcu *)child;
 		if (is_idr(root) && !tag_get(node, IDR_FREE, 0))
 			root_tag_clear(root, IDR_FREE);
 
@@ -736,9 +736,8 @@ static bool delete_node(struct radix_tree_root *root,
 
 		if (node->count) {
 			if (node_to_entry(node) ==
-					rcu_dereference_raw(root->rnode))
-				deleted |= radix_tree_shrink(root,
-								update_node);
+					rcu_dereference_raw(root->xa_head))
+				deleted |= radix_tree_shrink(root, update_node);
 			return deleted;
 		}
 
@@ -753,7 +752,7 @@ static bool delete_node(struct radix_tree_root *root,
 			 */
 			if (!is_idr(root))
 				root_tag_clear_all(root);
-			root->rnode = NULL;
+			root->xa_head = NULL;
 		}
 
 		WARN_ON_ONCE(!list_empty(&node->private_list));
@@ -778,7 +777,7 @@ static bool delete_node(struct radix_tree_root *root,
  *	at position @index in the radix tree @root.
  *
  *	Until there is more than one item in the tree, no nodes are
- *	allocated and @root->rnode is used as a direct slot instead of
+ *	allocated and @root->xa_head is used as a direct slot instead of
  *	pointing to a node, in which case *@nodep will be NULL.
  *
  *	Returns -ENOMEM, or 0 for success.
@@ -788,7 +787,7 @@ int __radix_tree_create(struct radix_tree_root *root, unsigned long index,
 			void __rcu ***slotp)
 {
 	struct radix_tree_node *node = NULL, *child;
-	void __rcu **slot = (void __rcu **)&root->rnode;
+	void __rcu **slot = (void __rcu **)&root->xa_head;
 	unsigned long maxindex;
 	unsigned int shift, offset = 0;
 	unsigned long max = index | ((1UL << order) - 1);
@@ -804,7 +803,7 @@ int __radix_tree_create(struct radix_tree_root *root, unsigned long index,
 		if (error < 0)
 			return error;
 		shift = error;
-		child = rcu_dereference_raw(root->rnode);
+		child = rcu_dereference_raw(root->xa_head);
 	}
 
 	while (shift > order) {
@@ -995,7 +994,7 @@ EXPORT_SYMBOL(__radix_tree_insert);
  *	tree @root.
  *
  *	Until there is more than one item in the tree, no nodes are
- *	allocated and @root->rnode is used as a direct slot instead of
+ *	allocated and @root->xa_head is used as a direct slot instead of
  *	pointing to a node, in which case *@nodep will be NULL.
  */
 void *__radix_tree_lookup(const struct radix_tree_root *root,
@@ -1008,7 +1007,7 @@ void *__radix_tree_lookup(const struct radix_tree_root *root,
 
  restart:
 	parent = NULL;
-	slot = (void __rcu **)&root->rnode;
+	slot = (void __rcu **)&root->xa_head;
 	radix_tree_load_root(root, &node, &maxindex);
 	if (index > maxindex)
 		return NULL;
@@ -1160,9 +1159,9 @@ void __radix_tree_replace(struct radix_tree_root *root,
 	/*
 	 * This function supports replacing exceptional entries and
 	 * deleting entries, but that needs accounting against the
-	 * node unless the slot is root->rnode.
+	 * node unless the slot is root->xa_head.
 	 */
-	WARN_ON_ONCE(!node && (slot != (void __rcu **)&root->rnode) &&
+	WARN_ON_ONCE(!node && (slot != (void __rcu **)&root->xa_head) &&
 			(count || exceptional));
 	replace_slot(slot, item, node, count, exceptional);
 
@@ -1714,7 +1713,7 @@ void __rcu **radix_tree_next_chunk(const struct radix_tree_root *root,
 		iter->tags = 1;
 		iter->node = NULL;
 		__set_iter_shift(iter, 0);
-		return (void __rcu **)&root->rnode;
+		return (void __rcu **)&root->xa_head;
 	}
 
 	do {
@@ -2108,7 +2107,7 @@ void __rcu **idr_get_free(struct radix_tree_root *root,
 			      unsigned long max)
 {
 	struct radix_tree_node *node = NULL, *child;
-	void __rcu **slot = (void __rcu **)&root->rnode;
+	void __rcu **slot = (void __rcu **)&root->xa_head;
 	unsigned long maxindex, start = iter->next_index;
 	unsigned int shift, offset = 0;
 
@@ -2124,7 +2123,7 @@ void __rcu **idr_get_free(struct radix_tree_root *root,
 		if (error < 0)
 			return ERR_PTR(error);
 		shift = error;
-		child = rcu_dereference_raw(root->rnode);
+		child = rcu_dereference_raw(root->xa_head);
 	}
 
 	while (shift) {
@@ -2187,10 +2186,10 @@ void __rcu **idr_get_free(struct radix_tree_root *root,
  */
 void idr_destroy(struct idr *idr)
 {
-	struct radix_tree_node *node = rcu_dereference_raw(idr->idr_rt.rnode);
+	struct radix_tree_node *node = rcu_dereference_raw(idr->idr_rt.xa_head);
 	if (radix_tree_is_internal_node(node))
 		radix_tree_free_nodes(node);
-	idr->idr_rt.rnode = NULL;
+	idr->idr_rt.xa_head = NULL;
 	root_tag_set(&idr->idr_rt, IDR_FREE);
 }
 EXPORT_SYMBOL(idr_destroy);
diff --git a/tools/include/linux/spinlock.h b/tools/include/linux/spinlock.h
index b21b586b9854..34fed5c38da2 100644
--- a/tools/include/linux/spinlock.h
+++ b/tools/include/linux/spinlock.h
@@ -8,6 +8,7 @@
 #define spinlock_t		pthread_mutex_t
 #define DEFINE_SPINLOCK(x)	pthread_mutex_t x = PTHREAD_MUTEX_INITIALIZER;
 #define __SPIN_LOCK_UNLOCKED(x)	(pthread_mutex_t)PTHREAD_MUTEX_INITIALIZER
+#define spin_lock_init(x)	pthread_mutex_init(x, NULL);
 
 #define spin_lock_irqsave(x, f)		(void)f, pthread_mutex_lock(x)
 #define spin_unlock_irqrestore(x, f)	(void)f, pthread_mutex_unlock(x)
diff --git a/tools/testing/radix-tree/Makefile b/tools/testing/radix-tree/Makefile
index fa7ee369b3c9..ebb12224e258 100644
--- a/tools/testing/radix-tree/Makefile
+++ b/tools/testing/radix-tree/Makefile
@@ -46,6 +46,6 @@ idr.c: ../../../lib/idr.c
 
 mapshift:
 	@if ! grep -qws $(SHIFT) generated/map-shift.h; then		\
-		echo "#define RADIX_TREE_MAP_SHIFT $(SHIFT)" >		\
+		echo "#define XA_CHUNK_SHIFT $(SHIFT)" >		\
 				generated/map-shift.h;			\
 	fi
diff --git a/tools/testing/radix-tree/multiorder.c b/tools/testing/radix-tree/multiorder.c
index 684e76f79f4a..24293a2fd82d 100644
--- a/tools/testing/radix-tree/multiorder.c
+++ b/tools/testing/radix-tree/multiorder.c
@@ -191,13 +191,13 @@ static void multiorder_shrink(unsigned long index, int order)
 
 	assert(item_insert_order(&tree, 0, order) == 0);
 
-	node = tree.rnode;
+	node = tree.xa_head;
 
 	assert(item_insert(&tree, index) == 0);
-	assert(node != tree.rnode);
+	assert(node != tree.xa_head);
 
 	assert(item_delete(&tree, index) != 0);
-	assert(node == tree.rnode);
+	assert(node == tree.xa_head);
 
 	for (i = 0; i < max; i++) {
 		struct item *item = item_lookup(&tree, i);
diff --git a/tools/testing/radix-tree/test.c b/tools/testing/radix-tree/test.c
index 0d69c49177c6..6e1cc2040817 100644
--- a/tools/testing/radix-tree/test.c
+++ b/tools/testing/radix-tree/test.c
@@ -262,7 +262,7 @@ static int verify_node(struct radix_tree_node *slot, unsigned int tag,
 
 void verify_tag_consistency(struct radix_tree_root *root, unsigned int tag)
 {
-	struct radix_tree_node *node = root->rnode;
+	struct radix_tree_node *node = root->xa_head;
 	if (!radix_tree_is_internal_node(node))
 		return;
 	verify_node(node, tag, !!root_tag_get(root, tag));
@@ -292,13 +292,13 @@ void item_kill_tree(struct radix_tree_root *root)
 		}
 	}
 	assert(radix_tree_gang_lookup(root, (void **)items, 0, 32) == 0);
-	assert(root->rnode == NULL);
+	assert(root->xa_head == NULL);
 }
 
 void tree_verify_min_height(struct radix_tree_root *root, int maxindex)
 {
 	unsigned shift;
-	struct radix_tree_node *node = root->rnode;
+	struct radix_tree_node *node = root->xa_head;
 	if (!radix_tree_is_internal_node(node)) {
 		assert(maxindex == 0);
 		return;
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 19/62] xarray: Define struct xa_node
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:06   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:06 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

This is a direct replacement for struct radix_tree_node.  Use a #define
so that radix tree users continue to work without change.
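
As an illustrative aside (not part of the patch; the helper name is
invented), the @count field documented in the struct comment is simply
the number of occupied slots, which a debug check could recompute:

        #include <linux/rcupdate.h>
        #include <linux/xarray.h>

        static unsigned int xa_node_occupied(const struct xa_node *node)
        {
                unsigned int i, used = 0;

                for (i = 0; i < XA_CHUNK_SIZE; i++)
                        if (rcu_dereference_raw(node->slots[i]))
                                used++;
                return used;    /* expected to equal node->count */
        }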

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 include/linux/radix-tree.h | 29 +++--------------------------
 include/linux/xarray.h     | 24 ++++++++++++++++++++++++
 2 files changed, 27 insertions(+), 26 deletions(-)

diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index 015bc1bdc3d2..1da1fb01e993 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -32,6 +32,7 @@
 
 /* Keep unconverted code working */
 #define radix_tree_root		xarray
+#define radix_tree_node		xa_node
 
 /*
  * The bottom two bits of the slot determine how the remaining bits in the
@@ -60,41 +61,17 @@ static inline bool radix_tree_is_internal_node(void *ptr)
 
 /*** radix-tree API starts here ***/
 
-#define RADIX_TREE_MAX_TAGS 3
-
 #define RADIX_TREE_MAP_SHIFT	XA_CHUNK_SHIFT
 #define RADIX_TREE_MAP_SIZE	(1UL << RADIX_TREE_MAP_SHIFT)
 #define RADIX_TREE_MAP_MASK	(RADIX_TREE_MAP_SIZE-1)
 
-#define RADIX_TREE_TAG_LONGS	\
-	((RADIX_TREE_MAP_SIZE + BITS_PER_LONG - 1) / BITS_PER_LONG)
+#define RADIX_TREE_MAX_TAGS	XA_MAX_TAGS
+#define RADIX_TREE_TAG_LONGS	XA_TAG_LONGS
 
 #define RADIX_TREE_INDEX_BITS  (8 /* CHAR_BIT */ * sizeof(unsigned long))
 #define RADIX_TREE_MAX_PATH (DIV_ROUND_UP(RADIX_TREE_INDEX_BITS, \
 					  RADIX_TREE_MAP_SHIFT))
 
-/*
- * @count is the count of every non-NULL element in the ->slots array
- * whether that is a data entry, a retry entry, a user pointer,
- * a sibling entry or a pointer to the next level of the tree.
- * @exceptional is the count of every element in ->slots which is
- * either a data entry or a sibling entry for data.
- */
-struct radix_tree_node {
-	unsigned char	shift;		/* Bits remaining in each slot */
-	unsigned char	offset;		/* Slot offset in parent */
-	unsigned char	count;		/* Total entry count */
-	unsigned char	exceptional;	/* Exceptional entry count */
-	struct radix_tree_node *parent;		/* Used when ascending tree */
-	struct radix_tree_root *root;		/* The tree we belong to */
-	union {
-		struct list_head private_list;	/* For tree user */
-		struct rcu_head	rcu_head;	/* Used when freeing node */
-	};
-	void __rcu	*slots[RADIX_TREE_MAP_SIZE];
-	unsigned long	tags[RADIX_TREE_MAX_TAGS][RADIX_TREE_TAG_LONGS];
-};
-
 /* The top bits of xa_flags are used to store the root tags and the IDR flag */
 #define ROOT_IS_IDR	((__force gfp_t)(1 << __GFP_BITS_SHIFT))
 #define ROOT_TAG_SHIFT	(__GFP_BITS_SHIFT + 1)
diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index 03d430ec3bce..1513a9e85580 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -143,6 +143,30 @@ static inline bool xa_is_value(void *entry)
 #endif
 #define XA_CHUNK_SIZE		(1UL << XA_CHUNK_SHIFT)
 #define XA_CHUNK_MASK		(XA_CHUNK_SIZE - 1)
+#define XA_MAX_TAGS		3
+#define XA_TAG_LONGS		DIV_ROUND_UP(XA_CHUNK_SIZE, BITS_PER_LONG)
+
+/*
+ * @count is the count of every non-NULL element in the ->slots array
+ * whether that is a data entry, a retry entry, a user pointer,
+ * a sibling entry or a pointer to the next level of the tree.
+ * @exceptional is the count of every element in ->slots which is
+ * either a data entry or a sibling entry for data.
+ */
+struct xa_node {
+	unsigned char	shift;		/* Bits remaining in each slot */
+	unsigned char	offset;		/* Slot offset in parent */
+	unsigned char	count;		/* Total entry count */
+	unsigned char	exceptional;	/* Exceptional entry count */
+	struct xa_node *parent;		/* Used when ascending tree */
+	struct xarray *	root;		/* The tree we belong to */
+	union {
+		struct list_head private_list;	/* For tree user */
+		struct rcu_head	rcu_head;	/* Used when freeing node */
+	};
+	void __rcu	*slots[XA_CHUNK_SIZE];
+	unsigned long	tags[XA_MAX_TAGS][XA_TAG_LONGS];
+};
 
 /*
  * Internal entries have the bottom two bits set to the value 10b.  Most
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 20/62] xarray: Add xa_load
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:06   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:06 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

This first function in the XArray API brings with it a lot of support
infrastructure.  The advanced API is based around the xa_state, which is
a more capable version of the radix_tree_iter.

As the test-suite demonstrates, it is possible to use the xarray and
radix tree APIs on the same data structure.
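
For orientation, here is an illustrative sketch (not part of the patch;
the array name is invented) of the two levels of the new API in use,
mirroring the implementation of xa_load() below:

        #include <linux/xarray.h>

        static DEFINE_XARRAY(things);

        static void *lookup_thing(unsigned long index)
        {
                /* Normal API: RCU protection and retries handled internally. */
                return xa_load(&things, index);
        }

        static void *lookup_thing_advanced(unsigned long index)
        {
                XA_STATE(xas, index);
                void *entry;

                /* Advanced API: the caller supplies locking and retry handling. */
                rcu_read_lock();
                do {
                        entry = xas_load(&things, &xas);
                } while (xas_retry(&xas, entry));
                rcu_read_unlock();

                return entry;
        }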

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 include/linux/xarray.h                      | 165 ++++++++++++++++++++++++++++
 lib/Makefile                                |   2 +-
 lib/xarray.c                                | 137 +++++++++++++++++++++++
 tools/testing/radix-tree/.gitignore         |   2 +
 tools/testing/radix-tree/Makefile           |  12 +-
 tools/testing/radix-tree/linux/radix-tree.h |   1 -
 tools/testing/radix-tree/linux/rcupdate.h   |   1 +
 tools/testing/radix-tree/linux/xarray.h     |   2 +
 tools/testing/radix-tree/xarray-test.c      |  56 ++++++++++
 9 files changed, 373 insertions(+), 5 deletions(-)
 create mode 100644 lib/xarray.c
 create mode 100644 tools/testing/radix-tree/linux/xarray.h
 create mode 100644 tools/testing/radix-tree/xarray-test.c

diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index 1513a9e85580..0e736d2db049 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -46,7 +46,10 @@
  * The advanced API is more flexible but has fewer safeguards.
  */
 
+#include <linux/bug.h>
 #include <linux/compiler.h>
+#include <linux/kernel.h>
+#include <linux/rcupdate.h>
 #include <linux/spinlock.h>
 #include <linux/types.h>
 
@@ -77,6 +80,8 @@ struct xarray {
 
 #define DEFINE_XARRAY(name) struct xarray name = XARRAY_INIT(name)
 
+void *xa_load(struct xarray *, unsigned long index);
+
 #define BITS_PER_XA_VALUE	(BITS_PER_LONG - 1)
 
 /**
@@ -128,6 +133,18 @@ static inline bool xa_is_value(void *entry)
 				spin_unlock_irqrestore(&(xa)->xa_lock, flags)
 #define xa_lock_held(xa)	lockdep_is_held(&(xa)->xa_lock)
 
+#ifdef XA_DEBUG
+void xa_dump(const struct xarray *);
+void xa_dump_node(const struct xa_node *);
+#define XA_BUG_ON(node, x) do { \
+		if ((x) && (node)) \
+			xa_dump_node(node); \
+		BUG_ON(x); \
+	} while (0)
+#else
+#define XA_BUG_ON(node, x)	do { } while (0)
+#endif
+
 /*
  * The xarray is constructed out of a set of 'chunks' of pointers.  Choosing
  * the best chunk size requires some tradeoffs.  A power of two recommends
@@ -168,6 +185,30 @@ struct xa_node {
 	unsigned long	tags[XA_MAX_TAGS][XA_TAG_LONGS];
 };
 
+static inline void *xa_head(struct xarray *xa)
+{
+	return rcu_dereference_check(xa->xa_head, xa_lock_held(xa));
+}
+
+static inline void *xa_head_locked(struct xarray *xa)
+{
+	return rcu_dereference_protected(xa->xa_head, xa_lock_held(xa));
+}
+
+static inline void *xa_entry(struct xarray *xa,
+				const struct xa_node *node, unsigned int offset)
+{
+	XA_BUG_ON(node, offset >= XA_CHUNK_SIZE);
+	return rcu_dereference_check(node->slots[offset], xa_lock_held(xa));
+}
+
+static inline void *xa_entry_locked(struct xarray *xa,
+				const struct xa_node *node, unsigned int offset)
+{
+	XA_BUG_ON(node, offset >= XA_CHUNK_SIZE);
+	return rcu_dereference_protected(node->slots[offset], xa_lock_held(xa));
+}
+
 /*
  * Internal entries have the bottom two bits set to the value 10b.  Most
  * internal entries are pointers to the next node in the tree.  Since the
@@ -190,6 +231,11 @@ static inline bool xa_is_internal(void *entry)
 	return ((unsigned long)entry & 3) == 2;
 }
 
+static inline struct xa_node *xa_to_node(void *entry)
+{
+	return (struct xa_node *)((unsigned long)entry & ~3UL);
+}
+
 static inline bool xa_is_node(void *entry)
 {
 	return xa_is_internal(entry) && (unsigned long)entry > 4096;
@@ -214,4 +260,123 @@ static inline bool xa_is_sibling(void *entry)
 
 #define XA_RETRY_ENTRY		xa_mk_internal(256)
 
+static inline bool xa_is_retry(void *entry)
+{
+	return unlikely(entry == XA_RETRY_ENTRY);
+}
+
+typedef void (*xa_update_node_t)(struct xa_node *);
+
+/*
+ * The xa_state is opaque to its users.  It contains various different pieces
+ * of state involved in the current operation on the XArray.  It should be
+ * declared on the stack and passed between the various internal routines.
+ * The various elements in it should not be accessed directly, but only
+ * through the provided accessor functions.  The below documentation is for
+ * the benefit of those working on the code, not for users of the XArray.
+ *
+ * @xa_node usually points to the xa_node containing the slot we're operating
+ * on (and @xa_offset is the offset in the slots array).  However, it can
+ * also have the value NULL if there is a single entry at index 0; it can
+ * have the value XAS_RESTART if the xa_state has not yet been walked to
+ * the correct position in the tree, and it can have an XAS_ERROR value to
+ * encode any error which may have occurred during the operation.
+ */
+struct xa_state {
+	unsigned long xa_index;
+	unsigned char xa_shift;
+	unsigned char xa_sibs;
+	unsigned char xa_offset;
+	struct xa_node *xa_node;
+	struct xa_node *xa_alloc;
+	xa_update_node_t xa_update;
+};
+
+/*
+ * We encode errnos in the xas->xa_node.  If an error has happened, we need to
+ * drop the lock to fix it, and once we've done so the xa_state is invalid.
+ */
+#define XAS_ERROR(errno)	((struct xa_node *)((errno << 1) | 1))
+#define XAS_RESTART		XAS_ERROR(0)
+
+#define __XA_STATE(index)	(struct xa_state) {	\
+	.xa_index = index,				\
+	.xa_shift = 0,					\
+	.xa_sibs = 0,					\
+	.xa_offset = 0,					\
+	.xa_node = XAS_RESTART,				\
+	.xa_alloc = NULL,				\
+	.xa_update = NULL				\
+}
+
+#define XA_STATE(name, index)				\
+	struct xa_state name = __XA_STATE(index)
+
+static inline int xas_error(const struct xa_state *xas)
+{
+	unsigned long v = (unsigned long)xas->xa_node;
+	return (v & 1) ? -(v >> 1) : 0;
+}
+
+static inline void xas_set_err(struct xa_state *xas, unsigned long err)
+{
+	XA_BUG_ON(NULL, err > MAX_ERRNO || !err);
+	xas->xa_node = XAS_ERROR(err);
+}
+
+static inline bool xas_invalid(const struct xa_state *xas)
+{
+	return (unsigned long)xas->xa_node & 1;
+}
+
+static inline bool xas_valid(const struct xa_state *xas)
+{
+	return !xas_invalid(xas);
+}
+
+/**
+ * xas_retry() - Handle a retry entry.
+ * @xas: XArray operation state.
+ * @entry: Entry from xarray.
+ *
+ * An RCU-protected read may see a retry entry as a side-effect of a
+ * simultaneous modification.  This function sets up the @xas to retry
+ * the walk from the head of the array.
+ *
+ * Return: true if the operation needs to be retried.
+ */
+static inline bool xas_retry(struct xa_state *xas, void *entry)
+{
+	if (!xa_is_retry(entry))
+		return false;
+	xas->xa_node = XAS_RESTART;
+	return true;
+}
+
+void *xas_load(struct xarray *, struct xa_state *);
+
+/**
+ * xas_reload() - Refetch an entry from the xarray.
+ * @xa: XArray.
+ * @xas: XArray operation state.
+ *
+ * Use this function to check that a previously loaded entry still has
+ * the same value.  This is useful for the lockless pagecache lookup where
+ * we walk the array with only the RCU lock to protect us, lock the page,
+ * then check that the page hasn't moved since we looked it up.
+ *
+ * The caller guarantees that @xas is still valid.  If it may be in an
+ * error or restart state, call xas_load() instead.
+ *
+ * Return: The entry at this location in the xarray.
+ */
+static inline void *xas_reload(struct xarray *xa, struct xa_state *xas)
+{
+	struct xa_node *node = xas->xa_node;
+
+	if (node)
+		return xa_entry(xa, node, xas->xa_offset);
+	return xa_head(xa);
+}
+
 #endif /* _LINUX_XARRAY_H */
diff --git a/lib/Makefile b/lib/Makefile
index d11c48ec8ffd..6aa523acc7c1 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -18,7 +18,7 @@ KCOV_INSTRUMENT_debugobjects.o := n
 KCOV_INSTRUMENT_dynamic_debug.o := n
 
 lib-y := ctype.o string.o vsprintf.o cmdline.o \
-	 rbtree.o radix-tree.o dump_stack.o timerqueue.o\
+	 rbtree.o radix-tree.o dump_stack.o timerqueue.o xarray.o \
 	 idr.o int_sqrt.o extable.o \
 	 sha1.o chacha20.o irq_regs.o argv_split.o \
 	 flex_proportions.o ratelimit.o show_mem.o \
diff --git a/lib/xarray.c b/lib/xarray.c
new file mode 100644
index 000000000000..1f7d30a8b61f
--- /dev/null
+++ b/lib/xarray.c
@@ -0,0 +1,137 @@
+/*
+ * XArray implementation
+ * Copyright (c) 2017 Microsoft Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#include <linux/export.h>
+#include <linux/gfp.h>
+#include <linux/radix-tree.h>
+#include <linux/xarray.h>
+
+/*
+ * Coding conventions in this file:
+ *
+ * @xa is used to refer to the entire xarray.
+ * @xas is the 'xarray operation state'.  It may be either a pointer to
+ * an xa_state, or an xa_state stored on the stack.  This is an unfortunate
+ * ambiguity.
+ * @index is the index of the entry being operated on
+ * @tag is an xa_tag_t; a small number indicating one of the tag bits.
+ * @node refers to an xa_node; usually the primary one being operated on by
+ * this function.
+ * @offset is the index into the slots array inside an xa_node.
+ * @parent refers to the @xa_node closer to the head than @node.
+ * @entry refers to something stored in a slot in the xarray
+ */
+
+/* extracts the offset within this node from the index */
+static unsigned int get_offset(unsigned long index, struct xa_node *node)
+{
+	return (index >> node->shift) & XA_CHUNK_MASK;
+}
+
+/*
+ * Starts a walk.  If the @xas is already valid, we assume that it's on
+ * the right path and just return where we've got to.  If we're in an
+ * error state, return NULL.  If the index is outside the current scope
+ * of the xarray, return NULL without changing @xas->xa_node.  Otherwise
+ * set @xas->xa_node to NULL and return the current head of the array.
+ */
+static void *xas_start(struct xarray *xa, struct xa_state *xas)
+{
+	void *entry;
+
+	if (xas_valid(xas))
+		return xas_reload(xa, xas);
+	if (xas_error(xas))
+		return NULL;
+
+	entry = xa_head(xa);
+	if (!xa_is_node(entry)) {
+		if (xas->xa_index)
+			return NULL;
+	} else {
+		if ((xas->xa_index >> xa_to_node(entry)->shift) > XA_CHUNK_MASK)
+			return NULL;
+	}
+
+	xas->xa_node = NULL;
+	return entry;
+}
+
+static void *xas_descend(struct xarray *xa, struct xa_state *xas,
+			struct xa_node *node)
+{
+	unsigned int offset = get_offset(xas->xa_index, node);
+	void *entry = xa_entry(xa, node, offset);
+
+	if (xa_is_sibling(entry)) {
+		offset = xa_to_sibling(entry);
+		entry = xa_entry(xa, node, offset);
+	}
+
+	xas->xa_node = node;
+	xas->xa_offset = offset;
+	return entry;
+}
+
+/**
+ * xas_load() - Load an entry from the XArray (advanced).
+ * @xa: XArray.
+ * @xas: XArray operation state.
+ *
+ * Usually walks the @xas to the appropriate state to load the entry stored
+ * at xa_index.  However, it will do nothing and return NULL if @xas is
+ * holding an error.  If the xa_shift indicates we're operating on a
+ * multislot entry, it will terminate early and potentially return an
+ * internal entry.  xas_load() will never expand the tree (see xas_create()).
+ *
+ * The caller should hold the xa_lock or the RCU lock.
+ *
+ * Return: Usually an entry in the XArray, but see description for exceptions.
+ */
+void *xas_load(struct xarray *xa, struct xa_state *xas)
+{
+	void *entry = xas_start(xa, xas);
+
+	while (xa_is_node(entry)) {
+		struct xa_node *node = xa_to_node(entry);
+
+		if (xas->xa_shift > node->shift)
+			break;
+		entry = xas_descend(xa, xas, node);
+	}
+	return entry;
+}
+EXPORT_SYMBOL_GPL(xas_load);
+
+/**
+ * xa_load() - Load an entry from an XArray.
+ * @xa: XArray.
+ * @index: index into array.
+ *
+ * Return: The entry at @index in @xa.
+ */
+void *xa_load(struct xarray *xa, unsigned long index)
+{
+	XA_STATE(xas, index);
+	void *entry;
+
+	rcu_read_lock();
+	do {
+		entry = xas_load(xa, &xas);
+	} while (xas_retry(&xas, entry));
+	rcu_read_unlock();
+
+	return entry;
+}
+EXPORT_SYMBOL(xa_load);
diff --git a/tools/testing/radix-tree/.gitignore b/tools/testing/radix-tree/.gitignore
index d4706c0ffceb..833136896b91 100644
--- a/tools/testing/radix-tree/.gitignore
+++ b/tools/testing/radix-tree/.gitignore
@@ -4,3 +4,5 @@ idr-test
 main
 multiorder
 radix-tree.c
+xarray.c
+xarray-test
diff --git a/tools/testing/radix-tree/Makefile b/tools/testing/radix-tree/Makefile
index ebb12224e258..d0ef117941e6 100644
--- a/tools/testing/radix-tree/Makefile
+++ b/tools/testing/radix-tree/Makefile
@@ -3,10 +3,11 @@
 CFLAGS += -I. -I../../include -g -O2 -Wall -D_LGPL_SOURCE -fsanitize=address
 LDFLAGS += -fsanitize=address
 LDLIBS+= -lpthread -lurcu
-TARGETS = main idr-test multiorder
-CORE_OFILES := radix-tree.o idr.o linux.o test.o find_bit.o
+TARGETS = main idr-test multiorder xarray-test
+CORE_OFILES := radix-tree.o idr.o xarray.o linux.o test.o find_bit.o
 OFILES = main.o $(CORE_OFILES) regression1.o regression2.o regression3.o \
-	 tag_check.o multiorder.o idr-test.o iteration_check.o benchmark.o
+	 tag_check.o multiorder.o idr-test.o iteration_check.o benchmark.o \
+	 xarray-test.o
 
 ifndef SHIFT
 	SHIFT=3
@@ -23,6 +24,8 @@ main:	$(OFILES)
 
 idr-test: idr-test.o $(CORE_OFILES)
 
+xarray-test: xarray-test.o $(CORE_OFILES)
+
 multiorder: multiorder.o $(CORE_OFILES)
 
 clean:
@@ -42,6 +45,9 @@ radix-tree.c: ../../../lib/radix-tree.c
 idr.c: ../../../lib/idr.c
 	sed -e 's/^static //' -e 's/__always_inline //' -e 's/inline //' < $< > $@
 
+xarray.c: ../../../lib/xarray.c
+	sed -e 's/^static //' -e 's/__always_inline //' -e 's/inline //' < $< > $@
+
 .PHONY: mapshift
 
 mapshift:
diff --git a/tools/testing/radix-tree/linux/radix-tree.h b/tools/testing/radix-tree/linux/radix-tree.h
index de3f655caca3..24f13d27a8da 100644
--- a/tools/testing/radix-tree/linux/radix-tree.h
+++ b/tools/testing/radix-tree/linux/radix-tree.h
@@ -4,7 +4,6 @@
 
 #include "generated/map-shift.h"
 #include "../../../../include/linux/radix-tree.h"
-#include <linux/xarray.h>
 
 extern int kmalloc_verbose;
 extern int test_verbose;
diff --git a/tools/testing/radix-tree/linux/rcupdate.h b/tools/testing/radix-tree/linux/rcupdate.h
index 73ed33658203..25010bf86c1d 100644
--- a/tools/testing/radix-tree/linux/rcupdate.h
+++ b/tools/testing/radix-tree/linux/rcupdate.h
@@ -6,5 +6,6 @@
 
 #define rcu_dereference_raw(p) rcu_dereference(p)
 #define rcu_dereference_protected(p, cond) rcu_dereference(p)
+#define rcu_dereference_check(p, cond) rcu_dereference(p)
 
 #endif
diff --git a/tools/testing/radix-tree/linux/xarray.h b/tools/testing/radix-tree/linux/xarray.h
new file mode 100644
index 000000000000..4440878501c4
--- /dev/null
+++ b/tools/testing/radix-tree/linux/xarray.h
@@ -0,0 +1,2 @@
+#include "../generated/map-shift.h"
+#include "../../../../include/linux/xarray.h"
diff --git a/tools/testing/radix-tree/xarray-test.c b/tools/testing/radix-tree/xarray-test.c
new file mode 100644
index 000000000000..3f8f19cb3739
--- /dev/null
+++ b/tools/testing/radix-tree/xarray-test.c
@@ -0,0 +1,56 @@
+/*
+ * xarray-test.c: Test the XArray API
+ * Copyright (c) 2017 Microsoft Corporation <mawilcox@microsoft.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+#include <linux/bitmap.h>
+#include <linux/xarray.h>
+#include <linux/slab.h>
+#include <linux/kernel.h>
+#include <linux/errno.h>
+
+#include "test.h"
+
+void check_xa_load(struct xarray *xa)
+{
+	unsigned long i, j;
+
+	for (i = 0; i < 1024; i++) {
+		for (j = 0; j < 1024; j++) {
+			void *entry = xa_load(xa, j);
+			if (j < i)
+				assert(xa_to_value(entry) == j);
+			else
+				assert(!entry);
+		}
+		radix_tree_insert(xa, i, xa_mk_value(i));
+	}
+}
+
+void xarray_checks(void)
+{
+	RADIX_TREE(array, GFP_KERNEL);
+
+	check_xa_load(&array);
+
+	item_kill_tree(&array);
+}
+
+int __weak main(void)
+{
+	radix_tree_init();
+	xarray_checks();
+	radix_tree_cpu_dead(1);
+	rcu_barrier();
+	if (nr_allocated)
+		printf("nr_allocated = %d\n", nr_allocated);
+	return 0;
+}
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 21/62] xarray: Add xa_get_tag, xa_set_tag and xa_clear_tag
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:06   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:06 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

XArray tags are slightly more strongly typed than the radix tree tags,
but occupy the same bits.  This commit also adds the xas_ family of tag
operations, for cases where the caller is already holding the lock, and
xa_tagged() to ask whether any array member has a particular tag set.
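
As an illustration only (this snippet is not part of the diff below), the
normal API could be used along the following lines, assuming @xa already
holds an entry at index 7:

	/* Hypothetical helper, purely for illustration. */
	static void example_tag_usage(struct xarray *xa)
	{
		xa_set_tag(xa, 7, XA_TAG_0);	/* takes the xa_lock internally */

		if (xa_get_tag(xa, 7, XA_TAG_0)) {
			/* xa_tagged() reports on the array as a whole */
			WARN_ON(!xa_tagged(xa, XA_TAG_0));
			xa_clear_tag(xa, 7, XA_TAG_0);
		}
	}

Callers which already hold the xa_lock would use __xa_set_tag() and
__xa_clear_tag(), or the xas_ operations, instead.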

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 include/linux/xarray.h |  38 +++++++-
 lib/radix-tree.c       |  52 +++++-----
 lib/xarray.c           | 250 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 313 insertions(+), 27 deletions(-)

diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index 0e736d2db049..ab6b1f5e685a 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -48,6 +48,7 @@
 
 #include <linux/bug.h>
 #include <linux/compiler.h>
+#include <linux/gfp.h>
 #include <linux/kernel.h>
 #include <linux/rcupdate.h>
 #include <linux/spinlock.h>
@@ -82,6 +83,33 @@ struct xarray {
 
 void *xa_load(struct xarray *, unsigned long index);
 
+typedef unsigned __bitwise xa_tag_t;
+#define XA_TAG_0		((__force xa_tag_t)0U)
+#define XA_TAG_1		((__force xa_tag_t)1U)
+#define XA_TAG_2		((__force xa_tag_t)2U)
+#define XA_NO_TAG		((__force xa_tag_t)4U)
+
+#define XA_TAG_MAX		XA_TAG_2
+#define XA_FREE_TAG		XA_TAG_0
+#define XA_FLAGS_TAG(tag)	((__force gfp_t)((2U << __GFP_BITS_SHIFT) << \
+				(__force unsigned)(tag)))
+
+/**
+ * xa_tagged() - Inquire whether any entry in this array has a tag set.
+ * @xa: XArray.
+ * @tag: Tag value.
+ *
+ * Return: True if any entry has this tag set, false if no entry does.
+ */
+static inline bool xa_tagged(const struct xarray *xa, xa_tag_t tag)
+{
+	return xa->xa_flags & XA_FLAGS_TAG(tag);
+}
+
+bool xa_get_tag(struct xarray *, unsigned long index, xa_tag_t);
+void *xa_set_tag(struct xarray *, unsigned long index, xa_tag_t);
+void *xa_clear_tag(struct xarray *, unsigned long index, xa_tag_t);
+
 #define BITS_PER_XA_VALUE	(BITS_PER_LONG - 1)
 
 /**
@@ -133,6 +161,10 @@ static inline bool xa_is_value(void *entry)
 				spin_unlock_irqrestore(&(xa)->xa_lock, flags)
 #define xa_lock_held(xa)	lockdep_is_held(&(xa)->xa_lock)
 
+/* Versions of the normal API which require the caller to hold the xa_lock */
+void *__xa_set_tag(struct xarray *, unsigned long index, xa_tag_t);
+void *__xa_clear_tag(struct xarray *, unsigned long index, xa_tag_t);
+
 #ifdef XA_DEBUG
 void xa_dump(const struct xarray *);
 void xa_dump_node(const struct xa_node *);
@@ -175,7 +207,7 @@ struct xa_node {
 	unsigned char	offset;		/* Slot offset in parent */
 	unsigned char	count;		/* Total entry count */
 	unsigned char	exceptional;	/* Exceptional entry count */
-	struct xa_node *parent;		/* Used when ascending tree */
+	struct xa_node __rcu *parent;	/* Used when ascending tree */
 	struct xarray *	root;		/* The tree we belong to */
 	union {
 		struct list_head private_list;	/* For tree user */
@@ -355,6 +387,10 @@ static inline bool xas_retry(struct xa_state *xas, void *entry)
 
 void *xas_load(struct xarray *, struct xa_state *);
 
+bool xas_get_tag(const struct xarray *, const struct xa_state *, xa_tag_t);
+void xas_set_tag(struct xarray *, const struct xa_state *, xa_tag_t);
+void xas_clear_tag(struct xarray *, const struct xa_state *, xa_tag_t);
+
 /**
  * xas_reload() - Refetch an entry from the xarray.
  * @xa: XArray.
diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index 930eb7d298d7..711a6d9b79fc 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -126,19 +126,19 @@ static inline gfp_t root_gfp_mask(const struct radix_tree_root *root)
 	return root->xa_flags & __GFP_BITS_MASK;
 }
 
-static inline void tag_set(struct radix_tree_node *node, unsigned int tag,
+static inline void rtag_set(struct radix_tree_node *node, unsigned int tag,
 		int offset)
 {
 	__set_bit(offset, node->tags[tag]);
 }
 
-static inline void tag_clear(struct radix_tree_node *node, unsigned int tag,
+static inline void rtag_clear(struct radix_tree_node *node, unsigned int tag,
 		int offset)
 {
 	__clear_bit(offset, node->tags[tag]);
 }
 
-static inline int tag_get(const struct radix_tree_node *node, unsigned int tag,
+static inline int rtag_get(const struct radix_tree_node *node, unsigned int tag,
 		int offset)
 {
 	return test_bit(offset, node->tags[tag]);
@@ -617,14 +617,14 @@ static int radix_tree_extend(struct radix_tree_root *root, gfp_t gfp,
 		if (is_idr(root)) {
 			all_tag_set(node, IDR_FREE);
 			if (!root_tag_get(root, IDR_FREE)) {
-				tag_clear(node, IDR_FREE, 0);
+				rtag_clear(node, IDR_FREE, 0);
 				root_tag_set(root, IDR_FREE);
 			}
 		} else {
 			/* Propagate the aggregated tag info to the new child */
 			for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++) {
 				if (root_tag_get(root, tag))
-					tag_set(node, tag, 0);
+					rtag_set(node, tag, 0);
 			}
 		}
 
@@ -689,7 +689,7 @@ static inline bool radix_tree_shrink(struct radix_tree_root *root,
 		 * one (root->xa_head) as far as dependent read barriers go.
 		 */
 		root->xa_head = (void __rcu *)child;
-		if (is_idr(root) && !tag_get(node, IDR_FREE, 0))
+		if (is_idr(root) && !rtag_get(node, IDR_FREE, 0))
 			root_tag_clear(root, IDR_FREE);
 
 		/*
@@ -896,7 +896,7 @@ static inline int insert_entries(struct radix_tree_node *node,
 			if (replace) {
 				node->count--;
 				for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++)
-					if (tag_get(node, tag, offset + i))
+					if (rtag_get(node, tag, offset + i))
 						tags |= 1 << tag;
 			} else
 				return -EEXIST;
@@ -909,12 +909,12 @@ static inline int insert_entries(struct radix_tree_node *node,
 			rcu_assign_pointer(slot[i], sibling);
 			for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++)
 				if (tags & (1 << tag))
-					tag_clear(node, tag, offset + i);
+					rtag_clear(node, tag, offset + i);
 		} else {
 			rcu_assign_pointer(slot[i], item);
 			for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++)
 				if (tags & (1 << tag))
-					tag_set(node, tag, offset);
+					rtag_set(node, tag, offset);
 		}
 		if (xa_is_node(old))
 			radix_tree_free_nodes(old);
@@ -972,9 +972,9 @@ int __radix_tree_insert(struct radix_tree_root *root, unsigned long index,
 
 	if (node) {
 		unsigned offset = get_slot_offset(node, slot);
-		BUG_ON(tag_get(node, 0, offset));
-		BUG_ON(tag_get(node, 1, offset));
-		BUG_ON(tag_get(node, 2, offset));
+		BUG_ON(rtag_get(node, 0, offset));
+		BUG_ON(rtag_get(node, 1, offset));
+		BUG_ON(rtag_get(node, 2, offset));
 	} else {
 		BUG_ON(root_tags_get(root));
 	}
@@ -1110,7 +1110,7 @@ static bool node_tag_get(const struct radix_tree_root *root,
 				unsigned int tag, unsigned int offset)
 {
 	if (node)
-		return tag_get(node, tag, offset);
+		return rtag_get(node, tag, offset);
 	return root_tag_get(root, tag);
 }
 
@@ -1280,7 +1280,7 @@ int radix_tree_split(struct radix_tree_root *root, unsigned long index,
 	offset = get_slot_offset(parent, slot);
 
 	for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++)
-		if (tag_get(parent, tag, offset))
+		if (rtag_get(parent, tag, offset))
 			tags |= 1 << tag;
 
 	for (end = offset + 1; end < RADIX_TREE_MAP_SIZE; end++) {
@@ -1288,7 +1288,7 @@ int radix_tree_split(struct radix_tree_root *root, unsigned long index,
 			break;
 		for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++)
 			if (tags & (1 << tag))
-				tag_set(parent, tag, end);
+				rtag_set(parent, tag, end);
 		/* rcu_assign_pointer ensures tags are set before RETRY */
 		rcu_assign_pointer(parent->slots[end], RADIX_TREE_RETRY);
 	}
@@ -1319,7 +1319,7 @@ int radix_tree_split(struct radix_tree_root *root, unsigned long index,
 							node_to_entry(child));
 				for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++)
 					if (tags & (1 << tag))
-						tag_set(node, tag, offset);
+						rtag_set(node, tag, offset);
 			}
 
 			node = child;
@@ -1333,7 +1333,7 @@ int radix_tree_split(struct radix_tree_root *root, unsigned long index,
 
 		for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++)
 			if (tags & (1 << tag))
-				tag_set(node, tag, offset);
+				rtag_set(node, tag, offset);
 		offset += n;
 
 		while (offset == RADIX_TREE_MAP_SIZE) {
@@ -1363,9 +1363,9 @@ static void node_tag_set(struct radix_tree_root *root,
 				unsigned int tag, unsigned int offset)
 {
 	while (node) {
-		if (tag_get(node, tag, offset))
+		if (rtag_get(node, tag, offset))
 			return;
-		tag_set(node, tag, offset);
+		rtag_set(node, tag, offset);
 		offset = node->offset;
 		node = node->parent;
 	}
@@ -1403,8 +1403,8 @@ void *radix_tree_tag_set(struct radix_tree_root *root,
 		offset = radix_tree_descend(parent, &node, index);
 		BUG_ON(!node);
 
-		if (!tag_get(parent, tag, offset))
-			tag_set(parent, tag, offset);
+		if (!rtag_get(parent, tag, offset))
+			rtag_set(parent, tag, offset);
 	}
 
 	/* set the root's tag bit */
@@ -1432,9 +1432,9 @@ static void node_tag_clear(struct radix_tree_root *root,
 				unsigned int tag, unsigned int offset)
 {
 	while (node) {
-		if (!tag_get(node, tag, offset))
+		if (!rtag_get(node, tag, offset))
 			return;
-		tag_clear(node, tag, offset);
+		rtag_clear(node, tag, offset);
 		if (any_tag_set(node, tag))
 			return;
 
@@ -1532,7 +1532,7 @@ int radix_tree_tag_get(const struct radix_tree_root *root,
 		parent = entry_to_node(node);
 		offset = radix_tree_descend(parent, &node, index);
 
-		if (!tag_get(parent, tag, offset))
+		if (!rtag_get(parent, tag, offset))
 			return 0;
 		if (node == RADIX_TREE_RETRY)
 			break;
@@ -1721,7 +1721,7 @@ void __rcu **radix_tree_next_chunk(const struct radix_tree_root *root,
 		offset = radix_tree_descend(node, &child, index);
 
 		if ((flags & RADIX_TREE_ITER_TAGGED) ?
-				!tag_get(node, tag, offset) : !child) {
+				!rtag_get(node, tag, offset) : !child) {
 			/* Hole detected */
 			if (flags & RADIX_TREE_ITER_CONTIG)
 				return NULL;
@@ -2143,7 +2143,7 @@ void __rcu **idr_get_free(struct radix_tree_root *root,
 
 		node = entry_to_node(child);
 		offset = radix_tree_descend(node, &child, start);
-		if (!tag_get(node, IDR_FREE, offset)) {
+		if (!rtag_get(node, IDR_FREE, offset)) {
 			offset = radix_tree_find_next_bit(node, IDR_FREE,
 							offset + 1);
 			start = next_index(start, node, offset);
diff --git a/lib/xarray.c b/lib/xarray.c
index 1f7d30a8b61f..fbc7de5a224f 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -33,6 +33,53 @@
  * @entry refers to something stored in a slot in the xarray
  */
 
+static inline struct xa_node *xa_parent(struct xarray *xa,
+					const struct xa_node *node)
+{
+	return rcu_dereference_check(node->parent, xa_lock_held(xa));
+}
+
+static inline struct xa_node *xa_parent_locked(struct xarray *xa,
+					const struct xa_node *node)
+{
+	return rcu_dereference_protected(node->parent, xa_lock_held(xa));
+}
+
+static inline void xa_tag_set(struct xarray *xa, xa_tag_t tag)
+{
+	if (!(xa->xa_flags & XA_FLAGS_TAG(tag)))
+		xa->xa_flags |= XA_FLAGS_TAG(tag);
+}
+
+static inline void xa_tag_clear(struct xarray *xa, xa_tag_t tag)
+{
+	if (xa->xa_flags & XA_FLAGS_TAG(tag))
+		xa->xa_flags &= ~(XA_FLAGS_TAG(tag));
+}
+
+static inline bool tag_get(const struct xa_node *node, unsigned int offset,
+				xa_tag_t tag)
+{
+	return test_bit(offset, node->tags[(__force unsigned)tag]);
+}
+
+static inline void tag_set(struct xa_node *node, unsigned int offset,
+				xa_tag_t tag)
+{
+	__set_bit(offset, node->tags[(__force unsigned)tag]);
+}
+
+static inline void tag_clear(struct xa_node *node, unsigned int offset,
+				xa_tag_t tag)
+{
+	__clear_bit(offset, node->tags[(__force unsigned)tag]);
+}
+
+static inline bool tag_any_set(struct xa_node *node, xa_tag_t tag)
+{
+	return !bitmap_empty(node->tags[(__force unsigned)tag], XA_CHUNK_SIZE);
+}
+
 /* extracts the offset within this node from the index */
 static unsigned int get_offset(unsigned long index, struct xa_node *node)
 {
@@ -114,6 +161,89 @@ void *xas_load(struct xarray *xa, struct xa_state *xas)
 }
 EXPORT_SYMBOL_GPL(xas_load);
 
+/**
+ * xas_get_tag() - Returns the state of this tag.
+ * @xa: XArray.
+ * @xas: XArray operation state.
+ * @tag: Tag number.
+ *
+ * Return: true if the tag is set, false if the tag is clear or @xas
+ * is in an error state.
+ */
+bool xas_get_tag(const struct xarray *xa, const struct xa_state *xas,
+			xa_tag_t tag)
+{
+	if (xas_invalid(xas))
+		return false;
+	if (!xas->xa_node)
+		return xa_tagged(xa, tag);
+	return tag_get(xas->xa_node, xas->xa_offset, tag);
+}
+EXPORT_SYMBOL_GPL(xas_get_tag);
+
+/**
+ * xas_set_tag() - Sets the tag on this entry and its parents.
+ * @xa: XArray.
+ * @xas: XArray operation state.
+ * @tag: Tag number.
+ *
+ * Sets the specified tag on this entry, and walks up the tree setting it
+ * on all the ancestor entries.  Does nothing if @xas has not been walked to
+ * an entry, or is in an error state.
+ */
+void xas_set_tag(struct xarray *xa, const struct xa_state *xas, xa_tag_t tag)
+{
+	struct xa_node *node = xas->xa_node;
+	unsigned int offset = xas->xa_offset;
+
+	if (xas_invalid(xas))
+		return;
+
+	while (node) {
+		if (tag_get(node, offset, tag))
+			return;
+		tag_set(node, offset, tag);
+		offset = node->offset;
+		node = xa_parent_locked(xa, node);
+	}
+
+	if (!xa_tagged(xa, tag))
+		xa_tag_set(xa, tag);
+}
+EXPORT_SYMBOL_GPL(xas_set_tag);
+
+/**
+ * xas_clear_tag() - Clears the tag on this entry and its parents.
+ * @xa: XArray.
+ * @xas: XArray operation state.
+ * @tag: Tag number.
+ *
+ * Clears the specified tag on this entry, and walks back to the head
+ * attempting to clear it on all the ancestor entries.  Does nothing if
+ * @xas has not been walked to an entry, or is in an error state.
+ */
+void xas_clear_tag(struct xarray *xa, const struct xa_state *xas, xa_tag_t tag)
+{
+	struct xa_node *node = xas->xa_node;
+	unsigned int offset = xas->xa_offset;
+
+	if (xas_invalid(xas))
+		return;
+
+	while (node) {
+		tag_clear(node, offset, tag);
+		if (tag_any_set(node, tag))
+			return;
+
+		offset = node->offset;
+		node = xa_parent_locked(xa, node);
+	}
+
+	if (xa_tagged(xa, tag))
+		xa_tag_clear(xa, tag);
+}
+EXPORT_SYMBOL_GPL(xas_clear_tag);
+
 /**
  * xa_load() - Load an entry from an XArray.
  * @xa: XArray.
@@ -135,3 +265,123 @@ void *xa_load(struct xarray *xa, unsigned long index)
 	return entry;
 }
 EXPORT_SYMBOL(xa_load);
+
+/**
+ * __xa_set_tag() - Set this tag on this entry.
+ * @xa: XArray.
+ * @index: Index of entry.
+ * @tag: Tag number.
+ *
+ * Attempting to set a tag on a NULL entry does not succeed.
+ * This function expects the xa_lock to be held on entry.
+ *
+ * Return: The entry at this index.
+ */
+void *__xa_set_tag(struct xarray *xa, unsigned long index, xa_tag_t tag)
+{
+	XA_STATE(xas, index);
+	void *entry = xas_load(xa, &xas);
+
+	if (entry)
+		xas_set_tag(xa, &xas, tag);
+
+	return entry;
+}
+EXPORT_SYMBOL_GPL(__xa_set_tag);
+
+/**
+ * __xa_clear_tag() - Clear this tag on this entry.
+ * @xa: XArray.
+ * @index: Index of entry.
+ * @tag: Tag number.
+ *
+ * This function expects the xa_lock to be held on entry.
+ *
+ * Return: The entry at this index.
+ */
+void *__xa_clear_tag(struct xarray *xa, unsigned long index, xa_tag_t tag)
+{
+	XA_STATE(xas, index);
+	void *entry = xas_load(xa, &xas);
+
+	if (entry)
+		xas_clear_tag(xa, &xas, tag);
+
+	return entry;
+}
+EXPORT_SYMBOL_GPL(__xa_clear_tag);
+
+/**
+ * xa_get_tag() - Inquire whether this tag is set on this entry.
+ * @xa: XArray.
+ * @index: Index of entry.
+ * @tag: Tag number.
+ *
+ * This function uses the RCU read lock, so the result may be out of date
+ * by the time it returns.  If you need the result to be stable, use a lock.
+ *
+ * Return: True if the entry at @index has this tag set, false if it doesn't.
+ */
+bool xa_get_tag(struct xarray *xa, unsigned long index, xa_tag_t tag)
+{
+	XA_STATE(xas, index);
+	void *entry;
+
+	rcu_read_lock();
+	entry = xas_start(xa, &xas);
+	while (xas_get_tag(xa, &xas, tag)) {
+		if (!xa_is_node(entry))
+			goto found;
+		entry = xas_descend(xa, &xas, xa_to_node(entry));
+	}
+	rcu_read_unlock();
+	return false;
+ found:
+	rcu_read_unlock();
+	return true;
+}
+EXPORT_SYMBOL(xa_get_tag);
+
+/**
+ * xa_set_tag() - Set this tag on this entry.
+ * @xa: XArray.
+ * @index: Index of entry.
+ * @tag: Tag number.
+ *
+ * Attempting to set a tag on a NULL entry does not succeed.
+ *
+ * Return: The entry at this index.
+ */
+void *xa_set_tag(struct xarray *xa, unsigned long index, xa_tag_t tag)
+{
+	unsigned long flags;
+	void *entry;
+
+	xa_lock_irqsave(xa, flags);
+	entry = __xa_set_tag(xa, index, tag);
+	xa_unlock_irqrestore(xa, flags);
+
+	return entry;
+}
+EXPORT_SYMBOL(xa_set_tag);
+
+/**
+ * xa_clear_tag() - Clear this tag on this entry.
+ * @xa: XArray.
+ * @index: Index of entry.
+ * @tag: Tag number.
+ *
+ * Return: The entry at this index.
+ */
+void *xa_clear_tag(struct xarray *xa, unsigned long index, xa_tag_t tag)
+{
+	unsigned long flags;
+	void *entry;
+
+	xa_lock_irqsave(xa, flags);
+	entry = __xa_clear_tag(xa, index, tag);
+	xa_unlock_irqrestore(xa, flags);
+
+	return entry;
+}
+EXPORT_SYMBOL(xa_clear_tag);
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 21/62] xarray: Add xa_get_tag, xa_set_tag and xa_clear_tag
@ 2017-11-22 21:06   ` Matthew Wilcox
  0 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:06 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

XArray tags are slightly more strongly typed than the radix tree tags,
but occupy the same bits.  This commit also adds the xas_ family of tag
operations, for cases where the caller is already holding the lock, and
xa_tagged() to ask whether any array member has a particular tag set.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 include/linux/xarray.h |  38 +++++++-
 lib/radix-tree.c       |  52 +++++-----
 lib/xarray.c           | 250 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 313 insertions(+), 27 deletions(-)

diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index 0e736d2db049..ab6b1f5e685a 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -48,6 +48,7 @@
 
 #include <linux/bug.h>
 #include <linux/compiler.h>
+#include <linux/gfp.h>
 #include <linux/kernel.h>
 #include <linux/rcupdate.h>
 #include <linux/spinlock.h>
@@ -82,6 +83,33 @@ struct xarray {
 
 void *xa_load(struct xarray *, unsigned long index);
 
+typedef unsigned __bitwise xa_tag_t;
+#define XA_TAG_0		((__force xa_tag_t)0U)
+#define XA_TAG_1		((__force xa_tag_t)1U)
+#define XA_TAG_2		((__force xa_tag_t)2U)
+#define XA_NO_TAG		((__force xa_tag_t)4U)
+
+#define XA_TAG_MAX		XA_TAG_2
+#define XA_FREE_TAG		XA_TAG_0
+#define XA_FLAGS_TAG(tag)	((__force gfp_t)((2U << __GFP_BITS_SHIFT) << \
+				(__force unsigned)(tag)))
+
+/**
+ * xa_tagged() - Inquire whether any entry in this array has a tag set
+ * @xa: Array
+ * @tag: Tag value
+ *
+ * Return: True if any entry has this tag set, false if no entry does.
+ */
+static inline bool xa_tagged(const struct xarray *xa, xa_tag_t tag)
+{
+	return xa->xa_flags & XA_FLAGS_TAG(tag);
+}
+
+bool xa_get_tag(struct xarray *, unsigned long index, xa_tag_t);
+void *xa_set_tag(struct xarray *, unsigned long index, xa_tag_t);
+void *xa_clear_tag(struct xarray *, unsigned long index, xa_tag_t);
+
 #define BITS_PER_XA_VALUE	(BITS_PER_LONG - 1)
 
 /**
@@ -133,6 +161,10 @@ static inline bool xa_is_value(void *entry)
 				spin_unlock_irqrestore(&(xa)->xa_lock, flags)
 #define xa_lock_held(xa)	lockdep_is_held(&(xa)->xa_lock)
 
+/* Versions of the normal API which require the caller to hold the xa_lock */
+void *__xa_set_tag(struct xarray *, unsigned long index, xa_tag_t);
+void *__xa_clear_tag(struct xarray *, unsigned long index, xa_tag_t);
+
 #ifdef XA_DEBUG
 void xa_dump(const struct xarray *);
 void xa_dump_node(const struct xa_node *);
@@ -175,7 +207,7 @@ struct xa_node {
 	unsigned char	offset;		/* Slot offset in parent */
 	unsigned char	count;		/* Total entry count */
 	unsigned char	exceptional;	/* Exceptional entry count */
-	struct xa_node *parent;		/* Used when ascending tree */
+	struct xa_node __rcu *parent;	/* Used when ascending tree */
 	struct xarray *	root;		/* The tree we belong to */
 	union {
 		struct list_head private_list;	/* For tree user */
@@ -355,6 +387,10 @@ static inline bool xas_retry(struct xa_state *xas, void *entry)
 
 void *xas_load(struct xarray *, struct xa_state *);
 
+bool xas_get_tag(const struct xarray *, const struct xa_state *, xa_tag_t);
+void xas_set_tag(struct xarray *, const struct xa_state *, xa_tag_t);
+void xas_clear_tag(struct xarray *, const struct xa_state *, xa_tag_t);
+
 /**
  * xas_reload() - Refetch an entry from the xarray.
  * @xa: XArray.
diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index 930eb7d298d7..711a6d9b79fc 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -126,19 +126,19 @@ static inline gfp_t root_gfp_mask(const struct radix_tree_root *root)
 	return root->xa_flags & __GFP_BITS_MASK;
 }
 
-static inline void tag_set(struct radix_tree_node *node, unsigned int tag,
+static inline void rtag_set(struct radix_tree_node *node, unsigned int tag,
 		int offset)
 {
 	__set_bit(offset, node->tags[tag]);
 }
 
-static inline void tag_clear(struct radix_tree_node *node, unsigned int tag,
+static inline void rtag_clear(struct radix_tree_node *node, unsigned int tag,
 		int offset)
 {
 	__clear_bit(offset, node->tags[tag]);
 }
 
-static inline int tag_get(const struct radix_tree_node *node, unsigned int tag,
+static inline int rtag_get(const struct radix_tree_node *node, unsigned int tag,
 		int offset)
 {
 	return test_bit(offset, node->tags[tag]);
@@ -617,14 +617,14 @@ static int radix_tree_extend(struct radix_tree_root *root, gfp_t gfp,
 		if (is_idr(root)) {
 			all_tag_set(node, IDR_FREE);
 			if (!root_tag_get(root, IDR_FREE)) {
-				tag_clear(node, IDR_FREE, 0);
+				rtag_clear(node, IDR_FREE, 0);
 				root_tag_set(root, IDR_FREE);
 			}
 		} else {
 			/* Propagate the aggregated tag info to the new child */
 			for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++) {
 				if (root_tag_get(root, tag))
-					tag_set(node, tag, 0);
+					rtag_set(node, tag, 0);
 			}
 		}
 
@@ -689,7 +689,7 @@ static inline bool radix_tree_shrink(struct radix_tree_root *root,
 		 * one (root->xa_head) as far as dependent read barriers go.
 		 */
 		root->xa_head = (void __rcu *)child;
-		if (is_idr(root) && !tag_get(node, IDR_FREE, 0))
+		if (is_idr(root) && !rtag_get(node, IDR_FREE, 0))
 			root_tag_clear(root, IDR_FREE);
 
 		/*
@@ -896,7 +896,7 @@ static inline int insert_entries(struct radix_tree_node *node,
 			if (replace) {
 				node->count--;
 				for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++)
-					if (tag_get(node, tag, offset + i))
+					if (rtag_get(node, tag, offset + i))
 						tags |= 1 << tag;
 			} else
 				return -EEXIST;
@@ -909,12 +909,12 @@ static inline int insert_entries(struct radix_tree_node *node,
 			rcu_assign_pointer(slot[i], sibling);
 			for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++)
 				if (tags & (1 << tag))
-					tag_clear(node, tag, offset + i);
+					rtag_clear(node, tag, offset + i);
 		} else {
 			rcu_assign_pointer(slot[i], item);
 			for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++)
 				if (tags & (1 << tag))
-					tag_set(node, tag, offset);
+					rtag_set(node, tag, offset);
 		}
 		if (xa_is_node(old))
 			radix_tree_free_nodes(old);
@@ -972,9 +972,9 @@ int __radix_tree_insert(struct radix_tree_root *root, unsigned long index,
 
 	if (node) {
 		unsigned offset = get_slot_offset(node, slot);
-		BUG_ON(tag_get(node, 0, offset));
-		BUG_ON(tag_get(node, 1, offset));
-		BUG_ON(tag_get(node, 2, offset));
+		BUG_ON(rtag_get(node, 0, offset));
+		BUG_ON(rtag_get(node, 1, offset));
+		BUG_ON(rtag_get(node, 2, offset));
 	} else {
 		BUG_ON(root_tags_get(root));
 	}
@@ -1110,7 +1110,7 @@ static bool node_tag_get(const struct radix_tree_root *root,
 				unsigned int tag, unsigned int offset)
 {
 	if (node)
-		return tag_get(node, tag, offset);
+		return rtag_get(node, tag, offset);
 	return root_tag_get(root, tag);
 }
 
@@ -1280,7 +1280,7 @@ int radix_tree_split(struct radix_tree_root *root, unsigned long index,
 	offset = get_slot_offset(parent, slot);
 
 	for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++)
-		if (tag_get(parent, tag, offset))
+		if (rtag_get(parent, tag, offset))
 			tags |= 1 << tag;
 
 	for (end = offset + 1; end < RADIX_TREE_MAP_SIZE; end++) {
@@ -1288,7 +1288,7 @@ int radix_tree_split(struct radix_tree_root *root, unsigned long index,
 			break;
 		for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++)
 			if (tags & (1 << tag))
-				tag_set(parent, tag, end);
+				rtag_set(parent, tag, end);
 		/* rcu_assign_pointer ensures tags are set before RETRY */
 		rcu_assign_pointer(parent->slots[end], RADIX_TREE_RETRY);
 	}
@@ -1319,7 +1319,7 @@ int radix_tree_split(struct radix_tree_root *root, unsigned long index,
 							node_to_entry(child));
 				for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++)
 					if (tags & (1 << tag))
-						tag_set(node, tag, offset);
+						rtag_set(node, tag, offset);
 			}
 
 			node = child;
@@ -1333,7 +1333,7 @@ int radix_tree_split(struct radix_tree_root *root, unsigned long index,
 
 		for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++)
 			if (tags & (1 << tag))
-				tag_set(node, tag, offset);
+				rtag_set(node, tag, offset);
 		offset += n;
 
 		while (offset == RADIX_TREE_MAP_SIZE) {
@@ -1363,9 +1363,9 @@ static void node_tag_set(struct radix_tree_root *root,
 				unsigned int tag, unsigned int offset)
 {
 	while (node) {
-		if (tag_get(node, tag, offset))
+		if (rtag_get(node, tag, offset))
 			return;
-		tag_set(node, tag, offset);
+		rtag_set(node, tag, offset);
 		offset = node->offset;
 		node = node->parent;
 	}
@@ -1403,8 +1403,8 @@ void *radix_tree_tag_set(struct radix_tree_root *root,
 		offset = radix_tree_descend(parent, &node, index);
 		BUG_ON(!node);
 
-		if (!tag_get(parent, tag, offset))
-			tag_set(parent, tag, offset);
+		if (!rtag_get(parent, tag, offset))
+			rtag_set(parent, tag, offset);
 	}
 
 	/* set the root's tag bit */
@@ -1432,9 +1432,9 @@ static void node_tag_clear(struct radix_tree_root *root,
 				unsigned int tag, unsigned int offset)
 {
 	while (node) {
-		if (!tag_get(node, tag, offset))
+		if (!rtag_get(node, tag, offset))
 			return;
-		tag_clear(node, tag, offset);
+		rtag_clear(node, tag, offset);
 		if (any_tag_set(node, tag))
 			return;
 
@@ -1532,7 +1532,7 @@ int radix_tree_tag_get(const struct radix_tree_root *root,
 		parent = entry_to_node(node);
 		offset = radix_tree_descend(parent, &node, index);
 
-		if (!tag_get(parent, tag, offset))
+		if (!rtag_get(parent, tag, offset))
 			return 0;
 		if (node == RADIX_TREE_RETRY)
 			break;
@@ -1721,7 +1721,7 @@ void __rcu **radix_tree_next_chunk(const struct radix_tree_root *root,
 		offset = radix_tree_descend(node, &child, index);
 
 		if ((flags & RADIX_TREE_ITER_TAGGED) ?
-				!tag_get(node, tag, offset) : !child) {
+				!rtag_get(node, tag, offset) : !child) {
 			/* Hole detected */
 			if (flags & RADIX_TREE_ITER_CONTIG)
 				return NULL;
@@ -2143,7 +2143,7 @@ void __rcu **idr_get_free(struct radix_tree_root *root,
 
 		node = entry_to_node(child);
 		offset = radix_tree_descend(node, &child, start);
-		if (!tag_get(node, IDR_FREE, offset)) {
+		if (!rtag_get(node, IDR_FREE, offset)) {
 			offset = radix_tree_find_next_bit(node, IDR_FREE,
 							offset + 1);
 			start = next_index(start, node, offset);
diff --git a/lib/xarray.c b/lib/xarray.c
index 1f7d30a8b61f..fbc7de5a224f 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -33,6 +33,53 @@
  * @entry refers to something stored in a slot in the xarray
  */
 
+static inline struct xa_node *xa_parent(struct xarray *xa,
+					const struct xa_node *node)
+{
+	return rcu_dereference_check(node->parent, xa_lock_held(xa));
+}
+
+static inline struct xa_node *xa_parent_locked(struct xarray *xa,
+					const struct xa_node *node)
+{
+	return rcu_dereference_protected(node->parent, xa_lock_held(xa));
+}
+
+static inline void xa_tag_set(struct xarray *xa, xa_tag_t tag)
+{
+	if (!(xa->xa_flags & XA_FLAGS_TAG(tag)))
+		xa->xa_flags |= XA_FLAGS_TAG(tag);
+}
+
+static inline void xa_tag_clear(struct xarray *xa, xa_tag_t tag)
+{
+	if (xa->xa_flags & XA_FLAGS_TAG(tag))
+		xa->xa_flags &= ~(XA_FLAGS_TAG(tag));
+}
+
+static inline bool tag_get(const struct xa_node *node, unsigned int offset,
+				xa_tag_t tag)
+{
+	return test_bit(offset, node->tags[(__force unsigned)tag]);
+}
+
+static inline void tag_set(struct xa_node *node, unsigned int offset,
+				xa_tag_t tag)
+{
+	__set_bit(offset, node->tags[(__force unsigned)tag]);
+}
+
+static inline void tag_clear(struct xa_node *node, unsigned int offset,
+				xa_tag_t tag)
+{
+	__clear_bit(offset, node->tags[(__force unsigned)tag]);
+}
+
+static inline bool tag_any_set(struct xa_node *node, xa_tag_t tag)
+{
+	return !bitmap_empty(node->tags[(__force unsigned)tag], XA_CHUNK_SIZE);
+}
+
 /* extracts the offset within this node from the index */
 static unsigned int get_offset(unsigned long index, struct xa_node *node)
 {
@@ -114,6 +161,89 @@ void *xas_load(struct xarray *xa, struct xa_state *xas)
 }
 EXPORT_SYMBOL_GPL(xas_load);
 
+/**
+ * xas_get_tag() - Returns the state of this tag.
+ * @xa: XArray.
+ * @xas: XArray operation state.
+ * @tag: Tag number.
+ *
+ * Return: true if the tag is set, false if the tag is clear or @xas
+ * is in an error state.
+ */
+bool xas_get_tag(const struct xarray *xa, const struct xa_state *xas,
+			xa_tag_t tag)
+{
+	if (xas_invalid(xas))
+		return false;
+	if (!xas->xa_node)
+		return xa_tagged(xa, tag);
+	return tag_get(xas->xa_node, xas->xa_offset, tag);
+}
+EXPORT_SYMBOL_GPL(xas_get_tag);
+
+/**
+ * xas_set_tag() - Sets the tag on this entry and its parents.
+ * @xa: XArray.
+ * @xas: XArray operation state.
+ * @tag: Tag number.
+ *
+ * Sets the specified tag on this entry, and walks up the tree setting it
+ * on all the ancestor entries.  Does nothing if @xas has not been walked to
+ * an entry, or is in an error state.
+ */
+void xas_set_tag(struct xarray *xa, const struct xa_state *xas, xa_tag_t tag)
+{
+	struct xa_node *node = xas->xa_node;
+	unsigned int offset = xas->xa_offset;
+
+	if (xas_invalid(xas))
+		return;
+
+	while (node) {
+		if (tag_get(node, offset, tag))
+			return;
+		tag_set(node, offset, tag);
+		offset = node->offset;
+		node = xa_parent_locked(xa, node);
+	}
+
+	if (!xa_tagged(xa, tag))
+		xa_tag_set(xa, tag);
+}
+EXPORT_SYMBOL_GPL(xas_set_tag);
+
+/**
+ * xas_clear_tag() - Clears the tag on this entry and its parents.
+ * @xa: XArray.
+ * @xas: XArray operation state.
+ * @tag: Tag number.
+ *
+ * Clears the specified tag on this entry, and walks back to the head
+ * attempting to clear it on all the ancestor entries.  Does nothing if
+ * @xas has not been walked to an entry, or is in an error state.
+ */
+void xas_clear_tag(struct xarray *xa, const struct xa_state *xas, xa_tag_t tag)
+{
+	struct xa_node *node = xas->xa_node;
+	unsigned int offset = xas->xa_offset;
+
+	if (xas_invalid(xas))
+		return;
+
+	while (node) {
+		tag_clear(node, offset, tag);
+		if (tag_any_set(node, tag))
+			return;
+
+		offset = node->offset;
+		node = xa_parent_locked(xa, node);
+	}
+
+	if (xa_tagged(xa, tag))
+		xa_tag_clear(xa, tag);
+}
+EXPORT_SYMBOL_GPL(xas_clear_tag);
+
 /**
  * xa_load() - Load an entry from an XArray.
  * @xa: XArray.
@@ -135,3 +265,123 @@ void *xa_load(struct xarray *xa, unsigned long index)
 	return entry;
 }
 EXPORT_SYMBOL(xa_load);
+
+/**
+ * __xa_set_tag() - Set this tag on this entry.
+ * @xa: XArray.
+ * @index: Index of entry.
+ * @tag: Tag number.
+ *
+ * Attempting to set a tag on a NULL entry does not succeed.
+ * This function expects the xa_lock to be held on entry.
+ *
+ * Return: The entry at this index.
+ */
+void *__xa_set_tag(struct xarray *xa, unsigned long index, xa_tag_t tag)
+{
+	XA_STATE(xas, index);
+	void *entry = xas_load(xa, &xas);
+
+	if (entry)
+		xas_set_tag(xa, &xas, tag);
+
+	return entry;
+}
+EXPORT_SYMBOL_GPL(__xa_set_tag);
+
+/**
+ * __xa_clear_tag() - Clear this tag on this entry.
+ * @xa: XArray.
+ * @index: Index of entry.
+ * @tag: Tag number.
+ *
+ * This function expects the xa_lock to be held on entry.
+ *
+ * Return: The entry at this index.
+ */
+void *__xa_clear_tag(struct xarray *xa, unsigned long index, xa_tag_t tag)
+{
+	XA_STATE(xas, index);
+	void *entry = xas_load(xa, &xas);
+
+	if (entry)
+		xas_clear_tag(xa, &xas, tag);
+
+	return entry;
+}
+EXPORT_SYMBOL_GPL(__xa_clear_tag);
+
+/**
+ * xa_get_tag() - Inquire whether this tag is set on this entry.
+ * @xa: XArray.
+ * @index: Index of entry.
+ * @tag: Tag number.
+ *
+ * This function uses the RCU read lock, so the result may be out of date
+ * by the time it returns.  If you need the result to be stable, use a lock.
+ *
+ * Return: True if the entry at @index has this tag set, false if it doesn't.
+ */
+bool xa_get_tag(struct xarray *xa, unsigned long index, xa_tag_t tag)
+{
+	XA_STATE(xas, index);
+	void *entry;
+
+	rcu_read_lock();
+	entry = xas_start(xa, &xas);
+	while (xas_get_tag(xa, &xas, tag)) {
+		if (!xa_is_node(entry))
+			goto found;
+		entry = xas_descend(xa, &xas, xa_to_node(entry));
+	}
+	rcu_read_unlock();
+	return false;
+ found:
+	rcu_read_unlock();
+	return true;
+}
+EXPORT_SYMBOL(xa_get_tag);
+
+/**
+ * xa_set_tag() - Set this tag on this entry.
+ * @xa: XArray.
+ * @index: Index of entry.
+ * @tag: Tag number.
+ *
+ * Attempting to set a tag on a NULL entry does not succeed.
+ *
+ * Return: The entry at this index.
+ */
+void *xa_set_tag(struct xarray *xa, unsigned long index, xa_tag_t tag)
+{
+	unsigned long flags;
+	void *entry;
+
+	xa_lock_irqsave(xa, flags);
+	entry = __xa_set_tag(xa, index, tag);
+	xa_unlock_irqrestore(xa, flags);
+
+	return entry;
+}
+EXPORT_SYMBOL(xa_set_tag);
+
+/**
+ * xa_clear_tag() - Clear this tag on this entry.
+ * @xa: XArray.
+ * @index: Index of entry.
+ * @tag: Tag number.
+ *
+ * Return: The entry at this index.
+ */
+void *xa_clear_tag(struct xarray *xa, unsigned long index, xa_tag_t tag)
+{
+	unsigned long flags;
+	void *entry;
+
+	xa_lock_irqsave(xa, flags);
+	entry = __xa_clear_tag(xa, index, tag);
+	xa_unlock_irqrestore(xa, flags);
+
+	return entry;
+}
+EXPORT_SYMBOL(xa_clear_tag);
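
For illustration, a minimal use of the convenience functions above (not
part of the patch; 'xa' and 'index' are placeholders, and an entry is
assumed to already be stored at 'index'):

	xa_set_tag(xa, index, XA_TAG_0);
	if (xa_get_tag(xa, index, XA_TAG_0)) {
		/* The entry at 'index' now has XA_TAG_0 set. */
	}
	xa_clear_tag(xa, index, XA_TAG_0);
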
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 22/62] xarray: Add xa_store
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:06   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:06 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

xa_store() differs from radix_tree_insert() in that it will overwrite an
existing element in the array rather than returning an error.  This is
the behaviour which most users want, and those that want more complex
behaviour generally want to use the xas family of routines anyway.

For memory allocation, xa_store() will first attempt to request memory
from the slab allocator; if memory is not immediately available, it will
drop the xa_lock and allocate memory, keeping a pointer in the xa_state.
It does not use the per-CPU caches, although those will continue to exist
until all radix tree users have been converted to the XArray.
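
For illustration, the equivalent open-coded form of a single
xa_store(xa, index, item, GFP_KERNEL) call, mirroring the loop inside
xa_store() itself ('xa', 'index' and 'item' are placeholders):

	XA_STATE(xas, index);
	unsigned long flags;
	void *curr;

	do {
		xa_lock_irqsave(xa, flags);
		curr = xas_store(xa, &xas, item);
		xa_unlock_irqrestore(xa, flags);
	} while (xas_nomem(&xas, GFP_KERNEL));
	xas_destroy(&xas);
	/* curr is the old entry; on failure, xas_error(&xas) holds the errno. */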

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 include/linux/xarray.h                    |  58 ++++
 lib/radix-tree.c                          |   4 +-
 lib/xarray.c                              | 552 ++++++++++++++++++++++++++++++
 tools/testing/radix-tree/linux/rcupdate.h |   1 +
 tools/testing/radix-tree/xarray-test.c    |  66 ++++
 5 files changed, 679 insertions(+), 2 deletions(-)

diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index ab6b1f5e685a..5e975c512018 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -82,6 +82,18 @@ struct xarray {
 #define DEFINE_XARRAY(name) struct xarray name = XARRAY_INIT(name)
 
 void *xa_load(struct xarray *, unsigned long index);
+void *xa_store(struct xarray *, unsigned long index, void *entry, gfp_t);
+
+/**
+ * xa_empty() - Determine if an array has any present entries
+ * @xa: Array
+ *
+ * Return: True if the array has no entries in it.
+ */
+static inline bool xa_empty(const struct xarray *xa)
+{
+	return xa->xa_head == NULL;
+}
 
 typedef unsigned __bitwise xa_tag_t;
 #define XA_TAG_0		((__force xa_tag_t)0U)
@@ -91,9 +103,15 @@ typedef unsigned __bitwise xa_tag_t;
 
 #define XA_TAG_MAX		XA_TAG_2
 #define XA_FREE_TAG		XA_TAG_0
+#define XA_FLAGS_TRACK_FREE	((__force gfp_t)(1U << __GFP_BITS_SHIFT))
 #define XA_FLAGS_TAG(tag)	((__force gfp_t)((2U << __GFP_BITS_SHIFT) << \
 				(__force unsigned)(tag)))
 
+static inline bool xa_track_free(const struct xarray *xa)
+{
+	return xa->xa_flags & XA_FLAGS_TRACK_FREE;
+}
+
 /**
  * xa_tagged() - Inquire whether any entry in this array has a tag set
  * @xa: Array
@@ -263,6 +281,11 @@ static inline bool xa_is_internal(void *entry)
 	return ((unsigned long)entry & 3) == 2;
 }
 
+static inline void *xa_mk_node(struct xa_node *node)
+{
+	return (void *)((unsigned long)node | 2);
+}
+
 static inline struct xa_node *xa_to_node(void *entry)
 {
 	return (struct xa_node *)((unsigned long)entry & ~3UL);
@@ -386,10 +409,16 @@ static inline bool xas_retry(struct xa_state *xas, void *entry)
 }
 
 void *xas_load(struct xarray *, struct xa_state *);
+void *xas_store(struct xarray *, struct xa_state *, void *entry);
+void *xas_create(struct xarray *, struct xa_state *);
 
 bool xas_get_tag(const struct xarray *, const struct xa_state *, xa_tag_t);
 void xas_set_tag(struct xarray *, const struct xa_state *, xa_tag_t);
 void xas_clear_tag(struct xarray *, const struct xa_state *, xa_tag_t);
+void xas_init_tags(struct xarray *, const struct xa_state *);
+
+void xas_destroy(struct xa_state *);
+bool xas_nomem(struct xa_state *, gfp_t);
 
 /**
  * xas_reload() - Refetch an entry from the xarray.
@@ -415,4 +444,33 @@ static inline void *xas_reload(struct xarray *xa, struct xa_state *xas)
 	return xa_head(xa);
 }
 
+/**
+ * xas_set() - Set up XArray operation state for a different index.
+ * @xas: XArray operation state.
+ * @index: New index into the XArray.
+ *
+ * Move the operation state to refer to a different index.  This will
+ * have the effect of starting a walk from the top; see xas_next()
+ * to move to an adjacent index.
+ */
+static inline void xas_set(struct xa_state *xas, unsigned long index)
+{
+	xas->xa_index = index;
+	xas->xa_node = XAS_RESTART;
+}
+
+/**
+ * xas_set_order() - Set up XArray operation state for a multislot entry.
+ * @xas: XArray operation state.
+ * @index: Target of the operation.
+ * @order: Entry occupies 2^@order indices.
+ */
+static inline void xas_set_order(struct xa_state *xas, unsigned long index,
+					unsigned int order)
+{
+	xas->xa_index = (index >> order) << order;
+	xas->xa_shift = order - (order % XA_CHUNK_SHIFT);
+	xas->xa_sibs = (1 << (order % XA_CHUNK_SHIFT)) - 1;
+	xas->xa_node = XAS_RESTART;
+}
 #endif /* _LINUX_XARRAY_H */
diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index 711a6d9b79fc..507e1842255b 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -46,7 +46,7 @@ static unsigned long height_to_maxnodes[RADIX_TREE_MAX_PATH + 1] __read_mostly;
 /*
  * Radix tree node cache.
  */
-static struct kmem_cache *radix_tree_node_cachep;
+struct kmem_cache *radix_tree_node_cachep;
 
 /*
  * The radix tree is variable-height, so an insert operation not only has
@@ -407,7 +407,7 @@ radix_tree_node_alloc(gfp_t gfp_mask, struct radix_tree_node *parent,
 	return ret;
 }
 
-static void radix_tree_node_rcu_free(struct rcu_head *head)
+void radix_tree_node_rcu_free(struct rcu_head *head)
 {
 	struct radix_tree_node *node =
 			container_of(head, struct radix_tree_node, rcu_head);
diff --git a/lib/xarray.c b/lib/xarray.c
index fbc7de5a224f..a3f4f4ab673f 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -12,9 +12,11 @@
  * more details.
  */
 
+#include <linux/bitmap.h>
 #include <linux/export.h>
 #include <linux/gfp.h>
 #include <linux/radix-tree.h>
+#include <linux/slab.h>
 #include <linux/xarray.h>
 
 /*
@@ -75,11 +77,20 @@ static inline void tag_clear(struct xa_node *node, unsigned int offset,
 	__clear_bit(offset, node->tags[(__force unsigned)tag]);
 }
 
+static inline void tag_set_all(struct xa_node *node, xa_tag_t tag)
+{
+	bitmap_fill(node->tags[(__force unsigned)tag], XA_CHUNK_SIZE);
+}
+
 static inline bool tag_any_set(struct xa_node *node, xa_tag_t tag)
 {
 	return !bitmap_empty(node->tags[(__force unsigned)tag], XA_CHUNK_SIZE);
 }
 
+#define tag_inc(tag) do { \
+	tag = (__force xa_tag_t)((__force unsigned)(tag) + 1); \
+} while (0)
+
 /* extracts the offset within this node from the index */
 static unsigned int get_offset(unsigned long index, struct xa_node *node)
 {
@@ -161,6 +172,481 @@ void *xas_load(struct xarray *xa, struct xa_state *xas)
 }
 EXPORT_SYMBOL_GPL(xas_load);
 
+/* Move the radix tree node cache here */
+extern struct kmem_cache *radix_tree_node_cachep;
+extern void radix_tree_node_rcu_free(struct rcu_head *head);
+
+static void xa_node_free(struct xa_node *node)
+{
+	XA_BUG_ON(node, !list_empty(&node->private_list));
+	call_rcu(&node->rcu_head, radix_tree_node_rcu_free);
+}
+
+/**
+ * xas_destroy() - Free any resources allocated during the XArray operation.
+ * @xas: XArray operation state.
+ *
+ * If the operation only involved read accesses to the XArray or modifying
+ * existing data in the XArray, there is no need to call this function
+ * (eg xa_set_tag()).  However, if you may have allocated memory (for
+ * example by calling xas_nomem()), then call this function.
+ *
+ * This function only releases resources; it does not reinitialise the
+ * state, so you may continue to call xas_error(), and you should call
+ * xas_init() before reusing this structure.
+ */
+void xas_destroy(struct xa_state *xas)
+{
+	struct xa_node *node = xas->xa_alloc;
+
+	if (!node)
+		return;
+	XA_BUG_ON(node, !list_empty(&node->private_list));
+	kmem_cache_free(radix_tree_node_cachep, node);
+	xas->xa_alloc = NULL;
+}
+EXPORT_SYMBOL_GPL(xas_destroy);
+
+/**
+ * xas_nomem() - Allocate memory if needed.
+ * @xas: XArray operation state.
+ * @gfp: Memory allocation flags.
+ *
+ * If we need to add new nodes to the XArray, we try to allocate memory
+ * with GFP_NOWAIT while holding the lock, which will usually succeed.
+ * If it fails, @xas is flagged as needing memory to continue.  The caller
+ * should drop the lock and call xas_nomem().  If xas_nomem() succeeds,
+ * the caller should retry the operation.
+ *
+ * Forward progress is guaranteed as one node is allocated here and
+ * stored in the xa_state where it will be found by xas_alloc().  More
+ * nodes will likely be found in the slab allocator, but we do not tie
+ * them up here.
+ *
+ * Return: true if memory was needed, and was successfully allocated.
+ */
+bool xas_nomem(struct xa_state *xas, gfp_t gfp)
+{
+	if (xas->xa_node != XAS_ERROR(ENOMEM))
+		return false;
+	xas->xa_alloc = kmem_cache_alloc(radix_tree_node_cachep, gfp);
+	if (!xas->xa_alloc)
+		return false;
+	XA_BUG_ON(xas->xa_alloc, !list_empty(&xas->xa_alloc->private_list));
+	xas->xa_node = XAS_RESTART;
+	return true;
+}
+EXPORT_SYMBOL_GPL(xas_nomem);
+
+static void *xas_alloc(struct xarray *xa, struct xa_state *xas,
+			unsigned int shift)
+{
+	struct xa_node *parent = xas->xa_node;
+	struct xa_node *node = xas->xa_alloc;
+
+	if (xas_invalid(xas))
+		return NULL;
+
+	if (node) {
+		xas->xa_alloc = NULL;
+	} else {
+		node = kmem_cache_alloc(radix_tree_node_cachep,
+					GFP_NOWAIT | __GFP_NOWARN);
+		if (!node) {
+			xas_set_err(xas, ENOMEM);
+			return NULL;
+		}
+	}
+
+	if (xas->xa_node) {
+		node->offset = xas->xa_offset;
+		parent->count++;
+		XA_BUG_ON(node, parent->count > XA_CHUNK_SIZE);
+	}
+	XA_BUG_ON(node, shift > BITS_PER_LONG);
+	XA_BUG_ON(node, !list_empty(&node->private_list));
+	node->shift = shift;
+	node->count = 0;
+	node->exceptional = 0;
+	RCU_INIT_POINTER(node->parent, xas->xa_node);
+	node->root = xa;
+
+	return node;
+}
+
+/*
+ * Use this to calculate the maximum index that will need to be created
+ * in order to add the entry described by @xas.  Because we cannot store a
+ * multiple-slot entry at index 0, the calculation is a little more complex
+ * than you might expect.
+ */
+static unsigned long xas_max(struct xa_state *xas)
+{
+	unsigned long mask, max = xas->xa_index;
+
+	if (xas->xa_shift || xas->xa_sibs) {
+		mask = (((xas->xa_sibs + 1UL) << xas->xa_shift) - 1);
+		max |= mask;
+		if (mask == max)
+			max++;
+	}
+
+	return max;
+}
+
+/* The maximum index that can be contained in the array without expanding it */
+static unsigned long max_index(void *entry)
+{
+	if (!xa_is_node(entry))
+		return 0;
+	return (XA_CHUNK_SIZE << xa_to_node(entry)->shift) - 1;
+}
+
+static void xas_shrink(struct xarray *xa, const struct xa_state *xas)
+{
+	struct xa_node *node = xas->xa_node;
+
+	for (;;) {
+		void *entry;
+
+		XA_BUG_ON(node, node->count > XA_CHUNK_SIZE);
+		if (node->count != 1)
+			break;
+		entry = xa_entry_locked(xa, node, 0);
+		if (!entry)
+			break;
+		if (!xa_is_node(entry) && node->shift)
+			break;
+
+		RCU_INIT_POINTER(xa->xa_head, entry);
+		if (xa_track_free(xa) && !tag_get(node, 0, XA_FREE_TAG))
+			xa_tag_clear(xa, XA_FREE_TAG);
+
+		node->count = 0;
+		node->exceptional = 0;
+		if (xa_is_node(entry))
+			RCU_INIT_POINTER(node->slots[0], XA_RETRY_ENTRY);
+		XA_BUG_ON(node, !list_empty(&node->private_list));
+		xa_node_free(node);
+		if (!xa_is_node(entry))
+			break;
+		node = xa_to_node(entry);
+		if (xas->xa_update)
+			xas->xa_update(node);
+		else
+			XA_BUG_ON(node, !list_empty(&node->private_list));
+	}
+}
+
+/*
+ * xas_delete_node() - Attempt to delete an xa_node
+ * @xa: Array
+ * @xas: Array operation state.
+ *
+ * Attempts to delete the @xas->xa_node.  This will fail if the node still
+ * contains entries (that is, its count is non-zero).
+ */
+static void xas_delete_node(struct xarray *xa, struct xa_state *xas)
+{
+	struct xa_node *node = xas->xa_node;
+
+	for (;;) {
+		struct xa_node *parent;
+
+		XA_BUG_ON(node, node->count > XA_CHUNK_SIZE);
+		if (node->count)
+			break;
+
+		parent = xa_parent_locked(xa, node);
+		xas->xa_node = parent;
+		xas->xa_offset = node->offset;
+		XA_BUG_ON(node, !list_empty(&node->private_list));
+		xa_node_free(node);
+
+		if (!parent) {
+			xa->xa_head = NULL;
+			xas->xa_node = XAS_RESTART;
+			return;
+		}
+
+		parent->slots[xas->xa_offset] = NULL;
+		parent->count--;
+		XA_BUG_ON(node, parent->count > XA_CHUNK_SIZE);
+		node = parent;
+		if (xas->xa_update)
+			xas->xa_update(node);
+		else
+			XA_BUG_ON(node, !list_empty(&node->private_list));
+	}
+
+	if (!node->parent)
+		xas_shrink(xa, xas);
+}
+
+/**
+ * xas_free_nodes() - Free this node and all nodes that it references
+ * @xa: Array
+ * @xas: Array operation state.
+ * @top: Node to free
+ *
+ * This node has been removed from the tree.  We must now free it and all
+ * of its subnodes.  There may be RCU walkers with references into the tree,
+ * so we must replace all entries with retry markers.
+ */
+static void xas_free_nodes(struct xarray *xa, struct xa_state *xas,
+		struct xa_node *top)
+{
+	unsigned int offset = 0;
+	struct xa_node *node = top;
+
+	for (;;) {
+		void *entry = xa_entry_locked(xa, node, offset);
+
+		if (xa_is_node(entry)) {
+			node = xa_to_node(entry);
+			offset = 0;
+			continue;
+		}
+		if (entry)
+			RCU_INIT_POINTER(node->slots[offset], XA_RETRY_ENTRY);
+		offset++;
+		while (offset == XA_CHUNK_SIZE) {
+			struct xa_node *parent = xa_parent_locked(xa, node);
+
+			offset = node->offset + 1;
+			node->count = 0;
+			node->exceptional = 0;
+			if (xas->xa_update)
+				xas->xa_update(node);
+			XA_BUG_ON(node, !list_empty(&node->private_list));
+			xa_node_free(node);
+			if (node == top)
+				return;
+			node = parent;
+		}
+	}
+}
+
+/*
+ * xas_expand() adds nodes to the head of the tree until it has reached
+ * sufficient height to contain @xas->xa_index.
+ */
+static int xas_expand(struct xarray *xa, struct xa_state *xas, void *head)
+{
+	struct xa_node *node = NULL;
+	unsigned int shift = 0;
+	unsigned long max = xas_max(xas);
+
+	if (!head) {
+		if (max == 0)
+			return 0;
+		while ((max >> shift) >= XA_CHUNK_SIZE)
+			shift += XA_CHUNK_SHIFT;
+		return shift + XA_CHUNK_SHIFT;
+	} else if (xa_is_node(head)) {
+		node = xa_to_node(head);
+		shift = node->shift + XA_CHUNK_SHIFT;
+	}
+	xas->xa_node = NULL;
+
+	while (max > max_index(head)) {
+		xa_tag_t tag = 0;
+
+		XA_BUG_ON(node, shift > BITS_PER_LONG);
+		node = xas_alloc(xa, xas, shift);
+		if (!node)
+			return -ENOMEM;
+
+		node->count = 1;
+		if (xa_is_value(head))
+			node->exceptional = 1;
+		RCU_INIT_POINTER(node->slots[0], head);
+
+		/* Propagate the aggregated tag info to the new child */
+		if (xa_track_free(xa)) {
+			tag_set_all(node, XA_FREE_TAG);
+			if (!xa_tagged(xa, XA_FREE_TAG)) {
+				tag_clear(node, 0, XA_FREE_TAG);
+				xa_tag_set(xa, XA_FREE_TAG);
+			}
+			tag_inc(tag);
+		}
+		for (;;) {
+			if (xa_tagged(xa, tag))
+				tag_set(node, 0, tag);
+			if (tag == XA_TAG_MAX)
+				break;
+			tag_inc(tag);
+		}
+
+		/*
+		 * Now that the new node is fully initialised, we can add
+		 * it to the tree
+		 */
+		if (xa_is_node(head)) {
+			xa_to_node(head)->offset = 0;
+			rcu_assign_pointer(xa_to_node(head)->parent, node);
+		}
+		head = xa_mk_node(node);
+		rcu_assign_pointer(xa->xa_head, head);
+
+		shift += XA_CHUNK_SHIFT;
+	}
+
+	xas->xa_node = node;
+	return shift;
+}
+
+/**
+ * xas_create() - Create a slot to store an entry in.
+ * @xa: XArray.
+ * @xas: XArray operation state.
+ *
+ * Most users will not need to call this function directly, as it is called
+ * by xas_store().  It is useful for doing conditional store operations
+ * (see the xa_cmpxchg() implementation for an example).
+ *
+ * Return: If the slot already existed, returns the contents of this slot.
+ * If the slot was newly created, returns NULL.  If it failed to create the
+ * slot, returns NULL and indicates the error in @xas.
+ */
+void *xas_create(struct xarray *xa, struct xa_state *xas)
+{
+	void *entry;
+	void __rcu **slot;
+	struct xa_node *node = xas->xa_node;
+	int shift;
+	unsigned int order = xas->xa_shift;
+
+	if (node == XAS_RESTART) {
+		entry = xa_head_locked(xa);
+		xas->xa_node = NULL;
+		shift = xas_expand(xa, xas, entry);
+		if (shift < 0)
+			return NULL;
+		entry = xa_head_locked(xa);
+		slot = &xa->xa_head;
+	} else if (xas_error(xas)) {
+		return NULL;
+	} else if (node) {
+		unsigned int offset = xas->xa_offset;
+
+		shift = node->shift;
+		entry = xa_entry_locked(xa, node, offset);
+		slot = &node->slots[offset];
+	} else {
+		shift = 0;
+		entry = xa_head_locked(xa);
+		slot = &xa->xa_head;
+	}
+
+	while (shift > order) {
+		shift -= XA_CHUNK_SHIFT;
+		if (!entry) {
+			node = xas_alloc(xa, xas, shift);
+			if (!node)
+				break;
+			if (xa_track_free(xa))
+				tag_set_all(node, XA_FREE_TAG);
+			rcu_assign_pointer(*slot, xa_mk_node(node));
+		} else if (xa_is_node(entry)) {
+			node = xa_to_node(entry);
+		} else {
+			break;
+		}
+		entry = xas_descend(xa, xas, node);
+		slot = &node->slots[xas->xa_offset];
+	}
+
+	return entry;
+}
+EXPORT_SYMBOL_GPL(xas_create);
+
+static void store_siblings(struct xarray *xa, struct xa_state *xas,
+				void *entry, int *countp, int *valuesp)
+{
+	struct xa_node *node = xas->xa_node;
+	unsigned int sibs, offset = xas->xa_offset;
+	void *sibling = entry ? xa_mk_sibling(offset) : NULL;
+	void *real = entry;
+
+	if (!entry)
+		sibs = XA_CHUNK_SIZE;
+	else if (xas->xa_shift < node->shift)
+		sibs = 0;
+	else
+		sibs = xas->xa_sibs;
+
+	while (sibs--) {
+		void *next = xa_entry(xa, node, ++offset);
+
+		if (!xa_is_sibling(next)) {
+			if (!entry)
+				break;
+			real = next;
+		}
+		RCU_INIT_POINTER(node->slots[offset], sibling);
+		if (xa_is_node(next))
+			xas_free_nodes(xa, xas, xa_to_node(next));
+		*countp += !next - !entry;
+		*valuesp += !xa_is_value(real) - !xa_is_value(entry);
+	}
+}
+
+/**
+ * xas_store() - Store this entry in the XArray.
+ * @xa: XArray.
+ * @xas: XArray operation state.
+ * @entry: New entry.
+ *
+ * Return: The old entry at this index.
+ */
+void *xas_store(struct xarray *xa, struct xa_state *xas, void *entry)
+{
+	struct xa_node *node;
+	int count, values;
+	void *curr;
+
+	if (entry)
+		curr = xas_create(xa, xas);
+	else
+		curr = xas_load(xa, xas);
+	if (xas_invalid(xas))
+		return curr;
+	if ((curr == entry) && !xas->xa_sibs)
+		return curr;
+
+	node = xas->xa_node;
+	if (node)
+		rcu_assign_pointer(node->slots[xas->xa_offset], entry);
+	else
+		rcu_assign_pointer(xa->xa_head, entry);
+	if (!entry)
+		xas_init_tags(xa, xas);
+
+	values = !xa_is_value(curr) - !xa_is_value(entry);
+	count = !curr - !entry;
+	if (xa_is_node(curr))
+		xas_free_nodes(xa, xas, xa_to_node(curr));
+
+	if (node) {
+		store_siblings(xa, xas, entry, &count, &values);
+		node->count += count;
+		XA_BUG_ON(node, node->count > XA_CHUNK_SIZE);
+		node->exceptional += values;
+		XA_BUG_ON(node, node->exceptional > XA_CHUNK_SIZE);
+		if ((count || values) && xas->xa_update)
+			xas->xa_update(node);
+		else
+			XA_BUG_ON(node, !list_empty(&node->private_list));
+		if (count < 0)
+			xas_delete_node(xa, xas);
+	}
+
+	return curr;
+}
+EXPORT_SYMBOL_GPL(xas_store);
+
 /**
  * xas_get_tag() - Returns the state of this tag.
  * @xa: XArray.
@@ -244,6 +730,35 @@ void xas_clear_tag(struct xarray *xa, const struct xa_state *xas, xa_tag_t tag)
 }
 EXPORT_SYMBOL_GPL(xas_clear_tag);
 
+/**
+ * xas_init_tags() - Initialise all tags for the entry
+ * @xa: Array
+ * @xas: Array operations state.
+ *
+ * Initialise all tags for the entry specified by @xas.  If we're tracking
+ * free entries with a tag, we need to set it on all entries.  All other
+ * tags are cleared.
+ *
+ * This implementation is not as efficient as it could be; we may walk
+ * up the tree multiple times.
+ */
+void xas_init_tags(struct xarray *xa, const struct xa_state *xas)
+{
+	xa_tag_t tag = 0;
+
+	if (xa_track_free(xa)) {
+		xas_set_tag(xa, xas, XA_FREE_TAG);
+		tag_inc(tag);
+	}
+	for (;;) {
+		xas_clear_tag(xa, xas, tag);
+		if (tag == XA_TAG_MAX)
+			break;
+		tag_inc(tag);
+	}
+}
+EXPORT_SYMBOL_GPL(xas_init_tags);
+
 /**
  * xa_load() - Load an entry from an XArray.
  * @xa: XArray.
@@ -266,6 +781,43 @@ void *xa_load(struct xarray *xa, unsigned long index)
 }
 EXPORT_SYMBOL(xa_load);
 
+/**
+ * xa_store() - Store this entry in the XArray.
+ * @xa: XArray.
+ * @index: Index into array.
+ * @entry: New entry.
+ * @gfp: Allocation flags.
+ *
+ * Stores almost always succeed.  The notable exceptions:
+ *  - Attempted to store a reserved pointer entry (-EINVAL)
+ *  - Ran out of memory trying to allocate new nodes (-ENOMEM)
+ *
+ * Storing into an existing multislot entry updates the entry at every
+ * index that the entry occupies.
+ *
+ * Return: The old entry at this index or ERR_PTR() if an error happened.
+ */
+void *xa_store(struct xarray *xa, unsigned long index, void *entry, gfp_t gfp)
+{
+	XA_STATE(xas, index);
+	unsigned long flags;
+	void *curr;
+
+	if (WARN_ON_ONCE(xa_is_internal(entry)))
+		return ERR_PTR(-EINVAL);
+
+	do {
+		xa_lock_irqsave(xa, flags);
+		curr = xas_store(xa, &xas, entry);
+		xa_unlock_irqrestore(xa, flags);
+	} while (xas_nomem(&xas, gfp));
+	xas_destroy(&xas);
+
+	if (xas_error(&xas))
+		curr = ERR_PTR(xas_error(&xas));
+	return curr;
+}
+EXPORT_SYMBOL(xa_store);
+
 /**
  * __xa_set_tag() - Set this tag on this entry.
  * @xa: XArray.
diff --git a/tools/testing/radix-tree/linux/rcupdate.h b/tools/testing/radix-tree/linux/rcupdate.h
index 25010bf86c1d..fd280b070fdb 100644
--- a/tools/testing/radix-tree/linux/rcupdate.h
+++ b/tools/testing/radix-tree/linux/rcupdate.h
@@ -7,5 +7,6 @@
 #define rcu_dereference_raw(p) rcu_dereference(p)
 #define rcu_dereference_protected(p, cond) rcu_dereference(p)
 #define rcu_dereference_check(p, cond) rcu_dereference(p)
+#define RCU_INIT_POINTER(p, v)	(p) = (v)
 
 #endif
diff --git a/tools/testing/radix-tree/xarray-test.c b/tools/testing/radix-tree/xarray-test.c
index 3f8f19cb3739..8412d7818152 100644
--- a/tools/testing/radix-tree/xarray-test.c
+++ b/tools/testing/radix-tree/xarray-test.c
@@ -35,12 +35,78 @@ void check_xa_load(struct xarray *xa)
 	}
 }
 
+static void *xa_store_order(struct xarray *xa, unsigned long index,
+				unsigned order, void *entry)
+{
+	XA_STATE(xas, 0);
+	void *curr;
+
+	xas_set_order(&xas, index, order);
+	do {
+		curr = xas_store(xa, &xas, entry);
+	} while (xas_nomem(&xas, GFP_KERNEL));
+	xas_destroy(&xas);
+
+	return curr;
+}
+
+void check_multi_store(struct xarray *xa)
+{
+	unsigned long i, j, k;
+
+	xa_store_order(xa, 0, 1, xa_mk_value(0));
+	assert(xa_load(xa, 0) == xa_mk_value(0));
+	assert(xa_load(xa, 1) == xa_mk_value(0));
+	assert(xa_load(xa, 2) == NULL);
+	assert(xa_to_node(xa_head(xa))->count == 2);
+	assert(xa_to_node(xa_head(xa))->exceptional == 2);
+
+	xa_store(xa, 3, xa, GFP_KERNEL);
+	assert(xa_load(xa, 0) == xa_mk_value(0));
+	assert(xa_load(xa, 1) == xa_mk_value(0));
+	assert(xa_load(xa, 2) == NULL);
+	assert(xa_to_node(xa_head(xa))->count == 3);
+	assert(xa_to_node(xa_head(xa))->exceptional == 2);
+
+	xa_store_order(xa, 0, 2, xa_mk_value(1));
+	assert(xa_load(xa, 0) == xa_mk_value(1));
+	assert(xa_load(xa, 1) == xa_mk_value(1));
+	assert(xa_load(xa, 2) == xa_mk_value(1));
+	assert(xa_load(xa, 3) == xa_mk_value(1));
+	assert(xa_load(xa, 4) == NULL);
+	assert(xa_to_node(xa_head(xa))->count == 4);
+	assert(xa_to_node(xa_head(xa))->exceptional == 4);
+
+	xa_store_order(xa, 0, 64, NULL);
+	assert(xa_empty(xa));
+
+	for (i = 0; i < 60; i++) {
+		for (j = 0; j < 60; j++) {
+			xa_store_order(xa, 0, i, xa_mk_value(i));
+			xa_store_order(xa, 0, j, xa_mk_value(j));
+
+			for (k = 0; k < 60; k++) {
+				void *entry = xa_load(xa, (1UL << k) - 1);
+				if ((i < k) && (j < k))
+					assert(entry == NULL);
+				else
+					assert(entry == xa_mk_value(j));
+			}
+
+			xa_store(xa, 0, NULL, GFP_KERNEL);
+			assert(xa_empty(xa));
+		}
+	}
+}
+
 void xarray_checks(void)
 {
 	RADIX_TREE(array, GFP_KERNEL);
 
 	check_xa_load(&array);
+	item_kill_tree(&array);
 
+	check_multi_store(&array);
 	item_kill_tree(&array);
 }
 
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 23/62] xarray: Add xa_cmpxchg
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:07   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:07 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

This works like doing cmpxchg() on an array entry.  Code which wants
the radix_tree_insert() semantic of not overwriting an existing entry
can cmpxchg() with NULL and get the action it wants.  Plus, instead of
having an error returned, they get the value currently stored in the
array, which often saves them a subsequent lookup.
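
As a sketch of the intended calling pattern ('xa', 'index' and 'item'
are placeholders):

	void *curr;

	/* Insert 'item' only if nothing is stored at 'index' yet. */
	curr = xa_cmpxchg(xa, index, NULL, item, GFP_KERNEL);
	if (IS_ERR(curr))
		return PTR_ERR(curr);
	if (curr) {
		/*
		 * Another entry was already present; 'curr' is that entry,
		 * so no separate xa_load() is needed to discover it.
		 */
	}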

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 include/linux/xarray.h |  2 ++
 lib/xarray.c           | 38 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 40 insertions(+)

diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index 5e975c512018..274dd7530e40 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -83,6 +83,8 @@ struct xarray {
 
 void *xa_load(struct xarray *, unsigned long index);
 void *xa_store(struct xarray *, unsigned long index, void *entry, gfp_t);
+void *xa_cmpxchg(struct xarray *, unsigned long index,
+			void *old, void *entry, gfp_t);
 
 /**
  * xa_empty() - Determine if an array has any present entries
diff --git a/lib/xarray.c b/lib/xarray.c
index a3f4f4ab673f..82f39d86fc76 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -818,6 +818,44 @@ void *xa_store(struct xarray *xa, unsigned long index, void *entry, gfp_t gfp)
 }
 EXPORT_SYMBOL(xa_store);
 
+/**
+ * xa_cmpxchg() - Conditionally replace an entry in the XArray.
+ * @xa: XArray.
+ * @index: Index into array.
+ * @old: Old value to test against.
+ * @entry: New value to place in array.
+ * @gfp: Allocation flags.
+ *
+ * If the entry at @index is the same as @old, replace it with @entry.
+ * If the return value is equal to @old, then the exchange was successful.
+ *
+ * Return: The old value at this index or ERR_PTR() if an error happened.
+ */
+void *xa_cmpxchg(struct xarray *xa, unsigned long index,
+			void *old, void *entry, gfp_t gfp)
+{
+	XA_STATE(xas, index);
+	unsigned long flags;
+	void *curr;
+
+	if (WARN_ON_ONCE(xa_is_internal(entry)))
+		return ERR_PTR(-EINVAL);
+
+	do {
+		xa_lock_irqsave(xa, flags);
+		curr = xas_create(xa, &xas);
+		if (curr == old)
+			xas_store(xa, &xas, entry);
+		xa_unlock_irqrestore(xa, flags);
+	} while (xas_nomem(&xas, gfp));
+	xas_destroy(&xas);
+
+	if (xas_error(&xas))
+		curr = ERR_PTR(xas_error(&xas));
+	return curr;
+}
+EXPORT_SYMBOL(xa_cmpxchg);
+
 /**
  * __xa_set_tag() - Set this tag on this entry.
  * @xa: XArray.
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 24/62] xarray: Add xa_for_each
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:07   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:07 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

This iterator allows the user to efficiently walk a range of the array,
executing the loop body once for each non-NULL entry in that range.
This commit also includes xa_find() and xa_next() which are helper
functions for xa_for_each() but may also be useful in their own right.

In the xas family of functions, we also have xas_for_each(),
xas_find(), xas_next(), xas_pause() and xas_jump().  xas_pause() allows
a xas_for_each() iteration to be resumed later from the next element
and xas_jump() allows an iteration to be resumed from a specified index.
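
As a sketch (not part of this patch; 'my_xa' is an assumed, already
populated xarray), a full walk with the convenience iterator looks like
this.  No explicit locking is needed because xa_find() and xa_next()
take the RCU read lock internally:

	void *entry;
	unsigned long index = 0;

	xa_for_each(&my_xa, entry, index, ULONG_MAX)
		pr_info("index %lu: entry %p\n", index, entry);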

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 include/linux/xarray.h |  96 +++++++++++++++++++++++++++
 lib/radix-tree.c       |   4 +-
 lib/xarray.c           | 173 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 271 insertions(+), 2 deletions(-)

diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index 274dd7530e40..08ddad60a43d 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -130,6 +130,35 @@ bool xa_get_tag(struct xarray *, unsigned long index, xa_tag_t);
 void *xa_set_tag(struct xarray *, unsigned long index, xa_tag_t);
 void *xa_clear_tag(struct xarray *, unsigned long index, xa_tag_t);
 
+void *xa_find(struct xarray *xa, unsigned long *index, unsigned long max);
+void *xa_next(struct xarray *xa, unsigned long *index, unsigned long max);
+
+/**
+ * xa_for_each() - Iterate over a portion of an XArray.
+ * @xa: XArray.
+ * @entry: Entry retrieved from array.
+ * @index: Index of @entry.
+ * @max: Maximum index to retrieve from array.
+ *
+ * Initialise @index to the minimum index you want to retrieve from
+ * the array.  During the iteration, @entry will have the value of the
+ * entry stored in @xa at @index.  The iteration will skip all NULL
+ * entries in the array.  You may modify @index during the
+ * iteration if you want to skip indices.  It is safe to modify the
+ * array during the iteration.  At the end of the iteration, @entry will
+ * be set to NULL and @index will have a value less than or equal to @max.
+ *
+ * xa_for_each() is O(n.log(n)) while xas_for_each() is O(n).  You have
+ * to handle your own locking with xas_for_each(), and if you have to unlock
+ * after each iteration, it will also end up being O(n.log(n)).  xa_for_each()
+ * will spin if it hits a retry entry; if you intend to see retry entries,
+ * you should use the xas_for_each() iterator instead.  The xas_for_each()
+ * iterator will expand into more inline code than xa_for_each().
+ */
+#define xa_for_each(xa, entry, index, max) \
+	for (entry = xa_find(xa, &index, max); entry; \
+	     entry = xa_next(xa, &index, max))
+
 #define BITS_PER_XA_VALUE	(BITS_PER_LONG - 1)
 
 /**
@@ -391,6 +420,11 @@ static inline bool xas_valid(const struct xa_state *xas)
 	return !xas_invalid(xas);
 }
 
+static inline bool xas_not_node(struct xa_node *node)
+{
+	return (unsigned long)node < 4096;
+}
+
 /**
  * xas_retry() - Handle a retry entry.
  * @xas: XArray operation state.
@@ -413,6 +447,7 @@ static inline bool xas_retry(struct xa_state *xas, void *entry)
 void *xas_load(struct xarray *, struct xa_state *);
 void *xas_store(struct xarray *, struct xa_state *, void *entry);
 void *xas_create(struct xarray *, struct xa_state *);
+void *xas_find(struct xarray *, struct xa_state *, unsigned long max);
 
 bool xas_get_tag(const struct xarray *, const struct xa_state *, xa_tag_t);
 void xas_set_tag(struct xarray *, const struct xa_state *, xa_tag_t);
@@ -421,6 +456,7 @@ void xas_init_tags(struct xarray *, const struct xa_state *);
 
 void xas_destroy(struct xa_state *);
 bool xas_nomem(struct xa_state *, gfp_t);
+void xas_pause(struct xa_state *);
 
 /**
  * xas_reload() - Refetch an entry from the xarray.
@@ -475,4 +511,64 @@ static inline void xas_set_order(struct xa_state *xas, unsigned long index,
 	xas->xa_sibs = (1 << (order % XA_CHUNK_SHIFT)) - 1;
 	xas->xa_node = XAS_RESTART;
 }
+
+/* Skip over any of these entries when iterating */
+static inline bool xa_iter_skip(void *entry)
+{
+	return unlikely(!entry ||
+			(xa_is_internal(entry) && entry < XA_RETRY_ENTRY));
+}
+
+/**
+ * xas_next() - Advance iterator to next present entry.
+ * @xa: XArray.
+ * @xas: XArray operation state.
+ * @max: Highest index to return.
+ *
+ * xas_next() is an inline function to optimise xarray traversal for speed.
+ * It is equivalent to calling xas_find(), and will call xas_find() for all
+ * the hard cases.
+ *
+ * Return: The next present entry after the one currently referred to by @xas.
+ */
+static inline void *xas_next(struct xarray *xa, struct xa_state *xas,
+					unsigned long max)
+{
+	struct xa_node *node = xas->xa_node;
+	void *entry;
+
+	if (unlikely(xas_not_node(node) || node->shift))
+		return xas_find(xa, xas, max);
+
+	do {
+		if (unlikely(xas->xa_index >= max))
+			return xas_find(xa, xas, max);
+		if (unlikely(xas->xa_offset == XA_CHUNK_MASK))
+			return xas_find(xa, xas, max);
+		xas->xa_index++;
+		xas->xa_offset++;
+		entry = xa_entry(xa, node, xas->xa_offset);
+	} while (xa_iter_skip(entry));
+
+	return entry;
+}
+
+/**
+ * xas_for_each() - Iterate over a range of an XArray
+ * @xa: XArray.
+ * @xas: XArray operation state.
+ * @entry: Entry retrieved from array.
+ * @max: Maximum index to retrieve from array.
+ *
+ * The loop body will be executed for each entry present in the xarray
+ * between the current xas position and @max.  @entry will be set to
+ * the entry retrieved from the xarray.  It is safe to delete entries
+ * from the array in the loop body.  You should hold either the RCU lock
+ * or the xa_lock while iterating.  If you need to drop the lock, call
+ * xas_pause() first.
+ */
+#define xas_for_each(xa, xas, entry, max) \
+	for (entry = xas_find(xa, xas, max); entry; \
+	     entry = xas_next(xa, xas, max))
+
 #endif /* _LINUX_XARRAY_H */
diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index 507e1842255b..29b38d447497 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -247,7 +247,7 @@ static inline unsigned long node_maxindex(const struct radix_tree_node *node)
 	return shift_maxindex(node->shift);
 }
 
-static unsigned long next_index(unsigned long index,
+static unsigned long rnext_index(unsigned long index,
 				const struct radix_tree_node *node,
 				unsigned long offset)
 {
@@ -2146,7 +2146,7 @@ void __rcu **idr_get_free(struct radix_tree_root *root,
 		if (!rtag_get(node, IDR_FREE, offset)) {
 			offset = radix_tree_find_next_bit(node, IDR_FREE,
 							offset + 1);
-			start = next_index(start, node, offset);
+			start = rnext_index(start, node, offset);
 			if (start > max)
 				return ERR_PTR(-ENOSPC);
 			while (offset == RADIX_TREE_MAP_SIZE) {
diff --git a/lib/xarray.c b/lib/xarray.c
index 82f39d86fc76..5409048e8b44 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -91,12 +91,24 @@ static inline bool tag_any_set(struct xa_node *node, xa_tag_t tag)
 	tag = (__force xa_tag_t)((__force unsigned)(tag) + 1); \
 } while (0)
 
+/* Returns the index of the next slot in this node.  @index may be unaligned. */
+static unsigned long next_index(unsigned long index, struct xa_node *node)
+{
+	return round_up(index + 1, 1UL << node->shift);
+}
+
 /* extracts the offset within this node from the index */
 static unsigned int get_offset(unsigned long index, struct xa_node *node)
 {
 	return (index >> node->shift) & XA_CHUNK_MASK;
 }
 
+static void xas_add(struct xa_state *xas, unsigned long val)
+{
+	xas->xa_index += (val << xas->xa_node->shift);
+	xas->xa_offset += val;
+}
+
 /*
  * Starts a walk.  If the @xas is already valid, we assume that it's on
  * the right path and just return where we've got to.  If we're in an
@@ -759,6 +771,103 @@ void xas_init_tags(struct xarray *xa, const struct xa_state *xas)
 }
 EXPORT_SYMBOL_GPL(xas_init_tags);
 
+/**
+ * xas_pause() - Pause a walk to drop a lock.
+ * @xas: XArray operation state.
+ *
+ * Some users need to pause a walk and drop the lock they're holding in
+ * order to yield to a higher priority thread or carry out an operation
+ * on an entry.  Those users should call this function before they drop
+ * the lock.  It resets the @xas to be suitable for the next iteration
+ * of the loop after the user has reacquired the lock.  If most entries
+ * found during a walk require you to call xas_pause(), the xa_for_each()
+ * iterator may be more appropriate.
+ *
+ * Note that xas_pause() only works for forward iteration.  If a user needs
+ * to pause a reverse iteration, we will need a xas_pause_rev().
+ */
+void xas_pause(struct xa_state *xas)
+{
+	struct xa_node *node = xas->xa_node;
+
+	if (node) {
+		unsigned int offset = xas->xa_offset;
+		xas->xa_index = next_index(xas->xa_index, node);
+		while (++offset < XA_CHUNK_SIZE) {
+			if (!xa_is_sibling(xa_entry(node->root, node, offset)))
+				break;
+			xas->xa_index += 1UL << node->shift;
+		}
+	} else {
+		xas->xa_index++;
+	}
+	xas->xa_node = XAS_RESTART;
+}
+EXPORT_SYMBOL_GPL(xas_pause);
+
+/**
+ * xas_find() - Find the next present entry in the XArray.
+ * @xa: XArray.
+ * @xas: XArray operation state.
+ * @max: Highest index to return.
+ *
+ * If the xas has not yet been walked to an entry, return the entry
+ * which has an index >= xas.xa_index.  If it has been walked, the entry
+ * currently being pointed at has been processed, and so we move to the
+ * next entry.
+ *
+ * If no entry is found and the array is smaller than @max, @xas is
+ * set to the restart state and xas->xa_index is set to the smallest index
+ * not yet in the array.  This allows @xas to be immediately passed to
+ * xas_create().
+ *
+ * Return: The entry, if found, otherwise NULL.
+ */
+void *xas_find(struct xarray *xa, struct xa_state *xas, unsigned long max)
+{
+	void *entry;
+
+	if (xas_error(xas))
+		return NULL;
+
+	if (xas->xa_node == XAS_RESTART) {
+		entry = xas_load(xa, xas);
+		if (entry || xas_not_node(xas->xa_node))
+			return entry;
+	} else if (!xas->xa_node) {
+		xas->xa_index = 1;
+		xas->xa_node = XAS_RESTART;
+		return NULL;
+	}
+
+	xas->xa_index = next_index(xas->xa_index, xas->xa_node);
+	xas->xa_offset++;
+
+	while (xas->xa_node && (xas->xa_index <= max)) {
+		if (unlikely(xas->xa_offset == XA_CHUNK_SIZE)) {
+			xas->xa_offset = xas->xa_node->offset + 1;
+			xas->xa_node = xa_parent(xa, xas->xa_node);
+			continue;
+		}
+
+		entry = xa_entry(xa, xas->xa_node, xas->xa_offset);
+		if (xa_is_node(entry)) {
+			xas->xa_node = xa_to_node(entry);
+			xas->xa_offset = 0;
+			continue;
+		}
+		if (!xa_iter_skip(entry))
+			return entry;
+
+		xas_add(xas, 1);
+	}
+
+	if (!xas->xa_node)
+		xas->xa_node = XAS_RESTART;
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(xas_find);
+
 /**
  * xa_load() - Load an entry from an XArray.
  * @xa: XArray.
@@ -975,3 +1084,67 @@ void *xa_clear_tag(struct xarray *xa, unsigned long index, xa_tag_t tag)
 	return entry;
 }
 EXPORT_SYMBOL(xa_clear_tag);
+
+/**
+ * xa_find() - Search the XArray for a present entry.
+ * @xa: XArray.
+ * @indexp: Pointer to an index.
+ * @max: Maximum index to search to.
+ *
+ * Finds the entry in the xarray with the lowest index that is between
+ * *@indexp and @max, inclusive.  If an entry is found, updates @indexp to
+ * be the index of that entry.  This function is protected by the RCU read
+ * lock, so it may not find all entries if called in a loop.  It will not
+ * return an %XA_RETRY_ENTRY; if you need to see retry entries, use xas_next().
+ *
+ * Return: The entry, if found, otherwise NULL.
+ */
+void *xa_find(struct xarray *xa, unsigned long *indexp, unsigned long max)
+{
+	XA_STATE(xas, *indexp);
+	void *entry;
+
+	rcu_read_lock();
+	do {
+		entry = xas_find(xa, &xas, max);
+	} while (xas_retry(&xas, entry));
+	rcu_read_unlock();
+
+	if (entry)
+		*indexp = xas.xa_index;
+	return entry;
+}
+EXPORT_SYMBOL(xa_find);
+
+/**
+ * xa_next() - Search the XArray for a present entry.
+ * @xa: XArray.
+ * @indexp: Pointer to an index.
+ * @max: Maximum index to search to.
+ *
+ * Finds the entry in @xa with the lowest index that is above *@indexp and
+ * less than or equal to @max.  If an entry is found, updates @indexp to be
+ * the index of the pointer.  This function is protected by the RCU read
+ * lock, so it may not find all entries if called in a loop.  It will not
+ * return an %XA_RETRY_ENTRY; if you need to see retry entries, use xas_next().
+ *
+ * Return: The pointer, if found, otherwise NULL.
+ */
+void *xa_next(struct xarray *xa, unsigned long *indexp, unsigned long max)
+{
+	XA_STATE(xas, *indexp + 1);
+	void *entry;
+
+	rcu_read_lock();
+	do {
+		entry = xas_find(xa, &xas, max);
+		if (xas.xa_index <= *indexp)
+			entry = xas_next(xa, &xas, max);
+	} while (xas_retry(&xas, entry));
+	rcu_read_unlock();
+
+	if (entry)
+		*indexp = xas.xa_index;
+	return entry;
+}
+EXPORT_SYMBOL(xa_next);
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 25/62] xarray: Add xa_init
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:07   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:07 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

For initialising xarrays at run time, in code, rather than statically
in data with DEFINE_XARRAY().
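
For illustration (not part of this patch; 'struct foo' is invented),
the two initialisation styles side by side:

	/* In data: */
	static DEFINE_XARRAY(static_xa);

	/* In code, e.g. when the containing object is set up: */
	struct foo {
		struct xarray entries;
	};

	static void foo_init(struct foo *foo)
	{
		xa_init(&foo->entries);
	}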

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 include/linux/xarray.h | 13 +++++++++++++
 lib/xarray.c           | 15 +++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index 08ddad60a43d..19a3974fdc4f 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -81,6 +81,19 @@ struct xarray {
 
 #define DEFINE_XARRAY(name) struct xarray name = XARRAY_INIT(name)
 
+void __xa_init(struct xarray *, gfp_t flags);
+
+/**
+ * xa_init() - Initialise an empty XArray.
+ * @xa: XArray.
+ *
+ * An empty XArray is full of NULL entries.
+ */
+static inline void xa_init(struct xarray *xa)
+{
+	__xa_init(xa, 0);
+}
+
 void *xa_load(struct xarray *, unsigned long index);
 void *xa_store(struct xarray *, unsigned long index, void *entry, gfp_t);
 void *xa_cmpxchg(struct xarray *, unsigned long index,
diff --git a/lib/xarray.c b/lib/xarray.c
index 5409048e8b44..59f45c07988f 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -868,6 +868,21 @@ void *xas_find(struct xarray *xa, struct xa_state *xas, unsigned long max)
 }
 EXPORT_SYMBOL_GPL(xas_find);
 
+/**
+ * __xa_init() - Initialise an empty XArray
+ * @xa: XArray.
+ * @flags: XA_FLAG_ values
+ *
+ * An empty XArray is full of NULL pointers.
+ */
+void __xa_init(struct xarray *xa, gfp_t flags)
+{
+	spin_lock_init(&xa->xa_lock);
+	xa->xa_flags = flags;
+	xa->xa_head = NULL;
+}
+EXPORT_SYMBOL(__xa_init);
+
 /**
  * xa_load() - Load an entry from an XArray.
  * @xa: XArray.
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 26/62] xarray: Add xas_for_each_tag
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:07   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:07 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

This iterator operates across each tagged entry in the specified range.
We do not yet have a user for an xa_for_each_tag iterator, but it would
be straightforward to add one if needed.  This commit also includes
xas_find_tag() and xas_next_tag().
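
A sketch of the iterator in use (not part of this patch; 'my_xa' is
assumed and XA_TAG_0 stands in for whichever xa_tag_t value the caller
cares about).  As the documentation below says, the caller holds the
RCU read lock (or the xa_lock) and handles retry entries itself:

	XA_STATE(xas, 0);
	void *entry;

	rcu_read_lock();
	xas_for_each_tag(&my_xa, &xas, entry, ULONG_MAX, XA_TAG_0) {
		if (xas_retry(&xas, entry))
			continue;
		pr_info("tagged entry at index %lu\n", xas.xa_index);
	}
	rcu_read_unlock();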

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 include/linux/xarray.h | 70 +++++++++++++++++++++++++++++++++++++++++++
 lib/xarray.c           | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 150 insertions(+)

diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index 19a3974fdc4f..427c792ddb2a 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -465,6 +465,8 @@ void *xas_find(struct xarray *, struct xa_state *, unsigned long max);
 bool xas_get_tag(const struct xarray *, const struct xa_state *, xa_tag_t);
 void xas_set_tag(struct xarray *, const struct xa_state *, xa_tag_t);
 void xas_clear_tag(struct xarray *, const struct xa_state *, xa_tag_t);
+void *xas_find_tag(struct xarray *, struct xa_state *, unsigned long max,
+			xa_tag_t);
 void xas_init_tags(struct xarray *, const struct xa_state *);
 
 void xas_destroy(struct xa_state *);
@@ -566,6 +568,55 @@ static inline void *xas_next(struct xarray *xa, struct xa_state *xas,
 	return entry;
 }
 
+static inline unsigned int xas_find_chunk(struct xa_state *xas, bool advance,
+		xa_tag_t tag)
+{
+	unsigned long *addr = xas->xa_node->tags[(__force unsigned)tag];
+	unsigned int offset = xas->xa_offset;
+
+	if (advance)
+		offset++;
+	if (XA_CHUNK_SIZE == BITS_PER_LONG) {
+		unsigned long data = *addr & (~0UL << offset);
+		if (data)
+			return __ffs(data);
+		return XA_CHUNK_SIZE;
+	}
+
+	return find_next_bit(addr, XA_CHUNK_SIZE, offset);
+}
+
+/**
+ * xas_next_tag() - Advance iterator to next tagged entry.
+ * @xa: XArray.
+ * @xas: XArray operation state.
+ * @max: Highest index to return.
+ * @tag: Tag to search for.
+ *
+ * xas_next_tag() is an inline function to optimise xarray traversal for
+ * speed.  It is equivalent to calling xas_find_tag(), and will call
+ * xas_find_tag() for all the hard cases.
+ *
+ * Return: The next tagged entry after the one currently referred to by @xas.
+ */
+static inline void *xas_next_tag(struct xarray *xa, struct xa_state *xas,
+					unsigned long max, xa_tag_t tag)
+{
+	struct xa_node *node = xas->xa_node;
+	unsigned int offset;
+
+	if (unlikely(xas_not_node(node) || node->shift))
+		return xas_find_tag(xa, xas, max, tag);
+	offset = xas_find_chunk(xas, true, tag);
+	xas->xa_offset = offset;
+	xas->xa_index = (xas->xa_index & ~XA_CHUNK_MASK) + offset;
+	if (xas->xa_index > max)
+		return NULL;
+	if (offset == XA_CHUNK_SIZE)
+		return xas_find_tag(xa, xas, max, tag);
+	return xa_entry(xa, node, offset);
+}
+
 /**
  * xas_for_each() - Iterate over a range of an XArray
  * @xa: XArray.
@@ -584,4 +635,23 @@ static inline void *xas_next(struct xarray *xa, struct xa_state *xas,
 	for (entry = xas_find(xa, xas, max); entry; \
 	     entry = xas_next(xa, xas, max))
 
+/**
+ * xas_for_each_tag() - Iterate over a range of an XArray
+ * @xa: XArray.
+ * @xas: XArray operation state.
+ * @entry: Entry retrieved from array.
+ * @max: Maximum index to retrieve from array.
+ * @tag: Tag to search for.
+ *
+ * The loop body will be executed for each tagged entry in the xarray
+ * between the current xas position and @max.  @entry will be set to
+ * the entry retrieved from the xarray.  It is safe to delete entries
+ * from the array in the loop body.  You should hold either the RCU lock
+ * or the xa_lock while iterating.  If you need to drop the lock, call
+ * xas_pause() first.
+ */
+#define xas_for_each_tag(xa, xas, entry, max, tag) \
+	for (entry = xas_find_tag(xa, xas, max, tag); entry; \
+	     entry = xas_next_tag(xa, xas, max, tag))
+
 #endif /* _LINUX_XARRAY_H */
diff --git a/lib/xarray.c b/lib/xarray.c
index 59f45c07988f..ea2dbd343380 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -868,6 +868,86 @@ void *xas_find(struct xarray *xa, struct xa_state *xas, unsigned long max)
 }
 EXPORT_SYMBOL_GPL(xas_find);
 
+/**
+ * xas_find_tag() - Find the next tagged entry in the XArray.
+ * @xa: XArray.
+ * @xas: XArray operation state.
+ * @max: Highest index to return.
+ * @tag: Tag number to search for.
+ *
+ * If the xas has not yet been walked to an entry, return the tagged entry
+ * which has an index >= xas.xa_index.  If it has been walked, the entry
+ * currently being pointed at has been processed, and so we move to the
+ * next tagged entry.
+ *
+ * If no tagged entry is found and the array is smaller than @max, @xas is
+ * set to the restart state and xas->xa_index is set to the smallest index
+ * not yet in the array.  This allows @xas to be immediately passed to
+ * xas_create().
+ *
+ * Return: The entry, if found, otherwise NULL.
+ */
+void *xas_find_tag(struct xarray *xa, struct xa_state *xas, unsigned long max,
+		xa_tag_t tag)
+{
+	bool advance = true;
+	unsigned int offset;
+	void *entry;
+
+	if (xas_error(xas))
+		return NULL;
+
+	if (xas->xa_node == NULL) {
+		xas->xa_index = 1;
+		goto out;
+	} else if (xas->xa_node == XAS_RESTART) {
+		advance = false;
+		entry = xa_head(xa);
+		if (xas->xa_index > max_index(entry))
+			goto out;
+		if (!xa_is_node(entry)) {
+			if (xa_tagged(xa, tag)) {
+				xas->xa_node = NULL;
+				return entry;
+			}
+			xas->xa_index = 1;
+			goto out;
+		}
+		xas->xa_node = xa_to_node(entry);
+		xas->xa_offset = xas->xa_index >> xas->xa_node->shift;
+	}
+
+	while (xas->xa_index <= max) {
+		if (unlikely(xas->xa_offset == XA_CHUNK_SIZE)) {
+			xas->xa_offset = xas->xa_node->offset + 1;
+			xas->xa_node = xa_parent(xa, xas->xa_node);
+			if (!xas->xa_node)
+				break;
+			advance = false;
+			continue;
+		}
+
+		offset = xas_find_chunk(xas, advance, tag);
+		xas_add(xas, offset - xas->xa_offset);
+		if (offset == XA_CHUNK_SIZE) {
+			advance = false;
+			continue;
+		}
+
+		entry = xa_entry(xa, xas->xa_node, xas->xa_offset);
+		if (!xa_is_node(entry))
+			return entry;
+		xas->xa_node = xa_to_node(entry);
+		xas->xa_offset = get_offset(xas->xa_index, xas->xa_node);
+	}
+
+ out:
+	if (!xas->xa_node)
+		xas->xa_node = XAS_RESTART;
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(xas_find_tag);
+
 /**
  * __xa_init() - Initialise an empty XArray
  * @xa: XArray.
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 27/62] xarray: Add xa_get_entries and xa_get_tagged
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:07   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:07 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

These functions allow a range of xarray entries to be extracted into a
compact normal array.
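
A sketch of a caller batching entries into an on-stack array (not part
of this patch; 'my_xa' and consume() are invented):

	void *batch[16];
	int i, n;

	n = xa_get_entries(&my_xa, batch, 0, ULONG_MAX, ARRAY_SIZE(batch));
	for (i = 0; i < n; i++)
		consume(batch[i]);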

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 include/linux/xarray.h |  4 +++
 lib/xarray.c           | 69 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 73 insertions(+)

diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index 427c792ddb2a..a48e7aa6406c 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -145,6 +145,10 @@ void *xa_clear_tag(struct xarray *, unsigned long index, xa_tag_t);
 
 void *xa_find(struct xarray *xa, unsigned long *index, unsigned long max);
 void *xa_next(struct xarray *xa, unsigned long *index, unsigned long max);
+int xa_get_entries(struct xarray *, void **dst, unsigned long start,
+			unsigned long max, unsigned int n);
+int xa_get_tagged(struct xarray *, void **dst, unsigned long start,
+			unsigned long max, unsigned int n, xa_tag_t);
 
 /**
  * xa_for_each() - Iterate over a portion of an XArray.
diff --git a/lib/xarray.c b/lib/xarray.c
index ea2dbd343380..9577a70495c0 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -1243,3 +1243,72 @@ void *xa_next(struct xarray *xa, unsigned long *indexp, unsigned long max)
 	return entry;
 }
 EXPORT_SYMBOL(xa_next);
+
+/**
+ * xa_get_entries() - Copy entries from the xarray into a normal array
+ * @xa: The source XArray to copy from
+ * @dst: The buffer to copy pointers into
+ * @start: The first index in the XArray eligible to be copied from
+ * @max: The last index in the XArray eligible to be copied from
+ * @n: The maximum number of entries to copy
+ *
+ * Return: The number of entries copied.
+ */
+int xa_get_entries(struct xarray *xa, void **dst, unsigned long start,
+			unsigned long max, unsigned int n)
+{
+	XA_STATE(xas, start);
+	void *entry;
+	unsigned int i = 0;
+
+	if (!n)
+		return 0;
+
+	rcu_read_lock();
+	xas_for_each(xa, &xas, entry, max) {
+		if (xas_retry(&xas, entry))
+			continue;
+		dst[i++] = entry;
+		if (i == n)
+			break;
+	}
+	rcu_read_unlock();
+
+	return i;
+}
+EXPORT_SYMBOL(xa_get_entries);
+
+/**
+ * xa_get_tagged() - Copy tagged entries from the xarray into a normal array.
+ * @xa: The source XArray to copy from.
+ * @dst: The buffer to copy pointers into.
+ * @start: The first index in the XArray eligible to be copied from.
+ * @max: The last index in the XArray eligible to be copied from
+ * @n: The maximum number of entries to copy.
+ * @tag: Tag number.
+ *
+ * Return: The number of entries copied.
+ */
+int xa_get_tagged(struct xarray *xa, void **dst, unsigned long start,
+			unsigned long max, unsigned int n, xa_tag_t tag)
+{
+	XA_STATE(xas, start);
+	void *entry;
+	unsigned int i = 0;
+
+	if (!n)
+		return 0;
+
+	rcu_read_lock();
+	xas_for_each_tag(xa, &xas, entry, max, tag) {
+		if (xas_retry(&xas, entry))
+			continue;
+		dst[i++] = entry;
+		if (i == n)
+			break;
+	}
+	rcu_read_unlock();
+
+	return i;
+}
+EXPORT_SYMBOL(xa_get_tagged);
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 28/62] xarray: Add xa_destroy
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:07   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:07 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

This function frees all the internal memory allocated to the xarray
and reinitialises it to be empty.
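
A teardown sketch (not part of this patch; 'my_xa' is assumed to hold
kmalloc'ed objects).  Because xa_destroy() frees only the internal
nodes, the caller disposes of its own objects first:

	void *entry;
	unsigned long index = 0;

	xa_for_each(&my_xa, entry, index, ULONG_MAX)
		kfree(entry);
	xa_destroy(&my_xa);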

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 include/linux/xarray.h |  1 +
 lib/xarray.c           | 26 ++++++++++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index a48e7aa6406c..347347499652 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -98,6 +98,7 @@ void *xa_load(struct xarray *, unsigned long index);
 void *xa_store(struct xarray *, unsigned long index, void *entry, gfp_t);
 void *xa_cmpxchg(struct xarray *, unsigned long index,
 			void *old, void *entry, gfp_t);
+void xa_destroy(struct xarray *);
 
 /**
  * xa_empty() - Determine if an array has any present entries
diff --git a/lib/xarray.c b/lib/xarray.c
index 9577a70495c0..4fc1073f9454 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -1312,3 +1312,29 @@ int xa_get_tagged(struct xarray *xa, void **dst, unsigned long start,
 	return i;
 }
 EXPORT_SYMBOL(xa_get_tagged);
+
+/**
+ * xa_destroy() - Free all internal data structures.
+ * @xa: XArray.
+ *
+ * After calling this function, the XArray is empty and has freed all memory
+ * allocated for its internal data structures.  You are responsible for
+ * freeing the objects referenced by the XArray.
+ */
+void xa_destroy(struct xarray *xa)
+{
+	XA_STATE(xas, 0);
+	unsigned long flags;
+	void *entry;
+
+	xas.xa_node = NULL;
+	xa_lock_irqsave(xa, flags);
+	entry = xa_head_locked(xa);
+	RCU_INIT_POINTER(xa->xa_head, NULL);
+	xas_init_tags(xa, &xas);
+	/* lockdep checks we're still holding the lock in xas_free_nodes() */
+	if (xa_is_node(entry))
+		xas_free_nodes(xa, &xas, xa_to_node(entry));
+	xa_unlock_irqrestore(xa, flags);
+}
+EXPORT_SYMBOL(xa_destroy);
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 29/62] xarray: Add xas_prev_any
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:07   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:07 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

The page cache wants to search backwards to find the first hole.
Its definition of a hole doesn't make sense for the xarray, so introduce
a function called xas_prev_any() which will return any kind of entry,
including NULL or internal entries.
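
A sketch of the backward scan (not part of this patch; 'my_xa' and
'start' are assumed, and the max_scan bound kept by the filemap code
is omitted).  The caller holds the RCU read lock:

	XA_STATE(xas, start);
	void *entry;

	rcu_read_lock();
	do {
		entry = xas_prev_any(&my_xa, &xas);
	} while (entry && !xa_is_value(entry));
	rcu_read_unlock();
	/* xas.xa_index is now the index of the hole (or ULONG_MAX). */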

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 include/linux/xarray.h |  1 +
 lib/xarray.c           | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
 mm/filemap.c           | 15 +++++----------
 3 files changed, 55 insertions(+), 10 deletions(-)

diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index 347347499652..e0cfe6944752 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -466,6 +466,7 @@ void *xas_load(struct xarray *, struct xa_state *);
 void *xas_store(struct xarray *, struct xa_state *, void *entry);
 void *xas_create(struct xarray *, struct xa_state *);
 void *xas_find(struct xarray *, struct xa_state *, unsigned long max);
+void *xas_prev_any(struct xarray *, struct xa_state *);
 
 bool xas_get_tag(const struct xarray *, const struct xa_state *, xa_tag_t);
 void xas_set_tag(struct xarray *, const struct xa_state *, xa_tag_t);
diff --git a/lib/xarray.c b/lib/xarray.c
index 4fc1073f9454..202e5aae596d 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -948,6 +948,55 @@ void *xas_find_tag(struct xarray *xa, struct xa_state *xas, unsigned long max,
 }
 EXPORT_SYMBOL_GPL(xas_find_tag);
 
+/**
+ * xas_prev_any() - Find the previous entry in the XArray.
+ * @xa: XArray.
+ * @xas: XArray operation state.
+ *
+ * If the xas has not yet been walked to an entry, return the entry
+ * which has an index = xas.xa_index.  If it has been walked, the entry
+ * currently being pointed at has been processed, and so we move to the
+ * previous entry.
+ *
+ * If asked for the previous entry of 0, this function returns NULL and
+ * sets xa_index to ULONG_MAX.  The caller is responsible for detecting
+ * this situation.
+ *
+ * Return: The entry at the index, even if it is NULL.
+ */
+void *xas_prev_any(struct xarray *xa, struct xa_state *xas)
+{
+	void *entry;
+
+	if (xas_error(xas))
+		return NULL;
+
+	if (xas->xa_node == XAS_RESTART)
+		return xas_load(xa, xas);
+
+	while (xas->xa_node) {
+		if (unlikely(xas->xa_offset == 0)) {
+			xas->xa_offset = xas->xa_node->offset;
+			xas->xa_node = xa_parent(xa, xas->xa_node);
+			continue;
+		}
+
+		xas->xa_offset--;
+		xas->xa_index -= 1UL << xas->xa_node->shift;
+		entry = xa_entry(xa, xas->xa_node, xas->xa_offset);
+		if (!xa_is_node(entry))
+			return entry;
+
+		xas->xa_node = xa_to_node(entry);
+		xas->xa_offset = XA_CHUNK_MASK;
+		xas->xa_index |= XA_CHUNK_MASK << xas->xa_node->shift;
+	}
+
+	xas->xa_index = ULONG_MAX;
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(xas_prev_any);
+
 /**
  * __xa_init() - Initialise an empty XArray
  * @xa: XArray.
diff --git a/mm/filemap.c b/mm/filemap.c
index 1d012dd3629e..1c03b0ea105e 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1390,20 +1390,15 @@ EXPORT_SYMBOL(page_cache_next_hole);
 pgoff_t page_cache_prev_hole(struct address_space *mapping,
 			     pgoff_t index, unsigned long max_scan)
 {
-	unsigned long i;
-
-	for (i = 0; i < max_scan; i++) {
-		struct page *page;
+	XA_STATE(xas, index);
 
-		page = radix_tree_lookup(&mapping->pages, index);
-		if (!page || xa_is_value(page))
-			break;
-		index--;
-		if (index == ULONG_MAX)
+	while (max_scan--) {
+		void *entry = xas_prev_any(&mapping->pages, &xas);
+		if (!entry || xa_is_value(entry))
 			break;
 	}
 
-	return index;
+	return xas.xa_index;
 }
 EXPORT_SYMBOL(page_cache_prev_hole);
 
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 30/62] xarray: Add xas_find_any / xas_next_any
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:07   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:07 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

These variations will find any entry (whether it be NULL or an internal
entry).  The only thing they won't return is a node pointer (because they
walk down the tree).  xas_next_any() is an inline version of
xas_find_any() which avoids making a function call for the common cases.
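
For illustration (not part of the patch), a forward hole search analogous
to the page_cache_prev_hole() conversion in the previous patch might look
like this; the locking around it is assumed to be the caller's:

	/* Hypothetical sketch: scan forward from 'index' for an empty slot,
	 * giving up after max_scan slots. */
	XA_STATE(xas, index);

	while (max_scan--) {
		void *entry = xas_next_any(&mapping->pages, &xas);

		/* NULL and value entries both count as holes here. */
		if (!entry || xa_is_value(entry))
			break;
	}
	return xas.xa_index;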

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 include/linux/xarray.h | 25 +++++++++++++++++++++++++
 lib/xarray.c           | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 76 insertions(+)

diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index e0cfe6944752..8ab6c4468c21 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -466,6 +466,7 @@ void *xas_load(struct xarray *, struct xa_state *);
 void *xas_store(struct xarray *, struct xa_state *, void *entry);
 void *xas_create(struct xarray *, struct xa_state *);
 void *xas_find(struct xarray *, struct xa_state *, unsigned long max);
+void *xas_find_any(struct xarray *, struct xa_state *);
 void *xas_prev_any(struct xarray *, struct xa_state *);
 
 bool xas_get_tag(const struct xarray *, const struct xa_state *, xa_tag_t);
@@ -540,6 +541,30 @@ static inline bool xa_iter_skip(void *entry)
 			(xa_is_internal(entry) && entry < XA_RETRY_ENTRY));
 }
 
+/**
+ * xas_next_any() - Advance iterator to next entry of any kind.
+ * @xa: XArray.
+ * @xas: XArray operation state.
+ *
+ * xas_next_any() is an inline function to optimise xarray traversal for speed.
+ * It is equivalent to calling xas_find_any(), and will call xas_find_any()
+ * for all the hard cases.
+ *
+ * Return: The next entry after the one currently referred to by @xas.
+ */
+static inline void *xas_next_any(struct xarray *xa, struct xa_state *xas)
+{
+	struct xa_node *node = xas->xa_node;
+
+	if (unlikely(xas_not_node(node) || node->shift ||
+				xas->xa_offset == XA_CHUNK_MASK))
+		return xas_find_any(xa, xas);
+
+	xas->xa_index++;
+	xas->xa_offset++;
+	return xa_entry(xa, node, xas->xa_offset);
+}
+
 /**
  * xas_next() - Advance iterator to next present entry.
  * @xa: XArray.
diff --git a/lib/xarray.c b/lib/xarray.c
index 202e5aae596d..38e5fe39bb97 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -805,6 +805,57 @@ void xas_pause(struct xa_state *xas)
 }
 EXPORT_SYMBOL_GPL(xas_pause);
 
+/**
+ * xas_find_any() - Find the next entry in the XArray.
+ * @xa: XArray.
+ * @xas: XArray operation state.
+ *
+ * If the xas has not yet been walked to an entry, return the entry
+ * which has an index >= xas.xa_index.  If it has been walked, the entry
+ * currently being pointed at has been processed, and so we move to the
+ * next entry.
+ *
+ * Return: The entry at that location.
+ */
+void *xas_find_any(struct xarray *xa, struct xa_state *xas)
+{
+	void *entry;
+
+	if (xas_error(xas))
+		return NULL;
+
+	if (xas->xa_node == XAS_RESTART) {
+		return xas_load(xa, xas);
+	} else if (!xas->xa_node) {
+		xas->xa_index = 1;
+		xas->xa_node = XAS_RESTART;
+		return NULL;
+	}
+
+	xas->xa_index = next_index(xas->xa_index, xas->xa_node);
+	xas->xa_offset++;
+
+	while (xas->xa_node) {
+		if (unlikely(xas->xa_offset == XA_CHUNK_SIZE)) {
+			xas->xa_offset = xas->xa_node->offset + 1;
+			xas->xa_node = xa_parent(xa, xas->xa_node);
+			continue;
+		}
+
+		entry = xa_entry(xa, xas->xa_node, xas->xa_offset);
+		if (!xa_is_node(entry))
+			return entry;
+
+		xas->xa_node = xa_to_node(entry);
+		xas->xa_offset = 0;
+	}
+
+	if (!xas->xa_node)
+		xas->xa_node = XAS_RESTART;
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(xas_find_any);
+
 /**
  * xas_find() - Find the next present entry in the XArray.
  * @xa: XArray.
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 31/62] Convert IDR to use xarray
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:07   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:07 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

The one thing the IDR can do which the XArray cannot is allocate an
entry and store a NULL pointer there.  The radix tree was modified to
have this ability, but it added a lot of complexity.  Instead, we
introduce the 'Skip entry', which will read back from the XArray as
NULL through the normal API and will be skipped when iterating, but
will keep a slot allocated.  The only way to store a skip entry in the
xarray is through the advanced API, so unaware users should be
unaffected by it.
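
For illustration only (not part of the patch), the effect visible to an
IDR user is roughly the following; example_idr and some_object are
made-up names:

	/* Hypothetical sketch: reserving an ID for a NULL pointer keeps the
	 * slot allocated, but ordinary lookups see NULL until it is filled. */
	DEFINE_IDR(example_idr);
	unsigned long id = 0;

	if (!idr_alloc_ul(&example_idr, NULL, &id, ULONG_MAX, GFP_KERNEL)) {
		WARN_ON(idr_find(&example_idr, id) != NULL);
		idr_replace(&example_idr, some_object, id);
	}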

The idr_for_each_entry_ul() iterator becomes an alias for xa_for_each(),
so we drop the idr_get_next_ul() function as it has no users.
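
For example (illustrative only; everything except the macro itself is a
made-up name):

	/* Hypothetical sketch: visit every populated ID.  Slots holding a
	 * skip entry are not visited, since the iterator skips them. */
	struct my_object *obj;
	unsigned long id = 0;

	idr_for_each_entry_ul(&example_idr, obj, id)
		pr_info("id %lu -> %p\n", id, obj);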

The exported IDR API was a weird mix of GPL-only and general symbols;
I converted them all to GPL as there was no way to use the IDR API
without being GPL.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 include/linux/idr.h    |  93 +++++++++++++++++---------------
 include/linux/xarray.h |  18 +++++--
 lib/idr.c              | 143 ++++++++++++++++++++++---------------------------
 lib/radix-tree.c       |  75 +++++++++++++-------------
 lib/xarray.c           |   4 ++
 5 files changed, 170 insertions(+), 163 deletions(-)

diff --git a/include/linux/idr.h b/include/linux/idr.h
index 3c3346441cbc..57945eb0792a 100644
--- a/include/linux/idr.h
+++ b/include/linux/idr.h
@@ -9,62 +9,71 @@
  * tables.
  */
 
-#ifndef __IDR_H__
-#define __IDR_H__
+#ifndef _LINUX_IDR_H
+#define _LINUX_IDR_H
 
 #include <linux/radix-tree.h>
 #include <linux/gfp.h>
 #include <linux/percpu.h>
+#include <linux/xarray.h>
 
 struct idr {
-	struct radix_tree_root	idr_rt;
+	struct xarray	idxa;
 };
 
-/*
- * The IDR API does not expose the tagging functionality of the radix tree
- * to users.  Use tag 0 to track whether a node has free space below it.
- */
-#define IDR_FREE	0
-
-/* Set the IDR flag and the IDR_FREE tag */
-#define IDR_RT_MARKER		((__force gfp_t)(3 << __GFP_BITS_SHIFT))
+#define IDR_INIT_FLAGS		XA_FLAGS_TRACK_FREE | XA_FLAGS_TAG(0)
 
 #define IDR_INIT(name)							\
 {									\
-	.idr_rt = RADIX_TREE_INIT(name, IDR_RT_MARKER)			\
+	.idxa = __XARRAY_INIT(name.idxa, IDR_INIT_FLAGS)		\
 }
 #define DEFINE_IDR(name)	struct idr name = IDR_INIT(name)
 
+static inline void idr_init(struct idr *idr)
+{
+	__xa_init(&idr->idxa, IDR_INIT_FLAGS);
+}
+
 /**
  * DOC: idr sync
- * idr synchronization (stolen from radix-tree.h)
+ * idr synchronization
  *
- * idr_find() is able to be called locklessly, using RCU. The caller must
- * ensure calls to this function are made within rcu_read_lock() regions.
- * Other readers (lock-free or otherwise) and modifications may be running
- * concurrently.
+ * The IDR manages its own locking, using irqsafe spinlocks for operations
+ * which modify the IDR and RCU for operations which do not.  The user of
+ * the IDR may choose to wrap accesses to it in a lock if it needs to
+ * guarantee the IDR does not change during a read access.  The easiest way
+ * to do this is to grab the same lock the IDR uses for write accesses
+ * using one of the idr_lock() wrappers.
  *
- * It is still required that the caller manage the synchronization and
- * lifetimes of the items. So if RCU lock-free lookups are used, typically
- * this would mean that the items have their own locks, or are amenable to
- * lock-free access; and that the items are freed by RCU (or only freed after
- * having been deleted from the idr tree *and* a synchronize_rcu() grace
- * period).
+ * The caller must still manage the synchronization and lifetimes of the
+ * items. So if RCU lock-free lookups are used, typically this would mean
+ * that the items have their own locks, or are amenable to lock-free access;
+ * and that the items are freed by RCU (or only freed after having been
+ * deleted from the IDR *and* a synchronize_rcu() grace period has elapsed).
  */
 
-void idr_preload(gfp_t gfp_mask);
+#define idr_lock(idr)		xa_lock(&(idr)->idxa)
+#define idr_unlock(idr)		xa_unlock(&(idr)->idxa)
+#define idr_lock_bh(idr)	xa_lock_bh(&(idr)->idxa)
+#define idr_unlock_bh(idr)	xa_unlock_bh(&(idr)->idxa)
+#define idr_lock_irq(idr)	xa_lock_irq(&(idr)->idxa)
+#define idr_unlock_irq(idr)	xa_unlock_irq(&(idr)->idxa)
+#define idr_lock_irqsave(idr, flags) \
+				xa_lock_irqsave(&(idr)->idxa, flags)
+#define idr_unlock_irqrestore(idr, flags) \
+				xa_unlock_irqrestore(&(idr)->idxa, flags)
+
+void idr_preload(gfp_t);
 
 int idr_alloc(struct idr *, void *, int start, int end, gfp_t);
 int __must_check idr_alloc_ul(struct idr *, void *, unsigned long *nextid,
 			unsigned long max, gfp_t);
 int idr_alloc_cyclic(struct idr *, int *cursor, void *entry,
 			int start, int end, gfp_t);
-int idr_for_each(const struct idr *,
+int idr_for_each(struct idr *,
 		 int (*fn)(int id, void *p, void *data), void *data);
 void *idr_get_next(struct idr *, int *nextid);
-void *idr_get_next_ul(struct idr *, unsigned long *nextid);
 void *idr_replace(struct idr *, void *, unsigned long id);
-void idr_destroy(struct idr *);
 
 static inline int __must_check idr_alloc_u32(struct idr *idr, void *ptr,
 				u32 *nextid, unsigned long max, gfp_t gfp)
@@ -77,22 +86,21 @@ static inline int __must_check idr_alloc_u32(struct idr *idr, void *ptr,
 
 static inline void *idr_remove(struct idr *idr, unsigned long id)
 {
-	return radix_tree_delete_item(&idr->idr_rt, id, NULL);
+	return xa_store(&idr->idxa, id, NULL, GFP_NOWAIT);
 }
 
-static inline void idr_init(struct idr *idr)
+static inline bool idr_is_empty(const struct idr *idr)
 {
-	INIT_RADIX_TREE(&idr->idr_rt, IDR_RT_MARKER);
+	return xa_empty(&idr->idxa);
 }
 
-static inline bool idr_is_empty(const struct idr *idr)
+static inline void idr_destroy(struct idr *idr)
 {
-	return radix_tree_empty(&idr->idr_rt) &&
-		radix_tree_tagged(&idr->idr_rt, IDR_FREE);
+	xa_destroy(&idr->idxa);
 }
 
 /**
- * idr_preload_end - end preload section started with idr_preload()
+ * idr_preload_end() - end preload section started with idr_preload()
  *
  * Each idr_preload() should be matched with an invocation of this
  * function.  See idr_preload() for details.
@@ -103,7 +111,7 @@ static inline void idr_preload_end(void)
 }
 
 /**
- * idr_find - return pointer for given id
+ * idr_find() - return pointer for given id
  * @idr: idr handle
  * @id: lookup key
  *
@@ -111,12 +119,11 @@ static inline void idr_preload_end(void)
  * return indicates that @id is not valid or you passed %NULL in
  * idr_get_new().
  *
- * This function can be called under rcu_read_lock(), given that the leaf
- * pointers lifetimes are correctly managed.
+ * This function is protected by the RCU read lock.
  */
-static inline void *idr_find(const struct idr *idr, unsigned long id)
+static inline void *idr_find(struct idr *idr, unsigned long id)
 {
-	return radix_tree_lookup(&idr->idr_rt, id);
+	return xa_load(&idr->idxa, id);
 }
 
 /**
@@ -132,7 +139,7 @@ static inline void *idr_find(const struct idr *idr, unsigned long id)
 #define idr_for_each_entry(idr, entry, id)			\
 	for (id = 0; ((entry) = idr_get_next(idr, &(id))) != NULL; ++id)
 #define idr_for_each_entry_ul(idr, entry, id)			\
-	for (id = 0; ((entry) = idr_get_next_ul(idr, &(id))) != NULL; ++id)
+	xa_for_each(&(idr)->idxa, entry, id, ULONG_MAX)
 
 /**
  * idr_for_each_entry_continue - continue iteration over an idr's elements of a given type
@@ -167,7 +174,7 @@ struct ida {
 };
 
 #define IDA_INIT(name)	{						\
-	.ida_rt = RADIX_TREE_INIT(name, IDR_RT_MARKER | GFP_NOWAIT),	\
+	.ida_rt = RADIX_TREE_INIT(name, IDR_INIT_FLAGS | GFP_NOWAIT),	\
 }
 #define DEFINE_IDA(name)	struct ida name = IDA_INIT(name)
 
@@ -182,7 +189,7 @@ void ida_simple_remove(struct ida *ida, unsigned int id);
 
 static inline void ida_init(struct ida *ida)
 {
-	INIT_RADIX_TREE(&ida->ida_rt, IDR_RT_MARKER | GFP_NOWAIT);
+	INIT_RADIX_TREE(&ida->ida_rt, IDR_INIT_FLAGS | GFP_NOWAIT);
 }
 
 /**
@@ -201,4 +208,4 @@ static inline bool ida_is_empty(const struct ida *ida)
 {
 	return radix_tree_empty(&ida->ida_rt);
 }
-#endif /* __IDR_H__ */
+#endif /* _LINUX_IDR_H */
diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index 8ab6c4468c21..8d7cd70beb8e 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -362,7 +362,13 @@ static inline bool xa_is_sibling(void *entry)
 		(entry < xa_mk_sibling(XA_CHUNK_SIZE - 1));
 }
 
-#define XA_RETRY_ENTRY		xa_mk_internal(256)
+#define XA_SKIP_ENTRY		xa_mk_internal(256)
+#define XA_RETRY_ENTRY		xa_mk_internal(257)
+
+static inline bool xa_is_skip(void *entry)
+{
+	return unlikely(entry == XA_SKIP_ENTRY);
+}
 
 static inline bool xa_is_retry(void *entry)
 {
@@ -444,18 +450,20 @@ static inline bool xas_not_node(struct xa_node *node)
 }
 
 /**
- * xas_retry() - Handle a retry entry.
+ * xas_retry() - Retry the operation if appropriate.
  * @xas: XArray operation state.
  * @entry: Entry from xarray.
  *
- * An RCU-protected read may see a retry entry as a side-effect of a
- * simultaneous modification.  This function sets up the @xas to retry
- * the walk from the head of the array.
+ * The advanced functions may sometimes return an internal entry, such as
+ * a retry entry or a skip entry.  This function sets up the @xas to restart
+ * the walk from the head of the array if needed.
  *
  * Return: true if the operation needs to be retried.
  */
 static inline bool xas_retry(struct xa_state *xas, void *entry)
 {
+	if (xa_is_skip(entry))
+		return true;
 	if (!xa_is_retry(entry))
 		return false;
 	xas->xa_node = XAS_RESTART;
diff --git a/lib/idr.c b/lib/idr.c
index 50201b5c46e9..713b19e6f1b3 100644
--- a/lib/idr.c
+++ b/lib/idr.c
@@ -8,6 +8,9 @@
 DEFINE_PER_CPU(struct ida_bitmap *, ida_bitmap);
 static DEFINE_SPINLOCK(simple_ida_lock);
 
+/* In radix-tree.c temporarily */
+extern bool idr_nomem(struct xa_state *, gfp_t);
+
 /**
  * idr_alloc_ul() - allocate a large ID
  * @idr: idr handle
@@ -29,30 +32,36 @@ static DEFINE_SPINLOCK(simple_ida_lock);
 int idr_alloc_ul(struct idr *idr, void *ptr, unsigned long *nextid,
 			unsigned long max, gfp_t gfp)
 {
-	struct radix_tree_iter iter;
-	void __rcu **slot;
+	XA_STATE(xas, *nextid);
+	unsigned long flags;
+	int err;
 
-	if (WARN_ON_ONCE(radix_tree_is_internal_node(ptr)))
+	if (WARN_ON_ONCE(xa_is_internal(ptr)))
 		return -EINVAL;
-
-	if (WARN_ON_ONCE(!(idr->idr_rt.xa_flags & ROOT_IS_IDR)))
-		idr->idr_rt.xa_flags |= IDR_RT_MARKER;
-
-	radix_tree_iter_init(&iter, *nextid);
-	slot = idr_get_free(&idr->idr_rt, &iter, gfp, max);
-	if (IS_ERR(slot))
-		return PTR_ERR(slot);
-
-	radix_tree_iter_replace(&idr->idr_rt, &iter, slot, ptr);
-	radix_tree_iter_tag_clear(&idr->idr_rt, &iter, IDR_FREE);
-
-	*nextid = iter.index;
-	return 0;
+	if (!ptr)
+		ptr = XA_SKIP_ENTRY;
+
+	do {
+		xa_lock_irqsave(&idr->idxa, flags);
+		xas_find_tag(&idr->idxa, &xas, max, XA_FREE_TAG);
+		if (xas.xa_index > max)
+			xas_set_err(&xas, ENOSPC);
+		xas_store(&idr->idxa, &xas, ptr);
+		xas_clear_tag(&idr->idxa, &xas, XA_FREE_TAG);
+		xa_unlock_irqrestore(&idr->idxa, flags);
+	} while (idr_nomem(&xas, gfp));
+
+	err = xas_error(&xas);
+	if (!err)
+		*nextid = xas.xa_index;
+	xas_destroy(&xas);
+
+	return err;
 }
 EXPORT_SYMBOL_GPL(idr_alloc_ul);
 
 /**
- * idr_alloc - allocate an id
+ * idr_alloc() - allocate an id
  * @idr: idr handle
  * @ptr: pointer to be associated with the new id
  * @start: the minimum id (inclusive)
@@ -117,10 +126,10 @@ int idr_alloc_cyclic(struct idr *idr, int *cursor, void *ptr,
 
 	return id;
 }
-EXPORT_SYMBOL(idr_alloc_cyclic);
+EXPORT_SYMBOL_GPL(idr_alloc_cyclic);
 
 /**
- * idr_for_each - iterate through all stored pointers
+ * idr_for_each() - iterate through all stored pointers
  * @idr: idr handle
  * @fn: function to be called for each pointer
  * @data: data passed to callback function
@@ -136,69 +145,41 @@ EXPORT_SYMBOL(idr_alloc_cyclic);
  * seen and deleted entries may be seen, but adding and removing entries
  * will not cause other entries to be skipped, nor spurious ones to be seen.
  */
-int idr_for_each(const struct idr *idr,
+int idr_for_each(struct idr *idr,
 		int (*fn)(int id, void *p, void *data), void *data)
 {
-	struct radix_tree_iter iter;
-	void __rcu **slot;
+	unsigned long i = 0;
+	void *p;
 
-	radix_tree_for_each_slot(slot, &idr->idr_rt, &iter, 0) {
-		int ret = fn(iter.index, rcu_dereference_raw(*slot), data);
+	xa_for_each(&idr->idxa, p, i, INT_MAX) {
+		int ret = fn(i, p, data);
 		if (ret)
 			return ret;
 	}
 
 	return 0;
 }
-EXPORT_SYMBOL(idr_for_each);
-
-/**
- * idr_get_next - Find next populated entry
- * @idr: idr handle
- * @nextid: Pointer to lowest possible ID to return
- *
- * Returns the next populated entry in the tree with an ID greater than
- * or equal to the value pointed to by @nextid.  On exit, @nextid is updated
- * to the ID of the found value.  To use in a loop, the value pointed to by
- * nextid must be incremented by the user.
- */
-void *idr_get_next(struct idr *idr, int *nextid)
-{
-	struct radix_tree_iter iter;
-	void __rcu **slot;
-
-	slot = radix_tree_iter_find(&idr->idr_rt, &iter, *nextid);
-	if (!slot)
-		return NULL;
-
-	*nextid = iter.index;
-	return rcu_dereference_raw(*slot);
-}
-EXPORT_SYMBOL(idr_get_next);
+EXPORT_SYMBOL_GPL(idr_for_each);
 
 /**
- * idr_get_next_ul - Find next populated entry
+ * idr_get_next() - Find next populated entry
  * @idr: idr handle
- * @nextid: Pointer to lowest possible ID to return
+ * @id: Pointer to lowest possible ID to return
  *
  * Returns the next populated entry in the tree with an ID greater than
  * or equal to the value pointed to by @nextid.  On exit, @nextid is updated
  * to the ID of the found value.  To use in a loop, the value pointed to by
  * nextid must be incremented by the user.
  */
-void *idr_get_next_ul(struct idr *idr, unsigned long *nextid)
+void *idr_get_next(struct idr *idr, int *id)
 {
-	struct radix_tree_iter iter;
-	void __rcu **slot;
+	unsigned long index = *id;
+	void *entry = xa_find(&idr->idxa, &index, INT_MAX);
 
-	slot = radix_tree_iter_find(&idr->idr_rt, &iter, *nextid);
-	if (!slot)
-		return NULL;
-
-	*nextid = iter.index;
-	return rcu_dereference_raw(*slot);
+	*id = index;
+	return entry;
 }
-EXPORT_SYMBOL(idr_get_next_ul);
+EXPORT_SYMBOL_GPL(idr_get_next);
 
 /**
  * idr_replace - replace pointer for given id
@@ -216,22 +197,28 @@ EXPORT_SYMBOL(idr_get_next_ul);
  */
 void *idr_replace(struct idr *idr, void *ptr, unsigned long id)
 {
-	struct radix_tree_node *node;
-	void __rcu **slot = NULL;
-	void *entry;
+	XA_STATE(xas, id);
+	unsigned long flags;
+	void *curr;
 
-	if (WARN_ON_ONCE(radix_tree_is_internal_node(ptr)))
+	if (WARN_ON_ONCE(xa_is_internal(ptr)))
 		return ERR_PTR(-EINVAL);
-
-	entry = __radix_tree_lookup(&idr->idr_rt, id, &node, &slot);
-	if (!slot || radix_tree_tag_get(&idr->idr_rt, id, IDR_FREE))
-		return ERR_PTR(-ENOENT);
-
-	__radix_tree_replace(&idr->idr_rt, node, slot, ptr, NULL);
-
-	return entry;
+	if (!ptr)
+		ptr = XA_SKIP_ENTRY;
+
+	xa_lock_irqsave(&idr->idxa, flags);
+	curr = xas_load(&idr->idxa, &xas);
+	if (curr)
+		xas_store(&idr->idxa, &xas, ptr);
+	else
+		curr = ERR_PTR(-ENOENT);
+	xa_unlock_irqrestore(&idr->idxa, flags);
+
+	if (xa_is_skip(curr))
+		return NULL;
+	return curr;
 }
-EXPORT_SYMBOL(idr_replace);
+EXPORT_SYMBOL_GPL(idr_replace);
 
 /**
  * DOC: IDA description
@@ -262,7 +249,7 @@ EXPORT_SYMBOL(idr_replace);
  * Developer's notes:
  *
  * The IDA uses the functionality provided by the IDR & radix tree to store
- * bitmaps in each entry.  The IDR_FREE tag means there is at least one bit
+ * bitmaps in each entry.  The XA_FREE_TAG tag means there is at least one bit
  * free, unlike the IDR where it means at least one entry is free.
  *
  * I considered telling the radix tree that each slot is an order-10 node
@@ -368,7 +355,7 @@ int ida_get_new_above(struct ida *ida, int start, int *id)
 			__set_bit(bit, bitmap->bitmap);
 			if (bitmap_full(bitmap->bitmap, IDA_BITMAP_BITS))
 				radix_tree_iter_tag_clear(root, &iter,
-								IDR_FREE);
+								XA_FREE_TAG);
 		} else {
 			new += bit;
 			if (new < 0)
@@ -424,7 +411,7 @@ void ida_remove(struct ida *ida, int id)
 		goto err;
 
 	__clear_bit(offset, btmp);
-	radix_tree_iter_tag_set(&ida->ida_rt, &iter, IDR_FREE);
+	radix_tree_iter_tag_set(&ida->ida_rt, &iter, XA_FREE_TAG);
 	if (xa_is_value(bitmap)) {
 		if (xa_to_value(rcu_dereference_raw(*slot)) == 0)
 			radix_tree_iter_delete(&ida->ida_rt, &iter, slot);
diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index 29b38d447497..3fbc0751b181 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -572,6 +572,28 @@ int radix_tree_maybe_preload_order(gfp_t gfp_mask, int order)
 	return __radix_tree_preload(gfp_mask, nr_nodes);
 }
 
+/* Once the IDR users abandon the preload API, we can use xas_nomem */
+bool idr_nomem(struct xa_state *xas, gfp_t gfp)
+{
+	if (xas->xa_node != XAS_ERROR(ENOMEM))
+		return false;
+	xas->xa_alloc = kmem_cache_alloc(radix_tree_node_cachep,
+						gfp | __GFP_NOWARN);
+	if (!xas->xa_alloc) {
+		struct radix_tree_preload *rtp;
+
+		rtp = this_cpu_ptr(&radix_tree_preloads);
+		if (!rtp->nr)
+			return false;
+		xas->xa_alloc = rtp->nodes;
+		rtp->nodes = xas->xa_alloc->parent;
+		rtp->nr--;
+	}
+
+	xas->xa_node = XAS_RESTART;
+	return true;
+}
+
 static unsigned radix_tree_load_root(const struct radix_tree_root *root,
 		struct radix_tree_node **nodep, unsigned long *maxindex)
 {
@@ -605,7 +627,7 @@ static int radix_tree_extend(struct radix_tree_root *root, gfp_t gfp,
 		maxshift += RADIX_TREE_MAP_SHIFT;
 
 	entry = rcu_dereference_raw(root->xa_head);
-	if (!entry && (!is_idr(root) || root_tag_get(root, IDR_FREE)))
+	if (!entry && (!is_idr(root) || root_tag_get(root, XA_FREE_TAG)))
 		goto out;
 
 	do {
@@ -615,10 +637,10 @@ static int radix_tree_extend(struct radix_tree_root *root, gfp_t gfp,
 			return -ENOMEM;
 
 		if (is_idr(root)) {
-			all_tag_set(node, IDR_FREE);
-			if (!root_tag_get(root, IDR_FREE)) {
-				rtag_clear(node, IDR_FREE, 0);
-				root_tag_set(root, IDR_FREE);
+			all_tag_set(node, XA_FREE_TAG);
+			if (!root_tag_get(root, XA_FREE_TAG)) {
+				rtag_clear(node, XA_FREE_TAG, 0);
+				root_tag_set(root, XA_FREE_TAG);
 			}
 		} else {
 			/* Propagate the aggregated tag info to the new child */
@@ -689,8 +711,8 @@ static inline bool radix_tree_shrink(struct radix_tree_root *root,
 		 * one (root->xa_head) as far as dependent read barriers go.
 		 */
 		root->xa_head = (void __rcu *)child;
-		if (is_idr(root) && !rtag_get(node, IDR_FREE, 0))
-			root_tag_clear(root, IDR_FREE);
+		if (is_idr(root) && !rtag_get(node, XA_FREE_TAG, 0))
+			root_tag_clear(root, XA_FREE_TAG);
 
 		/*
 		 * We have a dilemma here. The node's slot[0] must not be
@@ -1117,7 +1139,7 @@ static bool node_tag_get(const struct radix_tree_root *root,
 /*
  * IDR users want to be able to store NULL in the tree, so if the slot isn't
  * free, don't adjust the count, even if it's transitioning between NULL and
- * non-NULL.  For the IDA, we mark slots as being IDR_FREE while they still
+ * non-NULL.  For the IDA, we mark slots as being XA_FREE_TAG while they still
  * have empty bits, but it only stores NULL in slots when they're being
  * deleted.
  */
@@ -1127,7 +1149,7 @@ static int calculate_count(struct radix_tree_root *root,
 {
 	if (is_idr(root)) {
 		unsigned offset = get_slot_offset(node, slot);
-		bool free = node_tag_get(root, node, IDR_FREE, offset);
+		bool free = node_tag_get(root, node, XA_FREE_TAG, offset);
 		if (!free)
 			return 0;
 		if (!old)
@@ -1958,7 +1980,7 @@ static bool __radix_tree_delete(struct radix_tree_root *root,
 	int tag;
 
 	if (is_idr(root))
-		node_tag_set(root, node, IDR_FREE, offset);
+		node_tag_set(root, node, XA_FREE_TAG, offset);
 	else
 		for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++)
 			node_tag_clear(root, node, tag, offset);
@@ -2006,7 +2028,7 @@ void *radix_tree_delete_item(struct radix_tree_root *root,
 	void *entry;
 
 	entry = __radix_tree_lookup(root, index, &node, &slot);
-	if (!entry && (!is_idr(root) || node_tag_get(root, node, IDR_FREE,
+	if (!entry && (!is_idr(root) || node_tag_get(root, node, XA_FREE_TAG,
 						get_slot_offset(node, slot))))
 		return NULL;
 
@@ -2113,7 +2135,7 @@ void __rcu **idr_get_free(struct radix_tree_root *root,
 
  grow:
 	shift = radix_tree_load_root(root, &child, &maxindex);
-	if (!radix_tree_tagged(root, IDR_FREE))
+	if (!radix_tree_tagged(root, XA_FREE_TAG))
 		start = max(start, maxindex + 1);
 	if (start > max)
 		return ERR_PTR(-ENOSPC);
@@ -2134,7 +2156,7 @@ void __rcu **idr_get_free(struct radix_tree_root *root,
 							offset, 0, 0);
 			if (!child)
 				return ERR_PTR(-ENOMEM);
-			all_tag_set(child, IDR_FREE);
+			all_tag_set(child, XA_FREE_TAG);
 			rcu_assign_pointer(*slot, node_to_entry(child));
 			if (node)
 				node->count++;
@@ -2143,8 +2165,8 @@ void __rcu **idr_get_free(struct radix_tree_root *root,
 
 		node = entry_to_node(child);
 		offset = radix_tree_descend(node, &child, start);
-		if (!rtag_get(node, IDR_FREE, offset)) {
-			offset = radix_tree_find_next_bit(node, IDR_FREE,
+		if (!rtag_get(node, XA_FREE_TAG, offset)) {
+			offset = radix_tree_find_next_bit(node, XA_FREE_TAG,
 							offset + 1);
 			start = rnext_index(start, node, offset);
 			if (start > max)
@@ -2168,32 +2190,11 @@ void __rcu **idr_get_free(struct radix_tree_root *root,
 		iter->next_index = 1;
 	iter->node = node;
 	__set_iter_shift(iter, shift);
-	set_iter_tags(iter, node, offset, IDR_FREE);
+	set_iter_tags(iter, node, offset, XA_FREE_TAG);
 
 	return slot;
 }
 
-/**
- * idr_destroy - release all internal memory from an IDR
- * @idr: idr handle
- *
- * After this function is called, the IDR is empty, and may be reused or
- * the data structure containing it may be freed.
- *
- * A typical clean-up sequence for objects stored in an idr tree will use
- * idr_for_each() to free all objects, if necessary, then idr_destroy() to
- * free the memory used to keep track of those objects.
- */
-void idr_destroy(struct idr *idr)
-{
-	struct radix_tree_node *node = rcu_dereference_raw(idr->idr_rt.xa_head);
-	if (radix_tree_is_internal_node(node))
-		radix_tree_free_nodes(node);
-	idr->idr_rt.xa_head = NULL;
-	root_tag_set(&idr->idr_rt, IDR_FREE);
-}
-EXPORT_SYMBOL(idr_destroy);
-
 static void
 radix_tree_node_ctor(void *arg)
 {
diff --git a/lib/xarray.c b/lib/xarray.c
index 38e5fe39bb97..4990ff18db9d 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -1078,6 +1078,8 @@ void *xa_load(struct xarray *xa, unsigned long index)
 	rcu_read_lock();
 	do {
 		entry = xas_load(xa, &xas);
+		if (xa_is_skip(entry))
+			entry = NULL;
 	} while (xas_retry(&xas, entry));
 	rcu_read_unlock();
 
@@ -1113,6 +1115,8 @@ void *xa_store(struct xarray *xa, unsigned long index, void *entry, gfp_t gfp)
 		xa_lock_irqsave(xa, flags);
 		curr = xas_store(xa, &xas, entry);
 		xa_unlock_irqrestore(xa, flags);
+		if (xa_is_skip(curr))
+			curr = NULL;
 	} while (xas_nomem(&xas, gfp));
 	xas_destroy(&xas);
 
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 159+ messages in thread

- *
- * A typical clean-up sequence for objects stored in an idr tree will use
- * idr_for_each() to free all objects, if necessary, then idr_destroy() to
- * free the memory used to keep track of those objects.
- */
-void idr_destroy(struct idr *idr)
-{
-	struct radix_tree_node *node = rcu_dereference_raw(idr->idr_rt.xa_head);
-	if (radix_tree_is_internal_node(node))
-		radix_tree_free_nodes(node);
-	idr->idr_rt.xa_head = NULL;
-	root_tag_set(&idr->idr_rt, IDR_FREE);
-}
-EXPORT_SYMBOL(idr_destroy);
-
 static void
 radix_tree_node_ctor(void *arg)
 {
diff --git a/lib/xarray.c b/lib/xarray.c
index 38e5fe39bb97..4990ff18db9d 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -1078,6 +1078,8 @@ void *xa_load(struct xarray *xa, unsigned long index)
 	rcu_read_lock();
 	do {
 		entry = xas_load(xa, &xas);
+		if (xa_is_skip(entry))
+			entry = NULL;
 	} while (xas_retry(&xas, entry));
 	rcu_read_unlock();
 
@@ -1113,6 +1115,8 @@ void *xa_store(struct xarray *xa, unsigned long index, void *entry, gfp_t gfp)
 		xa_lock_irqsave(xa, flags);
 		curr = xas_store(xa, &xas, entry);
 		xa_unlock_irqrestore(xa, flags);
+		if (xa_is_skip(curr))
+			curr = NULL;
 	} while (xas_nomem(&xas, gfp));
 	xas_destroy(&xas);
 
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 32/62] ida: Convert to using xarray
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:07   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:07 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

Use the xarray infrastructure like we used the radix tree infrastructure.
This lets us get rid of idr_get_free() from the radix tree code.
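
The externally visible IDA API is unchanged by this conversion; only the
backing store moves from the radix tree to the XArray, and the separate
simple_ida_lock goes away because the xarray brings its own lock.  As a
minimal caller sketch (hypothetical driver code, not part of this patch),
usage still looks like:

	#include <linux/idr.h>

	static DEFINE_IDA(example_ida);		/* hypothetical name */

	int example_get_id(void)
	{
		/* lowest free ID >= 0, or a negative errno */
		return ida_simple_get(&example_ida, 0, 0, GFP_KERNEL);
	}

	void example_put_id(int id)
	{
		ida_simple_remove(&example_ida, id);
	}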

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 include/linux/idr.h        |   8 +-
 include/linux/radix-tree.h |   4 -
 lib/idr.c                  | 239 +++++++++++++++++++++++----------------------
 lib/radix-tree.c           |  86 +---------------
 4 files changed, 128 insertions(+), 209 deletions(-)

diff --git a/include/linux/idr.h b/include/linux/idr.h
index 57945eb0792a..8f62f7ba79fd 100644
--- a/include/linux/idr.h
+++ b/include/linux/idr.h
@@ -170,11 +170,11 @@ struct ida_bitmap {
 DECLARE_PER_CPU(struct ida_bitmap *, ida_bitmap);
 
 struct ida {
-	struct radix_tree_root	ida_rt;
+	struct xarray	idxa;
 };
 
 #define IDA_INIT(name)	{						\
-	.ida_rt = RADIX_TREE_INIT(name, IDR_INIT_FLAGS | GFP_NOWAIT),	\
+	.idxa = __XARRAY_INIT(name.idxa, IDR_INIT_FLAGS)		\
 }
 #define DEFINE_IDA(name)	struct ida name = IDA_INIT(name)
 
@@ -189,7 +189,7 @@ void ida_simple_remove(struct ida *ida, unsigned int id);
 
 static inline void ida_init(struct ida *ida)
 {
-	INIT_RADIX_TREE(&ida->ida_rt, IDR_INIT_FLAGS | GFP_NOWAIT);
+	__xa_init(&ida->idxa, IDR_INIT_FLAGS);
 }
 
 /**
@@ -206,6 +206,6 @@ static inline int ida_get_new(struct ida *ida, int *p_id)
 
 static inline bool ida_is_empty(const struct ida *ida)
 {
-	return radix_tree_empty(&ida->ida_rt);
+	return xa_empty(&ida->idxa);
 }
 #endif /* _LINUX_IDR_H */
diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index 1da1fb01e993..3e1c7ef06a0b 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -310,10 +310,6 @@ int radix_tree_split(struct radix_tree_root *, unsigned long index,
 int radix_tree_join(struct radix_tree_root *, unsigned long index,
 			unsigned new_order, void *);
 
-void __rcu **idr_get_free(struct radix_tree_root *root,
-			      struct radix_tree_iter *iter, gfp_t gfp,
-			      unsigned long max);
-
 enum {
 	RADIX_TREE_ITER_TAG_MASK = 0x0f,	/* tag index in lower nybble */
 	RADIX_TREE_ITER_TAGGED   = 0x10,	/* lookup tagged slots */
diff --git a/lib/idr.c b/lib/idr.c
index 713b19e6f1b3..574dcade0c4b 100644
--- a/lib/idr.c
+++ b/lib/idr.c
@@ -6,7 +6,6 @@
 #include <linux/xarray.h>
 
 DEFINE_PER_CPU(struct ida_bitmap *, ida_bitmap);
-static DEFINE_SPINLOCK(simple_ida_lock);
 
 /* In radix-tree.c temporarily */
 extern bool idr_nomem(struct xa_state *, gfp_t);
@@ -277,104 +276,114 @@ EXPORT_SYMBOL_GPL(idr_replace);
 
 #define IDA_MAX (0x80000000U / IDA_BITMAP_BITS - 1)
 
+static struct ida_bitmap *alloc_ida_bitmap(void)
+{
+	struct ida_bitmap *bitmap = this_cpu_xchg(ida_bitmap, NULL);
+	if (bitmap)
+		memset(bitmap, 0, sizeof(*bitmap));
+	return bitmap;
+}
+
+static void free_ida_bitmap(struct ida_bitmap *bitmap)
+{
+	if (this_cpu_cmpxchg(ida_bitmap, NULL, bitmap))
+		kfree(bitmap);
+}
+
 /**
  * ida_get_new_above - allocate new ID above or equal to a start id
  * @ida: ida handle
  * @start: id to start search at
  * @id: pointer to the allocated handle
  *
- * Allocate new ID above or equal to @start.  It should be called
- * with any required locks to ensure that concurrent calls to
- * ida_get_new_above() / ida_get_new() / ida_remove() are not allowed.
- * Consider using ida_simple_get() if you do not have complex locking
- * requirements.
+ * Allocate new ID above or equal to @start.  The ida has its own lock,
+ * although you may wish to provide your own locking around it.
  *
  * If memory is required, it will return %-EAGAIN, you should unlock
  * and go back to the ida_pre_get() call.  If the ida is full, it will
  * return %-ENOSPC.  On success, it will return 0.
  *
- * @id returns a value in the range @start ... %0x7fffffff.
+ * @id returns a value in the range @start ... %INT_MAX.
  */
 int ida_get_new_above(struct ida *ida, int start, int *id)
 {
-	struct radix_tree_root *root = &ida->ida_rt;
-	void __rcu **slot;
-	struct radix_tree_iter iter;
+	unsigned long flags;
+	unsigned long index = start / IDA_BITMAP_BITS;
+	unsigned int bit = start % IDA_BITMAP_BITS;
+	XA_STATE(xas, index);
 	struct ida_bitmap *bitmap;
-	unsigned long index;
-	unsigned bit;
-	int new;
-
-	index = start / IDA_BITMAP_BITS;
-	bit = start % IDA_BITMAP_BITS;
-
-	slot = radix_tree_iter_init(&iter, index);
-	for (;;) {
-		if (slot)
-			slot = radix_tree_next_slot(slot, &iter,
-						RADIX_TREE_ITER_TAGGED);
-		if (!slot) {
-			slot = idr_get_free(root, &iter, GFP_NOWAIT, IDA_MAX);
-			if (IS_ERR(slot)) {
-				if (slot == ERR_PTR(-ENOMEM))
-					return -EAGAIN;
-				return PTR_ERR(slot);
-			}
-		}
-		if (iter.index > index)
-			bit = 0;
-		new = iter.index * IDA_BITMAP_BITS;
-		bitmap = rcu_dereference_raw(*slot);
-		if (xa_is_value(bitmap)) {
-			unsigned long tmp = xa_to_value(bitmap);
-			int vbit = find_next_zero_bit(&tmp, BITS_PER_XA_VALUE,
-							bit);
-			if (vbit < BITS_PER_XA_VALUE) {
-				tmp |= 1UL << vbit;
-				rcu_assign_pointer(*slot, xa_mk_value(tmp));
-				*id = new + vbit;
-				return 0;
-			}
-			bitmap = this_cpu_xchg(ida_bitmap, NULL);
-			if (!bitmap)
-				return -EAGAIN;
-			memset(bitmap, 0, sizeof(*bitmap));
-			bitmap->bitmap[0] = tmp;
-			rcu_assign_pointer(*slot, bitmap);
-		}
+	unsigned int new;
+
+	xa_lock_irqsave(&ida->idxa, flags);
+retry:
+	bitmap = xas_find_tag(&ida->idxa, &xas, IDA_MAX, XA_FREE_TAG);
+	if (xas.xa_index > IDA_MAX)
+		goto nospc;
+	if (xas.xa_index > index)
+		bit = 0;
+	new = xas.xa_index * IDA_BITMAP_BITS;
+	if (xa_is_value(bitmap)) {
+		unsigned long value = xa_to_value(bitmap);
+		if (bit < BITS_PER_XA_VALUE) {
+			unsigned long tmp = value | ((1UL << bit) - 1);
+			bit = ffz(tmp);
 
-		if (bitmap) {
-			bit = find_next_zero_bit(bitmap->bitmap,
-							IDA_BITMAP_BITS, bit);
-			new += bit;
-			if (new < 0)
-				return -ENOSPC;
-			if (bit == IDA_BITMAP_BITS)
-				continue;
-
-			__set_bit(bit, bitmap->bitmap);
-			if (bitmap_full(bitmap->bitmap, IDA_BITMAP_BITS))
-				radix_tree_iter_tag_clear(root, &iter,
-								XA_FREE_TAG);
-		} else {
-			new += bit;
-			if (new < 0)
-				return -ENOSPC;
 			if (bit < BITS_PER_XA_VALUE) {
-				bitmap = xa_mk_value(1UL << bit);
-			} else {
-				bitmap = this_cpu_xchg(ida_bitmap, NULL);
-				if (!bitmap)
-					return -EAGAIN;
-				memset(bitmap, 0, sizeof(*bitmap));
-				__set_bit(bit, bitmap->bitmap);
+				value |= (1UL << bit);
+				xas_store(&ida->idxa, &xas, xa_mk_value(value));
+				new += bit;
+				goto unlock;
 			}
-			radix_tree_iter_replace(root, &iter, slot, bitmap);
 		}
 
-		*id = new;
-		return 0;
+		bitmap = alloc_ida_bitmap();
+		if (!bitmap)
+			goto nomem;
+		bitmap->bitmap[0] = value;
+		new += bit;
+		__set_bit(bit, bitmap->bitmap);
+		xas_store(&ida->idxa, &xas, bitmap);
+		if (xas_error(&xas))
+			free_ida_bitmap(bitmap);
+	} else if (bitmap) {
+		bit = find_next_zero_bit(bitmap->bitmap, IDA_BITMAP_BITS, bit);
+		if (bit == IDA_BITMAP_BITS)
+			goto retry;
+		new += bit;
+		if (new > INT_MAX)
+			goto nospc;
+		__set_bit(bit, bitmap->bitmap);
+		if (bitmap_full(bitmap->bitmap, IDA_BITMAP_BITS))
+			xas_clear_tag(&ida->idxa, &xas, XA_FREE_TAG);
+	} else if (bit < BITS_PER_XA_VALUE) {
+		new += bit;
+		bitmap = xa_mk_value(1UL << bit);
+		xas_store(&ida->idxa, &xas, bitmap);
+	} else {
+		bitmap = alloc_ida_bitmap();
+		if (!bitmap)
+			goto nomem;
+		new += bit;
+		__set_bit(bit, bitmap->bitmap);
+		xas_store(&ida->idxa, &xas, bitmap);
+		if (xas_error(&xas))
+			free_ida_bitmap(bitmap);
 	}
+
+	if (idr_nomem(&xas, GFP_NOWAIT))
+		goto retry;
+unlock:
+	xa_unlock_irqrestore(&ida->idxa, flags);
+	if (xas_error(&xas) == -ENOMEM)
+		return -EAGAIN;
+	*id = new;
+	return 0;
+nospc:
+	xa_unlock_irqrestore(&ida->idxa, flags);
+	return -ENOSPC;
+nomem:
+	xa_unlock_irqrestore(&ida->idxa, flags);
+	return -EAGAIN;
 }
 EXPORT_SYMBOL(ida_get_new_above);
 
@@ -382,45 +391,44 @@ EXPORT_SYMBOL(ida_get_new_above);
  * ida_remove - Free the given ID
  * @ida: ida handle
  * @id: ID to free
- *
- * This function should not be called at the same time as ida_get_new_above().
  */
 void ida_remove(struct ida *ida, int id)
 {
+	unsigned long flags;
 	unsigned long index = id / IDA_BITMAP_BITS;
-	unsigned offset = id % IDA_BITMAP_BITS;
+	unsigned bit = id % IDA_BITMAP_BITS;
+	XA_STATE(xas, index);
 	struct ida_bitmap *bitmap;
-	unsigned long *btmp;
-	struct radix_tree_iter iter;
-	void __rcu **slot;
 
-	slot = radix_tree_iter_lookup(&ida->ida_rt, &iter, index);
-	if (!slot)
+	xa_lock_irqsave(&ida->idxa, flags);
+	bitmap = xas_load(&ida->idxa, &xas);
+	if (!bitmap)
 		goto err;
-
-	bitmap = rcu_dereference_raw(*slot);
 	if (xa_is_value(bitmap)) {
-		btmp = (unsigned long *)slot;
-		offset += 1; /* Intimate knowledge of the xa_data encoding */
-		if (offset >= BITS_PER_LONG)
+		unsigned long v = xa_to_value(bitmap);
+		if (bit >= BITS_PER_XA_VALUE)
 			goto err;
+		if (!(v & (1UL << bit)))
+			goto err;
+		v &= ~(1UL << bit);
+		if (v)
+			bitmap = xa_mk_value(v);
+		else
+			bitmap = NULL;
+		xas_store(&ida->idxa, &xas, bitmap);
 	} else {
-		btmp = bitmap->bitmap;
-	}
-	if (!test_bit(offset, btmp))
-		goto err;
-
-	__clear_bit(offset, btmp);
-	radix_tree_iter_tag_set(&ida->ida_rt, &iter, XA_FREE_TAG);
-	if (xa_is_value(bitmap)) {
-		if (xa_to_value(rcu_dereference_raw(*slot)) == 0)
-			radix_tree_iter_delete(&ida->ida_rt, &iter, slot);
-	} else if (bitmap_empty(btmp, IDA_BITMAP_BITS)) {
-		kfree(bitmap);
-		radix_tree_iter_delete(&ida->ida_rt, &iter, slot);
+		if (!__test_and_clear_bit(bit, bitmap->bitmap))
+			goto err;
+		if (bitmap_empty(bitmap->bitmap, IDA_BITMAP_BITS)) {
+			kfree(bitmap);
+			xas_store(&ida->idxa, &xas, NULL);
+		}
 	}
+	xas_set_tag(&ida->idxa, &xas, XA_FREE_TAG);
+	xa_unlock_irqrestore(&ida->idxa, flags);
 	return;
  err:
+	xa_unlock_irqrestore(&ida->idxa, flags);
 	WARN(1, "ida_remove called for id=%d which is not allocated.\n", id);
 }
 EXPORT_SYMBOL(ida_remove);
@@ -430,21 +438,21 @@ EXPORT_SYMBOL(ida_remove);
  * @ida: ida handle
  *
  * Calling this function releases all resources associated with an IDA.  When
- * this call returns, the IDA is empty and can be reused or freed.  The caller
- * should not allow ida_remove() or ida_get_new_above() to be called at the
- * same time.
+ * this call returns, the IDA is empty and can be reused or freed.
  */
 void ida_destroy(struct ida *ida)
 {
-	struct radix_tree_iter iter;
-	void __rcu **slot;
+	XA_STATE(xas, 0);
+	unsigned long flags;
+	struct ida_bitmap *bitmap;
 
-	radix_tree_for_each_slot(slot, &ida->ida_rt, &iter, 0) {
-		struct ida_bitmap *bitmap = rcu_dereference_raw(*slot);
+	xa_lock_irqsave(&ida->idxa, flags);
+	xas_for_each(&ida->idxa, &xas, bitmap, ULONG_MAX) {
 		if (!xa_is_value(bitmap))
 			kfree(bitmap);
-		radix_tree_iter_delete(&ida->ida_rt, &iter, slot);
+		xas_store(&ida->idxa, &xas, NULL);
 	}
+	xa_unlock_irqrestore(&ida->idxa, flags);
 }
 EXPORT_SYMBOL(ida_destroy);
 
@@ -468,7 +476,6 @@ int ida_simple_get(struct ida *ida, unsigned int start, unsigned int end,
 {
 	int ret, id;
 	unsigned int max;
-	unsigned long flags;
 
 	BUG_ON((int)start < 0);
 	BUG_ON((int)end < 0);
@@ -484,7 +491,6 @@ int ida_simple_get(struct ida *ida, unsigned int start, unsigned int end,
 	if (!ida_pre_get(ida, gfp_mask))
 		return -ENOMEM;
 
-	spin_lock_irqsave(&simple_ida_lock, flags);
 	ret = ida_get_new_above(ida, start, &id);
 	if (!ret) {
 		if (id > max) {
@@ -494,7 +500,6 @@ int ida_simple_get(struct ida *ida, unsigned int start, unsigned int end,
 			ret = id;
 		}
 	}
-	spin_unlock_irqrestore(&simple_ida_lock, flags);
 
 	if (unlikely(ret == -EAGAIN))
 		goto again;
@@ -515,11 +520,7 @@ EXPORT_SYMBOL(ida_simple_get);
  */
 void ida_simple_remove(struct ida *ida, unsigned int id)
 {
-	unsigned long flags;
-
 	BUG_ON((int)id < 0);
-	spin_lock_irqsave(&simple_ida_lock, flags);
 	ida_remove(ida, id);
-	spin_unlock_irqrestore(&simple_ida_lock, flags);
 }
 EXPORT_SYMBOL(ida_simple_remove);
diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index 3fbc0751b181..f261fb2a92d2 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -247,13 +247,6 @@ static inline unsigned long node_maxindex(const struct radix_tree_node *node)
 	return shift_maxindex(node->shift);
 }
 
-static unsigned long rnext_index(unsigned long index,
-				const struct radix_tree_node *node,
-				unsigned long offset)
-{
-	return (index & ~node_maxindex(node)) + (offset << node->shift);
-}
-
 #ifndef __KERNEL__
 static void dump_node(struct radix_tree_node *node, unsigned long index)
 {
@@ -338,10 +331,10 @@ static void dump_ida_node(void *entry, unsigned long index)
 
 static void ida_dump(struct ida *ida)
 {
-	struct radix_tree_root *root = &ida->ida_rt;
-	pr_debug("ida: %p node %p free %d\n", ida, root->xa_head,
-				root->xa_flags >> ROOT_TAG_SHIFT);
-	dump_ida_node(root->xa_head, 0);
+	struct xarray *xa = &ida->idxa;
+	pr_debug("ida: %p node %p free %d\n", ida, xa->xa_head,
+				xa->xa_flags >> ROOT_TAG_SHIFT);
+	dump_ida_node(xa->xa_head, 0);
 }
 #endif
 
@@ -2124,77 +2117,6 @@ int ida_pre_get(struct ida *ida, gfp_t gfp)
 }
 EXPORT_SYMBOL(ida_pre_get);
 
-void __rcu **idr_get_free(struct radix_tree_root *root,
-			      struct radix_tree_iter *iter, gfp_t gfp,
-			      unsigned long max)
-{
-	struct radix_tree_node *node = NULL, *child;
-	void __rcu **slot = (void __rcu **)&root->xa_head;
-	unsigned long maxindex, start = iter->next_index;
-	unsigned int shift, offset = 0;
-
- grow:
-	shift = radix_tree_load_root(root, &child, &maxindex);
-	if (!radix_tree_tagged(root, XA_FREE_TAG))
-		start = max(start, maxindex + 1);
-	if (start > max)
-		return ERR_PTR(-ENOSPC);
-
-	if (start > maxindex) {
-		int error = radix_tree_extend(root, gfp, start, shift);
-		if (error < 0)
-			return ERR_PTR(error);
-		shift = error;
-		child = rcu_dereference_raw(root->xa_head);
-	}
-
-	while (shift) {
-		shift -= RADIX_TREE_MAP_SHIFT;
-		if (child == NULL) {
-			/* Have to add a child node.  */
-			child = radix_tree_node_alloc(gfp, node, root, shift,
-							offset, 0, 0);
-			if (!child)
-				return ERR_PTR(-ENOMEM);
-			all_tag_set(child, XA_FREE_TAG);
-			rcu_assign_pointer(*slot, node_to_entry(child));
-			if (node)
-				node->count++;
-		} else if (!radix_tree_is_internal_node(child))
-			break;
-
-		node = entry_to_node(child);
-		offset = radix_tree_descend(node, &child, start);
-		if (!rtag_get(node, XA_FREE_TAG, offset)) {
-			offset = radix_tree_find_next_bit(node, XA_FREE_TAG,
-							offset + 1);
-			start = rnext_index(start, node, offset);
-			if (start > max)
-				return ERR_PTR(-ENOSPC);
-			while (offset == RADIX_TREE_MAP_SIZE) {
-				offset = node->offset + 1;
-				node = node->parent;
-				if (!node)
-					goto grow;
-				shift = node->shift;
-			}
-			child = rcu_dereference_raw(node->slots[offset]);
-		}
-		slot = &node->slots[offset];
-	}
-
-	iter->index = start;
-	if (node)
-		iter->next_index = 1 + min(max, (start | node_maxindex(node)));
-	else
-		iter->next_index = 1;
-	iter->node = node;
-	__set_iter_shift(iter, shift);
-	set_iter_tags(iter, node, offset, XA_FREE_TAG);
-
-	return slot;
-}
-
 static void
 radix_tree_node_ctor(void *arg)
 {
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 33/62] page cache: Convert page_cache_next_hole to XArray
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:07   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:07 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

Use xas_find_any() to scan the entries instead of doing a lookup from
the top of the tree each time.
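
The caller-visible behaviour is unchanged; the xa_state simply remembers
where the previous iteration stopped, so each step continues from the
current node instead of descending from the root again.  A hypothetical
caller (names invented for illustration) still looks like:

	pgoff_t hole;

	rcu_read_lock();
	/* look for the first missing page in the next 64 slots */
	hole = page_cache_next_hole(mapping, index, 64);
	rcu_read_unlock();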

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 mm/filemap.c | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 1c03b0ea105e..accc350f9544 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1349,20 +1349,15 @@ int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
 pgoff_t page_cache_next_hole(struct address_space *mapping,
 			     pgoff_t index, unsigned long max_scan)
 {
-	unsigned long i;
-
-	for (i = 0; i < max_scan; i++) {
-		struct page *page;
+	XA_STATE(xas, index);
 
-		page = radix_tree_lookup(&mapping->pages, index);
-		if (!page || xa_is_value(page))
-			break;
-		index++;
-		if (index == 0)
+	while (max_scan--) {
+		void *entry = xas_find_any(&mapping->pages, &xas);
+		if (!entry || xa_is_value(entry))
 			break;
 	}
 
-	return index;
+	return xas.xa_index;
 }
 EXPORT_SYMBOL(page_cache_next_hole);
 
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 34/62] page cache: Use xarray for adding pages
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:07   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:07 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

Use the XArray APIs to add and replace pages in the page cache.  This
removes two uses of the radix tree preload API and results in
significantly shorter code.
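
Instead of preloading per-CPU radix tree nodes up front, any allocation
failure is carried in the xa_state out of the locked region, xas_nomem()
allocates with the caller's gfp mask, and the store is retried.  A
condensed sketch of that pattern (illustration only; the real code below
also handles shadow entries and the page cache accounting):

	XA_STATE(xas, index);
	void *old;

	do {
		xa_lock_irq(&mapping->pages);
		/*
		 * old is the previous entry; the real code below checks
		 * it for shadow entries and for -EEXIST.
		 */
		old = xas_create(&mapping->pages, &xas);
		if (!xas_error(&xas))
			xas_store(&mapping->pages, &xas, page);
		xa_unlock_irq(&mapping->pages);
	} while (xas_nomem(&xas, gfp_mask));
	xas_destroy(&xas);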

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 include/linux/xarray.h |  13 +++++
 mm/filemap.c           | 131 +++++++++++++++++++++----------------------------
 2 files changed, 69 insertions(+), 75 deletions(-)

diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index 8d7cd70beb8e..1648eda4a20d 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -542,6 +542,19 @@ static inline void xas_set_order(struct xa_state *xas, unsigned long index,
 	xas->xa_node = XAS_RESTART;
 }
 
+/**
+ * xas_set_update() - Set up XArray operation state for a callback.
+ * @xas: XArray operation state.
+ * @update: Function to call when updating a node.
+ *
+ * The XArray can notify a caller after it has updated an xa_node.
+ * This is advanced functionality and is only needed by the page cache.
+ */
+static inline void xas_set_update(struct xa_state *xas, xa_update_node_t update)
+{
+	xas->xa_update = update;
+}
+
 /* Skip over any of these entries when iterating */
 static inline bool xa_iter_skip(void *entry)
 {
diff --git a/mm/filemap.c b/mm/filemap.c
index accc350f9544..1d520748789b 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -112,34 +112,6 @@
  *   ->tasklist_lock            (memory_failure, collect_procs_ao)
  */
 
-static int page_cache_tree_insert(struct address_space *mapping,
-				  struct page *page, void **shadowp)
-{
-	struct radix_tree_node *node;
-	void **slot;
-	int error;
-
-	error = __radix_tree_create(&mapping->pages, page->index, 0,
-				    &node, &slot);
-	if (error)
-		return error;
-	if (*slot) {
-		void *p;
-
-		p = radix_tree_deref_slot_protected(slot, &mapping->pages.xa_lock);
-		if (!xa_is_value(p))
-			return -EEXIST;
-
-		mapping->nrexceptional--;
-		if (shadowp)
-			*shadowp = p;
-	}
-	__radix_tree_replace(&mapping->pages, node, slot, page,
-			     workingset_lookup_update(mapping));
-	mapping->nrpages++;
-	return 0;
-}
-
 static void page_cache_tree_delete(struct address_space *mapping,
 				   struct page *page, void *shadow)
 {
@@ -775,51 +747,44 @@ EXPORT_SYMBOL(file_write_and_wait_range);
  * locked.  This function does not add the new page to the LRU, the
  * caller must do that.
  *
- * The remove + add is atomic.  The only way this function can fail is
- * memory allocation failure.
+ * The remove + add is atomic.  This function cannot fail.
  */
 int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask)
 {
-	int error;
+	XA_STATE(xas, old->index);
+	struct address_space *mapping = old->mapping;
+	void (*freepage)(struct page *) = mapping->a_ops->freepage;
+	pgoff_t offset = old->index;
+	unsigned long flags;
 
 	VM_BUG_ON_PAGE(!PageLocked(old), old);
 	VM_BUG_ON_PAGE(!PageLocked(new), new);
 	VM_BUG_ON_PAGE(new->mapping, new);
 
-	error = radix_tree_preload(gfp_mask & ~__GFP_HIGHMEM);
-	if (!error) {
-		struct address_space *mapping = old->mapping;
-		void (*freepage)(struct page *);
-		unsigned long flags;
-
-		pgoff_t offset = old->index;
-		freepage = mapping->a_ops->freepage;
-
-		get_page(new);
-		new->mapping = mapping;
-		new->index = offset;
+	get_page(new);
+	new->mapping = mapping;
+	new->index = offset;
 
-		xa_lock_irqsave(&mapping->pages, flags);
-		__delete_from_page_cache(old, NULL);
-		error = page_cache_tree_insert(mapping, new, NULL);
-		BUG_ON(error);
+	xa_lock_irqsave(&mapping->pages, flags);
+	xas_store(&mapping->pages, &xas, new);
 
-		/*
-		 * hugetlb pages do not participate in page cache accounting.
-		 */
-		if (!PageHuge(new))
-			__inc_node_page_state(new, NR_FILE_PAGES);
-		if (PageSwapBacked(new))
-			__inc_node_page_state(new, NR_SHMEM);
-		xa_unlock_irqrestore(&mapping->pages, flags);
-		mem_cgroup_migrate(old, new);
-		radix_tree_preload_end();
-		if (freepage)
-			freepage(old);
-		put_page(old);
-	}
+	old->mapping = NULL;
+	/* hugetlb pages do not participate in page cache accounting. */
+	if (!PageHuge(old))
+		__dec_node_page_state(new, NR_FILE_PAGES);
+	if (!PageHuge(new))
+		__inc_node_page_state(new, NR_FILE_PAGES);
+	if (PageSwapBacked(old))
+		__dec_node_page_state(new, NR_SHMEM);
+	if (PageSwapBacked(new))
+		__inc_node_page_state(new, NR_SHMEM);
+	xa_unlock_irqrestore(&mapping->pages, flags);
+	mem_cgroup_migrate(old, new);
+	if (freepage)
+		freepage(old);
+	put_page(old);
 
-	return error;
+	return 0;
 }
 EXPORT_SYMBOL_GPL(replace_page_cache_page);
 
@@ -828,12 +793,15 @@ static int __add_to_page_cache_locked(struct page *page,
 				      pgoff_t offset, gfp_t gfp_mask,
 				      void **shadowp)
 {
+	XA_STATE(xas, offset);
 	int huge = PageHuge(page);
 	struct mem_cgroup *memcg;
 	int error;
+	void *old;
 
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
 	VM_BUG_ON_PAGE(PageSwapBacked(page), page);
+	xas_set_update(&xas, workingset_lookup_update(mapping));
 
 	if (!huge) {
 		error = mem_cgroup_try_charge(page, current->mm,
@@ -842,35 +810,48 @@ static int __add_to_page_cache_locked(struct page *page,
 			return error;
 	}
 
-	error = radix_tree_maybe_preload(gfp_mask & ~__GFP_HIGHMEM);
-	if (error) {
-		if (!huge)
-			mem_cgroup_cancel_charge(page, memcg, false);
-		return error;
-	}
-
 	get_page(page);
 	page->mapping = mapping;
 	page->index = offset;
 
+retry:
 	xa_lock_irq(&mapping->pages);
-	error = page_cache_tree_insert(mapping, page, shadowp);
-	radix_tree_preload_end();
-	if (unlikely(error))
-		goto err_insert;
+	old = xas_create(&mapping->pages, &xas);
+	error = xas_error(&xas);
+	if (error) {
+		xa_unlock_irq(&mapping->pages);
+		if (xas_nomem(&xas, gfp_mask & ~__GFP_HIGHMEM))
+			goto retry;
+		goto error;
+	}
+
+	if (xa_is_value(old)) {
+		mapping->nrexceptional--;
+		if (shadowp)
+			*shadowp = old;
+	} else if (old) {
+		goto exist;
+	}
+
+	xas_store(&mapping->pages, &xas, page);
+	mapping->nrpages++;
 
 	/* hugetlb pages do not participate in page cache accounting. */
 	if (!huge)
 		__inc_node_page_state(page, NR_FILE_PAGES);
 	xa_unlock_irq(&mapping->pages);
+	xas_destroy(&xas);
 	if (!huge)
 		mem_cgroup_commit_charge(page, memcg, false, false);
 	trace_mm_filemap_add_to_page_cache(page);
 	return 0;
-err_insert:
+exist:
+	xa_unlock_irq(&mapping->pages);
+	error = -EEXIST;
+error:
+	xas_destroy(&xas);
 	page->mapping = NULL;
 	/* Leave page->index set: truncation relies upon it */
-	xa_unlock_irq(&mapping->pages);
 	if (!huge)
 		mem_cgroup_cancel_charge(page, memcg, false);
 	put_page(page);
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 35/62] page cache: Convert page_cache_tree_delete to xarray
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:07   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:07 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

The code is slightly shorter and simpler.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 mm/filemap.c | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 1d520748789b..f60f22867a1a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -115,27 +115,25 @@
 static void page_cache_tree_delete(struct address_space *mapping,
 				   struct page *page, void *shadow)
 {
-	int i, nr;
+	XA_STATE(xas, page->index);
+	unsigned int i, nr;
 
-	/* hugetlb pages are represented by one entry in the radix tree */
+	xas_set_update(&xas, workingset_lookup_update(mapping));
+
+	/* hugetlb pages are represented by a single entry in the xarray */
 	nr = PageHuge(page) ? 1 : hpage_nr_pages(page);
 
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
 	VM_BUG_ON_PAGE(PageTail(page), page);
 	VM_BUG_ON_PAGE(nr != 1 && shadow, page);
 
-	for (i = 0; i < nr; i++) {
-		struct radix_tree_node *node;
-		void **slot;
-
-		__radix_tree_lookup(&mapping->pages, page->index + i,
-				    &node, &slot);
-
-		VM_BUG_ON_PAGE(!node && nr != 1, page);
-
-		radix_tree_clear_tags(&mapping->pages, node, slot);
-		__radix_tree_replace(&mapping->pages, node, slot, shadow,
-				workingset_lookup_update(mapping));
+	i = nr;
+repeat:
+	xas_store(&mapping->pages, &xas, shadow);
+	xas_init_tags(&mapping->pages, &xas);
+	if (--i) {
+		xas_next_any(&mapping->pages, &xas);
+		goto repeat;
 	}
 
 	page->mapping = NULL;
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 36/62] page cache: Convert find_get_entry to xarray
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:07   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:07 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

Slightly simpler and shorter code.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 mm/filemap.c | 56 +++++++++++++++++++++-----------------------------------
 1 file changed, 21 insertions(+), 35 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index f60f22867a1a..2172c9f9efb9 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1391,51 +1391,37 @@ EXPORT_SYMBOL(page_cache_prev_hole);
  */
 struct page *find_get_entry(struct address_space *mapping, pgoff_t offset)
 {
-	void **pagep;
+	XA_STATE(xas, offset);
 	struct page *head, *page;
 
 	rcu_read_lock();
 repeat:
-	page = NULL;
-	pagep = radix_tree_lookup_slot(&mapping->pages, offset);
-	if (pagep) {
-		page = radix_tree_deref_slot(pagep);
-		if (unlikely(!page))
-			goto out;
-		if (radix_tree_exception(page)) {
-			if (radix_tree_deref_retry(page))
-				goto repeat;
-			/*
-			 * A shadow entry of a recently evicted page,
-			 * or a swap entry from shmem/tmpfs.  Return
-			 * it without attempting to raise page count.
-			 */
-			goto out;
-		}
+	page = xas_load(&mapping->pages, &xas);
+	if (!page || xa_is_value(page))
+		goto out;
 
-		head = compound_head(page);
-		if (!page_cache_get_speculative(head))
-			goto repeat;
+	head = compound_head(page);
+	if (!page_cache_get_speculative(head))
+		goto repeat;
 
-		/* The page was split under us? */
-		if (compound_head(page) != head) {
-			put_page(head);
-			goto repeat;
-		}
+	/* The page was split under us? */
+	if (compound_head(page) != head) {
+		put_page(head);
+		goto repeat;
+	}
 
-		/*
-		 * Has the page moved?
-		 * This is part of the lockless pagecache protocol. See
-		 * include/linux/pagemap.h for details.
-		 */
-		if (unlikely(page != *pagep)) {
-			put_page(head);
-			goto repeat;
-		}
+	/*
+	 * Has the page moved?
+	 * This is part of the lockless pagecache protocol. See
+	 * include/linux/pagemap.h for details.
+	 */
+	if (unlikely(page != xas_reload(&mapping->pages, &xas))) {
+		put_page(head);
+		goto repeat;
 	}
+
 out:
 	rcu_read_unlock();
-
 	return page;
 }
 EXPORT_SYMBOL(find_get_entry);
-- 
2.15.0
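
The "Has the page moved?" check above is the lockless page cache
protocol in miniature.  As a hedged sketch (the helper name is invented
and the compound-page-split check is omitted for brevity; the real
rules live in include/linux/pagemap.h), the idiom looks like this,
using the calling conventions of this series:

	/*
	 * Editor's sketch of the speculative-lookup idiom used above.
	 * Load the entry under RCU, take a speculative reference on the
	 * head page, then re-read the slot and retry if it changed
	 * underneath us.
	 */
	static struct page *speculative_lookup(struct address_space *mapping,
					       pgoff_t offset)
	{
		XA_STATE(xas, offset);
		struct page *head, *page;

		rcu_read_lock();
	repeat:
		page = xas_load(&mapping->pages, &xas);
		if (!page || xa_is_value(page))
			goto out;

		head = compound_head(page);
		if (!page_cache_get_speculative(head))
			goto repeat;

		/* Did the entry change while we were taking the reference? */
		if (page != xas_reload(&mapping->pages, &xas)) {
			put_page(head);
			goto repeat;
		}
	out:
		rcu_read_unlock();
		return page;
	}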

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 37/62] shmem: Convert replace to xarray
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:07   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:07 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

shmem_radix_tree_replace() is renamed to shmem_xa_replace() and
converted to use the XArray API.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 mm/shmem.c | 22 ++++++++--------------
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 21bf42f14ee2..f16afa03cfb0 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -321,24 +321,20 @@ void shmem_uncharge(struct inode *inode, long pages)
 }
 
 /*
- * Replace item expected in radix tree by a new item, while holding tree lock.
+ * Replace item expected in xarray by a new item, while holding xa_lock.
  */
-static int shmem_radix_tree_replace(struct address_space *mapping,
+static int shmem_xa_replace(struct address_space *mapping,
 			pgoff_t index, void *expected, void *replacement)
 {
-	struct radix_tree_node *node;
-	void **pslot;
+	XA_STATE(xas, index);
 	void *item;
 
 	VM_BUG_ON(!expected);
 	VM_BUG_ON(!replacement);
-	item = __radix_tree_lookup(&mapping->pages, index, &node, &pslot);
-	if (!item)
-		return -ENOENT;
+	item = xas_load(&mapping->pages, &xas);
 	if (item != expected)
 		return -ENOENT;
-	__radix_tree_replace(&mapping->pages, node, pslot,
-			     replacement, NULL);
+	xas_store(&mapping->pages, &xas, replacement);
 	return 0;
 }
 
@@ -605,8 +601,7 @@ static int shmem_add_to_page_cache(struct page *page,
 	} else if (!expected) {
 		error = radix_tree_insert(&mapping->pages, index, page);
 	} else {
-		error = shmem_radix_tree_replace(mapping, index, expected,
-								 page);
+		error = shmem_xa_replace(mapping, index, expected, page);
 	}
 
 	if (!error) {
@@ -635,7 +630,7 @@ static void shmem_delete_from_page_cache(struct page *page, void *radswap)
 	VM_BUG_ON_PAGE(PageCompound(page), page);
 
 	xa_lock_irq(&mapping->pages);
-	error = shmem_radix_tree_replace(mapping, page->index, page, radswap);
+	error = shmem_xa_replace(mapping, page->index, page, radswap);
 	page->mapping = NULL;
 	mapping->nrpages--;
 	__dec_node_page_state(page, NR_FILE_PAGES);
@@ -1550,8 +1545,7 @@ static int shmem_replace_page(struct page **pagep, gfp_t gfp,
 	 * a nice clean interface for us to replace oldpage by newpage there.
 	 */
 	xa_lock_irq(&swap_mapping->pages);
-	error = shmem_radix_tree_replace(swap_mapping, swap_index, oldpage,
-								   newpage);
+	error = shmem_xa_replace(swap_mapping, swap_index, oldpage, newpage);
 	if (!error) {
 		__inc_node_page_state(newpage, NR_FILE_PAGES);
 		__dec_node_page_state(oldpage, NR_FILE_PAGES);
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 38/62] shmem: Convert shmem_confirm_swap to XArray
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:07   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:07 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

xa_load() does its own RCU locking, so we can eliminate the explicit
rcu_read_lock()/rcu_read_unlock() here.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 mm/shmem.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index f16afa03cfb0..98944cff6319 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -348,12 +348,7 @@ static int shmem_xa_replace(struct address_space *mapping,
 static bool shmem_confirm_swap(struct address_space *mapping,
 			       pgoff_t index, swp_entry_t swap)
 {
-	void *item;
-
-	rcu_read_lock();
-	item = radix_tree_lookup(&mapping->pages, index);
-	rcu_read_unlock();
-	return item == swp_to_radix_entry(swap);
+	return xa_load(&mapping->pages, index) == swp_to_radix_entry(swap);
 }
 
 /*
-- 
2.15.0
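
The reason the explicit RCU locking can go away is that xa_load() is
expected to take the RCU read lock itself.  A minimal sketch of what
that looks like, written against the calling conventions used in this
series rather than copied from its headers:

	/*
	 * Editor's sketch, not the series' actual implementation: xa_load()
	 * handles RCU and internal retry entries itself, so callers such as
	 * shmem_confirm_swap() need no locking of their own.
	 */
	static void *xa_load_sketch(struct xarray *xa, unsigned long index)
	{
		XA_STATE(xas, index);
		void *entry;

		rcu_read_lock();
		do {
			entry = xas_load(xa, &xas);
		} while (xas_retry(&xas, entry));
		rcu_read_unlock();

		return entry;
	}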

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 39/62] shmem: Convert find_swap_entry to XArray
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:07   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:07 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

This is a 1:1 conversion.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 mm/shmem.c | 21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 98944cff6319..b6f7fc283aad 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1076,28 +1076,27 @@ static void shmem_evict_inode(struct inode *inode)
 	clear_inode(inode);
 }
 
-static unsigned long find_swap_entry(struct radix_tree_root *root, void *item)
+static unsigned long find_swap_entry(struct xarray *xa, void *item)
 {
-	struct radix_tree_iter iter;
-	void **slot;
-	unsigned long found = -1;
+	XA_STATE(xas, 0);
 	unsigned int checked = 0;
+	void *entry;
 
 	rcu_read_lock();
-	radix_tree_for_each_slot(slot, root, &iter, 0) {
-		if (*slot == item) {
-			found = iter.index;
+	xas_for_each(xa, &xas, entry, ULONG_MAX) {
+		if (xas_retry(&xas, entry))
+			continue;
+		if (entry == item)
 			break;
-		}
 		checked++;
 		if ((checked % 4096) != 0)
 			continue;
-		slot = radix_tree_iter_resume(slot, &iter);
+		xas_pause(&xas);
 		cond_resched_rcu();
 	}
-
 	rcu_read_unlock();
-	return found;
+
+	return xas_invalid(&xas) ? -1 : xas.xa_index;
 }
 
 /*
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 40/62] shmem: Convert shmem_tag_pins to XArray
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:07   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:07 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

Simplify the locking by holding the spinlock for the whole walk, on
the assumption that repeatedly acquiring and releasing the lock is
worse than holding it for a (potentially) long time.

We could replicate the same locking behaviour with the xarray, but would
have to be careful that the xa_node wasn't RCU-freed under us before we
took the lock.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 include/linux/xarray.h |  8 ++++++++
 mm/shmem.c             | 39 ++++++++++++++++-----------------------
 2 files changed, 24 insertions(+), 23 deletions(-)

diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index 1648eda4a20d..dda61b5c8e49 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -669,6 +669,14 @@ static inline void *xas_next_tag(struct xarray *xa, struct xa_state *xas,
 	return xa_entry(xa, node, offset);
 }
 
+/*
+ * If iterating while holding a lock, drop the lock and reschedule
+ * every %XA_CHECK_SCHED loops.
+ */
+enum {
+	XA_CHECK_SCHED = 4096,
+};
+
 /**
  * xas_for_each() - Iterate over a range of an XArray
  * @xa: XArray.
diff --git a/mm/shmem.c b/mm/shmem.c
index b6f7fc283aad..302fcb62ba1f 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2601,35 +2601,28 @@ static loff_t shmem_file_llseek(struct file *file, loff_t offset, int whence)
 
 static void shmem_tag_pins(struct address_space *mapping)
 {
-	struct radix_tree_iter iter;
-	void **slot;
-	pgoff_t start;
+	XA_STATE(xas, 0);
 	struct page *page;
+	unsigned int tagged = 0;
 
 	lru_add_drain();
-	start = 0;
-	rcu_read_lock();
 
-	radix_tree_for_each_slot(slot, &mapping->pages, &iter, start) {
-		page = radix_tree_deref_slot(slot);
-		if (!page || radix_tree_exception(page)) {
-			if (radix_tree_deref_retry(page)) {
-				slot = radix_tree_iter_retry(&iter);
-				continue;
-			}
-		} else if (page_count(page) - page_mapcount(page) > 1) {
-			xa_lock_irq(&mapping->pages);
-			radix_tree_tag_set(&mapping->pages, iter.index,
-					   SHMEM_TAG_PINNED);
-			xa_unlock_irq(&mapping->pages);
-		}
+	xa_lock_irq(&mapping->pages);
+	xas_for_each(&mapping->pages, &xas, page, ULONG_MAX) {
+		if (xa_is_value(page))
+			continue;
+		if (page_count(page) - page_mapcount(page) > 1)
+			xas_set_tag(&mapping->pages, &xas, SHMEM_TAG_PINNED);
 
-		if (need_resched()) {
-			slot = radix_tree_iter_resume(slot, &iter);
-			cond_resched_rcu();
-		}
+		if (++tagged % XA_CHECK_SCHED)
+			continue;
+
+		xas_pause(&xas);
+		xa_unlock_irq(&mapping->pages);
+		cond_resched();
+		xa_lock_irq(&mapping->pages);
 	}
-	rcu_read_unlock();
+	xa_unlock_irq(&mapping->pages);
 }
 
 /*
-- 
2.15.0
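
The iteration pattern introduced here is worth calling out: hold the
xa_lock for the walk, and every XA_CHECK_SCHED entries call xas_pause()
so the lock can be dropped and the CPU rescheduled before the walk
resumes.  A stripped-down sketch of the idiom, with an invented
function name and the per-entry work elided:

	/*
	 * Editor's sketch of the "walk under the lock, yield periodically"
	 * idiom from shmem_tag_pins() above.  xas_pause() records the
	 * current position so that iteration resumes correctly after the
	 * lock has been dropped and re-acquired.
	 */
	static void walk_under_lock(struct xarray *xa)
	{
		XA_STATE(xas, 0);
		void *entry;
		unsigned int seen = 0;

		xa_lock_irq(xa);
		xas_for_each(xa, &xas, entry, ULONG_MAX) {
			if (xa_is_value(entry))
				continue;
			/* ... per-entry work (e.g. xas_set_tag()) goes here ... */

			if (++seen % XA_CHECK_SCHED)
				continue;

			xas_pause(&xas);
			xa_unlock_irq(xa);
			cond_resched();
			xa_lock_irq(xa);
		}
		xa_unlock_irq(xa);
	}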

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 41/62] shmem: Convert shmem_wait_for_pins to XArray
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:07   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:07 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

As with shmem_tag_pins(), hold the lock around the entire loop instead
of acquiring & dropping it for each entry we're going to untag.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 mm/shmem.c | 59 +++++++++++++++++++++++++----------------------------------
 1 file changed, 25 insertions(+), 34 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 302fcb62ba1f..e4d28743a666 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2636,9 +2636,7 @@ static void shmem_tag_pins(struct address_space *mapping)
  */
 static int shmem_wait_for_pins(struct address_space *mapping)
 {
-	struct radix_tree_iter iter;
-	void **slot;
-	pgoff_t start;
+	XA_STATE(xas, 0);
 	struct page *page;
 	int error, scan;
 
@@ -2646,7 +2644,9 @@ static int shmem_wait_for_pins(struct address_space *mapping)
 
 	error = 0;
 	for (scan = 0; scan <= LAST_SCAN; scan++) {
-		if (!radix_tree_tagged(&mapping->pages, SHMEM_TAG_PINNED))
+		unsigned int tagged = 0;
+
+		if (!xa_tagged(&mapping->pages, SHMEM_TAG_PINNED))
 			break;
 
 		if (!scan)
@@ -2654,45 +2654,36 @@ static int shmem_wait_for_pins(struct address_space *mapping)
 		else if (schedule_timeout_killable((HZ << scan) / 200))
 			scan = LAST_SCAN;
 
-		start = 0;
-		rcu_read_lock();
-		radix_tree_for_each_tagged(slot, &mapping->pages, &iter,
-					   start, SHMEM_TAG_PINNED) {
-
-			page = radix_tree_deref_slot(slot);
-			if (radix_tree_exception(page)) {
-				if (radix_tree_deref_retry(page)) {
-					slot = radix_tree_iter_retry(&iter);
-					continue;
-				}
-
-				page = NULL;
-			}
-
-			if (page &&
-			    page_count(page) - page_mapcount(page) != 1) {
-				if (scan < LAST_SCAN)
-					goto continue_resched;
-
+		xas_set(&xas, 0);
+		xa_lock_irq(&mapping->pages);
+		xas_for_each_tag(&mapping->pages, &xas, page, ULONG_MAX,
+				SHMEM_TAG_PINNED) {
+			bool clear = true;
+			if (xa_is_value(page))
+				continue;
+			if (page_count(page) - page_mapcount(page) != 1) {
 				/*
 				 * On the last scan, we clean up all those tags
 				 * we inserted; but make a note that we still
 				 * found pages pinned.
 				 */
-				error = -EBUSY;
+				if (scan == LAST_SCAN)
+					error = -EBUSY;
+				else
+					clear = false;
 			}
+			if (clear)
+				xas_clear_tag(&mapping->pages, &xas,
+							SHMEM_TAG_PINNED);
+			if (++tagged % XA_CHECK_SCHED)
+				continue;
 
-			xa_lock_irq(&mapping->pages);
-			radix_tree_tag_clear(&mapping->pages,
-					     iter.index, SHMEM_TAG_PINNED);
+			xas_pause(&xas);
 			xa_unlock_irq(&mapping->pages);
-continue_resched:
-			if (need_resched()) {
-				slot = radix_tree_iter_resume(slot, &iter);
-				cond_resched_rcu();
-			}
+			cond_resched();
+			xa_lock_irq(&mapping->pages);
 		}
-		rcu_read_unlock();
+		xa_unlock_irq(&mapping->pages);
 	}
 
 	return error;
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 42/62] vmalloc: Convert to xarray
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:07   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:07 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

The radix tree of vmap blocks is simpler to express as an XArray.
This saves a couple of hundred bytes of text and eliminates a user of
the radix tree preload API.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 mm/vmalloc.c | 37 ++++++++++++-------------------------
 1 file changed, 12 insertions(+), 25 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 673942094328..cc5de1ee6520 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -821,12 +821,11 @@ struct vmap_block {
 static DEFINE_PER_CPU(struct vmap_block_queue, vmap_block_queue);
 
 /*
- * Radix tree of vmap blocks, indexed by address, to quickly find a vmap block
+ * XArray of vmap blocks, indexed by address, to quickly find a vmap block
  * in the free path. Could get rid of this if we change the API to return a
  * "cookie" from alloc, to be passed to free. But no big deal yet.
  */
-static DEFINE_SPINLOCK(vmap_block_tree_lock);
-static RADIX_TREE(vmap_block_tree, GFP_ATOMIC);
+static DEFINE_XARRAY(vmap_block_tree);
 
 /*
  * We should probably have a fallback mechanism to allocate virtual memory
@@ -865,8 +864,8 @@ static void *new_vmap_block(unsigned int order, gfp_t gfp_mask)
 	struct vmap_block *vb;
 	struct vmap_area *va;
 	unsigned long vb_idx;
-	int node, err;
-	void *vaddr;
+	int node;
+	void *ret, *vaddr;
 
 	node = numa_node_id();
 
@@ -883,13 +882,6 @@ static void *new_vmap_block(unsigned int order, gfp_t gfp_mask)
 		return ERR_CAST(va);
 	}
 
-	err = radix_tree_preload(gfp_mask);
-	if (unlikely(err)) {
-		kfree(vb);
-		free_vmap_area(va);
-		return ERR_PTR(err);
-	}
-
 	vaddr = vmap_block_vaddr(va->va_start, 0);
 	spin_lock_init(&vb->lock);
 	vb->va = va;
@@ -902,11 +894,12 @@ static void *new_vmap_block(unsigned int order, gfp_t gfp_mask)
 	INIT_LIST_HEAD(&vb->free_list);
 
 	vb_idx = addr_to_vb_idx(va->va_start);
-	spin_lock(&vmap_block_tree_lock);
-	err = radix_tree_insert(&vmap_block_tree, vb_idx, vb);
-	spin_unlock(&vmap_block_tree_lock);
-	BUG_ON(err);
-	radix_tree_preload_end();
+	ret = xa_store(&vmap_block_tree, vb_idx, vb, gfp_mask);
+	if (IS_ERR(ret)) {
+		kfree(vb);
+		free_vmap_area(va);
+		return ret;
+	}
 
 	vbq = &get_cpu_var(vmap_block_queue);
 	spin_lock(&vbq->lock);
@@ -923,9 +916,7 @@ static void free_vmap_block(struct vmap_block *vb)
 	unsigned long vb_idx;
 
 	vb_idx = addr_to_vb_idx(vb->va->va_start);
-	spin_lock(&vmap_block_tree_lock);
-	tmp = radix_tree_delete(&vmap_block_tree, vb_idx);
-	spin_unlock(&vmap_block_tree_lock);
+	tmp = xa_store(&vmap_block_tree, vb_idx, NULL, GFP_NOWAIT);
 	BUG_ON(tmp != vb);
 
 	free_vmap_area_noflush(vb->va);
@@ -1031,7 +1022,6 @@ static void *vb_alloc(unsigned long size, gfp_t gfp_mask)
 static void vb_free(const void *addr, unsigned long size)
 {
 	unsigned long offset;
-	unsigned long vb_idx;
 	unsigned int order;
 	struct vmap_block *vb;
 
@@ -1045,10 +1035,7 @@ static void vb_free(const void *addr, unsigned long size)
 	offset = (unsigned long)addr & (VMAP_BLOCK_SIZE - 1);
 	offset >>= PAGE_SHIFT;
 
-	vb_idx = addr_to_vb_idx((unsigned long)addr);
-	rcu_read_lock();
-	vb = radix_tree_lookup(&vmap_block_tree, vb_idx);
-	rcu_read_unlock();
+	vb = xa_load(&vmap_block_tree, addr_to_vb_idx((unsigned long)addr));
 	BUG_ON(!vb);
 
 	vunmap_page_range((unsigned long)addr, (unsigned long)addr + size);
-- 
2.15.0
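
The preload/spinlock dance disappears because xa_store() takes a gfp
mask and allocates its own nodes, reporting failure as an ERR_PTR
rather than through a separate preload step.  A hedged sketch of the
resulting insertion error handling, with invented names:

	/*
	 * Editor's sketch of insertion without radix_tree_preload(): on
	 * success xa_store() returns whatever the slot previously held
	 * (NULL for an empty slot); on allocation failure it returns an
	 * ERR_PTR which we hand back to the caller as an errno.
	 */
	static int store_block(struct xarray *xa, unsigned long idx,
			       void *block, gfp_t gfp)
	{
		void *old = xa_store(xa, idx, block, gfp);

		if (IS_ERR(old))
			return PTR_ERR(old);
		return 0;
	}

Storing NULL, as free_vmap_block() now does, doubles as deletion and
hands back the old entry.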

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 43/62] brd: Convert to XArray
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:07   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:07 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

Convert brd_pages from a radix tree to an XArray.  The code is simpler
and smaller; in particular, another user of radix_tree_preload() is
eliminated.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 drivers/block/brd.c | 87 ++++++++++++++---------------------------------------
 1 file changed, 23 insertions(+), 64 deletions(-)

diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index 8028a3a7e7fd..c9fda03810ea 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -17,7 +17,7 @@
 #include <linux/bio.h>
 #include <linux/highmem.h>
 #include <linux/mutex.h>
-#include <linux/radix-tree.h>
+#include <linux/xarray.h>
 #include <linux/fs.h>
 #include <linux/slab.h>
 #include <linux/backing-dev.h>
@@ -29,9 +29,9 @@
 #define PAGE_SECTORS		(1 << PAGE_SECTORS_SHIFT)
 
 /*
- * Each block ramdisk device has a radix_tree brd_pages of pages that stores
- * the pages containing the block device's contents. A brd page's ->index is
- * its offset in PAGE_SIZE units. This is similar to, but in no way connected
+ * Each block ramdisk device has an xarray brd_pages that stores the pages
+ * containing the block device's contents. A brd page's ->index is its
+ * offset in PAGE_SIZE units. This is similar to, but in no way connected
  * with, the kernel's pagecache or buffer cache (which sit above our block
  * device).
  */
@@ -41,13 +41,7 @@ struct brd_device {
 	struct request_queue	*brd_queue;
 	struct gendisk		*brd_disk;
 	struct list_head	brd_list;
-
-	/*
-	 * Backing store of pages and lock to protect it. This is the contents
-	 * of the block device.
-	 */
-	spinlock_t		brd_lock;
-	struct radix_tree_root	brd_pages;
+	struct xarray		brd_pages;
 };
 
 /*
@@ -62,17 +56,9 @@ static struct page *brd_lookup_page(struct brd_device *brd, sector_t sector)
 	 * The page lifetime is protected by the fact that we have opened the
 	 * device node -- brd pages will never be deleted under us, so we
 	 * don't need any further locking or refcounting.
-	 *
-	 * This is strictly true for the radix-tree nodes as well (ie. we
-	 * don't actually need the rcu_read_lock()), however that is not a
-	 * documented feature of the radix-tree API so it is better to be
-	 * safe here (we don't have total exclusion from radix tree updates
-	 * here, only deletes).
 	 */
-	rcu_read_lock();
 	idx = sector >> PAGE_SECTORS_SHIFT; /* sector to page index */
-	page = radix_tree_lookup(&brd->brd_pages, idx);
-	rcu_read_unlock();
+	page = xa_load(&brd->brd_pages, idx);
 
 	BUG_ON(page && page->index != idx);
 
@@ -87,7 +73,7 @@ static struct page *brd_lookup_page(struct brd_device *brd, sector_t sector)
 static struct page *brd_insert_page(struct brd_device *brd, sector_t sector)
 {
 	pgoff_t idx;
-	struct page *page;
+	struct page *curr, *page;
 	gfp_t gfp_flags;
 
 	page = brd_lookup_page(brd, sector);
@@ -108,62 +94,36 @@ static struct page *brd_insert_page(struct brd_device *brd, sector_t sector)
 	if (!page)
 		return NULL;
 
-	if (radix_tree_preload(GFP_NOIO)) {
-		__free_page(page);
-		return NULL;
-	}
-
-	spin_lock(&brd->brd_lock);
 	idx = sector >> PAGE_SECTORS_SHIFT;
 	page->index = idx;
-	if (radix_tree_insert(&brd->brd_pages, idx, page)) {
+	curr = xa_cmpxchg(&brd->brd_pages, idx, NULL, page, GFP_NOIO);
+	if (curr) {
 		__free_page(page);
-		page = radix_tree_lookup(&brd->brd_pages, idx);
+		page = curr;
 		BUG_ON(!page);
 		BUG_ON(page->index != idx);
 	}
-	spin_unlock(&brd->brd_lock);
-
-	radix_tree_preload_end();
 
 	return page;
 }
 
 /*
- * Free all backing store pages and radix tree. This must only be called when
+ * Free all backing store pages and xarray.  This must only be called when
  * there are no other users of the device.
  */
-#define FREE_BATCH 16
 static void brd_free_pages(struct brd_device *brd)
 {
-	unsigned long pos = 0;
-	struct page *pages[FREE_BATCH];
-	int nr_pages;
-
-	do {
-		int i;
-
-		nr_pages = radix_tree_gang_lookup(&brd->brd_pages,
-				(void **)pages, pos, FREE_BATCH);
-
-		for (i = 0; i < nr_pages; i++) {
-			void *ret;
-
-			BUG_ON(pages[i]->index < pos);
-			pos = pages[i]->index;
-			ret = radix_tree_delete(&brd->brd_pages, pos);
-			BUG_ON(!ret || ret != pages[i]);
-			__free_page(pages[i]);
-		}
-
-		pos++;
-
-		/*
-		 * This assumes radix_tree_gang_lookup always returns as
-		 * many pages as possible. If the radix-tree code changes,
-		 * so will this have to.
-		 */
-	} while (nr_pages == FREE_BATCH);
+	XA_STATE(xas, 0);
+	struct page *page;
+
+	/* lockdep can't know there are no other users */
+	xa_lock(&brd->brd_pages);
+	xas_for_each(&brd->brd_pages, &xas, page, ULONG_MAX) {
+		BUG_ON(page->index != xas.xa_index);
+		__free_page(page);
+		xas_store(&brd->brd_pages, &xas, NULL);
+	}
+	xa_unlock(&brd->brd_pages);
 }
 
 /*
@@ -373,8 +333,7 @@ static struct brd_device *brd_alloc(int i)
 	if (!brd)
 		goto out;
 	brd->brd_number		= i;
-	spin_lock_init(&brd->brd_lock);
-	INIT_RADIX_TREE(&brd->brd_pages, GFP_ATOMIC);
+	xa_init(&brd->brd_pages);
 
 	brd->brd_queue = blk_alloc_queue(GFP_KERNEL);
 	if (!brd->brd_queue)
-- 
2.15.0
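
The heart of the conversion is the insert-if-absent idiom built on
xa_cmpxchg(): store the new page only if the slot is still empty, and
if someone else won the race, free our page and use theirs.  A small
sketch of that pattern (invented function name; allocation-failure
handling omitted, as in the hunk above):

	/*
	 * Editor's sketch of the xa_cmpxchg() insert-if-absent idiom used
	 * in brd_insert_page() above.  xa_cmpxchg(xa, index, old, new, gfp)
	 * stores @new only if the slot still holds @old, and returns the
	 * previous contents of the slot.
	 */
	static struct page *insert_or_reuse(struct xarray *xa, pgoff_t idx,
					    struct page *newpage)
	{
		struct page *curr;

		curr = xa_cmpxchg(xa, idx, NULL, newpage, GFP_NOIO);
		if (curr) {
			/* Lost the race: reuse the page already in the array. */
			__free_page(newpage);
			return curr;
		}
		return newpage;
	}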

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 44/62] xfs: Convert m_perag_tree to XArray
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:07   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:07 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

Getting rid of the m_perag_lock lets us also get rid of the call to
radix_tree_preload().  This is a relatively naive conversion; we could
improve performance over the radix tree implementation by passing around
xa_state pointers instead of indices, possibly at the expense of extending
rcu_read_lock() periods.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 fs/xfs/libxfs/xfs_sb.c |  9 ++++-----
 fs/xfs/xfs_icache.c    | 35 +++++++++--------------------------
 fs/xfs/xfs_icache.h    |  6 +++---
 fs/xfs/xfs_mount.c     | 19 ++++---------------
 fs/xfs/xfs_mount.h     |  3 +--
 5 files changed, 21 insertions(+), 51 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index 9b5aae2bcc0b..811fa57007c9 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -59,7 +59,7 @@ xfs_perag_get(
 	int			ref = 0;
 
 	rcu_read_lock();
-	pag = radix_tree_lookup(&mp->m_perag_tree, agno);
+	pag = xa_load(&mp->m_perag_xa, agno);
 	if (pag) {
 		ASSERT(atomic_read(&pag->pag_ref) >= 0);
 		ref = atomic_inc_return(&pag->pag_ref);
@@ -78,14 +78,13 @@ xfs_perag_get_tag(
 	xfs_agnumber_t		first,
 	int			tag)
 {
+	XA_STATE(xas, first);
 	struct xfs_perag	*pag;
-	int			found;
 	int			ref;
 
 	rcu_read_lock();
-	found = radix_tree_gang_lookup_tag(&mp->m_perag_tree,
-					(void **)&pag, first, 1, tag);
-	if (found <= 0) {
+	pag = xas_find_tag(&mp->m_perag_xa, &xas, ULONG_MAX, tag);
+	if (!pag) {
 		rcu_read_unlock();
 		return NULL;
 	}
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 43005fbe8b1e..f56e500d89e2 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -156,13 +156,10 @@ static void
 xfs_reclaim_work_queue(
 	struct xfs_mount        *mp)
 {
-
-	rcu_read_lock();
-	if (radix_tree_tagged(&mp->m_perag_tree, XFS_ICI_RECLAIM_TAG)) {
+	if (xa_tagged(&mp->m_perag_xa, XFS_ICI_RECLAIM_TAG)) {
 		queue_delayed_work(mp->m_reclaim_workqueue, &mp->m_reclaim_work,
 			msecs_to_jiffies(xfs_syncd_centisecs / 6 * 10));
 	}
-	rcu_read_unlock();
 }
 
 /*
@@ -194,10 +191,7 @@ xfs_perag_set_reclaim_tag(
 		return;
 
 	/* propagate the reclaim tag up into the perag radix tree */
-	spin_lock(&mp->m_perag_lock);
-	radix_tree_tag_set(&mp->m_perag_tree, pag->pag_agno,
-			   XFS_ICI_RECLAIM_TAG);
-	spin_unlock(&mp->m_perag_lock);
+	xa_set_tag(&mp->m_perag_xa, pag->pag_agno, XFS_ICI_RECLAIM_TAG);
 
 	/* schedule periodic background inode reclaim */
 	xfs_reclaim_work_queue(mp);
@@ -216,10 +210,7 @@ xfs_perag_clear_reclaim_tag(
 		return;
 
 	/* clear the reclaim tag from the perag radix tree */
-	spin_lock(&mp->m_perag_lock);
-	radix_tree_tag_clear(&mp->m_perag_tree, pag->pag_agno,
-			     XFS_ICI_RECLAIM_TAG);
-	spin_unlock(&mp->m_perag_lock);
+	xa_clear_tag(&mp->m_perag_xa, pag->pag_agno, XFS_ICI_RECLAIM_TAG);
 	trace_xfs_perag_clear_reclaim(mp, pag->pag_agno, -1, _RET_IP_);
 }
 
@@ -847,12 +838,10 @@ void
 xfs_queue_eofblocks(
 	struct xfs_mount *mp)
 {
-	rcu_read_lock();
-	if (radix_tree_tagged(&mp->m_perag_tree, XFS_ICI_EOFBLOCKS_TAG))
+	if (xa_tagged(&mp->m_perag_xa, XFS_ICI_EOFBLOCKS_TAG))
 		queue_delayed_work(mp->m_eofblocks_workqueue,
 				   &mp->m_eofblocks_work,
 				   msecs_to_jiffies(xfs_eofb_secs * 1000));
-	rcu_read_unlock();
 }
 
 void
@@ -874,12 +863,10 @@ STATIC void
 xfs_queue_cowblocks(
 	struct xfs_mount *mp)
 {
-	rcu_read_lock();
-	if (radix_tree_tagged(&mp->m_perag_tree, XFS_ICI_COWBLOCKS_TAG))
+	if (xa_tagged(&mp->m_perag_xa, XFS_ICI_COWBLOCKS_TAG))
 		queue_delayed_work(mp->m_eofblocks_workqueue,
 				   &mp->m_cowblocks_work,
 				   msecs_to_jiffies(xfs_cowb_secs * 1000));
-	rcu_read_unlock();
 }
 
 void
@@ -1542,7 +1529,7 @@ __xfs_inode_set_eofblocks_tag(
 	void		(*execute)(struct xfs_mount *mp),
 	void		(*set_tp)(struct xfs_mount *mp, xfs_agnumber_t agno,
 				  int error, unsigned long caller_ip),
-	int		tag)
+	xa_tag_t	tag)
 {
 	struct xfs_mount *mp = ip->i_mount;
 	struct xfs_perag *pag;
@@ -1566,11 +1553,9 @@ __xfs_inode_set_eofblocks_tag(
 			   XFS_INO_TO_AGINO(ip->i_mount, ip->i_ino), tag);
 	if (!tagged) {
 		/* propagate the eofblocks tag up into the perag radix tree */
-		spin_lock(&ip->i_mount->m_perag_lock);
-		radix_tree_tag_set(&ip->i_mount->m_perag_tree,
+		xa_set_tag(&ip->i_mount->m_perag_xa,
 				   XFS_INO_TO_AGNO(ip->i_mount, ip->i_ino),
 				   tag);
-		spin_unlock(&ip->i_mount->m_perag_lock);
 
 		/* kick off background trimming */
 		execute(ip->i_mount);
@@ -1597,7 +1582,7 @@ __xfs_inode_clear_eofblocks_tag(
 	xfs_inode_t	*ip,
 	void		(*clear_tp)(struct xfs_mount *mp, xfs_agnumber_t agno,
 				    int error, unsigned long caller_ip),
-	int		tag)
+	xa_tag_t	tag)
 {
 	struct xfs_mount *mp = ip->i_mount;
 	struct xfs_perag *pag;
@@ -1613,11 +1598,9 @@ __xfs_inode_clear_eofblocks_tag(
 			     XFS_INO_TO_AGINO(ip->i_mount, ip->i_ino), tag);
 	if (!radix_tree_tagged(&pag->pag_ici_root, tag)) {
 		/* clear the eofblocks tag from the perag radix tree */
-		spin_lock(&ip->i_mount->m_perag_lock);
-		radix_tree_tag_clear(&ip->i_mount->m_perag_tree,
+		xa_clear_tag(&ip->i_mount->m_perag_xa,
 				     XFS_INO_TO_AGNO(ip->i_mount, ip->i_ino),
 				     tag);
-		spin_unlock(&ip->i_mount->m_perag_lock);
 		clear_tp(ip->i_mount, pag->pag_agno, -1, _RET_IP_);
 	}
 
diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
index bff4d85e5498..bd04d5adadfe 100644
--- a/fs/xfs/xfs_icache.h
+++ b/fs/xfs/xfs_icache.h
@@ -37,9 +37,9 @@ struct xfs_eofblocks {
  */
 #define XFS_ICI_NO_TAG		(-1)	/* special flag for an untagged lookup
 					   in xfs_inode_ag_iterator */
-#define XFS_ICI_RECLAIM_TAG	0	/* inode is to be reclaimed */
-#define XFS_ICI_EOFBLOCKS_TAG	1	/* inode has blocks beyond EOF */
-#define XFS_ICI_COWBLOCKS_TAG	2	/* inode can have cow blocks to gc */
+#define XFS_ICI_RECLAIM_TAG	XA_TAG_0 /* inode is to be reclaimed */
+#define XFS_ICI_EOFBLOCKS_TAG	XA_TAG_1 /* inode has blocks beyond EOF */
+#define XFS_ICI_COWBLOCKS_TAG	XA_TAG_2 /* inode can have cow blocks to gc */
 
 /*
  * Flags for xfs_iget()
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index c879b517cc94..ae90f6f8fa5c 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -156,9 +156,7 @@ xfs_free_perag(
 	struct xfs_perag *pag;
 
 	for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
-		spin_lock(&mp->m_perag_lock);
-		pag = radix_tree_delete(&mp->m_perag_tree, agno);
-		spin_unlock(&mp->m_perag_lock);
+		pag = xa_store(&mp->m_perag_xa, agno, NULL, GFP_NOWAIT);
 		ASSERT(pag);
 		ASSERT(atomic_read(&pag->pag_ref) == 0);
 		xfs_buf_hash_destroy(pag);
@@ -219,19 +217,11 @@ xfs_initialize_perag(
 			goto out_free_pag;
 		init_waitqueue_head(&pag->pagb_wait);
 
-		if (radix_tree_preload(GFP_NOFS))
-			goto out_hash_destroy;
-
-		spin_lock(&mp->m_perag_lock);
-		if (radix_tree_insert(&mp->m_perag_tree, index, pag)) {
+		if (xa_store(&mp->m_perag_xa, index, pag, GFP_NOFS)) {
 			BUG();
-			spin_unlock(&mp->m_perag_lock);
-			radix_tree_preload_end();
 			error = -EEXIST;
 			goto out_hash_destroy;
 		}
-		spin_unlock(&mp->m_perag_lock);
-		radix_tree_preload_end();
 		/* first new pag is fully initialized */
 		if (first_initialised == NULLAGNUMBER)
 			first_initialised = index;
@@ -252,7 +242,7 @@ xfs_initialize_perag(
 out_unwind_new_pags:
 	/* unwind any prior newly initialized pags */
 	for (index = first_initialised; index < agcount; index++) {
-		pag = radix_tree_delete(&mp->m_perag_tree, index);
+		pag = xa_store(&mp->m_perag_xa, index, NULL, GFP_NOWAIT);
 		if (!pag)
 			break;
 		xfs_buf_hash_destroy(pag);
@@ -816,8 +806,7 @@ xfs_mountfs(
 	/*
 	 * Allocate and initialize the per-ag data.
 	 */
-	spin_lock_init(&mp->m_perag_lock);
-	INIT_RADIX_TREE(&mp->m_perag_tree, GFP_ATOMIC);
+	xa_init(&mp->m_perag_xa);
 	error = xfs_initialize_perag(mp, sbp->sb_agcount, &mp->m_maxagi);
 	if (error) {
 		xfs_warn(mp, "Failed per-ag init: %d", error);
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index e0792d036be2..6e5ad7b26f46 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -134,8 +134,7 @@ typedef struct xfs_mount {
 	xfs_extlen_t		m_ag_prealloc_blocks; /* reserved ag blocks */
 	uint			m_alloc_set_aside; /* space we can't use */
 	uint			m_ag_max_usable; /* max space per AG */
-	struct radix_tree_root	m_perag_tree;	/* per-ag accounting info */
-	spinlock_t		m_perag_lock;	/* lock for m_perag_tree */
+	struct xarray		m_perag_xa;	/* per-ag accounting info */
 	struct mutex		m_growlock;	/* growfs mutex */
 	int			m_fixedfsid[2];	/* unchanged for life of FS */
 	uint			m_dmevmask;	/* DMI events for this FS */
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 45/62] xfs: Convert pag_ici_root to XArray
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:07   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:07 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

Rename pag_ici_root to pag_ici_xa and use the XArray APIs instead of the
radix tree APIs.  The result is shorter code, typechecking on tag numbers,
better error checking in xfs_reclaim_inode(), and the elimination of a
call to radix_tree_preload().
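
To illustrate the "better error checking" point: xas_store() returns the
entry that was previously at the index, so the reclaim path can verify the
inode was actually present.  This is only a sketch using the names and the
xa_state API from the diff below, not a drop-in replacement:

/*
 * Sketch: radix_tree_delete() could only report NULL vs non-NULL after
 * the fact; here we can assert we removed the inode we expected to.
 */
XA_STATE(xas, XFS_INO_TO_AGINO(ip->i_mount, ino));

xa_lock(&pag->pag_ici_xa);
if (xas_store(&pag->pag_ici_xa, &xas, NULL) != ip)
	ASSERT(0);
xfs_perag_clear_reclaim_tag(pag);
xa_unlock(&pag->pag_ici_xa);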

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 fs/xfs/libxfs/xfs_sb.c |   2 +-
 fs/xfs/libxfs/xfs_sb.h |   2 +-
 fs/xfs/xfs_icache.c    | 107 ++++++++++++++++++++-----------------------------
 fs/xfs/xfs_icache.h    |   4 +-
 fs/xfs/xfs_inode.c     |  24 ++++-------
 fs/xfs/xfs_mount.c     |   3 +-
 fs/xfs/xfs_mount.h     |   3 +-
 7 files changed, 57 insertions(+), 88 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index 811fa57007c9..9345fad57db8 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -76,7 +76,7 @@ struct xfs_perag *
 xfs_perag_get_tag(
 	struct xfs_mount	*mp,
 	xfs_agnumber_t		first,
-	int			tag)
+	xa_tag_t		tag)
 {
 	XA_STATE(xas, first);
 	struct xfs_perag	*pag;
diff --git a/fs/xfs/libxfs/xfs_sb.h b/fs/xfs/libxfs/xfs_sb.h
index 961e6475a309..d2de90b8f39c 100644
--- a/fs/xfs/libxfs/xfs_sb.h
+++ b/fs/xfs/libxfs/xfs_sb.h
@@ -23,7 +23,7 @@
  */
 extern struct xfs_perag *xfs_perag_get(struct xfs_mount *, xfs_agnumber_t);
 extern struct xfs_perag *xfs_perag_get_tag(struct xfs_mount *, xfs_agnumber_t,
-					   int tag);
+					   xa_tag_t tag);
 extern void	xfs_perag_put(struct xfs_perag *pag);
 extern int	xfs_initialize_perag_data(struct xfs_mount *, xfs_agnumber_t);
 
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index f56e500d89e2..26f261ef708a 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -186,7 +186,7 @@ xfs_perag_set_reclaim_tag(
 {
 	struct xfs_mount	*mp = pag->pag_mount;
 
-	lockdep_assert_held(&pag->pag_ici_lock);
+	lockdep_assert_held(&pag->pag_ici_xa.xa_lock);
 	if (pag->pag_ici_reclaimable++)
 		return;
 
@@ -205,7 +205,7 @@ xfs_perag_clear_reclaim_tag(
 {
 	struct xfs_mount	*mp = pag->pag_mount;
 
-	lockdep_assert_held(&pag->pag_ici_lock);
+	lockdep_assert_held(&pag->pag_ici_xa.xa_lock);
 	if (--pag->pag_ici_reclaimable)
 		return;
 
@@ -228,16 +228,16 @@ xfs_inode_set_reclaim_tag(
 	struct xfs_perag	*pag;
 
 	pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino));
-	spin_lock(&pag->pag_ici_lock);
+	xa_lock(&pag->pag_ici_xa);
 	spin_lock(&ip->i_flags_lock);
 
-	radix_tree_tag_set(&pag->pag_ici_root, XFS_INO_TO_AGINO(mp, ip->i_ino),
+	__xa_set_tag(&pag->pag_ici_xa, XFS_INO_TO_AGINO(mp, ip->i_ino),
 			   XFS_ICI_RECLAIM_TAG);
 	xfs_perag_set_reclaim_tag(pag);
 	__xfs_iflags_set(ip, XFS_IRECLAIMABLE);
 
 	spin_unlock(&ip->i_flags_lock);
-	spin_unlock(&pag->pag_ici_lock);
+	xa_unlock(&pag->pag_ici_xa);
 	xfs_perag_put(pag);
 }
 
@@ -246,7 +246,7 @@ xfs_inode_clear_reclaim_tag(
 	struct xfs_perag	*pag,
 	xfs_ino_t		ino)
 {
-	radix_tree_tag_clear(&pag->pag_ici_root,
+	__xa_clear_tag(&pag->pag_ici_xa,
 			     XFS_INO_TO_AGINO(pag->pag_mount, ino),
 			     XFS_ICI_RECLAIM_TAG);
 	xfs_perag_clear_reclaim_tag(pag);
@@ -367,8 +367,8 @@ xfs_iget_cache_hit(
 		/*
 		 * We need to set XFS_IRECLAIM to prevent xfs_reclaim_inode
 		 * from stomping over us while we recycle the inode.  We can't
-		 * clear the radix tree reclaimable tag yet as it requires
-		 * pag_ici_lock to be held exclusive.
+		 * clear the xarray reclaimable tag yet as it requires
+		 * pag_ici_xa.xa_lock to be held exclusive.
 		 */
 		ip->i_flags |= XFS_IRECLAIM;
 
@@ -393,7 +393,7 @@ xfs_iget_cache_hit(
 			goto out_error;
 		}
 
-		spin_lock(&pag->pag_ici_lock);
+		xa_lock(&pag->pag_ici_xa);
 		spin_lock(&ip->i_flags_lock);
 
 		/*
@@ -410,7 +410,7 @@ xfs_iget_cache_hit(
 		init_rwsem(&inode->i_rwsem);
 
 		spin_unlock(&ip->i_flags_lock);
-		spin_unlock(&pag->pag_ici_lock);
+		xa_unlock(&pag->pag_ici_xa);
 	} else {
 		/* If the VFS inode is being torn down, pause and try again. */
 		if (!igrab(inode)) {
@@ -451,7 +451,7 @@ xfs_iget_cache_miss(
 	int			flags,
 	int			lock_flags)
 {
-	struct xfs_inode	*ip;
+	struct xfs_inode	*ip, *curr;
 	int			error;
 	xfs_agino_t		agino = XFS_INO_TO_AGINO(mp, ino);
 	int			iflags;
@@ -471,17 +471,6 @@ xfs_iget_cache_miss(
 		goto out_destroy;
 	}
 
-	/*
-	 * Preload the radix tree so we can insert safely under the
-	 * write spinlock. Note that we cannot sleep inside the preload
-	 * region. Since we can be called from transaction context, don't
-	 * recurse into the file system.
-	 */
-	if (radix_tree_preload(GFP_NOFS)) {
-		error = -EAGAIN;
-		goto out_destroy;
-	}
-
 	/*
 	 * Because the inode hasn't been added to the radix-tree yet it can't
 	 * be found by another thread, so we can do the non-sleeping lock here.
@@ -509,23 +498,18 @@ xfs_iget_cache_miss(
 	xfs_iflags_set(ip, iflags);
 
 	/* insert the new inode */
-	spin_lock(&pag->pag_ici_lock);
-	error = radix_tree_insert(&pag->pag_ici_root, agino, ip);
-	if (unlikely(error)) {
-		WARN_ON(error != -EEXIST);
+	curr = xa_cmpxchg(&pag->pag_ici_xa, agino, NULL, ip, GFP_NOFS);
+	if (unlikely(curr)) {
+		WARN_ON(IS_ERR(curr));
 		XFS_STATS_INC(mp, xs_ig_dup);
 		error = -EAGAIN;
-		goto out_preload_end;
+		goto out_unlock;
 	}
-	spin_unlock(&pag->pag_ici_lock);
-	radix_tree_preload_end();
 
 	*ipp = ip;
 	return 0;
 
-out_preload_end:
-	spin_unlock(&pag->pag_ici_lock);
-	radix_tree_preload_end();
+out_unlock:
 	if (lock_flags)
 		xfs_iunlock(ip, lock_flags);
 out_destroy:
@@ -592,7 +576,7 @@ xfs_iget(
 again:
 	error = 0;
 	rcu_read_lock();
-	ip = radix_tree_lookup(&pag->pag_ici_root, agino);
+	ip = xa_load(&pag->pag_ici_xa, agino);
 
 	if (ip) {
 		error = xfs_iget_cache_hit(pag, ip, ino, flags, lock_flags);
@@ -731,7 +715,7 @@ xfs_inode_ag_walk(
 					   void *args),
 	int			flags,
 	void			*args,
-	int			tag,
+	xa_tag_t		tag,
 	int			iter_flags)
 {
 	uint32_t		first_index;
@@ -752,14 +736,13 @@ xfs_inode_ag_walk(
 
 		rcu_read_lock();
 
-		if (tag == -1)
-			nr_found = radix_tree_gang_lookup(&pag->pag_ici_root,
-					(void **)batch, first_index,
+		if (tag == XFS_ICI_NO_TAG)
+			nr_found = xa_get_entries(&pag->pag_ici_xa,
+					(void **)batch, first_index, ULONG_MAX,
 					XFS_LOOKUP_BATCH);
 		else
-			nr_found = radix_tree_gang_lookup_tag(
-					&pag->pag_ici_root,
-					(void **) batch, first_index,
+			nr_found = xa_get_tagged(&pag->pag_ici_xa,
+					(void **)batch, first_index, ULONG_MAX,
 					XFS_LOOKUP_BATCH, tag);
 
 		if (!nr_found) {
@@ -896,8 +879,8 @@ xfs_inode_ag_iterator_flags(
 	ag = 0;
 	while ((pag = xfs_perag_get(mp, ag))) {
 		ag = pag->pag_agno + 1;
-		error = xfs_inode_ag_walk(mp, pag, execute, flags, args, -1,
-					  iter_flags);
+		error = xfs_inode_ag_walk(mp, pag, execute, flags, args,
+					  XFS_ICI_NO_TAG, iter_flags);
 		xfs_perag_put(pag);
 		if (error) {
 			last_error = error;
@@ -926,7 +909,7 @@ xfs_inode_ag_iterator_tag(
 					   void *args),
 	int			flags,
 	void			*args,
-	int			tag)
+	xa_tag_t		tag)
 {
 	struct xfs_perag	*pag;
 	int			error = 0;
@@ -1040,8 +1023,9 @@ xfs_reclaim_inode(
 	int			sync_mode)
 {
 	struct xfs_buf		*bp = NULL;
-	xfs_ino_t		ino = ip->i_ino; /* for radix_tree_delete */
+	xfs_ino_t		ino = ip->i_ino;
 	int			error;
+	XA_STATE(xas, XFS_INO_TO_AGINO(ip->i_mount, ino));
 
 restart:
 	error = 0;
@@ -1128,16 +1112,14 @@ xfs_reclaim_inode(
 	/*
 	 * Remove the inode from the per-AG radix tree.
 	 *
-	 * Because radix_tree_delete won't complain even if the item was never
-	 * added to the tree assert that it's been there before to catch
-	 * problems with the inode life time early on.
+	 * Check that it was there before to catch problems with the
+	 * inode life time early on.
 	 */
-	spin_lock(&pag->pag_ici_lock);
-	if (!radix_tree_delete(&pag->pag_ici_root,
-				XFS_INO_TO_AGINO(ip->i_mount, ino)))
+	xa_lock(&pag->pag_ici_xa);
+	if (xas_store(&pag->pag_ici_xa, &xas, NULL) != ip)
 		ASSERT(0);
 	xfs_perag_clear_reclaim_tag(pag);
-	spin_unlock(&pag->pag_ici_lock);
+	xa_unlock(&pag->pag_ici_xa);
 
 	/*
 	 * Here we do an (almost) spurious inode lock in order to coordinate
@@ -1213,9 +1195,8 @@ xfs_reclaim_inodes_ag(
 			int	i;
 
 			rcu_read_lock();
-			nr_found = radix_tree_gang_lookup_tag(
-					&pag->pag_ici_root,
-					(void **)batch, first_index,
+			nr_found = xa_get_tagged(&pag->pag_ici_xa,
+					(void **)batch, first_index, ULONG_MAX,
 					XFS_LOOKUP_BATCH,
 					XFS_ICI_RECLAIM_TAG);
 			if (!nr_found) {
@@ -1450,7 +1431,7 @@ __xfs_icache_free_eofblocks(
 	struct xfs_eofblocks	*eofb,
 	int			(*execute)(struct xfs_inode *ip, int flags,
 					   void *args),
-	int			tag)
+	xa_tag_t		tag)
 {
 	int flags = SYNC_TRYLOCK;
 
@@ -1546,10 +1527,10 @@ __xfs_inode_set_eofblocks_tag(
 	spin_unlock(&ip->i_flags_lock);
 
 	pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino));
-	spin_lock(&pag->pag_ici_lock);
+	xa_lock(&pag->pag_ici_xa);
 
-	tagged = radix_tree_tagged(&pag->pag_ici_root, tag);
-	radix_tree_tag_set(&pag->pag_ici_root,
+	tagged = xa_tagged(&pag->pag_ici_xa, tag);
+	__xa_set_tag(&pag->pag_ici_xa,
 			   XFS_INO_TO_AGINO(ip->i_mount, ip->i_ino), tag);
 	if (!tagged) {
 		/* propagate the eofblocks tag up into the perag radix tree */
@@ -1563,7 +1544,7 @@ __xfs_inode_set_eofblocks_tag(
 		set_tp(ip->i_mount, pag->pag_agno, -1, _RET_IP_);
 	}
 
-	spin_unlock(&pag->pag_ici_lock);
+	xa_unlock(&pag->pag_ici_xa);
 	xfs_perag_put(pag);
 }
 
@@ -1592,11 +1573,11 @@ __xfs_inode_clear_eofblocks_tag(
 	spin_unlock(&ip->i_flags_lock);
 
 	pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino));
-	spin_lock(&pag->pag_ici_lock);
+	xa_lock(&pag->pag_ici_xa);
 
-	radix_tree_tag_clear(&pag->pag_ici_root,
+	__xa_clear_tag(&pag->pag_ici_xa,
 			     XFS_INO_TO_AGINO(ip->i_mount, ip->i_ino), tag);
-	if (!radix_tree_tagged(&pag->pag_ici_root, tag)) {
+	if (!xa_tagged(&pag->pag_ici_xa, tag)) {
 		/* clear the eofblocks tag from the perag radix tree */
 		xa_clear_tag(&ip->i_mount->m_perag_xa,
 				     XFS_INO_TO_AGNO(ip->i_mount, ip->i_ino),
@@ -1604,7 +1585,7 @@ __xfs_inode_clear_eofblocks_tag(
 		clear_tp(ip->i_mount, pag->pag_agno, -1, _RET_IP_);
 	}
 
-	spin_unlock(&pag->pag_ici_lock);
+	xa_unlock(&pag->pag_ici_xa);
 	xfs_perag_put(pag);
 }
 
diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
index bd04d5adadfe..436e7f0b1ecc 100644
--- a/fs/xfs/xfs_icache.h
+++ b/fs/xfs/xfs_icache.h
@@ -35,7 +35,7 @@ struct xfs_eofblocks {
 /*
  * tags for inode radix tree
  */
-#define XFS_ICI_NO_TAG		(-1)	/* special flag for an untagged lookup
+#define XFS_ICI_NO_TAG		XA_NO_TAG /* special flag for an untagged lookup
 					   in xfs_inode_ag_iterator */
 #define XFS_ICI_RECLAIM_TAG	XA_TAG_0 /* inode is to be reclaimed */
 #define XFS_ICI_EOFBLOCKS_TAG	XA_TAG_1 /* inode has blocks beyond EOF */
@@ -90,7 +90,7 @@ int xfs_inode_ag_iterator_flags(struct xfs_mount *mp,
 	int flags, void *args, int iter_flags);
 int xfs_inode_ag_iterator_tag(struct xfs_mount *mp,
 	int (*execute)(struct xfs_inode *ip, int flags, void *args),
-	int flags, void *args, int tag);
+	int flags, void *args, xa_tag_t tag);
 
 static inline int
 xfs_fs_eofblocks_from_user(
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 61d1cb7dc10d..661f0ac655fa 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -2309,7 +2309,7 @@ xfs_ifree_cluster(
 		for (i = 0; i < inodes_per_cluster; i++) {
 retry:
 			rcu_read_lock();
-			ip = radix_tree_lookup(&pag->pag_ici_root,
+			ip = xa_load(&pag->pag_ici_xa,
 					XFS_INO_TO_AGINO(mp, (inum + i)));
 
 			/* Inode not in memory, nothing to do */
@@ -3186,7 +3186,7 @@ xfs_iflush_cluster(
 {
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_perag	*pag;
-	unsigned long		first_index, mask;
+	unsigned long		first_index, last_index, mask;
 	unsigned long		inodes_per_cluster;
 	int			cilist_size;
 	struct xfs_inode	**cilist;
@@ -3204,12 +3204,12 @@ xfs_iflush_cluster(
 	if (!cilist)
 		goto out_put;
 
-	mask = ~(((mp->m_inode_cluster_size >> mp->m_sb.sb_inodelog)) - 1);
-	first_index = XFS_INO_TO_AGINO(mp, ip->i_ino) & mask;
+	mask = (((mp->m_inode_cluster_size >> mp->m_sb.sb_inodelog)) - 1);
+	first_index = XFS_INO_TO_AGINO(mp, ip->i_ino) & ~mask;
+	last_index = first_index | mask;
 	rcu_read_lock();
-	/* really need a gang lookup range call here */
-	nr_found = radix_tree_gang_lookup(&pag->pag_ici_root, (void**)cilist,
-					first_index, inodes_per_cluster);
+	nr_found = xa_get_entries(&pag->pag_ici_xa, (void**)cilist, first_index,
+					last_index, inodes_per_cluster);
 	if (nr_found == 0)
 		goto out_free;
 
@@ -3230,16 +3230,6 @@ xfs_iflush_cluster(
 			spin_unlock(&cip->i_flags_lock);
 			continue;
 		}
-
-		/*
-		 * Once we fall off the end of the cluster, no point checking
-		 * any more inodes in the list because they will also all be
-		 * outside the cluster.
-		 */
-		if ((XFS_INO_TO_AGINO(mp, cip->i_ino) & mask) != first_index) {
-			spin_unlock(&cip->i_flags_lock);
-			break;
-		}
 		spin_unlock(&cip->i_flags_lock);
 
 		/*
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index ae90f6f8fa5c..d5185657f030 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -210,9 +210,8 @@ xfs_initialize_perag(
 			goto out_unwind_new_pags;
 		pag->pag_agno = index;
 		pag->pag_mount = mp;
-		spin_lock_init(&pag->pag_ici_lock);
 		mutex_init(&pag->pag_ici_reclaim_lock);
-		INIT_RADIX_TREE(&pag->pag_ici_root, GFP_ATOMIC);
+		xa_init(&pag->pag_ici_xa);
 		if (xfs_buf_hash_init(pag))
 			goto out_free_pag;
 		init_waitqueue_head(&pag->pagb_wait);
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 6e5ad7b26f46..ab0f706d2fd7 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -374,8 +374,7 @@ typedef struct xfs_perag {
 
 	atomic_t        pagf_fstrms;    /* # of filestreams active in this AG */
 
-	spinlock_t	pag_ici_lock;	/* incore inode cache lock */
-	struct radix_tree_root pag_ici_root;	/* incore inode cache root */
+	struct xarray	pag_ici_xa;	/* incore inode cache */
 	int		pag_ici_reclaimable;	/* reclaimable inodes */
 	struct mutex	pag_ici_reclaim_lock;	/* serialisation point */
 	unsigned long	pag_ici_reclaim_cursor;	/* reclaim restart point */
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 46/62] xfs: Convert xfs dquot to XArray
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:07   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:07 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

This is a pretty straightforward conversion.
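
The one subtle part is the insert path in xfs_qm_dqget(): xa_cmpxchg()
returns whatever entry was already present, so a racing insert can reuse
the existing dquot instead of restarting the whole lookup.  Roughly (a
sketch based on the diff below, not a verbatim excerpt):

mutex_lock(&qi->qi_xa_lock);
curr = xa_cmpxchg(xa, id, NULL, dqp, GFP_NOFS);
if (unlikely(curr)) {
	WARN_ON(IS_ERR(curr));	/* allocation failure, not a duplicate */
	/* Duplicate: throw away the new dquot and use the existing one. */
	xfs_qm_dqdestroy(dqp);
	dqp = curr;
}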

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 fs/xfs/xfs_dquot.c | 33 +++++++++++++++++----------------
 fs/xfs/xfs_qm.c    | 32 ++++++++++++++++----------------
 fs/xfs/xfs_qm.h    | 18 +++++++++---------
 3 files changed, 42 insertions(+), 41 deletions(-)

diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index fbf37b326050..89f5250de8b2 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -44,7 +44,7 @@
  * Lock order:
  *
  * ip->i_lock
- *   qi->qi_tree_lock
+ *   qi->qi_xa_lock
  *     dquot->q_qlock (xfs_dqlock() and friends)
  *       dquot->q_flush (xfs_dqflock() and friends)
  *       qi->qi_lru_lock
@@ -752,8 +752,8 @@ xfs_qm_dqget(
 	xfs_dquot_t	**O_dqpp) /* OUT : locked incore dquot */
 {
 	struct xfs_quotainfo	*qi = mp->m_quotainfo;
-	struct radix_tree_root *tree = xfs_dquot_tree(qi, type);
-	struct xfs_dquot	*dqp;
+	struct xarray		*xa = xfs_dquot_xa(qi, type);
+	struct xfs_dquot	*dqp, *curr;
 	int			error;
 
 	ASSERT(XFS_IS_QUOTA_RUNNING(mp));
@@ -772,13 +772,14 @@ xfs_qm_dqget(
 	}
 
 restart:
-	mutex_lock(&qi->qi_tree_lock);
-	dqp = radix_tree_lookup(tree, id);
+	mutex_lock(&qi->qi_xa_lock);
+	dqp = xa_load(xa, id);
+found:
 	if (dqp) {
 		xfs_dqlock(dqp);
 		if (dqp->dq_flags & XFS_DQ_FREEING) {
 			xfs_dqunlock(dqp);
-			mutex_unlock(&qi->qi_tree_lock);
+			mutex_unlock(&qi->qi_xa_lock);
 			trace_xfs_dqget_freeing(dqp);
 			delay(1);
 			goto restart;
@@ -788,7 +789,7 @@ xfs_qm_dqget(
 		if (flags & XFS_QMOPT_DQNEXT) {
 			if (XFS_IS_DQUOT_UNINITIALIZED(dqp)) {
 				xfs_dqunlock(dqp);
-				mutex_unlock(&qi->qi_tree_lock);
+				mutex_unlock(&qi->qi_xa_lock);
 				error = xfs_dq_get_next_id(mp, type, &id);
 				if (error)
 					return error;
@@ -797,14 +798,14 @@ xfs_qm_dqget(
 		}
 
 		dqp->q_nrefs++;
-		mutex_unlock(&qi->qi_tree_lock);
+		mutex_unlock(&qi->qi_xa_lock);
 
 		trace_xfs_dqget_hit(dqp);
 		XFS_STATS_INC(mp, xs_qm_dqcachehits);
 		*O_dqpp = dqp;
 		return 0;
 	}
-	mutex_unlock(&qi->qi_tree_lock);
+	mutex_unlock(&qi->qi_xa_lock);
 	XFS_STATS_INC(mp, xs_qm_dqcachemisses);
 
 	/*
@@ -854,20 +855,20 @@ xfs_qm_dqget(
 		}
 	}
 
-	mutex_lock(&qi->qi_tree_lock);
-	error = radix_tree_insert(tree, id, dqp);
-	if (unlikely(error)) {
-		WARN_ON(error != -EEXIST);
+	mutex_lock(&qi->qi_xa_lock);
+	curr = xa_cmpxchg(xa, id, NULL, dqp, GFP_NOFS);
+	if (unlikely(curr)) {
+		WARN_ON(IS_ERR(curr));
 
 		/*
 		 * Duplicate found. Just throw away the new dquot and start
 		 * over.
 		 */
-		mutex_unlock(&qi->qi_tree_lock);
 		trace_xfs_dqget_dup(dqp);
 		xfs_qm_dqdestroy(dqp);
 		XFS_STATS_INC(mp, xs_qm_dquot_dups);
-		goto restart;
+		dqp = curr;
+		goto found;
 	}
 
 	/*
@@ -877,7 +878,7 @@ xfs_qm_dqget(
 	dqp->q_nrefs = 1;
 
 	qi->qi_dquots++;
-	mutex_unlock(&qi->qi_tree_lock);
+	mutex_unlock(&qi->qi_xa_lock);
 
 	/* If we are asked to find next active id, keep looking */
 	if (flags & XFS_QMOPT_DQNEXT) {
diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index 010a13a201aa..5a75836faf92 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -67,7 +67,7 @@ xfs_qm_dquot_walk(
 	void			*data)
 {
 	struct xfs_quotainfo	*qi = mp->m_quotainfo;
-	struct radix_tree_root	*tree = xfs_dquot_tree(qi, type);
+	struct xarray		*xa = xfs_dquot_xa(qi, type);
 	uint32_t		next_index;
 	int			last_error = 0;
 	int			skipped;
@@ -83,11 +83,11 @@ xfs_qm_dquot_walk(
 		int		error = 0;
 		int		i;
 
-		mutex_lock(&qi->qi_tree_lock);
-		nr_found = radix_tree_gang_lookup(tree, (void **)batch,
-					next_index, XFS_DQ_LOOKUP_BATCH);
+		mutex_lock(&qi->qi_xa_lock);
+		nr_found = xa_get_entries(xa, (void **)batch, next_index,
+					ULONG_MAX, XFS_DQ_LOOKUP_BATCH);
 		if (!nr_found) {
-			mutex_unlock(&qi->qi_tree_lock);
+			mutex_unlock(&qi->qi_xa_lock);
 			break;
 		}
 
@@ -105,7 +105,7 @@ xfs_qm_dquot_walk(
 				last_error = error;
 		}
 
-		mutex_unlock(&qi->qi_tree_lock);
+		mutex_unlock(&qi->qi_xa_lock);
 
 		/* bail out if the filesystem is corrupted.  */
 		if (last_error == -EFSCORRUPTED) {
@@ -178,8 +178,8 @@ xfs_qm_dqpurge(
 	xfs_dqfunlock(dqp);
 	xfs_dqunlock(dqp);
 
-	radix_tree_delete(xfs_dquot_tree(qi, dqp->q_core.d_flags),
-			  be32_to_cpu(dqp->q_core.d_id));
+	xa_store(xfs_dquot_xa(qi, dqp->q_core.d_flags),
+			  be32_to_cpu(dqp->q_core.d_id), NULL, GFP_NOWAIT);
 	qi->qi_dquots--;
 
 	/*
@@ -623,10 +623,10 @@ xfs_qm_init_quotainfo(
 	if (error)
 		goto out_free_lru;
 
-	INIT_RADIX_TREE(&qinf->qi_uquota_tree, GFP_NOFS);
-	INIT_RADIX_TREE(&qinf->qi_gquota_tree, GFP_NOFS);
-	INIT_RADIX_TREE(&qinf->qi_pquota_tree, GFP_NOFS);
-	mutex_init(&qinf->qi_tree_lock);
+	xa_init(&qinf->qi_uquota_xa);
+	xa_init(&qinf->qi_gquota_xa);
+	xa_init(&qinf->qi_pquota_xa);
+	mutex_init(&qinf->qi_xa_lock);
 
 	/* mutex used to serialize quotaoffs */
 	mutex_init(&qinf->qi_quotaofflock);
@@ -1606,12 +1606,12 @@ xfs_qm_dqfree_one(
 	struct xfs_mount	*mp = dqp->q_mount;
 	struct xfs_quotainfo	*qi = mp->m_quotainfo;
 
-	mutex_lock(&qi->qi_tree_lock);
-	radix_tree_delete(xfs_dquot_tree(qi, dqp->q_core.d_flags),
-			  be32_to_cpu(dqp->q_core.d_id));
+	mutex_lock(&qi->qi_xa_lock);
+	xa_store(xfs_dquot_xa(qi, dqp->q_core.d_flags),
+			  be32_to_cpu(dqp->q_core.d_id), NULL, GFP_NOWAIT);
 
 	qi->qi_dquots--;
-	mutex_unlock(&qi->qi_tree_lock);
+	mutex_unlock(&qi->qi_xa_lock);
 
 	xfs_qm_dqdestroy(dqp);
 }
diff --git a/fs/xfs/xfs_qm.h b/fs/xfs/xfs_qm.h
index 2975a822e9f0..946f929f7bfb 100644
--- a/fs/xfs/xfs_qm.h
+++ b/fs/xfs/xfs_qm.h
@@ -67,10 +67,10 @@ struct xfs_def_quota {
  * The mount structure keeps a pointer to this.
  */
 typedef struct xfs_quotainfo {
-	struct radix_tree_root qi_uquota_tree;
-	struct radix_tree_root qi_gquota_tree;
-	struct radix_tree_root qi_pquota_tree;
-	struct mutex qi_tree_lock;
+	struct xarray	qi_uquota_xa;
+	struct xarray	qi_gquota_xa;
+	struct xarray	qi_pquota_xa;
+	struct mutex	qi_xa_lock;
 	struct xfs_inode	*qi_uquotaip;	/* user quota inode */
 	struct xfs_inode	*qi_gquotaip;	/* group quota inode */
 	struct xfs_inode	*qi_pquotaip;	/* project quota inode */
@@ -91,18 +91,18 @@ typedef struct xfs_quotainfo {
 	struct shrinker  qi_shrinker;
 } xfs_quotainfo_t;
 
-static inline struct radix_tree_root *
-xfs_dquot_tree(
+static inline struct xarray *
+xfs_dquot_xa(
 	struct xfs_quotainfo	*qi,
 	int			type)
 {
 	switch (type) {
 	case XFS_DQ_USER:
-		return &qi->qi_uquota_tree;
+		return &qi->qi_uquota_xa;
 	case XFS_DQ_GROUP:
-		return &qi->qi_gquota_tree;
+		return &qi->qi_gquota_xa;
 	case XFS_DQ_PROJ:
-		return &qi->qi_pquota_tree;
+		return &qi->qi_pquota_xa;
 	default:
 		ASSERT(0);
 	}
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 47/62] xfs: Convert mru cache to XArray
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:07   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:07 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

This eliminates a call to radix_tree_preload().
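
Instead of preloading, the insert path retries the store when the XArray
needs memory, with the allocation done outside the lock by xas_nomem().
A sketch of the pattern (using the xa_state API from this series; see
xfs_mru_cache_insert() in the diff for the real code):

XA_STATE(xas, key);
int error;

retry:
xa_lock(&mru->store);
xas_store(&mru->store, &xas, elem);	/* may record -ENOMEM in xas */
error = xas_error(&xas);
xa_unlock(&mru->store);
if (xas_nomem(&xas, GFP_NOFS))		/* allocate, then try again */
	goto retry;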

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 fs/xfs/xfs_mru_cache.c | 71 +++++++++++++++++++++++++-------------------------
 1 file changed, 35 insertions(+), 36 deletions(-)

diff --git a/fs/xfs/xfs_mru_cache.c b/fs/xfs/xfs_mru_cache.c
index f8a674d7f092..d665ac490045 100644
--- a/fs/xfs/xfs_mru_cache.c
+++ b/fs/xfs/xfs_mru_cache.c
@@ -101,10 +101,9 @@
  * an infinite loop in the code.
  */
 struct xfs_mru_cache {
-	struct radix_tree_root	store;     /* Core storage data structure.  */
+	struct xarray		store;     /* Core storage data structure.  */
 	struct list_head	*lists;    /* Array of lists, one per grp.  */
 	struct list_head	reap_list; /* Elements overdue for reaping. */
-	spinlock_t		lock;      /* Lock to protect this struct.  */
 	unsigned int		grp_count; /* Number of discrete groups.    */
 	unsigned int		grp_time;  /* Time period spanned by grps.  */
 	unsigned int		lru_grp;   /* Group containing time zero.   */
@@ -232,22 +231,24 @@ _xfs_mru_cache_list_insert(
  * data store, removing it from the reap list, calling the client's free
  * function and deleting the element from the element zone.
  *
- * We get called holding the mru->lock, which we drop and then reacquire.
- * Sparse need special help with this to tell it we know what we are doing.
+ * We get called holding the mru->store lock, which we drop and then reacquire.
+ * Sparse needs special help with this to tell it we know what we are doing.
  */
 STATIC void
 _xfs_mru_cache_clear_reap_list(
 	struct xfs_mru_cache	*mru)
-		__releases(mru->lock) __acquires(mru->lock)
+		__releases(mru->store) __acquires(mru->store)
 {
+	XA_STATE(xas, 0);
 	struct xfs_mru_cache_elem *elem, *next;
 	struct list_head	tmp;
 
 	INIT_LIST_HEAD(&tmp);
 	list_for_each_entry_safe(elem, next, &mru->reap_list, list_node) {
+		xas_set(&xas, elem->key);
 
 		/* Remove the element from the data store. */
-		radix_tree_delete(&mru->store, elem->key);
+		xas_store(&mru->store, &xas, NULL);
 
 		/*
 		 * remove to temp list so it can be freed without
@@ -255,14 +256,14 @@ _xfs_mru_cache_clear_reap_list(
 		 */
 		list_move(&elem->list_node, &tmp);
 	}
-	spin_unlock(&mru->lock);
+	xa_unlock(&mru->store);
 
 	list_for_each_entry_safe(elem, next, &tmp, list_node) {
 		list_del_init(&elem->list_node);
 		mru->free_func(elem);
 	}
 
-	spin_lock(&mru->lock);
+	xa_lock(&mru->store);
 }
 
 /*
@@ -284,7 +285,7 @@ _xfs_mru_cache_reap(
 	if (!mru || !mru->lists)
 		return;
 
-	spin_lock(&mru->lock);
+	xa_lock(&mru->store);
 	next = _xfs_mru_cache_migrate(mru, jiffies);
 	_xfs_mru_cache_clear_reap_list(mru);
 
@@ -298,7 +299,7 @@ _xfs_mru_cache_reap(
 		queue_delayed_work(xfs_mru_reap_wq, &mru->work, next);
 	}
 
-	spin_unlock(&mru->lock);
+	xa_unlock(&mru->store);
 }
 
 int
@@ -358,13 +359,8 @@ xfs_mru_cache_create(
 	for (grp = 0; grp < mru->grp_count; grp++)
 		INIT_LIST_HEAD(mru->lists + grp);
 
-	/*
-	 * We use GFP_KERNEL radix tree preload and do inserts under a
-	 * spinlock so GFP_ATOMIC is appropriate for the radix tree itself.
-	 */
-	INIT_RADIX_TREE(&mru->store, GFP_ATOMIC);
+	xa_init(&mru->store);
 	INIT_LIST_HEAD(&mru->reap_list);
-	spin_lock_init(&mru->lock);
 	INIT_DELAYED_WORK(&mru->work, _xfs_mru_cache_reap);
 
 	mru->grp_time  = grp_time;
@@ -394,17 +390,17 @@ xfs_mru_cache_flush(
 	if (!mru || !mru->lists)
 		return;
 
-	spin_lock(&mru->lock);
+	xa_lock(&mru->store);
 	if (mru->queued) {
-		spin_unlock(&mru->lock);
+		xa_unlock(&mru->store);
 		cancel_delayed_work_sync(&mru->work);
-		spin_lock(&mru->lock);
+		xa_lock(&mru->store);
 	}
 
 	_xfs_mru_cache_migrate(mru, jiffies + mru->grp_count * mru->grp_time);
 	_xfs_mru_cache_clear_reap_list(mru);
 
-	spin_unlock(&mru->lock);
+	xa_unlock(&mru->store);
 }
 
 void
@@ -431,24 +427,25 @@ xfs_mru_cache_insert(
 	unsigned long		key,
 	struct xfs_mru_cache_elem *elem)
 {
+	XA_STATE(xas, key);
 	int			error;
 
 	ASSERT(mru && mru->lists);
 	if (!mru || !mru->lists)
 		return -EINVAL;
 
-	if (radix_tree_preload(GFP_NOFS))
-		return -ENOMEM;
-
 	INIT_LIST_HEAD(&elem->list_node);
 	elem->key = key;
 
-	spin_lock(&mru->lock);
-	error = radix_tree_insert(&mru->store, key, elem);
-	radix_tree_preload_end();
+retry:
+	xa_lock(&mru->store);
+	xas_store(&mru->store, &xas, elem);
+	error = xas_error(&xas);
 	if (!error)
 		_xfs_mru_cache_list_insert(mru, elem);
-	spin_unlock(&mru->lock);
+	xa_unlock(&mru->store);
+	if (xas_nomem(&xas, GFP_NOFS))
+		goto retry;
 
 	return error;
 }
@@ -464,17 +461,18 @@ xfs_mru_cache_remove(
 	struct xfs_mru_cache	*mru,
 	unsigned long		key)
 {
+	XA_STATE(xas, key);
 	struct xfs_mru_cache_elem *elem;
 
 	ASSERT(mru && mru->lists);
 	if (!mru || !mru->lists)
 		return NULL;
 
-	spin_lock(&mru->lock);
-	elem = radix_tree_delete(&mru->store, key);
+	xa_lock(&mru->store);
+	elem = xas_store(&mru->store, &xas, NULL);
 	if (elem)
 		list_del(&elem->list_node);
-	spin_unlock(&mru->lock);
+	xa_unlock(&mru->store);
 
 	return elem;
 }
@@ -520,20 +518,21 @@ xfs_mru_cache_lookup(
 	struct xfs_mru_cache	*mru,
 	unsigned long		key)
 {
+	XA_STATE(xas, key);
 	struct xfs_mru_cache_elem *elem;
 
 	ASSERT(mru && mru->lists);
 	if (!mru || !mru->lists)
 		return NULL;
 
-	spin_lock(&mru->lock);
-	elem = radix_tree_lookup(&mru->store, key);
+	xa_lock(&mru->store);
+	elem = xas_load(&mru->store, &xas);
 	if (elem) {
 		list_del(&elem->list_node);
 		_xfs_mru_cache_list_insert(mru, elem);
-		__release(mru_lock); /* help sparse not be stupid */
+		__release(&mru->store); /* help sparse not be stupid */
 	} else
-		spin_unlock(&mru->lock);
+		xa_unlock(&mru->store);
 
 	return elem;
 }
@@ -546,7 +545,7 @@ xfs_mru_cache_lookup(
 void
 xfs_mru_cache_done(
 	struct xfs_mru_cache	*mru)
-		__releases(mru->lock)
+		__releases(mru->store)
 {
-	spin_unlock(&mru->lock);
+	xa_unlock(&mru->store);
 }
-- 
2.15.0
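
A minimal sketch of the locked insert-and-retry idiom the patch above
adopts, written against the XArray calling convention used in this
posting (XA_STATE() taking only an index, xas_store() taking the array
explicitly); the struct and function names below are illustrative, not
part of the series:

#include <linux/xarray.h>

/* Illustrative object keyed by an unsigned long index. */
struct foo {
	unsigned long	index;
};

static int foo_insert(struct xarray *xa, struct foo *foo)
{
	XA_STATE(xas, foo->index);
	int error;

retry:
	xa_lock(xa);
	/* May record -ENOMEM in the xa_state rather than sleeping. */
	xas_store(xa, &xas, foo);
	error = xas_error(&xas);
	xa_unlock(xa);
	/* Allocate outside the lock and retry; replaces radix_tree_preload(). */
	if (xas_nomem(&xas, GFP_KERNEL))
		goto retry;

	return error;
}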

* [PATCH 48/62] block: Remove IDR preloading
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:07   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:07 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

The IDR now handles its own locking, so once we remove the external
locking in genhd we can also remove the memory preloading.  The genhd
code still needs to protect the object retrieved from the IDR against
removal until its refcount has been elevated, so hold the IDR's
internal lock during the lookup.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 block/genhd.c | 23 +++++------------------
 1 file changed, 5 insertions(+), 18 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index c2223f12a805..d50bd99e8ce2 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -30,10 +30,7 @@ struct kobject *block_depr;
 /* for extended dynamic devt allocation, currently only one major is used */
 #define NR_EXT_DEVT		(1 << MINORBITS)
 
-/* For extended devt allocation.  ext_devt_lock prevents look up
- * results from going away underneath its user.
- */
-static DEFINE_SPINLOCK(ext_devt_lock);
+/* For extended devt allocation. */
 static DEFINE_IDR(ext_devt_idr);
 
 static const struct device_type disk_type;
@@ -467,14 +464,7 @@ int blk_alloc_devt(struct hd_struct *part, dev_t *devt)
 		return 0;
 	}
 
-	/* allocate ext devt */
-	idr_preload(GFP_KERNEL);
-
-	spin_lock_bh(&ext_devt_lock);
-	idx = idr_alloc(&ext_devt_idr, part, 0, NR_EXT_DEVT, GFP_NOWAIT);
-	spin_unlock_bh(&ext_devt_lock);
-
-	idr_preload_end();
+	idx = idr_alloc(&ext_devt_idr, part, 0, NR_EXT_DEVT, GFP_KERNEL);
 	if (idx < 0)
 		return idx == -ENOSPC ? -EBUSY : idx;
 
@@ -496,11 +486,8 @@ void blk_free_devt(dev_t devt)
 	if (devt == MKDEV(0, 0))
 		return;
 
-	if (MAJOR(devt) == BLOCK_EXT_MAJOR) {
-		spin_lock_bh(&ext_devt_lock);
+	if (MAJOR(devt) == BLOCK_EXT_MAJOR)
 		idr_remove(&ext_devt_idr, blk_mangle_minor(MINOR(devt)));
-		spin_unlock_bh(&ext_devt_lock);
-	}
 }
 
 static char *bdevt_str(dev_t devt, char *buf)
@@ -789,13 +776,13 @@ struct gendisk *get_gendisk(dev_t devt, int *partno)
 	} else {
 		struct hd_struct *part;
 
-		spin_lock_bh(&ext_devt_lock);
+		idr_lock_bh(&ext_devt_idr);
 		part = idr_find(&ext_devt_idr, blk_mangle_minor(MINOR(devt)));
 		if (part && get_disk(part_to_disk(part))) {
 			*partno = part->partno;
 			disk = part_to_disk(part);
 		}
-		spin_unlock_bh(&ext_devt_lock);
+		idr_unlock_bh(&ext_devt_idr);
 	}
 
 	if (disk && unlikely(disk->flags & GENHD_FL_HIDDEN)) {
-- 
2.15.0
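
The lookup side of the conversion follows the pattern sketched below,
assuming the idr_lock_bh()/idr_unlock_bh() accessors this series adds
for the IDR's embedded lock; the refcounted object is illustrative.
Holding the lock across the lookup prevents the object from being freed
before its reference count is raised, which is exactly what
get_gendisk() needs above.

#include <linux/idr.h>
#include <linux/kref.h>

/* Illustrative refcounted object stored in the IDR. */
struct foo {
	struct kref	ref;
};

static struct foo *foo_lookup(struct idr *idr, int id)
{
	struct foo *foo;

	idr_lock_bh(idr);
	foo = idr_find(idr, id);
	if (foo)
		kref_get(&foo->ref);	/* removal is excluded while locked */
	idr_unlock_bh(idr);

	return foo;
}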

* [PATCH 49/62] rxrpc: Remove IDR preloading
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:07   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:07 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

The IDR now handles its own locking, so if we remove the locking in
rxrpc, we can also remove the memory preloading.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 net/rxrpc/conn_client.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/net/rxrpc/conn_client.c b/net/rxrpc/conn_client.c
index 7e8bf10fec86..d61fbd359bfa 100644
--- a/net/rxrpc/conn_client.c
+++ b/net/rxrpc/conn_client.c
@@ -91,7 +91,6 @@ __read_mostly unsigned int rxrpc_conn_idle_client_fast_expiry = 2 * HZ;
 /*
  * We use machine-unique IDs for our client connections.
  */
-static DEFINE_SPINLOCK(rxrpc_conn_id_lock);
 int rxrpc_client_conn_cursor;
 DEFINE_IDR(rxrpc_client_conn_ids);
 
@@ -111,12 +110,8 @@ static int rxrpc_get_client_connection_id(struct rxrpc_connection *conn,
 
 	_enter("");
 
-	idr_preload(gfp);
-	spin_lock(&rxrpc_conn_id_lock);
 	id = idr_alloc_cyclic(&rxrpc_client_conn_ids, &rxrpc_client_conn_cursor,
-				conn, 1, 0x40000000, GFP_NOWAIT);
-	spin_unlock(&rxrpc_conn_id_lock);
-	idr_preload_end();
+				conn, 1, 0x40000000, gfp);
 	if (id < 0)
 		goto error;
 
@@ -137,10 +132,8 @@ static int rxrpc_get_client_connection_id(struct rxrpc_connection *conn,
 static void rxrpc_put_client_connection_id(struct rxrpc_connection *conn)
 {
 	if (test_bit(RXRPC_CONN_HAS_IDR, &conn->flags)) {
-		spin_lock(&rxrpc_conn_id_lock);
 		idr_remove(&rxrpc_client_conn_ids,
 			   conn->proto.cid >> RXRPC_CIDSHIFT);
-		spin_unlock(&rxrpc_conn_id_lock);
 	}
 }
 
-- 
2.15.0
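
For reference, a sketch of cyclic allocation with the caller-supplied
cursor introduced earlier in this series (patch 6); the IDR, cursor and
range below are illustrative, while rxrpc keeps its cursor in
rxrpc_client_conn_cursor and allocates in [1, 0x40000000) as shown
above:

#include <linux/idr.h>

static DEFINE_IDR(conn_ids);
static int conn_id_cursor;	/* where the next search starts */

/*
 * Allocate an ID in [1, 0x40000000), resuming after the previously
 * allocated ID.  The IDR locks itself internally, so a sleeping GFP
 * mask can be passed directly instead of preloading.
 */
static int conn_id_alloc(void *conn, gfp_t gfp)
{
	return idr_alloc_cyclic(&conn_ids, &conn_id_cursor, conn,
				1, 0x40000000, gfp);
}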

* [PATCH 50/62] cgroup: Remove IDR wrappers
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:07   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:07 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

The IDR now handles its own locking and memory allocation, which is
what the cgroup wrappers were providing, so the wrappers are no longer
needed.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 kernel/cgroup/cgroup.c | 59 ++++++++++----------------------------------------
 1 file changed, 11 insertions(+), 48 deletions(-)

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 351b355336d4..67b4c2aa69a9 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -80,12 +80,6 @@ EXPORT_SYMBOL_GPL(cgroup_mutex);
 EXPORT_SYMBOL_GPL(css_set_lock);
 #endif
 
-/*
- * Protects cgroup_idr and css_idr so that IDs can be released without
- * grabbing cgroup_mutex.
- */
-static DEFINE_SPINLOCK(cgroup_idr_lock);
-
 /*
  * Protects cgroup_file->kn for !self csses.  It synchronizes notifications
  * against file removal/re-creation across css hiding.
@@ -291,37 +285,6 @@ bool cgroup_on_dfl(const struct cgroup *cgrp)
 	return cgrp->root == &cgrp_dfl_root;
 }
 
-/* IDR wrappers which synchronize using cgroup_idr_lock */
-static int cgroup_idr_alloc(struct idr *idr, void *ptr, int start, int end,
-			    gfp_t gfp_mask)
-{
-	int ret;
-
-	idr_preload(gfp_mask);
-	spin_lock_bh(&cgroup_idr_lock);
-	ret = idr_alloc(idr, ptr, start, end, gfp_mask & ~__GFP_DIRECT_RECLAIM);
-	spin_unlock_bh(&cgroup_idr_lock);
-	idr_preload_end();
-	return ret;
-}
-
-static void *cgroup_idr_replace(struct idr *idr, void *ptr, int id)
-{
-	void *ret;
-
-	spin_lock_bh(&cgroup_idr_lock);
-	ret = idr_replace(idr, ptr, id);
-	spin_unlock_bh(&cgroup_idr_lock);
-	return ret;
-}
-
-static void cgroup_idr_remove(struct idr *idr, int id)
-{
-	spin_lock_bh(&cgroup_idr_lock);
-	idr_remove(idr, id);
-	spin_unlock_bh(&cgroup_idr_lock);
-}
-
 static bool cgroup_has_tasks(struct cgroup *cgrp)
 {
 	return cgrp->nr_populated_csets;
@@ -1883,7 +1846,7 @@ int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask, int ref_flags)
 
 	lockdep_assert_held(&cgroup_mutex);
 
-	ret = cgroup_idr_alloc(&root->cgroup_idr, root_cgrp, 1, 2, GFP_KERNEL);
+	ret = idr_alloc(&root->cgroup_idr, root_cgrp, 1, 2, GFP_KERNEL);
 	if (ret < 0)
 		goto out;
 	root_cgrp->id = ret;
@@ -4532,7 +4495,7 @@ static void css_free_work_fn(struct work_struct *work)
 		int id = css->id;
 
 		ss->css_free(css);
-		cgroup_idr_remove(&ss->css_idr, id);
+		idr_remove(&ss->css_idr, id);
 		cgroup_put(cgrp);
 
 		if (parent)
@@ -4589,7 +4552,7 @@ static void css_release_work_fn(struct work_struct *work)
 
 	if (ss) {
 		/* css release path */
-		cgroup_idr_replace(&ss->css_idr, NULL, css->id);
+		idr_replace(&ss->css_idr, NULL, css->id);
 		if (ss->css_released)
 			ss->css_released(css);
 	} else {
@@ -4605,7 +4568,7 @@ static void css_release_work_fn(struct work_struct *work)
 		     tcgrp = cgroup_parent(tcgrp))
 			tcgrp->nr_dying_descendants--;
 
-		cgroup_idr_remove(&cgrp->root->cgroup_idr, cgrp->id);
+		idr_remove(&cgrp->root->cgroup_idr, cgrp->id);
 		cgrp->id = -1;
 
 		/*
@@ -4731,14 +4694,14 @@ static struct cgroup_subsys_state *css_create(struct cgroup *cgrp,
 	if (err)
 		goto err_free_css;
 
-	err = cgroup_idr_alloc(&ss->css_idr, NULL, 2, 0, GFP_KERNEL);
+	err = idr_alloc(&ss->css_idr, NULL, 2, 0, GFP_KERNEL);
 	if (err < 0)
 		goto err_free_css;
 	css->id = err;
 
 	/* @css is ready to be brought online now, make it visible */
 	list_add_tail_rcu(&css->sibling, &parent_css->children);
-	cgroup_idr_replace(&ss->css_idr, css, css->id);
+	idr_replace(&ss->css_idr, css, css->id);
 
 	err = online_css(css);
 	if (err)
@@ -4794,7 +4757,7 @@ static struct cgroup *cgroup_create(struct cgroup *parent)
 	 * Temporarily set the pointer to NULL, so idr_find() won't return
 	 * a half-baked cgroup.
 	 */
-	cgrp->id = cgroup_idr_alloc(&root->cgroup_idr, NULL, 2, 0, GFP_KERNEL);
+	cgrp->id = idr_alloc(&root->cgroup_idr, NULL, 2, 0, GFP_KERNEL);
 	if (cgrp->id < 0) {
 		ret = -ENOMEM;
 		goto out_stat_exit;
@@ -4833,7 +4796,7 @@ static struct cgroup *cgroup_create(struct cgroup *parent)
 	 * @cgrp is now fully operational.  If something fails after this
 	 * point, it'll be released via the normal destruction path.
 	 */
-	cgroup_idr_replace(&root->cgroup_idr, cgrp, cgrp->id);
+	idr_replace(&root->cgroup_idr, cgrp, cgrp->id);
 
 	/*
 	 * On the default hierarchy, a child doesn't automatically inherit
@@ -4847,7 +4810,7 @@ static struct cgroup *cgroup_create(struct cgroup *parent)
 	return cgrp;
 
 out_idr_free:
-	cgroup_idr_remove(&root->cgroup_idr, cgrp->id);
+	idr_remove(&root->cgroup_idr, cgrp->id);
 out_stat_exit:
 	if (cgroup_on_dfl(parent))
 		cgroup_stat_exit(cgrp);
@@ -5166,7 +5129,7 @@ static void __init cgroup_init_subsys(struct cgroup_subsys *ss, bool early)
 		/* allocation can't be done safely during early init */
 		css->id = 1;
 	} else {
-		css->id = cgroup_idr_alloc(&ss->css_idr, css, 1, 2, GFP_KERNEL);
+		css->id = idr_alloc(&ss->css_idr, css, 1, 2, GFP_KERNEL);
 		BUG_ON(css->id < 0);
 	}
 
@@ -5273,7 +5236,7 @@ int __init cgroup_init(void)
 			struct cgroup_subsys_state *css =
 				init_css_set.subsys[ss->id];
 
-			css->id = cgroup_idr_alloc(&ss->css_idr, css, 1, 2,
+			css->id = idr_alloc(&ss->css_idr, css, 1, 2,
 						   GFP_KERNEL);
 			BUG_ON(css->id < 0);
 		} else {
-- 
2.15.0

* [PATCH 51/62] dca: Remove idr_preload calls
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:07   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:07 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

The IDR now has its own locking, so remove the dca lock and calls to
idr_preload.  Also, there is no need to call idr_destroy on a freshly
initialised IDR.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 drivers/dca/dca-sysfs.c | 22 ++++------------------
 1 file changed, 4 insertions(+), 18 deletions(-)

diff --git a/drivers/dca/dca-sysfs.c b/drivers/dca/dca-sysfs.c
index 126cf295b198..afa1d5c0f55e 100644
--- a/drivers/dca/dca-sysfs.c
+++ b/drivers/dca/dca-sysfs.c
@@ -31,7 +31,6 @@
 
 static struct class *dca_class;
 static struct idr dca_idr;
-static spinlock_t dca_idr_lock;
 
 int dca_sysfs_add_req(struct dca_provider *dca, struct device *dev, int slot)
 {
@@ -55,23 +54,14 @@ int dca_sysfs_add_provider(struct dca_provider *dca, struct device *dev)
 	struct device *cd;
 	int ret;
 
-	idr_preload(GFP_KERNEL);
-	spin_lock(&dca_idr_lock);
-
-	ret = idr_alloc(&dca_idr, dca, 0, 0, GFP_NOWAIT);
-	if (ret >= 0)
-		dca->id = ret;
-
-	spin_unlock(&dca_idr_lock);
-	idr_preload_end();
+	ret = idr_alloc(&dca_idr, dca, 0, 0, GFP_KERNEL);
 	if (ret < 0)
 		return ret;
+	dca->id = ret;
 
 	cd = device_create(dca_class, dev, MKDEV(0, 0), NULL, "dca%d", dca->id);
 	if (IS_ERR(cd)) {
-		spin_lock(&dca_idr_lock);
 		idr_remove(&dca_idr, dca->id);
-		spin_unlock(&dca_idr_lock);
 		return PTR_ERR(cd);
 	}
 	dca->cd = cd;
@@ -82,21 +72,17 @@ void dca_sysfs_remove_provider(struct dca_provider *dca)
 {
 	device_unregister(dca->cd);
 	dca->cd = NULL;
-	spin_lock(&dca_idr_lock);
 	idr_remove(&dca_idr, dca->id);
-	spin_unlock(&dca_idr_lock);
 }
 
 int __init dca_sysfs_init(void)
 {
 	idr_init(&dca_idr);
-	spin_lock_init(&dca_idr_lock);
 
 	dca_class = class_create(THIS_MODULE, "dca");
-	if (IS_ERR(dca_class)) {
-		idr_destroy(&dca_idr);
+	if (IS_ERR(dca_class))
 		return PTR_ERR(dca_class);
-	}
+
 	return 0;
 }
 
-- 
2.15.0

* [PATCH 52/62] ipc: Remove call to idr_preload
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:07   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:07 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

The ipc code follows a different pattern to most IDR preload users;
the lock that it is holding is embedded in the object, not protecting
the IDR (the IDR is protected by an rwsem).  Instead of dropping the
lock, allocating memory and retrying, we reserve a slot in the tree
before grabbing the lock, then merely replace the NULL entry with the
initialised object.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 ipc/util.c | 28 ++++++++++++----------------
 1 file changed, 12 insertions(+), 16 deletions(-)

diff --git a/ipc/util.c b/ipc/util.c
index ff045fec8d83..9a20354ce4fb 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -193,10 +193,10 @@ static struct kern_ipc_perm *ipc_findkey(struct ipc_ids *ids, key_t key)
 /*
  * Specify desired id for next allocated IPC object.
  */
-#define ipc_idr_alloc(ids, new)						\
-	idr_alloc(&(ids)->ipcs_idr, (new),				\
+#define ipc_idr_alloc(ids)						\
+	idr_alloc(&(ids)->ipcs_idr, NULL,				\
 		  (ids)->next_id < 0 ? 0 : ipcid_to_idx((ids)->next_id),\
-		  0, GFP_NOWAIT)
+		  0, GFP_KERNEL)
 
 static inline int ipc_buildid(int id, struct ipc_ids *ids,
 			      struct kern_ipc_perm *new)
@@ -214,8 +214,8 @@ static inline int ipc_buildid(int id, struct ipc_ids *ids,
 }
 
 #else
-#define ipc_idr_alloc(ids, new)					\
-	idr_alloc(&(ids)->ipcs_idr, (new), 0, 0, GFP_NOWAIT)
+#define ipc_idr_alloc(ids)					\
+	idr_alloc(&(ids)->ipcs_idr, NULL, 0, 0, GFP_KERNEL)
 
 static inline int ipc_buildid(int id, struct ipc_ids *ids,
 			      struct kern_ipc_perm *new)
@@ -254,34 +254,30 @@ int ipc_addid(struct ipc_ids *ids, struct kern_ipc_perm *new, int limit)
 	if (!ids->tables_initialized || ids->in_use >= limit)
 		return -ENOSPC;
 
-	idr_preload(GFP_KERNEL);
+	id = ipc_idr_alloc(ids);
+	if (id < 0)
+		return id;
 
 	refcount_set(&new->refcount, 1);
 	spin_lock_init(&new->lock);
 	new->deleted = false;
-	rcu_read_lock();
 	spin_lock(&new->lock);
 
 	current_euid_egid(&euid, &egid);
 	new->cuid = new->uid = euid;
 	new->gid = new->cgid = egid;
 
-	id = ipc_idr_alloc(ids, new);
-	idr_preload_end();
+	idr_replace(&ids->ipcs_idr, new, id);
 
-	if (id >= 0 && new->key != IPC_PRIVATE) {
+	if (new->key != IPC_PRIVATE) {
 		err = rhashtable_insert_fast(&ids->key_ht, &new->khtnode,
 					     ipc_kht_params);
 		if (err < 0) {
 			idr_remove(&ids->ipcs_idr, id);
-			id = err;
+			spin_unlock(&new->lock);
+			return err;
 		}
 	}
-	if (id < 0) {
-		spin_unlock(&new->lock);
-		rcu_read_unlock();
-		return id;
-	}
 
 	ids->in_use++;
 	if (id > ids->max_id)
-- 
2.15.0
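
A minimal sketch of the reserve-then-publish idiom described above,
assuming the IDR itself is serialised by the caller (ipc uses the ids
rwsem for that) and that the object embeds its own spinlock; all names
are illustrative:

#include <linux/idr.h>
#include <linux/spinlock.h>

struct foo {
	spinlock_t	lock;
	int		id;
};

/* Caller serialises access to @idr, e.g. with an rwsem. */
static int foo_add(struct idr *idr, struct foo *foo)
{
	int id;

	/* Reserve a slot while it is still safe to sleep for memory. */
	id = idr_alloc(idr, NULL, 0, 0, GFP_KERNEL);
	if (id < 0)
		return id;

	spin_lock_init(&foo->lock);
	spin_lock(&foo->lock);
	foo->id = id;

	/* Swap the NULL placeholder for the initialised object. */
	idr_replace(idr, foo, id);

	/* Returns with foo->lock held, as ipc_addid() does above. */
	return id;
}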

* [PATCH 53/62] irq: Remove call to idr_preload
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:07   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:07 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

It is not entirely clear why the preload was there with no locking.
Perhaps there was a lock at one time and the preload was never removed
when the lock went away.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 kernel/irq/timings.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/kernel/irq/timings.c b/kernel/irq/timings.c
index e0923fa4927a..e9f16527656c 100644
--- a/kernel/irq/timings.c
+++ b/kernel/irq/timings.c
@@ -356,9 +356,7 @@ int irq_timings_alloc(int irq)
 	if (!s)
 		return -ENOMEM;
 
-	idr_preload(GFP_KERNEL);
-	id = idr_alloc(&irqt_stats, s, irq, irq + 1, GFP_NOWAIT);
-	idr_preload_end();
+	id = idr_alloc(&irqt_stats, s, irq, irq + 1, GFP_KERNEL);
 
 	if (id < 0) {
 		free_percpu(s);
-- 
2.15.0

* [PATCH 54/62] scsi: Remove idr_preload in st driver
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:07   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:07 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

The IDR now has its own locking, so remove the driver's private lock
and calls to idr_preload.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 drivers/scsi/st.c | 18 +++++-------------
 1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/drivers/scsi/st.c b/drivers/scsi/st.c
index b141d7641a2e..e8703905c94a 100644
--- a/drivers/scsi/st.c
+++ b/drivers/scsi/st.c
@@ -221,7 +221,6 @@ static void scsi_tape_release(struct kref *);
 #define to_scsi_tape(obj) container_of(obj, struct scsi_tape, kref)
 
 static DEFINE_MUTEX(st_ref_mutex);
-static DEFINE_SPINLOCK(st_index_lock);
 static DEFINE_SPINLOCK(st_use_lock);
 static DEFINE_IDR(st_index_idr);
 
@@ -242,10 +241,11 @@ static struct scsi_tape *scsi_tape_get(int dev)
 	struct scsi_tape *STp = NULL;
 
 	mutex_lock(&st_ref_mutex);
-	spin_lock(&st_index_lock);
+	idr_lock(&st_index_idr);
 
 	STp = idr_find(&st_index_idr, dev);
-	if (!STp) goto out;
+	if (!STp)
+		goto out;
 
 	kref_get(&STp->kref);
 
@@ -261,7 +261,7 @@ static struct scsi_tape *scsi_tape_get(int dev)
 	kref_put(&STp->kref, scsi_tape_release);
 	STp = NULL;
 out:
-	spin_unlock(&st_index_lock);
+	idr_unlock(&st_index_idr);
 	mutex_unlock(&st_ref_mutex);
 	return STp;
 }
@@ -4370,11 +4370,7 @@ static int st_probe(struct device *dev)
 	    tpnt->blksize_changed = 0;
 	mutex_init(&tpnt->lock);
 
-	idr_preload(GFP_KERNEL);
-	spin_lock(&st_index_lock);
-	error = idr_alloc(&st_index_idr, tpnt, 0, ST_MAX_TAPES + 1, GFP_NOWAIT);
-	spin_unlock(&st_index_lock);
-	idr_preload_end();
+	error = idr_alloc(&st_index_idr, tpnt, 0, ST_MAX_TAPES + 1, GFP_KERNEL);
 	if (error < 0) {
 		pr_warn("st: idr allocation failed: %d\n", error);
 		goto out_put_queue;
@@ -4408,9 +4404,7 @@ static int st_probe(struct device *dev)
 	remove_cdevs(tpnt);
 	kfree(tpnt->stats);
 out_idr_remove:
-	spin_lock(&st_index_lock);
 	idr_remove(&st_index_idr, tpnt->index);
-	spin_unlock(&st_index_lock);
 out_put_queue:
 	blk_put_queue(disk->queue);
 out_put_disk:
@@ -4435,9 +4429,7 @@ static int st_remove(struct device *dev)
 	mutex_lock(&st_ref_mutex);
 	kref_put(&tpnt->kref, scsi_tape_release);
 	mutex_unlock(&st_ref_mutex);
-	spin_lock(&st_index_lock);
 	idr_remove(&st_index_idr, index);
-	spin_unlock(&st_index_lock);
 	return 0;
 }
 
-- 
2.15.0

* [PATCH 55/62] firewire: Remove call to idr_preload
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:07   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:07 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

core-cdev uses a spinlock to protect several elements, including the IDR.
Rather than reusing the IDR's internal spinlock for all of this,
preallocate the necessary memory by allocating an ID with a NULL entry,
then replace the NULL with the resource pointer once the client lock is
held.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 drivers/firewire/core-cdev.c | 20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/drivers/firewire/core-cdev.c b/drivers/firewire/core-cdev.c
index a301fcf46e88..52c22c39744e 100644
--- a/drivers/firewire/core-cdev.c
+++ b/drivers/firewire/core-cdev.c
@@ -486,28 +486,24 @@ static int ioctl_get_info(struct client *client, union ioctl_arg *arg)
 static int add_client_resource(struct client *client,
 			       struct client_resource *resource, gfp_t gfp_mask)
 {
-	bool preload = gfpflags_allow_blocking(gfp_mask);
 	unsigned long flags;
 	int ret;
 
-	if (preload)
-		idr_preload(gfp_mask);
-	spin_lock_irqsave(&client->lock, flags);
+	ret = idr_alloc(&client->resource_idr, NULL, 0, 0, gfp_mask);
+	if (ret < 0)
+		return ret;
 
-	if (client->in_shutdown)
+	spin_lock_irqsave(&client->lock, flags);
+	if (client->in_shutdown) {
+		idr_remove(&client->resource_idr, ret);
 		ret = -ECANCELED;
-	else
-		ret = idr_alloc(&client->resource_idr, resource, 0, 0,
-				GFP_NOWAIT);
-	if (ret >= 0) {
+	} else {
+		idr_replace(&client->resource_idr, resource, ret);
 		resource->handle = ret;
 		client_get(client);
 		schedule_if_iso_resource(resource);
 	}
-
 	spin_unlock_irqrestore(&client->lock, flags);
-	if (preload)
-		idr_preload_end();
 
 	return ret < 0 ? ret : 0;
 }
-- 
2.15.0
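
Below is a sketch of the same preallocation idiom with the abort path
core-cdev needs: if the client turns out to be shutting down once the
lock is held, the reserved slot is handed straight back.  The client
structure here is a stand-in for the real core-cdev types:

#include <linux/idr.h>
#include <linux/spinlock.h>

struct client {
	spinlock_t	lock;
	bool		in_shutdown;
	struct idr	resource_idr;	/* initialised with idr_init() elsewhere */
};

static int add_resource(struct client *client, void *resource, gfp_t gfp)
{
	unsigned long flags;
	int ret;

	/* May sleep, depending on @gfp; nothing is visible to lookups yet. */
	ret = idr_alloc(&client->resource_idr, NULL, 0, 0, gfp);
	if (ret < 0)
		return ret;

	spin_lock_irqsave(&client->lock, flags);
	if (client->in_shutdown) {
		/* Give the reserved ID back; it was never published. */
		idr_remove(&client->resource_idr, ret);
		ret = -ECANCELED;
	} else {
		idr_replace(&client->resource_idr, resource, ret);
	}
	spin_unlock_irqrestore(&client->lock, flags);

	return ret < 0 ? ret : 0;
}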

* [PATCH 56/62] drm: Remove drm_minor_lock and idr_preload
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:07   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:07 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

The IDR now has its own lock; use it to protect the lookup too.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c      |  8 ++++----
 drivers/gpu/drm/drm_drv.c                    | 25 +++----------------------
 drivers/gpu/drm/drm_gem.c                    | 27 ++++-----------------------
 drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c |  6 +++---
 drivers/gpu/drm/i915/i915_debugfs.c          |  4 ++--
 drivers/gpu/drm/msm/msm_gem_submit.c         | 10 +++++-----
 drivers/gpu/drm/vc4/vc4_gem.c                |  4 ++--
 include/drm/drm_file.h                       |  5 +----
 8 files changed, 24 insertions(+), 65 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index a418df1b9422..25ee4eef3279 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -89,13 +89,13 @@ void amdgpu_gem_force_release(struct amdgpu_device *adev)
 		int handle;
 
 		WARN_ONCE(1, "Still active user space clients!\n");
-		spin_lock(&file->table_lock);
+		idr_lock(&file->object_idr);
 		idr_for_each_entry(&file->object_idr, gobj, handle) {
 			WARN_ONCE(1, "And also active allocations!\n");
 			drm_gem_object_put_unlocked(gobj);
 		}
+		idr_unlock(&file->object_idr);
 		idr_destroy(&file->object_idr);
-		spin_unlock(&file->table_lock);
 	}
 
 	mutex_unlock(&ddev->filelist_mutex);
@@ -824,9 +824,9 @@ static int amdgpu_debugfs_gem_info(struct seq_file *m, void *data)
 			   task ? task->comm : "<unknown>");
 		rcu_read_unlock();
 
-		spin_lock(&file->table_lock);
+		idr_lock(&file->object_idr);
 		idr_for_each(&file->object_idr, amdgpu_debugfs_gem_bo_info, m);
-		spin_unlock(&file->table_lock);
+		idr_unlock(&file->object_idr);
 	}
 
 	mutex_unlock(&dev->filelist_mutex);
diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index a934fd5e7e55..df51db70ea39 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -61,7 +61,6 @@ MODULE_PARM_DESC(debug, "Enable debug output, where each bit enables a debug cat
 "\t\tBit 7 (0x80) will enable LEASE messages (leasing code)");
 module_param_named(debug, drm_debug, int, 0600);
 
-static DEFINE_SPINLOCK(drm_minor_lock);
 static struct idr drm_minors_idr;
 
 /*
@@ -153,7 +152,6 @@ static struct drm_minor **drm_minor_get_slot(struct drm_device *dev,
 static int drm_minor_alloc(struct drm_device *dev, unsigned int type)
 {
 	struct drm_minor *minor;
-	unsigned long flags;
 	int r;
 
 	minor = kzalloc(sizeof(*minor), GFP_KERNEL);
@@ -163,15 +161,11 @@ static int drm_minor_alloc(struct drm_device *dev, unsigned int type)
 	minor->type = type;
 	minor->dev = dev;
 
-	idr_preload(GFP_KERNEL);
-	spin_lock_irqsave(&drm_minor_lock, flags);
 	r = idr_alloc(&drm_minors_idr,
 		      NULL,
 		      64 * type,
 		      64 * (type + 1),
-		      GFP_NOWAIT);
-	spin_unlock_irqrestore(&drm_minor_lock, flags);
-	idr_preload_end();
+		      GFP_KERNEL);
 
 	if (r < 0)
 		goto err_free;
@@ -188,9 +182,7 @@ static int drm_minor_alloc(struct drm_device *dev, unsigned int type)
 	return 0;
 
 err_index:
-	spin_lock_irqsave(&drm_minor_lock, flags);
 	idr_remove(&drm_minors_idr, minor->index);
-	spin_unlock_irqrestore(&drm_minor_lock, flags);
 err_free:
 	kfree(minor);
 	return r;
@@ -199,7 +191,6 @@ static int drm_minor_alloc(struct drm_device *dev, unsigned int type)
 static void drm_minor_free(struct drm_device *dev, unsigned int type)
 {
 	struct drm_minor **slot, *minor;
-	unsigned long flags;
 
 	slot = drm_minor_get_slot(dev, type);
 	minor = *slot;
@@ -207,11 +198,7 @@ static void drm_minor_free(struct drm_device *dev, unsigned int type)
 		return;
 
 	put_device(minor->kdev);
-
-	spin_lock_irqsave(&drm_minor_lock, flags);
 	idr_remove(&drm_minors_idr, minor->index);
-	spin_unlock_irqrestore(&drm_minor_lock, flags);
-
 	kfree(minor);
 	*slot = NULL;
 }
@@ -219,7 +206,6 @@ static void drm_minor_free(struct drm_device *dev, unsigned int type)
 static int drm_minor_register(struct drm_device *dev, unsigned int type)
 {
 	struct drm_minor *minor;
-	unsigned long flags;
 	int ret;
 
 	DRM_DEBUG("\n");
@@ -239,9 +225,7 @@ static int drm_minor_register(struct drm_device *dev, unsigned int type)
 		goto err_debugfs;
 
 	/* replace NULL with @minor so lookups will succeed from now on */
-	spin_lock_irqsave(&drm_minor_lock, flags);
 	idr_replace(&drm_minors_idr, minor, minor->index);
-	spin_unlock_irqrestore(&drm_minor_lock, flags);
 
 	DRM_DEBUG("new minor registered %d\n", minor->index);
 	return 0;
@@ -254,16 +238,13 @@ static int drm_minor_register(struct drm_device *dev, unsigned int type)
 static void drm_minor_unregister(struct drm_device *dev, unsigned int type)
 {
 	struct drm_minor *minor;
-	unsigned long flags;
 
 	minor = *drm_minor_get_slot(dev, type);
 	if (!minor || !device_is_registered(minor->kdev))
 		return;
 
 	/* replace @minor with NULL so lookups will fail from now on */
-	spin_lock_irqsave(&drm_minor_lock, flags);
 	idr_replace(&drm_minors_idr, NULL, minor->index);
-	spin_unlock_irqrestore(&drm_minor_lock, flags);
 
 	device_del(minor->kdev);
 	dev_set_drvdata(minor->kdev, NULL); /* safety belt */
@@ -284,11 +265,11 @@ struct drm_minor *drm_minor_acquire(unsigned int minor_id)
 	struct drm_minor *minor;
 	unsigned long flags;
 
-	spin_lock_irqsave(&drm_minor_lock, flags);
+	idr_lock_irqsave(&drm_minors_idr, flags);
 	minor = idr_find(&drm_minors_idr, minor_id);
 	if (minor)
 		drm_dev_get(minor->dev);
-	spin_unlock_irqrestore(&drm_minor_lock, flags);
+	idr_unlock_irqrestore(&drm_minors_idr, flags);
 
 	if (!minor) {
 		return ERR_PTR(-ENODEV);
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 55d6182555c7..7d45b3ef4dd6 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -282,11 +282,8 @@ drm_gem_handle_delete(struct drm_file *filp, u32 handle)
 {
 	struct drm_gem_object *obj;
 
-	spin_lock(&filp->table_lock);
-
 	/* Check if we currently have a reference on the object */
 	obj = idr_replace(&filp->object_idr, NULL, handle);
-	spin_unlock(&filp->table_lock);
 	if (IS_ERR_OR_NULL(obj))
 		return -EINVAL;
 
@@ -294,9 +291,7 @@ drm_gem_handle_delete(struct drm_file *filp, u32 handle)
 	drm_gem_object_release_handle(handle, obj, filp);
 
 	/* And finally make the handle available for future allocations. */
-	spin_lock(&filp->table_lock);
 	idr_remove(&filp->object_idr, handle);
-	spin_unlock(&filp->table_lock);
 
 	return 0;
 }
@@ -387,17 +382,8 @@ drm_gem_handle_create_tail(struct drm_file *file_priv,
 	if (obj->handle_count++ == 0)
 		drm_gem_object_get(obj);
 
-	/*
-	 * Get the user-visible handle using idr.  Preload and perform
-	 * allocation under our spinlock.
-	 */
-	idr_preload(GFP_KERNEL);
-	spin_lock(&file_priv->table_lock);
-
-	ret = idr_alloc(&file_priv->object_idr, obj, 1, 0, GFP_NOWAIT);
-
-	spin_unlock(&file_priv->table_lock);
-	idr_preload_end();
+	/* Get the user-visible handle using idr. */
+	ret = idr_alloc(&file_priv->object_idr, obj, 1, 0, GFP_KERNEL);
 
 	mutex_unlock(&dev->object_name_lock);
 	if (ret < 0)
@@ -421,9 +407,7 @@ drm_gem_handle_create_tail(struct drm_file *file_priv,
 err_revoke:
 	drm_vma_node_revoke(&obj->vma_node, file_priv);
 err_remove:
-	spin_lock(&file_priv->table_lock);
 	idr_remove(&file_priv->object_idr, handle);
-	spin_unlock(&file_priv->table_lock);
 err_unref:
 	drm_gem_object_handle_put_unlocked(obj);
 	return ret;
@@ -634,14 +618,12 @@ drm_gem_object_lookup(struct drm_file *filp, u32 handle)
 {
 	struct drm_gem_object *obj;
 
-	spin_lock(&filp->table_lock);
-
 	/* Check if we currently have a reference on the object */
+	idr_lock(&filp->object_idr);
 	obj = idr_find(&filp->object_idr, handle);
 	if (obj)
 		drm_gem_object_get(obj);
-
-	spin_unlock(&filp->table_lock);
+	idr_unlock(&filp->object_idr);
 
 	return obj;
 }
@@ -776,7 +758,6 @@ void
 drm_gem_open(struct drm_device *dev, struct drm_file *file_private)
 {
 	idr_init(&file_private->object_idr);
-	spin_lock_init(&file_private->table_lock);
 }
 
 /**
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
index ff911541a190..5298d9e78523 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
@@ -61,7 +61,7 @@ static int submit_lookup_objects(struct etnaviv_gem_submit *submit,
 	unsigned i;
 	int ret = 0;
 
-	spin_lock(&file->table_lock);
+	idr_lock(&file->object_idr);
 
 	for (i = 0, bo = submit_bos; i < nr_bos; i++, bo++) {
 		struct drm_gem_object *obj;
@@ -75,7 +75,7 @@ static int submit_lookup_objects(struct etnaviv_gem_submit *submit,
 		submit->bos[i].flags = bo->flags;
 
 		/* normally use drm_gem_object_lookup(), but for bulk lookup
-		 * all under single table_lock just hit object_idr directly:
+		 * all under single lock just hit object_idr directly:
 		 */
 		obj = idr_find(&file->object_idr, bo->handle);
 		if (!obj) {
@@ -96,7 +96,7 @@ static int submit_lookup_objects(struct etnaviv_gem_submit *submit,
 
 out_unlock:
 	submit->nr_bos = i;
-	spin_unlock(&file->table_lock);
+	idr_unlock(&file->object_idr);
 
 	return ret;
 }
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index c65e381b85f3..1d88bd0852c3 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -544,9 +544,9 @@ static int i915_gem_object_info(struct seq_file *m, void *data)
 
 		memset(&stats, 0, sizeof(stats));
 		stats.file_priv = file->driver_priv;
-		spin_lock(&file->table_lock);
+		idr_lock(&file->object_idr);
 		idr_for_each(&file->object_idr, per_file_stats, &stats);
-		spin_unlock(&file->table_lock);
+		idr_unlock(&file->object_idr);
 		/*
 		 * Although we have a valid reference on file->pid, that does
 		 * not guarantee that the task_struct who called get_pid() is
diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
index b8dc8f96caf2..aa8d0d6e40cc 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -88,7 +88,7 @@ static int submit_lookup_objects(struct msm_gem_submit *submit,
 	unsigned i;
 	int ret = 0;
 
-	spin_lock(&file->table_lock);
+	idr_lock(&file->object_idr);
 	pagefault_disable();
 
 	for (i = 0; i < args->nr_bos; i++) {
@@ -105,12 +105,12 @@ static int submit_lookup_objects(struct msm_gem_submit *submit,
 
 		if (copy_from_user_inatomic(&submit_bo, userptr, sizeof(submit_bo))) {
 			pagefault_enable();
-			spin_unlock(&file->table_lock);
+			idr_unlock(&file->object_idr);
 			if (copy_from_user(&submit_bo, userptr, sizeof(submit_bo))) {
 				ret = -EFAULT;
 				goto out;
 			}
-			spin_lock(&file->table_lock);
+			idr_lock(&file->object_idr);
 			pagefault_disable();
 		}
 
@@ -126,7 +126,7 @@ static int submit_lookup_objects(struct msm_gem_submit *submit,
 		submit->bos[i].iova  = submit_bo.presumed;
 
 		/* normally use drm_gem_object_lookup(), but for bulk lookup
-		 * all under single table_lock just hit object_idr directly:
+		 * all under single lock just hit object_idr directly:
 		 */
 		obj = idr_find(&file->object_idr, submit_bo.handle);
 		if (!obj) {
@@ -153,7 +153,7 @@ static int submit_lookup_objects(struct msm_gem_submit *submit,
 
 out_unlock:
 	pagefault_enable();
-	spin_unlock(&file->table_lock);
+	idr_unlock(&file->object_idr);
 
 out:
 	submit->nr_bos = i;
diff --git a/drivers/gpu/drm/vc4/vc4_gem.c b/drivers/gpu/drm/vc4/vc4_gem.c
index e00ac2f3a264..3dfd4fa103c3 100644
--- a/drivers/gpu/drm/vc4/vc4_gem.c
+++ b/drivers/gpu/drm/vc4/vc4_gem.c
@@ -713,7 +713,7 @@ vc4_cl_lookup_bos(struct drm_device *dev,
 		goto fail;
 	}
 
-	spin_lock(&file_priv->table_lock);
+	idr_lock(&file_priv->object_idr);
 	for (i = 0; i < exec->bo_count; i++) {
 		struct drm_gem_object *bo = idr_find(&file_priv->object_idr,
 						     handles[i]);
@@ -727,7 +727,7 @@ vc4_cl_lookup_bos(struct drm_device *dev,
 		drm_gem_object_get(bo);
 		exec->bo[i] = (struct drm_gem_cma_object *)bo;
 	}
-	spin_unlock(&file_priv->table_lock);
+	idr_unlock(&file_priv->object_idr);
 
 	if (ret)
 		goto fail_put_bo;
diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
index 0e0c868451a5..dd3520d8ce60 100644
--- a/include/drm/drm_file.h
+++ b/include/drm/drm_file.h
@@ -225,13 +225,10 @@ struct drm_file {
 	 * @object_idr:
 	 *
 	 * Mapping of mm object handles to object pointers. Used by the GEM
-	 * subsystem. Protected by @table_lock.
+	 * subsystem.
 	 */
 	struct idr object_idr;
 
-	/** @table_lock: Protects @object_idr. */
-	spinlock_t table_lock;
-
 	/** @syncobj_idr: Mapping of sync object handles to object pointers. */
 	struct idr syncobj_idr;
 	/** @syncobj_table_lock: Protects @syncobj_idr. */
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 57/62] drm: Remove drm_syncobj_fd_to_handle
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:07   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:07 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

Now that the IDR handles its own locking, we can remove our own lock
and, with it, the idr_preload calls.

Also add a missing put call in a failure path.
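
(Illustrative sketch only, not part of the patch; the names are
hypothetical.  It shows the shape of the error path after this change:
the reference taken for the IDR entry must be dropped when the
allocation fails.)

#include <linux/idr.h>
#include <linux/kref.h>

static int example_insert(struct idr *idr, struct kref *obj,
			  void (*release)(struct kref *))
{
	int ret;

	kref_get(obj);			/* reference owned by the IDR */
	ret = idr_alloc(idr, obj, 1, 0, GFP_KERNEL);
	if (ret < 0)
		kref_put(obj, release);	/* drop it again on failure */
	return ret;
}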

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 drivers/gpu/drm/drm_syncobj.c | 23 +++++------------------
 include/drm/drm_file.h        |  2 --
 2 files changed, 5 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index f776fc1cc543..fec56b26a0b6 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -68,14 +68,14 @@ struct drm_syncobj *drm_syncobj_find(struct drm_file *file_private,
 {
 	struct drm_syncobj *syncobj;
 
-	spin_lock(&file_private->syncobj_table_lock);
+	idr_lock(&file_private->syncobj_idr);
 
 	/* Check if we currently have a reference on the object */
 	syncobj = idr_find(&file_private->syncobj_idr, handle);
 	if (syncobj)
 		drm_syncobj_get(syncobj);
 
-	spin_unlock(&file_private->syncobj_table_lock);
+	idr_unlock(&file_private->syncobj_idr);
 
 	return syncobj;
 }
@@ -309,13 +309,7 @@ int drm_syncobj_get_handle(struct drm_file *file_private,
 	/* take a reference to put in the idr */
 	drm_syncobj_get(syncobj);
 
-	idr_preload(GFP_KERNEL);
-	spin_lock(&file_private->syncobj_table_lock);
-	ret = idr_alloc(&file_private->syncobj_idr, syncobj, 1, 0, GFP_NOWAIT);
-	spin_unlock(&file_private->syncobj_table_lock);
-
-	idr_preload_end();
-
+	ret = idr_alloc(&file_private->syncobj_idr, syncobj, 1, 0, GFP_KERNEL);
 	if (ret < 0) {
 		drm_syncobj_put(syncobj);
 		return ret;
@@ -346,9 +340,7 @@ static int drm_syncobj_destroy(struct drm_file *file_private,
 {
 	struct drm_syncobj *syncobj;
 
-	spin_lock(&file_private->syncobj_table_lock);
 	syncobj = idr_remove(&file_private->syncobj_idr, handle);
-	spin_unlock(&file_private->syncobj_table_lock);
 
 	if (!syncobj)
 		return -EINVAL;
@@ -449,14 +441,10 @@ static int drm_syncobj_fd_to_handle(struct drm_file *file_private,
 	/* take a reference to put in the idr */
 	drm_syncobj_get(syncobj);
 
-	idr_preload(GFP_KERNEL);
-	spin_lock(&file_private->syncobj_table_lock);
-	ret = idr_alloc(&file_private->syncobj_idr, syncobj, 1, 0, GFP_NOWAIT);
-	spin_unlock(&file_private->syncobj_table_lock);
-	idr_preload_end();
-
+	ret = idr_alloc(&file_private->syncobj_idr, syncobj, 1, 0, GFP_KERNEL);
 	if (ret < 0) {
 		fput(syncobj->file);
+		drm_syncobj_put(syncobj);
 		return ret;
 	}
 	*handle = ret;
@@ -527,7 +515,6 @@ void
 drm_syncobj_open(struct drm_file *file_private)
 {
 	idr_init(&file_private->syncobj_idr);
-	spin_lock_init(&file_private->syncobj_table_lock);
 }
 
 static int
diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
index dd3520d8ce60..dfabc20a5934 100644
--- a/include/drm/drm_file.h
+++ b/include/drm/drm_file.h
@@ -231,8 +231,6 @@ struct drm_file {
 
 	/** @syncobj_idr: Mapping of sync object handles to object pointers. */
 	struct idr syncobj_idr;
-	/** @syncobj_table_lock: Protects @syncobj_idr. */
-	spinlock_t syncobj_table_lock;
 
 	/** @filp: Pointer to the core file structure. */
 	struct file *filp;
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 58/62] drm: Remove qxl driver IDR locks
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:07   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:07 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

By switching to the internal IDR lock, we can get rid of the idr_preload
calls.
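
(A minimal sketch of the resulting pattern, not taken from the patch;
the function and the last_alloced cursor are placeholders.  Driver
state that used to hide under the private spinlock is now updated
under the IDR's own lock via idr_lock()/idr_unlock().)

#include <linux/idr.h>

static int example_surface_alloc(struct idr *idr, int *last_alloced)
{
	int handle = idr_alloc(idr, NULL, 1, 0, GFP_ATOMIC);

	if (handle < 0)
		return handle;

	idr_lock(idr);
	*last_alloced = handle;		/* adjacent driver state */
	idr_unlock(idr);
	return handle;
}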

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 drivers/gpu/drm/qxl/qxl_cmd.c     | 26 +++++++-------------------
 drivers/gpu/drm/qxl/qxl_drv.h     |  2 --
 drivers/gpu/drm/qxl/qxl_kms.c     |  2 --
 drivers/gpu/drm/qxl/qxl_release.c | 12 +++---------
 4 files changed, 10 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/qxl/qxl_cmd.c b/drivers/gpu/drm/qxl/qxl_cmd.c
index c0fb52c6d4ca..b1cecc38ef5f 100644
--- a/drivers/gpu/drm/qxl/qxl_cmd.c
+++ b/drivers/gpu/drm/qxl/qxl_cmd.c
@@ -451,37 +451,29 @@ int qxl_surface_id_alloc(struct qxl_device *qdev,
 	int idr_ret;
 	int count = 0;
 again:
-	idr_preload(GFP_ATOMIC);
-	spin_lock(&qdev->surf_id_idr_lock);
-	idr_ret = idr_alloc(&qdev->surf_id_idr, NULL, 1, 0, GFP_NOWAIT);
-	spin_unlock(&qdev->surf_id_idr_lock);
-	idr_preload_end();
+	idr_ret = idr_alloc(&qdev->surf_id_idr, NULL, 1, 0, GFP_ATOMIC);
 	if (idr_ret < 0)
 		return idr_ret;
 	handle = idr_ret;
 
 	if (handle >= qdev->rom->n_surfaces) {
 		count++;
-		spin_lock(&qdev->surf_id_idr_lock);
 		idr_remove(&qdev->surf_id_idr, handle);
-		spin_unlock(&qdev->surf_id_idr_lock);
 		qxl_reap_surface_id(qdev, 2);
 		goto again;
 	}
 	surf->surface_id = handle;
 
-	spin_lock(&qdev->surf_id_idr_lock);
+	idr_lock(&qdev->surf_id_idr);
 	qdev->last_alloced_surf_id = handle;
-	spin_unlock(&qdev->surf_id_idr_lock);
+	idr_unlock(&qdev->surf_id_idr);
 	return 0;
 }
 
 void qxl_surface_id_dealloc(struct qxl_device *qdev,
 			    uint32_t surface_id)
 {
-	spin_lock(&qdev->surf_id_idr_lock);
 	idr_remove(&qdev->surf_id_idr, surface_id);
-	spin_unlock(&qdev->surf_id_idr_lock);
 }
 
 int qxl_hw_surface_alloc(struct qxl_device *qdev,
@@ -534,9 +526,7 @@ int qxl_hw_surface_alloc(struct qxl_device *qdev,
 	qxl_release_fence_buffer_objects(release);
 
 	surf->hw_surf_alloc = true;
-	spin_lock(&qdev->surf_id_idr_lock);
 	idr_replace(&qdev->surf_id_idr, surf, surf->surface_id);
-	spin_unlock(&qdev->surf_id_idr_lock);
 	return 0;
 }
 
@@ -559,9 +549,7 @@ int qxl_hw_surface_dealloc(struct qxl_device *qdev,
 
 	surf->surf_create = NULL;
 	/* remove the surface from the idr, but not the surface id yet */
-	spin_lock(&qdev->surf_id_idr_lock);
 	idr_replace(&qdev->surf_id_idr, NULL, surf->surface_id);
-	spin_unlock(&qdev->surf_id_idr_lock);
 	surf->hw_surf_alloc = false;
 
 	id = surf->surface_id;
@@ -650,9 +638,9 @@ static int qxl_reap_surface_id(struct qxl_device *qdev, int max_to_reap)
 	mutex_lock(&qdev->surf_evict_mutex);
 again:
 
-	spin_lock(&qdev->surf_id_idr_lock);
+	idr_lock(&qdev->surf_id_idr);
 	start = qdev->last_alloced_surf_id + 1;
-	spin_unlock(&qdev->surf_id_idr_lock);
+	idr_unlock(&qdev->surf_id_idr);
 
 	for (i = start; i < start + qdev->rom->n_surfaces; i++) {
 		void *objptr;
@@ -661,9 +649,9 @@ static int qxl_reap_surface_id(struct qxl_device *qdev, int max_to_reap)
 		/* this avoids the case where the objects is in the
 		   idr but has been evicted half way - its makes
 		   the idr lookup atomic with the eviction */
-		spin_lock(&qdev->surf_id_idr_lock);
+		idr_lock(&qdev->surf_id_idr);
 		objptr = idr_find(&qdev->surf_id_idr, surfid);
-		spin_unlock(&qdev->surf_id_idr_lock);
+		idr_unlock(&qdev->surf_id_idr);
 
 		if (!objptr)
 			continue;
diff --git a/drivers/gpu/drm/qxl/qxl_drv.h b/drivers/gpu/drm/qxl/qxl_drv.h
index 08752c0ffb35..07378adbf4e8 100644
--- a/drivers/gpu/drm/qxl/qxl_drv.h
+++ b/drivers/gpu/drm/qxl/qxl_drv.h
@@ -255,7 +255,6 @@ struct qxl_device {
 	spinlock_t	release_lock;
 	struct idr	release_idr;
 	uint32_t	release_seqno;
-	spinlock_t release_idr_lock;
 	struct mutex	async_io_mutex;
 	unsigned int last_sent_io_cmd;
 
@@ -277,7 +276,6 @@ struct qxl_device {
 	struct mutex		update_area_mutex;
 
 	struct idr	surf_id_idr;
-	spinlock_t surf_id_idr_lock;
 	int last_alloced_surf_id;
 
 	struct mutex surf_evict_mutex;
diff --git a/drivers/gpu/drm/qxl/qxl_kms.c b/drivers/gpu/drm/qxl/qxl_kms.c
index c5716a0ca3b8..276a5b25ae8e 100644
--- a/drivers/gpu/drm/qxl/qxl_kms.c
+++ b/drivers/gpu/drm/qxl/qxl_kms.c
@@ -204,11 +204,9 @@ int qxl_device_init(struct qxl_device *qdev,
 			GFP_KERNEL);
 
 	idr_init(&qdev->release_idr);
-	spin_lock_init(&qdev->release_idr_lock);
 	spin_lock_init(&qdev->release_lock);
 
 	idr_init(&qdev->surf_id_idr);
-	spin_lock_init(&qdev->surf_id_idr_lock);
 
 	mutex_init(&qdev->async_io_mutex);
 
diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c
index a6da6fa6ad58..c327b3441b63 100644
--- a/drivers/gpu/drm/qxl/qxl_release.c
+++ b/drivers/gpu/drm/qxl/qxl_release.c
@@ -142,12 +142,10 @@ qxl_release_alloc(struct qxl_device *qdev, int type,
 	release->surface_release_id = 0;
 	INIT_LIST_HEAD(&release->bos);
 
-	idr_preload(GFP_KERNEL);
-	spin_lock(&qdev->release_idr_lock);
-	handle = idr_alloc(&qdev->release_idr, release, 1, 0, GFP_NOWAIT);
+	handle = idr_alloc(&qdev->release_idr, release, 1, 0, GFP_KERNEL);
+	idr_lock(&qdev->release_idr);
 	release->base.seqno = ++qdev->release_seqno;
-	spin_unlock(&qdev->release_idr_lock);
-	idr_preload_end();
+	idr_unlock(&qdev->release_idr);
 	if (handle < 0) {
 		kfree(release);
 		*ret = NULL;
@@ -184,9 +182,7 @@ qxl_release_free(struct qxl_device *qdev,
 	if (release->surface_release_id)
 		qxl_surface_id_dealloc(qdev, release->surface_release_id);
 
-	spin_lock(&qdev->release_idr_lock);
 	idr_remove(&qdev->release_idr, release->id);
-	spin_unlock(&qdev->release_idr_lock);
 
 	if (release->base.ops) {
 		WARN_ON(list_empty(&release->bos));
@@ -392,9 +388,7 @@ struct qxl_release *qxl_release_from_id_locked(struct qxl_device *qdev,
 {
 	struct qxl_release *release;
 
-	spin_lock(&qdev->release_idr_lock);
 	release = idr_find(&qdev->release_idr, id);
-	spin_unlock(&qdev->release_idr_lock);
 	if (!release) {
 		DRM_ERROR("failed to find id in release_idr\n");
 		return NULL;
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 59/62] drm: Replace virtio IDRs with IDAs
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:07   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:07 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

These IDRs were only being used to allocate unique numbers, not to look
up pointers, so they can use the more space-efficient IDA instead.
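
(A minimal sketch of the IDA pattern; the names are hypothetical, and
a driver would normally embed the ida in its device structure rather
than define it statically.)

#include <linux/idr.h>	/* struct ida, ida_simple_get() */

static DEFINE_IDA(example_ida);

/* Allocate a unique id >= 1; the IDA does its own locking, and an
 * upper bound of 0 means "no limit".
 */
static int example_id_get(void)
{
	return ida_simple_get(&example_ida, 1, 0, GFP_KERNEL);
}

static void example_id_put(int id)
{
	ida_simple_remove(&example_ida, id);
}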

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 drivers/gpu/drm/virtio/virtgpu_drv.h |  6 ++----
 drivers/gpu/drm/virtio/virtgpu_kms.c | 18 ++++--------------
 drivers/gpu/drm/virtio/virtgpu_vq.c  | 12 ++----------
 3 files changed, 8 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h b/drivers/gpu/drm/virtio/virtgpu_drv.h
index da2fb585fea4..6852e8bbc1b0 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -181,8 +181,7 @@ struct virtio_gpu_device {
 	struct kmem_cache *vbufs;
 	bool vqs_ready;
 
-	struct idr	resource_idr;
-	spinlock_t resource_idr_lock;
+	struct ida	resource_ida;
 
 	wait_queue_head_t resp_wq;
 	/* current display info */
@@ -191,8 +190,7 @@ struct virtio_gpu_device {
 
 	struct virtio_gpu_fence_driver fence_drv;
 
-	struct idr	ctx_id_idr;
-	spinlock_t ctx_id_idr_lock;
+	struct ida	ctx_id_ida;
 
 	bool has_virgl_3d;
 
diff --git a/drivers/gpu/drm/virtio/virtgpu_kms.c b/drivers/gpu/drm/virtio/virtgpu_kms.c
index 6400506a06b0..29c08b63d8d3 100644
--- a/drivers/gpu/drm/virtio/virtgpu_kms.c
+++ b/drivers/gpu/drm/virtio/virtgpu_kms.c
@@ -55,21 +55,13 @@ static void virtio_gpu_config_changed_work_func(struct work_struct *work)
 static void virtio_gpu_ctx_id_get(struct virtio_gpu_device *vgdev,
 				  uint32_t *resid)
 {
-	int handle;
-
-	idr_preload(GFP_KERNEL);
-	spin_lock(&vgdev->ctx_id_idr_lock);
-	handle = idr_alloc(&vgdev->ctx_id_idr, NULL, 1, 0, 0);
-	spin_unlock(&vgdev->ctx_id_idr_lock);
-	idr_preload_end();
+	int handle = ida_simple_get(&vgdev->ctx_id_ida, 1, 0, GFP_KERNEL);
 	*resid = handle;
 }
 
 static void virtio_gpu_ctx_id_put(struct virtio_gpu_device *vgdev, uint32_t id)
 {
-	spin_lock(&vgdev->ctx_id_idr_lock);
-	idr_remove(&vgdev->ctx_id_idr, id);
-	spin_unlock(&vgdev->ctx_id_idr_lock);
+	ida_simple_remove(&vgdev->ctx_id_ida, id);
 }
 
 static void virtio_gpu_context_create(struct virtio_gpu_device *vgdev,
@@ -151,10 +143,8 @@ int virtio_gpu_driver_load(struct drm_device *dev, unsigned long flags)
 	vgdev->dev = dev->dev;
 
 	spin_lock_init(&vgdev->display_info_lock);
-	spin_lock_init(&vgdev->ctx_id_idr_lock);
-	idr_init(&vgdev->ctx_id_idr);
-	spin_lock_init(&vgdev->resource_idr_lock);
-	idr_init(&vgdev->resource_idr);
+	ida_init(&vgdev->ctx_id_ida);
+	ida_init(&vgdev->resource_ida);
 	init_waitqueue_head(&vgdev->resp_wq);
 	virtio_gpu_init_vq(&vgdev->ctrlq, virtio_gpu_dequeue_ctrl_func);
 	virtio_gpu_init_vq(&vgdev->cursorq, virtio_gpu_dequeue_cursor_func);
diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c b/drivers/gpu/drm/virtio/virtgpu_vq.c
index 9eb96fb2c147..b35b5f2c056c 100644
--- a/drivers/gpu/drm/virtio/virtgpu_vq.c
+++ b/drivers/gpu/drm/virtio/virtgpu_vq.c
@@ -41,21 +41,13 @@
 void virtio_gpu_resource_id_get(struct virtio_gpu_device *vgdev,
 				uint32_t *resid)
 {
-	int handle;
-
-	idr_preload(GFP_KERNEL);
-	spin_lock(&vgdev->resource_idr_lock);
-	handle = idr_alloc(&vgdev->resource_idr, NULL, 1, 0, GFP_NOWAIT);
-	spin_unlock(&vgdev->resource_idr_lock);
-	idr_preload_end();
+	int handle = ida_simple_get(&vgdev->resource_ida, 1, 0, GFP_KERNEL);
 	*resid = handle;
 }
 
 void virtio_gpu_resource_id_put(struct virtio_gpu_device *vgdev, uint32_t id)
 {
-	spin_lock(&vgdev->resource_idr_lock);
-	idr_remove(&vgdev->resource_idr, id);
-	spin_unlock(&vgdev->resource_idr_lock);
+	ida_simple_remove(&vgdev->resource_ida, id);
 }
 
 void virtio_gpu_ctrl_ack(struct virtqueue *vq)
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 60/62] drm: Replace vmwgfx IDRs with IDAs
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:07   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:07 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

These IDRs were only being used to allocate unique numbers, not to look
up pointers, so they can use the more space-efficient IDA instead.
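
(Illustrative only, not part of the patch; the structure is a
placeholder for vmwgfx's "res->id == -1 means no id" convention.)

#include <linux/idr.h>

struct example_res {
	struct ida *ida;
	int id;			/* -1 while no id is allocated */
};

static int example_res_alloc_id(struct example_res *res)
{
	int ret = ida_simple_get(res->ida, 1, 0, GFP_KERNEL);

	if (ret < 0)
		return ret;
	res->id = ret;
	return 0;
}

static void example_res_release_id(struct example_res *res)
{
	if (res->id != -1)
		ida_simple_remove(res->ida, res->id);
	res->id = -1;
}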

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c      |  6 +++---
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h      |  2 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_resource.c | 28 ++++++++++------------------
 3 files changed, 14 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
index 184340d486c3..fc6e04cf071e 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
@@ -652,7 +652,7 @@ static int vmw_driver_load(struct drm_device *dev, unsigned long chipset)
 	spin_lock_init(&dev_priv->cursor_lock);
 
 	for (i = vmw_res_context; i < vmw_res_max; ++i) {
-		idr_init(&dev_priv->res_idr[i]);
+		ida_init(&dev_priv->res_ida[i]);
 		INIT_LIST_HEAD(&dev_priv->res_lru[i]);
 	}
 
@@ -950,7 +950,7 @@ static int vmw_driver_load(struct drm_device *dev, unsigned long chipset)
 	vmw_ttm_global_release(dev_priv);
 out_err0:
 	for (i = vmw_res_context; i < vmw_res_max; ++i)
-		idr_destroy(&dev_priv->res_idr[i]);
+		ida_destroy(&dev_priv->res_ida[i]);
 
 	if (dev_priv->ctx.staged_bindings)
 		vmw_binding_state_free(dev_priv->ctx.staged_bindings);
@@ -1002,7 +1002,7 @@ static void vmw_driver_unload(struct drm_device *dev)
 	vmw_ttm_global_release(dev_priv);
 
 	for (i = vmw_res_context; i < vmw_res_max; ++i)
-		idr_destroy(&dev_priv->res_idr[i]);
+		ida_destroy(&dev_priv->res_ida[i]);
 
 	kfree(dev_priv);
 }
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
index 7e5f30e234b1..96866a1e3547 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
@@ -429,7 +429,7 @@ struct vmw_private {
 	 */
 
 	rwlock_t resource_lock;
-	struct idr res_idr[vmw_res_max];
+	struct ida res_ida[vmw_res_max];
 	/*
 	 * Block lastclose from racing with firstopen.
 	 */
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
index a96f90f017d1..ca0e2a1fd0c5 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
@@ -80,13 +80,11 @@ vmw_resource_reference_unless_doomed(struct vmw_resource *res)
 void vmw_resource_release_id(struct vmw_resource *res)
 {
 	struct vmw_private *dev_priv = res->dev_priv;
-	struct idr *idr = &dev_priv->res_idr[res->func->res_type];
+	struct ida *ida = &dev_priv->res_ida[res->func->res_type];
 
-	write_lock(&dev_priv->resource_lock);
 	if (res->id != -1)
-		idr_remove(idr, res->id);
+		ida_simple_remove(ida, res->id);
 	res->id = -1;
-	write_unlock(&dev_priv->resource_lock);
 }
 
 static void vmw_resource_release(struct kref *kref)
@@ -95,7 +93,7 @@ static void vmw_resource_release(struct kref *kref)
 	    container_of(kref, struct vmw_resource, kref);
 	struct vmw_private *dev_priv = res->dev_priv;
 	int id;
-	struct idr *idr = &dev_priv->res_idr[res->func->res_type];
+	struct ida *ida = &dev_priv->res_ida[res->func->res_type];
 
 	write_lock(&dev_priv->resource_lock);
 	res->avail = false;
@@ -132,10 +130,8 @@ static void vmw_resource_release(struct kref *kref)
 	else
 		kfree(res);
 
-	write_lock(&dev_priv->resource_lock);
 	if (id != -1)
-		idr_remove(idr, id);
-	write_unlock(&dev_priv->resource_lock);
+		ida_simple_remove(ida, id);
 }
 
 void vmw_resource_unreference(struct vmw_resource **p_res)
@@ -159,20 +155,16 @@ int vmw_resource_alloc_id(struct vmw_resource *res)
 {
 	struct vmw_private *dev_priv = res->dev_priv;
 	int ret;
-	struct idr *idr = &dev_priv->res_idr[res->func->res_type];
+	struct ida *ida = &dev_priv->res_ida[res->func->res_type];
 
 	BUG_ON(res->id != -1);
 
-	idr_preload(GFP_KERNEL);
-	write_lock(&dev_priv->resource_lock);
-
-	ret = idr_alloc(idr, res, 1, 0, GFP_NOWAIT);
-	if (ret >= 0)
-		res->id = ret;
+	ret = ida_simple_get(ida, 1, 0, GFP_KERNEL);
+	if (ret < 0)
+		return ret;
 
-	write_unlock(&dev_priv->resource_lock);
-	idr_preload_end();
-	return ret < 0 ? ret : 0;
+	res->id = ret;
+	return 0;
 }
 
 /**
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 61/62] net: Redesign act_api use of IDR
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:07   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:07 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

The IDR now has its own internal locking, so remove the idrinfo->lock.
Use the IDR's lock to protect the walks that were formerly protected by
our own lock.  Remove the preloading as it is no longer necessary with
the internal locking.  Then embed the action_idr in struct tc_action_net
instead of separately allocating it.  Adjust various function signatures
to work with this.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 include/net/act_api.h |  27 ++-----------
 net/sched/act_api.c   | 105 ++++++++++++++++++--------------------------------
 2 files changed, 41 insertions(+), 91 deletions(-)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index fd08df74c466..a353e7a3fb22 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -11,11 +11,6 @@
 #include <net/net_namespace.h>
 #include <net/netns/generic.h>
 
-struct tcf_idrinfo {
-	spinlock_t	lock;
-	struct idr	action_idr;
-};
-
 struct tc_action_ops;
 
 struct tc_action {
@@ -23,7 +18,7 @@ struct tc_action {
 	__u32				type; /* for backward compat(TCA_OLD_COMPAT) */
 	__u32				order;
 	struct list_head		list;
-	struct tcf_idrinfo		*idrinfo;
+	struct idr			*action_idr;
 
 	u32				tcfa_index;
 	int				tcfa_refcnt;
@@ -98,7 +93,7 @@ struct tc_action_ops {
 };
 
 struct tc_action_net {
-	struct tcf_idrinfo *idrinfo;
+	struct idr	action_idr;
 	const struct tc_action_ops *ops;
 };
 
@@ -108,25 +103,11 @@ int tc_action_net_init(struct tc_action_net *tn,
 {
 	int err = 0;
 
-	tn->idrinfo = kmalloc(sizeof(*tn->idrinfo), GFP_KERNEL);
-	if (!tn->idrinfo)
-		return -ENOMEM;
 	tn->ops = ops;
-	spin_lock_init(&tn->idrinfo->lock);
-	idr_init(&tn->idrinfo->action_idr);
+	idr_init(&tn->action_idr);
 	return err;
 }
-
-void tcf_idrinfo_destroy(const struct tc_action_ops *ops,
-			 struct tcf_idrinfo *idrinfo);
-
-static inline void tc_action_net_exit(struct tc_action_net *tn)
-{
-	rtnl_lock();
-	tcf_idrinfo_destroy(tn->ops, tn->idrinfo);
-	rtnl_unlock();
-	kfree(tn->idrinfo);
-}
+void tc_action_net_exit(struct tc_action_net *tn);
 
 int tcf_generic_walker(struct tc_action_net *tn, struct sk_buff *skb,
 		       struct netlink_callback *cb, int type,
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index a6aa606b5e99..058e78088569 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -75,11 +75,9 @@ static void free_tcf(struct tc_action *p)
 	kfree(p);
 }
 
-static void tcf_idr_remove(struct tcf_idrinfo *idrinfo, struct tc_action *p)
+static void tcf_idr_remove(struct idr *idr, struct tc_action *p)
 {
-	spin_lock_bh(&idrinfo->lock);
-	idr_remove(&idrinfo->action_idr, p->tcfa_index);
-	spin_unlock_bh(&idrinfo->lock);
+	idr_remove(idr, p->tcfa_index);
 	gen_kill_estimator(&p->tcfa_rate_est);
 	free_tcf(p);
 }
@@ -100,7 +98,7 @@ int __tcf_idr_release(struct tc_action *p, bool bind, bool strict)
 		if (p->tcfa_bindcnt <= 0 && p->tcfa_refcnt <= 0) {
 			if (p->ops->cleanup)
 				p->ops->cleanup(p, bind);
-			tcf_idr_remove(p->idrinfo, p);
+			tcf_idr_remove(p->action_idr, p);
 			ret = ACT_P_DELETED;
 		}
 	}
@@ -109,18 +107,18 @@ int __tcf_idr_release(struct tc_action *p, bool bind, bool strict)
 }
 EXPORT_SYMBOL(__tcf_idr_release);
 
-static int tcf_dump_walker(struct tcf_idrinfo *idrinfo, struct sk_buff *skb,
+static int tcf_dump_walker(struct tc_action_net *tn, struct sk_buff *skb,
 			   struct netlink_callback *cb)
 {
 	int err = 0, index = -1, s_i = 0, n_i = 0;
 	u32 act_flags = cb->args[2];
 	unsigned long jiffy_since = cb->args[3];
 	struct nlattr *nest;
-	struct idr *idr = &idrinfo->action_idr;
+	struct idr *idr = &tn->action_idr;
 	struct tc_action *p;
 	unsigned long id = 1;
 
-	spin_lock_bh(&idrinfo->lock);
+	idr_lock_bh(idr);
 
 	s_i = cb->args[0];
 
@@ -153,7 +151,7 @@ static int tcf_dump_walker(struct tcf_idrinfo *idrinfo, struct sk_buff *skb,
 	if (index >= 0)
 		cb->args[0] = index + 1;
 
-	spin_unlock_bh(&idrinfo->lock);
+	idr_unlock_bh(idr);
 	if (n_i) {
 		if (act_flags & TCA_FLAG_LARGE_DUMP_ON)
 			cb->args[1] = n_i;
@@ -165,13 +163,13 @@ static int tcf_dump_walker(struct tcf_idrinfo *idrinfo, struct sk_buff *skb,
 	goto done;
 }
 
-static int tcf_del_walker(struct tcf_idrinfo *idrinfo, struct sk_buff *skb,
+static int tcf_del_walker(struct tc_action_net *tn, struct sk_buff *skb,
 			  const struct tc_action_ops *ops)
 {
 	struct nlattr *nest;
 	int n_i = 0;
 	int ret = -EINVAL;
-	struct idr *idr = &idrinfo->action_idr;
+	struct idr *idr = &tn->action_idr;
 	struct tc_action *p;
 	unsigned long id = 1;
 
@@ -204,12 +202,10 @@ int tcf_generic_walker(struct tc_action_net *tn, struct sk_buff *skb,
 		       struct netlink_callback *cb, int type,
 		       const struct tc_action_ops *ops)
 {
-	struct tcf_idrinfo *idrinfo = tn->idrinfo;
-
 	if (type == RTM_DELACTION) {
-		return tcf_del_walker(idrinfo, skb, ops);
+		return tcf_del_walker(tn, skb, ops);
 	} else if (type == RTM_GETACTION) {
-		return tcf_dump_walker(idrinfo, skb, cb);
+		return tcf_dump_walker(tn, skb, cb);
 	} else {
 		WARN(1, "tcf_generic_walker: unknown action %d\n", type);
 		return -EINVAL;
@@ -217,21 +213,9 @@ int tcf_generic_walker(struct tc_action_net *tn, struct sk_buff *skb,
 }
 EXPORT_SYMBOL(tcf_generic_walker);
 
-static struct tc_action *tcf_idr_lookup(u32 index, struct tcf_idrinfo *idrinfo)
-{
-	struct tc_action *p;
-
-	spin_lock_bh(&idrinfo->lock);
-	p = idr_find(&idrinfo->action_idr, index);
-	spin_unlock_bh(&idrinfo->lock);
-
-	return p;
-}
-
 int tcf_idr_search(struct tc_action_net *tn, struct tc_action **a, u32 index)
 {
-	struct tcf_idrinfo *idrinfo = tn->idrinfo;
-	struct tc_action *p = tcf_idr_lookup(index, idrinfo);
+	struct tc_action *p = idr_find(&tn->action_idr, index);
 
 	if (p) {
 		*a = p;
@@ -244,8 +228,7 @@ EXPORT_SYMBOL(tcf_idr_search);
 bool tcf_idr_check(struct tc_action_net *tn, u32 index, struct tc_action **a,
 		   int bind)
 {
-	struct tcf_idrinfo *idrinfo = tn->idrinfo;
-	struct tc_action *p = tcf_idr_lookup(index, idrinfo);
+	struct tc_action *p = idr_find(&tn->action_idr, index);
 
 	if (index && p) {
 		if (bind)
@@ -271,12 +254,11 @@ int tcf_idr_create(struct tc_action_net *tn, u32 index, struct nlattr *est,
 		   int bind, bool cpustats)
 {
 	struct tc_action *p = kzalloc(ops->size, GFP_KERNEL);
-	struct tcf_idrinfo *idrinfo = tn->idrinfo;
-	struct idr *idr = &idrinfo->action_idr;
+	struct idr *idr = &tn->action_idr;
 	int err = -ENOMEM;
 
 	if (unlikely(!p))
-		return -ENOMEM;
+		goto free;
 	p->tcfa_refcnt = 1;
 	if (bind)
 		p->tcfa_bindcnt = 1;
@@ -284,31 +266,21 @@ int tcf_idr_create(struct tc_action_net *tn, u32 index, struct nlattr *est,
 	if (cpustats) {
 		p->cpu_bstats = netdev_alloc_pcpu_stats(struct gnet_stats_basic_cpu);
 		if (!p->cpu_bstats)
-			goto err1;
+			goto free;
 		p->cpu_qstats = alloc_percpu(struct gnet_stats_queue);
 		if (!p->cpu_qstats)
-			goto err2;
+			goto free_bstats;
 	}
 	spin_lock_init(&p->tcfa_lock);
 	/* user doesn't specify an index */
 	if (!index) {
 		index = 1;
-		idr_preload(GFP_KERNEL);
-		spin_lock_bh(&idrinfo->lock);
-		err = idr_alloc_u32(idr, NULL, &index, UINT_MAX, GFP_ATOMIC);
-		spin_unlock_bh(&idrinfo->lock);
-		idr_preload_end();
-		if (err)
-			goto err3;
+		err = idr_alloc_u32(idr, NULL, &index, UINT_MAX, GFP_KERNEL);
 	} else {
-		idr_preload(GFP_KERNEL);
-		spin_lock_bh(&idrinfo->lock);
-		err = idr_alloc_u32(idr, NULL, &index, index, GFP_ATOMIC);
-		spin_unlock_bh(&idrinfo->lock);
-		idr_preload_end();
-		if (err)
-			goto err3;
+		err = idr_alloc_u32(idr, NULL, &index, index, GFP_KERNEL);
 	}
+	if (err)
+		goto free_qstats;
 
 	p->tcfa_index = index;
 	p->tcfa_tm.install = jiffies;
@@ -319,21 +291,22 @@ int tcf_idr_create(struct tc_action_net *tn, u32 index, struct nlattr *est,
 					&p->tcfa_rate_est,
 					&p->tcfa_lock, NULL, est);
 		if (err)
-			goto err4;
+			goto free_idr;
 	}
 
-	p->idrinfo = idrinfo;
+	p->action_idr = idr;
 	p->ops = ops;
 	INIT_LIST_HEAD(&p->list);
 	*a = p;
 	return 0;
-err4:
+
+free_idr:
 	idr_remove(idr, index);
-err3:
+free_qstats:
 	free_percpu(p->cpu_qstats);
-err2:
+free_bstats:
 	free_percpu(p->cpu_bstats);
-err1:
+free:
 	kfree(p);
 	return err;
 }
@@ -341,32 +314,28 @@ EXPORT_SYMBOL(tcf_idr_create);
 
 void tcf_idr_insert(struct tc_action_net *tn, struct tc_action *a)
 {
-	struct tcf_idrinfo *idrinfo = tn->idrinfo;
-
-	spin_lock_bh(&idrinfo->lock);
-	idr_replace(&idrinfo->action_idr, a, a->tcfa_index);
-	spin_unlock_bh(&idrinfo->lock);
+	idr_replace(&tn->action_idr, a, a->tcfa_index);
 }
 EXPORT_SYMBOL(tcf_idr_insert);
 
-void tcf_idrinfo_destroy(const struct tc_action_ops *ops,
-			 struct tcf_idrinfo *idrinfo)
+void tc_action_net_exit(struct tc_action_net *tn)
 {
-	struct idr *idr = &idrinfo->action_idr;
+	struct idr *idr = &tn->action_idr;
 	struct tc_action *p;
 	int ret;
 	unsigned long id = 1;
 
+	rtnl_lock();
 	idr_for_each_entry_ul(idr, p, id) {
 		ret = __tcf_idr_release(p, false, true);
 		if (ret == ACT_P_DELETED)
-			module_put(ops->owner);
-		else if (ret < 0)
-			return;
+			module_put(tn->ops->owner);
+		WARN_ON(ret < 0);
 	}
-	idr_destroy(&idrinfo->action_idr);
+	idr_destroy(idr);
+	rtnl_unlock();
 }
-EXPORT_SYMBOL(tcf_idrinfo_destroy);
+EXPORT_SYMBOL(tc_action_net_exit);
 
 static LIST_HEAD(act_base);
 static DEFINE_RWLOCK(act_mod_lock);
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 62/62] mm: Convert page-writeback to XArray
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-22 21:07   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 21:07 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <mawilcox@microsoft.com>

Includes moving mapping_tagged() to fs.h as a static inline, and
changing it to return bool.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 include/linux/fs.h  | 17 +++++++++------
 mm/page-writeback.c | 63 +++++++++++++++++++----------------------------------
 2 files changed, 33 insertions(+), 47 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index a5c105d292a7..14b7406e03e0 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -470,15 +470,18 @@ struct block_device {
 	struct mutex		bd_fsfreeze_mutex;
 } __randomize_layout;
 
+/* XArray tags, for tagging dirty and writeback pages in the pagecache. */
+#define PAGECACHE_TAG_DIRTY	XA_TAG_0
+#define PAGECACHE_TAG_WRITEBACK	XA_TAG_1
+#define PAGECACHE_TAG_TOWRITE	XA_TAG_2
+
 /*
- * Radix-tree tags, for tagging dirty and writeback pages within the pagecache
- * radix trees
+ * Returns true if any of the pages in the mapping are marked with the tag.
  */
-#define PAGECACHE_TAG_DIRTY	0
-#define PAGECACHE_TAG_WRITEBACK	1
-#define PAGECACHE_TAG_TOWRITE	2
-
-int mapping_tagged(struct address_space *mapping, int tag);
+static inline bool mapping_tagged(struct address_space *mapping, xa_tag_t tag)
+{
+	return xa_tagged(&mapping->pages, tag);
+}
 
 static inline void i_mmap_lock_write(struct address_space *mapping)
 {
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 6a5b92629727..28175cba7e72 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2100,33 +2100,26 @@ void __init page_writeback_init(void)
  * dirty pages in the file (thus it is important for this function to be quick
  * so that it can tag pages faster than a dirtying process can create them).
  */
-/*
- * We tag pages in batches of WRITEBACK_TAG_BATCH to reduce xa_lock latency.
- */
 void tag_pages_for_writeback(struct address_space *mapping,
 			     pgoff_t start, pgoff_t end)
 {
-#define WRITEBACK_TAG_BATCH 4096
-	unsigned long tagged = 0;
-	struct radix_tree_iter iter;
-	void **slot;
+	XA_STATE(xas, start);
+	struct xarray *xa = &mapping->pages;
+	unsigned int tagged = 0;
+	void *page;
 
-	xa_lock_irq(&mapping->pages);
-	radix_tree_for_each_tagged(slot, &mapping->pages, &iter, start,
-							PAGECACHE_TAG_DIRTY) {
-		if (iter.index > end)
-			break;
-		radix_tree_iter_tag_set(&mapping->pages, &iter,
-							PAGECACHE_TAG_TOWRITE);
-		tagged++;
-		if ((tagged % WRITEBACK_TAG_BATCH) != 0)
+	xa_lock_irq(xa);
+	xas_for_each_tag(xa, &xas, page, end, PAGECACHE_TAG_DIRTY) {
+		xas_set_tag(xa, &xas, PAGECACHE_TAG_TOWRITE);
+		if (++tagged % XA_CHECK_SCHED)
 			continue;
-		slot = radix_tree_iter_resume(slot, &iter);
-		xa_unlock_irq(&mapping->pages);
+
+		xas_pause(&xas);
+		xa_unlock_irq(xa);
 		cond_resched();
-		xa_lock_irq(&mapping->pages);
+		xa_lock_irq(xa);
 	}
-	xa_unlock_irq(&mapping->pages);
+	xa_unlock_irq(xa);
 }
 EXPORT_SYMBOL(tag_pages_for_writeback);
 
@@ -2166,7 +2159,7 @@ int write_cache_pages(struct address_space *mapping,
 	pgoff_t done_index;
 	int cycled;
 	int range_whole = 0;
-	int tag;
+	xa_tag_t tag;
 
 	pagevec_init(&pvec);
 	if (wbc->range_cyclic) {
@@ -2447,7 +2440,7 @@ void account_page_cleaned(struct page *page, struct address_space *mapping,
 
 /*
  * For address_spaces which do not use buffers.  Just tag the page as dirty in
- * its radix tree.
+ * the xarray.
  *
  * This is also used when a single buffer is being dirtied: we want to set the
  * page dirty in that case, but not all the buffers.  This is a "bottom-up"
@@ -2473,7 +2466,7 @@ int __set_page_dirty_nobuffers(struct page *page)
 		BUG_ON(page_mapping(page) != mapping);
 		WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page));
 		account_page_dirtied(page, mapping);
-		radix_tree_tag_set(&mapping->pages, page_index(page),
+		__xa_set_tag(&mapping->pages, page_index(page),
 				   PAGECACHE_TAG_DIRTY);
 		xa_unlock_irqrestore(&mapping->pages, flags);
 		unlock_page_memcg(page);
@@ -2636,13 +2629,13 @@ EXPORT_SYMBOL(__cancel_dirty_page);
  * Returns true if the page was previously dirty.
  *
  * This is for preparing to put the page under writeout.  We leave the page
- * tagged as dirty in the radix tree so that a concurrent write-for-sync
+ * tagged as dirty in the xarray so that a concurrent write-for-sync
  * can discover it via a PAGECACHE_TAG_DIRTY walk.  The ->writepage
  * implementation will run either set_page_writeback() or set_page_dirty(),
- * at which stage we bring the page's dirty flag and radix-tree dirty tag
+ * at which stage we bring the page's dirty flag and xarray dirty tag
  * back into sync.
  *
- * This incoherency between the page's dirty flag and radix-tree tag is
+ * This incoherency between the page's dirty flag and xarray tag is
  * unfortunate, but it only exists while the page is locked.
  */
 int clear_page_dirty_for_io(struct page *page)
@@ -2723,7 +2716,7 @@ int test_clear_page_writeback(struct page *page)
 		xa_lock_irqsave(&mapping->pages, flags);
 		ret = TestClearPageWriteback(page);
 		if (ret) {
-			radix_tree_tag_clear(&mapping->pages, page_index(page),
+			__xa_clear_tag(&mapping->pages, page_index(page),
 						PAGECACHE_TAG_WRITEBACK);
 			if (bdi_cap_account_writeback(bdi)) {
 				struct bdi_writeback *wb = inode_to_wb(inode);
@@ -2775,7 +2768,7 @@ int __test_set_page_writeback(struct page *page, bool keep_write)
 			on_wblist = mapping_tagged(mapping,
 						   PAGECACHE_TAG_WRITEBACK);
 
-			radix_tree_tag_set(&mapping->pages, page_index(page),
+			__xa_set_tag(&mapping->pages, page_index(page),
 						PAGECACHE_TAG_WRITEBACK);
 			if (bdi_cap_account_writeback(bdi))
 				inc_wb_stat(inode_to_wb(inode), WB_WRITEBACK);
@@ -2789,10 +2782,10 @@ int __test_set_page_writeback(struct page *page, bool keep_write)
 				sb_mark_inode_writeback(mapping->host);
 		}
 		if (!PageDirty(page))
-			radix_tree_tag_clear(&mapping->pages, page_index(page),
+			__xa_clear_tag(&mapping->pages, page_index(page),
 						PAGECACHE_TAG_DIRTY);
 		if (!keep_write)
-			radix_tree_tag_clear(&mapping->pages, page_index(page),
+			__xa_clear_tag(&mapping->pages, page_index(page),
 						PAGECACHE_TAG_TOWRITE);
 		xa_unlock_irqrestore(&mapping->pages, flags);
 	} else {
@@ -2808,16 +2801,6 @@ int __test_set_page_writeback(struct page *page, bool keep_write)
 }
 EXPORT_SYMBOL(__test_set_page_writeback);
 
-/*
- * Return true if any of the pages in the mapping are marked with the
- * passed tag.
- */
-int mapping_tagged(struct address_space *mapping, int tag)
-{
-	return radix_tree_tagged(&mapping->pages, tag);
-}
-EXPORT_SYMBOL(mapping_tagged);
-
 /**
  * wait_for_stable_page() - wait for writeback to finish, if necessary.
  * @page:	The page to wait on.
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* Re: [PATCH 05/62] radix tree: Add a missing cast to gfp_t
  2017-11-22 21:06   ` Matthew Wilcox
@ 2017-11-22 21:28     ` Luc Van Oostenryck
  -1 siblings, 0 replies; 159+ messages in thread
From: Luc Van Oostenryck @ 2017-11-22 21:28 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-fsdevel, linux-mm, linux-kernel, Matthew Wilcox

On Wed, Nov 22, 2017 at 01:06:42PM -0800, Matthew Wilcox wrote:
> From: Matthew Wilcox <mawilcox@microsoft.com>
> 
> sparse complains about an invalid type assignment.
> 
> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
> ---
>  lib/radix-tree.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/radix-tree.c b/lib/radix-tree.c
> index c8d55565fafa..f00303e0b216 100644
> --- a/lib/radix-tree.c
> +++ b/lib/radix-tree.c
> @@ -178,7 +178,7 @@ static inline void root_tag_clear(struct radix_tree_root *root, unsigned tag)
>  
>  static inline void root_tag_clear_all(struct radix_tree_root *root)
>  {
> -	root->gfp_mask &= (1 << ROOT_TAG_SHIFT) - 1;
> +	root->gfp_mask &= (__force gfp_t)((1 << ROOT_TAG_SHIFT) - 1);
>  }
>  
>  static inline int root_tag_get(const struct radix_tree_root *root, unsigned tag)
> -- 

IMO, it would be better to define something for that in radix-tree.h,
like it has been done for ROOT_IS_IDR.

Regards,
-- Luc

^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 05/62] radix tree: Add a missing cast to gfp_t
  2017-11-22 21:28     ` Luc Van Oostenryck
@ 2017-11-22 22:24       ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-22 22:24 UTC (permalink / raw)
  To: Luc Van Oostenryck; +Cc: linux-fsdevel, linux-mm, linux-kernel, Matthew Wilcox

On Wed, Nov 22, 2017 at 10:28:48PM +0100, Luc Van Oostenryck wrote:
> > -	root->gfp_mask &= (1 << ROOT_TAG_SHIFT) - 1;
> > +	root->gfp_mask &= (__force gfp_t)((1 << ROOT_TAG_SHIFT) - 1);
> 
> IMO, it would be better to define something for that in radix-tree.h,
> like it has been done for ROOT_IS_IDR.

If we were keeping the radix tree around, I'd agree, but the point of
the rest of this patch set is replacing it ;-)  I should probably have
just dropped this patch, to be honest.

^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 05/62] radix tree: Add a missing cast to gfp_t
  2017-11-22 22:24       ` Matthew Wilcox
@ 2017-11-22 22:35         ` Luc Van Oostenryck
  -1 siblings, 0 replies; 159+ messages in thread
From: Luc Van Oostenryck @ 2017-11-22 22:35 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-fsdevel, linux-mm, linux-kernel, Matthew Wilcox

On Wed, Nov 22, 2017 at 02:24:02PM -0800, Matthew Wilcox wrote:
> On Wed, Nov 22, 2017 at 10:28:48PM +0100, Luc Van Oostenryck wrote:
> > > -	root->gfp_mask &= (1 << ROOT_TAG_SHIFT) - 1;
> > > +	root->gfp_mask &= (__force gfp_t)((1 << ROOT_TAG_SHIFT) - 1);
> > 
> > IMO, it would be better to define something for that in radix-tree.h,
> > like it has been done for ROOT_IS_IDR.
> 
> If we were keeping the radix tree around, I'd agree, but the point of
> the rest of this patch set is replacing it ;-)  I should probably have
> just dropped this patch, to be honest.

Ah OK, sure.
I confess I didn't see the whole series, just this patch.

-- Luc

^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 00/62] XArray November 2017 Edition
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-23  1:25   ` Dave Chinner
  -1 siblings, 0 replies; 159+ messages in thread
From: Dave Chinner @ 2017-11-23  1:25 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-fsdevel, linux-mm, linux-kernel, Matthew Wilcox

On Wed, Nov 22, 2017 at 01:06:37PM -0800, Matthew Wilcox wrote:
> From: Matthew Wilcox <mawilcox@microsoft.com>
> 
> I've lost count of the number of times I've posted the XArray before,
> so time for a new numbering scheme.  Here're two earlier versions,
> https://lkml.org/lkml/2017/3/17/724
> https://lwn.net/Articles/715948/ (this one's more loquacious in its
> description of things that are better about the radix tree API than the
> XArray).
> 
> This time around, I've gone for an approach of many small changes.
> Unfortunately, that means you get 62 moderate patches instead of dozens
> of big ones.

Where's the API documentation that tells things like constraints
about locking and lock-less lookups via RCU?

e.g. I notice in the XFS patches you seem to randomly strip out
rcu_read_lock/unlock() pairs that are currently around radix tree
lookup operations without explanation. Without documentation
describing how this stuff is supposed to work, review is somewhat
difficult...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 00/62] XArray November 2017 Edition
  2017-11-23  1:25   ` Dave Chinner
@ 2017-11-23  2:46     ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-23  2:46 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-fsdevel, linux-mm, linux-kernel, Matthew Wilcox

On Thu, Nov 23, 2017 at 12:25:01PM +1100, Dave Chinner wrote:
> On Wed, Nov 22, 2017 at 01:06:37PM -0800, Matthew Wilcox wrote:
> > From: Matthew Wilcox <mawilcox@microsoft.com>
> > 
> > I've lost count of the number of times I've posted the XArray before,
> > so time for a new numbering scheme.  Here're two earlier versions,
> > https://lkml.org/lkml/2017/3/17/724
> > https://lwn.net/Articles/715948/ (this one's more loquacious in its
> > description of things that are better about the radix tree API than the
> > XArray).
> > 
> > This time around, I've gone for an approach of many small changes.
> > Unfortunately, that means you get 62 moderate patches instead of dozens
> > of big ones.
> 
> Where's the API documentation that tells things like constraints
> about locking and lock-less lookups via RCU?

Pretty spartan so far:

 * An eXtensible Array is an array of entries indexed by an unsigned
 * long.  The XArray takes care of its own locking, using an irqsafe
 * spinlock for operations that modify the XArray and RCU for operations
 * which only read from the XArray.

That needs to be amended to specify it's only talking about the normal API.
For the advanced API, the user needs to handle their own locking.

> e.g. I notice in the XFS patches you seem to randomly strip out
> rcu_read_lock/unlock() pairs that are currently around radix tree
> lookup operations without explanation. Without documentation
> describing how this stuff is supposed to work, review is somewhat
> difficult...

It's not entirely random, although I appreciate it may seem that way.

Takes no lock, doesn't need it:
 * xa_empty
 * xa_tagged
Takes RCU read lock:
 * xa_load
 * xa_for_each
 * xa_find
 * xa_next
 * xa_get_entries
 * xa_get_tagged
 * xa_get_tag
Takes xa_lock internally:
 * xa_store
 * xa_cmpxchg
 * xa_destroy
 * xa_set_tag
 * xa_clear_tag

The __xa_ and xas_ functions take no locks, RCU or spin.  One advantage
the xarray has over the radix tree is that you'll get nice little RCU splats
if you forget to take a lock when you need it.
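
As a rough sketch of what that split looks like to a caller (the
__xa_set_tag() spelling is taken from the page-writeback conversion later
in the series; the variables here are hypothetical):

	struct page *page;
	unsigned long flags;

	/* Lookup: xa_load() takes the RCU read lock internally. */
	page = xa_load(&mapping->pages, index);

	/* Modification via the __xa_ variant: caller must hold xa_lock. */
	xa_lock_irqsave(&mapping->pages, flags);
	__xa_set_tag(&mapping->pages, index, PAGECACHE_TAG_DIRTY);
	xa_unlock_irqrestore(&mapping->pages, flags);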

Some places in the xfs patches I had to leave the RCU locks in place
because they're preventing the thing we're looking up from evaporating
under us.  So we're taking the RCU lock twice, which isn't ideal, but
also not *that* expensive.  If it turns out to be a problem, we can
introduce __xa versions or use the existing xas_ family of functions.

^ permalink raw reply	[flat|nested] 159+ messages in thread

* XArray documentation
  2017-11-22 21:06 ` Matthew Wilcox
@ 2017-11-24  1:16   ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-24  1:16 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel; +Cc: Matthew Wilcox

Here's the current state of the documentation for the XArray.  Suggestions
for improvement gratefully received.

======
XArray
======

Overview
========

The XArray is an array of entries indexed from 0 to ULONG_MAX.  Each entry can be either
a pointer, or an encoded value between 0 and LONG_MAX.  It is efficient
when the indices used are densely clustered; hashing the object and
using the hash as the index will not perform well.  A freshly-initialised
XArray contains a NULL pointer at every index.  There is no difference
between an entry which has never been stored to and an entry which has most
recently had NULL stored to it.

Pointers to be stored in the XArray must have the bottom two bits clear
(ie must point to something which is 4-byte aligned).  This includes all
objects allocated by calling :c:func:`kmalloc` and :c:func:`alloc_page`,
but you cannot store pointers to arbitrary offsets within an object.
The XArray does not support storing :c:func:`IS_ERR` pointers; some
conflict with data values and others conflict with entries the XArray
uses for its own purposes.  If you need to store special values which
cannot be confused with real kernel pointers, the values 4, 8, ... 4092
are available.

Each non-NULL entry in the array has three bits associated with it called
tags.  Each tag may be flipped on or off independently of the others.
You can search for entries with a given tag set.

An unusual feature of the XArray is the ability to tie multiple entries
together.  Once stored to, looking up any entry in the range will give
the same result as looking up any other entry in the range.  Setting a
tag on one entry will set it on all of them.  Multiple entries can be
explicitly split into smaller entries, or storing NULL into any entry
will cause the XArray to forget about the tie.

Normal API
==========

Start by initialising an XArray, either with :c:func:`DEFINE_XARRAY`
for statically allocated XArrays or :c:func:`xa_init` for dynamically
allocated ones.

You can then set entries using :c:func:`xa_store` and get entries using
:c:func:`xa_load`.  xa_store will overwrite a non-NULL entry with the
new entry.  It returns the previous entry stored at that index.  You can
conditionally replace an entry at an index by using :c:func:`xa_cmpxchg`.
Like :c:func:`cmpxchg`, it will only succeed if the entry at that
index has the 'old' value.  It also returns the entry which was at
that index; if it returns the same entry which was passed as 'old',
then :c:func:`xa_cmpxchg` succeeded.
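
A minimal sketch of that sequence (the exact argument order is an
assumption here; ptr and new_ptr are hypothetical pointers)::

	#include <linux/xarray.h>

	static DEFINE_XARRAY(array);

	/* Store, load and conditionally replace the entry at index 5. */
	static int example_store(void *ptr, void *new_ptr)
	{
		void *old;

		old = xa_store(&array, 5, ptr, GFP_KERNEL);	/* previous entry; NULL here */
		old = xa_load(&array, 5);			/* now returns ptr */

		old = xa_cmpxchg(&array, 5, ptr, new_ptr, GFP_KERNEL);
		return old == ptr ? 0 : -EBUSY;		/* succeeded only if old == ptr */
	}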

If you want to store a pointer, you can do that directly.  If you want
to store an integer between 0 and LONG_MAX, you must first encode it
using :c:func:`xa_mk_value`.  When you retrieve an entry from the XArray,
you can check whether it is a data value by calling :c:func:`xa_is_value`,
and convert it back to an integer by calling :c:func:`xa_to_value`.
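
For example (a sketch, continuing with the array declared above)::

	void *entry;
	unsigned long value;

	/* Store the integer 42 at index 0, then decode it on the way out. */
	xa_store(&array, 0, xa_mk_value(42), GFP_KERNEL);

	entry = xa_load(&array, 0);
	if (xa_is_value(entry))
		value = xa_to_value(entry);	/* value == 42 */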

You can enquire whether a tag is set on an entry by using
:c:func:`xa_get_tag`.  If the entry is not NULL, you can set a tag on
it by using :c:func:`xa_set_tag` and remove the tag from an entry by
calling :c:func:`xa_clear_tag`.  You can ask whether any entry in the
XArray has a particular tag set by calling :c:func:`xa_tagged`.
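
For example (the tag argument here is a bare number purely for
illustration; see the function documentation below for the real type)::

	#include <linux/types.h>
	#include <linux/xarray.h>

	#define EXAMPLE_TAG_DIRTY	0	/* hypothetical tag, 0-2 */

	static DEFINE_XARRAY(array);

	static void mark_dirty(unsigned long index)
	{
		/* Tags can only be set on non-NULL entries. */
		if (xa_load(&array, index))
			xa_set_tag(&array, index, EXAMPLE_TAG_DIRTY);
	}

	static void mark_clean(unsigned long index)
	{
		if (xa_get_tag(&array, index, EXAMPLE_TAG_DIRTY))
			xa_clear_tag(&array, index, EXAMPLE_TAG_DIRTY);
	}

	static bool any_dirty(void)
	{
		return xa_tagged(&array, EXAMPLE_TAG_DIRTY);
	}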

You can copy entries out of the XArray into a plain array by
calling :c:func:`xa_get_entries` and copy tagged entries by calling
:c:func:`xa_get_tagged`.  Or you can iterate over the non-NULL entries
in place in the XArray by calling :c:func:`xa_for_each`.  You may prefer
to use :c:func:`xa_find` or :c:func:`xa_next` to move to the next present
entry in the XArray.
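
A sketch of walking the present entries in place (the exact parameters of
the iterator are an assumption for illustration; see the function
documentation below for the real form)::

	#include <linux/kernel.h>
	#include <linux/xarray.h>

	static DEFINE_XARRAY(array);

	static void dump_entries(void)
	{
		unsigned long index = 0;
		void *entry;

		/* Argument order of xa_for_each() assumed for illustration. */
		xa_for_each(&array, entry, index, ULONG_MAX)
			pr_info("index %lu -> %p\n", index, entry);
	}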

Finally, you can remove all entries from an XArray by calling
:c:func:`xa_destroy`.  If the XArray entries are pointers, you may wish
to free the entries first.  You can do this by iterating over all non-NULL
entries in

When using the Normal API, you do not have to worry about locking.
The XArray uses RCU and an irq-safe spinlock to synchronise access to
the XArray:

No lock needed:
 * :c:func:`xa_empty`
 * :c:func:`xa_tagged`

Takes RCU read lock:
 * :c:func:`xa_load`
 * :c:func:`xa_for_each`
 * :c:func:`xa_find`
 * :c:func:`xa_next`
 * :c:func:`xa_get_entries`
 * :c:func:`xa_get_tagged`
 * :c:func:`xa_get_tag`

Takes xa_lock internally:
 * :c:func:`xa_store`
 * :c:func:`xa_cmpxchg`
 * :c:func:`xa_destroy`
 * :c:func:`xa_set_tag`
 * :c:func:`xa_clear_tag`

The :c:func:`xa_store` and :c:func:`xa_cmpxchg` functions take a gfp_t
parameter in case the XArray needs to allocate memory to store this entry.
If the entry being stored is NULL, no memory allocation needs to be
performed, and the GFP flags specified here will be ignored.
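
For example (assuming, as above, that a failed allocation is reported as
an ERR_PTR return from :c:func:`xa_store`)::

	#include <linux/err.h>
	#include <linux/xarray.h>

	static DEFINE_XARRAY(cache);

	static int cache_add(unsigned long index, void *item)
	{
		/* May need to allocate nodes, so the GFP flags matter. */
		void *old = xa_store(&cache, index, item, GFP_KERNEL);

		return IS_ERR(old) ? PTR_ERR(old) : 0;
	}

	static void cache_remove(unsigned long index)
	{
		/* Storing NULL never allocates; the GFP flags are ignored. */
		xa_store(&cache, index, NULL, GFP_NOWAIT);
	}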

Advanced API
============

The advanced API offers more flexibility and better performance at the
cost of an interface which can be harder to use and has fewer safeguards.
No locking is done for you by the advanced API, and you are required to
use the xa_lock while modifying the array.  You can choose whether to use
the xa_lock or the RCU lock while doing read-only operations on the array.

The advanced API is based around the xa_state.  This is an opaque data
structure which you declare on the stack using the :c:func:`XA_STATE`
macro.  This macro initialises the xa_state ready to start walking
around the XArray.  It is used as a cursor to maintain the position
in the XArray and let you compose various operations together without
having to restart from the top every time.

The xa_state is also used to store errors.  If an operation fails, it
calls :c:func:`xas_set_err` to note the error.  All operations check
whether the xa_state is in an error state before proceeding, so there's
no need for you to check for an error after each call; you can make
multiple calls in succession and only check at a convenient point.

The only error currently generated by the xarray code itself is
ENOMEM, but it supports arbitrary errors in case you want to call
:c:func:`xas_set_err` yourself.  If the xa_state is holding an ENOMEM
error, :c:func:`xas_nomem` will attempt to allocate a single xa_node using
the specified gfp flags and store it in the xa_state for the next attempt.
The idea is that you take the xa_lock, attempt the operation and drop
the lock.  Then you allocate memory if there was a memory allocation
failure and retry the operation.  You must call :c:func:`xas_destroy`
if you call :c:func:`xas_nomem` in case it's not necessary to use the
memory that was allocated.
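
Putting that together, the retry loop might look roughly like this (the
parameters of :c:func:`XA_STATE` and of the xas_ calls are assumptions
for illustration; only the lock / attempt / unlock / xas_nomem shape is
the point)::

	#include <linux/xarray.h>

	static DEFINE_XARRAY(array);

	static int example_advanced_store(unsigned long index, void *item,
					  gfp_t gfp)
	{
		XA_STATE(xas, &array, index);	/* arguments assumed */
		int err;

		do {
			xa_lock(&array);
			xas_store(&xas, item);	/* any xas_ operation */
			xa_unlock(&array);
			/*
			 * xas_nomem() returns true only if the error was
			 * ENOMEM and it allocated memory, i.e. a retry is
			 * worthwhile.
			 */
		} while (xas_nomem(&xas, gfp));

		err = xas_error(&xas);
		xas_destroy(&xas);	/* safe even if nothing was allocated */
		return err;
	}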

When using the advanced API, it's possible to see internal entries
in the XArray.  You should never see an :c:func:`xa_is_node` entry,
but you may see other internal entries, including sibling entries,
skip entries and retry entries.  The :c:func:`xas_retry` function is a
useful helper function for handling internal entries, particularly in
the middle of iterations.

Functions
=========

.. kernel-doc:: include/linux/xarray.h
.. kernel-doc:: lib/xarray.c

^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: XArray documentation
  2017-11-24  1:16   ` Matthew Wilcox
  (?)
@ 2017-11-24  4:30   ` Andreas Dilger
  2017-11-24 17:17       ` Matthew Wilcox
  -1 siblings, 1 reply; 159+ messages in thread
From: Andreas Dilger @ 2017-11-24  4:30 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-fsdevel, linux-mm, linux-kernel, Matthew Wilcox


On Nov 23, 2017, at 6:16 PM, Matthew Wilcox <willy@infradead.org> wrote:
> 
> Here's the current state of the documentation for the XArray.  Suggestions
> for improvement gratefully received.
> 
> ======
> XArray
> ======
> 
> Overview
> ========
> 
> The XArray is an array of ULONG_MAX entries.  Each entry can be either
> a pointer, or an encoded value between 0 and LONG_MAX.  It is efficient
> when the indices used are densely clustered; hashing the object and
> using the hash as the index will not perform well.  A freshly-initialised
> XArray contains a NULL pointer at every index.  There is no difference
> between an entry which has never been stored to and an entry which has most
> recently had NULL stored to it.
> 
> Pointers to be stored in the XArray must have the bottom two bits clear
> (ie must point to something which is 4-byte aligned).  This includes all
> objects allocated by calling :c:func:`kmalloc` and :c:func:`alloc_page`,
> but you cannot store pointers to arbitrary offsets within an object.
> The XArray does not support storing :c:func:`IS_ERR` pointers; some
> conflict with data values and others conflict with entries the XArray
> uses for its own purposes.  If you need to store special values which
> cannot be confused with real kernel pointers, the values 4, 8, ... 4092
> are available.

Thought - if storing error values into the XArray in addition to regular
pointers is important for some use case, it would be easy to make
"ERR_PTR_XA()", "PTR_ERR_XA()", and "IS_ERR_XA()" macros that just shift
the error values up and down by two bits to avoid the conflict.  That
would still allow error values up (down) to -1023 to be stored without
any chance of a pointer conflict, which should be enough.
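
Something along these lines, perhaps (just a sketch of the idea, with the
names and the -1023 limit from above):

	/*
	 * Shift the errno left by two bits so the stored value has its
	 * bottom two bits clear and cannot be mistaken for a data value
	 * or an internal XArray entry.
	 */
	#define ERR_PTR_XA(err)	((void *)((long)(err) << 2))
	#define PTR_ERR_XA(ptr)	((long)(ptr) >> 2)
	#define IS_ERR_XA(ptr)	\
		((unsigned long)(ptr) >= (unsigned long)ERR_PTR_XA(-1023))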

> Each non-NULL entry in the array has three bits associated with it called
> tags.  Each tag may be flipped on or off independently of the others.
> You can search for entries with a given tag set.

How can it be 3 tag bits, if the pointers only need to be 4-byte aligned?

> An unusual feature of the XArray is the ability to tie multiple entries
> together.  Once stored to, looking up any entry in the range will give
> the same result as looking up any other entry in the range.  Setting a
> tag on one entry will set it on all of them.  Multiple entries can be
> explicitly split into smaller entries, or storing NULL into any entry
> will cause the XArray to forget about the tie.
> 
> Normal API
> ==========
> 
> Start by initialising an XArray, either with :c:func:`DEFINE_XARRAY`
> for statically allocated XArrays or :c:func:`xa_init` for dynamically
> allocated ones.
> 
> You can then set entries using :c:func:`xa_store` and get entries using
> :c:func:`xa_load`.  xa_store will overwrite a non-NULL entry with the
> new entry.  It returns the previous entry stored at that index.  You can
> conditionally replace an entry at an index by using :c:func:`xa_cmpxchg`.
> Like :c:func:`cmpxchg`, it will only succeed if the entry at that
> index has the 'old' value.  It also returns the entry which was at
> that index; if it returns the same entry which was passed as 'old',
> then :c:func:`xa_cmpxchg` succeeded.
> 
> If you want to store a pointer, you can do that directly.  If you want
> to store an integer between 0 and LONG_MAX, you must first encode it
> using :c:func:`xa_mk_value`.  When you retrieve an entry from the XArray,
> you can check whether it is a data value by calling :c:func:`xa_is_value`,
> and convert it back to an integer by calling :c:func:`xa_to_value`.
> 
> You can enquire whether a tag is set on an entry by using
> :c:func:`xa_get_tag`.  If the entry is not NULL, you can set a tag on
> it by using :c:func:`xa_set_tag` and remove the tag from an entry by
> calling :c:func:`xa_clear_tag`.  You can ask whether any entry in the
> XArray has a particular tag set by calling :c:func:`xa_tagged`.
> 
> You can copy entries out of the XArray into a plain array by
> calling :c:func:`xa_get_entries` and copy tagged entries by calling
> :c:func:`xa_get_tagged`.  Or you can iterate over the non-NULL entries
> in place in the XArray by calling :c:func:`xa_for_each`.  You may prefer
> to use :c:func:`xa_find` or :c:func:`xa_next` to move to the next present
> entry in the XArray.
> 
> Finally, you can remove all entries from an XArray by calling
> :c:func:`xa_destroy`.  If the XArray entries are pointers, you may wish
> to free the entries first.  You can do this by iterating over all non-NULL
> entries in

... the XArray using xa_for_each() ?

> When using the Normal API, you do not have to worry about locking.
> The XArray uses RCU and an irq-safe spinlock to synchronise access to
> the XArray:
> 
> No lock needed:
> * :c:func:`xa_empty`
> * :c:func:`xa_tagged`
> 
> Takes RCU read lock:
> * :c:func:`xa_load`
> * :c:func:`xa_for_each`
> * :c:func:`xa_find`
> * :c:func:`xa_next`
> * :c:func:`xa_get_entries`
> * :c:func:`xa_get_tagged`
> * :c:func:`xa_get_tag`
> 
> Takes xa_lock internally:
> * :c:func:`xa_store`
> * :c:func:`xa_cmpxchg`
> * :c:func:`xa_destroy`
> * :c:func:`xa_set_tag`
> * :c:func:`xa_clear_tag`
> 
> The :c:func:`xa_store` and :c:func:`xa_cmpxchg` functions take a gfp_t
> parameter in case the XArray needs to allocate memory to store this entry.
> If the entry being stored is NULL, no memory allocation needs to be
> performed, and the GFP flags specified here will be ignored.
> 
> Advanced API
> ============
> 
> The advanced API offers more flexibility and better performance at the
> cost of an interface which can be harder to use and has fewer safeguards.
> No locking is done for you by the advanced API, and you are required to
> use the xa_lock while modifying the array.  You can choose whether to use
> the xa_lock or the RCU lock while doing read-only operations on the array.
> 
> The advanced API is based around the xa_state.  This is an opaque data
> structure which you declare on the stack using the :c:func:`XA_STATE`
> macro.  This macro initialises the xa_state ready to start walking
> around the XArray.  It is used as a cursor to maintain the position
> in the XArray and let you compose various operations together without
> having to restart from the top every time.
> 
> The xa_state is also used to store errors.  If an operation fails, it
> calls :c:func:`xas_set_err` to note the error.  All operations check
> whether the xa_state is in an error state before proceeding, so there's
> no need for you to check for an error after each call; you can make
> multiple calls in succession and only check at a convenient point.
> 
> The only error currently generated by the xarray code itself is
> ENOMEM, but it supports arbitrary errors in case you want to call
> :c:func:`xas_set_err` yourself.  If the xa_state is holding an ENOMEM
> error, :c:func:`xas_nomem` will attempt to allocate a single xa_node using

.. calling :c:func:`xas_nomem` ... ?

> the specified gfp flags and store it in the xa_state for the next attempt.
> The idea is that you take the xa_lock, attempt the operation and drop
> the lock.  Then you allocate memory if there was a memory allocation

... then you try to allocate ...

> failure and retry the operation.  You must call :c:func:`xas_destroy`
> if you call :c:func:`xas_nomem` in case it's not necessary to use the
> memory that was allocated.

This last sentence is not totally clear.  How about:

If you called :c:func:`xas_nomem` to allocate memory, but didn't need
to use the memory for some reason, you need to call :c:func:`xas_destroy`
to free the allocated memory.


The question is where the "allocated memory" is stored, if it isn't in
the XArray?  Is it in the XA_STATE?  How do you know if the allocated
memory was needed, or is it always safe to call xas_destroy?  Is the
allocated memory always consumed if xa_store/xa_cmpxchg are called?
What if there was another process that also added the same entry while
the xa_lock was dropped?

> When using the advanced API, it's possible to see internal entries
> in the XArray.  You should never see an :c:func:`xa_is_node` entry,
> but you may see other internal entries, including sibling entries,
> skip entries and retry entries.  The :c:func:`xas_retry` function is a
> useful helper function for handling internal entries, particularly in
> the middle of iterations.

How do you know if a returned value is an "internal entry"?  Is there
some "xas_is_internal()" macro/function that tells you this?

> Functions
> =========
> 
> .. kernel-doc:: include/linux/xarray.h
> .. kernel-doc:: lib/xarray.c


Cheers, Andreas







^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: XArray documentation
  2017-11-24  1:16   ` Matthew Wilcox
@ 2017-11-24 16:50     ` Martin Steigerwald
  -1 siblings, 0 replies; 159+ messages in thread
From: Martin Steigerwald @ 2017-11-24 16:50 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-fsdevel, linux-mm, linux-kernel, Matthew Wilcox

Hello Matthew.

Matthew Wilcox - 24.11.17, 02:16:
> ======
> XArray
> ======
> 
> Overview
> ========
> 
> The XArray is an array of ULONG_MAX entries.  Each entry can be either
> a pointer, or an encoded value between 0 and LONG_MAX.  It is efficient
> when the indices used are densely clustered; hashing the object and
> using the hash as the index will not perform well.  A freshly-initialised
> XArray contains a NULL pointer at every index.  There is no difference
> between an entry which has never been stored to and an entry which has most
> recently had NULL stored to it.

I am no kernel developer (just provided a tiny bit of documentation a long 
time ago)… but on reading into this, I missed:

What is it about? And what is it used for?

"Overview" appears to be already a description of the actual implementation 
specifics, instead of… well an overview.

Of course, I am sure you all know what it is for… but someone who wants to 
learn about the kernel is likely to be confused by such a start.

Thanks,
-- 
Martin

^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: XArray documentation
  2017-11-24 16:50     ` Martin Steigerwald
  (?)
@ 2017-11-24 17:03       ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-24 17:03 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: linux-fsdevel, linux-mm, linux-kernel, Matthew Wilcox

On Fri, Nov 24, 2017 at 05:50:41PM +0100, Martin Steigerwald wrote:
> Matthew Wilcox - 24.11.17, 02:16:
> > ======
> > XArray
> > ======
> > 
> > Overview
> > ========
> > 
> > The XArray is an array of ULONG_MAX entries.  Each entry can be either
> > a pointer, or an encoded value between 0 and LONG_MAX.  It is efficient
> > when the indices used are densely clustered; hashing the object and
> > using the hash as the index will not perform well.  A freshly-initialised
> > XArray contains a NULL pointer at every index.  There is no difference
> > between an entry which has never been stored to and an entry which has most
> > recently had NULL stored to it.
> 
> I am no kernel developer (just provided a tiny bit of documentation a long 
> time ago)… but on reading into this, I missed:
> 
> What is it about? And what is it used for?
> 
> "Overview" appears to be already a description of the actual implementation 
> specifics, instead of… well an overview.
> 
> Of course, I am sure you all know what it is for… but someone who wants to 
> learn about the kernel is likely to be confused by such a start.

Hi Martin,

Thank you for your comment.  I'm clearly too close to it because even
after reading your useful critique, I'm not sure what to change.  Please
help me!

Maybe it's that I've described the abstraction as if it's the
implementation and put too much detail into the overview.  This might
be clearer?

The XArray is an abstract data type which behaves like an infinitely
large array of pointers.  The index into the array is an unsigned long.
A freshly-initialised XArray contains a NULL pointer at every index.

----
and then move all this information into later paragraphs:

There is no difference between an entry which has never been stored to
and an entry which has most recently had NULL stored to it.
Each entry in the array can be either a pointer, or an
encoded value between 0 and LONG_MAX.
While you can use any index, the implementation is efficient when the
indices used are densely clustered; hashing the object and using the
hash as the index will not perform well.

^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: XArray documentation
  2017-11-24  4:30   ` Andreas Dilger
@ 2017-11-24 17:17       ` Matthew Wilcox
  0 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-24 17:17 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: linux-fsdevel, linux-mm, linux-kernel, Matthew Wilcox

On Thu, Nov 23, 2017 at 09:30:21PM -0700, Andreas Dilger wrote:
> > Pointers to be stored in the XArray must have the bottom two bits clear
> > (ie must point to something which is 4-byte aligned).  This includes all
> > objects allocated by calling :c:func:`kmalloc` and :c:func:`alloc_page`,
> > but you cannot store pointers to arbitrary offsets within an object.
> > The XArray does not support storing :c:func:`IS_ERR` pointers; some
> > conflict with data values and others conflict with entries the XArray
> > uses for its own purposes.  If you need to store special values which
> > cannot be confused with real kernel pointers, the values 4, 8, ... 4092
> > are available.
> 
> Thought - if storing error values into the XArray in addition to regular
> pointers is important for some use case, it would be easy to make
> "ERR_PTR_XA()", "PTR_ERR_XA()", and "IS_ERR_XA()" macros that just shift
> the error values up and down by two bits to avoid the conflict.  That
> would still allow error values up (down) to -1023 to be stored without
> any chance of a pointer conflict, which should be enough.

We could do something along those lines.  It's not something anybody's
trying to do with the radix tree today, but I wanted to document it
just in case anybody got a clever idea in the future.  Another source
of ambiguity that I didn't mention is that xa_store() can return
ERR_PTR(-ENOMEM), so if we encode the xarray-error-pointers in some
way, they have to be distinguishable from errors returned by the xa_*
functions.  There's no ambiguity when using the xas_ functions as they
signal errors in the xa_state.

> > Each non-NULL entry in the array has three bits associated with it called
> > tags.  Each tag may be flipped on or off independently of the others.
> > You can search for entries with a given tag set.
> 
> How can it be 3 tag bits, if the pointers only need to be 4-byte aligned?

The same way the radix tree does it -- there are three bitfields in each
node.  So all your PAGECACHE_DIRTY tags are stored in the same bitfield.
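
Roughly like this, purely as an illustration (not the real structure
names or sizes):

	#include <linux/bitops.h>

	#define EXAMPLE_SLOTS	64	/* slots per node, illustrative */
	#define EXAMPLE_TAGS	3	/* one bitmap per tag */

	struct example_node {
		void		*slots[EXAMPLE_SLOTS];
		unsigned long	tags[EXAMPLE_TAGS][EXAMPLE_SLOTS / BITS_PER_LONG];
	};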

(I'm taking these two questions as a request for more information on the
implementation, which I don't want to put in this document.  I want
this to be the user's manual.  I think the appropriate place to put this
information is in source code comments.)

> > Finally, you can remove all entries from an XArray by calling
> > :c:func:`xa_destroy`.  If the XArray entries are pointers, you may wish
> > to free the entries first.  You can do this by iterating over all non-NULL
> > entries in
> 
> ... the XArray using xa_for_each() ?

I swear I wrote that ... editing error.  New paragraph:

Finally, you can remove all entries from an XArray by calling
:c:func:`xa_destroy`.  If the XArray entries are pointers, you may wish
to free the entries first.  You can do this by iterating over all non-NULL
entries in the XArray using the :c:func:`xa_for_each` iterator.

> > The only error currently generated by the xarray code itself is
> > ENOMEM, but it supports arbitrary errors in case you want to call
> > :c:func:`xas_set_err` yourself.  If the xa_state is holding an ENOMEM
> > error, :c:func:`xas_nomem` will attempt to allocate a single xa_node using
> 
> .. calling :c:func:`xas_nomem` ... ?
> 
> > the specified gfp flags and store it in the xa_state for the next attempt.
> > The idea is that you take the xa_lock, attempt the operation and drop
> > the lock.  Then you allocate memory if there was a memory allocation
> 
> ... then you try to allocate ...

Fair point -- if xas_nomem() fails to allocate memory, it'll return 'false'
to indicate not to bother retrying the operation.

> > failure and retry the operation.  You must call :c:func:`xas_destroy`
> > if you call :c:func:`xas_nomem` in case it's not necessary to use the
> > memory that was allocated.
> 
> This last sentence is not totally clear.  How about:
> 
> If you called :c:func:`xas_nomem` to allocate memory, but didn't need
> to use the memory for some reason, you need to call :c:func:`xas_destroy`
> to free the allocated memory.
> 
> 
> The question is where the "allocated memory" is stored, if it isn't in
> the XArray?  Is it in the XA_STATE? 

That was earlier in the document:

:c:func:`xas_set_err` yourself.  If the xa_state is holding an ENOMEM
error, :c:func:`xas_nomem` will attempt to allocate a single xa_node using
the specified gfp flags and store it in the xa_state for the next attempt.

> How do you know if the allocated
> memory was needed, or is it always safe to call xas_destroy?

It's always safe to call xas_destroy.

> Is the
> allocated memory always consumed if xa_store/xa_cmpxchg are called?

Usually.  If another process added the same node that we were trying to
store to, we won't allocate memory.

> What if there was another process that also added the same entry while
> the xa_lock was dropped?

We'll overwrite it.

I've rewritten this whole section.  Thanks for the feedback!

The xa_state is also used to store errors.  You can call
:c:func:`xas_error` to retrieve the error.  All operations check whether
the xa_state is in an error state before proceeding, so there's no need 
for you to check for an error after each call; you can make multiple
calls in succession and only check at a convenient point.  The only error
currently generated by the xarray code itself is ENOMEM, but it supports
arbitrary errors in case you want to call :c:func:`xas_set_err` yourself.

If the xa_state is holding an ENOMEM error, calling :c:func:`xas_nomem`
will attempt to allocate more memory using the specified gfp flags and
cache it in the xa_state for the next attempt.  The idea is that you take
the xa_lock, attempt the operation and drop the lock.  The operation
attempts to allocate memory while holding the lock, but it is more
likely to fail.  Once you have dropped the lock, :c:func:`xas_nomem`
can try harder to allocate more memory.  It will return ``true`` if it
is worth retrying the operation (ie that there was a memory error *and*
more memory was allocated).  You must call :c:func:`xas_destroy` if you
call :c:func:`xas_nomem` in case memory was allocated and then turned
out not to be needed.

> > When using the advanced API, it's possible to see internal entries
> > in the XArray.  You should never see an :c:func:`xa_is_node` entry,
> > but you may see other internal entries, including sibling entries,
> > skip entries and retry entries.  The :c:func:`xas_retry` function is a
> > useful helper function for handling internal entries, particularly in
> > the middle of iterations.
> 
> How do you know if a returned value is an "internal entry"?  Is there
> some "xas_is_internal()" macro/function that tells you this?

I know I chose the right name for that function, since you guessed it :-)

I've rewritten this whole section.  I'll post the updated document in a bit.

^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: XArray documentation
  2017-11-24 17:03       ` Matthew Wilcox
@ 2017-11-24 18:01         ` Martin Steigerwald
  -1 siblings, 0 replies; 159+ messages in thread
From: Martin Steigerwald @ 2017-11-24 18:01 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-fsdevel, linux-mm, linux-kernel, Matthew Wilcox

Hi Matthew.

Matthew Wilcox - 24.11.17, 18:03:
> On Fri, Nov 24, 2017 at 05:50:41PM +0100, Martin Steigerwald wrote:
> > Matthew Wilcox - 24.11.17, 02:16:
> > > ======
> > > XArray
> > > ======
> > > 
> > > Overview
> > > ========
> > > 
> > > The XArray is an array of ULONG_MAX entries.  Each entry can be either
> > > a pointer, or an encoded value between 0 and LONG_MAX.  It is efficient
> > > when the indices used are densely clustered; hashing the object and
> > > using the hash as the index will not perform well.  A
> > > freshly-initialised
> > > XArray contains a NULL pointer at every index.  There is no difference
> > > between an entry which has never been stored to and an entry which has
> > > most
> > > recently had NULL stored to it.
> > 
> > I am no kernel developer (just provided a tiny bit of documentation a long
> > time ago)… but on reading into this, I missed:
> > 
> > What is it about? And what is it used for?
> > 
> > "Overview" appears to be already a description of the actual
> > implementation
> > specifics, instead of… well an overview.
> > 
> > Of course, I am sure you all know what it is for… but someone who wants to
> > learn about the kernel is likely to be confused by such a start.
[…]
> Thank you for your comment.  I'm clearly too close to it because even
> after reading your useful critique, I'm not sure what to change.  Please
> help me!

And likely I am too far away and do not understand enough of it to provide 
more concrete suggestions, but let me try. (I do understand some programming 
basics like what an array is, what a pointer is, what a linked list or a tree 
is… so I am not a complete novice here. I think the documentation should not 
cover any of these basics.)

> Maybe it's that I've described the abstraction as if it's the
> implementation and put too much detail into the overview.  This might
> be clearer?
> 
> The XArray is an abstract data type which behaves like an infinitely
> large array of pointers.  The index into the array is an unsigned long.
> A freshly-initialised XArray contains a NULL pointer at every index.

Yes, I think this is clearer already.

Maybe with a few sentences on "Why does the kernel provide this?", "Where is 
it used?" (if already known), "What use case is it suitable for – if I want to 
implement something into the kernel (or in user space?) ?" and probably "How 
does it differ from user data structures the kernel provides?"

I don't know whether the questions make sense to you, but those were questions 
I had in mind as I read your documentation. I do not think this needs to be 
long… maybe just a few sentences that put the XArray into context before 
diving into the details. I think that could help new developers who want to 
learn about kernel development when they learn about the XArray.

And then as you suggest all the important implementation details.

> ----
> and then move all this information into later paragraphs:
> 
> There is no difference between an entry which has never been stored to
> and an entry which has most recently had NULL stored to it.
> Each entry in the array can be either a pointer, or an
> encoded value between 0 and LONG_MAX.
> While you can use any index, the implementation is efficient when the
> indices used are densely clustered; hashing the object and using the
> hash as the index will not perform well.

Yes.

And I notice now that you have some use case remarks in here… like "efficient 
when densely clustered". I missed these initially.

Thanks,
-- 
Martin

^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: XArray documentation
  2017-11-24 18:01         ` Martin Steigerwald
@ 2017-11-24 19:48           ` Shakeel Butt
  -1 siblings, 0 replies; 159+ messages in thread
From: Shakeel Butt @ 2017-11-24 19:48 UTC (permalink / raw)
  To: Martin Steigerwald
  Cc: Matthew Wilcox, linux-fsdevel, Linux MM, LKML, Matthew Wilcox

On Fri, Nov 24, 2017 at 10:01 AM, Martin Steigerwald
<martin@lichtvoll.de> wrote:
> Hi Matthew.
>
> Matthew Wilcox - 24.11.17, 18:03:
>> On Fri, Nov 24, 2017 at 05:50:41PM +0100, Martin Steigerwald wrote:
>> > Matthew Wilcox - 24.11.17, 02:16:
>> > > ======
>> > > XArray
>> > > ======
>> > >
>> > > Overview
>> > > ========
>> > >
>> > > The XArray is an array of ULONG_MAX entries.  Each entry can be either
>> > > a pointer, or an encoded value between 0 and LONG_MAX.  It is efficient
>> > > when the indices used are densely clustered; hashing the object and
>> > > using the hash as the index will not perform well.  A
>> > > freshly-initialised
>> > > XArray contains a NULL pointer at every index.  There is no difference
>> > > between an entry which has never been stored to and an entry which has
>> > > most
>> > > recently had NULL stored to it.
>> >
>> > I am no kernel developer (just provided a tiny bit of documentation a long
>> > time ago)… but on reading into this, I missed:
>> >
>> > What is it about? And what is it used for?
>> >
>> > "Overview" appears to be already a description of the actual
>> > implementation
>> > specifics, instead of… well an overview.
>> >
>> > Of course, I am sure you all know what it is for… but someone who wants to
>> > learn about the kernel is likely to be confused by such a start.
> […]
>> Thank you for your comment.  I'm clearly too close to it because even
>> after reading your useful critique, I'm not sure what to change.  Please
>> help me!
>
> And likely I am too far away to and do not understand enough of it to provide
> more concrete suggestions, but let me try. (I do understand some programming
> stuff like what an array is, what a pointer what an linked list or a tree is
> or… so I am not completely novice here. I think the documentation should not
> cover any of these basics.)
>
>> Maybe it's that I've described the abstraction as if it's the
>> implementation and put too much detail into the overview.  This might
>> be clearer?
>>
>> The XArray is an abstract data type which behaves like an infinitely
>> large array of pointers.  The index into the array is an unsigned long.
>> A freshly-initialised XArray contains a NULL pointer at every index.
>
> Yes, I think this is clearer already.
>
> Maybe with a few sentences on "Why does the kernel provide this?", "Where is
> it used?" (if already known), "What use case is it suitable for – if I want to
> implement something into the kernel (or in user space?) ?" and probably "How
> does it differ from user data structures the kernel provides?"
>

Adding on to Martin's questions: basically, what is the motivation
behind it?  It seems like a replacement for the radix tree, so it would
be good to write why the radix tree was not good enough or which use
cases it could not solve, and how the XArray addresses those.  If you
know of scenarios or use-cases where the XArray will not be an optimal
solution, it would help to mention those as well.

^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: XArray documentation
  2017-11-24 19:48           ` Shakeel Butt
@ 2017-11-24 19:56             ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-24 19:56 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Martin Steigerwald, linux-fsdevel, Linux MM, LKML, Matthew Wilcox

On Fri, Nov 24, 2017 at 11:48:49AM -0800, Shakeel Butt wrote:
> Adding on to Martin's questions: what is the motivation behind it?
> It seems like a replacement for the radix tree, so it would be good to
> explain why the radix tree was not good enough, or which use cases it
> could not handle, and how the XArray solves those issues.  It would
> also help to mention any scenarios or use cases where the XArray will
> not be an optimal solution.

I strongly disagree that there should be any mention of the radix tree
in this document.  In the Glorious Future, there will be no more radix
tree and so talking about it will only confuse people (because Linux's
radix tree is not in fact a radix tree).

Talking about why the radix tree isn't good enough is for the cover letter.
I did it quite well here:

https://lwn.net/Articles/715948/

^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: XArray documentation
  2017-11-24 18:01         ` Martin Steigerwald
  (?)
@ 2017-11-24 21:18           ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-24 21:18 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: linux-fsdevel, linux-mm, linux-kernel, Matthew Wilcox

On Fri, Nov 24, 2017 at 07:01:31PM +0100, Martin Steigerwald wrote:
> > The XArray is an abstract data type which behaves like an infinitely
> > large array of pointers.  The index into the array is an unsigned long.
> > A freshly-initialised XArray contains a NULL pointer at every index.
> 
> Yes, I think this is clearer already.
> 
> Maybe with a few sentences on "Why does the kernel provide this?", "Where is 
> it used?" (if already known), "What use case is it suitable for – if I want to 
> implement something into the kernel (or in user space?) ?" and probably "How 
> does it differ from user data structures the kernel provides?"

OK, I think this is getting more useful.  Here's what I currently have:

Overview
========

The XArray is an abstract data type which behaves like a very large array
of pointers.  It meets many of the same needs as a hash or a conventional
resizable array.  Unlike a hash, it allows you to sensibly go to the
next or previous entry in a cache-efficient manner.  In contrast to
a resizable array, there is no need for copying data or changing MMU
mappings in order to grow the array.  It is more memory-efficient,
parallelisable and cache friendly than a doubly-linked list.  It takes
advantage of RCU to perform lookups without locking.

The XArray implementation is efficient when the indices used are
densely clustered; hashing the object and using the hash as the index
will not perform well.  The XArray is optimised for small indices,
but still has good performance with large indices.  If your index is
larger than ULONG_MAX then the XArray is not the data type for you.
The most important user of the XArray is the page cache.

A freshly-initialised XArray contains a ``NULL`` pointer at every index.
Each non-``NULL`` entry in the array has three bits associated with
it called tags.  Each tag may be flipped on or off independently of
the others.  You can search for entries with a given tag set.
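
As a rough sketch of tag usage (assuming an ``XA_TAG_0`` constant for the
first tag bit and that :c:func:`xa_store` takes a gfp_t; the exact
prototypes are defined by the patches, not by this overview)::

    struct xarray pages;
    struct page *page = alloc_page(GFP_KERNEL);

    xa_init(&pages);
    xa_store(&pages, 5, page, GFP_KERNEL);

    /* Flip one of the three tag bits on the entry at index 5. */
    xa_set_tag(&pages, 5, XA_TAG_0);

    /* Each tag can be tested and cleared independently of the others. */
    if (xa_get_tag(&pages, 5, XA_TAG_0))
        xa_clear_tag(&pages, 5, XA_TAG_0);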

Normal pointers may be stored in the XArray directly.  They must be 4-byte
aligned, which is true for any pointer returned from :c:func:`kmalloc` and
:c:func:`alloc_page`.  It isn't true for arbitrary user-space pointers,
nor for function pointers.  You can store pointers to statically allocated
objects, as long as those objects have an alignment of at least 4.
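
A minimal sketch of storing and loading a pointer (``struct foo`` is a
made-up example type, and the gfp_t argument to :c:func:`xa_store` is an
assumption about the signature)::

    struct xarray things;
    struct foo *p = kmalloc(sizeof(*p), GFP_KERNEL);

    xa_init(&things);

    /* Pointers from kmalloc are at least 4-byte aligned, so may be stored. */
    xa_store(&things, 42, p, GFP_KERNEL);
    BUG_ON(xa_load(&things, 42) != p);

    /* An index which has never been stored to reads back as NULL. */
    BUG_ON(xa_load(&things, 1000000) != NULL);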

The XArray does not support storing :c:func:`IS_ERR` pointers; some
conflict with data values and others conflict with entries the XArray
uses for its own purposes.  If you need to store special values which
cannot be confused with real kernel pointers, the values 4, 8, ... 4092
are available.

You can also store integers between 0 and ``LONG_MAX`` in the XArray.
You must first convert the integer into an entry using :c:func:`xa_mk_value`.
When you retrieve an entry from the XArray, you can check whether it is
a data value by calling :c:func:`xa_is_value`, and convert it back to
an integer by calling :c:func:`xa_to_value`.
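
For example (a sketch only, continuing with the ``things`` array from the
previous sketch)::

    void *entry;

    /* Encode the integer 86 as an entry and store it at index 7. */
    xa_store(&things, 7, xa_mk_value(86), GFP_KERNEL);

    entry = xa_load(&things, 7);
    if (xa_is_value(entry))
        pr_info("index 7 holds the value %lu\n", xa_to_value(entry));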

An unusual feature of the XArray is the ability to create entries which
occupy a range of indices.  Once stored to, looking up any index in
the range will give the same entry as looking up any other index in
the range.  Setting a tag on one index will set it on all of them.
Storing to any index will store to all of them.  Multi-index entries can
be explicitly split into smaller entries, or storing ``NULL`` into any
entry will cause the XArray to forget about the range.

^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: XArray documentation
  2017-11-24 21:18           ` Matthew Wilcox
@ 2017-11-24 22:02             ` Martin Steigerwald
  -1 siblings, 0 replies; 159+ messages in thread
From: Martin Steigerwald @ 2017-11-24 22:02 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-fsdevel, linux-mm, linux-kernel, Matthew Wilcox

Matthew Wilcox - 24.11.17, 22:18:
> On Fri, Nov 24, 2017 at 07:01:31PM +0100, Martin Steigerwald wrote:
> > > The XArray is an abstract data type which behaves like an infinitely
> > > large array of pointers.  The index into the array is an unsigned long.
> > > A freshly-initialised XArray contains a NULL pointer at every index.
> > 
> > Yes, I think this is clearer already.
> > 
> > Maybe with a few sentences on "Why does the kernel provide this?", "Where
> > is it used?" (if already known), "What use case is it suitable for – if I
> > want to implement something into the kernel (or in user space?) ?" and
> > probably "How does it differ from user data structures the kernel
> > provides?"
> 
> OK, I think this is getting more useful.  Here's what I currently have:
> 
> Overview
> ========
> 
> The XArray is an abstract data type which behaves like a very large array
> of pointers.  It meets many of the same needs as a hash or a conventional
> resizable array.  Unlike a hash, it allows you to sensibly go to the
> next or previous entry in a cache-efficient manner.  In contrast to
> a resizable array, there is no need for copying data or changing MMU
> mappings in order to grow the array.  It is more memory-efficient,
> parallelisable and cache friendly than a doubly-linked list.  It takes
> advantage of RCU to perform lookups without locking.

I like this.

I bet I may not be able to help much further with it, other than possibly 
proofreading it tomorrow.

Thank you for considering my suggestion.

-- 
Martin

^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: XArray documentation
  2017-11-24 22:02             ` Martin Steigerwald
@ 2017-11-24 22:08               ` Matthew Wilcox
  -1 siblings, 0 replies; 159+ messages in thread
From: Matthew Wilcox @ 2017-11-24 22:08 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: linux-fsdevel, linux-mm, linux-kernel, Matthew Wilcox

On Fri, Nov 24, 2017 at 11:02:41PM +0100, Martin Steigerwald wrote:
> I like this.
> 
> I bet I may not be able help much further with it other than to possibly 
> proofread it tomorrow.
> 
> Thank you for considering my suggestion.

Not at all; your feedback has significantly improved this document.
Thank you for your contribution.

^ permalink raw reply	[flat|nested] 159+ messages in thread

end of thread, other threads:[~2017-11-24 22:08 UTC | newest]

Thread overview: 159+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-22 21:06 [PATCH 00/62] XArray November 2017 Edition Matthew Wilcox
2017-11-22 21:06 ` Matthew Wilcox
2017-11-22 21:06 ` [PATCH 01/62] tools: Make __test_and_clear_bit available Matthew Wilcox
2017-11-22 21:06   ` Matthew Wilcox
2017-11-22 21:06 ` [PATCH 02/62] radix tree test suite: Remove ARRAY_SIZE Matthew Wilcox
2017-11-22 21:06   ` Matthew Wilcox
2017-11-22 21:06 ` [PATCH 03/62] radix tree test suite: Check reclaim bit Matthew Wilcox
2017-11-22 21:06   ` Matthew Wilcox
2017-11-22 21:06 ` [PATCH 04/62] idr test suite: Fix ida_test_random() Matthew Wilcox
2017-11-22 21:06   ` Matthew Wilcox
2017-11-22 21:06 ` [PATCH 05/62] radix tree: Add a missing cast to gfp_t Matthew Wilcox
2017-11-22 21:06   ` Matthew Wilcox
2017-11-22 21:28   ` Luc Van Oostenryck
2017-11-22 21:28     ` Luc Van Oostenryck
2017-11-22 22:24     ` Matthew Wilcox
2017-11-22 22:24       ` Matthew Wilcox
2017-11-22 22:35       ` Luc Van Oostenryck
2017-11-22 22:35         ` Luc Van Oostenryck
2017-11-22 21:06 ` [PATCH 06/62] idr: Make cursor explicit for cyclic allocation Matthew Wilcox
2017-11-22 21:06   ` Matthew Wilcox
2017-11-22 21:06 ` [PATCH 07/62] idr: Rewrite extended IDR API Matthew Wilcox
2017-11-22 21:06   ` Matthew Wilcox
2017-11-22 21:06 ` [PATCH 08/62] Explicitly include radix-tree.h Matthew Wilcox
2017-11-22 21:06   ` Matthew Wilcox
2017-11-22 21:06 ` [PATCH 09/62] arm64: Turn flush_dcache_mmap_lock into a no-op Matthew Wilcox
2017-11-22 21:06   ` Matthew Wilcox
2017-11-22 21:06 ` [PATCH 10/62] unicore32: " Matthew Wilcox
2017-11-22 21:06   ` Matthew Wilcox
2017-11-22 21:06 ` [PATCH 11/62] Export __set_page_dirty Matthew Wilcox
2017-11-22 21:06   ` Matthew Wilcox
2017-11-22 21:06 ` [PATCH 12/62] xfs: Rename xa_ elements to ail_ Matthew Wilcox
2017-11-22 21:06   ` Matthew Wilcox
2017-11-22 21:06 ` [PATCH 13/62] fscache: Use appropriate radix tree accessors Matthew Wilcox
2017-11-22 21:06   ` Matthew Wilcox
2017-11-22 21:06 ` [PATCH 14/62] xarray: Add the xa_lock to the radix_tree_root Matthew Wilcox
2017-11-22 21:06   ` Matthew Wilcox
2017-11-22 21:06 ` [PATCH 15/62] page cache: Use xa_lock Matthew Wilcox
2017-11-22 21:06   ` Matthew Wilcox
2017-11-22 21:06 ` [PATCH 16/62] xarray: Replace exceptional entries Matthew Wilcox
2017-11-22 21:06   ` Matthew Wilcox
2017-11-22 21:06 ` [PATCH 17/62] xarray: Change definition of sibling entries Matthew Wilcox
2017-11-22 21:06   ` Matthew Wilcox
2017-11-22 21:06 ` [PATCH 18/62] xarray: Add definition of struct xarray Matthew Wilcox
2017-11-22 21:06   ` Matthew Wilcox
2017-11-22 21:06 ` [PATCH 19/62] xarray: Define struct xa_node Matthew Wilcox
2017-11-22 21:06   ` Matthew Wilcox
2017-11-22 21:06 ` [PATCH 20/62] xarray: Add xa_load Matthew Wilcox
2017-11-22 21:06   ` Matthew Wilcox
2017-11-22 21:06 ` [PATCH 21/62] xarray: Add xa_get_tag, xa_set_tag and xa_clear_tag Matthew Wilcox
2017-11-22 21:06   ` Matthew Wilcox
2017-11-22 21:06 ` [PATCH 22/62] xarray: Add xa_store Matthew Wilcox
2017-11-22 21:06   ` Matthew Wilcox
2017-11-22 21:07 ` [PATCH 23/62] xarray: Add xa_cmpxchg Matthew Wilcox
2017-11-22 21:07   ` Matthew Wilcox
2017-11-22 21:07 ` [PATCH 24/62] xarray: Add xa_for_each Matthew Wilcox
2017-11-22 21:07   ` Matthew Wilcox
2017-11-22 21:07 ` [PATCH 25/62] xarray: Add xa_init Matthew Wilcox
2017-11-22 21:07   ` Matthew Wilcox
2017-11-22 21:07 ` [PATCH 26/62] xarray: Add xas_for_each_tag Matthew Wilcox
2017-11-22 21:07   ` Matthew Wilcox
2017-11-22 21:07 ` [PATCH 27/62] xarray: Add xa_get_entries and xa_get_tagged Matthew Wilcox
2017-11-22 21:07   ` Matthew Wilcox
2017-11-22 21:07 ` [PATCH 28/62] xarray: Add xa_destroy Matthew Wilcox
2017-11-22 21:07   ` Matthew Wilcox
2017-11-22 21:07 ` [PATCH 29/62] xarray: Add xas_prev_any Matthew Wilcox
2017-11-22 21:07   ` Matthew Wilcox
2017-11-22 21:07 ` [PATCH 30/62] xarray: Add xas_find_any / xas_next_any Matthew Wilcox
2017-11-22 21:07   ` Matthew Wilcox
2017-11-22 21:07 ` [PATCH 31/62] Convert IDR to use xarray Matthew Wilcox
2017-11-22 21:07   ` Matthew Wilcox
2017-11-22 21:07 ` [PATCH 32/62] ida: Convert to using xarray Matthew Wilcox
2017-11-22 21:07   ` Matthew Wilcox
2017-11-22 21:07 ` [PATCH 33/62] page cache: Convert page_cache_next_hole to XArray Matthew Wilcox
2017-11-22 21:07   ` Matthew Wilcox
2017-11-22 21:07 ` [PATCH 34/62] page cache: Use xarray for adding pages Matthew Wilcox
2017-11-22 21:07   ` Matthew Wilcox
2017-11-22 21:07 ` [PATCH 35/62] page cache: Convert page_cache_tree_delete to xarray Matthew Wilcox
2017-11-22 21:07   ` Matthew Wilcox
2017-11-22 21:07 ` [PATCH 36/62] page cache: Convert find_get_entry " Matthew Wilcox
2017-11-22 21:07   ` Matthew Wilcox
2017-11-22 21:07 ` [PATCH 37/62] shmem: Convert replace " Matthew Wilcox
2017-11-22 21:07   ` Matthew Wilcox
2017-11-22 21:07 ` [PATCH 38/62] shmem: Convert shmem_confirm_swap to XArray Matthew Wilcox
2017-11-22 21:07   ` Matthew Wilcox
2017-11-22 21:07 ` [PATCH 39/62] shmem: Convert find_swap_entry " Matthew Wilcox
2017-11-22 21:07   ` Matthew Wilcox
2017-11-22 21:07 ` [PATCH 40/62] shmem: Convert shmem_tag_pins " Matthew Wilcox
2017-11-22 21:07   ` Matthew Wilcox
2017-11-22 21:07 ` [PATCH 41/62] shmem: Convert shmem_wait_for_pins " Matthew Wilcox
2017-11-22 21:07   ` Matthew Wilcox
2017-11-22 21:07 ` [PATCH 42/62] vmalloc: Convert to xarray Matthew Wilcox
2017-11-22 21:07   ` Matthew Wilcox
2017-11-22 21:07 ` [PATCH 43/62] brd: Convert to XArray Matthew Wilcox
2017-11-22 21:07   ` Matthew Wilcox
2017-11-22 21:07 ` [PATCH 44/62] xfs: Convert m_perag_tree " Matthew Wilcox
2017-11-22 21:07   ` Matthew Wilcox
2017-11-22 21:07 ` [PATCH 45/62] xfs: Convert pag_ici_root " Matthew Wilcox
2017-11-22 21:07   ` Matthew Wilcox
2017-11-22 21:07 ` [PATCH 46/62] xfs: Convert xfs dquot " Matthew Wilcox
2017-11-22 21:07   ` Matthew Wilcox
2017-11-22 21:07 ` [PATCH 47/62] xfs: Convert mru cache " Matthew Wilcox
2017-11-22 21:07   ` Matthew Wilcox
2017-11-22 21:07 ` [PATCH 48/62] block: Remove IDR preloading Matthew Wilcox
2017-11-22 21:07   ` Matthew Wilcox
2017-11-22 21:07 ` [PATCH 49/62] rxrpc: " Matthew Wilcox
2017-11-22 21:07   ` Matthew Wilcox
2017-11-22 21:07 ` [PATCH 50/62] cgroup: Remove IDR wrappers Matthew Wilcox
2017-11-22 21:07   ` Matthew Wilcox
2017-11-22 21:07 ` [PATCH 51/62] dca: Remove idr_preload calls Matthew Wilcox
2017-11-22 21:07   ` Matthew Wilcox
2017-11-22 21:07 ` [PATCH 52/62] ipc: Remove call to idr_preload Matthew Wilcox
2017-11-22 21:07   ` Matthew Wilcox
2017-11-22 21:07 ` [PATCH 53/62] irq: " Matthew Wilcox
2017-11-22 21:07   ` Matthew Wilcox
2017-11-22 21:07 ` [PATCH 54/62] scsi: Remove idr_preload in st driver Matthew Wilcox
2017-11-22 21:07   ` Matthew Wilcox
2017-11-22 21:07 ` [PATCH 55/62] firewire: Remove call to idr_preload Matthew Wilcox
2017-11-22 21:07   ` Matthew Wilcox
2017-11-22 21:07 ` [PATCH 56/62] drm: Remove drm_minor_lock and idr_preload Matthew Wilcox
2017-11-22 21:07   ` Matthew Wilcox
2017-11-22 21:07 ` [PATCH 57/62] drm: Remove drm_syncobj_fd_to_handle Matthew Wilcox
2017-11-22 21:07   ` Matthew Wilcox
2017-11-22 21:07 ` [PATCH 58/62] drm: Remove qxl driver IDR locks Matthew Wilcox
2017-11-22 21:07   ` Matthew Wilcox
2017-11-22 21:07 ` [PATCH 59/62] drm: Replace virtio IDRs with IDAs Matthew Wilcox
2017-11-22 21:07   ` Matthew Wilcox
2017-11-22 21:07 ` [PATCH 60/62] drm: Replace vmwgfx " Matthew Wilcox
2017-11-22 21:07   ` Matthew Wilcox
2017-11-22 21:07 ` [PATCH 61/62] net: Redesign act_api use of IDR Matthew Wilcox
2017-11-22 21:07   ` Matthew Wilcox
2017-11-22 21:07 ` [PATCH 62/62] mm: Convert page-writeback to XArray Matthew Wilcox
2017-11-22 21:07   ` Matthew Wilcox
2017-11-23  1:25 ` [PATCH 00/62] XArray November 2017 Edition Dave Chinner
2017-11-23  1:25   ` Dave Chinner
2017-11-23  2:46   ` Matthew Wilcox
2017-11-23  2:46     ` Matthew Wilcox
2017-11-24  1:16 ` XArray documentation Matthew Wilcox
2017-11-24  1:16   ` Matthew Wilcox
2017-11-24  4:30   ` Andreas Dilger
2017-11-24 17:17     ` Matthew Wilcox
2017-11-24 17:17       ` Matthew Wilcox
2017-11-24 16:50   ` Martin Steigerwald
2017-11-24 16:50     ` Martin Steigerwald
2017-11-24 17:03     ` Matthew Wilcox
2017-11-24 17:03       ` Matthew Wilcox
2017-11-24 17:03       ` Matthew Wilcox
2017-11-24 18:01       ` Martin Steigerwald
2017-11-24 18:01         ` Martin Steigerwald
2017-11-24 19:48         ` Shakeel Butt
2017-11-24 19:48           ` Shakeel Butt
2017-11-24 19:56           ` Matthew Wilcox
2017-11-24 19:56             ` Matthew Wilcox
2017-11-24 21:18         ` Matthew Wilcox
2017-11-24 21:18           ` Matthew Wilcox
2017-11-24 21:18           ` Matthew Wilcox
2017-11-24 22:02           ` Martin Steigerwald
2017-11-24 22:02             ` Martin Steigerwald
2017-11-24 22:08             ` Matthew Wilcox
2017-11-24 22:08               ` Matthew Wilcox

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.