linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Design of xa_alloc
@ 2019-01-17 20:08 Matthew Wilcox
  2019-01-22  1:53 ` Matthew Wilcox
  0 siblings, 1 reply; 2+ messages in thread
From: Matthew Wilcox @ 2019-01-17 20:08 UTC (permalink / raw)
  To: linux-kernel


As part of getting rid of the radix tree, I need to replace idr_alloc()
with xa_alloc().  xa_alloc() takes the internal xa_lock (and has
xa_alloc_irq(), xa_alloc_bh() and __xa_alloc() variations which handle
the lock the way you probably expect them to).

It would, of course, be possible to make the xa_alloc() API identical
to idr_alloc().  idr_alloc() has been a phenomenal success with around
200 callers in the kernel today.  But it could be improved, and this
seems like a good time to make the API changes.

int idr_alloc(struct idr *, void *ptr, int start, int end, gfp_t);

1. It is not possible to allocate an ID above 2^31.  This led to the
   introduction of idr_alloc_u32 (which has its own problems I'll talk
   about later)
2. It's hard to insert a fully-initialised object into an IDR in a
   way that is safe for RCU-based lookups; the object needs to be marked
   as not-fully-initialised, then inserted, then the ID updated, then the
   object marked as fully-initialised.  Many users choose an "invalid" ID
   for this purpose, or make two calls, first allocating a NULL pointer,
   then replacing it with the actual object.
3. Allocating a specific ID is a little awkward, passing 'id, id + 1' as
   start & end.
4, The meaning of the 'end' argument has been misunderstood by some users.
   It's exclusive, not inclusive, so writing 'INT_MAX' means you
   can allocate up to INT_MAX-1.  This probably doesn't cause any bugs,
   but it's not great.
5. Nobody actually uses the ability to allocate an ID between 'a' and
   'a + n' which is the rationale for how idr_alloc interprets negative
   'end'.
6. Using a non-zero 'start' is inefficient.  I made a start on solving
   this with IDR_INIT_BASE(), but this is a good opportunity to solve
   it for the whole kernel.

As for idr_alloc_u32(), it's just too easy to forget to initialise *id
before calling it, and there's no realistic way to get gcc to warn for
that mistake.  It does solve problems 2, 3 and 5, but it's a bad tradeoff.
The current xa_alloc() in the tree is modelled after idr_alloc_u32(), but
I feel it should be fixed.

I've been through a number of variations of this interface, but this is
what I've currently settled on:

/**
 * xa_alloc() - Find somewhere to store this entry in the XArray.
 * @xa: XArray.
 * @id: Pointer to ID.
 * @entry: New entry.
 * @limit: Range of ID to allocate.
 * @gfp: Memory allocation flags.
 *
 * Finds an empty entry in @xa between @limit.min and @limit.max,
 * stores the index into the @id pointer, then stores the entry at
 * that index.  A concurrent lookup will not see an uninitialised @id.
 *
 * Context: Any context.  Takes and releases the xa_lock.  May sleep if
 * the @gfp flags permit.
 * Return: 0 on success, -ENOMEM if memory could not be allocated or
 * -ENOSPC if there are no free entries in @limit.
 */
static inline __must_check int xa_alloc(struct xarray *xa, u32 *id,
                void *entry, struct xa_limit limit, gfp_t gfp)

Problems 1 & 2 are solved by passing a pointer to the ID, but using it
only as an output, avoiding the initialisation problem of idr_alloc_u32().
There are a few users who store the allocated ID in a u16, but this
isn't too hard to work around.

Problem 3 is solved by using xa_insert() instead of xa_alloc().  It does
return a different errno (-EEXIST) instead of -ENOSPC, but most users
convert the errno to something else anyway because -ENOSPC is a horrible
errno to return to userspace.

Problem 4 is solved by interpreting the 'max' element of xa_limit
inclusively.  The xa_limit is a little complicated to explain, but easy
to use.  Most users will use one of two predefined ranges; xa_limit_32b
or xa_limit_31b which correspond to the ranges [0 - UINT_MAX] and [0 -
INT_MAX] respectively.  The remaining users will pass XA_LIMIT(x, y)
as an argument.

Problem 6 is solved by having two ways of declaring an allocating XArray:

#define DEFINE_XARRAY_ALLOC(name) DEFINE_XARRAY_FLAGS(name, XA_FLAGS_ALLOC)
#define DEFINE_XARRAY_ALLOC1(name) DEFINE_XARRAY_FLAGS(name, XA_FLAGS_ALLOC1)

Supporting bases other than 0 and 1 seems like more effort than it's worth.


Here's an example conversion which is fairly typical, drivers/scsi/st.c:

-static DEFINE_SPINLOCK(st_index_lock);
-static DEFINE_IDR(st_index_idr);
+static DEFINE_XARRAY_ALLOC(st_index);

[...]

-       idr_preload(GFP_KERNEL);
-       spin_lock(&st_index_lock);
-       error = idr_alloc(&st_index_idr, tpnt, 0, ST_MAX_TAPES + 1, GFP_NOWAIT);
-       spin_unlock(&st_index_lock);
-       idr_preload_end();
+       error = xa_alloc(&st_index, &tpnt->index, tpnt,
+                       XA_LIMIT(0, ST_MAX_TAPES), GFP_KERNEL);
        if (error < 0) {
                pr_warn("st: idr allocation failed: %d\n", error);
                goto out_put_queue;
        }
-       tpnt->index = error;


Similarly, we also need an xa_alloc_cyclic, which can be found here:
http://git.infradead.org/users/willy/linux-dax.git/commitdiff/9bbcfb45052235ede806b0a84dcaacc27a5c2f66


^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: Design of xa_alloc
  2019-01-17 20:08 Design of xa_alloc Matthew Wilcox
@ 2019-01-22  1:53 ` Matthew Wilcox
  0 siblings, 0 replies; 2+ messages in thread
From: Matthew Wilcox @ 2019-01-22  1:53 UTC (permalink / raw)
  To: linux-kernel

On Thu, Jan 17, 2019 at 12:08:17PM -0800, Matthew Wilcox wrote:
> It would, of course, be possible to make the xa_alloc() API identical
> to idr_alloc().  idr_alloc() has been a phenomenal success with around
> 200 callers in the kernel today.  But it could be improved, and this
> seems like a good time to make the API changes.

> Problem 3 is solved by using xa_insert() instead of xa_alloc().  It does
> return a different errno (-EEXIST) instead of -ENOSPC, but most users
> convert the errno to something else anyway because -ENOSPC is a horrible
> errno to return to userspace.

Writing this down and then thinking about it more makes me think that
this is the right time to change the return code of xa_alloc().  I chose
-ENOSPC to be compatible with idr_alloc(), but bugs like
https://bugzilla.kernel.org/show_bug.cgi?id=202297
show that returning an -ENOSPC error to userspace is just confusing:

: # tc filter add dev ipenc0 parent 2:0 handle ::102 protocol ip prio 1 u32 match ip dst 224.44.44.4/32 flowid 2:102
: RTNETLINK answers: No space left on device
: We have an error talking to the kernel

I'm really tempted to make it return -EEXIST instead of -ENOSPC to match
xa_insert().

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2019-01-22  1:53 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-17 20:08 Design of xa_alloc Matthew Wilcox
2019-01-22  1:53 ` Matthew Wilcox

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).