linux-fsdevel.vger.kernel.org archive mirror
From: Matthew Wilcox <willy@infradead.org>
To: Manfred Spraul <manfred@colorfullife.com>
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH] ipc: Convert ipcs_idr to XArray
Date: Mon, 20 Apr 2020 10:06:03 -0700
Message-ID: <20200420170603.GC5820@bombadil.infradead.org>
In-Reply-To: <80ab3182-5a17-7434-9007-33eb1da46d85@colorfullife.com>

On Mon, Apr 20, 2020 at 05:35:20PM +0200, Manfred Spraul wrote:
> > -		max_idx = max(ids->in_use*3/2, ipc_min_cycle);
> > -		max_idx = min(max_idx, ipc_mni);
> > -
> > -		/* allocate the idx, with a NULL struct kern_ipc_perm */
> > -		idx = idr_alloc_cyclic(&ids->ipcs_idr, NULL, 0, max_idx,
> > -					GFP_NOWAIT);
> > -
> > -		if (idx >= 0) {
> > -			/*
> > -			 * idx got allocated successfully.
> > -			 * Now calculate the sequence number and set the
> > -			 * pointer for real.
> > -			 */
> > -			if (idx <= ids->last_idx) {
> > +		min_idx = ids->next_idx;
> > +		new->seq = ids->seq;
> > +
> > +		/* Modified version of __xa_alloc */
> > +		do {
> > +			xas.xa_index = min_idx;
> > +			xas_find_marked(&xas, max_idx, XA_FREE_MARK);
> > +			if (xas.xa_node == XAS_RESTART && min_idx > 0) {
> >   				ids->seq++;
> >   				if (ids->seq >= ipcid_seq_max())
> >   					ids->seq = 0;
> > +				new->seq = ids->seq;
> > +				xas.xa_index = 0;
> > +				min_idx = 0;
> > +				xas_find_marked(&xas, max_idx, XA_FREE_MARK);
> >   			}
> 
> Is it necessary to have that many details of the XArray in ipc/util?
> 
> This function is not performance critical.
> 
> The core requirement is that ipc_obtain_object_check() must scale.
> 
> Would it be possible to use something like
> 
>     xa_alloc(,entry=NULL,)
> 
>     new->seq = ...
> 
>     xa_store(,entry=new,);

Yes, that would absolutely be possible, and some users do exactly that.
I thought that creating a new message queue / semaphore set / shared
memory segment would be performance sensitive.
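
Manfred's two-step suggestion could look roughly like this (a hedged sketch, not the patch under review: it assumes ids->ipcs was initialised with XA_FLAGS_ALLOC, that ids->next_idx is a u32 cursor, and it elides error paths):

```c
	/* Reserve the index with a NULL entry first, then publish the
	 * fully-initialised object.  xa_alloc_cyclic() returns 1 when
	 * the allocation wrapped, which is when the sequence number
	 * must be bumped.
	 */
	u32 idx;
	int err;

	err = xa_alloc_cyclic(&ids->ipcs, &idx, NULL,
			      XA_LIMIT(0, max_idx), &ids->next_idx,
			      GFP_KERNEL);
	if (err < 0)
		return err;
	if (err == 1) {		/* wrapped: advance the sequence number */
		if (++ids->seq >= ipcid_seq_max())
			ids->seq = 0;
	}

	new->seq = ids->seq;
	new->id = (new->seq << ipcmni_seq_shift()) + idx;

	/* xa_store() contains the release barrier that the IDR version
	 * got from rcu_assign_pointer(), so lockless readers only see a
	 * fully-initialised object.
	 */
	xa_store(&ids->ipcs, idx, new);
```

The cost over the open-coded xas loop is one extra tree walk per creation, which only matters if object creation is truly hot.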

> > -			ids->last_idx = idx;
> > -
> > -			new->seq = ids->seq;
> > -			/* no need for smp_wmb(), this is done
> > -			 * inside idr_replace, as part of
> > -			 * rcu_assign_pointer
> > -			 */
> 
> Could you leave the memory barrier comments in the code?
> 
> The rcu_assign_pointer() is the first hands-off from semget() or msgget().
> 
> Before the rcu_assign_pointer(), e.g. semop() calls would return -EINVAL.
> 
> After the rcu_assign_pointer(), semwhatever() must work, and e.g. the
> permission checks are lockless.

How about simply:
			/* xa_store contains a memory barrier */

> > -			idr_replace(&ids->ipcs_idr, new, idx);
> > -		}
> > +			if (xas.xa_node == XAS_RESTART)
> > +				xas_set_err(&xas, -ENOSPC);
> > +			else
> > +				new->id = (new->seq << ipcmni_seq_shift()) +
> > +					xas.xa_index;
> 
> Setting new->id should remain at the end, outside any locking:
> 
> The variable has no special protection, access is only allowed after proper
> locking, thus no need to have the initialization in the middle.
> 
> What is crucial is that the final value of new->seq is visible to all CPUs
> before storing the pointer.

The IPC locking is weird.  Most users spin_lock the xarray/idr/radix
tree for modifications, and on the read-side use RCU to protect the
lookup and a refcount on the object looked up in it (after which,
RCU is unlocked).  IPC seems to hold the RCU lock much longer, and it
has a per-object spinlock rather than refcount.  And it has an rwsem.
It feels like it could be much simpler, but I haven't had time to dig
into it and figure out why it's so different from all the other users.
Maybe it's just older code.
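
For contrast, the read-side pattern most other users follow looks roughly like this (a generic sketch, not IPC code; the object type and its kref member are placeholders):

```c
	/* RCU protects only the lookup; a refcount taken under RCU
	 * keeps the object alive once the read lock is dropped.
	 */
	struct my_obj *obj;

	rcu_read_lock();
	obj = xa_load(&objects, id);
	if (obj && !kref_get_unless_zero(&obj->ref))
		obj = NULL;	/* concurrently being freed: treat as absent */
	rcu_read_unlock();

	/* From here on obj, if non-NULL, is pinned by its refcount,
	 * so no long-lived RCU section or per-object spinlock is needed.
	 */
```
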

> > +			xas_store(&xas, new);
> > +			xas_clear_mark(&xas, XA_FREE_MARK);
> > +		} while (__xas_nomem(&xas, GFP_KERNEL));
> > +
> 
> Just for my curiosity:
> 
> If the xas is in an invalid state, then xas_store() will not store anything.
> Thus the loop will not store "new" multiple times; it will be stored only
> once.

Correct, although we're going to delete this loop entirely.

> > @@ -472,7 +487,7 @@ void ipc_rmid(struct ipc_ids *ids, struct kern_ipc_perm *ipcp)
> >   			idx--;
> >   			if (idx == -1)
> >   				break;
> > -		} while (!idr_find(&ids->ipcs_idr, idx));
> > +		} while (!xa_load(&ids->ipcs, idx));
> >   		ids->max_idx = idx;
> >   	}
> >   }
> 
> Is there an xa_find_last() function?
> 
> It is outside of any hot path, I have a patch that does a binary search with
> idr_get_next().
> 
> If there is no xa_find_last(), then I would rebase that patch.

There is not currently an xa_find_last().  It shouldn't be too hard
to add; start at the top of the tree and walk backwards in each node
until finding a non-NULL entry.  Of course, it'll need documentation
and test cases.
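
Until such a helper exists, the same result is available with the current API, though linearly rather than via the backwards node walk described above (a hedged illustration, not proposed kernel code; xa_find_last_slow is a hypothetical name):

```c
/* O(n) stand-in for a real xa_find_last(): scan forward and remember
 * the last present index.  A proper implementation would start at the
 * top of the tree and walk each node backwards to a non-NULL slot.
 */
static unsigned long xa_find_last_slow(struct xarray *xa, bool *found)
{
	unsigned long index, last = 0;
	void *entry;

	*found = false;
	xa_for_each(xa, index, entry) {
		last = index;
		*found = true;
	}
	return last;
}
```
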


Thread overview: 7+ messages
2020-03-26 15:14 [PATCH] ipc: Convert ipcs_idr to XArray Matthew Wilcox
2020-04-18 20:15 ` Andrew Morton
2020-04-18 21:34   ` Matthew Wilcox
2020-04-18 22:08     ` Andrew Morton
2020-04-20 15:35 ` Manfred Spraul
2020-04-20 17:06   ` Matthew Wilcox [this message]
2020-04-21 18:33     ` Matthew Wilcox
