From: Tejun Heo <tj@kernel.org>
To: Waiman Long <waiman.long@hpe.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
Jan Kara <jack@suse.com>, Jeff Layton <jlayton@poochiereds.net>,
"J. Bruce Fields" <bfields@fieldses.org>,
Christoph Lameter <cl@linux-foundation.org>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Andi Kleen <andi@firstfloor.org>,
Dave Chinner <dchinner@redhat.com>,
Boqun Feng <boqun.feng@gmail.com>,
Scott J Norton <scott.norton@hpe.com>,
Douglas Hatch <doug.hatch@hpe.com>
Subject: Re: [PATCH v3 1/4] lib/dlock-list: Distributed and lock-protected lists
Date: Tue, 19 Jul 2016 15:23:33 -0400
Message-ID: <20160719192333.GP3078@mtj.duckdns.org>
In-Reply-To: <578E7497.30602@hpe.com>
Hello,
On Tue, Jul 19, 2016 at 02:42:31PM -0400, Waiman Long wrote:
> On 07/18/2016 07:38 PM, Tejun Heo wrote:
> > > +struct dlock_list_node {
> > > + struct list_head list;
> > > + spinlock_t *lockptr;
> > > +};
> > Wouldn't it be better to point to dlock_list_percpu?
>
> I could. However, the only thing that matters is the spinlock that protects
> the list entry.
Yeah, we can get back to this when it's actually necessary. It just
looked a bit weird to me.
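A minimal userspace sketch of the alternative layout, where the node records the per-CPU sublist it sits on rather than a bare lock pointer. The names (`struct dlock_list_percpu`, `sketch_add()`, `node_lock()`) and the C11-atomics spinlock standing in for `spinlock_t` are hypothetical, not the patch's API. The lock stays one dereference away, and the pointer also identifies which sublist the node belongs to.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Minimal intrusive list: a userspace stand-in for <linux/list.h>. */
struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add(struct list_head *n, struct list_head *h)
{
	n->next = h->next;
	n->prev = h;
	h->next->prev = n;
	h->next = n;
}

/* Hypothetical test-and-set spinlock standing in for spinlock_t. */
typedef struct { atomic_int held; } sim_spinlock_t;

static void sim_spin_lock(sim_spinlock_t *l)
{
	while (atomic_exchange_explicit(&l->held, 1, memory_order_acquire))
		;	/* spin until the previous holder releases */
}

static void sim_spin_unlock(sim_spinlock_t *l)
{
	atomic_store_explicit(&l->held, 0, memory_order_release);
}

/* Per-CPU sublist: the list and the lock that protects it. */
struct dlock_list_percpu {
	struct list_head list;
	sim_spinlock_t lock;
};

/* Alternative node layout: point at the sublist, not only its lock. */
struct dlock_list_node {
	struct list_head list;
	struct dlock_list_percpu *head;	/* NULL while not on any list */
};

static void sketch_add(struct dlock_list_node *node,
		       struct dlock_list_percpu *pcpu)
{
	sim_spin_lock(&pcpu->lock);
	list_add(&node->list, &pcpu->list);
	node->head = pcpu;
	sim_spin_unlock(&pcpu->lock);
}

/*
 * The lock the original lockptr field held is still one step away.
 * Unlocked read: a racy caller would have to recheck under the lock.
 */
static sim_spinlock_t *node_lock(const struct dlock_list_node *node)
{
	return node->head ? &node->head->lock : NULL;
}
```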
> > > +/*
> > > + * The dlock list iteration functions which return true if iteration has
> > > + * to be continued.
> > > + */
> > > +extern bool dlock_list_next(struct dlock_list_head *dlist,
> > > + struct dlock_list_iter *iter);
> > > +extern bool dlock_list_next_safe(struct dlock_list_head *dlist,
> > > + struct dlock_list_iter *iter);
> > Why not return dlock_list_node * for the current node? That'd be more
> > conventional and allows dlock_list_iter to be opaque.
>
> Yes, I can make it return dlock_list_node *.
>
> However, to make dlock_list_iter opaque, I will have to dynamically allocate
> the structure. That will add extra memory allocation and free calls, as well
> as the need to handle the out-of-memory error case. I don't think that is
> worth doing at this point.
Sure, keep it defined in the header file. Just don't require users to
reach into it, and add a comment saying that the struct is opaque to
its users.
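A minimal userspace sketch of the interface being agreed on here: `dlock_list_next()` returns the next node (or NULL at the end), and `struct dlock_list_iter` stays in the header but is documented as opaque. `NR_SIM_CPUS`, `dlock_list_iter_init()`, the plain per-CPU array, and the iterator's field names are assumptions for illustration, and the per-CPU locking a real iterator needs is left out.

```c
#include <assert.h>
#include <stddef.h>

#define NR_SIM_CPUS 4	/* hypothetical fixed number of CPUs */

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

struct dlock_list_node { struct list_head list; };
struct dlock_list_head { struct list_head lists[NR_SIM_CPUS]; };

/*
 * Defined in the header so callers can keep it on the stack, but opaque:
 * callers must not read or write these fields directly.
 */
struct dlock_list_iter {
	int cpu;		/* sublist being walked, -1 before the start */
	struct list_head *pos;	/* last entry returned, or sublist head */
};

static void dlock_list_iter_init(struct dlock_list_iter *iter)
{
	iter->cpu = -1;
	iter->pos = NULL;
}

/* Return the next node, or NULL once every sublist has been walked. */
static struct dlock_list_node *dlock_list_next(struct dlock_list_head *dlist,
					       struct dlock_list_iter *iter)
{
	if (iter->cpu >= NR_SIM_CPUS)
		return NULL;		/* already exhausted */
	for (;;) {
		if (iter->pos) {
			iter->pos = iter->pos->next;
			if (iter->pos != &dlist->lists[iter->cpu])
				return container_of(iter->pos,
						    struct dlock_list_node, list);
		}
		if (++iter->cpu >= NR_SIM_CPUS)
			return NULL;	/* no sublists left */
		iter->pos = &dlist->lists[iter->cpu];
	}
}
```

A caller then loops with `while ((node = dlock_list_next(&dlist, &iter)))` and never touches the iterator's fields.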
> > > +int alloc_dlock_list_head(struct dlock_list_head *dlist)
> > > +{
> > > + struct dlock_list_head dlist_tmp;
> > > + int cpu;
> > > +
> > > + dlist_tmp.head = alloc_percpu(struct dlock_list_head_percpu);
> > > + if (!dlist_tmp.head)
> > > + return -ENOMEM;
> > > +
> > > + for_each_possible_cpu(cpu) {
> > > + struct dlock_list_head_percpu *head;
> > > +
> > > + head = per_cpu_ptr(dlist_tmp.head, cpu);
> > > + INIT_LIST_HEAD(&head->list);
> > > + head->lock = __SPIN_LOCK_UNLOCKED(&head->lock);
> > > + lockdep_set_class(&head->lock,&dlock_list_key);
> > > + }
> > > +
> > > + dlist->head = dlist_tmp.head;
> > Just use dlist->head directly, or use a local __percpu head pointer?
>
> I just don't want to expose the structure to the world until it is fully
> initialized. If you think I am being over-cautious, I can use dlist->head as
> suggested.
I don't think it makes any actual difference. No strong opinion
either way. Just use a local __percpu head pointer then?
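A minimal userspace sketch of the suggested shape: initialize through a local pointer (a `__percpu` pointer in the kernel) and store it into `dlist->head` last, which keeps the publish-when-ready ordering without the temporary `struct dlock_list_head`. Here `calloc()` stands in for `alloc_percpu()`, a loop over the hypothetical `NR_SIM_CPUS` stands in for `for_each_possible_cpu()`, lockdep setup is omitted, and `sketch_free_dlock_list_head()` is a hypothetical counterpart added only so the example cleans up.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdlib.h>

#define NR_SIM_CPUS 4	/* hypothetical stand-in for the possible-CPU mask */

struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

typedef struct { atomic_int held; } sim_spinlock_t;	/* spinlock_t stand-in */

struct dlock_list_head_percpu {
	struct list_head list;
	sim_spinlock_t lock;
};

struct dlock_list_head {
	struct dlock_list_head_percpu *head;	/* __percpu in the kernel */
};

static int alloc_dlock_list_head(struct dlock_list_head *dlist)
{
	struct dlock_list_head_percpu *head;	/* local, not yet visible */
	int cpu;

	head = calloc(NR_SIM_CPUS, sizeof(*head));	/* alloc_percpu() */
	if (!head)
		return -ENOMEM;

	for (cpu = 0; cpu < NR_SIM_CPUS; cpu++) {	/* for_each_possible_cpu() */
		INIT_LIST_HEAD(&head[cpu].list);
		atomic_init(&head[cpu].lock.held, 0);	/* unlocked */
	}

	dlist->head = head;	/* publish only once fully initialized */
	return 0;
}

static void sketch_free_dlock_list_head(struct dlock_list_head *dlist)
{
	free(dlist->head);	/* free_percpu() in the kernel */
	dlist->head = NULL;
}
```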
> > > + return 0;
> > > +}
> > > +EXPORT_SYMBOL(alloc_dlock_list_head);
> > Does this actually need to be exported? If so, it might be a better
> > idea to start with EXPORT_SYMBOL_GPL().
>
> For the current use case, we probably don't need to export the symbols.
> Other use cases may require that. I will change it to use the GPL version
> instead.
If it's not immediately necessary, it's best to not export at all.
> > > +void dlock_list_del(struct dlock_list_node *node)
> > > +{
> > > + spinlock_t *lock = READ_ONCE(node->lockptr);
> > > +
> > > + if (unlikely(!lock)) {
> > > + WARN_ONCE(1,
> > > + "dlock_list_del: node 0x%lx has no associated lock\n",
> > > + (unsigned long)node);
> > Maybe "if (WARN_ONCE(!lock...)"? WARN_ONCE implies unlikely.
>
> OK, will do that.
>
> > > + return;
> > > + }
> > > +
> > > + spin_lock(lock);
> > > + if (likely(lock == node->lockptr)) {
> > > + list_del_init(&node->list);
> > > + node->lockptr = NULL;
> > > + } else {
> > > + /*
> > > + * This path should never be executed.
> > > + */
> > > + WARN_ON_ONCE(1);
> > > + }
> > This still kinda bothers me because this pretty much requires the
> > users to have strong synchronization around the operations and makes
> > it unusable in situations where opportunistic behaviors are
> > acceptable. It negates the usefulness quite a bit.
>
> I understand your concern. I will make it retry again with the new lock.
It doesn't necessarily have to retry, but it shouldn't break down when
used in an opportunistic, racy way. For example, if adds and removes
race, the order of operations isn't clearly defined as such, so any
outcome is fine as long as the list maintains its integrity.
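A minimal userspace sketch of a removal that tolerates racing callers instead of warning: snapshot the lock pointer, take that lock, confirm the node is still covered by it, and otherwise drop it and try again. A NULL pointer is treated as "not on a list" and returns quietly, which is one possible reading of the point above rather than what the patch does. The C11-atomics spinlock stands in for `spinlock_t`, a relaxed `atomic_load_explicit()` stands in for `READ_ONCE()`, and `sketch_add()`/`sketch_del()` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add(struct list_head *n, struct list_head *h)
{
	n->next = h->next;
	n->prev = h;
	h->next->prev = n;
	h->next = n;
}

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	INIT_LIST_HEAD(e);
}

/* Hypothetical test-and-set spinlock standing in for spinlock_t. */
typedef struct { atomic_int held; } sim_spinlock_t;

static void sim_spin_lock(sim_spinlock_t *l)
{
	while (atomic_exchange_explicit(&l->held, 1, memory_order_acquire))
		;
}

static void sim_spin_unlock(sim_spinlock_t *l)
{
	atomic_store_explicit(&l->held, 0, memory_order_release);
}

struct dlock_list_head_percpu {
	struct list_head list;
	sim_spinlock_t lock;
};

struct dlock_list_node {
	struct list_head list;
	_Atomic(sim_spinlock_t *) lockptr;	/* NULL when not on a list */
};

static void sketch_add(struct dlock_list_node *node,
		       struct dlock_list_head_percpu *pcpu)
{
	sim_spin_lock(&pcpu->lock);
	list_add(&node->list, &pcpu->list);
	atomic_store_explicit(&node->lockptr, &pcpu->lock, memory_order_relaxed);
	sim_spin_unlock(&pcpu->lock);
}

static void sketch_del(struct dlock_list_node *node)
{
	sim_spinlock_t *lock;

	for (;;) {
		/* Unlocked snapshot, like READ_ONCE(node->lockptr). */
		lock = atomic_load_explicit(&node->lockptr, memory_order_relaxed);
		if (!lock)
			return;	/* never added, or removed by a racing caller */

		sim_spin_lock(lock);
		/* Recheck under the lock: still on the sublist it protects? */
		if (atomic_load_explicit(&node->lockptr, memory_order_relaxed) == lock)
			break;
		sim_spin_unlock(lock);	/* node changed lists meanwhile: retry */
	}

	list_del_init(&node->list);
	atomic_store_explicit(&node->lockptr, NULL, memory_order_relaxed);
	sim_spin_unlock(lock);
}
```

Retrying is a policy choice: since racing adds and removes have no defined order, returning early would also be acceptable. The property that keeps the list intact is that the links and `lockptr` only change while the relevant sublist lock is held.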
> > > +/**
> > > + * dlock_list_next_safe - Removal-safe iterator of dlock list
> > > + * @dlist: Pointer to the dlock_list_head structure
> > > + * @iter : Pointer to the dlock list iterator structure
> > > + * Return: true if the next entry is found, false if all the entries iterated
> > > + *
> > > + * The iterator has to be properly initialized before calling this function.
> > > + * This iteration function is safe with respect to list entry removal.
> > > + * However, it cannot correctly iterate newly added entries right after the
> > > + * current one.
> > > + */
> > This still looks wrong to me. If you want to provide the two variants
> > of iterations, can't you just implement one next function and build
> > the two types of iterations on top of it?
>
> I have been thinking about making dlock_list_next_cpu() the real external
> function and having two inline functions that implement dlock_list_next() and
> dlock_list_next_safe(). That may strike a better balance between performance
> and code abstraction. I will do so if you have no objection to that.
Yeah, please give it a try. As mentioned in another reply, it'd
probably be best to provide an iteration macro which encapsulates the
whole thing.
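A minimal userspace sketch of that structure, assuming a single out-of-line `dlock_list_next_cpu()` that advances to the next non-empty sublist, two inline walkers built on it, and hypothetical `dlist_for_each_entry()`/`dlist_for_each_entry_safe()` macros that hide the iterator from callers. `NR_SIM_CPUS`, the iterator fields, and the macro names are illustrative, and the per-CPU locking the real code needs is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_SIM_CPUS 4	/* hypothetical fixed number of CPUs */

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	INIT_LIST_HEAD(e);
}

struct dlock_list_node { struct list_head list; };
struct dlock_list_head { struct list_head lists[NR_SIM_CPUS]; };

struct dlock_list_iter {		/* opaque to users */
	int cpu;			/* current sublist, -1 before start */
	struct list_head *head;		/* head of the current sublist */
	struct list_head *curr;		/* entry last returned */
	struct list_head *next;		/* cached successor for _safe */
};

static void dlock_list_iter_init(struct dlock_list_iter *iter)
{
	iter->cpu = -1;
	iter->head = iter->curr = iter->next = NULL;
}

/* The one real "next" step: move to the next non-empty sublist. */
static bool dlock_list_next_cpu(struct dlock_list_head *dlist,
				struct dlock_list_iter *iter)
{
	while (iter->cpu + 1 < NR_SIM_CPUS) {
		iter->head = &dlist->lists[++iter->cpu];
		if (iter->head->next != iter->head) {
			iter->curr = iter->head->next;
			iter->next = iter->curr->next;
			return true;
		}
	}
	iter->curr = NULL;
	return false;
}

/* Plain walk: follows curr->next, so the body must not remove curr. */
static inline struct dlock_list_node *
dlock_list_next(struct dlock_list_head *dlist, struct dlock_list_iter *iter)
{
	if (iter->curr && iter->curr->next != iter->head)
		iter->curr = iter->curr->next;
	else if (!dlock_list_next_cpu(dlist, iter))
		return NULL;
	return container_of(iter->curr, struct dlock_list_node, list);
}

/* Removal-safe walk: uses the successor cached before the body ran. */
static inline struct dlock_list_node *
dlock_list_next_safe(struct dlock_list_head *dlist, struct dlock_list_iter *iter)
{
	if (iter->curr && iter->next != iter->head) {
		iter->curr = iter->next;
		iter->next = iter->curr->next;
	} else if (!dlock_list_next_cpu(dlist, iter)) {
		return NULL;
	}
	return container_of(iter->curr, struct dlock_list_node, list);
}

#define dlist_for_each_entry(pos, dlist, iter)				\
	for (dlock_list_iter_init(iter);				\
	     ((pos) = dlock_list_next((dlist), (iter))) != NULL; )

#define dlist_for_each_entry_safe(pos, dlist, iter)			\
	for (dlock_list_iter_init(iter);				\
	     ((pos) = dlock_list_next_safe((dlist), (iter))) != NULL; )
```

Only the in-sublist step differs between the two walkers; crossing sublists always goes through the one shared function.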
Thanks.
--
tejun