From: "Emilio G. Cota" <cota@braap.org>
To: Christophe de Dinechin <dinechin@redhat.com>
Cc: kvm@vger.kernel.org, mst@redhat.com, mtosatti@redhat.com,
	Xiao Guangrong <xiaoguangrong@tencent.com>,
	qemu-devel@nongnu.org, peterx@redhat.com, dgilbert@redhat.com,
	quintela@redhat.com, wei.w.wang@intel.com,
	Xiao Guangrong <guangrong.xiao@gmail.com>,
	jiang.biao2@zte.com.cn, Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH v3 2/5] util: introduce threaded workqueue
Date: Tue, 27 Nov 2018 12:39:34 -0500	[thread overview]
Message-ID: <20181127173934.GB12617@flamenco> (raw)
In-Reply-To: <BB9071C9-5BED-4763-99B3-7083B8EB49CF@redhat.com>

On Tue, Nov 27, 2018 at 13:49:13 +0100, Christophe de Dinechin wrote:
> (I did not finish the review, but decided to send what I already had).
> 
> > On 22 Nov 2018, at 08:20, guangrong.xiao@gmail.com wrote:
> > 
> > From: Xiao Guangrong <xiaoguangrong@tencent.com>
> > 
> > This modules implements the lockless and efficient threaded workqueue.
> 
> I’m not entirely convinced that it’s either “lockless” or “efficient”
> in the current iteration. I believe that it’s relatively easy to fix, though.
(snip)
> > 
> >    All the requests are pre-allocated and carefully partitioned between
> >    threads so there is no contention on the request, that make threads
> >    be parallel as much as possible.
> 
> That sentence confused me (it’s also in a comment in the text).
> I think I’m mostly confused by “there is no contention”. Perhaps you
> meant “so as to avoid contention if possible”? If there is a reason
> why there would never be any contention even if requests arrive faster than
> completions, I did not figure it out.
> 
> I personally see serious contention on the fields in the Threads structure,
> for example, but also possibly on the targets of the “modulo” operation in
> thread_find_free_request. Specifically, if three CPUs are entering
> thread_find_free_request at the same time, they will all run the same
> loop and all, presumably, “attack” the same memory locations.
> 
> Sorry if I mis-read the code, but at the moment, it does not seem to
> avoid contention as intended. I don’t see how it could without having
> some way to discriminate between CPUs to start with, which I did not find.

You might have missed that only one thread can submit jobs. So contention
should only happen between that submitter thread and the worker threads,
but not among the worker threads themselves (they should only share cache
lines with the submitter thread).

> > - User, i.e, the submitter
> >    It's the one fills the request and submits it to the workqueue,
> the one -> the one who
> >    the result will be collected after it is handled by the work queue.
> > 
> >    The user can consecutively submit requests without waiting the previous
> waiting -> waiting for
> >    requests been handled.
> >    It only supports one submitter, you should do serial submission by
> >    yourself if you want more, e.g, use lock on you side.
> 
> I’m also confused by this last statement. The proposal purports
> to be “lockless”, which I read as working correctly without a lock…
> Reading the code, I indeed see issues if different threads
> try to place requests at the same time. So I believe the word
> “lockless” is a bit misleading.

ditto, it is lockless as presented here, i.e. one requester thread.

> > +static void uninit_thread_requests(ThreadLocal *thread, int free_nr)
> > +{
> > +    Threads *threads = thread->threads;
> > +    ThreadRequest *request = thread->requests;
> > +    int i;
> > +
> > +    for (i = 0; i < free_nr; i++) {
> > +        threads->ops->thread_request_uninit(request + 1);
> > +        request = (void *)request + threads->request_size;
> 
> Despite GCC’s tolerance for it and rather lengthy debates,
> pointer arithmetic on void * is illegal in C [1].
> 
> Consider using char * arithmetic, and using macros such as:
> 
> #define request_to_payload(req) (((ThreadRequest *) req) + 1)
> #define payload_to_request(req) (((ThreadRequest *) req) - 1)
> #define request_to_next(req, threads) ((ThreadRequest *) ((char *) (req) + (threads)->request_size))
> 
> where appropriate, that would clarify the intent.
> 
> [1] https://stackoverflow.com/questions/3523145/pointer-arithmetic-for-void-pointer-in-c

FWIW, we use void pointer arithmetic in other places in QEMU, so
I wouldn't worry about it being illegal.

I like those little macros though; even better as inlines.

Thanks,

		Emilio
