From: "Emilio G. Cota" <cota@braap.org>
To: Christophe de Dinechin <dinechin@redhat.com>
Cc: kvm@vger.kernel.org, mst@redhat.com, mtosatti@redhat.com,
	Xiao Guangrong <xiaoguangrong@tencent.com>,
	qemu-devel@nongnu.org, peterx@redhat.com, dgilbert@redhat.com,
	quintela@redhat.com, wei.w.wang@intel.com,
	Xiao Guangrong <guangrong.xiao@gmail.com>,
	jiang.biao2@zte.com.cn, Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH v3 2/5] util: introduce threaded workqueue
Date: Tue, 27 Nov 2018 12:39:34 -0500	[thread overview]
Message-ID: <20181127173934.GB12617@flamenco> (raw)
In-Reply-To: <BB9071C9-5BED-4763-99B3-7083B8EB49CF@redhat.com>

On Tue, Nov 27, 2018 at 13:49:13 +0100, Christophe de Dinechin wrote:
> (I did not finish the review, but decided to send what I already had).
> 
> > On 22 Nov 2018, at 08:20, guangrong.xiao@gmail.com wrote:
> > 
> > From: Xiao Guangrong <xiaoguangrong@tencent.com>
> > 
> > This modules implements the lockless and efficient threaded workqueue.
> 
> I’m not entirely convinced that it’s either “lockless” or “efficient”
> in the current iteration. I believe that it’s relatively easy to fix, though.
(snip)
> > 
> >    All the requests are pre-allocated and carefully partitioned between
> >    threads so there is no contention on the request, that make threads
> >    be parallel as much as possible.
> 
> That sentence confused me (it’s also in a comment in the text).
> I think I’m mostly confused by “there is no contention”. Perhaps you
> meant “so as to avoid contention if possible”? If there is a reason
> why there would never be any contention even if requests arrive faster than
> completions, I did not figure it out.
> 
> I personally see serious contention on the fields in the Threads structure,
> for example, but also possibly on the targets of the “modulo” operation in
> thread_find_free_request. Specifically, if three CPUs are entering
> thread_find_free_request at the same time, they will all run the same
> loop and all, presumably, “attack” the same memory locations.
> 
> Sorry if I mis-read the code, but at the moment, it does not seem to
> avoid contention as intended. I don’t see how it could without having
> some way to discriminate between CPUs to start with, which I did not find.

You might have missed that only one thread can submit jobs. So contention
should only happen between that submitter thread and the worker threads,
but not among the worker threads themselves (they should only share cache
lines with the submitter thread).

> > - User, i.e, the submitter
> >    It's the one fills the request and submits it to the workqueue,
> the one -> the one who
> >    the result will be collected after it is handled by the work queue.
> > 
> >    The user can consecutively submit requests without waiting the previous
> waiting -> waiting for
> >    requests been handled.
> >    It only supports one submitter, you should do serial submission by
> >    yourself if you want more, e.g, use lock on you side.
> 
> I’m also confused by this last statement. The proposal purports
> to be “lockless”, which I read as working correctly without a lock…
> Reading the code, I indeed see issues if different threads
> try to place requests at the same time. So I believe the word
> “lockless” is a bit misleading.

ditto, it is lockless as presented here, i.e. one requester thread.

> > +static void uninit_thread_requests(ThreadLocal *thread, int free_nr)
> > +{
> > +    Threads *threads = thread->threads;
> > +    ThreadRequest *request = thread->requests;
> > +    int i;
> > +
> > +    for (i = 0; i < free_nr; i++) {
> > +        threads->ops->thread_request_uninit(request + 1);
> > +        request = (void *)request + threads->request_size;
> 
> Despite GCC’s tolerance for it and rather lengthy debates,
> pointer arithmetic on void * is illegal in C [1].
> 
> Consider using char * arithmetic, and using macros such as:
> 
> #define request_to_payload(req) (((ThreadRequest *) req) + 1)
> #define payload_to_request(req) (((ThreadRequest *) req) - 1)
> #define request_to_next(req, threads) ((ThreadRequest *) ((char *) (req) + (threads)->request_size))
> 
> where appropriate, that would clarify the intent.
> 
> [1] https://stackoverflow.com/questions/3523145/pointer-arithmetic-for-void-pointer-in-c

FWIW, we use void pointer arithmetic in other places in QEMU, so
I wouldn't worry about it being illegal.

I like those little macros though; even better as inlines.

Thanks,

		Emilio
