From: "Michael S. Tsirkin" <mst@redhat.com> To: Mike Christie <michael.christie@oracle.com> Cc: hch@infradead.org, stefanha@redhat.com, jasowang@redhat.com, sgarzare@redhat.com, virtualization@lists.linux-foundation.org, brauner@kernel.org, ebiederm@xmission.com, torvalds@linux-foundation.org, konrad.wilk@oracle.com, linux-kernel@vger.kernel.org Subject: Re: [PATCH v11 8/8] vhost: use vhost_tasks for worker threads Date: Sun, 13 Aug 2023 15:01:24 -0400 [thread overview] Message-ID: <20230813145936-mutt-send-email-mst@kernel.org> (raw) In-Reply-To: <b2b02526-913d-42a9-9d23-59badf5b96db@oracle.com> On Fri, Aug 11, 2023 at 01:51:36PM -0500, Mike Christie wrote: > On 8/10/23 1:57 PM, Michael S. Tsirkin wrote: > > On Sat, Jul 22, 2023 at 11:03:29PM -0500, michael.christie@oracle.com wrote: > >> On 7/20/23 8:06 AM, Michael S. Tsirkin wrote: > >>> On Thu, Feb 02, 2023 at 05:25:17PM -0600, Mike Christie wrote: > >>>> For vhost workers we use the kthread API which inherit's its values from > >>>> and checks against the kthreadd thread. This results in the wrong RLIMITs > >>>> being checked, so while tools like libvirt try to control the number of > >>>> threads based on the nproc rlimit setting we can end up creating more > >>>> threads than the user wanted. > >>>> > >>>> This patch has us use the vhost_task helpers which will inherit its > >>>> values/checks from the thread that owns the device similar to if we did > >>>> a clone in userspace. The vhost threads will now be counted in the nproc > >>>> rlimits. And we get features like cgroups and mm sharing automatically, > >>>> so we can remove those calls. > >>>> > >>>> Signed-off-by: Mike Christie <michael.christie@oracle.com> > >>>> Acked-by: Michael S. Tsirkin <mst@redhat.com> > >>> > >>> > >>> Hi Mike, > >>> So this seems to have caused a measureable regression in networking > >>> performance (about 30%). Take a look here, and there's a zip file > >>> with detailed measuraments attached: > >>> > >>> https://bugzilla.redhat.com/show_bug.cgi?id=2222603 > >>> > >>> > >>> Could you take a look please? > >>> You can also ask reporter questions there assuming you > >>> have or can create a (free) account. > >>> > >> > >> Sorry for the late reply. I just got home from vacation. > >> > >> The account creation link seems to be down. I keep getting a > >> "unable to establish SMTP connection to bz-exim-prod port 25 " error. > >> > >> Can you give me Quan's email? > >> > >> I think I can replicate the problem. I just need some extra info from Quan: > >> > >> 1. Just double check that they are using RHEL 9 on the host running the VMs. > >> 2. The kernel config > >> 3. Any tuning that was done. Is tuned running in guest and/or host running the > >> VMs and what profile is being used in each. > >> 4. Number of vCPUs and virtqueues being used. > >> 5. Can they dump the contents of: > >> > >> /sys/kernel/debug/sched > >> > >> and > >> > >> sysctl -a > >> > >> on the host running the VMs. > >> > >> 6. With the 6.4 kernel, can they also run a quick test and tell me if they set > >> the scheduler to batch: > >> > >> ps -T -o comm,pid,tid $QEMU_THREAD > >> > >> then for each vhost thread do: > >> > >> chrt -b -p 0 $VHOST_THREAD > >> > >> Does that end up increasing perf? When I do this I see throughput go up by > >> around 50% vs 6.3 when sessions was 16 or more (16 was the number of vCPUs > >> and virtqueues per net device in the VM). Note that I'm not saying that is a fix. > >> It's just a difference I noticed when running some other tests. 
> >
> >
> > Mike, I'm unsure what to do at this point. Regressions are not nice,
> > but if the kernel is released with the new userspace API we won't
> > be able to revert. So what's the plan?
> >
>
> I'm sort of stumped. I still can't replicate the problem out of the box. 6.3 and
> 6.4 perform the same for me. I've tried your setup and settings, and different
> combos of things like tuned and irqbalance.
>
> I can sort of force the issue. In 6.4, the vhost thread inherits its settings
> from the parent thread. In 6.3, the vhost thread inherits from kthreadd and we
> would then reset the sched settings. So in 6.4, if I just tune the parent
> differently I can cause different performance. If we want the 6.3 behavior, we
> can do the patch below.
>
> However, I don't think you guys are hitting this, because you are just running
> qemu from a normal shell and were not doing anything fancy with the sched
> settings.
>
>
> diff --git a/kernel/vhost_task.c b/kernel/vhost_task.c
> index da35e5b7f047..f2c2638d1106 100644
> --- a/kernel/vhost_task.c
> +++ b/kernel/vhost_task.c
> @@ -2,6 +2,7 @@
>  /*
>   * Copyright (C) 2021 Oracle Corporation
>   */
> +#include <uapi/linux/sched/types.h>
>  #include <linux/slab.h>
>  #include <linux/completion.h>
>  #include <linux/sched/task.h>
> @@ -22,9 +23,16 @@ struct vhost_task {
>
>  static int vhost_task_fn(void *data)
>  {
> +	static const struct sched_param param = { .sched_priority = 0 };
>  	struct vhost_task *vtsk = data;
>  	bool dead = false;
>
> +	/*
> +	 * Don't inherit the parent's sched info, so we maintain compat from
> +	 * when we used kthreads and it reset this info.
> +	 */
> +	sched_setscheduler_nocheck(current, SCHED_NORMAL, &param);
> +
>  	for (;;) {
>  		bool did_work;
>

Yes, that seems unlikely. Still, can you attach this to the bugzilla so it can be
tested? And what will help you debug? Any traces to enable?

Also, wasn't there another issue with a non-standard config? Maybe if we fix that,
it will by chance fix this one too.
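[A further aside on the inheritance difference Mike describes (6.4 inherits sched settings from the parent thread, 6.3 got kthreadd-derived defaults, and the patch above resets to SCHED_NORMAL): a small C sketch for checking what policy and nice value a given thread actually ended up with when comparing kernels. The <tid> argument is again illustrative and would be one of the vhost TIDs from ps -T; note userspace headers spell SCHED_NORMAL as SCHED_OTHER.]

/* show_sched.c - print a thread's scheduling policy and nice value. */
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>

static const char *policy_name(int policy)
{
	switch (policy) {
	case SCHED_OTHER: return "SCHED_OTHER";  /* SCHED_NORMAL in the kernel */
	case SCHED_BATCH: return "SCHED_BATCH";
	case SCHED_IDLE:  return "SCHED_IDLE";
	case SCHED_FIFO:  return "SCHED_FIFO";
	case SCHED_RR:    return "SCHED_RR";
	default:          return "unknown";
	}
}

int main(int argc, char **argv)
{
	if (argc != 2) {
		fprintf(stderr, "usage: %s <tid>\n", argv[0]);
		return 1;
	}
	pid_t tid = (pid_t)atoi(argv[1]);

	int policy = sched_getscheduler(tid);
	if (policy < 0) {
		perror("sched_getscheduler");
		return 1;
	}

	/* On Linux, PRIO_PROCESS with a TID returns that thread's nice value;
	 * -1 is a legal return, so errno must be cleared and checked. */
	errno = 0;
	int nice_val = getpriority(PRIO_PROCESS, tid);
	if (nice_val == -1 && errno) {
		perror("getpriority");
		return 1;
	}

	printf("tid %d: policy=%s nice=%d\n", (int)tid, policy_name(policy), nice_val);
	return 0;
}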