From: Mike Christie <michael.christie@oracle.com>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
	Stefan Hajnoczi <stefanha@redhat.com>, fam <fam@euphon.net>,
	linux-scsi <linux-scsi@vger.kernel.org>,
	Jason Wang <jasowang@redhat.com>,
	qemu-devel <qemu-devel@nongnu.org>,
	Linux Virtualization <virtualization@lists.linux-foundation.org>,
	target-devel <target-devel@vger.kernel.org>,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH 00/10] vhost/qemu: thread per IO SCSI vq
Date: Thu, 19 Nov 2020 10:43:40 -0600	[thread overview]
Message-ID: <ffd88f0c-981e-a102-4b08-f29d6b9a0f71@oracle.com> (raw)
In-Reply-To: <CAJSP0QUvSwX5NCPmfSODV_C+D41E21LZT=oXQ2PLc6baAsGGDQ@mail.gmail.com>

On 11/19/20 10:24 AM, Stefan Hajnoczi wrote:
> On Thu, Nov 19, 2020 at 4:13 PM Mike Christie
> <michael.christie@oracle.com> wrote:
>>
>> On 11/19/20 8:46 AM, Michael S. Tsirkin wrote:
>>> On Wed, Nov 18, 2020 at 11:31:17AM +0000, Stefan Hajnoczi wrote:
>>>>> My preference has been:
>>>>>
>>>>> 1. If we were to ditch cgroups, then add a new interface that would allow
>>>>> us to bind threads to a specific CPU, so that it lines up with the guest's
>>>>> mq to CPU mapping.
>>>>
>>>> A 1:1 vCPU/vq->CPU mapping isn't desirable in all cases.
>>>>
>>>> The CPU affinity is a userspace policy decision. The host kernel should
>>>> provide a mechanism but not the policy. That way userspace can decide
>>>> which workers are shared by multiple vqs and on which physical CPUs they
>>>> should run.
>>>
>>> So if we let userspace dictate the threading policy then I think binding
>>> vqs to userspace threads and running there makes the most sense,
>>> no need to create the threads.
>>>
>>
>> Just to make sure I am on the same page, in one of the first postings of
>> this set at the bottom of the mail:
>>
>> https://www.spinics.net/lists/linux-scsi/msg148322.html
>>
>> I asked about a new interface and had done something more like what
>> Stefan posted:
>>
>>     struct vhost_vq_worker_info {
>>         /*
>>          * The pid of an existing vhost worker that this vq will be
>>          * assigned to. When pid is 0 the virtqueue is assigned to the
>>          * default vhost worker. When pid is -1 a new worker thread is
>>          * created for this virtqueue. When pid is -2 the virtqueue's
>>          * worker thread is unchanged.
>>          *
>>          * If a vhost worker no longer has any virtqueues assigned to it
>>          * then it will terminate.
>>          *
>>          * The pid of the vhost worker is stored to this field when the
>>          * ioctl completes successfully. Use pid -2 to query the current
>>          * vhost worker pid.
>>          */
>>         __kernel_pid_t pid;  /* in/out */
>>
>>         /* The virtqueue index */
>>         unsigned int vq_idx; /* in */
>>     };
>>
>> This approach is simple and it allowed me to have userspace map queues
>> and threads optimally for our setups.
>>
>> Note: Stefan, in response to your previous comment, I am just using my
>> 1:1 mapping as an example and would make it configurable from userspace.
>>
>> In the email above, are you guys suggesting that we execute the SCSI/vhost
>> requests in userspace? We should not do that because:
>>
>> 1. It negates part of what makes vhost fast, which is that we do not have
>> to kick out to userspace and then back into the kernel.
>>
>> 2. It's either not doable or becomes a crazy mess, because vhost-scsi is
>> tied to the scsi/target layer in the kernel. You can't process the scsi
>> command in userspace since the scsi state machine and all its
>> configuration info are in the kernel's scsi/target layer.
>>
>> For example, I was just the maintainer of the target_core_user module
>> that hooks into LIO/target on the backend (vhost-scsi hooks in on the
>> front end) and passes commands to userspace, where we have a
>> semi-shadow state machine. It gets nasty trying to maintain/sync state
>> between the lio/target core in the kernel and userspace. We also see the
>> perf loss I mentioned in #1.
> 
> No, if I understand Michael correctly he has suggested a different approach.
> 
> My suggestion was that the kernel continues to manage the worker
> threads but an ioctl allows userspace to control the policy.
> 
> I think Michael is saying that the kernel shouldn't manage/create
> threads. Userspace should create threads and then invoke an ioctl from
> those threads.
> 
> The ioctl will call into the vhost driver where it will execute
> something similar to vhost_worker(). So this ioctl will block while
> the kernel is using the thread to process vqs.
> 
> What isn't clear to me is how to tell the kernel which vqs are
> processed by a thread. We could try to pass that information into the
> ioctl. I'm not sure what the cleanest solution is here.
> 
> Maybe something like:
> 
> struct vhost_run_worker_info {
>      struct timespec *timeout;
>      sigset_t *sigmask;
> 
>      /* List of virtqueues to process */
>      unsigned nvqs;
>      unsigned vqs[];
> };
> 
> /* This blocks until the timeout is reached, a signal is received, or
> the vhost device is destroyed */
> int ret = ioctl(vhost_fd, VHOST_RUN_WORKER, &info);
> 
> As you can see, userspace isn't involved in handling the requests. It
> just acts as a thread donor to the vhost driver.
> 
> We would want the VHOST_RUN_WORKER calls to be infrequent to avoid the
> penalty of switching into the kernel, copying in the arguments, etc.

I didn't get this part. Why have the timeout? When the timeout expires,
does userspace just call right back down into the kernel, or does it do
some sort of processing/operation first?

You could have the worker function that runs from that ioctl wait for a
signal or a wake-up call from the vhost_work/poll functions.
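
For example, something like the rough userspace sketch below is what I am
picturing. To be clear, this is hypothetical: it reuses the VHOST_RUN_WORKER
ioctl and struct from your example above, neither of which exists in the
current vhost UAPI, and the ioctl number and function names are made up
for illustration only:

/*
 * Hypothetical sketch only: VHOST_RUN_WORKER and struct vhost_run_worker_info
 * mirror the proposal quoted above and are not part of the current vhost
 * UAPI, so the ioctl number below is just a placeholder.
 */
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <time.h>

#define VHOST_RUN_WORKER 0	/* placeholder; a real value would come from <linux/vhost.h> */

struct vhost_run_worker_info {
	struct timespec *timeout;	/* NULL: block until signal/device destroy */
	sigset_t *sigmask;		/* signals that should wake the donated thread */
	unsigned nvqs;			/* number of entries in vqs[] */
	unsigned vqs[];			/* virtqueue indexes this thread services */
};

/*
 * Thread-donor entry point: userspace creates and pins the thread, then
 * the thread parks itself in the kernel to service the listed vqs. The
 * ioctl only returns on timeout, signal, or device teardown, so on a
 * clean return we simply re-enter it.
 */
static void *vhost_worker_donor(void *opaque)
{
	int vhost_fd = *(int *)opaque;
	unsigned vqs[] = { 0, 1 };	/* e.g. this thread handles IO vqs 0 and 1 */
	struct vhost_run_worker_info *info;

	info = calloc(1, sizeof(*info) + sizeof(vqs));
	if (!info)
		return NULL;
	info->nvqs = sizeof(vqs) / sizeof(vqs[0]);
	memcpy(info->vqs, vqs, sizeof(vqs));

	while (ioctl(vhost_fd, VHOST_RUN_WORKER, info) == 0)
		;	/* woken by timeout; go right back to processing vqs */

	free(info);
	return NULL;
}

Userspace would create the thread with pthread_create() and pin it with
pthread_setaffinity_np() before it calls down, so the CPU placement policy
stays entirely in userspace while the kernel just borrows the thread.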


> 
> Michael: is this the kind of thing you were thinking of?
> 
> Stefan
> 

