Target-devel archive on lore.kernel.org
 help / color / Atom feed
From: Stefan Hajnoczi <stefanha@redhat.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Stefan Hajnoczi <stefanha@gmail.com>,
	Mike Christie <michael.christie@oracle.com>, fam <fam@euphon.net>,
	linux-scsi <linux-scsi@vger.kernel.org>,
	Jason Wang <jasowang@redhat.com>,
	qemu-devel <qemu-devel@nongnu.org>,
	Linux Virtualization <virtualization@lists.linux-foundation.org>,
	target-devel <target-devel@vger.kernel.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Stefano Garzarella <sgarzare@redhat.com>
Subject: Re: [PATCH 00/10] vhost/qemu: thread per IO SCSI vq
Date: Tue, 1 Dec 2020 12:59:43 +0000
Message-ID: <20201201125943.GE585157@stefanha-x1.localdomain> (raw)
In-Reply-To: <20201120072802-mutt-send-email-mst@kernel.org>


[-- Attachment #1: Type: text/plain, Size: 4713 bytes --]

On Fri, Nov 20, 2020 at 07:31:08AM -0500, Michael S. Tsirkin wrote:
> On Fri, Nov 20, 2020 at 08:45:49AM +0000, Stefan Hajnoczi wrote:
> > On Thu, Nov 19, 2020 at 5:08 PM Stefan Hajnoczi <stefanha@gmail.com> wrote:
> > >
> > > On Thu, Nov 19, 2020 at 4:43 PM Mike Christie
> > > <michael.christie@oracle.com> wrote:
> > > >
> > > > On 11/19/20 10:24 AM, Stefan Hajnoczi wrote:
> > > > > On Thu, Nov 19, 2020 at 4:13 PM Mike Christie
> > > > > <michael.christie@oracle.com> wrote:
> > > > >>
> > > > >> On 11/19/20 8:46 AM, Michael S. Tsirkin wrote:
> > > > >>> On Wed, Nov 18, 2020 at 11:31:17AM +0000, Stefan Hajnoczi wrote:
> > > > > struct vhost_run_worker_info {
> > > > >      struct timespec *timeout;
> > > > >      sigset_t *sigmask;
> > > > >
> > > > >      /* List of virtqueues to process */
> > > > >      unsigned nvqs;
> > > > >      unsigned vqs[];
> > > > > };
> > > > >
> > > > > /* This blocks until the timeout is reached, a signal is received, or
> > > > > the vhost device is destroyed */
> > > > > int ret = ioctl(vhost_fd, VHOST_RUN_WORKER, &info);
> > > > >
> > > > > As you can see, userspace isn't involved with dealing with the
> > > > > requests. It just acts as a thread donor to the vhost driver.
> > > > >
> > > > > We would want the VHOST_RUN_WORKER calls to be infrequent to avoid the
> > > > > penalty of switching into the kernel, copying in the arguments, etc.
> > > >
> > > > I didn't get this part. Why have the timeout? When the timeout expires,
> > > > does userspace just call right back down to the kernel or does it do
> > > > some sort of processing/operation?
> > > >
> > > > You could have your worker function run from that ioctl wait for a
> > > > signal or a wake up call from the vhost_work/poll functions.
> > >
> > > An optional timeout argument is common in blocking interfaces like
> > > poll(2), recvmmsg(2), etc.
> > >
> > > Although something can send a signal to the thread instead,
> > > implementing that in an application is more awkward than passing a
> > > struct timespec.
> > >
> > > Compared to other blocking calls we don't expect
> > > ioctl(VHOST_RUN_WORKER) to return soon, so maybe the timeout will
> > > rarely be used and can be dropped from the interface.
> > >
> > > BTW the code I posted wasn't a carefully thought out proposal :). The
> > > details still need to be considered and I'm going to be offline for
> > > the next week so maybe someone else can think it through in the
> > > meantime.
> > 
> > One final thought before I'm offline for a week. If
> > ioctl(VHOST_RUN_WORKER) is specific to a single vhost device instance
> > then it's hard to support poll-mode (busy waiting) workers because
> > each device instance consumes a whole CPU. If we stick to an interface
> > where the kernel manages the worker threads then it's easier to share
> > workers between devices for polling.
> 
> 
> Yes that is the reason vhost did its own reason in the first place.
> 
> 
> I am vaguely thinking about poll(2) or a similar interface,
> which can wait for an event on multiple FDs.

I can imagine how using poll(2) would work from a userspace perspective,
but on the kernel side I don't think it can be implemented cleanly.
poll(2) is tied to the file_operations->poll() callback and
read/write/error events. Not to mention there isn't a way to substitue
the vhost worker thread function instead of scheduling out the current
thread while waiting for poll fd events.

But maybe ioctl(VHOST_WORKER_RUN) can do it:

  struct vhost_run_worker_dev {
      int vhostfd;      /* /dev/vhost-TYPE fd */
      unsigned nvqs;    /* number of virtqueues in vqs[] */
      unsigned vqs[];   /* virtqueues to process */
  };

  struct vhost_run_worker_info {
       struct timespec *timeout;
       sigset_t *sigmask;

       unsigned ndevices;
       struct vhost_run_worker_dev *devices[];
  };

In the simple case userspace sets ndevices to 1 and we just handle
virtqueues for the current device.

In the fancier shared worker thread case the userspace process has the
vhost fds of all the devices it is processing and passes them to
ioctl(VHOST_WORKER_RUN) via struct vhost_run_worker_dev elements.

From a security perspective it means the userspace thread has access to
all vhost devices (because it has their fds).

I'm not sure how the mm is supposed to work. The devices might be
associated with different userspace processes (guests) and therefore
have different virtual memory.

Just wanted to push this discussion along a little further. I'm buried
under emails and probably wont be very active over the next few days.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

  reply index

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-12 23:18 Mike Christie
2020-11-12 23:19 ` [PATCH 1/1] qemu vhost scsi: add VHOST_SET_VRING_ENABLE support Mike Christie
2020-11-17 11:53   ` Stefan Hajnoczi
2020-12-02  9:59   ` Michael S. Tsirkin
2020-12-02 16:05     ` Michael Christie
2020-11-12 23:19 ` [PATCH 01/10] vhost: remove work arg from vhost_work_flush Mike Christie
2020-11-17 13:04   ` Stefan Hajnoczi
2020-11-12 23:19 ` [PATCH 02/10] vhost scsi: remove extra flushes Mike Christie
2020-11-17 13:07   ` Stefan Hajnoczi
2020-11-12 23:19 ` [PATCH 03/10] vhost poll: fix coding style Mike Christie
2020-11-17 13:07   ` Stefan Hajnoczi
2020-11-12 23:19 ` [PATCH 04/10] vhost: support multiple worker threads Mike Christie
2020-11-12 23:19 ` [PATCH 05/10] vhost: poll support support multiple workers Mike Christie
2020-11-17 15:32   ` Stefan Hajnoczi
2020-11-12 23:19 ` [PATCH 06/10] vhost scsi: make SCSI cmd completion per vq Mike Christie
2020-11-17 16:04   ` Stefan Hajnoczi
2020-11-12 23:19 ` [PATCH 07/10] vhost, vhost-scsi: flush IO vqs then send TMF rsp Mike Christie
2020-11-17 16:05   ` Stefan Hajnoczi
2020-11-12 23:19 ` [PATCH 08/10] vhost: move msg_handler to new ops struct Mike Christie
2020-11-17 16:08   ` Stefan Hajnoczi
2020-11-12 23:19 ` [PATCH 09/10] vhost: add VHOST_SET_VRING_ENABLE support Mike Christie
2020-11-17 16:14   ` Stefan Hajnoczi
2020-11-12 23:19 ` [PATCH 10/10] vhost-scsi: create a woker per IO vq Mike Christie
2020-11-17 16:40 ` [PATCH 00/10] vhost/qemu: thread per IO SCSI vq Stefan Hajnoczi
2020-11-17 19:13   ` Mike Christie
2020-11-18  9:54     ` Michael S. Tsirkin
2020-11-19 14:00       ` Stefan Hajnoczi
2020-11-18 11:31     ` Stefan Hajnoczi
2020-11-19 14:46       ` Michael S. Tsirkin
2020-11-19 16:11         ` Mike Christie
2020-11-19 16:24           ` Stefan Hajnoczi
2020-11-19 16:43             ` Mike Christie
2020-11-19 17:08               ` Stefan Hajnoczi
2020-11-20  8:45                 ` Stefan Hajnoczi
2020-11-20 12:31                   ` Michael S. Tsirkin
2020-12-01 12:59                     ` Stefan Hajnoczi [this message]
2020-12-01 13:45                       ` Stefano Garzarella
2020-12-01 17:43                         ` Stefan Hajnoczi
2020-12-02 10:35                           ` Stefano Garzarella
2020-11-23 15:17                   ` Stefano Garzarella
2020-11-18  5:17   ` Jason Wang
2020-11-18  6:57     ` Mike Christie
2020-11-18  7:19       ` Mike Christie
2020-11-18  7:54       ` Jason Wang
2020-11-18 20:06         ` Mike Christie
2020-11-19  4:35           ` Jason Wang
2020-11-19 15:49             ` Mike Christie

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201201125943.GE585157@stefanha-x1.localdomain \
    --to=stefanha@redhat.com \
    --cc=fam@euphon.net \
    --cc=jasowang@redhat.com \
    --cc=linux-scsi@vger.kernel.org \
    --cc=michael.christie@oracle.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=sgarzare@redhat.com \
    --cc=stefanha@gmail.com \
    --cc=target-devel@vger.kernel.org \
    --cc=virtualization@lists.linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Target-devel archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/target-devel/0 target-devel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 target-devel target-devel/ https://lore.kernel.org/target-devel \
		target-devel@vger.kernel.org
	public-inbox-index target-devel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.target-devel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git