From: "Michael S. Tsirkin" <mst@redhat.com>
To: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>,
	Sridhar Samudrala <sri@us.ibm.com>,
	netdev <netdev@vger.kernel.org>,
	lkml <linux-kernel@vger.kernel.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Dmitri Vorobiev <dmitri.vorobiev@movial.com>,
	Jiri Kosina <jkosina@suse.cz>,
	Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@elte.hu>,
	Andi Kleen <ak@linux.intel.com>
Subject: Re: [PATCH 1/3] vhost: replace vhost_workqueue with per-vhost kthread
Date: Mon, 31 May 2010 18:22:21 +0300
Message-ID: <20100531152221.GB2987@redhat.com>
In-Reply-To: <4C02C961.9050606@kernel.org>

On Sun, May 30, 2010 at 10:24:01PM +0200, Tejun Heo wrote:
> Replace vhost_workqueue with per-vhost kthread.  Other than callback
> argument change from struct work_struct * to struct vhost_poll *,
> there's no visible change to vhost_poll_*() interface.

I would prefer a separate vhost_work substructure, if only to make
the code easier to review and compare against workqueue.c.
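
Roughly what I have in mind (an untested sketch, all names
illustrative; the work fields get split out of vhost_poll so they
mirror struct work_struct):

struct vhost_work;
typedef void (*vhost_work_fn_t)(struct vhost_work *work);

struct vhost_work {
	struct list_head	node;		/* poller queue linkage */
	vhost_work_fn_t		fn;		/* callback to run */
	wait_queue_head_t	done;		/* flushers sleep here */
	int			queue_seq;
	int			done_seq;
};

struct vhost_poll {
	poll_table		table;
	wait_queue_head_t	*wqh;
	wait_queue_t		wait;
	struct vhost_work	work;		/* embedded, like work_struct */
	unsigned long		mask;
	struct vhost_dev	*dev;
};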

> This conversion is to make each vhost use a dedicated kthread so that
> resource control via cgroup can be applied.
> 
> Partially based on Sridhar Samudrala's patch.
> 
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Sridhar Samudrala <samudrala.sridhar@gmail.com>
> ---
> Okay, here is three patch series to convert vhost to use per-vhost
> kthread, add cgroup_attach_task_current_cg() and apply it to the vhost
> kthreads.  The conversion is mostly straight forward although flush is
> slightly tricky.
> 
> The problem is that I have no idea how to test this.

It's a 3-step process:

1. Install qemu-kvm under fc13, or build a recent one from source;
you can get it from here:

git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git
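
Building from source is the usual routine; a sketch, assuming a
standard toolchain (target list trimmed to x86_64 to save build
time):

git clone git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git
cd qemu-kvm
./configure --target-list=x86_64-softmmu
make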

2. Install a guest under it. First create a disk image:

qemu-img create -f qcow2 disk.qcow2 100G

Now get an installation image (e.g. Fedora 13 as fc13.iso)
and install the guest:

qemu-kvm -enable-kvm -m 1G -cdrom fc13.iso -drive file=disk.qcow2


3. Set up networking. I usually simply do host-to-guest
on a dedicated subnet for testing purposes.

Set up a bridge named mstbr0 (the first two commands just tear
down any stale bridge left over from a previous run):

ifconfig mstbr0 down
brctl delbr mstbr0
brctl addbr mstbr0
brctl setfd mstbr0 0
ifconfig mstbr0 11.0.0.1

cat > ifup << EOF
#!/bin/sh -x
/sbin/ifconfig msttap0 0.0.0.0 up
brctl addif mstbr0 msttap0
EOF
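
Make the ifup script executable, or qemu won't be able to run it:

chmod +x ifup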


qemu-kvm -enable-kvm -m 1G -cdrom fc13.iso -drive file=disk.qcow2 \
 -net nic,model=virtio,netdev=foo \
 -netdev tap,id=foo,ifname=msttap0,script=/home/mst/ifup,downscript=no,vhost=on

After you set up the guest, log into it and run:

ifconfig eth0 11.0.0.2

You should now be able to ping from guest to host and back.
Use something like netperf to stress the connection.
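For example, with netserver already running in the guest (the
flags below are standard netperf: remote host, test type, duration
in seconds):

netperf -H 11.0.0.2 -t TCP_STREAM -l 60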

Kill qemu with kill -9 and unload the vhost_net module to test the
flushing code.



> Index: work/drivers/vhost/vhost.c
> ===================================================================
> --- work.orig/drivers/vhost/vhost.c
> +++ work/drivers/vhost/vhost.c

...

> @@ -125,10 +139,50 @@ static void vhost_vq_reset(struct vhost_
>  	vq->log_ctx = NULL;
>  }
> 
> +static int vhost_poller(void *data)
> +{
> +	struct vhost_dev *dev = data;
> +	struct vhost_poll *poll;
> +
> +repeat:
> +	set_current_state(TASK_INTERRUPTIBLE);	/* mb paired w/ kthread_stop */
> +
> +	if (kthread_should_stop()) {
> +		__set_current_state(TASK_RUNNING);
> +		return 0;
> +	}
> +
> +	poll = NULL;
> +	spin_lock(&dev->poller_lock);
> +	if (!list_empty(&dev->poll_list)) {
> +		poll = list_first_entry(&dev->poll_list,
> +					struct vhost_poll, node);
> +		list_del_init(&poll->node);
> +	}
> +	spin_unlock(&dev->poller_lock);
> +
> +	if (poll) {
> +		__set_current_state(TASK_RUNNING);
> +		poll->fn(poll);
> +		smp_wmb();	/* paired with rmb in vhost_poll_flush() */
> +		poll->done_seq = poll->queue_seq;
> +		wake_up_all(&poll->done);


This seems to add wakeups on the data path, which already uses
spinlocks etc. OTOH workqueue.c inserts a special barrier entry
which only does a wakeup when a flusher is actually waiting.
Right?
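
For comparison, a rough, untested sketch of that barrier scheme
along the lines of workqueue.c's insert_wq_barrier(), which keeps
the wakeup off the hot path entirely (vhost_work_init() and
vhost_work_queue() are hypothetical helpers, per the substructure
suggested above):

struct vhost_flush_struct {
	struct vhost_work work;
	struct completion wait_event;
};

static void vhost_flush_work(struct vhost_work *work)
{
	struct vhost_flush_struct *s;

	s = container_of(work, struct vhost_flush_struct, work);
	complete(&s->wait_event);
}

/* Flush by queueing a barrier work and waiting for the poller to
 * run it; ordinary work items then need no done/seq bookkeeping. */
void vhost_poll_flush(struct vhost_poll *poll)
{
	struct vhost_flush_struct flush;

	init_completion(&flush.wait_event);
	vhost_work_init(&flush.work, vhost_flush_work);
	vhost_work_queue(poll->dev, &flush.work);
	wait_for_completion(&flush.wait_event);
}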

> +	} else
> +		schedule();
> +
> +	goto repeat;
> +}
> +
>  long vhost_dev_init(struct vhost_dev *dev,
>  		    struct vhost_virtqueue *vqs, int nvqs)
>  {
> +	struct task_struct *poller;
>  	int i;
> +
> +	poller = kthread_create(vhost_poller, dev, "vhost-%d", current->pid);
> +	if (IS_ERR(poller))
> +		return PTR_ERR(poller);
> +
>  	dev->vqs = vqs;
>  	dev->nvqs = nvqs;
>  	mutex_init(&dev->mutex);
> @@ -136,6 +190,9 @@ long vhost_dev_init(struct vhost_dev *de
>  	dev->log_file = NULL;
>  	dev->memory = NULL;
>  	dev->mm = NULL;
> +	spin_lock_init(&dev->poller_lock);
> +	INIT_LIST_HEAD(&dev->poll_list);
> +	dev->poller = poller;
> 
>  	for (i = 0; i < dev->nvqs; ++i) {
>  		dev->vqs[i].dev = dev;
> @@ -143,8 +200,7 @@ long vhost_dev_init(struct vhost_dev *de
>  		vhost_vq_reset(dev, dev->vqs + i);
>  		if (dev->vqs[i].handle_kick)
>  			vhost_poll_init(&dev->vqs[i].poll,
> -					dev->vqs[i].handle_kick,
> -					POLLIN);
> +					dev->vqs[i].handle_kick, POLLIN, dev);
>  	}
>  	return 0;
>  }
> @@ -217,6 +273,8 @@ void vhost_dev_cleanup(struct vhost_dev
>  	if (dev->mm)
>  		mmput(dev->mm);
>  	dev->mm = NULL;
> +
> +	kthread_stop(dev->poller);
>  }
> 
>  static int log_access_ok(void __user *log_base, u64 addr, unsigned long sz)
> @@ -1113,16 +1171,3 @@ void vhost_disable_notify(struct vhost_v
>  		vq_err(vq, "Failed to enable notification at %p: %d\n",
>  		       &vq->used->flags, r);
>  }
> -
> -int vhost_init(void)
> -{
> -	vhost_workqueue = create_singlethread_workqueue("vhost");
> -	if (!vhost_workqueue)
> -		return -ENOMEM;
> -	return 0;
> -}
> -
> -void vhost_cleanup(void)
> -{
> -	destroy_workqueue(vhost_workqueue);

I note that destroy_workqueue does a flush while kthread_stop
doesn't. Right? Are we sure we don't need to check that nothing
is left on the poll list? Maybe add a BUG_ON?
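
Something like this at the end of vhost_dev_cleanup() (untested):

	/* everything should have been flushed by now */
	BUG_ON(!list_empty(&dev->poll_list));
	kthread_stop(dev->poller);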

> -}
> Index: work/drivers/vhost/vhost.h
> ===================================================================
> --- work.orig/drivers/vhost/vhost.h
> +++ work/drivers/vhost/vhost.h
> @@ -5,7 +5,6 @@
>  #include <linux/vhost.h>
>  #include <linux/mm.h>
>  #include <linux/mutex.h>
> -#include <linux/workqueue.h>
>  #include <linux/poll.h>
>  #include <linux/file.h>
>  #include <linux/skbuff.h>
> @@ -20,19 +19,26 @@ enum {
>  	VHOST_NET_MAX_SG = MAX_SKB_FRAGS + 2,
>  };
> 
> +struct vhost_poll;
> +typedef void (*vhost_poll_fn_t)(struct vhost_poll *poll);
> +
>  /* Poll a file (eventfd or socket) */
>  /* Note: there's nothing vhost specific about this structure. */
>  struct vhost_poll {
> +	vhost_poll_fn_t		  fn;
>  	poll_table                table;
>  	wait_queue_head_t        *wqh;
>  	wait_queue_t              wait;
> -	/* struct which will handle all actual work. */
> -	struct work_struct        work;
> +	struct list_head	  node;
> +	wait_queue_head_t	  done;
>  	unsigned long		  mask;
> +	struct vhost_dev	 *dev;
> +	int			  queue_seq;
> +	int			  done_seq;
>  };
> 
> -void vhost_poll_init(struct vhost_poll *poll, work_func_t func,
> -		     unsigned long mask);
> +void vhost_poll_init(struct vhost_poll *poll, vhost_poll_fn_t fn,
> +		     unsigned long mask, struct vhost_dev *dev);
>  void vhost_poll_start(struct vhost_poll *poll, struct file *file);
>  void vhost_poll_stop(struct vhost_poll *poll);
>  void vhost_poll_flush(struct vhost_poll *poll);
> @@ -63,7 +69,7 @@ struct vhost_virtqueue {
>  	struct vhost_poll poll;
> 
>  	/* The routine to call when the Guest pings us, or timeout. */
> -	work_func_t handle_kick;
> +	vhost_poll_fn_t handle_kick;
> 
>  	/* Last available index we saw. */
>  	u16 last_avail_idx;
> @@ -86,11 +92,11 @@ struct vhost_virtqueue {
>  	struct iovec hdr[VHOST_NET_MAX_SG];
>  	size_t hdr_size;
>  	/* We use a kind of RCU to access private pointer.
> -	 * All readers access it from workqueue, which makes it possible to
> -	 * flush the workqueue instead of synchronize_rcu. Therefore readers do
> +	 * All readers access it from poller, which makes it possible to
> +	 * flush the vhost_poll instead of synchronize_rcu. Therefore readers do
>  	 * not need to call rcu_read_lock/rcu_read_unlock: the beginning of
> -	 * work item execution acts instead of rcu_read_lock() and the end of
> -	 * work item execution acts instead of rcu_read_lock().
> +	 * vhost_poll execution acts instead of rcu_read_lock() and the end of
> +	 * vhost_poll execution acts instead of rcu_read_lock().
>  	 * Writers use virtqueue mutex. */
>  	void *private_data;
>  	/* Log write descriptors */
> @@ -110,6 +116,9 @@ struct vhost_dev {
>  	int nvqs;
>  	struct file *log_file;
>  	struct eventfd_ctx *log_ctx;
> +	spinlock_t poller_lock;
> +	struct list_head poll_list;
> +	struct task_struct *poller;
>  };
> 
>  long vhost_dev_init(struct vhost_dev *, struct vhost_virtqueue *vqs, int nvqs);
> @@ -136,9 +145,6 @@ bool vhost_enable_notify(struct vhost_vi
>  int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log,
>  		    unsigned int log_num, u64 len);
> 
> -int vhost_init(void);
> -void vhost_cleanup(void);
> -
>  #define vq_err(vq, fmt, ...) do {                                  \
>  		pr_debug(pr_fmt(fmt), ##__VA_ARGS__);       \
>  		if ((vq)->error_ctx)                               \
