Message-ID: <4E3646E0.9050400@gmail.com>
Date: Mon, 01 Aug 2011 14:25:36 +0800
From: Liu Yuan
To: "Michael S. Tsirkin"
CC: Rusty Russell, Avi Kivity, kvm@vger.kernel.org,
 linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] vhost-blk: An in-kernel accelerator for virtio-blk
References: <1311863346-4338-1-git-send-email-namei.unix@gmail.com>
 <1311863346-4338-2-git-send-email-namei.unix@gmail.com>
 <20110728152244.GA31888@redhat.com>
In-Reply-To: <20110728152244.GA31888@redhat.com>

On 07/28/2011 11:22 PM, Michael S. Tsirkin wrote:
>
> It would be nicer to reuse the worker infrastructure
> from vhost.c. In particular this one ignores cgroups that
> the owner belongs to if any.
> Does this one do anything that vhost.c doesn't?
>

The main idea behind using a separate thread to handle completion is to
decouple request handling from completion signalling. This might scale
better in an IO-intensive scenario, since I noticed that the virtio
driver can submit as many as 128 requests in one go.

Hmm, I have tried turning the signalling thread into a function that is
called as a vhost-worker work item (a rough sketch is further down in
this mail). I didn't see a regression on my laptop with an iodepth of
1, 2, or 3, but request handling and completion signalling may race at
higher request-submission rates. Anyway, I'll adapt it to run as a
vhost worker function in v2.

>
>> +	switch (ioctl) {
>> +	case VHOST_NET_SET_BACKEND:
>> +		if (copy_from_user(&backend, (void __user *)arg, sizeof backend))
>> +			break;
>> +		ret = blk_set_backend(blk, &backend);
>> +		break;
> Please create your own ioctl for this one.
>

How about renaming VHOST_NET_SET_BACKEND to VHOST_SET_BACKEND instead
(see the sketch further down)?

>>
>>  static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
>>  			 struct iocb *iocb, bool compat)
>> diff --git a/fs/eventfd.c b/fs/eventfd.c
>> index d9a5917..6343bc9 100644
>> --- a/fs/eventfd.c
>> +++ b/fs/eventfd.c
>> @@ -406,6 +406,7 @@ struct file *eventfd_file_create(unsigned int count, int flags)
>>
>>  	return file;
>>  }
>> +EXPORT_SYMBOL_GPL(eventfd_file_create);
> You can avoid the need for this export if you pass
> the eventfd in from userspace.
>

The eventfd used by the completion code is internal, and hiding it from
hw/vhost_blk.c simplifies the configuration. So I think the export is
necessary: it gets rid of otherwise unnecessary FD management in
vhost-blk.c.
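To make this concrete: with the export, the whole completion-notification
setup stays inside the kernel, roughly like the sketch below. This is an
illustration only; the struct and function names are made up, not the
exact patch code.

#include <linux/eventfd.h>
#include <linux/err.h>
#include <linux/file.h>

/* Hypothetical per-device state; the real vhost_blk carries more. */
struct vhost_blk_comp {
	struct file *efile;		/* internal completion eventfd */
	struct eventfd_ctx *ectx;	/* ctx handed to eventfd_signal() */
};

static int blk_comp_setup(struct vhost_blk_comp *c)
{
	c->efile = eventfd_file_create(0, 0);	/* needs the export */
	if (IS_ERR(c->efile))
		return PTR_ERR(c->efile);

	c->ectx = eventfd_ctx_fileget(c->efile);
	if (IS_ERR(c->ectx)) {
		fput(c->efile);
		return PTR_ERR(c->ectx);
	}
	return 0;
}

Userspace never sees this fd, so hw/vhost_blk.c has nothing extra to
create, pass in, or clean up.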
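Regarding the ioctl question above, the rename I have in mind keeps the
0x30 number so existing vhost-net userspace continues to work. A sketch
of the idea against include/linux/vhost.h:

/* Backend-agnostic name, same ioctl number as today. */
#define VHOST_SET_BACKEND	_IOW(VHOST_VIRTIO, 0x30, struct vhost_vring_file)
/* Compatibility alias for existing vhost-net users. */
#define VHOST_NET_SET_BACKEND	VHOST_SET_BACKEND

If a vhost-blk-specific ioctl is preferred instead, it is a one-line
define with a new, unused number.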
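And here is the rough shape of the vhost-worker variant I mentioned at
the top. It reuses the vhost_poll machinery from drivers/vhost/vhost.h,
so completion signalling runs on the owner's worker thread and therefore
inside the owner's cgroups. The struct and handler body are simplified
sketches, not the real v2 code:

#include "vhost.h"	/* drivers/vhost/vhost.h */

struct vhost_blk_sk {
	struct vhost_dev dev;
	struct vhost_poll comp_poll;	/* runs on dev's worker thread */
};

/* Runs on the vhost worker: reap AIO events, signal the guest. */
static void handle_completion(struct vhost_work *work)
{
	/* ... read_events() + vhost_add_used_and_signal() ... */
}

static void blk_comp_init(struct vhost_blk_sk *blk)
{
	/* mask 0: this poll is never attached to a file; we only use
	 * vhost_poll_queue() to push the handler onto the worker. */
	vhost_poll_init(&blk->comp_poll, handle_completion, 0, &blk->dev);
}

/* Kicked whenever the internal completion eventfd fires: */
static void blk_comp_kick(struct vhost_blk_sk *blk)
{
	vhost_poll_queue(&blk->comp_poll);
}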
>>
>>  SYSCALL_DEFINE2(eventfd2, unsigned int, count, int, flags)
>>  {
>> diff --git a/include/linux/aio.h b/include/linux/aio.h
>> index 7a8db41..d63bc04 100644
>> --- a/include/linux/aio.h
>> +++ b/include/linux/aio.h
>> @@ -214,6 +214,37 @@ struct mm_struct;
>>  extern void exit_aio(struct mm_struct *mm);
>>  extern long do_io_submit(aio_context_t ctx_id, long nr,
>>  			 struct iocb __user *__user *iocbpp, bool compat);
>> +extern void __put_ioctx(struct kioctx *ctx);
>> +extern struct kioctx *ioctx_alloc(unsigned nr_events);
>> +extern struct kiocb *aio_get_req(struct kioctx *ctx);
>> +extern ssize_t aio_run_iocb(struct kiocb *iocb);
>> +extern int __aio_run_iocbs(struct kioctx *ctx);
>> +extern int read_events(struct kioctx *ctx,
>> +			long min_nr, long nr,
>> +			struct io_event __user *event,
>> +			struct timespec __user *timeout);
>> +extern void io_destroy(struct kioctx *ioctx);
>> +extern ssize_t aio_setup_iocb(struct kiocb *kiocb, bool compat);
>> +extern void __put_ioctx(struct kioctx *ctx);
>> +
>> +static inline void get_ioctx(struct kioctx *kioctx)
>> +{
>> +	BUG_ON(atomic_read(&kioctx->users) <= 0);
>> +	atomic_inc(&kioctx->users);
>> +}
>> +
>> +static inline int try_get_ioctx(struct kioctx *kioctx)
>> +{
>> +	return atomic_inc_not_zero(&kioctx->users);
>> +}
>> +
>> +static inline void put_ioctx(struct kioctx *kioctx)
>> +{
>> +	BUG_ON(atomic_read(&kioctx->users) <= 0);
>> +	if (unlikely(atomic_dec_and_test(&kioctx->users)))
>> +		__put_ioctx(kioctx);
>> +}
>> +
>>  #else
>>  static inline ssize_t wait_on_sync_kiocb(struct kiocb *iocb) { return 0; }
>>  static inline int aio_put_req(struct kiocb *iocb) { return 0; }
>> --
>> 1.7.5.1

Other comments will be addressed in v2.

Thanks,
Yuan