From: Liu Yuan
Subject: Re: [RFC PATCH] vhost-blk: In-kernel accelerator for virtio block device
Date: Wed, 07 Sep 2011 21:36:15 +0800
To: Badari Pulavarty
Cc: kvm@vger.kernel.org, Dongsu Park

On 08/15/2011 12:17 PM, Badari Pulavarty wrote:
> On 8/14/2011 8:20 PM, Liu Yuan wrote:
>> On 08/13/2011 12:12 AM, Badari Pulavarty wrote:
>>> On 8/12/2011 4:40 AM, Liu Yuan wrote:
>>>> On 08/12/2011 04:27 PM, Liu Yuan wrote:
>>>>> On 08/12/2011 12:50 PM, Badari Pulavarty wrote:
>>>>>> On 8/10/2011 8:19 PM, Liu Yuan wrote:
>>>>>>> On 08/11/2011 11:01 AM, Liu Yuan wrote:
>>>>>>>>
>>>>>>>>> It looks like the patch wouldn't work for testing multiple
>>>>>>>>> devices.
>>>>>>>>>
>>>>>>>>> vhost_blk_open() does
>>>>>>>>> +    used_info_cachep = KMEM_CACHE(used_info,
>>>>>>>>>          SLAB_HWCACHE_ALIGN | SLAB_PANIC);
>>>>>>>>>
>>>>>>>>
>>>>>>>> This is weird. How do you open multiple devices? I just opened
>>>>>>>> the devices with the following command:
>>>>>>>>
>>>>>>>> -drive file=/dev/sda6,if=virtio,cache=none,aio=native -drive
>>>>>>>> file=~/data0.img,if=virtio,cache=none,aio=native -drive
>>>>>>>> file=~/data1.img,if=virtio,cache=none,aio=native
>>>>>>>>
>>>>>>>> And I didn't run into any problems.
>>>>>>>>
>>>>>>>> This tells qemu to open three devices and pass three FDs to
>>>>>>>> three instances of the vhost_blk module, so KMEM_CACHE() is
>>>>>>>> okay in vhost_blk_open().
>>>>>>>>
>>>>>>>
>>>>>>> Oh, you are right. KMEM_CACHE() is in the wrong place: three
>>>>>>> instances of the vhost worker thread are created, one per
>>>>>>> device, so it gets called three times. Hmm, but I didn't hit
>>>>>>> any problem when opening and running it. So strange. I'll go
>>>>>>> figure it out.
>>>>>>>
>>>>>>>>> When opening the second device, we get a panic since
>>>>>>>>> used_info_cachep is already created. Just to make progress I
>>>>>>>>> moved this call to vhost_blk_init().
>>>>>>>>>
>>>>>>>>> I don't see any host panics now. With a single block device
>>>>>>>>> (dd), it seems to work fine. But when I start testing
>>>>>>>>> multiple block devices I quickly run into hangs in the
>>>>>>>>> guest. I see the following messages in the guest from
>>>>>>>>> virtio_ring.c:
>>>>>>>>>
>>>>>>>>> virtio_blk virtio2: requests: id 0 is not a head !
>>>>>>>>> virtio_blk virtio1: requests: id 0 is not a head !
>>>>>>>>> virtio_blk virtio4: requests: id 1 is not a head !
>>>>>>>>> virtio_blk virtio3: requests: id 39 is not a head !
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Badari
>>>>>>>>>
>>>>>>>>
>>>>>>>> vq->data[] is initialized by the guest virtio-blk driver and
>>>>>>>> vhost_blk is unaware of it. It looks like the used ID passed
>>>>>>>> back by vhost_blk to the guest virtio_blk is wrong, but that
>>>>>>>> should not happen. :|
>>>>>>>>
>>>>>>>> And I can't reproduce this on my laptop. :(
>>>>>>>>
>>>>>> Finally, found the issue :)
>>>>>>
>>>>>> The culprit is:
>>>>>>
>>>>>> +static struct io_event events[MAX_EVENTS];
>>>>>>
>>>>>> With multiple devices, multiple threads could be executing
>>>>>> handle_completion() (one for each fd) at the same time. The
>>>>>> "events" array is global :( It needs to be one per device/fd.
>>>>>>
>>>>>> For the test, I changed MAX_EVENTS to 32 and moved the "events"
>>>>>> array to be local (on the stack) in handle_completion(). Tests
>>>>>> are running fine.
>>>>>>
>>>>>> Your laptop must have a single processor, hence only one thread
>>>>>> executes handle_completion() at any time.
>>>>>>
>>>>>> Thanks,
>>>>>> Badari
>>>>>>
>>>>>>
>>>>> Good catch, this is rather cool!... Yup, I develop it mostly in a
>>>>> nested KVM environment, and the L2 host only runs a single
>>>>> processor :(
>>>>>
>>>>> Thanks,
>>>>> Yuan
>>>> By the way, MAX_EVENTS should be 128, which is as many requests as
>>>> the guest virtio_blk driver can batch-submit; anything smaller can
>>>> overflow the array. With debugging turned on, I have seen over 100
>>>> requests batched from the guest OS.
>>>>
>>>
>>> Hmm, I am not sure why you see over 100 outstanding events per fd.
>>> Max events could be as high as the number of outstanding IOs.
>>>
>>> Anyway, instead of putting it on the stack, I kmalloc it now.
>>>
>>> Dongsu Park, here is the complete patch.
>>>
>>> Thanks
>>> Badari
>>>
>>>
>> On a physical machine, the block device driver posts a queue depth
>> that limits the number of pending requests, normally 31. But the
>> virtio driver doesn't post one in the guest OS, so nothing prevents
>> the OS from batch-submitting more than 31 requests.
>>
>> I have noticed over 100 pending requests during guest OS
>> initialization, and it is reproducible.
>>
>> BTW, how are the perf numbers for vhost-blk in your environment?
>
> Right now I am doing "dd" tests to test out the functionality and
> stability.
>
> I plan to collect FFSB benchmark results across 6 virtio-blk/vhost-blk
> disks with all profiles - seq read, seq write, random read, random
> write - with block sizes varying from 4k to 1MB.
>
> I will start the test tomorrow. It will take a few days to run
> through all the scenarios. I don't have an easy way to collect host
> CPU consumption - but for now let's focus on throughput and latency.
> I will share the results in a few days.
>
> Thanks
> Badari
>
>

Hi Badari, how is the test going?

Thanks,
Yuan
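
For readers following the thread, a minimal sketch of the per-device
"events" fix discussed above. The struct layout and field names here
(struct vhost_blk, blk->events) are hypothetical stand-ins modeled on
the patch under discussion, not the actual vhost-blk code; only the
idea - one io_event array per open fd instead of one static global -
comes from the thread.

	#include <linux/slab.h>
	#include <linux/fs.h>
	#include <linux/aio_abi.h>	/* struct io_event */

	/*
	 * Sized to the largest batch the guest virtio_blk driver was
	 * observed to submit (over 100 requests, per the thread).
	 */
	#define MAX_EVENTS 128

	/* Hypothetical per-device state; the real patch may differ. */
	struct vhost_blk {
		struct io_event *events;	/* per-device/fd buffer */
		/* ... vhost dev, virtqueue, ioctx, etc. ... */
	};

	static int vhost_blk_open(struct inode *inode, struct file *f)
	{
		struct vhost_blk *blk = kzalloc(sizeof(*blk), GFP_KERNEL);

		if (!blk)
			return -ENOMEM;

		/*
		 * One completion array per open fd, so the concurrent
		 * handle_completion() threads (one per device) no
		 * longer race on a single static buffer.
		 */
		blk->events = kcalloc(MAX_EVENTS, sizeof(*blk->events),
				      GFP_KERNEL);
		if (!blk->events) {
			kfree(blk);
			return -ENOMEM;
		}

		/*
		 * Note: used_info_cachep must NOT be created here; a
		 * second open would try to create the cache twice and
		 * panic. It belongs in module init (vhost_blk_init()),
		 * which is where Badari moved it.
		 */

		f->private_data = blk;
		return 0;
	}

The matching release path would kfree(blk->events) and then kfree(blk)
once all in-flight completions for that fd have drained.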