From: Liu Yuan
Subject: Re: [RFC PATCH] vhost-blk: In-kernel accelerator for virtio block device
Date: Wed, 07 Sep 2011 21:36:15 +0800
To: Badari Pulavarty
Cc: kvm@vger.kernel.org, Dongsu Park

On 08/15/2011 12:17 PM, Badari Pulavarty wrote:
> On 8/14/2011 8:20 PM, Liu Yuan wrote:
>> On 08/13/2011 12:12 AM, Badari Pulavarty wrote:
>>> On 8/12/2011 4:40 AM, Liu Yuan wrote:
>>>> On 08/12/2011 04:27 PM, Liu Yuan wrote:
>>>>> On 08/12/2011 12:50 PM, Badari Pulavarty wrote:
>>>>>> On 8/10/2011 8:19 PM, Liu Yuan wrote:
>>>>>>> On 08/11/2011 11:01 AM, Liu Yuan wrote:
>>>>>>>>
>>>>>>>>> It looks like the patch wouldn't work for testing multiple
>>>>>>>>> devices.
>>>>>>>>>
>>>>>>>>> vhost_blk_open() does
>>>>>>>>> +    used_info_cachep = KMEM_CACHE(used_info,
>>>>>>>>>          SLAB_HWCACHE_ALIGN | SLAB_PANIC);
>>>>>>>>>
>>>>>>>>
>>>>>>>> This is weird. How do you open multiple devices? I just opened
>>>>>>>> the devices with the following command:
>>>>>>>>
>>>>>>>> -drive file=/dev/sda6,if=virtio,cache=none,aio=native -drive
>>>>>>>> file=~/data0.img,if=virtio,cache=none,aio=native -drive
>>>>>>>> file=~/data1.img,if=virtio,cache=none,aio=native
>>>>>>>>
>>>>>>>> And I didn't run into any problems.
>>>>>>>>
>>>>>>>> This tells qemu to open three devices and pass three FDs to
>>>>>>>> three instances of the vhost_blk module, so KMEM_CACHE() is
>>>>>>>> okay in vhost_blk_open().
>>>>>>>>
>>>>>>>
>>>>>>> Oh, you are right. KMEM_CACHE() is in the wrong place: three
>>>>>>> instances of the vhost worker thread are created, one per
>>>>>>> device, so it gets called three times. Hmm, but I didn't hit
>>>>>>> any problem when opening and running it. So strange. I'll go
>>>>>>> figure it out.
>>>>>>>
>>>>>>>>> When opening the second device, we get a panic since
>>>>>>>>> used_info_cachep is already created. Just to make progress I
>>>>>>>>> moved this call to vhost_blk_init().
>>>>>>>>>
>>>>>>>>> I don't see any host panics now. With a single block device
>>>>>>>>> (dd), it seems to work fine. But when I start testing
>>>>>>>>> multiple block devices I quickly run into hangs in the
>>>>>>>>> guest. I see the following messages in the guest from
>>>>>>>>> virtio_ring.c:
>>>>>>>>>
>>>>>>>>> virtio_blk virtio2: requests: id 0 is not a head !
>>>>>>>>> virtio_blk virtio1: requests: id 0 is not a head !
>>>>>>>>> virtio_blk virtio4: requests: id 1 is not a head !
>>>>>>>>> virtio_blk virtio3: requests: id 39 is not a head !
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Badari
>>>>>>>>>
>>>>>>>>
>>>>>>>> vq->data[] is initialized by the guest virtio-blk driver and
>>>>>>>> vhost_blk is unaware of it. It looks like the used ID passed
>>>>>>>> back by vhost_blk to the guest virtio_blk is wrong, but that
>>>>>>>> should not happen. :|
>>>>>>>>
>>>>>>>> And I can't reproduce this on my laptop. :(
>>>>>>>>
>>>>>> Finally, found the issue :)
>>>>>>
>>>>>> The culprit is:
>>>>>>
>>>>>> +static struct io_event events[MAX_EVENTS];
>>>>>>
>>>>>> With multiple devices, multiple threads could be executing
>>>>>> handle_completion() (one for each fd) at the same time. The
>>>>>> "events" array is global :( It needs to be one per device/fd.
>>>>>>
>>>>>> For the test, I changed MAX_EVENTS to 32 and moved the "events"
>>>>>> array to be local (on the stack) in handle_completion(). Tests
>>>>>> are running fine.
>>>>>>
>>>>>> Your laptop must have a single processor, hence only one thread
>>>>>> executes handle_completion() at any time.
>>>>>>
>>>>>> Thanks,
>>>>>> Badari
>>>>>>
>>>>>>
>>>>> Good catch, this is rather cool!... Yup, I develop it mostly in a
>>>>> nested KVM environment, and the L2 host only runs a single
>>>>> processor :(
>>>>>
>>>>> Thanks,
>>>>> Yuan
>>>> By the way, MAX_EVENTS should be 128, which is as many requests as
>>>> the guest virtio_blk driver can batch-submit; anything smaller can
>>>> overflow the array. With debugging turned on, I have seen over 100
>>>> requests batched from the guest OS.
>>>>
>>>
>>> Hmm, I am not sure why you see over 100 outstanding events per fd.
>>> Max events could be as high as the number of outstanding IOs.
>>>
>>> Anyway, instead of putting it on the stack, I kmalloc it now.
>>>
>>> Dongsu Park, here is the complete patch.
>>>
>>> Thanks
>>> Badari
>>>
>>>
>> On a physical machine, the block device driver posts a queue depth
>> that limits the number of pending requests, normally 31. But the
>> virtio driver doesn't post one in the guest OS, so nothing prevents
>> the OS from batch-submitting more than 31 requests.
>>
>> I have noticed over 100 pending requests during guest OS
>> initialization, and it is reproducible.
>>
>> BTW, how are the perf numbers for vhost-blk in your environment?
>
> Right now I am doing "dd" tests to test out the functionality and
> stability.
>
> I plan to collect FFSB benchmark results across 6 virtio-blk/vhost-blk
> disks with all profiles - seq read, seq write, random read, random
> write - with block sizes varying from 4k to 1MB.
>
> I will start the test tomorrow. It will take a few days to run
> through all the scenarios. I don't have an easy way to collect host
> CPU consumption - but for now let's focus on throughput and latency.
> I will share the results in a few days.
>
> Thanks
> Badari
>
>

Hi Badari, how is the test going?

Thanks,
Yuan
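
For readers following the thread, a minimal sketch of the per-device
"events" fix discussed above. The struct layout and field names here
(struct vhost_blk, blk->events) are hypothetical stand-ins modeled on
the patch under discussion, not the actual vhost-blk code; only the
idea - one io_event array per open fd instead of one static global -
comes from the thread.

	#include <linux/slab.h>
	#include <linux/fs.h>
	#include <linux/aio_abi.h>	/* struct io_event */

	/*
	 * Sized to the largest batch the guest virtio_blk driver was
	 * observed to submit (over 100 requests, per the thread).
	 */
	#define MAX_EVENTS 128

	/* Hypothetical per-device state; the real patch may differ. */
	struct vhost_blk {
		struct io_event *events;	/* per-device/fd buffer */
		/* ... vhost dev, virtqueue, ioctx, etc. ... */
	};

	static int vhost_blk_open(struct inode *inode, struct file *f)
	{
		struct vhost_blk *blk = kzalloc(sizeof(*blk), GFP_KERNEL);

		if (!blk)
			return -ENOMEM;

		/*
		 * One completion array per open fd, so the concurrent
		 * handle_completion() threads (one per device) no
		 * longer race on a single static buffer.
		 */
		blk->events = kcalloc(MAX_EVENTS, sizeof(*blk->events),
				      GFP_KERNEL);
		if (!blk->events) {
			kfree(blk);
			return -ENOMEM;
		}

		/*
		 * Note: used_info_cachep must NOT be created here; a
		 * second open would try to create the cache twice and
		 * panic. It belongs in module init (vhost_blk_init()),
		 * which is where Badari moved it.
		 */

		f->private_data = blk;
		return 0;
	}

The matching release path would kfree(blk->events) and then kfree(blk)
once all in-flight completions for that fd have drained.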