From: "Tan, Jianfeng"
Subject: Re: [RFC 0/5] virtio support for container
Date: Tue, 24 Nov 2015 06:19:07 +0000
To: Zhuangyanying, dev@dpdk.org
Cc: nakajima.yoshihiro@lab.ntt.co.jp, Zhbzg, mst@redhat.com, gaoxiaoqiu, Zhangbo (Oscar), Zhoujingbin, Guohongzhen

> -----Original Message-----
> From: Zhuangyanying [mailto:ann.zhuangyanying@huawei.com]
> Sent: Tuesday, November 24, 2015 11:53 AM
> To: Tan, Jianfeng; dev@dpdk.org
> Cc: mst@redhat.com; mukawa@igel.co.jp; nakajima.yoshihiro@lab.ntt.co.jp;
> Qiu, Michael; Guohongzhen; Zhoujingbin; Zhangbo (Oscar); gaoxiaoqiu;
> Zhbzg; Xie, Huawei
> Subject: RE: [RFC 0/5] virtio support for container
>
> > -----Original Message-----
> > From: Jianfeng Tan [mailto:jianfeng.tan@intel.com]
> > Sent: Friday, November 06, 2015 2:31 AM
> > To: dev@dpdk.org
> > Cc: mst@redhat.com; mukawa@igel.co.jp; nakajima.yoshihiro@lab.ntt.co.jp;
> > michael.qiu@intel.com; Guohongzhen; Zhoujingbin; Zhuangyanying; Zhangbo
> > (Oscar); gaoxiaoqiu; Zhbzg; huawei.xie@intel.com; Jianfeng Tan
> > Subject: [RFC 0/5] virtio support for container
> > ...
> > 2.1.4
>
> This patch raises a good idea: adding an extra abstracted IO layer, which
> would make it simple to extend the function to a kernel-mode switch (such
> as OVS). That's great.
> But I have one question here, about VHOST_USER_SET_MEM_TABLE: you
> allocate memory from a tmpfs filesystem with just one fd, so
> rte_memseg_info_get() can be used to directly get the memory topology.
> However, things change in kernel space, because the mempool should be
> created on each container's hugetlbfs (rather than tmpfs), which is
> separated from the others; finally, consider the ioctl's parameters.
> My solution is as follows, for your reference:
> /*
>     reg = mem->regions;
>     reg->guest_phys_addr = (__u64)((struct virtqueue *)(dev->data->rx_queues[0]))->mpool->elt_va_start;
>     reg->userspace_addr = reg->guest_phys_addr;
>     reg->memory_size = ((struct virtqueue *)(dev->data->rx_queues[0]))->mpool->elt_va_end - reg->guest_phys_addr;
>
>     reg = mem->regions + 1;
>     reg->guest_phys_addr = (__u64)(((struct virtqueue *)(dev->data->tx_queues[0]))->virtio_net_hdr_mem);
>     reg->userspace_addr = reg->guest_phys_addr;
>     reg->memory_size = vq_size * internals->vtnet_hdr_size;
> */
> But it's a little ugly. Any better idea?

Hi Yanying,

Your solution seems OK to me when used with kernel vhost-net, because the
vhost kthread just shares the same mm_struct with the virtio process. But it
will not work with vhost-user, which realizes memory sharing by putting an fd
into sendmsg(). Worse, it will not work with userspace vhost_cuse (see
lib/librte_vhost/vhost_cuse/) either, because the current implementation
assumes the VM's physical memory is backed by one huge file. Actually, what we
need to do is enhance userspace vhost_cuse so that it supports cross-file
memory regions.
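To make the vhost-user point concrete: the front end never copies guest memory
to the backend; it passes the fd of the file backing each region as ancillary
data of sendmsg() on the vhost-user unix socket, and the backend mmap()s that
fd. Roughly like the sketch below (just an illustration, not the actual
librte_vhost code; the function name and its parameters are made up):

  /* Sketch: send one memory-region message plus the fd of the file
   * (tmpfs or hugetlbfs) that backs it. "sock" is the connected
   * AF_UNIX vhost-user socket. */
  #include <string.h>
  #include <sys/socket.h>
  #include <sys/uio.h>

  static int
  send_region_fd(int sock, void *msg, size_t len, int fd)
  {
      struct iovec iov = { .iov_base = msg, .iov_len = len };
      char control[CMSG_SPACE(sizeof(fd))];
      struct msghdr mh;
      struct cmsghdr *cmsg;

      memset(&mh, 0, sizeof(mh));
      mh.msg_iov = &iov;
      mh.msg_iovlen = 1;
      mh.msg_control = control;
      mh.msg_controllen = sizeof(control);

      cmsg = CMSG_FIRSTHDR(&mh);
      cmsg->cmsg_level = SOL_SOCKET;
      cmsg->cmsg_type = SCM_RIGHTS;          /* ancillary data carries the fd */
      cmsg->cmsg_len = CMSG_LEN(sizeof(fd));
      memcpy(CMSG_DATA(cmsg), &fd, sizeof(fd));

      return sendmsg(sock, &mh, 0);          /* peer receives a dup of the fd */
  }

So every region must be backed by something the backend can receive as an fd
and mmap(); guest-physical addresses taken from a mempool, as in your snippet,
are not enough by themselves for the vhost-user case.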
Here are some ideas to support hugetlbfs, FYI:

My previous idea was to use the -v option of "docker run" to map hugetlbfs
into the container's /dev/shm, so that we can create a "huge" shm file on
hugetlbfs. But this did not seem acceptable to others.

You mentioned that DPDK now creates a file for each hugepage. Maybe we just
need to share all of these hugepage files with vhost (a rough sketch of what
I mean is below my signature). To minimize the memory translation effort, we
would need to require that as few pages as possible are used.

Can you accept this solution?

Thanks,
Jianfeng
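P.S. To sketch what "share all these hugepage files with vhost" would look
like: each per-hugepage file (or a file created on a hugetlbfs mount that is
bind-mounted into the container with "docker run -v") already gives us an fd
that could be handed to vhost-user as one memory region. The path and size
below are only examples:

  /* Sketch: create and map one hugepage-backed file whose fd could later
   * be passed to the vhost backend. /dev/hugepages is assumed to be a
   * hugetlbfs mount visible both inside and outside the container. */
  #include <fcntl.h>
  #include <stdio.h>
  #include <sys/mman.h>
  #include <unistd.h>

  #define REGION_SIZE (2UL * 1024 * 1024)    /* one 2 MB hugepage, for example */

  int main(void)
  {
      int fd = open("/dev/hugepages/example_seg0", O_CREAT | O_RDWR, 0600);
      if (fd < 0)
          return 1;
      if (ftruncate(fd, REGION_SIZE) < 0)    /* must be a hugepage-size multiple */
          return 1;

      void *va = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
      if (va == MAP_FAILED)
          return 1;

      printf("region mapped at %p, backed by fd %d\n", va, fd);
      /* keep fd open: this is what would go into SET_MEM_TABLE */
      return 0;
  }

The fewer hugepages (and thus regions) we use, the fewer entries the backend
has to search when translating guest addresses, which is why I suggest keeping
the page count small.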