From: "Hu, Jiayu"
To: "Ding, Xuan", "dev@dpdk.org", "Burakov, Anatoly", "maxime.coquelin@redhat.com", "Xia, Chenbo"
CC: "Jiang, Cheng1", "Richardson, Bruce", "Pai G, Sunil", "Wang, Yinan", "Yang, YvonneX"
Date: Mon, 27 Sep 2021 04:17:39 +0000
Message-ID: <144cf26ebf434ff4b6f3b0b22ebc41a6@intel.com>
References: <20210901053044.109901-1-xuan.ding@intel.com> <20210925100358.61995-1-xuan.ding@intel.com> <20210925100358.61995-3-xuan.ding@intel.com>
In-Reply-To: <20210925100358.61995-3-xuan.ding@intel.com>
Subject: Re: [dpdk-dev] [PATCH v3 2/2] vhost: enable IOMMU for async vhost

Hi Xuan,

> -----Original Message-----
> From: Ding, Xuan
> Sent: Saturday, September 25, 2021 6:04 PM
> To: dev@dpdk.org; Burakov, Anatoly; maxime.coquelin@redhat.com; Xia, Chenbo
> Cc: Hu, Jiayu; Jiang, Cheng1; Richardson, Bruce; Pai G, Sunil; Wang, Yinan;
> Yang, YvonneX; Ding, Xuan
> Subject: [PATCH v3 2/2] vhost: enable IOMMU for async vhost
>
> The use of IOMMU has many advantages, such as isolation and address
> translation. This patch extends the capability of the DMA engine to use
> the IOMMU if the DMA engine is bound to vfio.
>
> When the memory table is set, the guest memory will be mapped into the
> default container of DPDK.
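For reference, the per-region bookkeeping this patch introduces can be modeled outside DPDK. The sketch below is a minimal, self-contained approximation under stated assumptions, not the patch itself: `struct mock_iommu`, `mock_map()`, `mock_unmap()`, `map_all_regions()` and `unmap_all_regions()` are hypothetical names, and the mock stands in for `rte_vfio_container_dma_map()`/`rte_vfio_container_dma_unmap()` on the default container. It keeps one flag per region (the role `async_map_status` plays), tolerates ENODEV or ENOTSUP as "nothing to program", and unmaps only the regions whose map succeeded.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mock IOMMU (not a DPDK type): counts live mappings, can be told to fail. */
struct mock_iommu {
	int fail_errno; /* 0: every map succeeds; otherwise maps fail with this errno */
	int live_maps;  /* number of regions currently mapped */
};

/* Stand-in for a container DMA map call: 0 on success, -1 with *err set on failure. */
static int mock_map(struct mock_iommu *m, int *err)
{
	if (m->fail_errno != 0) {
		*err = m->fail_errno;
		return -1;
	}
	m->live_maps++;
	return 0;
}

/* Stand-in for the matching unmap call; only valid for a live mapping. */
static void mock_unmap(struct mock_iommu *m)
{
	assert(m->live_maps > 0);
	m->live_maps--;
}

/*
 * Map every region and record per-region success in status[], which must
 * start zeroed (the patch uses rte_zmalloc_socket() for this).
 * ENODEV/ENOTSUP mean there is no VFIO device to program, so they are
 * tolerated; any other errno is a real failure and stops the loop.
 * Returns 0 on success, -1 on a real failure.
 */
static int map_all_regions(struct mock_iommu *m, bool *status, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		int err = 0;

		status[i] = (mock_map(m, &err) == 0);
		if (!status[i] && err != ENODEV && err != ENOTSUP)
			return -1;
	}
	return 0;
}

/* Unmap only the regions whose map succeeded, clearing each flag afterwards. */
static void unmap_all_regions(struct mock_iommu *m, bool *status, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!status[i])
			continue;
		mock_unmap(m);
		status[i] = false;
	}
}
```

Because a single errno value can never equal both ENODEV and ENOTSUP, the "tolerate" test has to combine the two comparisons with `||`; the sketch writes the negated form, `err != ENODEV && err != ENOTSUP`.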
>
> Signed-off-by: Xuan Ding
> ---
>  lib/vhost/vhost.h      |   4 ++
>  lib/vhost/vhost_user.c | 112 ++++++++++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 114 insertions(+), 2 deletions(-)
>
> diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
> index 89a31e4ca8..bc5695e899 100644
> --- a/lib/vhost/vhost.h
> +++ b/lib/vhost/vhost.h
> @@ -370,6 +370,10 @@ struct virtio_net {
>  	int16_t			broadcast_rarp;
>  	uint32_t		nr_vring;
>  	int			async_copy;
> +
> +	/* Record the dma map status for each region. */
> +	bool			*async_map_status;
> +
>  	int			extbuf;
>  	int			linearbuf;
>  	struct vhost_virtqueue	*virtqueue[VHOST_MAX_QUEUE_PAIRS * 2];
> diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
> index 29a4c9af60..3990e9b057 100644
> --- a/lib/vhost/vhost_user.c
> +++ b/lib/vhost/vhost_user.c
> @@ -45,6 +45,8 @@
>  #include
>  #include
>  #include
> +#include
> +#include
>
>  #include "iotlb.h"
>  #include "vhost.h"
> @@ -141,6 +143,63 @@ get_blk_size(int fd)
>  	return ret == -1 ? (uint64_t)-1 : (uint64_t)stat.st_blksize;
>  }
>
> +static int
> +async_dma_map(struct rte_vhost_mem_region *region, bool *dma_map_success, bool do_map)
> +{
> +	uint64_t host_iova;
> +	int ret = 0;
> +
> +	host_iova = rte_mem_virt2iova((void *)(uintptr_t)region->host_user_addr);
> +	if (do_map) {
> +		/* Add mapped region into the default container of DPDK. */
> +		ret = rte_vfio_container_dma_map(RTE_VFIO_DEFAULT_CONTAINER_FD,
> +						 region->host_user_addr,
> +						 host_iova,
> +						 region->size);
> +		*dma_map_success = ret == 0;
> +
> +		if (ret) {
> +			/*
> +			 * The DMA device may be bound to a kernel driver; in this
> +			 * case, we don't need to program the IOMMU manually. However,
> +			 * if no device is bound to vfio/uio in DPDK and the vfio
> +			 * kernel module is loaded, the API will still be called and
> +			 * return with ENODEV/ENOTSUP.
> +			 *
> +			 * DPDK VFIO only returns ENODEV/ENOTSUP in very similar
> +			 * situations (VFIO either unsupported, or supported
> +			 * but no devices found).
> +			 * Either way, no mappings could be performed. We treat
> +			 * it as a normal case in the async path.
> +			 */
> +			if (rte_errno == ENODEV || rte_errno == ENOTSUP) {
> +				return 0;
> +			} else {
> +				VHOST_LOG_CONFIG(ERR, "DMA engine map failed\n");
> +				return ret;
> +			}
> +		}
> +
> +	} else {
> +		/* No need to do vfio unmap if the map failed. */
> +		if (!*dma_map_success)
> +			return 0;
> +
> +		/* Remove mapped region from the default container of DPDK. */
> +		ret = rte_vfio_container_dma_unmap(RTE_VFIO_DEFAULT_CONTAINER_FD,
> +						   region->host_user_addr,
> +						   host_iova,
> +						   region->size);
> +		if (ret) {
> +			VHOST_LOG_CONFIG(ERR, "DMA engine unmap failed\n");
> +			return ret;
> +		}
> +		/* Clear the flag once the unmap succeeds. */
> +		*dma_map_success = 0;
> +	}
> +
> +	return ret;
> +}
> +
>  static void
>  free_mem_region(struct virtio_net *dev)
>  {
> @@ -153,6 +212,9 @@ free_mem_region(struct virtio_net *dev)
>  	for (i = 0; i < dev->mem->nregions; i++) {
>  		reg = &dev->mem->regions[i];
>  		if (reg->host_user_addr) {
> +			if (dev->async_copy && rte_vfio_is_enabled("vfio"))
> +				async_dma_map(reg, &dev->async_map_status[i], false);
> +
>  			munmap(reg->mmap_addr, reg->mmap_size);
>  			close(reg->fd);
>  		}
> @@ -203,6 +265,11 @@ vhost_backend_cleanup(struct virtio_net *dev)
>  	}
>
>  	dev->postcopy_listening = 0;
> +
> +	if (dev->async_map_status) {
> +		rte_free(dev->async_map_status);
> +		dev->async_map_status = NULL;
> +	}
>  }
>
>  static void
> @@ -621,6 +688,17 @@ numa_realloc(struct virtio_net *dev, int index)
>  	}
>  	dev->mem = mem;
>
> +	if (dev->async_copy && rte_vfio_is_enabled("vfio")) {
> +		dev->async_map_status = rte_zmalloc_socket("async-dma-map-status",
> +					sizeof(bool) * dev->mem->nregions, 0, node);
> +		if (!dev->async_map_status) {
> +			VHOST_LOG_CONFIG(ERR,
> +				"(%d) failed to realloc dma mapping status on node\n",
> +				dev->vid);
> +			return dev;
> +		}
> +	}
> +
>  	gp = rte_realloc_socket(dev->guest_pages, dev->max_guest_pages *
>  			sizeof(*gp), RTE_CACHE_LINE_SIZE, node);
>  	if (!gp) {
> @@ -1151,12 +1229,14 @@ vhost_user_postcopy_register(struct virtio_net *dev, int main_fd,
>  static int
>  vhost_user_mmap_region(struct virtio_net *dev,
>  		struct rte_vhost_mem_region *region,
> +		uint32_t region_index,
>  		uint64_t mmap_offset)
>  {
>  	void *mmap_addr;
>  	uint64_t mmap_size;
>  	uint64_t alignment;
>  	int populate;
> +	int ret;
>
>  	/* Check for memory_size + mmap_offset overflow */
>  	if (mmap_offset >= -region->size) {
> @@ -1210,13 +1290,25 @@ vhost_user_mmap_region(struct virtio_net *dev,
>  	region->mmap_size = mmap_size;
>  	region->host_user_addr = (uint64_t)(uintptr_t)mmap_addr + mmap_offset;
>
> -	if (dev->async_copy)
> +	if (dev->async_copy) {
>  		if (add_guest_pages(dev, region, alignment) < 0) {
>  			VHOST_LOG_CONFIG(ERR,
>  					"adding guest pages to region failed.\n");
>  			return -1;
>  		}
>
> +		if (rte_vfio_is_enabled("vfio")) {
> +			ret = async_dma_map(region, &dev->async_map_status[region_index], true);
> +			if (ret) {
> +				VHOST_LOG_CONFIG(ERR, "Configure IOMMU for DMA "
> +							"engine failed\n");
> +				rte_free(dev->async_map_status);
> +				dev->async_map_status = NULL;

The freed dev->async_map_status is accessed later in free_mem_region(), so
it must only be freed after free_mem_region() has been called.

> +				return -1;
> +			}
> +		}
> +	}
> +
>  	VHOST_LOG_CONFIG(INFO,
>  			"guest memory region size: 0x%" PRIx64 "\n"
>  			"\t guest physical addr: 0x%" PRIx64 "\n"
> @@ -1291,6 +1383,11 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg,
>  		dev->mem = NULL;
>  	}
>
> +	if (dev->async_map_status) {
> +		rte_free(dev->async_map_status);
> +		dev->async_map_status = NULL;
> +	}

To handle the guest memory hot-plug case, you need to unmap the old IOMMU
entries before programming the IOMMU for the new memory. However, you only
seem to free the old dev->async_map_status.
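To make the ordering concrete, here is a minimal model under stated assumptions. `struct dev_state`, `release_regions()`, `teardown()` and `set_mem_table()` are hypothetical stand-ins loosely modeled on `struct virtio_net`, `free_mem_region()`, `vhost_backend_cleanup()` and `vhost_user_set_mem_table()`; an integer counter stands in for the IOMMU entries that `rte_vfio_container_dma_unmap()` would remove, and `calloc()` plays the part of `rte_zmalloc_socket()`. The only point is the order: unmap the old regions while the status array is still valid, free the array afterwards, then allocate and map for the new table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified device state; not DPDK's struct virtio_net. */
struct dev_state {
	size_t nregions;  /* regions in the current memory table */
	bool *map_status; /* one "mapped" flag per region, NULL when not allocated */
	int live_maps;    /* stands in for IOMMU entries currently programmed */
};

/* Unmap every region flagged as mapped. Reads map_status, so it must still be allocated. */
static void release_regions(struct dev_state *d)
{
	for (size_t i = 0; d->map_status != NULL && i < d->nregions; i++) {
		if (d->map_status[i]) {
			assert(d->live_maps > 0);
			d->live_maps--; /* where the real code would issue the DMA unmap */
			d->map_status[i] = false;
		}
	}
	d->nregions = 0;
}

/* Teardown order: unmap first (needs map_status), free the status array last. */
static void teardown(struct dev_state *d)
{
	release_regions(d);
	free(d->map_status);
	d->map_status = NULL;
}

/*
 * Install a new table of n regions (n > 0), e.g. after guest memory hot-plug:
 * stale mappings of the old table are removed before the new ones are added.
 */
static int set_mem_table(struct dev_state *d, size_t n)
{
	teardown(d);
	d->map_status = calloc(n, sizeof(*d->map_status));
	if (d->map_status == NULL)
		return -1;
	d->nregions = n;
	for (size_t i = 0; i < n; i++) {
		d->map_status[i] = true; /* pretend each DMA map succeeds */
		d->live_maps++;
	}
	return 0;
}
```

Calling `set_mem_table()` with 2 and then 3 regions leaves three live mappings. If the old entries were not released first, five would remain, which is the stale-mapping situation the hot-plug case runs into.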
Thanks,
Jiayu

> +
>  	/* Flush IOTLB cache as previous HVAs are now invalid */
>  	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
>  		for (i = 0; i < dev->nr_vring; i++)
> @@ -1329,6 +1426,17 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg,
>  		goto free_guest_pages;
>  	}
>
> +	if (dev->async_copy) {
> +		dev->async_map_status = rte_zmalloc_socket("async-dma-map-status",
> +					sizeof(bool) * memory->nregions, 0, numa_node);
> +		if (!dev->async_map_status) {
> +			VHOST_LOG_CONFIG(ERR,
> +				"(%d) failed to allocate memory for dma mapping status\n",
> +				dev->vid);
> +			goto free_guest_pages;
> +		}
> +	}
> +
>  	for (i = 0; i < memory->nregions; i++) {
>  		reg = &dev->mem->regions[i];
>
> @@ -1345,7 +1453,7 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg,
>
>  		mmap_offset = memory->regions[i].mmap_offset;
>
> -		if (vhost_user_mmap_region(dev, reg, mmap_offset) < 0) {
> +		if (vhost_user_mmap_region(dev, reg, i, mmap_offset) < 0) {
>  			VHOST_LOG_CONFIG(ERR, "Failed to mmap region %u\n", i);
>  			goto free_mem_table;
>  		}
> --
> 2.17.1