From: Matan Azrad <matan@mellanox.com>
To: Matan Azrad <matan@mellanox.com>,
	Maxime Coquelin <maxime.coquelin@redhat.com>,
	Tiwei Bie <tiwei.bie@intel.com>,
	Zhihong Wang <zhihong.wang@intel.com>,
	Xiao Wang <xiao.w.wang@intel.com>
Cc: Ferruh Yigit <ferruh.yigit@intel.com>,
	"dev@dpdk.org" <dev@dpdk.org>,
	Thomas Monjalon <thomas@monjalon.net>,
	Andrew Rybchenko <arybchenko@solarflare.com>
Subject: Re: [dpdk-dev] [PATCH v2 3/3] drivers: move ifc driver to the vDPA class
Date: Thu, 9 Jan 2020 17:25:41 +0000
Message-ID: <AM0PR0502MB4019F014310929212E496790D2390@AM0PR0502MB4019.eurprd05.prod.outlook.com>
In-Reply-To: <1578567617-3541-4-git-send-email-matan@mellanox.com>

Small typo inline.

From: Matan Azrad
> A new vDPA class was recently introduced.
> 
> IFC driver implements the vDPA operations, hence it should be moved to
> the vDPA class.
> 
> Move it.
> 
> Signed-off-by: Matan Azrad <matan@mellanox.com>
> ---
>  MAINTAINERS                              |   14 +-
>  doc/guides/nics/features/ifcvf.ini       |    8 -
>  doc/guides/nics/ifc.rst                  |  106 ---
>  doc/guides/nics/index.rst                |    1 -
>  doc/guides/vdpadevs/features/ifcvf.ini   |    8 +
>  doc/guides/vdpadevs/ifc.rst              |  106 +++
>  doc/guides/vdpadevs/index.rst            |    1 +
>  drivers/net/Makefile                     |    3 -
>  drivers/net/ifc/Makefile                 |   34 -
>  drivers/net/ifc/base/ifcvf.c             |  329 --------
>  drivers/net/ifc/base/ifcvf.h             |  162 ----
>  drivers/net/ifc/base/ifcvf_osdep.h       |   52 --
>  drivers/net/ifc/ifcvf_vdpa.c             | 1280 ------------------------------
>  drivers/net/ifc/meson.build              |    9 -
>  drivers/net/ifc/rte_pmd_ifc_version.map  |    3 -
>  drivers/net/meson.build                  |    1 -
>  drivers/vdpa/Makefile                    |    6 +
>  drivers/vdpa/ifc/Makefile                |   34 +
>  drivers/vdpa/ifc/base/ifcvf.c            |  329 ++++++++
>  drivers/vdpa/ifc/base/ifcvf.h            |  162 ++++
>  drivers/vdpa/ifc/base/ifcvf_osdep.h      |   52 ++
>  drivers/vdpa/ifc/ifcvf_vdpa.c            | 1280 ++++++++++++++++++++++++++++++
>  drivers/vdpa/ifc/meson.build             |    9 +
>  drivers/vdpa/ifc/rte_pmd_ifc_version.map |    3 +
>  drivers/vdpa/meson.build                 |    2 +-
>  25 files changed, 1997 insertions(+), 1997 deletions(-)
>  delete mode 100644 doc/guides/nics/features/ifcvf.ini
>  delete mode 100644 doc/guides/nics/ifc.rst
>  create mode 100644 doc/guides/vdpadevs/features/ifcvf.ini
>  create mode 100644 doc/guides/vdpadevs/ifc.rst
>  delete mode 100644 drivers/net/ifc/Makefile
>  delete mode 100644 drivers/net/ifc/base/ifcvf.c
>  delete mode 100644 drivers/net/ifc/base/ifcvf.h
>  delete mode 100644 drivers/net/ifc/base/ifcvf_osdep.h
>  delete mode 100644 drivers/net/ifc/ifcvf_vdpa.c
>  delete mode 100644 drivers/net/ifc/meson.build
>  delete mode 100644 drivers/net/ifc/rte_pmd_ifc_version.map
>  create mode 100644 drivers/vdpa/ifc/Makefile
>  create mode 100644 drivers/vdpa/ifc/base/ifcvf.c
>  create mode 100644 drivers/vdpa/ifc/base/ifcvf.h
>  create mode 100644 drivers/vdpa/ifc/base/ifcvf_osdep.h
>  create mode 100644 drivers/vdpa/ifc/ifcvf_vdpa.c
>  create mode 100644 drivers/vdpa/ifc/meson.build
>  create mode 100644 drivers/vdpa/ifc/rte_pmd_ifc_version.map
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 17c2df7..16facba 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -679,14 +679,6 @@ T: git://dpdk.org/next/dpdk-next-net-intel
>  F: drivers/net/iavf/
>  F: doc/guides/nics/features/iavf*.ini
> 
> -Intel ifc
> -M: Xiao Wang <xiao.w.wang@intel.com>
> -T: git://dpdk.org/next/dpdk-next-net-intel
> -F: drivers/net/ifc/
> -F: doc/guides/nics/ifc.rst
> -F: doc/guides/nics/features/ifc*.ini
> -
> -Intel ice

Removing this line is a typo; the "Intel ice" section title should stay.
It will be fixed in the next version if needed, or during integration.

>  M: Qiming Yang <qiming.yang@intel.com>
>  M: Wenzhuo Lu <wenzhuo.lu@intel.com>
>  T: git://dpdk.org/next/dpdk-next-net-intel
> @@ -1093,6 +1085,12 @@ vDPA Drivers
>  ------------
>  T: git://dpdk.org/next/dpdk-next-virtio
> 
> +Intel ifc
> +M: Xiao Wang <xiao.w.wang@intel.com>
> +F: drivers/vdpa/ifc/
> +F: doc/guides/vdpadevs/ifc.rst
> +F: doc/guides/vdpadevs/features/ifcvf.ini
> +
> 
>  Eventdev Drivers
>  ----------------
> diff --git a/doc/guides/nics/features/ifcvf.ini b/doc/guides/nics/features/ifcvf.ini
> deleted file mode 100644
> index ef1fc47..0000000
> --- a/doc/guides/nics/features/ifcvf.ini
> +++ /dev/null
> @@ -1,8 +0,0 @@
> -;
> -; Supported features of the 'ifcvf' vDPA driver.
> -;
> -; Refer to default.ini for the full list of available PMD features.
> -;
> -[Features]
> -x86-32               = Y
> -x86-64               = Y
> diff --git a/doc/guides/nics/ifc.rst b/doc/guides/nics/ifc.rst
> deleted file mode 100644
> index 12a2a34..0000000
> --- a/doc/guides/nics/ifc.rst
> +++ /dev/null
> @@ -1,106 +0,0 @@
> -..  SPDX-License-Identifier: BSD-3-Clause
> -    Copyright(c) 2018 Intel Corporation.
> -
> -IFCVF vDPA driver
> -=================
> -
> -The IFCVF vDPA (vhost data path acceleration) driver provides support for the
> -Intel FPGA 100G VF (IFCVF). IFCVF's datapath is virtio ring compatible, it
> -works as a HW vhost backend which can send/receive packets to/from virtio
> -directly by DMA. Besides, it supports dirty page logging and device state
> -report/restore, this driver enables its vDPA functionality.
> -
> -
> -Pre-Installation Configuration
> -------------------------------
> -
> -Config File Options
> -~~~~~~~~~~~~~~~~~~~
> -
> -The following option can be modified in the ``config`` file.
> -
> -- ``CONFIG_RTE_LIBRTE_IFC_PMD`` (default ``y`` for linux)
> -
> -  Toggle compilation of the ``librte_pmd_ifc`` driver.
> -
> -
> -IFCVF vDPA Implementation
> --------------------------
> -
> -IFCVF's vendor ID and device ID are same as that of virtio net pci device,
> -with its specific subsystem vendor ID and device ID. To let the device be
> -probed by IFCVF driver, adding "vdpa=1" parameter helps to specify that this
> -device is to be used in vDPA mode, rather than polling mode, virtio pmd will
> -skip when it detects this message. If no this parameter specified, device
> -will not be used as a vDPA device, and it will be driven by virtio pmd.
> -
> -Different VF devices serve different virtio frontends which are in different
> -VMs, so each VF needs to have its own DMA address translation service. During
> -the driver probe a new container is created for this device, with this
> -container vDPA driver can program DMA remapping table with the VM's memory
> -region information.
> -
> -The device argument "sw-live-migration=1" will configure the driver into SW
> -assisted live migration mode. In this mode, the driver will set up a SW relay
> -thread when LM happens, this thread will help device to log dirty pages. Thus
> -this mode does not require HW to implement a dirty page logging function block,
> -but will consume some percentage of CPU resource depending on the network
> -throughput. If no this parameter specified, driver will rely on device's logging
> -capability.
> -
> -Key IFCVF vDPA driver ops
> -~~~~~~~~~~~~~~~~~~~~~~~~~
> -
> -- ifcvf_dev_config:
> -  Enable VF data path with virtio information provided by vhost lib, including
> -  IOMMU programming to enable VF DMA to VM's memory, VFIO interrupt setup to
> -  route HW interrupt to virtio driver, create notify relay thread to translate
> -  virtio driver's kick to a MMIO write onto HW, HW queues configuration.
> -
> -  This function gets called to set up HW data path backend when virtio driver
> -  in VM gets ready.
> -
> -- ifcvf_dev_close:
> -  Revoke all the setup in ifcvf_dev_config.
> -
> -  This function gets called when virtio driver stops device in VM.
> -
> -To create a vhost port with IFC VF
> -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> -
> -- Create a vhost socket and assign a VF's device ID to this socket via
> -  vhost API. When QEMU vhost connection gets ready, the assigned VF will
> -  get configured automatically.
> -
> -
> -Features
> ---------
> -
> -Features of the IFCVF driver are:
> -
> -- Compatibility with virtio 0.95 and 1.0.
> -- SW assisted vDPA live migration.
> -
> -
> -Prerequisites
> --------------
> -
> -- Platform with IOMMU feature. IFC VF needs address translation service to
> -  Rx/Tx directly with virtio driver in VM.
> -
> -
> -Limitations
> ------------
> -
> -Dependency on vfio-pci
> -~~~~~~~~~~~~~~~~~~~~~~
> -
> -vDPA driver needs to setup VF MSIX interrupts, each queue's interrupt vector
> -is mapped to a callfd associated with a virtio ring. Currently only vfio-pci
> -allows multiple interrupts, so the IFCVF driver is dependent on vfio-pci.
> -
> -Live Migration with VIRTIO_NET_F_GUEST_ANNOUNCE
> -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> -
> -IFC VF doesn't support RARP packet generation, virtio frontend supporting
> -VIRTIO_NET_F_GUEST_ANNOUNCE feature can help to do that.
> diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
> index d61c27f..8c540c0 100644
> --- a/doc/guides/nics/index.rst
> +++ b/doc/guides/nics/index.rst
> @@ -31,7 +31,6 @@ Network Interface Controller Drivers
>      hns3
>      i40e
>      ice
> -    ifc
>      igb
>      ipn3ke
>      ixgbe
> diff --git a/doc/guides/vdpadevs/features/ifcvf.ini b/doc/guides/vdpadevs/features/ifcvf.ini
> new file mode 100644
> index 0000000..ef1fc47
> --- /dev/null
> +++ b/doc/guides/vdpadevs/features/ifcvf.ini
> @@ -0,0 +1,8 @@
> +;
> +; Supported features of the 'ifcvf' vDPA driver.
> +;
> +; Refer to default.ini for the full list of available PMD features.
> +;
> +[Features]
> +x86-32               = Y
> +x86-64               = Y
> diff --git a/doc/guides/vdpadevs/ifc.rst b/doc/guides/vdpadevs/ifc.rst
> new file mode 100644
> index 0000000..12a2a34
> --- /dev/null
> +++ b/doc/guides/vdpadevs/ifc.rst
> @@ -0,0 +1,106 @@
> +..  SPDX-License-Identifier: BSD-3-Clause
> +    Copyright(c) 2018 Intel Corporation.
> +
> +IFCVF vDPA driver
> +=================
> +
> +The IFCVF vDPA (vhost data path acceleration) driver provides support for the
> +Intel FPGA 100G VF (IFCVF). IFCVF's datapath is virtio ring compatible, it
> +works as a HW vhost backend which can send/receive packets to/from virtio
> +directly by DMA. Besides, it supports dirty page logging and device state
> +report/restore, this driver enables its vDPA functionality.
> +
> +
> +Pre-Installation Configuration
> +------------------------------
> +
> +Config File Options
> +~~~~~~~~~~~~~~~~~~~
> +
> +The following option can be modified in the ``config`` file.
> +
> +- ``CONFIG_RTE_LIBRTE_IFC_PMD`` (default ``y`` for linux)
> +
> +  Toggle compilation of the ``librte_pmd_ifc`` driver.
> +
> +
> +IFCVF vDPA Implementation
> +-------------------------
> +
> +IFCVF's vendor ID and device ID are same as that of virtio net pci device,
> +with its specific subsystem vendor ID and device ID. To let the device be
> +probed by IFCVF driver, adding "vdpa=1" parameter helps to specify that this
> +device is to be used in vDPA mode, rather than polling mode, virtio pmd will
> +skip when it detects this message. If no this parameter specified, device
> +will not be used as a vDPA device, and it will be driven by virtio pmd.
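
A side note while this text moves: the "vdpa=1" handling described above
is implemented by ifcvf_pci_probe() further down in this patch. A minimal
sketch of that devargs gate, using only the rte_kvargs helpers the driver
already uses (illustrative, not a verbatim copy of the driver code):

	#include <rte_bus_pci.h>
	#include <rte_kvargs.h>

	/* Probe this VF only when the user passed devargs like "vdpa=1". */
	static int
	wants_vdpa_mode(struct rte_pci_device *pci_dev)
	{
		static const char * const valid[] = { "vdpa", NULL };
		struct rte_kvargs *kvlist;
		uint16_t vdpa = 0;

		if (pci_dev->device.devargs == NULL)
			return 0; /* no devargs: leave the VF to virtio pmd */

		kvlist = rte_kvargs_parse(pci_dev->device.devargs->args, valid);
		if (kvlist == NULL)
			return 0;

		/* open_int() (defined later in this file) parses the value */
		if (rte_kvargs_count(kvlist, "vdpa") == 1)
			rte_kvargs_process(kvlist, "vdpa", &open_int, &vdpa);

		rte_kvargs_free(kvlist);
		return vdpa == 1;
	}
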
> +
> +Different VF devices serve different virtio frontends which are in different
> +VMs, so each VF needs to have its own DMA address translation service. During
> +the driver probe a new container is created for this device, with this
> +container vDPA driver can program DMA remapping table with the VM's memory
> +region information.
> +
> +The device argument "sw-live-migration=1" will configure the driver into SW
> +assisted live migration mode. In this mode, the driver will set up a SW relay
> +thread when LM happens, this thread will help device to log dirty pages. Thus
> +this mode does not require HW to implement a dirty page logging function block,
> +but will consume some percentage of CPU resource depending on the network
> +throughput. If no this parameter specified, driver will rely on device's logging
> +capability.
> +
> +Key IFCVF vDPA driver ops
> +~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +- ifcvf_dev_config:
> +  Enable VF data path with virtio information provided by vhost lib, including
> +  IOMMU programming to enable VF DMA to VM's memory, VFIO interrupt setup to
> +  route HW interrupt to virtio driver, create notify relay thread to translate
> +  virtio driver's kick to a MMIO write onto HW, HW queues configuration.
> +
> +  This function gets called to set up HW data path backend when virtio driver
> +  in VM gets ready.
> +
> +- ifcvf_dev_close:
> +  Revoke all the setup in ifcvf_dev_config.
> +
> +  This function gets called when virtio driver stops device in VM.
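
For readers following the move: these two ops get plugged into the generic
vDPA framework through struct rte_vdpa_dev_ops and registered per PCI
address. A rough sketch of that registration pattern, based on the
rte_vdpa API this driver uses (the real ops table is ifcvf_ops later in
this patch; error handling omitted):

	#include <rte_vdpa.h>

	static struct rte_vdpa_dev_ops ifcvf_ops = {
		.dev_conf  = ifcvf_dev_config, /* virtio driver got ready */
		.dev_close = ifcvf_dev_close,  /* virtio driver stopped */
		/* ... feature, queue-num and notify-area callbacks ... */
	};

	/* in probe, after VFIO setup; returns the device id (did) */
	struct rte_vdpa_dev_addr dev_addr = {
		.type = PCI_ADDR,
		.pci_addr = pci_dev->addr,
	};
	int did = rte_vdpa_register_device(&dev_addr, &ifcvf_ops);
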
> +
> +To create a vhost port with IFC VF
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +- Create a vhost socket and assign a VF's device ID to this socket via
> +  vhost API. When QEMU vhost connection gets ready, the assigned VF will
> +  get configured automatically.
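
From the application side the bullet above maps to a few vhost lib calls;
a hedged sketch (the socket path is illustrative, the API is the one the
examples/vdpa sample application uses):

	#include <rte_vhost.h>
	#include <rte_vdpa.h>

	const char *path = "/tmp/vdpa-socket0";       /* illustrative */
	int did = rte_vdpa_find_device_id(&dev_addr); /* the probed VF */

	rte_vhost_driver_register(path, RTE_VHOST_USER_CLIENT);
	rte_vhost_driver_attach_vdpa_device(path, did);
	rte_vhost_driver_start(path);
	/*
	 * Once the QEMU vhost connection is ready, the framework calls
	 * ifcvf_dev_config() and the assigned VF is configured automatically.
	 */
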
> +
> +
> +Features
> +--------
> +
> +Features of the IFCVF driver are:
> +
> +- Compatibility with virtio 0.95 and 1.0.
> +- SW assisted vDPA live migration.
> +
> +
> +Prerequisites
> +-------------
> +
> +- Platform with IOMMU feature. IFC VF needs address translation service to
> +  Rx/Tx directly with virtio driver in VM.
> +
> +
> +Limitations
> +-----------
> +
> +Dependency on vfio-pci
> +~~~~~~~~~~~~~~~~~~~~~~
> +
> +vDPA driver needs to setup VF MSIX interrupts, each queue's interrupt vector
> +is mapped to a callfd associated with a virtio ring. Currently only vfio-pci
> +allows multiple interrupts, so the IFCVF driver is dependent on vfio-pci.
> +
> +Live Migration with VIRTIO_NET_F_GUEST_ANNOUNCE
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +IFC VF doesn't support RARP packet generation, virtio frontend supporting
> +VIRTIO_NET_F_GUEST_ANNOUNCE feature can help to do that.
> diff --git a/doc/guides/vdpadevs/index.rst b/doc/guides/vdpadevs/index.rst
> index 89e2b03..6cf0827 100644
> --- a/doc/guides/vdpadevs/index.rst
> +++ b/doc/guides/vdpadevs/index.rst
> @@ -12,3 +12,4 @@ which can be used from an application through vhost API.
>      :numbered:
> 
>      features_overview
> +    ifc
> diff --git a/drivers/net/Makefile b/drivers/net/Makefile
> index cee3036..cca3c44 100644
> --- a/drivers/net/Makefile
> +++ b/drivers/net/Makefile
> @@ -71,9 +71,6 @@ endif # $(CONFIG_RTE_LIBRTE_SCHED)
> 
>  ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
>  DIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += vhost
> -ifeq ($(CONFIG_RTE_EAL_VFIO),y)
> -DIRS-$(CONFIG_RTE_LIBRTE_IFC_PMD) += ifc
> -endif
>  endif # $(CONFIG_RTE_LIBRTE_VHOST)
> 
>  ifeq ($(CONFIG_RTE_LIBRTE_MVPP2_PMD),y)
> diff --git a/drivers/net/ifc/Makefile b/drivers/net/ifc/Makefile
> deleted file mode 100644
> index fe227b8..0000000
> --- a/drivers/net/ifc/Makefile
> +++ /dev/null
> @@ -1,34 +0,0 @@
> -# SPDX-License-Identifier: BSD-3-Clause
> -# Copyright(c) 2018 Intel Corporation
> -
> -include $(RTE_SDK)/mk/rte.vars.mk
> -
> -#
> -# library name
> -#
> -LIB = librte_pmd_ifc.a
> -
> -LDLIBS += -lpthread
> -LDLIBS += -lrte_eal -lrte_pci -lrte_vhost -lrte_bus_pci
> -LDLIBS += -lrte_kvargs
> -
> -CFLAGS += -O3
> -CFLAGS += $(WERROR_FLAGS)
> -CFLAGS += -DALLOW_EXPERIMENTAL_API
> -
> -#
> -# Add extra flags for base driver source files to disable warnings in them
> -#
> -BASE_DRIVER_OBJS=$(sort $(patsubst %.c,%.o,$(notdir $(wildcard $(SRCDIR)/base/*.c))))
> -
> -VPATH += $(SRCDIR)/base
> -
> -EXPORT_MAP := rte_pmd_ifc_version.map
> -
> -#
> -# all source are stored in SRCS-y
> -#
> -SRCS-$(CONFIG_RTE_LIBRTE_IFC_PMD) += ifcvf_vdpa.c
> -SRCS-$(CONFIG_RTE_LIBRTE_IFC_PMD) += ifcvf.c
> -
> -include $(RTE_SDK)/mk/rte.lib.mk
> diff --git a/drivers/net/ifc/base/ifcvf.c b/drivers/net/ifc/base/ifcvf.c
> deleted file mode 100644
> index 3c0b2df..0000000
> --- a/drivers/net/ifc/base/ifcvf.c
> +++ /dev/null
> @@ -1,329 +0,0 @@
> -/* SPDX-License-Identifier: BSD-3-Clause
> - * Copyright(c) 2018 Intel Corporation
> - */
> -
> -#include "ifcvf.h"
> -#include "ifcvf_osdep.h"
> -
> -STATIC void *
> -get_cap_addr(struct ifcvf_hw *hw, struct ifcvf_pci_cap *cap)
> -{
> -	u8 bar = cap->bar;
> -	u32 length = cap->length;
> -	u32 offset = cap->offset;
> -
> -	if (bar > IFCVF_PCI_MAX_RESOURCE - 1) {
> -		DEBUGOUT("invalid bar: %u\n", bar);
> -		return NULL;
> -	}
> -
> -	if (offset + length < offset) {
> -		DEBUGOUT("offset(%u) + length(%u) overflows\n",
> -			offset, length);
> -		return NULL;
> -	}
> -
> -	if (offset + length > hw->mem_resource[cap->bar].len) {
> -		DEBUGOUT("offset(%u) + length(%u) overflows bar length(%u)",
> -			offset, length, (u32)hw->mem_resource[cap->bar].len);
> -		return NULL;
> -	}
> -
> -	return hw->mem_resource[bar].addr + offset;
> -}
> -
> -int
> -ifcvf_init_hw(struct ifcvf_hw *hw, PCI_DEV *dev)
> -{
> -	int ret;
> -	u8 pos;
> -	struct ifcvf_pci_cap cap;
> -
> -	ret = PCI_READ_CONFIG_BYTE(dev, &pos, PCI_CAPABILITY_LIST);
> -	if (ret < 0) {
> -		DEBUGOUT("failed to read pci capability list\n");
> -		return -1;
> -	}
> -
> -	while (pos) {
> -		ret = PCI_READ_CONFIG_RANGE(dev, (u32 *)&cap,
> -				sizeof(cap), pos);
> -		if (ret < 0) {
> -			DEBUGOUT("failed to read cap at pos: %x", pos);
> -			break;
> -		}
> -
> -		if (cap.cap_vndr != PCI_CAP_ID_VNDR)
> -			goto next;
> -
> -		DEBUGOUT("cfg type: %u, bar: %u, offset: %u, "
> -				"len: %u\n", cap.cfg_type, cap.bar,
> -				cap.offset, cap.length);
> -
> -		switch (cap.cfg_type) {
> -		case IFCVF_PCI_CAP_COMMON_CFG:
> -			hw->common_cfg = get_cap_addr(hw, &cap);
> -			break;
> -		case IFCVF_PCI_CAP_NOTIFY_CFG:
> -			PCI_READ_CONFIG_DWORD(dev, &hw->notify_off_multiplier,
> -					pos + sizeof(cap));
> -			hw->notify_base = get_cap_addr(hw, &cap);
> -			hw->notify_region = cap.bar;
> -			break;
> -		case IFCVF_PCI_CAP_ISR_CFG:
> -			hw->isr = get_cap_addr(hw, &cap);
> -			break;
> -		case IFCVF_PCI_CAP_DEVICE_CFG:
> -			hw->dev_cfg = get_cap_addr(hw, &cap);
> -			break;
> -		}
> -next:
> -		pos = cap.cap_next;
> -	}
> -
> -	hw->lm_cfg = hw->mem_resource[4].addr;
> -
> -	if (hw->common_cfg == NULL || hw->notify_base == NULL ||
> -			hw->isr == NULL || hw->dev_cfg == NULL) {
> -		DEBUGOUT("capability incomplete\n");
> -		return -1;
> -	}
> -
> -	DEBUGOUT("capability mapping:\ncommon cfg: %p\n"
> -			"notify base: %p\nisr cfg: %p\ndevice cfg: %p\n"
> -			"multiplier: %u\n",
> -			hw->common_cfg, hw->dev_cfg,
> -			hw->isr, hw->notify_base,
> -			hw->notify_off_multiplier);
> -
> -	return 0;
> -}
> -
> -STATIC u8
> -ifcvf_get_status(struct ifcvf_hw *hw)
> -{
> -	return IFCVF_READ_REG8(&hw->common_cfg->device_status);
> -}
> -
> -STATIC void
> -ifcvf_set_status(struct ifcvf_hw *hw, u8 status)
> -{
> -	IFCVF_WRITE_REG8(status, &hw->common_cfg->device_status);
> -}
> -
> -STATIC void
> -ifcvf_reset(struct ifcvf_hw *hw)
> -{
> -	ifcvf_set_status(hw, 0);
> -
> -	/* flush status write */
> -	while (ifcvf_get_status(hw))
> -		msec_delay(1);
> -}
> -
> -STATIC void
> -ifcvf_add_status(struct ifcvf_hw *hw, u8 status)
> -{
> -	if (status != 0)
> -		status |= ifcvf_get_status(hw);
> -
> -	ifcvf_set_status(hw, status);
> -	ifcvf_get_status(hw);
> -}
> -
> -u64
> -ifcvf_get_features(struct ifcvf_hw *hw)
> -{
> -	u32 features_lo, features_hi;
> -	struct ifcvf_pci_common_cfg *cfg = hw->common_cfg;
> -
> -	IFCVF_WRITE_REG32(0, &cfg->device_feature_select);
> -	features_lo = IFCVF_READ_REG32(&cfg->device_feature);
> -
> -	IFCVF_WRITE_REG32(1, &cfg->device_feature_select);
> -	features_hi = IFCVF_READ_REG32(&cfg->device_feature);
> -
> -	return ((u64)features_hi << 32) | features_lo;
> -}
> -
> -STATIC void
> -ifcvf_set_features(struct ifcvf_hw *hw, u64 features)
> -{
> -	struct ifcvf_pci_common_cfg *cfg = hw->common_cfg;
> -
> -	IFCVF_WRITE_REG32(0, &cfg->guest_feature_select);
> -	IFCVF_WRITE_REG32(features & ((1ULL << 32) - 1), &cfg->guest_feature);
> -
> -	IFCVF_WRITE_REG32(1, &cfg->guest_feature_select);
> -	IFCVF_WRITE_REG32(features >> 32, &cfg->guest_feature);
> -}
> -
> -STATIC int
> -ifcvf_config_features(struct ifcvf_hw *hw)
> -{
> -	u64 host_features;
> -
> -	host_features = ifcvf_get_features(hw);
> -	hw->req_features &= host_features;
> -
> -	ifcvf_set_features(hw, hw->req_features);
> -	ifcvf_add_status(hw, IFCVF_CONFIG_STATUS_FEATURES_OK);
> -
> -	if (!(ifcvf_get_status(hw) & IFCVF_CONFIG_STATUS_FEATURES_OK)) {
> -		DEBUGOUT("failed to set FEATURES_OK status\n");
> -		return -1;
> -	}
> -
> -	return 0;
> -}
> -
> -STATIC void
> -io_write64_twopart(u64 val, u32 *lo, u32 *hi)
> -{
> -	IFCVF_WRITE_REG32(val & ((1ULL << 32) - 1), lo);
> -	IFCVF_WRITE_REG32(val >> 32, hi);
> -}
> -
> -STATIC int
> -ifcvf_hw_enable(struct ifcvf_hw *hw)
> -{
> -	struct ifcvf_pci_common_cfg *cfg;
> -	u8 *lm_cfg;
> -	u32 i;
> -	u16 notify_off;
> -
> -	cfg = hw->common_cfg;
> -	lm_cfg = hw->lm_cfg;
> -
> -	IFCVF_WRITE_REG16(0, &cfg->msix_config);
> -	if (IFCVF_READ_REG16(&cfg->msix_config) == IFCVF_MSI_NO_VECTOR) {
> -		DEBUGOUT("msix vec alloc failed for device config\n");
> -		return -1;
> -	}
> -
> -	for (i = 0; i < hw->nr_vring; i++) {
> -		IFCVF_WRITE_REG16(i, &cfg->queue_select);
> -		io_write64_twopart(hw->vring[i].desc, &cfg->queue_desc_lo,
> -				&cfg->queue_desc_hi);
> -		io_write64_twopart(hw->vring[i].avail, &cfg->queue_avail_lo,
> -				&cfg->queue_avail_hi);
> -		io_write64_twopart(hw->vring[i].used, &cfg->queue_used_lo,
> -				&cfg->queue_used_hi);
> -		IFCVF_WRITE_REG16(hw->vring[i].size, &cfg->queue_size);
> -
> -		*(u32 *)(lm_cfg + IFCVF_LM_RING_STATE_OFFSET +
> -				(i / 2) * IFCVF_LM_CFG_SIZE + (i % 2) * 4) =
> -			(u32)hw->vring[i].last_avail_idx |
> -			((u32)hw->vring[i].last_used_idx << 16);
> -
> -		IFCVF_WRITE_REG16(i + 1, &cfg->queue_msix_vector);
> -		if (IFCVF_READ_REG16(&cfg->queue_msix_vector) ==
> -				IFCVF_MSI_NO_VECTOR) {
> -			DEBUGOUT("queue %u, msix vec alloc failed\n",
> -					i);
> -			return -1;
> -		}
> -
> -		notify_off = IFCVF_READ_REG16(&cfg->queue_notify_off);
> -		hw->notify_addr[i] = (void *)((u8 *)hw->notify_base +
> -				notify_off * hw->notify_off_multiplier);
> -		IFCVF_WRITE_REG16(1, &cfg->queue_enable);
> -	}
> -
> -	return 0;
> -}
> -
> -STATIC void
> -ifcvf_hw_disable(struct ifcvf_hw *hw)
> -{
> -	u32 i;
> -	struct ifcvf_pci_common_cfg *cfg;
> -	u32 ring_state;
> -
> -	cfg = hw->common_cfg;
> -
> -	IFCVF_WRITE_REG16(IFCVF_MSI_NO_VECTOR, &cfg->msix_config);
> -	for (i = 0; i < hw->nr_vring; i++) {
> -		IFCVF_WRITE_REG16(i, &cfg->queue_select);
> -		IFCVF_WRITE_REG16(0, &cfg->queue_enable);
> -		IFCVF_WRITE_REG16(IFCVF_MSI_NO_VECTOR, &cfg->queue_msix_vector);
> -		ring_state = *(u32 *)(hw->lm_cfg + IFCVF_LM_RING_STATE_OFFSET +
> -				(i / 2) * IFCVF_LM_CFG_SIZE + (i % 2) * 4);
> -		hw->vring[i].last_avail_idx = (u16)(ring_state >> 16);
> -		hw->vring[i].last_used_idx = (u16)(ring_state >> 16);
> -	}
> -}
> -
> -int
> -ifcvf_start_hw(struct ifcvf_hw *hw)
> -{
> -	ifcvf_reset(hw);
> -	ifcvf_add_status(hw, IFCVF_CONFIG_STATUS_ACK);
> -	ifcvf_add_status(hw, IFCVF_CONFIG_STATUS_DRIVER);
> -
> -	if (ifcvf_config_features(hw) < 0)
> -		return -1;
> -
> -	if (ifcvf_hw_enable(hw) < 0)
> -		return -1;
> -
> -	ifcvf_add_status(hw, IFCVF_CONFIG_STATUS_DRIVER_OK);
> -	return 0;
> -}
> -
> -void
> -ifcvf_stop_hw(struct ifcvf_hw *hw)
> -{
> -	ifcvf_hw_disable(hw);
> -	ifcvf_reset(hw);
> -}
> -
> -void
> -ifcvf_enable_logging(struct ifcvf_hw *hw, u64 log_base, u64 log_size)
> -{
> -	u8 *lm_cfg;
> -
> -	lm_cfg = hw->lm_cfg;
> -
> -	*(u32 *)(lm_cfg + IFCVF_LM_BASE_ADDR_LOW) =
> -		log_base & IFCVF_32_BIT_MASK;
> -
> -	*(u32 *)(lm_cfg + IFCVF_LM_BASE_ADDR_HIGH) =
> -		(log_base >> 32) & IFCVF_32_BIT_MASK;
> -
> -	*(u32 *)(lm_cfg + IFCVF_LM_END_ADDR_LOW) =
> -		(log_base + log_size) & IFCVF_32_BIT_MASK;
> -
> -	*(u32 *)(lm_cfg + IFCVF_LM_END_ADDR_HIGH) =
> -		((log_base + log_size) >> 32) & IFCVF_32_BIT_MASK;
> -
> -	*(u32 *)(lm_cfg + IFCVF_LM_LOGGING_CTRL) = IFCVF_LM_ENABLE_VF;
> -}
> -
> -void
> -ifcvf_disable_logging(struct ifcvf_hw *hw)
> -{
> -	u8 *lm_cfg;
> -
> -	lm_cfg = hw->lm_cfg;
> -	*(u32 *)(lm_cfg + IFCVF_LM_LOGGING_CTRL) = IFCVF_LM_DISABLE;
> -}
> -
> -void
> -ifcvf_notify_queue(struct ifcvf_hw *hw, u16 qid)
> -{
> -	IFCVF_WRITE_REG16(qid, hw->notify_addr[qid]);
> -}
> -
> -u8
> -ifcvf_get_notify_region(struct ifcvf_hw *hw)
> -{
> -	return hw->notify_region;
> -}
> -
> -u64
> -ifcvf_get_queue_notify_off(struct ifcvf_hw *hw, int qid)
> -{
> -	return (u8 *)hw->notify_addr[qid] -
> -		(u8 *)hw->mem_resource[hw->notify_region].addr;
> -}
> diff --git a/drivers/net/ifc/base/ifcvf.h b/drivers/net/ifc/base/ifcvf.h
> deleted file mode 100644
> index 9be2770..0000000
> --- a/drivers/net/ifc/base/ifcvf.h
> +++ /dev/null
> @@ -1,162 +0,0 @@
> -/* SPDX-License-Identifier: BSD-3-Clause
> - * Copyright(c) 2018 Intel Corporation
> - */
> -
> -#ifndef _IFCVF_H_
> -#define _IFCVF_H_
> -
> -#include "ifcvf_osdep.h"
> -
> -#define IFCVF_VENDOR_ID		0x1AF4
> -#define IFCVF_DEVICE_ID		0x1041
> -#define IFCVF_SUBSYS_VENDOR_ID	0x8086
> -#define IFCVF_SUBSYS_DEVICE_ID	0x001A
> -
> -#define IFCVF_MAX_QUEUES		1
> -#define VIRTIO_F_IOMMU_PLATFORM		33
> -
> -/* Common configuration */
> -#define IFCVF_PCI_CAP_COMMON_CFG	1
> -/* Notifications */
> -#define IFCVF_PCI_CAP_NOTIFY_CFG	2
> -/* ISR Status */
> -#define IFCVF_PCI_CAP_ISR_CFG		3
> -/* Device specific configuration */
> -#define IFCVF_PCI_CAP_DEVICE_CFG	4
> -/* PCI configuration access */
> -#define IFCVF_PCI_CAP_PCI_CFG		5
> -
> -#define IFCVF_CONFIG_STATUS_RESET     0x00
> -#define IFCVF_CONFIG_STATUS_ACK       0x01
> -#define IFCVF_CONFIG_STATUS_DRIVER    0x02
> -#define IFCVF_CONFIG_STATUS_DRIVER_OK 0x04
> -#define IFCVF_CONFIG_STATUS_FEATURES_OK 0x08
> -#define IFCVF_CONFIG_STATUS_FAILED    0x80
> -
> -#define IFCVF_MSI_NO_VECTOR	0xffff
> -#define IFCVF_PCI_MAX_RESOURCE	6
> -
> -#define IFCVF_LM_CFG_SIZE		0x40
> -#define IFCVF_LM_RING_STATE_OFFSET	0x20
> -
> -#define IFCVF_LM_LOGGING_CTRL		0x0
> -
> -#define IFCVF_LM_BASE_ADDR_LOW		0x10
> -#define IFCVF_LM_BASE_ADDR_HIGH		0x14
> -#define IFCVF_LM_END_ADDR_LOW		0x18
> -#define IFCVF_LM_END_ADDR_HIGH		0x1c
> -
> -#define IFCVF_LM_DISABLE		0x0
> -#define IFCVF_LM_ENABLE_VF		0x1
> -#define IFCVF_LM_ENABLE_PF		0x3
> -#define IFCVF_LOG_BASE			0x100000000000
> -#define IFCVF_MEDIATED_VRING		0x200000000000
> -
> -#define IFCVF_32_BIT_MASK		0xffffffff
> -
> -
> -struct ifcvf_pci_cap {
> -	u8 cap_vndr;            /* Generic PCI field: PCI_CAP_ID_VNDR */
> -	u8 cap_next;            /* Generic PCI field: next ptr. */
> -	u8 cap_len;             /* Generic PCI field: capability length */
> -	u8 cfg_type;            /* Identifies the structure. */
> -	u8 bar;                 /* Where to find it. */
> -	u8 padding[3];          /* Pad to full dword. */
> -	u32 offset;             /* Offset within bar. */
> -	u32 length;             /* Length of the structure, in bytes. */
> -};
> -
> -struct ifcvf_pci_notify_cap {
> -	struct ifcvf_pci_cap cap;
> -	u32 notify_off_multiplier;  /* Multiplier for queue_notify_off. */
> -};
> -
> -struct ifcvf_pci_common_cfg {
> -	/* About the whole device. */
> -	u32 device_feature_select;
> -	u32 device_feature;
> -	u32 guest_feature_select;
> -	u32 guest_feature;
> -	u16 msix_config;
> -	u16 num_queues;
> -	u8 device_status;
> -	u8 config_generation;
> -
> -	/* About a specific virtqueue. */
> -	u16 queue_select;
> -	u16 queue_size;
> -	u16 queue_msix_vector;
> -	u16 queue_enable;
> -	u16 queue_notify_off;
> -	u32 queue_desc_lo;
> -	u32 queue_desc_hi;
> -	u32 queue_avail_lo;
> -	u32 queue_avail_hi;
> -	u32 queue_used_lo;
> -	u32 queue_used_hi;
> -};
> -
> -struct ifcvf_net_config {
> -	u8    mac[6];
> -	u16   status;
> -	u16   max_virtqueue_pairs;
> -} __attribute__((packed));
> -
> -struct ifcvf_pci_mem_resource {
> -	u64      phys_addr; /**< Physical address, 0 if not resource. */
> -	u64      len;       /**< Length of the resource. */
> -	u8       *addr;     /**< Virtual address, NULL when not mapped. */
> -};
> -
> -struct vring_info {
> -	u64 desc;
> -	u64 avail;
> -	u64 used;
> -	u16 size;
> -	u16 last_avail_idx;
> -	u16 last_used_idx;
> -};
> -
> -struct ifcvf_hw {
> -	u64    req_features;
> -	u8     notify_region;
> -	u32    notify_off_multiplier;
> -	struct ifcvf_pci_common_cfg *common_cfg;
> -	struct ifcvf_net_config *dev_cfg;
> -	u8     *isr;
> -	u16    *notify_base;
> -	u16    *notify_addr[IFCVF_MAX_QUEUES * 2];
> -	u8     *lm_cfg;
> -	struct vring_info vring[IFCVF_MAX_QUEUES * 2];
> -	u8 nr_vring;
> -	struct ifcvf_pci_mem_resource mem_resource[IFCVF_PCI_MAX_RESOURCE];
> -};
> -
> -int
> -ifcvf_init_hw(struct ifcvf_hw *hw, PCI_DEV *dev);
> -
> -u64
> -ifcvf_get_features(struct ifcvf_hw *hw);
> -
> -int
> -ifcvf_start_hw(struct ifcvf_hw *hw);
> -
> -void
> -ifcvf_stop_hw(struct ifcvf_hw *hw);
> -
> -void
> -ifcvf_enable_logging(struct ifcvf_hw *hw, u64 log_base, u64 log_size);
> -
> -void
> -ifcvf_disable_logging(struct ifcvf_hw *hw);
> -
> -void
> -ifcvf_notify_queue(struct ifcvf_hw *hw, u16 qid);
> -
> -u8
> -ifcvf_get_notify_region(struct ifcvf_hw *hw);
> -
> -u64
> -ifcvf_get_queue_notify_off(struct ifcvf_hw *hw, int qid);
> -
> -#endif /* _IFCVF_H_ */
> diff --git a/drivers/net/ifc/base/ifcvf_osdep.h b/drivers/net/ifc/base/ifcvf_osdep.h
> deleted file mode 100644
> index 6aef25e..0000000
> --- a/drivers/net/ifc/base/ifcvf_osdep.h
> +++ /dev/null
> @@ -1,52 +0,0 @@
> -/* SPDX-License-Identifier: BSD-3-Clause
> - * Copyright(c) 2018 Intel Corporation
> - */
> -
> -#ifndef _IFCVF_OSDEP_H_
> -#define _IFCVF_OSDEP_H_
> -
> -#include <stdint.h>
> -#include <linux/pci_regs.h>
> -
> -#include <rte_cycles.h>
> -#include <rte_pci.h>
> -#include <rte_bus_pci.h>
> -#include <rte_log.h>
> -#include <rte_io.h>
> -
> -#define DEBUGOUT(S, args...)    RTE_LOG(DEBUG, PMD, S, ##args)
> -#define STATIC                  static
> -
> -#define msec_delay(x)	rte_delay_us_sleep(1000 * (x))
> -
> -#define IFCVF_READ_REG8(reg)		rte_read8(reg)
> -#define IFCVF_WRITE_REG8(val, reg)	rte_write8((val), (reg))
> -#define IFCVF_READ_REG16(reg)		rte_read16(reg)
> -#define IFCVF_WRITE_REG16(val, reg)	rte_write16((val), (reg))
> -#define IFCVF_READ_REG32(reg)		rte_read32(reg)
> -#define IFCVF_WRITE_REG32(val, reg)	rte_write32((val), (reg))
> -
> -typedef struct rte_pci_device PCI_DEV;
> -
> -#define PCI_READ_CONFIG_BYTE(dev, val, where) \
> -	rte_pci_read_config(dev, val, 1, where)
> -
> -#define PCI_READ_CONFIG_DWORD(dev, val, where) \
> -	rte_pci_read_config(dev, val, 4, where)
> -
> -typedef uint8_t    u8;
> -typedef int8_t     s8;
> -typedef uint16_t   u16;
> -typedef int16_t    s16;
> -typedef uint32_t   u32;
> -typedef int32_t    s32;
> -typedef int64_t    s64;
> -typedef uint64_t   u64;
> -
> -static inline int
> -PCI_READ_CONFIG_RANGE(PCI_DEV *dev, uint32_t *val, int size, int where)
> -{
> -	return rte_pci_read_config(dev, val, size, where);
> -}
> -
> -#endif /* _IFCVF_OSDEP_H_ */
> diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
> deleted file mode 100644
> index da4667b..0000000
> --- a/drivers/net/ifc/ifcvf_vdpa.c
> +++ /dev/null
> @@ -1,1280 +0,0 @@
> -/* SPDX-License-Identifier: BSD-3-Clause
> - * Copyright(c) 2018 Intel Corporation
> - */
> -
> -#include <unistd.h>
> -#include <pthread.h>
> -#include <fcntl.h>
> -#include <string.h>
> -#include <sys/ioctl.h>
> -#include <sys/epoll.h>
> -#include <linux/virtio_net.h>
> -#include <stdbool.h>
> -
> -#include <rte_malloc.h>
> -#include <rte_memory.h>
> -#include <rte_bus_pci.h>
> -#include <rte_vhost.h>
> -#include <rte_vdpa.h>
> -#include <rte_vfio.h>
> -#include <rte_spinlock.h>
> -#include <rte_log.h>
> -#include <rte_kvargs.h>
> -#include <rte_devargs.h>
> -
> -#include "base/ifcvf.h"
> -
> -#define DRV_LOG(level, fmt, args...) \
> -	rte_log(RTE_LOG_ ## level, ifcvf_vdpa_logtype, \
> -		"IFCVF %s(): " fmt "\n", __func__, ##args)
> -
> -#ifndef PAGE_SIZE
> -#define PAGE_SIZE 4096
> -#endif
> -
> -#define IFCVF_USED_RING_LEN(size) \
> -	((size) * sizeof(struct vring_used_elem) + sizeof(uint16_t) * 3)
> -
> -#define IFCVF_VDPA_MODE		"vdpa"
> -#define IFCVF_SW_FALLBACK_LM	"sw-live-migration"
> -
> -static const char * const ifcvf_valid_arguments[] = {
> -	IFCVF_VDPA_MODE,
> -	IFCVF_SW_FALLBACK_LM,
> -	NULL
> -};
> -
> -static int ifcvf_vdpa_logtype;
> -
> -struct ifcvf_internal {
> -	struct rte_vdpa_dev_addr dev_addr;
> -	struct rte_pci_device *pdev;
> -	struct ifcvf_hw hw;
> -	int vfio_container_fd;
> -	int vfio_group_fd;
> -	int vfio_dev_fd;
> -	pthread_t tid;	/* thread for notify relay */
> -	int epfd;
> -	int vid;
> -	int did;
> -	uint16_t max_queues;
> -	uint64_t features;
> -	rte_atomic32_t started;
> -	rte_atomic32_t dev_attached;
> -	rte_atomic32_t running;
> -	rte_spinlock_t lock;
> -	bool sw_lm;
> -	bool sw_fallback_running;
> -	/* mediated vring for sw fallback */
> -	struct vring m_vring[IFCVF_MAX_QUEUES * 2];
> -	/* eventfd for used ring interrupt */
> -	int intr_fd[IFCVF_MAX_QUEUES * 2];
> -};
> -
> -struct internal_list {
> -	TAILQ_ENTRY(internal_list) next;
> -	struct ifcvf_internal *internal;
> -};
> -
> -TAILQ_HEAD(internal_list_head, internal_list);
> -static struct internal_list_head internal_list =
> -	TAILQ_HEAD_INITIALIZER(internal_list);
> -
> -static pthread_mutex_t internal_list_lock = PTHREAD_MUTEX_INITIALIZER;
> -
> -static void update_used_ring(struct ifcvf_internal *internal, uint16_t qid);
> -
> -static struct internal_list *
> -find_internal_resource_by_did(int did)
> -{
> -	int found = 0;
> -	struct internal_list *list;
> -
> -	pthread_mutex_lock(&internal_list_lock);
> -
> -	TAILQ_FOREACH(list, &internal_list, next) {
> -		if (did == list->internal->did) {
> -			found = 1;
> -			break;
> -		}
> -	}
> -
> -	pthread_mutex_unlock(&internal_list_lock);
> -
> -	if (!found)
> -		return NULL;
> -
> -	return list;
> -}
> -
> -static struct internal_list *
> -find_internal_resource_by_dev(struct rte_pci_device *pdev)
> -{
> -	int found = 0;
> -	struct internal_list *list;
> -
> -	pthread_mutex_lock(&internal_list_lock);
> -
> -	TAILQ_FOREACH(list, &internal_list, next) {
> -		if (pdev == list->internal->pdev) {
> -			found = 1;
> -			break;
> -		}
> -	}
> -
> -	pthread_mutex_unlock(&internal_list_lock);
> -
> -	if (!found)
> -		return NULL;
> -
> -	return list;
> -}
> -
> -static int
> -ifcvf_vfio_setup(struct ifcvf_internal *internal)
> -{
> -	struct rte_pci_device *dev = internal->pdev;
> -	char devname[RTE_DEV_NAME_MAX_LEN] = {0};
> -	int iommu_group_num;
> -	int i, ret;
> -
> -	internal->vfio_dev_fd = -1;
> -	internal->vfio_group_fd = -1;
> -	internal->vfio_container_fd = -1;
> -
> -	rte_pci_device_name(&dev->addr, devname, RTE_DEV_NAME_MAX_LEN);
> -	ret = rte_vfio_get_group_num(rte_pci_get_sysfs_path(), devname,
> -			&iommu_group_num);
> -	if (ret <= 0) {
> -		DRV_LOG(ERR, "%s failed to get IOMMU group", devname);
> -		return -1;
> -	}
> -
> -	internal->vfio_container_fd = rte_vfio_container_create();
> -	if (internal->vfio_container_fd < 0)
> -		return -1;
> -
> -	internal->vfio_group_fd = rte_vfio_container_group_bind(
> -			internal->vfio_container_fd, iommu_group_num);
> -	if (internal->vfio_group_fd < 0)
> -		goto err;
> -
> -	if (rte_pci_map_device(dev))
> -		goto err;
> -
> -	internal->vfio_dev_fd = dev->intr_handle.vfio_dev_fd;
> -
> -	for (i = 0; i < RTE_MIN(PCI_MAX_RESOURCE, IFCVF_PCI_MAX_RESOURCE);
> -			i++) {
> -		internal->hw.mem_resource[i].addr =
> -			internal->pdev->mem_resource[i].addr;
> -		internal->hw.mem_resource[i].phys_addr =
> -			internal->pdev->mem_resource[i].phys_addr;
> -		internal->hw.mem_resource[i].len =
> -			internal->pdev->mem_resource[i].len;
> -	}
> -
> -	return 0;
> -
> -err:
> -	rte_vfio_container_destroy(internal->vfio_container_fd);
> -	return -1;
> -}
> -
> -static int
> -ifcvf_dma_map(struct ifcvf_internal *internal, int do_map)
> -{
> -	uint32_t i;
> -	int ret;
> -	struct rte_vhost_memory *mem = NULL;
> -	int vfio_container_fd;
> -
> -	ret = rte_vhost_get_mem_table(internal->vid, &mem);
> -	if (ret < 0) {
> -		DRV_LOG(ERR, "failed to get VM memory layout.");
> -		goto exit;
> -	}
> -
> -	vfio_container_fd = internal->vfio_container_fd;
> -
> -	for (i = 0; i < mem->nregions; i++) {
> -		struct rte_vhost_mem_region *reg;
> -
> -		reg = &mem->regions[i];
> -		DRV_LOG(INFO, "%s, region %u: HVA 0x%" PRIx64 ", "
> -			"GPA 0x%" PRIx64 ", size 0x%" PRIx64 ".",
> -			do_map ? "DMA map" : "DMA unmap", i,
> -			reg->host_user_addr, reg->guest_phys_addr, reg->size);
> -
> -		if (do_map) {
> -			ret = rte_vfio_container_dma_map(vfio_container_fd,
> -				reg->host_user_addr, reg->guest_phys_addr,
> -				reg->size);
> -			if (ret < 0) {
> -				DRV_LOG(ERR, "DMA map failed.");
> -				goto exit;
> -			}
> -		} else {
> -			ret = rte_vfio_container_dma_unmap(vfio_container_fd,
> -				reg->host_user_addr, reg->guest_phys_addr,
> -				reg->size);
> -			if (ret < 0) {
> -				DRV_LOG(ERR, "DMA unmap failed.");
> -				goto exit;
> -			}
> -		}
> -	}
> -
> -exit:
> -	if (mem)
> -		free(mem);
> -	return ret;
> -}
> -
> -static uint64_t
> -hva_to_gpa(int vid, uint64_t hva)
> -{
> -	struct rte_vhost_memory *mem = NULL;
> -	struct rte_vhost_mem_region *reg;
> -	uint32_t i;
> -	uint64_t gpa = 0;
> -
> -	if (rte_vhost_get_mem_table(vid, &mem) < 0)
> -		goto exit;
> -
> -	for (i = 0; i < mem->nregions; i++) {
> -		reg = &mem->regions[i];
> -
> -		if (hva >= reg->host_user_addr &&
> -				hva < reg->host_user_addr + reg->size) {
> -			gpa = hva - reg->host_user_addr + reg->guest_phys_addr;
> -			break;
> -		}
> -	}
> -
> -exit:
> -	if (mem)
> -		free(mem);
> -	return gpa;
> -}
> -
> -static int
> -vdpa_ifcvf_start(struct ifcvf_internal *internal)
> -{
> -	struct ifcvf_hw *hw = &internal->hw;
> -	int i, nr_vring;
> -	int vid;
> -	struct rte_vhost_vring vq;
> -	uint64_t gpa;
> -
> -	vid = internal->vid;
> -	nr_vring = rte_vhost_get_vring_num(vid);
> -	rte_vhost_get_negotiated_features(vid, &hw->req_features);
> -
> -	for (i = 0; i < nr_vring; i++) {
> -		rte_vhost_get_vhost_vring(vid, i, &vq);
> -		gpa = hva_to_gpa(vid, (uint64_t)(uintptr_t)vq.desc);
> -		if (gpa == 0) {
> -			DRV_LOG(ERR, "Fail to get GPA for descriptor ring.");
> -			return -1;
> -		}
> -		hw->vring[i].desc = gpa;
> -
> -		gpa = hva_to_gpa(vid, (uint64_t)(uintptr_t)vq.avail);
> -		if (gpa == 0) {
> -			DRV_LOG(ERR, "Fail to get GPA for available ring.");
> -			return -1;
> -		}
> -		hw->vring[i].avail = gpa;
> -
> -		gpa = hva_to_gpa(vid, (uint64_t)(uintptr_t)vq.used);
> -		if (gpa == 0) {
> -			DRV_LOG(ERR, "Fail to get GPA for used ring.");
> -			return -1;
> -		}
> -		hw->vring[i].used = gpa;
> -
> -		hw->vring[i].size = vq.size;
> -		rte_vhost_get_vring_base(vid, i, &hw->vring[i].last_avail_idx,
> -				&hw->vring[i].last_used_idx);
> -	}
> -	hw->nr_vring = i;
> -
> -	return ifcvf_start_hw(&internal->hw);
> -}
> -
> -static void
> -vdpa_ifcvf_stop(struct ifcvf_internal *internal)
> -{
> -	struct ifcvf_hw *hw = &internal->hw;
> -	uint32_t i;
> -	int vid;
> -	uint64_t features = 0;
> -	uint64_t log_base = 0, log_size = 0;
> -	uint64_t len;
> -
> -	vid = internal->vid;
> -	ifcvf_stop_hw(hw);
> -
> -	for (i = 0; i < hw->nr_vring; i++)
> -		rte_vhost_set_vring_base(vid, i, hw->vring[i].last_avail_idx,
> -				hw->vring[i].last_used_idx);
> -
> -	if (internal->sw_lm)
> -		return;
> -
> -	rte_vhost_get_negotiated_features(vid, &features);
> -	if (RTE_VHOST_NEED_LOG(features)) {
> -		ifcvf_disable_logging(hw);
> -		rte_vhost_get_log_base(internal->vid, &log_base, &log_size);
> -		rte_vfio_container_dma_unmap(internal->vfio_container_fd,
> -				log_base, IFCVF_LOG_BASE, log_size);
> -		/*
> -		 * IFCVF marks dirty memory pages for only packet buffer,
> -		 * SW helps to mark the used ring as dirty after device stops.
> -		 */
> -		for (i = 0; i < hw->nr_vring; i++) {
> -			len = IFCVF_USED_RING_LEN(hw->vring[i].size);
> -			rte_vhost_log_used_vring(vid, i, 0, len);
> -		}
> -	}
> -}
> -
> -#define MSIX_IRQ_SET_BUF_LEN (sizeof(struct vfio_irq_set) + \
> -		sizeof(int) * (IFCVF_MAX_QUEUES * 2 + 1))
> -static int
> -vdpa_enable_vfio_intr(struct ifcvf_internal *internal, bool m_rx)
> -{
> -	int ret;
> -	uint32_t i, nr_vring;
> -	char irq_set_buf[MSIX_IRQ_SET_BUF_LEN];
> -	struct vfio_irq_set *irq_set;
> -	int *fd_ptr;
> -	struct rte_vhost_vring vring;
> -	int fd;
> -
> -	vring.callfd = -1;
> -
> -	nr_vring = rte_vhost_get_vring_num(internal->vid);
> -
> -	irq_set = (struct vfio_irq_set *)irq_set_buf;
> -	irq_set->argsz = sizeof(irq_set_buf);
> -	irq_set->count = nr_vring + 1;
> -	irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
> -			 VFIO_IRQ_SET_ACTION_TRIGGER;
> -	irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;
> -	irq_set->start = 0;
> -	fd_ptr = (int *)&irq_set->data;
> -	fd_ptr[RTE_INTR_VEC_ZERO_OFFSET] = internal->pdev->intr_handle.fd;
> -
> -	for (i = 0; i < nr_vring; i++)
> -		internal->intr_fd[i] = -1;
> -
> -	for (i = 0; i < nr_vring; i++) {
> -		rte_vhost_get_vhost_vring(internal->vid, i, &vring);
> -		fd_ptr[RTE_INTR_VEC_RXTX_OFFSET + i] = vring.callfd;
> -		if ((i & 1) == 0 && m_rx == true) {
> -			fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
> -			if (fd < 0) {
> -				DRV_LOG(ERR, "can't setup eventfd: %s",
> -					strerror(errno));
> -				return -1;
> -			}
> -			internal->intr_fd[i] = fd;
> -			fd_ptr[RTE_INTR_VEC_RXTX_OFFSET + i] = fd;
> -		}
> -	}
> -
> -	ret = ioctl(internal->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
> -	if (ret) {
> -		DRV_LOG(ERR, "Error enabling MSI-X interrupts: %s",
> -				strerror(errno));
> -		return -1;
> -	}
> -
> -	return 0;
> -}
> -
> -static int
> -vdpa_disable_vfio_intr(struct ifcvf_internal *internal)
> -{
> -	int ret;
> -	uint32_t i, nr_vring;
> -	char irq_set_buf[MSIX_IRQ_SET_BUF_LEN];
> -	struct vfio_irq_set *irq_set;
> -
> -	irq_set = (struct vfio_irq_set *)irq_set_buf;
> -	irq_set->argsz = sizeof(irq_set_buf);
> -	irq_set->count = 0;
> -	irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER;
> -	irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;
> -	irq_set->start = 0;
> -
> -	nr_vring = rte_vhost_get_vring_num(internal->vid);
> -	for (i = 0; i < nr_vring; i++) {
> -		if (internal->intr_fd[i] >= 0)
> -			close(internal->intr_fd[i]);
> -		internal->intr_fd[i] = -1;
> -	}
> -
> -	ret = ioctl(internal->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
> -	if (ret) {
> -		DRV_LOG(ERR, "Error disabling MSI-X interrupts: %s",
> -				strerror(errno));
> -		return -1;
> -	}
> -
> -	return 0;
> -}
> -
> -static void *
> -notify_relay(void *arg)
> -{
> -	int i, kickfd, epfd, nfds = 0;
> -	uint32_t qid, q_num;
> -	struct epoll_event events[IFCVF_MAX_QUEUES * 2];
> -	struct epoll_event ev;
> -	uint64_t buf;
> -	int nbytes;
> -	struct rte_vhost_vring vring;
> -	struct ifcvf_internal *internal = (struct ifcvf_internal *)arg;
> -	struct ifcvf_hw *hw = &internal->hw;
> -
> -	q_num = rte_vhost_get_vring_num(internal->vid);
> -
> -	epfd = epoll_create(IFCVF_MAX_QUEUES * 2);
> -	if (epfd < 0) {
> -		DRV_LOG(ERR, "failed to create epoll instance.");
> -		return NULL;
> -	}
> -	internal->epfd = epfd;
> -
> -	vring.kickfd = -1;
> -	for (qid = 0; qid < q_num; qid++) {
> -		ev.events = EPOLLIN | EPOLLPRI;
> -		rte_vhost_get_vhost_vring(internal->vid, qid, &vring);
> -		ev.data.u64 = qid | (uint64_t)vring.kickfd << 32;
> -		if (epoll_ctl(epfd, EPOLL_CTL_ADD, vring.kickfd, &ev) < 0) {
> -			DRV_LOG(ERR, "epoll add error: %s", strerror(errno));
> -			return NULL;
> -		}
> -	}
> -
> -	for (;;) {
> -		nfds = epoll_wait(epfd, events, q_num, -1);
> -		if (nfds < 0) {
> -			if (errno == EINTR)
> -				continue;
> -			DRV_LOG(ERR, "epoll_wait return fail\n");
> -			return NULL;
> -		}
> -
> -		for (i = 0; i < nfds; i++) {
> -			qid = events[i].data.u32;
> -			kickfd = (uint32_t)(events[i].data.u64 >> 32);
> -			do {
> -				nbytes = read(kickfd, &buf, 8);
> -				if (nbytes < 0) {
> -					if (errno == EINTR ||
> -					    errno == EWOULDBLOCK ||
> -					    errno == EAGAIN)
> -						continue;
> -					DRV_LOG(INFO, "Error reading "
> -						"kickfd: %s",
> -						strerror(errno));
> -				}
> -				break;
> -			} while (1);
> -
> -			ifcvf_notify_queue(hw, qid);
> -		}
> -	}
> -
> -	return NULL;
> -}
> -
> -static int
> -setup_notify_relay(struct ifcvf_internal *internal)
> -{
> -	int ret;
> -
> -	ret = pthread_create(&internal->tid, NULL, notify_relay,
> -			(void *)internal);
> -	if (ret) {
> -		DRV_LOG(ERR, "failed to create notify relay pthread.");
> -		return -1;
> -	}
> -	return 0;
> -}
> -
> -static int
> -unset_notify_relay(struct ifcvf_internal *internal)
> -{
> -	void *status;
> -
> -	if (internal->tid) {
> -		pthread_cancel(internal->tid);
> -		pthread_join(internal->tid, &status);
> -	}
> -	internal->tid = 0;
> -
> -	if (internal->epfd >= 0)
> -		close(internal->epfd);
> -	internal->epfd = -1;
> -
> -	return 0;
> -}
> -
> -static int
> -update_datapath(struct ifcvf_internal *internal)
> -{
> -	int ret;
> -
> -	rte_spinlock_lock(&internal->lock);
> -
> -	if (!rte_atomic32_read(&internal->running) &&
> -	    (rte_atomic32_read(&internal->started) &&
> -	     rte_atomic32_read(&internal->dev_attached))) {
> -		ret = ifcvf_dma_map(internal, 1);
> -		if (ret)
> -			goto err;
> -
> -		ret = vdpa_enable_vfio_intr(internal, 0);
> -		if (ret)
> -			goto err;
> -
> -		ret = vdpa_ifcvf_start(internal);
> -		if (ret)
> -			goto err;
> -
> -		ret = setup_notify_relay(internal);
> -		if (ret)
> -			goto err;
> -
> -		rte_atomic32_set(&internal->running, 1);
> -	} else if (rte_atomic32_read(&internal->running) &&
> -		   (!rte_atomic32_read(&internal->started) ||
> -		    !rte_atomic32_read(&internal->dev_attached))) {
> -		ret = unset_notify_relay(internal);
> -		if (ret)
> -			goto err;
> -
> -		vdpa_ifcvf_stop(internal);
> -
> -		ret = vdpa_disable_vfio_intr(internal);
> -		if (ret)
> -			goto err;
> -
> -		ret = ifcvf_dma_map(internal, 0);
> -		if (ret)
> -			goto err;
> -
> -		rte_atomic32_set(&internal->running, 0);
> -	}
> -
> -	rte_spinlock_unlock(&internal->lock);
> -	return 0;
> -err:
> -	rte_spinlock_unlock(&internal->lock);
> -	return ret;
> -}
> -
> -static int
> -m_ifcvf_start(struct ifcvf_internal *internal)
> -{
> -	struct ifcvf_hw *hw = &internal->hw;
> -	uint32_t i, nr_vring;
> -	int vid, ret;
> -	struct rte_vhost_vring vq;
> -	void *vring_buf;
> -	uint64_t m_vring_iova = IFCVF_MEDIATED_VRING;
> -	uint64_t size;
> -	uint64_t gpa;
> -
> -	memset(&vq, 0, sizeof(vq));
> -	vid = internal->vid;
> -	nr_vring = rte_vhost_get_vring_num(vid);
> -	rte_vhost_get_negotiated_features(vid, &hw->req_features);
> -
> -	for (i = 0; i < nr_vring; i++) {
> -		rte_vhost_get_vhost_vring(vid, i, &vq);
> -
> -		size = RTE_ALIGN_CEIL(vring_size(vq.size, PAGE_SIZE),
> -				PAGE_SIZE);
> -		vring_buf = rte_zmalloc("ifcvf", size, PAGE_SIZE);
> -		vring_init(&internal->m_vring[i], vq.size, vring_buf,
> -				PAGE_SIZE);
> -
> -		ret = rte_vfio_container_dma_map(internal->vfio_container_fd,
> -			(uint64_t)(uintptr_t)vring_buf, m_vring_iova, size);
> -		if (ret < 0) {
> -			DRV_LOG(ERR, "mediated vring DMA map failed.");
> -			goto error;
> -		}
> -
> -		gpa = hva_to_gpa(vid, (uint64_t)(uintptr_t)vq.desc);
> -		if (gpa == 0) {
> -			DRV_LOG(ERR, "Fail to get GPA for descriptor ring.");
> -			return -1;
> -		}
> -		hw->vring[i].desc = gpa;
> -
> -		gpa = hva_to_gpa(vid, (uint64_t)(uintptr_t)vq.avail);
> -		if (gpa == 0) {
> -			DRV_LOG(ERR, "Fail to get GPA for available ring.");
> -			return -1;
> -		}
> -		hw->vring[i].avail = gpa;
> -
> -		/* Direct I/O for Tx queue, relay for Rx queue */
> -		if (i & 1) {
> -			gpa = hva_to_gpa(vid, (uint64_t)(uintptr_t)vq.used);
> -			if (gpa == 0) {
> -				DRV_LOG(ERR, "Fail to get GPA for used ring.");
> -				return -1;
> -			}
> -			hw->vring[i].used = gpa;
> -		} else {
> -			hw->vring[i].used = m_vring_iova +
> -				(char *)internal->m_vring[i].used -
> -				(char *)internal->m_vring[i].desc;
> -		}
> -
> -		hw->vring[i].size = vq.size;
> -
> -		rte_vhost_get_vring_base(vid, i,
> -				&internal->m_vring[i].avail->idx,
> -				&internal->m_vring[i].used->idx);
> -
> -		rte_vhost_get_vring_base(vid, i, &hw->vring[i].last_avail_idx,
> -				&hw->vring[i].last_used_idx);
> -
> -		m_vring_iova += size;
> -	}
> -	hw->nr_vring = nr_vring;
> -
> -	return ifcvf_start_hw(&internal->hw);
> -
> -error:
> -	for (i = 0; i < nr_vring; i++)
> -		if (internal->m_vring[i].desc)
> -			rte_free(internal->m_vring[i].desc);
> -
> -	return -1;
> -}
> -
> -static int
> -m_ifcvf_stop(struct ifcvf_internal *internal)
> -{
> -	int vid;
> -	uint32_t i;
> -	struct rte_vhost_vring vq;
> -	struct ifcvf_hw *hw = &internal->hw;
> -	uint64_t m_vring_iova = IFCVF_MEDIATED_VRING;
> -	uint64_t size, len;
> -
> -	vid = internal->vid;
> -	ifcvf_stop_hw(hw);
> -
> -	for (i = 0; i < hw->nr_vring; i++) {
> -		/* synchronize remaining new used entries if any */
> -		if ((i & 1) == 0)
> -			update_used_ring(internal, i);
> -
> -		rte_vhost_get_vhost_vring(vid, i, &vq);
> -		len = IFCVF_USED_RING_LEN(vq.size);
> -		rte_vhost_log_used_vring(vid, i, 0, len);
> -
> -		size = RTE_ALIGN_CEIL(vring_size(vq.size, PAGE_SIZE),
> -				PAGE_SIZE);
> -		rte_vfio_container_dma_unmap(internal->vfio_container_fd,
> -			(uint64_t)(uintptr_t)internal->m_vring[i].desc,
> -			m_vring_iova, size);
> -
> -		rte_vhost_set_vring_base(vid, i, hw->vring[i].last_avail_idx,
> -				hw->vring[i].last_used_idx);
> -		rte_free(internal->m_vring[i].desc);
> -		m_vring_iova += size;
> -	}
> -
> -	return 0;
> -}
> -
> -static void
> -update_used_ring(struct ifcvf_internal *internal, uint16_t qid)
> -{
> -	rte_vdpa_relay_vring_used(internal->vid, qid, &internal->m_vring[qid]);
> -	rte_vhost_vring_call(internal->vid, qid);
> -}
> -
> -static void *
> -vring_relay(void *arg)
> -{
> -	int i, vid, epfd, fd, nfds;
> -	struct ifcvf_internal *internal = (struct ifcvf_internal *)arg;
> -	struct rte_vhost_vring vring;
> -	uint16_t qid, q_num;
> -	struct epoll_event events[IFCVF_MAX_QUEUES * 4];
> -	struct epoll_event ev;
> -	int nbytes;
> -	uint64_t buf;
> -
> -	vid = internal->vid;
> -	q_num = rte_vhost_get_vring_num(vid);
> -
> -	/* add notify fd and interrupt fd to epoll */
> -	epfd = epoll_create(IFCVF_MAX_QUEUES * 2);
> -	if (epfd < 0) {
> -		DRV_LOG(ERR, "failed to create epoll instance.");
> -		return NULL;
> -	}
> -	internal->epfd = epfd;
> -
> -	vring.kickfd = -1;
> -	for (qid = 0; qid < q_num; qid++) {
> -		ev.events = EPOLLIN | EPOLLPRI;
> -		rte_vhost_get_vhost_vring(vid, qid, &vring);
> -		ev.data.u64 = qid << 1 | (uint64_t)vring.kickfd << 32;
> -		if (epoll_ctl(epfd, EPOLL_CTL_ADD, vring.kickfd, &ev) < 0) {
> -			DRV_LOG(ERR, "epoll add error: %s", strerror(errno));
> -			return NULL;
> -		}
> -	}
> -
> -	for (qid = 0; qid < q_num; qid += 2) {
> -		ev.events = EPOLLIN | EPOLLPRI;
> -		/* leave a flag to mark it's for interrupt */
> -		ev.data.u64 = 1 | qid << 1 |
> -			(uint64_t)internal->intr_fd[qid] << 32;
> -		if (epoll_ctl(epfd, EPOLL_CTL_ADD, internal->intr_fd[qid], &ev)
> -				< 0) {
> -			DRV_LOG(ERR, "epoll add error: %s", strerror(errno));
> -			return NULL;
> -		}
> -		update_used_ring(internal, qid);
> -	}
> -
> -	/* start relay with a first kick */
> -	for (qid = 0; qid < q_num; qid++)
> -		ifcvf_notify_queue(&internal->hw, qid);
> -
> -	/* listen to the events and react accordingly */
> -	for (;;) {
> -		nfds = epoll_wait(epfd, events, q_num * 2, -1);
> -		if (nfds < 0) {
> -			if (errno == EINTR)
> -				continue;
> -			DRV_LOG(ERR, "epoll_wait return fail\n");
> -			return NULL;
> -		}
> -
> -		for (i = 0; i < nfds; i++) {
> -			fd = (uint32_t)(events[i].data.u64 >> 32);
> -			do {
> -				nbytes = read(fd, &buf, 8);
> -				if (nbytes < 0) {
> -					if (errno == EINTR ||
> -					    errno == EWOULDBLOCK ||
> -					    errno == EAGAIN)
> -						continue;
> -					DRV_LOG(INFO, "Error reading "
> -						"kickfd: %s",
> -						strerror(errno));
> -				}
> -				break;
> -			} while (1);
> -
> -			qid = events[i].data.u32 >> 1;
> -
> -			if (events[i].data.u32 & 1)
> -				update_used_ring(internal, qid);
> -			else
> -				ifcvf_notify_queue(&internal->hw, qid);
> -		}
> -	}
> -
> -	return NULL;
> -}
> -
> -static int
> -setup_vring_relay(struct ifcvf_internal *internal)
> -{
> -	int ret;
> -
> -	ret = pthread_create(&internal->tid, NULL, vring_relay,
> -			(void *)internal);
> -	if (ret) {
> -		DRV_LOG(ERR, "failed to create ring relay pthread.");
> -		return -1;
> -	}
> -	return 0;
> -}
> -
> -static int
> -unset_vring_relay(struct ifcvf_internal *internal)
> -{
> -	void *status;
> -
> -	if (internal->tid) {
> -		pthread_cancel(internal->tid);
> -		pthread_join(internal->tid, &status);
> -	}
> -	internal->tid = 0;
> -
> -	if (internal->epfd >= 0)
> -		close(internal->epfd);
> -	internal->epfd = -1;
> -
> -	return 0;
> -}
> -
> -static int
> -ifcvf_sw_fallback_switchover(struct ifcvf_internal *internal)
> -{
> -	int ret;
> -	int vid = internal->vid;
> -
> -	/* stop the direct IO data path */
> -	unset_notify_relay(internal);
> -	vdpa_ifcvf_stop(internal);
> -	vdpa_disable_vfio_intr(internal);
> -
> -	ret = rte_vhost_host_notifier_ctrl(vid, false);
> -	if (ret && ret != -ENOTSUP)
> -		goto error;
> -
> -	/* set up interrupt for interrupt relay */
> -	ret = vdpa_enable_vfio_intr(internal, 1);
> -	if (ret)
> -		goto unmap;
> -
> -	/* config the VF */
> -	ret = m_ifcvf_start(internal);
> -	if (ret)
> -		goto unset_intr;
> -
> -	/* set up vring relay thread */
> -	ret = setup_vring_relay(internal);
> -	if (ret)
> -		goto stop_vf;
> -
> -	rte_vhost_host_notifier_ctrl(vid, true);
> -
> -	internal->sw_fallback_running = true;
> -
> -	return 0;
> -
> -stop_vf:
> -	m_ifcvf_stop(internal);
> -unset_intr:
> -	vdpa_disable_vfio_intr(internal);
> -unmap:
> -	ifcvf_dma_map(internal, 0);
> -error:
> -	return -1;
> -}
> -
> -static int
> -ifcvf_dev_config(int vid)
> -{
> -	int did;
> -	struct internal_list *list;
> -	struct ifcvf_internal *internal;
> -
> -	did = rte_vhost_get_vdpa_device_id(vid);
> -	list = find_internal_resource_by_did(did);
> -	if (list == NULL) {
> -		DRV_LOG(ERR, "Invalid device id: %d", did);
> -		return -1;
> -	}
> -
> -	internal = list->internal;
> -	internal->vid = vid;
> -	rte_atomic32_set(&internal->dev_attached, 1);
> -	update_datapath(internal);
> -
> -	if (rte_vhost_host_notifier_ctrl(vid, true) != 0)
> -		DRV_LOG(NOTICE, "vDPA (%d): software relay is used.", did);
> -
> -	return 0;
> -}
> -
> -static int
> -ifcvf_dev_close(int vid)
> -{
> -	int did;
> -	struct internal_list *list;
> -	struct ifcvf_internal *internal;
> -
> -	did = rte_vhost_get_vdpa_device_id(vid);
> -	list = find_internal_resource_by_did(did);
> -	if (list == NULL) {
> -		DRV_LOG(ERR, "Invalid device id: %d", did);
> -		return -1;
> -	}
> -
> -	internal = list->internal;
> -
> -	if (internal->sw_fallback_running) {
> -		/* unset ring relay */
> -		unset_vring_relay(internal);
> -
> -		/* reset VF */
> -		m_ifcvf_stop(internal);
> -
> -		/* remove interrupt setting */
> -		vdpa_disable_vfio_intr(internal);
> -
> -		/* unset DMA map for guest memory */
> -		ifcvf_dma_map(internal, 0);
> -
> -		internal->sw_fallback_running = false;
> -	} else {
> -		rte_atomic32_set(&internal->dev_attached, 0);
> -		update_datapath(internal);
> -	}
> -
> -	return 0;
> -}
> -
> -static int
> -ifcvf_set_features(int vid)
> -{
> -	uint64_t features = 0;
> -	int did;
> -	struct internal_list *list;
> -	struct ifcvf_internal *internal;
> -	uint64_t log_base = 0, log_size = 0;
> -
> -	did = rte_vhost_get_vdpa_device_id(vid);
> -	list = find_internal_resource_by_did(did);
> -	if (list == NULL) {
> -		DRV_LOG(ERR, "Invalid device id: %d", did);
> -		return -1;
> -	}
> -
> -	internal = list->internal;
> -	rte_vhost_get_negotiated_features(vid, &features);
> -
> -	if (!RTE_VHOST_NEED_LOG(features))
> -		return 0;
> -
> -	if (internal->sw_lm) {
> -		ifcvf_sw_fallback_switchover(internal);
> -	} else {
> -		rte_vhost_get_log_base(vid, &log_base, &log_size);
> -		rte_vfio_container_dma_map(internal->vfio_container_fd,
> -				log_base, IFCVF_LOG_BASE, log_size);
> -		ifcvf_enable_logging(&internal->hw, IFCVF_LOG_BASE, log_size);
> -	}
> -
> -	return 0;
> -}
> -
> -static int
> -ifcvf_get_vfio_group_fd(int vid)
> -{
> -	int did;
> -	struct internal_list *list;
> -
> -	did = rte_vhost_get_vdpa_device_id(vid);
> -	list = find_internal_resource_by_did(did);
> -	if (list == NULL) {
> -		DRV_LOG(ERR, "Invalid device id: %d", did);
> -		return -1;
> -	}
> -
> -	return list->internal->vfio_group_fd;
> -}
> -
> -static int
> -ifcvf_get_vfio_device_fd(int vid)
> -{
> -	int did;
> -	struct internal_list *list;
> -
> -	did = rte_vhost_get_vdpa_device_id(vid);
> -	list = find_internal_resource_by_did(did);
> -	if (list == NULL) {
> -		DRV_LOG(ERR, "Invalid device id: %d", did);
> -		return -1;
> -	}
> -
> -	return list->internal->vfio_dev_fd;
> -}
> -
> -static int
> -ifcvf_get_notify_area(int vid, int qid, uint64_t *offset, uint64_t *size)
> -{
> -	int did;
> -	struct internal_list *list;
> -	struct ifcvf_internal *internal;
> -	struct vfio_region_info reg = { .argsz = sizeof(reg) };
> -	int ret;
> -
> -	did = rte_vhost_get_vdpa_device_id(vid);
> -	list = find_internal_resource_by_did(did);
> -	if (list == NULL) {
> -		DRV_LOG(ERR, "Invalid device id: %d", did);
> -		return -1;
> -	}
> -
> -	internal = list->internal;
> -
> -	reg.index = ifcvf_get_notify_region(&internal->hw);
> -	ret = ioctl(internal->vfio_dev_fd, VFIO_DEVICE_GET_REGION_INFO, &reg);
> -	if (ret) {
> -		DRV_LOG(ERR, "Can not get device region info: %s",
> -				strerror(errno));
> -		return -1;
> -	}
> -
> -	*offset = ifcvf_get_queue_notify_off(&internal->hw, qid) + reg.offset;
> -	*size = 0x1000;
> -
> -	return 0;
> -}
> -
> -static int
> -ifcvf_get_queue_num(int did, uint32_t *queue_num)
> -{
> -	struct internal_list *list;
> -
> -	list = find_internal_resource_by_did(did);
> -	if (list == NULL) {
> -		DRV_LOG(ERR, "Invalid device id: %d", did);
> -		return -1;
> -	}
> -
> -	*queue_num = list->internal->max_queues;
> -
> -	return 0;
> -}
> -
> -static int
> -ifcvf_get_vdpa_features(int did, uint64_t *features)
> -{
> -	struct internal_list *list;
> -
> -	list = find_internal_resource_by_did(did);
> -	if (list == NULL) {
> -		DRV_LOG(ERR, "Invalid device id: %d", did);
> -		return -1;
> -	}
> -
> -	*features = list->internal->features;
> -
> -	return 0;
> -}
> -
> -#define VDPA_SUPPORTED_PROTOCOL_FEATURES \
> -		(1ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK | \
> -		 1ULL << VHOST_USER_PROTOCOL_F_SLAVE_REQ | \
> -		 1ULL << VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD | \
> -		 1ULL << VHOST_USER_PROTOCOL_F_HOST_NOTIFIER | \
> -		 1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD)
> -static int
> -ifcvf_get_protocol_features(int did __rte_unused, uint64_t *features)
> -{
> -	*features = VDPA_SUPPORTED_PROTOCOL_FEATURES;
> -	return 0;
> -}
> -
> -static struct rte_vdpa_dev_ops ifcvf_ops = {
> -	.get_queue_num = ifcvf_get_queue_num,
> -	.get_features = ifcvf_get_vdpa_features,
> -	.get_protocol_features = ifcvf_get_protocol_features,
> -	.dev_conf = ifcvf_dev_config,
> -	.dev_close = ifcvf_dev_close,
> -	.set_vring_state = NULL,
> -	.set_features = ifcvf_set_features,
> -	.migration_done = NULL,
> -	.get_vfio_group_fd = ifcvf_get_vfio_group_fd,
> -	.get_vfio_device_fd = ifcvf_get_vfio_device_fd,
> -	.get_notify_area = ifcvf_get_notify_area,
> -};
> -
> -static inline int
> -open_int(const char *key __rte_unused, const char *value, void *extra_args)
> -{
> -	uint16_t *n = extra_args;
> -
> -	if (value == NULL || extra_args == NULL)
> -		return -EINVAL;
> -
> -	*n = (uint16_t)strtoul(value, NULL, 0);
> -	if (*n == USHRT_MAX && errno == ERANGE)
> -		return -1;
> -
> -	return 0;
> -}
> -
> -static int
> -ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
> -		struct rte_pci_device *pci_dev)
> -{
> -	uint64_t features;
> -	struct ifcvf_internal *internal = NULL;
> -	struct internal_list *list = NULL;
> -	int vdpa_mode = 0;
> -	int sw_fallback_lm = 0;
> -	struct rte_kvargs *kvlist = NULL;
> -	int ret = 0;
> -
> -	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
> -		return 0;
> -
> -	if (!pci_dev->device.devargs)
> -		return 1;
> -
> -	kvlist = rte_kvargs_parse(pci_dev->device.devargs->args,
> -			ifcvf_valid_arguments);
> -	if (kvlist == NULL)
> -		return 1;
> -
> -	/* probe only when vdpa mode is specified */
> -	if (rte_kvargs_count(kvlist, IFCVF_VDPA_MODE) == 0) {
> -		rte_kvargs_free(kvlist);
> -		return 1;
> -	}
> -
> -	ret = rte_kvargs_process(kvlist, IFCVF_VDPA_MODE, &open_int,
> -			&vdpa_mode);
> -	if (ret < 0 || vdpa_mode == 0) {
> -		rte_kvargs_free(kvlist);
> -		return 1;
> -	}
> -
> -	list = rte_zmalloc("ifcvf", sizeof(*list), 0);
> -	if (list == NULL)
> -		goto error;
> -
> -	internal = rte_zmalloc("ifcvf", sizeof(*internal), 0);
> -	if (internal == NULL)
> -		goto error;
> -
> -	internal->pdev = pci_dev;
> -	rte_spinlock_init(&internal->lock);
> -
> -	if (ifcvf_vfio_setup(internal) < 0) {
> -		DRV_LOG(ERR, "failed to setup device %s", pci_dev->name);
> -		goto error;
> -	}
> -
> -	if (ifcvf_init_hw(&internal->hw, internal->pdev) < 0) {
> -		DRV_LOG(ERR, "failed to init device %s", pci_dev->name);
> -		goto error;
> -	}
> -
> -	internal->max_queues = IFCVF_MAX_QUEUES;
> -	features = ifcvf_get_features(&internal->hw);
> -	internal->features = (features &
> -		~(1ULL << VIRTIO_F_IOMMU_PLATFORM)) |
> -		(1ULL << VIRTIO_NET_F_GUEST_ANNOUNCE) |
> -		(1ULL << VIRTIO_NET_F_CTRL_VQ) |
> -		(1ULL << VIRTIO_NET_F_STATUS) |
> -		(1ULL << VHOST_USER_F_PROTOCOL_FEATURES) |
> -		(1ULL << VHOST_F_LOG_ALL);
> -
> -	internal->dev_addr.pci_addr = pci_dev->addr;
> -	internal->dev_addr.type = PCI_ADDR;
> -	list->internal = internal;
> -
> -	if (rte_kvargs_count(kvlist, IFCVF_SW_FALLBACK_LM)) {
> -		ret = rte_kvargs_process(kvlist, IFCVF_SW_FALLBACK_LM,
> -				&open_int, &sw_fallback_lm);
> -		if (ret < 0)
> -			goto error;
> -	}
> -	internal->sw_lm = sw_fallback_lm;
> -
> -	internal->did = rte_vdpa_register_device(&internal->dev_addr,
> -				&ifcvf_ops);
> -	if (internal->did < 0) {
> -		DRV_LOG(ERR, "failed to register device %s", pci_dev->name);
> -		goto error;
> -	}
> -
> -	pthread_mutex_lock(&internal_list_lock);
> -	TAILQ_INSERT_TAIL(&internal_list, list, next);
> -	pthread_mutex_unlock(&internal_list_lock);
> -
> -	rte_atomic32_set(&internal->started, 1);
> -	update_datapath(internal);
> -
> -	rte_kvargs_free(kvlist);
> -	return 0;
> -
> -error:
> -	rte_kvargs_free(kvlist);
> -	rte_free(list);
> -	rte_free(internal);
> -	return -1;
> -}
> -
> -static int
> -ifcvf_pci_remove(struct rte_pci_device *pci_dev)
> -{
> -	struct ifcvf_internal *internal;
> -	struct internal_list *list;
> -
> -	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
> -		return 0;
> -
> -	list = find_internal_resource_by_dev(pci_dev);
> -	if (list == NULL) {
> -		DRV_LOG(ERR, "Invalid device: %s", pci_dev->name);
> -		return -1;
> -	}
> -
> -	internal = list->internal;
> -	rte_atomic32_set(&internal->started, 0);
> -	update_datapath(internal);
> -
> -	rte_pci_unmap_device(internal->pdev);
> -	rte_vfio_container_destroy(internal->vfio_container_fd);
> -	rte_vdpa_unregister_device(internal->did);
> -
> -	pthread_mutex_lock(&internal_list_lock);
> -	TAILQ_REMOVE(&internal_list, list, next);
> -	pthread_mutex_unlock(&internal_list_lock);
> -
> -	rte_free(list);
> -	rte_free(internal);
> -
> -	return 0;
> -}
> -
> -/*
> - * IFCVF has the same vendor ID and device ID as virtio net PCI
> - * device, with its specific subsystem vendor ID and device ID.
> - */
> -static const struct rte_pci_id pci_id_ifcvf_map[] = {
> -	{ .class_id = RTE_CLASS_ANY_ID,
> -	  .vendor_id = IFCVF_VENDOR_ID,
> -	  .device_id = IFCVF_DEVICE_ID,
> -	  .subsystem_vendor_id = IFCVF_SUBSYS_VENDOR_ID,
> -	  .subsystem_device_id = IFCVF_SUBSYS_DEVICE_ID,
> -	},
> -
> -	{ .vendor_id = 0, /* sentinel */
> -	},
> -};
> -
> -static struct rte_pci_driver rte_ifcvf_vdpa = {
> -	.id_table = pci_id_ifcvf_map,
> -	.drv_flags = 0,
> -	.probe = ifcvf_pci_probe,
> -	.remove = ifcvf_pci_remove,
> -};
> -
> -RTE_PMD_REGISTER_PCI(net_ifcvf, rte_ifcvf_vdpa);
> -RTE_PMD_REGISTER_PCI_TABLE(net_ifcvf, pci_id_ifcvf_map);
> -RTE_PMD_REGISTER_KMOD_DEP(net_ifcvf, "* vfio-pci");
> -
> -RTE_INIT(ifcvf_vdpa_init_log)
> -{
> -	ifcvf_vdpa_logtype = rte_log_register("pmd.net.ifcvf_vdpa");
> -	if (ifcvf_vdpa_logtype >= 0)
> -		rte_log_set_level(ifcvf_vdpa_logtype, RTE_LOG_NOTICE);
> -}
> diff --git a/drivers/net/ifc/meson.build b/drivers/net/ifc/meson.build
> deleted file mode 100644
> index adc9ed9..0000000
> --- a/drivers/net/ifc/meson.build
> +++ /dev/null
> @@ -1,9 +0,0 @@
> -# SPDX-License-Identifier: BSD-3-Clause
> -# Copyright(c) 2018 Intel Corporation
> -
> -build = dpdk_conf.has('RTE_LIBRTE_VHOST')
> -reason = 'missing dependency, DPDK vhost library'
> -allow_experimental_apis = true
> -sources = files('ifcvf_vdpa.c', 'base/ifcvf.c')
> -includes += include_directories('base')
> -deps += 'vhost'
> diff --git a/drivers/net/ifc/rte_pmd_ifc_version.map b/drivers/net/ifc/rte_pmd_ifc_version.map
> deleted file mode 100644
> index f9f17e4..0000000
> --- a/drivers/net/ifc/rte_pmd_ifc_version.map
> +++ /dev/null
> @@ -1,3 +0,0 @@
> -DPDK_20.0 {
> -	local: *;
> -};
> diff --git a/drivers/net/meson.build b/drivers/net/meson.build
> index c300afb..b0ea8fe 100644
> --- a/drivers/net/meson.build
> +++ b/drivers/net/meson.build
> @@ -21,7 +21,6 @@ drivers = ['af_packet',
>  	'hns3',
>  	'iavf',
>  	'ice',
> -	'ifc',
>  	'ipn3ke',
>  	'ixgbe',
>  	'kni',
> diff --git a/drivers/vdpa/Makefile b/drivers/vdpa/Makefile
> index 82a2b70..27fec96 100644
> --- a/drivers/vdpa/Makefile
> +++ b/drivers/vdpa/Makefile
> @@ -5,4 +5,10 @@ include $(RTE_SDK)/mk/rte.vars.mk
> 
>  # DIRS-$(<configuration>) += <directory>
> 
> +ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
> +ifeq ($(CONFIG_RTE_EAL_VFIO),y)
> +DIRS-$(CONFIG_RTE_LIBRTE_IFC_PMD) += ifc
> +endif
> +endif # $(CONFIG_RTE_LIBRTE_VHOST)
> +
>  include $(RTE_SDK)/mk/rte.subdir.mk
> diff --git a/drivers/vdpa/ifc/Makefile b/drivers/vdpa/ifc/Makefile
> new file mode 100644
> index 0000000..fe227b8
> --- /dev/null
> +++ b/drivers/vdpa/ifc/Makefile
> @@ -0,0 +1,34 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2018 Intel Corporation
> +
> +include $(RTE_SDK)/mk/rte.vars.mk
> +
> +#
> +# library name
> +#
> +LIB = librte_pmd_ifc.a
> +
> +LDLIBS += -lpthread
> +LDLIBS += -lrte_eal -lrte_pci -lrte_vhost -lrte_bus_pci
> +LDLIBS += -lrte_kvargs
> +
> +CFLAGS += -O3
> +CFLAGS += $(WERROR_FLAGS)
> +CFLAGS += -DALLOW_EXPERIMENTAL_API
> +
> +#
> +# Add extra flags for base driver source files to disable warnings in them
> +#
> +BASE_DRIVER_OBJS=$(sort $(patsubst %.c,%.o,$(notdir $(wildcard $(SRCDIR)/base/*.c))))
> +
> +VPATH += $(SRCDIR)/base
> +
> +EXPORT_MAP := rte_pmd_ifc_version.map
> +
> +#
> +# all source are stored in SRCS-y
> +#
> +SRCS-$(CONFIG_RTE_LIBRTE_IFC_PMD) += ifcvf_vdpa.c
> +SRCS-$(CONFIG_RTE_LIBRTE_IFC_PMD) += ifcvf.c
> +
> +include $(RTE_SDK)/mk/rte.lib.mk
> diff --git a/drivers/vdpa/ifc/base/ifcvf.c b/drivers/vdpa/ifc/base/ifcvf.c
> new file mode 100644
> index 0000000..3c0b2df
> --- /dev/null
> +++ b/drivers/vdpa/ifc/base/ifcvf.c
> @@ -0,0 +1,329 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2018 Intel Corporation
> + */
> +
> +#include "ifcvf.h"
> +#include "ifcvf_osdep.h"
> +
> +STATIC void *
> +get_cap_addr(struct ifcvf_hw *hw, struct ifcvf_pci_cap *cap)
> +{
> +	u8 bar = cap->bar;
> +	u32 length = cap->length;
> +	u32 offset = cap->offset;
> +
> +	if (bar > IFCVF_PCI_MAX_RESOURCE - 1) {
> +		DEBUGOUT("invalid bar: %u\n", bar);
> +		return NULL;
> +	}
> +
> +	if (offset + length < offset) {
> +		DEBUGOUT("offset(%u) + length(%u) overflows\n",
> +			offset, length);
> +		return NULL;
> +	}
> +
> +	if (offset + length > hw->mem_resource[cap->bar].len) {
> +		DEBUGOUT("offset(%u) + length(%u) overflows bar length(%u)",
> +			offset, length, (u32)hw->mem_resource[cap->bar].len);
> +		return NULL;
> +	}
> +
> +	return hw->mem_resource[bar].addr + offset;
> +}
> +
> +int
> +ifcvf_init_hw(struct ifcvf_hw *hw, PCI_DEV *dev)
> +{
> +	int ret;
> +	u8 pos;
> +	struct ifcvf_pci_cap cap;
> +
> +	ret = PCI_READ_CONFIG_BYTE(dev, &pos, PCI_CAPABILITY_LIST);
> +	if (ret < 0) {
> +		DEBUGOUT("failed to read pci capability list\n");
> +		return -1;
> +	}
> +
> +	while (pos) {
> +		ret = PCI_READ_CONFIG_RANGE(dev, (u32 *)&cap,
> +				sizeof(cap), pos);
> +		if (ret < 0) {
> +			DEBUGOUT("failed to read cap at pos: %x", pos);
> +			break;
> +		}
> +
> +		if (cap.cap_vndr != PCI_CAP_ID_VNDR)
> +			goto next;
> +
> +		DEBUGOUT("cfg type: %u, bar: %u, offset: %u, "
> +				"len: %u\n", cap.cfg_type, cap.bar,
> +				cap.offset, cap.length);
> +
> +		switch (cap.cfg_type) {
> +		case IFCVF_PCI_CAP_COMMON_CFG:
> +			hw->common_cfg = get_cap_addr(hw, &cap);
> +			break;
> +		case IFCVF_PCI_CAP_NOTIFY_CFG:
> +			PCI_READ_CONFIG_DWORD(dev, &hw->notify_off_multiplier,
> +					pos + sizeof(cap));
> +			hw->notify_base = get_cap_addr(hw, &cap);
> +			hw->notify_region = cap.bar;
> +			break;
> +		case IFCVF_PCI_CAP_ISR_CFG:
> +			hw->isr = get_cap_addr(hw, &cap);
> +			break;
> +		case IFCVF_PCI_CAP_DEVICE_CFG:
> +			hw->dev_cfg = get_cap_addr(hw, &cap);
> +			break;
> +		}
> +next:
> +		pos = cap.cap_next;
> +	}
> +
> +	hw->lm_cfg = hw->mem_resource[4].addr;
> +
> +	if (hw->common_cfg == NULL || hw->notify_base == NULL ||
> +			hw->isr == NULL || hw->dev_cfg == NULL) {
> +		DEBUGOUT("capability incomplete\n");
> +		return -1;
> +	}
> +
> +	DEBUGOUT("capability mapping:\ncommon cfg: %p\n"
> +			"notify base: %p\nisr cfg: %p\ndevice cfg: %p\n"
> +			"multiplier: %u\n",
> +			hw->common_cfg, hw->dev_cfg,
> +			hw->isr, hw->notify_base,
> +			hw->notify_off_multiplier);
> +
> +	return 0;
> +}
> +
> +STATIC u8
> +ifcvf_get_status(struct ifcvf_hw *hw)
> +{
> +	return IFCVF_READ_REG8(&hw->common_cfg->device_status);
> +}
> +
> +STATIC void
> +ifcvf_set_status(struct ifcvf_hw *hw, u8 status)
> +{
> +	IFCVF_WRITE_REG8(status, &hw->common_cfg->device_status);
> +}
> +
> +STATIC void
> +ifcvf_reset(struct ifcvf_hw *hw)
> +{
> +	ifcvf_set_status(hw, 0);
> +
> +	/* flush status write */
> +	while (ifcvf_get_status(hw))
> +		msec_delay(1);
> +}
> +
> +STATIC void
> +ifcvf_add_status(struct ifcvf_hw *hw, u8 status)
> +{
> +	if (status != 0)
> +		status |= ifcvf_get_status(hw);
> +
> +	ifcvf_set_status(hw, status);
> +	ifcvf_get_status(hw);
> +}
> +
> +u64
> +ifcvf_get_features(struct ifcvf_hw *hw)
> +{
> +	u32 features_lo, features_hi;
> +	struct ifcvf_pci_common_cfg *cfg = hw->common_cfg;
> +
> +	IFCVF_WRITE_REG32(0, &cfg->device_feature_select);
> +	features_lo = IFCVF_READ_REG32(&cfg->device_feature);
> +
> +	IFCVF_WRITE_REG32(1, &cfg->device_feature_select);
> +	features_hi = IFCVF_READ_REG32(&cfg->device_feature);
> +
> +	return ((u64)features_hi << 32) | features_lo;
> +}
> +
> +STATIC void
> +ifcvf_set_features(struct ifcvf_hw *hw, u64 features)
> +{
> +	struct ifcvf_pci_common_cfg *cfg = hw->common_cfg;
> +
> +	IFCVF_WRITE_REG32(0, &cfg->guest_feature_select);
> +	IFCVF_WRITE_REG32(features & ((1ULL << 32) - 1), &cfg->guest_feature);
> +
> +	IFCVF_WRITE_REG32(1, &cfg->guest_feature_select);
> +	IFCVF_WRITE_REG32(features >> 32, &cfg->guest_feature);
> +}
> +
> +STATIC int
> +ifcvf_config_features(struct ifcvf_hw *hw)
> +{
> +	u64 host_features;
> +
> +	host_features = ifcvf_get_features(hw);
> +	hw->req_features &= host_features;
> +
> +	ifcvf_set_features(hw, hw->req_features);
> +	ifcvf_add_status(hw, IFCVF_CONFIG_STATUS_FEATURES_OK);
> +
> +	if (!(ifcvf_get_status(hw) &
> IFCVF_CONFIG_STATUS_FEATURES_OK)) {
> +		DEBUGOUT("failed to set FEATURES_OK status\n");
> +		return -1;
> +	}
> +
> +	return 0;
> +}
> +
> +STATIC void
> +io_write64_twopart(u64 val, u32 *lo, u32 *hi)
> +{
> +	IFCVF_WRITE_REG32(val & ((1ULL << 32) - 1), lo);
> +	IFCVF_WRITE_REG32(val >> 32, hi);
> +}
> +
> +STATIC int
> +ifcvf_hw_enable(struct ifcvf_hw *hw)
> +{
> +	struct ifcvf_pci_common_cfg *cfg;
> +	u8 *lm_cfg;
> +	u32 i;
> +	u16 notify_off;
> +
> +	cfg = hw->common_cfg;
> +	lm_cfg = hw->lm_cfg;
> +
> +	IFCVF_WRITE_REG16(0, &cfg->msix_config);
> +	if (IFCVF_READ_REG16(&cfg->msix_config) ==
> IFCVF_MSI_NO_VECTOR) {
> +		DEBUGOUT("msix vec alloc failed for device config\n");
> +		return -1;
> +	}
> +
> +	for (i = 0; i < hw->nr_vring; i++) {
> +		IFCVF_WRITE_REG16(i, &cfg->queue_select);
> +		io_write64_twopart(hw->vring[i].desc, &cfg->queue_desc_lo,
> +				&cfg->queue_desc_hi);
> +		io_write64_twopart(hw->vring[i].avail, &cfg->queue_avail_lo,
> +				&cfg->queue_avail_hi);
> +		io_write64_twopart(hw->vring[i].used, &cfg->queue_used_lo,
> +				&cfg->queue_used_hi);
> +		IFCVF_WRITE_REG16(hw->vring[i].size, &cfg->queue_size);
> +
> +		*(u32 *)(lm_cfg + IFCVF_LM_RING_STATE_OFFSET +
> +				(i / 2) * IFCVF_LM_CFG_SIZE + (i % 2) * 4) =
> +			(u32)hw->vring[i].last_avail_idx |
> +			((u32)hw->vring[i].last_used_idx << 16);
> +
> +		IFCVF_WRITE_REG16(i + 1, &cfg->queue_msix_vector);
> +		if (IFCVF_READ_REG16(&cfg->queue_msix_vector) ==
> +				IFCVF_MSI_NO_VECTOR) {
> +			DEBUGOUT("queue %u, msix vec alloc failed\n",
> +					i);
> +			return -1;
> +		}
> +
> +		notify_off = IFCVF_READ_REG16(&cfg->queue_notify_off);
> +		hw->notify_addr[i] = (void *)((u8 *)hw->notify_base +
> +				notify_off * hw->notify_off_multiplier);
> +		IFCVF_WRITE_REG16(1, &cfg->queue_enable);
> +	}
> +
> +	return 0;
> +}
> +
> +STATIC void
> +ifcvf_hw_disable(struct ifcvf_hw *hw)
> +{
> +	u32 i;
> +	struct ifcvf_pci_common_cfg *cfg;
> +	u32 ring_state;
> +
> +	cfg = hw->common_cfg;
> +
> +	IFCVF_WRITE_REG16(IFCVF_MSI_NO_VECTOR, &cfg->msix_config);
> +	for (i = 0; i < hw->nr_vring; i++) {
> +		IFCVF_WRITE_REG16(i, &cfg->queue_select);
> +		IFCVF_WRITE_REG16(0, &cfg->queue_enable);
> +		IFCVF_WRITE_REG16(IFCVF_MSI_NO_VECTOR, &cfg->queue_msix_vector);
> +		ring_state = *(u32 *)(hw->lm_cfg + IFCVF_LM_RING_STATE_OFFSET +
> +				(i / 2) * IFCVF_LM_CFG_SIZE + (i % 2) * 4);
> +		hw->vring[i].last_avail_idx = (u16)(ring_state >> 16);
> +		hw->vring[i].last_used_idx = (u16)(ring_state >> 16);
> +	}
> +}
> +
> +int
> +ifcvf_start_hw(struct ifcvf_hw *hw)
> +{
> +	ifcvf_reset(hw);
> +	ifcvf_add_status(hw, IFCVF_CONFIG_STATUS_ACK);
> +	ifcvf_add_status(hw, IFCVF_CONFIG_STATUS_DRIVER);
> +
> +	if (ifcvf_config_features(hw) < 0)
> +		return -1;
> +
> +	if (ifcvf_hw_enable(hw) < 0)
> +		return -1;
> +
> +	ifcvf_add_status(hw, IFCVF_CONFIG_STATUS_DRIVER_OK);
> +	return 0;
> +}
> +
> +void
> +ifcvf_stop_hw(struct ifcvf_hw *hw)
> +{
> +	ifcvf_hw_disable(hw);
> +	ifcvf_reset(hw);
> +}
> +
> +void
> +ifcvf_enable_logging(struct ifcvf_hw *hw, u64 log_base, u64 log_size)
> +{
> +	u8 *lm_cfg;
> +
> +	lm_cfg = hw->lm_cfg;
> +
> +	*(u32 *)(lm_cfg + IFCVF_LM_BASE_ADDR_LOW) =
> +		log_base & IFCVF_32_BIT_MASK;
> +
> +	*(u32 *)(lm_cfg + IFCVF_LM_BASE_ADDR_HIGH) =
> +		(log_base >> 32) & IFCVF_32_BIT_MASK;
> +
> +	*(u32 *)(lm_cfg + IFCVF_LM_END_ADDR_LOW) =
> +		(log_base + log_size) & IFCVF_32_BIT_MASK;
> +
> +	*(u32 *)(lm_cfg + IFCVF_LM_END_ADDR_HIGH) =
> +		((log_base + log_size) >> 32) & IFCVF_32_BIT_MASK;
> +
> +	*(u32 *)(lm_cfg + IFCVF_LM_LOGGING_CTRL) = IFCVF_LM_ENABLE_VF;
> +}
> +
> +void
> +ifcvf_disable_logging(struct ifcvf_hw *hw)
> +{
> +	u8 *lm_cfg;
> +
> +	lm_cfg = hw->lm_cfg;
> +	*(u32 *)(lm_cfg + IFCVF_LM_LOGGING_CTRL) = IFCVF_LM_DISABLE;
> +}
> +
> +void
> +ifcvf_notify_queue(struct ifcvf_hw *hw, u16 qid)
> +{
> +	IFCVF_WRITE_REG16(qid, hw->notify_addr[qid]);
> +}
> +
> +u8
> +ifcvf_get_notify_region(struct ifcvf_hw *hw)
> +{
> +	return hw->notify_region;
> +}
> +
> +u64
> +ifcvf_get_queue_notify_off(struct ifcvf_hw *hw, int qid)
> +{
> +	return (u8 *)hw->notify_addr[qid] -
> +		(u8 *)hw->mem_resource[hw->notify_region].addr;
> +}
> diff --git a/drivers/vdpa/ifc/base/ifcvf.h b/drivers/vdpa/ifc/base/ifcvf.h
> new file mode 100644
> index 0000000..9be2770
> --- /dev/null
> +++ b/drivers/vdpa/ifc/base/ifcvf.h
> @@ -0,0 +1,162 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2018 Intel Corporation
> + */
> +
> +#ifndef _IFCVF_H_
> +#define _IFCVF_H_
> +
> +#include "ifcvf_osdep.h"
> +
> +#define IFCVF_VENDOR_ID		0x1AF4
> +#define IFCVF_DEVICE_ID		0x1041
> +#define IFCVF_SUBSYS_VENDOR_ID	0x8086
> +#define IFCVF_SUBSYS_DEVICE_ID	0x001A
> +
> +#define IFCVF_MAX_QUEUES		1
> +#define VIRTIO_F_IOMMU_PLATFORM		33
> +
> +/* Common configuration */
> +#define IFCVF_PCI_CAP_COMMON_CFG	1
> +/* Notifications */
> +#define IFCVF_PCI_CAP_NOTIFY_CFG	2
> +/* ISR Status */
> +#define IFCVF_PCI_CAP_ISR_CFG		3
> +/* Device specific configuration */
> +#define IFCVF_PCI_CAP_DEVICE_CFG	4
> +/* PCI configuration access */
> +#define IFCVF_PCI_CAP_PCI_CFG		5
> +
> +#define IFCVF_CONFIG_STATUS_RESET     0x00
> +#define IFCVF_CONFIG_STATUS_ACK       0x01
> +#define IFCVF_CONFIG_STATUS_DRIVER    0x02
> +#define IFCVF_CONFIG_STATUS_DRIVER_OK 0x04
> +#define IFCVF_CONFIG_STATUS_FEATURES_OK 0x08
> +#define IFCVF_CONFIG_STATUS_FAILED    0x80
> +
> +#define IFCVF_MSI_NO_VECTOR	0xffff
> +#define IFCVF_PCI_MAX_RESOURCE	6
> +
> +#define IFCVF_LM_CFG_SIZE		0x40
> +#define IFCVF_LM_RING_STATE_OFFSET	0x20
> +
> +#define IFCVF_LM_LOGGING_CTRL		0x0
> +
> +#define IFCVF_LM_BASE_ADDR_LOW		0x10
> +#define IFCVF_LM_BASE_ADDR_HIGH		0x14
> +#define IFCVF_LM_END_ADDR_LOW		0x18
> +#define IFCVF_LM_END_ADDR_HIGH		0x1c
> +
> +#define IFCVF_LM_DISABLE		0x0
> +#define IFCVF_LM_ENABLE_VF		0x1
> +#define IFCVF_LM_ENABLE_PF		0x3
> +#define IFCVF_LOG_BASE			0x100000000000
> +#define IFCVF_MEDIATED_VRING		0x200000000000
> +
> +#define IFCVF_32_BIT_MASK		0xffffffff
> +
> +
> +struct ifcvf_pci_cap {
> +	u8 cap_vndr;            /* Generic PCI field: PCI_CAP_ID_VNDR */
> +	u8 cap_next;            /* Generic PCI field: next ptr. */
> +	u8 cap_len;             /* Generic PCI field: capability length */
> +	u8 cfg_type;            /* Identifies the structure. */
> +	u8 bar;                 /* Where to find it. */
> +	u8 padding[3];          /* Pad to full dword. */
> +	u32 offset;             /* Offset within bar. */
> +	u32 length;             /* Length of the structure, in bytes. */
> +};
> +
> +struct ifcvf_pci_notify_cap {
> +	struct ifcvf_pci_cap cap;
> +	u32 notify_off_multiplier;  /* Multiplier for queue_notify_off. */
> +};
> +
> +struct ifcvf_pci_common_cfg {
> +	/* About the whole device. */
> +	u32 device_feature_select;
> +	u32 device_feature;
> +	u32 guest_feature_select;
> +	u32 guest_feature;
> +	u16 msix_config;
> +	u16 num_queues;
> +	u8 device_status;
> +	u8 config_generation;
> +
> +	/* About a specific virtqueue. */
> +	u16 queue_select;
> +	u16 queue_size;
> +	u16 queue_msix_vector;
> +	u16 queue_enable;
> +	u16 queue_notify_off;
> +	u32 queue_desc_lo;
> +	u32 queue_desc_hi;
> +	u32 queue_avail_lo;
> +	u32 queue_avail_hi;
> +	u32 queue_used_lo;
> +	u32 queue_used_hi;
> +};
> +
> +struct ifcvf_net_config {
> +	u8    mac[6];
> +	u16   status;
> +	u16   max_virtqueue_pairs;
> +} __attribute__((packed));
> +
> +struct ifcvf_pci_mem_resource {
> +	u64      phys_addr; /**< Physical address, 0 if not resource. */
> +	u64      len;       /**< Length of the resource. */
> +	u8       *addr;     /**< Virtual address, NULL when not mapped. */
> +};
> +
> +struct vring_info {
> +	u64 desc;
> +	u64 avail;
> +	u64 used;
> +	u16 size;
> +	u16 last_avail_idx;
> +	u16 last_used_idx;
> +};
> +
> +struct ifcvf_hw {
> +	u64    req_features;
> +	u8     notify_region;
> +	u32    notify_off_multiplier;
> +	struct ifcvf_pci_common_cfg *common_cfg;
> +	struct ifcvf_net_config *dev_cfg;
> +	u8     *isr;
> +	u16    *notify_base;
> +	u16    *notify_addr[IFCVF_MAX_QUEUES * 2];
> +	u8     *lm_cfg;
> +	struct vring_info vring[IFCVF_MAX_QUEUES * 2];
> +	u8 nr_vring;
> +	struct ifcvf_pci_mem_resource mem_resource[IFCVF_PCI_MAX_RESOURCE];
> +};
> +
> +int
> +ifcvf_init_hw(struct ifcvf_hw *hw, PCI_DEV *dev);
> +
> +u64
> +ifcvf_get_features(struct ifcvf_hw *hw);
> +
> +int
> +ifcvf_start_hw(struct ifcvf_hw *hw);
> +
> +void
> +ifcvf_stop_hw(struct ifcvf_hw *hw);
> +
> +void
> +ifcvf_enable_logging(struct ifcvf_hw *hw, u64 log_base, u64 log_size);
> +
> +void
> +ifcvf_disable_logging(struct ifcvf_hw *hw);
> +
> +void
> +ifcvf_notify_queue(struct ifcvf_hw *hw, u16 qid);
> +
> +u8
> +ifcvf_get_notify_region(struct ifcvf_hw *hw);
> +
> +u64
> +ifcvf_get_queue_notify_off(struct ifcvf_hw *hw, int qid);
> +
> +#endif /* _IFCVF_H_ */
> diff --git a/drivers/vdpa/ifc/base/ifcvf_osdep.h b/drivers/vdpa/ifc/base/ifcvf_osdep.h
> new file mode 100644
> index 0000000..6aef25e
> --- /dev/null
> +++ b/drivers/vdpa/ifc/base/ifcvf_osdep.h
> @@ -0,0 +1,52 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2018 Intel Corporation
> + */
> +
> +#ifndef _IFCVF_OSDEP_H_
> +#define _IFCVF_OSDEP_H_
> +
> +#include <stdint.h>
> +#include <linux/pci_regs.h>
> +
> +#include <rte_cycles.h>
> +#include <rte_pci.h>
> +#include <rte_bus_pci.h>
> +#include <rte_log.h>
> +#include <rte_io.h>
> +
> +#define DEBUGOUT(S, args...)    RTE_LOG(DEBUG, PMD, S, ##args)
> +#define STATIC                  static
> +
> +#define msec_delay(x)	rte_delay_us_sleep(1000 * (x))
> +
> +#define IFCVF_READ_REG8(reg)		rte_read8(reg)
> +#define IFCVF_WRITE_REG8(val, reg)	rte_write8((val), (reg))
> +#define IFCVF_READ_REG16(reg)		rte_read16(reg)
> +#define IFCVF_WRITE_REG16(val, reg)	rte_write16((val), (reg))
> +#define IFCVF_READ_REG32(reg)		rte_read32(reg)
> +#define IFCVF_WRITE_REG32(val, reg)	rte_write32((val), (reg))
> +
> +typedef struct rte_pci_device PCI_DEV;
> +
> +#define PCI_READ_CONFIG_BYTE(dev, val, where) \
> +	rte_pci_read_config(dev, val, 1, where)
> +
> +#define PCI_READ_CONFIG_DWORD(dev, val, where) \
> +	rte_pci_read_config(dev, val, 4, where)
> +
> +typedef uint8_t    u8;
> +typedef int8_t     s8;
> +typedef uint16_t   u16;
> +typedef int16_t    s16;
> +typedef uint32_t   u32;
> +typedef int32_t    s32;
> +typedef int64_t    s64;
> +typedef uint64_t   u64;
> +
> +static inline int
> +PCI_READ_CONFIG_RANGE(PCI_DEV *dev, uint32_t *val, int size, int where)
> +{
> +	return rte_pci_read_config(dev, val, size, where);
> +}
> +
> +#endif /* _IFCVF_OSDEP_H_ */
> diff --git a/drivers/vdpa/ifc/ifcvf_vdpa.c b/drivers/vdpa/ifc/ifcvf_vdpa.c
> new file mode 100644
> index 0000000..da4667b
> --- /dev/null
> +++ b/drivers/vdpa/ifc/ifcvf_vdpa.c
> @@ -0,0 +1,1280 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2018 Intel Corporation
> + */
> +
> +#include <unistd.h>
> +#include <pthread.h>
> +#include <fcntl.h>
> +#include <string.h>
> +#include <sys/ioctl.h>
> +#include <sys/epoll.h>
> +#include <linux/virtio_net.h>
> +#include <stdbool.h>
> +
> +#include <rte_malloc.h>
> +#include <rte_memory.h>
> +#include <rte_bus_pci.h>
> +#include <rte_vhost.h>
> +#include <rte_vdpa.h>
> +#include <rte_vfio.h>
> +#include <rte_spinlock.h>
> +#include <rte_log.h>
> +#include <rte_kvargs.h>
> +#include <rte_devargs.h>
> +
> +#include "base/ifcvf.h"
> +
> +#define DRV_LOG(level, fmt, args...) \
> +	rte_log(RTE_LOG_ ## level, ifcvf_vdpa_logtype, \
> +		"IFCVF %s(): " fmt "\n", __func__, ##args)
> +
> +#ifndef PAGE_SIZE
> +#define PAGE_SIZE 4096
> +#endif
> +
> +#define IFCVF_USED_RING_LEN(size) \
> +	((size) * sizeof(struct vring_used_elem) + sizeof(uint16_t) * 3)
> +
> +#define IFCVF_VDPA_MODE		"vdpa"
> +#define IFCVF_SW_FALLBACK_LM	"sw-live-migration"
> +
> +static const char * const ifcvf_valid_arguments[] = {
> +	IFCVF_VDPA_MODE,
> +	IFCVF_SW_FALLBACK_LM,
> +	NULL
> +};
> +
> +static int ifcvf_vdpa_logtype;
> +
> +struct ifcvf_internal {
> +	struct rte_vdpa_dev_addr dev_addr;
> +	struct rte_pci_device *pdev;
> +	struct ifcvf_hw hw;
> +	int vfio_container_fd;
> +	int vfio_group_fd;
> +	int vfio_dev_fd;
> +	pthread_t tid;	/* thread for notify relay */
> +	int epfd;
> +	int vid;
> +	int did;
> +	uint16_t max_queues;
> +	uint64_t features;
> +	rte_atomic32_t started;
> +	rte_atomic32_t dev_attached;
> +	rte_atomic32_t running;
> +	rte_spinlock_t lock;
> +	bool sw_lm;
> +	bool sw_fallback_running;
> +	/* mediated vring for sw fallback */
> +	struct vring m_vring[IFCVF_MAX_QUEUES * 2];
> +	/* eventfd for used ring interrupt */
> +	int intr_fd[IFCVF_MAX_QUEUES * 2];
> +};
> +
> +struct internal_list {
> +	TAILQ_ENTRY(internal_list) next;
> +	struct ifcvf_internal *internal;
> +};
> +
> +TAILQ_HEAD(internal_list_head, internal_list);
> +static struct internal_list_head internal_list =
> +	TAILQ_HEAD_INITIALIZER(internal_list);
> +
> +static pthread_mutex_t internal_list_lock = PTHREAD_MUTEX_INITIALIZER;
> +
> +static void update_used_ring(struct ifcvf_internal *internal, uint16_t qid);
> +
> +static struct internal_list *
> +find_internal_resource_by_did(int did)
> +{
> +	int found = 0;
> +	struct internal_list *list;
> +
> +	pthread_mutex_lock(&internal_list_lock);
> +
> +	TAILQ_FOREACH(list, &internal_list, next) {
> +		if (did == list->internal->did) {
> +			found = 1;
> +			break;
> +		}
> +	}
> +
> +	pthread_mutex_unlock(&internal_list_lock);
> +
> +	if (!found)
> +		return NULL;
> +
> +	return list;
> +}
> +
> +static struct internal_list *
> +find_internal_resource_by_dev(struct rte_pci_device *pdev)
> +{
> +	int found = 0;
> +	struct internal_list *list;
> +
> +	pthread_mutex_lock(&internal_list_lock);
> +
> +	TAILQ_FOREACH(list, &internal_list, next) {
> +		if (pdev == list->internal->pdev) {
> +			found = 1;
> +			break;
> +		}
> +	}
> +
> +	pthread_mutex_unlock(&internal_list_lock);
> +
> +	if (!found)
> +		return NULL;
> +
> +	return list;
> +}
> +
> +static int
> +ifcvf_vfio_setup(struct ifcvf_internal *internal)
> +{
> +	struct rte_pci_device *dev = internal->pdev;
> +	char devname[RTE_DEV_NAME_MAX_LEN] = {0};
> +	int iommu_group_num;
> +	int i, ret;
> +
> +	internal->vfio_dev_fd = -1;
> +	internal->vfio_group_fd = -1;
> +	internal->vfio_container_fd = -1;
> +
> +	rte_pci_device_name(&dev->addr, devname, RTE_DEV_NAME_MAX_LEN);
> +	ret = rte_vfio_get_group_num(rte_pci_get_sysfs_path(), devname,
> +			&iommu_group_num);
> +	if (ret <= 0) {
> +		DRV_LOG(ERR, "%s failed to get IOMMU group", devname);
> +		return -1;
> +	}
> +
> +	internal->vfio_container_fd = rte_vfio_container_create();
> +	if (internal->vfio_container_fd < 0)
> +		return -1;
> +
> +	internal->vfio_group_fd = rte_vfio_container_group_bind(
> +			internal->vfio_container_fd, iommu_group_num);
> +	if (internal->vfio_group_fd < 0)
> +		goto err;
> +
> +	if (rte_pci_map_device(dev))
> +		goto err;
> +
> +	internal->vfio_dev_fd = dev->intr_handle.vfio_dev_fd;
> +
> +	for (i = 0; i < RTE_MIN(PCI_MAX_RESOURCE, IFCVF_PCI_MAX_RESOURCE);
> +			i++) {
> +		internal->hw.mem_resource[i].addr =
> +			internal->pdev->mem_resource[i].addr;
> +		internal->hw.mem_resource[i].phys_addr =
> +			internal->pdev->mem_resource[i].phys_addr;
> +		internal->hw.mem_resource[i].len =
> +			internal->pdev->mem_resource[i].len;
> +	}
> +
> +	return 0;
> +
> +err:
> +	rte_vfio_container_destroy(internal->vfio_container_fd);
> +	return -1;
> +}
> +
> +static int
> +ifcvf_dma_map(struct ifcvf_internal *internal, int do_map)
> +{
> +	uint32_t i;
> +	int ret;
> +	struct rte_vhost_memory *mem = NULL;
> +	int vfio_container_fd;
> +
> +	ret = rte_vhost_get_mem_table(internal->vid, &mem);
> +	if (ret < 0) {
> +		DRV_LOG(ERR, "failed to get VM memory layout.");
> +		goto exit;
> +	}
> +
> +	vfio_container_fd = internal->vfio_container_fd;
> +
> +	for (i = 0; i < mem->nregions; i++) {
> +		struct rte_vhost_mem_region *reg;
> +
> +		reg = &mem->regions[i];
> +		DRV_LOG(INFO, "%s, region %u: HVA 0x%" PRIx64 ", "
> +			"GPA 0x%" PRIx64 ", size 0x%" PRIx64 ".",
> +			do_map ? "DMA map" : "DMA unmap", i,
> +			reg->host_user_addr, reg->guest_phys_addr, reg->size);
> +
> +		if (do_map) {
> +			ret = rte_vfio_container_dma_map(vfio_container_fd,
> +				reg->host_user_addr, reg->guest_phys_addr,
> +				reg->size);
> +			if (ret < 0) {
> +				DRV_LOG(ERR, "DMA map failed.");
> +				goto exit;
> +			}
> +		} else {
> +			ret = rte_vfio_container_dma_unmap(vfio_container_fd,
> +				reg->host_user_addr, reg->guest_phys_addr,
> +				reg->size);
> +			if (ret < 0) {
> +				DRV_LOG(ERR, "DMA unmap failed.");
> +				goto exit;
> +			}
> +		}
> +	}
> +
> +exit:
> +	if (mem)
> +		free(mem);
> +	return ret;
> +}
> +
> +static uint64_t
> +hva_to_gpa(int vid, uint64_t hva)
> +{
> +	struct rte_vhost_memory *mem = NULL;
> +	struct rte_vhost_mem_region *reg;
> +	uint32_t i;
> +	uint64_t gpa = 0;
> +
> +	if (rte_vhost_get_mem_table(vid, &mem) < 0)
> +		goto exit;
> +
> +	for (i = 0; i < mem->nregions; i++) {
> +		reg = &mem->regions[i];
> +
> +		if (hva >= reg->host_user_addr &&
> +				hva < reg->host_user_addr + reg->size) {
> +			gpa = hva - reg->host_user_addr + reg->guest_phys_addr;
> +			break;
> +		}
> +	}
> +
> +exit:
> +	if (mem)
> +		free(mem);
> +	return gpa;
> +}
> +
> +static int
> +vdpa_ifcvf_start(struct ifcvf_internal *internal)
> +{
> +	struct ifcvf_hw *hw = &internal->hw;
> +	int i, nr_vring;
> +	int vid;
> +	struct rte_vhost_vring vq;
> +	uint64_t gpa;
> +
> +	vid = internal->vid;
> +	nr_vring = rte_vhost_get_vring_num(vid);
> +	rte_vhost_get_negotiated_features(vid, &hw->req_features);
> +
> +	for (i = 0; i < nr_vring; i++) {
> +		rte_vhost_get_vhost_vring(vid, i, &vq);
> +		gpa = hva_to_gpa(vid, (uint64_t)(uintptr_t)vq.desc);
> +		if (gpa == 0) {
> +			DRV_LOG(ERR, "Fail to get GPA for descriptor ring.");
> +			return -1;
> +		}
> +		hw->vring[i].desc = gpa;
> +
> +		gpa = hva_to_gpa(vid, (uint64_t)(uintptr_t)vq.avail);
> +		if (gpa == 0) {
> +			DRV_LOG(ERR, "Fail to get GPA for available ring.");
> +			return -1;
> +		}
> +		hw->vring[i].avail = gpa;
> +
> +		gpa = hva_to_gpa(vid, (uint64_t)(uintptr_t)vq.used);
> +		if (gpa == 0) {
> +			DRV_LOG(ERR, "Fail to get GPA for used ring.");
> +			return -1;
> +		}
> +		hw->vring[i].used = gpa;
> +
> +		hw->vring[i].size = vq.size;
> +		rte_vhost_get_vring_base(vid, i, &hw->vring[i].last_avail_idx,
> +				&hw->vring[i].last_used_idx);
> +	}
> +	hw->nr_vring = i;
> +
> +	return ifcvf_start_hw(&internal->hw);
> +}
> +
> +static void
> +vdpa_ifcvf_stop(struct ifcvf_internal *internal)
> +{
> +	struct ifcvf_hw *hw = &internal->hw;
> +	uint32_t i;
> +	int vid;
> +	uint64_t features = 0;
> +	uint64_t log_base = 0, log_size = 0;
> +	uint64_t len;
> +
> +	vid = internal->vid;
> +	ifcvf_stop_hw(hw);
> +
> +	for (i = 0; i < hw->nr_vring; i++)
> +		rte_vhost_set_vring_base(vid, i, hw->vring[i].last_avail_idx,
> +				hw->vring[i].last_used_idx);
> +
> +	if (internal->sw_lm)
> +		return;
> +
> +	rte_vhost_get_negotiated_features(vid, &features);
> +	if (RTE_VHOST_NEED_LOG(features)) {
> +		ifcvf_disable_logging(hw);
> +		rte_vhost_get_log_base(internal->vid, &log_base, &log_size);
> +		rte_vfio_container_dma_unmap(internal->vfio_container_fd,
> +				log_base, IFCVF_LOG_BASE, log_size);
> +		/*
> +		 * IFCVF marks dirty memory pages for only packet buffer,
> +		 * SW helps to mark the used ring as dirty after device stops.
> +		 */
> +		for (i = 0; i < hw->nr_vring; i++) {
> +			len = IFCVF_USED_RING_LEN(hw->vring[i].size);
> +			rte_vhost_log_used_vring(vid, i, 0, len);
> +		}
> +	}
> +}
> +
> +#define MSIX_IRQ_SET_BUF_LEN (sizeof(struct vfio_irq_set) + \
> +		sizeof(int) * (IFCVF_MAX_QUEUES * 2 + 1))
> +static int
> +vdpa_enable_vfio_intr(struct ifcvf_internal *internal, bool m_rx)
> +{
> +	int ret;
> +	uint32_t i, nr_vring;
> +	char irq_set_buf[MSIX_IRQ_SET_BUF_LEN];
> +	struct vfio_irq_set *irq_set;
> +	int *fd_ptr;
> +	struct rte_vhost_vring vring;
> +	int fd;
> +
> +	vring.callfd = -1;
> +
> +	nr_vring = rte_vhost_get_vring_num(internal->vid);
> +
> +	irq_set = (struct vfio_irq_set *)irq_set_buf;
> +	irq_set->argsz = sizeof(irq_set_buf);
> +	irq_set->count = nr_vring + 1;
> +	irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
> +			 VFIO_IRQ_SET_ACTION_TRIGGER;
> +	irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;
> +	irq_set->start = 0;
> +	fd_ptr = (int *)&irq_set->data;
> +	fd_ptr[RTE_INTR_VEC_ZERO_OFFSET] = internal->pdev->intr_handle.fd;
> +
> +	for (i = 0; i < nr_vring; i++)
> +		internal->intr_fd[i] = -1;
> +
> +	for (i = 0; i < nr_vring; i++) {
> +		rte_vhost_get_vhost_vring(internal->vid, i, &vring);
> +		fd_ptr[RTE_INTR_VEC_RXTX_OFFSET + i] = vring.callfd;
> +		if ((i & 1) == 0 && m_rx == true) {
> +			fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
> +			if (fd < 0) {
> +				DRV_LOG(ERR, "can't setup eventfd: %s",
> +					strerror(errno));
> +				return -1;
> +			}
> +			internal->intr_fd[i] = fd;
> +			fd_ptr[RTE_INTR_VEC_RXTX_OFFSET + i] = fd;
> +		}
> +	}
> +
> +	ret = ioctl(internal->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
> +	if (ret) {
> +		DRV_LOG(ERR, "Error enabling MSI-X interrupts: %s",
> +				strerror(errno));
> +		return -1;
> +	}
> +
> +	return 0;
> +}
> +
> +static int
> +vdpa_disable_vfio_intr(struct ifcvf_internal *internal)
> +{
> +	int ret;
> +	uint32_t i, nr_vring;
> +	char irq_set_buf[MSIX_IRQ_SET_BUF_LEN];
> +	struct vfio_irq_set *irq_set;
> +
> +	irq_set = (struct vfio_irq_set *)irq_set_buf;
> +	irq_set->argsz = sizeof(irq_set_buf);
> +	irq_set->count = 0;
> +	irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER;
> +	irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;
> +	irq_set->start = 0;
> +
> +	nr_vring = rte_vhost_get_vring_num(internal->vid);
> +	for (i = 0; i < nr_vring; i++) {
> +		if (internal->intr_fd[i] >= 0)
> +			close(internal->intr_fd[i]);
> +		internal->intr_fd[i] = -1;
> +	}
> +
> +	ret = ioctl(internal->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
> +	if (ret) {
> +		DRV_LOG(ERR, "Error disabling MSI-X interrupts: %s",
> +				strerror(errno));
> +		return -1;
> +	}
> +
> +	return 0;
> +}
> +
> +static void *
> +notify_relay(void *arg)
> +{
> +	int i, kickfd, epfd, nfds = 0;
> +	uint32_t qid, q_num;
> +	struct epoll_event events[IFCVF_MAX_QUEUES * 2];
> +	struct epoll_event ev;
> +	uint64_t buf;
> +	int nbytes;
> +	struct rte_vhost_vring vring;
> +	struct ifcvf_internal *internal = (struct ifcvf_internal *)arg;
> +	struct ifcvf_hw *hw = &internal->hw;
> +
> +	q_num = rte_vhost_get_vring_num(internal->vid);
> +
> +	epfd = epoll_create(IFCVF_MAX_QUEUES * 2);
> +	if (epfd < 0) {
> +		DRV_LOG(ERR, "failed to create epoll instance.");
> +		return NULL;
> +	}
> +	internal->epfd = epfd;
> +
> +	vring.kickfd = -1;
> +	for (qid = 0; qid < q_num; qid++) {
> +		ev.events = EPOLLIN | EPOLLPRI;
> +		rte_vhost_get_vhost_vring(internal->vid, qid, &vring);
> +		ev.data.u64 = qid | (uint64_t)vring.kickfd << 32;
> +		if (epoll_ctl(epfd, EPOLL_CTL_ADD, vring.kickfd, &ev) < 0) {
> +			DRV_LOG(ERR, "epoll add error: %s", strerror(errno));
> +			return NULL;
> +		}
> +	}
> +
> +	for (;;) {
> +		nfds = epoll_wait(epfd, events, q_num, -1);
> +		if (nfds < 0) {
> +			if (errno == EINTR)
> +				continue;
> +			DRV_LOG(ERR, "epoll_wait return fail\n");
> +			return NULL;
> +		}
> +
> +		for (i = 0; i < nfds; i++) {
> +			qid = events[i].data.u32;
> +			kickfd = (uint32_t)(events[i].data.u64 >> 32);
> +			do {
> +				nbytes = read(kickfd, &buf, 8);
> +				if (nbytes < 0) {
> +					if (errno == EINTR ||
> +					    errno == EWOULDBLOCK ||
> +					    errno == EAGAIN)
> +						continue;
> +					DRV_LOG(INFO, "Error reading "
> +						"kickfd: %s",
> +						strerror(errno));
> +				}
> +				break;
> +			} while (1);
> +
> +			ifcvf_notify_queue(hw, qid);
> +		}
> +	}
> +
> +	return NULL;
> +}
> +
> +static int
> +setup_notify_relay(struct ifcvf_internal *internal)
> +{
> +	int ret;
> +
> +	ret = pthread_create(&internal->tid, NULL, notify_relay,
> +			(void *)internal);
> +	if (ret) {
> +		DRV_LOG(ERR, "failed to create notify relay pthread.");
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int
> +unset_notify_relay(struct ifcvf_internal *internal)
> +{
> +	void *status;
> +
> +	if (internal->tid) {
> +		pthread_cancel(internal->tid);
> +		pthread_join(internal->tid, &status);
> +	}
> +	internal->tid = 0;
> +
> +	if (internal->epfd >= 0)
> +		close(internal->epfd);
> +	internal->epfd = -1;
> +
> +	return 0;
> +}
> +
> +static int
> +update_datapath(struct ifcvf_internal *internal)
> +{
> +	int ret;
> +
> +	rte_spinlock_lock(&internal->lock);
> +
> +	if (!rte_atomic32_read(&internal->running) &&
> +	    (rte_atomic32_read(&internal->started) &&
> +	     rte_atomic32_read(&internal->dev_attached))) {
> +		ret = ifcvf_dma_map(internal, 1);
> +		if (ret)
> +			goto err;
> +
> +		ret = vdpa_enable_vfio_intr(internal, 0);
> +		if (ret)
> +			goto err;
> +
> +		ret = vdpa_ifcvf_start(internal);
> +		if (ret)
> +			goto err;
> +
> +		ret = setup_notify_relay(internal);
> +		if (ret)
> +			goto err;
> +
> +		rte_atomic32_set(&internal->running, 1);
> +	} else if (rte_atomic32_read(&internal->running) &&
> +		   (!rte_atomic32_read(&internal->started) ||
> +		    !rte_atomic32_read(&internal->dev_attached))) {
> +		ret = unset_notify_relay(internal);
> +		if (ret)
> +			goto err;
> +
> +		vdpa_ifcvf_stop(internal);
> +
> +		ret = vdpa_disable_vfio_intr(internal);
> +		if (ret)
> +			goto err;
> +
> +		ret = ifcvf_dma_map(internal, 0);
> +		if (ret)
> +			goto err;
> +
> +		rte_atomic32_set(&internal->running, 0);
> +	}
> +
> +	rte_spinlock_unlock(&internal->lock);
> +	return 0;
> +err:
> +	rte_spinlock_unlock(&internal->lock);
> +	return ret;
> +}
> +
> +static int
> +m_ifcvf_start(struct ifcvf_internal *internal)
> +{
> +	struct ifcvf_hw *hw = &internal->hw;
> +	uint32_t i, nr_vring;
> +	int vid, ret;
> +	struct rte_vhost_vring vq;
> +	void *vring_buf;
> +	uint64_t m_vring_iova = IFCVF_MEDIATED_VRING;
> +	uint64_t size;
> +	uint64_t gpa;
> +
> +	memset(&vq, 0, sizeof(vq));
> +	vid = internal->vid;
> +	nr_vring = rte_vhost_get_vring_num(vid);
> +	rte_vhost_get_negotiated_features(vid, &hw->req_features);
> +
> +	for (i = 0; i < nr_vring; i++) {
> +		rte_vhost_get_vhost_vring(vid, i, &vq);
> +
> +		size = RTE_ALIGN_CEIL(vring_size(vq.size, PAGE_SIZE),
> +				PAGE_SIZE);
> +		vring_buf = rte_zmalloc("ifcvf", size, PAGE_SIZE);
> +		vring_init(&internal->m_vring[i], vq.size, vring_buf,
> +				PAGE_SIZE);
> +
> +		ret = rte_vfio_container_dma_map(internal->vfio_container_fd,
> +			(uint64_t)(uintptr_t)vring_buf, m_vring_iova, size);
> +		if (ret < 0) {
> +			DRV_LOG(ERR, "mediated vring DMA map failed.");
> +			goto error;
> +		}
> +
> +		gpa = hva_to_gpa(vid, (uint64_t)(uintptr_t)vq.desc);
> +		if (gpa == 0) {
> +			DRV_LOG(ERR, "Fail to get GPA for descriptor ring.");
> +			return -1;
> +		}
> +		hw->vring[i].desc = gpa;
> +
> +		gpa = hva_to_gpa(vid, (uint64_t)(uintptr_t)vq.avail);
> +		if (gpa == 0) {
> +			DRV_LOG(ERR, "Fail to get GPA for available ring.");
> +			return -1;
> +		}
> +		hw->vring[i].avail = gpa;
> +
> +		/* Direct I/O for Tx queue, relay for Rx queue */
> +		if (i & 1) {
> +			gpa = hva_to_gpa(vid, (uint64_t)(uintptr_t)vq.used);
> +			if (gpa == 0) {
> +				DRV_LOG(ERR, "Fail to get GPA for used ring.");
> +				return -1;
> +			}
> +			hw->vring[i].used = gpa;
> +		} else {
> +			hw->vring[i].used = m_vring_iova +
> +				(char *)internal->m_vring[i].used -
> +				(char *)internal->m_vring[i].desc;
> +		}
> +
> +		hw->vring[i].size = vq.size;
> +
> +		rte_vhost_get_vring_base(vid, i,
> +				&internal->m_vring[i].avail->idx,
> +				&internal->m_vring[i].used->idx);
> +
> +		rte_vhost_get_vring_base(vid, i, &hw->vring[i].last_avail_idx,
> +				&hw->vring[i].last_used_idx);
> +
> +		m_vring_iova += size;
> +	}
> +	hw->nr_vring = nr_vring;
> +
> +	return ifcvf_start_hw(&internal->hw);
> +
> +error:
> +	for (i = 0; i < nr_vring; i++)
> +		if (internal->m_vring[i].desc)
> +			rte_free(internal->m_vring[i].desc);
> +
> +	return -1;
> +}
> +
> +static int
> +m_ifcvf_stop(struct ifcvf_internal *internal)
> +{
> +	int vid;
> +	uint32_t i;
> +	struct rte_vhost_vring vq;
> +	struct ifcvf_hw *hw = &internal->hw;
> +	uint64_t m_vring_iova = IFCVF_MEDIATED_VRING;
> +	uint64_t size, len;
> +
> +	vid = internal->vid;
> +	ifcvf_stop_hw(hw);
> +
> +	for (i = 0; i < hw->nr_vring; i++) {
> +		/* synchronize remaining new used entries if any */
> +		if ((i & 1) == 0)
> +			update_used_ring(internal, i);
> +
> +		rte_vhost_get_vhost_vring(vid, i, &vq);
> +		len = IFCVF_USED_RING_LEN(vq.size);
> +		rte_vhost_log_used_vring(vid, i, 0, len);
> +
> +		size = RTE_ALIGN_CEIL(vring_size(vq.size, PAGE_SIZE),
> +				PAGE_SIZE);
> +		rte_vfio_container_dma_unmap(internal->vfio_container_fd,
> +			(uint64_t)(uintptr_t)internal->m_vring[i].desc,
> +			m_vring_iova, size);
> +
> +		rte_vhost_set_vring_base(vid, i, hw->vring[i].last_avail_idx,
> +				hw->vring[i].last_used_idx);
> +		rte_free(internal->m_vring[i].desc);
> +		m_vring_iova += size;
> +	}
> +
> +	return 0;
> +}
> +
> +static void
> +update_used_ring(struct ifcvf_internal *internal, uint16_t qid)
> +{
> +	rte_vdpa_relay_vring_used(internal->vid, qid, &internal->m_vring[qid]);
> +	rte_vhost_vring_call(internal->vid, qid);
> +}
> +
> +static void *
> +vring_relay(void *arg)
> +{
> +	int i, vid, epfd, fd, nfds;
> +	struct ifcvf_internal *internal = (struct ifcvf_internal *)arg;
> +	struct rte_vhost_vring vring;
> +	uint16_t qid, q_num;
> +	struct epoll_event events[IFCVF_MAX_QUEUES * 4];
> +	struct epoll_event ev;
> +	int nbytes;
> +	uint64_t buf;
> +
> +	vid = internal->vid;
> +	q_num = rte_vhost_get_vring_num(vid);
> +
> +	/* add notify fd and interrupt fd to epoll */
> +	epfd = epoll_create(IFCVF_MAX_QUEUES * 2);
> +	if (epfd < 0) {
> +		DRV_LOG(ERR, "failed to create epoll instance.");
> +		return NULL;
> +	}
> +	internal->epfd = epfd;
> +
> +	vring.kickfd = -1;
> +	for (qid = 0; qid < q_num; qid++) {
> +		ev.events = EPOLLIN | EPOLLPRI;
> +		rte_vhost_get_vhost_vring(vid, qid, &vring);
> +		ev.data.u64 = qid << 1 | (uint64_t)vring.kickfd << 32;
> +		if (epoll_ctl(epfd, EPOLL_CTL_ADD, vring.kickfd, &ev) < 0) {
> +			DRV_LOG(ERR, "epoll add error: %s", strerror(errno));
> +			return NULL;
> +		}
> +	}
> +
> +	for (qid = 0; qid < q_num; qid += 2) {
> +		ev.events = EPOLLIN | EPOLLPRI;
> +		/* leave a flag to mark it's for interrupt */
> +		ev.data.u64 = 1 | qid << 1 |
> +			(uint64_t)internal->intr_fd[qid] << 32;
> +		if (epoll_ctl(epfd, EPOLL_CTL_ADD, internal->intr_fd[qid], &ev)
> +				< 0) {
> +			DRV_LOG(ERR, "epoll add error: %s", strerror(errno));
> +			return NULL;
> +		}
> +		update_used_ring(internal, qid);
> +	}
> +
> +	/* start relay with a first kick */
> +	for (qid = 0; qid < q_num; qid++)
> +		ifcvf_notify_queue(&internal->hw, qid);
> +
> +	/* listen to the events and react accordingly */
> +	for (;;) {
> +		nfds = epoll_wait(epfd, events, q_num * 2, -1);
> +		if (nfds < 0) {
> +			if (errno == EINTR)
> +				continue;
> +			DRV_LOG(ERR, "epoll_wait return fail\n");
> +			return NULL;
> +		}
> +
> +		for (i = 0; i < nfds; i++) {
> +			fd = (uint32_t)(events[i].data.u64 >> 32);
> +			do {
> +				nbytes = read(fd, &buf, 8);
> +				if (nbytes < 0) {
> +					if (errno == EINTR ||
> +					    errno == EWOULDBLOCK ||
> +					    errno == EAGAIN)
> +						continue;
> +					DRV_LOG(INFO, "Error reading "
> +						"kickfd: %s",
> +						strerror(errno));
> +				}
> +				break;
> +			} while (1);
> +
> +			qid = events[i].data.u32 >> 1;
> +
> +			if (events[i].data.u32 & 1)
> +				update_used_ring(internal, qid);
> +			else
> +				ifcvf_notify_queue(&internal->hw, qid);
> +		}
> +	}
> +
> +	return NULL;
> +}
> +
> +static int
> +setup_vring_relay(struct ifcvf_internal *internal)
> +{
> +	int ret;
> +
> +	ret = pthread_create(&internal->tid, NULL, vring_relay,
> +			(void *)internal);
> +	if (ret) {
> +		DRV_LOG(ERR, "failed to create ring relay pthread.");
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int
> +unset_vring_relay(struct ifcvf_internal *internal)
> +{
> +	void *status;
> +
> +	if (internal->tid) {
> +		pthread_cancel(internal->tid);
> +		pthread_join(internal->tid, &status);
> +	}
> +	internal->tid = 0;
> +
> +	if (internal->epfd >= 0)
> +		close(internal->epfd);
> +	internal->epfd = -1;
> +
> +	return 0;
> +}
> +
> +static int
> +ifcvf_sw_fallback_switchover(struct ifcvf_internal *internal)
> +{
> +	int ret;
> +	int vid = internal->vid;
> +
> +	/* stop the direct IO data path */
> +	unset_notify_relay(internal);
> +	vdpa_ifcvf_stop(internal);
> +	vdpa_disable_vfio_intr(internal);
> +
> +	ret = rte_vhost_host_notifier_ctrl(vid, false);
> +	if (ret && ret != -ENOTSUP)
> +		goto error;
> +
> +	/* set up interrupt for interrupt relay */
> +	ret = vdpa_enable_vfio_intr(internal, 1);
> +	if (ret)
> +		goto unmap;
> +
> +	/* config the VF */
> +	ret = m_ifcvf_start(internal);
> +	if (ret)
> +		goto unset_intr;
> +
> +	/* set up vring relay thread */
> +	ret = setup_vring_relay(internal);
> +	if (ret)
> +		goto stop_vf;
> +
> +	rte_vhost_host_notifier_ctrl(vid, true);
> +
> +	internal->sw_fallback_running = true;
> +
> +	return 0;
> +
> +stop_vf:
> +	m_ifcvf_stop(internal);
> +unset_intr:
> +	vdpa_disable_vfio_intr(internal);
> +unmap:
> +	ifcvf_dma_map(internal, 0);
> +error:
> +	return -1;
> +}
> +
> +static int
> +ifcvf_dev_config(int vid)
> +{
> +	int did;
> +	struct internal_list *list;
> +	struct ifcvf_internal *internal;
> +
> +	did = rte_vhost_get_vdpa_device_id(vid);
> +	list = find_internal_resource_by_did(did);
> +	if (list == NULL) {
> +		DRV_LOG(ERR, "Invalid device id: %d", did);
> +		return -1;
> +	}
> +
> +	internal = list->internal;
> +	internal->vid = vid;
> +	rte_atomic32_set(&internal->dev_attached, 1);
> +	update_datapath(internal);
> +
> +	if (rte_vhost_host_notifier_ctrl(vid, true) != 0)
> +		DRV_LOG(NOTICE, "vDPA (%d): software relay is used.", did);
> +
> +	return 0;
> +}
> +
> +static int
> +ifcvf_dev_close(int vid)
> +{
> +	int did;
> +	struct internal_list *list;
> +	struct ifcvf_internal *internal;
> +
> +	did = rte_vhost_get_vdpa_device_id(vid);
> +	list = find_internal_resource_by_did(did);
> +	if (list == NULL) {
> +		DRV_LOG(ERR, "Invalid device id: %d", did);
> +		return -1;
> +	}
> +
> +	internal = list->internal;
> +
> +	if (internal->sw_fallback_running) {
> +		/* unset ring relay */
> +		unset_vring_relay(internal);
> +
> +		/* reset VF */
> +		m_ifcvf_stop(internal);
> +
> +		/* remove interrupt setting */
> +		vdpa_disable_vfio_intr(internal);
> +
> +		/* unset DMA map for guest memory */
> +		ifcvf_dma_map(internal, 0);
> +
> +		internal->sw_fallback_running = false;
> +	} else {
> +		rte_atomic32_set(&internal->dev_attached, 0);
> +		update_datapath(internal);
> +	}
> +
> +	return 0;
> +}
> +
> +static int
> +ifcvf_set_features(int vid)
> +{
> +	uint64_t features = 0;
> +	int did;
> +	struct internal_list *list;
> +	struct ifcvf_internal *internal;
> +	uint64_t log_base = 0, log_size = 0;
> +
> +	did = rte_vhost_get_vdpa_device_id(vid);
> +	list = find_internal_resource_by_did(did);
> +	if (list == NULL) {
> +		DRV_LOG(ERR, "Invalid device id: %d", did);
> +		return -1;
> +	}
> +
> +	internal = list->internal;
> +	rte_vhost_get_negotiated_features(vid, &features);
> +
> +	if (!RTE_VHOST_NEED_LOG(features))
> +		return 0;
> +
> +	if (internal->sw_lm) {
> +		ifcvf_sw_fallback_switchover(internal);
> +	} else {
> +		rte_vhost_get_log_base(vid, &log_base, &log_size);
> +		rte_vfio_container_dma_map(internal->vfio_container_fd,
> +				log_base, IFCVF_LOG_BASE, log_size);
> +		ifcvf_enable_logging(&internal->hw, IFCVF_LOG_BASE,
> +				log_size);
> +	}
> +
> +	return 0;
> +}
> +
> +static int
> +ifcvf_get_vfio_group_fd(int vid)
> +{
> +	int did;
> +	struct internal_list *list;
> +
> +	did = rte_vhost_get_vdpa_device_id(vid);
> +	list = find_internal_resource_by_did(did);
> +	if (list == NULL) {
> +		DRV_LOG(ERR, "Invalid device id: %d", did);
> +		return -1;
> +	}
> +
> +	return list->internal->vfio_group_fd;
> +}
> +
> +static int
> +ifcvf_get_vfio_device_fd(int vid)
> +{
> +	int did;
> +	struct internal_list *list;
> +
> +	did = rte_vhost_get_vdpa_device_id(vid);
> +	list = find_internal_resource_by_did(did);
> +	if (list == NULL) {
> +		DRV_LOG(ERR, "Invalid device id: %d", did);
> +		return -1;
> +	}
> +
> +	return list->internal->vfio_dev_fd;
> +}
> +
> +static int
> +ifcvf_get_notify_area(int vid, int qid, uint64_t *offset, uint64_t *size)
> +{
> +	int did;
> +	struct internal_list *list;
> +	struct ifcvf_internal *internal;
> +	struct vfio_region_info reg = { .argsz = sizeof(reg) };
> +	int ret;
> +
> +	did = rte_vhost_get_vdpa_device_id(vid);
> +	list = find_internal_resource_by_did(did);
> +	if (list == NULL) {
> +		DRV_LOG(ERR, "Invalid device id: %d", did);
> +		return -1;
> +	}
> +
> +	internal = list->internal;
> +
> +	reg.index = ifcvf_get_notify_region(&internal->hw);
> +	ret = ioctl(internal->vfio_dev_fd, VFIO_DEVICE_GET_REGION_INFO, &reg);
> +	if (ret) {
> +		DRV_LOG(ERR, "Can not get device region info: %s",
> +				strerror(errno));
> +		return -1;
> +	}
> +
> +	*offset = ifcvf_get_queue_notify_off(&internal->hw, qid) + reg.offset;
> +	*size = 0x1000;
> +
> +	return 0;
> +}
> +
> +static int
> +ifcvf_get_queue_num(int did, uint32_t *queue_num)
> +{
> +	struct internal_list *list;
> +
> +	list = find_internal_resource_by_did(did);
> +	if (list == NULL) {
> +		DRV_LOG(ERR, "Invalid device id: %d", did);
> +		return -1;
> +	}
> +
> +	*queue_num = list->internal->max_queues;
> +
> +	return 0;
> +}
> +
> +static int
> +ifcvf_get_vdpa_features(int did, uint64_t *features)
> +{
> +	struct internal_list *list;
> +
> +	list = find_internal_resource_by_did(did);
> +	if (list == NULL) {
> +		DRV_LOG(ERR, "Invalid device id: %d", did);
> +		return -1;
> +	}
> +
> +	*features = list->internal->features;
> +
> +	return 0;
> +}
> +
> +#define VDPA_SUPPORTED_PROTOCOL_FEATURES \
> +		(1ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK | \
> +		 1ULL << VHOST_USER_PROTOCOL_F_SLAVE_REQ | \
> +		 1ULL << VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD | \
> +		 1ULL << VHOST_USER_PROTOCOL_F_HOST_NOTIFIER | \
> +		 1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD)
> +static int
> +ifcvf_get_protocol_features(int did __rte_unused, uint64_t *features)
> +{
> +	*features = VDPA_SUPPORTED_PROTOCOL_FEATURES;
> +	return 0;
> +}
> +
> +static struct rte_vdpa_dev_ops ifcvf_ops = {
> +	.get_queue_num = ifcvf_get_queue_num,
> +	.get_features = ifcvf_get_vdpa_features,
> +	.get_protocol_features = ifcvf_get_protocol_features,
> +	.dev_conf = ifcvf_dev_config,
> +	.dev_close = ifcvf_dev_close,
> +	.set_vring_state = NULL,
> +	.set_features = ifcvf_set_features,
> +	.migration_done = NULL,
> +	.get_vfio_group_fd = ifcvf_get_vfio_group_fd,
> +	.get_vfio_device_fd = ifcvf_get_vfio_device_fd,
> +	.get_notify_area = ifcvf_get_notify_area,
> +};
> +
> +static inline int
> +open_int(const char *key __rte_unused, const char *value, void *extra_args)
> +{
> +	uint16_t *n = extra_args;
> +
> +	if (value == NULL || extra_args == NULL)
> +		return -EINVAL;
> +
> +	*n = (uint16_t)strtoul(value, NULL, 0);
> +	if (*n == USHRT_MAX && errno == ERANGE)
> +		return -1;
> +
> +	return 0;
> +}
> +
> +static int
> +ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
> +		struct rte_pci_device *pci_dev)
> +{
> +	uint64_t features;
> +	struct ifcvf_internal *internal = NULL;
> +	struct internal_list *list = NULL;
> +	int vdpa_mode = 0;
> +	int sw_fallback_lm = 0;
> +	struct rte_kvargs *kvlist = NULL;
> +	int ret = 0;
> +
> +	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
> +		return 0;
> +
> +	if (!pci_dev->device.devargs)
> +		return 1;
> +
> +	kvlist = rte_kvargs_parse(pci_dev->device.devargs->args,
> +			ifcvf_valid_arguments);
> +	if (kvlist == NULL)
> +		return 1;
> +
> +	/* probe only when vdpa mode is specified */
> +	if (rte_kvargs_count(kvlist, IFCVF_VDPA_MODE) == 0) {
> +		rte_kvargs_free(kvlist);
> +		return 1;
> +	}
> +
> +	ret = rte_kvargs_process(kvlist, IFCVF_VDPA_MODE, &open_int,
> +			&vdpa_mode);
> +	if (ret < 0 || vdpa_mode == 0) {
> +		rte_kvargs_free(kvlist);
> +		return 1;
> +	}
> +
> +	list = rte_zmalloc("ifcvf", sizeof(*list), 0);
> +	if (list == NULL)
> +		goto error;
> +
> +	internal = rte_zmalloc("ifcvf", sizeof(*internal), 0);
> +	if (internal == NULL)
> +		goto error;
> +
> +	internal->pdev = pci_dev;
> +	rte_spinlock_init(&internal->lock);
> +
> +	if (ifcvf_vfio_setup(internal) < 0) {
> +		DRV_LOG(ERR, "failed to setup device %s", pci_dev->name);
> +		goto error;
> +	}
> +
> +	if (ifcvf_init_hw(&internal->hw, internal->pdev) < 0) {
> +		DRV_LOG(ERR, "failed to init device %s", pci_dev->name);
> +		goto error;
> +	}
> +
> +	internal->max_queues = IFCVF_MAX_QUEUES;
> +	features = ifcvf_get_features(&internal->hw);
> +	internal->features = (features &
> +		~(1ULL << VIRTIO_F_IOMMU_PLATFORM)) |
> +		(1ULL << VIRTIO_NET_F_GUEST_ANNOUNCE) |
> +		(1ULL << VIRTIO_NET_F_CTRL_VQ) |
> +		(1ULL << VIRTIO_NET_F_STATUS) |
> +		(1ULL << VHOST_USER_F_PROTOCOL_FEATURES) |
> +		(1ULL << VHOST_F_LOG_ALL);
> +
> +	internal->dev_addr.pci_addr = pci_dev->addr;
> +	internal->dev_addr.type = PCI_ADDR;
> +	list->internal = internal;
> +
> +	if (rte_kvargs_count(kvlist, IFCVF_SW_FALLBACK_LM)) {
> +		ret = rte_kvargs_process(kvlist, IFCVF_SW_FALLBACK_LM,
> +				&open_int, &sw_fallback_lm);
> +		if (ret < 0)
> +			goto error;
> +	}
> +	internal->sw_lm = sw_fallback_lm;
> +
> +	internal->did = rte_vdpa_register_device(&internal->dev_addr,
> +				&ifcvf_ops);
> +	if (internal->did < 0) {
> +		DRV_LOG(ERR, "failed to register device %s", pci_dev->name);
> +		goto error;
> +	}
> +
> +	pthread_mutex_lock(&internal_list_lock);
> +	TAILQ_INSERT_TAIL(&internal_list, list, next);
> +	pthread_mutex_unlock(&internal_list_lock);
> +
> +	rte_atomic32_set(&internal->started, 1);
> +	update_datapath(internal);
> +
> +	rte_kvargs_free(kvlist);
> +	return 0;
> +
> +error:
> +	rte_kvargs_free(kvlist);
> +	rte_free(list);
> +	rte_free(internal);
> +	return -1;
> +}
> +
> +static int
> +ifcvf_pci_remove(struct rte_pci_device *pci_dev)
> +{
> +	struct ifcvf_internal *internal;
> +	struct internal_list *list;
> +
> +	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
> +		return 0;
> +
> +	list = find_internal_resource_by_dev(pci_dev);
> +	if (list == NULL) {
> +		DRV_LOG(ERR, "Invalid device: %s", pci_dev->name);
> +		return -1;
> +	}
> +
> +	internal = list->internal;
> +	rte_atomic32_set(&internal->started, 0);
> +	update_datapath(internal);
> +
> +	rte_pci_unmap_device(internal->pdev);
> +	rte_vfio_container_destroy(internal->vfio_container_fd);
> +	rte_vdpa_unregister_device(internal->did);
> +
> +	pthread_mutex_lock(&internal_list_lock);
> +	TAILQ_REMOVE(&internal_list, list, next);
> +	pthread_mutex_unlock(&internal_list_lock);
> +
> +	rte_free(list);
> +	rte_free(internal);
> +
> +	return 0;
> +}
> +
> +/*
> + * IFCVF has the same vendor ID and device ID as virtio net PCI
> + * device, with its specific subsystem vendor ID and device ID.
> + */
> +static const struct rte_pci_id pci_id_ifcvf_map[] = {
> +	{ .class_id = RTE_CLASS_ANY_ID,
> +	  .vendor_id = IFCVF_VENDOR_ID,
> +	  .device_id = IFCVF_DEVICE_ID,
> +	  .subsystem_vendor_id = IFCVF_SUBSYS_VENDOR_ID,
> +	  .subsystem_device_id = IFCVF_SUBSYS_DEVICE_ID,
> +	},
> +
> +	{ .vendor_id = 0, /* sentinel */
> +	},
> +};
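
(For readers without the header handy: if memory serves, base/ifcvf.h
defines these as the modern virtio-net IDs 0x1af4/0x1041 plus Intel
subsystem IDs 0x8086/0x15fe, so the match is narrow enough not to claim
ordinary virtio-net devices.)
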
> +
> +static struct rte_pci_driver rte_ifcvf_vdpa = {
> +	.id_table = pci_id_ifcvf_map,
> +	.drv_flags = 0,
> +	.probe = ifcvf_pci_probe,
> +	.remove = ifcvf_pci_remove,
> +};
> +
> +RTE_PMD_REGISTER_PCI(net_ifcvf, rte_ifcvf_vdpa);
> +RTE_PMD_REGISTER_PCI_TABLE(net_ifcvf, pci_id_ifcvf_map);
> +RTE_PMD_REGISTER_KMOD_DEP(net_ifcvf, "* vfio-pci");
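
(The "* vfio-pci" kmod dependency is load-bearing: the PMD drives the
device exclusively through VFIO containers, so the VF must be bound to
vfio-pci -- e.g. with usertools/dpdk-devbind.py -- before EAL will probe
it; igb_uio or uio_pci_generic will not work here.)
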
> +
> +RTE_INIT(ifcvf_vdpa_init_log)
> +{
> +	ifcvf_vdpa_logtype = rte_log_register("pmd.net.ifcvf_vdpa");
> +	if (ifcvf_vdpa_logtype >= 0)
> +		rte_log_set_level(ifcvf_vdpa_logtype, RTE_LOG_NOTICE);
> +}
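
(Note the logtype path still reads "pmd.net.ifcvf_vdpa" although the
driver now lives under drivers/vdpa; presumably it should become something
like "pmd.vdpa.ifcvf" in a follow-up. Either way the level can be raised
at runtime with the usual EAL switch, e.g. --log-level=pmd.net.ifcvf_vdpa,8.)
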
> diff --git a/drivers/vdpa/ifc/meson.build b/drivers/vdpa/ifc/meson.build
> new file mode 100644
> index 0000000..adc9ed9
> --- /dev/null
> +++ b/drivers/vdpa/ifc/meson.build
> @@ -0,0 +1,9 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2018 Intel Corporation
> +
> +build = dpdk_conf.has('RTE_LIBRTE_VHOST')
> +reason = 'missing dependency, DPDK vhost library'
> +allow_experimental_apis = true
> +sources = files('ifcvf_vdpa.c', 'base/ifcvf.c')
> +includes += include_directories('base')
> +deps += 'vhost'
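
(Build-system note: the driver is compiled only when RTE_LIBRTE_VHOST is
enabled, with the reason string surfacing in meson's list of skipped
drivers, and allow_experimental_apis is required because the rte_vdpa_*
registration API is still experimental at this point.)
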
> diff --git a/drivers/vdpa/ifc/rte_pmd_ifc_version.map b/drivers/vdpa/ifc/rte_pmd_ifc_version.map
> new file mode 100644
> index 0000000..f9f17e4
> --- /dev/null
> +++ b/drivers/vdpa/ifc/rte_pmd_ifc_version.map
> @@ -0,0 +1,3 @@
> +DPDK_20.0 {
> +	local: *;
> +};
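
(The map deliberately exports nothing -- "local: *" hides every symbol,
since applications reach this PMD only through the vhost/vDPA APIs rather
than by calling driver functions directly.)
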
> diff --git a/drivers/vdpa/meson.build b/drivers/vdpa/meson.build
> index a839ff5..fd164d3 100644
> --- a/drivers/vdpa/meson.build
> +++ b/drivers/vdpa/meson.build
> @@ -1,7 +1,7 @@
>  #   SPDX-License-Identifier: BSD-3-Clause
>  #   Copyright 2019 Mellanox Technologies, Ltd
> 
> -drivers = []
> +drivers = ['ifc']
>  std_deps = ['bus_pci', 'kvargs']
>  std_deps += ['vhost']
>  config_flag_fmt = 'RTE_LIBRTE_@0@_PMD'
> --
> 1.8.3.1


Thread overview: 50+ messages
2019-12-25 15:19 [dpdk-dev] [PATCH v1 0/3] Introduce new class for vDPA device drivers Matan Azrad
2019-12-25 15:19 ` [dpdk-dev] [PATCH v1 1/3] drivers: introduce vDPA class Matan Azrad
2020-01-07 17:32   ` Maxime Coquelin
2020-01-08 21:28     ` Thomas Monjalon
2020-01-09  8:00       ` Maxime Coquelin
2019-12-25 15:19 ` [dpdk-dev] [PATCH v1 2/3] doc: add vDPA feature table Matan Azrad
2020-01-07 17:39   ` Maxime Coquelin
2020-01-08  5:28     ` Tiwei Bie
2020-01-08  7:20       ` Andrew Rybchenko
2020-01-08 10:42         ` Matan Azrad
2020-01-08 13:11           ` Andrew Rybchenko
2020-01-08 17:01             ` Matan Azrad
2020-01-09  2:15           ` Tiwei Bie
2020-01-09  8:08             ` Matan Azrad
2019-12-25 15:19 ` [dpdk-dev] [PATCH v1 3/3] drivers: move ifc driver to the vDPA class Matan Azrad
2020-01-07 18:17   ` Maxime Coquelin
2020-01-07  7:57 ` [dpdk-dev] [PATCH v1 0/3] Introduce new class for vDPA device drivers Matan Azrad
2020-01-08  5:44   ` Xu, Rosen
2020-01-08 10:45     ` Matan Azrad
2020-01-08 12:39       ` Xu, Rosen
2020-01-08 12:58         ` Thomas Monjalon
2020-01-09  2:27           ` Xu, Rosen
2020-01-09  8:41             ` Thomas Monjalon
2020-01-09  9:23               ` Maxime Coquelin
2020-01-09  9:49                 ` Xu, Rosen
2020-01-09 10:42                   ` Maxime Coquelin
2020-01-10  2:40                     ` Xu, Rosen
2020-01-09 10:42                   ` Maxime Coquelin
2020-01-09 10:53               ` Xu, Rosen
2020-01-09 11:34                 ` Matan Azrad
2020-01-10  2:38                   ` Xu, Rosen
2020-01-10  9:21                     ` Thomas Monjalon
2020-01-10 14:18                       ` Xu, Rosen
2020-01-10 16:27                         ` Thomas Monjalon
2020-01-09 11:00 ` [dpdk-dev] [PATCH v2 " Matan Azrad
2020-01-09 11:00   ` [dpdk-dev] [PATCH v2 1/3] drivers: introduce vDPA class Matan Azrad
2020-01-09 11:00   ` [dpdk-dev] [PATCH v2 2/3] doc: add vDPA feature table Matan Azrad
2020-01-10 18:26     ` Thomas Monjalon
2020-01-13 22:40     ` Thomas Monjalon
2020-01-09 11:00   ` [dpdk-dev] [PATCH v2 3/3] drivers: move ifc driver to the vDPA class Matan Azrad
2020-01-09 17:25     ` Matan Azrad [this message]
2020-01-10  1:55       ` Wang, Haiyue
2020-01-10  9:07         ` Matan Azrad
2020-01-10  9:13           ` Thomas Monjalon
2020-01-10 12:31             ` Wang, Haiyue
2020-01-10 12:34               ` Maxime Coquelin
2020-01-10 12:59                 ` Thomas Monjalon
2020-01-10 19:17                   ` Kevin Traynor
2020-01-13 22:57     ` Thomas Monjalon
2020-01-13 23:08   ` [dpdk-dev] [PATCH v2 0/3] Introduce new class for vDPA device drivers Thomas Monjalon
