linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Haotian Wang <haotian.wang@sifive.com>
Cc: kishon@ti.com, lorenzo.pieralisi@arm.com, bhelgaas@google.com,
	jasowang@redhat.com, linux-pci@vger.kernel.org,
	haotian.wang@duke.edu
Subject: Re: [PATCH] pci: endpoint: functions: Add a virtnet EP function
Date: Tue, 3 Sep 2019 02:25:35 -0400	[thread overview]
Message-ID: <20190903021403-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <20190823213145.2016-1-haotian.wang@sifive.com>

On Fri, Aug 23, 2019 at 02:31:45PM -0700, Haotian Wang wrote:
> This endpoint function enables the PCI endpoint to establish a virtual
> ethernet link with the PCI host. The main features are:
> 
> - Zero modification of PCI host kernel. The only requirement for the
>   PCI host is to enable virtio, virtio_pci, virtio_pci_legacy and
>   virito_net.

Do we need to support legacy? Why not just the modern interface?
Even if yes, limiting device
to only legacy support is not a good idea.


> 
> - The virtual ethernet link is stable enough to support ordinary
>   capabilities of the Linux network stack. User space programs such as
>   ping, ssh, iperf and scp can run on the link without additional
>   hassle.
> 
> - This function fits in the PCI endpoint framework
>   (drivers/pci/endpoint/) and makes API calls provided by virtio_net
>   (drivers/net/virtio_net.c). It does not depend on
>   architecture-specific or hardware-specific features.
> 
> This function driver is tested on the following pair of systems. The PCI
> endpoint is a Xilinx VCU118 board programmed with a SiFive Linux-capable
> core running Linux 5.2. The PCI host is an x86_64 Intel(R) Core(TM)
> i3-6100 running unmodified Linux 5.2. The virtual link achieved a
> stable throughput of ~180KB/s during scp sessions of a 50M file. The
> PCI host could setup ip-forwarding and NAT to enable the PCI endpoint to
> have Internet access. Documentation for using this function driver is at
> Documentation/PCI/endpoint/pci-epf-virtio-howto.rst.
> 
> Reference Docs,
> - Documentation/PCI/endpoint/pci-endpoint.rst. Initialization and
>   removal of endpoint function device and driver.
> - Documentation/PCI/endpoint/pci-endpoint-cfs.rst. Use configfs to
>   control bind, linkup and unbind behavior.
> - https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-
>   csprd01.html, drivers/virtio/ and drivers/net/virtio_net.c. Algorithms
>   and data structures used by the virtio framework.
> 
> Signed-off-by: Haotian Wang <haotian.wang@sifive.com>
> ---
>  Documentation/PCI/endpoint/index.rst          |    1 +
>  .../PCI/endpoint/pci-epf-virtio-howto.rst     |  176 ++
>  MAINTAINERS                                   |    7 +
>  drivers/pci/endpoint/functions/Kconfig        |   45 +
>  drivers/pci/endpoint/functions/Makefile       |    1 +
>  .../pci/endpoint/functions/pci-epf-virtio.c   | 2043 +++++++++++++++++
>  include/linux/pci-epf-virtio.h                |  253 ++
>  7 files changed, 2526 insertions(+)
>  create mode 100644 Documentation/PCI/endpoint/pci-epf-virtio-howto.rst
>  create mode 100644 drivers/pci/endpoint/functions/pci-epf-virtio.c
>  create mode 100644 include/linux/pci-epf-virtio.h
> 
> diff --git a/Documentation/PCI/endpoint/index.rst b/Documentation/PCI/endpoint/index.rst
> index d114ea74b444..ac396afb3e99 100644
> --- a/Documentation/PCI/endpoint/index.rst
> +++ b/Documentation/PCI/endpoint/index.rst
> @@ -11,3 +11,4 @@ PCI Endpoint Framework
>     pci-endpoint-cfs
>     pci-test-function
>     pci-test-howto
> +   pci-epf-virtio-howto
> diff --git a/Documentation/PCI/endpoint/pci-epf-virtio-howto.rst b/Documentation/PCI/endpoint/pci-epf-virtio-howto.rst
> new file mode 100644
> index 000000000000..f62d830ab820
> --- /dev/null
> +++ b/Documentation/PCI/endpoint/pci-epf-virtio-howto.rst
> @@ -0,0 +1,176 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +==========================================
> +PCI Virtio Net Endpoint Function Userguide
> +==========================================
> +
> +:Author: Haotian Wang <haotian.wang@sifive.com>
> +
> +This document provides steps to use the pci-epf-virtio endpoint function driver
> +on the PCI endpoint, together with virtio_net on the PCI host side, to achieve a
> +virtual ethernet connection between the two ends.
> +
> +Host Device
> +===========
> +
> +Build the host kernel with virtio, virtio_pci, virtio_pci_legacy, virtio_net as
> +BUILT-IN modules. The locations of these configurations in `make menuconfig`
> +are:
> +
> +	virtio: Device Drivers/Virtio drivers
> +	virtio_pci: Device Drivers/Virtio drivers/PCI driver for virtio devices
> +	virtio_pci_legacy: Device Drivers/Virtio drivers/Support for legacy
> +			   virtio draft 0.9.X and older devices
> +	virtio_net: Device Drivers/Network device support/Virtio network driver
> +
> +After `make menuconfig`, make sure these config options are set to "=y" in the
> +.config file:
> +
> +	CONFIG_VIRTIO
> +	CONFIG_VIRTIO
> +	CONFIG_VIRTIO_PCI_LEGACY
> +	CONFIG_VIRTIO_NET
> +
> +CONFIG_PCI_HOST_LITTLE_ENDIAN must be set at COMPILE TIME. Toggle it on to build
> +the module with the PCI host being in little endianness.

If you drop legacy support, you will not need to mess with this.

> +
> +Build the kernel with the .config file. These are all the requirements for the
> +host side.
> +
> +Endpoint Device
> +===============
> +
> +Required Modules
> +----------------
> +
> +pci-epf-virtio relies on PCI_ENDPOINT, PCI_ENDPOINT_CONFIGFS, VIRTIO, VIRTIO_NET
> +to function properly. Make sure those are BUILT-IN. PCI_ENDPOINT_DMAENGINE and
> +PCI_EPF_VIRTIO_SUPPRESS_NOTIFICATION have to be turned on or off at compile time
> +for pci-epf-virtio to recognize these options.
> +
> +Enable PCI_ENDPOINT_DMAENGINE if your endpoint controller has an implementation
> +for that feature. Enable PCI_EPF_VIRTIO_SUPPRESS_NOTIFICATION for possible
> +performance gain.
> +
> +Endpoint Function Drivers
> +-------------------------
> +
> +To find the list of endpoint function drivers in the kernel::
> +
> +	# ls /sys/bus/pci-epf/drivers
> +	  pci_epf_virtio
> +OR::
> +
> +	# ls /sys/kernel/config/pci_ep/functions
> +	  pci_epf_virtio
> +
> +Creating pci-epf-virtio Device
> +------------------------------
> +
> +Since CONFIG_PCI_ENDPOINT_CONFIGFS is enabled, use the following commands to
> +create a pci-epf-virtio device::
> +
> +	# mount -t configfs none /sys/kernel/config
> +	# cd /sys/kernel/config/pci_ep
> +	# mkdir functions/pci_epf_virtio/func1
> +
> +Now the device will be probed by the pci_epf_virtio driver.
> +
> +Binding pci-epf-virtio Device to Endpoint Controller
> +----------------------------------------------------
> +
> +A `ln` command on the configfs will call the `bind` function defined in
> +pci-epf-virtio.c. This will bind the endpoint device to the controller::
> +
> +	# ln -s functions/pci_epf_virtio/func1 controllers/<some string>.pcie_ep
> +
> +Starting the Link
> +-----------------
> +
> +Once the device is bound to the endpoint controller. Use the configfs to
> +actually start the link with the PCI host side::
> +
> +	# echo 1 > controllers/<some string>.pcie_ep/start
> +
> +Using pci-epf-virtio
> +====================
> +
> +Setting Up Network Interfaces
> +-----------------------------
> +
> +Once the PCI link is brought up, both the host and endpoint will see a virtual
> +network interface if running `ifconfig`. On the host side, the virtual network
> +interface will have a mac address 02:02:02:02:02:02. On the endpoint side, if
> +will be 04:04:04:04:04:04. An easy way to enable a virtual ethernet link between
> +the two is to give them IP addresses that belong to the same subnet. For
> +example, assume the interface on the host side is called "enp2s0", and the
> +interface on the endpoint side is called "eth0". Run the following commonds.
> +
> +On the host side::
> +
> +	# ifconfig enp2s0 192.168.1.1 up
> +
> +On the endpoint side::
> +
> +	# ifconfig eth0 192.168.1.2 up
> +
> +Please note that if the host side usually has a complete distro such as Ubuntu
> +or Fedora. In that case, it is better to use the NetworkManager GUI provided by
> +the distro to assign a static IP address to "enp2s0", because the GUI will keep
> +trying to overwrite `ifconfig` settings with its settings. At this point of
> +time, the link between the host and endpoint is established.
> +
> +Using the Virtual Ethernet Link
> +-------------------------------
> +
> +User can run any task between these two network interfaces as if there were a
> +physical ethernet cable between two network devices. `ssh`, `scp`, `ping` work
> +out of the box from either side to the other side. `wireshark` can be run to
> +monitor packet traffic on the virtual network interfaces. If `ip-forwarding` is
> +enabled on the host side, and the host has Internet access, the host can use
> +`iptables -t nat` or equivalent programs to set up packet routing between the
> +Internet and the endpoint.
> +
> +Endpoint pci-epf-virtio Runtime Module Parameters
> +-------------------------------------------------
> +
> +On the endpoint, all module parameters shown can be toggled at runtime::
> +
> +	# ls /sys/module/pci_epf_virtio/parameters
> +	  check_queues_usec_max
> +	  check_queues_usec_min
> +	  notif_poll_usec_max
> +	  notif_poll_usec_min
> +
> +If PCI_ENDPOINT_DMAENGINE is enabled at COMPILE TIME, there will be an
> +additional parameter, enable_dma.
> +
> +If PCI_EPF_VIRTIO_SUPPRESS_NOTIFICATION is enabled at COMPILE TIME, there will
> +be an additional parameter, event_suppression.
> +
> +check_queues_usec_min/max specify the range of interval in microseconds between
> +two consecutive polls of vring data structures on the host by the endpoint.
> +Lower these values for more frequent polling, which probably increases traffic
> +throughput but hogs more CPU resources on the endpoint. The default values for
> +this pair are 100/200.
> +
> +notif_poll_usec_min/max specify the range of interval in microseconds between
> +two consecutive polls of vring update notices from the host by the endpoint.
> +Lowering them has similar effect to lowering check_queues_usec_min/max. The
> +default values for this pair are 10/20.
> +
> +It should be noted that notif_poll_usec_min/max should be much smaller than
> +check_queues_usec_min/max because check_queues is a much heavier task than
> +notif_poll. check_queues is implemented as a last resort in case update notices
> +from the host are missed by the endpoint, and should not be done as frequently
> +as polling for update notices from the host.
> +
> +If enable_dma is set to true, dma transfer will be used for each packet
> +transfer. Right now enabling dma actually hurts performance, so this option is
> +not recommended. The default value is false.
> +
> +event_suppression is an int value. Recommended values are between 2 and 5. This
> +value is used by endpoint and host as a reference. For example, if it is set to
> +3, the host will only update the endpoint after each batch of 3 packets are
> +transferred. Without event suppression, both sides will try to signal the other
> +end after every single packet is transferred. The default value is 3.
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 997a4f8fe88e..fe6c7651a894 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -12384,6 +12384,13 @@ F:	drivers/pci/endpoint/
>  F:	drivers/misc/pci_endpoint_test.c
>  F:	tools/pci/
>  
> +PCI ENDPOINT VIRTIO NET FUNCTION
> +M:	Haotian Wang <haotian.wang@sifive.com>
> +L:	linux-pci@vger.kernel.org
> +S:	Supported
> +F:	drivers/pci/endpoint/functions/pci-epf-virtio.c
> +F:	include/linux/pci-epf-virtio.h
> +
>  PCI ENHANCED ERROR HANDLING (EEH) FOR POWERPC
>  M:	Russell Currey <ruscur@russell.cc>
>  M:	Sam Bobroff <sbobroff@linux.ibm.com>
> diff --git a/drivers/pci/endpoint/functions/Kconfig b/drivers/pci/endpoint/functions/Kconfig
> index 8820d0f7ec77..e9e78fcd90d2 100644
> --- a/drivers/pci/endpoint/functions/Kconfig
> +++ b/drivers/pci/endpoint/functions/Kconfig
> @@ -12,3 +12,48 @@ config PCI_EPF_TEST
>  	   for PCI Endpoint.
>  
>  	   If in doubt, say "N" to disable Endpoint test driver.
> +
> +config PCI_EPF_VIRTIO
> +	tristate "PCI Endpoint virtio driver"
> +	depends on PCI_ENDPOINT
> +	select VIRTIO
> +	select VIRTIO_NET
> +	help
> +	   Enable this configuration option to enable the virtio net
> +	   driver for PCI Endpoint. Enabling this function driver automatically
> +	   selects virtio and virtio_net modules in your kernel build.
> +	   If the endpoint has this driver built-in or loaded, and
> +	   the PCI host enables virtio_net, the two systems can communicate
> +	   with each other via a pair of virtual network devices.
> +
> +	   If in doubt, say "N" to disable Endpoint virtio driver.
> +
> +config PCI_EPF_VIRTIO_SUPPRESS_NOTIFICATION
> +	bool "PCI Virtio Endpoint Function Notification Suppression"
> +	default n
> +	depends on PCI_EPF_VIRTIO
> +	help
> +	  Enable this configuration option to allow virtio queues to suppress
> +	  some notifications and interrupts. Normally the host and the endpoint
> +	  send a notification/interrupt to each other after each packet has been
> +	  provided/consumed. Notifications/Interrupts can be generally expensive
> +	  across the PCI bus. If this config is enabled, both sides will only
> +	  signal the other end after a batch of packets has been consumed/
> +	  provided. However, in reality, this option does not offer significant
> +	  performance gain so far.
> +
> +	  If in doubt, say "N" to enable this feature.
> +
> +config PCI_HOST_LITTLE_ENDIAN
> +	bool "PCI host will be in little endianness"
> +	depends on PCI_EPF_VIRTIO
> +	default y
> +	help
> +	  Enable this configuration option if the PCI host uses little endianness.
> +	  Disable it if the PCI host uses big endianness. pci-epf-virtio
> +	  leverages the functions of the legacy virtio framework. Legacy
> +	  virtio does not specify a fixed endianness used between systems. Thus,
> +	  at compile time, the user has to build the endpoint function with
> +	  the endianness of the PCI host already known.
> +
> +	  The default option assumes PCI host is little endian.
> diff --git a/drivers/pci/endpoint/functions/Makefile b/drivers/pci/endpoint/functions/Makefile
> index d6fafff080e2..9b5e72a324eb 100644
> --- a/drivers/pci/endpoint/functions/Makefile
> +++ b/drivers/pci/endpoint/functions/Makefile
> @@ -4,3 +4,4 @@
>  #
>  
>  obj-$(CONFIG_PCI_EPF_TEST)		+= pci-epf-test.o
> +obj-$(CONFIG_PCI_EPF_VIRTIO)		+= pci-epf-virtio.o
> diff --git a/drivers/pci/endpoint/functions/pci-epf-virtio.c b/drivers/pci/endpoint/functions/pci-epf-virtio.c
> new file mode 100644
> index 000000000000..5cc8cb02fb48
> --- /dev/null
> +++ b/drivers/pci/endpoint/functions/pci-epf-virtio.c
> @@ -0,0 +1,2043 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/**
> + * PCI epf driver to implement virtio endpoint functionality
> + *
> + * Author: Haotian Wang <haotian.wang@sifive.com>
> + */
> +
> +#include <linux/io.h>
> +#include <linux/pci-epc.h>
> +#include <linux/pci-epf.h>
> +#include <linux/pci_regs.h>
> +#include <linux/module.h>
> +#include <linux/pci_ids.h>
> +#include <linux/random.h>
> +#include <linux/kernel.h>
> +#include <linux/virtio.h>
> +#include <linux/if_ether.h>
> +#include <linux/etherdevice.h>
> +#include <linux/slab.h>
> +#include <linux/virtio_ring.h>
> +#include <linux/virtio_byteorder.h>
> +#include <uapi/linux/virtio_pci.h>
> +#include <uapi/linux/virtio_net.h>
> +#include <uapi/linux/virtio_ring.h>
> +#include <uapi/linux/virtio_types.h>
> +#include <uapi/linux/sched/types.h>
> +#include <uapi/linux/virtio_config.h>
> +#include <linux/pci-epf-virtio.h>
> +
> +#ifdef CONFIG_PCI_EPF_VIRTIO_SUPPRESS_NOTIFICATION
> +static int event_suppression = EVENT_SUPPRESSION;
> +module_param(event_suppression, int, 0644);
> +#endif
> +static int notif_poll_usec_min = CATCH_NOTIFY_USEC_MIN;
> +module_param(notif_poll_usec_min, int, 0644);
> +static int notif_poll_usec_max = CATCH_NOTIFY_USEC_MAX;
> +module_param(notif_poll_usec_max, int, 0644);
> +static int check_queues_usec_min = CHECK_QUEUES_USEC_MIN;
> +module_param(check_queues_usec_min, int, 0644);
> +static int check_queues_usec_max = CHECK_QUEUES_USEC_MAX;
> +module_param(check_queues_usec_max, int, 0644);
> +#ifdef CONFIG_PCI_ENDPOINT_DMAENGINE
> +static bool enable_dma = ENABLE_DMA;
> +module_param(enable_dma, bool, 0644);
> +#endif
> +
> +/* Default information written to configfs */
> +static struct pci_epf_header virtio_header = {
> +	.vendorid	= PCI_VENDOR_ID_REDHAT_QUMRANET,
> +	.deviceid	= VIRTIO_DEVICE_ID,
> +	.baseclass_code = PCI_CLASS_OTHERS,
> +	.interrupt_pin	= PCI_INTERRUPT_INTA,
> +	.subsys_id	= VIRTIO_NET_SUBSYS_ID,
> +	.subsys_vendor_id = PCI_VENDOR_ID_REDHAT_QUMRANET,
> +};
> +
> +/* Default bar sizes */
> +static size_t bar_size[] = { 512, 512, 1024, 16384, 131072, 1048576 };
> +
> +/*
> + * Clear mapped memory of a map. If there is memory allocated using the
> + * pci-ep framework, that memory will be released.
> + *
> + * @map: a map struct pointer that will be unmapped
> + */
> +static void pci_epf_unmap(struct pci_epf_map *map)
> +{
> +	if (map->iobase) {
> +		struct pci_epf *const epf = map->epf;
> +		struct pci_epc *const epc = epf->epc;
> +
> +		pci_epc_unmap_addr(epc, epf->func_no, map->phys_iobase);
> +		pci_epc_mem_free_addr(epc, map->phys_iobase,
> +				      map->iobase, map->iosize);
> +		map->iobase = NULL;
> +		map->ioaddr = NULL;
> +		map->phys_ioaddr = 0;
> +		map->phys_iobase = 0;
> +	}
> +}
> +
> +/*
> + * Release all mapped memory in the cache of maps.
> + *
> + * @lhead: the struct list_head that chains all maps together
> + * @slab: slab pointer used to allocate the maps. They are required
> + *	  to free the map structs according to slab allocator API.
> + */
> +static void pci_epf_free_map_cache(struct list_head *lhead,
> +				   struct kmem_cache *slab)
> +{
> +	struct pci_epf_map *iter;
> +	struct pci_epf_map *temp;
> +
> +	list_for_each_entry_safe(iter, temp, lhead, node) {
> +		list_del(&iter->node);
> +		kmem_cache_free(slab, iter);
> +	}
> +}
> +
> +/*
> + * Initialize a struct pci_epf_map.
> + *
> + * @map: ptr to map to be initialized
> + * @epf: required for following mapping and unmapping action
> + * @align: alignment requirement that the PCI endpoint may have
> + */
> +static void pci_epf_map_init(struct pci_epf_map *map,
> +			     struct pci_epf *epf,
> +			     size_t align)
> +{
> +	memset(map, 0, sizeof(*map));
> +	map->epf = epf;
> +	map->epc = epf->epc;
> +	map->align = align;
> +	INIT_LIST_HEAD(&map->node);
> +}
> +
> +/*
> + * Check whether the requested memory region is already mapped by the map.
> + *
> + * @map: ptr to the map to be checked
> + * @host_addr: physical address of the memory region on the PCI host
> + * @size: size in bytes of the memory region to be requested
> + *
> + * Returns true if the map already maps the region. Returns false if the map
> + * does not map the requested region.
> + */
> +static inline bool pci_epf_map_match(struct pci_epf_map *map, u64 host_addr,
> +				     size_t size)
> +{
> +	return host_addr >= map->prev_host_base &&
> +	       host_addr + size <= map->prev_host_base + map->iosize;
> +}
> +
> +/*
> + * Map a requested memory region
> + *
> + * @map: map ptr to hold the mapped memory
> + * @host_addr: physical memory address of starting byte on PCI host
> + * @size: size in bytes of the requested region
> + *
> + * Returns 0 on success and a negative error number on failure
> + */
> +static int pci_epf_map(struct pci_epf_map *map,
> +		       u64 host_addr,
> +		       size_t size)
> +{
> +	struct pci_epc *const epc = map->epc;
> +	struct pci_epf *const epf = map->epf;
> +	struct device *dev = &epf->dev;
> +	void __iomem *iobase;
> +	phys_addr_t phys_iobase;
> +	u64 host_base;
> +	off_t offset;
> +	size_t align, iosize;
> +	int ret;
> +
> +	align = map->align;
> +	iosize = (align > PAGE_SIZE && size < align) ? align : size;
> +	iobase = pci_epc_mem_alloc_addr(epc, &phys_iobase, iosize);
> +	if (!iobase) {
> +		dev_err(dev, "Failed to allocate address map\n");
> +		return -ENOMEM;
> +	}
> +
> +	host_base = host_addr;
> +	if (align > PAGE_SIZE)
> +		host_base &= ~(align - 1);
> +
> +	ret = pci_epc_map_addr(epc, epf->func_no,
> +			       phys_iobase, host_base, iosize);
> +	if (ret) {
> +		dev_err(dev, "Failed to map host address\n");
> +		pci_epc_mem_free_addr(epc, phys_iobase, iobase, iosize);
> +		return ret;
> +	}
> +
> +	offset = host_addr - host_base;
> +
> +	map->prev_host_base = host_base;
> +	map->iosize = iosize;
> +	map->iobase = iobase;
> +	map->ioaddr = iobase + offset;
> +	map->phys_iobase = phys_iobase;
> +	map->phys_ioaddr = phys_iobase + offset;
> +
> +	return 0;
> +}
> +
> +/*
> + * Get a best map ptr from the lru cache and map the requested memory region
> + *
> + * @lru_head: head of list linking all available pci_epf_map
> + * @host_addr: physical memory address of starting byte on PCI host
> + * @size: size in bytes of requested memory region
> + *
> + * Returns a ptr to the mapped struct pci_epf_map on success
> + * or an error pointer on failure. The caller must make sure to check
> + * for error pointer.
> + */
> +static struct pci_epf_map *pci_epf_get_map(struct list_head *lru_head,
> +					   u64 host_addr,
> +					   size_t size)
> +{
> +	int ret;
> +	struct pci_epf_map *map;
> +
> +	list_for_each_entry(map, lru_head, node) {
> +		if (pci_epf_map_match(map, host_addr, size)) {
> +			map->phys_ioaddr = map->phys_iobase + host_addr
> +					   - map->prev_host_base;
> +			map->ioaddr = (void __iomem *)(map->iobase + host_addr
> +						       - map->prev_host_base);
> +			list_move(&map->node, lru_head);
> +			return map;
> +		}
> +	}
> +
> +	map = list_last_entry(lru_head, struct pci_epf_map, node);
> +	list_move(&map->node, lru_head);
> +	pci_epf_unmap(map);
> +	ret = pci_epf_map(map, host_addr, size);
> +	if (ret)
> +		return ERR_PTR(ret);
> +	return map;
> +}
> +
> +/*
> + * These functions convert __virtio unsigned integers which are in PCI host
> + * endianness to unsigned integers in PCI endpoint endianness
> + */
> +static inline u16 epf_virtio16_to_cpu(__virtio16 val)
> +{
> +#ifdef CONFIG_PCI_HOST_LITTLE_ENDIAN
> +	return le16_to_cpu((__force __le16)val);
> +#else
> +	return be16_to_cpu((__force __be16)val);
> +#endif
> +}
> +
> +static inline u32 epf_virtio32_to_cpu(__virtio32 val)
> +{
> +#ifdef CONFIG_PCI_HOST_LITTLE_ENDIAN
> +	return le32_to_cpu((__force __le32)val);
> +#else
> +	return be32_to_cpu((__force __be32)val);
> +#endif
> +}
> +
> +static inline u64 epf_virtio64_to_cpu(__virtio64 val)
> +{
> +#ifdef CONFIG_PCI_HOST_LITTLE_ENDIAN
> +	return le64_to_cpu((__force __le64)val);
> +#else
> +	return be64_to_cpu((__force __be64)val);
> +#endif
> +}
> +
> +/*
> + * These functions convert unsigned integers in PCI endpoint endianness
> + * to __virtio unsigned integers in PCI host endianness
> + */
> +static inline __virtio16 epf_cpu_to_virtio16(u16 val)
> +{
> +#ifdef CONFIG_PCI_HOST_LITTLE_ENDIAN
> +	return (__force __virtio16)cpu_to_le16(val);
> +#else
> +	return (__force __virtio16)cpu_to_be16(val);
> +#endif
> +}
> +
> +static inline __virtio32 epf_cpu_to_virtio32(u32 val)
> +{
> +#ifdef CONFIG_PCI_HOST_LITTLE_ENDIAN
> +	return (__force __virtio32)cpu_to_le32(val);
> +#else
> +	return (__force __virtio32)cpu_to_be32(val);
> +#endif
> +}
> +
> +static inline __virtio64 epf_cpu_to_virtio64(u64 val)
> +{
> +#ifdef CONFIG_PCI_HOST_LITTLE_ENDIAN
> +	return (__force __virtio64)cpu_to_le64(val);
> +#else
> +	return (__force __virtio64)cpu_to_be64(val);
> +#endif
> +}
> +
> +/*
> + * Though locally __virtio unsigned integers have the exact same endianness
> + * as the normal unsigned integers. These functions are here for type
> + * consistency as required by sparse.
> + */
> +static inline u16 local_virtio16_to_cpu(__virtio16 val)
> +{
> +	return (__force u16)val;
> +}
> +
> +static inline u32 local_virtio32_to_cpu(__virtio32 val)
> +{
> +	return (__force u32)val;
> +}
> +
> +static inline u64 local_virtio64_to_cpu(__virtio64 val)
> +{
> +	return (__force u64)val;
> +}
> +
> +static inline __virtio16 local_cpu_to_virtio16(u16 val)
> +{
> +	return (__force __virtio16)val;
> +}
> +
> +static inline __virtio32 local_cpu_to_virtio32(u32 val)
> +{
> +	return (__force __virtio32)val;
> +}
> +
> +static inline __virtio64 local_cpu_to_virtio64(u64 val)
> +{
> +	return (__force __virtio64)val;
> +}
> +
> +/*
> + * Convert a __virtio16 in PCI host endianness to PCI endpoint endianness
> + * in place.
> + *
> + * @ptr: ptr to __virtio16 value in PCI host endianness
> + */
> +static inline void convert_to_local(__virtio16 *ptr)
> +{
> +	*ptr = (__force __virtio16)epf_virtio16_to_cpu(*ptr);
> +}
> +
> +/*
> + * Convert a local __virtio16 in PCI endpoint endianness to PCI host endianness
> + * in place.
> + *
> + * @ptr: ptr to  __virtio16 value in PCI endpoint endianness
> + */
> +static inline void convert_to_remote(__virtio16 *ptr)
> +{
> +	*ptr = epf_cpu_to_virtio16((__force u16)*ptr);
> +}
> +
> +/*
> + * These functions read from an IO memory address from PCI host and convert
> + * the value to PCI endpoint endianness.
> + */
> +static inline u16 epf_ioread16(void __iomem *addr)
> +{
> +	return epf_virtio16_to_cpu((__force __virtio16)ioread16(addr));
> +}
> +
> +static inline u32 epf_ioread32(void __iomem *addr)
> +{
> +	return epf_virtio32_to_cpu((__force __virtio32)ioread32(addr));
> +}
> +
> +static inline u64 epf_ioread64(void __iomem *addr)
> +{
> +	return epf_virtio64_to_cpu((__force __virtio64)readq(addr));
> +}
> +
> +/*
> + * These functions convert values to PCI host endianness and write those values
> + * to an IO memory address to the PCI host.
> + */
> +static inline void epf_iowrite16(u16 val, void __iomem *addr)
> +{
> +	iowrite16((__force u16)epf_cpu_to_virtio16(val), addr);
> +}
> +
> +static inline void epf_iowrite32(u32 val, void __iomem *addr)
> +{
> +	iowrite32((__force u32)epf_cpu_to_virtio32(val), addr);
> +}
> +
> +static inline void epf_iowrite64(u64 val, void __iomem *addr)
> +{
> +	writeq((__force u64)epf_cpu_to_virtio64(val), addr);
> +}
> +
> +/*
> + * Generate a 32 bit number representing the features supported by the device
> + * seen by virtio_pci_legacy on the PCI host across the bus.
> + *
> + * @features: feature bits supported by the device
> + * @len: number of supported features
> + */
> +static inline u32 generate_dev_feature32(const unsigned int *features, int len)
> +{
> +	u32 feature = 0;
> +	int index = len - 1;
> +
> +	for (; index >= 0; index--)
> +		feature |= BIT(features[index]);
> +	return feature;
> +}
> +
> +/*
> + * Generate a 64 bit number representing the features supported by the device
> + * seen by the local virtio modules on the PCI endpoint.
> + *
> + * @features: feature bits supported by the local device
> + * @len: number of supported features
> + */
> +static inline u64 generate_local_dev_feature64(const unsigned int *features,
> +					       int len)
> +{
> +	u64 feature = 0;
> +	int i = 0;
> +
> +	for (; i < len; i++)
> +		feature |= BIT_ULL(features[i]);
> +	return feature;
> +}
> +
> +/*
> + * Simulate an interrupt by the local virtio_net device to the local virtio_net
> + * drivers on the PCI endpoint. There will be no real irq. Instead, there
> + * is enough information to invoke callbacks associated with some virtqueue
> + * directly.
> + *
> + * @vring: the vring on which an "interrupt" occurs
> + * @dev: local device required for error reporting
> + */
> +static void epf_virtio_interrupt(struct vring *vring, struct device *dev)
> +{
> +	struct vring_virtqueue *const vvq = container_of(vring,
> +							 struct vring_virtqueue,
> +							 split.vring);
> +	struct virtqueue *const vq = &vvq->vq;
> +
> +	if (vvq->last_used_idx == local_virtio16_to_cpu(vring->used->idx)) {
> +		dev_dbg(dev, "no more work for vq %#06x\n", vq->index);
> +		return;
> +	}
> +	if (unlikely(vvq->broken)) {
> +		dev_err(dev, "virtuque %#06x is broken\n", vq->index);
> +		return;
> +	}
> +	if (vq->callback)
> +		vq->callback(vq);
> +}
> +
> +#ifdef CONFIG_PCI_EPF_VIRTIO_SUPPRESS_NOTIFICATION
> +/*
> + * Read local used_event written by the local virtio_ring module.
> + *
> + * @avail: local avail vring
> + *
> + * Returns an u16 representing the used event idx
> + */
> +static inline u16 read_local_used_event(struct vring_avail *avail)
> +{
> +	return local_virtio16_to_cpu(avail->ring[EPF_VIRTIO_QUEUE_SIZE]);
> +}
> +
> +/*
> + * Write local avail_event read by the local virtio_ring module.
> + *
> + * @used: local used vring
> + * @val: the avail_event value to be written
> + */
> +static inline void write_local_avail_event(struct vring_used *used, u16 val)
> +{
> +	*(__force u16 *)&used->ring[EPF_VIRTIO_QUEUE_SIZE] = val;
> +}
> +
> +/*
> + * Read remote used_event written by remote virtio_ring module
> + *
> + * @avail: IO memory address of the avail ring on PCI host
> + *
> + * Returns an u16 representing the used event idx
> + */
> +static inline u16 read_used_event(void __iomem *avail)
> +{
> +	return epf_ioread16(IO_MEMBER_ARR_ELEM_PTR(avail,
> +						   struct vring_avail,
> +						   ring,
> +						   __virtio16,
> +						   EPF_VIRTIO_QUEUE_SIZE));
> +}
> +
> +/*
> + * Write remote avail event read by remote virtio_ring module
> + *
> + * @used: IO memory address of the used ring on PCI host
> + * @val: avail event in endpoint endianness to be written
> + */
> +static inline void write_avail_event(void __iomem *used, u16 val)
> +{
> +	epf_iowrite16(val, IO_MEMBER_ARR_ELEM_PTR(used,
> +						  struct vring_used,
> +						  ring,
> +						  struct vring_used_elem,
> +						  EPF_VIRTIO_QUEUE_SIZE));
> +}
> +#endif
> +
> +/*
> + * Increase a local __virtio16 value by some increment in place. idx_shadow
> + * will store the corresponding u16 value after increment in PCI endpoint
> + * endianness.
> + *
> + * @idx: ptr to the __virtio16 value to be incremented
> + * @idx_shadow: ptr to the u16 value to store the incremented value
> + * @increment: amount of increment
> + */
> +static inline void advance_idx(__virtio16 *idx,
> +			       u16 *idx_shadow,
> +			       int increment)
> +{
> +	*idx_shadow = local_virtio16_to_cpu(*idx) + increment;
> +	*idx = local_cpu_to_virtio16(*idx_shadow);
> +}
> +
> +/*
> + * Increase a remote __virtio16 value by some increment in place. idx_shadow
> + * will store the corresponding u16 value after increment in PCI endpoint
> + * endianness.
> + *
> + * @idx: IO memory address of the remote __virtio16 value to be incremented
> + * @idx_shadow: ptr to u16 value that stores the incremented value in PCI
> + *		endpoint endianness
> + * @increment: amount of increment
> + */
> +static inline void advance_idx_remote(void __iomem *idx,
> +				      u16 *idx_shadow,
> +				      int increment)
> +{
> +	*idx_shadow = epf_ioread16(idx) + increment;
> +	epf_iowrite16(*idx_shadow, idx);
> +}
> +
> +/*
> + * Function called when local endpoint function wants to notify the local
> + * virtio device about new available buffers.
> + *
> + * @vq: virtqueue where new notification occurs
> + *
> + * Returns true always
> + */
> +static inline bool epf_virtio_local_notify(struct virtqueue *vq)
> +{
> +#ifdef CONFIG_PCI_EPF_VIRTIO_SUPPRESS_NOTIFICATION
> +	__virtio16 avail_event;
> +#endif
> +	const u32 index = vq->index;
> +	struct epf_virtio_device *const epf_vdev = vq->priv;
> +	atomic_t *const local_pending = epf_vdev->local_pending;
> +
> +	if (index)
> +		atomic_cmpxchg(local_pending, 0, 1);
> +#ifdef CONFIG_PCI_EPF_VIRTIO_SUPPRESS_NOTIFICATION
> +	avail_event = epf_vdev->vrings[index]->avail->idx;
> +	write_local_avail_event(epf_vdev->vrings[index]->used,
> +				local_virtio16_to_cpu(avail_event)
> +				+ event_suppression);
> +#endif
> +	return true;
> +}
> +
> +/*
> + * Delete all vring_virtqueues of the local virtio_device
> + *
> + * @vdev: local virtio device
> + */
> +static void epf_virtio_local_del_vqs(struct virtio_device *vdev)
> +{
> +	int i;
> +	struct vring *vr;
> +	struct vring_virtqueue *vvq;
> +	struct epf_virtio_device *const epf_vdev = vdev_to_epf_vdev(vdev);
> +
> +	for (i = 0; i < 2; i++) {
> +		vr = epf_vdev->vrings[i];
> +		if (vr) {
> +			vvq = container_of(vr, struct vring_virtqueue,
> +					   split.vring);
> +			vring_del_virtqueue(&vvq->vq);
> +		}
> +	}
> +}
> +
> +/*
> + * Get value from the virtio network config of the local virtio device.
> + *
> + * @vdev: local virtio device
> + * @offset: offset of starting memory address from the start of local
> + *	    virtio network config in bytes
> + * @buf: virtual memory address to store the value
> + * @len: size of requested data in bytes
> + */
> +static inline void epf_virtio_local_get(struct virtio_device *vdev,
> +					unsigned int offset,
> +					void *buf,
> +					unsigned int len)
> +{
> +	memcpy(buf,
> +	       (void *)&vdev_to_epf_vdev(vdev)->local_net_cfg + offset,
> +	       len);
> +}
> +
> +/*
> + * Set a value in the virtio network config of the local virtio device.
> + *
> + * @vdev: local virtio device
> + * @offset: offset of starting memory address from start of local virtio
> + *	    network config in bytes
> + * @buf: source of data in virtual memory
> + * @len: size of data in bytes
> + */
> +static inline void epf_virtio_local_set(struct virtio_device *vdev,
> +					unsigned int offset,
> +					const void *buf,
> +					unsigned int len)
> +{
> +	memcpy((void *)&vdev_to_epf_vdev(vdev)->local_net_cfg + offset,
> +	       buf,
> +	       len);
> +}
> +
> +/* Dummy function */
> +static inline u32 epf_virtio_local_generation(struct virtio_device *vdev)
> +{
> +	return 0;
> +}
> +
> +/*
> + * Get status of local virtio device.
> + *
> + * @vdev: local virtio device
> + *
> + * Returns a byte representing the status of the device.
> + */
> +static inline u8 epf_virtio_local_get_status(struct virtio_device *vdev)
> +{
> +	return vdev_to_epf_vdev(vdev)->local_cfg.dev_status;
> +}
> +
> +/*
> + * Set the status of the local virtio device
> + *
> + * @vdev: local virtio device
> + * @status: a byte that will be written to the status of local virtio device
> + */
> +static inline void epf_virtio_local_set_status(struct virtio_device *vdev,
> +					       u8 status)
> +{
> +	WARN_ON(status == 0);
> +	vdev_to_epf_vdev(vdev)->local_cfg.dev_status = status;
> +}
> +
> +/*
> + * Simulate a "reset" action on the local virtio device
> + *
> + * @vdev: local virtio device
> + */
> +static inline void epf_virtio_local_reset(struct virtio_device *vdev)
> +{
> +	vdev_to_epf_vdev(vdev)->local_cfg.dev_status = 0;
> +}
> +
> +/*
> + * Allocate and initialize vrings for the local virtio device. irq affinity
> + * is not implemented, and this endpoint function does not yet support
> + * msix features of virtio_net.
> + *
> + * @vdev: local virtio device
> + * @nvqs: number of virtqueues to create. 2 for virtio_net device.
> + * @vqs: array of pointers that store the memory addresses of vrings
> + * @callbacks: callback functions associated with each vring. The interrupt
> + *	       callback function will be called when an "interrupt" is
> + *	       simulated on that vring.
> + * @names: names of vrings
> + * @ctx: not implemented because msix is not enabled
> + * @desc: not implemented because msix is not enabled
> + *
> + * Returns 0 on success and a negative error number on failure
> + */
> +static int epf_virtio_local_find_vqs(struct virtio_device *vdev,
> +				     unsigned int nvqs,
> +				     struct virtqueue *vqs[],
> +				     vq_callback_t *callbacks[],
> +				     const char * const names[],
> +				     const bool *ctx,
> +				     struct irq_affinity *desc)
> +{
> +	int i;
> +	int queue_idx = 0;
> +	struct virtqueue *vq;
> +	struct vring_virtqueue *vvq;
> +	struct epf_virtio_device *const epf_vdev = vdev_to_epf_vdev(vdev);
> +
> +	for (i = 0; i < nvqs; i++) {
> +		if (!names[i]) {
> +			vqs[i] = NULL;
> +			continue;
> +		}
> +		vq = vring_create_virtqueue(queue_idx++,
> +					    EPF_VIRTIO_QUEUE_SIZE,
> +					    VIRTIO_PCI_VRING_ALIGN,
> +					    vdev,
> +					    true,
> +					    false,
> +					    ctx ? ctx[i] : false,
> +					    epf_virtio_local_notify,
> +					    callbacks[i],
> +					    names[i]);
> +		if (!vq)
> +			goto out_del_vqs;
> +		vqs[i] = vq;
> +		vvq = container_of(vq, struct vring_virtqueue, vq);
> +		epf_vdev->vrings[i] = &vvq->split.vring;
> +		vq->priv = epf_vdev;
> +	}
> +	return 0;
> +out_del_vqs:
> +	epf_virtio_local_del_vqs(vdev);
> +	return -ENOMEM;
> +}
> +
> +/*
> + * Get features advertised by the local virtio device.
> + *
> + * @vdev: local virtio device
> + *
> + * Returns a 64 bit integer representing the features advertised by the device.
> + */
> +static inline u64 epf_virtio_local_get_features(struct virtio_device *vdev)
> +{
> +	return vdev_to_epf_vdev(vdev)->local_cfg.dev_feature;
> +}
> +
> +/*
> + * Finalize features supported by both the local virtio device and the local
> + * virtio drivers.
> + *
> + * @vdev: local virtio device
> + *
> + * Always returns 0.
> + */
> +static int epf_virtio_local_finalize_features(struct virtio_device *vdev)
> +{
> +	struct epf_virtio_device *const epf_vdev = vdev_to_epf_vdev(vdev);
> +
> +	vring_transport_features(vdev);
> +	epf_vdev->local_cfg.drv_feature = vdev->features;
> +	return 0;
> +}
> +
> +/*
> + * Get the bus name of the local virtio device.
> + *
> + * @vdev: local virtio device
> + *
> + * Returns the local bus name. It will always be "epf_virtio_local_bus".
> + */
> +static inline const char *epf_virtio_local_bus_name(struct virtio_device *vdev)
> +{
> +	return "epf_virtio_local_bus";
> +}
> +
> +/* Dummpy function. msix is not enabled. */
> +static inline int
> +	epf_virtio_local_set_vq_affinity(struct virtqueue *vq,
> +					 const struct cpumask *cpu_mask)
> +{
> +	return 0;
> +}
> +
> +/* Dummpy function. msix is not enabled. */
> +static inline const struct cpumask *
> +	epf_virtio_local_get_vq_affinity(struct virtio_device *vdev,
> +					 int index)
> +{
> +	return NULL;
> +}
> +
> +/* This function table will be used by local virtio modules. */
> +static const struct virtio_config_ops epf_virtio_local_dev_config_ops = {
> +	.get = epf_virtio_local_get,
> +	.set = epf_virtio_local_set,
> +	.get_status = epf_virtio_local_get_status,
> +	.set_status = epf_virtio_local_set_status,
> +	.reset = epf_virtio_local_reset,
> +	.find_vqs = epf_virtio_local_find_vqs,
> +	.del_vqs = epf_virtio_local_del_vqs,
> +	.get_features = epf_virtio_local_get_features,
> +	.finalize_features = epf_virtio_local_finalize_features,
> +	.bus_name = epf_virtio_local_bus_name,
> +	.set_vq_affinity = epf_virtio_local_set_vq_affinity,
> +	.get_vq_affinity = epf_virtio_local_get_vq_affinity,
> +	.generation = epf_virtio_local_generation,
> +};
> +
> +/*
> + * Initializes the virtio_pci and virtio_net config space that will be exposed
> + * to the remote virtio_pci and virtio_net modules on the PCI host. This
> + * includes setting up feature negotiation and default config setup etc.
> + *
> + * @epf_virtio: epf_virtio handler
> + */
> +static void pci_epf_virtio_init_cfg_legacy(struct pci_epf_virtio *epf_virtio)
> +{
> +	const u32 dev_feature =
> +		generate_dev_feature32(features, ARRAY_SIZE(features));
> +	struct virtio_legacy_cfg *const legacy_cfg = epf_virtio->reg[BAR_0];
> +	/* msix is disabled */
> +	struct virtio_net_config *const net_cfg = (void *)legacy_cfg +
> +						  VIRTIO_PCI_CONFIG_OFF(0);
> +
> +	epf_virtio->legacy_cfg = legacy_cfg;
> +	epf_virtio->net_cfg = net_cfg;
> +
> +	/* virtio PCI legacy cfg */
> +	legacy_cfg->q_select = epf_cpu_to_virtio16(2);
> +	legacy_cfg->q_size = epf_cpu_to_virtio16(EPF_VIRTIO_QUEUE_SIZE);
> +	legacy_cfg->dev_feature = epf_cpu_to_virtio32(dev_feature);
> +	legacy_cfg->q_notify = epf_cpu_to_virtio16(2);
> +	legacy_cfg->isr_status = VIRTIO_PCI_ISR_HIGH;
> +
> +	/* virtio net specific cfg */
> +	net_cfg->max_virtqueue_pairs = (__force __u16)epf_cpu_to_virtio16(1);

You don't need this without VIRTIO_NET_F_MQ.

> +	memcpy(net_cfg->mac, host_mac, ETH_ALEN);
> +	dev_info(&epf_virtio->epf->dev,
> +		 "dev_feature is %#010x\n",
> +		 epf_virtio32_to_cpu(epf_virtio->legacy_cfg->dev_feature));
> +}
> +
> +/*
> + * Handles the actual transfer of data across PCI bus. Supports both read
> + * and write.
> + *
> + * @epf_virtio: epf_virtio handler
> + * @write: true for write from endpoint to host and false for read from host
> + *	   to endpoint
> + * @remote_addr: physical address on PCI host
> + * @buf: virtual address on PCI endpoint
> + * @len: size of data transfer in bytes
> + * @lhead: list head that links the cache of available maps
> + *
> + * Returns 0 on success and a negative error number on failure.
> + */
> +static int epf_virtio_rw(struct pci_epf_virtio *epf_virtio, bool write,
> +			 u64 remote_addr, void *buf, int len,
> +			 struct list_head *lhead)
> +{
> +#ifdef CONFIG_PCI_ENDPOINT_DMAENGINE
> +	int ret = 0;
> +	phys_addr_t src_addr;
> +	phys_addr_t dst_addr;
> +	struct device *const dma_dev = epf_virtio->epf->epc->dev.parent;
> +#endif
> +	struct device *const dev = &epf_virtio->epf->dev;
> +	struct pci_epf_map *const map = pci_epf_get_map(lhead,
> +							remote_addr,
> +							len);
> +	if (IS_ERR(map)) {
> +		dev_err(dev, "EPF map failed before io\n");
> +		return PTR_ERR(map);
> +	}
> +#ifdef CONFIG_PCI_ENDPOINT_DMAENGINE
> +	if (enable_dma) {
> +		if (write) {
> +			src_addr = dma_map_single(dma_dev,
> +						  buf,
> +						  len,
> +						  DMA_TO_DEVICE);
> +			if (dma_mapping_error(dma_dev,
> +					      src_addr)) {
> +				dev_err(dev,
> +					"Failed to map src buffer address\n");
> +				ret = -ENOMEM;
> +				goto out;
> +			}
> +			ret = pci_epf_tx(epf_virtio->epf,
> +					 map->phys_ioaddr,
> +					 src_addr,
> +					 len);
> +			dma_unmap_single(dma_dev,
> +					 src_addr,
> +					 len,
> +					 DMA_TO_DEVICE);
> +			if (ret)
> +				dev_err(dev, "DMA transfer failed\n");
> +		} else {
> +			dst_addr = dma_map_single(dma_dev,
> +						  buf,
> +						  len,
> +						  DMA_FROM_DEVICE);
> +			if (dma_mapping_error(dma_dev,
> +					      dst_addr)) {
> +				dev_err(dev,
> +					"Failed to map dst address\n");
> +				ret = -ENOMEM;
> +				goto out;
> +			}
> +			ret = pci_epf_tx(epf_virtio->epf,
> +					 dst_addr,
> +					 map->phys_ioaddr,
> +					 len);
> +			dma_unmap_single(dma_dev,
> +					 dst_addr,
> +					 len,
> +					 DMA_FROM_DEVICE);
> +			if (ret)
> +				dev_err(dev, "DMA transfer failed\n");
> +		}
> +	} else {
> +		if (write)
> +			memcpy_toio(map->ioaddr, buf, len);
> +		else
> +			memcpy_fromio(buf, map->ioaddr, len);
> +	}
> +	return 0;
> +out:
> +	pci_epf_unmap(map);
> +	return ret;
> +#else
> +	if (write)
> +		memcpy_toio(map->ioaddr, buf, len);
> +	else
> +		memcpy_fromio(buf, map->ioaddr, len);
> +	return 0;
> +#endif
> +}
> +
> +/*
> + * Free memory allocated on PCI endpoint that is used to store data
> + * about the vrings on PCI host.
> + *
> + * @epf_virtio: epf_virtio handler
> + * @n: number of vrings' information to be freed on PCI endpoint
> + */
> +static void free_vring_info(struct pci_epf_virtio *epf_virtio, int n)
> +{
> +	int i;
> +
> +	for (i = n; i >= 0; i--) {
> +		kfree(&epf_virtio->q_addrs[i]);
> +		kfree(&epf_virtio->q_pfns[i]);
> +		pci_epf_unmap(&epf_virtio->q_map[i]);
> +	}
> +}
> +
> +/*
> + * Allocate memory and store information about the vrings on PCI host.
> + * Information includes physical addresses of vrings and different members
> + * of those vrings.
> + *
> + * @epf_virtio: epf_virtio handler
> + *
> + * Returns 0 on success and a negative error number on failure.
> + */
> +static int store_host_vring(struct pci_epf_virtio *epf_virtio)
> +{
> +	struct pci_epf_map *map;
> +	int ret;
> +	int n;
> +	__virtio32 q_pfn;
> +	void __iomem *tmp_ptr;
> +
> +	for (n = 0; n < 2; n++) {
> +		map = &epf_virtio->q_map[n];
> +		/*
> +		 * The left shift is applied because virtio_pci_legacy
> +		 * applied the right shift first
> +		 */
> +		q_pfn = (__force __virtio32)atomic_read(&epf_virtio->q_pfns[n]);
> +		epf_virtio->q_addrs[n] = epf_virtio32_to_cpu(q_pfn);
> +		ret = pci_epf_map(map,
> +				  epf_virtio->q_addrs[n]
> +				  << VIRTIO_PCI_QUEUE_ADDR_SHIFT,
> +				  vring_size(EPF_VIRTIO_QUEUE_SIZE,
> +					     VIRTIO_PCI_VRING_ALIGN));
> +		if (ret) {
> +			dev_err(&epf_virtio->epf->dev,
> +				"EPF mapping error storing host ring%d\n",
> +				n);
> +			free_vring_info(epf_virtio, n - 1);
> +			return ret;
> +		}
> +		/* Store the remote vring addresses according to virtio-legacy*/
> +		epf_virtio->desc[n] = map->ioaddr;
> +		epf_virtio->avail[n] = map->ioaddr
> +				       + EPF_VIRTIO_QUEUE_SIZE
> +				       * sizeof(struct vring_desc);
> +		tmp_ptr = IO_MEMBER_ARR_ELEM_PTR(epf_virtio->avail[n],
> +						 struct vring_avail,
> +						 ring,
> +						 __virtio16,
> +						 EPF_VIRTIO_QUEUE_SIZE);
> +		epf_virtio->used[n] =
> +			(void __iomem *)(((uintptr_t)tmp_ptr
> +					  + sizeof(__virtio16)
> +					  + VIRTIO_PCI_VRING_ALIGN - 1)
> +					 & ~(VIRTIO_PCI_VRING_ALIGN - 1));
> +	}
> +	return 0;
> +}
> +
> +/*
> + * Catch notification sent by the PCI host to the PCI endpoint. This usually
> + * happens when the PCI host has provided a new available buffer and wants
> + * the PCI endpoint to process the new buffer. This function will set the
> + * pending bit atomically to 1. The transfer handler thread will then under-
> + * stand that there are more unprocessed buffers.
> + *
> + * @data: kthread context data. It is actually the epf_virtio handler.
> + *
> + * Always returns 0.
> + */
> +static int pci_epf_virtio_catch_notif(void *data)
> +{
> +	u16 changed;
> +#ifdef CONFIG_PCI_EPF_VIRTIO_SUPPRESS_NOTIFICATION
> +	void __iomem *avail_idx;
> +	u16 event;
> +#endif
> +
> +	register const __virtio16 default_notify = epf_cpu_to_virtio16(2);
> +
> +	struct pci_epf_virtio *const epf_virtio = data;
> +	atomic_t *const pending = epf_virtio->pending;
> +
> +	while (!kthread_should_stop()) {
> +		changed = epf_virtio16_to_cpu(epf_virtio->legacy_cfg->q_notify);
> +		if (changed != 2) {
> +			epf_virtio->legacy_cfg->q_notify = default_notify;
> +			/* The pci host has made changes to virtqueues */
> +			if (changed)
> +				atomic_cmpxchg(pending, 0, 1);
> +#ifdef CONFIG_PCI_EPF_VIRTIO_SUPPRESS_NOTIFICATION
> +			avail_idx = IO_MEMBER_PTR(epf_virtio->avail[changed],
> +						  struct vring_avail,
> +						  idx);
> +			event = epf_ioread16(avail_idx) + event_suppression;
> +			write_avail_event(epf_virtio->used[changed], event);
> +#endif
> +		}
> +		usleep_range(notif_poll_usec_min,
> +			     notif_poll_usec_max);
> +	}
> +	return 0;
> +}
> +
> +/*
> + * Transfer data from PCI host to PCI endpoint. Physical addresses of memory
> + * to read from are not passed in as parameters. Instead they are stored in
> + * the epf_virtio handler.
> + *
> + * @desc: local descriptor to store the data
> + * @epf_virtio: epf_virtio handler
> + * @cache_head: list head that links all the available maps
> + */
> +static void fill_ep_buf(struct vring_desc *desc,
> +			struct pci_epf_virtio *epf_virtio,
> +			struct list_head *cache_head)
> +{
> +	int ret;
> +	u64 local_addr;
> +	u16 flags;
> +	struct mem_frag *const hdr_frag = &epf_virtio->frags[0];
> +	struct mem_frag *const frag = &epf_virtio->frags[1];
> +	struct virtio_net_hdr *hdr;
> +	void *buf;
> +
> +	local_addr = local_virtio64_to_cpu(desc->addr);
> +	hdr = phys_to_virt((phys_addr_t)local_addr);
> +	ret = epf_virtio_rw(epf_virtio, false,
> +			    hdr_frag->addr, hdr,
> +			    hdr_frag->len, cache_head);
> +	if (ret)
> +		dev_err(&epf_virtio->epf->dev,
> +			"Read header failed\n");
> +	buf = (void *)hdr + hdr_frag->len;
> +	ret = epf_virtio_rw(epf_virtio, false, frag->addr, buf,
> +			    frag->len, cache_head);
> +	if (ret)
> +		dev_err(&epf_virtio->epf->dev,
> +			"Read data failed\n");
> +	flags = local_virtio16_to_cpu(desc->flags);
> +	desc->flags =
> +		local_cpu_to_virtio16(flags & ~(VRING_DESC_F_NEXT));
> +	desc->len = local_cpu_to_virtio32(frag->len + hdr_frag->len);
> +}
> +
> +/*
> + * Transfer data from PCI endpoint to PCI host. Physical addresses of local
> + * memory to write from are not passed in as parameters. Instead, they are
> + * stored in the epf_virtio_device in the epf_virtio handler.
> + *
> + * @desc: IO memory of the remote descriptor on PCI host to hold the data
> + * @epf_virtio: epf_virtio handler
> + * @cache_head: list head that links all the available maps
> + */
> +static void fill_host_buf(void __iomem *desc,
> +			  struct pci_epf_virtio *epf_virtio,
> +			  struct list_head *cache_head)
> +{
> +	int ret;
> +	u64 remote_addr;
> +	struct mem_frag *const hdr_frag =
> +		&epf_virtio->epf_vdev.local_frags[0];
> +	struct mem_frag *const frag = &epf_virtio->epf_vdev.local_frags[1];
> +	void __iomem *const flag_addr = IO_MEMBER_PTR(desc,
> +						      struct vring_desc,
> +						      flags);
> +	struct virtio_net_hdr *hdr;
> +	void *buf;
> +	u16 flags;
> +
> +	hdr = phys_to_virt((phys_addr_t)hdr_frag->addr);
> +	buf = phys_to_virt((phys_addr_t)frag->addr);
> +	remote_addr = epf_ioread64(IO_MEMBER_PTR(desc,
> +						 struct vring_desc,
> +						 addr));
> +	ret = epf_virtio_rw(epf_virtio, true, remote_addr, hdr,
> +			    hdr_frag->len, cache_head);
> +	if (ret)
> +		dev_err(&epf_virtio->epf->dev,
> +			"Write header failed\n");
> +
> +	remote_addr += hdr_frag->len;
> +	ret = epf_virtio_rw(epf_virtio, true, remote_addr, buf,
> +			    frag->len, cache_head);
> +	if (ret)
> +		dev_err(&epf_virtio->epf->dev,
> +			"write data failed\n");
> +	epf_iowrite32(frag->len + hdr_frag->len,
> +		      IO_MEMBER_PTR(desc,
> +				    struct vring_desc,
> +				    len));
> +	flags = epf_ioread16(flag_addr);
> +	epf_iowrite16(flags & ~(VRING_DESC_F_NEXT), flag_addr);
> +}
> +
> +/*
> + * Handle transfer from PCI host to PCI endpoint. This runs in a dedicated
> + * kernel thread infinitely unless the thread is stopped. This thread
> + * continuously polls for available buffers provided by PCI host and puts
> + * them in right places on PCI endpoint.
> + *
> + * @data: kthread context. Actually a epf_virtio handler.
> + *
> + * Always return 0. Only return when thread is stopped.
> + */
> +static int pci_epf_virtio_handle_tx(void *data)
> +{
> +	int i;
> +	u32 total_size;
> +	u16 idx_shadow;
> +	u16 local_idx_shadow;
> +#ifdef CONFIG_PCI_EPF_VIRTIO_SUPPRESS_NOTIFICATION
> +	u16 local_used_event;
> +	u16 used_event;
> +#endif
> +	u16 num_desc;
> +	__virtio16 desc_idx;
> +	u16 used_idx_modulo;
> +	u16 local_used_idx_modulo;
> +	u16 used_idx;
> +	u16 local_used_idx;
> +	struct mem_frag *remote_frag;
> +	void __iomem *desc;
> +	void __iomem *desc_next;
> +	void __iomem *avail_used_ptr;
> +	void __iomem *used_used_ptr;
> +	struct pci_epf_virtio *const epf_virtio = data;
> +	atomic_t *const pending = epf_virtio->pending;
> +	struct epf_virtio_device *const epf_vdev = &epf_virtio->epf_vdev;
> +	struct vring *const local_rx_vring = epf_vdev->vrings[0];
> +	struct vring_desc *const local_desc_head = local_rx_vring->desc;
> +	struct vring_desc *local_desc = local_desc_head;
> +	struct vring_used *const local_used = local_rx_vring->used;
> +	struct vring_avail *const local_avail = local_rx_vring->avail;
> +	struct pci_epf *epf = epf_virtio->epf;
> +	struct pci_epc *epc = epf->epc;
> +	void __iomem *const desc_head = epf_virtio->desc[1];
> +	void __iomem *const avail = epf_virtio->avail[1];
> +	void __iomem *const used = epf_virtio->used[1];
> +re_entry:
> +	if (kthread_should_stop())
> +		return 0;
> +#ifdef CONFIG_PCI_EPF_VIRTIO_SUPPRESS_NOTIFICATION
> +	local_used_event = 0;
> +	used_event = 0;
> +#endif
> +	num_desc = 0;
> +	used_idx = epf_ioread16(IO_MEMBER_PTR(used, struct vring_used, idx));
> +	local_used_idx = local_virtio16_to_cpu(local_used->idx);
> +	while (used_idx != epf_ioread16(IO_MEMBER_PTR(avail,
> +						      struct vring_avail,
> +						      idx))) {
> +		total_size = 0;
> +		used_idx_modulo = MODULO_QUEUE_SIZE(used_idx);
> +		local_used_idx_modulo = MODULO_QUEUE_SIZE(local_used_idx);
> +		avail_used_ptr = IO_MEMBER_ARR_ELEM_PTR(avail,
> +							struct vring_avail,
> +							ring,
> +							__virtio16,
> +							used_idx_modulo);
> +		used_used_ptr = IO_MEMBER_ARR_ELEM_PTR(used,
> +						       struct vring_used,
> +						       ring,
> +						       struct vring_used_elem,
> +						       used_idx_modulo);
> +		desc = IO_ARR_ELEM_PTR(desc_head,
> +				       struct vring_desc,
> +				       epf_ioread16(avail_used_ptr));
> +		for (i = 0; i < 2; i++) {
> +			remote_frag = &epf_virtio->frags[i];
> +			remote_frag->addr =
> +				epf_ioread64(IO_MEMBER_PTR(desc,
> +							   struct vring_desc,
> +							   addr));
> +			remote_frag->len =
> +				epf_ioread32(IO_MEMBER_PTR(desc,
> +							   struct vring_desc,
> +							   len));
> +			total_size += remote_frag->len;
> +			desc_next = IO_MEMBER_PTR(desc,
> +						  struct vring_desc,
> +						  next);
> +			desc = IO_ARR_ELEM_PTR(desc_head,
> +					       struct vring_desc,
> +					       epf_ioread16(desc_next));
> +		}
> +
> +		/* Copy content into local buffer from remote frags */
> +		desc_idx = local_avail->ring[local_used_idx_modulo];
> +		local_desc =
> +			&local_desc_head[local_virtio16_to_cpu(desc_idx)];
> +		fill_ep_buf(local_desc, epf_virtio, &epf_virtio->lru_head);
> +
> +		/* Update used rings for both sides */
> +		local_used->ring[local_used_idx_modulo].id =
> +			(__force __virtio32)desc_idx;
> +		local_used->ring[local_used_idx_modulo].len =
> +			local_cpu_to_virtio32(total_size);
> +		epf_iowrite32((u32)epf_ioread16(avail_used_ptr),
> +			      IO_MEMBER_PTR(used_used_ptr,
> +					    struct vring_used_elem,
> +					    id));
> +		epf_iowrite32(total_size,
> +			      IO_MEMBER_PTR(used_used_ptr,
> +					    struct vring_used_elem,
> +					    len));
> +#ifdef CONFIG_PCI_EPF_VIRTIO_SUPPRESS_NOTIFICATION
> +		/* Only update index after contents are updated */
> +		wmb();
> +		advance_idx_remote(IO_MEMBER_PTR(used,
> +						 struct vring_used,
> +						 idx),
> +				   &idx_shadow,
> +				   1);
> +		used_event = read_used_event(avail);
> +		advance_idx(&local_used->idx, &local_idx_shadow,
> +			    1);
> +		local_used_event = read_local_used_event(local_avail);
> +		/* Only signal after indices are updated */
> +		mb();
> +		if (local_idx_shadow == local_used_event + 1)
> +			epf_virtio_interrupt(local_rx_vring,
> +					     &epf_vdev->vdev.dev);
> +		if (idx_shadow == used_event + 1)
> +			pci_epc_raise_irq(epc,
> +					  epf->func_no,
> +					  PCI_EPC_IRQ_LEGACY,
> +					  0);
> +#endif
> +		local_used_idx++;
> +		used_idx++;
> +		num_desc++;
> +	}
> +#ifndef CONFIG_PCI_EPF_VIRTIO_SUPPRESS_NOTIFICATION
> +	if (num_desc) {
> +		/* Only update index after contents are updated */
> +		wmb();
> +		advance_idx_remote(IO_MEMBER_PTR(used, struct vring_used, idx),
> +				   &idx_shadow,
> +				   num_desc);
> +		advance_idx(&local_used->idx, &local_idx_shadow,
> +			    num_desc);
> +		/* Only signal after indices are updated */
> +		mb();
> +		if (likely(!(epf_ioread16(IO_MEMBER_PTR(avail,
> +							struct vring_avail,
> +							flags))
> +			     & VRING_AVAIL_F_NO_INTERRUPT)))
> +			pci_epc_raise_irq(epc,
> +					  epf->func_no,
> +					  PCI_EPC_IRQ_LEGACY,
> +					  0);
> +		if (likely(!(local_virtio16_to_cpu(local_avail->flags)
> +			     & VRING_AVAIL_F_NO_INTERRUPT)))
> +			epf_virtio_interrupt(local_rx_vring,
> +					     &epf_vdev->vdev.dev);
> +	}
> +#endif
> +	if (!atomic_xchg(pending, 0))
> +		usleep_range(check_queues_usec_min,
> +			     check_queues_usec_max);

What's the usleep hackery doing? Set it too low and you
waste cycles. Set it too high and your latency suffers.
It would be nicer to just use a completion or something like this.

> +	goto re_entry;
> +}
> +
> +/*
> + * Handle transfer from PCI endpoint to PCI host and run in a dedicated kernel
> + * thread. This function does not need to poll for notifications sent by the
> + * local virtio driver modules. Instead the local virtio modules will call
> + * exactly functions in this file, which will directly set up transfer envi-
> + * ronments.
> + *
> + * @data: kthread context. Actually a epf_virtio handler.
> + *
> + * Always return 0. Only return when the kernel thread is stopped.
> + */
> +static int pci_epf_virtio_local_handle_tx(void *data)
> +{
> +	int i;
> +	u32 total_size;
> +	struct vring_desc *desc;
> +	u16 idx_shadow;
> +	u16 local_idx_shadow;
> +	u16 used_idx_modulo;
> +	u16 host_used_idx_modulo;
> +	u16 used_idx;
> +	__virtio16 desc_idx;
> +#ifdef CONFIG_PCI_EPF_VIRTIO_SUPPRESS_NOTIFICATION
> +	u16 host_used_event;
> +	u16 used_event;
> +#endif
> +	u16 num_desc;
> +	u16 host_used_idx;
> +	void __iomem *avail_used_ptr;
> +	void __iomem *used_used_ptr;
> +	struct mem_frag *local_frag;
> +	struct pci_epf_virtio *const epf_virtio = data;
> +	struct epf_virtio_device *const epf_vdev = &epf_virtio->epf_vdev;
> +	struct pci_epf *const epf = epf_virtio->epf;
> +	struct pci_epc *const epc = epf->epc;
> +	void __iomem *const host_desc_head = epf_virtio->desc[0];
> +	void __iomem *host_desc = host_desc_head;
> +	void __iomem *const host_avail = epf_virtio->avail[0];
> +	void __iomem *const host_used = epf_virtio->used[0];
> +	struct vring *const vr = epf_vdev->vrings[1];
> +	struct vring_desc *const desc_head = vr->desc;
> +	struct vring_used *const used = vr->used;
> +	struct vring_avail *const avail = vr->avail;
> +	atomic_t *const local_pending = epf_vdev->local_pending;
> +re_entry:
> +	if (kthread_should_stop())
> +		return 0;
> +#ifdef CONFIG_PCI_EPF_VIRTIO_SUPPRESS_NOTIFICATION
> +	host_used_event = 0;
> +	used_event = 0;
> +#endif
> +	num_desc = 0;
> +	used_idx = local_virtio16_to_cpu(used->idx);
> +	host_used_idx = epf_ioread16(IO_MEMBER_PTR(host_used,
> +						   struct vring_used,
> +						   idx));
> +	while (used_idx != local_virtio16_to_cpu(avail->idx)) {
> +		total_size = 0;
> +		used_idx_modulo = MODULO_QUEUE_SIZE(used_idx);
> +		host_used_idx_modulo = MODULO_QUEUE_SIZE(host_used_idx);
> +		desc_idx = avail->ring[used_idx_modulo];
> +		desc = &desc_head[local_virtio16_to_cpu(desc_idx)];
> +		avail_used_ptr = IO_MEMBER_ARR_ELEM_PTR(host_avail,
> +							struct vring_avail,
> +							ring,
> +							__virtio16,
> +							host_used_idx_modulo);
> +		used_used_ptr = IO_MEMBER_ARR_ELEM_PTR(host_used,
> +						       struct vring_used,
> +						       ring,
> +						       struct vring_used_elem,
> +						       host_used_idx_modulo);
> +		for (i = 0; i < 2; i++) {
> +			/* Only allocate if there is none available */
> +			local_frag = &epf_vdev->local_frags[i];
> +			local_frag->addr = local_virtio64_to_cpu(desc->addr);
> +			local_frag->len = local_virtio32_to_cpu(desc->len);
> +			total_size += local_virtio32_to_cpu(desc->len);
> +			desc = &desc_head[local_virtio16_to_cpu(desc->next)];
> +		}
> +
> +		host_desc = IO_ARR_ELEM_PTR(host_desc_head,
> +					    struct vring_desc,
> +					    epf_ioread16(avail_used_ptr));
> +		fill_host_buf(host_desc, epf_virtio, &epf_vdev->local_lru_head);
> +
> +		/* Update used rings for both sides */
> +		epf_iowrite32((u32)epf_ioread16(avail_used_ptr),
> +			      IO_MEMBER_PTR(used_used_ptr,
> +					    struct vring_used_elem,
> +					    id));
> +		epf_iowrite32(total_size,
> +			      IO_MEMBER_PTR(used_used_ptr,
> +					    struct vring_used_elem,
> +					    len));
> +		used->ring[used_idx_modulo].id =
> +			(__force __virtio32)avail->ring[used_idx_modulo];
> +		used->ring[used_idx_modulo].len =
> +			local_cpu_to_virtio32(total_size);
> +#ifdef CONFIG_PCI_EPF_VIRTIO_SUPPRESS_NOTIFICATION
> +		/* Only update index after contents are updated */
> +		wmb();
> +		advance_idx_remote(IO_MEMBER_PTR(host_used,
> +						 struct vring_used,
> +						 idx),
> +				   &idx_shadow,
> +				   1);
> +		advance_idx(&used->idx, &local_idx_shadow, 1);
> +		host_used_event = read_used_event(host_avail);
> +		used_event = read_local_used_event(avail);
> +		/* Only signal after indices are updated */
> +		mb();
> +		if (local_idx_shadow == used_event + 1)
> +			epf_virtio_interrupt(vr, &epf_vdev->vdev.dev);
> +		if (idx_shadow == host_used_event + 1)
> +			pci_epc_raise_irq(epc,
> +					  epf->func_no,
> +					  PCI_EPC_IRQ_LEGACY,
> +					  0);
> +#endif
> +		host_used_idx++;
> +		used_idx++;
> +		num_desc++;
> +	}
> +#ifndef CONFIG_PCI_EPF_VIRTIO_SUPPRESS_NOTIFICATION
> +	if (num_desc) {
> +		/* Only update index after contents are updated */
> +		wmb();
> +		advance_idx_remote(IO_MEMBER_PTR(host_used,
> +						 struct vring_used,
> +						 idx),
> +				   &idx_shadow,
> +				   num_desc);
> +		advance_idx(&used->idx, &local_idx_shadow, num_desc);
> +		/* Only signal after indices are updated */
> +		mb();
> +		if (likely(!(epf_ioread16(IO_MEMBER_PTR(host_avail,
> +							struct vring_avail,
> +							flags))
> +			     & VRING_AVAIL_F_NO_INTERRUPT)))
> +			pci_epc_raise_irq(epc,
> +					  epf->func_no,
> +					  PCI_EPC_IRQ_LEGACY,
> +					  0);
> +		if (likely(!(local_virtio16_to_cpu(avail->flags)
> +			     & VRING_AVAIL_F_NO_INTERRUPT)))
> +			epf_virtio_interrupt(vr, &epf_vdev->vdev.dev);
> +	}
> +#endif
> +	if (!atomic_xchg(local_pending, 0))
> +		usleep_range(check_queues_usec_min,
> +			     check_queues_usec_max);
> +	goto re_entry;
> +}
> +
> +/*
> + * This function terminates early setup work and initializes variables
> + * for data transfer between the local vrings on PCI endpoint and remote vrings
> + * on PCI host. The initialization work includes storing information of
> + * physicaly addresses of remote vrings and starting two kernel threads
> + * that handle transfer between PCI host and endpoint. Some polling thread
> + * for notification from PCI host will also be set up.
> + *
> + * @epf_virtio: epf_virtio handler
> + *
> + * Return 0 on success and a negative error number on failure.
> + */
> +static int terminate_early_work(struct pci_epf_virtio *epf_virtio)
> +{
> +	int ret;
> +	struct net_device *netdev;
> +	struct epf_virtio_device *const epf_vdev = &epf_virtio->epf_vdev;
> +
> +	ret = store_host_vring(epf_virtio);
> +	if (ret) {
> +		dev_err(&epf_virtio->epf->dev,
> +			"Failed to store addresses of host vrings, abort\n");
> +		return ret;
> +	}
> +	ret = register_virtio_device(&epf_vdev->vdev);
> +	if (ret) {
> +		dev_err(&epf_vdev->vdev.dev,
> +			"local virtio device register failure\n");
> +		free_vring_info(epf_virtio, 2);
> +		return ret;
> +	}
> +	epf_vdev->registered = true;
> +	dev_info(&epf_vdev->vdev.dev,
> +		 "local_dev_feature is %#018llx\n",
> +		 epf_vdev->local_cfg.drv_feature);
> +	netdev = ((struct virtnet_info *)epf_vdev->vdev.priv)->dev;
> +	while (!(READ_ONCE(netdev->flags) & IFF_UP))
> +		schedule();
> +	epf_virtio->pending = kmalloc(sizeof(*epf_virtio->pending), GFP_KERNEL);
> +	epf_vdev->local_pending = kmalloc(sizeof(*epf_vdev->local_pending),
> +					  GFP_KERNEL);
> +	atomic_set(epf_virtio->pending, 0);
> +	atomic_set(epf_vdev->local_pending, 0);
> +	epf_virtio->catch_notif = kthread_run(pci_epf_virtio_catch_notif,
> +					      epf_virtio,
> +					      "catch host notification");
> +	if (!epf_virtio->catch_notif) {
> +		dev_err(&epf_virtio->epf->dev,
> +			"Failed to start thread for host notif\n");
> +		goto thread_alloc_err;
> +	}
> +	epf_virtio->handle_vq = kthread_run(pci_epf_virtio_handle_tx,
> +					    epf_virtio,
> +					    "host to ep transfer");
> +	if (!epf_virtio->handle_vq) {
> +		dev_err(&epf_virtio->epf->dev,
> +			"Failed to start thread for host to ep transfer\n");
> +		kthread_stop(epf_virtio->catch_notif);
> +		goto thread_alloc_err;
> +	}
> +	epf_vdev->local_handle_vq = kthread_run(pci_epf_virtio_local_handle_tx,
> +						epf_virtio,
> +						"endpoint to host transfer");
> +	if (!epf_vdev->local_handle_vq) {
> +		dev_err(&epf_vdev->vdev.dev,
> +			"Failed to start thread for ep to host transfer\n");
> +		kthread_stop(epf_virtio->catch_notif);
> +		kthread_stop(epf_virtio->handle_vq);
> +		goto thread_alloc_err;
> +	}
> +	return 0;
> +
> +thread_alloc_err:
> +	kfree(epf_virtio->pending);
> +	kfree(epf_vdev->local_pending);
> +	free_vring_info(epf_virtio, 2);
> +	return -ENOMEM;
> +}
> +
> +/*
> + * This function mostly runs in a high-priority real-time thread and attempts
> + * to store vring page frame numbers written by the PCI host's virtio_pci to
> + * BAR 0 of the PCI device. The PCI host usually has faster cores and will not
> + * wait for the PCI endpoint to respond. Therefore the PCI endpoint has to run
> + * in a tight loop to catch up with PCI host. Note that if this thread blocks,
> + * the whole kernel will hang.
> + *
> + * @data: kthread context. Actually epf_virtio handler.
> + *
> + * Return 0 on success and a negative error number on failure.
> + */
> +static int pci_epf_virtio_queue_cfg(void *data)
> +{
> +	int ret;
> +	struct pci_epf_virtio *const epf_virtio = data;
> +	__virtio16 *const q_select = &epf_virtio->legacy_cfg->q_select;
> +	atomic_t *const q_addr_atomic =
> +		(__force atomic_t *)&epf_virtio->legacy_cfg->q_addr;
> +	atomic_t *const rx_pfn = &epf_virtio->q_pfns[0];
> +	atomic_t *const tx_pfn = &epf_virtio->q_pfns[1];
> +
> +	register u32 val;
> +
> +	register const __virtio16 q_default = epf_cpu_to_virtio16(2);
> +
> +	while (READ_ONCE(*q_select) == q_default)
> +		DO_NOTHING
> +	while (!(val = atomic_xchg(q_addr_atomic, 0)))
> +		DO_NOTHING
> +	atomic_xchg(rx_pfn, val);
> +	while (!(val = atomic_xchg(q_addr_atomic, 0)))
> +		DO_NOTHING
> +	atomic_xchg(tx_pfn, val);
> +	sched_setscheduler_nocheck(epf_virtio->early_task,
> +				   SCHED_NORMAL,
> +				   &normal_param);
> +	ret = terminate_early_work(epf_virtio);
> +	if (ret) {
> +		dev_err(&epf_virtio->epf->dev,
> +			"Failed to terminate early work\n");
> +		return ret;
> +	}
> +	return 0;
> +}
> +
> +/*
> + * Get called when the PCIe endpoint controller start the link. Allocate memory
> + * and initialize variables that will be used by the virtual network devices.
> + *
> + * @epf: epf handler
> + */
> +static void pci_epf_virtio_linkup(struct pci_epf *epf)
> +{
> +	int i;
> +	struct pci_epf_map *map;
> +	struct pci_epf_map *local_map;
> +	struct pci_epf_virtio *const epf_virtio = epf_get_drvdata(epf);
> +	const struct pci_epc_features *const features =
> +		epf_virtio->epc_features;
> +	const size_t align =
> +		(features && features->align) ? features->align : PAGE_SIZE;
> +
> +	pci_epf_map_init(&epf_virtio->q_map[0], epf, align);
> +	pci_epf_map_init(&epf_virtio->q_map[1], epf, align);
> +	epf_virtio->map_slab = kmem_cache_create("map slab",
> +						 sizeof(struct pci_epf_map),
> +						 0,
> +						 SLAB_HWCACHE_ALIGN,
> +						 NULL);
> +	if (!epf_virtio->map_slab) {
> +		dev_err(&epf_virtio->epf->dev,
> +			"Map slab allocation failed\n");
> +		return;
> +	}
> +	epf_virtio->epf_vdev.local_map_slab =
> +		kmem_cache_create("local map slab",
> +				  sizeof(struct pci_epf_map),
> +				  0,
> +				  SLAB_HWCACHE_ALIGN,
> +				  NULL);
> +	if (!epf_virtio->epf_vdev.local_map_slab) {
> +		dev_err(&epf_virtio->epf_vdev.vdev.dev,
> +			"Local map slab allocation failed\n");
> +		return;
> +	}
> +	INIT_LIST_HEAD(&epf_virtio->lru_head);
> +	INIT_LIST_HEAD(&epf_virtio->epf_vdev.local_lru_head);
> +	for (i = 0; i < MAP_CACHE_SIZE; i++) {
> +		map = kmem_cache_alloc(epf_virtio->map_slab,
> +				       GFP_KERNEL);
> +		if (!map) {
> +			dev_err(&epf_virtio->epf->dev,
> +				"Map %d allocation failed\n", i);
> +			return;
> +		}
> +		local_map =
> +			kmem_cache_alloc(epf_virtio->epf_vdev.local_map_slab,
> +					 GFP_KERNEL);
> +		if (!local_map) {
> +			dev_err(&epf_virtio->epf_vdev.vdev.dev,
> +				"Local map %d allocation failed\n", i);
> +			return;
> +		}
> +
> +		pci_epf_map_init(map, epf, align);
> +		list_add(&map->node, &epf_virtio->lru_head);
> +
> +		pci_epf_map_init(local_map, epf, align);
> +		list_add(&local_map->node,
> +			 &epf_virtio->epf_vdev.local_lru_head);
> +	}
> +	pci_epf_virtio_init_cfg_legacy(epf_virtio);
> +	epf_virtio->early_task = kthread_create(pci_epf_virtio_queue_cfg,
> +						epf_virtio,
> +						"early task");
> +	if (IS_ERR(epf_virtio->early_task)) {
> +		dev_err(&epf_virtio->epf->dev,
> +			"Thread creation error\n");
> +		return;
> +	}
> +	if (!epf_virtio->early_task) {
> +		dev_err(&epf_virtio->epf->dev,
> +			"No memory to allocate thread for early setup work\n");
> +		return;
> +	}
> +	/*
> +	 * TODO: find a better alternative than this.
> +	 * This gives the early task the highest priority and the scheduler
> +	 * will not be able to detect stalls on this thread. The kernel will not
> +	 * be able to recover from this thread if there is only one core
> +	 */
> +	sched_setscheduler_nocheck(epf_virtio->early_task,
> +				   SCHED_FIFO,
> +				   &high_rt);
> +	wake_up_process(epf_virtio->early_task);
> +}
> +
> +/*
> + * Get called when the endpoint function device is unbound from the PCIe
> + * endpoint controller. Free memory and stop continuously running kernel
> + * threads.
> + *
> + * @epf: epf handler
> + */
> +static void pci_epf_virtio_unbind(struct pci_epf *epf)
> +{
> +	struct pci_epf_virtio *epf_virtio = epf_get_drvdata(epf);
> +	struct pci_epc *epc = epf->epc;
> +	struct pci_epf_bar *epf_bar;
> +	int bar;
> +
> +	if (epf_virtio->catch_notif && kthread_stop(epf_virtio->catch_notif))
> +		dev_info(&epf_virtio->epf->dev,
> +			 "Never started catching host notification\n");
> +	if (epf_virtio->handle_vq && kthread_stop(epf_virtio->handle_vq))
> +		dev_info(&epf_virtio->epf->dev,
> +			 "Never starteding host to endpoint transfer\n");
> +	if (epf_virtio->epf_vdev.local_handle_vq &&
> +	    kthread_stop(epf_virtio->epf_vdev.local_handle_vq))
> +		dev_info(&epf_virtio->epf_vdev.vdev.dev,
> +			 "Never started endpoint to host transfer\n");
> +	if (epf_virtio->epf_vdev.registered)
> +		unregister_virtio_device(&epf_virtio->epf_vdev.vdev);
> +	pci_epf_unmap(&epf_virtio->q_map[0]);
> +	pci_epf_unmap(&epf_virtio->q_map[1]);
> +	if (epf_virtio->map_slab) {
> +		pci_epf_free_map_cache(&epf_virtio->lru_head,
> +				       epf_virtio->map_slab);
> +		kmem_cache_destroy(epf_virtio->map_slab);
> +	}
> +	if (epf_virtio->epf_vdev.local_map_slab) {
> +		pci_epf_free_map_cache(&epf_virtio->epf_vdev.local_lru_head,
> +				       epf_virtio->epf_vdev.local_map_slab);
> +		kmem_cache_destroy(epf_virtio->epf_vdev.local_map_slab);
> +	}
> +	kfree(epf_virtio->q_pfns);
> +	kfree(epf_virtio->q_addrs);
> +	kfree(epf_virtio->pending);
> +	kfree(epf_virtio->epf_vdev.local_pending);
> +	pci_epc_stop(epc);
> +	for (bar = BAR_0; bar <= BAR_5; bar++) {
> +		epf_bar = &epf->bar[bar];
> +		if (epf_virtio->reg[bar]) {
> +			pci_epc_clear_bar(epc, epf->func_no, epf_bar);
> +			pci_epf_free_space(epf, epf_virtio->reg[bar], bar);
> +		}
> +	}
> +}
> +
> +/*
> + * Set BAR 0 to BAR 5 of the PCI endpoint device.
> + *
> + * @epf: epf handler
> + *
> + * Return 0 on success and a negative error number on failure.
> + */
> +static int pci_epf_virtio_set_bar(struct pci_epf *epf)
> +{
> +	int bar, add;
> +	int ret;
> +	struct pci_epf_bar *epf_bar;
> +	struct pci_epc *epc = epf->epc;
> +	struct device *dev = &epf->dev;
> +	struct pci_epf_virtio *epf_virtio = epf_get_drvdata(epf);
> +	enum pci_barno virtio_reg_bar = epf_virtio->virtio_reg_bar;
> +	const struct pci_epc_features *epc_features;
> +
> +	epc_features = epf_virtio->epc_features;
> +
> +	for (bar = BAR_0; bar <= BAR_5; bar += add) {
> +		epf_bar = &epf->bar[bar];
> +		/*
> +		 * pci_epc_set_bar() sets PCI_BASE_ADDRESS_MEM_TYPE_64
> +		 * if the specific implementation required a 64-bit BAR,
> +		 * even if we only requested a 32-bit BAR.
> +		 */
> +		add = (epf_bar->flags & PCI_BASE_ADDRESS_MEM_TYPE_64) ? 2 : 1;
> +
> +		if (!!(epc_features->reserved_bar & (1 << bar)))
> +			continue;
> +
> +		ret = pci_epc_set_bar(epc, epf->func_no, epf_bar);
> +		if (ret) {
> +			pci_epf_free_space(epf, epf_virtio->reg[bar], bar);
> +			dev_err(dev, "Failed to set BAR%d\n", bar);
> +			if (bar == virtio_reg_bar)
> +				return ret;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +/*
> + * Allocate space on BAR 0 for negotiating features and important information
> + * with virtio_pci on the PCI host side.
> + *
> + * @epf: epf handler
> + *
> + * Return 0 on success and a negative error number on failure.
> + */
> +static int pci_epf_virtio_alloc_space(struct pci_epf *epf)
> +{
> +	struct pci_epf_virtio *epf_virtio = epf_get_drvdata(epf);
> +	struct device *dev = &epf->dev;
> +	struct pci_epf_bar *epf_bar;
> +	void *base;
> +	int bar, add;
> +	enum pci_barno virtio_reg_bar = epf_virtio->virtio_reg_bar;
> +	const struct pci_epc_features *epc_features;
> +	size_t virtio_reg_size;
> +
> +	epc_features = epf_virtio->epc_features;
> +
> +	if (epc_features->bar_fixed_size[virtio_reg_bar])
> +		virtio_reg_size = bar_size[virtio_reg_bar];
> +	else
> +		virtio_reg_size = sizeof(struct virtio_legacy_cfg) +
> +				  sizeof(struct virtio_net_config);
> +
> +	base = pci_epf_alloc_space(epf, virtio_reg_size,
> +				   virtio_reg_bar, epc_features->align);
> +	if (!base) {
> +		dev_err(dev, "Failed to allocated register space\n");
> +		return -ENOMEM;
> +	}
> +	epf_virtio->reg[virtio_reg_bar] = base;
> +
> +	for (bar = BAR_0; bar <= BAR_5; bar += add) {
> +		epf_bar = &epf->bar[bar];
> +		add = (epf_bar->flags & PCI_BASE_ADDRESS_MEM_TYPE_64) ? 2 : 1;
> +
> +		if (bar == virtio_reg_bar)
> +			continue;
> +
> +		if (!!(epc_features->reserved_bar & (1 << bar)))
> +			continue;
> +
> +		base = pci_epf_alloc_space(epf, bar_size[bar], bar,
> +					   epc_features->align);
> +		if (!base)
> +			dev_err(dev, "Failed to allocate space for BAR%d\n",
> +				bar);
> +		epf_virtio->reg[bar] = base;
> +	}
> +
> +	return 0;
> +}
> +
> +/*
> + * Configure BAR of PCI endpoint device.
> + *
> + * @epf: epf handler
> + * @epc_features: set by vendor-specific epc features
> + */
> +static void pci_epf_configure_bar(struct pci_epf *epf,
> +				  const struct pci_epc_features *epc_features)
> +{
> +	struct pci_epf_bar *epf_bar;
> +	bool bar_fixed_64bit;
> +	int i;
> +
> +	for (i = BAR_0; i <= BAR_5; i++) {
> +		epf_bar = &epf->bar[i];
> +		bar_fixed_64bit = !!(epc_features->bar_fixed_64bit & (1 << i));
> +		if (bar_fixed_64bit)
> +			epf_bar->flags |= PCI_BASE_ADDRESS_MEM_TYPE_64;
> +		if (epc_features->bar_fixed_size[i])
> +			bar_size[i] = epc_features->bar_fixed_size[i];
> +	}
> +}
> +
> +/*
> + * Bind endpoint function device to PCI endpoint controller.
> + *
> + * @epf: epf hanlder
> + *
> + * Return 0 on success and a negative error number on failure.
> + */
> +static int pci_epf_virtio_bind(struct pci_epf *epf)
> +{
> +	int ret;
> +	struct pci_epf_virtio *epf_virtio = epf_get_drvdata(epf);
> +	struct pci_epf_header *header = epf->header;
> +	const struct pci_epc_features *epc_features;
> +	enum pci_barno virtio_reg_bar = BAR_0;
> +	struct pci_epc *epc = epf->epc;
> +	struct device *dev = &epf->dev;
> +	bool msix_capable = false;
> +	bool msi_capable = true;
> +
> +	if (WARN_ON_ONCE(!epc))
> +		return -EINVAL;
> +
> +	epc_features = pci_epc_get_features(epc, epf->func_no);
> +	if (epc_features) {
> +		msix_capable = epc_features->msix_capable;
> +		msi_capable = epc_features->msi_capable;
> +		virtio_reg_bar = pci_epc_get_first_free_bar(epc_features);
> +		pci_epf_configure_bar(epf, epc_features);
> +	}
> +
> +	epf_virtio->virtio_reg_bar = virtio_reg_bar;
> +	epf_virtio->epc_features = epc_features;
> +
> +	ret = pci_epc_write_header(epc, epf->func_no, header);
> +	if (ret) {
> +		dev_err(dev, "Configuration header write failed\n");
> +		return ret;
> +	}
> +
> +	ret = pci_epf_virtio_alloc_space(epf);
> +	if (ret)
> +		return ret;
> +
> +	ret = pci_epf_virtio_set_bar(epf);
> +	if (ret)
> +		return ret;
> +
> +	if (msi_capable) {
> +		ret = pci_epc_set_msi(epc, epf->func_no, epf->msi_interrupts);
> +		if (ret) {
> +			dev_err(dev, "MSI configuration failed\n");
> +			return ret;
> +		}
> +	}
> +
> +	if (msix_capable) {
> +		ret = pci_epc_set_msix(epc, epf->func_no, epf->msix_interrupts);
> +		if (ret) {
> +			dev_err(dev, "MSI-X configuration failed\n");
> +			return ret;
> +		}
> +	}
> +	return 0;
> +}
> +
> +/*
> + * Destroy the virtual device associated with the local virtio device.
> + *
> + * @dev: a device handler to the virtual device
> + */
> +static inline void pci_epf_virtio_release(struct device *dev)
> +{
> +	memset(dev, 0, sizeof(*dev));
> +}
> +
> +/*
> + * Initialize the local epf_virtio_device. This local epf_virtio_device
> + * contains important information other than the virtio_device as required
> + * by the local virtio modules on the PCI endpoint. The fields of
> + * epf_virtio_device mostly mirror those of pci_epf_virtio. They are
> + * conceptual counterparts. pci_epf_virtio serves the remote PCI host,
> + * while epf_virtio_device serves the local PCI endpoint.
> + *
> + * @epf_virtio: epf_virtio handler
> + *
> + * Return 0 on success and a negative error number on failure.
> + */
> +static int init_local_epf_vdev(struct pci_epf_virtio *epf_virtio)
> +{
> +	struct epf_virtio_device *const epf_vdev = &epf_virtio->epf_vdev;
> +
> +	epf_vdev->vdev.dev.parent = &epf_virtio->epf->dev;
> +	epf_vdev->vdev.id.vendor = virtio_header.subsys_vendor_id;
> +	epf_vdev->vdev.id.device = virtio_header.subsys_id;
> +	epf_vdev->vdev.config = &epf_virtio_local_dev_config_ops;
> +	epf_vdev->vdev.dev.release = pci_epf_virtio_release;
> +	epf_vdev->local_cfg.dev_feature =
> +		generate_local_dev_feature64(local_features,
> +					     ARRAY_SIZE(local_features));
> +	epf_vdev->local_net_cfg.max_virtqueue_pairs = 1;
> +	epf_vdev->registered = false;
> +	memcpy(epf_vdev->local_net_cfg.mac, local_mac, ETH_ALEN);
> +	return 0;
> +}
> +
> +/*
> + * Endpoint function driver's probe function. This will get called
> + * when an endpoint function device is created by the user in userspace
> + * after kernel bootup with config filesystem.
> + *
> + * @epf: epf handler
> + *
> + * Return 0 on success and a negative error number on failure.
> + */
> +static int pci_epf_virtio_probe(struct pci_epf *epf)
> +{
> +	int ret;
> +	struct pci_epf_virtio *epf_virtio;
> +	struct device *dev = &epf->dev;
> +
> +	epf_virtio = devm_kzalloc(dev, sizeof(*epf_virtio), GFP_KERNEL);
> +	if (!epf_virtio)
> +		return -ENOMEM;
> +	epf->header = &virtio_header;
> +	epf_virtio->epf = epf;
> +	ret = init_local_epf_vdev(epf_virtio);
> +	if (ret) {
> +		dev_err(&epf_virtio->epf_vdev.vdev.dev,
> +			"Failed to initialize local virtio device\n");
> +		devm_kfree(dev, epf_virtio);
> +		return ret;
> +	}
> +	epf_virtio->q_pfns = kcalloc(2,
> +				     sizeof(*epf_virtio->q_pfns),
> +				     GFP_KERNEL);
> +	epf_virtio->q_addrs = kcalloc(2,
> +				      sizeof(*epf_virtio->q_addrs),
> +				      GFP_KERNEL);
> +	atomic_set(&epf_virtio->q_pfns[0], 0);
> +	atomic_set(&epf_virtio->q_pfns[1], 0);
> +	epf_set_drvdata(epf, epf_virtio);
> +	return 0;
> +}
> +
> +/* This function table is used by pci_epf_core. */
> +static struct pci_epf_ops ops = {
> +	.unbind	= pci_epf_virtio_unbind,
> +	.bind	= pci_epf_virtio_bind,
> +	.linkup = pci_epf_virtio_linkup,
> +};
> +
> +/* This function table is used by virtio.c on PCI endpoint */
> +static struct pci_epf_driver virtio_driver = {
> +	.driver.name	= "pci_epf_virtio",
> +	.probe		= pci_epf_virtio_probe,
> +	.id_table	= pci_epf_virtio_ids,
> +	.ops		= &ops,
> +	.owner		= THIS_MODULE,
> +};
> +
> +static int __init pci_epf_virtio_init(void)
> +{
> +	int ret;
> +
> +	ret = pci_epf_register_driver(&virtio_driver);
> +	if (ret) {
> +		pr_err("Failed to register pci epf virtio driver --> %d\n",
> +		       ret);
> +		return ret;
> +	}
> +
> +	return 0;
> +}
> +module_init(pci_epf_virtio_init);
> +
> +static void __exit pci_epf_virtio_exit(void)
> +{
> +	pci_epf_unregister_driver(&virtio_driver);
> +}
> +module_exit(pci_epf_virtio_exit);
> +
> +MODULE_DESCRIPTION("PCI EPF VIRTIO DRIVER");
> +MODULE_AUTHOR("Haotian Wang <haotian.wang@sifive.com, haotian.wang@duke.edu>");
> +MODULE_LICENSE("GPL v2");
> diff --git a/include/linux/pci-epf-virtio.h b/include/linux/pci-epf-virtio.h
> new file mode 100644
> index 000000000000..d68e8d0f570c
> --- /dev/null
> +++ b/include/linux/pci-epf-virtio.h
> @@ -0,0 +1,253 @@
> +/* SPDX-License-Identifier: GPL-2.0*/
> +#ifndef PCI_EPF_VIRTIO_H
> +#define PCI_EPF_VIRTIO_H
> +
> +#define VIRTIO_DEVICE_ID		(0x1000)
> +#define VIRTIO_NET_SUBSYS_ID		1
> +
> +#define EPF_VIRTIO_QUEUE_SIZE_SHIFT	5
> +#define EPF_VIRTIO_QUEUE_SIZE		BIT(EPF_VIRTIO_QUEUE_SIZE_SHIFT)
> +#define MAP_CACHE_SIZE			5
> +#define CATCH_NOTIFY_USEC_MIN		10
> +#define CATCH_NOTIFY_USEC_MAX		20
> +#define CHECK_QUEUES_USEC_MIN		100
> +#define CHECK_QUEUES_USEC_MAX		200
> +#ifdef CONFIG_PCI_EPF_VIRTIO_SUPPRESS_NOTIFICATION
> +#define EVENT_SUPPRESSION		3
> +#endif
> +#ifdef CONFIG_PCI_ENDPOINT_DMAENGINE
> +#define ENABLE_DMA			0
> +#endif
> +
> +#define VIRTIO_PCI_ISR_HIGH		1
> +
> +#define vdev_to_epf_vdev(vdev_ptr)		\
> +	container_of(vdev_ptr,			\
> +		     struct epf_virtio_device,	\
> +		     vdev)
> +
> +#define MODULO_QUEUE_SIZE(x)		((x) & (EPF_VIRTIO_QUEUE_SIZE - 1))
> +
> +/* These macros are used because structs are on PCI host */
> +#define IO_MEMBER_PTR(base_ptr, type, member)				\
> +	((base_ptr) + offsetof(type, member))
> +
> +#define IO_MEMBER_ARR_ELEM_PTR(base_ptr,			\
> +			       type,				\
> +			       member,				\
> +			       member_type,			\
> +			       index)				\
> +	(							\
> +		(base_ptr) + offsetof(type, member) +		\
> +		(index) * sizeof(member_type)			\
> +	)
> +
> +#define IO_ARR_ELEM_PTR(base_ptr, type, index)				\
> +	((base_ptr) + (index) * sizeof(type))
> +
> +#define DO_NOTHING {}
> +
> +static const u8 host_mac[ETH_ALEN] = { 2, 2, 2, 2, 2, 2 };
> +
> +static const u8 local_mac[ETH_ALEN] = { 4, 4, 4, 4, 4, 4 };
> +
> +static const struct sched_param high_rt = {
> +	.sched_priority = MAX_RT_PRIO - 1
> +};
> +
> +static const struct sched_param normal_param = {
> +	.sched_priority = 0
> +};
> +
> +static const unsigned int features[] = {
> +	VIRTIO_NET_F_MAC,
> +#ifdef CONFIG_PCI_EPF_VIRTIO_SUPPRESS_NOTIFICATION
> +	VIRTIO_RING_F_EVENT_IDX,
> +#endif
> +	VIRTIO_NET_F_GUEST_CSUM,
> +};
> +
> +static const unsigned int local_features[] = {
> +	VIRTIO_NET_F_MAC,
> +#ifdef CONFIG_PCI_EPF_VIRTIO_SUPPRESS_NOTIFICATION
> +	VIRTIO_RING_F_EVENT_IDX,
> +#endif
> +	VIRTIO_NET_F_GUEST_CSUM,
> +};
> +
> +static const struct pci_epf_device_id pci_epf_virtio_ids[] = {
> +	{
> +		.name = "pci_epf_virtio",
> +	},
> +	{},
> +};
> +
> +struct pci_epf_map {
> +	size_t iosize;
> +	size_t	align;
> +	void __iomem *ioaddr;
> +	void __iomem *iobase;
> +	phys_addr_t phys_ioaddr;
> +	phys_addr_t phys_iobase;
> +	u64 prev_host_base;
> +	struct pci_epf *epf;
> +	struct pci_epc *epc;
> +	struct list_head node;
> +};
> +
> +struct virtio_legacy_cfg {
> +	__virtio32	dev_feature;
> +	__virtio32	drv_feature;
> +	__virtio32	q_addr;
> +	__virtio16	q_size;
> +	__virtio16	q_select;
> +	__virtio16	q_notify;
> +	u8		dev_status;
> +	u8		isr_status;
> +} __packed;
> +
> +struct virtio_local_cfg {
> +	u64	dev_feature;
> +	u64	drv_feature;
> +	u8	dev_status;
> +};
> +
> +struct mem_frag {
> +	u64	addr;
> +	u32	len;
> +};
> +
> +struct epf_virtio_device {
> +	struct virtio_device		vdev;
> +	struct virtio_local_cfg		local_cfg;
> +	struct virtio_net_config	local_net_cfg;
> +	struct vring			*vrings[2];
> +	struct task_struct		*local_handle_vq;
> +	struct mem_frag			local_frags[2];
> +	struct kmem_cache		*local_map_slab;
> +	struct list_head		local_lru_head;
> +	bool				registered;
> +	atomic_t			*local_pending;
> +};
> +
> +struct pci_epf_virtio {
> +	void			*reg[6];
> +	atomic_t		*pending;
> +	atomic_t		*q_pfns;
> +	u64			*q_addrs;
> +	struct mem_frag		frags[2];
> +	struct pci_epf_map	q_map[2];
> +	void __iomem		*desc[2];
> +	void __iomem		*avail[2];
> +	void __iomem		*used[2];
> +	struct pci_epf		*epf;
> +	enum pci_barno		virtio_reg_bar;
> +	struct kmem_cache	*map_slab;
> +	struct list_head	lru_head;
> +	struct task_struct	*early_task;
> +	struct task_struct	*catch_notif;
> +	struct task_struct	*handle_vq;
> +	struct epf_virtio_device	epf_vdev;
> +	struct virtio_legacy_cfg	*legacy_cfg;
> +	struct virtio_net_config	*net_cfg;
> +	const struct pci_epc_features	*epc_features;
> +};
> +
> +struct vring_desc_state_split {
> +	void *data;			/* Data for callback. */
> +	struct vring_desc *indir_desc;	/* Indirect descriptor, if any. */
> +};
> +
> +struct vring_desc_state_packed {
> +	void *data;			/* Data for callback. */
> +	struct vring_packed_desc *indir_desc; /* Indirect descriptor, if any. */
> +	u16 num;			/* Descriptor list length. */
> +	u16 next;			/* The next desc state in a list. */
> +	u16 last;			/* The last desc state in a list. */
> +};
> +
> +struct vring_desc_extra_packed {
> +	dma_addr_t addr;		/* Buffer DMA addr. */
> +	u32 len;			/* Buffer length. */
> +	u16 flags;			/* Descriptor flags. */
> +};
> +
> +struct vring_virtqueue {
> +	struct virtqueue vq;
> +	bool packed_ring;
> +	bool use_dma_api;
> +	bool weak_barriers;
> +	bool broken;
> +	bool indirect;
> +	bool event;
> +	unsigned int free_head;
> +	unsigned int num_added;
> +	u16 last_used_idx;
> +	union {
> +		struct {
> +			struct vring vring;
> +			u16 avail_flags_shadow;
> +			u16 avail_idx_shadow;
> +			struct vring_desc_state_split *desc_state;
> +			dma_addr_t queue_dma_addr;
> +			size_t queue_size_in_bytes;
> +		} split;
> +		struct {
> +			struct {
> +				unsigned int num;
> +				struct vring_packed_desc *desc;
> +				struct vring_packed_desc_event *driver;
> +				struct vring_packed_desc_event *device;
> +			} vring;
> +			bool avail_wrap_counter;
> +			bool used_wrap_counter;
> +			u16 avail_used_flags;
> +			u16 next_avail_idx;
> +			u16 event_flags_shadow;
> +			struct vring_desc_state_packed *desc_state;
> +			struct vring_desc_extra_packed *desc_extra;
> +			dma_addr_t ring_dma_addr;
> +			dma_addr_t driver_event_dma_addr;
> +			dma_addr_t device_event_dma_addr;
> +			size_t ring_size_in_bytes;
> +			size_t event_size_in_bytes;
> +		} packed;
> +	};
> +	bool (*notify)(struct virtqueue *vq);
> +	bool we_own_ring;
> +#ifdef DEBUG
> +	unsigned int in_use;
> +	bool last_add_time_valid;
> +	ktime_t last_add_time;
> +#endif
> +};
> +
> +struct virtnet_info {
> +	struct virtio_device *vdev;
> +	struct virtqueue *cvq;
> +	struct net_device *dev;
> +	struct send_queue *sq;
> +	struct receive_queue *rq;
> +	unsigned int status;
> +	u16 max_queue_pairs;
> +	u16 curr_queue_pairs;
> +	u16 xdp_queue_pairs;
> +	bool big_packets;
> +	bool mergeable_rx_bufs;
> +	bool has_cvq;
> +	bool any_header_sg;
> +	u8 hdr_len;
> +	struct delayed_work refill;
> +	struct work_struct config_work;
> +	bool affinity_hint_set;
> +	struct hlist_node node;
> +	struct hlist_node node_dead;
> +	struct control_buf *ctrl;
> +	u8 duplex;
> +	u32 speed;
> +	unsigned long guest_offloads;
> +	unsigned long guest_offloads_capable;
> +	struct failover *failover;
> +};
> +
> +#endif
> -- 
> 2.20.1

  parent reply	other threads:[~2019-09-03  6:25 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-23 21:31 [PATCH] pci: endpoint: functions: Add a virtnet EP function Haotian Wang
2019-08-26 10:51 ` Kishon Vijay Abraham I
2019-08-26 21:59   ` Haotian Wang
2019-08-27  8:12     ` Kishon Vijay Abraham I
2019-08-27 18:01       ` Haotian Wang
2019-08-30  6:11 ` Jason Wang
2019-08-30 23:06   ` Haotian Wang
2019-09-02  3:50     ` Jason Wang
2019-09-02 20:05       ` Haotian Wang
2019-09-03 10:42         ` Jason Wang
2019-09-04  0:55           ` Haotian Wang
2019-09-04 21:58           ` Haotian Wang
2019-09-05  2:56             ` Jason Wang
2019-09-05  3:28               ` Haotian Wang
2019-11-25 12:49               ` Kishon Vijay Abraham I
2019-11-26  9:58                 ` Jason Wang
2019-11-26 12:35                   ` Kishon Vijay Abraham I
2019-11-26 21:55                     ` Alan Mikhak
2019-11-26 22:01                       ` Alan Mikhak
2019-11-27  3:04                         ` Jason Wang
2019-09-03  6:25 ` Michael S. Tsirkin [this message]
2019-09-03 20:39   ` Haotian Wang
2019-09-05  7:07     ` Michael S. Tsirkin
2019-09-05 16:15       ` Haotian Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190903021403-mutt-send-email-mst@kernel.org \
    --to=mst@redhat.com \
    --cc=bhelgaas@google.com \
    --cc=haotian.wang@duke.edu \
    --cc=haotian.wang@sifive.com \
    --cc=jasowang@redhat.com \
    --cc=kishon@ti.com \
    --cc=linux-pci@vger.kernel.org \
    --cc=lorenzo.pieralisi@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).