* pull request: sfc-next 2012-02-16
From: Ben Hutchings @ 2012-02-16  0:42 UTC
  To: David Miller; +Cc: netdev, linux-net-drivers, Shradha Shah


The following changes since commit eb40d89276705a35c9a2e05793ae63411ae357eb:

  mlx4: add unicast steering entries to resource_tracker (2012-02-14 14:11:59 -0500)

are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfc-next.git for-davem

(commit cd2d5b529cdb9bd274f3e4bc68d37d4d63b7f383)

The main change here is the addition of SR-IOV support.

Ben.

Ben Hutchings (18):
      sfc: Skip RX end-of-batch work on channels without an RX queue
      sfc: Do not retry hardware probe if it schedules a reset
      sfc: Replace some literal constants with EFX_PAGE_SIZE/EFX_BUF_SIZE
      sfc: Warn if unable to create MTDs
      sfc: Add support for configuring RX unicast/multicast default filters
      sfc: Add support for TX MAC filters
      sfc: Correct MAC filter bitfield definitions
      sfc: Generalise driver event generation
      sfc: Generate RX fill events based on RX queues, not channels
      sfc: Leave interrupts and event queues enabled whenever we can
      sfc: Use proper function to test for RX channel in efx_poll()
      sfc: Generalise event generation to cover VF-owned event queues
      sfc: Make buffer table indices and counts consistently unsigned
      sfc: Make all CPU/IRQ/channel/queue counts unsigned
      sfc: Add support for 'extra' channel types
      sfc: Pass NIC structure into efx_wanted_parallelism()
      sfc: Allocate SRAM between buffer table and descriptor caches at init time
      sfc: Add SR-IOV back-end support for SFC9000 family

Steve Hodgson (1):
      sfc: Disable flow control during flushes

 drivers/net/ethernet/sfc/Kconfig       |    8 +
 drivers/net/ethernet/sfc/Makefile      |    1 +
 drivers/net/ethernet/sfc/efx.c         |  685 ++++++++------
 drivers/net/ethernet/sfc/efx.h         |    1 +
 drivers/net/ethernet/sfc/ethtool.c     |   62 +-
 drivers/net/ethernet/sfc/falcon.c      |   12 +-
 drivers/net/ethernet/sfc/filter.c      |  255 +++++-
 drivers/net/ethernet/sfc/filter.h      |   20 +-
 drivers/net/ethernet/sfc/mcdi.c        |   34 +
 drivers/net/ethernet/sfc/mcdi.h        |    2 +
 drivers/net/ethernet/sfc/mcdi_mac.c    |    4 +-
 drivers/net/ethernet/sfc/mtd.c         |    2 +-
 drivers/net/ethernet/sfc/net_driver.h  |  123 ++-
 drivers/net/ethernet/sfc/nic.c         |  524 ++++++-----
 drivers/net/ethernet/sfc/nic.h         |  102 ++-
 drivers/net/ethernet/sfc/regs.h        |   20 +-
 drivers/net/ethernet/sfc/rx.c          |    7 +-
 drivers/net/ethernet/sfc/siena.c       |   14 +-
 drivers/net/ethernet/sfc/siena_sriov.c | 1642 ++++++++++++++++++++++++++++++++
 drivers/net/ethernet/sfc/tx.c          |    2 +-
 drivers/net/ethernet/sfc/vfdi.h        |  254 +++++
 21 files changed, 3170 insertions(+), 604 deletions(-)
 create mode 100644 drivers/net/ethernet/sfc/siena_sriov.c
 create mode 100644 drivers/net/ethernet/sfc/vfdi.h

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.





* [PATCH net-next 01/19] sfc: Skip RX end-of-batch work on channels without an RX queue
From: Ben Hutchings @ 2012-02-16  0:44 UTC
  To: David Miller; +Cc: netdev, linux-net-drivers

The code in efx_process_channel() to update the RX queue after each
batch of RX completions works out as a no-op on a TX-only channel
where the RX queue structure is set to all-zeroes, but
(1) efx_channel_get_rx_queue() will BUG() if DEBUG is defined, and
(2) it's a waste of time.
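
For reference, the accessor in question looks roughly like the
following sketch (paraphrased from net_driver.h of this era); under
DEBUG, EFX_BUG_ON_PARANOID() expands to BUG_ON(), which is point (1):

    static inline struct efx_rx_queue *
    efx_channel_get_rx_queue(struct efx_channel *channel)
    {
            EFX_BUG_ON_PARANOID(!efx_channel_has_rx_queue(channel));
            return &channel->rx_queue;
    }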

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/ethernet/sfc/efx.c |   21 +++++++++++----------
 1 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index 952d0bf..b7cf9f0 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -224,19 +224,20 @@ static int efx_process_channel(struct efx_channel *channel, int budget)
 		return 0;
 
 	spent = efx_nic_process_eventq(channel, budget);
-	if (spent == 0)
-		return 0;
+	if (spent && efx_channel_has_rx_queue(channel)) {
+		struct efx_rx_queue *rx_queue =
+			efx_channel_get_rx_queue(channel);
+
+		/* Deliver last RX packet. */
+		if (channel->rx_pkt) {
+			__efx_rx_packet(channel, channel->rx_pkt);
+			channel->rx_pkt = NULL;
+		}
 
-	/* Deliver last RX packet. */
-	if (channel->rx_pkt) {
-		__efx_rx_packet(channel, channel->rx_pkt);
-		channel->rx_pkt = NULL;
+		efx_rx_strategy(channel);
+		efx_fast_push_rx_descriptors(rx_queue);
 	}
 
-	efx_rx_strategy(channel);
-
-	efx_fast_push_rx_descriptors(efx_channel_get_rx_queue(channel));
-
 	return spent;
 }
 
-- 
1.7.7.6



-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* [PATCH net-next 02/19] sfc: Do not retry hardware probe if it schedules a reset
From: Ben Hutchings @ 2012-02-16  0:44 UTC
  To: David Miller; +Cc: netdev, linux-net-drivers

If efx_pci_probe_main() schedules an INVISIBLE or ALL reset (but
nothing more drastic), we retry it up to 5 times.  So far as I'm
aware, this was a workaround for bugs in Falcon A0 which were fixed
in production silicon.  Remove the retry.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/ethernet/sfc/efx.c |   50 ++++++++++++---------------------------
 1 files changed, 16 insertions(+), 34 deletions(-)

diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index b7cf9f0..2e6389c 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -2440,7 +2440,7 @@ static int __devinit efx_pci_probe(struct pci_dev *pci_dev,
 	const struct efx_nic_type *type = (const struct efx_nic_type *) entry->driver_data;
 	struct net_device *net_dev;
 	struct efx_nic *efx;
-	int i, rc;
+	int rc;
 
 	/* Allocate and initialise a struct net_device and struct efx_nic */
 	net_dev = alloc_etherdev_mqs(sizeof(*efx), EFX_MAX_CORE_TX_QUEUES,
@@ -2473,39 +2473,22 @@ static int __devinit efx_pci_probe(struct pci_dev *pci_dev,
 	if (rc)
 		goto fail2;
 
-	/* No serialisation is required with the reset path because
-	 * we're in STATE_INIT. */
-	for (i = 0; i < 5; i++) {
-		rc = efx_pci_probe_main(efx);
-
-		/* Serialise against efx_reset(). No more resets will be
-		 * scheduled since efx_stop_all() has been called, and we
-		 * have not and never have been registered with either
-		 * the rtnetlink or driverlink layers. */
-		cancel_work_sync(&efx->reset_work);
-
-		if (rc == 0) {
-			if (efx->reset_pending) {
-				/* If there was a scheduled reset during
-				 * probe, the NIC is probably hosed anyway */
-				efx_pci_remove_main(efx);
-				rc = -EIO;
-			} else {
-				break;
-			}
-		}
+	rc = efx_pci_probe_main(efx);
 
-		/* Retry if a recoverably reset event has been scheduled */
-		if (efx->reset_pending &
-		    ~(1 << RESET_TYPE_INVISIBLE | 1 << RESET_TYPE_ALL) ||
-		    !efx->reset_pending)
-			goto fail3;
+	/* Serialise against efx_reset(). No more resets will be
+	 * scheduled since efx_stop_all() has been called, and we have
+	 * not and never have been registered.
+	 */
+	cancel_work_sync(&efx->reset_work);
 
-		efx->reset_pending = 0;
-	}
+	if (rc)
+		goto fail3;
 
-	if (rc) {
-		netif_err(efx, probe, efx->net_dev, "Could not reset NIC\n");
+	/* If there was a scheduled reset during probe, the NIC is
+	 * probably hosed anyway.
+	 */
+	if (efx->reset_pending) {
+		rc = -EIO;
 		goto fail4;
 	}
 
@@ -2515,7 +2498,7 @@ static int __devinit efx_pci_probe(struct pci_dev *pci_dev,
 
 	rc = efx_register_netdev(efx);
 	if (rc)
-		goto fail5;
+		goto fail4;
 
 	netif_dbg(efx, probe, efx->net_dev, "initialisation successful\n");
 
@@ -2524,9 +2507,8 @@ static int __devinit efx_pci_probe(struct pci_dev *pci_dev,
 	rtnl_unlock();
 	return 0;
 
- fail5:
-	efx_pci_remove_main(efx);
  fail4:
+	efx_pci_remove_main(efx);
  fail3:
 	efx_fini_io(efx);
  fail2:
-- 
1.7.7.6



-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* [PATCH net-next 03/19] sfc: Replace some literal constants with EFX_PAGE_SIZE/EFX_BUF_SIZE
From: Ben Hutchings @ 2012-02-16  0:45 UTC
  To: David Miller; +Cc: netdev, linux-net-drivers

The 'page size' for PCIe DMA, i.e. the alignment of boundaries at
which DMA must be broken, is 4KB.  Name this value as EFX_PAGE_SIZE
and use it in efx_max_tx_len().  Redefine EFX_BUF_SIZE as
EFX_PAGE_SIZE since its value is also a result of that requirement,
and use it in efx_init_special_buffer().
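
As a worked example of the boundary arithmetic this introduces in
efx_max_tx_len() (addresses illustrative):

    /* (~dma_addr & (EFX_PAGE_SIZE - 1)) + 1 is the byte count from
     * dma_addr up to the next 4KB boundary:
     *
     *   dma_addr = 0x12340f80: ~dma_addr & 0xfff = 0x07f -> len = 0x080
     *   dma_addr = 0x12341000: ~dma_addr & 0xfff = 0xfff -> len = 0x1000
     *
     * so no single TX descriptor ever crosses a 4KB DMA boundary.
     */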

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/ethernet/sfc/nic.c |    5 +----
 drivers/net/ethernet/sfc/nic.h |    5 +++++
 drivers/net/ethernet/sfc/tx.c  |    2 +-
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/sfc/nic.c b/drivers/net/ethernet/sfc/nic.c
index a43d1ca..dd50c4f 100644
--- a/drivers/net/ethernet/sfc/nic.c
+++ b/drivers/net/ethernet/sfc/nic.c
@@ -54,9 +54,6 @@
 #define EFX_FLUSH_INTERVAL 10
 #define EFX_FLUSH_POLL_COUNT 100
 
-/* Size and alignment of special buffers (4KB) */
-#define EFX_BUF_SIZE 4096
-
 /* Depth of RX flush request fifo */
 #define EFX_RX_FLUSH_COUNT 4
 
@@ -196,7 +193,7 @@ efx_init_special_buffer(struct efx_nic *efx, struct efx_special_buffer *buffer)
 	/* Write buffer descriptors to NIC */
 	for (i = 0; i < buffer->entries; i++) {
 		index = buffer->index + i;
-		dma_addr = buffer->dma_addr + (i * 4096);
+		dma_addr = buffer->dma_addr + (i * EFX_BUF_SIZE);
 		netif_dbg(efx, probe, efx->net_dev,
 			  "mapping special buffer %d at %llx\n",
 			  index, (unsigned long long)dma_addr);
diff --git a/drivers/net/ethernet/sfc/nic.h b/drivers/net/ethernet/sfc/nic.h
index 905a187..4f9d18a 100644
--- a/drivers/net/ethernet/sfc/nic.h
+++ b/drivers/net/ethernet/sfc/nic.h
@@ -65,6 +65,11 @@ enum {
 #define FALCON_GMAC_LOOPBACKS			\
 	(1 << LOOPBACK_GMAC)
 
+/* Alignment of PCIe DMA boundaries (4KB) */
+#define EFX_PAGE_SIZE	4096
+/* Size and alignment of buffer table entries (same) */
+#define EFX_BUF_SIZE	EFX_PAGE_SIZE
+
 /**
  * struct falcon_board_type - board operations and type information
  * @id: Board type id, as found in NVRAM
diff --git a/drivers/net/ethernet/sfc/tx.c b/drivers/net/ethernet/sfc/tx.c
index 5cb81fa..a096e28 100644
--- a/drivers/net/ethernet/sfc/tx.c
+++ b/drivers/net/ethernet/sfc/tx.c
@@ -110,7 +110,7 @@ efx_max_tx_len(struct efx_nic *efx, dma_addr_t dma_addr)
 	 * little benefit from using descriptors that cross those
 	 * boundaries and we keep things simple by not doing so.
 	 */
-	unsigned len = (~dma_addr & 0xfff) + 1;
+	unsigned len = (~dma_addr & (EFX_PAGE_SIZE - 1)) + 1;
 
 	/* Work around hardware bug for unaligned buffers. */
 	if (EFX_WORKAROUND_5391(efx) && (dma_addr & 0xf))
-- 
1.7.7.6



-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* [PATCH net-next 04/19] sfc: Warn if unable to create MTDs
From: Ben Hutchings @ 2012-02-16  0:45 UTC
  To: David Miller; +Cc: netdev, linux-net-drivers, Shradha Shah

Log an explicit warning if we are unable to create MTDs for a net
device.  Also correct the comment about why mtd_device_register() may
fail; there is no longer an MTD table to fill up.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/ethernet/sfc/efx.c |    7 ++++++-
 drivers/net/ethernet/sfc/mtd.c |    2 +-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index 2e6389c..ce07aa5 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -2502,9 +2502,14 @@ static int __devinit efx_pci_probe(struct pci_dev *pci_dev,
 
 	netif_dbg(efx, probe, efx->net_dev, "initialisation successful\n");
 
+	/* Try to create MTDs, but allow this to fail */
 	rtnl_lock();
-	efx_mtd_probe(efx); /* allowed to fail */
+	rc = efx_mtd_probe(efx);
 	rtnl_unlock();
+	if (rc)
+		netif_warn(efx, probe, efx->net_dev,
+			   "failed to create MTDs (%d)\n", rc);
+
 	return 0;
 
  fail4:
diff --git a/drivers/net/ethernet/sfc/mtd.c b/drivers/net/ethernet/sfc/mtd.c
index 79c1922..26b3c23 100644
--- a/drivers/net/ethernet/sfc/mtd.c
+++ b/drivers/net/ethernet/sfc/mtd.c
@@ -280,7 +280,7 @@ fail:
 		--part;
 		efx_mtd_remove_partition(part);
 	}
-	/* mtd_device_register() returns 1 if the MTD table is full */
+	/* Failure is unlikely here, but probably means we're out of memory */
 	return -ENOMEM;
 }
 
-- 
1.7.7.6



-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* [PATCH net-next 05/19] sfc: Add support for configuring RX unicast/multicast default filters
From: Ben Hutchings @ 2012-02-16  0:46 UTC
  To: David Miller; +Cc: netdev, linux-net-drivers

On Siena, all received packets that don't match a more specific filter
will match the unicast or multicast default filter.  Currently we
leave these set to the default values (RSS with base queue number of
0).  Allow them to be reconfigured to select a single RX queue.

These default filters are programmed through the FILTER_CTL register,
but we represent them internally as an additional table of size 2.
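
For illustration, steering all otherwise-unmatched multicast to a
single RX queue from inside the driver would go along these lines (a
minimal sketch: 'efx' and 'rxq_id' are assumed to come from the
caller, and error handling is omitted):

    struct efx_filter_spec spec;

    efx_filter_init_rx(&spec, EFX_FILTER_PRI_MANUAL, 0, rxq_id);
    efx_filter_set_mc_def(&spec);       /* otherwise-unmatched multicast */
    efx_filter_insert_filter(efx, &spec, true);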

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/ethernet/sfc/ethtool.c |   59 ++++++++---
 drivers/net/ethernet/sfc/filter.c  |  196 ++++++++++++++++++++++++++++++------
 drivers/net/ethernet/sfc/filter.h  |    6 +
 3 files changed, 214 insertions(+), 47 deletions(-)

diff --git a/drivers/net/ethernet/sfc/ethtool.c b/drivers/net/ethernet/sfc/ethtool.c
index f887f65..8319115 100644
--- a/drivers/net/ethernet/sfc/ethtool.c
+++ b/drivers/net/ethernet/sfc/ethtool.c
@@ -808,11 +808,16 @@ static int efx_ethtool_reset(struct net_device *net_dev, u32 *flags)
 	return efx_reset(efx, rc);
 }
 
+/* MAC address mask including only MC flag */
+static const u8 mac_addr_mc_mask[ETH_ALEN] = { 0x01, 0, 0, 0, 0, 0 };
+
 static int efx_ethtool_get_class_rule(struct efx_nic *efx,
 				      struct ethtool_rx_flow_spec *rule)
 {
 	struct ethtool_tcpip4_spec *ip_entry = &rule->h_u.tcp_ip4_spec;
 	struct ethtool_tcpip4_spec *ip_mask = &rule->m_u.tcp_ip4_spec;
+	struct ethhdr *mac_entry = &rule->h_u.ether_spec;
+	struct ethhdr *mac_mask = &rule->m_u.ether_spec;
 	struct efx_filter_spec spec;
 	u16 vid;
 	u8 proto;
@@ -828,11 +833,18 @@ static int efx_ethtool_get_class_rule(struct efx_nic *efx,
 	else
 		rule->ring_cookie = spec.dmaq_id;
 
-	rc = efx_filter_get_eth_local(&spec, &vid,
-				      rule->h_u.ether_spec.h_dest);
+	if (spec.type == EFX_FILTER_MC_DEF || spec.type == EFX_FILTER_UC_DEF) {
+		rule->flow_type = ETHER_FLOW;
+		memcpy(mac_mask->h_dest, mac_addr_mc_mask, ETH_ALEN);
+		if (spec.type == EFX_FILTER_MC_DEF)
+			memcpy(mac_entry->h_dest, mac_addr_mc_mask, ETH_ALEN);
+		return 0;
+	}
+
+	rc = efx_filter_get_eth_local(&spec, &vid, mac_entry->h_dest);
 	if (rc == 0) {
 		rule->flow_type = ETHER_FLOW;
-		memset(rule->m_u.ether_spec.h_dest, ~0, ETH_ALEN);
+		memset(mac_mask->h_dest, ~0, ETH_ALEN);
 		if (vid != EFX_FILTER_VID_UNSPEC) {
 			rule->flow_type |= FLOW_EXT;
 			rule->h_ext.vlan_tci = htons(vid);
@@ -1001,27 +1013,40 @@ static int efx_ethtool_set_class_rule(struct efx_nic *efx,
 	}
 
 	case ETHER_FLOW | FLOW_EXT:
-		/* Must match all or none of VID */
-		if (rule->m_ext.vlan_tci != htons(0xfff) &&
-		    rule->m_ext.vlan_tci != 0)
-			return -EINVAL;
-	case ETHER_FLOW:
-		/* Must match all of destination */
-		if (!is_broadcast_ether_addr(mac_mask->h_dest))
-			return -EINVAL;
-		/* and nothing else */
+	case ETHER_FLOW: {
+		u16 vlan_tag_mask = (rule->flow_type & FLOW_EXT ?
+				     ntohs(rule->m_ext.vlan_tci) : 0);
+
+		/* Must not match on source address or Ethertype */
 		if (!is_zero_ether_addr(mac_mask->h_source) ||
 		    mac_mask->h_proto)
 			return -EINVAL;
 
-		rc = efx_filter_set_eth_local(
-			&spec,
-			(rule->flow_type & FLOW_EXT && rule->m_ext.vlan_tci) ?
-			ntohs(rule->h_ext.vlan_tci) : EFX_FILTER_VID_UNSPEC,
-			mac_entry->h_dest);
+		/* Is it a default UC or MC filter? */
+		if (!compare_ether_addr(mac_mask->h_dest, mac_addr_mc_mask) &&
+		    vlan_tag_mask == 0) {
+			if (is_multicast_ether_addr(mac_entry->h_dest))
+				rc = efx_filter_set_mc_def(&spec);
+			else
+				rc = efx_filter_set_uc_def(&spec);
+		}
+		/* Otherwise, it must match all of destination and all
+		 * or none of VID.
+		 */
+		else if (is_broadcast_ether_addr(mac_mask->h_dest) &&
+			 (vlan_tag_mask == 0xfff || vlan_tag_mask == 0)) {
+			rc = efx_filter_set_eth_local(
+				&spec,
+				vlan_tag_mask ?
+				ntohs(rule->h_ext.vlan_tci) : EFX_FILTER_VID_UNSPEC,
+				mac_entry->h_dest);
+		} else {
+			rc = -EINVAL;
+		}
 		if (rc)
 			return rc;
 		break;
+	}
 
 	default:
 		return -EINVAL;
diff --git a/drivers/net/ethernet/sfc/filter.c b/drivers/net/ethernet/sfc/filter.c
index 1fbbbee..92517ea 100644
--- a/drivers/net/ethernet/sfc/filter.c
+++ b/drivers/net/ethernet/sfc/filter.c
@@ -35,9 +35,16 @@
 enum efx_filter_table_id {
 	EFX_FILTER_TABLE_RX_IP = 0,
 	EFX_FILTER_TABLE_RX_MAC,
+	EFX_FILTER_TABLE_RX_DEF,
 	EFX_FILTER_TABLE_COUNT,
 };
 
+enum efx_filter_index {
+	EFX_FILTER_INDEX_UC_DEF,
+	EFX_FILTER_INDEX_MC_DEF,
+	EFX_FILTER_SIZE_RX_DEF,
+};
+
 struct efx_filter_table {
 	enum efx_filter_table_id id;
 	u32		offset;		/* address of table relative to BAR */
@@ -109,7 +116,7 @@ static void efx_filter_table_reset_search_depth(struct efx_filter_table *table)
 	memset(table->search_depth, 0, sizeof(table->search_depth));
 }
 
-static void efx_filter_push_rx_limits(struct efx_nic *efx)
+static void efx_filter_push_rx_config(struct efx_nic *efx)
 {
 	struct efx_filter_state *state = efx->filter_state;
 	struct efx_filter_table *table;
@@ -143,6 +150,32 @@ static void efx_filter_push_rx_limits(struct efx_nic *efx)
 			FILTER_CTL_SRCH_FUDGE_WILD);
 	}
 
+	table = &state->table[EFX_FILTER_TABLE_RX_DEF];
+	if (table->size) {
+		EFX_SET_OWORD_FIELD(
+			filter_ctl, FRF_CZ_UNICAST_NOMATCH_Q_ID,
+			table->spec[EFX_FILTER_INDEX_UC_DEF].dmaq_id);
+		EFX_SET_OWORD_FIELD(
+			filter_ctl, FRF_CZ_UNICAST_NOMATCH_RSS_ENABLED,
+			!!(table->spec[EFX_FILTER_INDEX_UC_DEF].flags &
+			   EFX_FILTER_FLAG_RX_RSS));
+		EFX_SET_OWORD_FIELD(
+			filter_ctl, FRF_CZ_UNICAST_NOMATCH_IP_OVERRIDE,
+			!!(table->spec[EFX_FILTER_INDEX_UC_DEF].flags &
+			   EFX_FILTER_FLAG_RX_OVERRIDE_IP));
+		EFX_SET_OWORD_FIELD(
+			filter_ctl, FRF_CZ_MULTICAST_NOMATCH_Q_ID,
+			table->spec[EFX_FILTER_INDEX_MC_DEF].dmaq_id);
+		EFX_SET_OWORD_FIELD(
+			filter_ctl, FRF_CZ_MULTICAST_NOMATCH_RSS_ENABLED,
+			!!(table->spec[EFX_FILTER_INDEX_MC_DEF].flags &
+			   EFX_FILTER_FLAG_RX_RSS));
+		EFX_SET_OWORD_FIELD(
+			filter_ctl, FRF_CZ_MULTICAST_NOMATCH_IP_OVERRIDE,
+			!!(table->spec[EFX_FILTER_INDEX_MC_DEF].flags &
+			   EFX_FILTER_FLAG_RX_OVERRIDE_IP));
+	}
+
 	efx_writeo(efx, &filter_ctl, FR_BZ_RX_FILTER_CTL);
 }
 
@@ -319,6 +352,52 @@ int efx_filter_set_eth_local(struct efx_filter_spec *spec,
 	return 0;
 }
 
+/**
+ * efx_filter_set_uc_def - specify matching otherwise-unmatched unicast
+ * @spec: Specification to initialise
+ */
+int efx_filter_set_uc_def(struct efx_filter_spec *spec)
+{
+	EFX_BUG_ON_PARANOID(!(spec->flags &
+			      (EFX_FILTER_FLAG_RX | EFX_FILTER_FLAG_TX)));
+
+	if (spec->type != EFX_FILTER_UNSPEC)
+		return -EINVAL;
+
+	spec->type = EFX_FILTER_UC_DEF;
+	memset(spec->data, 0, sizeof(spec->data)); /* ensure equality */
+	return 0;
+}
+
+/**
+ * efx_filter_set_mc_def - specify matching otherwise-unmatched multicast
+ * @spec: Specification to initialise
+ */
+int efx_filter_set_mc_def(struct efx_filter_spec *spec)
+{
+	EFX_BUG_ON_PARANOID(!(spec->flags &
+			      (EFX_FILTER_FLAG_RX | EFX_FILTER_FLAG_TX)));
+
+	if (spec->type != EFX_FILTER_UNSPEC)
+		return -EINVAL;
+
+	spec->type = EFX_FILTER_MC_DEF;
+	memset(spec->data, 0, sizeof(spec->data)); /* ensure equality */
+	return 0;
+}
+
+static void efx_filter_reset_rx_def(struct efx_nic *efx, unsigned filter_idx)
+{
+	struct efx_filter_state *state = efx->filter_state;
+	struct efx_filter_table *table = &state->table[EFX_FILTER_TABLE_RX_DEF];
+	struct efx_filter_spec *spec = &table->spec[filter_idx];
+
+	efx_filter_init_rx(spec, EFX_FILTER_PRI_MANUAL,
+			   EFX_FILTER_FLAG_RX_RSS, 0);
+	spec->type = EFX_FILTER_UC_DEF + filter_idx;
+	table->used_bitmap[0] |= 1 << filter_idx;
+}
+
 int efx_filter_get_eth_local(const struct efx_filter_spec *spec,
 			     u16 *vid, u8 *addr)
 {
@@ -366,6 +445,13 @@ static u32 efx_filter_build(efx_oword_t *filter, struct efx_filter_spec *spec)
 		break;
 	}
 
+	case EFX_FILTER_TABLE_RX_DEF:
+		/* One filter spec per type */
+		BUILD_BUG_ON(EFX_FILTER_INDEX_UC_DEF != 0);
+		BUILD_BUG_ON(EFX_FILTER_INDEX_MC_DEF !=
+			     EFX_FILTER_MC_DEF - EFX_FILTER_UC_DEF);
+		return spec->type - EFX_FILTER_UC_DEF;
+
 	case EFX_FILTER_TABLE_RX_MAC: {
 		bool is_wild = spec->type == EFX_FILTER_MAC_WILD;
 		EFX_POPULATE_OWORD_8(
@@ -448,23 +534,40 @@ static int efx_filter_search(struct efx_filter_table *table,
  * MAC filters without overriding behaviour.
  */
 
+#define EFX_FILTER_MATCH_PRI_RX_MAC_OVERRIDE_IP	0
+#define EFX_FILTER_MATCH_PRI_RX_DEF_OVERRIDE_IP	1
+#define EFX_FILTER_MATCH_PRI_NORMAL_BASE	2
+
 #define EFX_FILTER_INDEX_WIDTH	13
 #define EFX_FILTER_INDEX_MASK	((1 << EFX_FILTER_INDEX_WIDTH) - 1)
 
 static inline u32 efx_filter_make_id(enum efx_filter_table_id table_id,
 				     unsigned int index, u8 flags)
 {
-	return (table_id == EFX_FILTER_TABLE_RX_MAC &&
-		flags & EFX_FILTER_FLAG_RX_OVERRIDE_IP) ?
-		index :
-		(table_id + 1) << EFX_FILTER_INDEX_WIDTH | index;
+	unsigned int match_pri = EFX_FILTER_MATCH_PRI_NORMAL_BASE + table_id;
+
+	if (flags & EFX_FILTER_FLAG_RX_OVERRIDE_IP) {
+		if (table_id == EFX_FILTER_TABLE_RX_MAC)
+			match_pri = EFX_FILTER_MATCH_PRI_RX_MAC_OVERRIDE_IP;
+		else if (table_id == EFX_FILTER_TABLE_RX_DEF)
+			match_pri = EFX_FILTER_MATCH_PRI_RX_DEF_OVERRIDE_IP;
+	}
+
+	return match_pri << EFX_FILTER_INDEX_WIDTH | index;
 }
 
 static inline enum efx_filter_table_id efx_filter_id_table_id(u32 id)
 {
-	return (id <= EFX_FILTER_INDEX_MASK) ?
-		EFX_FILTER_TABLE_RX_MAC :
-		(id >> EFX_FILTER_INDEX_WIDTH) - 1;
+	unsigned int match_pri = id >> EFX_FILTER_INDEX_WIDTH;
+
+	switch (match_pri) {
+	case EFX_FILTER_MATCH_PRI_RX_MAC_OVERRIDE_IP:
+		return EFX_FILTER_TABLE_RX_MAC;
+	case EFX_FILTER_MATCH_PRI_RX_DEF_OVERRIDE_IP:
+		return EFX_FILTER_TABLE_RX_DEF;
+	default:
+		return match_pri - EFX_FILTER_MATCH_PRI_NORMAL_BASE;
+	}
 }
 
 static inline unsigned int efx_filter_id_index(u32 id)
@@ -474,23 +577,27 @@ static inline unsigned int efx_filter_id_index(u32 id)
 
 static inline u8 efx_filter_id_flags(u32 id)
 {
-	return (id <= EFX_FILTER_INDEX_MASK) ?
-		EFX_FILTER_FLAG_RX | EFX_FILTER_FLAG_RX_OVERRIDE_IP :
-		EFX_FILTER_FLAG_RX;
+	unsigned int match_pri = id >> EFX_FILTER_INDEX_WIDTH;
+
+	if (match_pri < EFX_FILTER_MATCH_PRI_NORMAL_BASE)
+		return EFX_FILTER_FLAG_RX | EFX_FILTER_FLAG_RX_OVERRIDE_IP;
+	else
+		return EFX_FILTER_FLAG_RX;
 }
 
 u32 efx_filter_get_rx_id_limit(struct efx_nic *efx)
 {
 	struct efx_filter_state *state = efx->filter_state;
+	unsigned int table_id = EFX_FILTER_TABLE_RX_DEF;
 
-	if (state->table[EFX_FILTER_TABLE_RX_MAC].size != 0)
-		return ((EFX_FILTER_TABLE_RX_MAC + 1) << EFX_FILTER_INDEX_WIDTH)
-			+ state->table[EFX_FILTER_TABLE_RX_MAC].size;
-	else if (state->table[EFX_FILTER_TABLE_RX_IP].size != 0)
-		return ((EFX_FILTER_TABLE_RX_IP + 1) << EFX_FILTER_INDEX_WIDTH)
-			+ state->table[EFX_FILTER_TABLE_RX_IP].size;
-	else
-		return 0;
+	do {
+		if (state->table[table_id].size != 0)
+			return ((EFX_FILTER_MATCH_PRI_NORMAL_BASE + table_id)
+				<< EFX_FILTER_INDEX_WIDTH) +
+				state->table[table_id].size;
+	} while (table_id--);
+
+	return 0;
 }
 
 /**
@@ -548,12 +655,17 @@ s32 efx_filter_insert_filter(struct efx_nic *efx, struct efx_filter_spec *spec,
 	}
 	*saved_spec = *spec;
 
-	if (table->search_depth[spec->type] < depth) {
-		table->search_depth[spec->type] = depth;
-		efx_filter_push_rx_limits(efx);
-	}
+	if (table->id == EFX_FILTER_TABLE_RX_DEF) {
+		efx_filter_push_rx_config(efx);
+	} else {
+		if (table->search_depth[spec->type] < depth) {
+			table->search_depth[spec->type] = depth;
+			efx_filter_push_rx_config(efx);
+		}
 
-	efx_writeo(efx, &filter, table->offset + table->step * filter_idx);
+		efx_writeo(efx, &filter,
+			   table->offset + table->step * filter_idx);
+	}
 
 	netif_vdbg(efx, hw, efx->net_dev,
 		   "%s: filter type %d index %d rxq %u set",
@@ -571,7 +683,11 @@ static void efx_filter_table_clear_entry(struct efx_nic *efx,
 {
 	static efx_oword_t filter;
 
-	if (test_bit(filter_idx, table->used_bitmap)) {
+	if (table->id == EFX_FILTER_TABLE_RX_DEF) {
+		/* RX default filters must always exist */
+		efx_filter_reset_rx_def(efx, filter_idx);
+		efx_filter_push_rx_config(efx);
+	} else if (test_bit(filter_idx, table->used_bitmap)) {
 		__clear_bit(filter_idx, table->used_bitmap);
 		--table->used;
 		memset(&table->spec[filter_idx], 0, sizeof(table->spec[0]));
@@ -617,7 +733,8 @@ int efx_filter_remove_id_safe(struct efx_nic *efx,
 	spin_lock_bh(&state->lock);
 
 	if (test_bit(filter_idx, table->used_bitmap) &&
-	    spec->priority == priority && spec->flags == filter_flags) {
+	    spec->priority == priority &&
+	    !((spec->flags ^ filter_flags) & EFX_FILTER_FLAG_RX_OVERRIDE_IP)) {
 		efx_filter_table_clear_entry(efx, table, filter_idx);
 		if (table->used == 0)
 			efx_filter_table_reset_search_depth(table);
@@ -668,7 +785,8 @@ int efx_filter_get_filter_safe(struct efx_nic *efx,
 	spin_lock_bh(&state->lock);
 
 	if (test_bit(filter_idx, table->used_bitmap) &&
-	    spec->priority == priority && spec->flags == filter_flags) {
+	    spec->priority == priority &&
+	    !((spec->flags ^ filter_flags) & EFX_FILTER_FLAG_RX_OVERRIDE_IP)) {
 		*spec_buf = *spec;
 		rc = 0;
 	} else {
@@ -722,7 +840,7 @@ u32 efx_filter_count_rx_used(struct efx_nic *efx,
 	spin_lock_bh(&state->lock);
 
 	for (table_id = EFX_FILTER_TABLE_RX_IP;
-	     table_id <= EFX_FILTER_TABLE_RX_MAC;
+	     table_id <= EFX_FILTER_TABLE_RX_DEF;
 	     table_id++) {
 		table = &state->table[table_id];
 		for (filter_idx = 0; filter_idx < table->size; filter_idx++) {
@@ -750,7 +868,7 @@ s32 efx_filter_get_rx_ids(struct efx_nic *efx,
 	spin_lock_bh(&state->lock);
 
 	for (table_id = EFX_FILTER_TABLE_RX_IP;
-	     table_id <= EFX_FILTER_TABLE_RX_MAC;
+	     table_id <= EFX_FILTER_TABLE_RX_DEF;
 	     table_id++) {
 		table = &state->table[table_id];
 		for (filter_idx = 0; filter_idx < table->size; filter_idx++) {
@@ -785,6 +903,11 @@ void efx_restore_filters(struct efx_nic *efx)
 
 	for (table_id = 0; table_id < EFX_FILTER_TABLE_COUNT; table_id++) {
 		table = &state->table[table_id];
+
+		/* Check whether this is a regular register table */
+		if (table->step == 0)
+			continue;
+
 		for (filter_idx = 0; filter_idx < table->size; filter_idx++) {
 			if (!test_bit(filter_idx, table->used_bitmap))
 				continue;
@@ -794,7 +917,7 @@ void efx_restore_filters(struct efx_nic *efx)
 		}
 	}
 
-	efx_filter_push_rx_limits(efx);
+	efx_filter_push_rx_config(efx);
 
 	spin_unlock_bh(&state->lock);
 }
@@ -833,6 +956,10 @@ int efx_probe_filters(struct efx_nic *efx)
 		table->offset = FR_CZ_RX_MAC_FILTER_TBL0;
 		table->size = FR_CZ_RX_MAC_FILTER_TBL0_ROWS;
 		table->step = FR_CZ_RX_MAC_FILTER_TBL0_STEP;
+
+		table = &state->table[EFX_FILTER_TABLE_RX_DEF];
+		table->id = EFX_FILTER_TABLE_RX_DEF;
+		table->size = EFX_FILTER_SIZE_RX_DEF;
 	}
 
 	for (table_id = 0; table_id < EFX_FILTER_TABLE_COUNT; table_id++) {
@@ -849,6 +976,15 @@ int efx_probe_filters(struct efx_nic *efx)
 			goto fail;
 	}
 
+	if (state->table[EFX_FILTER_TABLE_RX_DEF].size) {
+		/* RX default filters must always exist */
+		unsigned i;
+		for (i = 0; i < EFX_FILTER_SIZE_RX_DEF; i++)
+			efx_filter_reset_rx_def(efx, i);
+	}
+
+	efx_filter_push_rx_config(efx);
+
 	return 0;
 
 fail:
diff --git a/drivers/net/ethernet/sfc/filter.h b/drivers/net/ethernet/sfc/filter.h
index 3d4108c..4519978 100644
--- a/drivers/net/ethernet/sfc/filter.h
+++ b/drivers/net/ethernet/sfc/filter.h
@@ -20,6 +20,8 @@
  * @EFX_FILTER_UDP_WILD: Matching UDP/IPv4 destination (host, port)
  * @EFX_FILTER_MAC_FULL: Matching Ethernet destination MAC address, VID
  * @EFX_FILTER_MAC_WILD: Matching Ethernet destination MAC address
+ * @EFX_FILTER_UC_DEF: Matching all otherwise unmatched unicast
+ * @EFX_FILTER_MC_DEF: Matching all otherwise unmatched multicast
  * @EFX_FILTER_UNSPEC: Match type is unspecified
  *
  * Falcon NICs only support the TCP/IPv4 and UDP/IPv4 filter types.
@@ -31,6 +33,8 @@ enum efx_filter_type {
 	EFX_FILTER_UDP_WILD,
 	EFX_FILTER_MAC_FULL = 4,
 	EFX_FILTER_MAC_WILD,
+	EFX_FILTER_UC_DEF = 8,
+	EFX_FILTER_MC_DEF,
 	EFX_FILTER_TYPE_COUNT,		/* number of specific types */
 	EFX_FILTER_UNSPEC = 0xf,
 };
@@ -117,6 +121,8 @@ extern int efx_filter_set_eth_local(struct efx_filter_spec *spec,
 				    u16 vid, const u8 *addr);
 extern int efx_filter_get_eth_local(const struct efx_filter_spec *spec,
 				    u16 *vid, u8 *addr);
+extern int efx_filter_set_uc_def(struct efx_filter_spec *spec);
+extern int efx_filter_set_mc_def(struct efx_filter_spec *spec);
 enum {
 	EFX_FILTER_VID_UNSPEC = 0xffff,
 };
-- 
1.7.7.6



-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* [PATCH net-next 06/19] sfc: Add support for TX MAC filters
From: Ben Hutchings @ 2012-02-16  0:46 UTC
  To: David Miller; +Cc: netdev, linux-net-drivers

On Siena, each TX queue can be configured to send only packets for
which there is a TX MAC filter that matches the source MAC address,
queue ID, and optionally VID.  This will be used to implement the
'spoofchk' feature for SR-IOV virtual functions.
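
For illustration, the spoofchk-style usage this enables looks roughly
like the following (a sketch: 'efx', 'txq_id' and 'vf_mac' are assumed
to come from the SR-IOV code, and error handling is omitted):

    struct efx_filter_spec spec;

    efx_filter_init_tx(&spec, txq_id);
    efx_filter_set_eth_local(&spec, EFX_FILTER_VID_UNSPEC, vf_mac);
    efx_filter_insert_filter(efx, &spec, true);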

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/ethernet/sfc/filter.c |   63 ++++++++++++++++++++++++++++++++++--
 drivers/net/ethernet/sfc/filter.h |   14 +++++++-
 2 files changed, 72 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/sfc/filter.c b/drivers/net/ethernet/sfc/filter.c
index 92517ea..fea7f73 100644
--- a/drivers/net/ethernet/sfc/filter.c
+++ b/drivers/net/ethernet/sfc/filter.c
@@ -36,6 +36,7 @@ enum efx_filter_table_id {
 	EFX_FILTER_TABLE_RX_IP = 0,
 	EFX_FILTER_TABLE_RX_MAC,
 	EFX_FILTER_TABLE_RX_DEF,
+	EFX_FILTER_TABLE_TX_MAC,
 	EFX_FILTER_TABLE_COUNT,
 };
 
@@ -97,8 +98,9 @@ efx_filter_spec_table_id(const struct efx_filter_spec *spec)
 	BUILD_BUG_ON(EFX_FILTER_TABLE_RX_IP != (EFX_FILTER_UDP_WILD >> 2));
 	BUILD_BUG_ON(EFX_FILTER_TABLE_RX_MAC != (EFX_FILTER_MAC_FULL >> 2));
 	BUILD_BUG_ON(EFX_FILTER_TABLE_RX_MAC != (EFX_FILTER_MAC_WILD >> 2));
+	BUILD_BUG_ON(EFX_FILTER_TABLE_TX_MAC != EFX_FILTER_TABLE_RX_MAC + 2);
 	EFX_BUG_ON_PARANOID(spec->type == EFX_FILTER_UNSPEC);
-	return spec->type >> 2;
+	return (spec->type >> 2) + ((spec->flags & EFX_FILTER_FLAG_TX) ? 2 : 0);
 }
 
 static struct efx_filter_table *
@@ -179,6 +181,29 @@ static void efx_filter_push_rx_config(struct efx_nic *efx)
 	efx_writeo(efx, &filter_ctl, FR_BZ_RX_FILTER_CTL);
 }
 
+static void efx_filter_push_tx_limits(struct efx_nic *efx)
+{
+	struct efx_filter_state *state = efx->filter_state;
+	struct efx_filter_table *table;
+	efx_oword_t tx_cfg;
+
+	efx_reado(efx, &tx_cfg, FR_AZ_TX_CFG);
+
+	table = &state->table[EFX_FILTER_TABLE_TX_MAC];
+	if (table->size) {
+		EFX_SET_OWORD_FIELD(
+			tx_cfg, FRF_CZ_TX_ETH_FILTER_FULL_SEARCH_RANGE,
+			table->search_depth[EFX_FILTER_MAC_FULL] +
+			FILTER_CTL_SRCH_FUDGE_FULL);
+		EFX_SET_OWORD_FIELD(
+			tx_cfg, FRF_CZ_TX_ETH_FILTER_WILD_SEARCH_RANGE,
+			table->search_depth[EFX_FILTER_MAC_WILD] +
+			FILTER_CTL_SRCH_FUDGE_WILD);
+	}
+
+	efx_writeo(efx, &tx_cfg, FR_AZ_TX_CFG);
+}
+
 static inline void __efx_filter_set_ipv4(struct efx_filter_spec *spec,
 					 __be32 host1, __be16 port1,
 					 __be32 host2, __be16 port2)
@@ -333,7 +358,8 @@ int efx_filter_get_ipv4_full(const struct efx_filter_spec *spec,
 int efx_filter_set_eth_local(struct efx_filter_spec *spec,
 			     u16 vid, const u8 *addr)
 {
-	EFX_BUG_ON_PARANOID(!(spec->flags & EFX_FILTER_FLAG_RX));
+	EFX_BUG_ON_PARANOID(!(spec->flags &
+			      (EFX_FILTER_FLAG_RX | EFX_FILTER_FLAG_TX)));
 
 	/* This cannot currently be combined with other filtering */
 	if (spec->type != EFX_FILTER_UNSPEC)
@@ -471,6 +497,18 @@ static u32 efx_filter_build(efx_oword_t *filter, struct efx_filter_spec *spec)
 		break;
 	}
 
+	case EFX_FILTER_TABLE_TX_MAC: {
+		bool is_wild = spec->type == EFX_FILTER_MAC_WILD;
+		EFX_POPULATE_OWORD_5(*filter,
+				     FRF_CZ_TMFT_TXQ_ID, spec->dmaq_id,
+				     FRF_CZ_TMFT_WILDCARD_MATCH, is_wild,
+				     FRF_CZ_TMFT_SRC_MAC_HI, spec->data[2],
+				     FRF_CZ_TMFT_SRC_MAC_LO, spec->data[1],
+				     FRF_CZ_TMFT_VLAN_ID, spec->data[0]);
+		data3 = is_wild | spec->dmaq_id << 1;
+		break;
+	}
+
 	default:
 		BUG();
 	}
@@ -485,6 +523,10 @@ static bool efx_filter_equal(const struct efx_filter_spec *left,
 	    memcmp(left->data, right->data, sizeof(left->data)))
 		return false;
 
+	if (left->flags & EFX_FILTER_FLAG_TX &&
+	    left->dmaq_id != right->dmaq_id)
+		return false;
+
 	return true;
 }
 
@@ -581,8 +623,11 @@ static inline u8 efx_filter_id_flags(u32 id)
 
 	if (match_pri < EFX_FILTER_MATCH_PRI_NORMAL_BASE)
 		return EFX_FILTER_FLAG_RX | EFX_FILTER_FLAG_RX_OVERRIDE_IP;
-	else
+	else if (match_pri <=
+		 EFX_FILTER_MATCH_PRI_NORMAL_BASE + EFX_FILTER_TABLE_RX_DEF)
 		return EFX_FILTER_FLAG_RX;
+	else
+		return EFX_FILTER_FLAG_TX;
 }
 
 u32 efx_filter_get_rx_id_limit(struct efx_nic *efx)
@@ -660,7 +705,10 @@ s32 efx_filter_insert_filter(struct efx_nic *efx, struct efx_filter_spec *spec,
 	} else {
 		if (table->search_depth[spec->type] < depth) {
 			table->search_depth[spec->type] = depth;
-			efx_filter_push_rx_config(efx);
+			if (spec->flags & EFX_FILTER_FLAG_TX)
+				efx_filter_push_tx_limits(efx);
+			else
+				efx_filter_push_rx_config(efx);
 		}
 
 		efx_writeo(efx, &filter,
@@ -918,6 +966,7 @@ void efx_restore_filters(struct efx_nic *efx)
 	}
 
 	efx_filter_push_rx_config(efx);
+	efx_filter_push_tx_limits(efx);
 
 	spin_unlock_bh(&state->lock);
 }
@@ -960,6 +1009,12 @@ int efx_probe_filters(struct efx_nic *efx)
 		table = &state->table[EFX_FILTER_TABLE_RX_DEF];
 		table->id = EFX_FILTER_TABLE_RX_DEF;
 		table->size = EFX_FILTER_SIZE_RX_DEF;
+
+		table = &state->table[EFX_FILTER_TABLE_TX_MAC];
+		table->id = EFX_FILTER_TABLE_TX_MAC;
+		table->offset = FR_CZ_TX_MAC_FILTER_TBL0;
+		table->size = FR_CZ_TX_MAC_FILTER_TBL0_ROWS;
+		table->step = FR_CZ_TX_MAC_FILTER_TBL0_STEP;
 	}
 
 	for (table_id = 0; table_id < EFX_FILTER_TABLE_COUNT; table_id++) {
diff --git a/drivers/net/ethernet/sfc/filter.h b/drivers/net/ethernet/sfc/filter.h
index 4519978..3c77802 100644
--- a/drivers/net/ethernet/sfc/filter.h
+++ b/drivers/net/ethernet/sfc/filter.h
@@ -43,7 +43,8 @@ enum efx_filter_type {
  * enum efx_filter_priority - priority of a hardware filter specification
  * @EFX_FILTER_PRI_HINT: Performance hint
  * @EFX_FILTER_PRI_MANUAL: Manually configured filter
- * @EFX_FILTER_PRI_REQUIRED: Required for correct behaviour
+ * @EFX_FILTER_PRI_REQUIRED: Required for correct behaviour (user-level
+ *	networking and SR-IOV)
  */
 enum efx_filter_priority {
 	EFX_FILTER_PRI_HINT = 0,
@@ -64,12 +65,14 @@ enum efx_filter_priority {
  *	any IP filter that matches the same packet.  By default, IP
  *	filters take precedence.
  * @EFX_FILTER_FLAG_RX: Filter is for RX
+ * @EFX_FILTER_FLAG_TX: Filter is for TX
  */
 enum efx_filter_flags {
 	EFX_FILTER_FLAG_RX_RSS = 0x01,
 	EFX_FILTER_FLAG_RX_SCATTER = 0x02,
 	EFX_FILTER_FLAG_RX_OVERRIDE_IP = 0x04,
 	EFX_FILTER_FLAG_RX = 0x08,
+	EFX_FILTER_FLAG_TX = 0x10,
 };
 
 /**
@@ -107,6 +110,15 @@ static inline void efx_filter_init_rx(struct efx_filter_spec *spec,
 	spec->dmaq_id = rxq_id;
 }
 
+static inline void efx_filter_init_tx(struct efx_filter_spec *spec,
+				      unsigned txq_id)
+{
+	spec->type = EFX_FILTER_UNSPEC;
+	spec->priority = EFX_FILTER_PRI_REQUIRED;
+	spec->flags = EFX_FILTER_FLAG_TX;
+	spec->dmaq_id = txq_id;
+}
+
 extern int efx_filter_set_ipv4_local(struct efx_filter_spec *spec, u8 proto,
 				     __be32 host, __be16 port);
 extern int efx_filter_get_ipv4_local(const struct efx_filter_spec *spec,
-- 
1.7.7.6



-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* [PATCH net-next 07/19] sfc: Correct MAC filter bitfield definitions
From: Ben Hutchings @ 2012-02-16  0:47 UTC
  To: David Miller; +Cc: netdev, linux-net-drivers

The RMFT_DEST_MAC and TMFT_SRC_MAC register fields were previously
documented as 44 bits wide, whereas a MAC address has 48 bits.
Thankfully the hardware uses the correct width and the driver has
used separate definitions that divide each of these into 32-bit and
16-bit fields.

Fix the initial definitions for these fields and rewrite the latter
definitions to use them.
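
With the corrected definitions the derived halves line up as before:

    DEST_MAC    : LBN 12,      width 48  ->  bits [59:12]
    DEST_MAC_LO : LBN 12,      width 32  ->  bits [43:12]
    DEST_MAC_HI : LBN 12 + 32, width 16  ->  bits [59:44]

i.e. exactly the old hard-coded LO_LBN 12 / HI_LBN 44 / HI_WIDTH 16
values, now expressed in terms of the full-width field.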

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/ethernet/sfc/regs.h |   20 ++++++++++----------
 1 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/sfc/regs.h b/drivers/net/ethernet/sfc/regs.h
index cc2c86b..ade4c4d 100644
--- a/drivers/net/ethernet/sfc/regs.h
+++ b/drivers/net/ethernet/sfc/regs.h
@@ -2446,8 +2446,8 @@
 #define	FRF_CZ_RMFT_RXQ_ID_WIDTH 12
 #define	FRF_CZ_RMFT_WILDCARD_MATCH_LBN 60
 #define	FRF_CZ_RMFT_WILDCARD_MATCH_WIDTH 1
-#define	FRF_CZ_RMFT_DEST_MAC_LBN 16
-#define	FRF_CZ_RMFT_DEST_MAC_WIDTH 44
+#define	FRF_CZ_RMFT_DEST_MAC_LBN 12
+#define	FRF_CZ_RMFT_DEST_MAC_WIDTH 48
 #define	FRF_CZ_RMFT_VLAN_ID_LBN 0
 #define	FRF_CZ_RMFT_VLAN_ID_WIDTH 12
 
@@ -2523,8 +2523,8 @@
 #define	FRF_CZ_TMFT_TXQ_ID_WIDTH 12
 #define	FRF_CZ_TMFT_WILDCARD_MATCH_LBN 60
 #define	FRF_CZ_TMFT_WILDCARD_MATCH_WIDTH 1
-#define	FRF_CZ_TMFT_SRC_MAC_LBN 16
-#define	FRF_CZ_TMFT_SRC_MAC_WIDTH 44
+#define	FRF_CZ_TMFT_SRC_MAC_LBN 12
+#define	FRF_CZ_TMFT_SRC_MAC_WIDTH 48
 #define	FRF_CZ_TMFT_VLAN_ID_LBN 0
 #define	FRF_CZ_TMFT_VLAN_ID_WIDTH 12
 
@@ -2895,17 +2895,17 @@
 
 /* RX_MAC_FILTER_TBL0 */
 /* RMFT_DEST_MAC is wider than 32 bits */
-#define FRF_CZ_RMFT_DEST_MAC_LO_LBN 12
+#define FRF_CZ_RMFT_DEST_MAC_LO_LBN FRF_CZ_RMFT_DEST_MAC_LBN
 #define FRF_CZ_RMFT_DEST_MAC_LO_WIDTH 32
-#define FRF_CZ_RMFT_DEST_MAC_HI_LBN 44
-#define FRF_CZ_RMFT_DEST_MAC_HI_WIDTH 16
+#define FRF_CZ_RMFT_DEST_MAC_HI_LBN (FRF_CZ_RMFT_DEST_MAC_LBN + 32)
+#define FRF_CZ_RMFT_DEST_MAC_HI_WIDTH (FRF_CZ_RMFT_DEST_MAC_WIDTH - 32)
 
 /* TX_MAC_FILTER_TBL0 */
 /* TMFT_SRC_MAC is wider than 32 bits */
-#define FRF_CZ_TMFT_SRC_MAC_LO_LBN 12
+#define FRF_CZ_TMFT_SRC_MAC_LO_LBN FRF_CZ_TMFT_SRC_MAC_LBN
 #define FRF_CZ_TMFT_SRC_MAC_LO_WIDTH 32
-#define FRF_CZ_TMFT_SRC_MAC_HI_LBN 44
-#define FRF_CZ_TMFT_SRC_MAC_HI_WIDTH 16
+#define FRF_CZ_TMFT_SRC_MAC_HI_LBN (FRF_CZ_TMFT_SRC_MAC_LBN + 32)
+#define FRF_CZ_TMFT_SRC_MAC_HI_WIDTH (FRF_CZ_TMFT_SRC_MAC_WIDTH - 32)
 
 /* TX_PACE_TBL */
 /* Values >20 are documented as reserved, but will result in a queue going
-- 
1.7.7.6



-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* [PATCH net-next 08/19] sfc: Generalise driver event generation
From: Ben Hutchings @ 2012-02-16  0:47 UTC
  To: David Miller; +Cc: netdev, linux-net-drivers

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/ethernet/sfc/nic.c |   51 +++++++++++++++++++++------------------
 1 files changed, 27 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/sfc/nic.c b/drivers/net/ethernet/sfc/nic.c
index dd50c4f..0a46b1c 100644
--- a/drivers/net/ethernet/sfc/nic.c
+++ b/drivers/net/ethernet/sfc/nic.c
@@ -57,13 +57,17 @@
 /* Depth of RX flush request fifo */
 #define EFX_RX_FLUSH_COUNT 4
 
-/* Generated event code for efx_generate_test_event() */
-#define EFX_CHANNEL_MAGIC_TEST(_channel)	\
-	(0x00010100 + (_channel)->channel)
+/* Driver generated events */
+#define _EFX_CHANNEL_MAGIC_TEST		0x000101
+#define _EFX_CHANNEL_MAGIC_FILL		0x000102
 
-/* Generated event code for efx_generate_fill_event() */
-#define EFX_CHANNEL_MAGIC_FILL(_channel)	\
-	(0x00010200 + (_channel)->channel)
+#define _EFX_CHANNEL_MAGIC(_code, _data)	((_code) << 8 | (_data))
+#define _EFX_CHANNEL_MAGIC_CODE(_magic)		((_magic) >> 8)
+
+#define EFX_CHANNEL_MAGIC_TEST(_channel)				\
+	_EFX_CHANNEL_MAGIC(_EFX_CHANNEL_MAGIC_TEST, (_channel)->channel)
+#define EFX_CHANNEL_MAGIC_FILL(_channel)				\
+	_EFX_CHANNEL_MAGIC(_EFX_CHANNEL_MAGIC_FILL, (_channel)->channel)
 
 /**************************************************************************
  *
@@ -693,6 +697,16 @@ static void efx_generate_event(struct efx_channel *channel, efx_qword_t *event)
 	efx_writeo(channel->efx, &drv_ev_reg, FR_AZ_DRV_EV);
 }
 
+static void efx_magic_event(struct efx_channel *channel, u32 magic)
+{
+	efx_qword_t event;
+
+	EFX_POPULATE_QWORD_2(event, FSF_AZ_EV_CODE,
+			     FSE_AZ_EV_CODE_DRV_GEN_EV,
+			     FSF_AZ_DRV_GEN_EV_MAGIC, magic);
+	efx_generate_event(channel, &event);
+}
+
 /* Handle a transmit completion event
  *
  * The NIC batches TX completion events; the message we receive is of
@@ -898,12 +912,13 @@ static void
 efx_handle_generated_event(struct efx_channel *channel, efx_qword_t *event)
 {
 	struct efx_nic *efx = channel->efx;
-	unsigned code;
+	unsigned magic;
 
-	code = EFX_QWORD_FIELD(*event, FSF_AZ_DRV_GEN_EV_MAGIC);
-	if (code == EFX_CHANNEL_MAGIC_TEST(channel))
+	magic = EFX_QWORD_FIELD(*event, FSF_AZ_DRV_GEN_EV_MAGIC);
+
+	if (magic == EFX_CHANNEL_MAGIC_TEST(channel))
 		; /* ignore */
-	else if (code == EFX_CHANNEL_MAGIC_FILL(channel))
+	else if (magic == EFX_CHANNEL_MAGIC_FILL(channel))
 		/* The queue must be empty, so we won't receive any rx
 		 * events, so efx_process_channel() won't refill the
 		 * queue. Refill it here */
@@ -1132,24 +1147,12 @@ void efx_nic_remove_eventq(struct efx_channel *channel)
 
 void efx_nic_generate_test_event(struct efx_channel *channel)
 {
-	unsigned int magic = EFX_CHANNEL_MAGIC_TEST(channel);
-	efx_qword_t test_event;
-
-	EFX_POPULATE_QWORD_2(test_event, FSF_AZ_EV_CODE,
-			     FSE_AZ_EV_CODE_DRV_GEN_EV,
-			     FSF_AZ_DRV_GEN_EV_MAGIC, magic);
-	efx_generate_event(channel, &test_event);
+	efx_magic_event(channel, EFX_CHANNEL_MAGIC_TEST(channel));
 }
 
 void efx_nic_generate_fill_event(struct efx_channel *channel)
 {
-	unsigned int magic = EFX_CHANNEL_MAGIC_FILL(channel);
-	efx_qword_t test_event;
-
-	EFX_POPULATE_QWORD_2(test_event, FSF_AZ_EV_CODE,
-			     FSE_AZ_EV_CODE_DRV_GEN_EV,
-			     FSF_AZ_DRV_GEN_EV_MAGIC, magic);
-	efx_generate_event(channel, &test_event);
+	efx_magic_event(channel, EFX_CHANNEL_MAGIC_FILL(channel));
 }
 
 /**************************************************************************
-- 
1.7.7.6



-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* [PATCH net-next 09/19] sfc: Generate RX fill events based on RX queues, not channels
From: Ben Hutchings @ 2012-02-16  0:47 UTC
  To: David Miller; +Cc: netdev, linux-net-drivers

This makes it harder to accidentally send such events to TX-only
channels.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/ethernet/sfc/nic.c |   17 +++++++++++------
 drivers/net/ethernet/sfc/nic.h |    2 +-
 drivers/net/ethernet/sfc/rx.c  |    3 +--
 3 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/sfc/nic.c b/drivers/net/ethernet/sfc/nic.c
index 0a46b1c..2b33afd 100644
--- a/drivers/net/ethernet/sfc/nic.c
+++ b/drivers/net/ethernet/sfc/nic.c
@@ -66,8 +66,9 @@
 
 #define EFX_CHANNEL_MAGIC_TEST(_channel)				\
 	_EFX_CHANNEL_MAGIC(_EFX_CHANNEL_MAGIC_TEST, (_channel)->channel)
-#define EFX_CHANNEL_MAGIC_FILL(_channel)				\
-	_EFX_CHANNEL_MAGIC(_EFX_CHANNEL_MAGIC_FILL, (_channel)->channel)
+#define EFX_CHANNEL_MAGIC_FILL(_rx_queue)				\
+	_EFX_CHANNEL_MAGIC(_EFX_CHANNEL_MAGIC_FILL,			\
+			   efx_rx_queue_index(_rx_queue))
 
 /**************************************************************************
  *
@@ -912,17 +913,20 @@ static void
 efx_handle_generated_event(struct efx_channel *channel, efx_qword_t *event)
 {
 	struct efx_nic *efx = channel->efx;
+	struct efx_rx_queue *rx_queue =
+		efx_channel_has_rx_queue(channel) ?
+		efx_channel_get_rx_queue(channel) : NULL;
 	unsigned magic;
 
 	magic = EFX_QWORD_FIELD(*event, FSF_AZ_DRV_GEN_EV_MAGIC);
 
 	if (magic == EFX_CHANNEL_MAGIC_TEST(channel))
 		; /* ignore */
-	else if (magic == EFX_CHANNEL_MAGIC_FILL(channel))
+	else if (rx_queue && magic == EFX_CHANNEL_MAGIC_FILL(rx_queue))
 		/* The queue must be empty, so we won't receive any rx
 		 * events, so efx_process_channel() won't refill the
 		 * queue. Refill it here */
-		efx_fast_push_rx_descriptors(efx_channel_get_rx_queue(channel));
+		efx_fast_push_rx_descriptors(rx_queue);
 	else
 		netif_dbg(efx, hw, efx->net_dev, "channel %d received "
 			  "generated event "EFX_QWORD_FMT"\n",
@@ -1150,9 +1154,10 @@ void efx_nic_generate_test_event(struct efx_channel *channel)
 	efx_magic_event(channel, EFX_CHANNEL_MAGIC_TEST(channel));
 }
 
-void efx_nic_generate_fill_event(struct efx_channel *channel)
+void efx_nic_generate_fill_event(struct efx_rx_queue *rx_queue)
 {
-	efx_magic_event(channel, EFX_CHANNEL_MAGIC_FILL(channel));
+	efx_magic_event(efx_rx_queue_channel(rx_queue),
+			EFX_CHANNEL_MAGIC_FILL(rx_queue));
 }
 
 /**************************************************************************
diff --git a/drivers/net/ethernet/sfc/nic.h b/drivers/net/ethernet/sfc/nic.h
index 4f9d18a..aeca4e8 100644
--- a/drivers/net/ethernet/sfc/nic.h
+++ b/drivers/net/ethernet/sfc/nic.h
@@ -195,6 +195,7 @@ extern void efx_nic_init_rx(struct efx_rx_queue *rx_queue);
 extern void efx_nic_fini_rx(struct efx_rx_queue *rx_queue);
 extern void efx_nic_remove_rx(struct efx_rx_queue *rx_queue);
 extern void efx_nic_notify_rx_desc(struct efx_rx_queue *rx_queue);
+extern void efx_nic_generate_fill_event(struct efx_rx_queue *rx_queue);
 
 /* Event data path */
 extern int efx_nic_probe_eventq(struct efx_channel *channel);
@@ -216,7 +217,6 @@ extern void falcon_update_stats_xmac(struct efx_nic *efx);
 extern int efx_nic_init_interrupt(struct efx_nic *efx);
 extern void efx_nic_enable_interrupts(struct efx_nic *efx);
 extern void efx_nic_generate_test_event(struct efx_channel *channel);
-extern void efx_nic_generate_fill_event(struct efx_channel *channel);
 extern void efx_nic_generate_interrupt(struct efx_nic *efx);
 extern void efx_nic_disable_interrupts(struct efx_nic *efx);
 extern void efx_nic_fini_interrupt(struct efx_nic *efx);
diff --git a/drivers/net/ethernet/sfc/rx.c b/drivers/net/ethernet/sfc/rx.c
index 1dfda5e..76fb521 100644
--- a/drivers/net/ethernet/sfc/rx.c
+++ b/drivers/net/ethernet/sfc/rx.c
@@ -405,10 +405,9 @@ void efx_fast_push_rx_descriptors(struct efx_rx_queue *rx_queue)
 void efx_rx_slow_fill(unsigned long context)
 {
 	struct efx_rx_queue *rx_queue = (struct efx_rx_queue *)context;
-	struct efx_channel *channel = efx_rx_queue_channel(rx_queue);
 
 	/* Post an event to cause NAPI to run and refill the queue */
-	efx_nic_generate_fill_event(channel);
+	efx_nic_generate_fill_event(rx_queue);
 	++rx_queue->slow_fill_count;
 }
 
-- 
1.7.7.6



-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* [PATCH net-next 10/19] sfc: Leave interrupts and event queues enabled whenever we can
From: Ben Hutchings @ 2012-02-16  0:48 UTC
  To: David Miller; +Cc: netdev, linux-net-drivers

When SR-IOV is enabled we may receive FLR (Function-Level Reset)
events, associated queue flush events and requests from VF drivers at
any time.  Therefore we need to keep event queues and interrupts
enabled whenever possible.

Currently we stop interrupt-driven event processing before flushing RX
and TX queues; efx_nic_flush_queues() then polls event queues for
flush events and discards any others it finds.  Change it to work with
the regular event handling functions.

Currently efx_start_channel() fills RX queues synchronously when a
device is brought up.  This could now race with NAPI, so change it to
send fill events.
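
For context, a fill event is just the driver-generated 'magic' event
introduced in patches 08-09: EFX_CHANNEL_MAGIC_FILL(rx_queue) encodes
(0x000102 << 8) | efx_rx_queue_index(rx_queue) (e.g. magic 0x00010203
for RX queue 3), and efx_handle_generated_event() decodes it and
refills the queue via efx_fast_push_rx_descriptors() in NAPI context.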

This was almost entirely written by Steve Hodgson, formerly
shodgson@solarflare.com.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/ethernet/sfc/efx.c        |  237 ++++++++++------------
 drivers/net/ethernet/sfc/net_driver.h |   36 ++--
 drivers/net/ethernet/sfc/nic.c        |  349 +++++++++++++++++----------------
 drivers/net/ethernet/sfc/rx.c         |    4 +
 4 files changed, 308 insertions(+), 318 deletions(-)

diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index ce07aa5..ccfb43c 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -186,6 +186,8 @@ MODULE_PARM_DESC(debug, "Bitmapped debugging message enable value");
  *
  *************************************************************************/
 
+static void efx_start_interrupts(struct efx_nic *efx);
+static void efx_stop_interrupts(struct efx_nic *efx);
 static void efx_remove_channels(struct efx_nic *efx);
 static void efx_remove_port(struct efx_nic *efx);
 static void efx_init_napi(struct efx_nic *efx);
@@ -217,10 +219,9 @@ static void efx_stop_all(struct efx_nic *efx);
  */
 static int efx_process_channel(struct efx_channel *channel, int budget)
 {
-	struct efx_nic *efx = channel->efx;
 	int spent;
 
-	if (unlikely(efx->reset_pending || !channel->enabled))
+	if (unlikely(!channel->enabled))
 		return 0;
 
 	spent = efx_nic_process_eventq(channel, budget);
@@ -233,9 +234,10 @@ static int efx_process_channel(struct efx_channel *channel, int budget)
 			__efx_rx_packet(channel, channel->rx_pkt);
 			channel->rx_pkt = NULL;
 		}
-
-		efx_rx_strategy(channel);
-		efx_fast_push_rx_descriptors(rx_queue);
+		if (rx_queue->enabled) {
+			efx_rx_strategy(channel);
+			efx_fast_push_rx_descriptors(rx_queue);
+		}
 	}
 
 	return spent;
@@ -387,6 +389,34 @@ static void efx_init_eventq(struct efx_channel *channel)
 	efx_nic_init_eventq(channel);
 }
 
+/* Enable event queue processing and NAPI */
+static void efx_start_eventq(struct efx_channel *channel)
+{
+	netif_dbg(channel->efx, ifup, channel->efx->net_dev,
+		  "chan %d start event queue\n", channel->channel);
+
+	/* The interrupt handler for this channel may set work_pending
+	 * as soon as we enable it.  Make sure it's cleared before
+	 * then.  Similarly, make sure it sees the enabled flag set.
+	 */
+	channel->work_pending = false;
+	channel->enabled = true;
+	smp_wmb();
+
+	napi_enable(&channel->napi_str);
+	efx_nic_eventq_read_ack(channel);
+}
+
+/* Disable event queue processing and NAPI */
+static void efx_stop_eventq(struct efx_channel *channel)
+{
+	if (!channel->enabled)
+		return;
+
+	napi_disable(&channel->napi_str);
+	channel->enabled = false;
+}
+
 static void efx_fini_eventq(struct efx_channel *channel)
 {
 	netif_dbg(channel->efx, drv, channel->efx->net_dev,
@@ -556,7 +586,7 @@ fail:
  * to propagate configuration changes (mtu, checksum offload), or
  * to clear hardware error conditions
  */
-static void efx_init_channels(struct efx_nic *efx)
+static void efx_start_datapath(struct efx_nic *efx)
 {
 	struct efx_tx_queue *tx_queue;
 	struct efx_rx_queue *rx_queue;
@@ -575,68 +605,26 @@ static void efx_init_channels(struct efx_nic *efx)
 
 	/* Initialise the channels */
 	efx_for_each_channel(channel, efx) {
-		netif_dbg(channel->efx, drv, channel->efx->net_dev,
-			  "init chan %d\n", channel->channel);
-
-		efx_init_eventq(channel);
-
 		efx_for_each_channel_tx_queue(tx_queue, channel)
 			efx_init_tx_queue(tx_queue);
 
 		/* The rx buffer allocation strategy is MTU dependent */
 		efx_rx_strategy(channel);
 
-		efx_for_each_channel_rx_queue(rx_queue, channel)
+		efx_for_each_channel_rx_queue(rx_queue, channel) {
 			efx_init_rx_queue(rx_queue);
+			efx_nic_generate_fill_event(rx_queue);
+		}
 
 		WARN_ON(channel->rx_pkt != NULL);
 		efx_rx_strategy(channel);
 	}
-}
-
-/* This enables event queue processing and packet transmission.
- *
- * Note that this function is not allowed to fail, since that would
- * introduce too much complexity into the suspend/resume path.
- */
-static void efx_start_channel(struct efx_channel *channel)
-{
-	struct efx_rx_queue *rx_queue;
-
-	netif_dbg(channel->efx, ifup, channel->efx->net_dev,
-		  "starting chan %d\n", channel->channel);
-
-	/* The interrupt handler for this channel may set work_pending
-	 * as soon as we enable it.  Make sure it's cleared before
-	 * then.  Similarly, make sure it sees the enabled flag set. */
-	channel->work_pending = false;
-	channel->enabled = true;
-	smp_wmb();
-
-	/* Fill the queues before enabling NAPI */
-	efx_for_each_channel_rx_queue(rx_queue, channel)
-		efx_fast_push_rx_descriptors(rx_queue);
-
-	napi_enable(&channel->napi_str);
-}
 
-/* This disables event queue processing and packet transmission.
- * This function does not guarantee that all queue processing
- * (e.g. RX refill) is complete.
- */
-static void efx_stop_channel(struct efx_channel *channel)
-{
-	if (!channel->enabled)
-		return;
-
-	netif_dbg(channel->efx, ifdown, channel->efx->net_dev,
-		  "stop chan %d\n", channel->channel);
-
-	channel->enabled = false;
-	napi_disable(&channel->napi_str);
+	if (netif_device_present(efx->net_dev))
+		netif_tx_wake_all_queues(efx->net_dev);
 }
 
-static void efx_fini_channels(struct efx_nic *efx)
+static void efx_stop_datapath(struct efx_nic *efx)
 {
 	struct efx_channel *channel;
 	struct efx_tx_queue *tx_queue;
@@ -663,14 +651,21 @@ static void efx_fini_channels(struct efx_nic *efx)
 	}
 
 	efx_for_each_channel(channel, efx) {
-		netif_dbg(channel->efx, drv, channel->efx->net_dev,
-			  "shut down chan %d\n", channel->channel);
+		/* RX packet processing is pipelined, so wait for the
+		 * NAPI handler to complete.  At least event queue 0
+		 * might be kept active by non-data events, so don't
+		 * use napi_synchronize() but actually disable NAPI
+		 * temporarily.
+		 */
+		if (efx_channel_has_rx_queue(channel)) {
+			efx_stop_eventq(channel);
+			efx_start_eventq(channel);
+		}
 
 		efx_for_each_channel_rx_queue(rx_queue, channel)
 			efx_fini_rx_queue(rx_queue);
 		efx_for_each_possible_channel_tx_queue(tx_queue, channel)
 			efx_fini_tx_queue(tx_queue);
-		efx_fini_eventq(channel);
 	}
 }
 
@@ -706,7 +701,7 @@ efx_realloc_channels(struct efx_nic *efx, u32 rxq_entries, u32 txq_entries)
 	int rc;
 
 	efx_stop_all(efx);
-	efx_fini_channels(efx);
+	efx_stop_interrupts(efx);
 
 	/* Clone channels */
 	memset(other_channel, 0, sizeof(other_channel));
@@ -746,7 +741,7 @@ out:
 	for (i = 0; i < efx->n_channels; i++)
 		kfree(other_channel[i]);
 
-	efx_init_channels(efx);
+	efx_start_interrupts(efx);
 	efx_start_all(efx);
 	return rc;
 
@@ -1246,6 +1241,44 @@ static int efx_probe_interrupts(struct efx_nic *efx)
 	return 0;
 }
 
+/* Enable interrupts, then probe and start the event queues */
+static void efx_start_interrupts(struct efx_nic *efx)
+{
+	struct efx_channel *channel;
+
+	if (efx->legacy_irq)
+		efx->legacy_irq_enabled = true;
+	efx_nic_enable_interrupts(efx);
+
+	efx_for_each_channel(channel, efx) {
+		efx_init_eventq(channel);
+		efx_start_eventq(channel);
+	}
+
+	efx_mcdi_mode_event(efx);
+}
+
+static void efx_stop_interrupts(struct efx_nic *efx)
+{
+	struct efx_channel *channel;
+
+	efx_mcdi_mode_poll(efx);
+
+	efx_nic_disable_interrupts(efx);
+	if (efx->legacy_irq) {
+		synchronize_irq(efx->legacy_irq);
+		efx->legacy_irq_enabled = false;
+	}
+
+	efx_for_each_channel(channel, efx) {
+		if (channel->irq)
+			synchronize_irq(channel->irq);
+
+		efx_stop_eventq(channel);
+		efx_fini_eventq(channel);
+	}
+}
+
 static void efx_remove_interrupts(struct efx_nic *efx)
 {
 	struct efx_channel *channel;
@@ -1371,15 +1404,13 @@ static int efx_probe_all(struct efx_nic *efx)
 	return rc;
 }
 
-/* Called after previous invocation(s) of efx_stop_all, restarts the
- * port, kernel transmit queue, NAPI processing and hardware interrupts,
- * and ensures that the port is scheduled to be reconfigured.
- * This function is safe to call multiple times when the NIC is in any
- * state. */
+/* Called after previous invocation(s) of efx_stop_all, restarts the port,
+ * kernel transmit queues and NAPI processing, and ensures that the port is
+ * scheduled to be reconfigured. This function is safe to call multiple
+ * times when the NIC is in any state.
+ */
 static void efx_start_all(struct efx_nic *efx)
 {
-	struct efx_channel *channel;
-
 	EFX_ASSERT_RESET_SERIALISED(efx);
 
 	/* Check that it is appropriate to restart the interface. All
@@ -1391,28 +1422,8 @@ static void efx_start_all(struct efx_nic *efx)
 	if (!netif_running(efx->net_dev))
 		return;
 
-	/* Mark the port as enabled so port reconfigurations can start, then
-	 * restart the transmit interface early so the watchdog timer stops */
 	efx_start_port(efx);
-
-	if (netif_device_present(efx->net_dev))
-		netif_tx_wake_all_queues(efx->net_dev);
-
-	efx_for_each_channel(channel, efx)
-		efx_start_channel(channel);
-
-	if (efx->legacy_irq)
-		efx->legacy_irq_enabled = true;
-	efx_nic_enable_interrupts(efx);
-
-	/* Switch to event based MCDI completions after enabling interrupts.
-	 * If a reset has been scheduled, then we need to stay in polled mode.
-	 * Rather than serialising efx_mcdi_mode_event() [which sleeps] and
-	 * reset_pending [modified from an atomic context], we instead guarantee
-	 * that efx_mcdi_mode_poll() isn't reverted erroneously */
-	efx_mcdi_mode_event(efx);
-	if (efx->reset_pending)
-		efx_mcdi_mode_poll(efx);
+	efx_start_datapath(efx);
 
 	/* Start the hardware monitor if there is one. Otherwise (we're link
 	 * event driven), we have to poll the PHY because after an event queue
@@ -1448,8 +1459,6 @@ static void efx_flush_all(struct efx_nic *efx)
  * taking locks. */
 static void efx_stop_all(struct efx_nic *efx)
 {
-	struct efx_channel *channel;
-
 	EFX_ASSERT_RESET_SERIALISED(efx);
 
 	/* port_enabled can be read safely under the rtnl lock */
@@ -1457,28 +1466,6 @@ static void efx_stop_all(struct efx_nic *efx)
 		return;
 
 	efx->type->stop_stats(efx);
-
-	/* Switch to MCDI polling on Siena before disabling interrupts */
-	efx_mcdi_mode_poll(efx);
-
-	/* Disable interrupts and wait for ISR to complete */
-	efx_nic_disable_interrupts(efx);
-	if (efx->legacy_irq) {
-		synchronize_irq(efx->legacy_irq);
-		efx->legacy_irq_enabled = false;
-	}
-	efx_for_each_channel(channel, efx) {
-		if (channel->irq)
-			synchronize_irq(channel->irq);
-	}
-
-	/* Stop all NAPI processing and synchronous rx refills */
-	efx_for_each_channel(channel, efx)
-		efx_stop_channel(channel);
-
-	/* Stop all asynchronous port reconfigurations. Since all
-	 * event processing has already been stopped, there is no
-	 * window to loose phy events */
 	efx_stop_port(efx);
 
 	/* Flush efx_mac_work(), refill_workqueue, monitor_work */
@@ -1486,9 +1473,9 @@ static void efx_stop_all(struct efx_nic *efx)
 
 	/* Stop the kernel transmit interface late, so the watchdog
 	 * timer isn't ticking over the flush */
-	netif_tx_stop_all_queues(efx->net_dev);
-	netif_tx_lock_bh(efx->net_dev);
-	netif_tx_unlock_bh(efx->net_dev);
+	netif_tx_disable(efx->net_dev);
+
+	efx_stop_datapath(efx);
 }
 
 static void efx_remove_all(struct efx_nic *efx)
@@ -1731,8 +1718,6 @@ static int efx_net_stop(struct net_device *net_dev)
 	if (efx->state != STATE_DISABLED) {
 		/* Stop the device and flush all the channels */
 		efx_stop_all(efx);
-		efx_fini_channels(efx);
-		efx_init_channels(efx);
 	}
 
 	return 0;
@@ -1803,8 +1788,6 @@ static int efx_change_mtu(struct net_device *net_dev, int new_mtu)
 
 	netif_dbg(efx, drv, efx->net_dev, "changing MTU to %d\n", new_mtu);
 
-	efx_fini_channels(efx);
-
 	mutex_lock(&efx->mac_lock);
 	/* Reconfigure the MAC before enabling the dma queues so that
 	 * the RX buffers don't overflow */
@@ -1812,8 +1795,6 @@ static int efx_change_mtu(struct net_device *net_dev, int new_mtu)
 	efx->type->reconfigure_mac(efx);
 	mutex_unlock(&efx->mac_lock);
 
-	efx_init_channels(efx);
-
 	efx_start_all(efx);
 	return 0;
 }
@@ -2030,7 +2011,7 @@ void efx_reset_down(struct efx_nic *efx, enum reset_type method)
 	efx_stop_all(efx);
 	mutex_lock(&efx->mac_lock);
 
-	efx_fini_channels(efx);
+	efx_stop_interrupts(efx);
 	if (efx->port_initialized && method != RESET_TYPE_INVISIBLE)
 		efx->phy_op->fini(efx);
 	efx->type->fini(efx);
@@ -2067,7 +2048,7 @@ int efx_reset_up(struct efx_nic *efx, enum reset_type method, bool ok)
 
 	efx->type->reconfigure_mac(efx);
 
-	efx_init_channels(efx);
+	efx_start_interrupts(efx);
 	efx_restore_filters(efx);
 
 	mutex_unlock(&efx->mac_lock);
@@ -2273,6 +2254,7 @@ static int efx_init_struct(struct efx_nic *efx, const struct efx_nic_type *type,
 	efx->phy_op = &efx_dummy_phy_operations;
 	efx->mdio.dev = net_dev;
 	INIT_WORK(&efx->mac_work, efx_mac_work);
+	init_waitqueue_head(&efx->flush_wq);
 
 	for (i = 0; i < EFX_MAX_CHANNELS; i++) {
 		efx->channel[i] = efx_alloc_channel(efx, i, NULL);
@@ -2330,8 +2312,8 @@ static void efx_pci_remove_main(struct efx_nic *efx)
 	free_irq_cpu_rmap(efx->net_dev->rx_cpu_rmap);
 	efx->net_dev->rx_cpu_rmap = NULL;
 #endif
+	efx_stop_interrupts(efx);
 	efx_nic_fini_interrupt(efx);
-	efx_fini_channels(efx);
 	efx_fini_port(efx);
 	efx->type->fini(efx);
 	efx_fini_napi(efx);
@@ -2357,6 +2339,7 @@ static void efx_pci_remove(struct pci_dev *pci_dev)
 	/* Allow any queued efx_resets() to complete */
 	rtnl_unlock();
 
+	efx_stop_interrupts(efx);
 	efx_unregister_netdev(efx);
 
 	efx_mtd_remove(efx);
@@ -2405,16 +2388,14 @@ static int efx_pci_probe_main(struct efx_nic *efx)
 		goto fail4;
 	}
 
-	efx_init_channels(efx);
-
 	rc = efx_nic_init_interrupt(efx);
 	if (rc)
 		goto fail5;
+	efx_start_interrupts(efx);
 
 	return 0;
 
  fail5:
-	efx_fini_channels(efx);
 	efx_fini_port(efx);
  fail4:
 	efx->type->fini(efx);
@@ -2534,7 +2515,7 @@ static int efx_pm_freeze(struct device *dev)
 	netif_device_detach(efx->net_dev);
 
 	efx_stop_all(efx);
-	efx_fini_channels(efx);
+	efx_stop_interrupts(efx);
 
 	return 0;
 }
@@ -2545,7 +2526,7 @@ static int efx_pm_thaw(struct device *dev)
 
 	efx->state = STATE_INIT;
 
-	efx_init_channels(efx);
+	efx_start_interrupts(efx);
 
 	mutex_lock(&efx->mac_lock);
 	efx->phy_op->reconfigure(efx);
diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
index 5386401..18c4b3c 100644
--- a/drivers/net/ethernet/sfc/net_driver.h
+++ b/drivers/net/ethernet/sfc/net_driver.h
@@ -85,13 +85,6 @@ struct efx_special_buffer {
 	int entries;
 };
 
-enum efx_flush_state {
-	FLUSH_NONE,
-	FLUSH_PENDING,
-	FLUSH_FAILED,
-	FLUSH_DONE,
-};
-
 /**
  * struct efx_tx_buffer - An Efx TX buffer
  * @skb: The associated socket buffer.
@@ -138,7 +131,6 @@ struct efx_tx_buffer {
  * @txd: The hardware descriptor ring
  * @ptr_mask: The size of the ring minus 1.
  * @initialised: Has hardware queue been initialised?
- * @flushed: Used when handling queue flushing
  * @read_count: Current read pointer.
  *	This is the number of buffers that have been removed from both rings.
  * @old_write_count: The value of @write_count when last checked.
@@ -181,7 +173,6 @@ struct efx_tx_queue {
 	struct efx_special_buffer txd;
 	unsigned int ptr_mask;
 	bool initialised;
-	enum efx_flush_state flushed;
 
 	/* Members used mainly on the completion path */
 	unsigned int read_count ____cacheline_aligned_in_smp;
@@ -249,6 +240,9 @@ struct efx_rx_page_state {
  * @buffer: The software buffer ring
  * @rxd: The hardware descriptor ring
  * @ptr_mask: The size of the ring minus 1.
+ * @enabled: Receive queue enabled indicator.
+ * @flush_pending: Set when an RX flush is pending. Has the same lifetime as
+ *	@rxq_flush_pending.
  * @added_count: Number of buffers added to the receive queue.
  * @notified_count: Number of buffers given to NIC (<= @added_count).
  * @removed_count: Number of buffers removed from the receive queue.
@@ -263,13 +257,14 @@ struct efx_rx_page_state {
  * @alloc_page_count: RX allocation strategy counter.
  * @alloc_skb_count: RX allocation strategy counter.
  * @slow_fill: Timer used to defer efx_nic_generate_fill_event().
- * @flushed: Use when handling queue flushing
  */
 struct efx_rx_queue {
 	struct efx_nic *efx;
 	struct efx_rx_buffer *buffer;
 	struct efx_special_buffer rxd;
 	unsigned int ptr_mask;
+	bool enabled;
+	bool flush_pending;
 
 	int added_count;
 	int notified_count;
@@ -283,8 +278,6 @@ struct efx_rx_queue {
 	unsigned int alloc_skb_count;
 	struct timer_list slow_fill;
 	unsigned int slow_fill_count;
-
-	enum efx_flush_state flushed;
 };
 
 /**
@@ -681,6 +674,13 @@ struct efx_filter_state;
  * @loopback_mode: Loopback status
  * @loopback_modes: Supported loopback mode bitmask
  * @loopback_selftest: Offline self-test private state
+ * @drain_pending: Count of RX and TX queues that haven't been flushed and drained.
+ * @rxq_flush_pending: Count of receive queues that need to be flushed.
+ *	Decremented when efx_flush_rx_queue() is called.
+ * @rxq_flush_outstanding: Count of RX flushes started but not yet
+ *	completed (either success or failure). Not used when MCDI is used to
+ *	flush receive queues.
+ * @flush_wq: wait queue used by efx_nic_flush_queues() to wait for flush completions.
  * @monitor_work: Hardware monitor workitem
  * @biu_lock: BIU (bus interface unit) lock
  * @last_irq_cpu: Last CPU to handle a possible test interrupt.  This
@@ -778,6 +778,11 @@ struct efx_nic {
 
 	struct efx_filter_state *filter_state;
 
+	atomic_t drain_pending;
+	atomic_t rxq_flush_pending;
+	atomic_t rxq_flush_outstanding;
+	wait_queue_head_t flush_wq;
+
 	/* The following fields may be written more often */
 
 	struct delayed_work monitor_work ____cacheline_aligned_in_smp;
@@ -956,13 +961,6 @@ static inline bool efx_tx_queue_used(struct efx_tx_queue *tx_queue)
 	     _tx_queue < (_channel)->tx_queue + EFX_TXQ_TYPES;		\
 	     _tx_queue++)
 
-static inline struct efx_rx_queue *
-efx_get_rx_queue(struct efx_nic *efx, unsigned index)
-{
-	EFX_BUG_ON_PARANOID(index >= efx->n_rx_channels);
-	return &efx->channel[index]->rx_queue;
-}
-
 static inline bool efx_channel_has_rx_queue(struct efx_channel *channel)
 {
 	return channel->channel < channel->efx->n_rx_channels;
diff --git a/drivers/net/ethernet/sfc/nic.c b/drivers/net/ethernet/sfc/nic.c
index 2b33afd..03e4125 100644
--- a/drivers/net/ethernet/sfc/nic.c
+++ b/drivers/net/ethernet/sfc/nic.c
@@ -49,17 +49,14 @@
 #define EFX_INT_ERROR_EXPIRE 3600
 #define EFX_MAX_INT_ERRORS 5
 
-/* We poll for events every FLUSH_INTERVAL ms, and check FLUSH_POLL_COUNT times
- */
-#define EFX_FLUSH_INTERVAL 10
-#define EFX_FLUSH_POLL_COUNT 100
-
 /* Depth of RX flush request fifo */
 #define EFX_RX_FLUSH_COUNT 4
 
 /* Driver generated events */
 #define _EFX_CHANNEL_MAGIC_TEST		0x000101
 #define _EFX_CHANNEL_MAGIC_FILL		0x000102
+#define _EFX_CHANNEL_MAGIC_RX_DRAIN	0x000103
+#define _EFX_CHANNEL_MAGIC_TX_DRAIN	0x000104
 
 #define _EFX_CHANNEL_MAGIC(_code, _data)	((_code) << 8 | (_data))
 #define _EFX_CHANNEL_MAGIC_CODE(_magic)		((_magic) >> 8)
@@ -69,6 +66,12 @@
 #define EFX_CHANNEL_MAGIC_FILL(_rx_queue)				\
 	_EFX_CHANNEL_MAGIC(_EFX_CHANNEL_MAGIC_FILL,			\
 			   efx_rx_queue_index(_rx_queue))
+#define EFX_CHANNEL_MAGIC_RX_DRAIN(_rx_queue)				\
+	_EFX_CHANNEL_MAGIC(_EFX_CHANNEL_MAGIC_RX_DRAIN,			\
+			   efx_rx_queue_index(_rx_queue))
+#define EFX_CHANNEL_MAGIC_TX_DRAIN(_tx_queue)				\
+	_EFX_CHANNEL_MAGIC(_EFX_CHANNEL_MAGIC_TX_DRAIN,			\
+			   (_tx_queue)->queue)
 
 /**************************************************************************
  *
@@ -432,8 +435,6 @@ void efx_nic_init_tx(struct efx_tx_queue *tx_queue)
 	struct efx_nic *efx = tx_queue->efx;
 	efx_oword_t reg;
 
-	tx_queue->flushed = FLUSH_NONE;
-
 	/* Pin TX descriptor ring */
 	efx_init_special_buffer(efx, &tx_queue->txd);
 
@@ -490,9 +491,6 @@ static void efx_flush_tx_queue(struct efx_tx_queue *tx_queue)
 	struct efx_nic *efx = tx_queue->efx;
 	efx_oword_t tx_flush_descq;
 
-	tx_queue->flushed = FLUSH_PENDING;
-
-	/* Post a flush command */
 	EFX_POPULATE_OWORD_2(tx_flush_descq,
 			     FRF_AZ_TX_FLUSH_DESCQ_CMD, 1,
 			     FRF_AZ_TX_FLUSH_DESCQ, tx_queue->queue);
@@ -504,9 +502,6 @@ void efx_nic_fini_tx(struct efx_tx_queue *tx_queue)
 	struct efx_nic *efx = tx_queue->efx;
 	efx_oword_t tx_desc_ptr;
 
-	/* The queue should have been flushed */
-	WARN_ON(tx_queue->flushed != FLUSH_DONE);
-
 	/* Remove TX descriptor ring from card */
 	EFX_ZERO_OWORD(tx_desc_ptr);
 	efx_writeo_table(efx, &tx_desc_ptr, efx->type->txd_ptr_tbl_base,
@@ -597,8 +592,6 @@ void efx_nic_init_rx(struct efx_rx_queue *rx_queue)
 		  efx_rx_queue_index(rx_queue), rx_queue->rxd.index,
 		  rx_queue->rxd.index + rx_queue->rxd.entries - 1);
 
-	rx_queue->flushed = FLUSH_NONE;
-
 	/* Pin RX descriptor ring */
 	efx_init_special_buffer(efx, &rx_queue->rxd);
 
@@ -627,9 +620,6 @@ static void efx_flush_rx_queue(struct efx_rx_queue *rx_queue)
 	struct efx_nic *efx = rx_queue->efx;
 	efx_oword_t rx_flush_descq;
 
-	rx_queue->flushed = FLUSH_PENDING;
-
-	/* Post a flush command */
 	EFX_POPULATE_OWORD_2(rx_flush_descq,
 			     FRF_AZ_RX_FLUSH_DESCQ_CMD, 1,
 			     FRF_AZ_RX_FLUSH_DESCQ,
@@ -642,9 +632,6 @@ void efx_nic_fini_rx(struct efx_rx_queue *rx_queue)
 	efx_oword_t rx_desc_ptr;
 	struct efx_nic *efx = rx_queue->efx;
 
-	/* The queue should already have been flushed */
-	WARN_ON(rx_queue->flushed != FLUSH_DONE);
-
 	/* Remove RX descriptor ring from card */
 	EFX_ZERO_OWORD(rx_desc_ptr);
 	efx_writeo_table(efx, &rx_desc_ptr, efx->type->rxd_ptr_tbl_base,
@@ -662,6 +649,89 @@ void efx_nic_remove_rx(struct efx_rx_queue *rx_queue)
 
 /**************************************************************************
  *
+ * Flush handling
+ *
+ **************************************************************************/
+
+/* efx_nic_flush_queues() must be woken up when all flushes are completed,
+ * or more RX flushes can be kicked off.
+ */
+static bool efx_flush_wake(struct efx_nic *efx)
+{
+	/* Ensure that all updates are visible to efx_nic_flush_queues() */
+	smp_mb();
+
+	return (atomic_read(&efx->drain_pending) == 0 ||
+		(atomic_read(&efx->rxq_flush_outstanding) < EFX_RX_FLUSH_COUNT
+		 && atomic_read(&efx->rxq_flush_pending) > 0));
+}
+
+/* Flush all the transmit queues, and continue flushing receive queues until
+ * they're all flushed. Wait for the DRAIN events to be received so that there
+ * are no more RX and TX events left on any channel. */
+int efx_nic_flush_queues(struct efx_nic *efx)
+{
+	unsigned timeout = msecs_to_jiffies(5000); /* 5s for all flushes and drains */
+	struct efx_channel *channel;
+	struct efx_rx_queue *rx_queue;
+	struct efx_tx_queue *tx_queue;
+	int rc = 0;
+
+	efx->type->prepare_flush(efx);
+
+	efx_for_each_channel(channel, efx) {
+		efx_for_each_channel_tx_queue(tx_queue, channel) {
+			atomic_inc(&efx->drain_pending);
+			efx_flush_tx_queue(tx_queue);
+		}
+		efx_for_each_channel_rx_queue(rx_queue, channel) {
+			atomic_inc(&efx->drain_pending);
+			rx_queue->flush_pending = true;
+			atomic_inc(&efx->rxq_flush_pending);
+		}
+	}
+
+	while (timeout && atomic_read(&efx->drain_pending) > 0) {
+		/* The hardware supports four concurrent rx flushes, each of
+		 * which may need to be retried if there is an outstanding
+		 * descriptor fetch
+		 */
+		efx_for_each_channel(channel, efx) {
+			efx_for_each_channel_rx_queue(rx_queue, channel) {
+				if (atomic_read(&efx->rxq_flush_outstanding) >=
+				    EFX_RX_FLUSH_COUNT)
+					break;
+
+				if (rx_queue->flush_pending) {
+					rx_queue->flush_pending = false;
+					atomic_dec(&efx->rxq_flush_pending);
+					atomic_inc(&efx->rxq_flush_outstanding);
+					efx_flush_rx_queue(rx_queue);
+				}
+			}
+		}
+
+		timeout = wait_event_timeout(efx->flush_wq, efx_flush_wake(efx),
+					     timeout);
+	}
+
+	if (atomic_read(&efx->drain_pending)) {
+		netif_err(efx, hw, efx->net_dev, "failed to flush %d queues "
+			  "(rx %d+%d)\n", atomic_read(&efx->drain_pending),
+			  atomic_read(&efx->rxq_flush_outstanding),
+			  atomic_read(&efx->rxq_flush_pending));
+		rc = -ETIMEDOUT;
+
+		atomic_set(&efx->drain_pending, 0);
+		atomic_set(&efx->rxq_flush_pending, 0);
+		atomic_set(&efx->rxq_flush_outstanding, 0);
+	}
+
+	return rc;
+}
+
+/**************************************************************************
+ *
  * Event queue processing
  * Event queues are processed by per-channel tasklets.
  *
@@ -722,6 +792,9 @@ efx_handle_tx_event(struct efx_channel *channel, efx_qword_t *event)
 	struct efx_nic *efx = channel->efx;
 	int tx_packets = 0;
 
+	if (unlikely(ACCESS_ONCE(efx->reset_pending)))
+		return 0;
+
 	if (likely(EFX_QWORD_FIELD(*event, FSF_AZ_TX_EV_COMP))) {
 		/* Transmit completion */
 		tx_ev_desc_ptr = EFX_QWORD_FIELD(*event, FSF_AZ_TX_EV_DESC_PTR);
@@ -863,6 +936,10 @@ efx_handle_rx_event(struct efx_channel *channel, const efx_qword_t *event)
 	bool rx_ev_pkt_ok;
 	u16 flags;
 	struct efx_rx_queue *rx_queue;
+	struct efx_nic *efx = channel->efx;
+
+	if (unlikely(ACCESS_ONCE(efx->reset_pending)))
+		return;
 
 	/* Basic packet information */
 	rx_ev_byte_cnt = EFX_QWORD_FIELD(*event, FSF_AZ_RX_EV_BYTE_CNT);
@@ -909,6 +986,72 @@ efx_handle_rx_event(struct efx_channel *channel, const efx_qword_t *event)
 	efx_rx_packet(rx_queue, rx_ev_desc_ptr, rx_ev_byte_cnt, flags);
 }
 
+/* If this flush done event corresponds to a &struct efx_tx_queue, then
+ * send an %EFX_CHANNEL_MAGIC_TX_DRAIN event to drain the event queue
+ * of all transmit completions.
+ */
+static void
+efx_handle_tx_flush_done(struct efx_nic *efx, efx_qword_t *event)
+{
+	struct efx_tx_queue *tx_queue;
+	int qid;
+
+	qid = EFX_QWORD_FIELD(*event, FSF_AZ_DRIVER_EV_SUBDATA);
+	if (qid < EFX_TXQ_TYPES * efx->n_tx_channels) {
+		tx_queue = efx_get_tx_queue(efx, qid / EFX_TXQ_TYPES,
+					    qid % EFX_TXQ_TYPES);
+
+		efx_magic_event(tx_queue->channel,
+				EFX_CHANNEL_MAGIC_TX_DRAIN(tx_queue));
+	}
+}
+
+/* If this flush done event corresponds to a &struct efx_rx_queue: If the flush
+ * was successful then send an %EFX_CHANNEL_MAGIC_RX_DRAIN, otherwise add
+ * the RX queue back to the mask of RX queues in need of flushing.
+ */
+static void
+efx_handle_rx_flush_done(struct efx_nic *efx, efx_qword_t *event)
+{
+	struct efx_channel *channel;
+	struct efx_rx_queue *rx_queue;
+	int qid;
+	bool failed;
+
+	qid = EFX_QWORD_FIELD(*event, FSF_AZ_DRIVER_EV_RX_DESCQ_ID);
+	failed = EFX_QWORD_FIELD(*event, FSF_AZ_DRIVER_EV_RX_FLUSH_FAIL);
+	if (qid >= efx->n_channels)
+		return;
+	channel = efx_get_channel(efx, qid);
+	if (!efx_channel_has_rx_queue(channel))
+		return;
+	rx_queue = efx_channel_get_rx_queue(channel);
+
+	if (failed) {
+		netif_info(efx, hw, efx->net_dev,
+			   "RXQ %d flush retry\n", qid);
+		rx_queue->flush_pending = true;
+		atomic_inc(&efx->rxq_flush_pending);
+	} else {
+		efx_magic_event(efx_rx_queue_channel(rx_queue),
+				EFX_CHANNEL_MAGIC_RX_DRAIN(rx_queue));
+	}
+	atomic_dec(&efx->rxq_flush_outstanding);
+	if (efx_flush_wake(efx))
+		wake_up(&efx->flush_wq);
+}
+
+static void
+efx_handle_drain_event(struct efx_channel *channel)
+{
+	struct efx_nic *efx = channel->efx;
+
+	WARN_ON(atomic_read(&efx->drain_pending) == 0);
+	atomic_dec(&efx->drain_pending);
+	if (efx_flush_wake(efx))
+		wake_up(&efx->flush_wq);
+}
+
 static void
 efx_handle_generated_event(struct efx_channel *channel, efx_qword_t *event)
 {
@@ -916,21 +1059,28 @@ efx_handle_generated_event(struct efx_channel *channel, efx_qword_t *event)
 	struct efx_rx_queue *rx_queue =
 		efx_channel_has_rx_queue(channel) ?
 		efx_channel_get_rx_queue(channel) : NULL;
-	unsigned magic;
+	unsigned magic, code;
 
 	magic = EFX_QWORD_FIELD(*event, FSF_AZ_DRV_GEN_EV_MAGIC);
+	code = _EFX_CHANNEL_MAGIC_CODE(magic);
 
-	if (magic == EFX_CHANNEL_MAGIC_TEST(channel))
-		; /* ignore */
-	else if (rx_queue && magic == EFX_CHANNEL_MAGIC_FILL(rx_queue))
+	if (magic == EFX_CHANNEL_MAGIC_TEST(channel)) {
+		/* ignore */
+	} else if (rx_queue && magic == EFX_CHANNEL_MAGIC_FILL(rx_queue)) {
 		/* The queue must be empty, so we won't receive any rx
 		 * events, so efx_process_channel() won't refill the
 		 * queue. Refill it here */
 		efx_fast_push_rx_descriptors(rx_queue);
-	else
+	} else if (rx_queue && magic == EFX_CHANNEL_MAGIC_RX_DRAIN(rx_queue)) {
+		rx_queue->enabled = false;
+		efx_handle_drain_event(channel);
+	} else if (code == _EFX_CHANNEL_MAGIC_TX_DRAIN) {
+		efx_handle_drain_event(channel);
+	} else {
 		netif_dbg(efx, hw, efx->net_dev, "channel %d received "
 			  "generated event "EFX_QWORD_FMT"\n",
 			  channel->channel, EFX_QWORD_VAL(*event));
+	}
 }
 
 static void
@@ -947,10 +1097,12 @@ efx_handle_driver_event(struct efx_channel *channel, efx_qword_t *event)
 	case FSE_AZ_TX_DESCQ_FLS_DONE_EV:
 		netif_vdbg(efx, hw, efx->net_dev, "channel %d TXQ %d flushed\n",
 			   channel->channel, ev_sub_data);
+		efx_handle_tx_flush_done(efx, event);
 		break;
 	case FSE_AZ_RX_DESCQ_FLS_DONE_EV:
 		netif_vdbg(efx, hw, efx->net_dev, "channel %d RXQ %d flushed\n",
 			   channel->channel, ev_sub_data);
+		efx_handle_rx_flush_done(efx, event);
 		break;
 	case FSE_AZ_EVQ_INIT_DONE_EV:
 		netif_dbg(efx, hw, efx->net_dev,
@@ -1162,143 +1314,6 @@ void efx_nic_generate_fill_event(struct efx_rx_queue *rx_queue)
 
 /**************************************************************************
  *
- * Flush handling
- *
- **************************************************************************/
-
-
-static void efx_poll_flush_events(struct efx_nic *efx)
-{
-	struct efx_channel *channel = efx_get_channel(efx, 0);
-	struct efx_tx_queue *tx_queue;
-	struct efx_rx_queue *rx_queue;
-	unsigned int read_ptr = channel->eventq_read_ptr;
-	unsigned int end_ptr = read_ptr + channel->eventq_mask - 1;
-
-	do {
-		efx_qword_t *event = efx_event(channel, read_ptr);
-		int ev_code, ev_sub_code, ev_queue;
-		bool ev_failed;
-
-		if (!efx_event_present(event))
-			break;
-
-		ev_code = EFX_QWORD_FIELD(*event, FSF_AZ_EV_CODE);
-		ev_sub_code = EFX_QWORD_FIELD(*event,
-					      FSF_AZ_DRIVER_EV_SUBCODE);
-		if (ev_code == FSE_AZ_EV_CODE_DRIVER_EV &&
-		    ev_sub_code == FSE_AZ_TX_DESCQ_FLS_DONE_EV) {
-			ev_queue = EFX_QWORD_FIELD(*event,
-						   FSF_AZ_DRIVER_EV_SUBDATA);
-			if (ev_queue < EFX_TXQ_TYPES * efx->n_tx_channels) {
-				tx_queue = efx_get_tx_queue(
-					efx, ev_queue / EFX_TXQ_TYPES,
-					ev_queue % EFX_TXQ_TYPES);
-				tx_queue->flushed = FLUSH_DONE;
-			}
-		} else if (ev_code == FSE_AZ_EV_CODE_DRIVER_EV &&
-			   ev_sub_code == FSE_AZ_RX_DESCQ_FLS_DONE_EV) {
-			ev_queue = EFX_QWORD_FIELD(
-				*event, FSF_AZ_DRIVER_EV_RX_DESCQ_ID);
-			ev_failed = EFX_QWORD_FIELD(
-				*event, FSF_AZ_DRIVER_EV_RX_FLUSH_FAIL);
-			if (ev_queue < efx->n_rx_channels) {
-				rx_queue = efx_get_rx_queue(efx, ev_queue);
-				rx_queue->flushed =
-					ev_failed ? FLUSH_FAILED : FLUSH_DONE;
-			}
-		}
-
-		/* We're about to destroy the queue anyway, so
-		 * it's ok to throw away every non-flush event */
-		EFX_SET_QWORD(*event);
-
-		++read_ptr;
-	} while (read_ptr != end_ptr);
-
-	channel->eventq_read_ptr = read_ptr;
-}
-
-/* Handle tx and rx flushes at the same time, since they run in
- * parallel in the hardware and there's no reason for us to
- * serialise them */
-int efx_nic_flush_queues(struct efx_nic *efx)
-{
-	struct efx_channel *channel;
-	struct efx_rx_queue *rx_queue;
-	struct efx_tx_queue *tx_queue;
-	int i, tx_pending, rx_pending;
-
-	/* If necessary prepare the hardware for flushing */
-	efx->type->prepare_flush(efx);
-
-	/* Flush all tx queues in parallel */
-	efx_for_each_channel(channel, efx) {
-		efx_for_each_possible_channel_tx_queue(tx_queue, channel) {
-			if (tx_queue->initialised)
-				efx_flush_tx_queue(tx_queue);
-		}
-	}
-
-	/* The hardware supports four concurrent rx flushes, each of which may
-	 * need to be retried if there is an outstanding descriptor fetch */
-	for (i = 0; i < EFX_FLUSH_POLL_COUNT; ++i) {
-		rx_pending = tx_pending = 0;
-		efx_for_each_channel(channel, efx) {
-			efx_for_each_channel_rx_queue(rx_queue, channel) {
-				if (rx_queue->flushed == FLUSH_PENDING)
-					++rx_pending;
-			}
-		}
-		efx_for_each_channel(channel, efx) {
-			efx_for_each_channel_rx_queue(rx_queue, channel) {
-				if (rx_pending == EFX_RX_FLUSH_COUNT)
-					break;
-				if (rx_queue->flushed == FLUSH_FAILED ||
-				    rx_queue->flushed == FLUSH_NONE) {
-					efx_flush_rx_queue(rx_queue);
-					++rx_pending;
-				}
-			}
-			efx_for_each_possible_channel_tx_queue(tx_queue, channel) {
-				if (tx_queue->initialised &&
-				    tx_queue->flushed != FLUSH_DONE)
-					++tx_pending;
-			}
-		}
-
-		if (rx_pending == 0 && tx_pending == 0)
-			return 0;
-
-		msleep(EFX_FLUSH_INTERVAL);
-		efx_poll_flush_events(efx);
-	}
-
-	/* Mark the queues as all flushed. We're going to return failure
-	 * leading to a reset, or fake up success anyway */
-	efx_for_each_channel(channel, efx) {
-		efx_for_each_possible_channel_tx_queue(tx_queue, channel) {
-			if (tx_queue->initialised &&
-			    tx_queue->flushed != FLUSH_DONE)
-				netif_err(efx, hw, efx->net_dev,
-					  "tx queue %d flush command timed out\n",
-					  tx_queue->queue);
-			tx_queue->flushed = FLUSH_DONE;
-		}
-		efx_for_each_channel_rx_queue(rx_queue, channel) {
-			if (rx_queue->flushed != FLUSH_DONE)
-				netif_err(efx, hw, efx->net_dev,
-					  "rx queue %d flush command timed out\n",
-					  efx_rx_queue_index(rx_queue));
-			rx_queue->flushed = FLUSH_DONE;
-		}
-	}
-
-	return -ETIMEDOUT;
-}
-
-/**************************************************************************
- *
  * Hardware interrupts
  * The hardware interrupt handler does very little work; all the event
  * queue processing is carried out by per-channel tasklets.
@@ -1320,18 +1335,10 @@ static inline void efx_nic_interrupts(struct efx_nic *efx,
 
 void efx_nic_enable_interrupts(struct efx_nic *efx)
 {
-	struct efx_channel *channel;
-
 	EFX_ZERO_OWORD(*((efx_oword_t *) efx->irq_status.addr));
 	wmb(); /* Ensure interrupt vector is clear before interrupts enabled */
 
-	/* Enable interrupts */
 	efx_nic_interrupts(efx, true, false);
-
-	/* Force processing of all the channels to get the EVQ RPTRs up to
-	   date */
-	efx_for_each_channel(channel, efx)
-		efx_schedule_channel(channel);
 }
 
 void efx_nic_disable_interrupts(struct efx_nic *efx)
diff --git a/drivers/net/ethernet/sfc/rx.c b/drivers/net/ethernet/sfc/rx.c
index 76fb521..506d246 100644
--- a/drivers/net/ethernet/sfc/rx.c
+++ b/drivers/net/ethernet/sfc/rx.c
@@ -705,6 +705,7 @@ void efx_init_rx_queue(struct efx_rx_queue *rx_queue)
 	rx_queue->fast_fill_limit = limit;
 
 	/* Set up RX descriptor ring */
+	rx_queue->enabled = true;
 	efx_nic_init_rx(rx_queue);
 }
 
@@ -716,6 +717,9 @@ void efx_fini_rx_queue(struct efx_rx_queue *rx_queue)
 	netif_dbg(rx_queue->efx, drv, rx_queue->efx->net_dev,
 		  "shutting down RX queue %d\n", efx_rx_queue_index(rx_queue));
 
+	/* A flush failure might have left rx_queue->enabled */
+	rx_queue->enabled = false;
+
 	del_timer_sync(&rx_queue->slow_fill);
 	efx_nic_fini_rx(rx_queue);
 
-- 
1.7.7.6



-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* [PATCH net-next 11/19] sfc: Use proper function to test for RX channel in efx_poll()
  2012-02-16  0:42 pull request: sfc-next 2012-02-16 Ben Hutchings
                   ` (9 preceding siblings ...)
  2012-02-16  0:48 ` [PATCH net-next 10/19] sfc: Leave interrupts and event queues enabled whenever we can Ben Hutchings
@ 2012-02-16  0:48 ` Ben Hutchings
  2012-02-16  0:48 ` [PATCH net-next 12/19] sfc: Generalise event generation to cover VF-owned event queues Ben Hutchings
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Ben Hutchings @ 2012-02-16  0:48 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/ethernet/sfc/efx.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index ccfb43c..eda2272 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -279,7 +279,7 @@ static int efx_poll(struct napi_struct *napi, int budget)
 	spent = efx_process_channel(channel, budget);
 
 	if (spent < budget) {
-		if (channel->channel < efx->n_rx_channels &&
+		if (efx_channel_has_rx_queue(channel) &&
 		    efx->irq_rx_adaptive &&
 		    unlikely(++channel->irq_count == 1000)) {
 			if (unlikely(channel->irq_mod_score <
-- 
1.7.7.6



-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* [PATCH net-next 12/19] sfc: Generalise event generation to cover VF-owned event queues
  2012-02-16  0:42 pull request: sfc-next 2012-02-16 Ben Hutchings
                   ` (10 preceding siblings ...)
  2012-02-16  0:48 ` [PATCH net-next 11/19] sfc: Use proper function to test for RX channel in efx_poll() Ben Hutchings
@ 2012-02-16  0:48 ` Ben Hutchings
  2012-02-16  0:49 ` [PATCH net-next 13/19] sfc: Disable flow control during flushes Ben Hutchings
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Ben Hutchings @ 2012-02-16  0:48 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers

For SR-IOV we will need to send events to event queues that belong to
VFs serviced by other drivers.  Change the parameters of
efx_generate_event() to allow this and declare it extern.

While we're at it, remove the existing declaration under the wrong
name efx_nic_generate_event().
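
As an illustration of the new interface, a caller can now target an
event queue purely by number, including one owned by a VF.  A minimal
sketch (poke_vf_evq(), the magic value and the vf_evq mapping are
invented for illustration, not taken from the SR-IOV code):

static void poke_vf_evq(struct efx_nic *efx, unsigned int vf_evq)
{
	efx_qword_t event;

	EFX_POPULATE_QWORD_2(event, FSF_AZ_EV_CODE,
			     FSE_AZ_EV_CODE_DRV_GEN_EV,
			     FSF_AZ_DRV_GEN_EV_MAGIC, 0x10001);
	/* vf_evq need not correspond to any of the PF's own channels */
	efx_generate_event(efx, vf_evq, &event);
}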

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/ethernet/sfc/nic.c |    9 +++++----
 drivers/net/ethernet/sfc/nic.h |    4 ++--
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/sfc/nic.c b/drivers/net/ethernet/sfc/nic.c
index 03e4125..539b574 100644
--- a/drivers/net/ethernet/sfc/nic.c
+++ b/drivers/net/ethernet/sfc/nic.c
@@ -754,7 +754,8 @@ void efx_nic_eventq_read_ack(struct efx_channel *channel)
 }
 
 /* Use HW to insert a SW defined event */
-static void efx_generate_event(struct efx_channel *channel, efx_qword_t *event)
+void efx_generate_event(struct efx_nic *efx, unsigned int evq,
+			efx_qword_t *event)
 {
 	efx_oword_t drv_ev_reg;
 
@@ -764,8 +765,8 @@ static void efx_generate_event(struct efx_channel *channel, efx_qword_t *event)
 	drv_ev_reg.u32[1] = event->u32[1];
 	drv_ev_reg.u32[2] = 0;
 	drv_ev_reg.u32[3] = 0;
-	EFX_SET_OWORD_FIELD(drv_ev_reg, FRF_AZ_DRV_EV_QID, channel->channel);
-	efx_writeo(channel->efx, &drv_ev_reg, FR_AZ_DRV_EV);
+	EFX_SET_OWORD_FIELD(drv_ev_reg, FRF_AZ_DRV_EV_QID, evq);
+	efx_writeo(efx, &drv_ev_reg, FR_AZ_DRV_EV);
 }
 
 static void efx_magic_event(struct efx_channel *channel, u32 magic)
@@ -775,7 +776,7 @@ static void efx_magic_event(struct efx_channel *channel, u32 magic)
 	EFX_POPULATE_QWORD_2(event, FSF_AZ_EV_CODE,
 			     FSE_AZ_EV_CODE_DRV_GEN_EV,
 			     FSF_AZ_DRV_GEN_EV_MAGIC, magic);
-	efx_generate_event(channel, &event);
+	efx_generate_event(channel->efx, channel->channel, &event);
 }
 
 /* Handle a transmit completion event
diff --git a/drivers/net/ethernet/sfc/nic.h b/drivers/net/ethernet/sfc/nic.h
index aeca4e8..1f53e2c 100644
--- a/drivers/net/ethernet/sfc/nic.h
+++ b/drivers/net/ethernet/sfc/nic.h
@@ -283,8 +283,8 @@ extern void efx_nic_get_regs(struct efx_nic *efx, void *buf);
 #define MAC_DATA_LBN 0
 #define MAC_DATA_WIDTH 32
 
-extern void efx_nic_generate_event(struct efx_channel *channel,
-				   efx_qword_t *event);
+extern void efx_generate_event(struct efx_nic *efx, unsigned int evq,
+			       efx_qword_t *event);
 
 extern void falcon_poll_xmac(struct efx_nic *efx);
 
-- 
1.7.7.6



-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* [PATCH net-next 13/19] sfc: Disable flow control during flushes
  2012-02-16  0:42 pull request: sfc-next 2012-02-16 Ben Hutchings
                   ` (11 preceding siblings ...)
  2012-02-16  0:48 ` [PATCH net-next 12/19] sfc: Generalise event generation to cover VF-owned event queues Ben Hutchings
@ 2012-02-16  0:49 ` Ben Hutchings
  2012-02-16  0:49 ` [PATCH net-next 14/19] sfc: Make buffer table indices and counts consistently unsigned Ben Hutchings
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Ben Hutchings @ 2012-02-16  0:49 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers

From: Steve Hodgson <shodgson@solarflare.com>

The TX DMA engine issues upstream read requests when there is room in
the TX FIFO for the completion. However, the fetches for the rest of
the packet might be delayed by any back pressure.  Since a flush must
wait for an EOP, the entire flush may be delayed by back pressure.

Mitigate this by disabling flow control before the flushes are
started.  Since PF and VF flushes run in parallel, introduce
fc_disable, a reference count of the number of flushes outstanding.

The same principle could be applied to Falcon, but that would bring
its own testing burden.
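
Schematically, fc_disable acts as a final override in
efx_mcdi_set_mac() (abridged from the mcdi_mac.c hunk below; the
mapping of wanted_fc to fcntl above this excerpt is elided):

	if (efx->wanted_fc & EFX_FC_AUTO)
		fcntl = MC_CMD_FCNTL_AUTO;
	if (efx->fc_disable)		/* non-zero while any flush runs */
		fcntl = MC_CMD_FCNTL_OFF;

	MCDI_SET_DWORD(cmdbytes, SET_MAC_IN_FCNTL, fcntl);

Because it is a count rather than a flag, concurrent PF and VF flushes
compose naturally: flow control is only restored when the last flush
drops the count back to zero.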

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/ethernet/sfc/mcdi_mac.c   |    2 ++
 drivers/net/ethernet/sfc/net_driver.h |    4 ++++
 drivers/net/ethernet/sfc/nic.c        |    3 +++
 3 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/sfc/mcdi_mac.c b/drivers/net/ethernet/sfc/mcdi_mac.c
index f67cf92..98afe1c 100644
--- a/drivers/net/ethernet/sfc/mcdi_mac.c
+++ b/drivers/net/ethernet/sfc/mcdi_mac.c
@@ -44,6 +44,8 @@ static int efx_mcdi_set_mac(struct efx_nic *efx)
 	}
 	if (efx->wanted_fc & EFX_FC_AUTO)
 		fcntl = MC_CMD_FCNTL_AUTO;
+	if (efx->fc_disable)
+		fcntl = MC_CMD_FCNTL_OFF;
 
 	MCDI_SET_DWORD(cmdbytes, SET_MAC_IN_FCNTL, fcntl);
 
diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
index 18c4b3c..f526877 100644
--- a/drivers/net/ethernet/sfc/net_driver.h
+++ b/drivers/net/ethernet/sfc/net_driver.h
@@ -670,6 +670,9 @@ struct efx_filter_state;
  * @promiscuous: Promiscuous flag. Protected by netif_tx_lock.
  * @multicast_hash: Multicast hash table
  * @wanted_fc: Wanted flow control flags
+ * @fc_disable: When non-zero, flow control is disabled. Typically used to
+ *	ensure that network back pressure doesn't delay dma queue flushes.
+ *	Serialised by the rtnl lock.
  * @mac_work: Work item for changing MAC promiscuity and multicast hash
  * @loopback_mode: Loopback status
  * @loopback_modes: Supported loopback mode bitmask
@@ -769,6 +772,7 @@ struct efx_nic {
 	bool promiscuous;
 	union efx_multicast_hash multicast_hash;
 	u8 wanted_fc;
+	unsigned fc_disable;
 
 	atomic_t rx_reset;
 	enum efx_loopback_mode loopback_mode;
diff --git a/drivers/net/ethernet/sfc/nic.c b/drivers/net/ethernet/sfc/nic.c
index 539b574..1bb41ff 100644
--- a/drivers/net/ethernet/sfc/nic.c
+++ b/drivers/net/ethernet/sfc/nic.c
@@ -677,6 +677,7 @@ int efx_nic_flush_queues(struct efx_nic *efx)
 	struct efx_tx_queue *tx_queue;
 	int rc = 0;
 
+	efx->fc_disable++;
 	efx->type->prepare_flush(efx);
 
 	efx_for_each_channel(channel, efx) {
@@ -727,6 +728,8 @@ int efx_nic_flush_queues(struct efx_nic *efx)
 		atomic_set(&efx->rxq_flush_outstanding, 0);
 	}
 
+	efx->fc_disable--;
+
 	return rc;
 }
 
-- 
1.7.7.6



-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* [PATCH net-next 14/19] sfc: Make buffer table indices and counts consistently unsigned
  2012-02-16  0:42 pull request: sfc-next 2012-02-16 Ben Hutchings
                   ` (12 preceding siblings ...)
  2012-02-16  0:49 ` [PATCH net-next 13/19] sfc: Disable flow control during flushes Ben Hutchings
@ 2012-02-16  0:49 ` Ben Hutchings
  2012-02-16  0:50 ` [PATCH net-next 15/19] sfc: Make all CPU/IRQ/channel/queue counts unsigned Ben Hutchings
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Ben Hutchings @ 2012-02-16  0:49 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/ethernet/sfc/net_driver.h |    4 ++--
 drivers/net/ethernet/sfc/nic.c        |    2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
index f526877..59c0583 100644
--- a/drivers/net/ethernet/sfc/net_driver.h
+++ b/drivers/net/ethernet/sfc/net_driver.h
@@ -81,8 +81,8 @@ struct efx_special_buffer {
 	void *addr;
 	dma_addr_t dma_addr;
 	unsigned int len;
-	int index;
-	int entries;
+	unsigned int index;
+	unsigned int entries;
 };
 
 /**
diff --git a/drivers/net/ethernet/sfc/nic.c b/drivers/net/ethernet/sfc/nic.c
index 1bb41ff..2bdfb63 100644
--- a/drivers/net/ethernet/sfc/nic.c
+++ b/drivers/net/ethernet/sfc/nic.c
@@ -192,7 +192,7 @@ static void
 efx_init_special_buffer(struct efx_nic *efx, struct efx_special_buffer *buffer)
 {
 	efx_qword_t buf_desc;
-	int index;
+	unsigned int index;
 	dma_addr_t dma_addr;
 	int i;
 
-- 
1.7.7.6



-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* [PATCH net-next 15/19] sfc: Make all CPU/IRQ/channel/queue counts unsigned
  2012-02-16  0:42 pull request: sfc-next 2012-02-16 Ben Hutchings
                   ` (13 preceding siblings ...)
  2012-02-16  0:49 ` [PATCH net-next 14/19] sfc: Make buffer table indices and counts consistently unsigned Ben Hutchings
@ 2012-02-16  0:50 ` Ben Hutchings
  2012-02-16  0:50 ` [PATCH net-next 16/19] sfc: Add support for 'extra' channel types Ben Hutchings
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Ben Hutchings @ 2012-02-16  0:50 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers, Shradha Shah

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/ethernet/sfc/efx.c        |   18 ++++++++++--------
 drivers/net/ethernet/sfc/net_driver.h |    2 +-
 2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index eda2272..3a103f8 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -1104,10 +1104,10 @@ static void efx_fini_io(struct efx_nic *efx)
 	pci_disable_device(efx->pci_dev);
 }
 
-static int efx_wanted_parallelism(void)
+static unsigned int efx_wanted_parallelism(void)
 {
 	cpumask_var_t thread_mask;
-	int count;
+	unsigned int count;
 	int cpu;
 
 	if (rss_cpus)
@@ -1136,7 +1136,8 @@ static int
 efx_init_rx_cpu_rmap(struct efx_nic *efx, struct msix_entry *xentries)
 {
 #ifdef CONFIG_RFS_ACCEL
-	int i, rc;
+	unsigned int i;
+	int rc;
 
 	efx->net_dev->rx_cpu_rmap = alloc_irq_cpu_rmap(efx->n_rx_channels);
 	if (!efx->net_dev->rx_cpu_rmap)
@@ -1159,13 +1160,14 @@ efx_init_rx_cpu_rmap(struct efx_nic *efx, struct msix_entry *xentries)
  */
 static int efx_probe_interrupts(struct efx_nic *efx)
 {
-	int max_channels =
-		min_t(int, efx->type->phys_addr_channels, EFX_MAX_CHANNELS);
-	int rc, i;
+	unsigned int max_channels =
+		min(efx->type->phys_addr_channels, EFX_MAX_CHANNELS);
+	unsigned int i;
+	int rc;
 
 	if (efx->interrupt_mode == EFX_INT_MODE_MSIX) {
 		struct msix_entry xentries[EFX_MAX_CHANNELS];
-		int n_channels;
+		unsigned int n_channels;
 
 		n_channels = efx_wanted_parallelism();
 		if (separate_tx_channels)
@@ -1178,7 +1180,7 @@ static int efx_probe_interrupts(struct efx_nic *efx)
 		if (rc > 0) {
 			netif_err(efx, drv, efx->net_dev,
 				  "WARNING: Insufficient MSI-X vectors"
-				  " available (%d < %d).\n", rc, n_channels);
+				  " available (%d < %u).\n", rc, n_channels);
 			netif_err(efx, drv, efx->net_dev,
 				  "WARNING: Performance may be reduced.\n");
 			EFX_BUG_ON_PARANOID(rc >= n_channels);
diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
index 59c0583..3260e71 100644
--- a/drivers/net/ethernet/sfc/net_driver.h
+++ b/drivers/net/ethernet/sfc/net_driver.h
@@ -52,7 +52,7 @@
  *
  **************************************************************************/
 
-#define EFX_MAX_CHANNELS 32
+#define EFX_MAX_CHANNELS 32U
 #define EFX_MAX_RX_QUEUES EFX_MAX_CHANNELS
 
 /* Checksum generation is a per-queue option in hardware, so each
-- 
1.7.7.6



-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* [PATCH net-next 16/19] sfc: Add support for 'extra' channel types
  2012-02-16  0:42 pull request: sfc-next 2012-02-16 Ben Hutchings
                   ` (14 preceding siblings ...)
  2012-02-16  0:50 ` [PATCH net-next 15/19] sfc: Make all CPU/IRQ/channel/queue counts unsigned Ben Hutchings
@ 2012-02-16  0:50 ` Ben Hutchings
  2012-02-16  0:51 ` [PATCH net-next 17/19] sfc: Pass NIC structure into efx_wanted_parallelism() Ben Hutchings
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Ben Hutchings @ 2012-02-16  0:50 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers

Abstract some of the channel operations to allow for 'extra'
channels that do not have RX or TX queues; a sketch of such a channel
type follows the list below.

- Try to assign a channel to each extra channel type that is enabled
  for the NIC, but gracefully degrade if we can't allocate sufficient
  MSI-X vectors
- Allow each extra channel type to generate its own channel name
- Allow channel types to disable reallocation and reinitialisation
  of their channels
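
A sketch of what an extra channel type might look like (the handler
names and bodies here are invented placeholders; the real user is the
SR-IOV code added later in this series):

static void example_handle_no_channel(struct efx_nic *efx)
{
	netif_warn(efx, probe, efx->net_dev,
		   "no channel available for example function\n");
}

static void
example_get_name(struct efx_channel *channel, char *buf, size_t len)
{
	snprintf(buf, len, "%s-example", channel->efx->name);
}

static const struct efx_channel_type efx_example_channel_type = {
	.handle_no_channel	= example_handle_no_channel,
	.pre_probe		= efx_channel_dummy_op_int,
	.get_name		= example_get_name,
	/* .copy left NULL: efx_realloc_channels() must preserve it */
	.keep_eventq		= true,
};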

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/ethernet/sfc/efx.c        |  312 +++++++++++++++++++++------------
 drivers/net/ethernet/sfc/efx.h        |    1 +
 drivers/net/ethernet/sfc/net_driver.h |   34 ++++
 3 files changed, 239 insertions(+), 108 deletions(-)

diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index 3a103f8..0a9ab49 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -186,11 +186,13 @@ MODULE_PARM_DESC(debug, "Bitmapped debugging message enable value");
  *
  *************************************************************************/
 
-static void efx_start_interrupts(struct efx_nic *efx);
-static void efx_stop_interrupts(struct efx_nic *efx);
+static void efx_start_interrupts(struct efx_nic *efx, bool may_keep_eventq);
+static void efx_stop_interrupts(struct efx_nic *efx, bool may_keep_eventq);
+static void efx_remove_channel(struct efx_channel *channel);
 static void efx_remove_channels(struct efx_nic *efx);
+static const struct efx_channel_type efx_default_channel_type;
 static void efx_remove_port(struct efx_nic *efx);
-static void efx_init_napi(struct efx_nic *efx);
+static void efx_init_napi_channel(struct efx_channel *channel);
 static void efx_fini_napi(struct efx_nic *efx);
 static void efx_fini_napi_channel(struct efx_channel *channel);
 static void efx_fini_struct(struct efx_nic *efx);
@@ -439,8 +441,7 @@ static void efx_remove_eventq(struct efx_channel *channel)
  *
  *************************************************************************/
 
-/* Allocate and initialise a channel structure, optionally copying
- * parameters (but not resources) from an old channel structure. */
+/* Allocate and initialise a channel structure. */
 static struct efx_channel *
 efx_alloc_channel(struct efx_nic *efx, int i, struct efx_channel *old_channel)
 {
@@ -449,45 +450,60 @@ efx_alloc_channel(struct efx_nic *efx, int i, struct efx_channel *old_channel)
 	struct efx_tx_queue *tx_queue;
 	int j;
 
-	if (old_channel) {
-		channel = kmalloc(sizeof(*channel), GFP_KERNEL);
-		if (!channel)
-			return NULL;
+	channel = kzalloc(sizeof(*channel), GFP_KERNEL);
+	if (!channel)
+		return NULL;
 
-		*channel = *old_channel;
+	channel->efx = efx;
+	channel->channel = i;
+	channel->type = &efx_default_channel_type;
 
-		channel->napi_dev = NULL;
-		memset(&channel->eventq, 0, sizeof(channel->eventq));
+	for (j = 0; j < EFX_TXQ_TYPES; j++) {
+		tx_queue = &channel->tx_queue[j];
+		tx_queue->efx = efx;
+		tx_queue->queue = i * EFX_TXQ_TYPES + j;
+		tx_queue->channel = channel;
+	}
 
-		rx_queue = &channel->rx_queue;
-		rx_queue->buffer = NULL;
-		memset(&rx_queue->rxd, 0, sizeof(rx_queue->rxd));
+	rx_queue = &channel->rx_queue;
+	rx_queue->efx = efx;
+	setup_timer(&rx_queue->slow_fill, efx_rx_slow_fill,
+		    (unsigned long)rx_queue);
 
-		for (j = 0; j < EFX_TXQ_TYPES; j++) {
-			tx_queue = &channel->tx_queue[j];
-			if (tx_queue->channel)
-				tx_queue->channel = channel;
-			tx_queue->buffer = NULL;
-			memset(&tx_queue->txd, 0, sizeof(tx_queue->txd));
-		}
-	} else {
-		channel = kzalloc(sizeof(*channel), GFP_KERNEL);
-		if (!channel)
-			return NULL;
+	return channel;
+}
+
+/* Allocate and initialise a channel structure, copying parameters
+ * (but not resources) from an old channel structure.
+ */
+static struct efx_channel *
+efx_copy_channel(const struct efx_channel *old_channel)
+{
+	struct efx_channel *channel;
+	struct efx_rx_queue *rx_queue;
+	struct efx_tx_queue *tx_queue;
+	int j;
 
-		channel->efx = efx;
-		channel->channel = i;
+	channel = kmalloc(sizeof(*channel), GFP_KERNEL);
+	if (!channel)
+		return NULL;
+
+	*channel = *old_channel;
+
+	channel->napi_dev = NULL;
+	memset(&channel->eventq, 0, sizeof(channel->eventq));
 
-		for (j = 0; j < EFX_TXQ_TYPES; j++) {
-			tx_queue = &channel->tx_queue[j];
-			tx_queue->efx = efx;
-			tx_queue->queue = i * EFX_TXQ_TYPES + j;
+	for (j = 0; j < EFX_TXQ_TYPES; j++) {
+		tx_queue = &channel->tx_queue[j];
+		if (tx_queue->channel)
 			tx_queue->channel = channel;
-		}
+		tx_queue->buffer = NULL;
+		memset(&tx_queue->txd, 0, sizeof(tx_queue->txd));
 	}
 
 	rx_queue = &channel->rx_queue;
-	rx_queue->efx = efx;
+	rx_queue->buffer = NULL;
+	memset(&rx_queue->rxd, 0, sizeof(rx_queue->rxd));
 	setup_timer(&rx_queue->slow_fill, efx_rx_slow_fill,
 		    (unsigned long)rx_queue);
 
@@ -503,57 +519,62 @@ static int efx_probe_channel(struct efx_channel *channel)
 	netif_dbg(channel->efx, probe, channel->efx->net_dev,
 		  "creating channel %d\n", channel->channel);
 
+	rc = channel->type->pre_probe(channel);
+	if (rc)
+		goto fail;
+
 	rc = efx_probe_eventq(channel);
 	if (rc)
-		goto fail1;
+		goto fail;
 
 	efx_for_each_channel_tx_queue(tx_queue, channel) {
 		rc = efx_probe_tx_queue(tx_queue);
 		if (rc)
-			goto fail2;
+			goto fail;
 	}
 
 	efx_for_each_channel_rx_queue(rx_queue, channel) {
 		rc = efx_probe_rx_queue(rx_queue);
 		if (rc)
-			goto fail3;
+			goto fail;
 	}
 
 	channel->n_rx_frm_trunc = 0;
 
 	return 0;
 
- fail3:
-	efx_for_each_channel_rx_queue(rx_queue, channel)
-		efx_remove_rx_queue(rx_queue);
- fail2:
-	efx_for_each_channel_tx_queue(tx_queue, channel)
-		efx_remove_tx_queue(tx_queue);
- fail1:
+fail:
+	efx_remove_channel(channel);
 	return rc;
 }
 
+static void
+efx_get_channel_name(struct efx_channel *channel, char *buf, size_t len)
+{
+	struct efx_nic *efx = channel->efx;
+	const char *type;
+	int number;
+
+	number = channel->channel;
+	if (efx->tx_channel_offset == 0) {
+		type = "";
+	} else if (channel->channel < efx->tx_channel_offset) {
+		type = "-rx";
+	} else {
+		type = "-tx";
+		number -= efx->tx_channel_offset;
+	}
+	snprintf(buf, len, "%s%s-%d", efx->name, type, number);
+}
 
 static void efx_set_channel_names(struct efx_nic *efx)
 {
 	struct efx_channel *channel;
-	const char *type = "";
-	int number;
 
-	efx_for_each_channel(channel, efx) {
-		number = channel->channel;
-		if (efx->n_channels > efx->n_rx_channels) {
-			if (channel->channel < efx->n_rx_channels) {
-				type = "-rx";
-			} else {
-				type = "-tx";
-				number -= efx->n_rx_channels;
-			}
-		}
-		snprintf(efx->channel_name[channel->channel],
-			 sizeof(efx->channel_name[0]),
-			 "%s%s-%d", efx->name, type, number);
-	}
+	efx_for_each_channel(channel, efx)
+		channel->type->get_name(channel,
+					efx->channel_name[channel->channel],
+					sizeof(efx->channel_name[0]));
 }
 
 static int efx_probe_channels(struct efx_nic *efx)
@@ -697,16 +718,40 @@ efx_realloc_channels(struct efx_nic *efx, u32 rxq_entries, u32 txq_entries)
 {
 	struct efx_channel *other_channel[EFX_MAX_CHANNELS], *channel;
 	u32 old_rxq_entries, old_txq_entries;
-	unsigned i;
-	int rc;
+	unsigned i, next_buffer_table = 0;
+	int rc = 0;
+
+	/* Not all channels should be reallocated. We must avoid
+	 * reallocating their buffer table entries.
+	 */
+	efx_for_each_channel(channel, efx) {
+		struct efx_rx_queue *rx_queue;
+		struct efx_tx_queue *tx_queue;
+
+		if (channel->type->copy)
+			continue;
+		next_buffer_table = max(next_buffer_table,
+					channel->eventq.index +
+					channel->eventq.entries);
+		efx_for_each_channel_rx_queue(rx_queue, channel)
+			next_buffer_table = max(next_buffer_table,
+						rx_queue->rxd.index +
+						rx_queue->rxd.entries);
+		efx_for_each_channel_tx_queue(tx_queue, channel)
+			next_buffer_table = max(next_buffer_table,
+						tx_queue->txd.index +
+						tx_queue->txd.entries);
+	}
 
 	efx_stop_all(efx);
-	efx_stop_interrupts(efx);
+	efx_stop_interrupts(efx, true);
 
-	/* Clone channels */
+	/* Clone channels (where possible) */
 	memset(other_channel, 0, sizeof(other_channel));
 	for (i = 0; i < efx->n_channels; i++) {
-		channel = efx_alloc_channel(efx, i, efx->channel[i]);
+		channel = efx->channel[i];
+		if (channel->type->copy)
+			channel = channel->type->copy(channel);
 		if (!channel) {
 			rc = -ENOMEM;
 			goto out;
@@ -725,23 +770,31 @@ efx_realloc_channels(struct efx_nic *efx, u32 rxq_entries, u32 txq_entries)
 		other_channel[i] = channel;
 	}
 
-	rc = efx_probe_channels(efx);
-	if (rc)
-		goto rollback;
-
-	efx_init_napi(efx);
+	/* Restart buffer table allocation */
+	efx->next_buffer_table = next_buffer_table;
 
-	/* Destroy old channels */
 	for (i = 0; i < efx->n_channels; i++) {
-		efx_fini_napi_channel(other_channel[i]);
-		efx_remove_channel(other_channel[i]);
+		channel = efx->channel[i];
+		if (!channel->type->copy)
+			continue;
+		rc = efx_probe_channel(channel);
+		if (rc)
+			goto rollback;
+		efx_init_napi_channel(efx->channel[i]);
 	}
+
 out:
-	/* Free unused channel structures */
-	for (i = 0; i < efx->n_channels; i++)
-		kfree(other_channel[i]);
+	/* Destroy unused channel structures */
+	for (i = 0; i < efx->n_channels; i++) {
+		channel = other_channel[i];
+		if (channel && channel->type->copy) {
+			efx_fini_napi_channel(channel);
+			efx_remove_channel(channel);
+			kfree(channel);
+		}
+	}
 
-	efx_start_interrupts(efx);
+	efx_start_interrupts(efx, true);
 	efx_start_all(efx);
 	return rc;
 
@@ -762,6 +815,18 @@ void efx_schedule_slow_fill(struct efx_rx_queue *rx_queue)
 	mod_timer(&rx_queue->slow_fill, jiffies + msecs_to_jiffies(100));
 }
 
+static const struct efx_channel_type efx_default_channel_type = {
+	.pre_probe		= efx_channel_dummy_op_int,
+	.get_name		= efx_get_channel_name,
+	.copy			= efx_copy_channel,
+	.keep_eventq		= false,
+};
+
+int efx_channel_dummy_op_int(struct efx_channel *channel)
+{
+	return 0;
+}
+
 /**************************************************************************
  *
  * Port handling
@@ -1162,9 +1227,14 @@ static int efx_probe_interrupts(struct efx_nic *efx)
 {
 	unsigned int max_channels =
 		min(efx->type->phys_addr_channels, EFX_MAX_CHANNELS);
-	unsigned int i;
+	unsigned int extra_channels = 0;
+	unsigned int i, j;
 	int rc;
 
+	for (i = 0; i < EFX_MAX_EXTRA_CHANNELS; i++)
+		if (efx->extra_channel_type[i])
+			++extra_channels;
+
 	if (efx->interrupt_mode == EFX_INT_MODE_MSIX) {
 		struct msix_entry xentries[EFX_MAX_CHANNELS];
 		unsigned int n_channels;
@@ -1172,6 +1242,7 @@ static int efx_probe_interrupts(struct efx_nic *efx)
 		n_channels = efx_wanted_parallelism();
 		if (separate_tx_channels)
 			n_channels *= 2;
+		n_channels += extra_channels;
 		n_channels = min(n_channels, max_channels);
 
 		for (i = 0; i < n_channels; i++)
@@ -1191,22 +1262,23 @@ static int efx_probe_interrupts(struct efx_nic *efx)
 
 		if (rc == 0) {
 			efx->n_channels = n_channels;
+			if (n_channels > extra_channels)
+				n_channels -= extra_channels;
 			if (separate_tx_channels) {
-				efx->n_tx_channels =
-					max(efx->n_channels / 2, 1U);
-				efx->n_rx_channels =
-					max(efx->n_channels -
-					    efx->n_tx_channels, 1U);
+				efx->n_tx_channels = max(n_channels / 2, 1U);
+				efx->n_rx_channels = max(n_channels -
+							 efx->n_tx_channels,
+							 1U);
 			} else {
-				efx->n_tx_channels = efx->n_channels;
-				efx->n_rx_channels = efx->n_channels;
+				efx->n_tx_channels = n_channels;
+				efx->n_rx_channels = n_channels;
 			}
 			rc = efx_init_rx_cpu_rmap(efx, xentries);
 			if (rc) {
 				pci_disable_msix(efx->pci_dev);
 				return rc;
 			}
-			for (i = 0; i < n_channels; i++)
+			for (i = 0; i < efx->n_channels; i++)
 				efx_get_channel(efx, i)->irq =
 					xentries[i].vector;
 		} else {
@@ -1240,11 +1312,26 @@ static int efx_probe_interrupts(struct efx_nic *efx)
 		efx->legacy_irq = efx->pci_dev->irq;
 	}
 
+	/* Assign extra channels if possible */
+	j = efx->n_channels;
+	for (i = 0; i < EFX_MAX_EXTRA_CHANNELS; i++) {
+		if (!efx->extra_channel_type[i])
+			continue;
+		if (efx->interrupt_mode != EFX_INT_MODE_MSIX ||
+		    efx->n_channels <= extra_channels) {
+			efx->extra_channel_type[i]->handle_no_channel(efx);
+		} else {
+			--j;
+			efx_get_channel(efx, j)->type =
+				efx->extra_channel_type[i];
+		}
+	}
+
 	return 0;
 }
 
 /* Enable interrupts, then probe and start the event queues */
-static void efx_start_interrupts(struct efx_nic *efx)
+static void efx_start_interrupts(struct efx_nic *efx, bool may_keep_eventq)
 {
 	struct efx_channel *channel;
 
@@ -1253,14 +1340,15 @@ static void efx_start_interrupts(struct efx_nic *efx)
 	efx_nic_enable_interrupts(efx);
 
 	efx_for_each_channel(channel, efx) {
-		efx_init_eventq(channel);
+		if (!channel->type->keep_eventq || !may_keep_eventq)
+			efx_init_eventq(channel);
 		efx_start_eventq(channel);
 	}
 
 	efx_mcdi_mode_event(efx);
 }
 
-static void efx_stop_interrupts(struct efx_nic *efx)
+static void efx_stop_interrupts(struct efx_nic *efx, bool may_keep_eventq)
 {
 	struct efx_channel *channel;
 
@@ -1277,7 +1365,8 @@ static void efx_stop_interrupts(struct efx_nic *efx)
 			synchronize_irq(channel->irq);
 
 		efx_stop_eventq(channel);
-		efx_fini_eventq(channel);
+		if (!channel->type->keep_eventq || !may_keep_eventq)
+			efx_fini_eventq(channel);
 	}
 }
 
@@ -1383,21 +1472,22 @@ static int efx_probe_all(struct efx_nic *efx)
 	}
 
 	efx->rxq_entries = efx->txq_entries = EFX_DEFAULT_DMAQ_SIZE;
-	rc = efx_probe_channels(efx);
-	if (rc)
-		goto fail3;
 
 	rc = efx_probe_filters(efx);
 	if (rc) {
 		netif_err(efx, probe, efx->net_dev,
 			  "failed to create filter tables\n");
-		goto fail4;
+		goto fail3;
 	}
 
+	rc = efx_probe_channels(efx);
+	if (rc)
+		goto fail4;
+
 	return 0;
 
  fail4:
-	efx_remove_channels(efx);
+	efx_remove_filters(efx);
  fail3:
 	efx_remove_port(efx);
  fail2:
@@ -1482,8 +1572,8 @@ static void efx_stop_all(struct efx_nic *efx)
 
 static void efx_remove_all(struct efx_nic *efx)
 {
-	efx_remove_filters(efx);
 	efx_remove_channels(efx);
+	efx_remove_filters(efx);
 	efx_remove_port(efx);
 	efx_remove_nic(efx);
 }
@@ -1627,15 +1717,21 @@ static int efx_ioctl(struct net_device *net_dev, struct ifreq *ifr, int cmd)
  *
  **************************************************************************/
 
+static void efx_init_napi_channel(struct efx_channel *channel)
+{
+	struct efx_nic *efx = channel->efx;
+
+	channel->napi_dev = efx->net_dev;
+	netif_napi_add(channel->napi_dev, &channel->napi_str,
+		       efx_poll, napi_weight);
+}
+
 static void efx_init_napi(struct efx_nic *efx)
 {
 	struct efx_channel *channel;
 
-	efx_for_each_channel(channel, efx) {
-		channel->napi_dev = efx->net_dev;
-		netif_napi_add(channel->napi_dev, &channel->napi_str,
-			       efx_poll, napi_weight);
-	}
+	efx_for_each_channel(channel, efx)
+		efx_init_napi_channel(channel);
 }
 
 static void efx_fini_napi_channel(struct efx_channel *channel)
@@ -2013,7 +2109,7 @@ void efx_reset_down(struct efx_nic *efx, enum reset_type method)
 	efx_stop_all(efx);
 	mutex_lock(&efx->mac_lock);
 
-	efx_stop_interrupts(efx);
+	efx_stop_interrupts(efx, false);
 	if (efx->port_initialized && method != RESET_TYPE_INVISIBLE)
 		efx->phy_op->fini(efx);
 	efx->type->fini(efx);
@@ -2050,7 +2146,7 @@ int efx_reset_up(struct efx_nic *efx, enum reset_type method, bool ok)
 
 	efx->type->reconfigure_mac(efx);
 
-	efx_start_interrupts(efx);
+	efx_start_interrupts(efx, false);
 	efx_restore_filters(efx);
 
 	mutex_unlock(&efx->mac_lock);
@@ -2314,7 +2410,7 @@ static void efx_pci_remove_main(struct efx_nic *efx)
 	free_irq_cpu_rmap(efx->net_dev->rx_cpu_rmap);
 	efx->net_dev->rx_cpu_rmap = NULL;
 #endif
-	efx_stop_interrupts(efx);
+	efx_stop_interrupts(efx, false);
 	efx_nic_fini_interrupt(efx);
 	efx_fini_port(efx);
 	efx->type->fini(efx);
@@ -2341,7 +2437,7 @@ static void efx_pci_remove(struct pci_dev *pci_dev)
 	/* Allow any queued efx_resets() to complete */
 	rtnl_unlock();
 
-	efx_stop_interrupts(efx);
+	efx_stop_interrupts(efx, false);
 	efx_unregister_netdev(efx);
 
 	efx_mtd_remove(efx);
@@ -2393,7 +2489,7 @@ static int efx_pci_probe_main(struct efx_nic *efx)
 	rc = efx_nic_init_interrupt(efx);
 	if (rc)
 		goto fail5;
-	efx_start_interrupts(efx);
+	efx_start_interrupts(efx, false);
 
 	return 0;
 
@@ -2517,7 +2613,7 @@ static int efx_pm_freeze(struct device *dev)
 	netif_device_detach(efx->net_dev);
 
 	efx_stop_all(efx);
-	efx_stop_interrupts(efx);
+	efx_stop_interrupts(efx, false);
 
 	return 0;
 }
@@ -2528,7 +2624,7 @@ static int efx_pm_thaw(struct device *dev)
 
 	efx->state = STATE_INIT;
 
-	efx_start_interrupts(efx);
+	efx_start_interrupts(efx, false);
 
 	mutex_lock(&efx->mac_lock);
 	efx->phy_op->reconfigure(efx);
diff --git a/drivers/net/ethernet/sfc/efx.h b/drivers/net/ethernet/sfc/efx.h
index 7f546e2..4debfe0 100644
--- a/drivers/net/ethernet/sfc/efx.h
+++ b/drivers/net/ethernet/sfc/efx.h
@@ -95,6 +95,7 @@ static inline void efx_filter_rfs_expire(struct efx_channel *channel) {}
 #endif
 
 /* Channels */
+extern int efx_channel_dummy_op_int(struct efx_channel *channel);
 extern void efx_process_channel_now(struct efx_channel *channel);
 extern int
 efx_realloc_channels(struct efx_nic *efx, u32 rxq_entries, u32 txq_entries);
diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
index 3260e71..94b0dca 100644
--- a/drivers/net/ethernet/sfc/net_driver.h
+++ b/drivers/net/ethernet/sfc/net_driver.h
@@ -54,6 +54,7 @@
 
 #define EFX_MAX_CHANNELS 32U
 #define EFX_MAX_RX_QUEUES EFX_MAX_CHANNELS
+#define EFX_MAX_EXTRA_CHANNELS	0U
 
 /* Checksum generation is a per-queue option in hardware, so each
  * queue visible to the networking core is backed by two hardware TX
@@ -311,6 +312,7 @@ enum efx_rx_alloc_method {
  *
  * @efx: Associated Efx NIC
  * @channel: Channel instance number
+ * @type: Channel type definition
  * @enabled: Channel enabled indicator
  * @irq: IRQ number (MSI and MSI-X only)
  * @irq_moderation: IRQ moderation value (in hardware ticks)
@@ -341,6 +343,7 @@ enum efx_rx_alloc_method {
 struct efx_channel {
 	struct efx_nic *efx;
 	int channel;
+	const struct efx_channel_type *type;
 	bool enabled;
 	int irq;
 	unsigned int irq_moderation;
@@ -379,6 +382,24 @@ struct efx_channel {
 	struct efx_tx_queue tx_queue[EFX_TXQ_TYPES];
 };
 
+/**
+ * struct efx_channel_type - distinguishes traffic and extra channels
+ * @handle_no_channel: Handle failure to allocate an extra channel
+ * @pre_probe: Set up extra state prior to initialisation
+ * @get_name: Generate the channel's name (used for its IRQ handler)
+ * @copy: Copy the channel state prior to reallocation.  May be %NULL if
+ *	reallocation is not supported.
+ * @keep_eventq: Flag for whether event queue should be kept initialised
+ *	while the device is stopped
+ */
+struct efx_channel_type {
+	void (*handle_no_channel)(struct efx_nic *);
+	int (*pre_probe)(struct efx_channel *);
+	void (*get_name)(struct efx_channel *, char *buf, size_t len);
+	struct efx_channel *(*copy)(const struct efx_channel *);
+	bool keep_eventq;
+};
+
 enum efx_led_mode {
 	EFX_LED_OFF	= 0,
 	EFX_LED_ON	= 1,
@@ -631,6 +654,8 @@ struct efx_filter_state;
  * @rx_queue: RX DMA queues
  * @channel: Channels
  * @channel_name: Names for channels and their IRQs
+ * @extra_channel_type: Types of extra (non-traffic) channels that
+ *	should be allocated for this NIC
  * @rxq_entries: Size of receive queues requested by user.
  * @txq_entries: Size of transmit queues requested by user.
  * @next_buffer_table: First available buffer table id
@@ -723,6 +748,8 @@ struct efx_nic {
 
 	struct efx_channel *channel[EFX_MAX_CHANNELS];
 	char channel_name[EFX_MAX_CHANNELS][IFNAMSIZ + 6];
+	const struct efx_channel_type *
+	extra_channel_type[EFX_MAX_EXTRA_CHANNELS];
 
 	unsigned rxq_entries;
 	unsigned txq_entries;
@@ -921,6 +948,13 @@ efx_get_channel(struct efx_nic *efx, unsigned index)
 	     _channel = (_channel->channel + 1 < (_efx)->n_channels) ?	\
 		     (_efx)->channel[_channel->channel + 1] : NULL)
 
+/* Iterate over all used channels in reverse */
+#define efx_for_each_channel_rev(_channel, _efx)			\
+	for (_channel = (_efx)->channel[(_efx)->n_channels - 1];	\
+	     _channel;							\
+	     _channel = _channel->channel ?				\
+		     (_efx)->channel[_channel->channel - 1] : NULL)
+
 static inline struct efx_tx_queue *
 efx_get_tx_queue(struct efx_nic *efx, unsigned index, unsigned type)
 {
-- 
1.7.7.6



-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
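
[Not part of the series -- a minimal sketch of what a hypothetical
extra channel type built on the struct efx_channel_type callbacks
added in net_driver.h above might look like; all names here are
invented for illustration:]

	static void example_handle_no_channel(struct efx_nic *efx)
	{
		/* No MSI-X vector was left over for this extra channel;
		 * degrade gracefully instead of failing the probe. */
	}

	static void example_get_name(struct efx_channel *channel,
				     char *buf, size_t len)
	{
		snprintf(buf, len, "%s-example", channel->efx->name);
	}

	static const struct efx_channel_type example_channel_type = {
		.handle_no_channel	= example_handle_no_channel,
		.pre_probe		= efx_channel_dummy_op_int,
		.get_name		= example_get_name,
		.copy			= NULL,	/* never reallocated */
		.keep_eventq		= true,	/* kept across efx_stop_interrupts() */
	};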


* [PATCH net-next 17/19] sfc: Pass NIC structure into efx_wanted_parallelism()
  2012-02-16  0:42 pull request: sfc-next 2012-02-16 Ben Hutchings
                   ` (15 preceding siblings ...)
  2012-02-16  0:50 ` [PATCH net-next 16/19] sfc: Add support for 'extra' channel types Ben Hutchings
@ 2012-02-16  0:51 ` Ben Hutchings
  2012-02-16  0:52 ` [PATCH net-next 18/19] sfc: Allocate SRAM between buffer table and descriptor caches at init time Ben Hutchings
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Ben Hutchings @ 2012-02-16  0:51 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers

This lets us identify the NIC affected in case of failure, and
will be necessary to adjust for SR-IOV constraints.
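
[A sketch, not part of the patch: netif_warn() gates on the driver's
msg_enable mask and prefixes the message with the device name, which
is what passing the NIC structure buys here.  The device names below
are hypothetical.]

	/* Before: printk(KERN_WARNING "sfc: ...") could not say which
	 * NIC was affected.  After, the same warning reads roughly:
	 *   sfc 0000:07:00.0 eth2: RSS disabled due to allocation failure
	 */
	netif_warn(efx, probe, efx->net_dev,
		   "RSS disabled due to allocation failure\n");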

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/ethernet/sfc/efx.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index 0a9ab49..e943ffb 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -1169,7 +1169,7 @@ static void efx_fini_io(struct efx_nic *efx)
 	pci_disable_device(efx->pci_dev);
 }
 
-static unsigned int efx_wanted_parallelism(void)
+static unsigned int efx_wanted_parallelism(struct efx_nic *efx)
 {
 	cpumask_var_t thread_mask;
 	unsigned int count;
@@ -1179,8 +1179,8 @@ static unsigned int efx_wanted_parallelism(void)
 		return rss_cpus;
 
 	if (unlikely(!zalloc_cpumask_var(&thread_mask, GFP_KERNEL))) {
-		printk(KERN_WARNING
-		       "sfc: RSS disabled due to allocation failure\n");
+		netif_warn(efx, probe, efx->net_dev,
+			   "RSS disabled due to allocation failure\n");
 		return 1;
 	}
 
@@ -1239,7 +1239,7 @@ static int efx_probe_interrupts(struct efx_nic *efx)
 		struct msix_entry xentries[EFX_MAX_CHANNELS];
 		unsigned int n_channels;
 
-		n_channels = efx_wanted_parallelism();
+		n_channels = efx_wanted_parallelism(efx);
 		if (separate_tx_channels)
 			n_channels *= 2;
 		n_channels += extra_channels;
-- 
1.7.7.6



-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* [PATCH net-next 18/19] sfc: Allocate SRAM between buffer table and descriptor caches at init time
  2012-02-16  0:42 pull request: sfc-next 2012-02-16 Ben Hutchings
                   ` (16 preceding siblings ...)
  2012-02-16  0:51 ` [PATCH net-next 17/19] sfc: Pass NIC structure into efx_wanted_parallelism() Ben Hutchings
@ 2012-02-16  0:52 ` Ben Hutchings
  2012-02-16  0:52 ` [PATCH net-next 19/19] sfc: Add SR-IOV back-end support for SFC9000 family Ben Hutchings
  2012-02-16 22:08 ` pull request: sfc-next 2012-02-16 David Miller
  19 siblings, 0 replies; 25+ messages in thread
From: Ben Hutchings @ 2012-02-16  0:52 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers

Each port has a block of 64-bit SRAM that is divided between buffer
table and descriptor cache regions at initialisation time.  Currently
we use a fixed allocation, but it needs to be changed to support
larger numbers of queues.
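
[A rough sketch of the resulting SRAM layout, not from the patch; the
SRAM limit, VI count and descriptor cache sizes below are illustrative
values only:]

	/* Descriptor caches are carved from the top of SRAM, leaving
	 * everything below rx_dc_base for buffer table entries. */
	unsigned sram_lim_qw = 0x44000;		/* hypothetical limit */
	unsigned vi_count = 32;			/* VIs backing channels */
	unsigned tx_dc_entries = 16, rx_dc_entries = 64;	/* assumed */
	unsigned tx_dc_base = sram_lim_qw - vi_count * tx_dc_entries;
	unsigned rx_dc_base = tx_dc_base - vi_count * rx_dc_entries;
	/* Here tx_dc_base = 0x43e00 and rx_dc_base = 0x43600. */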

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/ethernet/sfc/efx.c        |    2 ++
 drivers/net/ethernet/sfc/falcon.c     |   12 ++++++++----
 drivers/net/ethernet/sfc/net_driver.h |   13 +++++++++----
 drivers/net/ethernet/sfc/nic.c        |   23 +++++++++++++++++++----
 drivers/net/ethernet/sfc/nic.h        |    2 ++
 drivers/net/ethernet/sfc/siena.c      |   12 ++++++++++--
 6 files changed, 50 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index e943ffb..c9c306a 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -1420,6 +1420,8 @@ static int efx_probe_nic(struct efx_nic *efx)
 	if (rc)
 		goto fail;
 
+	efx->type->dimension_resources(efx);
+
 	if (efx->n_channels > 1)
 		get_random_bytes(&efx->rx_hash_key, sizeof(efx->rx_hash_key));
 	for (i = 0; i < ARRAY_SIZE(efx->rx_indir_table); i++)
diff --git a/drivers/net/ethernet/sfc/falcon.c b/drivers/net/ethernet/sfc/falcon.c
index 9828511..3a1ca2b 100644
--- a/drivers/net/ethernet/sfc/falcon.c
+++ b/drivers/net/ethernet/sfc/falcon.c
@@ -1333,6 +1333,12 @@ out:
 	return rc;
 }
 
+static void falcon_dimension_resources(struct efx_nic *efx)
+{
+	efx->rx_dc_base = 0x20000;
+	efx->tx_dc_base = 0x26000;
+}
+
 /* Probe all SPI devices on the NIC */
 static void falcon_probe_spi_devices(struct efx_nic *efx)
 {
@@ -1749,6 +1755,7 @@ const struct efx_nic_type falcon_a1_nic_type = {
 	.probe = falcon_probe_nic,
 	.remove = falcon_remove_nic,
 	.init = falcon_init_nic,
+	.dimension_resources = falcon_dimension_resources,
 	.fini = efx_port_dummy_op_void,
 	.monitor = falcon_monitor,
 	.map_reset_reason = falcon_map_reset_reason,
@@ -1783,8 +1790,6 @@ const struct efx_nic_type falcon_a1_nic_type = {
 	.max_interrupt_mode = EFX_INT_MODE_MSI,
 	.phys_addr_channels = 4,
 	.timer_period_max =  1 << FRF_AB_TC_TIMER_VAL_WIDTH,
-	.tx_dc_base = 0x130000,
-	.rx_dc_base = 0x100000,
 	.offload_features = NETIF_F_IP_CSUM,
 };
 
@@ -1792,6 +1797,7 @@ const struct efx_nic_type falcon_b0_nic_type = {
 	.probe = falcon_probe_nic,
 	.remove = falcon_remove_nic,
 	.init = falcon_init_nic,
+	.dimension_resources = falcon_dimension_resources,
 	.fini = efx_port_dummy_op_void,
 	.monitor = falcon_monitor,
 	.map_reset_reason = falcon_map_reset_reason,
@@ -1835,8 +1841,6 @@ const struct efx_nic_type falcon_b0_nic_type = {
 				   * interrupt handler only supports 32
 				   * channels */
 	.timer_period_max =  1 << FRF_AB_TC_TIMER_VAL_WIDTH,
-	.tx_dc_base = 0x130000,
-	.rx_dc_base = 0x100000,
 	.offload_features = NETIF_F_IP_CSUM | NETIF_F_RXHASH | NETIF_F_NTUPLE,
 };
 
diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
index 94b0dca..7870cef 100644
--- a/drivers/net/ethernet/sfc/net_driver.h
+++ b/drivers/net/ethernet/sfc/net_driver.h
@@ -658,6 +658,9 @@ struct efx_filter_state;
  *	should be allocated for this NIC
  * @rxq_entries: Size of receive queues requested by user.
  * @txq_entries: Size of transmit queues requested by user.
+ * @tx_dc_base: Base qword address in SRAM of TX queue descriptor caches
+ * @rx_dc_base: Base qword address in SRAM of RX queue descriptor caches
+ * @sram_lim_qw: Qword address limit of SRAM
  * @next_buffer_table: First available buffer table id
  * @n_channels: Number of channels in use
  * @n_rx_channels: Number of channels used for RX (= number of RX queues)
@@ -753,6 +756,9 @@ struct efx_nic {
 
 	unsigned rxq_entries;
 	unsigned txq_entries;
+	unsigned tx_dc_base;
+	unsigned rx_dc_base;
+	unsigned sram_lim_qw;
 	unsigned next_buffer_table;
 	unsigned n_channels;
 	unsigned n_rx_channels;
@@ -839,6 +845,8 @@ static inline unsigned int efx_port_num(struct efx_nic *efx)
  * @probe: Probe the controller
  * @remove: Free resources allocated by probe()
  * @init: Initialise the controller
+ * @dimension_resources: Dimension controller resources (buffer table and
+ *	VIs) once the available interrupt resources are known
  * @fini: Shut down the controller
  * @monitor: Periodic function for polling link state and hardware monitor
  * @map_reset_reason: Map ethtool reset reason to a reset method
@@ -878,8 +886,6 @@ static inline unsigned int efx_port_num(struct efx_nic *efx)
  * @phys_addr_channels: Number of channels with physically addressed
  *	descriptors
  * @timer_period_max: Maximum period of interrupt timer (in ticks)
- * @tx_dc_base: Base address in SRAM of TX queue descriptor caches
- * @rx_dc_base: Base address in SRAM of RX queue descriptor caches
  * @offload_features: net_device feature flags for protocol offload
  *	features implemented in hardware
  */
@@ -887,6 +893,7 @@ struct efx_nic_type {
 	int (*probe)(struct efx_nic *efx);
 	void (*remove)(struct efx_nic *efx);
 	int (*init)(struct efx_nic *efx);
+	void (*dimension_resources)(struct efx_nic *efx);
 	void (*fini)(struct efx_nic *efx);
 	void (*monitor)(struct efx_nic *efx);
 	enum reset_type (*map_reset_reason)(enum reset_type reason);
@@ -923,8 +930,6 @@ struct efx_nic_type {
 	unsigned int max_interrupt_mode;
 	unsigned int phys_addr_channels;
 	unsigned int timer_period_max;
-	unsigned int tx_dc_base;
-	unsigned int rx_dc_base;
 	netdev_features_t offload_features;
 };
 
diff --git a/drivers/net/ethernet/sfc/nic.c b/drivers/net/ethernet/sfc/nic.c
index 2bdfb63..747cf94 100644
--- a/drivers/net/ethernet/sfc/nic.c
+++ b/drivers/net/ethernet/sfc/nic.c
@@ -1609,6 +1609,23 @@ void efx_nic_fini_interrupt(struct efx_nic *efx)
 		free_irq(efx->legacy_irq, efx);
 }
 
+void efx_nic_dimension_resources(struct efx_nic *efx, unsigned sram_lim_qw)
+{
+	unsigned vi_count, buftbl_min;
+
+	/* Account for the buffer table entries backing the datapath channels
+	 * and the descriptor caches for those channels.
+	 */
+	buftbl_min = ((efx->n_rx_channels * EFX_MAX_DMAQ_SIZE +
+		       efx->n_tx_channels * EFX_TXQ_TYPES * EFX_MAX_DMAQ_SIZE +
+		       efx->n_channels * EFX_MAX_EVQ_SIZE)
+		      * sizeof(efx_qword_t) / EFX_BUF_SIZE);
+	vi_count = max(efx->n_channels, efx->n_tx_channels * EFX_TXQ_TYPES);
+
+	efx->tx_dc_base = sram_lim_qw - vi_count * TX_DC_ENTRIES;
+	efx->rx_dc_base = efx->tx_dc_base - vi_count * RX_DC_ENTRIES;
+}
+
 u32 efx_nic_fpga_ver(struct efx_nic *efx)
 {
 	efx_oword_t altera_build;
@@ -1621,11 +1638,9 @@ void efx_nic_init_common(struct efx_nic *efx)
 	efx_oword_t temp;
 
 	/* Set positions of descriptor caches in SRAM. */
-	EFX_POPULATE_OWORD_1(temp, FRF_AZ_SRM_TX_DC_BASE_ADR,
-			     efx->type->tx_dc_base / 8);
+	EFX_POPULATE_OWORD_1(temp, FRF_AZ_SRM_TX_DC_BASE_ADR, efx->tx_dc_base);
 	efx_writeo(efx, &temp, FR_AZ_SRM_TX_DC_CFG);
-	EFX_POPULATE_OWORD_1(temp, FRF_AZ_SRM_RX_DC_BASE_ADR,
-			     efx->type->rx_dc_base / 8);
+	EFX_POPULATE_OWORD_1(temp, FRF_AZ_SRM_RX_DC_BASE_ADR, efx->rx_dc_base);
 	efx_writeo(efx, &temp, FR_AZ_SRM_RX_DC_CFG);
 
 	/* Set TX descriptor cache size. */
diff --git a/drivers/net/ethernet/sfc/nic.h b/drivers/net/ethernet/sfc/nic.h
index 1f53e2c..5df7da8 100644
--- a/drivers/net/ethernet/sfc/nic.h
+++ b/drivers/net/ethernet/sfc/nic.h
@@ -230,6 +230,8 @@ extern void falcon_start_nic_stats(struct efx_nic *efx);
 extern void falcon_stop_nic_stats(struct efx_nic *efx);
 extern void falcon_setup_xaui(struct efx_nic *efx);
 extern int falcon_reset_xaui(struct efx_nic *efx);
+extern void
+efx_nic_dimension_resources(struct efx_nic *efx, unsigned sram_lim_qw);
 extern void efx_nic_init_common(struct efx_nic *efx);
 extern void efx_nic_push_rx_indir_table(struct efx_nic *efx);
 
diff --git a/drivers/net/ethernet/sfc/siena.c b/drivers/net/ethernet/sfc/siena.c
index d3c4169..657f3fa 100644
--- a/drivers/net/ethernet/sfc/siena.c
+++ b/drivers/net/ethernet/sfc/siena.c
@@ -225,6 +225,15 @@ static int siena_probe_nvconfig(struct efx_nic *efx)
 	return rc;
 }
 
+static void siena_dimension_resources(struct efx_nic *efx)
+{
+	/* Each port has a small block of internal SRAM dedicated to
+	 * the buffer table and descriptor caches.  In theory we can
+	 * map both blocks to one port, but we don't.
+	 */
+	efx_nic_dimension_resources(efx, FR_CZ_BUF_FULL_TBL_ROWS / 2);
+}
+
 static int siena_probe_nic(struct efx_nic *efx)
 {
 	struct siena_nic_data *nic_data;
@@ -619,6 +628,7 @@ const struct efx_nic_type siena_a0_nic_type = {
 	.probe = siena_probe_nic,
 	.remove = siena_remove_nic,
 	.init = siena_init_nic,
+	.dimension_resources = siena_dimension_resources,
 	.fini = efx_port_dummy_op_void,
 	.monitor = NULL,
 	.map_reset_reason = siena_map_reset_reason,
@@ -657,8 +667,6 @@ const struct efx_nic_type siena_a0_nic_type = {
 				   * interrupt handler only supports 32
 				   * channels */
 	.timer_period_max = 1 << FRF_CZ_TC_TIMER_VAL_WIDTH,
-	.tx_dc_base = 0x88000,
-	.rx_dc_base = 0x68000,
 	.offload_features = (NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
 			     NETIF_F_RXHASH | NETIF_F_NTUPLE),
 };
-- 
1.7.7.6



-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* [PATCH net-next 19/19] sfc: Add SR-IOV back-end support for SFC9000 family
  2012-02-16  0:42 pull request: sfc-next 2012-02-16 Ben Hutchings
                   ` (17 preceding siblings ...)
  2012-02-16  0:52 ` [PATCH net-next 18/19] sfc: Allocate SRAM between buffer table and descriptor caches at init time Ben Hutchings
@ 2012-02-16  0:52 ` Ben Hutchings
  2012-02-16  1:18   ` John Fastabend
  2012-02-16 22:08 ` pull request: sfc-next 2012-02-16 David Miller
  19 siblings, 1 reply; 25+ messages in thread
From: Ben Hutchings @ 2012-02-16  0:52 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers, Shradha Shah

On the SFC9000 family, each port has 1024 Virtual Interfaces (VIs),
each with an RX queue, a TX queue, an event queue and a mailbox
register.  These may be assigned to up to 127 SR-IOV virtual functions
per port, with up to 64 VIs per VF.

We allocate an extra channel (IRQ and event queue only) to receive
requests from VF drivers.

There is a per-port limit of 4 concurrent RX queue flushes, and queue
flushes may be initiated by the MC in response to a Function Level
Reset (FLR) of a VF.  Therefore, when SR-IOV is in use, we submit all
flush requests via the MC.

The RSS indirection table is shared with VFs, so the number of RX
queues used in the PF is limited to the number of VIs per VF.
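
[A sketch of the VI numbering this implies, not part of the patch;
vi_scale and the VF index are example values.  This mirrors the
abs_index() helper in siena_sriov.c below:]

	/* Map a VF-relative VI number to the absolute VI seen by the
	 * PF: VF i owns VIs [EFX_VI_BASE + (i << vi_scale),
	 * EFX_VI_BASE + ((i + 1) << vi_scale)). */
	static unsigned example_abs_vi(unsigned vf_index, unsigned vi_scale,
				       unsigned rel_vi)
	{
		return EFX_VI_BASE + (vf_index << vi_scale) + rel_vi;
	}
	/* e.g. EFX_VI_BASE = 128, vi_scale = 6: VF 2 owns VIs 256..319 */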

This is almost entirely the work of Steve Hodgson, formerly
shodgson@solarflare.com.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/ethernet/sfc/Kconfig       |    8 +
 drivers/net/ethernet/sfc/Makefile      |    1 +
 drivers/net/ethernet/sfc/efx.c         |   70 ++-
 drivers/net/ethernet/sfc/ethtool.c     |    3 +-
 drivers/net/ethernet/sfc/mcdi.c        |   34 +
 drivers/net/ethernet/sfc/mcdi.h        |    2 +
 drivers/net/ethernet/sfc/mcdi_mac.c    |    2 +-
 drivers/net/ethernet/sfc/net_driver.h  |   32 +-
 drivers/net/ethernet/sfc/nic.c         |   79 ++-
 drivers/net/ethernet/sfc/nic.h         |   89 ++
 drivers/net/ethernet/sfc/siena.c       |    2 +
 drivers/net/ethernet/sfc/siena_sriov.c | 1642 ++++++++++++++++++++++++++++++++
 drivers/net/ethernet/sfc/vfdi.h        |  254 +++++
 13 files changed, 2192 insertions(+), 26 deletions(-)
 create mode 100644 drivers/net/ethernet/sfc/siena_sriov.c
 create mode 100644 drivers/net/ethernet/sfc/vfdi.h

diff --git a/drivers/net/ethernet/sfc/Kconfig b/drivers/net/ethernet/sfc/Kconfig
index 8d42354..fb3cbc2 100644
--- a/drivers/net/ethernet/sfc/Kconfig
+++ b/drivers/net/ethernet/sfc/Kconfig
@@ -26,3 +26,11 @@ config SFC_MCDI_MON
 	---help---
 	  This exposes the on-board firmware-managed sensors as a
 	  hardware monitor device.
+config SFC_SRIOV
+	bool "Solarflare SFC9000-family SR-IOV support"
+	depends on SFC && PCI_IOV
+	default y
+	---help---
+	  This enables support for the SFC9000 I/O Virtualization
+	  features, allowing accelerated network performance in
+	  virtualized environments.
diff --git a/drivers/net/ethernet/sfc/Makefile b/drivers/net/ethernet/sfc/Makefile
index 3fa2e25..ea1f8db 100644
--- a/drivers/net/ethernet/sfc/Makefile
+++ b/drivers/net/ethernet/sfc/Makefile
@@ -4,5 +4,6 @@ sfc-y			+= efx.o nic.o falcon.o siena.o tx.o rx.o filter.o \
 			   tenxpress.o txc43128_phy.o falcon_boards.o \
 			   mcdi.o mcdi_phy.o mcdi_mon.o
 sfc-$(CONFIG_SFC_MTD)	+= mtd.o
+sfc-$(CONFIG_SFC_SRIOV)	+= siena_sriov.o
 
 obj-$(CONFIG_SFC)	+= sfc.o
diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index c9c306a..ac571cf 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -1175,25 +1175,40 @@ static unsigned int efx_wanted_parallelism(struct efx_nic *efx)
 	unsigned int count;
 	int cpu;
 
-	if (rss_cpus)
-		return rss_cpus;
+	if (rss_cpus) {
+		count = rss_cpus;
+	} else {
+		if (unlikely(!zalloc_cpumask_var(&thread_mask, GFP_KERNEL))) {
+			netif_warn(efx, probe, efx->net_dev,
+				   "RSS disabled due to allocation failure\n");
+			return 1;
+		}
 
-	if (unlikely(!zalloc_cpumask_var(&thread_mask, GFP_KERNEL))) {
-		netif_warn(efx, probe, efx->net_dev,
-			   "RSS disabled due to allocation failure\n");
-		return 1;
+		count = 0;
+		for_each_online_cpu(cpu) {
+			if (!cpumask_test_cpu(cpu, thread_mask)) {
+				++count;
+				cpumask_or(thread_mask, thread_mask,
+					   topology_thread_cpumask(cpu));
+			}
+		}
+
+		free_cpumask_var(thread_mask);
 	}
 
-	count = 0;
-	for_each_online_cpu(cpu) {
-		if (!cpumask_test_cpu(cpu, thread_mask)) {
-			++count;
-			cpumask_or(thread_mask, thread_mask,
-				   topology_thread_cpumask(cpu));
-		}
+	/* If RSS is requested for the PF *and* VFs then we can't write RSS
+	 * table entries that are inaccessible to VFs
+	 */
+	if (efx_sriov_wanted(efx) && efx_vf_size(efx) > 1 &&
+	    count > efx_vf_size(efx)) {
+		netif_warn(efx, probe, efx->net_dev,
+			   "Reducing number of RSS channels from %u to %u for "
+			   "VF support. Increase vf-msix-limit to use more "
+			   "channels on the PF.\n",
+			   count, efx_vf_size(efx));
+		count = efx_vf_size(efx);
 	}
 
-	free_cpumask_var(thread_mask);
 	return count;
 }
 
@@ -1327,6 +1342,10 @@ static int efx_probe_interrupts(struct efx_nic *efx)
 		}
 	}
 
+	/* RSS might be usable on VFs even if it is disabled on the PF */
+	efx->rss_spread = ((efx->n_rx_channels > 1 || !efx_sriov_wanted(efx))
+			   ? efx->n_rx_channels : efx_vf_size(efx));
+
 	return 0;
 }
 
@@ -1426,7 +1445,7 @@ static int efx_probe_nic(struct efx_nic *efx)
 		get_random_bytes(&efx->rx_hash_key, sizeof(efx->rx_hash_key));
 	for (i = 0; i < ARRAY_SIZE(efx->rx_indir_table); i++)
 		efx->rx_indir_table[i] =
-			ethtool_rxfh_indir_default(i, efx->n_rx_channels);
+			ethtool_rxfh_indir_default(i, efx->rss_spread);
 
 	efx_set_channels(efx);
 	netif_set_real_num_tx_queues(efx->net_dev, efx->n_tx_channels);
@@ -1915,6 +1934,7 @@ static int efx_set_mac_address(struct net_device *net_dev, void *data)
 	}
 
 	memcpy(net_dev->dev_addr, new_addr, net_dev->addr_len);
+	efx_sriov_mac_address_changed(efx);
 
 	/* Reconfigure the MAC */
 	mutex_lock(&efx->mac_lock);
@@ -1981,6 +2001,12 @@ static const struct net_device_ops efx_netdev_ops = {
 	.ndo_set_mac_address	= efx_set_mac_address,
 	.ndo_set_rx_mode	= efx_set_rx_mode,
 	.ndo_set_features	= efx_set_features,
+#ifdef CONFIG_SFC_SRIOV
+	.ndo_set_vf_mac		= efx_sriov_set_vf_mac,
+	.ndo_set_vf_vlan	= efx_sriov_set_vf_vlan,
+	.ndo_set_vf_spoofchk	= efx_sriov_set_vf_spoofchk,
+	.ndo_get_vf_config	= efx_sriov_get_vf_config,
+#endif
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_poll_controller = efx_netpoll,
 #endif
@@ -2150,6 +2176,7 @@ int efx_reset_up(struct efx_nic *efx, enum reset_type method, bool ok)
 
 	efx_start_interrupts(efx, false);
 	efx_restore_filters(efx);
+	efx_sriov_reset(efx);
 
 	mutex_unlock(&efx->mac_lock);
 
@@ -2440,6 +2467,7 @@ static void efx_pci_remove(struct pci_dev *pci_dev)
 	rtnl_unlock();
 
 	efx_stop_interrupts(efx, false);
+	efx_sriov_fini(efx);
 	efx_unregister_netdev(efx);
 
 	efx_mtd_remove(efx);
@@ -2581,6 +2609,11 @@ static int __devinit efx_pci_probe(struct pci_dev *pci_dev,
 	if (rc)
 		goto fail4;
 
+	rc = efx_sriov_init(efx);
+	if (rc)
+		netif_err(efx, probe, efx->net_dev,
+			  "SR-IOV can't be enabled rc %d\n", rc);
+
 	netif_dbg(efx, probe, efx->net_dev, "initialisation successful\n");
 
 	/* Try to create MTDs, but allow this to fail */
@@ -2732,6 +2765,10 @@ static int __init efx_init_module(void)
 	if (rc)
 		goto err_notifier;
 
+	rc = efx_init_sriov();
+	if (rc)
+		goto err_sriov;
+
 	reset_workqueue = create_singlethread_workqueue("sfc_reset");
 	if (!reset_workqueue) {
 		rc = -ENOMEM;
@@ -2747,6 +2784,8 @@ static int __init efx_init_module(void)
  err_pci:
 	destroy_workqueue(reset_workqueue);
  err_reset:
+	efx_fini_sriov();
+ err_sriov:
 	unregister_netdevice_notifier(&efx_netdev_notifier);
  err_notifier:
 	return rc;
@@ -2758,6 +2797,7 @@ static void __exit efx_exit_module(void)
 
 	pci_unregister_driver(&efx_pci_driver);
 	destroy_workqueue(reset_workqueue);
+	efx_fini_sriov();
 	unregister_netdevice_notifier(&efx_netdev_notifier);
 
 }
diff --git a/drivers/net/ethernet/sfc/ethtool.c b/drivers/net/ethernet/sfc/ethtool.c
index 8319115..f22f45f 100644
--- a/drivers/net/ethernet/sfc/ethtool.c
+++ b/drivers/net/ethernet/sfc/ethtool.c
@@ -1085,7 +1085,8 @@ static u32 efx_ethtool_get_rxfh_indir_size(struct net_device *net_dev)
 {
 	struct efx_nic *efx = netdev_priv(net_dev);
 
-	return (efx_nic_rev(efx) < EFX_REV_FALCON_B0 ?
+	return ((efx_nic_rev(efx) < EFX_REV_FALCON_B0 ||
+		 efx->n_rx_channels == 1) ?
 		0 : ARRAY_SIZE(efx->rx_indir_table));
 }
 
diff --git a/drivers/net/ethernet/sfc/mcdi.c b/drivers/net/ethernet/sfc/mcdi.c
index 619f63a..17b6463 100644
--- a/drivers/net/ethernet/sfc/mcdi.c
+++ b/drivers/net/ethernet/sfc/mcdi.c
@@ -560,6 +560,9 @@ void efx_mcdi_process_event(struct efx_channel *channel,
 	case MCDI_EVENT_CODE_MAC_STATS_DMA:
 		/* MAC stats are gathered lazily.  We can ignore this. */
 		break;
+	case MCDI_EVENT_CODE_FLR:
+		efx_sriov_flr(efx, MCDI_EVENT_FIELD(*event, FLR_VF));
+		break;
 
 	default:
 		netif_err(efx, hw, efx->net_dev, "Unknown MCDI event 0x%x\n",
@@ -1154,6 +1157,37 @@ fail:
 	return rc;
 }
 
+int efx_mcdi_flush_rxqs(struct efx_nic *efx)
+{
+	struct efx_channel *channel;
+	struct efx_rx_queue *rx_queue;
+	__le32 *qid;
+	int rc, count;
+
+	qid = kmalloc(EFX_MAX_CHANNELS * sizeof(*qid), GFP_KERNEL);
+	if (qid == NULL)
+		return -ENOMEM;
+
+	count = 0;
+	efx_for_each_channel(channel, efx) {
+		efx_for_each_channel_rx_queue(rx_queue, channel) {
+			if (rx_queue->flush_pending) {
+				rx_queue->flush_pending = false;
+				atomic_dec(&efx->rxq_flush_pending);
+				qid[count++] = cpu_to_le32(
+					efx_rx_queue_index(rx_queue));
+			}
+		}
+	}
+
+	rc = efx_mcdi_rpc(efx, MC_CMD_FLUSH_RX_QUEUES, (u8 *)qid,
+			  count * sizeof(*qid), NULL, 0, NULL);
+	WARN_ON(rc > 0);
+
+	kfree(qid);
+
+	return rc;
+}
 
 int efx_mcdi_wol_filter_reset(struct efx_nic *efx)
 {
diff --git a/drivers/net/ethernet/sfc/mcdi.h b/drivers/net/ethernet/sfc/mcdi.h
index fbaa6ef..0bdf3e3 100644
--- a/drivers/net/ethernet/sfc/mcdi.h
+++ b/drivers/net/ethernet/sfc/mcdi.h
@@ -146,6 +146,8 @@ extern int efx_mcdi_wol_filter_set_magic(struct efx_nic *efx,
 extern int efx_mcdi_wol_filter_get_magic(struct efx_nic *efx, int *id_out);
 extern int efx_mcdi_wol_filter_remove(struct efx_nic *efx, int id);
 extern int efx_mcdi_wol_filter_reset(struct efx_nic *efx);
+extern int efx_mcdi_flush_rxqs(struct efx_nic *efx);
+extern int efx_mcdi_set_mac(struct efx_nic *efx);
 extern int efx_mcdi_mac_stats(struct efx_nic *efx, dma_addr_t dma_addr,
 			      u32 dma_len, int enable, int clear);
 extern int efx_mcdi_mac_reconfigure(struct efx_nic *efx);
diff --git a/drivers/net/ethernet/sfc/mcdi_mac.c b/drivers/net/ethernet/sfc/mcdi_mac.c
index 98afe1c..1003f30 100644
--- a/drivers/net/ethernet/sfc/mcdi_mac.c
+++ b/drivers/net/ethernet/sfc/mcdi_mac.c
@@ -12,7 +12,7 @@
 #include "mcdi.h"
 #include "mcdi_pcol.h"
 
-static int efx_mcdi_set_mac(struct efx_nic *efx)
+int efx_mcdi_set_mac(struct efx_nic *efx)
 {
 	u32 reject, fcntl;
 	u8 cmdbytes[MC_CMD_SET_MAC_IN_LEN];
diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
index 7870cef..3fbec45 100644
--- a/drivers/net/ethernet/sfc/net_driver.h
+++ b/drivers/net/ethernet/sfc/net_driver.h
@@ -24,6 +24,7 @@
 #include <linux/device.h>
 #include <linux/highmem.h>
 #include <linux/workqueue.h>
+#include <linux/mutex.h>
 #include <linux/vmalloc.h>
 #include <linux/i2c.h>
 
@@ -54,7 +55,8 @@
 
 #define EFX_MAX_CHANNELS 32U
 #define EFX_MAX_RX_QUEUES EFX_MAX_CHANNELS
-#define EFX_MAX_EXTRA_CHANNELS	0U
+#define EFX_EXTRA_CHANNEL_IOV	0
+#define EFX_MAX_EXTRA_CHANNELS	1U
 
 /* Checksum generation is a per-queue option in hardware, so each
  * queue visible to the networking core is backed by two hardware TX
@@ -629,6 +631,8 @@ union efx_multicast_hash {
 };
 
 struct efx_filter_state;
+struct efx_vf;
+struct vfdi_status;
 
 /**
  * struct efx_nic - an Efx NIC
@@ -712,6 +716,17 @@ struct efx_filter_state;
  *	completed (either success or failure). Not used when MCDI is used to
  *	flush receive queues.
  * @flush_wq: wait queue used by efx_nic_flush_queues() to wait for flush completions.
+ * @vf: Array of &struct efx_vf objects.
+ * @vf_count: Number of VFs intended to be enabled.
+ * @vf_init_count: Number of VFs that have been fully initialised.
+ * @vi_scale: log2 number of vnics per VF.
+ * @vf_buftbl_base: The zeroth buffer table index used to back VF queues.
+ * @vfdi_status: Common VFDI status page to be DMAed to VF address space.
+ * @local_addr_list: List of local addresses. Protected by %local_lock.
+ * @local_page_list: List of DMA addressable pages used to broadcast
+ *	%local_addr_list. Protected by %local_lock.
+ * @local_lock: Mutex protecting %local_addr_list and %local_page_list.
+ * @peer_work: Work item to broadcast peer addresses to VMs.
  * @monitor_work: Hardware monitor workitem
  * @biu_lock: BIU (bus interface unit) lock
  * @last_irq_cpu: Last CPU to handle a possible test interrupt.  This
@@ -762,6 +777,7 @@ struct efx_nic {
 	unsigned next_buffer_table;
 	unsigned n_channels;
 	unsigned n_rx_channels;
+	unsigned rss_spread;
 	unsigned tx_channel_offset;
 	unsigned n_tx_channels;
 	unsigned int rx_buffer_len;
@@ -820,6 +836,20 @@ struct efx_nic {
 	atomic_t rxq_flush_outstanding;
 	wait_queue_head_t flush_wq;
 
+#ifdef CONFIG_SFC_SRIOV
+	struct efx_channel *vfdi_channel;
+	struct efx_vf *vf;
+	unsigned vf_count;
+	unsigned vf_init_count;
+	unsigned vi_scale;
+	unsigned vf_buftbl_base;
+	struct efx_buffer vfdi_status;
+	struct list_head local_addr_list;
+	struct list_head local_page_list;
+	struct mutex local_lock;
+	struct work_struct peer_work;
+#endif
+
 	/* The following fields may be written more often */
 
 	struct delayed_work monitor_work ____cacheline_aligned_in_smp;
diff --git a/drivers/net/ethernet/sfc/nic.c b/drivers/net/ethernet/sfc/nic.c
index 747cf94..2bf4283 100644
--- a/drivers/net/ethernet/sfc/nic.c
+++ b/drivers/net/ethernet/sfc/nic.c
@@ -264,6 +264,10 @@ static int efx_alloc_special_buffer(struct efx_nic *efx,
 	/* Select new buffer ID */
 	buffer->index = efx->next_buffer_table;
 	efx->next_buffer_table += buffer->entries;
+#ifdef CONFIG_SFC_SRIOV
+	BUG_ON(efx_sriov_enabled(efx) &&
+	       efx->vf_buftbl_base < efx->next_buffer_table);
+#endif
 
 	netif_dbg(efx, probe, efx->net_dev,
 		  "allocating special buffers %d-%d at %llx+%x "
@@ -693,6 +697,16 @@ int efx_nic_flush_queues(struct efx_nic *efx)
 	}
 
 	while (timeout && atomic_read(&efx->drain_pending) > 0) {
+		/* If SRIOV is enabled, then offload receive queue flushing to
+		 * the firmware (though we will still have to poll for
+		 * completion). If that fails, fall back to the old scheme.
+		 */
+		if (efx_sriov_enabled(efx)) {
+			rc = efx_mcdi_flush_rxqs(efx);
+			if (!rc)
+				goto wait;
+		}
+
 		/* The hardware supports four concurrent rx flushes, each of
 		 * which may need to be retried if there is an outstanding
 		 * descriptor fetch
@@ -712,6 +726,7 @@ int efx_nic_flush_queues(struct efx_nic *efx)
 			}
 		}
 
+	wait:
 		timeout = wait_event_timeout(efx->flush_wq, efx_flush_wake(efx),
 					     timeout);
 	}
@@ -1102,11 +1117,13 @@ efx_handle_driver_event(struct efx_channel *channel, efx_qword_t *event)
 		netif_vdbg(efx, hw, efx->net_dev, "channel %d TXQ %d flushed\n",
 			   channel->channel, ev_sub_data);
 		efx_handle_tx_flush_done(efx, event);
+		efx_sriov_tx_flush_done(efx, event);
 		break;
 	case FSE_AZ_RX_DESCQ_FLS_DONE_EV:
 		netif_vdbg(efx, hw, efx->net_dev, "channel %d RXQ %d flushed\n",
 			   channel->channel, ev_sub_data);
 		efx_handle_rx_flush_done(efx, event);
+		efx_sriov_rx_flush_done(efx, event);
 		break;
 	case FSE_AZ_EVQ_INIT_DONE_EV:
 		netif_dbg(efx, hw, efx->net_dev,
@@ -1138,16 +1155,24 @@ efx_handle_driver_event(struct efx_channel *channel, efx_qword_t *event)
 				   RESET_TYPE_DISABLE);
 		break;
 	case FSE_BZ_RX_DSC_ERROR_EV:
-		netif_err(efx, rx_err, efx->net_dev,
-			  "RX DMA Q %d reports descriptor fetch error."
-			  " RX Q %d is disabled.\n", ev_sub_data, ev_sub_data);
-		efx_schedule_reset(efx, RESET_TYPE_RX_DESC_FETCH);
+		if (ev_sub_data < EFX_VI_BASE) {
+			netif_err(efx, rx_err, efx->net_dev,
+				  "RX DMA Q %d reports descriptor fetch error."
+				  " RX Q %d is disabled.\n", ev_sub_data,
+				  ev_sub_data);
+			efx_schedule_reset(efx, RESET_TYPE_RX_DESC_FETCH);
+		} else
+			efx_sriov_desc_fetch_err(efx, ev_sub_data);
 		break;
 	case FSE_BZ_TX_DSC_ERROR_EV:
-		netif_err(efx, tx_err, efx->net_dev,
-			  "TX DMA Q %d reports descriptor fetch error."
-			  " TX Q %d is disabled.\n", ev_sub_data, ev_sub_data);
-		efx_schedule_reset(efx, RESET_TYPE_TX_DESC_FETCH);
+		if (ev_sub_data < EFX_VI_BASE) {
+			netif_err(efx, tx_err, efx->net_dev,
+				  "TX DMA Q %d reports descriptor fetch error."
+				  " TX Q %d is disabled.\n", ev_sub_data,
+				  ev_sub_data);
+			efx_schedule_reset(efx, RESET_TYPE_TX_DESC_FETCH);
+		} else
+			efx_sriov_desc_fetch_err(efx, ev_sub_data);
 		break;
 	default:
 		netif_vdbg(efx, hw, efx->net_dev,
@@ -1207,6 +1232,9 @@ int efx_nic_process_eventq(struct efx_channel *channel, int budget)
 		case FSE_AZ_EV_CODE_DRIVER_EV:
 			efx_handle_driver_event(channel, &event);
 			break;
+		case FSE_CZ_EV_CODE_USER_EV:
+			efx_sriov_event(channel, &event);
+			break;
 		case FSE_CZ_EV_CODE_MCDI_EV:
 			efx_mcdi_process_event(channel, &event);
 			break;
@@ -1609,6 +1637,15 @@ void efx_nic_fini_interrupt(struct efx_nic *efx)
 		free_irq(efx->legacy_irq, efx);
 }
 
+/* Looks at available SRAM resources and works out how many queues we
+ * can support, and where things like descriptor caches should live.
+ *
+ * SRAM is split up as follows:
+ * 0                          buftbl entries for channels
+ * efx->vf_buftbl_base        buftbl entries for SR-IOV
+ * efx->rx_dc_base            RX descriptor caches
+ * efx->tx_dc_base            TX descriptor caches
+ */
 void efx_nic_dimension_resources(struct efx_nic *efx, unsigned sram_lim_qw)
 {
 	unsigned vi_count, buftbl_min;
@@ -1622,6 +1659,32 @@ void efx_nic_dimension_resources(struct efx_nic *efx, unsigned sram_lim_qw)
 		      * sizeof(efx_qword_t) / EFX_BUF_SIZE);
 	vi_count = max(efx->n_channels, efx->n_tx_channels * EFX_TXQ_TYPES);
 
+#ifdef CONFIG_SFC_SRIOV
+	if (efx_sriov_wanted(efx)) {
+		unsigned vi_dc_entries, buftbl_free, entries_per_vf, vf_limit;
+
+		efx->vf_buftbl_base = buftbl_min;
+
+		vi_dc_entries = RX_DC_ENTRIES + TX_DC_ENTRIES;
+		vi_count = max(vi_count, EFX_VI_BASE);
+		buftbl_free = (sram_lim_qw - buftbl_min -
+			       vi_count * vi_dc_entries);
+
+		entries_per_vf = ((vi_dc_entries + EFX_VF_BUFTBL_PER_VI) *
+				  efx_vf_size(efx));
+		vf_limit = min(buftbl_free / entries_per_vf,
+			       (1024U - EFX_VI_BASE) >> efx->vi_scale);
+
+		if (efx->vf_count > vf_limit) {
+			netif_err(efx, probe, efx->net_dev,
+				  "Reducing VF count from %d to %d\n",
+				  efx->vf_count, vf_limit);
+			efx->vf_count = vf_limit;
+		}
+		vi_count += efx->vf_count * efx_vf_size(efx);
+	}
+#endif
+
 	efx->tx_dc_base = sram_lim_qw - vi_count * TX_DC_ENTRIES;
 	efx->rx_dc_base = efx->tx_dc_base - vi_count * RX_DC_ENTRIES;
 }
diff --git a/drivers/net/ethernet/sfc/nic.h b/drivers/net/ethernet/sfc/nic.h
index 5df7da8..246c414 100644
--- a/drivers/net/ethernet/sfc/nic.h
+++ b/drivers/net/ethernet/sfc/nic.h
@@ -169,6 +169,95 @@ static inline struct efx_mcdi_mon *efx_mcdi_mon(struct efx_nic *efx)
 }
 #endif
 
+/*
+ * On the SFC9000 family each port is associated with 1 PCI physical
+ * function (PF) handled by sfc and a configurable number of virtual
+ * functions (VFs) that may be handled by some other driver, often in
+ * a VM guest.  The queue pointer registers are mapped in both PF and
+ * VF BARs such that an 8K region provides access to a single RX, TX
+ * and event queue (collectively a Virtual Interface, VI or VNIC).
+ *
+ * The PF has access to all 1024 VIs while VFs are mapped to VIs
+ * according to VI_BASE and VI_SCALE: VF i has access to VIs numbered
+ * in range [VI_BASE + (i << VI_SCALE), VI_BASE + ((i + 1) << VI_SCALE)).
+ * The number of VIs and the VI_SCALE value are configurable but must
+ * be established at boot time by firmware.
+ */
+
+/* Maximum VI_SCALE parameter supported by Siena */
+#define EFX_VI_SCALE_MAX 6
+/* Base VI to use for SR-IOV. Must be aligned to (1 << EFX_VI_SCALE_MAX),
+ * so this is the smallest allowed value. */
+#define EFX_VI_BASE 128U
+/* Maximum number of VFs allowed */
+#define EFX_VF_COUNT_MAX 127
+/* Limit EVQs on VFs to be only 8k to reduce buffer table reservation */
+#define EFX_MAX_VF_EVQ_SIZE 8192UL
+/* The number of buffer table entries reserved for each VI on a VF */
+#define EFX_VF_BUFTBL_PER_VI					\
+	((EFX_MAX_VF_EVQ_SIZE + 2 * EFX_MAX_DMAQ_SIZE) *	\
+	 sizeof(efx_qword_t) / EFX_BUF_SIZE)
+
+#ifdef CONFIG_SFC_SRIOV
+
+static inline bool efx_sriov_wanted(struct efx_nic *efx)
+{
+	return efx->vf_count != 0;
+}
+static inline bool efx_sriov_enabled(struct efx_nic *efx)
+{
+	return efx->vf_init_count != 0;
+}
+static inline unsigned int efx_vf_size(struct efx_nic *efx)
+{
+	return 1 << efx->vi_scale;
+}
+
+extern int efx_init_sriov(void);
+extern void efx_sriov_probe(struct efx_nic *efx);
+extern int efx_sriov_init(struct efx_nic *efx);
+extern void efx_sriov_mac_address_changed(struct efx_nic *efx);
+extern void efx_sriov_tx_flush_done(struct efx_nic *efx, efx_qword_t *event);
+extern void efx_sriov_rx_flush_done(struct efx_nic *efx, efx_qword_t *event);
+extern void efx_sriov_event(struct efx_channel *channel, efx_qword_t *event);
+extern void efx_sriov_desc_fetch_err(struct efx_nic *efx, unsigned dmaq);
+extern void efx_sriov_flr(struct efx_nic *efx, unsigned flr);
+extern void efx_sriov_reset(struct efx_nic *efx);
+extern void efx_sriov_fini(struct efx_nic *efx);
+extern void efx_fini_sriov(void);
+
+#else
+
+static inline bool efx_sriov_wanted(struct efx_nic *efx) { return false; }
+static inline bool efx_sriov_enabled(struct efx_nic *efx) { return false; }
+static inline unsigned int efx_vf_size(struct efx_nic *efx) { return 0; }
+
+static inline int efx_init_sriov(void) { return 0; }
+static inline void efx_sriov_probe(struct efx_nic *efx) {}
+static inline int efx_sriov_init(struct efx_nic *efx) { return -EOPNOTSUPP; }
+static inline void efx_sriov_mac_address_changed(struct efx_nic *efx) {}
+static inline void efx_sriov_tx_flush_done(struct efx_nic *efx,
+					   efx_qword_t *event) {}
+static inline void efx_sriov_rx_flush_done(struct efx_nic *efx,
+					   efx_qword_t *event) {}
+static inline void efx_sriov_event(struct efx_channel *channel,
+				   efx_qword_t *event) {}
+static inline void efx_sriov_desc_fetch_err(struct efx_nic *efx, unsigned dmaq) {}
+static inline void efx_sriov_flr(struct efx_nic *efx, unsigned flr) {}
+static inline void efx_sriov_reset(struct efx_nic *efx) {}
+static inline void efx_sriov_fini(struct efx_nic *efx) {}
+static inline void efx_fini_sriov(void) {}
+
+#endif
+
+extern int efx_sriov_set_vf_mac(struct net_device *dev, int vf, u8 *mac);
+extern int efx_sriov_set_vf_vlan(struct net_device *dev, int vf,
+				 u16 vlan, u8 qos);
+extern int efx_sriov_get_vf_config(struct net_device *dev, int vf,
+				   struct ifla_vf_info *ivf);
+extern int efx_sriov_set_vf_spoofchk(struct net_device *net_dev, int vf,
+				     bool spoofchk);
+
 extern const struct efx_nic_type falcon_a1_nic_type;
 extern const struct efx_nic_type falcon_b0_nic_type;
 extern const struct efx_nic_type siena_a0_nic_type;
diff --git a/drivers/net/ethernet/sfc/siena.c b/drivers/net/ethernet/sfc/siena.c
index 657f3fa..7bea790 100644
--- a/drivers/net/ethernet/sfc/siena.c
+++ b/drivers/net/ethernet/sfc/siena.c
@@ -313,6 +313,8 @@ static int siena_probe_nic(struct efx_nic *efx)
 	if (rc)
 		goto fail5;
 
+	efx_sriov_probe(efx);
+
 	return 0;
 
 fail5:
diff --git a/drivers/net/ethernet/sfc/siena_sriov.c b/drivers/net/ethernet/sfc/siena_sriov.c
new file mode 100644
index 0000000..5c6839e
--- /dev/null
+++ b/drivers/net/ethernet/sfc/siena_sriov.c
@@ -0,0 +1,1642 @@
+/****************************************************************************
+ * Driver for Solarflare Solarstorm network controllers and boards
+ * Copyright 2010-2011 Solarflare Communications Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation, incorporated herein by reference.
+ */
+#include <linux/pci.h>
+#include <linux/module.h>
+#include "net_driver.h"
+#include "efx.h"
+#include "nic.h"
+#include "io.h"
+#include "mcdi.h"
+#include "filter.h"
+#include "mcdi_pcol.h"
+#include "regs.h"
+#include "vfdi.h"
+
+/* Number of longs required to track all the VIs in a VF */
+#define VI_MASK_LENGTH BITS_TO_LONGS(1 << EFX_VI_SCALE_MAX)
+
+/**
+ * enum efx_vf_tx_filter_mode - TX MAC filtering behaviour
+ * @VF_TX_FILTER_OFF: Disabled
+ * @VF_TX_FILTER_AUTO: Enabled if a MAC address is assigned to the VF
+ *	and at most 2 TX queues are allowed per VF.
+ * @VF_TX_FILTER_ON: Enabled
+ */
+enum efx_vf_tx_filter_mode {
+	VF_TX_FILTER_OFF,
+	VF_TX_FILTER_AUTO,
+	VF_TX_FILTER_ON,
+};
+
+/**
+ * struct efx_vf - Back-end resource and protocol state for a PCI VF
+ * @efx: The Efx NIC owning this VF
+ * @pci_rid: The PCI requester ID for this VF
+ * @pci_name: The PCI name (formatted address) of this VF
+ * @index: Index of VF within its port and PF.
+ * @req: VFDI incoming request work item. Incoming USR_EV events are received
+ *	by the NAPI handler, but must be handled by executing MCDI requests
+ *	inside a work item.
+ * @req_addr: VFDI incoming request DMA address (in VF's PCI address space).
+ * @req_type: Expected next incoming (from VF) %VFDI_EV_TYPE member.
+ * @req_seqno: Expected next incoming (from VF) %VFDI_EV_SEQ member.
+ * @msg_seqno: Next %VFDI_EV_SEQ member to reply to VF. Protected by
+ *	@status_lock
+ * @busy: VFDI request queued to be processed or being processed. Receiving
+ *	a VFDI request when @busy is set is an error condition.
+ * @buf: Incoming VFDI requests are DMAed from the VF into this buffer.
+ * @buftbl_base: Buffer table entries for this VF start at this index.
+ * @rx_filtering: Receive filtering has been requested by the VF driver.
+ * @rx_filter_flags: The flags sent in the %VFDI_OP_INSERT_FILTER request.
+ * @rx_filter_qid: VF relative qid for RX filter requested by VF.
+ * @rx_filter_id: Receive MAC filter ID. Only one filter per VF is supported.
+ * @tx_filter_mode: Transmit MAC filtering mode.
+ * @tx_filter_id: Transmit MAC filter ID.
+ * @addr: The MAC address and outer vlan tag of the VF.
+ * @status_addr: VF DMA address of page for &struct vfdi_status updates.
+ * @status_lock: Mutex protecting @msg_seqno, @status_addr, @addr,
+ *	@peer_page_addrs and @peer_page_count from simultaneous
+ *	updates by the VM and consumption by
+ *	efx_sriov_update_vf_addr()
+ * @peer_page_addrs: Pointer to an array of guest pages for local addresses.
+ * @peer_page_count: Number of entries in @peer_page_addrs.
+ * @evq0_addrs: Array of guest pages backing evq0.
+ * @evq0_count: Number of entries in @evq0_addrs.
+ * @flush_waitq: wait queue used by %VFDI_OP_FINI_ALL_QUEUES handler
+ *	to wait for flush completions.
+ * @txq_lock: Mutex for TX queue allocation.
+ * @txq_mask: Mask of initialized transmit queues.
+ * @txq_count: Number of initialized transmit queues.
+ * @rxq_mask: Mask of initialized receive queues.
+ * @rxq_count: Number of initialized receive queues.
+ * @rxq_retry_mask: Mask of receive queues that need to be flushed again
+ *	due to flush failure.
+ * @rxq_retry_count: Number of receive queues in @rxq_retry_mask.
+ * @reset_work: Work item to schedule a VF reset.
+ */
+struct efx_vf {
+	struct efx_nic *efx;
+	unsigned int pci_rid;
+	char pci_name[13]; /* dddd:bb:dd.f */
+	unsigned int index;
+	struct work_struct req;
+	u64 req_addr;
+	int req_type;
+	unsigned req_seqno;
+	unsigned msg_seqno;
+	bool busy;
+	struct efx_buffer buf;
+	unsigned buftbl_base;
+	bool rx_filtering;
+	enum efx_filter_flags rx_filter_flags;
+	unsigned rx_filter_qid;
+	int rx_filter_id;
+	enum efx_vf_tx_filter_mode tx_filter_mode;
+	int tx_filter_id;
+	struct vfdi_endpoint addr;
+	u64 status_addr;
+	struct mutex status_lock;
+	u64 *peer_page_addrs;
+	unsigned peer_page_count;
+	u64 evq0_addrs[EFX_MAX_VF_EVQ_SIZE * sizeof(efx_qword_t) /
+		       EFX_BUF_SIZE];
+	unsigned evq0_count;
+	wait_queue_head_t flush_waitq;
+	struct mutex txq_lock;
+	unsigned long txq_mask[VI_MASK_LENGTH];
+	unsigned txq_count;
+	unsigned long rxq_mask[VI_MASK_LENGTH];
+	unsigned rxq_count;
+	unsigned long rxq_retry_mask[VI_MASK_LENGTH];
+	atomic_t rxq_retry_count;
+	struct work_struct reset_work;
+};
+
+struct efx_memcpy_req {
+	unsigned int from_rid;
+	void *from_buf;
+	u64 from_addr;
+	unsigned int to_rid;
+	u64 to_addr;
+	unsigned length;
+};
+
+/**
+ * struct efx_local_addr - A MAC address on the vswitch without a VF.
+ *
+ * Siena does not have a switch, so VFs can't transmit data to each
+ * other. Instead the VFs must be made aware of the local addresses
+ * on the vswitch, so that they can arrange for an alternative
+ * software datapath to be used.
+ *
+ * @link: List head for insertion into efx->local_addr_list.
+ * @addr: Ethernet address
+ */
+struct efx_local_addr {
+	struct list_head link;
+	u8 addr[ETH_ALEN];
+};
+
+/**
+ * struct efx_endpoint_page - Page of vfdi_endpoint structures
+ *
+ * @link: List head for insertion into efx->local_page_list.
+ * @ptr: Pointer to page.
+ * @addr: DMA address of page.
+ */
+struct efx_endpoint_page {
+	struct list_head link;
+	void *ptr;
+	dma_addr_t addr;
+};
+
+/* Buffer table entries are reserved in order: txq0,rxq0,evq0,txq1,rxq1,evq1 */
+#define EFX_BUFTBL_TXQ_BASE(_vf, _qid)					\
+	((_vf)->buftbl_base + EFX_VF_BUFTBL_PER_VI * (_qid))
+#define EFX_BUFTBL_RXQ_BASE(_vf, _qid)					\
+	(EFX_BUFTBL_TXQ_BASE(_vf, _qid) +				\
+	 (EFX_MAX_DMAQ_SIZE * sizeof(efx_qword_t) / EFX_BUF_SIZE))
+#define EFX_BUFTBL_EVQ_BASE(_vf, _qid)					\
+	(EFX_BUFTBL_TXQ_BASE(_vf, _qid) +				\
+	 (2 * EFX_MAX_DMAQ_SIZE * sizeof(efx_qword_t) / EFX_BUF_SIZE))
+
+#define EFX_FIELD_MASK(_field)			\
+	((1 << _field ## _WIDTH) - 1)
+
+/* VFs can only use this many transmit channels */
+static unsigned int vf_max_tx_channels = 2;
+module_param(vf_max_tx_channels, uint, 0444);
+MODULE_PARM_DESC(vf_max_tx_channels,
+		 "Limit the number of TX channels VFs can use");
+
+static int max_vfs = -1;
+module_param(max_vfs, int, 0444);
+MODULE_PARM_DESC(max_vfs,
+		 "Reduce the number of VFs initialized by the driver");
+
+/* Workqueue used by VFDI communication.  We can't use the global
+ * workqueue because it may be running the VF driver's probe()
+ * routine, which will be blocked there waiting for a VFDI response.
+ */
+static struct workqueue_struct *vfdi_workqueue;
+
+static unsigned abs_index(struct efx_vf *vf, unsigned index)
+{
+	return EFX_VI_BASE + vf->index * efx_vf_size(vf->efx) + index;
+}
+
+static int efx_sriov_cmd(struct efx_nic *efx, bool enable,
+			 unsigned *vi_scale_out, unsigned *vf_total_out)
+{
+	u8 inbuf[MC_CMD_SRIOV_IN_LEN];
+	u8 outbuf[MC_CMD_SRIOV_OUT_LEN];
+	unsigned vi_scale, vf_total;
+	size_t outlen;
+	int rc;
+
+	MCDI_SET_DWORD(inbuf, SRIOV_IN_ENABLE, enable ? 1 : 0);
+	MCDI_SET_DWORD(inbuf, SRIOV_IN_VI_BASE, EFX_VI_BASE);
+	MCDI_SET_DWORD(inbuf, SRIOV_IN_VF_COUNT, efx->vf_count);
+
+	rc = efx_mcdi_rpc(efx, MC_CMD_SRIOV, inbuf, MC_CMD_SRIOV_IN_LEN,
+			  outbuf, MC_CMD_SRIOV_OUT_LEN, &outlen);
+	if (rc)
+		return rc;
+	if (outlen < MC_CMD_SRIOV_OUT_LEN)
+		return -EIO;
+
+	vf_total = MCDI_DWORD(outbuf, SRIOV_OUT_VF_TOTAL);
+	vi_scale = MCDI_DWORD(outbuf, SRIOV_OUT_VI_SCALE);
+	if (vi_scale > EFX_VI_SCALE_MAX)
+		return -EOPNOTSUPP;
+
+	if (vi_scale_out)
+		*vi_scale_out = vi_scale;
+	if (vf_total_out)
+		*vf_total_out = vf_total;
+
+	return 0;
+}
+
+static void efx_sriov_usrev(struct efx_nic *efx, bool enabled)
+{
+	efx_oword_t reg;
+
+	EFX_POPULATE_OWORD_2(reg,
+			     FRF_CZ_USREV_DIS, enabled ? 0 : 1,
+			     FRF_CZ_DFLT_EVQ, efx->vfdi_channel->channel);
+	efx_writeo(efx, &reg, FR_CZ_USR_EV_CFG);
+}
+
+static int efx_sriov_memcpy(struct efx_nic *efx, struct efx_memcpy_req *req,
+			    unsigned int count)
+{
+	u8 *inbuf, *record;
+	unsigned int used;
+	u32 from_rid, from_hi, from_lo;
+	int rc;
+
+	mb();	/* Finish writing source/reading dest before DMA starts */
+
+	used = MC_CMD_MEMCPY_IN_LEN(count);
+	if (WARN_ON(used > MCDI_CTL_SDU_LEN_MAX))
+		return -ENOBUFS;
+
+	/* Allocate room for the largest request */
+	inbuf = kzalloc(MCDI_CTL_SDU_LEN_MAX, GFP_KERNEL);
+	if (inbuf == NULL)
+		return -ENOMEM;
+
+	record = inbuf;
+	MCDI_SET_DWORD(record, MEMCPY_IN_RECORD, count);
+	while (count-- > 0) {
+		MCDI_SET_DWORD(record, MEMCPY_RECORD_TYPEDEF_TO_RID,
+			       req->to_rid);
+		MCDI_SET_DWORD(record, MEMCPY_RECORD_TYPEDEF_TO_ADDR_LO,
+			       (u32)req->to_addr);
+		MCDI_SET_DWORD(record, MEMCPY_RECORD_TYPEDEF_TO_ADDR_HI,
+			       (u32)(req->to_addr >> 32));
+		if (req->from_buf == NULL) {
+			from_rid = req->from_rid;
+			from_lo = (u32)req->from_addr;
+			from_hi = (u32)(req->from_addr >> 32);
+		} else {
+			if (WARN_ON(used + req->length > MCDI_CTL_SDU_LEN_MAX)) {
+				rc = -ENOBUFS;
+				goto out;
+			}
+
+			from_rid = MC_CMD_MEMCPY_RECORD_TYPEDEF_RID_INLINE;
+			from_lo = used;
+			from_hi = 0;
+			memcpy(inbuf + used, req->from_buf, req->length);
+			used += req->length;
+		}
+
+		MCDI_SET_DWORD(record, MEMCPY_RECORD_TYPEDEF_FROM_RID, from_rid);
+		MCDI_SET_DWORD(record, MEMCPY_RECORD_TYPEDEF_FROM_ADDR_LO,
+			       from_lo);
+		MCDI_SET_DWORD(record, MEMCPY_RECORD_TYPEDEF_FROM_ADDR_HI,
+			       from_hi);
+		MCDI_SET_DWORD(record, MEMCPY_RECORD_TYPEDEF_LENGTH,
+			       req->length);
+
+		++req;
+		record += MC_CMD_MEMCPY_IN_RECORD_LEN;
+	}
+
+	rc = efx_mcdi_rpc(efx, MC_CMD_MEMCPY, inbuf, used, NULL, 0, NULL);
+out:
+	kfree(inbuf);
+
+	mb();	/* Don't write source/read dest before DMA is complete */
+
+	return rc;
+}
+
+/* The TX filter is entirely controlled by this driver, and is modified
+ * underneath the feet of the VF
+ */
+static void efx_sriov_reset_tx_filter(struct efx_vf *vf)
+{
+	struct efx_nic *efx = vf->efx;
+	struct efx_filter_spec filter;
+	u16 vlan;
+	int rc;
+
+	if (vf->tx_filter_id != -1) {
+		efx_filter_remove_id_safe(efx, EFX_FILTER_PRI_REQUIRED,
+					  vf->tx_filter_id);
+		netif_dbg(efx, hw, efx->net_dev, "Removed vf %s tx filter %d\n",
+			  vf->pci_name, vf->tx_filter_id);
+		vf->tx_filter_id = -1;
+	}
+
+	if (is_zero_ether_addr(vf->addr.mac_addr))
+		return;
+
+	/* Turn on TX filtering automatically if not explicitly
+	 * enabled or disabled.
+	 */
+	if (vf->tx_filter_mode == VF_TX_FILTER_AUTO && vf_max_tx_channels <= 2)
+		vf->tx_filter_mode = VF_TX_FILTER_ON;
+
+	vlan = ntohs(vf->addr.tci) & VLAN_VID_MASK;
+	efx_filter_init_tx(&filter, abs_index(vf, 0));
+	rc = efx_filter_set_eth_local(&filter,
+				      vlan ? vlan : EFX_FILTER_VID_UNSPEC,
+				      vf->addr.mac_addr);
+	BUG_ON(rc);
+
+	rc = efx_filter_insert_filter(efx, &filter, true);
+	if (rc < 0) {
+		netif_warn(efx, hw, efx->net_dev,
+			   "Unable to migrate tx filter for vf %s\n",
+			   vf->pci_name);
+	} else {
+		netif_dbg(efx, hw, efx->net_dev, "Inserted vf %s tx filter %d\n",
+			  vf->pci_name, rc);
+		vf->tx_filter_id = rc;
+	}
+}
+
+/* The RX filter is managed here on behalf of the VF driver */
+static void efx_sriov_reset_rx_filter(struct efx_vf *vf)
+{
+	struct efx_nic *efx = vf->efx;
+	struct efx_filter_spec filter;
+	u16 vlan;
+	int rc;
+
+	if (vf->rx_filter_id != -1) {
+		efx_filter_remove_id_safe(efx, EFX_FILTER_PRI_REQUIRED,
+					  vf->rx_filter_id);
+		netif_dbg(efx, hw, efx->net_dev, "Removed vf %s rx filter %d\n",
+			  vf->pci_name, vf->rx_filter_id);
+		vf->rx_filter_id = -1;
+	}
+
+	if (!vf->rx_filtering || is_zero_ether_addr(vf->addr.mac_addr))
+		return;
+
+	vlan = ntohs(vf->addr.tci) & VLAN_VID_MASK;
+	efx_filter_init_rx(&filter, EFX_FILTER_PRI_REQUIRED,
+			   vf->rx_filter_flags,
+			   abs_index(vf, vf->rx_filter_qid));
+	rc = efx_filter_set_eth_local(&filter,
+				      vlan ? vlan : EFX_FILTER_VID_UNSPEC,
+				      vf->addr.mac_addr);
+	BUG_ON(rc);
+
+	rc = efx_filter_insert_filter(efx, &filter, true);
+	if (rc < 0) {
+		netif_warn(efx, hw, efx->net_dev,
+			   "Unable to insert rx filter for vf %s\n",
+			   vf->pci_name);
+	} else {
+		netif_dbg(efx, hw, efx->net_dev, "Inserted vf %s rx filter %d\n",
+			  vf->pci_name, rc);
+		vf->rx_filter_id = rc;
+	}
+}
+
+static void __efx_sriov_update_vf_addr(struct efx_vf *vf)
+{
+	efx_sriov_reset_tx_filter(vf);
+	efx_sriov_reset_rx_filter(vf);
+	queue_work(vfdi_workqueue, &vf->efx->peer_work);
+}
+
+/* Push the peer list to this VF. The caller must hold status_lock to interlock
+ * with VFDI requests, and calls must be serialised against manipulation of
+ * local_page_list, either by acquiring local_lock or by running from
+ * efx_sriov_peer_work()
+ */
+static void __efx_sriov_push_vf_status(struct efx_vf *vf)
+{
+	struct efx_nic *efx = vf->efx;
+	struct vfdi_status *status = efx->vfdi_status.addr;
+	struct efx_memcpy_req copy[4];
+	struct efx_endpoint_page *epp;
+	unsigned int pos, count;
+	unsigned data_offset;
+	efx_qword_t event;
+
+	WARN_ON(!mutex_is_locked(&vf->status_lock));
+	WARN_ON(!vf->status_addr);
+
+	status->local = vf->addr;
+	status->generation_end = ++status->generation_start;
+
+	memset(copy, '\0', sizeof(copy));
+	/* Write generation_start */
+	copy[0].from_buf = &status->generation_start;
+	copy[0].to_rid = vf->pci_rid;
+	copy[0].to_addr = vf->status_addr + offsetof(struct vfdi_status,
+						     generation_start);
+	copy[0].length = sizeof(status->generation_start);
+	/* DMA the rest of the structure (excluding the generations). This
+	 * assumes that the non-generation portion of vfdi_status is in
+	 * one chunk starting at the version member.
+	 */
+	data_offset = offsetof(struct vfdi_status, version);
+	copy[1].from_rid = efx->pci_dev->devfn;
+	copy[1].from_addr = efx->vfdi_status.dma_addr + data_offset;
+	copy[1].to_rid = vf->pci_rid;
+	copy[1].to_addr = vf->status_addr + data_offset;
+	copy[1].length = status->length - data_offset;
+
+	/* Copy the peer pages */
+	pos = 2;
+	count = 0;
+	list_for_each_entry(epp, &efx->local_page_list, link) {
+		if (count == vf->peer_page_count) {
+			/* The VF driver will know it needs to provide more
+			 * pages because peer_count is too large.
+			 */
+			break;
+		}
+		copy[pos].from_buf = NULL;
+		copy[pos].from_rid = efx->pci_dev->devfn;
+		copy[pos].from_addr = epp->addr;
+		copy[pos].to_rid = vf->pci_rid;
+		copy[pos].to_addr = vf->peer_page_addrs[count];
+		copy[pos].length = EFX_PAGE_SIZE;
+
+		if (++pos == ARRAY_SIZE(copy)) {
+			efx_sriov_memcpy(efx, copy, ARRAY_SIZE(copy));
+			pos = 0;
+		}
+		++count;
+	}
+
+	/* Write generation_end */
+	copy[pos].from_buf = &status->generation_end;
+	copy[pos].to_rid = vf->pci_rid;
+	copy[pos].to_addr = vf->status_addr + offsetof(struct vfdi_status,
+						       generation_end);
+	copy[pos].length = sizeof(status->generation_end);
+	efx_sriov_memcpy(efx, copy, pos + 1);
+
+	/* Notify the guest */
+	EFX_POPULATE_QWORD_3(event,
+			     FSF_AZ_EV_CODE, FSE_CZ_EV_CODE_USER_EV,
+			     VFDI_EV_SEQ, (vf->msg_seqno & 0xff),
+			     VFDI_EV_TYPE, VFDI_EV_TYPE_STATUS);
+	++vf->msg_seqno;
+	efx_generate_event(efx, EFX_VI_BASE + vf->index * efx_vf_size(efx),
+			      &event);
+}
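+
+/* Note the update discipline above: the incremented generation_start
+ * is DMA'd first, the body and peer pages next, and generation_end
+ * last, so the VF only observes equal generation counts around a
+ * stable snapshot of the status page.
+ */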
+
+static void efx_sriov_bufs(struct efx_nic *efx, unsigned offset,
+			   u64 *addr, unsigned count)
+{
+	efx_qword_t buf;
+	unsigned pos;
+
+	for (pos = 0; pos < count; ++pos) {
+		EFX_POPULATE_QWORD_3(buf,
+				     FRF_AZ_BUF_ADR_REGION, 0,
+				     FRF_AZ_BUF_ADR_FBUF,
+				     addr ? addr[pos] >> 12 : 0,
+				     FRF_AZ_BUF_OWNER_ID_FBUF, 0);
+		efx_sram_writeq(efx, efx->membase + FR_BZ_BUF_FULL_TBL,
+				&buf, offset + pos);
+	}
+}
+
+static bool bad_vf_index(struct efx_nic *efx, unsigned index)
+{
+	return index >= efx_vf_size(efx);
+}
+
+static bool bad_buf_count(unsigned buf_count, unsigned max_entry_count)
+{
+	unsigned max_buf_count = max_entry_count *
+		sizeof(efx_qword_t) / EFX_BUF_SIZE;
+
+	return ((buf_count & (buf_count - 1)) || buf_count > max_buf_count);
+}
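+
+/* (buf_count & (buf_count - 1)) rejects any buf_count with more than
+ * one bit set, so queues are restricted to power-of-two buffer counts
+ * and the init paths below can use __ffs(buf_count) as log2(buf_count)
+ * when encoding queue sizes, e.g. __ffs(8) == 3.
+ */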
+
+/* Check that the VI specified by per-port index belongs to a VF.
+ * Optionally set VF index and VI index within the VF.  Returns true
+ * (failure) if the VI does not belong to an initialized VF.
+ */
+static bool map_vi_index(struct efx_nic *efx, unsigned abs_index,
+			 struct efx_vf **vf_out, unsigned *rel_index_out)
+{
+	unsigned vf_i;
+
+	if (abs_index < EFX_VI_BASE)
+		return true;
+	vf_i = (abs_index - EFX_VI_BASE) / efx_vf_size(efx);
+	if (vf_i >= efx->vf_init_count)
+		return true;
+
+	if (vf_out)
+		*vf_out = efx->vf + vf_i;
+	if (rel_index_out)
+		*rel_index_out = abs_index % efx_vf_size(efx);
+	return false;
+}
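+
+/* Worked example (assuming EFX_VI_BASE == 128 and efx_vf_size() == 64):
+ * abs_index == 200 gives vf_i == (200 - 128) / 64 == 1 and
+ * rel_index == 200 % 64 == 8, consistent with VF 1 owning absolute VIs
+ * 192..255.  The modulo form is only valid because EFX_VI_BASE is
+ * aligned to the VF size, which efx_sriov_init() asserts at build time.
+ */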
+
+static int efx_vfdi_init_evq(struct efx_vf *vf)
+{
+	struct efx_nic *efx = vf->efx;
+	struct vfdi_req *req = vf->buf.addr;
+	unsigned vf_evq = req->u.init_evq.index;
+	unsigned buf_count = req->u.init_evq.buf_count;
+	unsigned abs_evq = abs_index(vf, vf_evq);
+	unsigned buftbl = EFX_BUFTBL_EVQ_BASE(vf, vf_evq);
+	efx_oword_t reg;
+
+	if (bad_vf_index(efx, vf_evq) ||
+	    bad_buf_count(buf_count, EFX_MAX_VF_EVQ_SIZE)) {
+		if (net_ratelimit())
+			netif_err(efx, hw, efx->net_dev,
+				  "ERROR: Invalid INIT_EVQ from %s: evq %d bufs %d\n",
+				  vf->pci_name, vf_evq, buf_count);
+		return VFDI_RC_EINVAL;
+	}
+
+	efx_sriov_bufs(efx, buftbl, req->u.init_evq.addr, buf_count);
+
+	EFX_POPULATE_OWORD_3(reg,
+			     FRF_CZ_TIMER_Q_EN, 1,
+			     FRF_CZ_HOST_NOTIFY_MODE, 0,
+			     FRF_CZ_TIMER_MODE, FFE_CZ_TIMER_MODE_DIS);
+	efx_writeo_table(efx, &reg, FR_BZ_TIMER_TBL, abs_evq);
+	EFX_POPULATE_OWORD_3(reg,
+			     FRF_AZ_EVQ_EN, 1,
+			     FRF_AZ_EVQ_SIZE, __ffs(buf_count),
+			     FRF_AZ_EVQ_BUF_BASE_ID, buftbl);
+	efx_writeo_table(efx, &reg, FR_BZ_EVQ_PTR_TBL, abs_evq);
+
+	if (vf_evq == 0) {
+		memcpy(vf->evq0_addrs, req->u.init_evq.addr,
+		       buf_count * sizeof(u64));
+		vf->evq0_count = buf_count;
+	}
+
+	return VFDI_RC_SUCCESS;
+}
+
+static int efx_vfdi_init_rxq(struct efx_vf *vf)
+{
+	struct efx_nic *efx = vf->efx;
+	struct vfdi_req *req = vf->buf.addr;
+	unsigned vf_rxq = req->u.init_rxq.index;
+	unsigned vf_evq = req->u.init_rxq.evq;
+	unsigned buf_count = req->u.init_rxq.buf_count;
+	unsigned buftbl = EFX_BUFTBL_RXQ_BASE(vf, vf_rxq);
+	unsigned label;
+	efx_oword_t reg;
+
+	if (bad_vf_index(efx, vf_evq) || bad_vf_index(efx, vf_rxq) ||
+	    bad_buf_count(buf_count, EFX_MAX_DMAQ_SIZE)) {
+		if (net_ratelimit())
+			netif_err(efx, hw, efx->net_dev,
+				  "ERROR: Invalid INIT_RXQ from %s: rxq %d evq %d "
+				  "buf_count %d\n", vf->pci_name, vf_rxq,
+				  vf_evq, buf_count);
+		return VFDI_RC_EINVAL;
+	}
+	if (__test_and_set_bit(req->u.init_rxq.index, vf->rxq_mask))
+		++vf->rxq_count;
+	efx_sriov_bufs(efx, buftbl, req->u.init_rxq.addr, buf_count);
+
+	label = req->u.init_rxq.label & EFX_FIELD_MASK(FRF_AZ_RX_DESCQ_LABEL);
+	EFX_POPULATE_OWORD_6(reg,
+			     FRF_AZ_RX_DESCQ_BUF_BASE_ID, buftbl,
+			     FRF_AZ_RX_DESCQ_EVQ_ID, abs_index(vf, vf_evq),
+			     FRF_AZ_RX_DESCQ_LABEL, label,
+			     FRF_AZ_RX_DESCQ_SIZE, __ffs(buf_count),
+			     FRF_AZ_RX_DESCQ_JUMBO,
+			     !!(req->u.init_rxq.flags &
+				VFDI_RXQ_FLAG_SCATTER_EN),
+			     FRF_AZ_RX_DESCQ_EN, 1);
+	efx_writeo_table(efx, &reg, FR_BZ_RX_DESC_PTR_TBL,
+			 abs_index(vf, vf_rxq));
+
+	return VFDI_RC_SUCCESS;
+}
+
+static int efx_vfdi_init_txq(struct efx_vf *vf)
+{
+	struct efx_nic *efx = vf->efx;
+	struct vfdi_req *req = vf->buf.addr;
+	unsigned vf_txq = req->u.init_txq.index;
+	unsigned vf_evq = req->u.init_txq.evq;
+	unsigned buf_count = req->u.init_txq.buf_count;
+	unsigned buftbl = EFX_BUFTBL_TXQ_BASE(vf, vf_txq);
+	unsigned label, eth_filt_en;
+	efx_oword_t reg;
+
+	if (bad_vf_index(efx, vf_evq) || bad_vf_index(efx, vf_txq) ||
+	    vf_txq >= vf_max_tx_channels ||
+	    bad_buf_count(buf_count, EFX_MAX_DMAQ_SIZE)) {
+		if (net_ratelimit())
+			netif_err(efx, hw, efx->net_dev,
+				  "ERROR: Invalid INIT_TXQ from %s: txq %d evq %d "
+				  "buf_count %d\n", vf->pci_name, vf_txq,
+				  vf_evq, buf_count);
+		return VFDI_RC_EINVAL;
+	}
+
+	mutex_lock(&vf->txq_lock);
+	if (__test_and_set_bit(req->u.init_txq.index, vf->txq_mask))
+		++vf->txq_count;
+	mutex_unlock(&vf->txq_lock);
+	efx_sriov_bufs(efx, buftbl, req->u.init_txq.addr, buf_count);
+
+	eth_filt_en = vf->tx_filter_mode == VF_TX_FILTER_ON;
+
+	label = req->u.init_txq.label & EFX_FIELD_MASK(FRF_AZ_TX_DESCQ_LABEL);
+	EFX_POPULATE_OWORD_8(reg,
+			     FRF_CZ_TX_DPT_Q_MASK_WIDTH, min(efx->vi_scale, 1U),
+			     FRF_CZ_TX_DPT_ETH_FILT_EN, eth_filt_en,
+			     FRF_AZ_TX_DESCQ_EN, 1,
+			     FRF_AZ_TX_DESCQ_BUF_BASE_ID, buftbl,
+			     FRF_AZ_TX_DESCQ_EVQ_ID, abs_index(vf, vf_evq),
+			     FRF_AZ_TX_DESCQ_LABEL, label,
+			     FRF_AZ_TX_DESCQ_SIZE, __ffs(buf_count),
+			     FRF_BZ_TX_NON_IP_DROP_DIS, 1);
+	efx_writeo_table(efx, &reg, FR_BZ_TX_DESC_PTR_TBL,
+			 abs_index(vf, vf_txq));
+
+	return VFDI_RC_SUCCESS;
+}
+
+/* Returns true when efx_vfdi_fini_all_queues should wake */
+static bool efx_vfdi_flush_wake(struct efx_vf *vf)
+{
+	/* Ensure that all updates are visible to efx_vfdi_fini_all_queues() */
+	smp_mb();
+
+	return (!vf->txq_count && !vf->rxq_count) ||
+		atomic_read(&vf->rxq_retry_count);
+}
+
+static void efx_vfdi_flush_clear(struct efx_vf *vf)
+{
+	memset(vf->txq_mask, 0, sizeof(vf->txq_mask));
+	vf->txq_count = 0;
+	memset(vf->rxq_mask, 0, sizeof(vf->rxq_mask));
+	vf->rxq_count = 0;
+	memset(vf->rxq_retry_mask, 0, sizeof(vf->rxq_retry_mask));
+	atomic_set(&vf->rxq_retry_count, 0);
+}
+
+static int efx_vfdi_fini_all_queues(struct efx_vf *vf)
+{
+	struct efx_nic *efx = vf->efx;
+	efx_oword_t reg;
+	unsigned count = efx_vf_size(efx);
+	unsigned vf_offset = EFX_VI_BASE + vf->index * efx_vf_size(efx);
+	unsigned timeout = HZ;
+	unsigned index, rxqs_count;
+	__le32 *rxqs;
+	int rc;
+
+	rxqs = kmalloc(count * sizeof(*rxqs), GFP_KERNEL);
+	if (rxqs == NULL)
+		return VFDI_RC_ENOMEM;
+
+	rtnl_lock();
+	if (efx->fc_disable++ == 0)
+		efx_mcdi_set_mac(efx);
+	rtnl_unlock();
+
+	/* Flush all the initialized queues */
+	rxqs_count = 0;
+	for (index = 0; index < count; ++index) {
+		if (test_bit(index, vf->txq_mask)) {
+			EFX_POPULATE_OWORD_2(reg,
+					     FRF_AZ_TX_FLUSH_DESCQ_CMD, 1,
+					     FRF_AZ_TX_FLUSH_DESCQ,
+					     vf_offset + index);
+			efx_writeo(efx, &reg, FR_AZ_TX_FLUSH_DESCQ);
+		}
+		if (test_bit(index, vf->rxq_mask))
+			rxqs[rxqs_count++] = cpu_to_le32(vf_offset + index);
+	}
+
+	atomic_set(&vf->rxq_retry_count, 0);
+	while (timeout && (vf->rxq_count || vf->txq_count)) {
+		rc = efx_mcdi_rpc(efx, MC_CMD_FLUSH_RX_QUEUES, (u8 *)rxqs,
+				  rxqs_count * sizeof(*rxqs), NULL, 0, NULL);
+		WARN_ON(rc < 0);
+
+		timeout = wait_event_timeout(vf->flush_waitq,
+					     efx_vfdi_flush_wake(vf),
+					     timeout);
+		rxqs_count = 0;
+		for (index = 0; index < count; ++index) {
+			if (test_and_clear_bit(index, vf->rxq_retry_mask)) {
+				atomic_dec(&vf->rxq_retry_count);
+				rxqs[rxqs_count++] =
+					cpu_to_le32(vf_offset + index);
+			}
+		}
+	}
+
+	rtnl_lock();
+	if (--efx->fc_disable == 0)
+		efx_mcdi_set_mac(efx);
+	rtnl_unlock();
+
+	/* Irrespective of success/failure, fini the queues */
+	EFX_ZERO_OWORD(reg);
+	for (index = 0; index < count; ++index) {
+		efx_writeo_table(efx, &reg, FR_BZ_RX_DESC_PTR_TBL,
+				 vf_offset + index);
+		efx_writeo_table(efx, &reg, FR_BZ_TX_DESC_PTR_TBL,
+				 vf_offset + index);
+		efx_writeo_table(efx, &reg, FR_BZ_EVQ_PTR_TBL,
+				 vf_offset + index);
+		efx_writeo_table(efx, &reg, FR_BZ_TIMER_TBL,
+				 vf_offset + index);
+	}
+	efx_sriov_bufs(efx, vf->buftbl_base, NULL,
+		       EFX_VF_BUFTBL_PER_VI * efx_vf_size(efx));
+	kfree(rxqs);
+	efx_vfdi_flush_clear(vf);
+
+	vf->evq0_count = 0;
+
+	return timeout ? 0 : VFDI_RC_ETIMEDOUT;
+}
+
+static int efx_vfdi_insert_filter(struct efx_vf *vf)
+{
+	struct efx_nic *efx = vf->efx;
+	struct vfdi_req *req = vf->buf.addr;
+	unsigned vf_rxq = req->u.mac_filter.rxq;
+	unsigned flags;
+
+	if (bad_vf_index(efx, vf_rxq) || vf->rx_filtering) {
+		if (net_ratelimit())
+			netif_err(efx, hw, efx->net_dev,
+				  "ERROR: Invalid INSERT_FILTER from %s: rxq %d "
+				  "flags 0x%x\n", vf->pci_name, vf_rxq,
+				  req->u.mac_filter.flags);
+		return VFDI_RC_EINVAL;
+	}
+
+	flags = 0;
+	if (req->u.mac_filter.flags & VFDI_MAC_FILTER_FLAG_RSS)
+		flags |= EFX_FILTER_FLAG_RX_RSS;
+	if (req->u.mac_filter.flags & VFDI_MAC_FILTER_FLAG_SCATTER)
+		flags |= EFX_FILTER_FLAG_RX_SCATTER;
+	vf->rx_filter_flags = flags;
+	vf->rx_filter_qid = vf_rxq;
+	vf->rx_filtering = true;
+
+	efx_sriov_reset_rx_filter(vf);
+	queue_work(vfdi_workqueue, &efx->peer_work);
+
+	return VFDI_RC_SUCCESS;
+}
+
+static int efx_vfdi_remove_all_filters(struct efx_vf *vf)
+{
+	vf->rx_filtering = false;
+	efx_sriov_reset_rx_filter(vf);
+	queue_work(vfdi_workqueue, &vf->efx->peer_work);
+
+	return VFDI_RC_SUCCESS;
+}
+
+static int efx_vfdi_set_status_page(struct efx_vf *vf)
+{
+	struct efx_nic *efx = vf->efx;
+	struct vfdi_req *req = vf->buf.addr;
+	unsigned int page_count;
+
+	page_count = req->u.set_status_page.peer_page_count;
+	if (!req->u.set_status_page.dma_addr || EFX_PAGE_SIZE <
+	    offsetof(struct vfdi_req,
+		     u.set_status_page.peer_page_addr[page_count])) {
+		if (net_ratelimit())
+			netif_err(efx, hw, efx->net_dev,
+				  "ERROR: Invalid SET_STATUS_PAGE from %s\n",
+				  vf->pci_name);
+		return VFDI_RC_EINVAL;
+	}
+
+	mutex_lock(&efx->local_lock);
+	mutex_lock(&vf->status_lock);
+	vf->status_addr = req->u.set_status_page.dma_addr;
+
+	kfree(vf->peer_page_addrs);
+	vf->peer_page_addrs = NULL;
+	vf->peer_page_count = 0;
+
+	if (page_count) {
+		vf->peer_page_addrs = kcalloc(page_count, sizeof(u64),
+					      GFP_KERNEL);
+		if (vf->peer_page_addrs) {
+			memcpy(vf->peer_page_addrs,
+			       req->u.set_status_page.peer_page_addr,
+			       page_count * sizeof(u64));
+			vf->peer_page_count = page_count;
+		}
+	}
+
+	__efx_sriov_push_vf_status(vf);
+	mutex_unlock(&vf->status_lock);
+	mutex_unlock(&efx->local_lock);
+
+	return VFDI_RC_SUCCESS;
+}
+
+static int efx_vfdi_clear_status_page(struct efx_vf *vf)
+{
+	mutex_lock(&vf->status_lock);
+	vf->status_addr = 0;
+	mutex_unlock(&vf->status_lock);
+
+	return VFDI_RC_SUCCESS;
+}
+
+typedef int (*efx_vfdi_op_t)(struct efx_vf *vf);
+
+static const efx_vfdi_op_t vfdi_ops[VFDI_OP_LIMIT] = {
+	[VFDI_OP_INIT_EVQ] = efx_vfdi_init_evq,
+	[VFDI_OP_INIT_TXQ] = efx_vfdi_init_txq,
+	[VFDI_OP_INIT_RXQ] = efx_vfdi_init_rxq,
+	[VFDI_OP_FINI_ALL_QUEUES] = efx_vfdi_fini_all_queues,
+	[VFDI_OP_INSERT_FILTER] = efx_vfdi_insert_filter,
+	[VFDI_OP_REMOVE_ALL_FILTERS] = efx_vfdi_remove_all_filters,
+	[VFDI_OP_SET_STATUS_PAGE] = efx_vfdi_set_status_page,
+	[VFDI_OP_CLEAR_STATUS_PAGE] = efx_vfdi_clear_status_page,
+};
+
+static void efx_sriov_vfdi(struct work_struct *work)
+{
+	struct efx_vf *vf = container_of(work, struct efx_vf, req);
+	struct efx_nic *efx = vf->efx;
+	struct vfdi_req *req = vf->buf.addr;
+	struct efx_memcpy_req copy[2];
+	int rc;
+
+	/* Copy this page into the local address space */
+	memset(copy, '\0', sizeof(copy));
+	copy[0].from_rid = vf->pci_rid;
+	copy[0].from_addr = vf->req_addr;
+	copy[0].to_rid = efx->pci_dev->devfn;
+	copy[0].to_addr = vf->buf.dma_addr;
+	copy[0].length = EFX_PAGE_SIZE;
+	rc = efx_sriov_memcpy(efx, copy, 1);
+	if (rc) {
+		/* If we can't get the request, we can't reply to the caller */
+		if (net_ratelimit())
+			netif_err(efx, hw, efx->net_dev,
+				  "ERROR: Unable to fetch VFDI request from %s rc %d\n",
+				  vf->pci_name, -rc);
+		vf->busy = false;
+		return;
+	}
+
+	if (req->op < VFDI_OP_LIMIT && vfdi_ops[req->op] != NULL) {
+		rc = vfdi_ops[req->op](vf);
+		if (rc == 0) {
+			netif_dbg(efx, hw, efx->net_dev,
+				  "vfdi request %d from %s ok\n",
+				  req->op, vf->pci_name);
+		}
+	} else {
+		netif_err(efx, hw, efx->net_dev,
+			  "ERROR: Unrecognised request %d from VF %s addr "
+			  "%llx\n", req->op, vf->pci_name,
+			  (unsigned long long)vf->req_addr);
+		rc = VFDI_RC_EOPNOTSUPP;
+	}
+
+	/* Allow subsequent VF requests */
+	vf->busy = false;
+	smp_wmb();
+
+	/* Respond to the request */
+	req->rc = rc;
+	req->op = VFDI_OP_RESPONSE;
+
+	memset(copy, '\0', sizeof(copy));
+	copy[0].from_buf = &req->rc;
+	copy[0].to_rid = vf->pci_rid;
+	copy[0].to_addr = vf->req_addr + offsetof(struct vfdi_req, rc);
+	copy[0].length = sizeof(req->rc);
+	copy[1].from_buf = &req->op;
+	copy[1].to_rid = vf->pci_rid;
+	copy[1].to_addr = vf->req_addr + offsetof(struct vfdi_req, op);
+	copy[1].length = sizeof(req->op);
+
+	(void) efx_sriov_memcpy(efx, copy, ARRAY_SIZE(copy));
+}
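+
+/* The response above copies rc before op; assuming the MC processes
+ * MEMCPY records in order, a VF that polls for op == VFDI_OP_RESPONSE
+ * is guaranteed to read the final rc.  A hypothetical VF-side wait
+ * (the VF driver is a separate codebase) might look like:
+ *
+ *	while (ACCESS_ONCE(req->op) != VFDI_OP_RESPONSE)
+ *		cpu_relax();
+ *	rmb();
+ *	rc = req->rc;
+ */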
+
+
+
+/* After a reset the event queues inside the guests no longer exist. Fill the
+ * event ring in guest memory with VFDI reset events, then (re-)initialise the
+ * event queue to raise an interrupt. The guest driver will then recover.
+ */
+static void efx_sriov_reset_vf(struct efx_vf *vf, struct efx_buffer *buffer)
+{
+	struct efx_nic *efx = vf->efx;
+	struct efx_memcpy_req copy_req[4];
+	efx_qword_t event;
+	unsigned int pos, count, k, buftbl, abs_evq;
+	efx_oword_t reg;
+	efx_dword_t ptr;
+	int rc;
+
+	BUG_ON(buffer->len != EFX_PAGE_SIZE);
+
+	if (!vf->evq0_count)
+		return;
+	BUG_ON(vf->evq0_count & (vf->evq0_count - 1));
+
+	mutex_lock(&vf->status_lock);
+	EFX_POPULATE_QWORD_3(event,
+			     FSF_AZ_EV_CODE, FSE_CZ_EV_CODE_USER_EV,
+			     VFDI_EV_SEQ, vf->msg_seqno,
+			     VFDI_EV_TYPE, VFDI_EV_TYPE_RESET);
+	vf->msg_seqno++;
+	for (pos = 0; pos < EFX_PAGE_SIZE; pos += sizeof(event))
+		memcpy(buffer->addr + pos, &event, sizeof(event));
+
+	for (pos = 0; pos < vf->evq0_count; pos += count) {
+		count = min_t(unsigned, vf->evq0_count - pos,
+			      ARRAY_SIZE(copy_req));
+		for (k = 0; k < count; k++) {
+			copy_req[k].from_buf = NULL;
+			copy_req[k].from_rid = efx->pci_dev->devfn;
+			copy_req[k].from_addr = buffer->dma_addr;
+			copy_req[k].to_rid = vf->pci_rid;
+			copy_req[k].to_addr = vf->evq0_addrs[pos + k];
+			copy_req[k].length = EFX_PAGE_SIZE;
+		}
+		rc = efx_sriov_memcpy(efx, copy_req, count);
+		if (rc) {
+			if (net_ratelimit())
+				netif_err(efx, hw, efx->net_dev,
+					  "ERROR: Unable to notify %s of reset"
+					  ": %d\n", vf->pci_name, -rc);
+			break;
+		}
+	}
+
+	/* Reinitialise, arm and trigger evq0 */
+	abs_evq = abs_index(vf, 0);
+	buftbl = EFX_BUFTBL_EVQ_BASE(vf, 0);
+	efx_sriov_bufs(efx, buftbl, vf->evq0_addrs, vf->evq0_count);
+
+	EFX_POPULATE_OWORD_3(reg,
+			     FRF_CZ_TIMER_Q_EN, 1,
+			     FRF_CZ_HOST_NOTIFY_MODE, 0,
+			     FRF_CZ_TIMER_MODE, FFE_CZ_TIMER_MODE_DIS);
+	efx_writeo_table(efx, &reg, FR_BZ_TIMER_TBL, abs_evq);
+	EFX_POPULATE_OWORD_3(reg,
+			     FRF_AZ_EVQ_EN, 1,
+			     FRF_AZ_EVQ_SIZE, __ffs(vf->evq0_count),
+			     FRF_AZ_EVQ_BUF_BASE_ID, buftbl);
+	efx_writeo_table(efx, &reg, FR_BZ_EVQ_PTR_TBL, abs_evq);
+	EFX_POPULATE_DWORD_1(ptr, FRF_AZ_EVQ_RPTR, 0);
+	efx_writed_table(efx, &ptr, FR_BZ_EVQ_RPTR, abs_evq);
+
+	mutex_unlock(&vf->status_lock);
+}
+
+static void efx_sriov_reset_vf_work(struct work_struct *work)
+{
+	struct efx_vf *vf = container_of(work, struct efx_vf, reset_work);
+	struct efx_nic *efx = vf->efx;
+	struct efx_buffer buf;
+
+	if (!efx_nic_alloc_buffer(efx, &buf, EFX_PAGE_SIZE)) {
+		efx_sriov_reset_vf(vf, &buf);
+		efx_nic_free_buffer(efx, &buf);
+	}
+}
+
+static void efx_sriov_handle_no_channel(struct efx_nic *efx)
+{
+	netif_err(efx, drv, efx->net_dev,
+		  "ERROR: IOV requires MSI-X and 1 additional interrupt"
+		  "vector. IOV disabled\n");
+	efx->vf_count = 0;
+}
+
+static int efx_sriov_probe_channel(struct efx_channel *channel)
+{
+	channel->efx->vfdi_channel = channel;
+	return 0;
+}
+
+static void
+efx_sriov_get_channel_name(struct efx_channel *channel, char *buf, size_t len)
+{
+	snprintf(buf, len, "%s-iov", channel->efx->name);
+}
+
+static const struct efx_channel_type efx_sriov_channel_type = {
+	.handle_no_channel	= efx_sriov_handle_no_channel,
+	.pre_probe		= efx_sriov_probe_channel,
+	.get_name		= efx_sriov_get_channel_name,
+	/* no copy operation; channel must not be reallocated */
+	.keep_eventq		= true,
+};
+
+void efx_sriov_probe(struct efx_nic *efx)
+{
+	unsigned count;
+
+	if (!max_vfs)
+		return;
+
+	if (efx_sriov_cmd(efx, false, &efx->vi_scale, &count))
+		return;
+	if (count > 0 && count > max_vfs)
+		count = max_vfs;
+
+	/* efx_nic_dimension_resources() will reduce vf_count as appropriate */
+	efx->vf_count = count;
+
+	efx->extra_channel_type[EFX_EXTRA_CHANNEL_IOV] = &efx_sriov_channel_type;
+}
+
+/* Copy the list of individual addresses into the vfdi_status.peers
+ * array and auxiliary pages, protected by %local_lock. Drop that lock
+ * and then broadcast the address list to every VF.
+ */
+static void efx_sriov_peer_work(struct work_struct *data)
+{
+	struct efx_nic *efx = container_of(data, struct efx_nic, peer_work);
+	struct vfdi_status *vfdi_status = efx->vfdi_status.addr;
+	struct efx_vf *vf;
+	struct efx_local_addr *local_addr;
+	struct vfdi_endpoint *peer;
+	struct efx_endpoint_page *epp;
+	struct list_head pages;
+	unsigned int peer_space;
+	unsigned int peer_count;
+	unsigned int pos;
+
+	mutex_lock(&efx->local_lock);
+
+	/* Move the existing peer pages off %local_page_list */
+	INIT_LIST_HEAD(&pages);
+	list_splice_tail_init(&efx->local_page_list, &pages);
+
+	/* Populate the VF addresses starting from entry 1 (entry 0 is
+	 * the PF address)
+	 */
+	peer = vfdi_status->peers + 1;
+	peer_space = ARRAY_SIZE(vfdi_status->peers) - 1;
+	peer_count = 1;
+	for (pos = 0; pos < efx->vf_count; ++pos) {
+		vf = efx->vf + pos;
+
+		mutex_lock(&vf->status_lock);
+		if (vf->rx_filtering && !is_zero_ether_addr(vf->addr.mac_addr)) {
+			*peer++ = vf->addr;
+			++peer_count;
+			--peer_space;
+			BUG_ON(peer_space == 0);
+		}
+		mutex_unlock(&vf->status_lock);
+	}
+
+	/* Fill the remaining addresses */
+	list_for_each_entry(local_addr, &efx->local_addr_list, link) {
+		memcpy(peer->mac_addr, local_addr->addr, ETH_ALEN);
+		peer->tci = 0;
+		++peer;
+		++peer_count;
+		if (--peer_space == 0) {
+			if (list_empty(&pages)) {
+				epp = kmalloc(sizeof(*epp), GFP_KERNEL);
+				if (!epp)
+					break;
+				epp->ptr = dma_alloc_coherent(
+					&efx->pci_dev->dev, EFX_PAGE_SIZE,
+					&epp->addr, GFP_KERNEL);
+				if (!epp->ptr) {
+					kfree(epp);
+					break;
+				}
+			} else {
+				epp = list_first_entry(
+					&pages, struct efx_endpoint_page, link);
+				list_del(&epp->link);
+			}
+
+			list_add_tail(&epp->link, &efx->local_page_list);
+			peer = (struct vfdi_endpoint *)epp->ptr;
+			peer_space = EFX_PAGE_SIZE / sizeof(struct vfdi_endpoint);
+		}
+	}
+	vfdi_status->peer_count = peer_count;
+	mutex_unlock(&efx->local_lock);
+
+	/* Free any now unused endpoint pages */
+	while (!list_empty(&pages)) {
+		epp = list_first_entry(
+			&pages, struct efx_endpoint_page, link);
+		list_del(&epp->link);
+		dma_free_coherent(&efx->pci_dev->dev, EFX_PAGE_SIZE,
+				  epp->ptr, epp->addr);
+		kfree(epp);
+	}
+
+	/* Finally, push the pages */
+	for (pos = 0; pos < efx->vf_count; ++pos) {
+		vf = efx->vf + pos;
+
+		mutex_lock(&vf->status_lock);
+		if (vf->status_addr)
+			__efx_sriov_push_vf_status(vf);
+		mutex_unlock(&vf->status_lock);
+	}
+}
+
+static void efx_sriov_free_local(struct efx_nic *efx)
+{
+	struct efx_local_addr *local_addr;
+	struct efx_endpoint_page *epp;
+
+	while (!list_empty(&efx->local_addr_list)) {
+		local_addr = list_first_entry(&efx->local_addr_list,
+					      struct efx_local_addr, link);
+		list_del(&local_addr->link);
+		kfree(local_addr);
+	}
+
+	while (!list_empty(&efx->local_page_list)) {
+		epp = list_first_entry(&efx->local_page_list,
+				       struct efx_endpoint_page, link);
+		list_del(&epp->link);
+		dma_free_coherent(&efx->pci_dev->dev, EFX_PAGE_SIZE,
+				  epp->ptr, epp->addr);
+		kfree(epp);
+	}
+}
+
+static int efx_sriov_vf_alloc(struct efx_nic *efx)
+{
+	unsigned index;
+	struct efx_vf *vf;
+
+	efx->vf = kzalloc(sizeof(struct efx_vf) * efx->vf_count, GFP_KERNEL);
+	if (!efx->vf)
+		return -ENOMEM;
+
+	for (index = 0; index < efx->vf_count; ++index) {
+		vf = efx->vf + index;
+
+		vf->efx = efx;
+		vf->index = index;
+		vf->rx_filter_id = -1;
+		vf->tx_filter_mode = VF_TX_FILTER_AUTO;
+		vf->tx_filter_id = -1;
+		INIT_WORK(&vf->req, efx_sriov_vfdi);
+		INIT_WORK(&vf->reset_work, efx_sriov_reset_vf_work);
+		init_waitqueue_head(&vf->flush_waitq);
+		mutex_init(&vf->status_lock);
+		mutex_init(&vf->txq_lock);
+	}
+
+	return 0;
+}
+
+static void efx_sriov_vfs_fini(struct efx_nic *efx)
+{
+	struct efx_vf *vf;
+	unsigned int pos;
+
+	for (pos = 0; pos < efx->vf_count; ++pos) {
+		vf = efx->vf + pos;
+
+		efx_nic_free_buffer(efx, &vf->buf);
+		kfree(vf->peer_page_addrs);
+		vf->peer_page_addrs = NULL;
+		vf->peer_page_count = 0;
+
+		vf->evq0_count = 0;
+	}
+}
+
+static int efx_sriov_vfs_init(struct efx_nic *efx)
+{
+	struct pci_dev *pci_dev = efx->pci_dev;
+	unsigned index, devfn, sriov, buftbl_base;
+	u16 offset, stride;
+	struct efx_vf *vf;
+	int rc;
+
+	sriov = pci_find_ext_capability(pci_dev, PCI_EXT_CAP_ID_SRIOV);
+	if (!sriov)
+		return -ENOENT;
+
+	pci_read_config_word(pci_dev, sriov + PCI_SRIOV_VF_OFFSET, &offset);
+	pci_read_config_word(pci_dev, sriov + PCI_SRIOV_VF_STRIDE, &stride);
+
+	buftbl_base = efx->vf_buftbl_base;
+	devfn = pci_dev->devfn + offset;
+	for (index = 0; index < efx->vf_count; ++index) {
+		vf = efx->vf + index;
+
+		/* Reserve buffer entries */
+		vf->buftbl_base = buftbl_base;
+		buftbl_base += EFX_VF_BUFTBL_PER_VI * efx_vf_size(efx);
+
+		vf->pci_rid = devfn;
+		snprintf(vf->pci_name, sizeof(vf->pci_name),
+			 "%04x:%02x:%02x.%d",
+			 pci_domain_nr(pci_dev->bus), pci_dev->bus->number,
+			 PCI_SLOT(devfn), PCI_FUNC(devfn));
+
+		rc = efx_nic_alloc_buffer(efx, &vf->buf, EFX_PAGE_SIZE);
+		if (rc)
+			goto fail;
+
+		devfn += stride;
+	}
+
+	return 0;
+
+fail:
+	efx_sriov_vfs_fini(efx);
+	return rc;
+}
+
+int efx_sriov_init(struct efx_nic *efx)
+{
+	struct net_device *net_dev = efx->net_dev;
+	struct vfdi_status *vfdi_status;
+	int rc;
+
+	/* Ensure there's room for the VFDI channel */
+	BUILD_BUG_ON(EFX_MAX_CHANNELS + 1 >= EFX_VI_BASE);
+	/* Ensure that VI_BASE is aligned on VI_SCALE */
+	BUILD_BUG_ON(EFX_VI_BASE & ((1 << EFX_VI_SCALE_MAX) - 1));
+
+	if (efx->vf_count == 0)
+		return 0;
+
+	rc = efx_sriov_cmd(efx, true, NULL, NULL);
+	if (rc)
+		goto fail_cmd;
+
+	rc = efx_nic_alloc_buffer(efx, &efx->vfdi_status, sizeof(*vfdi_status));
+	if (rc)
+		goto fail_status;
+	vfdi_status = efx->vfdi_status.addr;
+	memset(vfdi_status, 0, sizeof(*vfdi_status));
+	vfdi_status->version = 1;
+	vfdi_status->length = sizeof(*vfdi_status);
+	vfdi_status->max_tx_channels = vf_max_tx_channels;
+	vfdi_status->vi_scale = efx->vi_scale;
+	vfdi_status->rss_rxq_count = efx->rss_spread;
+	vfdi_status->peer_count = 1 + efx->vf_count;
+	vfdi_status->timer_quantum_ns = efx->timer_quantum_ns;
+
+	rc = efx_sriov_vf_alloc(efx);
+	if (rc)
+		goto fail_alloc;
+
+	mutex_init(&efx->local_lock);
+	INIT_WORK(&efx->peer_work, efx_sriov_peer_work);
+	INIT_LIST_HEAD(&efx->local_addr_list);
+	INIT_LIST_HEAD(&efx->local_page_list);
+
+	rc = efx_sriov_vfs_init(efx);
+	if (rc)
+		goto fail_vfs;
+
+	rtnl_lock();
+	memcpy(vfdi_status->peers[0].mac_addr,
+	       net_dev->dev_addr, ETH_ALEN);
+	efx->vf_init_count = efx->vf_count;
+	rtnl_unlock();
+
+	efx_sriov_usrev(efx, true);
+
+	/* At this point we must be ready to accept VFDI requests */
+
+	rc = pci_enable_sriov(efx->pci_dev, efx->vf_count);
+	if (rc)
+		goto fail_pci;
+
+	netif_info(efx, probe, net_dev,
+		   "enabled SR-IOV for %d VFs, %d VI per VF\n",
+		   efx->vf_count, efx_vf_size(efx));
+	return 0;
+
+fail_pci:
+	efx_sriov_usrev(efx, false);
+	rtnl_lock();
+	efx->vf_init_count = 0;
+	rtnl_unlock();
+	efx_sriov_vfs_fini(efx);
+fail_vfs:
+	cancel_work_sync(&efx->peer_work);
+	efx_sriov_free_local(efx);
+	kfree(efx->vf);
+fail_alloc:
+	efx_nic_free_buffer(efx, &efx->vfdi_status);
+fail_status:
+	efx_sriov_cmd(efx, false, NULL, NULL);
+fail_cmd:
+	return rc;
+}
+
+void efx_sriov_fini(struct efx_nic *efx)
+{
+	struct efx_vf *vf;
+	unsigned int pos;
+
+	if (efx->vf_init_count == 0)
+		return;
+
+	/* Disable all the interfaces by which reconfiguration can occur */
+	BUG_ON(efx->vfdi_channel->enabled);
+	efx_sriov_usrev(efx, false);
+	rtnl_lock();
+	efx->vf_init_count = 0;
+	rtnl_unlock();
+
+	/* Flush all reconfiguration work */
+	for (pos = 0; pos < efx->vf_count; ++pos) {
+		vf = efx->vf + pos;
+		cancel_work_sync(&vf->req);
+		cancel_work_sync(&vf->reset_work);
+	}
+	cancel_work_sync(&efx->peer_work);
+
+	pci_disable_sriov(efx->pci_dev);
+
+	/* Tear down back-end state */
+	efx_sriov_vfs_fini(efx);
+	efx_sriov_free_local(efx);
+	kfree(efx->vf);
+	efx_nic_free_buffer(efx, &efx->vfdi_status);
+	efx_sriov_cmd(efx, false, NULL, NULL);
+}
+
+void efx_sriov_event(struct efx_channel *channel, efx_qword_t *event)
+{
+	struct efx_nic *efx = channel->efx;
+	struct efx_vf *vf;
+	unsigned qid, seq, type, data;
+
+	qid = EFX_QWORD_FIELD(*event, FSF_CZ_USER_QID);
+
+	/* USR_EV_REG_VALUE is dword0, so access the VFDI_EV fields directly */
+	BUILD_BUG_ON(FSF_CZ_USER_EV_REG_VALUE_LBN != 0);
+	seq = EFX_QWORD_FIELD(*event, VFDI_EV_SEQ);
+	type = EFX_QWORD_FIELD(*event, VFDI_EV_TYPE);
+	data = EFX_QWORD_FIELD(*event, VFDI_EV_DATA);
+
+	netif_vdbg(efx, hw, efx->net_dev,
+		   "USR_EV event from qid %d seq 0x%x type %d data 0x%x\n",
+		   qid, seq, type, data);
+
+	if (map_vi_index(efx, qid, &vf, NULL))
+		return;
+	if (vf->busy)
+		goto error;
+
+	if (type == VFDI_EV_TYPE_REQ_WORD0) {
+		/* Resynchronise */
+		vf->req_type = VFDI_EV_TYPE_REQ_WORD0;
+		vf->req_seqno = seq + 1;
+		vf->req_addr = 0;
+	} else if (seq != (vf->req_seqno++ & 0xff) || type != vf->req_type)
+		goto error;
+
+	switch (vf->req_type) {
+	case VFDI_EV_TYPE_REQ_WORD0:
+	case VFDI_EV_TYPE_REQ_WORD1:
+	case VFDI_EV_TYPE_REQ_WORD2:
+		vf->req_addr |= (u64)data << (vf->req_type << 4);
+		++vf->req_type;
+		return;
+
+	case VFDI_EV_TYPE_REQ_WORD3:
+		vf->req_addr |= (u64)data << 48;
+		vf->req_type = VFDI_EV_TYPE_REQ_WORD0;
+		vf->busy = true;
+		queue_work(vfdi_workqueue, &vf->req);
+		return;
+	}
+
+error:
+	if (net_ratelimit())
+		netif_err(efx, hw, efx->net_dev,
+			  "ERROR: Screaming VFDI request from %s\n",
+			  vf->pci_name);
+	/* Reset the request and sequence number */
+	vf->req_type = VFDI_EV_TYPE_REQ_WORD0;
+	vf->req_seqno = seq + 1;
+}
+
+void efx_sriov_flr(struct efx_nic *efx, unsigned vf_i)
+{
+	struct efx_vf *vf;
+
+	if (vf_i >= efx->vf_init_count)
+		return;
+	vf = efx->vf + vf_i;
+	netif_info(efx, hw, efx->net_dev,
+		   "FLR on VF %s\n", vf->pci_name);
+
+	vf->status_addr = 0;
+	efx_vfdi_remove_all_filters(vf);
+	efx_vfdi_flush_clear(vf);
+
+	vf->evq0_count = 0;
+}
+
+void efx_sriov_mac_address_changed(struct efx_nic *efx)
+{
+	struct vfdi_status *vfdi_status = efx->vfdi_status.addr;
+
+	if (!efx->vf_init_count)
+		return;
+	memcpy(vfdi_status->peers[0].mac_addr,
+	       efx->net_dev->dev_addr, ETH_ALEN);
+	queue_work(vfdi_workqueue, &efx->peer_work);
+}
+
+void efx_sriov_tx_flush_done(struct efx_nic *efx, efx_qword_t *event)
+{
+	struct efx_vf *vf;
+	unsigned queue, qid;
+
+	queue = EFX_QWORD_FIELD(*event, FSF_AZ_DRIVER_EV_SUBDATA);
+	if (map_vi_index(efx, queue, &vf, &qid))
+		return;
+	/* Ignore flush completions triggered by an FLR */
+	if (!test_bit(qid, vf->txq_mask))
+		return;
+
+	__clear_bit(qid, vf->txq_mask);
+	--vf->txq_count;
+
+	if (efx_vfdi_flush_wake(vf))
+		wake_up(&vf->flush_waitq);
+}
+
+void efx_sriov_rx_flush_done(struct efx_nic *efx, efx_qword_t *event)
+{
+	struct efx_vf *vf;
+	unsigned ev_failed, queue, qid;
+
+	queue = EFX_QWORD_FIELD(*event, FSF_AZ_DRIVER_EV_RX_DESCQ_ID);
+	ev_failed = EFX_QWORD_FIELD(*event,
+				    FSF_AZ_DRIVER_EV_RX_FLUSH_FAIL);
+	if (map_vi_index(efx, queue, &vf, &qid))
+		return;
+	if (!test_bit(qid, vf->rxq_mask))
+		return;
+
+	if (ev_failed) {
+		set_bit(qid, vf->rxq_retry_mask);
+		atomic_inc(&vf->rxq_retry_count);
+	} else {
+		__clear_bit(qid, vf->rxq_mask);
+		--vf->rxq_count;
+	}
+	if (efx_vfdi_flush_wake(vf))
+		wake_up(&vf->flush_waitq);
+}
+
+/* Called from napi. Schedule the reset work item */
+void efx_sriov_desc_fetch_err(struct efx_nic *efx, unsigned dmaq)
+{
+	struct efx_vf *vf;
+	unsigned int rel;
+
+	if (map_vi_index(efx, dmaq, &vf, &rel))
+		return;
+
+	if (net_ratelimit())
+		netif_err(efx, hw, efx->net_dev,
+			  "VF %d DMA Q %d reports descriptor fetch error.\n",
+			  vf->index, rel);
+	queue_work(vfdi_workqueue, &vf->reset_work);
+}
+
+/* Reset all VFs */
+void efx_sriov_reset(struct efx_nic *efx)
+{
+	unsigned int vf_i;
+	struct efx_buffer buf;
+	struct efx_vf *vf;
+
+	ASSERT_RTNL();
+
+	if (efx->vf_init_count == 0)
+		return;
+
+	efx_sriov_usrev(efx, true);
+	(void)efx_sriov_cmd(efx, true, NULL, NULL);
+
+	if (efx_nic_alloc_buffer(efx, &buf, EFX_PAGE_SIZE))
+		return;
+
+	for (vf_i = 0; vf_i < efx->vf_init_count; ++vf_i) {
+		vf = efx->vf + vf_i;
+		efx_sriov_reset_vf(vf, &buf);
+	}
+
+	efx_nic_free_buffer(efx, &buf);
+}
+
+int efx_init_sriov(void)
+{
+	/* A single threaded workqueue is sufficient. efx_sriov_vfdi() and
+	 * efx_sriov_peer_work() spend almost all their time sleeping for
+	 * MCDI to complete anyway
+	 */
+	vfdi_workqueue = create_singlethread_workqueue("sfc_vfdi");
+	if (!vfdi_workqueue)
+		return -ENOMEM;
+
+	return 0;
+}
+
+void efx_fini_sriov(void)
+{
+	destroy_workqueue(vfdi_workqueue);
+}
+
+int efx_sriov_set_vf_mac(struct net_device *net_dev, int vf_i, u8 *mac)
+{
+	struct efx_nic *efx = netdev_priv(net_dev);
+	struct efx_vf *vf;
+
+	if (vf_i >= efx->vf_init_count)
+		return -EINVAL;
+	vf = efx->vf + vf_i;
+
+	mutex_lock(&vf->status_lock);
+	memcpy(vf->addr.mac_addr, mac, ETH_ALEN);
+	__efx_sriov_update_vf_addr(vf);
+	mutex_unlock(&vf->status_lock);
+
+	return 0;
+}
+
+int efx_sriov_set_vf_vlan(struct net_device *net_dev, int vf_i,
+			  u16 vlan, u8 qos)
+{
+	struct efx_nic *efx = netdev_priv(net_dev);
+	struct efx_vf *vf;
+	u16 tci;
+
+	if (vf_i >= efx->vf_init_count)
+		return -EINVAL;
+	vf = efx->vf + vf_i;
+
+	mutex_lock(&vf->status_lock);
+	tci = (vlan & VLAN_VID_MASK) | ((qos & 0x7) << VLAN_PRIO_SHIFT);
+	vf->addr.tci = htons(tci);
+	__efx_sriov_update_vf_addr(vf);
+	mutex_unlock(&vf->status_lock);
+
+	return 0;
+}
+
+int efx_sriov_set_vf_spoofchk(struct net_device *net_dev, int vf_i,
+			      bool spoofchk)
+{
+	struct efx_nic *efx = netdev_priv(net_dev);
+	struct efx_vf *vf;
+	int rc;
+
+	if (vf_i >= efx->vf_init_count)
+		return -EINVAL;
+	vf = efx->vf + vf_i;
+
+	mutex_lock(&vf->txq_lock);
+	if (vf->txq_count == 0) {
+		vf->tx_filter_mode =
+			spoofchk ? VF_TX_FILTER_ON : VF_TX_FILTER_OFF;
+		rc = 0;
+	} else {
+		/* This cannot be changed while TX queues are running */
+		rc = -EBUSY;
+	}
+	mutex_unlock(&vf->txq_lock);
+	return rc;
+}
+
+int efx_sriov_get_vf_config(struct net_device *net_dev, int vf_i,
+			    struct ifla_vf_info *ivi)
+{
+	struct efx_nic *efx = netdev_priv(net_dev);
+	struct efx_vf *vf;
+	u16 tci;
+
+	if (vf_i >= efx->vf_init_count)
+		return -EINVAL;
+	vf = efx->vf + vf_i;
+
+	ivi->vf = vf_i;
+	memcpy(ivi->mac, vf->addr.mac_addr, ETH_ALEN);
+	ivi->tx_rate = 0;
+	tci = ntohs(vf->addr.tci);
+	ivi->vlan = tci & VLAN_VID_MASK;
+	ivi->qos = (tci >> VLAN_PRIO_SHIFT) & 0x7;
+	ivi->spoofchk = vf->tx_filter_mode == VF_TX_FILTER_ON;
+
+	return 0;
+}
+
diff --git a/drivers/net/ethernet/sfc/vfdi.h b/drivers/net/ethernet/sfc/vfdi.h
new file mode 100644
index 0000000..656fa70
--- /dev/null
+++ b/drivers/net/ethernet/sfc/vfdi.h
@@ -0,0 +1,254 @@
+/****************************************************************************
+ * Driver for Solarflare Solarstorm network controllers and boards
+ * Copyright 2010-2012 Solarflare Communications Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation, incorporated herein by reference.
+ */
+#ifndef _VFDI_H
+#define _VFDI_H
+
+/**
+ * DOC: Virtual Function Driver Interface
+ *
+ * This file contains software structures used to form a two way
+ * communication channel between the VF driver and the PF driver,
+ * named Virtual Function Driver Interface (VFDI).
+ *
+ * For the purposes of VFDI, a page is a memory region with size and
+ * alignment of 4K.  All addresses are DMA addresses to be used within
+ * the domain of the relevant VF.
+ *
+ * The only hardware-defined channels for a VF driver to communicate
+ * with the PF driver are the event mailboxes (%FR_CZ_USR_EV
+ * registers).  Writing to these registers generates an event with
+ * EV_CODE = EV_CODE_USR_EV, USER_QID set to the index of the mailbox
+ * and USER_EV_REG_VALUE set to the value written.  The PF driver may
+ * direct or disable delivery of these events by setting
+ * %FR_CZ_USR_EV_CFG.
+ *
+ * The PF driver can send arbitrary events to arbitrary event queues.
+ * However, for consistency, VFDI events from the PF are defined to
+ * follow the same form and be sent to the first event queue assigned
+ * to the VF while that queue is enabled by the VF driver.
+ *
+ * The general form of the variable bits of VFDI events is:
+ *
+ *       0             16                       24   31
+ *      | DATA        | TYPE                   | SEQ   |
+ *
+ * SEQ is a sequence number which should be incremented by 1 (modulo
+ * 256) for each event.  The sequence numbers used in each direction
+ * are independent.
+ *
+ * The VF submits requests of type &struct vfdi_req by sending the
+ * address of the request (ADDR) in a series of 4 events:
+ *
+ *       0             16                       24   31
+ *      | ADDR[0:15]  | VFDI_EV_TYPE_REQ_WORD0 | SEQ   |
+ *      | ADDR[16:31] | VFDI_EV_TYPE_REQ_WORD1 | SEQ+1 |
+ *      | ADDR[32:47] | VFDI_EV_TYPE_REQ_WORD2 | SEQ+2 |
+ *      | ADDR[48:63] | VFDI_EV_TYPE_REQ_WORD3 | SEQ+3 |
+ *
+ * The address must be page-aligned.  After receiving such a valid
+ * series of events, the PF driver will attempt to read the request
+ * and write a response to the same address.  In case of an invalid
+ * sequence of events or a DMA error, there will be no response.
+ *
+ * The VF driver may request that the PF driver writes status
+ * information into its domain asynchronously.  After writing the
+ * status, the PF driver will send an event of the form:
+ *
+ *       0             16                       24   31
+ *      | reserved    | VFDI_EV_TYPE_STATUS    | SEQ   |
+ *
+ * In case the VF must be reset for any reason, the PF driver will
+ * send an event of the form:
+ *
+ *       0             16                       24   31
+ *      | reserved    | VFDI_EV_TYPE_RESET     | SEQ   |
+ *
+ * It is then the responsibility of the VF driver to request
+ * reinitialisation of its queues.
+ */
+#define VFDI_EV_SEQ_LBN 24
+#define VFDI_EV_SEQ_WIDTH 8
+#define VFDI_EV_TYPE_LBN 16
+#define VFDI_EV_TYPE_WIDTH 8
+#define VFDI_EV_TYPE_REQ_WORD0 0
+#define VFDI_EV_TYPE_REQ_WORD1 1
+#define VFDI_EV_TYPE_REQ_WORD2 2
+#define VFDI_EV_TYPE_REQ_WORD3 3
+#define VFDI_EV_TYPE_STATUS 4
+#define VFDI_EV_TYPE_RESET 5
+#define VFDI_EV_DATA_LBN 0
+#define VFDI_EV_DATA_WIDTH 16
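+
+/* A sketch of how a VF driver might encode the four request events for
+ * a page-aligned request address req_addr (illustrative only; the VF
+ * driver is a separate codebase):
+ *
+ *	for (i = 0; i < 4; i++) {
+ *		EFX_POPULATE_QWORD_3(event,
+ *				     VFDI_EV_SEQ, seqno++ & 0xff,
+ *				     VFDI_EV_TYPE, VFDI_EV_TYPE_REQ_WORD0 + i,
+ *				     VFDI_EV_DATA,
+ *				     (unsigned)(req_addr >> (i * 16)) & 0xffff);
+ *		(then write EFX_DWORD_0 of the event to this VI's
+ *		 FR_CZ_USR_EV mailbox register)
+ *	}
+ */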
+
+struct vfdi_endpoint {
+	u8 mac_addr[ETH_ALEN];
+	__be16 tci;
+};
+
+/**
+ * enum vfdi_op - VFDI operation enumeration
+ * @VFDI_OP_RESPONSE: Indicates a response to the request.
+ * @VFDI_OP_INIT_EVQ: Initialize SRAM entries and initialize an EVQ.
+ * @VFDI_OP_INIT_RXQ: Initialize SRAM entries and initialize an RXQ.
+ * @VFDI_OP_INIT_TXQ: Initialize SRAM entries and initialize a TXQ.
+ * @VFDI_OP_FINI_ALL_QUEUES: Flush all queues, finalize all queues, then
+ *	finalize the SRAM entries.
+ * @VFDI_OP_INSERT_FILTER: Insert a MAC filter targeting the given RXQ.
+ * @VFDI_OP_REMOVE_ALL_FILTERS: Remove all filters.
+ * @VFDI_OP_SET_STATUS_PAGE: Set the DMA page(s) used for status updates
+ *	from PF and write the initial status.
+ * @VFDI_OP_CLEAR_STATUS_PAGE: Clear the DMA page(s) used for status
+ *	updates from PF.
+ */
+enum vfdi_op {
+	VFDI_OP_RESPONSE = 0,
+	VFDI_OP_INIT_EVQ = 1,
+	VFDI_OP_INIT_RXQ = 2,
+	VFDI_OP_INIT_TXQ = 3,
+	VFDI_OP_FINI_ALL_QUEUES = 4,
+	VFDI_OP_INSERT_FILTER = 5,
+	VFDI_OP_REMOVE_ALL_FILTERS = 6,
+	VFDI_OP_SET_STATUS_PAGE = 7,
+	VFDI_OP_CLEAR_STATUS_PAGE = 8,
+	VFDI_OP_LIMIT,
+};
+
+/* Response codes for VFDI operations. Other values may be used in future. */
+#define VFDI_RC_SUCCESS		0
+#define VFDI_RC_ENOMEM		(-12)
+#define VFDI_RC_EINVAL		(-22)
+#define VFDI_RC_EOPNOTSUPP	(-95)
+#define VFDI_RC_ETIMEDOUT	(-110)
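+
+/* These mirror the corresponding negative Linux errno values
+ * (ENOMEM == 12, EINVAL == 22, EOPNOTSUPP == 95, ETIMEDOUT == 110),
+ * but are fixed constants so that the PF/VF ABI does not depend on
+ * both ends agreeing about errno numbering.
+ */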
+
+/**
+ * struct vfdi_req - Request from VF driver to PF driver
+ * @op: Operation code or response indicator, taken from &enum vfdi_op.
+ * @rc: Response code.  Set to 0 on success or a negative error code on failure.
+ * @u.init_evq.index: Index of event queue to create.
+ * @u.init_evq.buf_count: Number of 4k buffers backing event queue.
+ * @u.init_evq.addr: Array of length %u.init_evq.buf_count containing DMA
+ *	address of each page backing the event queue.
+ * @u.init_rxq.index: Index of receive queue to create.
+ * @u.init_rxq.buf_count: Number of 4k buffers backing receive queue.
+ * @u.init_rxq.evq: Instance of event queue to target receive events at.
+ * @u.init_rxq.label: Label used in receive events.
+ * @u.init_rxq.flags: Flags, e.g. %VFDI_RXQ_FLAG_SCATTER_EN to enable
+ *	RX scatter.
+ * @u.init_rxq.addr: Array of length %u.init_rxq.buf_count containing DMA
+ *	address of each page backing the receive queue.
+ * @u.init_txq.index: Index of transmit queue to create.
+ * @u.init_txq.buf_count: Number of 4k buffers backing transmit queue.
+ * @u.init_txq.evq: Instance of event queue to target transmit completion
+ *	events at.
+ * @u.init_txq.label: Label used in transmit completion events.
+ * @u.init_txq.flags: Checksum offload flags.
+ * @u.init_txq.addr: Array of length %u.init_txq.buf_count containing DMA
+ *	address of each page backing the transmit queue.
+ * @u.mac_filter.rxq: Insert MAC filter at VF local address/VLAN targeting
+ *	all traffic at this receive queue.
+ * @u.mac_filter.flags: MAC filter flags.
+ * @u.set_status_page.dma_addr: Base address for the &struct vfdi_status.
+ *	This address must be such that the structure fits within a page.
+ * @u.set_status_page.peer_page_count: Number of additional pages the VF
+ *	has provided into which peer addresses may be DMAd.
+ * @u.set_status_page.peer_page_addr: Array of DMA addresses of pages.
+ *	If the number of peers exceeds 256, then the VF must provide
+ *	additional pages in this array. The PF will then DMA up to
+ *	512 vfdi_endpoint structures into each page.  These addresses
+ *	must be page-aligned.
+ */
+struct vfdi_req {
+	u32 op;
+	u32 reserved1;
+	s32 rc;
+	u32 reserved2;
+	union {
+		struct {
+			u32 index;
+			u32 buf_count;
+			u64 addr[];
+		} init_evq;
+		struct {
+			u32 index;
+			u32 buf_count;
+			u32 evq;
+			u32 label;
+			u32 flags;
+#define VFDI_RXQ_FLAG_SCATTER_EN 1
+			u32 reserved;
+			u64 addr[];
+		} init_rxq;
+		struct {
+			u32 index;
+			u32 buf_count;
+			u32 evq;
+			u32 label;
+			u32 flags;
+#define VFDI_TXQ_FLAG_IP_CSUM_DIS 1
+#define VFDI_TXQ_FLAG_TCPUDP_CSUM_DIS 2
+			u32 reserved;
+			u64 addr[];
+		} init_txq;
+		struct {
+			u32 rxq;
+			u32 flags;
+#define VFDI_MAC_FILTER_FLAG_RSS 1
+#define VFDI_MAC_FILTER_FLAG_SCATTER 2
+		} mac_filter;
+		struct {
+			u64 dma_addr;
+			u64 peer_page_count;
+			u64 peer_page_addr[];
+		} set_status_page;
+	} u;
+};
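+
+/* Illustrative example of a VF filling in an INIT_EVQ request in its
+ * 4K request page (a sketch against the structures above, not code
+ * from this patch; evq_dma[] stands for the DMA addresses of the
+ * event queue's pages):
+ *
+ *	struct vfdi_req *req = req_page;
+ *
+ *	memset(req, 0, 4096);
+ *	req->op = VFDI_OP_INIT_EVQ;
+ *	req->u.init_evq.index = 0;
+ *	req->u.init_evq.buf_count = 2;	(must be a power of two)
+ *	req->u.init_evq.addr[0] = evq_dma[0];
+ *	req->u.init_evq.addr[1] = evq_dma[1];
+ */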
+
+/**
+ * struct vfdi_status - Status provided by PF driver to VF driver
+ * @generation_start: A generation count DMA'd to VF *before* the
+ *	rest of the structure.
+ * @generation_end: A generation count DMA'd to VF *after* the
+ *	rest of the structure.
+ * @version: Version of this structure; currently set to 1.  Later
+ *	versions must either be layout-compatible or only be sent to VFs
+ *	that specifically request them.
+ * @length: Total length of this structure including embedded tables
+ * @vi_scale: log2 of the number of VIs available on this VF. This quantity
+ *	is used by the hardware for register decoding.
+ * @max_tx_channels: The maximum number of transmit queues the VF can use.
+ * @rss_rxq_count: The number of receive queues present in the shared RSS
+ *	indirection table.
+ * @peer_count: Total number of peers in the complete peer list. If larger
+ *	than ARRAY_SIZE(%peers), then the VF must provide sufficient
+ *	additional pages each of which is filled with vfdi_endpoint structures.
+ * @local: The MAC address and outer VLAN tag of *this* VF
+ * @peers: Table of peer addresses.  The @tci fields in these structures
+ *	are currently unused and must be ignored.  Additional peers are
+ *	written into any additional pages provided by the VF.
+ * @timer_quantum_ns: Timer quantum (nominal period between timer ticks)
+ *	for interrupt moderation timers, in nanoseconds. This member is only
+ *	present if @length is sufficiently large.
+ */
+struct vfdi_status {
+	u32 generation_start;
+	u32 generation_end;
+	u32 version;
+	u32 length;
+	u8 vi_scale;
+	u8 max_tx_channels;
+	u8 rss_rxq_count;
+	u8 reserved1;
+	u16 peer_count;
+	u16 reserved2;
+	struct vfdi_endpoint local;
+	struct vfdi_endpoint peers[256];
+
+	/* Members below here extend version 1 of this structure */
+	u32 timer_quantum_ns;
+};
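+
+/* Because generation_start is DMA'd before the body and generation_end
+ * after it, a VF can take a consistent snapshot with a seqlock-style
+ * read.  A minimal sketch (hypothetical VF-side code, not part of this
+ * patch):
+ *
+ *	struct vfdi_status snap;
+ *	u32 end;
+ *
+ *	do {
+ *		end = status->generation_end;
+ *		rmb();
+ *		memcpy(&snap, status, sizeof(snap));
+ *		rmb();
+ *	} while (status->generation_start != end);
+ */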
+
+#endif
-- 
1.7.7.6


-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

* Re: [PATCH net-next 19/19] sfc: Add SR-IOV back-end support for SFC9000 family
  2012-02-16  0:52 ` [PATCH net-next 19/19] sfc: Add SR-IOV back-end support for SFC9000 family Ben Hutchings
@ 2012-02-16  1:18   ` John Fastabend
  2012-02-16  1:57     ` Ben Hutchings
  0 siblings, 1 reply; 25+ messages in thread
From: John Fastabend @ 2012-02-16  1:18 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: David Miller, netdev, linux-net-drivers, Shradha Shah

On 2/15/2012 4:52 PM, Ben Hutchings wrote:
> On the SFC9000 family, each port has 1024 Virtual Interfaces (VIs),
> each with an RX queue, a TX queue, an event queue and a mailbox
> register.  These may be assigned to up to 127 SR-IOV virtual functions
> per port, with up to 64 VIs per VF.
> 
> We allocate an extra channel (IRQ and event queue only) to receive
> requests from VF drivers.
> 
> There is a per-port limit of 4 concurrent RX queue flushes, and queue
> flushes may be initiated by the MC in response to a Function Level
> Reset (FLR) of a VF.  Therefore, when SR-IOV is in use, we submit all
> flush requests via the MC.
> 
> The RSS indirection table is shared with VFs, so the number of RX
> queues used in the PF is limited to the number of VIs per VF.
> 
> This is almost entirely the work of Steve Hodgson, formerly
> shodgson@solarflare.com.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
> ---

Hi Ben,

So how would multiple VIs per VF work? Looks like each VI has a TX/RX
pair all bundled under a single netdev with some set of TX MAC filters.

Do you expect users to build tc rules and edit the queue_mapping to get
the skb headed at the correct tx queue? Would it be better to model each
VI as its own net device?

Thanks,
John

* Re: [PATCH net-next 19/19] sfc: Add SR-IOV back-end support for SFC9000 family
  2012-02-16  1:18   ` John Fastabend
@ 2012-02-16  1:57     ` Ben Hutchings
  2012-02-21 19:52       ` John Fastabend
  0 siblings, 1 reply; 25+ messages in thread
From: Ben Hutchings @ 2012-02-16  1:57 UTC (permalink / raw)
  To: John Fastabend; +Cc: David Miller, netdev, linux-net-drivers, Shradha Shah

On Wed, 2012-02-15 at 17:18 -0800, John Fastabend wrote:
> On 2/15/2012 4:52 PM, Ben Hutchings wrote:
> > On the SFC9000 family, each port has 1024 Virtual Interfaces (VIs),
> > each with an RX queue, a TX queue, an event queue and a mailbox
> > register.  These may be assigned to up to 127 SR-IOV virtual functions
> > per port, with up to 64 VIs per VF.
> > 
> > We allocate an extra channel (IRQ and event queue only) to receive
> > requests from VF drivers.
> > 
> > There is a per-port limit of 4 concurrent RX queue flushes, and queue
> > flushes may be initiated by the MC in response to a Function Level
> > Reset (FLR) of a VF.  Therefore, when SR-IOV is in use, we submit all
> > flush requests via the MC.
> > 
> > The RSS indirection table is shared with VFs, so the number of RX
> > queues used in the PF is limited to the number of VIs per VF.
> > 
> > This is almost entirely the work of Steve Hodgson, formerly
> > shodgson@solarflare.com.
> > 
> > Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
> > ---
> 
> Hi Ben,
> 
> So how would multiple VIs per VF work? Looks like each VI has a TX/RX
> pair all bundled under a single netdev with some set of TX MAC filters.

They can be used to provide a multiqueue net device for use in multi-
processor guests.

> Do you expect users to build tc rules and edit the queue_mapping to get
> the skb headed at the correct tx queue? Would it be better to model each
> VI as its own net device?

No, we expect users to assign the VF into the guest.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

* Re: pull request: sfc-next 2012-02-16
  2012-02-16  0:42 pull request: sfc-next 2012-02-16 Ben Hutchings
                   ` (18 preceding siblings ...)
  2012-02-16  0:52 ` [PATCH net-next 19/19] sfc: Add SR-IOV back-end support for SFC9000 family Ben Hutchings
@ 2012-02-16 22:08 ` David Miller
  19 siblings, 0 replies; 25+ messages in thread
From: David Miller @ 2012-02-16 22:08 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers, sshah

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Thu, 16 Feb 2012 00:42:18 +0000

> The following changes since commit eb40d89276705a35c9a2e05793ae63411ae357eb:
> 
>   mlx4: add unicast steering entries to resource_tracker (2012-02-14 14:11:59 -0500)
> 
> are available in the git repository at:
>   git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfc-next.git for-davem
> 
> (commit cd2d5b529cdb9bd274f3e4bc68d37d4d63b7f383)
> 
> The main change here is the addition of SR-IOV support.

Pulled, thanks Ben.

* Re: [PATCH net-next 19/19] sfc: Add SR-IOV back-end support for SFC9000 family
  2012-02-16  1:57     ` Ben Hutchings
@ 2012-02-21 19:52       ` John Fastabend
  2012-02-21 20:24         ` Ben Hutchings
  0 siblings, 1 reply; 25+ messages in thread
From: John Fastabend @ 2012-02-21 19:52 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: David Miller, netdev, linux-net-drivers, Shradha Shah

On 2/15/2012 5:57 PM, Ben Hutchings wrote:
> On Wed, 2012-02-15 at 17:18 -0800, John Fastabend wrote:
>> On 2/15/2012 4:52 PM, Ben Hutchings wrote:
>>> On the SFC9000 family, each port has 1024 Virtual Interfaces (VIs),
>>> each with an RX queue, a TX queue, an event queue and a mailbox
>>> register.  These may be assigned to up to 127 SR-IOV virtual functions
>>> per port, with up to 64 VIs per VF.
>>>
>>> We allocate an extra channel (IRQ and event queue only) to receive
>>> requests from VF drivers.
>>>
>>> There is a per-port limit of 4 concurrent RX queue flushes, and queue
>>> flushes may be initiated by the MC in response to a Function Level
>>> Reset (FLR) of a VF.  Therefore, when SR-IOV is in use, we submit all
>>> flush requests via the MC.
>>>
>>> The RSS indirection table is shared with VFs, so the number of RX
>>> queues used in the PF is limited to the number of VIs per VF.
>>>
>>> This is almost entirely the work of Steve Hodgson, formerly
>>> shodgson@solarflare.com.
>>>
>>> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
>>> ---
>>
>> Hi Ben,
>>
>> So how would multiple VIs per VF work? Looks like each VI has a TX/RX
>> pair all bundled under a single netdev with some set of TX MAC filters.
> 
> They can be used to provide a multiqueue net device for use in multi-
> processor guests.

OK, thanks; it's really just a multiqueue VF then. Calling it a
virtual interface seems a bit confusing here. For example, it doesn't
resemble a VSI (Virtual Station Interface) per the 802.1Q spec at all.

I'm guessing using this with a TX MAC/VLAN filter looks something like
Intel's VMDQ solutions.

> 
>> Do you expect users to build tc rules and edit the queue_mapping to get
>> the skb headed at the correct tx queue? Would it be better to model each
>> VI as its own net device?
> 
> No, we expect users to assign the VF into the guest.
> 

Got it.

.John

* Re: [PATCH net-next 19/19] sfc: Add SR-IOV back-end support for SFC9000 family
  2012-02-21 19:52       ` John Fastabend
@ 2012-02-21 20:24         ` Ben Hutchings
  0 siblings, 0 replies; 25+ messages in thread
From: Ben Hutchings @ 2012-02-21 20:24 UTC (permalink / raw)
  To: John Fastabend; +Cc: David Miller, netdev, linux-net-drivers, Shradha Shah

On Tue, 2012-02-21 at 11:52 -0800, John Fastabend wrote:
> On 2/15/2012 5:57 PM, Ben Hutchings wrote:
> > On Wed, 2012-02-15 at 17:18 -0800, John Fastabend wrote:
> >> On 2/15/2012 4:52 PM, Ben Hutchings wrote:
> >>> On the SFC9000 family, each port has 1024 Virtual Interfaces (VIs),
> >>> each with an RX queue, a TX queue, an event queue and a mailbox
> >>> register.  These may be assigned to up to 127 SR-IOV virtual functions
> >>> per port, with up to 64 VIs per VF.
> >>>
> >>> We allocate an extra channel (IRQ and event queue only) to receive
> >>> requests from VF drivers.
> >>>
> >>> There is a per-port limit of 4 concurrent RX queue flushes, and queue
> >>> flushes may be initiated by the MC in response to a Function Level
> >>> Reset (FLR) of a VF.  Therefore, when SR-IOV is in use, we submit all
> >>> flush requests via the MC.
> >>>
> >>> The RSS indirection table is shared with VFs, so the number of RX
> >>> queues used in the PF is limited to the number of VIs per VF.
> >>>
> >>> This is almost entirely the work of Steve Hodgson, formerly
> >>> shodgson@solarflare.com.
> >>>
> >>> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
> >>> ---
> >>
> >> Hi Ben,
> >>
> >> So how would multiple VIs per VF work? Looks like each VI has a TX/RX
> >> pair all bundled under a single netdev with some set of TX MAC filters.
> > 
> > They can be used to provide a multiqueue net device for use in multi-
> > processor guests.
> 
> OK, thanks. It's really just a multiqueue VF then. Calling it a virtual
> interface seems a bit confusing here.

The Falcon architecture was designed around the needs of user-level
networking, so that we could give each process a Virtual Interface
consisting of one RX, one TX and one event queue by mapping one page of
MMIO space into the process's address space.  This term is now used to
refer to a set of queues accessible through a single page - but there is
no hard-wired connection between them or with other resources like
filters.
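
As a purely illustrative sketch of that idea (hypothetical names and
layout, not the driver's actual definitions, which live in
drivers/net/ethernet/sfc/regs.h), a VI can be pictured as one page of
registers controlling the three queues behind it:

    #include <linux/types.h>

    /* Hypothetical example only: one page of per-VI registers.  Mapping
     * this single page into a process (or assigning it to a VF) hands
     * over exactly one RX queue, one TX queue and one event queue. */
    struct example_vi_page {
            u32 rx_desc_doorbell;   /* post new RX descriptors */
            u32 tx_desc_doorbell;   /* post new TX descriptors */
            u32 evq_read_ptr;       /* acknowledge consumed events */
            /* ... padding out to a full page boundary ... */
    };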

> For example, it doesn't resemble
> a VSI (Virtual Station Interface) per the 802.1Q spec at all.

Right.  Also, Solarflare documentation uses the term 'VNIC' instead of
'VI', though that's not what is usually meant by 'vNIC' nowadays either.
But Solarflare and its predecessors were using these terms well before
networking virtualisation was cool. ;-)

> I'm guessing that using this with a TX MAC/VLAN filter looks something
> like Intel's VMDq solutions.

Possibly; I haven't compared.

Ben.
 
> >> Do you expect users to build tc rules and edit the queue_mapping to get
> >> the skb headed at the correct tx queue? Would it be better to model each
> >> VI as its own net device?
> > 
> > No, we expect users to assign the VF into the guest.
> > 
> 
> Got it.
> 
> .John

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
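
As a usage note on the VF-assignment model discussed above: the PF
administrator would typically pin down a VF's MAC address before handing
the device to a guest, e.g. with iproute2; the device name, VF index and
address here are illustrative assumptions:

    # On the PF: fix the MAC of VF 0 before assigning it to a guest
    ip link set dev eth0 vf 0 mac 02:12:34:56:78:9a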

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2012-02-21 20:24 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-02-16  0:42 pull request: sfc-next 2012-02-16 Ben Hutchings
2012-02-16  0:44 ` [PATCH net-next 01/19] sfc: Skip RX end-of-batch work on channels without an RX queue Ben Hutchings
2012-02-16  0:44 ` [PATCH net-next 02/19] sfc: Do not retry hardware probe if it schedules a reset Ben Hutchings
2012-02-16  0:45 ` [PATCH net-next 03/19] sfc: Replace some literal constants with EFX_PAGE_SIZE/EFX_BUF_SIZE Ben Hutchings
2012-02-16  0:45 ` [PATCH net-next 04/19] sfc: Warn if unable to create MTDs Ben Hutchings
2012-02-16  0:46 ` [PATCH net-next 05/19] sfc: Add support for configuring RX unicast/multicast default filters Ben Hutchings
2012-02-16  0:46 ` [PATCH net-next 06/19] sfc: Add support for TX MAC filters Ben Hutchings
2012-02-16  0:47 ` [PATCH net-next 07/19] sfc: Correct MAC filter bitfield definitions Ben Hutchings
2012-02-16  0:47 ` [PATCH net-next 08/19] sfc: Generalise driver event generation Ben Hutchings
2012-02-16  0:47 ` [PATCH net-next 09/19] sfc: Generate RX fill events based on RX queues, not channels Ben Hutchings
2012-02-16  0:48 ` [PATCH net-next 10/19] sfc: Leave interrupts and event queues enabled whenever we can Ben Hutchings
2012-02-16  0:48 ` [PATCH net-next 11/19] sfc: Use proper function to test for RX channel in efx_poll() Ben Hutchings
2012-02-16  0:48 ` [PATCH net-next 12/19] sfc: Generalise event generation to cover VF-owned event queues Ben Hutchings
2012-02-16  0:49 ` [PATCH net-next 13/19] sfc: Disable flow control during flushes Ben Hutchings
2012-02-16  0:49 ` [PATCH net-next 14/19] sfc: Make buffer table indices and counts consistently unsigned Ben Hutchings
2012-02-16  0:50 ` [PATCH net-next 15/19] sfc: Make all CPU/IRQ/channel/queue counts unsigned Ben Hutchings
2012-02-16  0:50 ` [PATCH net-next 16/19] sfc: Add support for 'extra' channel types Ben Hutchings
2012-02-16  0:51 ` [PATCH net-next 17/19] sfc: Pass NIC structure into efx_wanted_parallelism() Ben Hutchings
2012-02-16  0:52 ` [PATCH net-next 18/19] sfc: Allocate SRAM between buffer table and descriptor caches at init time Ben Hutchings
2012-02-16  0:52 ` [PATCH net-next 19/19] sfc: Add SR-IOV back-end support for SFC9000 family Ben Hutchings
2012-02-16  1:18   ` John Fastabend
2012-02-16  1:57     ` Ben Hutchings
2012-02-21 19:52       ` John Fastabend
2012-02-21 20:24         ` Ben Hutchings
2012-02-16 22:08 ` pull request: sfc-next 2012-02-16 David Miller
