[net-next 00/10][pull request] Intel Wired LAN Drivers Update

All of lore.kernel.org
 help / color / mirror / Atom feed

* [net-next 00/10][pull request] Intel Wired LAN Drivers Update
@ 2011-09-24  9:17 Jeff Kirsher
  2011-09-24  9:17 ` [net-next 01/10] e1000: don't enable dma receives until after dma address has been setup Jeff Kirsher
                   ` (9 more replies)
  0 siblings, 10 replies; 30+ messages in thread
From: Jeff Kirsher @ 2011-09-24  9:17 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, gospo

The following series contains updates to e1000, ixgbe, ixgbevf and
if_link.  The e1000 patch is a fix to an issue seen qemu-kvm. The
if_link patch adds an additional parameter for spoof checking,
and there is a subsequent ixgbe patch to implement the if_link
parameter.  The ixgbevf patch fixes an issue with trunk VLAN.

The remainder of the patches for ixgbe include:
  - add thermal sensor support
  - add ECC warning for legacy interrupts
  - cleanup/update interrupt throttle rate logic
  - cleanup/update X540 code

NOTE- After the discussion on conflating NAPI and interrupt moderation,
I have a patch in validation to remove the tie between work_limit and
interrupt moderation for ixgbe, which is not in this series.  I mention
this because the cleanup to interrupt throttle rate logic patch
still contains the code that ties NAPI and interrupt moderation.

The following are changes since commit 7777de9af54a1402c79bf7663b38ff5ba308dd45:
  qlcnic: Change CDRP function
and are available in the git repository at:
  git://github.com/Jkirsher/net-next.git

Dean Nelson (1):
  e1000: don't enable dma receives until after dma address has been
    setup

Don Skidmore (2):
  ixgbe: cleanup ixgbe_setup_gpie() for X540
  ixgbe: add ECC warning for legacy interrupts

Emil Tantilov (1):
  ixgbe: Cleanup q_vector interrupt throttle rate logic

Greg Rose (3):
  ixgbevf: Fix broken trunk vlan
  if_link: Add additional parameter to IFLA_VF_INFO for spoof checking
  ixgbe: Add new netdev op to turn spoof checking on or off per VF

Jacob Keller (1):
  ixgbe add thermal sensor support for x540 hardware

John Fastabend (1):
  ixgbe: update {P}FC thresholds to account for X540 and loopback

Vasu Dev (1):
  ixgbe: disable LLI for FCoE

 drivers/net/ethernet/intel/e1000/e1000_main.c      |    6 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe.h           |   28 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c     |    8 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_common.c    |   12 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_dcb.h       |    1 -
 drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_82598.c |    9 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_82599.c |    8 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c   |  153 +++-------
 drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c      |    4 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c      |  299 +++++++++++++++-----
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c     |   48 +++-
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h     |    1 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_type.h      |   64 ++++-
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c  |    6 +-
 include/linux/if_link.h                            |    7 +
 include/linux/netdevice.h                          |    3 +
 net/core/rtnetlink.c                               |   25 ++-
 17 files changed, 442 insertions(+), 240 deletions(-)

-- 
1.7.6.2

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [net-next 01/10] e1000: don't enable dma receives until after dma address has been setup
  2011-09-24  9:17 [net-next 00/10][pull request] Intel Wired LAN Drivers Update Jeff Kirsher
@ 2011-09-24  9:17 ` Jeff Kirsher
  2011-09-24  9:17 ` [net-next 02/10] ixgbevf: Fix broken trunk vlan Jeff Kirsher
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 30+ messages in thread
From: Jeff Kirsher @ 2011-09-24  9:17 UTC (permalink / raw)
  To: davem; +Cc: Dean Nelson, netdev, gospo, Andy Gospodarek, Jeff Kirsher

From: Dean Nelson <dnelson@redhat.com>

Doing an 'ifconfig ethN down' followed by an 'ifconfig ethN up' on a qemu-kvm
guest system configured with two e1000 NICs can result in an 'unable to handle
kernel paging request at 0000000100000000' or 'bad page map in process ...' or
something similar.

These result from a 4096-byte page being corrupted with the following two-word
pattern (16-bytes) repeated throughout the entire page:

  0x0000000000000000
  0x0000000100000000

There can be other bits set as well. What is a constant is that the 2nd word
has the 32nd bit set. So one could see:

        :
  0x0000000000000000
  0x0000000100000000
  0x0000000000000000
  0x0000000172adc067    <<< bad pte
  0x800000006ec60067
  0x0000000700000040
  0x0000000000000000
  0x0000000100000000
        :

Which came from from a process' page table I dumped out when the marked line
was seen as bad by print_bad_pte().

The repeating pattern represents the e1000's two-word receive descriptor:

struct e1000_rx_desc {
        __le64 buffer_addr;   /* Address of the descriptor's data buffer */
        __le16 length;        /* Length of data DMAed into data buffer */
        __le16 csum;          /* Packet checksum */
        u8 status;            /* Descriptor status */
        u8 errors;            /* Descriptor Errors */
        __le16 special;
};

And the 32nd bit of the 2nd word maps to the 'u8 status' member, and
corresponds to E1000_RXD_STAT_DD which indicates the descriptor is done.

The corruption appears to result from the following...

 . An 'ifconfig ethN down' gets us into e1000_close(), which through a number
   of subfunctions results in:
     1. E1000_RCTL_EN being cleared in RCTL register.  [e1000_down()]
     2. dma_free_coherent() being called.  [e1000_free_rx_resources()]

 . An 'ifconfig ethN up' gets us into e1000_open(), which through a number of
   subfunctions results in:
     1. dma_alloc_coherent() being called.  [e1000_setup_rx_resources()]
     2. E1000_RCTL_EN being set in RCTL register.  [e1000_setup_rctl()]
     3. E1000_RCTL_EN being cleared in RCTL register.  [e1000_configure_rx()]
     4. RDLEN, RDBAH and RDBAL registers being set to reflect the dma page
        allocated in step 1.  [e1000_configure_rx()]
     5. E1000_RCTL_EN being set in RCTL register.  [e1000_configure_rx()]

During the 'ifconfig ethN up' there is a window opened, starting in step 2
where the receives are enabled up until they are disabled in step 3, in which
the address of the receive descriptor dma page known by the NIC is still the
previous one which was freed during the 'ifconfig ethN down'. If this memory
has been reallocated for some other use and the NIC feels so inclined, it will
write to that former dma page with predictably unpleasant results.

I realize that in the guest, we're dealing with an e1000 NIC that is software
emulated by qemu-kvm. The problem doesn't appear to occur on bare-metal. Andy
suspects that this is because in the emulator link-up is essentially instant
and traffic can start flowing immediately. Whereas on bare-metal, link-up
usually seems to take at least a few milliseconds. And this might be enough
to prevent traffic from flowing into the device inside the window where
E1000_RCTL_EN is set.

So perhaps a modification needs to be made to the qemu-kvm e1000 NIC emulator
to delay the link-up. But in defense of the emulator, it seems like a bad idea
to enable dma operations before the address of the memory to be involved has
been made known.

The following patch no longer enables receives in e1000_setup_rctl() but leaves
them however they were. It only enables receives in e1000_configure_rx(), and
only after the dma address has been made known to the hardware.

There are two places where e1000_setup_rctl() gets called. The one in
e1000_configure() is followed immediately by a call to e1000_configure_rx(), so
there's really no change functionally (except for the removal of the problem
window. The other is in __e1000_shutdown() and is not followed by a call to
e1000_configure_rx(), so there is a change functionally. But consider...

 . An 'ifconfig ethN down' (just as described above).

 . A 'suspend' of the system, which (I'm assuming) will find its way into
   e1000_suspend() which calls __e1000_shutdown() resulting in:
     1. E1000_RCTL_EN being set in RCTL register.  [e1000_setup_rctl()]

And again we've re-opened the problem window for some unknown amount of time.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Dean Nelson <dnelson@redhat.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/e1000/e1000_main.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
index 27f586a..4bbc05a 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -1814,8 +1814,8 @@ static void e1000_setup_rctl(struct e1000_adapter *adapter)

 	rctl &= ~(3 << E1000_RCTL_MO_SHIFT);

-	rctl |= E1000_RCTL_EN | E1000_RCTL_BAM |
-		E1000_RCTL_LBM_NO | E1000_RCTL_RDMTS_HALF |
+	rctl |= E1000_RCTL_BAM | E1000_RCTL_LBM_NO |
+		E1000_RCTL_RDMTS_HALF |
 		(hw->mc_filter_type << E1000_RCTL_MO_SHIFT);

 	if (hw->tbi_compatibility_on == 1)
@@ -1917,7 +1917,7 @@ static void e1000_configure_rx(struct e1000_adapter *adapter)
 	}

 	/* Enable Receives */
-	ew32(RCTL, rctl);
+	ew32(RCTL, rctl | E1000_RCTL_EN);
 }

 /**
-- 
1.7.6.2

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [net-next 02/10] ixgbevf: Fix broken trunk vlan
  2011-09-24  9:17 [net-next 00/10][pull request] Intel Wired LAN Drivers Update Jeff Kirsher
  2011-09-24  9:17 ` [net-next 01/10] e1000: don't enable dma receives until after dma address has been setup Jeff Kirsher
@ 2011-09-24  9:17 ` Jeff Kirsher
  2011-09-24 16:33   ` Jesse Gross
  2011-09-24  9:17 ` [net-next 03/10] ixgbe: Cleanup q_vector interrupt throttle rate logic Jeff Kirsher
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Jeff Kirsher @ 2011-09-24  9:17 UTC (permalink / raw)
  To: davem; +Cc: Greg Rose, netdev, gospo, Jiri Pirko, Jeff Kirsher

From: Greg Rose <gregory.v.rose@intel.com>

Changes to clean up the vlan rx path broke trunk vlan.  Trunk vlans in
a VF driver are those set using:

"ip link set <pfdev> vf <n> <vlanid>"

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
CC: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |    6 ++----
 1 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index d72905b..4930c46 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -293,12 +293,10 @@ static void ixgbevf_receive_skb(struct ixgbevf_q_vector *q_vector,
 {
 	struct ixgbevf_adapter *adapter = q_vector->adapter;
 	bool is_vlan = (status & IXGBE_RXD_STAT_VP);
+	u16 tag = le16_to_cpu(rx_desc->wb.upper.vlan);
 
-	if (is_vlan) {
-		u16 tag = le16_to_cpu(rx_desc->wb.upper.vlan);
-
+	if (is_vlan && test_bit(tag, adapter->active_vlans))
 		__vlan_hwaccel_put_tag(skb, tag);
-	}
 
 	if (!(adapter->flags & IXGBE_FLAG_IN_NETPOLL))
 			napi_gro_receive(&q_vector->napi, skb);
-- 
1.7.6.2

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [net-next 03/10] ixgbe: Cleanup q_vector interrupt throttle rate logic
  2011-09-24  9:17 [net-next 00/10][pull request] Intel Wired LAN Drivers Update Jeff Kirsher
  2011-09-24  9:17 ` [net-next 01/10] e1000: don't enable dma receives until after dma address has been setup Jeff Kirsher
  2011-09-24  9:17 ` [net-next 02/10] ixgbevf: Fix broken trunk vlan Jeff Kirsher
@ 2011-09-24  9:17 ` Jeff Kirsher
  2011-09-24  9:17 ` [net-next 04/10] ixgbe: disable LLI for FCoE Jeff Kirsher
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 30+ messages in thread
From: Jeff Kirsher @ 2011-09-24  9:17 UTC (permalink / raw)
  To: davem; +Cc: Emil Tantilov, netdev, gospo, Jeff Kirsher

From: Emil Tantilov <emil.s.tantilov@intel.com>

This patch is meant to help cleanup the interrupt throttle rate logic by
storing the interrupt throttle rate as a value in microseconds instead of
interrupts per second.  The advantage to this approach is that the value
can now be stored in an 16 bit field and doesn't require as much math to
flip the value back and forth since the hardware already used microseconds
when setting the rate.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h         |   25 ++--
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |  153 +++++++---------------
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    |   77 +++++------
 3 files changed, 96 insertions(+), 159 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 1f4a4ca..38940d7 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -301,26 +301,29 @@ struct ixgbe_ring_container {
  */
 struct ixgbe_q_vector {
 	struct ixgbe_adapter *adapter;
-	unsigned int v_idx; /* index of q_vector within array, also used for
-	                     * finding the bit in EICR and friends that
-	                     * represents the vector for this ring */
 #ifdef CONFIG_IXGBE_DCA
 	int cpu;	    /* CPU for DCA */
 #endif
-	struct napi_struct napi;
+	u16 v_idx;		/* index of q_vector within array, also used for
+				 * finding the bit in EICR and friends that
+				 * represents the vector for this ring */
+	u16 itr;		/* Interrupt throttle rate written to EITR */
 	struct ixgbe_ring_container rx, tx;
-	u32 eitr;
+
+	struct napi_struct napi;
 	cpumask_var_t affinity_mask;
 	char name[IFNAMSIZ + 9];
 };
 
-/* Helper macros to switch between ints/sec and what the register uses.
- * And yes, it's the same math going both ways.  The lowest value
- * supported by all of the ixgbe hardware is 8.
+/*
+ * microsecond values for various ITR rates shifted by 2 to fit itr register
+ * with the first 3 bits reserved 0
  */
-#define EITR_INTS_PER_SEC_TO_REG(_eitr) \
-	((_eitr) ? (1000000000 / ((_eitr) * 256)) : 8)
-#define EITR_REG_TO_INTS_PER_SEC EITR_INTS_PER_SEC_TO_REG
+#define IXGBE_MIN_RSC_ITR	24
+#define IXGBE_100K_ITR		40
+#define IXGBE_20K_ITR		200
+#define IXGBE_10K_ITR		400
+#define IXGBE_8K_ITR		500
 
 static inline u16 ixgbe_desc_unused(struct ixgbe_ring *ring)
 {
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
index debcf5f..ae9fba5 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
@@ -2026,39 +2026,20 @@ static int ixgbe_get_coalesce(struct net_device *netdev,
 	ec->tx_max_coalesced_frames_irq = adapter->tx_work_limit;
 
 	/* only valid if in constant ITR mode */
-	switch (adapter->rx_itr_setting) {
-	case 0:
-		/* throttling disabled */
-		ec->rx_coalesce_usecs = 0;
-		break;
-	case 1:
-		/* dynamic ITR mode */
-		ec->rx_coalesce_usecs = 1;
-		break;
-	default:
-		/* fixed interrupt rate mode */
-		ec->rx_coalesce_usecs = 1000000/adapter->rx_eitr_param;
-		break;
-	}
+	if (adapter->rx_itr_setting <= 1)
+		ec->rx_coalesce_usecs = adapter->rx_itr_setting;
+	else
+		ec->rx_coalesce_usecs = adapter->rx_itr_setting >> 2;
 
 	/* if in mixed tx/rx queues per vector mode, report only rx settings */
 	if (adapter->q_vector[0]->tx.count && adapter->q_vector[0]->rx.count)
 		return 0;
 
 	/* only valid if in constant ITR mode */
-	switch (adapter->tx_itr_setting) {
-	case 0:
-		/* throttling disabled */
-		ec->tx_coalesce_usecs = 0;
-		break;
-	case 1:
-		/* dynamic ITR mode */
-		ec->tx_coalesce_usecs = 1;
-		break;
-	default:
-		ec->tx_coalesce_usecs = 1000000/adapter->tx_eitr_param;
-		break;
-	}
+	if (adapter->tx_itr_setting <= 1)
+		ec->tx_coalesce_usecs = adapter->tx_itr_setting;
+	else
+		ec->tx_coalesce_usecs = adapter->tx_itr_setting >> 2;
 
 	return 0;
 }
@@ -2077,10 +2058,9 @@ static bool ixgbe_update_rsc(struct ixgbe_adapter *adapter,
 
 	/* if interrupt rate is too high then disable RSC */
 	if (ec->rx_coalesce_usecs != 1 &&
-	    ec->rx_coalesce_usecs <= 1000000/IXGBE_MAX_RSC_INT_RATE) {
+	    ec->rx_coalesce_usecs <= (IXGBE_MIN_RSC_ITR >> 2)) {
 		if (adapter->flags2 & IXGBE_FLAG2_RSC_ENABLED) {
-			e_info(probe, "rx-usecs set too low, "
-				      "disabling RSC\n");
+			e_info(probe, "rx-usecs set too low, disabling RSC\n");
 			adapter->flags2 &= ~IXGBE_FLAG2_RSC_ENABLED;
 			return true;
 		}
@@ -2088,8 +2068,7 @@ static bool ixgbe_update_rsc(struct ixgbe_adapter *adapter,
 		/* check the feature flag value and enable RSC if necessary */
 		if ((netdev->features & NETIF_F_LRO) &&
 		    !(adapter->flags2 & IXGBE_FLAG2_RSC_ENABLED)) {
-			e_info(probe, "rx-usecs set to %d, "
-				      "re-enabling RSC\n",
+			e_info(probe, "rx-usecs set to %d, re-enabling RSC\n",
 			       ec->rx_coalesce_usecs);
 			adapter->flags2 |= IXGBE_FLAG2_RSC_ENABLED;
 			return true;
@@ -2104,97 +2083,59 @@ static int ixgbe_set_coalesce(struct net_device *netdev,
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
 	struct ixgbe_q_vector *q_vector;
 	int i;
+	int num_vectors;
+	u16 tx_itr_param, rx_itr_param;
 	bool need_reset = false;
 
 	/* don't accept tx specific changes if we've got mixed RxTx vectors */
 	if (adapter->q_vector[0]->tx.count && adapter->q_vector[0]->rx.count
-	   && ec->tx_coalesce_usecs)
+	    && ec->tx_coalesce_usecs)
 		return -EINVAL;
 
 	if (ec->tx_max_coalesced_frames_irq)
 		adapter->tx_work_limit = ec->tx_max_coalesced_frames_irq;
 
-	if (ec->rx_coalesce_usecs > 1) {
-		/* check the limits */
-		if ((1000000/ec->rx_coalesce_usecs > IXGBE_MAX_INT_RATE) ||
-		    (1000000/ec->rx_coalesce_usecs < IXGBE_MIN_INT_RATE))
-			return -EINVAL;
-
-		/* check the old value and enable RSC if necessary */
-		need_reset = ixgbe_update_rsc(adapter, ec);
-
-		/* store the value in ints/second */
-		adapter->rx_eitr_param = 1000000/ec->rx_coalesce_usecs;
+	if ((ec->rx_coalesce_usecs > (IXGBE_MAX_EITR >> 2)) ||
+	    (ec->tx_coalesce_usecs > (IXGBE_MAX_EITR >> 2)))
+		return -EINVAL;
 
-		/* static value of interrupt rate */
-		adapter->rx_itr_setting = adapter->rx_eitr_param;
-		/* clear the lower bit as its used for dynamic state */
-		adapter->rx_itr_setting &= ~1;
-	} else if (ec->rx_coalesce_usecs == 1) {
-		/* check the old value and enable RSC if necessary */
-		need_reset = ixgbe_update_rsc(adapter, ec);
+	/* check the old value and enable RSC if necessary */
+	need_reset = ixgbe_update_rsc(adapter, ec);
 
-		/* 1 means dynamic mode */
-		adapter->rx_eitr_param = 20000;
-		adapter->rx_itr_setting = 1;
-	} else {
-		/* check the old value and enable RSC if necessary */
-		need_reset = ixgbe_update_rsc(adapter, ec);
-		/*
-		 * any other value means disable eitr, which is best
-		 * served by setting the interrupt rate very high
-		 */
-		adapter->rx_eitr_param = IXGBE_MAX_INT_RATE;
-		adapter->rx_itr_setting = 0;
-	}
+	if (ec->rx_coalesce_usecs > 1)
+		adapter->rx_itr_setting = ec->rx_coalesce_usecs << 2;
+	else
+		adapter->rx_itr_setting = ec->rx_coalesce_usecs;
 
-	if (ec->tx_coalesce_usecs > 1) {
-		/*
-		 * don't have to worry about max_int as above because
-		 * tx vectors don't do hardware RSC (an rx function)
-		 */
-		/* check the limits */
-		if ((1000000/ec->tx_coalesce_usecs > IXGBE_MAX_INT_RATE) ||
-		    (1000000/ec->tx_coalesce_usecs < IXGBE_MIN_INT_RATE))
-			return -EINVAL;
+	if (adapter->rx_itr_setting == 1)
+		rx_itr_param = IXGBE_20K_ITR;
+	else
+		rx_itr_param = adapter->rx_itr_setting;
 
-		/* store the value in ints/second */
-		adapter->tx_eitr_param = 1000000/ec->tx_coalesce_usecs;
+	if (ec->tx_coalesce_usecs > 1)
+		adapter->tx_itr_setting = ec->tx_coalesce_usecs << 2;
+	else
+		adapter->tx_itr_setting = ec->tx_coalesce_usecs;
 
-		/* static value of interrupt rate */
-		adapter->tx_itr_setting = adapter->tx_eitr_param;
+	if (adapter->tx_itr_setting == 1)
+		tx_itr_param = IXGBE_10K_ITR;
+	else
+		tx_itr_param = adapter->tx_itr_setting;
 
-		/* clear the lower bit as its used for dynamic state */
-		adapter->tx_itr_setting &= ~1;
-	} else if (ec->tx_coalesce_usecs == 1) {
-		/* 1 means dynamic mode */
-		adapter->tx_eitr_param = 10000;
-		adapter->tx_itr_setting = 1;
-	} else {
-		adapter->tx_eitr_param = IXGBE_MAX_INT_RATE;
-		adapter->tx_itr_setting = 0;
-	}
+	if (adapter->flags & IXGBE_FLAG_MSIX_ENABLED)
+		num_vectors = adapter->num_msix_vectors - NON_Q_VECTORS;
+	else
+		num_vectors = 1;
 
-	/* MSI/MSIx Interrupt Mode */
-	if (adapter->flags &
-	    (IXGBE_FLAG_MSIX_ENABLED | IXGBE_FLAG_MSI_ENABLED)) {
-		int num_vectors = adapter->num_msix_vectors - NON_Q_VECTORS;
-		for (i = 0; i < num_vectors; i++) {
-			q_vector = adapter->q_vector[i];
-			if (q_vector->tx.count && !q_vector->rx.count)
-				/* tx only */
-				q_vector->eitr = adapter->tx_eitr_param;
-			else
-				/* rx only or mixed */
-				q_vector->eitr = adapter->rx_eitr_param;
-			q_vector->tx.work_limit = adapter->tx_work_limit;
-			ixgbe_write_eitr(q_vector);
-		}
-	/* Legacy Interrupt Mode */
-	} else {
-		q_vector = adapter->q_vector[0];
-		q_vector->eitr = adapter->rx_eitr_param;
+	for (i = 0; i < num_vectors; i++) {
+		q_vector = adapter->q_vector[i];
 		q_vector->tx.work_limit = adapter->tx_work_limit;
+		if (q_vector->tx.count && !q_vector->rx.count)
+			/* tx only */
+			q_vector->itr = tx_itr_param;
+		else
+			/* rx only or mixed */
+			q_vector->itr = rx_itr_param;
 		ixgbe_write_eitr(q_vector);
 	}
 
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index c26ea94..3594b09 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1500,12 +1500,19 @@ static void ixgbe_configure_msix(struct ixgbe_adapter *adapter)
 		for (ring = q_vector->tx.ring; ring != NULL; ring = ring->next)
 			ixgbe_set_ivar(adapter, 1, ring->reg_idx, v_idx);
 
-		if (q_vector->tx.ring && !q_vector->rx.ring)
-			/* tx only */
-			q_vector->eitr = adapter->tx_eitr_param;
-		else if (q_vector->rx.ring)
-			/* rx or mixed */
-			q_vector->eitr = adapter->rx_eitr_param;
+		if (q_vector->tx.ring && !q_vector->rx.ring) {
+			/* tx only vector */
+			if (adapter->tx_itr_setting == 1)
+				q_vector->itr = IXGBE_10K_ITR;
+			else
+				q_vector->itr = adapter->tx_itr_setting;
+		} else {
+			/* rx or rx/tx vector */
+			if (adapter->rx_itr_setting == 1)
+				q_vector->itr = IXGBE_20K_ITR;
+			else
+				q_vector->itr = adapter->rx_itr_setting;
+		}
 
 		ixgbe_write_eitr(q_vector);
 	}
@@ -1519,7 +1526,6 @@ static void ixgbe_configure_msix(struct ixgbe_adapter *adapter)
 	case ixgbe_mac_X540:
 		ixgbe_set_ivar(adapter, -1, 1, v_idx);
 		break;
-
 	default:
 		break;
 	}
@@ -1527,12 +1533,10 @@ static void ixgbe_configure_msix(struct ixgbe_adapter *adapter)
 
 	/* set up to autoclear timer, and the vectors */
 	mask = IXGBE_EIMS_ENABLE_MASK;
-	if (adapter->num_vfs)
-		mask &= ~(IXGBE_EIMS_OTHER |
-			  IXGBE_EIMS_MAILBOX |
-			  IXGBE_EIMS_LSC);
-	else
-		mask &= ~(IXGBE_EIMS_OTHER | IXGBE_EIMS_LSC);
+	mask &= ~(IXGBE_EIMS_OTHER |
+		  IXGBE_EIMS_MAILBOX |
+		  IXGBE_EIMS_LSC);
+
 	IXGBE_WRITE_REG(&adapter->hw, IXGBE_EIAC, mask);
 }
 
@@ -1577,7 +1581,7 @@ static void ixgbe_update_itr(struct ixgbe_q_vector *q_vector,
 	 *  100-1249MB/s bulk (8000 ints/s)
 	 */
 	/* what was last interrupt timeslice? */
-	timepassed_us = 1000000/q_vector->eitr;
+	timepassed_us = q_vector->itr >> 2;
 	bytes_perint = bytes / timepassed_us; /* bytes/usec */
 
 	switch (itr_setting) {
@@ -1618,7 +1622,7 @@ void ixgbe_write_eitr(struct ixgbe_q_vector *q_vector)
 	struct ixgbe_adapter *adapter = q_vector->adapter;
 	struct ixgbe_hw *hw = &adapter->hw;
 	int v_idx = q_vector->v_idx;
-	u32 itr_reg = EITR_INTS_PER_SEC_TO_REG(q_vector->eitr);
+	u32 itr_reg = q_vector->itr;
 
 	switch (adapter->hw.mac.type) {
 	case ixgbe_mac_82598EB:
@@ -1628,15 +1632,6 @@ void ixgbe_write_eitr(struct ixgbe_q_vector *q_vector)
 	case ixgbe_mac_82599EB:
 	case ixgbe_mac_X540:
 		/*
-		 * 82599 and X540 can support a value of zero, so allow it for
-		 * max interrupt rate, but there is an errata where it can
-		 * not be zero with RSC
-		 */
-		if (itr_reg == 8 &&
-		    !(adapter->flags2 & IXGBE_FLAG2_RSC_ENABLED))
-			itr_reg = 0;
-
-		/*
 		 * set the WDIS bit to not clear the timer bits and cause an
 		 * immediate assertion of the interrupt
 		 */
@@ -1650,7 +1645,7 @@ void ixgbe_write_eitr(struct ixgbe_q_vector *q_vector)
 
 static void ixgbe_set_itr(struct ixgbe_q_vector *q_vector)
 {
-	u32 new_itr = q_vector->eitr;
+	u32 new_itr = q_vector->itr;
 	u8 current_itr;
 
 	ixgbe_update_itr(q_vector, &q_vector->tx);
@@ -1661,24 +1656,25 @@ static void ixgbe_set_itr(struct ixgbe_q_vector *q_vector)
 	switch (current_itr) {
 	/* counts and packets in update_itr are dependent on these numbers */
 	case lowest_latency:
-		new_itr = 100000;
+		new_itr = IXGBE_100K_ITR;
 		break;
 	case low_latency:
-		new_itr = 20000; /* aka hwitr = ~200 */
+		new_itr = IXGBE_20K_ITR;
 		break;
 	case bulk_latency:
-		new_itr = 8000;
+		new_itr = IXGBE_8K_ITR;
 		break;
 	default:
 		break;
 	}
 
-	if (new_itr != q_vector->eitr) {
+	if (new_itr != q_vector->itr) {
 		/* do an exponential smoothing */
-		new_itr = ((q_vector->eitr * 9) + new_itr)/10;
+		new_itr = (10 * new_itr * q_vector->itr) /
+			  ((9 * new_itr) + q_vector->itr);
 
 		/* save the algorithm value here */
-		q_vector->eitr = new_itr;
+		q_vector->itr = new_itr & IXGBE_MAX_EITR;
 
 		ixgbe_write_eitr(q_vector);
 	}
@@ -2301,10 +2297,15 @@ static inline void ixgbe_irq_disable(struct ixgbe_adapter *adapter)
  **/
 static void ixgbe_configure_msi_and_legacy(struct ixgbe_adapter *adapter)
 {
-	struct ixgbe_hw *hw = &adapter->hw;
+	struct ixgbe_q_vector *q_vector = adapter->q_vector[0];
 
-	IXGBE_WRITE_REG(hw, IXGBE_EITR(0),
-			EITR_INTS_PER_SEC_TO_REG(adapter->rx_eitr_param));
+	/* rx/tx vector */
+	if (adapter->rx_itr_setting == 1)
+		q_vector->itr = IXGBE_20K_ITR;
+	else
+		q_vector->itr = adapter->rx_itr_setting;
+
+	ixgbe_write_eitr(q_vector);
 
 	ixgbe_set_ivar(adapter, 0, 0, 0);
 	ixgbe_set_ivar(adapter, 1, 0, 0);
@@ -4613,12 +4614,6 @@ static int ixgbe_alloc_q_vectors(struct ixgbe_adapter *adapter)
 		if (!alloc_cpumask_var(&q_vector->affinity_mask, GFP_KERNEL))
 			goto err_out;
 		cpumask_set_cpu(v_idx, q_vector->affinity_mask);
-
-		if (q_vector->tx.count && !q_vector->rx.count)
-			q_vector->eitr = adapter->tx_eitr_param;
-		else
-			q_vector->eitr = adapter->rx_eitr_param;
-
 		netif_napi_add(adapter->netdev, &q_vector->napi,
 			       ixgbe_poll, 64);
 		adapter->q_vector[v_idx] = q_vector;
@@ -4864,9 +4859,7 @@ static int __devinit ixgbe_sw_init(struct ixgbe_adapter *adapter)
 
 	/* enable itr by default in dynamic mode */
 	adapter->rx_itr_setting = 1;
-	adapter->rx_eitr_param = 20000;
 	adapter->tx_itr_setting = 1;
-	adapter->tx_eitr_param = 10000;
 
 	/* set defaults for eitr in MegaBytes */
 	adapter->eitr_low = 10;
-- 
1.7.6.2

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [net-next 04/10] ixgbe: disable LLI for FCoE
  2011-09-24  9:17 [net-next 00/10][pull request] Intel Wired LAN Drivers Update Jeff Kirsher
                   ` (2 preceding siblings ...)
  2011-09-24  9:17 ` [net-next 03/10] ixgbe: Cleanup q_vector interrupt throttle rate logic Jeff Kirsher
@ 2011-09-24  9:17 ` Jeff Kirsher
  2011-09-24  9:17 ` [net-next 05/10] if_link: Add additional parameter to IFLA_VF_INFO for spoof checking Jeff Kirsher
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 30+ messages in thread
From: Jeff Kirsher @ 2011-09-24  9:17 UTC (permalink / raw)
  To: davem; +Cc: Vasu Dev, netdev, gospo, Jeff Kirsher

From: Vasu Dev <vasu.dev@intel.com>

Disable LLI for FCoE since regular interrupt
and their moderation rate works slightly better
for FCoE also.

Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c |    4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c
index cae766d..323f452 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c
@@ -661,9 +661,7 @@ void ixgbe_configure_fcoe(struct ixgbe_adapter *adapter)
 			IXGBE_ETQS_QUEUE_EN |
 			(fcoe_q << IXGBE_ETQS_RX_QUEUE_SHIFT));
 
-	IXGBE_WRITE_REG(hw, IXGBE_FCRXCTRL,
-			IXGBE_FCRXCTRL_FCOELLI |
-			IXGBE_FCRXCTRL_FCCRCBO |
+	IXGBE_WRITE_REG(hw, IXGBE_FCRXCTRL, IXGBE_FCRXCTRL_FCCRCBO |
 			(FC_FCOE_VER << IXGBE_FCRXCTRL_FCOEVER_SHIFT));
 	return;
 
-- 
1.7.6.2

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [net-next 05/10] if_link: Add additional parameter to IFLA_VF_INFO for spoof checking
  2011-09-24  9:17 [net-next 00/10][pull request] Intel Wired LAN Drivers Update Jeff Kirsher
                   ` (3 preceding siblings ...)
  2011-09-24  9:17 ` [net-next 04/10] ixgbe: disable LLI for FCoE Jeff Kirsher
@ 2011-09-24  9:17 ` Jeff Kirsher
  2011-09-24 16:40   ` Ben Hutchings
  2011-09-25 17:22   ` Stephen Hemminger
  2011-09-24  9:17 ` [net-next 06/10] ixgbe: Add new netdev op to turn spoof checking on or off per VF Jeff Kirsher
                   ` (4 subsequent siblings)
  9 siblings, 2 replies; 30+ messages in thread
From: Jeff Kirsher @ 2011-09-24  9:17 UTC (permalink / raw)
  To: davem; +Cc: Greg Rose, netdev, gospo, Jeff Kirsher

From: Greg Rose <gregory.v.rose@intel.com>

Add configuration setting for drivers to turn spoof checking on or off
for discrete VFs.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 include/linux/if_link.h   |    7 +++++++
 include/linux/netdevice.h |    3 +++
 net/core/rtnetlink.c      |   25 ++++++++++++++++++++++---
 3 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index 0ee969a..8bd6d6d 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -279,6 +279,7 @@ enum {
 	IFLA_VF_MAC,		/* Hardware queue specific attributes */
 	IFLA_VF_VLAN,
 	IFLA_VF_TX_RATE,	/* TX Bandwidth Allocation */
+	IFLA_VF_SPOOFCHK,	/* Spoof Checking on/off switch */
 	__IFLA_VF_MAX,
 };
 
@@ -300,12 +301,18 @@ struct ifla_vf_tx_rate {
 	__u32 rate; /* Max TX bandwidth in Mbps, 0 disables throttling */
 };
 
+struct ifla_vf_spoofchk {
+	__u32 vf;
+	__u32 setting;
+};
+
 struct ifla_vf_info {
 	__u32 vf;
 	__u8 mac[32];
 	__u32 vlan;
 	__u32 qos;
 	__u32 tx_rate;
+	__u32 spoofchk;
 };
 
 /* VF ports management section
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 43b3298..a2951a0 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -781,6 +781,7 @@ struct netdev_tc_txq {
  * int (*ndo_set_vf_mac)(struct net_device *dev, int vf, u8* mac);
  * int (*ndo_set_vf_vlan)(struct net_device *dev, int vf, u16 vlan, u8 qos);
  * int (*ndo_set_vf_tx_rate)(struct net_device *dev, int vf, int rate);
+ * int (*ndo_set_vf_spoofchk)(struct net_device *dev, int vf, u8 setting);
  * int (*ndo_get_vf_config)(struct net_device *dev,
  *			    int vf, struct ifla_vf_info *ivf);
  * int (*ndo_set_vf_port)(struct net_device *dev, int vf,
@@ -900,6 +901,8 @@ struct net_device_ops {
 						   int queue, u16 vlan, u8 qos);
 	int			(*ndo_set_vf_tx_rate)(struct net_device *dev,
 						      int vf, int rate);
+	int			(*ndo_set_vf_spoofchk)(struct net_device *dev,
+						       int vf, u8 setting);
 	int			(*ndo_get_vf_config)(struct net_device *dev,
 						     int vf,
 						     struct ifla_vf_info *ivf);
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 39f8dd6..6535810 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -731,7 +731,8 @@ static inline int rtnl_vfinfo_size(const struct net_device *dev)
 		size += num_vfs *
 			(nla_total_size(sizeof(struct ifla_vf_mac)) +
 			 nla_total_size(sizeof(struct ifla_vf_vlan)) +
-			 nla_total_size(sizeof(struct ifla_vf_tx_rate)));
+			 nla_total_size(sizeof(struct ifla_vf_tx_rate)) +
+			 nla_total_size(sizeof(struct ifla_vf_spoofchk)));
 		return size;
 	} else
 		return 0;
@@ -954,13 +955,19 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 			struct ifla_vf_mac vf_mac;
 			struct ifla_vf_vlan vf_vlan;
 			struct ifla_vf_tx_rate vf_tx_rate;
+			struct ifla_vf_spoofchk vf_spoofchk;
 			if (dev->netdev_ops->ndo_get_vf_config(dev, i, &ivi))
 				break;
-			vf_mac.vf = vf_vlan.vf = vf_tx_rate.vf = ivi.vf;
+			vf_mac.vf =
+			vf_vlan.vf =
+			vf_tx_rate.vf =
+			vf_spoofchk.vf = ivi.vf;
+
 			memcpy(vf_mac.mac, ivi.mac, sizeof(ivi.mac));
 			vf_vlan.vlan = ivi.vlan;
 			vf_vlan.qos = ivi.qos;
 			vf_tx_rate.rate = ivi.tx_rate;
+			vf_spoofchk.setting = ivi.spoofchk;
 			vf = nla_nest_start(skb, IFLA_VF_INFO);
 			if (!vf) {
 				nla_nest_cancel(skb, vfinfo);
@@ -968,7 +975,10 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 			}
 			NLA_PUT(skb, IFLA_VF_MAC, sizeof(vf_mac), &vf_mac);
 			NLA_PUT(skb, IFLA_VF_VLAN, sizeof(vf_vlan), &vf_vlan);
-			NLA_PUT(skb, IFLA_VF_TX_RATE, sizeof(vf_tx_rate), &vf_tx_rate);
+			NLA_PUT(skb, IFLA_VF_TX_RATE, sizeof(vf_tx_rate),
+				&vf_tx_rate);
+			NLA_PUT(skb, IFLA_VF_SPOOFCHK, sizeof(vf_spoofchk),
+				&vf_spoofchk);
 			nla_nest_end(skb, vf);
 		}
 		nla_nest_end(skb, vfinfo);
@@ -1202,6 +1212,15 @@ static int do_setvfinfo(struct net_device *dev, struct nlattr *attr)
 							      ivt->rate);
 			break;
 		}
+		case IFLA_VF_SPOOFCHK: {
+			struct ifla_vf_spoofchk *ivs;
+			ivs = nla_data(vf);
+			err = -EOPNOTSUPP;
+			if (ops->ndo_set_vf_spoofchk)
+				err = ops->ndo_set_vf_spoofchk(dev, ivs->vf,
+							       ivs->setting);
+			break;
+		}
 		default:
 			err = -EINVAL;
 			break;
-- 
1.7.6.2

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [net-next 06/10] ixgbe: Add new netdev op to turn spoof checking on or off per VF
  2011-09-24  9:17 [net-next 00/10][pull request] Intel Wired LAN Drivers Update Jeff Kirsher
                   ` (4 preceding siblings ...)
  2011-09-24  9:17 ` [net-next 05/10] if_link: Add additional parameter to IFLA_VF_INFO for spoof checking Jeff Kirsher
@ 2011-09-24  9:17 ` Jeff Kirsher
  2011-09-24  9:17 ` [net-next 07/10] ixgbe: update {P}FC thresholds to account for X540 and loopback Jeff Kirsher
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 30+ messages in thread
From: Jeff Kirsher @ 2011-09-24  9:17 UTC (permalink / raw)
  To: davem; +Cc: Greg Rose, netdev, gospo, Jeff Kirsher

From: Greg Rose <gregory.v.rose@intel.com>

Implements the new netdev op to allow user configuration of spoof
checking on a per VF basis.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h       |    3 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  |   10 ++++-
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c |   48 ++++++++++++++++++++---
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h |    1 +
 4 files changed, 52 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 38940d7..623d121 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -128,6 +128,8 @@ struct vf_data_storage {
 	u16 pf_vlan; /* When set, guest VLAN config not allowed. */
 	u16 pf_qos;
 	u16 tx_rate;
+	u16 vlan_count;
+	u8 spoofchk_enabled;
 	struct pci_dev *vfdev;
 };
 
@@ -507,7 +509,6 @@ struct ixgbe_adapter {
 	int vf_rate_link_speed;
 	struct vf_macvlans vf_mvs;
 	struct vf_macvlans *mv_list;
-	bool antispoofing_enabled;
 
 	struct hlist_head fdir_filter_list;
 	union ixgbe_atr_input fdir_mask;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 3594b09..29bcdbe 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2782,6 +2782,7 @@ static void ixgbe_configure_virtualization(struct ixgbe_adapter *adapter)
 	u32 vt_reg_bits;
 	u32 reg_offset, vf_shift;
 	u32 vmdctl;
+	int i;
 
 	if (!(adapter->flags & IXGBE_FLAG_SRIOV_ENABLED))
 		return;
@@ -2817,9 +2818,13 @@ static void ixgbe_configure_virtualization(struct ixgbe_adapter *adapter)
 	IXGBE_WRITE_REG(hw, IXGBE_PFDTXGSWC, IXGBE_PFDTXGSWC_VT_LBEN);
 	/* Enable MAC Anti-Spoofing */
 	hw->mac.ops.set_mac_anti_spoofing(hw,
-					  (adapter->antispoofing_enabled =
-					   (adapter->num_vfs != 0)),
+					   (adapter->num_vfs != 0),
 					  adapter->num_vfs);
+	/* For VFs that have spoof checking turned off */
+	for (i = 0; i < adapter->num_vfs; i++) {
+		if (!adapter->vfinfo[i].spoofchk_enabled)
+			ixgbe_ndo_set_vf_spoofchk(adapter->netdev, i, false);
+	}
 }
 
 static void ixgbe_set_rx_buffer_len(struct ixgbe_adapter *adapter)
@@ -7004,6 +7009,7 @@ static const struct net_device_ops ixgbe_netdev_ops = {
 	.ndo_set_vf_mac		= ixgbe_ndo_set_vf_mac,
 	.ndo_set_vf_vlan	= ixgbe_ndo_set_vf_vlan,
 	.ndo_set_vf_tx_rate	= ixgbe_ndo_set_vf_bw,
+	.ndo_set_vf_spoofchk	= ixgbe_ndo_set_vf_spoofchk,
 	.ndo_get_vf_config	= ixgbe_ndo_get_vf_config,
 	.ndo_get_stats64	= ixgbe_get_stats64,
 	.ndo_setup_tc		= ixgbe_setup_tc,
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
index 468ddd0..743afb9 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
@@ -151,6 +151,8 @@ void ixgbe_enable_sriov(struct ixgbe_adapter *adapter,
 		/* Disable RSC when in SR-IOV mode */
 		adapter->flags2 &= ~(IXGBE_FLAG2_RSC_CAPABLE |
 				     IXGBE_FLAG2_RSC_ENABLED);
+		for (i = 0; i < adapter->num_vfs; i++)
+			adapter->vfinfo[i].spoofchk_enabled = true;
 		return;
 	}
 
@@ -620,7 +622,13 @@ static int ixgbe_rcv_msg_from_vf(struct ixgbe_adapter *adapter, u32 vf)
 			       vf);
 			retval = -1;
 		} else {
+			if (add)
+				adapter->vfinfo[vf].vlan_count++;
+			else if (adapter->vfinfo[vf].vlan_count)
+				adapter->vfinfo[vf].vlan_count--;
 			retval = ixgbe_set_vf_vlan(adapter, add, vid, vf);
+			if (!retval && adapter->vfinfo[vf].spoofchk_enabled)
+				hw->mac.ops.set_vlan_anti_spoofing(hw, true, vf);
 		}
 		break;
 	case IXGBE_VF_SET_MACVLAN:
@@ -632,12 +640,8 @@ static int ixgbe_rcv_msg_from_vf(struct ixgbe_adapter *adapter, u32 vf)
 		 * greater than 0 will indicate the VF is setting a
 		 * macvlan MAC filter.
 		 */
-		if (index > 0 && adapter->antispoofing_enabled) {
-			hw->mac.ops.set_mac_anti_spoofing(hw, false,
-							  adapter->num_vfs);
-			hw->mac.ops.set_vlan_anti_spoofing(hw, false, vf);
-			adapter->antispoofing_enabled = false;
-		}
+		if (index > 0 && adapter->vfinfo[vf].spoofchk_enabled)
+			ixgbe_ndo_set_vf_spoofchk(adapter->netdev, vf, false);
 		retval = ixgbe_set_vf_macvlan(adapter, vf, index,
 					      (unsigned char *)(&msgbuf[1]));
 		break;
@@ -748,8 +752,9 @@ int ixgbe_ndo_set_vf_vlan(struct net_device *netdev, int vf, u16 vlan, u8 qos)
 			goto out;
 		ixgbe_set_vmvir(adapter, vlan | (qos << VLAN_PRIO_SHIFT), vf);
 		ixgbe_set_vmolr(hw, vf, false);
-		if (adapter->antispoofing_enabled)
+		if (adapter->vfinfo[vf].spoofchk_enabled)
 			hw->mac.ops.set_vlan_anti_spoofing(hw, true, vf);
+		adapter->vfinfo[vf].vlan_count++;
 		adapter->vfinfo[vf].pf_vlan = vlan;
 		adapter->vfinfo[vf].pf_qos = qos;
 		dev_info(&adapter->pdev->dev,
@@ -768,6 +773,8 @@ int ixgbe_ndo_set_vf_vlan(struct net_device *netdev, int vf, u16 vlan, u8 qos)
 		ixgbe_set_vmvir(adapter, vlan, vf);
 		ixgbe_set_vmolr(hw, vf, true);
 		hw->mac.ops.set_vlan_anti_spoofing(hw, false, vf);
+		if (adapter->vfinfo[vf].vlan_count)
+			adapter->vfinfo[vf].vlan_count--;
 		adapter->vfinfo[vf].pf_vlan = 0;
 		adapter->vfinfo[vf].pf_qos = 0;
        }
@@ -877,6 +884,32 @@ int ixgbe_ndo_set_vf_bw(struct net_device *netdev, int vf, int tx_rate)
 	return 0;
 }
 
+int ixgbe_ndo_set_vf_spoofchk(struct net_device *netdev, int vf, u8 setting)
+{
+	struct ixgbe_adapter *adapter = netdev_priv(netdev);
+	int vf_target_reg = vf >> 3;
+	int vf_target_shift = vf % 8;
+	struct ixgbe_hw *hw = &adapter->hw;
+	u32 regval;
+
+	adapter->vfinfo[vf].spoofchk_enabled = setting;
+
+	regval = IXGBE_READ_REG(hw, IXGBE_PFVFSPOOF(vf_target_reg));
+	regval &= ~(1 << vf_target_shift);
+	regval |= (setting << vf_target_shift);
+	IXGBE_WRITE_REG(hw, IXGBE_PFVFSPOOF(vf_target_reg), regval);
+
+	if (adapter->vfinfo[vf].vlan_count) {
+		vf_target_shift += IXGBE_SPOOF_VLANAS_SHIFT;
+		regval = IXGBE_READ_REG(hw, IXGBE_PFVFSPOOF(vf_target_reg));
+		regval &= ~(1 << vf_target_shift);
+		regval |= (setting << vf_target_shift);
+		IXGBE_WRITE_REG(hw, IXGBE_PFVFSPOOF(vf_target_reg), regval);
+	}
+
+	return 0;
+}
+
 int ixgbe_ndo_get_vf_config(struct net_device *netdev,
 			    int vf, struct ifla_vf_info *ivi)
 {
@@ -888,5 +921,6 @@ int ixgbe_ndo_get_vf_config(struct net_device *netdev,
 	ivi->tx_rate = adapter->vfinfo[vf].tx_rate;
 	ivi->vlan = adapter->vfinfo[vf].pf_vlan;
 	ivi->qos = adapter->vfinfo[vf].pf_qos;
+	ivi->spoofchk = adapter->vfinfo[vf].spoofchk_enabled;
 	return 0;
 }
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h
index 2781847..2e54f7a 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h
@@ -38,6 +38,7 @@ int ixgbe_ndo_set_vf_mac(struct net_device *netdev, int queue, u8 *mac);
 int ixgbe_ndo_set_vf_vlan(struct net_device *netdev, int queue, u16 vlan,
 			   u8 qos);
 int ixgbe_ndo_set_vf_bw(struct net_device *netdev, int vf, int tx_rate);
+int ixgbe_ndo_set_vf_spoofchk(struct net_device *netdev, int vf, u8 setting);
 int ixgbe_ndo_get_vf_config(struct net_device *netdev,
 			    int vf, struct ifla_vf_info *ivi);
 void ixgbe_check_vf_rate_limit(struct ixgbe_adapter *adapter);
-- 
1.7.6.2

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [net-next 07/10] ixgbe: update {P}FC thresholds to account for X540 and loopback
  2011-09-24  9:17 [net-next 00/10][pull request] Intel Wired LAN Drivers Update Jeff Kirsher
                   ` (5 preceding siblings ...)
  2011-09-24  9:17 ` [net-next 06/10] ixgbe: Add new netdev op to turn spoof checking on or off per VF Jeff Kirsher
@ 2011-09-24  9:17 ` Jeff Kirsher
  2011-09-24  9:17 ` [net-next 08/10] ixgbe add thermal sensor support for x540 hardware Jeff Kirsher
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 30+ messages in thread
From: Jeff Kirsher @ 2011-09-24  9:17 UTC (permalink / raw)
  To: davem; +Cc: John Fastabend, netdev, gospo, Jeff Kirsher

From: John Fastabend <john.r.fastabend@intel.com>

Revise high and low threshold marks wrt flow control to account
for the X540 devices and latency introduced by the loopback
switch.

Without this it was in theory possible to drop frames on a
supposedly lossless link with X540 or SR-IOV enabled.

Previously we used a magic number in a define to calculate the
threshold values. This made it difficult to sort out exactly
which latencies were or were not being accounted for. Here
I was overly explicit and tried to used #define names that would
be recognizable after reading the IEEE 802.1Qbb specification.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c     |    8 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_common.c    |   12 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_dcb.h       |    1 -
 drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_82598.c |    9 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_82599.c |    8 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c      |  128 ++++++++++++++++++-
 drivers/net/ethernet/intel/ixgbe/ixgbe_type.h      |   62 +++++++++-
 7 files changed, 190 insertions(+), 38 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c
index b816a62..fa079bb 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c
@@ -358,7 +358,6 @@ static s32 ixgbe_fc_enable_82598(struct ixgbe_hw *hw, s32 packetbuf_num)
 	u32 fctrl_reg;
 	u32 rmcs_reg;
 	u32 reg;
-	u32 rx_pba_size;
 	u32 link_speed = 0;
 	bool link_up;
 
@@ -461,16 +460,13 @@ static s32 ixgbe_fc_enable_82598(struct ixgbe_hw *hw, s32 packetbuf_num)
 
 	/* Set up and enable Rx high/low water mark thresholds, enable XON. */
 	if (hw->fc.current_mode & ixgbe_fc_tx_pause) {
-		rx_pba_size = IXGBE_READ_REG(hw, IXGBE_RXPBSIZE(packetbuf_num));
-		rx_pba_size >>= IXGBE_RXPBSIZE_SHIFT;
-
-		reg = (rx_pba_size - hw->fc.low_water) << 6;
+		reg = hw->fc.low_water << 6;
 		if (hw->fc.send_xon)
 			reg |= IXGBE_FCRTL_XONE;
 
 		IXGBE_WRITE_REG(hw, IXGBE_FCRTL(packetbuf_num), reg);
 
-		reg = (rx_pba_size - hw->fc.high_water) << 6;
+		reg = hw->fc.high_water[packetbuf_num] << 6;
 		reg |= IXGBE_FCRTH_FCEN;
 
 		IXGBE_WRITE_REG(hw, IXGBE_FCRTH(packetbuf_num), reg);
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
index 84ed9ef..59cd54c 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
@@ -1932,7 +1932,6 @@ s32 ixgbe_fc_enable_generic(struct ixgbe_hw *hw, s32 packetbuf_num)
 	s32 ret_val = 0;
 	u32 mflcn_reg, fccfg_reg;
 	u32 reg;
-	u32 rx_pba_size;
 	u32 fcrtl, fcrth;
 
 #ifdef CONFIG_DCB
@@ -2012,11 +2011,8 @@ s32 ixgbe_fc_enable_generic(struct ixgbe_hw *hw, s32 packetbuf_num)
 	IXGBE_WRITE_REG(hw, IXGBE_MFLCN, mflcn_reg);
 	IXGBE_WRITE_REG(hw, IXGBE_FCCFG, fccfg_reg);
 
-	rx_pba_size = IXGBE_READ_REG(hw, IXGBE_RXPBSIZE(packetbuf_num));
-	rx_pba_size >>= IXGBE_RXPBSIZE_SHIFT;
-
-	fcrth = (rx_pba_size - hw->fc.high_water) << 10;
-	fcrtl = (rx_pba_size - hw->fc.low_water) << 10;
+	fcrth = hw->fc.high_water[packetbuf_num] << 10;
+	fcrtl = hw->fc.low_water << 10;
 
 	if (hw->fc.current_mode & ixgbe_fc_tx_pause) {
 		fcrth |= IXGBE_FCRTH_FCEN;
@@ -2293,7 +2289,9 @@ static s32 ixgbe_setup_fc(struct ixgbe_hw *hw, s32 packetbuf_num)
 	 * Validate the water mark configuration.  Zero water marks are invalid
 	 * because it causes the controller to just blast out fc packets.
 	 */
-	if (!hw->fc.low_water || !hw->fc.high_water || !hw->fc.pause_time) {
+	if (!hw->fc.low_water ||
+	    !hw->fc.high_water[packetbuf_num] ||
+	    !hw->fc.pause_time) {
 		hw_dbg(hw, "Invalid water mark configuration\n");
 		ret_val = IXGBE_ERR_INVALID_LINK_SETTINGS;
 		goto out;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb.h
index 0a68aa7..df095a9 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb.h
@@ -36,7 +36,6 @@
 
 #define IXGBE_MAX_PACKET_BUFFERS 8
 #define MAX_USER_PRIORITY        8
-#define MAX_TRAFFIC_CLASS        8
 #define MAX_BW_GROUP             8
 #define BW_PERCENT               100
 
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_82598.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_82598.c
index 2288c3c..fcd0e47 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_82598.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_82598.c
@@ -191,7 +191,7 @@ s32 ixgbe_dcb_config_tx_data_arbiter_82598(struct ixgbe_hw *hw,
  */
 s32 ixgbe_dcb_config_pfc_82598(struct ixgbe_hw *hw, u8 pfc_en)
 {
-	u32 reg, rx_pba_size;
+	u32 reg;
 	u8  i;
 
 	if (pfc_en) {
@@ -222,9 +222,8 @@ s32 ixgbe_dcb_config_pfc_82598(struct ixgbe_hw *hw, u8 pfc_en)
 	 */
 	for (i = 0; i < MAX_TRAFFIC_CLASS; i++) {
 		int enabled = pfc_en & (1 << i);
-		rx_pba_size = IXGBE_READ_REG(hw, IXGBE_RXPBSIZE(i));
-		rx_pba_size >>= IXGBE_RXPBSIZE_SHIFT;
-		reg = (rx_pba_size - hw->fc.low_water) << 10;
+
+		reg = hw->fc.low_water << 10;
 
 		if (enabled == pfc_enabled_tx ||
 		    enabled == pfc_enabled_full)
@@ -232,7 +231,7 @@ s32 ixgbe_dcb_config_pfc_82598(struct ixgbe_hw *hw, u8 pfc_en)
 
 		IXGBE_WRITE_REG(hw, IXGBE_FCRTL(i), reg);
 
-		reg = (rx_pba_size - hw->fc.high_water) << 10;
+		reg = hw->fc.high_water[i] << 10;
 		if (enabled == pfc_enabled_tx ||
 		    enabled == pfc_enabled_full)
 			reg |= IXGBE_FCRTH_FCEN;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_82599.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_82599.c
index d64fb87..02f6724 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_82599.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_82599.c
@@ -210,21 +210,19 @@ s32 ixgbe_dcb_config_tx_data_arbiter_82599(struct ixgbe_hw *hw,
  */
 s32 ixgbe_dcb_config_pfc_82599(struct ixgbe_hw *hw, u8 pfc_en)
 {
-	u32 i, reg, rx_pba_size;
+	u32 i, reg;
 
 	/* Configure PFC Tx thresholds per TC */
 	for (i = 0; i < MAX_TRAFFIC_CLASS; i++) {
 		int enabled = pfc_en & (1 << i);
-		rx_pba_size = IXGBE_READ_REG(hw, IXGBE_RXPBSIZE(i));
-		rx_pba_size >>= IXGBE_RXPBSIZE_SHIFT;
 
-		reg = (rx_pba_size - hw->fc.low_water) << 10;
+		reg = hw->fc.low_water << 10;
 
 		if (enabled)
 			reg |= IXGBE_FCRTL_XONE;
 		IXGBE_WRITE_REG(hw, IXGBE_FCRTL_82599(i), reg);
 
-		reg = (rx_pba_size - hw->fc.high_water) << 10;
+		reg = hw->fc.high_water[i] << 10;
 		if (enabled)
 			reg |= IXGBE_FCRTH_FCEN;
 		IXGBE_WRITE_REG(hw, IXGBE_FCRTH_82599(i), reg);
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 29bcdbe..3d289b5 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -3356,9 +3356,128 @@ static void ixgbe_configure_dcb(struct ixgbe_adapter *adapter)
 		IXGBE_WRITE_REG(hw, IXGBE_RQTC, reg);
 	}
 }
+#endif
+
+/* Additional bittime to account for IXGBE framing */
+#define IXGBE_ETH_FRAMING 20
+
+/*
+ * ixgbe_hpbthresh - calculate high water mark for flow control
+ *
+ * @adapter: board private structure to calculate for
+ * @pb - packet buffer to calculate
+ */
+static int ixgbe_hpbthresh(struct ixgbe_adapter *adapter, int pb)
+{
+	struct ixgbe_hw *hw = &adapter->hw;
+	struct net_device *dev = adapter->netdev;
+	int link, tc, kb, marker;
+	u32 dv_id, rx_pba;
+
+	/* Calculate max LAN frame size */
+	tc = link = dev->mtu + ETH_HLEN + ETH_FCS_LEN + IXGBE_ETH_FRAMING;
+
+#ifdef IXGBE_FCOE
+	/* FCoE traffic class uses FCOE jumbo frames */
+	if (dev->features & NETIF_F_FCOE_MTU) {
+		int fcoe_pb = 0;
 
+#ifdef CONFIG_IXGBE_DCB
+		fcoe_pb = netdev_get_prio_tc_map(dev, adapter->fcoe.up);
+
+#endif
+		if (fcoe_pb == pb && tc < IXGBE_FCOE_JUMBO_FRAME_SIZE)
+			tc = IXGBE_FCOE_JUMBO_FRAME_SIZE;
+	}
 #endif
 
+	/* Calculate delay value for device */
+	switch (hw->mac.type) {
+	case ixgbe_mac_X540:
+		dv_id = IXGBE_DV_X540(link, tc);
+		break;
+	default:
+		dv_id = IXGBE_DV(link, tc);
+		break;
+	}
+
+	/* Loopback switch introduces additional latency */
+	if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED)
+		dv_id += IXGBE_B2BT(tc);
+
+	/* Delay value is calculated in bit times convert to KB */
+	kb = IXGBE_BT2KB(dv_id);
+	rx_pba = IXGBE_READ_REG(hw, IXGBE_RXPBSIZE(pb)) >> 10;
+
+	marker = rx_pba - kb;
+
+	/* It is possible that the packet buffer is not large enough
+	 * to provide required headroom. In this case throw an error
+	 * to user and a do the best we can.
+	 */
+	if (marker < 0) {
+		e_warn(drv, "Packet Buffer(%i) can not provide enough"
+			    "headroom to support flow control."
+			    "Decrease MTU or number of traffic classes\n", pb);
+		marker = tc + 1;
+	}
+
+	return marker;
+}
+
+/*
+ * ixgbe_lpbthresh - calculate low water mark for for flow control
+ *
+ * @adapter: board private structure to calculate for
+ * @pb - packet buffer to calculate
+ */
+static int ixgbe_lpbthresh(struct ixgbe_adapter *adapter)
+{
+	struct ixgbe_hw *hw = &adapter->hw;
+	struct net_device *dev = adapter->netdev;
+	int tc;
+	u32 dv_id;
+
+	/* Calculate max LAN frame size */
+	tc = dev->mtu + ETH_HLEN + ETH_FCS_LEN;
+
+	/* Calculate delay value for device */
+	switch (hw->mac.type) {
+	case ixgbe_mac_X540:
+		dv_id = IXGBE_LOW_DV_X540(tc);
+		break;
+	default:
+		dv_id = IXGBE_LOW_DV(tc);
+		break;
+	}
+
+	/* Delay value is calculated in bit times convert to KB */
+	return IXGBE_BT2KB(dv_id);
+}
+
+/*
+ * ixgbe_pbthresh_setup - calculate and setup high low water marks
+ */
+static void ixgbe_pbthresh_setup(struct ixgbe_adapter *adapter)
+{
+	struct ixgbe_hw *hw = &adapter->hw;
+	int num_tc = netdev_get_num_tc(adapter->netdev);
+	int i;
+
+	if (!num_tc)
+		num_tc = 1;
+
+	hw->fc.low_water = ixgbe_lpbthresh(adapter);
+
+	for (i = 0; i < num_tc; i++) {
+		hw->fc.high_water[i] = ixgbe_hpbthresh(adapter, i);
+
+		/* Low water marks must not be larger than high water marks */
+		if (hw->fc.low_water > hw->fc.high_water[i])
+			hw->fc.low_water = 0;
+	}
+}
+
 static void ixgbe_configure_pb(struct ixgbe_adapter *adapter)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
@@ -3372,6 +3491,7 @@ static void ixgbe_configure_pb(struct ixgbe_adapter *adapter)
 		hdrm = 0;
 
 	hw->mac.ops.set_rxpba(hw, tc, hdrm, PBA_STRATEGY_EQUAL);
+	ixgbe_pbthresh_setup(adapter);
 }
 
 static void ixgbe_fdir_filter_restore(struct ixgbe_adapter *adapter)
@@ -4774,13 +4894,11 @@ static int __devinit ixgbe_sw_init(struct ixgbe_adapter *adapter)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
 	struct pci_dev *pdev = adapter->pdev;
-	struct net_device *dev = adapter->netdev;
 	unsigned int rss;
 #ifdef CONFIG_IXGBE_DCB
 	int j;
 	struct tc_configuration *tc;
 #endif
-	int max_frame = dev->mtu + ETH_HLEN + ETH_FCS_LEN;
 
 	/* PCI config space info */
 
@@ -4856,8 +4974,7 @@ static int __devinit ixgbe_sw_init(struct ixgbe_adapter *adapter)
 #ifdef CONFIG_DCB
 	adapter->last_lfc_mode = hw->fc.current_mode;
 #endif
-	hw->fc.high_water = FC_HIGH_WATER(max_frame);
-	hw->fc.low_water = FC_LOW_WATER(max_frame);
+	ixgbe_pbthresh_setup(adapter);
 	hw->fc.pause_time = IXGBE_DEFAULT_FCPAUSE;
 	hw->fc.send_xon = true;
 	hw->fc.disable_fc_autoneg = false;
@@ -5124,9 +5241,6 @@ static int ixgbe_change_mtu(struct net_device *netdev, int new_mtu)
 	/* must set new MTU before calling down or up */
 	netdev->mtu = new_mtu;
 
-	hw->fc.high_water = FC_HIGH_WATER(max_frame);
-	hw->fc.low_water = FC_LOW_WATER(max_frame);
-
 	if (netif_running(netdev))
 		ixgbe_reinit_locked(adapter);
 
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h
index 9a03341..16dd461 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h
@@ -404,6 +404,7 @@
 #define IXGBE_WUPL_LENGTH_MASK 0xFFFF
 
 /* DCB registers */
+#define MAX_TRAFFIC_CLASS        8
 #define IXGBE_RMCS      0x03D00
 #define IXGBE_DPMCS     0x07F40
 #define IXGBE_PDPMCS    0x0CD00
@@ -2323,13 +2324,60 @@ typedef u32 ixgbe_physical_layer;
 #define IXGBE_PHYSICAL_LAYER_10GBASE_XAUI 0x1000
 #define IXGBE_PHYSICAL_LAYER_SFP_ACTIVE_DA 0x2000
 
-/* Flow Control Macros */
-#define PAUSE_RTT	8
-#define PAUSE_MTU(MTU)	((MTU + 1024 - 1) / 1024)
+/* Flow Control Data Sheet defined values
+ * Calculation and defines taken from 802.1bb Annex O
+ */
+
+/* BitTimes (BT) conversion */
+#define IXGBE_BT2KB(BT) ((BT + 1023) / (8 * 1024))
+#define IXGBE_B2BT(BT) (BT * 8)
+
+/* Calculate Delay to respond to PFC */
+#define IXGBE_PFC_D	672
+
+/* Calculate Cable Delay */
+#define IXGBE_CABLE_DC	5556 /* Delay Copper */
+#define IXGBE_CABLE_DO	5000 /* Delay Optical */
+
+/* Calculate Interface Delay X540 */
+#define IXGBE_PHY_DC	25600	/* Delay 10G BASET */
+#define IXGBE_MAC_DC	8192	/* Delay Copper XAUI interface */
+#define IXGBE_XAUI_DC	(2 * 2048) /* Delay Copper Phy */
+
+#define IXGBE_ID_X540	(IXGBE_MAC_DC + IXGBE_XAUI_DC + IXGBE_PHY_DC)
+
+/* Calculate Interface Delay 82598, 82599 */
+#define IXGBE_PHY_D	12800
+#define IXGBE_MAC_D	4096
+#define IXGBE_XAUI_D	(2 * 1024)
+
+#define IXGBE_ID	(IXGBE_MAC_D + IXGBE_XAUI_D + IXGBE_PHY_D)
+
+/* Calculate Delay incurred from higher layer */
+#define IXGBE_HD	6144
+
+/* Calculate PCI Bus delay for low thresholds */
+#define IXGBE_PCI_DELAY	10000
+
+/* Calculate X540 delay value in bit times */
+#define IXGBE_FILL_RATE (36 / 25)
+
+#define IXGBE_DV_X540(LINK, TC) (IXGBE_FILL_RATE * \
+				 (IXGBE_B2BT(LINK) + IXGBE_PFC_D + \
+				 (2 * IXGBE_CABLE_DC) + \
+				 (2 * IXGBE_ID_X540) + \
+				 IXGBE_HD + IXGBE_B2BT(TC)))
+
+/* Calculate 82599, 82598 delay value in bit times */
+#define IXGBE_DV(LINK, TC) (IXGBE_FILL_RATE * \
+			    (IXGBE_B2BT(LINK) + IXGBE_PFC_D + \
+			    (2 * IXGBE_CABLE_DC) + (2 * IXGBE_ID) + \
+			    IXGBE_HD + IXGBE_B2BT(TC)))
 
-#define FC_HIGH_WATER(MTU) ((((PAUSE_RTT + PAUSE_MTU(MTU)) * 144) + 99) / 100 +\
-				PAUSE_MTU(MTU))
-#define FC_LOW_WATER(MTU)  (2 * (2 * PAUSE_MTU(MTU) + PAUSE_RTT))
+/* Calculate low threshold delay values */
+#define IXGBE_LOW_DV_X540(TC) (2 * IXGBE_B2BT(TC) + \
+			       (IXGBE_FILL_RATE * IXGBE_PCI_DELAY))
+#define IXGBE_LOW_DV(TC)      (2 * IXGBE_LOW_DV_X540(TC))
 
 /* Software ATR hash keys */
 #define IXGBE_ATR_BUCKET_HASH_KEY    0x3DAD14E2
@@ -2548,7 +2596,7 @@ struct ixgbe_bus_info {
 
 /* Flow control parameters */
 struct ixgbe_fc_info {
-	u32 high_water; /* Flow Control High-water */
+	u32 high_water[MAX_TRAFFIC_CLASS]; /* Flow Control High-water */
 	u32 low_water; /* Flow Control Low-water */
 	u16 pause_time; /* Flow Control Pause timer */
 	bool send_xon; /* Flow control send XON */
-- 
1.7.6.2

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [net-next 08/10] ixgbe add thermal sensor support for x540 hardware
  2011-09-24  9:17 [net-next 00/10][pull request] Intel Wired LAN Drivers Update Jeff Kirsher
                   ` (6 preceding siblings ...)
  2011-09-24  9:17 ` [net-next 07/10] ixgbe: update {P}FC thresholds to account for X540 and loopback Jeff Kirsher
@ 2011-09-24  9:17 ` Jeff Kirsher
  2011-09-24  9:17 ` [net-next 09/10] ixgbe: cleanup ixgbe_setup_gpie() for X540 Jeff Kirsher
  2011-09-24  9:17 ` [net-next 10/10] ixgbe: add ECC warning for legacy interrupts Jeff Kirsher
  9 siblings, 0 replies; 30+ messages in thread
From: Jeff Kirsher @ 2011-09-24  9:17 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, gospo, Jeff Kirsher

From: Jacob Keller <jacob.e.keller@intel.com>

Add code to enable thermal sensors for the x540 hardware, as well as a
thermal interrupt check which will exit with a critical message of a
thermal overheat is detected. Intent of code allows other mac types to
be added with different configuration in the future.

Fixed in this version is the addition of setting the temp_sensor
capable flag which was previously only set for a specific mac.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   66 ++++++++++++++++++-------
 drivers/net/ethernet/intel/ixgbe/ixgbe_type.h |    2 +
 2 files changed, 50 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 3d289b5..45f5ab5 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1751,6 +1751,39 @@ static void ixgbe_check_fan_failure(struct ixgbe_adapter *adapter, u32 eicr)
 	}
 }
 
+static void ixgbe_check_overtemp_event(struct ixgbe_adapter *adapter, u32 eicr)
+{
+	if (!(adapter->flags2 & IXGBE_FLAG2_TEMP_SENSOR_CAPABLE))
+		return;
+
+	switch (adapter->hw.mac.type) {
+	case ixgbe_mac_82599EB:
+		/*
+		 * Need to check link state so complete overtemp check
+		 * on service task
+		 */
+		if (((eicr & IXGBE_EICR_GPI_SDP0) || (eicr & IXGBE_EICR_LSC)) &&
+		    (!test_bit(__IXGBE_DOWN, &adapter->state))) {
+			adapter->interrupt_event = eicr;
+			adapter->flags2 |= IXGBE_FLAG2_TEMP_SENSOR_EVENT;
+			ixgbe_service_event_schedule(adapter);
+			return;
+		}
+		return;
+	case ixgbe_mac_X540:
+		if (!(eicr & IXGBE_EICR_TS))
+			return;
+		break;
+	default:
+		return;
+	}
+
+	e_crit(drv,
+	       "Network adapter has been stopped because it has over heated. "
+	       "Restart the computer. If the problem persists, "
+	       "power off the system and replace the adapter\n");
+}
+
 static void ixgbe_check_sfp_event(struct ixgbe_adapter *adapter, u32 eicr)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
@@ -1854,7 +1887,16 @@ static inline void ixgbe_irq_enable(struct ixgbe_adapter *adapter, bool queues,
 		mask &= ~IXGBE_EIMS_LSC;
 
 	if (adapter->flags2 & IXGBE_FLAG2_TEMP_SENSOR_CAPABLE)
-		mask |= IXGBE_EIMS_GPI_SDP0;
+		switch (adapter->hw.mac.type) {
+		case ixgbe_mac_82599EB:
+			mask |= IXGBE_EIMS_GPI_SDP0;
+			break;
+		case ixgbe_mac_X540:
+			mask |= IXGBE_EIMS_TS;
+			break;
+		default:
+			break;
+		}
 	if (adapter->flags & IXGBE_FLAG_FAN_FAIL_CAPABLE)
 		mask |= IXGBE_EIMS_GPI_SDP1;
 	switch (adapter->hw.mac.type) {
@@ -1924,14 +1966,7 @@ static irqreturn_t ixgbe_msix_other(int irq, void *data)
 			}
 		}
 		ixgbe_check_sfp_event(adapter, eicr);
-		if ((adapter->flags2 & IXGBE_FLAG2_TEMP_SENSOR_CAPABLE) &&
-		    ((eicr & IXGBE_EICR_GPI_SDP0) || (eicr & IXGBE_EICR_LSC))) {
-			if (!test_bit(__IXGBE_DOWN, &adapter->state)) {
-				adapter->interrupt_event = eicr;
-				adapter->flags2 |= IXGBE_FLAG2_TEMP_SENSOR_EVENT;
-				ixgbe_service_event_schedule(adapter);
-			}
-		}
+		ixgbe_check_overtemp_event(adapter, eicr);
 		break;
 	default:
 		break;
@@ -2140,15 +2175,9 @@ static irqreturn_t ixgbe_intr(int irq, void *data)
 
 	switch (hw->mac.type) {
 	case ixgbe_mac_82599EB:
+	case ixgbe_mac_X540:
 		ixgbe_check_sfp_event(adapter, eicr);
-		if ((adapter->flags2 & IXGBE_FLAG2_TEMP_SENSOR_CAPABLE) &&
-		    ((eicr & IXGBE_EICR_GPI_SDP0) || (eicr & IXGBE_EICR_LSC))) {
-			if (!test_bit(__IXGBE_DOWN, &adapter->state)) {
-				adapter->interrupt_event = eicr;
-				adapter->flags2 |= IXGBE_FLAG2_TEMP_SENSOR_EVENT;
-				ixgbe_service_event_schedule(adapter);
-			}
-		}
+		ixgbe_check_overtemp_event(adapter, eicr);
 		break;
 	default:
 		break;
@@ -4918,8 +4947,9 @@ static int __devinit ixgbe_sw_init(struct ixgbe_adapter *adapter)
 			adapter->flags |= IXGBE_FLAG_FAN_FAIL_CAPABLE;
 		adapter->max_msix_q_vectors = MAX_MSIX_Q_VECTORS_82598;
 		break;
-	case ixgbe_mac_82599EB:
 	case ixgbe_mac_X540:
+		adapter->flags2 |= IXGBE_FLAG2_TEMP_SENSOR_CAPABLE;
+	case ixgbe_mac_82599EB:
 		adapter->max_msix_q_vectors = MAX_MSIX_Q_VECTORS_82599;
 		adapter->flags2 |= IXGBE_FLAG2_RSC_CAPABLE;
 		adapter->flags2 |= IXGBE_FLAG2_RSC_ENABLED;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h
index 16dd461..838847c 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h
@@ -1280,6 +1280,7 @@ enum {
 #define IXGBE_EICR_LSC          0x00100000 /* Link Status Change */
 #define IXGBE_EICR_LINKSEC      0x00200000 /* PN Threshold */
 #define IXGBE_EICR_MNG          0x00400000 /* Manageability Event Interrupt */
+#define IXGBE_EICR_TS           0x00800000 /* Thermal Sensor Event */
 #define IXGBE_EICR_GPI_SDP0     0x01000000 /* Gen Purpose Interrupt on SDP0 */
 #define IXGBE_EICR_GPI_SDP1     0x02000000 /* Gen Purpose Interrupt on SDP1 */
 #define IXGBE_EICR_GPI_SDP2     0x04000000 /* Gen Purpose Interrupt on SDP2 */
@@ -1314,6 +1315,7 @@ enum {
 #define IXGBE_EIMS_MAILBOX      IXGBE_EICR_MAILBOX   /* VF to PF Mailbox Int */
 #define IXGBE_EIMS_LSC          IXGBE_EICR_LSC       /* Link Status Change */
 #define IXGBE_EIMS_MNG          IXGBE_EICR_MNG       /* MNG Event Interrupt */
+#define IXGBE_EIMS_TS           IXGBE_EICR_TS        /* Thermel Sensor Event */
 #define IXGBE_EIMS_GPI_SDP0     IXGBE_EICR_GPI_SDP0  /* SDP0 Gen Purpose Int */
 #define IXGBE_EIMS_GPI_SDP1     IXGBE_EICR_GPI_SDP1  /* SDP1 Gen Purpose Int */
 #define IXGBE_EIMS_GPI_SDP2     IXGBE_EICR_GPI_SDP2  /* SDP2 Gen Purpose Int */
-- 
1.7.6.2

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [net-next 09/10] ixgbe: cleanup ixgbe_setup_gpie() for X540
  2011-09-24  9:17 [net-next 00/10][pull request] Intel Wired LAN Drivers Update Jeff Kirsher
                   ` (7 preceding siblings ...)
  2011-09-24  9:17 ` [net-next 08/10] ixgbe add thermal sensor support for x540 hardware Jeff Kirsher
@ 2011-09-24  9:17 ` Jeff Kirsher
  2011-09-24  9:17 ` [net-next 10/10] ixgbe: add ECC warning for legacy interrupts Jeff Kirsher
  9 siblings, 0 replies; 30+ messages in thread
From: Jeff Kirsher @ 2011-09-24  9:17 UTC (permalink / raw)
  To: davem; +Cc: Don Skidmore, netdev, gospo, Jeff Kirsher

From: Don Skidmore <donald.c.skidmore@intel.com>

The X540 thermal sensor interrupt isn't a General Purpose Interrupt
so doesn't need to be enabled in ixgbe_setup_gpie().  Likewise X540 doesn't
use the SDP0 for thermal sensor so it doesn't need to be enabled for any
device other than 82599.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   14 ++++++++++++--
 1 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 45f5ab5..6c97aa4 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -3685,8 +3685,18 @@ static void ixgbe_setup_gpie(struct ixgbe_adapter *adapter)
 	}
 
 	/* Enable Thermal over heat sensor interrupt */
-	if (adapter->flags2 & IXGBE_FLAG2_TEMP_SENSOR_CAPABLE)
-		gpie |= IXGBE_SDP0_GPIEN;
+	if (adapter->flags2 & IXGBE_FLAG2_TEMP_SENSOR_CAPABLE) {
+		switch (adapter->hw.mac.type) {
+		case ixgbe_mac_82599EB:
+			gpie |= IXGBE_SDP0_GPIEN;
+			break;
+		case ixgbe_mac_X540:
+			gpie |= IXGBE_EIMS_TS;
+			break;
+		default:
+			break;
+		}
+	}
 
 	/* Enable fan failure interrupt */
 	if (adapter->flags & IXGBE_FLAG_FAN_FAIL_CAPABLE)
-- 
1.7.6.2

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [net-next 10/10] ixgbe: add ECC warning for legacy interrupts
  2011-09-24  9:17 [net-next 00/10][pull request] Intel Wired LAN Drivers Update Jeff Kirsher
                   ` (8 preceding siblings ...)
  2011-09-24  9:17 ` [net-next 09/10] ixgbe: cleanup ixgbe_setup_gpie() for X540 Jeff Kirsher
@ 2011-09-24  9:17 ` Jeff Kirsher
  9 siblings, 0 replies; 30+ messages in thread
From: Jeff Kirsher @ 2011-09-24  9:17 UTC (permalink / raw)
  To: davem; +Cc: Don Skidmore, netdev, gospo, Jeff Kirsher

From: Don Skidmore <donald.c.skidmore@intel.com>

Noticed that the legacy Interrupt handler didn't have the same
ECC warning as did the MSI.  So this patch adds it.

Signed-off-by: Don Skidmore <donald.c.skidmore>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 6c97aa4..8995944 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2175,8 +2175,12 @@ static irqreturn_t ixgbe_intr(int irq, void *data)
 
 	switch (hw->mac.type) {
 	case ixgbe_mac_82599EB:
-	case ixgbe_mac_X540:
 		ixgbe_check_sfp_event(adapter, eicr);
+		/* Fall through */
+	case ixgbe_mac_X540:
+		if (eicr & IXGBE_EICR_ECC)
+			e_info(link, "Received unrecoverable ECC err, please "
+				     "reboot\n");
 		ixgbe_check_overtemp_event(adapter, eicr);
 		break;
 	default:
-- 
1.7.6.2

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [net-next 02/10] ixgbevf: Fix broken trunk vlan
  2011-09-24  9:17 ` [net-next 02/10] ixgbevf: Fix broken trunk vlan Jeff Kirsher
@ 2011-09-24 16:33   ` Jesse Gross
  2011-09-25  5:47     ` Jeff Kirsher
  2011-09-26 15:57     ` Rose, Gregory V
  0 siblings, 2 replies; 30+ messages in thread
From: Jesse Gross @ 2011-09-24 16:33 UTC (permalink / raw)
  To: Jeff Kirsher; +Cc: davem, Greg Rose, netdev, gospo, Jiri Pirko

On Sat, Sep 24, 2011 at 2:17 AM, Jeff Kirsher
<jeffrey.t.kirsher@intel.com> wrote:
> From: Greg Rose <gregory.v.rose@intel.com>
>
> Changes to clean up the vlan rx path broke trunk vlan.  Trunk vlans in
> a VF driver are those set using:
>
> "ip link set <pfdev> vf <n> <vlanid>"
>
> Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
> CC: Jiri Pirko <jpirko@redhat.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> ---
>  drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |    6 ++----
>  1 files changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> index d72905b..4930c46 100644
> --- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> +++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> @@ -293,12 +293,10 @@ static void ixgbevf_receive_skb(struct ixgbevf_q_vector *q_vector,
>  {
>        struct ixgbevf_adapter *adapter = q_vector->adapter;
>        bool is_vlan = (status & IXGBE_RXD_STAT_VP);
> +       u16 tag = le16_to_cpu(rx_desc->wb.upper.vlan);
>
> -       if (is_vlan) {
> -               u16 tag = le16_to_cpu(rx_desc->wb.upper.vlan);
> -
> +       if (is_vlan && test_bit(tag, adapter->active_vlans))
>                __vlan_hwaccel_put_tag(skb, tag);
> -       }

What happens if you run tcpdump without configuring vlan devices?
Shouldn't you see tagged packets for the vlans that are being trunked
to you?  I think this will strip tags in that case.  The apparent
behavior of vlan filters here is also surprising to me because on one
hand if they're truly filtering this test shouldn't be needed and on
the other hand they don't seem to be disabled in promiscuous mode.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [net-next 05/10] if_link: Add additional parameter to IFLA_VF_INFO for spoof checking
  2011-09-24  9:17 ` [net-next 05/10] if_link: Add additional parameter to IFLA_VF_INFO for spoof checking Jeff Kirsher
@ 2011-09-24 16:40   ` Ben Hutchings
  2011-09-25  5:54     ` Jeff Kirsher
  2011-09-26 16:06     ` Rose, Gregory V
  2011-09-25 17:22   ` Stephen Hemminger
  1 sibling, 2 replies; 30+ messages in thread
From: Ben Hutchings @ 2011-09-24 16:40 UTC (permalink / raw)
  To: Jeff Kirsher; +Cc: davem, Greg Rose, netdev, gospo

On Sat, 2011-09-24 at 02:17 -0700, Jeff Kirsher wrote:
> From: Greg Rose <gregory.v.rose@intel.com>
> 
> Add configuration setting for drivers to turn spoof checking on or off
> for discrete VFs.
> 
> Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> ---
>  include/linux/if_link.h   |    7 +++++++
>  include/linux/netdevice.h |    3 +++
>  net/core/rtnetlink.c      |   25 ++++++++++++++++++++++---
>  3 files changed, 32 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/if_link.h b/include/linux/if_link.h
> index 0ee969a..8bd6d6d 100644
> --- a/include/linux/if_link.h
> +++ b/include/linux/if_link.h
> @@ -279,6 +279,7 @@ enum {
>  	IFLA_VF_MAC,		/* Hardware queue specific attributes */
>  	IFLA_VF_VLAN,
>  	IFLA_VF_TX_RATE,	/* TX Bandwidth Allocation */
> +	IFLA_VF_SPOOFCHK,	/* Spoof Checking on/off switch */
>  	__IFLA_VF_MAX,
>  };
>  
> @@ -300,12 +301,18 @@ struct ifla_vf_tx_rate {
>  	__u32 rate; /* Max TX bandwidth in Mbps, 0 disables throttling */
>  };
>  
> +struct ifla_vf_spoofchk {
> +	__u32 vf;
> +	__u32 setting;
> +};
> +
>  struct ifla_vf_info {
>  	__u32 vf;
>  	__u8 mac[32];
>  	__u32 vlan;
>  	__u32 qos;
>  	__u32 tx_rate;
> +	__u32 spoofchk;
>  };

Not something you need to change now, but shouldn't this last struct
definition be #ifdef __KERNEL__?

>  /* VF ports management section
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 43b3298..a2951a0 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -781,6 +781,7 @@ struct netdev_tc_txq {
>   * int (*ndo_set_vf_mac)(struct net_device *dev, int vf, u8* mac);
>   * int (*ndo_set_vf_vlan)(struct net_device *dev, int vf, u16 vlan, u8 qos);
>   * int (*ndo_set_vf_tx_rate)(struct net_device *dev, int vf, int rate);
> + * int (*ndo_set_vf_spoofchk)(struct net_device *dev, int vf, u8 setting);
>   * int (*ndo_get_vf_config)(struct net_device *dev,
>   *			    int vf, struct ifla_vf_info *ivf);
>   * int (*ndo_set_vf_port)(struct net_device *dev, int vf,
> @@ -900,6 +901,8 @@ struct net_device_ops {
>  						   int queue, u16 vlan, u8 qos);
>  	int			(*ndo_set_vf_tx_rate)(struct net_device *dev,
>  						      int vf, int rate);
> +	int			(*ndo_set_vf_spoofchk)(struct net_device *dev,
> +						       int vf, u8 setting);

Why u8 and not bool?

>  	int			(*ndo_get_vf_config)(struct net_device *dev,
>  						     int vf,
>  						     struct ifla_vf_info *ivf);
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index 39f8dd6..6535810 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -731,7 +731,8 @@ static inline int rtnl_vfinfo_size(const struct net_device *dev)
>  		size += num_vfs *
>  			(nla_total_size(sizeof(struct ifla_vf_mac)) +
>  			 nla_total_size(sizeof(struct ifla_vf_vlan)) +
> -			 nla_total_size(sizeof(struct ifla_vf_tx_rate)));
> +			 nla_total_size(sizeof(struct ifla_vf_tx_rate)) +
> +			 nla_total_size(sizeof(struct ifla_vf_spoofchk)));
>  		return size;
>  	} else
>  		return 0;
> @@ -954,13 +955,19 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
>  			struct ifla_vf_mac vf_mac;
>  			struct ifla_vf_vlan vf_vlan;
>  			struct ifla_vf_tx_rate vf_tx_rate;
> +			struct ifla_vf_spoofchk vf_spoofchk;
>  			if (dev->netdev_ops->ndo_get_vf_config(dev, i, &ivi))
>  				break;
> -			vf_mac.vf = vf_vlan.vf = vf_tx_rate.vf = ivi.vf;
> +			vf_mac.vf =
> +			vf_vlan.vf =
> +			vf_tx_rate.vf =
> +			vf_spoofchk.vf = ivi.vf;
> +
[...]

The continuation lines should be indented.  Or you could just write the
assignments as multiple statements.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [net-next 02/10] ixgbevf: Fix broken trunk vlan
  2011-09-24 16:33   ` Jesse Gross
@ 2011-09-25  5:47     ` Jeff Kirsher
  2011-09-26 15:57     ` Rose, Gregory V
  1 sibling, 0 replies; 30+ messages in thread
From: Jeff Kirsher @ 2011-09-25  5:47 UTC (permalink / raw)
  To: Jesse Gross; +Cc: davem, Rose, Gregory V, netdev, gospo, Jiri Pirko

[-- Attachment #1: Type: text/plain, Size: 2170 bytes --]

On Sat, 2011-09-24 at 09:33 -0700, Jesse Gross wrote:
> On Sat, Sep 24, 2011 at 2:17 AM, Jeff Kirsher
> <jeffrey.t.kirsher@intel.com> wrote:
> > From: Greg Rose <gregory.v.rose@intel.com>
> >
> > Changes to clean up the vlan rx path broke trunk vlan.  Trunk vlans in
> > a VF driver are those set using:
> >
> > "ip link set <pfdev> vf <n> <vlanid>"
> >
> > Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
> > CC: Jiri Pirko <jpirko@redhat.com>
> > Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> > ---
> >  drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |    6 ++----
> >  1 files changed, 2 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> > index d72905b..4930c46 100644
> > --- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> > +++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> > @@ -293,12 +293,10 @@ static void ixgbevf_receive_skb(struct ixgbevf_q_vector *q_vector,
> >  {
> >        struct ixgbevf_adapter *adapter = q_vector->adapter;
> >        bool is_vlan = (status & IXGBE_RXD_STAT_VP);
> > +       u16 tag = le16_to_cpu(rx_desc->wb.upper.vlan);
> >
> > -       if (is_vlan) {
> > -               u16 tag = le16_to_cpu(rx_desc->wb.upper.vlan);
> > -
> > +       if (is_vlan && test_bit(tag, adapter->active_vlans))
> >                __vlan_hwaccel_put_tag(skb, tag);
> > -       }
> 
> What happens if you run tcpdump without configuring vlan devices?
> Shouldn't you see tagged packets for the vlans that are being trunked
> to you?  I think this will strip tags in that case.  The apparent
> behavior of vlan filters here is also surprising to me because on one
> hand if they're truly filtering this test shouldn't be needed and on
> the other hand they don't seem to be disabled in promiscuous mode.

Jesse-

I believe this issue was noticed/reported by Jiri and I did not follow
it closely, so I would like Greg or Jiri to give you the information you
are wanting.  Being the weekend, I am sure Greg will respond come Monday
with more information.

Cheers,
Jeff

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [net-next 05/10] if_link: Add additional parameter to IFLA_VF_INFO for spoof checking
  2011-09-24 16:40   ` Ben Hutchings
@ 2011-09-25  5:54     ` Jeff Kirsher
  2011-09-26 16:06     ` Rose, Gregory V
  1 sibling, 0 replies; 30+ messages in thread
From: Jeff Kirsher @ 2011-09-25  5:54 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: davem, Rose, Gregory V, netdev, gospo

[-- Attachment #1: Type: text/plain, Size: 4425 bytes --]

On Sat, 2011-09-24 at 09:40 -0700, Ben Hutchings wrote:
> On Sat, 2011-09-24 at 02:17 -0700, Jeff Kirsher wrote:
> > From: Greg Rose <gregory.v.rose@intel.com>
> > 
> > Add configuration setting for drivers to turn spoof checking on or off
> > for discrete VFs.
> > 
> > Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
> > Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> > ---
> >  include/linux/if_link.h   |    7 +++++++
> >  include/linux/netdevice.h |    3 +++
> >  net/core/rtnetlink.c      |   25 ++++++++++++++++++++++---
> >  3 files changed, 32 insertions(+), 3 deletions(-)
> > 
> > diff --git a/include/linux/if_link.h b/include/linux/if_link.h
> > index 0ee969a..8bd6d6d 100644
> > --- a/include/linux/if_link.h
> > +++ b/include/linux/if_link.h
> > @@ -279,6 +279,7 @@ enum {
> >  	IFLA_VF_MAC,		/* Hardware queue specific attributes */
> >  	IFLA_VF_VLAN,
> >  	IFLA_VF_TX_RATE,	/* TX Bandwidth Allocation */
> > +	IFLA_VF_SPOOFCHK,	/* Spoof Checking on/off switch */
> >  	__IFLA_VF_MAX,
> >  };
> >  
> > @@ -300,12 +301,18 @@ struct ifla_vf_tx_rate {
> >  	__u32 rate; /* Max TX bandwidth in Mbps, 0 disables throttling */
> >  };
> >  
> > +struct ifla_vf_spoofchk {
> > +	__u32 vf;
> > +	__u32 setting;
> > +};
> > +
> >  struct ifla_vf_info {
> >  	__u32 vf;
> >  	__u8 mac[32];
> >  	__u32 vlan;
> >  	__u32 qos;
> >  	__u32 tx_rate;
> > +	__u32 spoofchk;
> >  };
> 
> Not something you need to change now, but shouldn't this last struct
> definition be #ifdef __KERNEL__?
> 
> >  /* VF ports management section
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index 43b3298..a2951a0 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -781,6 +781,7 @@ struct netdev_tc_txq {
> >   * int (*ndo_set_vf_mac)(struct net_device *dev, int vf, u8* mac);
> >   * int (*ndo_set_vf_vlan)(struct net_device *dev, int vf, u16 vlan, u8 qos);
> >   * int (*ndo_set_vf_tx_rate)(struct net_device *dev, int vf, int rate);
> > + * int (*ndo_set_vf_spoofchk)(struct net_device *dev, int vf, u8 setting);
> >   * int (*ndo_get_vf_config)(struct net_device *dev,
> >   *			    int vf, struct ifla_vf_info *ivf);
> >   * int (*ndo_set_vf_port)(struct net_device *dev, int vf,
> > @@ -900,6 +901,8 @@ struct net_device_ops {
> >  						   int queue, u16 vlan, u8 qos);
> >  	int			(*ndo_set_vf_tx_rate)(struct net_device *dev,
> >  						      int vf, int rate);
> > +	int			(*ndo_set_vf_spoofchk)(struct net_device *dev,
> > +						       int vf, u8 setting);
> 
> Why u8 and not bool?

This could be a bool, I believe Greg did a int just based on other
similar instances.  Greg will definitely have a better knowledge and
possible future use instances which may justify an int.

> 
> >  	int			(*ndo_get_vf_config)(struct net_device *dev,
> >  						     int vf,
> >  						     struct ifla_vf_info *ivf);
> > diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> > index 39f8dd6..6535810 100644
> > --- a/net/core/rtnetlink.c
> > +++ b/net/core/rtnetlink.c
> > @@ -731,7 +731,8 @@ static inline int rtnl_vfinfo_size(const struct net_device *dev)
> >  		size += num_vfs *
> >  			(nla_total_size(sizeof(struct ifla_vf_mac)) +
> >  			 nla_total_size(sizeof(struct ifla_vf_vlan)) +
> > -			 nla_total_size(sizeof(struct ifla_vf_tx_rate)));
> > +			 nla_total_size(sizeof(struct ifla_vf_tx_rate)) +
> > +			 nla_total_size(sizeof(struct ifla_vf_spoofchk)));
> >  		return size;
> >  	} else
> >  		return 0;
> > @@ -954,13 +955,19 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
> >  			struct ifla_vf_mac vf_mac;
> >  			struct ifla_vf_vlan vf_vlan;
> >  			struct ifla_vf_tx_rate vf_tx_rate;
> > +			struct ifla_vf_spoofchk vf_spoofchk;
> >  			if (dev->netdev_ops->ndo_get_vf_config(dev, i, &ivi))
> >  				break;
> > -			vf_mac.vf = vf_vlan.vf = vf_tx_rate.vf = ivi.vf;
> > +			vf_mac.vf =
> > +			vf_vlan.vf =
> > +			vf_tx_rate.vf =
> > +			vf_spoofchk.vf = ivi.vf;
> > +
> [...]
> 
> The continuation lines should be indented.  Or you could just write the
> assignments as multiple statements.
> 
> Ben.
> 

I agree.  At least Greg and I can clean this up, if nothing else.  I
will wait for Greg's thoughts/comments on your other observations before
re-spinning this patch.

Thanks Ben.



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [net-next 05/10] if_link: Add additional parameter to IFLA_VF_INFO for spoof checking
  2011-09-24  9:17 ` [net-next 05/10] if_link: Add additional parameter to IFLA_VF_INFO for spoof checking Jeff Kirsher
  2011-09-24 16:40   ` Ben Hutchings
@ 2011-09-25 17:22   ` Stephen Hemminger
  2011-09-25 20:06     ` Ben Hutchings
  2011-09-26 16:18     ` Rose, Gregory V
  1 sibling, 2 replies; 30+ messages in thread
From: Stephen Hemminger @ 2011-09-25 17:22 UTC (permalink / raw)
  To: Jeff Kirsher; +Cc: davem, Greg Rose, netdev, gospo

On Sat, 24 Sep 2011 02:17:38 -0700
Jeff Kirsher <jeffrey.t.kirsher@intel.com> wrote:

> From: Greg Rose <gregory.v.rose@intel.com>
> 
> Add configuration setting for drivers to turn spoof checking on or off
> for discrete VFs.
> 
> Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> ---
>  include/linux/if_link.h   |    7 +++++++
>  include/linux/netdevice.h |    3 +++
>  net/core/rtnetlink.c      |   25 ++++++++++++++++++++++---
>  3 files changed, 32 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/if_link.h b/include/linux/if_link.h
> index 0ee969a..8bd6d6d 100644
> --- a/include/linux/if_link.h
> +++ b/include/linux/if_link.h
> @@ -279,6 +279,7 @@ enum {
>  	IFLA_VF_MAC,		/* Hardware queue specific attributes */
>  	IFLA_VF_VLAN,
>  	IFLA_VF_TX_RATE,	/* TX Bandwidth Allocation */
> +	IFLA_VF_SPOOFCHK,	/* Spoof Checking on/off switch */
>  	__IFLA_VF_MAX,
>  };
>  
> @@ -300,12 +301,18 @@ struct ifla_vf_tx_rate {
>  	__u32 rate; /* Max TX bandwidth in Mbps, 0 disables throttling */
>  };
>  
> +struct ifla_vf_spoofchk {
> +	__u32 vf;
> +	__u32 setting;
> +};
> +
>  struct ifla_vf_info {
>  	__u32 vf;
>  	__u8 mac[32];
>  	__u32 vlan;
>  	__u32 qos;
>  	__u32 tx_rate;
> +	__u32 spoofchk;
>  };

This breaks ABI compatibility, unless you add some special case code
to handle the case of tools sending the old ifla_vf_info. Users may have older version
of ip utilities that send smaller size structure.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [net-next 05/10] if_link: Add additional parameter to IFLA_VF_INFO for spoof checking
  2011-09-25 17:22   ` Stephen Hemminger
@ 2011-09-25 20:06     ` Ben Hutchings
  2011-09-26 16:14       ` Stephen Hemminger
  2011-09-26 16:18     ` Rose, Gregory V
  1 sibling, 1 reply; 30+ messages in thread
From: Ben Hutchings @ 2011-09-25 20:06 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Jeff Kirsher, davem, Greg Rose, netdev, gospo

On Sun, 2011-09-25 at 10:22 -0700, Stephen Hemminger wrote:
> On Sat, 24 Sep 2011 02:17:38 -0700
> Jeff Kirsher <jeffrey.t.kirsher@intel.com> wrote:
> 
> > From: Greg Rose <gregory.v.rose@intel.com>
> > 
> > Add configuration setting for drivers to turn spoof checking on or off
> > for discrete VFs.
> > 
> > Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
> > Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> > ---
> >  include/linux/if_link.h   |    7 +++++++
> >  include/linux/netdevice.h |    3 +++
> >  net/core/rtnetlink.c      |   25 ++++++++++++++++++++++---
> >  3 files changed, 32 insertions(+), 3 deletions(-)
> > 
> > diff --git a/include/linux/if_link.h b/include/linux/if_link.h
> > index 0ee969a..8bd6d6d 100644
> > --- a/include/linux/if_link.h
> > +++ b/include/linux/if_link.h
> > @@ -279,6 +279,7 @@ enum {
> >  	IFLA_VF_MAC,		/* Hardware queue specific attributes */
> >  	IFLA_VF_VLAN,
> >  	IFLA_VF_TX_RATE,	/* TX Bandwidth Allocation */
> > +	IFLA_VF_SPOOFCHK,	/* Spoof Checking on/off switch */
> >  	__IFLA_VF_MAX,
> >  };
> >  
> > @@ -300,12 +301,18 @@ struct ifla_vf_tx_rate {
> >  	__u32 rate; /* Max TX bandwidth in Mbps, 0 disables throttling */
> >  };
> >  
> > +struct ifla_vf_spoofchk {
> > +	__u32 vf;
> > +	__u32 setting;
> > +};
> > +
> >  struct ifla_vf_info {
> >  	__u32 vf;
> >  	__u8 mac[32];
> >  	__u32 vlan;
> >  	__u32 qos;
> >  	__u32 tx_rate;
> > +	__u32 spoofchk;
> >  };
> 
> This breaks ABI compatibility, unless you add some special case code
> to handle the case of tools sending the old ifla_vf_info. Users may have older version
> of ip utilities that send smaller size structure.

Unless I'm missing something, that structure is not sent or received by
userland; that's why I thought it should be #ifdef __KERNEL__.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [net-next 02/10] ixgbevf: Fix broken trunk vlan
  2011-09-24 16:33   ` Jesse Gross
  2011-09-25  5:47     ` Jeff Kirsher
@ 2011-09-26 15:57     ` Rose, Gregory V
  2011-09-26 23:08       ` Jesse Gross
  1 sibling, 1 reply; 30+ messages in thread
From: Rose, Gregory V @ 2011-09-26 15:57 UTC (permalink / raw)
  To: Jesse Gross, Kirsher, Jeffrey T; +Cc: davem, netdev, gospo, Jiri Pirko

> -----Original Message-----
> From: Jesse Gross [mailto:jesse@nicira.com]
> Sent: Saturday, September 24, 2011 9:33 AM
> To: Kirsher, Jeffrey T
> Cc: davem@davemloft.net; Rose, Gregory V; netdev@vger.kernel.org;
> gospo@redhat.com; Jiri Pirko
> Subject: Re: [net-next 02/10] ixgbevf: Fix broken trunk vlan
> 
> On Sat, Sep 24, 2011 at 2:17 AM, Jeff Kirsher
> <jeffrey.t.kirsher@intel.com> wrote:
> > From: Greg Rose <gregory.v.rose@intel.com>
> >
> > Changes to clean up the vlan rx path broke trunk vlan.  Trunk vlans in
> > a VF driver are those set using:
> >
> > "ip link set <pfdev> vf <n> <vlanid>"
> >
> > Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
> > CC: Jiri Pirko <jpirko@redhat.com>
> > Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> > ---
> >  drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |    6 ++----
> >  1 files changed, 2 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> > index d72905b..4930c46 100644
> > --- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> > +++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> > @@ -293,12 +293,10 @@ static void ixgbevf_receive_skb(struct
> ixgbevf_q_vector *q_vector,
> >  {
> >        struct ixgbevf_adapter *adapter = q_vector->adapter;
> >        bool is_vlan = (status & IXGBE_RXD_STAT_VP);
> > +       u16 tag = le16_to_cpu(rx_desc->wb.upper.vlan);
> >
> > -       if (is_vlan) {
> > -               u16 tag = le16_to_cpu(rx_desc->wb.upper.vlan);
> > -
> > +       if (is_vlan && test_bit(tag, adapter->active_vlans))
> >                __vlan_hwaccel_put_tag(skb, tag);
> > -       }
> 
> What happens if you run tcpdump without configuring vlan devices?
> Shouldn't you see tagged packets for the vlans that are being trunked
> to you?  I think this will strip tags in that case.  The apparent
> behavior of vlan filters here is also surprising to me because on one
> hand if they're truly filtering this test shouldn't be needed and on
> the other hand they don't seem to be disabled in promiscuous mode.

I think you're not quite understanding the action the HW is taking here.  In the physical function driver we have put the VF on a VLAN without it knowing that it's on a VLAN.  Once that is done by the PF, the VF is not allowed to configure its own VLANs anymore.  However, the descriptor still includes a bit for the VLAN tag indicating it was a packet that arrived on a VLAN.  The HW is inserting and stripping the VLAN tag though without any awareness of that by the VF driver.

It's a security measure to allow an administrator to put a VF on a VLAN to provide another level of isolation for the VF.

Intel VFs don't support promiscuous mode.  If you ran tcpdump you wouldn't see the VLAN tags because they've been stripped by the HW.  The VF has no choice in this.

- Greg

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [net-next 05/10] if_link: Add additional parameter to IFLA_VF_INFO for spoof checking
  2011-09-24 16:40   ` Ben Hutchings
  2011-09-25  5:54     ` Jeff Kirsher
@ 2011-09-26 16:06     ` Rose, Gregory V
  1 sibling, 0 replies; 30+ messages in thread
From: Rose, Gregory V @ 2011-09-26 16:06 UTC (permalink / raw)
  To: Ben Hutchings, Kirsher, Jeffrey T; +Cc: davem, netdev, gospo

> -----Original Message-----
> From: Ben Hutchings [mailto:bhutchings@solarflare.com]
> Sent: Saturday, September 24, 2011 9:41 AM
> To: Kirsher, Jeffrey T
> Cc: davem@davemloft.net; Rose, Gregory V; netdev@vger.kernel.org;
> gospo@redhat.com
> Subject: Re: [net-next 05/10] if_link: Add additional parameter to
> IFLA_VF_INFO for spoof checking
> 
> On Sat, 2011-09-24 at 02:17 -0700, Jeff Kirsher wrote:
> > From: Greg Rose <gregory.v.rose@intel.com>
> >
> > Add configuration setting for drivers to turn spoof checking on or off
> > for discrete VFs.
> >
> > Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
> > Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> > ---
> >  include/linux/if_link.h   |    7 +++++++
> >  include/linux/netdevice.h |    3 +++
> >  net/core/rtnetlink.c      |   25 ++++++++++++++++++++++---
> >  3 files changed, 32 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/linux/if_link.h b/include/linux/if_link.h
> > index 0ee969a..8bd6d6d 100644
> > --- a/include/linux/if_link.h
> > +++ b/include/linux/if_link.h
> > @@ -279,6 +279,7 @@ enum {
> >  	IFLA_VF_MAC,		/* Hardware queue specific attributes */
> >  	IFLA_VF_VLAN,
> >  	IFLA_VF_TX_RATE,	/* TX Bandwidth Allocation */
> > +	IFLA_VF_SPOOFCHK,	/* Spoof Checking on/off switch */
> >  	__IFLA_VF_MAX,
> >  };
> >
> > @@ -300,12 +301,18 @@ struct ifla_vf_tx_rate {
> >  	__u32 rate; /* Max TX bandwidth in Mbps, 0 disables throttling */
> >  };
> >
> > +struct ifla_vf_spoofchk {
> > +	__u32 vf;
> > +	__u32 setting;
> > +};
> > +
> >  struct ifla_vf_info {
> >  	__u32 vf;
> >  	__u8 mac[32];
> >  	__u32 vlan;
> >  	__u32 qos;
> >  	__u32 tx_rate;
> > +	__u32 spoofchk;
> >  };
> 
> Not something you need to change now, but shouldn't this last struct
> definition be #ifdef __KERNEL__?

It's not done for any of the other ifla_vf_* structs in if_link.h.  Which doesn't mean it shouldn't have been done but I was merely following custom.  If it needs doing though then I'd be glad to do it.

> 
> >  /* VF ports management section
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index 43b3298..a2951a0 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -781,6 +781,7 @@ struct netdev_tc_txq {
> >   * int (*ndo_set_vf_mac)(struct net_device *dev, int vf, u8* mac);
> >   * int (*ndo_set_vf_vlan)(struct net_device *dev, int vf, u16 vlan, u8
> qos);
> >   * int (*ndo_set_vf_tx_rate)(struct net_device *dev, int vf, int rate);
> > + * int (*ndo_set_vf_spoofchk)(struct net_device *dev, int vf, u8
> setting);
> >   * int (*ndo_get_vf_config)(struct net_device *dev,
> >   *			    int vf, struct ifla_vf_info *ivf);
> >   * int (*ndo_set_vf_port)(struct net_device *dev, int vf,
> > @@ -900,6 +901,8 @@ struct net_device_ops {
> >  						   int queue, u16 vlan, u8 qos);
> >  	int			(*ndo_set_vf_tx_rate)(struct net_device *dev,
> >  						      int vf, int rate);
> > +	int			(*ndo_set_vf_spoofchk)(struct net_device *dev,
> > +						       int vf, u8 setting);
> 
> Why u8 and not bool?

I didn't see the bool type used in any of the other parameter lists for the net device ops so I hesitated to use it myself.  If it's not a problem doing so then I'll go ahead and change it.


> 
> >  	int			(*ndo_get_vf_config)(struct net_device *dev,
> >  						     int vf,
> >  						     struct ifla_vf_info *ivf);
> > diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> > index 39f8dd6..6535810 100644
> > --- a/net/core/rtnetlink.c
> > +++ b/net/core/rtnetlink.c
> > @@ -731,7 +731,8 @@ static inline int rtnl_vfinfo_size(const struct
> net_device *dev)
> >  		size += num_vfs *
> >  			(nla_total_size(sizeof(struct ifla_vf_mac)) +
> >  			 nla_total_size(sizeof(struct ifla_vf_vlan)) +
> > -			 nla_total_size(sizeof(struct ifla_vf_tx_rate)));
> > +			 nla_total_size(sizeof(struct ifla_vf_tx_rate)) +
> > +			 nla_total_size(sizeof(struct ifla_vf_spoofchk)));
> >  		return size;
> >  	} else
> >  		return 0;
> > @@ -954,13 +955,19 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
> struct net_device *dev,
> >  			struct ifla_vf_mac vf_mac;
> >  			struct ifla_vf_vlan vf_vlan;
> >  			struct ifla_vf_tx_rate vf_tx_rate;
> > +			struct ifla_vf_spoofchk vf_spoofchk;
> >  			if (dev->netdev_ops->ndo_get_vf_config(dev, i, &ivi))
> >  				break;
> > -			vf_mac.vf = vf_vlan.vf = vf_tx_rate.vf = ivi.vf;
> > +			vf_mac.vf =
> > +			vf_vlan.vf =
> > +			vf_tx_rate.vf =
> > +			vf_spoofchk.vf = ivi.vf;
> > +
> [...]
> 
> The continuation lines should be indented.  Or you could just write the
> assignments as multiple statements.

OK, sure.  I'll do that.

Thanks,

- Greg


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [net-next 05/10] if_link: Add additional parameter to IFLA_VF_INFO for spoof checking
  2011-09-25 20:06     ` Ben Hutchings
@ 2011-09-26 16:14       ` Stephen Hemminger
  2011-09-26 16:32         ` Rose, Gregory V
  0 siblings, 1 reply; 30+ messages in thread
From: Stephen Hemminger @ 2011-09-26 16:14 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Jeff Kirsher, davem, Greg Rose, netdev, gospo

On Sun, 25 Sep 2011 21:06:49 +0100
Ben Hutchings <bhutchings@solarflare.com> wrote:

> On Sun, 2011-09-25 at 10:22 -0700, Stephen Hemminger wrote:
> > On Sat, 24 Sep 2011 02:17:38 -0700
> > Jeff Kirsher <jeffrey.t.kirsher@intel.com> wrote:
> > 
> > > From: Greg Rose <gregory.v.rose@intel.com>
> > > 
> > > Add configuration setting for drivers to turn spoof checking on or off
> > > for discrete VFs.
> > > 
> > > Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
> > > Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> > > ---
> > >  include/linux/if_link.h   |    7 +++++++
> > >  include/linux/netdevice.h |    3 +++
> > >  net/core/rtnetlink.c      |   25 ++++++++++++++++++++++---
> > >  3 files changed, 32 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/include/linux/if_link.h b/include/linux/if_link.h
> > > index 0ee969a..8bd6d6d 100644
> > > --- a/include/linux/if_link.h
> > > +++ b/include/linux/if_link.h
> > > @@ -279,6 +279,7 @@ enum {
> > >  	IFLA_VF_MAC,		/* Hardware queue specific attributes */
> > >  	IFLA_VF_VLAN,
> > >  	IFLA_VF_TX_RATE,	/* TX Bandwidth Allocation */
> > > +	IFLA_VF_SPOOFCHK,	/* Spoof Checking on/off switch */
> > >  	__IFLA_VF_MAX,
> > >  };
> > >  
> > > @@ -300,12 +301,18 @@ struct ifla_vf_tx_rate {
> > >  	__u32 rate; /* Max TX bandwidth in Mbps, 0 disables throttling */
> > >  };
> > >  
> > > +struct ifla_vf_spoofchk {
> > > +	__u32 vf;
> > > +	__u32 setting;
> > > +};
> > > +
> > >  struct ifla_vf_info {
> > >  	__u32 vf;
> > >  	__u8 mac[32];
> > >  	__u32 vlan;
> > >  	__u32 qos;
> > >  	__u32 tx_rate;
> > > +	__u32 spoofchk;
> > >  };
> > 
> > This breaks ABI compatibility, unless you add some special case code
> > to handle the case of tools sending the old ifla_vf_info. Users may have older version
> > of ip utilities that send smaller size structure.
> 
> Unless I'm missing something, that structure is not sent or received by
> userland; that's why I thought it should be #ifdef __KERNEL__.

The struct ifla_vf_info is exposed to userland, it is not inside the #ifdef
and therefore exposed.

But it is probably okay to change this structure because the ifla_vf_info
is not used/supported by any released version iproute2.  There may have been
some patches to use this, but they never made it into the git or released
code.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [net-next 05/10] if_link: Add additional parameter to IFLA_VF_INFO for spoof checking
  2011-09-25 17:22   ` Stephen Hemminger
  2011-09-25 20:06     ` Ben Hutchings
@ 2011-09-26 16:18     ` Rose, Gregory V
  2011-09-26 16:21       ` Stephen Hemminger
  1 sibling, 1 reply; 30+ messages in thread
From: Rose, Gregory V @ 2011-09-26 16:18 UTC (permalink / raw)
  To: Stephen Hemminger, Kirsher, Jeffrey T; +Cc: davem, netdev, gospo

> -----Original Message-----
> From: Stephen Hemminger [mailto:shemminger@vyatta.com]
> Sent: Sunday, September 25, 2011 10:23 AM
> To: Kirsher, Jeffrey T
> Cc: davem@davemloft.net; Rose, Gregory V; netdev@vger.kernel.org;
> gospo@redhat.com
> Subject: Re: [net-next 05/10] if_link: Add additional parameter to
> IFLA_VF_INFO for spoof checking
> 
> On Sat, 24 Sep 2011 02:17:38 -0700
> Jeff Kirsher <jeffrey.t.kirsher@intel.com> wrote:
> 
> > From: Greg Rose <gregory.v.rose@intel.com>
> >
> > Add configuration setting for drivers to turn spoof checking on or off
> > for discrete VFs.
> >
> > Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
> > Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> > ---
> >  include/linux/if_link.h   |    7 +++++++
> >  include/linux/netdevice.h |    3 +++
> >  net/core/rtnetlink.c      |   25 ++++++++++++++++++++++---
> >  3 files changed, 32 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/linux/if_link.h b/include/linux/if_link.h
> > index 0ee969a..8bd6d6d 100644
> > --- a/include/linux/if_link.h
> > +++ b/include/linux/if_link.h
> > @@ -279,6 +279,7 @@ enum {
> >  	IFLA_VF_MAC,		/* Hardware queue specific attributes */
> >  	IFLA_VF_VLAN,
> >  	IFLA_VF_TX_RATE,	/* TX Bandwidth Allocation */
> > +	IFLA_VF_SPOOFCHK,	/* Spoof Checking on/off switch */
> >  	__IFLA_VF_MAX,
> >  };
> >
> > @@ -300,12 +301,18 @@ struct ifla_vf_tx_rate {
> >  	__u32 rate; /* Max TX bandwidth in Mbps, 0 disables throttling */
> >  };
> >
> > +struct ifla_vf_spoofchk {
> > +	__u32 vf;
> > +	__u32 setting;
> > +};
> > +
> >  struct ifla_vf_info {
> >  	__u32 vf;
> >  	__u8 mac[32];
> >  	__u32 vlan;
> >  	__u32 qos;
> >  	__u32 tx_rate;
> > +	__u32 spoofchk;
> >  };
> 
> This breaks ABI compatibility, unless you add some special case code
> to handle the case of tools sending the old ifla_vf_info. Users may have
> older version
> of ip utilities that send smaller size structure.

The structure is not sent directly to the kernel from user space.  The kernel will get the information and stuff it into individual data units using NLA_PUT.  If the older tool doesn't ask for the info then that's fine so far as I can tell.

The only issue I've seen is using the new ip tool on older kernels that don't supply the data.  You'll get a segmentation fault and core dump.  However, I was under the impression that the general rule was to use a release of the ip tool only on the kernel it was released for or on newer kernels.

- Greg

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [net-next 05/10] if_link: Add additional parameter to IFLA_VF_INFO for spoof checking
  2011-09-26 16:18     ` Rose, Gregory V
@ 2011-09-26 16:21       ` Stephen Hemminger
  2011-09-26 16:57         ` Rose, Gregory V
  0 siblings, 1 reply; 30+ messages in thread
From: Stephen Hemminger @ 2011-09-26 16:21 UTC (permalink / raw)
  To: Rose, Gregory V; +Cc: Kirsher, Jeffrey T, davem, netdev, gospo

On Mon, 26 Sep 2011 09:18:34 -0700
"Rose, Gregory V" <gregory.v.rose@intel.com> wrote:

> > -----Original Message-----
> > From: Stephen Hemminger [mailto:shemminger@vyatta.com]
> > Sent: Sunday, September 25, 2011 10:23 AM
> > To: Kirsher, Jeffrey T
> > Cc: davem@davemloft.net; Rose, Gregory V; netdev@vger.kernel.org;
> > gospo@redhat.com
> > Subject: Re: [net-next 05/10] if_link: Add additional parameter to
> > IFLA_VF_INFO for spoof checking
> > 
> > On Sat, 24 Sep 2011 02:17:38 -0700
> > Jeff Kirsher <jeffrey.t.kirsher@intel.com> wrote:
> > 
> > > From: Greg Rose <gregory.v.rose@intel.com>
> > >
> > > Add configuration setting for drivers to turn spoof checking on or off
> > > for discrete VFs.
> > >
> > > Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
> > > Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> > > ---
> > >  include/linux/if_link.h   |    7 +++++++
> > >  include/linux/netdevice.h |    3 +++
> > >  net/core/rtnetlink.c      |   25 ++++++++++++++++++++++---
> > >  3 files changed, 32 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/include/linux/if_link.h b/include/linux/if_link.h
> > > index 0ee969a..8bd6d6d 100644
> > > --- a/include/linux/if_link.h
> > > +++ b/include/linux/if_link.h
> > > @@ -279,6 +279,7 @@ enum {
> > >  	IFLA_VF_MAC,		/* Hardware queue specific attributes */
> > >  	IFLA_VF_VLAN,
> > >  	IFLA_VF_TX_RATE,	/* TX Bandwidth Allocation */
> > > +	IFLA_VF_SPOOFCHK,	/* Spoof Checking on/off switch */
> > >  	__IFLA_VF_MAX,
> > >  };
> > >
> > > @@ -300,12 +301,18 @@ struct ifla_vf_tx_rate {
> > >  	__u32 rate; /* Max TX bandwidth in Mbps, 0 disables throttling */
> > >  };
> > >
> > > +struct ifla_vf_spoofchk {
> > > +	__u32 vf;
> > > +	__u32 setting;
> > > +};
> > > +
> > >  struct ifla_vf_info {
> > >  	__u32 vf;
> > >  	__u8 mac[32];
> > >  	__u32 vlan;
> > >  	__u32 qos;
> > >  	__u32 tx_rate;
> > > +	__u32 spoofchk;
> > >  };
> > 
> > This breaks ABI compatibility, unless you add some special case code
> > to handle the case of tools sending the old ifla_vf_info. Users may have
> > older version
> > of ip utilities that send smaller size structure.
> 
> The structure is not sent directly to the kernel from user space.  The kernel will get the information and stuff it into individual data units using NLA_PUT.  If the older tool doesn't ask for the info then that's fine so far as I can tell.
> 
> The only issue I've seen is using the new ip tool on older kernels that don't supply the data.  You'll get a segmentation fault and core dump.  However, I was under the impression that the general rule was to use a release of the ip tool only on the kernel it was released for or on newer kernels.
> 
> - Greg

The tools need to run on older kernels. Think of Debian and other distributions which want to
ship newer ip tools but run on old kernel.  In this case, what is expected is:
# ip li some new option
RTNETLINK: Invalid ...

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [net-next 05/10] if_link: Add additional parameter to IFLA_VF_INFO for spoof checking
  2011-09-26 16:14       ` Stephen Hemminger
@ 2011-09-26 16:32         ` Rose, Gregory V
  2011-09-26 16:37           ` Stephen Hemminger
  0 siblings, 1 reply; 30+ messages in thread
From: Rose, Gregory V @ 2011-09-26 16:32 UTC (permalink / raw)
  To: Stephen Hemminger, Ben Hutchings; +Cc: Kirsher, Jeffrey T, davem, netdev, gospo

> -----Original Message-----
> From: Stephen Hemminger [mailto:shemminger@vyatta.com]
> Sent: Monday, September 26, 2011 9:14 AM
> To: Ben Hutchings
> Cc: Kirsher, Jeffrey T; davem@davemloft.net; Rose, Gregory V;
> netdev@vger.kernel.org; gospo@redhat.com
> Subject: Re: [net-next 05/10] if_link: Add additional parameter to
> IFLA_VF_INFO for spoof checking
> 
> On Sun, 25 Sep 2011 21:06:49 +0100
> Ben Hutchings <bhutchings@solarflare.com> wrote:
> 
> > On Sun, 2011-09-25 at 10:22 -0700, Stephen Hemminger wrote:
> > > On Sat, 24 Sep 2011 02:17:38 -0700
> > > Jeff Kirsher <jeffrey.t.kirsher@intel.com> wrote:
> > >
> > > > From: Greg Rose <gregory.v.rose@intel.com>
> > > >
> > > > Add configuration setting for drivers to turn spoof checking on or
> off
> > > > for discrete VFs.
> > > >
> > > > Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
> > > > Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> > > > ---
> > > >  include/linux/if_link.h   |    7 +++++++
> > > >  include/linux/netdevice.h |    3 +++
> > > >  net/core/rtnetlink.c      |   25 ++++++++++++++++++++++---
> > > >  3 files changed, 32 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/include/linux/if_link.h b/include/linux/if_link.h
> > > > index 0ee969a..8bd6d6d 100644
> > > > --- a/include/linux/if_link.h
> > > > +++ b/include/linux/if_link.h
> > > > @@ -279,6 +279,7 @@ enum {
> > > >  	IFLA_VF_MAC,		/* Hardware queue specific attributes
> */
> > > >  	IFLA_VF_VLAN,
> > > >  	IFLA_VF_TX_RATE,	/* TX Bandwidth Allocation */
> > > > +	IFLA_VF_SPOOFCHK,	/* Spoof Checking on/off switch */
> > > >  	__IFLA_VF_MAX,
> > > >  };
> > > >
> > > > @@ -300,12 +301,18 @@ struct ifla_vf_tx_rate {
> > > >  	__u32 rate; /* Max TX bandwidth in Mbps, 0 disables throttling
> */
> > > >  };
> > > >
> > > > +struct ifla_vf_spoofchk {
> > > > +	__u32 vf;
> > > > +	__u32 setting;
> > > > +};
> > > > +
> > > >  struct ifla_vf_info {
> > > >  	__u32 vf;
> > > >  	__u8 mac[32];
> > > >  	__u32 vlan;
> > > >  	__u32 qos;
> > > >  	__u32 tx_rate;
> > > > +	__u32 spoofchk;
> > > >  };
> > >
> > > This breaks ABI compatibility, unless you add some special case code
> > > to handle the case of tools sending the old ifla_vf_info. Users may
> have older version
> > > of ip utilities that send smaller size structure.
> >
> > Unless I'm missing something, that structure is not sent or received by
> > userland; that's why I thought it should be #ifdef __KERNEL__.
> 
> The struct ifla_vf_info is exposed to userland, it is not inside the
> #ifdef
> and therefore exposed.
> 
> But it is probably okay to change this structure because the ifla_vf_info
> is not used/supported by any released version iproute2.  There may have
> been
> some patches to use this, but they never made it into the git or released
> code.
> 

I will respin the patches with *ifdef __KERNEL__ wrapper.

Thanks,

- Greg

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [net-next 05/10] if_link: Add additional parameter to IFLA_VF_INFO for spoof checking
  2011-09-26 16:32         ` Rose, Gregory V
@ 2011-09-26 16:37           ` Stephen Hemminger
  0 siblings, 0 replies; 30+ messages in thread
From: Stephen Hemminger @ 2011-09-26 16:37 UTC (permalink / raw)
  To: Rose, Gregory V; +Cc: Ben Hutchings, Kirsher, Jeffrey T, davem, netdev, gospo

On Mon, 26 Sep 2011 09:32:35 -0700
"Rose, Gregory V" <gregory.v.rose@intel.com> wrote:

> > -----Original Message-----
> > From: Stephen Hemminger [mailto:shemminger@vyatta.com]
> > Sent: Monday, September 26, 2011 9:14 AM
> > To: Ben Hutchings
> > Cc: Kirsher, Jeffrey T; davem@davemloft.net; Rose, Gregory V;
> > netdev@vger.kernel.org; gospo@redhat.com
> > Subject: Re: [net-next 05/10] if_link: Add additional parameter to
> > IFLA_VF_INFO for spoof checking
> > 
> > On Sun, 25 Sep 2011 21:06:49 +0100
> > Ben Hutchings <bhutchings@solarflare.com> wrote:
> > 
> > > On Sun, 2011-09-25 at 10:22 -0700, Stephen Hemminger wrote:
> > > > On Sat, 24 Sep 2011 02:17:38 -0700
> > > > Jeff Kirsher <jeffrey.t.kirsher@intel.com> wrote:
> > > >
> > > > > From: Greg Rose <gregory.v.rose@intel.com>
> > > > >
> > > > > Add configuration setting for drivers to turn spoof checking on or
> > off
> > > > > for discrete VFs.
> > > > >
> > > > > Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
> > > > > Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> > > > > ---
> > > > >  include/linux/if_link.h   |    7 +++++++
> > > > >  include/linux/netdevice.h |    3 +++
> > > > >  net/core/rtnetlink.c      |   25 ++++++++++++++++++++++---
> > > > >  3 files changed, 32 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/include/linux/if_link.h b/include/linux/if_link.h
> > > > > index 0ee969a..8bd6d6d 100644
> > > > > --- a/include/linux/if_link.h
> > > > > +++ b/include/linux/if_link.h
> > > > > @@ -279,6 +279,7 @@ enum {
> > > > >  	IFLA_VF_MAC,		/* Hardware queue specific attributes
> > */
> > > > >  	IFLA_VF_VLAN,
> > > > >  	IFLA_VF_TX_RATE,	/* TX Bandwidth Allocation */
> > > > > +	IFLA_VF_SPOOFCHK,	/* Spoof Checking on/off switch */
> > > > >  	__IFLA_VF_MAX,
> > > > >  };
> > > > >
> > > > > @@ -300,12 +301,18 @@ struct ifla_vf_tx_rate {
> > > > >  	__u32 rate; /* Max TX bandwidth in Mbps, 0 disables throttling
> > */
> > > > >  };
> > > > >
> > > > > +struct ifla_vf_spoofchk {
> > > > > +	__u32 vf;
> > > > > +	__u32 setting;
> > > > > +};
> > > > > +
> > > > >  struct ifla_vf_info {
> > > > >  	__u32 vf;
> > > > >  	__u8 mac[32];
> > > > >  	__u32 vlan;
> > > > >  	__u32 qos;
> > > > >  	__u32 tx_rate;
> > > > > +	__u32 spoofchk;
> > > > >  };
> > > >
> > > > This breaks ABI compatibility, unless you add some special case code
> > > > to handle the case of tools sending the old ifla_vf_info. Users may
> > have older version
> > > > of ip utilities that send smaller size structure.
> > >
> > > Unless I'm missing something, that structure is not sent or received by
> > > userland; that's why I thought it should be #ifdef __KERNEL__.
> > 
> > The struct ifla_vf_info is exposed to userland, it is not inside the
> > #ifdef
> > and therefore exposed.
> > 
> > But it is probably okay to change this structure because the ifla_vf_info
> > is not used/supported by any released version iproute2.  There may have
> > been
> > some patches to use this, but they never made it into the git or released
> > code.
> > 
> 
> I will respin the patches with *ifdef __KERNEL__ wrapper.
> 
> Thanks,
> 
> - Greg
> 

Just make spoofchk a new nested attribute.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [net-next 05/10] if_link: Add additional parameter to IFLA_VF_INFO for spoof checking
  2011-09-26 16:21       ` Stephen Hemminger
@ 2011-09-26 16:57         ` Rose, Gregory V
  0 siblings, 0 replies; 30+ messages in thread
From: Rose, Gregory V @ 2011-09-26 16:57 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Kirsher, Jeffrey T, davem, netdev, gospo

> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
> On Behalf Of Stephen Hemminger
> Sent: Monday, September 26, 2011 9:21 AM
> To: Rose, Gregory V
> Cc: Kirsher, Jeffrey T; davem@davemloft.net; netdev@vger.kernel.org;
> gospo@redhat.com
> Subject: Re: [net-next 05/10] if_link: Add additional parameter to
> IFLA_VF_INFO for spoof checking
> 
> On Mon, 26 Sep 2011 09:18:34 -0700
> "Rose, Gregory V" <gregory.v.rose@intel.com> wrote:
> 
> > > -----Original Message-----
> > > From: Stephen Hemminger [mailto:shemminger@vyatta.com]
> > > Sent: Sunday, September 25, 2011 10:23 AM
> > > To: Kirsher, Jeffrey T
> > > Cc: davem@davemloft.net; Rose, Gregory V; netdev@vger.kernel.org;
> > > gospo@redhat.com
> > > Subject: Re: [net-next 05/10] if_link: Add additional parameter to
> > > IFLA_VF_INFO for spoof checking
> > >
> > > On Sat, 24 Sep 2011 02:17:38 -0700
> > > Jeff Kirsher <jeffrey.t.kirsher@intel.com> wrote:
> > >
> > > > From: Greg Rose <gregory.v.rose@intel.com>
> > > >
> > > > Add configuration setting for drivers to turn spoof checking on or
> off
> > > > for discrete VFs.
> > > >
> > > > Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
> > > > Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> > > > ---
> > > >  include/linux/if_link.h   |    7 +++++++
> > > >  include/linux/netdevice.h |    3 +++
> > > >  net/core/rtnetlink.c      |   25 ++++++++++++++++++++++---
> > > >  3 files changed, 32 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/include/linux/if_link.h b/include/linux/if_link.h
> > > > index 0ee969a..8bd6d6d 100644
> > > > --- a/include/linux/if_link.h
> > > > +++ b/include/linux/if_link.h
> > > > @@ -279,6 +279,7 @@ enum {
> > > >  	IFLA_VF_MAC,		/* Hardware queue specific attributes
> */
> > > >  	IFLA_VF_VLAN,
> > > >  	IFLA_VF_TX_RATE,	/* TX Bandwidth Allocation */
> > > > +	IFLA_VF_SPOOFCHK,	/* Spoof Checking on/off switch */
> > > >  	__IFLA_VF_MAX,
> > > >  };
> > > >
> > > > @@ -300,12 +301,18 @@ struct ifla_vf_tx_rate {
> > > >  	__u32 rate; /* Max TX bandwidth in Mbps, 0 disables throttling
> */
> > > >  };
> > > >
> > > > +struct ifla_vf_spoofchk {
> > > > +	__u32 vf;
> > > > +	__u32 setting;
> > > > +};
> > > > +
> > > >  struct ifla_vf_info {
> > > >  	__u32 vf;
> > > >  	__u8 mac[32];
> > > >  	__u32 vlan;
> > > >  	__u32 qos;
> > > >  	__u32 tx_rate;
> > > > +	__u32 spoofchk;
> > > >  };
> > >
> > > This breaks ABI compatibility, unless you add some special case code
> > > to handle the case of tools sending the old ifla_vf_info. Users may
> have
> > > older version
> > > of ip utilities that send smaller size structure.
> >
> > The structure is not sent directly to the kernel from user space.  The
> kernel will get the information and stuff it into individual data units
> using NLA_PUT.  If the older tool doesn't ask for the info then that's
> fine so far as I can tell.
> >
> > The only issue I've seen is using the new ip tool on older kernels that
> don't supply the data.  You'll get a segmentation fault and core dump.
> However, I was under the impression that the general rule was to use a
> release of the ip tool only on the kernel it was released for or on newer
> kernels.
> >
> > - Greg
> 
> The tools need to run on older kernels. Think of Debian and other
> distributions which want to
> ship newer ip tools but run on old kernel.  In this case, what is expected
> is:
> # ip li some new option
> RTNETLINK: Invalid ...

That is what happens when you set the option on older kernels that don't support it.  In the do_setvfinfo() function in ../net/core/rtnetlink.c the default return value is EINVAL.  If the case for IFLA_VF_SPOOFCHK doesn't exist then it falls through and returns with the error value.

It's when you do the ip link show <dev> command on an older kernel that doesn't report the spoof check value that you get the segmentation fault in the ip tool.  I'll look at that and see what I can do to fix that up and respin the patch.

- Greg

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [net-next 02/10] ixgbevf: Fix broken trunk vlan
  2011-09-26 15:57     ` Rose, Gregory V
@ 2011-09-26 23:08       ` Jesse Gross
  2011-09-26 23:24         ` Rose, Gregory V
  0 siblings, 1 reply; 30+ messages in thread
From: Jesse Gross @ 2011-09-26 23:08 UTC (permalink / raw)
  To: Rose, Gregory V; +Cc: Kirsher, Jeffrey T, davem, netdev, gospo, Jiri Pirko

On Mon, Sep 26, 2011 at 8:57 AM, Rose, Gregory V
<gregory.v.rose@intel.com> wrote:
>> -----Original Message-----
>> From: Jesse Gross [mailto:jesse@nicira.com]
>> Sent: Saturday, September 24, 2011 9:33 AM
>> To: Kirsher, Jeffrey T
>> Cc: davem@davemloft.net; Rose, Gregory V; netdev@vger.kernel.org;
>> gospo@redhat.com; Jiri Pirko
>> Subject: Re: [net-next 02/10] ixgbevf: Fix broken trunk vlan
>>
>> On Sat, Sep 24, 2011 at 2:17 AM, Jeff Kirsher
>> <jeffrey.t.kirsher@intel.com> wrote:
>> > From: Greg Rose <gregory.v.rose@intel.com>
>> >
>> > Changes to clean up the vlan rx path broke trunk vlan.  Trunk vlans in
>> > a VF driver are those set using:
>> >
>> > "ip link set <pfdev> vf <n> <vlanid>"
>> >
>> > Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
>> > CC: Jiri Pirko <jpirko@redhat.com>
>> > Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
>> > ---
>> >  drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |    6 ++----
>> >  1 files changed, 2 insertions(+), 4 deletions(-)
>> >
>> > diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
>> b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
>> > index d72905b..4930c46 100644
>> > --- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
>> > +++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
>> > @@ -293,12 +293,10 @@ static void ixgbevf_receive_skb(struct
>> ixgbevf_q_vector *q_vector,
>> >  {
>> >        struct ixgbevf_adapter *adapter = q_vector->adapter;
>> >        bool is_vlan = (status & IXGBE_RXD_STAT_VP);
>> > +       u16 tag = le16_to_cpu(rx_desc->wb.upper.vlan);
>> >
>> > -       if (is_vlan) {
>> > -               u16 tag = le16_to_cpu(rx_desc->wb.upper.vlan);
>> > -
>> > +       if (is_vlan && test_bit(tag, adapter->active_vlans))
>> >                __vlan_hwaccel_put_tag(skb, tag);
>> > -       }
>>
>> What happens if you run tcpdump without configuring vlan devices?
>> Shouldn't you see tagged packets for the vlans that are being trunked
>> to you?  I think this will strip tags in that case.  The apparent
>> behavior of vlan filters here is also surprising to me because on one
>> hand if they're truly filtering this test shouldn't be needed and on
>> the other hand they don't seem to be disabled in promiscuous mode.
>
> I think you're not quite understanding the action the HW is taking here.  In the physical function driver we have put the VF on a VLAN without it knowing that it's on a VLAN.  Once that is done by the PF, the VF is not allowed to configure its own VLANs anymore.  However, the descriptor still includes a bit for the VLAN tag indicating it was a packet that arrived on a VLAN.  The HW is inserting and stripping the VLAN tag though without any awareness of that by the VF driver.
>
> It's a security measure to allow an administrator to put a VF on a VLAN to provide another level of isolation for the VF.
>
> Intel VFs don't support promiscuous mode.  If you ran tcpdump you wouldn't see the VLAN tags because they've been stripped by the HW.  The VF has no choice in this.

I understand that VFs are limited in what they can do.  What I'm
concerned about are differences in how the driver behaves when a vlan
device is configured on it vs. not.  There are two cases that I think
you will see different behavior:

* The VF driver has it's own vlan filters.  I'm assuming (correct me
if I'm wrong) that this will further restrict the vlans that the guest
sees from the set that have been configured on the PF driver.  If that
is correct, then putting the VF interface in promiscuous mode should
disable the filters and allow packets from all vlans that the PF
allows.  This situation is the same as if an upstream physical switch
is trunking only a subset of vlans to an end host and the NIC is put
in promiscuous mode.  However, it doesn't look like the VF driver does
anything in response to being in promiscuous mode.  This shouldn't
require any hardware support because it is just clearing the filter
table.
* It looks like the VF driver is receiving not just an indication that
a tag was present but the actual tag and although it was already
stripped by the NIC it can be passed up to the network stack just as
if it was stripped in a non-SR-IOV NIC.  After this patch, if you run
tcpdump on the VF interface you will see packets with tags if a vlan
device is configured but packets without tags if there is no vlan
device.

Does the VF know whether it is assigned to a particular vlan or is
allowed to send/receive tagged packets (essentially the difference
between an access port or a trunk port)?  Or is that why the presence
of vlan device is used as an indication?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [net-next 02/10] ixgbevf: Fix broken trunk vlan
  2011-09-26 23:08       ` Jesse Gross
@ 2011-09-26 23:24         ` Rose, Gregory V
  2011-09-27  0:54           ` Jesse Gross
  0 siblings, 1 reply; 30+ messages in thread
From: Rose, Gregory V @ 2011-09-26 23:24 UTC (permalink / raw)
  To: Jesse Gross; +Cc: Kirsher, Jeffrey T, davem, netdev, gospo, Jiri Pirko

> -----Original Message-----
> From: Jesse Gross [mailto:jesse@nicira.com]
> Sent: Monday, September 26, 2011 4:09 PM
> To: Rose, Gregory V
> Cc: Kirsher, Jeffrey T; davem@davemloft.net; netdev@vger.kernel.org;
> gospo@redhat.com; Jiri Pirko
> Subject: Re: [net-next 02/10] ixgbevf: Fix broken trunk vlan
> 
> On Mon, Sep 26, 2011 at 8:57 AM, Rose, Gregory V
> <gregory.v.rose@intel.com> wrote:
> >> -----Original Message-----
> >> From: Jesse Gross [mailto:jesse@nicira.com]
> >> Sent: Saturday, September 24, 2011 9:33 AM
> >> To: Kirsher, Jeffrey T
> >> Cc: davem@davemloft.net; Rose, Gregory V; netdev@vger.kernel.org;
> >> gospo@redhat.com; Jiri Pirko
> >> Subject: Re: [net-next 02/10] ixgbevf: Fix broken trunk vlan
> >>
> >> What happens if you run tcpdump without configuring vlan devices?
> >> Shouldn't you see tagged packets for the vlans that are being trunked
> >> to you?  I think this will strip tags in that case.  The apparent
> >> behavior of vlan filters here is also surprising to me because on one
> >> hand if they're truly filtering this test shouldn't be needed and on
> >> the other hand they don't seem to be disabled in promiscuous mode.
> >
> > I think you're not quite understanding the action the HW is taking here.
>  In the physical function driver we have put the VF on a VLAN without it
> knowing that it's on a VLAN.  Once that is done by the PF, the VF is not
> allowed to configure its own VLANs anymore.  However, the descriptor still
> includes a bit for the VLAN tag indicating it was a packet that arrived on
> a VLAN.  The HW is inserting and stripping the VLAN tag though without any
> awareness of that by the VF driver.
> >
> > It's a security measure to allow an administrator to put a VF on a VLAN
> to provide another level of isolation for the VF.
> >
> > Intel VFs don't support promiscuous mode.  If you ran tcpdump you
> wouldn't see the VLAN tags because they've been stripped by the HW.  The
> VF has no choice in this.
> 
> I understand that VFs are limited in what they can do.  What I'm
> concerned about are differences in how the driver behaves when a vlan
> device is configured on it vs. not.  There are two cases that I think
> you will see different behavior:
> 
> * The VF driver has it's own vlan filters.  I'm assuming (correct me
> if I'm wrong) that this will further restrict the vlans that the guest
> sees from the set that have been configured on the PF driver.  If that
> is correct, then putting the VF interface in promiscuous mode should
> disable the filters and allow packets from all vlans that the PF
> allows.  This situation is the same as if an upstream physical switch
> is trunking only a subset of vlans to an end host and the NIC is put
> in promiscuous mode.  However, it doesn't look like the VF driver does
> anything in response to being in promiscuous mode.  This shouldn't
> require any hardware support because it is just clearing the filter
> table.

The VF does not have its own VLAN filters.  It must request them through the PF via the mailbox messaging feature in the NIC.  When the PF puts the VF in trunk VLAN mode via the 'ip link set <dev> vf <n> vlan <vlanid>' command then the PF also flags that VF and will not take VLAN filter requests from it.  In effect, the VF is forced to a VLAN and has no choice.  So it cannot further restrict what VLANs the guest sees.  And the VF can only be on a single trunk VLAN, it does not support multiple trunk VLANs.

And again, the VF doesn’t support promiscuous mode.  The VF device HW doesn't have that feature.

> * It looks like the VF driver is receiving not just an indication that
> a tag was present but the actual tag and although it was already
> stripped by the NIC it can be passed up to the network stack just as
> if it was stripped in a non-SR-IOV NIC.  After this patch, if you run
> tcpdump on the VF interface you will see packets with tags if a vlan
> device is configured but packets without tags if there is no vlan
> device.
> 

No.  In trunk VLAN mode the receive descriptor provided by the HW has a VLAN tag present bit and that is set but unless someone has used vconfig to configure a VLAN and a bit is set for that VLAN in the active_vlans bit table then the packet is sent up the stack without the tag.  Upper layers don't have access to the driver's internal descriptor rings and have no idea that the VLAN tag present bit was set in the descriptor.  Upper layer SW in the network stack won't be looking for VLAN tags since it never requested any VLANs.  And since the VLAN tag was stripped, and the upper layers haven't ever set a VLAN filter, then there is no expectation of a VLAN tag.

> Does the VF know whether it is assigned to a particular vlan or is
> allowed to send/receive tagged packets (essentially the difference
> between an access port or a trunk port)?

Yes, it knows if it has requested a VLAN filter via the active_vlans bit table.  If no VLANs are set in the active_vlans table then it knows to ignore the VLAN tag present bit set in the descriptor since it must be in trunk VLAN mode.

>  Or is that why the presence
> of vlan device is used as an indication?

Correct, that is the case.

- Greg


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [net-next 02/10] ixgbevf: Fix broken trunk vlan
  2011-09-26 23:24         ` Rose, Gregory V
@ 2011-09-27  0:54           ` Jesse Gross
  2011-09-27 16:39             ` Rose, Gregory V
  0 siblings, 1 reply; 30+ messages in thread
From: Jesse Gross @ 2011-09-27  0:54 UTC (permalink / raw)
  To: Rose, Gregory V; +Cc: Kirsher, Jeffrey T, davem, netdev, gospo, Jiri Pirko

On Mon, Sep 26, 2011 at 4:24 PM, Rose, Gregory V
<gregory.v.rose@intel.com> wrote:
>> -----Original Message-----
>> From: Jesse Gross [mailto:jesse@nicira.com]
>> Sent: Monday, September 26, 2011 4:09 PM
>> To: Rose, Gregory V
>> Cc: Kirsher, Jeffrey T; davem@davemloft.net; netdev@vger.kernel.org;
>> gospo@redhat.com; Jiri Pirko
>> Subject: Re: [net-next 02/10] ixgbevf: Fix broken trunk vlan
>>
>> On Mon, Sep 26, 2011 at 8:57 AM, Rose, Gregory V
>> <gregory.v.rose@intel.com> wrote:
>> >> -----Original Message-----
>> >> From: Jesse Gross [mailto:jesse@nicira.com]
>> >> Sent: Saturday, September 24, 2011 9:33 AM
>> >> To: Kirsher, Jeffrey T
>> >> Cc: davem@davemloft.net; Rose, Gregory V; netdev@vger.kernel.org;
>> >> gospo@redhat.com; Jiri Pirko
>> >> Subject: Re: [net-next 02/10] ixgbevf: Fix broken trunk vlan
>> >>
>> >> What happens if you run tcpdump without configuring vlan devices?
>> >> Shouldn't you see tagged packets for the vlans that are being trunked
>> >> to you?  I think this will strip tags in that case.  The apparent
>> >> behavior of vlan filters here is also surprising to me because on one
>> >> hand if they're truly filtering this test shouldn't be needed and on
>> >> the other hand they don't seem to be disabled in promiscuous mode.
>> >
>> > I think you're not quite understanding the action the HW is taking here.
>>  In the physical function driver we have put the VF on a VLAN without it
>> knowing that it's on a VLAN.  Once that is done by the PF, the VF is not
>> allowed to configure its own VLANs anymore.  However, the descriptor still
>> includes a bit for the VLAN tag indicating it was a packet that arrived on
>> a VLAN.  The HW is inserting and stripping the VLAN tag though without any
>> awareness of that by the VF driver.
>> >
>> > It's a security measure to allow an administrator to put a VF on a VLAN
>> to provide another level of isolation for the VF.
>> >
>> > Intel VFs don't support promiscuous mode.  If you ran tcpdump you
>> wouldn't see the VLAN tags because they've been stripped by the HW.  The
>> VF has no choice in this.
>>
>> I understand that VFs are limited in what they can do.  What I'm
>> concerned about are differences in how the driver behaves when a vlan
>> device is configured on it vs. not.  There are two cases that I think
>> you will see different behavior:
>>
>> * The VF driver has it's own vlan filters.  I'm assuming (correct me
>> if I'm wrong) that this will further restrict the vlans that the guest
>> sees from the set that have been configured on the PF driver.  If that
>> is correct, then putting the VF interface in promiscuous mode should
>> disable the filters and allow packets from all vlans that the PF
>> allows.  This situation is the same as if an upstream physical switch
>> is trunking only a subset of vlans to an end host and the NIC is put
>> in promiscuous mode.  However, it doesn't look like the VF driver does
>> anything in response to being in promiscuous mode.  This shouldn't
>> require any hardware support because it is just clearing the filter
>> table.
>
> The VF does not have its own VLAN filters.  It must request them through the PF via the mailbox messaging feature in the NIC.  When the PF puts the VF in trunk VLAN mode via the 'ip link set <dev> vf <n> vlan <vlanid>' command then the PF also flags that VF and will not take VLAN filter requests from it.  In effect, the VF is forced to a VLAN and has no choice.  So it cannot further restrict what VLANs the guest sees.  And the VF can only be on a single trunk VLAN, it does not support multiple trunk VLANs.
>
> And again, the VF doesn’t support promiscuous mode.  The VF device HW doesn't have that feature.
>
>> * It looks like the VF driver is receiving not just an indication that
>> a tag was present but the actual tag and although it was already
>> stripped by the NIC it can be passed up to the network stack just as
>> if it was stripped in a non-SR-IOV NIC.  After this patch, if you run
>> tcpdump on the VF interface you will see packets with tags if a vlan
>> device is configured but packets without tags if there is no vlan
>> device.
>>
>
> No.  In trunk VLAN mode the receive descriptor provided by the HW has a VLAN tag present bit and that is set but unless someone has used vconfig to configure a VLAN and a bit is set for that VLAN in the active_vlans bit table then the packet is sent up the stack without the tag.  Upper layers don't have access to the driver's internal descriptor rings and have no idea that the VLAN tag present bit was set in the descriptor.  Upper layer SW in the network stack won't be looking for VLAN tags since it never requested any VLANs.  And since the VLAN tag was stripped, and the upper layers haven't ever set a VLAN filter, then there is no expectation of a VLAN tag.
>
>> Does the VF know whether it is assigned to a particular vlan or is
>> allowed to send/receive tagged packets (essentially the difference
>> between an access port or a trunk port)?
>
> Yes, it knows if it has requested a VLAN filter via the active_vlans bit table.  If no VLANs are set in the active_vlans table then it knows to ignore the VLAN tag present bit set in the descriptor since it must be in trunk VLAN mode.
>
>>  Or is that why the presence
>> of vlan device is used as an indication?
>
> Correct, that is the case.

OK, maybe due to hardware limitations what I'm looking for just really
isn't possible.  However, what I'm trying to emphasize is that vconfig
is not the only way that vlans can be consumed by the network stack
and active_vlans is just an indication of whether a vlan filter was
set, nothing more (perhaps I should have picked a better name when I
originally designed this stuff).  In particular, it is not intended to
determine whether a tag should be stripped off or not because
non-vconfig users don't necessarily know which vlans they care about
(think tcpdump or trunking over a bridge).  A major goal of the
existing vlan infrastructure is to avoid having drivers make
assumptions about the consumer of the tag and instead just hand all
information over to the network stack so it can behave in a consistent
manner.  That's why I was looking for alternate ways to get this
information without depending on active_vlans as this driver behaves
quite a bit differently from others, include the ixgbe PF driver.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [net-next 02/10] ixgbevf: Fix broken trunk vlan
  2011-09-27  0:54           ` Jesse Gross
@ 2011-09-27 16:39             ` Rose, Gregory V
  2011-09-27 16:49               ` Jesse Gross
  0 siblings, 1 reply; 30+ messages in thread
From: Rose, Gregory V @ 2011-09-27 16:39 UTC (permalink / raw)
  To: Jesse Gross; +Cc: Kirsher, Jeffrey T, davem, netdev, gospo, Jiri Pirko

> -----Original Message-----
> From: Jesse Gross [mailto:jesse@nicira.com]
> Sent: Monday, September 26, 2011 5:54 PM
> To: Rose, Gregory V
> Cc: Kirsher, Jeffrey T; davem@davemloft.net; netdev@vger.kernel.org;
> gospo@redhat.com; Jiri Pirko
> Subject: Re: [net-next 02/10] ixgbevf: Fix broken trunk vlan
> 
> 
> OK, maybe due to hardware limitations what I'm looking for just really
> isn't possible.

From what I can tell that is the case.

>  However, what I'm trying to emphasize is that vconfig
> is not the only way that vlans can be consumed by the network stack
> and active_vlans is just an indication of whether a vlan filter was
> set, nothing more (perhaps I should have picked a better name when I
> originally designed this stuff).

Understood.  The VF is incapable of receiving VLAN traffic unless a filter has been set.  It doesn't do promiscuous mode and any path that doesn't actually end up setting a VLAN filter won't receive any VLAN traffic.

>  In particular, it is not intended to
> determine whether a tag should be stripped off or not because
> non-vconfig users don't necessarily know which vlans they care about
> (think tcpdump or trunking over a bridge).  A major goal of the
> existing vlan infrastructure is to avoid having drivers make
> assumptions about the consumer of the tag and instead just hand all
> information over to the network stack so it can behave in a consistent
> manner.  That's why I was looking for alternate ways to get this
> information without depending on active_vlans as this driver behaves
> quite a bit differently from others, include the ixgbe PF driver.

There are only two paths for the ixgbevf driver to receive VLAN traffic.  Either a VLAN filter has been set, which will result in a bit tag in active_vlans being set or the system administrator in the host VMM has put the VF device in trunk VLAN mode using the 'ip link set <dev> vf <n> <vlanid>' path.  There is no other way for it to receive VLAN traffic, so I think we're fine.  And keep in mind, once the system admin has put the VF device in trunk vlan mode, the VF is no longer allowed, as a policy implemented in the PF driver, to set any other VLAN filters.  Even if it did, it wouldn't work due to the operational characteristics of the VF device HW.

The methods by which the ixgbe driver, as a fully featured Physical Function device, are quite a bit more varied.  I believe you're thinking of the VF device as a typical Ethernet device and that is just not the case.

- Greg

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [net-next 02/10] ixgbevf: Fix broken trunk vlan
  2011-09-27 16:39             ` Rose, Gregory V
@ 2011-09-27 16:49               ` Jesse Gross
  0 siblings, 0 replies; 30+ messages in thread
From: Jesse Gross @ 2011-09-27 16:49 UTC (permalink / raw)
  To: Rose, Gregory V; +Cc: Kirsher, Jeffrey T, davem, netdev, gospo, Jiri Pirko

On Tue, Sep 27, 2011 at 9:39 AM, Rose, Gregory V
<gregory.v.rose@intel.com> wrote:
>> -----Original Message-----
>> From: Jesse Gross [mailto:jesse@nicira.com]
>> Sent: Monday, September 26, 2011 5:54 PM
>> To: Rose, Gregory V
>> Cc: Kirsher, Jeffrey T; davem@davemloft.net; netdev@vger.kernel.org;
>> gospo@redhat.com; Jiri Pirko
>> Subject: Re: [net-next 02/10] ixgbevf: Fix broken trunk vlan
>>
>>
>> OK, maybe due to hardware limitations what I'm looking for just really
>> isn't possible.
>
> From what I can tell that is the case.
>
>>  However, what I'm trying to emphasize is that vconfig
>> is not the only way that vlans can be consumed by the network stack
>> and active_vlans is just an indication of whether a vlan filter was
>> set, nothing more (perhaps I should have picked a better name when I
>> originally designed this stuff).
>
> Understood.  The VF is incapable of receiving VLAN traffic unless a filter has been set.  It doesn't do promiscuous mode and any path that doesn't actually end up setting a VLAN filter won't receive any VLAN traffic.
>
>>  In particular, it is not intended to
>> determine whether a tag should be stripped off or not because
>> non-vconfig users don't necessarily know which vlans they care about
>> (think tcpdump or trunking over a bridge).  A major goal of the
>> existing vlan infrastructure is to avoid having drivers make
>> assumptions about the consumer of the tag and instead just hand all
>> information over to the network stack so it can behave in a consistent
>> manner.  That's why I was looking for alternate ways to get this
>> information without depending on active_vlans as this driver behaves
>> quite a bit differently from others, include the ixgbe PF driver.
>
> There are only two paths for the ixgbevf driver to receive VLAN traffic.  Either a VLAN filter has been set, which will result in a bit tag in active_vlans being set or the system administrator in the host VMM has put the VF device in trunk VLAN mode using the 'ip link set <dev> vf <n> <vlanid>' path.  There is no other way for it to receive VLAN traffic, so I think we're fine.  And keep in mind, once the system admin has put the VF device in trunk vlan mode, the VF is no longer allowed, as a policy implemented in the PF driver, to set any other VLAN filters.  Even if it did, it wouldn't work due to the operational characteristics of the VF device HW.
>
> The methods by which the ixgbe driver, as a fully featured Physical Function device, are quite a bit more varied.  I believe you're thinking of the VF device as a typical Ethernet device and that is just not the case.

Yes, I think you're right.  Thanks for the explanation.

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2011-09-27 16:49 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-09-24  9:17 [net-next 00/10][pull request] Intel Wired LAN Drivers Update Jeff Kirsher
2011-09-24  9:17 ` [net-next 01/10] e1000: don't enable dma receives until after dma address has been setup Jeff Kirsher
2011-09-24  9:17 ` [net-next 02/10] ixgbevf: Fix broken trunk vlan Jeff Kirsher
2011-09-24 16:33   ` Jesse Gross
2011-09-25  5:47     ` Jeff Kirsher
2011-09-26 15:57     ` Rose, Gregory V
2011-09-26 23:08       ` Jesse Gross
2011-09-26 23:24         ` Rose, Gregory V
2011-09-27  0:54           ` Jesse Gross
2011-09-27 16:39             ` Rose, Gregory V
2011-09-27 16:49               ` Jesse Gross
2011-09-24  9:17 ` [net-next 03/10] ixgbe: Cleanup q_vector interrupt throttle rate logic Jeff Kirsher
2011-09-24  9:17 ` [net-next 04/10] ixgbe: disable LLI for FCoE Jeff Kirsher
2011-09-24  9:17 ` [net-next 05/10] if_link: Add additional parameter to IFLA_VF_INFO for spoof checking Jeff Kirsher
2011-09-24 16:40   ` Ben Hutchings
2011-09-25  5:54     ` Jeff Kirsher
2011-09-26 16:06     ` Rose, Gregory V
2011-09-25 17:22   ` Stephen Hemminger
2011-09-25 20:06     ` Ben Hutchings
2011-09-26 16:14       ` Stephen Hemminger
2011-09-26 16:32         ` Rose, Gregory V
2011-09-26 16:37           ` Stephen Hemminger
2011-09-26 16:18     ` Rose, Gregory V
2011-09-26 16:21       ` Stephen Hemminger
2011-09-26 16:57         ` Rose, Gregory V
2011-09-24  9:17 ` [net-next 06/10] ixgbe: Add new netdev op to turn spoof checking on or off per VF Jeff Kirsher
2011-09-24  9:17 ` [net-next 07/10] ixgbe: update {P}FC thresholds to account for X540 and loopback Jeff Kirsher
2011-09-24  9:17 ` [net-next 08/10] ixgbe add thermal sensor support for x540 hardware Jeff Kirsher
2011-09-24  9:17 ` [net-next 09/10] ixgbe: cleanup ixgbe_setup_gpie() for X540 Jeff Kirsher
2011-09-24  9:17 ` [net-next 10/10] ixgbe: add ECC warning for legacy interrupts Jeff Kirsher

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.