linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sasha Levin <Alexander.Levin@microsoft.com>
To: "stable@vger.kernel.org" <stable@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Cc: Jacob Keller <jacob.e.keller@intel.com>,
	Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>,
	Jeff Kirsher <jeffrey.t.kirsher@intel.com>,
	Sasha Levin <Alexander.Levin@microsoft.com>
Subject: [PATCH AUTOSEL 4.18 28/56] ice: Report stats for allocated queues via ethtool stats
Date: Thu, 20 Sep 2018 02:47:46 +0000	[thread overview]
Message-ID: <20180920024716.58490-28-alexander.levin@microsoft.com> (raw)
In-Reply-To: <20180920024716.58490-1-alexander.levin@microsoft.com>

From: Jacob Keller <jacob.e.keller@intel.com>

[ Upstream commit f8ba7db850350319348b6d3c276f8ba19bc098ef ]

It is not safe to have the string table for statistics change order or
size over the lifetime of a given netdevice. This is because of the
nature of the 3-step process for obtaining stats. First, user space
performs a request for the size of the strings table. Second it performs
a separate request for the strings themselves, after allocating space
for the table. Third, it requests the stats themselves, also allocating
space for the table.

If the size decreased, there is potential to see garbage data or stats
values. In the worst case, we could potentially see stats values become
mis-aligned with their strings, so that it looks like a statistic is
being reported differently than it actually is.

Even worse, if the size increased, there is potential that the strings
table or stats table was not allocated large enough and the stats code
could access and write to memory it should not, potentially resulting in
undefined behavior and system crashes.

It isn't even safe if the size always changes under the RTNL lock. This
is because the calls take place over multiple user space commands, so it
is not possible to hold the RTNL lock for the entire duration of
obtaining strings and stats. Further, not all consumers of the ethtool
API are the user space ethtool program, and it is possible that one
assumes the strings will not change (valid under the current contract),
and thus only requests the stats values when requesting stats in a loop.

Finally, it's not possible in the general case to detect when the size
changes, because it is quite possible that one value which could impact
the stat size increased, while another decreased. This would result in
the same total number of stats, but reordering them so that stats no
longer line up with the strings they belong to. Since only size changes
aren't enough, we would need some sort of hash or token to determine
when the strings no longer match. This would require extending the
ethtool stats commands, but there is no more space in the relevant
structures.

The real solution to resolve this would be to add a completely new API
for stats, probably over netlink.

In the ice driver, the only thing impacting the stats that is not
constant is the number of queues. Instead of reporting stats for each
used queue, report stats for each allocated queue. We do not change the
number of queues allocated for a given netdevice, as we pass this into
the alloc_etherdev_mq() function to set the num_tx_queues and
num_rx_queues.

This resolves the potential bugs at the slight cost of displaying many
queue statistics which will not be activated.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
---
 drivers/net/ethernet/intel/ice/ice.h         |  7 +++
 drivers/net/ethernet/intel/ice/ice_ethtool.c | 52 +++++++++++++++-----
 2 files changed, 46 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index d8b5fff581e7..ed071ea75f20 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -89,6 +89,13 @@ extern const char ice_drv_ver[];
 #define ice_for_each_rxq(vsi, i) \
 	for ((i) = 0; (i) < (vsi)->num_rxq; (i)++)
 
+/* Macros for each allocated tx/rx ring whether used or not in a VSI */
+#define ice_for_each_alloc_txq(vsi, i) \
+	for ((i) = 0; (i) < (vsi)->alloc_txq; (i)++)
+
+#define ice_for_each_alloc_rxq(vsi, i) \
+	for ((i) = 0; (i) < (vsi)->alloc_rxq; (i)++)
+
 struct ice_tc_info {
 	u16 qoffset;
 	u16 qcount;
diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool.c b/drivers/net/ethernet/intel/ice/ice_ethtool.c
index 1db304c01d10..c71a9b528d6d 100644
--- a/drivers/net/ethernet/intel/ice/ice_ethtool.c
+++ b/drivers/net/ethernet/intel/ice/ice_ethtool.c
@@ -26,7 +26,7 @@ static int ice_q_stats_len(struct net_device *netdev)
 {
 	struct ice_netdev_priv *np = netdev_priv(netdev);
 
-	return ((np->vsi->num_txq + np->vsi->num_rxq) *
+	return ((np->vsi->alloc_txq + np->vsi->alloc_rxq) *
 		(sizeof(struct ice_q_stats) / sizeof(u64)));
 }
 
@@ -218,7 +218,7 @@ static void ice_get_strings(struct net_device *netdev, u32 stringset, u8 *data)
 			p += ETH_GSTRING_LEN;
 		}
 
-		ice_for_each_txq(vsi, i) {
+		ice_for_each_alloc_txq(vsi, i) {
 			snprintf(p, ETH_GSTRING_LEN,
 				 "tx-queue-%u.tx_packets", i);
 			p += ETH_GSTRING_LEN;
@@ -226,7 +226,7 @@ static void ice_get_strings(struct net_device *netdev, u32 stringset, u8 *data)
 			p += ETH_GSTRING_LEN;
 		}
 
-		ice_for_each_rxq(vsi, i) {
+		ice_for_each_alloc_rxq(vsi, i) {
 			snprintf(p, ETH_GSTRING_LEN,
 				 "rx-queue-%u.rx_packets", i);
 			p += ETH_GSTRING_LEN;
@@ -253,6 +253,24 @@ static int ice_get_sset_count(struct net_device *netdev, int sset)
 {
 	switch (sset) {
 	case ETH_SS_STATS:
+		/* The number (and order) of strings reported *must* remain
+		 * constant for a given netdevice. This function must not
+		 * report a different number based on run time parameters
+		 * (such as the number of queues in use, or the setting of
+		 * a private ethtool flag). This is due to the nature of the
+		 * ethtool stats API.
+		 *
+		 * User space programs such as ethtool must make 3 separate
+		 * ioctl requests, one for size, one for the strings, and
+		 * finally one for the stats. Since these cross into
+		 * user space, changes to the number or size could result in
+		 * undefined memory access or incorrect string<->value
+		 * correlations for statistics.
+		 *
+		 * Even if it appears to be safe, changes to the size or
+		 * order of strings will suffer from race conditions and are
+		 * not safe.
+		 */
 		return ICE_ALL_STATS_LEN(netdev);
 	default:
 		return -EOPNOTSUPP;
@@ -280,18 +298,26 @@ ice_get_ethtool_stats(struct net_device *netdev,
 	/* populate per queue stats */
 	rcu_read_lock();
 
-	ice_for_each_txq(vsi, j) {
+	ice_for_each_alloc_txq(vsi, j) {
 		ring = READ_ONCE(vsi->tx_rings[j]);
-		if (!ring)
-			continue;
-		data[i++] = ring->stats.pkts;
-		data[i++] = ring->stats.bytes;
+		if (ring) {
+			data[i++] = ring->stats.pkts;
+			data[i++] = ring->stats.bytes;
+		} else {
+			data[i++] = 0;
+			data[i++] = 0;
+		}
 	}
 
-	ice_for_each_rxq(vsi, j) {
+	ice_for_each_alloc_rxq(vsi, j) {
 		ring = READ_ONCE(vsi->rx_rings[j]);
-		data[i++] = ring->stats.pkts;
-		data[i++] = ring->stats.bytes;
+		if (ring) {
+			data[i++] = ring->stats.pkts;
+			data[i++] = ring->stats.bytes;
+		} else {
+			data[i++] = 0;
+			data[i++] = 0;
+		}
 	}
 
 	rcu_read_unlock();
@@ -519,7 +545,7 @@ ice_set_ringparam(struct net_device *netdev, struct ethtool_ringparam *ring)
 		goto done;
 	}
 
-	for (i = 0; i < vsi->num_txq; i++) {
+	for (i = 0; i < vsi->alloc_txq; i++) {
 		/* clone ring and setup updated count */
 		tx_rings[i] = *vsi->tx_rings[i];
 		tx_rings[i].count = new_tx_cnt;
@@ -551,7 +577,7 @@ ice_set_ringparam(struct net_device *netdev, struct ethtool_ringparam *ring)
 		goto done;
 	}
 
-	for (i = 0; i < vsi->num_rxq; i++) {
+	for (i = 0; i < vsi->alloc_rxq; i++) {
 		/* clone ring and setup updated count */
 		rx_rings[i] = *vsi->rx_rings[i];
 		rx_rings[i].count = new_rx_cnt;
-- 
2.17.1

  parent reply	other threads:[~2018-09-20  2:48 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-20  2:47 [PATCH AUTOSEL 4.18 01/56] ARM: OMAP2+: Fix null hwmod for ti-sysc debug Sasha Levin
2018-09-20  2:47 ` [PATCH AUTOSEL 4.18 02/56] ARM: OMAP2+: Fix module address for modules using mpu_rt_idx Sasha Levin
2018-09-20  2:47 ` [PATCH AUTOSEL 4.18 03/56] bus: ti-sysc: Fix module register ioremap for larger offsets Sasha Levin
2018-09-20  2:47 ` [PATCH AUTOSEL 4.18 04/56] qed: Wait for ready indication before rereading the shmem Sasha Levin
2018-09-20  2:47 ` [PATCH AUTOSEL 4.18 05/56] qed: Wait for MCP halt and resume commands to take place Sasha Levin
2018-09-20  2:47 ` [PATCH AUTOSEL 4.18 07/56] qed: Avoid sending mailbox commands when MFW is not responsive Sasha Levin
2018-09-20  2:47 ` [PATCH AUTOSEL 4.18 06/56] qed: Prevent a possible deadlock during driver load and unload Sasha Levin
2018-09-20  2:47 ` [PATCH AUTOSEL 4.18 08/56] thermal: of-thermal: disable passive polling when thermal zone is disabled Sasha Levin
2018-09-20  2:47 ` [PATCH AUTOSEL 4.18 09/56] isofs: reject hardware sector size > 2048 bytes Sasha Levin
2018-09-20  2:47 ` [PATCH AUTOSEL 4.18 10/56] mmc: atmel-mci: fix bad logic of sg_copy_{from,to}_buffer conversion Sasha Levin
2018-09-20  2:47 ` [PATCH AUTOSEL 4.18 11/56] mmc: android-goldfish: " Sasha Levin
2018-09-20  2:47 ` [PATCH AUTOSEL 4.18 12/56] bus: ti-sysc: Fix no_console_suspend handling Sasha Levin
2018-09-20  2:47 ` [PATCH AUTOSEL 4.18 13/56] ARM: dts: omap4-droid4: fix vibrations on Droid 4 Sasha Levin
2018-09-20  2:47 ` [PATCH AUTOSEL 4.18 14/56] bpf, sockmap: fix sock_hash_alloc and reject zero-sized keys Sasha Levin
2018-09-20  2:47 ` [PATCH AUTOSEL 4.18 15/56] bpf, sockmap: fix sock hash count in alloc_sock_hash_elem Sasha Levin
2018-09-20  2:47 ` [PATCH AUTOSEL 4.18 16/56] tls: possible hang when do_tcp_sendpages hits sndbuf is full case Sasha Levin
2018-09-20  2:47 ` [PATCH AUTOSEL 4.18 17/56] bpf: sockmap: write_space events need to be passed to TCP handler Sasha Levin
2018-09-20  2:47 ` [PATCH AUTOSEL 4.18 18/56] drm/amdgpu: fix VM clearing for the root PD Sasha Levin
2018-09-20  2:47 ` [PATCH AUTOSEL 4.18 19/56] drm/amdgpu: fix preamble handling Sasha Levin
2018-09-20  2:47 ` [PATCH AUTOSEL 4.18 21/56] net/ncsi: Fixup .dumpit message flags and ID check in Netlink handler Sasha Levin
2018-09-20  2:47 ` [PATCH AUTOSEL 4.18 20/56] amdgpu: fix multi-process hang issue Sasha Levin
2018-09-20  2:47 ` [PATCH AUTOSEL 4.18 22/56] tcp_bbr: add bbr_check_probe_rtt_done() helper Sasha Levin
2018-09-20  2:47 ` [PATCH AUTOSEL 4.18 24/56] net: hns: fix length and page_offset overflow when CONFIG_ARM64_64K_PAGES Sasha Levin
2018-09-20  2:47 ` [PATCH AUTOSEL 4.18 23/56] tcp_bbr: in restart from idle, see if we should exit PROBE_RTT Sasha Levin
2018-09-20  2:47 ` [PATCH AUTOSEL 4.18 25/56] net: hns: fix skb->truesize underestimation Sasha Levin
2018-09-20  2:47 ` [PATCH AUTOSEL 4.18 26/56] net: hns3: fix page_offset overflow when CONFIG_ARM64_64K_PAGES Sasha Levin
2018-09-20  2:47 ` [PATCH AUTOSEL 4.18 27/56] ice: Fix multiple static analyser warnings Sasha Levin
2018-09-20  2:47 ` Sasha Levin [this message]
2018-09-20  2:47 ` [PATCH AUTOSEL 4.18 29/56] ice: Clean control queues only when they are initialized Sasha Levin
2018-09-20  2:47 ` [PATCH AUTOSEL 4.18 30/56] ice: Fix bugs in control queue processing Sasha Levin
2018-09-20  2:47 ` [PATCH AUTOSEL 4.18 31/56] ice: Use order_base_2 to calculate higher power of 2 Sasha Levin
2018-09-20  2:47 ` [PATCH AUTOSEL 4.18 32/56] ice: Set VLAN flags correctly Sasha Levin
2018-09-20  2:47 ` [PATCH AUTOSEL 4.18 33/56] tools: bpftool: return from do_event_pipe() on bad arguments Sasha Levin
2018-09-20  2:47 ` [PATCH AUTOSEL 4.18 34/56] ice: Fix a few null pointer dereference issues Sasha Levin
2018-09-20  2:47 ` [PATCH AUTOSEL 4.18 36/56] e1000: check on netif_running() before calling e1000_up() Sasha Levin
2018-09-20  2:47 ` [PATCH AUTOSEL 4.18 35/56] ice: Fix potential return of uninitialized value Sasha Levin
2018-09-20  2:47 ` [PATCH AUTOSEL 4.18 38/56] ixgbe: fix driver behaviour after issuing VFLR Sasha Levin
2018-09-20  2:47 ` [PATCH AUTOSEL 4.18 37/56] e1000: ensure to free old tx/rx rings in set_ringparam() Sasha Levin
2018-09-20  2:47 ` [PATCH AUTOSEL 4.18 39/56] i40e: Fix for Tx timeouts when interface is brought up if DCB is enabled Sasha Levin
2018-09-20  2:47 ` [PATCH AUTOSEL 4.18 40/56] i40e: fix condition of WARN_ONCE for stat strings Sasha Levin
2018-09-20  2:47 ` [PATCH AUTOSEL 4.18 41/56] crypto: chtls - fix null dereference chtls_free_uld() Sasha Levin
2018-09-20  2:47 ` [PATCH AUTOSEL 4.18 42/56] crypto: cavium/nitrox - fix for command corruption in queue full case with backlog submissions Sasha Levin
2018-09-20  2:47 ` [PATCH AUTOSEL 4.18 43/56] hwmon: (ina2xx) fix sysfs shunt resistor read access Sasha Levin
2018-09-20  2:47 ` [PATCH AUTOSEL 4.18 44/56] hwmon: (adt7475) Make adt7475_read_word() return errors Sasha Levin
2018-09-20  2:47 ` [PATCH AUTOSEL 4.18 45/56] Revert "ARM: dts: imx7d: Invert legacy PCI irq mapping" Sasha Levin
2018-09-20  2:48 ` [PATCH AUTOSEL 4.18 47/56] drm/amdgpu: Update power state at the end of smu hw_init Sasha Levin
2018-09-20  2:48 ` [PATCH AUTOSEL 4.18 46/56] drm/amdgpu: Enable/disable gfx PG feature in rlc safe mode Sasha Levin
2018-09-20  2:48 ` [PATCH AUTOSEL 4.18 48/56] ata: ftide010: Add a quirk for SQ201 Sasha Levin
2018-09-20  2:48 ` [PATCH AUTOSEL 4.18 49/56] nvme-fcloop: Fix dropped LS's to removed target port Sasha Levin
2018-09-20  2:48 ` [PATCH AUTOSEL 4.18 51/56] drm/amdgpu: Need to set moved to true when evict bo Sasha Levin
2018-09-20  2:48 ` [PATCH AUTOSEL 4.18 50/56] ARM: dts: omap4-droid4: Fix emmc errors seen on some devices Sasha Levin
2018-09-20  2:48 ` [PATCH AUTOSEL 4.18 52/56] arm/arm64: smccc-1.1: Make return values unsigned long Sasha Levin
2018-09-20  2:48 ` [PATCH AUTOSEL 4.18 54/56] i2c: i801: Allow ACPI AML access I/O ports not reserved for SMBus Sasha Levin
2018-09-20  2:48 ` [PATCH AUTOSEL 4.18 53/56] arm/arm64: smccc-1.1: Handle function result as parameters Sasha Levin
2018-09-20  2:48 ` [PATCH AUTOSEL 4.18 56/56] x86/pti: Fix section mismatch warning/error Sasha Levin
2018-09-20  2:48 ` [PATCH AUTOSEL 4.18 55/56] clk: x86: Set default parent to 48Mhz Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180920024716.58490-28-alexander.levin@microsoft.com \
    --to=alexander.levin@microsoft.com \
    --cc=anirudh.venkataramanan@intel.com \
    --cc=jacob.e.keller@intel.com \
    --cc=jeffrey.t.kirsher@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).