All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net 0/4] net-sysfs: fix race conditions in the xps code
@ 2020-12-17 16:25 Antoine Tenart
  2020-12-17 16:25 ` [PATCH net 1/4] net-sysfs: take the rtnl lock when storing xps_cpus Antoine Tenart
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Antoine Tenart @ 2020-12-17 16:25 UTC (permalink / raw)
  To: davem, kuba; +Cc: Antoine Tenart, netdev, pabeni

Hello all,

This series fixes two similar issues, one with xps_cpus and one with
xps_rxqs, where a race condition can occur between accesses to the
xps_cpus and xps_rxqs maps and the update of dev->num_tc. Those races
lead to accesses outside of the map and oops being triggered. An
explanation is given in each of the commit logs.

Thanks!
Antoine

Antoine Tenart (4):
  net-sysfs: take the rtnl lock when storing xps_cpus
  net-sysfs: take the rtnl lock when accessing xps_cpus_map and num_tc
  net-sysfs: take the rtnl lock when storing xps_rxqs
  net-sysfs: take the rtnl lock when accessing xps_rxqs_map and num_tc

 net/core/net-sysfs.c | 65 ++++++++++++++++++++++++++++++++++++--------
 1 file changed, 53 insertions(+), 12 deletions(-)

-- 
2.29.2


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH net 1/4] net-sysfs: take the rtnl lock when storing xps_cpus
  2020-12-17 16:25 [PATCH net 0/4] net-sysfs: fix race conditions in the xps code Antoine Tenart
@ 2020-12-17 16:25 ` Antoine Tenart
  2020-12-18 15:27     ` kernel test robot
  2020-12-19  0:30   ` Jakub Kicinski
  2020-12-17 16:25 ` [PATCH net 2/4] net-sysfs: take the rtnl lock when accessing xps_cpus_map and num_tc Antoine Tenart
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 14+ messages in thread
From: Antoine Tenart @ 2020-12-17 16:25 UTC (permalink / raw)
  To: davem, kuba; +Cc: Antoine Tenart, netdev, pabeni

Callers to netif_set_xps_queue should take the rtnl lock. Failing to do
so can lead to race conditions between netdev_set_num_tc and
netif_set_xps_queue, triggering various oops:

- netif_set_xps_queue uses dev->tc_num as one of the parameters to
  compute the size of new_dev_maps when allocating it. dev->tc_num is
  also used to access the map, and the compiler may generate code to
  retrieve this field multiple times in the function.

- netdev_set_num_tc sets dev->tc_num.

If new_dev_maps is allocated using dev->tc_num and then dev->tc_num is
set to a higher value through netdev_set_num_tc, later accesses to
new_dev_maps in netif_set_xps_queue could lead to accessing memory
outside of new_dev_maps; triggering an oops.

One way of triggering this is to set an iface up (for which the driver
uses netdev_set_num_tc in the open path, such as bnx2x) and writing to
xps_cpus in a concurrent thread. With the right timing an oops is
triggered.

Fixes: 184c449f91fe ("net: Add support for XPS with QoS via traffic classes")
Signed-off-by: Antoine Tenart <atenart@kernel.org>
---
 net/core/net-sysfs.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 999b70c59761..7cc15dec1717 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1396,7 +1396,13 @@ static ssize_t xps_cpus_store(struct netdev_queue *queue,
 		return err;
 	}
 
+	if (!rtnl_trylock()) {
+		free_cpumask_var(mask);
+		return restart_syscall();
+	}
+
 	err = netif_set_xps_queue(dev, mask, index);
+	rtnl_unlock();
 
 	free_cpumask_var(mask);
 
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH net 2/4] net-sysfs: take the rtnl lock when accessing xps_cpus_map and num_tc
  2020-12-17 16:25 [PATCH net 0/4] net-sysfs: fix race conditions in the xps code Antoine Tenart
  2020-12-17 16:25 ` [PATCH net 1/4] net-sysfs: take the rtnl lock when storing xps_cpus Antoine Tenart
@ 2020-12-17 16:25 ` Antoine Tenart
  2020-12-17 16:25 ` [PATCH net 3/4] net-sysfs: take the rtnl lock when storing xps_rxqs Antoine Tenart
  2020-12-17 16:25 ` [PATCH net 4/4] net-sysfs: take the rtnl lock when accessing xps_rxqs_map and num_tc Antoine Tenart
  3 siblings, 0 replies; 14+ messages in thread
From: Antoine Tenart @ 2020-12-17 16:25 UTC (permalink / raw)
  To: davem, kuba; +Cc: Antoine Tenart, netdev, pabeni

Accesses to dev->xps_cpus_map (when using dev->num_tc) should be
protected by the rtnl lock, like we do for netif_set_xps_queue. I didn't
see an actual bug being triggered, but let's be safe here and take the
rtnl lock while accessing the map in sysfs.

Fixes: 184c449f91fe ("net: Add support for XPS with QoS via traffic classes")
Signed-off-by: Antoine Tenart <atenart@kernel.org>
---
 net/core/net-sysfs.c | 29 ++++++++++++++++++++++-------
 1 file changed, 22 insertions(+), 7 deletions(-)

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 7cc15dec1717..65886bfbf822 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1317,8 +1317,8 @@ static const struct attribute_group dql_group = {
 static ssize_t xps_cpus_show(struct netdev_queue *queue,
 			     char *buf)
 {
+	int cpu, len, ret, num_tc = 1, tc = 0;
 	struct net_device *dev = queue->dev;
-	int cpu, len, num_tc = 1, tc = 0;
 	struct xps_dev_maps *dev_maps;
 	cpumask_var_t mask;
 	unsigned long index;
@@ -1328,22 +1328,31 @@ static ssize_t xps_cpus_show(struct netdev_queue *queue,
 
 	index = get_netdev_queue_index(queue);
 
+	if (!rtnl_trylock())
+		return restart_syscall();
+
 	if (dev->num_tc) {
 		/* Do not allow XPS on subordinate device directly */
 		num_tc = dev->num_tc;
-		if (num_tc < 0)
-			return -EINVAL;
+		if (num_tc < 0) {
+			ret = -EINVAL;
+			goto err_rtnl_unlock;
+		}
 
 		/* If queue belongs to subordinate dev use its map */
 		dev = netdev_get_tx_queue(dev, index)->sb_dev ? : dev;
 
 		tc = netdev_txq_to_tc(dev, index);
-		if (tc < 0)
-			return -EINVAL;
+		if (tc < 0) {
+			ret = -EINVAL;
+			goto err_rtnl_unlock;
+		}
 	}
 
-	if (!zalloc_cpumask_var(&mask, GFP_KERNEL))
-		return -ENOMEM;
+	if (!zalloc_cpumask_var(&mask, GFP_KERNEL)) {
+		ret = -ENOMEM;
+		goto err_rtnl_unlock;
+	}
 
 	rcu_read_lock();
 	dev_maps = rcu_dereference(dev->xps_cpus_map);
@@ -1366,9 +1375,15 @@ static ssize_t xps_cpus_show(struct netdev_queue *queue,
 	}
 	rcu_read_unlock();
 
+	rtnl_unlock();
+
 	len = snprintf(buf, PAGE_SIZE, "%*pb\n", cpumask_pr_args(mask));
 	free_cpumask_var(mask);
 	return len < PAGE_SIZE ? len : -EINVAL;
+
+err_rtnl_unlock:
+	rtnl_unlock();
+	return ret;
 }
 
 static ssize_t xps_cpus_store(struct netdev_queue *queue,
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH net 3/4] net-sysfs: take the rtnl lock when storing xps_rxqs
  2020-12-17 16:25 [PATCH net 0/4] net-sysfs: fix race conditions in the xps code Antoine Tenart
  2020-12-17 16:25 ` [PATCH net 1/4] net-sysfs: take the rtnl lock when storing xps_cpus Antoine Tenart
  2020-12-17 16:25 ` [PATCH net 2/4] net-sysfs: take the rtnl lock when accessing xps_cpus_map and num_tc Antoine Tenart
@ 2020-12-17 16:25 ` Antoine Tenart
  2020-12-17 16:25 ` [PATCH net 4/4] net-sysfs: take the rtnl lock when accessing xps_rxqs_map and num_tc Antoine Tenart
  3 siblings, 0 replies; 14+ messages in thread
From: Antoine Tenart @ 2020-12-17 16:25 UTC (permalink / raw)
  To: davem, kuba; +Cc: Antoine Tenart, netdev, pabeni

Callers to __netif_set_xps_queue should take the rtnl lock. Failing to
do so can lead to race conditions between netdev_set_num_tc and
__netif_set_xps_queue, triggering various oops:

- __netif_set_xps_queue uses dev->tc_num as one of the parameters to
  compute the size of new_dev_maps when allocating it. dev->tc_num is
  also used to access the map, and the compiler may generate code to
  retrieve this field multiple times in the function.

- netdev_set_num_tc sets dev->tc_num.

If new_dev_maps is allocated using dev->tc_num and then dev->tc_num is
set to a higher value through netdev_set_num_tc, later accesses to
new_dev_maps in __netif_set_xps_queue could lead to accessing memory
outside of new_dev_maps; triggering an oops.

One way of triggering this is to set an iface up (for which the driver
uses netdev_set_num_tc in the open path, such as bnx2x) and writing to
xps_rxqs in a concurrent thread. With the right timing an oops is
triggered.

Fixes: 8af2c06ff4b1 ("net-sysfs: Add interface for Rx queue(s) map per Tx queue")
Signed-off-by: Antoine Tenart <atenart@kernel.org>
---
 net/core/net-sysfs.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 65886bfbf822..62ca2f2c0ee6 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1499,10 +1499,17 @@ static ssize_t xps_rxqs_store(struct netdev_queue *queue, const char *buf,
 		return err;
 	}
 
+	if (!rtnl_trylock()) {
+		bitmap_free(mask);
+		return restart_syscall();
+	}
+
 	cpus_read_lock();
 	err = __netif_set_xps_queue(dev, mask, index, true);
 	cpus_read_unlock();
 
+	rtnl_unlock();
+
 	bitmap_free(mask);
 	return err ? : len;
 }
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH net 4/4] net-sysfs: take the rtnl lock when accessing xps_rxqs_map and num_tc
  2020-12-17 16:25 [PATCH net 0/4] net-sysfs: fix race conditions in the xps code Antoine Tenart
                   ` (2 preceding siblings ...)
  2020-12-17 16:25 ` [PATCH net 3/4] net-sysfs: take the rtnl lock when storing xps_rxqs Antoine Tenart
@ 2020-12-17 16:25 ` Antoine Tenart
  3 siblings, 0 replies; 14+ messages in thread
From: Antoine Tenart @ 2020-12-17 16:25 UTC (permalink / raw)
  To: davem, kuba; +Cc: Antoine Tenart, netdev, pabeni

Accesses to dev->xps_rxqs_map (when using dev->num_tc) should be
protected by the rtnl lock, like we do for netif_set_xps_queue. I didn't
see an actual bug being triggered, but let's be safe here and take the
rtnl lock while accessing the map in sysfs.

Fixes: 8af2c06ff4b1 ("net-sysfs: Add interface for Rx queue(s) map per Tx queue")
Signed-off-by: Antoine Tenart <atenart@kernel.org>
---
 net/core/net-sysfs.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 62ca2f2c0ee6..daf502c13d6d 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1429,22 +1429,29 @@ static struct netdev_queue_attribute xps_cpus_attribute __ro_after_init
 
 static ssize_t xps_rxqs_show(struct netdev_queue *queue, char *buf)
 {
+	int j, len, ret, num_tc = 1, tc = 0;
 	struct net_device *dev = queue->dev;
 	struct xps_dev_maps *dev_maps;
 	unsigned long *mask, index;
-	int j, len, num_tc = 1, tc = 0;
 
 	index = get_netdev_queue_index(queue);
 
+	if (!rtnl_trylock())
+		return restart_syscall();
+
 	if (dev->num_tc) {
 		num_tc = dev->num_tc;
 		tc = netdev_txq_to_tc(dev, index);
-		if (tc < 0)
-			return -EINVAL;
+		if (tc < 0) {
+			ret = -EINVAL;
+			goto err_rtnl_unlock;
+		}
 	}
 	mask = bitmap_zalloc(dev->num_rx_queues, GFP_KERNEL);
-	if (!mask)
-		return -ENOMEM;
+	if (!mask) {
+		ret = -ENOMEM;
+		goto err_rtnl_unlock;
+	}
 
 	rcu_read_lock();
 	dev_maps = rcu_dereference(dev->xps_rxqs_map);
@@ -1470,10 +1477,16 @@ static ssize_t xps_rxqs_show(struct netdev_queue *queue, char *buf)
 out_no_maps:
 	rcu_read_unlock();
 
+	rtnl_unlock();
+
 	len = bitmap_print_to_pagebuf(false, buf, mask, dev->num_rx_queues);
 	bitmap_free(mask);
 
 	return len < PAGE_SIZE ? len : -EINVAL;
+
+err_rtnl_unlock:
+	rtnl_unlock();
+	return ret;
 }
 
 static ssize_t xps_rxqs_store(struct netdev_queue *queue, const char *buf,
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH net 1/4] net-sysfs: take the rtnl lock when storing xps_cpus
  2020-12-17 16:25 ` [PATCH net 1/4] net-sysfs: take the rtnl lock when storing xps_cpus Antoine Tenart
@ 2020-12-18 15:27     ` kernel test robot
  2020-12-19  0:30   ` Jakub Kicinski
  1 sibling, 0 replies; 14+ messages in thread
From: kernel test robot @ 2020-12-18 15:27 UTC (permalink / raw)
  To: Antoine Tenart, davem, kuba
  Cc: kbuild-all, clang-built-linux, Antoine Tenart, netdev, pabeni

[-- Attachment #1: Type: text/plain, Size: 2475 bytes --]

Hi Antoine,

I love your patch! Yet something to improve:

[auto build test ERROR on net/master]

url:    https://github.com/0day-ci/linux/commits/Antoine-Tenart/net-sysfs-fix-race-conditions-in-the-xps-code/20201218-002852
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 3ae32c07815a24ae12de2e7838d9d429ba31e5e0
config: riscv-randconfig-r014-20201217 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project cee1e7d14f4628d6174b33640d502bff3b54ae45)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install riscv cross compiling tool for clang build
        # apt-get install binutils-riscv64-linux-gnu
        # https://github.com/0day-ci/linux/commit/f989c3dcbe4d9abd1c6c48b34f08c6c0cd9d44b3
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Antoine-Tenart/net-sysfs-fix-race-conditions-in-the-xps-code/20201218-002852
        git checkout f989c3dcbe4d9abd1c6c48b34f08c6c0cd9d44b3
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=riscv 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

Note: the linux-review/Antoine-Tenart/net-sysfs-fix-race-conditions-in-the-xps-code/20201218-002852 HEAD 563d144b47845dea594b409ecf22914b9797cd1e builds fine.
      It only hurts bisectibility.

All errors (new ones prefixed by >>):

   /tmp/ics932s401-422897.s: Assembler messages:
>> /tmp/ics932s401-422897.s:260: Error: unrecognized opcode `zext.b a1,s11'
   /tmp/ics932s401-422897.s:362: Error: unrecognized opcode `zext.b a1,s11'
   /tmp/ics932s401-422897.s:518: Error: unrecognized opcode `zext.b a1,s11'
   /tmp/ics932s401-422897.s:637: Error: unrecognized opcode `zext.b a1,s11'
   /tmp/ics932s401-422897.s:774: Error: unrecognized opcode `zext.b a1,s11'
   /tmp/ics932s401-422897.s:893: Error: unrecognized opcode `zext.b a1,s11'
   /tmp/ics932s401-422897.s:1021: Error: unrecognized opcode `zext.b a1,s11'
>> /tmp/ics932s401-422897.s:1180: Error: unrecognized opcode `zext.b a1,s2'
   clang-12: error: assembler command failed with exit code 1 (use -v to see invocation)

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 19178 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 1/4] net-sysfs: take the rtnl lock when storing xps_cpus
@ 2020-12-18 15:27     ` kernel test robot
  0 siblings, 0 replies; 14+ messages in thread
From: kernel test robot @ 2020-12-18 15:27 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 2521 bytes --]

Hi Antoine,

I love your patch! Yet something to improve:

[auto build test ERROR on net/master]

url:    https://github.com/0day-ci/linux/commits/Antoine-Tenart/net-sysfs-fix-race-conditions-in-the-xps-code/20201218-002852
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 3ae32c07815a24ae12de2e7838d9d429ba31e5e0
config: riscv-randconfig-r014-20201217 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project cee1e7d14f4628d6174b33640d502bff3b54ae45)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install riscv cross compiling tool for clang build
        # apt-get install binutils-riscv64-linux-gnu
        # https://github.com/0day-ci/linux/commit/f989c3dcbe4d9abd1c6c48b34f08c6c0cd9d44b3
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Antoine-Tenart/net-sysfs-fix-race-conditions-in-the-xps-code/20201218-002852
        git checkout f989c3dcbe4d9abd1c6c48b34f08c6c0cd9d44b3
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=riscv 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

Note: the linux-review/Antoine-Tenart/net-sysfs-fix-race-conditions-in-the-xps-code/20201218-002852 HEAD 563d144b47845dea594b409ecf22914b9797cd1e builds fine.
      It only hurts bisectibility.

All errors (new ones prefixed by >>):

   /tmp/ics932s401-422897.s: Assembler messages:
>> /tmp/ics932s401-422897.s:260: Error: unrecognized opcode `zext.b a1,s11'
   /tmp/ics932s401-422897.s:362: Error: unrecognized opcode `zext.b a1,s11'
   /tmp/ics932s401-422897.s:518: Error: unrecognized opcode `zext.b a1,s11'
   /tmp/ics932s401-422897.s:637: Error: unrecognized opcode `zext.b a1,s11'
   /tmp/ics932s401-422897.s:774: Error: unrecognized opcode `zext.b a1,s11'
   /tmp/ics932s401-422897.s:893: Error: unrecognized opcode `zext.b a1,s11'
   /tmp/ics932s401-422897.s:1021: Error: unrecognized opcode `zext.b a1,s11'
>> /tmp/ics932s401-422897.s:1180: Error: unrecognized opcode `zext.b a1,s2'
   clang-12: error: assembler command failed with exit code 1 (use -v to see invocation)

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 19178 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 1/4] net-sysfs: take the rtnl lock when storing xps_cpus
  2020-12-18 15:27     ` kernel test robot
@ 2020-12-18 16:11       ` Antoine Tenart
  -1 siblings, 0 replies; 14+ messages in thread
From: Antoine Tenart @ 2020-12-18 16:11 UTC (permalink / raw)
  To: davem, kernel test robot, kuba
  Cc: kbuild-all, clang-built-linux, netdev, pabeni

That build issue seems unrelated to the patch. The series as a whole
builds fine according to the same report, and this code is not modified
by later patches.

It looks a lot like this report from yesterday:
https://www.spinics.net/lists/netdev/msg709132.html

Which also seemed unrelated to the changes:
https://www.spinics.net/lists/netdev/msg709264.html

Thanks!
Antoine

Quoting kernel test robot (2020-12-18 16:27:46)
> Hi Antoine,
> 
> I love your patch! Yet something to improve:
> 
> [auto build test ERROR on net/master]
> 
> url:    https://github.com/0day-ci/linux/commits/Antoine-Tenart/net-sysfs-fix-race-conditions-in-the-xps-code/20201218-002852
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 3ae32c07815a24ae12de2e7838d9d429ba31e5e0
> config: riscv-randconfig-r014-20201217 (attached as .config)
> compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project cee1e7d14f4628d6174b33640d502bff3b54ae45)
> reproduce (this is a W=1 build):
>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # install riscv cross compiling tool for clang build
>         # apt-get install binutils-riscv64-linux-gnu
>         # https://github.com/0day-ci/linux/commit/f989c3dcbe4d9abd1c6c48b34f08c6c0cd9d44b3
>         git remote add linux-review https://github.com/0day-ci/linux
>         git fetch --no-tags linux-review Antoine-Tenart/net-sysfs-fix-race-conditions-in-the-xps-code/20201218-002852
>         git checkout f989c3dcbe4d9abd1c6c48b34f08c6c0cd9d44b3
>         # save the attached .config to linux build tree
>         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=riscv 
> 
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>
> 
> Note: the linux-review/Antoine-Tenart/net-sysfs-fix-race-conditions-in-the-xps-code/20201218-002852 HEAD 563d144b47845dea594b409ecf22914b9797cd1e builds fine.
>       It only hurts bisectibility.
> 
> All errors (new ones prefixed by >>):
> 
>    /tmp/ics932s401-422897.s: Assembler messages:
> >> /tmp/ics932s401-422897.s:260: Error: unrecognized opcode `zext.b a1,s11'
>    /tmp/ics932s401-422897.s:362: Error: unrecognized opcode `zext.b a1,s11'
>    /tmp/ics932s401-422897.s:518: Error: unrecognized opcode `zext.b a1,s11'
>    /tmp/ics932s401-422897.s:637: Error: unrecognized opcode `zext.b a1,s11'
>    /tmp/ics932s401-422897.s:774: Error: unrecognized opcode `zext.b a1,s11'
>    /tmp/ics932s401-422897.s:893: Error: unrecognized opcode `zext.b a1,s11'
>    /tmp/ics932s401-422897.s:1021: Error: unrecognized opcode `zext.b a1,s11'
> >> /tmp/ics932s401-422897.s:1180: Error: unrecognized opcode `zext.b a1,s2'
>    clang-12: error: assembler command failed with exit code 1 (use -v to see invocation)
> 
> ---
> 0-DAY CI Kernel Test Service, Intel Corporation
> https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 1/4] net-sysfs: take the rtnl lock when storing xps_cpus
@ 2020-12-18 16:11       ` Antoine Tenart
  0 siblings, 0 replies; 14+ messages in thread
From: Antoine Tenart @ 2020-12-18 16:11 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 3044 bytes --]

That build issue seems unrelated to the patch. The series as a whole
builds fine according to the same report, and this code is not modified
by later patches.

It looks a lot like this report from yesterday:
https://www.spinics.net/lists/netdev/msg709132.html

Which also seemed unrelated to the changes:
https://www.spinics.net/lists/netdev/msg709264.html

Thanks!
Antoine

Quoting kernel test robot (2020-12-18 16:27:46)
> Hi Antoine,
> 
> I love your patch! Yet something to improve:
> 
> [auto build test ERROR on net/master]
> 
> url:    https://github.com/0day-ci/linux/commits/Antoine-Tenart/net-sysfs-fix-race-conditions-in-the-xps-code/20201218-002852
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 3ae32c07815a24ae12de2e7838d9d429ba31e5e0
> config: riscv-randconfig-r014-20201217 (attached as .config)
> compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project cee1e7d14f4628d6174b33640d502bff3b54ae45)
> reproduce (this is a W=1 build):
>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # install riscv cross compiling tool for clang build
>         # apt-get install binutils-riscv64-linux-gnu
>         # https://github.com/0day-ci/linux/commit/f989c3dcbe4d9abd1c6c48b34f08c6c0cd9d44b3
>         git remote add linux-review https://github.com/0day-ci/linux
>         git fetch --no-tags linux-review Antoine-Tenart/net-sysfs-fix-race-conditions-in-the-xps-code/20201218-002852
>         git checkout f989c3dcbe4d9abd1c6c48b34f08c6c0cd9d44b3
>         # save the attached .config to linux build tree
>         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=riscv 
> 
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>
> 
> Note: the linux-review/Antoine-Tenart/net-sysfs-fix-race-conditions-in-the-xps-code/20201218-002852 HEAD 563d144b47845dea594b409ecf22914b9797cd1e builds fine.
>       It only hurts bisectibility.
> 
> All errors (new ones prefixed by >>):
> 
>    /tmp/ics932s401-422897.s: Assembler messages:
> >> /tmp/ics932s401-422897.s:260: Error: unrecognized opcode `zext.b a1,s11'
>    /tmp/ics932s401-422897.s:362: Error: unrecognized opcode `zext.b a1,s11'
>    /tmp/ics932s401-422897.s:518: Error: unrecognized opcode `zext.b a1,s11'
>    /tmp/ics932s401-422897.s:637: Error: unrecognized opcode `zext.b a1,s11'
>    /tmp/ics932s401-422897.s:774: Error: unrecognized opcode `zext.b a1,s11'
>    /tmp/ics932s401-422897.s:893: Error: unrecognized opcode `zext.b a1,s11'
>    /tmp/ics932s401-422897.s:1021: Error: unrecognized opcode `zext.b a1,s11'
> >> /tmp/ics932s401-422897.s:1180: Error: unrecognized opcode `zext.b a1,s2'
>    clang-12: error: assembler command failed with exit code 1 (use -v to see invocation)
> 
> ---
> 0-DAY CI Kernel Test Service, Intel Corporation
> https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 1/4] net-sysfs: take the rtnl lock when storing xps_cpus
  2020-12-17 16:25 ` [PATCH net 1/4] net-sysfs: take the rtnl lock when storing xps_cpus Antoine Tenart
  2020-12-18 15:27     ` kernel test robot
@ 2020-12-19  0:30   ` Jakub Kicinski
  2020-12-19  1:41     ` Alexander Duyck
  1 sibling, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2020-12-19  0:30 UTC (permalink / raw)
  To: Antoine Tenart; +Cc: davem, netdev, pabeni, Alexander Duyck

On Thu, 17 Dec 2020 17:25:18 +0100 Antoine Tenart wrote:
> Callers to netif_set_xps_queue should take the rtnl lock. Failing to do
> so can lead to race conditions between netdev_set_num_tc and
> netif_set_xps_queue, triggering various oops:
> 
> - netif_set_xps_queue uses dev->tc_num as one of the parameters to
>   compute the size of new_dev_maps when allocating it. dev->tc_num is
>   also used to access the map, and the compiler may generate code to
>   retrieve this field multiple times in the function.
> 
> - netdev_set_num_tc sets dev->tc_num.
> 
> If new_dev_maps is allocated using dev->tc_num and then dev->tc_num is
> set to a higher value through netdev_set_num_tc, later accesses to
> new_dev_maps in netif_set_xps_queue could lead to accessing memory
> outside of new_dev_maps; triggering an oops.
> 
> One way of triggering this is to set an iface up (for which the driver
> uses netdev_set_num_tc in the open path, such as bnx2x) and writing to
> xps_cpus in a concurrent thread. With the right timing an oops is
> triggered.
> 
> Fixes: 184c449f91fe ("net: Add support for XPS with QoS via traffic classes")

Let's CC Alex

> Signed-off-by: Antoine Tenart <atenart@kernel.org>

Two things: (a) is the datapath not exposed to a similar problem?
__get_xps_queue_idx() uses dev->tc_num in a very similar fashion.
Should we perhaps make the "num_tcs" part of the XPS maps which is
under RCU protection rather than accessing the netdev copy? 
(b) if we always take rtnl_lock, why have xps_map_mutex? Can we
rearrange things so that xps_map_mutex is sufficient?

> diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
> index 999b70c59761..7cc15dec1717 100644
> --- a/net/core/net-sysfs.c
> +++ b/net/core/net-sysfs.c
> @@ -1396,7 +1396,13 @@ static ssize_t xps_cpus_store(struct netdev_queue *queue,
>  		return err;
>  	}
>  
> +	if (!rtnl_trylock()) {
> +		free_cpumask_var(mask);
> +		return restart_syscall();
> +	}
> +
>  	err = netif_set_xps_queue(dev, mask, index);
> +	rtnl_unlock();
>  
>  	free_cpumask_var(mask);
>  


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 1/4] net-sysfs: take the rtnl lock when storing xps_cpus
  2020-12-19  0:30   ` Jakub Kicinski
@ 2020-12-19  1:41     ` Alexander Duyck
  2020-12-19 15:01       ` Antoine Tenart
  0 siblings, 1 reply; 14+ messages in thread
From: Alexander Duyck @ 2020-12-19  1:41 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Antoine Tenart, David Miller, Netdev, Paolo Abeni

On Fri, Dec 18, 2020 at 4:30 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 17 Dec 2020 17:25:18 +0100 Antoine Tenart wrote:
> > Callers to netif_set_xps_queue should take the rtnl lock. Failing to do
> > so can lead to race conditions between netdev_set_num_tc and
> > netif_set_xps_queue, triggering various oops:
> >
> > - netif_set_xps_queue uses dev->tc_num as one of the parameters to
> >   compute the size of new_dev_maps when allocating it. dev->tc_num is
> >   also used to access the map, and the compiler may generate code to
> >   retrieve this field multiple times in the function.
> >
> > - netdev_set_num_tc sets dev->tc_num.
> >
> > If new_dev_maps is allocated using dev->tc_num and then dev->tc_num is
> > set to a higher value through netdev_set_num_tc, later accesses to
> > new_dev_maps in netif_set_xps_queue could lead to accessing memory
> > outside of new_dev_maps; triggering an oops.
> >
> > One way of triggering this is to set an iface up (for which the driver
> > uses netdev_set_num_tc in the open path, such as bnx2x) and writing to
> > xps_cpus in a concurrent thread. With the right timing an oops is
> > triggered.
> >
> > Fixes: 184c449f91fe ("net: Add support for XPS with QoS via traffic classes")
>
> Let's CC Alex
>
> > Signed-off-by: Antoine Tenart <atenart@kernel.org>
>
> Two things: (a) is the datapath not exposed to a similar problem?
> __get_xps_queue_idx() uses dev->tc_num in a very similar fashion.

I think we are shielded from this by the fact that if you change the
number of tc the Tx path has to be torn down and rebuilt since you are
normally changing the qdisc configuration anyway.

> Should we perhaps make the "num_tcs" part of the XPS maps which is
> under RCU protection rather than accessing the netdev copy?

So it looks like the issue is the fact that we really need to
synchronize netdev_reset_tc, netdev_set_tc_queue, and
netdev_set_num_tc with __netif_set_xps_queue.

> (b) if we always take rtnl_lock, why have xps_map_mutex? Can we
> rearrange things so that xps_map_mutex is sufficient?

It seems like the quick and dirty way would be to look at updating the
3 functions I called out so that they were holding the xps_map_mutex
while they were updating things, and for __netif_set_xps_queue to
expand out the mutex to include the code starting at "if (dev->num_tc)
{".

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 1/4] net-sysfs: take the rtnl lock when storing xps_cpus
  2020-12-19  1:41     ` Alexander Duyck
@ 2020-12-19 15:01       ` Antoine Tenart
  0 siblings, 0 replies; 14+ messages in thread
From: Antoine Tenart @ 2020-12-19 15:01 UTC (permalink / raw)
  To: Alexander Duyck, Jakub Kicinski; +Cc: David Miller, Netdev, Paolo Abeni

Hi Jakub, Alexander,

Quoting Alexander Duyck (2020-12-19 02:41:08)
> On Fri, Dec 18, 2020 at 4:30 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > Two things: (a) is the datapath not exposed to a similar problem?
> > __get_xps_queue_idx() uses dev->tc_num in a very similar fashion.
> 
> I think we are shielded from this by the fact that if you change the
> number of tc the Tx path has to be torn down and rebuilt since you are
> normally changing the qdisc configuration anyway.

That's right. But there's nothing preventing users to call functions
using the xps maps in between. There are a few functions being exposed.

One (similar) example of that is another bug I reproduced, were the old
and the new map in __netif_set_xps_queue do not have the same size,
because num_tc was updated in between two calls to this function. The
root cause is the same: the size of the map is not embedded in it and
whenever we access it we can make an out of bound access.

> > Should we perhaps make the "num_tcs" part of the XPS maps which is
> > under RCU protection rather than accessing the netdev copy?

Yes, I have a local patch (untested, still WIP) doing exactly that. The
idea is we can't make sure a num_tc update will trigger an xps
reallocation / reconfiguration of the map; but at least we can make sure
the map won't be accessed out of bounds.

It's a different issue though: not being able to access a map out of
bound once it has been allocated whereas this patch wants to prevent an
update of num_tc while the xps map allocation/setup is in progress.

> So it looks like the issue is the fact that we really need to
> synchronize netdev_reset_tc, netdev_set_tc_queue, and
> netdev_set_num_tc with __netif_set_xps_queue.
> 
> > (b) if we always take rtnl_lock, why have xps_map_mutex? Can we
> > rearrange things so that xps_map_mutex is sufficient?
> 
> It seems like the quick and dirty way would be to look at updating the
> 3 functions I called out so that they were holding the xps_map_mutex
> while they were updating things, and for __netif_set_xps_queue to
> expand out the mutex to include the code starting at "if (dev->num_tc)
> {".

That should do the trick. The only downside is xps_map_mutex is only
defined with CONFIG_XPS while netdev_set_num_tc is not, adding more
ifdef to it. But that's probably a better compromise than taking the
rtnl lock.

Thanks for the review and suggestions!
Antoine

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [kbuild-all] Re: [PATCH net 1/4] net-sysfs: take the rtnl lock when storing xps_cpus
  2020-12-18 16:11       ` Antoine Tenart
@ 2020-12-21  4:09         ` Philip Li
  -1 siblings, 0 replies; 14+ messages in thread
From: Philip Li @ 2020-12-21  4:09 UTC (permalink / raw)
  To: Antoine Tenart
  Cc: davem, kernel test robot, kuba, kbuild-all, clang-built-linux,
	netdev, pabeni

On Fri, Dec 18, 2020 at 05:11:28PM +0100, Antoine Tenart wrote:
> That build issue seems unrelated to the patch. The series as a whole
> builds fine according to the same report, and this code is not modified
> by later patches.
Hi Antoine, this is a false positive report, kindly ignore this.
Sorry for inconvenience.

> 
> It looks a lot like this report from yesterday:
> https://www.spinics.net/lists/netdev/msg709132.html
> 
> Which also seemed unrelated to the changes:
> https://www.spinics.net/lists/netdev/msg709264.html
> 
> Thanks!
> Antoine
> 
> Quoting kernel test robot (2020-12-18 16:27:46)
> > Hi Antoine,
> > 
> > I love your patch! Yet something to improve:
> > 
> > [auto build test ERROR on net/master]
> > 
> > url:    https://github.com/0day-ci/linux/commits/Antoine-Tenart/net-sysfs-fix-race-conditions-in-the-xps-code/20201218-002852
> > base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 3ae32c07815a24ae12de2e7838d9d429ba31e5e0
> > config: riscv-randconfig-r014-20201217 (attached as .config)
> > compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project cee1e7d14f4628d6174b33640d502bff3b54ae45)
> > reproduce (this is a W=1 build):
> >         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
> >         chmod +x ~/bin/make.cross
> >         # install riscv cross compiling tool for clang build
> >         # apt-get install binutils-riscv64-linux-gnu
> >         # https://github.com/0day-ci/linux/commit/f989c3dcbe4d9abd1c6c48b34f08c6c0cd9d44b3
> >         git remote add linux-review https://github.com/0day-ci/linux
> >         git fetch --no-tags linux-review Antoine-Tenart/net-sysfs-fix-race-conditions-in-the-xps-code/20201218-002852
> >         git checkout f989c3dcbe4d9abd1c6c48b34f08c6c0cd9d44b3
> >         # save the attached .config to linux build tree
> >         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=riscv 
> > 
> > If you fix the issue, kindly add following tag as appropriate
> > Reported-by: kernel test robot <lkp@intel.com>
> > 
> > Note: the linux-review/Antoine-Tenart/net-sysfs-fix-race-conditions-in-the-xps-code/20201218-002852 HEAD 563d144b47845dea594b409ecf22914b9797cd1e builds fine.
> >       It only hurts bisectibility.
> > 
> > All errors (new ones prefixed by >>):
> > 
> >    /tmp/ics932s401-422897.s: Assembler messages:
> > >> /tmp/ics932s401-422897.s:260: Error: unrecognized opcode `zext.b a1,s11'
> >    /tmp/ics932s401-422897.s:362: Error: unrecognized opcode `zext.b a1,s11'
> >    /tmp/ics932s401-422897.s:518: Error: unrecognized opcode `zext.b a1,s11'
> >    /tmp/ics932s401-422897.s:637: Error: unrecognized opcode `zext.b a1,s11'
> >    /tmp/ics932s401-422897.s:774: Error: unrecognized opcode `zext.b a1,s11'
> >    /tmp/ics932s401-422897.s:893: Error: unrecognized opcode `zext.b a1,s11'
> >    /tmp/ics932s401-422897.s:1021: Error: unrecognized opcode `zext.b a1,s11'
> > >> /tmp/ics932s401-422897.s:1180: Error: unrecognized opcode `zext.b a1,s2'
> >    clang-12: error: assembler command failed with exit code 1 (use -v to see invocation)
> > 
> > ---
> > 0-DAY CI Kernel Test Service, Intel Corporation
> > https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
> _______________________________________________
> kbuild-all mailing list -- kbuild-all@lists.01.org
> To unsubscribe send an email to kbuild-all-leave@lists.01.org

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 1/4] net-sysfs: take the rtnl lock when storing xps_cpus
@ 2020-12-21  4:09         ` Philip Li
  0 siblings, 0 replies; 14+ messages in thread
From: Philip Li @ 2020-12-21  4:09 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 3493 bytes --]

On Fri, Dec 18, 2020 at 05:11:28PM +0100, Antoine Tenart wrote:
> That build issue seems unrelated to the patch. The series as a whole
> builds fine according to the same report, and this code is not modified
> by later patches.
Hi Antoine, this is a false positive report, kindly ignore this.
Sorry for inconvenience.

> 
> It looks a lot like this report from yesterday:
> https://www.spinics.net/lists/netdev/msg709132.html
> 
> Which also seemed unrelated to the changes:
> https://www.spinics.net/lists/netdev/msg709264.html
> 
> Thanks!
> Antoine
> 
> Quoting kernel test robot (2020-12-18 16:27:46)
> > Hi Antoine,
> > 
> > I love your patch! Yet something to improve:
> > 
> > [auto build test ERROR on net/master]
> > 
> > url:    https://github.com/0day-ci/linux/commits/Antoine-Tenart/net-sysfs-fix-race-conditions-in-the-xps-code/20201218-002852
> > base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 3ae32c07815a24ae12de2e7838d9d429ba31e5e0
> > config: riscv-randconfig-r014-20201217 (attached as .config)
> > compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project cee1e7d14f4628d6174b33640d502bff3b54ae45)
> > reproduce (this is a W=1 build):
> >         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
> >         chmod +x ~/bin/make.cross
> >         # install riscv cross compiling tool for clang build
> >         # apt-get install binutils-riscv64-linux-gnu
> >         # https://github.com/0day-ci/linux/commit/f989c3dcbe4d9abd1c6c48b34f08c6c0cd9d44b3
> >         git remote add linux-review https://github.com/0day-ci/linux
> >         git fetch --no-tags linux-review Antoine-Tenart/net-sysfs-fix-race-conditions-in-the-xps-code/20201218-002852
> >         git checkout f989c3dcbe4d9abd1c6c48b34f08c6c0cd9d44b3
> >         # save the attached .config to linux build tree
> >         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=riscv 
> > 
> > If you fix the issue, kindly add following tag as appropriate
> > Reported-by: kernel test robot <lkp@intel.com>
> > 
> > Note: the linux-review/Antoine-Tenart/net-sysfs-fix-race-conditions-in-the-xps-code/20201218-002852 HEAD 563d144b47845dea594b409ecf22914b9797cd1e builds fine.
> >       It only hurts bisectibility.
> > 
> > All errors (new ones prefixed by >>):
> > 
> >    /tmp/ics932s401-422897.s: Assembler messages:
> > >> /tmp/ics932s401-422897.s:260: Error: unrecognized opcode `zext.b a1,s11'
> >    /tmp/ics932s401-422897.s:362: Error: unrecognized opcode `zext.b a1,s11'
> >    /tmp/ics932s401-422897.s:518: Error: unrecognized opcode `zext.b a1,s11'
> >    /tmp/ics932s401-422897.s:637: Error: unrecognized opcode `zext.b a1,s11'
> >    /tmp/ics932s401-422897.s:774: Error: unrecognized opcode `zext.b a1,s11'
> >    /tmp/ics932s401-422897.s:893: Error: unrecognized opcode `zext.b a1,s11'
> >    /tmp/ics932s401-422897.s:1021: Error: unrecognized opcode `zext.b a1,s11'
> > >> /tmp/ics932s401-422897.s:1180: Error: unrecognized opcode `zext.b a1,s2'
> >    clang-12: error: assembler command failed with exit code 1 (use -v to see invocation)
> > 
> > ---
> > 0-DAY CI Kernel Test Service, Intel Corporation
> > https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org
> _______________________________________________
> kbuild-all mailing list -- kbuild-all(a)lists.01.org
> To unsubscribe send an email to kbuild-all-leave(a)lists.01.org

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-12-21  4:18 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-17 16:25 [PATCH net 0/4] net-sysfs: fix race conditions in the xps code Antoine Tenart
2020-12-17 16:25 ` [PATCH net 1/4] net-sysfs: take the rtnl lock when storing xps_cpus Antoine Tenart
2020-12-18 15:27   ` kernel test robot
2020-12-18 15:27     ` kernel test robot
2020-12-18 16:11     ` Antoine Tenart
2020-12-18 16:11       ` Antoine Tenart
2020-12-21  4:09       ` [kbuild-all] " Philip Li
2020-12-21  4:09         ` Philip Li
2020-12-19  0:30   ` Jakub Kicinski
2020-12-19  1:41     ` Alexander Duyck
2020-12-19 15:01       ` Antoine Tenart
2020-12-17 16:25 ` [PATCH net 2/4] net-sysfs: take the rtnl lock when accessing xps_cpus_map and num_tc Antoine Tenart
2020-12-17 16:25 ` [PATCH net 3/4] net-sysfs: take the rtnl lock when storing xps_rxqs Antoine Tenart
2020-12-17 16:25 ` [PATCH net 4/4] net-sysfs: take the rtnl lock when accessing xps_rxqs_map and num_tc Antoine Tenart

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.