From: Jason Gunthorpe <jgg@nvidia.com>
To: Leon Romanovsky <leon@kernel.org>
Cc: Dragos Tatulea <dtatulea@nvidia.com>,
	linux-rdma@vger.kernel.org, Tariq Toukan <tariqt@nvidia.com>
Subject: Re: [PATCH rdma-rc v1] IB/IPoIB: Fix legacy IPoIB due to wrong number of queues
Date: Fri, 20 Jan 2023 16:52:21 -0400
Message-ID: <Y8r/BUdb7XMxwVN+@nvidia.com>
In-Reply-To: <752143b0eef72a966662ce94526b1ceb5ba4bbb3.1674234106.git.leon@kernel.org>

On Fri, Jan 20, 2023 at 07:02:48PM +0200, Leon Romanovsky wrote:
> From: Dragos Tatulea <dtatulea@nvidia.com>
> 
> The cited commit creates child PKEY interfaces over netlink with multiple
> tx and rx queues, but some devices don't support more than 1 tx and 1 rx
> queue. This causes a crash when traffic is sent over the PKEY interface,
> because the parent has a single queue but the child has multiple queues.
> 
> This patch inherits the real_num_tx/rx_queues from the parent netdev.
> 
> BUG: kernel NULL pointer dereference, address: 000000000000036b
> PGD 0 P4D 0
> Oops: 0000 [#1] SMP
> CPU: 4 PID: 209665 Comm: python3 Not tainted 6.1.0_for_upstream_min_debug_2022_12_12_17_02 #1
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> RIP: 0010:kmem_cache_alloc+0xcb/0x450
> Code: ce 7e 49 8b 50 08 49 83 78 10 00 4d 8b 28 0f 84 cb 02 00 00 4d 85 ed 0f 84 c2 02 00 00 41 8b 44 24 28 48 8d 4a 01 49 8b 3c 24 <49> 8b 5c 05 00 4c 89 e8 65 48 0f c7 0f 0f 94 c0 84 c0 74 b8 41 8b
> RSP: 0018:ffff88822acbbab8 EFLAGS: 00010202
> RAX: 0000000000000070 RBX: ffff8881c28e3e00 RCX: 00000000064f8dae
> RDX: 00000000064f8dad RSI: 0000000000000a20 RDI: 0000000000030d00
> RBP: 0000000000000a20 R08: ffff8882f5d30d00 R09: ffff888104032f40
> R10: ffff88810fade828 R11: 736f6d6570736575 R12: ffff88810081c000
> R13: 00000000000002fb R14: ffffffff817fc865 R15: 0000000000000000
> FS:  00007f9324ff9700(0000) GS:ffff8882f5d00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000000000000036b CR3: 00000001125af004 CR4: 0000000000370ea0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  <TASK>
>  skb_clone+0x55/0xd0
>  ip6_finish_output2+0x3fe/0x690
>  ip6_finish_output+0xfa/0x310
>  ip6_send_skb+0x1e/0x60
>  udp_v6_send_skb+0x1e5/0x420
>  udpv6_sendmsg+0xb3c/0xe60
>  ? ip_mc_finish_output+0x180/0x180
>  ? __switch_to_asm+0x3a/0x60
>  ? __switch_to_asm+0x34/0x60
>  sock_sendmsg+0x33/0x40
>  __sys_sendto+0x103/0x160
>  ? _copy_to_user+0x21/0x30
>  ? kvm_clock_get_cycles+0xd/0x10
>  ? ktime_get_ts64+0x49/0xe0
>  __x64_sys_sendto+0x25/0x30
>  do_syscall_64+0x3d/0x90
>  entry_SYSCALL_64_after_hwframe+0x46/0xb0
> RIP: 0033:0x7f9374f1ed14
> Code: 42 41 f8 ff 44 8b 4c 24 2c 4c 8b 44 24 20 89 c5 44 8b 54 24 28 48 8b 54 24 18 b8 2c 00 00 00 48 8b 74 24 10 8b 7c 24 08 0f 05 <48> 3d 00 f0 ff ff 77 34 89 ef 48 89 44 24 08 e8 68 41 f8 ff 48 8b
> RSP: 002b:00007f9324ff7bd0 EFLAGS: 00000293 ORIG_RAX: 000000000000002c
> RAX: ffffffffffffffda RBX: 00007f9324ff7cc8 RCX: 00007f9374f1ed14
> RDX: 00000000000002fb RSI: 00007f93000052f0 RDI: 0000000000000030
> RBP: 0000000000000000 R08: 00007f9324ff7d40 R09: 000000000000001c
> R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
> R13: 000000012a05f200 R14: 0000000000000001 R15: 00007f9374d57bdc
>  </TASK>
> 
> Fixes: dbc94a0fb817 ("IB/IPoIB: Fix queue count inconsistency for PKEY child interfaces")
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
> Changelog:
> v1:
>  * Fixed typo in warning print.
> v0: https://lore.kernel.org/all/4a7ecec08ee30ad8004019818fadf1e58057e945.1674137153.git.leon@kernel.org
> ---
>  drivers/infiniband/ulp/ipoib/ipoib_netlink.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_netlink.c b/drivers/infiniband/ulp/ipoib/ipoib_netlink.c
> index 9ad8d9856275..0548735a15b5 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_netlink.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_netlink.c
> @@ -126,6 +126,18 @@ static int ipoib_new_child_link(struct net *src_net, struct net_device *dev,
>  	} else
>  		child_pkey  = nla_get_u16(data[IFLA_IPOIB_PKEY]);
>  
> +	err = netif_set_real_num_tx_queues(dev, pdev->real_num_tx_queues);
> +	if (err) {
> +		ipoib_warn(ppriv, "failed setting the child tx queue count based on parent\n");
> +		return err;
> +	}
> +
> +	err = netif_set_real_num_rx_queues(dev, pdev->real_num_rx_queues);
> +	if (err) {
> +		ipoib_warn(ppriv, "failed setting the child rx queue count based on parent\n");
> +		return err;
> +	}

This still seems flawed. Netlink does this:

	unsigned int num_rx_queues = 1;

	if (tb[IFLA_NUM_RX_QUEUES])
		num_rx_queues = nla_get_u32(tb[IFLA_NUM_RX_QUEUES]);
	else if (ops->get_num_rx_queues)
		num_rx_queues = ops->get_num_rx_queues();

So num_rx_queues can really be any value that userspace cares to
provide.
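
To make the point concrete, here is a small userspace model of the
selection order in the quoted netlink code. It is a sketch, not kernel
code: pick_num_rx_queues() and example_driver_max() are hypothetical
names, and the value 64 is arbitrary. It shows that an explicit
IFLA_NUM_RX_QUEUES attribute always wins over the driver callback.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Shape of the argument-less driver callback used in the quoted code. */
typedef unsigned int (*get_num_queues_fn)(void);

/*
 * Userspace model of the quoted selection order:
 * 1. an explicit IFLA_NUM_RX_QUEUES attribute from userspace,
 * 2. otherwise the driver's get_num_rx_queues() callback,
 * 3. otherwise a default of 1.
 */
static unsigned int pick_num_rx_queues(bool have_attr, uint32_t attr_val,
				       get_num_queues_fn get_num_rx_queues)
{
	unsigned int num_rx_queues = 1;

	if (have_attr)
		num_rx_queues = attr_val;
	else if (get_num_rx_queues)
		num_rx_queues = get_num_rx_queues();

	return num_rx_queues;
}

/* Hypothetical driver callback; 64 is an arbitrary example value. */
static unsigned int example_driver_max(void)
{
	return 64;
}
```

Calling pick_num_rx_queues(true, 3, example_driver_max) yields 3 no
matter what the driver callback would have returned.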

If pdev->real_num_rx_queues is > the user-provided value, then
netif_set_real_num_rx_queues() just fails with -EINVAL.
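
The failure comes from the range check at the top of
netif_set_real_num_rx_queues(): the real count must be at least 1 and
no larger than the number of queues allocated for the device. Below is
a minimal userspace model of just that check. struct model_netdev and
model_set_real_num_rx_queues() are hypothetical stand-ins, and the
kernel function additionally updates the sysfs queue kobjects of a
registered device, which is omitted here.

```c
#include <errno.h>

/* Hypothetical stand-in holding only the two relevant net_device fields. */
struct model_netdev {
	unsigned int num_rx_queues;      /* allocated when the netdev is created */
	unsigned int real_num_rx_queues; /* number actually in use */
};

/* Models the range check in netif_set_real_num_rx_queues(). */
static int model_set_real_num_rx_queues(struct model_netdev *dev,
					unsigned int rxq)
{
	if (rxq < 1 || rxq > dev->num_rx_queues)
		return -EINVAL;

	dev->real_num_rx_queues = rxq;
	return 0;
}
```

In this model, a child allocated with 2 rx queues (e.g. from
IFLA_NUM_RX_QUEUES=2) over a parent using 8 gets -EINVAL and its real
count is left unchanged.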

So at a minimum this should take the min of the number of queues
actually requested and the maximum number of queues the driver can
provide, and use that to set the real queue counts.
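
A sketch of the suggested clamping, assuming the parent's real queue
count is taken as the limit the driver can provide. child_real_queues()
is a hypothetical helper; in-kernel the same thing would more likely be
written inline as min(dev->num_rx_queues, pdev->real_num_rx_queues),
and likewise for tx.

```c
/*
 * Hypothetical helper: the child's real queue count is the smaller of
 * what was allocated for it (from IFLA_NUM_*_QUEUES or the driver
 * callback) and what the parent actually uses. Given that both inputs
 * are at least 1, the result always passes the 1..allocated range check
 * in netif_set_real_num_{tx,rx}_queues().
 */
static unsigned int child_real_queues(unsigned int child_allocated,
				      unsigned int parent_real)
{
	return parent_real < child_allocated ? parent_real : child_allocated;
}
```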

And returning a really big number from ops->get_num_rx_queues is
pretty ugly too. Ideally that would be fixed to pass in some function
arguments, so it can obtain the ppriv and return the actual maximum
number of queues, and we don't waste a bunch of memory.
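
A rough userspace model of the memory cost: with an argument-less
callback the child is allocated a driver-wide maximum regardless of the
parent, while a callback that could see the parent could size the
allocation to the parent's real queue count. The parent-aware shape is
hypothetical; the names and signatures below are illustrative only, not
an existing kernel hook, and 64 is an arbitrary example value.

```c
/* Hypothetical stand-in for the parent's private state (ppriv). */
struct model_parent {
	unsigned int real_num_rx_queues;
};

/* Current shape: no arguments, so only a driver-wide constant is possible. */
static unsigned int fixed_max_queues(void)
{
	return 64; /* arbitrary example value */
}

/* Hypothetical shape: with the parent in hand, return what it really uses. */
static unsigned int parent_aware_queues(const struct model_parent *parent)
{
	return parent->real_num_rx_queues;
}

/* Queues allocated for the child beyond the count it can actually use. */
static unsigned int unused_queues(unsigned int allocated, unsigned int usable)
{
	return allocated > usable ? allocated - usable : 0;
}
```

For a single-queue parent, the fixed callback leaves 63 of the child's
64 allocated queues unused, while the parent-aware one leaves none.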

Jason


Thread overview: 7+ messages
2023-01-20 17:02 [PATCH rdma-rc v1] IB/IPoIB: Fix legacy IPoIB due to wrong number of queues Leon Romanovsky
2023-01-20 20:52 ` Jason Gunthorpe [this message]
2023-01-21  8:40   ` Dragos Tatulea
2023-01-22 12:44   ` Leon Romanovsky
2023-01-23 18:32     ` Jason Gunthorpe
2023-01-24  6:27       ` Leon Romanovsky
2023-01-24 13:00         ` Jason Gunthorpe
