From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EBD7CC43219 for ; Wed, 19 Oct 2022 15:04:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229774AbiJSPEN (ORCPT ); Wed, 19 Oct 2022 11:04:13 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34430 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231867AbiJSPDz (ORCPT ); Wed, 19 Oct 2022 11:03:55 -0400 Received: from out2.migadu.com (out2.migadu.com [IPv6:2001:41d0:2:aacc::]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7D0261BF84A; Wed, 19 Oct 2022 07:58:13 -0700 (PDT) Message-ID: <204f1ef4-77b1-7d4b-4953-00a99ce83be4@linux.dev> DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1666191411; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=kU7BtG77M/ab+DXgf8+I8IYdsDkahSG2ZOcGzVisnBA=; b=oWs0XLC9kkNRU8MU8zw2y6jF4xWvndFwanpEjkSnhxbeymE0odhLIr81TQ6cSOdSH+E8Hs LreGRNgHXuoFonOS66m9dX8sCPAVv+neaz3s+mD6nxsScg7x/xMlYeyBXDBD2yAiAklC3A zoDyzWX2luSJOWQU9l2daR0UjtZZ1S4= Date: Wed, 19 Oct 2022 22:56:45 +0800 MIME-Version: 1.0 Subject: Re: [PATCHv2 0/6] Fix the problem that rxe can not work in net To: yanjun.zhu@linux.dev, jgg@ziepe.ca, leon@kernel.org, zyjzyj2000@gmail.com, linux-rdma@vger.kernel.org, netdev@vger.kernel.org, davem@davemloft.net Cc: Zhu Yanjun References: <20221006085921.1323148-1-yanjun.zhu@linux.dev> X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Yanjun Zhu In-Reply-To: <20221006085921.1323148-1-yanjun.zhu@linux.dev> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Migadu-Flow: FLOW_OUT Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org 在 2022/10/6 16:59, yanjun.zhu@linux.dev 写道: > From: Zhu Yanjun > > When run "ip link add" command to add a rxe rdma link in a net > namespace, normally this rxe rdma link can not work in a net > name space. > > The root cause is that a sock listening on udp port 4791 is created > in init_net when the rdma_rxe module is loaded into kernel. That is, > the sock listening on udp port 4791 is created in init_net. Other net > namespace is difficult to use this sock. > > The following commits will solve this problem. > > In the first commit, move the creating sock listening on udp port 4791 > from module_init function to rdma link creating functions. That is, > after the module rdma_rxe is loaded, the sock will not be created. > When run "rdma link add ..." command, the sock will be created. So > when creating a rdma link in the net namespace, the sock will be > created in this net namespace. > > In the second commit, the functions udp4_lib_lookup and udp6_lib_lookup > will check the sock exists in the net namespace or not. If yes, rdma > link will increase the reference count of this sock, then continue other > jobs instead of creating a new sock to listen on udp port 4791. Since the > network notifier is global, when the module rdma_rxe is loaded, this > notifier will be registered. > > After the rdma link is created, the command "rdma link del" is to > delete rdma link at the same time the sock is checked. If the reference > count of this sock is greater than the sock reference count needed by > udp tunnel, the sock reference count is decreased by one. If equal, it > indicates that this rdma link is the last one. As such, the udp tunnel > is shut down and the sock is closed. The above work should be > implemented in linkdel function. But currently no dellink function in > rxe. So the 3rd commit addes dellink function pointer. And the 4th > commit implements the dellink function in rxe. > > To now, it is not necessary to keep a global variable to store the sock > listening udp port 4791. This global variable can be replaced by the > functions udp4_lib_lookup and udp6_lib_lookup totally. Because the > function udp6_lib_lookup is in the fast path, a member variable l_sk6 > is added to store the sock. If l_sk6 is NULL, udp6_lib_lookup is called > to lookup the sock, then the sock is stored in l_sk6, in the future,it > can be used directly. > > All the above work has been done in init_net. And it can also work in > the net namespace. So the init_net is replaced by the individual net > namespace. This is what the 6th commit does. Because rxe device is > dependent on the net device and the sock listening on udp port 4791, > every rxe device is in exclusive mode in the individual net namespace. > Other rdma netns operations will be considerred in the future. > > Test steps: > 1) Suppose that 2 NICs are in 2 different net namespaces. > > # ip netns exec net0 ip link > 3: eno2: mtu 1500 qdisc mq state UP > link/ether 00:1e:67:a0:22:3f brd ff:ff:ff:ff:ff:ff > altname enp5s0 > > # ip netns exec net1 ip link > 4: eno3: mtu 1500 qdisc fq_codel > link/ether f8:e4:3b:3b:e4:10 brd ff:ff:ff:ff:ff:ff > > 2) Add rdma link in the different net namespace > net0: > ip netns exec net0 rdma link add rxe0 type rxe netdev eno2 > > net1: > ip netns exec net1 rdma link add rxe1 type rxe netdev eno3 > > 3) Run rping test. > net0 > # ip netns exec net0 rping -s -a 192.168.2.1 -C 1& > [1] 1737 > # ip netns exec net1 rping -c -a 192.168.2.1 -d -v -C 1 > verbose > count 1 > ... > ping data: rdma-ping-0: ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqr > ... > > 4) Remove the rdma links from the net namespaces. > net0: > ip netns exec net0 rdma link del rxe0 > net1: > ip netns exec net1 rdma link del rxe1 > > --- > V1->V2: Add the explicit initialization of sk6. > --- > Zhu Yanjun (6): > RDMA/rxe: Creating listening sock in newlink function > RDMA/rxe: Support more rdma links in init_net > RDMA/nldev: Add dellink function pointer > RDMA/rxe: Implement dellink in rxe > RDMA/rxe: Replace global variable with sock lookup functions > RDMA/rxe: add the support of net namespace Gently ping Zhu Yanjun > > drivers/infiniband/core/nldev.c | 6 ++ > drivers/infiniband/sw/rxe/rxe.c | 26 +++++- > drivers/infiniband/sw/rxe/rxe_net.c | 129 ++++++++++++++++++++------ > drivers/infiniband/sw/rxe/rxe_net.h | 9 +- > drivers/infiniband/sw/rxe/rxe_verbs.h | 1 + > include/rdma/rdma_netlink.h | 2 + > 6 files changed, 134 insertions(+), 39 deletions(-) >