From: Yanjun Zhu <yanjun.zhu@linux.dev>
To: Yanjun Zhu <yanjun.zhu@linux.dev>,
jgg@ziepe.ca, leon@kernel.org, zyjzyj2000@gmail.com,
linux-rdma@vger.kernel.org, netdev@vger.kernel.org,
davem@davemloft.net, Parav Pandit <parav@nvidia.com>
Cc: Zhu Yanjun <yanjun.zhu@intel.com>
Subject: Re: [PATCHv2 0/6] Fix the problem that rxe can not work in net
Date: Fri, 11 Nov 2022 10:36:39 +0800
Message-ID: <25767d73-c7fc-4831-4a45-337764430fe7@linux.dev>
In-Reply-To: <204f1ef4-77b1-7d4b-4953-00a99ce83be4@linux.dev>
On 2022/10/19 22:56, Yanjun Zhu wrote:
> On 2022/10/6 16:59, yanjun.zhu@linux.dev wrote:
>> From: Zhu Yanjun <yanjun.zhu@intel.com>
>>
>> When the "rdma link add" command is run to add an rxe rdma link in a
>> net namespace, this rxe rdma link normally cannot work in that net
>> namespace.
>>
>> The root cause is that a sock listening on UDP port 4791 is created
>> in init_net when the rdma_rxe module is loaded into the kernel. Other
>> net namespaces cannot easily use this sock.
>>
>> The following commits will solve this problem.
>>
>> In the first commit, the creation of the sock listening on UDP port
>> 4791 is moved from the module_init function to the rdma link creation
>> function. That is, the sock is no longer created when the rdma_rxe
>> module is loaded; it is created when the "rdma link add ..." command
>> is run. So when an rdma link is created in a net namespace, the sock
>> is created in that net namespace.
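
The change described above can be pictured with a small userspace model.
This is only a sketch, not code from the patches: the struct, the function
names and MODEL_MAX_NETNS are made up for illustration, and namespace 0
stands in for init_net.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_NETNS 4   /* arbitrary size for the illustration */

/* One flag per namespace: does a listener on UDP port 4791 exist there? */
struct netns_model {
    bool listener[MODEL_MAX_NETNS];
};

/* Old behaviour: module load creates the listener in namespace 0 only. */
static void model_module_init_old(struct netns_model *m)
{
    m->listener[0] = true;
}

/* New behaviour: "rdma link add" creates the listener in the caller's namespace. */
static void model_newlink(struct netns_model *m, int ns)
{
    assert(ns >= 0 && ns < MODEL_MAX_NETNS);
    m->listener[ns] = true;
}
```

In this model, a link added in namespace 1 under the old behaviour finds no
listener in its own namespace, while model_newlink() creates one there.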
>>
>> In the second commit, the functions udp4_lib_lookup and udp6_lib_lookup
>> are used to check whether the sock already exists in the net
>> namespace. If it does, the rdma link increases the reference count of
>> this sock and continues, instead of creating a new sock to listen on
>> UDP port 4791. Since the network notifier is global, it is registered
>> when the rdma_rxe module is loaded.
>>
>> After the rdma link is created, the "rdma link del" command deletes
>> the rdma link and checks the sock at the same time. If the reference
>> count of this sock is greater than the reference count needed by the
>> udp tunnel, the reference count is decreased by one. If they are
>> equal, this rdma link is the last one, so the udp tunnel is shut down
>> and the sock is closed. This work should be implemented in a dellink
>> function, but rxe currently has no dellink function. So the 3rd commit
>> adds a dellink function pointer, and the 4th commit implements the
>> dellink function in rxe.
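
The add/del reference counting described above can be sketched as a
userspace model. Nothing here is taken from the patches: the names are
hypothetical, and TUNNEL_BASE_REFS is an assumed placeholder for "the sock
reference count needed by udp tunnel", whose real value is not given in
the cover letter.

```c
#include <assert.h>
#include <stdbool.h>

#define TUNNEL_BASE_REFS 1   /* assumed: references held by the udp tunnel itself */

struct shared_sock {
    bool open;    /* true while the listener on UDP port 4791 exists */
    int refcnt;   /* tunnel base references plus one per additional rdma link */
};

/* "rdma link add": create the listener for the first link, else take a reference. */
static void model_link_add(struct shared_sock *sk)
{
    if (!sk->open) {
        sk->open = true;
        sk->refcnt = TUNNEL_BASE_REFS;
    } else {
        sk->refcnt++;
    }
}

/* "rdma link del": drop a reference, or close the listener for the last link. */
static void model_link_del(struct shared_sock *sk)
{
    assert(sk->open);
    if (sk->refcnt > TUNNEL_BASE_REFS) {
        sk->refcnt--;
    } else {
        /* refcnt == base: last link, so shut down the tunnel and close. */
        sk->open = false;
        sk->refcnt = 0;
    }
}
```

With two links added and then both deleted, the model keeps the listener
open after the first delete and closes it on the second.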
>>
>> At this point, it is no longer necessary to keep a global variable to
>> store the sock listening on UDP port 4791. This global variable can be
>> replaced entirely by the functions udp4_lib_lookup and
>> udp6_lib_lookup. Because the function udp6_lib_lookup is in the fast
>> path, a member variable l_sk6 is added to store the sock. If l_sk6 is
>> NULL, udp6_lib_lookup is called to look up the sock, and the sock is
>> then stored in l_sk6 so that it can be used directly in the future.
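
The l_sk6 caching described above follows the usual lazy-lookup pattern. A
minimal userspace sketch, under stated assumptions: only the member name
l_sk6 comes from the cover letter; model_lookup6() is a stand-in for
udp6_lib_lookup (whose real arguments are not modelled), and the other
names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct cached_sock { int port; };

/* Hypothetical device state; l_sk6 mirrors the member named in the cover letter. */
struct model_dev {
    struct cached_sock *l_sk6;
};

static int slow_lookups;   /* counts how often the slow lookup runs */
static struct cached_sock ns_listener = { 4791 };

/* Stand-in for udp6_lib_lookup; the real function takes different arguments. */
static struct cached_sock *model_lookup6(void)
{
    slow_lookups++;
    return &ns_listener;
}

/* Fast path: look the sock up once, then reuse the cached pointer. */
static struct cached_sock *model_get_sk6(struct model_dev *dev)
{
    assert(dev != NULL);
    if (dev->l_sk6 == NULL)
        dev->l_sk6 = model_lookup6();
    return dev->l_sk6;
}
```

Calling model_get_sk6() repeatedly on the same device runs the slow lookup
only on the first call. The model ignores invalidation of the cached
pointer when the sock is closed, which real code would have to handle.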
>>
>> All the above work is done in init_net, and it can also work in other
>> net namespaces. So init_net is replaced by the individual net
>> namespace. This is what the 6th commit does. Because an rxe device
>> depends on the net device and the sock listening on UDP port 4791,
>> every rxe device is in exclusive mode in its individual net namespace.
>> Other rdma netns operations will be considered in the future.
>>
>> Test steps:
>> 1) Suppose that 2 NICs are in 2 different net namespaces.
>>
>> # ip netns exec net0 ip link
>> 3: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP
>> link/ether 00:1e:67:a0:22:3f brd ff:ff:ff:ff:ff:ff
>> altname enp5s0
>>
>> # ip netns exec net1 ip link
>> 4: eno3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel
>> link/ether f8:e4:3b:3b:e4:10 brd ff:ff:ff:ff:ff:ff
>>
>> 2) Add an rdma link in each net namespace.
>> net0:
>> ip netns exec net0 rdma link add rxe0 type rxe netdev eno2
>>
>> net1:
>> ip netns exec net1 rdma link add rxe1 type rxe netdev eno3
>>
>> 3) Run rping test.
>> net0
>> # ip netns exec net0 rping -s -a 192.168.2.1 -C 1&
>> [1] 1737
>> # ip netns exec net1 rping -c -a 192.168.2.1 -d -v -C 1
>> verbose
>> count 1
>> ...
>> ping data: rdma-ping-0:
>> ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqr
>> ...
>>
>> 4) Remove the rdma links from the net namespaces.
>> net0:
>> ip netns exec net0 rdma link del rxe0
>> net1:
>> ip netns exec net1 rdma link del rxe1
>>
>> ---
>> V1->V2: Add the explicit initialization of sk6.
>> ---
>> Zhu Yanjun (6):
>> RDMA/rxe: Creating listening sock in newlink function
>> RDMA/rxe: Support more rdma links in init_net
>> RDMA/nldev: Add dellink function pointer
>> RDMA/rxe: Implement dellink in rxe
>> RDMA/rxe: Replace global variable with sock lookup functions
>> RDMA/rxe: add the support of net namespace
Hi Parav Pandit,
I think you are the expert on netns. Could you help review these patches?
Thanks and Regards,
Zhu Yanjun
>
> Gentle ping
>
> Zhu Yanjun
>
>>
>> drivers/infiniband/core/nldev.c | 6 ++
>> drivers/infiniband/sw/rxe/rxe.c | 26 +++++-
>> drivers/infiniband/sw/rxe/rxe_net.c | 129 ++++++++++++++++++++------
>> drivers/infiniband/sw/rxe/rxe_net.h | 9 +-
>> drivers/infiniband/sw/rxe/rxe_verbs.h | 1 +
>> include/rdma/rdma_netlink.h | 2 +
>> 6 files changed, 134 insertions(+), 39 deletions(-)
>>
>