linux-rdma.vger.kernel.org archive mirror
From: Yanjun Zhu <yanjun.zhu@linux.dev>
To: Yanjun Zhu <yanjun.zhu@linux.dev>,
	jgg@ziepe.ca, leon@kernel.org, zyjzyj2000@gmail.com,
	linux-rdma@vger.kernel.org, netdev@vger.kernel.org,
	davem@davemloft.net, Parav Pandit <parav@nvidia.com>
Cc: Zhu Yanjun <yanjun.zhu@intel.com>
Subject: Re: [PATCHv2 0/6] Fix the problem that rxe can not work in net
Date: Fri, 11 Nov 2022 10:36:39 +0800
Message-ID: <25767d73-c7fc-4831-4a45-337764430fe7@linux.dev>
In-Reply-To: <204f1ef4-77b1-7d4b-4953-00a99ce83be4@linux.dev>

On 2022/10/19 22:56, Yanjun Zhu wrote:
> On 2022/10/6 16:59, yanjun.zhu@linux.dev wrote:
>> From: Zhu Yanjun <yanjun.zhu@intel.com>
>>
>> When the "rdma link add" command is run to add an rxe rdma link in a
>> net namespace, this rxe rdma link normally does not work in that
>> net namespace.
>>
>> The root cause is that the sock listening on udp port 4791 is created
>> in init_net when the rdma_rxe module is loaded into the kernel, so
>> other net namespaces cannot easily use this sock.
>>
>> The following commits solve this problem.
>>
>> The first commit moves the creation of the sock listening on udp port
>> 4791 from the module_init function to the rdma link creation function.
>> That is, the sock is no longer created when the rdma_rxe module is
>> loaded; it is created when the "rdma link add ..." command is run. So
>> when an rdma link is created in a net namespace, the sock is created
>> in that net namespace.
>>
>> In the second commit, the functions udp4_lib_lookup and udp6_lib_lookup
>> check whether the sock already exists in the net namespace. If it
>> does, the rdma link increases the reference count of this sock and
>> carries on, instead of creating a new sock to listen on udp port 4791.
>> Since the network notifier is global, it is registered when the
>> rdma_rxe module is loaded.
>>
>> After the rdma link is created, the "rdma link del" command deletes
>> the rdma link and checks the sock at the same time. If the reference
>> count of this sock is greater than the reference count needed by the
>> udp tunnel, the reference count is decreased by one. If they are
>> equal, this rdma link is the last one, so the udp tunnel is shut down
>> and the sock is closed. This work belongs in a dellink function, but
>> rxe currently has none. So the 3rd commit adds a dellink function
>> pointer, and the 4th commit implements the dellink function in rxe.
>>
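
For illustration only, the add and delete paths described in the two
paragraphs above can be modeled in plain userspace C. This is a sketch,
not code from the patches: model_sock, model_netns, model_lookup and
BASE_REFS are hypothetical names, model_lookup merely stands in for
udp4_lib_lookup/udp6_lib_lookup, and the baseline of 1 is an assumed
value for "the reference count needed by the udp tunnel".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names come from the patches. */

#define BASE_REFS 1 /* references held by the udp tunnel itself (assumed) */

struct model_sock {
	int refcnt; /* BASE_REFS plus one per additional rdma link */
	bool open;  /* true while the udp 4791 listener exists */
};

struct model_netns {
	struct model_sock sk; /* the listener owned by this net namespace */
};

/* Stand-in for the lookup functions: NULL if the namespace has no listener. */
static struct model_sock *model_lookup(struct model_netns *ns)
{
	return ns->sk.open ? &ns->sk : NULL;
}

/* "rdma link add": reuse the namespace's listener or create it. */
static void model_link_add(struct model_netns *ns)
{
	struct model_sock *sk = model_lookup(ns);

	if (sk) {
		sk->refcnt++; /* listener exists: take one more reference */
		return;
	}
	ns->sk.open = true; /* first link: create the listener here */
	ns->sk.refcnt = BASE_REFS;
}

/* "rdma link del": drop a reference, or close on the last link. */
static void model_link_del(struct model_netns *ns)
{
	struct model_sock *sk = model_lookup(ns);

	if (!sk)
		return;
	if (sk->refcnt > BASE_REFS) {
		sk->refcnt--; /* other links still use the listener */
	} else {
		sk->open = false; /* last link: shut down and close */
		sk->refcnt = 0;
	}
}
```

With two links added in one namespace the count sits one above the
baseline; the first delete only drops a reference and the second closes
the listener, while a second namespace keeps its own independent sock.
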
>> At this point, it is no longer necessary to keep a global variable to
>> store the sock listening on udp port 4791. This global variable can be
>> replaced entirely by the functions udp4_lib_lookup and
>> udp6_lib_lookup. Because udp6_lib_lookup is called in the fast path,
>> a member variable l_sk6 is added to store the sock. If l_sk6 is NULL,
>> udp6_lib_lookup is called to look up the sock and the result is
>> stored in l_sk6, so that it can be used directly afterwards.
>>
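
The l_sk6 caching described above amounts to a look-up-once pattern. A
minimal userspace sketch follows, under stated assumptions: only the
member name l_sk6 comes from the cover letter, while model_dev,
model_sock, model_slow_lookup and the counter are hypothetical, and
model_slow_lookup merely stands in for udp6_lib_lookup.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; only the name l_sk6 is from the cover letter. */

struct model_sock {
	int port;
};

static struct model_sock model_listener = { 4791 };
static int model_slow_lookups; /* how many times the slow lookup ran */

/* Stand-in for udp6_lib_lookup(): the lookup we want to avoid repeating. */
static struct model_sock *model_slow_lookup(void)
{
	model_slow_lookups++;
	return &model_listener;
}

struct model_dev {
	struct model_sock *l_sk6; /* cached listener, NULL until first use */
};

/* Fast-path accessor: look the sock up once, then reuse the cached pointer. */
static struct model_sock *model_get_sk6(struct model_dev *dev)
{
	if (!dev->l_sk6)
		dev->l_sk6 = model_slow_lookup();
	return dev->l_sk6;
}
```

Repeated calls to model_get_sk6() on the same device trigger the slow
lookup only once; this is also why the cached pointer has to start out
as NULL.
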
>> All of the above work is done in init_net, but it works equally well
>> in any net namespace. So init_net is replaced by the individual net
>> namespace. This is what the 6th commit does. Because an rxe device
>> depends on the net device and on the sock listening on udp port 4791,
>> every rxe device is in exclusive mode in its net namespace. Other
>> rdma netns operations will be considered in the future.
>>
>> Test steps:
>> 1) Suppose that 2 NICs are in 2 different net namespaces.
>>
>>   # ip netns exec net0 ip link
>>   3: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP
>>      link/ether 00:1e:67:a0:22:3f brd ff:ff:ff:ff:ff:ff
>>      altname enp5s0
>>
>>   # ip netns exec net1 ip link
>>   4: eno3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel
>>      link/ether f8:e4:3b:3b:e4:10 brd ff:ff:ff:ff:ff:ff
>>
>> 2) Add an rdma link in each net namespace.
>>     net0:
>>     ip netns exec net0 rdma link add rxe0 type rxe netdev eno2
>>
>>     net1:
>>     ip netns exec net1 rdma link add rxe1 type rxe netdev eno3
>>
>> 3) Run rping test.
>>     net0
>>     # ip netns exec net0 rping -s -a 192.168.2.1 -C 1&
>>     [1] 1737
>>     # ip netns exec net1 rping -c -a 192.168.2.1 -d -v -C 1
>>     verbose
>>     count 1
>>     ...
>>     ping data: rdma-ping-0: ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqr
>>     ...
>>
>> 4) Remove the rdma links from the net namespaces.
>>     net0:
>>     ip netns exec net0 rdma link del rxe0
>>     net1:
>>     ip netns exec net1 rdma link del rxe1
>>
>> ---
>> V1->V2: Add the explicit initialization of sk6.
>> ---
>> Zhu Yanjun (6):
>>    RDMA/rxe: Creating listening sock in newlink function
>>    RDMA/rxe: Support more rdma links in init_net
>>    RDMA/nldev: Add dellink function pointer
>>    RDMA/rxe: Implement dellink in rxe
>>    RDMA/rxe: Replace global variable with sock lookup functions
>>    RDMA/rxe: add the support of net namespace

Hi Parav,

I think you are the netns expert. Could you help review these patches?

Thanks and Regards,
Zhu Yanjun

> 
> Gentle ping
> 
> Zhu Yanjun
> 
>>
>>   drivers/infiniband/core/nldev.c       |   6 ++
>>   drivers/infiniband/sw/rxe/rxe.c       |  26 +++++-
>>   drivers/infiniband/sw/rxe/rxe_net.c   | 129 ++++++++++++++++++++------
>>   drivers/infiniband/sw/rxe/rxe_net.h   |   9 +-
>>   drivers/infiniband/sw/rxe/rxe_verbs.h |   1 +
>>   include/rdma/rdma_netlink.h           |   2 +
>>   6 files changed, 134 insertions(+), 39 deletions(-)
>>
> 


Thread overview: 14+ messages
2022-10-06  8:59 [PATCHv2 0/6] Fix the problem that rxe can not work in net yanjun.zhu
2022-10-06  8:59 ` [PATCHv2 1/6] RDMA/rxe: Creating listening sock in newlink function yanjun.zhu
2022-10-06  8:59 ` [PATCHv2 2/6] RDMA/rxe: Support more rdma links in init_net yanjun.zhu
2022-10-06  8:59 ` [PATCHv2 3/6] RDMA/nldev: Add dellink function pointer yanjun.zhu
2022-10-06  8:59 ` [PATCHv2 4/6] RDMA/rxe: Implement dellink in rxe yanjun.zhu
2022-10-06  8:59 ` [PATCHv2 5/6] RDMA/rxe: Replace global variable with sock lookup functions yanjun.zhu
2022-10-06  8:59 ` [PATCHv2 6/6] RDMA/rxe: add the support of net namespace yanjun.zhu
2022-10-19 14:56 ` [PATCHv2 0/6] Fix the problem that rxe can not work in net Yanjun Zhu
2022-11-11  2:36   ` Yanjun Zhu [this message]
2022-11-11  3:35     ` Parav Pandit
2022-11-11  3:38       ` Yanjun Zhu
2022-11-13  4:58         ` Parav Pandit
2022-11-13 10:25           ` Yanjun Zhu
2023-02-13 12:00           ` Zhu Yanjun
