From: yanjun.zhu@linux.dev
To: jgg@ziepe.ca, leon@kernel.org, zyjzyj2000@gmail.com,
	linux-rdma@vger.kernel.org, yanjun.zhu@linux.dev
Subject: [PATCH 0/6] Fix the problem that rxe can not work in net
Date: Sun,  2 Oct 2022 00:41:46 -0400
Message-ID: <20221002044152.933021-1-yanjun.zhu@linux.dev>

From: Zhu Yanjun <yanjun.zhu@linux.dev>

When the "rdma link add" command is run to add an rxe rdma link in a net
namespace, the resulting rxe rdma link normally does not work in that
net namespace.

The root cause is that the sock listening on udp port 4791 is created
in init_net when the rdma_rxe module is loaded into the kernel. It is
difficult for other net namespaces to use this sock.

The following commits will solve this problem.

The first commit moves the creation of the sock listening on udp port
4791 from the module_init function to the rdma link creation function.
That is, the sock is no longer created when the module rdma_rxe is
loaded. It is created when the "rdma link add ..." command is run. So
when an rdma link is created in a net namespace, the sock is created in
that net namespace.
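
The effect of moving the sock creation can be illustrated with a small
userspace model. This is only a sketch, not the rxe code: the struct and
all model_* functions are hypothetical names invented here, and a
"listener" is reduced to a boolean per namespace.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a net namespace: it only records whether a
 * listener for udp port 4791 exists in it. */
struct model_netns {
	const char *name;
	bool has_listener;
};

/* Old behaviour: loading the module creates the listener in init_net only. */
static void model_module_init_old(struct model_netns *init_net)
{
	init_net->has_listener = true;
}

/* New behaviour: loading the module creates no listener at all. */
static void model_module_init_new(struct model_netns *init_net)
{
	(void)init_net;
}

/* New behaviour: adding a link creates the listener in the namespace
 * the link is added in. */
static void model_newlink(struct model_netns *ns)
{
	ns->has_listener = true;
}

/* In this model a link can receive only if its own namespace has a listener. */
static bool model_link_can_receive(const struct model_netns *ns)
{
	return ns->has_listener;
}
```

In this model, calling model_module_init_old() leaves every namespace
other than init_net without a listener, while model_newlink() gives each
namespace its own.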

In the second commit, the functions udp4_lib_lookup and udp6_lib_lookup
are used to check whether the sock already exists in the net namespace.
If it does, the rdma link increases the reference count of this sock and
continues with its other jobs instead of creating a new sock to listen
on udp port 4791. Since the network notifier is global, it is registered
when the module rdma_rxe is loaded.
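
The lookup-or-create step described above can be sketched as a userspace
model. The lk_* names are hypothetical and lk_lookup() is only a stand-in
for udp4_lib_lookup/udp6_lib_lookup; the real kernel functions take
addresses and ports and have their own reference semantics, which this
sketch does not try to reproduce.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a listening sock: a reference count and a flag. */
struct lk_sock {
	int refcnt;
	int in_use;	/* non-zero once the sock has been "created" */
};

/* Hypothetical model of a namespace with at most one port-4791 listener. */
struct lk_netns {
	struct lk_sock listener;
};

/* Stand-in for the lookup functions: return the listener, or NULL if none. */
static struct lk_sock *lk_lookup(struct lk_netns *ns)
{
	return ns->listener.in_use ? &ns->listener : NULL;
}

/* Called once per new rdma link: reuse the listener or create it. */
static struct lk_sock *lk_get_listener(struct lk_netns *ns)
{
	struct lk_sock *sk = lk_lookup(ns);

	if (sk) {
		sk->refcnt++;		/* sock exists: take one more reference */
		return sk;
	}
	ns->listener.in_use = 1;	/* first link: create the sock */
	ns->listener.refcnt = 1;
	return &ns->listener;
}
```

Adding a second link in the same namespace therefore returns the same
sock with a higher reference count rather than a second listener.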

After the rdma link is created, the command "rdma link del" deletes the
rdma link and, at the same time, checks the sock. If the reference
count of this sock is greater than the sock reference count needed by
the udp tunnel, the sock reference count is decreased by one. If the two
are equal, this rdma link is the last one, so the udp tunnel is shut
down and the sock is closed. This work should be implemented in a
dellink function, but rxe currently has no dellink function. So the 3rd
commit adds a dellink function pointer, and the 4th commit implements
the dellink function in rxe.
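
The reference count check on link deletion can be sketched as follows.
This is a userspace model under stated assumptions: the dl_* names are
hypothetical, and DL_TUNNEL_BASE_REFS (the number of references the udp
tunnel itself needs) is set to 1 purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption for illustration: references needed by the udp tunnel itself. */
#define DL_TUNNEL_BASE_REFS 1

/* Hypothetical model of the listening sock. */
struct dl_sock {
	int refcnt;
	bool open;
};

/* Model of the sock handling on link deletion.
 * Returns true if this call closed the sock. */
static bool dl_put_listener(struct dl_sock *sk)
{
	assert(sk->open && sk->refcnt >= DL_TUNNEL_BASE_REFS);

	if (sk->refcnt > DL_TUNNEL_BASE_REFS) {
		sk->refcnt--;	/* other links still use the sock */
		return false;
	}
	/* refcnt equals the base count: this was the last link */
	sk->open = false;	/* stands in for tunnel shutdown and sock close */
	sk->refcnt = 0;
	return true;
}
```

With two links on one sock, the first deletion only drops a reference
and the second one closes the sock.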

At this point, it is no longer necessary to keep a global variable to
store the sock listening on udp port 4791. This global variable can be
replaced entirely by the functions udp4_lib_lookup and udp6_lib_lookup.
This is what the 5th commit does. Because the function udp6_lib_lookup
is in the fast path, a member variable l_sk6 is added to store the sock.
If l_sk6 is NULL, udp6_lib_lookup is called to look up the sock, and the
sock is then stored in l_sk6 so that it can be used directly afterwards.
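
The lazy caching of the sock in l_sk6 can be sketched like this. It is a
userspace model only: apart from the member name l_sk6, every name is
hypothetical, and cache_slow_lookup() merely stands in for
udp6_lib_lookup.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a sock. */
struct cache_sock {
	int id;
};

/* Hypothetical per-device state; l_sk6 mirrors the member named above. */
struct cache_dev {
	struct cache_sock *l_sk6;
	int lookups;	/* counts slow lookups, for illustration only */
};

static struct cache_sock cache_the_sock = { 6 };

/* Stand-in for udp6_lib_lookup: the lookup we want to avoid repeating. */
static struct cache_sock *cache_slow_lookup(struct cache_dev *dev)
{
	dev->lookups++;
	return &cache_the_sock;
}

/* Fast-path accessor: look up once, then reuse the cached pointer. */
static struct cache_sock *cache_get_sk6(struct cache_dev *dev)
{
	if (!dev->l_sk6)
		dev->l_sk6 = cache_slow_lookup(dev);
	return dev->l_sk6;
}
```

Repeated calls to cache_get_sk6() on the same device perform the slow
lookup only once in this model.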

All the above work is done in init_net, and it also works in an
individual net namespace. So init_net is replaced by the individual net
namespace. This is what the 6th commit does. Because an rxe device
depends on the net device and on the sock listening on udp port 4791,
every rxe device is in exclusive mode in its individual net namespace.
Other rdma netns operations will be considered in the future.

Test steps:
1) Suppose that 2 NICs are in 2 different net namespaces.

 # ip netns exec net0 ip link
 3: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP
    link/ether 00:1e:67:a0:22:3f brd ff:ff:ff:ff:ff:ff
    altname enp5s0

 # ip netns exec net1 ip link
 4: eno3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel
    link/ether f8:e4:3b:3b:e4:10 brd ff:ff:ff:ff:ff:ff

2) Add an rdma link in each net namespace.
   net0:
   ip netns exec net0 rdma link add rxe0 type rxe netdev eno2

   net1:
   ip netns exec net1 rdma link add rxe1 type rxe netdev eno3

3) Run rping test.
   net0
   # ip netns exec net0 rping -s -a 192.168.2.1 -C 1&
   [1] 1737
   # ip netns exec net1 rping -c -a 192.168.2.1 -d -v -C 1
   verbose
   count 1
   ...
   ping data: rdma-ping-0: ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqr
   ...

4) Remove the rdma links from the net namespaces.
   net0:
   ip netns exec net0 rdma link del rxe0
   net1:
   ip netns exec net1 rdma link del rxe1

Zhu Yanjun (6):
  RDMA/rxe: Creating listening sock in newlink function
  RDMA/rxe: Support more rdma links in init_net
  RDMA/nldev: Add dellink function pointer
  RDMA/rxe: Implement dellink in rxe
  RDMA/rxe: Replace global variable with sock lookup functions
  RDMA/rxe: add the support of net namespace

 drivers/infiniband/core/nldev.c       |   6 ++
 drivers/infiniband/sw/rxe/rxe.c       |  27 +++++-
 drivers/infiniband/sw/rxe/rxe_net.c   | 131 ++++++++++++++++++++------
 drivers/infiniband/sw/rxe/rxe_net.h   |   9 +-
 drivers/infiniband/sw/rxe/rxe_verbs.h |   1 +
 include/rdma/rdma_netlink.h           |   2 +
 6 files changed, 135 insertions(+), 41 deletions(-)

-- 
2.25.1

