On 9/7/20 9:48 PM, Ka-Cheong Poon wrote:

> This may require a number of changes and the way a client interacts with
> the current RDMA framework.  For example, currently a client registers
> once using one struct ib_client and gets device notifications for all
> namespaces and devices.  Suppose there is rdma_[un]register_net_client(),
> it may need to require a client to use a different struct ib_client to
> register for each net namespace.  And struct ib_client probably needs to
> have a field to store the net namespace.  Probably all those client
> interaction functions will need to be modified.  Since the clients xarray
> is global, more clients may mean performance implication, such as it takes
> longer to go through the whole clients xarray.
>
> There are probably many other subtle changes required.  It may turn out to
> be not so straight forward.  Is this community willing the take such changes?
> I can take a stab at it if the community really thinks that this is preferred.

Attached is a diff of a prototype for the above.  This exercise is to
see what needs to be done to have a more network namespace aware
interface for RDMA client registration.

Currently, there are ib_[un]register_client().  A kernel module uses
this interface to register with the RDMA subsystem.  Under the RDMA
namespace exclusive mode, all RDMA devices are assigned to the init_net
namespace initially.  When a device is assigned to a namespace, the
client's registered remove upcall is called with the device as the
parameter (this is removing from the init_net namespace).  Then the
client's add upcall is called with the device as the parameter (this
is assigning to the new namespace).  When that namespace is removed
(*), a similar sequence of events happens: a remove upcall (removing
from the namespace) is followed by an add upcall (assigning back to
the init_net namespace).

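To make the remove/add sequence above concrete, here is a small
user-space model of it.  This is only a sketch: it is not kernel code
and it is not taken from the attached diff.  All the model_* names and
types are made up for illustration and only stand in for struct net,
struct ib_device and a client registered with ib_register_client().

```c
#include <assert.h>
#include <stdio.h>

#define MODEL_MAX_EVENTS 16

/* Hypothetical stand-ins, NOT the kernel's struct net / struct ib_device. */
struct model_net {
	const char *name;
};

struct model_device {
	const char *name;
	const struct model_net *net;	/* namespace the device is in now */
};

enum model_upcall { MODEL_ADD, MODEL_REMOVE };

struct model_event {
	enum model_upcall kind;
	const struct model_net *net;	/* device's namespace at upcall time */
};

/* Models a client registered once, which hears about every namespace. */
struct model_client {
	struct model_event log[MODEL_MAX_EVENTS];
	int nr_events;
};

/* Record one upcall in the client's log and print it. */
static void model_notify(struct model_client *c, enum model_upcall kind,
			 const struct model_device *dev)
{
	assert(c->nr_events < MODEL_MAX_EVENTS);
	c->log[c->nr_events].kind = kind;
	c->log[c->nr_events].net = dev->net;
	c->nr_events++;
	printf("%s %s (%s)\n", kind == MODEL_ADD ? "add" : "remove",
	       dev->name, dev->net->name);
}

/*
 * Move a device to another namespace: the client first sees a remove
 * upcall (device still in the old namespace), then an add upcall
 * (device now in the new namespace).
 */
static void model_change_net(struct model_client *c,
			     struct model_device *dev,
			     const struct model_net *to)
{
	model_notify(c, MODEL_REMOVE, dev);
	dev->net = to;
	model_notify(c, MODEL_ADD, dev);
}
```

In this model, calling model_change_net() once to move a device from
init_net to a new namespace, and once more to move it back when that
namespace goes away, leaves four entries in the client's log: remove,
add, remove, add.
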
All the RDMA clients are stored in a global struct xarray called
clients (in device.c) and each client is assigned a client ID.

This exercise adds rdma_[un]register_net_client() for those clients
which want to have more separation between different namespaces.  This
interface takes a struct net parameter.  A kernel module uses this to
indicate that it is only interested in the RDMA events related to the
given network namespace.

Suppose a client uses init_net as the parameter.  In the above example,
when a device is assigned to a namespace, only the client's remove
upcall is called (removing from the init_net namespace).  No add upcall
follows.  Then when the namespace is removed, the client's add upcall
is called (re-assigning back to the init_net namespace).

Suppose a client uses a specific namespace as the parameter.  When a
device is assigned to that specific namespace, the client's add upcall
is called.  When the client unregisters with RDMA (or when the
namespace is going away), the client's remove upcall is called.

The RDMA clients are stored in each namespace's struct rdma_dev_net and
each client is assigned a client ID in that namespace (this means that
it is unique only in that namespace but not unique globally among all
namespaces).

This seemingly simple exercise turned out to be not so simple because
of the need to keep the existing interface with the existing behavior.
The behavior changes to what is described above only when a client
uses the new interface.  There should be no change of behavior for any
existing RDMA client.

There are several obstacles to overcome for this change.  One
difficulty is the global client ID, since a lot of code relies on this
ID as an index into both the global clients xarray and each individual
device's client_data xarray.  Detailed changes are in the attached
diff if folks are interested.

Note that the new interface has one obvious issue: it does not make
much sense in RDMA shared network namespace mode.
In the shared mode, all devices are associated with init_net.  So if a
client uses the new interface to register for a specific namespace
other than init_net, it will never get any upcall.

This and the difficulties in adding a seemingly simple interface make
me wonder about the following questions.  Is the RDMA shared namespace
mode the preferred mode to use as it is the default mode?  Is it
expected that a client knows the running mode before interacting with
the RDMA subsystem?  Is a client not supposed to differentiate
different namespaces?

Besides the current add client upcall, another example related to this
is event handling.  Suppose a client calls rdma_create_id() to create
listeners in different namespaces but with the same event handler.  A
new connection comes in and the event handler is called for an
RDMA_CM_EVENT_CONNECT_REQUEST event.  There is no obvious namespace
info regarding the event.  It seems that the only way to find out the
namespace info is to use the context of struct rdma_cm_id.  The client
must somehow add the namespace info to the context since the subsystem
does not provide any help.  Is this the assumed solution?

BTW, this exercise still does not remove the need to have
rdma_dev_to_netns() as the add upcall does not provide any namespace
info.

Given all these questions, rdma_[un]register_net_client() does not
seem to fit the current way of interacting with the RDMA subsystem,
unfortunately.

Thanks.


(*) Note that __rdma_create_id() does a get_net(net) to take a
reference on a namespace.  Suppose a kernel module calls
rdma_create_id() in its namespace .init function to create an RDMA
listener and calls rdma_destroy_id() in its namespace .exit function
to destroy it.  Since __rdma_create_id() adds a reference to a
namespace, when a sys admin deletes a namespace (say `ip netns del
...`), the namespace won't be deleted because of this reference.
And the module will not release this reference until its .exit
function is called, which happens only when the namespace is deleted.
To resolve this issue, in the diff (in __rdma_create_id()), I did
something similar to the kern check in sk_alloc().

--
K. Poon
ka-cheong.poon@oracle.com