All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH rdma-next 00/15] Fix locking around cm_id.state in the ib_cm
@ 2020-03-10  9:25 Leon Romanovsky
  2020-03-10  9:25 ` [PATCH rdma-next 01/15] RDMA/cm: Fix ordering of xa_alloc_cyclic() in ib_create_cm_id() Leon Romanovsky
                   ` (15 more replies)
  0 siblings, 16 replies; 17+ messages in thread
From: Leon Romanovsky @ 2020-03-10  9:25 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, Daniel Jurgens, Haggai Eran, linux-kernel,
	linux-rdma, Sean Hefty

From: Leon Romanovsky <leonro@mellanox.com>

From Jason:

cm_id.state is a non-atomic value that must always be read and written
under lock, or while the thread has the only pointer to the cm_id.

Critically, during MAD handling the cm_id.state is used to control when
MAD handlers can run, and in turn what data they can touch. Without
locking, an assignment to state can immediately allow concurrent MAD
handlers to execute, potentially creating a mess.

Several of these cases only risk load/store tearing, but create very
confusing code. For instance changing the state from IB_CM_IDLE to
IB_CM_LISTEN doesn't allow any MAD handlers to run in either state, but a
superficial audit would suggest that it is not locked properly.

This loose methodology has allowed two bugs to creep in. After creating an
ID the code did not lock the state transition, apparently mistakenly
assuming that the new ID could not be used concurrently. However, the ID
is immediately placed in the xarray and so a carefully crafted network
sequence could trigger races with the unlocked stores.

The main solution to many of these problems is to use the xarray to create
a two stage add - the first reserves the ID and the second publishes the
pointer. The second stage is either omitted entirely or moved after the
newly created ID is setup.

Where it is trivial to do so other places directly put the state
manipulation under lock, or add an assertion that it is, in fact, under
lock.

This also removes a number of places where the state is being read under
lock, then the lock dropped, reacquired and state tested again.

There remain other issues related to missing locking on cm_id data.

Thanks

------------------------------------------------------------------------
It is based on rdma-next + rdma-rc patch c14dfddbd869
("RMDA/cm: Fix missing ib_cm_destroy_id() in ib_cm_insert_listen()")

Jason Gunthorpe (15):
  RDMA/cm: Fix ordering of xa_alloc_cyclic() in ib_create_cm_id()
  RDMA/cm: Fix checking for allowed duplicate listens
  RDMA/cm: Remove a race freeing timewait_info
  RDMA/cm: Make the destroy_id flow more robust
  RDMA/cm: Simplify establishing a listen cm_id
  RDMA/cm: Read id.state under lock when doing pr_debug()
  RDMA/cm: Make it clear that there is no concurrency in
    cm_sidr_req_handler()
  RDMA/cm: Make it clearer how concurrency works in cm_req_handler()
  RDMA/cm: Add missing locking around id.state in cm_dup_req_handler
  RDMA/cm: Add some lockdep assertions for cm_id_priv->lock
  RDMA/cm: Allow ib_send_cm_dreq() to be done under lock
  RDMA/cm: Allow ib_send_cm_drep() to be done under lock
  RDMA/cm: Allow ib_send_cm_rej() to be done under lock
  RDMA/cm: Allow ib_send_cm_sidr_rep() to be done under lock
  RDMA/cm: Make sure the cm_id is in the IB_CM_IDLE state in destroy

 drivers/infiniband/core/cm.c | 732 ++++++++++++++++++++---------------
 1 file changed, 420 insertions(+), 312 deletions(-)

--
2.24.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2020-03-17 20:15 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-10  9:25 [PATCH rdma-next 00/15] Fix locking around cm_id.state in the ib_cm Leon Romanovsky
2020-03-10  9:25 ` [PATCH rdma-next 01/15] RDMA/cm: Fix ordering of xa_alloc_cyclic() in ib_create_cm_id() Leon Romanovsky
2020-03-10  9:25 ` [PATCH rdma-next 02/15] RDMA/cm: Fix checking for allowed duplicate listens Leon Romanovsky
2020-03-10  9:25 ` [PATCH rdma-next 03/15] RDMA/cm: Remove a race freeing timewait_info Leon Romanovsky
2020-03-10  9:25 ` [PATCH rdma-next 04/15] RDMA/cm: Make the destroy_id flow more robust Leon Romanovsky
2020-03-10  9:25 ` [PATCH rdma-next 05/15] RDMA/cm: Simplify establishing a listen cm_id Leon Romanovsky
2020-03-10  9:25 ` [PATCH rdma-next 06/15] RDMA/cm: Read id.state under lock when doing pr_debug() Leon Romanovsky
2020-03-10  9:25 ` [PATCH rdma-next 07/15] RDMA/cm: Make it clear that there is no concurrency in cm_sidr_req_handler() Leon Romanovsky
2020-03-10  9:25 ` [PATCH rdma-next 08/15] RDMA/cm: Make it clearer how concurrency works in cm_req_handler() Leon Romanovsky
2020-03-10  9:25 ` [PATCH rdma-next 09/15] RDMA/cm: Add missing locking around id.state in cm_dup_req_handler Leon Romanovsky
2020-03-10  9:25 ` [PATCH rdma-next 10/15] RDMA/cm: Add some lockdep assertions for cm_id_priv->lock Leon Romanovsky
2020-03-10  9:25 ` [PATCH rdma-next 11/15] RDMA/cm: Allow ib_send_cm_dreq() to be done under lock Leon Romanovsky
2020-03-10  9:25 ` [PATCH rdma-next 12/15] RDMA/cm: Allow ib_send_cm_drep() " Leon Romanovsky
2020-03-10  9:25 ` [PATCH rdma-next 13/15] RDMA/cm: Allow ib_send_cm_rej() " Leon Romanovsky
2020-03-10  9:25 ` [PATCH rdma-next 14/15] RDMA/cm: Allow ib_send_cm_sidr_rep() " Leon Romanovsky
2020-03-10  9:25 ` [PATCH rdma-next 15/15] RDMA/cm: Make sure the cm_id is in the IB_CM_IDLE state in destroy Leon Romanovsky
2020-03-17 20:15 ` [PATCH rdma-next 00/15] Fix locking around cm_id.state in the ib_cm Jason Gunthorpe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.