From: Jason Gunthorpe <jgg@nvidia.com>
To: Mark Zhang <markzhang@nvidia.com>
Cc: Leon Romanovsky <leon@kernel.org>,
	Doug Ledford <dledford@redhat.com>,
	linux-rdma@vger.kernel.org
Subject: Re: [PATCH rdma-next v2 7/9] IB/cm: Clear all associated AV's ports when remove a cm device
Date: Fri, 23 Apr 2021 11:24:30 -0300
Message-ID: <20210423142430.GI1370958@nvidia.com>
In-Reply-To: <2eee42c7-04aa-eea1-f8a1-debf700ad0b0@nvidia.com>

On Fri, Apr 23, 2021 at 09:14:21PM +0800, Mark Zhang wrote:
> 
> 
> On 4/23/2021 3:34 AM, Jason Gunthorpe wrote:
> > On Wed, Apr 21, 2021 at 02:40:37PM +0300, Leon Romanovsky wrote:
> > > @@ -4396,6 +4439,14 @@ static void cm_remove_one(struct ib_device *ib_device, void *client_data)
> > >   	cm_dev->going_down = 1;
> > >   	spin_unlock_irq(&cm.lock);
> > > +	list_for_each_entry_safe(cm_id_priv, tmp,
> > > +				 &cm_dev->cm_id_priv_list, cm_dev_list) {
> > > +		if (!list_empty(&cm_id_priv->cm_dev_list))
> > > +			list_del(&cm_id_priv->cm_dev_list);
> > > +		cm_id_priv->av.port = NULL;
> > > +		cm_id_priv->alt_av.port = NULL;
> > > +	}
> > 
> > Ugh, this is in the wrong order, it has to be after the work queue
> > flush..
> > 
> > Hurm, I didn't see an easy way to fix it up, but I did think of a much
> > better design!
> > 
> > Generally speaking all we need is the memory of the cm_dev and port to
> > remain active, we don't need to block or fence with cm_remove_one(),
> > so just stick a memory kref on this thing and keep the memory. The
> > only things that need to serialize with cm_remove_one() are on the
> > workqueue or take a spinlock (eg because they touch mad_agent)
> > 
> > Try this, I didn't finish every detail, applies on top of your series,
> > but you'll need to reflow it into new commits:
> 
> Thanks Jason, I think we still need a rwlock to protect "av->port"? It is
> modified and cleared by cm_set_av_port() and read in many places.

Hum..

This is a real mess.

It looks to me like any access to av->port should always be
protected by cm_id_priv->lock.

Most already are, but the sets are wrong and a couple of readers are wrong.
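
To make the rule concrete, here is a minimal userspace sketch, not the
actual cm.c code. It assumes a pthread mutex standing in for the
cm_id_priv->lock spinlock, and every demo_* name is hypothetical. Both
the set and the read of the port pointer happen with the lock held:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's cm_port and cm_id_private. */
struct demo_port {
	int port_num;
};

struct demo_av {
	struct demo_port *port;	/* protected by demo_cm_id.lock */
};

struct demo_cm_id {
	pthread_mutex_t lock;	/* stands in for cm_id_priv->lock */
	struct demo_av av;
};

static void demo_cm_id_init(struct demo_cm_id *id)
{
	assert(id != NULL);
	pthread_mutex_init(&id->lock, NULL);
	id->av.port = NULL;
}

/* Writer: every set or clear of av.port happens with the lock held. */
static void demo_set_av_port(struct demo_cm_id *id, struct demo_port *port)
{
	assert(id != NULL);
	pthread_mutex_lock(&id->lock);
	id->av.port = port;
	pthread_mutex_unlock(&id->lock);
}

/*
 * Reader: returns the port number, or -1 if the port has been cleared.
 * The pointer is loaded and dereferenced under the same lock hold, so
 * a concurrent clear cannot slip in between the check and the use.
 */
static int demo_read_port_num(struct demo_cm_id *id)
{
	int num = -1;

	assert(id != NULL);
	pthread_mutex_lock(&id->lock);
	if (id->av.port)
		num = id->av.port->port_num;
	pthread_mutex_unlock(&id->lock);
	return num;
}
```

The lock only covers the pointer itself. Keeping the memory behind the
pointer alive is a separate problem and is not handled in this sketch.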

Set reverse call chains:

cm_init_av_for_lap()
 cm_lap_handler(work) (ok)

cm_init_av_for_response()
 cm_req_handler(work) (OK, cm_id_priv is on stack)
 cm_sidr_req_handler(work) (OK, cm_id_priv is on stack)

cm_init_av_by_path()
 cm_req_handler(work) (OK, cm_id_priv is on stack)
 cm_lap_handler(work) (OK)
 ib_send_cm_req() (not locked)
  cma_connect_ib()
   rdma_connect_locked()
    [..]
  ipoib_cm_send_req()
  srp_send_req()
   srp_connect_ch()
    [..]
 ib_send_cm_sidr_req() (not locked)
  cma_resolve_ib_udp()
   rdma_connect_locked()

And
  cm_destroy_id() (locked)

And read reverse call chains:

cm_alloc_msg()
 ib_send_cm_req() (not locked)
 ib_send_cm_rep() (OK)
 ib_send_cm_rtu() (OK)
 cm_send_dreq_locked() (OK)
 cm_send_drep_locked() (OK)
 cm_send_rej_locked() (OK)
 ib_send_cm_mra() (OK)
 ib_send_cm_sidr_req() (not locked)
 cm_send_sidr_rep_locked() (OK)
cm_form_tid()
 cm_format_req()
  ib_send_cm_req() (sort of OK)
 cm_format_dreq()
  cm_send_dreq_locked() (OK)
cm_format_req()
 ib_send_cm_req() (sort of OK)
cm_format_req_event()
 cm_req_handler() (OK, cm_id_priv is on stack)
cm_format_rep()
 ib_send_cm_rep() (OK)
cm_rep_handler(work) (OK)
cm_establish_handler(work) (OK)
cm_rtu_handler(work) (OK)
cm_send_dreq_locked() (OK)
cm_dreq_handler(work) (OK)
cm_drep_handler(work) (OK)
cm_rej_handler(work) (OK)
cm_mra_handler(work) (OK)
cm_apr_handler(work) (OK)
cm_sidr_rep_handler(work) (OK)
cm_init_qp_init_attr() (OK)
cm_init_qp_rtr_attr() (OK)
cm_init_qp_rts_attr() (OK)

So.. That leaves these functions that are not obviously locked
correctly:
 ib_send_cm_req()
 ib_send_cm_sidr_req()

Their locking only works because they assume there are no parallel
touches to the cm_id. However, I'm doubtful this is completely true.

So no new lock is needed, but something should be done about the above
two functions, and the locking rule should be documented.
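
One way the rule could be documented in code is sketched below as a
userspace analogue of the kernel's lockdep_assert_held(). This is an
illustration under assumptions, not a proposed patch: the sketch_*
names are hypothetical, a pthread mutex stands in for the spinlock, and
the lock_held flag only records that some thread holds the lock, not
which one.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for cm_id_private. */
struct sketch_cm_id {
	pthread_mutex_t lock;
	bool lock_held;		/* debug aid: true while lock is held */
	void *av_port;		/* protected by lock */
};

static void sketch_lock(struct sketch_cm_id *id)
{
	pthread_mutex_lock(&id->lock);
	id->lock_held = true;
}

static void sketch_unlock(struct sketch_cm_id *id)
{
	id->lock_held = false;
	pthread_mutex_unlock(&id->lock);
}

/* Caller must hold id->lock; the assert documents and checks the rule. */
static void *sketch_get_port_locked(struct sketch_cm_id *id)
{
	assert(id->lock_held);
	return id->av_port;
}

/*
 * Unlocked entry point, in the role of ib_send_cm_req(): it takes the
 * lock itself before touching av_port instead of assuming that nobody
 * else is working on the same id.
 */
static bool sketch_send_req(struct sketch_cm_id *id)
{
	bool have_port;

	sketch_lock(id);
	have_port = sketch_get_port_locked(id) != NULL;
	sketch_unlock(id);
	return have_port;
}
```

With this shape, a reader that forgets the lock trips the assert in
testing instead of racing silently.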

Jason
