Linux-RDMA Archive on lore.kernel.org
 help / color / Atom feed
From: Jason Gunthorpe <jgg@ziepe.ca>
To: Leon Romanovsky <leon@kernel.org>
Cc: Doug Ledford <dledford@redhat.com>,
	RDMA mailing list <linux-rdma@vger.kernel.org>,
	Daniel Jurgens <danielj@mellanox.com>,
	Erez Shitrit <erezsh@mellanox.com>,
	Maor Gottlieb <maorg@mellanox.com>,
	Michael Guralnik <michaelgur@mellanox.com>,
	Moni Shoua <monis@mellanox.com>,
	Parav Pandit <parav@mellanox.com>,
	Sean Hefty <sean.hefty@intel.com>,
	Valentine Fatiev <valentinef@mellanox.com>,
	Yishai Hadas <yishaih@mellanox.com>,
	Yonatan Cohen <yonatanc@mellanox.com>,
	Zhu Yanjun <yanjunz@mellanox.com>
Subject: Re: [PATCH rdma-rc 4/9] IB/ipoib: Fix double free of skb in case of multicast traffic in CM mode
Date: Thu, 13 Feb 2020 15:09:29 -0400
Message-ID: <20200213190929.GN31668@ziepe.ca> (raw)
In-Reply-To: <20200213183622.GH679970@unreal>

On Thu, Feb 13, 2020 at 08:36:22PM +0200, Leon Romanovsky wrote:
> On Thu, Feb 13, 2020 at 02:26:55PM -0400, Jason Gunthorpe wrote:
> > On Thu, Feb 13, 2020 at 08:10:12PM +0200, Leon Romanovsky wrote:
> > > On Thu, Feb 13, 2020 at 11:37:43AM -0400, Jason Gunthorpe wrote:
> > > > On Wed, Feb 12, 2020 at 09:26:30AM +0200, Leon Romanovsky wrote:
> > > >
> > > > > diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
> > > > > index 2aa3457a30ce..c614cb87d09b 100644
> > > > > +++ b/drivers/infiniband/ulp/ipoib/ipoib.h
> > > > > @@ -379,6 +379,7 @@ struct ipoib_dev_priv {
> > > > >  	struct ipoib_tx_buf *tx_ring;
> > > > >  	unsigned int	     tx_head;
> > > > >  	unsigned int	     tx_tail;
> > > > > +	atomic_t             tx_outstanding;
> > > > >  	struct ib_sge	     tx_sge[MAX_SKB_FRAGS + 1];
> > > > >  	struct ib_ud_wr      tx_wr;
> > > > >  	struct ib_wc	     send_wc[MAX_SEND_CQE];
> > > > > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> > > > > index c59e00a0881f..db6aace83fe5 100644
> > > > > +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> > > > > @@ -756,7 +756,7 @@ void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_
> > > > >  		return;
> > > > >  	}
> > > > >
> > > > > -	if ((priv->tx_head - priv->tx_tail) == ipoib_sendq_size - 1) {
> > > > > +	if (atomic_read(&priv->tx_outstanding) == ipoib_sendq_size - 1) {
> > > > >  		ipoib_dbg(priv, "TX ring 0x%x full, stopping kernel net queue\n",
> > > > >  			  tx->qp->qp_num);
> > > > >  		netif_stop_queue(dev);
> > > > > @@ -786,7 +786,7 @@ void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_
> > > > >  	} else {
> > > > >  		netif_trans_update(dev);
> > > > >  		++tx->tx_head;
> > > > > -		++priv->tx_head;
> > > > > +		atomic_inc(&priv->tx_outstanding);
> > > > >  	}
> > > >
> > > > This use of an atomic is very weird, probably wrong.
> > > >
> > > > Why is it an atomic?  priv->tx_head wasn't an atomic, and every place
> > > > touching tx_outstanding was also touching tx_head.
> > > >
> > > > I assume there is some hidden locking here? Or much more stuff is
> > > > busted up.
> > > >
> > > > In that case, drop the atomic.
> > > >
> > > > However, if the atomic is needed (where/why?) then something has to
> > > > be dealing with the races, and if the write side is fully locked then
> > > > an atomic is the wrong choice, use READ_ONCE/WRITE_ONCE instead
> > >
> > > I thought that atomic_t is appropriate here. Valentin wanted to
> > > implement "window" and he needed to ensure that this window is counted
> > > correctly while it doesn't need to be 100% accurate.
> >
> > It does need to be 100% accurate because the stop_queue condition is:
> >
> >  +	if (atomic_read(&priv->tx_outstanding) == ipoib_sendq_size - 1) {
> 
> So better to write ">=" instead of "=".

Then you risk overflowing the send q.

The required scheme appears to be to globally bound the # of packets
outstanding to tx such that the global bound is lower than any of the
(many) QP's sendq, thus there will always be a place to put the
packets.

So it must be exact or at least pessimistic.

This isn't a multiq netdev, so AFAICT, the tx side is single threaded
by netdev, and it looks like the wc side was single threaded by NAPI
as well.

The old code used two variable, each written in a single threaded way,
and then read to generate the state. With the missing READ_ONCE added,
it should look like this:

    if ((priv->tx_head - READ_ONCE(priv->tx_tail)) == ipoib_sendq_size - 1) {

I feel like sticking with the single writer packets posted/completed
scheme is better and clearer than trying to collapse it into a single
atomic.

> > And presumably the whole problem here is overflowing some sized q
> > opbject.
> >
> > That atomic is only needed if there is concurrency, and where is the
> > concurrency?
> 
> Depends on if you believe to description in commit message :)

I do.. It is obviously wrong now that you point to it :) Each QP must
have exclusive control of its head/tail pointers, having a CM QP reach
into the head/tail of the UD QP is certainly wrong.

Jason

  reply index

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-12  7:26 [PATCH rdma-rc 0/9] Fixes for v5.6 Leon Romanovsky
2020-02-12  7:26 ` [PATCH rdma-rc 1/9] RDMA/ucma: Mask QPN to be 24 bits according to IBTA Leon Romanovsky
2020-02-12  7:26 ` [PATCH rdma-rc 2/9] RDMA/core: Fix protection fault in get_pkey_idx_qp_list Leon Romanovsky
2020-02-12  8:01   ` Leon Romanovsky
2020-02-12  8:06   ` Leon Romanovsky
2020-02-12  7:26 ` [PATCH rdma-rc 3/9] Revert "RDMA/cma: Simplify rdma_resolve_addr() error flow" Leon Romanovsky
2020-02-13 13:30   ` Jason Gunthorpe
2020-02-14  3:11     ` Parav Pandit
2020-02-14 14:08       ` Jason Gunthorpe
2020-02-14 14:48         ` Parav Pandit
2020-02-19 20:40   ` Jason Gunthorpe
2020-02-12  7:26 ` [PATCH rdma-rc 4/9] IB/ipoib: Fix double free of skb in case of multicast traffic in CM mode Leon Romanovsky
2020-02-13 15:37   ` Jason Gunthorpe
2020-02-13 18:10     ` Leon Romanovsky
2020-02-13 18:26       ` Jason Gunthorpe
2020-02-13 18:36         ` Leon Romanovsky
2020-02-13 19:09           ` Jason Gunthorpe [this message]
2020-02-12  7:26 ` [PATCH rdma-rc 5/9] RDMA/core: Add missing list deletion on freeing event queue Leon Romanovsky
2020-02-12  7:26 ` [PATCH rdma-rc 6/9] IB/mlx5: Fix async events cleanup flows Leon Romanovsky
2020-02-12  7:26 ` [PATCH rdma-rc 7/9] RDMA/rxe: Fix soft lockup problem due to using tasklets in softirq Leon Romanovsky
2020-02-12  7:26 ` [PATCH rdma-rc 8/9] IB/umad: Fix kernel crash while unloading ib_umad Leon Romanovsky
2020-02-13 14:28   ` Jason Gunthorpe
2020-02-13 18:03     ` Leon Romanovsky
2020-02-12  7:26 ` [PATCH rdma-rc 9/9] RDMA/mlx5: Prevent overflow in mmap offset calculations Leon Romanovsky
2020-02-13 18:03 ` [PATCH rdma-rc 0/9] Fixes for v5.6 Jason Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200213190929.GN31668@ziepe.ca \
    --to=jgg@ziepe.ca \
    --cc=danielj@mellanox.com \
    --cc=dledford@redhat.com \
    --cc=erezsh@mellanox.com \
    --cc=leon@kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=maorg@mellanox.com \
    --cc=michaelgur@mellanox.com \
    --cc=monis@mellanox.com \
    --cc=parav@mellanox.com \
    --cc=sean.hefty@intel.com \
    --cc=valentinef@mellanox.com \
    --cc=yanjunz@mellanox.com \
    --cc=yishaih@mellanox.com \
    --cc=yonatanc@mellanox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-RDMA Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-rdma/0 linux-rdma/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-rdma linux-rdma/ https://lore.kernel.org/linux-rdma \
		linux-rdma@vger.kernel.org
	public-inbox-index linux-rdma

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-rdma


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git