All of lore.kernel.org
 help / color / mirror / Atom feed
From: Bob Pearson <rpearsonhpe@gmail.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: zyjzyj2000@gmail.com, linux-rdma@vger.kernel.org
Subject: Re: [PATCH for-next v11 10/13] RDMA/rxe: Stop lookup of partially built objects
Date: Tue, 15 Mar 2022 22:55:01 -0500	[thread overview]
Message-ID: <8aa69cff-2871-3015-fd2d-7279d8f3ddac@gmail.com> (raw)
In-Reply-To: <20220316001655.GV11336@nvidia.com>

On 3/15/22 19:16, Jason Gunthorpe wrote:
> On Thu, Mar 03, 2022 at 06:08:06PM -0600, Bob Pearson wrote:
>> Currently the rdma_rxe driver has a security weakness due to giving
>> objects which are partially initialized indices allowing external
>> actors to gain access to them by sending packets which refer to
>> their index (e.g. qpn, rkey, etc) causing unpredictable results.
>>
>> This patch adds two new APIs rxe_show(obj) and rxe_hide(obj) which
>> enable or disable looking up pool objects from indices using
>> rxe_pool_get_index(). By default objects are disabled. These APIs
>> are used to enable looking up objects which have indices:
>> AH, SRQ, QP, MR, and MW. They are added in create verbs after the
>> objects are fully initialized and as soon as possible in destroy
>> verbs.
> 
> In other parts of rdma we use the word 'finalize' where you used show
> 
> So rxe_pool_finalize() or something
> 
> I'm not clear on what hide is supposed to be for, if the object is
> being destroyed why do we need a period when it is NULL in the xarray
> before just erasing it?
The problem I am trying to solve is that when a destroy verb is called I want
to stop looking up rdma objects from their numbers (rkey, qpn, etc.) which
happens when a new packet arrives that refers to the object. So we start the
object destroy flow but we may hit a long delay if there are already
references taken on the object. In the next patch we are going to add a complete
wait_for_completion so that we won't return to rdma_core until all the refs
are dropped. While we are waiting for some past event to complete and drop its
reference new packets may arrive and take a reference on the object while looking it
up. Potentially this could happen many times. I just want to stop accepting any
new packets as soon as the destroy verb gets called. Meanwhile we will allow
past packets to complete. That is what hide() does.

Show() deals with the opposite problem. The way the object library worked as soon as
the object was created or 'added to the pool' it becomes searchable. But some of the
verbs have a lot of code to execute after the object gets its number assigned so by
setting the link to NULL in the xa_alloc call we get the index assigned but the
object is not searchable. show turns it on at the end for the create verb call after
all the init code is done.

There are likely better names but these were the ones that came to mind.

> 
>> @@ -221,8 +221,12 @@ static void rxe_elem_release(struct kref *kref)
>>  {
>>  	struct rxe_pool_elem *elem = container_of(kref, typeof(*elem), ref_cnt);
>>  	struct rxe_pool *pool = elem->pool;
>> +	struct xarray *xa = &pool->xa;
>> +	unsigned long flags;
>>  
>> -	xa_erase(&pool->xa, elem->index);
>> +	xa_lock_irqsave(xa, flags);
>> +	__xa_erase(&pool->xa, elem->index);
>> +	xa_unlock_irqrestore(xa, flags);
> 
> I guess it has to do with this, but why have the xa_erase in the kref
> release at all?
> 
>>  	if (pool->cleanup)
>>  		pool->cleanup(elem);
>> @@ -242,3 +246,33 @@ int __rxe_put(struct rxe_pool_elem *elem)
>>  {
>>  	return kref_put(&elem->ref_cnt, rxe_elem_release);
>>  }
>> +
>> +int __rxe_show(struct rxe_pool_elem *elem)
>> +{
>> +	struct xarray *xa = &elem->pool->xa;
>> +	unsigned long flags;
>> +	void *ret;
>> +
>> +	xa_lock_irqsave(xa, flags);
>> +	ret = __xa_store(&elem->pool->xa, elem->index, elem, GFP_KERNEL);
>> +	xa_unlock_irqrestore(xa, flags);
>> +	if (IS_ERR(ret))
>> +		return PTR_ERR(ret);
> 
> This can't fail due to the xa memory already being allocated. You can
> just WARN_ON here and 'finalize' should not return an error code.
> 
> If you want to be fancy this checks for other mistakes too:
> 
>    tmp = xa_cmpxchg((&elem->pool->xa, elem->index, XA_ZERO_ENTRY,  elem, 0)
>    WARN_ON(tmp != NULL);
> 
>> +int __rxe_hide(struct rxe_pool_elem *elem)
>> +{
>> +	struct xarray *xa = &elem->pool->xa;
>> +	unsigned long flags;
>> +	void *ret;
>> +
>> +	xa_lock_irqsave(xa, flags);
>> +	ret = __xa_store(&elem->pool->xa, elem->index, NULL, GFP_KERNEL);
>> +	xa_unlock_irqrestore(xa, flags);
> 
> IIRC storing NULL is the same as erase, isn't it?  You have to store
> XA_ZERO_ENTRY if you want to keep an allocated NULL
This works just fine. You are correct for normal xarrays but for allocating xarrays
setting the link to NULL is treated differently and it remembers. Here is the comment
in <kernel>/Documentation

Allocating XArrays

If you use DEFINE_XARRAY_ALLOC() to define the XArray, or initialise it by passing XA_FLAGS_ALLOC to xa_init_flags(), the XArray changes to track whether entries are in use or not.

You can call xa_alloc() to store the entry at an unused index in the XArray. If you need to modify the array from interrupt context, you can use xa_alloc_bh() or xa_alloc_irq() to disable interrupts while allocating the ID.

Using xa_store(), xa_cmpxchg() or xa_insert() will also mark the entry as being allocated. Unlike a normal XArray, storing NULL will mark the entry as being in use, like xa_reserve(). To free an entry, use xa_erase() (or xa_release() if you only want to free the entry if it’s NULL).

> 
>> +	if (IS_ERR(ret))
>> +		return PTR_ERR(ret);
>> +	else
>> +		return 0;
>> +}
> 
> Same remark about the error handling
> 
>> @@ -491,6 +497,7 @@ static int rxe_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata)
>>  	struct rxe_qp *qp = to_rqp(ibqp);
>>  	int ret;
>>  
>> +	rxe_hide(qp);
>>  	ret = rxe_qp_chk_destroy(qp);
>>  	if (ret)
>>  		return ret;
> 
> So we decided not to destroy the QP but wrecked it in the xarray?
> 
> Not convinced about the hide at all..
This particular example is pretty light weight since rxe_qp_chk_destroy only makes one
memory reference. But dereg_mr actually can do a lot of work and in either case
the idea as explained above is not to wreck the object but prevent rxe_pool_get_index(pool, index)
from succeeding and taking a new ref on the object. Once the verbs client has called a destroy
verb I don't see any reason why we should continue to process newly arriving packets forever which
is the only place in the driver where we convert object numbers to objects.

There is another issue which is not being dealt with here and that is partially torn down
objects. After this patch the flow for a destroy verb for an indexed object is

hide() =	"disable new lookups, e.g. xa_store(NULL)"
		"misc tear down code"
rxe_put() = 	"drop a reference, will be last one if the object is quiescent"
		"if necessary wait until last ref dropped"
		"object cleanup code, includes xa_erase()"

If objects are still holding references you have to be careful to make sure that
nothing in misc tear down code matters for an outstanding reference. Currently this all works
but if any change breaks things we have had to defer the change to the cleanup phase.
It doesn't work by design but just debugging in the past.

Bob


> 
> Jason


  reply	other threads:[~2022-03-16  3:55 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-04  0:07 [PATCH for-next v11 00/13] Fix race conditions in rxe_pool Bob Pearson
2022-03-04  0:07 ` [PATCH for-next v11 01/13] RDMA/rxe: Fix ref error in rxe_av.c Bob Pearson
2022-03-04  0:07 ` [PATCH for-next v11 02/13] RDMA/rxe: Replace mr by rkey in responder resources Bob Pearson
2022-03-04  0:07 ` [PATCH for-next v11 03/13] RDMA/rxe: Reverse the sense of RXE_POOL_NO_ALLOC Bob Pearson
2022-03-04  0:08 ` [PATCH for-next v11 04/13] RDMA/rxe: Delete _locked() APIs for pool objects Bob Pearson
2022-03-04  0:08 ` [PATCH for-next v11 05/13] RDMA/rxe: Replace obj by elem in declaration Bob Pearson
2022-03-04  0:08 ` [PATCH for-next v11 06/13] RDMA/rxe: Move max_elem into rxe_type_info Bob Pearson
2022-03-04  0:08 ` [PATCH for-next v11 07/13] RDMA/rxe: Shorten pool names in rxe_pool.c Bob Pearson
2022-03-04  0:08 ` [PATCH for-next v11 08/13] RDMA/rxe: Replace red-black trees by xarrays Bob Pearson
2022-03-15 23:45   ` Jason Gunthorpe
2022-03-16  3:05     ` Bob Pearson
2022-03-04  0:08 ` [PATCH for-next v11 09/13] RDMA/rxe: Use standard names for ref counting Bob Pearson
2022-03-04  0:08 ` [PATCH for-next v11 10/13] RDMA/rxe: Stop lookup of partially built objects Bob Pearson
2022-03-16  0:16   ` Jason Gunthorpe
2022-03-16  3:55     ` Bob Pearson [this message]
2022-03-16 13:42       ` Jason Gunthorpe
2022-03-04  0:08 ` [PATCH for-next v11 11/13] RDMA/rxe: Add wait_for_completion to pool objects Bob Pearson
2022-03-16  0:17   ` Jason Gunthorpe
2022-03-16  3:57     ` Bob Pearson
2022-03-16 13:43       ` Jason Gunthorpe
2022-03-04  0:08 ` [PATCH for-next v11 12/13] RDMA/rxe: Convert read side locking to rcu Bob Pearson
2022-03-16  0:18   ` Jason Gunthorpe
2022-03-16  4:05     ` Bob Pearson
2022-03-04  0:08 ` [PATCH for-next v11 13/13] RDMA/rxe: Cleanup rxe_pool.c Bob Pearson
2022-03-16  0:25 ` [PATCH for-next v11 00/13] Fix race conditions in rxe_pool Jason Gunthorpe
2022-03-16  4:05   ` Bob Pearson
2022-03-16 16:08     ` Jason Gunthorpe
2022-03-16 16:09       ` Pearson, Robert B

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8aa69cff-2871-3015-fd2d-7279d8f3ddac@gmail.com \
    --to=rpearsonhpe@gmail.com \
    --cc=jgg@nvidia.com \
    --cc=linux-rdma@vger.kernel.org \
    --cc=zyjzyj2000@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.