All of lore.kernel.org
 help / color / mirror / Atom feed
* Maybe a race condition in net/rds/rdma.c?
@ 2020-02-18 13:13 zerons
  2020-02-27 14:28 ` Håkon Bugge
  0 siblings, 1 reply; 9+ messages in thread
From: zerons @ 2020-02-18 13:13 UTC (permalink / raw)
  To: linux-rdma

Hi, all

In net/rds/rdma.c
(https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/net/rds/rdma.c?h=v5.5.3#n419),
there may be a race condition between rds_rdma_unuse() and rds_free_mr().

It seems that this one need some specific devices to run test,
unfortunately, I don't have any of these.
I've already sent two emails to the maintainer for help, no response yet,
(the email address may not be in use).

0) in rds_recv_incoming_exthdrs(), it calls rds_rdma_unuse() when receive an
extension header with force=0, if the victim mr does not have RDS_RDMA_USE_ONCE
flag set, then the mr would stay in the rbtree. Without any lock, it tries to
call mr->r_trans->sync_mr().

1) in rds_free_mr(), the same mr is found, and then freed. The mr->r_refcount
doesn't change while rds_mr_tree_walk().

0) back in rds_rdma_unuse(), the victim mr get used again, call
mr->r_trans->sync_mr().

Could this race condition actually happen?

Thank you.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Maybe a race condition in net/rds/rdma.c?
  2020-02-18 13:13 Maybe a race condition in net/rds/rdma.c? zerons
@ 2020-02-27 14:28 ` Håkon Bugge
  2020-02-27 18:10   ` santosh.shilimkar
  0 siblings, 1 reply; 9+ messages in thread
From: Håkon Bugge @ 2020-02-27 14:28 UTC (permalink / raw)
  To: zerons; +Cc: OFED mailing list



> On 18 Feb 2020, at 14:13, zerons <sironhide0null@gmail.com> wrote:
> 
> Hi, all
> 
> In net/rds/rdma.c
> (https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/net/rds/rdma.c?h=v5.5.3#n419),
> there may be a race condition between rds_rdma_unuse() and rds_free_mr().
> 
> It seems that this one need some specific devices to run test,
> unfortunately, I don't have any of these.
> I've already sent two emails to the maintainer for help, no response yet,
> (the email address may not be in use).
> 
> 0) in rds_recv_incoming_exthdrs(), it calls rds_rdma_unuse() when receive an
> extension header with force=0, if the victim mr does not have RDS_RDMA_USE_ONCE
> flag set, then the mr would stay in the rbtree. Without any lock, it tries to
> call mr->r_trans->sync_mr().
> 
> 1) in rds_free_mr(), the same mr is found, and then freed. The mr->r_refcount
> doesn't change while rds_mr_tree_walk().
> 
> 0) back in rds_rdma_unuse(), the victim mr get used again, call
> mr->r_trans->sync_mr().
> 
> Could this race condition actually happen?
> 
> Thank you.

Hi Peng,


I will have someone to look at this one.

Thanks for your report,


Håkon





^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Maybe a race condition in net/rds/rdma.c?
  2020-02-27 14:28 ` Håkon Bugge
@ 2020-02-27 18:10   ` santosh.shilimkar
  2020-03-06 12:11     ` zerons
  0 siblings, 1 reply; 9+ messages in thread
From: santosh.shilimkar @ 2020-02-27 18:10 UTC (permalink / raw)
  To: zerons; +Cc: netdev, OFED mailing list


>> On 18 Feb 2020, at 14:13, zerons <sironhide0null@gmail.com> wrote:
>>
>> Hi, all
>>
>> In net/rds/rdma.c
>> (https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/net/rds/rdma.c?h=v5.5.3*n419__;Iw!!GqivPVa7Brio!OwwQCLtjDsKmhaIz0sfaOVSuC4ai5t5_FgB7yqNExGOCBtACtIGLF61NNJyqSDtIAcGoPg$ ),
>> there may be a race condition between rds_rdma_unuse() and rds_free_mr().
>>
Hmmm.. I didn't see email before in my inbox. Please post 
questions/patches on netdev in future which is the correct mailing list.

>> It seems that this one need some specific devices to run test,
>> unfortunately, I don't have any of these.
>> I've already sent two emails to the maintainer for help, no response yet,
>> (the email address may not be in use).
>>
>> 0) in rds_recv_incoming_exthdrs(), it calls rds_rdma_unuse() when receive an
>> extension header with force=0, if the victim mr does not have RDS_RDMA_USE_ONCE
>> flag set, then the mr would stay in the rbtree. Without any lock, it tries to
>> call mr->r_trans->sync_mr().
>>
>> 1) in rds_free_mr(), the same mr is found, and then freed. The mr->r_refcount
>> doesn't change while rds_mr_tree_walk().
>>
>> 0) back in rds_rdma_unuse(), the victim mr get used again, call
>> mr->r_trans->sync_mr().
>>
>> Could this race condition actually happen?
>>
force=0 is an interesting scenario. Let me think about it and get back.
Thanks for report.

Regards,
Santosh

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Maybe a race condition in net/rds/rdma.c?
  2020-02-27 18:10   ` santosh.shilimkar
@ 2020-03-06 12:11     ` zerons
  2020-03-10 17:53       ` santosh.shilimkar
  0 siblings, 1 reply; 9+ messages in thread
From: zerons @ 2020-03-06 12:11 UTC (permalink / raw)
  To: santosh.shilimkar; +Cc: netdev, OFED mailing list, haakon.bugge



On 2/28/20 02:10, santosh.shilimkar@oracle.com wrote:
> 
>>> On 18 Feb 2020, at 14:13, zerons <sironhide0null@gmail.com> wrote:
>>>
>>> Hi, all
>>>
>>> In net/rds/rdma.c
>>> (https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/net/rds/rdma.c?h=v5.5.3*n419__;Iw!!GqivPVa7Brio!OwwQCLtjDsKmhaIz0sfaOVSuC4ai5t5_FgB7yqNExGOCBtACtIGLF61NNJyqSDtIAcGoPg$ ),
>>> there may be a race condition between rds_rdma_unuse() and rds_free_mr().
>>>
> Hmmm.. I didn't see email before in my inbox. Please post questions/patches on netdev in future which is the correct mailing list.
> 
>>> It seems that this one need some specific devices to run test,
>>> unfortunately, I don't have any of these.
>>> I've already sent two emails to the maintainer for help, no response yet,
>>> (the email address may not be in use).
>>>
>>> 0) in rds_recv_incoming_exthdrs(), it calls rds_rdma_unuse() when receive an
>>> extension header with force=0, if the victim mr does not have RDS_RDMA_USE_ONCE
>>> flag set, then the mr would stay in the rbtree. Without any lock, it tries to
>>> call mr->r_trans->sync_mr().
>>>
>>> 1) in rds_free_mr(), the same mr is found, and then freed. The mr->r_refcount
>>> doesn't change while rds_mr_tree_walk().
>>>
>>> 0) back in rds_rdma_unuse(), the victim mr get used again, call
>>> mr->r_trans->sync_mr().
>>>
>>> Could this race condition actually happen?
>>>
> force=0 is an interesting scenario. Let me think about it and get back.
> Thanks for report.
> 
> Regards,
> Santosh

Thanks for your attention.

Regards,

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Maybe a race condition in net/rds/rdma.c?
  2020-03-06 12:11     ` zerons
@ 2020-03-10 17:53       ` santosh.shilimkar
  2020-03-11  4:48         ` zerons
  0 siblings, 1 reply; 9+ messages in thread
From: santosh.shilimkar @ 2020-03-10 17:53 UTC (permalink / raw)
  To: zerons; +Cc: netdev, OFED mailing list, haakon.bugge

On 3/6/20 4:11 AM, zerons wrote:
> 
> 
> On 2/28/20 02:10, santosh.shilimkar@oracle.com wrote:
>>
>>>> On 18 Feb 2020, at 14:13, zerons <sironhide0null@gmail.com> wrote:
>>>>
>>>> Hi, all
>>>>
>>>> In net/rds/rdma.c
>>>> (https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/net/rds/rdma.c?h=v5.5.3*n419__;Iw!!GqivPVa7Brio!OwwQCLtjDsKmhaIz0sfaOVSuC4ai5t5_FgB7yqNExGOCBtACtIGLF61NNJyqSDtIAcGoPg$ ),
>>>> there may be a race condition between rds_rdma_unuse() and rds_free_mr().
>>>>
>> Hmmm.. I didn't see email before in my inbox. Please post questions/patches on netdev in future which is the correct mailing list.
>>
>>>> It seems that this one need some specific devices to run test,
>>>> unfortunately, I don't have any of these.
>>>> I've already sent two emails to the maintainer for help, no response yet,
>>>> (the email address may not be in use).
>>>>
>>>> 0) in rds_recv_incoming_exthdrs(), it calls rds_rdma_unuse() when receive an
>>>> extension header with force=0, if the victim mr does not have RDS_RDMA_USE_ONCE
>>>> flag set, then the mr would stay in the rbtree. Without any lock, it tries to
>>>> call mr->r_trans->sync_mr().
>>>>
MR won't stay in the rbtree with force flag. If the MR is used or
use_once is set in both cases its removed from the tree.
See "if (mr->r_use_once || force)"

>>>> 1) in rds_free_mr(), the same mr is found, and then freed. The mr->r_refcount
>>>> doesn't change while rds_mr_tree_walk().
>>>>
>>>> 0) back in rds_rdma_unuse(), the victim mr get used again, call
>>>> mr->r_trans->sync_mr().
>>>>
>>>> Could this race condition actually happen?
So from what I see, this race doesn't exist but let me know
if am missing something.

regards,
Santosh

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Maybe a race condition in net/rds/rdma.c?
  2020-03-10 17:53       ` santosh.shilimkar
@ 2020-03-11  4:48         ` zerons
  2020-03-11 14:35           ` santosh.shilimkar
  0 siblings, 1 reply; 9+ messages in thread
From: zerons @ 2020-03-11  4:48 UTC (permalink / raw)
  To: santosh.shilimkar; +Cc: netdev, OFED mailing list, haakon.bugge



On 3/11/20 01:53, santosh.shilimkar@oracle.com wrote:
> On 3/6/20 4:11 AM, zerons wrote:
>>
>>
>> On 2/28/20 02:10, santosh.shilimkar@oracle.com wrote:
>>>
>>>>> On 18 Feb 2020, at 14:13, zerons <sironhide0null@gmail.com> wrote:
>>>>>
>>>>> Hi, all
>>>>>
>>>>> In net/rds/rdma.c
>>>>> (https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/net/rds/rdma.c?h=v5.5.3*n419__;Iw!!GqivPVa7Brio!OwwQCLtjDsKmhaIz0sfaOVSuC4ai5t5_FgB7yqNExGOCBtACtIGLF61NNJyqSDtIAcGoPg$ ),
>>>>> there may be a race condition between rds_rdma_unuse() and rds_free_mr().
>>>>>
>>> Hmmm.. I didn't see email before in my inbox. Please post questions/patches on netdev in future which is the correct mailing list.
>>>
>>>>> It seems that this one need some specific devices to run test,
>>>>> unfortunately, I don't have any of these.
>>>>> I've already sent two emails to the maintainer for help, no response yet,
>>>>> (the email address may not be in use).
>>>>>
>>>>> 0) in rds_recv_incoming_exthdrs(), it calls rds_rdma_unuse() when receive an
>>>>> extension header with force=0, if the victim mr does not have RDS_RDMA_USE_ONCE
>>>>> flag set, then the mr would stay in the rbtree. Without any lock, it tries to
>>>>> call mr->r_trans->sync_mr().
>>>>>
> MR won't stay in the rbtree with force flag. If the MR is used or
> use_once is set in both cases its removed from the tree.
> See "if (mr->r_use_once || force)"
> 

Sorry, I may misunderstand. Did you mean that if the MR is *used*,
it is removed from the tree with or without the force flag in
rds_rdma_unuse(), even when r_use_once is not set?

Regards,

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Maybe a race condition in net/rds/rdma.c?
  2020-03-11  4:48         ` zerons
@ 2020-03-11 14:35           ` santosh.shilimkar
  2020-03-12  8:58             ` zerons
  0 siblings, 1 reply; 9+ messages in thread
From: santosh.shilimkar @ 2020-03-11 14:35 UTC (permalink / raw)
  To: zerons; +Cc: netdev, OFED mailing list, haakon.bugge

On 3/10/20 9:48 PM, zerons wrote:
> 
> 
> On 3/11/20 01:53, santosh.shilimkar@oracle.com wrote:
>> On 3/6/20 4:11 AM, zerons wrote:
>>>
>>>
>>> On 2/28/20 02:10, santosh.shilimkar@oracle.com wrote:
>>>>
>>>>>> On 18 Feb 2020, at 14:13, zerons <sironhide0null@gmail.com> wrote:
>>>>>>
>>>>>> Hi, all
>>>>>>
>>>>>> In net/rds/rdma.c
>>>>>> (https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/net/rds/rdma.c?h=v5.5.3*n419__;Iw!!GqivPVa7Brio!OwwQCLtjDsKmhaIz0sfaOVSuC4ai5t5_FgB7yqNExGOCBtACtIGLF61NNJyqSDtIAcGoPg$ ),
>>>>>> there may be a race condition between rds_rdma_unuse() and rds_free_mr().
>>>>>>
>>>> Hmmm.. I didn't see email before in my inbox. Please post questions/patches on netdev in future which is the correct mailing list.
>>>>
>>>>>> It seems that this one need some specific devices to run test,
>>>>>> unfortunately, I don't have any of these.
>>>>>> I've already sent two emails to the maintainer for help, no response yet,
>>>>>> (the email address may not be in use).
>>>>>>
>>>>>> 0) in rds_recv_incoming_exthdrs(), it calls rds_rdma_unuse() when receive an
>>>>>> extension header with force=0, if the victim mr does not have RDS_RDMA_USE_ONCE
>>>>>> flag set, then the mr would stay in the rbtree. Without any lock, it tries to
>>>>>> call mr->r_trans->sync_mr().
>>>>>>
>> MR won't stay in the rbtree with force flag. If the MR is used or
>> use_once is set in both cases its removed from the tree.
>> See "if (mr->r_use_once || force)"
>>
> 
> Sorry, I may misunderstand. Did you mean that if the MR is *used*,
> it is removed from the tree with or without the force flag in
> rds_rdma_unuse(), even when r_use_once is not set?
> 
Once the MR is being used with use_once semantics it gets removed with 
or without remote side indicating it via extended header. use_once
optimization was added later. The base behavior is once the MR is
used by remote and same information is sent via extended header,
it gets cleaned up with force flag. Force flag ignores whether
its marked as used_once or not.

Regards,
Santosh


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Maybe a race condition in net/rds/rdma.c?
  2020-03-11 14:35           ` santosh.shilimkar
@ 2020-03-12  8:58             ` zerons
  2020-03-12 17:49               ` santosh.shilimkar
  0 siblings, 1 reply; 9+ messages in thread
From: zerons @ 2020-03-12  8:58 UTC (permalink / raw)
  To: santosh.shilimkar; +Cc: netdev, OFED mailing list, haakon.bugge



On 3/11/20 22:35, santosh.shilimkar@oracle.com wrote:
> On 3/10/20 9:48 PM, zerons wrote:
>>
>>
>> On 3/11/20 01:53, santosh.shilimkar@oracle.com wrote:
>>> On 3/6/20 4:11 AM, zerons wrote:
>>>>
>>>>
>>>> On 2/28/20 02:10, santosh.shilimkar@oracle.com wrote:
>>>>>
>>>>>>> On 18 Feb 2020, at 14:13, zerons <sironhide0null@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi, all
>>>>>>>
>>>>>>> In net/rds/rdma.c
>>>>>>> (https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/net/rds/rdma.c?h=v5.5.3*n419__;Iw!!GqivPVa7Brio!OwwQCLtjDsKmhaIz0sfaOVSuC4ai5t5_FgB7yqNExGOCBtACtIGLF61NNJyqSDtIAcGoPg$ ),
>>>>>>> there may be a race condition between rds_rdma_unuse() and rds_free_mr().
>>>>>>>
>>>>> Hmmm.. I didn't see email before in my inbox. Please post questions/patches on netdev in future which is the correct mailing list.
>>>>>
>>>>>>> It seems that this one need some specific devices to run test,
>>>>>>> unfortunately, I don't have any of these.
>>>>>>> I've already sent two emails to the maintainer for help, no response yet,
>>>>>>> (the email address may not be in use).
>>>>>>>
>>>>>>> 0) in rds_recv_incoming_exthdrs(), it calls rds_rdma_unuse() when receive an
>>>>>>> extension header with force=0, if the victim mr does not have RDS_RDMA_USE_ONCE
>>>>>>> flag set, then the mr would stay in the rbtree. Without any lock, it tries to
>>>>>>> call mr->r_trans->sync_mr().
>>>>>>>
>>> MR won't stay in the rbtree with force flag. If the MR is used or
>>> use_once is set in both cases its removed from the tree.
>>> See "if (mr->r_use_once || force)"
>>>
>>
>> Sorry, I may misunderstand. Did you mean that if the MR is *used*,
>> it is removed from the tree with or without the force flag in
>> rds_rdma_unuse(), even when r_use_once is not set?
>>
> Once the MR is being used with use_once semantics it gets removed with or without remote side indicating it via extended header. use_once
> optimization was added later. The base behavior is once the MR is
> used by remote and same information is sent via extended header,
> it gets cleaned up with force flag. Force flag ignores whether
> its marked as used_once or not.
> 

Sorry, I am still confused.

I check the code again. The rds_rdma_unuse() is called in two functions,
rds_recv_incoming_exthdrs() and rds_sendmsg().

In rds_sendmsg(), it calls rds_rdma_unuse() *with* force flag only when
the user included a RDMA_MAP cmsg *and* sendmsg() is failed.

In rds_recv_incoming_exthdrs(), the force is *false*. So we can consider
the rds_rdma_unuse() called *without* force flag.
Then I go check where r_use_once can be set.

__rds_rdma_map()
	rds_get_mr()
		rds_setsockopt()

	rds_get_mr_for_dest()
		rds_setsockopt()

	rds_cmsg_rdma_map()
		rds_cmsg_send()
			rds_sendmsg()

It seems to me that r_use_once is controlled by user applications.

I also wonder if we can ensure that the MR found in rds_rdma_unuse()
gets removed, then "if (mr->r_use_once || force)" doesn't make any sense.

Sorry to keep bothering you with my questions. I wish I had such a device 
that I can test it on.

Best regards,

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Maybe a race condition in net/rds/rdma.c?
  2020-03-12  8:58             ` zerons
@ 2020-03-12 17:49               ` santosh.shilimkar
  0 siblings, 0 replies; 9+ messages in thread
From: santosh.shilimkar @ 2020-03-12 17:49 UTC (permalink / raw)
  To: zerons; +Cc: netdev, OFED mailing list, haakon.bugge

On 3/12/20 1:58 AM, zerons wrote:
> 
[...]
>>>> MR won't stay in the rbtree with force flag. If the MR is used or
>>>> use_once is set in both cases its removed from the tree.
>>>> See "if (mr->r_use_once || force)"
>>>>
>>>
>>> Sorry, I may misunderstand. Did you mean that if the MR is *used*,
>>> it is removed from the tree with or without the force flag in
>>> rds_rdma_unuse(), even when r_use_once is not set?
>>>
>> Once the MR is being used with use_once semantics it gets removed with or without remote side indicating it via extended header. use_once
>> optimization was added later. The base behavior is once the MR is
>> used by remote and same information is sent via extended header,
>> it gets cleaned up with force flag. Force flag ignores whether
>> its marked as used_once or not.
>>
> 
> Sorry, I am still confused.
> 
> I check the code again. The rds_rdma_unuse() is called in two functions,
> rds_recv_incoming_exthdrs() and rds_sendmsg().
> 
> In rds_sendmsg(), it calls rds_rdma_unuse() *with* force flag only when
> the user included a RDMA_MAP cmsg *and* sendmsg() is failed.
>
correct.

> In rds_recv_incoming_exthdrs(), the force is *false*. So we can consider
> the rds_rdma_unuse() called *without* force flag.
> Then I go check where r_use_once can be set.
> 
> __rds_rdma_map()
> 	rds_get_mr()
> 		rds_setsockopt()
> 
> 	rds_get_mr_for_dest()
> 		rds_setsockopt()
> 
> 	rds_cmsg_rdma_map()
> 		rds_cmsg_send()
> 			rds_sendmsg()
> 
> It seems to me that r_use_once is controlled by user applications.
>
yes it is and its being set in the application using this in
production. But You do have point that if application don't set it
then even after MR being used and remote node indicated it being
used, the MR still remains in the RB tree.


> Sorry to keep bothering you with my questions. I wish I had such a device
> that I can test it on.
> 
Not at all. You mostly found a race condition when use_once is not used
but need to verify it. We will look into it more. Thanks for your
patience.

Regards,
Santosh


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2020-03-12 17:51 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-18 13:13 Maybe a race condition in net/rds/rdma.c? zerons
2020-02-27 14:28 ` Håkon Bugge
2020-02-27 18:10   ` santosh.shilimkar
2020-03-06 12:11     ` zerons
2020-03-10 17:53       ` santosh.shilimkar
2020-03-11  4:48         ` zerons
2020-03-11 14:35           ` santosh.shilimkar
2020-03-12  8:58             ` zerons
2020-03-12 17:49               ` santosh.shilimkar

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.