netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: Maybe a race condition in net/rds/rdma.c?
       [not found] ` <D8EB4A77-77D7-41EB-9021-EA7BB8C3FA5B@oracle.com>
@ 2020-02-27 18:10   ` santosh.shilimkar
  2020-03-06 12:11     ` zerons
  0 siblings, 1 reply; 7+ messages in thread
From: santosh.shilimkar @ 2020-02-27 18:10 UTC (permalink / raw)
  To: zerons; +Cc: netdev, OFED mailing list


>> On 18 Feb 2020, at 14:13, zerons <sironhide0null@gmail.com> wrote:
>>
>> Hi, all
>>
>> In net/rds/rdma.c
>> (https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/net/rds/rdma.c?h=v5.5.3*n419__;Iw!!GqivPVa7Brio!OwwQCLtjDsKmhaIz0sfaOVSuC4ai5t5_FgB7yqNExGOCBtACtIGLF61NNJyqSDtIAcGoPg$ ),
>> there may be a race condition between rds_rdma_unuse() and rds_free_mr().
>>
Hmmm.. I didn't see email before in my inbox. Please post 
questions/patches on netdev in future which is the correct mailing list.

>> It seems that this one need some specific devices to run test,
>> unfortunately, I don't have any of these.
>> I've already sent two emails to the maintainer for help, no response yet,
>> (the email address may not be in use).
>>
>> 0) in rds_recv_incoming_exthdrs(), it calls rds_rdma_unuse() when receive an
>> extension header with force=0, if the victim mr does not have RDS_RDMA_USE_ONCE
>> flag set, then the mr would stay in the rbtree. Without any lock, it tries to
>> call mr->r_trans->sync_mr().
>>
>> 1) in rds_free_mr(), the same mr is found, and then freed. The mr->r_refcount
>> doesn't change while rds_mr_tree_walk().
>>
>> 0) back in rds_rdma_unuse(), the victim mr get used again, call
>> mr->r_trans->sync_mr().
>>
>> Could this race condition actually happen?
>>
force=0 is an interesting scenario. Let me think about it and get back.
Thanks for report.

Regards,
Santosh

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Maybe a race condition in net/rds/rdma.c?
  2020-02-27 18:10   ` Maybe a race condition in net/rds/rdma.c? santosh.shilimkar
@ 2020-03-06 12:11     ` zerons
  2020-03-10 17:53       ` santosh.shilimkar
  0 siblings, 1 reply; 7+ messages in thread
From: zerons @ 2020-03-06 12:11 UTC (permalink / raw)
  To: santosh.shilimkar; +Cc: netdev, OFED mailing list, haakon.bugge



On 2/28/20 02:10, santosh.shilimkar@oracle.com wrote:
> 
>>> On 18 Feb 2020, at 14:13, zerons <sironhide0null@gmail.com> wrote:
>>>
>>> Hi, all
>>>
>>> In net/rds/rdma.c
>>> (https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/net/rds/rdma.c?h=v5.5.3*n419__;Iw!!GqivPVa7Brio!OwwQCLtjDsKmhaIz0sfaOVSuC4ai5t5_FgB7yqNExGOCBtACtIGLF61NNJyqSDtIAcGoPg$ ),
>>> there may be a race condition between rds_rdma_unuse() and rds_free_mr().
>>>
> Hmmm.. I didn't see email before in my inbox. Please post questions/patches on netdev in future which is the correct mailing list.
> 
>>> It seems that this one need some specific devices to run test,
>>> unfortunately, I don't have any of these.
>>> I've already sent two emails to the maintainer for help, no response yet,
>>> (the email address may not be in use).
>>>
>>> 0) in rds_recv_incoming_exthdrs(), it calls rds_rdma_unuse() when receive an
>>> extension header with force=0, if the victim mr does not have RDS_RDMA_USE_ONCE
>>> flag set, then the mr would stay in the rbtree. Without any lock, it tries to
>>> call mr->r_trans->sync_mr().
>>>
>>> 1) in rds_free_mr(), the same mr is found, and then freed. The mr->r_refcount
>>> doesn't change while rds_mr_tree_walk().
>>>
>>> 0) back in rds_rdma_unuse(), the victim mr get used again, call
>>> mr->r_trans->sync_mr().
>>>
>>> Could this race condition actually happen?
>>>
> force=0 is an interesting scenario. Let me think about it and get back.
> Thanks for report.
> 
> Regards,
> Santosh

Thanks for your attention.

Regards,

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Maybe a race condition in net/rds/rdma.c?
  2020-03-06 12:11     ` zerons
@ 2020-03-10 17:53       ` santosh.shilimkar
  2020-03-11  4:48         ` zerons
  0 siblings, 1 reply; 7+ messages in thread
From: santosh.shilimkar @ 2020-03-10 17:53 UTC (permalink / raw)
  To: zerons; +Cc: netdev, OFED mailing list, haakon.bugge

On 3/6/20 4:11 AM, zerons wrote:
> 
> 
> On 2/28/20 02:10, santosh.shilimkar@oracle.com wrote:
>>
>>>> On 18 Feb 2020, at 14:13, zerons <sironhide0null@gmail.com> wrote:
>>>>
>>>> Hi, all
>>>>
>>>> In net/rds/rdma.c
>>>> (https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/net/rds/rdma.c?h=v5.5.3*n419__;Iw!!GqivPVa7Brio!OwwQCLtjDsKmhaIz0sfaOVSuC4ai5t5_FgB7yqNExGOCBtACtIGLF61NNJyqSDtIAcGoPg$ ),
>>>> there may be a race condition between rds_rdma_unuse() and rds_free_mr().
>>>>
>> Hmmm.. I didn't see email before in my inbox. Please post questions/patches on netdev in future which is the correct mailing list.
>>
>>>> It seems that this one need some specific devices to run test,
>>>> unfortunately, I don't have any of these.
>>>> I've already sent two emails to the maintainer for help, no response yet,
>>>> (the email address may not be in use).
>>>>
>>>> 0) in rds_recv_incoming_exthdrs(), it calls rds_rdma_unuse() when receive an
>>>> extension header with force=0, if the victim mr does not have RDS_RDMA_USE_ONCE
>>>> flag set, then the mr would stay in the rbtree. Without any lock, it tries to
>>>> call mr->r_trans->sync_mr().
>>>>
MR won't stay in the rbtree with force flag. If the MR is used or
use_once is set in both cases its removed from the tree.
See "if (mr->r_use_once || force)"

>>>> 1) in rds_free_mr(), the same mr is found, and then freed. The mr->r_refcount
>>>> doesn't change while rds_mr_tree_walk().
>>>>
>>>> 0) back in rds_rdma_unuse(), the victim mr get used again, call
>>>> mr->r_trans->sync_mr().
>>>>
>>>> Could this race condition actually happen?
So from what I see, this race doesn't exist but let me know
if am missing something.

regards,
Santosh

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Maybe a race condition in net/rds/rdma.c?
  2020-03-10 17:53       ` santosh.shilimkar
@ 2020-03-11  4:48         ` zerons
  2020-03-11 14:35           ` santosh.shilimkar
  0 siblings, 1 reply; 7+ messages in thread
From: zerons @ 2020-03-11  4:48 UTC (permalink / raw)
  To: santosh.shilimkar; +Cc: netdev, OFED mailing list, haakon.bugge



On 3/11/20 01:53, santosh.shilimkar@oracle.com wrote:
> On 3/6/20 4:11 AM, zerons wrote:
>>
>>
>> On 2/28/20 02:10, santosh.shilimkar@oracle.com wrote:
>>>
>>>>> On 18 Feb 2020, at 14:13, zerons <sironhide0null@gmail.com> wrote:
>>>>>
>>>>> Hi, all
>>>>>
>>>>> In net/rds/rdma.c
>>>>> (https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/net/rds/rdma.c?h=v5.5.3*n419__;Iw!!GqivPVa7Brio!OwwQCLtjDsKmhaIz0sfaOVSuC4ai5t5_FgB7yqNExGOCBtACtIGLF61NNJyqSDtIAcGoPg$ ),
>>>>> there may be a race condition between rds_rdma_unuse() and rds_free_mr().
>>>>>
>>> Hmmm.. I didn't see email before in my inbox. Please post questions/patches on netdev in future which is the correct mailing list.
>>>
>>>>> It seems that this one need some specific devices to run test,
>>>>> unfortunately, I don't have any of these.
>>>>> I've already sent two emails to the maintainer for help, no response yet,
>>>>> (the email address may not be in use).
>>>>>
>>>>> 0) in rds_recv_incoming_exthdrs(), it calls rds_rdma_unuse() when receive an
>>>>> extension header with force=0, if the victim mr does not have RDS_RDMA_USE_ONCE
>>>>> flag set, then the mr would stay in the rbtree. Without any lock, it tries to
>>>>> call mr->r_trans->sync_mr().
>>>>>
> MR won't stay in the rbtree with force flag. If the MR is used or
> use_once is set in both cases its removed from the tree.
> See "if (mr->r_use_once || force)"
> 

Sorry, I may misunderstand. Did you mean that if the MR is *used*,
it is removed from the tree with or without the force flag in
rds_rdma_unuse(), even when r_use_once is not set?

Regards,

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Maybe a race condition in net/rds/rdma.c?
  2020-03-11  4:48         ` zerons
@ 2020-03-11 14:35           ` santosh.shilimkar
  2020-03-12  8:58             ` zerons
  0 siblings, 1 reply; 7+ messages in thread
From: santosh.shilimkar @ 2020-03-11 14:35 UTC (permalink / raw)
  To: zerons; +Cc: netdev, OFED mailing list, haakon.bugge

On 3/10/20 9:48 PM, zerons wrote:
> 
> 
> On 3/11/20 01:53, santosh.shilimkar@oracle.com wrote:
>> On 3/6/20 4:11 AM, zerons wrote:
>>>
>>>
>>> On 2/28/20 02:10, santosh.shilimkar@oracle.com wrote:
>>>>
>>>>>> On 18 Feb 2020, at 14:13, zerons <sironhide0null@gmail.com> wrote:
>>>>>>
>>>>>> Hi, all
>>>>>>
>>>>>> In net/rds/rdma.c
>>>>>> (https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/net/rds/rdma.c?h=v5.5.3*n419__;Iw!!GqivPVa7Brio!OwwQCLtjDsKmhaIz0sfaOVSuC4ai5t5_FgB7yqNExGOCBtACtIGLF61NNJyqSDtIAcGoPg$ ),
>>>>>> there may be a race condition between rds_rdma_unuse() and rds_free_mr().
>>>>>>
>>>> Hmmm.. I didn't see email before in my inbox. Please post questions/patches on netdev in future which is the correct mailing list.
>>>>
>>>>>> It seems that this one need some specific devices to run test,
>>>>>> unfortunately, I don't have any of these.
>>>>>> I've already sent two emails to the maintainer for help, no response yet,
>>>>>> (the email address may not be in use).
>>>>>>
>>>>>> 0) in rds_recv_incoming_exthdrs(), it calls rds_rdma_unuse() when receive an
>>>>>> extension header with force=0, if the victim mr does not have RDS_RDMA_USE_ONCE
>>>>>> flag set, then the mr would stay in the rbtree. Without any lock, it tries to
>>>>>> call mr->r_trans->sync_mr().
>>>>>>
>> MR won't stay in the rbtree with force flag. If the MR is used or
>> use_once is set in both cases its removed from the tree.
>> See "if (mr->r_use_once || force)"
>>
> 
> Sorry, I may misunderstand. Did you mean that if the MR is *used*,
> it is removed from the tree with or without the force flag in
> rds_rdma_unuse(), even when r_use_once is not set?
> 
Once the MR is being used with use_once semantics it gets removed with 
or without remote side indicating it via extended header. use_once
optimization was added later. The base behavior is once the MR is
used by remote and same information is sent via extended header,
it gets cleaned up with force flag. Force flag ignores whether
its marked as used_once or not.

Regards,
Santosh


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Maybe a race condition in net/rds/rdma.c?
  2020-03-11 14:35           ` santosh.shilimkar
@ 2020-03-12  8:58             ` zerons
  2020-03-12 17:49               ` santosh.shilimkar
  0 siblings, 1 reply; 7+ messages in thread
From: zerons @ 2020-03-12  8:58 UTC (permalink / raw)
  To: santosh.shilimkar; +Cc: netdev, OFED mailing list, haakon.bugge



On 3/11/20 22:35, santosh.shilimkar@oracle.com wrote:
> On 3/10/20 9:48 PM, zerons wrote:
>>
>>
>> On 3/11/20 01:53, santosh.shilimkar@oracle.com wrote:
>>> On 3/6/20 4:11 AM, zerons wrote:
>>>>
>>>>
>>>> On 2/28/20 02:10, santosh.shilimkar@oracle.com wrote:
>>>>>
>>>>>>> On 18 Feb 2020, at 14:13, zerons <sironhide0null@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi, all
>>>>>>>
>>>>>>> In net/rds/rdma.c
>>>>>>> (https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/net/rds/rdma.c?h=v5.5.3*n419__;Iw!!GqivPVa7Brio!OwwQCLtjDsKmhaIz0sfaOVSuC4ai5t5_FgB7yqNExGOCBtACtIGLF61NNJyqSDtIAcGoPg$ ),
>>>>>>> there may be a race condition between rds_rdma_unuse() and rds_free_mr().
>>>>>>>
>>>>> Hmmm.. I didn't see email before in my inbox. Please post questions/patches on netdev in future which is the correct mailing list.
>>>>>
>>>>>>> It seems that this one need some specific devices to run test,
>>>>>>> unfortunately, I don't have any of these.
>>>>>>> I've already sent two emails to the maintainer for help, no response yet,
>>>>>>> (the email address may not be in use).
>>>>>>>
>>>>>>> 0) in rds_recv_incoming_exthdrs(), it calls rds_rdma_unuse() when receive an
>>>>>>> extension header with force=0, if the victim mr does not have RDS_RDMA_USE_ONCE
>>>>>>> flag set, then the mr would stay in the rbtree. Without any lock, it tries to
>>>>>>> call mr->r_trans->sync_mr().
>>>>>>>
>>> MR won't stay in the rbtree with force flag. If the MR is used or
>>> use_once is set in both cases its removed from the tree.
>>> See "if (mr->r_use_once || force)"
>>>
>>
>> Sorry, I may misunderstand. Did you mean that if the MR is *used*,
>> it is removed from the tree with or without the force flag in
>> rds_rdma_unuse(), even when r_use_once is not set?
>>
> Once the MR is being used with use_once semantics it gets removed with or without remote side indicating it via extended header. use_once
> optimization was added later. The base behavior is once the MR is
> used by remote and same information is sent via extended header,
> it gets cleaned up with force flag. Force flag ignores whether
> its marked as used_once or not.
> 

Sorry, I am still confused.

I check the code again. The rds_rdma_unuse() is called in two functions,
rds_recv_incoming_exthdrs() and rds_sendmsg().

In rds_sendmsg(), it calls rds_rdma_unuse() *with* force flag only when
the user included a RDMA_MAP cmsg *and* sendmsg() is failed.

In rds_recv_incoming_exthdrs(), the force is *false*. So we can consider
the rds_rdma_unuse() called *without* force flag.
Then I go check where r_use_once can be set.

__rds_rdma_map()
	rds_get_mr()
		rds_setsockopt()

	rds_get_mr_for_dest()
		rds_setsockopt()

	rds_cmsg_rdma_map()
		rds_cmsg_send()
			rds_sendmsg()

It seems to me that r_use_once is controlled by user applications.

I also wonder if we can ensure that the MR found in rds_rdma_unuse()
gets removed, then "if (mr->r_use_once || force)" doesn't make any sense.

Sorry to keep bothering you with my questions. I wish I had such a device 
that I can test it on.

Best regards,

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Maybe a race condition in net/rds/rdma.c?
  2020-03-12  8:58             ` zerons
@ 2020-03-12 17:49               ` santosh.shilimkar
  0 siblings, 0 replies; 7+ messages in thread
From: santosh.shilimkar @ 2020-03-12 17:49 UTC (permalink / raw)
  To: zerons; +Cc: netdev, OFED mailing list, haakon.bugge

On 3/12/20 1:58 AM, zerons wrote:
> 
[...]
>>>> MR won't stay in the rbtree with force flag. If the MR is used or
>>>> use_once is set in both cases its removed from the tree.
>>>> See "if (mr->r_use_once || force)"
>>>>
>>>
>>> Sorry, I may misunderstand. Did you mean that if the MR is *used*,
>>> it is removed from the tree with or without the force flag in
>>> rds_rdma_unuse(), even when r_use_once is not set?
>>>
>> Once the MR is being used with use_once semantics it gets removed with or without remote side indicating it via extended header. use_once
>> optimization was added later. The base behavior is once the MR is
>> used by remote and same information is sent via extended header,
>> it gets cleaned up with force flag. Force flag ignores whether
>> its marked as used_once or not.
>>
> 
> Sorry, I am still confused.
> 
> I check the code again. The rds_rdma_unuse() is called in two functions,
> rds_recv_incoming_exthdrs() and rds_sendmsg().
> 
> In rds_sendmsg(), it calls rds_rdma_unuse() *with* force flag only when
> the user included a RDMA_MAP cmsg *and* sendmsg() is failed.
>
correct.

> In rds_recv_incoming_exthdrs(), the force is *false*. So we can consider
> the rds_rdma_unuse() called *without* force flag.
> Then I go check where r_use_once can be set.
> 
> __rds_rdma_map()
> 	rds_get_mr()
> 		rds_setsockopt()
> 
> 	rds_get_mr_for_dest()
> 		rds_setsockopt()
> 
> 	rds_cmsg_rdma_map()
> 		rds_cmsg_send()
> 			rds_sendmsg()
> 
> It seems to me that r_use_once is controlled by user applications.
>
yes it is and its being set in the application using this in
production. But You do have point that if application don't set it
then even after MR being used and remote node indicated it being
used, the MR still remains in the RB tree.


> Sorry to keep bothering you with my questions. I wish I had such a device
> that I can test it on.
> 
Not at all. You mostly found a race condition when use_once is not used
but need to verify it. We will look into it more. Thanks for your
patience.

Regards,
Santosh


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-03-12 17:51 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <afd9225d-5c43-8cc7-0eed-455837b53e10@gmail.com>
     [not found] ` <D8EB4A77-77D7-41EB-9021-EA7BB8C3FA5B@oracle.com>
2020-02-27 18:10   ` Maybe a race condition in net/rds/rdma.c? santosh.shilimkar
2020-03-06 12:11     ` zerons
2020-03-10 17:53       ` santosh.shilimkar
2020-03-11  4:48         ` zerons
2020-03-11 14:35           ` santosh.shilimkar
2020-03-12  8:58             ` zerons
2020-03-12 17:49               ` santosh.shilimkar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).