* [Xenomai-core] [PULL] native: Fix msendq fastlock leakage
@ 2011-05-23 13:53 Jan Kiszka
  2011-05-24  4:31 ` Gilles Chanteperdrix
  2011-06-19 10:14 ` Gilles Chanteperdrix
  0 siblings, 2 replies; 37+ messages in thread
From: Jan Kiszka @ 2011-05-23 13:53 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Xenomai core

The following changes since commit aec30a2543afa18fa7832deee85e187b0faeb1f0:

  xeno-test: fix reference to @XENO_TEST_DIR@ (2011-05-15 21:20:41 +0200)

are available in the git repository at:
  git://git.xenomai.org/xenomai-jki.git for-upstream

Jan Kiszka (1):
      native: Fix msendq fastlock leakage

 include/native/task.h    |    5 +++++
 ksrc/skins/native/task.c |   13 ++++++-------
 2 files changed, 11 insertions(+), 7 deletions(-)

------8<------

When a native task terminates without going through rt_task_delete, we
have so far been leaking the fastlock. Fix this by moving the release into
the delete hook. As the ppd is already invalid at that point, we have to
save the heap address in the task data structure.

Signed-off-by: Jan Kiszka <jan.kiszka@domain.hid>
---
 include/native/task.h    |    5 +++++
 ksrc/skins/native/task.c |   13 ++++++-------
 2 files changed, 11 insertions(+), 7 deletions(-)
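
As a minimal user-space model of the approach (all names below - struct heap,
task_create, task_delete_hook - are made up for illustration; this is not the
Xenomai API): the creation path caches the heap pointer so that the delete
hook can free the fastlock without looking up the per-process descriptor,
which may already be gone by then.

#include <stdlib.h>

/* Hypothetical stand-ins for the per-process sem heap and the task object. */
struct heap { int unused; };

struct task {
	int *fastlock;
	struct heap *fastlock_heap;	/* cached at creation time */
};

static void *heap_alloc(struct heap *h, size_t size)
{
	(void)h;			/* a real heap would be used here */
	return malloc(size);
}

static void heap_free(struct heap *h, void *block)
{
	(void)h;
	free(block);
}

static int task_create(struct task *t, struct heap *process_heap)
{
	/* Remember which heap the fastlock was allocated from... */
	t->fastlock_heap = process_heap;
	t->fastlock = heap_alloc(t->fastlock_heap, sizeof(*t->fastlock));
	return t->fastlock ? 0 : -1;
}

static void task_delete_hook(struct task *t)
{
	/*
	 * ...so the hook can release it without consulting the per-process
	 * descriptor, which is no longer valid when a task dies without
	 * going through the explicit delete service.
	 */
	heap_free(t->fastlock_heap, t->fastlock);
}

int main(void)
{
	struct heap process_heap;
	struct task t;

	if (task_create(&t, &process_heap) == 0)
		task_delete_hook(&t);
	return 0;
}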

diff --git a/include/native/task.h b/include/native/task.h
index 519aaec..0d44ccf 100644
--- a/include/native/task.h
+++ b/include/native/task.h
@@ -22,6 +22,7 @@
 #ifndef _XENO_TASK_H
 #define _XENO_TASK_H
 
+#include <nucleus/heap.h>
 #include <nucleus/sched.h>
 #include <native/types.h>
 
@@ -166,6 +167,10 @@ typedef struct rt_task {
     xnsynch_t mrecv,
 	      msendq;
 
+#ifdef CONFIG_XENO_FASTSYNCH
+	xnheap_t *msendq_fastlock_heap;
+#endif /* CONFIG_XENO_FASTSYNCH */
+
     int flowgen;		/* !< Flow id. generator. */
 #endif /* CONFIG_XENO_OPT_NATIVE_MPS */
 
diff --git a/ksrc/skins/native/task.c b/ksrc/skins/native/task.c
index 1192509..6970363 100644
--- a/ksrc/skins/native/task.c
+++ b/ksrc/skins/native/task.c
@@ -79,6 +79,9 @@ static void __task_delete_hook(xnthread_t *thread)
 	   hooks are done. */
 	xnsynch_destroy(&task->mrecv);
 	xnsynch_destroy(&task->msendq);
+#ifdef CONFIG_XENO_FASTSYNCH
+	xnheap_free(task->msendq_fastlock_heap, task->msendq.fastlock);
+#endif /* CONFIG_XENO_FASTSYNCH */
 #endif /* CONFIG_XENO_OPT_NATIVE_MPS */
 
 	xnsynch_destroy(&task->safesynch);
@@ -301,7 +304,8 @@ int rt_task_create(RT_TASK *task,
 #ifdef CONFIG_XENO_OPT_NATIVE_MPS
 #ifdef CONFIG_XENO_FASTSYNCH
 	/* Allocate lock memory for in-kernel use */
-	fastlock = xnheap_alloc(&xnsys_ppd_get(0)->sem_heap,
+	task->msendq_fastlock_heap = &xnsys_ppd_get(0)->sem_heap;
+	fastlock = xnheap_alloc(task->msendq_fastlock_heap,
 				sizeof(*fastlock));
 	if (fastlock == NULL)
 		return -ENOMEM;
@@ -328,7 +332,7 @@ int rt_task_create(RT_TASK *task,
 	err = xnthread_register(&task->thread_base, name ? task->rname : "");
 	if (err) {
 #if defined(CONFIG_XENO_OPT_NATIVE_MPS) && defined(CONFIG_XENO_FASTSYNCH)
-		xnheap_free(&xnsys_ppd_get(0)->sem_heap, fastlock);
+		xnheap_free(task->msendq_fastlock_heap, fastlock);
 #endif
 		xnpod_delete_thread(&task->thread_base);
 	} else
@@ -640,11 +644,6 @@ int rt_task_delete(RT_TASK *task)
 	if (err)
 		goto unlock_and_exit;
 
-#if defined(CONFIG_XENO_OPT_NATIVE_MPS) && defined(CONFIG_XENO_FASTSYNCH)
-	xnheap_free(&xnsys_ppd_get(0)->sem_heap,
-		    task->msendq.fastlock);
-#endif
-
 	/* Does not return if task is current. */
 	xnpod_delete_thread(&task->thread_base);
 
-- 
1.7.1



* Re: [Xenomai-core] [PULL] native: Fix msendq fastlock leakage
  2011-05-23 13:53 [Xenomai-core] [PULL] native: Fix msendq fastlock leakage Jan Kiszka
@ 2011-05-24  4:31 ` Gilles Chanteperdrix
  2011-05-24  9:13   ` Jan Kiszka
  2011-06-19 10:14 ` Gilles Chanteperdrix
  1 sibling, 1 reply; 37+ messages in thread
From: Gilles Chanteperdrix @ 2011-05-24  4:31 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai core

On 05/23/2011 03:53 PM, Jan Kiszka wrote:
> The following changes since commit aec30a2543afa18fa7832deee85e187b0faeb1f0:
> 
>   xeno-test: fix reference to @XENO_TEST_DIR@ (2011-05-15 21:20:41 +0200)
> 
> are available in the git repository at:
>   git://git.xenomai.org/xenomai-jki.git for-upstream
> 
> Jan Kiszka (1):
>       native: Fix msendq fastlock leakage
> 
>  include/native/task.h    |    5 +++++
>  ksrc/skins/native/task.c |   13 ++++++-------
>  2 files changed, 11 insertions(+), 7 deletions(-)
> 
> ------8<------
> 
> When a native task terminates without going through rt_task_delete, we
> leaked the fastlock so far. Fix it by moving the release into the delete
> hook. As the ppd is already invalid at that point, we have to save the
> heap address in the task data structure.

Hi Jan, I once worked on a patch to reverse the ppd cleanup order, in order
to fix bugs of this kind. Here it comes. I do not remember why I did not
commit it, but I guess it was not working well. Could we restart working
from this patch?

From 038ecf08cd66b3112e0fe277d71d294b8eb83bcc Mon Sep 17 00:00:00 2001
From: Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org>
Date: Sun, 29 Aug 2010 14:52:08 +0200
Subject: [PATCH] nucleus: reverse ppd cleanup order

---
 ksrc/nucleus/shadow.c |   11 ++++++-----
 1 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
index b2d4326..725ae43 100644
--- a/ksrc/nucleus/shadow.c
+++ b/ksrc/nucleus/shadow.c
@@ -556,7 +556,7 @@ static unsigned ppd_lookup_inner(xnqueue_t **pq,
 	}
 	while (holder &&
 	       (ppd->key.mm < pkey->mm ||
-		(ppd->key.mm == pkey->mm && ppd->key.muxid < pkey->muxid)));
+		(ppd->key.mm == pkey->mm && ppd->key.muxid > pkey->muxid)));
 
 	if (ppd->key.mm == pkey->mm && ppd->key.muxid == pkey->muxid) {
 		/* found it, return it. */
@@ -566,7 +566,7 @@ static unsigned ppd_lookup_inner(xnqueue_t **pq,
 
 	/* not found, return successor for insertion. */
 	if (ppd->key.mm < pkey->mm ||
-	    (ppd->key.mm == pkey->mm && ppd->key.muxid < pkey->muxid))
+	    (ppd->key.mm == pkey->mm && ppd->key.muxid > pkey->muxid))
 		*pholder = holder ? link2ppd(holder) : NULL;
 	else
 		*pholder = ppd;
@@ -589,10 +589,11 @@ static int ppd_insert(xnshadow_ppd_t * holder)
 	}
 
 	inith(&holder->link);
-	if (next)
+	if (next) {
 		insertq(q, &next->link, &holder->link);
-	else
+	} else {
 		appendq(q, &holder->link);
+	}
 	xnlock_put_irqrestore(&nklock, s);
 
 	return 0;
@@ -640,7 +641,7 @@ static inline void ppd_remove_mm(struct mm_struct *mm,
 	xnqueue_t *q;
 	spl_t s;
 
-	key.muxid = 0;
+	key.muxid = ~0UL;
 	key.mm = mm;
 	xnlock_get_irqsave(&nklock, s);
 	ppd_lookup_inner(&q, &ppd, &key);
-- 
1.7.2.5



-- 
                                                                Gilles.



* Re: [Xenomai-core] [PULL] native: Fix msendq fastlock leakage
  2011-05-24  4:31 ` Gilles Chanteperdrix
@ 2011-05-24  9:13   ` Jan Kiszka
  2011-05-24  9:32     ` Gilles Chanteperdrix
  0 siblings, 1 reply; 37+ messages in thread
From: Jan Kiszka @ 2011-05-24  9:13 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai core

On 2011-05-24 06:31, Gilles Chanteperdrix wrote:
> On 05/23/2011 03:53 PM, Jan Kiszka wrote:
>> The following changes since commit aec30a2543afa18fa7832deee85e187b0faeb1f0:
>>
>>   xeno-test: fix reference to @XENO_TEST_DIR@ (2011-05-15 21:20:41 +0200)
>>
>> are available in the git repository at:
>>   git://git.xenomai.org/xenomai-jki.git for-upstream
>>
>> Jan Kiszka (1):
>>       native: Fix msendq fastlock leakage
>>
>>  include/native/task.h    |    5 +++++
>>  ksrc/skins/native/task.c |   13 ++++++-------
>>  2 files changed, 11 insertions(+), 7 deletions(-)
>>
>> ------8<------
>>
>> When a native task terminates without going through rt_task_delete, we
>> leaked the fastlock so far. Fix it by moving the release into the delete
>> hook. As the ppd is already invalid at that point, we have to save the
>> heap address in the task data structure.
> 
> I Jan, I once worked on a patch to reverse the ppd cleanup order, in order
> to fix bugs of this kind. Here it comes. I do not remember why I did not
> commit it, but I guess it was not working well. Could we restart working
> from this patch?
> 
> From 038ecf08cd66b3112e0fe277d71d294b8eb83bcc Mon Sep 17 00:00:00 2001
> From: Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org>
> Date: Sun, 29 Aug 2010 14:52:08 +0200
> Subject: [PATCH] nucleus: reverse ppd cleanup order
> 
> ---
>  ksrc/nucleus/shadow.c |   11 ++++++-----
>  1 files changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
> index b2d4326..725ae43 100644
> --- a/ksrc/nucleus/shadow.c
> +++ b/ksrc/nucleus/shadow.c
> @@ -556,7 +556,7 @@ static unsigned ppd_lookup_inner(xnqueue_t **pq,
>  	}
>  	while (holder &&
>  	       (ppd->key.mm < pkey->mm ||
> -		(ppd->key.mm == pkey->mm && ppd->key.muxid < pkey->muxid)));
> +		(ppd->key.mm == pkey->mm && ppd->key.muxid > pkey->muxid)));
>  
>  	if (ppd->key.mm == pkey->mm && ppd->key.muxid == pkey->muxid) {
>  		/* found it, return it. */
> @@ -566,7 +566,7 @@ static unsigned ppd_lookup_inner(xnqueue_t **pq,
>  
>  	/* not found, return successor for insertion. */
>  	if (ppd->key.mm < pkey->mm ||
> -	    (ppd->key.mm == pkey->mm && ppd->key.muxid < pkey->muxid))
> +	    (ppd->key.mm == pkey->mm && ppd->key.muxid > pkey->muxid))
>  		*pholder = holder ? link2ppd(holder) : NULL;
>  	else
>  		*pholder = ppd;
> @@ -589,10 +589,11 @@ static int ppd_insert(xnshadow_ppd_t * holder)
>  	}
>  
>  	inith(&holder->link);
> -	if (next)
> +	if (next) {
>  		insertq(q, &next->link, &holder->link);
> -	else
> +	} else {
>  		appendq(q, &holder->link);
> +	}
>  	xnlock_put_irqrestore(&nklock, s);
>  
>  	return 0;
> @@ -640,7 +641,7 @@ static inline void ppd_remove_mm(struct mm_struct *mm,
>  	xnqueue_t *q;
>  	spl_t s;
>  
> -	key.muxid = 0;
> +	key.muxid = ~0UL;
>  	key.mm = mm;
>  	xnlock_get_irqsave(&nklock, s);
>  	ppd_lookup_inner(&q, &ppd, &key);

Unfortunately, that won't help. I think we are forced to clear
xnshadow_thrptd before calling into xnpod_delete_thread, but we would
need that for xnshadow_ppd_get (=>xnpod_userspace_p()).
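
Roughly this shape, as a self-contained sketch with stand-in types and names
(the real shadow/exit code differs; this only illustrates the ordering
constraint):

/* Stand-ins only -- not the actual kernel or nucleus definitions. */
struct task_struct { void *thrptd; };	/* per-task pointer to the shadow */
struct xnthread { int unused; };

static void delete_hook(struct xnthread *thread)
{
	/*
	 * Invoked from delete_thread(). At this point the exit path has
	 * already cleared the task's thrptd, so anything that keys off it,
	 * like xnshadow_ppd_get() via xnpod_userspace_p(), cannot work.
	 */
	(void)thread;
}

static void delete_thread(struct xnthread *thread)
{
	delete_hook(thread);
}

static void taskexit_event(struct task_struct *p)
{
	struct xnthread *thread = p->thrptd;

	p->thrptd = 0;			/* has to be cleared first...          */
	delete_thread(thread);		/* ...before the delete hook runs here */
}

int main(void)
{
	struct xnthread shadow = { 0 };
	struct task_struct p = { &shadow };

	taskexit_event(&p);
	return 0;
}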

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



* Re: [Xenomai-core] [PULL] native: Fix msendq fastlock leakage
  2011-05-24  9:13   ` Jan Kiszka
@ 2011-05-24  9:32     ` Gilles Chanteperdrix
  2011-05-24  9:36       ` Jan Kiszka
  0 siblings, 1 reply; 37+ messages in thread
From: Gilles Chanteperdrix @ 2011-05-24  9:32 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai core

On 05/24/2011 11:13 AM, Jan Kiszka wrote:
> On 2011-05-24 06:31, Gilles Chanteperdrix wrote:
>> On 05/23/2011 03:53 PM, Jan Kiszka wrote:
>>> The following changes since commit aec30a2543afa18fa7832deee85e187b0faeb1f0:
>>>
>>>   xeno-test: fix reference to @XENO_TEST_DIR@ (2011-05-15 21:20:41 +0200)
>>>
>>> are available in the git repository at:
>>>   git://git.xenomai.org/xenomai-jki.git for-upstream
>>>
>>> Jan Kiszka (1):
>>>       native: Fix msendq fastlock leakage
>>>
>>>  include/native/task.h    |    5 +++++
>>>  ksrc/skins/native/task.c |   13 ++++++-------
>>>  2 files changed, 11 insertions(+), 7 deletions(-)
>>>
>>> ------8<------
>>>
>>> When a native task terminates without going through rt_task_delete, we
>>> leaked the fastlock so far. Fix it by moving the release into the delete
>>> hook. As the ppd is already invalid at that point, we have to save the
>>> heap address in the task data structure.
>>
>> I Jan, I once worked on a patch to reverse the ppd cleanup order, in order
>> to fix bugs of this kind. Here it comes. I do not remember why I did not
>> commit it, but I guess it was not working well. Could we restart working
>> from this patch?
>>
>> From 038ecf08cd66b3112e0fe277d71d294b8eb83bcc Mon Sep 17 00:00:00 2001
>> From: Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org>
>> Date: Sun, 29 Aug 2010 14:52:08 +0200
>> Subject: [PATCH] nucleus: reverse ppd cleanup order
>>
>> ---
>>  ksrc/nucleus/shadow.c |   11 ++++++-----
>>  1 files changed, 6 insertions(+), 5 deletions(-)
>>
>> diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
>> index b2d4326..725ae43 100644
>> --- a/ksrc/nucleus/shadow.c
>> +++ b/ksrc/nucleus/shadow.c
>> @@ -556,7 +556,7 @@ static unsigned ppd_lookup_inner(xnqueue_t **pq,
>>  	}
>>  	while (holder &&
>>  	       (ppd->key.mm < pkey->mm ||
>> -		(ppd->key.mm == pkey->mm && ppd->key.muxid < pkey->muxid)));
>> +		(ppd->key.mm == pkey->mm && ppd->key.muxid > pkey->muxid)));
>>  
>>  	if (ppd->key.mm == pkey->mm && ppd->key.muxid == pkey->muxid) {
>>  		/* found it, return it. */
>> @@ -566,7 +566,7 @@ static unsigned ppd_lookup_inner(xnqueue_t **pq,
>>  
>>  	/* not found, return successor for insertion. */
>>  	if (ppd->key.mm < pkey->mm ||
>> -	    (ppd->key.mm == pkey->mm && ppd->key.muxid < pkey->muxid))
>> +	    (ppd->key.mm == pkey->mm && ppd->key.muxid > pkey->muxid))
>>  		*pholder = holder ? link2ppd(holder) : NULL;
>>  	else
>>  		*pholder = ppd;
>> @@ -589,10 +589,11 @@ static int ppd_insert(xnshadow_ppd_t * holder)
>>  	}
>>  
>>  	inith(&holder->link);
>> -	if (next)
>> +	if (next) {
>>  		insertq(q, &next->link, &holder->link);
>> -	else
>> +	} else {
>>  		appendq(q, &holder->link);
>> +	}
>>  	xnlock_put_irqrestore(&nklock, s);
>>  
>>  	return 0;
>> @@ -640,7 +641,7 @@ static inline void ppd_remove_mm(struct mm_struct *mm,
>>  	xnqueue_t *q;
>>  	spl_t s;
>>  
>> -	key.muxid = 0;
>> +	key.muxid = ~0UL;
>>  	key.mm = mm;
>>  	xnlock_get_irqsave(&nklock, s);
>>  	ppd_lookup_inner(&q, &ppd, &key);
> 
> Unfortunately, that won't help. I think we are forced to clear
> xnshadow_thrptd before calling into xnpod_delete_thread, but we would
> need that for xnshadow_ppd_get (=>xnpod_userspace_p()).

I remember that now. Even if it worked, current->mm is already NULL when
the cleanup handler is called. We need to do this differently: the sys ppd
should be treated specially and passed to the other ppds' cleanup routines.

-- 
					    Gilles.



* Re: [Xenomai-core] [PULL] native: Fix msendq fastlock leakage
  2011-05-24  9:32     ` Gilles Chanteperdrix
@ 2011-05-24  9:36       ` Jan Kiszka
  2011-05-24  9:58         ` Gilles Chanteperdrix
  0 siblings, 1 reply; 37+ messages in thread
From: Jan Kiszka @ 2011-05-24  9:36 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai core

On 2011-05-24 11:32, Gilles Chanteperdrix wrote:
> On 05/24/2011 11:13 AM, Jan Kiszka wrote:
>> On 2011-05-24 06:31, Gilles Chanteperdrix wrote:
>>> On 05/23/2011 03:53 PM, Jan Kiszka wrote:
>>>> The following changes since commit aec30a2543afa18fa7832deee85e187b0faeb1f0:
>>>>
>>>>   xeno-test: fix reference to @XENO_TEST_DIR@ (2011-05-15 21:20:41 +0200)
>>>>
>>>> are available in the git repository at:
>>>>   git://git.xenomai.org/xenomai-jki.git for-upstream
>>>>
>>>> Jan Kiszka (1):
>>>>       native: Fix msendq fastlock leakage
>>>>
>>>>  include/native/task.h    |    5 +++++
>>>>  ksrc/skins/native/task.c |   13 ++++++-------
>>>>  2 files changed, 11 insertions(+), 7 deletions(-)
>>>>
>>>> ------8<------
>>>>
>>>> When a native task terminates without going through rt_task_delete, we
>>>> leaked the fastlock so far. Fix it by moving the release into the delete
>>>> hook. As the ppd is already invalid at that point, we have to save the
>>>> heap address in the task data structure.
>>>
>>> I Jan, I once worked on a patch to reverse the ppd cleanup order, in order
>>> to fix bugs of this kind. Here it comes. I do not remember why I did not
>>> commit it, but I guess it was not working well. Could we restart working
>>> from this patch?
>>>
>>> From 038ecf08cd66b3112e0fe277d71d294b8eb83bcc Mon Sep 17 00:00:00 2001
>>> From: Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org>
>>> Date: Sun, 29 Aug 2010 14:52:08 +0200
>>> Subject: [PATCH] nucleus: reverse ppd cleanup order
>>>
>>> ---
>>>  ksrc/nucleus/shadow.c |   11 ++++++-----
>>>  1 files changed, 6 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
>>> index b2d4326..725ae43 100644
>>> --- a/ksrc/nucleus/shadow.c
>>> +++ b/ksrc/nucleus/shadow.c
>>> @@ -556,7 +556,7 @@ static unsigned ppd_lookup_inner(xnqueue_t **pq,
>>>  	}
>>>  	while (holder &&
>>>  	       (ppd->key.mm < pkey->mm ||
>>> -		(ppd->key.mm == pkey->mm && ppd->key.muxid < pkey->muxid)));
>>> +		(ppd->key.mm == pkey->mm && ppd->key.muxid > pkey->muxid)));
>>>  
>>>  	if (ppd->key.mm == pkey->mm && ppd->key.muxid == pkey->muxid) {
>>>  		/* found it, return it. */
>>> @@ -566,7 +566,7 @@ static unsigned ppd_lookup_inner(xnqueue_t **pq,
>>>  
>>>  	/* not found, return successor for insertion. */
>>>  	if (ppd->key.mm < pkey->mm ||
>>> -	    (ppd->key.mm == pkey->mm && ppd->key.muxid < pkey->muxid))
>>> +	    (ppd->key.mm == pkey->mm && ppd->key.muxid > pkey->muxid))
>>>  		*pholder = holder ? link2ppd(holder) : NULL;
>>>  	else
>>>  		*pholder = ppd;
>>> @@ -589,10 +589,11 @@ static int ppd_insert(xnshadow_ppd_t * holder)
>>>  	}
>>>  
>>>  	inith(&holder->link);
>>> -	if (next)
>>> +	if (next) {
>>>  		insertq(q, &next->link, &holder->link);
>>> -	else
>>> +	} else {
>>>  		appendq(q, &holder->link);
>>> +	}
>>>  	xnlock_put_irqrestore(&nklock, s);
>>>  
>>>  	return 0;
>>> @@ -640,7 +641,7 @@ static inline void ppd_remove_mm(struct mm_struct *mm,
>>>  	xnqueue_t *q;
>>>  	spl_t s;
>>>  
>>> -	key.muxid = 0;
>>> +	key.muxid = ~0UL;
>>>  	key.mm = mm;
>>>  	xnlock_get_irqsave(&nklock, s);
>>>  	ppd_lookup_inner(&q, &ppd, &key);
>>
>> Unfortunately, that won't help. I think we are forced to clear
>> xnshadow_thrptd before calling into xnpod_delete_thread, but we would
>> need that for xnshadow_ppd_get (=>xnpod_userspace_p()).
> 
> I remember that now. Even if it worked, when the cleanup handler is
> called, current->mm is NULL. We need to do this differently, the sys ppd
> should be treated differently and passed to the other ppds cleanup routines.

Do you already have an idea how to get that info to the delete hook
function?

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



* Re: [Xenomai-core] [PULL] native: Fix msendq fastlock leakage
  2011-05-24  9:36       ` Jan Kiszka
@ 2011-05-24  9:58         ` Gilles Chanteperdrix
  2011-05-24 10:36           ` Jan Kiszka
  0 siblings, 1 reply; 37+ messages in thread
From: Gilles Chanteperdrix @ 2011-05-24  9:58 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai core

On 05/24/2011 11:36 AM, Jan Kiszka wrote:
> On 2011-05-24 11:32, Gilles Chanteperdrix wrote:
>> On 05/24/2011 11:13 AM, Jan Kiszka wrote:
>>> On 2011-05-24 06:31, Gilles Chanteperdrix wrote:
>>>> On 05/23/2011 03:53 PM, Jan Kiszka wrote:
>>>>> The following changes since commit aec30a2543afa18fa7832deee85e187b0faeb1f0:
>>>>>
>>>>>   xeno-test: fix reference to @XENO_TEST_DIR@ (2011-05-15 21:20:41 +0200)
>>>>>
>>>>> are available in the git repository at:
>>>>>   git://git.xenomai.org/xenomai-jki.git for-upstream
>>>>>
>>>>> Jan Kiszka (1):
>>>>>       native: Fix msendq fastlock leakage
>>>>>
>>>>>  include/native/task.h    |    5 +++++
>>>>>  ksrc/skins/native/task.c |   13 ++++++-------
>>>>>  2 files changed, 11 insertions(+), 7 deletions(-)
>>>>>
>>>>> ------8<------
>>>>>
>>>>> When a native task terminates without going through rt_task_delete, we
>>>>> leaked the fastlock so far. Fix it by moving the release into the delete
>>>>> hook. As the ppd is already invalid at that point, we have to save the
>>>>> heap address in the task data structure.
>>>>
>>>> I Jan, I once worked on a patch to reverse the ppd cleanup order, in order
>>>> to fix bugs of this kind. Here it comes. I do not remember why I did not
>>>> commit it, but I guess it was not working well. Could we restart working
>>>> from this patch?
>>>>
>>>> From 038ecf08cd66b3112e0fe277d71d294b8eb83bcc Mon Sep 17 00:00:00 2001
>>>> From: Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org>
>>>> Date: Sun, 29 Aug 2010 14:52:08 +0200
>>>> Subject: [PATCH] nucleus: reverse ppd cleanup order
>>>>
>>>> ---
>>>>  ksrc/nucleus/shadow.c |   11 ++++++-----
>>>>  1 files changed, 6 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
>>>> index b2d4326..725ae43 100644
>>>> --- a/ksrc/nucleus/shadow.c
>>>> +++ b/ksrc/nucleus/shadow.c
>>>> @@ -556,7 +556,7 @@ static unsigned ppd_lookup_inner(xnqueue_t **pq,
>>>>  	}
>>>>  	while (holder &&
>>>>  	       (ppd->key.mm < pkey->mm ||
>>>> -		(ppd->key.mm == pkey->mm && ppd->key.muxid < pkey->muxid)));
>>>> +		(ppd->key.mm == pkey->mm && ppd->key.muxid > pkey->muxid)));
>>>>  
>>>>  	if (ppd->key.mm == pkey->mm && ppd->key.muxid == pkey->muxid) {
>>>>  		/* found it, return it. */
>>>> @@ -566,7 +566,7 @@ static unsigned ppd_lookup_inner(xnqueue_t **pq,
>>>>  
>>>>  	/* not found, return successor for insertion. */
>>>>  	if (ppd->key.mm < pkey->mm ||
>>>> -	    (ppd->key.mm == pkey->mm && ppd->key.muxid < pkey->muxid))
>>>> +	    (ppd->key.mm == pkey->mm && ppd->key.muxid > pkey->muxid))
>>>>  		*pholder = holder ? link2ppd(holder) : NULL;
>>>>  	else
>>>>  		*pholder = ppd;
>>>> @@ -589,10 +589,11 @@ static int ppd_insert(xnshadow_ppd_t * holder)
>>>>  	}
>>>>  
>>>>  	inith(&holder->link);
>>>> -	if (next)
>>>> +	if (next) {
>>>>  		insertq(q, &next->link, &holder->link);
>>>> -	else
>>>> +	} else {
>>>>  		appendq(q, &holder->link);
>>>> +	}
>>>>  	xnlock_put_irqrestore(&nklock, s);
>>>>  
>>>>  	return 0;
>>>> @@ -640,7 +641,7 @@ static inline void ppd_remove_mm(struct mm_struct *mm,
>>>>  	xnqueue_t *q;
>>>>  	spl_t s;
>>>>  
>>>> -	key.muxid = 0;
>>>> +	key.muxid = ~0UL;
>>>>  	key.mm = mm;
>>>>  	xnlock_get_irqsave(&nklock, s);
>>>>  	ppd_lookup_inner(&q, &ppd, &key);
>>>
>>> Unfortunately, that won't help. I think we are forced to clear
>>> xnshadow_thrptd before calling into xnpod_delete_thread, but we would
>>> need that for xnshadow_ppd_get (=>xnpod_userspace_p()).
>>
>> I remember that now. Even if it worked, when the cleanup handler is
>> called, current->mm is NULL. We need to do this differently, the sys ppd
>> should be treated differently and passed to the other ppds cleanup routines.
> 
> Do you already have an idea how to get that info to the delete hook
> function?

Yes. We start by not applying the list reversal patch, so the sys_ppd
stays first in the list. Then, in the function ppd_remove_mm, we can
start by removing all the other ppds and remove the sys ppd (that is
the first one) last. This changes a few signatures in the core code and
a lot of things in the skin code, but that would be for the better...
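
In toy form (hypothetical names, nothing from the real nucleus; it only
illustrates the intended ordering, with the sys ppd kept first in the list
and cleaned up last so the other cleanups can still rely on it):

#include <stdio.h>

struct ppd { const char *name; };

/* Assume the sys ppd is always the first entry, as in the unreversed list. */
static struct ppd ppd_list[] = {
	{ "sys" }, { "native" }, { "posix" }, { "rtdm" }
};
#define NPPD (sizeof(ppd_list) / sizeof(ppd_list[0]))

static void cleanup_ppd(struct ppd *p, struct ppd *sys)
{
	/* A skin cleanup may still use the sys ppd handed in explicitly. */
	printf("cleaning up %s (sys ppd still valid: %s)\n", p->name, sys->name);
}

int main(void)
{
	struct ppd *sys = &ppd_list[0];
	size_t i;

	/* Remove all the other ppds first... */
	for (i = NPPD - 1; i > 0; i--)
		cleanup_ppd(&ppd_list[i], sys);

	/* ...and the sys ppd last, once nobody needs it anymore. */
	cleanup_ppd(sys, sys);
	return 0;
}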

-- 
                                                                Gilles.



* Re: [Xenomai-core] [PULL] native: Fix msendq fastlock leakage
  2011-05-24  9:58         ` Gilles Chanteperdrix
@ 2011-05-24 10:36           ` Jan Kiszka
  2011-05-24 10:41             ` Gilles Chanteperdrix
  0 siblings, 1 reply; 37+ messages in thread
From: Jan Kiszka @ 2011-05-24 10:36 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai core

On 2011-05-24 11:58, Gilles Chanteperdrix wrote:
> On 05/24/2011 11:36 AM, Jan Kiszka wrote:
>> On 2011-05-24 11:32, Gilles Chanteperdrix wrote:
>>> On 05/24/2011 11:13 AM, Jan Kiszka wrote:
>>>> On 2011-05-24 06:31, Gilles Chanteperdrix wrote:
>>>>> On 05/23/2011 03:53 PM, Jan Kiszka wrote:
>>>>>> The following changes since commit aec30a2543afa18fa7832deee85e187b0faeb1f0:
>>>>>>
>>>>>>   xeno-test: fix reference to @XENO_TEST_DIR@ (2011-05-15 21:20:41 +0200)
>>>>>>
>>>>>> are available in the git repository at:
>>>>>>   git://git.xenomai.org/xenomai-jki.git for-upstream
>>>>>>
>>>>>> Jan Kiszka (1):
>>>>>>       native: Fix msendq fastlock leakage
>>>>>>
>>>>>>  include/native/task.h    |    5 +++++
>>>>>>  ksrc/skins/native/task.c |   13 ++++++-------
>>>>>>  2 files changed, 11 insertions(+), 7 deletions(-)
>>>>>>
>>>>>> ------8<------
>>>>>>
>>>>>> When a native task terminates without going through rt_task_delete, we
>>>>>> leaked the fastlock so far. Fix it by moving the release into the delete
>>>>>> hook. As the ppd is already invalid at that point, we have to save the
>>>>>> heap address in the task data structure.
>>>>>
>>>>> I Jan, I once worked on a patch to reverse the ppd cleanup order, in order
>>>>> to fix bugs of this kind. Here it comes. I do not remember why I did not
>>>>> commit it, but I guess it was not working well. Could we restart working
>>>>> from this patch?
>>>>>
>>>>> From 038ecf08cd66b3112e0fe277d71d294b8eb83bcc Mon Sep 17 00:00:00 2001
>>>>> From: Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org>
>>>>> Date: Sun, 29 Aug 2010 14:52:08 +0200
>>>>> Subject: [PATCH] nucleus: reverse ppd cleanup order
>>>>>
>>>>> ---
>>>>>  ksrc/nucleus/shadow.c |   11 ++++++-----
>>>>>  1 files changed, 6 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
>>>>> index b2d4326..725ae43 100644
>>>>> --- a/ksrc/nucleus/shadow.c
>>>>> +++ b/ksrc/nucleus/shadow.c
>>>>> @@ -556,7 +556,7 @@ static unsigned ppd_lookup_inner(xnqueue_t **pq,
>>>>>  	}
>>>>>  	while (holder &&
>>>>>  	       (ppd->key.mm < pkey->mm ||
>>>>> -		(ppd->key.mm == pkey->mm && ppd->key.muxid < pkey->muxid)));
>>>>> +		(ppd->key.mm == pkey->mm && ppd->key.muxid > pkey->muxid)));
>>>>>  
>>>>>  	if (ppd->key.mm == pkey->mm && ppd->key.muxid == pkey->muxid) {
>>>>>  		/* found it, return it. */
>>>>> @@ -566,7 +566,7 @@ static unsigned ppd_lookup_inner(xnqueue_t **pq,
>>>>>  
>>>>>  	/* not found, return successor for insertion. */
>>>>>  	if (ppd->key.mm < pkey->mm ||
>>>>> -	    (ppd->key.mm == pkey->mm && ppd->key.muxid < pkey->muxid))
>>>>> +	    (ppd->key.mm == pkey->mm && ppd->key.muxid > pkey->muxid))
>>>>>  		*pholder = holder ? link2ppd(holder) : NULL;
>>>>>  	else
>>>>>  		*pholder = ppd;
>>>>> @@ -589,10 +589,11 @@ static int ppd_insert(xnshadow_ppd_t * holder)
>>>>>  	}
>>>>>  
>>>>>  	inith(&holder->link);
>>>>> -	if (next)
>>>>> +	if (next) {
>>>>>  		insertq(q, &next->link, &holder->link);
>>>>> -	else
>>>>> +	} else {
>>>>>  		appendq(q, &holder->link);
>>>>> +	}
>>>>>  	xnlock_put_irqrestore(&nklock, s);
>>>>>  
>>>>>  	return 0;
>>>>> @@ -640,7 +641,7 @@ static inline void ppd_remove_mm(struct mm_struct *mm,
>>>>>  	xnqueue_t *q;
>>>>>  	spl_t s;
>>>>>  
>>>>> -	key.muxid = 0;
>>>>> +	key.muxid = ~0UL;
>>>>>  	key.mm = mm;
>>>>>  	xnlock_get_irqsave(&nklock, s);
>>>>>  	ppd_lookup_inner(&q, &ppd, &key);
>>>>
>>>> Unfortunately, that won't help. I think we are forced to clear
>>>> xnshadow_thrptd before calling into xnpod_delete_thread, but we would
>>>> need that for xnshadow_ppd_get (=>xnpod_userspace_p()).
>>>
>>> I remember that now. Even if it worked, when the cleanup handler is
>>> called, current->mm is NULL. We need to do this differently, the sys ppd
>>> should be treated differently and passed to the other ppds cleanup routines.
>>
>> Do you already have an idea how to get that info to the delete hook
>> function?
> 
> Yes. We start by not applying the list reversal patch, then the sys_ppd
> is the first in the list. So, we can, in the function ppd_remove_mm,
> start by removing all the others ppd, then remove the sys ppd (that is
> the first), last. This changes a few signatures in the core code, a lot
> of things in the skin code, but that would be for the better...

I still don't see how this affects the order we use in
do_taskexit_event, the one that prevents xnsys_get_ppd usage even when
the mm is still present.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



* Re: [Xenomai-core] [PULL] native: Fix msendq fastlock leakage
  2011-05-24 10:36           ` Jan Kiszka
@ 2011-05-24 10:41             ` Gilles Chanteperdrix
  2011-05-24 12:23               ` Jan Kiszka
  0 siblings, 1 reply; 37+ messages in thread
From: Gilles Chanteperdrix @ 2011-05-24 10:41 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai core

On 05/24/2011 12:36 PM, Jan Kiszka wrote:
> On 2011-05-24 11:58, Gilles Chanteperdrix wrote:
>> On 05/24/2011 11:36 AM, Jan Kiszka wrote:
>>> On 2011-05-24 11:32, Gilles Chanteperdrix wrote:
>>>> On 05/24/2011 11:13 AM, Jan Kiszka wrote:
>>>>> On 2011-05-24 06:31, Gilles Chanteperdrix wrote:
>>>>>> On 05/23/2011 03:53 PM, Jan Kiszka wrote:
>>>>>>> The following changes since commit aec30a2543afa18fa7832deee85e187b0faeb1f0:
>>>>>>>
>>>>>>>   xeno-test: fix reference to @XENO_TEST_DIR@ (2011-05-15 21:20:41 +0200)
>>>>>>>
>>>>>>> are available in the git repository at:
>>>>>>>   git://git.xenomai.org/xenomai-jki.git for-upstream
>>>>>>>
>>>>>>> Jan Kiszka (1):
>>>>>>>       native: Fix msendq fastlock leakage
>>>>>>>
>>>>>>>  include/native/task.h    |    5 +++++
>>>>>>>  ksrc/skins/native/task.c |   13 ++++++-------
>>>>>>>  2 files changed, 11 insertions(+), 7 deletions(-)
>>>>>>>
>>>>>>> ------8<------
>>>>>>>
>>>>>>> When a native task terminates without going through rt_task_delete, we
>>>>>>> leaked the fastlock so far. Fix it by moving the release into the delete
>>>>>>> hook. As the ppd is already invalid at that point, we have to save the
>>>>>>> heap address in the task data structure.
>>>>>>
>>>>>> I Jan, I once worked on a patch to reverse the ppd cleanup order, in order
>>>>>> to fix bugs of this kind. Here it comes. I do not remember why I did not
>>>>>> commit it, but I guess it was not working well. Could we restart working
>>>>>> from this patch?
>>>>>>
>>>>>> From 038ecf08cd66b3112e0fe277d71d294b8eb83bcc Mon Sep 17 00:00:00 2001
>>>>>> From: Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org>
>>>>>> Date: Sun, 29 Aug 2010 14:52:08 +0200
>>>>>> Subject: [PATCH] nucleus: reverse ppd cleanup order
>>>>>>
>>>>>> ---
>>>>>>  ksrc/nucleus/shadow.c |   11 ++++++-----
>>>>>>  1 files changed, 6 insertions(+), 5 deletions(-)
>>>>>>
>>>>>> diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
>>>>>> index b2d4326..725ae43 100644
>>>>>> --- a/ksrc/nucleus/shadow.c
>>>>>> +++ b/ksrc/nucleus/shadow.c
>>>>>> @@ -556,7 +556,7 @@ static unsigned ppd_lookup_inner(xnqueue_t **pq,
>>>>>>  	}
>>>>>>  	while (holder &&
>>>>>>  	       (ppd->key.mm < pkey->mm ||
>>>>>> -		(ppd->key.mm == pkey->mm && ppd->key.muxid < pkey->muxid)));
>>>>>> +		(ppd->key.mm == pkey->mm && ppd->key.muxid > pkey->muxid)));
>>>>>>  
>>>>>>  	if (ppd->key.mm == pkey->mm && ppd->key.muxid == pkey->muxid) {
>>>>>>  		/* found it, return it. */
>>>>>> @@ -566,7 +566,7 @@ static unsigned ppd_lookup_inner(xnqueue_t **pq,
>>>>>>  
>>>>>>  	/* not found, return successor for insertion. */
>>>>>>  	if (ppd->key.mm < pkey->mm ||
>>>>>> -	    (ppd->key.mm == pkey->mm && ppd->key.muxid < pkey->muxid))
>>>>>> +	    (ppd->key.mm == pkey->mm && ppd->key.muxid > pkey->muxid))
>>>>>>  		*pholder = holder ? link2ppd(holder) : NULL;
>>>>>>  	else
>>>>>>  		*pholder = ppd;
>>>>>> @@ -589,10 +589,11 @@ static int ppd_insert(xnshadow_ppd_t * holder)
>>>>>>  	}
>>>>>>  
>>>>>>  	inith(&holder->link);
>>>>>> -	if (next)
>>>>>> +	if (next) {
>>>>>>  		insertq(q, &next->link, &holder->link);
>>>>>> -	else
>>>>>> +	} else {
>>>>>>  		appendq(q, &holder->link);
>>>>>> +	}
>>>>>>  	xnlock_put_irqrestore(&nklock, s);
>>>>>>  
>>>>>>  	return 0;
>>>>>> @@ -640,7 +641,7 @@ static inline void ppd_remove_mm(struct mm_struct *mm,
>>>>>>  	xnqueue_t *q;
>>>>>>  	spl_t s;
>>>>>>  
>>>>>> -	key.muxid = 0;
>>>>>> +	key.muxid = ~0UL;
>>>>>>  	key.mm = mm;
>>>>>>  	xnlock_get_irqsave(&nklock, s);
>>>>>>  	ppd_lookup_inner(&q, &ppd, &key);
>>>>>
>>>>> Unfortunately, that won't help. I think we are forced to clear
>>>>> xnshadow_thrptd before calling into xnpod_delete_thread, but we would
>>>>> need that for xnshadow_ppd_get (=>xnpod_userspace_p()).
>>>>
>>>> I remember that now. Even if it worked, when the cleanup handler is
>>>> called, current->mm is NULL. We need to do this differently, the sys ppd
>>>> should be treated differently and passed to the other ppds cleanup routines.
>>>
>>> Do you already have an idea how to get that info to the delete hook
>>> function?
>>
>> Yes. We start by not applying the list reversal patch, then the sys_ppd
>> is the first in the list. So, we can, in the function ppd_remove_mm,
>> start by removing all the others ppd, then remove the sys ppd (that is
>> the first), last. This changes a few signatures in the core code, a lot
>> of things in the skin code, but that would be for the better...
> 
> I still don't see how this affects the order we use in
> do_taskexit_event, the one that prevents xnsys_get_ppd usage even when
> the mm is still present.

The idea is to change the cleanup routines not to call xnsys_get_ppd.

-- 
                                                                Gilles.



* Re: [Xenomai-core] [PULL] native: Fix msendq fastlock leakage
  2011-05-24 10:41             ` Gilles Chanteperdrix
@ 2011-05-24 12:23               ` Jan Kiszka
  2011-05-24 12:30                 ` Gilles Chanteperdrix
  0 siblings, 1 reply; 37+ messages in thread
From: Jan Kiszka @ 2011-05-24 12:23 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai core

On 2011-05-24 12:41, Gilles Chanteperdrix wrote:
> On 05/24/2011 12:36 PM, Jan Kiszka wrote:
>> On 2011-05-24 11:58, Gilles Chanteperdrix wrote:
>>> On 05/24/2011 11:36 AM, Jan Kiszka wrote:
>>>> On 2011-05-24 11:32, Gilles Chanteperdrix wrote:
>>>>> On 05/24/2011 11:13 AM, Jan Kiszka wrote:
>>>>>> On 2011-05-24 06:31, Gilles Chanteperdrix wrote:
>>>>>>> On 05/23/2011 03:53 PM, Jan Kiszka wrote:
>>>>>>>> The following changes since commit aec30a2543afa18fa7832deee85e187b0faeb1f0:
>>>>>>>>
>>>>>>>>   xeno-test: fix reference to @XENO_TEST_DIR@ (2011-05-15 21:20:41 +0200)
>>>>>>>>
>>>>>>>> are available in the git repository at:
>>>>>>>>   git://git.xenomai.org/xenomai-jki.git for-upstream
>>>>>>>>
>>>>>>>> Jan Kiszka (1):
>>>>>>>>       native: Fix msendq fastlock leakage
>>>>>>>>
>>>>>>>>  include/native/task.h    |    5 +++++
>>>>>>>>  ksrc/skins/native/task.c |   13 ++++++-------
>>>>>>>>  2 files changed, 11 insertions(+), 7 deletions(-)
>>>>>>>>
>>>>>>>> ------8<------
>>>>>>>>
>>>>>>>> When a native task terminates without going through rt_task_delete, we
>>>>>>>> leaked the fastlock so far. Fix it by moving the release into the delete
>>>>>>>> hook. As the ppd is already invalid at that point, we have to save the
>>>>>>>> heap address in the task data structure.
>>>>>>>
>>>>>>> I Jan, I once worked on a patch to reverse the ppd cleanup order, in order
>>>>>>> to fix bugs of this kind. Here it comes. I do not remember why I did not
>>>>>>> commit it, but I guess it was not working well. Could we restart working
>>>>>>> from this patch?
>>>>>>>
>>>>>>> From 038ecf08cd66b3112e0fe277d71d294b8eb83bcc Mon Sep 17 00:00:00 2001
>>>>>>> From: Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org>
>>>>>>> Date: Sun, 29 Aug 2010 14:52:08 +0200
>>>>>>> Subject: [PATCH] nucleus: reverse ppd cleanup order
>>>>>>>
>>>>>>> ---
>>>>>>>  ksrc/nucleus/shadow.c |   11 ++++++-----
>>>>>>>  1 files changed, 6 insertions(+), 5 deletions(-)
>>>>>>>
>>>>>>> diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
>>>>>>> index b2d4326..725ae43 100644
>>>>>>> --- a/ksrc/nucleus/shadow.c
>>>>>>> +++ b/ksrc/nucleus/shadow.c
>>>>>>> @@ -556,7 +556,7 @@ static unsigned ppd_lookup_inner(xnqueue_t **pq,
>>>>>>>  	}
>>>>>>>  	while (holder &&
>>>>>>>  	       (ppd->key.mm < pkey->mm ||
>>>>>>> -		(ppd->key.mm == pkey->mm && ppd->key.muxid < pkey->muxid)));
>>>>>>> +		(ppd->key.mm == pkey->mm && ppd->key.muxid > pkey->muxid)));
>>>>>>>  
>>>>>>>  	if (ppd->key.mm == pkey->mm && ppd->key.muxid == pkey->muxid) {
>>>>>>>  		/* found it, return it. */
>>>>>>> @@ -566,7 +566,7 @@ static unsigned ppd_lookup_inner(xnqueue_t **pq,
>>>>>>>  
>>>>>>>  	/* not found, return successor for insertion. */
>>>>>>>  	if (ppd->key.mm < pkey->mm ||
>>>>>>> -	    (ppd->key.mm == pkey->mm && ppd->key.muxid < pkey->muxid))
>>>>>>> +	    (ppd->key.mm == pkey->mm && ppd->key.muxid > pkey->muxid))
>>>>>>>  		*pholder = holder ? link2ppd(holder) : NULL;
>>>>>>>  	else
>>>>>>>  		*pholder = ppd;
>>>>>>> @@ -589,10 +589,11 @@ static int ppd_insert(xnshadow_ppd_t * holder)
>>>>>>>  	}
>>>>>>>  
>>>>>>>  	inith(&holder->link);
>>>>>>> -	if (next)
>>>>>>> +	if (next) {
>>>>>>>  		insertq(q, &next->link, &holder->link);
>>>>>>> -	else
>>>>>>> +	} else {
>>>>>>>  		appendq(q, &holder->link);
>>>>>>> +	}
>>>>>>>  	xnlock_put_irqrestore(&nklock, s);
>>>>>>>  
>>>>>>>  	return 0;
>>>>>>> @@ -640,7 +641,7 @@ static inline void ppd_remove_mm(struct mm_struct *mm,
>>>>>>>  	xnqueue_t *q;
>>>>>>>  	spl_t s;
>>>>>>>  
>>>>>>> -	key.muxid = 0;
>>>>>>> +	key.muxid = ~0UL;
>>>>>>>  	key.mm = mm;
>>>>>>>  	xnlock_get_irqsave(&nklock, s);
>>>>>>>  	ppd_lookup_inner(&q, &ppd, &key);
>>>>>>
>>>>>> Unfortunately, that won't help. I think we are forced to clear
>>>>>> xnshadow_thrptd before calling into xnpod_delete_thread, but we would
>>>>>> need that for xnshadow_ppd_get (=>xnpod_userspace_p()).
>>>>>
>>>>> I remember that now. Even if it worked, when the cleanup handler is
>>>>> called, current->mm is NULL. We need to do this differently, the sys ppd
>>>>> should be treated differently and passed to the other ppds cleanup routines.
>>>>
>>>> Do you already have an idea how to get that info to the delete hook
>>>> function?
>>>
>>> Yes. We start by not applying the list reversal patch, then the sys_ppd
>>> is the first in the list. So, we can, in the function ppd_remove_mm,
>>> start by removing all the others ppd, then remove the sys ppd (that is
>>> the first), last. This changes a few signatures in the core code, a lot
>>> of things in the skin code, but that would be for the better...
>>
>> I still don't see how this affects the order we use in
>> do_taskexit_event, the one that prevents xnsys_get_ppd usage even when
>> the mm is still present.
> 
> The idea is to change the cleanup routines not to call xnsys_get_ppd.

...and use what instead? Sorry, I'm slow today.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



* Re: [Xenomai-core] [PULL] native: Fix msendq fastlock leakage
  2011-05-24 12:23               ` Jan Kiszka
@ 2011-05-24 12:30                 ` Gilles Chanteperdrix
  2011-05-24 13:52                   ` Jan Kiszka
  0 siblings, 1 reply; 37+ messages in thread
From: Gilles Chanteperdrix @ 2011-05-24 12:30 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai core

On 05/24/2011 02:23 PM, Jan Kiszka wrote:
> On 2011-05-24 12:41, Gilles Chanteperdrix wrote:
>> On 05/24/2011 12:36 PM, Jan Kiszka wrote:
>>> On 2011-05-24 11:58, Gilles Chanteperdrix wrote:
>>>> On 05/24/2011 11:36 AM, Jan Kiszka wrote:
>>>>> On 2011-05-24 11:32, Gilles Chanteperdrix wrote:
>>>>>> On 05/24/2011 11:13 AM, Jan Kiszka wrote:
>>>>>>> On 2011-05-24 06:31, Gilles Chanteperdrix wrote:
>>>>>>>> On 05/23/2011 03:53 PM, Jan Kiszka wrote:
>>>>>>>>> The following changes since commit aec30a2543afa18fa7832deee85e187b0faeb1f0:
>>>>>>>>>
>>>>>>>>>   xeno-test: fix reference to @XENO_TEST_DIR@ (2011-05-15 21:20:41 +0200)
>>>>>>>>>
>>>>>>>>> are available in the git repository at:
>>>>>>>>>   git://git.xenomai.org/xenomai-jki.git for-upstream
>>>>>>>>>
>>>>>>>>> Jan Kiszka (1):
>>>>>>>>>       native: Fix msendq fastlock leakage
>>>>>>>>>
>>>>>>>>>  include/native/task.h    |    5 +++++
>>>>>>>>>  ksrc/skins/native/task.c |   13 ++++++-------
>>>>>>>>>  2 files changed, 11 insertions(+), 7 deletions(-)
>>>>>>>>>
>>>>>>>>> ------8<------
>>>>>>>>>
>>>>>>>>> When a native task terminates without going through rt_task_delete, we
>>>>>>>>> leaked the fastlock so far. Fix it by moving the release into the delete
>>>>>>>>> hook. As the ppd is already invalid at that point, we have to save the
>>>>>>>>> heap address in the task data structure.
>>>>>>>>
>>>>>>>> I Jan, I once worked on a patch to reverse the ppd cleanup order, in order
>>>>>>>> to fix bugs of this kind. Here it comes. I do not remember why I did not
>>>>>>>> commit it, but I guess it was not working well. Could we restart working
>>>>>>>> from this patch?
>>>>>>>>
>>>>>>>> From 038ecf08cd66b3112e0fe277d71d294b8eb83bcc Mon Sep 17 00:00:00 2001
>>>>>>>> From: Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org>
>>>>>>>> Date: Sun, 29 Aug 2010 14:52:08 +0200
>>>>>>>> Subject: [PATCH] nucleus: reverse ppd cleanup order
>>>>>>>>
>>>>>>>> ---
>>>>>>>>  ksrc/nucleus/shadow.c |   11 ++++++-----
>>>>>>>>  1 files changed, 6 insertions(+), 5 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
>>>>>>>> index b2d4326..725ae43 100644
>>>>>>>> --- a/ksrc/nucleus/shadow.c
>>>>>>>> +++ b/ksrc/nucleus/shadow.c
>>>>>>>> @@ -556,7 +556,7 @@ static unsigned ppd_lookup_inner(xnqueue_t **pq,
>>>>>>>>  	}
>>>>>>>>  	while (holder &&
>>>>>>>>  	       (ppd->key.mm < pkey->mm ||
>>>>>>>> -		(ppd->key.mm == pkey->mm && ppd->key.muxid < pkey->muxid)));
>>>>>>>> +		(ppd->key.mm == pkey->mm && ppd->key.muxid > pkey->muxid)));
>>>>>>>>  
>>>>>>>>  	if (ppd->key.mm == pkey->mm && ppd->key.muxid == pkey->muxid) {
>>>>>>>>  		/* found it, return it. */
>>>>>>>> @@ -566,7 +566,7 @@ static unsigned ppd_lookup_inner(xnqueue_t **pq,
>>>>>>>>  
>>>>>>>>  	/* not found, return successor for insertion. */
>>>>>>>>  	if (ppd->key.mm < pkey->mm ||
>>>>>>>> -	    (ppd->key.mm == pkey->mm && ppd->key.muxid < pkey->muxid))
>>>>>>>> +	    (ppd->key.mm == pkey->mm && ppd->key.muxid > pkey->muxid))
>>>>>>>>  		*pholder = holder ? link2ppd(holder) : NULL;
>>>>>>>>  	else
>>>>>>>>  		*pholder = ppd;
>>>>>>>> @@ -589,10 +589,11 @@ static int ppd_insert(xnshadow_ppd_t * holder)
>>>>>>>>  	}
>>>>>>>>  
>>>>>>>>  	inith(&holder->link);
>>>>>>>> -	if (next)
>>>>>>>> +	if (next) {
>>>>>>>>  		insertq(q, &next->link, &holder->link);
>>>>>>>> -	else
>>>>>>>> +	} else {
>>>>>>>>  		appendq(q, &holder->link);
>>>>>>>> +	}
>>>>>>>>  	xnlock_put_irqrestore(&nklock, s);
>>>>>>>>  
>>>>>>>>  	return 0;
>>>>>>>> @@ -640,7 +641,7 @@ static inline void ppd_remove_mm(struct mm_struct *mm,
>>>>>>>>  	xnqueue_t *q;
>>>>>>>>  	spl_t s;
>>>>>>>>  
>>>>>>>> -	key.muxid = 0;
>>>>>>>> +	key.muxid = ~0UL;
>>>>>>>>  	key.mm = mm;
>>>>>>>>  	xnlock_get_irqsave(&nklock, s);
>>>>>>>>  	ppd_lookup_inner(&q, &ppd, &key);
>>>>>>>
>>>>>>> Unfortunately, that won't help. I think we are forced to clear
>>>>>>> xnshadow_thrptd before calling into xnpod_delete_thread, but we would
>>>>>>> need that for xnshadow_ppd_get (=>xnpod_userspace_p()).
>>>>>>
>>>>>> I remember that now. Even if it worked, when the cleanup handler is
>>>>>> called, current->mm is NULL. We need to do this differently, the sys ppd
>>>>>> should be treated differently and passed to the other ppds cleanup routines.
>>>>>
>>>>> Do you already have an idea how to get that info to the delete hook
>>>>> function?
>>>>
>>>> Yes. We start by not applying the list reversal patch, then the sys_ppd
>>>> is the first in the list. So, we can, in the function ppd_remove_mm,
>>>> start by removing all the others ppd, then remove the sys ppd (that is
>>>> the first), last. This changes a few signatures in the core code, a lot
>>>> of things in the skin code, but that would be for the better...
>>>
>>> I still don't see how this affects the order we use in
>>> do_taskexit_event, the one that prevents xnsys_get_ppd usage even when
>>> the mm is still present.
>>
>> The idea is to change the cleanup routines not to call xnsys_get_ppd.
> 
> ...and use what instead? Sorry, I'm slow today.

The sys_ppd, passed as an additional argument to the cleanup function.

> 
> Jan
> 


-- 
					    Gilles.



* Re: [Xenomai-core] [PULL] native: Fix msendq fastlock leakage
  2011-05-24 12:30                 ` Gilles Chanteperdrix
@ 2011-05-24 13:52                   ` Jan Kiszka
  2011-05-24 14:03                     ` Gilles Chanteperdrix
  0 siblings, 1 reply; 37+ messages in thread
From: Jan Kiszka @ 2011-05-24 13:52 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai core

On 2011-05-24 14:30, Gilles Chanteperdrix wrote:
>>>>>> Do you already have an idea how to get that info to the delete hook
>>>>>> function?
>>>>>
>>>>> Yes. We start by not applying the list reversal patch, then the sys_ppd
>>>>> is the first in the list. So, we can, in the function ppd_remove_mm,
>>>>> start by removing all the others ppd, then remove the sys ppd (that is
>>>>> the first), last. This changes a few signatures in the core code, a lot
>>>>> of things in the skin code, but that would be for the better...
>>>>
>>>> I still don't see how this affects the order we use in
>>>> do_taskexit_event, the one that prevents xnsys_get_ppd usage even when
>>>> the mm is still present.
>>>
>>> The idea is to change the cleanup routines not to call xnsys_get_ppd.
>>
>> ...and use what instead? Sorry, I'm slow today.
> 
> The sys_ppd passed as other argument to the cleanup function.

That would affect all thread hooks, not only the one for deletion. And
it would pull in more shadow-specific bits into the pod.

Moreover, I think we would still be in trouble, as the mm, and thus the
ppd, is deleted before the last task deletion and hence before the taskexit
hook invocation. That's due to the cleanup ordering in the kernel's do_exit.

However, if you have a patch, I'd be happy to test and rework my leakage
fix.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



* Re: [Xenomai-core] [PULL] native: Fix msendq fastlock leakage
  2011-05-24 13:52                   ` Jan Kiszka
@ 2011-05-24 14:03                     ` Gilles Chanteperdrix
  2011-05-25 11:20                       ` Jan Kiszka
  0 siblings, 1 reply; 37+ messages in thread
From: Gilles Chanteperdrix @ 2011-05-24 14:03 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai core

On 05/24/2011 03:52 PM, Jan Kiszka wrote:
> On 2011-05-24 14:30, Gilles Chanteperdrix wrote:
>>>>>>> Do you already have an idea how to get that info to the delete hook
>>>>>>> function?
>>>>>>
>>>>>> Yes. We start by not applying the list reversal patch, then the sys_ppd
>>>>>> is the first in the list. So, we can, in the function ppd_remove_mm,
>>>>>> start by removing all the others ppd, then remove the sys ppd (that is
>>>>>> the first), last. This changes a few signatures in the core code, a lot
>>>>>> of things in the skin code, but that would be for the better...
>>>>>
>>>>> I still don't see how this affects the order we use in
>>>>> do_taskexit_event, the one that prevents xnsys_get_ppd usage even when
>>>>> the mm is still present.
>>>>
>>>> The idea is to change the cleanup routines not to call xnsys_get_ppd.
>>>
>>> ...and use what instead? Sorry, I'm slow today.
>>
>> The sys_ppd passed as other argument to the cleanup function.
> 
> That would affect all thread hooks, not only the one for deletion. And
> it would pull in more shadow-specific bits into the pod.
> 
> Moreover, I think we would still be in troubles as mm, thus ppd,
> deletion takes place before last task deletion, thus taskexit hook
> invocation. That's due to the cleanup ordering in the kernel's do_exit.
> 
> However, if you have a patch, I'd be happy to test and rework my leakage
> fix.

I will work on this ASAP.

-- 
					    Gilles.



* Re: [Xenomai-core] [PULL] native: Fix msendq fastlock leakage
  2011-05-24 14:03                     ` Gilles Chanteperdrix
@ 2011-05-25 11:20                       ` Jan Kiszka
  2011-05-25 11:58                         ` Gilles Chanteperdrix
  0 siblings, 1 reply; 37+ messages in thread
From: Jan Kiszka @ 2011-05-25 11:20 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai core

On 2011-05-24 16:03, Gilles Chanteperdrix wrote:
> On 05/24/2011 03:52 PM, Jan Kiszka wrote:
>> On 2011-05-24 14:30, Gilles Chanteperdrix wrote:
>>>>>>>> Do you already have an idea how to get that info to the delete hook
>>>>>>>> function?
>>>>>>>
>>>>>>> Yes. We start by not applying the list reversal patch, then the sys_ppd
>>>>>>> is the first in the list. So, we can, in the function ppd_remove_mm,
>>>>>>> start by removing all the others ppd, then remove the sys ppd (that is
>>>>>>> the first), last. This changes a few signatures in the core code, a lot
>>>>>>> of things in the skin code, but that would be for the better...
>>>>>>
>>>>>> I still don't see how this affects the order we use in
>>>>>> do_taskexit_event, the one that prevents xnsys_get_ppd usage even when
>>>>>> the mm is still present.
>>>>>
>>>>> The idea is to change the cleanup routines not to call xnsys_get_ppd.
>>>>
>>>> ...and use what instead? Sorry, I'm slow today.
>>>
>>> The sys_ppd passed as other argument to the cleanup function.
>>
>> That would affect all thread hooks, not only the one for deletion. And
>> it would pull in more shadow-specific bits into the pod.
>>
>> Moreover, I think we would still be in troubles as mm, thus ppd,
>> deletion takes place before last task deletion, thus taskexit hook
>> invocation. That's due to the cleanup ordering in the kernel's do_exit.
>>
>> However, if you have a patch, I'd be happy to test and rework my leakage
>> fix.
> 
> I will work on this ASAP.

Sorry for pushing, but I need to decide whether we should roll out my
imperfect fix or whether there is a chance to use some upstream version
directly. Were you able to look into this, or will it likely take a
bit more time?

Thanks,
Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



* Re: [Xenomai-core] [PULL] native: Fix msendq fastlock leakage
  2011-05-25 11:20                       ` Jan Kiszka
@ 2011-05-25 11:58                         ` Gilles Chanteperdrix
  2011-05-25 12:12                           ` Jan Kiszka
  0 siblings, 1 reply; 37+ messages in thread
From: Gilles Chanteperdrix @ 2011-05-25 11:58 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai core

On 05/25/2011 01:20 PM, Jan Kiszka wrote:
> On 2011-05-24 16:03, Gilles Chanteperdrix wrote:
>> On 05/24/2011 03:52 PM, Jan Kiszka wrote:
>>> On 2011-05-24 14:30, Gilles Chanteperdrix wrote:
>>>>>>>>> Do you already have an idea how to get that info to the delete hook
>>>>>>>>> function?
>>>>>>>>
>>>>>>>> Yes. We start by not applying the list reversal patch, then the sys_ppd
>>>>>>>> is the first in the list. So, we can, in the function ppd_remove_mm,
>>>>>>>> start by removing all the others ppd, then remove the sys ppd (that is
>>>>>>>> the first), last. This changes a few signatures in the core code, a lot
>>>>>>>> of things in the skin code, but that would be for the better...
>>>>>>>
>>>>>>> I still don't see how this affects the order we use in
>>>>>>> do_taskexit_event, the one that prevents xnsys_get_ppd usage even when
>>>>>>> the mm is still present.
>>>>>>
>>>>>> The idea is to change the cleanup routines not to call xnsys_get_ppd.
>>>>>
>>>>> ...and use what instead? Sorry, I'm slow today.
>>>>
>>>> The sys_ppd passed as other argument to the cleanup function.
>>>
>>> That would affect all thread hooks, not only the one for deletion. And
>>> it would pull in more shadow-specific bits into the pod.
>>>
>>> Moreover, I think we would still be in troubles as mm, thus ppd,
>>> deletion takes place before last task deletion, thus taskexit hook
>>> invocation. That's due to the cleanup ordering in the kernel's do_exit.
>>>
>>> However, if you have a patch, I'd be happy to test and rework my leakage
>>> fix.
>>
>> I will work on this ASAP.
> 
> Sorry for pushing, but I need to decide if we should role out my
> imperfect fix or if there is chance to use some upstream version
> directly. Were you able to look into this, or will this likely take a
> bit more time?

I intended to try and do this next weekend. If it is more urgent than
that, I can try in one or two days. In any case, I do not think we
should try to work around the current code; it is way too fragile.

-- 
					    Gilles.



* Re: [Xenomai-core] [PULL] native: Fix msendq fastlock leakage
  2011-05-25 11:58                         ` Gilles Chanteperdrix
@ 2011-05-25 12:12                           ` Jan Kiszka
  2011-05-25 12:19                             ` Gilles Chanteperdrix
  0 siblings, 1 reply; 37+ messages in thread
From: Jan Kiszka @ 2011-05-25 12:12 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai core

On 2011-05-25 13:58, Gilles Chanteperdrix wrote:
> On 05/25/2011 01:20 PM, Jan Kiszka wrote:
>> On 2011-05-24 16:03, Gilles Chanteperdrix wrote:
>>> On 05/24/2011 03:52 PM, Jan Kiszka wrote:
>>>> On 2011-05-24 14:30, Gilles Chanteperdrix wrote:
>>>>>>>>>> Do you already have an idea how to get that info to the delete hook
>>>>>>>>>> function?
>>>>>>>>>
>>>>>>>>> Yes. We start by not applying the list reversal patch, then the sys_ppd
>>>>>>>>> is the first in the list. So, we can, in the function ppd_remove_mm,
>>>>>>>>> start by removing all the others ppd, then remove the sys ppd (that is
>>>>>>>>> the first), last. This changes a few signatures in the core code, a lot
>>>>>>>>> of things in the skin code, but that would be for the better...
>>>>>>>>
>>>>>>>> I still don't see how this affects the order we use in
>>>>>>>> do_taskexit_event, the one that prevents xnsys_get_ppd usage even when
>>>>>>>> the mm is still present.
>>>>>>>
>>>>>>> The idea is to change the cleanup routines not to call xnsys_get_ppd.
>>>>>>
>>>>>> ...and use what instead? Sorry, I'm slow today.
>>>>>
>>>>> The sys_ppd passed as other argument to the cleanup function.
>>>>
>>>> That would affect all thread hooks, not only the one for deletion. And
>>>> it would pull in more shadow-specific bits into the pod.
>>>>
>>>> Moreover, I think we would still be in troubles as mm, thus ppd,
>>>> deletion takes place before last task deletion, thus taskexit hook
>>>> invocation. That's due to the cleanup ordering in the kernel's do_exit.
>>>>
>>>> However, if you have a patch, I'd be happy to test and rework my leakage
>>>> fix.
>>>
>>> I will work on this ASAP.
>>
>> Sorry for pushing, but I need to decide if we should role out my
>> imperfect fix or if there is chance to use some upstream version
>> directly. Were you able to look into this, or will this likely take a
>> bit more time?
> 
> I intended to try and do this next week-end. If it is more urgent than
> that, I can try in one or two days. In any case, I do not think we
> should try and workaround the current code, it is way to fragile.

Mmh, might be true. I'm getting the feeling we should locally revert all
the recent MPS changes to work around the issues. It looks like there
are more related problems lurking (we are still facing spurious
fast-synch-related crashes here - examining ATM).

Another thing that just came to my mind: Do we have a well-defined
destruction order of the native skin or native tasks vs. the system skin?
I mean, the latter destroys the local sem_heap while the former may purge
remaining native resources (including the MPS fastlock). I think the
ordering is inverted with respect to what the code assumes (the heap is
destroyed before the last task), no?
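
In toy form, the suspected hazard (hypothetical names once more, merely to
spell out the ordering in question): if the per-process heap is torn down
before the last task's resources are purged, the final fastlock release hits
a dead heap.

#include <stdbool.h>
#include <stdio.h>

struct heap { bool alive; };
struct task { struct heap *fastlock_heap; int fastlock; };

static void heap_destroy(struct heap *h)
{
	h->alive = false;
}

static void heap_free(struct heap *h, void *block)
{
	(void)block;
	/* In the real code this would be a free into vanished memory. */
	if (!h->alive)
		printf("BUG: fastlock released after its heap was destroyed\n");
}

int main(void)
{
	struct heap sem_heap = { .alive = true };
	struct task last_task = { .fastlock_heap = &sem_heap };

	/* Suspected ordering: the system skin cleanup runs first... */
	heap_destroy(&sem_heap);

	/* ...then the native skin purges the last task's resources. */
	heap_free(last_task.fastlock_heap, &last_task.fastlock);
	return 0;
}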

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



* Re: [Xenomai-core] [PULL] native: Fix msendq fastlock leakage
  2011-05-25 12:12                           ` Jan Kiszka
@ 2011-05-25 12:19                             ` Gilles Chanteperdrix
  2011-05-25 12:22                               ` Jan Kiszka
  0 siblings, 1 reply; 37+ messages in thread
From: Gilles Chanteperdrix @ 2011-05-25 12:19 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai core

On 05/25/2011 02:12 PM, Jan Kiszka wrote:
> On 2011-05-25 13:58, Gilles Chanteperdrix wrote:
>> On 05/25/2011 01:20 PM, Jan Kiszka wrote:
>>> On 2011-05-24 16:03, Gilles Chanteperdrix wrote:
>>>> On 05/24/2011 03:52 PM, Jan Kiszka wrote:
>>>>> On 2011-05-24 14:30, Gilles Chanteperdrix wrote:
>>>>>>>>>>> Do you already have an idea how to get that info to the delete hook
>>>>>>>>>>> function?
>>>>>>>>>>
>>>>>>>>>> Yes. We start by not applying the list reversal patch, then the sys_ppd
>>>>>>>>>> is the first in the list. So, we can, in the function ppd_remove_mm,
>>>>>>>>>> start by removing all the others ppd, then remove the sys ppd (that is
>>>>>>>>>> the first), last. This changes a few signatures in the core code, a lot
>>>>>>>>>> of things in the skin code, but that would be for the better...
>>>>>>>>>
>>>>>>>>> I still don't see how this affects the order we use in
>>>>>>>>> do_taskexit_event, the one that prevents xnsys_get_ppd usage even when
>>>>>>>>> the mm is still present.
>>>>>>>>
>>>>>>>> The idea is to change the cleanup routines not to call xnsys_get_ppd.
>>>>>>>
>>>>>>> ...and use what instead? Sorry, I'm slow today.
>>>>>>
>>>>>> The sys_ppd passed as other argument to the cleanup function.
>>>>>
>>>>> That would affect all thread hooks, not only the one for deletion. And
>>>>> it would pull in more shadow-specific bits into the pod.
>>>>>
>>>>> Moreover, I think we would still be in troubles as mm, thus ppd,
>>>>> deletion takes place before last task deletion, thus taskexit hook
>>>>> invocation. That's due to the cleanup ordering in the kernel's do_exit.
>>>>>
>>>>> However, if you have a patch, I'd be happy to test and rework my leakage
>>>>> fix.
>>>>
>>>> I will work on this ASAP.
>>>
>>> Sorry for pushing, but I need to decide if we should role out my
>>> imperfect fix or if there is chance to use some upstream version
>>> directly. Were you able to look into this, or will this likely take a
>>> bit more time?
>>
>> I intended to try and do this next week-end. If it is more urgent than
>> that, I can try in one or two days. In any case, I do not think we
>> should try and workaround the current code, it is way to fragile.
> 
> Mmh, might be true. I'm getting the feeling we should locally revert all
> the recent MPS changes to work around the issues. It looks like there
> are more related problems sleeping (we are still facing spurious
> fast-synch related crashes here - examining ATM).

This is the development head, it may remain broken for short times while
we are fixing. I would understand reverting on the 2.5 branch, not on -head.

> Another thing that just came to my mind: Do we have a well-defined
> destruction order of native skin or native tasks vs. system skin? I mean
> the latter destroys the local sem_heap while the former may purge
> remaining native resources (including the MPS fastlock). I think the
> ordering is inverted to what the code assumes (heap is destructed before
> the last task), no?

IMO, the system skin destroy callback should be called last, this should
solve these problems. This is what I was talking about.

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Xenomai-core] [PULL] native: Fix msendq fastlock leakage
  2011-05-25 12:19                             ` Gilles Chanteperdrix
@ 2011-05-25 12:22                               ` Jan Kiszka
  2011-05-25 18:48                                 ` Gilles Chanteperdrix
  0 siblings, 1 reply; 37+ messages in thread
From: Jan Kiszka @ 2011-05-25 12:22 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai core

On 2011-05-25 14:19, Gilles Chanteperdrix wrote:
> On 05/25/2011 02:12 PM, Jan Kiszka wrote:
>> On 2011-05-25 13:58, Gilles Chanteperdrix wrote:
>>> On 05/25/2011 01:20 PM, Jan Kiszka wrote:
>>>> On 2011-05-24 16:03, Gilles Chanteperdrix wrote:
>>>>> On 05/24/2011 03:52 PM, Jan Kiszka wrote:
>>>>>> On 2011-05-24 14:30, Gilles Chanteperdrix wrote:
>>>>>>>>>>>> Do you already have an idea how to get that info to the delete hook
>>>>>>>>>>>> function?
>>>>>>>>>>>
>>>>>>>>>>> Yes. We start by not applying the list reversal patch, then the sys_ppd
>>>>>>>>>>> is the first in the list. So, we can, in the function ppd_remove_mm,
>>>>>>>>>>> start by removing all the others ppd, then remove the sys ppd (that is
>>>>>>>>>>> the first), last. This changes a few signatures in the core code, a lot
>>>>>>>>>>> of things in the skin code, but that would be for the better...
>>>>>>>>>>
>>>>>>>>>> I still don't see how this affects the order we use in
>>>>>>>>>> do_taskexit_event, the one that prevents xnsys_get_ppd usage even when
>>>>>>>>>> the mm is still present.
>>>>>>>>>
>>>>>>>>> The idea is to change the cleanup routines not to call xnsys_get_ppd.
>>>>>>>>
>>>>>>>> ...and use what instead? Sorry, I'm slow today.
>>>>>>>
>>>>>>> The sys_ppd passed as other argument to the cleanup function.
>>>>>>
>>>>>> That would affect all thread hooks, not only the one for deletion. And
>>>>>> it would pull in more shadow-specific bits into the pod.
>>>>>>
>>>>>> Moreover, I think we would still be in troubles as mm, thus ppd,
>>>>>> deletion takes place before last task deletion, thus taskexit hook
>>>>>> invocation. That's due to the cleanup ordering in the kernel's do_exit.
>>>>>>
>>>>>> However, if you have a patch, I'd be happy to test and rework my leakage
>>>>>> fix.
>>>>>
>>>>> I will work on this ASAP.
>>>>
>>>> Sorry for pushing, but I need to decide if we should role out my
>>>> imperfect fix or if there is chance to use some upstream version
>>>> directly. Were you able to look into this, or will this likely take a
>>>> bit more time?
>>>
>>> I intended to try and do this next week-end. If it is more urgent than
>>> that, I can try in one or two days. In any case, I do not think we
>>> should try and workaround the current code, it is way to fragile.
>>
>> Mmh, might be true. I'm getting the feeling we should locally revert all
>> the recent MPS changes to work around the issues. It looks like there
>> are more related problems sleeping (we are still facing spurious
>> fast-synch related crashes here - examining ATM).
> 
> This is the development head, it may remain broken for short times while
> we are fixing. I would understand reverting on the 2.5 branch, not on -head.

I was thinking out loud about our (Siemens) local branch, not -head. We
are forced to use head to have features like automatic non-RT shadow
migration.

> 
>> Another thing that just came to my mind: Do we have a well-defined
>> destruction order of native skin or native tasks vs. system skin? I mean
>> the latter destroys the local sem_heap while the former may purge
>> remaining native resources (including the MPS fastlock). I think the
>> ordering is inverted to what the code assumes (heap is destructed before
>> the last task), no?
> 
> IMO, the system skin destroy callback should be called last, this should
> solve these problems. This is what I was talking about.

OK. Still, tasks aren't destroyed on mm shoot-down but via a dedicated
do_exit callback that is invoked by the kernel a few instructions later.
Nothing we can directly influence from Xenomai code.
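
For reference, the do_exit() ordering I mean is roughly the following
(heavily abridged sketch, details from memory):

  void do_exit(long code)
  {
          struct task_struct *tsk = current;
          /* ... */
          exit_mm(tsk);   /* mm dropped -> cleanup event -> ppd_remove_mm(),
                             i.e. the ppds (and the per-process sem heap)
                             are torn down here */
          /* ... */
          /* the task exit event only fires further down, so the Xenomai
             task deletion hooks run with current->mm == NULL and the ppds
             already gone */
  }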

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Xenomai-core] [PULL] native: Fix msendq fastlock leakage
  2011-05-25 12:22                               ` Jan Kiszka
@ 2011-05-25 18:48                                 ` Gilles Chanteperdrix
  2011-05-26  7:18                                   ` Jan Kiszka
  0 siblings, 1 reply; 37+ messages in thread
From: Gilles Chanteperdrix @ 2011-05-25 18:48 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai core

On 05/25/2011 02:22 PM, Jan Kiszka wrote:
> On 2011-05-25 14:19, Gilles Chanteperdrix wrote:
>> On 05/25/2011 02:12 PM, Jan Kiszka wrote:
>>> On 2011-05-25 13:58, Gilles Chanteperdrix wrote:
>>>> On 05/25/2011 01:20 PM, Jan Kiszka wrote:
>>>>> On 2011-05-24 16:03, Gilles Chanteperdrix wrote:
>>>>>> On 05/24/2011 03:52 PM, Jan Kiszka wrote:
>>>>>>> On 2011-05-24 14:30, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>> Do you already have an idea how to get that info to the delete hook
>>>>>>>>>>>>> function?
>>>>>>>>>>>>
>>>>>>>>>>>> Yes. We start by not applying the list reversal patch, then the sys_ppd
>>>>>>>>>>>> is the first in the list. So, we can, in the function ppd_remove_mm,
>>>>>>>>>>>> start by removing all the others ppd, then remove the sys ppd (that is
>>>>>>>>>>>> the first), last. This changes a few signatures in the core code, a lot
>>>>>>>>>>>> of things in the skin code, but that would be for the better...
>>>>>>>>>>>
>>>>>>>>>>> I still don't see how this affects the order we use in
>>>>>>>>>>> do_taskexit_event, the one that prevents xnsys_get_ppd usage even when
>>>>>>>>>>> the mm is still present.
>>>>>>>>>>
>>>>>>>>>> The idea is to change the cleanup routines not to call xnsys_get_ppd.
>>>>>>>>>
>>>>>>>>> ...and use what instead? Sorry, I'm slow today.
>>>>>>>>
>>>>>>>> The sys_ppd passed as other argument to the cleanup function.
>>>>>>>
>>>>>>> That would affect all thread hooks, not only the one for deletion. And
>>>>>>> it would pull in more shadow-specific bits into the pod.
>>>>>>>
>>>>>>> Moreover, I think we would still be in troubles as mm, thus ppd,
>>>>>>> deletion takes place before last task deletion, thus taskexit hook
>>>>>>> invocation. That's due to the cleanup ordering in the kernel's do_exit.
>>>>>>>
>>>>>>> However, if you have a patch, I'd be happy to test and rework my leakage
>>>>>>> fix.
>>>>>>
>>>>>> I will work on this ASAP.
>>>>>
>>>>> Sorry for pushing, but I need to decide if we should role out my
>>>>> imperfect fix or if there is chance to use some upstream version
>>>>> directly. Were you able to look into this, or will this likely take a
>>>>> bit more time?
>>>>
>>>> I intended to try and do this next week-end. If it is more urgent than
>>>> that, I can try in one or two days. In any case, I do not think we
>>>> should try and workaround the current code, it is way to fragile.
>>>
>>> Mmh, might be true. I'm getting the feeling we should locally revert all
>>> the recent MPS changes to work around the issues. It looks like there
>>> are more related problems sleeping (we are still facing spurious
>>> fast-synch related crashes here - examining ATM).
>>
>> This is the development head, it may remain broken for short times while
>> we are fixing. I would understand reverting on the 2.5 branch, not on -head.
> 
> I was thinking loudly about our (Siemens) local branch, not -head. We
> are forced to use head to have features like automatic non-RT shadow
> migration.

Now that I think about it, leaking a few bytes in the private heap is
completely harmless, since it is destroyed anyway, and xnheap_free does
enough checks to avoid clobbering already freed memory, though
destroying this heap last is the way to go, as I have already said.

A bit less in the global heap, but this one should be used less often,
and from your explanation, I understand this is not the case you are
interested in, otherwise you would not care if xnpod_userspace_p() is
working.

In any case, it should not cause crashes, it is just a leak.

> 
>>
>>> Another thing that just came to my mind: Do we have a well-defined
>>> destruction order of native skin or native tasks vs. system skin? I mean
>>> the latter destroys the local sem_heap while the former may purge
>>> remaining native resources (including the MPS fastlock). I think the
>>> ordering is inverted to what the code assumes (heap is destructed before
>>> the last task), no?
>>
>> IMO, the system skin destroy callback should be called last, this should
>> solve these problems. This is what I was talking about.
> 
> OK. Still, tasks aren't destroyed on mm shoot-down but via a dedicated
> do_exit callback that is invoked by the kernel a few instructions later.
> Nothing we can directly influence from Xenomai code.

We do not care, we only use the mm pointer value as a key, and this one
is still available when the exit callback is called. So, we have the mm
pointer, we can have the process private ppd, normally, we have all that
we need. It is just a question of finding an interface which is not too
intrusive.
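
In other words, something like this in the cleanup path (sketch only;
assuming muxid 0 still identifies the system skin, with the mm pointer
used purely as a lookup key):

  static void do_cleanup_event(struct mm_struct *mm)
  {
          /* 'mm' is never dereferenced, it is only matched against the
             key recorded when the skins attached */
          xnshadow_ppd_t *sys_ppd = ppd_lookup(0, mm);

          /* ... hand sys_ppd down to the per-skin cleanup handlers ... */
  }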

-- 
                                                                Gilles.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Xenomai-core] [PULL] native: Fix msendq fastlock leakage
  2011-05-25 18:48                                 ` Gilles Chanteperdrix
@ 2011-05-26  7:18                                   ` Jan Kiszka
  2011-05-26  7:29                                     ` Gilles Chanteperdrix
  0 siblings, 1 reply; 37+ messages in thread
From: Jan Kiszka @ 2011-05-26  7:18 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai core

On 2011-05-25 20:48, Gilles Chanteperdrix wrote:
> On 05/25/2011 02:22 PM, Jan Kiszka wrote:
>> On 2011-05-25 14:19, Gilles Chanteperdrix wrote:
>>> On 05/25/2011 02:12 PM, Jan Kiszka wrote:
>>>> On 2011-05-25 13:58, Gilles Chanteperdrix wrote:
>>>>> On 05/25/2011 01:20 PM, Jan Kiszka wrote:
>>>>>> On 2011-05-24 16:03, Gilles Chanteperdrix wrote:
>>>>>>> On 05/24/2011 03:52 PM, Jan Kiszka wrote:
>>>>>>>> On 2011-05-24 14:30, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>> Do you already have an idea how to get that info to the delete hook
>>>>>>>>>>>>>> function?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes. We start by not applying the list reversal patch, then the sys_ppd
>>>>>>>>>>>>> is the first in the list. So, we can, in the function ppd_remove_mm,
>>>>>>>>>>>>> start by removing all the others ppd, then remove the sys ppd (that is
>>>>>>>>>>>>> the first), last. This changes a few signatures in the core code, a lot
>>>>>>>>>>>>> of things in the skin code, but that would be for the better...
>>>>>>>>>>>>
>>>>>>>>>>>> I still don't see how this affects the order we use in
>>>>>>>>>>>> do_taskexit_event, the one that prevents xnsys_get_ppd usage even when
>>>>>>>>>>>> the mm is still present.
>>>>>>>>>>>
>>>>>>>>>>> The idea is to change the cleanup routines not to call xnsys_get_ppd.
>>>>>>>>>>
>>>>>>>>>> ...and use what instead? Sorry, I'm slow today.
>>>>>>>>>
>>>>>>>>> The sys_ppd passed as other argument to the cleanup function.
>>>>>>>>
>>>>>>>> That would affect all thread hooks, not only the one for deletion. And
>>>>>>>> it would pull in more shadow-specific bits into the pod.
>>>>>>>>
>>>>>>>> Moreover, I think we would still be in troubles as mm, thus ppd,
>>>>>>>> deletion takes place before last task deletion, thus taskexit hook
>>>>>>>> invocation. That's due to the cleanup ordering in the kernel's do_exit.
>>>>>>>>
>>>>>>>> However, if you have a patch, I'd be happy to test and rework my leakage
>>>>>>>> fix.
>>>>>>>
>>>>>>> I will work on this ASAP.
>>>>>>
>>>>>> Sorry for pushing, but I need to decide if we should role out my
>>>>>> imperfect fix or if there is chance to use some upstream version
>>>>>> directly. Were you able to look into this, or will this likely take a
>>>>>> bit more time?
>>>>>
>>>>> I intended to try and do this next week-end. If it is more urgent than
>>>>> that, I can try in one or two days. In any case, I do not think we
>>>>> should try and workaround the current code, it is way to fragile.
>>>>
>>>> Mmh, might be true. I'm getting the feeling we should locally revert all
>>>> the recent MPS changes to work around the issues. It looks like there
>>>> are more related problems sleeping (we are still facing spurious
>>>> fast-synch related crashes here - examining ATM).
>>>
>>> This is the development head, it may remain broken for short times while
>>> we are fixing. I would understand reverting on the 2.5 branch, not on -head.
>>
>> I was thinking loudly about our (Siemens) local branch, not -head. We
>> are forced to use head to have features like automatic non-RT shadow
>> migration.
> 
> Now that I think about it, leaking a few bytes in the private heap is
> completely harmless, since it is destroyed anyway,

Not at all harmless if you create and destroy tasks without restarting
the application. That's what our users do, so this bug is killing them.
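
For the record, the usage pattern that bites here is simply something
like the following (user-space sketch, error handling omitted, assuming
the usual rt_task_spawn()/rt_task_join() entry points):

  #include <native/task.h>

  static void worker(void *cookie)
  {
          /* do some work, then just return: the task terminates without
             rt_task_delete(), which is the path that leaked one fastlock
             from the per-process sem heap per task */
  }

  static void churn(void)
  {
          RT_TASK t;
          int i;

          for (i = 0; i < 100000; i++) {
                  rt_task_spawn(&t, NULL, 0, 50, T_JOINABLE, worker, NULL);
                  rt_task_join(&t);
          }
  }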

> and xnheap_free does
> enough checks to avoid clobbering already freed memory, though
> destroying this heap last is the way to go, as I have already said.
> 
> A bit less in the global heap, but this one should be used less often,
> and from your explanation, I understand this is not the case you are
> interested in, otherwise you would not care if xnpod_userspace_p() is
> working.
> 
> In any case, it should not cause crashes, it is just a leak.
> 
>>
>>>
>>>> Another thing that just came to my mind: Do we have a well-defined
>>>> destruction order of native skin or native tasks vs. system skin? I mean
>>>> the latter destroys the local sem_heap while the former may purge
>>>> remaining native resources (including the MPS fastlock). I think the
>>>> ordering is inverted to what the code assumes (heap is destructed before
>>>> the last task), no?
>>>
>>> IMO, the system skin destroy callback should be called last, this should
>>> solve these problems. This is what I was talking about.
>>
>> OK. Still, tasks aren't destroyed on mm shoot-down but via a dedicated
>> do_exit callback that is invoke by the kernel a few instructions later.
>> Nothing we can directly influence from Xenomai code.
> 
> We do not care, we only use the mm pointer value as a key, and this one
> is still available when the exit callback is called. So, we have the mm
> pointer, we can have the process private ppd, normally, we have all that
> we need. It is just a question of finding an interface which is not too
> intrusive.

The problem is that the heap we need to release the fastlock to will
already be history at this point - classic use after release...

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Xenomai-core] [PULL] native: Fix msendq fastlock leakage
  2011-05-26  7:18                                   ` Jan Kiszka
@ 2011-05-26  7:29                                     ` Gilles Chanteperdrix
  2011-05-26  7:37                                       ` Jan Kiszka
  0 siblings, 1 reply; 37+ messages in thread
From: Gilles Chanteperdrix @ 2011-05-26  7:29 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai core

On 05/26/2011 09:18 AM, Jan Kiszka wrote:
> On 2011-05-25 20:48, Gilles Chanteperdrix wrote:
>> On 05/25/2011 02:22 PM, Jan Kiszka wrote:
>>> On 2011-05-25 14:19, Gilles Chanteperdrix wrote:
>>>> On 05/25/2011 02:12 PM, Jan Kiszka wrote:
>>>>> On 2011-05-25 13:58, Gilles Chanteperdrix wrote:
>>>>>> On 05/25/2011 01:20 PM, Jan Kiszka wrote:
>>>>>>> On 2011-05-24 16:03, Gilles Chanteperdrix wrote:
>>>>>>>> On 05/24/2011 03:52 PM, Jan Kiszka wrote:
>>>>>>>>> On 2011-05-24 14:30, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>> Do you already have an idea how to get that info to the delete hook
>>>>>>>>>>>>>>> function?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes. We start by not applying the list reversal patch, then the sys_ppd
>>>>>>>>>>>>>> is the first in the list. So, we can, in the function ppd_remove_mm,
>>>>>>>>>>>>>> start by removing all the others ppd, then remove the sys ppd (that is
>>>>>>>>>>>>>> the first), last. This changes a few signatures in the core code, a lot
>>>>>>>>>>>>>> of things in the skin code, but that would be for the better...
>>>>>>>>>>>>>
>>>>>>>>>>>>> I still don't see how this affects the order we use in
>>>>>>>>>>>>> do_taskexit_event, the one that prevents xnsys_get_ppd usage even when
>>>>>>>>>>>>> the mm is still present.
>>>>>>>>>>>>
>>>>>>>>>>>> The idea is to change the cleanup routines not to call xnsys_get_ppd.
>>>>>>>>>>>
>>>>>>>>>>> ...and use what instead? Sorry, I'm slow today.
>>>>>>>>>>
>>>>>>>>>> The sys_ppd passed as other argument to the cleanup function.
>>>>>>>>>
>>>>>>>>> That would affect all thread hooks, not only the one for deletion. And
>>>>>>>>> it would pull in more shadow-specific bits into the pod.
>>>>>>>>>
>>>>>>>>> Moreover, I think we would still be in troubles as mm, thus ppd,
>>>>>>>>> deletion takes place before last task deletion, thus taskexit hook
>>>>>>>>> invocation. That's due to the cleanup ordering in the kernel's do_exit.
>>>>>>>>>
>>>>>>>>> However, if you have a patch, I'd be happy to test and rework my leakage
>>>>>>>>> fix.
>>>>>>>>
>>>>>>>> I will work on this ASAP.
>>>>>>>
>>>>>>> Sorry for pushing, but I need to decide if we should role out my
>>>>>>> imperfect fix or if there is chance to use some upstream version
>>>>>>> directly. Were you able to look into this, or will this likely take a
>>>>>>> bit more time?
>>>>>>
>>>>>> I intended to try and do this next week-end. If it is more urgent than
>>>>>> that, I can try in one or two days. In any case, I do not think we
>>>>>> should try and workaround the current code, it is way to fragile.
>>>>>
>>>>> Mmh, might be true. I'm getting the feeling we should locally revert all
>>>>> the recent MPS changes to work around the issues. It looks like there
>>>>> are more related problems sleeping (we are still facing spurious
>>>>> fast-synch related crashes here - examining ATM).
>>>>
>>>> This is the development head, it may remain broken for short times while
>>>> we are fixing. I would understand reverting on the 2.5 branch, not on -head.
>>>
>>> I was thinking loudly about our (Siemens) local branch, not -head. We
>>> are forced to use head to have features like automatic non-RT shadow
>>> migration.
>>
>> Now that I think about it, leaking a few bytes in the private heap is
>> completely harmless, since it is destroyed anyway,
> 
> Not at all harmless if you create and destroy tasks without restarting
> the application. That's what our users do, so this bug is killing them.

Still, it should not cause crashes. Only allocation returning NULL at
some point.

>> We do not care, we only use the mm pointer value as a key, and this one
>> is still available when the exit callback is called. So, we have the mm
>> pointer, we can have the process private ppd, normally, we have all that
>> we need. It is just a question of finding an interface which is not too
>> intrusive.
> 
> The problem is that the heap we need to release the fastlock to will
> already be history at this point - classic use after release...

As I said, xnheap_free is robust against this, so this use after
release does not cause any trouble. But I have also already agreed that
we should fix it.

-- 
                                                                Gilles.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Xenomai-core] [PULL] native: Fix msendq fastlock leakage
  2011-05-26  7:29                                     ` Gilles Chanteperdrix
@ 2011-05-26  7:37                                       ` Jan Kiszka
  2011-05-26  7:58                                         ` Gilles Chanteperdrix
  0 siblings, 1 reply; 37+ messages in thread
From: Jan Kiszka @ 2011-05-26  7:37 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai core

On 2011-05-26 09:29, Gilles Chanteperdrix wrote:
> On 05/26/2011 09:18 AM, Jan Kiszka wrote:
>> On 2011-05-25 20:48, Gilles Chanteperdrix wrote:
>>> On 05/25/2011 02:22 PM, Jan Kiszka wrote:
>>>> On 2011-05-25 14:19, Gilles Chanteperdrix wrote:
>>>>> On 05/25/2011 02:12 PM, Jan Kiszka wrote:
>>>>>> On 2011-05-25 13:58, Gilles Chanteperdrix wrote:
>>>>>>> On 05/25/2011 01:20 PM, Jan Kiszka wrote:
>>>>>>>> On 2011-05-24 16:03, Gilles Chanteperdrix wrote:
>>>>>>>>> On 05/24/2011 03:52 PM, Jan Kiszka wrote:
>>>>>>>>>> On 2011-05-24 14:30, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>>> Do you already have an idea how to get that info to the delete hook
>>>>>>>>>>>>>>>> function?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yes. We start by not applying the list reversal patch, then the sys_ppd
>>>>>>>>>>>>>>> is the first in the list. So, we can, in the function ppd_remove_mm,
>>>>>>>>>>>>>>> start by removing all the others ppd, then remove the sys ppd (that is
>>>>>>>>>>>>>>> the first), last. This changes a few signatures in the core code, a lot
>>>>>>>>>>>>>>> of things in the skin code, but that would be for the better...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I still don't see how this affects the order we use in
>>>>>>>>>>>>>> do_taskexit_event, the one that prevents xnsys_get_ppd usage even when
>>>>>>>>>>>>>> the mm is still present.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The idea is to change the cleanup routines not to call xnsys_get_ppd.
>>>>>>>>>>>>
>>>>>>>>>>>> ...and use what instead? Sorry, I'm slow today.
>>>>>>>>>>>
>>>>>>>>>>> The sys_ppd passed as other argument to the cleanup function.
>>>>>>>>>>
>>>>>>>>>> That would affect all thread hooks, not only the one for deletion. And
>>>>>>>>>> it would pull in more shadow-specific bits into the pod.
>>>>>>>>>>
>>>>>>>>>> Moreover, I think we would still be in troubles as mm, thus ppd,
>>>>>>>>>> deletion takes place before last task deletion, thus taskexit hook
>>>>>>>>>> invocation. That's due to the cleanup ordering in the kernel's do_exit.
>>>>>>>>>>
>>>>>>>>>> However, if you have a patch, I'd be happy to test and rework my leakage
>>>>>>>>>> fix.
>>>>>>>>>
>>>>>>>>> I will work on this ASAP.
>>>>>>>>
>>>>>>>> Sorry for pushing, but I need to decide if we should role out my
>>>>>>>> imperfect fix or if there is chance to use some upstream version
>>>>>>>> directly. Were you able to look into this, or will this likely take a
>>>>>>>> bit more time?
>>>>>>>
>>>>>>> I intended to try and do this next week-end. If it is more urgent than
>>>>>>> that, I can try in one or two days. In any case, I do not think we
>>>>>>> should try and workaround the current code, it is way to fragile.
>>>>>>
>>>>>> Mmh, might be true. I'm getting the feeling we should locally revert all
>>>>>> the recent MPS changes to work around the issues. It looks like there
>>>>>> are more related problems sleeping (we are still facing spurious
>>>>>> fast-synch related crashes here - examining ATM).
>>>>>
>>>>> This is the development head, it may remain broken for short times while
>>>>> we are fixing. I would understand reverting on the 2.5 branch, not on -head.
>>>>
>>>> I was thinking loudly about our (Siemens) local branch, not -head. We
>>>> are forced to use head to have features like automatic non-RT shadow
>>>> migration.
>>>
>>> Now that I think about it, leaking a few bytes in the private heap is
>>> completely harmless, since it is destroyed anyway,
>>
>> Not at all harmless if you create and destroy tasks without restarting
>> the application. That's what our users do, so this bug is killing them.
> 
> Still, it should not cause crashes. Only allocation returning NULL at
> some point.

That might be true. Reverting some suspicious patches did not resolve
the problem so far.

> 
>>> We do not care, we only use the mm pointer value as a key, and this one
>>> is still available when the exit callback is called. So, we have the mm
>>> pointer, we can have the process private ppd, normally, we have all that
>>> we need. It is just a question of finding an interface which is not too
>>> intrusive.
>>
>> The problem is that the heap we need to release the fastlock to will
>> already be history at this point - classic use after release...
> 
> As I said, xnheap_free is robust against this, so this use after
> release does not cause any trouble. But I have also already agreed that
> we should fix it.

xnheap_test_and_free is not at all robust against working on an invalid
heap object.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Xenomai-core] [PULL] native: Fix msendq fastlock leakage
  2011-05-26  7:37                                       ` Jan Kiszka
@ 2011-05-26  7:58                                         ` Gilles Chanteperdrix
  0 siblings, 0 replies; 37+ messages in thread
From: Gilles Chanteperdrix @ 2011-05-26  7:58 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai core

On 05/26/2011 09:37 AM, Jan Kiszka wrote:
> On 2011-05-26 09:29, Gilles Chanteperdrix wrote:
>> On 05/26/2011 09:18 AM, Jan Kiszka wrote:
>>> On 2011-05-25 20:48, Gilles Chanteperdrix wrote:
>>>> On 05/25/2011 02:22 PM, Jan Kiszka wrote:
>>>>> On 2011-05-25 14:19, Gilles Chanteperdrix wrote:
>>>>>> On 05/25/2011 02:12 PM, Jan Kiszka wrote:
>>>>>>> On 2011-05-25 13:58, Gilles Chanteperdrix wrote:
>>>>>>>> On 05/25/2011 01:20 PM, Jan Kiszka wrote:
>>>>>>>>> On 2011-05-24 16:03, Gilles Chanteperdrix wrote:
>>>>>>>>>> On 05/24/2011 03:52 PM, Jan Kiszka wrote:
>>>>>>>>>>> On 2011-05-24 14:30, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>>>> Do you already have an idea how to get that info to the delete hook
>>>>>>>>>>>>>>>>> function?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yes. We start by not applying the list reversal patch, then the sys_ppd
>>>>>>>>>>>>>>>> is the first in the list. So, we can, in the function ppd_remove_mm,
>>>>>>>>>>>>>>>> start by removing all the others ppd, then remove the sys ppd (that is
>>>>>>>>>>>>>>>> the first), last. This changes a few signatures in the core code, a lot
>>>>>>>>>>>>>>>> of things in the skin code, but that would be for the better...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I still don't see how this affects the order we use in
>>>>>>>>>>>>>>> do_taskexit_event, the one that prevents xnsys_get_ppd usage even when
>>>>>>>>>>>>>>> the mm is still present.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The idea is to change the cleanup routines not to call xnsys_get_ppd.
>>>>>>>>>>>>>
>>>>>>>>>>>>> ...and use what instead? Sorry, I'm slow today.
>>>>>>>>>>>>
>>>>>>>>>>>> The sys_ppd passed as other argument to the cleanup function.
>>>>>>>>>>>
>>>>>>>>>>> That would affect all thread hooks, not only the one for deletion. And
>>>>>>>>>>> it would pull in more shadow-specific bits into the pod.
>>>>>>>>>>>
>>>>>>>>>>> Moreover, I think we would still be in troubles as mm, thus ppd,
>>>>>>>>>>> deletion takes place before last task deletion, thus taskexit hook
>>>>>>>>>>> invocation. That's due to the cleanup ordering in the kernel's do_exit.
>>>>>>>>>>>
>>>>>>>>>>> However, if you have a patch, I'd be happy to test and rework my leakage
>>>>>>>>>>> fix.
>>>>>>>>>>
>>>>>>>>>> I will work on this ASAP.
>>>>>>>>>
>>>>>>>>> Sorry for pushing, but I need to decide if we should role out my
>>>>>>>>> imperfect fix or if there is chance to use some upstream version
>>>>>>>>> directly. Were you able to look into this, or will this likely take a
>>>>>>>>> bit more time?
>>>>>>>>
>>>>>>>> I intended to try and do this next week-end. If it is more urgent than
>>>>>>>> that, I can try in one or two days. In any case, I do not think we
>>>>>>>> should try and workaround the current code, it is way to fragile.
>>>>>>>
>>>>>>> Mmh, might be true. I'm getting the feeling we should locally revert all
>>>>>>> the recent MPS changes to work around the issues. It looks like there
>>>>>>> are more related problems sleeping (we are still facing spurious
>>>>>>> fast-synch related crashes here - examining ATM).
>>>>>>
>>>>>> This is the development head, it may remain broken for short times while
>>>>>> we are fixing. I would understand reverting on the 2.5 branch, not on -head.
>>>>>
>>>>> I was thinking loudly about our (Siemens) local branch, not -head. We
>>>>> are forced to use head to have features like automatic non-RT shadow
>>>>> migration.
>>>>
>>>> Now that I think about it, leaking a few bytes in the private heap is
>>>> completely harmless, since it is destroyed anyway,
>>>
>>> Not at all harmless if you create and destroy tasks without restarting
>>> the application. That's what our users do, so this bug is killing them.
>>
>> Still, it should not cause crashes. Only allocation returning NULL at
>> some point.
> 
> That might be true. Reverting some suspicious patches did not resolve
> the problem so far.
> 
>>
>>>> We do not care, we only use the mm pointer value as a key, and this one
>>>> is still available when the exit callback is called. So, we have the mm
>>>> pointer, we can have the process private ppd, normally, we have all that
>>>> we need. It is just a question of finding an interface which is not too
>>>> intrusive.
>>>
>>> The problem is that the heap we need to release the fastlock to will
>>> already be history at this point - classic use after release...
>>
>> As I said, xnheap_free is robust against this, so this use after
>> release does not cause any trouble. But I have also already agreed that
>> we should fix it.
> 
> xnheap_test_and_free is not at all robust against working on an invalid
> heap object.

The heap object is not really invalid: it has been scheduled to be
freed, but has not been freed yet. But OK, let us stop discussing this;
yes, this needs fixing.

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Xenomai-core] [PULL] native: Fix msendq fastlock leakage
  2011-05-23 13:53 [Xenomai-core] [PULL] native: Fix msendq fastlock leakage Jan Kiszka
  2011-05-24  4:31 ` Gilles Chanteperdrix
@ 2011-06-19 10:14 ` Gilles Chanteperdrix
  2011-06-19 11:17   ` Gilles Chanteperdrix
  1 sibling, 1 reply; 37+ messages in thread
From: Gilles Chanteperdrix @ 2011-06-19 10:14 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai core

On 05/23/2011 03:53 PM, Jan Kiszka wrote:
> The following changes since commit aec30a2543afa18fa7832deee85e187b0faeb1f0:
> 
>   xeno-test: fix reference to @XENO_TEST_DIR@ (2011-05-15 21:20:41 +0200)
> 
> are available in the git repository at:
>   git://git.xenomai.org/xenomai-jki.git for-upstream
> 
> Jan Kiszka (1):
>       native: Fix msendq fastlock leakage
> 
>  include/native/task.h    |    5 +++++
>  ksrc/skins/native/task.c |   13 ++++++-------
>  2 files changed, 11 insertions(+), 7 deletions(-)
> 
> ------8<------
> 
> When a native task terminates without going through rt_task_delete, we
> leaked the fastlock so far. Fix it by moving the release into the delete
> hook. As the ppd is already invalid at that point, we have to save the
> heap address in the task data structure.

I am working on this ppd cleanup issue again, I am asking for help to
find a fix in -head for all cases where the sys_ppd is needed during
some cleanup.

The problem is that when the ppd cleanup is invoked:
- we have no guarantee that current is a thread from the Xenomai
application;
- if it is, current->mm is NULL.

So, associating the sys_ppd to either current or current->mm does not
work. What we could do is pass the sys_ppd to all the other ppds cleanup
handlers, this would fix cases such as freeing mutexes fastlock, but
that does not help when the sys_ppd is needed during a thread deletion hook.
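
The signature change hinted at above would look roughly like this
(purely illustrative, the names are made up):

  /* today the per-skin detach callback only sees its own ppd; the idea
     would be to also hand it the sys_ppd of the dying process, so it
     does not need to call xnsys_ppd_get() at all: */
  void skin_ppd_detach(xnshadow_ppd_t *ppd, struct xnsys_ppd *sys_ppd);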

I would like to find a solution where simply calling xnsys_ppd_get()
will work, where we do not have an xnsys_ppd_get for each context, such
as for instance xnsys_ppd_get_by_mm/xnsys_ppd_get_by_task_struct,
because it would be too error-prone.

Any idea anyone?

-- 
                                                                Gilles.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Xenomai-core] [PULL] native: Fix msendq fastlock leakage
  2011-06-19 10:14 ` Gilles Chanteperdrix
@ 2011-06-19 11:17   ` Gilles Chanteperdrix
  2011-06-19 13:00     ` Gilles Chanteperdrix
  0 siblings, 1 reply; 37+ messages in thread
From: Gilles Chanteperdrix @ 2011-06-19 11:17 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai core

On 06/19/2011 12:14 PM, Gilles Chanteperdrix wrote:
> I am working on this ppd cleanup issue again, I am asking for help to
> find a fix in -head for all cases where the sys_ppd is needed during
> some cleanup.
> 
> The problem is that when the ppd cleanup is invoked:
> - we have no guarantee that current is a thread from the Xenomai
> application;
> - if it is, current->mm is NULL.
> 
> So, associating the sys_ppd to either current or current->mm does not
> work. What we could do is pass the sys_ppd to all the other ppds cleanup
> handlers, this would fix cases such as freeing mutexes fastlock, but
> that does not help when the sys_ppd is needed during a thread deletion hook.
> 
> I would like to find a solution where simply calling xnsys_ppd_get()
> will work, where we do not have an xnsys_ppd_get for each context, such
> as for instance xnsys_ppd_get_by_mm/xnsys_ppd_get_by_task_struct,
> because it would be too error-prone.
> 
> Any idea anyone?

The best I could come up with: use a ptd to store the mm currently 
being cleaned up, so that xnshadow_ppd_get continues to work, even
in the middle of a cleanup.

diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
index b243600..b542b4f 100644
--- a/ksrc/nucleus/shadow.c
+++ b/ksrc/nucleus/shadow.c
@@ -65,6 +65,8 @@ int nkthrptd;
 EXPORT_SYMBOL_GPL(nkthrptd);
 int nkerrptd;
 EXPORT_SYMBOL_GPL(nkerrptd);
+int nkmmptd;
+EXPORT_SYMBOL_GPL(nkmmptd);
 
 struct xnskin_slot {
 	struct xnskin_props *props;
@@ -2759,7 +2761,14 @@ static void detach_ppd(xnshadow_ppd_t * ppd)
 
 static inline void do_cleanup_event(struct mm_struct *mm)
 {
+	struct mm_struct *old;
+
+	old = (struct mm_struct *)(current->ptd[nkmmptd]);
+	current->ptd[nkmmptd] = mm;
+
 	ppd_remove_mm(mm, &detach_ppd);
+
+	current->ptd[nkmmptd] = old;
 }
 
 RTHAL_DECLARE_CLEANUP_EVENT(cleanup_event);
@@ -2924,8 +2933,14 @@ EXPORT_SYMBOL_GPL(xnshadow_unregister_interface);
  */
 xnshadow_ppd_t *xnshadow_ppd_get(unsigned muxid)
 {
-	if (xnpod_userspace_p())
-		return ppd_lookup(muxid, current->mm);
+	if (xnpod_userspace_p()) {
+		struct mm_struct *mm;
+
+		mm = (struct mm_struct *)current->ptd[nkmmptd] ?: current->mm;
+		/* ptd is not null if we are currently cleaning up an mm */
+
+		return ppd_lookup(muxid, mm);
+	}
 
 	return NULL;
 }
@@ -2960,8 +2975,9 @@ int xnshadow_mount(void)
 	sema_init(&completion_mutex, 1);
 	nkthrptd = rthal_alloc_ptdkey();
 	nkerrptd = rthal_alloc_ptdkey();
+	nkmmptd = rthal_alloc_ptdkey();
 
-	if (nkthrptd < 0 || nkerrptd < 0) {
+	if (nkthrptd < 0 || nkerrptd < 0 || nkmmptd < 0) {
 		printk(KERN_ERR "Xenomai: cannot allocate PTD slots\n");
 		return -ENOMEM;
 	}
diff --git a/ksrc/skins/posix/mutex.c b/ksrc/skins/posix/mutex.c
index 6ce75e5..cc86852 100644
--- a/ksrc/skins/posix/mutex.c
+++ b/ksrc/skins/posix/mutex.c
@@ -219,10 +219,6 @@ void pse51_mutex_destroy_internal(pse51_mutex_t *mutex,
 	xnlock_put_irqrestore(&nklock, s);
 
 #ifdef CONFIG_XENO_FASTSYNCH
-	/* We call xnheap_free even if the mutex is not pshared; when
-	   this function is called from pse51_mutexq_cleanup, the
-	   sem_heap is destroyed, or not the one to which the fastlock
-	   belongs, xnheap will simply return an error. */
 	xnheap_free(&xnsys_ppd_get(mutex->attr.pshared)->sem_heap,
 		    mutex->synchbase.fastlock);
 #endif /* CONFIG_XENO_FASTSYNCH */


-- 
                                                                Gilles.


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [Xenomai-core] [PULL] native: Fix msendq fastlock leakage
  2011-06-19 11:17   ` Gilles Chanteperdrix
@ 2011-06-19 13:00     ` Gilles Chanteperdrix
  2011-06-20 17:07       ` Jan Kiszka
  0 siblings, 1 reply; 37+ messages in thread
From: Gilles Chanteperdrix @ 2011-06-19 13:00 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai core

On 06/19/2011 01:17 PM, Gilles Chanteperdrix wrote:
> On 06/19/2011 12:14 PM, Gilles Chanteperdrix wrote:
>> I am working on this ppd cleanup issue again, I am asking for help to
>> find a fix in -head for all cases where the sys_ppd is needed during
>> some cleanup.
>>
>> The problem is that when the ppd cleanup is invoked:
>> - we have no guarantee that current is a thread from the Xenomai
>> application;
>> - if it is, current->mm is NULL.
>>
>> So, associating the sys_ppd to either current or current->mm does not
>> work. What we could do is pass the sys_ppd to all the other ppds cleanup
>> handlers, this would fix cases such as freeing mutexes fastlock, but
>> that does not help when the sys_ppd is needed during a thread deletion hook.
>>
>> I would like to find a solution where simply calling xnsys_ppd_get()
>> will work, where we do not have an xnsys_ppd_get for each context, such
>> as for instance xnsys_ppd_get_by_mm/xnsys_ppd_get_by_task_struct,
>> because it would be too error-prone.
>>
>> Any idea anyone?
> 
> The best I could come up with: use a ptd to store the mm currently 
> being cleaned up, so that xnshadow_ppd_get continues to work, even
> in the middle of a cleanup.

In order to also get xnshadow_ppd_get to work in task deletion hooks 
(which is needed to avoid the issue at the origin of this thread), we 
also need to set this ptd upon shadow mapping, so it is still there 
when reaching the task deletion hook (where current->mm may be NULL). 
Hence the patch:

diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
index b243600..6bc4210 100644
--- a/ksrc/nucleus/shadow.c
+++ b/ksrc/nucleus/shadow.c
@@ -65,6 +65,11 @@ int nkthrptd;
 EXPORT_SYMBOL_GPL(nkthrptd);
 int nkerrptd;
 EXPORT_SYMBOL_GPL(nkerrptd);
+int nkmmptd;
+EXPORT_SYMBOL_GPL(nkmmptd);
+
+#define xnshadow_mmptd(t) ((t)->ptd[nkmmptd])
+#define xnshadow_mm(t) ((struct mm_struct *)xnshadow_mmptd(t))
 
 struct xnskin_slot {
 	struct xnskin_props *props;
@@ -1304,6 +1309,8 @@ int xnshadow_map(xnthread_t *thread, xncompletion_t __user *u_completion,
 	 * friends.
 	 */
 	xnshadow_thrptd(current) = thread;
+	xnshadow_mmptd(current) = current->mm;
+
 	rthal_enable_notifier(current);
 
 	if (xnthread_base_priority(thread) == 0 &&
@@ -2759,7 +2766,15 @@ static void detach_ppd(xnshadow_ppd_t * ppd)
 
 static inline void do_cleanup_event(struct mm_struct *mm)
 {
+	struct task_struct *p = current;
+	struct mm_struct *old;
+
+	old = xnshadow_mm(p);
+	xnshadow_mmptd(p) = mm;
+
 	ppd_remove_mm(mm, &detach_ppd);
+
+	xnshadow_mmptd(p) = old;
 }
 
 RTHAL_DECLARE_CLEANUP_EVENT(cleanup_event);
@@ -2925,7 +2940,7 @@ EXPORT_SYMBOL_GPL(xnshadow_unregister_interface);
 xnshadow_ppd_t *xnshadow_ppd_get(unsigned muxid)
 {
 	if (xnpod_userspace_p())
-		return ppd_lookup(muxid, current->mm);
+		return ppd_lookup(muxid, xnshadow_mm(current) ?: current->mm);
 
 	return NULL;
 }
@@ -2960,8 +2975,9 @@ int xnshadow_mount(void)
 	sema_init(&completion_mutex, 1);
 	nkthrptd = rthal_alloc_ptdkey();
 	nkerrptd = rthal_alloc_ptdkey();
+	nkmmptd = rthal_alloc_ptdkey();
 
-	if (nkthrptd < 0 || nkerrptd < 0) {
+	if (nkthrptd < 0 || nkerrptd < 0 || nkmmptd < 0) {
 		printk(KERN_ERR "Xenomai: cannot allocate PTD slots\n");
 		return -ENOMEM;
 	}
diff --git a/ksrc/skins/posix/mutex.c b/ksrc/skins/posix/mutex.c
index 6ce75e5..cc86852 100644
--- a/ksrc/skins/posix/mutex.c
+++ b/ksrc/skins/posix/mutex.c
@@ -219,10 +219,6 @@ void pse51_mutex_destroy_internal(pse51_mutex_t *mutex,
 	xnlock_put_irqrestore(&nklock, s);
 
 #ifdef CONFIG_XENO_FASTSYNCH
-	/* We call xnheap_free even if the mutex is not pshared; when
-	   this function is called from pse51_mutexq_cleanup, the
-	   sem_heap is destroyed, or not the one to which the fastlock
-	   belongs, xnheap will simply return an error. */
 	xnheap_free(&xnsys_ppd_get(mutex->attr.pshared)->sem_heap,
 		    mutex->synchbase.fastlock);
 #endif /* CONFIG_XENO_FASTSYNCH */


-- 
                                                                Gilles.


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [Xenomai-core] [PULL] native: Fix msendq fastlock leakage
  2011-06-19 13:00     ` Gilles Chanteperdrix
@ 2011-06-20 17:07       ` Jan Kiszka
  2011-06-20 17:46         ` Gilles Chanteperdrix
  2011-06-23  9:37         ` Jan Kiszka
  0 siblings, 2 replies; 37+ messages in thread
From: Jan Kiszka @ 2011-06-20 17:07 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai core

On 2011-06-19 15:00, Gilles Chanteperdrix wrote:
> On 06/19/2011 01:17 PM, Gilles Chanteperdrix wrote:
>> On 06/19/2011 12:14 PM, Gilles Chanteperdrix wrote:
>>> I am working on this ppd cleanup issue again, I am asking for help to
>>> find a fix in -head for all cases where the sys_ppd is needed during
>>> some cleanup.
>>>
>>> The problem is that when the ppd cleanup is invoked:
>>> - we have no guarantee that current is a thread from the Xenomai
>>> application;
>>> - if it is, current->mm is NULL.
>>>
>>> So, associating the sys_ppd to either current or current->mm does not
>>> work. What we could do is pass the sys_ppd to all the other ppds cleanup
>>> handlers, this would fix cases such as freeing mutexes fastlock, but
>>> that does not help when the sys_ppd is needed during a thread deletion hook.
>>>
>>> I would like to find a solution where simply calling xnsys_ppd_get()
>>> will work, where we do not have an xnsys_ppd_get for each context, such
>>> as for instance xnsys_ppd_get_by_mm/xnsys_ppd_get_by_task_struct,
>>> because it would be too error-prone.
>>>
>>> Any idea anyone?
>>
>> The best I could come up with: use a ptd to store the mm currently 
>> being cleaned up, so that xnshadow_ppd_get continues to work, even
>> in the middle of a cleanup.
> 
> In order to also get xnshadow_ppd_get to work in task deletion hooks 
> (which is needed to avoid the issue at the origin of this thread), we 
> also need to set this ptd upon shadow mapping, so it is still there 
> when reaching the task deletion hook (where current->mm may be NULL). 
> Hence the patch:
> 
> diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
> index b243600..6bc4210 100644
> --- a/ksrc/nucleus/shadow.c
> +++ b/ksrc/nucleus/shadow.c
> @@ -65,6 +65,11 @@ int nkthrptd;
>  EXPORT_SYMBOL_GPL(nkthrptd);
>  int nkerrptd;
>  EXPORT_SYMBOL_GPL(nkerrptd);
> +int nkmmptd;
> +EXPORT_SYMBOL_GPL(nkmmptd);
> +
> +#define xnshadow_mmptd(t) ((t)->ptd[nkmmptd])
> +#define xnshadow_mm(t) ((struct mm_struct *)xnshadow_mmptd(t))

xnshadow_mm() can now return a no longer existing mm. So no user of
xnshadow_mm should ever dereference that pointer. Thus we better change
all that user to treat the return value as a void pointer e.g.

>  
>  struct xnskin_slot {
>  	struct xnskin_props *props;
> @@ -1304,6 +1309,8 @@ int xnshadow_map(xnthread_t *thread, xncompletion_t __user *u_completion,
>  	 * friends.
>  	 */
>  	xnshadow_thrptd(current) = thread;
> +	xnshadow_mmptd(current) = current->mm;
> +
>  	rthal_enable_notifier(current);
>  
>  	if (xnthread_base_priority(thread) == 0 &&
> @@ -2759,7 +2766,15 @@ static void detach_ppd(xnshadow_ppd_t * ppd)
>  
>  static inline void do_cleanup_event(struct mm_struct *mm)
>  {
> +	struct task_struct *p = current;
> +	struct mm_struct *old;
> +
> +	old = xnshadow_mm(p);
> +	xnshadow_mmptd(p) = mm;
> +
>  	ppd_remove_mm(mm, &detach_ppd);
> +
> +	xnshadow_mmptd(p) = old;

I don't have the full picture yet, but that feels racy: If the context
over which we clean up that foreign mm is also using xnshadow_mmptd,
other threads in that process may dislike this temporary change.

>  }
>  
>  RTHAL_DECLARE_CLEANUP_EVENT(cleanup_event);
> @@ -2925,7 +2940,7 @@ EXPORT_SYMBOL_GPL(xnshadow_unregister_interface);
>  xnshadow_ppd_t *xnshadow_ppd_get(unsigned muxid)
>  {
>  	if (xnpod_userspace_p())
> -		return ppd_lookup(muxid, current->mm);
> +		return ppd_lookup(muxid, xnshadow_mm(current) ?: current->mm);
>  
>  	return NULL;
>  }
> @@ -2960,8 +2975,9 @@ int xnshadow_mount(void)
>  	sema_init(&completion_mutex, 1);
>  	nkthrptd = rthal_alloc_ptdkey();
>  	nkerrptd = rthal_alloc_ptdkey();
> +	nkmmptd = rthal_alloc_ptdkey();
>  
> -	if (nkthrptd < 0 || nkerrptd < 0) {
> +	if (nkthrptd < 0 || nkerrptd < 0 || nkmmptd < 0) {
>  		printk(KERN_ERR "Xenomai: cannot allocate PTD slots\n");
>  		return -ENOMEM;
>  	}
> diff --git a/ksrc/skins/posix/mutex.c b/ksrc/skins/posix/mutex.c
> index 6ce75e5..cc86852 100644
> --- a/ksrc/skins/posix/mutex.c
> +++ b/ksrc/skins/posix/mutex.c
> @@ -219,10 +219,6 @@ void pse51_mutex_destroy_internal(pse51_mutex_t *mutex,
>  	xnlock_put_irqrestore(&nklock, s);
>  
>  #ifdef CONFIG_XENO_FASTSYNCH
> -	/* We call xnheap_free even if the mutex is not pshared; when
> -	   this function is called from pse51_mutexq_cleanup, the
> -	   sem_heap is destroyed, or not the one to which the fastlock
> -	   belongs, xnheap will simply return an error. */

I think this comment is not completely obsolete. It still applies /wrt
shared/non-shared.

>  	xnheap_free(&xnsys_ppd_get(mutex->attr.pshared)->sem_heap,
>  		    mutex->synchbase.fastlock);
>  #endif /* CONFIG_XENO_FASTSYNCH */
> 
> 

If we can resolve that potential race, this looks like a nice solution.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Xenomai-core] [PULL] native: Fix msendq fastlock leakage
  2011-06-20 17:07       ` Jan Kiszka
@ 2011-06-20 17:46         ` Gilles Chanteperdrix
  2011-06-20 20:52           ` Jan Kiszka
  2011-06-23  9:37         ` Jan Kiszka
  1 sibling, 1 reply; 37+ messages in thread
From: Gilles Chanteperdrix @ 2011-06-20 17:46 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai core

On 06/20/2011 07:07 PM, Jan Kiszka wrote:
>>  static inline void do_cleanup_event(struct mm_struct *mm)
>>  {
>> +	struct task_struct *p = current;
>> +	struct mm_struct *old;
>> +
>> +	old = xnshadow_mm(p);
>> +	xnshadow_mmptd(p) = mm;
>> +
>>  	ppd_remove_mm(mm, &detach_ppd);
>> +
>> +	xnshadow_mmptd(p) = old;
> 
> I don't have the full picture yet, but that feels racy: If the context
> over which we clean up that foreign mm is also using xnshadow_mmptd,
> other threads in that process may dislike this temporary change.

This mmptd is only used by xnshadow_ppd_get (which never dereferences it
by the way). And if the current thread is running the cleanup event, it
is not running any other service at the same time. So, this is safe.

An alternative implementation would be to use some global per-cpu
variable and disable the preemption during cleanup. But that would not
fix the case of the shadow cleanups where current->mm is already null.
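
I.e. something like this, just to spell out that alternative (sketch
only):

  static DEFINE_PER_CPU(struct mm_struct *, cleanup_mm);

  static void do_cleanup_event(struct mm_struct *mm)
  {
          preempt_disable();
          __get_cpu_var(cleanup_mm) = mm;   /* xnshadow_ppd_get() would
                                               look here first */
          ppd_remove_mm(mm, &detach_ppd);
          __get_cpu_var(cleanup_mm) = NULL;
          preempt_enable();
  }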

The remaining problem is an irq interrupting the cleanup, and using
xnshadow_ppd_get. But I would not expect that to happen. I mean,
xnshadow_ppd_get is more a service to be used for implementation of
system calls.

>> diff --git a/ksrc/skins/posix/mutex.c b/ksrc/skins/posix/mutex.c
>> index 6ce75e5..cc86852 100644
>> --- a/ksrc/skins/posix/mutex.c
>> +++ b/ksrc/skins/posix/mutex.c
>> @@ -219,10 +219,6 @@ void pse51_mutex_destroy_internal(pse51_mutex_t *mutex,
>>  	xnlock_put_irqrestore(&nklock, s);
>>  
>>  #ifdef CONFIG_XENO_FASTSYNCH
>> -	/* We call xnheap_free even if the mutex is not pshared; when
>> -	   this function is called from pse51_mutexq_cleanup, the
>> -	   sem_heap is destroyed, or not the one to which the fastlock
>> -	   belongs, xnheap will simply return an error. */
> 
> I think this comment is not completely obsolete. It still applies /wrt
> shared/non-shared.

If we apply this patch after the one changing the ppd cleanup order,
everything falls in place (I can even put a warning in xnheap_destroy if
the heap has some bytes still in use when destroyed).
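
E.g. (sketch, assuming the usual used-bytes accessor):

  /* in xnheap_destroy(), before actually tearing the heap down: */
  if (xnheap_used_mem(heap) != 0)
          printk(KERN_WARNING
                 "Xenomai: destroying heap %p with %lu bytes still in use\n",
                 heap, (unsigned long)xnheap_used_mem(heap));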

> If we can resolve that potential race, this looks like a nice solution.

It may look a bit overkill, since it uses an adeos ptd. On the other
hand, who else uses them anyway ;-)

-- 
                                                                Gilles.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Xenomai-core] [PULL] native: Fix msendq fastlock leakage
  2011-06-20 17:46         ` Gilles Chanteperdrix
@ 2011-06-20 20:52           ` Jan Kiszka
  0 siblings, 0 replies; 37+ messages in thread
From: Jan Kiszka @ 2011-06-20 20:52 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai core

On 2011-06-20 19:46, Gilles Chanteperdrix wrote:
> On 06/20/2011 07:07 PM, Jan Kiszka wrote:
>>>  static inline void do_cleanup_event(struct mm_struct *mm)
>>>  {
>>> +	struct task_struct *p = current;
>>> +	struct mm_struct *old;
>>> +
>>> +	old = xnshadow_mm(p);
>>> +	xnshadow_mmptd(p) = mm;
>>> +
>>>  	ppd_remove_mm(mm, &detach_ppd);
>>> +
>>> +	xnshadow_mmptd(p) = old;
>>
>> I don't have the full picture yet, but that feels racy: If the context
>> over which we clean up that foreign mm is also using xnshadow_mmptd,
>> other threads in that process may dislike this temporary change.
> 
> This mmptd is only used by xnshadow_ppd_get (which never dereferences it
> by the way

I know that the current code is safe. Avoiding mm_struct types is about
keeping it safe in the future. Why track the type if its just an opaque
key and should be handled as such?

> ). And if the current thread is running the cleanup event, it
> is not running any other service at the same time. So, this is safe.

Ah, xnshadow_mmptd is per-thread, not per-process as I incorrectly
assumed. Then it's safe.

> 
> An alternative implementation would be to use some global per-cpu
> variable and disable the preemption during cleanup. But that would not
> fix the case of the shadow cleanups where current->mm is already null.
> 
> The remaining problem is an irq interrupting the cleanup, and using
> xnshadow_ppd_get. But I would not expect that to happen. I mean,
> xnshadow_ppd_get is more a service to be used for implementation of
> system calls.

Yes, that kind of usage would be a bug in the first place.

> 
>>> diff --git a/ksrc/skins/posix/mutex.c b/ksrc/skins/posix/mutex.c
>>> index 6ce75e5..cc86852 100644
>>> --- a/ksrc/skins/posix/mutex.c
>>> +++ b/ksrc/skins/posix/mutex.c
>>> @@ -219,10 +219,6 @@ void pse51_mutex_destroy_internal(pse51_mutex_t *mutex,
>>>  	xnlock_put_irqrestore(&nklock, s);
>>>  
>>>  #ifdef CONFIG_XENO_FASTSYNCH
>>> -	/* We call xnheap_free even if the mutex is not pshared; when
>>> -	   this function is called from pse51_mutexq_cleanup, the
>>> -	   sem_heap is destroyed, or not the one to which the fastlock
>>> -	   belongs, xnheap will simply return an error. */
>>
>> I think this comment is not completely obsolete. It still applies /wrt
>> shared/non-shared.
> 
> If we apply this patch after the one changing the ppd cleanup order,
> everything falls in place (I can even put a warning in xnheap_destroy if
> the heap has some bytes still in use when destroyed).

"We call xnheap_free even if the mutex is not pshared." This still
applies and is unrelated to the ppd changes.

Jan



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Xenomai-core] [PULL] native: Fix msendq fastlock leakage
  2011-06-20 17:07       ` Jan Kiszka
  2011-06-20 17:46         ` Gilles Chanteperdrix
@ 2011-06-23  9:37         ` Jan Kiszka
  2011-06-23 11:11           ` Gilles Chanteperdrix
                             ` (2 more replies)
  1 sibling, 3 replies; 37+ messages in thread
From: Jan Kiszka @ 2011-06-23  9:37 UTC (permalink / raw)
  Cc: Xenomai core

On 2011-06-20 19:07, Jan Kiszka wrote:
> On 2011-06-19 15:00, Gilles Chanteperdrix wrote:
>> On 06/19/2011 01:17 PM, Gilles Chanteperdrix wrote:
>>> On 06/19/2011 12:14 PM, Gilles Chanteperdrix wrote:
>>>> I am working on this ppd cleanup issue again, I am asking for help to
>>>> find a fix in -head for all cases where the sys_ppd is needed during
>>>> some cleanup.
>>>>
>>>> The problem is that when the ppd cleanup is invoked:
>>>> - we have no guarantee that current is a thread from the Xenomai
>>>> application;
>>>> - if it is, current->mm is NULL.
>>>>
>>>> So, associating the sys_ppd to either current or current->mm does not
>>>> work. What we could do is pass the sys_ppd to all the other ppds cleanup
>>>> handlers, this would fix cases such as freeing mutexes fastlock, but
>>>> that does not help when the sys_ppd is needed during a thread deletion hook.
>>>>
>>>> I would like to find a solution where simply calling xnsys_ppd_get()
>>>> will work, where we do not have an xnsys_ppd_get for each context, such
>>>> as for instance xnsys_ppd_get_by_mm/xnsys_ppd_get_by_task_struct,
>>>> because it would be too error-prone.
>>>>
>>>> Any idea anyone?
>>>
>>> The best I could come up with: use a ptd to store the mm currently 
>>> being cleaned up, so that xnshadow_ppd_get continues to work, even
>>> in the middle of a cleanup.
>>
>> In order to also get xnshadow_ppd_get to work in task deletion hooks 
>> (which is needed to avoid the issue at the origin of this thread), we 
>> also need to set this ptd upon shadow mapping, so it is still there 
>> when reaching the task deletion hook (where current->mm may be NULL). 
>> Hence the patch:
>>
>> diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
>> index b243600..6bc4210 100644
>> --- a/ksrc/nucleus/shadow.c
>> +++ b/ksrc/nucleus/shadow.c
>> @@ -65,6 +65,11 @@ int nkthrptd;
>>  EXPORT_SYMBOL_GPL(nkthrptd);
>>  int nkerrptd;
>>  EXPORT_SYMBOL_GPL(nkerrptd);
>> +int nkmmptd;
>> +EXPORT_SYMBOL_GPL(nkmmptd);
>> +
>> +#define xnshadow_mmptd(t) ((t)->ptd[nkmmptd])
>> +#define xnshadow_mm(t) ((struct mm_struct *)xnshadow_mmptd(t))
> 
> xnshadow_mm() can now return a no longer existing mm. So no user of
> xnshadow_mm should ever dereference that pointer. Thus we better change
> all that user to treat the return value as a void pointer e.g.
> 
>>  
>>  struct xnskin_slot {
>>  	struct xnskin_props *props;
>> @@ -1304,6 +1309,8 @@ int xnshadow_map(xnthread_t *thread, xncompletion_t __user *u_completion,
>>  	 * friends.
>>  	 */
>>  	xnshadow_thrptd(current) = thread;
>> +	xnshadow_mmptd(current) = current->mm;
>> +
>>  	rthal_enable_notifier(current);
>>  
>>  	if (xnthread_base_priority(thread) == 0 &&
>> @@ -2759,7 +2766,15 @@ static void detach_ppd(xnshadow_ppd_t * ppd)
>>  
>>  static inline void do_cleanup_event(struct mm_struct *mm)
>>  {
>> +	struct task_struct *p = current;
>> +	struct mm_struct *old;
>> +
>> +	old = xnshadow_mm(p);
>> +	xnshadow_mmptd(p) = mm;
>> +
>>  	ppd_remove_mm(mm, &detach_ppd);
>> +
>> +	xnshadow_mmptd(p) = old;
> 
> I don't have the full picture yet, but that feels racy: If the context
> over which we clean up that foreign mm is also using xnshadow_mmptd,
> other threads in that process may dislike this temporary change.
> 
>>  }
>>  
>>  RTHAL_DECLARE_CLEANUP_EVENT(cleanup_event);
>> @@ -2925,7 +2940,7 @@ EXPORT_SYMBOL_GPL(xnshadow_unregister_interface);
>>  xnshadow_ppd_t *xnshadow_ppd_get(unsigned muxid)
>>  {
>>  	if (xnpod_userspace_p())
>> -		return ppd_lookup(muxid, current->mm);
>> +		return ppd_lookup(muxid, xnshadow_mm(current) ?: current->mm);
>>  
>>  	return NULL;
>>  }
>> @@ -2960,8 +2975,9 @@ int xnshadow_mount(void)
>>  	sema_init(&completion_mutex, 1);
>>  	nkthrptd = rthal_alloc_ptdkey();
>>  	nkerrptd = rthal_alloc_ptdkey();
>> +	nkmmptd = rthal_alloc_ptdkey();
>>  
>> -	if (nkthrptd < 0 || nkerrptd < 0) {
>> +	if (nkthrptd < 0 || nkerrptd < 0 || nkmmptd < 0) {
>>  		printk(KERN_ERR "Xenomai: cannot allocate PTD slots\n");
>>  		return -ENOMEM;
>>  	}
>> diff --git a/ksrc/skins/posix/mutex.c b/ksrc/skins/posix/mutex.c
>> index 6ce75e5..cc86852 100644
>> --- a/ksrc/skins/posix/mutex.c
>> +++ b/ksrc/skins/posix/mutex.c
>> @@ -219,10 +219,6 @@ void pse51_mutex_destroy_internal(pse51_mutex_t *mutex,
>>  	xnlock_put_irqrestore(&nklock, s);
>>  
>>  #ifdef CONFIG_XENO_FASTSYNCH
>> -	/* We call xnheap_free even if the mutex is not pshared; when
>> -	   this function is called from pse51_mutexq_cleanup, the
>> -	   sem_heap is destroyed, or not the one to which the fastlock
>> -	   belongs, xnheap will simply return an error. */
> 
> I think this comment is not completely obsolete. It still applies /wrt
> shared/non-shared.
> 
>>  	xnheap_free(&xnsys_ppd_get(mutex->attr.pshared)->sem_heap,
>>  		    mutex->synchbase.fastlock);
>>  #endif /* CONFIG_XENO_FASTSYNCH */
>>
>>
> 
> If we can resolve that potential race, this looks like a nice solution.

We still have to address that ordering issue I almost forgot:
do_cleanup_event runs before do_task_exit_event when terminating the
last task. The former destroys the sem heap, the latter fires the delete
hook which then tries to free msendq.fastlock to an invalid heap.

Should be fixable by setting sem_heap NULL in the ppd on destroy and
skipping the fastlock release in __task_delete_hook if the heap pointer
is found like that.
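
Roughly like this in the delete hook -- just a sketch, not actual
task.c code; how the heap pointer reaches the hook (from the ppd or
from a pointer saved in the task structure) is left open here, and
whether clearing it from the cleanup path is race-free is exactly the
point to settle:

/* Assumes the per-process sem heap pointer is cleared by the ppd
 * destruction path before the heap itself is destroyed. */
static void release_msendq_fastlock(RT_TASK *task, xnheap_t *fastlock_heap)
{
#ifdef CONFIG_XENO_FASTSYNCH
	if (fastlock_heap == NULL)
		return;	/* sem heap already gone, nothing to free */

	xnheap_free(fastlock_heap, task->msendq.fastlock);
#endif /* CONFIG_XENO_FASTSYNCH */
}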

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Xenomai-core] [PULL] native: Fix msendq fastlock leakage
  2011-06-23  9:37         ` Jan Kiszka
@ 2011-06-23 11:11           ` Gilles Chanteperdrix
  2011-06-23 11:15             ` Jan Kiszka
  2011-06-23 19:08           ` Gilles Chanteperdrix
  2011-06-24  7:01           ` Gilles Chanteperdrix
  2 siblings, 1 reply; 37+ messages in thread
From: Gilles Chanteperdrix @ 2011-06-23 11:11 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai core

On 06/23/2011 11:37 AM, Jan Kiszka wrote:
> On 2011-06-20 19:07, Jan Kiszka wrote:
>> On 2011-06-19 15:00, Gilles Chanteperdrix wrote:
>>> On 06/19/2011 01:17 PM, Gilles Chanteperdrix wrote:
>>>> On 06/19/2011 12:14 PM, Gilles Chanteperdrix wrote:
>>>>> I am working on this ppd cleanup issue again, I am asking for help to
>>>>> find a fix in -head for all cases where the sys_ppd is needed during
>>>>> some cleanup.
>>>>>
>>>>> The problem is that when the ppd cleanup is invoked:
>>>>> - we have no guarantee that current is a thread from the Xenomai
>>>>> application;
>>>>> - if it is, current->mm is NULL.
>>>>>
>>>>> So, associating the sys_ppd to either current or current->mm does not
>>>>> work. What we could do is pass the sys_ppd to all the other ppds cleanup
>>>>> handlers, this would fix cases such as freeing mutexes fastlock, but
>>>>> that does not help when the sys_ppd is needed during a thread deletion hook.
>>>>>
>>>>> I would like to find a solution where simply calling xnsys_ppd_get()
>>>>> will work, where we do not have an xnsys_ppd_get for each context, such
>>>>> as for instance xnsys_ppd_get_by_mm/xnsys_ppd_get_by_task_struct,
>>>>> because it would be too error-prone.
>>>>>
>>>>> Any idea anyone?
>>>>
>>>> The best I could come up with: use a ptd to store the mm currently 
>>>> being cleaned up, so that xnshadow_ppd_get continues to work, even
>>>> in the middle of a cleanup.
>>>
>>> In order to also get xnshadow_ppd_get to work in task deletion hooks 
>>> (which is needed to avoid the issue at the origin of this thread), we 
>>> also need to set this ptd upon shadow mapping, so it is still there 
>>> when reaching the task deletion hook (where current->mm may be NULL). 
>>> Hence the patch:
>>>
>>> diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
>>> index b243600..6bc4210 100644
>>> --- a/ksrc/nucleus/shadow.c
>>> +++ b/ksrc/nucleus/shadow.c
>>> @@ -65,6 +65,11 @@ int nkthrptd;
>>>  EXPORT_SYMBOL_GPL(nkthrptd);
>>>  int nkerrptd;
>>>  EXPORT_SYMBOL_GPL(nkerrptd);
>>> +int nkmmptd;
>>> +EXPORT_SYMBOL_GPL(nkmmptd);
>>> +
>>> +#define xnshadow_mmptd(t) ((t)->ptd[nkmmptd])
>>> +#define xnshadow_mm(t) ((struct mm_struct *)xnshadow_mmptd(t))
>>
>> xnshadow_mm() can now return a no longer existing mm. So no user of
>> xnshadow_mm should ever dereference that pointer. Thus we better change
>> all that user to treat the return value as a void pointer e.g.
>>
>>>  
>>>  struct xnskin_slot {
>>>  	struct xnskin_props *props;
>>> @@ -1304,6 +1309,8 @@ int xnshadow_map(xnthread_t *thread, xncompletion_t __user *u_completion,
>>>  	 * friends.
>>>  	 */
>>>  	xnshadow_thrptd(current) = thread;
>>> +	xnshadow_mmptd(current) = current->mm;
>>> +
>>>  	rthal_enable_notifier(current);
>>>  
>>>  	if (xnthread_base_priority(thread) == 0 &&
>>> @@ -2759,7 +2766,15 @@ static void detach_ppd(xnshadow_ppd_t * ppd)
>>>  
>>>  static inline void do_cleanup_event(struct mm_struct *mm)
>>>  {
>>> +	struct task_struct *p = current;
>>> +	struct mm_struct *old;
>>> +
>>> +	old = xnshadow_mm(p);
>>> +	xnshadow_mmptd(p) = mm;
>>> +
>>>  	ppd_remove_mm(mm, &detach_ppd);
>>> +
>>> +	xnshadow_mmptd(p) = old;
>>
>> I don't have the full picture yet, but that feels racy: If the context
>> over which we clean up that foreign mm is also using xnshadow_mmptd,
>> other threads in that process may dislike this temporary change.
>>
>>>  }
>>>  
>>>  RTHAL_DECLARE_CLEANUP_EVENT(cleanup_event);
>>> @@ -2925,7 +2940,7 @@ EXPORT_SYMBOL_GPL(xnshadow_unregister_interface);
>>>  xnshadow_ppd_t *xnshadow_ppd_get(unsigned muxid)
>>>  {
>>>  	if (xnpod_userspace_p())
>>> -		return ppd_lookup(muxid, current->mm);
>>> +		return ppd_lookup(muxid, xnshadow_mm(current) ?: current->mm);
>>>  
>>>  	return NULL;
>>>  }
>>> @@ -2960,8 +2975,9 @@ int xnshadow_mount(void)
>>>  	sema_init(&completion_mutex, 1);
>>>  	nkthrptd = rthal_alloc_ptdkey();
>>>  	nkerrptd = rthal_alloc_ptdkey();
>>> +	nkmmptd = rthal_alloc_ptdkey();
>>>  
>>> -	if (nkthrptd < 0 || nkerrptd < 0) {
>>> +	if (nkthrptd < 0 || nkerrptd < 0 || nkmmptd < 0) {
>>>  		printk(KERN_ERR "Xenomai: cannot allocate PTD slots\n");
>>>  		return -ENOMEM;
>>>  	}
>>> diff --git a/ksrc/skins/posix/mutex.c b/ksrc/skins/posix/mutex.c
>>> index 6ce75e5..cc86852 100644
>>> --- a/ksrc/skins/posix/mutex.c
>>> +++ b/ksrc/skins/posix/mutex.c
>>> @@ -219,10 +219,6 @@ void pse51_mutex_destroy_internal(pse51_mutex_t *mutex,
>>>  	xnlock_put_irqrestore(&nklock, s);
>>>  
>>>  #ifdef CONFIG_XENO_FASTSYNCH
>>> -	/* We call xnheap_free even if the mutex is not pshared; when
>>> -	   this function is called from pse51_mutexq_cleanup, the
>>> -	   sem_heap is destroyed, or not the one to which the fastlock
>>> -	   belongs, xnheap will simply return an error. */
>>
>> I think this comment is not completely obsolete. It still applies /wrt
>> shared/non-shared.
>>
>>>  	xnheap_free(&xnsys_ppd_get(mutex->attr.pshared)->sem_heap,
>>>  		    mutex->synchbase.fastlock);
>>>  #endif /* CONFIG_XENO_FASTSYNCH */
>>>
>>>
>>
>> If we can resolve that potential race, this looks like a nice solution.
> 
> We still have to address that ordering issue I almost forgot:
> do_cleanup_event runs before do_task_exit_event when terminating the
> last task. The former destroys the sem heap, the latter fires the delete
> hook which then tries to free msendq.fastlock to an invalid heap.
> 
> Should be fixable by setting sem_heap NULL in the ppd on destroy and
> skipping the fastlock release in __task_delete_hook if the heap pointer
> is found like that.

I do not think this can be a problem, as the do_cleanup_event will also
destroy the threads. Anyway, I just pushed a branch "u_mode" on the
xenomai-gch git with all the work based on this mmptd, could you try and
pull it to see if you still have this cleanup/task_exit issue?

> 
> Jan
> 


-- 
                                                                Gilles.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Xenomai-core] [PULL] native: Fix msendq fastlock leakage
  2011-06-23 11:11           ` Gilles Chanteperdrix
@ 2011-06-23 11:15             ` Jan Kiszka
  2011-06-23 17:32               ` Gilles Chanteperdrix
  0 siblings, 1 reply; 37+ messages in thread
From: Jan Kiszka @ 2011-06-23 11:15 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai core

[-- Attachment #1: Type: text/plain, Size: 6329 bytes --]

On 2011-06-23 13:11, Gilles Chanteperdrix wrote:
> On 06/23/2011 11:37 AM, Jan Kiszka wrote:
>> On 2011-06-20 19:07, Jan Kiszka wrote:
>>> On 2011-06-19 15:00, Gilles Chanteperdrix wrote:
>>>> On 06/19/2011 01:17 PM, Gilles Chanteperdrix wrote:
>>>>> On 06/19/2011 12:14 PM, Gilles Chanteperdrix wrote:
>>>>>> I am working on this ppd cleanup issue again, I am asking for help to
>>>>>> find a fix in -head for all cases where the sys_ppd is needed during
>>>>>> some cleanup.
>>>>>>
>>>>>> The problem is that when the ppd cleanup is invoked:
>>>>>> - we have no guarantee that current is a thread from the Xenomai
>>>>>> application;
>>>>>> - if it is, current->mm is NULL.
>>>>>>
>>>>>> So, associating the sys_ppd to either current or current->mm does not
>>>>>> work. What we could do is pass the sys_ppd to all the other ppds cleanup
>>>>>> handlers, this would fix cases such as freeing mutexes fastlock, but
>>>>>> that does not help when the sys_ppd is needed during a thread deletion hook.
>>>>>>
>>>>>> I would like to find a solution where simply calling xnsys_ppd_get()
>>>>>> will work, where we do not have an xnsys_ppd_get for each context, such
>>>>>> as for instance xnsys_ppd_get_by_mm/xnsys_ppd_get_by_task_struct,
>>>>>> because it would be too error-prone.
>>>>>>
>>>>>> Any idea anyone?
>>>>>
>>>>> The best I could come up with: use a ptd to store the mm currently 
>>>>> being cleaned up, so that xnshadow_ppd_get continues to work, even
>>>>> in the middle of a cleanup.
>>>>
>>>> In order to also get xnshadow_ppd_get to work in task deletion hooks 
>>>> (which is needed to avoid the issue at the origin of this thread), we 
>>>> also need to set this ptd upon shadow mapping, so it is still there 
>>>> when reaching the task deletion hook (where current->mm may be NULL). 
>>>> Hence the patch:
>>>>
>>>> diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
>>>> index b243600..6bc4210 100644
>>>> --- a/ksrc/nucleus/shadow.c
>>>> +++ b/ksrc/nucleus/shadow.c
>>>> @@ -65,6 +65,11 @@ int nkthrptd;
>>>>  EXPORT_SYMBOL_GPL(nkthrptd);
>>>>  int nkerrptd;
>>>>  EXPORT_SYMBOL_GPL(nkerrptd);
>>>> +int nkmmptd;
>>>> +EXPORT_SYMBOL_GPL(nkmmptd);
>>>> +
>>>> +#define xnshadow_mmptd(t) ((t)->ptd[nkmmptd])
>>>> +#define xnshadow_mm(t) ((struct mm_struct *)xnshadow_mmptd(t))
>>>
>>> xnshadow_mm() can now return a no longer existing mm. So no user of
>>> xnshadow_mm should ever dereference that pointer. Thus we better change
>>> all that user to treat the return value as a void pointer e.g.
>>>
>>>>  
>>>>  struct xnskin_slot {
>>>>  	struct xnskin_props *props;
>>>> @@ -1304,6 +1309,8 @@ int xnshadow_map(xnthread_t *thread, xncompletion_t __user *u_completion,
>>>>  	 * friends.
>>>>  	 */
>>>>  	xnshadow_thrptd(current) = thread;
>>>> +	xnshadow_mmptd(current) = current->mm;
>>>> +
>>>>  	rthal_enable_notifier(current);
>>>>  
>>>>  	if (xnthread_base_priority(thread) == 0 &&
>>>> @@ -2759,7 +2766,15 @@ static void detach_ppd(xnshadow_ppd_t * ppd)
>>>>  
>>>>  static inline void do_cleanup_event(struct mm_struct *mm)
>>>>  {
>>>> +	struct task_struct *p = current;
>>>> +	struct mm_struct *old;
>>>> +
>>>> +	old = xnshadow_mm(p);
>>>> +	xnshadow_mmptd(p) = mm;
>>>> +
>>>>  	ppd_remove_mm(mm, &detach_ppd);
>>>> +
>>>> +	xnshadow_mmptd(p) = old;
>>>
>>> I don't have the full picture yet, but that feels racy: If the context
>>> over which we clean up that foreign mm is also using xnshadow_mmptd,
>>> other threads in that process may dislike this temporary change.
>>>
>>>>  }
>>>>  
>>>>  RTHAL_DECLARE_CLEANUP_EVENT(cleanup_event);
>>>> @@ -2925,7 +2940,7 @@ EXPORT_SYMBOL_GPL(xnshadow_unregister_interface);
>>>>  xnshadow_ppd_t *xnshadow_ppd_get(unsigned muxid)
>>>>  {
>>>>  	if (xnpod_userspace_p())
>>>> -		return ppd_lookup(muxid, current->mm);
>>>> +		return ppd_lookup(muxid, xnshadow_mm(current) ?: current->mm);
>>>>  
>>>>  	return NULL;
>>>>  }
>>>> @@ -2960,8 +2975,9 @@ int xnshadow_mount(void)
>>>>  	sema_init(&completion_mutex, 1);
>>>>  	nkthrptd = rthal_alloc_ptdkey();
>>>>  	nkerrptd = rthal_alloc_ptdkey();
>>>> +	nkmmptd = rthal_alloc_ptdkey();
>>>>  
>>>> -	if (nkthrptd < 0 || nkerrptd < 0) {
>>>> +	if (nkthrptd < 0 || nkerrptd < 0 || nkmmptd < 0) {
>>>>  		printk(KERN_ERR "Xenomai: cannot allocate PTD slots\n");
>>>>  		return -ENOMEM;
>>>>  	}
>>>> diff --git a/ksrc/skins/posix/mutex.c b/ksrc/skins/posix/mutex.c
>>>> index 6ce75e5..cc86852 100644
>>>> --- a/ksrc/skins/posix/mutex.c
>>>> +++ b/ksrc/skins/posix/mutex.c
>>>> @@ -219,10 +219,6 @@ void pse51_mutex_destroy_internal(pse51_mutex_t *mutex,
>>>>  	xnlock_put_irqrestore(&nklock, s);
>>>>  
>>>>  #ifdef CONFIG_XENO_FASTSYNCH
>>>> -	/* We call xnheap_free even if the mutex is not pshared; when
>>>> -	   this function is called from pse51_mutexq_cleanup, the
>>>> -	   sem_heap is destroyed, or not the one to which the fastlock
>>>> -	   belongs, xnheap will simply return an error. */
>>>
>>> I think this comment is not completely obsolete. It still applies /wrt
>>> shared/non-shared.
>>>
>>>>  	xnheap_free(&xnsys_ppd_get(mutex->attr.pshared)->sem_heap,
>>>>  		    mutex->synchbase.fastlock);
>>>>  #endif /* CONFIG_XENO_FASTSYNCH */
>>>>
>>>>
>>>
>>> If we can resolve that potential race, this looks like a nice solution.
>>
>> We still have to address that ordering issue I almost forgot:
>> do_cleanup_event runs before do_task_exit_event when terminating the
>> last task. The former destroys the sem heap, the latter fires the delete
>> hook which then tries to free msendq.fastlock to an invalid heap.
>>
>> Should be fixable by setting sem_heap NULL in the ppd on destroy and
>> skipping the fastlock release in __task_delete_hook if the heap pointer
>> is found like that.
> 
> I do not think this can be a problem, as the do_cleanup_event will also
> destroy the threads.

At least native tasks (but I bet that's true for all skins) aren't part
of the XNSHADOW_CLIENT_DETACH cleanup procedure.

> Anyway, I just pushed a branch "u_mode" on the
> xenomai-gch git with all the work based on this mmptd, could you try and
> pull it to see if you still have this cleanup/task_exit issue?

Will have a look.

Thanks,
Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Xenomai-core] [PULL] native: Fix msendq fastlock leakage
  2011-06-23 11:15             ` Jan Kiszka
@ 2011-06-23 17:32               ` Gilles Chanteperdrix
  2011-06-23 18:13                 ` Philippe Gerum
  0 siblings, 1 reply; 37+ messages in thread
From: Gilles Chanteperdrix @ 2011-06-23 17:32 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai core

On 06/23/2011 01:15 PM, Jan Kiszka wrote:
> On 2011-06-23 13:11, Gilles Chanteperdrix wrote:
>> On 06/23/2011 11:37 AM, Jan Kiszka wrote:
>>> On 2011-06-20 19:07, Jan Kiszka wrote:
>>>> On 2011-06-19 15:00, Gilles Chanteperdrix wrote:
>>>>> On 06/19/2011 01:17 PM, Gilles Chanteperdrix wrote:
>>>>>> On 06/19/2011 12:14 PM, Gilles Chanteperdrix wrote:
>>>>>>> I am working on this ppd cleanup issue again, I am asking for help to
>>>>>>> find a fix in -head for all cases where the sys_ppd is needed during
>>>>>>> some cleanup.
>>>>>>>
>>>>>>> The problem is that when the ppd cleanup is invoked:
>>>>>>> - we have no guarantee that current is a thread from the Xenomai
>>>>>>> application;
>>>>>>> - if it is, current->mm is NULL.
>>>>>>>
>>>>>>> So, associating the sys_ppd to either current or current->mm does not
>>>>>>> work. What we could do is pass the sys_ppd to all the other ppds cleanup
>>>>>>> handlers, this would fix cases such as freeing mutexes fastlock, but
>>>>>>> that does not help when the sys_ppd is needed during a thread deletion hook.
>>>>>>>
>>>>>>> I would like to find a solution where simply calling xnsys_ppd_get()
>>>>>>> will work, where we do not have an xnsys_ppd_get for each context, such
>>>>>>> as for instance xnsys_ppd_get_by_mm/xnsys_ppd_get_by_task_struct,
>>>>>>> because it would be too error-prone.
>>>>>>>
>>>>>>> Any idea anyone?
>>>>>>
>>>>>> The best I could come up with: use a ptd to store the mm currently 
>>>>>> being cleaned up, so that xnshadow_ppd_get continues to work, even
>>>>>> in the middle of a cleanup.
>>>>>
>>>>> In order to also get xnshadow_ppd_get to work in task deletion hooks 
>>>>> (which is needed to avoid the issue at the origin of this thread), we 
>>>>> also need to set this ptd upon shadow mapping, so it is still there 
>>>>> when reaching the task deletion hook (where current->mm may be NULL). 
>>>>> Hence the patch:
>>>>>
>>>>> diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
>>>>> index b243600..6bc4210 100644
>>>>> --- a/ksrc/nucleus/shadow.c
>>>>> +++ b/ksrc/nucleus/shadow.c
>>>>> @@ -65,6 +65,11 @@ int nkthrptd;
>>>>>  EXPORT_SYMBOL_GPL(nkthrptd);
>>>>>  int nkerrptd;
>>>>>  EXPORT_SYMBOL_GPL(nkerrptd);
>>>>> +int nkmmptd;
>>>>> +EXPORT_SYMBOL_GPL(nkmmptd);
>>>>> +
>>>>> +#define xnshadow_mmptd(t) ((t)->ptd[nkmmptd])
>>>>> +#define xnshadow_mm(t) ((struct mm_struct *)xnshadow_mmptd(t))
>>>>
>>>> xnshadow_mm() can now return a no longer existing mm. So no user of
>>>> xnshadow_mm should ever dereference that pointer. Thus we better change
>>>> all that user to treat the return value as a void pointer e.g.
>>>>
>>>>>  
>>>>>  struct xnskin_slot {
>>>>>  	struct xnskin_props *props;
>>>>> @@ -1304,6 +1309,8 @@ int xnshadow_map(xnthread_t *thread, xncompletion_t __user *u_completion,
>>>>>  	 * friends.
>>>>>  	 */
>>>>>  	xnshadow_thrptd(current) = thread;
>>>>> +	xnshadow_mmptd(current) = current->mm;
>>>>> +
>>>>>  	rthal_enable_notifier(current);
>>>>>  
>>>>>  	if (xnthread_base_priority(thread) == 0 &&
>>>>> @@ -2759,7 +2766,15 @@ static void detach_ppd(xnshadow_ppd_t * ppd)
>>>>>  
>>>>>  static inline void do_cleanup_event(struct mm_struct *mm)
>>>>>  {
>>>>> +	struct task_struct *p = current;
>>>>> +	struct mm_struct *old;
>>>>> +
>>>>> +	old = xnshadow_mm(p);
>>>>> +	xnshadow_mmptd(p) = mm;
>>>>> +
>>>>>  	ppd_remove_mm(mm, &detach_ppd);
>>>>> +
>>>>> +	xnshadow_mmptd(p) = old;
>>>>
>>>> I don't have the full picture yet, but that feels racy: If the context
>>>> over which we clean up that foreign mm is also using xnshadow_mmptd,
>>>> other threads in that process may dislike this temporary change.
>>>>
>>>>>  }
>>>>>  
>>>>>  RTHAL_DECLARE_CLEANUP_EVENT(cleanup_event);
>>>>> @@ -2925,7 +2940,7 @@ EXPORT_SYMBOL_GPL(xnshadow_unregister_interface);
>>>>>  xnshadow_ppd_t *xnshadow_ppd_get(unsigned muxid)
>>>>>  {
>>>>>  	if (xnpod_userspace_p())
>>>>> -		return ppd_lookup(muxid, current->mm);
>>>>> +		return ppd_lookup(muxid, xnshadow_mm(current) ?: current->mm);
>>>>>  
>>>>>  	return NULL;
>>>>>  }
>>>>> @@ -2960,8 +2975,9 @@ int xnshadow_mount(void)
>>>>>  	sema_init(&completion_mutex, 1);
>>>>>  	nkthrptd = rthal_alloc_ptdkey();
>>>>>  	nkerrptd = rthal_alloc_ptdkey();
>>>>> +	nkmmptd = rthal_alloc_ptdkey();
>>>>>  
>>>>> -	if (nkthrptd < 0 || nkerrptd < 0) {
>>>>> +	if (nkthrptd < 0 || nkerrptd < 0 || nkmmptd < 0) {
>>>>>  		printk(KERN_ERR "Xenomai: cannot allocate PTD slots\n");
>>>>>  		return -ENOMEM;
>>>>>  	}
>>>>> diff --git a/ksrc/skins/posix/mutex.c b/ksrc/skins/posix/mutex.c
>>>>> index 6ce75e5..cc86852 100644
>>>>> --- a/ksrc/skins/posix/mutex.c
>>>>> +++ b/ksrc/skins/posix/mutex.c
>>>>> @@ -219,10 +219,6 @@ void pse51_mutex_destroy_internal(pse51_mutex_t *mutex,
>>>>>  	xnlock_put_irqrestore(&nklock, s);
>>>>>  
>>>>>  #ifdef CONFIG_XENO_FASTSYNCH
>>>>> -	/* We call xnheap_free even if the mutex is not pshared; when
>>>>> -	   this function is called from pse51_mutexq_cleanup, the
>>>>> -	   sem_heap is destroyed, or not the one to which the fastlock
>>>>> -	   belongs, xnheap will simply return an error. */
>>>>
>>>> I think this comment is not completely obsolete. It still applies /wrt
>>>> shared/non-shared.
>>>>
>>>>>  	xnheap_free(&xnsys_ppd_get(mutex->attr.pshared)->sem_heap,
>>>>>  		    mutex->synchbase.fastlock);
>>>>>  #endif /* CONFIG_XENO_FASTSYNCH */
>>>>>
>>>>>
>>>>
>>>> If we can resolve that potential race, this looks like a nice solution.
>>>
>>> We still have to address that ordering issue I almost forgot:
>>> do_cleanup_event runs before do_task_exit_event when terminating the
>>> last task. The former destroys the sem heap, the latter fires the delete
>>> hook which then tries to free msendq.fastlock to an invalid heap.
>>>
>>> Should be fixable by setting sem_heap NULL in the ppd on destroy and
>>> skipping the fastlock release in __task_delete_hook if the heap pointer
>>> is found like that.
>>
>> I do not think this can be a problem, as the do_cleanup_event will also
>> destroy the threads.
> 
> At least native tasks (but I bet that's true for all skins) aren't part
> of the XNSHADOW_CLIENT_DETACH cleanup procedure.

Then it is indeed a problem, as we will have another ppd issue in the
task deletion callbacks. I did not notice, because the posix skin does
clean up the threads, and I simply assumed that every skin did it.

What is the reason for doing this? Are the thread deletion hooks
supposed to work only when called from the context of the dying thread?

-- 
                                                                Gilles.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Xenomai-core] [PULL] native: Fix msendq fastlock leakage
  2011-06-23 17:32               ` Gilles Chanteperdrix
@ 2011-06-23 18:13                 ` Philippe Gerum
  2011-06-23 18:24                   ` Philippe Gerum
  0 siblings, 1 reply; 37+ messages in thread
From: Philippe Gerum @ 2011-06-23 18:13 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Jan Kiszka, Xenomai core

On Thu, 2011-06-23 at 19:32 +0200, Gilles Chanteperdrix wrote:
> On 06/23/2011 01:15 PM, Jan Kiszka wrote:
> > On 2011-06-23 13:11, Gilles Chanteperdrix wrote:
> >> On 06/23/2011 11:37 AM, Jan Kiszka wrote:
> >>> On 2011-06-20 19:07, Jan Kiszka wrote:
> >>>> On 2011-06-19 15:00, Gilles Chanteperdrix wrote:
> >>>>> On 06/19/2011 01:17 PM, Gilles Chanteperdrix wrote:
> >>>>>> On 06/19/2011 12:14 PM, Gilles Chanteperdrix wrote:
> >>>>>>> I am working on this ppd cleanup issue again, I am asking for help to
> >>>>>>> find a fix in -head for all cases where the sys_ppd is needed during
> >>>>>>> some cleanup.
> >>>>>>>
> >>>>>>> The problem is that when the ppd cleanup is invoked:
> >>>>>>> - we have no guarantee that current is a thread from the Xenomai
> >>>>>>> application;
> >>>>>>> - if it is, current->mm is NULL.
> >>>>>>>
> >>>>>>> So, associating the sys_ppd to either current or current->mm does not
> >>>>>>> work. What we could do is pass the sys_ppd to all the other ppds cleanup
> >>>>>>> handlers, this would fix cases such as freeing mutexes fastlock, but
> >>>>>>> that does not help when the sys_ppd is needed during a thread deletion hook.
> >>>>>>>
> >>>>>>> I would like to find a solution where simply calling xnsys_ppd_get()
> >>>>>>> will work, where we do not have an xnsys_ppd_get for each context, such
> >>>>>>> as for instance xnsys_ppd_get_by_mm/xnsys_ppd_get_by_task_struct,
> >>>>>>> because it would be too error-prone.
> >>>>>>>
> >>>>>>> Any idea anyone?
> >>>>>>
> >>>>>> The best I could come up with: use a ptd to store the mm currently 
> >>>>>> being cleaned up, so that xnshadow_ppd_get continues to work, even
> >>>>>> in the middle of a cleanup.
> >>>>>
> >>>>> In order to also get xnshadow_ppd_get to work in task deletion hooks 
> >>>>> (which is needed to avoid the issue at the origin of this thread), we 
> >>>>> also need to set this ptd upon shadow mapping, so it is still there 
> >>>>> when reaching the task deletion hook (where current->mm may be NULL). 
> >>>>> Hence the patch:
> >>>>>
> >>>>> diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
> >>>>> index b243600..6bc4210 100644
> >>>>> --- a/ksrc/nucleus/shadow.c
> >>>>> +++ b/ksrc/nucleus/shadow.c
> >>>>> @@ -65,6 +65,11 @@ int nkthrptd;
> >>>>>  EXPORT_SYMBOL_GPL(nkthrptd);
> >>>>>  int nkerrptd;
> >>>>>  EXPORT_SYMBOL_GPL(nkerrptd);
> >>>>> +int nkmmptd;
> >>>>> +EXPORT_SYMBOL_GPL(nkmmptd);
> >>>>> +
> >>>>> +#define xnshadow_mmptd(t) ((t)->ptd[nkmmptd])
> >>>>> +#define xnshadow_mm(t) ((struct mm_struct *)xnshadow_mmptd(t))
> >>>>
> >>>> xnshadow_mm() can now return a no longer existing mm. So no user of
> >>>> xnshadow_mm should ever dereference that pointer. Thus we better change
> >>>> all that user to treat the return value as a void pointer e.g.
> >>>>
> >>>>>  
> >>>>>  struct xnskin_slot {
> >>>>>  	struct xnskin_props *props;
> >>>>> @@ -1304,6 +1309,8 @@ int xnshadow_map(xnthread_t *thread, xncompletion_t __user *u_completion,
> >>>>>  	 * friends.
> >>>>>  	 */
> >>>>>  	xnshadow_thrptd(current) = thread;
> >>>>> +	xnshadow_mmptd(current) = current->mm;
> >>>>> +
> >>>>>  	rthal_enable_notifier(current);
> >>>>>  
> >>>>>  	if (xnthread_base_priority(thread) == 0 &&
> >>>>> @@ -2759,7 +2766,15 @@ static void detach_ppd(xnshadow_ppd_t * ppd)
> >>>>>  
> >>>>>  static inline void do_cleanup_event(struct mm_struct *mm)
> >>>>>  {
> >>>>> +	struct task_struct *p = current;
> >>>>> +	struct mm_struct *old;
> >>>>> +
> >>>>> +	old = xnshadow_mm(p);
> >>>>> +	xnshadow_mmptd(p) = mm;
> >>>>> +
> >>>>>  	ppd_remove_mm(mm, &detach_ppd);
> >>>>> +
> >>>>> +	xnshadow_mmptd(p) = old;
> >>>>
> >>>> I don't have the full picture yet, but that feels racy: If the context
> >>>> over which we clean up that foreign mm is also using xnshadow_mmptd,
> >>>> other threads in that process may dislike this temporary change.
> >>>>
> >>>>>  }
> >>>>>  
> >>>>>  RTHAL_DECLARE_CLEANUP_EVENT(cleanup_event);
> >>>>> @@ -2925,7 +2940,7 @@ EXPORT_SYMBOL_GPL(xnshadow_unregister_interface);
> >>>>>  xnshadow_ppd_t *xnshadow_ppd_get(unsigned muxid)
> >>>>>  {
> >>>>>  	if (xnpod_userspace_p())
> >>>>> -		return ppd_lookup(muxid, current->mm);
> >>>>> +		return ppd_lookup(muxid, xnshadow_mm(current) ?: current->mm);
> >>>>>  
> >>>>>  	return NULL;
> >>>>>  }
> >>>>> @@ -2960,8 +2975,9 @@ int xnshadow_mount(void)
> >>>>>  	sema_init(&completion_mutex, 1);
> >>>>>  	nkthrptd = rthal_alloc_ptdkey();
> >>>>>  	nkerrptd = rthal_alloc_ptdkey();
> >>>>> +	nkmmptd = rthal_alloc_ptdkey();
> >>>>>  
> >>>>> -	if (nkthrptd < 0 || nkerrptd < 0) {
> >>>>> +	if (nkthrptd < 0 || nkerrptd < 0 || nkmmptd < 0) {
> >>>>>  		printk(KERN_ERR "Xenomai: cannot allocate PTD slots\n");
> >>>>>  		return -ENOMEM;
> >>>>>  	}
> >>>>> diff --git a/ksrc/skins/posix/mutex.c b/ksrc/skins/posix/mutex.c
> >>>>> index 6ce75e5..cc86852 100644
> >>>>> --- a/ksrc/skins/posix/mutex.c
> >>>>> +++ b/ksrc/skins/posix/mutex.c
> >>>>> @@ -219,10 +219,6 @@ void pse51_mutex_destroy_internal(pse51_mutex_t *mutex,
> >>>>>  	xnlock_put_irqrestore(&nklock, s);
> >>>>>  
> >>>>>  #ifdef CONFIG_XENO_FASTSYNCH
> >>>>> -	/* We call xnheap_free even if the mutex is not pshared; when
> >>>>> -	   this function is called from pse51_mutexq_cleanup, the
> >>>>> -	   sem_heap is destroyed, or not the one to which the fastlock
> >>>>> -	   belongs, xnheap will simply return an error. */
> >>>>
> >>>> I think this comment is not completely obsolete. It still applies /wrt
> >>>> shared/non-shared.
> >>>>
> >>>>>  	xnheap_free(&xnsys_ppd_get(mutex->attr.pshared)->sem_heap,
> >>>>>  		    mutex->synchbase.fastlock);
> >>>>>  #endif /* CONFIG_XENO_FASTSYNCH */
> >>>>>
> >>>>>
> >>>>
> >>>> If we can resolve that potential race, this looks like a nice solution.
> >>>
> >>> We still have to address that ordering issue I almost forgot:
> >>> do_cleanup_event runs before do_task_exit_event when terminating the
> >>> last task. The former destroys the sem heap, the latter fires the delete
> >>> hook which then tries to free msendq.fastlock to an invalid heap.
> >>>
> >>> Should be fixable by setting sem_heap NULL in the ppd on destroy and
> >>> skipping the fastlock release in __task_delete_hook if the heap pointer
> >>> is found like that.
> >>
> >> I do not think this can be a problem, as the do_cleanup_event will also
> >> destroy the threads.
> > 
> > At least native tasks (but I bet that's true for all skins) aren't part
> > of the XNSHADOW_CLIENT_DETACH cleanup procedure.
> 
> Then it is indeed a problem, as we will have another ppd issue in the
> task deletion callbacks. I did not notice, because the posix skin does
> > clean up the threads, and I simply assumed that every skin did it.
> 
> What is the reason for doing this? Are the thread deletion hooks
> supposed to work only when called from the context of the dying thread?
> 

When a shadow thread self-deletes, yes. This is a pre-requisite for
xnshadow_unmap to work on the proper context.

-- 
Philippe.




^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Xenomai-core] [PULL] native: Fix msendq fastlock leakage
  2011-06-23 18:13                 ` Philippe Gerum
@ 2011-06-23 18:24                   ` Philippe Gerum
  2011-06-23 18:56                     ` Gilles Chanteperdrix
  0 siblings, 1 reply; 37+ messages in thread
From: Philippe Gerum @ 2011-06-23 18:24 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Jan Kiszka, Xenomai core

On Thu, 2011-06-23 at 20:13 +0200, Philippe Gerum wrote:
> On Thu, 2011-06-23 at 19:32 +0200, Gilles Chanteperdrix wrote:
> > On 06/23/2011 01:15 PM, Jan Kiszka wrote:
> > > On 2011-06-23 13:11, Gilles Chanteperdrix wrote:
> > >> On 06/23/2011 11:37 AM, Jan Kiszka wrote:
> > >>> On 2011-06-20 19:07, Jan Kiszka wrote:
> > >>>> On 2011-06-19 15:00, Gilles Chanteperdrix wrote:
> > >>>>> On 06/19/2011 01:17 PM, Gilles Chanteperdrix wrote:
> > >>>>>> On 06/19/2011 12:14 PM, Gilles Chanteperdrix wrote:
> > >>>>>>> I am working on this ppd cleanup issue again, I am asking for help to
> > >>>>>>> find a fix in -head for all cases where the sys_ppd is needed during
> > >>>>>>> some cleanup.
> > >>>>>>>
> > >>>>>>> The problem is that when the ppd cleanup is invoked:
> > >>>>>>> - we have no guarantee that current is a thread from the Xenomai
> > >>>>>>> application;
> > >>>>>>> - if it is, current->mm is NULL.
> > >>>>>>>
> > >>>>>>> So, associating the sys_ppd to either current or current->mm does not
> > >>>>>>> work. What we could do is pass the sys_ppd to all the other ppds cleanup
> > >>>>>>> handlers, this would fix cases such as freeing mutexes fastlock, but
> > >>>>>>> that does not help when the sys_ppd is needed during a thread deletion hook.
> > >>>>>>>
> > >>>>>>> I would like to find a solution where simply calling xnsys_ppd_get()
> > >>>>>>> will work, where we do not have an xnsys_ppd_get for each context, such
> > >>>>>>> as for instance xnsys_ppd_get_by_mm/xnsys_ppd_get_by_task_struct,
> > >>>>>>> because it would be too error-prone.
> > >>>>>>>
> > >>>>>>> Any idea anyone?
> > >>>>>>
> > >>>>>> The best I could come up with: use a ptd to store the mm currently 
> > >>>>>> being cleaned up, so that xnshadow_ppd_get continues to work, even
> > >>>>>> in the middle of a cleanup.
> > >>>>>
> > >>>>> In order to also get xnshadow_ppd_get to work in task deletion hooks 
> > >>>>> (which is needed to avoid the issue at the origin of this thread), we 
> > >>>>> also need to set this ptd upon shadow mapping, so it is still there 
> > >>>>> when reaching the task deletion hook (where current->mm may be NULL). 
> > >>>>> Hence the patch:
> > >>>>>
> > >>>>> diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
> > >>>>> index b243600..6bc4210 100644
> > >>>>> --- a/ksrc/nucleus/shadow.c
> > >>>>> +++ b/ksrc/nucleus/shadow.c
> > >>>>> @@ -65,6 +65,11 @@ int nkthrptd;
> > >>>>>  EXPORT_SYMBOL_GPL(nkthrptd);
> > >>>>>  int nkerrptd;
> > >>>>>  EXPORT_SYMBOL_GPL(nkerrptd);
> > >>>>> +int nkmmptd;
> > >>>>> +EXPORT_SYMBOL_GPL(nkmmptd);
> > >>>>> +
> > >>>>> +#define xnshadow_mmptd(t) ((t)->ptd[nkmmptd])
> > >>>>> +#define xnshadow_mm(t) ((struct mm_struct *)xnshadow_mmptd(t))
> > >>>>
> > >>>> xnshadow_mm() can now return a no longer existing mm. So no user of
> > >>>> xnshadow_mm should ever dereference that pointer. Thus we better change
> > >>>> all that user to treat the return value as a void pointer e.g.
> > >>>>
> > >>>>>  
> > >>>>>  struct xnskin_slot {
> > >>>>>  	struct xnskin_props *props;
> > >>>>> @@ -1304,6 +1309,8 @@ int xnshadow_map(xnthread_t *thread, xncompletion_t __user *u_completion,
> > >>>>>  	 * friends.
> > >>>>>  	 */
> > >>>>>  	xnshadow_thrptd(current) = thread;
> > >>>>> +	xnshadow_mmptd(current) = current->mm;
> > >>>>> +
> > >>>>>  	rthal_enable_notifier(current);
> > >>>>>  
> > >>>>>  	if (xnthread_base_priority(thread) == 0 &&
> > >>>>> @@ -2759,7 +2766,15 @@ static void detach_ppd(xnshadow_ppd_t * ppd)
> > >>>>>  
> > >>>>>  static inline void do_cleanup_event(struct mm_struct *mm)
> > >>>>>  {
> > >>>>> +	struct task_struct *p = current;
> > >>>>> +	struct mm_struct *old;
> > >>>>> +
> > >>>>> +	old = xnshadow_mm(p);
> > >>>>> +	xnshadow_mmptd(p) = mm;
> > >>>>> +
> > >>>>>  	ppd_remove_mm(mm, &detach_ppd);
> > >>>>> +
> > >>>>> +	xnshadow_mmptd(p) = old;
> > >>>>
> > >>>> I don't have the full picture yet, but that feels racy: If the context
> > >>>> over which we clean up that foreign mm is also using xnshadow_mmptd,
> > >>>> other threads in that process may dislike this temporary change.
> > >>>>
> > >>>>>  }
> > >>>>>  
> > >>>>>  RTHAL_DECLARE_CLEANUP_EVENT(cleanup_event);
> > >>>>> @@ -2925,7 +2940,7 @@ EXPORT_SYMBOL_GPL(xnshadow_unregister_interface);
> > >>>>>  xnshadow_ppd_t *xnshadow_ppd_get(unsigned muxid)
> > >>>>>  {
> > >>>>>  	if (xnpod_userspace_p())
> > >>>>> -		return ppd_lookup(muxid, current->mm);
> > >>>>> +		return ppd_lookup(muxid, xnshadow_mm(current) ?: current->mm);
> > >>>>>  
> > >>>>>  	return NULL;
> > >>>>>  }
> > >>>>> @@ -2960,8 +2975,9 @@ int xnshadow_mount(void)
> > >>>>>  	sema_init(&completion_mutex, 1);
> > >>>>>  	nkthrptd = rthal_alloc_ptdkey();
> > >>>>>  	nkerrptd = rthal_alloc_ptdkey();
> > >>>>> +	nkmmptd = rthal_alloc_ptdkey();
> > >>>>>  
> > >>>>> -	if (nkthrptd < 0 || nkerrptd < 0) {
> > >>>>> +	if (nkthrptd < 0 || nkerrptd < 0 || nkmmptd < 0) {
> > >>>>>  		printk(KERN_ERR "Xenomai: cannot allocate PTD slots\n");
> > >>>>>  		return -ENOMEM;
> > >>>>>  	}
> > >>>>> diff --git a/ksrc/skins/posix/mutex.c b/ksrc/skins/posix/mutex.c
> > >>>>> index 6ce75e5..cc86852 100644
> > >>>>> --- a/ksrc/skins/posix/mutex.c
> > >>>>> +++ b/ksrc/skins/posix/mutex.c
> > >>>>> @@ -219,10 +219,6 @@ void pse51_mutex_destroy_internal(pse51_mutex_t *mutex,
> > >>>>>  	xnlock_put_irqrestore(&nklock, s);
> > >>>>>  
> > >>>>>  #ifdef CONFIG_XENO_FASTSYNCH
> > >>>>> -	/* We call xnheap_free even if the mutex is not pshared; when
> > >>>>> -	   this function is called from pse51_mutexq_cleanup, the
> > >>>>> -	   sem_heap is destroyed, or not the one to which the fastlock
> > >>>>> -	   belongs, xnheap will simply return an error. */
> > >>>>
> > >>>> I think this comment is not completely obsolete. It still applies /wrt
> > >>>> shared/non-shared.
> > >>>>
> > >>>>>  	xnheap_free(&xnsys_ppd_get(mutex->attr.pshared)->sem_heap,
> > >>>>>  		    mutex->synchbase.fastlock);
> > >>>>>  #endif /* CONFIG_XENO_FASTSYNCH */
> > >>>>>
> > >>>>>
> > >>>>
> > >>>> If we can resolve that potential race, this looks like a nice solution.
> > >>>
> > >>> We still have to address that ordering issue I almost forgot:
> > >>> do_cleanup_event runs before do_task_exit_event when terminating the
> > >>> last task. The former destroys the sem heap, the latter fires the delete
> > >>> hook which then tries to free msendq.fastlock to an invalid heap.
> > >>>
> > >>> Should be fixable by setting sem_heap NULL in the ppd on destroy and
> > >>> skipping the fastlock release in __task_delete_hook if the heap pointer
> > >>> is found like that.
> > >>
> > >> I do not think this can be a problem, as the do_cleanup_event will also
> > >> destroy the threads.
> > > 
> > > At least native tasks (but I bet that's true for all skins) aren't part
> > > of the XNSHADOW_CLIENT_DETACH cleanup procedure.
> > 
> > Then it is indeed a problem, as we will have another ppd issue in the
> > task deletion callbacks. I did not notice, because the posix skin does
> > clean up the threads, and I simply assumed that every skin did it.
> > 
> > What is the reason for doing this? Are the thread deletion hooks
> > supposed to work only when called from the context of the dying thread?
> > 
> 
> When a shadow thread self-deletes, yes. This is a pre-requisite for
> xnshadow_unmap to work on the proper context.

Read: a shadow _always_ self-deletes, this is a pre-requisite too. So
the first rule "deletion hook shall run over the deleted thread" always
applies to it.
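
Said otherwise, a debug build could check that invariant from the
deletion hooks, along these lines (sketch only; XNMAPPED is assumed
from memory to be the "mapped userland shadow" state bit):

static inline void assert_hook_runs_over_self(xnthread_t *thread)
{
	if (xnthread_test_state(thread, XNMAPPED) &&
	    thread != xnpod_current_thread())
		xnpod_fatal("deletion hook ran over a foreign context");
}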

> 

-- 
Philippe.




^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Xenomai-core] [PULL] native: Fix msendq fastlock leakage
  2011-06-23 18:24                   ` Philippe Gerum
@ 2011-06-23 18:56                     ` Gilles Chanteperdrix
  0 siblings, 0 replies; 37+ messages in thread
From: Gilles Chanteperdrix @ 2011-06-23 18:56 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Jan Kiszka, Xenomai core

On 06/23/2011 08:24 PM, Philippe Gerum wrote:
> On Thu, 2011-06-23 at 20:13 +0200, Philippe Gerum wrote:
>> On Thu, 2011-06-23 at 19:32 +0200, Gilles Chanteperdrix wrote:
>>> On 06/23/2011 01:15 PM, Jan Kiszka wrote:
>>>> On 2011-06-23 13:11, Gilles Chanteperdrix wrote:
>>>>> On 06/23/2011 11:37 AM, Jan Kiszka wrote:
>>>>>> On 2011-06-20 19:07, Jan Kiszka wrote:
>>>>>>> On 2011-06-19 15:00, Gilles Chanteperdrix wrote:
>>>>>>>> On 06/19/2011 01:17 PM, Gilles Chanteperdrix wrote:
>>>>>>>>> On 06/19/2011 12:14 PM, Gilles Chanteperdrix wrote:
>>>>>>>>>> I am working on this ppd cleanup issue again, I am asking for help to
>>>>>>>>>> find a fix in -head for all cases where the sys_ppd is needed during
>>>>>>>>>> some cleanup.
>>>>>>>>>>
>>>>>>>>>> The problem is that when the ppd cleanup is invoked:
>>>>>>>>>> - we have no guarantee that current is a thread from the Xenomai
>>>>>>>>>> application;
>>>>>>>>>> - if it is, current->mm is NULL.
>>>>>>>>>>
>>>>>>>>>> So, associating the sys_ppd to either current or current->mm does not
>>>>>>>>>> work. What we could do is pass the sys_ppd to all the other ppds cleanup
>>>>>>>>>> handlers, this would fix cases such as freeing mutexes fastlock, but
>>>>>>>>>> that does not help when the sys_ppd is needed during a thread deletion hook.
>>>>>>>>>>
>>>>>>>>>> I would like to find a solution where simply calling xnsys_ppd_get()
>>>>>>>>>> will work, where we do not have an xnsys_ppd_get for each context, such
>>>>>>>>>> as for instance xnsys_ppd_get_by_mm/xnsys_ppd_get_by_task_struct,
>>>>>>>>>> because it would be too error-prone.
>>>>>>>>>>
>>>>>>>>>> Any idea anyone?
>>>>>>>>>
>>>>>>>>> The best I could come up with: use a ptd to store the mm currently 
>>>>>>>>> being cleaned up, so that xnshadow_ppd_get continues to work, even
>>>>>>>>> in the middle of a cleanup.
>>>>>>>>
>>>>>>>> In order to also get xnshadow_ppd_get to work in task deletion hooks 
>>>>>>>> (which is needed to avoid the issue at the origin of this thread), we 
>>>>>>>> also need to set this ptd upon shadow mapping, so it is still there 
>>>>>>>> when reaching the task deletion hook (where current->mm may be NULL). 
>>>>>>>> Hence the patch:
>>>>>>>>
>>>>>>>> diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
>>>>>>>> index b243600..6bc4210 100644
>>>>>>>> --- a/ksrc/nucleus/shadow.c
>>>>>>>> +++ b/ksrc/nucleus/shadow.c
>>>>>>>> @@ -65,6 +65,11 @@ int nkthrptd;
>>>>>>>>  EXPORT_SYMBOL_GPL(nkthrptd);
>>>>>>>>  int nkerrptd;
>>>>>>>>  EXPORT_SYMBOL_GPL(nkerrptd);
>>>>>>>> +int nkmmptd;
>>>>>>>> +EXPORT_SYMBOL_GPL(nkmmptd);
>>>>>>>> +
>>>>>>>> +#define xnshadow_mmptd(t) ((t)->ptd[nkmmptd])
>>>>>>>> +#define xnshadow_mm(t) ((struct mm_struct *)xnshadow_mmptd(t))
>>>>>>>
>>>>>>> xnshadow_mm() can now return a no longer existing mm. So no user of
>>>>>>> xnshadow_mm should ever dereference that pointer. Thus we better change
>>>>>>> all that user to treat the return value as a void pointer e.g.
>>>>>>>
>>>>>>>>  
>>>>>>>>  struct xnskin_slot {
>>>>>>>>  	struct xnskin_props *props;
>>>>>>>> @@ -1304,6 +1309,8 @@ int xnshadow_map(xnthread_t *thread, xncompletion_t __user *u_completion,
>>>>>>>>  	 * friends.
>>>>>>>>  	 */
>>>>>>>>  	xnshadow_thrptd(current) = thread;
>>>>>>>> +	xnshadow_mmptd(current) = current->mm;
>>>>>>>> +
>>>>>>>>  	rthal_enable_notifier(current);
>>>>>>>>  
>>>>>>>>  	if (xnthread_base_priority(thread) == 0 &&
>>>>>>>> @@ -2759,7 +2766,15 @@ static void detach_ppd(xnshadow_ppd_t * ppd)
>>>>>>>>  
>>>>>>>>  static inline void do_cleanup_event(struct mm_struct *mm)
>>>>>>>>  {
>>>>>>>> +	struct task_struct *p = current;
>>>>>>>> +	struct mm_struct *old;
>>>>>>>> +
>>>>>>>> +	old = xnshadow_mm(p);
>>>>>>>> +	xnshadow_mmptd(p) = mm;
>>>>>>>> +
>>>>>>>>  	ppd_remove_mm(mm, &detach_ppd);
>>>>>>>> +
>>>>>>>> +	xnshadow_mmptd(p) = old;
>>>>>>>
>>>>>>> I don't have the full picture yet, but that feels racy: If the context
>>>>>>> over which we clean up that foreign mm is also using xnshadow_mmptd,
>>>>>>> other threads in that process may dislike this temporary change.
>>>>>>>
>>>>>>>>  }
>>>>>>>>  
>>>>>>>>  RTHAL_DECLARE_CLEANUP_EVENT(cleanup_event);
>>>>>>>> @@ -2925,7 +2940,7 @@ EXPORT_SYMBOL_GPL(xnshadow_unregister_interface);
>>>>>>>>  xnshadow_ppd_t *xnshadow_ppd_get(unsigned muxid)
>>>>>>>>  {
>>>>>>>>  	if (xnpod_userspace_p())
>>>>>>>> -		return ppd_lookup(muxid, current->mm);
>>>>>>>> +		return ppd_lookup(muxid, xnshadow_mm(current) ?: current->mm);
>>>>>>>>  
>>>>>>>>  	return NULL;
>>>>>>>>  }
>>>>>>>> @@ -2960,8 +2975,9 @@ int xnshadow_mount(void)
>>>>>>>>  	sema_init(&completion_mutex, 1);
>>>>>>>>  	nkthrptd = rthal_alloc_ptdkey();
>>>>>>>>  	nkerrptd = rthal_alloc_ptdkey();
>>>>>>>> +	nkmmptd = rthal_alloc_ptdkey();
>>>>>>>>  
>>>>>>>> -	if (nkthrptd < 0 || nkerrptd < 0) {
>>>>>>>> +	if (nkthrptd < 0 || nkerrptd < 0 || nkmmptd < 0) {
>>>>>>>>  		printk(KERN_ERR "Xenomai: cannot allocate PTD slots\n");
>>>>>>>>  		return -ENOMEM;
>>>>>>>>  	}
>>>>>>>> diff --git a/ksrc/skins/posix/mutex.c b/ksrc/skins/posix/mutex.c
>>>>>>>> index 6ce75e5..cc86852 100644
>>>>>>>> --- a/ksrc/skins/posix/mutex.c
>>>>>>>> +++ b/ksrc/skins/posix/mutex.c
>>>>>>>> @@ -219,10 +219,6 @@ void pse51_mutex_destroy_internal(pse51_mutex_t *mutex,
>>>>>>>>  	xnlock_put_irqrestore(&nklock, s);
>>>>>>>>  
>>>>>>>>  #ifdef CONFIG_XENO_FASTSYNCH
>>>>>>>> -	/* We call xnheap_free even if the mutex is not pshared; when
>>>>>>>> -	   this function is called from pse51_mutexq_cleanup, the
>>>>>>>> -	   sem_heap is destroyed, or not the one to which the fastlock
>>>>>>>> -	   belongs, xnheap will simply return an error. */
>>>>>>>
>>>>>>> I think this comment is not completely obsolete. It still applies /wrt
>>>>>>> shared/non-shared.
>>>>>>>
>>>>>>>>  	xnheap_free(&xnsys_ppd_get(mutex->attr.pshared)->sem_heap,
>>>>>>>>  		    mutex->synchbase.fastlock);
>>>>>>>>  #endif /* CONFIG_XENO_FASTSYNCH */
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> If we can resolve that potential race, this looks like a nice solution.
>>>>>>
>>>>>> We still have to address that ordering issue I almost forgot:
>>>>>> do_cleanup_event runs before do_task_exit_event when terminating the
>>>>>> last task. The former destroys the sem heap, the latter fires the delete
>>>>>> hook which then tries to free msendq.fastlock to an invalid heap.
>>>>>>
>>>>>> Should be fixable by setting sem_heap NULL in the ppd on destroy and
>>>>>> skipping the fastlock release in __task_delete_hook if the heap pointer
>>>>>> is found like that.
>>>>>
>>>>> I do not think this can be a problem, as the do_cleanup_event will also
>>>>> destroy the threads.
>>>>
>>>> At least native tasks (but I bet that's true for all skins) aren't part
>>>> of the XNSHADOW_CLIENT_DETACH cleanup procedure.
>>>
>>> Then it is indeed a problem, as we will have another ppd issue in the
>>> task deletion callbacks. I did not notice, because the posix skin does
>>> clean up the threads, and I simply assumed that every skin did it.
>>>
>>> What is the reason for doing this? Are the thread deletion hooks
>>> supposed to work only when called from the context of the dying thread?
>>>
>>
>> When a shadow thread self-deletes, yes. This is a pre-requisite for
>> xnshadow_unmap to work on the proper context.
> 
> Read: a shadow _always_ self-deletes, this is a pre-requisite too. So
> the first rule "deletion hook shall run over the deleted thread" always
> applies to it.

It works by chance for the posix skin then: the main thread shadow is
unmapped in the cleanup callback, but that only works because the
cleanup event happens to be executed over the main thread context most
of the time.

For the other threads, the xnshadow_unmap function checks that the
unmapped shadow is current, but only in its second half, which is never
executed in the case of the taskexit event.

-- 
                                                                Gilles.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Xenomai-core] [PULL] native: Fix msendq fastlock leakage
  2011-06-23  9:37         ` Jan Kiszka
  2011-06-23 11:11           ` Gilles Chanteperdrix
@ 2011-06-23 19:08           ` Gilles Chanteperdrix
  2011-06-24  7:01           ` Gilles Chanteperdrix
  2 siblings, 0 replies; 37+ messages in thread
From: Gilles Chanteperdrix @ 2011-06-23 19:08 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai core

On 06/23/2011 11:37 AM, Jan Kiszka wrote:
> On 2011-06-20 19:07, Jan Kiszka wrote:
>> On 2011-06-19 15:00, Gilles Chanteperdrix wrote:
>>> On 06/19/2011 01:17 PM, Gilles Chanteperdrix wrote:
>>>> On 06/19/2011 12:14 PM, Gilles Chanteperdrix wrote:
>>>>> I am working on this ppd cleanup issue again, I am asking for help to
>>>>> find a fix in -head for all cases where the sys_ppd is needed during
>>>>> some cleanup.
>>>>>
>>>>> The problem is that when the ppd cleanup is invoked:
>>>>> - we have no guarantee that current is a thread from the Xenomai
>>>>> application;
>>>>> - if it is, current->mm is NULL.
>>>>>
>>>>> So, associating the sys_ppd to either current or current->mm does not
>>>>> work. What we could do is pass the sys_ppd to all the other ppds cleanup
>>>>> handlers, this would fix cases such as freeing mutexes fastlock, but
>>>>> that does not help when the sys_ppd is needed during a thread deletion hook.
>>>>>
>>>>> I would like to find a solution where simply calling xnsys_ppd_get()
>>>>> will work, where we do not have an xnsys_ppd_get for each context, such
>>>>> as for instance xnsys_ppd_get_by_mm/xnsys_ppd_get_by_task_struct,
>>>>> because it would be too error-prone.
>>>>>
>>>>> Any idea anyone?
>>>>
>>>> The best I could come up with: use a ptd to store the mm currently 
>>>> being cleaned up, so that xnshadow_ppd_get continues to work, even
>>>> in the middle of a cleanup.
>>>
>>> In order to also get xnshadow_ppd_get to work in task deletion hooks 
>>> (which is needed to avoid the issue at the origin of this thread), we 
>>> also need to set this ptd upon shadow mapping, so it is still there 
>>> when reaching the task deletion hook (where current->mm may be NULL). 
>>> Hence the patch:
>>>
>>> diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
>>> index b243600..6bc4210 100644
>>> --- a/ksrc/nucleus/shadow.c
>>> +++ b/ksrc/nucleus/shadow.c
>>> @@ -65,6 +65,11 @@ int nkthrptd;
>>>  EXPORT_SYMBOL_GPL(nkthrptd);
>>>  int nkerrptd;
>>>  EXPORT_SYMBOL_GPL(nkerrptd);
>>> +int nkmmptd;
>>> +EXPORT_SYMBOL_GPL(nkmmptd);
>>> +
>>> +#define xnshadow_mmptd(t) ((t)->ptd[nkmmptd])
>>> +#define xnshadow_mm(t) ((struct mm_struct *)xnshadow_mmptd(t))
>>
>> xnshadow_mm() can now return a no longer existing mm. So no user of
>> xnshadow_mm should ever dereference that pointer. Thus we better change
>> all that user to treat the return value as a void pointer e.g.
>>
>>>  
>>>  struct xnskin_slot {
>>>  	struct xnskin_props *props;
>>> @@ -1304,6 +1309,8 @@ int xnshadow_map(xnthread_t *thread, xncompletion_t __user *u_completion,
>>>  	 * friends.
>>>  	 */
>>>  	xnshadow_thrptd(current) = thread;
>>> +	xnshadow_mmptd(current) = current->mm;
>>> +
>>>  	rthal_enable_notifier(current);
>>>  
>>>  	if (xnthread_base_priority(thread) == 0 &&
>>> @@ -2759,7 +2766,15 @@ static void detach_ppd(xnshadow_ppd_t * ppd)
>>>  
>>>  static inline void do_cleanup_event(struct mm_struct *mm)
>>>  {
>>> +	struct task_struct *p = current;
>>> +	struct mm_struct *old;
>>> +
>>> +	old = xnshadow_mm(p);
>>> +	xnshadow_mmptd(p) = mm;
>>> +
>>>  	ppd_remove_mm(mm, &detach_ppd);
>>> +
>>> +	xnshadow_mmptd(p) = old;
>>
>> I don't have the full picture yet, but that feels racy: If the context
>> over which we clean up that foreign mm is also using xnshadow_mmptd,
>> other threads in that process may dislike this temporary change.
>>
>>>  }
>>>  
>>>  RTHAL_DECLARE_CLEANUP_EVENT(cleanup_event);
>>> @@ -2925,7 +2940,7 @@ EXPORT_SYMBOL_GPL(xnshadow_unregister_interface);
>>>  xnshadow_ppd_t *xnshadow_ppd_get(unsigned muxid)
>>>  {
>>>  	if (xnpod_userspace_p())
>>> -		return ppd_lookup(muxid, current->mm);
>>> +		return ppd_lookup(muxid, xnshadow_mm(current) ?: current->mm);
>>>  
>>>  	return NULL;
>>>  }
>>> @@ -2960,8 +2975,9 @@ int xnshadow_mount(void)
>>>  	sema_init(&completion_mutex, 1);
>>>  	nkthrptd = rthal_alloc_ptdkey();
>>>  	nkerrptd = rthal_alloc_ptdkey();
>>> +	nkmmptd = rthal_alloc_ptdkey();
>>>  
>>> -	if (nkthrptd < 0 || nkerrptd < 0) {
>>> +	if (nkthrptd < 0 || nkerrptd < 0 || nkmmptd < 0) {
>>>  		printk(KERN_ERR "Xenomai: cannot allocate PTD slots\n");
>>>  		return -ENOMEM;
>>>  	}
>>> diff --git a/ksrc/skins/posix/mutex.c b/ksrc/skins/posix/mutex.c
>>> index 6ce75e5..cc86852 100644
>>> --- a/ksrc/skins/posix/mutex.c
>>> +++ b/ksrc/skins/posix/mutex.c
>>> @@ -219,10 +219,6 @@ void pse51_mutex_destroy_internal(pse51_mutex_t *mutex,
>>>  	xnlock_put_irqrestore(&nklock, s);
>>>  
>>>  #ifdef CONFIG_XENO_FASTSYNCH
>>> -	/* We call xnheap_free even if the mutex is not pshared; when
>>> -	   this function is called from pse51_mutexq_cleanup, the
>>> -	   sem_heap is destroyed, or not the one to which the fastlock
>>> -	   belongs, xnheap will simply return an error. */
>>
>> I think this comment is not completely obsolete. It still applies /wrt
>> shared/non-shared.
>>
>>>  	xnheap_free(&xnsys_ppd_get(mutex->attr.pshared)->sem_heap,
>>>  		    mutex->synchbase.fastlock);
>>>  #endif /* CONFIG_XENO_FASTSYNCH */
>>>
>>>
>>
>> If we can resolve that potential race, this looks like a nice solution.
> 
> We still have to address that ordering issue I almost forgot:
> do_cleanup_event runs before do_task_exit_event when terminating the
> last task. The former destroys the sem heap, the latter fires the delete
> hook which then tries to free msendq.fastlock to an invalid heap.
> 
> Should be fixable by setting sem_heap NULL in the ppd on destroy and
> skipping the fastlock release in __task_delete_hook if the heap pointer
> is found like that.

It will not work: the ppd is destroyed by the time we reach the taskexit
event. And this order is not guaranteed either.

One way to get the guaranteed order would be to increment the mm
reference count when mapping a shadow and release that reference when
unmapping; this way, the cleanup event would be guaranteed to happen
after the last shadow exits.

Or we can maintain a reference count on the sys ppd and, when this
count reaches zero, do the detach. We would no longer need the cleanup
event.
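
For the first variant, the pairing would look roughly like this
(illustrative helpers only; whether mm_users is the right counter for
the cleanup event to key on, and whether mmput() may be called from
the unmapping context at all, is left aside):

#include <linux/sched.h>
#include <linux/mm_types.h>

/* Taken once per mapped shadow, e.g. when the shadow gets mapped. */
static inline void example_pin_mm(struct mm_struct *mm)
{
	atomic_inc(&mm->mm_users);
}

/* Dropped when the shadow is unmapped; the last drop is what would
 * finally let the mm-based ppd cleanup run. */
static inline void example_unpin_mm(struct mm_struct *mm)
{
	mmput(mm);
}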

-- 
                                                                Gilles.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Xenomai-core] [PULL] native: Fix msendq fastlock leakage
  2011-06-23  9:37         ` Jan Kiszka
  2011-06-23 11:11           ` Gilles Chanteperdrix
  2011-06-23 19:08           ` Gilles Chanteperdrix
@ 2011-06-24  7:01           ` Gilles Chanteperdrix
  2 siblings, 0 replies; 37+ messages in thread
From: Gilles Chanteperdrix @ 2011-06-24  7:01 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai core

On 06/23/2011 11:37 AM, Jan Kiszka wrote:
> On 2011-06-20 19:07, Jan Kiszka wrote:
>> On 2011-06-19 15:00, Gilles Chanteperdrix wrote:
>>> On 06/19/2011 01:17 PM, Gilles Chanteperdrix wrote:
>>>> On 06/19/2011 12:14 PM, Gilles Chanteperdrix wrote:
>>>>> I am working on this ppd cleanup issue again, I am asking for help to
>>>>> find a fix in -head for all cases where the sys_ppd is needed during
>>>>> some cleanup.
>>>>>
>>>>> The problem is that when the ppd cleanup is invoked:
>>>>> - we have no guarantee that current is a thread from the Xenomai
>>>>> application;
>>>>> - if it is, current->mm is NULL.
>>>>>
>>>>> So, associating the sys_ppd to either current or current->mm does not
>>>>> work. What we could do is pass the sys_ppd to all the other ppds cleanup
>>>>> handlers, this would fix cases such as freeing mutexes fastlock, but
>>>>> that does not help when the sys_ppd is needed during a thread deletion hook.
>>>>>
>>>>> I would like to find a solution where simply calling xnsys_ppd_get()
>>>>> will work, where we do not have an xnsys_ppd_get for each context, such
>>>>> as for instance xnsys_ppd_get_by_mm/xnsys_ppd_get_by_task_struct,
>>>>> because it would be too error-prone.
>>>>>
>>>>> Any idea anyone?
>>>>
>>>> The best I could come up with: use a ptd to store the mm currently 
>>>> being cleaned up, so that xnshadow_ppd_get continues to work, even
>>>> in the middle of a cleanup.
>>>
>>> In order to also get xnshadow_ppd_get to work in task deletion hooks 
>>> (which is needed to avoid the issue at the origin of this thread), we 
>>> also need to set this ptd upon shadow mapping, so it is still there 
>>> when reaching the task deletion hook (where current->mm may be NULL). 
>>> Hence the patch:
>>>
>>> diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
>>> index b243600..6bc4210 100644
>>> --- a/ksrc/nucleus/shadow.c
>>> +++ b/ksrc/nucleus/shadow.c
>>> @@ -65,6 +65,11 @@ int nkthrptd;
>>>  EXPORT_SYMBOL_GPL(nkthrptd);
>>>  int nkerrptd;
>>>  EXPORT_SYMBOL_GPL(nkerrptd);
>>> +int nkmmptd;
>>> +EXPORT_SYMBOL_GPL(nkmmptd);
>>> +
>>> +#define xnshadow_mmptd(t) ((t)->ptd[nkmmptd])
>>> +#define xnshadow_mm(t) ((struct mm_struct *)xnshadow_mmptd(t))
>>
>> xnshadow_mm() can now return a no longer existing mm. So no user of
>> xnshadow_mm should ever dereference that pointer. Thus we better change
>> all that user to treat the return value as a void pointer e.g.
>>
>>>  
>>>  struct xnskin_slot {
>>>  	struct xnskin_props *props;
>>> @@ -1304,6 +1309,8 @@ int xnshadow_map(xnthread_t *thread, xncompletion_t __user *u_completion,
>>>  	 * friends.
>>>  	 */
>>>  	xnshadow_thrptd(current) = thread;
>>> +	xnshadow_mmptd(current) = current->mm;
>>> +
>>>  	rthal_enable_notifier(current);
>>>  
>>>  	if (xnthread_base_priority(thread) == 0 &&
>>> @@ -2759,7 +2766,15 @@ static void detach_ppd(xnshadow_ppd_t * ppd)
>>>  
>>>  static inline void do_cleanup_event(struct mm_struct *mm)
>>>  {
>>> +	struct task_struct *p = current;
>>> +	struct mm_struct *old;
>>> +
>>> +	old = xnshadow_mm(p);
>>> +	xnshadow_mmptd(p) = mm;
>>> +
>>>  	ppd_remove_mm(mm, &detach_ppd);
>>> +
>>> +	xnshadow_mmptd(p) = old;
>>
>> I don't have the full picture yet, but that feels racy: If the context
>> over which we clean up that foreign mm is also using xnshadow_mmptd,
>> other threads in that process may dislike this temporary change.
>>
>>>  }
>>>  
>>>  RTHAL_DECLARE_CLEANUP_EVENT(cleanup_event);
>>> @@ -2925,7 +2940,7 @@ EXPORT_SYMBOL_GPL(xnshadow_unregister_interface);
>>>  xnshadow_ppd_t *xnshadow_ppd_get(unsigned muxid)
>>>  {
>>>  	if (xnpod_userspace_p())
>>> -		return ppd_lookup(muxid, current->mm);
>>> +		return ppd_lookup(muxid, xnshadow_mm(current) ?: current->mm);
>>>  
>>>  	return NULL;
>>>  }
>>> @@ -2960,8 +2975,9 @@ int xnshadow_mount(void)
>>>  	sema_init(&completion_mutex, 1);
>>>  	nkthrptd = rthal_alloc_ptdkey();
>>>  	nkerrptd = rthal_alloc_ptdkey();
>>> +	nkmmptd = rthal_alloc_ptdkey();
>>>  
>>> -	if (nkthrptd < 0 || nkerrptd < 0) {
>>> +	if (nkthrptd < 0 || nkerrptd < 0 || nkmmptd < 0) {
>>>  		printk(KERN_ERR "Xenomai: cannot allocate PTD slots\n");
>>>  		return -ENOMEM;
>>>  	}
>>> diff --git a/ksrc/skins/posix/mutex.c b/ksrc/skins/posix/mutex.c
>>> index 6ce75e5..cc86852 100644
>>> --- a/ksrc/skins/posix/mutex.c
>>> +++ b/ksrc/skins/posix/mutex.c
>>> @@ -219,10 +219,6 @@ void pse51_mutex_destroy_internal(pse51_mutex_t *mutex,
>>>  	xnlock_put_irqrestore(&nklock, s);
>>>  
>>>  #ifdef CONFIG_XENO_FASTSYNCH
>>> -	/* We call xnheap_free even if the mutex is not pshared; when
>>> -	   this function is called from pse51_mutexq_cleanup, the
>>> -	   sem_heap is destroyed, or not the one to which the fastlock
>>> -	   belongs, xnheap will simply return an error. */
>>
>> I think this comment is not completely obsolete. It still applies /wrt
>> shared/non-shared.
>>
>>>  	xnheap_free(&xnsys_ppd_get(mutex->attr.pshared)->sem_heap,
>>>  		    mutex->synchbase.fastlock);
>>>  #endif /* CONFIG_XENO_FASTSYNCH */
>>>
>>>
>>
>> If we can resolve that potential race, this looks like a nice solution.
> 
> We still have to address that ordering issue I almost forgot:
> do_cleanup_event runs before do_task_exit_event when terminating the
> last task. The former destroys the sem heap, while the latter fires the
> delete hook, which then tries to free msendq.fastlock to an already
> destroyed heap.

Here is a proposed solution: add a reference counter to the sys_ppd
structure, decrement it in both the taskexit and cleanup events, and let
only the last decrement call the destructors.

http://git.xenomai.org/?p=xenomai-gch.git;a=commit;h=35a7cf786986816fba2da8e633da276819d952d7

This relies on the patch which makes xnshadow_ppd_get() work in these
contexts.
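
(Not the code from the commit above, just a sketch of its shape: the
existing do_taskexit_event and do_cleanup_event handlers would both end
in a "put", and only the last drop runs the destructors. The refcnt
field, the stub structure and the lookup helpers are hypothetical.)

#include <linux/mm_types.h>
#include <linux/sched.h>
#include <asm/atomic.h>

struct sys_ppd_stub {		/* stands in for struct xnsys_ppd + new field */
	atomic_t refcnt;
	/* ... sem_heap and the other existing members ... */
};

/* Hypothetical lookup helpers, by task and by mm. */
struct sys_ppd_stub *sys_ppd_of_task(struct task_struct *p);
struct sys_ppd_stub *sys_ppd_of_mm(struct mm_struct *mm);

static void sys_ppd_put(struct sys_ppd_stub *sys_ppd)
{
	if (!atomic_dec_and_test(&sys_ppd->refcnt))
		return;
	/* Last user: destroy the sem_heap and detach the ppd only here,
	 * so the task deletion hook has already been able to free
	 * msendq.fastlock to a still-live heap. */
}

static void taskexit_event_put(struct task_struct *p)
{
	/* called at the end of the real do_taskexit_event handler */
	sys_ppd_put(sys_ppd_of_task(p));
}

static void cleanup_event_put(struct mm_struct *mm)
{
	/* called at the end of the real do_cleanup_event handler */
	sys_ppd_put(sys_ppd_of_mm(mm));
}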


-- 
                                                                Gilles.


^ permalink raw reply	[flat|nested] 37+ messages in thread

Thread overview: 37+ messages
2011-05-23 13:53 [Xenomai-core] [PULL] native: Fix msendq fastlock leakage Jan Kiszka
2011-05-24  4:31 ` Gilles Chanteperdrix
2011-05-24  9:13   ` Jan Kiszka
2011-05-24  9:32     ` Gilles Chanteperdrix
2011-05-24  9:36       ` Jan Kiszka
2011-05-24  9:58         ` Gilles Chanteperdrix
2011-05-24 10:36           ` Jan Kiszka
2011-05-24 10:41             ` Gilles Chanteperdrix
2011-05-24 12:23               ` Jan Kiszka
2011-05-24 12:30                 ` Gilles Chanteperdrix
2011-05-24 13:52                   ` Jan Kiszka
2011-05-24 14:03                     ` Gilles Chanteperdrix
2011-05-25 11:20                       ` Jan Kiszka
2011-05-25 11:58                         ` Gilles Chanteperdrix
2011-05-25 12:12                           ` Jan Kiszka
2011-05-25 12:19                             ` Gilles Chanteperdrix
2011-05-25 12:22                               ` Jan Kiszka
2011-05-25 18:48                                 ` Gilles Chanteperdrix
2011-05-26  7:18                                   ` Jan Kiszka
2011-05-26  7:29                                     ` Gilles Chanteperdrix
2011-05-26  7:37                                       ` Jan Kiszka
2011-05-26  7:58                                         ` Gilles Chanteperdrix
2011-06-19 10:14 ` Gilles Chanteperdrix
2011-06-19 11:17   ` Gilles Chanteperdrix
2011-06-19 13:00     ` Gilles Chanteperdrix
2011-06-20 17:07       ` Jan Kiszka
2011-06-20 17:46         ` Gilles Chanteperdrix
2011-06-20 20:52           ` Jan Kiszka
2011-06-23  9:37         ` Jan Kiszka
2011-06-23 11:11           ` Gilles Chanteperdrix
2011-06-23 11:15             ` Jan Kiszka
2011-06-23 17:32               ` Gilles Chanteperdrix
2011-06-23 18:13                 ` Philippe Gerum
2011-06-23 18:24                   ` Philippe Gerum
2011-06-23 18:56                     ` Gilles Chanteperdrix
2011-06-23 19:08           ` Gilles Chanteperdrix
2011-06-24  7:01           ` Gilles Chanteperdrix
