All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] cobalt: Fix resource leak of cobalt_monitor
@ 2022-01-17 16:53 Florian Bezdeka
  2022-01-20  8:12 ` Philippe Gerum
  2022-02-15  8:23 ` Jan Kiszka
  0 siblings, 2 replies; 5+ messages in thread
From: Florian Bezdeka @ 2022-01-17 16:53 UTC (permalink / raw)
  To: xenomai

We have a test within the smokey testsuite which actually does
something like

   cobalt_monitor_init()
   cobalt_monitor_wait()
       cobalt_monitor_enter()
         -> timeout
       cobalt_monitor_exit()
   cobalt_monitor_destroy()

If the posix_mutex tests were run right after this scenario the mutex
tests failed. The wrong mutex state was caused by the monitor in the
previous test not being cleaned up properly.

Signed-off-by: Florian Bezdeka <florian.bezdeka@siemens.com>
---

I'm not 100% sure about this fix. Maybe we should move it into
cobalt_monitor_reclaim, maybe I'm using the monitor in a wrong way
inside the y2038 tests. Everything is possible...

@Philippe: It would be very nice if you could have a look as well.
Thanks!

I triggered a CI run on all of our supported platforms. Looks good so far and
the testsuite does no longer fail in the compat case as well. The compat tests
were manually run. 64 bit kernel but 32 bit userland.


 kernel/cobalt/posix/monitor.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/cobalt/posix/monitor.c b/kernel/cobalt/posix/monitor.c
index 91396955c..c266b8b3b 100644
--- a/kernel/cobalt/posix/monitor.c
+++ b/kernel/cobalt/posix/monitor.c
@@ -425,6 +425,7 @@ COBALT_SYSCALL(monitor_destroy, primary,
 		goto fail;
 	}
 
+	xnsynch_release(&mon->gate, curr);
 	cobalt_monitor_reclaim(&mon->resnode, s); /* drops lock */
 
 	xnsched_run();
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] cobalt: Fix resource leak of cobalt_monitor
  2022-01-17 16:53 [PATCH] cobalt: Fix resource leak of cobalt_monitor Florian Bezdeka
@ 2022-01-20  8:12 ` Philippe Gerum
  2022-01-20 11:35   ` Bezdeka, Florian
  2022-02-15  8:23 ` Jan Kiszka
  1 sibling, 1 reply; 5+ messages in thread
From: Philippe Gerum @ 2022-01-20  8:12 UTC (permalink / raw)
  To: Florian Bezdeka; +Cc: xenomai, jan.kiszka


Florian Bezdeka <florian.bezdeka@siemens.com> writes:

> We have a test within the smokey testsuite which actually does
> something like
>
>    cobalt_monitor_init()
>    cobalt_monitor_wait()
>        cobalt_monitor_enter()
>          -> timeout
>        cobalt_monitor_exit()
>    cobalt_monitor_destroy()
>
> If the posix_mutex tests were run right after this scenario the mutex
> tests failed. The wrong mutex state was caused by the monitor in the
> previous test not being cleaned up properly.
>
> Signed-off-by: Florian Bezdeka <florian.bezdeka@siemens.com>
> ---
>
> I'm not 100% sure about this fix. Maybe we should move it into
> cobalt_monitor_reclaim, maybe I'm using the monitor in a wrong way
> inside the y2038 tests. Everything is possible...
>
> @Philippe: It would be very nice if you could have a look as well.
> Thanks!
>

Looks ok, thank you for spotting this. I think the issue your patch
fixes caused a PI boost to linger since deleting the gate lock does not
drop ownership - only xnsynch_release() does so - leaving the owner with
a boosted priority unduly (*). This is likely to wreck any subsequent
test depending on the priority scheme.

You could indeed move the release call to the reclamation routine.

(*) ownership is reacquired including on a timeout condition by the call
to monitor_enter() at the end of __cobalt_monitor_wait().

-- 
Philippe.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] cobalt: Fix resource leak of cobalt_monitor
  2022-01-20  8:12 ` Philippe Gerum
@ 2022-01-20 11:35   ` Bezdeka, Florian
  2022-01-22 18:16     ` Philippe Gerum
  0 siblings, 1 reply; 5+ messages in thread
From: Bezdeka, Florian @ 2022-01-20 11:35 UTC (permalink / raw)
  To: rpm; +Cc: xenomai, jan.kiszka

On Thu, 2022-01-20 at 09:12 +0100, Philippe Gerum wrote:
> Florian Bezdeka <florian.bezdeka@siemens.com> writes:
> 
> > We have a test within the smokey testsuite which actually does
> > something like
> > 
> >    cobalt_monitor_init()
> >    cobalt_monitor_wait()
> >        cobalt_monitor_enter()
> >          -> timeout
> >        cobalt_monitor_exit()
> >    cobalt_monitor_destroy()
> > 
> > If the posix_mutex tests were run right after this scenario the mutex
> > tests failed. The wrong mutex state was caused by the monitor in the
> > previous test not being cleaned up properly.
> > 
> > Signed-off-by: Florian Bezdeka <florian.bezdeka@siemens.com>
> > ---
> > 
> > I'm not 100% sure about this fix. Maybe we should move it into
> > cobalt_monitor_reclaim, maybe I'm using the monitor in a wrong way
> > inside the y2038 tests. Everything is possible...
> > 
> > @Philippe: It would be very nice if you could have a look as well.
> > Thanks!
> > 
> 
> Looks ok, thank you for spotting this. I think the issue your patch
> fixes caused a PI boost to linger since deleting the gate lock does not
> drop ownership - only xnsynch_release() does so - leaving the owner with
> a boosted priority unduly (*). This is likely to wreck any subsequent
> test depending on the priority scheme.
> 
> You could indeed move the release call to the reclamation routine.

Checked that again and my conclusion is that the current location is
correct and it should not be moved into cobalt_monitor_reclaim.

Reasoning: cobalt_monitor_reclaim is called by cobalt_process_detach()
as well, but there is no owner check like in CoBaLt_monitor_destroy.

Do you agree?

> 
> (*) ownership is reacquired including on a timeout condition by the call
> to monitor_enter() at the end of __cobalt_monitor_wait().
> 


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] cobalt: Fix resource leak of cobalt_monitor
  2022-01-20 11:35   ` Bezdeka, Florian
@ 2022-01-22 18:16     ` Philippe Gerum
  0 siblings, 0 replies; 5+ messages in thread
From: Philippe Gerum @ 2022-01-22 18:16 UTC (permalink / raw)
  To: Bezdeka, Florian; +Cc: xenomai, jan.kiszka


"Bezdeka, Florian" <florian.bezdeka@siemens.com> writes:

> On Thu, 2022-01-20 at 09:12 +0100, Philippe Gerum wrote:
>> Florian Bezdeka <florian.bezdeka@siemens.com> writes:
>> 
>> > We have a test within the smokey testsuite which actually does
>> > something like
>> > 
>> >    cobalt_monitor_init()
>> >    cobalt_monitor_wait()
>> >        cobalt_monitor_enter()
>> >          -> timeout
>> >        cobalt_monitor_exit()
>> >    cobalt_monitor_destroy()
>> > 
>> > If the posix_mutex tests were run right after this scenario the mutex
>> > tests failed. The wrong mutex state was caused by the monitor in the
>> > previous test not being cleaned up properly.
>> > 
>> > Signed-off-by: Florian Bezdeka <florian.bezdeka@siemens.com>
>> > ---
>> > 
>> > I'm not 100% sure about this fix. Maybe we should move it into
>> > cobalt_monitor_reclaim, maybe I'm using the monitor in a wrong way
>> > inside the y2038 tests. Everything is possible...
>> > 
>> > @Philippe: It would be very nice if you could have a look as well.
>> > Thanks!
>> > 
>> 
>> Looks ok, thank you for spotting this. I think the issue your patch
>> fixes caused a PI boost to linger since deleting the gate lock does not
>> drop ownership - only xnsynch_release() does so - leaving the owner with
>> a boosted priority unduly (*). This is likely to wreck any subsequent
>> test depending on the priority scheme.
>> 
>> You could indeed move the release call to the reclamation routine.
>
> Checked that again and my conclusion is that the current location is
> correct and it should not be moved into cobalt_monitor_reclaim.
>
> Reasoning: cobalt_monitor_reclaim is called by cobalt_process_detach()
> as well, but there is no owner check like in CoBaLt_monitor_destroy.
>
> Do you agree?
>

I guess you want me to look at the code at this point. Ok, let's try
this. On task exit, PI is cleared when either release_all_ownerships()
or xnsynch_release() are called, whichever comes first. For a thread
which belongs to the Cobalt personality, the exit sequence may include
in that order:

1. __handle_taskexit_event
        cobalt_remove_process
                cobalt_process_detach
                        __reclaim_resource
                                cobalt_monitor_reclaim
                                        xnsynch_destroy => flush waiters, clear PI
2. __xnthread_cleanup
        release_all_ownerships => monitor already destroyed, no lock held

So, despite the fact that an exiting thread drops all ownerships on
locks only _after_ the resources it owns have been reclaimed, this
should be ok to keep the release out of the reclamation code, because
all waiters were flushed prior to that point, plus the lock owner exits
anyway (unlike in the plain monitory_destroy() case).

So yes, you are right and I was likely off base when mentioning the move
to the reclamation code. You may want to double-check that taking a
termination signal while holding a monitor does indeed what I just
described though.

shameless plug: the v3 way of doing resource management is overly
complicated and somewhat fragile, would benefit from a serious
streamlining effort. v4 does much, much better by simply relying on the
common VFS for this part.

-- 
Philippe.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] cobalt: Fix resource leak of cobalt_monitor
  2022-01-17 16:53 [PATCH] cobalt: Fix resource leak of cobalt_monitor Florian Bezdeka
  2022-01-20  8:12 ` Philippe Gerum
@ 2022-02-15  8:23 ` Jan Kiszka
  1 sibling, 0 replies; 5+ messages in thread
From: Jan Kiszka @ 2022-02-15  8:23 UTC (permalink / raw)
  To: Florian Bezdeka, xenomai

On 17.01.22 17:53, Florian Bezdeka wrote:
> We have a test within the smokey testsuite which actually does
> something like
> 
>    cobalt_monitor_init()
>    cobalt_monitor_wait()
>        cobalt_monitor_enter()
>          -> timeout
>        cobalt_monitor_exit()
>    cobalt_monitor_destroy()
> 
> If the posix_mutex tests were run right after this scenario the mutex
> tests failed. The wrong mutex state was caused by the monitor in the
> previous test not being cleaned up properly.
> 
> Signed-off-by: Florian Bezdeka <florian.bezdeka@siemens.com>
> ---
> 
> I'm not 100% sure about this fix. Maybe we should move it into
> cobalt_monitor_reclaim, maybe I'm using the monitor in a wrong way
> inside the y2038 tests. Everything is possible...
> 
> @Philippe: It would be very nice if you could have a look as well.
> Thanks!
> 
> I triggered a CI run on all of our supported platforms. Looks good so far and
> the testsuite does no longer fail in the compat case as well. The compat tests
> were manually run. 64 bit kernel but 32 bit userland.
> 
> 
>  kernel/cobalt/posix/monitor.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/cobalt/posix/monitor.c b/kernel/cobalt/posix/monitor.c
> index 91396955c..c266b8b3b 100644
> --- a/kernel/cobalt/posix/monitor.c
> +++ b/kernel/cobalt/posix/monitor.c
> @@ -425,6 +425,7 @@ COBALT_SYSCALL(monitor_destroy, primary,
>  		goto fail;
>  	}
>  
> +	xnsynch_release(&mon->gate, curr);
>  	cobalt_monitor_reclaim(&mon->resnode, s); /* drops lock */
>  
>  	xnsched_run();

Thanks, applied.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-02-15  8:23 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-17 16:53 [PATCH] cobalt: Fix resource leak of cobalt_monitor Florian Bezdeka
2022-01-20  8:12 ` Philippe Gerum
2022-01-20 11:35   ` Bezdeka, Florian
2022-01-22 18:16     ` Philippe Gerum
2022-02-15  8:23 ` Jan Kiszka

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.