* [PATCH] cobalt: Fix resource leak of cobalt_monitor
@ 2022-01-17 16:53 Florian Bezdeka
2022-01-20 8:12 ` Philippe Gerum
2022-02-15 8:23 ` Jan Kiszka
0 siblings, 2 replies; 5+ messages in thread
From: Florian Bezdeka @ 2022-01-17 16:53 UTC (permalink / raw)
To: xenomai
We have a test within the smokey testsuite which actually does
something like
cobalt_monitor_init()
cobalt_monitor_wait()
cobalt_monitor_enter()
-> timeout
cobalt_monitor_exit()
cobalt_monitor_destroy()
If the posix_mutex tests were run right after this scenario the mutex
tests failed. The wrong mutex state was caused by the monitor in the
previous test not being cleaned up properly.
Signed-off-by: Florian Bezdeka <florian.bezdeka@siemens.com>
---
I'm not 100% sure about this fix. Maybe we should move it into
cobalt_monitor_reclaim, maybe I'm using the monitor in a wrong way
inside the y2038 tests. Everything is possible...
@Philippe: It would be very nice if you could have a look as well.
Thanks!
I triggered a CI run on all of our supported platforms. Looks good so far and
the testsuite does no longer fail in the compat case as well. The compat tests
were manually run. 64 bit kernel but 32 bit userland.
kernel/cobalt/posix/monitor.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/kernel/cobalt/posix/monitor.c b/kernel/cobalt/posix/monitor.c
index 91396955c..c266b8b3b 100644
--- a/kernel/cobalt/posix/monitor.c
+++ b/kernel/cobalt/posix/monitor.c
@@ -425,6 +425,7 @@ COBALT_SYSCALL(monitor_destroy, primary,
goto fail;
}
+ xnsynch_release(&mon->gate, curr);
cobalt_monitor_reclaim(&mon->resnode, s); /* drops lock */
xnsched_run();
--
2.30.2
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] cobalt: Fix resource leak of cobalt_monitor
2022-01-17 16:53 [PATCH] cobalt: Fix resource leak of cobalt_monitor Florian Bezdeka
@ 2022-01-20 8:12 ` Philippe Gerum
2022-01-20 11:35 ` Bezdeka, Florian
2022-02-15 8:23 ` Jan Kiszka
1 sibling, 1 reply; 5+ messages in thread
From: Philippe Gerum @ 2022-01-20 8:12 UTC (permalink / raw)
To: Florian Bezdeka; +Cc: xenomai, jan.kiszka
Florian Bezdeka <florian.bezdeka@siemens.com> writes:
> We have a test within the smokey testsuite which actually does
> something like
>
> cobalt_monitor_init()
> cobalt_monitor_wait()
> cobalt_monitor_enter()
> -> timeout
> cobalt_monitor_exit()
> cobalt_monitor_destroy()
>
> If the posix_mutex tests were run right after this scenario the mutex
> tests failed. The wrong mutex state was caused by the monitor in the
> previous test not being cleaned up properly.
>
> Signed-off-by: Florian Bezdeka <florian.bezdeka@siemens.com>
> ---
>
> I'm not 100% sure about this fix. Maybe we should move it into
> cobalt_monitor_reclaim, maybe I'm using the monitor in a wrong way
> inside the y2038 tests. Everything is possible...
>
> @Philippe: It would be very nice if you could have a look as well.
> Thanks!
>
Looks ok, thank you for spotting this. I think the issue your patch
fixes caused a PI boost to linger since deleting the gate lock does not
drop ownership - only xnsynch_release() does so - leaving the owner with
a boosted priority unduly (*). This is likely to wreck any subsequent
test depending on the priority scheme.
You could indeed move the release call to the reclamation routine.
(*) ownership is reacquired including on a timeout condition by the call
to monitor_enter() at the end of __cobalt_monitor_wait().
--
Philippe.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] cobalt: Fix resource leak of cobalt_monitor
2022-01-20 8:12 ` Philippe Gerum
@ 2022-01-20 11:35 ` Bezdeka, Florian
2022-01-22 18:16 ` Philippe Gerum
0 siblings, 1 reply; 5+ messages in thread
From: Bezdeka, Florian @ 2022-01-20 11:35 UTC (permalink / raw)
To: rpm; +Cc: xenomai, jan.kiszka
On Thu, 2022-01-20 at 09:12 +0100, Philippe Gerum wrote:
> Florian Bezdeka <florian.bezdeka@siemens.com> writes:
>
> > We have a test within the smokey testsuite which actually does
> > something like
> >
> > cobalt_monitor_init()
> > cobalt_monitor_wait()
> > cobalt_monitor_enter()
> > -> timeout
> > cobalt_monitor_exit()
> > cobalt_monitor_destroy()
> >
> > If the posix_mutex tests were run right after this scenario the mutex
> > tests failed. The wrong mutex state was caused by the monitor in the
> > previous test not being cleaned up properly.
> >
> > Signed-off-by: Florian Bezdeka <florian.bezdeka@siemens.com>
> > ---
> >
> > I'm not 100% sure about this fix. Maybe we should move it into
> > cobalt_monitor_reclaim, maybe I'm using the monitor in a wrong way
> > inside the y2038 tests. Everything is possible...
> >
> > @Philippe: It would be very nice if you could have a look as well.
> > Thanks!
> >
>
> Looks ok, thank you for spotting this. I think the issue your patch
> fixes caused a PI boost to linger since deleting the gate lock does not
> drop ownership - only xnsynch_release() does so - leaving the owner with
> a boosted priority unduly (*). This is likely to wreck any subsequent
> test depending on the priority scheme.
>
> You could indeed move the release call to the reclamation routine.
Checked that again and my conclusion is that the current location is
correct and it should not be moved into cobalt_monitor_reclaim.
Reasoning: cobalt_monitor_reclaim is called by cobalt_process_detach()
as well, but there is no owner check like in CoBaLt_monitor_destroy.
Do you agree?
>
> (*) ownership is reacquired including on a timeout condition by the call
> to monitor_enter() at the end of __cobalt_monitor_wait().
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] cobalt: Fix resource leak of cobalt_monitor
2022-01-20 11:35 ` Bezdeka, Florian
@ 2022-01-22 18:16 ` Philippe Gerum
0 siblings, 0 replies; 5+ messages in thread
From: Philippe Gerum @ 2022-01-22 18:16 UTC (permalink / raw)
To: Bezdeka, Florian; +Cc: xenomai, jan.kiszka
"Bezdeka, Florian" <florian.bezdeka@siemens.com> writes:
> On Thu, 2022-01-20 at 09:12 +0100, Philippe Gerum wrote:
>> Florian Bezdeka <florian.bezdeka@siemens.com> writes:
>>
>> > We have a test within the smokey testsuite which actually does
>> > something like
>> >
>> > cobalt_monitor_init()
>> > cobalt_monitor_wait()
>> > cobalt_monitor_enter()
>> > -> timeout
>> > cobalt_monitor_exit()
>> > cobalt_monitor_destroy()
>> >
>> > If the posix_mutex tests were run right after this scenario the mutex
>> > tests failed. The wrong mutex state was caused by the monitor in the
>> > previous test not being cleaned up properly.
>> >
>> > Signed-off-by: Florian Bezdeka <florian.bezdeka@siemens.com>
>> > ---
>> >
>> > I'm not 100% sure about this fix. Maybe we should move it into
>> > cobalt_monitor_reclaim, maybe I'm using the monitor in a wrong way
>> > inside the y2038 tests. Everything is possible...
>> >
>> > @Philippe: It would be very nice if you could have a look as well.
>> > Thanks!
>> >
>>
>> Looks ok, thank you for spotting this. I think the issue your patch
>> fixes caused a PI boost to linger since deleting the gate lock does not
>> drop ownership - only xnsynch_release() does so - leaving the owner with
>> a boosted priority unduly (*). This is likely to wreck any subsequent
>> test depending on the priority scheme.
>>
>> You could indeed move the release call to the reclamation routine.
>
> Checked that again and my conclusion is that the current location is
> correct and it should not be moved into cobalt_monitor_reclaim.
>
> Reasoning: cobalt_monitor_reclaim is called by cobalt_process_detach()
> as well, but there is no owner check like in CoBaLt_monitor_destroy.
>
> Do you agree?
>
I guess you want me to look at the code at this point. Ok, let's try
this. On task exit, PI is cleared when either release_all_ownerships()
or xnsynch_release() are called, whichever comes first. For a thread
which belongs to the Cobalt personality, the exit sequence may include
in that order:
1. __handle_taskexit_event
cobalt_remove_process
cobalt_process_detach
__reclaim_resource
cobalt_monitor_reclaim
xnsynch_destroy => flush waiters, clear PI
2. __xnthread_cleanup
release_all_ownerships => monitor already destroyed, no lock held
So, despite the fact that an exiting thread drops all ownerships on
locks only _after_ the resources it owns have been reclaimed, this
should be ok to keep the release out of the reclamation code, because
all waiters were flushed prior to that point, plus the lock owner exits
anyway (unlike in the plain monitory_destroy() case).
So yes, you are right and I was likely off base when mentioning the move
to the reclamation code. You may want to double-check that taking a
termination signal while holding a monitor does indeed what I just
described though.
shameless plug: the v3 way of doing resource management is overly
complicated and somewhat fragile, would benefit from a serious
streamlining effort. v4 does much, much better by simply relying on the
common VFS for this part.
--
Philippe.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] cobalt: Fix resource leak of cobalt_monitor
2022-01-17 16:53 [PATCH] cobalt: Fix resource leak of cobalt_monitor Florian Bezdeka
2022-01-20 8:12 ` Philippe Gerum
@ 2022-02-15 8:23 ` Jan Kiszka
1 sibling, 0 replies; 5+ messages in thread
From: Jan Kiszka @ 2022-02-15 8:23 UTC (permalink / raw)
To: Florian Bezdeka, xenomai
On 17.01.22 17:53, Florian Bezdeka wrote:
> We have a test within the smokey testsuite which actually does
> something like
>
> cobalt_monitor_init()
> cobalt_monitor_wait()
> cobalt_monitor_enter()
> -> timeout
> cobalt_monitor_exit()
> cobalt_monitor_destroy()
>
> If the posix_mutex tests were run right after this scenario the mutex
> tests failed. The wrong mutex state was caused by the monitor in the
> previous test not being cleaned up properly.
>
> Signed-off-by: Florian Bezdeka <florian.bezdeka@siemens.com>
> ---
>
> I'm not 100% sure about this fix. Maybe we should move it into
> cobalt_monitor_reclaim, maybe I'm using the monitor in a wrong way
> inside the y2038 tests. Everything is possible...
>
> @Philippe: It would be very nice if you could have a look as well.
> Thanks!
>
> I triggered a CI run on all of our supported platforms. Looks good so far and
> the testsuite does no longer fail in the compat case as well. The compat tests
> were manually run. 64 bit kernel but 32 bit userland.
>
>
> kernel/cobalt/posix/monitor.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/kernel/cobalt/posix/monitor.c b/kernel/cobalt/posix/monitor.c
> index 91396955c..c266b8b3b 100644
> --- a/kernel/cobalt/posix/monitor.c
> +++ b/kernel/cobalt/posix/monitor.c
> @@ -425,6 +425,7 @@ COBALT_SYSCALL(monitor_destroy, primary,
> goto fail;
> }
>
> + xnsynch_release(&mon->gate, curr);
> cobalt_monitor_reclaim(&mon->resnode, s); /* drops lock */
>
> xnsched_run();
Thanks, applied.
Jan
--
Siemens AG, Technology
Competence Center Embedded Linux
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2022-02-15 8:23 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-17 16:53 [PATCH] cobalt: Fix resource leak of cobalt_monitor Florian Bezdeka
2022-01-20 8:12 ` Philippe Gerum
2022-01-20 11:35 ` Bezdeka, Florian
2022-01-22 18:16 ` Philippe Gerum
2022-02-15 8:23 ` Jan Kiszka
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.