* "rtdm/nrtsig: move inband work description off the stack"
@ 2021-05-25  7:35 Jan Kiszka
  2021-05-25  8:46 ` Philippe Gerum
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Kiszka @ 2021-05-25  7:35 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Xenomai

Hi Philippe,

[1] makes rtdm_schedule_nrt_work a rather heavyweight service compared
to what it used to be (even over I-pipe). An xnmalloc is nothing you
would expect from a "send a signal" service, specifically not from
rtdm_nrtsig_pend, which does not even make use of the extra payload
being sent.

Can we do better? Also for xnthread_signal (the fd and udd use cases
are not time-sensitive anyway).

Jan

[1]
https://source.denx.de/Xenomai/xenomai/-/commit/572049132a04fc1bd217b87598af7c0ba8711c18

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux



* Re: "rtdm/nrtsig: move inband work description off the stack"
  2021-05-25  7:35 "rtdm/nrtsig: move inband work description off the stack" Jan Kiszka
@ 2021-05-25  8:46 ` Philippe Gerum
  2021-05-25  8:58   ` Jan Kiszka
  0 siblings, 1 reply; 21+ messages in thread
From: Philippe Gerum @ 2021-05-25  8:46 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai


Jan Kiszka <jan.kiszka@siemens.com> writes:

> Hi Philippe,
>
> [1] makes rtdm_schedule_nrt_work a rather heavyweight service, compared
> to what it was (and even over I-pipe). xnmalloc is nothing you would
> expect from a "send a signal" service, specifically from
> rtdm_nrtsig_pend which does not even make use of the sending extra payload.
>
> Can we do better? Also for xnthread_signal (fd and udd usecases are not
> time-sensitive anyway).
>

Nope, I cannot see any significantly better option which would still
allow a common implementation we can map on top of Dovetail / I-pipe.

-- 
Philippe.



* Re: "rtdm/nrtsig: move inband work description off the stack"
  2021-05-25  8:46 ` Philippe Gerum
@ 2021-05-25  8:58   ` Jan Kiszka
  2021-05-25 10:01     ` Philippe Gerum
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Kiszka @ 2021-05-25  8:58 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Xenomai

On 25.05.21 10:46, Philippe Gerum wrote:
> 
> Jan Kiszka <jan.kiszka@siemens.com> writes:
> 
>> Hi Philippe,
>>
>> [1] makes rtdm_schedule_nrt_work a rather heavyweight service, compared
>> to what it was (and even over I-pipe). xnmalloc is nothing you would
>> expect from a "send a signal" service, specifically from
>> rtdm_nrtsig_pend which does not even make use of the sending extra payload.
>>
>> Can we do better? Also for xnthread_signal (fd and udd usecases are not
>> time-sensitive anyway).
>>
> 
> Nope, I cannot see any significantly better option which would still
> allow a common implementation we can map on top of Dovetail / I-pipe.
> 

E.g. pre-allocate the work object for data-free signals and use that -
or skip sending it while it is still in use, i.e. already pending.
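
A minimal sketch of that idea, assuming the existing rtdm_nrtsig API
(prealloc_sig, sig_pending and pend_data_free_signal are made-up names,
and the signal is set up once with rtdm_nrtsig_init()):

	static rtdm_nrtsig_t prealloc_sig;
	static atomic_t sig_pending = ATOMIC_INIT(0);

	static void prealloc_sig_handler(rtdm_nrtsig_t *sig, void *arg)
	{
		atomic_set(&sig_pending, 0);	/* re-arm for the next sender */
		/* ... in-band part of the signal ... */
	}

	void pend_data_free_signal(void)
	{
		/* pend only if not already in flight */
		if (atomic_cmpxchg(&sig_pending, 0, 1) == 0)
			rtdm_nrtsig_pend(&prealloc_sig);
	}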

Jan

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux



* Re: "rtdm/nrtsig: move inband work description off the stack"
  2021-05-25  8:58   ` Jan Kiszka
@ 2021-05-25 10:01     ` Philippe Gerum
  2021-05-25 10:38       ` Jan Kiszka
  0 siblings, 1 reply; 21+ messages in thread
From: Philippe Gerum @ 2021-05-25 10:01 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai


Jan Kiszka <jan.kiszka@siemens.com> writes:

> On 25.05.21 10:46, Philippe Gerum wrote:
>> 
>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>> 
>>> Hi Philippe,
>>>
>>> [1] makes rtdm_schedule_nrt_work a rather heavyweight service, compared
>>> to what it was (and even over I-pipe). xnmalloc is nothing you would
>>> expect from a "send a signal" service, specifically from
>>> rtdm_nrtsig_pend which does not even make use of the sending extra payload.
>>>
>>> Can we do better? Also for xnthread_signal (fd and udd usecases are not
>>> time-sensitive anyway).
>>>
>> 
>> Nope, I cannot see any significantly better option which would still
>> allow a common implementation we can map on top of Dovetail / I-pipe.
>> 
>
> E.g. pre-allocate the work object for data-free signals and use that -
> or not send it when it is in use, thus pending.
>

Ok, let's recap:

- rtdm_nrtsig_pend(): does not allocate anything; it merely uses a
  user-provided memory block containing a request header. Current
  callers either already take care not to resend a request uselessly
  while it is pending (e.g. [1]), or their internal logic prevents
  that (e.g. [2]). This interface always conveys user data by
  reference.

- rtdm_schedule_nrt_work(): does allocate a nrtsig_t struct in order to
  convey a request block we cannot squeeze into a work_struct, since the
  latter is in itself an anchor type the user may embed into its own
  private argument block (see the sketch below). I'm unsure whether
  ensuring that rtdm_schedule_nrt_work() calls do not pile up would be
  worth it, as this call is not supposed to be used frenetically on some
  hot path in the first place. But if we wanted to do that, we would
  have to change the signature of rtdm_schedule_nrt_work() so that we
  gain persistent context data we could use to determine whether a nrt
  work request is in flight. We could not probe the work_struct for that
  purpose: that would be racy.

Do you see any way to have a smarter rtdm_schedule_nrt_work() without
changing its signature?

[1]
kernel/drivers/net/addons/cap.c:		rtdm_nrtsig_pend(&cap_signal);
kernel/drivers/net/addons/cap.c:		rtdm_nrtsig_pend(&cap_signal);
kernel/drivers/net/addons/proxy.c:	rtdm_nrtsig_pend(&rtnetproxy_rx_signal);

[2]
kernel/drivers/testing/switchtest.c:				rtdm_nrtsig_pend(&ctx->wake_utask);
kernel/drivers/testing/switchtest.c:		rtdm_nrtsig_pend(&ctx->wake_utask);
kernel/drivers/testing/switchtest.c:			rtdm_nrtsig_pend(&ctx->wake_utask);
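
For reference, the allocation pattern in question looks roughly like
this (paraphrased from kernel/cobalt/rtdm/drvlib.c after [1]; names and
details may differ from the actual source):

	static void nrt_work_trampoline(rtdm_nrtsig_t *nrt_sig, void *arg)
	{
		struct work_struct *lostage_work = arg;

		schedule_work(lostage_work);
		xnfree(nrt_sig);	/* release the relay block */
	}

	void rtdm_schedule_nrt_work(struct work_struct *lostage_work)
	{
		rtdm_nrtsig_t *nrt_sig;

		nrt_sig = xnmalloc(sizeof(*nrt_sig));	/* the contentious part */
		if (nrt_sig == NULL)
			return;

		rtdm_nrtsig_init(nrt_sig, nrt_work_trampoline, lostage_work);
		rtdm_nrtsig_pend(nrt_sig);
	}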

-- 
Philippe.



* Re: "rtdm/nrtsig: move inband work description off the stack"
  2021-05-25 10:01     ` Philippe Gerum
@ 2021-05-25 10:38       ` Jan Kiszka
  2021-05-25 12:37         ` Philippe Gerum
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Kiszka @ 2021-05-25 10:38 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Xenomai

On 25.05.21 12:01, Philippe Gerum wrote:
> 
> Jan Kiszka <jan.kiszka@siemens.com> writes:
> 
>> On 25.05.21 10:46, Philippe Gerum wrote:
>>>
>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>
>>>> Hi Philippe,
>>>>
>>>> [1] makes rtdm_schedule_nrt_work a rather heavyweight service, compared
>>>> to what it was (and even over I-pipe). xnmalloc is nothing you would
>>>> expect from a "send a signal" service, specifically from
>>>> rtdm_nrtsig_pend which does not even make use of the sending extra payload.
>>>>
>>>> Can we do better? Also for xnthread_signal (fd and udd usecases are not
>>>> time-sensitive anyway).
>>>>
>>>
>>> Nope, I cannot see any significantly better option which would still
>>> allow a common implementation we can map on top of Dovetail / I-pipe.
>>>
>>
>> E.g. pre-allocate the work object for data-free signals and use that -
>> or not send it when it is in use, thus pending.
>>
> 
> Ok, let's recap:
> 
> - rtdm_nrtsig_pend(): does not allocate anything, merely uses a
>   user-provided memory block, containing a request header. Current
>   callers should either already care for not resending a request
>   uselessly because it is pending (e.g. [1]), or their internal logic
>   should prevent that (e.g. [2]). This interface always convey user data
>   by reference.

Ah, I fantasized that rtdm_nrtsig_pend would call rtdm_schedule_nrt_work
to do the work; it's the other way around. False alarm here.

> 
> - rtdm_schedule_nrt_work(): does allocate a nrtsig_t struct in order to
>   convey a request block we cannot squeeze into a work_struct, since the
>   latter is in itself an anchor type the user may embed into its own
>   private argument block. I'm unsure ensuring that
>   rtdm_schedule_nrt_work() calls do not pile up would be worth it, as
>   this call is not supposed to be used frenetically on some hot path in
>   the first place. But if we'd want to do that, then we would have to
>   change the signature of rtdm_schedule_nrt_work(), so that we gain a
>   persistent context data we could use for determining whether a nrt
>   work request is in flight. We could not probe the work_struct for that
>   purpose, that would be racy.
> 
> Do you see any way to have a smarter rtdm_schedule_nrt_work() without
> changing its signature?

There is no rule that prevents changing the signature if that is needed.
However, the kernel is fine without allocation as well, using a very
similar signature.
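
For comparison, this is the generic allocation-free idiom in the kernel
proper - the caller embeds the work_struct into its own request block
and recovers it via container_of() (generic example, not Xenomai code):

	struct my_request {
		struct work_struct work;	/* anchor, embedded by the caller */
		int payload;
	};

	static void my_request_handler(struct work_struct *work)
	{
		struct my_request *req = container_of(work, struct my_request, work);

		/* process req->payload in-band */
	}

	/* INIT_WORK(&req->work, my_request_handler); then
	   schedule_work(&req->work): no per-call allocation needed. */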

I do not yet understand why we need that indirection via the rtdm_nrtsig
on Dovetail. I thought we could trigger work directly from the oob domain
there, can't we? How do you handle such a case in EVL so far?

And the third case remains xnthread_signal, btw.

Jan

> 
> [1]
> kernel/drivers/net/addons/cap.c:		rtdm_nrtsig_pend(&cap_signal);
> kernel/drivers/net/addons/cap.c:		rtdm_nrtsig_pend(&cap_signal);
> kernel/drivers/net/addons/proxy.c:
> rtdm_nrtsig_pend(&rtnetproxy_rx_signal);
> 
> [2]
> kernel/drivers/testing/switchtest.c:				rtdm_nrtsig_pend(&ctx->wake_utask);
> kernel/drivers/testing/switchtest.c:		rtdm_nrtsig_pend(&ctx->wake_utask);
> kernel/drivers/testing/switchtest.c:			rtdm_nrtsig_pend(&ctx->wake_utask);
> 

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux



* Re: "rtdm/nrtsig: move inband work description off the stack"
  2021-05-25 10:38       ` Jan Kiszka
@ 2021-05-25 12:37         ` Philippe Gerum
  2021-05-25 12:56           ` Jan Kiszka
  0 siblings, 1 reply; 21+ messages in thread
From: Philippe Gerum @ 2021-05-25 12:37 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai


Jan Kiszka <jan.kiszka@siemens.com> writes:

> On 25.05.21 12:01, Philippe Gerum wrote:
>> 
>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>> 
>>> On 25.05.21 10:46, Philippe Gerum wrote:
>>>>
>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>
>>>>> Hi Philippe,
>>>>>
>>>>> [1] makes rtdm_schedule_nrt_work a rather heavyweight service, compared
>>>>> to what it was (and even over I-pipe). xnmalloc is nothing you would
>>>>> expect from a "send a signal" service, specifically from
>>>>> rtdm_nrtsig_pend which does not even make use of the sending extra payload.
>>>>>
>>>>> Can we do better? Also for xnthread_signal (fd and udd usecases are not
>>>>> time-sensitive anyway).
>>>>>
>>>>
>>>> Nope, I cannot see any significantly better option which would still
>>>> allow a common implementation we can map on top of Dovetail / I-pipe.
>>>>
>>>
>>> E.g. pre-allocate the work object for data-free signals and use that -
>>> or not send it when it is in use, thus pending.
>>>
>> 
>> Ok, let's recap:
>> 
>> - rtdm_nrtsig_pend(): does not allocate anything, merely uses a
>>   user-provided memory block, containing a request header. Current
>>   callers should either already care for not resending a request
>>   uselessly because it is pending (e.g. [1]), or their internal logic
>>   should prevent that (e.g. [2]). This interface always convey user data
>>   by reference.
>
> Ah, I fantasized that rtdm_nrtsig_pend would call rtdm_schedule_nrt_work
> to do the work, it's the other way around. False alarm here.
>
>> 
>> - rtdm_schedule_nrt_work(): does allocate a nrtsig_t struct in order to
>>   convey a request block we cannot squeeze into a work_struct, since the
>>   latter is in itself an anchor type the user may embed into its own
>>   private argument block. I'm unsure ensuring that
>>   rtdm_schedule_nrt_work() calls do not pile up would be worth it, as
>>   this call is not supposed to be used frenetically on some hot path in
>>   the first place. But if we'd want to do that, then we would have to
>>   change the signature of rtdm_schedule_nrt_work(), so that we gain a
>>   persistent context data we could use for determining whether a nrt
>>   work request is in flight. We could not probe the work_struct for that
>>   purpose, that would be racy.
>> 
>> Do you see any way to have a smarter rtdm_schedule_nrt_work() without
>> changing its signature?
>
> There is no rule that prevents changing the signature if that is needed.
> However, the kernel is fine without allocation as well, using a very
> similar signature.
>

schedule_work() and rtdm_schedule_nrt_work() have a major difference:
the latter has to pass the request on from the oob to the in-band
context. This is what bugs us and creates the requirement for an
additional mechanism and data.

> I do not yet understand way we need that indirection via the rtdm_nrtsig
> on Dovetail. I thought we can trigger work directly from the oob domain
> there, can't we? How do you handle such a case in evl so far?
>

Dovetail can trigger in-band work from oob via the generic irq_work()
service; we don't need the extra code involved in the I-pipe for the
same purpose. The gist of the matter is having a single implementation
of rtdm_schedule_nrt_work() which works in both Dovetail and I-pipe
contexts: the way we do that is by using rtdm_nrtsig_pend(), which
already abstracts the base mechanism for relaying oob -> in-band
signals.

On the other hand, EVL has the evl_call_inband[_from]() service, which
combines the irq_work and work_struct anchors we need into a single
evl_work type, which in turn can be embedded into a user request
block. This is what rtdm_schedule_nrt_work() does not expose, so it has
to build one internally by combining a dynamic allocation and the
user-provided work_struct.
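
A sketch of what such a combined anchor could look like on the Cobalt
side - a hypothetical type loosely modeled on EVL's evl_work, not
existing RTDM API:

	struct rtdm_work {			/* hypothetical combined anchor */
		struct irq_work irq_work;	/* oob -> in-band relay stage */
		struct work_struct work;	/* in-band workqueue stage */
	};

Since the caller would embed this block into its own request structure,
pending a work item would no longer need any dynamic allocation.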

> And the third case remains xnthread_signal, btw.
>

xnthread_signal() is used to trigger SIGDEBUG/SIGSHADOW/SIGTERM signals
for a thread, a rare event. Optimizing there would be overkill, unless
the application misbehaves in the first place.

> Jan
>
>> 
>> [1]
>> kernel/drivers/net/addons/cap.c:		rtdm_nrtsig_pend(&cap_signal);
>> kernel/drivers/net/addons/cap.c:		rtdm_nrtsig_pend(&cap_signal);
>> kernel/drivers/net/addons/proxy.c:
>> rtdm_nrtsig_pend(&rtnetproxy_rx_signal);
>> 
>> [2]
>> kernel/drivers/testing/switchtest.c:				rtdm_nrtsig_pend(&ctx->wake_utask);
>> kernel/drivers/testing/switchtest.c:		rtdm_nrtsig_pend(&ctx->wake_utask);
>> kernel/drivers/testing/switchtest.c:			rtdm_nrtsig_pend(&ctx->wake_utask);
>> 


-- 
Philippe.



* Re: "rtdm/nrtsig: move inband work description off the stack"
  2021-05-25 12:37         ` Philippe Gerum
@ 2021-05-25 12:56           ` Jan Kiszka
  2021-05-25 13:20             ` Philippe Gerum
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Kiszka @ 2021-05-25 12:56 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Xenomai

On 25.05.21 14:37, Philippe Gerum wrote:
> 
> Jan Kiszka <jan.kiszka@siemens.com> writes:
> 
>> On 25.05.21 12:01, Philippe Gerum wrote:
>>>
>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>
>>>> On 25.05.21 10:46, Philippe Gerum wrote:
>>>>>
>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>
>>>>>> Hi Philippe,
>>>>>>
>>>>>> [1] makes rtdm_schedule_nrt_work a rather heavyweight service, compared
>>>>>> to what it was (and even over I-pipe). xnmalloc is nothing you would
>>>>>> expect from a "send a signal" service, specifically from
>>>>>> rtdm_nrtsig_pend which does not even make use of the sending extra payload.
>>>>>>
>>>>>> Can we do better? Also for xnthread_signal (fd and udd usecases are not
>>>>>> time-sensitive anyway).
>>>>>>
>>>>>
>>>>> Nope, I cannot see any significantly better option which would still
>>>>> allow a common implementation we can map on top of Dovetail / I-pipe.
>>>>>
>>>>
>>>> E.g. pre-allocate the work object for data-free signals and use that -
>>>> or not send it when it is in use, thus pending.
>>>>
>>>
>>> Ok, let's recap:
>>>
>>> - rtdm_nrtsig_pend(): does not allocate anything, merely uses a
>>>   user-provided memory block, containing a request header. Current
>>>   callers should either already care for not resending a request
>>>   uselessly because it is pending (e.g. [1]), or their internal logic
>>>   should prevent that (e.g. [2]). This interface always convey user data
>>>   by reference.
>>
>> Ah, I fantasized that rtdm_nrtsig_pend would call rtdm_schedule_nrt_work
>> to do the work, it's the other way around. False alarm here.
>>
>>>
>>> - rtdm_schedule_nrt_work(): does allocate a nrtsig_t struct in order to
>>>   convey a request block we cannot squeeze into a work_struct, since the
>>>   latter is in itself an anchor type the user may embed into its own
>>>   private argument block. I'm unsure ensuring that
>>>   rtdm_schedule_nrt_work() calls do not pile up would be worth it, as
>>>   this call is not supposed to be used frenetically on some hot path in
>>>   the first place. But if we'd want to do that, then we would have to
>>>   change the signature of rtdm_schedule_nrt_work(), so that we gain a
>>>   persistent context data we could use for determining whether a nrt
>>>   work request is in flight. We could not probe the work_struct for that
>>>   purpose, that would be racy.
>>>
>>> Do you see any way to have a smarter rtdm_schedule_nrt_work() without
>>> changing its signature?
>>
>> There is no rule that prevents changing the signature if that is needed.
>> However, the kernel is fine without allocation as well, using a very
>> similar signature.
>>
> 
> schedule_work() and rtdm_schedule_nrt_work() have a major difference:
> the latter has to pass on the request from the oob to the in-band
> context. This is what bugs us and creates the requirement for additional
> mechanism and data.
> 
>> I do not yet understand way we need that indirection via the rtdm_nrtsig
>> on Dovetail. I thought we can trigger work directly from the oob domain
>> there, can't we? How do you handle such a case in evl so far?
>>
> 
> Dovetail can trigger in-band work from oob via the generic irq_work()
> service, we don't need the extra code involved in the I-pipe for the
> same purpose. The gist of the matter is about having the same
> implementation for rtdm_schedule_nrt_work() which works in both Dovetail
> and I-pipe contexts: the way we do that is by using rtdm_nrtsig_pend()
> which already abstracts the base mechanism for relaying oob -> in-band
> signals.
> 
> On the other hand, EVL has the evl_call_inband[_from]() service, which
> combines the irq_work and work_struct anchors we need into a single
> evl_work type, which in turn can be embedded into a user request
> block. This is what rtdm_schedule_nrt_work() does not expose, so it has
> to build one internally by combining a dynamic allocation and the
> user-provided work_struct.
> 

Then enhance the rtdm service to enable such a combination - and be even
more ready for the switch to Xenomai 4, I would say. We can either add a
new service and deprecate the old one (while implementing it as you
suggest) or just break the interface to get the (presumably) few users
quickly converted.

>> And the third case remains xnthread_signal, btw.
>>
> 
> xnthread_signal() is used to trigger SIGDEBUG/SIGSHADOW/SIGTERM signals
> for a thread, a seldom event. Optimizing there would be overkill, unless
> the application behaves wrongly in the first place.

Some callers run under nklock, so this makes potentially lengthy
critical sections a bit longer still.

Jan

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux



* Re: "rtdm/nrtsig: move inband work description off the stack"
  2021-05-25 12:56           ` Jan Kiszka
@ 2021-05-25 13:20             ` Philippe Gerum
  2021-05-25 17:08               ` Jan Kiszka
  0 siblings, 1 reply; 21+ messages in thread
From: Philippe Gerum @ 2021-05-25 13:20 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai


Jan Kiszka <jan.kiszka@siemens.com> writes:

> On 25.05.21 14:37, Philippe Gerum wrote:
>> 
>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>> 
>>> On 25.05.21 12:01, Philippe Gerum wrote:
>>>>
>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>
>>>>> On 25.05.21 10:46, Philippe Gerum wrote:
>>>>>>
>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>
>>>>>>> Hi Philippe,
>>>>>>>
>>>>>>> [1] makes rtdm_schedule_nrt_work a rather heavyweight service, compared
>>>>>>> to what it was (and even over I-pipe). xnmalloc is nothing you would
>>>>>>> expect from a "send a signal" service, specifically from
>>>>>>> rtdm_nrtsig_pend which does not even make use of the sending extra payload.
>>>>>>>
>>>>>>> Can we do better? Also for xnthread_signal (fd and udd usecases are not
>>>>>>> time-sensitive anyway).
>>>>>>>
>>>>>>
>>>>>> Nope, I cannot see any significantly better option which would still
>>>>>> allow a common implementation we can map on top of Dovetail / I-pipe.
>>>>>>
>>>>>
>>>>> E.g. pre-allocate the work object for data-free signals and use that -
>>>>> or not send it when it is in use, thus pending.
>>>>>
>>>>
>>>> Ok, let's recap:
>>>>
>>>> - rtdm_nrtsig_pend(): does not allocate anything, merely uses a
>>>>   user-provided memory block, containing a request header. Current
>>>>   callers should either already care for not resending a request
>>>>   uselessly because it is pending (e.g. [1]), or their internal logic
>>>>   should prevent that (e.g. [2]). This interface always convey user data
>>>>   by reference.
>>>
>>> Ah, I fantasized that rtdm_nrtsig_pend would call rtdm_schedule_nrt_work
>>> to do the work, it's the other way around. False alarm here.
>>>
>>>>
>>>> - rtdm_schedule_nrt_work(): does allocate a nrtsig_t struct in order to
>>>>   convey a request block we cannot squeeze into a work_struct, since the
>>>>   latter is in itself an anchor type the user may embed into its own
>>>>   private argument block. I'm unsure ensuring that
>>>>   rtdm_schedule_nrt_work() calls do not pile up would be worth it, as
>>>>   this call is not supposed to be used frenetically on some hot path in
>>>>   the first place. But if we'd want to do that, then we would have to
>>>>   change the signature of rtdm_schedule_nrt_work(), so that we gain a
>>>>   persistent context data we could use for determining whether a nrt
>>>>   work request is in flight. We could not probe the work_struct for that
>>>>   purpose, that would be racy.
>>>>
>>>> Do you see any way to have a smarter rtdm_schedule_nrt_work() without
>>>> changing its signature?
>>>
>>> There is no rule that prevents changing the signature if that is needed.
>>> However, the kernel is fine without allocation as well, using a very
>>> similar signature.
>>>
>> 
>> schedule_work() and rtdm_schedule_nrt_work() have a major difference:
>> the latter has to pass on the request from the oob to the in-band
>> context. This is what bugs us and creates the requirement for additional
>> mechanism and data.
>> 
>>> I do not yet understand way we need that indirection via the rtdm_nrtsig
>>> on Dovetail. I thought we can trigger work directly from the oob domain
>>> there, can't we? How do you handle such a case in evl so far?
>>>
>> 
>> Dovetail can trigger in-band work from oob via the generic irq_work()
>> service, we don't need the extra code involved in the I-pipe for the
>> same purpose. The gist of the matter is about having the same
>> implementation for rtdm_schedule_nrt_work() which works in both Dovetail
>> and I-pipe contexts: the way we do that is by using rtdm_nrtsig_pend()
>> which already abstracts the base mechanism for relaying oob -> in-band
>> signals.
>> 
>> On the other hand, EVL has the evl_call_inband[_from]() service, which
>> combines the irq_work and work_struct anchors we need into a single
>> evl_work type, which in turn can be embedded into a user request
>> block. This is what rtdm_schedule_nrt_work() does not expose, so it has
>> to build one internally by combining a dynamic allocation and the
>> user-provided work_struct.
>> 
>
> Then enhance the rtdm service to enable such a combination - and be even
> more ready for the switch to Xenomai 4, I would say. We can either add a
> new service and deprecate the old one (while implementing it as you
> suggest) or just break the interface to get the (presumably) few users
> quickly converted.
>

rtdm_schedule_nrt_work() was added by Gilles when merging the RTnet code
into Xenomai 3 IIRC, with little documentation. Widespread usage of this
call is unlikely, so I'd vote for simplifying the whole thing and
replacing it entirely.

>>> And the third case remains xnthread_signal, btw.
>>>
>> 
>> xnthread_signal() is used to trigger SIGDEBUG/SIGSHADOW/SIGTERM signals
>> for a thread, a seldom event. Optimizing there would be overkill, unless
>> the application behaves wrongly in the first place.
>
> Some caller are under nklock, thus it makes potentially lengthy critical
> sections now a bit longer.
>

This is what needs to be fixed in the first place: we should not trigger
a signal relay while holding the ugly big lock.

-- 
Philippe.



* Re: "rtdm/nrtsig: move inband work description off the stack"
  2021-05-25 13:20             ` Philippe Gerum
@ 2021-05-25 17:08               ` Jan Kiszka
  2021-05-26 13:52                 ` Philippe Gerum
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Kiszka @ 2021-05-25 17:08 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Xenomai

On 25.05.21 15:20, Philippe Gerum wrote:
> 
> Jan Kiszka <jan.kiszka@siemens.com> writes:
> 
>> On 25.05.21 14:37, Philippe Gerum wrote:
>>>
>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>
>>>> On 25.05.21 12:01, Philippe Gerum wrote:
>>>>>
>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>
>>>>>> On 25.05.21 10:46, Philippe Gerum wrote:
>>>>>>>
>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>>
>>>>>>>> Hi Philippe,
>>>>>>>>
>>>>>>>> [1] makes rtdm_schedule_nrt_work a rather heavyweight service, compared
>>>>>>>> to what it was (and even over I-pipe). xnmalloc is nothing you would
>>>>>>>> expect from a "send a signal" service, specifically from
>>>>>>>> rtdm_nrtsig_pend which does not even make use of the sending extra payload.
>>>>>>>>
>>>>>>>> Can we do better? Also for xnthread_signal (fd and udd usecases are not
>>>>>>>> time-sensitive anyway).
>>>>>>>>
>>>>>>>
>>>>>>> Nope, I cannot see any significantly better option which would still
>>>>>>> allow a common implementation we can map on top of Dovetail / I-pipe.
>>>>>>>
>>>>>>
>>>>>> E.g. pre-allocate the work object for data-free signals and use that -
>>>>>> or not send it when it is in use, thus pending.
>>>>>>
>>>>>
>>>>> Ok, let's recap:
>>>>>
>>>>> - rtdm_nrtsig_pend(): does not allocate anything, merely uses a
>>>>>   user-provided memory block, containing a request header. Current
>>>>>   callers should either already care for not resending a request
>>>>>   uselessly because it is pending (e.g. [1]), or their internal logic
>>>>>   should prevent that (e.g. [2]). This interface always convey user data
>>>>>   by reference.
>>>>
>>>> Ah, I fantasized that rtdm_nrtsig_pend would call rtdm_schedule_nrt_work
>>>> to do the work, it's the other way around. False alarm here.
>>>>
>>>>>
>>>>> - rtdm_schedule_nrt_work(): does allocate a nrtsig_t struct in order to
>>>>>   convey a request block we cannot squeeze into a work_struct, since the
>>>>>   latter is in itself an anchor type the user may embed into its own
>>>>>   private argument block. I'm unsure ensuring that
>>>>>   rtdm_schedule_nrt_work() calls do not pile up would be worth it, as
>>>>>   this call is not supposed to be used frenetically on some hot path in
>>>>>   the first place. But if we'd want to do that, then we would have to
>>>>>   change the signature of rtdm_schedule_nrt_work(), so that we gain a
>>>>>   persistent context data we could use for determining whether a nrt
>>>>>   work request is in flight. We could not probe the work_struct for that
>>>>>   purpose, that would be racy.
>>>>>
>>>>> Do you see any way to have a smarter rtdm_schedule_nrt_work() without
>>>>> changing its signature?
>>>>
>>>> There is no rule that prevents changing the signature if that is needed.
>>>> However, the kernel is fine without allocation as well, using a very
>>>> similar signature.
>>>>
>>>
>>> schedule_work() and rtdm_schedule_nrt_work() have a major difference:
>>> the latter has to pass on the request from the oob to the in-band
>>> context. This is what bugs us and creates the requirement for additional
>>> mechanism and data.
>>>
>>>> I do not yet understand way we need that indirection via the rtdm_nrtsig
>>>> on Dovetail. I thought we can trigger work directly from the oob domain
>>>> there, can't we? How do you handle such a case in evl so far?
>>>>
>>>
>>> Dovetail can trigger in-band work from oob via the generic irq_work()
>>> service, we don't need the extra code involved in the I-pipe for the
>>> same purpose. The gist of the matter is about having the same
>>> implementation for rtdm_schedule_nrt_work() which works in both Dovetail
>>> and I-pipe contexts: the way we do that is by using rtdm_nrtsig_pend()
>>> which already abstracts the base mechanism for relaying oob -> in-band
>>> signals.
>>>
>>> On the other hand, EVL has the evl_call_inband[_from]() service, which
>>> combines the irq_work and work_struct anchors we need into a single
>>> evl_work type, which in turn can be embedded into a user request
>>> block. This is what rtdm_schedule_nrt_work() does not expose, so it has
>>> to build one internally by combining a dynamic allocation and the
>>> user-provided work_struct.
>>>
>>
>> Then enhance the rtdm service to enable such a combination - and be even
>> more ready for the switch to Xenomai 4, I would say. We can either add a
>> new service and deprecate the old one (while implementing it as you
>> suggest) or just break the interface to get the (presumably) few users
>> quickly converted.
>>
> 
> rtdm_schedule_nrt_work() was added by Gilles when merging the RTnet code
> into Xenomai 3 IIRC, with little documentation. Widespread usage of this
> call is unlikely, so I'd vote for simplifying the whole thing and
> replace it entirely.
> 

OK, then we have a plan for this piece.

>>>> And the third case remains xnthread_signal, btw.
>>>>
>>>
>>> xnthread_signal() is used to trigger SIGDEBUG/SIGSHADOW/SIGTERM signals
>>> for a thread, a seldom event. Optimizing there would be overkill, unless
>>> the application behaves wrongly in the first place.
>>
>> Some caller are under nklock, thus it makes potentially lengthy critical
>> sections now a bit longer.
>>
> 
> This is what needs to be fixed in the first place, we should not trigger
> a signal relay when holding the ugly big lock.
> 

Whatever is simpler.

Another aspect: using xnmalloc for anything that had better succeed is
problematic when we are OOM. That is no strict regression over I-pipe,
where we had a limited queue as well IIRC, but it is a reason to avoid
carrying that behavior over - and possibly creating new error scenarios
that are harder to debug.

Jan

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux



* Re: "rtdm/nrtsig: move inband work description off the stack"
  2021-05-25 17:08               ` Jan Kiszka
@ 2021-05-26 13:52                 ` Philippe Gerum
  2021-05-31 15:07                   ` Jan Kiszka
  0 siblings, 1 reply; 21+ messages in thread
From: Philippe Gerum @ 2021-05-26 13:52 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai


Jan Kiszka <jan.kiszka@siemens.com> writes:

> On 25.05.21 15:20, Philippe Gerum wrote:
>> 
>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>> 
>>> On 25.05.21 14:37, Philippe Gerum wrote:
>>>>
>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>
>>>>> On 25.05.21 12:01, Philippe Gerum wrote:
>>>>>>
>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>
>>>>>>> On 25.05.21 10:46, Philippe Gerum wrote:
>>>>>>>>
>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>>>
>>>>>>>>> Hi Philippe,
>>>>>>>>>
>>>>>>>>> [1] makes rtdm_schedule_nrt_work a rather heavyweight service, compared
>>>>>>>>> to what it was (and even over I-pipe). xnmalloc is nothing you would
>>>>>>>>> expect from a "send a signal" service, specifically from
>>>>>>>>> rtdm_nrtsig_pend which does not even make use of the sending extra payload.
>>>>>>>>>
>>>>>>>>> Can we do better? Also for xnthread_signal (fd and udd usecases are not
>>>>>>>>> time-sensitive anyway).
>>>>>>>>>
>>>>>>>>
>>>>>>>> Nope, I cannot see any significantly better option which would still
>>>>>>>> allow a common implementation we can map on top of Dovetail / I-pipe.
>>>>>>>>
>>>>>>>
>>>>>>> E.g. pre-allocate the work object for data-free signals and use that -
>>>>>>> or not send it when it is in use, thus pending.
>>>>>>>
>>>>>>
>>>>>> Ok, let's recap:
>>>>>>
>>>>>> - rtdm_nrtsig_pend(): does not allocate anything, merely uses a
>>>>>>   user-provided memory block, containing a request header. Current
>>>>>>   callers should either already care for not resending a request
>>>>>>   uselessly because it is pending (e.g. [1]), or their internal logic
>>>>>>   should prevent that (e.g. [2]). This interface always convey user data
>>>>>>   by reference.
>>>>>
>>>>> Ah, I fantasized that rtdm_nrtsig_pend would call rtdm_schedule_nrt_work
>>>>> to do the work, it's the other way around. False alarm here.
>>>>>
>>>>>>
>>>>>> - rtdm_schedule_nrt_work(): does allocate a nrtsig_t struct in order to
>>>>>>   convey a request block we cannot squeeze into a work_struct, since the
>>>>>>   latter is in itself an anchor type the user may embed into its own
>>>>>>   private argument block. I'm unsure ensuring that
>>>>>>   rtdm_schedule_nrt_work() calls do not pile up would be worth it, as
>>>>>>   this call is not supposed to be used frenetically on some hot path in
>>>>>>   the first place. But if we'd want to do that, then we would have to
>>>>>>   change the signature of rtdm_schedule_nrt_work(), so that we gain a
>>>>>>   persistent context data we could use for determining whether a nrt
>>>>>>   work request is in flight. We could not probe the work_struct for that
>>>>>>   purpose, that would be racy.
>>>>>>
>>>>>> Do you see any way to have a smarter rtdm_schedule_nrt_work() without
>>>>>> changing its signature?
>>>>>
>>>>> There is no rule that prevents changing the signature if that is needed.
>>>>> However, the kernel is fine without allocation as well, using a very
>>>>> similar signature.
>>>>>
>>>>
>>>> schedule_work() and rtdm_schedule_nrt_work() have a major difference:
>>>> the latter has to pass on the request from the oob to the in-band
>>>> context. This is what bugs us and creates the requirement for additional
>>>> mechanism and data.
>>>>
>>>>> I do not yet understand way we need that indirection via the rtdm_nrtsig
>>>>> on Dovetail. I thought we can trigger work directly from the oob domain
>>>>> there, can't we? How do you handle such a case in evl so far?
>>>>>
>>>>
>>>> Dovetail can trigger in-band work from oob via the generic irq_work()
>>>> service, we don't need the extra code involved in the I-pipe for the
>>>> same purpose. The gist of the matter is about having the same
>>>> implementation for rtdm_schedule_nrt_work() which works in both Dovetail
>>>> and I-pipe contexts: the way we do that is by using rtdm_nrtsig_pend()
>>>> which already abstracts the base mechanism for relaying oob -> in-band
>>>> signals.
>>>>
>>>> On the other hand, EVL has the evl_call_inband[_from]() service, which
>>>> combines the irq_work and work_struct anchors we need into a single
>>>> evl_work type, which in turn can be embedded into a user request
>>>> block. This is what rtdm_schedule_nrt_work() does not expose, so it has
>>>> to build one internally by combining a dynamic allocation and the
>>>> user-provided work_struct.
>>>>
>>>
>>> Then enhance the rtdm service to enable such a combination - and be even
>>> more ready for the switch to Xenomai 4, I would say. We can either add a
>>> new service and deprecate the old one (while implementing it as you
>>> suggest) or just break the interface to get the (presumably) few users
>>> quickly converted.
>>>
>> 
>> rtdm_schedule_nrt_work() was added by Gilles when merging the RTnet code
>> into Xenomai 3 IIRC, with little documentation. Widespread usage of this
>> call is unlikely, so I'd vote for simplifying the whole thing and
>> replace it entirely.
>> 
>
> OK, then we have a plan for this piece.
>
>>>>> And the third case remains xnthread_signal, btw.
>>>>>
>>>>
>>>> xnthread_signal() is used to trigger SIGDEBUG/SIGSHADOW/SIGTERM signals
>>>> for a thread, a seldom event. Optimizing there would be overkill, unless
>>>> the application behaves wrongly in the first place.
>>>
>>> Some caller are under nklock, thus it makes potentially lengthy critical
>>> sections now a bit longer.
>>>
>> 
>> This is what needs to be fixed in the first place, we should not trigger
>> a signal relay when holding the ugly big lock.
>> 
>
> Whatever is simpler.
>
> Another aspect: Using xnmalloc for anything that should better succeed
> is possibly not good when we are OOM. That is actually no strict
> regression over I-pipe where we had a limited queue as well IIRC but a
> reason to avoid carrying that over - and possibly creating new error
> scenarios that are harder to debug.
>

To mitigate the issue of generalized OOM, we could pull memory from a
local xnheap privately attached to xnthread_signal() along with any
caller requiring dynamic allocation, so that none of them could deplete
the sysheap.
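
A sketch of that mitigation, assuming the cobalt xnheap API (heap size
and all names are made up, and setup details such as alignment
requirements are omitted):

	static struct xnheap sig_heap;		/* private to the signal path */
	static char sig_heap_mem[8192];

	static int init_signal_heap(void)
	{
		return xnheap_init(&sig_heap, sig_heap_mem, sizeof(sig_heap_mem));
	}

	/* xnthread_signal() would then pull descriptors from
	   xnheap_alloc(&sig_heap, size) instead of xnmalloc(size). */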

Getting rid of dynamic allocation entirely on the other hand would
require us to add one or more "signal slots" per possible cause of
in-band signal into struct xnthread (e.g. SIGSHADOW_ACTION_BACKTRACE,
SIGDEBUG_MIGRATE_SIGNAL etc), passing the proper slot to
xnthread_signal(). SIGSHADOW(SIGWINCH) and SIGDEBUG(SIGXCPU) are
standard non-queueable signals, so we should only need a single slot per
signal cause.
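
A sketch of the slot idea, with hypothetical names (the relay block
would have to match the existing xnthread_signal() machinery):

	enum xnsig_cause {
		XNSIG_SLOT_SHADOW_HARDEN,
		XNSIG_SLOT_SHADOW_BACKTRACE,
		XNSIG_SLOT_DEBUG,
		XNSIG_SLOT_TERM,
		XNSIG_SLOT_NR,
	};

	struct xnsig_slot {
		struct pipeline_inband_work inband_work; /* oob -> in-band relay */
		int signo;				 /* SIGDEBUG, SIGSHADOW, ... */
		int sigval;				 /* cause code */
	};

	/* embedded in struct xnthread, one pre-allocated slot per cause:
	   struct xnsig_slot sigslots[XNSIG_SLOT_NR]; */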

-- 
Philippe.



* Re: "rtdm/nrtsig: move inband work description off the stack"
  2021-05-26 13:52                 ` Philippe Gerum
@ 2021-05-31 15:07                   ` Jan Kiszka
  2021-06-02  6:51                     ` Philippe Gerum
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Kiszka @ 2021-05-31 15:07 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Xenomai

On 26.05.21 15:52, Philippe Gerum wrote:
> 
> Jan Kiszka <jan.kiszka@siemens.com> writes:
> 
>> On 25.05.21 15:20, Philippe Gerum wrote:
>>>
>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>
>>>> On 25.05.21 14:37, Philippe Gerum wrote:
>>>>>
>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>
>>>>>> On 25.05.21 12:01, Philippe Gerum wrote:
>>>>>>>
>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>>
>>>>>>>> On 25.05.21 10:46, Philippe Gerum wrote:
>>>>>>>>>
>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>>>>
>>>>>>>>>> Hi Philippe,
>>>>>>>>>>
>>>>>>>>>> [1] makes rtdm_schedule_nrt_work a rather heavyweight service, compared
>>>>>>>>>> to what it was (and even over I-pipe). xnmalloc is nothing you would
>>>>>>>>>> expect from a "send a signal" service, specifically from
>>>>>>>>>> rtdm_nrtsig_pend which does not even make use of the sending extra payload.
>>>>>>>>>>
>>>>>>>>>> Can we do better? Also for xnthread_signal (fd and udd usecases are not
>>>>>>>>>> time-sensitive anyway).
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Nope, I cannot see any significantly better option which would still
>>>>>>>>> allow a common implementation we can map on top of Dovetail / I-pipe.
>>>>>>>>>
>>>>>>>>
>>>>>>>> E.g. pre-allocate the work object for data-free signals and use that -
>>>>>>>> or not send it when it is in use, thus pending.
>>>>>>>>
>>>>>>>
>>>>>>> Ok, let's recap:
>>>>>>>
>>>>>>> - rtdm_nrtsig_pend(): does not allocate anything, merely uses a
>>>>>>>   user-provided memory block, containing a request header. Current
>>>>>>>   callers should either already care for not resending a request
>>>>>>>   uselessly because it is pending (e.g. [1]), or their internal logic
>>>>>>>   should prevent that (e.g. [2]). This interface always convey user data
>>>>>>>   by reference.
>>>>>>
>>>>>> Ah, I fantasized that rtdm_nrtsig_pend would call rtdm_schedule_nrt_work
>>>>>> to do the work, it's the other way around. False alarm here.
>>>>>>
>>>>>>>
>>>>>>> - rtdm_schedule_nrt_work(): does allocate a nrtsig_t struct in order to
>>>>>>>   convey a request block we cannot squeeze into a work_struct, since the
>>>>>>>   latter is in itself an anchor type the user may embed into its own
>>>>>>>   private argument block. I'm unsure ensuring that
>>>>>>>   rtdm_schedule_nrt_work() calls do not pile up would be worth it, as
>>>>>>>   this call is not supposed to be used frenetically on some hot path in
>>>>>>>   the first place. But if we'd want to do that, then we would have to
>>>>>>>   change the signature of rtdm_schedule_nrt_work(), so that we gain a
>>>>>>>   persistent context data we could use for determining whether a nrt
>>>>>>>   work request is in flight. We could not probe the work_struct for that
>>>>>>>   purpose, that would be racy.
>>>>>>>
>>>>>>> Do you see any way to have a smarter rtdm_schedule_nrt_work() without
>>>>>>> changing its signature?
>>>>>>
>>>>>> There is no rule that prevents changing the signature if that is needed.
>>>>>> However, the kernel is fine without allocation as well, using a very
>>>>>> similar signature.
>>>>>>
>>>>>
>>>>> schedule_work() and rtdm_schedule_nrt_work() have a major difference:
>>>>> the latter has to pass on the request from the oob to the in-band
>>>>> context. This is what bugs us and creates the requirement for additional
>>>>> mechanism and data.
>>>>>
>>>>>> I do not yet understand way we need that indirection via the rtdm_nrtsig
>>>>>> on Dovetail. I thought we can trigger work directly from the oob domain
>>>>>> there, can't we? How do you handle such a case in evl so far?
>>>>>>
>>>>>
>>>>> Dovetail can trigger in-band work from oob via the generic irq_work()
>>>>> service, we don't need the extra code involved in the I-pipe for the
>>>>> same purpose. The gist of the matter is about having the same
>>>>> implementation for rtdm_schedule_nrt_work() which works in both Dovetail
>>>>> and I-pipe contexts: the way we do that is by using rtdm_nrtsig_pend()
>>>>> which already abstracts the base mechanism for relaying oob -> in-band
>>>>> signals.
>>>>>
>>>>> On the other hand, EVL has the evl_call_inband[_from]() service, which
>>>>> combines the irq_work and work_struct anchors we need into a single
>>>>> evl_work type, which in turn can be embedded into a user request
>>>>> block. This is what rtdm_schedule_nrt_work() does not expose, so it has
>>>>> to build one internally by combining a dynamic allocation and the
>>>>> user-provided work_struct.
>>>>>
>>>>
>>>> Then enhance the rtdm service to enable such a combination - and be even
>>>> more ready for the switch to Xenomai 4, I would say. We can either add a
>>>> new service and deprecate the old one (while implementing it as you
>>>> suggest) or just break the interface to get the (presumably) few users
>>>> quickly converted.
>>>>
>>>
>>> rtdm_schedule_nrt_work() was added by Gilles when merging the RTnet code
>>> into Xenomai 3 IIRC, with little documentation. Widespread usage of this
>>> call is unlikely, so I'd vote for simplifying the whole thing and
>>> replace it entirely.
>>>
>>
>> OK, then we have a plan for this piece.
>>
>>>>>> And the third case remains xnthread_signal, btw.
>>>>>>
>>>>>
>>>>> xnthread_signal() is used to trigger SIGDEBUG/SIGSHADOW/SIGTERM signals
>>>>> for a thread, a seldom event. Optimizing there would be overkill, unless
>>>>> the application behaves wrongly in the first place.
>>>>
>>>> Some caller are under nklock, thus it makes potentially lengthy critical
>>>> sections now a bit longer.
>>>>
>>>
>>> This is what needs to be fixed in the first place, we should not trigger
>>> a signal relay when holding the ugly big lock.
>>>
>>
>> Whatever is simpler.
>>
>> Another aspect: Using xnmalloc for anything that should better succeed
>> is possibly not good when we are OOM. That is actually no strict
>> regression over I-pipe where we had a limited queue as well IIRC but a
>> reason to avoid carrying that over - and possibly creating new error
>> scenarios that are harder to debug.
>>
> 
> To mitigate the issue of generalized OOM, we could pull memory from a
> local xnheap privately attached to xnthread_signal() along with any
> caller requiring dynamic allocation, so that none of them could deplete
> the sysheap.
> 
> Getting rid of dynamic allocation entirely on the other hand would
> require us to add one or more "signal slots" per possible cause of
> in-band signal into struct xnthread (e.g. SIGSHADOW_ACTION_BACKTRACE,
> SIGDEBUG_MIGRATE_SIGNAL etc), passing the proper slot to
> xnthread_signal(). SIGSHADOW(SIGWINCH) and SIGDEBUG(SIGXCPU) are
> standard non-queueable signals, so we should only need a single slot per
> signal cause.
> 

Can a thread actually trigger multiple causes per SIGDEBUG, e.g.?
Most likely, we only need to carry the single cause via a field in
xnthread from the trigger point to the in-band handler. The same
probably holds for SIGSHADOW_ACTION_BACKTRACE.

Now, the question for me is how we move forward from here. Is it simpler
to fix these issues? Or would it be better to first avoid affecting the
I-pipe cases and live with xnmalloc for an initial phase over Dovetail?

Jan

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux



* Re: "rtdm/nrtsig: move inband work description off the stack"
  2021-05-31 15:07                   ` Jan Kiszka
@ 2021-06-02  6:51                     ` Philippe Gerum
  2021-06-04  6:34                       ` Chen, Hongzhan
  0 siblings, 1 reply; 21+ messages in thread
From: Philippe Gerum @ 2021-06-02  6:51 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai


Jan Kiszka <jan.kiszka@siemens.com> writes:

> On 26.05.21 15:52, Philippe Gerum wrote:
>> 
>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>> 
>>> On 25.05.21 15:20, Philippe Gerum wrote:
>>>>
>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>
>>>>> On 25.05.21 14:37, Philippe Gerum wrote:
>>>>>>
>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>
>>>>>>> On 25.05.21 12:01, Philippe Gerum wrote:
>>>>>>>>
>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>>>
>>>>>>>>> On 25.05.21 10:46, Philippe Gerum wrote:
>>>>>>>>>>
>>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>>>>>
>>>>>>>>>>> Hi Philippe,
>>>>>>>>>>>
>>>>>>>>>>> [1] makes rtdm_schedule_nrt_work a rather heavyweight service, compared
>>>>>>>>>>> to what it was (and even over I-pipe). xnmalloc is nothing you would
>>>>>>>>>>> expect from a "send a signal" service, specifically from
>>>>>>>>>>> rtdm_nrtsig_pend which does not even make use of the sending extra payload.
>>>>>>>>>>>
>>>>>>>>>>> Can we do better? Also for xnthread_signal (fd and udd usecases are not
>>>>>>>>>>> time-sensitive anyway).
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Nope, I cannot see any significantly better option which would still
>>>>>>>>>> allow a common implementation we can map on top of Dovetail / I-pipe.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> E.g. pre-allocate the work object for data-free signals and use that -
>>>>>>>>> or not send it when it is in use, thus pending.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Ok, let's recap:
>>>>>>>>
>>>>>>>> - rtdm_nrtsig_pend(): does not allocate anything, merely uses a
>>>>>>>>   user-provided memory block, containing a request header. Current
>>>>>>>>   callers should either already care for not resending a request
>>>>>>>>   uselessly because it is pending (e.g. [1]), or their internal logic
>>>>>>>>   should prevent that (e.g. [2]). This interface always convey user data
>>>>>>>>   by reference.
>>>>>>>
>>>>>>> Ah, I fantasized that rtdm_nrtsig_pend would call rtdm_schedule_nrt_work
>>>>>>> to do the work, it's the other way around. False alarm here.
>>>>>>>
>>>>>>>>
>>>>>>>> - rtdm_schedule_nrt_work(): does allocate a nrtsig_t struct in order to
>>>>>>>>   convey a request block we cannot squeeze into a work_struct, since the
>>>>>>>>   latter is in itself an anchor type the user may embed into its own
>>>>>>>>   private argument block. I'm unsure ensuring that
>>>>>>>>   rtdm_schedule_nrt_work() calls do not pile up would be worth it, as
>>>>>>>>   this call is not supposed to be used frenetically on some hot path in
>>>>>>>>   the first place. But if we'd want to do that, then we would have to
>>>>>>>>   change the signature of rtdm_schedule_nrt_work(), so that we gain a
>>>>>>>>   persistent context data we could use for determining whether a nrt
>>>>>>>>   work request is in flight. We could not probe the work_struct for that
>>>>>>>>   purpose, that would be racy.
>>>>>>>>
>>>>>>>> Do you see any way to have a smarter rtdm_schedule_nrt_work() without
>>>>>>>> changing its signature?
>>>>>>>
>>>>>>> There is no rule that prevents changing the signature if that is needed.
>>>>>>> However, the kernel is fine without allocation as well, using a very
>>>>>>> similar signature.
>>>>>>>
>>>>>>
>>>>>> schedule_work() and rtdm_schedule_nrt_work() have a major difference:
>>>>>> the latter has to pass on the request from the oob to the in-band
>>>>>> context. This is what bugs us and creates the requirement for additional
>>>>>> mechanism and data.
>>>>>>
>>>>>>> I do not yet understand way we need that indirection via the rtdm_nrtsig
>>>>>>> on Dovetail. I thought we can trigger work directly from the oob domain
>>>>>>> there, can't we? How do you handle such a case in evl so far?
>>>>>>>
>>>>>>
>>>>>> Dovetail can trigger in-band work from oob via the generic irq_work()
>>>>>> service, we don't need the extra code involved in the I-pipe for the
>>>>>> same purpose. The gist of the matter is about having the same
>>>>>> implementation for rtdm_schedule_nrt_work() which works in both Dovetail
>>>>>> and I-pipe contexts: the way we do that is by using rtdm_nrtsig_pend()
>>>>>> which already abstracts the base mechanism for relaying oob -> in-band
>>>>>> signals.
>>>>>>
>>>>>> On the other hand, EVL has the evl_call_inband[_from]() service, which
>>>>>> combines the irq_work and work_struct anchors we need into a single
>>>>>> evl_work type, which in turn can be embedded into a user request
>>>>>> block. This is what rtdm_schedule_nrt_work() does not expose, so it has
>>>>>> to build one internally by combining a dynamic allocation and the
>>>>>> user-provided work_struct.
>>>>>>
>>>>>
>>>>> Then enhance the rtdm service to enable such a combination - and be even
>>>>> more ready for the switch to Xenomai 4, I would say. We can either add a
>>>>> new service and deprecate the old one (while implementing it as you
>>>>> suggest) or just break the interface to get the (presumably) few users
>>>>> quickly converted.
>>>>>
>>>>
>>>> rtdm_schedule_nrt_work() was added by Gilles when merging the RTnet code
>>>> into Xenomai 3 IIRC, with little documentation. Widespread usage of this
>>>> call is unlikely, so I'd vote for simplifying the whole thing and
>>>> replace it entirely.
>>>>
>>>
>>> OK, then we have a plan for this piece.
>>>
>>>>>>> And the third case remains xnthread_signal, btw.
>>>>>>>
>>>>>>
>>>>>> xnthread_signal() is used to trigger SIGDEBUG/SIGSHADOW/SIGTERM signals
>>>>>> for a thread, a seldom event. Optimizing there would be overkill, unless
>>>>>> the application behaves wrongly in the first place.
>>>>>
>>>>> Some caller are under nklock, thus it makes potentially lengthy critical
>>>>> sections now a bit longer.
>>>>>
>>>>
>>>> This is what needs to be fixed in the first place, we should not trigger
>>>> a signal relay when holding the ugly big lock.
>>>>
>>>
>>> Whatever is simpler.
>>>
>>> Another aspect: Using xnmalloc for anything that should better succeed
>>> is possibly not good when we are OOM. That is actually no strict
>>> regression over I-pipe where we had a limited queue as well IIRC but a
>>> reason to avoid carrying that over - and possibly creating new error
>>> scenarios that are harder to debug.
>>>
>> 
>> To mitigate the issue of generalized OOM, we could pull memory from a
>> local xnheap privately attached to xnthread_signal() along with any
>> caller requiring dynamic allocation, so that none of them could deplete
>> the sysheap.
>> 
>> Getting rid of dynamic allocation entirely on the other hand would
>> require us to add one or more "signal slots" per possible cause of
>> in-band signal into struct xnthread (e.g. SIGSHADOW_ACTION_BACKTRACE,
>> SIGDEBUG_MIGRATE_SIGNAL etc), passing the proper slot to
>> xnthread_signal(). SIGSHADOW(SIGWINCH) and SIGDEBUG(SIGXCPU) are
>> standard non-queueable signals, so we should only need a single slot per
>> signal cause.
>> 
>
> Can a thread actually trigger multiple causes per SIGDEBUG, e.g.?
> Likely, we only need to carry the single cause via a field in xnthread
> from the trigger point to the in-band handler. The same is probably the
> case for SIGSHADOW_ACTION_BACKTRACE.
>

Multiple events triggering SIGSHADOW_ACTION_HOME and
SIGSHADOW_ACTION_HARDEN _might_ pile up, although that is unlikely.

> Now, the question for me is how we move forward from this? Is it simpler
> to fix these issues? Or would it be better first avoid affecting I-pipe
> cases and living with xnmalloc for an initial phase over Dovetail?
>

A much better option would be for somebody to volunteer to work on
implementing what has just been described, in order to have the proper
solution available in both I-pipe and Dovetail contexts, based on
generic code.

Although such work may not seem obvious at first sight when reading the
jargon-heavy descriptions above, it is actually fairly simple to
implement. Any contributor is welcome to ask for explanations and
details for that purpose.

-- 
Philippe.



* RE: "rtdm/nrtsig: move inband work description off the stack"
  2021-06-02  6:51                     ` Philippe Gerum
@ 2021-06-04  6:34                       ` Chen, Hongzhan
  2021-06-04  7:34                         ` Jan Kiszka
  2021-06-04  7:51                         ` Philippe Gerum
  0 siblings, 2 replies; 21+ messages in thread
From: Chen, Hongzhan @ 2021-06-04  6:34 UTC (permalink / raw)
  To: Philippe Gerum, Jan Kiszka; +Cc: xenomai


>Jan Kiszka <jan.kiszka@siemens.com> writes:
>
>> On 26.05.21 15:52, Philippe Gerum wrote:
>>> 
>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>> 
>>>> On 25.05.21 15:20, Philippe Gerum wrote:
>>>>>
>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>
>>>>>> On 25.05.21 14:37, Philippe Gerum wrote:
>>>>>>>
>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>>
>>>>>>>> On 25.05.21 12:01, Philippe Gerum wrote:
>>>>>>>>>
>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>>>>
>>>>>>>>>> On 25.05.21 10:46, Philippe Gerum wrote:
>>>>>>>>>>>
>>>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Philippe,
>>>>>>>>>>>>
>>>>>>>>>>>> [1] makes rtdm_schedule_nrt_work a rather heavyweight service, compared
>>>>>>>>>>>> to what it was (and even over I-pipe). xnmalloc is nothing you would
>>>>>>>>>>>> expect from a "send a signal" service, specifically from
>>>>>>>>>>>> rtdm_nrtsig_pend which does not even make use of the sending extra payload.
>>>>>>>>>>>>
>>>>>>>>>>>> Can we do better? Also for xnthread_signal (fd and udd usecases are not
>>>>>>>>>>>> time-sensitive anyway).
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Nope, I cannot see any significantly better option which would still
>>>>>>>>>>> allow a common implementation we can map on top of Dovetail / I-pipe.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> E.g. pre-allocate the work object for data-free signals and use that -
>>>>>>>>>> or not send it when it is in use, thus pending.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Ok, let's recap:
>>>>>>>>>
>>>>>>>>> - rtdm_nrtsig_pend(): does not allocate anything, merely uses a
>>>>>>>>>   user-provided memory block, containing a request header. Current
>>>>>>>>>   callers should either already care for not resending a request
>>>>>>>>>   uselessly because it is pending (e.g. [1]), or their internal logic
>>>>>>>>>   should prevent that (e.g. [2]). This interface always convey user data
>>>>>>>>>   by reference.
>>>>>>>>
>>>>>>>> Ah, I fantasized that rtdm_nrtsig_pend would call rtdm_schedule_nrt_work
>>>>>>>> to do the work, it's the other way around. False alarm here.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> - rtdm_schedule_nrt_work(): does allocate a nrtsig_t struct in order to
>>>>>>>>>   convey a request block we cannot squeeze into a work_struct, since the
>>>>>>>>>   latter is in itself an anchor type the user may embed into its own
>>>>>>>>>   private argument block. I'm unsure ensuring that
>>>>>>>>>   rtdm_schedule_nrt_work() calls do not pile up would be worth it, as
>>>>>>>>>   this call is not supposed to be used frenetically on some hot path in
>>>>>>>>>   the first place. But if we'd want to do that, then we would have to
>>>>>>>>>   change the signature of rtdm_schedule_nrt_work(), so that we gain a
>>>>>>>>>   persistent context data we could use for determining whether a nrt
>>>>>>>>>   work request is in flight. We could not probe the work_struct for that
>>>>>>>>>   purpose, that would be racy.
>>>>>>>>>
>>>>>>>>> Do you see any way to have a smarter rtdm_schedule_nrt_work() without
>>>>>>>>> changing its signature?
>>>>>>>>
>>>>>>>> There is no rule that prevents changing the signature if that is needed.
>>>>>>>> However, the kernel is fine without allocation as well, using a very
>>>>>>>> similar signature.
>>>>>>>>
>>>>>>>
>>>>>>> schedule_work() and rtdm_schedule_nrt_work() have a major difference:
>>>>>>> the latter has to pass on the request from the oob to the in-band
>>>>>>> context. This is what bugs us and creates the requirement for additional
>>>>>>> mechanism and data.
>>>>>>>
>>>>>>>> I do not yet understand why we need that indirection via the rtdm_nrtsig
>>>>>>>> on Dovetail. I thought we can trigger work directly from the oob domain
>>>>>>>> there, can't we? How do you handle such a case in evl so far?
>>>>>>>>
>>>>>>>
>>>>>>> Dovetail can trigger in-band work from oob via the generic irq_work()
>>>>>>> service, we don't need the extra code involved in the I-pipe for the
>>>>>>> same purpose. The gist of the matter is about having the same
>>>>>>> implementation for rtdm_schedule_nrt_work() which works in both Dovetail
>>>>>>> and I-pipe contexts: the way we do that is by using rtdm_nrtsig_pend()
>>>>>>> which already abstracts the base mechanism for relaying oob -> in-band
>>>>>>> signals.
>>>>>>>
>>>>>>> On the other hand, EVL has the evl_call_inband[_from]() service, which
>>>>>>> combines the irq_work and work_struct anchors we need into a single
>>>>>>> evl_work type, which in turn can be embedded into a user request
>>>>>>> block. This is what rtdm_schedule_nrt_work() does not expose, so it has
>>>>>>> to build one internally by combining a dynamic allocation and the
>>>>>>> user-provided work_struct.
>>>>>>>
>>>>>>
>>>>>> Then enhance the rtdm service to enable such a combination - and be even
>>>>>> more ready for the switch to Xenomai 4, I would say. We can either add a
>>>>>> new service and deprecate the old one (while implementing it as you
>>>>>> suggest) or just break the interface to get the (presumably) few users
>>>>>> quickly converted.
>>>>>>
>>>>>
>>>>> rtdm_schedule_nrt_work() was added by Gilles when merging the RTnet code
>>>>> into Xenomai 3 IIRC, with little documentation. Widespread usage of this
>>>>> call is unlikely, so I'd vote for simplifying the whole thing and
>>>>> replace it entirely.
>>>>>
>>>>
>>>> OK, then we have a plan for this piece.

Do we need to implement a new API to replace rtdm_schedule_nrt_work() to avoid calling xnmalloc,
or just keep it as it is in Xenomai 3.2?

>>>>
>>>>>>>> And the third case remains xnthread_signal, btw.
>>>>>>>>
>>>>>>>
>>>>>>> xnthread_signal() is used to trigger SIGDEBUG/SIGSHADOW/SIGTERM signals
>>>>>>> for a thread, a seldom event. Optimizing there would be overkill, unless
>>>>>>> the application behaves wrongly in the first place.
>>>>>>
>>>>>> Some callers are under nklock, thus it makes potentially lengthy critical
>>>>>> sections now a bit longer.
>>>>>>
>>>>>
>>>>> This is what needs to be fixed in the first place, we should not trigger
>>>>> a signal relay when holding the ugly big lock.

Which functions do we need to fix or optimize to avoid calling xnthread_signal when holding the
big lock, if possible? xnthread_suspend and xnthread_cancel?

>>>>>
>>>>
>>>> Whatever is simpler.
>>>>
>>>> Another aspect: Using xnmalloc for anything that should better succeed
>>>> is possibly not good when we are OOM. That is actually no strict
>>>> regression over I-pipe where we had a limited queue as well IIRC but a
>>>> reason to avoid carrying that over - and possibly creating new error
>>>> scenarios that are harder to debug.
>>>>
>>> 
>>> To mitigate the issue of generalized OOM, we could pull memory from a
>>> local xnheap privately attached to xnthread_signal() along with any
>>> caller requiring dynamic allocation, so that none of them could deplete
>>> the sysheap.
>>> 
>>> Getting rid of dynamic allocation entirely on the other hand would
>>> require us to add one or more "signal slots" per possible cause of
>>> in-band signal into struct xnthread (e.g. SIGSHADOW_ACTION_BACKTRACE,
>>> SIGDEBUG_MIGRATE_SIGNAL etc), passing the proper slot to
>>> xnthread_signal(). SIGSHADOW(SIGWINCH) and SIGDEBUG(SIGXCPU) are
>>> standard non-queueable signals, so we should only need a single slot per
>>> signal cause.
>>> 
>>
>> Can a thread actually trigger multiple causes per SIGDEBUG, e.g.?
>> Likely, we only need to carry the single cause via a field in xnthread
>> from the trigger point to the in-band handler. The same is probably the
>> case for SIGSHADOW_ACTION_BACKTRACE.
>>
>
>Multiple events triggering SIGSHADOW_ACTION_HOME and
>SIGSHADOW_ACTION_HARDEN _might_ pile up, although unlikely.

Does that mean that in struct xnthread there should be three slots of
struct lostage_signal, one for each of SIGDEBUG/SIGSHADOW/SIGTERM,
like in patch [1]?

If there is no pile-up for each signal, we would just allocate a fixed
position for the three signals in the array and then handle the
corresponding lostage_signal by signo. Is that enough?

[1]:

+++ b/include/cobalt/kernel/thread.h
@@ -51,6 +51,15 @@ struct xnsched_tpslot;
 struct xnthread_personality;
 struct completion;

+#define XENO_NUM_SIGNALS 3 /* SIGDEBUG/SIGSHADOW/SIGTERM */
+
+struct lostage_signal {
+       struct pipeline_inband_work inband_work; /* Must be first. */
+       struct task_struct *task;
+       int signo, sigval;
+       struct lostage_signal *self; /* Revisit: I-pipe requirement */
+};
+
 struct xnthread_init_attr {
        struct xnthread_personality *personality;
        cpumask_t affinity;
@@ -205,6 +214,7 @@ struct xnthread {
        const char *exe_path;   /* Executable path */
        u32 proghash;           /* Hash value for exe_path */
 #endif
+       struct lostage_signal sigarray[XENO_NUM_SIGNALS];
 };

>
>> Now, the question for me is how we move forward from this? Is it simpler
>> to fix these issues? Or would it be better to first avoid affecting I-pipe
>> cases and live with xnmalloc for an initial phase over Dovetail?
>>
>
>A much better option would be that somebody volunteers to work on
>implementing what has just been described, in order to have the proper
>solution available in both I-pipe and Dovetail contexts, based on generic
>code.
>
>Although such work may not seem obvious at first sight when reading the
>jargonish descriptions above, it is actually fairly simple to
>implement. I believe that any contributor would be welcome to ask for
>explanations and details for that purpose.
>
>-- 
>Philippe.

Regards

Hongzhan Chen


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: "rtdm/nrtsig: move inband work description off the stack"
  2021-06-04  6:34                       ` Chen, Hongzhan
@ 2021-06-04  7:34                         ` Jan Kiszka
  2021-06-04  7:51                         ` Philippe Gerum
  1 sibling, 0 replies; 21+ messages in thread
From: Jan Kiszka @ 2021-06-04  7:34 UTC (permalink / raw)
  To: Chen, Hongzhan, Philippe Gerum; +Cc: xenomai

On 04.06.21 08:34, Chen, Hongzhan wrote:
> 
>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>
>>> On 26.05.21 15:52, Philippe Gerum wrote:
>>>>
>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>
>>>>> On 25.05.21 15:20, Philippe Gerum wrote:
>>>>>>
>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>
>>>>>>> On 25.05.21 14:37, Philippe Gerum wrote:
>>>>>>>>
>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>>>
>>>>>>>>> On 25.05.21 12:01, Philippe Gerum wrote:
>>>>>>>>>>
>>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>>>>>
>>>>>>>>>>> On 25.05.21 10:46, Philippe Gerum wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Philippe,
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1] makes rtdm_schedule_nrt_work a rather heavyweight service, compared
>>>>>>>>>>>>> to what it was (and even over I-pipe). xnmalloc is nothing you would
>>>>>>>>>>>>> expect from a "send a signal" service, specifically from
>>>>>>>>>>>>> rtdm_nrtsig_pend which does not even make use of the sending extra payload.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can we do better? Also for xnthread_signal (fd and udd usecases are not
>>>>>>>>>>>>> time-sensitive anyway).
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Nope, I cannot see any significantly better option which would still
>>>>>>>>>>>> allow a common implementation we can map on top of Dovetail / I-pipe.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> E.g. pre-allocate the work object for data-free signals and use that -
>>>>>>>>>>> or not send it when it is in use, thus pending.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Ok, let's recap:
>>>>>>>>>>
>>>>>>>>>> - rtdm_nrtsig_pend(): does not allocate anything, merely uses a
>>>>>>>>>>   user-provided memory block, containing a request header. Current
>>>>>>>>>>   callers should either already care for not resending a request
>>>>>>>>>>   uselessly because it is pending (e.g. [1]), or their internal logic
>>>>>>>>>>   should prevent that (e.g. [2]). This interface always conveys user data
>>>>>>>>>>   by reference.
>>>>>>>>>
>>>>>>>>> Ah, I fantasized that rtdm_nrtsig_pend would call rtdm_schedule_nrt_work
>>>>>>>>> to do the work, it's the other way around. False alarm here.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> - rtdm_schedule_nrt_work(): does allocate a nrtsig_t struct in order to
>>>>>>>>>>   convey a request block we cannot squeeze into a work_struct, since the
>>>>>>>>>>   latter is in itself an anchor type the user may embed into its own
>>>>>>>>>>   private argument block. I'm unsure ensuring that
>>>>>>>>>>   rtdm_schedule_nrt_work() calls do not pile up would be worth it, as
>>>>>>>>>>   this call is not supposed to be used frenetically on some hot path in
>>>>>>>>>>   the first place. But if we'd want to do that, then we would have to
>>>>>>>>>>   change the signature of rtdm_schedule_nrt_work(), so that we gain a
>>>>>>>>>>   persistent context data we could use for determining whether a nrt
>>>>>>>>>>   work request is in flight. We could not probe the work_struct for that
>>>>>>>>>>   purpose, that would be racy.
>>>>>>>>>>
>>>>>>>>>> Do you see any way to have a smarter rtdm_schedule_nrt_work() without
>>>>>>>>>> changing its signature?
>>>>>>>>>
>>>>>>>>> There is no rule that prevents changing the signature if that is needed.
>>>>>>>>> However, the kernel is fine without allocation as well, using a very
>>>>>>>>> similar signature.
>>>>>>>>>
>>>>>>>>
>>>>>>>> schedule_work() and rtdm_schedule_nrt_work() have a major difference:
>>>>>>>> the latter has to pass on the request from the oob to the in-band
>>>>>>>> context. This is what bugs us and creates the requirement for additional
>>>>>>>> mechanism and data.
>>>>>>>>
>>>>>>>>> I do not yet understand why we need that indirection via the rtdm_nrtsig
>>>>>>>>> on Dovetail. I thought we can trigger work directly from the oob domain
>>>>>>>>> there, can't we? How do you handle such a case in evl so far?
>>>>>>>>>
>>>>>>>>
>>>>>>>> Dovetail can trigger in-band work from oob via the generic irq_work()
>>>>>>>> service, we don't need the extra code involved in the I-pipe for the
>>>>>>>> same purpose. The gist of the matter is about having the same
>>>>>>>> implementation for rtdm_schedule_nrt_work() which works in both Dovetail
>>>>>>>> and I-pipe contexts: the way we do that is by using rtdm_nrtsig_pend()
>>>>>>>> which already abstracts the base mechanism for relaying oob -> in-band
>>>>>>>> signals.
>>>>>>>>
>>>>>>>> On the other hand, EVL has the evl_call_inband[_from]() service, which
>>>>>>>> combines the irq_work and work_struct anchors we need into a single
>>>>>>>> evl_work type, which in turn can be embedded into a user request
>>>>>>>> block. This is what rtdm_schedule_nrt_work() does not expose, so it has
>>>>>>>> to build one internally by combining a dynamic allocation and the
>>>>>>>> user-provided work_struct.
>>>>>>>>
>>>>>>>
>>>>>>> Then enhance the rtdm service to enable such a combination - and be even
>>>>>>> more ready for the switch to Xenomai 4, I would say. We can either add a
>>>>>>> new service and deprecate the old one (while implementing it as you
>>>>>>> suggest) or just break the interface to get the (presumably) few users
>>>>>>> quickly converted.
>>>>>>>
>>>>>>
>>>>>> rtdm_schedule_nrt_work() was added by Gilles when merging the RTnet code
>>>>>> into Xenomai 3 IIRC, with little documentation. Widespread usage of this
>>>>>> call is unlikely, so I'd vote for simplifying the whole thing and
>>>>>> replace it entirely.
>>>>>>
>>>>>
>>>>> OK, then we have a plan for this piece.
> 
> Do we need to implement a new API to replace rtdm_schedule_nrt_work() to avoid calling xnmalloc,
> or just keep it as it is in Xenomai 3.2?
> 

As Philippe said, we likely do not have many users - if any at all -
outside of the core. So, let's refactor it, forcing users to migrate
to the new version in 3.2.
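
Just to sketch the direction (rtdm_nrt_work, rtdm_nrt_work_init and
rtdm_nrt_work_schedule are made-up names here, nothing agreed on):
letting the caller embed both anchors would remove the xnmalloc from
the signaling path, e.g.

struct rtdm_nrt_work {
	rtdm_nrtsig_t nrtsig;		/* oob -> in-band relay anchor */
	struct work_struct work;	/* regular in-band work anchor */
};

static void nrt_work_relay(rtdm_nrtsig_t *nrtsig, void *arg)
{
	struct rtdm_nrt_work *nw = arg;

	schedule_work(&nw->work);	/* we run in-band at this point */
}

static inline void rtdm_nrt_work_init(struct rtdm_nrt_work *nw,
				      work_func_t handler)
{
	INIT_WORK(&nw->work, handler);
	rtdm_nrtsig_init(&nw->nrtsig, nrt_work_relay, nw);
}

/* Callable from oob context; nw must stay valid until the work
 * handler has run. */
static inline void rtdm_nrt_work_schedule(struct rtdm_nrt_work *nw)
{
	rtdm_nrtsig_pend(&nw->nrtsig);
}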

>>>>>
>>>>>>>>> And the third case remains xnthread_signal, btw.
>>>>>>>>>
>>>>>>>>
>>>>>>>> xnthread_signal() is used to trigger SIGDEBUG/SIGSHADOW/SIGTERM signals
>>>>>>>> for a thread, a seldom event. Optimizing there would be overkill, unless
>>>>>>>> the application behaves wrongly in the first place.
>>>>>>>
>>>>>>> Some callers are under nklock, thus it makes potentially lengthy critical
>>>>>>> sections now a bit longer.
>>>>>>>
>>>>>>
>>>>>> This is what needs to be fixed in the first place, we should not trigger
>>>>>> a signal relay when holding the ugly big lock.
> 
> Which functions do we need to fix or optimize to avoid calling xnthread_signal when holding the
> big lock, if possible? xnthread_suspend and xnthread_cancel?
> 

Let's postpone the nklock avoidance. Some cases are related to debugging
anyway and not easy to break up (e.g. xnsynch_detect_relaxed_owner), and
the others will get mitigated by avoiding xnmalloc - at least they will
not get worse than what we have now.

>>>>>>
>>>>>
>>>>> Whatever is simpler.
>>>>>
>>>>> Another aspect: Using xnmalloc for anything that should better succeed
>>>>> is possibly not good when we are OOM. That is actually no strict
>>>>> regression over I-pipe where we had a limited queue as well IIRC but a
>>>>> reason to avoid carrying that over - and possibly creating new error
>>>>> scenarios that are harder to debug.
>>>>>
>>>>
>>>> To mitigate the issue of generalized OOM, we could pull memory from a
>>>> local xnheap privately attached to xnthread_signal() along with any
>>>> caller requiring dynamic allocation, so that none of them could deplete
>>>> the sysheap.
>>>>
>>>> Getting rid of dynamic allocation entirely on the other hand would
>>>> require us to add one or more "signal slots" per possible cause of
>>>> in-band signal into struct xnthread (e.g. SIGSHADOW_ACTION_BACKTRACE,
>>>> SIGDEBUG_MIGRATE_SIGNAL etc), passing the proper slot to
>>>> xnthread_signal(). SIGSHADOW(SIGWINCH) and SIGDEBUG(SIGXCPU) are
>>>> standard non-queueable signals, so we should only need a single slot per
>>>> signal cause.
>>>>
>>>
>>> Can a thread actually trigger multiple causes per SIGDEBUG, e.g.?
>>> Likely, we only need to carry the single cause via a field in xnthread
>>> from the trigger point to the in-band handler. The same is probably the
>>> case for SIGSHADOW_ACTION_BACKTRACE.
>>>
>>
>> Multiple events triggering SIGSHADOW_ACTION_HOME and
>> SIGSHADOW_ACTION_HARDEN _might_ pile up, although unlikely.
> 
> Does that mean that in struct xnthread there should be three slots of
> struct lostage_signal, one for each of SIGDEBUG/SIGSHADOW/SIGTERM,
> like in patch [1]?
>
> If there is no pile-up for each signal, we would just allocate a fixed
> position for the three signals in the array and then handle the
> corresponding lostage_signal by signo. Is that enough?
> 

One pending flag for SIGDEBUG and one field for its first reason, one
for SIGTERM and one for each SIGSHADOW action. That is my current
understanding of things that could be pending in parallel for a thread.
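
In code terms, that could boil down to per-thread state like this
(a sketch only, all names invented here):

/* One bit per notification that may be pending in parallel. */
#define XNSIGPEND_SIGDEBUG	0	/* reason kept in sigdebug_reason */
#define XNSIGPEND_SIGTERM	1
#define XNSIGPEND_SH_HARDEN	2	/* SIGSHADOW_ACTION_HARDEN */
#define XNSIGPEND_SH_BACKTRACE	3	/* SIGSHADOW_ACTION_BACKTRACE */
#define XNSIGPEND_SH_HOME	4	/* SIGSHADOW_ACTION_HOME */

struct xnthread_sigstate {
	unsigned long pending;		/* XNSIGPEND_* bits */
	int sigdebug_reason;		/* first SIGDEBUG_* cause only */
};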

Jan

> [1]:
> 
> +++ b/include/cobalt/kernel/thread.h
> @@ -51,6 +51,15 @@ struct xnsched_tpslot;
>  struct xnthread_personality;
>  struct completion;
> 
> +#define XENO_NUM_SIGNALS 3 /* SIGDEBUG/SIGSHADOW/SIGTERM */
> +
> +struct lostage_signal {
> +       struct pipeline_inband_work inband_work; /* Must be first. */
> +       struct task_struct *task;
> +       int signo, sigval;
> +       struct lostage_signal *self; /* Revisit: I-pipe requirement */
> +};
> +
>  struct xnthread_init_attr {
>         struct xnthread_personality *personality;
>         cpumask_t affinity;
> @@ -205,6 +214,7 @@ struct xnthread {
>         const char *exe_path;   /* Executable path */
>         u32 proghash;           /* Hash value for exe_path */
>  #endif
> +       struct lostage_signal sigarray[XENO_NUM_SIGNALS];
>  };
> 
>>
>>> Now, the question for me is how we move forward from this? Is it simpler
>>> to fix these issues? Or would it be better to first avoid affecting I-pipe
>>> cases and live with xnmalloc for an initial phase over Dovetail?
>>>
>>
>> A much better option would be that somebody volunteers to work on
>> implementing what has just been described, in order to have the proper
>> solution available in both I-pipe and Dovetail contexts, based on generic
>> code.
>>
>> Although such work may not seem obvious at first sight when reading the
>> jargonish descriptions above, it is actually fairly simple to
>> implement. I believe that any contributor would be welcome to ask for
>> explanations and details for that purpose.
>>
>> -- 
>> Philippe.
> 
> Regards
> 
> Hongzhan Chen
> 

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: "rtdm/nrtsig: move inband work description off the stack"
  2021-06-04  6:34                       ` Chen, Hongzhan
  2021-06-04  7:34                         ` Jan Kiszka
@ 2021-06-04  7:51                         ` Philippe Gerum
  2021-06-04  7:54                           ` Jan Kiszka
  1 sibling, 1 reply; 21+ messages in thread
From: Philippe Gerum @ 2021-06-04  7:51 UTC (permalink / raw)
  To: Chen, Hongzhan; +Cc: Jan Kiszka, xenomai


Chen, Hongzhan <hongzhan.chen@intel.com> writes:

>>Jan Kiszka <jan.kiszka@siemens.com> writes:
>>
>>> On 26.05.21 15:52, Philippe Gerum wrote:
>>>> 
>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>> 
>>>>> On 25.05.21 15:20, Philippe Gerum wrote:
>>>>>>
>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>
>>>>>>> On 25.05.21 14:37, Philippe Gerum wrote:
>>>>>>>>
>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>>>
>>>>>>>>> On 25.05.21 12:01, Philippe Gerum wrote:
>>>>>>>>>>
>>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>>>>>
>>>>>>>>>>> On 25.05.21 10:46, Philippe Gerum wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Philippe,
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1] makes rtdm_schedule_nrt_work a rather heavyweight service, compared
>>>>>>>>>>>>> to what it was (and even over I-pipe). xnmalloc is nothing you would
>>>>>>>>>>>>> expect from a "send a signal" service, specifically from
>>>>>>>>>>>>> rtdm_nrtsig_pend which does not even make use of the sending extra payload.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can we do better? Also for xnthread_signal (fd and udd usecases are not
>>>>>>>>>>>>> time-sensitive anyway).
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Nope, I cannot see any significantly better option which would still
>>>>>>>>>>>> allow a common implementation we can map on top of Dovetail / I-pipe.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> E.g. pre-allocate the work object for data-free signals and use that -
>>>>>>>>>>> or not send it when it is in use, thus pending.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Ok, let's recap:
>>>>>>>>>>
>>>>>>>>>> - rtdm_nrtsig_pend(): does not allocate anything, merely uses a
>>>>>>>>>>   user-provided memory block, containing a request header. Current
>>>>>>>>>>   callers should either already care for not resending a request
>>>>>>>>>>   uselessly because it is pending (e.g. [1]), or their internal logic
>>>>>>>>>>   should prevent that (e.g. [2]). This interface always conveys user data
>>>>>>>>>>   by reference.
>>>>>>>>>
>>>>>>>>> Ah, I fantasized that rtdm_nrtsig_pend would call rtdm_schedule_nrt_work
>>>>>>>>> to do the work, it's the other way around. False alarm here.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> - rtdm_schedule_nrt_work(): does allocate a nrtsig_t struct in order to
>>>>>>>>>>   convey a request block we cannot squeeze into a work_struct, since the
>>>>>>>>>>   latter is in itself an anchor type the user may embed into its own
>>>>>>>>>>   private argument block. I'm unsure ensuring that
>>>>>>>>>>   rtdm_schedule_nrt_work() calls do not pile up would be worth it, as
>>>>>>>>>>   this call is not supposed to be used frenetically on some hot path in
>>>>>>>>>>   the first place. But if we'd want to do that, then we would have to
>>>>>>>>>>   change the signature of rtdm_schedule_nrt_work(), so that we gain a
>>>>>>>>>>   persistent context data we could use for determining whether a nrt
>>>>>>>>>>   work request is in flight. We could not probe the work_struct for that
>>>>>>>>>>   purpose, that would be racy.
>>>>>>>>>>
>>>>>>>>>> Do you see any way to have a smarter rtdm_schedule_nrt_work() without
>>>>>>>>>> changing its signature?
>>>>>>>>>
>>>>>>>>> There is no rule that prevents changing the signature if that is needed.
>>>>>>>>> However, the kernel is fine without allocation as well, using a very
>>>>>>>>> similar signature.
>>>>>>>>>
>>>>>>>>
>>>>>>>> schedule_work() and rtdm_schedule_nrt_work() have a major difference:
>>>>>>>> the latter has to pass on the request from the oob to the in-band
>>>>>>>> context. This is what bugs us and creates the requirement for additional
>>>>>>>> mechanism and data.
>>>>>>>>
>>>>>>>>> I do not yet understand why we need that indirection via the rtdm_nrtsig
>>>>>>>>> on Dovetail. I thought we can trigger work directly from the oob domain
>>>>>>>>> there, can't we? How do you handle such a case in evl so far?
>>>>>>>>>
>>>>>>>>
>>>>>>>> Dovetail can trigger in-band work from oob via the generic irq_work()
>>>>>>>> service, we don't need the extra code involved in the I-pipe for the
>>>>>>>> same purpose. The gist of the matter is about having the same
>>>>>>>> implementation for rtdm_schedule_nrt_work() which works in both Dovetail
>>>>>>>> and I-pipe contexts: the way we do that is by using rtdm_nrtsig_pend()
>>>>>>>> which already abstracts the base mechanism for relaying oob -> in-band
>>>>>>>> signals.
>>>>>>>>
>>>>>>>> On the other hand, EVL has the evl_call_inband[_from]() service, which
>>>>>>>> combines the irq_work and work_struct anchors we need into a single
>>>>>>>> evl_work type, which in turn can be embedded into a user request
>>>>>>>> block. This is what rtdm_schedule_nrt_work() does not expose, so it has
>>>>>>>> to build one internally by combining a dynamic allocation and the
>>>>>>>> user-provided work_struct.
>>>>>>>>
>>>>>>>
>>>>>>> Then enhance the rtdm service to enable such a combination - and be even
>>>>>>> more ready for the switch to Xenomai 4, I would say. We can either add a
>>>>>>> new service and deprecate the old one (while implementing it as you
>>>>>>> suggest) or just break the interface to get the (presumably) few users
>>>>>>> quickly converted.
>>>>>>>
>>>>>>
>>>>>> rtdm_schedule_nrt_work() was added by Gilles when merging the RTnet code
>>>>>> into Xenomai 3 IIRC, with little documentation. Widespread usage of this
>>>>>> call is unlikely, so I'd vote for simplifying the whole thing and
>>>>>> replace it entirely.
>>>>>>
>>>>>
>>>>> OK, then we have a plan for this piece.
>
> Do we need to implement a new API to replace rtdm_schedule_nrt_work() to avoid calling xnmalloc,
> or just keep it as it is in Xenomai 3.2?
>
>>>>>
>>>>>>>>> And the third case remains xnthread_signal, btw.
>>>>>>>>>
>>>>>>>>
>>>>>>>> xnthread_signal() is used to trigger SIGDEBUG/SIGSHADOW/SIGTERM signals
>>>>>>>> for a thread, a seldom event. Optimizing there would be overkill, unless
>>>>>>>> the application behaves wrongly in the first place.
>>>>>>>
>>>>>>> Some callers are under nklock, thus it makes potentially lengthy critical
>>>>>>> sections now a bit longer.
>>>>>>>
>>>>>>
>>>>>> This is what needs to be fixed in the first place, we should not trigger
>>>>>> a signal relay when holding the ugly big lock.
>
> Which functions do we need to fix or optimize to avoid calling xnthread_signal when holding the
> big lock, if possible? xnthread_suspend and xnthread_cancel?
>
>>>>>>
>>>>>
>>>>> Whatever is simpler.
>>>>>
>>>>> Another aspect: Using xnmalloc for anything that should better succeed
>>>>> is possibly not good when we are OOM. That is actually no strict
>>>>> regression over I-pipe where we had a limited queue as well IIRC but a
>>>>> reason to avoid carrying that over - and possibly creating new error
>>>>> scenarios that are harder to debug.
>>>>>
>>>> 
>>>> To mitigate the issue of generalized OOM, we could pull memory from a
>>>> local xnheap privately attached to xnthread_signal() along with any
>>>> caller requiring dynamic allocation, so that none of them could deplete
>>>> the sysheap.
>>>> 
>>>> Getting rid of dynamic allocation entirely on the other hand would
>>>> require us to add one or more "signal slots" per possible cause of
>>>> in-band signal into struct xnthread (e.g. SIGSHADOW_ACTION_BACKTRACE,
>>>> SIGDEBUG_MIGRATE_SIGNAL etc), passing the proper slot to
>>>> xnthread_signal(). SIGSHADOW(SIGWINCH) and SIGDEBUG(SIGXCPU) are
>>>> standard non-queueable signals, so we should only need a single slot per
>>>> signal cause.
>>>> 
>>>
>>> Can a thread actually trigger multiple causes per SIGDEBUG, e.g.?
>>> Likely, we only need to carry the single cause via a field in xnthread
>>> from the trigger point to the in-band handler. The same is probably the
>>> case for SIGSHADOW_ACTION_BACKTRACE.
>>>
>>
>>Multiple events triggering SIGSHADOW_ACTION_HOME and
>>SIGSHADOW_ACTION_HARDEN _might_ pile up, although unlikely.
>
> Does that mean that in struct xnthread there should be three slots of
> struct lostage_signal, one for each of SIGDEBUG/SIGSHADOW/SIGTERM,
> like in patch [1]?
>
> If there is no pile-up for each signal, we would just allocate a fixed
> position for the three signals in the array and then handle the
> corresponding lostage_signal by signo. Is that enough?
>
> [1]:
>
> +++ b/include/cobalt/kernel/thread.h
> @@ -51,6 +51,15 @@ struct xnsched_tpslot;
>  struct xnthread_personality;
>  struct completion;
>
> +#define XENO_NUM_SIGNALS 3 /* SIGDEBUG/SIGSHADOW/SIGTERM */
> +
> +struct lostage_signal {
> +       struct pipeline_inband_work inband_work; /* Must be first. */
> +       struct task_struct *task;
> +       int signo, sigval;
> +       struct lostage_signal *self; /* Revisit: I-pipe requirement */
> +};
> +
>  struct xnthread_init_attr {
>         struct xnthread_personality *personality;
>         cpumask_t affinity;
> @@ -205,6 +214,7 @@ struct xnthread {
>         const char *exe_path;   /* Executable path */
>         u32 proghash;           /* Hash value for exe_path */
>  #endif
> +       struct lostage_signal sigarray[XENO_NUM_SIGNALS];
>  };
>

I would have as many entries as we have distinct reasons for notifying
the application, i.e. in include/cobalt/uapi/signal.h:

#define SIGSHADOW_ACTION_HARDEN		1
#define SIGSHADOW_ACTION_BACKTRACE	2
#define SIGSHADOW_ACTION_HOME		3
#define SIGDEBUG_MIGRATE_SIGNAL		1
#define SIGDEBUG_MIGRATE_SYSCALL	2
#define SIGDEBUG_MIGRATE_FAULT		3
#define SIGDEBUG_MIGRATE_PRIOINV	4
#define SIGDEBUG_NOMLOCK		5
#define SIGDEBUG_WATCHDOG		6
#define SIGDEBUG_RESCNT_IMBALANCE	7
#define SIGDEBUG_LOCK_BREAK		8
#define SIGDEBUG_MUTEX_SLEEP		9

plus a dedicated entry for sending SIGTERM.

The reason is that some combinations of events may be set pending
concurrently for a given task,
e.g. SIGSHADOW_ACTION_BACKTRACE _and_ any of SIGDEBUG_MIGRATE_*.

The irq_work interface Dovetail uses allows us to have multiple pending
events, as long as they are represented by distinct work structs (hence
the requirement to have a dedicated inband_work struct per event
type). Likewise, the I-pipe allows multiple events to be set pending by
logging each event into an internal buffer for later delivery to the
notified task.

Now the question is: what about a particular event being notified
multiple times in a row by the core before userland had a chance to
receive the first notification? Could we do that, and would it matter
if we only received a single notification for a series of identical
events? E.g. SIGDEBUG_MIGRATE_SYSCALL x 2, with the second one being
ignored as already pending.

This case should never happen for any cause which is paired with a
transition from oob -> in-band context like SIGDEBUG_MIGRATE_*, by
design. Some causes even filter out multiple triggers explicitly, such
as SIGDEBUG_MUTEX_SLEEP. Other actions are idempotent
(e.g. SIGSHADOW_ACTION_HARDEN, _HOME).
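
For illustration, this is how a per-event pending bit would drop such
duplicates (a sketch: the sigpending word and relay_inband_signal()
are hypothetical, sigarray[] refers to Hongzhan's draft above):

static bool signal_slot_pend(struct xnthread *thread, int slot)
{
	/* Atomic test-and-set, safe from oob context. */
	if (test_and_set_bit(slot, &thread->sigpending))
		return false;	/* already in flight, drop duplicate */

	relay_inband_signal(&thread->sigarray[slot]);
	return true;
}

The in-band handler would clear the bit before delivering the signal,
so any event raised afterwards triggers a fresh relay.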

-- 
Philippe.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: "rtdm/nrtsig: move inband work description off the stack"
  2021-06-04  7:51                         ` Philippe Gerum
@ 2021-06-04  7:54                           ` Jan Kiszka
  2021-06-04  8:15                             ` Philippe Gerum
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Kiszka @ 2021-06-04  7:54 UTC (permalink / raw)
  To: Philippe Gerum, Chen, Hongzhan; +Cc: xenomai

On 04.06.21 09:51, Philippe Gerum wrote:
> 
> Chen, Hongzhan <hongzhan.chen@intel.com> writes:
> 
>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>
>>>> On 26.05.21 15:52, Philippe Gerum wrote:
>>>>>
>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>
>>>>>> On 25.05.21 15:20, Philippe Gerum wrote:
>>>>>>>
>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>>
>>>>>>>> On 25.05.21 14:37, Philippe Gerum wrote:
>>>>>>>>>
>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>>>>
>>>>>>>>>> On 25.05.21 12:01, Philippe Gerum wrote:
>>>>>>>>>>>
>>>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>>>>>>
>>>>>>>>>>>> On 25.05.21 10:46, Philippe Gerum wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Philippe,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [1] makes rtdm_schedule_nrt_work a rather heavyweight service, compared
>>>>>>>>>>>>>> to what it was (and even over I-pipe). xnmalloc is nothing you would
>>>>>>>>>>>>>> expect from a "send a signal" service, specifically from
>>>>>>>>>>>>>> rtdm_nrtsig_pend which does not even make use of the sending extra payload.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can we do better? Also for xnthread_signal (fd and udd usecases are not
>>>>>>>>>>>>>> time-sensitive anyway).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Nope, I cannot see any significantly better option which would still
>>>>>>>>>>>>> allow a common implementation we can map on top of Dovetail / I-pipe.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> E.g. pre-allocate the work object for data-free signals and use that -
>>>>>>>>>>>> or not send it when it is in use, thus pending.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Ok, let's recap:
>>>>>>>>>>>
>>>>>>>>>>> - rtdm_nrtsig_pend(): does not allocate anything, merely uses a
>>>>>>>>>>>   user-provided memory block, containing a request header. Current
>>>>>>>>>>>   callers should either already care for not resending a request
>>>>>>>>>>>   uselessly because it is pending (e.g. [1]), or their internal logic
>>>>>>>>>>>   should prevent that (e.g. [2]). This interface always conveys user data
>>>>>>>>>>>   by reference.
>>>>>>>>>>
>>>>>>>>>> Ah, I fantasized that rtdm_nrtsig_pend would call rtdm_schedule_nrt_work
>>>>>>>>>> to do the work, it's the other way around. False alarm here.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> - rtdm_schedule_nrt_work(): does allocate a nrtsig_t struct in order to
>>>>>>>>>>>   convey a request block we cannot squeeze into a work_struct, since the
>>>>>>>>>>>   latter is in itself an anchor type the user may embed into its own
>>>>>>>>>>>   private argument block. I'm unsure ensuring that
>>>>>>>>>>>   rtdm_schedule_nrt_work() calls do not pile up would be worth it, as
>>>>>>>>>>>   this call is not supposed to be used frenetically on some hot path in
>>>>>>>>>>>   the first place. But if we'd want to do that, then we would have to
>>>>>>>>>>>   change the signature of rtdm_schedule_nrt_work(), so that we gain a
>>>>>>>>>>>   persistent context data we could use for determining whether a nrt
>>>>>>>>>>>   work request is in flight. We could not probe the work_struct for that
>>>>>>>>>>>   purpose, that would be racy.
>>>>>>>>>>>
>>>>>>>>>>> Do you see any way to have a smarter rtdm_schedule_nrt_work() without
>>>>>>>>>>> changing its signature?
>>>>>>>>>>
>>>>>>>>>> There is no rule that prevents changing the signature if that is needed.
>>>>>>>>>> However, the kernel is fine without allocation as well, using a very
>>>>>>>>>> similar signature.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> schedule_work() and rtdm_schedule_nrt_work() have a major difference:
>>>>>>>>> the latter has to pass on the request from the oob to the in-band
>>>>>>>>> context. This is what bugs us and creates the requirement for additional
>>>>>>>>> mechanism and data.
>>>>>>>>>
>>>>>>>>>> I do not yet understand why we need that indirection via the rtdm_nrtsig
>>>>>>>>>> on Dovetail. I thought we can trigger work directly from the oob domain
>>>>>>>>>> there, can't we? How do you handle such a case in evl so far?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Dovetail can trigger in-band work from oob via the generic irq_work()
>>>>>>>>> service, we don't need the extra code involved in the I-pipe for the
>>>>>>>>> same purpose. The gist of the matter is about having the same
>>>>>>>>> implementation for rtdm_schedule_nrt_work() which works in both Dovetail
>>>>>>>>> and I-pipe contexts: the way we do that is by using rtdm_nrtsig_pend()
>>>>>>>>> which already abstracts the base mechanism for relaying oob -> in-band
>>>>>>>>> signals.
>>>>>>>>>
>>>>>>>>> On the other hand, EVL has the evl_call_inband[_from]() service, which
>>>>>>>>> combines the irq_work and work_struct anchors we need into a single
>>>>>>>>> evl_work type, which in turn can be embedded into a user request
>>>>>>>>> block. This is what rtdm_schedule_nrt_work() does not expose, so it has
>>>>>>>>> to build one internally by combining a dynamic allocation and the
>>>>>>>>> user-provided work_struct.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Then enhance the rtdm service to enable such a combination - and be even
>>>>>>>> more ready for the switch to Xenomai 4, I would say. We can either add a
>>>>>>>> new service and deprecate the old one (while implementing it as you
>>>>>>>> suggest) or just break the interface to get the (presumably) few users
>>>>>>>> quickly converted.
>>>>>>>>
>>>>>>>
>>>>>>> rtdm_schedule_nrt_work() was added by Gilles when merging the RTnet code
>>>>>>> into Xenomai 3 IIRC, with little documentation. Widespread usage of this
>>>>>>> call is unlikely, so I'd vote for simplifying the whole thing and
>>>>>>> replace it entirely.
>>>>>>>
>>>>>>
>>>>>> OK, then we have a plan for this piece.
>>
>> Do we need to implement a new API to replace rtdm_schedule_nrt_work() to avoid calling xnmalloc,
>> or just keep it as it is in Xenomai 3.2?
>>
>>>>>>
>>>>>>>>>> And the third case remains xnthread_signal, btw.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> xnthread_signal() is used to trigger SIGDEBUG/SIGSHADOW/SIGTERM signals
>>>>>>>>> for a thread, a seldom event. Optimizing there would be overkill, unless
>>>>>>>>> the application behaves wrongly in the first place.
>>>>>>>>
>>>>>>>> Some callers are under nklock, thus it makes potentially lengthy critical
>>>>>>>> sections now a bit longer.
>>>>>>>>
>>>>>>>
>>>>>>> This is what needs to be fixed in the first place, we should not trigger
>>>>>>> a signal relay when holding the ugly big lock.
>>
>> Which functions do we need to fix or optimize to avoid calling xnthread_signal when holding the
>> big lock, if possible? xnthread_suspend and xnthread_cancel?
>>
>>>>>>>
>>>>>>
>>>>>> Whatever is simpler.
>>>>>>
>>>>>> Another aspect: Using xnmalloc for anything that should better succeed
>>>>>> is possibly not good when we are OOM. That is actually no strict
>>>>>> regression over I-pipe where we had a limited queue as well IIRC but a
>>>>>> reason to avoid carrying that over - and possibly creating new error
>>>>>> scenarios that are harder to debug.
>>>>>>
>>>>>
>>>>> To mitigate the issue of generalized OOM, we could pull memory from a
>>>>> local xnheap privately attached to xnthread_signal() along with any
>>>>> caller requiring dynamic allocation, so that none of them could deplete
>>>>> the sysheap.
>>>>>
>>>>> Getting rid of dynamic allocation entirely on the other hand would
>>>>> require us to add one or more "signal slots" per possible cause of
>>>>> in-band signal into struct xnthread (e.g. SIGSHADOW_ACTION_BACKTRACE,
>>>>> SIGDEBUG_MIGRATE_SIGNAL etc), passing the proper slot to
>>>>> xnthread_signal(). SIGSHADOW(SIGWINCH) and SIGDEBUG(SIGXCPU) are
>>>>> standard non-queueable signals, so we should only need a single slot per
>>>>> signal cause.
>>>>>
>>>>
>>>> Can a thread actually trigger multiple causes per SIGDEBUG, e.g.?
>>>> Likely, we only need to carry the single cause via a field in xnthread
>>>> from the trigger point to the in-band handler. The same is probably the
>>>> case for SIGSHADOW_ACTION_BACKTRACE.
>>>>
>>>
>>> Multiple events triggering SIGSHADOW_ACTION_HOME and
>>> SIGSHADOW_ACTION_HARDEN _might_ pile up, although unlikely.
>>
>> Does that mean that in struct xnthread there should be three slots of
>> struct lostage_signal, one for each of SIGDEBUG/SIGSHADOW/SIGTERM,
>> like in patch [1]?
>>
>> If there is no pile-up for each signal, we would just allocate a fixed
>> position for the three signals in the array and then handle the
>> corresponding lostage_signal by signo. Is that enough?
>>
>> [1]:
>>
>> +++ b/include/cobalt/kernel/thread.h
>> @@ -51,6 +51,15 @@ struct xnsched_tpslot;
>>  struct xnthread_personality;
>>  struct completion;
>>
>> +#define XENO_NUM_SIGNALS 3 /* SIGDEBUG/SIGSHADOW/SIGTERM */
>> +
>> +struct lostage_signal {
>> +       struct pipeline_inband_work inband_work; /* Must be first. */
>> +       struct task_struct *task;
>> +       int signo, sigval;
>> +       struct lostage_signal *self; /* Revisit: I-pipe requirement */
>> +};
>> +
>>  struct xnthread_init_attr {
>>         struct xnthread_personality *personality;
>>         cpumask_t affinity;
>> @@ -205,6 +214,7 @@ struct xnthread {
>>         const char *exe_path;   /* Executable path */
>>         u32 proghash;           /* Hash value for exe_path */
>>  #endif
>> +       struct lostage_signal sigarray[XENO_NUM_SIGNALS];
>>  };
>>
> 
> I would have as many entries as we have distinct reasons for notifying
> the application, i.e. in include/cobalt/uapi/signal.h:
> 
> #define SIGSHADOW_ACTION_HARDEN		1
> #define SIGSHADOW_ACTION_BACKTRACE	2
> #define SIGSHADOW_ACTION_HOME		3
> #define SIGDEBUG_MIGRATE_SIGNAL		1
> #define SIGDEBUG_MIGRATE_SYSCALL	2
> #define SIGDEBUG_MIGRATE_FAULT		3
> #define SIGDEBUG_MIGRATE_PRIOINV	4
> #define SIGDEBUG_NOMLOCK		5
> #define SIGDEBUG_WATCHDOG		6
> #define SIGDEBUG_RESCNT_IMBALANCE	7
> #define SIGDEBUG_LOCK_BREAK		8
> #define SIGDEBUG_MUTEX_SLEEP		9

For SIGDEBUG, I believe we only need the first reason, as I wrote in the
other reply. We are interested in the *initial* reason for a thread to
migrate. Once it is migrated, it will consume that and only generate a
new reason after going back.
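
I.e. conceptually, something like this at the trigger point (a sketch;
the sigdebug_reason field is invented, and since the SIGDEBUG_* causes
start at 1, 0 can stand for "none recorded"):

	/* Record only the initial cause; delivery consumes it. */
	if (cmpxchg(&thread->sigdebug_reason, 0, reason) != 0)
		return;		/* first reason already recorded */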

Jan

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: "rtdm/nrtsig: move inband work description off the stack"
  2021-06-04  7:54                           ` Jan Kiszka
@ 2021-06-04  8:15                             ` Philippe Gerum
  2021-06-04  8:21                               ` Jan Kiszka
  2021-06-04  8:36                               ` Chen, Hongzhan
  0 siblings, 2 replies; 21+ messages in thread
From: Philippe Gerum @ 2021-06-04  8:15 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Chen, Hongzhan, xenomai


Jan Kiszka <jan.kiszka@siemens.com> writes:

> On 04.06.21 09:51, Philippe Gerum wrote:
>> 
>> Chen, Hongzhan <hongzhan.chen@intel.com> writes:
>> 
>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>
>>>>> On 26.05.21 15:52, Philippe Gerum wrote:
>>>>>>
>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>
>>>>>>> On 25.05.21 15:20, Philippe Gerum wrote:
>>>>>>>>
>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>>>
>>>>>>>>> On 25.05.21 14:37, Philippe Gerum wrote:
>>>>>>>>>>
>>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>>>>>
>>>>>>>>>>> On 25.05.21 12:01, Philippe Gerum wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>>>>>>>
>>>>>>>>>>>>> On 25.05.21 10:46, Philippe Gerum wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Philippe,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [1] makes rtdm_schedule_nrt_work a rather heavyweight service, compared
>>>>>>>>>>>>>>> to what it was (and even over I-pipe). xnmalloc is nothing you would
>>>>>>>>>>>>>>> expect from a "send a signal" service, specifically from
>>>>>>>>>>>>>>> rtdm_nrtsig_pend which does not even make use of the sending extra payload.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Can we do better? Also for xnthread_signal (fd and udd usecases are not
>>>>>>>>>>>>>>> time-sensitive anyway).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Nope, I cannot see any significantly better option which would still
>>>>>>>>>>>>>> allow a common implementation we can map on top of Dovetail / I-pipe.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> E.g. pre-allocate the work object for data-free signals and use that -
>>>>>>>>>>>>> or not send it when it is in use, thus pending.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Ok, let's recap:
>>>>>>>>>>>>
>>>>>>>>>>>> - rtdm_nrtsig_pend(): does not allocate anything, merely uses a
>>>>>>>>>>>>   user-provided memory block, containing a request header. Current
>>>>>>>>>>>>   callers should either already care for not resending a request
>>>>>>>>>>>>   uselessly because it is pending (e.g. [1]), or their internal logic
>>>>>>>>>>>>   should prevent that (e.g. [2]). This interface always conveys user data
>>>>>>>>>>>>   by reference.
>>>>>>>>>>>
>>>>>>>>>>> Ah, I fantasized that rtdm_nrtsig_pend would call rtdm_schedule_nrt_work
>>>>>>>>>>> to do the work, it's the other way around. False alarm here.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> - rtdm_schedule_nrt_work(): does allocate a nrtsig_t struct in order to
>>>>>>>>>>>>   convey a request block we cannot squeeze into a work_struct, since the
>>>>>>>>>>>>   latter is in itself an anchor type the user may embed into its own
>>>>>>>>>>>>   private argument block. I'm unsure ensuring that
>>>>>>>>>>>>   rtdm_schedule_nrt_work() calls do not pile up would be worth it, as
>>>>>>>>>>>>   this call is not supposed to be used frenetically on some hot path in
>>>>>>>>>>>>   the first place. But if we'd want to do that, then we would have to
>>>>>>>>>>>>   change the signature of rtdm_schedule_nrt_work(), so that we gain a
>>>>>>>>>>>>   persistent context data we could use for determining whether a nrt
>>>>>>>>>>>>   work request is in flight. We could not probe the work_struct for that
>>>>>>>>>>>>   purpose, that would be racy.
>>>>>>>>>>>>
>>>>>>>>>>>> Do you see any way to have a smarter rtdm_schedule_nrt_work() without
>>>>>>>>>>>> changing its signature?
>>>>>>>>>>>
>>>>>>>>>>> There is no rule that prevents changing the signature if that is needed.
>>>>>>>>>>> However, the kernel is fine without allocation as well, using a very
>>>>>>>>>>> similar signature.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> schedule_work() and rtdm_schedule_nrt_work() have a major difference:
>>>>>>>>>> the latter has to pass on the request from the oob to the in-band
>>>>>>>>>> context. This is what bugs us and creates the requirement for additional
>>>>>>>>>> mechanism and data.
>>>>>>>>>>
>>>>>>>>>>> I do not yet understand why we need that indirection via the rtdm_nrtsig
>>>>>>>>>>> on Dovetail. I thought we can trigger work directly from the oob domain
>>>>>>>>>>> there, can't we? How do you handle such a case in evl so far?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Dovetail can trigger in-band work from oob via the generic irq_work()
>>>>>>>>>> service, we don't need the extra code involved in the I-pipe for the
>>>>>>>>>> same purpose. The gist of the matter is about having the same
>>>>>>>>>> implementation for rtdm_schedule_nrt_work() which works in both Dovetail
>>>>>>>>>> and I-pipe contexts: the way we do that is by using rtdm_nrtsig_pend()
>>>>>>>>>> which already abstracts the base mechanism for relaying oob -> in-band
>>>>>>>>>> signals.
>>>>>>>>>>
>>>>>>>>>> On the other hand, EVL has the evl_call_inband[_from]() service, which
>>>>>>>>>> combines the irq_work and work_struct anchors we need into a single
>>>>>>>>>> evl_work type, which in turn can be embedded into a user request
>>>>>>>>>> block. This is what rtdm_schedule_nrt_work() does not expose, so it has
>>>>>>>>>> to build one internally by combining a dynamic allocation and the
>>>>>>>>>> user-provided work_struct.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Then enhance the rtdm service to enable such a combination - and be even
>>>>>>>>> more ready for the switch to Xenomai 4, I would say. We can either add a
>>>>>>>>> new service and deprecate the old one (while implementing it as you
>>>>>>>>> suggest) or just break the interface to get the (presumably) few users
>>>>>>>>> quickly converted.
>>>>>>>>>
>>>>>>>>
>>>>>>>> rtdm_schedule_nrt_work() was added by Gilles when merging the RTnet code
>>>>>>>> into Xenomai 3 IIRC, with little documentation. Widespread usage of this
>>>>>>>> call is unlikely, so I'd vote for simplifying the whole thing and
>>>>>>>> replace it entirely.
>>>>>>>>
>>>>>>>
>>>>>>> OK, then we have a plan for this piece.
>>>
>>> Do we need to implement a new API to replace rtdm_schedule_nrt_work() to avoid calling xnmalloc,
>>> or just keep it as it is in Xenomai 3.2?
>>>
>>>>>>>
>>>>>>>>>>> And the third case remains xnthread_signal, btw.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> xnthread_signal() is used to trigger SIGDEBUG/SIGSHADOW/SIGTERM signals
>>>>>>>>>> for a thread, a seldom event. Optimizing there would be overkill, unless
>>>>>>>>>> the application behaves wrongly in the first place.
>>>>>>>>>
>>>>>>>>> Some callers are under nklock, thus it makes potentially lengthy critical
>>>>>>>>> sections now a bit longer.
>>>>>>>>>
>>>>>>>>
>>>>>>>> This is what needs to be fixed in the first place, we should not trigger
>>>>>>>> a signal relay when holding the ugly big lock.
>>>
>>> Which functions do we need to fix or optimize to avoid calling xnthread_signal when holding the
>>> big lock, if possible? xnthread_suspend and xnthread_cancel?
>>>
>>>>>>>>
>>>>>>>
>>>>>>> Whatever is simpler.
>>>>>>>
>>>>>>> Another aspect: Using xnmalloc for anything that should better succeed
>>>>>>> is possibly not good when we are OOM. That is actually no strict
>>>>>>> regression over I-pipe where we had a limited queue as well IIRC but a
>>>>>>> reason to avoid carrying that over - and possibly creating new error
>>>>>>> scenarios that are harder to debug.
>>>>>>>
>>>>>>
>>>>>> To mitigate the issue of generalized OOM, we could pull memory from a
>>>>>> local xnheap privately attached to xnthread_signal() along with any
>>>>>> caller requiring dynamic allocation, so that none of them could deplete
>>>>>> the sysheap.
>>>>>>
>>>>>> Getting rid of dynamic allocation entirely on the other hand would
>>>>>> require us to add one or more "signal slots" per possible cause of
>>>>>> in-band signal into struct xnthread (e.g. SIGSHADOW_ACTION_BACKTRACE,
>>>>>> SIGDEBUG_MIGRATE_SIGNAL etc), passing the proper slot to
>>>>>> xnthread_signal(). SIGSHADOW(SIGWINCH) and SIGDEBUG(SIGXCPU) are
>>>>>> standard non-queueable signals, so we should only need a single slot per
>>>>>> signal cause.
>>>>>>
>>>>>
>>>>> Can a thread actually trigger multiple causes per SIGDEBUG, e.g.?
>>>>> Likely, we only need to carry the single cause via a field in xnthread
>>>>> from the trigger point to the in-band handler. The same is probably the
>>>>> case for SIGSHADOW_ACTION_BACKTRACE.
>>>>>
>>>>
>>>> Multiple events triggering SIGSHADOW_ACTION_HOME and
>>>> SIGSHADOW_ACTION_HARDEN _might_ pile up, although unlikely.
>>>
>>> Does that mean that in struct xnthread there should be three slots of
>>> struct lostage_signal, one for each of SIGDEBUG/SIGSHADOW/SIGTERM,
>>> like in patch [1]?
>>>
>>> If there is no pile-up for each signal, we would just allocate a fixed
>>> position for the three signals in the array and then handle the
>>> corresponding lostage_signal by signo. Is that enough?
>>>
>>> [1]:
>>>
>>> +++ b/include/cobalt/kernel/thread.h
>>> @@ -51,6 +51,15 @@ struct xnsched_tpslot;
>>>  struct xnthread_personality;
>>>  struct completion;
>>>
>>> +#define XENO_NUM_SIGNALS 3 /* SIGDEBUG/SIGSHADOW/SIGTERM */
>>> +
>>> +struct lostage_signal {
>>> +       struct pipeline_inband_work inband_work; /* Must be first. */
>>> +       struct task_struct *task;
>>> +       int signo, sigval;
>>> +       struct lostage_signal *self; /* Revisit: I-pipe requirement */
>>> +};
>>> +
>>>  struct xnthread_init_attr {
>>>         struct xnthread_personality *personality;
>>>         cpumask_t affinity;
>>> @@ -205,6 +214,7 @@ struct xnthread {
>>>         const char *exe_path;   /* Executable path */
>>>         u32 proghash;           /* Hash value for exe_path */
>>>  #endif
>>> +       struct lostage_signal sigarray[XENO_NUM_SIGNALS];
>>>  };
>>>
>> 
>> I would have as many entries as we have distinct reasons for notifying
>> the application, i.e. in include/cobalt/uapi/signal.h:
>> 
>> #define SIGSHADOW_ACTION_HARDEN		1
>> #define SIGSHADOW_ACTION_BACKTRACE	2
>> #define SIGSHADOW_ACTION_HOME		3
>> #define SIGDEBUG_MIGRATE_SIGNAL		1
>> #define SIGDEBUG_MIGRATE_SYSCALL	2
>> #define SIGDEBUG_MIGRATE_FAULT		3
>> #define SIGDEBUG_MIGRATE_PRIOINV	4
>> #define SIGDEBUG_NOMLOCK		5
>> #define SIGDEBUG_WATCHDOG		6
>> #define SIGDEBUG_RESCNT_IMBALANCE	7
>> #define SIGDEBUG_LOCK_BREAK		8
>> #define SIGDEBUG_MUTEX_SLEEP		9
>
> For SIGDEBUG, I believe we only need the first reason, as I wrote in the
> other reply. We are interested in the *initial* reason for a thread to
> migrate. Once it is migrated, it will consume that and only generate a
> new reason after going back.
>

We may have SIGDEBUG_WATCHDOG pending concurrently with any other cause,
this is triggered asynchronously by the oob timer. So at the very least,
you want this one to have its own slot.

So, to sum up: the following events are synchronous and mutually
exclusive due to the oob->in-band transition they cause:

SIGDEBUG_MIGRATE_SIGNAL, SIGDEBUG_MIGRATE_SYSCALL,
SIGDEBUG_MIGRATE_FAULT, SIGDEBUG_MIGRATE_PRIOINV, SIGDEBUG_MUTEX_SLEEP,
SIGDEBUG_LOCK_BREAK, SIGDEBUG_RESCNT_IMBALANCE. For these we may share
a single slot. SIGDEBUG_NOMLOCK can only happen at init time, when no
Xenomai service is available yet, so we don't have to bother. This one
can join the above group too.

SIGDEBUG_WATCHDOG needs its own slot.

All SIGSHADOW_* events need their own slot: SIGSHADOW_ACTION_HARDEN and
SIGSHADOW_ACTION_HOME can be raised by remote threads asynchronously to
the target thread. SIGSHADOW_ACTION_BACKTRACE comes in addition to
SIGDEBUG_MIGRATE_*. For the latter reason, SIGSHADOW_ACTION_BACKTRACE
cannot pile up though (since we have to transition from oob->in-band
context in between).

That would make five distinct slots in struct xnthread, each
representing a group of events and advertising its own
pipeline_inband_work struct.
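
In other words, something like this layout (slot names invented for
illustration):

enum xnthread_sigslot {
	XNSIGSLOT_SIGDEBUG,	/* all synchronous SIGDEBUG_* causes */
	XNSIGSLOT_WATCHDOG,	/* SIGDEBUG_WATCHDOG, raised by the oob timer */
	XNSIGSLOT_HARDEN,	/* SIGSHADOW_ACTION_HARDEN */
	XNSIGSLOT_BACKTRACE,	/* SIGSHADOW_ACTION_BACKTRACE */
	XNSIGSLOT_HOME,		/* SIGSHADOW_ACTION_HOME */
	XNSIGSLOT_NR,		/* plus a dedicated SIGTERM entry, as
				   mentioned earlier in the thread */
};

with struct xnthread embedding one struct lostage_signal (and thus one
pipeline_inband_work) per slot, i.e. sigarray[XNSIGSLOT_NR].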

-- 
Philippe.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: "rtdm/nrtsig: move inband work description off the stack"
  2021-06-04  8:15                             ` Philippe Gerum
@ 2021-06-04  8:21                               ` Jan Kiszka
  2021-06-04 12:53                                 ` Philippe Gerum
  2021-06-04  8:36                               ` Chen, Hongzhan
  1 sibling, 1 reply; 21+ messages in thread
From: Jan Kiszka @ 2021-06-04  8:21 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Chen, Hongzhan, xenomai

On 04.06.21 10:15, Philippe Gerum wrote:
> 
> Jan Kiszka <jan.kiszka@siemens.com> writes:
> 
>> On 04.06.21 09:51, Philippe Gerum wrote:
>>>
>>> Chen, Hongzhan <hongzhan.chen@intel.com> writes:
>>>
>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>
>>>>>> On 26.05.21 15:52, Philippe Gerum wrote:
>>>>>>>
>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>>
>>>>>>>> On 25.05.21 15:20, Philippe Gerum wrote:
>>>>>>>>>
>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>>>>
>>>>>>>>>> On 25.05.21 14:37, Philippe Gerum wrote:
>>>>>>>>>>>
>>>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>>>>>>
>>>>>>>>>>>> On 25.05.21 12:01, Philippe Gerum wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 25.05.21 10:46, Philippe Gerum wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Philippe,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [1] makes rtdm_schedule_nrt_work a rather heavyweight service, compared
>>>>>>>>>>>>>>>> to what it was (and even over I-pipe). xnmalloc is nothing you would
>>>>>>>>>>>>>>>> expect from a "send a signal" service, specifically from
>>>>>>>>>>>>>>>> rtdm_nrtsig_pend which does not even make use of the sending extra payload.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Can we do better? Also for xnthread_signal (fd and udd usecases are not
>>>>>>>>>>>>>>>> time-sensitive anyway).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Nope, I cannot see any significantly better option which would still
>>>>>>>>>>>>>>> allow a common implementation we can map on top of Dovetail / I-pipe.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> E.g. pre-allocate the work object for data-free signals and use that -
>>>>>>>>>>>>>> or not send it when it is in use, thus pending.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ok, let's recap:
>>>>>>>>>>>>>
>>>>>>>>>>>>> - rtdm_nrtsig_pend(): does not allocate anything, merely uses a
>>>>>>>>>>>>>   user-provided memory block, containing a request header. Current
>>>>>>>>>>>>>   callers should either already care for not resending a request
>>>>>>>>>>>>>   uselessly because it is pending (e.g. [1]), or their internal logic
>>>>>>>>>>>>>   should prevent that (e.g. [2]). This interface always conveys user data
>>>>>>>>>>>>>   by reference.
>>>>>>>>>>>>
>>>>>>>>>>>> Ah, I fantasized that rtdm_nrtsig_pend would call rtdm_schedule_nrt_work
>>>>>>>>>>>> to do the work, it's the other way around. False alarm here.
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> - rtdm_schedule_nrt_work(): does allocate a nrtsig_t struct in order to
>>>>>>>>>>>>>   convey a request block we cannot squeeze into a work_struct, since the
>>>>>>>>>>>>>   latter is in itself an anchor type the user may embed into its own
>>>>>>>>>>>>>   private argument block. I'm unsure ensuring that
>>>>>>>>>>>>>   rtdm_schedule_nrt_work() calls do not pile up would be worth it, as
>>>>>>>>>>>>>   this call is not supposed to be used frenetically on some hot path in
>>>>>>>>>>>>>   the first place. But if we'd want to do that, then we would have to
>>>>>>>>>>>>>   change the signature of rtdm_schedule_nrt_work(), so that we gain a
>>>>>>>>>>>>>   persistent context data we could use for determining whether a nrt
>>>>>>>>>>>>>   work request is in flight. We could not probe the work_struct for that
>>>>>>>>>>>>>   purpose, that would be racy.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Do you see any way to have a smarter rtdm_schedule_nrt_work() without
>>>>>>>>>>>>> changing its signature?
>>>>>>>>>>>>
>>>>>>>>>>>> There is no rule that prevents changing the signature if that is needed.
>>>>>>>>>>>> However, the kernel is fine without allocation as well, using a very
>>>>>>>>>>>> similar signature.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> schedule_work() and rtdm_schedule_nrt_work() have a major difference:
>>>>>>>>>>> the latter has to pass on the request from the oob to the in-band
>>>>>>>>>>> context. This is what bugs us and creates the requirement for additional
>>>>>>>>>>> mechanism and data.
>>>>>>>>>>>
>>>>>>>>>>>> I do not yet understand why we need that indirection via the rtdm_nrtsig
>>>>>>>>>>>> on Dovetail. I thought we can trigger work directly from the oob domain
>>>>>>>>>>>> there, can't we? How do you handle such a case in evl so far?
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Dovetail can trigger in-band work from oob via the generic irq_work()
>>>>>>>>>>> service, we don't need the extra code involved in the I-pipe for the
>>>>>>>>>>> same purpose. The gist of the matter is about having the same
>>>>>>>>>>> implementation for rtdm_schedule_nrt_work() which works in both Dovetail
>>>>>>>>>>> and I-pipe contexts: the way we do that is by using rtdm_nrtsig_pend()
>>>>>>>>>>> which already abstracts the base mechanism for relaying oob -> in-band
>>>>>>>>>>> signals.
>>>>>>>>>>>
>>>>>>>>>>> On the other hand, EVL has the evl_call_inband[_from]() service, which
>>>>>>>>>>> combines the irq_work and work_struct anchors we need into a single
>>>>>>>>>>> evl_work type, which in turn can be embedded into a user request
>>>>>>>>>>> block. This is what rtdm_schedule_nrt_work() does not expose, so it has
>>>>>>>>>>> to build one internally by combining a dynamic allocation and the
>>>>>>>>>>> user-provided work_struct.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Then enhance the rtdm service to enable such a combination - and be even
>>>>>>>>>> more ready for the switch to Xenomai 4, I would say. We can either add a
>>>>>>>>>> new service and deprecate the old one (while implementing it as you
>>>>>>>>>> suggest) or just break the interface to get the (presumably) few users
>>>>>>>>>> quickly converted.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> rtdm_schedule_nrt_work() was added by Gilles when merging the RTnet code
>>>>>>>>> into Xenomai 3 IIRC, with little documentation. Widespread usage of this
>>>>>>>>> call is unlikely, so I'd vote for simplifying the whole thing and
>>>>>>>>> replacing it entirely.
>>>>>>>>>
>>>>>>>>
>>>>>>>> OK, then we have a plan for this piece.
>>>>
>>>> Do we need to implement a new API to replace rtdm_schedule_nrt_work() to avoid calling xnmalloc,
>>>> or just keep it as it is in Xenomai 3.2?
>>>>
>>>>>>>>
>>>>>>>>>>>> And the third case remains xnthread_signal, btw.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> xnthread_signal() is used to trigger SIGDEBUG/SIGSHADOW/SIGTERM signals
>>>>>>>>>>> for a thread, a rare event. Optimizing there would be overkill, unless
>>>>>>>>>>> the application behaves wrongly in the first place.
>>>>>>>>>>
>>>>>>>>>> Some callers are under nklock, thus it makes potentially lengthy critical
>>>>>>>>>> sections now a bit longer.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This is what needs to be fixed in the first place, we should not trigger
>>>>>>>>> a signal relay when holding the ugly big lock.
>>>>
>>>> Which functions do we need to fix or optimize to avoid calling xnthread_signal when holding
>>>> the big lock, if possible? xnthread_suspend and xnthread_cancel?
>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> Whatever is simpler.
>>>>>>>>
>>>>>>>> Another aspect: Using xnmalloc for anything that had better succeed
>>>>>>>> is possibly not good when we are OOM. That is actually no strict
>>>>>>>> regression over I-pipe where we had a limited queue as well IIRC but a
>>>>>>>> reason to avoid carrying that over - and possibly creating new error
>>>>>>>> scenarios that are harder to debug.
>>>>>>>>
>>>>>>>
>>>>>>> To mitigate the issue of generalized OOM, we could pull memory from a
>>>>>>> local xnheap privately attached to xnthread_signal() along with any
>>>>>>> caller requiring dynamic allocation, so that none of them could deplete
>>>>>>> the sysheap.
>>>>>>>
>>>>>>> Getting rid of dynamic allocation entirely on the other hand would
>>>>>>> require us to add one or more "signal slots" per possible cause of
>>>>>>> in-band signal into struct xnthread (e.g. SIGSHADOW_ACTION_BACKTRACE,
>>>>>>> SIGDEBUG_MIGRATE_SIGNAL etc), passing the proper slot to
>>>>>>> xnthread_signal(). SIGSHADOW(SIGWINCH) and SIGDEBUG(SIGXCPU) are
>>>>>>> standard non-queueable signals, so we should only need a single slot per
>>>>>>> signal cause.
>>>>>>>
>>>>>>
>>>>>> Can a thread actually trigger multiple causes per SIGDEBUG, e.g.?
>>>>>> Likely, we only need to carry the single cause via a field in xnthread
>>>>>> from the trigger point to the in-band handler. The same is probably the
>>>>>> case for SIGSHADOW_ACTION_BACKTRACE.
>>>>>>
>>>>>
>>>>> Multiple events triggering SIGSHADOW_ACTION_HOME and
>>>>> SIGSHADOW_ACTION_HARDEN _might_ pile up, although unlikely.
>>>>
>>>> Does that mean that in struct xnthread there should be three slots of
>>>> struct lostage_signal to handle each of SIGDEBUG/SIGSHADOW/SIGTERM
>>>> like patch [1]?
>>>>
>>>> If there is no pile-up for each signal, we would just allocate a fixed position
>>>> for the three signals in the array and then handle the corresponding lostage_signal
>>>> by signo. Is that enough?
>>>>
>>>> [1]:
>>>>
>>>> +++ b/include/cobalt/kernel/thread.h
>>>> @@ -51,6 +51,15 @@ struct xnsched_tpslot;
>>>>  struct xnthread_personality;
>>>>  struct completion;
>>>>
>>>> +#define XENO_NUM_SIGNALS 3 /* SIGDEBUG/SIGSHADOW/SIGTERM */
>>>> +
>>>> +struct lostage_signal {
>>>> +       struct pipeline_inband_work inband_work; /* Must be first. */
>>>> +       struct task_struct *task;
>>>> +       int signo, sigval;
>>>> +       struct lostage_signal *self; /* Revisit: I-pipe requirement */
>>>> +};
>>>> +
>>>>  struct xnthread_init_attr {
>>>>         struct xnthread_personality *personality;
>>>>         cpumask_t affinity;
>>>> @@ -205,6 +214,7 @@ struct xnthread {
>>>>         const char *exe_path;   /* Executable path */
>>>>         u32 proghash;           /* Hash value for exe_path */
>>>>  #endif
>>>> +       struct lostage_signal sigarray[XENO_NUM_SIGNALS];
>>>>  };
>>>>
>>>
>>> I would have as many entries as we have distinct reasons for notifying
>>> the application, i.e. include/cobalt/uapi/signal.h:
>>>
>>> #define SIGSHADOW_ACTION_HARDEN		1
>>> #define SIGSHADOW_ACTION_BACKTRACE	2
>>> #define SIGSHADOW_ACTION_HOME		3
>>> #define SIGDEBUG_MIGRATE_SIGNAL		1
>>> #define SIGDEBUG_MIGRATE_SYSCALL	2
>>> #define SIGDEBUG_MIGRATE_FAULT		3
>>> #define SIGDEBUG_MIGRATE_PRIOINV	4
>>> #define SIGDEBUG_NOMLOCK		5
>>> #define SIGDEBUG_WATCHDOG		6
>>> #define SIGDEBUG_RESCNT_IMBALANCE	7
>>> #define SIGDEBUG_LOCK_BREAK		8
>>> #define SIGDEBUG_MUTEX_SLEEP		9
>>
>> For SIGDEBUG, I believe we only need the first reason, as I wrote in the
>> other reply. We are interested in the *initial* reason for a thread to
>> migrate. Once it is migrated, it will consume that and only generate a
>> new reason after going back.
>>
> 
> We may have SIGDEBUG_WATCHDOG pending concurrently with any other cause,
> as it is triggered asynchronously by the oob timer. So at the very least,
> you want this one to have its own slot.

First come, first served - we only need to know why we initially
migrated. But, yes, that may imply locking against concurrent filling of
the reason field.
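
A minimal sketch of this first-come-first-served idea ('debug_reason'
and the helpers below are hypothetical, not existing Cobalt code; only
atomic_cmpxchg()/atomic_xchg() are real kernel APIs):

    #include <linux/atomic.h>

    /*
     * 'debug_reason' is assumed to be an atomic_t field in struct
     * xnthread. 0 means "no cause recorded yet"; SIGDEBUG causes
     * start at 1, so 0 is free to act as the empty marker.
     */
    static inline void record_migration_reason(struct xnthread *thread,
                                               int reason)
    {
            /* First writer wins; later causes are simply dropped. */
            atomic_cmpxchg(&thread->debug_reason, 0, reason);
    }

    /* In-band side: consume and clear the slot in one step. */
    static inline int consume_migration_reason(struct xnthread *thread)
    {
            return atomic_xchg(&thread->debug_reason, 0);
    }

An atomic compare-and-swap would avoid taking a lock around the reason
field while still guaranteeing that only the initial cause is reported.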

Jan

> 
> So, to sum up: the following events are synchronous and mutually
> exclusive due to the oob->in-band transition they cause:
> 
> SIGDEBUG_MIGRATE_SIGNAL, SIGDEBUG_MIGRATE_SYSCALL,
> SIGDEBUG_MIGRATE_FAULT, SIGDEBUG_MIGRATE_PRIOINV, SIGDEBUG_MUTEX_SLEEP,
> SIGDEBUG_LOCK_BREAK, SIGDEBUG_RESCNT_IMBALANCE. For these we may share a single slot.
> SIGDEBUG_NOMLOCK can only happen at init when no Xenomai service is
> available yet, so we don't have to bother. This one can join the above
> group too.
> 
> SIGDEBUG_WATCHDOG needs its own slot.
> 
> All SIGSHADOW_* events need their own slot: SIGSHADOW_ACTION_HARDEN and
> SIGSHADOW_ACTION_HOME can be raised by remote threads asynchronously to
> the target thread. SIGSHADOW_ACTION_BACKTRACE comes in addition to
> SIGDEBUG_MIGRATE_*. For the latter reason, SIGSHADOW_ACTION_BACKTRACE
> cannot pile up though (since we have to transition from oob->in-band
> context in between).
>
> That would make five distinct slots in struct xnthread, each representing
> a group of events, each advertising its own pipeline_inband_work struct.
> 

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux


^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: "rtdm/nrtsig: move inband work description off the stack"
  2021-06-04  8:15                             ` Philippe Gerum
  2021-06-04  8:21                               ` Jan Kiszka
@ 2021-06-04  8:36                               ` Chen, Hongzhan
  2021-06-04 12:54                                 ` Philippe Gerum
  1 sibling, 1 reply; 21+ messages in thread
From: Chen, Hongzhan @ 2021-06-04  8:36 UTC (permalink / raw)
  To: Philippe Gerum, Jan Kiszka; +Cc: xenomai



>-----Original Message-----
>From: Philippe Gerum <rpm@xenomai.org> 
>Sent: Friday, June 4, 2021 4:15 PM
>To: Jan Kiszka <jan.kiszka@siemens.com>
>Cc: Chen, Hongzhan <hongzhan.chen@intel.com>; xenomai@xenomai.org
>Subject: Re: "rtdm/nrtsig: move inband work description off the stack"
>
>
>Jan Kiszka <jan.kiszka@siemens.com> writes:
>
>> On 04.06.21 09:51, Philippe Gerum wrote:
>>> 
>>> Chen, Hongzhan <hongzhan.chen@intel.com> writes:
>>> 
>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>
>>>>>> On 26.05.21 15:52, Philippe Gerum wrote:
>>>>>>>
>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>>
>>>>>>>> On 25.05.21 15:20, Philippe Gerum wrote:
>>>>>>>>>
>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>>>>
>>>>>>>>>> On 25.05.21 14:37, Philippe Gerum wrote:
>>>>>>>>>>>
>>>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>>>>>>
>>>>>>>>>>>> On 25.05.21 12:01, Philippe Gerum wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 25.05.21 10:46, Philippe Gerum wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Philippe,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [1] makes rtdm_schedule_nrt_work a rather heavyweight service, compared
>>>>>>>>>>>>>>>> to what it was (and even over I-pipe). xnmalloc is nothing you would
>>>>>>>>>>>>>>>> expect from a "send a signal" service, specifically from
>>>>>>>>>>>>>>>> rtdm_nrtsig_pend which does not even make use of the sending extra payload.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Can we do better? Also for xnthread_signal (fd and udd usecases are not
>>>>>>>>>>>>>>>> time-sensitive anyway).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Nope, I cannot see any significantly better option which would still
>>>>>>>>>>>>>>> allow a common implementation we can map on top of Dovetail / I-pipe.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> E.g. pre-allocate the work object for data-free signals and use that -
>>>>>>>>>>>>>> or not send it when it is in use, thus pending.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ok, let's recap:
>>>>>>>>>>>>>
>>>>>>>>>>>>> - rtdm_nrtsig_pend(): does not allocate anything, merely uses a
>>>>>>>>>>>>>   user-provided memory block, containing a request header. Current
>>>>>>>>>>>>>   callers should either already care for not resending a request
>>>>>>>>>>>>>   uselessly because it is pending (e.g. [1]), or their internal logic
>>>>>>>>>>>>>>   should prevent that (e.g. [2]). This interface always conveys user data
>>>>>>>>>>>>>   by reference.
>>>>>>>>>>>>
>>>>>>>>>>>> Ah, I fantasized that rtdm_nrtsig_pend would call rtdm_schedule_nrt_work
>>>>>>>>>>>>> to do the work; it's the other way around. False alarm here.
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> - rtdm_schedule_nrt_work(): does allocate a nrtsig_t struct in order to
>>>>>>>>>>>>>   convey a request block we cannot squeeze into a work_struct, since the
>>>>>>>>>>>>>   latter is in itself an anchor type the user may embed into its own
>>>>>>>>>>>>>>   private argument block. I'm unsure whether ensuring that
>>>>>>>>>>>>>   rtdm_schedule_nrt_work() calls do not pile up would be worth it, as
>>>>>>>>>>>>>   this call is not supposed to be used frenetically on some hot path in
>>>>>>>>>>>>>   the first place. But if we'd want to do that, then we would have to
>>>>>>>>>>>>>   change the signature of rtdm_schedule_nrt_work(), so that we gain a
>>>>>>>>>>>>>   persistent context data we could use for determining whether a nrt
>>>>>>>>>>>>>   work request is in flight. We could not probe the work_struct for that
>>>>>>>>>>>>>   purpose, that would be racy.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Do you see any way to have a smarter rtdm_schedule_nrt_work() without
>>>>>>>>>>>>> changing its signature?
>>>>>>>>>>>>
>>>>>>>>>>>> There is no rule that prevents changing the signature if that is needed.
>>>>>>>>>>>> However, the kernel is fine without allocation as well, using a very
>>>>>>>>>>>> similar signature.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> schedule_work() and rtdm_schedule_nrt_work() have a major difference:
>>>>>>>>>>> the latter has to pass on the request from the oob to the in-band
>>>>>>>>>>> context. This is what bugs us and creates the requirement for additional
>>>>>>>>>>> mechanism and data.
>>>>>>>>>>>
>>>>>>>>>>>>> I do not yet understand why we need that indirection via the rtdm_nrtsig
>>>>>>>>>>>> on Dovetail. I thought we can trigger work directly from the oob domain
>>>>>>>>>>>> there, can't we? How do you handle such a case in evl so far?
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Dovetail can trigger in-band work from oob via the generic irq_work()
>>>>>>>>>>> service, we don't need the extra code involved in the I-pipe for the
>>>>>>>>>>> same purpose. The gist of the matter is about having the same
>>>>>>>>>>> implementation for rtdm_schedule_nrt_work() which works in both Dovetail
>>>>>>>>>>> and I-pipe contexts: the way we do that is by using rtdm_nrtsig_pend()
>>>>>>>>>>> which already abstracts the base mechanism for relaying oob -> in-band
>>>>>>>>>>> signals.
>>>>>>>>>>>
>>>>>>>>>>> On the other hand, EVL has the evl_call_inband[_from]() service, which
>>>>>>>>>>> combines the irq_work and work_struct anchors we need into a single
>>>>>>>>>>> evl_work type, which in turn can be embedded into a user request
>>>>>>>>>>> block. This is what rtdm_schedule_nrt_work() does not expose, so it has
>>>>>>>>>>> to build one internally by combining a dynamic allocation and the
>>>>>>>>>>> user-provided work_struct.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Then enhance the rtdm service to enable such a combination - and be even
>>>>>>>>>> more ready for the switch to Xenomai 4, I would say. We can either add a
>>>>>>>>>> new service and deprecate the old one (while implementing it as you
>>>>>>>>>> suggest) or just break the interface to get the (presumably) few users
>>>>>>>>>> quickly converted.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> rtdm_schedule_nrt_work() was added by Gilles when merging the RTnet code
>>>>>>>>> into Xenomai 3 IIRC, with little documentation. Widespread usage of this
>>>>>>>>> call is unlikely, so I'd vote for simplifying the whole thing and
>>>>>>>>>> replacing it entirely.
>>>>>>>>>
>>>>>>>>
>>>>>>>> OK, then we have a plan for this piece.
>>>>
>>>> Do we need to implement a new API to replace rtdm_schedule_nrt_work() to avoid calling xnmalloc,
>>>> or just keep it as it is in Xenomai 3.2?
>>>>
>>>>>>>>
>>>>>>>>>>>> And the third case remains xnthread_signal, btw.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> xnthread_signal() is used to trigger SIGDEBUG/SIGSHADOW/SIGTERM signals
>>>>>>>>>>>> for a thread, a rare event. Optimizing there would be overkill, unless
>>>>>>>>>>> the application behaves wrongly in the first place.
>>>>>>>>>>
>>>>>>>>>>> Some callers are under nklock, thus it makes potentially lengthy critical
>>>>>>>>>> sections now a bit longer.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This is what needs to be fixed in the first place, we should not trigger
>>>>>>>>> a signal relay when holding the ugly big lock.
>>>>
>>>> Which functions do we need to fix or optimize to avoid calling xnthread_signal when holding
>>>> the big lock, if possible? xnthread_suspend and xnthread_cancel?
>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> Whatever is simpler.
>>>>>>>>
>>>>>>>>> Another aspect: Using xnmalloc for anything that had better succeed
>>>>>>>> is possibly not good when we are OOM. That is actually no strict
>>>>>>>> regression over I-pipe where we had a limited queue as well IIRC but a
>>>>>>>> reason to avoid carrying that over - and possibly creating new error
>>>>>>>> scenarios that are harder to debug.
>>>>>>>>
>>>>>>>
>>>>>>> To mitigate the issue of generalized OOM, we could pull memory from a
>>>>>>> local xnheap privately attached to xnthread_signal() along with any
>>>>>>> caller requiring dynamic allocation, so that none of them could deplete
>>>>>>> the sysheap.
>>>>>>>
>>>>>>> Getting rid of dynamic allocation entirely on the other hand would
>>>>>>> require us to add one or more "signal slots" per possible cause of
>>>>>>> in-band signal into struct xnthread (e.g. SIGSHADOW_ACTION_BACKTRACE,
>>>>>>> SIGDEBUG_MIGRATE_SIGNAL etc), passing the proper slot to
>>>>>>> xnthread_signal(). SIGSHADOW(SIGWINCH) and SIGDEBUG(SIGXCPU) are
>>>>>>> standard non-queueable signals, so we should only need a single slot per
>>>>>>> signal cause.
>>>>>>>
>>>>>>
>>>>>> Can a thread actually trigger multiple causes per SIGDEBUG, e.g.?
>>>>>> Likely, we only need to carry the single cause via a field in xnthread
>>>>>> from the trigger point to the in-band handler. The same is probably the
>>>>>> case for SIGSHADOW_ACTION_BACKTRACE.
>>>>>>
>>>>>
>>>>> Multiple events triggering SIGSHADOW_ACTION_HOME and
>>>>> SIGSHADOW_ACTION_HARDEN _might_ pile up, although unlikely.
>>>>
>>>> Does that mean that in struct xnthread there should be three slots of
>>>> struct lostage_signal to handle each of SIGDEBUG/SIGSHADOW/SIGTERM
>>>> like patch [1]?
>>>>
>>>> If there is no pile-up for each signal, we would just allocate a fixed position
>>>> for the three signals in the array and then handle the corresponding lostage_signal
>>>> by signo. Is that enough?
>>>>
>>>> [1]:
>>>>
>>>> +++ b/include/cobalt/kernel/thread.h
>>>> @@ -51,6 +51,15 @@ struct xnsched_tpslot;
>>>>  struct xnthread_personality;
>>>>  struct completion;
>>>>
>>>> +#define XENO_NUM_SIGNALS 3 /* SIGDEBUG/SIGSHADOW/SIGTERM */
>>>> +
>>>> +struct lostage_signal {
>>>> +       struct pipeline_inband_work inband_work; /* Must be first. */
>>>> +       struct task_struct *task;
>>>> +       int signo, sigval;
>>>> +       struct lostage_signal *self; /* Revisit: I-pipe requirement */
>>>> +};
>>>> +
>>>>  struct xnthread_init_attr {
>>>>         struct xnthread_personality *personality;
>>>>         cpumask_t affinity;
>>>> @@ -205,6 +214,7 @@ struct xnthread {
>>>>         const char *exe_path;   /* Executable path */
>>>>         u32 proghash;           /* Hash value for exe_path */
>>>>  #endif
>>>> +       struct lostage_signal sigarray[XENO_NUM_SIGNALS];
>>>>  };
>>>>
>>> 
>>> I would have as many entries as we have distinct reasons for notifying
>>> the application, i.e. include/cobalt/uapi/signal.h:
>>> 
>>> #define SIGSHADOW_ACTION_HARDEN		1
>>> #define SIGSHADOW_ACTION_BACKTRACE	2
>>> #define SIGSHADOW_ACTION_HOME		3
>>> #define SIGDEBUG_MIGRATE_SIGNAL		1
>>> #define SIGDEBUG_MIGRATE_SYSCALL	2
>>> #define SIGDEBUG_MIGRATE_FAULT		3
>>> #define SIGDEBUG_MIGRATE_PRIOINV	4
>>> #define SIGDEBUG_NOMLOCK		5
>>> #define SIGDEBUG_WATCHDOG		6
>>> #define SIGDEBUG_RESCNT_IMBALANCE	7
>>> #define SIGDEBUG_LOCK_BREAK		8
>>> #define SIGDEBUG_MUTEX_SLEEP		9
>>
>> For SIGDEBUG, I believe we only need the first reason, as I wrote in the
>> other reply. We are interested in the *initial* reason for a thread to
>> migrate. Once it is migrated, it will consume that and only generate a
>> new reason after going back.
>>
>
>We may have SIGDEBUG_WATCHDOG pending concurrently with any other cause,
>as it is triggered asynchronously by the oob timer. So at the very least,
>you want this one to have its own slot.
>
>So, to sum up: the following events are synchronous and mutually
>exclusive due to the oob->in-band transition they cause:
>
>SIGDEBUG_MIGRATE_SIGNAL, SIGDEBUG_MIGRATE_SYSCALL,
>SIGDEBUG_MIGRATE_FAULT, SIGDEBUG_MIGRATE_PRIOINV, SIGDEBUG_MUTEX_SLEEP,
>SIGDEBUG_LOCK_BREAK, SIGDEBUG_RESCNT_IMBALANCE. For these we may share a single slot.
>SIGDEBUG_NOMLOCK can only happen at init when no Xenomai service is
>available yet, so we don't have to bother. This one can join the above
>group too.
>
>SIGDEBUG_WATCHDOG needs its own slot.
>
>All SIGSHADOW_* events need their own slot: SIGSHADOW_ACTION_HARDEN and
>SIGSHADOW_ACTION_HOME can be raised by remote threads asynchronously to
>the target thread. SIGSHADOW_ACTION_BACKTRACE comes in addition to
>SIGDEBUG_MIGRATE_*. For the latter reason, SIGSHADOW_ACTION_BACKTRACE
>cannot pile up though (since we have to transition from oob->in-band
>context in between).
>
>That would make five distinct slots in struct xnthread, each representing
>a group of events, each advertising its own pipeline_inband_work struct.

OK. SIGTERM (1) + SIGDEBUG (2) + SIGSHADOW (3) = 6 slots in struct xnthread.
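
For reference, a minimal sketch of what that slot layout might look
like (names are illustrative only, not from an actual patch):

    /* One in-band work slot per signal cause group discussed above. */
    enum xnthread_sigslot {
            XNSIG_SLOT_TERM,        /* SIGTERM */
            XNSIG_SLOT_DEBUG,       /* synchronous SIGDEBUG migrate causes */
            XNSIG_SLOT_WATCHDOG,    /* SIGDEBUG_WATCHDOG (async, oob timer) */
            XNSIG_SLOT_HARDEN,      /* SIGSHADOW_ACTION_HARDEN */
            XNSIG_SLOT_BACKTRACE,   /* SIGSHADOW_ACTION_BACKTRACE */
            XNSIG_SLOT_HOME,        /* SIGSHADOW_ACTION_HOME */
            XNSIG_SLOT_COUNT,       /* = 6 */
    };

    /* Then, in struct xnthread: */
    struct lostage_signal sigarray[XNSIG_SLOT_COUNT];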

>
>-- 
>Philippe.
>



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: "rtdm/nrtsig: move inband work description off the stack"
  2021-06-04  8:21                               ` Jan Kiszka
@ 2021-06-04 12:53                                 ` Philippe Gerum
  0 siblings, 0 replies; 21+ messages in thread
From: Philippe Gerum @ 2021-06-04 12:53 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Chen, Hongzhan, xenomai


Jan Kiszka <jan.kiszka@siemens.com> writes:

> On 04.06.21 10:15, Philippe Gerum wrote:
>> 
>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>> 
>>> On 04.06.21 09:51, Philippe Gerum wrote:
>>>>
>>>> Chen, Hongzhan <hongzhan.chen@intel.com> writes:
>>>>
>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>
>>>>>>> On 26.05.21 15:52, Philippe Gerum wrote:
>>>>>>>>
>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>>>
>>>>>>>>> On 25.05.21 15:20, Philippe Gerum wrote:
>>>>>>>>>>
>>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>>>>>
>>>>>>>>>>> On 25.05.21 14:37, Philippe Gerum wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>>>>>>>
>>>>>>>>>>>>> On 25.05.21 12:01, Philippe Gerum wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 25.05.21 10:46, Philippe Gerum wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Philippe,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> [1] makes rtdm_schedule_nrt_work a rather heavyweight service, compared
>>>>>>>>>>>>>>>>> to what it was (and even over I-pipe). xnmalloc is nothing you would
>>>>>>>>>>>>>>>>> expect from a "send a signal" service, specifically from
>>>>>>>>>>>>>>>>> rtdm_nrtsig_pend which does not even make use of the sending extra payload.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Can we do better? Also for xnthread_signal (fd and udd usecases are not
>>>>>>>>>>>>>>>>> time-sensitive anyway).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Nope, I cannot see any significantly better option which would still
>>>>>>>>>>>>>>>> allow a common implementation we can map on top of Dovetail / I-pipe.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> E.g. pre-allocate the work object for data-free signals and use that -
>>>>>>>>>>>>>>> or not send it when it is in use, thus pending.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ok, let's recap:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - rtdm_nrtsig_pend(): does not allocate anything, merely uses a
>>>>>>>>>>>>>>   user-provided memory block, containing a request header. Current
>>>>>>>>>>>>>>   callers should either already care for not resending a request
>>>>>>>>>>>>>>   uselessly because it is pending (e.g. [1]), or their internal logic
>>>>>>>>>>>>>>   should prevent that (e.g. [2]). This interface always conveys user data
>>>>>>>>>>>>>>   by reference.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ah, I fantasized that rtdm_nrtsig_pend would call rtdm_schedule_nrt_work
>>>>>>>>>>>>> to do the work; it's the other way around. False alarm here.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - rtdm_schedule_nrt_work(): does allocate a nrtsig_t struct in order to
>>>>>>>>>>>>>>   convey a request block we cannot squeeze into a work_struct, since the
>>>>>>>>>>>>>>   latter is in itself an anchor type the user may embed into its own
>>>>>>>>>>>>>>   private argument block. I'm unsure whether ensuring that
>>>>>>>>>>>>>>   rtdm_schedule_nrt_work() calls do not pile up would be worth it, as
>>>>>>>>>>>>>>   this call is not supposed to be used frenetically on some hot path in
>>>>>>>>>>>>>>   the first place. But if we'd want to do that, then we would have to
>>>>>>>>>>>>>>   change the signature of rtdm_schedule_nrt_work(), so that we gain a
>>>>>>>>>>>>>>   persistent context data we could use for determining whether a nrt
>>>>>>>>>>>>>>   work request is in flight. We could not probe the work_struct for that
>>>>>>>>>>>>>>   purpose, that would be racy.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Do you see any way to have a smarter rtdm_schedule_nrt_work() without
>>>>>>>>>>>>>> changing its signature?
>>>>>>>>>>>>>
>>>>>>>>>>>>> There is no rule that prevents changing the signature if that is needed.
>>>>>>>>>>>>> However, the kernel is fine without allocation as well, using a very
>>>>>>>>>>>>> similar signature.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> schedule_work() and rtdm_schedule_nrt_work() have a major difference:
>>>>>>>>>>>> the latter has to pass on the request from the oob to the in-band
>>>>>>>>>>>> context. This is what bugs us and creates the requirement for additional
>>>>>>>>>>>> mechanism and data.
>>>>>>>>>>>>
>>>>>>>>>>>>> I do not yet understand why we need that indirection via the rtdm_nrtsig
>>>>>>>>>>>>> on Dovetail. I thought we can trigger work directly from the oob domain
>>>>>>>>>>>>> there, can't we? How do you handle such a case in evl so far?
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Dovetail can trigger in-band work from oob via the generic irq_work()
>>>>>>>>>>>> service, we don't need the extra code involved in the I-pipe for the
>>>>>>>>>>>> same purpose. The gist of the matter is about having the same
>>>>>>>>>>>> implementation for rtdm_schedule_nrt_work() which works in both Dovetail
>>>>>>>>>>>> and I-pipe contexts: the way we do that is by using rtdm_nrtsig_pend()
>>>>>>>>>>>> which already abstracts the base mechanism for relaying oob -> in-band
>>>>>>>>>>>> signals.
>>>>>>>>>>>>
>>>>>>>>>>>> On the other hand, EVL has the evl_call_inband[_from]() service, which
>>>>>>>>>>>> combines the irq_work and work_struct anchors we need into a single
>>>>>>>>>>>> evl_work type, which in turn can be embedded into a user request
>>>>>>>>>>>> block. This is what rtdm_schedule_nrt_work() does not expose, so it has
>>>>>>>>>>>> to build one internally by combining a dynamic allocation and the
>>>>>>>>>>>> user-provided work_struct.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Then enhance the rtdm service to enable such a combination - and be even
>>>>>>>>>>> more ready for the switch to Xenomai 4, I would say. We can either add a
>>>>>>>>>>> new service and deprecate the old one (while implementing it as you
>>>>>>>>>>> suggest) or just break the interface to get the (presumably) few users
>>>>>>>>>>> quickly converted.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> rtdm_schedule_nrt_work() was added by Gilles when merging the RTnet code
>>>>>>>>>> into Xenomai 3 IIRC, with little documentation. Widespread usage of this
>>>>>>>>>> call is unlikely, so I'd vote for simplifying the whole thing and
>>>>>>>>>> replacing it entirely.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> OK, then we have a plan for this piece.
>>>>>
>>>>> Do we need to implement a new API to replace rtdm_schedule_nrt_work() to avoid calling xnmalloc,
>>>>> or just keep it as it is in Xenomai 3.2?
>>>>>
>>>>>>>>>
>>>>>>>>>>>>> And the third case remains xnthread_signal, btw.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> xnthread_signal() is used to trigger SIGDEBUG/SIGSHADOW/SIGTERM signals
>>>>>>>>>>>> for a thread, a rare event. Optimizing there would be overkill, unless
>>>>>>>>>>>> the application behaves wrongly in the first place.
>>>>>>>>>>>
>>>>>>>>>>> Some callers are under nklock, thus it makes potentially lengthy critical
>>>>>>>>>>> sections now a bit longer.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> This is what needs to be fixed in the first place, we should not trigger
>>>>>>>>>> a signal relay when holding the ugly big lock.
>>>>>
>>>>> Which functions do we need to fix or optimize to avoid calling xnthread_signal when holding
>>>>> the big lock, if possible? xnthread_suspend and xnthread_cancel?
>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Whatever is simpler.
>>>>>>>>>
>>>>>>>>> Another aspect: Using xnmalloc for anything that had better succeed
>>>>>>>>> is possibly not good when we are OOM. That is actually no strict
>>>>>>>>> regression over I-pipe where we had a limited queue as well IIRC but a
>>>>>>>>> reason to avoid carrying that over - and possibly creating new error
>>>>>>>>> scenarios that are harder to debug.
>>>>>>>>>
>>>>>>>>
>>>>>>>> To mitigate the issue of generalized OOM, we could pull memory from a
>>>>>>>> local xnheap privately attached to xnthread_signal() along with any
>>>>>>>> caller requiring dynamic allocation, so that none of them could deplete
>>>>>>>> the sysheap.
>>>>>>>>
>>>>>>>> Getting rid of dynamic allocation entirely on the other hand would
>>>>>>>> require us to add one or more "signal slots" per possible cause of
>>>>>>>> in-band signal into struct xnthread (e.g. SIGSHADOW_ACTION_BACKTRACE,
>>>>>>>> SIGDEBUG_MIGRATE_SIGNAL etc), passing the proper slot to
>>>>>>>> xnthread_signal(). SIGSHADOW(SIGWINCH) and SIGDEBUG(SIGXCPU) are
>>>>>>>> standard non-queueable signals, so we should only need a single slot per
>>>>>>>> signal cause.
>>>>>>>>
>>>>>>>
>>>>>>> Can a thread actually trigger multiple causes per SIGDEBUG, e.g.?
>>>>>>> Likely, we only need to carry the single cause via a field in xnthread
>>>>>>> from the trigger point to the in-band handler. The same is probably the
>>>>>>> case for SIGSHADOW_ACTION_BACKTRACE.
>>>>>>>
>>>>>>
>>>>>> Multiple events triggering SIGSHADOW_ACTION_HOME and
>>>>>> SIGSHADOW_ACTION_HARDEN _might_ pile up, although unlikely.
>>>>>
>>>>> Does that mean that in struct xnthread there should be three slots of
>>>>> struct lostage_signal to handle each of SIGDEBUG/SIGSHADOW/SIGTERM
>>>>> like patch [1]?
>>>>>
>>>>> If there is no pile-up for each signal, we would just allocate a fixed position
>>>>> for the three signals in the array and then handle the corresponding lostage_signal
>>>>> by signo. Is that enough?
>>>>>
>>>>> [1]:
>>>>>
>>>>> +++ b/include/cobalt/kernel/thread.h
>>>>> @@ -51,6 +51,15 @@ struct xnsched_tpslot;
>>>>>  struct xnthread_personality;
>>>>>  struct completion;
>>>>>
>>>>> +#define XENO_NUM_SIGNALS 3 /* SIGDEBUG/SIGSHADOW/SIGTERM */
>>>>> +
>>>>> +struct lostage_signal {
>>>>> +       struct pipeline_inband_work inband_work; /* Must be first. */
>>>>> +       struct task_struct *task;
>>>>> +       int signo, sigval;
>>>>> +       struct lostage_signal *self; /* Revisit: I-pipe requirement */
>>>>> +};
>>>>> +
>>>>>  struct xnthread_init_attr {
>>>>>         struct xnthread_personality *personality;
>>>>>         cpumask_t affinity;
>>>>> @@ -205,6 +214,7 @@ struct xnthread {
>>>>>         const char *exe_path;   /* Executable path */
>>>>>         u32 proghash;           /* Hash value for exe_path */
>>>>>  #endif
>>>>> +       struct lostage_signal sigarray[XENO_NUM_SIGNALS];
>>>>>  };
>>>>>
>>>>
>>>> I would have as many entries as we have distinct reasons for notifying
>>>> the application, i.e. include/cobalt/uapi/signal.h:
>>>>
>>>> #define SIGSHADOW_ACTION_HARDEN		1
>>>> #define SIGSHADOW_ACTION_BACKTRACE	2
>>>> #define SIGSHADOW_ACTION_HOME		3
>>>> #define SIGDEBUG_MIGRATE_SIGNAL		1
>>>> #define SIGDEBUG_MIGRATE_SYSCALL	2
>>>> #define SIGDEBUG_MIGRATE_FAULT		3
>>>> #define SIGDEBUG_MIGRATE_PRIOINV	4
>>>> #define SIGDEBUG_NOMLOCK		5
>>>> #define SIGDEBUG_WATCHDOG		6
>>>> #define SIGDEBUG_RESCNT_IMBALANCE	7
>>>> #define SIGDEBUG_LOCK_BREAK		8
>>>> #define SIGDEBUG_MUTEX_SLEEP		9
>>>
>>> For SIGDEBUG, I believe we only need the first reason, as I wrote in the
>>> other reply. We are interested in the *initial* reason for a thread to
>>> migrate. Once it is migrated, it will consume that and only generate a
>>> new reason after going back.
>>>
>> 
>> We may have SIGDEBUG_WATCHDOG pending concurrently with any other cause,
>> as it is triggered asynchronously by the oob timer. So at the very least,
>> you want this one to have its own slot.
>
> First come, first served - we only need to know why we initially
> migrated.
> But, yes, that may imply locking against concurrent filling of
> the reason field.

Which means you want distinct slots; otherwise all this work to save a
couple of µs on a weakly contended heap would make no sense, I believe.
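
A rough sketch of the allocation-free path this implies (for
illustration only: sigslot_of() is hypothetical, and the pipeline_*
helper usage is inferred from the snippets quoted in this thread):

    /* In-band handler relaying the actual send_sig() call. */
    static void lostage_task_signal(struct pipeline_inband_work *inband_work);

    void xnthread_signal(struct xnthread *thread, int sig, int arg)
    {
            /* Pick the preallocated per-cause slot, e.g. the watchdog
             * event gets its own so it cannot collide with a pending
             * synchronous migrate cause. */
            struct lostage_signal *rq =
                    &thread->sigarray[sigslot_of(sig, arg)];

            rq->inband_work = (struct pipeline_inband_work)
                    PIPELINE_INBAND_WORK_INITIALIZER(*rq, lostage_task_signal);
            rq->task = xnthread_host_task(thread);
            rq->signo = sig;
            rq->sigval = arg;
            pipeline_post_inband_work(rq);  /* no xnmalloc on this path */
    }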

-- 
Philippe.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: "rtdm/nrtsig: move inband work description off the stack"
  2021-06-04  8:36                               ` Chen, Hongzhan
@ 2021-06-04 12:54                                 ` Philippe Gerum
  0 siblings, 0 replies; 21+ messages in thread
From: Philippe Gerum @ 2021-06-04 12:54 UTC (permalink / raw)
  To: Chen, Hongzhan; +Cc: Jan Kiszka, xenomai


Chen, Hongzhan <hongzhan.chen@intel.com> writes:

>>-----Original Message-----
>>From: Philippe Gerum <rpm@xenomai.org> 
>>Sent: Friday, June 4, 2021 4:15 PM
>>To: Jan Kiszka <jan.kiszka@siemens.com>
>>Cc: Chen, Hongzhan <hongzhan.chen@intel.com>; xenomai@xenomai.org
>>Subject: Re: "rtdm/nrtsig: move inband work description off the stack"
>>
>>
>>Jan Kiszka <jan.kiszka@siemens.com> writes:
>>
>>> On 04.06.21 09:51, Philippe Gerum wrote:
>>>> 
>>>> Chen, Hongzhan <hongzhan.chen@intel.com> writes:
>>>> 
>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>
>>>>>>> On 26.05.21 15:52, Philippe Gerum wrote:
>>>>>>>>
>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>>>
>>>>>>>>> On 25.05.21 15:20, Philippe Gerum wrote:
>>>>>>>>>>
>>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>>>>>
>>>>>>>>>>> On 25.05.21 14:37, Philippe Gerum wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>>>>>>>
>>>>>>>>>>>>> On 25.05.21 12:01, Philippe Gerum wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 25.05.21 10:46, Philippe Gerum wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Philippe,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> [1] makes rtdm_schedule_nrt_work a rather heavyweight service, compared
>>>>>>>>>>>>>>>>> to what it was (and even over I-pipe). xnmalloc is nothing you would
>>>>>>>>>>>>>>>>> expect from a "send a signal" service, specifically from
>>>>>>>>>>>>>>>>> rtdm_nrtsig_pend which does not even make use of the sending extra payload.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Can we do better? Also for xnthread_signal (fd and udd usecases are not
>>>>>>>>>>>>>>>>> time-sensitive anyway).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Nope, I cannot see any significantly better option which would still
>>>>>>>>>>>>>>>> allow a common implementation we can map on top of Dovetail / I-pipe.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> E.g. pre-allocate the work object for data-free signals and use that -
>>>>>>>>>>>>>>> or not send it when it is in use, thus pending.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ok, let's recap:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - rtdm_nrtsig_pend(): does not allocate anything, merely uses a
>>>>>>>>>>>>>>   user-provided memory block, containing a request header. Current
>>>>>>>>>>>>>>   callers should either already care for not resending a request
>>>>>>>>>>>>>>   uselessly because it is pending (e.g. [1]), or their internal logic
>>>>>>>>>>>>>>   should prevent that (e.g. [2]). This interface always conveys user data
>>>>>>>>>>>>>>   by reference.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ah, I fantasized that rtdm_nrtsig_pend would call rtdm_schedule_nrt_work
>>>>>>>>>>>>> to do the work; it's the other way around. False alarm here.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - rtdm_schedule_nrt_work(): does allocate a nrtsig_t struct in order to
>>>>>>>>>>>>>>   convey a request block we cannot squeeze into a work_struct, since the
>>>>>>>>>>>>>>   latter is in itself an anchor type the user may embed into its own
>>>>>>>>>>>>>>   private argument block. I'm unsure whether ensuring that
>>>>>>>>>>>>>>   rtdm_schedule_nrt_work() calls do not pile up would be worth it, as
>>>>>>>>>>>>>>   this call is not supposed to be used frenetically on some hot path in
>>>>>>>>>>>>>>   the first place. But if we'd want to do that, then we would have to
>>>>>>>>>>>>>>   change the signature of rtdm_schedule_nrt_work(), so that we gain a
>>>>>>>>>>>>>>   persistent context data we could use for determining whether a nrt
>>>>>>>>>>>>>>   work request is in flight. We could not probe the work_struct for that
>>>>>>>>>>>>>>   purpose, that would be racy.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Do you see any way to have a smarter rtdm_schedule_nrt_work() without
>>>>>>>>>>>>>> changing its signature?
>>>>>>>>>>>>>
>>>>>>>>>>>>> There is no rule that prevents changing the signature if that is needed.
>>>>>>>>>>>>> However, the kernel is fine without allocation as well, using a very
>>>>>>>>>>>>> similar signature.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> schedule_work() and rtdm_schedule_nrt_work() have a major difference:
>>>>>>>>>>>> the latter has to pass on the request from the oob to the in-band
>>>>>>>>>>>> context. This is what bugs us and creates the requirement for additional
>>>>>>>>>>>> mechanism and data.
>>>>>>>>>>>>
>>>>>>>>>>>>> I do not yet understand why we need that indirection via the rtdm_nrtsig
>>>>>>>>>>>>> on Dovetail. I thought we can trigger work directly from the oob domain
>>>>>>>>>>>>> there, can't we? How do you handle such a case in evl so far?
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Dovetail can trigger in-band work from oob via the generic irq_work()
>>>>>>>>>>>> service, we don't need the extra code involved in the I-pipe for the
>>>>>>>>>>>> same purpose. The gist of the matter is about having the same
>>>>>>>>>>>> implementation for rtdm_schedule_nrt_work() which works in both Dovetail
>>>>>>>>>>>> and I-pipe contexts: the way we do that is by using rtdm_nrtsig_pend()
>>>>>>>>>>>> which already abstracts the base mechanism for relaying oob -> in-band
>>>>>>>>>>>> signals.
>>>>>>>>>>>>
>>>>>>>>>>>> On the other hand, EVL has the evl_call_inband[_from]() service, which
>>>>>>>>>>>> combines the irq_work and work_struct anchors we need into a single
>>>>>>>>>>>> evl_work type, which in turn can be embedded into a user request
>>>>>>>>>>>> block. This is what rtdm_schedule_nrt_work() does not expose, so it has
>>>>>>>>>>>> to build one internally by combining a dynamic allocation and the
>>>>>>>>>>>> user-provided work_struct.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Then enhance the rtdm service to enable such a combination - and be even
>>>>>>>>>>> more ready for the switch to Xenomai 4, I would say. We can either add a
>>>>>>>>>>> new service and deprecate the old one (while implementing it as you
>>>>>>>>>>> suggest) or just break the interface to get the (presumably) few users
>>>>>>>>>>> quickly converted.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> rtdm_schedule_nrt_work() was added by Gilles when merging the RTnet code
>>>>>>>>>> into Xenomai 3 IIRC, with little documentation. Widespread usage of this
>>>>>>>>>> call is unlikely, so I'd vote for simplifying the whole thing and
>>>>>>>>>> replacing it entirely.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> OK, then we have a plan for this piece.
>>>>>
>>>>> Do we need to implement a new API to replace rtdm_schedule_nrt_work() to avoid calling xnmalloc,
>>>>> or just keep it as it is in Xenomai 3.2?
>>>>>
>>>>>>>>>
>>>>>>>>>>>>> And the third case remains xnthread_signal, btw.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> xnthread_signal() is used to trigger SIGDEBUG/SIGSHADOW/SIGTERM signals
>>>>>>>>>>>> for a thread, a rare event. Optimizing there would be overkill, unless
>>>>>>>>>>>> the application behaves wrongly in the first place.
>>>>>>>>>>>
>>>>>>>>>>> Some callers are under nklock, thus it makes potentially lengthy critical
>>>>>>>>>>> sections now a bit longer.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> This is what needs to be fixed in the first place, we should not trigger
>>>>>>>>>> a signal relay when holding the ugly big lock.
>>>>>
>>>>> Which functions do we need to fix or optimize to avoid calling xnthread_signal when holding
>>>>> the big lock, if possible? xnthread_suspend and xnthread_cancel?
>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Whatever is simpler.
>>>>>>>>>
>>>>>>>>> Another aspect: Using xnmalloc for anything that had better succeed
>>>>>>>>> is possibly not good when we are OOM. That is actually no strict
>>>>>>>>> regression over I-pipe where we had a limited queue as well IIRC but a
>>>>>>>>> reason to avoid carrying that over - and possibly creating new error
>>>>>>>>> scenarios that are harder to debug.
>>>>>>>>>
>>>>>>>>
>>>>>>>> To mitigate the issue of generalized OOM, we could pull memory from a
>>>>>>>> local xnheap privately attached to xnthread_signal() along with any
>>>>>>>> caller requiring dynamic allocation, so that none of them could deplete
>>>>>>>> the sysheap.
>>>>>>>>
>>>>>>>> Getting rid of dynamic allocation entirely on the other hand would
>>>>>>>> require us to add one or more "signal slots" per possible cause of
>>>>>>>> in-band signal into struct xnthread (e.g. SIGSHADOW_ACTION_BACKTRACE,
>>>>>>>> SIGDEBUG_MIGRATE_SIGNAL etc), passing the proper slot to
>>>>>>>> xnthread_signal(). SIGSHADOW(SIGWINCH) and SIGDEBUG(SIGXCPU) are
>>>>>>>> standard non-queueable signals, so we should only need a single slot per
>>>>>>>> signal cause.
>>>>>>>>
>>>>>>>
>>>>>>> Can a thread actually trigger multiple causes per SIGDEBUG, e.g.?
>>>>>>> Likely, we only need to carry the single cause via a field in xnthread
>>>>>>> from the trigger point to the in-band handler. The same is probably the
>>>>>>> case for SIGSHADOW_ACTION_BACKTRACE.
>>>>>>>
>>>>>>
>>>>>> Multiple events triggering SIGSHADOW_ACTION_HOME and
>>>>>> SIGSHADOW_ACTION_HARDEN _might_ pile up, although unlikely.
>>>>>
>>>>> Does that mean that in struct xnthread there should be three slots of
>>>>> struct lostage_signal to handle each of SIGDEBUG/SIGSHADOW/SIGTERM
>>>>> like patch [1]?
>>>>>
>>>>> If there is no pile-up for each signal, we would just allocate a fixed position
>>>>> for the three signals in the array and then handle the corresponding lostage_signal
>>>>> by signo. Is that enough?
>>>>>
>>>>> [1]:
>>>>>
>>>>> +++ b/include/cobalt/kernel/thread.h
>>>>> @@ -51,6 +51,15 @@ struct xnsched_tpslot;
>>>>>  struct xnthread_personality;
>>>>>  struct completion;
>>>>>
>>>>> +#define XENO_NUM_SIGNALS 3 /* SIGDEBUG/SIGSHADOW/SIGTERM */
>>>>> +
>>>>> +struct lostage_signal {
>>>>> +       struct pipeline_inband_work inband_work; /* Must be first. */
>>>>> +       struct task_struct *task;
>>>>> +       int signo, sigval;
>>>>> +       struct lostage_signal *self; /* Revisit: I-pipe requirement */
>>>>> +};
>>>>> +
>>>>>  struct xnthread_init_attr {
>>>>>         struct xnthread_personality *personality;
>>>>>         cpumask_t affinity;
>>>>> @@ -205,6 +214,7 @@ struct xnthread {
>>>>>         const char *exe_path;   /* Executable path */
>>>>>         u32 proghash;           /* Hash value for exe_path */
>>>>>  #endif
>>>>> +       struct lostage_signal sigarray[XENO_NUM_SIGNALS];
>>>>>  };
>>>>>
>>>> 
>>>> I would have as many entries as we have distinct reasons for notifying
>>>> the application, i.e. include/cobalt/uapi/signal.h:
>>>> 
>>>> #define SIGSHADOW_ACTION_HARDEN		1
>>>> #define SIGSHADOW_ACTION_BACKTRACE	2
>>>> #define SIGSHADOW_ACTION_HOME		3
>>>> #define SIGDEBUG_MIGRATE_SIGNAL		1
>>>> #define SIGDEBUG_MIGRATE_SYSCALL	2
>>>> #define SIGDEBUG_MIGRATE_FAULT		3
>>>> #define SIGDEBUG_MIGRATE_PRIOINV	4
>>>> #define SIGDEBUG_NOMLOCK		5
>>>> #define SIGDEBUG_WATCHDOG		6
>>>> #define SIGDEBUG_RESCNT_IMBALANCE	7
>>>> #define SIGDEBUG_LOCK_BREAK		8
>>>> #define SIGDEBUG_MUTEX_SLEEP		9
>>>
>>> For SIGDEBUG, I believe we only need the first reason, as I wrote in the
>>> other reply. We are interested in the *initial* reason for a thread to
>>> migrate. Once it is migrated, it will consume that and only generate a
>>> new reason after going back.
>>>
>>
>>We may have SIGDEBUG_WATCHDOG pending concurrently with any other cause,
>>as it is triggered asynchronously by the oob timer. So at the very least,
>>you want this one to have its own slot.
>>
>>So, to sum up: the following events are synchronous and mutually
>>exclusive due to the oob->in-band transition they cause:
>>
>>SIGDEBUG_MIGRATE_SIGNAL, SIGDEBUG_MIGRATE_SYSCALL,
>>SIGDEBUG_MIGRATE_FAULT, SIGDEBUG_MIGRATE_PRIOINV, SIGDEBUG_MUTEX_SLEEP,
>>SIGDEBUG_LOCK_BREAK, SIGDEBUG_RESCNT_IMBALANCE. For these we may share a single slot.
>>SIGDEBUG_NOMLOCK can only happen at init when no Xenomai service is
>>available yet, so we don't have to bother. This one can join the above
>>group too.
>>
>>SIGDEBUG_WATCHDOG needs its own slot.
>>
>>All SIGSHADOW_* events need their own slot: SIGSHADOW_ACTION_HARDEN and
>>SIGSHADOW_ACTION_HOME can be raised by remote threads asynchronously to
>>the target thread. SIGSHADOW_ACTION_BACKTRACE comes in addition to
>>SIGDEBUG_MIGRATE_*. For the latter reason, SIGSHADOW_ACTION_BACKTRACE
>>cannot pile up though (since we have to transition from oob->in-band
>>context in between).
>>
>>That would make five distinct slots in struct xnthread, each representing
>>a group of events, each advertising its own pipeline_inband_work struct.
>
> OK. SIGTERM (1) + SIGDEBUG (2) + SIGSHADOW (3) = 6 slots in struct xnthread.
>

Yes.

-- 
Philippe.


^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2021-06-04 12:54 UTC | newest]

Thread overview: 21+ messages
2021-05-25  7:35 "rtdm/nrtsig: move inband work description off the stack" Jan Kiszka
2021-05-25  8:46 ` Philippe Gerum
2021-05-25  8:58   ` Jan Kiszka
2021-05-25 10:01     ` Philippe Gerum
2021-05-25 10:38       ` Jan Kiszka
2021-05-25 12:37         ` Philippe Gerum
2021-05-25 12:56           ` Jan Kiszka
2021-05-25 13:20             ` Philippe Gerum
2021-05-25 17:08               ` Jan Kiszka
2021-05-26 13:52                 ` Philippe Gerum
2021-05-31 15:07                   ` Jan Kiszka
2021-06-02  6:51                     ` Philippe Gerum
2021-06-04  6:34                       ` Chen, Hongzhan
2021-06-04  7:34                         ` Jan Kiszka
2021-06-04  7:51                         ` Philippe Gerum
2021-06-04  7:54                           ` Jan Kiszka
2021-06-04  8:15                             ` Philippe Gerum
2021-06-04  8:21                               ` Jan Kiszka
2021-06-04 12:53                                 ` Philippe Gerum
2021-06-04  8:36                               ` Chen, Hongzhan
2021-06-04 12:54                                 ` Philippe Gerum
