Re: "rtdm/nrtsig: move inband work description off the stack"

From: Philippe Gerum <rpm@xenomai.org>
To: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Xenomai <xenomai@xenomai.org>
Subject: Re: "rtdm/nrtsig: move inband work description off the stack"
Date: Wed, 02 Jun 2021 08:51:21 +0200	[thread overview]
Message-ID: <87h7ighi8m.fsf@xenomai.org> (raw)
In-Reply-To: <1ea00a75-379d-3d7f-cb33-406f8188d38e@siemens.com>

Jan Kiszka <jan.kiszka@siemens.com> writes:

> On 26.05.21 15:52, Philippe Gerum wrote:
>> 
>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>> 
>>> On 25.05.21 15:20, Philippe Gerum wrote:
>>>>
>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>
>>>>> On 25.05.21 14:37, Philippe Gerum wrote:
>>>>>>
>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>
>>>>>>> On 25.05.21 12:01, Philippe Gerum wrote:
>>>>>>>>
>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>>>
>>>>>>>>> On 25.05.21 10:46, Philippe Gerum wrote:
>>>>>>>>>>
>>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>>>>>
>>>>>>>>>>> Hi Philippe,
>>>>>>>>>>>
>>>>>>>>>>> [1] makes rtdm_schedule_nrt_work a rather heavyweight service, compared
>>>>>>>>>>> to what it was (and even over I-pipe). xnmalloc is nothing you would
>>>>>>>>>>> expect from a "send a signal" service, specifically from
>>>>>>>>>>> rtdm_nrtsig_pend which does not even make use of the sending extra payload.
>>>>>>>>>>>
>>>>>>>>>>> Can we do better? Also for xnthread_signal (fd and udd usecases are not
>>>>>>>>>>> time-sensitive anyway).
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Nope, I cannot see any significantly better option which would still
>>>>>>>>>> allow a common implementation we can map on top of Dovetail / I-pipe.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> E.g. pre-allocate the work object for data-free signals and use that -
>>>>>>>>> or not send it when it is in use, thus pending.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Ok, let's recap:
>>>>>>>>
>>>>>>>> - rtdm_nrtsig_pend(): does not allocate anything, merely uses a
>>>>>>>>   user-provided memory block, containing a request header. Current
>>>>>>>>   callers should either already care for not resending a request
>>>>>>>>   uselessly because it is pending (e.g. [1]), or their internal logic
>>>>>>>>   should prevent that (e.g. [2]). This interface always convey user data
>>>>>>>>   by reference.
>>>>>>>
>>>>>>> Ah, I fantasized that rtdm_nrtsig_pend would call rtdm_schedule_nrt_work
>>>>>>> to do the work, it's the other way around. False alarm here.
>>>>>>>
>>>>>>>>
>>>>>>>> - rtdm_schedule_nrt_work(): does allocate a nrtsig_t struct in order to
>>>>>>>>   convey a request block we cannot squeeze into a work_struct, since the
>>>>>>>>   latter is in itself an anchor type the user may embed into its own
>>>>>>>>   private argument block. I'm unsure ensuring that
>>>>>>>>   rtdm_schedule_nrt_work() calls do not pile up would be worth it, as
>>>>>>>>   this call is not supposed to be used frenetically on some hot path in
>>>>>>>>   the first place. But if we'd want to do that, then we would have to
>>>>>>>>   change the signature of rtdm_schedule_nrt_work(), so that we gain a
>>>>>>>>   persistent context data we could use for determining whether a nrt
>>>>>>>>   work request is in flight. We could not probe the work_struct for that
>>>>>>>>   purpose, that would be racy.
>>>>>>>>
>>>>>>>> Do you see any way to have a smarter rtdm_schedule_nrt_work() without
>>>>>>>> changing its signature?
>>>>>>>
>>>>>>> There is no rule that prevents changing the signature if that is needed.
>>>>>>> However, the kernel is fine without allocation as well, using a very
>>>>>>> similar signature.
>>>>>>>
>>>>>>
>>>>>> schedule_work() and rtdm_schedule_nrt_work() have a major difference:
>>>>>> the latter has to pass on the request from the oob to the in-band
>>>>>> context. This is what bugs us and creates the requirement for additional
>>>>>> mechanism and data.
>>>>>>
>>>>>>> I do not yet understand way we need that indirection via the rtdm_nrtsig
>>>>>>> on Dovetail. I thought we can trigger work directly from the oob domain
>>>>>>> there, can't we? How do you handle such a case in evl so far?
>>>>>>>
>>>>>>
>>>>>> Dovetail can trigger in-band work from oob via the generic irq_work()
>>>>>> service, we don't need the extra code involved in the I-pipe for the
>>>>>> same purpose. The gist of the matter is about having the same
>>>>>> implementation for rtdm_schedule_nrt_work() which works in both Dovetail
>>>>>> and I-pipe contexts: the way we do that is by using rtdm_nrtsig_pend()
>>>>>> which already abstracts the base mechanism for relaying oob -> in-band
>>>>>> signals.
>>>>>>
>>>>>> On the other hand, EVL has the evl_call_inband[_from]() service, which
>>>>>> combines the irq_work and work_struct anchors we need into a single
>>>>>> evl_work type, which in turn can be embedded into a user request
>>>>>> block. This is what rtdm_schedule_nrt_work() does not expose, so it has
>>>>>> to build one internally by combining a dynamic allocation and the
>>>>>> user-provided work_struct.
>>>>>>
>>>>>
>>>>> Then enhance the rtdm service to enable such a combination - and be even
>>>>> more ready for the switch to Xenomai 4, I would say. We can either add a
>>>>> new service and deprecate the old one (while implementing it as you
>>>>> suggest) or just break the interface to get the (presumably) few users
>>>>> quickly converted.
>>>>>
>>>>
>>>> rtdm_schedule_nrt_work() was added by Gilles when merging the RTnet code
>>>> into Xenomai 3 IIRC, with little documentation. Widespread usage of this
>>>> call is unlikely, so I'd vote for simplifying the whole thing and
>>>> replace it entirely.
>>>>
>>>
>>> OK, then we have a plan for this piece.
>>>
>>>>>>> And the third case remains xnthread_signal, btw.
>>>>>>>
>>>>>>
>>>>>> xnthread_signal() is used to trigger SIGDEBUG/SIGSHADOW/SIGTERM signals
>>>>>> for a thread, a seldom event. Optimizing there would be overkill, unless
>>>>>> the application behaves wrongly in the first place.
>>>>>
>>>>> Some caller are under nklock, thus it makes potentially lengthy critical
>>>>> sections now a bit longer.
>>>>>
>>>>
>>>> This is what needs to be fixed in the first place, we should not trigger
>>>> a signal relay when holding the ugly big lock.
>>>>
>>>
>>> Whatever is simpler.
>>>
>>> Another aspect: Using xnmalloc for anything that should better succeed
>>> is possibly not good when we are OOM. That is actually no strict
>>> regression over I-pipe where we had a limited queue as well IIRC but a
>>> reason to avoid carrying that over - and possibly creating new error
>>> scenarios that are harder to debug.
>>>
>> 
>> To mitigate the issue of generalized OOM, we could pull memory from a
>> local xnheap privately attached to xnthread_signal() along with any
>> caller requiring dynamic allocation, so that none of them could deplete
>> the sysheap.
>> 
>> Getting rid of dynamic allocation entirely on the other hand would
>> require us to add one or more "signal slots" per possible cause of
>> in-band signal into struct xnthread (e.g. SIGSHADOW_ACTION_BACKTRACE,
>> SIGDEBUG_MIGRATE_SIGNAL etc), passing the proper slot to
>> xnthread_signal(). SIGSHADOW(SIGWINCH) and SIGDEBUG(SIGXCPU) are
>> standard non-queueable signals, so we should only need a single slot per
>> signal cause.
>> 
>
> Can a thread actually trigger multiple causes per SIGDEBUG, e.g.?
> Likely, we only need to carry the single cause via a field in xnthread
> from the trigger point to the in-band handler. The same is probably the
> case for SIGSHADOW_ACTION_BACKTRACE.
>

Multiple events triggering SIGSHADOW_ACTION_HOME and
SIGSHADOW_ACTION_HARDEN _might_ pile up, although unlikely.

> Now, the question for me is how we move forward from this? Is it simpler
> to fix these issues? Or would it be better first avoid affecting I-pipe
> cases and living with xnmalloc for an initial phase over Dovetail?
>

A much better option would be that somebody volunteers to work on
implementing what has just been described, in order to have the proper
solution available in both I-pipe an Dovetail contexts, based on generic
code.

Although such work may not seem obvious at first sight reading the
jargonish descriptions above, it is actually fairly simple to
implement. I believe that any contributor would be welcome to ask for
explanations and details for that purpose.

-- 
Philippe.