From: Philippe Gerum
Subject: Re: "rtdm/nrtsig: move inband work description off the stack"
Date: Fri, 04 Jun 2021 14:53:20 +0200
Message-ID: <87r1hhg5a7.fsf@xenomai.org>
List-Id: Discussions about the Xenomai project
To: Jan Kiszka
Cc: "Chen, Hongzhan", xenomai@xenomai.org

Jan Kiszka writes:

> On 04.06.21 10:15, Philippe Gerum wrote:
>>
>> Jan Kiszka writes:
>>
>>> On 04.06.21 09:51, Philippe Gerum wrote:
>>>>
>>>> Chen, Hongzhan writes:
>>>>
>>>>>> Jan Kiszka writes:
>>>>>>
>>>>>>> On 26.05.21 15:52, Philippe Gerum wrote:
>>>>>>>>
>>>>>>>> Jan Kiszka writes:
>>>>>>>>
>>>>>>>>> On 25.05.21 15:20, Philippe Gerum wrote:
>>>>>>>>>>
>>>>>>>>>> Jan Kiszka writes:
>>>>>>>>>>
>>>>>>>>>>> On 25.05.21 14:37, Philippe Gerum wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Jan Kiszka writes:
>>>>>>>>>>>>
>>>>>>>>>>>>> On 25.05.21 12:01, Philippe Gerum wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Jan Kiszka writes:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 25.05.21 10:46, Philippe Gerum wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Jan Kiszka writes:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Philippe,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> [1] makes rtdm_schedule_nrt_work a rather heavyweight service, compared
>>>>>>>>>>>>>>>>> to what it was (and even over I-pipe). xnmalloc is nothing you would
>>>>>>>>>>>>>>>>> expect from a "send a signal" service, specifically from
>>>>>>>>>>>>>>>>> rtdm_nrtsig_pend, which does not even make use of the extra payload
>>>>>>>>>>>>>>>>> being sent.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Can we do better? Also for xnthread_signal (the fd and udd use cases
>>>>>>>>>>>>>>>>> are not time-sensitive anyway).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Nope, I cannot see any significantly better option which would still
>>>>>>>>>>>>>>>> allow a common implementation we can map on top of Dovetail / I-pipe.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> E.g. pre-allocate the work object for data-free signals and use that -
>>>>>>>>>>>>>>> or don't send it while it is still in use, i.e. pending.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ok, let's recap:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - rtdm_nrtsig_pend(): does not allocate anything, merely uses a
>>>>>>>>>>>>>>   user-provided memory block containing a request header. Current
>>>>>>>>>>>>>>   callers should either already take care not to resend a request
>>>>>>>>>>>>>>   uselessly while it is pending (e.g. [1]), or their internal logic
>>>>>>>>>>>>>>   should prevent that (e.g. [2]). This interface always conveys user
>>>>>>>>>>>>>>   data by reference.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ah, I fantasized that rtdm_nrtsig_pend would call rtdm_schedule_nrt_work
>>>>>>>>>>>>> to do the work; it's the other way around. False alarm here.
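
For reference, the rtdm_nrtsig_pend() pattern from a driver's perspective
boils down to the sketch below - illustrative only, with made-up my_* names.
The point is that the caller owns the whole request block for the lifetime
of the signal, so no allocation is involved on the oob path:

  #include <rtdm/driver.h>

  struct my_request {
          rtdm_nrtsig_t nrt_sig;  /* relay header, caller-provided */
          int payload;            /* user data, conveyed by reference */
  };

  static struct my_request my_req;

  /* Runs in-band once the signal has been relayed. */
  static void my_handler(rtdm_nrtsig_t *nrt_sig, void *arg)
  {
          struct my_request *req = arg;
          /* ... consume req->payload; the block may be reused afterwards ... */
  }

  static int my_driver_init(void) /* in-band init path */
  {
          rtdm_nrtsig_init(&my_req.nrt_sig, my_handler, &my_req);
          return 0;
  }

  static void my_oob_event(void)  /* oob context, e.g. a rt IRQ handler */
  {
          my_req.payload = 42;
          /* No allocation here; the caller must not re-pend while the
             previous request is still in flight. */
          rtdm_nrtsig_pend(&my_req.nrt_sig);
  }
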
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - rtdm_schedule_nrt_work(): does allocate an nrtsig_t struct in order to
>>>>>>>>>>>>>>   convey a request block we cannot squeeze into a work_struct, since the
>>>>>>>>>>>>>>   latter is in itself an anchor type the user may embed into its own
>>>>>>>>>>>>>>   private argument block. I'm unsure whether ensuring that
>>>>>>>>>>>>>>   rtdm_schedule_nrt_work() calls do not pile up would be worth it, as
>>>>>>>>>>>>>>   this call is not supposed to be used frenetically on some hot path in
>>>>>>>>>>>>>>   the first place. But if we wanted to do that, then we would have to
>>>>>>>>>>>>>>   change the signature of rtdm_schedule_nrt_work(), so that we gain
>>>>>>>>>>>>>>   persistent context data we could use to determine whether an nrt
>>>>>>>>>>>>>>   work request is in flight. We could not probe the work_struct for
>>>>>>>>>>>>>>   that purpose; that would be racy.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Do you see any way to have a smarter rtdm_schedule_nrt_work() without
>>>>>>>>>>>>>> changing its signature?
>>>>>>>>>>>>>
>>>>>>>>>>>>> There is no rule that prevents changing the signature if that is needed.
>>>>>>>>>>>>> However, the kernel is fine without allocation as well, using a very
>>>>>>>>>>>>> similar signature.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> schedule_work() and rtdm_schedule_nrt_work() have a major difference:
>>>>>>>>>>>> the latter has to pass on the request from the oob to the in-band
>>>>>>>>>>>> context. This is what bugs us and creates the requirement for an
>>>>>>>>>>>> additional mechanism and data.
>>>>>>>>>>>>
>>>>>>>>>>>>> I do not yet understand why we need that indirection via the rtdm_nrtsig
>>>>>>>>>>>>> on Dovetail. I thought we could trigger work directly from the oob domain
>>>>>>>>>>>>> there, can't we? How do you handle such a case in evl so far?
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Dovetail can trigger in-band work from oob via the generic irq_work()
>>>>>>>>>>>> service; we don't need the extra code involved in the I-pipe for the
>>>>>>>>>>>> same purpose. The gist of the matter is about having the same
>>>>>>>>>>>> implementation of rtdm_schedule_nrt_work() which works in both Dovetail
>>>>>>>>>>>> and I-pipe contexts: the way we do that is by using rtdm_nrtsig_pend(),
>>>>>>>>>>>> which already abstracts the base mechanism for relaying oob -> in-band
>>>>>>>>>>>> signals.
>>>>>>>>>>>>
>>>>>>>>>>>> On the other hand, EVL has the evl_call_inband[_from]() service, which
>>>>>>>>>>>> combines the irq_work and work_struct anchors we need into a single
>>>>>>>>>>>> evl_work type, which in turn can be embedded into a user request
>>>>>>>>>>>> block. This is what rtdm_schedule_nrt_work() does not expose, so it has
>>>>>>>>>>>> to build one internally by combining a dynamic allocation and the
>>>>>>>>>>>> user-provided work_struct.
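
For illustration, an allocation-free RTDM variant would have to expose the
combined anchor to its caller, along these lines - a design sketch only: the
rtdm_nrt_work type and its naming are hypothetical, mirroring EVL's evl_work,
and the relay type is the one from the pipeline layer discussed below:

  #include <linux/workqueue.h>
  #include <pipeline/inband_work.h>  /* struct pipeline_inband_work (Dovetail) */

  /*
   * Hypothetical combined anchor: both relay stages live in caller-owned
   * memory, so nothing is allocated on the oob -> in-band path.
   */
  struct rtdm_nrt_work {
          struct pipeline_inband_work inband_work; /* oob -> in-band relay */
          struct work_struct work;                 /* workqueue anchor */
  };

  /* The caller embeds the anchor into its own request block... */
  struct my_request {
          struct rtdm_nrt_work nrt_work;
          int payload;
  };

  /* ...and the work handler retrieves the block via container_of(): */
  static void my_work_handler(struct work_struct *work)
  {
          struct my_request *req = container_of(work, struct my_request,
                                                nrt_work.work);
          /* in-band processing of req->payload */
  }

A reworked rtdm_schedule_nrt_work() taking such an anchor could also
test-and-set an in-flight flag inside it to fold piled-up requests, which
the bare work_struct cannot give us race-free.
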
>>>>>>>>>>>
>>>>>>>>>>> Then enhance the rtdm service to enable such a combination - and be even
>>>>>>>>>>> more ready for the switch to Xenomai 4, I would say. We can either add a
>>>>>>>>>>> new service and deprecate the old one (while implementing it as you
>>>>>>>>>>> suggest), or just break the interface to get the (presumably) few users
>>>>>>>>>>> quickly converted.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> rtdm_schedule_nrt_work() was added by Gilles when merging the RTnet code
>>>>>>>>>> into Xenomai 3 IIRC, with little documentation. Widespread usage of this
>>>>>>>>>> call is unlikely, so I'd vote for simplifying the whole thing and
>>>>>>>>>> replacing it entirely.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> OK, then we have a plan for this piece.
>>>>>
>>>>> Do we need to implement a new API to replace rtdm_schedule_nrt_work(),
>>>>> to avoid calling xnmalloc, or just keep it as it is in Xenomai 3.2?
>>>>>
>>>>>>>>>
>>>>>>>>>>>>> And the third case remains xnthread_signal, btw.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> xnthread_signal() is used to trigger SIGDEBUG/SIGSHADOW/SIGTERM signals
>>>>>>>>>>>> for a thread, a rare event. Optimizing there would be overkill, unless
>>>>>>>>>>>> the application behaves wrongly in the first place.
>>>>>>>>>>>
>>>>>>>>>>> Some callers are under nklock, so this makes potentially lengthy critical
>>>>>>>>>>> sections a bit longer.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> This is what needs to be fixed in the first place; we should not trigger
>>>>>>>>>> a signal relay while holding the ugly big lock.
>>>>>
>>>>> Which functions do we need to fix or optimize to avoid calling
>>>>> xnthread_signal while holding the big lock, if possible?
>>>>> xnthread_suspend and xnthread_cancel?
>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Whatever is simpler.
>>>>>>>>>
>>>>>>>>> Another aspect: using xnmalloc for anything that had better succeed
>>>>>>>>> is possibly not good when we are OOM. That is actually no strict
>>>>>>>>> regression over the I-pipe, where we had a limited queue as well IIRC,
>>>>>>>>> but a reason to avoid carrying that over - and possibly creating new
>>>>>>>>> error scenarios that are harder to debug.
>>>>>>>>>
>>>>>>>>
>>>>>>>> To mitigate the issue of generalized OOM, we could pull memory from a
>>>>>>>> local xnheap privately attached to xnthread_signal() along with any
>>>>>>>> caller requiring dynamic allocation, so that none of them could deplete
>>>>>>>> the sysheap.
>>>>>>>>
>>>>>>>> Getting rid of dynamic allocation entirely, on the other hand, would
>>>>>>>> require us to add one or more "signal slots" per possible cause of
>>>>>>>> in-band signal into struct xnthread (e.g. SIGSHADOW_ACTION_BACKTRACE,
>>>>>>>> SIGDEBUG_MIGRATE_SIGNAL etc.), passing the proper slot to
>>>>>>>> xnthread_signal(). SIGSHADOW (SIGWINCH) and SIGDEBUG (SIGXCPU) are
>>>>>>>> standard non-queueable signals, so we should only need a single slot per
>>>>>>>> signal cause.
>>>>>>>>
>>>>>>>
>>>>>>> Can a thread actually trigger multiple causes per SIGDEBUG, e.g.?
>>>>>>> Likely, we only need to carry the single cause via a field in xnthread
>>>>>>> from the trigger point to the in-band handler. The same is probably the
>>>>>>> case for SIGSHADOW_ACTION_BACKTRACE.
>>>>>>>
>>>>>>
>>>>>> Multiple events triggering SIGSHADOW_ACTION_HOME and
>>>>>> SIGSHADOW_ACTION_HARDEN _might_ pile up, although that is unlikely.
>>>>>
>>>>> Does that mean that struct xnthread should contain three slots of
>>>>> struct lostage_signal, one for each of SIGDEBUG/SIGSHADOW/SIGTERM,
>>>>> as in patch [1]?
>>>>>
>>>>> If there is no pile-up for each signal, we would just allocate a fixed
>>>>> position for the three signals in the array and then pick the
>>>>> corresponding lostage_signal by signo. Is that enough?
>>>>>
>>>>> [1]:
>>>>>
>>>>> +++ b/include/cobalt/kernel/thread.h
>>>>> @@ -51,6 +51,15 @@ struct xnsched_tpslot;
>>>>>  struct xnthread_personality;
>>>>>  struct completion;
>>>>>
>>>>> +#define XENO_NUM_SIGNALS 3 /* SIGDEBUG/SIGSHADOW/SIGTERM */
>>>>> +
>>>>> +struct lostage_signal {
>>>>> +	struct pipeline_inband_work inband_work; /* Must be first. */
>>>>> +	struct task_struct *task;
>>>>> +	int signo, sigval;
>>>>> +	struct lostage_signal *self; /* Revisit: I-pipe requirement */
>>>>> +};
>>>>> +
>>>>>  struct xnthread_init_attr {
>>>>>  	struct xnthread_personality *personality;
>>>>>  	cpumask_t affinity;
>>>>> @@ -205,6 +214,7 @@ struct xnthread {
>>>>>  	const char *exe_path;	/* Executable path */
>>>>>  	u32 proghash;		/* Hash value for exe_path */
>>>>>  #endif
>>>>> +	struct lostage_signal sigarray[XENO_NUM_SIGNALS];
>>>>>  };
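
To make the slot idea concrete, posting one of these preallocated slots
could look like the sketch below - sig_slot() and pipeline_post_inband_work()
are stand-ins for whatever helpers the pipeline layer actually provides, and
locking against concurrent fillers is elided:

  /* Sketch: allocation-free signal relay using the per-thread slots. */
  static void post_lostage_signal(struct xnthread *thread,
                                  int signo, int sigval)
  {
          /* Map the signal number to its fixed slot (illustrative helper). */
          struct lostage_signal *rq = &thread->sigarray[sig_slot(signo)];

          rq->task = xnthread_host_task(thread);
          rq->signo = signo;
          rq->sigval = sigval;

          /* Relay to the in-band context; no xnmalloc on this path. The
             slot stays busy until the in-band handler has run; re-posting
             the same cause before delivery coalesces, as for regular
             non-queueable signals. */
          pipeline_post_inband_work(rq);
  }
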
>>>>
>>>> I would have as many entries as we have distinct reasons for notifying
>>>> the application, i.e. from include/cobalt/uapi/signal.h:
>>>>
>>>> #define SIGSHADOW_ACTION_HARDEN     1
>>>> #define SIGSHADOW_ACTION_BACKTRACE  2
>>>> #define SIGSHADOW_ACTION_HOME       3
>>>> #define SIGDEBUG_MIGRATE_SIGNAL     1
>>>> #define SIGDEBUG_MIGRATE_SYSCALL    2
>>>> #define SIGDEBUG_MIGRATE_FAULT      3
>>>> #define SIGDEBUG_MIGRATE_PRIOINV    4
>>>> #define SIGDEBUG_NOMLOCK            5
>>>> #define SIGDEBUG_WATCHDOG           6
>>>> #define SIGDEBUG_RESCNT_IMBALANCE   7
>>>> #define SIGDEBUG_LOCK_BREAK         8
>>>> #define SIGDEBUG_MUTEX_SLEEP        9
>>>
>>> For SIGDEBUG, I believe we only need the first reason, as I wrote in the
>>> other reply. We are interested in the *initial* reason for a thread to
>>> migrate. Once it is migrated, it will consume that and only generate a
>>> new reason after going back.
>>>
>>
>> We may have SIGDEBUG_WATCHDOG pending concurrently with any other cause;
>> this one is triggered asynchronously by the oob timer. So at the very
>> least, you want it to have its own slot.
>
> First come, first served - we only need to know why we initially
> migrated. But, yes, that may imply locking against concurrent filling of
> the reason field.

Which means you want distinct slots; otherwise all this work to save a
couple of µs on a weakly contended heap would make no sense, I believe.

-- 
Philippe.