From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Chen, Hongzhan"
Subject: RE: "rtdm/nrtsig: move inband work description off the stack"
Date: Fri, 4 Jun 2021 08:36:42 +0000
References: <2098c9e4-4fb1-ef49-2c2d-d2c6573499b5@siemens.com>
 <87lf83cj7k.fsf@xenomai.org>
 <14ad027b-f326-5ea8-9cce-cb4b82984a23@siemens.com>
 <87y2c3b092.fsf@xenomai.org>
 <3bf57e72-9992-0ab4-a956-d6f6197bfe24@siemens.com>
 <87sg2bat1l.fsf@xenomai.org>
 <87o8czar20.fsf@xenomai.org>
 <87a6ohzjoi.fsf@xenomai.org>
 <1ea00a75-379d-3d7f-cb33-406f8188d38e@siemens.com>
 <87h7ighi8m.fsf@xenomai.org>
 <871r9igj9p.fsf@xenomai.org>
 <955096e4-be49-7630-b6c9-465529cf79f8@siemens.com>
 <87y2bqf3lj.fsf@xenomai.org>
In-Reply-To: <87y2bqf3lj.fsf@xenomai.org>
Content-Language: en-US
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
List-Id: Discussions about the Xenomai project
To: Philippe Gerum, Jan Kiszka
Cc: "xenomai@xenomai.org"

>-----Original Message-----
>From: Philippe Gerum
>Sent: Friday, June 4, 2021 4:15 PM
>To: Jan Kiszka
>Cc: Chen, Hongzhan; xenomai@xenomai.org
>Subject: Re: "rtdm/nrtsig: move inband work description off the stack"
>
>
>Jan Kiszka writes:
>
>> On 04.06.21 09:51, Philippe Gerum wrote:
>>>
>>> Chen, Hongzhan writes:
>>>
>>>>> Jan Kiszka writes:
>>>>>
>>>>>> On 26.05.21 15:52, Philippe Gerum wrote:
>>>>>>>
>>>>>>> Jan Kiszka writes:
>>>>>>>
>>>>>>>> On 25.05.21 15:20, Philippe Gerum wrote:
>>>>>>>>>
>>>>>>>>> Jan Kiszka writes:
>>>>>>>>>
>>>>>>>>>> On 25.05.21 14:37, Philippe Gerum wrote:
>>>>>>>>>>>
>>>>>>>>>>> Jan Kiszka writes:
>>>>>>>>>>>
>>>>>>>>>>>> On 25.05.21 12:01, Philippe Gerum wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Jan Kiszka writes:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 25.05.21 10:46, Philippe Gerum wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Jan Kiszka writes:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Philippe,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [1] makes rtdm_schedule_nrt_work a rather heavyweight service, compared
>>>>>>>>>>>>>>>> to what it was (and even over I-pipe). xnmalloc is nothing you would
>>>>>>>>>>>>>>>> expect from a "send a signal" service, specifically not from
>>>>>>>>>>>>>>>> rtdm_nrtsig_pend, which does not even make use of the extra payload
>>>>>>>>>>>>>>>> being sent.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Can we do better? Also for xnthread_signal (fd and udd use cases are not
>>>>>>>>>>>>>>>> time-sensitive anyway).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Nope, I cannot see any significantly better option which would still
>>>>>>>>>>>>>>> allow a common implementation we can map on top of Dovetail / I-pipe.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> E.g. pre-allocate the work object for data-free signals and use that -
>>>>>>>>>>>>>> or not send it when it is in use, thus still pending.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ok, let's recap:
>>>>>>>>>>>>>
>>>>>>>>>>>>> - rtdm_nrtsig_pend(): does not allocate anything, merely uses a
>>>>>>>>>>>>> user-provided memory block containing a request header. Current
>>>>>>>>>>>>> callers should either already take care not to resend a request
>>>>>>>>>>>>> uselessly while it is pending (e.g. [1]), or their internal logic
>>>>>>>>>>>>> should prevent that (e.g. [2]). This interface always conveys user data
>>>>>>>>>>>>> by reference.
>>>>>>>>>>>>
>>>>>>>>>>>> Ah, I fantasized that rtdm_nrtsig_pend would call rtdm_schedule_nrt_work
>>>>>>>>>>>> to do the work, but it's the other way around. False alarm here.
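
For reference, here is a minimal sketch of the allocation-free pattern
described above, assuming the Xenomai 3 RTDM driver API
(rtdm_nrtsig_init()/rtdm_nrtsig_pend()); the my_* names are made up:

/*
 * Sketch only (not from the original thread): the caller embeds the
 * rtdm_nrtsig_t descriptor in its own persistent state, so pending
 * the signal allocates nothing on the oob path.
 */
#include <rtdm/driver.h>

struct my_dev_state {
	rtdm_nrtsig_t nrt_sig;		/* user-provided request block */
	unsigned long events;
};

static void my_nrt_handler(rtdm_nrtsig_t *nrt_sig, void *arg)
{
	struct my_dev_state *state = arg;

	/* Runs in-band; regular Linux services are usable here. */
	pr_info("deferring work for %lu event(s)\n", state->events);
}

static void my_dev_init(struct my_dev_state *state)
{
	rtdm_nrtsig_init(&state->nrt_sig, my_nrt_handler, state);
}

static void my_oob_event(struct my_dev_state *state)
{
	state->events++;
	/* Callable from primary/oob context: only marks the
	   pre-initialized descriptor pending, no allocation. */
	rtdm_nrtsig_pend(&state->nrt_sig);
}
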
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> - rtdm_schedule_nrt_work(): does allocate a nrtsig_t struct in order to
>>>>>>>>>>>>> convey a request block we cannot squeeze into a work_struct, since the
>>>>>>>>>>>>> latter is in itself an anchor type the user may embed into its own
>>>>>>>>>>>>> private argument block. I'm unsure whether ensuring that
>>>>>>>>>>>>> rtdm_schedule_nrt_work() calls do not pile up would be worth it, as
>>>>>>>>>>>>> this call is not supposed to be used frenetically on some hot path in
>>>>>>>>>>>>> the first place. But if we wanted to do that, then we would have to
>>>>>>>>>>>>> change the signature of rtdm_schedule_nrt_work(), so that we gain
>>>>>>>>>>>>> persistent context data we could use for determining whether an nrt
>>>>>>>>>>>>> work request is in flight. We could not probe the work_struct for that
>>>>>>>>>>>>> purpose, that would be racy.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Do you see any way to have a smarter rtdm_schedule_nrt_work() without
>>>>>>>>>>>>> changing its signature?
>>>>>>>>>>>>
>>>>>>>>>>>> There is no rule that prevents changing the signature if that is needed.
>>>>>>>>>>>> However, the kernel is fine without allocation as well, using a very
>>>>>>>>>>>> similar signature.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> schedule_work() and rtdm_schedule_nrt_work() have a major difference:
>>>>>>>>>>> the latter has to pass on the request from the oob to the in-band
>>>>>>>>>>> context. This is what bugs us and creates the requirement for additional
>>>>>>>>>>> mechanism and data.
>>>>>>>>>>>
>>>>>>>>>>>> I do not yet understand why we need that indirection via the rtdm_nrtsig
>>>>>>>>>>>> on Dovetail. I thought we could trigger work directly from the oob domain
>>>>>>>>>>>> there, can't we? How do you handle such a case in evl so far?
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Dovetail can trigger in-band work from oob via the generic irq_work()
>>>>>>>>>>> service, so we don't need the extra code involved in the I-pipe for the
>>>>>>>>>>> same purpose. The gist of the matter is having the same
>>>>>>>>>>> implementation for rtdm_schedule_nrt_work() work in both Dovetail
>>>>>>>>>>> and I-pipe contexts: the way we do that is by using rtdm_nrtsig_pend(),
>>>>>>>>>>> which already abstracts the base mechanism for relaying oob -> in-band
>>>>>>>>>>> signals.
>>>>>>>>>>>
>>>>>>>>>>> On the other hand, EVL has the evl_call_inband[_from]() service, which
>>>>>>>>>>> combines the irq_work and work_struct anchors we need into a single
>>>>>>>>>>> evl_work type, which in turn can be embedded into a user request
>>>>>>>>>>> block. This is what rtdm_schedule_nrt_work() does not expose, so it has
>>>>>>>>>>> to build one internally by combining a dynamic allocation and the
>>>>>>>>>>> user-provided work_struct.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Then enhance the rtdm service to enable such a combination - and be even
>>>>>>>>>> more ready for the switch to Xenomai 4, I would say. We can either add a
>>>>>>>>>> new service and deprecate the old one (while implementing it as you
>>>>>>>>>> suggest) or just break the interface to get the (presumably) few users
>>>>>>>>>> quickly converted.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> rtdm_schedule_nrt_work() was added by Gilles when merging the RTnet code
>>>>>>>>> into Xenomai 3 IIRC, with little documentation. Widespread usage of this
>>>>>>>>> call is unlikely, so I'd vote for simplifying the whole thing and
>>>>>>>>> replacing it entirely.
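
A sketch of the kind of replacement discussed here, loosely modeled on
EVL's evl_call_inband(): both anchors live in one caller-provided block,
so no xnmalloc is needed and duplicate submissions can be detected. The
rtdm_nrt_work name and exact signatures are illustrative only, and this
assumes Dovetail, where irq_work_queue() is oob-safe as noted above (an
I-pipe build would need its own trampoline instead):

#include <linux/irq_work.h>
#include <linux/workqueue.h>
#include <linux/atomic.h>
#include <linux/errno.h>

struct rtdm_nrt_work {
	struct irq_work irq_work;	/* oob -> in-band trampoline */
	struct work_struct work;	/* in-band workqueue anchor */
	atomic_t pending;		/* guards against piling up */
};

static void nrt_work_trampoline(struct irq_work *irq_work)
{
	struct rtdm_nrt_work *rw =
		container_of(irq_work, struct rtdm_nrt_work, irq_work);

	/* In-band (root domain) context: hand over to a workqueue.
	   Submissions arriving while in flight were coalesced. */
	atomic_set(&rw->pending, 0);
	schedule_work(&rw->work);
}

static inline void rtdm_nrt_work_init(struct rtdm_nrt_work *rw,
				      work_func_t handler)
{
	init_irq_work(&rw->irq_work, nrt_work_trampoline);
	INIT_WORK(&rw->work, handler);
	atomic_set(&rw->pending, 0);
}

/* Callable from oob context; returns -EBUSY if already in flight. */
static inline int rtdm_nrt_work_schedule(struct rtdm_nrt_work *rw)
{
	if (atomic_cmpxchg(&rw->pending, 0, 1) != 0)
		return -EBUSY;

	irq_work_queue(&rw->irq_work);
	return 0;
}
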
>>>>>>>>>
>>>>>>>>
>>>>>>>> OK, then we have a plan for this piece.
>>>>
>>>> Do we need to implement a new API to replace rtdm_schedule_nrt_work() to
>>>> avoid calling xnmalloc, or just keep it as it is in Xenomai 3.2?
>>>>
>>>>>>>>
>>>>>>>>>>>> And the third case remains xnthread_signal, btw.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> xnthread_signal() is used to trigger SIGDEBUG/SIGSHADOW/SIGTERM signals
>>>>>>>>>>> for a thread, a seldom event. Optimizing there would be overkill, unless
>>>>>>>>>>> the application behaves wrongly in the first place.
>>>>>>>>>>
>>>>>>>>>> Some callers are under nklock, thus it makes potentially lengthy critical
>>>>>>>>>> sections now a bit longer.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This is what needs to be fixed in the first place, we should not trigger
>>>>>>>>> a signal relay when holding the ugly big lock.
>>>>
>>>> Which functions do we need to fix or optimize to avoid calling
>>>> xnthread_signal while holding the big lock, if possible? xnthread_suspend
>>>> and xnthread_cancel?
>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> Whatever is simpler.
>>>>>>>>
>>>>>>>> Another aspect: using xnmalloc for anything that should better succeed
>>>>>>>> is possibly not good when we are OOM. That is actually no strict
>>>>>>>> regression over I-pipe, where we had a limited queue as well IIRC, but a
>>>>>>>> reason to avoid carrying that over - and possibly creating new error
>>>>>>>> scenarios that are harder to debug.
>>>>>>>>
>>>>>>>
>>>>>>> To mitigate the issue of generalized OOM, we could pull memory from a
>>>>>>> local xnheap privately attached to xnthread_signal() along with any
>>>>>>> caller requiring dynamic allocation, so that none of them could deplete
>>>>>>> the sysheap.
>>>>>>>
>>>>>>> Getting rid of dynamic allocation entirely would, on the other hand,
>>>>>>> require us to add one or more "signal slots" per possible cause of
>>>>>>> in-band signal into struct xnthread (e.g. SIGSHADOW_ACTION_BACKTRACE,
>>>>>>> SIGDEBUG_MIGRATE_SIGNAL etc.), passing the proper slot to
>>>>>>> xnthread_signal(). SIGSHADOW (SIGWINCH) and SIGDEBUG (SIGXCPU) are
>>>>>>> standard non-queueable signals, so we should only need a single slot per
>>>>>>> signal cause.
>>>>>>>
>>>>>>
>>>>>> Can a thread actually trigger multiple causes per SIGDEBUG, e.g.?
>>>>>> Likely, we only need to carry the single cause via a field in xnthread
>>>>>> from the trigger point to the in-band handler. The same is probably the
>>>>>> case for SIGSHADOW_ACTION_BACKTRACE.
>>>>>>
>>>>>
>>>>> Multiple events triggering SIGSHADOW_ACTION_HOME and
>>>>> SIGSHADOW_ACTION_HARDEN _might_ pile up, although that is unlikely.
>>>>
>>>> Does that mean that in struct xnthread there should be three slots of
>>>> struct lostage_signal, one for each of SIGDEBUG/SIGSHADOW/SIGTERM,
>>>> like in patch [1]?
>>>>
>>>> If there is no pile-up for each signal, we would just allocate a fixed
>>>> position for the three signals in the array and then handle the
>>>> corresponding lostage_signal by signo. Is that enough?
>>>>
>>>> [1]:
>>>>
>>>> +++ b/include/cobalt/kernel/thread.h
>>>> @@ -51,6 +51,15 @@ struct xnsched_tpslot;
>>>>  struct xnthread_personality;
>>>>  struct completion;
>>>>
>>>> +#define XENO_NUM_SIGNALS 3 /* SIGDEBUG/SIGSHADOW/SIGTERM */
>>>> +
>>>> +struct lostage_signal {
>>>> +	struct pipeline_inband_work inband_work; /* Must be first. */
>>>> +	struct task_struct *task;
>>>> +	int signo, sigval;
>>>> +	struct lostage_signal *self; /* Revisit: I-pipe requirement */
>>>> +};
>>>> +
>>>>  struct xnthread_init_attr {
>>>>  	struct xnthread_personality *personality;
>>>>  	cpumask_t affinity;
>>>> @@ -205,6 +214,7 @@ struct xnthread {
>>>>  	const char *exe_path;	/* Executable path */
>>>>  	u32 proghash;		/* Hash value for exe_path */
>>>>  #endif
>>>> +	struct lostage_signal sigarray[XENO_NUM_SIGNALS];
>>>>  };
>>>>
>>>
>>> I would have as many entries as we have distinct reasons for notifying
>>> the application, i.e. from include/cobalt/uapi/signal.h:
>>>
>>> #define SIGSHADOW_ACTION_HARDEN		1
>>> #define SIGSHADOW_ACTION_BACKTRACE	2
>>> #define SIGSHADOW_ACTION_HOME		3
>>> #define SIGDEBUG_MIGRATE_SIGNAL		1
>>> #define SIGDEBUG_MIGRATE_SYSCALL	2
>>> #define SIGDEBUG_MIGRATE_FAULT		3
>>> #define SIGDEBUG_MIGRATE_PRIOINV	4
>>> #define SIGDEBUG_NOMLOCK		5
>>> #define SIGDEBUG_WATCHDOG		6
>>> #define SIGDEBUG_RESCNT_IMBALANCE	7
>>> #define SIGDEBUG_LOCK_BREAK		8
>>> #define SIGDEBUG_MUTEX_SLEEP		9
>>
>> For SIGDEBUG, I believe we only need the first reason, as I wrote in the
>> other reply. We are interested in the *initial* reason for a thread to
>> migrate. Once it is migrated, it will consume that and only generate a
>> new reason after going back.
>>
>
>We may have SIGDEBUG_WATCHDOG pending concurrently with any other cause,
>since it is triggered asynchronously by the oob timer. So at the very
>least, you want this one to have its own slot.
>
>So, to sum up: the following events are synchronous and mutually
>exclusive due to the oob->in-band transition they cause:
>
>SIGDEBUG_MIGRATE_SIGNAL, SIGDEBUG_MIGRATE_SYSCALL,
>SIGDEBUG_MIGRATE_FAULT, SIGDEBUG_MIGRATE_PRIOINV, SIGDEBUG_MUTEX_SLEEP,
>SIGDEBUG_LOCK_BREAK, SIGDEBUG_RESCNT_IMBALANCE. For these we may share a
>single slot. SIGDEBUG_NOMLOCK can only happen at init, when no Xenomai
>service is available yet, so we don't have to bother; this one can join
>the above group too.
>
>SIGDEBUG_WATCHDOG needs its own slot.
>
>All SIGSHADOW_* events need their own slot: SIGSHADOW_ACTION_HARDEN and
>SIGSHADOW_ACTION_HOME can be raised by remote threads asynchronously to
>the target thread. SIGSHADOW_ACTION_BACKTRACE comes in addition to
>SIGDEBUG_MIGRATE_*. For the latter reason, SIGSHADOW_ACTION_BACKTRACE
>cannot pile up though (since we have to transition from oob to in-band
>context in between).
>
>That would make five distinct slots in struct xnthread, each representing
>a group of events and each advertising its own pipeline_inband_work
>struct.

OK. SIGTERM (1) + SIGDEBUG (2) + SIGSHADOW (3) = 6 slots in struct xnthread.

>
>-- 
>Philippe.
>
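
Summing up the slot grouping concluded above in code - a sketch only,
with made-up enum and helper names (the real cause constants live in
include/cobalt/uapi/signal.h, where SIGDEBUG maps to SIGXCPU and
SIGSHADOW to SIGWINCH, as noted earlier in the thread):

#include <linux/errno.h>
#include <linux/signal.h>
#include <cobalt/uapi/signal.h>	/* SIGDEBUG, SIGSHADOW, cause codes */

/* One persistent slot per event group, six in total. */
enum xnsig_slot {
	XNSIG_SLOT_TERM,		/* SIGTERM relay */
	XNSIG_SLOT_DEBUG_SYNC,		/* synchronous SIGDEBUG_* causes */
	XNSIG_SLOT_DEBUG_WATCHDOG,	/* async, may pend concurrently */
	XNSIG_SLOT_SHADOW_HARDEN,	/* may be raised by remote threads */
	XNSIG_SLOT_SHADOW_HOME,		/* ditto */
	XNSIG_SLOT_SHADOW_BACKTRACE,	/* cannot pile up, but distinct */
	XNSIG_NR_SLOTS			/* = 6 */
};

/*
 * Map a signal/cause pair to its slot, so xnthread_signal() could pick
 * a preallocated lostage_signal from sigarray[] instead of calling
 * xnmalloc.
 */
static int xnsig_slot_of(int signo, int cause)
{
	switch (signo) {
	case SIGTERM:
		return XNSIG_SLOT_TERM;
	case SIGDEBUG:		/* SIGXCPU */
		return cause == SIGDEBUG_WATCHDOG ?
			XNSIG_SLOT_DEBUG_WATCHDOG : XNSIG_SLOT_DEBUG_SYNC;
	case SIGSHADOW:		/* SIGWINCH */
		switch (cause) {
		case SIGSHADOW_ACTION_HARDEN:
			return XNSIG_SLOT_SHADOW_HARDEN;
		case SIGSHADOW_ACTION_HOME:
			return XNSIG_SLOT_SHADOW_HOME;
		default:	/* SIGSHADOW_ACTION_BACKTRACE */
			return XNSIG_SLOT_SHADOW_BACKTRACE;
		}
	}
	return -EINVAL;
}

struct xnthread would then embed "struct lostage_signal
sigarray[XNSIG_NR_SLOTS];" instead of the three-entry array from patch
[1] above, keeping the signal relay allocation-free.
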