From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Chen, Hongzhan"
Subject: RE: "rtdm/nrtsig: move inband work description off the stack"
Date: Fri, 4 Jun 2021 08:36:42 +0000
References: <2098c9e4-4fb1-ef49-2c2d-d2c6573499b5@siemens.com>
 <87lf83cj7k.fsf@xenomai.org>
 <14ad027b-f326-5ea8-9cce-cb4b82984a23@siemens.com>
 <87y2c3b092.fsf@xenomai.org>
 <3bf57e72-9992-0ab4-a956-d6f6197bfe24@siemens.com>
 <87sg2bat1l.fsf@xenomai.org>
 <87o8czar20.fsf@xenomai.org>
 <87a6ohzjoi.fsf@xenomai.org>
 <1ea00a75-379d-3d7f-cb33-406f8188d38e@siemens.com>
 <87h7ighi8m.fsf@xenomai.org>
 <871r9igj9p.fsf@xenomai.org>
 <955096e4-be49-7630-b6c9-465529cf79f8@siemens.com>
 <87y2bqf3lj.fsf@xenomai.org>
In-Reply-To: <87y2bqf3lj.fsf@xenomai.org>
Content-Language: en-US
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
List-Id: Discussions about the Xenomai project
To: Philippe Gerum, Jan Kiszka
Cc: "xenomai@xenomai.org"

>-----Original Message-----
>From: Philippe Gerum
>Sent: Friday, June 4, 2021 4:15 PM
>To: Jan Kiszka
>Cc: Chen, Hongzhan; xenomai@xenomai.org
>Subject: Re: "rtdm/nrtsig: move inband work description off the stack"
>
>
>Jan Kiszka writes:
>
>> On 04.06.21 09:51, Philippe Gerum wrote:
>>>
>>> Chen, Hongzhan writes:
>>>
>>>>> Jan Kiszka writes:
>>>>>
>>>>>> On 26.05.21 15:52, Philippe Gerum wrote:
>>>>>>>
>>>>>>> Jan Kiszka writes:
>>>>>>>
>>>>>>>> On 25.05.21 15:20, Philippe Gerum wrote:
>>>>>>>>>
>>>>>>>>> Jan Kiszka writes:
>>>>>>>>>
>>>>>>>>>> On 25.05.21 14:37, Philippe Gerum wrote:
>>>>>>>>>>>
>>>>>>>>>>> Jan Kiszka writes:
>>>>>>>>>>>
>>>>>>>>>>>> On 25.05.21 12:01, Philippe Gerum wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Jan Kiszka writes:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 25.05.21 10:46, Philippe Gerum wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Jan Kiszka writes:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Philippe,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [1] makes rtdm_schedule_nrt_work a rather heavyweight service, compared
>>>>>>>>>>>>>>>> to what it was (and even over I-pipe). xnmalloc is nothing you would
>>>>>>>>>>>>>>>> expect from a "send a signal" service, specifically not from
>>>>>>>>>>>>>>>> rtdm_nrtsig_pend, which does not even make use of the extra payload
>>>>>>>>>>>>>>>> being sent.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Can we do better? Also for xnthread_signal (fd and udd use cases are not
>>>>>>>>>>>>>>>> time-sensitive anyway).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Nope, I cannot see any significantly better option which would still
>>>>>>>>>>>>>>> allow a common implementation we can map on top of Dovetail / I-pipe.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> E.g. pre-allocate the work object for data-free signals and use that -
>>>>>>>>>>>>>> or not send it when it is in use, thus still pending.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ok, let's recap:
>>>>>>>>>>>>>
>>>>>>>>>>>>> - rtdm_nrtsig_pend(): does not allocate anything, merely uses a
>>>>>>>>>>>>> user-provided memory block containing a request header. Current
>>>>>>>>>>>>> callers should either already take care not to resend a request
>>>>>>>>>>>>> uselessly while it is pending (e.g. [1]), or their internal logic
>>>>>>>>>>>>> should prevent that (e.g. [2]). This interface always conveys user data
>>>>>>>>>>>>> by reference.
>>>>>>>>>>>>
>>>>>>>>>>>> Ah, I fantasized that rtdm_nrtsig_pend would call rtdm_schedule_nrt_work
>>>>>>>>>>>> to do the work, but it's the other way around. False alarm here.
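
For reference, here is a minimal sketch of the allocation-free pattern
described above, assuming the Xenomai 3 RTDM driver API
(rtdm_nrtsig_init()/rtdm_nrtsig_pend()); the my_* names are made up:

/*
 * Sketch only (not from the original thread): the caller embeds the
 * rtdm_nrtsig_t descriptor in its own persistent state, so pending
 * the signal allocates nothing on the oob path.
 */
#include <rtdm/driver.h>

struct my_dev_state {
	rtdm_nrtsig_t nrt_sig;		/* user-provided request block */
	unsigned long events;
};

static void my_nrt_handler(rtdm_nrtsig_t *nrt_sig, void *arg)
{
	struct my_dev_state *state = arg;

	/* Runs in-band; regular Linux services are usable here. */
	pr_info("deferring work for %lu event(s)\n", state->events);
}

static void my_dev_init(struct my_dev_state *state)
{
	rtdm_nrtsig_init(&state->nrt_sig, my_nrt_handler, state);
}

static void my_oob_event(struct my_dev_state *state)
{
	state->events++;
	/* Callable from primary/oob context: only marks the
	   pre-initialized descriptor pending, no allocation. */
	rtdm_nrtsig_pend(&state->nrt_sig);
}
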
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> - rtdm_schedule_nrt_work(): does allocate a nrtsig_t struct in order to
>>>>>>>>>>>>> convey a request block we cannot squeeze into a work_struct, since the
>>>>>>>>>>>>> latter is in itself an anchor type the user may embed into its own
>>>>>>>>>>>>> private argument block. I'm unsure whether ensuring that
>>>>>>>>>>>>> rtdm_schedule_nrt_work() calls do not pile up would be worth it, as
>>>>>>>>>>>>> this call is not supposed to be used frenetically on some hot path in
>>>>>>>>>>>>> the first place. But if we wanted to do that, then we would have to
>>>>>>>>>>>>> change the signature of rtdm_schedule_nrt_work(), so that we gain
>>>>>>>>>>>>> persistent context data we could use for determining whether an nrt
>>>>>>>>>>>>> work request is in flight. We could not probe the work_struct for that
>>>>>>>>>>>>> purpose, that would be racy.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Do you see any way to have a smarter rtdm_schedule_nrt_work() without
>>>>>>>>>>>>> changing its signature?
>>>>>>>>>>>>
>>>>>>>>>>>> There is no rule that prevents changing the signature if that is needed.
>>>>>>>>>>>> However, the kernel is fine without allocation as well, using a very
>>>>>>>>>>>> similar signature.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> schedule_work() and rtdm_schedule_nrt_work() have a major difference:
>>>>>>>>>>> the latter has to pass on the request from the oob to the in-band
>>>>>>>>>>> context. This is what bugs us and creates the requirement for additional
>>>>>>>>>>> mechanism and data.
>>>>>>>>>>>
>>>>>>>>>>>> I do not yet understand why we need that indirection via the rtdm_nrtsig
>>>>>>>>>>>> on Dovetail. I thought we could trigger work directly from the oob domain
>>>>>>>>>>>> there, can't we? How do you handle such a case in evl so far?
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Dovetail can trigger in-band work from oob via the generic irq_work()
>>>>>>>>>>> service, so we don't need the extra code involved in the I-pipe for the
>>>>>>>>>>> same purpose. The gist of the matter is having the same
>>>>>>>>>>> implementation for rtdm_schedule_nrt_work() work in both Dovetail
>>>>>>>>>>> and I-pipe contexts: the way we do that is by using rtdm_nrtsig_pend(),
>>>>>>>>>>> which already abstracts the base mechanism for relaying oob -> in-band
>>>>>>>>>>> signals.
>>>>>>>>>>>
>>>>>>>>>>> On the other hand, EVL has the evl_call_inband[_from]() service, which
>>>>>>>>>>> combines the irq_work and work_struct anchors we need into a single
>>>>>>>>>>> evl_work type, which in turn can be embedded into a user request
>>>>>>>>>>> block. This is what rtdm_schedule_nrt_work() does not expose, so it has
>>>>>>>>>>> to build one internally by combining a dynamic allocation and the
>>>>>>>>>>> user-provided work_struct.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Then enhance the rtdm service to enable such a combination - and be even
>>>>>>>>>> more ready for the switch to Xenomai 4, I would say. We can either add a
>>>>>>>>>> new service and deprecate the old one (while implementing it as you
>>>>>>>>>> suggest) or just break the interface to get the (presumably) few users
>>>>>>>>>> quickly converted.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> rtdm_schedule_nrt_work() was added by Gilles when merging the RTnet code
>>>>>>>>> into Xenomai 3 IIRC, with little documentation. Widespread usage of this
>>>>>>>>> call is unlikely, so I'd vote for simplifying the whole thing and
>>>>>>>>> replacing it entirely.
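
A sketch of the kind of replacement discussed here, loosely modeled on
EVL's evl_call_inband(): both anchors live in one caller-provided block,
so no xnmalloc is needed and duplicate submissions can be detected. The
rtdm_nrt_work name and exact signatures are illustrative only, and this
assumes Dovetail, where irq_work_queue() is oob-safe as noted above (an
I-pipe build would need its own trampoline instead):

#include <linux/irq_work.h>
#include <linux/workqueue.h>
#include <linux/atomic.h>
#include <linux/errno.h>

struct rtdm_nrt_work {
	struct irq_work irq_work;	/* oob -> in-band trampoline */
	struct work_struct work;	/* in-band workqueue anchor */
	atomic_t pending;		/* guards against piling up */
};

static void nrt_work_trampoline(struct irq_work *irq_work)
{
	struct rtdm_nrt_work *rw =
		container_of(irq_work, struct rtdm_nrt_work, irq_work);

	/* In-band (root domain) context: hand over to a workqueue.
	   Submissions arriving while in flight were coalesced. */
	atomic_set(&rw->pending, 0);
	schedule_work(&rw->work);
}

static inline void rtdm_nrt_work_init(struct rtdm_nrt_work *rw,
				      work_func_t handler)
{
	init_irq_work(&rw->irq_work, nrt_work_trampoline);
	INIT_WORK(&rw->work, handler);
	atomic_set(&rw->pending, 0);
}

/* Callable from oob context; returns -EBUSY if already in flight. */
static inline int rtdm_nrt_work_schedule(struct rtdm_nrt_work *rw)
{
	if (atomic_cmpxchg(&rw->pending, 0, 1) != 0)
		return -EBUSY;

	irq_work_queue(&rw->irq_work);
	return 0;
}
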
>>>>>>>>>
>>>>>>>>
>>>>>>>> OK, then we have a plan for this piece.
>>>>
>>>> Do we need to implement a new API to replace rtdm_schedule_nrt_work() to
>>>> avoid calling xnmalloc, or just keep it as it is in Xenomai 3.2?
>>>>
>>>>>>>>
>>>>>>>>>>>> And the third case remains xnthread_signal, btw.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> xnthread_signal() is used to trigger SIGDEBUG/SIGSHADOW/SIGTERM signals
>>>>>>>>>>> for a thread, a seldom event. Optimizing there would be overkill, unless
>>>>>>>>>>> the application behaves wrongly in the first place.
>>>>>>>>>>
>>>>>>>>>> Some callers are under nklock, thus it makes potentially lengthy critical
>>>>>>>>>> sections now a bit longer.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This is what needs to be fixed in the first place, we should not trigger
>>>>>>>>> a signal relay when holding the ugly big lock.
>>>>
>>>> Which functions do we need to fix or optimize to avoid calling
>>>> xnthread_signal while holding the big lock, if possible? xnthread_suspend
>>>> and xnthread_cancel?
>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> Whatever is simpler.
>>>>>>>>
>>>>>>>> Another aspect: using xnmalloc for anything that should better succeed
>>>>>>>> is possibly not good when we are OOM. That is actually no strict
>>>>>>>> regression over I-pipe, where we had a limited queue as well IIRC, but a
>>>>>>>> reason to avoid carrying that over - and possibly creating new error
>>>>>>>> scenarios that are harder to debug.
>>>>>>>>
>>>>>>>
>>>>>>> To mitigate the issue of generalized OOM, we could pull memory from a
>>>>>>> local xnheap privately attached to xnthread_signal() along with any
>>>>>>> caller requiring dynamic allocation, so that none of them could deplete
>>>>>>> the sysheap.
>>>>>>>
>>>>>>> Getting rid of dynamic allocation entirely would, on the other hand,
>>>>>>> require us to add one or more "signal slots" per possible cause of
>>>>>>> in-band signal into struct xnthread (e.g. SIGSHADOW_ACTION_BACKTRACE,
>>>>>>> SIGDEBUG_MIGRATE_SIGNAL etc.), passing the proper slot to
>>>>>>> xnthread_signal(). SIGSHADOW (SIGWINCH) and SIGDEBUG (SIGXCPU) are
>>>>>>> standard non-queueable signals, so we should only need a single slot per
>>>>>>> signal cause.
>>>>>>>
>>>>>>
>>>>>> Can a thread actually trigger multiple causes per SIGDEBUG, e.g.?
>>>>>> Likely, we only need to carry the single cause via a field in xnthread
>>>>>> from the trigger point to the in-band handler. The same is probably the
>>>>>> case for SIGSHADOW_ACTION_BACKTRACE.
>>>>>>
>>>>>
>>>>> Multiple events triggering SIGSHADOW_ACTION_HOME and
>>>>> SIGSHADOW_ACTION_HARDEN _might_ pile up, although that is unlikely.
>>>>
>>>> Does that mean that in struct xnthread there should be three slots of
>>>> struct lostage_signal, one for each of SIGDEBUG/SIGSHADOW/SIGTERM,
>>>> like in patch [1]?
>>>>
>>>> If there is no pile-up for each signal, we would just allocate a fixed
>>>> position for the three signals in the array and then handle the
>>>> corresponding lostage_signal by signo. Is that enough?
>>>>
>>>> [1]:
>>>>
>>>> +++ b/include/cobalt/kernel/thread.h
>>>> @@ -51,6 +51,15 @@ struct xnsched_tpslot;
>>>>  struct xnthread_personality;
>>>>  struct completion;
>>>>
>>>> +#define XENO_NUM_SIGNALS 3 /* SIGDEBUG/SIGSHADOW/SIGTERM */
>>>> +
>>>> +struct lostage_signal {
>>>> +	struct pipeline_inband_work inband_work; /* Must be first. */
>>>> +	struct task_struct *task;
>>>> +	int signo, sigval;
>>>> +	struct lostage_signal *self; /* Revisit: I-pipe requirement */
>>>> +};
>>>> +
>>>>  struct xnthread_init_attr {
>>>>  	struct xnthread_personality *personality;
>>>>  	cpumask_t affinity;
>>>> @@ -205,6 +214,7 @@ struct xnthread {
>>>>  	const char *exe_path;	/* Executable path */
>>>>  	u32 proghash;		/* Hash value for exe_path */
>>>>  #endif
>>>> +	struct lostage_signal sigarray[XENO_NUM_SIGNALS];
>>>>  };
>>>>
>>>
>>> I would have as many entries as we have distinct reasons for notifying
>>> the application, i.e. from include/cobalt/uapi/signal.h:
>>>
>>> #define SIGSHADOW_ACTION_HARDEN		1
>>> #define SIGSHADOW_ACTION_BACKTRACE	2
>>> #define SIGSHADOW_ACTION_HOME		3
>>> #define SIGDEBUG_MIGRATE_SIGNAL		1
>>> #define SIGDEBUG_MIGRATE_SYSCALL	2
>>> #define SIGDEBUG_MIGRATE_FAULT		3
>>> #define SIGDEBUG_MIGRATE_PRIOINV	4
>>> #define SIGDEBUG_NOMLOCK		5
>>> #define SIGDEBUG_WATCHDOG		6
>>> #define SIGDEBUG_RESCNT_IMBALANCE	7
>>> #define SIGDEBUG_LOCK_BREAK		8
>>> #define SIGDEBUG_MUTEX_SLEEP		9
>>
>> For SIGDEBUG, I believe we only need the first reason, as I wrote in the
>> other reply. We are interested in the *initial* reason for a thread to
>> migrate. Once it is migrated, it will consume that and only generate a
>> new reason after going back.
>>
>
>We may have SIGDEBUG_WATCHDOG pending concurrently with any other cause,
>since it is triggered asynchronously by the oob timer. So at the very
>least, you want this one to have its own slot.
>
>So, to sum up: the following events are synchronous and mutually
>exclusive due to the oob->in-band transition they cause:
>
>SIGDEBUG_MIGRATE_SIGNAL, SIGDEBUG_MIGRATE_SYSCALL,
>SIGDEBUG_MIGRATE_FAULT, SIGDEBUG_MIGRATE_PRIOINV, SIGDEBUG_MUTEX_SLEEP,
>SIGDEBUG_LOCK_BREAK, SIGDEBUG_RESCNT_IMBALANCE. For these we may share a
>single slot. SIGDEBUG_NOMLOCK can only happen at init, when no Xenomai
>service is available yet, so we don't have to bother; this one can join
>the above group too.
>
>SIGDEBUG_WATCHDOG needs its own slot.
>
>All SIGSHADOW_* events need their own slot: SIGSHADOW_ACTION_HARDEN and
>SIGSHADOW_ACTION_HOME can be raised by remote threads asynchronously to
>the target thread. SIGSHADOW_ACTION_BACKTRACE comes in addition to
>SIGDEBUG_MIGRATE_*. For the latter reason, SIGSHADOW_ACTION_BACKTRACE
>cannot pile up though (since we have to transition from oob to in-band
>context in between).
>
>That would make five distinct slots in struct xnthread, each representing
>a group of events and each advertising its own pipeline_inband_work
>struct.

OK. SIGTERM (1) + SIGDEBUG (2) + SIGSHADOW (3) = 6 slots in struct xnthread.

>
>-- 
>Philippe.
>
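
Summing up the slot grouping concluded above in code - a sketch only,
with made-up enum and helper names (the real cause constants live in
include/cobalt/uapi/signal.h, where SIGDEBUG maps to SIGXCPU and
SIGSHADOW to SIGWINCH, as noted earlier in the thread):

#include <linux/errno.h>
#include <linux/signal.h>
#include <cobalt/uapi/signal.h>	/* SIGDEBUG, SIGSHADOW, cause codes */

/* One persistent slot per event group, six in total. */
enum xnsig_slot {
	XNSIG_SLOT_TERM,		/* SIGTERM relay */
	XNSIG_SLOT_DEBUG_SYNC,		/* synchronous SIGDEBUG_* causes */
	XNSIG_SLOT_DEBUG_WATCHDOG,	/* async, may pend concurrently */
	XNSIG_SLOT_SHADOW_HARDEN,	/* may be raised by remote threads */
	XNSIG_SLOT_SHADOW_HOME,		/* ditto */
	XNSIG_SLOT_SHADOW_BACKTRACE,	/* cannot pile up, but distinct */
	XNSIG_NR_SLOTS			/* = 6 */
};

/*
 * Map a signal/cause pair to its slot, so xnthread_signal() could pick
 * a preallocated lostage_signal from sigarray[] instead of calling
 * xnmalloc.
 */
static int xnsig_slot_of(int signo, int cause)
{
	switch (signo) {
	case SIGTERM:
		return XNSIG_SLOT_TERM;
	case SIGDEBUG:		/* SIGXCPU */
		return cause == SIGDEBUG_WATCHDOG ?
			XNSIG_SLOT_DEBUG_WATCHDOG : XNSIG_SLOT_DEBUG_SYNC;
	case SIGSHADOW:		/* SIGWINCH */
		switch (cause) {
		case SIGSHADOW_ACTION_HARDEN:
			return XNSIG_SLOT_SHADOW_HARDEN;
		case SIGSHADOW_ACTION_HOME:
			return XNSIG_SLOT_SHADOW_HOME;
		default:	/* SIGSHADOW_ACTION_BACKTRACE */
			return XNSIG_SLOT_SHADOW_BACKTRACE;
		}
	}
	return -EINVAL;
}

struct xnthread would then embed "struct lostage_signal
sigarray[XNSIG_NR_SLOTS];" instead of the three-entry array from patch
[1] above, keeping the signal relay allocation-free.
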