On 10/4/20 7:14 PM, Frederic Weisbecker wrote:
> On Sun, Oct 04, 2020 at 02:44:39PM +0000, Alex Belits wrote:
>> On Thu, 2020-10-01 at 15:56 +0200, Frederic Weisbecker wrote:
>>> External Email
>>>
>>> ----------------------------------------------------------------------
>>> On Wed, Jul 22, 2020 at 02:49:49PM +0000, Alex Belits wrote:
>>>> +/*
>>>> + * Description of the last two tasks that ran isolated on a given CPU.
>>>> + * This is intended only for messages about isolation breaking. We
>>>> + * don't want any references to actual task while accessing this from
>>>> + * CPU that caused isolation breaking -- we know nothing about timing
>>>> + * and don't want to use locking or RCU.
>>>> + */
>>>> +struct isol_task_desc {
>>>> +	atomic_t curr_index;
>>>> +	atomic_t curr_index_wr;
>>>> +	bool	warned[2];
>>>> +	pid_t	pid[2];
>>>> +	pid_t	tgid[2];
>>>> +	char	comm[2][TASK_COMM_LEN];
>>>> +};
>>>> +static DEFINE_PER_CPU(struct isol_task_desc, isol_task_descs);
>>> So that's quite a huge patch that would have needed to be split up.
>>> Especially this tracing engine.
>>>
>>> Speaking of which, I agree with Thomas that it's unnecessary. It's too much
>>> code and complexity. We can use the existing trace events and perform the
>>> analysis from userspace to find the source of the disturbance.
>> The idea behind this is that isolation breaking events are supposed to
>> be known to the applications while applications run normally, and they
>> should not require any analysis or human intervention to be handled.
> Sure but you can use trace events for that. Just trace interrupts, workqueues,
> timers, syscalls, exceptions and scheduler events and you get all the local
> disturbance. You might want to tune a few filters but that's pretty much it.
>
> As for the source of the disturbances, if you really need that information,
> you can trace the workqueue and timer queue events and just filter those that
> target your isolated CPUs.
>

I agree that we can do all those things with tracing. However, IMHO having a
simplified logging mechanism to gather the source of a violation may help in
reducing the manual effort.

Although, I am not sure how easy it will be to maintain such an interface
over time.

-- 
Thanks
Nitesh