Per cgroup accounting of context switches

From: Gautam Kulkarni @ 2019-09-07  4:30 UTC
  To: bpf

Hi,

We are evaluating eBPF as a way to account voluntary and involuntary
context switches to cgroups. Currently, these counts are only kept in
the task_struct of each individual task and not in any per-cgroup data
structure.

Given this, I am looking for recommendations on the following
possible approaches:

1. Use the existing tracepoint (trace_sched_switch) with
BPF_PROG_TYPE_TRACEPOINT, as called here:
https://github.com/torvalds/linux/blob/master/kernel/sched/core.c#L3877
However, based on the trace format, the tracepoint does not expose
prev->nivcsw and prev->nvcsw, so I suspect this approach is not
feasible. Is my understanding correct?

2. Attach a kprobe to __schedule() and use BPF_PROG_TYPE_KPROBE.
This would give us access to the prev pointer. From prev
(task_struct), we can look up the cgroup and use an eBPF map to
accumulate per-cgroup counts of context switches.

3. Implement a kernel module that attaches a kprobe to __schedule()
and maintains the per-cgroup counts in the kprobe handler.

4. Modify the kernel to keep context switch counts in task_group.
Would such a change make sense to the community?
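
To make the accounting in approaches 2 and 3 concrete, here is a rough
sketch in plain user-space C. It is not a BPF program: it only models
the per-cgroup map and the classification step. All names (cg_table,
cg_slot, account_switch, MAX_CGROUPS) are hypothetical, and the rule
"voluntary when not preempted and prev's state is non-zero" is an
assumption based on my reading of how __schedule() chooses between
nvcsw and nivcsw; it should be checked against the target kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capacity; stands in for a BPF hash map's max_entries. */
#define MAX_CGROUPS 64

/* Per-cgroup counters; cgroup_id plays the role of the map key. */
struct cg_counts {
    uint64_t cgroup_id;
    uint64_t voluntary;
    uint64_t involuntary;
    bool in_use;
};

struct cg_table {
    struct cg_counts slots[MAX_CGROUPS];
};

/*
 * Find the slot for cgroup_id using linear probing. When create is true,
 * a free slot is claimed for a new id. Returns NULL if the id is absent
 * (create == false) or the table is full.
 */
struct cg_counts *cg_slot(struct cg_table *t, uint64_t cgroup_id, bool create)
{
    assert(t != NULL);
    for (size_t i = 0; i < MAX_CGROUPS; i++) {
        struct cg_counts *c = &t->slots[(cgroup_id + i) % MAX_CGROUPS];

        if (c->in_use && c->cgroup_id == cgroup_id)
            return c;
        if (!c->in_use) {
            if (!create)
                return NULL;
            c->in_use = true;
            c->cgroup_id = cgroup_id;
            c->voluntary = 0;
            c->involuntary = 0;
            return c;
        }
    }
    return NULL;
}

/*
 * Account one context switch of a task in cgroup_id.
 * Assumption: the switch is voluntary when the task was not preempted and
 * its state was non-zero (not runnable); otherwise it is involuntary.
 * Returns 0 on success, -1 if the table is full.
 */
int account_switch(struct cg_table *t, uint64_t cgroup_id,
                   bool preempt, long prev_state)
{
    struct cg_counts *c = cg_slot(t, cgroup_id, true);

    if (c == NULL)
        return -1;
    if (!preempt && prev_state != 0)
        c->voluntary++;
    else
        c->involuntary++;
    return 0;
}
```

In a real probe, account_switch() would be the handler body, the table
would be an eBPF map keyed by cgroup id, and the preempt flag and prev
state would have to come from the probed context.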

I would highly appreciate any feedback on this.

Regards,
Gautam
