From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754776AbcBXC3S (ORCPT ); Tue, 23 Feb 2016 21:29:18 -0500 Received: from mx1.redhat.com ([209.132.183.28]:44192 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753568AbcBXC3Q (ORCPT ); Tue, 23 Feb 2016 21:29:16 -0500 Subject: Re: [PATCH 3/4] sched/deadline: Tracepoints for deadline scheduler To: Peter Zijlstra , Steven Rostedt References: <20160222173259.GM6357@twins.programming.kicks-ass.net> <20160222124854.3816fec4@gandalf.local.home> <20160222213017.GN6357@twins.programming.kicks-ass.net> <20160222173043.24873ae8@gandalf.local.home> <20160223104408.GO6357@twins.programming.kicks-ass.net> Cc: Ingo Molnar , Thomas Gleixner , Juri Lelli , Arnaldo Carvalho de Melo , LKML , linux-rt-users From: Daniel Bristot de Oliveira Message-ID: <56CD1579.2070300@redhat.com> Date: Tue, 23 Feb 2016 23:29:13 -0300 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.5.0 MIME-Version: 1.0 In-Reply-To: <20160223104408.GO6357@twins.programming.kicks-ass.net> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 02/23/2016 07:44 AM, Peter Zijlstra wrote: > Now ideally we'd do something like the below, but because trainwreck, we > cannot actually do this I think :-( Some other considerations: 1) The majority of tasks run on NORMAL scheduler with default nice. So, prev=NORMAL:{0,0,0} and next=NORMAL:{0,0,0} does not help too much. The tracepoint size increase from 64 to 112, but this difference will not be useful for the majority of the user/use-cases. 2) The sched:switch can be wider than this 212 characters output: " m-1604 [004] d... 1940.393070: sched_switch: prev_comm=m prev_pid=1604 prev=DEADLINE:{-967616,1940402119572,30000000} prev_state=Rt ==> next_comm=kworker/4:2H next_pid=1507 next=NORMAL:{-20,0,0}" 3) On the other tracepoints, rostedt suggested me to also include the now= field because the clock printed in the trace prefix is not the same clock used by the deadline scheduler. 4) Although neither perf record/script nor trace-cmd record/report broke, both printed only the old fields. So, both would need to be updated and carry both the old and the new version. Well, I bet there will be many other applications that were not well written as perf and trace-cmd :-( 5) A new scheduler will probably require modifications on this tracepoint, hence on all other tools that use this tracepoint too. 6) I liked all the code to compose the output string! But it is not fair to say that the "format" is intuitive. One may need to put the finger in the monitor and use a calculator to see that the filter for a yielded tasks is "prev_state == 8192". Btw, it only works for deadline task yield, and the yield call of all other classes will need to be changed to support this. The question is, is it worth for other classes? 7) This patch is currently showing the runtime zeroed on yield. So, it needs the clean up on yield_task_dl() to properly inform the unused runtime of the task's activation. More than it, to be precise, it must push the zeroing of the runtime of an yielded task to the replenishment code. Your patch for the sched yield already do it :-) but once there, the zeroing can't be moved back without break this tracepoint :-(. 8) we still need to define a tracepoint for replenishment, but as it takes place on the following points: a) sched_setattr() b) periodic replenishment c) on the wakeup after a blocking. There is no a "single tracepoint" already available to "attach" this information. So seems the replenishment tracepoint is still needed. Moreover, although other scheduler may have this concept, I am not sure if they take place under the same conditions. So, AFAICS, this integrate approach is neither adding more information, nor preventing ABI problems now and possibly in the future. On the other hand, although a per-scheduler approach would increase the number of tracepoints, the tracepoints tends to be simpler, confined to the scheduler's .c and more coherent to the scheduler's abstractions. I fear that the mix of the deadline scheduler and other schedulers information, and the adaptation on other tracepoints to make the deadline scheduler points of interest fit on current tracepoints, would add undesired noise/imprecision on the interpretation of the sched deadline actions/abstractions :-(. -- Daniel