Message-ID: <557A783C.5080803@bmw-carit.de>
Date: Fri, 12 Jun 2015 08:12:12 +0200
From: Daniel Wagner
To: Alexei Starovoitov, Steven Rostedt, Tom Zanussi
CC: linux-kernel@vger.kernel.org, Wang Nan
Subject: Re: latency histogram with BPF
References: <557937D8.90606@bmw-carit.de> <557A06D7.7030304@plumgrid.com>
In-Reply-To: <557A06D7.7030304@plumgrid.com>

On 06/12/2015 12:08 AM, Alexei Starovoitov wrote:
> On 6/11/15 12:25 AM, Daniel Wagner wrote:
>> In both cases, BPF or Tom's 'hist' triggers patches, there is some
>> trickery necessary to get it working. While the first approach has
>> more flexibility in what you want to measure or how you want to
>> present it, I suspect it will be harder to get it working/accepted.
>
> why? Out of your patch only 2 lines are for kernel, which I wanted
> to add anyway. The sample code also looks good.

I had the impression that attaching kprobes to trace_preempt_[on|off]
is not going to work. But you seem confident that it can be made to
work :)

>> Obviously, adding kprobes to trace_preempt_[on|off] is
>> asking for problems:
>
> thanks for reporting.
> I reproduced it with hackbench, but my stack trace looks completely
> different. Looks like some memory corruption is happening.
> Not clear where. I'm still debugging.

I have noticed that network load also triggers a panic. Most of the
time softirq is involved somehow. So far I haven't spent time looking
into it because I thought it couldn't work anyway. I'll stare a bit
more at it now.

>> CPU 3
>>      latency        : count     distribution
>>        1 -> 1       : 0        |                                                  |
>>        2 -> 3       : 0        |                                                  |
>>        4 -> 7       : 0        |                                                  |
>>        8 -> 15      : 25182    |************************************************* |
>>       16 -> 31      : 1675     |**                                                |
>>       32 -> 63      : 0        |                                                  |
>>       64 -> 127     : 0        |                                                  |
>>      128 -> 255     : 0        |                                                  |
>>      256 -> 511     : 0        |                                                  |
>>      512 -> 1023    : 0        |                                                  |
>>
>> The numbers look a bit too nice and low. I suspect something is going
>> wrong.
>
> I see similar numbers. 25k over 5 sec = 5k preempt on/off per second
> which sounds about right for idle.

I expected to see some higher latencies, but it stays pretty low the
whole time.
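For reference, the kernel side of the sample (hwlathist_kern.c, quoted
further down) boils down to roughly the sketch below. This is only an
illustration: the map names and sizes, the per-CPU keying and the
loop-free log2 bucketing are simplifications borrowed from the tracex2
sample, not necessarily what the real hwlathist_kern.c does exactly.

/* hedged sketch of a preempt-off latency histogram, samples/bpf style;
 * names, sizes and the bucket layout are illustrative only
 */
#include <linux/ptrace.h>
#include <linux/version.h>
#include <uapi/linux/bpf.h>
#include "bpf_helpers.h"

#define MAX_CPU		64
#define MAX_BUCKETS	64

/* per-cpu timestamp taken when preemption gets disabled */
struct bpf_map_def SEC("maps") start_ts = {
	.type = BPF_MAP_TYPE_ARRAY,
	.key_size = sizeof(u32),
	.value_size = sizeof(u64),
	.max_entries = MAX_CPU,
};

/* log2 latency histogram, one row of buckets per cpu */
struct bpf_map_def SEC("maps") lat_hist = {
	.type = BPF_MAP_TYPE_ARRAY,
	.key_size = sizeof(u32),
	.value_size = sizeof(u64),
	.max_entries = MAX_CPU * MAX_BUCKETS,
};

SEC("kprobe/trace_preempt_off")
int bpf_prog1(struct pt_regs *ctx)
{
	u32 cpu = bpf_get_smp_processor_id();
	u64 ts = bpf_ktime_get_ns();

	bpf_map_update_elem(&start_ts, &cpu, &ts, BPF_ANY);
	return 0;
}

/* loop-free log2 as in the tracex2 sample (the verifier rejects loops) */
static inline unsigned int bpf_log2(unsigned int v)
{
	unsigned int r, shift;

	r = (v > 0xFFFF) << 4; v >>= r;
	shift = (v > 0xFF) << 3; v >>= shift; r |= shift;
	shift = (v > 0xF) << 2; v >>= shift; r |= shift;
	shift = (v > 0x3) << 1; v >>= shift; r |= shift;
	r |= (v > 1);
	return r;
}

static inline unsigned int bpf_log2l(unsigned long v)
{
	unsigned int hi = v >> 32;

	return hi ? bpf_log2(hi) + 32 : bpf_log2(v);
}

SEC("kprobe/trace_preempt_on")
int bpf_prog2(struct pt_regs *ctx)
{
	u32 cpu = bpf_get_smp_processor_id();
	u64 *tsp, delta, *count;
	u32 key;

	tsp = bpf_map_lookup_elem(&start_ts, &cpu);
	if (!tsp || *tsp == 0)
		return 0;

	/* time spent with preemption disabled, in nanoseconds */
	delta = bpf_ktime_get_ns() - *tsp;
	key = cpu * MAX_BUCKETS + bpf_log2l(delta);

	count = bpf_map_lookup_elem(&lat_hist, &key);
	if (count)
		__sync_fetch_and_add(count, 1);
	return 0;
}

char _license[] SEC("license") = "GPL";
u32 _version SEC("version") = LINUX_VERSION_CODE;

The two kprobes bracket the preempt-off window; the delta is bucketed
into a log2 histogram keyed by CPU, so the user-space side can print
one table per CPU like the one above.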
With Tom's patches I got a histogram which looked like this:
https://lkml.org/lkml/2015/4/30/196

# trigger info: hist:keys=latency.bucket:vals=:sort=latency.bucket:size=2048 [active]

latency:          0  hitcount:     558648
latency:        256  hitcount:      70163
latency:        512  hitcount:      24601
latency:        768  hitcount:       9649
latency:       1024  hitcount:       4578
latency:       1280  hitcount:       2623
latency:       1536  hitcount:       1642
latency:       1792  hitcount:       1281
latency:       2048  hitcount:       1203
latency:       2304  hitcount:        942
latency:       2560  hitcount:        713
latency:       2816  hitcount:        513
latency:       3072  hitcount:        436
latency:       3328  hitcount:        347
latency:       3584  hitcount:        312
latency:       3840  hitcount:        279
latency:       4096  hitcount:        269
latency:       4352  hitcount:        257
latency:       4608  hitcount:        236
latency:       4864  hitcount:        214
latency:       5120  hitcount:        190
latency:       5376  hitcount:        129
latency:       5632  hitcount:        159
latency:       5888  hitcount:        141
latency:       6144  hitcount:        118
latency:       6400  hitcount:        107
latency:       6656  hitcount:         99
latency:       6912  hitcount:         73
latency:       7168  hitcount:         70
latency:       7424  hitcount:         80
latency:       7680  hitcount:         58
latency:       7936  hitcount:         65
latency:       8192  hitcount:         57
latency:       8448  hitcount:         59
latency:       8704  hitcount:         67
latency:       8960  hitcount:         64
latency:       9216  hitcount:         82
latency:       9472  hitcount:         89
latency:       9728  hitcount:         74
latency:       9984  hitcount:         89
latency:      10240  hitcount:         65

This distribution seems to be consistent with the latency-hist.patch
from -rt. That patch has been used for a while and it looks like it
reports the right values.

https://git.kernel.org/cgit/linux/kernel/git/rt/linux-stable-rt.git/commit/?h=v3.14-rt&id=0eba3c128347cc82fff093af1c61b2211985b4c9

>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
>> index 2d56ce5..e4d3e76 100644
>> --- a/kernel/trace/bpf_trace.c
>> +++ b/kernel/trace/bpf_trace.c
>> @@ -172,6 +172,8 @@ static const struct bpf_func_proto *kprobe_prog_func_proto(enum bpf_func_id func
>>  		return &bpf_probe_read_proto;
>>  	case BPF_FUNC_ktime_get_ns:
>>  		return &bpf_ktime_get_ns_proto;
>> +	case BPF_FUNC_get_smp_processor_id:
>> +		return &bpf_get_smp_processor_id_proto;
>
> This part I'll take into my set of patches.

Great.

>> --- a/samples/bpf/bpf_helpers.h
>> +++ b/samples/bpf/bpf_helpers.h
>> @@ -21,6 +21,8 @@ static unsigned long long (*bpf_ktime_get_ns)(void) =
>>  	(void *) BPF_FUNC_ktime_get_ns;
>>  static int (*bpf_trace_printk)(const char *fmt, int fmt_size, ...) =
>>  	(void *) BPF_FUNC_trace_printk;
>> +static unsigned int (*bpf_get_smp_processor_id)(void) =
>> +	(void *) BPF_FUNC_get_smp_processor_id;
>
> this part is already there in net-next.

Ah, good to know. FWIW, I had to pick some other snippets from other
trees to get it working. All those patches will hit mainline in the
near future.

>> diff --git a/samples/bpf/hwlathist_kern.c b/samples/bpf/hwlathist_kern.c
>> new file mode 100644
>> index 0000000..7a97e7e
>> --- /dev/null
>> +++ b/samples/bpf/hwlathist_kern.c
>
> looks good and useful, but I would like to find and fix the bug first
> before exposing this bit as a 'sample', since right now it's a
> 'crash me if hackbench is running' test :)

Agreed. :)
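For completeness, the user-space side is nothing fancy: roughly the
usual samples/bpf loader pattern (bpf_load.h / libbpf.h from that
directory), along the lines of the sketch below. The map_fd index, the
map layout and the bucket printing are assumptions that would have to
match whatever the kernel object actually defines.

/* hedged sketch of the user-space loader/printer, following the
 * tracex2_user.c pattern from samples/bpf; illustrative only
 */
#include <stdio.h>
#include <unistd.h>
#include <linux/bpf.h>
#include "libbpf.h"
#include "bpf_load.h"

#define MAX_CPU		64
#define MAX_BUCKETS	64

static void print_hist(int fd)
{
	unsigned long long count;
	unsigned int cpu, bucket, key;

	for (cpu = 0; cpu < MAX_CPU; cpu++) {
		int printed = 0;

		for (bucket = 0; bucket < MAX_BUCKETS; bucket++) {
			key = cpu * MAX_BUCKETS + bucket;
			count = 0;
			if (bpf_lookup_elem(fd, &key, &count) || !count)
				continue;
			if (!printed++)
				printf("CPU %u\n", cpu);
			printf("  %llu -> %llu : %llu\n",
			       1ULL << bucket, (2ULL << bucket) - 1, count);
		}
	}
}

int main(int argc, char **argv)
{
	char filename[256];

	snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
	if (load_bpf_file(filename)) {
		printf("%s", bpf_log_buf);
		return 1;
	}

	while (1) {
		sleep(5);
		/* assuming lat_hist is the second map defined -> map_fd[1] */
		print_hist(map_fd[1]);
	}

	return 0;
}

A proper sample would of course need a signal handler and cleanup, but
this is enough to dump a table like the one above every few seconds.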
> If you have any suggestions on where to look, I'm all ears.
> My stack traces look like:
>
> Running with 10*40 (== 400) tasks.
> [   12.032571] kernel BUG at ../mm/slub.c:3413!
> [   12.036004]  [] rcu_process_callbacks+0x283/0x680
> [   12.036004]  [] __do_softirq+0xf0/0x2b0
> [   12.036004]  [] ? kthread+0xc9/0xe0
> [   12.036004]  [] run_ksoftirqd+0x21/0x50
>
> or
>
> [   25.788074] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
> [   25.788801] BUG: unable to handle kernel paging request at ffff88000b1ea2a0
> [   25.800085]  [] ? rcu_process_callbacks+0x22a/0x680
> [   25.800085]  [] __do_softirq+0xf0/0x2b0
> [   25.800085]  [] do_softirq_own_stack+0x1c/0x30
> [   25.800085]
> [   25.800085]  [] do_softirq+0x3d/0x40
> [   25.800085]  [] __local_bh_enable_ip+0x9a/0xb0
> [   25.800085]  [] _raw_spin_unlock_bh+0x1b/0x20
> [   25.800085]  [] bdi_writeback_workfn+0x1b2/0x470

I see crashes as well, with do_softirq() and RCU in the backtrace.

cheers,
daniel