Date: Mon, 8 Jul 2013 22:20:21 +0200
Subject: Re: [PATCH] perf: fix interrupt handler timing harness
From: Stephane Eranian
To: Dave Hansen
Cc: LKML, Peter Zijlstra, "mingo@elte.hu", dave.hansen@linux.intel.com,
    "ak@linux.intel.com", Jiri Olsa
In-Reply-To: <51DB1B75.8060303@intel.com>
References: <20130704223010.GA30625@quad> <51DACE08.5030109@intel.com>
    <51DB1B75.8060303@intel.com>

On Mon, Jul 8, 2013 at 10:05 PM, Dave Hansen wrote:
> On 07/08/2013 11:08 AM, Stephane Eranian wrote:
>> I admit I have some issues with your patch and what it is trying to avoid.
>> There is already interrupt throttling. Your code seems to address latency
>> issues in the handler rather than rate issues. Yet to mitigate the latency,
>> it modifies the throttling.
>
> If we have too many interrupts, we need to drop the rate (existing
> throttling).
>
> If the interrupts _consistently_ take too long individually, they can
> starve out all the other CPU users. I saw no way to make them finish
> faster, so the only recourse is to also drop the rate.
>
I think we need to investigate why some interrupts take so much time.
Could be HW, could be SW. We are not talking about old hardware here.
Once we understand this, we will know whether to adjust the timing in
our patch.

>> For some unknown reason, my HSW interrupt handler goes crazy for a
>> while when running a very simple:
>> $ perf record -e cycles branchy_loop
>>
>> And I do see in the log:
>> perf samples too long (2546 > 2500), lowering
>> kernel.perf_event_max_sample_rate to 50000
>>
>> which is an enormous latency. I instrumented the code, and under
>> normal conditions the latency of the handler for this perf run is
>> about 500ns, which is consistent with what I see on SNB.
>
> I was seeing latencies near 1 second from time to time, but
> _consistently_ in the hundreds of milliseconds.

On my systems I see 500ns with one session running, but on HSW
something else is going on, with bursts at 2500ns. That's not normal.
I want an explanation for this.
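
[Editor's note] For reference, below is a minimal, standalone C model of
the time-based throttling being discussed: keep a decaying running sum of
per-sample handler times and, when the running average exceeds an allowed
per-sample budget, halve the sampling rate, which is the condition that
produces the "perf samples too long" message quoted above. The constants,
variable names and the sample_took() helper here are illustrative
assumptions, not the kernel's actual code.

/*
 * Minimal userspace model of the time-based throttling discussed above.
 * Identifiers and constants are illustrative assumptions only.
 */
#include <stdio.h>

#define NR_ACCUMULATED_SAMPLES	128
#define HZ			1000

static long long running_sample_len;	/* decaying sum of sample times, ns  */
static long long allowed_ns = 2500;	/* per-sample budget, ns             */
static int max_sample_rate = 100000;	/* kernel.perf_event_max_sample_rate */

static void sample_took(long long len_ns)
{
	long long avg;

	/* decay the running sum by one average sample, then add the new one */
	running_sample_len -= running_sample_len / NR_ACCUMULATED_SAMPLES;
	running_sample_len += len_ns;
	avg = running_sample_len / NR_ACCUMULATED_SAMPLES;

	if (avg <= allowed_ns)
		return;
	if (max_sample_rate <= HZ)	/* don't throttle below HZ samples/sec */
		return;

	/* samples are consistently too slow: halve the allowed rate */
	max_sample_rate /= 2;
	printf("perf samples too long (%lld > %lld), lowering "
	       "kernel.perf_event_max_sample_rate to %d\n",
	       avg, allowed_ns, max_sample_rate);
}

int main(void)
{
	int i;

	/* a sustained burst of 2546ns samples, as in the HSW log, trips the check */
	for (i = 0; i < 1024; i++)
		sample_took(2546);
	return 0;
}

Built with e.g. "gcc -o throttle_model throttle_model.c", the first line it
prints has the same shape as the message in the log above (rate lowered from
100000 to 50000). The in-kernel logic presumably also recomputes the
per-sample budget from the new rate; this model keeps the budget fixed for
simplicity.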