From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1755184AbdGKCDi (ORCPT <rfc822;w@1wt.eu>);
        Mon, 10 Jul 2017 22:03:38 -0400
Received: from mail-yw0-f193.google.com ([209.85.161.193]:35776 "EHLO
        mail-yw0-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1755155AbdGKCDg (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Mon, 10 Jul 2017 22:03:36 -0400
MIME-Version: 1.0
In-Reply-To: <CAOp6jLZjGDRiHaitwcV0Jz86WbL=8UocEFCGd_ZXAEYRwtf6nA@mail.gmail.com>
References: <CAP045Ao2=6A7GEhBfmY0gZ2g=urL5U4R5ywLsTo4Yb6Tk1M8Cw@mail.gmail.com>
 <2256f9b5-1277-c4b1-1472-61a10cd1db9a@linux.intel.com> <CAP045AryPCUTO92S3hmrkag3D7NqgJM-hK82a7iKm9s-rWdn1w@mail.gmail.com>
 <20170628101248.GB5981@leverpostej> <20170628105600.GC5981@leverpostej>
 <CAP045ApjOeng6M9kqwbNpv5OTN=_Gkk2MHREztzZV_CecqpmxQ@mail.gmail.com>
 <20170628174900.GG8252@leverpostej> <CAP045Apcdk6wHU9yt3m5x6L_GUoOqnU2DzKeSQj5nHRGQuNuRQ@mail.gmail.com>
 <20170704090313.xyb5lntyy55ga7dm@hirez.programming.kicks-ass.net>
 <20170704093345.GB19649@leverpostej> <20170704102159.GB20062@leverpostej> <CAOp6jLZjGDRiHaitwcV0Jz86WbL=8UocEFCGd_ZXAEYRwtf6nA@mail.gmail.com>
From: Kyle Huey <me@kylehuey.com>
Date: Mon, 10 Jul 2017 19:03:35 -0700
Message-ID: <CAP045Aob-1Yk7TY_FzxbDOMDzTsuSqwBF389ZM=PgUedG_eS8g@mail.gmail.com>
Subject: Re: [PATCH] perf/core: generate overflow signal when samples are
 dropped (WAS: Re: [REGRESSION] perf/core: PMU interrupts dropped if we
 entered the kernel in the "skid" region)
To: Mark Rutland <mark.rutland@arm.com>, Ingo Molnar <mingo@kernel.org>,
        gregkh@linuxfoundation.org, Peter Zijlstra <peterz@infradead.org>
Cc: Vince Weaver <vincent.weaver@maine.edu>,
        "Jin, Yao" <yao.jin@linux.intel.com>, stable@vger.kernel.org,
        Alexander Shishkin <alexander.shishkin@linux.intel.com>,
        Arnaldo Carvalho de Melo <acme@redhat.com>,
        Jiri Olsa <jolsa@redhat.com>,
        Linus Torvalds <torvalds@linux-foundation.org>,
        Namhyung Kim <namhyung@kernel.org>,
        Stephane Eranian <eranian@google.com>,
        Thomas Gleixner <tglx@linutronix.de>, acme@kernel.org,
        jolsa@kernel.org, kan.liang@intel.com,
        Will Deacon <will.deacon@arm.com>,
        open list <linux-kernel@vger.kernel.org>,
        "Robert O'Callahan" <robert@ocallahan.org>
Content-Type: text/plain; charset="UTF-8"
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jul 5, 2017 at 10:07 PM, Robert O'Callahan <robert@ocallahan.org> wrote:
> On Tue, Jul 4, 2017 at 3:21 AM, Mark Rutland <mark.rutland@arm.com> wrote:
>> Should any of those be moved into the "should be dropped" pile?
>
> Why not be conservative and clear every sample you're not sure about?
>
> We'd appreciate a fix sooner rather than later here, since rr is
> currently broken on every stable Linux kernel and our attempts to
> implement a workaround have failed.
>
> (We have separate "interrupt" and "measure" counters, and I thought we
> might work around this regression by programming the "interrupt"
> counter to count kernel events as well as user events (interrupting
> early is OK), but that caused our (completely separate) "measure"
> counter to report off-by-one results (!), which seems to be a
> different bug present on a range of older kernels.)

Unfortunately, this seems to have stalled out here.

Can we get consensus (from Ingo or peterz?) on Mark's question?
Alternatively, can we move the patch at the top of this thread forward
on the stable branches until we reach an answer to that question?

We've abandoned hope of working around this problem in rr, and rr is
currently broken for all of our users with an up-to-date kernel, so
I'm afraid the situation for us is rather dire at the moment.

Thanks,

- Kyle