From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1751162AbdE3QkH (ORCPT <rfc822;w@1wt.eu>);
        Tue, 30 May 2017 12:40:07 -0400
Received: from mail-io0-f175.google.com ([209.85.223.175]:34321 "EHLO
        mail-io0-f175.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1750930AbdE3QkF (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Tue, 30 May 2017 12:40:05 -0400
MIME-Version: 1.0
In-Reply-To: <20170530092523.xkuj5lqpq5pb5y4m@hirez.programming.kicks-ass.net>
References: <1495213582-3635-1-git-send-email-kan.liang@intel.com>
 <20170522091916.3gydvflk4fnqkzw5@hirez.programming.kicks-ass.net>
 <37D7C6CF3E00A74B8858931C1DB2F077536F079F@SHSMSX103.ccr.corp.intel.com>
 <20170522192335.v4gvhz24ix2jeihg@hirez.programming.kicks-ass.net>
 <CABPqkBTp5muPs32b7YVbfu57aEKv8aXMS+E08xgjxaOvY+B7wQ@mail.gmail.com>
 <20170523063913.363ssgcy7kmeesye@hirez.programming.kicks-ass.net>
 <CABPqkBTo=KC1Qp6vx272UJd2VdPOuX7O1B7J3aY2Y8srQaW-gg@mail.gmail.com>
 <20170524154518.GA24144@tassilo.jf.intel.com> <alpine.DEB.2.20.1705241158160.23659@macbook-air>
 <CABPqkBQq_ARmJ-WMk-SXwRguwPAgSHA4F8zhnbU3BWmAYZqo=w@mail.gmail.com> <20170530092523.xkuj5lqpq5pb5y4m@hirez.programming.kicks-ass.net>
From: Stephane Eranian <eranian@google.com>
Date: Tue, 30 May 2017 09:39:59 -0700
Message-ID: <CABPqkBSSr=+GdprEapSEjUN+3+O+ko_0RsJNCKSSpbHz+ULORQ@mail.gmail.com>
Subject: Re: [PATCH 1/2] perf/x86/intel: enable CPU ref_cycles for GP counter
To: Peter Zijlstra <peterz@infradead.org>
Cc: Vince Weaver <vincent.weaver@maine.edu>,
        Andi Kleen <ak@linux.intel.com>, "Liang, Kan" <kan.liang@intel.com>,
        "mingo@redhat.com" <mingo@redhat.com>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        "alexander.shishkin@linux.intel.com" 
        <alexander.shishkin@linux.intel.com>,
        "acme@redhat.com" <acme@redhat.com>,
        "jolsa@redhat.com" <jolsa@redhat.com>,
        "torvalds@linux-foundation.org" <torvalds@linux-foundation.org>,
        "tglx@linutronix.de" <tglx@linutronix.de>
Content-Type: text/plain; charset="UTF-8"
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, May 30, 2017 at 2:25 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Sun, May 28, 2017 at 01:31:09PM -0700, Stephane Eranian wrote:
>> Ultimately, I would like to see the watchdog move out of the PMU. That
>> is the only sensible solution.
>> You just need a resource able to interrupt on NMI or you handle
>> interrupt masking in software as has
>> been proposed on LKML.
>
> So even if we do the soft masking, we still need to deal with regions
> where the interrupts are disabled. Once an interrupt hits the soft mask
> we still hardware mask.
>
What I was thinking is that you never hardware mask: software always
catches the hw interrupts and either keeps them pending or delivers
them, depending on the sw mask.
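
A rough userspace model of that idea, just to make the control flow
concrete. This is not kernel code: every name below (soft_irq_state,
soft_irq_hw_entry, NR_SOFT_IRQS, ...) is made up for illustration, and
real hardware concerns such as quieting a level-triggered source while
it sits pending are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_SOFT_IRQS 32   /* hypothetical number of modelled sources */

struct soft_irq_state {
    bool     masked;                   /* software mask only; hw is never masked */
    uint32_t pending;                  /* one bit per source latched while masked */
    unsigned delivered[NR_SOFT_IRQS];  /* handler invocations per source */
};

/* Stand-in for running the real handler of one source. */
static void soft_irq_deliver(struct soft_irq_state *s, unsigned irq)
{
    s->delivered[irq]++;
}

/* Hardware entry point: always reached, since hardware is never masked. */
static void soft_irq_hw_entry(struct soft_irq_state *s, unsigned irq)
{
    assert(irq < NR_SOFT_IRQS);
    if (s->masked)
        s->pending |= UINT32_C(1) << irq;  /* keep it pending */
    else
        soft_irq_deliver(s, irq);          /* deliver immediately */
}

static void soft_irq_disable(struct soft_irq_state *s)
{
    s->masked = true;
}

/*
 * Drop the software mask and replay what was latched in the meantime.
 * A source that fired several times while masked is replayed once,
 * because only a single pending bit is kept per source.
 */
static void soft_irq_enable(struct soft_irq_state *s)
{
    s->masked = false;
    for (unsigned irq = 0; irq < NR_SOFT_IRQS; irq++) {
        uint32_t bit = UINT32_C(1) << irq;
        if (s->pending & bit) {
            s->pending &= ~bit;
            soft_irq_deliver(s, irq);
        }
    }
}
```

The point of the model is that the "disabled" region only changes what
soft_irq_hw_entry() does with the interrupt; the entry point itself is
still reached every time.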

> So to get full and reliable coverage we still need an NMI source.
>
> I agree that it would be lovely to free up the one counter though.
>
>
> One other approach is running the watchdog off of _any_ PMI, then all we
> need to ensure is that PMIs happen semi regularly. There are two cases
> where this becomes 'interesting':
>
>  - we have only !sampling events; in this case we have PMIs but at the
>    max period to properly account for counter overflow. This is too
>    large a period. We'd have to muck with the max period of at least one
>    counter.
>
>  - we have _no_ events; in this case we need to somehow schedule an
>    event anyway.
>
> It might be possible to deal with both cases by fudging the state of one
> of the fixed counters. Never clear the EN bit for that counter and
> reduce the max period for that one counter.
>
>
> I think a scheme like that was mentioned before, but I'm also afraid
> that it'll turn into quite the mess if we try it. And by its very nature
> it adds complexity and therefore risks reducing the reliability of the
> thing :/
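
For reference, the "run the watchdog off of any PMI" scheme quoted above
could be modelled roughly as below. This is a sketch under assumptions,
not the existing hardlockup detector: the struct, the function names
and the nanosecond timestamps are all hypothetical. It assumes a timer
interrupt bumps a tick counter, and that every PMI handler calls the
check, which rate-limits itself because PMIs arrive at arbitrary
intervals.

```c
#include <stdbool.h>
#include <stdint.h>

struct pmi_watchdog {
    uint64_t period_ns;      /* hypothetical minimum time between checks */
    uint64_t last_check_ns;  /* timestamp of the last check that ran */
    uint64_t timer_ticks;    /* bumped by the (maskable) timer interrupt */
    uint64_t saved_ticks;    /* value of timer_ticks seen at the last check */
};

/* Called from the timer interrupt; stops advancing if interrupts are stuck off. */
static void wd_timer_tick(struct pmi_watchdog *wd)
{
    wd->timer_ticks++;
}

/*
 * Called from every PMI, whichever counter overflowed.
 * Returns true if no timer tick was seen for a whole watchdog period.
 */
static bool wd_on_pmi(struct pmi_watchdog *wd, uint64_t now_ns)
{
    if (now_ns - wd->last_check_ns < wd->period_ns)
        return false;               /* checked recently enough, skip */
    wd->last_check_ns = now_ns;
    if (wd->timer_ticks == wd->saved_ticks)
        return true;                /* timer made no progress: lockup */
    wd->saved_ticks = wd->timer_ticks;
    return false;
}

/*
 * Cap the period of the one always-enabled counter so that a PMI is
 * guaranteed at least once per watchdog interval. Both values are in
 * counter events; converting an interval to events is left out here.
 */
static uint64_t wd_limit_period(uint64_t hw_max_period, uint64_t wd_max_period)
{
    return hw_max_period < wd_max_period ? hw_max_period : wd_max_period;
}
```

The two 'interesting' cases from the quote show up here as the need for
wd_limit_period(): with only counting events, or with no events at all,
nothing else guarantees that wd_on_pmi() gets called often enough.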