From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1751363AbdBWPHM (ORCPT <rfc822;w@1wt.eu>);
        Thu, 23 Feb 2017 10:07:12 -0500
Received: from mail-it0-f44.google.com ([209.85.214.44]:37918 "EHLO
        mail-it0-f44.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1750916AbdBWPHL (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Thu, 23 Feb 2017 10:07:11 -0500
From: Vince Weaver <vincent.weaver@maine.edu>
X-Google-Original-From: Vince Weaver <vince@maine.edu>
Date: Thu, 23 Feb 2017 10:07:01 -0500 (EST)
X-X-Sender: vince@macbook-air
To: "Liang, Kan" <kan.liang@intel.com>
cc: Peter Zijlstra <peterz@infradead.org>,
        "Odzioba, Lukasz" <lukasz.odzioba@intel.com>,
        Stephane Eranian <eranian@google.com>,
        "mingo@redhat.com" <mingo@redhat.com>,
        LKML <linux-kernel@vger.kernel.org>,
        Alexander Shishkin <alexander.shishkin@linux.intel.com>,
        "ak@linux.intel.com" <ak@linux.intel.com>
Subject: RE: [PATCH] perf/x86: fix event counter update issue
In-Reply-To: <37D7C6CF3E00A74B8858931C1DB2F077536A9963@SHSMSX103.ccr.corp.intel.com>
Message-ID: <alpine.DEB.2.20.1702231000230.15726@macbook-air>
References: <1480361206-1702-1-git-send-email-kan.liang@intel.com> <20161129092520.GB3092@twins.programming.kicks-ass.net> <CABPqkBQPRBZZB8wbhYGLoS9ww_0vQrt=nkqgDC_fFgG99cqdCg@mail.gmail.com> <20161129173055.GP3092@twins.programming.kicks-ass.net>
 <37D7C6CF3E00A74B8858931C1DB2F07750CA4225@SHSMSX103.ccr.corp.intel.com> <20161129193201.GE3045@worktop.programming.kicks-ass.net> <37D7C6CF3E00A74B8858931C1DB2F07750CA42A3@SHSMSX103.ccr.corp.intel.com> <D6EDEBF1F91015459DB866AC4EE162CC024DCFE5@IRSMSX103.ger.corp.intel.com>
 <20161205102509.GH3124@twins.programming.kicks-ass.net> <alpine.DEB.2.20.1702220941570.24020@macbook-air> <37D7C6CF3E00A74B8858931C1DB2F077536A9963@SHSMSX103.ccr.corp.intel.com>
User-Agent: Alpine 2.20 (DEB 67 2015-01-07)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 22 Feb 2017, Liang, Kan wrote:

> > So from what I understand, the issue is if we have an architecture with
> > full-width counters and we trigger an x86_perf_event_update() when bit
> > 47 is set?
> 
> No. It is related to the counter width. The number of bits we can use should be
> 1 bit less than the total width. Otherwise, there will be a problem.
> For big cores such as Haswell, Broadwell, and Skylake, the counter width is 48 bits,
> so we can only use 47 bits.
> For Silvermont and KNL, the counter width is only 32 bits, I think, so we can only
> use 31 bits.

So on a machine with 48-bit counters, I should just be able to have a counting
event count to somewhere above 0x800000000001 and see problems?
I am unable to trigger this.

But I guess if x86_perf_event_update() is run anywhere along the way,
then you start over?

I noticed your original reproducer bound the event to a core; is that
needed to trigger this?

Can it happen on a fixed event, or only on a general-purpose event?

> > So if I have a test that runs in a loop for 2^48 retired instructions (which
> > takes ~12 hours on a recent machine) and then reads the results, they
> > might be wrong?
> 
> It only takes several minutes to reproduce the issue on SLM/KNL.

Yes, but I only have machines with 48-bit counters.  So it's going to take 
256 times as long as on a machine with 40-bit counters.

I have an assembly loop that can consistently sustain 2 instructions/cycle
(I'd be glad to hear suggestions for events that count faster), and on
a Broadwell-EP machine it still takes around 7 hours to get up
to 0x800000000000.

Vince