From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id E40D7C433EF for ; Tue, 19 Apr 2022 14:03:23 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:Cc:To:Subject:Message-ID:Date:From: In-Reply-To:References:MIME-Version:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=Lhx3+5qfyFBQzYfmYNrFXx9w2ECLSt3NFKuiSH8df4s=; b=IBU6Fl+yW/0pam TgT05B3ITZAWUK22pV9RROXTBZmD3qf6voUvz+FsBayuchRhPwueuDGoYbNKoxOnE0IigsBE9z2Qd uqQhDpVCcqlf/hHnFly9fgdh/VCfBof1POOolZESMHw/lJP3WLaT0iMPUuDwgRq412cIFcF75CC4k dKiq/L8vEsvsDZ3udl/LOAnhxSS7DVcc++a++M76PLwL2udH5Xl7Kh7YK7qkLrrJp+A7gdVR1ayRh fBgb7j4WJqb25VvEQH4MX0EKUv7jx+wOu3fwjK8tMW814etKwGzIs25EQ3pVxZmo0EPRNSGxnjAw9 cF51Dcn3swboisYYpUJw==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.94.2 #2 (Red Hat Linux)) id 1ngoQN-0045v2-Lq; Tue, 19 Apr 2022 14:01:48 +0000 Received: from mail-lf1-x130.google.com ([2a00:1450:4864:20::130]) by bombadil.infradead.org with esmtps (Exim 4.94.2 #2 (Red Hat Linux)) id 1ngo05-003tG0-Uy for linux-arm-kernel@lists.infradead.org; Tue, 19 Apr 2022 13:34:40 +0000 Received: by mail-lf1-x130.google.com with SMTP id x17so29414627lfa.10 for ; Tue, 19 Apr 2022 06:34:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=o8jaDpW7vFBottIVRroyHyZ4nYXTGHPIgjXmDUuRh+4=; b=VmLWceDUCs0KMuLiyuAFEiIHSR91JfFNsp5/bh58URB2Md9En1NeAAiQJnRWis24zQ Ourtd2jVkT116lA/s506G+GyQX/9D6GRfDK3BiBvqIAFjfwS7zTLyG8O8NP+fdlJ190t vk1DrYHsJeueoaJaBbIroMwTyPgW7rMPMS9rE7+WoJOXIBieEyL1Y6UTNGowGjnSvluM KOGH9gl8M/LhHjQBa732RALhxMjaq7blEyPbKCpE7nJ3APGTXX0NwaY1BxACc1ORQL0k CvqcAoxWOo5PVCPXT9OWZwdiqeOTG1bq+W7n6LwiUHyGFBTY4m9Jd/cvZN3gvtLTKG/Q 6hQw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=o8jaDpW7vFBottIVRroyHyZ4nYXTGHPIgjXmDUuRh+4=; b=61EOlof/J8Q+lZSmw+lAbCggwCCDMIkYE0HhQCe9pw7GnzYlTle1ieanZraFDxGLkU vfOAMpSfBY+w9ZiQGk3KGcigEW+s1RGGl1TPLnkHnERKect8d2/fAsz7dPvlDfkbmblB fxTNmK/jbMLdgoH8Ra0/Ioh4XHsJZHyo5TkGcZw8UvOK0FSVTp+O0Lc4xq1uryNyNQMy GIX1/FdVxEDaB3x85egjbVWQMPLPisF7OZV/LQHGWboVbiIkoXFEKGPQJZPQlSK9yRXa 5slGoCQnEcjDFpPT+emSyiRWu1/piIzHaEJiQkcIc1e6kVLUFIK1Q5Rsmg5+COoVnvPO L3WA== X-Gm-Message-State: AOAM533X5ak5pRvow3ye/23pNPvqOrCkMWXqBUsUsVvCmygbPCUinsus Ehm2haMLk+hkKEGrEiH5cUh3wspNC1CyUSVqVZM= X-Google-Smtp-Source: ABdhPJwsmVwxwA8pBOlC3+/bC2os1G/aAb5n52zXVbogFrHshNL+x5oqtmatoluUk3tFMZG6x5fqn5sA4hxkmHiUf7c= X-Received: by 2002:ac2:520a:0:b0:46b:b8fa:2ad1 with SMTP id a10-20020ac2520a000000b0046bb8fa2ad1mr11192768lfl.443.1650375273492; Tue, 19 Apr 2022 06:34:33 -0700 (PDT) MIME-Version: 1.0 References: <87o80yior2.wl-maz@kernel.org> <87pmld9q4q.wl-maz@kernel.org> In-Reply-To: <87pmld9q4q.wl-maz@kernel.org> From: Yichao Yu Date: Tue, 19 Apr 2022 09:34:21 -0400 Message-ID: Subject: Re: Kernel perf counter support (for apple M1 and others) To: Marc Zyngier Cc: linux-arm-kernel@lists.infradead.org, khuong@os.amperecomputing.com, will@kernel.org, mark.rutland@arm.com, Frank.li@nxp.com, zhangshaokun@hisilicon.com, liuqi115@huawei.com, john.garry@huawei.com, Mathieu Poirier , Leo Yan X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20220419_063438_076943_0A2536A8 X-CRM114-Status: GOOD ( 84.34 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org On Tue, Apr 19, 2022 at 9:09 AM Marc Zyngier wrote: > > On Tue, 19 Apr 2022 13:06:37 +0100, > Yichao Yu wrote: > > > > > - I don't think there is any value in stashing any of these HW events > > > in the kernel. In most cases, the kernel definition only matches the > > > x86 definition, and doesn't accurately describe the vast majority of > > > the events implemented on an ARM CPU. The ARM architecture mentions > > > a handful of architectural events that actually match the kernel > > > definition, and for these CPUs the kernel carries the in-kernel > > > description. > > > > > > - For the M1, none of the above applies, because there is *NO* > > > architectural description for the events reported by the (non > > > architectural) PMU, and there is no guarantee that they actually > > > match the common understanding we have of these events. > > > > You mentioned documents from Apple on IRC and below. Why is that the > > only acceptable source? > > Because that would be the only one giving an exact definition to what > you are counting. Anything else is guess-work. Very good guess-work, > I'm sure, but still very much a wet finger in the air. > > > The entire support for M1 is based on reverse engineering/testing of > > the hardware so why would those not be acceptable sources here as > > well? > > Because there is a difference between getting something to work (the > PMU driver itself) and interpreting the results it gives. All we know > is that it is counting something. You can sort of guess what, but you > don't know for sure. > > > My understanding is that the current cycles and instructions counters > > were figured out this way so I don't see why you want them to be > > removed. > > Because you use them as an argument to pile more crap in the > kernel. Gee, at this stage, it is the driver itself I am going to > remove. I'm sorry but I'm not sure why you are so mad about this. That's certainly not my intention. I specifically said that I wasn't intended to submit anything to the kernel at this point (which I assume is the "crap" you are talking about) because I don't know what's acceptable and I want to understand why. For comparison to other M1 supporting code, I'm not talking about the perf counter driver specifically, but all of the other related drivers. I'm sure there is a lot of code that depends on what specific thing a register is doing. For this particular question I'd like to know why there's a difference between the two. The answer might be bluntly obvious to you but that is certainly not the case for me. (And FWIW, this very reason, that I think there might be some background knowledge that I'm lacking is why I asked on IRC first) > Feel free to write a document or something else. The only thing I care > about is in the kernel tree. That is fair, but my point is that this is literally the first time I heard about hardware event type being essentially deprecated. I'm certainly not qualified to write such a document myself (at least not right now) and I won't be unless someone could explain to me what is actually the expectation and why, or if there's existing document explaining all these so that I can contribute to the document of other projects. > > Currently, just by reading the document of the hardware event type, it > > seems that it should work if the hardware supports such counters. > > Such document would be the JSON file I mentioned. But since you have > stated that you don't intend to write anything that ends up in the > kernel, I guess that's a moot point. By document I meant that `perf_event_open(2)` doesn't say anything about, say the instruction hardware counter doesn't count all instructions even when you get a non-zero value. > > > That boat has sailed a long time ago, when the BL PMU support was > > > introduced, and all counters are treated equally: they are *NOT* > > > counted globally. Changing this would be an ABI break, and I seriously > > > doubt we want to go there. > > > > Sorry I'm not familiar with the names here. What's the "BL PMU" > > support? And what are the counters that are not counted globally? > > BL stands for Big-Little. Asymmetric support, if you want. None of the > counters are counted globally, only per PMU type. And this is an ABI > we cannot break. Are you talking about the dynamic PMU type or the hardware or raw type? > > > It would also mean that the kernel would need to know which counters > > > it can accumulate over the various CPU types (which is often more than > > > 2, these days). All of that to save userspace adding things? I doubt > > > this is worth it. > > > > > > > 4. There are other events that may not make as much sense to combine > > > > (cycles for example). However, I feel like a combined cycle count > > > > isn't going to be much tricker to use given that the cycle count on a > > > > single core is still affected by frequency scaling and it can still be > > > > used correctly by pinning the thread. > > > > > > I don't understand what frequency scaling has anything to do with this > > > (a cycle is still a cycle at any frequency). > > > > Exactly, a cycle is still a cycle, so I don't see why it's that big a > > problem to count it globally. > > Because you are going to walk the list of events generated during a > time slice, work out which ones are to be merged and which ones > aren't, and accumulate them into global, userspace visible counters? I > dread to imagine the effect on scheduling latency. All that to avoid > adding two values into userspace. Great. OK, if doing that will always incur a big overhead then I can take that. What I imagined was that this only needs to be done if the process is moved to a different CPU, and also I thought there should already be some logic in scheduling related to perf counters (I was imagining that's when the kernel decide to add/remove counters for other cases) which is why I thought adding such logic shouldn't make a big difference if no counters is used by the process. I can certainly be wrong about that. Also, see below. > > What I meant exactly was that if a code runs for 100 cycles at 1 GHz, > > it doesn't mean it'll also run (close to) 100 cycles at 3 GHz. > > Similarly, if it runs for 100 cycles on the E core, it doesn't mean > > it'll run for 100 cycles on the P core. > > And? What do you derive from this set of statements? And this is replying to the original argument you gave, saying that counting cycles across different core types doesn't make sense. What I'm saying here is that I don't believe counting across core types makes any more or less sense than counting cycles across different processor frequencies. > > We already allow the former case to count using the same counter > > everywhere, I don't see why the latter can't be allowed. (ABI change > > issue aside) > > *blink*. If you don't see a problem with changing the ABI, I'm at a > loss. Yes I do see the issue with changing ABIs. However, there are multiple arguments you brought up and I'd like to understand each of them individually. It's certainly possible that some of what I was asking about is impossible for some specific reason, but I'd like to understand all of the arguments you brought up to fully understand the issue. (also I intended to mean here that I get that there could be ABI issue, although I don't fully get it yet which is why I was asking above, however, I'd like to discuss this part without concerning the ABI issue, I didn't intend to mean that we can just ignore all the ABI issues and just change things. If that's not what I said actually implies, I'm sorry about that) > > I don't have hardware to test this but it also seems that on the new > > intel chips, the E core and the P core are counted together. (this is > > purely based on the lack of multiple counter support in rr to support > > the new chip...) > > Colour me uninterested on both count. x86 can do whatever they want. Again, this is just to show that counting globally on both E and P cores isn't something that makes as little sense as you originally said. > > > > > > The main reasons I'm asking about 3 and 4 is that > > > > 1. Right now, even to just count instructions without pinning the > > > > thread, I need to create two counters. > > > > > > How bad is that? I mean, the counters are per-CPU anyway, so there > > > *are* N counters (N being the number of CPUs). You only have to create > > > a counter per PMU. > > > > > > > 2. Even if the number isn't exactly accurate, it can still be useful > > > > as a general guideline. Right now, even if I just want to do a quick > > > > check, I still need to manually specify a dozen of events in `perf > > > > stat -e` rather than simply using `perf stat` (to make it worse, perf > > > > doesn't even provide any useful warning about it). It is also much > > > > harder to do things generically (which is at least partially because > > > > of the lack of documentation....). > > > > > > I see this as a potential perf-tool improvement. Being able to say > > > 'Count this event on all CPU PMUs' would certainly be valuable to all > > > asymmetric systems. > > > > Short answer is not that bad if and only if there's a standard and > > documented way to do this, userspace or kernel. > > Feel free to improve the kernel documentation[1], which is admittedly > pretty sparse on the subject. > > The kernel gives you the tools to match PMUs and CPUs (just rummage in > sysfs). If userspace knows which counter is what, you're in business. > Do document your findings, by any mean. And as I said above, without understanding all the details I can't. And it also seems that I don't know the right way to get such information without putting up crap so I'll appreciate it if you could let me know how I can find out more detail about it without annoy more people. > > M. > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/arm64/perf.rst#n136 > > -- > Without deviation from the norm, progress is not possible. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel