Date: Tue, 6 Oct 2020 16:00:54 -0300
From: Arnaldo Carvalho de Melo
To: Peter Zijlstra
Cc: linux-toolchains@vger.kernel.org, Stephane Eranian,
    linux-kernel@ver.kernel.org, Ingo Molnar, Jiri Olsa,
    namhyung@kernel.org, irogers@google.com, kim.phillips@amd.com,
    Mark Rutland, andrii@kernel.org
Subject: Re: Additional debug info to aid cacheline analysis
Message-ID: <20201006190054.GA187024@kernel.org>
References: <20201006131703.GR2628@hirez.programming.kicks-ass.net>
In-Reply-To: <20201006131703.GR2628@hirez.programming.kicks-ass.net>
X-Url: http://acmel.wordpress.com
X-Mailing-List: linux-toolchains@vger.kernel.org

On Tue, Oct 06, 2020 at 03:17:03PM +0200, Peter Zijlstra wrote:
> Hi all,
>
> I've been trying to float this idea for a fair number of years, and I
> think at least Stephane has been talking to tools people about it, but
> I'm not sure what, if anything, ever happened with it, so let me post it
> here :-)
>
> Basically, what I want is a (perf) tool for cacheline optimizations.
> Something very much like the excellent pahole tool, but with hit/miss
> information added.
>
> Now, some PMUs provide the data address for various relevant events, but
> that gets us the problem of mapping a 'random' address to a type and
> offset. And esp. for dynamic objects, that's a difficult problem.
>
> However, the compiler actually knows what type and offset (most) memory
> references are, so if perf can get us the exact IP (Intel PEBS / AMD
> IBS, as opposed to one with skid on) we could get the type from debug
> info.
>
> And therein lies the rub, existing debug info (DWARF) does contain type
> information, but in a way that is (I've been told) _very_ hard to use
> for this purpose.
>
> So could the compiler emit extra debug info for every instruction with a
> memory reference on to facilitate this?

I guess this is what is done to enable CO-RE. There you have to mark
areas of interest, i.e. in your BPF program you enclose accesses to
fields of kernel data structures so that, when loading it, libbpf can
compare the fields used in your program with those in the running kernel
(/sys/kernel/btf/vmlinux), figure out whether those fields moved, and
then fix up their offsets from the start of the struct.

You want those relocation records for all types in the kernel, not to
fix things up, but to figure out that a given load or store touches a
given member of a given type.

https://facebookmicrosites.github.io/bpf/blog/2020/02/19/bpf-portability-and-co-re.html

Compiler support

To enable BPF CO-RE and let BPF loader (i.e., libbpf) to adjust BPF
program to a particular kernel running on target host, Clang was
extended with few built-ins. They emit BTF relocations which capture a
high-level description of what pieces of information BPF program code
intended to read. If you were going to access task_struct->pid field,
Clang would record that it was exactly a field named "pid" of type
“pid_t” residing within a struct task_struct. This is done so that even
if target kernel has a task_struct layout in which “pid” field got moved
to a different offset within a task_struct structure (e.g., due to extra
field added before “pid” field), or even if it was moved into some
nested anonymous struct or union (and this is completely transparent in
C code, so no one ever pays attention to details like that), we’ll still
be able to find it just by its name and type information. This is called
a field offset relocation.
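
To make the field offset relocation idea concrete, here is a minimal
user-space sketch in plain C. It is not how Clang or libbpf implement
CO-RE: the structs, the name-to-offset table standing in for the
target's type information, and the functions relocate(), read_int_at()
and demo_read_pid() are all hypothetical, and the lookup matches on
field name only, whereas the text above says the real mechanism also
uses type information.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical layouts of the "same" struct: the one the program was built
 * against, and the one on the target, where an added field moved "pid". */
struct task_build  { long state; int pid; };
struct task_target { long state; long extra; int pid; };

/* Toy stand-in for the target's type information: field name -> offset. */
struct field_info { const char *name; size_t offset; };

static const struct field_info target_fields[] = {
	{ "state", offsetof(struct task_target, state) },
	{ "extra", offsetof(struct task_target, extra) },
	{ "pid",   offsetof(struct task_target, pid)   },
};

/* Toy relocation record: the field an access meant, plus the offset that
 * was baked in at build time. */
struct field_reloc { const char *field; size_t offset; };

/* Patch the recorded offset by finding the field by name in the target.
 * Returns 0 on success, -1 if the target has no such field. */
int relocate(struct field_reloc *r)
{
	size_t n = sizeof(target_fields) / sizeof(target_fields[0]);

	for (size_t i = 0; i < n; i++) {
		if (strcmp(target_fields[i].name, r->field) == 0) {
			r->offset = target_fields[i].offset;
			return 0;
		}
	}
	return -1;
}

/* Read an int located `offset` bytes from the start of an object. */
int read_int_at(const void *base, size_t offset)
{
	int v;

	memcpy(&v, (const char *)base + offset, sizeof(v));
	return v;
}

/* The build-time offset is wrong for the target layout; relocation fixes
 * it. Returns the pid read through the relocated offset, -1 on failure. */
int demo_read_pid(void)
{
	struct task_target t = { .state = 1, .extra = 2, .pid = 4242 };
	struct field_reloc r = { "pid", offsetof(struct task_build, pid) };

	if (relocate(&r) != 0)
		return -1;
	return read_int_at(&t, r.offset);
}
```

Calling demo_read_pid() from a main() should return the pid stored in
the target-layout object, even though the record started out with the
build-time offset.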

It is possible to capture (and subsequently relocate) not just a field
offset, but other field aspects, like field existence or size. Even for
bitfields (which are notoriously "uncooperative" kinds of data in the C
language, resisting efforts to make them relocatable) it is still
possible to capture enough information to make them relocatable, all
transparently to BPF program developer.

High-level BPF CO-RE mechanics

BPF CO-RE brings together necessary pieces of functionality and data at
all levels of the software stack: kernel, user-space BPF loader library
(libbpf), and compiler (Clang) – to make it possible and easy to write
BPF programs in a portable manner, handling discrepancies between
different kernels within the same pre-compiled BPF program. BPF CO-RE
requires a careful integration and cooperation of the following
components:

  - BTF type information, which allows to capture crucial pieces of
    information about kernel and BPF program types and code, enabling
    all the other parts of BPF CO-RE puzzle;
  - compiler (Clang) provides means for BPF program C code to express
    the intent and record relocation information;
  - BPF loader (libbpf) ties BTFs from kernel and BPF program together
    to adjust compiled BPF code to specific kernel on target hosts;
  - kernel, while staying completely BPF CO-RE-agnostic, provides
    advanced BPF features to enable some of the more advanced scenarios.

Working in ensemble, these components enable unprecedented ability to
develop portable BPF programs with ease, adaptability, and expressivity,
previously achievable only through compiling BPF program’s C code in
runtime through BCC, but without paying a high price of the BCC way.

- Arnaldo
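
As a rough illustration of where this could lead, the sketch below
assumes that such per-instruction records already exist for every memory
access and that the PMU delivers precise instruction addresses. Under
those assumptions, attributing samples to struct members and cachelines
is a table lookup. Everything here is hypothetical: the record format,
the sample format, the addresses, the 64-byte cacheline size, and the
names find_access(), account_samples() and report().

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define CACHELINE 64 /* assumed cacheline size in bytes */

/* Hypothetical per-instruction record: the memory access at `ip` touches
 * `field` of `type`, `offset` bytes from the start of the object. */
struct access_rec {
	uint64_t ip;
	const char *type;
	const char *field;
	size_t offset;
};

/* Hypothetical precise sample: instruction address plus hit/miss flag. */
struct sample { uint64_t ip; int miss; };

struct field_stat { unsigned long hits, misses; };

static const struct access_rec recs[] = {
	{ 0x1000, "foo", "a", 0  },
	{ 0x1008, "foo", "b", 8  },
	{ 0x1010, "foo", "c", 72 },
};
#define NRECS (sizeof(recs) / sizeof(recs[0]))

/* Find the record for an instruction address, or NULL if there is none. */
const struct access_rec *find_access(uint64_t ip)
{
	for (size_t i = 0; i < NRECS; i++)
		if (recs[i].ip == ip)
			return &recs[i];
	return NULL;
}

/* Accumulate samples into stats[NRECS], indexed like recs[].
 * Returns the number of samples that matched no record. */
size_t account_samples(const struct sample *s, size_t n,
		       struct field_stat *stats)
{
	size_t unmatched = 0;

	for (size_t i = 0; i < n; i++) {
		const struct access_rec *r = find_access(s[i].ip);

		if (!r) {
			unmatched++;
			continue;
		}
		if (s[i].miss)
			stats[r - recs].misses++;
		else
			stats[r - recs].hits++;
	}
	return unmatched;
}

/* Print one line per member, with its cacheline index and counters. */
void report(const struct field_stat *stats)
{
	for (size_t i = 0; i < NRECS; i++)
		printf("%s.%s offset %zu cacheline %zu hits %lu misses %lu\n",
		       recs[i].type, recs[i].field, recs[i].offset,
		       recs[i].offset / CACHELINE,
		       stats[i].hits, stats[i].misses);
}
```

A caller would zero an array of NRECS field_stat entries, pass the
collected samples to account_samples(), and then call report() to get a
per-member listing with hit and miss counts.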