linux-doc.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] pgo: add clang's Profile Guided Optimization infrastructure
@ 2021-01-11  8:18 Bill Wendling
  2021-01-11  8:39 ` Sedat Dilek
                   ` (5 more replies)
  0 siblings, 6 replies; 116+ messages in thread
From: Bill Wendling @ 2021-01-11  8:18 UTC (permalink / raw)
  To: Jonathan Corbet, Masahiro Yamada, linux-doc, linux-kernel,
	linux-kbuild, clang-built-linux, Andrew Morton
  Cc: Nathan Chancellor, Nick Desaulniers, Sami Tolvanen, Bill Wendling

From: Sami Tolvanen <samitolvanen@google.com>

Enable the use of clang's Profile-Guided Optimization[1]. To generate a
profile, the kernel is instrumented with PGO counters, a representative
workload is run, and the raw profile data is collected from
/sys/kernel/debug/pgo/profraw.

The raw profile data must be processed by clang's "llvm-profdata" tool before
it can be used during recompilation:

  $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
  $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw

Multiple raw profiles may be merged during this step.

The data can be used either by the compiler if LTO isn't enabled:

    ... -fprofile-use=vmlinux.profdata ...

or by LLD if LTO is enabled:

    ... -lto-cs-profile-file=vmlinux.profdata ...

This initial submission is restricted to x86, as that's the platform we know
works. This restriction can be lifted once other platforms have been verified
to work with PGO.

Note that this method of profiling the kernel is clang-native and isn't
compatible with clang's gcov support in kernel/gcov.

[1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Co-developed-by: Bill Wendling <morbo@google.com>
Signed-off-by: Bill Wendling <morbo@google.com>
---
 Documentation/dev-tools/index.rst     |   1 +
 Documentation/dev-tools/pgo.rst       | 127 +++++++++
 MAINTAINERS                           |   9 +
 Makefile                              |   3 +
 arch/Kconfig                          |   1 +
 arch/arm/boot/bootp/Makefile          |   1 +
 arch/arm/boot/compressed/Makefile     |   1 +
 arch/arm/vdso/Makefile                |   3 +-
 arch/arm64/kernel/vdso/Makefile       |   3 +-
 arch/arm64/kvm/hyp/nvhe/Makefile      |   1 +
 arch/mips/boot/compressed/Makefile    |   1 +
 arch/mips/vdso/Makefile               |   1 +
 arch/nds32/kernel/vdso/Makefile       |   4 +-
 arch/parisc/boot/compressed/Makefile  |   1 +
 arch/powerpc/kernel/Makefile          |   6 +-
 arch/powerpc/kernel/trace/Makefile    |   3 +-
 arch/powerpc/kernel/vdso32/Makefile   |   1 +
 arch/powerpc/kernel/vdso64/Makefile   |   1 +
 arch/powerpc/kexec/Makefile           |   3 +-
 arch/powerpc/xmon/Makefile            |   1 +
 arch/riscv/kernel/vdso/Makefile       |   3 +-
 arch/s390/boot/Makefile               |   1 +
 arch/s390/boot/compressed/Makefile    |   1 +
 arch/s390/kernel/Makefile             |   1 +
 arch/s390/kernel/vdso64/Makefile      |   3 +-
 arch/s390/purgatory/Makefile          |   1 +
 arch/sh/boot/compressed/Makefile      |   1 +
 arch/sh/mm/Makefile                   |   1 +
 arch/sparc/vdso/Makefile              |   1 +
 arch/x86/Kconfig                      |   1 +
 arch/x86/boot/Makefile                |   1 +
 arch/x86/boot/compressed/Makefile     |   1 +
 arch/x86/entry/vdso/Makefile          |   1 +
 arch/x86/kernel/vmlinux.lds.S         |   2 +
 arch/x86/platform/efi/Makefile        |   1 +
 arch/x86/purgatory/Makefile           |   1 +
 arch/x86/realmode/rm/Makefile         |   1 +
 arch/x86/um/vdso/Makefile             |   1 +
 drivers/firmware/efi/libstub/Makefile |   1 +
 drivers/s390/char/Makefile            |   1 +
 include/asm-generic/vmlinux.lds.h     |  44 +++
 kernel/Makefile                       |   1 +
 kernel/pgo/Kconfig                    |  34 +++
 kernel/pgo/Makefile                   |   5 +
 kernel/pgo/fs.c                       | 382 ++++++++++++++++++++++++++
 kernel/pgo/instrument.c               | 147 ++++++++++
 kernel/pgo/pgo.h                      | 206 ++++++++++++++
 scripts/Makefile.lib                  |  10 +
 48 files changed, 1017 insertions(+), 9 deletions(-)
 create mode 100644 Documentation/dev-tools/pgo.rst
 create mode 100644 kernel/pgo/Kconfig
 create mode 100644 kernel/pgo/Makefile
 create mode 100644 kernel/pgo/fs.c
 create mode 100644 kernel/pgo/instrument.c
 create mode 100644 kernel/pgo/pgo.h

diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
index f7809c7b1ba9e..8d6418e858062 100644
--- a/Documentation/dev-tools/index.rst
+++ b/Documentation/dev-tools/index.rst
@@ -26,6 +26,7 @@ whole; patches welcome!
    kgdb
    kselftest
    kunit/index
+   pgo
 
 
 .. only::  subproject and html
diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
new file mode 100644
index 0000000000000..2ed7f549b20ef
--- /dev/null
+++ b/Documentation/dev-tools/pgo.rst
@@ -0,0 +1,127 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============================
+Using PGO with the Linux kernel
+===============================
+
+Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
+when building with Clang. The profiling data is exported via the ``pgo``
+debugfs directory.
+
+.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
+
+
+Preparation
+===========
+
+Configure the kernel with:
+
+.. code-block:: make
+
+   CONFIG_DEBUG_FS=y
+   CONFIG_PGO_CLANG=y
+
+Note that kernels compiled with profiling flags will be significantly larger
+and run slower.
+
+Profiling data will only become accessible once debugfs has been mounted:
+
+.. code-block:: sh
+
+   mount -t debugfs none /sys/kernel/debug
+
+
+Customization
+=============
+
+You can enable or disable profiling for individual file and directories by
+adding a line similar to the following to the respective kernel Makefile:
+
+- For a single file (e.g. main.o)
+
+  .. code-block:: make
+
+     PGO_PROFILE_main.o := y
+
+- For all files in one directory
+
+  .. code-block:: make
+
+     PGO_PROFILE := y
+
+To exclude files from being profiled use
+
+  .. code-block:: make
+
+     PGO_PROFILE_main.o := n
+
+and
+
+  .. code-block:: make
+
+     PGO_PROFILE := n
+
+Only files which are linked to the main kernel image or are compiled as kernel
+modules are supported by this mechanism.
+
+
+Files
+=====
+
+The PGO kernel support creates the following files in debugfs:
+
+``/sys/kernel/debug/pgo``
+	Parent directory for all PGO-related files.
+
+``/sys/kernel/debug/pgo/reset``
+	Global reset file: resets all coverage data to zero when written to.
+
+``/sys/kernel/debug/profraw``
+	The raw PGO data that must be processed with ``llvm_profdata``.
+
+
+Workflow
+========
+
+The PGO kernel can be run on the host or test machines. The data though should
+be analyzed with Clang's tools from the same Clang version as the kernel was
+compiled. Clang's tolerant of version skew, but it's easier to use the same
+Clang version.
+
+The profiling data is useful for optimizing the kernel, analyzing coverage,
+etc. Clang offers tools to perform these tasks.
+
+Here is an example workflow for profiling an instrumented kernel with PGO and
+using the result to optimize the kernel:
+
+1) Install the kernel on the TEST machine.
+
+2) Reset the data counters right before running the load tests
+
+   .. code-block:: sh
+
+      echo 1 > /sys/kernel/debug/pgo/reset
+
+3) Run the load tests.
+
+4) Collect the raw profile data
+
+   .. code-block:: sh
+
+      cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
+
+5) (Optional) Download the raw profile data to the HOST machine.
+
+6) Process the raw profile data
+
+   .. code-block:: sh
+
+      llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
+
+   Note that multiple raw profile data files can be merged during this step.
+
+7) Rebuild the kernel using the profile data (PGO disabled)
+
+   .. code-block:: sh
+
+      make ... KCLAGS=-fprofile-use=vmlinux.profdata
diff --git a/MAINTAINERS b/MAINTAINERS
index 6390491b07e51..7a98bdaab9861 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13955,6 +13955,15 @@ S:	Maintained
 F:	include/linux/personality.h
 F:	include/uapi/linux/personality.h
 
+PGO BASED KERNEL PROFILING
+M:	Sami Tolvanen <samitolvanen@google.com>
+M:	Bill Wendling <wcw@google.com>
+R:	Nathan Chancellor <natechancellor@gmail.com>
+R:	Nick Desaulniers <ndesaulniers@google.com>
+S:	Supported
+F:	Documentation/dev-tools/pgo.rst
+F:	kernel/pgo
+
 PHOENIX RC FLIGHT CONTROLLER ADAPTER
 M:	Marcus Folkesson <marcus.folkesson@gmail.com>
 L:	linux-input@vger.kernel.org
diff --git a/Makefile b/Makefile
index 8b2c3f88ee5ea..4f42957c78134 100644
--- a/Makefile
+++ b/Makefile
@@ -659,6 +659,9 @@ endif # KBUILD_EXTMOD
 # Defaults to vmlinux, but the arch makefile usually adds further targets
 all: vmlinux
 
+CFLAGS_PGO_CLANG := -fprofile-generate
+export CFLAGS_PGO_CLANG
+
 CFLAGS_GCOV	:= -fprofile-arcs -ftest-coverage \
 	$(call cc-option,-fno-tree-loop-im) \
 	$(call cc-disable-warning,maybe-uninitialized,)
diff --git a/arch/Kconfig b/arch/Kconfig
index 78c6f05b10f91..a7a6ab7d204dc 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1106,6 +1106,7 @@ config ARCH_SUPPORTS_DEBUG_PAGEALLOC
 	bool
 
 source "kernel/gcov/Kconfig"
+source "kernel/pgo/Kconfig"
 
 source "scripts/gcc-plugins/Kconfig"
 
diff --git a/arch/arm/boot/bootp/Makefile b/arch/arm/boot/bootp/Makefile
index 981a8d03f064c..523bd58df0a4b 100644
--- a/arch/arm/boot/bootp/Makefile
+++ b/arch/arm/boot/bootp/Makefile
@@ -7,6 +7,7 @@
 #
 
 GCOV_PROFILE	:= n
+PGO_PROFILE	:= n
 
 LDFLAGS_bootp	:= --no-undefined -X \
 		 --defsym initrd_phys=$(INITRD_PHYS) \
diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
index fb521efcc6c20..5fd0fd85fc0e5 100644
--- a/arch/arm/boot/compressed/Makefile
+++ b/arch/arm/boot/compressed/Makefile
@@ -24,6 +24,7 @@ OBJS		+= hyp-stub.o
 endif
 
 GCOV_PROFILE		:= n
+PGO_PROFILE		:= n
 KASAN_SANITIZE		:= n
 
 # Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
diff --git a/arch/arm/vdso/Makefile b/arch/arm/vdso/Makefile
index b558bee0e1f6b..11f6ce4b48b56 100644
--- a/arch/arm/vdso/Makefile
+++ b/arch/arm/vdso/Makefile
@@ -36,8 +36,9 @@ else
 CFLAGS_vgettimeofday.o = -O2 -include $(c-gettimeofday-y)
 endif
 
-# Disable gcov profiling for VDSO code
+# Disable gcov and PGO profiling for VDSO code
 GCOV_PROFILE := n
+PGO_PROFILE := n
 
 # Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
 KCOV_INSTRUMENT := n
diff --git a/arch/arm64/kernel/vdso/Makefile b/arch/arm64/kernel/vdso/Makefile
index cd9c3fa25902f..d48fc0df07020 100644
--- a/arch/arm64/kernel/vdso/Makefile
+++ b/arch/arm64/kernel/vdso/Makefile
@@ -41,8 +41,9 @@ ifneq ($(c-gettimeofday-y),)
   CFLAGS_vgettimeofday.o += -include $(c-gettimeofday-y)
 endif
 
-# Disable gcov profiling for VDSO code
+# Disable gcov and PGO profiling for VDSO code
 GCOV_PROFILE := n
+PGO_PROFILE := n
 
 obj-y += vdso.o
 targets += vdso.lds
diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index 1f1e351c5fe2b..ad128ecdbfbdf 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -60,6 +60,7 @@ KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_FTRACE) $(CC_FLAGS_SCS), $(KBUILD_CFLAG
 # compiler instrumentation that inserts callbacks or checks into the code may
 # cause crashes. Just disable it.
 GCOV_PROFILE	:= n
+PGO_PROFILE	:= n
 KASAN_SANITIZE	:= n
 UBSAN_SANITIZE	:= n
 KCOV_INSTRUMENT	:= n
diff --git a/arch/mips/boot/compressed/Makefile b/arch/mips/boot/compressed/Makefile
index 47cd9dc7454af..0855ea12f2c7f 100644
--- a/arch/mips/boot/compressed/Makefile
+++ b/arch/mips/boot/compressed/Makefile
@@ -37,6 +37,7 @@ KBUILD_AFLAGS := $(KBUILD_AFLAGS) -D__ASSEMBLY__ \
 # Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
 KCOV_INSTRUMENT		:= n
 GCOV_PROFILE := n
+PGO_PROFILE := n
 
 # decompressor objects (linked with vmlinuz)
 vmlinuzobjs-y := $(obj)/head.o $(obj)/decompress.o $(obj)/string.o
diff --git a/arch/mips/vdso/Makefile b/arch/mips/vdso/Makefile
index 5810cc12bc1d9..d7eb64de35eae 100644
--- a/arch/mips/vdso/Makefile
+++ b/arch/mips/vdso/Makefile
@@ -66,6 +66,7 @@ ldflags-y := -Bsymbolic --no-undefined -soname=linux-vdso.so.1 \
 CFLAGS_REMOVE_vdso.o = -pg
 
 GCOV_PROFILE := n
+PGO_PROFILE := n
 UBSAN_SANITIZE := n
 KCOV_INSTRUMENT := n
 
diff --git a/arch/nds32/kernel/vdso/Makefile b/arch/nds32/kernel/vdso/Makefile
index 55df25ef00578..f2b53ee2124b7 100644
--- a/arch/nds32/kernel/vdso/Makefile
+++ b/arch/nds32/kernel/vdso/Makefile
@@ -15,9 +15,9 @@ obj-vdso := $(addprefix $(obj)/, $(obj-vdso))
 ccflags-y := -shared -fno-common -fno-builtin -nostdlib -fPIC -Wl,-shared -g \
 	-Wl,-soname=linux-vdso.so.1 -Wl,--hash-style=sysv
 
-# Disable gcov profiling for VDSO code
+# Disable gcov and PGO profiling for VDSO code
 GCOV_PROFILE := n
-
+PGO_PROFILE := n
 
 obj-y += vdso.o
 targets += vdso.lds
diff --git a/arch/parisc/boot/compressed/Makefile b/arch/parisc/boot/compressed/Makefile
index dff4536875305..5cf93a67f7da7 100644
--- a/arch/parisc/boot/compressed/Makefile
+++ b/arch/parisc/boot/compressed/Makefile
@@ -7,6 +7,7 @@
 
 KCOV_INSTRUMENT := n
 GCOV_PROFILE := n
+PGO_PROFILE := n
 UBSAN_SANITIZE := n
 
 targets := vmlinux.lds vmlinux vmlinux.bin vmlinux.bin.gz vmlinux.bin.bz2
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index fe2ef598e2ead..c642c046660d7 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -153,17 +153,21 @@ endif
 obj-$(CONFIG_PPC_SECURE_BOOT)	+= secure_boot.o ima_arch.o secvar-ops.o
 obj-$(CONFIG_PPC_SECVAR_SYSFS)	+= secvar-sysfs.o
 
-# Disable GCOV, KCOV & sanitizers in odd or sensitive code
+# Disable GCOV, PGO, KCOV & sanitizers in odd or sensitive code
 GCOV_PROFILE_prom_init.o := n
+PGO_PROFILE_prom_init.o := n
 KCOV_INSTRUMENT_prom_init.o := n
 UBSAN_SANITIZE_prom_init.o := n
 GCOV_PROFILE_kprobes.o := n
+PGO_PROFILE_kprobes.o := n
 KCOV_INSTRUMENT_kprobes.o := n
 UBSAN_SANITIZE_kprobes.o := n
 GCOV_PROFILE_kprobes-ftrace.o := n
+PGO_PROFILE_kprobes-ftrace.o := n
 KCOV_INSTRUMENT_kprobes-ftrace.o := n
 UBSAN_SANITIZE_kprobes-ftrace.o := n
 GCOV_PROFILE_syscall_64.o := n
+PGO_PROFILE_syscall_64.o := n
 KCOV_INSTRUMENT_syscall_64.o := n
 UBSAN_SANITIZE_syscall_64.o := n
 UBSAN_SANITIZE_vdso.o := n
diff --git a/arch/powerpc/kernel/trace/Makefile b/arch/powerpc/kernel/trace/Makefile
index 858503775c583..7d72ae7d4f8c6 100644
--- a/arch/powerpc/kernel/trace/Makefile
+++ b/arch/powerpc/kernel/trace/Makefile
@@ -23,7 +23,8 @@ obj-$(CONFIG_TRACING)			+= trace_clock.o
 obj-$(CONFIG_PPC64)			+= $(obj64-y)
 obj-$(CONFIG_PPC32)			+= $(obj32-y)
 
-# Disable GCOV, KCOV & sanitizers in odd or sensitive code
+# Disable GCOV, PGO, KCOV & sanitizers in odd or sensitive code
 GCOV_PROFILE_ftrace.o := n
+PGO_PROFILE_ftrace.o := n
 KCOV_INSTRUMENT_ftrace.o := n
 UBSAN_SANITIZE_ftrace.o := n
diff --git a/arch/powerpc/kernel/vdso32/Makefile b/arch/powerpc/kernel/vdso32/Makefile
index 9cb6f524854b9..655e159975a04 100644
--- a/arch/powerpc/kernel/vdso32/Makefile
+++ b/arch/powerpc/kernel/vdso32/Makefile
@@ -34,6 +34,7 @@ targets := $(obj-vdso32) vdso32.so.dbg
 obj-vdso32 := $(addprefix $(obj)/, $(obj-vdso32))
 
 GCOV_PROFILE := n
+PGO_PROFILE := n
 KCOV_INSTRUMENT := n
 UBSAN_SANITIZE := n
 KASAN_SANITIZE := n
diff --git a/arch/powerpc/kernel/vdso64/Makefile b/arch/powerpc/kernel/vdso64/Makefile
index bf363ff371521..12c286f5afc16 100644
--- a/arch/powerpc/kernel/vdso64/Makefile
+++ b/arch/powerpc/kernel/vdso64/Makefile
@@ -21,6 +21,7 @@ targets := $(obj-vdso64) vdso64.so.dbg
 obj-vdso64 := $(addprefix $(obj)/, $(obj-vdso64))
 
 GCOV_PROFILE := n
+PGO_PROFILE := n
 KCOV_INSTRUMENT := n
 UBSAN_SANITIZE := n
 KASAN_SANITIZE := n
diff --git a/arch/powerpc/kexec/Makefile b/arch/powerpc/kexec/Makefile
index 4aff6846c7726..1c7f65e3cb969 100644
--- a/arch/powerpc/kexec/Makefile
+++ b/arch/powerpc/kexec/Makefile
@@ -16,7 +16,8 @@ endif
 endif
 
 
-# Disable GCOV, KCOV & sanitizers in odd or sensitive code
+# Disable GCOV, PGO, KCOV & sanitizers in odd or sensitive code
 GCOV_PROFILE_core_$(BITS).o := n
+PGO_PROFILE_core_$(BITS).o := n
 KCOV_INSTRUMENT_core_$(BITS).o := n
 UBSAN_SANITIZE_core_$(BITS).o := n
diff --git a/arch/powerpc/xmon/Makefile b/arch/powerpc/xmon/Makefile
index eb25d7554ffd1..7aff80d18b44b 100644
--- a/arch/powerpc/xmon/Makefile
+++ b/arch/powerpc/xmon/Makefile
@@ -2,6 +2,7 @@
 # Makefile for xmon
 
 GCOV_PROFILE := n
+PGO_PROFILE := n
 KCOV_INSTRUMENT := n
 UBSAN_SANITIZE := n
 KASAN_SANITIZE := n
diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile
index 0cfd6da784f84..882340dc3c647 100644
--- a/arch/riscv/kernel/vdso/Makefile
+++ b/arch/riscv/kernel/vdso/Makefile
@@ -32,8 +32,9 @@ CPPFLAGS_vdso.lds += -P -C -U$(ARCH)
 # Disable -pg to prevent insert call site
 CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os
 
-# Disable gcov profiling for VDSO code
+# Disable gcov and PGO profiling for VDSO code
 GCOV_PROFILE := n
+PGO_PROFILE := n
 KCOV_INSTRUMENT := n
 
 # Force dependency
diff --git a/arch/s390/boot/Makefile b/arch/s390/boot/Makefile
index 41a64b8dce252..bee4a32040e79 100644
--- a/arch/s390/boot/Makefile
+++ b/arch/s390/boot/Makefile
@@ -5,6 +5,7 @@
 
 KCOV_INSTRUMENT := n
 GCOV_PROFILE := n
+PGO_PROFILE := n
 UBSAN_SANITIZE := n
 KASAN_SANITIZE := n
 
diff --git a/arch/s390/boot/compressed/Makefile b/arch/s390/boot/compressed/Makefile
index de18dab518bb6..c3ab883e8425a 100644
--- a/arch/s390/boot/compressed/Makefile
+++ b/arch/s390/boot/compressed/Makefile
@@ -7,6 +7,7 @@
 
 KCOV_INSTRUMENT := n
 GCOV_PROFILE := n
+PGO_PROFILE := n
 UBSAN_SANITIZE := n
 KASAN_SANITIZE := n
 
diff --git a/arch/s390/kernel/Makefile b/arch/s390/kernel/Makefile
index dd73b7f074237..bd857aacad794 100644
--- a/arch/s390/kernel/Makefile
+++ b/arch/s390/kernel/Makefile
@@ -14,6 +14,7 @@ CFLAGS_REMOVE_early.o		= $(CC_FLAGS_FTRACE)
 endif
 
 GCOV_PROFILE_early.o		:= n
+PGO_PROFILE_early.o		:= n
 KCOV_INSTRUMENT_early.o		:= n
 UBSAN_SANITIZE_early.o		:= n
 KASAN_SANITIZE_ipl.o		:= n
diff --git a/arch/s390/kernel/vdso64/Makefile b/arch/s390/kernel/vdso64/Makefile
index a6e0fb6b91d6c..d7c43b7c1db96 100644
--- a/arch/s390/kernel/vdso64/Makefile
+++ b/arch/s390/kernel/vdso64/Makefile
@@ -35,8 +35,9 @@ obj-y += vdso64_wrapper.o
 targets += vdso64.lds
 CPPFLAGS_vdso64.lds += -P -C -U$(ARCH)
 
-# Disable gcov profiling, ubsan and kasan for VDSO code
+# Disable gcov and PGO profiling, ubsan and kasan for VDSO code
 GCOV_PROFILE := n
+PGO_PROFILE := n
 UBSAN_SANITIZE := n
 KASAN_SANITIZE := n
 
diff --git a/arch/s390/purgatory/Makefile b/arch/s390/purgatory/Makefile
index c57f8c40e9926..9aef584e98466 100644
--- a/arch/s390/purgatory/Makefile
+++ b/arch/s390/purgatory/Makefile
@@ -17,6 +17,7 @@ $(obj)/mem.o: $(srctree)/arch/s390/lib/mem.S FORCE
 
 KCOV_INSTRUMENT := n
 GCOV_PROFILE := n
+PGO_PROFILE := n
 UBSAN_SANITIZE := n
 KASAN_SANITIZE := n
 
diff --git a/arch/sh/boot/compressed/Makefile b/arch/sh/boot/compressed/Makefile
index 589d2d8a573db..ae19aeeb3964c 100644
--- a/arch/sh/boot/compressed/Makefile
+++ b/arch/sh/boot/compressed/Makefile
@@ -13,6 +13,7 @@ targets		:= vmlinux vmlinux.bin vmlinux.bin.gz \
 OBJECTS = $(obj)/head_32.o $(obj)/misc.o $(obj)/cache.o
 
 GCOV_PROFILE := n
+PGO_PROFILE := n
 
 #
 # IMAGE_OFFSET is the load offset of the compression loader
diff --git a/arch/sh/mm/Makefile b/arch/sh/mm/Makefile
index f69ddc70b1465..ea2782c631f43 100644
--- a/arch/sh/mm/Makefile
+++ b/arch/sh/mm/Makefile
@@ -43,3 +43,4 @@ obj-$(CONFIG_UNCACHED_MAPPING)	+= uncached.o
 obj-$(CONFIG_HAVE_SRAM_POOL)	+= sram.o
 
 GCOV_PROFILE_pmb.o := n
+PGO_PROFILE_pmb.o := n
diff --git a/arch/sparc/vdso/Makefile b/arch/sparc/vdso/Makefile
index c5e1545bc5cf9..ab5f3783fe199 100644
--- a/arch/sparc/vdso/Makefile
+++ b/arch/sparc/vdso/Makefile
@@ -115,6 +115,7 @@ quiet_cmd_vdso = VDSO    $@
 
 VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 -Bsymbolic
 GCOV_PROFILE := n
+PGO_PROFILE := n
 
 #
 # Install the unstripped copies of vdso*.so.  If our toolchain supports
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 7b6dd10b162ac..a751b4f8f6645 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -95,6 +95,7 @@ config X86
 	select ARCH_SUPPORTS_DEBUG_PAGEALLOC
 	select ARCH_SUPPORTS_NUMA_BALANCING	if X86_64
 	select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP	if NR_CPUS <= 4096
+	select ARCH_SUPPORTS_PGO_CLANG		if X86_64
 	select ARCH_USE_BUILTIN_BSWAP
 	select ARCH_USE_QUEUED_RWLOCKS
 	select ARCH_USE_QUEUED_SPINLOCKS
diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index fe605205b4ce2..383853e32f673 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -71,6 +71,7 @@ KBUILD_AFLAGS	:= $(KBUILD_CFLAGS) -D__ASSEMBLY__
 KBUILD_CFLAGS	+= $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
 KBUILD_CFLAGS	+= -fno-asynchronous-unwind-tables
 GCOV_PROFILE := n
+PGO_PROFILE := n
 UBSAN_SANITIZE := n
 
 $(obj)/bzImage: asflags-y  := $(SVGA_MODE)
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index e0bc3988c3faa..ed12ab65f6065 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -54,6 +54,7 @@ CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/
 
 KBUILD_AFLAGS  := $(KBUILD_CFLAGS) -D__ASSEMBLY__
 GCOV_PROFILE := n
+PGO_PROFILE := n
 UBSAN_SANITIZE :=n
 
 KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 02e3e42f380bd..26e2b3af0145c 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -179,6 +179,7 @@ quiet_cmd_vdso = VDSO    $@
 VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 \
 	$(call ld-option, --eh-frame-hdr) -Bsymbolic
 GCOV_PROFILE := n
+PGO_PROFILE := n
 
 quiet_cmd_vdso_and_check = VDSO    $@
       cmd_vdso_and_check = $(cmd_vdso); $(cmd_vdso_check)
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index efd9e9ea17f25..f6cab2316c46a 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -184,6 +184,8 @@ SECTIONS
 
 	BUG_TABLE
 
+	PGO_CLANG_DATA
+
 	ORC_UNWIND_TABLE
 
 	. = ALIGN(PAGE_SIZE);
diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile
index 84b09c230cbd5..5f22b31446ad4 100644
--- a/arch/x86/platform/efi/Makefile
+++ b/arch/x86/platform/efi/Makefile
@@ -2,6 +2,7 @@
 OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y
 KASAN_SANITIZE := n
 GCOV_PROFILE := n
+PGO_PROFILE := n
 
 obj-$(CONFIG_EFI) 		+= quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o
 obj-$(CONFIG_EFI_MIXED)		+= efi_thunk_$(BITS).o
diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
index 95ea17a9d20cb..36f20e99da0bc 100644
--- a/arch/x86/purgatory/Makefile
+++ b/arch/x86/purgatory/Makefile
@@ -23,6 +23,7 @@ targets += purgatory.ro purgatory.chk
 
 # Sanitizer, etc. runtimes are unavailable and cannot be linked here.
 GCOV_PROFILE	:= n
+PGO_PROFILE	:= n
 KASAN_SANITIZE	:= n
 UBSAN_SANITIZE	:= n
 KCSAN_SANITIZE	:= n
diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
index 83f1b6a56449f..21797192f958f 100644
--- a/arch/x86/realmode/rm/Makefile
+++ b/arch/x86/realmode/rm/Makefile
@@ -76,4 +76,5 @@ KBUILD_CFLAGS	:= $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
 KBUILD_AFLAGS	:= $(KBUILD_CFLAGS) -D__ASSEMBLY__
 KBUILD_CFLAGS	+= -fno-asynchronous-unwind-tables
 GCOV_PROFILE := n
+PGO_PROFILE := n
 UBSAN_SANITIZE := n
diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile
index 5943387e3f357..54f5768f58530 100644
--- a/arch/x86/um/vdso/Makefile
+++ b/arch/x86/um/vdso/Makefile
@@ -64,6 +64,7 @@ quiet_cmd_vdso = VDSO    $@
 
 VDSO_LDFLAGS = -fPIC -shared -Wl,--hash-style=sysv
 GCOV_PROFILE := n
+PGO_PROFILE := n
 
 #
 # Install the unstripped copy of vdso*.so listed in $(vdso-install-y).
diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index 8a94388e38b33..2d81623b33f29 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -40,6 +40,7 @@ KBUILD_CFLAGS			:= $(cflags-y) -Os -DDISABLE_BRANCH_PROFILING \
 KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))
 
 GCOV_PROFILE			:= n
+PGO_PROFILE			:= n
 # Sanitizer runtimes are unavailable and cannot be linked here.
 KASAN_SANITIZE			:= n
 KCSAN_SANITIZE			:= n
diff --git a/drivers/s390/char/Makefile b/drivers/s390/char/Makefile
index c6fdb81a068a6..bf6c5db5da1fc 100644
--- a/drivers/s390/char/Makefile
+++ b/drivers/s390/char/Makefile
@@ -9,6 +9,7 @@ CFLAGS_REMOVE_sclp_early_core.o	= $(CC_FLAGS_FTRACE)
 endif
 
 GCOV_PROFILE_sclp_early_core.o		:= n
+PGO_PROFILE_sclp_early_core.o		:= n
 KCOV_INSTRUMENT_sclp_early_core.o	:= n
 UBSAN_SANITIZE_sclp_early_core.o	:= n
 KASAN_SANITIZE_sclp_early_core.o	:= n
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index b2b3d81b1535a..3a591bb18c5fb 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -316,6 +316,49 @@
 #define THERMAL_TABLE(name)
 #endif
 
+#ifdef CONFIG_PGO_CLANG
+#define PGO_CLANG_DATA							\
+	__llvm_prf_data : AT(ADDR(__llvm_prf_data) - LOAD_OFFSET) {	\
+		. = ALIGN(8);						\
+		__llvm_prf_start = .;					\
+		__llvm_prf_data_start = .;				\
+		KEEP(*(__llvm_prf_data))				\
+		. = ALIGN(8);						\
+		__llvm_prf_data_end = .;				\
+	}								\
+	__llvm_prf_cnts : AT(ADDR(__llvm_prf_cnts) - LOAD_OFFSET) {	\
+		. = ALIGN(8);						\
+		__llvm_prf_cnts_start = .;				\
+		KEEP(*(__llvm_prf_cnts))				\
+		. = ALIGN(8);						\
+		__llvm_prf_cnts_end = .;				\
+	}								\
+	__llvm_prf_names : AT(ADDR(__llvm_prf_names) - LOAD_OFFSET) {	\
+		. = ALIGN(8);						\
+		__llvm_prf_names_start = .;				\
+		KEEP(*(__llvm_prf_names))				\
+		. = ALIGN(8);						\
+		__llvm_prf_names_end = .;				\
+		. = ALIGN(8);						\
+	}								\
+	__llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) {	\
+		__llvm_prf_vals_start = .;				\
+		KEEP(*(__llvm_prf_vals))				\
+		. = ALIGN(8);						\
+		__llvm_prf_vals_end = .;				\
+		. = ALIGN(8);						\
+	}								\
+	__llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) {	\
+		__llvm_prf_vnds_start = .;				\
+		KEEP(*(__llvm_prf_vnds))				\
+		. = ALIGN(8);						\
+		__llvm_prf_vnds_end = .;				\
+		__llvm_prf_end = .;					\
+	}
+#else
+#define PGO_CLANG_DATA
+#endif
+
 #define KERNEL_DTB()							\
 	STRUCT_ALIGN();							\
 	__dtb_start = .;						\
@@ -1125,6 +1168,7 @@
 		CONSTRUCTORS						\
 	}								\
 	BUG_TABLE							\
+	PGO_CLANG_DATA
 
 #define INIT_TEXT_SECTION(inittext_align)				\
 	. = ALIGN(inittext_align);					\
diff --git a/kernel/Makefile b/kernel/Makefile
index aa7368c7eabf3..0b34ca228ba46 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -111,6 +111,7 @@ obj-$(CONFIG_BPF) += bpf/
 obj-$(CONFIG_KCSAN) += kcsan/
 obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
 obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o
+obj-$(CONFIG_PGO_CLANG) += pgo/
 
 obj-$(CONFIG_PERF_EVENTS) += events/
 
diff --git a/kernel/pgo/Kconfig b/kernel/pgo/Kconfig
new file mode 100644
index 0000000000000..318d36bb3d106
--- /dev/null
+++ b/kernel/pgo/Kconfig
@@ -0,0 +1,34 @@
+# SPDX-License-Identifier: GPL-2.0-only
+menu "Profile Guided Optimization (PGO) (EXPERIMENTAL)"
+
+config ARCH_SUPPORTS_PGO_CLANG
+	bool
+
+config PGO_CLANG
+	bool "Enable clang's PGO-based kernel profiling"
+	depends on DEBUG_FS
+	depends on ARCH_SUPPORTS_PGO_CLANG
+	help
+	  This option enables clang's PGO (Profile Guided Optimization) based
+	  code profiling to better optimize the kernel.
+
+	  If unsure, say N.
+
+	  Run a representative workload for your application on a kernel
+	  compiled with this option and download the raw profile file from
+	  /sys/kernel/debug/pgo/profraw. This file needs to be processed with
+	  llvm-profdata. It may be merged with other collected raw profiles.
+
+	  Copy the resulting profile file into vmlinux.profdata, and enable
+	  KCFLAGS=-fprofile-use=vmlinux.profdata to produce an optimized
+	  kernel.
+
+	  Note that a kernel compiled with profiling flags will be
+	  significatnly larger and run slower. Also be sure to exclude files
+	  from profiling which are not linked to the kernel image to prevent
+	  linker errors.
+
+	  Note that the debugfs filesystem has to be mounted to access
+	  profiling data.
+
+endmenu
diff --git a/kernel/pgo/Makefile b/kernel/pgo/Makefile
new file mode 100644
index 0000000000000..41e27cefd9a47
--- /dev/null
+++ b/kernel/pgo/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+GCOV_PROFILE	:= n
+PGO_PROFILE	:= n
+
+obj-y	+= fs.o instrument.o
diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c
new file mode 100644
index 0000000000000..790a8df037bfc
--- /dev/null
+++ b/kernel/pgo/fs.c
@@ -0,0 +1,382 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Google, Inc.
+ *
+ * Author:
+ *	Sami Tolvanen <samitolvanen@google.com>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#define pr_fmt(fmt)	"pgo: " fmt
+
+#include <linux/kernel.h>
+#include <linux/debugfs.h>
+#include <linux/fs.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/vmalloc.h>
+#include "pgo.h"
+
+static struct dentry *directory;
+
+struct prf_private_data {
+	void *buffer;
+	unsigned long size;
+};
+
+/*
+ * Raw profile data format:
+ *
+ *	- llvm_prf_header
+ *	- __llvm_prf_data
+ *	- __llvm_prf_cnts
+ *	- __llvm_prf_names
+ *	- zero padding to 8 bytes
+ *	- for each llvm_prf_data in __llvm_prf_data:
+ *		- llvm_prf_value_data
+ *			- llvm_prf_value_record + site count array
+ *				- llvm_prf_value_node_data
+ *				...
+ *			...
+ *		...
+ */
+
+static void prf_fill_header(void **buffer)
+{
+	struct llvm_prf_header *header = *(struct llvm_prf_header **)buffer;
+
+	header->magic = LLVM_PRF_MAGIC;
+	header->version = LLVM_PRF_VARIANT_MASK_IR | LLVM_PRF_VERSION;
+	header->data_size = prf_data_count();
+	header->padding_bytes_before_counters = 0;
+	header->counters_size = prf_cnts_count();
+	header->padding_bytes_after_counters = 0;
+	header->names_size = prf_names_count();
+	header->counters_delta = (u64)__llvm_prf_cnts_start;
+	header->names_delta = (u64)__llvm_prf_names_start;
+	header->value_kind_last = LLVM_PRF_IPVK_LAST;
+
+	*buffer += sizeof(*header);
+}
+
+/*
+ * Copy the source into the buffer, incrementing the pointer into buffer in the
+ * process.
+ */
+static void prf_copy_to_buffer(void **buffer, void *src, unsigned long size)
+{
+	memcpy(*buffer, src, size);
+	*buffer += size;
+}
+
+static u32 __prf_get_value_size(struct llvm_prf_data *p, u32 *value_kinds)
+{
+	struct llvm_prf_value_node **nodes =
+		(struct llvm_prf_value_node **)p->values;
+	u32 kinds = 0;
+	u32 size = 0;
+	unsigned int kind;
+	unsigned int n;
+	unsigned int s = 0;
+
+	for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
+		unsigned int sites = p->num_value_sites[kind];
+
+		if (!sites)
+			continue;
+
+		/* Record + site count array */
+		size += prf_get_value_record_size(sites);
+		kinds++;
+
+		if (!nodes)
+			continue;
+
+		for (n = 0; n < sites; n++) {
+			u32 count = 0;
+			struct llvm_prf_value_node *site = nodes[s + n];
+
+			while (site && ++count <= U8_MAX)
+				site = site->next;
+
+			size += count *
+				sizeof(struct llvm_prf_value_node_data);
+		}
+
+		s += sites;
+	}
+
+	if (size)
+		size += sizeof(struct llvm_prf_value_data);
+
+	if (value_kinds)
+		*value_kinds = kinds;
+
+	return size;
+}
+
+static u32 prf_get_value_size(void)
+{
+	u32 size = 0;
+	struct llvm_prf_data *p;
+
+	for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
+		size += __prf_get_value_size(p, NULL);
+
+	return size;
+}
+
+/* Serialize the profiling's value. */
+static void prf_serialize_value(struct llvm_prf_data *p, void **buffer)
+{
+	struct llvm_prf_value_data header;
+	struct llvm_prf_value_node **nodes =
+		(struct llvm_prf_value_node **)p->values;
+	unsigned int kind;
+	unsigned int n;
+	unsigned int s = 0;
+
+	header.total_size = __prf_get_value_size(p, &header.num_value_kinds);
+
+	if (!header.num_value_kinds)
+		/* Nothing to write. */
+		return;
+
+	prf_copy_to_buffer(buffer, &header, sizeof(header));
+
+	for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
+		struct llvm_prf_value_record *record;
+		u8 *counts;
+		unsigned int sites = p->num_value_sites[kind];
+
+		if (!sites)
+			continue;
+
+		/* Profiling value record. */
+		record = *(struct llvm_prf_value_record **)buffer;
+		*buffer += prf_get_value_record_header_size();
+
+		record->kind = kind;
+		record->num_value_sites = sites;
+
+		/* Site count array. */
+		counts = *(u8 **)buffer;
+		*buffer += prf_get_value_record_site_count_size(sites);
+
+		/*
+		 * If we don't have nodes, we can skip updating the site count
+		 * array, because the buffer is zero filled.
+		 */
+		if (!nodes)
+			continue;
+
+		for (n = 0; n < sites; n++) {
+			u32 count = 0;
+			struct llvm_prf_value_node *site = nodes[s + n];
+
+			while (site && ++count <= U8_MAX) {
+				prf_copy_to_buffer(buffer, site,
+						   sizeof(struct llvm_prf_value_node_data));
+				site = site->next;
+			}
+
+			counts[n] = (u8)count;
+		}
+
+		s += sites;
+	}
+}
+
+static void prf_serialize_values(void **buffer)
+{
+	struct llvm_prf_data *p;
+
+	for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
+		prf_serialize_value(p, buffer);
+}
+
+static inline unsigned long prf_get_padding(unsigned long size)
+{
+	return 8 - (size % 8);
+}
+
+static unsigned long prf_buffer_size(void)
+{
+	return sizeof(struct llvm_prf_header) +
+			prf_data_size()	+
+			prf_cnts_size() +
+			prf_names_size() +
+			prf_get_padding(prf_names_size()) +
+			prf_get_value_size();
+}
+
+/* Serialize the profling data into a format LLVM's tools can understand. */
+static int prf_serialize(struct prf_private_data *p)
+{
+	int err = 0;
+	void *buffer;
+
+	p->size = prf_buffer_size();
+	p->buffer = vzalloc(p->size);
+
+	if (!p->buffer) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	buffer = p->buffer;
+
+	prf_fill_header(&buffer);
+	prf_copy_to_buffer(&buffer, __llvm_prf_data_start,  prf_data_size());
+	prf_copy_to_buffer(&buffer, __llvm_prf_cnts_start,  prf_cnts_size());
+	prf_copy_to_buffer(&buffer, __llvm_prf_names_start, prf_names_size());
+	buffer += prf_get_padding(prf_names_size());
+
+	prf_serialize_values(&buffer);
+
+out:
+	return err;
+}
+
+/* open() implementation for PGO. Creates a copy of the profiling data set. */
+static int prf_open(struct inode *inode, struct file *file)
+{
+	struct prf_private_data *data;
+	unsigned long flags;
+	int err;
+
+	data = kzalloc(sizeof(*data), GFP_KERNEL);
+	if (!data) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	flags = prf_lock();
+
+	err = prf_serialize(data);
+	if (err) {
+		kfree(data);
+		goto out_unlock;
+	}
+
+	file->private_data = data;
+
+out_unlock:
+	prf_unlock(flags);
+out:
+	return err;
+}
+
+/* read() implementation for PGO. */
+static ssize_t prf_read(struct file *file, char __user *buf, size_t count,
+			loff_t *ppos)
+{
+	struct prf_private_data *data = file->private_data;
+
+	BUG_ON(!data);
+
+	return simple_read_from_buffer(buf, count, ppos, data->buffer,
+				       data->size);
+}
+
+/* release() implementation for PGO. Release resources allocated by open(). */
+static int prf_release(struct inode *inode, struct file *file)
+{
+	struct prf_private_data *data = file->private_data;
+
+	if (data) {
+		vfree(data->buffer);
+		kfree(data);
+	}
+
+	return 0;
+}
+
+static const struct file_operations prf_fops = {
+	.owner		= THIS_MODULE,
+	.open		= prf_open,
+	.read		= prf_read,
+	.llseek		= default_llseek,
+	.release	= prf_release
+};
+
+/* write() implementation for resetting PGO's profile data. */
+static ssize_t reset_write(struct file *file, const char __user *addr,
+			   size_t len, loff_t *pos)
+{
+	struct llvm_prf_data *data;
+
+	memset(__llvm_prf_cnts_start, 0, prf_cnts_size());
+
+	for (data = __llvm_prf_data_start; data < __llvm_prf_data_end; ++data) {
+		struct llvm_prf_value_node **vnodes;
+		u64 current_vsite_count;
+		u32 i;
+
+		if (!data->values)
+			continue;
+
+		current_vsite_count = 0;
+		vnodes = (struct llvm_prf_value_node **)data->values;
+
+		for (i = LLVM_PRF_IPVK_FIRST; i <= LLVM_PRF_IPVK_LAST; ++i)
+			current_vsite_count += data->num_value_sites[i];
+
+		for (i = 0; i < current_vsite_count; ++i) {
+			struct llvm_prf_value_node *current_vnode = vnodes[i];
+
+			while (current_vnode) {
+				current_vnode->count = 0;
+				current_vnode = current_vnode->next;
+			}
+		}
+	}
+
+	return len;
+}
+
+static const struct file_operations prf_reset_fops = {
+	.owner		= THIS_MODULE,
+	.write		= reset_write,
+	.llseek		= noop_llseek,
+};
+
+/* Create debugfs entries. */
+static int __init pgo_init(void)
+{
+	directory = debugfs_create_dir("pgo", NULL);
+	if (!directory)
+		goto err_remove;
+
+	if (!debugfs_create_file("profraw", 0600, directory, NULL,
+				 &prf_fops))
+		goto err_remove;
+
+	if (!debugfs_create_file("reset", 0200, directory, NULL,
+				 &prf_reset_fops))
+		goto err_remove;
+
+	return 0;
+
+err_remove:
+	pr_err("initialization failed\n");
+	return -EIO;
+}
+
+/* Remove debufs entries. */
+static void __exit pgo_exit(void)
+{
+	debugfs_remove_recursive(directory);
+}
+
+module_init(pgo_init);
+module_exit(pgo_exit);
diff --git a/kernel/pgo/instrument.c b/kernel/pgo/instrument.c
new file mode 100644
index 0000000000000..d96b61a1cf712
--- /dev/null
+++ b/kernel/pgo/instrument.c
@@ -0,0 +1,147 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Google, Inc.
+ *
+ * Author:
+ *	Sami Tolvanen <samitolvanen@google.com>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#define pr_fmt(fmt)	"pgo: " fmt
+
+#include <linux/kernel.h>
+#include <linux/export.h>
+#include <linux/spinlock.h>
+#include <linux/types.h>
+#include "pgo.h"
+
+/* Lock guarding value node access and serialization. */
+static DEFINE_SPINLOCK(pgo_lock);
+static int current_node;
+
+unsigned long prf_lock(void)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&pgo_lock, flags);
+
+	return flags;
+}
+
+void prf_unlock(unsigned long flags)
+{
+	spin_unlock_irqrestore(&pgo_lock, flags);
+}
+
+/*
+ * Return a newly allocated profiling value node which contains the tracked
+ * value by the value profiler.
+ * Note: caller *must* hold pgo_lock.
+ */
+static struct llvm_prf_value_node *allocate_node(struct llvm_prf_data *p,
+						 u32 index, u64 value)
+{
+	if (&__llvm_prf_vnds_start[current_node + 1] >= __llvm_prf_vnds_end)
+		return NULL; /* Out of nodes */
+
+	current_node++;
+
+	/* Make sure the node is entirely within the section */
+	if (&__llvm_prf_vnds_start[current_node] >= __llvm_prf_vnds_end ||
+	    &__llvm_prf_vnds_start[current_node + 1] > __llvm_prf_vnds_end)
+		return NULL;
+
+	return &__llvm_prf_vnds_start[current_node];
+}
+
+/*
+ * Counts the number of times a target value is seen.
+ *
+ * Records the target value for the CounterIndex if not seen before. Otherwise,
+ * increments the counter associated w/ the target value.
+ */
+void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index)
+{
+	struct llvm_prf_data *p = (struct llvm_prf_data *)data;
+	struct llvm_prf_value_node **counters;
+	struct llvm_prf_value_node *curr;
+	struct llvm_prf_value_node *min = NULL;
+	struct llvm_prf_value_node *prev = NULL;
+	u64 min_count = U64_MAX;
+	u8 values = 0;
+	unsigned long flags;
+
+	if (!p || !p->values)
+		return;
+
+	counters = (struct llvm_prf_value_node **)p->values;
+	curr = counters[index];
+
+	while (curr) {
+		if (target_value == curr->value) {
+			curr->count++;
+			return;
+		}
+
+		if (curr->count < min_count) {
+			min_count = curr->count;
+			min = curr;
+		}
+
+		prev = curr;
+		curr = curr->next;
+		values++;
+	}
+
+	if (values >= LLVM_PRF_MAX_NUM_VALS_PER_SITE) {
+		if (!min->count || !(--min->count)) {
+			curr = min;
+			curr->value = target_value;
+			curr->count++;
+		}
+		return;
+	}
+
+	/* Lock when updating the value node structure. */
+	flags = prf_lock();
+
+	curr = allocate_node(p, index, target_value);
+	if (!curr)
+		goto out;
+
+	curr->value = target_value;
+	curr->count++;
+
+	if (!counters[index])
+		counters[index] = curr;
+	else if (prev && !prev->next)
+		prev->next = curr;
+
+out:
+	prf_unlock(flags);
+}
+EXPORT_SYMBOL(__llvm_profile_instrument_target);
+
+/* Counts the number of times a range of targets values are seen. */
+void __llvm_profile_instrument_range(u64 target_value, void *data,
+				     u32 index, s64 precise_start,
+				     s64 precise_last, s64 large_value)
+{
+	if (large_value != S64_MIN && (s64)target_value >= large_value)
+		target_value = large_value;
+	else if ((s64)target_value < precise_start ||
+		 (s64)target_value > precise_last)
+		target_value = precise_last + 1;
+
+	__llvm_profile_instrument_target(target_value, data, index);
+}
+EXPORT_SYMBOL(__llvm_profile_instrument_range);
diff --git a/kernel/pgo/pgo.h b/kernel/pgo/pgo.h
new file mode 100644
index 0000000000000..df0aa278f28bd
--- /dev/null
+++ b/kernel/pgo/pgo.h
@@ -0,0 +1,206 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2019 Google, Inc.
+ *
+ * Author:
+ *	Sami Tolvanen <samitolvanen@google.com>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _PGO_H
+#define _PGO_H
+
+/*
+ * Note: These internal LLVM definitions must match the compiler version.
+ * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code.
+ */
+
+#ifdef CONFIG_64BIT
+	#define LLVM_PRF_MAGIC		\
+		((u64)255 << 56 |	\
+		 (u64)'l' << 48 |	\
+		 (u64)'p' << 40 |	\
+		 (u64)'r' << 32 |	\
+		 (u64)'o' << 24 |	\
+		 (u64)'f' << 16 |	\
+		 (u64)'r' << 8  |	\
+		 (u64)129)
+#else
+	#define LLVM_PRF_MAGIC		\
+		((u64)255 << 56 |	\
+		 (u64)'l' << 48 |	\
+		 (u64)'p' << 40 |	\
+		 (u64)'r' << 32 |	\
+		 (u64)'o' << 24 |	\
+		 (u64)'f' << 16 |	\
+		 (u64)'R' << 8  |	\
+		 (u64)129)
+#endif
+
+#define LLVM_PRF_VERSION		5
+#define LLVM_PRF_DATA_ALIGN		8
+#define LLVM_PRF_IPVK_FIRST		0
+#define LLVM_PRF_IPVK_LAST		1
+#define LLVM_PRF_MAX_NUM_VALS_PER_SITE	16
+
+#define LLVM_PRF_VARIANT_MASK_IR	(0x1ull << 56)
+#define LLVM_PRF_VARIANT_MASK_CSIR	(0x1ull << 57)
+
+/**
+ * struct llvm_prf_header - represents the raw profile header data structure.
+ * @magic: the magic token for the file format.
+ * @version: the version of the file format.
+ * @data_size: the number of entries in the profile data section.
+ * @padding_bytes_before_counters: the number of padding bytes before the
+ *   counters.
+ * @counters_size: the size in bytes of the LLVM profile section containing the
+ *   counters.
+ * @padding_bytes_after_counters: the number of padding bytes after the
+ *   counters.
+ * @names_size: the size in bytes of the LLVM profile section containing the
+ *   counters' names.
+ * @counters_delta: the beginning of the LLMV profile counters section.
+ * @names_delta: the beginning of the LLMV profile names section.
+ * @value_kind_last: the last profile value kind.
+ */
+struct llvm_prf_header {
+	u64 magic;
+	u64 version;
+	u64 data_size;
+	u64 padding_bytes_before_counters;
+	u64 counters_size;
+	u64 padding_bytes_after_counters;
+	u64 names_size;
+	u64 counters_delta;
+	u64 names_delta;
+	u64 value_kind_last;
+};
+
+/**
+ * struct llvm_prf_data - represents the per-function control structure.
+ * @name_ref: the reference to the function's name.
+ * @func_hash: the hash value of the function.
+ * @counter_ptr: a pointer to the profile counter.
+ * @function_ptr: a pointer to the function.
+ * @values: the profiling values associated with this function.
+ * @num_counters: the number of counters in the function.
+ * @num_value_sites: the number of value profile sites.
+ */
+struct llvm_prf_data {
+	const u64 name_ref;
+	const u64 func_hash;
+	const void *counter_ptr;
+	const void *function_ptr;
+	void *values;
+	const u32 num_counters;
+	const u16 num_value_sites[LLVM_PRF_IPVK_LAST + 1];
+} __aligned(LLVM_PRF_DATA_ALIGN);
+
+/**
+ * structure llvm_prf_value_node_data - represents the data part of the struct
+ *   llvm_prf_value_node data structure.
+ * @value: the value counters.
+ * @count: the counters' count.
+ */
+struct llvm_prf_value_node_data {
+	u64 value;
+	u64 count;
+};
+
+/**
+ * struct llvm_prf_value_node - represents an internal data structure used by
+ *   the value profiler.
+ * @value: the value counters.
+ * @count: the counters' count.
+ * @next: the next value node.
+ */
+struct llvm_prf_value_node {
+	u64 value;
+	u64 count;
+	struct llvm_prf_value_node *next;
+};
+
+/**
+ * struct llvm_prf_value_data - represents the value profiling data in indexed
+ *   format.
+ * @total_size: the total size in bytes including this field.
+ * @num_value_kinds: the number of value profile kinds that has value profile
+ *   data.
+ */
+struct llvm_prf_value_data {
+	u32 total_size;
+	u32 num_value_kinds;
+};
+
+/**
+ * struct llvm_prf_value_record - represents the on-disk layout of the value
+ *   profile data of a particular kind for one function.
+ * @kind: the kind of the value profile record.
+ * @num_value_sites: the number of value profile sites.
+ * @site_count_array: the first element of the array that stores the number
+ *   of profiled values for each value site.
+ */
+struct llvm_prf_value_record {
+	u32 kind;
+	u32 num_value_sites;
+	u8 site_count_array[];
+};
+
+#define prf_get_value_record_header_size()		\
+	offsetof(struct llvm_prf_value_record, site_count_array)
+#define prf_get_value_record_site_count_size(sites)	\
+	roundup((sites), 8)
+#define prf_get_value_record_size(sites)		\
+	(prf_get_value_record_header_size() +		\
+	 prf_get_value_record_site_count_size((sites)))
+
+/* Data sections */
+extern struct llvm_prf_data __llvm_prf_data_start[];
+extern struct llvm_prf_data __llvm_prf_data_end[];
+
+extern u64 __llvm_prf_cnts_start[];
+extern u64 __llvm_prf_cnts_end[];
+
+extern char __llvm_prf_names_start[];
+extern char __llvm_prf_names_end[];
+
+extern struct llvm_prf_value_node __llvm_prf_vnds_start[];
+extern struct llvm_prf_value_node __llvm_prf_vnds_end[];
+
+/* Locking for vnodes */
+extern unsigned long prf_lock(void);
+extern void prf_unlock(unsigned long flags);
+
+#define __DEFINE_PRF_SIZE(s) \
+	static inline unsigned long prf_ ## s ## _size(void)		\
+	{								\
+		unsigned long start =					\
+			(unsigned long)__llvm_prf_ ## s ## _start;	\
+		unsigned long end =					\
+			(unsigned long)__llvm_prf_ ## s ## _end;	\
+		return roundup(end - start,				\
+				sizeof(__llvm_prf_ ## s ## _start[0]));	\
+	}								\
+	static inline unsigned long prf_ ## s ## _count(void)		\
+	{								\
+		return prf_ ## s ## _size() /				\
+			sizeof(__llvm_prf_ ## s ## _start[0]);		\
+	}
+
+__DEFINE_PRF_SIZE(data);
+__DEFINE_PRF_SIZE(cnts);
+__DEFINE_PRF_SIZE(names);
+__DEFINE_PRF_SIZE(vnds);
+
+#undef __DEFINE_PRF_SIZE
+
+#endif /* _PGO_H */
diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index 213677a5ed33e..9b218afb5cb87 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -143,6 +143,16 @@ _c_flags += $(if $(patsubst n%,, \
 		$(CFLAGS_GCOV))
 endif
 
+#
+# Enable clang's PGO profiling flags for a file or directory depending on
+# variables PGO_PROFILE_obj.o and PGO_PROFILE.
+#
+ifeq ($(CONFIG_PGO_CLANG),y)
+_c_flags += $(if $(patsubst n%,, \
+		$(PGO_PROFILE_$(basetarget).o)$(PGO_PROFILE)y), \
+		$(CFLAGS_PGO_CLANG))
+endif
+
 #
 # Enable address sanitizer flags for kernel except some files or directories
 # we don't want to check (depends on variables KASAN_SANITIZE_obj.o, KASAN_SANITIZE)
-- 
2.30.0.284.gd98b1dd5eaa7-goog


^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH] pgo: add clang's Profile Guided Optimization infrastructure
  2021-01-11  8:18 [PATCH] pgo: add clang's Profile Guided Optimization infrastructure Bill Wendling
@ 2021-01-11  8:39 ` Sedat Dilek
  2021-01-11  8:42   ` Sedat Dilek
  2021-01-11  9:17   ` Bill Wendling
  2021-01-11 20:12 ` Fangrui Song
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 116+ messages in thread
From: Sedat Dilek @ 2021-01-11  8:39 UTC (permalink / raw)
  To: Bill Wendling
  Cc: Jonathan Corbet, Masahiro Yamada, linux-doc, linux-kernel,
	linux-kbuild, Clang-Built-Linux ML, Andrew Morton,
	Nathan Chancellor, Nick Desaulniers, Sami Tolvanen

On Mon, Jan 11, 2021 at 9:18 AM 'Bill Wendling' via Clang Built Linux
<clang-built-linux@googlegroups.com> wrote:
>
> From: Sami Tolvanen <samitolvanen@google.com>
>
> Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> profile, the kernel is instrumented with PGO counters, a representative
> workload is run, and the raw profile data is collected from
> /sys/kernel/debug/pgo/profraw.
>
> The raw profile data must be processed by clang's "llvm-profdata" tool before
> it can be used during recompilation:
>
>   $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
>   $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
>
> Multiple raw profiles may be merged during this step.
>
> The data can be used either by the compiler if LTO isn't enabled:
>
>     ... -fprofile-use=vmlinux.profdata ...
>
> or by LLD if LTO is enabled:
>
>     ... -lto-cs-profile-file=vmlinux.profdata ...
>
> This initial submission is restricted to x86, as that's the platform we know
> works. This restriction can be lifted once other platforms have been verified
> to work with PGO.
>
> Note that this method of profiling the kernel is clang-native and isn't
> compatible with clang's gcov support in kernel/gcov.
>
> [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
>

Hi Bill and Sami,

I have seen the pull-request in the CBL issue tracker and had some
questions in mind.

Good you send this.

First of all, I like to fetch any development stuff easily from a Git
repository.
Can you offer this, please?
What is the base for your work?
I hope this is (fresh released) Linux v5.11-rc3.

I myself had some experiences with a PGO + ThinLTO optimized LLVM
toolchain built with the help of tc-build.
Here it takes very long to build it.

This means I have some profile-data archived.
Can I use it?

Is an own PGO + ThinLTO optimized LLVM toolchain pre-requirement for
this or not?
That is one of my important questions.

Thanks for your work.

Regards,
- Sedat -

> Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
> Co-developed-by: Bill Wendling <morbo@google.com>
> Signed-off-by: Bill Wendling <morbo@google.com>
> ---
>  Documentation/dev-tools/index.rst     |   1 +
>  Documentation/dev-tools/pgo.rst       | 127 +++++++++
>  MAINTAINERS                           |   9 +
>  Makefile                              |   3 +
>  arch/Kconfig                          |   1 +
>  arch/arm/boot/bootp/Makefile          |   1 +
>  arch/arm/boot/compressed/Makefile     |   1 +
>  arch/arm/vdso/Makefile                |   3 +-
>  arch/arm64/kernel/vdso/Makefile       |   3 +-
>  arch/arm64/kvm/hyp/nvhe/Makefile      |   1 +
>  arch/mips/boot/compressed/Makefile    |   1 +
>  arch/mips/vdso/Makefile               |   1 +
>  arch/nds32/kernel/vdso/Makefile       |   4 +-
>  arch/parisc/boot/compressed/Makefile  |   1 +
>  arch/powerpc/kernel/Makefile          |   6 +-
>  arch/powerpc/kernel/trace/Makefile    |   3 +-
>  arch/powerpc/kernel/vdso32/Makefile   |   1 +
>  arch/powerpc/kernel/vdso64/Makefile   |   1 +
>  arch/powerpc/kexec/Makefile           |   3 +-
>  arch/powerpc/xmon/Makefile            |   1 +
>  arch/riscv/kernel/vdso/Makefile       |   3 +-
>  arch/s390/boot/Makefile               |   1 +
>  arch/s390/boot/compressed/Makefile    |   1 +
>  arch/s390/kernel/Makefile             |   1 +
>  arch/s390/kernel/vdso64/Makefile      |   3 +-
>  arch/s390/purgatory/Makefile          |   1 +
>  arch/sh/boot/compressed/Makefile      |   1 +
>  arch/sh/mm/Makefile                   |   1 +
>  arch/sparc/vdso/Makefile              |   1 +
>  arch/x86/Kconfig                      |   1 +
>  arch/x86/boot/Makefile                |   1 +
>  arch/x86/boot/compressed/Makefile     |   1 +
>  arch/x86/entry/vdso/Makefile          |   1 +
>  arch/x86/kernel/vmlinux.lds.S         |   2 +
>  arch/x86/platform/efi/Makefile        |   1 +
>  arch/x86/purgatory/Makefile           |   1 +
>  arch/x86/realmode/rm/Makefile         |   1 +
>  arch/x86/um/vdso/Makefile             |   1 +
>  drivers/firmware/efi/libstub/Makefile |   1 +
>  drivers/s390/char/Makefile            |   1 +
>  include/asm-generic/vmlinux.lds.h     |  44 +++
>  kernel/Makefile                       |   1 +
>  kernel/pgo/Kconfig                    |  34 +++
>  kernel/pgo/Makefile                   |   5 +
>  kernel/pgo/fs.c                       | 382 ++++++++++++++++++++++++++
>  kernel/pgo/instrument.c               | 147 ++++++++++
>  kernel/pgo/pgo.h                      | 206 ++++++++++++++
>  scripts/Makefile.lib                  |  10 +
>  48 files changed, 1017 insertions(+), 9 deletions(-)
>  create mode 100644 Documentation/dev-tools/pgo.rst
>  create mode 100644 kernel/pgo/Kconfig
>  create mode 100644 kernel/pgo/Makefile
>  create mode 100644 kernel/pgo/fs.c
>  create mode 100644 kernel/pgo/instrument.c
>  create mode 100644 kernel/pgo/pgo.h
>
> diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> index f7809c7b1ba9e..8d6418e858062 100644
> --- a/Documentation/dev-tools/index.rst
> +++ b/Documentation/dev-tools/index.rst
> @@ -26,6 +26,7 @@ whole; patches welcome!
>     kgdb
>     kselftest
>     kunit/index
> +   pgo
>
>
>  .. only::  subproject and html
> diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
> new file mode 100644
> index 0000000000000..2ed7f549b20ef
> --- /dev/null
> +++ b/Documentation/dev-tools/pgo.rst
> @@ -0,0 +1,127 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +===============================
> +Using PGO with the Linux kernel
> +===============================
> +
> +Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
> +when building with Clang. The profiling data is exported via the ``pgo``
> +debugfs directory.
> +
> +.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> +
> +
> +Preparation
> +===========
> +
> +Configure the kernel with:
> +
> +.. code-block:: make
> +
> +   CONFIG_DEBUG_FS=y
> +   CONFIG_PGO_CLANG=y
> +
> +Note that kernels compiled with profiling flags will be significantly larger
> +and run slower.
> +
> +Profiling data will only become accessible once debugfs has been mounted:
> +
> +.. code-block:: sh
> +
> +   mount -t debugfs none /sys/kernel/debug
> +
> +
> +Customization
> +=============
> +
> +You can enable or disable profiling for individual file and directories by
> +adding a line similar to the following to the respective kernel Makefile:
> +
> +- For a single file (e.g. main.o)
> +
> +  .. code-block:: make
> +
> +     PGO_PROFILE_main.o := y
> +
> +- For all files in one directory
> +
> +  .. code-block:: make
> +
> +     PGO_PROFILE := y
> +
> +To exclude files from being profiled use
> +
> +  .. code-block:: make
> +
> +     PGO_PROFILE_main.o := n
> +
> +and
> +
> +  .. code-block:: make
> +
> +     PGO_PROFILE := n
> +
> +Only files which are linked to the main kernel image or are compiled as kernel
> +modules are supported by this mechanism.
> +
> +
> +Files
> +=====
> +
> +The PGO kernel support creates the following files in debugfs:
> +
> +``/sys/kernel/debug/pgo``
> +       Parent directory for all PGO-related files.
> +
> +``/sys/kernel/debug/pgo/reset``
> +       Global reset file: resets all coverage data to zero when written to.
> +
> +``/sys/kernel/debug/profraw``
> +       The raw PGO data that must be processed with ``llvm_profdata``.
> +
> +
> +Workflow
> +========
> +
> +The PGO kernel can be run on the host or test machines. The data though should
> +be analyzed with Clang's tools from the same Clang version as the kernel was
> +compiled. Clang's tolerant of version skew, but it's easier to use the same
> +Clang version.
> +
> +The profiling data is useful for optimizing the kernel, analyzing coverage,
> +etc. Clang offers tools to perform these tasks.
> +
> +Here is an example workflow for profiling an instrumented kernel with PGO and
> +using the result to optimize the kernel:
> +
> +1) Install the kernel on the TEST machine.
> +
> +2) Reset the data counters right before running the load tests
> +
> +   .. code-block:: sh
> +
> +      echo 1 > /sys/kernel/debug/pgo/reset
> +
> +3) Run the load tests.
> +
> +4) Collect the raw profile data
> +
> +   .. code-block:: sh
> +
> +      cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> +
> +5) (Optional) Download the raw profile data to the HOST machine.
> +
> +6) Process the raw profile data
> +
> +   .. code-block:: sh
> +
> +      llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> +
> +   Note that multiple raw profile data files can be merged during this step.
> +
> +7) Rebuild the kernel using the profile data (PGO disabled)
> +
> +   .. code-block:: sh
> +
> +      make ... KCLAGS=-fprofile-use=vmlinux.profdata
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 6390491b07e51..7a98bdaab9861 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -13955,6 +13955,15 @@ S:     Maintained
>  F:     include/linux/personality.h
>  F:     include/uapi/linux/personality.h
>
> +PGO BASED KERNEL PROFILING
> +M:     Sami Tolvanen <samitolvanen@google.com>
> +M:     Bill Wendling <wcw@google.com>
> +R:     Nathan Chancellor <natechancellor@gmail.com>
> +R:     Nick Desaulniers <ndesaulniers@google.com>
> +S:     Supported
> +F:     Documentation/dev-tools/pgo.rst
> +F:     kernel/pgo
> +
>  PHOENIX RC FLIGHT CONTROLLER ADAPTER
>  M:     Marcus Folkesson <marcus.folkesson@gmail.com>
>  L:     linux-input@vger.kernel.org
> diff --git a/Makefile b/Makefile
> index 8b2c3f88ee5ea..4f42957c78134 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -659,6 +659,9 @@ endif # KBUILD_EXTMOD
>  # Defaults to vmlinux, but the arch makefile usually adds further targets
>  all: vmlinux
>
> +CFLAGS_PGO_CLANG := -fprofile-generate
> +export CFLAGS_PGO_CLANG
> +
>  CFLAGS_GCOV    := -fprofile-arcs -ftest-coverage \
>         $(call cc-option,-fno-tree-loop-im) \
>         $(call cc-disable-warning,maybe-uninitialized,)
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 78c6f05b10f91..a7a6ab7d204dc 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -1106,6 +1106,7 @@ config ARCH_SUPPORTS_DEBUG_PAGEALLOC
>         bool
>
>  source "kernel/gcov/Kconfig"
> +source "kernel/pgo/Kconfig"
>
>  source "scripts/gcc-plugins/Kconfig"
>
> diff --git a/arch/arm/boot/bootp/Makefile b/arch/arm/boot/bootp/Makefile
> index 981a8d03f064c..523bd58df0a4b 100644
> --- a/arch/arm/boot/bootp/Makefile
> +++ b/arch/arm/boot/bootp/Makefile
> @@ -7,6 +7,7 @@
>  #
>
>  GCOV_PROFILE   := n
> +PGO_PROFILE    := n
>
>  LDFLAGS_bootp  := --no-undefined -X \
>                  --defsym initrd_phys=$(INITRD_PHYS) \
> diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
> index fb521efcc6c20..5fd0fd85fc0e5 100644
> --- a/arch/arm/boot/compressed/Makefile
> +++ b/arch/arm/boot/compressed/Makefile
> @@ -24,6 +24,7 @@ OBJS          += hyp-stub.o
>  endif
>
>  GCOV_PROFILE           := n
> +PGO_PROFILE            := n
>  KASAN_SANITIZE         := n
>
>  # Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
> diff --git a/arch/arm/vdso/Makefile b/arch/arm/vdso/Makefile
> index b558bee0e1f6b..11f6ce4b48b56 100644
> --- a/arch/arm/vdso/Makefile
> +++ b/arch/arm/vdso/Makefile
> @@ -36,8 +36,9 @@ else
>  CFLAGS_vgettimeofday.o = -O2 -include $(c-gettimeofday-y)
>  endif
>
> -# Disable gcov profiling for VDSO code
> +# Disable gcov and PGO profiling for VDSO code
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>
>  # Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
>  KCOV_INSTRUMENT := n
> diff --git a/arch/arm64/kernel/vdso/Makefile b/arch/arm64/kernel/vdso/Makefile
> index cd9c3fa25902f..d48fc0df07020 100644
> --- a/arch/arm64/kernel/vdso/Makefile
> +++ b/arch/arm64/kernel/vdso/Makefile
> @@ -41,8 +41,9 @@ ifneq ($(c-gettimeofday-y),)
>    CFLAGS_vgettimeofday.o += -include $(c-gettimeofday-y)
>  endif
>
> -# Disable gcov profiling for VDSO code
> +# Disable gcov and PGO profiling for VDSO code
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>
>  obj-y += vdso.o
>  targets += vdso.lds
> diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
> index 1f1e351c5fe2b..ad128ecdbfbdf 100644
> --- a/arch/arm64/kvm/hyp/nvhe/Makefile
> +++ b/arch/arm64/kvm/hyp/nvhe/Makefile
> @@ -60,6 +60,7 @@ KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_FTRACE) $(CC_FLAGS_SCS), $(KBUILD_CFLAG
>  # compiler instrumentation that inserts callbacks or checks into the code may
>  # cause crashes. Just disable it.
>  GCOV_PROFILE   := n
> +PGO_PROFILE    := n
>  KASAN_SANITIZE := n
>  UBSAN_SANITIZE := n
>  KCOV_INSTRUMENT        := n
> diff --git a/arch/mips/boot/compressed/Makefile b/arch/mips/boot/compressed/Makefile
> index 47cd9dc7454af..0855ea12f2c7f 100644
> --- a/arch/mips/boot/compressed/Makefile
> +++ b/arch/mips/boot/compressed/Makefile
> @@ -37,6 +37,7 @@ KBUILD_AFLAGS := $(KBUILD_AFLAGS) -D__ASSEMBLY__ \
>  # Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
>  KCOV_INSTRUMENT                := n
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>
>  # decompressor objects (linked with vmlinuz)
>  vmlinuzobjs-y := $(obj)/head.o $(obj)/decompress.o $(obj)/string.o
> diff --git a/arch/mips/vdso/Makefile b/arch/mips/vdso/Makefile
> index 5810cc12bc1d9..d7eb64de35eae 100644
> --- a/arch/mips/vdso/Makefile
> +++ b/arch/mips/vdso/Makefile
> @@ -66,6 +66,7 @@ ldflags-y := -Bsymbolic --no-undefined -soname=linux-vdso.so.1 \
>  CFLAGS_REMOVE_vdso.o = -pg
>
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>  UBSAN_SANITIZE := n
>  KCOV_INSTRUMENT := n
>
> diff --git a/arch/nds32/kernel/vdso/Makefile b/arch/nds32/kernel/vdso/Makefile
> index 55df25ef00578..f2b53ee2124b7 100644
> --- a/arch/nds32/kernel/vdso/Makefile
> +++ b/arch/nds32/kernel/vdso/Makefile
> @@ -15,9 +15,9 @@ obj-vdso := $(addprefix $(obj)/, $(obj-vdso))
>  ccflags-y := -shared -fno-common -fno-builtin -nostdlib -fPIC -Wl,-shared -g \
>         -Wl,-soname=linux-vdso.so.1 -Wl,--hash-style=sysv
>
> -# Disable gcov profiling for VDSO code
> +# Disable gcov and PGO profiling for VDSO code
>  GCOV_PROFILE := n
> -
> +PGO_PROFILE := n
>
>  obj-y += vdso.o
>  targets += vdso.lds
> diff --git a/arch/parisc/boot/compressed/Makefile b/arch/parisc/boot/compressed/Makefile
> index dff4536875305..5cf93a67f7da7 100644
> --- a/arch/parisc/boot/compressed/Makefile
> +++ b/arch/parisc/boot/compressed/Makefile
> @@ -7,6 +7,7 @@
>
>  KCOV_INSTRUMENT := n
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>  UBSAN_SANITIZE := n
>
>  targets := vmlinux.lds vmlinux vmlinux.bin vmlinux.bin.gz vmlinux.bin.bz2
> diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
> index fe2ef598e2ead..c642c046660d7 100644
> --- a/arch/powerpc/kernel/Makefile
> +++ b/arch/powerpc/kernel/Makefile
> @@ -153,17 +153,21 @@ endif
>  obj-$(CONFIG_PPC_SECURE_BOOT)  += secure_boot.o ima_arch.o secvar-ops.o
>  obj-$(CONFIG_PPC_SECVAR_SYSFS) += secvar-sysfs.o
>
> -# Disable GCOV, KCOV & sanitizers in odd or sensitive code
> +# Disable GCOV, PGO, KCOV & sanitizers in odd or sensitive code
>  GCOV_PROFILE_prom_init.o := n
> +PGO_PROFILE_prom_init.o := n
>  KCOV_INSTRUMENT_prom_init.o := n
>  UBSAN_SANITIZE_prom_init.o := n
>  GCOV_PROFILE_kprobes.o := n
> +PGO_PROFILE_kprobes.o := n
>  KCOV_INSTRUMENT_kprobes.o := n
>  UBSAN_SANITIZE_kprobes.o := n
>  GCOV_PROFILE_kprobes-ftrace.o := n
> +PGO_PROFILE_kprobes-ftrace.o := n
>  KCOV_INSTRUMENT_kprobes-ftrace.o := n
>  UBSAN_SANITIZE_kprobes-ftrace.o := n
>  GCOV_PROFILE_syscall_64.o := n
> +PGO_PROFILE_syscall_64.o := n
>  KCOV_INSTRUMENT_syscall_64.o := n
>  UBSAN_SANITIZE_syscall_64.o := n
>  UBSAN_SANITIZE_vdso.o := n
> diff --git a/arch/powerpc/kernel/trace/Makefile b/arch/powerpc/kernel/trace/Makefile
> index 858503775c583..7d72ae7d4f8c6 100644
> --- a/arch/powerpc/kernel/trace/Makefile
> +++ b/arch/powerpc/kernel/trace/Makefile
> @@ -23,7 +23,8 @@ obj-$(CONFIG_TRACING)                 += trace_clock.o
>  obj-$(CONFIG_PPC64)                    += $(obj64-y)
>  obj-$(CONFIG_PPC32)                    += $(obj32-y)
>
> -# Disable GCOV, KCOV & sanitizers in odd or sensitive code
> +# Disable GCOV, PGO, KCOV & sanitizers in odd or sensitive code
>  GCOV_PROFILE_ftrace.o := n
> +PGO_PROFILE_ftrace.o := n
>  KCOV_INSTRUMENT_ftrace.o := n
>  UBSAN_SANITIZE_ftrace.o := n
> diff --git a/arch/powerpc/kernel/vdso32/Makefile b/arch/powerpc/kernel/vdso32/Makefile
> index 9cb6f524854b9..655e159975a04 100644
> --- a/arch/powerpc/kernel/vdso32/Makefile
> +++ b/arch/powerpc/kernel/vdso32/Makefile
> @@ -34,6 +34,7 @@ targets := $(obj-vdso32) vdso32.so.dbg
>  obj-vdso32 := $(addprefix $(obj)/, $(obj-vdso32))
>
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>  KCOV_INSTRUMENT := n
>  UBSAN_SANITIZE := n
>  KASAN_SANITIZE := n
> diff --git a/arch/powerpc/kernel/vdso64/Makefile b/arch/powerpc/kernel/vdso64/Makefile
> index bf363ff371521..12c286f5afc16 100644
> --- a/arch/powerpc/kernel/vdso64/Makefile
> +++ b/arch/powerpc/kernel/vdso64/Makefile
> @@ -21,6 +21,7 @@ targets := $(obj-vdso64) vdso64.so.dbg
>  obj-vdso64 := $(addprefix $(obj)/, $(obj-vdso64))
>
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>  KCOV_INSTRUMENT := n
>  UBSAN_SANITIZE := n
>  KASAN_SANITIZE := n
> diff --git a/arch/powerpc/kexec/Makefile b/arch/powerpc/kexec/Makefile
> index 4aff6846c7726..1c7f65e3cb969 100644
> --- a/arch/powerpc/kexec/Makefile
> +++ b/arch/powerpc/kexec/Makefile
> @@ -16,7 +16,8 @@ endif
>  endif
>
>
> -# Disable GCOV, KCOV & sanitizers in odd or sensitive code
> +# Disable GCOV, PGO, KCOV & sanitizers in odd or sensitive code
>  GCOV_PROFILE_core_$(BITS).o := n
> +PGO_PROFILE_core_$(BITS).o := n
>  KCOV_INSTRUMENT_core_$(BITS).o := n
>  UBSAN_SANITIZE_core_$(BITS).o := n
> diff --git a/arch/powerpc/xmon/Makefile b/arch/powerpc/xmon/Makefile
> index eb25d7554ffd1..7aff80d18b44b 100644
> --- a/arch/powerpc/xmon/Makefile
> +++ b/arch/powerpc/xmon/Makefile
> @@ -2,6 +2,7 @@
>  # Makefile for xmon
>
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>  KCOV_INSTRUMENT := n
>  UBSAN_SANITIZE := n
>  KASAN_SANITIZE := n
> diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile
> index 0cfd6da784f84..882340dc3c647 100644
> --- a/arch/riscv/kernel/vdso/Makefile
> +++ b/arch/riscv/kernel/vdso/Makefile
> @@ -32,8 +32,9 @@ CPPFLAGS_vdso.lds += -P -C -U$(ARCH)
>  # Disable -pg to prevent insert call site
>  CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os
>
> -# Disable gcov profiling for VDSO code
> +# Disable gcov and PGO profiling for VDSO code
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>  KCOV_INSTRUMENT := n
>
>  # Force dependency
> diff --git a/arch/s390/boot/Makefile b/arch/s390/boot/Makefile
> index 41a64b8dce252..bee4a32040e79 100644
> --- a/arch/s390/boot/Makefile
> +++ b/arch/s390/boot/Makefile
> @@ -5,6 +5,7 @@
>
>  KCOV_INSTRUMENT := n
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>  UBSAN_SANITIZE := n
>  KASAN_SANITIZE := n
>
> diff --git a/arch/s390/boot/compressed/Makefile b/arch/s390/boot/compressed/Makefile
> index de18dab518bb6..c3ab883e8425a 100644
> --- a/arch/s390/boot/compressed/Makefile
> +++ b/arch/s390/boot/compressed/Makefile
> @@ -7,6 +7,7 @@
>
>  KCOV_INSTRUMENT := n
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>  UBSAN_SANITIZE := n
>  KASAN_SANITIZE := n
>
> diff --git a/arch/s390/kernel/Makefile b/arch/s390/kernel/Makefile
> index dd73b7f074237..bd857aacad794 100644
> --- a/arch/s390/kernel/Makefile
> +++ b/arch/s390/kernel/Makefile
> @@ -14,6 +14,7 @@ CFLAGS_REMOVE_early.o         = $(CC_FLAGS_FTRACE)
>  endif
>
>  GCOV_PROFILE_early.o           := n
> +PGO_PROFILE_early.o            := n
>  KCOV_INSTRUMENT_early.o                := n
>  UBSAN_SANITIZE_early.o         := n
>  KASAN_SANITIZE_ipl.o           := n
> diff --git a/arch/s390/kernel/vdso64/Makefile b/arch/s390/kernel/vdso64/Makefile
> index a6e0fb6b91d6c..d7c43b7c1db96 100644
> --- a/arch/s390/kernel/vdso64/Makefile
> +++ b/arch/s390/kernel/vdso64/Makefile
> @@ -35,8 +35,9 @@ obj-y += vdso64_wrapper.o
>  targets += vdso64.lds
>  CPPFLAGS_vdso64.lds += -P -C -U$(ARCH)
>
> -# Disable gcov profiling, ubsan and kasan for VDSO code
> +# Disable gcov and PGO profiling, ubsan and kasan for VDSO code
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>  UBSAN_SANITIZE := n
>  KASAN_SANITIZE := n
>
> diff --git a/arch/s390/purgatory/Makefile b/arch/s390/purgatory/Makefile
> index c57f8c40e9926..9aef584e98466 100644
> --- a/arch/s390/purgatory/Makefile
> +++ b/arch/s390/purgatory/Makefile
> @@ -17,6 +17,7 @@ $(obj)/mem.o: $(srctree)/arch/s390/lib/mem.S FORCE
>
>  KCOV_INSTRUMENT := n
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>  UBSAN_SANITIZE := n
>  KASAN_SANITIZE := n
>
> diff --git a/arch/sh/boot/compressed/Makefile b/arch/sh/boot/compressed/Makefile
> index 589d2d8a573db..ae19aeeb3964c 100644
> --- a/arch/sh/boot/compressed/Makefile
> +++ b/arch/sh/boot/compressed/Makefile
> @@ -13,6 +13,7 @@ targets               := vmlinux vmlinux.bin vmlinux.bin.gz \
>  OBJECTS = $(obj)/head_32.o $(obj)/misc.o $(obj)/cache.o
>
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>
>  #
>  # IMAGE_OFFSET is the load offset of the compression loader
> diff --git a/arch/sh/mm/Makefile b/arch/sh/mm/Makefile
> index f69ddc70b1465..ea2782c631f43 100644
> --- a/arch/sh/mm/Makefile
> +++ b/arch/sh/mm/Makefile
> @@ -43,3 +43,4 @@ obj-$(CONFIG_UNCACHED_MAPPING)        += uncached.o
>  obj-$(CONFIG_HAVE_SRAM_POOL)   += sram.o
>
>  GCOV_PROFILE_pmb.o := n
> +PGO_PROFILE_pmb.o := n
> diff --git a/arch/sparc/vdso/Makefile b/arch/sparc/vdso/Makefile
> index c5e1545bc5cf9..ab5f3783fe199 100644
> --- a/arch/sparc/vdso/Makefile
> +++ b/arch/sparc/vdso/Makefile
> @@ -115,6 +115,7 @@ quiet_cmd_vdso = VDSO    $@
>
>  VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 -Bsymbolic
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>
>  #
>  # Install the unstripped copies of vdso*.so.  If our toolchain supports
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 7b6dd10b162ac..a751b4f8f6645 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -95,6 +95,7 @@ config X86
>         select ARCH_SUPPORTS_DEBUG_PAGEALLOC
>         select ARCH_SUPPORTS_NUMA_BALANCING     if X86_64
>         select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP       if NR_CPUS <= 4096
> +       select ARCH_SUPPORTS_PGO_CLANG          if X86_64
>         select ARCH_USE_BUILTIN_BSWAP
>         select ARCH_USE_QUEUED_RWLOCKS
>         select ARCH_USE_QUEUED_SPINLOCKS
> diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
> index fe605205b4ce2..383853e32f673 100644
> --- a/arch/x86/boot/Makefile
> +++ b/arch/x86/boot/Makefile
> @@ -71,6 +71,7 @@ KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
>  KBUILD_CFLAGS  += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
>  KBUILD_CFLAGS  += -fno-asynchronous-unwind-tables
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>  UBSAN_SANITIZE := n
>
>  $(obj)/bzImage: asflags-y  := $(SVGA_MODE)
> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> index e0bc3988c3faa..ed12ab65f6065 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -54,6 +54,7 @@ CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/
>
>  KBUILD_AFLAGS  := $(KBUILD_CFLAGS) -D__ASSEMBLY__
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>  UBSAN_SANITIZE :=n
>
>  KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
> diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
> index 02e3e42f380bd..26e2b3af0145c 100644
> --- a/arch/x86/entry/vdso/Makefile
> +++ b/arch/x86/entry/vdso/Makefile
> @@ -179,6 +179,7 @@ quiet_cmd_vdso = VDSO    $@
>  VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 \
>         $(call ld-option, --eh-frame-hdr) -Bsymbolic
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>
>  quiet_cmd_vdso_and_check = VDSO    $@
>        cmd_vdso_and_check = $(cmd_vdso); $(cmd_vdso_check)
> diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> index efd9e9ea17f25..f6cab2316c46a 100644
> --- a/arch/x86/kernel/vmlinux.lds.S
> +++ b/arch/x86/kernel/vmlinux.lds.S
> @@ -184,6 +184,8 @@ SECTIONS
>
>         BUG_TABLE
>
> +       PGO_CLANG_DATA
> +
>         ORC_UNWIND_TABLE
>
>         . = ALIGN(PAGE_SIZE);
> diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile
> index 84b09c230cbd5..5f22b31446ad4 100644
> --- a/arch/x86/platform/efi/Makefile
> +++ b/arch/x86/platform/efi/Makefile
> @@ -2,6 +2,7 @@
>  OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y
>  KASAN_SANITIZE := n
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>
>  obj-$(CONFIG_EFI)              += quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o
>  obj-$(CONFIG_EFI_MIXED)                += efi_thunk_$(BITS).o
> diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
> index 95ea17a9d20cb..36f20e99da0bc 100644
> --- a/arch/x86/purgatory/Makefile
> +++ b/arch/x86/purgatory/Makefile
> @@ -23,6 +23,7 @@ targets += purgatory.ro purgatory.chk
>
>  # Sanitizer, etc. runtimes are unavailable and cannot be linked here.
>  GCOV_PROFILE   := n
> +PGO_PROFILE    := n
>  KASAN_SANITIZE := n
>  UBSAN_SANITIZE := n
>  KCSAN_SANITIZE := n
> diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
> index 83f1b6a56449f..21797192f958f 100644
> --- a/arch/x86/realmode/rm/Makefile
> +++ b/arch/x86/realmode/rm/Makefile
> @@ -76,4 +76,5 @@ KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
>  KBUILD_AFLAGS  := $(KBUILD_CFLAGS) -D__ASSEMBLY__
>  KBUILD_CFLAGS  += -fno-asynchronous-unwind-tables
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>  UBSAN_SANITIZE := n
> diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile
> index 5943387e3f357..54f5768f58530 100644
> --- a/arch/x86/um/vdso/Makefile
> +++ b/arch/x86/um/vdso/Makefile
> @@ -64,6 +64,7 @@ quiet_cmd_vdso = VDSO    $@
>
>  VDSO_LDFLAGS = -fPIC -shared -Wl,--hash-style=sysv
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>
>  #
>  # Install the unstripped copy of vdso*.so listed in $(vdso-install-y).
> diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> index 8a94388e38b33..2d81623b33f29 100644
> --- a/drivers/firmware/efi/libstub/Makefile
> +++ b/drivers/firmware/efi/libstub/Makefile
> @@ -40,6 +40,7 @@ KBUILD_CFLAGS                 := $(cflags-y) -Os -DDISABLE_BRANCH_PROFILING \
>  KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))
>
>  GCOV_PROFILE                   := n
> +PGO_PROFILE                    := n
>  # Sanitizer runtimes are unavailable and cannot be linked here.
>  KASAN_SANITIZE                 := n
>  KCSAN_SANITIZE                 := n
> diff --git a/drivers/s390/char/Makefile b/drivers/s390/char/Makefile
> index c6fdb81a068a6..bf6c5db5da1fc 100644
> --- a/drivers/s390/char/Makefile
> +++ b/drivers/s390/char/Makefile
> @@ -9,6 +9,7 @@ CFLAGS_REMOVE_sclp_early_core.o = $(CC_FLAGS_FTRACE)
>  endif
>
>  GCOV_PROFILE_sclp_early_core.o         := n
> +PGO_PROFILE_sclp_early_core.o          := n
>  KCOV_INSTRUMENT_sclp_early_core.o      := n
>  UBSAN_SANITIZE_sclp_early_core.o       := n
>  KASAN_SANITIZE_sclp_early_core.o       := n
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index b2b3d81b1535a..3a591bb18c5fb 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -316,6 +316,49 @@
>  #define THERMAL_TABLE(name)
>  #endif
>
> +#ifdef CONFIG_PGO_CLANG
> +#define PGO_CLANG_DATA                                                 \
> +       __llvm_prf_data : AT(ADDR(__llvm_prf_data) - LOAD_OFFSET) {     \
> +               . = ALIGN(8);                                           \
> +               __llvm_prf_start = .;                                   \
> +               __llvm_prf_data_start = .;                              \
> +               KEEP(*(__llvm_prf_data))                                \
> +               . = ALIGN(8);                                           \
> +               __llvm_prf_data_end = .;                                \
> +       }                                                               \
> +       __llvm_prf_cnts : AT(ADDR(__llvm_prf_cnts) - LOAD_OFFSET) {     \
> +               . = ALIGN(8);                                           \
> +               __llvm_prf_cnts_start = .;                              \
> +               KEEP(*(__llvm_prf_cnts))                                \
> +               . = ALIGN(8);                                           \
> +               __llvm_prf_cnts_end = .;                                \
> +       }                                                               \
> +       __llvm_prf_names : AT(ADDR(__llvm_prf_names) - LOAD_OFFSET) {   \
> +               . = ALIGN(8);                                           \
> +               __llvm_prf_names_start = .;                             \
> +               KEEP(*(__llvm_prf_names))                               \
> +               . = ALIGN(8);                                           \
> +               __llvm_prf_names_end = .;                               \
> +               . = ALIGN(8);                                           \
> +       }                                                               \
> +       __llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) {     \
> +               __llvm_prf_vals_start = .;                              \
> +               KEEP(*(__llvm_prf_vals))                                \
> +               . = ALIGN(8);                                           \
> +               __llvm_prf_vals_end = .;                                \
> +               . = ALIGN(8);                                           \
> +       }                                                               \
> +       __llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) {     \
> +               __llvm_prf_vnds_start = .;                              \
> +               KEEP(*(__llvm_prf_vnds))                                \
> +               . = ALIGN(8);                                           \
> +               __llvm_prf_vnds_end = .;                                \
> +               __llvm_prf_end = .;                                     \
> +       }
> +#else
> +#define PGO_CLANG_DATA
> +#endif
> +
>  #define KERNEL_DTB()                                                   \
>         STRUCT_ALIGN();                                                 \
>         __dtb_start = .;                                                \
> @@ -1125,6 +1168,7 @@
>                 CONSTRUCTORS                                            \
>         }                                                               \
>         BUG_TABLE                                                       \
> +       PGO_CLANG_DATA
>
>  #define INIT_TEXT_SECTION(inittext_align)                              \
>         . = ALIGN(inittext_align);                                      \
> diff --git a/kernel/Makefile b/kernel/Makefile
> index aa7368c7eabf3..0b34ca228ba46 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -111,6 +111,7 @@ obj-$(CONFIG_BPF) += bpf/
>  obj-$(CONFIG_KCSAN) += kcsan/
>  obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
>  obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o
> +obj-$(CONFIG_PGO_CLANG) += pgo/
>
>  obj-$(CONFIG_PERF_EVENTS) += events/
>
> diff --git a/kernel/pgo/Kconfig b/kernel/pgo/Kconfig
> new file mode 100644
> index 0000000000000..318d36bb3d106
> --- /dev/null
> +++ b/kernel/pgo/Kconfig
> @@ -0,0 +1,34 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +menu "Profile Guided Optimization (PGO) (EXPERIMENTAL)"
> +
> +config ARCH_SUPPORTS_PGO_CLANG
> +       bool
> +
> +config PGO_CLANG
> +       bool "Enable clang's PGO-based kernel profiling"
> +       depends on DEBUG_FS
> +       depends on ARCH_SUPPORTS_PGO_CLANG
> +       help
> +         This option enables clang's PGO (Profile Guided Optimization) based
> +         code profiling to better optimize the kernel.
> +
> +         If unsure, say N.
> +
> +         Run a representative workload for your application on a kernel
> +         compiled with this option and download the raw profile file from
> +         /sys/kernel/debug/pgo/profraw. This file needs to be processed with
> +         llvm-profdata. It may be merged with other collected raw profiles.
> +
> +         Copy the resulting profile file into vmlinux.profdata, and enable
> +         KCFLAGS=-fprofile-use=vmlinux.profdata to produce an optimized
> +         kernel.
> +
> +         Note that a kernel compiled with profiling flags will be
> +         significatnly larger and run slower. Also be sure to exclude files
> +         from profiling which are not linked to the kernel image to prevent
> +         linker errors.
> +
> +         Note that the debugfs filesystem has to be mounted to access
> +         profiling data.
> +
> +endmenu
> diff --git a/kernel/pgo/Makefile b/kernel/pgo/Makefile
> new file mode 100644
> index 0000000000000..41e27cefd9a47
> --- /dev/null
> +++ b/kernel/pgo/Makefile
> @@ -0,0 +1,5 @@
> +# SPDX-License-Identifier: GPL-2.0
> +GCOV_PROFILE   := n
> +PGO_PROFILE    := n
> +
> +obj-y  += fs.o instrument.o
> diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c
> new file mode 100644
> index 0000000000000..790a8df037bfc
> --- /dev/null
> +++ b/kernel/pgo/fs.c
> @@ -0,0 +1,382 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2019 Google, Inc.
> + *
> + * Author:
> + *     Sami Tolvanen <samitolvanen@google.com>
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#define pr_fmt(fmt)    "pgo: " fmt
> +
> +#include <linux/kernel.h>
> +#include <linux/debugfs.h>
> +#include <linux/fs.h>
> +#include <linux/module.h>
> +#include <linux/slab.h>
> +#include <linux/vmalloc.h>
> +#include "pgo.h"
> +
> +static struct dentry *directory;
> +
> +struct prf_private_data {
> +       void *buffer;
> +       unsigned long size;
> +};
> +
> +/*
> + * Raw profile data format:
> + *
> + *     - llvm_prf_header
> + *     - __llvm_prf_data
> + *     - __llvm_prf_cnts
> + *     - __llvm_prf_names
> + *     - zero padding to 8 bytes
> + *     - for each llvm_prf_data in __llvm_prf_data:
> + *             - llvm_prf_value_data
> + *                     - llvm_prf_value_record + site count array
> + *                             - llvm_prf_value_node_data
> + *                             ...
> + *                     ...
> + *             ...
> + */
> +
> +static void prf_fill_header(void **buffer)
> +{
> +       struct llvm_prf_header *header = *(struct llvm_prf_header **)buffer;
> +
> +       header->magic = LLVM_PRF_MAGIC;
> +       header->version = LLVM_PRF_VARIANT_MASK_IR | LLVM_PRF_VERSION;
> +       header->data_size = prf_data_count();
> +       header->padding_bytes_before_counters = 0;
> +       header->counters_size = prf_cnts_count();
> +       header->padding_bytes_after_counters = 0;
> +       header->names_size = prf_names_count();
> +       header->counters_delta = (u64)__llvm_prf_cnts_start;
> +       header->names_delta = (u64)__llvm_prf_names_start;
> +       header->value_kind_last = LLVM_PRF_IPVK_LAST;
> +
> +       *buffer += sizeof(*header);
> +}
> +
> +/*
> + * Copy the source into the buffer, incrementing the pointer into buffer in the
> + * process.
> + */
> +static void prf_copy_to_buffer(void **buffer, void *src, unsigned long size)
> +{
> +       memcpy(*buffer, src, size);
> +       *buffer += size;
> +}
> +
> +static u32 __prf_get_value_size(struct llvm_prf_data *p, u32 *value_kinds)
> +{
> +       struct llvm_prf_value_node **nodes =
> +               (struct llvm_prf_value_node **)p->values;
> +       u32 kinds = 0;
> +       u32 size = 0;
> +       unsigned int kind;
> +       unsigned int n;
> +       unsigned int s = 0;
> +
> +       for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> +               unsigned int sites = p->num_value_sites[kind];
> +
> +               if (!sites)
> +                       continue;
> +
> +               /* Record + site count array */
> +               size += prf_get_value_record_size(sites);
> +               kinds++;
> +
> +               if (!nodes)
> +                       continue;
> +
> +               for (n = 0; n < sites; n++) {
> +                       u32 count = 0;
> +                       struct llvm_prf_value_node *site = nodes[s + n];
> +
> +                       while (site && ++count <= U8_MAX)
> +                               site = site->next;
> +
> +                       size += count *
> +                               sizeof(struct llvm_prf_value_node_data);
> +               }
> +
> +               s += sites;
> +       }
> +
> +       if (size)
> +               size += sizeof(struct llvm_prf_value_data);
> +
> +       if (value_kinds)
> +               *value_kinds = kinds;
> +
> +       return size;
> +}
> +
> +static u32 prf_get_value_size(void)
> +{
> +       u32 size = 0;
> +       struct llvm_prf_data *p;
> +
> +       for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> +               size += __prf_get_value_size(p, NULL);
> +
> +       return size;
> +}
> +
> +/* Serialize the profiling's value. */
> +static void prf_serialize_value(struct llvm_prf_data *p, void **buffer)
> +{
> +       struct llvm_prf_value_data header;
> +       struct llvm_prf_value_node **nodes =
> +               (struct llvm_prf_value_node **)p->values;
> +       unsigned int kind;
> +       unsigned int n;
> +       unsigned int s = 0;
> +
> +       header.total_size = __prf_get_value_size(p, &header.num_value_kinds);
> +
> +       if (!header.num_value_kinds)
> +               /* Nothing to write. */
> +               return;
> +
> +       prf_copy_to_buffer(buffer, &header, sizeof(header));
> +
> +       for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> +               struct llvm_prf_value_record *record;
> +               u8 *counts;
> +               unsigned int sites = p->num_value_sites[kind];
> +
> +               if (!sites)
> +                       continue;
> +
> +               /* Profiling value record. */
> +               record = *(struct llvm_prf_value_record **)buffer;
> +               *buffer += prf_get_value_record_header_size();
> +
> +               record->kind = kind;
> +               record->num_value_sites = sites;
> +
> +               /* Site count array. */
> +               counts = *(u8 **)buffer;
> +               *buffer += prf_get_value_record_site_count_size(sites);
> +
> +               /*
> +                * If we don't have nodes, we can skip updating the site count
> +                * array, because the buffer is zero filled.
> +                */
> +               if (!nodes)
> +                       continue;
> +
> +               for (n = 0; n < sites; n++) {
> +                       u32 count = 0;
> +                       struct llvm_prf_value_node *site = nodes[s + n];
> +
> +                       while (site && ++count <= U8_MAX) {
> +                               prf_copy_to_buffer(buffer, site,
> +                                                  sizeof(struct llvm_prf_value_node_data));
> +                               site = site->next;
> +                       }
> +
> +                       counts[n] = (u8)count;
> +               }
> +
> +               s += sites;
> +       }
> +}
> +
> +static void prf_serialize_values(void **buffer)
> +{
> +       struct llvm_prf_data *p;
> +
> +       for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> +               prf_serialize_value(p, buffer);
> +}
> +
> +static inline unsigned long prf_get_padding(unsigned long size)
> +{
> +       return 8 - (size % 8);
> +}
> +
> +static unsigned long prf_buffer_size(void)
> +{
> +       return sizeof(struct llvm_prf_header) +
> +                       prf_data_size() +
> +                       prf_cnts_size() +
> +                       prf_names_size() +
> +                       prf_get_padding(prf_names_size()) +
> +                       prf_get_value_size();
> +}
> +
> +/* Serialize the profling data into a format LLVM's tools can understand. */
> +static int prf_serialize(struct prf_private_data *p)
> +{
> +       int err = 0;
> +       void *buffer;
> +
> +       p->size = prf_buffer_size();
> +       p->buffer = vzalloc(p->size);
> +
> +       if (!p->buffer) {
> +               err = -ENOMEM;
> +               goto out;
> +       }
> +
> +       buffer = p->buffer;
> +
> +       prf_fill_header(&buffer);
> +       prf_copy_to_buffer(&buffer, __llvm_prf_data_start,  prf_data_size());
> +       prf_copy_to_buffer(&buffer, __llvm_prf_cnts_start,  prf_cnts_size());
> +       prf_copy_to_buffer(&buffer, __llvm_prf_names_start, prf_names_size());
> +       buffer += prf_get_padding(prf_names_size());
> +
> +       prf_serialize_values(&buffer);
> +
> +out:
> +       return err;
> +}
> +
> +/* open() implementation for PGO. Creates a copy of the profiling data set. */
> +static int prf_open(struct inode *inode, struct file *file)
> +{
> +       struct prf_private_data *data;
> +       unsigned long flags;
> +       int err;
> +
> +       data = kzalloc(sizeof(*data), GFP_KERNEL);
> +       if (!data) {
> +               err = -ENOMEM;
> +               goto out;
> +       }
> +
> +       flags = prf_lock();
> +
> +       err = prf_serialize(data);
> +       if (err) {
> +               kfree(data);
> +               goto out_unlock;
> +       }
> +
> +       file->private_data = data;
> +
> +out_unlock:
> +       prf_unlock(flags);
> +out:
> +       return err;
> +}
> +
> +/* read() implementation for PGO. */
> +static ssize_t prf_read(struct file *file, char __user *buf, size_t count,
> +                       loff_t *ppos)
> +{
> +       struct prf_private_data *data = file->private_data;
> +
> +       BUG_ON(!data);
> +
> +       return simple_read_from_buffer(buf, count, ppos, data->buffer,
> +                                      data->size);
> +}
> +
> +/* release() implementation for PGO. Release resources allocated by open(). */
> +static int prf_release(struct inode *inode, struct file *file)
> +{
> +       struct prf_private_data *data = file->private_data;
> +
> +       if (data) {
> +               vfree(data->buffer);
> +               kfree(data);
> +       }
> +
> +       return 0;
> +}
> +
> +static const struct file_operations prf_fops = {
> +       .owner          = THIS_MODULE,
> +       .open           = prf_open,
> +       .read           = prf_read,
> +       .llseek         = default_llseek,
> +       .release        = prf_release
> +};
> +
> +/* write() implementation for resetting PGO's profile data. */
> +static ssize_t reset_write(struct file *file, const char __user *addr,
> +                          size_t len, loff_t *pos)
> +{
> +       struct llvm_prf_data *data;
> +
> +       memset(__llvm_prf_cnts_start, 0, prf_cnts_size());
> +
> +       for (data = __llvm_prf_data_start; data < __llvm_prf_data_end; ++data) {
> +               struct llvm_prf_value_node **vnodes;
> +               u64 current_vsite_count;
> +               u32 i;
> +
> +               if (!data->values)
> +                       continue;
> +
> +               current_vsite_count = 0;
> +               vnodes = (struct llvm_prf_value_node **)data->values;
> +
> +               for (i = LLVM_PRF_IPVK_FIRST; i <= LLVM_PRF_IPVK_LAST; ++i)
> +                       current_vsite_count += data->num_value_sites[i];
> +
> +               for (i = 0; i < current_vsite_count; ++i) {
> +                       struct llvm_prf_value_node *current_vnode = vnodes[i];
> +
> +                       while (current_vnode) {
> +                               current_vnode->count = 0;
> +                               current_vnode = current_vnode->next;
> +                       }
> +               }
> +       }
> +
> +       return len;
> +}
> +
> +static const struct file_operations prf_reset_fops = {
> +       .owner          = THIS_MODULE,
> +       .write          = reset_write,
> +       .llseek         = noop_llseek,
> +};
> +
> +/* Create debugfs entries. */
> +static int __init pgo_init(void)
> +{
> +       directory = debugfs_create_dir("pgo", NULL);
> +       if (!directory)
> +               goto err_remove;
> +
> +       if (!debugfs_create_file("profraw", 0600, directory, NULL,
> +                                &prf_fops))
> +               goto err_remove;
> +
> +       if (!debugfs_create_file("reset", 0200, directory, NULL,
> +                                &prf_reset_fops))
> +               goto err_remove;
> +
> +       return 0;
> +
> +err_remove:
> +       pr_err("initialization failed\n");
> +       return -EIO;
> +}
> +
> +/* Remove debufs entries. */
> +static void __exit pgo_exit(void)
> +{
> +       debugfs_remove_recursive(directory);
> +}
> +
> +module_init(pgo_init);
> +module_exit(pgo_exit);
> diff --git a/kernel/pgo/instrument.c b/kernel/pgo/instrument.c
> new file mode 100644
> index 0000000000000..d96b61a1cf712
> --- /dev/null
> +++ b/kernel/pgo/instrument.c
> @@ -0,0 +1,147 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2019 Google, Inc.
> + *
> + * Author:
> + *     Sami Tolvanen <samitolvanen@google.com>
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#define pr_fmt(fmt)    "pgo: " fmt
> +
> +#include <linux/kernel.h>
> +#include <linux/export.h>
> +#include <linux/spinlock.h>
> +#include <linux/types.h>
> +#include "pgo.h"
> +
> +/* Lock guarding value node access and serialization. */
> +static DEFINE_SPINLOCK(pgo_lock);
> +static int current_node;
> +
> +unsigned long prf_lock(void)
> +{
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&pgo_lock, flags);
> +
> +       return flags;
> +}
> +
> +void prf_unlock(unsigned long flags)
> +{
> +       spin_unlock_irqrestore(&pgo_lock, flags);
> +}
> +
> +/*
> + * Return a newly allocated profiling value node which contains the tracked
> + * value by the value profiler.
> + * Note: caller *must* hold pgo_lock.
> + */
> +static struct llvm_prf_value_node *allocate_node(struct llvm_prf_data *p,
> +                                                u32 index, u64 value)
> +{
> +       if (&__llvm_prf_vnds_start[current_node + 1] >= __llvm_prf_vnds_end)
> +               return NULL; /* Out of nodes */
> +
> +       current_node++;
> +
> +       /* Make sure the node is entirely within the section */
> +       if (&__llvm_prf_vnds_start[current_node] >= __llvm_prf_vnds_end ||
> +           &__llvm_prf_vnds_start[current_node + 1] > __llvm_prf_vnds_end)
> +               return NULL;
> +
> +       return &__llvm_prf_vnds_start[current_node];
> +}
> +
> +/*
> + * Counts the number of times a target value is seen.
> + *
> + * Records the target value for the CounterIndex if not seen before. Otherwise,
> + * increments the counter associated w/ the target value.
> + */
> +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index)
> +{
> +       struct llvm_prf_data *p = (struct llvm_prf_data *)data;
> +       struct llvm_prf_value_node **counters;
> +       struct llvm_prf_value_node *curr;
> +       struct llvm_prf_value_node *min = NULL;
> +       struct llvm_prf_value_node *prev = NULL;
> +       u64 min_count = U64_MAX;
> +       u8 values = 0;
> +       unsigned long flags;
> +
> +       if (!p || !p->values)
> +               return;
> +
> +       counters = (struct llvm_prf_value_node **)p->values;
> +       curr = counters[index];
> +
> +       while (curr) {
> +               if (target_value == curr->value) {
> +                       curr->count++;
> +                       return;
> +               }
> +
> +               if (curr->count < min_count) {
> +                       min_count = curr->count;
> +                       min = curr;
> +               }
> +
> +               prev = curr;
> +               curr = curr->next;
> +               values++;
> +       }
> +
> +       if (values >= LLVM_PRF_MAX_NUM_VALS_PER_SITE) {
> +               if (!min->count || !(--min->count)) {
> +                       curr = min;
> +                       curr->value = target_value;
> +                       curr->count++;
> +               }
> +               return;
> +       }
> +
> +       /* Lock when updating the value node structure. */
> +       flags = prf_lock();
> +
> +       curr = allocate_node(p, index, target_value);
> +       if (!curr)
> +               goto out;
> +
> +       curr->value = target_value;
> +       curr->count++;
> +
> +       if (!counters[index])
> +               counters[index] = curr;
> +       else if (prev && !prev->next)
> +               prev->next = curr;
> +
> +out:
> +       prf_unlock(flags);
> +}
> +EXPORT_SYMBOL(__llvm_profile_instrument_target);
> +
> +/* Counts the number of times a range of targets values are seen. */
> +void __llvm_profile_instrument_range(u64 target_value, void *data,
> +                                    u32 index, s64 precise_start,
> +                                    s64 precise_last, s64 large_value)
> +{
> +       if (large_value != S64_MIN && (s64)target_value >= large_value)
> +               target_value = large_value;
> +       else if ((s64)target_value < precise_start ||
> +                (s64)target_value > precise_last)
> +               target_value = precise_last + 1;
> +
> +       __llvm_profile_instrument_target(target_value, data, index);
> +}
> +EXPORT_SYMBOL(__llvm_profile_instrument_range);
> diff --git a/kernel/pgo/pgo.h b/kernel/pgo/pgo.h
> new file mode 100644
> index 0000000000000..df0aa278f28bd
> --- /dev/null
> +++ b/kernel/pgo/pgo.h
> @@ -0,0 +1,206 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (C) 2019 Google, Inc.
> + *
> + * Author:
> + *     Sami Tolvanen <samitolvanen@google.com>
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#ifndef _PGO_H
> +#define _PGO_H
> +
> +/*
> + * Note: These internal LLVM definitions must match the compiler version.
> + * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code.
> + */
> +
> +#ifdef CONFIG_64BIT
> +       #define LLVM_PRF_MAGIC          \
> +               ((u64)255 << 56 |       \
> +                (u64)'l' << 48 |       \
> +                (u64)'p' << 40 |       \
> +                (u64)'r' << 32 |       \
> +                (u64)'o' << 24 |       \
> +                (u64)'f' << 16 |       \
> +                (u64)'r' << 8  |       \
> +                (u64)129)
> +#else
> +       #define LLVM_PRF_MAGIC          \
> +               ((u64)255 << 56 |       \
> +                (u64)'l' << 48 |       \
> +                (u64)'p' << 40 |       \
> +                (u64)'r' << 32 |       \
> +                (u64)'o' << 24 |       \
> +                (u64)'f' << 16 |       \
> +                (u64)'R' << 8  |       \
> +                (u64)129)
> +#endif
> +
> +#define LLVM_PRF_VERSION               5
> +#define LLVM_PRF_DATA_ALIGN            8
> +#define LLVM_PRF_IPVK_FIRST            0
> +#define LLVM_PRF_IPVK_LAST             1
> +#define LLVM_PRF_MAX_NUM_VALS_PER_SITE 16
> +
> +#define LLVM_PRF_VARIANT_MASK_IR       (0x1ull << 56)
> +#define LLVM_PRF_VARIANT_MASK_CSIR     (0x1ull << 57)
> +
> +/**
> + * struct llvm_prf_header - represents the raw profile header data structure.
> + * @magic: the magic token for the file format.
> + * @version: the version of the file format.
> + * @data_size: the number of entries in the profile data section.
> + * @padding_bytes_before_counters: the number of padding bytes before the
> + *   counters.
> + * @counters_size: the size in bytes of the LLVM profile section containing the
> + *   counters.
> + * @padding_bytes_after_counters: the number of padding bytes after the
> + *   counters.
> + * @names_size: the size in bytes of the LLVM profile section containing the
> + *   counters' names.
> + * @counters_delta: the beginning of the LLMV profile counters section.
> + * @names_delta: the beginning of the LLMV profile names section.
> + * @value_kind_last: the last profile value kind.
> + */
> +struct llvm_prf_header {
> +       u64 magic;
> +       u64 version;
> +       u64 data_size;
> +       u64 padding_bytes_before_counters;
> +       u64 counters_size;
> +       u64 padding_bytes_after_counters;
> +       u64 names_size;
> +       u64 counters_delta;
> +       u64 names_delta;
> +       u64 value_kind_last;
> +};
> +
> +/**
> + * struct llvm_prf_data - represents the per-function control structure.
> + * @name_ref: the reference to the function's name.
> + * @func_hash: the hash value of the function.
> + * @counter_ptr: a pointer to the profile counter.
> + * @function_ptr: a pointer to the function.
> + * @values: the profiling values associated with this function.
> + * @num_counters: the number of counters in the function.
> + * @num_value_sites: the number of value profile sites.
> + */
> +struct llvm_prf_data {
> +       const u64 name_ref;
> +       const u64 func_hash;
> +       const void *counter_ptr;
> +       const void *function_ptr;
> +       void *values;
> +       const u32 num_counters;
> +       const u16 num_value_sites[LLVM_PRF_IPVK_LAST + 1];
> +} __aligned(LLVM_PRF_DATA_ALIGN);
> +
> +/**
> + * structure llvm_prf_value_node_data - represents the data part of the struct
> + *   llvm_prf_value_node data structure.
> + * @value: the value counters.
> + * @count: the counters' count.
> + */
> +struct llvm_prf_value_node_data {
> +       u64 value;
> +       u64 count;
> +};
> +
> +/**
> + * struct llvm_prf_value_node - represents an internal data structure used by
> + *   the value profiler.
> + * @value: the value counters.
> + * @count: the counters' count.
> + * @next: the next value node.
> + */
> +struct llvm_prf_value_node {
> +       u64 value;
> +       u64 count;
> +       struct llvm_prf_value_node *next;
> +};
> +
> +/**
> + * struct llvm_prf_value_data - represents the value profiling data in indexed
> + *   format.
> + * @total_size: the total size in bytes including this field.
> + * @num_value_kinds: the number of value profile kinds that has value profile
> + *   data.
> + */
> +struct llvm_prf_value_data {
> +       u32 total_size;
> +       u32 num_value_kinds;
> +};
> +
> +/**
> + * struct llvm_prf_value_record - represents the on-disk layout of the value
> + *   profile data of a particular kind for one function.
> + * @kind: the kind of the value profile record.
> + * @num_value_sites: the number of value profile sites.
> + * @site_count_array: the first element of the array that stores the number
> + *   of profiled values for each value site.
> + */
> +struct llvm_prf_value_record {
> +       u32 kind;
> +       u32 num_value_sites;
> +       u8 site_count_array[];
> +};
> +
> +#define prf_get_value_record_header_size()             \
> +       offsetof(struct llvm_prf_value_record, site_count_array)
> +#define prf_get_value_record_site_count_size(sites)    \
> +       roundup((sites), 8)
> +#define prf_get_value_record_size(sites)               \
> +       (prf_get_value_record_header_size() +           \
> +        prf_get_value_record_site_count_size((sites)))
> +
> +/* Data sections */
> +extern struct llvm_prf_data __llvm_prf_data_start[];
> +extern struct llvm_prf_data __llvm_prf_data_end[];
> +
> +extern u64 __llvm_prf_cnts_start[];
> +extern u64 __llvm_prf_cnts_end[];
> +
> +extern char __llvm_prf_names_start[];
> +extern char __llvm_prf_names_end[];
> +
> +extern struct llvm_prf_value_node __llvm_prf_vnds_start[];
> +extern struct llvm_prf_value_node __llvm_prf_vnds_end[];
> +
> +/* Locking for vnodes */
> +extern unsigned long prf_lock(void);
> +extern void prf_unlock(unsigned long flags);
> +
> +#define __DEFINE_PRF_SIZE(s) \
> +       static inline unsigned long prf_ ## s ## _size(void)            \
> +       {                                                               \
> +               unsigned long start =                                   \
> +                       (unsigned long)__llvm_prf_ ## s ## _start;      \
> +               unsigned long end =                                     \
> +                       (unsigned long)__llvm_prf_ ## s ## _end;        \
> +               return roundup(end - start,                             \
> +                               sizeof(__llvm_prf_ ## s ## _start[0])); \
> +       }                                                               \
> +       static inline unsigned long prf_ ## s ## _count(void)           \
> +       {                                                               \
> +               return prf_ ## s ## _size() /                           \
> +                       sizeof(__llvm_prf_ ## s ## _start[0]);          \
> +       }
> +
> +__DEFINE_PRF_SIZE(data);
> +__DEFINE_PRF_SIZE(cnts);
> +__DEFINE_PRF_SIZE(names);
> +__DEFINE_PRF_SIZE(vnds);
> +
> +#undef __DEFINE_PRF_SIZE
> +
> +#endif /* _PGO_H */
> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> index 213677a5ed33e..9b218afb5cb87 100644
> --- a/scripts/Makefile.lib
> +++ b/scripts/Makefile.lib
> @@ -143,6 +143,16 @@ _c_flags += $(if $(patsubst n%,, \
>                 $(CFLAGS_GCOV))
>  endif
>
> +#
> +# Enable clang's PGO profiling flags for a file or directory depending on
> +# variables PGO_PROFILE_obj.o and PGO_PROFILE.
> +#
> +ifeq ($(CONFIG_PGO_CLANG),y)
> +_c_flags += $(if $(patsubst n%,, \
> +               $(PGO_PROFILE_$(basetarget).o)$(PGO_PROFILE)y), \
> +               $(CFLAGS_PGO_CLANG))
> +endif
> +
>  #
>  # Enable address sanitizer flags for kernel except some files or directories
>  # we don't want to check (depends on variables KASAN_SANITIZE_obj.o, KASAN_SANITIZE)
> --
> 2.30.0.284.gd98b1dd5eaa7-goog
>
> --
> You received this message because you are subscribed to the Google Groups "Clang Built Linux" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to clang-built-linux+unsubscribe@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/clang-built-linux/20210111081821.3041587-1-morbo%40google.com.

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH] pgo: add clang's Profile Guided Optimization infrastructure
  2021-01-11  8:39 ` Sedat Dilek
@ 2021-01-11  8:42   ` Sedat Dilek
  2021-01-11  9:17   ` Bill Wendling
  1 sibling, 0 replies; 116+ messages in thread
From: Sedat Dilek @ 2021-01-11  8:42 UTC (permalink / raw)
  To: Bill Wendling
  Cc: Jonathan Corbet, Masahiro Yamada, linux-doc, linux-kernel,
	linux-kbuild, Clang-Built-Linux ML, Andrew Morton,
	Nathan Chancellor, Nick Desaulniers, Sami Tolvanen

On Mon, Jan 11, 2021 at 9:39 AM Sedat Dilek <sedat.dilek@gmail.com> wrote:
>
> On Mon, Jan 11, 2021 at 9:18 AM 'Bill Wendling' via Clang Built Linux
> <clang-built-linux@googlegroups.com> wrote:
> >
> > From: Sami Tolvanen <samitolvanen@google.com>
> >
> > Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > profile, the kernel is instrumented with PGO counters, a representative
> > workload is run, and the raw profile data is collected from
> > /sys/kernel/debug/pgo/profraw.
> >
> > The raw profile data must be processed by clang's "llvm-profdata" tool before
> > it can be used during recompilation:
> >
> >   $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> >   $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> >
> > Multiple raw profiles may be merged during this step.
> >
> > The data can be used either by the compiler if LTO isn't enabled:
> >
> >     ... -fprofile-use=vmlinux.profdata ...
> >
> > or by LLD if LTO is enabled:
> >
> >     ... -lto-cs-profile-file=vmlinux.profdata ...
> >
> > This initial submission is restricted to x86, as that's the platform we know
> > works. This restriction can be lifted once other platforms have been verified
> > to work with PGO.
> >
> > Note that this method of profiling the kernel is clang-native and isn't
> > compatible with clang's gcov support in kernel/gcov.
> >
> > [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> >
>
> Hi Bill and Sami,
>
> I have seen the pull-request in the CBL issue tracker and had some
> questions in mind.
>
> Good you send this.
>
> First of all, I like to fetch any development stuff easily from a Git
> repository.
> Can you offer this, please?
> What is the base for your work?
> I hope this is (fresh released) Linux v5.11-rc3.
>
> I myself had some experiences with a PGO + ThinLTO optimized LLVM
> toolchain built with the help of tc-build.
> Here it takes very long to build it.
>
> This means I have some profile-data archived.
> Can I use it?
>
> Is an own PGO + ThinLTO optimized LLVM toolchain pre-requirement for
> this or not?

When I recall correctly such an optimized LLVM toolchain saved here
40% of build-time.
So, I am highly interested in reducing my build-time.

- Sedat -

> That is one of my important questions.
>
> Thanks for your work.
>
> Regards,
> - Sedat -
>
> > Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
> > Co-developed-by: Bill Wendling <morbo@google.com>
> > Signed-off-by: Bill Wendling <morbo@google.com>
> > ---
> >  Documentation/dev-tools/index.rst     |   1 +
> >  Documentation/dev-tools/pgo.rst       | 127 +++++++++
> >  MAINTAINERS                           |   9 +
> >  Makefile                              |   3 +
> >  arch/Kconfig                          |   1 +
> >  arch/arm/boot/bootp/Makefile          |   1 +
> >  arch/arm/boot/compressed/Makefile     |   1 +
> >  arch/arm/vdso/Makefile                |   3 +-
> >  arch/arm64/kernel/vdso/Makefile       |   3 +-
> >  arch/arm64/kvm/hyp/nvhe/Makefile      |   1 +
> >  arch/mips/boot/compressed/Makefile    |   1 +
> >  arch/mips/vdso/Makefile               |   1 +
> >  arch/nds32/kernel/vdso/Makefile       |   4 +-
> >  arch/parisc/boot/compressed/Makefile  |   1 +
> >  arch/powerpc/kernel/Makefile          |   6 +-
> >  arch/powerpc/kernel/trace/Makefile    |   3 +-
> >  arch/powerpc/kernel/vdso32/Makefile   |   1 +
> >  arch/powerpc/kernel/vdso64/Makefile   |   1 +
> >  arch/powerpc/kexec/Makefile           |   3 +-
> >  arch/powerpc/xmon/Makefile            |   1 +
> >  arch/riscv/kernel/vdso/Makefile       |   3 +-
> >  arch/s390/boot/Makefile               |   1 +
> >  arch/s390/boot/compressed/Makefile    |   1 +
> >  arch/s390/kernel/Makefile             |   1 +
> >  arch/s390/kernel/vdso64/Makefile      |   3 +-
> >  arch/s390/purgatory/Makefile          |   1 +
> >  arch/sh/boot/compressed/Makefile      |   1 +
> >  arch/sh/mm/Makefile                   |   1 +
> >  arch/sparc/vdso/Makefile              |   1 +
> >  arch/x86/Kconfig                      |   1 +
> >  arch/x86/boot/Makefile                |   1 +
> >  arch/x86/boot/compressed/Makefile     |   1 +
> >  arch/x86/entry/vdso/Makefile          |   1 +
> >  arch/x86/kernel/vmlinux.lds.S         |   2 +
> >  arch/x86/platform/efi/Makefile        |   1 +
> >  arch/x86/purgatory/Makefile           |   1 +
> >  arch/x86/realmode/rm/Makefile         |   1 +
> >  arch/x86/um/vdso/Makefile             |   1 +
> >  drivers/firmware/efi/libstub/Makefile |   1 +
> >  drivers/s390/char/Makefile            |   1 +
> >  include/asm-generic/vmlinux.lds.h     |  44 +++
> >  kernel/Makefile                       |   1 +
> >  kernel/pgo/Kconfig                    |  34 +++
> >  kernel/pgo/Makefile                   |   5 +
> >  kernel/pgo/fs.c                       | 382 ++++++++++++++++++++++++++
> >  kernel/pgo/instrument.c               | 147 ++++++++++
> >  kernel/pgo/pgo.h                      | 206 ++++++++++++++
> >  scripts/Makefile.lib                  |  10 +
> >  48 files changed, 1017 insertions(+), 9 deletions(-)
> >  create mode 100644 Documentation/dev-tools/pgo.rst
> >  create mode 100644 kernel/pgo/Kconfig
> >  create mode 100644 kernel/pgo/Makefile
> >  create mode 100644 kernel/pgo/fs.c
> >  create mode 100644 kernel/pgo/instrument.c
> >  create mode 100644 kernel/pgo/pgo.h
> >
> > diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> > index f7809c7b1ba9e..8d6418e858062 100644
> > --- a/Documentation/dev-tools/index.rst
> > +++ b/Documentation/dev-tools/index.rst
> > @@ -26,6 +26,7 @@ whole; patches welcome!
> >     kgdb
> >     kselftest
> >     kunit/index
> > +   pgo
> >
> >
> >  .. only::  subproject and html
> > diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
> > new file mode 100644
> > index 0000000000000..2ed7f549b20ef
> > --- /dev/null
> > +++ b/Documentation/dev-tools/pgo.rst
> > @@ -0,0 +1,127 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +===============================
> > +Using PGO with the Linux kernel
> > +===============================
> > +
> > +Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
> > +when building with Clang. The profiling data is exported via the ``pgo``
> > +debugfs directory.
> > +
> > +.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> > +
> > +
> > +Preparation
> > +===========
> > +
> > +Configure the kernel with:
> > +
> > +.. code-block:: make
> > +
> > +   CONFIG_DEBUG_FS=y
> > +   CONFIG_PGO_CLANG=y
> > +
> > +Note that kernels compiled with profiling flags will be significantly larger
> > +and run slower.
> > +
> > +Profiling data will only become accessible once debugfs has been mounted:
> > +
> > +.. code-block:: sh
> > +
> > +   mount -t debugfs none /sys/kernel/debug
> > +
> > +
> > +Customization
> > +=============
> > +
> > +You can enable or disable profiling for individual file and directories by
> > +adding a line similar to the following to the respective kernel Makefile:
> > +
> > +- For a single file (e.g. main.o)
> > +
> > +  .. code-block:: make
> > +
> > +     PGO_PROFILE_main.o := y
> > +
> > +- For all files in one directory
> > +
> > +  .. code-block:: make
> > +
> > +     PGO_PROFILE := y
> > +
> > +To exclude files from being profiled use
> > +
> > +  .. code-block:: make
> > +
> > +     PGO_PROFILE_main.o := n
> > +
> > +and
> > +
> > +  .. code-block:: make
> > +
> > +     PGO_PROFILE := n
> > +
> > +Only files which are linked to the main kernel image or are compiled as kernel
> > +modules are supported by this mechanism.
> > +
> > +
> > +Files
> > +=====
> > +
> > +The PGO kernel support creates the following files in debugfs:
> > +
> > +``/sys/kernel/debug/pgo``
> > +       Parent directory for all PGO-related files.
> > +
> > +``/sys/kernel/debug/pgo/reset``
> > +       Global reset file: resets all coverage data to zero when written to.
> > +
> > +``/sys/kernel/debug/profraw``
> > +       The raw PGO data that must be processed with ``llvm_profdata``.
> > +
> > +
> > +Workflow
> > +========
> > +
> > +The PGO kernel can be run on the host or test machines. The data though should
> > +be analyzed with Clang's tools from the same Clang version as the kernel was
> > +compiled. Clang's tolerant of version skew, but it's easier to use the same
> > +Clang version.
> > +
> > +The profiling data is useful for optimizing the kernel, analyzing coverage,
> > +etc. Clang offers tools to perform these tasks.
> > +
> > +Here is an example workflow for profiling an instrumented kernel with PGO and
> > +using the result to optimize the kernel:
> > +
> > +1) Install the kernel on the TEST machine.
> > +
> > +2) Reset the data counters right before running the load tests
> > +
> > +   .. code-block:: sh
> > +
> > +      echo 1 > /sys/kernel/debug/pgo/reset
> > +
> > +3) Run the load tests.
> > +
> > +4) Collect the raw profile data
> > +
> > +   .. code-block:: sh
> > +
> > +      cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> > +
> > +5) (Optional) Download the raw profile data to the HOST machine.
> > +
> > +6) Process the raw profile data
> > +
> > +   .. code-block:: sh
> > +
> > +      llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > +
> > +   Note that multiple raw profile data files can be merged during this step.
> > +
> > +7) Rebuild the kernel using the profile data (PGO disabled)
> > +
> > +   .. code-block:: sh
> > +
> > +      make ... KCLAGS=-fprofile-use=vmlinux.profdata
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 6390491b07e51..7a98bdaab9861 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -13955,6 +13955,15 @@ S:     Maintained
> >  F:     include/linux/personality.h
> >  F:     include/uapi/linux/personality.h
> >
> > +PGO BASED KERNEL PROFILING
> > +M:     Sami Tolvanen <samitolvanen@google.com>
> > +M:     Bill Wendling <wcw@google.com>
> > +R:     Nathan Chancellor <natechancellor@gmail.com>
> > +R:     Nick Desaulniers <ndesaulniers@google.com>
> > +S:     Supported
> > +F:     Documentation/dev-tools/pgo.rst
> > +F:     kernel/pgo
> > +
> >  PHOENIX RC FLIGHT CONTROLLER ADAPTER
> >  M:     Marcus Folkesson <marcus.folkesson@gmail.com>
> >  L:     linux-input@vger.kernel.org
> > diff --git a/Makefile b/Makefile
> > index 8b2c3f88ee5ea..4f42957c78134 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -659,6 +659,9 @@ endif # KBUILD_EXTMOD
> >  # Defaults to vmlinux, but the arch makefile usually adds further targets
> >  all: vmlinux
> >
> > +CFLAGS_PGO_CLANG := -fprofile-generate
> > +export CFLAGS_PGO_CLANG
> > +
> >  CFLAGS_GCOV    := -fprofile-arcs -ftest-coverage \
> >         $(call cc-option,-fno-tree-loop-im) \
> >         $(call cc-disable-warning,maybe-uninitialized,)
> > diff --git a/arch/Kconfig b/arch/Kconfig
> > index 78c6f05b10f91..a7a6ab7d204dc 100644
> > --- a/arch/Kconfig
> > +++ b/arch/Kconfig
> > @@ -1106,6 +1106,7 @@ config ARCH_SUPPORTS_DEBUG_PAGEALLOC
> >         bool
> >
> >  source "kernel/gcov/Kconfig"
> > +source "kernel/pgo/Kconfig"
> >
> >  source "scripts/gcc-plugins/Kconfig"
> >
> > diff --git a/arch/arm/boot/bootp/Makefile b/arch/arm/boot/bootp/Makefile
> > index 981a8d03f064c..523bd58df0a4b 100644
> > --- a/arch/arm/boot/bootp/Makefile
> > +++ b/arch/arm/boot/bootp/Makefile
> > @@ -7,6 +7,7 @@
> >  #
> >
> >  GCOV_PROFILE   := n
> > +PGO_PROFILE    := n
> >
> >  LDFLAGS_bootp  := --no-undefined -X \
> >                  --defsym initrd_phys=$(INITRD_PHYS) \
> > diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
> > index fb521efcc6c20..5fd0fd85fc0e5 100644
> > --- a/arch/arm/boot/compressed/Makefile
> > +++ b/arch/arm/boot/compressed/Makefile
> > @@ -24,6 +24,7 @@ OBJS          += hyp-stub.o
> >  endif
> >
> >  GCOV_PROFILE           := n
> > +PGO_PROFILE            := n
> >  KASAN_SANITIZE         := n
> >
> >  # Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
> > diff --git a/arch/arm/vdso/Makefile b/arch/arm/vdso/Makefile
> > index b558bee0e1f6b..11f6ce4b48b56 100644
> > --- a/arch/arm/vdso/Makefile
> > +++ b/arch/arm/vdso/Makefile
> > @@ -36,8 +36,9 @@ else
> >  CFLAGS_vgettimeofday.o = -O2 -include $(c-gettimeofday-y)
> >  endif
> >
> > -# Disable gcov profiling for VDSO code
> > +# Disable gcov and PGO profiling for VDSO code
> >  GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >
> >  # Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
> >  KCOV_INSTRUMENT := n
> > diff --git a/arch/arm64/kernel/vdso/Makefile b/arch/arm64/kernel/vdso/Makefile
> > index cd9c3fa25902f..d48fc0df07020 100644
> > --- a/arch/arm64/kernel/vdso/Makefile
> > +++ b/arch/arm64/kernel/vdso/Makefile
> > @@ -41,8 +41,9 @@ ifneq ($(c-gettimeofday-y),)
> >    CFLAGS_vgettimeofday.o += -include $(c-gettimeofday-y)
> >  endif
> >
> > -# Disable gcov profiling for VDSO code
> > +# Disable gcov and PGO profiling for VDSO code
> >  GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >
> >  obj-y += vdso.o
> >  targets += vdso.lds
> > diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
> > index 1f1e351c5fe2b..ad128ecdbfbdf 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/Makefile
> > +++ b/arch/arm64/kvm/hyp/nvhe/Makefile
> > @@ -60,6 +60,7 @@ KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_FTRACE) $(CC_FLAGS_SCS), $(KBUILD_CFLAG
> >  # compiler instrumentation that inserts callbacks or checks into the code may
> >  # cause crashes. Just disable it.
> >  GCOV_PROFILE   := n
> > +PGO_PROFILE    := n
> >  KASAN_SANITIZE := n
> >  UBSAN_SANITIZE := n
> >  KCOV_INSTRUMENT        := n
> > diff --git a/arch/mips/boot/compressed/Makefile b/arch/mips/boot/compressed/Makefile
> > index 47cd9dc7454af..0855ea12f2c7f 100644
> > --- a/arch/mips/boot/compressed/Makefile
> > +++ b/arch/mips/boot/compressed/Makefile
> > @@ -37,6 +37,7 @@ KBUILD_AFLAGS := $(KBUILD_AFLAGS) -D__ASSEMBLY__ \
> >  # Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
> >  KCOV_INSTRUMENT                := n
> >  GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >
> >  # decompressor objects (linked with vmlinuz)
> >  vmlinuzobjs-y := $(obj)/head.o $(obj)/decompress.o $(obj)/string.o
> > diff --git a/arch/mips/vdso/Makefile b/arch/mips/vdso/Makefile
> > index 5810cc12bc1d9..d7eb64de35eae 100644
> > --- a/arch/mips/vdso/Makefile
> > +++ b/arch/mips/vdso/Makefile
> > @@ -66,6 +66,7 @@ ldflags-y := -Bsymbolic --no-undefined -soname=linux-vdso.so.1 \
> >  CFLAGS_REMOVE_vdso.o = -pg
> >
> >  GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >  UBSAN_SANITIZE := n
> >  KCOV_INSTRUMENT := n
> >
> > diff --git a/arch/nds32/kernel/vdso/Makefile b/arch/nds32/kernel/vdso/Makefile
> > index 55df25ef00578..f2b53ee2124b7 100644
> > --- a/arch/nds32/kernel/vdso/Makefile
> > +++ b/arch/nds32/kernel/vdso/Makefile
> > @@ -15,9 +15,9 @@ obj-vdso := $(addprefix $(obj)/, $(obj-vdso))
> >  ccflags-y := -shared -fno-common -fno-builtin -nostdlib -fPIC -Wl,-shared -g \
> >         -Wl,-soname=linux-vdso.so.1 -Wl,--hash-style=sysv
> >
> > -# Disable gcov profiling for VDSO code
> > +# Disable gcov and PGO profiling for VDSO code
> >  GCOV_PROFILE := n
> > -
> > +PGO_PROFILE := n
> >
> >  obj-y += vdso.o
> >  targets += vdso.lds
> > diff --git a/arch/parisc/boot/compressed/Makefile b/arch/parisc/boot/compressed/Makefile
> > index dff4536875305..5cf93a67f7da7 100644
> > --- a/arch/parisc/boot/compressed/Makefile
> > +++ b/arch/parisc/boot/compressed/Makefile
> > @@ -7,6 +7,7 @@
> >
> >  KCOV_INSTRUMENT := n
> >  GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >  UBSAN_SANITIZE := n
> >
> >  targets := vmlinux.lds vmlinux vmlinux.bin vmlinux.bin.gz vmlinux.bin.bz2
> > diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
> > index fe2ef598e2ead..c642c046660d7 100644
> > --- a/arch/powerpc/kernel/Makefile
> > +++ b/arch/powerpc/kernel/Makefile
> > @@ -153,17 +153,21 @@ endif
> >  obj-$(CONFIG_PPC_SECURE_BOOT)  += secure_boot.o ima_arch.o secvar-ops.o
> >  obj-$(CONFIG_PPC_SECVAR_SYSFS) += secvar-sysfs.o
> >
> > -# Disable GCOV, KCOV & sanitizers in odd or sensitive code
> > +# Disable GCOV, PGO, KCOV & sanitizers in odd or sensitive code
> >  GCOV_PROFILE_prom_init.o := n
> > +PGO_PROFILE_prom_init.o := n
> >  KCOV_INSTRUMENT_prom_init.o := n
> >  UBSAN_SANITIZE_prom_init.o := n
> >  GCOV_PROFILE_kprobes.o := n
> > +PGO_PROFILE_kprobes.o := n
> >  KCOV_INSTRUMENT_kprobes.o := n
> >  UBSAN_SANITIZE_kprobes.o := n
> >  GCOV_PROFILE_kprobes-ftrace.o := n
> > +PGO_PROFILE_kprobes-ftrace.o := n
> >  KCOV_INSTRUMENT_kprobes-ftrace.o := n
> >  UBSAN_SANITIZE_kprobes-ftrace.o := n
> >  GCOV_PROFILE_syscall_64.o := n
> > +PGO_PROFILE_syscall_64.o := n
> >  KCOV_INSTRUMENT_syscall_64.o := n
> >  UBSAN_SANITIZE_syscall_64.o := n
> >  UBSAN_SANITIZE_vdso.o := n
> > diff --git a/arch/powerpc/kernel/trace/Makefile b/arch/powerpc/kernel/trace/Makefile
> > index 858503775c583..7d72ae7d4f8c6 100644
> > --- a/arch/powerpc/kernel/trace/Makefile
> > +++ b/arch/powerpc/kernel/trace/Makefile
> > @@ -23,7 +23,8 @@ obj-$(CONFIG_TRACING)                 += trace_clock.o
> >  obj-$(CONFIG_PPC64)                    += $(obj64-y)
> >  obj-$(CONFIG_PPC32)                    += $(obj32-y)
> >
> > -# Disable GCOV, KCOV & sanitizers in odd or sensitive code
> > +# Disable GCOV, PGO, KCOV & sanitizers in odd or sensitive code
> >  GCOV_PROFILE_ftrace.o := n
> > +PGO_PROFILE_ftrace.o := n
> >  KCOV_INSTRUMENT_ftrace.o := n
> >  UBSAN_SANITIZE_ftrace.o := n
> > diff --git a/arch/powerpc/kernel/vdso32/Makefile b/arch/powerpc/kernel/vdso32/Makefile
> > index 9cb6f524854b9..655e159975a04 100644
> > --- a/arch/powerpc/kernel/vdso32/Makefile
> > +++ b/arch/powerpc/kernel/vdso32/Makefile
> > @@ -34,6 +34,7 @@ targets := $(obj-vdso32) vdso32.so.dbg
> >  obj-vdso32 := $(addprefix $(obj)/, $(obj-vdso32))
> >
> >  GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >  KCOV_INSTRUMENT := n
> >  UBSAN_SANITIZE := n
> >  KASAN_SANITIZE := n
> > diff --git a/arch/powerpc/kernel/vdso64/Makefile b/arch/powerpc/kernel/vdso64/Makefile
> > index bf363ff371521..12c286f5afc16 100644
> > --- a/arch/powerpc/kernel/vdso64/Makefile
> > +++ b/arch/powerpc/kernel/vdso64/Makefile
> > @@ -21,6 +21,7 @@ targets := $(obj-vdso64) vdso64.so.dbg
> >  obj-vdso64 := $(addprefix $(obj)/, $(obj-vdso64))
> >
> >  GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >  KCOV_INSTRUMENT := n
> >  UBSAN_SANITIZE := n
> >  KASAN_SANITIZE := n
> > diff --git a/arch/powerpc/kexec/Makefile b/arch/powerpc/kexec/Makefile
> > index 4aff6846c7726..1c7f65e3cb969 100644
> > --- a/arch/powerpc/kexec/Makefile
> > +++ b/arch/powerpc/kexec/Makefile
> > @@ -16,7 +16,8 @@ endif
> >  endif
> >
> >
> > -# Disable GCOV, KCOV & sanitizers in odd or sensitive code
> > +# Disable GCOV, PGO, KCOV & sanitizers in odd or sensitive code
> >  GCOV_PROFILE_core_$(BITS).o := n
> > +PGO_PROFILE_core_$(BITS).o := n
> >  KCOV_INSTRUMENT_core_$(BITS).o := n
> >  UBSAN_SANITIZE_core_$(BITS).o := n
> > diff --git a/arch/powerpc/xmon/Makefile b/arch/powerpc/xmon/Makefile
> > index eb25d7554ffd1..7aff80d18b44b 100644
> > --- a/arch/powerpc/xmon/Makefile
> > +++ b/arch/powerpc/xmon/Makefile
> > @@ -2,6 +2,7 @@
> >  # Makefile for xmon
> >
> >  GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >  KCOV_INSTRUMENT := n
> >  UBSAN_SANITIZE := n
> >  KASAN_SANITIZE := n
> > diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile
> > index 0cfd6da784f84..882340dc3c647 100644
> > --- a/arch/riscv/kernel/vdso/Makefile
> > +++ b/arch/riscv/kernel/vdso/Makefile
> > @@ -32,8 +32,9 @@ CPPFLAGS_vdso.lds += -P -C -U$(ARCH)
> >  # Disable -pg to prevent insert call site
> >  CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os
> >
> > -# Disable gcov profiling for VDSO code
> > +# Disable gcov and PGO profiling for VDSO code
> >  GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >  KCOV_INSTRUMENT := n
> >
> >  # Force dependency
> > diff --git a/arch/s390/boot/Makefile b/arch/s390/boot/Makefile
> > index 41a64b8dce252..bee4a32040e79 100644
> > --- a/arch/s390/boot/Makefile
> > +++ b/arch/s390/boot/Makefile
> > @@ -5,6 +5,7 @@
> >
> >  KCOV_INSTRUMENT := n
> >  GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >  UBSAN_SANITIZE := n
> >  KASAN_SANITIZE := n
> >
> > diff --git a/arch/s390/boot/compressed/Makefile b/arch/s390/boot/compressed/Makefile
> > index de18dab518bb6..c3ab883e8425a 100644
> > --- a/arch/s390/boot/compressed/Makefile
> > +++ b/arch/s390/boot/compressed/Makefile
> > @@ -7,6 +7,7 @@
> >
> >  KCOV_INSTRUMENT := n
> >  GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >  UBSAN_SANITIZE := n
> >  KASAN_SANITIZE := n
> >
> > diff --git a/arch/s390/kernel/Makefile b/arch/s390/kernel/Makefile
> > index dd73b7f074237..bd857aacad794 100644
> > --- a/arch/s390/kernel/Makefile
> > +++ b/arch/s390/kernel/Makefile
> > @@ -14,6 +14,7 @@ CFLAGS_REMOVE_early.o         = $(CC_FLAGS_FTRACE)
> >  endif
> >
> >  GCOV_PROFILE_early.o           := n
> > +PGO_PROFILE_early.o            := n
> >  KCOV_INSTRUMENT_early.o                := n
> >  UBSAN_SANITIZE_early.o         := n
> >  KASAN_SANITIZE_ipl.o           := n
> > diff --git a/arch/s390/kernel/vdso64/Makefile b/arch/s390/kernel/vdso64/Makefile
> > index a6e0fb6b91d6c..d7c43b7c1db96 100644
> > --- a/arch/s390/kernel/vdso64/Makefile
> > +++ b/arch/s390/kernel/vdso64/Makefile
> > @@ -35,8 +35,9 @@ obj-y += vdso64_wrapper.o
> >  targets += vdso64.lds
> >  CPPFLAGS_vdso64.lds += -P -C -U$(ARCH)
> >
> > -# Disable gcov profiling, ubsan and kasan for VDSO code
> > +# Disable gcov and PGO profiling, ubsan and kasan for VDSO code
> >  GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >  UBSAN_SANITIZE := n
> >  KASAN_SANITIZE := n
> >
> > diff --git a/arch/s390/purgatory/Makefile b/arch/s390/purgatory/Makefile
> > index c57f8c40e9926..9aef584e98466 100644
> > --- a/arch/s390/purgatory/Makefile
> > +++ b/arch/s390/purgatory/Makefile
> > @@ -17,6 +17,7 @@ $(obj)/mem.o: $(srctree)/arch/s390/lib/mem.S FORCE
> >
> >  KCOV_INSTRUMENT := n
> >  GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >  UBSAN_SANITIZE := n
> >  KASAN_SANITIZE := n
> >
> > diff --git a/arch/sh/boot/compressed/Makefile b/arch/sh/boot/compressed/Makefile
> > index 589d2d8a573db..ae19aeeb3964c 100644
> > --- a/arch/sh/boot/compressed/Makefile
> > +++ b/arch/sh/boot/compressed/Makefile
> > @@ -13,6 +13,7 @@ targets               := vmlinux vmlinux.bin vmlinux.bin.gz \
> >  OBJECTS = $(obj)/head_32.o $(obj)/misc.o $(obj)/cache.o
> >
> >  GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >
> >  #
> >  # IMAGE_OFFSET is the load offset of the compression loader
> > diff --git a/arch/sh/mm/Makefile b/arch/sh/mm/Makefile
> > index f69ddc70b1465..ea2782c631f43 100644
> > --- a/arch/sh/mm/Makefile
> > +++ b/arch/sh/mm/Makefile
> > @@ -43,3 +43,4 @@ obj-$(CONFIG_UNCACHED_MAPPING)        += uncached.o
> >  obj-$(CONFIG_HAVE_SRAM_POOL)   += sram.o
> >
> >  GCOV_PROFILE_pmb.o := n
> > +PGO_PROFILE_pmb.o := n
> > diff --git a/arch/sparc/vdso/Makefile b/arch/sparc/vdso/Makefile
> > index c5e1545bc5cf9..ab5f3783fe199 100644
> > --- a/arch/sparc/vdso/Makefile
> > +++ b/arch/sparc/vdso/Makefile
> > @@ -115,6 +115,7 @@ quiet_cmd_vdso = VDSO    $@
> >
> >  VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 -Bsymbolic
> >  GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >
> >  #
> >  # Install the unstripped copies of vdso*.so.  If our toolchain supports
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index 7b6dd10b162ac..a751b4f8f6645 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -95,6 +95,7 @@ config X86
> >         select ARCH_SUPPORTS_DEBUG_PAGEALLOC
> >         select ARCH_SUPPORTS_NUMA_BALANCING     if X86_64
> >         select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP       if NR_CPUS <= 4096
> > +       select ARCH_SUPPORTS_PGO_CLANG          if X86_64
> >         select ARCH_USE_BUILTIN_BSWAP
> >         select ARCH_USE_QUEUED_RWLOCKS
> >         select ARCH_USE_QUEUED_SPINLOCKS
> > diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
> > index fe605205b4ce2..383853e32f673 100644
> > --- a/arch/x86/boot/Makefile
> > +++ b/arch/x86/boot/Makefile
> > @@ -71,6 +71,7 @@ KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> >  KBUILD_CFLAGS  += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
> >  KBUILD_CFLAGS  += -fno-asynchronous-unwind-tables
> >  GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >  UBSAN_SANITIZE := n
> >
> >  $(obj)/bzImage: asflags-y  := $(SVGA_MODE)
> > diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> > index e0bc3988c3faa..ed12ab65f6065 100644
> > --- a/arch/x86/boot/compressed/Makefile
> > +++ b/arch/x86/boot/compressed/Makefile
> > @@ -54,6 +54,7 @@ CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/
> >
> >  KBUILD_AFLAGS  := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> >  GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >  UBSAN_SANITIZE :=n
> >
> >  KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
> > diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
> > index 02e3e42f380bd..26e2b3af0145c 100644
> > --- a/arch/x86/entry/vdso/Makefile
> > +++ b/arch/x86/entry/vdso/Makefile
> > @@ -179,6 +179,7 @@ quiet_cmd_vdso = VDSO    $@
> >  VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 \
> >         $(call ld-option, --eh-frame-hdr) -Bsymbolic
> >  GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >
> >  quiet_cmd_vdso_and_check = VDSO    $@
> >        cmd_vdso_and_check = $(cmd_vdso); $(cmd_vdso_check)
> > diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> > index efd9e9ea17f25..f6cab2316c46a 100644
> > --- a/arch/x86/kernel/vmlinux.lds.S
> > +++ b/arch/x86/kernel/vmlinux.lds.S
> > @@ -184,6 +184,8 @@ SECTIONS
> >
> >         BUG_TABLE
> >
> > +       PGO_CLANG_DATA
> > +
> >         ORC_UNWIND_TABLE
> >
> >         . = ALIGN(PAGE_SIZE);
> > diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile
> > index 84b09c230cbd5..5f22b31446ad4 100644
> > --- a/arch/x86/platform/efi/Makefile
> > +++ b/arch/x86/platform/efi/Makefile
> > @@ -2,6 +2,7 @@
> >  OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y
> >  KASAN_SANITIZE := n
> >  GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >
> >  obj-$(CONFIG_EFI)              += quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o
> >  obj-$(CONFIG_EFI_MIXED)                += efi_thunk_$(BITS).o
> > diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
> > index 95ea17a9d20cb..36f20e99da0bc 100644
> > --- a/arch/x86/purgatory/Makefile
> > +++ b/arch/x86/purgatory/Makefile
> > @@ -23,6 +23,7 @@ targets += purgatory.ro purgatory.chk
> >
> >  # Sanitizer, etc. runtimes are unavailable and cannot be linked here.
> >  GCOV_PROFILE   := n
> > +PGO_PROFILE    := n
> >  KASAN_SANITIZE := n
> >  UBSAN_SANITIZE := n
> >  KCSAN_SANITIZE := n
> > diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
> > index 83f1b6a56449f..21797192f958f 100644
> > --- a/arch/x86/realmode/rm/Makefile
> > +++ b/arch/x86/realmode/rm/Makefile
> > @@ -76,4 +76,5 @@ KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
> >  KBUILD_AFLAGS  := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> >  KBUILD_CFLAGS  += -fno-asynchronous-unwind-tables
> >  GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >  UBSAN_SANITIZE := n
> > diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile
> > index 5943387e3f357..54f5768f58530 100644
> > --- a/arch/x86/um/vdso/Makefile
> > +++ b/arch/x86/um/vdso/Makefile
> > @@ -64,6 +64,7 @@ quiet_cmd_vdso = VDSO    $@
> >
> >  VDSO_LDFLAGS = -fPIC -shared -Wl,--hash-style=sysv
> >  GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >
> >  #
> >  # Install the unstripped copy of vdso*.so listed in $(vdso-install-y).
> > diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> > index 8a94388e38b33..2d81623b33f29 100644
> > --- a/drivers/firmware/efi/libstub/Makefile
> > +++ b/drivers/firmware/efi/libstub/Makefile
> > @@ -40,6 +40,7 @@ KBUILD_CFLAGS                 := $(cflags-y) -Os -DDISABLE_BRANCH_PROFILING \
> >  KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))
> >
> >  GCOV_PROFILE                   := n
> > +PGO_PROFILE                    := n
> >  # Sanitizer runtimes are unavailable and cannot be linked here.
> >  KASAN_SANITIZE                 := n
> >  KCSAN_SANITIZE                 := n
> > diff --git a/drivers/s390/char/Makefile b/drivers/s390/char/Makefile
> > index c6fdb81a068a6..bf6c5db5da1fc 100644
> > --- a/drivers/s390/char/Makefile
> > +++ b/drivers/s390/char/Makefile
> > @@ -9,6 +9,7 @@ CFLAGS_REMOVE_sclp_early_core.o = $(CC_FLAGS_FTRACE)
> >  endif
> >
> >  GCOV_PROFILE_sclp_early_core.o         := n
> > +PGO_PROFILE_sclp_early_core.o          := n
> >  KCOV_INSTRUMENT_sclp_early_core.o      := n
> >  UBSAN_SANITIZE_sclp_early_core.o       := n
> >  KASAN_SANITIZE_sclp_early_core.o       := n
> > diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> > index b2b3d81b1535a..3a591bb18c5fb 100644
> > --- a/include/asm-generic/vmlinux.lds.h
> > +++ b/include/asm-generic/vmlinux.lds.h
> > @@ -316,6 +316,49 @@
> >  #define THERMAL_TABLE(name)
> >  #endif
> >
> > +#ifdef CONFIG_PGO_CLANG
> > +#define PGO_CLANG_DATA                                                 \
> > +       __llvm_prf_data : AT(ADDR(__llvm_prf_data) - LOAD_OFFSET) {     \
> > +               . = ALIGN(8);                                           \
> > +               __llvm_prf_start = .;                                   \
> > +               __llvm_prf_data_start = .;                              \
> > +               KEEP(*(__llvm_prf_data))                                \
> > +               . = ALIGN(8);                                           \
> > +               __llvm_prf_data_end = .;                                \
> > +       }                                                               \
> > +       __llvm_prf_cnts : AT(ADDR(__llvm_prf_cnts) - LOAD_OFFSET) {     \
> > +               . = ALIGN(8);                                           \
> > +               __llvm_prf_cnts_start = .;                              \
> > +               KEEP(*(__llvm_prf_cnts))                                \
> > +               . = ALIGN(8);                                           \
> > +               __llvm_prf_cnts_end = .;                                \
> > +       }                                                               \
> > +       __llvm_prf_names : AT(ADDR(__llvm_prf_names) - LOAD_OFFSET) {   \
> > +               . = ALIGN(8);                                           \
> > +               __llvm_prf_names_start = .;                             \
> > +               KEEP(*(__llvm_prf_names))                               \
> > +               . = ALIGN(8);                                           \
> > +               __llvm_prf_names_end = .;                               \
> > +               . = ALIGN(8);                                           \
> > +       }                                                               \
> > +       __llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) {     \
> > +               __llvm_prf_vals_start = .;                              \
> > +               KEEP(*(__llvm_prf_vals))                                \
> > +               . = ALIGN(8);                                           \
> > +               __llvm_prf_vals_end = .;                                \
> > +               . = ALIGN(8);                                           \
> > +       }                                                               \
> > +       __llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) {     \
> > +               __llvm_prf_vnds_start = .;                              \
> > +               KEEP(*(__llvm_prf_vnds))                                \
> > +               . = ALIGN(8);                                           \
> > +               __llvm_prf_vnds_end = .;                                \
> > +               __llvm_prf_end = .;                                     \
> > +       }
> > +#else
> > +#define PGO_CLANG_DATA
> > +#endif
> > +
> >  #define KERNEL_DTB()                                                   \
> >         STRUCT_ALIGN();                                                 \
> >         __dtb_start = .;                                                \
> > @@ -1125,6 +1168,7 @@
> >                 CONSTRUCTORS                                            \
> >         }                                                               \
> >         BUG_TABLE                                                       \
> > +       PGO_CLANG_DATA
> >
> >  #define INIT_TEXT_SECTION(inittext_align)                              \
> >         . = ALIGN(inittext_align);                                      \
> > diff --git a/kernel/Makefile b/kernel/Makefile
> > index aa7368c7eabf3..0b34ca228ba46 100644
> > --- a/kernel/Makefile
> > +++ b/kernel/Makefile
> > @@ -111,6 +111,7 @@ obj-$(CONFIG_BPF) += bpf/
> >  obj-$(CONFIG_KCSAN) += kcsan/
> >  obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
> >  obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o
> > +obj-$(CONFIG_PGO_CLANG) += pgo/
> >
> >  obj-$(CONFIG_PERF_EVENTS) += events/
> >
> > diff --git a/kernel/pgo/Kconfig b/kernel/pgo/Kconfig
> > new file mode 100644
> > index 0000000000000..318d36bb3d106
> > --- /dev/null
> > +++ b/kernel/pgo/Kconfig
> > @@ -0,0 +1,34 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +menu "Profile Guided Optimization (PGO) (EXPERIMENTAL)"
> > +
> > +config ARCH_SUPPORTS_PGO_CLANG
> > +       bool
> > +
> > +config PGO_CLANG
> > +       bool "Enable clang's PGO-based kernel profiling"
> > +       depends on DEBUG_FS
> > +       depends on ARCH_SUPPORTS_PGO_CLANG
> > +       help
> > +         This option enables clang's PGO (Profile Guided Optimization) based
> > +         code profiling to better optimize the kernel.
> > +
> > +         If unsure, say N.
> > +
> > +         Run a representative workload for your application on a kernel
> > +         compiled with this option and download the raw profile file from
> > +         /sys/kernel/debug/pgo/profraw. This file needs to be processed with
> > +         llvm-profdata. It may be merged with other collected raw profiles.
> > +
> > +         Copy the resulting profile file into vmlinux.profdata, and enable
> > +         KCFLAGS=-fprofile-use=vmlinux.profdata to produce an optimized
> > +         kernel.
> > +
> > +         Note that a kernel compiled with profiling flags will be
> > +         significatnly larger and run slower. Also be sure to exclude files
> > +         from profiling which are not linked to the kernel image to prevent
> > +         linker errors.
> > +
> > +         Note that the debugfs filesystem has to be mounted to access
> > +         profiling data.
> > +
> > +endmenu
> > diff --git a/kernel/pgo/Makefile b/kernel/pgo/Makefile
> > new file mode 100644
> > index 0000000000000..41e27cefd9a47
> > --- /dev/null
> > +++ b/kernel/pgo/Makefile
> > @@ -0,0 +1,5 @@
> > +# SPDX-License-Identifier: GPL-2.0
> > +GCOV_PROFILE   := n
> > +PGO_PROFILE    := n
> > +
> > +obj-y  += fs.o instrument.o
> > diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c
> > new file mode 100644
> > index 0000000000000..790a8df037bfc
> > --- /dev/null
> > +++ b/kernel/pgo/fs.c
> > @@ -0,0 +1,382 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (C) 2019 Google, Inc.
> > + *
> > + * Author:
> > + *     Sami Tolvanen <samitolvanen@google.com>
> > + *
> > + * This software is licensed under the terms of the GNU General Public
> > + * License version 2, as published by the Free Software Foundation, and
> > + * may be copied, distributed, and modified under those terms.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + */
> > +
> > +#define pr_fmt(fmt)    "pgo: " fmt
> > +
> > +#include <linux/kernel.h>
> > +#include <linux/debugfs.h>
> > +#include <linux/fs.h>
> > +#include <linux/module.h>
> > +#include <linux/slab.h>
> > +#include <linux/vmalloc.h>
> > +#include "pgo.h"
> > +
> > +static struct dentry *directory;
> > +
> > +struct prf_private_data {
> > +       void *buffer;
> > +       unsigned long size;
> > +};
> > +
> > +/*
> > + * Raw profile data format:
> > + *
> > + *     - llvm_prf_header
> > + *     - __llvm_prf_data
> > + *     - __llvm_prf_cnts
> > + *     - __llvm_prf_names
> > + *     - zero padding to 8 bytes
> > + *     - for each llvm_prf_data in __llvm_prf_data:
> > + *             - llvm_prf_value_data
> > + *                     - llvm_prf_value_record + site count array
> > + *                             - llvm_prf_value_node_data
> > + *                             ...
> > + *                     ...
> > + *             ...
> > + */
> > +
> > +static void prf_fill_header(void **buffer)
> > +{
> > +       struct llvm_prf_header *header = *(struct llvm_prf_header **)buffer;
> > +
> > +       header->magic = LLVM_PRF_MAGIC;
> > +       header->version = LLVM_PRF_VARIANT_MASK_IR | LLVM_PRF_VERSION;
> > +       header->data_size = prf_data_count();
> > +       header->padding_bytes_before_counters = 0;
> > +       header->counters_size = prf_cnts_count();
> > +       header->padding_bytes_after_counters = 0;
> > +       header->names_size = prf_names_count();
> > +       header->counters_delta = (u64)__llvm_prf_cnts_start;
> > +       header->names_delta = (u64)__llvm_prf_names_start;
> > +       header->value_kind_last = LLVM_PRF_IPVK_LAST;
> > +
> > +       *buffer += sizeof(*header);
> > +}
> > +
> > +/*
> > + * Copy the source into the buffer, incrementing the pointer into buffer in the
> > + * process.
> > + */
> > +static void prf_copy_to_buffer(void **buffer, void *src, unsigned long size)
> > +{
> > +       memcpy(*buffer, src, size);
> > +       *buffer += size;
> > +}
> > +
> > +static u32 __prf_get_value_size(struct llvm_prf_data *p, u32 *value_kinds)
> > +{
> > +       struct llvm_prf_value_node **nodes =
> > +               (struct llvm_prf_value_node **)p->values;
> > +       u32 kinds = 0;
> > +       u32 size = 0;
> > +       unsigned int kind;
> > +       unsigned int n;
> > +       unsigned int s = 0;
> > +
> > +       for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> > +               unsigned int sites = p->num_value_sites[kind];
> > +
> > +               if (!sites)
> > +                       continue;
> > +
> > +               /* Record + site count array */
> > +               size += prf_get_value_record_size(sites);
> > +               kinds++;
> > +
> > +               if (!nodes)
> > +                       continue;
> > +
> > +               for (n = 0; n < sites; n++) {
> > +                       u32 count = 0;
> > +                       struct llvm_prf_value_node *site = nodes[s + n];
> > +
> > +                       while (site && ++count <= U8_MAX)
> > +                               site = site->next;
> > +
> > +                       size += count *
> > +                               sizeof(struct llvm_prf_value_node_data);
> > +               }
> > +
> > +               s += sites;
> > +       }
> > +
> > +       if (size)
> > +               size += sizeof(struct llvm_prf_value_data);
> > +
> > +       if (value_kinds)
> > +               *value_kinds = kinds;
> > +
> > +       return size;
> > +}
> > +
> > +static u32 prf_get_value_size(void)
> > +{
> > +       u32 size = 0;
> > +       struct llvm_prf_data *p;
> > +
> > +       for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> > +               size += __prf_get_value_size(p, NULL);
> > +
> > +       return size;
> > +}
> > +
> > +/* Serialize the profiling's value. */
> > +static void prf_serialize_value(struct llvm_prf_data *p, void **buffer)
> > +{
> > +       struct llvm_prf_value_data header;
> > +       struct llvm_prf_value_node **nodes =
> > +               (struct llvm_prf_value_node **)p->values;
> > +       unsigned int kind;
> > +       unsigned int n;
> > +       unsigned int s = 0;
> > +
> > +       header.total_size = __prf_get_value_size(p, &header.num_value_kinds);
> > +
> > +       if (!header.num_value_kinds)
> > +               /* Nothing to write. */
> > +               return;
> > +
> > +       prf_copy_to_buffer(buffer, &header, sizeof(header));
> > +
> > +       for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> > +               struct llvm_prf_value_record *record;
> > +               u8 *counts;
> > +               unsigned int sites = p->num_value_sites[kind];
> > +
> > +               if (!sites)
> > +                       continue;
> > +
> > +               /* Profiling value record. */
> > +               record = *(struct llvm_prf_value_record **)buffer;
> > +               *buffer += prf_get_value_record_header_size();
> > +
> > +               record->kind = kind;
> > +               record->num_value_sites = sites;
> > +
> > +               /* Site count array. */
> > +               counts = *(u8 **)buffer;
> > +               *buffer += prf_get_value_record_site_count_size(sites);
> > +
> > +               /*
> > +                * If we don't have nodes, we can skip updating the site count
> > +                * array, because the buffer is zero filled.
> > +                */
> > +               if (!nodes)
> > +                       continue;
> > +
> > +               for (n = 0; n < sites; n++) {
> > +                       u32 count = 0;
> > +                       struct llvm_prf_value_node *site = nodes[s + n];
> > +
> > +                       while (site && ++count <= U8_MAX) {
> > +                               prf_copy_to_buffer(buffer, site,
> > +                                                  sizeof(struct llvm_prf_value_node_data));
> > +                               site = site->next;
> > +                       }
> > +
> > +                       counts[n] = (u8)count;
> > +               }
> > +
> > +               s += sites;
> > +       }
> > +}
> > +
> > +static void prf_serialize_values(void **buffer)
> > +{
> > +       struct llvm_prf_data *p;
> > +
> > +       for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> > +               prf_serialize_value(p, buffer);
> > +}
> > +
> > +static inline unsigned long prf_get_padding(unsigned long size)
> > +{
> > +       return 8 - (size % 8);
> > +}
> > +
> > +static unsigned long prf_buffer_size(void)
> > +{
> > +       return sizeof(struct llvm_prf_header) +
> > +                       prf_data_size() +
> > +                       prf_cnts_size() +
> > +                       prf_names_size() +
> > +                       prf_get_padding(prf_names_size()) +
> > +                       prf_get_value_size();
> > +}
> > +
> > +/* Serialize the profling data into a format LLVM's tools can understand. */
> > +static int prf_serialize(struct prf_private_data *p)
> > +{
> > +       int err = 0;
> > +       void *buffer;
> > +
> > +       p->size = prf_buffer_size();
> > +       p->buffer = vzalloc(p->size);
> > +
> > +       if (!p->buffer) {
> > +               err = -ENOMEM;
> > +               goto out;
> > +       }
> > +
> > +       buffer = p->buffer;
> > +
> > +       prf_fill_header(&buffer);
> > +       prf_copy_to_buffer(&buffer, __llvm_prf_data_start,  prf_data_size());
> > +       prf_copy_to_buffer(&buffer, __llvm_prf_cnts_start,  prf_cnts_size());
> > +       prf_copy_to_buffer(&buffer, __llvm_prf_names_start, prf_names_size());
> > +       buffer += prf_get_padding(prf_names_size());
> > +
> > +       prf_serialize_values(&buffer);
> > +
> > +out:
> > +       return err;
> > +}
> > +
> > +/* open() implementation for PGO. Creates a copy of the profiling data set. */
> > +static int prf_open(struct inode *inode, struct file *file)
> > +{
> > +       struct prf_private_data *data;
> > +       unsigned long flags;
> > +       int err;
> > +
> > +       data = kzalloc(sizeof(*data), GFP_KERNEL);
> > +       if (!data) {
> > +               err = -ENOMEM;
> > +               goto out;
> > +       }
> > +
> > +       flags = prf_lock();
> > +
> > +       err = prf_serialize(data);
> > +       if (err) {
> > +               kfree(data);
> > +               goto out_unlock;
> > +       }
> > +
> > +       file->private_data = data;
> > +
> > +out_unlock:
> > +       prf_unlock(flags);
> > +out:
> > +       return err;
> > +}
> > +
> > +/* read() implementation for PGO. */
> > +static ssize_t prf_read(struct file *file, char __user *buf, size_t count,
> > +                       loff_t *ppos)
> > +{
> > +       struct prf_private_data *data = file->private_data;
> > +
> > +       BUG_ON(!data);
> > +
> > +       return simple_read_from_buffer(buf, count, ppos, data->buffer,
> > +                                      data->size);
> > +}
> > +
> > +/* release() implementation for PGO. Release resources allocated by open(). */
> > +static int prf_release(struct inode *inode, struct file *file)
> > +{
> > +       struct prf_private_data *data = file->private_data;
> > +
> > +       if (data) {
> > +               vfree(data->buffer);
> > +               kfree(data);
> > +       }
> > +
> > +       return 0;
> > +}
> > +
> > +static const struct file_operations prf_fops = {
> > +       .owner          = THIS_MODULE,
> > +       .open           = prf_open,
> > +       .read           = prf_read,
> > +       .llseek         = default_llseek,
> > +       .release        = prf_release
> > +};
> > +
> > +/* write() implementation for resetting PGO's profile data. */
> > +static ssize_t reset_write(struct file *file, const char __user *addr,
> > +                          size_t len, loff_t *pos)
> > +{
> > +       struct llvm_prf_data *data;
> > +
> > +       memset(__llvm_prf_cnts_start, 0, prf_cnts_size());
> > +
> > +       for (data = __llvm_prf_data_start; data < __llvm_prf_data_end; ++data) {
> > +               struct llvm_prf_value_node **vnodes;
> > +               u64 current_vsite_count;
> > +               u32 i;
> > +
> > +               if (!data->values)
> > +                       continue;
> > +
> > +               current_vsite_count = 0;
> > +               vnodes = (struct llvm_prf_value_node **)data->values;
> > +
> > +               for (i = LLVM_PRF_IPVK_FIRST; i <= LLVM_PRF_IPVK_LAST; ++i)
> > +                       current_vsite_count += data->num_value_sites[i];
> > +
> > +               for (i = 0; i < current_vsite_count; ++i) {
> > +                       struct llvm_prf_value_node *current_vnode = vnodes[i];
> > +
> > +                       while (current_vnode) {
> > +                               current_vnode->count = 0;
> > +                               current_vnode = current_vnode->next;
> > +                       }
> > +               }
> > +       }
> > +
> > +       return len;
> > +}
> > +
> > +static const struct file_operations prf_reset_fops = {
> > +       .owner          = THIS_MODULE,
> > +       .write          = reset_write,
> > +       .llseek         = noop_llseek,
> > +};
> > +
> > +/* Create debugfs entries. */
> > +static int __init pgo_init(void)
> > +{
> > +       directory = debugfs_create_dir("pgo", NULL);
> > +       if (!directory)
> > +               goto err_remove;
> > +
> > +       if (!debugfs_create_file("profraw", 0600, directory, NULL,
> > +                                &prf_fops))
> > +               goto err_remove;
> > +
> > +       if (!debugfs_create_file("reset", 0200, directory, NULL,
> > +                                &prf_reset_fops))
> > +               goto err_remove;
> > +
> > +       return 0;
> > +
> > +err_remove:
> > +       pr_err("initialization failed\n");
> > +       return -EIO;
> > +}
> > +
> > +/* Remove debufs entries. */
> > +static void __exit pgo_exit(void)
> > +{
> > +       debugfs_remove_recursive(directory);
> > +}
> > +
> > +module_init(pgo_init);
> > +module_exit(pgo_exit);
> > diff --git a/kernel/pgo/instrument.c b/kernel/pgo/instrument.c
> > new file mode 100644
> > index 0000000000000..d96b61a1cf712
> > --- /dev/null
> > +++ b/kernel/pgo/instrument.c
> > @@ -0,0 +1,147 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (C) 2019 Google, Inc.
> > + *
> > + * Author:
> > + *     Sami Tolvanen <samitolvanen@google.com>
> > + *
> > + * This software is licensed under the terms of the GNU General Public
> > + * License version 2, as published by the Free Software Foundation, and
> > + * may be copied, distributed, and modified under those terms.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + */
> > +
> > +#define pr_fmt(fmt)    "pgo: " fmt
> > +
> > +#include <linux/kernel.h>
> > +#include <linux/export.h>
> > +#include <linux/spinlock.h>
> > +#include <linux/types.h>
> > +#include "pgo.h"
> > +
> > +/* Lock guarding value node access and serialization. */
> > +static DEFINE_SPINLOCK(pgo_lock);
> > +static int current_node;
> > +
> > +unsigned long prf_lock(void)
> > +{
> > +       unsigned long flags;
> > +
> > +       spin_lock_irqsave(&pgo_lock, flags);
> > +
> > +       return flags;
> > +}
> > +
> > +void prf_unlock(unsigned long flags)
> > +{
> > +       spin_unlock_irqrestore(&pgo_lock, flags);
> > +}
> > +
> > +/*
> > + * Return a newly allocated profiling value node which contains the tracked
> > + * value by the value profiler.
> > + * Note: caller *must* hold pgo_lock.
> > + */
> > +static struct llvm_prf_value_node *allocate_node(struct llvm_prf_data *p,
> > +                                                u32 index, u64 value)
> > +{
> > +       if (&__llvm_prf_vnds_start[current_node + 1] >= __llvm_prf_vnds_end)
> > +               return NULL; /* Out of nodes */
> > +
> > +       current_node++;
> > +
> > +       /* Make sure the node is entirely within the section */
> > +       if (&__llvm_prf_vnds_start[current_node] >= __llvm_prf_vnds_end ||
> > +           &__llvm_prf_vnds_start[current_node + 1] > __llvm_prf_vnds_end)
> > +               return NULL;
> > +
> > +       return &__llvm_prf_vnds_start[current_node];
> > +}
> > +
> > +/*
> > + * Counts the number of times a target value is seen.
> > + *
> > + * Records the target value for the CounterIndex if not seen before. Otherwise,
> > + * increments the counter associated w/ the target value.
> > + */
> > +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index)
> > +{
> > +       struct llvm_prf_data *p = (struct llvm_prf_data *)data;
> > +       struct llvm_prf_value_node **counters;
> > +       struct llvm_prf_value_node *curr;
> > +       struct llvm_prf_value_node *min = NULL;
> > +       struct llvm_prf_value_node *prev = NULL;
> > +       u64 min_count = U64_MAX;
> > +       u8 values = 0;
> > +       unsigned long flags;
> > +
> > +       if (!p || !p->values)
> > +               return;
> > +
> > +       counters = (struct llvm_prf_value_node **)p->values;
> > +       curr = counters[index];
> > +
> > +       while (curr) {
> > +               if (target_value == curr->value) {
> > +                       curr->count++;
> > +                       return;
> > +               }
> > +
> > +               if (curr->count < min_count) {
> > +                       min_count = curr->count;
> > +                       min = curr;
> > +               }
> > +
> > +               prev = curr;
> > +               curr = curr->next;
> > +               values++;
> > +       }
> > +
> > +       if (values >= LLVM_PRF_MAX_NUM_VALS_PER_SITE) {
> > +               if (!min->count || !(--min->count)) {
> > +                       curr = min;
> > +                       curr->value = target_value;
> > +                       curr->count++;
> > +               }
> > +               return;
> > +       }
> > +
> > +       /* Lock when updating the value node structure. */
> > +       flags = prf_lock();
> > +
> > +       curr = allocate_node(p, index, target_value);
> > +       if (!curr)
> > +               goto out;
> > +
> > +       curr->value = target_value;
> > +       curr->count++;
> > +
> > +       if (!counters[index])
> > +               counters[index] = curr;
> > +       else if (prev && !prev->next)
> > +               prev->next = curr;
> > +
> > +out:
> > +       prf_unlock(flags);
> > +}
> > +EXPORT_SYMBOL(__llvm_profile_instrument_target);
> > +
> > +/* Counts the number of times a range of targets values are seen. */
> > +void __llvm_profile_instrument_range(u64 target_value, void *data,
> > +                                    u32 index, s64 precise_start,
> > +                                    s64 precise_last, s64 large_value)
> > +{
> > +       if (large_value != S64_MIN && (s64)target_value >= large_value)
> > +               target_value = large_value;
> > +       else if ((s64)target_value < precise_start ||
> > +                (s64)target_value > precise_last)
> > +               target_value = precise_last + 1;
> > +
> > +       __llvm_profile_instrument_target(target_value, data, index);
> > +}
> > +EXPORT_SYMBOL(__llvm_profile_instrument_range);
> > diff --git a/kernel/pgo/pgo.h b/kernel/pgo/pgo.h
> > new file mode 100644
> > index 0000000000000..df0aa278f28bd
> > --- /dev/null
> > +++ b/kernel/pgo/pgo.h
> > @@ -0,0 +1,206 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * Copyright (C) 2019 Google, Inc.
> > + *
> > + * Author:
> > + *     Sami Tolvanen <samitolvanen@google.com>
> > + *
> > + * This software is licensed under the terms of the GNU General Public
> > + * License version 2, as published by the Free Software Foundation, and
> > + * may be copied, distributed, and modified under those terms.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + */
> > +
> > +#ifndef _PGO_H
> > +#define _PGO_H
> > +
> > +/*
> > + * Note: These internal LLVM definitions must match the compiler version.
> > + * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code.
> > + */
> > +
> > +#ifdef CONFIG_64BIT
> > +       #define LLVM_PRF_MAGIC          \
> > +               ((u64)255 << 56 |       \
> > +                (u64)'l' << 48 |       \
> > +                (u64)'p' << 40 |       \
> > +                (u64)'r' << 32 |       \
> > +                (u64)'o' << 24 |       \
> > +                (u64)'f' << 16 |       \
> > +                (u64)'r' << 8  |       \
> > +                (u64)129)
> > +#else
> > +       #define LLVM_PRF_MAGIC          \
> > +               ((u64)255 << 56 |       \
> > +                (u64)'l' << 48 |       \
> > +                (u64)'p' << 40 |       \
> > +                (u64)'r' << 32 |       \
> > +                (u64)'o' << 24 |       \
> > +                (u64)'f' << 16 |       \
> > +                (u64)'R' << 8  |       \
> > +                (u64)129)
> > +#endif
> > +
> > +#define LLVM_PRF_VERSION               5
> > +#define LLVM_PRF_DATA_ALIGN            8
> > +#define LLVM_PRF_IPVK_FIRST            0
> > +#define LLVM_PRF_IPVK_LAST             1
> > +#define LLVM_PRF_MAX_NUM_VALS_PER_SITE 16
> > +
> > +#define LLVM_PRF_VARIANT_MASK_IR       (0x1ull << 56)
> > +#define LLVM_PRF_VARIANT_MASK_CSIR     (0x1ull << 57)
> > +
> > +/**
> > + * struct llvm_prf_header - represents the raw profile header data structure.
> > + * @magic: the magic token for the file format.
> > + * @version: the version of the file format.
> > + * @data_size: the number of entries in the profile data section.
> > + * @padding_bytes_before_counters: the number of padding bytes before the
> > + *   counters.
> > + * @counters_size: the size in bytes of the LLVM profile section containing the
> > + *   counters.
> > + * @padding_bytes_after_counters: the number of padding bytes after the
> > + *   counters.
> > + * @names_size: the size in bytes of the LLVM profile section containing the
> > + *   counters' names.
> > + * @counters_delta: the beginning of the LLMV profile counters section.
> > + * @names_delta: the beginning of the LLMV profile names section.
> > + * @value_kind_last: the last profile value kind.
> > + */
> > +struct llvm_prf_header {
> > +       u64 magic;
> > +       u64 version;
> > +       u64 data_size;
> > +       u64 padding_bytes_before_counters;
> > +       u64 counters_size;
> > +       u64 padding_bytes_after_counters;
> > +       u64 names_size;
> > +       u64 counters_delta;
> > +       u64 names_delta;
> > +       u64 value_kind_last;
> > +};
> > +
> > +/**
> > + * struct llvm_prf_data - represents the per-function control structure.
> > + * @name_ref: the reference to the function's name.
> > + * @func_hash: the hash value of the function.
> > + * @counter_ptr: a pointer to the profile counter.
> > + * @function_ptr: a pointer to the function.
> > + * @values: the profiling values associated with this function.
> > + * @num_counters: the number of counters in the function.
> > + * @num_value_sites: the number of value profile sites.
> > + */
> > +struct llvm_prf_data {
> > +       const u64 name_ref;
> > +       const u64 func_hash;
> > +       const void *counter_ptr;
> > +       const void *function_ptr;
> > +       void *values;
> > +       const u32 num_counters;
> > +       const u16 num_value_sites[LLVM_PRF_IPVK_LAST + 1];
> > +} __aligned(LLVM_PRF_DATA_ALIGN);
> > +
> > +/**
> > + * structure llvm_prf_value_node_data - represents the data part of the struct
> > + *   llvm_prf_value_node data structure.
> > + * @value: the value counters.
> > + * @count: the counters' count.
> > + */
> > +struct llvm_prf_value_node_data {
> > +       u64 value;
> > +       u64 count;
> > +};
> > +
> > +/**
> > + * struct llvm_prf_value_node - represents an internal data structure used by
> > + *   the value profiler.
> > + * @value: the value counters.
> > + * @count: the counters' count.
> > + * @next: the next value node.
> > + */
> > +struct llvm_prf_value_node {
> > +       u64 value;
> > +       u64 count;
> > +       struct llvm_prf_value_node *next;
> > +};
> > +
> > +/**
> > + * struct llvm_prf_value_data - represents the value profiling data in indexed
> > + *   format.
> > + * @total_size: the total size in bytes including this field.
> > + * @num_value_kinds: the number of value profile kinds that has value profile
> > + *   data.
> > + */
> > +struct llvm_prf_value_data {
> > +       u32 total_size;
> > +       u32 num_value_kinds;
> > +};
> > +
> > +/**
> > + * struct llvm_prf_value_record - represents the on-disk layout of the value
> > + *   profile data of a particular kind for one function.
> > + * @kind: the kind of the value profile record.
> > + * @num_value_sites: the number of value profile sites.
> > + * @site_count_array: the first element of the array that stores the number
> > + *   of profiled values for each value site.
> > + */
> > +struct llvm_prf_value_record {
> > +       u32 kind;
> > +       u32 num_value_sites;
> > +       u8 site_count_array[];
> > +};
> > +
> > +#define prf_get_value_record_header_size()             \
> > +       offsetof(struct llvm_prf_value_record, site_count_array)
> > +#define prf_get_value_record_site_count_size(sites)    \
> > +       roundup((sites), 8)
> > +#define prf_get_value_record_size(sites)               \
> > +       (prf_get_value_record_header_size() +           \
> > +        prf_get_value_record_site_count_size((sites)))
> > +
> > +/* Data sections */
> > +extern struct llvm_prf_data __llvm_prf_data_start[];
> > +extern struct llvm_prf_data __llvm_prf_data_end[];
> > +
> > +extern u64 __llvm_prf_cnts_start[];
> > +extern u64 __llvm_prf_cnts_end[];
> > +
> > +extern char __llvm_prf_names_start[];
> > +extern char __llvm_prf_names_end[];
> > +
> > +extern struct llvm_prf_value_node __llvm_prf_vnds_start[];
> > +extern struct llvm_prf_value_node __llvm_prf_vnds_end[];
> > +
> > +/* Locking for vnodes */
> > +extern unsigned long prf_lock(void);
> > +extern void prf_unlock(unsigned long flags);
> > +
> > +#define __DEFINE_PRF_SIZE(s) \
> > +       static inline unsigned long prf_ ## s ## _size(void)            \
> > +       {                                                               \
> > +               unsigned long start =                                   \
> > +                       (unsigned long)__llvm_prf_ ## s ## _start;      \
> > +               unsigned long end =                                     \
> > +                       (unsigned long)__llvm_prf_ ## s ## _end;        \
> > +               return roundup(end - start,                             \
> > +                               sizeof(__llvm_prf_ ## s ## _start[0])); \
> > +       }                                                               \
> > +       static inline unsigned long prf_ ## s ## _count(void)           \
> > +       {                                                               \
> > +               return prf_ ## s ## _size() /                           \
> > +                       sizeof(__llvm_prf_ ## s ## _start[0]);          \
> > +       }
> > +
> > +__DEFINE_PRF_SIZE(data);
> > +__DEFINE_PRF_SIZE(cnts);
> > +__DEFINE_PRF_SIZE(names);
> > +__DEFINE_PRF_SIZE(vnds);
> > +
> > +#undef __DEFINE_PRF_SIZE
> > +
> > +#endif /* _PGO_H */
> > diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> > index 213677a5ed33e..9b218afb5cb87 100644
> > --- a/scripts/Makefile.lib
> > +++ b/scripts/Makefile.lib
> > @@ -143,6 +143,16 @@ _c_flags += $(if $(patsubst n%,, \
> >                 $(CFLAGS_GCOV))
> >  endif
> >
> > +#
> > +# Enable clang's PGO profiling flags for a file or directory depending on
> > +# variables PGO_PROFILE_obj.o and PGO_PROFILE.
> > +#
> > +ifeq ($(CONFIG_PGO_CLANG),y)
> > +_c_flags += $(if $(patsubst n%,, \
> > +               $(PGO_PROFILE_$(basetarget).o)$(PGO_PROFILE)y), \
> > +               $(CFLAGS_PGO_CLANG))
> > +endif
> > +
> >  #
> >  # Enable address sanitizer flags for kernel except some files or directories
> >  # we don't want to check (depends on variables KASAN_SANITIZE_obj.o, KASAN_SANITIZE)
> > --
> > 2.30.0.284.gd98b1dd5eaa7-goog
> >
> > --
> > You received this message because you are subscribed to the Google Groups "Clang Built Linux" group.
> > To unsubscribe from this group and stop receiving emails from it, send an email to clang-built-linux+unsubscribe@googlegroups.com.
> > To view this discussion on the web visit https://groups.google.com/d/msgid/clang-built-linux/20210111081821.3041587-1-morbo%40google.com.

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH] pgo: add clang's Profile Guided Optimization infrastructure
  2021-01-11  8:39 ` Sedat Dilek
  2021-01-11  8:42   ` Sedat Dilek
@ 2021-01-11  9:17   ` Bill Wendling
  2021-01-11  9:57     ` Sedat Dilek
  1 sibling, 1 reply; 116+ messages in thread
From: Bill Wendling @ 2021-01-11  9:17 UTC (permalink / raw)
  To: Sedat Dilek
  Cc: Jonathan Corbet, Masahiro Yamada, linux-doc, LKML,
	Linux Kbuild mailing list, Clang-Built-Linux ML, Andrew Morton,
	Nathan Chancellor, Nick Desaulniers, Sami Tolvanen

On Mon, Jan 11, 2021 at 12:39 AM Sedat Dilek <sedat.dilek@gmail.com> wrote:
>
> On Mon, Jan 11, 2021 at 9:18 AM 'Bill Wendling' via Clang Built Linux
> <clang-built-linux@googlegroups.com> wrote:
> >
> > From: Sami Tolvanen <samitolvanen@google.com>
> >
> > Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > profile, the kernel is instrumented with PGO counters, a representative
> > workload is run, and the raw profile data is collected from
> > /sys/kernel/debug/pgo/profraw.
> >
> > The raw profile data must be processed by clang's "llvm-profdata" tool before
> > it can be used during recompilation:
> >
> >   $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> >   $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> >
> > Multiple raw profiles may be merged during this step.
> >
> > The data can be used either by the compiler if LTO isn't enabled:
> >
> >     ... -fprofile-use=vmlinux.profdata ...
> >
> > or by LLD if LTO is enabled:
> >
> >     ... -lto-cs-profile-file=vmlinux.profdata ...
> >
> > This initial submission is restricted to x86, as that's the platform we know
> > works. This restriction can be lifted once other platforms have been verified
> > to work with PGO.
> >
> > Note that this method of profiling the kernel is clang-native and isn't
> > compatible with clang's gcov support in kernel/gcov.
> >
> > [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> >
>
> Hi Bill and Sami,
>
> I have seen the pull-request in the CBL issue tracker and had some
> questions in mind.
>
> Good you send this.
>
> First of all, I like to fetch any development stuff easily from a Git
> repository.

The version in the pull-request in the CBL issue tracker is roughly
the same as this patch. (There are some changes, but they aren't
functionality changes.)

> Can you offer this, please?
> What is the base for your work?
> I hope this is (fresh released) Linux v5.11-rc3.
>
This patch (and the PR on the CBL issue tracker) are from top-of-tree Linux.

> I myself had some experiences with a PGO + ThinLTO optimized LLVM
> toolchain built with the help of tc-build.
> Here it takes very long to build it.
>
> This means I have some profile-data archived.
> Can I use it?
>
LLVM is more tolerant of "stale" profile data than gcov, so it's
possible that your archived profile data would still work, but I can't
guarantee that it will be better than using new profile data.

> Is an own PGO + ThinLTO optimized LLVM toolchain pre-requirement for
> this or not?
> That is one of my important questions.
>
Do you mean that the LLVM tools (clang, llc, etc.) are compiled with
PGO + ThinLTO?

-bw

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH] pgo: add clang's Profile Guided Optimization infrastructure
  2021-01-11  9:17   ` Bill Wendling
@ 2021-01-11  9:57     ` Sedat Dilek
  2021-01-11 18:28       ` Nathan Chancellor
  0 siblings, 1 reply; 116+ messages in thread
From: Sedat Dilek @ 2021-01-11  9:57 UTC (permalink / raw)
  To: Bill Wendling
  Cc: Jonathan Corbet, Masahiro Yamada, linux-doc, LKML,
	Linux Kbuild mailing list, Clang-Built-Linux ML, Andrew Morton,
	Nathan Chancellor, Nick Desaulniers, Sami Tolvanen

On Mon, Jan 11, 2021 at 10:17 AM Bill Wendling <morbo@google.com> wrote:
>
> On Mon, Jan 11, 2021 at 12:39 AM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> >
> > On Mon, Jan 11, 2021 at 9:18 AM 'Bill Wendling' via Clang Built Linux
> > <clang-built-linux@googlegroups.com> wrote:
> > >
> > > From: Sami Tolvanen <samitolvanen@google.com>
> > >
> > > Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > > profile, the kernel is instrumented with PGO counters, a representative
> > > workload is run, and the raw profile data is collected from
> > > /sys/kernel/debug/pgo/profraw.
> > >
> > > The raw profile data must be processed by clang's "llvm-profdata" tool before
> > > it can be used during recompilation:
> > >
> > >   $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > >   $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > >
> > > Multiple raw profiles may be merged during this step.
> > >
> > > The data can be used either by the compiler if LTO isn't enabled:
> > >
> > >     ... -fprofile-use=vmlinux.profdata ...
> > >
> > > or by LLD if LTO is enabled:
> > >
> > >     ... -lto-cs-profile-file=vmlinux.profdata ...
> > >
> > > This initial submission is restricted to x86, as that's the platform we know
> > > works. This restriction can be lifted once other platforms have been verified
> > > to work with PGO.
> > >
> > > Note that this method of profiling the kernel is clang-native and isn't
> > > compatible with clang's gcov support in kernel/gcov.
> > >
> > > [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> > >
> >
> > Hi Bill and Sami,
> >
> > I have seen the pull-request in the CBL issue tracker and had some
> > questions in mind.
> >
> > Good you send this.
> >
> > First of all, I like to fetch any development stuff easily from a Git
> > repository.
>
> The version in the pull-request in the CBL issue tracker is roughly
> the same as this patch. (There are some changes, but they aren't
> functionality changes.)
>
> > Can you offer this, please?
> > What is the base for your work?
> > I hope this is (fresh released) Linux v5.11-rc3.
> >
> This patch (and the PR on the CBL issue tracker) are from top-of-tree Linux.
>
> > I myself had some experiences with a PGO + ThinLTO optimized LLVM
> > toolchain built with the help of tc-build.
> > Here it takes very long to build it.
> >
> > This means I have some profile-data archived.
> > Can I use it?
> >
> LLVM is more tolerant of "stale" profile data than gcov, so it's
> possible that your archived profile data would still work, but I can't
> guarantee that it will be better than using new profile data.
>
> > Is an own PGO + ThinLTO optimized LLVM toolchain pre-requirement for
> > this or not?
> > That is one of my important questions.
> >
> Do you mean that the LLVM tools (clang, llc, etc.) are compiled with
> PGO + ThinLTO?
>

Yes.

- Sedat -

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH] pgo: add clang's Profile Guided Optimization infrastructure
  2021-01-11  9:57     ` Sedat Dilek
@ 2021-01-11 18:28       ` Nathan Chancellor
  0 siblings, 0 replies; 116+ messages in thread
From: Nathan Chancellor @ 2021-01-11 18:28 UTC (permalink / raw)
  To: Sedat Dilek
  Cc: Bill Wendling, Jonathan Corbet, Masahiro Yamada, linux-doc, LKML,
	Linux Kbuild mailing list, Clang-Built-Linux ML, Andrew Morton,
	Nick Desaulniers, Sami Tolvanen

On Mon, Jan 11, 2021 at 10:57:35AM +0100, Sedat Dilek wrote:
> On Mon, Jan 11, 2021 at 10:17 AM Bill Wendling <morbo@google.com> wrote:
> >
> > On Mon, Jan 11, 2021 at 12:39 AM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> > >
> > > On Mon, Jan 11, 2021 at 9:18 AM 'Bill Wendling' via Clang Built Linux
> > > <clang-built-linux@googlegroups.com> wrote:
> > > >
> > > > From: Sami Tolvanen <samitolvanen@google.com>
> > > >
> > > > Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > > > profile, the kernel is instrumented with PGO counters, a representative
> > > > workload is run, and the raw profile data is collected from
> > > > /sys/kernel/debug/pgo/profraw.
> > > >
> > > > The raw profile data must be processed by clang's "llvm-profdata" tool before
> > > > it can be used during recompilation:
> > > >
> > > >   $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > > >   $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > > >
> > > > Multiple raw profiles may be merged during this step.
> > > >
> > > > The data can be used either by the compiler if LTO isn't enabled:
> > > >
> > > >     ... -fprofile-use=vmlinux.profdata ...
> > > >
> > > > or by LLD if LTO is enabled:
> > > >
> > > >     ... -lto-cs-profile-file=vmlinux.profdata ...
> > > >
> > > > This initial submission is restricted to x86, as that's the platform we know
> > > > works. This restriction can be lifted once other platforms have been verified
> > > > to work with PGO.
> > > >
> > > > Note that this method of profiling the kernel is clang-native and isn't
> > > > compatible with clang's gcov support in kernel/gcov.
> > > >
> > > > [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> > > >
> > >
> > > Hi Bill and Sami,
> > >
> > > I have seen the pull-request in the CBL issue tracker and had some
> > > questions in mind.
> > >
> > > Good you send this.
> > >
> > > First of all, I like to fetch any development stuff easily from a Git
> > > repository.
> >
> > The version in the pull-request in the CBL issue tracker is roughly
> > the same as this patch. (There are some changes, but they aren't
> > functionality changes.)
> >
> > > Can you offer this, please?
> > > What is the base for your work?
> > > I hope this is (fresh released) Linux v5.11-rc3.
> > >
> > This patch (and the PR on the CBL issue tracker) are from top-of-tree Linux.
> >
> > > I myself had some experiences with a PGO + ThinLTO optimized LLVM
> > > toolchain built with the help of tc-build.
> > > Here it takes very long to build it.
> > >
> > > This means I have some profile-data archived.
> > > Can I use it?
> > >
> > LLVM is more tolerant of "stale" profile data than gcov, so it's
> > possible that your archived profile data would still work, but I can't
> > guarantee that it will be better than using new profile data.
> >
> > > Is an own PGO + ThinLTO optimized LLVM toolchain pre-requirement for
> > > this or not?
> > > That is one of my important questions.
> > >
> > Do you mean that the LLVM tools (clang, llc, etc.) are compiled with
> > PGO + ThinLTO?
> >
> 
> Yes.
> 
> - Sedat -

No, having an optimized LLVM toolchain is not a requirement of this
patchset. It will make compiling the kernel faster but it does nothing
more than that.

Cheers,
Nathan

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH] pgo: add clang's Profile Guided Optimization infrastructure
  2021-01-11  8:18 [PATCH] pgo: add clang's Profile Guided Optimization infrastructure Bill Wendling
  2021-01-11  8:39 ` Sedat Dilek
@ 2021-01-11 20:12 ` Fangrui Song
  2021-01-11 20:23   ` Bill Wendling
  2021-01-11 21:04 ` Nathan Chancellor
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 116+ messages in thread
From: Fangrui Song @ 2021-01-11 20:12 UTC (permalink / raw)
  To: Bill Wendling
  Cc: Jonathan Corbet, Masahiro Yamada, linux-doc, linux-kernel,
	linux-kbuild, clang-built-linux, Andrew Morton,
	Nathan Chancellor, Nick Desaulniers, Sami Tolvanen

On 2021-01-11, 'Bill Wendling' via Clang Built Linux wrote:
>From: Sami Tolvanen <samitolvanen@google.com>
>
>Enable the use of clang's Profile-Guided Optimization[1]. To generate a
>profile, the kernel is instrumented with PGO counters, a representative
>workload is run, and the raw profile data is collected from
>/sys/kernel/debug/pgo/profraw.
>
>The raw profile data must be processed by clang's "llvm-profdata" tool before
>it can be used during recompilation:
>
>  $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
>  $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
>
>Multiple raw profiles may be merged during this step.
>
>The data can be used either by the compiler if LTO isn't enabled:
>
>    ... -fprofile-use=vmlinux.profdata ...
>
>or by LLD if LTO is enabled:
>
>    ... -lto-cs-profile-file=vmlinux.profdata ...

This LLD option does not exist.
LLD does have some `--lto-*` options but the `-lto-*` form is not supported
(it clashes with -l) https://reviews.llvm.org/D79371

(There is an earlier -fprofile-instr-generate which does
instrumentation in Clang, but the option does not have broad usage.
It is used more for code coverage, not for optimization.
Noticeably, it does not even implement the Kirchhoff's current law
optimization)

-fprofile-use= is used by both regular PGO and context-sensitive PGO (CSPGO).

clang -flto=thin -fprofile-use= passes -plugin-opt=cs-profile-path= to the linker.
For regular PGO, this option is effectively a no-op (confirmed with CSPGO main developer).

So I think the "or by LLD if LTO is enabled:" part should be removed.

>This initial submission is restricted to x86, as that's the platform we know
>works. This restriction can be lifted once other platforms have been verified
>to work with PGO.
>
>Note that this method of profiling the kernel is clang-native and isn't
>compatible with clang's gcov support in kernel/gcov.
>
>[1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
>
>Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
>Co-developed-by: Bill Wendling <morbo@google.com>
>Signed-off-by: Bill Wendling <morbo@google.com>
>---
> Documentation/dev-tools/index.rst     |   1 +
> Documentation/dev-tools/pgo.rst       | 127 +++++++++
> MAINTAINERS                           |   9 +
> Makefile                              |   3 +
> arch/Kconfig                          |   1 +
> arch/arm/boot/bootp/Makefile          |   1 +
> arch/arm/boot/compressed/Makefile     |   1 +
> arch/arm/vdso/Makefile                |   3 +-
> arch/arm64/kernel/vdso/Makefile       |   3 +-
> arch/arm64/kvm/hyp/nvhe/Makefile      |   1 +
> arch/mips/boot/compressed/Makefile    |   1 +
> arch/mips/vdso/Makefile               |   1 +
> arch/nds32/kernel/vdso/Makefile       |   4 +-
> arch/parisc/boot/compressed/Makefile  |   1 +
> arch/powerpc/kernel/Makefile          |   6 +-
> arch/powerpc/kernel/trace/Makefile    |   3 +-
> arch/powerpc/kernel/vdso32/Makefile   |   1 +
> arch/powerpc/kernel/vdso64/Makefile   |   1 +
> arch/powerpc/kexec/Makefile           |   3 +-
> arch/powerpc/xmon/Makefile            |   1 +
> arch/riscv/kernel/vdso/Makefile       |   3 +-
> arch/s390/boot/Makefile               |   1 +
> arch/s390/boot/compressed/Makefile    |   1 +
> arch/s390/kernel/Makefile             |   1 +
> arch/s390/kernel/vdso64/Makefile      |   3 +-
> arch/s390/purgatory/Makefile          |   1 +
> arch/sh/boot/compressed/Makefile      |   1 +
> arch/sh/mm/Makefile                   |   1 +
> arch/sparc/vdso/Makefile              |   1 +
> arch/x86/Kconfig                      |   1 +
> arch/x86/boot/Makefile                |   1 +
> arch/x86/boot/compressed/Makefile     |   1 +
> arch/x86/entry/vdso/Makefile          |   1 +
> arch/x86/kernel/vmlinux.lds.S         |   2 +
> arch/x86/platform/efi/Makefile        |   1 +
> arch/x86/purgatory/Makefile           |   1 +
> arch/x86/realmode/rm/Makefile         |   1 +
> arch/x86/um/vdso/Makefile             |   1 +
> drivers/firmware/efi/libstub/Makefile |   1 +
> drivers/s390/char/Makefile            |   1 +
> include/asm-generic/vmlinux.lds.h     |  44 +++
> kernel/Makefile                       |   1 +
> kernel/pgo/Kconfig                    |  34 +++
> kernel/pgo/Makefile                   |   5 +
> kernel/pgo/fs.c                       | 382 ++++++++++++++++++++++++++
> kernel/pgo/instrument.c               | 147 ++++++++++
> kernel/pgo/pgo.h                      | 206 ++++++++++++++
> scripts/Makefile.lib                  |  10 +
> 48 files changed, 1017 insertions(+), 9 deletions(-)
> create mode 100644 Documentation/dev-tools/pgo.rst
> create mode 100644 kernel/pgo/Kconfig
> create mode 100644 kernel/pgo/Makefile
> create mode 100644 kernel/pgo/fs.c
> create mode 100644 kernel/pgo/instrument.c
> create mode 100644 kernel/pgo/pgo.h
>
>diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
>index f7809c7b1ba9e..8d6418e858062 100644
>--- a/Documentation/dev-tools/index.rst
>+++ b/Documentation/dev-tools/index.rst
>@@ -26,6 +26,7 @@ whole; patches welcome!
>    kgdb
>    kselftest
>    kunit/index
>+   pgo
>
>
> .. only::  subproject and html
>diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
>new file mode 100644
>index 0000000000000..2ed7f549b20ef
>--- /dev/null
>+++ b/Documentation/dev-tools/pgo.rst
>@@ -0,0 +1,127 @@
>+.. SPDX-License-Identifier: GPL-2.0
>+
>+===============================
>+Using PGO with the Linux kernel
>+===============================
>+
>+Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
>+when building with Clang. The profiling data is exported via the ``pgo``
>+debugfs directory.
>+
>+.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
>+
>+
>+Preparation
>+===========
>+
>+Configure the kernel with:
>+
>+.. code-block:: make
>+
>+   CONFIG_DEBUG_FS=y
>+   CONFIG_PGO_CLANG=y

Clang also supports SamplePGO (-fprofile-sample-use; AutoFDO) and
context-sensitive PGO (post-inline, -fcs-profile-generate).
Context-sensitive SamplePGO is under development.

If the naming does not make adding future other PGO features difficult,
I am happy with CONFIG_PGO_CLANG :)


Aside from the question and the issue above, the description looks good to me.

>+Note that kernels compiled with profiling flags will be significantly larger
>+and run slower.
>+
>+Profiling data will only become accessible once debugfs has been mounted:
>+
>+.. code-block:: sh
>+
>+   mount -t debugfs none /sys/kernel/debug
>+
>+
>+Customization
>+=============
>+
>+You can enable or disable profiling for individual file and directories by
>+adding a line similar to the following to the respective kernel Makefile:
>+
>+- For a single file (e.g. main.o)
>+
>+  .. code-block:: make
>+
>+     PGO_PROFILE_main.o := y
>+
>+- For all files in one directory
>+
>+  .. code-block:: make
>+
>+     PGO_PROFILE := y
>+
>+To exclude files from being profiled use
>+
>+  .. code-block:: make
>+
>+     PGO_PROFILE_main.o := n
>+
>+and
>+
>+  .. code-block:: make
>+
>+     PGO_PROFILE := n
>+
>+Only files which are linked to the main kernel image or are compiled as kernel
>+modules are supported by this mechanism.
>+
>+
>+Files
>+=====
>+
>+The PGO kernel support creates the following files in debugfs:
>+
>+``/sys/kernel/debug/pgo``
>+	Parent directory for all PGO-related files.
>+
>+``/sys/kernel/debug/pgo/reset``
>+	Global reset file: resets all coverage data to zero when written to.
>+
>+``/sys/kernel/debug/profraw``
>+	The raw PGO data that must be processed with ``llvm_profdata``.
>+
>+
>+Workflow
>+========
>+
>+The PGO kernel can be run on the host or test machines. The data though should
>+be analyzed with Clang's tools from the same Clang version as the kernel was
>+compiled. Clang's tolerant of version skew, but it's easier to use the same
>+Clang version.
>+
>+The profiling data is useful for optimizing the kernel, analyzing coverage,
>+etc. Clang offers tools to perform these tasks.
>+
>+Here is an example workflow for profiling an instrumented kernel with PGO and
>+using the result to optimize the kernel:
>+
>+1) Install the kernel on the TEST machine.
>+
>+2) Reset the data counters right before running the load tests
>+
>+   .. code-block:: sh
>+
>+      echo 1 > /sys/kernel/debug/pgo/reset
>+
>+3) Run the load tests.
>+
>+4) Collect the raw profile data
>+
>+   .. code-block:: sh
>+
>+      cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
>+
>+5) (Optional) Download the raw profile data to the HOST machine.
>+
>+6) Process the raw profile data
>+
>+   .. code-block:: sh
>+
>+      llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
>+
>+   Note that multiple raw profile data files can be merged during this step.
>+
>+7) Rebuild the kernel using the profile data (PGO disabled)
>+
>+   .. code-block:: sh
>+
>+      make ... KCLAGS=-fprofile-use=vmlinux.profdata
>diff --git a/MAINTAINERS b/MAINTAINERS
>index 6390491b07e51..7a98bdaab9861 100644
>--- a/MAINTAINERS
>+++ b/MAINTAINERS
>@@ -13955,6 +13955,15 @@ S:	Maintained
> F:	include/linux/personality.h
> F:	include/uapi/linux/personality.h
>
>+PGO BASED KERNEL PROFILING
>+M:	Sami Tolvanen <samitolvanen@google.com>
>+M:	Bill Wendling <wcw@google.com>
>+R:	Nathan Chancellor <natechancellor@gmail.com>
>+R:	Nick Desaulniers <ndesaulniers@google.com>
>+S:	Supported
>+F:	Documentation/dev-tools/pgo.rst
>+F:	kernel/pgo
>+
> PHOENIX RC FLIGHT CONTROLLER ADAPTER
> M:	Marcus Folkesson <marcus.folkesson@gmail.com>
> L:	linux-input@vger.kernel.org
>diff --git a/Makefile b/Makefile
>index 8b2c3f88ee5ea..4f42957c78134 100644
>--- a/Makefile
>+++ b/Makefile
>@@ -659,6 +659,9 @@ endif # KBUILD_EXTMOD
> # Defaults to vmlinux, but the arch makefile usually adds further targets
> all: vmlinux
>
>+CFLAGS_PGO_CLANG := -fprofile-generate
>+export CFLAGS_PGO_CLANG
>+
> CFLAGS_GCOV	:= -fprofile-arcs -ftest-coverage \
> 	$(call cc-option,-fno-tree-loop-im) \
> 	$(call cc-disable-warning,maybe-uninitialized,)
>diff --git a/arch/Kconfig b/arch/Kconfig
>index 78c6f05b10f91..a7a6ab7d204dc 100644
>--- a/arch/Kconfig
>+++ b/arch/Kconfig
>@@ -1106,6 +1106,7 @@ config ARCH_SUPPORTS_DEBUG_PAGEALLOC
> 	bool
>
> source "kernel/gcov/Kconfig"
>+source "kernel/pgo/Kconfig"
>
> source "scripts/gcc-plugins/Kconfig"
>
>diff --git a/arch/arm/boot/bootp/Makefile b/arch/arm/boot/bootp/Makefile
>index 981a8d03f064c..523bd58df0a4b 100644
>--- a/arch/arm/boot/bootp/Makefile
>+++ b/arch/arm/boot/bootp/Makefile
>@@ -7,6 +7,7 @@
> #
>
> GCOV_PROFILE	:= n
>+PGO_PROFILE	:= n
>
> LDFLAGS_bootp	:= --no-undefined -X \
> 		 --defsym initrd_phys=$(INITRD_PHYS) \
>diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
>index fb521efcc6c20..5fd0fd85fc0e5 100644
>--- a/arch/arm/boot/compressed/Makefile
>+++ b/arch/arm/boot/compressed/Makefile
>@@ -24,6 +24,7 @@ OBJS		+= hyp-stub.o
> endif
>
> GCOV_PROFILE		:= n
>+PGO_PROFILE		:= n
> KASAN_SANITIZE		:= n
>
> # Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
>diff --git a/arch/arm/vdso/Makefile b/arch/arm/vdso/Makefile
>index b558bee0e1f6b..11f6ce4b48b56 100644
>--- a/arch/arm/vdso/Makefile
>+++ b/arch/arm/vdso/Makefile
>@@ -36,8 +36,9 @@ else
> CFLAGS_vgettimeofday.o = -O2 -include $(c-gettimeofday-y)
> endif
>
>-# Disable gcov profiling for VDSO code
>+# Disable gcov and PGO profiling for VDSO code
> GCOV_PROFILE := n
>+PGO_PROFILE := n
>
> # Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
> KCOV_INSTRUMENT := n
>diff --git a/arch/arm64/kernel/vdso/Makefile b/arch/arm64/kernel/vdso/Makefile
>index cd9c3fa25902f..d48fc0df07020 100644
>--- a/arch/arm64/kernel/vdso/Makefile
>+++ b/arch/arm64/kernel/vdso/Makefile
>@@ -41,8 +41,9 @@ ifneq ($(c-gettimeofday-y),)
>   CFLAGS_vgettimeofday.o += -include $(c-gettimeofday-y)
> endif
>
>-# Disable gcov profiling for VDSO code
>+# Disable gcov and PGO profiling for VDSO code
> GCOV_PROFILE := n
>+PGO_PROFILE := n
>
> obj-y += vdso.o
> targets += vdso.lds
>diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
>index 1f1e351c5fe2b..ad128ecdbfbdf 100644
>--- a/arch/arm64/kvm/hyp/nvhe/Makefile
>+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
>@@ -60,6 +60,7 @@ KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_FTRACE) $(CC_FLAGS_SCS), $(KBUILD_CFLAG
> # compiler instrumentation that inserts callbacks or checks into the code may
> # cause crashes. Just disable it.
> GCOV_PROFILE	:= n
>+PGO_PROFILE	:= n
> KASAN_SANITIZE	:= n
> UBSAN_SANITIZE	:= n
> KCOV_INSTRUMENT	:= n
>diff --git a/arch/mips/boot/compressed/Makefile b/arch/mips/boot/compressed/Makefile
>index 47cd9dc7454af..0855ea12f2c7f 100644
>--- a/arch/mips/boot/compressed/Makefile
>+++ b/arch/mips/boot/compressed/Makefile
>@@ -37,6 +37,7 @@ KBUILD_AFLAGS := $(KBUILD_AFLAGS) -D__ASSEMBLY__ \
> # Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
> KCOV_INSTRUMENT		:= n
> GCOV_PROFILE := n
>+PGO_PROFILE := n
>
> # decompressor objects (linked with vmlinuz)
> vmlinuzobjs-y := $(obj)/head.o $(obj)/decompress.o $(obj)/string.o
>diff --git a/arch/mips/vdso/Makefile b/arch/mips/vdso/Makefile
>index 5810cc12bc1d9..d7eb64de35eae 100644
>--- a/arch/mips/vdso/Makefile
>+++ b/arch/mips/vdso/Makefile
>@@ -66,6 +66,7 @@ ldflags-y := -Bsymbolic --no-undefined -soname=linux-vdso.so.1 \
> CFLAGS_REMOVE_vdso.o = -pg
>
> GCOV_PROFILE := n
>+PGO_PROFILE := n
> UBSAN_SANITIZE := n
> KCOV_INSTRUMENT := n
>
>diff --git a/arch/nds32/kernel/vdso/Makefile b/arch/nds32/kernel/vdso/Makefile
>index 55df25ef00578..f2b53ee2124b7 100644
>--- a/arch/nds32/kernel/vdso/Makefile
>+++ b/arch/nds32/kernel/vdso/Makefile
>@@ -15,9 +15,9 @@ obj-vdso := $(addprefix $(obj)/, $(obj-vdso))
> ccflags-y := -shared -fno-common -fno-builtin -nostdlib -fPIC -Wl,-shared -g \
> 	-Wl,-soname=linux-vdso.so.1 -Wl,--hash-style=sysv
>
>-# Disable gcov profiling for VDSO code
>+# Disable gcov and PGO profiling for VDSO code
> GCOV_PROFILE := n
>-
>+PGO_PROFILE := n
>
> obj-y += vdso.o
> targets += vdso.lds
>diff --git a/arch/parisc/boot/compressed/Makefile b/arch/parisc/boot/compressed/Makefile
>index dff4536875305..5cf93a67f7da7 100644
>--- a/arch/parisc/boot/compressed/Makefile
>+++ b/arch/parisc/boot/compressed/Makefile
>@@ -7,6 +7,7 @@
>
> KCOV_INSTRUMENT := n
> GCOV_PROFILE := n
>+PGO_PROFILE := n
> UBSAN_SANITIZE := n
>
> targets := vmlinux.lds vmlinux vmlinux.bin vmlinux.bin.gz vmlinux.bin.bz2
>diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
>index fe2ef598e2ead..c642c046660d7 100644
>--- a/arch/powerpc/kernel/Makefile
>+++ b/arch/powerpc/kernel/Makefile
>@@ -153,17 +153,21 @@ endif
> obj-$(CONFIG_PPC_SECURE_BOOT)	+= secure_boot.o ima_arch.o secvar-ops.o
> obj-$(CONFIG_PPC_SECVAR_SYSFS)	+= secvar-sysfs.o
>
>-# Disable GCOV, KCOV & sanitizers in odd or sensitive code
>+# Disable GCOV, PGO, KCOV & sanitizers in odd or sensitive code
> GCOV_PROFILE_prom_init.o := n
>+PGO_PROFILE_prom_init.o := n
> KCOV_INSTRUMENT_prom_init.o := n
> UBSAN_SANITIZE_prom_init.o := n
> GCOV_PROFILE_kprobes.o := n
>+PGO_PROFILE_kprobes.o := n
> KCOV_INSTRUMENT_kprobes.o := n
> UBSAN_SANITIZE_kprobes.o := n
> GCOV_PROFILE_kprobes-ftrace.o := n
>+PGO_PROFILE_kprobes-ftrace.o := n
> KCOV_INSTRUMENT_kprobes-ftrace.o := n
> UBSAN_SANITIZE_kprobes-ftrace.o := n
> GCOV_PROFILE_syscall_64.o := n
>+PGO_PROFILE_syscall_64.o := n
> KCOV_INSTRUMENT_syscall_64.o := n
> UBSAN_SANITIZE_syscall_64.o := n
> UBSAN_SANITIZE_vdso.o := n
>diff --git a/arch/powerpc/kernel/trace/Makefile b/arch/powerpc/kernel/trace/Makefile
>index 858503775c583..7d72ae7d4f8c6 100644
>--- a/arch/powerpc/kernel/trace/Makefile
>+++ b/arch/powerpc/kernel/trace/Makefile
>@@ -23,7 +23,8 @@ obj-$(CONFIG_TRACING)			+= trace_clock.o
> obj-$(CONFIG_PPC64)			+= $(obj64-y)
> obj-$(CONFIG_PPC32)			+= $(obj32-y)
>
>-# Disable GCOV, KCOV & sanitizers in odd or sensitive code
>+# Disable GCOV, PGO, KCOV & sanitizers in odd or sensitive code
> GCOV_PROFILE_ftrace.o := n
>+PGO_PROFILE_ftrace.o := n
> KCOV_INSTRUMENT_ftrace.o := n
> UBSAN_SANITIZE_ftrace.o := n
>diff --git a/arch/powerpc/kernel/vdso32/Makefile b/arch/powerpc/kernel/vdso32/Makefile
>index 9cb6f524854b9..655e159975a04 100644
>--- a/arch/powerpc/kernel/vdso32/Makefile
>+++ b/arch/powerpc/kernel/vdso32/Makefile
>@@ -34,6 +34,7 @@ targets := $(obj-vdso32) vdso32.so.dbg
> obj-vdso32 := $(addprefix $(obj)/, $(obj-vdso32))
>
> GCOV_PROFILE := n
>+PGO_PROFILE := n
> KCOV_INSTRUMENT := n
> UBSAN_SANITIZE := n
> KASAN_SANITIZE := n
>diff --git a/arch/powerpc/kernel/vdso64/Makefile b/arch/powerpc/kernel/vdso64/Makefile
>index bf363ff371521..12c286f5afc16 100644
>--- a/arch/powerpc/kernel/vdso64/Makefile
>+++ b/arch/powerpc/kernel/vdso64/Makefile
>@@ -21,6 +21,7 @@ targets := $(obj-vdso64) vdso64.so.dbg
> obj-vdso64 := $(addprefix $(obj)/, $(obj-vdso64))
>
> GCOV_PROFILE := n
>+PGO_PROFILE := n
> KCOV_INSTRUMENT := n
> UBSAN_SANITIZE := n
> KASAN_SANITIZE := n
>diff --git a/arch/powerpc/kexec/Makefile b/arch/powerpc/kexec/Makefile
>index 4aff6846c7726..1c7f65e3cb969 100644
>--- a/arch/powerpc/kexec/Makefile
>+++ b/arch/powerpc/kexec/Makefile
>@@ -16,7 +16,8 @@ endif
> endif
>
>
>-# Disable GCOV, KCOV & sanitizers in odd or sensitive code
>+# Disable GCOV, PGO, KCOV & sanitizers in odd or sensitive code
> GCOV_PROFILE_core_$(BITS).o := n
>+PGO_PROFILE_core_$(BITS).o := n
> KCOV_INSTRUMENT_core_$(BITS).o := n
> UBSAN_SANITIZE_core_$(BITS).o := n
>diff --git a/arch/powerpc/xmon/Makefile b/arch/powerpc/xmon/Makefile
>index eb25d7554ffd1..7aff80d18b44b 100644
>--- a/arch/powerpc/xmon/Makefile
>+++ b/arch/powerpc/xmon/Makefile
>@@ -2,6 +2,7 @@
> # Makefile for xmon
>
> GCOV_PROFILE := n
>+PGO_PROFILE := n
> KCOV_INSTRUMENT := n
> UBSAN_SANITIZE := n
> KASAN_SANITIZE := n
>diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile
>index 0cfd6da784f84..882340dc3c647 100644
>--- a/arch/riscv/kernel/vdso/Makefile
>+++ b/arch/riscv/kernel/vdso/Makefile
>@@ -32,8 +32,9 @@ CPPFLAGS_vdso.lds += -P -C -U$(ARCH)
> # Disable -pg to prevent insert call site
> CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os
>
>-# Disable gcov profiling for VDSO code
>+# Disable gcov and PGO profiling for VDSO code
> GCOV_PROFILE := n
>+PGO_PROFILE := n
> KCOV_INSTRUMENT := n
>
> # Force dependency
>diff --git a/arch/s390/boot/Makefile b/arch/s390/boot/Makefile
>index 41a64b8dce252..bee4a32040e79 100644
>--- a/arch/s390/boot/Makefile
>+++ b/arch/s390/boot/Makefile
>@@ -5,6 +5,7 @@
>
> KCOV_INSTRUMENT := n
> GCOV_PROFILE := n
>+PGO_PROFILE := n
> UBSAN_SANITIZE := n
> KASAN_SANITIZE := n
>
>diff --git a/arch/s390/boot/compressed/Makefile b/arch/s390/boot/compressed/Makefile
>index de18dab518bb6..c3ab883e8425a 100644
>--- a/arch/s390/boot/compressed/Makefile
>+++ b/arch/s390/boot/compressed/Makefile
>@@ -7,6 +7,7 @@
>
> KCOV_INSTRUMENT := n
> GCOV_PROFILE := n
>+PGO_PROFILE := n
> UBSAN_SANITIZE := n
> KASAN_SANITIZE := n
>
>diff --git a/arch/s390/kernel/Makefile b/arch/s390/kernel/Makefile
>index dd73b7f074237..bd857aacad794 100644
>--- a/arch/s390/kernel/Makefile
>+++ b/arch/s390/kernel/Makefile
>@@ -14,6 +14,7 @@ CFLAGS_REMOVE_early.o		= $(CC_FLAGS_FTRACE)
> endif
>
> GCOV_PROFILE_early.o		:= n
>+PGO_PROFILE_early.o		:= n
> KCOV_INSTRUMENT_early.o		:= n
> UBSAN_SANITIZE_early.o		:= n
> KASAN_SANITIZE_ipl.o		:= n
>diff --git a/arch/s390/kernel/vdso64/Makefile b/arch/s390/kernel/vdso64/Makefile
>index a6e0fb6b91d6c..d7c43b7c1db96 100644
>--- a/arch/s390/kernel/vdso64/Makefile
>+++ b/arch/s390/kernel/vdso64/Makefile
>@@ -35,8 +35,9 @@ obj-y += vdso64_wrapper.o
> targets += vdso64.lds
> CPPFLAGS_vdso64.lds += -P -C -U$(ARCH)
>
>-# Disable gcov profiling, ubsan and kasan for VDSO code
>+# Disable gcov and PGO profiling, ubsan and kasan for VDSO code
> GCOV_PROFILE := n
>+PGO_PROFILE := n
> UBSAN_SANITIZE := n
> KASAN_SANITIZE := n
>
>diff --git a/arch/s390/purgatory/Makefile b/arch/s390/purgatory/Makefile
>index c57f8c40e9926..9aef584e98466 100644
>--- a/arch/s390/purgatory/Makefile
>+++ b/arch/s390/purgatory/Makefile
>@@ -17,6 +17,7 @@ $(obj)/mem.o: $(srctree)/arch/s390/lib/mem.S FORCE
>
> KCOV_INSTRUMENT := n
> GCOV_PROFILE := n
>+PGO_PROFILE := n
> UBSAN_SANITIZE := n
> KASAN_SANITIZE := n
>
>diff --git a/arch/sh/boot/compressed/Makefile b/arch/sh/boot/compressed/Makefile
>index 589d2d8a573db..ae19aeeb3964c 100644
>--- a/arch/sh/boot/compressed/Makefile
>+++ b/arch/sh/boot/compressed/Makefile
>@@ -13,6 +13,7 @@ targets		:= vmlinux vmlinux.bin vmlinux.bin.gz \
> OBJECTS = $(obj)/head_32.o $(obj)/misc.o $(obj)/cache.o
>
> GCOV_PROFILE := n
>+PGO_PROFILE := n
>
> #
> # IMAGE_OFFSET is the load offset of the compression loader
>diff --git a/arch/sh/mm/Makefile b/arch/sh/mm/Makefile
>index f69ddc70b1465..ea2782c631f43 100644
>--- a/arch/sh/mm/Makefile
>+++ b/arch/sh/mm/Makefile
>@@ -43,3 +43,4 @@ obj-$(CONFIG_UNCACHED_MAPPING)	+= uncached.o
> obj-$(CONFIG_HAVE_SRAM_POOL)	+= sram.o
>
> GCOV_PROFILE_pmb.o := n
>+PGO_PROFILE_pmb.o := n
>diff --git a/arch/sparc/vdso/Makefile b/arch/sparc/vdso/Makefile
>index c5e1545bc5cf9..ab5f3783fe199 100644
>--- a/arch/sparc/vdso/Makefile
>+++ b/arch/sparc/vdso/Makefile
>@@ -115,6 +115,7 @@ quiet_cmd_vdso = VDSO    $@
>
> VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 -Bsymbolic
> GCOV_PROFILE := n
>+PGO_PROFILE := n
>
> #
> # Install the unstripped copies of vdso*.so.  If our toolchain supports
>diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>index 7b6dd10b162ac..a751b4f8f6645 100644
>--- a/arch/x86/Kconfig
>+++ b/arch/x86/Kconfig
>@@ -95,6 +95,7 @@ config X86
> 	select ARCH_SUPPORTS_DEBUG_PAGEALLOC
> 	select ARCH_SUPPORTS_NUMA_BALANCING	if X86_64
> 	select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP	if NR_CPUS <= 4096
>+	select ARCH_SUPPORTS_PGO_CLANG		if X86_64
> 	select ARCH_USE_BUILTIN_BSWAP
> 	select ARCH_USE_QUEUED_RWLOCKS
> 	select ARCH_USE_QUEUED_SPINLOCKS
>diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
>index fe605205b4ce2..383853e32f673 100644
>--- a/arch/x86/boot/Makefile
>+++ b/arch/x86/boot/Makefile
>@@ -71,6 +71,7 @@ KBUILD_AFLAGS	:= $(KBUILD_CFLAGS) -D__ASSEMBLY__
> KBUILD_CFLAGS	+= $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
> KBUILD_CFLAGS	+= -fno-asynchronous-unwind-tables
> GCOV_PROFILE := n
>+PGO_PROFILE := n
> UBSAN_SANITIZE := n
>
> $(obj)/bzImage: asflags-y  := $(SVGA_MODE)
>diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
>index e0bc3988c3faa..ed12ab65f6065 100644
>--- a/arch/x86/boot/compressed/Makefile
>+++ b/arch/x86/boot/compressed/Makefile
>@@ -54,6 +54,7 @@ CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/
>
> KBUILD_AFLAGS  := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> GCOV_PROFILE := n
>+PGO_PROFILE := n
> UBSAN_SANITIZE :=n
>
> KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
>diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
>index 02e3e42f380bd..26e2b3af0145c 100644
>--- a/arch/x86/entry/vdso/Makefile
>+++ b/arch/x86/entry/vdso/Makefile
>@@ -179,6 +179,7 @@ quiet_cmd_vdso = VDSO    $@
> VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 \
> 	$(call ld-option, --eh-frame-hdr) -Bsymbolic
> GCOV_PROFILE := n
>+PGO_PROFILE := n
>
> quiet_cmd_vdso_and_check = VDSO    $@
>       cmd_vdso_and_check = $(cmd_vdso); $(cmd_vdso_check)
>diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
>index efd9e9ea17f25..f6cab2316c46a 100644
>--- a/arch/x86/kernel/vmlinux.lds.S
>+++ b/arch/x86/kernel/vmlinux.lds.S
>@@ -184,6 +184,8 @@ SECTIONS
>
> 	BUG_TABLE
>
>+	PGO_CLANG_DATA
>+
> 	ORC_UNWIND_TABLE
>
> 	. = ALIGN(PAGE_SIZE);
>diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile
>index 84b09c230cbd5..5f22b31446ad4 100644
>--- a/arch/x86/platform/efi/Makefile
>+++ b/arch/x86/platform/efi/Makefile
>@@ -2,6 +2,7 @@
> OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y
> KASAN_SANITIZE := n
> GCOV_PROFILE := n
>+PGO_PROFILE := n
>
> obj-$(CONFIG_EFI) 		+= quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o
> obj-$(CONFIG_EFI_MIXED)		+= efi_thunk_$(BITS).o
>diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
>index 95ea17a9d20cb..36f20e99da0bc 100644
>--- a/arch/x86/purgatory/Makefile
>+++ b/arch/x86/purgatory/Makefile
>@@ -23,6 +23,7 @@ targets += purgatory.ro purgatory.chk
>
> # Sanitizer, etc. runtimes are unavailable and cannot be linked here.
> GCOV_PROFILE	:= n
>+PGO_PROFILE	:= n
> KASAN_SANITIZE	:= n
> UBSAN_SANITIZE	:= n
> KCSAN_SANITIZE	:= n
>diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
>index 83f1b6a56449f..21797192f958f 100644
>--- a/arch/x86/realmode/rm/Makefile
>+++ b/arch/x86/realmode/rm/Makefile
>@@ -76,4 +76,5 @@ KBUILD_CFLAGS	:= $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
> KBUILD_AFLAGS	:= $(KBUILD_CFLAGS) -D__ASSEMBLY__
> KBUILD_CFLAGS	+= -fno-asynchronous-unwind-tables
> GCOV_PROFILE := n
>+PGO_PROFILE := n
> UBSAN_SANITIZE := n
>diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile
>index 5943387e3f357..54f5768f58530 100644
>--- a/arch/x86/um/vdso/Makefile
>+++ b/arch/x86/um/vdso/Makefile
>@@ -64,6 +64,7 @@ quiet_cmd_vdso = VDSO    $@
>
> VDSO_LDFLAGS = -fPIC -shared -Wl,--hash-style=sysv
> GCOV_PROFILE := n
>+PGO_PROFILE := n
>
> #
> # Install the unstripped copy of vdso*.so listed in $(vdso-install-y).
>diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
>index 8a94388e38b33..2d81623b33f29 100644
>--- a/drivers/firmware/efi/libstub/Makefile
>+++ b/drivers/firmware/efi/libstub/Makefile
>@@ -40,6 +40,7 @@ KBUILD_CFLAGS			:= $(cflags-y) -Os -DDISABLE_BRANCH_PROFILING \
> KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))
>
> GCOV_PROFILE			:= n
>+PGO_PROFILE			:= n
> # Sanitizer runtimes are unavailable and cannot be linked here.
> KASAN_SANITIZE			:= n
> KCSAN_SANITIZE			:= n
>diff --git a/drivers/s390/char/Makefile b/drivers/s390/char/Makefile
>index c6fdb81a068a6..bf6c5db5da1fc 100644
>--- a/drivers/s390/char/Makefile
>+++ b/drivers/s390/char/Makefile
>@@ -9,6 +9,7 @@ CFLAGS_REMOVE_sclp_early_core.o	= $(CC_FLAGS_FTRACE)
> endif
>
> GCOV_PROFILE_sclp_early_core.o		:= n
>+PGO_PROFILE_sclp_early_core.o		:= n
> KCOV_INSTRUMENT_sclp_early_core.o	:= n
> UBSAN_SANITIZE_sclp_early_core.o	:= n
> KASAN_SANITIZE_sclp_early_core.o	:= n
>diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
>index b2b3d81b1535a..3a591bb18c5fb 100644
>--- a/include/asm-generic/vmlinux.lds.h
>+++ b/include/asm-generic/vmlinux.lds.h
>@@ -316,6 +316,49 @@
> #define THERMAL_TABLE(name)
> #endif
>
>+#ifdef CONFIG_PGO_CLANG
>+#define PGO_CLANG_DATA							\
>+	__llvm_prf_data : AT(ADDR(__llvm_prf_data) - LOAD_OFFSET) {	\
>+		. = ALIGN(8);						\
>+		__llvm_prf_start = .;					\
>+		__llvm_prf_data_start = .;				\
>+		KEEP(*(__llvm_prf_data))				\
>+		. = ALIGN(8);						\
>+		__llvm_prf_data_end = .;				\
>+	}								\
>+	__llvm_prf_cnts : AT(ADDR(__llvm_prf_cnts) - LOAD_OFFSET) {	\
>+		. = ALIGN(8);						\
>+		__llvm_prf_cnts_start = .;				\
>+		KEEP(*(__llvm_prf_cnts))				\
>+		. = ALIGN(8);						\
>+		__llvm_prf_cnts_end = .;				\
>+	}								\
>+	__llvm_prf_names : AT(ADDR(__llvm_prf_names) - LOAD_OFFSET) {	\
>+		. = ALIGN(8);						\
>+		__llvm_prf_names_start = .;				\
>+		KEEP(*(__llvm_prf_names))				\
>+		. = ALIGN(8);						\
>+		__llvm_prf_names_end = .;				\
>+		. = ALIGN(8);						\
>+	}								\
>+	__llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) {	\
>+		__llvm_prf_vals_start = .;				\
>+		KEEP(*(__llvm_prf_vals))				\
>+		. = ALIGN(8);						\
>+		__llvm_prf_vals_end = .;				\
>+		. = ALIGN(8);						\
>+	}								\
>+	__llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) {	\
>+		__llvm_prf_vnds_start = .;				\
>+		KEEP(*(__llvm_prf_vnds))				\
>+		. = ALIGN(8);						\
>+		__llvm_prf_vnds_end = .;				\
>+		__llvm_prf_end = .;					\
>+	}
>+#else
>+#define PGO_CLANG_DATA
>+#endif
>+
> #define KERNEL_DTB()							\
> 	STRUCT_ALIGN();							\
> 	__dtb_start = .;						\
>@@ -1125,6 +1168,7 @@
> 		CONSTRUCTORS						\
> 	}								\
> 	BUG_TABLE							\
>+	PGO_CLANG_DATA
>
> #define INIT_TEXT_SECTION(inittext_align)				\
> 	. = ALIGN(inittext_align);					\
>diff --git a/kernel/Makefile b/kernel/Makefile
>index aa7368c7eabf3..0b34ca228ba46 100644
>--- a/kernel/Makefile
>+++ b/kernel/Makefile
>@@ -111,6 +111,7 @@ obj-$(CONFIG_BPF) += bpf/
> obj-$(CONFIG_KCSAN) += kcsan/
> obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
> obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o
>+obj-$(CONFIG_PGO_CLANG) += pgo/
>
> obj-$(CONFIG_PERF_EVENTS) += events/
>
>diff --git a/kernel/pgo/Kconfig b/kernel/pgo/Kconfig
>new file mode 100644
>index 0000000000000..318d36bb3d106
>--- /dev/null
>+++ b/kernel/pgo/Kconfig
>@@ -0,0 +1,34 @@
>+# SPDX-License-Identifier: GPL-2.0-only
>+menu "Profile Guided Optimization (PGO) (EXPERIMENTAL)"
>+
>+config ARCH_SUPPORTS_PGO_CLANG
>+	bool
>+
>+config PGO_CLANG
>+	bool "Enable clang's PGO-based kernel profiling"
>+	depends on DEBUG_FS
>+	depends on ARCH_SUPPORTS_PGO_CLANG
>+	help
>+	  This option enables clang's PGO (Profile Guided Optimization) based
>+	  code profiling to better optimize the kernel.
>+
>+	  If unsure, say N.
>+
>+	  Run a representative workload for your application on a kernel
>+	  compiled with this option and download the raw profile file from
>+	  /sys/kernel/debug/pgo/profraw. This file needs to be processed with
>+	  llvm-profdata. It may be merged with other collected raw profiles.
>+
>+	  Copy the resulting profile file into vmlinux.profdata, and enable
>+	  KCFLAGS=-fprofile-use=vmlinux.profdata to produce an optimized
>+	  kernel.
>+
>+	  Note that a kernel compiled with profiling flags will be
>+	  significatnly larger and run slower. Also be sure to exclude files
>+	  from profiling which are not linked to the kernel image to prevent
>+	  linker errors.
>+
>+	  Note that the debugfs filesystem has to be mounted to access
>+	  profiling data.
>+
>+endmenu
>diff --git a/kernel/pgo/Makefile b/kernel/pgo/Makefile
>new file mode 100644
>index 0000000000000..41e27cefd9a47
>--- /dev/null
>+++ b/kernel/pgo/Makefile
>@@ -0,0 +1,5 @@
>+# SPDX-License-Identifier: GPL-2.0
>+GCOV_PROFILE	:= n
>+PGO_PROFILE	:= n
>+
>+obj-y	+= fs.o instrument.o
>diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c
>new file mode 100644
>index 0000000000000..790a8df037bfc
>--- /dev/null
>+++ b/kernel/pgo/fs.c
>@@ -0,0 +1,382 @@
>+// SPDX-License-Identifier: GPL-2.0
>+/*
>+ * Copyright (C) 2019 Google, Inc.
>+ *
>+ * Author:
>+ *	Sami Tolvanen <samitolvanen@google.com>
>+ *
>+ * This software is licensed under the terms of the GNU General Public
>+ * License version 2, as published by the Free Software Foundation, and
>+ * may be copied, distributed, and modified under those terms.
>+ *
>+ * This program is distributed in the hope that it will be useful,
>+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
>+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>+ * GNU General Public License for more details.
>+ *
>+ */
>+
>+#define pr_fmt(fmt)	"pgo: " fmt
>+
>+#include <linux/kernel.h>
>+#include <linux/debugfs.h>
>+#include <linux/fs.h>
>+#include <linux/module.h>
>+#include <linux/slab.h>
>+#include <linux/vmalloc.h>
>+#include "pgo.h"
>+
>+static struct dentry *directory;
>+
>+struct prf_private_data {
>+	void *buffer;
>+	unsigned long size;
>+};
>+
>+/*
>+ * Raw profile data format:
>+ *
>+ *	- llvm_prf_header
>+ *	- __llvm_prf_data
>+ *	- __llvm_prf_cnts
>+ *	- __llvm_prf_names
>+ *	- zero padding to 8 bytes
>+ *	- for each llvm_prf_data in __llvm_prf_data:
>+ *		- llvm_prf_value_data
>+ *			- llvm_prf_value_record + site count array
>+ *				- llvm_prf_value_node_data
>+ *				...
>+ *			...
>+ *		...
>+ */
>+
>+static void prf_fill_header(void **buffer)
>+{
>+	struct llvm_prf_header *header = *(struct llvm_prf_header **)buffer;
>+
>+	header->magic = LLVM_PRF_MAGIC;
>+	header->version = LLVM_PRF_VARIANT_MASK_IR | LLVM_PRF_VERSION;
>+	header->data_size = prf_data_count();
>+	header->padding_bytes_before_counters = 0;
>+	header->counters_size = prf_cnts_count();
>+	header->padding_bytes_after_counters = 0;
>+	header->names_size = prf_names_count();
>+	header->counters_delta = (u64)__llvm_prf_cnts_start;
>+	header->names_delta = (u64)__llvm_prf_names_start;
>+	header->value_kind_last = LLVM_PRF_IPVK_LAST;
>+
>+	*buffer += sizeof(*header);
>+}
>+
>+/*
>+ * Copy the source into the buffer, incrementing the pointer into buffer in the
>+ * process.
>+ */
>+static void prf_copy_to_buffer(void **buffer, void *src, unsigned long size)
>+{
>+	memcpy(*buffer, src, size);
>+	*buffer += size;
>+}
>+
>+static u32 __prf_get_value_size(struct llvm_prf_data *p, u32 *value_kinds)
>+{
>+	struct llvm_prf_value_node **nodes =
>+		(struct llvm_prf_value_node **)p->values;
>+	u32 kinds = 0;
>+	u32 size = 0;
>+	unsigned int kind;
>+	unsigned int n;
>+	unsigned int s = 0;
>+
>+	for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
>+		unsigned int sites = p->num_value_sites[kind];
>+
>+		if (!sites)
>+			continue;
>+
>+		/* Record + site count array */
>+		size += prf_get_value_record_size(sites);
>+		kinds++;
>+
>+		if (!nodes)
>+			continue;
>+
>+		for (n = 0; n < sites; n++) {
>+			u32 count = 0;
>+			struct llvm_prf_value_node *site = nodes[s + n];
>+
>+			while (site && ++count <= U8_MAX)
>+				site = site->next;
>+
>+			size += count *
>+				sizeof(struct llvm_prf_value_node_data);
>+		}
>+
>+		s += sites;
>+	}
>+
>+	if (size)
>+		size += sizeof(struct llvm_prf_value_data);
>+
>+	if (value_kinds)
>+		*value_kinds = kinds;
>+
>+	return size;
>+}
>+
>+static u32 prf_get_value_size(void)
>+{
>+	u32 size = 0;
>+	struct llvm_prf_data *p;
>+
>+	for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
>+		size += __prf_get_value_size(p, NULL);
>+
>+	return size;
>+}
>+
>+/* Serialize the profiling's value. */
>+static void prf_serialize_value(struct llvm_prf_data *p, void **buffer)
>+{
>+	struct llvm_prf_value_data header;
>+	struct llvm_prf_value_node **nodes =
>+		(struct llvm_prf_value_node **)p->values;
>+	unsigned int kind;
>+	unsigned int n;
>+	unsigned int s = 0;
>+
>+	header.total_size = __prf_get_value_size(p, &header.num_value_kinds);
>+
>+	if (!header.num_value_kinds)
>+		/* Nothing to write. */
>+		return;
>+
>+	prf_copy_to_buffer(buffer, &header, sizeof(header));
>+
>+	for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
>+		struct llvm_prf_value_record *record;
>+		u8 *counts;
>+		unsigned int sites = p->num_value_sites[kind];
>+
>+		if (!sites)
>+			continue;
>+
>+		/* Profiling value record. */
>+		record = *(struct llvm_prf_value_record **)buffer;
>+		*buffer += prf_get_value_record_header_size();
>+
>+		record->kind = kind;
>+		record->num_value_sites = sites;
>+
>+		/* Site count array. */
>+		counts = *(u8 **)buffer;
>+		*buffer += prf_get_value_record_site_count_size(sites);
>+
>+		/*
>+		 * If we don't have nodes, we can skip updating the site count
>+		 * array, because the buffer is zero filled.
>+		 */
>+		if (!nodes)
>+			continue;
>+
>+		for (n = 0; n < sites; n++) {
>+			u32 count = 0;
>+			struct llvm_prf_value_node *site = nodes[s + n];
>+
>+			while (site && ++count <= U8_MAX) {
>+				prf_copy_to_buffer(buffer, site,
>+						   sizeof(struct llvm_prf_value_node_data));
>+				site = site->next;
>+			}
>+
>+			counts[n] = (u8)count;
>+		}
>+
>+		s += sites;
>+	}
>+}
>+
>+static void prf_serialize_values(void **buffer)
>+{
>+	struct llvm_prf_data *p;
>+
>+	for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
>+		prf_serialize_value(p, buffer);
>+}
>+
>+static inline unsigned long prf_get_padding(unsigned long size)
>+{
>+	return 8 - (size % 8);
>+}
>+
>+static unsigned long prf_buffer_size(void)
>+{
>+	return sizeof(struct llvm_prf_header) +
>+			prf_data_size()	+
>+			prf_cnts_size() +
>+			prf_names_size() +
>+			prf_get_padding(prf_names_size()) +
>+			prf_get_value_size();
>+}
>+
>+/* Serialize the profling data into a format LLVM's tools can understand. */
>+static int prf_serialize(struct prf_private_data *p)
>+{
>+	int err = 0;
>+	void *buffer;
>+
>+	p->size = prf_buffer_size();
>+	p->buffer = vzalloc(p->size);
>+
>+	if (!p->buffer) {
>+		err = -ENOMEM;
>+		goto out;
>+	}
>+
>+	buffer = p->buffer;
>+
>+	prf_fill_header(&buffer);
>+	prf_copy_to_buffer(&buffer, __llvm_prf_data_start,  prf_data_size());
>+	prf_copy_to_buffer(&buffer, __llvm_prf_cnts_start,  prf_cnts_size());
>+	prf_copy_to_buffer(&buffer, __llvm_prf_names_start, prf_names_size());
>+	buffer += prf_get_padding(prf_names_size());
>+
>+	prf_serialize_values(&buffer);
>+
>+out:
>+	return err;
>+}
>+
>+/* open() implementation for PGO. Creates a copy of the profiling data set. */
>+static int prf_open(struct inode *inode, struct file *file)
>+{
>+	struct prf_private_data *data;
>+	unsigned long flags;
>+	int err;
>+
>+	data = kzalloc(sizeof(*data), GFP_KERNEL);
>+	if (!data) {
>+		err = -ENOMEM;
>+		goto out;
>+	}
>+
>+	flags = prf_lock();
>+
>+	err = prf_serialize(data);
>+	if (err) {
>+		kfree(data);
>+		goto out_unlock;
>+	}
>+
>+	file->private_data = data;
>+
>+out_unlock:
>+	prf_unlock(flags);
>+out:
>+	return err;
>+}
>+
>+/* read() implementation for PGO. */
>+static ssize_t prf_read(struct file *file, char __user *buf, size_t count,
>+			loff_t *ppos)
>+{
>+	struct prf_private_data *data = file->private_data;
>+
>+	BUG_ON(!data);
>+
>+	return simple_read_from_buffer(buf, count, ppos, data->buffer,
>+				       data->size);
>+}
>+
>+/* release() implementation for PGO. Release resources allocated by open(). */
>+static int prf_release(struct inode *inode, struct file *file)
>+{
>+	struct prf_private_data *data = file->private_data;
>+
>+	if (data) {
>+		vfree(data->buffer);
>+		kfree(data);
>+	}
>+
>+	return 0;
>+}
>+
>+static const struct file_operations prf_fops = {
>+	.owner		= THIS_MODULE,
>+	.open		= prf_open,
>+	.read		= prf_read,
>+	.llseek		= default_llseek,
>+	.release	= prf_release
>+};
>+
>+/* write() implementation for resetting PGO's profile data. */
>+static ssize_t reset_write(struct file *file, const char __user *addr,
>+			   size_t len, loff_t *pos)
>+{
>+	struct llvm_prf_data *data;
>+
>+	memset(__llvm_prf_cnts_start, 0, prf_cnts_size());
>+
>+	for (data = __llvm_prf_data_start; data < __llvm_prf_data_end; ++data) {
>+		struct llvm_prf_value_node **vnodes;
>+		u64 current_vsite_count;
>+		u32 i;
>+
>+		if (!data->values)
>+			continue;
>+
>+		current_vsite_count = 0;
>+		vnodes = (struct llvm_prf_value_node **)data->values;
>+
>+		for (i = LLVM_PRF_IPVK_FIRST; i <= LLVM_PRF_IPVK_LAST; ++i)
>+			current_vsite_count += data->num_value_sites[i];
>+
>+		for (i = 0; i < current_vsite_count; ++i) {
>+			struct llvm_prf_value_node *current_vnode = vnodes[i];
>+
>+			while (current_vnode) {
>+				current_vnode->count = 0;
>+				current_vnode = current_vnode->next;
>+			}
>+		}
>+	}
>+
>+	return len;
>+}
>+
>+static const struct file_operations prf_reset_fops = {
>+	.owner		= THIS_MODULE,
>+	.write		= reset_write,
>+	.llseek		= noop_llseek,
>+};
>+
>+/* Create debugfs entries. */
>+static int __init pgo_init(void)
>+{
>+	directory = debugfs_create_dir("pgo", NULL);
>+	if (!directory)
>+		goto err_remove;
>+
>+	if (!debugfs_create_file("profraw", 0600, directory, NULL,
>+				 &prf_fops))
>+		goto err_remove;
>+
>+	if (!debugfs_create_file("reset", 0200, directory, NULL,
>+				 &prf_reset_fops))
>+		goto err_remove;
>+
>+	return 0;
>+
>+err_remove:
>+	pr_err("initialization failed\n");
>+	return -EIO;
>+}
>+
>+/* Remove debufs entries. */
>+static void __exit pgo_exit(void)
>+{
>+	debugfs_remove_recursive(directory);
>+}
>+
>+module_init(pgo_init);
>+module_exit(pgo_exit);
>diff --git a/kernel/pgo/instrument.c b/kernel/pgo/instrument.c
>new file mode 100644
>index 0000000000000..d96b61a1cf712
>--- /dev/null
>+++ b/kernel/pgo/instrument.c
>@@ -0,0 +1,147 @@
>+// SPDX-License-Identifier: GPL-2.0
>+/*
>+ * Copyright (C) 2019 Google, Inc.
>+ *
>+ * Author:
>+ *	Sami Tolvanen <samitolvanen@google.com>
>+ *
>+ * This software is licensed under the terms of the GNU General Public
>+ * License version 2, as published by the Free Software Foundation, and
>+ * may be copied, distributed, and modified under those terms.
>+ *
>+ * This program is distributed in the hope that it will be useful,
>+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
>+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>+ * GNU General Public License for more details.
>+ *
>+ */
>+
>+#define pr_fmt(fmt)	"pgo: " fmt
>+
>+#include <linux/kernel.h>
>+#include <linux/export.h>
>+#include <linux/spinlock.h>
>+#include <linux/types.h>
>+#include "pgo.h"
>+
>+/* Lock guarding value node access and serialization. */
>+static DEFINE_SPINLOCK(pgo_lock);
>+static int current_node;
>+
>+unsigned long prf_lock(void)
>+{
>+	unsigned long flags;
>+
>+	spin_lock_irqsave(&pgo_lock, flags);
>+
>+	return flags;
>+}
>+
>+void prf_unlock(unsigned long flags)
>+{
>+	spin_unlock_irqrestore(&pgo_lock, flags);
>+}
>+
>+/*
>+ * Return a newly allocated profiling value node which contains the tracked
>+ * value by the value profiler.
>+ * Note: caller *must* hold pgo_lock.
>+ */
>+static struct llvm_prf_value_node *allocate_node(struct llvm_prf_data *p,
>+						 u32 index, u64 value)
>+{
>+	if (&__llvm_prf_vnds_start[current_node + 1] >= __llvm_prf_vnds_end)
>+		return NULL; /* Out of nodes */
>+
>+	current_node++;
>+
>+	/* Make sure the node is entirely within the section */
>+	if (&__llvm_prf_vnds_start[current_node] >= __llvm_prf_vnds_end ||
>+	    &__llvm_prf_vnds_start[current_node + 1] > __llvm_prf_vnds_end)
>+		return NULL;
>+
>+	return &__llvm_prf_vnds_start[current_node];
>+}
>+
>+/*
>+ * Counts the number of times a target value is seen.
>+ *
>+ * Records the target value for the CounterIndex if not seen before. Otherwise,
>+ * increments the counter associated w/ the target value.
>+ */
>+void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index)
>+{
>+	struct llvm_prf_data *p = (struct llvm_prf_data *)data;
>+	struct llvm_prf_value_node **counters;
>+	struct llvm_prf_value_node *curr;
>+	struct llvm_prf_value_node *min = NULL;
>+	struct llvm_prf_value_node *prev = NULL;
>+	u64 min_count = U64_MAX;
>+	u8 values = 0;
>+	unsigned long flags;
>+
>+	if (!p || !p->values)
>+		return;
>+
>+	counters = (struct llvm_prf_value_node **)p->values;
>+	curr = counters[index];
>+
>+	while (curr) {
>+		if (target_value == curr->value) {
>+			curr->count++;
>+			return;
>+		}
>+
>+		if (curr->count < min_count) {
>+			min_count = curr->count;
>+			min = curr;
>+		}
>+
>+		prev = curr;
>+		curr = curr->next;
>+		values++;
>+	}
>+
>+	if (values >= LLVM_PRF_MAX_NUM_VALS_PER_SITE) {
>+		if (!min->count || !(--min->count)) {
>+			curr = min;
>+			curr->value = target_value;
>+			curr->count++;
>+		}
>+		return;
>+	}
>+
>+	/* Lock when updating the value node structure. */
>+	flags = prf_lock();
>+
>+	curr = allocate_node(p, index, target_value);
>+	if (!curr)
>+		goto out;
>+
>+	curr->value = target_value;
>+	curr->count++;
>+
>+	if (!counters[index])
>+		counters[index] = curr;
>+	else if (prev && !prev->next)
>+		prev->next = curr;
>+
>+out:
>+	prf_unlock(flags);
>+}
>+EXPORT_SYMBOL(__llvm_profile_instrument_target);
>+
>+/* Counts the number of times a range of targets values are seen. */
>+void __llvm_profile_instrument_range(u64 target_value, void *data,
>+				     u32 index, s64 precise_start,
>+				     s64 precise_last, s64 large_value)
>+{
>+	if (large_value != S64_MIN && (s64)target_value >= large_value)
>+		target_value = large_value;
>+	else if ((s64)target_value < precise_start ||
>+		 (s64)target_value > precise_last)
>+		target_value = precise_last + 1;
>+
>+	__llvm_profile_instrument_target(target_value, data, index);
>+}
>+EXPORT_SYMBOL(__llvm_profile_instrument_range);
>diff --git a/kernel/pgo/pgo.h b/kernel/pgo/pgo.h
>new file mode 100644
>index 0000000000000..df0aa278f28bd
>--- /dev/null
>+++ b/kernel/pgo/pgo.h
>@@ -0,0 +1,206 @@
>+/* SPDX-License-Identifier: GPL-2.0 */
>+/*
>+ * Copyright (C) 2019 Google, Inc.
>+ *
>+ * Author:
>+ *	Sami Tolvanen <samitolvanen@google.com>
>+ *
>+ * This software is licensed under the terms of the GNU General Public
>+ * License version 2, as published by the Free Software Foundation, and
>+ * may be copied, distributed, and modified under those terms.
>+ *
>+ * This program is distributed in the hope that it will be useful,
>+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
>+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>+ * GNU General Public License for more details.
>+ *
>+ */
>+
>+#ifndef _PGO_H
>+#define _PGO_H
>+
>+/*
>+ * Note: These internal LLVM definitions must match the compiler version.
>+ * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code.
>+ */
>+
>+#ifdef CONFIG_64BIT
>+	#define LLVM_PRF_MAGIC		\
>+		((u64)255 << 56 |	\
>+		 (u64)'l' << 48 |	\
>+		 (u64)'p' << 40 |	\
>+		 (u64)'r' << 32 |	\
>+		 (u64)'o' << 24 |	\
>+		 (u64)'f' << 16 |	\
>+		 (u64)'r' << 8  |	\
>+		 (u64)129)
>+#else
>+	#define LLVM_PRF_MAGIC		\
>+		((u64)255 << 56 |	\
>+		 (u64)'l' << 48 |	\
>+		 (u64)'p' << 40 |	\
>+		 (u64)'r' << 32 |	\
>+		 (u64)'o' << 24 |	\
>+		 (u64)'f' << 16 |	\
>+		 (u64)'R' << 8  |	\
>+		 (u64)129)
>+#endif
>+
>+#define LLVM_PRF_VERSION		5
>+#define LLVM_PRF_DATA_ALIGN		8
>+#define LLVM_PRF_IPVK_FIRST		0
>+#define LLVM_PRF_IPVK_LAST		1
>+#define LLVM_PRF_MAX_NUM_VALS_PER_SITE	16
>+
>+#define LLVM_PRF_VARIANT_MASK_IR	(0x1ull << 56)
>+#define LLVM_PRF_VARIANT_MASK_CSIR	(0x1ull << 57)
>+
>+/**
>+ * struct llvm_prf_header - represents the raw profile header data structure.
>+ * @magic: the magic token for the file format.
>+ * @version: the version of the file format.
>+ * @data_size: the number of entries in the profile data section.
>+ * @padding_bytes_before_counters: the number of padding bytes before the
>+ *   counters.
>+ * @counters_size: the size in bytes of the LLVM profile section containing the
>+ *   counters.
>+ * @padding_bytes_after_counters: the number of padding bytes after the
>+ *   counters.
>+ * @names_size: the size in bytes of the LLVM profile section containing the
>+ *   counters' names.
>+ * @counters_delta: the beginning of the LLMV profile counters section.
>+ * @names_delta: the beginning of the LLMV profile names section.
>+ * @value_kind_last: the last profile value kind.
>+ */
>+struct llvm_prf_header {
>+	u64 magic;
>+	u64 version;
>+	u64 data_size;
>+	u64 padding_bytes_before_counters;
>+	u64 counters_size;
>+	u64 padding_bytes_after_counters;
>+	u64 names_size;
>+	u64 counters_delta;
>+	u64 names_delta;
>+	u64 value_kind_last;
>+};
>+
>+/**
>+ * struct llvm_prf_data - represents the per-function control structure.
>+ * @name_ref: the reference to the function's name.
>+ * @func_hash: the hash value of the function.
>+ * @counter_ptr: a pointer to the profile counter.
>+ * @function_ptr: a pointer to the function.
>+ * @values: the profiling values associated with this function.
>+ * @num_counters: the number of counters in the function.
>+ * @num_value_sites: the number of value profile sites.
>+ */
>+struct llvm_prf_data {
>+	const u64 name_ref;
>+	const u64 func_hash;
>+	const void *counter_ptr;
>+	const void *function_ptr;
>+	void *values;
>+	const u32 num_counters;
>+	const u16 num_value_sites[LLVM_PRF_IPVK_LAST + 1];
>+} __aligned(LLVM_PRF_DATA_ALIGN);
>+
>+/**
>+ * structure llvm_prf_value_node_data - represents the data part of the struct
>+ *   llvm_prf_value_node data structure.
>+ * @value: the value counters.
>+ * @count: the counters' count.
>+ */
>+struct llvm_prf_value_node_data {
>+	u64 value;
>+	u64 count;
>+};
>+
>+/**
>+ * struct llvm_prf_value_node - represents an internal data structure used by
>+ *   the value profiler.
>+ * @value: the value counters.
>+ * @count: the counters' count.
>+ * @next: the next value node.
>+ */
>+struct llvm_prf_value_node {
>+	u64 value;
>+	u64 count;
>+	struct llvm_prf_value_node *next;
>+};
>+
>+/**
>+ * struct llvm_prf_value_data - represents the value profiling data in indexed
>+ *   format.
>+ * @total_size: the total size in bytes including this field.
>+ * @num_value_kinds: the number of value profile kinds that has value profile
>+ *   data.
>+ */
>+struct llvm_prf_value_data {
>+	u32 total_size;
>+	u32 num_value_kinds;
>+};
>+
>+/**
>+ * struct llvm_prf_value_record - represents the on-disk layout of the value
>+ *   profile data of a particular kind for one function.
>+ * @kind: the kind of the value profile record.
>+ * @num_value_sites: the number of value profile sites.
>+ * @site_count_array: the first element of the array that stores the number
>+ *   of profiled values for each value site.
>+ */
>+struct llvm_prf_value_record {
>+	u32 kind;
>+	u32 num_value_sites;
>+	u8 site_count_array[];
>+};
>+
>+#define prf_get_value_record_header_size()		\
>+	offsetof(struct llvm_prf_value_record, site_count_array)
>+#define prf_get_value_record_site_count_size(sites)	\
>+	roundup((sites), 8)
>+#define prf_get_value_record_size(sites)		\
>+	(prf_get_value_record_header_size() +		\
>+	 prf_get_value_record_site_count_size((sites)))
>+
>+/* Data sections */
>+extern struct llvm_prf_data __llvm_prf_data_start[];
>+extern struct llvm_prf_data __llvm_prf_data_end[];
>+
>+extern u64 __llvm_prf_cnts_start[];
>+extern u64 __llvm_prf_cnts_end[];
>+
>+extern char __llvm_prf_names_start[];
>+extern char __llvm_prf_names_end[];
>+
>+extern struct llvm_prf_value_node __llvm_prf_vnds_start[];
>+extern struct llvm_prf_value_node __llvm_prf_vnds_end[];
>+
>+/* Locking for vnodes */
>+extern unsigned long prf_lock(void);
>+extern void prf_unlock(unsigned long flags);
>+
>+#define __DEFINE_PRF_SIZE(s) \
>+	static inline unsigned long prf_ ## s ## _size(void)		\
>+	{								\
>+		unsigned long start =					\
>+			(unsigned long)__llvm_prf_ ## s ## _start;	\
>+		unsigned long end =					\
>+			(unsigned long)__llvm_prf_ ## s ## _end;	\
>+		return roundup(end - start,				\
>+				sizeof(__llvm_prf_ ## s ## _start[0]));	\
>+	}								\
>+	static inline unsigned long prf_ ## s ## _count(void)		\
>+	{								\
>+		return prf_ ## s ## _size() /				\
>+			sizeof(__llvm_prf_ ## s ## _start[0]);		\
>+	}
>+
>+__DEFINE_PRF_SIZE(data);
>+__DEFINE_PRF_SIZE(cnts);
>+__DEFINE_PRF_SIZE(names);
>+__DEFINE_PRF_SIZE(vnds);
>+
>+#undef __DEFINE_PRF_SIZE
>+
>+#endif /* _PGO_H */
>diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
>index 213677a5ed33e..9b218afb5cb87 100644
>--- a/scripts/Makefile.lib
>+++ b/scripts/Makefile.lib
>@@ -143,6 +143,16 @@ _c_flags += $(if $(patsubst n%,, \
> 		$(CFLAGS_GCOV))
> endif
>
>+#
>+# Enable clang's PGO profiling flags for a file or directory depending on
>+# variables PGO_PROFILE_obj.o and PGO_PROFILE.
>+#
>+ifeq ($(CONFIG_PGO_CLANG),y)
>+_c_flags += $(if $(patsubst n%,, \
>+		$(PGO_PROFILE_$(basetarget).o)$(PGO_PROFILE)y), \
>+		$(CFLAGS_PGO_CLANG))
>+endif
>+
> #
> # Enable address sanitizer flags for kernel except some files or directories
> # we don't want to check (depends on variables KASAN_SANITIZE_obj.o, KASAN_SANITIZE)
>-- 
>2.30.0.284.gd98b1dd5eaa7-goog
>
>-- 
>You received this message because you are subscribed to the Google Groups "Clang Built Linux" group.
>To unsubscribe from this group and stop receiving emails from it, send an email to clang-built-linux+unsubscribe@googlegroups.com.
>To view this discussion on the web visit https://groups.google.com/d/msgid/clang-built-linux/20210111081821.3041587-1-morbo%40google.com.

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH] pgo: add clang's Profile Guided Optimization infrastructure
  2021-01-11 20:12 ` Fangrui Song
@ 2021-01-11 20:23   ` Bill Wendling
  2021-01-11 20:31     ` Fangrui Song
  0 siblings, 1 reply; 116+ messages in thread
From: Bill Wendling @ 2021-01-11 20:23 UTC (permalink / raw)
  To: Fangrui Song
  Cc: Jonathan Corbet, Masahiro Yamada, linux-doc, LKML,
	Linux Kbuild mailing list, clang-built-linux, Andrew Morton,
	Nathan Chancellor, Nick Desaulniers, Sami Tolvanen

On Mon, Jan 11, 2021 at 12:12 PM Fangrui Song <maskray@google.com> wrote:
>
> On 2021-01-11, 'Bill Wendling' via Clang Built Linux wrote:
> >From: Sami Tolvanen <samitolvanen@google.com>
> >
> >Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> >profile, the kernel is instrumented with PGO counters, a representative
> >workload is run, and the raw profile data is collected from
> >/sys/kernel/debug/pgo/profraw.
> >
> >The raw profile data must be processed by clang's "llvm-profdata" tool before
> >it can be used during recompilation:
> >
> >  $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> >  $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> >
> >Multiple raw profiles may be merged during this step.
> >
> >The data can be used either by the compiler if LTO isn't enabled:
> >
> >    ... -fprofile-use=vmlinux.profdata ...
> >
> >or by LLD if LTO is enabled:
> >
> >    ... -lto-cs-profile-file=vmlinux.profdata ...
>
> This LLD option does not exist.
> LLD does have some `--lto-*` options but the `-lto-*` form is not supported
> (it clashes with -l) https://reviews.llvm.org/D79371
>
That's strange. I've been using that option for years now. :-) Is this
a recent change?

> (There is an earlier -fprofile-instr-generate which does
> instrumentation in Clang, but the option does not have broad usage.
> It is used more for code coverage, not for optimization.
> Noticeably, it does not even implement the Kirchhoff's current law
> optimization)
>
Right. I've been told outside of this email that -fprofile-generate is
the prefered flag to use.

> -fprofile-use= is used by both regular PGO and context-sensitive PGO (CSPGO).
>
> clang -flto=thin -fprofile-use= passes -plugin-opt=cs-profile-path= to the linker.
> For regular PGO, this option is effectively a no-op (confirmed with CSPGO main developer).
>
> So I think the "or by LLD if LTO is enabled:" part should be removed.

But what if you specify the linking step explicitly? Linux doesn't
call "clang" when linking, but "ld.lld".

-bw

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH] pgo: add clang's Profile Guided Optimization infrastructure
  2021-01-11 20:23   ` Bill Wendling
@ 2021-01-11 20:31     ` Fangrui Song
  2021-01-12  0:37       ` Bill Wendling
  0 siblings, 1 reply; 116+ messages in thread
From: Fangrui Song @ 2021-01-11 20:31 UTC (permalink / raw)
  To: Bill Wendling
  Cc: Jonathan Corbet, Masahiro Yamada, linux-doc, LKML,
	Linux Kbuild mailing list, clang-built-linux, Andrew Morton,
	Nathan Chancellor, Nick Desaulniers, Sami Tolvanen

On 2021-01-11, Bill Wendling wrote:
>On Mon, Jan 11, 2021 at 12:12 PM Fangrui Song <maskray@google.com> wrote:
>>
>> On 2021-01-11, 'Bill Wendling' via Clang Built Linux wrote:
>> >From: Sami Tolvanen <samitolvanen@google.com>
>> >
>> >Enable the use of clang's Profile-Guided Optimization[1]. To generate a
>> >profile, the kernel is instrumented with PGO counters, a representative
>> >workload is run, and the raw profile data is collected from
>> >/sys/kernel/debug/pgo/profraw.
>> >
>> >The raw profile data must be processed by clang's "llvm-profdata" tool before
>> >it can be used during recompilation:
>> >
>> >  $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
>> >  $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
>> >
>> >Multiple raw profiles may be merged during this step.
>> >
>> >The data can be used either by the compiler if LTO isn't enabled:
>> >
>> >    ... -fprofile-use=vmlinux.profdata ...
>> >
>> >or by LLD if LTO is enabled:
>> >
>> >    ... -lto-cs-profile-file=vmlinux.profdata ...
>>
>> This LLD option does not exist.
>> LLD does have some `--lto-*` options but the `-lto-*` form is not supported
>> (it clashes with -l) https://reviews.llvm.org/D79371
>>
>That's strange. I've been using that option for years now. :-) Is this
>a recent change?

The more frequently used options (specifyed by the clang driver) are
-plugin-opt=... (options implemented by LLVMgold.so).
`-lto-*` is rare.

>> (There is an earlier -fprofile-instr-generate which does
>> instrumentation in Clang, but the option does not have broad usage.
>> It is used more for code coverage, not for optimization.
>> Noticeably, it does not even implement the Kirchhoff's current law
>> optimization)
>>
>Right. I've been told outside of this email that -fprofile-generate is
>the prefered flag to use.
>
>> -fprofile-use= is used by both regular PGO and context-sensitive PGO (CSPGO).
>>
>> clang -flto=thin -fprofile-use= passes -plugin-opt=cs-profile-path= to the linker.
>> For regular PGO, this option is effectively a no-op (confirmed with CSPGO main developer).
>>
>> So I think the "or by LLD if LTO is enabled:" part should be removed.
>
>But what if you specify the linking step explicitly? Linux doesn't
>call "clang" when linking, but "ld.lld".

Regular PGO+LTO does not need -plugin-opt=cs-profile-path=
CSPGO+LTO needs it.
Because -fprofile-use= may be used by both, Clang driver adds it.
CSPGO is relevant in this this patch, so the linker option does not need to be mentioned.

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH] pgo: add clang's Profile Guided Optimization infrastructure
  2021-01-11  8:18 [PATCH] pgo: add clang's Profile Guided Optimization infrastructure Bill Wendling
  2021-01-11  8:39 ` Sedat Dilek
  2021-01-11 20:12 ` Fangrui Song
@ 2021-01-11 21:04 ` Nathan Chancellor
  2021-01-11 21:17   ` Nick Desaulniers
  2021-01-12  5:14 ` [PATCH v2] " Bill Wendling
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 116+ messages in thread
From: Nathan Chancellor @ 2021-01-11 21:04 UTC (permalink / raw)
  To: Bill Wendling
  Cc: Jonathan Corbet, Masahiro Yamada, linux-doc, linux-kernel,
	linux-kbuild, clang-built-linux, Andrew Morton, Nick Desaulniers,
	Sami Tolvanen

On Mon, Jan 11, 2021 at 12:18:21AM -0800, Bill Wendling wrote:
> From: Sami Tolvanen <samitolvanen@google.com>
> 
> Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> profile, the kernel is instrumented with PGO counters, a representative
> workload is run, and the raw profile data is collected from
> /sys/kernel/debug/pgo/profraw.
> 
> The raw profile data must be processed by clang's "llvm-profdata" tool before
> it can be used during recompilation:
> 
>   $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
>   $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> 
> Multiple raw profiles may be merged during this step.
> 
> The data can be used either by the compiler if LTO isn't enabled:
> 
>     ... -fprofile-use=vmlinux.profdata ...
> 
> or by LLD if LTO is enabled:
> 
>     ... -lto-cs-profile-file=vmlinux.profdata ...
> 
> This initial submission is restricted to x86, as that's the platform we know
> works. This restriction can be lifted once other platforms have been verified
> to work with PGO.
> 
> Note that this method of profiling the kernel is clang-native and isn't
> compatible with clang's gcov support in kernel/gcov.
> 
> [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> 
> Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
> Co-developed-by: Bill Wendling <morbo@google.com>
> Signed-off-by: Bill Wendling <morbo@google.com>

I took this for a spin against x86_64_defconfig and ran into two issues:

1. https://github.com/ClangBuiltLinux/linux/issues/1252

   There is also one in drivers/gpu/drm/i915/i915_query.c. For the time
   being, I added PGO_PROFILE_... := n for those two files.

2. After doing that, I run into an undefined function error with ld.lld.

How I tested:

$ make -skj"$(nproc)" LLVM=1 defconfig

$ scripts/config -e PGO_CLANG

$ make -skj"$(nproc)" LLVM=1 olddefconfig vmlinux all
ld.lld: error: undefined symbol: __llvm_profile_instrument_memop
>>> referenced by head64.c
>>>               arch/x86/kernel/head64.o:(__early_make_pgtable)
>>> referenced by head64.c
>>>               arch/x86/kernel/head64.o:(x86_64_start_kernel)
>>> referenced by head64.c
>>>               arch/x86/kernel/head64.o:(copy_bootdata)
>>> referenced 2259 more times

Local diff:

diff --git a/drivers/char/Makefile b/drivers/char/Makefile
index ffce287ef415..4b2f238770b5 100644
--- a/drivers/char/Makefile
+++ b/drivers/char/Makefile
@@ -4,6 +4,7 @@
 #
 
 obj-y				+= mem.o random.o
+PGO_PROFILE_random.o		:= n
 obj-$(CONFIG_TTY_PRINTK)	+= ttyprintk.o
 obj-y				+= misc.o
 obj-$(CONFIG_ATARI_DSP56K)	+= dsp56k.o
diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index e5574e506a5c..d83cacc79b1a 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -168,6 +168,7 @@ i915-y += \
 	  i915_vma.o \
 	  intel_region_lmem.o \
 	  intel_wopcm.o
+PGO_PROFILE_i915_query.o := n
 
 # general-purpose microcontroller (GuC) support
 i915-y += gt/uc/intel_uc.o \

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH] pgo: add clang's Profile Guided Optimization infrastructure
  2021-01-11 21:04 ` Nathan Chancellor
@ 2021-01-11 21:17   ` Nick Desaulniers
  2021-01-11 21:32     ` Bill Wendling
  0 siblings, 1 reply; 116+ messages in thread
From: Nick Desaulniers @ 2021-01-11 21:17 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Bill Wendling, Jonathan Corbet, Masahiro Yamada,
	Linux Doc Mailing List, LKML, Linux Kbuild mailing list,
	clang-built-linux, Andrew Morton, Sami Tolvanen

On Mon, Jan 11, 2021 at 1:04 PM Nathan Chancellor
<natechancellor@gmail.com> wrote:
>
> On Mon, Jan 11, 2021 at 12:18:21AM -0800, Bill Wendling wrote:
> > From: Sami Tolvanen <samitolvanen@google.com>
> >
> > Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > profile, the kernel is instrumented with PGO counters, a representative
> > workload is run, and the raw profile data is collected from
> > /sys/kernel/debug/pgo/profraw.
> >
> > The raw profile data must be processed by clang's "llvm-profdata" tool before
> > it can be used during recompilation:
> >
> >   $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> >   $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> >
> > Multiple raw profiles may be merged during this step.
> >
> > The data can be used either by the compiler if LTO isn't enabled:
> >
> >     ... -fprofile-use=vmlinux.profdata ...
> >
> > or by LLD if LTO is enabled:
> >
> >     ... -lto-cs-profile-file=vmlinux.profdata ...
> >
> > This initial submission is restricted to x86, as that's the platform we know
> > works. This restriction can be lifted once other platforms have been verified
> > to work with PGO.
> >
> > Note that this method of profiling the kernel is clang-native and isn't
> > compatible with clang's gcov support in kernel/gcov.
> >
> > [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> >
> > Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
> > Co-developed-by: Bill Wendling <morbo@google.com>
> > Signed-off-by: Bill Wendling <morbo@google.com>
>
> I took this for a spin against x86_64_defconfig and ran into two issues:
>
> 1. https://github.com/ClangBuiltLinux/linux/issues/1252

"Cannot split an edge from a CallBrInst"
Looks like that should be fixed first, then we should gate this
feature on clang-12.

>
>    There is also one in drivers/gpu/drm/i915/i915_query.c. For the time
>    being, I added PGO_PROFILE_... := n for those two files.
>
> 2. After doing that, I run into an undefined function error with ld.lld.
>
> How I tested:
>
> $ make -skj"$(nproc)" LLVM=1 defconfig
>
> $ scripts/config -e PGO_CLANG
>
> $ make -skj"$(nproc)" LLVM=1 olddefconfig vmlinux all
> ld.lld: error: undefined symbol: __llvm_profile_instrument_memop

Err...that seems like it should be implemented in
kernel/pgo/instrument.c in this patch in a v2?

> >>> referenced by head64.c
> >>>               arch/x86/kernel/head64.o:(__early_make_pgtable)
> >>> referenced by head64.c
> >>>               arch/x86/kernel/head64.o:(x86_64_start_kernel)
> >>> referenced by head64.c
> >>>               arch/x86/kernel/head64.o:(copy_bootdata)
> >>> referenced 2259 more times
>
> Local diff:
>
> diff --git a/drivers/char/Makefile b/drivers/char/Makefile
> index ffce287ef415..4b2f238770b5 100644
> --- a/drivers/char/Makefile
> +++ b/drivers/char/Makefile
> @@ -4,6 +4,7 @@
>  #
>
>  obj-y                          += mem.o random.o
> +PGO_PROFILE_random.o           := n
>  obj-$(CONFIG_TTY_PRINTK)       += ttyprintk.o
>  obj-y                          += misc.o
>  obj-$(CONFIG_ATARI_DSP56K)     += dsp56k.o
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index e5574e506a5c..d83cacc79b1a 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -168,6 +168,7 @@ i915-y += \
>           i915_vma.o \
>           intel_region_lmem.o \
>           intel_wopcm.o
> +PGO_PROFILE_i915_query.o := n
>
>  # general-purpose microcontroller (GuC) support
>  i915-y += gt/uc/intel_uc.o \

I'd rather have these both sorted out before landing with PGO disabled
on these files.

-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH] pgo: add clang's Profile Guided Optimization infrastructure
  2021-01-11 21:17   ` Nick Desaulniers
@ 2021-01-11 21:32     ` Bill Wendling
  0 siblings, 0 replies; 116+ messages in thread
From: Bill Wendling @ 2021-01-11 21:32 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Nathan Chancellor, Jonathan Corbet, Masahiro Yamada,
	Linux Doc Mailing List, LKML, Linux Kbuild mailing list,
	clang-built-linux, Andrew Morton, Sami Tolvanen

On Mon, Jan 11, 2021 at 1:18 PM Nick Desaulniers
<ndesaulniers@google.com> wrote:
>
> On Mon, Jan 11, 2021 at 1:04 PM Nathan Chancellor
> <natechancellor@gmail.com> wrote:
> >
> > On Mon, Jan 11, 2021 at 12:18:21AM -0800, Bill Wendling wrote:
> > > From: Sami Tolvanen <samitolvanen@google.com>
> > >
> > > Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > > profile, the kernel is instrumented with PGO counters, a representative
> > > workload is run, and the raw profile data is collected from
> > > /sys/kernel/debug/pgo/profraw.
> > >
> > > The raw profile data must be processed by clang's "llvm-profdata" tool before
> > > it can be used during recompilation:
> > >
> > >   $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > >   $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > >
> > > Multiple raw profiles may be merged during this step.
> > >
> > > The data can be used either by the compiler if LTO isn't enabled:
> > >
> > >     ... -fprofile-use=vmlinux.profdata ...
> > >
> > > or by LLD if LTO is enabled:
> > >
> > >     ... -lto-cs-profile-file=vmlinux.profdata ...
> > >
> > > This initial submission is restricted to x86, as that's the platform we know
> > > works. This restriction can be lifted once other platforms have been verified
> > > to work with PGO.
> > >
> > > Note that this method of profiling the kernel is clang-native and isn't
> > > compatible with clang's gcov support in kernel/gcov.
> > >
> > > [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> > >
> > > Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
> > > Co-developed-by: Bill Wendling <morbo@google.com>
> > > Signed-off-by: Bill Wendling <morbo@google.com>
> >
> > I took this for a spin against x86_64_defconfig and ran into two issues:
> >
> > 1. https://github.com/ClangBuiltLinux/linux/issues/1252
>
> "Cannot split an edge from a CallBrInst"
> Looks like that should be fixed first, then we should gate this
> feature on clang-12.
>
Weird. I'll investigate.

> >
> >    There is also one in drivers/gpu/drm/i915/i915_query.c. For the time
> >    being, I added PGO_PROFILE_... := n for those two files.
> >
> > 2. After doing that, I run into an undefined function error with ld.lld.
> >
> > How I tested:
> >
> > $ make -skj"$(nproc)" LLVM=1 defconfig
> >
> > $ scripts/config -e PGO_CLANG
> >
> > $ make -skj"$(nproc)" LLVM=1 olddefconfig vmlinux all
> > ld.lld: error: undefined symbol: __llvm_profile_instrument_memop
>
> Err...that seems like it should be implemented in
> kernel/pgo/instrument.c in this patch in a v2?
>
Yes. I'll submit a new V2 with this and other feedback integrated.

> > >>> referenced by head64.c
> > >>>               arch/x86/kernel/head64.o:(__early_make_pgtable)
> > >>> referenced by head64.c
> > >>>               arch/x86/kernel/head64.o:(x86_64_start_kernel)
> > >>> referenced by head64.c
> > >>>               arch/x86/kernel/head64.o:(copy_bootdata)
> > >>> referenced 2259 more times
> >
> > Local diff:
> >
> > diff --git a/drivers/char/Makefile b/drivers/char/Makefile
> > index ffce287ef415..4b2f238770b5 100644
> > --- a/drivers/char/Makefile
> > +++ b/drivers/char/Makefile
> > @@ -4,6 +4,7 @@
> >  #
> >
> >  obj-y                          += mem.o random.o
> > +PGO_PROFILE_random.o           := n
> >  obj-$(CONFIG_TTY_PRINTK)       += ttyprintk.o
> >  obj-y                          += misc.o
> >  obj-$(CONFIG_ATARI_DSP56K)     += dsp56k.o
> > diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> > index e5574e506a5c..d83cacc79b1a 100644
> > --- a/drivers/gpu/drm/i915/Makefile
> > +++ b/drivers/gpu/drm/i915/Makefile
> > @@ -168,6 +168,7 @@ i915-y += \
> >           i915_vma.o \
> >           intel_region_lmem.o \
> >           intel_wopcm.o
> > +PGO_PROFILE_i915_query.o := n
> >
> >  # general-purpose microcontroller (GuC) support
> >  i915-y += gt/uc/intel_uc.o \
>
> I'd rather have these both sorted out before landing with PGO disabled
> on these files.
>
Agreed.

-bw

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH] pgo: add clang's Profile Guided Optimization infrastructure
  2021-01-11 20:31     ` Fangrui Song
@ 2021-01-12  0:37       ` Bill Wendling
  2021-01-12  0:44         ` Fāng-ruì Sòng
  0 siblings, 1 reply; 116+ messages in thread
From: Bill Wendling @ 2021-01-12  0:37 UTC (permalink / raw)
  To: Fangrui Song
  Cc: Jonathan Corbet, Masahiro Yamada, Linux Doc Mailing List, LKML,
	Linux Kbuild mailing list, clang-built-linux, Andrew Morton,
	Nathan Chancellor, Nick Desaulniers, Sami Tolvanen

On Mon, Jan 11, 2021 at 12:31 PM Fangrui Song <maskray@google.com> wrote:
> On 2021-01-11, Bill Wendling wrote:
> >On Mon, Jan 11, 2021 at 12:12 PM Fangrui Song <maskray@google.com> wrote:
> >>
> >> On 2021-01-11, 'Bill Wendling' via Clang Built Linux wrote:
> >> >From: Sami Tolvanen <samitolvanen@google.com>
> >> >
> >> >Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> >> >profile, the kernel is instrumented with PGO counters, a representative
> >> >workload is run, and the raw profile data is collected from
> >> >/sys/kernel/debug/pgo/profraw.
> >> >
> >> >The raw profile data must be processed by clang's "llvm-profdata" tool before
> >> >it can be used during recompilation:
> >> >
> >> >  $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> >> >  $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> >> >
> >> >Multiple raw profiles may be merged during this step.
> >> >
> >> >The data can be used either by the compiler if LTO isn't enabled:
> >> >
> >> >    ... -fprofile-use=vmlinux.profdata ...
> >> >
> >> >or by LLD if LTO is enabled:
> >> >
> >> >    ... -lto-cs-profile-file=vmlinux.profdata ...
> >>
> >> This LLD option does not exist.
> >> LLD does have some `--lto-*` options but the `-lto-*` form is not supported
> >> (it clashes with -l) https://reviews.llvm.org/D79371
> >>
> >That's strange. I've been using that option for years now. :-) Is this
> >a recent change?
>
> The more frequently used options (specifyed by the clang driver) are
> -plugin-opt=... (options implemented by LLVMgold.so).
> `-lto-*` is rare.
>
> >> (There is an earlier -fprofile-instr-generate which does
> >> instrumentation in Clang, but the option does not have broad usage.
> >> It is used more for code coverage, not for optimization.
> >> Noticeably, it does not even implement the Kirchhoff's current law
> >> optimization)
> >>
> >Right. I've been told outside of this email that -fprofile-generate is
> >the prefered flag to use.
> >
> >> -fprofile-use= is used by both regular PGO and context-sensitive PGO (CSPGO).
> >>
> >> clang -flto=thin -fprofile-use= passes -plugin-opt=cs-profile-path= to the linker.
> >> For regular PGO, this option is effectively a no-op (confirmed with CSPGO main developer).
> >>
> >> So I think the "or by LLD if LTO is enabled:" part should be removed.
> >
> >But what if you specify the linking step explicitly? Linux doesn't
> >call "clang" when linking, but "ld.lld".
>
> Regular PGO+LTO does not need -plugin-opt=cs-profile-path=
> CSPGO+LTO needs it.
> Because -fprofile-use= may be used by both, Clang driver adds it.
> CSPGO is relevant in this this patch, so the linker option does not need to be mentioned.

I'm still a bit confused. Are you saying that when clang uses
`-flto=thin -fprofile-use=foo` that the profile file "foo" is embedded
into the bitcode file so that when the linker's run it'll be used?

This is the workflow:

clang ... -fprofile-use=vmlinux.profdata ... -c -o foo.o foo.c
clang ... -fprofile-use=vmlinux.profdata ... -c -o bar.o bar.c
ld.lld ... <output file> foo.o bar.o

Are you saying that we don't need to have
"-plugin-opt=cs-profile-path=vmlinux.profdata" on the "ld.lld ..."
line?

-bw

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH] pgo: add clang's Profile Guided Optimization infrastructure
  2021-01-12  0:37       ` Bill Wendling
@ 2021-01-12  0:44         ` Fāng-ruì Sòng
  0 siblings, 0 replies; 116+ messages in thread
From: Fāng-ruì Sòng @ 2021-01-12  0:44 UTC (permalink / raw)
  To: Bill Wendling
  Cc: Jonathan Corbet, Masahiro Yamada, Linux Doc Mailing List, LKML,
	Linux Kbuild mailing list, clang-built-linux, Andrew Morton,
	Nathan Chancellor, Nick Desaulniers, Sami Tolvanen

On Mon, Jan 11, 2021 at 4:38 PM Bill Wendling <morbo@google.com> wrote:
>
> On Mon, Jan 11, 2021 at 12:31 PM Fangrui Song <maskray@google.com> wrote:
> > On 2021-01-11, Bill Wendling wrote:
> > >On Mon, Jan 11, 2021 at 12:12 PM Fangrui Song <maskray@google.com> wrote:
> > >>
> > >> On 2021-01-11, 'Bill Wendling' via Clang Built Linux wrote:
> > >> >From: Sami Tolvanen <samitolvanen@google.com>
> > >> >
> > >> >Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > >> >profile, the kernel is instrumented with PGO counters, a representative
> > >> >workload is run, and the raw profile data is collected from
> > >> >/sys/kernel/debug/pgo/profraw.
> > >> >
> > >> >The raw profile data must be processed by clang's "llvm-profdata" tool before
> > >> >it can be used during recompilation:
> > >> >
> > >> >  $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > >> >  $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > >> >
> > >> >Multiple raw profiles may be merged during this step.
> > >> >
> > >> >The data can be used either by the compiler if LTO isn't enabled:
> > >> >
> > >> >    ... -fprofile-use=vmlinux.profdata ...
> > >> >
> > >> >or by LLD if LTO is enabled:
> > >> >
> > >> >    ... -lto-cs-profile-file=vmlinux.profdata ...
> > >>
> > >> This LLD option does not exist.
> > >> LLD does have some `--lto-*` options but the `-lto-*` form is not supported
> > >> (it clashes with -l) https://reviews.llvm.org/D79371
> > >>
> > >That's strange. I've been using that option for years now. :-) Is this
> > >a recent change?
> >
> > The more frequently used options (specifyed by the clang driver) are
> > -plugin-opt=... (options implemented by LLVMgold.so).
> > `-lto-*` is rare.
> >
> > >> (There is an earlier -fprofile-instr-generate which does
> > >> instrumentation in Clang, but the option does not have broad usage.
> > >> It is used more for code coverage, not for optimization.
> > >> Noticeably, it does not even implement the Kirchhoff's current law
> > >> optimization)
> > >>
> > >Right. I've been told outside of this email that -fprofile-generate is
> > >the prefered flag to use.
> > >
> > >> -fprofile-use= is used by both regular PGO and context-sensitive PGO (CSPGO).
> > >>
> > >> clang -flto=thin -fprofile-use= passes -plugin-opt=cs-profile-path= to the linker.
> > >> For regular PGO, this option is effectively a no-op (confirmed with CSPGO main developer).
> > >>
> > >> So I think the "or by LLD if LTO is enabled:" part should be removed.
> > >
> > >But what if you specify the linking step explicitly? Linux doesn't
> > >call "clang" when linking, but "ld.lld".
> >
> > Regular PGO+LTO does not need -plugin-opt=cs-profile-path=
> > CSPGO+LTO needs it.
> > Because -fprofile-use= may be used by both, Clang driver adds it.
> > CSPGO is relevant in this this patch, so the linker option does not need to be mentioned.
>
> I'm still a bit confused. Are you saying that when clang uses
> `-flto=thin -fprofile-use=foo` that the profile file "foo" is embedded
> into the bitcode file so that when the linker's run it'll be used?
>
> This is the workflow:
>
> clang ... -fprofile-use=vmlinux.profdata ... -c -o foo.o foo.c
> clang ... -fprofile-use=vmlinux.profdata ... -c -o bar.o bar.c
> ld.lld ... <output file> foo.o bar.o
>
> Are you saying that we don't need to have
> "-plugin-opt=cs-profile-path=vmlinux.profdata" on the "ld.lld ..."
> line?
>
> -bw

The backend compile step -flto=thin -fprofile-use=foo has all the information.

-plugin-opt=cs-profile-path=vmlinux.profdata is not needed for regular PGO.

^ permalink raw reply	[flat|nested] 116+ messages in thread

* [PATCH v2] pgo: add clang's Profile Guided Optimization infrastructure
  2021-01-11  8:18 [PATCH] pgo: add clang's Profile Guided Optimization infrastructure Bill Wendling
                   ` (2 preceding siblings ...)
  2021-01-11 21:04 ` Nathan Chancellor
@ 2021-01-12  5:14 ` Bill Wendling
  2021-01-12  5:17   ` Sedat Dilek
                     ` (2 more replies)
  2021-01-21  2:21 ` [PATCH] " Sedat Dilek
  2021-04-07 21:17 ` [PATCH v9] " Bill Wendling
  5 siblings, 3 replies; 116+ messages in thread
From: Bill Wendling @ 2021-01-12  5:14 UTC (permalink / raw)
  To: Jonathan Corbet, Masahiro Yamada, linux-doc, linux-kernel,
	linux-kbuild, clang-built-linux, Andrew Morton
  Cc: Nathan Chancellor, Nick Desaulniers, Sami Tolvanen, Bill Wendling

From: Sami Tolvanen <samitolvanen@google.com>

Enable the use of clang's Profile-Guided Optimization[1]. To generate a
profile, the kernel is instrumented with PGO counters, a representative
workload is run, and the raw profile data is collected from
/sys/kernel/debug/pgo/profraw.

The raw profile data must be processed by clang's "llvm-profdata" tool
before it can be used during recompilation:

  $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
  $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw

Multiple raw profiles may be merged during this step.

The data can now be used by the compiler:

  $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...

This initial submission is restricted to x86, as that's the platform we
know works. This restriction can be lifted once other platforms have
been verified to work with PGO.

Note that this method of profiling the kernel is clang-native and isn't
compatible with clang's gcov support in kernel/gcov.

[1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Co-developed-by: Bill Wendling <morbo@google.com>
Signed-off-by: Bill Wendling <morbo@google.com>
---
 Documentation/dev-tools/index.rst     |   1 +
 Documentation/dev-tools/pgo.rst       | 127 +++++++++
 MAINTAINERS                           |   9 +
 Makefile                              |   3 +
 arch/Kconfig                          |   1 +
 arch/arm/boot/bootp/Makefile          |   1 +
 arch/arm/boot/compressed/Makefile     |   1 +
 arch/arm/vdso/Makefile                |   3 +-
 arch/arm64/kernel/vdso/Makefile       |   3 +-
 arch/arm64/kvm/hyp/nvhe/Makefile      |   1 +
 arch/mips/boot/compressed/Makefile    |   1 +
 arch/mips/vdso/Makefile               |   1 +
 arch/nds32/kernel/vdso/Makefile       |   4 +-
 arch/parisc/boot/compressed/Makefile  |   1 +
 arch/powerpc/kernel/Makefile          |   6 +-
 arch/powerpc/kernel/trace/Makefile    |   3 +-
 arch/powerpc/kernel/vdso32/Makefile   |   1 +
 arch/powerpc/kernel/vdso64/Makefile   |   1 +
 arch/powerpc/kexec/Makefile           |   3 +-
 arch/powerpc/xmon/Makefile            |   1 +
 arch/riscv/kernel/vdso/Makefile       |   3 +-
 arch/s390/boot/Makefile               |   1 +
 arch/s390/boot/compressed/Makefile    |   1 +
 arch/s390/kernel/Makefile             |   1 +
 arch/s390/kernel/vdso64/Makefile      |   3 +-
 arch/s390/purgatory/Makefile          |   1 +
 arch/sh/boot/compressed/Makefile      |   1 +
 arch/sh/mm/Makefile                   |   1 +
 arch/sparc/vdso/Makefile              |   1 +
 arch/x86/Kconfig                      |   1 +
 arch/x86/boot/Makefile                |   1 +
 arch/x86/boot/compressed/Makefile     |   1 +
 arch/x86/entry/vdso/Makefile          |   1 +
 arch/x86/kernel/vmlinux.lds.S         |   2 +
 arch/x86/platform/efi/Makefile        |   1 +
 arch/x86/purgatory/Makefile           |   1 +
 arch/x86/realmode/rm/Makefile         |   1 +
 arch/x86/um/vdso/Makefile             |   1 +
 drivers/firmware/efi/libstub/Makefile |   1 +
 drivers/s390/char/Makefile            |   1 +
 include/asm-generic/vmlinux.lds.h     |  44 +++
 kernel/Makefile                       |   1 +
 kernel/pgo/Kconfig                    |  34 +++
 kernel/pgo/Makefile                   |   5 +
 kernel/pgo/fs.c                       | 382 ++++++++++++++++++++++++++
 kernel/pgo/instrument.c               | 188 +++++++++++++
 kernel/pgo/pgo.h                      | 206 ++++++++++++++
 scripts/Makefile.lib                  |  10 +
 48 files changed, 1058 insertions(+), 9 deletions(-)
 create mode 100644 Documentation/dev-tools/pgo.rst
 create mode 100644 kernel/pgo/Kconfig
 create mode 100644 kernel/pgo/Makefile
 create mode 100644 kernel/pgo/fs.c
 create mode 100644 kernel/pgo/instrument.c
 create mode 100644 kernel/pgo/pgo.h

diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
index f7809c7b1ba9e..8d6418e858062 100644
--- a/Documentation/dev-tools/index.rst
+++ b/Documentation/dev-tools/index.rst
@@ -26,6 +26,7 @@ whole; patches welcome!
    kgdb
    kselftest
    kunit/index
+   pgo
 
 
 .. only::  subproject and html
diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
new file mode 100644
index 0000000000000..da0e654ae7078
--- /dev/null
+++ b/Documentation/dev-tools/pgo.rst
@@ -0,0 +1,127 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============================
+Using PGO with the Linux kernel
+===============================
+
+Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
+when building with Clang. The profiling data is exported via the ``pgo``
+debugfs directory.
+
+.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
+
+
+Preparation
+===========
+
+Configure the kernel with:
+
+.. code-block:: make
+
+   CONFIG_DEBUG_FS=y
+   CONFIG_PGO_CLANG=y
+
+Note that kernels compiled with profiling flags will be significantly larger
+and run slower.
+
+Profiling data will only become accessible once debugfs has been mounted:
+
+.. code-block:: sh
+
+   mount -t debugfs none /sys/kernel/debug
+
+
+Customization
+=============
+
+You can enable or disable profiling for individual file and directories by
+adding a line similar to the following to the respective kernel Makefile:
+
+- For a single file (e.g. main.o)
+
+  .. code-block:: make
+
+     PGO_PROFILE_main.o := y
+
+- For all files in one directory
+
+  .. code-block:: make
+
+     PGO_PROFILE := y
+
+To exclude files from being profiled use
+
+  .. code-block:: make
+
+     PGO_PROFILE_main.o := n
+
+and
+
+  .. code-block:: make
+
+     PGO_PROFILE := n
+
+Only files which are linked to the main kernel image or are compiled as kernel
+modules are supported by this mechanism.
+
+
+Files
+=====
+
+The PGO kernel support creates the following files in debugfs:
+
+``/sys/kernel/debug/pgo``
+	Parent directory for all PGO-related files.
+
+``/sys/kernel/debug/pgo/reset``
+	Global reset file: resets all coverage data to zero when written to.
+
+``/sys/kernel/debug/profraw``
+	The raw PGO data that must be processed with ``llvm_profdata``.
+
+
+Workflow
+========
+
+The PGO kernel can be run on the host or test machines. The data though should
+be analyzed with Clang's tools from the same Clang version as the kernel was
+compiled. Clang's tolerant of version skew, but it's easier to use the same
+Clang version.
+
+The profiling data is useful for optimizing the kernel, analyzing coverage,
+etc. Clang offers tools to perform these tasks.
+
+Here is an example workflow for profiling an instrumented kernel with PGO and
+using the result to optimize the kernel:
+
+1) Install the kernel on the TEST machine.
+
+2) Reset the data counters right before running the load tests
+
+   .. code-block:: sh
+
+      echo 1 > /sys/kernel/debug/pgo/reset
+
+3) Run the load tests.
+
+4) Collect the raw profile data
+
+   .. code-block:: sh
+
+      cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
+
+5) (Optional) Download the raw profile data to the HOST machine.
+
+6) Process the raw profile data
+
+   .. code-block:: sh
+
+      llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
+
+   Note that multiple raw profile data files can be merged during this step.
+
+7) Rebuild the kernel using the profile data (PGO disabled)
+
+   .. code-block:: sh
+
+      make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
diff --git a/MAINTAINERS b/MAINTAINERS
index cc1e6a5ee6e67..1b979da316fa4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13954,6 +13954,15 @@ S:	Maintained
 F:	include/linux/personality.h
 F:	include/uapi/linux/personality.h
 
+PGO BASED KERNEL PROFILING
+M:	Sami Tolvanen <samitolvanen@google.com>
+M:	Bill Wendling <wcw@google.com>
+R:	Nathan Chancellor <natechancellor@gmail.com>
+R:	Nick Desaulniers <ndesaulniers@google.com>
+S:	Supported
+F:	Documentation/dev-tools/pgo.rst
+F:	kernel/pgo
+
 PHOENIX RC FLIGHT CONTROLLER ADAPTER
 M:	Marcus Folkesson <marcus.folkesson@gmail.com>
 L:	linux-input@vger.kernel.org
diff --git a/Makefile b/Makefile
index 9e73f82e0d863..9128bfe1ccc97 100644
--- a/Makefile
+++ b/Makefile
@@ -659,6 +659,9 @@ endif # KBUILD_EXTMOD
 # Defaults to vmlinux, but the arch makefile usually adds further targets
 all: vmlinux
 
+CFLAGS_PGO_CLANG := -fprofile-generate
+export CFLAGS_PGO_CLANG
+
 CFLAGS_GCOV	:= -fprofile-arcs -ftest-coverage \
 	$(call cc-option,-fno-tree-loop-im) \
 	$(call cc-disable-warning,maybe-uninitialized,)
diff --git a/arch/Kconfig b/arch/Kconfig
index 24862d15f3a36..f39d3991f6bfe 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1112,6 +1112,7 @@ config ARCH_SPLIT_ARG64
 	   pairs of 32-bit arguments, select this option.
 
 source "kernel/gcov/Kconfig"
+source "kernel/pgo/Kconfig"
 
 source "scripts/gcc-plugins/Kconfig"
 
diff --git a/arch/arm/boot/bootp/Makefile b/arch/arm/boot/bootp/Makefile
index 981a8d03f064c..523bd58df0a4b 100644
--- a/arch/arm/boot/bootp/Makefile
+++ b/arch/arm/boot/bootp/Makefile
@@ -7,6 +7,7 @@
 #
 
 GCOV_PROFILE	:= n
+PGO_PROFILE	:= n
 
 LDFLAGS_bootp	:= --no-undefined -X \
 		 --defsym initrd_phys=$(INITRD_PHYS) \
diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
index fb521efcc6c20..5fd0fd85fc0e5 100644
--- a/arch/arm/boot/compressed/Makefile
+++ b/arch/arm/boot/compressed/Makefile
@@ -24,6 +24,7 @@ OBJS		+= hyp-stub.o
 endif
 
 GCOV_PROFILE		:= n
+PGO_PROFILE		:= n
 KASAN_SANITIZE		:= n
 
 # Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
diff --git a/arch/arm/vdso/Makefile b/arch/arm/vdso/Makefile
index b558bee0e1f6b..11f6ce4b48b56 100644
--- a/arch/arm/vdso/Makefile
+++ b/arch/arm/vdso/Makefile
@@ -36,8 +36,9 @@ else
 CFLAGS_vgettimeofday.o = -O2 -include $(c-gettimeofday-y)
 endif
 
-# Disable gcov profiling for VDSO code
+# Disable gcov and PGO profiling for VDSO code
 GCOV_PROFILE := n
+PGO_PROFILE := n
 
 # Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
 KCOV_INSTRUMENT := n
diff --git a/arch/arm64/kernel/vdso/Makefile b/arch/arm64/kernel/vdso/Makefile
index cd9c3fa25902f..d48fc0df07020 100644
--- a/arch/arm64/kernel/vdso/Makefile
+++ b/arch/arm64/kernel/vdso/Makefile
@@ -41,8 +41,9 @@ ifneq ($(c-gettimeofday-y),)
   CFLAGS_vgettimeofday.o += -include $(c-gettimeofday-y)
 endif
 
-# Disable gcov profiling for VDSO code
+# Disable gcov and PGO profiling for VDSO code
 GCOV_PROFILE := n
+PGO_PROFILE := n
 
 obj-y += vdso.o
 targets += vdso.lds
diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index 1f1e351c5fe2b..ad128ecdbfbdf 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -60,6 +60,7 @@ KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_FTRACE) $(CC_FLAGS_SCS), $(KBUILD_CFLAG
 # compiler instrumentation that inserts callbacks or checks into the code may
 # cause crashes. Just disable it.
 GCOV_PROFILE	:= n
+PGO_PROFILE	:= n
 KASAN_SANITIZE	:= n
 UBSAN_SANITIZE	:= n
 KCOV_INSTRUMENT	:= n
diff --git a/arch/mips/boot/compressed/Makefile b/arch/mips/boot/compressed/Makefile
index 47cd9dc7454af..0855ea12f2c7f 100644
--- a/arch/mips/boot/compressed/Makefile
+++ b/arch/mips/boot/compressed/Makefile
@@ -37,6 +37,7 @@ KBUILD_AFLAGS := $(KBUILD_AFLAGS) -D__ASSEMBLY__ \
 # Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
 KCOV_INSTRUMENT		:= n
 GCOV_PROFILE := n
+PGO_PROFILE := n
 
 # decompressor objects (linked with vmlinuz)
 vmlinuzobjs-y := $(obj)/head.o $(obj)/decompress.o $(obj)/string.o
diff --git a/arch/mips/vdso/Makefile b/arch/mips/vdso/Makefile
index 5810cc12bc1d9..d7eb64de35eae 100644
--- a/arch/mips/vdso/Makefile
+++ b/arch/mips/vdso/Makefile
@@ -66,6 +66,7 @@ ldflags-y := -Bsymbolic --no-undefined -soname=linux-vdso.so.1 \
 CFLAGS_REMOVE_vdso.o = -pg
 
 GCOV_PROFILE := n
+PGO_PROFILE := n
 UBSAN_SANITIZE := n
 KCOV_INSTRUMENT := n
 
diff --git a/arch/nds32/kernel/vdso/Makefile b/arch/nds32/kernel/vdso/Makefile
index 55df25ef00578..f2b53ee2124b7 100644
--- a/arch/nds32/kernel/vdso/Makefile
+++ b/arch/nds32/kernel/vdso/Makefile
@@ -15,9 +15,9 @@ obj-vdso := $(addprefix $(obj)/, $(obj-vdso))
 ccflags-y := -shared -fno-common -fno-builtin -nostdlib -fPIC -Wl,-shared -g \
 	-Wl,-soname=linux-vdso.so.1 -Wl,--hash-style=sysv
 
-# Disable gcov profiling for VDSO code
+# Disable gcov and PGO profiling for VDSO code
 GCOV_PROFILE := n
-
+PGO_PROFILE := n
 
 obj-y += vdso.o
 targets += vdso.lds
diff --git a/arch/parisc/boot/compressed/Makefile b/arch/parisc/boot/compressed/Makefile
index dff4536875305..5cf93a67f7da7 100644
--- a/arch/parisc/boot/compressed/Makefile
+++ b/arch/parisc/boot/compressed/Makefile
@@ -7,6 +7,7 @@
 
 KCOV_INSTRUMENT := n
 GCOV_PROFILE := n
+PGO_PROFILE := n
 UBSAN_SANITIZE := n
 
 targets := vmlinux.lds vmlinux vmlinux.bin vmlinux.bin.gz vmlinux.bin.bz2
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index fe2ef598e2ead..c642c046660d7 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -153,17 +153,21 @@ endif
 obj-$(CONFIG_PPC_SECURE_BOOT)	+= secure_boot.o ima_arch.o secvar-ops.o
 obj-$(CONFIG_PPC_SECVAR_SYSFS)	+= secvar-sysfs.o
 
-# Disable GCOV, KCOV & sanitizers in odd or sensitive code
+# Disable GCOV, PGO, KCOV & sanitizers in odd or sensitive code
 GCOV_PROFILE_prom_init.o := n
+PGO_PROFILE_prom_init.o := n
 KCOV_INSTRUMENT_prom_init.o := n
 UBSAN_SANITIZE_prom_init.o := n
 GCOV_PROFILE_kprobes.o := n
+PGO_PROFILE_kprobes.o := n
 KCOV_INSTRUMENT_kprobes.o := n
 UBSAN_SANITIZE_kprobes.o := n
 GCOV_PROFILE_kprobes-ftrace.o := n
+PGO_PROFILE_kprobes-ftrace.o := n
 KCOV_INSTRUMENT_kprobes-ftrace.o := n
 UBSAN_SANITIZE_kprobes-ftrace.o := n
 GCOV_PROFILE_syscall_64.o := n
+PGO_PROFILE_syscall_64.o := n
 KCOV_INSTRUMENT_syscall_64.o := n
 UBSAN_SANITIZE_syscall_64.o := n
 UBSAN_SANITIZE_vdso.o := n
diff --git a/arch/powerpc/kernel/trace/Makefile b/arch/powerpc/kernel/trace/Makefile
index 858503775c583..7d72ae7d4f8c6 100644
--- a/arch/powerpc/kernel/trace/Makefile
+++ b/arch/powerpc/kernel/trace/Makefile
@@ -23,7 +23,8 @@ obj-$(CONFIG_TRACING)			+= trace_clock.o
 obj-$(CONFIG_PPC64)			+= $(obj64-y)
 obj-$(CONFIG_PPC32)			+= $(obj32-y)
 
-# Disable GCOV, KCOV & sanitizers in odd or sensitive code
+# Disable GCOV, PGO, KCOV & sanitizers in odd or sensitive code
 GCOV_PROFILE_ftrace.o := n
+PGO_PROFILE_ftrace.o := n
 KCOV_INSTRUMENT_ftrace.o := n
 UBSAN_SANITIZE_ftrace.o := n
diff --git a/arch/powerpc/kernel/vdso32/Makefile b/arch/powerpc/kernel/vdso32/Makefile
index 9cb6f524854b9..655e159975a04 100644
--- a/arch/powerpc/kernel/vdso32/Makefile
+++ b/arch/powerpc/kernel/vdso32/Makefile
@@ -34,6 +34,7 @@ targets := $(obj-vdso32) vdso32.so.dbg
 obj-vdso32 := $(addprefix $(obj)/, $(obj-vdso32))
 
 GCOV_PROFILE := n
+PGO_PROFILE := n
 KCOV_INSTRUMENT := n
 UBSAN_SANITIZE := n
 KASAN_SANITIZE := n
diff --git a/arch/powerpc/kernel/vdso64/Makefile b/arch/powerpc/kernel/vdso64/Makefile
index bf363ff371521..12c286f5afc16 100644
--- a/arch/powerpc/kernel/vdso64/Makefile
+++ b/arch/powerpc/kernel/vdso64/Makefile
@@ -21,6 +21,7 @@ targets := $(obj-vdso64) vdso64.so.dbg
 obj-vdso64 := $(addprefix $(obj)/, $(obj-vdso64))
 
 GCOV_PROFILE := n
+PGO_PROFILE := n
 KCOV_INSTRUMENT := n
 UBSAN_SANITIZE := n
 KASAN_SANITIZE := n
diff --git a/arch/powerpc/kexec/Makefile b/arch/powerpc/kexec/Makefile
index 4aff6846c7726..1c7f65e3cb969 100644
--- a/arch/powerpc/kexec/Makefile
+++ b/arch/powerpc/kexec/Makefile
@@ -16,7 +16,8 @@ endif
 endif
 
 
-# Disable GCOV, KCOV & sanitizers in odd or sensitive code
+# Disable GCOV, PGO, KCOV & sanitizers in odd or sensitive code
 GCOV_PROFILE_core_$(BITS).o := n
+PGO_PROFILE_core_$(BITS).o := n
 KCOV_INSTRUMENT_core_$(BITS).o := n
 UBSAN_SANITIZE_core_$(BITS).o := n
diff --git a/arch/powerpc/xmon/Makefile b/arch/powerpc/xmon/Makefile
index eb25d7554ffd1..7aff80d18b44b 100644
--- a/arch/powerpc/xmon/Makefile
+++ b/arch/powerpc/xmon/Makefile
@@ -2,6 +2,7 @@
 # Makefile for xmon
 
 GCOV_PROFILE := n
+PGO_PROFILE := n
 KCOV_INSTRUMENT := n
 UBSAN_SANITIZE := n
 KASAN_SANITIZE := n
diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile
index 0cfd6da784f84..882340dc3c647 100644
--- a/arch/riscv/kernel/vdso/Makefile
+++ b/arch/riscv/kernel/vdso/Makefile
@@ -32,8 +32,9 @@ CPPFLAGS_vdso.lds += -P -C -U$(ARCH)
 # Disable -pg to prevent insert call site
 CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os
 
-# Disable gcov profiling for VDSO code
+# Disable gcov and PGO profiling for VDSO code
 GCOV_PROFILE := n
+PGO_PROFILE := n
 KCOV_INSTRUMENT := n
 
 # Force dependency
diff --git a/arch/s390/boot/Makefile b/arch/s390/boot/Makefile
index 41a64b8dce252..bee4a32040e79 100644
--- a/arch/s390/boot/Makefile
+++ b/arch/s390/boot/Makefile
@@ -5,6 +5,7 @@
 
 KCOV_INSTRUMENT := n
 GCOV_PROFILE := n
+PGO_PROFILE := n
 UBSAN_SANITIZE := n
 KASAN_SANITIZE := n
 
diff --git a/arch/s390/boot/compressed/Makefile b/arch/s390/boot/compressed/Makefile
index de18dab518bb6..c3ab883e8425a 100644
--- a/arch/s390/boot/compressed/Makefile
+++ b/arch/s390/boot/compressed/Makefile
@@ -7,6 +7,7 @@
 
 KCOV_INSTRUMENT := n
 GCOV_PROFILE := n
+PGO_PROFILE := n
 UBSAN_SANITIZE := n
 KASAN_SANITIZE := n
 
diff --git a/arch/s390/kernel/Makefile b/arch/s390/kernel/Makefile
index dd73b7f074237..bd857aacad794 100644
--- a/arch/s390/kernel/Makefile
+++ b/arch/s390/kernel/Makefile
@@ -14,6 +14,7 @@ CFLAGS_REMOVE_early.o		= $(CC_FLAGS_FTRACE)
 endif
 
 GCOV_PROFILE_early.o		:= n
+PGO_PROFILE_early.o		:= n
 KCOV_INSTRUMENT_early.o		:= n
 UBSAN_SANITIZE_early.o		:= n
 KASAN_SANITIZE_ipl.o		:= n
diff --git a/arch/s390/kernel/vdso64/Makefile b/arch/s390/kernel/vdso64/Makefile
index a6e0fb6b91d6c..d7c43b7c1db96 100644
--- a/arch/s390/kernel/vdso64/Makefile
+++ b/arch/s390/kernel/vdso64/Makefile
@@ -35,8 +35,9 @@ obj-y += vdso64_wrapper.o
 targets += vdso64.lds
 CPPFLAGS_vdso64.lds += -P -C -U$(ARCH)
 
-# Disable gcov profiling, ubsan and kasan for VDSO code
+# Disable gcov and PGO profiling, ubsan and kasan for VDSO code
 GCOV_PROFILE := n
+PGO_PROFILE := n
 UBSAN_SANITIZE := n
 KASAN_SANITIZE := n
 
diff --git a/arch/s390/purgatory/Makefile b/arch/s390/purgatory/Makefile
index c57f8c40e9926..9aef584e98466 100644
--- a/arch/s390/purgatory/Makefile
+++ b/arch/s390/purgatory/Makefile
@@ -17,6 +17,7 @@ $(obj)/mem.o: $(srctree)/arch/s390/lib/mem.S FORCE
 
 KCOV_INSTRUMENT := n
 GCOV_PROFILE := n
+PGO_PROFILE := n
 UBSAN_SANITIZE := n
 KASAN_SANITIZE := n
 
diff --git a/arch/sh/boot/compressed/Makefile b/arch/sh/boot/compressed/Makefile
index 589d2d8a573db..ae19aeeb3964c 100644
--- a/arch/sh/boot/compressed/Makefile
+++ b/arch/sh/boot/compressed/Makefile
@@ -13,6 +13,7 @@ targets		:= vmlinux vmlinux.bin vmlinux.bin.gz \
 OBJECTS = $(obj)/head_32.o $(obj)/misc.o $(obj)/cache.o
 
 GCOV_PROFILE := n
+PGO_PROFILE := n
 
 #
 # IMAGE_OFFSET is the load offset of the compression loader
diff --git a/arch/sh/mm/Makefile b/arch/sh/mm/Makefile
index f69ddc70b1465..ea2782c631f43 100644
--- a/arch/sh/mm/Makefile
+++ b/arch/sh/mm/Makefile
@@ -43,3 +43,4 @@ obj-$(CONFIG_UNCACHED_MAPPING)	+= uncached.o
 obj-$(CONFIG_HAVE_SRAM_POOL)	+= sram.o
 
 GCOV_PROFILE_pmb.o := n
+PGO_PROFILE_pmb.o := n
diff --git a/arch/sparc/vdso/Makefile b/arch/sparc/vdso/Makefile
index c5e1545bc5cf9..ab5f3783fe199 100644
--- a/arch/sparc/vdso/Makefile
+++ b/arch/sparc/vdso/Makefile
@@ -115,6 +115,7 @@ quiet_cmd_vdso = VDSO    $@
 
 VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 -Bsymbolic
 GCOV_PROFILE := n
+PGO_PROFILE := n
 
 #
 # Install the unstripped copies of vdso*.so.  If our toolchain supports
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 21f851179ff08..36305ea61dc09 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -96,6 +96,7 @@ config X86
 	select ARCH_SUPPORTS_DEBUG_PAGEALLOC
 	select ARCH_SUPPORTS_NUMA_BALANCING	if X86_64
 	select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP	if NR_CPUS <= 4096
+	select ARCH_SUPPORTS_PGO_CLANG		if X86_64
 	select ARCH_USE_BUILTIN_BSWAP
 	select ARCH_USE_QUEUED_RWLOCKS
 	select ARCH_USE_QUEUED_SPINLOCKS
diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index fe605205b4ce2..383853e32f673 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -71,6 +71,7 @@ KBUILD_AFLAGS	:= $(KBUILD_CFLAGS) -D__ASSEMBLY__
 KBUILD_CFLAGS	+= $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
 KBUILD_CFLAGS	+= -fno-asynchronous-unwind-tables
 GCOV_PROFILE := n
+PGO_PROFILE := n
 UBSAN_SANITIZE := n
 
 $(obj)/bzImage: asflags-y  := $(SVGA_MODE)
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index e0bc3988c3faa..ed12ab65f6065 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -54,6 +54,7 @@ CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/
 
 KBUILD_AFLAGS  := $(KBUILD_CFLAGS) -D__ASSEMBLY__
 GCOV_PROFILE := n
+PGO_PROFILE := n
 UBSAN_SANITIZE :=n
 
 KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 02e3e42f380bd..26e2b3af0145c 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -179,6 +179,7 @@ quiet_cmd_vdso = VDSO    $@
 VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 \
 	$(call ld-option, --eh-frame-hdr) -Bsymbolic
 GCOV_PROFILE := n
+PGO_PROFILE := n
 
 quiet_cmd_vdso_and_check = VDSO    $@
       cmd_vdso_and_check = $(cmd_vdso); $(cmd_vdso_check)
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index efd9e9ea17f25..f6cab2316c46a 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -184,6 +184,8 @@ SECTIONS
 
 	BUG_TABLE
 
+	PGO_CLANG_DATA
+
 	ORC_UNWIND_TABLE
 
 	. = ALIGN(PAGE_SIZE);
diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile
index 84b09c230cbd5..5f22b31446ad4 100644
--- a/arch/x86/platform/efi/Makefile
+++ b/arch/x86/platform/efi/Makefile
@@ -2,6 +2,7 @@
 OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y
 KASAN_SANITIZE := n
 GCOV_PROFILE := n
+PGO_PROFILE := n
 
 obj-$(CONFIG_EFI) 		+= quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o
 obj-$(CONFIG_EFI_MIXED)		+= efi_thunk_$(BITS).o
diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
index 95ea17a9d20cb..36f20e99da0bc 100644
--- a/arch/x86/purgatory/Makefile
+++ b/arch/x86/purgatory/Makefile
@@ -23,6 +23,7 @@ targets += purgatory.ro purgatory.chk
 
 # Sanitizer, etc. runtimes are unavailable and cannot be linked here.
 GCOV_PROFILE	:= n
+PGO_PROFILE	:= n
 KASAN_SANITIZE	:= n
 UBSAN_SANITIZE	:= n
 KCSAN_SANITIZE	:= n
diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
index 83f1b6a56449f..21797192f958f 100644
--- a/arch/x86/realmode/rm/Makefile
+++ b/arch/x86/realmode/rm/Makefile
@@ -76,4 +76,5 @@ KBUILD_CFLAGS	:= $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
 KBUILD_AFLAGS	:= $(KBUILD_CFLAGS) -D__ASSEMBLY__
 KBUILD_CFLAGS	+= -fno-asynchronous-unwind-tables
 GCOV_PROFILE := n
+PGO_PROFILE := n
 UBSAN_SANITIZE := n
diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile
index 5943387e3f357..54f5768f58530 100644
--- a/arch/x86/um/vdso/Makefile
+++ b/arch/x86/um/vdso/Makefile
@@ -64,6 +64,7 @@ quiet_cmd_vdso = VDSO    $@
 
 VDSO_LDFLAGS = -fPIC -shared -Wl,--hash-style=sysv
 GCOV_PROFILE := n
+PGO_PROFILE := n
 
 #
 # Install the unstripped copy of vdso*.so listed in $(vdso-install-y).
diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index 8a94388e38b33..2d81623b33f29 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -40,6 +40,7 @@ KBUILD_CFLAGS			:= $(cflags-y) -Os -DDISABLE_BRANCH_PROFILING \
 KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))
 
 GCOV_PROFILE			:= n
+PGO_PROFILE			:= n
 # Sanitizer runtimes are unavailable and cannot be linked here.
 KASAN_SANITIZE			:= n
 KCSAN_SANITIZE			:= n
diff --git a/drivers/s390/char/Makefile b/drivers/s390/char/Makefile
index c6fdb81a068a6..bf6c5db5da1fc 100644
--- a/drivers/s390/char/Makefile
+++ b/drivers/s390/char/Makefile
@@ -9,6 +9,7 @@ CFLAGS_REMOVE_sclp_early_core.o	= $(CC_FLAGS_FTRACE)
 endif
 
 GCOV_PROFILE_sclp_early_core.o		:= n
+PGO_PROFILE_sclp_early_core.o		:= n
 KCOV_INSTRUMENT_sclp_early_core.o	:= n
 UBSAN_SANITIZE_sclp_early_core.o	:= n
 KASAN_SANITIZE_sclp_early_core.o	:= n
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index b2b3d81b1535a..3a591bb18c5fb 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -316,6 +316,49 @@
 #define THERMAL_TABLE(name)
 #endif
 
+#ifdef CONFIG_PGO_CLANG
+#define PGO_CLANG_DATA							\
+	__llvm_prf_data : AT(ADDR(__llvm_prf_data) - LOAD_OFFSET) {	\
+		. = ALIGN(8);						\
+		__llvm_prf_start = .;					\
+		__llvm_prf_data_start = .;				\
+		KEEP(*(__llvm_prf_data))				\
+		. = ALIGN(8);						\
+		__llvm_prf_data_end = .;				\
+	}								\
+	__llvm_prf_cnts : AT(ADDR(__llvm_prf_cnts) - LOAD_OFFSET) {	\
+		. = ALIGN(8);						\
+		__llvm_prf_cnts_start = .;				\
+		KEEP(*(__llvm_prf_cnts))				\
+		. = ALIGN(8);						\
+		__llvm_prf_cnts_end = .;				\
+	}								\
+	__llvm_prf_names : AT(ADDR(__llvm_prf_names) - LOAD_OFFSET) {	\
+		. = ALIGN(8);						\
+		__llvm_prf_names_start = .;				\
+		KEEP(*(__llvm_prf_names))				\
+		. = ALIGN(8);						\
+		__llvm_prf_names_end = .;				\
+		. = ALIGN(8);						\
+	}								\
+	__llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) {	\
+		__llvm_prf_vals_start = .;				\
+		KEEP(*(__llvm_prf_vals))				\
+		. = ALIGN(8);						\
+		__llvm_prf_vals_end = .;				\
+		. = ALIGN(8);						\
+	}								\
+	__llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) {	\
+		__llvm_prf_vnds_start = .;				\
+		KEEP(*(__llvm_prf_vnds))				\
+		. = ALIGN(8);						\
+		__llvm_prf_vnds_end = .;				\
+		__llvm_prf_end = .;					\
+	}
+#else
+#define PGO_CLANG_DATA
+#endif
+
 #define KERNEL_DTB()							\
 	STRUCT_ALIGN();							\
 	__dtb_start = .;						\
@@ -1125,6 +1168,7 @@
 		CONSTRUCTORS						\
 	}								\
 	BUG_TABLE							\
+	PGO_CLANG_DATA
 
 #define INIT_TEXT_SECTION(inittext_align)				\
 	. = ALIGN(inittext_align);					\
diff --git a/kernel/Makefile b/kernel/Makefile
index aa7368c7eabf3..0b34ca228ba46 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -111,6 +111,7 @@ obj-$(CONFIG_BPF) += bpf/
 obj-$(CONFIG_KCSAN) += kcsan/
 obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
 obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o
+obj-$(CONFIG_PGO_CLANG) += pgo/
 
 obj-$(CONFIG_PERF_EVENTS) += events/
 
diff --git a/kernel/pgo/Kconfig b/kernel/pgo/Kconfig
new file mode 100644
index 0000000000000..318d36bb3d106
--- /dev/null
+++ b/kernel/pgo/Kconfig
@@ -0,0 +1,34 @@
+# SPDX-License-Identifier: GPL-2.0-only
+menu "Profile Guided Optimization (PGO) (EXPERIMENTAL)"
+
+config ARCH_SUPPORTS_PGO_CLANG
+	bool
+
+config PGO_CLANG
+	bool "Enable clang's PGO-based kernel profiling"
+	depends on DEBUG_FS
+	depends on ARCH_SUPPORTS_PGO_CLANG
+	help
+	  This option enables clang's PGO (Profile Guided Optimization) based
+	  code profiling to better optimize the kernel.
+
+	  If unsure, say N.
+
+	  Run a representative workload for your application on a kernel
+	  compiled with this option and download the raw profile file from
+	  /sys/kernel/debug/pgo/profraw. This file needs to be processed with
+	  llvm-profdata. It may be merged with other collected raw profiles.
+
+	  Copy the resulting profile file into vmlinux.profdata, and enable
+	  KCFLAGS=-fprofile-use=vmlinux.profdata to produce an optimized
+	  kernel.
+
+	  Note that a kernel compiled with profiling flags will be
+	  significatnly larger and run slower. Also be sure to exclude files
+	  from profiling which are not linked to the kernel image to prevent
+	  linker errors.
+
+	  Note that the debugfs filesystem has to be mounted to access
+	  profiling data.
+
+endmenu
diff --git a/kernel/pgo/Makefile b/kernel/pgo/Makefile
new file mode 100644
index 0000000000000..41e27cefd9a47
--- /dev/null
+++ b/kernel/pgo/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+GCOV_PROFILE	:= n
+PGO_PROFILE	:= n
+
+obj-y	+= fs.o instrument.o
diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c
new file mode 100644
index 0000000000000..790a8df037bfc
--- /dev/null
+++ b/kernel/pgo/fs.c
@@ -0,0 +1,382 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Google, Inc.
+ *
+ * Author:
+ *	Sami Tolvanen <samitolvanen@google.com>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#define pr_fmt(fmt)	"pgo: " fmt
+
+#include <linux/kernel.h>
+#include <linux/debugfs.h>
+#include <linux/fs.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/vmalloc.h>
+#include "pgo.h"
+
+static struct dentry *directory;
+
+struct prf_private_data {
+	void *buffer;
+	unsigned long size;
+};
+
+/*
+ * Raw profile data format:
+ *
+ *	- llvm_prf_header
+ *	- __llvm_prf_data
+ *	- __llvm_prf_cnts
+ *	- __llvm_prf_names
+ *	- zero padding to 8 bytes
+ *	- for each llvm_prf_data in __llvm_prf_data:
+ *		- llvm_prf_value_data
+ *			- llvm_prf_value_record + site count array
+ *				- llvm_prf_value_node_data
+ *				...
+ *			...
+ *		...
+ */
+
+static void prf_fill_header(void **buffer)
+{
+	struct llvm_prf_header *header = *(struct llvm_prf_header **)buffer;
+
+	header->magic = LLVM_PRF_MAGIC;
+	header->version = LLVM_PRF_VARIANT_MASK_IR | LLVM_PRF_VERSION;
+	header->data_size = prf_data_count();
+	header->padding_bytes_before_counters = 0;
+	header->counters_size = prf_cnts_count();
+	header->padding_bytes_after_counters = 0;
+	header->names_size = prf_names_count();
+	header->counters_delta = (u64)__llvm_prf_cnts_start;
+	header->names_delta = (u64)__llvm_prf_names_start;
+	header->value_kind_last = LLVM_PRF_IPVK_LAST;
+
+	*buffer += sizeof(*header);
+}
+
+/*
+ * Copy the source into the buffer, incrementing the pointer into buffer in the
+ * process.
+ */
+static void prf_copy_to_buffer(void **buffer, void *src, unsigned long size)
+{
+	memcpy(*buffer, src, size);
+	*buffer += size;
+}
+
+static u32 __prf_get_value_size(struct llvm_prf_data *p, u32 *value_kinds)
+{
+	struct llvm_prf_value_node **nodes =
+		(struct llvm_prf_value_node **)p->values;
+	u32 kinds = 0;
+	u32 size = 0;
+	unsigned int kind;
+	unsigned int n;
+	unsigned int s = 0;
+
+	for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
+		unsigned int sites = p->num_value_sites[kind];
+
+		if (!sites)
+			continue;
+
+		/* Record + site count array */
+		size += prf_get_value_record_size(sites);
+		kinds++;
+
+		if (!nodes)
+			continue;
+
+		for (n = 0; n < sites; n++) {
+			u32 count = 0;
+			struct llvm_prf_value_node *site = nodes[s + n];
+
+			while (site && ++count <= U8_MAX)
+				site = site->next;
+
+			size += count *
+				sizeof(struct llvm_prf_value_node_data);
+		}
+
+		s += sites;
+	}
+
+	if (size)
+		size += sizeof(struct llvm_prf_value_data);
+
+	if (value_kinds)
+		*value_kinds = kinds;
+
+	return size;
+}
+
+static u32 prf_get_value_size(void)
+{
+	u32 size = 0;
+	struct llvm_prf_data *p;
+
+	for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
+		size += __prf_get_value_size(p, NULL);
+
+	return size;
+}
+
+/* Serialize the profiling's value. */
+static void prf_serialize_value(struct llvm_prf_data *p, void **buffer)
+{
+	struct llvm_prf_value_data header;
+	struct llvm_prf_value_node **nodes =
+		(struct llvm_prf_value_node **)p->values;
+	unsigned int kind;
+	unsigned int n;
+	unsigned int s = 0;
+
+	header.total_size = __prf_get_value_size(p, &header.num_value_kinds);
+
+	if (!header.num_value_kinds)
+		/* Nothing to write. */
+		return;
+
+	prf_copy_to_buffer(buffer, &header, sizeof(header));
+
+	for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
+		struct llvm_prf_value_record *record;
+		u8 *counts;
+		unsigned int sites = p->num_value_sites[kind];
+
+		if (!sites)
+			continue;
+
+		/* Profiling value record. */
+		record = *(struct llvm_prf_value_record **)buffer;
+		*buffer += prf_get_value_record_header_size();
+
+		record->kind = kind;
+		record->num_value_sites = sites;
+
+		/* Site count array. */
+		counts = *(u8 **)buffer;
+		*buffer += prf_get_value_record_site_count_size(sites);
+
+		/*
+		 * If we don't have nodes, we can skip updating the site count
+		 * array, because the buffer is zero filled.
+		 */
+		if (!nodes)
+			continue;
+
+		for (n = 0; n < sites; n++) {
+			u32 count = 0;
+			struct llvm_prf_value_node *site = nodes[s + n];
+
+			while (site && ++count <= U8_MAX) {
+				prf_copy_to_buffer(buffer, site,
+						   sizeof(struct llvm_prf_value_node_data));
+				site = site->next;
+			}
+
+			counts[n] = (u8)count;
+		}
+
+		s += sites;
+	}
+}
+
+static void prf_serialize_values(void **buffer)
+{
+	struct llvm_prf_data *p;
+
+	for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
+		prf_serialize_value(p, buffer);
+}
+
+static inline unsigned long prf_get_padding(unsigned long size)
+{
+	return 8 - (size % 8);
+}
+
+static unsigned long prf_buffer_size(void)
+{
+	return sizeof(struct llvm_prf_header) +
+			prf_data_size()	+
+			prf_cnts_size() +
+			prf_names_size() +
+			prf_get_padding(prf_names_size()) +
+			prf_get_value_size();
+}
+
+/* Serialize the profling data into a format LLVM's tools can understand. */
+static int prf_serialize(struct prf_private_data *p)
+{
+	int err = 0;
+	void *buffer;
+
+	p->size = prf_buffer_size();
+	p->buffer = vzalloc(p->size);
+
+	if (!p->buffer) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	buffer = p->buffer;
+
+	prf_fill_header(&buffer);
+	prf_copy_to_buffer(&buffer, __llvm_prf_data_start,  prf_data_size());
+	prf_copy_to_buffer(&buffer, __llvm_prf_cnts_start,  prf_cnts_size());
+	prf_copy_to_buffer(&buffer, __llvm_prf_names_start, prf_names_size());
+	buffer += prf_get_padding(prf_names_size());
+
+	prf_serialize_values(&buffer);
+
+out:
+	return err;
+}
+
+/* open() implementation for PGO. Creates a copy of the profiling data set. */
+static int prf_open(struct inode *inode, struct file *file)
+{
+	struct prf_private_data *data;
+	unsigned long flags;
+	int err;
+
+	data = kzalloc(sizeof(*data), GFP_KERNEL);
+	if (!data) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	flags = prf_lock();
+
+	err = prf_serialize(data);
+	if (err) {
+		kfree(data);
+		goto out_unlock;
+	}
+
+	file->private_data = data;
+
+out_unlock:
+	prf_unlock(flags);
+out:
+	return err;
+}
+
+/* read() implementation for PGO. */
+static ssize_t prf_read(struct file *file, char __user *buf, size_t count,
+			loff_t *ppos)
+{
+	struct prf_private_data *data = file->private_data;
+
+	BUG_ON(!data);
+
+	return simple_read_from_buffer(buf, count, ppos, data->buffer,
+				       data->size);
+}
+
+/* release() implementation for PGO. Release resources allocated by open(). */
+static int prf_release(struct inode *inode, struct file *file)
+{
+	struct prf_private_data *data = file->private_data;
+
+	if (data) {
+		vfree(data->buffer);
+		kfree(data);
+	}
+
+	return 0;
+}
+
+static const struct file_operations prf_fops = {
+	.owner		= THIS_MODULE,
+	.open		= prf_open,
+	.read		= prf_read,
+	.llseek		= default_llseek,
+	.release	= prf_release
+};
+
+/* write() implementation for resetting PGO's profile data. */
+static ssize_t reset_write(struct file *file, const char __user *addr,
+			   size_t len, loff_t *pos)
+{
+	struct llvm_prf_data *data;
+
+	memset(__llvm_prf_cnts_start, 0, prf_cnts_size());
+
+	for (data = __llvm_prf_data_start; data < __llvm_prf_data_end; ++data) {
+		struct llvm_prf_value_node **vnodes;
+		u64 current_vsite_count;
+		u32 i;
+
+		if (!data->values)
+			continue;
+
+		current_vsite_count = 0;
+		vnodes = (struct llvm_prf_value_node **)data->values;
+
+		for (i = LLVM_PRF_IPVK_FIRST; i <= LLVM_PRF_IPVK_LAST; ++i)
+			current_vsite_count += data->num_value_sites[i];
+
+		for (i = 0; i < current_vsite_count; ++i) {
+			struct llvm_prf_value_node *current_vnode = vnodes[i];
+
+			while (current_vnode) {
+				current_vnode->count = 0;
+				current_vnode = current_vnode->next;
+			}
+		}
+	}
+
+	return len;
+}
+
+static const struct file_operations prf_reset_fops = {
+	.owner		= THIS_MODULE,
+	.write		= reset_write,
+	.llseek		= noop_llseek,
+};
+
+/* Create debugfs entries. */
+static int __init pgo_init(void)
+{
+	directory = debugfs_create_dir("pgo", NULL);
+	if (!directory)
+		goto err_remove;
+
+	if (!debugfs_create_file("profraw", 0600, directory, NULL,
+				 &prf_fops))
+		goto err_remove;
+
+	if (!debugfs_create_file("reset", 0200, directory, NULL,
+				 &prf_reset_fops))
+		goto err_remove;
+
+	return 0;
+
+err_remove:
+	pr_err("initialization failed\n");
+	return -EIO;
+}
+
+/* Remove debufs entries. */
+static void __exit pgo_exit(void)
+{
+	debugfs_remove_recursive(directory);
+}
+
+module_init(pgo_init);
+module_exit(pgo_exit);
diff --git a/kernel/pgo/instrument.c b/kernel/pgo/instrument.c
new file mode 100644
index 0000000000000..465615b7f8735
--- /dev/null
+++ b/kernel/pgo/instrument.c
@@ -0,0 +1,188 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Google, Inc.
+ *
+ * Author:
+ *	Sami Tolvanen <samitolvanen@google.com>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#define pr_fmt(fmt)	"pgo: " fmt
+
+#include <linux/kernel.h>
+#include <linux/export.h>
+#include <linux/spinlock.h>
+#include <linux/types.h>
+#include "pgo.h"
+
+/* Lock guarding value node access and serialization. */
+static DEFINE_SPINLOCK(pgo_lock);
+static int current_node;
+
+unsigned long prf_lock(void)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&pgo_lock, flags);
+
+	return flags;
+}
+
+void prf_unlock(unsigned long flags)
+{
+	spin_unlock_irqrestore(&pgo_lock, flags);
+}
+
+/*
+ * Return a newly allocated profiling value node which contains the tracked
+ * value by the value profiler.
+ * Note: caller *must* hold pgo_lock.
+ */
+static struct llvm_prf_value_node *allocate_node(struct llvm_prf_data *p,
+						 u32 index, u64 value)
+{
+	if (&__llvm_prf_vnds_start[current_node + 1] >= __llvm_prf_vnds_end)
+		return NULL; /* Out of nodes */
+
+	current_node++;
+
+	/* Make sure the node is entirely within the section */
+	if (&__llvm_prf_vnds_start[current_node] >= __llvm_prf_vnds_end ||
+	    &__llvm_prf_vnds_start[current_node + 1] > __llvm_prf_vnds_end)
+		return NULL;
+
+	return &__llvm_prf_vnds_start[current_node];
+}
+
+/*
+ * Counts the number of times a target value is seen.
+ *
+ * Records the target value for the CounterIndex if not seen before. Otherwise,
+ * increments the counter associated w/ the target value.
+ */
+void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index)
+{
+	struct llvm_prf_data *p = (struct llvm_prf_data *)data;
+	struct llvm_prf_value_node **counters;
+	struct llvm_prf_value_node *curr;
+	struct llvm_prf_value_node *min = NULL;
+	struct llvm_prf_value_node *prev = NULL;
+	u64 min_count = U64_MAX;
+	u8 values = 0;
+	unsigned long flags;
+
+	if (!p || !p->values)
+		return;
+
+	counters = (struct llvm_prf_value_node **)p->values;
+	curr = counters[index];
+
+	while (curr) {
+		if (target_value == curr->value) {
+			curr->count++;
+			return;
+		}
+
+		if (curr->count < min_count) {
+			min_count = curr->count;
+			min = curr;
+		}
+
+		prev = curr;
+		curr = curr->next;
+		values++;
+	}
+
+	if (values >= LLVM_PRF_MAX_NUM_VALS_PER_SITE) {
+		if (!min->count || !(--min->count)) {
+			curr = min;
+			curr->value = target_value;
+			curr->count++;
+		}
+		return;
+	}
+
+	/* Lock when updating the value node structure. */
+	flags = prf_lock();
+
+	curr = allocate_node(p, index, target_value);
+	if (!curr)
+		goto out;
+
+	curr->value = target_value;
+	curr->count++;
+
+	if (!counters[index])
+		counters[index] = curr;
+	else if (prev && !prev->next)
+		prev->next = curr;
+
+out:
+	prf_unlock(flags);
+}
+EXPORT_SYMBOL(__llvm_profile_instrument_target);
+
+/* Counts the number of times a range of targets values are seen. */
+void __llvm_profile_instrument_range(u64 target_value, void *data,
+				     u32 index, s64 precise_start,
+				     s64 precise_last, s64 large_value)
+{
+	if (large_value != S64_MIN && (s64)target_value >= large_value)
+		target_value = large_value;
+	else if ((s64)target_value < precise_start ||
+		 (s64)target_value > precise_last)
+		target_value = precise_last + 1;
+
+	__llvm_profile_instrument_target(target_value, data, index);
+}
+EXPORT_SYMBOL(__llvm_profile_instrument_range);
+
+static inline int inst_prof_popcount(unsigned long long value)
+{
+	value = value - ((value >> 1) & 0x5555555555555555ULL);
+	value = (value & 0x3333333333333333ULL) +
+		((value >> 2) & 0x3333333333333333ULL);
+	value = (value + (value >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
+
+	return (int)((unsigned long long)(value * 0x0101010101010101ULL) >> 56);
+}
+
+static u64 inst_prof_get_range_rep_value(u64 value)
+{
+	if (value <= 8)
+		/* The first ranges are individually tracked, us it as is. */
+		return value;
+	else if (value >= 513)
+		/* The last range is mapped to its lowest value. */
+		return 513;
+	else if (inst_prof_popcount(value) == 1)
+		/* If it's a power of two, use it as is. */
+		return value;
+
+	/* Otherwise, take to the previous power of two + 1. */
+	return (1 << (64 - __builtin_clzll(value) - 1)) + 1;
+}
+
+/*
+ * The target values are partitioned into multiple ranges. The range spec is
+ * defined in compiler-rt/include/profile/InstrProfData.inc.
+ */
+void __llvm_profile_instrument_memop(u64 target_value, void *data,
+				     u32 counter_index)
+{
+	u64 rep_value;
+
+	/* Map the target value to the representative value of its range. */
+	rep_value = inst_prof_get_range_rep_value(target_value);
+	__llvm_profile_instrument_target(rep_value, data, counter_index);
+}
+EXPORT_SYMBOL(__llvm_profile_instrument_memop);
diff --git a/kernel/pgo/pgo.h b/kernel/pgo/pgo.h
new file mode 100644
index 0000000000000..df0aa278f28bd
--- /dev/null
+++ b/kernel/pgo/pgo.h
@@ -0,0 +1,206 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2019 Google, Inc.
+ *
+ * Author:
+ *	Sami Tolvanen <samitolvanen@google.com>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _PGO_H
+#define _PGO_H
+
+/*
+ * Note: These internal LLVM definitions must match the compiler version.
+ * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code.
+ */
+
+#ifdef CONFIG_64BIT
+	#define LLVM_PRF_MAGIC		\
+		((u64)255 << 56 |	\
+		 (u64)'l' << 48 |	\
+		 (u64)'p' << 40 |	\
+		 (u64)'r' << 32 |	\
+		 (u64)'o' << 24 |	\
+		 (u64)'f' << 16 |	\
+		 (u64)'r' << 8  |	\
+		 (u64)129)
+#else
+	#define LLVM_PRF_MAGIC		\
+		((u64)255 << 56 |	\
+		 (u64)'l' << 48 |	\
+		 (u64)'p' << 40 |	\
+		 (u64)'r' << 32 |	\
+		 (u64)'o' << 24 |	\
+		 (u64)'f' << 16 |	\
+		 (u64)'R' << 8  |	\
+		 (u64)129)
+#endif
+
+#define LLVM_PRF_VERSION		5
+#define LLVM_PRF_DATA_ALIGN		8
+#define LLVM_PRF_IPVK_FIRST		0
+#define LLVM_PRF_IPVK_LAST		1
+#define LLVM_PRF_MAX_NUM_VALS_PER_SITE	16
+
+#define LLVM_PRF_VARIANT_MASK_IR	(0x1ull << 56)
+#define LLVM_PRF_VARIANT_MASK_CSIR	(0x1ull << 57)
+
+/**
+ * struct llvm_prf_header - represents the raw profile header data structure.
+ * @magic: the magic token for the file format.
+ * @version: the version of the file format.
+ * @data_size: the number of entries in the profile data section.
+ * @padding_bytes_before_counters: the number of padding bytes before the
+ *   counters.
+ * @counters_size: the size in bytes of the LLVM profile section containing the
+ *   counters.
+ * @padding_bytes_after_counters: the number of padding bytes after the
+ *   counters.
+ * @names_size: the size in bytes of the LLVM profile section containing the
+ *   counters' names.
+ * @counters_delta: the beginning of the LLMV profile counters section.
+ * @names_delta: the beginning of the LLMV profile names section.
+ * @value_kind_last: the last profile value kind.
+ */
+struct llvm_prf_header {
+	u64 magic;
+	u64 version;
+	u64 data_size;
+	u64 padding_bytes_before_counters;
+	u64 counters_size;
+	u64 padding_bytes_after_counters;
+	u64 names_size;
+	u64 counters_delta;
+	u64 names_delta;
+	u64 value_kind_last;
+};
+
+/**
+ * struct llvm_prf_data - represents the per-function control structure.
+ * @name_ref: the reference to the function's name.
+ * @func_hash: the hash value of the function.
+ * @counter_ptr: a pointer to the profile counter.
+ * @function_ptr: a pointer to the function.
+ * @values: the profiling values associated with this function.
+ * @num_counters: the number of counters in the function.
+ * @num_value_sites: the number of value profile sites.
+ */
+struct llvm_prf_data {
+	const u64 name_ref;
+	const u64 func_hash;
+	const void *counter_ptr;
+	const void *function_ptr;
+	void *values;
+	const u32 num_counters;
+	const u16 num_value_sites[LLVM_PRF_IPVK_LAST + 1];
+} __aligned(LLVM_PRF_DATA_ALIGN);
+
+/**
+ * structure llvm_prf_value_node_data - represents the data part of the struct
+ *   llvm_prf_value_node data structure.
+ * @value: the value counters.
+ * @count: the counters' count.
+ */
+struct llvm_prf_value_node_data {
+	u64 value;
+	u64 count;
+};
+
+/**
+ * struct llvm_prf_value_node - represents an internal data structure used by
+ *   the value profiler.
+ * @value: the value counters.
+ * @count: the counters' count.
+ * @next: the next value node.
+ */
+struct llvm_prf_value_node {
+	u64 value;
+	u64 count;
+	struct llvm_prf_value_node *next;
+};
+
+/**
+ * struct llvm_prf_value_data - represents the value profiling data in indexed
+ *   format.
+ * @total_size: the total size in bytes including this field.
+ * @num_value_kinds: the number of value profile kinds that has value profile
+ *   data.
+ */
+struct llvm_prf_value_data {
+	u32 total_size;
+	u32 num_value_kinds;
+};
+
+/**
+ * struct llvm_prf_value_record - represents the on-disk layout of the value
+ *   profile data of a particular kind for one function.
+ * @kind: the kind of the value profile record.
+ * @num_value_sites: the number of value profile sites.
+ * @site_count_array: the first element of the array that stores the number
+ *   of profiled values for each value site.
+ */
+struct llvm_prf_value_record {
+	u32 kind;
+	u32 num_value_sites;
+	u8 site_count_array[];
+};
+
+#define prf_get_value_record_header_size()		\
+	offsetof(struct llvm_prf_value_record, site_count_array)
+#define prf_get_value_record_site_count_size(sites)	\
+	roundup((sites), 8)
+#define prf_get_value_record_size(sites)		\
+	(prf_get_value_record_header_size() +		\
+	 prf_get_value_record_site_count_size((sites)))
+
+/* Data sections */
+extern struct llvm_prf_data __llvm_prf_data_start[];
+extern struct llvm_prf_data __llvm_prf_data_end[];
+
+extern u64 __llvm_prf_cnts_start[];
+extern u64 __llvm_prf_cnts_end[];
+
+extern char __llvm_prf_names_start[];
+extern char __llvm_prf_names_end[];
+
+extern struct llvm_prf_value_node __llvm_prf_vnds_start[];
+extern struct llvm_prf_value_node __llvm_prf_vnds_end[];
+
+/* Locking for vnodes */
+extern unsigned long prf_lock(void);
+extern void prf_unlock(unsigned long flags);
+
+#define __DEFINE_PRF_SIZE(s) \
+	static inline unsigned long prf_ ## s ## _size(void)		\
+	{								\
+		unsigned long start =					\
+			(unsigned long)__llvm_prf_ ## s ## _start;	\
+		unsigned long end =					\
+			(unsigned long)__llvm_prf_ ## s ## _end;	\
+		return roundup(end - start,				\
+				sizeof(__llvm_prf_ ## s ## _start[0]));	\
+	}								\
+	static inline unsigned long prf_ ## s ## _count(void)		\
+	{								\
+		return prf_ ## s ## _size() /				\
+			sizeof(__llvm_prf_ ## s ## _start[0]);		\
+	}
+
+__DEFINE_PRF_SIZE(data);
+__DEFINE_PRF_SIZE(cnts);
+__DEFINE_PRF_SIZE(names);
+__DEFINE_PRF_SIZE(vnds);
+
+#undef __DEFINE_PRF_SIZE
+
+#endif /* _PGO_H */
diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index 213677a5ed33e..9b218afb5cb87 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -143,6 +143,16 @@ _c_flags += $(if $(patsubst n%,, \
 		$(CFLAGS_GCOV))
 endif
 
+#
+# Enable clang's PGO profiling flags for a file or directory depending on
+# variables PGO_PROFILE_obj.o and PGO_PROFILE.
+#
+ifeq ($(CONFIG_PGO_CLANG),y)
+_c_flags += $(if $(patsubst n%,, \
+		$(PGO_PROFILE_$(basetarget).o)$(PGO_PROFILE)y), \
+		$(CFLAGS_PGO_CLANG))
+endif
+
 #
 # Enable address sanitizer flags for kernel except some files or directories
 # we don't want to check (depends on variables KASAN_SANITIZE_obj.o, KASAN_SANITIZE)
-- 
2.30.0.284.gd98b1dd5eaa7-goog


^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2] pgo: add clang's Profile Guided Optimization infrastructure
  2021-01-12  5:14 ` [PATCH v2] " Bill Wendling
@ 2021-01-12  5:17   ` Sedat Dilek
  2021-01-12  5:31   ` [PATCH v3] " Bill Wendling
  2021-01-12 17:37   ` [PATCH v2] " Nick Desaulniers
  2 siblings, 0 replies; 116+ messages in thread
From: Sedat Dilek @ 2021-01-12  5:17 UTC (permalink / raw)
  To: Bill Wendling
  Cc: Jonathan Corbet, Masahiro Yamada, linux-doc, linux-kernel,
	linux-kbuild, Clang-Built-Linux ML, Andrew Morton,
	Nathan Chancellor, Nick Desaulniers, Sami Tolvanen

On Tue, Jan 12, 2021 at 6:14 AM 'Bill Wendling' via Clang Built Linux
<clang-built-linux@googlegroups.com> wrote:
>
> From: Sami Tolvanen <samitolvanen@google.com>
>
> Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> profile, the kernel is instrumented with PGO counters, a representative
> workload is run, and the raw profile data is collected from
> /sys/kernel/debug/pgo/profraw.
>
> The raw profile data must be processed by clang's "llvm-profdata" tool
> before it can be used during recompilation:
>
>   $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
>   $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
>
> Multiple raw profiles may be merged during this step.
>
> The data can now be used by the compiler:
>
>   $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
>
> This initial submission is restricted to x86, as that's the platform we
> know works. This restriction can be lifted once other platforms have
> been verified to work with PGO.
>
> Note that this method of profiling the kernel is clang-native and isn't
> compatible with clang's gcov support in kernel/gcov.
>
> [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
>
> Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
> Co-developed-by: Bill Wendling <morbo@google.com>
> Signed-off-by: Bill Wendling <morbo@google.com>

Hi Bill,

can you add a history for the changelog of v2, please?
Thanks.

Regards,
- Sedat -

> ---
>  Documentation/dev-tools/index.rst     |   1 +
>  Documentation/dev-tools/pgo.rst       | 127 +++++++++
>  MAINTAINERS                           |   9 +
>  Makefile                              |   3 +
>  arch/Kconfig                          |   1 +
>  arch/arm/boot/bootp/Makefile          |   1 +
>  arch/arm/boot/compressed/Makefile     |   1 +
>  arch/arm/vdso/Makefile                |   3 +-
>  arch/arm64/kernel/vdso/Makefile       |   3 +-
>  arch/arm64/kvm/hyp/nvhe/Makefile      |   1 +
>  arch/mips/boot/compressed/Makefile    |   1 +
>  arch/mips/vdso/Makefile               |   1 +
>  arch/nds32/kernel/vdso/Makefile       |   4 +-
>  arch/parisc/boot/compressed/Makefile  |   1 +
>  arch/powerpc/kernel/Makefile          |   6 +-
>  arch/powerpc/kernel/trace/Makefile    |   3 +-
>  arch/powerpc/kernel/vdso32/Makefile   |   1 +
>  arch/powerpc/kernel/vdso64/Makefile   |   1 +
>  arch/powerpc/kexec/Makefile           |   3 +-
>  arch/powerpc/xmon/Makefile            |   1 +
>  arch/riscv/kernel/vdso/Makefile       |   3 +-
>  arch/s390/boot/Makefile               |   1 +
>  arch/s390/boot/compressed/Makefile    |   1 +
>  arch/s390/kernel/Makefile             |   1 +
>  arch/s390/kernel/vdso64/Makefile      |   3 +-
>  arch/s390/purgatory/Makefile          |   1 +
>  arch/sh/boot/compressed/Makefile      |   1 +
>  arch/sh/mm/Makefile                   |   1 +
>  arch/sparc/vdso/Makefile              |   1 +
>  arch/x86/Kconfig                      |   1 +
>  arch/x86/boot/Makefile                |   1 +
>  arch/x86/boot/compressed/Makefile     |   1 +
>  arch/x86/entry/vdso/Makefile          |   1 +
>  arch/x86/kernel/vmlinux.lds.S         |   2 +
>  arch/x86/platform/efi/Makefile        |   1 +
>  arch/x86/purgatory/Makefile           |   1 +
>  arch/x86/realmode/rm/Makefile         |   1 +
>  arch/x86/um/vdso/Makefile             |   1 +
>  drivers/firmware/efi/libstub/Makefile |   1 +
>  drivers/s390/char/Makefile            |   1 +
>  include/asm-generic/vmlinux.lds.h     |  44 +++
>  kernel/Makefile                       |   1 +
>  kernel/pgo/Kconfig                    |  34 +++
>  kernel/pgo/Makefile                   |   5 +
>  kernel/pgo/fs.c                       | 382 ++++++++++++++++++++++++++
>  kernel/pgo/instrument.c               | 188 +++++++++++++
>  kernel/pgo/pgo.h                      | 206 ++++++++++++++
>  scripts/Makefile.lib                  |  10 +
>  48 files changed, 1058 insertions(+), 9 deletions(-)
>  create mode 100644 Documentation/dev-tools/pgo.rst
>  create mode 100644 kernel/pgo/Kconfig
>  create mode 100644 kernel/pgo/Makefile
>  create mode 100644 kernel/pgo/fs.c
>  create mode 100644 kernel/pgo/instrument.c
>  create mode 100644 kernel/pgo/pgo.h
>
> diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> index f7809c7b1ba9e..8d6418e858062 100644
> --- a/Documentation/dev-tools/index.rst
> +++ b/Documentation/dev-tools/index.rst
> @@ -26,6 +26,7 @@ whole; patches welcome!
>     kgdb
>     kselftest
>     kunit/index
> +   pgo
>
>
>  .. only::  subproject and html
> diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
> new file mode 100644
> index 0000000000000..da0e654ae7078
> --- /dev/null
> +++ b/Documentation/dev-tools/pgo.rst
> @@ -0,0 +1,127 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +===============================
> +Using PGO with the Linux kernel
> +===============================
> +
> +Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
> +when building with Clang. The profiling data is exported via the ``pgo``
> +debugfs directory.
> +
> +.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> +
> +
> +Preparation
> +===========
> +
> +Configure the kernel with:
> +
> +.. code-block:: make
> +
> +   CONFIG_DEBUG_FS=y
> +   CONFIG_PGO_CLANG=y
> +
> +Note that kernels compiled with profiling flags will be significantly larger
> +and run slower.
> +
> +Profiling data will only become accessible once debugfs has been mounted:
> +
> +.. code-block:: sh
> +
> +   mount -t debugfs none /sys/kernel/debug
> +
> +
> +Customization
> +=============
> +
> +You can enable or disable profiling for individual file and directories by
> +adding a line similar to the following to the respective kernel Makefile:
> +
> +- For a single file (e.g. main.o)
> +
> +  .. code-block:: make
> +
> +     PGO_PROFILE_main.o := y
> +
> +- For all files in one directory
> +
> +  .. code-block:: make
> +
> +     PGO_PROFILE := y
> +
> +To exclude files from being profiled use
> +
> +  .. code-block:: make
> +
> +     PGO_PROFILE_main.o := n
> +
> +and
> +
> +  .. code-block:: make
> +
> +     PGO_PROFILE := n
> +
> +Only files which are linked to the main kernel image or are compiled as kernel
> +modules are supported by this mechanism.
> +
> +
> +Files
> +=====
> +
> +The PGO kernel support creates the following files in debugfs:
> +
> +``/sys/kernel/debug/pgo``
> +       Parent directory for all PGO-related files.
> +
> +``/sys/kernel/debug/pgo/reset``
> +       Global reset file: resets all coverage data to zero when written to.
> +
> +``/sys/kernel/debug/profraw``
> +       The raw PGO data that must be processed with ``llvm_profdata``.
> +
> +
> +Workflow
> +========
> +
> +The PGO kernel can be run on the host or test machines. The data though should
> +be analyzed with Clang's tools from the same Clang version as the kernel was
> +compiled. Clang's tolerant of version skew, but it's easier to use the same
> +Clang version.
> +
> +The profiling data is useful for optimizing the kernel, analyzing coverage,
> +etc. Clang offers tools to perform these tasks.
> +
> +Here is an example workflow for profiling an instrumented kernel with PGO and
> +using the result to optimize the kernel:
> +
> +1) Install the kernel on the TEST machine.
> +
> +2) Reset the data counters right before running the load tests
> +
> +   .. code-block:: sh
> +
> +      echo 1 > /sys/kernel/debug/pgo/reset
> +
> +3) Run the load tests.
> +
> +4) Collect the raw profile data
> +
> +   .. code-block:: sh
> +
> +      cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> +
> +5) (Optional) Download the raw profile data to the HOST machine.
> +
> +6) Process the raw profile data
> +
> +   .. code-block:: sh
> +
> +      llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> +
> +   Note that multiple raw profile data files can be merged during this step.
> +
> +7) Rebuild the kernel using the profile data (PGO disabled)
> +
> +   .. code-block:: sh
> +
> +      make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> diff --git a/MAINTAINERS b/MAINTAINERS
> index cc1e6a5ee6e67..1b979da316fa4 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -13954,6 +13954,15 @@ S:     Maintained
>  F:     include/linux/personality.h
>  F:     include/uapi/linux/personality.h
>
> +PGO BASED KERNEL PROFILING
> +M:     Sami Tolvanen <samitolvanen@google.com>
> +M:     Bill Wendling <wcw@google.com>
> +R:     Nathan Chancellor <natechancellor@gmail.com>
> +R:     Nick Desaulniers <ndesaulniers@google.com>
> +S:     Supported
> +F:     Documentation/dev-tools/pgo.rst
> +F:     kernel/pgo
> +
>  PHOENIX RC FLIGHT CONTROLLER ADAPTER
>  M:     Marcus Folkesson <marcus.folkesson@gmail.com>
>  L:     linux-input@vger.kernel.org
> diff --git a/Makefile b/Makefile
> index 9e73f82e0d863..9128bfe1ccc97 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -659,6 +659,9 @@ endif # KBUILD_EXTMOD
>  # Defaults to vmlinux, but the arch makefile usually adds further targets
>  all: vmlinux
>
> +CFLAGS_PGO_CLANG := -fprofile-generate
> +export CFLAGS_PGO_CLANG
> +
>  CFLAGS_GCOV    := -fprofile-arcs -ftest-coverage \
>         $(call cc-option,-fno-tree-loop-im) \
>         $(call cc-disable-warning,maybe-uninitialized,)
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 24862d15f3a36..f39d3991f6bfe 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -1112,6 +1112,7 @@ config ARCH_SPLIT_ARG64
>            pairs of 32-bit arguments, select this option.
>
>  source "kernel/gcov/Kconfig"
> +source "kernel/pgo/Kconfig"
>
>  source "scripts/gcc-plugins/Kconfig"
>
> diff --git a/arch/arm/boot/bootp/Makefile b/arch/arm/boot/bootp/Makefile
> index 981a8d03f064c..523bd58df0a4b 100644
> --- a/arch/arm/boot/bootp/Makefile
> +++ b/arch/arm/boot/bootp/Makefile
> @@ -7,6 +7,7 @@
>  #
>
>  GCOV_PROFILE   := n
> +PGO_PROFILE    := n
>
>  LDFLAGS_bootp  := --no-undefined -X \
>                  --defsym initrd_phys=$(INITRD_PHYS) \
> diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
> index fb521efcc6c20..5fd0fd85fc0e5 100644
> --- a/arch/arm/boot/compressed/Makefile
> +++ b/arch/arm/boot/compressed/Makefile
> @@ -24,6 +24,7 @@ OBJS          += hyp-stub.o
>  endif
>
>  GCOV_PROFILE           := n
> +PGO_PROFILE            := n
>  KASAN_SANITIZE         := n
>
>  # Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
> diff --git a/arch/arm/vdso/Makefile b/arch/arm/vdso/Makefile
> index b558bee0e1f6b..11f6ce4b48b56 100644
> --- a/arch/arm/vdso/Makefile
> +++ b/arch/arm/vdso/Makefile
> @@ -36,8 +36,9 @@ else
>  CFLAGS_vgettimeofday.o = -O2 -include $(c-gettimeofday-y)
>  endif
>
> -# Disable gcov profiling for VDSO code
> +# Disable gcov and PGO profiling for VDSO code
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>
>  # Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
>  KCOV_INSTRUMENT := n
> diff --git a/arch/arm64/kernel/vdso/Makefile b/arch/arm64/kernel/vdso/Makefile
> index cd9c3fa25902f..d48fc0df07020 100644
> --- a/arch/arm64/kernel/vdso/Makefile
> +++ b/arch/arm64/kernel/vdso/Makefile
> @@ -41,8 +41,9 @@ ifneq ($(c-gettimeofday-y),)
>    CFLAGS_vgettimeofday.o += -include $(c-gettimeofday-y)
>  endif
>
> -# Disable gcov profiling for VDSO code
> +# Disable gcov and PGO profiling for VDSO code
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>
>  obj-y += vdso.o
>  targets += vdso.lds
> diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
> index 1f1e351c5fe2b..ad128ecdbfbdf 100644
> --- a/arch/arm64/kvm/hyp/nvhe/Makefile
> +++ b/arch/arm64/kvm/hyp/nvhe/Makefile
> @@ -60,6 +60,7 @@ KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_FTRACE) $(CC_FLAGS_SCS), $(KBUILD_CFLAG
>  # compiler instrumentation that inserts callbacks or checks into the code may
>  # cause crashes. Just disable it.
>  GCOV_PROFILE   := n
> +PGO_PROFILE    := n
>  KASAN_SANITIZE := n
>  UBSAN_SANITIZE := n
>  KCOV_INSTRUMENT        := n
> diff --git a/arch/mips/boot/compressed/Makefile b/arch/mips/boot/compressed/Makefile
> index 47cd9dc7454af..0855ea12f2c7f 100644
> --- a/arch/mips/boot/compressed/Makefile
> +++ b/arch/mips/boot/compressed/Makefile
> @@ -37,6 +37,7 @@ KBUILD_AFLAGS := $(KBUILD_AFLAGS) -D__ASSEMBLY__ \
>  # Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
>  KCOV_INSTRUMENT                := n
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>
>  # decompressor objects (linked with vmlinuz)
>  vmlinuzobjs-y := $(obj)/head.o $(obj)/decompress.o $(obj)/string.o
> diff --git a/arch/mips/vdso/Makefile b/arch/mips/vdso/Makefile
> index 5810cc12bc1d9..d7eb64de35eae 100644
> --- a/arch/mips/vdso/Makefile
> +++ b/arch/mips/vdso/Makefile
> @@ -66,6 +66,7 @@ ldflags-y := -Bsymbolic --no-undefined -soname=linux-vdso.so.1 \
>  CFLAGS_REMOVE_vdso.o = -pg
>
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>  UBSAN_SANITIZE := n
>  KCOV_INSTRUMENT := n
>
> diff --git a/arch/nds32/kernel/vdso/Makefile b/arch/nds32/kernel/vdso/Makefile
> index 55df25ef00578..f2b53ee2124b7 100644
> --- a/arch/nds32/kernel/vdso/Makefile
> +++ b/arch/nds32/kernel/vdso/Makefile
> @@ -15,9 +15,9 @@ obj-vdso := $(addprefix $(obj)/, $(obj-vdso))
>  ccflags-y := -shared -fno-common -fno-builtin -nostdlib -fPIC -Wl,-shared -g \
>         -Wl,-soname=linux-vdso.so.1 -Wl,--hash-style=sysv
>
> -# Disable gcov profiling for VDSO code
> +# Disable gcov and PGO profiling for VDSO code
>  GCOV_PROFILE := n
> -
> +PGO_PROFILE := n
>
>  obj-y += vdso.o
>  targets += vdso.lds
> diff --git a/arch/parisc/boot/compressed/Makefile b/arch/parisc/boot/compressed/Makefile
> index dff4536875305..5cf93a67f7da7 100644
> --- a/arch/parisc/boot/compressed/Makefile
> +++ b/arch/parisc/boot/compressed/Makefile
> @@ -7,6 +7,7 @@
>
>  KCOV_INSTRUMENT := n
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>  UBSAN_SANITIZE := n
>
>  targets := vmlinux.lds vmlinux vmlinux.bin vmlinux.bin.gz vmlinux.bin.bz2
> diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
> index fe2ef598e2ead..c642c046660d7 100644
> --- a/arch/powerpc/kernel/Makefile
> +++ b/arch/powerpc/kernel/Makefile
> @@ -153,17 +153,21 @@ endif
>  obj-$(CONFIG_PPC_SECURE_BOOT)  += secure_boot.o ima_arch.o secvar-ops.o
>  obj-$(CONFIG_PPC_SECVAR_SYSFS) += secvar-sysfs.o
>
> -# Disable GCOV, KCOV & sanitizers in odd or sensitive code
> +# Disable GCOV, PGO, KCOV & sanitizers in odd or sensitive code
>  GCOV_PROFILE_prom_init.o := n
> +PGO_PROFILE_prom_init.o := n
>  KCOV_INSTRUMENT_prom_init.o := n
>  UBSAN_SANITIZE_prom_init.o := n
>  GCOV_PROFILE_kprobes.o := n
> +PGO_PROFILE_kprobes.o := n
>  KCOV_INSTRUMENT_kprobes.o := n
>  UBSAN_SANITIZE_kprobes.o := n
>  GCOV_PROFILE_kprobes-ftrace.o := n
> +PGO_PROFILE_kprobes-ftrace.o := n
>  KCOV_INSTRUMENT_kprobes-ftrace.o := n
>  UBSAN_SANITIZE_kprobes-ftrace.o := n
>  GCOV_PROFILE_syscall_64.o := n
> +PGO_PROFILE_syscall_64.o := n
>  KCOV_INSTRUMENT_syscall_64.o := n
>  UBSAN_SANITIZE_syscall_64.o := n
>  UBSAN_SANITIZE_vdso.o := n
> diff --git a/arch/powerpc/kernel/trace/Makefile b/arch/powerpc/kernel/trace/Makefile
> index 858503775c583..7d72ae7d4f8c6 100644
> --- a/arch/powerpc/kernel/trace/Makefile
> +++ b/arch/powerpc/kernel/trace/Makefile
> @@ -23,7 +23,8 @@ obj-$(CONFIG_TRACING)                 += trace_clock.o
>  obj-$(CONFIG_PPC64)                    += $(obj64-y)
>  obj-$(CONFIG_PPC32)                    += $(obj32-y)
>
> -# Disable GCOV, KCOV & sanitizers in odd or sensitive code
> +# Disable GCOV, PGO, KCOV & sanitizers in odd or sensitive code
>  GCOV_PROFILE_ftrace.o := n
> +PGO_PROFILE_ftrace.o := n
>  KCOV_INSTRUMENT_ftrace.o := n
>  UBSAN_SANITIZE_ftrace.o := n
> diff --git a/arch/powerpc/kernel/vdso32/Makefile b/arch/powerpc/kernel/vdso32/Makefile
> index 9cb6f524854b9..655e159975a04 100644
> --- a/arch/powerpc/kernel/vdso32/Makefile
> +++ b/arch/powerpc/kernel/vdso32/Makefile
> @@ -34,6 +34,7 @@ targets := $(obj-vdso32) vdso32.so.dbg
>  obj-vdso32 := $(addprefix $(obj)/, $(obj-vdso32))
>
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>  KCOV_INSTRUMENT := n
>  UBSAN_SANITIZE := n
>  KASAN_SANITIZE := n
> diff --git a/arch/powerpc/kernel/vdso64/Makefile b/arch/powerpc/kernel/vdso64/Makefile
> index bf363ff371521..12c286f5afc16 100644
> --- a/arch/powerpc/kernel/vdso64/Makefile
> +++ b/arch/powerpc/kernel/vdso64/Makefile
> @@ -21,6 +21,7 @@ targets := $(obj-vdso64) vdso64.so.dbg
>  obj-vdso64 := $(addprefix $(obj)/, $(obj-vdso64))
>
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>  KCOV_INSTRUMENT := n
>  UBSAN_SANITIZE := n
>  KASAN_SANITIZE := n
> diff --git a/arch/powerpc/kexec/Makefile b/arch/powerpc/kexec/Makefile
> index 4aff6846c7726..1c7f65e3cb969 100644
> --- a/arch/powerpc/kexec/Makefile
> +++ b/arch/powerpc/kexec/Makefile
> @@ -16,7 +16,8 @@ endif
>  endif
>
>
> -# Disable GCOV, KCOV & sanitizers in odd or sensitive code
> +# Disable GCOV, PGO, KCOV & sanitizers in odd or sensitive code
>  GCOV_PROFILE_core_$(BITS).o := n
> +PGO_PROFILE_core_$(BITS).o := n
>  KCOV_INSTRUMENT_core_$(BITS).o := n
>  UBSAN_SANITIZE_core_$(BITS).o := n
> diff --git a/arch/powerpc/xmon/Makefile b/arch/powerpc/xmon/Makefile
> index eb25d7554ffd1..7aff80d18b44b 100644
> --- a/arch/powerpc/xmon/Makefile
> +++ b/arch/powerpc/xmon/Makefile
> @@ -2,6 +2,7 @@
>  # Makefile for xmon
>
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>  KCOV_INSTRUMENT := n
>  UBSAN_SANITIZE := n
>  KASAN_SANITIZE := n
> diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile
> index 0cfd6da784f84..882340dc3c647 100644
> --- a/arch/riscv/kernel/vdso/Makefile
> +++ b/arch/riscv/kernel/vdso/Makefile
> @@ -32,8 +32,9 @@ CPPFLAGS_vdso.lds += -P -C -U$(ARCH)
>  # Disable -pg to prevent insert call site
>  CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os
>
> -# Disable gcov profiling for VDSO code
> +# Disable gcov and PGO profiling for VDSO code
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>  KCOV_INSTRUMENT := n
>
>  # Force dependency
> diff --git a/arch/s390/boot/Makefile b/arch/s390/boot/Makefile
> index 41a64b8dce252..bee4a32040e79 100644
> --- a/arch/s390/boot/Makefile
> +++ b/arch/s390/boot/Makefile
> @@ -5,6 +5,7 @@
>
>  KCOV_INSTRUMENT := n
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>  UBSAN_SANITIZE := n
>  KASAN_SANITIZE := n
>
> diff --git a/arch/s390/boot/compressed/Makefile b/arch/s390/boot/compressed/Makefile
> index de18dab518bb6..c3ab883e8425a 100644
> --- a/arch/s390/boot/compressed/Makefile
> +++ b/arch/s390/boot/compressed/Makefile
> @@ -7,6 +7,7 @@
>
>  KCOV_INSTRUMENT := n
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>  UBSAN_SANITIZE := n
>  KASAN_SANITIZE := n
>
> diff --git a/arch/s390/kernel/Makefile b/arch/s390/kernel/Makefile
> index dd73b7f074237..bd857aacad794 100644
> --- a/arch/s390/kernel/Makefile
> +++ b/arch/s390/kernel/Makefile
> @@ -14,6 +14,7 @@ CFLAGS_REMOVE_early.o         = $(CC_FLAGS_FTRACE)
>  endif
>
>  GCOV_PROFILE_early.o           := n
> +PGO_PROFILE_early.o            := n
>  KCOV_INSTRUMENT_early.o                := n
>  UBSAN_SANITIZE_early.o         := n
>  KASAN_SANITIZE_ipl.o           := n
> diff --git a/arch/s390/kernel/vdso64/Makefile b/arch/s390/kernel/vdso64/Makefile
> index a6e0fb6b91d6c..d7c43b7c1db96 100644
> --- a/arch/s390/kernel/vdso64/Makefile
> +++ b/arch/s390/kernel/vdso64/Makefile
> @@ -35,8 +35,9 @@ obj-y += vdso64_wrapper.o
>  targets += vdso64.lds
>  CPPFLAGS_vdso64.lds += -P -C -U$(ARCH)
>
> -# Disable gcov profiling, ubsan and kasan for VDSO code
> +# Disable gcov and PGO profiling, ubsan and kasan for VDSO code
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>  UBSAN_SANITIZE := n
>  KASAN_SANITIZE := n
>
> diff --git a/arch/s390/purgatory/Makefile b/arch/s390/purgatory/Makefile
> index c57f8c40e9926..9aef584e98466 100644
> --- a/arch/s390/purgatory/Makefile
> +++ b/arch/s390/purgatory/Makefile
> @@ -17,6 +17,7 @@ $(obj)/mem.o: $(srctree)/arch/s390/lib/mem.S FORCE
>
>  KCOV_INSTRUMENT := n
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>  UBSAN_SANITIZE := n
>  KASAN_SANITIZE := n
>
> diff --git a/arch/sh/boot/compressed/Makefile b/arch/sh/boot/compressed/Makefile
> index 589d2d8a573db..ae19aeeb3964c 100644
> --- a/arch/sh/boot/compressed/Makefile
> +++ b/arch/sh/boot/compressed/Makefile
> @@ -13,6 +13,7 @@ targets               := vmlinux vmlinux.bin vmlinux.bin.gz \
>  OBJECTS = $(obj)/head_32.o $(obj)/misc.o $(obj)/cache.o
>
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>
>  #
>  # IMAGE_OFFSET is the load offset of the compression loader
> diff --git a/arch/sh/mm/Makefile b/arch/sh/mm/Makefile
> index f69ddc70b1465..ea2782c631f43 100644
> --- a/arch/sh/mm/Makefile
> +++ b/arch/sh/mm/Makefile
> @@ -43,3 +43,4 @@ obj-$(CONFIG_UNCACHED_MAPPING)        += uncached.o
>  obj-$(CONFIG_HAVE_SRAM_POOL)   += sram.o
>
>  GCOV_PROFILE_pmb.o := n
> +PGO_PROFILE_pmb.o := n
> diff --git a/arch/sparc/vdso/Makefile b/arch/sparc/vdso/Makefile
> index c5e1545bc5cf9..ab5f3783fe199 100644
> --- a/arch/sparc/vdso/Makefile
> +++ b/arch/sparc/vdso/Makefile
> @@ -115,6 +115,7 @@ quiet_cmd_vdso = VDSO    $@
>
>  VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 -Bsymbolic
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>
>  #
>  # Install the unstripped copies of vdso*.so.  If our toolchain supports
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 21f851179ff08..36305ea61dc09 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -96,6 +96,7 @@ config X86
>         select ARCH_SUPPORTS_DEBUG_PAGEALLOC
>         select ARCH_SUPPORTS_NUMA_BALANCING     if X86_64
>         select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP       if NR_CPUS <= 4096
> +       select ARCH_SUPPORTS_PGO_CLANG          if X86_64
>         select ARCH_USE_BUILTIN_BSWAP
>         select ARCH_USE_QUEUED_RWLOCKS
>         select ARCH_USE_QUEUED_SPINLOCKS
> diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
> index fe605205b4ce2..383853e32f673 100644
> --- a/arch/x86/boot/Makefile
> +++ b/arch/x86/boot/Makefile
> @@ -71,6 +71,7 @@ KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
>  KBUILD_CFLAGS  += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
>  KBUILD_CFLAGS  += -fno-asynchronous-unwind-tables
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>  UBSAN_SANITIZE := n
>
>  $(obj)/bzImage: asflags-y  := $(SVGA_MODE)
> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> index e0bc3988c3faa..ed12ab65f6065 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -54,6 +54,7 @@ CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/
>
>  KBUILD_AFLAGS  := $(KBUILD_CFLAGS) -D__ASSEMBLY__
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>  UBSAN_SANITIZE :=n
>
>  KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
> diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
> index 02e3e42f380bd..26e2b3af0145c 100644
> --- a/arch/x86/entry/vdso/Makefile
> +++ b/arch/x86/entry/vdso/Makefile
> @@ -179,6 +179,7 @@ quiet_cmd_vdso = VDSO    $@
>  VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 \
>         $(call ld-option, --eh-frame-hdr) -Bsymbolic
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>
>  quiet_cmd_vdso_and_check = VDSO    $@
>        cmd_vdso_and_check = $(cmd_vdso); $(cmd_vdso_check)
> diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> index efd9e9ea17f25..f6cab2316c46a 100644
> --- a/arch/x86/kernel/vmlinux.lds.S
> +++ b/arch/x86/kernel/vmlinux.lds.S
> @@ -184,6 +184,8 @@ SECTIONS
>
>         BUG_TABLE
>
> +       PGO_CLANG_DATA
> +
>         ORC_UNWIND_TABLE
>
>         . = ALIGN(PAGE_SIZE);
> diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile
> index 84b09c230cbd5..5f22b31446ad4 100644
> --- a/arch/x86/platform/efi/Makefile
> +++ b/arch/x86/platform/efi/Makefile
> @@ -2,6 +2,7 @@
>  OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y
>  KASAN_SANITIZE := n
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>
>  obj-$(CONFIG_EFI)              += quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o
>  obj-$(CONFIG_EFI_MIXED)                += efi_thunk_$(BITS).o
> diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
> index 95ea17a9d20cb..36f20e99da0bc 100644
> --- a/arch/x86/purgatory/Makefile
> +++ b/arch/x86/purgatory/Makefile
> @@ -23,6 +23,7 @@ targets += purgatory.ro purgatory.chk
>
>  # Sanitizer, etc. runtimes are unavailable and cannot be linked here.
>  GCOV_PROFILE   := n
> +PGO_PROFILE    := n
>  KASAN_SANITIZE := n
>  UBSAN_SANITIZE := n
>  KCSAN_SANITIZE := n
> diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
> index 83f1b6a56449f..21797192f958f 100644
> --- a/arch/x86/realmode/rm/Makefile
> +++ b/arch/x86/realmode/rm/Makefile
> @@ -76,4 +76,5 @@ KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
>  KBUILD_AFLAGS  := $(KBUILD_CFLAGS) -D__ASSEMBLY__
>  KBUILD_CFLAGS  += -fno-asynchronous-unwind-tables
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>  UBSAN_SANITIZE := n
> diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile
> index 5943387e3f357..54f5768f58530 100644
> --- a/arch/x86/um/vdso/Makefile
> +++ b/arch/x86/um/vdso/Makefile
> @@ -64,6 +64,7 @@ quiet_cmd_vdso = VDSO    $@
>
>  VDSO_LDFLAGS = -fPIC -shared -Wl,--hash-style=sysv
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>
>  #
>  # Install the unstripped copy of vdso*.so listed in $(vdso-install-y).
> diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> index 8a94388e38b33..2d81623b33f29 100644
> --- a/drivers/firmware/efi/libstub/Makefile
> +++ b/drivers/firmware/efi/libstub/Makefile
> @@ -40,6 +40,7 @@ KBUILD_CFLAGS                 := $(cflags-y) -Os -DDISABLE_BRANCH_PROFILING \
>  KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))
>
>  GCOV_PROFILE                   := n
> +PGO_PROFILE                    := n
>  # Sanitizer runtimes are unavailable and cannot be linked here.
>  KASAN_SANITIZE                 := n
>  KCSAN_SANITIZE                 := n
> diff --git a/drivers/s390/char/Makefile b/drivers/s390/char/Makefile
> index c6fdb81a068a6..bf6c5db5da1fc 100644
> --- a/drivers/s390/char/Makefile
> +++ b/drivers/s390/char/Makefile
> @@ -9,6 +9,7 @@ CFLAGS_REMOVE_sclp_early_core.o = $(CC_FLAGS_FTRACE)
>  endif
>
>  GCOV_PROFILE_sclp_early_core.o         := n
> +PGO_PROFILE_sclp_early_core.o          := n
>  KCOV_INSTRUMENT_sclp_early_core.o      := n
>  UBSAN_SANITIZE_sclp_early_core.o       := n
>  KASAN_SANITIZE_sclp_early_core.o       := n
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index b2b3d81b1535a..3a591bb18c5fb 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -316,6 +316,49 @@
>  #define THERMAL_TABLE(name)
>  #endif
>
> +#ifdef CONFIG_PGO_CLANG
> +#define PGO_CLANG_DATA                                                 \
> +       __llvm_prf_data : AT(ADDR(__llvm_prf_data) - LOAD_OFFSET) {     \
> +               . = ALIGN(8);                                           \
> +               __llvm_prf_start = .;                                   \
> +               __llvm_prf_data_start = .;                              \
> +               KEEP(*(__llvm_prf_data))                                \
> +               . = ALIGN(8);                                           \
> +               __llvm_prf_data_end = .;                                \
> +       }                                                               \
> +       __llvm_prf_cnts : AT(ADDR(__llvm_prf_cnts) - LOAD_OFFSET) {     \
> +               . = ALIGN(8);                                           \
> +               __llvm_prf_cnts_start = .;                              \
> +               KEEP(*(__llvm_prf_cnts))                                \
> +               . = ALIGN(8);                                           \
> +               __llvm_prf_cnts_end = .;                                \
> +       }                                                               \
> +       __llvm_prf_names : AT(ADDR(__llvm_prf_names) - LOAD_OFFSET) {   \
> +               . = ALIGN(8);                                           \
> +               __llvm_prf_names_start = .;                             \
> +               KEEP(*(__llvm_prf_names))                               \
> +               . = ALIGN(8);                                           \
> +               __llvm_prf_names_end = .;                               \
> +               . = ALIGN(8);                                           \
> +       }                                                               \
> +       __llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) {     \
> +               __llvm_prf_vals_start = .;                              \
> +               KEEP(*(__llvm_prf_vals))                                \
> +               . = ALIGN(8);                                           \
> +               __llvm_prf_vals_end = .;                                \
> +               . = ALIGN(8);                                           \
> +       }                                                               \
> +       __llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) {     \
> +               __llvm_prf_vnds_start = .;                              \
> +               KEEP(*(__llvm_prf_vnds))                                \
> +               . = ALIGN(8);                                           \
> +               __llvm_prf_vnds_end = .;                                \
> +               __llvm_prf_end = .;                                     \
> +       }
> +#else
> +#define PGO_CLANG_DATA
> +#endif
> +
>  #define KERNEL_DTB()                                                   \
>         STRUCT_ALIGN();                                                 \
>         __dtb_start = .;                                                \
> @@ -1125,6 +1168,7 @@
>                 CONSTRUCTORS                                            \
>         }                                                               \
>         BUG_TABLE                                                       \
> +       PGO_CLANG_DATA
>
>  #define INIT_TEXT_SECTION(inittext_align)                              \
>         . = ALIGN(inittext_align);                                      \
> diff --git a/kernel/Makefile b/kernel/Makefile
> index aa7368c7eabf3..0b34ca228ba46 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -111,6 +111,7 @@ obj-$(CONFIG_BPF) += bpf/
>  obj-$(CONFIG_KCSAN) += kcsan/
>  obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
>  obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o
> +obj-$(CONFIG_PGO_CLANG) += pgo/
>
>  obj-$(CONFIG_PERF_EVENTS) += events/
>
> diff --git a/kernel/pgo/Kconfig b/kernel/pgo/Kconfig
> new file mode 100644
> index 0000000000000..318d36bb3d106
> --- /dev/null
> +++ b/kernel/pgo/Kconfig
> @@ -0,0 +1,34 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +menu "Profile Guided Optimization (PGO) (EXPERIMENTAL)"
> +
> +config ARCH_SUPPORTS_PGO_CLANG
> +       bool
> +
> +config PGO_CLANG
> +       bool "Enable clang's PGO-based kernel profiling"
> +       depends on DEBUG_FS
> +       depends on ARCH_SUPPORTS_PGO_CLANG
> +       help
> +         This option enables clang's PGO (Profile Guided Optimization) based
> +         code profiling to better optimize the kernel.
> +
> +         If unsure, say N.
> +
> +         Run a representative workload for your application on a kernel
> +         compiled with this option and download the raw profile file from
> +         /sys/kernel/debug/pgo/profraw. This file needs to be processed with
> +         llvm-profdata. It may be merged with other collected raw profiles.
> +
> +         Copy the resulting profile file into vmlinux.profdata, and enable
> +         KCFLAGS=-fprofile-use=vmlinux.profdata to produce an optimized
> +         kernel.
> +
> +         Note that a kernel compiled with profiling flags will be
> +         significatnly larger and run slower. Also be sure to exclude files
> +         from profiling which are not linked to the kernel image to prevent
> +         linker errors.
> +
> +         Note that the debugfs filesystem has to be mounted to access
> +         profiling data.
> +
> +endmenu
> diff --git a/kernel/pgo/Makefile b/kernel/pgo/Makefile
> new file mode 100644
> index 0000000000000..41e27cefd9a47
> --- /dev/null
> +++ b/kernel/pgo/Makefile
> @@ -0,0 +1,5 @@
> +# SPDX-License-Identifier: GPL-2.0
> +GCOV_PROFILE   := n
> +PGO_PROFILE    := n
> +
> +obj-y  += fs.o instrument.o
> diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c
> new file mode 100644
> index 0000000000000..790a8df037bfc
> --- /dev/null
> +++ b/kernel/pgo/fs.c
> @@ -0,0 +1,382 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2019 Google, Inc.
> + *
> + * Author:
> + *     Sami Tolvanen <samitolvanen@google.com>
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#define pr_fmt(fmt)    "pgo: " fmt
> +
> +#include <linux/kernel.h>
> +#include <linux/debugfs.h>
> +#include <linux/fs.h>
> +#include <linux/module.h>
> +#include <linux/slab.h>
> +#include <linux/vmalloc.h>
> +#include "pgo.h"
> +
> +static struct dentry *directory;
> +
> +struct prf_private_data {
> +       void *buffer;
> +       unsigned long size;
> +};
> +
> +/*
> + * Raw profile data format:
> + *
> + *     - llvm_prf_header
> + *     - __llvm_prf_data
> + *     - __llvm_prf_cnts
> + *     - __llvm_prf_names
> + *     - zero padding to 8 bytes
> + *     - for each llvm_prf_data in __llvm_prf_data:
> + *             - llvm_prf_value_data
> + *                     - llvm_prf_value_record + site count array
> + *                             - llvm_prf_value_node_data
> + *                             ...
> + *                     ...
> + *             ...
> + */
> +
> +static void prf_fill_header(void **buffer)
> +{
> +       struct llvm_prf_header *header = *(struct llvm_prf_header **)buffer;
> +
> +       header->magic = LLVM_PRF_MAGIC;
> +       header->version = LLVM_PRF_VARIANT_MASK_IR | LLVM_PRF_VERSION;
> +       header->data_size = prf_data_count();
> +       header->padding_bytes_before_counters = 0;
> +       header->counters_size = prf_cnts_count();
> +       header->padding_bytes_after_counters = 0;
> +       header->names_size = prf_names_count();
> +       header->counters_delta = (u64)__llvm_prf_cnts_start;
> +       header->names_delta = (u64)__llvm_prf_names_start;
> +       header->value_kind_last = LLVM_PRF_IPVK_LAST;
> +
> +       *buffer += sizeof(*header);
> +}
> +
> +/*
> + * Copy the source into the buffer, incrementing the pointer into buffer in the
> + * process.
> + */
> +static void prf_copy_to_buffer(void **buffer, void *src, unsigned long size)
> +{
> +       memcpy(*buffer, src, size);
> +       *buffer += size;
> +}
> +
> +static u32 __prf_get_value_size(struct llvm_prf_data *p, u32 *value_kinds)
> +{
> +       struct llvm_prf_value_node **nodes =
> +               (struct llvm_prf_value_node **)p->values;
> +       u32 kinds = 0;
> +       u32 size = 0;
> +       unsigned int kind;
> +       unsigned int n;
> +       unsigned int s = 0;
> +
> +       for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> +               unsigned int sites = p->num_value_sites[kind];
> +
> +               if (!sites)
> +                       continue;
> +
> +               /* Record + site count array */
> +               size += prf_get_value_record_size(sites);
> +               kinds++;
> +
> +               if (!nodes)
> +                       continue;
> +
> +               for (n = 0; n < sites; n++) {
> +                       u32 count = 0;
> +                       struct llvm_prf_value_node *site = nodes[s + n];
> +
> +                       while (site && ++count <= U8_MAX)
> +                               site = site->next;
> +
> +                       size += count *
> +                               sizeof(struct llvm_prf_value_node_data);
> +               }
> +
> +               s += sites;
> +       }
> +
> +       if (size)
> +               size += sizeof(struct llvm_prf_value_data);
> +
> +       if (value_kinds)
> +               *value_kinds = kinds;
> +
> +       return size;
> +}
> +
> +static u32 prf_get_value_size(void)
> +{
> +       u32 size = 0;
> +       struct llvm_prf_data *p;
> +
> +       for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> +               size += __prf_get_value_size(p, NULL);
> +
> +       return size;
> +}
> +
> +/* Serialize the profiling's value. */
> +static void prf_serialize_value(struct llvm_prf_data *p, void **buffer)
> +{
> +       struct llvm_prf_value_data header;
> +       struct llvm_prf_value_node **nodes =
> +               (struct llvm_prf_value_node **)p->values;
> +       unsigned int kind;
> +       unsigned int n;
> +       unsigned int s = 0;
> +
> +       header.total_size = __prf_get_value_size(p, &header.num_value_kinds);
> +
> +       if (!header.num_value_kinds)
> +               /* Nothing to write. */
> +               return;
> +
> +       prf_copy_to_buffer(buffer, &header, sizeof(header));
> +
> +       for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> +               struct llvm_prf_value_record *record;
> +               u8 *counts;
> +               unsigned int sites = p->num_value_sites[kind];
> +
> +               if (!sites)
> +                       continue;
> +
> +               /* Profiling value record. */
> +               record = *(struct llvm_prf_value_record **)buffer;
> +               *buffer += prf_get_value_record_header_size();
> +
> +               record->kind = kind;
> +               record->num_value_sites = sites;
> +
> +               /* Site count array. */
> +               counts = *(u8 **)buffer;
> +               *buffer += prf_get_value_record_site_count_size(sites);
> +
> +               /*
> +                * If we don't have nodes, we can skip updating the site count
> +                * array, because the buffer is zero filled.
> +                */
> +               if (!nodes)
> +                       continue;
> +
> +               for (n = 0; n < sites; n++) {
> +                       u32 count = 0;
> +                       struct llvm_prf_value_node *site = nodes[s + n];
> +
> +                       while (site && ++count <= U8_MAX) {
> +                               prf_copy_to_buffer(buffer, site,
> +                                                  sizeof(struct llvm_prf_value_node_data));
> +                               site = site->next;
> +                       }
> +
> +                       counts[n] = (u8)count;
> +               }
> +
> +               s += sites;
> +       }
> +}
> +
> +static void prf_serialize_values(void **buffer)
> +{
> +       struct llvm_prf_data *p;
> +
> +       for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> +               prf_serialize_value(p, buffer);
> +}
> +
> +static inline unsigned long prf_get_padding(unsigned long size)
> +{
> +       return 8 - (size % 8);
> +}
> +
> +static unsigned long prf_buffer_size(void)
> +{
> +       return sizeof(struct llvm_prf_header) +
> +                       prf_data_size() +
> +                       prf_cnts_size() +
> +                       prf_names_size() +
> +                       prf_get_padding(prf_names_size()) +
> +                       prf_get_value_size();
> +}
> +
> +/* Serialize the profling data into a format LLVM's tools can understand. */
> +static int prf_serialize(struct prf_private_data *p)
> +{
> +       int err = 0;
> +       void *buffer;
> +
> +       p->size = prf_buffer_size();
> +       p->buffer = vzalloc(p->size);
> +
> +       if (!p->buffer) {
> +               err = -ENOMEM;
> +               goto out;
> +       }
> +
> +       buffer = p->buffer;
> +
> +       prf_fill_header(&buffer);
> +       prf_copy_to_buffer(&buffer, __llvm_prf_data_start,  prf_data_size());
> +       prf_copy_to_buffer(&buffer, __llvm_prf_cnts_start,  prf_cnts_size());
> +       prf_copy_to_buffer(&buffer, __llvm_prf_names_start, prf_names_size());
> +       buffer += prf_get_padding(prf_names_size());
> +
> +       prf_serialize_values(&buffer);
> +
> +out:
> +       return err;
> +}
> +
> +/* open() implementation for PGO. Creates a copy of the profiling data set. */
> +static int prf_open(struct inode *inode, struct file *file)
> +{
> +       struct prf_private_data *data;
> +       unsigned long flags;
> +       int err;
> +
> +       data = kzalloc(sizeof(*data), GFP_KERNEL);
> +       if (!data) {
> +               err = -ENOMEM;
> +               goto out;
> +       }
> +
> +       flags = prf_lock();
> +
> +       err = prf_serialize(data);
> +       if (err) {
> +               kfree(data);
> +               goto out_unlock;
> +       }
> +
> +       file->private_data = data;
> +
> +out_unlock:
> +       prf_unlock(flags);
> +out:
> +       return err;
> +}
> +
> +/* read() implementation for PGO. */
> +static ssize_t prf_read(struct file *file, char __user *buf, size_t count,
> +                       loff_t *ppos)
> +{
> +       struct prf_private_data *data = file->private_data;
> +
> +       BUG_ON(!data);
> +
> +       return simple_read_from_buffer(buf, count, ppos, data->buffer,
> +                                      data->size);
> +}
> +
> +/* release() implementation for PGO. Release resources allocated by open(). */
> +static int prf_release(struct inode *inode, struct file *file)
> +{
> +       struct prf_private_data *data = file->private_data;
> +
> +       if (data) {
> +               vfree(data->buffer);
> +               kfree(data);
> +       }
> +
> +       return 0;
> +}
> +
> +static const struct file_operations prf_fops = {
> +       .owner          = THIS_MODULE,
> +       .open           = prf_open,
> +       .read           = prf_read,
> +       .llseek         = default_llseek,
> +       .release        = prf_release
> +};
> +
> +/* write() implementation for resetting PGO's profile data. */
> +static ssize_t reset_write(struct file *file, const char __user *addr,
> +                          size_t len, loff_t *pos)
> +{
> +       struct llvm_prf_data *data;
> +
> +       memset(__llvm_prf_cnts_start, 0, prf_cnts_size());
> +
> +       for (data = __llvm_prf_data_start; data < __llvm_prf_data_end; ++data) {
> +               struct llvm_prf_value_node **vnodes;
> +               u64 current_vsite_count;
> +               u32 i;
> +
> +               if (!data->values)
> +                       continue;
> +
> +               current_vsite_count = 0;
> +               vnodes = (struct llvm_prf_value_node **)data->values;
> +
> +               for (i = LLVM_PRF_IPVK_FIRST; i <= LLVM_PRF_IPVK_LAST; ++i)
> +                       current_vsite_count += data->num_value_sites[i];
> +
> +               for (i = 0; i < current_vsite_count; ++i) {
> +                       struct llvm_prf_value_node *current_vnode = vnodes[i];
> +
> +                       while (current_vnode) {
> +                               current_vnode->count = 0;
> +                               current_vnode = current_vnode->next;
> +                       }
> +               }
> +       }
> +
> +       return len;
> +}
> +
> +static const struct file_operations prf_reset_fops = {
> +       .owner          = THIS_MODULE,
> +       .write          = reset_write,
> +       .llseek         = noop_llseek,
> +};
> +
> +/* Create debugfs entries. */
> +static int __init pgo_init(void)
> +{
> +       directory = debugfs_create_dir("pgo", NULL);
> +       if (!directory)
> +               goto err_remove;
> +
> +       if (!debugfs_create_file("profraw", 0600, directory, NULL,
> +                                &prf_fops))
> +               goto err_remove;
> +
> +       if (!debugfs_create_file("reset", 0200, directory, NULL,
> +                                &prf_reset_fops))
> +               goto err_remove;
> +
> +       return 0;
> +
> +err_remove:
> +       pr_err("initialization failed\n");
> +       return -EIO;
> +}
> +
> +/* Remove debufs entries. */
> +static void __exit pgo_exit(void)
> +{
> +       debugfs_remove_recursive(directory);
> +}
> +
> +module_init(pgo_init);
> +module_exit(pgo_exit);
> diff --git a/kernel/pgo/instrument.c b/kernel/pgo/instrument.c
> new file mode 100644
> index 0000000000000..465615b7f8735
> --- /dev/null
> +++ b/kernel/pgo/instrument.c
> @@ -0,0 +1,188 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2019 Google, Inc.
> + *
> + * Author:
> + *     Sami Tolvanen <samitolvanen@google.com>
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#define pr_fmt(fmt)    "pgo: " fmt
> +
> +#include <linux/kernel.h>
> +#include <linux/export.h>
> +#include <linux/spinlock.h>
> +#include <linux/types.h>
> +#include "pgo.h"
> +
> +/* Lock guarding value node access and serialization. */
> +static DEFINE_SPINLOCK(pgo_lock);
> +static int current_node;
> +
> +unsigned long prf_lock(void)
> +{
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&pgo_lock, flags);
> +
> +       return flags;
> +}
> +
> +void prf_unlock(unsigned long flags)
> +{
> +       spin_unlock_irqrestore(&pgo_lock, flags);
> +}
> +
> +/*
> + * Return a newly allocated profiling value node which contains the tracked
> + * value by the value profiler.
> + * Note: caller *must* hold pgo_lock.
> + */
> +static struct llvm_prf_value_node *allocate_node(struct llvm_prf_data *p,
> +                                                u32 index, u64 value)
> +{
> +       if (&__llvm_prf_vnds_start[current_node + 1] >= __llvm_prf_vnds_end)
> +               return NULL; /* Out of nodes */
> +
> +       current_node++;
> +
> +       /* Make sure the node is entirely within the section */
> +       if (&__llvm_prf_vnds_start[current_node] >= __llvm_prf_vnds_end ||
> +           &__llvm_prf_vnds_start[current_node + 1] > __llvm_prf_vnds_end)
> +               return NULL;
> +
> +       return &__llvm_prf_vnds_start[current_node];
> +}
> +
> +/*
> + * Counts the number of times a target value is seen.
> + *
> + * Records the target value for the CounterIndex if not seen before. Otherwise,
> + * increments the counter associated w/ the target value.
> + */
> +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index)
> +{
> +       struct llvm_prf_data *p = (struct llvm_prf_data *)data;
> +       struct llvm_prf_value_node **counters;
> +       struct llvm_prf_value_node *curr;
> +       struct llvm_prf_value_node *min = NULL;
> +       struct llvm_prf_value_node *prev = NULL;
> +       u64 min_count = U64_MAX;
> +       u8 values = 0;
> +       unsigned long flags;
> +
> +       if (!p || !p->values)
> +               return;
> +
> +       counters = (struct llvm_prf_value_node **)p->values;
> +       curr = counters[index];
> +
> +       while (curr) {
> +               if (target_value == curr->value) {
> +                       curr->count++;
> +                       return;
> +               }
> +
> +               if (curr->count < min_count) {
> +                       min_count = curr->count;
> +                       min = curr;
> +               }
> +
> +               prev = curr;
> +               curr = curr->next;
> +               values++;
> +       }
> +
> +       if (values >= LLVM_PRF_MAX_NUM_VALS_PER_SITE) {
> +               if (!min->count || !(--min->count)) {
> +                       curr = min;
> +                       curr->value = target_value;
> +                       curr->count++;
> +               }
> +               return;
> +       }
> +
> +       /* Lock when updating the value node structure. */
> +       flags = prf_lock();
> +
> +       curr = allocate_node(p, index, target_value);
> +       if (!curr)
> +               goto out;
> +
> +       curr->value = target_value;
> +       curr->count++;
> +
> +       if (!counters[index])
> +               counters[index] = curr;
> +       else if (prev && !prev->next)
> +               prev->next = curr;
> +
> +out:
> +       prf_unlock(flags);
> +}
> +EXPORT_SYMBOL(__llvm_profile_instrument_target);
> +
> +/* Counts the number of times a range of targets values are seen. */
> +void __llvm_profile_instrument_range(u64 target_value, void *data,
> +                                    u32 index, s64 precise_start,
> +                                    s64 precise_last, s64 large_value)
> +{
> +       if (large_value != S64_MIN && (s64)target_value >= large_value)
> +               target_value = large_value;
> +       else if ((s64)target_value < precise_start ||
> +                (s64)target_value > precise_last)
> +               target_value = precise_last + 1;
> +
> +       __llvm_profile_instrument_target(target_value, data, index);
> +}
> +EXPORT_SYMBOL(__llvm_profile_instrument_range);
> +
> +static inline int inst_prof_popcount(unsigned long long value)
> +{
> +       value = value - ((value >> 1) & 0x5555555555555555ULL);
> +       value = (value & 0x3333333333333333ULL) +
> +               ((value >> 2) & 0x3333333333333333ULL);
> +       value = (value + (value >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
> +
> +       return (int)((unsigned long long)(value * 0x0101010101010101ULL) >> 56);
> +}
> +
> +static u64 inst_prof_get_range_rep_value(u64 value)
> +{
> +       if (value <= 8)
> +               /* The first ranges are individually tracked, us it as is. */
> +               return value;
> +       else if (value >= 513)
> +               /* The last range is mapped to its lowest value. */
> +               return 513;
> +       else if (inst_prof_popcount(value) == 1)
> +               /* If it's a power of two, use it as is. */
> +               return value;
> +
> +       /* Otherwise, take to the previous power of two + 1. */
> +       return (1 << (64 - __builtin_clzll(value) - 1)) + 1;
> +}
> +
> +/*
> + * The target values are partitioned into multiple ranges. The range spec is
> + * defined in compiler-rt/include/profile/InstrProfData.inc.
> + */
> +void __llvm_profile_instrument_memop(u64 target_value, void *data,
> +                                    u32 counter_index)
> +{
> +       u64 rep_value;
> +
> +       /* Map the target value to the representative value of its range. */
> +       rep_value = inst_prof_get_range_rep_value(target_value);
> +       __llvm_profile_instrument_target(rep_value, data, counter_index);
> +}
> +EXPORT_SYMBOL(__llvm_profile_instrument_memop);
> diff --git a/kernel/pgo/pgo.h b/kernel/pgo/pgo.h
> new file mode 100644
> index 0000000000000..df0aa278f28bd
> --- /dev/null
> +++ b/kernel/pgo/pgo.h
> @@ -0,0 +1,206 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (C) 2019 Google, Inc.
> + *
> + * Author:
> + *     Sami Tolvanen <samitolvanen@google.com>
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#ifndef _PGO_H
> +#define _PGO_H
> +
> +/*
> + * Note: These internal LLVM definitions must match the compiler version.
> + * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code.
> + */
> +
> +#ifdef CONFIG_64BIT
> +       #define LLVM_PRF_MAGIC          \
> +               ((u64)255 << 56 |       \
> +                (u64)'l' << 48 |       \
> +                (u64)'p' << 40 |       \
> +                (u64)'r' << 32 |       \
> +                (u64)'o' << 24 |       \
> +                (u64)'f' << 16 |       \
> +                (u64)'r' << 8  |       \
> +                (u64)129)
> +#else
> +       #define LLVM_PRF_MAGIC          \
> +               ((u64)255 << 56 |       \
> +                (u64)'l' << 48 |       \
> +                (u64)'p' << 40 |       \
> +                (u64)'r' << 32 |       \
> +                (u64)'o' << 24 |       \
> +                (u64)'f' << 16 |       \
> +                (u64)'R' << 8  |       \
> +                (u64)129)
> +#endif
> +
> +#define LLVM_PRF_VERSION               5
> +#define LLVM_PRF_DATA_ALIGN            8
> +#define LLVM_PRF_IPVK_FIRST            0
> +#define LLVM_PRF_IPVK_LAST             1
> +#define LLVM_PRF_MAX_NUM_VALS_PER_SITE 16
> +
> +#define LLVM_PRF_VARIANT_MASK_IR       (0x1ull << 56)
> +#define LLVM_PRF_VARIANT_MASK_CSIR     (0x1ull << 57)
> +
> +/**
> + * struct llvm_prf_header - represents the raw profile header data structure.
> + * @magic: the magic token for the file format.
> + * @version: the version of the file format.
> + * @data_size: the number of entries in the profile data section.
> + * @padding_bytes_before_counters: the number of padding bytes before the
> + *   counters.
> + * @counters_size: the size in bytes of the LLVM profile section containing the
> + *   counters.
> + * @padding_bytes_after_counters: the number of padding bytes after the
> + *   counters.
> + * @names_size: the size in bytes of the LLVM profile section containing the
> + *   counters' names.
> + * @counters_delta: the beginning of the LLMV profile counters section.
> + * @names_delta: the beginning of the LLMV profile names section.
> + * @value_kind_last: the last profile value kind.
> + */
> +struct llvm_prf_header {
> +       u64 magic;
> +       u64 version;
> +       u64 data_size;
> +       u64 padding_bytes_before_counters;
> +       u64 counters_size;
> +       u64 padding_bytes_after_counters;
> +       u64 names_size;
> +       u64 counters_delta;
> +       u64 names_delta;
> +       u64 value_kind_last;
> +};
> +
> +/**
> + * struct llvm_prf_data - represents the per-function control structure.
> + * @name_ref: the reference to the function's name.
> + * @func_hash: the hash value of the function.
> + * @counter_ptr: a pointer to the profile counter.
> + * @function_ptr: a pointer to the function.
> + * @values: the profiling values associated with this function.
> + * @num_counters: the number of counters in the function.
> + * @num_value_sites: the number of value profile sites.
> + */
> +struct llvm_prf_data {
> +       const u64 name_ref;
> +       const u64 func_hash;
> +       const void *counter_ptr;
> +       const void *function_ptr;
> +       void *values;
> +       const u32 num_counters;
> +       const u16 num_value_sites[LLVM_PRF_IPVK_LAST + 1];
> +} __aligned(LLVM_PRF_DATA_ALIGN);
> +
> +/**
> + * structure llvm_prf_value_node_data - represents the data part of the struct
> + *   llvm_prf_value_node data structure.
> + * @value: the value counters.
> + * @count: the counters' count.
> + */
> +struct llvm_prf_value_node_data {
> +       u64 value;
> +       u64 count;
> +};
> +
> +/**
> + * struct llvm_prf_value_node - represents an internal data structure used by
> + *   the value profiler.
> + * @value: the value counters.
> + * @count: the counters' count.
> + * @next: the next value node.
> + */
> +struct llvm_prf_value_node {
> +       u64 value;
> +       u64 count;
> +       struct llvm_prf_value_node *next;
> +};
> +
> +/**
> + * struct llvm_prf_value_data - represents the value profiling data in indexed
> + *   format.
> + * @total_size: the total size in bytes including this field.
> + * @num_value_kinds: the number of value profile kinds that has value profile
> + *   data.
> + */
> +struct llvm_prf_value_data {
> +       u32 total_size;
> +       u32 num_value_kinds;
> +};
> +
> +/**
> + * struct llvm_prf_value_record - represents the on-disk layout of the value
> + *   profile data of a particular kind for one function.
> + * @kind: the kind of the value profile record.
> + * @num_value_sites: the number of value profile sites.
> + * @site_count_array: the first element of the array that stores the number
> + *   of profiled values for each value site.
> + */
> +struct llvm_prf_value_record {
> +       u32 kind;
> +       u32 num_value_sites;
> +       u8 site_count_array[];
> +};
> +
> +#define prf_get_value_record_header_size()             \
> +       offsetof(struct llvm_prf_value_record, site_count_array)
> +#define prf_get_value_record_site_count_size(sites)    \
> +       roundup((sites), 8)
> +#define prf_get_value_record_size(sites)               \
> +       (prf_get_value_record_header_size() +           \
> +        prf_get_value_record_site_count_size((sites)))
> +
> +/* Data sections */
> +extern struct llvm_prf_data __llvm_prf_data_start[];
> +extern struct llvm_prf_data __llvm_prf_data_end[];
> +
> +extern u64 __llvm_prf_cnts_start[];
> +extern u64 __llvm_prf_cnts_end[];
> +
> +extern char __llvm_prf_names_start[];
> +extern char __llvm_prf_names_end[];
> +
> +extern struct llvm_prf_value_node __llvm_prf_vnds_start[];
> +extern struct llvm_prf_value_node __llvm_prf_vnds_end[];
> +
> +/* Locking for vnodes */
> +extern unsigned long prf_lock(void);
> +extern void prf_unlock(unsigned long flags);
> +
> +#define __DEFINE_PRF_SIZE(s) \
> +       static inline unsigned long prf_ ## s ## _size(void)            \
> +       {                                                               \
> +               unsigned long start =                                   \
> +                       (unsigned long)__llvm_prf_ ## s ## _start;      \
> +               unsigned long end =                                     \
> +                       (unsigned long)__llvm_prf_ ## s ## _end;        \
> +               return roundup(end - start,                             \
> +                               sizeof(__llvm_prf_ ## s ## _start[0])); \
> +       }                                                               \
> +       static inline unsigned long prf_ ## s ## _count(void)           \
> +       {                                                               \
> +               return prf_ ## s ## _size() /                           \
> +                       sizeof(__llvm_prf_ ## s ## _start[0]);          \
> +       }
> +
> +__DEFINE_PRF_SIZE(data);
> +__DEFINE_PRF_SIZE(cnts);
> +__DEFINE_PRF_SIZE(names);
> +__DEFINE_PRF_SIZE(vnds);
> +
> +#undef __DEFINE_PRF_SIZE
> +
> +#endif /* _PGO_H */
> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> index 213677a5ed33e..9b218afb5cb87 100644
> --- a/scripts/Makefile.lib
> +++ b/scripts/Makefile.lib
> @@ -143,6 +143,16 @@ _c_flags += $(if $(patsubst n%,, \
>                 $(CFLAGS_GCOV))
>  endif
>
> +#
> +# Enable clang's PGO profiling flags for a file or directory depending on
> +# variables PGO_PROFILE_obj.o and PGO_PROFILE.
> +#
> +ifeq ($(CONFIG_PGO_CLANG),y)
> +_c_flags += $(if $(patsubst n%,, \
> +               $(PGO_PROFILE_$(basetarget).o)$(PGO_PROFILE)y), \
> +               $(CFLAGS_PGO_CLANG))
> +endif
> +
>  #
>  # Enable address sanitizer flags for kernel except some files or directories
>  # we don't want to check (depends on variables KASAN_SANITIZE_obj.o, KASAN_SANITIZE)
> --
> 2.30.0.284.gd98b1dd5eaa7-goog
>
> --
> You received this message because you are subscribed to the Google Groups "Clang Built Linux" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to clang-built-linux+unsubscribe@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/clang-built-linux/20210112051428.4175583-1-morbo%40google.com.

^ permalink raw reply	[flat|nested] 116+ messages in thread

* [PATCH v3] pgo: add clang's Profile Guided Optimization infrastructure
  2021-01-12  5:14 ` [PATCH v2] " Bill Wendling
  2021-01-12  5:17   ` Sedat Dilek
@ 2021-01-12  5:31   ` Bill Wendling
       [not found]     ` <202101121755.pyYoRozB-lkp@intel.com>
  2021-01-13  6:19     ` [PATCH v4] " Bill Wendling
  2021-01-12 17:37   ` [PATCH v2] " Nick Desaulniers
  2 siblings, 2 replies; 116+ messages in thread
From: Bill Wendling @ 2021-01-12  5:31 UTC (permalink / raw)
  To: Jonathan Corbet, Masahiro Yamada, linux-doc, linux-kernel,
	linux-kbuild, clang-built-linux, Andrew Morton
  Cc: Nathan Chancellor, Nick Desaulniers, Sami Tolvanen, Bill Wendling

From: Sami Tolvanen <samitolvanen@google.com>

Enable the use of clang's Profile-Guided Optimization[1]. To generate a
profile, the kernel is instrumented with PGO counters, a representative
workload is run, and the raw profile data is collected from
/sys/kernel/debug/pgo/profraw.

The raw profile data must be processed by clang's "llvm-profdata" tool
before it can be used during recompilation:

  $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
  $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw

Multiple raw profiles may be merged during this step.

The data can now be used by the compiler:

  $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...

This initial submission is restricted to x86, as that's the platform we
know works. This restriction can be lifted once other platforms have
been verified to work with PGO.

Note that this method of profiling the kernel is clang-native and isn't
compatible with clang's gcov support in kernel/gcov.

[1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Co-developed-by: Bill Wendling <morbo@google.com>
Signed-off-by: Bill Wendling <morbo@google.com>
---
v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
      testing.
    - Corrected documentation, re PGO flags when using LTO, based on Fāng-ruì
      Sòng's comments.
v3: - Added change log section based on Sedat Dilek's comments.
---
 Documentation/dev-tools/index.rst     |   1 +
 Documentation/dev-tools/pgo.rst       | 127 +++++++++
 MAINTAINERS                           |   9 +
 Makefile                              |   3 +
 arch/Kconfig                          |   1 +
 arch/arm/boot/bootp/Makefile          |   1 +
 arch/arm/boot/compressed/Makefile     |   1 +
 arch/arm/vdso/Makefile                |   3 +-
 arch/arm64/kernel/vdso/Makefile       |   3 +-
 arch/arm64/kvm/hyp/nvhe/Makefile      |   1 +
 arch/mips/boot/compressed/Makefile    |   1 +
 arch/mips/vdso/Makefile               |   1 +
 arch/nds32/kernel/vdso/Makefile       |   4 +-
 arch/parisc/boot/compressed/Makefile  |   1 +
 arch/powerpc/kernel/Makefile          |   6 +-
 arch/powerpc/kernel/trace/Makefile    |   3 +-
 arch/powerpc/kernel/vdso32/Makefile   |   1 +
 arch/powerpc/kernel/vdso64/Makefile   |   1 +
 arch/powerpc/kexec/Makefile           |   3 +-
 arch/powerpc/xmon/Makefile            |   1 +
 arch/riscv/kernel/vdso/Makefile       |   3 +-
 arch/s390/boot/Makefile               |   1 +
 arch/s390/boot/compressed/Makefile    |   1 +
 arch/s390/kernel/Makefile             |   1 +
 arch/s390/kernel/vdso64/Makefile      |   3 +-
 arch/s390/purgatory/Makefile          |   1 +
 arch/sh/boot/compressed/Makefile      |   1 +
 arch/sh/mm/Makefile                   |   1 +
 arch/sparc/vdso/Makefile              |   1 +
 arch/x86/Kconfig                      |   1 +
 arch/x86/boot/Makefile                |   1 +
 arch/x86/boot/compressed/Makefile     |   1 +
 arch/x86/entry/vdso/Makefile          |   1 +
 arch/x86/kernel/vmlinux.lds.S         |   2 +
 arch/x86/platform/efi/Makefile        |   1 +
 arch/x86/purgatory/Makefile           |   1 +
 arch/x86/realmode/rm/Makefile         |   1 +
 arch/x86/um/vdso/Makefile             |   1 +
 drivers/firmware/efi/libstub/Makefile |   1 +
 drivers/s390/char/Makefile            |   1 +
 include/asm-generic/vmlinux.lds.h     |  44 +++
 kernel/Makefile                       |   1 +
 kernel/pgo/Kconfig                    |  34 +++
 kernel/pgo/Makefile                   |   5 +
 kernel/pgo/fs.c                       | 382 ++++++++++++++++++++++++++
 kernel/pgo/instrument.c               | 188 +++++++++++++
 kernel/pgo/pgo.h                      | 206 ++++++++++++++
 scripts/Makefile.lib                  |  10 +
 48 files changed, 1058 insertions(+), 9 deletions(-)
 create mode 100644 Documentation/dev-tools/pgo.rst
 create mode 100644 kernel/pgo/Kconfig
 create mode 100644 kernel/pgo/Makefile
 create mode 100644 kernel/pgo/fs.c
 create mode 100644 kernel/pgo/instrument.c
 create mode 100644 kernel/pgo/pgo.h

diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
index f7809c7b1ba9e..8d6418e858062 100644
--- a/Documentation/dev-tools/index.rst
+++ b/Documentation/dev-tools/index.rst
@@ -26,6 +26,7 @@ whole; patches welcome!
    kgdb
    kselftest
    kunit/index
+   pgo
 
 
 .. only::  subproject and html
diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
new file mode 100644
index 0000000000000..da0e654ae7078
--- /dev/null
+++ b/Documentation/dev-tools/pgo.rst
@@ -0,0 +1,127 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============================
+Using PGO with the Linux kernel
+===============================
+
+Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
+when building with Clang. The profiling data is exported via the ``pgo``
+debugfs directory.
+
+.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
+
+
+Preparation
+===========
+
+Configure the kernel with:
+
+.. code-block:: make
+
+   CONFIG_DEBUG_FS=y
+   CONFIG_PGO_CLANG=y
+
+Note that kernels compiled with profiling flags will be significantly larger
+and run slower.
+
+Profiling data will only become accessible once debugfs has been mounted:
+
+.. code-block:: sh
+
+   mount -t debugfs none /sys/kernel/debug
+
+
+Customization
+=============
+
+You can enable or disable profiling for individual file and directories by
+adding a line similar to the following to the respective kernel Makefile:
+
+- For a single file (e.g. main.o)
+
+  .. code-block:: make
+
+     PGO_PROFILE_main.o := y
+
+- For all files in one directory
+
+  .. code-block:: make
+
+     PGO_PROFILE := y
+
+To exclude files from being profiled use
+
+  .. code-block:: make
+
+     PGO_PROFILE_main.o := n
+
+and
+
+  .. code-block:: make
+
+     PGO_PROFILE := n
+
+Only files which are linked to the main kernel image or are compiled as kernel
+modules are supported by this mechanism.
+
+
+Files
+=====
+
+The PGO kernel support creates the following files in debugfs:
+
+``/sys/kernel/debug/pgo``
+	Parent directory for all PGO-related files.
+
+``/sys/kernel/debug/pgo/reset``
+	Global reset file: resets all coverage data to zero when written to.
+
+``/sys/kernel/debug/profraw``
+	The raw PGO data that must be processed with ``llvm_profdata``.
+
+
+Workflow
+========
+
+The PGO kernel can be run on the host or test machines. The data though should
+be analyzed with Clang's tools from the same Clang version as the kernel was
+compiled. Clang's tolerant of version skew, but it's easier to use the same
+Clang version.
+
+The profiling data is useful for optimizing the kernel, analyzing coverage,
+etc. Clang offers tools to perform these tasks.
+
+Here is an example workflow for profiling an instrumented kernel with PGO and
+using the result to optimize the kernel:
+
+1) Install the kernel on the TEST machine.
+
+2) Reset the data counters right before running the load tests
+
+   .. code-block:: sh
+
+      echo 1 > /sys/kernel/debug/pgo/reset
+
+3) Run the load tests.
+
+4) Collect the raw profile data
+
+   .. code-block:: sh
+
+      cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
+
+5) (Optional) Download the raw profile data to the HOST machine.
+
+6) Process the raw profile data
+
+   .. code-block:: sh
+
+      llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
+
+   Note that multiple raw profile data files can be merged during this step.
+
+7) Rebuild the kernel using the profile data (PGO disabled)
+
+   .. code-block:: sh
+
+      make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
diff --git a/MAINTAINERS b/MAINTAINERS
index cc1e6a5ee6e67..1b979da316fa4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13954,6 +13954,15 @@ S:	Maintained
 F:	include/linux/personality.h
 F:	include/uapi/linux/personality.h
 
+PGO BASED KERNEL PROFILING
+M:	Sami Tolvanen <samitolvanen@google.com>
+M:	Bill Wendling <wcw@google.com>
+R:	Nathan Chancellor <natechancellor@gmail.com>
+R:	Nick Desaulniers <ndesaulniers@google.com>
+S:	Supported
+F:	Documentation/dev-tools/pgo.rst
+F:	kernel/pgo
+
 PHOENIX RC FLIGHT CONTROLLER ADAPTER
 M:	Marcus Folkesson <marcus.folkesson@gmail.com>
 L:	linux-input@vger.kernel.org
diff --git a/Makefile b/Makefile
index 9e73f82e0d863..9128bfe1ccc97 100644
--- a/Makefile
+++ b/Makefile
@@ -659,6 +659,9 @@ endif # KBUILD_EXTMOD
 # Defaults to vmlinux, but the arch makefile usually adds further targets
 all: vmlinux
 
+CFLAGS_PGO_CLANG := -fprofile-generate
+export CFLAGS_PGO_CLANG
+
 CFLAGS_GCOV	:= -fprofile-arcs -ftest-coverage \
 	$(call cc-option,-fno-tree-loop-im) \
 	$(call cc-disable-warning,maybe-uninitialized,)
diff --git a/arch/Kconfig b/arch/Kconfig
index 24862d15f3a36..f39d3991f6bfe 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1112,6 +1112,7 @@ config ARCH_SPLIT_ARG64
 	   pairs of 32-bit arguments, select this option.
 
 source "kernel/gcov/Kconfig"
+source "kernel/pgo/Kconfig"
 
 source "scripts/gcc-plugins/Kconfig"
 
diff --git a/arch/arm/boot/bootp/Makefile b/arch/arm/boot/bootp/Makefile
index 981a8d03f064c..523bd58df0a4b 100644
--- a/arch/arm/boot/bootp/Makefile
+++ b/arch/arm/boot/bootp/Makefile
@@ -7,6 +7,7 @@
 #
 
 GCOV_PROFILE	:= n
+PGO_PROFILE	:= n
 
 LDFLAGS_bootp	:= --no-undefined -X \
 		 --defsym initrd_phys=$(INITRD_PHYS) \
diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
index fb521efcc6c20..5fd0fd85fc0e5 100644
--- a/arch/arm/boot/compressed/Makefile
+++ b/arch/arm/boot/compressed/Makefile
@@ -24,6 +24,7 @@ OBJS		+= hyp-stub.o
 endif
 
 GCOV_PROFILE		:= n
+PGO_PROFILE		:= n
 KASAN_SANITIZE		:= n
 
 # Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
diff --git a/arch/arm/vdso/Makefile b/arch/arm/vdso/Makefile
index b558bee0e1f6b..11f6ce4b48b56 100644
--- a/arch/arm/vdso/Makefile
+++ b/arch/arm/vdso/Makefile
@@ -36,8 +36,9 @@ else
 CFLAGS_vgettimeofday.o = -O2 -include $(c-gettimeofday-y)
 endif
 
-# Disable gcov profiling for VDSO code
+# Disable gcov and PGO profiling for VDSO code
 GCOV_PROFILE := n
+PGO_PROFILE := n
 
 # Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
 KCOV_INSTRUMENT := n
diff --git a/arch/arm64/kernel/vdso/Makefile b/arch/arm64/kernel/vdso/Makefile
index cd9c3fa25902f..d48fc0df07020 100644
--- a/arch/arm64/kernel/vdso/Makefile
+++ b/arch/arm64/kernel/vdso/Makefile
@@ -41,8 +41,9 @@ ifneq ($(c-gettimeofday-y),)
   CFLAGS_vgettimeofday.o += -include $(c-gettimeofday-y)
 endif
 
-# Disable gcov profiling for VDSO code
+# Disable gcov and PGO profiling for VDSO code
 GCOV_PROFILE := n
+PGO_PROFILE := n
 
 obj-y += vdso.o
 targets += vdso.lds
diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index 1f1e351c5fe2b..ad128ecdbfbdf 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -60,6 +60,7 @@ KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_FTRACE) $(CC_FLAGS_SCS), $(KBUILD_CFLAG
 # compiler instrumentation that inserts callbacks or checks into the code may
 # cause crashes. Just disable it.
 GCOV_PROFILE	:= n
+PGO_PROFILE	:= n
 KASAN_SANITIZE	:= n
 UBSAN_SANITIZE	:= n
 KCOV_INSTRUMENT	:= n
diff --git a/arch/mips/boot/compressed/Makefile b/arch/mips/boot/compressed/Makefile
index 47cd9dc7454af..0855ea12f2c7f 100644
--- a/arch/mips/boot/compressed/Makefile
+++ b/arch/mips/boot/compressed/Makefile
@@ -37,6 +37,7 @@ KBUILD_AFLAGS := $(KBUILD_AFLAGS) -D__ASSEMBLY__ \
 # Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
 KCOV_INSTRUMENT		:= n
 GCOV_PROFILE := n
+PGO_PROFILE := n
 
 # decompressor objects (linked with vmlinuz)
 vmlinuzobjs-y := $(obj)/head.o $(obj)/decompress.o $(obj)/string.o
diff --git a/arch/mips/vdso/Makefile b/arch/mips/vdso/Makefile
index 5810cc12bc1d9..d7eb64de35eae 100644
--- a/arch/mips/vdso/Makefile
+++ b/arch/mips/vdso/Makefile
@@ -66,6 +66,7 @@ ldflags-y := -Bsymbolic --no-undefined -soname=linux-vdso.so.1 \
 CFLAGS_REMOVE_vdso.o = -pg
 
 GCOV_PROFILE := n
+PGO_PROFILE := n
 UBSAN_SANITIZE := n
 KCOV_INSTRUMENT := n
 
diff --git a/arch/nds32/kernel/vdso/Makefile b/arch/nds32/kernel/vdso/Makefile
index 55df25ef00578..f2b53ee2124b7 100644
--- a/arch/nds32/kernel/vdso/Makefile
+++ b/arch/nds32/kernel/vdso/Makefile
@@ -15,9 +15,9 @@ obj-vdso := $(addprefix $(obj)/, $(obj-vdso))
 ccflags-y := -shared -fno-common -fno-builtin -nostdlib -fPIC -Wl,-shared -g \
 	-Wl,-soname=linux-vdso.so.1 -Wl,--hash-style=sysv
 
-# Disable gcov profiling for VDSO code
+# Disable gcov and PGO profiling for VDSO code
 GCOV_PROFILE := n
-
+PGO_PROFILE := n
 
 obj-y += vdso.o
 targets += vdso.lds
diff --git a/arch/parisc/boot/compressed/Makefile b/arch/parisc/boot/compressed/Makefile
index dff4536875305..5cf93a67f7da7 100644
--- a/arch/parisc/boot/compressed/Makefile
+++ b/arch/parisc/boot/compressed/Makefile
@@ -7,6 +7,7 @@
 
 KCOV_INSTRUMENT := n
 GCOV_PROFILE := n
+PGO_PROFILE := n
 UBSAN_SANITIZE := n
 
 targets := vmlinux.lds vmlinux vmlinux.bin vmlinux.bin.gz vmlinux.bin.bz2
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index fe2ef598e2ead..c642c046660d7 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -153,17 +153,21 @@ endif
 obj-$(CONFIG_PPC_SECURE_BOOT)	+= secure_boot.o ima_arch.o secvar-ops.o
 obj-$(CONFIG_PPC_SECVAR_SYSFS)	+= secvar-sysfs.o
 
-# Disable GCOV, KCOV & sanitizers in odd or sensitive code
+# Disable GCOV, PGO, KCOV & sanitizers in odd or sensitive code
 GCOV_PROFILE_prom_init.o := n
+PGO_PROFILE_prom_init.o := n
 KCOV_INSTRUMENT_prom_init.o := n
 UBSAN_SANITIZE_prom_init.o := n
 GCOV_PROFILE_kprobes.o := n
+PGO_PROFILE_kprobes.o := n
 KCOV_INSTRUMENT_kprobes.o := n
 UBSAN_SANITIZE_kprobes.o := n
 GCOV_PROFILE_kprobes-ftrace.o := n
+PGO_PROFILE_kprobes-ftrace.o := n
 KCOV_INSTRUMENT_kprobes-ftrace.o := n
 UBSAN_SANITIZE_kprobes-ftrace.o := n
 GCOV_PROFILE_syscall_64.o := n
+PGO_PROFILE_syscall_64.o := n
 KCOV_INSTRUMENT_syscall_64.o := n
 UBSAN_SANITIZE_syscall_64.o := n
 UBSAN_SANITIZE_vdso.o := n
diff --git a/arch/powerpc/kernel/trace/Makefile b/arch/powerpc/kernel/trace/Makefile
index 858503775c583..7d72ae7d4f8c6 100644
--- a/arch/powerpc/kernel/trace/Makefile
+++ b/arch/powerpc/kernel/trace/Makefile
@@ -23,7 +23,8 @@ obj-$(CONFIG_TRACING)			+= trace_clock.o
 obj-$(CONFIG_PPC64)			+= $(obj64-y)
 obj-$(CONFIG_PPC32)			+= $(obj32-y)
 
-# Disable GCOV, KCOV & sanitizers in odd or sensitive code
+# Disable GCOV, PGO, KCOV & sanitizers in odd or sensitive code
 GCOV_PROFILE_ftrace.o := n
+PGO_PROFILE_ftrace.o := n
 KCOV_INSTRUMENT_ftrace.o := n
 UBSAN_SANITIZE_ftrace.o := n
diff --git a/arch/powerpc/kernel/vdso32/Makefile b/arch/powerpc/kernel/vdso32/Makefile
index 9cb6f524854b9..655e159975a04 100644
--- a/arch/powerpc/kernel/vdso32/Makefile
+++ b/arch/powerpc/kernel/vdso32/Makefile
@@ -34,6 +34,7 @@ targets := $(obj-vdso32) vdso32.so.dbg
 obj-vdso32 := $(addprefix $(obj)/, $(obj-vdso32))
 
 GCOV_PROFILE := n
+PGO_PROFILE := n
 KCOV_INSTRUMENT := n
 UBSAN_SANITIZE := n
 KASAN_SANITIZE := n
diff --git a/arch/powerpc/kernel/vdso64/Makefile b/arch/powerpc/kernel/vdso64/Makefile
index bf363ff371521..12c286f5afc16 100644
--- a/arch/powerpc/kernel/vdso64/Makefile
+++ b/arch/powerpc/kernel/vdso64/Makefile
@@ -21,6 +21,7 @@ targets := $(obj-vdso64) vdso64.so.dbg
 obj-vdso64 := $(addprefix $(obj)/, $(obj-vdso64))
 
 GCOV_PROFILE := n
+PGO_PROFILE := n
 KCOV_INSTRUMENT := n
 UBSAN_SANITIZE := n
 KASAN_SANITIZE := n
diff --git a/arch/powerpc/kexec/Makefile b/arch/powerpc/kexec/Makefile
index 4aff6846c7726..1c7f65e3cb969 100644
--- a/arch/powerpc/kexec/Makefile
+++ b/arch/powerpc/kexec/Makefile
@@ -16,7 +16,8 @@ endif
 endif
 
 
-# Disable GCOV, KCOV & sanitizers in odd or sensitive code
+# Disable GCOV, PGO, KCOV & sanitizers in odd or sensitive code
 GCOV_PROFILE_core_$(BITS).o := n
+PGO_PROFILE_core_$(BITS).o := n
 KCOV_INSTRUMENT_core_$(BITS).o := n
 UBSAN_SANITIZE_core_$(BITS).o := n
diff --git a/arch/powerpc/xmon/Makefile b/arch/powerpc/xmon/Makefile
index eb25d7554ffd1..7aff80d18b44b 100644
--- a/arch/powerpc/xmon/Makefile
+++ b/arch/powerpc/xmon/Makefile
@@ -2,6 +2,7 @@
 # Makefile for xmon
 
 GCOV_PROFILE := n
+PGO_PROFILE := n
 KCOV_INSTRUMENT := n
 UBSAN_SANITIZE := n
 KASAN_SANITIZE := n
diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile
index 0cfd6da784f84..882340dc3c647 100644
--- a/arch/riscv/kernel/vdso/Makefile
+++ b/arch/riscv/kernel/vdso/Makefile
@@ -32,8 +32,9 @@ CPPFLAGS_vdso.lds += -P -C -U$(ARCH)
 # Disable -pg to prevent insert call site
 CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os
 
-# Disable gcov profiling for VDSO code
+# Disable gcov and PGO profiling for VDSO code
 GCOV_PROFILE := n
+PGO_PROFILE := n
 KCOV_INSTRUMENT := n
 
 # Force dependency
diff --git a/arch/s390/boot/Makefile b/arch/s390/boot/Makefile
index 41a64b8dce252..bee4a32040e79 100644
--- a/arch/s390/boot/Makefile
+++ b/arch/s390/boot/Makefile
@@ -5,6 +5,7 @@
 
 KCOV_INSTRUMENT := n
 GCOV_PROFILE := n
+PGO_PROFILE := n
 UBSAN_SANITIZE := n
 KASAN_SANITIZE := n
 
diff --git a/arch/s390/boot/compressed/Makefile b/arch/s390/boot/compressed/Makefile
index de18dab518bb6..c3ab883e8425a 100644
--- a/arch/s390/boot/compressed/Makefile
+++ b/arch/s390/boot/compressed/Makefile
@@ -7,6 +7,7 @@
 
 KCOV_INSTRUMENT := n
 GCOV_PROFILE := n
+PGO_PROFILE := n
 UBSAN_SANITIZE := n
 KASAN_SANITIZE := n
 
diff --git a/arch/s390/kernel/Makefile b/arch/s390/kernel/Makefile
index dd73b7f074237..bd857aacad794 100644
--- a/arch/s390/kernel/Makefile
+++ b/arch/s390/kernel/Makefile
@@ -14,6 +14,7 @@ CFLAGS_REMOVE_early.o		= $(CC_FLAGS_FTRACE)
 endif
 
 GCOV_PROFILE_early.o		:= n
+PGO_PROFILE_early.o		:= n
 KCOV_INSTRUMENT_early.o		:= n
 UBSAN_SANITIZE_early.o		:= n
 KASAN_SANITIZE_ipl.o		:= n
diff --git a/arch/s390/kernel/vdso64/Makefile b/arch/s390/kernel/vdso64/Makefile
index a6e0fb6b91d6c..d7c43b7c1db96 100644
--- a/arch/s390/kernel/vdso64/Makefile
+++ b/arch/s390/kernel/vdso64/Makefile
@@ -35,8 +35,9 @@ obj-y += vdso64_wrapper.o
 targets += vdso64.lds
 CPPFLAGS_vdso64.lds += -P -C -U$(ARCH)
 
-# Disable gcov profiling, ubsan and kasan for VDSO code
+# Disable gcov and PGO profiling, ubsan and kasan for VDSO code
 GCOV_PROFILE := n
+PGO_PROFILE := n
 UBSAN_SANITIZE := n
 KASAN_SANITIZE := n
 
diff --git a/arch/s390/purgatory/Makefile b/arch/s390/purgatory/Makefile
index c57f8c40e9926..9aef584e98466 100644
--- a/arch/s390/purgatory/Makefile
+++ b/arch/s390/purgatory/Makefile
@@ -17,6 +17,7 @@ $(obj)/mem.o: $(srctree)/arch/s390/lib/mem.S FORCE
 
 KCOV_INSTRUMENT := n
 GCOV_PROFILE := n
+PGO_PROFILE := n
 UBSAN_SANITIZE := n
 KASAN_SANITIZE := n
 
diff --git a/arch/sh/boot/compressed/Makefile b/arch/sh/boot/compressed/Makefile
index 589d2d8a573db..ae19aeeb3964c 100644
--- a/arch/sh/boot/compressed/Makefile
+++ b/arch/sh/boot/compressed/Makefile
@@ -13,6 +13,7 @@ targets		:= vmlinux vmlinux.bin vmlinux.bin.gz \
 OBJECTS = $(obj)/head_32.o $(obj)/misc.o $(obj)/cache.o
 
 GCOV_PROFILE := n
+PGO_PROFILE := n
 
 #
 # IMAGE_OFFSET is the load offset of the compression loader
diff --git a/arch/sh/mm/Makefile b/arch/sh/mm/Makefile
index f69ddc70b1465..ea2782c631f43 100644
--- a/arch/sh/mm/Makefile
+++ b/arch/sh/mm/Makefile
@@ -43,3 +43,4 @@ obj-$(CONFIG_UNCACHED_MAPPING)	+= uncached.o
 obj-$(CONFIG_HAVE_SRAM_POOL)	+= sram.o
 
 GCOV_PROFILE_pmb.o := n
+PGO_PROFILE_pmb.o := n
diff --git a/arch/sparc/vdso/Makefile b/arch/sparc/vdso/Makefile
index c5e1545bc5cf9..ab5f3783fe199 100644
--- a/arch/sparc/vdso/Makefile
+++ b/arch/sparc/vdso/Makefile
@@ -115,6 +115,7 @@ quiet_cmd_vdso = VDSO    $@
 
 VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 -Bsymbolic
 GCOV_PROFILE := n
+PGO_PROFILE := n
 
 #
 # Install the unstripped copies of vdso*.so.  If our toolchain supports
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 21f851179ff08..36305ea61dc09 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -96,6 +96,7 @@ config X86
 	select ARCH_SUPPORTS_DEBUG_PAGEALLOC
 	select ARCH_SUPPORTS_NUMA_BALANCING	if X86_64
 	select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP	if NR_CPUS <= 4096
+	select ARCH_SUPPORTS_PGO_CLANG		if X86_64
 	select ARCH_USE_BUILTIN_BSWAP
 	select ARCH_USE_QUEUED_RWLOCKS
 	select ARCH_USE_QUEUED_SPINLOCKS
diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index fe605205b4ce2..383853e32f673 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -71,6 +71,7 @@ KBUILD_AFLAGS	:= $(KBUILD_CFLAGS) -D__ASSEMBLY__
 KBUILD_CFLAGS	+= $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
 KBUILD_CFLAGS	+= -fno-asynchronous-unwind-tables
 GCOV_PROFILE := n
+PGO_PROFILE := n
 UBSAN_SANITIZE := n
 
 $(obj)/bzImage: asflags-y  := $(SVGA_MODE)
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index e0bc3988c3faa..ed12ab65f6065 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -54,6 +54,7 @@ CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/
 
 KBUILD_AFLAGS  := $(KBUILD_CFLAGS) -D__ASSEMBLY__
 GCOV_PROFILE := n
+PGO_PROFILE := n
 UBSAN_SANITIZE :=n
 
 KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 02e3e42f380bd..26e2b3af0145c 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -179,6 +179,7 @@ quiet_cmd_vdso = VDSO    $@
 VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 \
 	$(call ld-option, --eh-frame-hdr) -Bsymbolic
 GCOV_PROFILE := n
+PGO_PROFILE := n
 
 quiet_cmd_vdso_and_check = VDSO    $@
       cmd_vdso_and_check = $(cmd_vdso); $(cmd_vdso_check)
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index efd9e9ea17f25..f6cab2316c46a 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -184,6 +184,8 @@ SECTIONS
 
 	BUG_TABLE
 
+	PGO_CLANG_DATA
+
 	ORC_UNWIND_TABLE
 
 	. = ALIGN(PAGE_SIZE);
diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile
index 84b09c230cbd5..5f22b31446ad4 100644
--- a/arch/x86/platform/efi/Makefile
+++ b/arch/x86/platform/efi/Makefile
@@ -2,6 +2,7 @@
 OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y
 KASAN_SANITIZE := n
 GCOV_PROFILE := n
+PGO_PROFILE := n
 
 obj-$(CONFIG_EFI) 		+= quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o
 obj-$(CONFIG_EFI_MIXED)		+= efi_thunk_$(BITS).o
diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
index 95ea17a9d20cb..36f20e99da0bc 100644
--- a/arch/x86/purgatory/Makefile
+++ b/arch/x86/purgatory/Makefile
@@ -23,6 +23,7 @@ targets += purgatory.ro purgatory.chk
 
 # Sanitizer, etc. runtimes are unavailable and cannot be linked here.
 GCOV_PROFILE	:= n
+PGO_PROFILE	:= n
 KASAN_SANITIZE	:= n
 UBSAN_SANITIZE	:= n
 KCSAN_SANITIZE	:= n
diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
index 83f1b6a56449f..21797192f958f 100644
--- a/arch/x86/realmode/rm/Makefile
+++ b/arch/x86/realmode/rm/Makefile
@@ -76,4 +76,5 @@ KBUILD_CFLAGS	:= $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
 KBUILD_AFLAGS	:= $(KBUILD_CFLAGS) -D__ASSEMBLY__
 KBUILD_CFLAGS	+= -fno-asynchronous-unwind-tables
 GCOV_PROFILE := n
+PGO_PROFILE := n
 UBSAN_SANITIZE := n
diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile
index 5943387e3f357..54f5768f58530 100644
--- a/arch/x86/um/vdso/Makefile
+++ b/arch/x86/um/vdso/Makefile
@@ -64,6 +64,7 @@ quiet_cmd_vdso = VDSO    $@
 
 VDSO_LDFLAGS = -fPIC -shared -Wl,--hash-style=sysv
 GCOV_PROFILE := n
+PGO_PROFILE := n
 
 #
 # Install the unstripped copy of vdso*.so listed in $(vdso-install-y).
diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index 8a94388e38b33..2d81623b33f29 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -40,6 +40,7 @@ KBUILD_CFLAGS			:= $(cflags-y) -Os -DDISABLE_BRANCH_PROFILING \
 KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))
 
 GCOV_PROFILE			:= n
+PGO_PROFILE			:= n
 # Sanitizer runtimes are unavailable and cannot be linked here.
 KASAN_SANITIZE			:= n
 KCSAN_SANITIZE			:= n
diff --git a/drivers/s390/char/Makefile b/drivers/s390/char/Makefile
index c6fdb81a068a6..bf6c5db5da1fc 100644
--- a/drivers/s390/char/Makefile
+++ b/drivers/s390/char/Makefile
@@ -9,6 +9,7 @@ CFLAGS_REMOVE_sclp_early_core.o	= $(CC_FLAGS_FTRACE)
 endif
 
 GCOV_PROFILE_sclp_early_core.o		:= n
+PGO_PROFILE_sclp_early_core.o		:= n
 KCOV_INSTRUMENT_sclp_early_core.o	:= n
 UBSAN_SANITIZE_sclp_early_core.o	:= n
 KASAN_SANITIZE_sclp_early_core.o	:= n
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index b2b3d81b1535a..3a591bb18c5fb 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -316,6 +316,49 @@
 #define THERMAL_TABLE(name)
 #endif
 
+#ifdef CONFIG_PGO_CLANG
+#define PGO_CLANG_DATA							\
+	__llvm_prf_data : AT(ADDR(__llvm_prf_data) - LOAD_OFFSET) {	\
+		. = ALIGN(8);						\
+		__llvm_prf_start = .;					\
+		__llvm_prf_data_start = .;				\
+		KEEP(*(__llvm_prf_data))				\
+		. = ALIGN(8);						\
+		__llvm_prf_data_end = .;				\
+	}								\
+	__llvm_prf_cnts : AT(ADDR(__llvm_prf_cnts) - LOAD_OFFSET) {	\
+		. = ALIGN(8);						\
+		__llvm_prf_cnts_start = .;				\
+		KEEP(*(__llvm_prf_cnts))				\
+		. = ALIGN(8);						\
+		__llvm_prf_cnts_end = .;				\
+	}								\
+	__llvm_prf_names : AT(ADDR(__llvm_prf_names) - LOAD_OFFSET) {	\
+		. = ALIGN(8);						\
+		__llvm_prf_names_start = .;				\
+		KEEP(*(__llvm_prf_names))				\
+		. = ALIGN(8);						\
+		__llvm_prf_names_end = .;				\
+		. = ALIGN(8);						\
+	}								\
+	__llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) {	\
+		__llvm_prf_vals_start = .;				\
+		KEEP(*(__llvm_prf_vals))				\
+		. = ALIGN(8);						\
+		__llvm_prf_vals_end = .;				\
+		. = ALIGN(8);						\
+	}								\
+	__llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) {	\
+		__llvm_prf_vnds_start = .;				\
+		KEEP(*(__llvm_prf_vnds))				\
+		. = ALIGN(8);						\
+		__llvm_prf_vnds_end = .;				\
+		__llvm_prf_end = .;					\
+	}
+#else
+#define PGO_CLANG_DATA
+#endif
+
 #define KERNEL_DTB()							\
 	STRUCT_ALIGN();							\
 	__dtb_start = .;						\
@@ -1125,6 +1168,7 @@
 		CONSTRUCTORS						\
 	}								\
 	BUG_TABLE							\
+	PGO_CLANG_DATA
 
 #define INIT_TEXT_SECTION(inittext_align)				\
 	. = ALIGN(inittext_align);					\
diff --git a/kernel/Makefile b/kernel/Makefile
index aa7368c7eabf3..0b34ca228ba46 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -111,6 +111,7 @@ obj-$(CONFIG_BPF) += bpf/
 obj-$(CONFIG_KCSAN) += kcsan/
 obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
 obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o
+obj-$(CONFIG_PGO_CLANG) += pgo/
 
 obj-$(CONFIG_PERF_EVENTS) += events/
 
diff --git a/kernel/pgo/Kconfig b/kernel/pgo/Kconfig
new file mode 100644
index 0000000000000..318d36bb3d106
--- /dev/null
+++ b/kernel/pgo/Kconfig
@@ -0,0 +1,34 @@
+# SPDX-License-Identifier: GPL-2.0-only
+menu "Profile Guided Optimization (PGO) (EXPERIMENTAL)"
+
+config ARCH_SUPPORTS_PGO_CLANG
+	bool
+
+config PGO_CLANG
+	bool "Enable clang's PGO-based kernel profiling"
+	depends on DEBUG_FS
+	depends on ARCH_SUPPORTS_PGO_CLANG
+	help
+	  This option enables clang's PGO (Profile Guided Optimization) based
+	  code profiling to better optimize the kernel.
+
+	  If unsure, say N.
+
+	  Run a representative workload for your application on a kernel
+	  compiled with this option and download the raw profile file from
+	  /sys/kernel/debug/pgo/profraw. This file needs to be processed with
+	  llvm-profdata. It may be merged with other collected raw profiles.
+
+	  Copy the resulting profile file into vmlinux.profdata, and enable
+	  KCFLAGS=-fprofile-use=vmlinux.profdata to produce an optimized
+	  kernel.
+
+	  Note that a kernel compiled with profiling flags will be
+	  significatnly larger and run slower. Also be sure to exclude files
+	  from profiling which are not linked to the kernel image to prevent
+	  linker errors.
+
+	  Note that the debugfs filesystem has to be mounted to access
+	  profiling data.
+
+endmenu
diff --git a/kernel/pgo/Makefile b/kernel/pgo/Makefile
new file mode 100644
index 0000000000000..41e27cefd9a47
--- /dev/null
+++ b/kernel/pgo/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+GCOV_PROFILE	:= n
+PGO_PROFILE	:= n
+
+obj-y	+= fs.o instrument.o
diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c
new file mode 100644
index 0000000000000..790a8df037bfc
--- /dev/null
+++ b/kernel/pgo/fs.c
@@ -0,0 +1,382 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Google, Inc.
+ *
+ * Author:
+ *	Sami Tolvanen <samitolvanen@google.com>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#define pr_fmt(fmt)	"pgo: " fmt
+
+#include <linux/kernel.h>
+#include <linux/debugfs.h>
+#include <linux/fs.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/vmalloc.h>
+#include "pgo.h"
+
+static struct dentry *directory;
+
+struct prf_private_data {
+	void *buffer;
+	unsigned long size;
+};
+
+/*
+ * Raw profile data format:
+ *
+ *	- llvm_prf_header
+ *	- __llvm_prf_data
+ *	- __llvm_prf_cnts
+ *	- __llvm_prf_names
+ *	- zero padding to 8 bytes
+ *	- for each llvm_prf_data in __llvm_prf_data:
+ *		- llvm_prf_value_data
+ *			- llvm_prf_value_record + site count array
+ *				- llvm_prf_value_node_data
+ *				...
+ *			...
+ *		...
+ */
+
+static void prf_fill_header(void **buffer)
+{
+	struct llvm_prf_header *header = *(struct llvm_prf_header **)buffer;
+
+	header->magic = LLVM_PRF_MAGIC;
+	header->version = LLVM_PRF_VARIANT_MASK_IR | LLVM_PRF_VERSION;
+	header->data_size = prf_data_count();
+	header->padding_bytes_before_counters = 0;
+	header->counters_size = prf_cnts_count();
+	header->padding_bytes_after_counters = 0;
+	header->names_size = prf_names_count();
+	header->counters_delta = (u64)__llvm_prf_cnts_start;
+	header->names_delta = (u64)__llvm_prf_names_start;
+	header->value_kind_last = LLVM_PRF_IPVK_LAST;
+
+	*buffer += sizeof(*header);
+}
+
+/*
+ * Copy the source into the buffer, incrementing the pointer into buffer in the
+ * process.
+ */
+static void prf_copy_to_buffer(void **buffer, void *src, unsigned long size)
+{
+	memcpy(*buffer, src, size);
+	*buffer += size;
+}
+
+static u32 __prf_get_value_size(struct llvm_prf_data *p, u32 *value_kinds)
+{
+	struct llvm_prf_value_node **nodes =
+		(struct llvm_prf_value_node **)p->values;
+	u32 kinds = 0;
+	u32 size = 0;
+	unsigned int kind;
+	unsigned int n;
+	unsigned int s = 0;
+
+	for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
+		unsigned int sites = p->num_value_sites[kind];
+
+		if (!sites)
+			continue;
+
+		/* Record + site count array */
+		size += prf_get_value_record_size(sites);
+		kinds++;
+
+		if (!nodes)
+			continue;
+
+		for (n = 0; n < sites; n++) {
+			u32 count = 0;
+			struct llvm_prf_value_node *site = nodes[s + n];
+
+			while (site && ++count <= U8_MAX)
+				site = site->next;
+
+			size += count *
+				sizeof(struct llvm_prf_value_node_data);
+		}
+
+		s += sites;
+	}
+
+	if (size)
+		size += sizeof(struct llvm_prf_value_data);
+
+	if (value_kinds)
+		*value_kinds = kinds;
+
+	return size;
+}
+
+static u32 prf_get_value_size(void)
+{
+	u32 size = 0;
+	struct llvm_prf_data *p;
+
+	for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
+		size += __prf_get_value_size(p, NULL);
+
+	return size;
+}
+
+/* Serialize the profiling's value. */
+static void prf_serialize_value(struct llvm_prf_data *p, void **buffer)
+{
+	struct llvm_prf_value_data header;
+	struct llvm_prf_value_node **nodes =
+		(struct llvm_prf_value_node **)p->values;
+	unsigned int kind;
+	unsigned int n;
+	unsigned int s = 0;
+
+	header.total_size = __prf_get_value_size(p, &header.num_value_kinds);
+
+	if (!header.num_value_kinds)
+		/* Nothing to write. */
+		return;
+
+	prf_copy_to_buffer(buffer, &header, sizeof(header));
+
+	for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
+		struct llvm_prf_value_record *record;
+		u8 *counts;
+		unsigned int sites = p->num_value_sites[kind];
+
+		if (!sites)
+			continue;
+
+		/* Profiling value record. */
+		record = *(struct llvm_prf_value_record **)buffer;
+		*buffer += prf_get_value_record_header_size();
+
+		record->kind = kind;
+		record->num_value_sites = sites;
+
+		/* Site count array. */
+		counts = *(u8 **)buffer;
+		*buffer += prf_get_value_record_site_count_size(sites);
+
+		/*
+		 * If we don't have nodes, we can skip updating the site count
+		 * array, because the buffer is zero filled.
+		 */
+		if (!nodes)
+			continue;
+
+		for (n = 0; n < sites; n++) {
+			u32 count = 0;
+			struct llvm_prf_value_node *site = nodes[s + n];
+
+			while (site && ++count <= U8_MAX) {
+				prf_copy_to_buffer(buffer, site,
+						   sizeof(struct llvm_prf_value_node_data));
+				site = site->next;
+			}
+
+			counts[n] = (u8)count;
+		}
+
+		s += sites;
+	}
+}
+
+static void prf_serialize_values(void **buffer)
+{
+	struct llvm_prf_data *p;
+
+	for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
+		prf_serialize_value(p, buffer);
+}
+
+static inline unsigned long prf_get_padding(unsigned long size)
+{
+	return 8 - (size % 8);
+}
+
+static unsigned long prf_buffer_size(void)
+{
+	return sizeof(struct llvm_prf_header) +
+			prf_data_size()	+
+			prf_cnts_size() +
+			prf_names_size() +
+			prf_get_padding(prf_names_size()) +
+			prf_get_value_size();
+}
+
+/* Serialize the profling data into a format LLVM's tools can understand. */
+static int prf_serialize(struct prf_private_data *p)
+{
+	int err = 0;
+	void *buffer;
+
+	p->size = prf_buffer_size();
+	p->buffer = vzalloc(p->size);
+
+	if (!p->buffer) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	buffer = p->buffer;
+
+	prf_fill_header(&buffer);
+	prf_copy_to_buffer(&buffer, __llvm_prf_data_start,  prf_data_size());
+	prf_copy_to_buffer(&buffer, __llvm_prf_cnts_start,  prf_cnts_size());
+	prf_copy_to_buffer(&buffer, __llvm_prf_names_start, prf_names_size());
+	buffer += prf_get_padding(prf_names_size());
+
+	prf_serialize_values(&buffer);
+
+out:
+	return err;
+}
+
+/* open() implementation for PGO. Creates a copy of the profiling data set. */
+static int prf_open(struct inode *inode, struct file *file)
+{
+	struct prf_private_data *data;
+	unsigned long flags;
+	int err;
+
+	data = kzalloc(sizeof(*data), GFP_KERNEL);
+	if (!data) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	flags = prf_lock();
+
+	err = prf_serialize(data);
+	if (err) {
+		kfree(data);
+		goto out_unlock;
+	}
+
+	file->private_data = data;
+
+out_unlock:
+	prf_unlock(flags);
+out:
+	return err;
+}
+
+/* read() implementation for PGO. */
+static ssize_t prf_read(struct file *file, char __user *buf, size_t count,
+			loff_t *ppos)
+{
+	struct prf_private_data *data = file->private_data;
+
+	BUG_ON(!data);
+
+	return simple_read_from_buffer(buf, count, ppos, data->buffer,
+				       data->size);
+}
+
+/* release() implementation for PGO. Release resources allocated by open(). */
+static int prf_release(struct inode *inode, struct file *file)
+{
+	struct prf_private_data *data = file->private_data;
+
+	if (data) {
+		vfree(data->buffer);
+		kfree(data);
+	}
+
+	return 0;
+}
+
+static const struct file_operations prf_fops = {
+	.owner		= THIS_MODULE,
+	.open		= prf_open,
+	.read		= prf_read,
+	.llseek		= default_llseek,
+	.release	= prf_release
+};
+
+/* write() implementation for resetting PGO's profile data. */
+static ssize_t reset_write(struct file *file, const char __user *addr,
+			   size_t len, loff_t *pos)
+{
+	struct llvm_prf_data *data;
+
+	memset(__llvm_prf_cnts_start, 0, prf_cnts_size());
+
+	for (data = __llvm_prf_data_start; data < __llvm_prf_data_end; ++data) {
+		struct llvm_prf_value_node **vnodes;
+		u64 current_vsite_count;
+		u32 i;
+
+		if (!data->values)
+			continue;
+
+		current_vsite_count = 0;
+		vnodes = (struct llvm_prf_value_node **)data->values;
+
+		for (i = LLVM_PRF_IPVK_FIRST; i <= LLVM_PRF_IPVK_LAST; ++i)
+			current_vsite_count += data->num_value_sites[i];
+
+		for (i = 0; i < current_vsite_count; ++i) {
+			struct llvm_prf_value_node *current_vnode = vnodes[i];
+
+			while (current_vnode) {
+				current_vnode->count = 0;
+				current_vnode = current_vnode->next;
+			}
+		}
+	}
+
+	return len;
+}
+
+static const struct file_operations prf_reset_fops = {
+	.owner		= THIS_MODULE,
+	.write		= reset_write,
+	.llseek		= noop_llseek,
+};
+
+/* Create debugfs entries. */
+static int __init pgo_init(void)
+{
+	directory = debugfs_create_dir("pgo", NULL);
+	if (!directory)
+		goto err_remove;
+
+	if (!debugfs_create_file("profraw", 0600, directory, NULL,
+				 &prf_fops))
+		goto err_remove;
+
+	if (!debugfs_create_file("reset", 0200, directory, NULL,
+				 &prf_reset_fops))
+		goto err_remove;
+
+	return 0;
+
+err_remove:
+	pr_err("initialization failed\n");
+	return -EIO;
+}
+
+/* Remove debufs entries. */
+static void __exit pgo_exit(void)
+{
+	debugfs_remove_recursive(directory);
+}
+
+module_init(pgo_init);
+module_exit(pgo_exit);
diff --git a/kernel/pgo/instrument.c b/kernel/pgo/instrument.c
new file mode 100644
index 0000000000000..465615b7f8735
--- /dev/null
+++ b/kernel/pgo/instrument.c
@@ -0,0 +1,188 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Google, Inc.
+ *
+ * Author:
+ *	Sami Tolvanen <samitolvanen@google.com>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#define pr_fmt(fmt)	"pgo: " fmt
+
+#include <linux/kernel.h>
+#include <linux/export.h>
+#include <linux/spinlock.h>
+#include <linux/types.h>
+#include "pgo.h"
+
+/* Lock guarding value node access and serialization. */
+static DEFINE_SPINLOCK(pgo_lock);
+static int current_node;
+
+unsigned long prf_lock(void)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&pgo_lock, flags);
+
+	return flags;
+}
+
+void prf_unlock(unsigned long flags)
+{
+	spin_unlock_irqrestore(&pgo_lock, flags);
+}
+
+/*
+ * Return a newly allocated profiling value node which contains the tracked
+ * value by the value profiler.
+ * Note: caller *must* hold pgo_lock.
+ */
+static struct llvm_prf_value_node *allocate_node(struct llvm_prf_data *p,
+						 u32 index, u64 value)
+{
+	if (&__llvm_prf_vnds_start[current_node + 1] >= __llvm_prf_vnds_end)
+		return NULL; /* Out of nodes */
+
+	current_node++;
+
+	/* Make sure the node is entirely within the section */
+	if (&__llvm_prf_vnds_start[current_node] >= __llvm_prf_vnds_end ||
+	    &__llvm_prf_vnds_start[current_node + 1] > __llvm_prf_vnds_end)
+		return NULL;
+
+	return &__llvm_prf_vnds_start[current_node];
+}
+
+/*
+ * Counts the number of times a target value is seen.
+ *
+ * Records the target value for the CounterIndex if not seen before. Otherwise,
+ * increments the counter associated w/ the target value.
+ */
+void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index)
+{
+	struct llvm_prf_data *p = (struct llvm_prf_data *)data;
+	struct llvm_prf_value_node **counters;
+	struct llvm_prf_value_node *curr;
+	struct llvm_prf_value_node *min = NULL;
+	struct llvm_prf_value_node *prev = NULL;
+	u64 min_count = U64_MAX;
+	u8 values = 0;
+	unsigned long flags;
+
+	if (!p || !p->values)
+		return;
+
+	counters = (struct llvm_prf_value_node **)p->values;
+	curr = counters[index];
+
+	while (curr) {
+		if (target_value == curr->value) {
+			curr->count++;
+			return;
+		}
+
+		if (curr->count < min_count) {
+			min_count = curr->count;
+			min = curr;
+		}
+
+		prev = curr;
+		curr = curr->next;
+		values++;
+	}
+
+	if (values >= LLVM_PRF_MAX_NUM_VALS_PER_SITE) {
+		if (!min->count || !(--min->count)) {
+			curr = min;
+			curr->value = target_value;
+			curr->count++;
+		}
+		return;
+	}
+
+	/* Lock when updating the value node structure. */
+	flags = prf_lock();
+
+	curr = allocate_node(p, index, target_value);
+	if (!curr)
+		goto out;
+
+	curr->value = target_value;
+	curr->count++;
+
+	if (!counters[index])
+		counters[index] = curr;
+	else if (prev && !prev->next)
+		prev->next = curr;
+
+out:
+	prf_unlock(flags);
+}
+EXPORT_SYMBOL(__llvm_profile_instrument_target);
+
+/* Counts the number of times a range of targets values are seen. */
+void __llvm_profile_instrument_range(u64 target_value, void *data,
+				     u32 index, s64 precise_start,
+				     s64 precise_last, s64 large_value)
+{
+	if (large_value != S64_MIN && (s64)target_value >= large_value)
+		target_value = large_value;
+	else if ((s64)target_value < precise_start ||
+		 (s64)target_value > precise_last)
+		target_value = precise_last + 1;
+
+	__llvm_profile_instrument_target(target_value, data, index);
+}
+EXPORT_SYMBOL(__llvm_profile_instrument_range);
+
+static inline int inst_prof_popcount(unsigned long long value)
+{
+	value = value - ((value >> 1) & 0x5555555555555555ULL);
+	value = (value & 0x3333333333333333ULL) +
+		((value >> 2) & 0x3333333333333333ULL);
+	value = (value + (value >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
+
+	return (int)((unsigned long long)(value * 0x0101010101010101ULL) >> 56);
+}
+
+static u64 inst_prof_get_range_rep_value(u64 value)
+{
+	if (value <= 8)
+		/* The first ranges are individually tracked, us it as is. */
+		return value;
+	else if (value >= 513)
+		/* The last range is mapped to its lowest value. */
+		return 513;
+	else if (inst_prof_popcount(value) == 1)
+		/* If it's a power of two, use it as is. */
+		return value;
+
+	/* Otherwise, take to the previous power of two + 1. */
+	return (1 << (64 - __builtin_clzll(value) - 1)) + 1;
+}
+
+/*
+ * The target values are partitioned into multiple ranges. The range spec is
+ * defined in compiler-rt/include/profile/InstrProfData.inc.
+ */
+void __llvm_profile_instrument_memop(u64 target_value, void *data,
+				     u32 counter_index)
+{
+	u64 rep_value;
+
+	/* Map the target value to the representative value of its range. */
+	rep_value = inst_prof_get_range_rep_value(target_value);
+	__llvm_profile_instrument_target(rep_value, data, counter_index);
+}
+EXPORT_SYMBOL(__llvm_profile_instrument_memop);
diff --git a/kernel/pgo/pgo.h b/kernel/pgo/pgo.h
new file mode 100644
index 0000000000000..df0aa278f28bd
--- /dev/null
+++ b/kernel/pgo/pgo.h
@@ -0,0 +1,206 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2019 Google, Inc.
+ *
+ * Author:
+ *	Sami Tolvanen <samitolvanen@google.com>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _PGO_H
+#define _PGO_H
+
+/*
+ * Note: These internal LLVM definitions must match the compiler version.
+ * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code.
+ */
+
+#ifdef CONFIG_64BIT
+	#define LLVM_PRF_MAGIC		\
+		((u64)255 << 56 |	\
+		 (u64)'l' << 48 |	\
+		 (u64)'p' << 40 |	\
+		 (u64)'r' << 32 |	\
+		 (u64)'o' << 24 |	\
+		 (u64)'f' << 16 |	\
+		 (u64)'r' << 8  |	\
+		 (u64)129)
+#else
+	#define LLVM_PRF_MAGIC		\
+		((u64)255 << 56 |	\
+		 (u64)'l' << 48 |	\
+		 (u64)'p' << 40 |	\
+		 (u64)'r' << 32 |	\
+		 (u64)'o' << 24 |	\
+		 (u64)'f' << 16 |	\
+		 (u64)'R' << 8  |	\
+		 (u64)129)
+#endif
+
+#define LLVM_PRF_VERSION		5
+#define LLVM_PRF_DATA_ALIGN		8
+#define LLVM_PRF_IPVK_FIRST		0
+#define LLVM_PRF_IPVK_LAST		1
+#define LLVM_PRF_MAX_NUM_VALS_PER_SITE	16
+
+#define LLVM_PRF_VARIANT_MASK_IR	(0x1ull << 56)
+#define LLVM_PRF_VARIANT_MASK_CSIR	(0x1ull << 57)
+
+/**
+ * struct llvm_prf_header - represents the raw profile header data structure.
+ * @magic: the magic token for the file format.
+ * @version: the version of the file format.
+ * @data_size: the number of entries in the profile data section.
+ * @padding_bytes_before_counters: the number of padding bytes before the
+ *   counters.
+ * @counters_size: the size in bytes of the LLVM profile section containing the
+ *   counters.
+ * @padding_bytes_after_counters: the number of padding bytes after the
+ *   counters.
+ * @names_size: the size in bytes of the LLVM profile section containing the
+ *   counters' names.
+ * @counters_delta: the beginning of the LLMV profile counters section.
+ * @names_delta: the beginning of the LLMV profile names section.
+ * @value_kind_last: the last profile value kind.
+ */
+struct llvm_prf_header {
+	u64 magic;
+	u64 version;
+	u64 data_size;
+	u64 padding_bytes_before_counters;
+	u64 counters_size;
+	u64 padding_bytes_after_counters;
+	u64 names_size;
+	u64 counters_delta;
+	u64 names_delta;
+	u64 value_kind_last;
+};
+
+/**
+ * struct llvm_prf_data - represents the per-function control structure.
+ * @name_ref: the reference to the function's name.
+ * @func_hash: the hash value of the function.
+ * @counter_ptr: a pointer to the profile counter.
+ * @function_ptr: a pointer to the function.
+ * @values: the profiling values associated with this function.
+ * @num_counters: the number of counters in the function.
+ * @num_value_sites: the number of value profile sites.
+ */
+struct llvm_prf_data {
+	const u64 name_ref;
+	const u64 func_hash;
+	const void *counter_ptr;
+	const void *function_ptr;
+	void *values;
+	const u32 num_counters;
+	const u16 num_value_sites[LLVM_PRF_IPVK_LAST + 1];
+} __aligned(LLVM_PRF_DATA_ALIGN);
+
+/**
+ * structure llvm_prf_value_node_data - represents the data part of the struct
+ *   llvm_prf_value_node data structure.
+ * @value: the value counters.
+ * @count: the counters' count.
+ */
+struct llvm_prf_value_node_data {
+	u64 value;
+	u64 count;
+};
+
+/**
+ * struct llvm_prf_value_node - represents an internal data structure used by
+ *   the value profiler.
+ * @value: the value counters.
+ * @count: the counters' count.
+ * @next: the next value node.
+ */
+struct llvm_prf_value_node {
+	u64 value;
+	u64 count;
+	struct llvm_prf_value_node *next;
+};
+
+/**
+ * struct llvm_prf_value_data - represents the value profiling data in indexed
+ *   format.
+ * @total_size: the total size in bytes including this field.
+ * @num_value_kinds: the number of value profile kinds that has value profile
+ *   data.
+ */
+struct llvm_prf_value_data {
+	u32 total_size;
+	u32 num_value_kinds;
+};
+
+/**
+ * struct llvm_prf_value_record - represents the on-disk layout of the value
+ *   profile data of a particular kind for one function.
+ * @kind: the kind of the value profile record.
+ * @num_value_sites: the number of value profile sites.
+ * @site_count_array: the first element of the array that stores the number
+ *   of profiled values for each value site.
+ */
+struct llvm_prf_value_record {
+	u32 kind;
+	u32 num_value_sites;
+	u8 site_count_array[];
+};
+
+#define prf_get_value_record_header_size()		\
+	offsetof(struct llvm_prf_value_record, site_count_array)
+#define prf_get_value_record_site_count_size(sites)	\
+	roundup((sites), 8)
+#define prf_get_value_record_size(sites)		\
+	(prf_get_value_record_header_size() +		\
+	 prf_get_value_record_site_count_size((sites)))
+
+/* Data sections */
+extern struct llvm_prf_data __llvm_prf_data_start[];
+extern struct llvm_prf_data __llvm_prf_data_end[];
+
+extern u64 __llvm_prf_cnts_start[];
+extern u64 __llvm_prf_cnts_end[];
+
+extern char __llvm_prf_names_start[];
+extern char __llvm_prf_names_end[];
+
+extern struct llvm_prf_value_node __llvm_prf_vnds_start[];
+extern struct llvm_prf_value_node __llvm_prf_vnds_end[];
+
+/* Locking for vnodes */
+extern unsigned long prf_lock(void);
+extern void prf_unlock(unsigned long flags);
+
+#define __DEFINE_PRF_SIZE(s) \
+	static inline unsigned long prf_ ## s ## _size(void)		\
+	{								\
+		unsigned long start =					\
+			(unsigned long)__llvm_prf_ ## s ## _start;	\
+		unsigned long end =					\
+			(unsigned long)__llvm_prf_ ## s ## _end;	\
+		return roundup(end - start,				\
+				sizeof(__llvm_prf_ ## s ## _start[0]));	\
+	}								\
+	static inline unsigned long prf_ ## s ## _count(void)		\
+	{								\
+		return prf_ ## s ## _size() /				\
+			sizeof(__llvm_prf_ ## s ## _start[0]);		\
+	}
+
+__DEFINE_PRF_SIZE(data);
+__DEFINE_PRF_SIZE(cnts);
+__DEFINE_PRF_SIZE(names);
+__DEFINE_PRF_SIZE(vnds);
+
+#undef __DEFINE_PRF_SIZE
+
+#endif /* _PGO_H */
diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index 213677a5ed33e..9b218afb5cb87 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -143,6 +143,16 @@ _c_flags += $(if $(patsubst n%,, \
 		$(CFLAGS_GCOV))
 endif
 
+#
+# Enable clang's PGO profiling flags for a file or directory depending on
+# variables PGO_PROFILE_obj.o and PGO_PROFILE.
+#
+ifeq ($(CONFIG_PGO_CLANG),y)
+_c_flags += $(if $(patsubst n%,, \
+		$(PGO_PROFILE_$(basetarget).o)$(PGO_PROFILE)y), \
+		$(CFLAGS_PGO_CLANG))
+endif
+
 #
 # Enable address sanitizer flags for kernel except some files or directories
 # we don't want to check (depends on variables KASAN_SANITIZE_obj.o, KASAN_SANITIZE)
-- 
2.30.0.284.gd98b1dd5eaa7-goog


^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v3] pgo: add clang's Profile Guided Optimization infrastructure
       [not found]     ` <202101121755.pyYoRozB-lkp@intel.com>
@ 2021-01-12 17:22       ` Nathan Chancellor
  0 siblings, 0 replies; 116+ messages in thread
From: Nathan Chancellor @ 2021-01-12 17:22 UTC (permalink / raw)
  To: kernel test robot
  Cc: Bill Wendling, Jonathan Corbet, Masahiro Yamada, linux-doc,
	linux-kernel, linux-kbuild, clang-built-linux, Andrew Morton,
	kbuild-all, Linux Memory Management List, Nick Desaulniers

On Tue, Jan 12, 2021 at 05:10:04PM +0800, kernel test robot wrote:
> Hi Bill,
> 
> I love your patch! Perhaps something to improve:
> 
> [auto build test WARNING on linus/master]
> [also build test WARNING on v5.11-rc3]
> [cannot apply to powerpc/next s390/features tip/x86/core next-20210111]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
> 
> url:    https://github.com/0day-ci/linux/commits/Bill-Wendling/pgo-add-clang-s-Profile-Guided-Optimization-infrastructure/20210112-133315
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git a0d54b4f5b219fb31f0776e9f53aa137e78ae431
> config: x86_64-allyesconfig (attached as .config)
> compiler: gcc-9 (Debian 9.3.0-15) 9.3.0

Hmmm... This should probably be gated on CC_IS_CLANG? Or even better
CLANG_VERSION >= 120000 due to
https://github.com/ClangBuiltLinux/linux/issues/1252?

> reproduce (this is a W=1 build):
>         # https://github.com/0day-ci/linux/commit/6ab85bae7667afd0aa68c6442b7ca5c369fa1088
>         git remote add linux-review https://github.com/0day-ci/linux
>         git fetch --no-tags linux-review Bill-Wendling/pgo-add-clang-s-Profile-Guided-Optimization-infrastructure/20210112-133315
>         git checkout 6ab85bae7667afd0aa68c6442b7ca5c369fa1088
>         # save the attached .config to linux build tree
>         make W=1 ARCH=x86_64 
> 
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>
> 
> All warnings (new ones prefixed by >>):
> 
>    kernel/pgo/instrument.c:72:6: warning: no previous prototype for '__llvm_profile_instrument_target' [-Wmissing-prototypes]
>       72 | void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index)
>          |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>    kernel/pgo/instrument.c:135:6: warning: no previous prototype for '__llvm_profile_instrument_range' [-Wmissing-prototypes]
>      135 | void __llvm_profile_instrument_range(u64 target_value, void *data,
>          |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >> kernel/pgo/instrument.c:179:6: warning: no previous prototype for '__llvm_profile_instrument_memop' [-Wmissing-prototypes]
>      179 | void __llvm_profile_instrument_memop(u64 target_value, void *data,
>          |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 

I still think that this warning will show up with clang at W=1. Given
that these are compiler inserted functions, the prototypes don't matter
but we could shut it up by just putting the prototypes right above the
functions like was done in commit 1e1b6d63d634 ("lib/string.c: implement
stpcpy").

Cheers,
Nathan

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2] pgo: add clang's Profile Guided Optimization infrastructure
  2021-01-12  5:14 ` [PATCH v2] " Bill Wendling
  2021-01-12  5:17   ` Sedat Dilek
  2021-01-12  5:31   ` [PATCH v3] " Bill Wendling
@ 2021-01-12 17:37   ` Nick Desaulniers
  2021-01-12 17:45     ` Fāng-ruì Sòng
  2 siblings, 1 reply; 116+ messages in thread
From: Nick Desaulniers @ 2021-01-12 17:37 UTC (permalink / raw)
  To: Bill Wendling
  Cc: Jonathan Corbet, Masahiro Yamada, Linux Doc Mailing List, LKML,
	Linux Kbuild mailing list, clang-built-linux, Andrew Morton,
	Nathan Chancellor, Sami Tolvanen, Alistair Delva

On Mon, Jan 11, 2021 at 9:14 PM Bill Wendling <morbo@google.com> wrote:
>
> From: Sami Tolvanen <samitolvanen@google.com>
>
> Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> profile, the kernel is instrumented with PGO counters, a representative
> workload is run, and the raw profile data is collected from
> /sys/kernel/debug/pgo/profraw.
>
> The raw profile data must be processed by clang's "llvm-profdata" tool
> before it can be used during recompilation:
>
>   $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
>   $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
>
> Multiple raw profiles may be merged during this step.
>
> The data can now be used by the compiler:
>
>   $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
>
> This initial submission is restricted to x86, as that's the platform we

Please drop all changes to arch/* that are not to arch/x86/ then; we
can cross that bridge when we get to each arch. For example, there's
no point disabling PGO for architectures LLVM doesn't even have a
backend for.

> know works. This restriction can be lifted once other platforms have
> been verified to work with PGO.
>
> Note that this method of profiling the kernel is clang-native and isn't
> compatible with clang's gcov support in kernel/gcov.

Then the Kconfig option should depend on !GCOV so that they are
mutually exclusive and can't be selected together accidentally; such
as by bots doing randconfig tests.

<large snip>

> +static inline int inst_prof_popcount(unsigned long long value)
> +{
> +       value = value - ((value >> 1) & 0x5555555555555555ULL);
> +       value = (value & 0x3333333333333333ULL) +
> +               ((value >> 2) & 0x3333333333333333ULL);
> +       value = (value + (value >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
> +
> +       return (int)((unsigned long long)(value * 0x0101010101010101ULL) >> 56);
> +}

The kernel has a portable popcnt implementation called hweight64 if
you #include <asm-generic/bitops/hweight.h>; does that work here?
https://en.wikipedia.org/wiki/Hamming_weight
-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2] pgo: add clang's Profile Guided Optimization infrastructure
  2021-01-12 17:37   ` [PATCH v2] " Nick Desaulniers
@ 2021-01-12 17:45     ` Fāng-ruì Sòng
  0 siblings, 0 replies; 116+ messages in thread
From: Fāng-ruì Sòng @ 2021-01-12 17:45 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Bill Wendling, Jonathan Corbet, Masahiro Yamada,
	Linux Doc Mailing List, LKML, Linux Kbuild mailing list,
	clang-built-linux, Andrew Morton, Nathan Chancellor,
	Sami Tolvanen, Alistair Delva

On Tue, Jan 12, 2021 at 9:37 AM 'Nick Desaulniers' via Clang Built
Linux <clang-built-linux@googlegroups.com> wrote:
>
> On Mon, Jan 11, 2021 at 9:14 PM Bill Wendling <morbo@google.com> wrote:
> >
> > From: Sami Tolvanen <samitolvanen@google.com>
> >
> > Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > profile, the kernel is instrumented with PGO counters, a representative
> > workload is run, and the raw profile data is collected from
> > /sys/kernel/debug/pgo/profraw.
> >
> > The raw profile data must be processed by clang's "llvm-profdata" tool
> > before it can be used during recompilation:
> >
> >   $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> >   $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> >
> > Multiple raw profiles may be merged during this step.
> >
> > The data can now be used by the compiler:
> >
> >   $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> >
> > This initial submission is restricted to x86, as that's the platform we
>
> Please drop all changes to arch/* that are not to arch/x86/ then; we
> can cross that bridge when we get to each arch. For example, there's
> no point disabling PGO for architectures LLVM doesn't even have a
> backend for.
>
> > know works. This restriction can be lifted once other platforms have
> > been verified to work with PGO.
> >
> > Note that this method of profiling the kernel is clang-native and isn't
> > compatible with clang's gcov support in kernel/gcov.
>
> Then the Kconfig option should depend on !GCOV so that they are
> mutually exclusive and can't be selected together accidentally; such
> as by bots doing randconfig tests.

The profile formats (Clang PGO, Clang gcov, GCC gcov/PGO) are
different but Clang PGO can be used with Clang's gcov implementation:
clang -fprofile-generate --coverage a.cc; ./a.out => default*.profraw + a.gcda

> <large snip>
>
> > +static inline int inst_prof_popcount(unsigned long long value)
> > +{
> > +       value = value - ((value >> 1) & 0x5555555555555555ULL);
> > +       value = (value & 0x3333333333333333ULL) +
> > +               ((value >> 2) & 0x3333333333333333ULL);
> > +       value = (value + (value >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
> > +
> > +       return (int)((unsigned long long)(value * 0x0101010101010101ULL) >> 56);
> > +}
>
> The kernel has a portable popcnt implementation called hweight64 if
> you #include <asm-generic/bitops/hweight.h>; does that work here?
> https://en.wikipedia.org/wiki/Hamming_weight
> --
> Thanks,
> ~Nick Desaulniers
>
> --
> You received this message because you are subscribed to the Google Groups "Clang Built Linux" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to clang-built-linux+unsubscribe@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/clang-built-linux/CAKwvOdk%2BNqhzC_4wFbQMJmLMQWoDSjQiRJyCGe5dsWkqK_NJJQ%40mail.gmail.com.

^ permalink raw reply	[flat|nested] 116+ messages in thread

* [PATCH v4] pgo: add clang's Profile Guided Optimization infrastructure
  2021-01-12  5:31   ` [PATCH v3] " Bill Wendling
       [not found]     ` <202101121755.pyYoRozB-lkp@intel.com>
@ 2021-01-13  6:19     ` Bill Wendling
  2021-01-13 20:55       ` Nathan Chancellor
                         ` (2 more replies)
  1 sibling, 3 replies; 116+ messages in thread
From: Bill Wendling @ 2021-01-13  6:19 UTC (permalink / raw)
  To: Jonathan Corbet, Masahiro Yamada, linux-doc, linux-kernel,
	linux-kbuild, clang-built-linux, Andrew Morton
  Cc: Nathan Chancellor, Nick Desaulniers, Sami Tolvanen, Bill Wendling

From: Sami Tolvanen <samitolvanen@google.com>

Enable the use of clang's Profile-Guided Optimization[1]. To generate a
profile, the kernel is instrumented with PGO counters, a representative
workload is run, and the raw profile data is collected from
/sys/kernel/debug/pgo/profraw.

The raw profile data must be processed by clang's "llvm-profdata" tool
before it can be used during recompilation:

  $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
  $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw

Multiple raw profiles may be merged during this step.

The data can now be used by the compiler:

  $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...

This initial submission is restricted to x86, as that's the platform we
know works. This restriction can be lifted once other platforms have
been verified to work with PGO.

Note that this method of profiling the kernel is clang-native, unlike
the clang support in kernel/gcov.

[1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Co-developed-by: Bill Wendling <morbo@google.com>
Signed-off-by: Bill Wendling <morbo@google.com>
Change-Id: Ic78e69c682286d3a44c4549a0138578c98138b77
---
v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
      testing.
    - Corrected documentation, re PGO flags when using LTO, based on Fangrui
      Song's comments.
v3: - Added change log section based on Sedat Dilek's comments.
v4: - Remove non-x86 Makfile changes and se "hweight64" instead of using our
      own popcount implementation, based on Nick Desaulniers's comment.
---
 Documentation/dev-tools/index.rst     |   1 +
 Documentation/dev-tools/pgo.rst       | 127 +++++++++
 MAINTAINERS                           |   9 +
 Makefile                              |   3 +
 arch/Kconfig                          |   1 +
 arch/x86/Kconfig                      |   1 +
 arch/x86/boot/Makefile                |   1 +
 arch/x86/boot/compressed/Makefile     |   1 +
 arch/x86/entry/vdso/Makefile          |   1 +
 arch/x86/kernel/vmlinux.lds.S         |   2 +
 arch/x86/platform/efi/Makefile        |   1 +
 arch/x86/purgatory/Makefile           |   1 +
 arch/x86/realmode/rm/Makefile         |   1 +
 arch/x86/um/vdso/Makefile             |   1 +
 drivers/firmware/efi/libstub/Makefile |   1 +
 include/asm-generic/vmlinux.lds.h     |  44 +++
 kernel/Makefile                       |   1 +
 kernel/pgo/Kconfig                    |  34 +++
 kernel/pgo/Makefile                   |   5 +
 kernel/pgo/fs.c                       | 382 ++++++++++++++++++++++++++
 kernel/pgo/instrument.c               | 185 +++++++++++++
 kernel/pgo/pgo.h                      | 206 ++++++++++++++
 scripts/Makefile.lib                  |  10 +
 23 files changed, 1019 insertions(+)
 create mode 100644 Documentation/dev-tools/pgo.rst
 create mode 100644 kernel/pgo/Kconfig
 create mode 100644 kernel/pgo/Makefile
 create mode 100644 kernel/pgo/fs.c
 create mode 100644 kernel/pgo/instrument.c
 create mode 100644 kernel/pgo/pgo.h

diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
index f7809c7b1ba9e..8d6418e858062 100644
--- a/Documentation/dev-tools/index.rst
+++ b/Documentation/dev-tools/index.rst
@@ -26,6 +26,7 @@ whole; patches welcome!
    kgdb
    kselftest
    kunit/index
+   pgo
 
 
 .. only::  subproject and html
diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
new file mode 100644
index 0000000000000..b7f11d8405b73
--- /dev/null
+++ b/Documentation/dev-tools/pgo.rst
@@ -0,0 +1,127 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============================
+Using PGO with the Linux kernel
+===============================
+
+Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
+when building with Clang. The profiling data is exported via the ``pgo``
+debugfs directory.
+
+.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
+
+
+Preparation
+===========
+
+Configure the kernel with:
+
+.. code-block:: make
+
+   CONFIG_DEBUG_FS=y
+   CONFIG_PGO_CLANG=y
+
+Note that kernels compiled with profiling flags will be significantly larger
+and run slower.
+
+Profiling data will only become accessible once debugfs has been mounted:
+
+.. code-block:: sh
+
+   mount -t debugfs none /sys/kernel/debug
+
+
+Customization
+=============
+
+You can enable or disable profiling for individual file and directories by
+adding a line similar to the following to the respective kernel Makefile:
+
+- For a single file (e.g. main.o)
+
+  .. code-block:: make
+
+     PGO_PROFILE_main.o := y
+
+- For all files in one directory
+
+  .. code-block:: make
+
+     PGO_PROFILE := y
+
+To exclude files from being profiled use
+
+  .. code-block:: make
+
+     PGO_PROFILE_main.o := n
+
+and
+
+  .. code-block:: make
+
+     PGO_PROFILE := n
+
+Only files which are linked to the main kernel image or are compiled as kernel
+modules are supported by this mechanism.
+
+
+Files
+=====
+
+The PGO kernel support creates the following files in debugfs:
+
+``/sys/kernel/debug/pgo``
+	Parent directory for all PGO-related files.
+
+``/sys/kernel/debug/pgo/reset``
+	Global reset file: resets all coverage data to zero when written to.
+
+``/sys/kernel/debug/profraw``
+	The raw PGO data that must be processed with ``llvm_profdata``.
+
+
+Workflow
+========
+
+The PGO kernel can be run on the host or test machines. The data though should
+be analyzed with Clang's tools from the same Clang version as the kernel was
+compiled. Clang's tolerant of version skew, but it's easier to use the same
+Clang version.
+
+The profiling data is useful for optimizing the kernel, analyzing coverage,
+etc. Clang offers tools to perform these tasks.
+
+Here is an example workflow for profiling an instrumented kernel with PGO and
+using the result to optimize the kernel:
+
+1) Install the kernel on the TEST machine.
+
+2) Reset the data counters right before running the load tests
+
+   .. code-block:: sh
+
+      $ echo 1 > /sys/kernel/debug/pgo/reset
+
+3) Run the load tests.
+
+4) Collect the raw profile data
+
+   .. code-block:: sh
+
+      $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
+
+5) (Optional) Download the raw profile data to the HOST machine.
+
+6) Process the raw profile data
+
+   .. code-block:: sh
+
+      $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
+
+   Note that multiple raw profile data files can be merged during this step.
+
+7) Rebuild the kernel using the profile data (PGO disabled)
+
+   .. code-block:: sh
+
+      $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
diff --git a/MAINTAINERS b/MAINTAINERS
index cc1e6a5ee6e67..1b979da316fa4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13954,6 +13954,15 @@ S:	Maintained
 F:	include/linux/personality.h
 F:	include/uapi/linux/personality.h
 
+PGO BASED KERNEL PROFILING
+M:	Sami Tolvanen <samitolvanen@google.com>
+M:	Bill Wendling <wcw@google.com>
+R:	Nathan Chancellor <natechancellor@gmail.com>
+R:	Nick Desaulniers <ndesaulniers@google.com>
+S:	Supported
+F:	Documentation/dev-tools/pgo.rst
+F:	kernel/pgo
+
 PHOENIX RC FLIGHT CONTROLLER ADAPTER
 M:	Marcus Folkesson <marcus.folkesson@gmail.com>
 L:	linux-input@vger.kernel.org
diff --git a/Makefile b/Makefile
index 9e73f82e0d863..9128bfe1ccc97 100644
--- a/Makefile
+++ b/Makefile
@@ -659,6 +659,9 @@ endif # KBUILD_EXTMOD
 # Defaults to vmlinux, but the arch makefile usually adds further targets
 all: vmlinux
 
+CFLAGS_PGO_CLANG := -fprofile-generate
+export CFLAGS_PGO_CLANG
+
 CFLAGS_GCOV	:= -fprofile-arcs -ftest-coverage \
 	$(call cc-option,-fno-tree-loop-im) \
 	$(call cc-disable-warning,maybe-uninitialized,)
diff --git a/arch/Kconfig b/arch/Kconfig
index 24862d15f3a36..f39d3991f6bfe 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1112,6 +1112,7 @@ config ARCH_SPLIT_ARG64
 	   pairs of 32-bit arguments, select this option.
 
 source "kernel/gcov/Kconfig"
+source "kernel/pgo/Kconfig"
 
 source "scripts/gcc-plugins/Kconfig"
 
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 21f851179ff08..36305ea61dc09 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -96,6 +96,7 @@ config X86
 	select ARCH_SUPPORTS_DEBUG_PAGEALLOC
 	select ARCH_SUPPORTS_NUMA_BALANCING	if X86_64
 	select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP	if NR_CPUS <= 4096
+	select ARCH_SUPPORTS_PGO_CLANG		if X86_64
 	select ARCH_USE_BUILTIN_BSWAP
 	select ARCH_USE_QUEUED_RWLOCKS
 	select ARCH_USE_QUEUED_SPINLOCKS
diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index fe605205b4ce2..383853e32f673 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -71,6 +71,7 @@ KBUILD_AFLAGS	:= $(KBUILD_CFLAGS) -D__ASSEMBLY__
 KBUILD_CFLAGS	+= $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
 KBUILD_CFLAGS	+= -fno-asynchronous-unwind-tables
 GCOV_PROFILE := n
+PGO_PROFILE := n
 UBSAN_SANITIZE := n
 
 $(obj)/bzImage: asflags-y  := $(SVGA_MODE)
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index e0bc3988c3faa..ed12ab65f6065 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -54,6 +54,7 @@ CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/
 
 KBUILD_AFLAGS  := $(KBUILD_CFLAGS) -D__ASSEMBLY__
 GCOV_PROFILE := n
+PGO_PROFILE := n
 UBSAN_SANITIZE :=n
 
 KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 02e3e42f380bd..26e2b3af0145c 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -179,6 +179,7 @@ quiet_cmd_vdso = VDSO    $@
 VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 \
 	$(call ld-option, --eh-frame-hdr) -Bsymbolic
 GCOV_PROFILE := n
+PGO_PROFILE := n
 
 quiet_cmd_vdso_and_check = VDSO    $@
       cmd_vdso_and_check = $(cmd_vdso); $(cmd_vdso_check)
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index efd9e9ea17f25..f6cab2316c46a 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -184,6 +184,8 @@ SECTIONS
 
 	BUG_TABLE
 
+	PGO_CLANG_DATA
+
 	ORC_UNWIND_TABLE
 
 	. = ALIGN(PAGE_SIZE);
diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile
index 84b09c230cbd5..5f22b31446ad4 100644
--- a/arch/x86/platform/efi/Makefile
+++ b/arch/x86/platform/efi/Makefile
@@ -2,6 +2,7 @@
 OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y
 KASAN_SANITIZE := n
 GCOV_PROFILE := n
+PGO_PROFILE := n
 
 obj-$(CONFIG_EFI) 		+= quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o
 obj-$(CONFIG_EFI_MIXED)		+= efi_thunk_$(BITS).o
diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
index 95ea17a9d20cb..36f20e99da0bc 100644
--- a/arch/x86/purgatory/Makefile
+++ b/arch/x86/purgatory/Makefile
@@ -23,6 +23,7 @@ targets += purgatory.ro purgatory.chk
 
 # Sanitizer, etc. runtimes are unavailable and cannot be linked here.
 GCOV_PROFILE	:= n
+PGO_PROFILE	:= n
 KASAN_SANITIZE	:= n
 UBSAN_SANITIZE	:= n
 KCSAN_SANITIZE	:= n
diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
index 83f1b6a56449f..21797192f958f 100644
--- a/arch/x86/realmode/rm/Makefile
+++ b/arch/x86/realmode/rm/Makefile
@@ -76,4 +76,5 @@ KBUILD_CFLAGS	:= $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
 KBUILD_AFLAGS	:= $(KBUILD_CFLAGS) -D__ASSEMBLY__
 KBUILD_CFLAGS	+= -fno-asynchronous-unwind-tables
 GCOV_PROFILE := n
+PGO_PROFILE := n
 UBSAN_SANITIZE := n
diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile
index 5943387e3f357..54f5768f58530 100644
--- a/arch/x86/um/vdso/Makefile
+++ b/arch/x86/um/vdso/Makefile
@@ -64,6 +64,7 @@ quiet_cmd_vdso = VDSO    $@
 
 VDSO_LDFLAGS = -fPIC -shared -Wl,--hash-style=sysv
 GCOV_PROFILE := n
+PGO_PROFILE := n
 
 #
 # Install the unstripped copy of vdso*.so listed in $(vdso-install-y).
diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index 8a94388e38b33..2d81623b33f29 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -40,6 +40,7 @@ KBUILD_CFLAGS			:= $(cflags-y) -Os -DDISABLE_BRANCH_PROFILING \
 KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))
 
 GCOV_PROFILE			:= n
+PGO_PROFILE			:= n
 # Sanitizer runtimes are unavailable and cannot be linked here.
 KASAN_SANITIZE			:= n
 KCSAN_SANITIZE			:= n
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index b2b3d81b1535a..3a591bb18c5fb 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -316,6 +316,49 @@
 #define THERMAL_TABLE(name)
 #endif
 
+#ifdef CONFIG_PGO_CLANG
+#define PGO_CLANG_DATA							\
+	__llvm_prf_data : AT(ADDR(__llvm_prf_data) - LOAD_OFFSET) {	\
+		. = ALIGN(8);						\
+		__llvm_prf_start = .;					\
+		__llvm_prf_data_start = .;				\
+		KEEP(*(__llvm_prf_data))				\
+		. = ALIGN(8);						\
+		__llvm_prf_data_end = .;				\
+	}								\
+	__llvm_prf_cnts : AT(ADDR(__llvm_prf_cnts) - LOAD_OFFSET) {	\
+		. = ALIGN(8);						\
+		__llvm_prf_cnts_start = .;				\
+		KEEP(*(__llvm_prf_cnts))				\
+		. = ALIGN(8);						\
+		__llvm_prf_cnts_end = .;				\
+	}								\
+	__llvm_prf_names : AT(ADDR(__llvm_prf_names) - LOAD_OFFSET) {	\
+		. = ALIGN(8);						\
+		__llvm_prf_names_start = .;				\
+		KEEP(*(__llvm_prf_names))				\
+		. = ALIGN(8);						\
+		__llvm_prf_names_end = .;				\
+		. = ALIGN(8);						\
+	}								\
+	__llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) {	\
+		__llvm_prf_vals_start = .;				\
+		KEEP(*(__llvm_prf_vals))				\
+		. = ALIGN(8);						\
+		__llvm_prf_vals_end = .;				\
+		. = ALIGN(8);						\
+	}								\
+	__llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) {	\
+		__llvm_prf_vnds_start = .;				\
+		KEEP(*(__llvm_prf_vnds))				\
+		. = ALIGN(8);						\
+		__llvm_prf_vnds_end = .;				\
+		__llvm_prf_end = .;					\
+	}
+#else
+#define PGO_CLANG_DATA
+#endif
+
 #define KERNEL_DTB()							\
 	STRUCT_ALIGN();							\
 	__dtb_start = .;						\
@@ -1125,6 +1168,7 @@
 		CONSTRUCTORS						\
 	}								\
 	BUG_TABLE							\
+	PGO_CLANG_DATA
 
 #define INIT_TEXT_SECTION(inittext_align)				\
 	. = ALIGN(inittext_align);					\
diff --git a/kernel/Makefile b/kernel/Makefile
index aa7368c7eabf3..0b34ca228ba46 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -111,6 +111,7 @@ obj-$(CONFIG_BPF) += bpf/
 obj-$(CONFIG_KCSAN) += kcsan/
 obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
 obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o
+obj-$(CONFIG_PGO_CLANG) += pgo/
 
 obj-$(CONFIG_PERF_EVENTS) += events/
 
diff --git a/kernel/pgo/Kconfig b/kernel/pgo/Kconfig
new file mode 100644
index 0000000000000..318d36bb3d106
--- /dev/null
+++ b/kernel/pgo/Kconfig
@@ -0,0 +1,34 @@
+# SPDX-License-Identifier: GPL-2.0-only
+menu "Profile Guided Optimization (PGO) (EXPERIMENTAL)"
+
+config ARCH_SUPPORTS_PGO_CLANG
+	bool
+
+config PGO_CLANG
+	bool "Enable clang's PGO-based kernel profiling"
+	depends on DEBUG_FS
+	depends on ARCH_SUPPORTS_PGO_CLANG
+	help
+	  This option enables clang's PGO (Profile Guided Optimization) based
+	  code profiling to better optimize the kernel.
+
+	  If unsure, say N.
+
+	  Run a representative workload for your application on a kernel
+	  compiled with this option and download the raw profile file from
+	  /sys/kernel/debug/pgo/profraw. This file needs to be processed with
+	  llvm-profdata. It may be merged with other collected raw profiles.
+
+	  Copy the resulting profile file into vmlinux.profdata, and enable
+	  KCFLAGS=-fprofile-use=vmlinux.profdata to produce an optimized
+	  kernel.
+
+	  Note that a kernel compiled with profiling flags will be
+	  significatnly larger and run slower. Also be sure to exclude files
+	  from profiling which are not linked to the kernel image to prevent
+	  linker errors.
+
+	  Note that the debugfs filesystem has to be mounted to access
+	  profiling data.
+
+endmenu
diff --git a/kernel/pgo/Makefile b/kernel/pgo/Makefile
new file mode 100644
index 0000000000000..41e27cefd9a47
--- /dev/null
+++ b/kernel/pgo/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+GCOV_PROFILE	:= n
+PGO_PROFILE	:= n
+
+obj-y	+= fs.o instrument.o
diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c
new file mode 100644
index 0000000000000..790a8df037bfc
--- /dev/null
+++ b/kernel/pgo/fs.c
@@ -0,0 +1,382 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Google, Inc.
+ *
+ * Author:
+ *	Sami Tolvanen <samitolvanen@google.com>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#define pr_fmt(fmt)	"pgo: " fmt
+
+#include <linux/kernel.h>
+#include <linux/debugfs.h>
+#include <linux/fs.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/vmalloc.h>
+#include "pgo.h"
+
+static struct dentry *directory;
+
+struct prf_private_data {
+	void *buffer;
+	unsigned long size;
+};
+
+/*
+ * Raw profile data format:
+ *
+ *	- llvm_prf_header
+ *	- __llvm_prf_data
+ *	- __llvm_prf_cnts
+ *	- __llvm_prf_names
+ *	- zero padding to 8 bytes
+ *	- for each llvm_prf_data in __llvm_prf_data:
+ *		- llvm_prf_value_data
+ *			- llvm_prf_value_record + site count array
+ *				- llvm_prf_value_node_data
+ *				...
+ *			...
+ *		...
+ */
+
+static void prf_fill_header(void **buffer)
+{
+	struct llvm_prf_header *header = *(struct llvm_prf_header **)buffer;
+
+	header->magic = LLVM_PRF_MAGIC;
+	header->version = LLVM_PRF_VARIANT_MASK_IR | LLVM_PRF_VERSION;
+	header->data_size = prf_data_count();
+	header->padding_bytes_before_counters = 0;
+	header->counters_size = prf_cnts_count();
+	header->padding_bytes_after_counters = 0;
+	header->names_size = prf_names_count();
+	header->counters_delta = (u64)__llvm_prf_cnts_start;
+	header->names_delta = (u64)__llvm_prf_names_start;
+	header->value_kind_last = LLVM_PRF_IPVK_LAST;
+
+	*buffer += sizeof(*header);
+}
+
+/*
+ * Copy the source into the buffer, incrementing the pointer into buffer in the
+ * process.
+ */
+static void prf_copy_to_buffer(void **buffer, void *src, unsigned long size)
+{
+	memcpy(*buffer, src, size);
+	*buffer += size;
+}
+
+static u32 __prf_get_value_size(struct llvm_prf_data *p, u32 *value_kinds)
+{
+	struct llvm_prf_value_node **nodes =
+		(struct llvm_prf_value_node **)p->values;
+	u32 kinds = 0;
+	u32 size = 0;
+	unsigned int kind;
+	unsigned int n;
+	unsigned int s = 0;
+
+	for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
+		unsigned int sites = p->num_value_sites[kind];
+
+		if (!sites)
+			continue;
+
+		/* Record + site count array */
+		size += prf_get_value_record_size(sites);
+		kinds++;
+
+		if (!nodes)
+			continue;
+
+		for (n = 0; n < sites; n++) {
+			u32 count = 0;
+			struct llvm_prf_value_node *site = nodes[s + n];
+
+			while (site && ++count <= U8_MAX)
+				site = site->next;
+
+			size += count *
+				sizeof(struct llvm_prf_value_node_data);
+		}
+
+		s += sites;
+	}
+
+	if (size)
+		size += sizeof(struct llvm_prf_value_data);
+
+	if (value_kinds)
+		*value_kinds = kinds;
+
+	return size;
+}
+
+static u32 prf_get_value_size(void)
+{
+	u32 size = 0;
+	struct llvm_prf_data *p;
+
+	for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
+		size += __prf_get_value_size(p, NULL);
+
+	return size;
+}
+
+/* Serialize the profiling's value. */
+static void prf_serialize_value(struct llvm_prf_data *p, void **buffer)
+{
+	struct llvm_prf_value_data header;
+	struct llvm_prf_value_node **nodes =
+		(struct llvm_prf_value_node **)p->values;
+	unsigned int kind;
+	unsigned int n;
+	unsigned int s = 0;
+
+	header.total_size = __prf_get_value_size(p, &header.num_value_kinds);
+
+	if (!header.num_value_kinds)
+		/* Nothing to write. */
+		return;
+
+	prf_copy_to_buffer(buffer, &header, sizeof(header));
+
+	for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
+		struct llvm_prf_value_record *record;
+		u8 *counts;
+		unsigned int sites = p->num_value_sites[kind];
+
+		if (!sites)
+			continue;
+
+		/* Profiling value record. */
+		record = *(struct llvm_prf_value_record **)buffer;
+		*buffer += prf_get_value_record_header_size();
+
+		record->kind = kind;
+		record->num_value_sites = sites;
+
+		/* Site count array. */
+		counts = *(u8 **)buffer;
+		*buffer += prf_get_value_record_site_count_size(sites);
+
+		/*
+		 * If we don't have nodes, we can skip updating the site count
+		 * array, because the buffer is zero filled.
+		 */
+		if (!nodes)
+			continue;
+
+		for (n = 0; n < sites; n++) {
+			u32 count = 0;
+			struct llvm_prf_value_node *site = nodes[s + n];
+
+			while (site && ++count <= U8_MAX) {
+				prf_copy_to_buffer(buffer, site,
+						   sizeof(struct llvm_prf_value_node_data));
+				site = site->next;
+			}
+
+			counts[n] = (u8)count;
+		}
+
+		s += sites;
+	}
+}
+
+static void prf_serialize_values(void **buffer)
+{
+	struct llvm_prf_data *p;
+
+	for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
+		prf_serialize_value(p, buffer);
+}
+
+static inline unsigned long prf_get_padding(unsigned long size)
+{
+	return 8 - (size % 8);
+}
+
+static unsigned long prf_buffer_size(void)
+{
+	return sizeof(struct llvm_prf_header) +
+			prf_data_size()	+
+			prf_cnts_size() +
+			prf_names_size() +
+			prf_get_padding(prf_names_size()) +
+			prf_get_value_size();
+}
+
+/* Serialize the profling data into a format LLVM's tools can understand. */
+static int prf_serialize(struct prf_private_data *p)
+{
+	int err = 0;
+	void *buffer;
+
+	p->size = prf_buffer_size();
+	p->buffer = vzalloc(p->size);
+
+	if (!p->buffer) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	buffer = p->buffer;
+
+	prf_fill_header(&buffer);
+	prf_copy_to_buffer(&buffer, __llvm_prf_data_start,  prf_data_size());
+	prf_copy_to_buffer(&buffer, __llvm_prf_cnts_start,  prf_cnts_size());
+	prf_copy_to_buffer(&buffer, __llvm_prf_names_start, prf_names_size());
+	buffer += prf_get_padding(prf_names_size());
+
+	prf_serialize_values(&buffer);
+
+out:
+	return err;
+}
+
+/* open() implementation for PGO. Creates a copy of the profiling data set. */
+static int prf_open(struct inode *inode, struct file *file)
+{
+	struct prf_private_data *data;
+	unsigned long flags;
+	int err;
+
+	data = kzalloc(sizeof(*data), GFP_KERNEL);
+	if (!data) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	flags = prf_lock();
+
+	err = prf_serialize(data);
+	if (err) {
+		kfree(data);
+		goto out_unlock;
+	}
+
+	file->private_data = data;
+
+out_unlock:
+	prf_unlock(flags);
+out:
+	return err;
+}
+
+/* read() implementation for PGO. */
+static ssize_t prf_read(struct file *file, char __user *buf, size_t count,
+			loff_t *ppos)
+{
+	struct prf_private_data *data = file->private_data;
+
+	BUG_ON(!data);
+
+	return simple_read_from_buffer(buf, count, ppos, data->buffer,
+				       data->size);
+}
+
+/* release() implementation for PGO. Release resources allocated by open(). */
+static int prf_release(struct inode *inode, struct file *file)
+{
+	struct prf_private_data *data = file->private_data;
+
+	if (data) {
+		vfree(data->buffer);
+		kfree(data);
+	}
+
+	return 0;
+}
+
+static const struct file_operations prf_fops = {
+	.owner		= THIS_MODULE,
+	.open		= prf_open,
+	.read		= prf_read,
+	.llseek		= default_llseek,
+	.release	= prf_release
+};
+
+/* write() implementation for resetting PGO's profile data. */
+static ssize_t reset_write(struct file *file, const char __user *addr,
+			   size_t len, loff_t *pos)
+{
+	struct llvm_prf_data *data;
+
+	memset(__llvm_prf_cnts_start, 0, prf_cnts_size());
+
+	for (data = __llvm_prf_data_start; data < __llvm_prf_data_end; ++data) {
+		struct llvm_prf_value_node **vnodes;
+		u64 current_vsite_count;
+		u32 i;
+
+		if (!data->values)
+			continue;
+
+		current_vsite_count = 0;
+		vnodes = (struct llvm_prf_value_node **)data->values;
+
+		for (i = LLVM_PRF_IPVK_FIRST; i <= LLVM_PRF_IPVK_LAST; ++i)
+			current_vsite_count += data->num_value_sites[i];
+
+		for (i = 0; i < current_vsite_count; ++i) {
+			struct llvm_prf_value_node *current_vnode = vnodes[i];
+
+			while (current_vnode) {
+				current_vnode->count = 0;
+				current_vnode = current_vnode->next;
+			}
+		}
+	}
+
+	return len;
+}
+
+static const struct file_operations prf_reset_fops = {
+	.owner		= THIS_MODULE,
+	.write		= reset_write,
+	.llseek		= noop_llseek,
+};
+
+/* Create debugfs entries. */
+static int __init pgo_init(void)
+{
+	directory = debugfs_create_dir("pgo", NULL);
+	if (!directory)
+		goto err_remove;
+
+	if (!debugfs_create_file("profraw", 0600, directory, NULL,
+				 &prf_fops))
+		goto err_remove;
+
+	if (!debugfs_create_file("reset", 0200, directory, NULL,
+				 &prf_reset_fops))
+		goto err_remove;
+
+	return 0;
+
+err_remove:
+	pr_err("initialization failed\n");
+	return -EIO;
+}
+
+/* Remove debufs entries. */
+static void __exit pgo_exit(void)
+{
+	debugfs_remove_recursive(directory);
+}
+
+module_init(pgo_init);
+module_exit(pgo_exit);
diff --git a/kernel/pgo/instrument.c b/kernel/pgo/instrument.c
new file mode 100644
index 0000000000000..6084ff0652e85
--- /dev/null
+++ b/kernel/pgo/instrument.c
@@ -0,0 +1,185 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Google, Inc.
+ *
+ * Author:
+ *	Sami Tolvanen <samitolvanen@google.com>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#define pr_fmt(fmt)	"pgo: " fmt
+
+#include <linux/bitops.h>
+#include <linux/kernel.h>
+#include <linux/export.h>
+#include <linux/spinlock.h>
+#include <linux/types.h>
+#include "pgo.h"
+
+/* Lock guarding value node access and serialization. */
+static DEFINE_SPINLOCK(pgo_lock);
+static int current_node;
+
+unsigned long prf_lock(void)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&pgo_lock, flags);
+
+	return flags;
+}
+
+void prf_unlock(unsigned long flags)
+{
+	spin_unlock_irqrestore(&pgo_lock, flags);
+}
+
+/*
+ * Return a newly allocated profiling value node which contains the tracked
+ * value by the value profiler.
+ * Note: caller *must* hold pgo_lock.
+ */
+static struct llvm_prf_value_node *allocate_node(struct llvm_prf_data *p,
+						 u32 index, u64 value)
+{
+	if (&__llvm_prf_vnds_start[current_node + 1] >= __llvm_prf_vnds_end)
+		return NULL; /* Out of nodes */
+
+	current_node++;
+
+	/* Make sure the node is entirely within the section */
+	if (&__llvm_prf_vnds_start[current_node] >= __llvm_prf_vnds_end ||
+	    &__llvm_prf_vnds_start[current_node + 1] > __llvm_prf_vnds_end)
+		return NULL;
+
+	return &__llvm_prf_vnds_start[current_node];
+}
+
+/*
+ * Counts the number of times a target value is seen.
+ *
+ * Records the target value for the CounterIndex if not seen before. Otherwise,
+ * increments the counter associated w/ the target value.
+ */
+void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index);
+void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index)
+{
+	struct llvm_prf_data *p = (struct llvm_prf_data *)data;
+	struct llvm_prf_value_node **counters;
+	struct llvm_prf_value_node *curr;
+	struct llvm_prf_value_node *min = NULL;
+	struct llvm_prf_value_node *prev = NULL;
+	u64 min_count = U64_MAX;
+	u8 values = 0;
+	unsigned long flags;
+
+	if (!p || !p->values)
+		return;
+
+	counters = (struct llvm_prf_value_node **)p->values;
+	curr = counters[index];
+
+	while (curr) {
+		if (target_value == curr->value) {
+			curr->count++;
+			return;
+		}
+
+		if (curr->count < min_count) {
+			min_count = curr->count;
+			min = curr;
+		}
+
+		prev = curr;
+		curr = curr->next;
+		values++;
+	}
+
+	if (values >= LLVM_PRF_MAX_NUM_VALS_PER_SITE) {
+		if (!min->count || !(--min->count)) {
+			curr = min;
+			curr->value = target_value;
+			curr->count++;
+		}
+		return;
+	}
+
+	/* Lock when updating the value node structure. */
+	flags = prf_lock();
+
+	curr = allocate_node(p, index, target_value);
+	if (!curr)
+		goto out;
+
+	curr->value = target_value;
+	curr->count++;
+
+	if (!counters[index])
+		counters[index] = curr;
+	else if (prev && !prev->next)
+		prev->next = curr;
+
+out:
+	prf_unlock(flags);
+}
+EXPORT_SYMBOL(__llvm_profile_instrument_target);
+
+/* Counts the number of times a range of targets values are seen. */
+void __llvm_profile_instrument_range(u64 target_value, void *data,
+				     u32 index, s64 precise_start,
+				     s64 precise_last, s64 large_value);
+void __llvm_profile_instrument_range(u64 target_value, void *data,
+				     u32 index, s64 precise_start,
+				     s64 precise_last, s64 large_value)
+{
+	if (large_value != S64_MIN && (s64)target_value >= large_value)
+		target_value = large_value;
+	else if ((s64)target_value < precise_start ||
+		 (s64)target_value > precise_last)
+		target_value = precise_last + 1;
+
+	__llvm_profile_instrument_target(target_value, data, index);
+}
+EXPORT_SYMBOL(__llvm_profile_instrument_range);
+
+static u64 inst_prof_get_range_rep_value(u64 value)
+{
+	if (value <= 8)
+		/* The first ranges are individually tracked, us it as is. */
+		return value;
+	else if (value >= 513)
+		/* The last range is mapped to its lowest value. */
+		return 513;
+	else if (hweight64(value) == 1)
+		/* If it's a power of two, use it as is. */
+		return value;
+
+	/* Otherwise, take to the previous power of two + 1. */
+	return (1 << (64 - __builtin_clzll(value) - 1)) + 1;
+}
+
+/*
+ * The target values are partitioned into multiple ranges. The range spec is
+ * defined in compiler-rt/include/profile/InstrProfData.inc.
+ */
+void __llvm_profile_instrument_memop(u64 target_value, void *data,
+				     u32 counter_index);
+void __llvm_profile_instrument_memop(u64 target_value, void *data,
+				     u32 counter_index)
+{
+	u64 rep_value;
+
+	/* Map the target value to the representative value of its range. */
+	rep_value = inst_prof_get_range_rep_value(target_value);
+	__llvm_profile_instrument_target(rep_value, data, counter_index);
+}
+EXPORT_SYMBOL(__llvm_profile_instrument_memop);
diff --git a/kernel/pgo/pgo.h b/kernel/pgo/pgo.h
new file mode 100644
index 0000000000000..df0aa278f28bd
--- /dev/null
+++ b/kernel/pgo/pgo.h
@@ -0,0 +1,206 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2019 Google, Inc.
+ *
+ * Author:
+ *	Sami Tolvanen <samitolvanen@google.com>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _PGO_H
+#define _PGO_H
+
+/*
+ * Note: These internal LLVM definitions must match the compiler version.
+ * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code.
+ */
+
+#ifdef CONFIG_64BIT
+	#define LLVM_PRF_MAGIC		\
+		((u64)255 << 56 |	\
+		 (u64)'l' << 48 |	\
+		 (u64)'p' << 40 |	\
+		 (u64)'r' << 32 |	\
+		 (u64)'o' << 24 |	\
+		 (u64)'f' << 16 |	\
+		 (u64)'r' << 8  |	\
+		 (u64)129)
+#else
+	#define LLVM_PRF_MAGIC		\
+		((u64)255 << 56 |	\
+		 (u64)'l' << 48 |	\
+		 (u64)'p' << 40 |	\
+		 (u64)'r' << 32 |	\
+		 (u64)'o' << 24 |	\
+		 (u64)'f' << 16 |	\
+		 (u64)'R' << 8  |	\
+		 (u64)129)
+#endif
+
+#define LLVM_PRF_VERSION		5
+#define LLVM_PRF_DATA_ALIGN		8
+#define LLVM_PRF_IPVK_FIRST		0
+#define LLVM_PRF_IPVK_LAST		1
+#define LLVM_PRF_MAX_NUM_VALS_PER_SITE	16
+
+#define LLVM_PRF_VARIANT_MASK_IR	(0x1ull << 56)
+#define LLVM_PRF_VARIANT_MASK_CSIR	(0x1ull << 57)
+
+/**
+ * struct llvm_prf_header - represents the raw profile header data structure.
+ * @magic: the magic token for the file format.
+ * @version: the version of the file format.
+ * @data_size: the number of entries in the profile data section.
+ * @padding_bytes_before_counters: the number of padding bytes before the
+ *   counters.
+ * @counters_size: the size in bytes of the LLVM profile section containing the
+ *   counters.
+ * @padding_bytes_after_counters: the number of padding bytes after the
+ *   counters.
+ * @names_size: the size in bytes of the LLVM profile section containing the
+ *   counters' names.
+ * @counters_delta: the beginning of the LLMV profile counters section.
+ * @names_delta: the beginning of the LLMV profile names section.
+ * @value_kind_last: the last profile value kind.
+ */
+struct llvm_prf_header {
+	u64 magic;
+	u64 version;
+	u64 data_size;
+	u64 padding_bytes_before_counters;
+	u64 counters_size;
+	u64 padding_bytes_after_counters;
+	u64 names_size;
+	u64 counters_delta;
+	u64 names_delta;
+	u64 value_kind_last;
+};
+
+/**
+ * struct llvm_prf_data - represents the per-function control structure.
+ * @name_ref: the reference to the function's name.
+ * @func_hash: the hash value of the function.
+ * @counter_ptr: a pointer to the profile counter.
+ * @function_ptr: a pointer to the function.
+ * @values: the profiling values associated with this function.
+ * @num_counters: the number of counters in the function.
+ * @num_value_sites: the number of value profile sites.
+ */
+struct llvm_prf_data {
+	const u64 name_ref;
+	const u64 func_hash;
+	const void *counter_ptr;
+	const void *function_ptr;
+	void *values;
+	const u32 num_counters;
+	const u16 num_value_sites[LLVM_PRF_IPVK_LAST + 1];
+} __aligned(LLVM_PRF_DATA_ALIGN);
+
+/**
+ * structure llvm_prf_value_node_data - represents the data part of the struct
+ *   llvm_prf_value_node data structure.
+ * @value: the value counters.
+ * @count: the counters' count.
+ */
+struct llvm_prf_value_node_data {
+	u64 value;
+	u64 count;
+};
+
+/**
+ * struct llvm_prf_value_node - represents an internal data structure used by
+ *   the value profiler.
+ * @value: the value counters.
+ * @count: the counters' count.
+ * @next: the next value node.
+ */
+struct llvm_prf_value_node {
+	u64 value;
+	u64 count;
+	struct llvm_prf_value_node *next;
+};
+
+/**
+ * struct llvm_prf_value_data - represents the value profiling data in indexed
+ *   format.
+ * @total_size: the total size in bytes including this field.
+ * @num_value_kinds: the number of value profile kinds that has value profile
+ *   data.
+ */
+struct llvm_prf_value_data {
+	u32 total_size;
+	u32 num_value_kinds;
+};
+
+/**
+ * struct llvm_prf_value_record - represents the on-disk layout of the value
+ *   profile data of a particular kind for one function.
+ * @kind: the kind of the value profile record.
+ * @num_value_sites: the number of value profile sites.
+ * @site_count_array: the first element of the array that stores the number
+ *   of profiled values for each value site.
+ */
+struct llvm_prf_value_record {
+	u32 kind;
+	u32 num_value_sites;
+	u8 site_count_array[];
+};
+
+#define prf_get_value_record_header_size()		\
+	offsetof(struct llvm_prf_value_record, site_count_array)
+#define prf_get_value_record_site_count_size(sites)	\
+	roundup((sites), 8)
+#define prf_get_value_record_size(sites)		\
+	(prf_get_value_record_header_size() +		\
+	 prf_get_value_record_site_count_size((sites)))
+
+/* Data sections */
+extern struct llvm_prf_data __llvm_prf_data_start[];
+extern struct llvm_prf_data __llvm_prf_data_end[];
+
+extern u64 __llvm_prf_cnts_start[];
+extern u64 __llvm_prf_cnts_end[];
+
+extern char __llvm_prf_names_start[];
+extern char __llvm_prf_names_end[];
+
+extern struct llvm_prf_value_node __llvm_prf_vnds_start[];
+extern struct llvm_prf_value_node __llvm_prf_vnds_end[];
+
+/* Locking for vnodes */
+extern unsigned long prf_lock(void);
+extern void prf_unlock(unsigned long flags);
+
+#define __DEFINE_PRF_SIZE(s) \
+	static inline unsigned long prf_ ## s ## _size(void)		\
+	{								\
+		unsigned long start =					\
+			(unsigned long)__llvm_prf_ ## s ## _start;	\
+		unsigned long end =					\
+			(unsigned long)__llvm_prf_ ## s ## _end;	\
+		return roundup(end - start,				\
+				sizeof(__llvm_prf_ ## s ## _start[0]));	\
+	}								\
+	static inline unsigned long prf_ ## s ## _count(void)		\
+	{								\
+		return prf_ ## s ## _size() /				\
+			sizeof(__llvm_prf_ ## s ## _start[0]);		\
+	}
+
+__DEFINE_PRF_SIZE(data);
+__DEFINE_PRF_SIZE(cnts);
+__DEFINE_PRF_SIZE(names);
+__DEFINE_PRF_SIZE(vnds);
+
+#undef __DEFINE_PRF_SIZE
+
+#endif /* _PGO_H */
diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index 213677a5ed33e..9b218afb5cb87 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -143,6 +143,16 @@ _c_flags += $(if $(patsubst n%,, \
 		$(CFLAGS_GCOV))
 endif
 
+#
+# Enable clang's PGO profiling flags for a file or directory depending on
+# variables PGO_PROFILE_obj.o and PGO_PROFILE.
+#
+ifeq ($(CONFIG_PGO_CLANG),y)
+_c_flags += $(if $(patsubst n%,, \
+		$(PGO_PROFILE_$(basetarget).o)$(PGO_PROFILE)y), \
+		$(CFLAGS_PGO_CLANG))
+endif
+
 #
 # Enable address sanitizer flags for kernel except some files or directories
 # we don't want to check (depends on variables KASAN_SANITIZE_obj.o, KASAN_SANITIZE)
-- 
2.30.0.284.gd98b1dd5eaa7-goog


^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v4] pgo: add clang's Profile Guided Optimization infrastructure
  2021-01-13  6:19     ` [PATCH v4] " Bill Wendling
@ 2021-01-13 20:55       ` Nathan Chancellor
  2021-01-13 21:59         ` Bill Wendling
  2021-01-14  4:07         ` Nick Desaulniers
  2021-01-13 23:01       ` Nick Desaulniers
  2021-01-16  9:43       ` [PATCH v5] " Bill Wendling
  2 siblings, 2 replies; 116+ messages in thread
From: Nathan Chancellor @ 2021-01-13 20:55 UTC (permalink / raw)
  To: Bill Wendling
  Cc: Jonathan Corbet, Masahiro Yamada, linux-doc, linux-kernel,
	linux-kbuild, clang-built-linux, Andrew Morton, Nick Desaulniers,
	Sami Tolvanen

Hi Bill,

On Tue, Jan 12, 2021 at 10:19:58PM -0800, Bill Wendling wrote:
> From: Sami Tolvanen <samitolvanen@google.com>
> 
> Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> profile, the kernel is instrumented with PGO counters, a representative
> workload is run, and the raw profile data is collected from
> /sys/kernel/debug/pgo/profraw.
> 
> The raw profile data must be processed by clang's "llvm-profdata" tool
> before it can be used during recompilation:
> 
>   $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
>   $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> 
> Multiple raw profiles may be merged during this step.
> 
> The data can now be used by the compiler:
> 
>   $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> 
> This initial submission is restricted to x86, as that's the platform we
> know works. This restriction can be lifted once other platforms have
> been verified to work with PGO.
> 
> Note that this method of profiling the kernel is clang-native, unlike
> the clang support in kernel/gcov.
> 
> [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> 
> Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
> Co-developed-by: Bill Wendling <morbo@google.com>
> Signed-off-by: Bill Wendling <morbo@google.com>
> Change-Id: Ic78e69c682286d3a44c4549a0138578c98138b77

Small nit: This should be removed.

I applied this patch on top of v5.11-rc3, built it with LLVM 12
(f1d5cbbdee5526bc86eac0a5652b115d9bc158e5 + D94470) with Microsoft's
WSL 5.4 config [1] + CONFIG_PGO_CLANG=y, and ran it on WSL2.

$ zgrep PGO /proc/config.gz
# Profile Guided Optimization (PGO) (EXPERIMENTAL)
CONFIG_ARCH_SUPPORTS_PGO_CLANG=y
CONFIG_PGO_CLANG=y
# end of Profile Guided Optimization (PGO) (EXPERIMENTAL)

However, I see an issue with actually using the data:

$ sudo -s
# mount -t debugfs none /sys/kernel/debug
# cp -a /sys/kernel/debug/pgo/profraw vmlinux.profraw
# chown nathan:nathan vmlinux.profraw
# exit
$ tc-build/build/llvm/stage1/bin/llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
warning: vmlinux.profraw: Invalid instrumentation profile data (bad magic)
error: No profiles could be merged.

Am I holding it wrong? :) Note, this is virtualized, I do not have any
"real" x86 hardware that I can afford to test on right now.

[1]: https://github.com/microsoft/WSL2-Linux-Kernel/raw/linux-msft-wsl-5.4.y/Microsoft/config-wsl

Cheers,
Nathan

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v4] pgo: add clang's Profile Guided Optimization infrastructure
  2021-01-13 20:55       ` Nathan Chancellor
@ 2021-01-13 21:59         ` Bill Wendling
  2021-01-14  4:07         ` Nick Desaulniers
  1 sibling, 0 replies; 116+ messages in thread
From: Bill Wendling @ 2021-01-13 21:59 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Jonathan Corbet, Masahiro Yamada, Linux Doc Mailing List, LKML,
	Linux Kbuild mailing list, clang-built-linux, Andrew Morton,
	Nick Desaulniers, Sami Tolvanen

On Wed, Jan 13, 2021 at 12:55 PM Nathan Chancellor
<natechancellor@gmail.com> wrote:
>
> Hi Bill,
>
> On Tue, Jan 12, 2021 at 10:19:58PM -0800, Bill Wendling wrote:
> > From: Sami Tolvanen <samitolvanen@google.com>
> >
> > Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > profile, the kernel is instrumented with PGO counters, a representative
> > workload is run, and the raw profile data is collected from
> > /sys/kernel/debug/pgo/profraw.
> >
> > The raw profile data must be processed by clang's "llvm-profdata" tool
> > before it can be used during recompilation:
> >
> >   $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> >   $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> >
> > Multiple raw profiles may be merged during this step.
> >
> > The data can now be used by the compiler:
> >
> >   $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> >
> > This initial submission is restricted to x86, as that's the platform we
> > know works. This restriction can be lifted once other platforms have
> > been verified to work with PGO.
> >
> > Note that this method of profiling the kernel is clang-native, unlike
> > the clang support in kernel/gcov.
> >
> > [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> >
> > Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
> > Co-developed-by: Bill Wendling <morbo@google.com>
> > Signed-off-by: Bill Wendling <morbo@google.com>
> > Change-Id: Ic78e69c682286d3a44c4549a0138578c98138b77
>
> Small nit: This should be removed.
>
Grrr....The git hook keeps adding it in there. :-(

> I applied this patch on top of v5.11-rc3, built it with LLVM 12
> (f1d5cbbdee5526bc86eac0a5652b115d9bc158e5 + D94470) with Microsoft's
> WSL 5.4 config [1] + CONFIG_PGO_CLANG=y, and ran it on WSL2.
>
> $ zgrep PGO /proc/config.gz
> # Profile Guided Optimization (PGO) (EXPERIMENTAL)
> CONFIG_ARCH_SUPPORTS_PGO_CLANG=y
> CONFIG_PGO_CLANG=y
> # end of Profile Guided Optimization (PGO) (EXPERIMENTAL)
>
> However, I see an issue with actually using the data:
>
> $ sudo -s
> # mount -t debugfs none /sys/kernel/debug
> # cp -a /sys/kernel/debug/pgo/profraw vmlinux.profraw
> # chown nathan:nathan vmlinux.profraw
> # exit
> $ tc-build/build/llvm/stage1/bin/llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> warning: vmlinux.profraw: Invalid instrumentation profile data (bad magic)
> error: No profiles could be merged.
>
> Am I holding it wrong? :) Note, this is virtualized, I do not have any
> "real" x86 hardware that I can afford to test on right now.
>
> [1]: https://github.com/microsoft/WSL2-Linux-Kernel/raw/linux-msft-wsl-5.4.y/Microsoft/config-wsl
>
Could you send me the vmlinux.profraw file? (Don't CC this list.)

-bw

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v4] pgo: add clang's Profile Guided Optimization infrastructure
  2021-01-13  6:19     ` [PATCH v4] " Bill Wendling
  2021-01-13 20:55       ` Nathan Chancellor
@ 2021-01-13 23:01       ` Nick Desaulniers
  2021-01-16  9:43       ` [PATCH v5] " Bill Wendling
  2 siblings, 0 replies; 116+ messages in thread
From: Nick Desaulniers @ 2021-01-13 23:01 UTC (permalink / raw)
  To: Bill Wendling
  Cc: Jonathan Corbet, Masahiro Yamada, Linux Doc Mailing List, LKML,
	Linux Kbuild mailing list, clang-built-linux, Andrew Morton,
	Nathan Chancellor, Sami Tolvanen, Fangrui Song

On Tue, Jan 12, 2021 at 10:20 PM Bill Wendling <morbo@google.com> wrote:
>
> From: Sami Tolvanen <samitolvanen@google.com>
>
> Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> profile, the kernel is instrumented with PGO counters, a representative
> workload is run, and the raw profile data is collected from
> /sys/kernel/debug/pgo/profraw.
>
> The raw profile data must be processed by clang's "llvm-profdata" tool
> before it can be used during recompilation:
>
>   $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
>   $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
>
> Multiple raw profiles may be merged during this step.
>
> The data can now be used by the compiler:
>
>   $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
>
> This initial submission is restricted to x86, as that's the platform we
> know works. This restriction can be lifted once other platforms have
> been verified to work with PGO.
>
> Note that this method of profiling the kernel is clang-native, unlike
> the clang support in kernel/gcov.
>
> [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
>
> Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
> Co-developed-by: Bill Wendling <morbo@google.com>
> Signed-off-by: Bill Wendling <morbo@google.com>
> Change-Id: Ic78e69c682286d3a44c4549a0138578c98138b77
> ---
> v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
>       testing.
>     - Corrected documentation, re PGO flags when using LTO, based on Fangrui
>       Song's comments.
> v3: - Added change log section based on Sedat Dilek's comments.
> v4: - Remove non-x86 Makfile changes and se "hweight64" instead of using our
>       own popcount implementation, based on Nick Desaulniers's comment.
> ---
>  Documentation/dev-tools/index.rst     |   1 +
>  Documentation/dev-tools/pgo.rst       | 127 +++++++++
>  MAINTAINERS                           |   9 +
>  Makefile                              |   3 +
>  arch/Kconfig                          |   1 +
>  arch/x86/Kconfig                      |   1 +
>  arch/x86/boot/Makefile                |   1 +
>  arch/x86/boot/compressed/Makefile     |   1 +
>  arch/x86/entry/vdso/Makefile          |   1 +
>  arch/x86/kernel/vmlinux.lds.S         |   2 +
>  arch/x86/platform/efi/Makefile        |   1 +
>  arch/x86/purgatory/Makefile           |   1 +
>  arch/x86/realmode/rm/Makefile         |   1 +
>  arch/x86/um/vdso/Makefile             |   1 +
>  drivers/firmware/efi/libstub/Makefile |   1 +
>  include/asm-generic/vmlinux.lds.h     |  44 +++
>  kernel/Makefile                       |   1 +
>  kernel/pgo/Kconfig                    |  34 +++
>  kernel/pgo/Makefile                   |   5 +
>  kernel/pgo/fs.c                       | 382 ++++++++++++++++++++++++++
>  kernel/pgo/instrument.c               | 185 +++++++++++++
>  kernel/pgo/pgo.h                      | 206 ++++++++++++++
>  scripts/Makefile.lib                  |  10 +
>  23 files changed, 1019 insertions(+)
>  create mode 100644 Documentation/dev-tools/pgo.rst
>  create mode 100644 kernel/pgo/Kconfig
>  create mode 100644 kernel/pgo/Makefile
>  create mode 100644 kernel/pgo/fs.c
>  create mode 100644 kernel/pgo/instrument.c
>  create mode 100644 kernel/pgo/pgo.h
>
> diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> index f7809c7b1ba9e..8d6418e858062 100644
> --- a/Documentation/dev-tools/index.rst
> +++ b/Documentation/dev-tools/index.rst
> @@ -26,6 +26,7 @@ whole; patches welcome!
>     kgdb
>     kselftest
>     kunit/index
> +   pgo
>
>
>  .. only::  subproject and html
> diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
> new file mode 100644
> index 0000000000000..b7f11d8405b73
> --- /dev/null
> +++ b/Documentation/dev-tools/pgo.rst
> @@ -0,0 +1,127 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +===============================
> +Using PGO with the Linux kernel
> +===============================
> +
> +Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
> +when building with Clang. The profiling data is exported via the ``pgo``
> +debugfs directory.
> +
> +.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> +
> +
> +Preparation
> +===========
> +
> +Configure the kernel with:
> +
> +.. code-block:: make
> +
> +   CONFIG_DEBUG_FS=y
> +   CONFIG_PGO_CLANG=y
> +
> +Note that kernels compiled with profiling flags will be significantly larger
> +and run slower.
> +
> +Profiling data will only become accessible once debugfs has been mounted:
> +
> +.. code-block:: sh
> +
> +   mount -t debugfs none /sys/kernel/debug
> +
> +
> +Customization
> +=============
> +
> +You can enable or disable profiling for individual file and directories by
> +adding a line similar to the following to the respective kernel Makefile:
> +
> +- For a single file (e.g. main.o)
> +
> +  .. code-block:: make
> +
> +     PGO_PROFILE_main.o := y
> +
> +- For all files in one directory
> +
> +  .. code-block:: make
> +
> +     PGO_PROFILE := y
> +
> +To exclude files from being profiled use
> +
> +  .. code-block:: make
> +
> +     PGO_PROFILE_main.o := n
> +
> +and
> +
> +  .. code-block:: make
> +
> +     PGO_PROFILE := n
> +
> +Only files which are linked to the main kernel image or are compiled as kernel
> +modules are supported by this mechanism.
> +
> +
> +Files
> +=====
> +
> +The PGO kernel support creates the following files in debugfs:
> +
> +``/sys/kernel/debug/pgo``
> +       Parent directory for all PGO-related files.
> +
> +``/sys/kernel/debug/pgo/reset``
> +       Global reset file: resets all coverage data to zero when written to.
> +
> +``/sys/kernel/debug/profraw``
> +       The raw PGO data that must be processed with ``llvm_profdata``.
> +
> +
> +Workflow
> +========
> +
> +The PGO kernel can be run on the host or test machines. The data though should
> +be analyzed with Clang's tools from the same Clang version as the kernel was
> +compiled. Clang's tolerant of version skew, but it's easier to use the same
> +Clang version.
> +
> +The profiling data is useful for optimizing the kernel, analyzing coverage,
> +etc. Clang offers tools to perform these tasks.
> +
> +Here is an example workflow for profiling an instrumented kernel with PGO and
> +using the result to optimize the kernel:
> +
> +1) Install the kernel on the TEST machine.
> +
> +2) Reset the data counters right before running the load tests
> +
> +   .. code-block:: sh
> +
> +      $ echo 1 > /sys/kernel/debug/pgo/reset

Maybe I'm a noob, but I had to:
$ mkdir -p /sys/kernel/debug
$ mount -t debugfs none /sys/kernel/debug

first. That might trip up future travelers (like myself, I'm prone to
forget these unless they're in my shell history).

> +
> +3) Run the load tests.
> +
> +4) Collect the raw profile data
> +
> +   .. code-block:: sh
> +
> +      $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> +
> +5) (Optional) Download the raw profile data to the HOST machine.
> +
> +6) Process the raw profile data
> +
> +   .. code-block:: sh
> +
> +      $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> +
> +   Note that multiple raw profile data files can be merged during this step.
> +
> +7) Rebuild the kernel using the profile data (PGO disabled)
> +
> +   .. code-block:: sh
> +
> +      $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> diff --git a/MAINTAINERS b/MAINTAINERS
> index cc1e6a5ee6e67..1b979da316fa4 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -13954,6 +13954,15 @@ S:     Maintained
>  F:     include/linux/personality.h
>  F:     include/uapi/linux/personality.h
>
> +PGO BASED KERNEL PROFILING
> +M:     Sami Tolvanen <samitolvanen@google.com>
> +M:     Bill Wendling <wcw@google.com>
> +R:     Nathan Chancellor <natechancellor@gmail.com>
> +R:     Nick Desaulniers <ndesaulniers@google.com>
> +S:     Supported
> +F:     Documentation/dev-tools/pgo.rst
> +F:     kernel/pgo
> +
>  PHOENIX RC FLIGHT CONTROLLER ADAPTER
>  M:     Marcus Folkesson <marcus.folkesson@gmail.com>
>  L:     linux-input@vger.kernel.org
> diff --git a/Makefile b/Makefile
> index 9e73f82e0d863..9128bfe1ccc97 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -659,6 +659,9 @@ endif # KBUILD_EXTMOD
>  # Defaults to vmlinux, but the arch makefile usually adds further targets
>  all: vmlinux
>
> +CFLAGS_PGO_CLANG := -fprofile-generate
> +export CFLAGS_PGO_CLANG
> +
>  CFLAGS_GCOV    := -fprofile-arcs -ftest-coverage \
>         $(call cc-option,-fno-tree-loop-im) \
>         $(call cc-disable-warning,maybe-uninitialized,)
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 24862d15f3a36..f39d3991f6bfe 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -1112,6 +1112,7 @@ config ARCH_SPLIT_ARG64
>            pairs of 32-bit arguments, select this option.
>
>  source "kernel/gcov/Kconfig"
> +source "kernel/pgo/Kconfig"
>
>  source "scripts/gcc-plugins/Kconfig"
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 21f851179ff08..36305ea61dc09 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -96,6 +96,7 @@ config X86
>         select ARCH_SUPPORTS_DEBUG_PAGEALLOC
>         select ARCH_SUPPORTS_NUMA_BALANCING     if X86_64
>         select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP       if NR_CPUS <= 4096
> +       select ARCH_SUPPORTS_PGO_CLANG          if X86_64
>         select ARCH_USE_BUILTIN_BSWAP
>         select ARCH_USE_QUEUED_RWLOCKS
>         select ARCH_USE_QUEUED_SPINLOCKS
> diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
> index fe605205b4ce2..383853e32f673 100644
> --- a/arch/x86/boot/Makefile
> +++ b/arch/x86/boot/Makefile
> @@ -71,6 +71,7 @@ KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
>  KBUILD_CFLAGS  += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
>  KBUILD_CFLAGS  += -fno-asynchronous-unwind-tables
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>  UBSAN_SANITIZE := n
>
>  $(obj)/bzImage: asflags-y  := $(SVGA_MODE)
> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> index e0bc3988c3faa..ed12ab65f6065 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -54,6 +54,7 @@ CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/
>
>  KBUILD_AFLAGS  := $(KBUILD_CFLAGS) -D__ASSEMBLY__
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>  UBSAN_SANITIZE :=n
>
>  KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
> diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
> index 02e3e42f380bd..26e2b3af0145c 100644
> --- a/arch/x86/entry/vdso/Makefile
> +++ b/arch/x86/entry/vdso/Makefile
> @@ -179,6 +179,7 @@ quiet_cmd_vdso = VDSO    $@
>  VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 \
>         $(call ld-option, --eh-frame-hdr) -Bsymbolic
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>
>  quiet_cmd_vdso_and_check = VDSO    $@
>        cmd_vdso_and_check = $(cmd_vdso); $(cmd_vdso_check)
> diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> index efd9e9ea17f25..f6cab2316c46a 100644
> --- a/arch/x86/kernel/vmlinux.lds.S
> +++ b/arch/x86/kernel/vmlinux.lds.S
> @@ -184,6 +184,8 @@ SECTIONS
>
>         BUG_TABLE
>
> +       PGO_CLANG_DATA
> +
>         ORC_UNWIND_TABLE
>
>         . = ALIGN(PAGE_SIZE);
> diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile
> index 84b09c230cbd5..5f22b31446ad4 100644
> --- a/arch/x86/platform/efi/Makefile
> +++ b/arch/x86/platform/efi/Makefile
> @@ -2,6 +2,7 @@
>  OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y
>  KASAN_SANITIZE := n
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>
>  obj-$(CONFIG_EFI)              += quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o
>  obj-$(CONFIG_EFI_MIXED)                += efi_thunk_$(BITS).o
> diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
> index 95ea17a9d20cb..36f20e99da0bc 100644
> --- a/arch/x86/purgatory/Makefile
> +++ b/arch/x86/purgatory/Makefile
> @@ -23,6 +23,7 @@ targets += purgatory.ro purgatory.chk
>
>  # Sanitizer, etc. runtimes are unavailable and cannot be linked here.
>  GCOV_PROFILE   := n
> +PGO_PROFILE    := n
>  KASAN_SANITIZE := n
>  UBSAN_SANITIZE := n
>  KCSAN_SANITIZE := n
> diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
> index 83f1b6a56449f..21797192f958f 100644
> --- a/arch/x86/realmode/rm/Makefile
> +++ b/arch/x86/realmode/rm/Makefile
> @@ -76,4 +76,5 @@ KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
>  KBUILD_AFLAGS  := $(KBUILD_CFLAGS) -D__ASSEMBLY__
>  KBUILD_CFLAGS  += -fno-asynchronous-unwind-tables
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>  UBSAN_SANITIZE := n
> diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile
> index 5943387e3f357..54f5768f58530 100644
> --- a/arch/x86/um/vdso/Makefile
> +++ b/arch/x86/um/vdso/Makefile
> @@ -64,6 +64,7 @@ quiet_cmd_vdso = VDSO    $@
>
>  VDSO_LDFLAGS = -fPIC -shared -Wl,--hash-style=sysv
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>
>  #
>  # Install the unstripped copy of vdso*.so listed in $(vdso-install-y).
> diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> index 8a94388e38b33..2d81623b33f29 100644
> --- a/drivers/firmware/efi/libstub/Makefile
> +++ b/drivers/firmware/efi/libstub/Makefile
> @@ -40,6 +40,7 @@ KBUILD_CFLAGS                 := $(cflags-y) -Os -DDISABLE_BRANCH_PROFILING \
>  KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))
>
>  GCOV_PROFILE                   := n
> +PGO_PROFILE                    := n
>  # Sanitizer runtimes are unavailable and cannot be linked here.
>  KASAN_SANITIZE                 := n
>  KCSAN_SANITIZE                 := n
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index b2b3d81b1535a..3a591bb18c5fb 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -316,6 +316,49 @@
>  #define THERMAL_TABLE(name)
>  #endif
>
> +#ifdef CONFIG_PGO_CLANG
> +#define PGO_CLANG_DATA                                                 \
> +       __llvm_prf_data : AT(ADDR(__llvm_prf_data) - LOAD_OFFSET) {     \
> +               . = ALIGN(8);                                           \
> +               __llvm_prf_start = .;                                   \
> +               __llvm_prf_data_start = .;                              \
> +               KEEP(*(__llvm_prf_data))                                \
> +               . = ALIGN(8);                                           \
> +               __llvm_prf_data_end = .;                                \
> +       }                                                               \

So if I do a build of
$ make CC=clang
I observe the following error:
`.discard.text' referenced in section `__llvm_prf_data' of
arch/x86/kernel/setup.o: defined in discarded section `.discard.text'
of arch/x86/kernel/setup.o
`.discard.text' referenced in section `__llvm_prf_data' of
arch/x86/mm/init.o: defined in discarded section `.discard.text' of
arch/x86/mm/init.o

This can be investigated more via:
$ llvm-objdump -Dr -j __llvm_prf_data arch/x86/mm/init.o  | grep discard
                0000000000000168:  R_X86_64_64  .discard.text

So looks like a relocation is referencing something in .discard.text.

$ llvm-objdump -Dr -j .discard.text arch/x86/mm/init.o
...
0000000000000000 <__brk_reservation_fn_early_pgt_alloc__>:
       0: 48 83 05 00 00 00 00 01       addq    $1, (%rip)  # 8
<__brk_reservation_fn_early_pgt_alloc__+0x8>
                0000000000000003:  R_X86_64_PC32        __llvm_prf_cnts+0xf3
       8: c3                            retq

Looks like arch/x86/include/asm/setup.h defines the macro RESERVE_BRK
which defines a static function in the .discard.text section.  Is
there a function attribute that we can use to say "please don't
profile this one function?"

For arch/x86/kernel/setup.o it looks like the same issue with
__brk_reservation_fn_dmi_alloc__.

More specifically, this warning goes away when using LLD:
$ make CC=clang LD=ld.lld

Not sure yet why these warnings are only observed when using BFD?
Maybe LLD should also be producing this diagnostic, but is not?

Anyways, I was able to build+boot profiling mode binaries built with:
$ make LLVM=1 defconfig+PGO
$ make LLVM=1 LLVM_IAS=1 defconfig+PGO

It would be good to resolve/investigate the above error with BFD and
fix it, or make this config also depend on LLD for now. ie.
$ make CC=clang defconfig+PGO

I'm going to try rebuilding+booting with the profile data now, and
will report back.

> +       __llvm_prf_cnts : AT(ADDR(__llvm_prf_cnts) - LOAD_OFFSET) {     \
> +               . = ALIGN(8);                                           \
> +               __llvm_prf_cnts_start = .;                              \
> +               KEEP(*(__llvm_prf_cnts))                                \
> +               . = ALIGN(8);                                           \
> +               __llvm_prf_cnts_end = .;                                \
> +       }                                                               \
> +       __llvm_prf_names : AT(ADDR(__llvm_prf_names) - LOAD_OFFSET) {   \
> +               . = ALIGN(8);                                           \
> +               __llvm_prf_names_start = .;                             \
> +               KEEP(*(__llvm_prf_names))                               \
> +               . = ALIGN(8);                                           \
> +               __llvm_prf_names_end = .;                               \
> +               . = ALIGN(8);                                           \
> +       }                                                               \
> +       __llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) {     \
> +               __llvm_prf_vals_start = .;                              \
> +               KEEP(*(__llvm_prf_vals))                                \
> +               . = ALIGN(8);                                           \
> +               __llvm_prf_vals_end = .;                                \
> +               . = ALIGN(8);                                           \
> +       }                                                               \
> +       __llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) {     \
> +               __llvm_prf_vnds_start = .;                              \
> +               KEEP(*(__llvm_prf_vnds))                                \
> +               . = ALIGN(8);                                           \
> +               __llvm_prf_vnds_end = .;                                \
> +               __llvm_prf_end = .;                                     \
> +       }
> +#else
> +#define PGO_CLANG_DATA
> +#endif
> +
>  #define KERNEL_DTB()                                                   \
>         STRUCT_ALIGN();                                                 \
>         __dtb_start = .;                                                \
> @@ -1125,6 +1168,7 @@
>                 CONSTRUCTORS                                            \
>         }                                                               \
>         BUG_TABLE                                                       \
> +       PGO_CLANG_DATA
>
>  #define INIT_TEXT_SECTION(inittext_align)                              \
>         . = ALIGN(inittext_align);                                      \
> diff --git a/kernel/Makefile b/kernel/Makefile
> index aa7368c7eabf3..0b34ca228ba46 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -111,6 +111,7 @@ obj-$(CONFIG_BPF) += bpf/
>  obj-$(CONFIG_KCSAN) += kcsan/
>  obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
>  obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o
> +obj-$(CONFIG_PGO_CLANG) += pgo/
>
>  obj-$(CONFIG_PERF_EVENTS) += events/
>
> diff --git a/kernel/pgo/Kconfig b/kernel/pgo/Kconfig
> new file mode 100644
> index 0000000000000..318d36bb3d106
> --- /dev/null
> +++ b/kernel/pgo/Kconfig
> @@ -0,0 +1,34 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +menu "Profile Guided Optimization (PGO) (EXPERIMENTAL)"
> +
> +config ARCH_SUPPORTS_PGO_CLANG
> +       bool
> +
> +config PGO_CLANG
> +       bool "Enable clang's PGO-based kernel profiling"
> +       depends on DEBUG_FS
> +       depends on ARCH_SUPPORTS_PGO_CLANG

probably should additionally:
  depends on CLANG_VERSION >= 120000

I'm observing the same failed assertion as Nathan trying to build
x86_64 defconfig (drivers/gpu/drm/i915/i915_query.c), that should be
fixed by:
https://reviews.llvm.org/D94470

I think that would also help prevent this config from being selectable
if not using CC=clang?

> +       help
> +         This option enables clang's PGO (Profile Guided Optimization) based
> +         code profiling to better optimize the kernel.
> +
> +         If unsure, say N.
> +
> +         Run a representative workload for your application on a kernel
> +         compiled with this option and download the raw profile file from
> +         /sys/kernel/debug/pgo/profraw. This file needs to be processed with
> +         llvm-profdata. It may be merged with other collected raw profiles.
> +
> +         Copy the resulting profile file into vmlinux.profdata, and enable
> +         KCFLAGS=-fprofile-use=vmlinux.profdata to produce an optimized
> +         kernel.
> +
> +         Note that a kernel compiled with profiling flags will be
> +         significatnly larger and run slower. Also be sure to exclude files

^ typo: significantly

> +         from profiling which are not linked to the kernel image to prevent
> +         linker errors.
> +
> +         Note that the debugfs filesystem has to be mounted to access
> +         profiling data.
> +
> +endmenu
> diff --git a/kernel/pgo/Makefile b/kernel/pgo/Makefile
> new file mode 100644
> index 0000000000000..41e27cefd9a47
> --- /dev/null
> +++ b/kernel/pgo/Makefile
> @@ -0,0 +1,5 @@
> +# SPDX-License-Identifier: GPL-2.0
> +GCOV_PROFILE   := n
> +PGO_PROFILE    := n
> +
> +obj-y  += fs.o instrument.o
> diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c
> new file mode 100644
> index 0000000000000..790a8df037bfc
> --- /dev/null
> +++ b/kernel/pgo/fs.c
> @@ -0,0 +1,382 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2019 Google, Inc.
> + *
> + * Author:
> + *     Sami Tolvanen <samitolvanen@google.com>
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#define pr_fmt(fmt)    "pgo: " fmt
> +
> +#include <linux/kernel.h>
> +#include <linux/debugfs.h>
> +#include <linux/fs.h>
> +#include <linux/module.h>
> +#include <linux/slab.h>
> +#include <linux/vmalloc.h>
> +#include "pgo.h"
> +
> +static struct dentry *directory;
> +
> +struct prf_private_data {
> +       void *buffer;
> +       unsigned long size;
> +};
> +
> +/*
> + * Raw profile data format:
> + *
> + *     - llvm_prf_header
> + *     - __llvm_prf_data
> + *     - __llvm_prf_cnts
> + *     - __llvm_prf_names
> + *     - zero padding to 8 bytes
> + *     - for each llvm_prf_data in __llvm_prf_data:
> + *             - llvm_prf_value_data
> + *                     - llvm_prf_value_record + site count array
> + *                             - llvm_prf_value_node_data
> + *                             ...
> + *                     ...
> + *             ...
> + */
> +
> +static void prf_fill_header(void **buffer)
> +{
> +       struct llvm_prf_header *header = *(struct llvm_prf_header **)buffer;
> +
> +       header->magic = LLVM_PRF_MAGIC;
> +       header->version = LLVM_PRF_VARIANT_MASK_IR | LLVM_PRF_VERSION;
> +       header->data_size = prf_data_count();
> +       header->padding_bytes_before_counters = 0;
> +       header->counters_size = prf_cnts_count();
> +       header->padding_bytes_after_counters = 0;
> +       header->names_size = prf_names_count();
> +       header->counters_delta = (u64)__llvm_prf_cnts_start;
> +       header->names_delta = (u64)__llvm_prf_names_start;
> +       header->value_kind_last = LLVM_PRF_IPVK_LAST;
> +
> +       *buffer += sizeof(*header);
> +}
> +
> +/*
> + * Copy the source into the buffer, incrementing the pointer into buffer in the
> + * process.
> + */
> +static void prf_copy_to_buffer(void **buffer, void *src, unsigned long size)
> +{
> +       memcpy(*buffer, src, size);
> +       *buffer += size;
> +}
> +
> +static u32 __prf_get_value_size(struct llvm_prf_data *p, u32 *value_kinds)
> +{
> +       struct llvm_prf_value_node **nodes =
> +               (struct llvm_prf_value_node **)p->values;
> +       u32 kinds = 0;
> +       u32 size = 0;
> +       unsigned int kind;
> +       unsigned int n;
> +       unsigned int s = 0;
> +
> +       for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> +               unsigned int sites = p->num_value_sites[kind];
> +
> +               if (!sites)
> +                       continue;
> +
> +               /* Record + site count array */
> +               size += prf_get_value_record_size(sites);
> +               kinds++;
> +
> +               if (!nodes)
> +                       continue;
> +
> +               for (n = 0; n < sites; n++) {
> +                       u32 count = 0;
> +                       struct llvm_prf_value_node *site = nodes[s + n];
> +
> +                       while (site && ++count <= U8_MAX)
> +                               site = site->next;
> +
> +                       size += count *
> +                               sizeof(struct llvm_prf_value_node_data);
> +               }
> +
> +               s += sites;
> +       }
> +
> +       if (size)
> +               size += sizeof(struct llvm_prf_value_data);
> +
> +       if (value_kinds)
> +               *value_kinds = kinds;
> +
> +       return size;
> +}
> +
> +static u32 prf_get_value_size(void)
> +{
> +       u32 size = 0;
> +       struct llvm_prf_data *p;
> +
> +       for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> +               size += __prf_get_value_size(p, NULL);
> +
> +       return size;
> +}
> +
> +/* Serialize the profiling's value. */
> +static void prf_serialize_value(struct llvm_prf_data *p, void **buffer)
> +{
> +       struct llvm_prf_value_data header;
> +       struct llvm_prf_value_node **nodes =
> +               (struct llvm_prf_value_node **)p->values;
> +       unsigned int kind;
> +       unsigned int n;
> +       unsigned int s = 0;
> +
> +       header.total_size = __prf_get_value_size(p, &header.num_value_kinds);
> +
> +       if (!header.num_value_kinds)
> +               /* Nothing to write. */
> +               return;
> +
> +       prf_copy_to_buffer(buffer, &header, sizeof(header));
> +
> +       for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> +               struct llvm_prf_value_record *record;
> +               u8 *counts;
> +               unsigned int sites = p->num_value_sites[kind];
> +
> +               if (!sites)
> +                       continue;
> +
> +               /* Profiling value record. */
> +               record = *(struct llvm_prf_value_record **)buffer;
> +               *buffer += prf_get_value_record_header_size();
> +
> +               record->kind = kind;
> +               record->num_value_sites = sites;
> +
> +               /* Site count array. */
> +               counts = *(u8 **)buffer;
> +               *buffer += prf_get_value_record_site_count_size(sites);
> +
> +               /*
> +                * If we don't have nodes, we can skip updating the site count
> +                * array, because the buffer is zero filled.
> +                */
> +               if (!nodes)
> +                       continue;
> +
> +               for (n = 0; n < sites; n++) {
> +                       u32 count = 0;
> +                       struct llvm_prf_value_node *site = nodes[s + n];
> +
> +                       while (site && ++count <= U8_MAX) {
> +                               prf_copy_to_buffer(buffer, site,
> +                                                  sizeof(struct llvm_prf_value_node_data));
> +                               site = site->next;
> +                       }
> +
> +                       counts[n] = (u8)count;
> +               }
> +
> +               s += sites;
> +       }
> +}
> +
> +static void prf_serialize_values(void **buffer)
> +{
> +       struct llvm_prf_data *p;
> +
> +       for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> +               prf_serialize_value(p, buffer);
> +}
> +
> +static inline unsigned long prf_get_padding(unsigned long size)
> +{
> +       return 8 - (size % 8);
> +}
> +
> +static unsigned long prf_buffer_size(void)
> +{
> +       return sizeof(struct llvm_prf_header) +
> +                       prf_data_size() +
> +                       prf_cnts_size() +
> +                       prf_names_size() +
> +                       prf_get_padding(prf_names_size()) +
> +                       prf_get_value_size();
> +}
> +
> +/* Serialize the profling data into a format LLVM's tools can understand. */

^ typo: profiling

> +static int prf_serialize(struct prf_private_data *p)
> +{
> +       int err = 0;
> +       void *buffer;
> +
> +       p->size = prf_buffer_size();
> +       p->buffer = vzalloc(p->size);
> +
> +       if (!p->buffer) {
> +               err = -ENOMEM;
> +               goto out;
> +       }
> +
> +       buffer = p->buffer;
> +
> +       prf_fill_header(&buffer);
> +       prf_copy_to_buffer(&buffer, __llvm_prf_data_start,  prf_data_size());
> +       prf_copy_to_buffer(&buffer, __llvm_prf_cnts_start,  prf_cnts_size());
> +       prf_copy_to_buffer(&buffer, __llvm_prf_names_start, prf_names_size());
> +       buffer += prf_get_padding(prf_names_size());
> +
> +       prf_serialize_values(&buffer);
> +
> +out:
> +       return err;
> +}
> +
> +/* open() implementation for PGO. Creates a copy of the profiling data set. */
> +static int prf_open(struct inode *inode, struct file *file)
> +{
> +       struct prf_private_data *data;
> +       unsigned long flags;
> +       int err;
> +
> +       data = kzalloc(sizeof(*data), GFP_KERNEL);
> +       if (!data) {
> +               err = -ENOMEM;
> +               goto out;
> +       }
> +
> +       flags = prf_lock();
> +
> +       err = prf_serialize(data);
> +       if (err) {
> +               kfree(data);
> +               goto out_unlock;
> +       }
> +
> +       file->private_data = data;
> +
> +out_unlock:
> +       prf_unlock(flags);
> +out:
> +       return err;
> +}
> +
> +/* read() implementation for PGO. */
> +static ssize_t prf_read(struct file *file, char __user *buf, size_t count,
> +                       loff_t *ppos)
> +{
> +       struct prf_private_data *data = file->private_data;
> +
> +       BUG_ON(!data);
> +
> +       return simple_read_from_buffer(buf, count, ppos, data->buffer,
> +                                      data->size);
> +}
> +
> +/* release() implementation for PGO. Release resources allocated by open(). */
> +static int prf_release(struct inode *inode, struct file *file)
> +{
> +       struct prf_private_data *data = file->private_data;
> +
> +       if (data) {
> +               vfree(data->buffer);
> +               kfree(data);
> +       }
> +
> +       return 0;
> +}
> +
> +static const struct file_operations prf_fops = {
> +       .owner          = THIS_MODULE,
> +       .open           = prf_open,
> +       .read           = prf_read,
> +       .llseek         = default_llseek,
> +       .release        = prf_release
> +};
> +
> +/* write() implementation for resetting PGO's profile data. */
> +static ssize_t reset_write(struct file *file, const char __user *addr,
> +                          size_t len, loff_t *pos)
> +{
> +       struct llvm_prf_data *data;
> +
> +       memset(__llvm_prf_cnts_start, 0, prf_cnts_size());
> +
> +       for (data = __llvm_prf_data_start; data < __llvm_prf_data_end; ++data) {
> +               struct llvm_prf_value_node **vnodes;
> +               u64 current_vsite_count;
> +               u32 i;
> +
> +               if (!data->values)
> +                       continue;
> +
> +               current_vsite_count = 0;
> +               vnodes = (struct llvm_prf_value_node **)data->values;
> +
> +               for (i = LLVM_PRF_IPVK_FIRST; i <= LLVM_PRF_IPVK_LAST; ++i)
> +                       current_vsite_count += data->num_value_sites[i];
> +
> +               for (i = 0; i < current_vsite_count; ++i) {
> +                       struct llvm_prf_value_node *current_vnode = vnodes[i];
> +
> +                       while (current_vnode) {
> +                               current_vnode->count = 0;
> +                               current_vnode = current_vnode->next;
> +                       }
> +               }
> +       }
> +
> +       return len;
> +}
> +
> +static const struct file_operations prf_reset_fops = {
> +       .owner          = THIS_MODULE,
> +       .write          = reset_write,
> +       .llseek         = noop_llseek,
> +};
> +
> +/* Create debugfs entries. */
> +static int __init pgo_init(void)
> +{
> +       directory = debugfs_create_dir("pgo", NULL);
> +       if (!directory)
> +               goto err_remove;
> +
> +       if (!debugfs_create_file("profraw", 0600, directory, NULL,
> +                                &prf_fops))
> +               goto err_remove;
> +
> +       if (!debugfs_create_file("reset", 0200, directory, NULL,
> +                                &prf_reset_fops))
> +               goto err_remove;
> +
> +       return 0;
> +
> +err_remove:
> +       pr_err("initialization failed\n");
> +       return -EIO;
> +}
> +
> +/* Remove debufs entries. */

^ typo: debugfs

> +static void __exit pgo_exit(void)
> +{
> +       debugfs_remove_recursive(directory);
> +}
> +
> +module_init(pgo_init);
> +module_exit(pgo_exit);
> diff --git a/kernel/pgo/instrument.c b/kernel/pgo/instrument.c
> new file mode 100644
> index 0000000000000..6084ff0652e85
> --- /dev/null
> +++ b/kernel/pgo/instrument.c
> @@ -0,0 +1,185 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2019 Google, Inc.
> + *
> + * Author:
> + *     Sami Tolvanen <samitolvanen@google.com>
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#define pr_fmt(fmt)    "pgo: " fmt
> +
> +#include <linux/bitops.h>
> +#include <linux/kernel.h>
> +#include <linux/export.h>
> +#include <linux/spinlock.h>
> +#include <linux/types.h>
> +#include "pgo.h"
> +
> +/* Lock guarding value node access and serialization. */
> +static DEFINE_SPINLOCK(pgo_lock);
> +static int current_node;
> +
> +unsigned long prf_lock(void)
> +{
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&pgo_lock, flags);
> +
> +       return flags;
> +}
> +
> +void prf_unlock(unsigned long flags)
> +{
> +       spin_unlock_irqrestore(&pgo_lock, flags);
> +}
> +
> +/*
> + * Return a newly allocated profiling value node which contains the tracked
> + * value by the value profiler.
> + * Note: caller *must* hold pgo_lock.
> + */
> +static struct llvm_prf_value_node *allocate_node(struct llvm_prf_data *p,
> +                                                u32 index, u64 value)
> +{
> +       if (&__llvm_prf_vnds_start[current_node + 1] >= __llvm_prf_vnds_end)
> +               return NULL; /* Out of nodes */
> +
> +       current_node++;
> +
> +       /* Make sure the node is entirely within the section */
> +       if (&__llvm_prf_vnds_start[current_node] >= __llvm_prf_vnds_end ||
> +           &__llvm_prf_vnds_start[current_node + 1] > __llvm_prf_vnds_end)
> +               return NULL;
> +
> +       return &__llvm_prf_vnds_start[current_node];
> +}
> +
> +/*
> + * Counts the number of times a target value is seen.
> + *
> + * Records the target value for the CounterIndex if not seen before. Otherwise,
> + * increments the counter associated w/ the target value.
> + */
> +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index);
> +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index)
> +{
> +       struct llvm_prf_data *p = (struct llvm_prf_data *)data;
> +       struct llvm_prf_value_node **counters;
> +       struct llvm_prf_value_node *curr;
> +       struct llvm_prf_value_node *min = NULL;
> +       struct llvm_prf_value_node *prev = NULL;
> +       u64 min_count = U64_MAX;
> +       u8 values = 0;
> +       unsigned long flags;
> +
> +       if (!p || !p->values)
> +               return;
> +
> +       counters = (struct llvm_prf_value_node **)p->values;
> +       curr = counters[index];
> +
> +       while (curr) {
> +               if (target_value == curr->value) {
> +                       curr->count++;
> +                       return;
> +               }
> +
> +               if (curr->count < min_count) {
> +                       min_count = curr->count;
> +                       min = curr;
> +               }
> +
> +               prev = curr;
> +               curr = curr->next;
> +               values++;
> +       }
> +
> +       if (values >= LLVM_PRF_MAX_NUM_VALS_PER_SITE) {
> +               if (!min->count || !(--min->count)) {
> +                       curr = min;
> +                       curr->value = target_value;
> +                       curr->count++;
> +               }
> +               return;
> +       }
> +
> +       /* Lock when updating the value node structure. */
> +       flags = prf_lock();
> +
> +       curr = allocate_node(p, index, target_value);
> +       if (!curr)
> +               goto out;
> +
> +       curr->value = target_value;
> +       curr->count++;
> +
> +       if (!counters[index])
> +               counters[index] = curr;
> +       else if (prev && !prev->next)
> +               prev->next = curr;
> +
> +out:
> +       prf_unlock(flags);
> +}
> +EXPORT_SYMBOL(__llvm_profile_instrument_target);
> +
> +/* Counts the number of times a range of targets values are seen. */
> +void __llvm_profile_instrument_range(u64 target_value, void *data,
> +                                    u32 index, s64 precise_start,
> +                                    s64 precise_last, s64 large_value);
> +void __llvm_profile_instrument_range(u64 target_value, void *data,
> +                                    u32 index, s64 precise_start,
> +                                    s64 precise_last, s64 large_value)
> +{
> +       if (large_value != S64_MIN && (s64)target_value >= large_value)
> +               target_value = large_value;
> +       else if ((s64)target_value < precise_start ||
> +                (s64)target_value > precise_last)
> +               target_value = precise_last + 1;
> +
> +       __llvm_profile_instrument_target(target_value, data, index);
> +}
> +EXPORT_SYMBOL(__llvm_profile_instrument_range);
> +
> +static u64 inst_prof_get_range_rep_value(u64 value)
> +{
> +       if (value <= 8)
> +               /* The first ranges are individually tracked, us it as is. */
> +               return value;
> +       else if (value >= 513)
> +               /* The last range is mapped to its lowest value. */
> +               return 513;
> +       else if (hweight64(value) == 1)
> +               /* If it's a power of two, use it as is. */
> +               return value;
> +
> +       /* Otherwise, take to the previous power of two + 1. */
> +       return (1 << (64 - __builtin_clzll(value) - 1)) + 1;
> +}
> +
> +/*
> + * The target values are partitioned into multiple ranges. The range spec is
> + * defined in compiler-rt/include/profile/InstrProfData.inc.
> + */
> +void __llvm_profile_instrument_memop(u64 target_value, void *data,
> +                                    u32 counter_index);
> +void __llvm_profile_instrument_memop(u64 target_value, void *data,
> +                                    u32 counter_index)
> +{
> +       u64 rep_value;
> +
> +       /* Map the target value to the representative value of its range. */
> +       rep_value = inst_prof_get_range_rep_value(target_value);
> +       __llvm_profile_instrument_target(rep_value, data, counter_index);
> +}
> +EXPORT_SYMBOL(__llvm_profile_instrument_memop);
> diff --git a/kernel/pgo/pgo.h b/kernel/pgo/pgo.h
> new file mode 100644
> index 0000000000000..df0aa278f28bd
> --- /dev/null
> +++ b/kernel/pgo/pgo.h
> @@ -0,0 +1,206 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (C) 2019 Google, Inc.
> + *
> + * Author:
> + *     Sami Tolvanen <samitolvanen@google.com>
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#ifndef _PGO_H
> +#define _PGO_H
> +
> +/*
> + * Note: These internal LLVM definitions must match the compiler version.
> + * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code.
> + */
> +
> +#ifdef CONFIG_64BIT
> +       #define LLVM_PRF_MAGIC          \
> +               ((u64)255 << 56 |       \
> +                (u64)'l' << 48 |       \
> +                (u64)'p' << 40 |       \
> +                (u64)'r' << 32 |       \
> +                (u64)'o' << 24 |       \
> +                (u64)'f' << 16 |       \
> +                (u64)'r' << 8  |       \
> +                (u64)129)
> +#else
> +       #define LLVM_PRF_MAGIC          \
> +               ((u64)255 << 56 |       \
> +                (u64)'l' << 48 |       \
> +                (u64)'p' << 40 |       \
> +                (u64)'r' << 32 |       \
> +                (u64)'o' << 24 |       \
> +                (u64)'f' << 16 |       \
> +                (u64)'R' << 8  |       \
> +                (u64)129)
> +#endif
> +
> +#define LLVM_PRF_VERSION               5
> +#define LLVM_PRF_DATA_ALIGN            8
> +#define LLVM_PRF_IPVK_FIRST            0
> +#define LLVM_PRF_IPVK_LAST             1
> +#define LLVM_PRF_MAX_NUM_VALS_PER_SITE 16
> +
> +#define LLVM_PRF_VARIANT_MASK_IR       (0x1ull << 56)
> +#define LLVM_PRF_VARIANT_MASK_CSIR     (0x1ull << 57)
> +
> +/**
> + * struct llvm_prf_header - represents the raw profile header data structure.
> + * @magic: the magic token for the file format.
> + * @version: the version of the file format.
> + * @data_size: the number of entries in the profile data section.
> + * @padding_bytes_before_counters: the number of padding bytes before the
> + *   counters.
> + * @counters_size: the size in bytes of the LLVM profile section containing the
> + *   counters.
> + * @padding_bytes_after_counters: the number of padding bytes after the
> + *   counters.
> + * @names_size: the size in bytes of the LLVM profile section containing the
> + *   counters' names.
> + * @counters_delta: the beginning of the LLMV profile counters section.
> + * @names_delta: the beginning of the LLMV profile names section.
> + * @value_kind_last: the last profile value kind.
> + */
> +struct llvm_prf_header {
> +       u64 magic;
> +       u64 version;
> +       u64 data_size;
> +       u64 padding_bytes_before_counters;
> +       u64 counters_size;
> +       u64 padding_bytes_after_counters;
> +       u64 names_size;
> +       u64 counters_delta;
> +       u64 names_delta;
> +       u64 value_kind_last;
> +};
> +
> +/**
> + * struct llvm_prf_data - represents the per-function control structure.
> + * @name_ref: the reference to the function's name.
> + * @func_hash: the hash value of the function.
> + * @counter_ptr: a pointer to the profile counter.
> + * @function_ptr: a pointer to the function.
> + * @values: the profiling values associated with this function.
> + * @num_counters: the number of counters in the function.
> + * @num_value_sites: the number of value profile sites.
> + */
> +struct llvm_prf_data {
> +       const u64 name_ref;
> +       const u64 func_hash;
> +       const void *counter_ptr;
> +       const void *function_ptr;
> +       void *values;
> +       const u32 num_counters;
> +       const u16 num_value_sites[LLVM_PRF_IPVK_LAST + 1];
> +} __aligned(LLVM_PRF_DATA_ALIGN);
> +
> +/**
> + * structure llvm_prf_value_node_data - represents the data part of the struct
> + *   llvm_prf_value_node data structure.
> + * @value: the value counters.
> + * @count: the counters' count.
> + */
> +struct llvm_prf_value_node_data {
> +       u64 value;
> +       u64 count;
> +};
> +
> +/**
> + * struct llvm_prf_value_node - represents an internal data structure used by
> + *   the value profiler.
> + * @value: the value counters.
> + * @count: the counters' count.
> + * @next: the next value node.
> + */
> +struct llvm_prf_value_node {
> +       u64 value;
> +       u64 count;
> +       struct llvm_prf_value_node *next;
> +};
> +
> +/**
> + * struct llvm_prf_value_data - represents the value profiling data in indexed
> + *   format.
> + * @total_size: the total size in bytes including this field.
> + * @num_value_kinds: the number of value profile kinds that has value profile
> + *   data.
> + */
> +struct llvm_prf_value_data {
> +       u32 total_size;
> +       u32 num_value_kinds;
> +};
> +
> +/**
> + * struct llvm_prf_value_record - represents the on-disk layout of the value
> + *   profile data of a particular kind for one function.
> + * @kind: the kind of the value profile record.
> + * @num_value_sites: the number of value profile sites.
> + * @site_count_array: the first element of the array that stores the number
> + *   of profiled values for each value site.
> + */
> +struct llvm_prf_value_record {
> +       u32 kind;
> +       u32 num_value_sites;
> +       u8 site_count_array[];
> +};
> +
> +#define prf_get_value_record_header_size()             \
> +       offsetof(struct llvm_prf_value_record, site_count_array)
> +#define prf_get_value_record_site_count_size(sites)    \
> +       roundup((sites), 8)
> +#define prf_get_value_record_size(sites)               \
> +       (prf_get_value_record_header_size() +           \
> +        prf_get_value_record_site_count_size((sites)))
> +
> +/* Data sections */
> +extern struct llvm_prf_data __llvm_prf_data_start[];
> +extern struct llvm_prf_data __llvm_prf_data_end[];
> +
> +extern u64 __llvm_prf_cnts_start[];
> +extern u64 __llvm_prf_cnts_end[];
> +
> +extern char __llvm_prf_names_start[];
> +extern char __llvm_prf_names_end[];
> +
> +extern struct llvm_prf_value_node __llvm_prf_vnds_start[];
> +extern struct llvm_prf_value_node __llvm_prf_vnds_end[];
> +
> +/* Locking for vnodes */
> +extern unsigned long prf_lock(void);
> +extern void prf_unlock(unsigned long flags);
> +
> +#define __DEFINE_PRF_SIZE(s) \
> +       static inline unsigned long prf_ ## s ## _size(void)            \
> +       {                                                               \
> +               unsigned long start =                                   \
> +                       (unsigned long)__llvm_prf_ ## s ## _start;      \
> +               unsigned long end =                                     \
> +                       (unsigned long)__llvm_prf_ ## s ## _end;        \
> +               return roundup(end - start,                             \
> +                               sizeof(__llvm_prf_ ## s ## _start[0])); \
> +       }                                                               \
> +       static inline unsigned long prf_ ## s ## _count(void)           \
> +       {                                                               \
> +               return prf_ ## s ## _size() /                           \
> +                       sizeof(__llvm_prf_ ## s ## _start[0]);          \
> +       }
> +
> +__DEFINE_PRF_SIZE(data);
> +__DEFINE_PRF_SIZE(cnts);
> +__DEFINE_PRF_SIZE(names);
> +__DEFINE_PRF_SIZE(vnds);
> +
> +#undef __DEFINE_PRF_SIZE
> +
> +#endif /* _PGO_H */
> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> index 213677a5ed33e..9b218afb5cb87 100644
> --- a/scripts/Makefile.lib
> +++ b/scripts/Makefile.lib
> @@ -143,6 +143,16 @@ _c_flags += $(if $(patsubst n%,, \
>                 $(CFLAGS_GCOV))
>  endif
>
> +#
> +# Enable clang's PGO profiling flags for a file or directory depending on
> +# variables PGO_PROFILE_obj.o and PGO_PROFILE.
> +#
> +ifeq ($(CONFIG_PGO_CLANG),y)
> +_c_flags += $(if $(patsubst n%,, \
> +               $(PGO_PROFILE_$(basetarget).o)$(PGO_PROFILE)y), \
> +               $(CFLAGS_PGO_CLANG))
> +endif
> +
>  #
>  # Enable address sanitizer flags for kernel except some files or directories
>  # we don't want to check (depends on variables KASAN_SANITIZE_obj.o, KASAN_SANITIZE)
> --
> 2.30.0.284.gd98b1dd5eaa7-goog
>


-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v4] pgo: add clang's Profile Guided Optimization infrastructure
  2021-01-13 20:55       ` Nathan Chancellor
  2021-01-13 21:59         ` Bill Wendling
@ 2021-01-14  4:07         ` Nick Desaulniers
  2021-01-16  0:01           ` Nick Desaulniers
  1 sibling, 1 reply; 116+ messages in thread
From: Nick Desaulniers @ 2021-01-14  4:07 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Bill Wendling, Jonathan Corbet, Masahiro Yamada,
	Linux Doc Mailing List, LKML, Linux Kbuild mailing list,
	clang-built-linux, Andrew Morton, Sami Tolvanen

On Wed, Jan 13, 2021 at 12:55 PM Nathan Chancellor
<natechancellor@gmail.com> wrote:
>
> However, I see an issue with actually using the data:
>
> $ sudo -s
> # mount -t debugfs none /sys/kernel/debug
> # cp -a /sys/kernel/debug/pgo/profraw vmlinux.profraw
> # chown nathan:nathan vmlinux.profraw
> # exit
> $ tc-build/build/llvm/stage1/bin/llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> warning: vmlinux.profraw: Invalid instrumentation profile data (bad magic)
> error: No profiles could be merged.
>
> Am I holding it wrong? :) Note, this is virtualized, I do not have any
> "real" x86 hardware that I can afford to test on right now.

Same.

I think the magic calculation in this patch may differ from upstream
llvm: https://github.com/llvm/llvm-project/blob/49142991a685bd427d7e877c29c77371dfb7634c/llvm/include/llvm/ProfileData/SampleProf.h#L96-L101

vs this patch:

+ * Note: These internal LLVM definitions must match the compiler version.
+ * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code.
+ */
+
+#ifdef CONFIG_64BIT
+ #define LLVM_PRF_MAGIC \
+ ((u64)255 << 56 | \
+ (u64)'l' << 48 | \
+ (u64)'p' << 40 | \
+ (u64)'r' << 32 | \
+ (u64)'o' << 24 | \
+ (u64)'f' << 16 | \
+ (u64)'r' << 8  | \
+ (u64)129)
+#else
+ #define LLVM_PRF_MAGIC \
+ ((u64)255 << 56 | \
+ (u64)'l' << 48 | \
+ (u64)'p' << 40 | \
+ (u64)'r' << 32 | \
+ (u64)'o' << 24 | \
+ (u64)'f' << 16 | \
+ (u64)'R' << 8  | \
+ (u64)129)
+#endif

-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v4] pgo: add clang's Profile Guided Optimization infrastructure
  2021-01-14  4:07         ` Nick Desaulniers
@ 2021-01-16  0:01           ` Nick Desaulniers
  2021-01-16  0:13             ` Nick Desaulniers
  0 siblings, 1 reply; 116+ messages in thread
From: Nick Desaulniers @ 2021-01-16  0:01 UTC (permalink / raw)
  To: Bill Wendling
  Cc: Jonathan Corbet, Masahiro Yamada, Linux Doc Mailing List, LKML,
	Linux Kbuild mailing list, clang-built-linux, Andrew Morton,
	Sami Tolvanen, Nathan Chancellor, Linus Torvalds

On Wed, Jan 13, 2021 at 8:07 PM Nick Desaulniers
<ndesaulniers@google.com> wrote:
>
> On Wed, Jan 13, 2021 at 12:55 PM Nathan Chancellor
> <natechancellor@gmail.com> wrote:
> >
> > However, I see an issue with actually using the data:
> >
> > $ sudo -s
> > # mount -t debugfs none /sys/kernel/debug
> > # cp -a /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > # chown nathan:nathan vmlinux.profraw
> > # exit
> > $ tc-build/build/llvm/stage1/bin/llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > warning: vmlinux.profraw: Invalid instrumentation profile data (bad magic)
> > error: No profiles could be merged.
> >
> > Am I holding it wrong? :) Note, this is virtualized, I do not have any
> > "real" x86 hardware that I can afford to test on right now.
>
> Same.
>
> I think the magic calculation in this patch may differ from upstream
> llvm: https://github.com/llvm/llvm-project/blob/49142991a685bd427d7e877c29c77371dfb7634c/llvm/include/llvm/ProfileData/SampleProf.h#L96-L101

Err...it looks like it was the padding calculation.  With that fixed
up, we can query the profile data to get insights on the most heavily
called functions.  Here's what my top 20 are (reset, then watch 10
minutes worth of cat videos on youtube while running `find /` and
sleeping at my desk).  Anything curious stand out to anyone?

$ llvm-profdata show -topn=20 /tmp/vmlinux.profraw
Instrumentation level: IR  entry_first = 0
Total functions: 48970
Maximum function count: 62070879
Maximum internal block count: 83221158
Top 20 functions with the largest internal block counts:
  drivers/tty/n_tty.c:n_tty_write, max count = 83221158
  rcu_read_unlock_strict, max count = 62070879
  _cond_resched, max count = 25486882
  rcu_all_qs, max count = 25451477
  drivers/cpuidle/poll_state.c:poll_idle, max count = 23618576
  _raw_spin_unlock_irqrestore, max count = 18874121
  drivers/cpuidle/governors/menu.c:menu_select, max count = 18721624
  _raw_spin_lock_irqsave, max count = 18509161
  memchr, max count = 15525452
  _raw_spin_lock, max count = 15484254
  __mod_memcg_state, max count = 14604619
  __mod_memcg_lruvec_state, max count = 14602783
  fs/ext4/hash.c:str2hashbuf_signed, max count = 14098424
  __mod_lruvec_state, max count = 12527154
  __mod_node_page_state, max count = 12525172
  native_sched_clock, max count = 8904692
  sched_clock_cpu, max count = 8895832
  sched_clock, max count = 8894627
  kernel/entry/common.c:exit_to_user_mode_prepare, max count = 8289031
  fpregs_assert_state_consistent, max count = 8287198

-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v4] pgo: add clang's Profile Guided Optimization infrastructure
  2021-01-16  0:01           ` Nick Desaulniers
@ 2021-01-16  0:13             ` Nick Desaulniers
  2021-01-16  4:30               ` Sedat Dilek
                                 ` (2 more replies)
  0 siblings, 3 replies; 116+ messages in thread
From: Nick Desaulniers @ 2021-01-16  0:13 UTC (permalink / raw)
  To: ndesaulniers
  Cc: akpm, clang-built-linux, corbet, linux-doc, linux-kbuild,
	linux-kernel, masahiroy, morbo, natechancellor, samitolvanen,
	torvalds

> On Wed, Jan 13, 2021 at 8:07 PM Nick Desaulniers
> <ndesaulniers@google.com> wrote:
> >
> > On Wed, Jan 13, 2021 at 12:55 PM Nathan Chancellor
> > <natechancellor@gmail.com> wrote:
> > >
> > > However, I see an issue with actually using the data:
> > >
> > > $ sudo -s
> > > # mount -t debugfs none /sys/kernel/debug
> > > # cp -a /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > > # chown nathan:nathan vmlinux.profraw
> > > # exit
> > > $ tc-build/build/llvm/stage1/bin/llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > > warning: vmlinux.profraw: Invalid instrumentation profile data (bad magic)
> > > error: No profiles could be merged.
> > >
> > > Am I holding it wrong? :) Note, this is virtualized, I do not have any
> > > "real" x86 hardware that I can afford to test on right now.
> >
> > Same.
> >
> > I think the magic calculation in this patch may differ from upstream
> > llvm: https://github.com/llvm/llvm-project/blob/49142991a685bd427d7e877c29c77371dfb7634c/llvm/include/llvm/ProfileData/SampleProf.h#L96-L101
> 
> Err...it looks like it was the padding calculation.  With that fixed
> up, we can query the profile data to get insights on the most heavily
> called functions.  Here's what my top 20 are (reset, then watch 10
> minutes worth of cat videos on youtube while running `find /` and
> sleeping at my desk).  Anything curious stand out to anyone?

Hello world from my personal laptop whose kernel was rebuilt with
profiling data!  Wow, I can run `find /` and watch cat videos on youtube
so fast!  Users will love this! /s

Checking the sections sizes of .text.hot. and .text.unlikely. looks
good!

> 
> $ llvm-profdata show -topn=20 /tmp/vmlinux.profraw
> Instrumentation level: IR  entry_first = 0
> Total functions: 48970
> Maximum function count: 62070879
> Maximum internal block count: 83221158
> Top 20 functions with the largest internal block counts:
>   drivers/tty/n_tty.c:n_tty_write, max count = 83221158
>   rcu_read_unlock_strict, max count = 62070879
>   _cond_resched, max count = 25486882
>   rcu_all_qs, max count = 25451477
>   drivers/cpuidle/poll_state.c:poll_idle, max count = 23618576
>   _raw_spin_unlock_irqrestore, max count = 18874121
>   drivers/cpuidle/governors/menu.c:menu_select, max count = 18721624
>   _raw_spin_lock_irqsave, max count = 18509161
>   memchr, max count = 15525452
>   _raw_spin_lock, max count = 15484254
>   __mod_memcg_state, max count = 14604619
>   __mod_memcg_lruvec_state, max count = 14602783
>   fs/ext4/hash.c:str2hashbuf_signed, max count = 14098424
>   __mod_lruvec_state, max count = 12527154
>   __mod_node_page_state, max count = 12525172
>   native_sched_clock, max count = 8904692
>   sched_clock_cpu, max count = 8895832
>   sched_clock, max count = 8894627
>   kernel/entry/common.c:exit_to_user_mode_prepare, max count = 8289031
>   fpregs_assert_state_consistent, max count = 8287198
> 
> -- 
> Thanks,
> ~Nick Desaulniers
> 

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v4] pgo: add clang's Profile Guided Optimization infrastructure
  2021-01-16  0:13             ` Nick Desaulniers
@ 2021-01-16  4:30               ` Sedat Dilek
  2021-01-16  5:07               ` Sedat Dilek
  2021-01-18  0:57               ` Sedat Dilek
  2 siblings, 0 replies; 116+ messages in thread
From: Sedat Dilek @ 2021-01-16  4:30 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Nick Desaulniers, akpm, Clang-Built-Linux ML, corbet, linux-doc,
	linux-kbuild, linux-kernel, masahiroy, Bill Wendling,
	Nathan Chancellor, Sami Tolvanen, Linus Torvalds

On Sat, Jan 16, 2021 at 1:13 AM Nick Desaulniers
<nick.desaulniers@gmail.com> wrote:
>
> > On Wed, Jan 13, 2021 at 8:07 PM Nick Desaulniers
> > <ndesaulniers@google.com> wrote:
> > >
> > > On Wed, Jan 13, 2021 at 12:55 PM Nathan Chancellor
> > > <natechancellor@gmail.com> wrote:
> > > >
> > > > However, I see an issue with actually using the data:
> > > >
> > > > $ sudo -s
> > > > # mount -t debugfs none /sys/kernel/debug
> > > > # cp -a /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > > > # chown nathan:nathan vmlinux.profraw
> > > > # exit
> > > > $ tc-build/build/llvm/stage1/bin/llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > > > warning: vmlinux.profraw: Invalid instrumentation profile data (bad magic)
> > > > error: No profiles could be merged.
> > > >
> > > > Am I holding it wrong? :) Note, this is virtualized, I do not have any
> > > > "real" x86 hardware that I can afford to test on right now.
> > >
> > > Same.
> > >
> > > I think the magic calculation in this patch may differ from upstream
> > > llvm: https://github.com/llvm/llvm-project/blob/49142991a685bd427d7e877c29c77371dfb7634c/llvm/include/llvm/ProfileData/SampleProf.h#L96-L101
> >
> > Err...it looks like it was the padding calculation.  With that fixed
> > up, we can query the profile data to get insights on the most heavily
> > called functions.  Here's what my top 20 are (reset, then watch 10
> > minutes worth of cat videos on youtube while running `find /` and
> > sleeping at my desk).  Anything curious stand out to anyone?
>
> Hello world from my personal laptop whose kernel was rebuilt with
> profiling data!  Wow, I can run `find /` and watch cat videos on youtube
> so fast!  Users will love this! /s
>
> Checking the sections sizes of .text.hot. and .text.unlikely. looks
> good!
>

I love cat videos on youtube and do find parallelly...

I must try this :-)!

Might be good to write up an instruction (README) for followers?

- Sedat -

> >
> > $ llvm-profdata show -topn=20 /tmp/vmlinux.profraw
> > Instrumentation level: IR  entry_first = 0
> > Total functions: 48970
> > Maximum function count: 62070879
> > Maximum internal block count: 83221158
> > Top 20 functions with the largest internal block counts:
> >   drivers/tty/n_tty.c:n_tty_write, max count = 83221158
> >   rcu_read_unlock_strict, max count = 62070879
> >   _cond_resched, max count = 25486882
> >   rcu_all_qs, max count = 25451477
> >   drivers/cpuidle/poll_state.c:poll_idle, max count = 23618576
> >   _raw_spin_unlock_irqrestore, max count = 18874121
> >   drivers/cpuidle/governors/menu.c:menu_select, max count = 18721624
> >   _raw_spin_lock_irqsave, max count = 18509161
> >   memchr, max count = 15525452
> >   _raw_spin_lock, max count = 15484254
> >   __mod_memcg_state, max count = 14604619
> >   __mod_memcg_lruvec_state, max count = 14602783
> >   fs/ext4/hash.c:str2hashbuf_signed, max count = 14098424
> >   __mod_lruvec_state, max count = 12527154
> >   __mod_node_page_state, max count = 12525172
> >   native_sched_clock, max count = 8904692
> >   sched_clock_cpu, max count = 8895832
> >   sched_clock, max count = 8894627
> >   kernel/entry/common.c:exit_to_user_mode_prepare, max count = 8289031
> >   fpregs_assert_state_consistent, max count = 8287198
> >
> > --
> > Thanks,
> > ~Nick Desaulniers
> >
>
> --
> You received this message because you are subscribed to the Google Groups "Clang Built Linux" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to clang-built-linux+unsubscribe@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/clang-built-linux/20210116001324.2865-1-nick.desaulniers%40gmail.com.

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v4] pgo: add clang's Profile Guided Optimization infrastructure
  2021-01-16  0:13             ` Nick Desaulniers
  2021-01-16  4:30               ` Sedat Dilek
@ 2021-01-16  5:07               ` Sedat Dilek
  2021-01-16  5:18                 ` Sedat Dilek
  2021-01-18  0:57               ` Sedat Dilek
  2 siblings, 1 reply; 116+ messages in thread
From: Sedat Dilek @ 2021-01-16  5:07 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Nick Desaulniers, akpm, Clang-Built-Linux ML, corbet, linux-doc,
	linux-kbuild, linux-kernel, masahiroy, Bill Wendling,
	Nathan Chancellor, Sami Tolvanen, Linus Torvalds

On Sat, Jan 16, 2021 at 1:13 AM Nick Desaulniers
<nick.desaulniers@gmail.com> wrote:
>
> > On Wed, Jan 13, 2021 at 8:07 PM Nick Desaulniers
> > <ndesaulniers@google.com> wrote:
> > >
> > > On Wed, Jan 13, 2021 at 12:55 PM Nathan Chancellor
> > > <natechancellor@gmail.com> wrote:
> > > >
> > > > However, I see an issue with actually using the data:
> > > >
> > > > $ sudo -s
> > > > # mount -t debugfs none /sys/kernel/debug
> > > > # cp -a /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > > > # chown nathan:nathan vmlinux.profraw
> > > > # exit
> > > > $ tc-build/build/llvm/stage1/bin/llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > > > warning: vmlinux.profraw: Invalid instrumentation profile data (bad magic)
> > > > error: No profiles could be merged.
> > > >
> > > > Am I holding it wrong? :) Note, this is virtualized, I do not have any
> > > > "real" x86 hardware that I can afford to test on right now.
> > >
> > > Same.
> > >
> > > I think the magic calculation in this patch may differ from upstream
> > > llvm: https://github.com/llvm/llvm-project/blob/49142991a685bd427d7e877c29c77371dfb7634c/llvm/include/llvm/ProfileData/SampleProf.h#L96-L101
> >
> > Err...it looks like it was the padding calculation.  With that fixed
> > up, we can query the profile data to get insights on the most heavily
> > called functions.  Here's what my top 20 are (reset, then watch 10
> > minutes worth of cat videos on youtube while running `find /` and
> > sleeping at my desk).  Anything curious stand out to anyone?
>
> Hello world from my personal laptop whose kernel was rebuilt with
> profiling data!  Wow, I can run `find /` and watch cat videos on youtube
> so fast!  Users will love this! /s
>
> Checking the sections sizes of .text.hot. and .text.unlikely. looks
> good!
>

Is that the latest status of Bill's patch?

Or do you have me a lore link?

- Sedat -

[1] https://github.com/gwelymernans/linux/commits/gwelymernans/linux


> >
> > $ llvm-profdata show -topn=20 /tmp/vmlinux.profraw
> > Instrumentation level: IR  entry_first = 0
> > Total functions: 48970
> > Maximum function count: 62070879
> > Maximum internal block count: 83221158
> > Top 20 functions with the largest internal block counts:
> >   drivers/tty/n_tty.c:n_tty_write, max count = 83221158
> >   rcu_read_unlock_strict, max count = 62070879
> >   _cond_resched, max count = 25486882
> >   rcu_all_qs, max count = 25451477
> >   drivers/cpuidle/poll_state.c:poll_idle, max count = 23618576
> >   _raw_spin_unlock_irqrestore, max count = 18874121
> >   drivers/cpuidle/governors/menu.c:menu_select, max count = 18721624
> >   _raw_spin_lock_irqsave, max count = 18509161
> >   memchr, max count = 15525452
> >   _raw_spin_lock, max count = 15484254
> >   __mod_memcg_state, max count = 14604619
> >   __mod_memcg_lruvec_state, max count = 14602783
> >   fs/ext4/hash.c:str2hashbuf_signed, max count = 14098424
> >   __mod_lruvec_state, max count = 12527154
> >   __mod_node_page_state, max count = 12525172
> >   native_sched_clock, max count = 8904692
> >   sched_clock_cpu, max count = 8895832
> >   sched_clock, max count = 8894627
> >   kernel/entry/common.c:exit_to_user_mode_prepare, max count = 8289031
> >   fpregs_assert_state_consistent, max count = 8287198
> >
> > --
> > Thanks,
> > ~Nick Desaulniers
> >
>
> --
> You received this message because you are subscribed to the Google Groups "Clang Built Linux" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to clang-built-linux+unsubscribe@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/clang-built-linux/20210116001324.2865-1-nick.desaulniers%40gmail.com.

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v4] pgo: add clang's Profile Guided Optimization infrastructure
  2021-01-16  5:07               ` Sedat Dilek
@ 2021-01-16  5:18                 ` Sedat Dilek
  0 siblings, 0 replies; 116+ messages in thread
From: Sedat Dilek @ 2021-01-16  5:18 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Nick Desaulniers, akpm, Clang-Built-Linux ML, corbet, linux-doc,
	linux-kbuild, linux-kernel, masahiroy, Bill Wendling,
	Nathan Chancellor, Sami Tolvanen, Linus Torvalds

On Sat, Jan 16, 2021 at 6:07 AM Sedat Dilek <sedat.dilek@gmail.com> wrote:
>
> On Sat, Jan 16, 2021 at 1:13 AM Nick Desaulniers
> <nick.desaulniers@gmail.com> wrote:
> >
> > > On Wed, Jan 13, 2021 at 8:07 PM Nick Desaulniers
> > > <ndesaulniers@google.com> wrote:
> > > >
> > > > On Wed, Jan 13, 2021 at 12:55 PM Nathan Chancellor
> > > > <natechancellor@gmail.com> wrote:
> > > > >
> > > > > However, I see an issue with actually using the data:
> > > > >
> > > > > $ sudo -s
> > > > > # mount -t debugfs none /sys/kernel/debug
> > > > > # cp -a /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > > > > # chown nathan:nathan vmlinux.profraw
> > > > > # exit
> > > > > $ tc-build/build/llvm/stage1/bin/llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > > > > warning: vmlinux.profraw: Invalid instrumentation profile data (bad magic)
> > > > > error: No profiles could be merged.
> > > > >
> > > > > Am I holding it wrong? :) Note, this is virtualized, I do not have any
> > > > > "real" x86 hardware that I can afford to test on right now.
> > > >
> > > > Same.
> > > >
> > > > I think the magic calculation in this patch may differ from upstream
> > > > llvm: https://github.com/llvm/llvm-project/blob/49142991a685bd427d7e877c29c77371dfb7634c/llvm/include/llvm/ProfileData/SampleProf.h#L96-L101
> > >
> > > Err...it looks like it was the padding calculation.  With that fixed
> > > up, we can query the profile data to get insights on the most heavily
> > > called functions.  Here's what my top 20 are (reset, then watch 10
> > > minutes worth of cat videos on youtube while running `find /` and
> > > sleeping at my desk).  Anything curious stand out to anyone?
> >
> > Hello world from my personal laptop whose kernel was rebuilt with
> > profiling data!  Wow, I can run `find /` and watch cat videos on youtube
> > so fast!  Users will love this! /s
> >
> > Checking the sections sizes of .text.hot. and .text.unlikely. looks
> > good!
> >
>
> Is that the latest status of Bill's patch?
>
> Or do you have me a lore link?
>

I tried with the message-id of Bill's initial email:

link="https://lore.kernel.org/r/20210111081821.3041587-1-morbo@google.com"
b4 -d am $link

This gives me:

v4_20210112_morbo_pgo_add_clang_s_profile_guided_optimization_infrastructure.mbx

- Sedat -

>
> [1] https://github.com/gwelymernans/linux/commits/gwelymernans/linux
>
>
> > >
> > > $ llvm-profdata show -topn=20 /tmp/vmlinux.profraw
> > > Instrumentation level: IR  entry_first = 0
> > > Total functions: 48970
> > > Maximum function count: 62070879
> > > Maximum internal block count: 83221158
> > > Top 20 functions with the largest internal block counts:
> > >   drivers/tty/n_tty.c:n_tty_write, max count = 83221158
> > >   rcu_read_unlock_strict, max count = 62070879
> > >   _cond_resched, max count = 25486882
> > >   rcu_all_qs, max count = 25451477
> > >   drivers/cpuidle/poll_state.c:poll_idle, max count = 23618576
> > >   _raw_spin_unlock_irqrestore, max count = 18874121
> > >   drivers/cpuidle/governors/menu.c:menu_select, max count = 18721624
> > >   _raw_spin_lock_irqsave, max count = 18509161
> > >   memchr, max count = 15525452
> > >   _raw_spin_lock, max count = 15484254
> > >   __mod_memcg_state, max count = 14604619
> > >   __mod_memcg_lruvec_state, max count = 14602783
> > >   fs/ext4/hash.c:str2hashbuf_signed, max count = 14098424
> > >   __mod_lruvec_state, max count = 12527154
> > >   __mod_node_page_state, max count = 12525172
> > >   native_sched_clock, max count = 8904692
> > >   sched_clock_cpu, max count = 8895832
> > >   sched_clock, max count = 8894627
> > >   kernel/entry/common.c:exit_to_user_mode_prepare, max count = 8289031
> > >   fpregs_assert_state_consistent, max count = 8287198
> > >
> > > --
> > > Thanks,
> > > ~Nick Desaulniers
> > >
> >
> > --
> > You received this message because you are subscribed to the Google Groups "Clang Built Linux" group.
> > To unsubscribe from this group and stop receiving emails from it, send an email to clang-built-linux+unsubscribe@googlegroups.com.
> > To view this discussion on the web visit https://groups.google.com/d/msgid/clang-built-linux/20210116001324.2865-1-nick.desaulniers%40gmail.com.

^ permalink raw reply	[flat|nested] 116+ messages in thread

* [PATCH v5] pgo: add clang's Profile Guided Optimization infrastructure
  2021-01-13  6:19     ` [PATCH v4] " Bill Wendling
  2021-01-13 20:55       ` Nathan Chancellor
  2021-01-13 23:01       ` Nick Desaulniers
@ 2021-01-16  9:43       ` Bill Wendling
  2021-01-16 17:38         ` Sedat Dilek
                           ` (3 more replies)
  2 siblings, 4 replies; 116+ messages in thread
From: Bill Wendling @ 2021-01-16  9:43 UTC (permalink / raw)
  To: Jonathan Corbet, Masahiro Yamada, linux-doc, linux-kernel,
	linux-kbuild, clang-built-linux, Andrew Morton
  Cc: Nathan Chancellor, Nick Desaulniers, Sami Tolvanen, Bill Wendling

From: Sami Tolvanen <samitolvanen@google.com>

Enable the use of clang's Profile-Guided Optimization[1]. To generate a
profile, the kernel is instrumented with PGO counters, a representative
workload is run, and the raw profile data is collected from
/sys/kernel/debug/pgo/profraw.

The raw profile data must be processed by clang's "llvm-profdata" tool
before it can be used during recompilation:

  $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
  $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw

Multiple raw profiles may be merged during this step.

The data can now be used by the compiler:

  $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...

This initial submission is restricted to x86, as that's the platform we
know works. This restriction can be lifted once other platforms have
been verified to work with PGO.

Note that this method of profiling the kernel is clang-native, unlike
the clang support in kernel/gcov.

[1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Co-developed-by: Bill Wendling <morbo@google.com>
Signed-off-by: Bill Wendling <morbo@google.com>
---
v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
      testing.
    - Corrected documentation, re PGO flags when using LTO, based on Fangrui
      Song's comments.
v3: - Added change log section based on Sedat Dilek's comments.
v4: - Remove non-x86 Makfile changes and se "hweight64" instead of using our
      own popcount implementation, based on Nick Desaulniers's comment.
v5: - Correct padding calculation, discovered by Nathan Chancellor.
---
 Documentation/dev-tools/index.rst     |   1 +
 Documentation/dev-tools/pgo.rst       | 127 +++++++++
 MAINTAINERS                           |   9 +
 Makefile                              |   3 +
 arch/Kconfig                          |   1 +
 arch/x86/Kconfig                      |   1 +
 arch/x86/boot/Makefile                |   1 +
 arch/x86/boot/compressed/Makefile     |   1 +
 arch/x86/crypto/Makefile              |   2 +
 arch/x86/entry/vdso/Makefile          |   1 +
 arch/x86/kernel/vmlinux.lds.S         |   2 +
 arch/x86/platform/efi/Makefile        |   1 +
 arch/x86/purgatory/Makefile           |   1 +
 arch/x86/realmode/rm/Makefile         |   1 +
 arch/x86/um/vdso/Makefile             |   1 +
 drivers/firmware/efi/libstub/Makefile |   1 +
 include/asm-generic/vmlinux.lds.h     |  44 +++
 kernel/Makefile                       |   1 +
 kernel/pgo/Kconfig                    |  35 +++
 kernel/pgo/Makefile                   |   5 +
 kernel/pgo/fs.c                       | 382 ++++++++++++++++++++++++++
 kernel/pgo/instrument.c               | 185 +++++++++++++
 kernel/pgo/pgo.h                      | 206 ++++++++++++++
 scripts/Makefile.lib                  |  10 +
 24 files changed, 1022 insertions(+)
 create mode 100644 Documentation/dev-tools/pgo.rst
 create mode 100644 kernel/pgo/Kconfig
 create mode 100644 kernel/pgo/Makefile
 create mode 100644 kernel/pgo/fs.c
 create mode 100644 kernel/pgo/instrument.c
 create mode 100644 kernel/pgo/pgo.h

diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
index f7809c7b1ba9e..8d6418e858062 100644
--- a/Documentation/dev-tools/index.rst
+++ b/Documentation/dev-tools/index.rst
@@ -26,6 +26,7 @@ whole; patches welcome!
    kgdb
    kselftest
    kunit/index
+   pgo
 
 
 .. only::  subproject and html
diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
new file mode 100644
index 0000000000000..b7f11d8405b73
--- /dev/null
+++ b/Documentation/dev-tools/pgo.rst
@@ -0,0 +1,127 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============================
+Using PGO with the Linux kernel
+===============================
+
+Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
+when building with Clang. The profiling data is exported via the ``pgo``
+debugfs directory.
+
+.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
+
+
+Preparation
+===========
+
+Configure the kernel with:
+
+.. code-block:: make
+
+   CONFIG_DEBUG_FS=y
+   CONFIG_PGO_CLANG=y
+
+Note that kernels compiled with profiling flags will be significantly larger
+and run slower.
+
+Profiling data will only become accessible once debugfs has been mounted:
+
+.. code-block:: sh
+
+   mount -t debugfs none /sys/kernel/debug
+
+
+Customization
+=============
+
+You can enable or disable profiling for individual file and directories by
+adding a line similar to the following to the respective kernel Makefile:
+
+- For a single file (e.g. main.o)
+
+  .. code-block:: make
+
+     PGO_PROFILE_main.o := y
+
+- For all files in one directory
+
+  .. code-block:: make
+
+     PGO_PROFILE := y
+
+To exclude files from being profiled use
+
+  .. code-block:: make
+
+     PGO_PROFILE_main.o := n
+
+and
+
+  .. code-block:: make
+
+     PGO_PROFILE := n
+
+Only files which are linked to the main kernel image or are compiled as kernel
+modules are supported by this mechanism.
+
+
+Files
+=====
+
+The PGO kernel support creates the following files in debugfs:
+
+``/sys/kernel/debug/pgo``
+	Parent directory for all PGO-related files.
+
+``/sys/kernel/debug/pgo/reset``
+	Global reset file: resets all coverage data to zero when written to.
+
+``/sys/kernel/debug/profraw``
+	The raw PGO data that must be processed with ``llvm_profdata``.
+
+
+Workflow
+========
+
+The PGO kernel can be run on the host or test machines. The data though should
+be analyzed with Clang's tools from the same Clang version as the kernel was
+compiled. Clang's tolerant of version skew, but it's easier to use the same
+Clang version.
+
+The profiling data is useful for optimizing the kernel, analyzing coverage,
+etc. Clang offers tools to perform these tasks.
+
+Here is an example workflow for profiling an instrumented kernel with PGO and
+using the result to optimize the kernel:
+
+1) Install the kernel on the TEST machine.
+
+2) Reset the data counters right before running the load tests
+
+   .. code-block:: sh
+
+      $ echo 1 > /sys/kernel/debug/pgo/reset
+
+3) Run the load tests.
+
+4) Collect the raw profile data
+
+   .. code-block:: sh
+
+      $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
+
+5) (Optional) Download the raw profile data to the HOST machine.
+
+6) Process the raw profile data
+
+   .. code-block:: sh
+
+      $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
+
+   Note that multiple raw profile data files can be merged during this step.
+
+7) Rebuild the kernel using the profile data (PGO disabled)
+
+   .. code-block:: sh
+
+      $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
diff --git a/MAINTAINERS b/MAINTAINERS
index 79b400c97059f..cb1f1f2b2baf4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13948,6 +13948,15 @@ S:	Maintained
 F:	include/linux/personality.h
 F:	include/uapi/linux/personality.h
 
+PGO BASED KERNEL PROFILING
+M:	Sami Tolvanen <samitolvanen@google.com>
+M:	Bill Wendling <wcw@google.com>
+R:	Nathan Chancellor <natechancellor@gmail.com>
+R:	Nick Desaulniers <ndesaulniers@google.com>
+S:	Supported
+F:	Documentation/dev-tools/pgo.rst
+F:	kernel/pgo
+
 PHOENIX RC FLIGHT CONTROLLER ADAPTER
 M:	Marcus Folkesson <marcus.folkesson@gmail.com>
 L:	linux-input@vger.kernel.org
diff --git a/Makefile b/Makefile
index 9e73f82e0d863..9128bfe1ccc97 100644
--- a/Makefile
+++ b/Makefile
@@ -659,6 +659,9 @@ endif # KBUILD_EXTMOD
 # Defaults to vmlinux, but the arch makefile usually adds further targets
 all: vmlinux
 
+CFLAGS_PGO_CLANG := -fprofile-generate
+export CFLAGS_PGO_CLANG
+
 CFLAGS_GCOV	:= -fprofile-arcs -ftest-coverage \
 	$(call cc-option,-fno-tree-loop-im) \
 	$(call cc-disable-warning,maybe-uninitialized,)
diff --git a/arch/Kconfig b/arch/Kconfig
index 24862d15f3a36..f39d3991f6bfe 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1112,6 +1112,7 @@ config ARCH_SPLIT_ARG64
 	   pairs of 32-bit arguments, select this option.
 
 source "kernel/gcov/Kconfig"
+source "kernel/pgo/Kconfig"
 
 source "scripts/gcc-plugins/Kconfig"
 
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 21f851179ff08..36305ea61dc09 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -96,6 +96,7 @@ config X86
 	select ARCH_SUPPORTS_DEBUG_PAGEALLOC
 	select ARCH_SUPPORTS_NUMA_BALANCING	if X86_64
 	select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP	if NR_CPUS <= 4096
+	select ARCH_SUPPORTS_PGO_CLANG		if X86_64
 	select ARCH_USE_BUILTIN_BSWAP
 	select ARCH_USE_QUEUED_RWLOCKS
 	select ARCH_USE_QUEUED_SPINLOCKS
diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index fe605205b4ce2..383853e32f673 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -71,6 +71,7 @@ KBUILD_AFLAGS	:= $(KBUILD_CFLAGS) -D__ASSEMBLY__
 KBUILD_CFLAGS	+= $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
 KBUILD_CFLAGS	+= -fno-asynchronous-unwind-tables
 GCOV_PROFILE := n
+PGO_PROFILE := n
 UBSAN_SANITIZE := n
 
 $(obj)/bzImage: asflags-y  := $(SVGA_MODE)
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index e0bc3988c3faa..ed12ab65f6065 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -54,6 +54,7 @@ CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/
 
 KBUILD_AFLAGS  := $(KBUILD_CFLAGS) -D__ASSEMBLY__
 GCOV_PROFILE := n
+PGO_PROFILE := n
 UBSAN_SANITIZE :=n
 
 KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index a31de0c6ccde2..775fa0b368e98 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -4,6 +4,8 @@
 
 OBJECT_FILES_NON_STANDARD := y
 
+PGO_PROFILE_curve25519-x86_64.o := n
+
 obj-$(CONFIG_CRYPTO_GLUE_HELPER_X86) += glue_helper.o
 
 obj-$(CONFIG_CRYPTO_TWOFISH_586) += twofish-i586.o
diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 02e3e42f380bd..26e2b3af0145c 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -179,6 +179,7 @@ quiet_cmd_vdso = VDSO    $@
 VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 \
 	$(call ld-option, --eh-frame-hdr) -Bsymbolic
 GCOV_PROFILE := n
+PGO_PROFILE := n
 
 quiet_cmd_vdso_and_check = VDSO    $@
       cmd_vdso_and_check = $(cmd_vdso); $(cmd_vdso_check)
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index efd9e9ea17f25..f6cab2316c46a 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -184,6 +184,8 @@ SECTIONS
 
 	BUG_TABLE
 
+	PGO_CLANG_DATA
+
 	ORC_UNWIND_TABLE
 
 	. = ALIGN(PAGE_SIZE);
diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile
index 84b09c230cbd5..5f22b31446ad4 100644
--- a/arch/x86/platform/efi/Makefile
+++ b/arch/x86/platform/efi/Makefile
@@ -2,6 +2,7 @@
 OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y
 KASAN_SANITIZE := n
 GCOV_PROFILE := n
+PGO_PROFILE := n
 
 obj-$(CONFIG_EFI) 		+= quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o
 obj-$(CONFIG_EFI_MIXED)		+= efi_thunk_$(BITS).o
diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
index 95ea17a9d20cb..36f20e99da0bc 100644
--- a/arch/x86/purgatory/Makefile
+++ b/arch/x86/purgatory/Makefile
@@ -23,6 +23,7 @@ targets += purgatory.ro purgatory.chk
 
 # Sanitizer, etc. runtimes are unavailable and cannot be linked here.
 GCOV_PROFILE	:= n
+PGO_PROFILE	:= n
 KASAN_SANITIZE	:= n
 UBSAN_SANITIZE	:= n
 KCSAN_SANITIZE	:= n
diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
index 83f1b6a56449f..21797192f958f 100644
--- a/arch/x86/realmode/rm/Makefile
+++ b/arch/x86/realmode/rm/Makefile
@@ -76,4 +76,5 @@ KBUILD_CFLAGS	:= $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
 KBUILD_AFLAGS	:= $(KBUILD_CFLAGS) -D__ASSEMBLY__
 KBUILD_CFLAGS	+= -fno-asynchronous-unwind-tables
 GCOV_PROFILE := n
+PGO_PROFILE := n
 UBSAN_SANITIZE := n
diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile
index 5943387e3f357..54f5768f58530 100644
--- a/arch/x86/um/vdso/Makefile
+++ b/arch/x86/um/vdso/Makefile
@@ -64,6 +64,7 @@ quiet_cmd_vdso = VDSO    $@
 
 VDSO_LDFLAGS = -fPIC -shared -Wl,--hash-style=sysv
 GCOV_PROFILE := n
+PGO_PROFILE := n
 
 #
 # Install the unstripped copy of vdso*.so listed in $(vdso-install-y).
diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index 8a94388e38b33..2d81623b33f29 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -40,6 +40,7 @@ KBUILD_CFLAGS			:= $(cflags-y) -Os -DDISABLE_BRANCH_PROFILING \
 KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))
 
 GCOV_PROFILE			:= n
+PGO_PROFILE			:= n
 # Sanitizer runtimes are unavailable and cannot be linked here.
 KASAN_SANITIZE			:= n
 KCSAN_SANITIZE			:= n
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index b2b3d81b1535a..3a591bb18c5fb 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -316,6 +316,49 @@
 #define THERMAL_TABLE(name)
 #endif
 
+#ifdef CONFIG_PGO_CLANG
+#define PGO_CLANG_DATA							\
+	__llvm_prf_data : AT(ADDR(__llvm_prf_data) - LOAD_OFFSET) {	\
+		. = ALIGN(8);						\
+		__llvm_prf_start = .;					\
+		__llvm_prf_data_start = .;				\
+		KEEP(*(__llvm_prf_data))				\
+		. = ALIGN(8);						\
+		__llvm_prf_data_end = .;				\
+	}								\
+	__llvm_prf_cnts : AT(ADDR(__llvm_prf_cnts) - LOAD_OFFSET) {	\
+		. = ALIGN(8);						\
+		__llvm_prf_cnts_start = .;				\
+		KEEP(*(__llvm_prf_cnts))				\
+		. = ALIGN(8);						\
+		__llvm_prf_cnts_end = .;				\
+	}								\
+	__llvm_prf_names : AT(ADDR(__llvm_prf_names) - LOAD_OFFSET) {	\
+		. = ALIGN(8);						\
+		__llvm_prf_names_start = .;				\
+		KEEP(*(__llvm_prf_names))				\
+		. = ALIGN(8);						\
+		__llvm_prf_names_end = .;				\
+		. = ALIGN(8);						\
+	}								\
+	__llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) {	\
+		__llvm_prf_vals_start = .;				\
+		KEEP(*(__llvm_prf_vals))				\
+		. = ALIGN(8);						\
+		__llvm_prf_vals_end = .;				\
+		. = ALIGN(8);						\
+	}								\
+	__llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) {	\
+		__llvm_prf_vnds_start = .;				\
+		KEEP(*(__llvm_prf_vnds))				\
+		. = ALIGN(8);						\
+		__llvm_prf_vnds_end = .;				\
+		__llvm_prf_end = .;					\
+	}
+#else
+#define PGO_CLANG_DATA
+#endif
+
 #define KERNEL_DTB()							\
 	STRUCT_ALIGN();							\
 	__dtb_start = .;						\
@@ -1125,6 +1168,7 @@
 		CONSTRUCTORS						\
 	}								\
 	BUG_TABLE							\
+	PGO_CLANG_DATA
 
 #define INIT_TEXT_SECTION(inittext_align)				\
 	. = ALIGN(inittext_align);					\
diff --git a/kernel/Makefile b/kernel/Makefile
index aa7368c7eabf3..0b34ca228ba46 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -111,6 +111,7 @@ obj-$(CONFIG_BPF) += bpf/
 obj-$(CONFIG_KCSAN) += kcsan/
 obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
 obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o
+obj-$(CONFIG_PGO_CLANG) += pgo/
 
 obj-$(CONFIG_PERF_EVENTS) += events/
 
diff --git a/kernel/pgo/Kconfig b/kernel/pgo/Kconfig
new file mode 100644
index 0000000000000..76a640b6cf6ed
--- /dev/null
+++ b/kernel/pgo/Kconfig
@@ -0,0 +1,35 @@
+# SPDX-License-Identifier: GPL-2.0-only
+menu "Profile Guided Optimization (PGO) (EXPERIMENTAL)"
+
+config ARCH_SUPPORTS_PGO_CLANG
+	bool
+
+config PGO_CLANG
+	bool "Enable clang's PGO-based kernel profiling"
+	depends on DEBUG_FS
+	depends on ARCH_SUPPORTS_PGO_CLANG
+	depends on CC_IS_CLANG && CLANG_VERSION >= 120000
+	help
+	  This option enables clang's PGO (Profile Guided Optimization) based
+	  code profiling to better optimize the kernel.
+
+	  If unsure, say N.
+
+	  Run a representative workload for your application on a kernel
+	  compiled with this option and download the raw profile file from
+	  /sys/kernel/debug/pgo/profraw. This file needs to be processed with
+	  llvm-profdata. It may be merged with other collected raw profiles.
+
+	  Copy the resulting profile file into vmlinux.profdata, and enable
+	  KCFLAGS=-fprofile-use=vmlinux.profdata to produce an optimized
+	  kernel.
+
+	  Note that a kernel compiled with profiling flags will be
+	  significantly larger and run slower. Also be sure to exclude files
+	  from profiling which are not linked to the kernel image to prevent
+	  linker errors.
+
+	  Note that the debugfs filesystem has to be mounted to access
+	  profiling data.
+
+endmenu
diff --git a/kernel/pgo/Makefile b/kernel/pgo/Makefile
new file mode 100644
index 0000000000000..41e27cefd9a47
--- /dev/null
+++ b/kernel/pgo/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+GCOV_PROFILE	:= n
+PGO_PROFILE	:= n
+
+obj-y	+= fs.o instrument.o
diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c
new file mode 100644
index 0000000000000..68b24672be10a
--- /dev/null
+++ b/kernel/pgo/fs.c
@@ -0,0 +1,382 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Google, Inc.
+ *
+ * Author:
+ *	Sami Tolvanen <samitolvanen@google.com>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#define pr_fmt(fmt)	"pgo: " fmt
+
+#include <linux/kernel.h>
+#include <linux/debugfs.h>
+#include <linux/fs.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/vmalloc.h>
+#include "pgo.h"
+
+static struct dentry *directory;
+
+struct prf_private_data {
+	void *buffer;
+	unsigned long size;
+};
+
+/*
+ * Raw profile data format:
+ *
+ *	- llvm_prf_header
+ *	- __llvm_prf_data
+ *	- __llvm_prf_cnts
+ *	- __llvm_prf_names
+ *	- zero padding to 8 bytes
+ *	- for each llvm_prf_data in __llvm_prf_data:
+ *		- llvm_prf_value_data
+ *			- llvm_prf_value_record + site count array
+ *				- llvm_prf_value_node_data
+ *				...
+ *			...
+ *		...
+ */
+
+static void prf_fill_header(void **buffer)
+{
+	struct llvm_prf_header *header = *(struct llvm_prf_header **)buffer;
+
+	header->magic = LLVM_PRF_MAGIC;
+	header->version = LLVM_PRF_VARIANT_MASK_IR | LLVM_PRF_VERSION;
+	header->data_size = prf_data_count();
+	header->padding_bytes_before_counters = 0;
+	header->counters_size = prf_cnts_count();
+	header->padding_bytes_after_counters = 0;
+	header->names_size = prf_names_count();
+	header->counters_delta = (u64)__llvm_prf_cnts_start;
+	header->names_delta = (u64)__llvm_prf_names_start;
+	header->value_kind_last = LLVM_PRF_IPVK_LAST;
+
+	*buffer += sizeof(*header);
+}
+
+/*
+ * Copy the source into the buffer, incrementing the pointer into buffer in the
+ * process.
+ */
+static void prf_copy_to_buffer(void **buffer, void *src, unsigned long size)
+{
+	memcpy(*buffer, src, size);
+	*buffer += size;
+}
+
+static u32 __prf_get_value_size(struct llvm_prf_data *p, u32 *value_kinds)
+{
+	struct llvm_prf_value_node **nodes =
+		(struct llvm_prf_value_node **)p->values;
+	u32 kinds = 0;
+	u32 size = 0;
+	unsigned int kind;
+	unsigned int n;
+	unsigned int s = 0;
+
+	for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
+		unsigned int sites = p->num_value_sites[kind];
+
+		if (!sites)
+			continue;
+
+		/* Record + site count array */
+		size += prf_get_value_record_size(sites);
+		kinds++;
+
+		if (!nodes)
+			continue;
+
+		for (n = 0; n < sites; n++) {
+			u32 count = 0;
+			struct llvm_prf_value_node *site = nodes[s + n];
+
+			while (site && ++count <= U8_MAX)
+				site = site->next;
+
+			size += count *
+				sizeof(struct llvm_prf_value_node_data);
+		}
+
+		s += sites;
+	}
+
+	if (size)
+		size += sizeof(struct llvm_prf_value_data);
+
+	if (value_kinds)
+		*value_kinds = kinds;
+
+	return size;
+}
+
+static u32 prf_get_value_size(void)
+{
+	u32 size = 0;
+	struct llvm_prf_data *p;
+
+	for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
+		size += __prf_get_value_size(p, NULL);
+
+	return size;
+}
+
+/* Serialize the profiling's value. */
+static void prf_serialize_value(struct llvm_prf_data *p, void **buffer)
+{
+	struct llvm_prf_value_data header;
+	struct llvm_prf_value_node **nodes =
+		(struct llvm_prf_value_node **)p->values;
+	unsigned int kind;
+	unsigned int n;
+	unsigned int s = 0;
+
+	header.total_size = __prf_get_value_size(p, &header.num_value_kinds);
+
+	if (!header.num_value_kinds)
+		/* Nothing to write. */
+		return;
+
+	prf_copy_to_buffer(buffer, &header, sizeof(header));
+
+	for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
+		struct llvm_prf_value_record *record;
+		u8 *counts;
+		unsigned int sites = p->num_value_sites[kind];
+
+		if (!sites)
+			continue;
+
+		/* Profiling value record. */
+		record = *(struct llvm_prf_value_record **)buffer;
+		*buffer += prf_get_value_record_header_size();
+
+		record->kind = kind;
+		record->num_value_sites = sites;
+
+		/* Site count array. */
+		counts = *(u8 **)buffer;
+		*buffer += prf_get_value_record_site_count_size(sites);
+
+		/*
+		 * If we don't have nodes, we can skip updating the site count
+		 * array, because the buffer is zero filled.
+		 */
+		if (!nodes)
+			continue;
+
+		for (n = 0; n < sites; n++) {
+			u32 count = 0;
+			struct llvm_prf_value_node *site = nodes[s + n];
+
+			while (site && ++count <= U8_MAX) {
+				prf_copy_to_buffer(buffer, site,
+						   sizeof(struct llvm_prf_value_node_data));
+				site = site->next;
+			}
+
+			counts[n] = (u8)count;
+		}
+
+		s += sites;
+	}
+}
+
+static void prf_serialize_values(void **buffer)
+{
+	struct llvm_prf_data *p;
+
+	for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
+		prf_serialize_value(p, buffer);
+}
+
+static inline unsigned long prf_get_padding(unsigned long size)
+{
+	return 7 & (8 - size % 8);
+}
+
+static unsigned long prf_buffer_size(void)
+{
+	return sizeof(struct llvm_prf_header) +
+			prf_data_size()	+
+			prf_cnts_size() +
+			prf_names_size() +
+			prf_get_padding(prf_names_size()) +
+			prf_get_value_size();
+}
+
+/* Serialize the profiling data into a format LLVM's tools can understand. */
+static int prf_serialize(struct prf_private_data *p)
+{
+	int err = 0;
+	void *buffer;
+
+	p->size = prf_buffer_size();
+	p->buffer = vzalloc(p->size);
+
+	if (!p->buffer) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	buffer = p->buffer;
+
+	prf_fill_header(&buffer);
+	prf_copy_to_buffer(&buffer, __llvm_prf_data_start,  prf_data_size());
+	prf_copy_to_buffer(&buffer, __llvm_prf_cnts_start,  prf_cnts_size());
+	prf_copy_to_buffer(&buffer, __llvm_prf_names_start, prf_names_size());
+	buffer += prf_get_padding(prf_names_size());
+
+	prf_serialize_values(&buffer);
+
+out:
+	return err;
+}
+
+/* open() implementation for PGO. Creates a copy of the profiling data set. */
+static int prf_open(struct inode *inode, struct file *file)
+{
+	struct prf_private_data *data;
+	unsigned long flags;
+	int err;
+
+	data = kzalloc(sizeof(*data), GFP_KERNEL);
+	if (!data) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	flags = prf_lock();
+
+	err = prf_serialize(data);
+	if (err) {
+		kfree(data);
+		goto out_unlock;
+	}
+
+	file->private_data = data;
+
+out_unlock:
+	prf_unlock(flags);
+out:
+	return err;
+}
+
+/* read() implementation for PGO. */
+static ssize_t prf_read(struct file *file, char __user *buf, size_t count,
+			loff_t *ppos)
+{
+	struct prf_private_data *data = file->private_data;
+
+	BUG_ON(!data);
+
+	return simple_read_from_buffer(buf, count, ppos, data->buffer,
+				       data->size);
+}
+
+/* release() implementation for PGO. Release resources allocated by open(). */
+static int prf_release(struct inode *inode, struct file *file)
+{
+	struct prf_private_data *data = file->private_data;
+
+	if (data) {
+		vfree(data->buffer);
+		kfree(data);
+	}
+
+	return 0;
+}
+
+static const struct file_operations prf_fops = {
+	.owner		= THIS_MODULE,
+	.open		= prf_open,
+	.read		= prf_read,
+	.llseek		= default_llseek,
+	.release	= prf_release
+};
+
+/* write() implementation for resetting PGO's profile data. */
+static ssize_t reset_write(struct file *file, const char __user *addr,
+			   size_t len, loff_t *pos)
+{
+	struct llvm_prf_data *data;
+
+	memset(__llvm_prf_cnts_start, 0, prf_cnts_size());
+
+	for (data = __llvm_prf_data_start; data < __llvm_prf_data_end; ++data) {
+		struct llvm_prf_value_node **vnodes;
+		u64 current_vsite_count;
+		u32 i;
+
+		if (!data->values)
+			continue;
+
+		current_vsite_count = 0;
+		vnodes = (struct llvm_prf_value_node **)data->values;
+
+		for (i = LLVM_PRF_IPVK_FIRST; i <= LLVM_PRF_IPVK_LAST; ++i)
+			current_vsite_count += data->num_value_sites[i];
+
+		for (i = 0; i < current_vsite_count; ++i) {
+			struct llvm_prf_value_node *current_vnode = vnodes[i];
+
+			while (current_vnode) {
+				current_vnode->count = 0;
+				current_vnode = current_vnode->next;
+			}
+		}
+	}
+
+	return len;
+}
+
+static const struct file_operations prf_reset_fops = {
+	.owner		= THIS_MODULE,
+	.write		= reset_write,
+	.llseek		= noop_llseek,
+};
+
+/* Create debugfs entries. */
+static int __init pgo_init(void)
+{
+	directory = debugfs_create_dir("pgo", NULL);
+	if (!directory)
+		goto err_remove;
+
+	if (!debugfs_create_file("profraw", 0600, directory, NULL,
+				 &prf_fops))
+		goto err_remove;
+
+	if (!debugfs_create_file("reset", 0200, directory, NULL,
+				 &prf_reset_fops))
+		goto err_remove;
+
+	return 0;
+
+err_remove:
+	pr_err("initialization failed\n");
+	return -EIO;
+}
+
+/* Remove debugfs entries. */
+static void __exit pgo_exit(void)
+{
+	debugfs_remove_recursive(directory);
+}
+
+module_init(pgo_init);
+module_exit(pgo_exit);
diff --git a/kernel/pgo/instrument.c b/kernel/pgo/instrument.c
new file mode 100644
index 0000000000000..6084ff0652e85
--- /dev/null
+++ b/kernel/pgo/instrument.c
@@ -0,0 +1,185 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Google, Inc.
+ *
+ * Author:
+ *	Sami Tolvanen <samitolvanen@google.com>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#define pr_fmt(fmt)	"pgo: " fmt
+
+#include <linux/bitops.h>
+#include <linux/kernel.h>
+#include <linux/export.h>
+#include <linux/spinlock.h>
+#include <linux/types.h>
+#include "pgo.h"
+
+/* Lock guarding value node access and serialization. */
+static DEFINE_SPINLOCK(pgo_lock);
+static int current_node;
+
+unsigned long prf_lock(void)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&pgo_lock, flags);
+
+	return flags;
+}
+
+void prf_unlock(unsigned long flags)
+{
+	spin_unlock_irqrestore(&pgo_lock, flags);
+}
+
+/*
+ * Return a newly allocated profiling value node which contains the tracked
+ * value by the value profiler.
+ * Note: caller *must* hold pgo_lock.
+ */
+static struct llvm_prf_value_node *allocate_node(struct llvm_prf_data *p,
+						 u32 index, u64 value)
+{
+	if (&__llvm_prf_vnds_start[current_node + 1] >= __llvm_prf_vnds_end)
+		return NULL; /* Out of nodes */
+
+	current_node++;
+
+	/* Make sure the node is entirely within the section */
+	if (&__llvm_prf_vnds_start[current_node] >= __llvm_prf_vnds_end ||
+	    &__llvm_prf_vnds_start[current_node + 1] > __llvm_prf_vnds_end)
+		return NULL;
+
+	return &__llvm_prf_vnds_start[current_node];
+}
+
+/*
+ * Counts the number of times a target value is seen.
+ *
+ * Records the target value for the CounterIndex if not seen before. Otherwise,
+ * increments the counter associated w/ the target value.
+ */
+void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index);
+void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index)
+{
+	struct llvm_prf_data *p = (struct llvm_prf_data *)data;
+	struct llvm_prf_value_node **counters;
+	struct llvm_prf_value_node *curr;
+	struct llvm_prf_value_node *min = NULL;
+	struct llvm_prf_value_node *prev = NULL;
+	u64 min_count = U64_MAX;
+	u8 values = 0;
+	unsigned long flags;
+
+	if (!p || !p->values)
+		return;
+
+	counters = (struct llvm_prf_value_node **)p->values;
+	curr = counters[index];
+
+	while (curr) {
+		if (target_value == curr->value) {
+			curr->count++;
+			return;
+		}
+
+		if (curr->count < min_count) {
+			min_count = curr->count;
+			min = curr;
+		}
+
+		prev = curr;
+		curr = curr->next;
+		values++;
+	}
+
+	if (values >= LLVM_PRF_MAX_NUM_VALS_PER_SITE) {
+		if (!min->count || !(--min->count)) {
+			curr = min;
+			curr->value = target_value;
+			curr->count++;
+		}
+		return;
+	}
+
+	/* Lock when updating the value node structure. */
+	flags = prf_lock();
+
+	curr = allocate_node(p, index, target_value);
+	if (!curr)
+		goto out;
+
+	curr->value = target_value;
+	curr->count++;
+
+	if (!counters[index])
+		counters[index] = curr;
+	else if (prev && !prev->next)
+		prev->next = curr;
+
+out:
+	prf_unlock(flags);
+}
+EXPORT_SYMBOL(__llvm_profile_instrument_target);
+
+/* Counts the number of times a range of targets values are seen. */
+void __llvm_profile_instrument_range(u64 target_value, void *data,
+				     u32 index, s64 precise_start,
+				     s64 precise_last, s64 large_value);
+void __llvm_profile_instrument_range(u64 target_value, void *data,
+				     u32 index, s64 precise_start,
+				     s64 precise_last, s64 large_value)
+{
+	if (large_value != S64_MIN && (s64)target_value >= large_value)
+		target_value = large_value;
+	else if ((s64)target_value < precise_start ||
+		 (s64)target_value > precise_last)
+		target_value = precise_last + 1;
+
+	__llvm_profile_instrument_target(target_value, data, index);
+}
+EXPORT_SYMBOL(__llvm_profile_instrument_range);
+
+static u64 inst_prof_get_range_rep_value(u64 value)
+{
+	if (value <= 8)
+		/* The first ranges are individually tracked, us it as is. */
+		return value;
+	else if (value >= 513)
+		/* The last range is mapped to its lowest value. */
+		return 513;
+	else if (hweight64(value) == 1)
+		/* If it's a power of two, use it as is. */
+		return value;
+
+	/* Otherwise, take to the previous power of two + 1. */
+	return (1 << (64 - __builtin_clzll(value) - 1)) + 1;
+}
+
+/*
+ * The target values are partitioned into multiple ranges. The range spec is
+ * defined in compiler-rt/include/profile/InstrProfData.inc.
+ */
+void __llvm_profile_instrument_memop(u64 target_value, void *data,
+				     u32 counter_index);
+void __llvm_profile_instrument_memop(u64 target_value, void *data,
+				     u32 counter_index)
+{
+	u64 rep_value;
+
+	/* Map the target value to the representative value of its range. */
+	rep_value = inst_prof_get_range_rep_value(target_value);
+	__llvm_profile_instrument_target(rep_value, data, counter_index);
+}
+EXPORT_SYMBOL(__llvm_profile_instrument_memop);
diff --git a/kernel/pgo/pgo.h b/kernel/pgo/pgo.h
new file mode 100644
index 0000000000000..df0aa278f28bd
--- /dev/null
+++ b/kernel/pgo/pgo.h
@@ -0,0 +1,206 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2019 Google, Inc.
+ *
+ * Author:
+ *	Sami Tolvanen <samitolvanen@google.com>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _PGO_H
+#define _PGO_H
+
+/*
+ * Note: These internal LLVM definitions must match the compiler version.
+ * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code.
+ */
+
+#ifdef CONFIG_64BIT
+	#define LLVM_PRF_MAGIC		\
+		((u64)255 << 56 |	\
+		 (u64)'l' << 48 |	\
+		 (u64)'p' << 40 |	\
+		 (u64)'r' << 32 |	\
+		 (u64)'o' << 24 |	\
+		 (u64)'f' << 16 |	\
+		 (u64)'r' << 8  |	\
+		 (u64)129)
+#else
+	#define LLVM_PRF_MAGIC		\
+		((u64)255 << 56 |	\
+		 (u64)'l' << 48 |	\
+		 (u64)'p' << 40 |	\
+		 (u64)'r' << 32 |	\
+		 (u64)'o' << 24 |	\
+		 (u64)'f' << 16 |	\
+		 (u64)'R' << 8  |	\
+		 (u64)129)
+#endif
+
+#define LLVM_PRF_VERSION		5
+#define LLVM_PRF_DATA_ALIGN		8
+#define LLVM_PRF_IPVK_FIRST		0
+#define LLVM_PRF_IPVK_LAST		1
+#define LLVM_PRF_MAX_NUM_VALS_PER_SITE	16
+
+#define LLVM_PRF_VARIANT_MASK_IR	(0x1ull << 56)
+#define LLVM_PRF_VARIANT_MASK_CSIR	(0x1ull << 57)
+
+/**
+ * struct llvm_prf_header - represents the raw profile header data structure.
+ * @magic: the magic token for the file format.
+ * @version: the version of the file format.
+ * @data_size: the number of entries in the profile data section.
+ * @padding_bytes_before_counters: the number of padding bytes before the
+ *   counters.
+ * @counters_size: the size in bytes of the LLVM profile section containing the
+ *   counters.
+ * @padding_bytes_after_counters: the number of padding bytes after the
+ *   counters.
+ * @names_size: the size in bytes of the LLVM profile section containing the
+ *   counters' names.
+ * @counters_delta: the beginning of the LLMV profile counters section.
+ * @names_delta: the beginning of the LLMV profile names section.
+ * @value_kind_last: the last profile value kind.
+ */
+struct llvm_prf_header {
+	u64 magic;
+	u64 version;
+	u64 data_size;
+	u64 padding_bytes_before_counters;
+	u64 counters_size;
+	u64 padding_bytes_after_counters;
+	u64 names_size;
+	u64 counters_delta;
+	u64 names_delta;
+	u64 value_kind_last;
+};
+
+/**
+ * struct llvm_prf_data - represents the per-function control structure.
+ * @name_ref: the reference to the function's name.
+ * @func_hash: the hash value of the function.
+ * @counter_ptr: a pointer to the profile counter.
+ * @function_ptr: a pointer to the function.
+ * @values: the profiling values associated with this function.
+ * @num_counters: the number of counters in the function.
+ * @num_value_sites: the number of value profile sites.
+ */
+struct llvm_prf_data {
+	const u64 name_ref;
+	const u64 func_hash;
+	const void *counter_ptr;
+	const void *function_ptr;
+	void *values;
+	const u32 num_counters;
+	const u16 num_value_sites[LLVM_PRF_IPVK_LAST + 1];
+} __aligned(LLVM_PRF_DATA_ALIGN);
+
+/**
+ * structure llvm_prf_value_node_data - represents the data part of the struct
+ *   llvm_prf_value_node data structure.
+ * @value: the value counters.
+ * @count: the counters' count.
+ */
+struct llvm_prf_value_node_data {
+	u64 value;
+	u64 count;
+};
+
+/**
+ * struct llvm_prf_value_node - represents an internal data structure used by
+ *   the value profiler.
+ * @value: the value counters.
+ * @count: the counters' count.
+ * @next: the next value node.
+ */
+struct llvm_prf_value_node {
+	u64 value;
+	u64 count;
+	struct llvm_prf_value_node *next;
+};
+
+/**
+ * struct llvm_prf_value_data - represents the value profiling data in indexed
+ *   format.
+ * @total_size: the total size in bytes including this field.
+ * @num_value_kinds: the number of value profile kinds that has value profile
+ *   data.
+ */
+struct llvm_prf_value_data {
+	u32 total_size;
+	u32 num_value_kinds;
+};
+
+/**
+ * struct llvm_prf_value_record - represents the on-disk layout of the value
+ *   profile data of a particular kind for one function.
+ * @kind: the kind of the value profile record.
+ * @num_value_sites: the number of value profile sites.
+ * @site_count_array: the first element of the array that stores the number
+ *   of profiled values for each value site.
+ */
+struct llvm_prf_value_record {
+	u32 kind;
+	u32 num_value_sites;
+	u8 site_count_array[];
+};
+
+#define prf_get_value_record_header_size()		\
+	offsetof(struct llvm_prf_value_record, site_count_array)
+#define prf_get_value_record_site_count_size(sites)	\
+	roundup((sites), 8)
+#define prf_get_value_record_size(sites)		\
+	(prf_get_value_record_header_size() +		\
+	 prf_get_value_record_site_count_size((sites)))
+
+/* Data sections */
+extern struct llvm_prf_data __llvm_prf_data_start[];
+extern struct llvm_prf_data __llvm_prf_data_end[];
+
+extern u64 __llvm_prf_cnts_start[];
+extern u64 __llvm_prf_cnts_end[];
+
+extern char __llvm_prf_names_start[];
+extern char __llvm_prf_names_end[];
+
+extern struct llvm_prf_value_node __llvm_prf_vnds_start[];
+extern struct llvm_prf_value_node __llvm_prf_vnds_end[];
+
+/* Locking for vnodes */
+extern unsigned long prf_lock(void);
+extern void prf_unlock(unsigned long flags);
+
+#define __DEFINE_PRF_SIZE(s) \
+	static inline unsigned long prf_ ## s ## _size(void)		\
+	{								\
+		unsigned long start =					\
+			(unsigned long)__llvm_prf_ ## s ## _start;	\
+		unsigned long end =					\
+			(unsigned long)__llvm_prf_ ## s ## _end;	\
+		return roundup(end - start,				\
+				sizeof(__llvm_prf_ ## s ## _start[0]));	\
+	}								\
+	static inline unsigned long prf_ ## s ## _count(void)		\
+	{								\
+		return prf_ ## s ## _size() /				\
+			sizeof(__llvm_prf_ ## s ## _start[0]);		\
+	}
+
+__DEFINE_PRF_SIZE(data);
+__DEFINE_PRF_SIZE(cnts);
+__DEFINE_PRF_SIZE(names);
+__DEFINE_PRF_SIZE(vnds);
+
+#undef __DEFINE_PRF_SIZE
+
+#endif /* _PGO_H */
diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index 213677a5ed33e..9b218afb5cb87 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -143,6 +143,16 @@ _c_flags += $(if $(patsubst n%,, \
 		$(CFLAGS_GCOV))
 endif
 
+#
+# Enable clang's PGO profiling flags for a file or directory depending on
+# variables PGO_PROFILE_obj.o and PGO_PROFILE.
+#
+ifeq ($(CONFIG_PGO_CLANG),y)
+_c_flags += $(if $(patsubst n%,, \
+		$(PGO_PROFILE_$(basetarget).o)$(PGO_PROFILE)y), \
+		$(CFLAGS_PGO_CLANG))
+endif
+
 #
 # Enable address sanitizer flags for kernel except some files or directories
 # we don't want to check (depends on variables KASAN_SANITIZE_obj.o, KASAN_SANITIZE)
-- 
2.30.0.284.gd98b1dd5eaa7-goog


^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v5] pgo: add clang's Profile Guided Optimization infrastructure
  2021-01-16  9:43       ` [PATCH v5] " Bill Wendling
@ 2021-01-16 17:38         ` Sedat Dilek
  2021-01-16 18:36           ` Sedat Dilek
  2021-01-16 20:23           ` Bill Wendling
  2021-01-20  1:02         ` Nick Desaulniers
                           ` (2 subsequent siblings)
  3 siblings, 2 replies; 116+ messages in thread
From: Sedat Dilek @ 2021-01-16 17:38 UTC (permalink / raw)
  To: Bill Wendling
  Cc: Jonathan Corbet, Masahiro Yamada, linux-doc, linux-kernel,
	linux-kbuild, Clang-Built-Linux ML, Andrew Morton,
	Nathan Chancellor, Nick Desaulniers, Sami Tolvanen

On Sat, Jan 16, 2021 at 10:44 AM 'Bill Wendling' via Clang Built Linux
<clang-built-linux@googlegroups.com> wrote:
>
> From: Sami Tolvanen <samitolvanen@google.com>
>
> Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> profile, the kernel is instrumented with PGO counters, a representative
> workload is run, and the raw profile data is collected from
> /sys/kernel/debug/pgo/profraw.
>
> The raw profile data must be processed by clang's "llvm-profdata" tool
> before it can be used during recompilation:
>
>   $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
>   $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
>
> Multiple raw profiles may be merged during this step.
>
> The data can now be used by the compiler:
>
>   $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
>
> This initial submission is restricted to x86, as that's the platform we
> know works. This restriction can be lifted once other platforms have
> been verified to work with PGO.
>
> Note that this method of profiling the kernel is clang-native, unlike
> the clang support in kernel/gcov.
>
> [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
>
> Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
> Co-developed-by: Bill Wendling <morbo@google.com>
> Signed-off-by: Bill Wendling <morbo@google.com>
> ---
> v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
>       testing.
>     - Corrected documentation, re PGO flags when using LTO, based on Fangrui
>       Song's comments.
> v3: - Added change log section based on Sedat Dilek's comments.
> v4: - Remove non-x86 Makfile changes and se "hweight64" instead of using our
>       own popcount implementation, based on Nick Desaulniers's comment.
> v5: - Correct padding calculation, discovered by Nathan Chancellor.
> ---
>  Documentation/dev-tools/index.rst     |   1 +
>  Documentation/dev-tools/pgo.rst       | 127 +++++++++
>  MAINTAINERS                           |   9 +
>  Makefile                              |   3 +
>  arch/Kconfig                          |   1 +
>  arch/x86/Kconfig                      |   1 +
>  arch/x86/boot/Makefile                |   1 +
>  arch/x86/boot/compressed/Makefile     |   1 +
>  arch/x86/crypto/Makefile              |   2 +
>  arch/x86/entry/vdso/Makefile          |   1 +
>  arch/x86/kernel/vmlinux.lds.S         |   2 +
>  arch/x86/platform/efi/Makefile        |   1 +
>  arch/x86/purgatory/Makefile           |   1 +
>  arch/x86/realmode/rm/Makefile         |   1 +
>  arch/x86/um/vdso/Makefile             |   1 +
>  drivers/firmware/efi/libstub/Makefile |   1 +
>  include/asm-generic/vmlinux.lds.h     |  44 +++
>  kernel/Makefile                       |   1 +
>  kernel/pgo/Kconfig                    |  35 +++
>  kernel/pgo/Makefile                   |   5 +
>  kernel/pgo/fs.c                       | 382 ++++++++++++++++++++++++++
>  kernel/pgo/instrument.c               | 185 +++++++++++++
>  kernel/pgo/pgo.h                      | 206 ++++++++++++++
>  scripts/Makefile.lib                  |  10 +
>  24 files changed, 1022 insertions(+)
>  create mode 100644 Documentation/dev-tools/pgo.rst
>  create mode 100644 kernel/pgo/Kconfig
>  create mode 100644 kernel/pgo/Makefile
>  create mode 100644 kernel/pgo/fs.c
>  create mode 100644 kernel/pgo/instrument.c
>  create mode 100644 kernel/pgo/pgo.h
>
> diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> index f7809c7b1ba9e..8d6418e858062 100644
> --- a/Documentation/dev-tools/index.rst
> +++ b/Documentation/dev-tools/index.rst
> @@ -26,6 +26,7 @@ whole; patches welcome!
>     kgdb
>     kselftest
>     kunit/index
> +   pgo
>
>
>  .. only::  subproject and html
> diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
> new file mode 100644
> index 0000000000000..b7f11d8405b73
> --- /dev/null
> +++ b/Documentation/dev-tools/pgo.rst
> @@ -0,0 +1,127 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +===============================
> +Using PGO with the Linux kernel
> +===============================
> +
> +Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
> +when building with Clang. The profiling data is exported via the ``pgo``
> +debugfs directory.
> +
> +.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> +
> +
> +Preparation
> +===========
> +
> +Configure the kernel with:
> +
> +.. code-block:: make
> +
> +   CONFIG_DEBUG_FS=y
> +   CONFIG_PGO_CLANG=y
> +
> +Note that kernels compiled with profiling flags will be significantly larger
> +and run slower.
> +
> +Profiling data will only become accessible once debugfs has been mounted:
> +
> +.. code-block:: sh
> +
> +   mount -t debugfs none /sys/kernel/debug
> +
> +
> +Customization
> +=============
> +
> +You can enable or disable profiling for individual file and directories by
> +adding a line similar to the following to the respective kernel Makefile:
> +
> +- For a single file (e.g. main.o)
> +
> +  .. code-block:: make
> +
> +     PGO_PROFILE_main.o := y
> +
> +- For all files in one directory
> +
> +  .. code-block:: make
> +
> +     PGO_PROFILE := y
> +
> +To exclude files from being profiled use
> +
> +  .. code-block:: make
> +
> +     PGO_PROFILE_main.o := n
> +
> +and
> +
> +  .. code-block:: make
> +
> +     PGO_PROFILE := n
> +
> +Only files which are linked to the main kernel image or are compiled as kernel
> +modules are supported by this mechanism.
> +
> +
> +Files
> +=====
> +
> +The PGO kernel support creates the following files in debugfs:
> +
> +``/sys/kernel/debug/pgo``
> +       Parent directory for all PGO-related files.
> +
> +``/sys/kernel/debug/pgo/reset``
> +       Global reset file: resets all coverage data to zero when written to.
> +
> +``/sys/kernel/debug/profraw``
> +       The raw PGO data that must be processed with ``llvm_profdata``.
> +
> +
> +Workflow
> +========
> +
> +The PGO kernel can be run on the host or test machines. The data though should
> +be analyzed with Clang's tools from the same Clang version as the kernel was
> +compiled. Clang's tolerant of version skew, but it's easier to use the same
> +Clang version.
> +
> +The profiling data is useful for optimizing the kernel, analyzing coverage,
> +etc. Clang offers tools to perform these tasks.
> +
> +Here is an example workflow for profiling an instrumented kernel with PGO and
> +using the result to optimize the kernel:
> +
> +1) Install the kernel on the TEST machine.
> +
> +2) Reset the data counters right before running the load tests
> +
> +   .. code-block:: sh
> +
> +      $ echo 1 > /sys/kernel/debug/pgo/reset
> +

I do not get this...

# mount | grep debugfs
debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)

After the load-test...?

echo 0 > /sys/kernel/debug/pgo/reset

> +3) Run the load tests.
> +
> +4) Collect the raw profile data
> +
> +   .. code-block:: sh
> +
> +      $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> +

This is only 4,9M small and seen from the date 5mins before I run the
echo-1 line.

# ll /sys/kernel/debug/pgo
insgesamt 0
drwxr-xr-x  2 root root 0 16. Jan 17:29 .
drwx------ 41 root root 0 16. Jan 17:29 ..
-rw-------  1 root root 0 16. Jan 17:29 profraw
--w-------  1 root root 0 16. Jan 18:19 reset

# cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw

# ll /tmp/vmlinux.profraw
-rw------- 1 root root 4,9M 16. Jan 17:29 /tmp/vmlinux.profraw

For me there was no prof-data collected from my defconfig kernel-build.

> +5) (Optional) Download the raw profile data to the HOST machine.
> +
> +6) Process the raw profile data
> +
> +   .. code-block:: sh
> +
> +      $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> +

Is that executed in /path/to/linux/git?

> +   Note that multiple raw profile data files can be merged during this step.
> +
> +7) Rebuild the kernel using the profile data (PGO disabled)
> +
> +   .. code-block:: sh
> +
> +      $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...

How big is vmlinux.profdata (make defconfig)?

Do I need to do a full defconfig build or can I stop the build after
let me say 10mins?

- Sedat -

> diff --git a/MAINTAINERS b/MAINTAINERS
> index 79b400c97059f..cb1f1f2b2baf4 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -13948,6 +13948,15 @@ S:     Maintained
>  F:     include/linux/personality.h
>  F:     include/uapi/linux/personality.h
>
> +PGO BASED KERNEL PROFILING
> +M:     Sami Tolvanen <samitolvanen@google.com>
> +M:     Bill Wendling <wcw@google.com>
> +R:     Nathan Chancellor <natechancellor@gmail.com>
> +R:     Nick Desaulniers <ndesaulniers@google.com>
> +S:     Supported
> +F:     Documentation/dev-tools/pgo.rst
> +F:     kernel/pgo
> +
>  PHOENIX RC FLIGHT CONTROLLER ADAPTER
>  M:     Marcus Folkesson <marcus.folkesson@gmail.com>
>  L:     linux-input@vger.kernel.org
> diff --git a/Makefile b/Makefile
> index 9e73f82e0d863..9128bfe1ccc97 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -659,6 +659,9 @@ endif # KBUILD_EXTMOD
>  # Defaults to vmlinux, but the arch makefile usually adds further targets
>  all: vmlinux
>
> +CFLAGS_PGO_CLANG := -fprofile-generate
> +export CFLAGS_PGO_CLANG
> +
>  CFLAGS_GCOV    := -fprofile-arcs -ftest-coverage \
>         $(call cc-option,-fno-tree-loop-im) \
>         $(call cc-disable-warning,maybe-uninitialized,)
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 24862d15f3a36..f39d3991f6bfe 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -1112,6 +1112,7 @@ config ARCH_SPLIT_ARG64
>            pairs of 32-bit arguments, select this option.
>
>  source "kernel/gcov/Kconfig"
> +source "kernel/pgo/Kconfig"
>
>  source "scripts/gcc-plugins/Kconfig"
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 21f851179ff08..36305ea61dc09 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -96,6 +96,7 @@ config X86
>         select ARCH_SUPPORTS_DEBUG_PAGEALLOC
>         select ARCH_SUPPORTS_NUMA_BALANCING     if X86_64
>         select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP       if NR_CPUS <= 4096
> +       select ARCH_SUPPORTS_PGO_CLANG          if X86_64
>         select ARCH_USE_BUILTIN_BSWAP
>         select ARCH_USE_QUEUED_RWLOCKS
>         select ARCH_USE_QUEUED_SPINLOCKS
> diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
> index fe605205b4ce2..383853e32f673 100644
> --- a/arch/x86/boot/Makefile
> +++ b/arch/x86/boot/Makefile
> @@ -71,6 +71,7 @@ KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
>  KBUILD_CFLAGS  += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
>  KBUILD_CFLAGS  += -fno-asynchronous-unwind-tables
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>  UBSAN_SANITIZE := n
>
>  $(obj)/bzImage: asflags-y  := $(SVGA_MODE)
> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> index e0bc3988c3faa..ed12ab65f6065 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -54,6 +54,7 @@ CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/
>
>  KBUILD_AFLAGS  := $(KBUILD_CFLAGS) -D__ASSEMBLY__
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>  UBSAN_SANITIZE :=n
>
>  KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
> diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
> index a31de0c6ccde2..775fa0b368e98 100644
> --- a/arch/x86/crypto/Makefile
> +++ b/arch/x86/crypto/Makefile
> @@ -4,6 +4,8 @@
>
>  OBJECT_FILES_NON_STANDARD := y
>
> +PGO_PROFILE_curve25519-x86_64.o := n
> +
>  obj-$(CONFIG_CRYPTO_GLUE_HELPER_X86) += glue_helper.o
>
>  obj-$(CONFIG_CRYPTO_TWOFISH_586) += twofish-i586.o
> diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
> index 02e3e42f380bd..26e2b3af0145c 100644
> --- a/arch/x86/entry/vdso/Makefile
> +++ b/arch/x86/entry/vdso/Makefile
> @@ -179,6 +179,7 @@ quiet_cmd_vdso = VDSO    $@
>  VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 \
>         $(call ld-option, --eh-frame-hdr) -Bsymbolic
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>
>  quiet_cmd_vdso_and_check = VDSO    $@
>        cmd_vdso_and_check = $(cmd_vdso); $(cmd_vdso_check)
> diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> index efd9e9ea17f25..f6cab2316c46a 100644
> --- a/arch/x86/kernel/vmlinux.lds.S
> +++ b/arch/x86/kernel/vmlinux.lds.S
> @@ -184,6 +184,8 @@ SECTIONS
>
>         BUG_TABLE
>
> +       PGO_CLANG_DATA
> +
>         ORC_UNWIND_TABLE
>
>         . = ALIGN(PAGE_SIZE);
> diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile
> index 84b09c230cbd5..5f22b31446ad4 100644
> --- a/arch/x86/platform/efi/Makefile
> +++ b/arch/x86/platform/efi/Makefile
> @@ -2,6 +2,7 @@
>  OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y
>  KASAN_SANITIZE := n
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>
>  obj-$(CONFIG_EFI)              += quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o
>  obj-$(CONFIG_EFI_MIXED)                += efi_thunk_$(BITS).o
> diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
> index 95ea17a9d20cb..36f20e99da0bc 100644
> --- a/arch/x86/purgatory/Makefile
> +++ b/arch/x86/purgatory/Makefile
> @@ -23,6 +23,7 @@ targets += purgatory.ro purgatory.chk
>
>  # Sanitizer, etc. runtimes are unavailable and cannot be linked here.
>  GCOV_PROFILE   := n
> +PGO_PROFILE    := n
>  KASAN_SANITIZE := n
>  UBSAN_SANITIZE := n
>  KCSAN_SANITIZE := n
> diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
> index 83f1b6a56449f..21797192f958f 100644
> --- a/arch/x86/realmode/rm/Makefile
> +++ b/arch/x86/realmode/rm/Makefile
> @@ -76,4 +76,5 @@ KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
>  KBUILD_AFLAGS  := $(KBUILD_CFLAGS) -D__ASSEMBLY__
>  KBUILD_CFLAGS  += -fno-asynchronous-unwind-tables
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>  UBSAN_SANITIZE := n
> diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile
> index 5943387e3f357..54f5768f58530 100644
> --- a/arch/x86/um/vdso/Makefile
> +++ b/arch/x86/um/vdso/Makefile
> @@ -64,6 +64,7 @@ quiet_cmd_vdso = VDSO    $@
>
>  VDSO_LDFLAGS = -fPIC -shared -Wl,--hash-style=sysv
>  GCOV_PROFILE := n
> +PGO_PROFILE := n
>
>  #
>  # Install the unstripped copy of vdso*.so listed in $(vdso-install-y).
> diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> index 8a94388e38b33..2d81623b33f29 100644
> --- a/drivers/firmware/efi/libstub/Makefile
> +++ b/drivers/firmware/efi/libstub/Makefile
> @@ -40,6 +40,7 @@ KBUILD_CFLAGS                 := $(cflags-y) -Os -DDISABLE_BRANCH_PROFILING \
>  KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))
>
>  GCOV_PROFILE                   := n
> +PGO_PROFILE                    := n
>  # Sanitizer runtimes are unavailable and cannot be linked here.
>  KASAN_SANITIZE                 := n
>  KCSAN_SANITIZE                 := n
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index b2b3d81b1535a..3a591bb18c5fb 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -316,6 +316,49 @@
>  #define THERMAL_TABLE(name)
>  #endif
>
> +#ifdef CONFIG_PGO_CLANG
> +#define PGO_CLANG_DATA                                                 \
> +       __llvm_prf_data : AT(ADDR(__llvm_prf_data) - LOAD_OFFSET) {     \
> +               . = ALIGN(8);                                           \
> +               __llvm_prf_start = .;                                   \
> +               __llvm_prf_data_start = .;                              \
> +               KEEP(*(__llvm_prf_data))                                \
> +               . = ALIGN(8);                                           \
> +               __llvm_prf_data_end = .;                                \
> +       }                                                               \
> +       __llvm_prf_cnts : AT(ADDR(__llvm_prf_cnts) - LOAD_OFFSET) {     \
> +               . = ALIGN(8);                                           \
> +               __llvm_prf_cnts_start = .;                              \
> +               KEEP(*(__llvm_prf_cnts))                                \
> +               . = ALIGN(8);                                           \
> +               __llvm_prf_cnts_end = .;                                \
> +       }                                                               \
> +       __llvm_prf_names : AT(ADDR(__llvm_prf_names) - LOAD_OFFSET) {   \
> +               . = ALIGN(8);                                           \
> +               __llvm_prf_names_start = .;                             \
> +               KEEP(*(__llvm_prf_names))                               \
> +               . = ALIGN(8);                                           \
> +               __llvm_prf_names_end = .;                               \
> +               . = ALIGN(8);                                           \
> +       }                                                               \
> +       __llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) {     \
> +               __llvm_prf_vals_start = .;                              \
> +               KEEP(*(__llvm_prf_vals))                                \
> +               . = ALIGN(8);                                           \
> +               __llvm_prf_vals_end = .;                                \
> +               . = ALIGN(8);                                           \
> +       }                                                               \
> +       __llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) {     \
> +               __llvm_prf_vnds_start = .;                              \
> +               KEEP(*(__llvm_prf_vnds))                                \
> +               . = ALIGN(8);                                           \
> +               __llvm_prf_vnds_end = .;                                \
> +               __llvm_prf_end = .;                                     \
> +       }
> +#else
> +#define PGO_CLANG_DATA
> +#endif
> +
>  #define KERNEL_DTB()                                                   \
>         STRUCT_ALIGN();                                                 \
>         __dtb_start = .;                                                \
> @@ -1125,6 +1168,7 @@
>                 CONSTRUCTORS                                            \
>         }                                                               \
>         BUG_TABLE                                                       \
> +       PGO_CLANG_DATA
>
>  #define INIT_TEXT_SECTION(inittext_align)                              \
>         . = ALIGN(inittext_align);                                      \
> diff --git a/kernel/Makefile b/kernel/Makefile
> index aa7368c7eabf3..0b34ca228ba46 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -111,6 +111,7 @@ obj-$(CONFIG_BPF) += bpf/
>  obj-$(CONFIG_KCSAN) += kcsan/
>  obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
>  obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o
> +obj-$(CONFIG_PGO_CLANG) += pgo/
>
>  obj-$(CONFIG_PERF_EVENTS) += events/
>
> diff --git a/kernel/pgo/Kconfig b/kernel/pgo/Kconfig
> new file mode 100644
> index 0000000000000..76a640b6cf6ed
> --- /dev/null
> +++ b/kernel/pgo/Kconfig
> @@ -0,0 +1,35 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +menu "Profile Guided Optimization (PGO) (EXPERIMENTAL)"
> +
> +config ARCH_SUPPORTS_PGO_CLANG
> +       bool
> +
> +config PGO_CLANG
> +       bool "Enable clang's PGO-based kernel profiling"
> +       depends on DEBUG_FS
> +       depends on ARCH_SUPPORTS_PGO_CLANG
> +       depends on CC_IS_CLANG && CLANG_VERSION >= 120000
> +       help
> +         This option enables clang's PGO (Profile Guided Optimization) based
> +         code profiling to better optimize the kernel.
> +
> +         If unsure, say N.
> +
> +         Run a representative workload for your application on a kernel
> +         compiled with this option and download the raw profile file from
> +         /sys/kernel/debug/pgo/profraw. This file needs to be processed with
> +         llvm-profdata. It may be merged with other collected raw profiles.
> +
> +         Copy the resulting profile file into vmlinux.profdata, and enable
> +         KCFLAGS=-fprofile-use=vmlinux.profdata to produce an optimized
> +         kernel.
> +
> +         Note that a kernel compiled with profiling flags will be
> +         significantly larger and run slower. Also be sure to exclude files
> +         from profiling which are not linked to the kernel image to prevent
> +         linker errors.
> +
> +         Note that the debugfs filesystem has to be mounted to access
> +         profiling data.
> +
> +endmenu
> diff --git a/kernel/pgo/Makefile b/kernel/pgo/Makefile
> new file mode 100644
> index 0000000000000..41e27cefd9a47
> --- /dev/null
> +++ b/kernel/pgo/Makefile
> @@ -0,0 +1,5 @@
> +# SPDX-License-Identifier: GPL-2.0
> +GCOV_PROFILE   := n
> +PGO_PROFILE    := n
> +
> +obj-y  += fs.o instrument.o
> diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c
> new file mode 100644
> index 0000000000000..68b24672be10a
> --- /dev/null
> +++ b/kernel/pgo/fs.c
> @@ -0,0 +1,382 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2019 Google, Inc.
> + *
> + * Author:
> + *     Sami Tolvanen <samitolvanen@google.com>
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#define pr_fmt(fmt)    "pgo: " fmt
> +
> +#include <linux/kernel.h>
> +#include <linux/debugfs.h>
> +#include <linux/fs.h>
> +#include <linux/module.h>
> +#include <linux/slab.h>
> +#include <linux/vmalloc.h>
> +#include "pgo.h"
> +
> +static struct dentry *directory;
> +
> +struct prf_private_data {
> +       void *buffer;
> +       unsigned long size;
> +};
> +
> +/*
> + * Raw profile data format:
> + *
> + *     - llvm_prf_header
> + *     - __llvm_prf_data
> + *     - __llvm_prf_cnts
> + *     - __llvm_prf_names
> + *     - zero padding to 8 bytes
> + *     - for each llvm_prf_data in __llvm_prf_data:
> + *             - llvm_prf_value_data
> + *                     - llvm_prf_value_record + site count array
> + *                             - llvm_prf_value_node_data
> + *                             ...
> + *                     ...
> + *             ...
> + */
> +
> +static void prf_fill_header(void **buffer)
> +{
> +       struct llvm_prf_header *header = *(struct llvm_prf_header **)buffer;
> +
> +       header->magic = LLVM_PRF_MAGIC;
> +       header->version = LLVM_PRF_VARIANT_MASK_IR | LLVM_PRF_VERSION;
> +       header->data_size = prf_data_count();
> +       header->padding_bytes_before_counters = 0;
> +       header->counters_size = prf_cnts_count();
> +       header->padding_bytes_after_counters = 0;
> +       header->names_size = prf_names_count();
> +       header->counters_delta = (u64)__llvm_prf_cnts_start;
> +       header->names_delta = (u64)__llvm_prf_names_start;
> +       header->value_kind_last = LLVM_PRF_IPVK_LAST;
> +
> +       *buffer += sizeof(*header);
> +}
> +
> +/*
> + * Copy the source into the buffer, incrementing the pointer into buffer in the
> + * process.
> + */
> +static void prf_copy_to_buffer(void **buffer, void *src, unsigned long size)
> +{
> +       memcpy(*buffer, src, size);
> +       *buffer += size;
> +}
> +
> +static u32 __prf_get_value_size(struct llvm_prf_data *p, u32 *value_kinds)
> +{
> +       struct llvm_prf_value_node **nodes =
> +               (struct llvm_prf_value_node **)p->values;
> +       u32 kinds = 0;
> +       u32 size = 0;
> +       unsigned int kind;
> +       unsigned int n;
> +       unsigned int s = 0;
> +
> +       for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> +               unsigned int sites = p->num_value_sites[kind];
> +
> +               if (!sites)
> +                       continue;
> +
> +               /* Record + site count array */
> +               size += prf_get_value_record_size(sites);
> +               kinds++;
> +
> +               if (!nodes)
> +                       continue;
> +
> +               for (n = 0; n < sites; n++) {
> +                       u32 count = 0;
> +                       struct llvm_prf_value_node *site = nodes[s + n];
> +
> +                       while (site && ++count <= U8_MAX)
> +                               site = site->next;
> +
> +                       size += count *
> +                               sizeof(struct llvm_prf_value_node_data);
> +               }
> +
> +               s += sites;
> +       }
> +
> +       if (size)
> +               size += sizeof(struct llvm_prf_value_data);
> +
> +       if (value_kinds)
> +               *value_kinds = kinds;
> +
> +       return size;
> +}
> +
> +static u32 prf_get_value_size(void)
> +{
> +       u32 size = 0;
> +       struct llvm_prf_data *p;
> +
> +       for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> +               size += __prf_get_value_size(p, NULL);
> +
> +       return size;
> +}
> +
> +/* Serialize the profiling's value. */
> +static void prf_serialize_value(struct llvm_prf_data *p, void **buffer)
> +{
> +       struct llvm_prf_value_data header;
> +       struct llvm_prf_value_node **nodes =
> +               (struct llvm_prf_value_node **)p->values;
> +       unsigned int kind;
> +       unsigned int n;
> +       unsigned int s = 0;
> +
> +       header.total_size = __prf_get_value_size(p, &header.num_value_kinds);
> +
> +       if (!header.num_value_kinds)
> +               /* Nothing to write. */
> +               return;
> +
> +       prf_copy_to_buffer(buffer, &header, sizeof(header));
> +
> +       for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> +               struct llvm_prf_value_record *record;
> +               u8 *counts;
> +               unsigned int sites = p->num_value_sites[kind];
> +
> +               if (!sites)
> +                       continue;
> +
> +               /* Profiling value record. */
> +               record = *(struct llvm_prf_value_record **)buffer;
> +               *buffer += prf_get_value_record_header_size();
> +
> +               record->kind = kind;
> +               record->num_value_sites = sites;
> +
> +               /* Site count array. */
> +               counts = *(u8 **)buffer;
> +               *buffer += prf_get_value_record_site_count_size(sites);
> +
> +               /*
> +                * If we don't have nodes, we can skip updating the site count
> +                * array, because the buffer is zero filled.
> +                */
> +               if (!nodes)
> +                       continue;
> +
> +               for (n = 0; n < sites; n++) {
> +                       u32 count = 0;
> +                       struct llvm_prf_value_node *site = nodes[s + n];
> +
> +                       while (site && ++count <= U8_MAX) {
> +                               prf_copy_to_buffer(buffer, site,
> +                                                  sizeof(struct llvm_prf_value_node_data));
> +                               site = site->next;
> +                       }
> +
> +                       counts[n] = (u8)count;
> +               }
> +
> +               s += sites;
> +       }
> +}
> +
> +static void prf_serialize_values(void **buffer)
> +{
> +       struct llvm_prf_data *p;
> +
> +       for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> +               prf_serialize_value(p, buffer);
> +}
> +
> +static inline unsigned long prf_get_padding(unsigned long size)
> +{
> +       return 7 & (8 - size % 8);
> +}
> +
> +static unsigned long prf_buffer_size(void)
> +{
> +       return sizeof(struct llvm_prf_header) +
> +                       prf_data_size() +
> +                       prf_cnts_size() +
> +                       prf_names_size() +
> +                       prf_get_padding(prf_names_size()) +
> +                       prf_get_value_size();
> +}
> +
> +/* Serialize the profiling data into a format LLVM's tools can understand. */
> +static int prf_serialize(struct prf_private_data *p)
> +{
> +       int err = 0;
> +       void *buffer;
> +
> +       p->size = prf_buffer_size();
> +       p->buffer = vzalloc(p->size);
> +
> +       if (!p->buffer) {
> +               err = -ENOMEM;
> +               goto out;
> +       }
> +
> +       buffer = p->buffer;
> +
> +       prf_fill_header(&buffer);
> +       prf_copy_to_buffer(&buffer, __llvm_prf_data_start,  prf_data_size());
> +       prf_copy_to_buffer(&buffer, __llvm_prf_cnts_start,  prf_cnts_size());
> +       prf_copy_to_buffer(&buffer, __llvm_prf_names_start, prf_names_size());
> +       buffer += prf_get_padding(prf_names_size());
> +
> +       prf_serialize_values(&buffer);
> +
> +out:
> +       return err;
> +}
> +
> +/* open() implementation for PGO. Creates a copy of the profiling data set. */
> +static int prf_open(struct inode *inode, struct file *file)
> +{
> +       struct prf_private_data *data;
> +       unsigned long flags;
> +       int err;
> +
> +       data = kzalloc(sizeof(*data), GFP_KERNEL);
> +       if (!data) {
> +               err = -ENOMEM;
> +               goto out;
> +       }
> +
> +       flags = prf_lock();
> +
> +       err = prf_serialize(data);
> +       if (err) {
> +               kfree(data);
> +               goto out_unlock;
> +       }
> +
> +       file->private_data = data;
> +
> +out_unlock:
> +       prf_unlock(flags);
> +out:
> +       return err;
> +}
> +
> +/* read() implementation for PGO. */
> +static ssize_t prf_read(struct file *file, char __user *buf, size_t count,
> +                       loff_t *ppos)
> +{
> +       struct prf_private_data *data = file->private_data;
> +
> +       BUG_ON(!data);
> +
> +       return simple_read_from_buffer(buf, count, ppos, data->buffer,
> +                                      data->size);
> +}
> +
> +/* release() implementation for PGO. Release resources allocated by open(). */
> +static int prf_release(struct inode *inode, struct file *file)
> +{
> +       struct prf_private_data *data = file->private_data;
> +
> +       if (data) {
> +               vfree(data->buffer);
> +               kfree(data);
> +       }
> +
> +       return 0;
> +}
> +
> +static const struct file_operations prf_fops = {
> +       .owner          = THIS_MODULE,
> +       .open           = prf_open,
> +       .read           = prf_read,
> +       .llseek         = default_llseek,
> +       .release        = prf_release
> +};
> +
> +/* write() implementation for resetting PGO's profile data. */
> +static ssize_t reset_write(struct file *file, const char __user *addr,
> +                          size_t len, loff_t *pos)
> +{
> +       struct llvm_prf_data *data;
> +
> +       memset(__llvm_prf_cnts_start, 0, prf_cnts_size());
> +
> +       for (data = __llvm_prf_data_start; data < __llvm_prf_data_end; ++data) {
> +               struct llvm_prf_value_node **vnodes;
> +               u64 current_vsite_count;
> +               u32 i;
> +
> +               if (!data->values)
> +                       continue;
> +
> +               current_vsite_count = 0;
> +               vnodes = (struct llvm_prf_value_node **)data->values;
> +
> +               for (i = LLVM_PRF_IPVK_FIRST; i <= LLVM_PRF_IPVK_LAST; ++i)
> +                       current_vsite_count += data->num_value_sites[i];
> +
> +               for (i = 0; i < current_vsite_count; ++i) {
> +                       struct llvm_prf_value_node *current_vnode = vnodes[i];
> +
> +                       while (current_vnode) {
> +                               current_vnode->count = 0;
> +                               current_vnode = current_vnode->next;
> +                       }
> +               }
> +       }
> +
> +       return len;
> +}
> +
> +static const struct file_operations prf_reset_fops = {
> +       .owner          = THIS_MODULE,
> +       .write          = reset_write,
> +       .llseek         = noop_llseek,
> +};
> +
> +/* Create debugfs entries. */
> +static int __init pgo_init(void)
> +{
> +       directory = debugfs_create_dir("pgo", NULL);
> +       if (!directory)
> +               goto err_remove;
> +
> +       if (!debugfs_create_file("profraw", 0600, directory, NULL,
> +                                &prf_fops))
> +               goto err_remove;
> +
> +       if (!debugfs_create_file("reset", 0200, directory, NULL,
> +                                &prf_reset_fops))
> +               goto err_remove;
> +
> +       return 0;
> +
> +err_remove:
> +       pr_err("initialization failed\n");
> +       return -EIO;
> +}
> +
> +/* Remove debugfs entries. */
> +static void __exit pgo_exit(void)
> +{
> +       debugfs_remove_recursive(directory);
> +}
> +
> +module_init(pgo_init);
> +module_exit(pgo_exit);
> diff --git a/kernel/pgo/instrument.c b/kernel/pgo/instrument.c
> new file mode 100644
> index 0000000000000..6084ff0652e85
> --- /dev/null
> +++ b/kernel/pgo/instrument.c
> @@ -0,0 +1,185 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2019 Google, Inc.
> + *
> + * Author:
> + *     Sami Tolvanen <samitolvanen@google.com>
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#define pr_fmt(fmt)    "pgo: " fmt
> +
> +#include <linux/bitops.h>
> +#include <linux/kernel.h>
> +#include <linux/export.h>
> +#include <linux/spinlock.h>
> +#include <linux/types.h>
> +#include "pgo.h"
> +
> +/* Lock guarding value node access and serialization. */
> +static DEFINE_SPINLOCK(pgo_lock);
> +static int current_node;
> +
> +unsigned long prf_lock(void)
> +{
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&pgo_lock, flags);
> +
> +       return flags;
> +}
> +
> +void prf_unlock(unsigned long flags)
> +{
> +       spin_unlock_irqrestore(&pgo_lock, flags);
> +}
> +
> +/*
> + * Return a newly allocated profiling value node which contains the tracked
> + * value by the value profiler.
> + * Note: caller *must* hold pgo_lock.
> + */
> +static struct llvm_prf_value_node *allocate_node(struct llvm_prf_data *p,
> +                                                u32 index, u64 value)
> +{
> +       if (&__llvm_prf_vnds_start[current_node + 1] >= __llvm_prf_vnds_end)
> +               return NULL; /* Out of nodes */
> +
> +       current_node++;
> +
> +       /* Make sure the node is entirely within the section */
> +       if (&__llvm_prf_vnds_start[current_node] >= __llvm_prf_vnds_end ||
> +           &__llvm_prf_vnds_start[current_node + 1] > __llvm_prf_vnds_end)
> +               return NULL;
> +
> +       return &__llvm_prf_vnds_start[current_node];
> +}
> +
> +/*
> + * Counts the number of times a target value is seen.
> + *
> + * Records the target value for the CounterIndex if not seen before. Otherwise,
> + * increments the counter associated w/ the target value.
> + */
> +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index);
> +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index)
> +{
> +       struct llvm_prf_data *p = (struct llvm_prf_data *)data;
> +       struct llvm_prf_value_node **counters;
> +       struct llvm_prf_value_node *curr;
> +       struct llvm_prf_value_node *min = NULL;
> +       struct llvm_prf_value_node *prev = NULL;
> +       u64 min_count = U64_MAX;
> +       u8 values = 0;
> +       unsigned long flags;
> +
> +       if (!p || !p->values)
> +               return;
> +
> +       counters = (struct llvm_prf_value_node **)p->values;
> +       curr = counters[index];
> +
> +       while (curr) {
> +               if (target_value == curr->value) {
> +                       curr->count++;
> +                       return;
> +               }
> +
> +               if (curr->count < min_count) {
> +                       min_count = curr->count;
> +                       min = curr;
> +               }
> +
> +               prev = curr;
> +               curr = curr->next;
> +               values++;
> +       }
> +
> +       if (values >= LLVM_PRF_MAX_NUM_VALS_PER_SITE) {
> +               if (!min->count || !(--min->count)) {
> +                       curr = min;
> +                       curr->value = target_value;
> +                       curr->count++;
> +               }
> +               return;
> +       }
> +
> +       /* Lock when updating the value node structure. */
> +       flags = prf_lock();
> +
> +       curr = allocate_node(p, index, target_value);
> +       if (!curr)
> +               goto out;
> +
> +       curr->value = target_value;
> +       curr->count++;
> +
> +       if (!counters[index])
> +               counters[index] = curr;
> +       else if (prev && !prev->next)
> +               prev->next = curr;
> +
> +out:
> +       prf_unlock(flags);
> +}
> +EXPORT_SYMBOL(__llvm_profile_instrument_target);
> +
> +/* Counts the number of times a range of targets values are seen. */
> +void __llvm_profile_instrument_range(u64 target_value, void *data,
> +                                    u32 index, s64 precise_start,
> +                                    s64 precise_last, s64 large_value);
> +void __llvm_profile_instrument_range(u64 target_value, void *data,
> +                                    u32 index, s64 precise_start,
> +                                    s64 precise_last, s64 large_value)
> +{
> +       if (large_value != S64_MIN && (s64)target_value >= large_value)
> +               target_value = large_value;
> +       else if ((s64)target_value < precise_start ||
> +                (s64)target_value > precise_last)
> +               target_value = precise_last + 1;
> +
> +       __llvm_profile_instrument_target(target_value, data, index);
> +}
> +EXPORT_SYMBOL(__llvm_profile_instrument_range);
> +
> +static u64 inst_prof_get_range_rep_value(u64 value)
> +{
> +       if (value <= 8)
> +               /* The first ranges are individually tracked, us it as is. */
> +               return value;
> +       else if (value >= 513)
> +               /* The last range is mapped to its lowest value. */
> +               return 513;
> +       else if (hweight64(value) == 1)
> +               /* If it's a power of two, use it as is. */
> +               return value;
> +
> +       /* Otherwise, take to the previous power of two + 1. */
> +       return (1 << (64 - __builtin_clzll(value) - 1)) + 1;
> +}
> +
> +/*
> + * The target values are partitioned into multiple ranges. The range spec is
> + * defined in compiler-rt/include/profile/InstrProfData.inc.
> + */
> +void __llvm_profile_instrument_memop(u64 target_value, void *data,
> +                                    u32 counter_index);
> +void __llvm_profile_instrument_memop(u64 target_value, void *data,
> +                                    u32 counter_index)
> +{
> +       u64 rep_value;
> +
> +       /* Map the target value to the representative value of its range. */
> +       rep_value = inst_prof_get_range_rep_value(target_value);
> +       __llvm_profile_instrument_target(rep_value, data, counter_index);
> +}
> +EXPORT_SYMBOL(__llvm_profile_instrument_memop);
> diff --git a/kernel/pgo/pgo.h b/kernel/pgo/pgo.h
> new file mode 100644
> index 0000000000000..df0aa278f28bd
> --- /dev/null
> +++ b/kernel/pgo/pgo.h
> @@ -0,0 +1,206 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (C) 2019 Google, Inc.
> + *
> + * Author:
> + *     Sami Tolvanen <samitolvanen@google.com>
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#ifndef _PGO_H
> +#define _PGO_H
> +
> +/*
> + * Note: These internal LLVM definitions must match the compiler version.
> + * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code.
> + */
> +
> +#ifdef CONFIG_64BIT
> +       #define LLVM_PRF_MAGIC          \
> +               ((u64)255 << 56 |       \
> +                (u64)'l' << 48 |       \
> +                (u64)'p' << 40 |       \
> +                (u64)'r' << 32 |       \
> +                (u64)'o' << 24 |       \
> +                (u64)'f' << 16 |       \
> +                (u64)'r' << 8  |       \
> +                (u64)129)
> +#else
> +       #define LLVM_PRF_MAGIC          \
> +               ((u64)255 << 56 |       \
> +                (u64)'l' << 48 |       \
> +                (u64)'p' << 40 |       \
> +                (u64)'r' << 32 |       \
> +                (u64)'o' << 24 |       \
> +                (u64)'f' << 16 |       \
> +                (u64)'R' << 8  |       \
> +                (u64)129)
> +#endif
> +
> +#define LLVM_PRF_VERSION               5
> +#define LLVM_PRF_DATA_ALIGN            8
> +#define LLVM_PRF_IPVK_FIRST            0
> +#define LLVM_PRF_IPVK_LAST             1
> +#define LLVM_PRF_MAX_NUM_VALS_PER_SITE 16
> +
> +#define LLVM_PRF_VARIANT_MASK_IR       (0x1ull << 56)
> +#define LLVM_PRF_VARIANT_MASK_CSIR     (0x1ull << 57)
> +
> +/**
> + * struct llvm_prf_header - represents the raw profile header data structure.
> + * @magic: the magic token for the file format.
> + * @version: the version of the file format.
> + * @data_size: the number of entries in the profile data section.
> + * @padding_bytes_before_counters: the number of padding bytes before the
> + *   counters.
> + * @counters_size: the size in bytes of the LLVM profile section containing the
> + *   counters.
> + * @padding_bytes_after_counters: the number of padding bytes after the
> + *   counters.
> + * @names_size: the size in bytes of the LLVM profile section containing the
> + *   counters' names.
> + * @counters_delta: the beginning of the LLMV profile counters section.
> + * @names_delta: the beginning of the LLMV profile names section.
> + * @value_kind_last: the last profile value kind.
> + */
> +struct llvm_prf_header {
> +       u64 magic;
> +       u64 version;
> +       u64 data_size;
> +       u64 padding_bytes_before_counters;
> +       u64 counters_size;
> +       u64 padding_bytes_after_counters;
> +       u64 names_size;
> +       u64 counters_delta;
> +       u64 names_delta;
> +       u64 value_kind_last;
> +};
> +
> +/**
> + * struct llvm_prf_data - represents the per-function control structure.
> + * @name_ref: the reference to the function's name.
> + * @func_hash: the hash value of the function.
> + * @counter_ptr: a pointer to the profile counter.
> + * @function_ptr: a pointer to the function.
> + * @values: the profiling values associated with this function.
> + * @num_counters: the number of counters in the function.
> + * @num_value_sites: the number of value profile sites.
> + */
> +struct llvm_prf_data {
> +       const u64 name_ref;
> +       const u64 func_hash;
> +       const void *counter_ptr;
> +       const void *function_ptr;
> +       void *values;
> +       const u32 num_counters;
> +       const u16 num_value_sites[LLVM_PRF_IPVK_LAST + 1];
> +} __aligned(LLVM_PRF_DATA_ALIGN);
> +
> +/**
> + * structure llvm_prf_value_node_data - represents the data part of the struct
> + *   llvm_prf_value_node data structure.
> + * @value: the value counters.
> + * @count: the counters' count.
> + */
> +struct llvm_prf_value_node_data {
> +       u64 value;
> +       u64 count;
> +};
> +
> +/**
> + * struct llvm_prf_value_node - represents an internal data structure used by
> + *   the value profiler.
> + * @value: the value counters.
> + * @count: the counters' count.
> + * @next: the next value node.
> + */
> +struct llvm_prf_value_node {
> +       u64 value;
> +       u64 count;
> +       struct llvm_prf_value_node *next;
> +};
> +
> +/**
> + * struct llvm_prf_value_data - represents the value profiling data in indexed
> + *   format.
> + * @total_size: the total size in bytes including this field.
> + * @num_value_kinds: the number of value profile kinds that has value profile
> + *   data.
> + */
> +struct llvm_prf_value_data {
> +       u32 total_size;
> +       u32 num_value_kinds;
> +};
> +
> +/**
> + * struct llvm_prf_value_record - represents the on-disk layout of the value
> + *   profile data of a particular kind for one function.
> + * @kind: the kind of the value profile record.
> + * @num_value_sites: the number of value profile sites.
> + * @site_count_array: the first element of the array that stores the number
> + *   of profiled values for each value site.
> + */
> +struct llvm_prf_value_record {
> +       u32 kind;
> +       u32 num_value_sites;
> +       u8 site_count_array[];
> +};
> +
> +#define prf_get_value_record_header_size()             \
> +       offsetof(struct llvm_prf_value_record, site_count_array)
> +#define prf_get_value_record_site_count_size(sites)    \
> +       roundup((sites), 8)
> +#define prf_get_value_record_size(sites)               \
> +       (prf_get_value_record_header_size() +           \
> +        prf_get_value_record_site_count_size((sites)))
> +
> +/* Data sections */
> +extern struct llvm_prf_data __llvm_prf_data_start[];
> +extern struct llvm_prf_data __llvm_prf_data_end[];
> +
> +extern u64 __llvm_prf_cnts_start[];
> +extern u64 __llvm_prf_cnts_end[];
> +
> +extern char __llvm_prf_names_start[];
> +extern char __llvm_prf_names_end[];
> +
> +extern struct llvm_prf_value_node __llvm_prf_vnds_start[];
> +extern struct llvm_prf_value_node __llvm_prf_vnds_end[];
> +
> +/* Locking for vnodes */
> +extern unsigned long prf_lock(void);
> +extern void prf_unlock(unsigned long flags);
> +
> +#define __DEFINE_PRF_SIZE(s) \
> +       static inline unsigned long prf_ ## s ## _size(void)            \
> +       {                                                               \
> +               unsigned long start =                                   \
> +                       (unsigned long)__llvm_prf_ ## s ## _start;      \
> +               unsigned long end =                                     \
> +                       (unsigned long)__llvm_prf_ ## s ## _end;        \
> +               return roundup(end - start,                             \
> +                               sizeof(__llvm_prf_ ## s ## _start[0])); \
> +       }                                                               \
> +       static inline unsigned long prf_ ## s ## _count(void)           \
> +       {                                                               \
> +               return prf_ ## s ## _size() /                           \
> +                       sizeof(__llvm_prf_ ## s ## _start[0]);          \
> +       }
> +
> +__DEFINE_PRF_SIZE(data);
> +__DEFINE_PRF_SIZE(cnts);
> +__DEFINE_PRF_SIZE(names);
> +__DEFINE_PRF_SIZE(vnds);
> +
> +#undef __DEFINE_PRF_SIZE
> +
> +#endif /* _PGO_H */
> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> index 213677a5ed33e..9b218afb5cb87 100644
> --- a/scripts/Makefile.lib
> +++ b/scripts/Makefile.lib
> @@ -143,6 +143,16 @@ _c_flags += $(if $(patsubst n%,, \
>                 $(CFLAGS_GCOV))
>  endif
>
> +#
> +# Enable clang's PGO profiling flags for a file or directory depending on
> +# variables PGO_PROFILE_obj.o and PGO_PROFILE.
> +#
> +ifeq ($(CONFIG_PGO_CLANG),y)
> +_c_flags += $(if $(patsubst n%,, \
> +               $(PGO_PROFILE_$(basetarget).o)$(PGO_PROFILE)y), \
> +               $(CFLAGS_PGO_CLANG))
> +endif
> +
>  #
>  # Enable address sanitizer flags for kernel except some files or directories
>  # we don't want to check (depends on variables KASAN_SANITIZE_obj.o, KASAN_SANITIZE)
> --
> 2.30.0.284.gd98b1dd5eaa7-goog
>
> --
> You received this message because you are subscribed to the Google Groups "Clang Built Linux" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to clang-built-linux+unsubscribe@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/clang-built-linux/20210116094357.3620352-1-morbo%40google.com.

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v5] pgo: add clang's Profile Guided Optimization infrastructure
  2021-01-16 17:38         ` Sedat Dilek
@ 2021-01-16 18:36           ` Sedat Dilek
  2021-01-16 20:23           ` Bill Wendling
  1 sibling, 0 replies; 116+ messages in thread
From: Sedat Dilek @ 2021-01-16 18:36 UTC (permalink / raw)
  To: Bill Wendling
  Cc: Jonathan Corbet, Masahiro Yamada, linux-doc, linux-kernel,
	linux-kbuild, Clang-Built-Linux ML, Andrew Morton,
	Nathan Chancellor, Nick Desaulniers, Sami Tolvanen

On Sat, Jan 16, 2021 at 6:38 PM Sedat Dilek <sedat.dilek@gmail.com> wrote:
>
> On Sat, Jan 16, 2021 at 10:44 AM 'Bill Wendling' via Clang Built Linux
> <clang-built-linux@googlegroups.com> wrote:
> >
> > From: Sami Tolvanen <samitolvanen@google.com>
> >
> > Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > profile, the kernel is instrumented with PGO counters, a representative
> > workload is run, and the raw profile data is collected from
> > /sys/kernel/debug/pgo/profraw.
> >
> > The raw profile data must be processed by clang's "llvm-profdata" tool
> > before it can be used during recompilation:
> >
> >   $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> >   $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> >
> > Multiple raw profiles may be merged during this step.
> >
> > The data can now be used by the compiler:
> >
> >   $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> >
> > This initial submission is restricted to x86, as that's the platform we
> > know works. This restriction can be lifted once other platforms have
> > been verified to work with PGO.
> >
> > Note that this method of profiling the kernel is clang-native, unlike
> > the clang support in kernel/gcov.
> >
> > [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> >
> > Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
> > Co-developed-by: Bill Wendling <morbo@google.com>
> > Signed-off-by: Bill Wendling <morbo@google.com>
> > ---
> > v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
> >       testing.
> >     - Corrected documentation, re PGO flags when using LTO, based on Fangrui
> >       Song's comments.
> > v3: - Added change log section based on Sedat Dilek's comments.
> > v4: - Remove non-x86 Makfile changes and se "hweight64" instead of using our
> >       own popcount implementation, based on Nick Desaulniers's comment.
> > v5: - Correct padding calculation, discovered by Nathan Chancellor.
> > ---
> >  Documentation/dev-tools/index.rst     |   1 +
> >  Documentation/dev-tools/pgo.rst       | 127 +++++++++
> >  MAINTAINERS                           |   9 +
> >  Makefile                              |   3 +
> >  arch/Kconfig                          |   1 +
> >  arch/x86/Kconfig                      |   1 +
> >  arch/x86/boot/Makefile                |   1 +
> >  arch/x86/boot/compressed/Makefile     |   1 +
> >  arch/x86/crypto/Makefile              |   2 +
> >  arch/x86/entry/vdso/Makefile          |   1 +
> >  arch/x86/kernel/vmlinux.lds.S         |   2 +
> >  arch/x86/platform/efi/Makefile        |   1 +
> >  arch/x86/purgatory/Makefile           |   1 +
> >  arch/x86/realmode/rm/Makefile         |   1 +
> >  arch/x86/um/vdso/Makefile             |   1 +
> >  drivers/firmware/efi/libstub/Makefile |   1 +
> >  include/asm-generic/vmlinux.lds.h     |  44 +++
> >  kernel/Makefile                       |   1 +
> >  kernel/pgo/Kconfig                    |  35 +++
> >  kernel/pgo/Makefile                   |   5 +
> >  kernel/pgo/fs.c                       | 382 ++++++++++++++++++++++++++
> >  kernel/pgo/instrument.c               | 185 +++++++++++++
> >  kernel/pgo/pgo.h                      | 206 ++++++++++++++
> >  scripts/Makefile.lib                  |  10 +
> >  24 files changed, 1022 insertions(+)
> >  create mode 100644 Documentation/dev-tools/pgo.rst
> >  create mode 100644 kernel/pgo/Kconfig
> >  create mode 100644 kernel/pgo/Makefile
> >  create mode 100644 kernel/pgo/fs.c
> >  create mode 100644 kernel/pgo/instrument.c
> >  create mode 100644 kernel/pgo/pgo.h
> >
> > diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> > index f7809c7b1ba9e..8d6418e858062 100644
> > --- a/Documentation/dev-tools/index.rst
> > +++ b/Documentation/dev-tools/index.rst
> > @@ -26,6 +26,7 @@ whole; patches welcome!
> >     kgdb
> >     kselftest
> >     kunit/index
> > +   pgo
> >
> >
> >  .. only::  subproject and html
> > diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
> > new file mode 100644
> > index 0000000000000..b7f11d8405b73
> > --- /dev/null
> > +++ b/Documentation/dev-tools/pgo.rst
> > @@ -0,0 +1,127 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +===============================
> > +Using PGO with the Linux kernel
> > +===============================
> > +
> > +Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
> > +when building with Clang. The profiling data is exported via the ``pgo``
> > +debugfs directory.
> > +
> > +.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> > +
> > +
> > +Preparation
> > +===========
> > +
> > +Configure the kernel with:
> > +
> > +.. code-block:: make
> > +
> > +   CONFIG_DEBUG_FS=y
> > +   CONFIG_PGO_CLANG=y
> > +
> > +Note that kernels compiled with profiling flags will be significantly larger
> > +and run slower.
> > +
> > +Profiling data will only become accessible once debugfs has been mounted:
> > +
> > +.. code-block:: sh
> > +
> > +   mount -t debugfs none /sys/kernel/debug
> > +
> > +
> > +Customization
> > +=============
> > +
> > +You can enable or disable profiling for individual file and directories by
> > +adding a line similar to the following to the respective kernel Makefile:
> > +
> > +- For a single file (e.g. main.o)
> > +
> > +  .. code-block:: make
> > +
> > +     PGO_PROFILE_main.o := y
> > +
> > +- For all files in one directory
> > +
> > +  .. code-block:: make
> > +
> > +     PGO_PROFILE := y
> > +
> > +To exclude files from being profiled use
> > +
> > +  .. code-block:: make
> > +
> > +     PGO_PROFILE_main.o := n
> > +
> > +and
> > +
> > +  .. code-block:: make
> > +
> > +     PGO_PROFILE := n
> > +
> > +Only files which are linked to the main kernel image or are compiled as kernel
> > +modules are supported by this mechanism.
> > +
> > +
> > +Files
> > +=====
> > +
> > +The PGO kernel support creates the following files in debugfs:
> > +
> > +``/sys/kernel/debug/pgo``
> > +       Parent directory for all PGO-related files.
> > +
> > +``/sys/kernel/debug/pgo/reset``
> > +       Global reset file: resets all coverage data to zero when written to.
> > +
> > +``/sys/kernel/debug/profraw``
> > +       The raw PGO data that must be processed with ``llvm_profdata``.
> > +
> > +
> > +Workflow
> > +========
> > +
> > +The PGO kernel can be run on the host or test machines. The data though should
> > +be analyzed with Clang's tools from the same Clang version as the kernel was
> > +compiled. Clang's tolerant of version skew, but it's easier to use the same
> > +Clang version.
> > +
> > +The profiling data is useful for optimizing the kernel, analyzing coverage,
> > +etc. Clang offers tools to perform these tasks.
> > +
> > +Here is an example workflow for profiling an instrumented kernel with PGO and
> > +using the result to optimize the kernel:
> > +
> > +1) Install the kernel on the TEST machine.
> > +
> > +2) Reset the data counters right before running the load tests
> > +
> > +   .. code-block:: sh
> > +
> > +      $ echo 1 > /sys/kernel/debug/pgo/reset
> > +
>
> I do not get this...
>
> # mount | grep debugfs
> debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
>

I tried:

# umount /sys/kernel/debug

# mount -t debugfs none /sys/kernel/debug

# echo 1 > /sys/kernel/debug/pgo/reset

*** Run load-test ***

Again the profraw file is younger.

# LC_ALL=C ll /sys/kernel/debug/pgo/
total 0
drwxr-xr-x  2 root root 0 Jan 16 17:29 .
drwx------ 41 root root 0 Jan 16 17:29 ..
-rw-------  1 root root 0 Jan 16 19:14 profraw
--w-------  1 root root 0 Jan 16 19:29 reset

Did this really profile my kernel-build?

- Sedat -

> After the load-test...?
>
> echo 0 > /sys/kernel/debug/pgo/reset
>
> > +3) Run the load tests.
> > +
> > +4) Collect the raw profile data
> > +
> > +   .. code-block:: sh
> > +
> > +      $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> > +
>
> This is only 4,9M small and seen from the date 5mins before I run the
> echo-1 line.
>
> # ll /sys/kernel/debug/pgo
> insgesamt 0
> drwxr-xr-x  2 root root 0 16. Jan 17:29 .
> drwx------ 41 root root 0 16. Jan 17:29 ..
> -rw-------  1 root root 0 16. Jan 17:29 profraw
> --w-------  1 root root 0 16. Jan 18:19 reset
>
> # cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
>
> # ll /tmp/vmlinux.profraw
> -rw------- 1 root root 4,9M 16. Jan 17:29 /tmp/vmlinux.profraw
>
> For me there was no prof-data collected from my defconfig kernel-build.
>
> > +5) (Optional) Download the raw profile data to the HOST machine.
> > +
> > +6) Process the raw profile data
> > +
> > +   .. code-block:: sh
> > +
> > +      $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > +
>
> Is that executed in /path/to/linux/git?
>
> > +   Note that multiple raw profile data files can be merged during this step.
> > +
> > +7) Rebuild the kernel using the profile data (PGO disabled)
> > +
> > +   .. code-block:: sh
> > +
> > +      $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
>
> How big is vmlinux.profdata (make defconfig)?
>
> Do I need to do a full defconfig build or can I stop the build after
> let me say 10mins?
>
> - Sedat -
>
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 79b400c97059f..cb1f1f2b2baf4 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -13948,6 +13948,15 @@ S:     Maintained
> >  F:     include/linux/personality.h
> >  F:     include/uapi/linux/personality.h
> >
> > +PGO BASED KERNEL PROFILING
> > +M:     Sami Tolvanen <samitolvanen@google.com>
> > +M:     Bill Wendling <wcw@google.com>
> > +R:     Nathan Chancellor <natechancellor@gmail.com>
> > +R:     Nick Desaulniers <ndesaulniers@google.com>
> > +S:     Supported
> > +F:     Documentation/dev-tools/pgo.rst
> > +F:     kernel/pgo
> > +
> >  PHOENIX RC FLIGHT CONTROLLER ADAPTER
> >  M:     Marcus Folkesson <marcus.folkesson@gmail.com>
> >  L:     linux-input@vger.kernel.org
> > diff --git a/Makefile b/Makefile
> > index 9e73f82e0d863..9128bfe1ccc97 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -659,6 +659,9 @@ endif # KBUILD_EXTMOD
> >  # Defaults to vmlinux, but the arch makefile usually adds further targets
> >  all: vmlinux
> >
> > +CFLAGS_PGO_CLANG := -fprofile-generate
> > +export CFLAGS_PGO_CLANG
> > +
> >  CFLAGS_GCOV    := -fprofile-arcs -ftest-coverage \
> >         $(call cc-option,-fno-tree-loop-im) \
> >         $(call cc-disable-warning,maybe-uninitialized,)
> > diff --git a/arch/Kconfig b/arch/Kconfig
> > index 24862d15f3a36..f39d3991f6bfe 100644
> > --- a/arch/Kconfig
> > +++ b/arch/Kconfig
> > @@ -1112,6 +1112,7 @@ config ARCH_SPLIT_ARG64
> >            pairs of 32-bit arguments, select this option.
> >
> >  source "kernel/gcov/Kconfig"
> > +source "kernel/pgo/Kconfig"
> >
> >  source "scripts/gcc-plugins/Kconfig"
> >
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index 21f851179ff08..36305ea61dc09 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -96,6 +96,7 @@ config X86
> >         select ARCH_SUPPORTS_DEBUG_PAGEALLOC
> >         select ARCH_SUPPORTS_NUMA_BALANCING     if X86_64
> >         select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP       if NR_CPUS <= 4096
> > +       select ARCH_SUPPORTS_PGO_CLANG          if X86_64
> >         select ARCH_USE_BUILTIN_BSWAP
> >         select ARCH_USE_QUEUED_RWLOCKS
> >         select ARCH_USE_QUEUED_SPINLOCKS
> > diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
> > index fe605205b4ce2..383853e32f673 100644
> > --- a/arch/x86/boot/Makefile
> > +++ b/arch/x86/boot/Makefile
> > @@ -71,6 +71,7 @@ KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> >  KBUILD_CFLAGS  += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
> >  KBUILD_CFLAGS  += -fno-asynchronous-unwind-tables
> >  GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >  UBSAN_SANITIZE := n
> >
> >  $(obj)/bzImage: asflags-y  := $(SVGA_MODE)
> > diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> > index e0bc3988c3faa..ed12ab65f6065 100644
> > --- a/arch/x86/boot/compressed/Makefile
> > +++ b/arch/x86/boot/compressed/Makefile
> > @@ -54,6 +54,7 @@ CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/
> >
> >  KBUILD_AFLAGS  := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> >  GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >  UBSAN_SANITIZE :=n
> >
> >  KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
> > diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
> > index a31de0c6ccde2..775fa0b368e98 100644
> > --- a/arch/x86/crypto/Makefile
> > +++ b/arch/x86/crypto/Makefile
> > @@ -4,6 +4,8 @@
> >
> >  OBJECT_FILES_NON_STANDARD := y
> >
> > +PGO_PROFILE_curve25519-x86_64.o := n
> > +
> >  obj-$(CONFIG_CRYPTO_GLUE_HELPER_X86) += glue_helper.o
> >
> >  obj-$(CONFIG_CRYPTO_TWOFISH_586) += twofish-i586.o
> > diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
> > index 02e3e42f380bd..26e2b3af0145c 100644
> > --- a/arch/x86/entry/vdso/Makefile
> > +++ b/arch/x86/entry/vdso/Makefile
> > @@ -179,6 +179,7 @@ quiet_cmd_vdso = VDSO    $@
> >  VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 \
> >         $(call ld-option, --eh-frame-hdr) -Bsymbolic
> >  GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >
> >  quiet_cmd_vdso_and_check = VDSO    $@
> >        cmd_vdso_and_check = $(cmd_vdso); $(cmd_vdso_check)
> > diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> > index efd9e9ea17f25..f6cab2316c46a 100644
> > --- a/arch/x86/kernel/vmlinux.lds.S
> > +++ b/arch/x86/kernel/vmlinux.lds.S
> > @@ -184,6 +184,8 @@ SECTIONS
> >
> >         BUG_TABLE
> >
> > +       PGO_CLANG_DATA
> > +
> >         ORC_UNWIND_TABLE
> >
> >         . = ALIGN(PAGE_SIZE);
> > diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile
> > index 84b09c230cbd5..5f22b31446ad4 100644
> > --- a/arch/x86/platform/efi/Makefile
> > +++ b/arch/x86/platform/efi/Makefile
> > @@ -2,6 +2,7 @@
> >  OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y
> >  KASAN_SANITIZE := n
> >  GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >
> >  obj-$(CONFIG_EFI)              += quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o
> >  obj-$(CONFIG_EFI_MIXED)                += efi_thunk_$(BITS).o
> > diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
> > index 95ea17a9d20cb..36f20e99da0bc 100644
> > --- a/arch/x86/purgatory/Makefile
> > +++ b/arch/x86/purgatory/Makefile
> > @@ -23,6 +23,7 @@ targets += purgatory.ro purgatory.chk
> >
> >  # Sanitizer, etc. runtimes are unavailable and cannot be linked here.
> >  GCOV_PROFILE   := n
> > +PGO_PROFILE    := n
> >  KASAN_SANITIZE := n
> >  UBSAN_SANITIZE := n
> >  KCSAN_SANITIZE := n
> > diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
> > index 83f1b6a56449f..21797192f958f 100644
> > --- a/arch/x86/realmode/rm/Makefile
> > +++ b/arch/x86/realmode/rm/Makefile
> > @@ -76,4 +76,5 @@ KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
> >  KBUILD_AFLAGS  := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> >  KBUILD_CFLAGS  += -fno-asynchronous-unwind-tables
> >  GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >  UBSAN_SANITIZE := n
> > diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile
> > index 5943387e3f357..54f5768f58530 100644
> > --- a/arch/x86/um/vdso/Makefile
> > +++ b/arch/x86/um/vdso/Makefile
> > @@ -64,6 +64,7 @@ quiet_cmd_vdso = VDSO    $@
> >
> >  VDSO_LDFLAGS = -fPIC -shared -Wl,--hash-style=sysv
> >  GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >
> >  #
> >  # Install the unstripped copy of vdso*.so listed in $(vdso-install-y).
> > diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> > index 8a94388e38b33..2d81623b33f29 100644
> > --- a/drivers/firmware/efi/libstub/Makefile
> > +++ b/drivers/firmware/efi/libstub/Makefile
> > @@ -40,6 +40,7 @@ KBUILD_CFLAGS                 := $(cflags-y) -Os -DDISABLE_BRANCH_PROFILING \
> >  KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))
> >
> >  GCOV_PROFILE                   := n
> > +PGO_PROFILE                    := n
> >  # Sanitizer runtimes are unavailable and cannot be linked here.
> >  KASAN_SANITIZE                 := n
> >  KCSAN_SANITIZE                 := n
> > diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> > index b2b3d81b1535a..3a591bb18c5fb 100644
> > --- a/include/asm-generic/vmlinux.lds.h
> > +++ b/include/asm-generic/vmlinux.lds.h
> > @@ -316,6 +316,49 @@
> >  #define THERMAL_TABLE(name)
> >  #endif
> >
> > +#ifdef CONFIG_PGO_CLANG
> > +#define PGO_CLANG_DATA                                                 \
> > +       __llvm_prf_data : AT(ADDR(__llvm_prf_data) - LOAD_OFFSET) {     \
> > +               . = ALIGN(8);                                           \
> > +               __llvm_prf_start = .;                                   \
> > +               __llvm_prf_data_start = .;                              \
> > +               KEEP(*(__llvm_prf_data))                                \
> > +               . = ALIGN(8);                                           \
> > +               __llvm_prf_data_end = .;                                \
> > +       }                                                               \
> > +       __llvm_prf_cnts : AT(ADDR(__llvm_prf_cnts) - LOAD_OFFSET) {     \
> > +               . = ALIGN(8);                                           \
> > +               __llvm_prf_cnts_start = .;                              \
> > +               KEEP(*(__llvm_prf_cnts))                                \
> > +               . = ALIGN(8);                                           \
> > +               __llvm_prf_cnts_end = .;                                \
> > +       }                                                               \
> > +       __llvm_prf_names : AT(ADDR(__llvm_prf_names) - LOAD_OFFSET) {   \
> > +               . = ALIGN(8);                                           \
> > +               __llvm_prf_names_start = .;                             \
> > +               KEEP(*(__llvm_prf_names))                               \
> > +               . = ALIGN(8);                                           \
> > +               __llvm_prf_names_end = .;                               \
> > +               . = ALIGN(8);                                           \
> > +       }                                                               \
> > +       __llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) {     \
> > +               __llvm_prf_vals_start = .;                              \
> > +               KEEP(*(__llvm_prf_vals))                                \
> > +               . = ALIGN(8);                                           \
> > +               __llvm_prf_vals_end = .;                                \
> > +               . = ALIGN(8);                                           \
> > +       }                                                               \
> > +       __llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) {     \
> > +               __llvm_prf_vnds_start = .;                              \
> > +               KEEP(*(__llvm_prf_vnds))                                \
> > +               . = ALIGN(8);                                           \
> > +               __llvm_prf_vnds_end = .;                                \
> > +               __llvm_prf_end = .;                                     \
> > +       }
> > +#else
> > +#define PGO_CLANG_DATA
> > +#endif
> > +
> >  #define KERNEL_DTB()                                                   \
> >         STRUCT_ALIGN();                                                 \
> >         __dtb_start = .;                                                \
> > @@ -1125,6 +1168,7 @@
> >                 CONSTRUCTORS                                            \
> >         }                                                               \
> >         BUG_TABLE                                                       \
> > +       PGO_CLANG_DATA
> >
> >  #define INIT_TEXT_SECTION(inittext_align)                              \
> >         . = ALIGN(inittext_align);                                      \
> > diff --git a/kernel/Makefile b/kernel/Makefile
> > index aa7368c7eabf3..0b34ca228ba46 100644
> > --- a/kernel/Makefile
> > +++ b/kernel/Makefile
> > @@ -111,6 +111,7 @@ obj-$(CONFIG_BPF) += bpf/
> >  obj-$(CONFIG_KCSAN) += kcsan/
> >  obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
> >  obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o
> > +obj-$(CONFIG_PGO_CLANG) += pgo/
> >
> >  obj-$(CONFIG_PERF_EVENTS) += events/
> >
> > diff --git a/kernel/pgo/Kconfig b/kernel/pgo/Kconfig
> > new file mode 100644
> > index 0000000000000..76a640b6cf6ed
> > --- /dev/null
> > +++ b/kernel/pgo/Kconfig
> > @@ -0,0 +1,35 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +menu "Profile Guided Optimization (PGO) (EXPERIMENTAL)"
> > +
> > +config ARCH_SUPPORTS_PGO_CLANG
> > +       bool
> > +
> > +config PGO_CLANG
> > +       bool "Enable clang's PGO-based kernel profiling"
> > +       depends on DEBUG_FS
> > +       depends on ARCH_SUPPORTS_PGO_CLANG
> > +       depends on CC_IS_CLANG && CLANG_VERSION >= 120000
> > +       help
> > +         This option enables clang's PGO (Profile Guided Optimization) based
> > +         code profiling to better optimize the kernel.
> > +
> > +         If unsure, say N.
> > +
> > +         Run a representative workload for your application on a kernel
> > +         compiled with this option and download the raw profile file from
> > +         /sys/kernel/debug/pgo/profraw. This file needs to be processed with
> > +         llvm-profdata. It may be merged with other collected raw profiles.
> > +
> > +         Copy the resulting profile file into vmlinux.profdata, and enable
> > +         KCFLAGS=-fprofile-use=vmlinux.profdata to produce an optimized
> > +         kernel.
> > +
> > +         Note that a kernel compiled with profiling flags will be
> > +         significantly larger and run slower. Also be sure to exclude files
> > +         from profiling which are not linked to the kernel image to prevent
> > +         linker errors.
> > +
> > +         Note that the debugfs filesystem has to be mounted to access
> > +         profiling data.
> > +
> > +endmenu
> > diff --git a/kernel/pgo/Makefile b/kernel/pgo/Makefile
> > new file mode 100644
> > index 0000000000000..41e27cefd9a47
> > --- /dev/null
> > +++ b/kernel/pgo/Makefile
> > @@ -0,0 +1,5 @@
> > +# SPDX-License-Identifier: GPL-2.0
> > +GCOV_PROFILE   := n
> > +PGO_PROFILE    := n
> > +
> > +obj-y  += fs.o instrument.o
> > diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c
> > new file mode 100644
> > index 0000000000000..68b24672be10a
> > --- /dev/null
> > +++ b/kernel/pgo/fs.c
> > @@ -0,0 +1,382 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (C) 2019 Google, Inc.
> > + *
> > + * Author:
> > + *     Sami Tolvanen <samitolvanen@google.com>
> > + *
> > + * This software is licensed under the terms of the GNU General Public
> > + * License version 2, as published by the Free Software Foundation, and
> > + * may be copied, distributed, and modified under those terms.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + */
> > +
> > +#define pr_fmt(fmt)    "pgo: " fmt
> > +
> > +#include <linux/kernel.h>
> > +#include <linux/debugfs.h>
> > +#include <linux/fs.h>
> > +#include <linux/module.h>
> > +#include <linux/slab.h>
> > +#include <linux/vmalloc.h>
> > +#include "pgo.h"
> > +
> > +static struct dentry *directory;
> > +
> > +struct prf_private_data {
> > +       void *buffer;
> > +       unsigned long size;
> > +};
> > +
> > +/*
> > + * Raw profile data format:
> > + *
> > + *     - llvm_prf_header
> > + *     - __llvm_prf_data
> > + *     - __llvm_prf_cnts
> > + *     - __llvm_prf_names
> > + *     - zero padding to 8 bytes
> > + *     - for each llvm_prf_data in __llvm_prf_data:
> > + *             - llvm_prf_value_data
> > + *                     - llvm_prf_value_record + site count array
> > + *                             - llvm_prf_value_node_data
> > + *                             ...
> > + *                     ...
> > + *             ...
> > + */
> > +
> > +static void prf_fill_header(void **buffer)
> > +{
> > +       struct llvm_prf_header *header = *(struct llvm_prf_header **)buffer;
> > +
> > +       header->magic = LLVM_PRF_MAGIC;
> > +       header->version = LLVM_PRF_VARIANT_MASK_IR | LLVM_PRF_VERSION;
> > +       header->data_size = prf_data_count();
> > +       header->padding_bytes_before_counters = 0;
> > +       header->counters_size = prf_cnts_count();
> > +       header->padding_bytes_after_counters = 0;
> > +       header->names_size = prf_names_count();
> > +       header->counters_delta = (u64)__llvm_prf_cnts_start;
> > +       header->names_delta = (u64)__llvm_prf_names_start;
> > +       header->value_kind_last = LLVM_PRF_IPVK_LAST;
> > +
> > +       *buffer += sizeof(*header);
> > +}
> > +
> > +/*
> > + * Copy the source into the buffer, incrementing the pointer into buffer in the
> > + * process.
> > + */
> > +static void prf_copy_to_buffer(void **buffer, void *src, unsigned long size)
> > +{
> > +       memcpy(*buffer, src, size);
> > +       *buffer += size;
> > +}
> > +
> > +static u32 __prf_get_value_size(struct llvm_prf_data *p, u32 *value_kinds)
> > +{
> > +       struct llvm_prf_value_node **nodes =
> > +               (struct llvm_prf_value_node **)p->values;
> > +       u32 kinds = 0;
> > +       u32 size = 0;
> > +       unsigned int kind;
> > +       unsigned int n;
> > +       unsigned int s = 0;
> > +
> > +       for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> > +               unsigned int sites = p->num_value_sites[kind];
> > +
> > +               if (!sites)
> > +                       continue;
> > +
> > +               /* Record + site count array */
> > +               size += prf_get_value_record_size(sites);
> > +               kinds++;
> > +
> > +               if (!nodes)
> > +                       continue;
> > +
> > +               for (n = 0; n < sites; n++) {
> > +                       u32 count = 0;
> > +                       struct llvm_prf_value_node *site = nodes[s + n];
> > +
> > +                       while (site && ++count <= U8_MAX)
> > +                               site = site->next;
> > +
> > +                       size += count *
> > +                               sizeof(struct llvm_prf_value_node_data);
> > +               }
> > +
> > +               s += sites;
> > +       }
> > +
> > +       if (size)
> > +               size += sizeof(struct llvm_prf_value_data);
> > +
> > +       if (value_kinds)
> > +               *value_kinds = kinds;
> > +
> > +       return size;
> > +}
> > +
> > +static u32 prf_get_value_size(void)
> > +{
> > +       u32 size = 0;
> > +       struct llvm_prf_data *p;
> > +
> > +       for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> > +               size += __prf_get_value_size(p, NULL);
> > +
> > +       return size;
> > +}
> > +
> > +/* Serialize the profiling's value. */
> > +static void prf_serialize_value(struct llvm_prf_data *p, void **buffer)
> > +{
> > +       struct llvm_prf_value_data header;
> > +       struct llvm_prf_value_node **nodes =
> > +               (struct llvm_prf_value_node **)p->values;
> > +       unsigned int kind;
> > +       unsigned int n;
> > +       unsigned int s = 0;
> > +
> > +       header.total_size = __prf_get_value_size(p, &header.num_value_kinds);
> > +
> > +       if (!header.num_value_kinds)
> > +               /* Nothing to write. */
> > +               return;
> > +
> > +       prf_copy_to_buffer(buffer, &header, sizeof(header));
> > +
> > +       for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> > +               struct llvm_prf_value_record *record;
> > +               u8 *counts;
> > +               unsigned int sites = p->num_value_sites[kind];
> > +
> > +               if (!sites)
> > +                       continue;
> > +
> > +               /* Profiling value record. */
> > +               record = *(struct llvm_prf_value_record **)buffer;
> > +               *buffer += prf_get_value_record_header_size();
> > +
> > +               record->kind = kind;
> > +               record->num_value_sites = sites;
> > +
> > +               /* Site count array. */
> > +               counts = *(u8 **)buffer;
> > +               *buffer += prf_get_value_record_site_count_size(sites);
> > +
> > +               /*
> > +                * If we don't have nodes, we can skip updating the site count
> > +                * array, because the buffer is zero filled.
> > +                */
> > +               if (!nodes)
> > +                       continue;
> > +
> > +               for (n = 0; n < sites; n++) {
> > +                       u32 count = 0;
> > +                       struct llvm_prf_value_node *site = nodes[s + n];
> > +
> > +                       while (site && ++count <= U8_MAX) {
> > +                               prf_copy_to_buffer(buffer, site,
> > +                                                  sizeof(struct llvm_prf_value_node_data));
> > +                               site = site->next;
> > +                       }
> > +
> > +                       counts[n] = (u8)count;
> > +               }
> > +
> > +               s += sites;
> > +       }
> > +}
> > +
> > +static void prf_serialize_values(void **buffer)
> > +{
> > +       struct llvm_prf_data *p;
> > +
> > +       for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> > +               prf_serialize_value(p, buffer);
> > +}
> > +
> > +static inline unsigned long prf_get_padding(unsigned long size)
> > +{
> > +       return 7 & (8 - size % 8);
> > +}
> > +
> > +static unsigned long prf_buffer_size(void)
> > +{
> > +       return sizeof(struct llvm_prf_header) +
> > +                       prf_data_size() +
> > +                       prf_cnts_size() +
> > +                       prf_names_size() +
> > +                       prf_get_padding(prf_names_size()) +
> > +                       prf_get_value_size();
> > +}
> > +
> > +/* Serialize the profiling data into a format LLVM's tools can understand. */
> > +static int prf_serialize(struct prf_private_data *p)
> > +{
> > +       int err = 0;
> > +       void *buffer;
> > +
> > +       p->size = prf_buffer_size();
> > +       p->buffer = vzalloc(p->size);
> > +
> > +       if (!p->buffer) {
> > +               err = -ENOMEM;
> > +               goto out;
> > +       }
> > +
> > +       buffer = p->buffer;
> > +
> > +       prf_fill_header(&buffer);
> > +       prf_copy_to_buffer(&buffer, __llvm_prf_data_start,  prf_data_size());
> > +       prf_copy_to_buffer(&buffer, __llvm_prf_cnts_start,  prf_cnts_size());
> > +       prf_copy_to_buffer(&buffer, __llvm_prf_names_start, prf_names_size());
> > +       buffer += prf_get_padding(prf_names_size());
> > +
> > +       prf_serialize_values(&buffer);
> > +
> > +out:
> > +       return err;
> > +}
> > +
> > +/* open() implementation for PGO. Creates a copy of the profiling data set. */
> > +static int prf_open(struct inode *inode, struct file *file)
> > +{
> > +       struct prf_private_data *data;
> > +       unsigned long flags;
> > +       int err;
> > +
> > +       data = kzalloc(sizeof(*data), GFP_KERNEL);
> > +       if (!data) {
> > +               err = -ENOMEM;
> > +               goto out;
> > +       }
> > +
> > +       flags = prf_lock();
> > +
> > +       err = prf_serialize(data);
> > +       if (err) {
> > +               kfree(data);
> > +               goto out_unlock;
> > +       }
> > +
> > +       file->private_data = data;
> > +
> > +out_unlock:
> > +       prf_unlock(flags);
> > +out:
> > +       return err;
> > +}
> > +
> > +/* read() implementation for PGO. */
> > +static ssize_t prf_read(struct file *file, char __user *buf, size_t count,
> > +                       loff_t *ppos)
> > +{
> > +       struct prf_private_data *data = file->private_data;
> > +
> > +       BUG_ON(!data);
> > +
> > +       return simple_read_from_buffer(buf, count, ppos, data->buffer,
> > +                                      data->size);
> > +}
> > +
> > +/* release() implementation for PGO. Release resources allocated by open(). */
> > +static int prf_release(struct inode *inode, struct file *file)
> > +{
> > +       struct prf_private_data *data = file->private_data;
> > +
> > +       if (data) {
> > +               vfree(data->buffer);
> > +               kfree(data);
> > +       }
> > +
> > +       return 0;
> > +}
> > +
> > +static const struct file_operations prf_fops = {
> > +       .owner          = THIS_MODULE,
> > +       .open           = prf_open,
> > +       .read           = prf_read,
> > +       .llseek         = default_llseek,
> > +       .release        = prf_release
> > +};
> > +
> > +/* write() implementation for resetting PGO's profile data. */
> > +static ssize_t reset_write(struct file *file, const char __user *addr,
> > +                          size_t len, loff_t *pos)
> > +{
> > +       struct llvm_prf_data *data;
> > +
> > +       memset(__llvm_prf_cnts_start, 0, prf_cnts_size());
> > +
> > +       for (data = __llvm_prf_data_start; data < __llvm_prf_data_end; ++data) {
> > +               struct llvm_prf_value_node **vnodes;
> > +               u64 current_vsite_count;
> > +               u32 i;
> > +
> > +               if (!data->values)
> > +                       continue;
> > +
> > +               current_vsite_count = 0;
> > +               vnodes = (struct llvm_prf_value_node **)data->values;
> > +
> > +               for (i = LLVM_PRF_IPVK_FIRST; i <= LLVM_PRF_IPVK_LAST; ++i)
> > +                       current_vsite_count += data->num_value_sites[i];
> > +
> > +               for (i = 0; i < current_vsite_count; ++i) {
> > +                       struct llvm_prf_value_node *current_vnode = vnodes[i];
> > +
> > +                       while (current_vnode) {
> > +                               current_vnode->count = 0;
> > +                               current_vnode = current_vnode->next;
> > +                       }
> > +               }
> > +       }
> > +
> > +       return len;
> > +}
> > +
> > +static const struct file_operations prf_reset_fops = {
> > +       .owner          = THIS_MODULE,
> > +       .write          = reset_write,
> > +       .llseek         = noop_llseek,
> > +};
> > +
> > +/* Create debugfs entries. */
> > +static int __init pgo_init(void)
> > +{
> > +       directory = debugfs_create_dir("pgo", NULL);
> > +       if (!directory)
> > +               goto err_remove;
> > +
> > +       if (!debugfs_create_file("profraw", 0600, directory, NULL,
> > +                                &prf_fops))
> > +               goto err_remove;
> > +
> > +       if (!debugfs_create_file("reset", 0200, directory, NULL,
> > +                                &prf_reset_fops))
> > +               goto err_remove;
> > +
> > +       return 0;
> > +
> > +err_remove:
> > +       pr_err("initialization failed\n");
> > +       return -EIO;
> > +}
> > +
> > +/* Remove debugfs entries. */
> > +static void __exit pgo_exit(void)
> > +{
> > +       debugfs_remove_recursive(directory);
> > +}
> > +
> > +module_init(pgo_init);
> > +module_exit(pgo_exit);
> > diff --git a/kernel/pgo/instrument.c b/kernel/pgo/instrument.c
> > new file mode 100644
> > index 0000000000000..6084ff0652e85
> > --- /dev/null
> > +++ b/kernel/pgo/instrument.c
> > @@ -0,0 +1,185 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (C) 2019 Google, Inc.
> > + *
> > + * Author:
> > + *     Sami Tolvanen <samitolvanen@google.com>
> > + *
> > + * This software is licensed under the terms of the GNU General Public
> > + * License version 2, as published by the Free Software Foundation, and
> > + * may be copied, distributed, and modified under those terms.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + */
> > +
> > +#define pr_fmt(fmt)    "pgo: " fmt
> > +
> > +#include <linux/bitops.h>
> > +#include <linux/kernel.h>
> > +#include <linux/export.h>
> > +#include <linux/spinlock.h>
> > +#include <linux/types.h>
> > +#include "pgo.h"
> > +
> > +/* Lock guarding value node access and serialization. */
> > +static DEFINE_SPINLOCK(pgo_lock);
> > +static int current_node;
> > +
> > +unsigned long prf_lock(void)
> > +{
> > +       unsigned long flags;
> > +
> > +       spin_lock_irqsave(&pgo_lock, flags);
> > +
> > +       return flags;
> > +}
> > +
> > +void prf_unlock(unsigned long flags)
> > +{
> > +       spin_unlock_irqrestore(&pgo_lock, flags);
> > +}
> > +
> > +/*
> > + * Return a newly allocated profiling value node which contains the tracked
> > + * value by the value profiler.
> > + * Note: caller *must* hold pgo_lock.
> > + */
> > +static struct llvm_prf_value_node *allocate_node(struct llvm_prf_data *p,
> > +                                                u32 index, u64 value)
> > +{
> > +       if (&__llvm_prf_vnds_start[current_node + 1] >= __llvm_prf_vnds_end)
> > +               return NULL; /* Out of nodes */
> > +
> > +       current_node++;
> > +
> > +       /* Make sure the node is entirely within the section */
> > +       if (&__llvm_prf_vnds_start[current_node] >= __llvm_prf_vnds_end ||
> > +           &__llvm_prf_vnds_start[current_node + 1] > __llvm_prf_vnds_end)
> > +               return NULL;
> > +
> > +       return &__llvm_prf_vnds_start[current_node];
> > +}
> > +
> > +/*
> > + * Counts the number of times a target value is seen.
> > + *
> > + * Records the target value for the CounterIndex if not seen before. Otherwise,
> > + * increments the counter associated w/ the target value.
> > + */
> > +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index);
> > +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index)
> > +{
> > +       struct llvm_prf_data *p = (struct llvm_prf_data *)data;
> > +       struct llvm_prf_value_node **counters;
> > +       struct llvm_prf_value_node *curr;
> > +       struct llvm_prf_value_node *min = NULL;
> > +       struct llvm_prf_value_node *prev = NULL;
> > +       u64 min_count = U64_MAX;
> > +       u8 values = 0;
> > +       unsigned long flags;
> > +
> > +       if (!p || !p->values)
> > +               return;
> > +
> > +       counters = (struct llvm_prf_value_node **)p->values;
> > +       curr = counters[index];
> > +
> > +       while (curr) {
> > +               if (target_value == curr->value) {
> > +                       curr->count++;
> > +                       return;
> > +               }
> > +
> > +               if (curr->count < min_count) {
> > +                       min_count = curr->count;
> > +                       min = curr;
> > +               }
> > +
> > +               prev = curr;
> > +               curr = curr->next;
> > +               values++;
> > +       }
> > +
> > +       if (values >= LLVM_PRF_MAX_NUM_VALS_PER_SITE) {
> > +               if (!min->count || !(--min->count)) {
> > +                       curr = min;
> > +                       curr->value = target_value;
> > +                       curr->count++;
> > +               }
> > +               return;
> > +       }
> > +
> > +       /* Lock when updating the value node structure. */
> > +       flags = prf_lock();
> > +
> > +       curr = allocate_node(p, index, target_value);
> > +       if (!curr)
> > +               goto out;
> > +
> > +       curr->value = target_value;
> > +       curr->count++;
> > +
> > +       if (!counters[index])
> > +               counters[index] = curr;
> > +       else if (prev && !prev->next)
> > +               prev->next = curr;
> > +
> > +out:
> > +       prf_unlock(flags);
> > +}
> > +EXPORT_SYMBOL(__llvm_profile_instrument_target);
> > +
> > +/* Counts the number of times a range of targets values are seen. */
> > +void __llvm_profile_instrument_range(u64 target_value, void *data,
> > +                                    u32 index, s64 precise_start,
> > +                                    s64 precise_last, s64 large_value);
> > +void __llvm_profile_instrument_range(u64 target_value, void *data,
> > +                                    u32 index, s64 precise_start,
> > +                                    s64 precise_last, s64 large_value)
> > +{
> > +       if (large_value != S64_MIN && (s64)target_value >= large_value)
> > +               target_value = large_value;
> > +       else if ((s64)target_value < precise_start ||
> > +                (s64)target_value > precise_last)
> > +               target_value = precise_last + 1;
> > +
> > +       __llvm_profile_instrument_target(target_value, data, index);
> > +}
> > +EXPORT_SYMBOL(__llvm_profile_instrument_range);
> > +
> > +static u64 inst_prof_get_range_rep_value(u64 value)
> > +{
> > +       if (value <= 8)
> > +               /* The first ranges are individually tracked, us it as is. */
> > +               return value;
> > +       else if (value >= 513)
> > +               /* The last range is mapped to its lowest value. */
> > +               return 513;
> > +       else if (hweight64(value) == 1)
> > +               /* If it's a power of two, use it as is. */
> > +               return value;
> > +
> > +       /* Otherwise, take to the previous power of two + 1. */
> > +       return (1 << (64 - __builtin_clzll(value) - 1)) + 1;
> > +}
> > +
> > +/*
> > + * The target values are partitioned into multiple ranges. The range spec is
> > + * defined in compiler-rt/include/profile/InstrProfData.inc.
> > + */
> > +void __llvm_profile_instrument_memop(u64 target_value, void *data,
> > +                                    u32 counter_index);
> > +void __llvm_profile_instrument_memop(u64 target_value, void *data,
> > +                                    u32 counter_index)
> > +{
> > +       u64 rep_value;
> > +
> > +       /* Map the target value to the representative value of its range. */
> > +       rep_value = inst_prof_get_range_rep_value(target_value);
> > +       __llvm_profile_instrument_target(rep_value, data, counter_index);
> > +}
> > +EXPORT_SYMBOL(__llvm_profile_instrument_memop);
> > diff --git a/kernel/pgo/pgo.h b/kernel/pgo/pgo.h
> > new file mode 100644
> > index 0000000000000..df0aa278f28bd
> > --- /dev/null
> > +++ b/kernel/pgo/pgo.h
> > @@ -0,0 +1,206 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * Copyright (C) 2019 Google, Inc.
> > + *
> > + * Author:
> > + *     Sami Tolvanen <samitolvanen@google.com>
> > + *
> > + * This software is licensed under the terms of the GNU General Public
> > + * License version 2, as published by the Free Software Foundation, and
> > + * may be copied, distributed, and modified under those terms.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + */
> > +
> > +#ifndef _PGO_H
> > +#define _PGO_H
> > +
> > +/*
> > + * Note: These internal LLVM definitions must match the compiler version.
> > + * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code.
> > + */
> > +
> > +#ifdef CONFIG_64BIT
> > +       #define LLVM_PRF_MAGIC          \
> > +               ((u64)255 << 56 |       \
> > +                (u64)'l' << 48 |       \
> > +                (u64)'p' << 40 |       \
> > +                (u64)'r' << 32 |       \
> > +                (u64)'o' << 24 |       \
> > +                (u64)'f' << 16 |       \
> > +                (u64)'r' << 8  |       \
> > +                (u64)129)
> > +#else
> > +       #define LLVM_PRF_MAGIC          \
> > +               ((u64)255 << 56 |       \
> > +                (u64)'l' << 48 |       \
> > +                (u64)'p' << 40 |       \
> > +                (u64)'r' << 32 |       \
> > +                (u64)'o' << 24 |       \
> > +                (u64)'f' << 16 |       \
> > +                (u64)'R' << 8  |       \
> > +                (u64)129)
> > +#endif
> > +
> > +#define LLVM_PRF_VERSION               5
> > +#define LLVM_PRF_DATA_ALIGN            8
> > +#define LLVM_PRF_IPVK_FIRST            0
> > +#define LLVM_PRF_IPVK_LAST             1
> > +#define LLVM_PRF_MAX_NUM_VALS_PER_SITE 16
> > +
> > +#define LLVM_PRF_VARIANT_MASK_IR       (0x1ull << 56)
> > +#define LLVM_PRF_VARIANT_MASK_CSIR     (0x1ull << 57)
> > +
> > +/**
> > + * struct llvm_prf_header - represents the raw profile header data structure.
> > + * @magic: the magic token for the file format.
> > + * @version: the version of the file format.
> > + * @data_size: the number of entries in the profile data section.
> > + * @padding_bytes_before_counters: the number of padding bytes before the
> > + *   counters.
> > + * @counters_size: the size in bytes of the LLVM profile section containing the
> > + *   counters.
> > + * @padding_bytes_after_counters: the number of padding bytes after the
> > + *   counters.
> > + * @names_size: the size in bytes of the LLVM profile section containing the
> > + *   counters' names.
> > + * @counters_delta: the beginning of the LLMV profile counters section.
> > + * @names_delta: the beginning of the LLMV profile names section.
> > + * @value_kind_last: the last profile value kind.
> > + */
> > +struct llvm_prf_header {
> > +       u64 magic;
> > +       u64 version;
> > +       u64 data_size;
> > +       u64 padding_bytes_before_counters;
> > +       u64 counters_size;
> > +       u64 padding_bytes_after_counters;
> > +       u64 names_size;
> > +       u64 counters_delta;
> > +       u64 names_delta;
> > +       u64 value_kind_last;
> > +};
> > +
> > +/**
> > + * struct llvm_prf_data - represents the per-function control structure.
> > + * @name_ref: the reference to the function's name.
> > + * @func_hash: the hash value of the function.
> > + * @counter_ptr: a pointer to the profile counter.
> > + * @function_ptr: a pointer to the function.
> > + * @values: the profiling values associated with this function.
> > + * @num_counters: the number of counters in the function.
> > + * @num_value_sites: the number of value profile sites.
> > + */
> > +struct llvm_prf_data {
> > +       const u64 name_ref;
> > +       const u64 func_hash;
> > +       const void *counter_ptr;
> > +       const void *function_ptr;
> > +       void *values;
> > +       const u32 num_counters;
> > +       const u16 num_value_sites[LLVM_PRF_IPVK_LAST + 1];
> > +} __aligned(LLVM_PRF_DATA_ALIGN);
> > +
> > +/**
> > + * structure llvm_prf_value_node_data - represents the data part of the struct
> > + *   llvm_prf_value_node data structure.
> > + * @value: the value counters.
> > + * @count: the counters' count.
> > + */
> > +struct llvm_prf_value_node_data {
> > +       u64 value;
> > +       u64 count;
> > +};
> > +
> > +/**
> > + * struct llvm_prf_value_node - represents an internal data structure used by
> > + *   the value profiler.
> > + * @value: the value counters.
> > + * @count: the counters' count.
> > + * @next: the next value node.
> > + */
> > +struct llvm_prf_value_node {
> > +       u64 value;
> > +       u64 count;
> > +       struct llvm_prf_value_node *next;
> > +};
> > +
> > +/**
> > + * struct llvm_prf_value_data - represents the value profiling data in indexed
> > + *   format.
> > + * @total_size: the total size in bytes including this field.
> > + * @num_value_kinds: the number of value profile kinds that has value profile
> > + *   data.
> > + */
> > +struct llvm_prf_value_data {
> > +       u32 total_size;
> > +       u32 num_value_kinds;
> > +};
> > +
> > +/**
> > + * struct llvm_prf_value_record - represents the on-disk layout of the value
> > + *   profile data of a particular kind for one function.
> > + * @kind: the kind of the value profile record.
> > + * @num_value_sites: the number of value profile sites.
> > + * @site_count_array: the first element of the array that stores the number
> > + *   of profiled values for each value site.
> > + */
> > +struct llvm_prf_value_record {
> > +       u32 kind;
> > +       u32 num_value_sites;
> > +       u8 site_count_array[];
> > +};
> > +
> > +#define prf_get_value_record_header_size()             \
> > +       offsetof(struct llvm_prf_value_record, site_count_array)
> > +#define prf_get_value_record_site_count_size(sites)    \
> > +       roundup((sites), 8)
> > +#define prf_get_value_record_size(sites)               \
> > +       (prf_get_value_record_header_size() +           \
> > +        prf_get_value_record_site_count_size((sites)))
> > +
> > +/* Data sections */
> > +extern struct llvm_prf_data __llvm_prf_data_start[];
> > +extern struct llvm_prf_data __llvm_prf_data_end[];
> > +
> > +extern u64 __llvm_prf_cnts_start[];
> > +extern u64 __llvm_prf_cnts_end[];
> > +
> > +extern char __llvm_prf_names_start[];
> > +extern char __llvm_prf_names_end[];
> > +
> > +extern struct llvm_prf_value_node __llvm_prf_vnds_start[];
> > +extern struct llvm_prf_value_node __llvm_prf_vnds_end[];
> > +
> > +/* Locking for vnodes */
> > +extern unsigned long prf_lock(void);
> > +extern void prf_unlock(unsigned long flags);
> > +
> > +#define __DEFINE_PRF_SIZE(s) \
> > +       static inline unsigned long prf_ ## s ## _size(void)            \
> > +       {                                                               \
> > +               unsigned long start =                                   \
> > +                       (unsigned long)__llvm_prf_ ## s ## _start;      \
> > +               unsigned long end =                                     \
> > +                       (unsigned long)__llvm_prf_ ## s ## _end;        \
> > +               return roundup(end - start,                             \
> > +                               sizeof(__llvm_prf_ ## s ## _start[0])); \
> > +       }                                                               \
> > +       static inline unsigned long prf_ ## s ## _count(void)           \
> > +       {                                                               \
> > +               return prf_ ## s ## _size() /                           \
> > +                       sizeof(__llvm_prf_ ## s ## _start[0]);          \
> > +       }
> > +
> > +__DEFINE_PRF_SIZE(data);
> > +__DEFINE_PRF_SIZE(cnts);
> > +__DEFINE_PRF_SIZE(names);
> > +__DEFINE_PRF_SIZE(vnds);
> > +
> > +#undef __DEFINE_PRF_SIZE
> > +
> > +#endif /* _PGO_H */
> > diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> > index 213677a5ed33e..9b218afb5cb87 100644
> > --- a/scripts/Makefile.lib
> > +++ b/scripts/Makefile.lib
> > @@ -143,6 +143,16 @@ _c_flags += $(if $(patsubst n%,, \
> >                 $(CFLAGS_GCOV))
> >  endif
> >
> > +#
> > +# Enable clang's PGO profiling flags for a file or directory depending on
> > +# variables PGO_PROFILE_obj.o and PGO_PROFILE.
> > +#
> > +ifeq ($(CONFIG_PGO_CLANG),y)
> > +_c_flags += $(if $(patsubst n%,, \
> > +               $(PGO_PROFILE_$(basetarget).o)$(PGO_PROFILE)y), \
> > +               $(CFLAGS_PGO_CLANG))
> > +endif
> > +
> >  #
> >  # Enable address sanitizer flags for kernel except some files or directories
> >  # we don't want to check (depends on variables KASAN_SANITIZE_obj.o, KASAN_SANITIZE)
> > --
> > 2.30.0.284.gd98b1dd5eaa7-goog
> >
> > --
> > You received this message because you are subscribed to the Google Groups "Clang Built Linux" group.
> > To unsubscribe from this group and stop receiving emails from it, send an email to clang-built-linux+unsubscribe@googlegroups.com.
> > To view this discussion on the web visit https://groups.google.com/d/msgid/clang-built-linux/20210116094357.3620352-1-morbo%40google.com.

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v5] pgo: add clang's Profile Guided Optimization infrastructure
  2021-01-16 17:38         ` Sedat Dilek
  2021-01-16 18:36           ` Sedat Dilek
@ 2021-01-16 20:23           ` Bill Wendling
  2021-01-17 10:44             ` Sedat Dilek
  1 sibling, 1 reply; 116+ messages in thread
From: Bill Wendling @ 2021-01-16 20:23 UTC (permalink / raw)
  To: Sedat Dilek
  Cc: Jonathan Corbet, Masahiro Yamada, Linux Doc Mailing List, LKML,
	Linux Kbuild mailing list, Clang-Built-Linux ML, Andrew Morton,
	Nathan Chancellor, Nick Desaulniers, Sami Tolvanen

On Sat, Jan 16, 2021 at 9:39 AM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> On Sat, Jan 16, 2021 at 10:44 AM 'Bill Wendling' via Clang Built Linux
> <clang-built-linux@googlegroups.com> wrote:
> >
> > From: Sami Tolvanen <samitolvanen@google.com>
> >
> > Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > profile, the kernel is instrumented with PGO counters, a representative
> > workload is run, and the raw profile data is collected from
> > /sys/kernel/debug/pgo/profraw.
> >
> > The raw profile data must be processed by clang's "llvm-profdata" tool
> > before it can be used during recompilation:
> >
> >   $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> >   $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> >
> > Multiple raw profiles may be merged during this step.
> >
> > The data can now be used by the compiler:
> >
> >   $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> >
> > This initial submission is restricted to x86, as that's the platform we
> > know works. This restriction can be lifted once other platforms have
> > been verified to work with PGO.
> >
> > Note that this method of profiling the kernel is clang-native, unlike
> > the clang support in kernel/gcov.
> >
> > [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> >
> > Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
> > Co-developed-by: Bill Wendling <morbo@google.com>
> > Signed-off-by: Bill Wendling <morbo@google.com>
> > ---
> > v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
> >       testing.
> >     - Corrected documentation, re PGO flags when using LTO, based on Fangrui
> >       Song's comments.
> > v3: - Added change log section based on Sedat Dilek's comments.
> > v4: - Remove non-x86 Makfile changes and se "hweight64" instead of using our
> >       own popcount implementation, based on Nick Desaulniers's comment.
> > v5: - Correct padding calculation, discovered by Nathan Chancellor.
> > ---
> >  Documentation/dev-tools/index.rst     |   1 +
> >  Documentation/dev-tools/pgo.rst       | 127 +++++++++
> >  MAINTAINERS                           |   9 +
> >  Makefile                              |   3 +
> >  arch/Kconfig                          |   1 +
> >  arch/x86/Kconfig                      |   1 +
> >  arch/x86/boot/Makefile                |   1 +
> >  arch/x86/boot/compressed/Makefile     |   1 +
> >  arch/x86/crypto/Makefile              |   2 +
> >  arch/x86/entry/vdso/Makefile          |   1 +
> >  arch/x86/kernel/vmlinux.lds.S         |   2 +
> >  arch/x86/platform/efi/Makefile        |   1 +
> >  arch/x86/purgatory/Makefile           |   1 +
> >  arch/x86/realmode/rm/Makefile         |   1 +
> >  arch/x86/um/vdso/Makefile             |   1 +
> >  drivers/firmware/efi/libstub/Makefile |   1 +
> >  include/asm-generic/vmlinux.lds.h     |  44 +++
> >  kernel/Makefile                       |   1 +
> >  kernel/pgo/Kconfig                    |  35 +++
> >  kernel/pgo/Makefile                   |   5 +
> >  kernel/pgo/fs.c                       | 382 ++++++++++++++++++++++++++
> >  kernel/pgo/instrument.c               | 185 +++++++++++++
> >  kernel/pgo/pgo.h                      | 206 ++++++++++++++
> >  scripts/Makefile.lib                  |  10 +
> >  24 files changed, 1022 insertions(+)
> >  create mode 100644 Documentation/dev-tools/pgo.rst
> >  create mode 100644 kernel/pgo/Kconfig
> >  create mode 100644 kernel/pgo/Makefile
> >  create mode 100644 kernel/pgo/fs.c
> >  create mode 100644 kernel/pgo/instrument.c
> >  create mode 100644 kernel/pgo/pgo.h
> >
> > diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> > index f7809c7b1ba9e..8d6418e858062 100644
> > --- a/Documentation/dev-tools/index.rst
> > +++ b/Documentation/dev-tools/index.rst
> > @@ -26,6 +26,7 @@ whole; patches welcome!
> >     kgdb
> >     kselftest
> >     kunit/index
> > +   pgo
> >
> >
> >  .. only::  subproject and html
> > diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
> > new file mode 100644
> > index 0000000000000..b7f11d8405b73
> > --- /dev/null
> > +++ b/Documentation/dev-tools/pgo.rst
> > @@ -0,0 +1,127 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +===============================
> > +Using PGO with the Linux kernel
> > +===============================
> > +
> > +Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
> > +when building with Clang. The profiling data is exported via the ``pgo``
> > +debugfs directory.
> > +
> > +.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> > +
> > +
> > +Preparation
> > +===========
> > +
> > +Configure the kernel with:
> > +
> > +.. code-block:: make
> > +
> > +   CONFIG_DEBUG_FS=y
> > +   CONFIG_PGO_CLANG=y
> > +
> > +Note that kernels compiled with profiling flags will be significantly larger
> > +and run slower.
> > +
> > +Profiling data will only become accessible once debugfs has been mounted:
> > +
> > +.. code-block:: sh
> > +
> > +   mount -t debugfs none /sys/kernel/debug
> > +
> > +
> > +Customization
> > +=============
> > +
> > +You can enable or disable profiling for individual file and directories by
> > +adding a line similar to the following to the respective kernel Makefile:
> > +
> > +- For a single file (e.g. main.o)
> > +
> > +  .. code-block:: make
> > +
> > +     PGO_PROFILE_main.o := y
> > +
> > +- For all files in one directory
> > +
> > +  .. code-block:: make
> > +
> > +     PGO_PROFILE := y
> > +
> > +To exclude files from being profiled use
> > +
> > +  .. code-block:: make
> > +
> > +     PGO_PROFILE_main.o := n
> > +
> > +and
> > +
> > +  .. code-block:: make
> > +
> > +     PGO_PROFILE := n
> > +
> > +Only files which are linked to the main kernel image or are compiled as kernel
> > +modules are supported by this mechanism.
> > +
> > +
> > +Files
> > +=====
> > +
> > +The PGO kernel support creates the following files in debugfs:
> > +
> > +``/sys/kernel/debug/pgo``
> > +       Parent directory for all PGO-related files.
> > +
> > +``/sys/kernel/debug/pgo/reset``
> > +       Global reset file: resets all coverage data to zero when written to.
> > +
> > +``/sys/kernel/debug/profraw``
> > +       The raw PGO data that must be processed with ``llvm_profdata``.
> > +
> > +
> > +Workflow
> > +========
> > +
> > +The PGO kernel can be run on the host or test machines. The data though should
> > +be analyzed with Clang's tools from the same Clang version as the kernel was
> > +compiled. Clang's tolerant of version skew, but it's easier to use the same
> > +Clang version.
> > +
> > +The profiling data is useful for optimizing the kernel, analyzing coverage,
> > +etc. Clang offers tools to perform these tasks.
> > +
> > +Here is an example workflow for profiling an instrumented kernel with PGO and
> > +using the result to optimize the kernel:
> > +
> > +1) Install the kernel on the TEST machine.
> > +
> > +2) Reset the data counters right before running the load tests
> > +
> > +   .. code-block:: sh
> > +
> > +      $ echo 1 > /sys/kernel/debug/pgo/reset
> > +
>
> I do not get this...
>
> # mount | grep debugfs
> debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
>
> After the load-test...?
>
> echo 0 > /sys/kernel/debug/pgo/reset
>
Writing anything to /sys/kernel/debug/pgo/reset will cause it to reset
the profiling counters. I picked 1 (one) semi-randomly, but it could
be any number, letter, your favorite short story, etc. You don't want
to reset it before collecting the profiling data from your load tests
though.

> > +3) Run the load tests.
> > +
> > +4) Collect the raw profile data
> > +
> > +   .. code-block:: sh
> > +
> > +      $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> > +
>
> This is only 4,9M small and seen from the date 5mins before I run the
> echo-1 line.
>
> # ll /sys/kernel/debug/pgo
> insgesamt 0
> drwxr-xr-x  2 root root 0 16. Jan 17:29 .
> drwx------ 41 root root 0 16. Jan 17:29 ..
> -rw-------  1 root root 0 16. Jan 17:29 profraw
> --w-------  1 root root 0 16. Jan 18:19 reset
>
> # cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
>
> # ll /tmp/vmlinux.profraw
> -rw------- 1 root root 4,9M 16. Jan 17:29 /tmp/vmlinux.profraw
>
> For me there was no prof-data collected from my defconfig kernel-build.
>
The /sys/kernel/debug/pgo/profraw file is read-only. Nothing writes to
it, not even the kernel. All it does is serialize the profiling
counters from a memory location in the kernel into a format that
LLVM's tools can understand.

> > +5) (Optional) Download the raw profile data to the HOST machine.
> > +
> > +6) Process the raw profile data
> > +
> > +   .. code-block:: sh
> > +
> > +      $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > +
>
> Is that executed in /path/to/linux/git?
>
The llvm-profdata tool is not in the linux source tree. You need to
grab it from a clang distribution (or built from clang's git repo).

> > +   Note that multiple raw profile data files can be merged during this step.
> > +
> > +7) Rebuild the kernel using the profile data (PGO disabled)
> > +
> > +   .. code-block:: sh
> > +
> > +      $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
>
> How big is vmlinux.profdata (make defconfig)?
>
I don't have numbers for this, but from what you listed here, it's ~5M
in size. The size is proportional to the number of counters
instrumented in the kernel.

> Do I need to do a full defconfig build or can I stop the build after
> let me say 10mins?
>
You should do a full rebuild. Make sure that PGO is disabled during the rebuild.

-bw

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v5] pgo: add clang's Profile Guided Optimization infrastructure
  2021-01-16 20:23           ` Bill Wendling
@ 2021-01-17 10:44             ` Sedat Dilek
  2021-01-17 10:53               ` Sedat Dilek
  0 siblings, 1 reply; 116+ messages in thread
From: Sedat Dilek @ 2021-01-17 10:44 UTC (permalink / raw)
  To: Bill Wendling
  Cc: Jonathan Corbet, Masahiro Yamada, Linux Doc Mailing List, LKML,
	Linux Kbuild mailing list, Clang-Built-Linux ML, Andrew Morton,
	Nathan Chancellor, Nick Desaulniers, Sami Tolvanen

On Sat, Jan 16, 2021 at 9:23 PM Bill Wendling <morbo@google.com> wrote:
>
> On Sat, Jan 16, 2021 at 9:39 AM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> > On Sat, Jan 16, 2021 at 10:44 AM 'Bill Wendling' via Clang Built Linux
> > <clang-built-linux@googlegroups.com> wrote:
> > >
> > > From: Sami Tolvanen <samitolvanen@google.com>
> > >
> > > Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > > profile, the kernel is instrumented with PGO counters, a representative
> > > workload is run, and the raw profile data is collected from
> > > /sys/kernel/debug/pgo/profraw.
> > >
> > > The raw profile data must be processed by clang's "llvm-profdata" tool
> > > before it can be used during recompilation:
> > >
> > >   $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > >   $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > >
> > > Multiple raw profiles may be merged during this step.
> > >
> > > The data can now be used by the compiler:
> > >
> > >   $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> > >
> > > This initial submission is restricted to x86, as that's the platform we
> > > know works. This restriction can be lifted once other platforms have
> > > been verified to work with PGO.
> > >
> > > Note that this method of profiling the kernel is clang-native, unlike
> > > the clang support in kernel/gcov.
> > >
> > > [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> > >
> > > Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
> > > Co-developed-by: Bill Wendling <morbo@google.com>
> > > Signed-off-by: Bill Wendling <morbo@google.com>
> > > ---
> > > v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
> > >       testing.
> > >     - Corrected documentation, re PGO flags when using LTO, based on Fangrui
> > >       Song's comments.
> > > v3: - Added change log section based on Sedat Dilek's comments.
> > > v4: - Remove non-x86 Makfile changes and se "hweight64" instead of using our
> > >       own popcount implementation, based on Nick Desaulniers's comment.
> > > v5: - Correct padding calculation, discovered by Nathan Chancellor.
> > > ---
> > >  Documentation/dev-tools/index.rst     |   1 +
> > >  Documentation/dev-tools/pgo.rst       | 127 +++++++++
> > >  MAINTAINERS                           |   9 +
> > >  Makefile                              |   3 +
> > >  arch/Kconfig                          |   1 +
> > >  arch/x86/Kconfig                      |   1 +
> > >  arch/x86/boot/Makefile                |   1 +
> > >  arch/x86/boot/compressed/Makefile     |   1 +
> > >  arch/x86/crypto/Makefile              |   2 +
> > >  arch/x86/entry/vdso/Makefile          |   1 +
> > >  arch/x86/kernel/vmlinux.lds.S         |   2 +
> > >  arch/x86/platform/efi/Makefile        |   1 +
> > >  arch/x86/purgatory/Makefile           |   1 +
> > >  arch/x86/realmode/rm/Makefile         |   1 +
> > >  arch/x86/um/vdso/Makefile             |   1 +
> > >  drivers/firmware/efi/libstub/Makefile |   1 +
> > >  include/asm-generic/vmlinux.lds.h     |  44 +++
> > >  kernel/Makefile                       |   1 +
> > >  kernel/pgo/Kconfig                    |  35 +++
> > >  kernel/pgo/Makefile                   |   5 +
> > >  kernel/pgo/fs.c                       | 382 ++++++++++++++++++++++++++
> > >  kernel/pgo/instrument.c               | 185 +++++++++++++
> > >  kernel/pgo/pgo.h                      | 206 ++++++++++++++
> > >  scripts/Makefile.lib                  |  10 +
> > >  24 files changed, 1022 insertions(+)
> > >  create mode 100644 Documentation/dev-tools/pgo.rst
> > >  create mode 100644 kernel/pgo/Kconfig
> > >  create mode 100644 kernel/pgo/Makefile
> > >  create mode 100644 kernel/pgo/fs.c
> > >  create mode 100644 kernel/pgo/instrument.c
> > >  create mode 100644 kernel/pgo/pgo.h
> > >
> > > diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> > > index f7809c7b1ba9e..8d6418e858062 100644
> > > --- a/Documentation/dev-tools/index.rst
> > > +++ b/Documentation/dev-tools/index.rst
> > > @@ -26,6 +26,7 @@ whole; patches welcome!
> > >     kgdb
> > >     kselftest
> > >     kunit/index
> > > +   pgo
> > >
> > >
> > >  .. only::  subproject and html
> > > diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
> > > new file mode 100644
> > > index 0000000000000..b7f11d8405b73
> > > --- /dev/null
> > > +++ b/Documentation/dev-tools/pgo.rst
> > > @@ -0,0 +1,127 @@
> > > +.. SPDX-License-Identifier: GPL-2.0
> > > +
> > > +===============================
> > > +Using PGO with the Linux kernel
> > > +===============================
> > > +
> > > +Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
> > > +when building with Clang. The profiling data is exported via the ``pgo``
> > > +debugfs directory.
> > > +
> > > +.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> > > +
> > > +
> > > +Preparation
> > > +===========
> > > +
> > > +Configure the kernel with:
> > > +
> > > +.. code-block:: make
> > > +
> > > +   CONFIG_DEBUG_FS=y
> > > +   CONFIG_PGO_CLANG=y
> > > +
> > > +Note that kernels compiled with profiling flags will be significantly larger
> > > +and run slower.
> > > +
> > > +Profiling data will only become accessible once debugfs has been mounted:
> > > +
> > > +.. code-block:: sh
> > > +
> > > +   mount -t debugfs none /sys/kernel/debug
> > > +
> > > +
> > > +Customization
> > > +=============
> > > +
> > > +You can enable or disable profiling for individual file and directories by
> > > +adding a line similar to the following to the respective kernel Makefile:
> > > +
> > > +- For a single file (e.g. main.o)
> > > +
> > > +  .. code-block:: make
> > > +
> > > +     PGO_PROFILE_main.o := y
> > > +
> > > +- For all files in one directory
> > > +
> > > +  .. code-block:: make
> > > +
> > > +     PGO_PROFILE := y
> > > +
> > > +To exclude files from being profiled use
> > > +
> > > +  .. code-block:: make
> > > +
> > > +     PGO_PROFILE_main.o := n
> > > +
> > > +and
> > > +
> > > +  .. code-block:: make
> > > +
> > > +     PGO_PROFILE := n
> > > +
> > > +Only files which are linked to the main kernel image or are compiled as kernel
> > > +modules are supported by this mechanism.
> > > +
> > > +
> > > +Files
> > > +=====
> > > +
> > > +The PGO kernel support creates the following files in debugfs:
> > > +
> > > +``/sys/kernel/debug/pgo``
> > > +       Parent directory for all PGO-related files.
> > > +
> > > +``/sys/kernel/debug/pgo/reset``
> > > +       Global reset file: resets all coverage data to zero when written to.
> > > +
> > > +``/sys/kernel/debug/profraw``
> > > +       The raw PGO data that must be processed with ``llvm_profdata``.
> > > +
> > > +
> > > +Workflow
> > > +========
> > > +
> > > +The PGO kernel can be run on the host or test machines. The data though should
> > > +be analyzed with Clang's tools from the same Clang version as the kernel was
> > > +compiled. Clang's tolerant of version skew, but it's easier to use the same
> > > +Clang version.
> > > +
> > > +The profiling data is useful for optimizing the kernel, analyzing coverage,
> > > +etc. Clang offers tools to perform these tasks.
> > > +
> > > +Here is an example workflow for profiling an instrumented kernel with PGO and
> > > +using the result to optimize the kernel:
> > > +
> > > +1) Install the kernel on the TEST machine.
> > > +
> > > +2) Reset the data counters right before running the load tests
> > > +
> > > +   .. code-block:: sh
> > > +
> > > +      $ echo 1 > /sys/kernel/debug/pgo/reset
> > > +
> >
> > I do not get this...
> >
> > # mount | grep debugfs
> > debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
> >
> > After the load-test...?
> >
> > echo 0 > /sys/kernel/debug/pgo/reset
> >
> Writing anything to /sys/kernel/debug/pgo/reset will cause it to reset
> the profiling counters. I picked 1 (one) semi-randomly, but it could
> be any number, letter, your favorite short story, etc. You don't want
> to reset it before collecting the profiling data from your load tests
> though.
>
> > > +3) Run the load tests.
> > > +
> > > +4) Collect the raw profile data
> > > +
> > > +   .. code-block:: sh
> > > +
> > > +      $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> > > +
> >
> > This is only 4,9M small and seen from the date 5mins before I run the
> > echo-1 line.
> >
> > # ll /sys/kernel/debug/pgo
> > insgesamt 0
> > drwxr-xr-x  2 root root 0 16. Jan 17:29 .
> > drwx------ 41 root root 0 16. Jan 17:29 ..
> > -rw-------  1 root root 0 16. Jan 17:29 profraw
> > --w-------  1 root root 0 16. Jan 18:19 reset
> >
> > # cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> >
> > # ll /tmp/vmlinux.profraw
> > -rw------- 1 root root 4,9M 16. Jan 17:29 /tmp/vmlinux.profraw
> >
> > For me there was no prof-data collected from my defconfig kernel-build.
> >
> The /sys/kernel/debug/pgo/profraw file is read-only. Nothing writes to
> it, not even the kernel. All it does is serialize the profiling
> counters from a memory location in the kernel into a format that
> LLVM's tools can understand.
>
> > > +5) (Optional) Download the raw profile data to the HOST machine.
> > > +
> > > +6) Process the raw profile data
> > > +
> > > +   .. code-block:: sh
> > > +
> > > +      $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > > +
> >
> > Is that executed in /path/to/linux/git?
> >
> The llvm-profdata tool is not in the linux source tree. You need to
> grab it from a clang distribution (or built from clang's git repo).
>
> > > +   Note that multiple raw profile data files can be merged during this step.
> > > +
> > > +7) Rebuild the kernel using the profile data (PGO disabled)
> > > +
> > > +   .. code-block:: sh
> > > +
> > > +      $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> >
> > How big is vmlinux.profdata (make defconfig)?
> >
> I don't have numbers for this, but from what you listed here, it's ~5M
> in size. The size is proportional to the number of counters
> instrumented in the kernel.
>
> > Do I need to do a full defconfig build or can I stop the build after
> > let me say 10mins?
> >
> You should do a full rebuild. Make sure that PGO is disabled during the rebuild.
>

Thanks Bill for all the information.

And sorry if I am so pedantic.

I have installed my Debian system with Legacy-BIOS enabled.

When I rebuild with KCFLAGS=-fprofile-use=vmlinux.profdata (LLVM=1 I
have as a default) my system hangs on reboot.

[ diffconfig ]
$ scripts/diffconfig /boot/config-5.11.0-rc3-8-amd64-clang12-pgo
/boot/config-5.11.0-rc3-9-amd64-clang12-pgo
BUILD_SALT "5.11.0-rc3-8-amd64-clang12-pgo" -> "5.11.0-rc3-9-amd64-clang12-pgo"
PGO_CLANG y -> n

[ my make line ]
$ cat ../start-build_5.11.0-rc3-9-amd64-clang12-pgo.txt
dileks     63120   63095  0 06:47 pts/2    00:00:00 /usr/bin/perf_5.10
stat make V=1 -j4 HOSTCC=clang HOSTCXX=clang++ HOSTLD=ld.lld CC=clang
LD=ld.lld LLVM=1 LLVM_IAS=1 PAHOLE=/opt/pahole/bin/pahole
LOCALVERSION=-9-amd64-clang12-pgo KBUILD_VERBOSE=1
KBUILD_BUILD_HOST=iniza KBUILD_BUILD_USER=sedat.dilek@gmail.com
KBUILD_BUILD_TIMESTAMP=2021-01-17 bindeb-pkg
KDEB_PKGVERSION=5.11.0~rc3-9~bullseye+dileks1
KCFLAGS=-fprofile-use=vmlinux.profdata

( Yes, 06:47 a.m. in the morning :-). )

When I boot with the rebuild Linux-kernel I see:

Wrong EFI loader signature
...
Decompressing
Parsing EFI
Performing Relocations done.
Booting the Kernel.

*** SYSTEM HANGS ***
( I waited for approx 1 min )

I tried to turn UEFI support ON and OFF.
No success.

Does Clang-PGO support Legacy-BIOS or is something different wrong?

Thanks.

- Sedat -

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v5] pgo: add clang's Profile Guided Optimization infrastructure
  2021-01-17 10:44             ` Sedat Dilek
@ 2021-01-17 10:53               ` Sedat Dilek
  2021-01-17 11:23                 ` Sedat Dilek
  0 siblings, 1 reply; 116+ messages in thread
From: Sedat Dilek @ 2021-01-17 10:53 UTC (permalink / raw)
  To: Bill Wendling
  Cc: Jonathan Corbet, Masahiro Yamada, Linux Doc Mailing List, LKML,
	Linux Kbuild mailing list, Clang-Built-Linux ML, Andrew Morton,
	Nathan Chancellor, Nick Desaulniers, Sami Tolvanen

On Sun, Jan 17, 2021 at 11:44 AM Sedat Dilek <sedat.dilek@gmail.com> wrote:
>
> On Sat, Jan 16, 2021 at 9:23 PM Bill Wendling <morbo@google.com> wrote:
> >
> > On Sat, Jan 16, 2021 at 9:39 AM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> > > On Sat, Jan 16, 2021 at 10:44 AM 'Bill Wendling' via Clang Built Linux
> > > <clang-built-linux@googlegroups.com> wrote:
> > > >
> > > > From: Sami Tolvanen <samitolvanen@google.com>
> > > >
> > > > Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > > > profile, the kernel is instrumented with PGO counters, a representative
> > > > workload is run, and the raw profile data is collected from
> > > > /sys/kernel/debug/pgo/profraw.
> > > >
> > > > The raw profile data must be processed by clang's "llvm-profdata" tool
> > > > before it can be used during recompilation:
> > > >
> > > >   $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > > >   $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > > >
> > > > Multiple raw profiles may be merged during this step.
> > > >
> > > > The data can now be used by the compiler:
> > > >
> > > >   $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> > > >
> > > > This initial submission is restricted to x86, as that's the platform we
> > > > know works. This restriction can be lifted once other platforms have
> > > > been verified to work with PGO.
> > > >
> > > > Note that this method of profiling the kernel is clang-native, unlike
> > > > the clang support in kernel/gcov.
> > > >
> > > > [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> > > >
> > > > Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
> > > > Co-developed-by: Bill Wendling <morbo@google.com>
> > > > Signed-off-by: Bill Wendling <morbo@google.com>
> > > > ---
> > > > v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
> > > >       testing.
> > > >     - Corrected documentation, re PGO flags when using LTO, based on Fangrui
> > > >       Song's comments.
> > > > v3: - Added change log section based on Sedat Dilek's comments.
> > > > v4: - Remove non-x86 Makfile changes and se "hweight64" instead of using our
> > > >       own popcount implementation, based on Nick Desaulniers's comment.
> > > > v5: - Correct padding calculation, discovered by Nathan Chancellor.
> > > > ---
> > > >  Documentation/dev-tools/index.rst     |   1 +
> > > >  Documentation/dev-tools/pgo.rst       | 127 +++++++++
> > > >  MAINTAINERS                           |   9 +
> > > >  Makefile                              |   3 +
> > > >  arch/Kconfig                          |   1 +
> > > >  arch/x86/Kconfig                      |   1 +
> > > >  arch/x86/boot/Makefile                |   1 +
> > > >  arch/x86/boot/compressed/Makefile     |   1 +
> > > >  arch/x86/crypto/Makefile              |   2 +
> > > >  arch/x86/entry/vdso/Makefile          |   1 +
> > > >  arch/x86/kernel/vmlinux.lds.S         |   2 +
> > > >  arch/x86/platform/efi/Makefile        |   1 +
> > > >  arch/x86/purgatory/Makefile           |   1 +
> > > >  arch/x86/realmode/rm/Makefile         |   1 +
> > > >  arch/x86/um/vdso/Makefile             |   1 +
> > > >  drivers/firmware/efi/libstub/Makefile |   1 +
> > > >  include/asm-generic/vmlinux.lds.h     |  44 +++
> > > >  kernel/Makefile                       |   1 +
> > > >  kernel/pgo/Kconfig                    |  35 +++
> > > >  kernel/pgo/Makefile                   |   5 +
> > > >  kernel/pgo/fs.c                       | 382 ++++++++++++++++++++++++++
> > > >  kernel/pgo/instrument.c               | 185 +++++++++++++
> > > >  kernel/pgo/pgo.h                      | 206 ++++++++++++++
> > > >  scripts/Makefile.lib                  |  10 +
> > > >  24 files changed, 1022 insertions(+)
> > > >  create mode 100644 Documentation/dev-tools/pgo.rst
> > > >  create mode 100644 kernel/pgo/Kconfig
> > > >  create mode 100644 kernel/pgo/Makefile
> > > >  create mode 100644 kernel/pgo/fs.c
> > > >  create mode 100644 kernel/pgo/instrument.c
> > > >  create mode 100644 kernel/pgo/pgo.h
> > > >
> > > > diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> > > > index f7809c7b1ba9e..8d6418e858062 100644
> > > > --- a/Documentation/dev-tools/index.rst
> > > > +++ b/Documentation/dev-tools/index.rst
> > > > @@ -26,6 +26,7 @@ whole; patches welcome!
> > > >     kgdb
> > > >     kselftest
> > > >     kunit/index
> > > > +   pgo
> > > >
> > > >
> > > >  .. only::  subproject and html
> > > > diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
> > > > new file mode 100644
> > > > index 0000000000000..b7f11d8405b73
> > > > --- /dev/null
> > > > +++ b/Documentation/dev-tools/pgo.rst
> > > > @@ -0,0 +1,127 @@
> > > > +.. SPDX-License-Identifier: GPL-2.0
> > > > +
> > > > +===============================
> > > > +Using PGO with the Linux kernel
> > > > +===============================
> > > > +
> > > > +Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
> > > > +when building with Clang. The profiling data is exported via the ``pgo``
> > > > +debugfs directory.
> > > > +
> > > > +.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> > > > +
> > > > +
> > > > +Preparation
> > > > +===========
> > > > +
> > > > +Configure the kernel with:
> > > > +
> > > > +.. code-block:: make
> > > > +
> > > > +   CONFIG_DEBUG_FS=y
> > > > +   CONFIG_PGO_CLANG=y
> > > > +
> > > > +Note that kernels compiled with profiling flags will be significantly larger
> > > > +and run slower.
> > > > +
> > > > +Profiling data will only become accessible once debugfs has been mounted:
> > > > +
> > > > +.. code-block:: sh
> > > > +
> > > > +   mount -t debugfs none /sys/kernel/debug
> > > > +
> > > > +
> > > > +Customization
> > > > +=============
> > > > +
> > > > +You can enable or disable profiling for individual file and directories by
> > > > +adding a line similar to the following to the respective kernel Makefile:
> > > > +
> > > > +- For a single file (e.g. main.o)
> > > > +
> > > > +  .. code-block:: make
> > > > +
> > > > +     PGO_PROFILE_main.o := y
> > > > +
> > > > +- For all files in one directory
> > > > +
> > > > +  .. code-block:: make
> > > > +
> > > > +     PGO_PROFILE := y
> > > > +
> > > > +To exclude files from being profiled use
> > > > +
> > > > +  .. code-block:: make
> > > > +
> > > > +     PGO_PROFILE_main.o := n
> > > > +
> > > > +and
> > > > +
> > > > +  .. code-block:: make
> > > > +
> > > > +     PGO_PROFILE := n
> > > > +
> > > > +Only files which are linked to the main kernel image or are compiled as kernel
> > > > +modules are supported by this mechanism.
> > > > +
> > > > +
> > > > +Files
> > > > +=====
> > > > +
> > > > +The PGO kernel support creates the following files in debugfs:
> > > > +
> > > > +``/sys/kernel/debug/pgo``
> > > > +       Parent directory for all PGO-related files.
> > > > +
> > > > +``/sys/kernel/debug/pgo/reset``
> > > > +       Global reset file: resets all coverage data to zero when written to.
> > > > +
> > > > +``/sys/kernel/debug/profraw``
> > > > +       The raw PGO data that must be processed with ``llvm_profdata``.
> > > > +
> > > > +
> > > > +Workflow
> > > > +========
> > > > +
> > > > +The PGO kernel can be run on the host or test machines. The data though should
> > > > +be analyzed with Clang's tools from the same Clang version as the kernel was
> > > > +compiled. Clang's tolerant of version skew, but it's easier to use the same
> > > > +Clang version.
> > > > +
> > > > +The profiling data is useful for optimizing the kernel, analyzing coverage,
> > > > +etc. Clang offers tools to perform these tasks.
> > > > +
> > > > +Here is an example workflow for profiling an instrumented kernel with PGO and
> > > > +using the result to optimize the kernel:
> > > > +
> > > > +1) Install the kernel on the TEST machine.
> > > > +
> > > > +2) Reset the data counters right before running the load tests
> > > > +
> > > > +   .. code-block:: sh
> > > > +
> > > > +      $ echo 1 > /sys/kernel/debug/pgo/reset
> > > > +
> > >
> > > I do not get this...
> > >
> > > # mount | grep debugfs
> > > debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
> > >
> > > After the load-test...?
> > >
> > > echo 0 > /sys/kernel/debug/pgo/reset
> > >
> > Writing anything to /sys/kernel/debug/pgo/reset will cause it to reset
> > the profiling counters. I picked 1 (one) semi-randomly, but it could
> > be any number, letter, your favorite short story, etc. You don't want
> > to reset it before collecting the profiling data from your load tests
> > though.
> >
> > > > +3) Run the load tests.
> > > > +
> > > > +4) Collect the raw profile data
> > > > +
> > > > +   .. code-block:: sh
> > > > +
> > > > +      $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> > > > +
> > >
> > > This is only 4,9M small and seen from the date 5mins before I run the
> > > echo-1 line.
> > >
> > > # ll /sys/kernel/debug/pgo
> > > insgesamt 0
> > > drwxr-xr-x  2 root root 0 16. Jan 17:29 .
> > > drwx------ 41 root root 0 16. Jan 17:29 ..
> > > -rw-------  1 root root 0 16. Jan 17:29 profraw
> > > --w-------  1 root root 0 16. Jan 18:19 reset
> > >
> > > # cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> > >
> > > # ll /tmp/vmlinux.profraw
> > > -rw------- 1 root root 4,9M 16. Jan 17:29 /tmp/vmlinux.profraw
> > >
> > > For me there was no prof-data collected from my defconfig kernel-build.
> > >
> > The /sys/kernel/debug/pgo/profraw file is read-only. Nothing writes to
> > it, not even the kernel. All it does is serialize the profiling
> > counters from a memory location in the kernel into a format that
> > LLVM's tools can understand.
> >
> > > > +5) (Optional) Download the raw profile data to the HOST machine.
> > > > +
> > > > +6) Process the raw profile data
> > > > +
> > > > +   .. code-block:: sh
> > > > +
> > > > +      $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > > > +
> > >
> > > Is that executed in /path/to/linux/git?
> > >
> > The llvm-profdata tool is not in the linux source tree. You need to
> > grab it from a clang distribution (or built from clang's git repo).
> >
> > > > +   Note that multiple raw profile data files can be merged during this step.
> > > > +
> > > > +7) Rebuild the kernel using the profile data (PGO disabled)
> > > > +
> > > > +   .. code-block:: sh
> > > > +
> > > > +      $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> > >
> > > How big is vmlinux.profdata (make defconfig)?
> > >
> > I don't have numbers for this, but from what you listed here, it's ~5M
> > in size. The size is proportional to the number of counters
> > instrumented in the kernel.
> >
> > > Do I need to do a full defconfig build or can I stop the build after
> > > let me say 10mins?
> > >
> > You should do a full rebuild. Make sure that PGO is disabled during the rebuild.
> >
>
> Thanks Bill for all the information.
>
> And sorry if I am so pedantic.
>
> I have installed my Debian system with Legacy-BIOS enabled.
>
> When I rebuild with KCFLAGS=-fprofile-use=vmlinux.profdata (LLVM=1 I
> have as a default) my system hangs on reboot.
>
> [ diffconfig ]
> $ scripts/diffconfig /boot/config-5.11.0-rc3-8-amd64-clang12-pgo
> /boot/config-5.11.0-rc3-9-amd64-clang12-pgo
> BUILD_SALT "5.11.0-rc3-8-amd64-clang12-pgo" -> "5.11.0-rc3-9-amd64-clang12-pgo"
> PGO_CLANG y -> n
>
> [ my make line ]
> $ cat ../start-build_5.11.0-rc3-9-amd64-clang12-pgo.txt
> dileks     63120   63095  0 06:47 pts/2    00:00:00 /usr/bin/perf_5.10
> stat make V=1 -j4 HOSTCC=clang HOSTCXX=clang++ HOSTLD=ld.lld CC=clang
> LD=ld.lld LLVM=1 LLVM_IAS=1 PAHOLE=/opt/pahole/bin/pahole
> LOCALVERSION=-9-amd64-clang12-pgo KBUILD_VERBOSE=1
> KBUILD_BUILD_HOST=iniza KBUILD_BUILD_USER=sedat.dilek@gmail.com
> KBUILD_BUILD_TIMESTAMP=2021-01-17 bindeb-pkg
> KDEB_PKGVERSION=5.11.0~rc3-9~bullseye+dileks1
> KCFLAGS=-fprofile-use=vmlinux.profdata
>
> ( Yes, 06:47 a.m. in the morning :-). )
>
> When I boot with the rebuild Linux-kernel I see:
>
> Wrong EFI loader signature
> ...
> Decompressing
> Parsing EFI
> Performing Relocations done.
> Booting the Kernel.
>
> *** SYSTEM HANGS ***
> ( I waited for approx 1 min )
>
> I tried to turn UEFI support ON and OFF.
> No success.
>
> Does Clang-PGO support Legacy-BIOS or is something different wrong?
>
> Thanks.
>

My bootloader is GRUB.

In UEFI-BIOS settings there is no secure-boot disable option.
Just simple "Use UEFI BIOS" enabled|disabled.

Installed Debian packages:

ii grub-common 2.04-12
ii grub-pc 2.04-12
ii grub-pc-bin 2.04-12
ii grub2-common 2.04-12

I found in the below link to do in grub-shell:

set check_signatures=no

But this is when grub-efi is installed.

- Sedat -

Link: https://unix.stackexchange.com/questions/126286/grub-efi-disable-signature-check

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v5] pgo: add clang's Profile Guided Optimization infrastructure
  2021-01-17 10:53               ` Sedat Dilek
@ 2021-01-17 11:23                 ` Sedat Dilek
  2021-01-17 11:42                   ` Sedat Dilek
  0 siblings, 1 reply; 116+ messages in thread
From: Sedat Dilek @ 2021-01-17 11:23 UTC (permalink / raw)
  To: Bill Wendling
  Cc: Jonathan Corbet, Masahiro Yamada, Linux Doc Mailing List, LKML,
	Linux Kbuild mailing list, Clang-Built-Linux ML, Andrew Morton,
	Nathan Chancellor, Nick Desaulniers, Sami Tolvanen

On Sun, Jan 17, 2021 at 11:53 AM Sedat Dilek <sedat.dilek@gmail.com> wrote:
>
> On Sun, Jan 17, 2021 at 11:44 AM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> >
> > On Sat, Jan 16, 2021 at 9:23 PM Bill Wendling <morbo@google.com> wrote:
> > >
> > > On Sat, Jan 16, 2021 at 9:39 AM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> > > > On Sat, Jan 16, 2021 at 10:44 AM 'Bill Wendling' via Clang Built Linux
> > > > <clang-built-linux@googlegroups.com> wrote:
> > > > >
> > > > > From: Sami Tolvanen <samitolvanen@google.com>
> > > > >
> > > > > Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > > > > profile, the kernel is instrumented with PGO counters, a representative
> > > > > workload is run, and the raw profile data is collected from
> > > > > /sys/kernel/debug/pgo/profraw.
> > > > >
> > > > > The raw profile data must be processed by clang's "llvm-profdata" tool
> > > > > before it can be used during recompilation:
> > > > >
> > > > >   $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > > > >   $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > > > >
> > > > > Multiple raw profiles may be merged during this step.
> > > > >
> > > > > The data can now be used by the compiler:
> > > > >
> > > > >   $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> > > > >
> > > > > This initial submission is restricted to x86, as that's the platform we
> > > > > know works. This restriction can be lifted once other platforms have
> > > > > been verified to work with PGO.
> > > > >
> > > > > Note that this method of profiling the kernel is clang-native, unlike
> > > > > the clang support in kernel/gcov.
> > > > >
> > > > > [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> > > > >
> > > > > Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
> > > > > Co-developed-by: Bill Wendling <morbo@google.com>
> > > > > Signed-off-by: Bill Wendling <morbo@google.com>
> > > > > ---
> > > > > v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
> > > > >       testing.
> > > > >     - Corrected documentation, re PGO flags when using LTO, based on Fangrui
> > > > >       Song's comments.
> > > > > v3: - Added change log section based on Sedat Dilek's comments.
> > > > > v4: - Remove non-x86 Makfile changes and se "hweight64" instead of using our
> > > > >       own popcount implementation, based on Nick Desaulniers's comment.
> > > > > v5: - Correct padding calculation, discovered by Nathan Chancellor.
> > > > > ---
> > > > >  Documentation/dev-tools/index.rst     |   1 +
> > > > >  Documentation/dev-tools/pgo.rst       | 127 +++++++++
> > > > >  MAINTAINERS                           |   9 +
> > > > >  Makefile                              |   3 +
> > > > >  arch/Kconfig                          |   1 +
> > > > >  arch/x86/Kconfig                      |   1 +
> > > > >  arch/x86/boot/Makefile                |   1 +
> > > > >  arch/x86/boot/compressed/Makefile     |   1 +
> > > > >  arch/x86/crypto/Makefile              |   2 +
> > > > >  arch/x86/entry/vdso/Makefile          |   1 +
> > > > >  arch/x86/kernel/vmlinux.lds.S         |   2 +
> > > > >  arch/x86/platform/efi/Makefile        |   1 +
> > > > >  arch/x86/purgatory/Makefile           |   1 +
> > > > >  arch/x86/realmode/rm/Makefile         |   1 +
> > > > >  arch/x86/um/vdso/Makefile             |   1 +
> > > > >  drivers/firmware/efi/libstub/Makefile |   1 +
> > > > >  include/asm-generic/vmlinux.lds.h     |  44 +++
> > > > >  kernel/Makefile                       |   1 +
> > > > >  kernel/pgo/Kconfig                    |  35 +++
> > > > >  kernel/pgo/Makefile                   |   5 +
> > > > >  kernel/pgo/fs.c                       | 382 ++++++++++++++++++++++++++
> > > > >  kernel/pgo/instrument.c               | 185 +++++++++++++
> > > > >  kernel/pgo/pgo.h                      | 206 ++++++++++++++
> > > > >  scripts/Makefile.lib                  |  10 +
> > > > >  24 files changed, 1022 insertions(+)
> > > > >  create mode 100644 Documentation/dev-tools/pgo.rst
> > > > >  create mode 100644 kernel/pgo/Kconfig
> > > > >  create mode 100644 kernel/pgo/Makefile
> > > > >  create mode 100644 kernel/pgo/fs.c
> > > > >  create mode 100644 kernel/pgo/instrument.c
> > > > >  create mode 100644 kernel/pgo/pgo.h
> > > > >
> > > > > diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> > > > > index f7809c7b1ba9e..8d6418e858062 100644
> > > > > --- a/Documentation/dev-tools/index.rst
> > > > > +++ b/Documentation/dev-tools/index.rst
> > > > > @@ -26,6 +26,7 @@ whole; patches welcome!
> > > > >     kgdb
> > > > >     kselftest
> > > > >     kunit/index
> > > > > +   pgo
> > > > >
> > > > >
> > > > >  .. only::  subproject and html
> > > > > diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
> > > > > new file mode 100644
> > > > > index 0000000000000..b7f11d8405b73
> > > > > --- /dev/null
> > > > > +++ b/Documentation/dev-tools/pgo.rst
> > > > > @@ -0,0 +1,127 @@
> > > > > +.. SPDX-License-Identifier: GPL-2.0
> > > > > +
> > > > > +===============================
> > > > > +Using PGO with the Linux kernel
> > > > > +===============================
> > > > > +
> > > > > +Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
> > > > > +when building with Clang. The profiling data is exported via the ``pgo``
> > > > > +debugfs directory.
> > > > > +
> > > > > +.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> > > > > +
> > > > > +
> > > > > +Preparation
> > > > > +===========
> > > > > +
> > > > > +Configure the kernel with:
> > > > > +
> > > > > +.. code-block:: make
> > > > > +
> > > > > +   CONFIG_DEBUG_FS=y
> > > > > +   CONFIG_PGO_CLANG=y
> > > > > +
> > > > > +Note that kernels compiled with profiling flags will be significantly larger
> > > > > +and run slower.
> > > > > +
> > > > > +Profiling data will only become accessible once debugfs has been mounted:
> > > > > +
> > > > > +.. code-block:: sh
> > > > > +
> > > > > +   mount -t debugfs none /sys/kernel/debug
> > > > > +
> > > > > +
> > > > > +Customization
> > > > > +=============
> > > > > +
> > > > > +You can enable or disable profiling for individual file and directories by
> > > > > +adding a line similar to the following to the respective kernel Makefile:
> > > > > +
> > > > > +- For a single file (e.g. main.o)
> > > > > +
> > > > > +  .. code-block:: make
> > > > > +
> > > > > +     PGO_PROFILE_main.o := y
> > > > > +
> > > > > +- For all files in one directory
> > > > > +
> > > > > +  .. code-block:: make
> > > > > +
> > > > > +     PGO_PROFILE := y
> > > > > +
> > > > > +To exclude files from being profiled use
> > > > > +
> > > > > +  .. code-block:: make
> > > > > +
> > > > > +     PGO_PROFILE_main.o := n
> > > > > +
> > > > > +and
> > > > > +
> > > > > +  .. code-block:: make
> > > > > +
> > > > > +     PGO_PROFILE := n
> > > > > +
> > > > > +Only files which are linked to the main kernel image or are compiled as kernel
> > > > > +modules are supported by this mechanism.
> > > > > +
> > > > > +
> > > > > +Files
> > > > > +=====
> > > > > +
> > > > > +The PGO kernel support creates the following files in debugfs:
> > > > > +
> > > > > +``/sys/kernel/debug/pgo``
> > > > > +       Parent directory for all PGO-related files.
> > > > > +
> > > > > +``/sys/kernel/debug/pgo/reset``
> > > > > +       Global reset file: resets all coverage data to zero when written to.
> > > > > +
> > > > > +``/sys/kernel/debug/profraw``
> > > > > +       The raw PGO data that must be processed with ``llvm_profdata``.
> > > > > +
> > > > > +
> > > > > +Workflow
> > > > > +========
> > > > > +
> > > > > +The PGO kernel can be run on the host or test machines. The data though should
> > > > > +be analyzed with Clang's tools from the same Clang version as the kernel was
> > > > > +compiled. Clang's tolerant of version skew, but it's easier to use the same
> > > > > +Clang version.
> > > > > +
> > > > > +The profiling data is useful for optimizing the kernel, analyzing coverage,
> > > > > +etc. Clang offers tools to perform these tasks.
> > > > > +
> > > > > +Here is an example workflow for profiling an instrumented kernel with PGO and
> > > > > +using the result to optimize the kernel:
> > > > > +
> > > > > +1) Install the kernel on the TEST machine.
> > > > > +
> > > > > +2) Reset the data counters right before running the load tests
> > > > > +
> > > > > +   .. code-block:: sh
> > > > > +
> > > > > +      $ echo 1 > /sys/kernel/debug/pgo/reset
> > > > > +
> > > >
> > > > I do not get this...
> > > >
> > > > # mount | grep debugfs
> > > > debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
> > > >
> > > > After the load-test...?
> > > >
> > > > echo 0 > /sys/kernel/debug/pgo/reset
> > > >
> > > Writing anything to /sys/kernel/debug/pgo/reset will cause it to reset
> > > the profiling counters. I picked 1 (one) semi-randomly, but it could
> > > be any number, letter, your favorite short story, etc. You don't want
> > > to reset it before collecting the profiling data from your load tests
> > > though.
> > >
> > > > > +3) Run the load tests.
> > > > > +
> > > > > +4) Collect the raw profile data
> > > > > +
> > > > > +   .. code-block:: sh
> > > > > +
> > > > > +      $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> > > > > +
> > > >
> > > > This is only 4,9M small and seen from the date 5mins before I run the
> > > > echo-1 line.
> > > >
> > > > # ll /sys/kernel/debug/pgo
> > > > insgesamt 0
> > > > drwxr-xr-x  2 root root 0 16. Jan 17:29 .
> > > > drwx------ 41 root root 0 16. Jan 17:29 ..
> > > > -rw-------  1 root root 0 16. Jan 17:29 profraw
> > > > --w-------  1 root root 0 16. Jan 18:19 reset
> > > >
> > > > # cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> > > >
> > > > # ll /tmp/vmlinux.profraw
> > > > -rw------- 1 root root 4,9M 16. Jan 17:29 /tmp/vmlinux.profraw
> > > >
> > > > For me there was no prof-data collected from my defconfig kernel-build.
> > > >
> > > The /sys/kernel/debug/pgo/profraw file is read-only. Nothing writes to
> > > it, not even the kernel. All it does is serialize the profiling
> > > counters from a memory location in the kernel into a format that
> > > LLVM's tools can understand.
> > >
> > > > > +5) (Optional) Download the raw profile data to the HOST machine.
> > > > > +
> > > > > +6) Process the raw profile data
> > > > > +
> > > > > +   .. code-block:: sh
> > > > > +
> > > > > +      $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > > > > +
> > > >
> > > > Is that executed in /path/to/linux/git?
> > > >
> > > The llvm-profdata tool is not in the linux source tree. You need to
> > > grab it from a clang distribution (or built from clang's git repo).
> > >
> > > > > +   Note that multiple raw profile data files can be merged during this step.
> > > > > +
> > > > > +7) Rebuild the kernel using the profile data (PGO disabled)
> > > > > +
> > > > > +   .. code-block:: sh
> > > > > +
> > > > > +      $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> > > >
> > > > How big is vmlinux.profdata (make defconfig)?
> > > >
> > > I don't have numbers for this, but from what you listed here, it's ~5M
> > > in size. The size is proportional to the number of counters
> > > instrumented in the kernel.
> > >
> > > > Do I need to do a full defconfig build or can I stop the build after
> > > > let me say 10mins?
> > > >
> > > You should do a full rebuild. Make sure that PGO is disabled during the rebuild.
> > >
> >
> > Thanks Bill for all the information.
> >
> > And sorry if I am so pedantic.
> >
> > I have installed my Debian system with Legacy-BIOS enabled.
> >
> > When I rebuild with KCFLAGS=-fprofile-use=vmlinux.profdata (LLVM=1 I
> > have as a default) my system hangs on reboot.
> >
> > [ diffconfig ]
> > $ scripts/diffconfig /boot/config-5.11.0-rc3-8-amd64-clang12-pgo
> > /boot/config-5.11.0-rc3-9-amd64-clang12-pgo
> > BUILD_SALT "5.11.0-rc3-8-amd64-clang12-pgo" -> "5.11.0-rc3-9-amd64-clang12-pgo"
> > PGO_CLANG y -> n
> >
> > [ my make line ]
> > $ cat ../start-build_5.11.0-rc3-9-amd64-clang12-pgo.txt
> > dileks     63120   63095  0 06:47 pts/2    00:00:00 /usr/bin/perf_5.10
> > stat make V=1 -j4 HOSTCC=clang HOSTCXX=clang++ HOSTLD=ld.lld CC=clang
> > LD=ld.lld LLVM=1 LLVM_IAS=1 PAHOLE=/opt/pahole/bin/pahole
> > LOCALVERSION=-9-amd64-clang12-pgo KBUILD_VERBOSE=1
> > KBUILD_BUILD_HOST=iniza KBUILD_BUILD_USER=sedat.dilek@gmail.com
> > KBUILD_BUILD_TIMESTAMP=2021-01-17 bindeb-pkg
> > KDEB_PKGVERSION=5.11.0~rc3-9~bullseye+dileks1
> > KCFLAGS=-fprofile-use=vmlinux.profdata
> >
> > ( Yes, 06:47 a.m. in the morning :-). )
> >
> > When I boot with the rebuild Linux-kernel I see:
> >
> > Wrong EFI loader signature
> > ...
> > Decompressing
> > Parsing EFI
> > Performing Relocations done.
> > Booting the Kernel.
> >
> > *** SYSTEM HANGS ***
> > ( I waited for approx 1 min )
> >
> > I tried to turn UEFI support ON and OFF.
> > No success.
> >
> > Does Clang-PGO support Legacy-BIOS or is something different wrong?
> >
> > Thanks.
> >
>
> My bootloader is GRUB.
>
> In UEFI-BIOS settings there is no secure-boot disable option.
> Just simple "Use UEFI BIOS" enabled|disabled.
>
> Installed Debian packages:
>
> ii grub-common 2.04-12
> ii grub-pc 2.04-12
> ii grub-pc-bin 2.04-12
> ii grub2-common 2.04-12
>
> I found in the below link to do in grub-shell:
>
> set check_signatures=no
>
> But this is when grub-efi is installed.
>
> - Sedat -
>
> Link: https://unix.stackexchange.com/questions/126286/grub-efi-disable-signature-check

Forget about that "Wrong EFI bootloader" - I see this with all other
kernels (all boot fine).

I tried in QEMU with and without KASLR:

[ run_qemu.sh ]
KPATH=$(pwd)

APPEND="root=/dev/ram0 console=ttyS0 hung_task_panic=1 earlyprintk=ttyS0,115200"
APPEND="$APPEND nokaslr"

qemu-system-x86_64 -enable-kvm -M pc -kernel $KPATH/bzImage -initrd
$KPATH/initrd.img -m 512 -net none -serial stdio -append "${APPEND}"
[ /run_qemu.sh ]

$ ./run_qemu.sh
Probing EDD (edd=off to disable)... ok
Wrong EFI loader signature.
early console in extract_kernel
input_data: 0x000000000289940d
input_len: 0x000000000069804a
output: 0x0000000001000000
output_len: 0x0000000001ef2010
kernel_total_size: 0x0000000001c2c000
needed_size: 0x0000000002000000
trampoline_32bit: 0x000000000009d000


KASLR disabled: 'nokaslr' on cmdline.


Decompressing Linux... Parsing ELF... No relocation needed... done.
Booting the kernel.

QEMU run stops, too.

- Sedat

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v5] pgo: add clang's Profile Guided Optimization infrastructure
  2021-01-17 11:23                 ` Sedat Dilek
@ 2021-01-17 11:42                   ` Sedat Dilek
  2021-01-17 11:58                     ` Sedat Dilek
  0 siblings, 1 reply; 116+ messages in thread
From: Sedat Dilek @ 2021-01-17 11:42 UTC (permalink / raw)
  To: Bill Wendling
  Cc: Jonathan Corbet, Masahiro Yamada, Linux Doc Mailing List, LKML,
	Linux Kbuild mailing list, Clang-Built-Linux ML, Andrew Morton,
	Nathan Chancellor, Nick Desaulniers, Sami Tolvanen

On Sun, Jan 17, 2021 at 12:23 PM Sedat Dilek <sedat.dilek@gmail.com> wrote:
>
> On Sun, Jan 17, 2021 at 11:53 AM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> >
> > On Sun, Jan 17, 2021 at 11:44 AM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> > >
> > > On Sat, Jan 16, 2021 at 9:23 PM Bill Wendling <morbo@google.com> wrote:
> > > >
> > > > On Sat, Jan 16, 2021 at 9:39 AM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> > > > > On Sat, Jan 16, 2021 at 10:44 AM 'Bill Wendling' via Clang Built Linux
> > > > > <clang-built-linux@googlegroups.com> wrote:
> > > > > >
> > > > > > From: Sami Tolvanen <samitolvanen@google.com>
> > > > > >
> > > > > > Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > > > > > profile, the kernel is instrumented with PGO counters, a representative
> > > > > > workload is run, and the raw profile data is collected from
> > > > > > /sys/kernel/debug/pgo/profraw.
> > > > > >
> > > > > > The raw profile data must be processed by clang's "llvm-profdata" tool
> > > > > > before it can be used during recompilation:
> > > > > >
> > > > > >   $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > > > > >   $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > > > > >
> > > > > > Multiple raw profiles may be merged during this step.
> > > > > >
> > > > > > The data can now be used by the compiler:
> > > > > >
> > > > > >   $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> > > > > >
> > > > > > This initial submission is restricted to x86, as that's the platform we
> > > > > > know works. This restriction can be lifted once other platforms have
> > > > > > been verified to work with PGO.
> > > > > >
> > > > > > Note that this method of profiling the kernel is clang-native, unlike
> > > > > > the clang support in kernel/gcov.
> > > > > >
> > > > > > [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> > > > > >
> > > > > > Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
> > > > > > Co-developed-by: Bill Wendling <morbo@google.com>
> > > > > > Signed-off-by: Bill Wendling <morbo@google.com>
> > > > > > ---
> > > > > > v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
> > > > > >       testing.
> > > > > >     - Corrected documentation, re PGO flags when using LTO, based on Fangrui
> > > > > >       Song's comments.
> > > > > > v3: - Added change log section based on Sedat Dilek's comments.
> > > > > > v4: - Remove non-x86 Makfile changes and se "hweight64" instead of using our
> > > > > >       own popcount implementation, based on Nick Desaulniers's comment.
> > > > > > v5: - Correct padding calculation, discovered by Nathan Chancellor.
> > > > > > ---
> > > > > >  Documentation/dev-tools/index.rst     |   1 +
> > > > > >  Documentation/dev-tools/pgo.rst       | 127 +++++++++
> > > > > >  MAINTAINERS                           |   9 +
> > > > > >  Makefile                              |   3 +
> > > > > >  arch/Kconfig                          |   1 +
> > > > > >  arch/x86/Kconfig                      |   1 +
> > > > > >  arch/x86/boot/Makefile                |   1 +
> > > > > >  arch/x86/boot/compressed/Makefile     |   1 +
> > > > > >  arch/x86/crypto/Makefile              |   2 +
> > > > > >  arch/x86/entry/vdso/Makefile          |   1 +
> > > > > >  arch/x86/kernel/vmlinux.lds.S         |   2 +
> > > > > >  arch/x86/platform/efi/Makefile        |   1 +
> > > > > >  arch/x86/purgatory/Makefile           |   1 +
> > > > > >  arch/x86/realmode/rm/Makefile         |   1 +
> > > > > >  arch/x86/um/vdso/Makefile             |   1 +
> > > > > >  drivers/firmware/efi/libstub/Makefile |   1 +
> > > > > >  include/asm-generic/vmlinux.lds.h     |  44 +++
> > > > > >  kernel/Makefile                       |   1 +
> > > > > >  kernel/pgo/Kconfig                    |  35 +++
> > > > > >  kernel/pgo/Makefile                   |   5 +
> > > > > >  kernel/pgo/fs.c                       | 382 ++++++++++++++++++++++++++
> > > > > >  kernel/pgo/instrument.c               | 185 +++++++++++++
> > > > > >  kernel/pgo/pgo.h                      | 206 ++++++++++++++
> > > > > >  scripts/Makefile.lib                  |  10 +
> > > > > >  24 files changed, 1022 insertions(+)
> > > > > >  create mode 100644 Documentation/dev-tools/pgo.rst
> > > > > >  create mode 100644 kernel/pgo/Kconfig
> > > > > >  create mode 100644 kernel/pgo/Makefile
> > > > > >  create mode 100644 kernel/pgo/fs.c
> > > > > >  create mode 100644 kernel/pgo/instrument.c
> > > > > >  create mode 100644 kernel/pgo/pgo.h
> > > > > >
> > > > > > diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> > > > > > index f7809c7b1ba9e..8d6418e858062 100644
> > > > > > --- a/Documentation/dev-tools/index.rst
> > > > > > +++ b/Documentation/dev-tools/index.rst
> > > > > > @@ -26,6 +26,7 @@ whole; patches welcome!
> > > > > >     kgdb
> > > > > >     kselftest
> > > > > >     kunit/index
> > > > > > +   pgo
> > > > > >
> > > > > >
> > > > > >  .. only::  subproject and html
> > > > > > diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
> > > > > > new file mode 100644
> > > > > > index 0000000000000..b7f11d8405b73
> > > > > > --- /dev/null
> > > > > > +++ b/Documentation/dev-tools/pgo.rst
> > > > > > @@ -0,0 +1,127 @@
> > > > > > +.. SPDX-License-Identifier: GPL-2.0
> > > > > > +
> > > > > > +===============================
> > > > > > +Using PGO with the Linux kernel
> > > > > > +===============================
> > > > > > +
> > > > > > +Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
> > > > > > +when building with Clang. The profiling data is exported via the ``pgo``
> > > > > > +debugfs directory.
> > > > > > +
> > > > > > +.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> > > > > > +
> > > > > > +
> > > > > > +Preparation
> > > > > > +===========
> > > > > > +
> > > > > > +Configure the kernel with:
> > > > > > +
> > > > > > +.. code-block:: make
> > > > > > +
> > > > > > +   CONFIG_DEBUG_FS=y
> > > > > > +   CONFIG_PGO_CLANG=y
> > > > > > +
> > > > > > +Note that kernels compiled with profiling flags will be significantly larger
> > > > > > +and run slower.
> > > > > > +
> > > > > > +Profiling data will only become accessible once debugfs has been mounted:
> > > > > > +
> > > > > > +.. code-block:: sh
> > > > > > +
> > > > > > +   mount -t debugfs none /sys/kernel/debug
> > > > > > +
> > > > > > +
> > > > > > +Customization
> > > > > > +=============
> > > > > > +
> > > > > > +You can enable or disable profiling for individual file and directories by
> > > > > > +adding a line similar to the following to the respective kernel Makefile:
> > > > > > +
> > > > > > +- For a single file (e.g. main.o)
> > > > > > +
> > > > > > +  .. code-block:: make
> > > > > > +
> > > > > > +     PGO_PROFILE_main.o := y
> > > > > > +
> > > > > > +- For all files in one directory
> > > > > > +
> > > > > > +  .. code-block:: make
> > > > > > +
> > > > > > +     PGO_PROFILE := y
> > > > > > +
> > > > > > +To exclude files from being profiled use
> > > > > > +
> > > > > > +  .. code-block:: make
> > > > > > +
> > > > > > +     PGO_PROFILE_main.o := n
> > > > > > +
> > > > > > +and
> > > > > > +
> > > > > > +  .. code-block:: make
> > > > > > +
> > > > > > +     PGO_PROFILE := n
> > > > > > +
> > > > > > +Only files which are linked to the main kernel image or are compiled as kernel
> > > > > > +modules are supported by this mechanism.
> > > > > > +
> > > > > > +
> > > > > > +Files
> > > > > > +=====
> > > > > > +
> > > > > > +The PGO kernel support creates the following files in debugfs:
> > > > > > +
> > > > > > +``/sys/kernel/debug/pgo``
> > > > > > +       Parent directory for all PGO-related files.
> > > > > > +
> > > > > > +``/sys/kernel/debug/pgo/reset``
> > > > > > +       Global reset file: resets all coverage data to zero when written to.
> > > > > > +
> > > > > > +``/sys/kernel/debug/profraw``
> > > > > > +       The raw PGO data that must be processed with ``llvm_profdata``.
> > > > > > +
> > > > > > +
> > > > > > +Workflow
> > > > > > +========
> > > > > > +
> > > > > > +The PGO kernel can be run on the host or test machines. The data though should
> > > > > > +be analyzed with Clang's tools from the same Clang version as the kernel was
> > > > > > +compiled. Clang's tolerant of version skew, but it's easier to use the same
> > > > > > +Clang version.
> > > > > > +
> > > > > > +The profiling data is useful for optimizing the kernel, analyzing coverage,
> > > > > > +etc. Clang offers tools to perform these tasks.
> > > > > > +
> > > > > > +Here is an example workflow for profiling an instrumented kernel with PGO and
> > > > > > +using the result to optimize the kernel:
> > > > > > +
> > > > > > +1) Install the kernel on the TEST machine.
> > > > > > +
> > > > > > +2) Reset the data counters right before running the load tests
> > > > > > +
> > > > > > +   .. code-block:: sh
> > > > > > +
> > > > > > +      $ echo 1 > /sys/kernel/debug/pgo/reset
> > > > > > +
> > > > >
> > > > > I do not get this...
> > > > >
> > > > > # mount | grep debugfs
> > > > > debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
> > > > >
> > > > > After the load-test...?
> > > > >
> > > > > echo 0 > /sys/kernel/debug/pgo/reset
> > > > >
> > > > Writing anything to /sys/kernel/debug/pgo/reset will cause it to reset
> > > > the profiling counters. I picked 1 (one) semi-randomly, but it could
> > > > be any number, letter, your favorite short story, etc. You don't want
> > > > to reset it before collecting the profiling data from your load tests
> > > > though.
> > > >
> > > > > > +3) Run the load tests.
> > > > > > +
> > > > > > +4) Collect the raw profile data
> > > > > > +
> > > > > > +   .. code-block:: sh
> > > > > > +
> > > > > > +      $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> > > > > > +
> > > > >
> > > > > This is only 4,9M small and seen from the date 5mins before I run the
> > > > > echo-1 line.
> > > > >
> > > > > # ll /sys/kernel/debug/pgo
> > > > > insgesamt 0
> > > > > drwxr-xr-x  2 root root 0 16. Jan 17:29 .
> > > > > drwx------ 41 root root 0 16. Jan 17:29 ..
> > > > > -rw-------  1 root root 0 16. Jan 17:29 profraw
> > > > > --w-------  1 root root 0 16. Jan 18:19 reset
> > > > >
> > > > > # cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> > > > >
> > > > > # ll /tmp/vmlinux.profraw
> > > > > -rw------- 1 root root 4,9M 16. Jan 17:29 /tmp/vmlinux.profraw
> > > > >
> > > > > For me there was no prof-data collected from my defconfig kernel-build.
> > > > >
> > > > The /sys/kernel/debug/pgo/profraw file is read-only. Nothing writes to
> > > > it, not even the kernel. All it does is serialize the profiling
> > > > counters from a memory location in the kernel into a format that
> > > > LLVM's tools can understand.
> > > >
> > > > > > +5) (Optional) Download the raw profile data to the HOST machine.
> > > > > > +
> > > > > > +6) Process the raw profile data
> > > > > > +
> > > > > > +   .. code-block:: sh
> > > > > > +
> > > > > > +      $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > > > > > +
> > > > >
> > > > > Is that executed in /path/to/linux/git?
> > > > >
> > > > The llvm-profdata tool is not in the linux source tree. You need to
> > > > grab it from a clang distribution (or built from clang's git repo).
> > > >
> > > > > > +   Note that multiple raw profile data files can be merged during this step.
> > > > > > +
> > > > > > +7) Rebuild the kernel using the profile data (PGO disabled)
> > > > > > +
> > > > > > +   .. code-block:: sh
> > > > > > +
> > > > > > +      $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> > > > >
> > > > > How big is vmlinux.profdata (make defconfig)?
> > > > >
> > > > I don't have numbers for this, but from what you listed here, it's ~5M
> > > > in size. The size is proportional to the number of counters
> > > > instrumented in the kernel.
> > > >
> > > > > Do I need to do a full defconfig build or can I stop the build after
> > > > > let me say 10mins?
> > > > >
> > > > You should do a full rebuild. Make sure that PGO is disabled during the rebuild.
> > > >
> > >
> > > Thanks Bill for all the information.
> > >
> > > And sorry if I am so pedantic.
> > >
> > > I have installed my Debian system with Legacy-BIOS enabled.
> > >
> > > When I rebuild with KCFLAGS=-fprofile-use=vmlinux.profdata (LLVM=1 I
> > > have as a default) my system hangs on reboot.
> > >
> > > [ diffconfig ]
> > > $ scripts/diffconfig /boot/config-5.11.0-rc3-8-amd64-clang12-pgo
> > > /boot/config-5.11.0-rc3-9-amd64-clang12-pgo
> > > BUILD_SALT "5.11.0-rc3-8-amd64-clang12-pgo" -> "5.11.0-rc3-9-amd64-clang12-pgo"
> > > PGO_CLANG y -> n
> > >
> > > [ my make line ]
> > > $ cat ../start-build_5.11.0-rc3-9-amd64-clang12-pgo.txt
> > > dileks     63120   63095  0 06:47 pts/2    00:00:00 /usr/bin/perf_5.10
> > > stat make V=1 -j4 HOSTCC=clang HOSTCXX=clang++ HOSTLD=ld.lld CC=clang
> > > LD=ld.lld LLVM=1 LLVM_IAS=1 PAHOLE=/opt/pahole/bin/pahole
> > > LOCALVERSION=-9-amd64-clang12-pgo KBUILD_VERBOSE=1
> > > KBUILD_BUILD_HOST=iniza KBUILD_BUILD_USER=sedat.dilek@gmail.com
> > > KBUILD_BUILD_TIMESTAMP=2021-01-17 bindeb-pkg
> > > KDEB_PKGVERSION=5.11.0~rc3-9~bullseye+dileks1
> > > KCFLAGS=-fprofile-use=vmlinux.profdata
> > >
> > > ( Yes, 06:47 a.m. in the morning :-). )
> > >
> > > When I boot with the rebuild Linux-kernel I see:
> > >
> > > Wrong EFI loader signature
> > > ...
> > > Decompressing
> > > Parsing EFI
> > > Performing Relocations done.
> > > Booting the Kernel.
> > >
> > > *** SYSTEM HANGS ***
> > > ( I waited for approx 1 min )
> > >
> > > I tried to turn UEFI support ON and OFF.
> > > No success.
> > >
> > > Does Clang-PGO support Legacy-BIOS or is something different wrong?
> > >
> > > Thanks.
> > >
> >
> > My bootloader is GRUB.
> >
> > In UEFI-BIOS settings there is no secure-boot disable option.
> > Just simple "Use UEFI BIOS" enabled|disabled.
> >
> > Installed Debian packages:
> >
> > ii grub-common 2.04-12
> > ii grub-pc 2.04-12
> > ii grub-pc-bin 2.04-12
> > ii grub2-common 2.04-12
> >
> > I found in the below link to do in grub-shell:
> >
> > set check_signatures=no
> >
> > But this is when grub-efi is installed.
> >
> > - Sedat -
> >
> > Link: https://unix.stackexchange.com/questions/126286/grub-efi-disable-signature-check
>
> Forget about that "Wrong EFI bootloader" - I see this with all other
> kernels (all boot fine).
>
> I tried in QEMU with and without KASLR:
>
> [ run_qemu.sh ]
> KPATH=$(pwd)
>
> APPEND="root=/dev/ram0 console=ttyS0 hung_task_panic=1 earlyprintk=ttyS0,115200"
> APPEND="$APPEND nokaslr"
>
> qemu-system-x86_64 -enable-kvm -M pc -kernel $KPATH/bzImage -initrd
> $KPATH/initrd.img -m 512 -net none -serial stdio -append "${APPEND}"
> [ /run_qemu.sh ]
>
> $ ./run_qemu.sh
> Probing EDD (edd=off to disable)... ok
> Wrong EFI loader signature.
> early console in extract_kernel
> input_data: 0x000000000289940d
> input_len: 0x000000000069804a
> output: 0x0000000001000000
> output_len: 0x0000000001ef2010
> kernel_total_size: 0x0000000001c2c000
> needed_size: 0x0000000002000000
> trampoline_32bit: 0x000000000009d000
>
>
> KASLR disabled: 'nokaslr' on cmdline.
>
>
> Decompressing Linux... Parsing ELF... No relocation needed... done.
> Booting the kernel.
>
> QEMU run stops, too.
>

I re-generated my initrd.img with GZIP as compressor (my default is ZSTD).

--- /etc/initramfs-tools/initramfs.conf 2021-01-17 12:35:30.823818501 +0100
+++ /etc/initramfs-tools/initramfs.conf.zstd    2020-09-21
23:55:43.121735427 +0200
@@ -41,7 +41,7 @@ KEYMAP=n
# COMPRESS: [ gzip | bzip2 | lz4 | lzma | lzop | xz | zstd ]
#

-COMPRESS=gzip
+COMPRESS=zstd

#
# DEVICE: ...

root# KVER="5.11.0-rc3-9-amd64-clang12-pgo" ; update-initramfs -c -k $KVER

QEMU boot stops at the same stage.

Now, my head is empty...

Any comments?

- Sedat -

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v5] pgo: add clang's Profile Guided Optimization infrastructure
  2021-01-17 11:42                   ` Sedat Dilek
@ 2021-01-17 11:58                     ` Sedat Dilek
       [not found]                       ` <CA+icZUXmn15w=kSq2CZzQD5JggJw_9AEam=Sz13M0KpJ68MWZg@mail.gmail.com>
  0 siblings, 1 reply; 116+ messages in thread
From: Sedat Dilek @ 2021-01-17 11:58 UTC (permalink / raw)
  To: Bill Wendling
  Cc: Jonathan Corbet, Masahiro Yamada, Linux Doc Mailing List, LKML,
	Linux Kbuild mailing list, Clang-Built-Linux ML, Andrew Morton,
	Nathan Chancellor, Nick Desaulniers, Sami Tolvanen

On Sun, Jan 17, 2021 at 12:42 PM Sedat Dilek <sedat.dilek@gmail.com> wrote:
>
> On Sun, Jan 17, 2021 at 12:23 PM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> >
> > On Sun, Jan 17, 2021 at 11:53 AM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> > >
> > > On Sun, Jan 17, 2021 at 11:44 AM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> > > >
> > > > On Sat, Jan 16, 2021 at 9:23 PM Bill Wendling <morbo@google.com> wrote:
> > > > >
> > > > > On Sat, Jan 16, 2021 at 9:39 AM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> > > > > > On Sat, Jan 16, 2021 at 10:44 AM 'Bill Wendling' via Clang Built Linux
> > > > > > <clang-built-linux@googlegroups.com> wrote:
> > > > > > >
> > > > > > > From: Sami Tolvanen <samitolvanen@google.com>
> > > > > > >
> > > > > > > Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > > > > > > profile, the kernel is instrumented with PGO counters, a representative
> > > > > > > workload is run, and the raw profile data is collected from
> > > > > > > /sys/kernel/debug/pgo/profraw.
> > > > > > >
> > > > > > > The raw profile data must be processed by clang's "llvm-profdata" tool
> > > > > > > before it can be used during recompilation:
> > > > > > >
> > > > > > >   $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > > > > > >   $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > > > > > >
> > > > > > > Multiple raw profiles may be merged during this step.
> > > > > > >
> > > > > > > The data can now be used by the compiler:
> > > > > > >
> > > > > > >   $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> > > > > > >
> > > > > > > This initial submission is restricted to x86, as that's the platform we
> > > > > > > know works. This restriction can be lifted once other platforms have
> > > > > > > been verified to work with PGO.
> > > > > > >
> > > > > > > Note that this method of profiling the kernel is clang-native, unlike
> > > > > > > the clang support in kernel/gcov.
> > > > > > >
> > > > > > > [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> > > > > > >
> > > > > > > Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
> > > > > > > Co-developed-by: Bill Wendling <morbo@google.com>
> > > > > > > Signed-off-by: Bill Wendling <morbo@google.com>
> > > > > > > ---
> > > > > > > v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
> > > > > > >       testing.
> > > > > > >     - Corrected documentation, re PGO flags when using LTO, based on Fangrui
> > > > > > >       Song's comments.
> > > > > > > v3: - Added change log section based on Sedat Dilek's comments.
> > > > > > > v4: - Remove non-x86 Makfile changes and se "hweight64" instead of using our
> > > > > > >       own popcount implementation, based on Nick Desaulniers's comment.
> > > > > > > v5: - Correct padding calculation, discovered by Nathan Chancellor.
> > > > > > > ---
> > > > > > >  Documentation/dev-tools/index.rst     |   1 +
> > > > > > >  Documentation/dev-tools/pgo.rst       | 127 +++++++++
> > > > > > >  MAINTAINERS                           |   9 +
> > > > > > >  Makefile                              |   3 +
> > > > > > >  arch/Kconfig                          |   1 +
> > > > > > >  arch/x86/Kconfig                      |   1 +
> > > > > > >  arch/x86/boot/Makefile                |   1 +
> > > > > > >  arch/x86/boot/compressed/Makefile     |   1 +
> > > > > > >  arch/x86/crypto/Makefile              |   2 +
> > > > > > >  arch/x86/entry/vdso/Makefile          |   1 +
> > > > > > >  arch/x86/kernel/vmlinux.lds.S         |   2 +
> > > > > > >  arch/x86/platform/efi/Makefile        |   1 +
> > > > > > >  arch/x86/purgatory/Makefile           |   1 +
> > > > > > >  arch/x86/realmode/rm/Makefile         |   1 +
> > > > > > >  arch/x86/um/vdso/Makefile             |   1 +
> > > > > > >  drivers/firmware/efi/libstub/Makefile |   1 +
> > > > > > >  include/asm-generic/vmlinux.lds.h     |  44 +++
> > > > > > >  kernel/Makefile                       |   1 +
> > > > > > >  kernel/pgo/Kconfig                    |  35 +++
> > > > > > >  kernel/pgo/Makefile                   |   5 +
> > > > > > >  kernel/pgo/fs.c                       | 382 ++++++++++++++++++++++++++
> > > > > > >  kernel/pgo/instrument.c               | 185 +++++++++++++
> > > > > > >  kernel/pgo/pgo.h                      | 206 ++++++++++++++
> > > > > > >  scripts/Makefile.lib                  |  10 +
> > > > > > >  24 files changed, 1022 insertions(+)
> > > > > > >  create mode 100644 Documentation/dev-tools/pgo.rst
> > > > > > >  create mode 100644 kernel/pgo/Kconfig
> > > > > > >  create mode 100644 kernel/pgo/Makefile
> > > > > > >  create mode 100644 kernel/pgo/fs.c
> > > > > > >  create mode 100644 kernel/pgo/instrument.c
> > > > > > >  create mode 100644 kernel/pgo/pgo.h
> > > > > > >
> > > > > > > diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> > > > > > > index f7809c7b1ba9e..8d6418e858062 100644
> > > > > > > --- a/Documentation/dev-tools/index.rst
> > > > > > > +++ b/Documentation/dev-tools/index.rst
> > > > > > > @@ -26,6 +26,7 @@ whole; patches welcome!
> > > > > > >     kgdb
> > > > > > >     kselftest
> > > > > > >     kunit/index
> > > > > > > +   pgo
> > > > > > >
> > > > > > >
> > > > > > >  .. only::  subproject and html
> > > > > > > diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
> > > > > > > new file mode 100644
> > > > > > > index 0000000000000..b7f11d8405b73
> > > > > > > --- /dev/null
> > > > > > > +++ b/Documentation/dev-tools/pgo.rst
> > > > > > > @@ -0,0 +1,127 @@
> > > > > > > +.. SPDX-License-Identifier: GPL-2.0
> > > > > > > +
> > > > > > > +===============================
> > > > > > > +Using PGO with the Linux kernel
> > > > > > > +===============================
> > > > > > > +
> > > > > > > +Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
> > > > > > > +when building with Clang. The profiling data is exported via the ``pgo``
> > > > > > > +debugfs directory.
> > > > > > > +
> > > > > > > +.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> > > > > > > +
> > > > > > > +
> > > > > > > +Preparation
> > > > > > > +===========
> > > > > > > +
> > > > > > > +Configure the kernel with:
> > > > > > > +
> > > > > > > +.. code-block:: make
> > > > > > > +
> > > > > > > +   CONFIG_DEBUG_FS=y
> > > > > > > +   CONFIG_PGO_CLANG=y
> > > > > > > +
> > > > > > > +Note that kernels compiled with profiling flags will be significantly larger
> > > > > > > +and run slower.
> > > > > > > +
> > > > > > > +Profiling data will only become accessible once debugfs has been mounted:
> > > > > > > +
> > > > > > > +.. code-block:: sh
> > > > > > > +
> > > > > > > +   mount -t debugfs none /sys/kernel/debug
> > > > > > > +
> > > > > > > +
> > > > > > > +Customization
> > > > > > > +=============
> > > > > > > +
> > > > > > > +You can enable or disable profiling for individual file and directories by
> > > > > > > +adding a line similar to the following to the respective kernel Makefile:
> > > > > > > +
> > > > > > > +- For a single file (e.g. main.o)
> > > > > > > +
> > > > > > > +  .. code-block:: make
> > > > > > > +
> > > > > > > +     PGO_PROFILE_main.o := y
> > > > > > > +
> > > > > > > +- For all files in one directory
> > > > > > > +
> > > > > > > +  .. code-block:: make
> > > > > > > +
> > > > > > > +     PGO_PROFILE := y
> > > > > > > +
> > > > > > > +To exclude files from being profiled use
> > > > > > > +
> > > > > > > +  .. code-block:: make
> > > > > > > +
> > > > > > > +     PGO_PROFILE_main.o := n
> > > > > > > +
> > > > > > > +and
> > > > > > > +
> > > > > > > +  .. code-block:: make
> > > > > > > +
> > > > > > > +     PGO_PROFILE := n
> > > > > > > +
> > > > > > > +Only files which are linked to the main kernel image or are compiled as kernel
> > > > > > > +modules are supported by this mechanism.
> > > > > > > +
> > > > > > > +
> > > > > > > +Files
> > > > > > > +=====
> > > > > > > +
> > > > > > > +The PGO kernel support creates the following files in debugfs:
> > > > > > > +
> > > > > > > +``/sys/kernel/debug/pgo``
> > > > > > > +       Parent directory for all PGO-related files.
> > > > > > > +
> > > > > > > +``/sys/kernel/debug/pgo/reset``
> > > > > > > +       Global reset file: resets all coverage data to zero when written to.
> > > > > > > +
> > > > > > > +``/sys/kernel/debug/profraw``
> > > > > > > +       The raw PGO data that must be processed with ``llvm_profdata``.
> > > > > > > +
> > > > > > > +
> > > > > > > +Workflow
> > > > > > > +========
> > > > > > > +
> > > > > > > +The PGO kernel can be run on the host or test machines. The data though should
> > > > > > > +be analyzed with Clang's tools from the same Clang version as the kernel was
> > > > > > > +compiled. Clang's tolerant of version skew, but it's easier to use the same
> > > > > > > +Clang version.
> > > > > > > +
> > > > > > > +The profiling data is useful for optimizing the kernel, analyzing coverage,
> > > > > > > +etc. Clang offers tools to perform these tasks.
> > > > > > > +
> > > > > > > +Here is an example workflow for profiling an instrumented kernel with PGO and
> > > > > > > +using the result to optimize the kernel:
> > > > > > > +
> > > > > > > +1) Install the kernel on the TEST machine.
> > > > > > > +
> > > > > > > +2) Reset the data counters right before running the load tests
> > > > > > > +
> > > > > > > +   .. code-block:: sh
> > > > > > > +
> > > > > > > +      $ echo 1 > /sys/kernel/debug/pgo/reset
> > > > > > > +
> > > > > >
> > > > > > I do not get this...
> > > > > >
> > > > > > # mount | grep debugfs
> > > > > > debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
> > > > > >
> > > > > > After the load-test...?
> > > > > >
> > > > > > echo 0 > /sys/kernel/debug/pgo/reset
> > > > > >
> > > > > Writing anything to /sys/kernel/debug/pgo/reset will cause it to reset
> > > > > the profiling counters. I picked 1 (one) semi-randomly, but it could
> > > > > be any number, letter, your favorite short story, etc. You don't want
> > > > > to reset it before collecting the profiling data from your load tests
> > > > > though.
> > > > >
> > > > > > > +3) Run the load tests.
> > > > > > > +
> > > > > > > +4) Collect the raw profile data
> > > > > > > +
> > > > > > > +   .. code-block:: sh
> > > > > > > +
> > > > > > > +      $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> > > > > > > +
> > > > > >
> > > > > > This is only 4,9M small and seen from the date 5mins before I run the
> > > > > > echo-1 line.
> > > > > >
> > > > > > # ll /sys/kernel/debug/pgo
> > > > > > insgesamt 0
> > > > > > drwxr-xr-x  2 root root 0 16. Jan 17:29 .
> > > > > > drwx------ 41 root root 0 16. Jan 17:29 ..
> > > > > > -rw-------  1 root root 0 16. Jan 17:29 profraw
> > > > > > --w-------  1 root root 0 16. Jan 18:19 reset
> > > > > >
> > > > > > # cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> > > > > >
> > > > > > # ll /tmp/vmlinux.profraw
> > > > > > -rw------- 1 root root 4,9M 16. Jan 17:29 /tmp/vmlinux.profraw
> > > > > >
> > > > > > For me there was no prof-data collected from my defconfig kernel-build.
> > > > > >
> > > > > The /sys/kernel/debug/pgo/profraw file is read-only. Nothing writes to
> > > > > it, not even the kernel. All it does is serialize the profiling
> > > > > counters from a memory location in the kernel into a format that
> > > > > LLVM's tools can understand.
> > > > >
> > > > > > > +5) (Optional) Download the raw profile data to the HOST machine.
> > > > > > > +
> > > > > > > +6) Process the raw profile data
> > > > > > > +
> > > > > > > +   .. code-block:: sh
> > > > > > > +
> > > > > > > +      $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > > > > > > +
> > > > > >
> > > > > > Is that executed in /path/to/linux/git?
> > > > > >
> > > > > The llvm-profdata tool is not in the linux source tree. You need to
> > > > > grab it from a clang distribution (or built from clang's git repo).
> > > > >
> > > > > > > +   Note that multiple raw profile data files can be merged during this step.
> > > > > > > +
> > > > > > > +7) Rebuild the kernel using the profile data (PGO disabled)
> > > > > > > +
> > > > > > > +   .. code-block:: sh
> > > > > > > +
> > > > > > > +      $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> > > > > >
> > > > > > How big is vmlinux.profdata (make defconfig)?
> > > > > >
> > > > > I don't have numbers for this, but from what you listed here, it's ~5M
> > > > > in size. The size is proportional to the number of counters
> > > > > instrumented in the kernel.
> > > > >
> > > > > > Do I need to do a full defconfig build or can I stop the build after
> > > > > > let me say 10mins?
> > > > > >
> > > > > You should do a full rebuild. Make sure that PGO is disabled during the rebuild.
> > > > >
> > > >
> > > > Thanks Bill for all the information.
> > > >
> > > > And sorry if I am so pedantic.
> > > >
> > > > I have installed my Debian system with Legacy-BIOS enabled.
> > > >
> > > > When I rebuild with KCFLAGS=-fprofile-use=vmlinux.profdata (LLVM=1 I
> > > > have as a default) my system hangs on reboot.
> > > >
> > > > [ diffconfig ]
> > > > $ scripts/diffconfig /boot/config-5.11.0-rc3-8-amd64-clang12-pgo
> > > > /boot/config-5.11.0-rc3-9-amd64-clang12-pgo
> > > > BUILD_SALT "5.11.0-rc3-8-amd64-clang12-pgo" -> "5.11.0-rc3-9-amd64-clang12-pgo"
> > > > PGO_CLANG y -> n
> > > >
> > > > [ my make line ]
> > > > $ cat ../start-build_5.11.0-rc3-9-amd64-clang12-pgo.txt
> > > > dileks     63120   63095  0 06:47 pts/2    00:00:00 /usr/bin/perf_5.10
> > > > stat make V=1 -j4 HOSTCC=clang HOSTCXX=clang++ HOSTLD=ld.lld CC=clang
> > > > LD=ld.lld LLVM=1 LLVM_IAS=1 PAHOLE=/opt/pahole/bin/pahole
> > > > LOCALVERSION=-9-amd64-clang12-pgo KBUILD_VERBOSE=1
> > > > KBUILD_BUILD_HOST=iniza KBUILD_BUILD_USER=sedat.dilek@gmail.com
> > > > KBUILD_BUILD_TIMESTAMP=2021-01-17 bindeb-pkg
> > > > KDEB_PKGVERSION=5.11.0~rc3-9~bullseye+dileks1
> > > > KCFLAGS=-fprofile-use=vmlinux.profdata
> > > >
> > > > ( Yes, 06:47 a.m. in the morning :-). )
> > > >
> > > > When I boot with the rebuild Linux-kernel I see:
> > > >
> > > > Wrong EFI loader signature
> > > > ...
> > > > Decompressing
> > > > Parsing EFI
> > > > Performing Relocations done.
> > > > Booting the Kernel.
> > > >
> > > > *** SYSTEM HANGS ***
> > > > ( I waited for approx 1 min )
> > > >
> > > > I tried to turn UEFI support ON and OFF.
> > > > No success.
> > > >
> > > > Does Clang-PGO support Legacy-BIOS or is something different wrong?
> > > >
> > > > Thanks.
> > > >
> > >
> > > My bootloader is GRUB.
> > >
> > > In UEFI-BIOS settings there is no secure-boot disable option.
> > > Just simple "Use UEFI BIOS" enabled|disabled.
> > >
> > > Installed Debian packages:
> > >
> > > ii grub-common 2.04-12
> > > ii grub-pc 2.04-12
> > > ii grub-pc-bin 2.04-12
> > > ii grub2-common 2.04-12
> > >
> > > I found in the below link to do in grub-shell:
> > >
> > > set check_signatures=no
> > >
> > > But this is when grub-efi is installed.
> > >
> > > - Sedat -
> > >
> > > Link: https://unix.stackexchange.com/questions/126286/grub-efi-disable-signature-check
> >
> > Forget about that "Wrong EFI bootloader" - I see this with all other
> > kernels (all boot fine).
> >
> > I tried in QEMU with and without KASLR:
> >
> > [ run_qemu.sh ]
> > KPATH=$(pwd)
> >
> > APPEND="root=/dev/ram0 console=ttyS0 hung_task_panic=1 earlyprintk=ttyS0,115200"
> > APPEND="$APPEND nokaslr"
> >
> > qemu-system-x86_64 -enable-kvm -M pc -kernel $KPATH/bzImage -initrd
> > $KPATH/initrd.img -m 512 -net none -serial stdio -append "${APPEND}"
> > [ /run_qemu.sh ]
> >
> > $ ./run_qemu.sh
> > Probing EDD (edd=off to disable)... ok
> > Wrong EFI loader signature.
> > early console in extract_kernel
> > input_data: 0x000000000289940d
> > input_len: 0x000000000069804a
> > output: 0x0000000001000000
> > output_len: 0x0000000001ef2010
> > kernel_total_size: 0x0000000001c2c000
> > needed_size: 0x0000000002000000
> > trampoline_32bit: 0x000000000009d000
> >
> >
> > KASLR disabled: 'nokaslr' on cmdline.
> >
> >
> > Decompressing Linux... Parsing ELF... No relocation needed... done.
> > Booting the kernel.
> >
> > QEMU run stops, too.
> >
>
> I re-generated my initrd.img with GZIP as compressor (my default is ZSTD).
>
> --- /etc/initramfs-tools/initramfs.conf 2021-01-17 12:35:30.823818501 +0100
> +++ /etc/initramfs-tools/initramfs.conf.zstd    2020-09-21
> 23:55:43.121735427 +0200
> @@ -41,7 +41,7 @@ KEYMAP=n
> # COMPRESS: [ gzip | bzip2 | lz4 | lzma | lzop | xz | zstd ]
> #
>
> -COMPRESS=gzip
> +COMPRESS=zstd
>
> #
> # DEVICE: ...
>
> root# KVER="5.11.0-rc3-9-amd64-clang12-pgo" ; update-initramfs -c -k $KVER
>
> QEMU boot stops at the same stage.
>
> Now, my head is empty...
>
> Any comments?
>

( Just as a side note I have Nick's DWARF-v5 support enabled. )

There is one EFI related warning in my build-log:

$ grep warning: build-log_5.11.0-rc3-9-amd64-clang12-pgo.txt
dpkg-architecture: warning: specified GNU system type x86_64-linux-gnu
does not match CC system type x86_64-pc-linux-gnu, try setting a
correct CC environment variable
warning: arch/x86/platform/efi/quirks.c: Function control flow change
detected (hash mismatch) efi_arch_mem_reserve Hash =
391331300655996873 [-Wbackend-plugin]
warning: arch/x86/platform/efi/efi.c: Function control flow change
detected (hash mismatch) efi_attr_is_visible Hash = 567185240781730690
[-Wbackend-plugin]
arch/x86/crypto/aegis128-aesni-glue.c:265:30: warning: unused variable
'simd_alg' [-Wunused-variable]
warning: lib/crypto/sha256.c: Function control flow change detected
(hash mismatch) sha256_update Hash = 744640996947387358
[-Wbackend-plugin]
warning: arch/x86/boot/compressed/string.c: Function control flow
change detected (hash mismatch) memcmp Hash = 742261418966908927
[-Wbackend-plugin]
warning: arch/x86/boot/compressed/string.c: Function control flow
change detected (hash mismatch) bcmp Hash = 742261418966908927
[-Wbackend-plugin]
warning: arch/x86/boot/compressed/string.c: Function control flow
change detected (hash mismatch) strcmp Hash = 536873291001348520
[-Wbackend-plugin]
warning: arch/x86/boot/compressed/string.c: Function control flow
change detected (hash mismatch) strnlen Hash = 146835646621254984
[-Wbackend-plugin]
warning: arch/x86/boot/compressed/string.c: Function control flow
change detected (hash mismatch) simple_strtoull Hash =
252792765950587360 [-Wbackend-plugin]
warning: arch/x86/boot/compressed/string.c: Function control flow
change detected (hash mismatch) strstr Hash = 391331303349076211
[-Wbackend-plugin]
warning: arch/x86/boot/compressed/string.c: Function control flow
change detected (hash mismatch) strchr Hash = 1063705159280644635
[-Wbackend-plugin]
warning: arch/x86/boot/compressed/string.c: Function control flow
change detected (hash mismatch) kstrtoull Hash = 758414239132790022
[-Wbackend-plugin]
drivers/infiniband/hw/hfi1/platform.o: warning: objtool: tune_serdes()
falls through to next function apply_tx_lanes()

Cannot say if this information is helpful.

- Sedat -

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v5] pgo: add clang's Profile Guided Optimization infrastructure
       [not found]                       ` <CA+icZUXmn15w=kSq2CZzQD5JggJw_9AEam=Sz13M0KpJ68MWZg@mail.gmail.com>
@ 2021-01-17 17:42                         ` Sedat Dilek
  2021-01-17 20:34                           ` Bill Wendling
  0 siblings, 1 reply; 116+ messages in thread
From: Sedat Dilek @ 2021-01-17 17:42 UTC (permalink / raw)
  To: Bill Wendling
  Cc: Jonathan Corbet, Masahiro Yamada, Linux Doc Mailing List, LKML,
	Linux Kbuild mailing list, Clang-Built-Linux ML, Andrew Morton,
	Nathan Chancellor, Nick Desaulniers, Sami Tolvanen

On Sun, Jan 17, 2021 at 1:05 PM Sedat Dilek <sedat.dilek@gmail.com> wrote:
>
> On Sun, Jan 17, 2021 at 12:58 PM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> >
> > On Sun, Jan 17, 2021 at 12:42 PM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> > >
> > > On Sun, Jan 17, 2021 at 12:23 PM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> > > >
> > > > On Sun, Jan 17, 2021 at 11:53 AM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> > > > >
> > > > > On Sun, Jan 17, 2021 at 11:44 AM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> > > > > >
> > > > > > On Sat, Jan 16, 2021 at 9:23 PM Bill Wendling <morbo@google.com> wrote:
> > > > > > >
> > > > > > > On Sat, Jan 16, 2021 at 9:39 AM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> > > > > > > > On Sat, Jan 16, 2021 at 10:44 AM 'Bill Wendling' via Clang Built Linux
> > > > > > > > <clang-built-linux@googlegroups.com> wrote:
> > > > > > > > >
> > > > > > > > > From: Sami Tolvanen <samitolvanen@google.com>
> > > > > > > > >
> > > > > > > > > Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > > > > > > > > profile, the kernel is instrumented with PGO counters, a representative
> > > > > > > > > workload is run, and the raw profile data is collected from
> > > > > > > > > /sys/kernel/debug/pgo/profraw.
> > > > > > > > >
> > > > > > > > > The raw profile data must be processed by clang's "llvm-profdata" tool
> > > > > > > > > before it can be used during recompilation:
> > > > > > > > >
> > > > > > > > >   $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > > > > > > > >   $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > > > > > > > >
> > > > > > > > > Multiple raw profiles may be merged during this step.
> > > > > > > > >
> > > > > > > > > The data can now be used by the compiler:
> > > > > > > > >
> > > > > > > > >   $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> > > > > > > > >
> > > > > > > > > This initial submission is restricted to x86, as that's the platform we
> > > > > > > > > know works. This restriction can be lifted once other platforms have
> > > > > > > > > been verified to work with PGO.
> > > > > > > > >
> > > > > > > > > Note that this method of profiling the kernel is clang-native, unlike
> > > > > > > > > the clang support in kernel/gcov.
> > > > > > > > >
> > > > > > > > > [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> > > > > > > > >
> > > > > > > > > Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
> > > > > > > > > Co-developed-by: Bill Wendling <morbo@google.com>
> > > > > > > > > Signed-off-by: Bill Wendling <morbo@google.com>
> > > > > > > > > ---
> > > > > > > > > v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
> > > > > > > > >       testing.
> > > > > > > > >     - Corrected documentation, re PGO flags when using LTO, based on Fangrui
> > > > > > > > >       Song's comments.
> > > > > > > > > v3: - Added change log section based on Sedat Dilek's comments.
> > > > > > > > > v4: - Remove non-x86 Makfile changes and se "hweight64" instead of using our
> > > > > > > > >       own popcount implementation, based on Nick Desaulniers's comment.
> > > > > > > > > v5: - Correct padding calculation, discovered by Nathan Chancellor.
> > > > > > > > > ---
> > > > > > > > >  Documentation/dev-tools/index.rst     |   1 +
> > > > > > > > >  Documentation/dev-tools/pgo.rst       | 127 +++++++++
> > > > > > > > >  MAINTAINERS                           |   9 +
> > > > > > > > >  Makefile                              |   3 +
> > > > > > > > >  arch/Kconfig                          |   1 +
> > > > > > > > >  arch/x86/Kconfig                      |   1 +
> > > > > > > > >  arch/x86/boot/Makefile                |   1 +
> > > > > > > > >  arch/x86/boot/compressed/Makefile     |   1 +
> > > > > > > > >  arch/x86/crypto/Makefile              |   2 +
> > > > > > > > >  arch/x86/entry/vdso/Makefile          |   1 +
> > > > > > > > >  arch/x86/kernel/vmlinux.lds.S         |   2 +
> > > > > > > > >  arch/x86/platform/efi/Makefile        |   1 +
> > > > > > > > >  arch/x86/purgatory/Makefile           |   1 +
> > > > > > > > >  arch/x86/realmode/rm/Makefile         |   1 +
> > > > > > > > >  arch/x86/um/vdso/Makefile             |   1 +
> > > > > > > > >  drivers/firmware/efi/libstub/Makefile |   1 +
> > > > > > > > >  include/asm-generic/vmlinux.lds.h     |  44 +++
> > > > > > > > >  kernel/Makefile                       |   1 +
> > > > > > > > >  kernel/pgo/Kconfig                    |  35 +++
> > > > > > > > >  kernel/pgo/Makefile                   |   5 +
> > > > > > > > >  kernel/pgo/fs.c                       | 382 ++++++++++++++++++++++++++
> > > > > > > > >  kernel/pgo/instrument.c               | 185 +++++++++++++
> > > > > > > > >  kernel/pgo/pgo.h                      | 206 ++++++++++++++
> > > > > > > > >  scripts/Makefile.lib                  |  10 +
> > > > > > > > >  24 files changed, 1022 insertions(+)
> > > > > > > > >  create mode 100644 Documentation/dev-tools/pgo.rst
> > > > > > > > >  create mode 100644 kernel/pgo/Kconfig
> > > > > > > > >  create mode 100644 kernel/pgo/Makefile
> > > > > > > > >  create mode 100644 kernel/pgo/fs.c
> > > > > > > > >  create mode 100644 kernel/pgo/instrument.c
> > > > > > > > >  create mode 100644 kernel/pgo/pgo.h
> > > > > > > > >
> > > > > > > > > diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> > > > > > > > > index f7809c7b1ba9e..8d6418e858062 100644
> > > > > > > > > --- a/Documentation/dev-tools/index.rst
> > > > > > > > > +++ b/Documentation/dev-tools/index.rst
> > > > > > > > > @@ -26,6 +26,7 @@ whole; patches welcome!
> > > > > > > > >     kgdb
> > > > > > > > >     kselftest
> > > > > > > > >     kunit/index
> > > > > > > > > +   pgo
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >  .. only::  subproject and html
> > > > > > > > > diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
> > > > > > > > > new file mode 100644
> > > > > > > > > index 0000000000000..b7f11d8405b73
> > > > > > > > > --- /dev/null
> > > > > > > > > +++ b/Documentation/dev-tools/pgo.rst
> > > > > > > > > @@ -0,0 +1,127 @@
> > > > > > > > > +.. SPDX-License-Identifier: GPL-2.0
> > > > > > > > > +
> > > > > > > > > +===============================
> > > > > > > > > +Using PGO with the Linux kernel
> > > > > > > > > +===============================
> > > > > > > > > +
> > > > > > > > > +Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
> > > > > > > > > +when building with Clang. The profiling data is exported via the ``pgo``
> > > > > > > > > +debugfs directory.
> > > > > > > > > +
> > > > > > > > > +.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> > > > > > > > > +
> > > > > > > > > +
> > > > > > > > > +Preparation
> > > > > > > > > +===========
> > > > > > > > > +
> > > > > > > > > +Configure the kernel with:
> > > > > > > > > +
> > > > > > > > > +.. code-block:: make
> > > > > > > > > +
> > > > > > > > > +   CONFIG_DEBUG_FS=y
> > > > > > > > > +   CONFIG_PGO_CLANG=y
> > > > > > > > > +
> > > > > > > > > +Note that kernels compiled with profiling flags will be significantly larger
> > > > > > > > > +and run slower.
> > > > > > > > > +
> > > > > > > > > +Profiling data will only become accessible once debugfs has been mounted:
> > > > > > > > > +
> > > > > > > > > +.. code-block:: sh
> > > > > > > > > +
> > > > > > > > > +   mount -t debugfs none /sys/kernel/debug
> > > > > > > > > +
> > > > > > > > > +
> > > > > > > > > +Customization
> > > > > > > > > +=============
> > > > > > > > > +
> > > > > > > > > +You can enable or disable profiling for individual file and directories by
> > > > > > > > > +adding a line similar to the following to the respective kernel Makefile:
> > > > > > > > > +
> > > > > > > > > +- For a single file (e.g. main.o)
> > > > > > > > > +
> > > > > > > > > +  .. code-block:: make
> > > > > > > > > +
> > > > > > > > > +     PGO_PROFILE_main.o := y
> > > > > > > > > +
> > > > > > > > > +- For all files in one directory
> > > > > > > > > +
> > > > > > > > > +  .. code-block:: make
> > > > > > > > > +
> > > > > > > > > +     PGO_PROFILE := y
> > > > > > > > > +
> > > > > > > > > +To exclude files from being profiled use
> > > > > > > > > +
> > > > > > > > > +  .. code-block:: make
> > > > > > > > > +
> > > > > > > > > +     PGO_PROFILE_main.o := n
> > > > > > > > > +
> > > > > > > > > +and
> > > > > > > > > +
> > > > > > > > > +  .. code-block:: make
> > > > > > > > > +
> > > > > > > > > +     PGO_PROFILE := n
> > > > > > > > > +
> > > > > > > > > +Only files which are linked to the main kernel image or are compiled as kernel
> > > > > > > > > +modules are supported by this mechanism.
> > > > > > > > > +
> > > > > > > > > +
> > > > > > > > > +Files
> > > > > > > > > +=====
> > > > > > > > > +
> > > > > > > > > +The PGO kernel support creates the following files in debugfs:
> > > > > > > > > +
> > > > > > > > > +``/sys/kernel/debug/pgo``
> > > > > > > > > +       Parent directory for all PGO-related files.
> > > > > > > > > +
> > > > > > > > > +``/sys/kernel/debug/pgo/reset``
> > > > > > > > > +       Global reset file: resets all coverage data to zero when written to.
> > > > > > > > > +
> > > > > > > > > +``/sys/kernel/debug/profraw``
> > > > > > > > > +       The raw PGO data that must be processed with ``llvm_profdata``.
> > > > > > > > > +
> > > > > > > > > +
> > > > > > > > > +Workflow
> > > > > > > > > +========
> > > > > > > > > +
> > > > > > > > > +The PGO kernel can be run on the host or test machines. The data though should
> > > > > > > > > +be analyzed with Clang's tools from the same Clang version as the kernel was
> > > > > > > > > +compiled. Clang's tolerant of version skew, but it's easier to use the same
> > > > > > > > > +Clang version.
> > > > > > > > > +
> > > > > > > > > +The profiling data is useful for optimizing the kernel, analyzing coverage,
> > > > > > > > > +etc. Clang offers tools to perform these tasks.
> > > > > > > > > +
> > > > > > > > > +Here is an example workflow for profiling an instrumented kernel with PGO and
> > > > > > > > > +using the result to optimize the kernel:
> > > > > > > > > +
> > > > > > > > > +1) Install the kernel on the TEST machine.
> > > > > > > > > +
> > > > > > > > > +2) Reset the data counters right before running the load tests
> > > > > > > > > +
> > > > > > > > > +   .. code-block:: sh
> > > > > > > > > +
> > > > > > > > > +      $ echo 1 > /sys/kernel/debug/pgo/reset
> > > > > > > > > +
> > > > > > > >
> > > > > > > > I do not get this...
> > > > > > > >
> > > > > > > > # mount | grep debugfs
> > > > > > > > debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
> > > > > > > >
> > > > > > > > After the load-test...?
> > > > > > > >
> > > > > > > > echo 0 > /sys/kernel/debug/pgo/reset
> > > > > > > >
> > > > > > > Writing anything to /sys/kernel/debug/pgo/reset will cause it to reset
> > > > > > > the profiling counters. I picked 1 (one) semi-randomly, but it could
> > > > > > > be any number, letter, your favorite short story, etc. You don't want
> > > > > > > to reset it before collecting the profiling data from your load tests
> > > > > > > though.
> > > > > > >
> > > > > > > > > +3) Run the load tests.
> > > > > > > > > +
> > > > > > > > > +4) Collect the raw profile data
> > > > > > > > > +
> > > > > > > > > +   .. code-block:: sh
> > > > > > > > > +
> > > > > > > > > +      $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> > > > > > > > > +
> > > > > > > >
> > > > > > > > This is only 4,9M small and seen from the date 5mins before I run the
> > > > > > > > echo-1 line.
> > > > > > > >
> > > > > > > > # ll /sys/kernel/debug/pgo
> > > > > > > > insgesamt 0
> > > > > > > > drwxr-xr-x  2 root root 0 16. Jan 17:29 .
> > > > > > > > drwx------ 41 root root 0 16. Jan 17:29 ..
> > > > > > > > -rw-------  1 root root 0 16. Jan 17:29 profraw
> > > > > > > > --w-------  1 root root 0 16. Jan 18:19 reset
> > > > > > > >
> > > > > > > > # cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> > > > > > > >
> > > > > > > > # ll /tmp/vmlinux.profraw
> > > > > > > > -rw------- 1 root root 4,9M 16. Jan 17:29 /tmp/vmlinux.profraw
> > > > > > > >
> > > > > > > > For me there was no prof-data collected from my defconfig kernel-build.
> > > > > > > >
> > > > > > > The /sys/kernel/debug/pgo/profraw file is read-only. Nothing writes to
> > > > > > > it, not even the kernel. All it does is serialize the profiling
> > > > > > > counters from a memory location in the kernel into a format that
> > > > > > > LLVM's tools can understand.
> > > > > > >
> > > > > > > > > +5) (Optional) Download the raw profile data to the HOST machine.
> > > > > > > > > +
> > > > > > > > > +6) Process the raw profile data
> > > > > > > > > +
> > > > > > > > > +   .. code-block:: sh
> > > > > > > > > +
> > > > > > > > > +      $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > > > > > > > > +
> > > > > > > >
> > > > > > > > Is that executed in /path/to/linux/git?
> > > > > > > >
> > > > > > > The llvm-profdata tool is not in the linux source tree. You need to
> > > > > > > grab it from a clang distribution (or built from clang's git repo).
> > > > > > >
> > > > > > > > > +   Note that multiple raw profile data files can be merged during this step.
> > > > > > > > > +
> > > > > > > > > +7) Rebuild the kernel using the profile data (PGO disabled)
> > > > > > > > > +
> > > > > > > > > +   .. code-block:: sh
> > > > > > > > > +
> > > > > > > > > +      $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> > > > > > > >
> > > > > > > > How big is vmlinux.profdata (make defconfig)?
> > > > > > > >
> > > > > > > I don't have numbers for this, but from what you listed here, it's ~5M
> > > > > > > in size. The size is proportional to the number of counters
> > > > > > > instrumented in the kernel.
> > > > > > >
> > > > > > > > Do I need to do a full defconfig build or can I stop the build after
> > > > > > > > let me say 10mins?
> > > > > > > >
> > > > > > > You should do a full rebuild. Make sure that PGO is disabled during the rebuild.
> > > > > > >
> > > > > >
> > > > > > Thanks Bill for all the information.
> > > > > >
> > > > > > And sorry if I am so pedantic.
> > > > > >
> > > > > > I have installed my Debian system with Legacy-BIOS enabled.
> > > > > >
> > > > > > When I rebuild with KCFLAGS=-fprofile-use=vmlinux.profdata (LLVM=1 I
> > > > > > have as a default) my system hangs on reboot.
> > > > > >
> > > > > > [ diffconfig ]
> > > > > > $ scripts/diffconfig /boot/config-5.11.0-rc3-8-amd64-clang12-pgo
> > > > > > /boot/config-5.11.0-rc3-9-amd64-clang12-pgo
> > > > > > BUILD_SALT "5.11.0-rc3-8-amd64-clang12-pgo" -> "5.11.0-rc3-9-amd64-clang12-pgo"
> > > > > > PGO_CLANG y -> n
> > > > > >
> > > > > > [ my make line ]
> > > > > > $ cat ../start-build_5.11.0-rc3-9-amd64-clang12-pgo.txt
> > > > > > dileks     63120   63095  0 06:47 pts/2    00:00:00 /usr/bin/perf_5.10
> > > > > > stat make V=1 -j4 HOSTCC=clang HOSTCXX=clang++ HOSTLD=ld.lld CC=clang
> > > > > > LD=ld.lld LLVM=1 LLVM_IAS=1 PAHOLE=/opt/pahole/bin/pahole
> > > > > > LOCALVERSION=-9-amd64-clang12-pgo KBUILD_VERBOSE=1
> > > > > > KBUILD_BUILD_HOST=iniza KBUILD_BUILD_USER=sedat.dilek@gmail.com
> > > > > > KBUILD_BUILD_TIMESTAMP=2021-01-17 bindeb-pkg
> > > > > > KDEB_PKGVERSION=5.11.0~rc3-9~bullseye+dileks1
> > > > > > KCFLAGS=-fprofile-use=vmlinux.profdata
> > > > > >
> > > > > > ( Yes, 06:47 a.m. in the morning :-). )
> > > > > >
> > > > > > When I boot with the rebuild Linux-kernel I see:
> > > > > >
> > > > > > Wrong EFI loader signature
> > > > > > ...
> > > > > > Decompressing
> > > > > > Parsing EFI
> > > > > > Performing Relocations done.
> > > > > > Booting the Kernel.
> > > > > >
> > > > > > *** SYSTEM HANGS ***
> > > > > > ( I waited for approx 1 min )
> > > > > >
> > > > > > I tried to turn UEFI support ON and OFF.
> > > > > > No success.
> > > > > >
> > > > > > Does Clang-PGO support Legacy-BIOS or is something different wrong?
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > >
> > > > > My bootloader is GRUB.
> > > > >
> > > > > In UEFI-BIOS settings there is no secure-boot disable option.
> > > > > Just simple "Use UEFI BIOS" enabled|disabled.
> > > > >
> > > > > Installed Debian packages:
> > > > >
> > > > > ii grub-common 2.04-12
> > > > > ii grub-pc 2.04-12
> > > > > ii grub-pc-bin 2.04-12
> > > > > ii grub2-common 2.04-12
> > > > >
> > > > > I found in the below link to do in grub-shell:
> > > > >
> > > > > set check_signatures=no
> > > > >
> > > > > But this is when grub-efi is installed.
> > > > >
> > > > > - Sedat -
> > > > >
> > > > > Link: https://unix.stackexchange.com/questions/126286/grub-efi-disable-signature-check
> > > >
> > > > Forget about that "Wrong EFI bootloader" - I see this with all other
> > > > kernels (all boot fine).
> > > >
> > > > I tried in QEMU with and without KASLR:
> > > >
> > > > [ run_qemu.sh ]
> > > > KPATH=$(pwd)
> > > >
> > > > APPEND="root=/dev/ram0 console=ttyS0 hung_task_panic=1 earlyprintk=ttyS0,115200"
> > > > APPEND="$APPEND nokaslr"
> > > >
> > > > qemu-system-x86_64 -enable-kvm -M pc -kernel $KPATH/bzImage -initrd
> > > > $KPATH/initrd.img -m 512 -net none -serial stdio -append "${APPEND}"
> > > > [ /run_qemu.sh ]
> > > >
> > > > $ ./run_qemu.sh
> > > > Probing EDD (edd=off to disable)... ok
> > > > Wrong EFI loader signature.
> > > > early console in extract_kernel
> > > > input_data: 0x000000000289940d
> > > > input_len: 0x000000000069804a
> > > > output: 0x0000000001000000
> > > > output_len: 0x0000000001ef2010
> > > > kernel_total_size: 0x0000000001c2c000
> > > > needed_size: 0x0000000002000000
> > > > trampoline_32bit: 0x000000000009d000
> > > >
> > > >
> > > > KASLR disabled: 'nokaslr' on cmdline.
> > > >
> > > >
> > > > Decompressing Linux... Parsing ELF... No relocation needed... done.
> > > > Booting the kernel.
> > > >
> > > > QEMU run stops, too.
> > > >
> > >
> > > I re-generated my initrd.img with GZIP as compressor (my default is ZSTD).
> > >
> > > --- /etc/initramfs-tools/initramfs.conf 2021-01-17 12:35:30.823818501 +0100
> > > +++ /etc/initramfs-tools/initramfs.conf.zstd    2020-09-21
> > > 23:55:43.121735427 +0200
> > > @@ -41,7 +41,7 @@ KEYMAP=n
> > > # COMPRESS: [ gzip | bzip2 | lz4 | lzma | lzop | xz | zstd ]
> > > #
> > >
> > > -COMPRESS=gzip
> > > +COMPRESS=zstd
> > >
> > > #
> > > # DEVICE: ...
> > >
> > > root# KVER="5.11.0-rc3-9-amd64-clang12-pgo" ; update-initramfs -c -k $KVER
> > >
> > > QEMU boot stops at the same stage.
> > >
> > > Now, my head is empty...
> > >
> > > Any comments?
> > >
> >
> > ( Just as a side note I have Nick's DWARF-v5 support enabled. )
> >
> > There is one EFI related warning in my build-log:
> >
> > $ grep warning: build-log_5.11.0-rc3-9-amd64-clang12-pgo.txt
> > dpkg-architecture: warning: specified GNU system type x86_64-linux-gnu
> > does not match CC system type x86_64-pc-linux-gnu, try setting a
> > correct CC environment variable
> > warning: arch/x86/platform/efi/quirks.c: Function control flow change
> > detected (hash mismatch) efi_arch_mem_reserve Hash =
> > 391331300655996873 [-Wbackend-plugin]
> > warning: arch/x86/platform/efi/efi.c: Function control flow change
> > detected (hash mismatch) efi_attr_is_visible Hash = 567185240781730690
> > [-Wbackend-plugin]
> > arch/x86/crypto/aegis128-aesni-glue.c:265:30: warning: unused variable
> > 'simd_alg' [-Wunused-variable]
> > warning: lib/crypto/sha256.c: Function control flow change detected
> > (hash mismatch) sha256_update Hash = 744640996947387358
> > [-Wbackend-plugin]
> > warning: arch/x86/boot/compressed/string.c: Function control flow
> > change detected (hash mismatch) memcmp Hash = 742261418966908927
> > [-Wbackend-plugin]
> > warning: arch/x86/boot/compressed/string.c: Function control flow
> > change detected (hash mismatch) bcmp Hash = 742261418966908927
> > [-Wbackend-plugin]
> > warning: arch/x86/boot/compressed/string.c: Function control flow
> > change detected (hash mismatch) strcmp Hash = 536873291001348520
> > [-Wbackend-plugin]
> > warning: arch/x86/boot/compressed/string.c: Function control flow
> > change detected (hash mismatch) strnlen Hash = 146835646621254984
> > [-Wbackend-plugin]
> > warning: arch/x86/boot/compressed/string.c: Function control flow
> > change detected (hash mismatch) simple_strtoull Hash =
> > 252792765950587360 [-Wbackend-plugin]
> > warning: arch/x86/boot/compressed/string.c: Function control flow
> > change detected (hash mismatch) strstr Hash = 391331303349076211
> > [-Wbackend-plugin]
> > warning: arch/x86/boot/compressed/string.c: Function control flow
> > change detected (hash mismatch) strchr Hash = 1063705159280644635
> > [-Wbackend-plugin]
> > warning: arch/x86/boot/compressed/string.c: Function control flow
> > change detected (hash mismatch) kstrtoull Hash = 758414239132790022
> > [-Wbackend-plugin]
> > drivers/infiniband/hw/hfi1/platform.o: warning: objtool: tune_serdes()
> > falls through to next function apply_tx_lanes()
> >
> > Cannot say if this information is helpful.
> >
>
> My LLVM/Clang v12 is from <apt.llvm.org>:
>
> clang-12 version 1:12~++20210115111113+45ef053bd709-1~exp1~20210115101809.3724
>
> My kernel-config is attached.
>

I dropped "LLVM_IAS=1" from my make line and did for my next build:

$ scripts/diffconfig /boot/config-5.11.0-rc3-8-amd64-clang12-pgo .config
BUILD_SALT "5.11.0-rc3-8-amd64-clang12-pgo" -> "5.11.0-rc3-10-amd64-clang12-pgo"
DEBUG_INFO_DWARF2 n -> y
DEBUG_INFO_DWARF5 y -> n
PGO_CLANG y -> n

Means dropped DWARF5 support.

- Sedat -

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v5] pgo: add clang's Profile Guided Optimization infrastructure
  2021-01-17 17:42                         ` Sedat Dilek
@ 2021-01-17 20:34                           ` Bill Wendling
       [not found]                             ` <CA+icZUU1HihUFaEHzF69+01+Picg8aq6HAqHupxiRqyDGJ=Mpw@mail.gmail.com>
  0 siblings, 1 reply; 116+ messages in thread
From: Bill Wendling @ 2021-01-17 20:34 UTC (permalink / raw)
  To: Sedat Dilek
  Cc: Jonathan Corbet, Masahiro Yamada, Linux Doc Mailing List, LKML,
	Linux Kbuild mailing list, Clang-Built-Linux ML, Andrew Morton,
	Nathan Chancellor, Nick Desaulniers, Sami Tolvanen

On Sun, Jan 17, 2021 at 9:42 AM Sedat Dilek <sedat.dilek@gmail.com> wrote:
>
> On Sun, Jan 17, 2021 at 1:05 PM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> >
> > On Sun, Jan 17, 2021 at 12:58 PM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> > >
> > > On Sun, Jan 17, 2021 at 12:42 PM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> > > >
> > > > On Sun, Jan 17, 2021 at 12:23 PM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> > > > >
> > > > > On Sun, Jan 17, 2021 at 11:53 AM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> > > > > >
> > > > > > On Sun, Jan 17, 2021 at 11:44 AM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> > > > > > >
> > > > > > > On Sat, Jan 16, 2021 at 9:23 PM Bill Wendling <morbo@google.com> wrote:
> > > > > > > >
> > > > > > > > On Sat, Jan 16, 2021 at 9:39 AM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> > > > > > > > > On Sat, Jan 16, 2021 at 10:44 AM 'Bill Wendling' via Clang Built Linux
> > > > > > > > > <clang-built-linux@googlegroups.com> wrote:
> > > > > > > > > >
> > > > > > > > > > From: Sami Tolvanen <samitolvanen@google.com>
> > > > > > > > > >
> > > > > > > > > > Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > > > > > > > > > profile, the kernel is instrumented with PGO counters, a representative
> > > > > > > > > > workload is run, and the raw profile data is collected from
> > > > > > > > > > /sys/kernel/debug/pgo/profraw.
> > > > > > > > > >
> > > > > > > > > > The raw profile data must be processed by clang's "llvm-profdata" tool
> > > > > > > > > > before it can be used during recompilation:
> > > > > > > > > >
> > > > > > > > > >   $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > > > > > > > > >   $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > > > > > > > > >
> > > > > > > > > > Multiple raw profiles may be merged during this step.
> > > > > > > > > >
> > > > > > > > > > The data can now be used by the compiler:
> > > > > > > > > >
> > > > > > > > > >   $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> > > > > > > > > >
> > > > > > > > > > This initial submission is restricted to x86, as that's the platform we
> > > > > > > > > > know works. This restriction can be lifted once other platforms have
> > > > > > > > > > been verified to work with PGO.
> > > > > > > > > >
> > > > > > > > > > Note that this method of profiling the kernel is clang-native, unlike
> > > > > > > > > > the clang support in kernel/gcov.
> > > > > > > > > >
> > > > > > > > > > [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
> > > > > > > > > > Co-developed-by: Bill Wendling <morbo@google.com>
> > > > > > > > > > Signed-off-by: Bill Wendling <morbo@google.com>
> > > > > > > > > > ---
> > > > > > > > > > v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
> > > > > > > > > >       testing.
> > > > > > > > > >     - Corrected documentation, re PGO flags when using LTO, based on Fangrui
> > > > > > > > > >       Song's comments.
> > > > > > > > > > v3: - Added change log section based on Sedat Dilek's comments.
> > > > > > > > > > v4: - Remove non-x86 Makfile changes and se "hweight64" instead of using our
> > > > > > > > > >       own popcount implementation, based on Nick Desaulniers's comment.
> > > > > > > > > > v5: - Correct padding calculation, discovered by Nathan Chancellor.
> > > > > > > > > > ---
> > > > > > > > > >  Documentation/dev-tools/index.rst     |   1 +
> > > > > > > > > >  Documentation/dev-tools/pgo.rst       | 127 +++++++++
> > > > > > > > > >  MAINTAINERS                           |   9 +
> > > > > > > > > >  Makefile                              |   3 +
> > > > > > > > > >  arch/Kconfig                          |   1 +
> > > > > > > > > >  arch/x86/Kconfig                      |   1 +
> > > > > > > > > >  arch/x86/boot/Makefile                |   1 +
> > > > > > > > > >  arch/x86/boot/compressed/Makefile     |   1 +
> > > > > > > > > >  arch/x86/crypto/Makefile              |   2 +
> > > > > > > > > >  arch/x86/entry/vdso/Makefile          |   1 +
> > > > > > > > > >  arch/x86/kernel/vmlinux.lds.S         |   2 +
> > > > > > > > > >  arch/x86/platform/efi/Makefile        |   1 +
> > > > > > > > > >  arch/x86/purgatory/Makefile           |   1 +
> > > > > > > > > >  arch/x86/realmode/rm/Makefile         |   1 +
> > > > > > > > > >  arch/x86/um/vdso/Makefile             |   1 +
> > > > > > > > > >  drivers/firmware/efi/libstub/Makefile |   1 +
> > > > > > > > > >  include/asm-generic/vmlinux.lds.h     |  44 +++
> > > > > > > > > >  kernel/Makefile                       |   1 +
> > > > > > > > > >  kernel/pgo/Kconfig                    |  35 +++
> > > > > > > > > >  kernel/pgo/Makefile                   |   5 +
> > > > > > > > > >  kernel/pgo/fs.c                       | 382 ++++++++++++++++++++++++++
> > > > > > > > > >  kernel/pgo/instrument.c               | 185 +++++++++++++
> > > > > > > > > >  kernel/pgo/pgo.h                      | 206 ++++++++++++++
> > > > > > > > > >  scripts/Makefile.lib                  |  10 +
> > > > > > > > > >  24 files changed, 1022 insertions(+)
> > > > > > > > > >  create mode 100644 Documentation/dev-tools/pgo.rst
> > > > > > > > > >  create mode 100644 kernel/pgo/Kconfig
> > > > > > > > > >  create mode 100644 kernel/pgo/Makefile
> > > > > > > > > >  create mode 100644 kernel/pgo/fs.c
> > > > > > > > > >  create mode 100644 kernel/pgo/instrument.c
> > > > > > > > > >  create mode 100644 kernel/pgo/pgo.h
> > > > > > > > > >
> > > > > > > > > > diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> > > > > > > > > > index f7809c7b1ba9e..8d6418e858062 100644
> > > > > > > > > > --- a/Documentation/dev-tools/index.rst
> > > > > > > > > > +++ b/Documentation/dev-tools/index.rst
> > > > > > > > > > @@ -26,6 +26,7 @@ whole; patches welcome!
> > > > > > > > > >     kgdb
> > > > > > > > > >     kselftest
> > > > > > > > > >     kunit/index
> > > > > > > > > > +   pgo
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >  .. only::  subproject and html
> > > > > > > > > > diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
> > > > > > > > > > new file mode 100644
> > > > > > > > > > index 0000000000000..b7f11d8405b73
> > > > > > > > > > --- /dev/null
> > > > > > > > > > +++ b/Documentation/dev-tools/pgo.rst
> > > > > > > > > > @@ -0,0 +1,127 @@
> > > > > > > > > > +.. SPDX-License-Identifier: GPL-2.0
> > > > > > > > > > +
> > > > > > > > > > +===============================
> > > > > > > > > > +