* Re: [PATCH 1/4] drivers core: Introduce CPU type sysfs interface
@ 2020-10-03  8:07 kernel test robot
  0 siblings, 0 replies; 22+ messages in thread
From: kernel test robot @ 2020-10-03  8:07 UTC (permalink / raw)
  To: kbuild

[-- Attachment #1: Type: text/plain, Size: 7355 bytes --]

CC: kbuild-all@lists.01.org
In-Reply-To: <20201003011745.7768-2-ricardo.neri-calderon@linux.intel.com>
References: <20201003011745.7768-2-ricardo.neri-calderon@linux.intel.com>
TO: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
TO: "Greg Kroah-Hartman" <gregkh@linuxfoundation.org>
TO: x86@kernel.org
TO: Borislav Petkov <bp@suse.de>
TO: Ingo Molnar <mingo@kernel.org>
TO: Thomas Gleixner <tglx@linutronix.de>
TO: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
CC: Tony Luck <tony.luck@intel.com>
CC: Len Brown <len.brown@intel.com>
CC: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
CC: linux-kernel@vger.kernel.org

Hi Ricardo,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on tip/x86/core]
[also build test WARNING on tip/master driver-core/driver-core-testing linus/master v5.9-rc7 next-20201002]
[If your patch is applied to the wrong git tree, kindly drop us a note.
When submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Ricardo-Neri/drivers-core-Introduce-CPU-type-sysfs-interface/20201003-091754
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 238c91115cd05c71447ea071624a4c9fe661f970
:::::: branch date: 7 hours ago
:::::: commit date: 7 hours ago
config: s390-randconfig-m031-20201002 (attached as .config)
compiler: s390-linux-gcc (GCC) 9.3.0

If you fix the issue, kindly add the following tags as appropriate:
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

New smatch warnings:
drivers/base/cpu.c:735 cpu_type_sysfs_online() warn: we never enter this loop
drivers/base/cpu.c:772 cpu_type_sysfs_offline() warn: we never enter this loop

Old smatch warnings:
drivers/base/cpu.c:747 cpu_type_sysfs_online() error: we previously assumed 'mask' could be null (see line 747)
drivers/base/cpu.c:779 cpu_type_sysfs_offline() error: we previously assumed 'mask' could be null (see line 779)

vim +735 drivers/base/cpu.c

110646cfb07328 Ricardo Neri 2020-10-02  728  
110646cfb07328 Ricardo Neri 2020-10-02  729  static int cpu_type_sysfs_online(unsigned int cpu)
110646cfb07328 Ricardo Neri 2020-10-02  730  {
110646cfb07328 Ricardo Neri 2020-10-02  731  	u32 cpu_type = arch_get_cpu_type(cpu);
110646cfb07328 Ricardo Neri 2020-10-02  732  	struct cpumask *mask;
110646cfb07328 Ricardo Neri 2020-10-02  733  	int i;
110646cfb07328 Ricardo Neri 2020-10-02  734  
110646cfb07328 Ricardo Neri 2020-10-02 @735  	for (i = 0; i < ARRAY_SIZE(cpu_type_devices); i++) {
110646cfb07328 Ricardo Neri 2020-10-02  736  		u32 this_cpu_type;
110646cfb07328 Ricardo Neri 2020-10-02  737  
110646cfb07328 Ricardo Neri 2020-10-02  738  		/*
110646cfb07328 Ricardo Neri 2020-10-02  739  		 * The first devices in the array are used first. Thus, create
110646cfb07328 Ricardo Neri 2020-10-02  740  		 * a new device, as well as a sysfs directory, for the type of
110646cfb07328 Ricardo Neri 2020-10-02  741  		 * @cpu.
110646cfb07328 Ricardo Neri 2020-10-02  742  		 */
110646cfb07328 Ricardo Neri 2020-10-02  743  		if (!cpu_type_devices[i])
110646cfb07328 Ricardo Neri 2020-10-02  744  			return cpu_type_create_device(cpu, i);
110646cfb07328 Ricardo Neri 2020-10-02  745  
110646cfb07328 Ricardo Neri 2020-10-02  746  		mask = dev_get_drvdata(cpu_type_devices[i]);
110646cfb07328 Ricardo Neri 2020-10-02  747  		if (!mask && !cpumask_weight(mask)) {
110646cfb07328 Ricardo Neri 2020-10-02  748  			/*
110646cfb07328 Ricardo Neri 2020-10-02  749  			 * We should not be here. Be paranoid about
110646cfb07328 Ricardo Neri 2020-10-02  750  			 * NULL pointers.
110646cfb07328 Ricardo Neri 2020-10-02  751  			 */
110646cfb07328 Ricardo Neri 2020-10-02  752  			dev_err(cpu_type_devices[i], "CPU mask is invalid");
110646cfb07328 Ricardo Neri 2020-10-02  753  			return -EINVAL;
110646cfb07328 Ricardo Neri 2020-10-02  754  		}
110646cfb07328 Ricardo Neri 2020-10-02  755  
110646cfb07328 Ricardo Neri 2020-10-02  756  		this_cpu_type = arch_get_cpu_type(cpumask_first(mask));
110646cfb07328 Ricardo Neri 2020-10-02  757  		if (cpu_type == this_cpu_type) {
110646cfb07328 Ricardo Neri 2020-10-02  758  			cpumask_set_cpu(cpu, mask);
110646cfb07328 Ricardo Neri 2020-10-02  759  			return 0;
110646cfb07328 Ricardo Neri 2020-10-02  760  		}
110646cfb07328 Ricardo Neri 2020-10-02  761  	}
110646cfb07328 Ricardo Neri 2020-10-02  762  
110646cfb07328 Ricardo Neri 2020-10-02  763  	return -ENODEV;
110646cfb07328 Ricardo Neri 2020-10-02  764  }
110646cfb07328 Ricardo Neri 2020-10-02  765  
110646cfb07328 Ricardo Neri 2020-10-02  766  static int cpu_type_sysfs_offline(unsigned int cpu)
110646cfb07328 Ricardo Neri 2020-10-02  767  {
110646cfb07328 Ricardo Neri 2020-10-02  768  	u32 cpu_type = arch_get_cpu_type(cpu);
110646cfb07328 Ricardo Neri 2020-10-02  769  	struct cpumask *mask;
110646cfb07328 Ricardo Neri 2020-10-02  770  	int i;
110646cfb07328 Ricardo Neri 2020-10-02  771  
110646cfb07328 Ricardo Neri 2020-10-02 @772  	for (i = 0; i < ARRAY_SIZE(cpu_type_devices); i++) {
110646cfb07328 Ricardo Neri 2020-10-02  773  		u32 this_cpu_type;
110646cfb07328 Ricardo Neri 2020-10-02  774  
110646cfb07328 Ricardo Neri 2020-10-02  775  		if (!cpu_type_devices[i])
110646cfb07328 Ricardo Neri 2020-10-02  776  			continue;
110646cfb07328 Ricardo Neri 2020-10-02  777  
110646cfb07328 Ricardo Neri 2020-10-02  778  		mask = dev_get_drvdata(cpu_type_devices[i]);
110646cfb07328 Ricardo Neri 2020-10-02  779  		if (!mask && !cpumask_weight(mask)) {
110646cfb07328 Ricardo Neri 2020-10-02  780  			/*
110646cfb07328 Ricardo Neri 2020-10-02  781  			 * We should not be here. Be paranoid about
110646cfb07328 Ricardo Neri 2020-10-02  782  			 * NULL pointers.
110646cfb07328 Ricardo Neri 2020-10-02  783  			 */
110646cfb07328 Ricardo Neri 2020-10-02  784  			dev_err(cpu_type_devices[i], "CPU mask is invalid");
110646cfb07328 Ricardo Neri 2020-10-02  785  			continue;
110646cfb07328 Ricardo Neri 2020-10-02  786  		}
110646cfb07328 Ricardo Neri 2020-10-02  787  
110646cfb07328 Ricardo Neri 2020-10-02  788  		this_cpu_type = arch_get_cpu_type(cpumask_first(mask));
110646cfb07328 Ricardo Neri 2020-10-02  789  		if (cpu_type == this_cpu_type) {
110646cfb07328 Ricardo Neri 2020-10-02  790  			cpumask_clear_cpu(cpu, mask);
110646cfb07328 Ricardo Neri 2020-10-02  791  			return 0;
110646cfb07328 Ricardo Neri 2020-10-02  792  		}
110646cfb07328 Ricardo Neri 2020-10-02  793  	}
110646cfb07328 Ricardo Neri 2020-10-02  794  
110646cfb07328 Ricardo Neri 2020-10-02  795  	/*
110646cfb07328 Ricardo Neri 2020-10-02  796  	 * If we are here, no matching cpu_type was found. This CPU was not
110646cfb07328 Ricardo Neri 2020-10-02  797  	 * accounted for at hotplug online.
110646cfb07328 Ricardo Neri 2020-10-02  798  	 */
110646cfb07328 Ricardo Neri 2020-10-02  799  	pr_warn("Unexpected CPU offline!\n");
110646cfb07328 Ricardo Neri 2020-10-02  800  
110646cfb07328 Ricardo Neri 2020-10-02  801  	return -ENODEV;
110646cfb07328 Ricardo Neri 2020-10-02  802  }
110646cfb07328 Ricardo Neri 2020-10-02  803  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 20713 bytes --]


* Re: [PATCH 1/4] drivers core: Introduce CPU type sysfs interface
       [not found]                     ` <38f290d2-4c3a-d1b0-f3cc-a0897ea10abd@gmail.com>
  2020-11-12 11:34                       ` Greg Kroah-Hartman
@ 2020-11-19  8:25                       ` Fox Chen
  1 sibling, 0 replies; 22+ messages in thread
From: Fox Chen @ 2020-11-19  8:25 UTC (permalink / raw)
  To: Brice Goglin; +Cc: Greg Kroah-Hartman, x86, linux-kernel

On Fri, Nov 13, 2020 at 2:15 PM Brice Goglin <brice.goglin@gmail.com> wrote:
>
>
> Le 12/11/2020 à 11:49, Greg Kroah-Hartman a écrit :
>
> On Thu, Nov 12, 2020 at 10:10:57AM +0100, Brice Goglin wrote:
>
> Le 12/11/2020 à 07:42, Greg Kroah-Hartman a écrit :
>
> On Thu, Nov 12, 2020 at 07:19:48AM +0100, Brice Goglin wrote:
>
> Le 07/10/2020 à 07:15, Greg Kroah-Hartman a écrit :
>
> On Tue, Oct 06, 2020 at 08:14:47PM -0700, Ricardo Neri wrote:
>
> On Tue, Oct 06, 2020 at 09:37:44AM +0200, Greg Kroah-Hartman wrote:
>
> On Mon, Oct 05, 2020 at 05:57:36PM -0700, Ricardo Neri wrote:
>
> On Sat, Oct 03, 2020 at 10:53:45AM +0200, Greg Kroah-Hartman wrote:
>
> On Fri, Oct 02, 2020 at 06:17:42PM -0700, Ricardo Neri wrote:
>
> Hybrid CPU topologies combine CPUs of different microarchitectures in the
> same die. Thus, even though the instruction set is compatible among all
> CPUs, there may still be differences in features (e.g., some CPUs may
> have counters that other CPUs do not). There may be applications
> interested in knowing the type of micro-architecture topology of the
> system to make decisions about process affinity.
>
> While the existing sysfs for capacity (/sys/devices/system/cpu/cpuX/
> cpu_capacity) may be used to infer the types of micro-architecture of the
> CPUs in the platform, it may not be entirely accurate. For instance, two
> subsets of CPUs with different types of micro-architecture may have the
> same capacity due to power or thermal constraints.
>
> Create the new directory /sys/devices/system/cpu/types. Under such
> directory, create individual subdirectories for each type of CPU micro-
> architecture. Each subdirectory will have cpulist and cpumap files. This
> makes it convenient for user space to read all the CPUs of the same type
> at once without having to inspect each CPU individually.
>
> Implement a generic interface using weak functions that architectures can
> override to indicate a) support for CPU types, b) the CPU type number, and
> c) a string to identify the CPU vendor and type.
>
> For example, an x86 system with one Intel Core and four Intel Atom CPUs
> would look like this (other architectures have the hooks to use whatever
> directory naming convention below "types" that meets their needs):
>
> user@host:~$: ls /sys/devices/system/cpu/types
> intel_atom_0  intel_core_0
>
> user@host:~$ ls /sys/devices/system/cpu/types/intel_atom_0
> cpulist cpumap
>
> user@host:~$ ls /sys/devices/system/cpu/types/intel_core_0
> cpulist cpumap
>
> user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpumap
> 0f
>
> user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpulist
> 0-3
>
> user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpumap
> 10
>
> user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpulist
> 4
>
> Thank you for the quick and detailed feedback, Greg!
>
> The output of 'tree' sometimes makes it easier to see here, or:
> grep -R . *
> also works well.
>
> Indeed, this would definitely make it more readable.
>
> On non-hybrid systems, the /sys/devices/system/cpu/types directory is not
> created. Add a hook for this purpose.
>
> Why should these not show up if the system is not "hybrid"?
>
> My thinking was that on a non-hybrid system, it does not make sense to
> create this interface, as all the CPUs will be of the same type.
>
> Why not just have this an attribute type in the existing cpuX directory?
> Why do this have to be a totally separate directory and userspace has to
> figure out to look in two different spots for the same cpu to determine
> what it is?
>
> But if the type is located under cpuX, userspace would need to traverse
> all the CPUs and create its own cpu masks. Under the types directory it
> would only need to look once for each type of CPU, IMHO.
>
> What does a "mask" do?  Why would userspace care about this?  You would
> have to create it by traversing the directories you are creating anyway,
> so it's not much different, right?
>
> Hello
>
> Sorry for the late reply. As the first userspace consumer of this
> interface [1], I can confirm that reading a single file to get the mask
> would be better, at least for performance reason. On large platforms, we
> already have to read thousands of sysfs files to get CPU topology and
> cache information, I'd be happy not to read one more file per cpu.
>
> Reading these sysfs files is slow, and it does not scale well when
> multiple processes read them in parallel.
>
> Really?  Where is the slowdown?  Would something like readfile() work
> better for you for that?
> https://lore.kernel.org/linux-api/20200704140250.423345-1-gregkh@linuxfoundation.org/
>
> I guess readfile would improve the sequential case by avoiding syscalls
> but it would not improve the parallel case since syscalls shouldn't have
> any parallel issue?
>
> syscalls should not have parallel issues at all.
>
> We've been watching the status of readfile() since it was posted on LKML
> 6 months ago, but we were actually wondering if it would end up being
> included at some point.
>
> It needs a solid reason to be merged.  My "test" benchmarks are fun to
> run, but I have yet to find a real need for it anywhere as the
> open/read/close syscall overhead seems to be lost in the noise on any
> real application workload that I can find.
>
> If you have a real need, and it reduces overhead and cpu usage, I'm more
> than willing to update the patchset and resubmit it.
>
>
> Good, I'll give it at try.
>
>
> How does multiple processes slow anything down, there shouldn't be any
> shared locks here.
>
> When I benchmarked this in 2016, reading a single (small) sysfs file was
> 41x slower when running 64 processes simultaneously on a 64-core Knights
> Landing than reading from a single process. On a SGI Altix UV with 12x
> 8-core CPUs, reading from one process per CPU (12 total) was 60x slower
> (which could mean NUMA affinity matters), and reading from one process
> per core (96 total) was 491x slower.
>
> I will try to find some time to dig further on recent kernels with perf
> and readfile (both machines were running RHEL7).
>
> 2016 was a long time ago in kernel-land, please retest on a kernel.org
> release, not a RHEL monstrosity.
>
>
> Quick test on 5.8.14 from Debian (fairly close to mainline) on a server with 2x20 cores.
>
> I am measuring the time to do open+read+close of /sys/devices/system/cpu/cpu15/topology/die_id 1000 times
>
> With a single process, it takes 2ms (2us per open+read+close, looks OK).
>
> With one process per core (with careful binding, etc), it jumps from 2ms to 190ms (without much variation).
>
> It looks like locks in kernfs_iop_permission and kernfs_dop_revalidate are causing the issue.
>
> I am attaching the perf report callgraph output below.
>
>
>
> There are ways to avoid this
> multiple discoveries by sharing hwloc info through XML or shmem, but it
> will take years before all developers of different runtimes all
> implement this :)
>
> I don't understand, what exactly are you suggesting we do here instead?
>
> I was just saying userspace has ways to mitigate the issue but it will
> take time because many different projects are involved.
>
> I still don't understand, what issue are you referring to?
>
>
> Reading many sysfs files causes application startup to take many seconds when multiple processes are launched at the same time.
>
> Brice
>
>
> # To display the perf.data header info, please use --header/--header-only options.
> #
> #
> # Total Lost Samples: 0
> #
> # Samples: 7K of event 'cycles'
> # Event count (approx.): 5291578622
> #
> # Children      Self  Command        Shared Object      Symbol
> # ........  ........  .............  .................  .......................................
> #
>     99.91%     0.00%  fops_overhead  [kernel.kallsyms]  [k] entry_SYSCALL_64_after_hwframe
>             |
>             ---entry_SYSCALL_64_after_hwframe
>                do_syscall_64
>                |
>                |--98.69%--__x64_sys_openat
>                |          |
>                |           --98.67%--do_sys_openat2
>                |                     |
>                |                      --98.57%--do_filp_open
>                |                                path_openat
>                |                                |
>                |                                |--81.83%--link_path_walk.part.0
>                |                                |          |
>                |                                |          |--52.19%--inode_permission.part.0
>                |                                |          |          |
>                |                                |          |           --51.86%--kernfs_iop_permission
>                |                                |          |                     |
>                |                                |          |                     |--50.92%--__mutex_lock.constprop.0
>                |                                |          |                     |          |
>                |                                |          |                     |           --49.58%--osq_lock
>                |                                |          |                     |
>                |                                |          |                      --0.59%--mutex_unlock
>                |                                |          |
>                |                                |           --29.47%--walk_component
>                |                                |                     |
>                |                                |                      --29.10%--lookup_fast
>                |                                |                                |
>                |                                |                                 --28.76%--kernfs_dop_revalidate
>                |                                |                                           |
>                |                                |                                            --28.29%--__mutex_lock.constprop.0
>                |                                |                                                      |
>                |                                |                                                       --27.65%--osq_lock
>                |                                |
>                |                                |--9.60%--lookup_fast
>                |                                |          |
>                |                                |           --9.50%--kernfs_dop_revalidate
>                |                                |                     |
>                |                                |                      --9.35%--__mutex_lock.constprop.0
>                |                                |                                |
>                |                                |                                 --9.18%--osq_lock
>                |                                |
>                |                                |--6.17%--may_open
>                |                                |          |
>                |                                |           --6.13%--inode_permission.part.0
>                |                                |                     |
>                |                                |                      --6.10%--kernfs_iop_permission
>                |                                |                                |
>                |                                |                                 --5.90%--__mutex_lock.constprop.0
>                |                                |                                           |
>                |                                |                                            --5.80%--osq_lock
>                |                                |
>                |                                 --0.52%--do_dentry_open
>                |
>                 --0.63%--__prepare_exit_to_usermode
>                           |
>                            --0.58%--task_work_run
>
>     99.91%     0.01%  fops_overhead  [kernel.kallsyms]  [k] do_syscall_64
>             |
>              --99.89%--do_syscall_64
>                        |
>                        |--98.69%--__x64_sys_openat
>                        |          |
>                        |           --98.67%--do_sys_openat2
>                        |                     |
>                        |                      --98.57%--do_filp_open
>                        |                                path_openat
>                        |                                |
>                        |                                |--81.83%--link_path_walk.part.0
>                        |                                |          |
>                        |                                |          |--52.19%--inode_permission.part.0
>                        |                                |          |          |
>                        |                                |          |           --51.86%--kernfs_iop_permission
>                        |                                |          |                     |
>                        |                                |          |                     |--50.92%--__mutex_lock.constprop.0
>                        |                                |          |                     |          |
>                        |                                |          |                     |           --49.58%--osq_lock
>                        |                                |          |                     |
>                        |                                |          |                      --0.59%--mutex_unlock
>                        |                                |          |
>                        |                                |           --29.47%--walk_component
>                        |                                |                     |
>                        |                                |                      --29.10%--lookup_fast
>                        |                                |                                |
>                        |                                |                                 --28.76%--kernfs_dop_revalidate
>                        |                                |                                           |
>                        |                                |                                            --28.29%--__mutex_lock.constprop.0
>                        |                                |                                                      |
>                        |                                |                                                       --27.65%--osq_lock
>                        |                                |
>                        |                                |--9.60%--lookup_fast
>                        |                                |          |
>                        |                                |           --9.50%--kernfs_dop_revalidate
>                        |                                |                     |
>                        |                                |                      --9.35%--__mutex_lock.constprop.0
>                        |                                |                                |
>                        |                                |                                 --9.18%--osq_lock
>                        |                                |
>                        |                                |--6.17%--may_open
>                        |                                |          |
>                        |                                |           --6.13%--inode_permission.part.0
>                        |                                |                     |
>                        |                                |                      --6.10%--kernfs_iop_permission
>                        |                                |                                |
>                        |                                |                                 --5.90%--__mutex_lock.constprop.0
>                        |                                |                                           |
>                        |                                |                                            --5.80%--osq_lock
>                        |                                |
>                        |                                 --0.52%--do_dentry_open
>                        |
>                         --0.63%--__prepare_exit_to_usermode
>                                   |
>                                    --0.58%--task_work_run
>
>     98.72%     0.00%  fops_overhead  [unknown]          [k] 0x7379732f73656369
>             |
>             ---0x7379732f73656369
>                __GI___libc_open
>                |
>                 --98.70%--entry_SYSCALL_64_after_hwframe
>                           do_syscall_64
>                           |
>                            --98.66%--__x64_sys_openat
>                                      |
>                                       --98.65%--do_sys_openat2
>                                                 |
>                                                  --98.55%--do_filp_open
>                                                            path_openat
>                                                            |
>                                                            |--81.80%--link_path_walk.part.0
>                                                            |          |
>                                                            |          |--52.16%--inode_permission.part.0
>                                                            |          |          |
>                                                            |          |           --51.86%--kernfs_iop_permission
>                                                            |          |                     |
>                                                            |          |                     |--50.92%--__mutex_lock.constprop.0
>                                                            |          |                     |          |
>                                                            |          |                     |           --49.58%--osq_lock
>                                                            |          |                     |
>                                                            |          |                      --0.59%--mutex_unlock
>                                                            |          |
>                                                            |           --29.47%--walk_component
>                                                            |                     |
>                                                            |                      --29.10%--lookup_fast
>                                                            |                                |
>                                                            |                                 --28.76%--kernfs_dop_revalidate
>                                                            |                                           |
>                                                            |                                            --28.29%--__mutex_lock.constprop.0
>                                                            |                                                      |
>                                                            |                                                       --27.65%--osq_lock
>                                                            |
>                                                            |--9.60%--lookup_fast
>                                                            |          |
>                                                            |           --9.50%--kernfs_dop_revalidate
>                                                            |                     |
>                                                            |                      --9.35%--__mutex_lock.constprop.0
>                                                            |                                |
>                                                            |                                 --9.18%--osq_lock
>                                                            |
>                                                            |--6.17%--may_open
>                                                            |          |
>                                                            |           --6.13%--inode_permission.part.0
>                                                            |                     |
>                                                            |                      --6.10%--kernfs_iop_permission
>                                                            |                                |
>                                                            |                                 --5.90%--__mutex_lock.constprop.0
>                                                            |                                           |
>                                                            |                                            --5.80%--osq_lock
>                                                            |
>                                                             --0.52%--do_dentry_open
>
>     98.72%     0.00%  fops_overhead  libc-2.31.so       [.] __GI___libc_open
>             |
>             ---__GI___libc_open
>                |
>                 --98.70%--entry_SYSCALL_64_after_hwframe
>                           do_syscall_64
>                           |
>                            --98.66%--__x64_sys_openat
>                                      |
>                                       --98.65%--do_sys_openat2
>                                                 |
>                                                  --98.55%--do_filp_open
>                                                            path_openat
>                                                            |
>                                                            |--81.80%--link_path_walk.part.0
>                                                            |          |
>                                                            |          |--52.16%--inode_permission.part.0
>                                                            |          |          |
>                                                            |          |           --51.86%--kernfs_iop_permission
>                                                            |          |                     |
>                                                            |          |                     |--50.92%--__mutex_lock.constprop.0
>                                                            |          |                     |          |
>                                                            |          |                     |           --49.58%--osq_lock
>                                                            |          |                     |
>                                                            |          |                      --0.59%--mutex_unlock
>                                                            |          |
>                                                            |           --29.47%--walk_component
>                                                            |                     |
>                                                            |                      --29.10%--lookup_fast
>                                                            |                                |
>                                                            |                                 --28.76%--kernfs_dop_revalidate
>                                                            |                                           |
>                                                            |                                            --28.29%--__mutex_lock.constprop.0
>                                                            |                                                      |
>                                                            |                                                       --27.65%--osq_lock
>                                                            |
>                                                            |--9.60%--lookup_fast
>                                                            |          |
>                                                            |           --9.50%--kernfs_dop_revalidate
>                                                            |                     |
>                                                            |                      --9.35%--__mutex_lock.constprop.0
>                                                            |                                |
>                                                            |                                 --9.18%--osq_lock
>                                                            |
>                                                            |--6.17%--may_open
>                                                            |          |
>                                                            |           --6.13%--inode_permission.part.0
>                                                            |                     |
>                                                            |                      --6.10%--kernfs_iop_permission
>                                                            |                                |
>                                                            |                                 --5.90%--__mutex_lock.constprop.0
>                                                            |                                           |
>                                                            |                                            --5.80%--osq_lock
>                                                            |
>                                                             --0.52%--do_dentry_open
>
>     98.69%     0.01%  fops_overhead  [kernel.kallsyms]  [k] __x64_sys_openat
>             |
>              --98.67%--__x64_sys_openat
>                        do_sys_openat2
>                        |
>                         --98.57%--do_filp_open
>                                   path_openat
>                                   |
>                                   |--81.83%--link_path_walk.part.0
>                                   |          |
>                                   |          |--52.19%--inode_permission.part.0
>                                   |          |          |
>                                   |          |           --51.86%--kernfs_iop_permission
>                                   |          |                     |
>                                   |          |                     |--50.92%--__mutex_lock.constprop.0
>                                   |          |                     |          |
>                                   |          |                     |           --49.58%--osq_lock
>                                   |          |                     |
>                                   |          |                      --0.59%--mutex_unlock
>                                   |          |
>                                   |           --29.47%--walk_component
>                                   |                     |
>                                   |                      --29.10%--lookup_fast
>                                   |                                |
>                                   |                                 --28.76%--kernfs_dop_revalidate
>                                   |                                           |
>                                   |                                            --28.29%--__mutex_lock.constprop.0
>                                   |                                                      |
>                                   |                                                       --27.65%--osq_lock
>                                   |
>                                   |--9.60%--lookup_fast
>                                   |          |
>                                   |           --9.50%--kernfs_dop_revalidate
>                                   |                     |
>                                   |                      --9.35%--__mutex_lock.constprop.0
>                                   |                                |
>                                   |                                 --9.18%--osq_lock
>                                   |
>                                   |--6.17%--may_open
>                                   |          |
>                                   |           --6.13%--inode_permission.part.0
>                                   |                     |
>                                   |                      --6.10%--kernfs_iop_permission
>                                   |                                |
>                                   |                                 --5.90%--__mutex_lock.constprop.0
>                                   |                                           |
>                                   |                                            --5.80%--osq_lock
>                                   |
>                                    --0.52%--do_dentry_open
>
>     98.67%     0.03%  fops_overhead  [kernel.kallsyms]  [k] do_sys_openat2
>             |
>              --98.65%--do_sys_openat2
>                        |
>                         --98.57%--do_filp_open
>                                   path_openat
>                                   |
>                                   |--81.83%--link_path_walk.part.0
>                                   |          |
>                                   |          |--52.19%--inode_permission.part.0
>                                   |          |          |
>                                   |          |           --51.86%--kernfs_iop_permission
>                                   |          |                     |
>                                   |          |                     |--50.92%--__mutex_lock.constprop.0
>                                   |          |                     |          |
>                                   |          |                     |           --49.58%--osq_lock
>                                   |          |                     |
>                                   |          |                      --0.59%--mutex_unlock
>                                   |          |
>                                   |           --29.47%--walk_component
>                                   |                     |
>                                   |                      --29.10%--lookup_fast
>                                   |                                |
>                                   |                                 --28.76%--kernfs_dop_revalidate
>                                   |                                           |
>                                   |                                            --28.29%--__mutex_lock.constprop.0
>                                   |                                                      |
>                                   |                                                       --27.65%--osq_lock
>                                   |
>                                   |--9.60%--lookup_fast
>                                   |          |
>                                   |           --9.50%--kernfs_dop_revalidate
>                                   |                     |
>                                   |                      --9.35%--__mutex_lock.constprop.0
>                                   |                                |
>                                   |                                 --9.18%--osq_lock
>                                   |
>                                   |--6.17%--may_open
>                                   |          |
>                                   |           --6.13%--inode_permission.part.0
>                                   |                     |
>                                   |                      --6.10%--kernfs_iop_permission
>                                   |                                |
>                                   |                                 --5.90%--__mutex_lock.constprop.0
>                                   |                                           |
>                                   |                                            --5.80%--osq_lock
>                                   |
>                                    --0.52%--do_dentry_open
>
>     98.57%     0.00%  fops_overhead  [kernel.kallsyms]  [k] do_filp_open
>             |
>             ---do_filp_open
>                path_openat
>                |
>                |--81.83%--link_path_walk.part.0
>                |          |
>                |          |--52.19%--inode_permission.part.0
>                |          |          |
>                |          |           --51.86%--kernfs_iop_permission
>                |          |                     |
>                |          |                     |--50.92%--__mutex_lock.constprop.0
>                |          |                     |          |
>                |          |                     |           --49.58%--osq_lock
>                |          |                     |
>                |          |                      --0.59%--mutex_unlock
>                |          |
>                |           --29.47%--walk_component
>                |                     |
>                |                      --29.10%--lookup_fast
>                |                                |
>                |                                 --28.76%--kernfs_dop_revalidate
>                |                                           |
>                |                                            --28.29%--__mutex_lock.constprop.0
>                |                                                      |
>                |                                                       --27.65%--osq_lock
>                |
>                |--9.60%--lookup_fast
>                |          |
>                |           --9.50%--kernfs_dop_revalidate
>                |                     |
>                |                      --9.35%--__mutex_lock.constprop.0
>                |                                |
>                |                                 --9.18%--osq_lock
>                |
>                |--6.17%--may_open
>                |          |
>                |           --6.13%--inode_permission.part.0
>                |                     |
>                |                      --6.10%--kernfs_iop_permission
>                |                                |
>                |                                 --5.90%--__mutex_lock.constprop.0
>                |                                           |
>                |                                            --5.80%--osq_lock
>                |
>                 --0.52%--do_dentry_open
>
>     98.57%     0.01%  fops_overhead  [kernel.kallsyms]  [k] path_openat
>             |
>              --98.56%--path_openat
>                        |
>                        |--81.83%--link_path_walk.part.0
>                        |          |
>                        |          |--52.19%--inode_permission.part.0
>                        |          |          |
>                        |          |           --51.86%--kernfs_iop_permission
>                        |          |                     |
>                        |          |                     |--50.92%--__mutex_lock.constprop.0
>                        |          |                     |          |
>                        |          |                     |           --49.58%--osq_lock
>                        |          |                     |
>                        |          |                      --0.59%--mutex_unlock
>                        |          |
>                        |           --29.47%--walk_component
>                        |                     |
>                        |                      --29.10%--lookup_fast
>                        |                                |
>                        |                                 --28.76%--kernfs_dop_revalidate
>                        |                                           |
>                        |                                            --28.29%--__mutex_lock.constprop.0
>                        |                                                      |
>                        |                                                       --27.65%--osq_lock
>                        |
>                        |--9.60%--lookup_fast
>                        |          |
>                        |           --9.50%--kernfs_dop_revalidate
>                        |                     |
>                        |                      --9.35%--__mutex_lock.constprop.0
>                        |                                |
>                        |                                 --9.18%--osq_lock
>                        |
>                        |--6.17%--may_open
>                        |          |
>                        |           --6.13%--inode_permission.part.0
>                        |                     |
>                        |                      --6.10%--kernfs_iop_permission
>                        |                                |
>                        |                                 --5.90%--__mutex_lock.constprop.0
>                        |                                           |
>                        |                                            --5.80%--osq_lock
>                        |
>                         --0.52%--do_dentry_open
>
>     94.52%     1.30%  fops_overhead  [kernel.kallsyms]  [k] __mutex_lock.constprop.0
>             |
>             |--93.23%--__mutex_lock.constprop.0
>             |          |
>             |          |--92.23%--osq_lock
>             |          |
>             |           --0.55%--mutex_spin_on_owner
>             |
>              --1.30%--0x7379732f73656369
>                        __GI___libc_open
>                        entry_SYSCALL_64_after_hwframe
>                        do_syscall_64
>                        __x64_sys_openat
>                        do_sys_openat2
>                        do_filp_open
>                        path_openat
>                        |
>                         --1.09%--link_path_walk.part.0
>                                   |
>                                    --0.75%--inode_permission.part.0
>                                              kernfs_iop_permission
>                                              __mutex_lock.constprop.0
>
>     92.24%    92.22%  fops_overhead  [kernel.kallsyms]  [k] osq_lock
>             |
>              --92.22%--0x7379732f73656369
>                        __GI___libc_open
>                        entry_SYSCALL_64_after_hwframe
>                        do_syscall_64
>                        __x64_sys_openat
>                        do_sys_openat2
>                        do_filp_open
>                        path_openat
>                        |
>                        |--77.21%--link_path_walk.part.0
>                        |          |
>                        |          |--49.57%--inode_permission.part.0
>                        |          |          kernfs_iop_permission
>                        |          |          __mutex_lock.constprop.0
>                        |          |          osq_lock
>                        |          |
>                        |           --27.64%--walk_component
>                        |                     lookup_fast
>                        |                     kernfs_dop_revalidate
>                        |                     __mutex_lock.constprop.0
>                        |                     osq_lock
>                        |
>                        |--9.18%--lookup_fast
>                        |          kernfs_dop_revalidate
>                        |          __mutex_lock.constprop.0
>                        |          osq_lock
>                        |
>                         --5.80%--may_open
>                                   inode_permission.part.0
>                                   kernfs_iop_permission
>                                   __mutex_lock.constprop.0
>                                   osq_lock
>
>     81.83%     0.03%  fops_overhead  [kernel.kallsyms]  [k] link_path_walk.part.0
>             |
>              --81.80%--link_path_walk.part.0
>                        |
>                        |--52.19%--inode_permission.part.0
>                        |          |
>                        |           --51.86%--kernfs_iop_permission
>                        |                     |
>                        |                     |--50.92%--__mutex_lock.constprop.0
>                        |                     |          |
>                        |                     |           --49.58%--osq_lock
>                        |                     |
>                        |                      --0.59%--mutex_unlock
>                        |
>                         --29.47%--walk_component
>                                   |
>                                    --29.10%--lookup_fast
>                                              |
>                                               --28.76%--kernfs_dop_revalidate
>                                                         |
>                                                          --28.29%--__mutex_lock.constprop.0
>                                                                    |
>                                                                     --27.65%--osq_lock
>
>     58.32%     0.24%  fops_overhead  [kernel.kallsyms]  [k] inode_permission.part.0
>             |
>              --58.08%--inode_permission.part.0
>                        |
>                         --57.97%--kernfs_iop_permission
>                                   |
>                                   |--56.81%--__mutex_lock.constprop.0
>                                   |          |
>                                   |           --55.39%--osq_lock
>                                   |
>                                    --0.73%--mutex_unlock
>
>     57.97%     0.00%  fops_overhead  [kernel.kallsyms]  [k] kernfs_iop_permission
>             |
>             ---kernfs_iop_permission
>                |
>                |--56.81%--__mutex_lock.constprop.0
>                |          |
>                |           --55.39%--osq_lock
>                |
>                 --0.73%--mutex_unlock
>
>     38.71%     0.03%  fops_overhead  [kernel.kallsyms]  [k] lookup_fast
>             |
>              --38.68%--lookup_fast
>                        |
>                         --38.26%--kernfs_dop_revalidate
>                                   |
>                                    --37.64%--__mutex_lock.constprop.0
>                                              |
>                                               --36.83%--osq_lock
>
>     38.26%     0.04%  fops_overhead  [kernel.kallsyms]  [k] kernfs_dop_revalidate
>             |
>              --38.22%--kernfs_dop_revalidate
>                        |
>                         --37.64%--__mutex_lock.constprop.0
>                                   |
>                                    --36.83%--osq_lock
>
>     29.47%     0.03%  fops_overhead  [kernel.kallsyms]  [k] walk_component
>             |
>              --29.44%--walk_component
>                        |
>                         --29.10%--lookup_fast
>                                   |
>                                    --28.76%--kernfs_dop_revalidate
>                                              |
>                                               --28.29%--__mutex_lock.constprop.0
>                                                         |
>                                                          --27.65%--osq_lock
>
>      6.17%     0.03%  fops_overhead  [kernel.kallsyms]  [k] may_open
>             |
>              --6.14%--may_open
>                        |
>                         --6.13%--inode_permission.part.0
>                                   |
>                                    --6.10%--kernfs_iop_permission
>                                              |
>                                               --5.90%--__mutex_lock.constprop.0
>                                                         |
>                                                          --5.80%--osq_lock
>
>      1.22%     0.00%  fops_overhead  [unknown]          [k] 0x5541d68949564100
>             |
>             ---0x5541d68949564100
>                __libc_start_main
>                |
>                |--0.68%--__close
>                |          |
>                |           --0.66%--entry_SYSCALL_64_after_hwframe
>                |                     do_syscall_64
>                |                     |
>                |                      --0.61%--__prepare_exit_to_usermode
>                |                                |
>                |                                 --0.58%--task_work_run
>                |
>                 --0.54%--read
>
>      1.22%     0.00%  fops_overhead  libc-2.31.so       [.] __libc_start_main
>             |
>             ---__libc_start_main
>                |
>                |--0.68%--__close
>                |          |
>                |           --0.66%--entry_SYSCALL_64_after_hwframe
>                |                     do_syscall_64
>                |                     |
>                |                      --0.61%--__prepare_exit_to_usermode
>                |                                |
>                |                                 --0.58%--task_work_run
>                |
>                 --0.54%--read
>
>      1.06%     1.05%  fops_overhead  [kernel.kallsyms]  [k] mutex_unlock
>             |
>              --1.02%--0x7379732f73656369
>                        __GI___libc_open
>                        entry_SYSCALL_64_after_hwframe
>                        do_syscall_64
>                        __x64_sys_openat
>                        do_sys_openat2
>                        do_filp_open
>                        path_openat
>                        |
>                         --0.80%--link_path_walk.part.0
>                                   |
>                                    --0.60%--inode_permission.part.0
>                                              kernfs_iop_permission
>                                              |
>                                               --0.59%--mutex_unlock
>
>      0.88%     0.79%  fops_overhead  [kernel.kallsyms]  [k] mutex_lock
>             |
>              --0.68%--0x7379732f73656369
>                        __GI___libc_open
>                        entry_SYSCALL_64_after_hwframe
>                        do_syscall_64
>                        __x64_sys_openat
>                        do_sys_openat2
>                        do_filp_open
>                        path_openat
>
>      0.68%     0.01%  fops_overhead  libc-2.31.so       [.] __close
>             |
>              --0.67%--__close
>                        |
>                         --0.66%--entry_SYSCALL_64_after_hwframe
>                                   do_syscall_64
>                                   |
>                                    --0.61%--__prepare_exit_to_usermode
>                                              |
>                                               --0.58%--task_work_run
>
>      0.63%     0.05%  fops_overhead  [kernel.kallsyms]  [k] __prepare_exit_to_usermode
>             |
>              --0.58%--__prepare_exit_to_usermode
>                        task_work_run
>
>      0.58%     0.00%  fops_overhead  [kernel.kallsyms]  [k] task_work_run
>             |
>             ---task_work_run
>
>      0.58%     0.10%  fops_overhead  [kernel.kallsyms]  [k] dput
>      0.56%     0.55%  fops_overhead  [kernel.kallsyms]  [k] mutex_spin_on_owner
>             |
>              --0.55%--0x7379732f73656369
>                        __GI___libc_open
>                        entry_SYSCALL_64_after_hwframe
>                        do_syscall_64
>                        __x64_sys_openat
>                        do_sys_openat2
>                        do_filp_open
>                        path_openat
>
>      0.54%     0.00%  fops_overhead  libc-2.31.so       [.] read
>             |
>             ---read
>
>      0.52%     0.12%  fops_overhead  [kernel.kallsyms]  [k] do_dentry_open
>      0.50%     0.00%  fops_overhead  [kernel.kallsyms]  [k] ksys_read
>      0.50%     0.03%  fops_overhead  [kernel.kallsyms]  [k] vfs_read
>      0.46%     0.05%  fops_overhead  [kernel.kallsyms]  [k] __fput
>      0.45%     0.45%  fops_overhead  [kernel.kallsyms]  [k] lockref_put_return
>      0.43%     0.43%  fops_overhead  [kernel.kallsyms]  [k] osq_unlock
>      0.41%     0.08%  fops_overhead  [kernel.kallsyms]  [k] step_into
>      0.41%     0.08%  fops_overhead  [kernel.kallsyms]  [k] __d_lookup
>      0.37%     0.35%  fops_overhead  [kernel.kallsyms]  [k] _raw_spin_lock
>      0.35%     0.03%  fops_overhead  [kernel.kallsyms]  [k] seq_read
>      0.28%     0.01%  fops_overhead  [kernel.kallsyms]  [k] kernfs_fop_open
>      0.27%     0.03%  fops_overhead  [kernel.kallsyms]  [k] kernfs_fop_release
>      0.16%     0.01%  fops_overhead  [kernel.kallsyms]  [k] kernfs_put_open_node
>      0.16%     0.00%  fops_overhead  [kernel.kallsyms]  [k] terminate_walk
>      0.12%     0.01%  fops_overhead  [kernel.kallsyms]  [k] __alloc_file
>      0.12%     0.00%  fops_overhead  [kernel.kallsyms]  [k] alloc_empty_file
>      0.12%     0.01%  fops_overhead  [kernel.kallsyms]  [k] unlazy_walk
>      0.12%     0.05%  fops_overhead  [kernel.kallsyms]  [k] _cond_resched
>      0.12%     0.07%  fops_overhead  [kernel.kallsyms]  [k] call_rcu
>      0.10%     0.00%  fops_overhead  [kernel.kallsyms]  [k] __legitimize_path
>      0.09%     0.05%  fops_overhead  [kernel.kallsyms]  [k] sysfs_kf_seq_show
>      0.09%     0.09%  fops_overhead  [kernel.kallsyms]  [k] generic_permission
>      0.09%     0.07%  fops_overhead  [kernel.kallsyms]  [k] rcu_all_qs
>      0.09%     0.01%  fops_overhead  [kernel.kallsyms]  [k] security_file_open
>      0.08%     0.00%  fops_overhead  [kernel.kallsyms]  [k] security_file_alloc
>      0.08%     0.08%  fops_overhead  [kernel.kallsyms]  [k] lockref_get_not_dead
>      0.08%     0.03%  fops_overhead  [kernel.kallsyms]  [k] kmem_cache_alloc
>      0.08%     0.08%  fops_overhead  [kernel.kallsyms]  [k] apparmor_file_open
>      0.07%     0.05%  fops_overhead  [kernel.kallsyms]  [k] kfree
>      0.05%     0.05%  fops_overhead  [kernel.kallsyms]  [k] kernfs_fop_read
>      0.05%     0.05%  fops_overhead  [kernel.kallsyms]  [k] set_nlink
>      0.05%     0.01%  fops_overhead  [kernel.kallsyms]  [k] kernfs_seq_start
>      0.05%     0.03%  fops_overhead  [kernel.kallsyms]  [k] path_init
>      0.05%     0.00%  fops_overhead  [kernel.kallsyms]  [k] __x64_sys_close
>      0.05%     0.00%  fops_overhead  [kernel.kallsyms]  [k] filp_close
>      0.05%     0.05%  fops_overhead  [kernel.kallsyms]  [k] syscall_return_via_sysret
>      0.05%     0.03%  fops_overhead  [kernel.kallsyms]  [k] __kmalloc_node
>      0.05%     0.05%  fops_overhead  [kernel.kallsyms]  [k] rcu_segcblist_enqueue
>      0.04%     0.04%  fops_overhead  [kernel.kallsyms]  [k] vfs_open
>      0.04%     0.04%  fops_overhead  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
>      0.04%     0.03%  fops_overhead  [kernel.kallsyms]  [k] sprintf
>      0.04%     0.00%  fops_overhead  [kernel.kallsyms]  [k] dev_attr_show
>      0.04%     0.00%  fops_overhead  [kernel.kallsyms]  [k] die_id_show
>      0.04%     0.04%  fops_overhead  [kernel.kallsyms]  [k] kmem_cache_free
>      0.04%     0.04%  fops_overhead  [kernel.kallsyms]  [k] fsnotify_parent
>      0.04%     0.04%  fops_overhead  [kernel.kallsyms]  [k] security_inode_permission
>      0.04%     0.01%  fops_overhead  [kernel.kallsyms]  [k] __check_object_size
>      0.04%     0.04%  fops_overhead  [kernel.kallsyms]  [k] apparmor_file_alloc_security
>      0.04%     0.00%  fops_overhead  [kernel.kallsyms]  [k] seq_release
>      0.04%     0.04%  fops_overhead  [kernel.kallsyms]  [k] memset_erms
>      0.04%     0.04%  fops_overhead  [kernel.kallsyms]  [k] kernfs_get_active
>      0.03%     0.00%  fops_overhead  [kernel.kallsyms]  [k] try_to_wake_up
>      0.03%     0.00%  fops_overhead  [kernel.kallsyms]  [k] vsnprintf
>      0.03%     0.01%  fops_overhead  [kernel.kallsyms]  [k] mntput_no_expire
>      0.03%     0.03%  fops_overhead  [kernel.kallsyms]  [k] lockref_get
>      0.03%     0.03%  fops_overhead  [kernel.kallsyms]  [k] kernfs_put_active
>      0.03%     0.03%  fops_overhead  [kernel.kallsyms]  [k] fsnotify
>      0.03%     0.03%  fops_overhead  [kernel.kallsyms]  [k] locks_remove_posix
>      0.03%     0.00%  fops_overhead  [kernel.kallsyms]  [k] security_file_permission
>      0.03%     0.03%  fops_overhead  [kernel.kallsyms]  [k] rw_verify_area
>      0.03%     0.03%  fops_overhead  [kernel.kallsyms]  [k] set_root
>      0.03%     0.00%  fops_overhead  [kernel.kallsyms]  [k] nd_jump_root
>      0.03%     0.01%  fops_overhead  [kernel.kallsyms]  [k] wake_up_q
>      0.03%     0.00%  fops_overhead  [kernel.kallsyms]  [k] __mutex_unlock_slowpath.constprop.0
>      0.03%     0.00%  fops_overhead  [kernel.kallsyms]  [k] getname_flags.part.0
>      0.03%     0.03%  fops_overhead  [kernel.kallsyms]  [k] task_work_add
>      0.03%     0.00%  fops_overhead  [kernel.kallsyms]  [k] fput_many
>      0.03%     0.03%  fops_overhead  [kernel.kallsyms]  [k] __legitimize_mnt
>      0.03%     0.01%  fops_overhead  [kernel.kallsyms]  [k] kernfs_seq_stop
>      0.02%     0.02%  fops_overhead  [nfs]              [k] nfs_do_access
>      0.02%     0.00%  fops_overhead  ld-2.31.so         [.] _dl_map_object
>      0.02%     0.00%  fops_overhead  ld-2.31.so         [.] open_path
>      0.02%     0.00%  fops_overhead  ld-2.31.so         [.] __GI___open64_nocancel
>      0.02%     0.00%  fops_overhead  [nfs]              [k] nfs_permission
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] kernfs_seq_next
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] available_idle_cpu
>      0.01%     0.00%  fops_overhead  [unknown]          [k] 0x3931206e69207364
>      0.01%     0.00%  fops_overhead  libc-2.31.so       [.] __GI___libc_write
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] ksys_write
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] vfs_write
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] tty_write
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] n_tty_write
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] pty_write
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] queue_work_on
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] __queue_work
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] select_task_rq_fair
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] native_queued_spin_lock_slowpath
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] slab_free_freelist_hook
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] __list_del_entry_valid
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] memcg_kmem_put_cache
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] __syscall_return_slowpath
>      0.01%     0.01%  fops_overhead  libc-2.31.so       [.] _dl_addr
>      0.01%     0.00%  fops_overhead  [unknown]          [.] 0x756e696c2d34365f
>      0.01%     0.00%  fops_overhead  [unknown]          [.] 0x00007f4b1ca1e000
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] __x86_indirect_thunk_rax
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] __virt_addr_valid
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] locks_remove_file
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] memcpy_erms
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] update_rq_clock
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] entry_SYSCALL_64
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] __check_heap_object
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] apparmor_file_free_security
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] security_file_free
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] __d_lookup_rcu
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] mntput
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] get_unused_fd_flags
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] alloc_slab_page
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] __slab_alloc
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] ___slab_alloc
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] allocate_slab
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] __alloc_fd
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] legitimize_root
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] strncpy_from_user
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] kernfs_refresh_inode
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] build_open_flags
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] strcmp
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] memcg_kmem_get_cache
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] asm_sysvec_apic_timer_interrupt
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] sysvec_apic_timer_interrupt
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] asm_call_sysvec_on_stack
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] __sysvec_apic_timer_interrupt
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] hrtimer_interrupt
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] __hrtimer_run_queues
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] tick_sched_timer
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] tick_sched_handle
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] update_process_times
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] scheduler_tick
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] perf_iterate_ctx
>      0.01%     0.00%  fops_overhead  [unknown]          [k] 0x00007fd34e3a0627
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] __x64_sys_execve
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] do_execve
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] __do_execve_file
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] load_elf_binary
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] elf_map
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] vm_mmap_pgoff
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] do_mmap
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] mmap_region
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] perf_event_mmap
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] perf_iterate_sb
>      0.00%     0.00%  perf_5.8       [unknown]          [k] 0x00007fd34e3a0627
>      0.00%     0.00%  perf_5.8       [kernel.kallsyms]  [k] entry_SYSCALL_64_after_hwframe
>      0.00%     0.00%  perf_5.8       [kernel.kallsyms]  [k] perf_event_exec
>      0.00%     0.00%  perf_5.8       [kernel.kallsyms]  [k] do_syscall_64
>      0.00%     0.00%  perf_5.8       [kernel.kallsyms]  [k] __x64_sys_execve
>      0.00%     0.00%  perf_5.8       [kernel.kallsyms]  [k] do_execve
>      0.00%     0.00%  perf_5.8       [kernel.kallsyms]  [k] __do_execve_file
>      0.00%     0.00%  perf_5.8       [kernel.kallsyms]  [k] load_elf_binary
>      0.00%     0.00%  perf_5.8       [kernel.kallsyms]  [k] begin_new_exec
>      0.00%     0.00%  perf_5.8       [kernel.kallsyms]  [k] native_write_msr
>      0.00%     0.00%  perf_5.8       [kernel.kallsyms]  [k] __intel_pmu_enable_all.constprop.0
>      0.00%     0.00%  perf_5.8       [kernel.kallsyms]  [k] acpi_os_read_memory
>
>
> #
> # (Tip: To count events in every 1000 msec: perf stat -I 1000)
> #
>

Hi Brice,

I wrote a benchmark to do open+read+close on
/sys/devices/system/cpu/cpu0/topology/die_id
https://github.com/foxhlchen/sysfs_benchmark/blob/main/main.c


+    3.39%     3.37%  a.out  [kernel.kallsyms]  [k] mutex_unlock
+    2.76%     2.74%  a.out  [kernel.kallsyms]  [k] mutex_lock
+    0.92%     0.42%  a.out  [kernel.kallsyms]  [k] __mutex_lock.constprop.0
     0.38%     0.37%  a.out  [kernel.kallsyms]  [k] mutex_spin_on_owner
     0.05%     0.05%  a.out  [kernel.kallsyms]  [k] __mutex_init
     0.01%     0.01%  a.out  [kernel.kallsyms]  [k] __mutex_lock_slowpath
     0.01%     0.00%  a.out  [kernel.kallsyms]  [k] __mutex_unlock_slowpath.constprop.0

But I failed to reproduce your result.

If possible, would you mind providing your benchmark code? :)


thanks,
fox

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/4] drivers core: Introduce CPU type sysfs interface
  2020-11-18 10:45                       ` Brice Goglin
@ 2020-11-18 10:57                         ` Greg Kroah-Hartman
  0 siblings, 0 replies; 22+ messages in thread
From: Greg Kroah-Hartman @ 2020-11-18 10:57 UTC (permalink / raw)
  To: Brice Goglin
  Cc: Ricardo Neri, x86, Borislav Petkov, Ingo Molnar, Thomas Gleixner,
	Rafael J. Wysocki, Tony Luck, Len Brown, Ravi V. Shankar,
	linux-kernel, Andi Kleen, Dave Hansen, Gautham R. Shenoy,
	Kan Liang, Srinivas Pandruvada

On Wed, Nov 18, 2020 at 11:45:46AM +0100, Brice Goglin wrote:
> 
> Le 17/11/2020 à 16:55, Brice Goglin a écrit :
> > Le 12/11/2020 à 11:49, Greg Kroah-Hartman a écrit :
> >> On Thu, Nov 12, 2020 at 10:10:57AM +0100, Brice Goglin wrote:
> >>> Le 12/11/2020 à 07:42, Greg Kroah-Hartman a écrit :
> >>>> On Thu, Nov 12, 2020 at 07:19:48AM +0100, Brice Goglin wrote:
> >>>>> Hello
> >>>>>
> >>>>> Sorry for the late reply. As the first userspace consumer of this
> >>>>> interface [1], I can confirm that reading a single file to get the mask
> >>>>> would be better, at least for performance reasons. On large platforms, we
> >>>>> already have to read thousands of sysfs files to get CPU topology and
> >>>>> cache information, I'd be happy not to read one more file per cpu.
> >>>>>
> >>>>> Reading these sysfs files is slow, and it does not scale well when
> >>>>> multiple processes read them in parallel.
> >>>> Really?  Where is the slowdown?  Would something like readfile() work
> >>>> better for you for that?
> >>>> 	https://lore.kernel.org/linux-api/20200704140250.423345-1-gregkh@linuxfoundation.org/
> >>> I guess readfile would improve the sequential case by avoiding syscalls
> >>> but it would not improve the parallel case since syscalls shouldn't have
> >>> any parallel issue?
> >> syscalls should not have parallel issues at all.
> >>
> >>> We've been watching the status of readfile() since it was posted on LKML
> >>> 6 months ago, but we were actually wondering if it would end up being
> >>> included at some point.
> >> It needs a solid reason to be merged.  My "test" benchmarks are fun to
> >> run, but I have yet to find a real need for it anywhere as the
> >> open/read/close syscall overhead seems to be lost in the noise on any
> >> real application workload that I can find.
> >>
> >> If you have a real need, and it reduces overhead and cpu usage, I'm more
> >> than willing to update the patchset and resubmit it.
> >>
> >>
> > Hello
> >
> > I updated hwloc to use readfile instead of open+read+close on all those
> > small sysfs/procfs files. Unfortunately the improvement is very small,
> > only a couple of percent. On a 40-core server, our library starts in 38ms
> > instead of 39ms. I can't deploy your patches on larger machines, but I
> > tested our code on a copy of their sysfs files saved on a local disk:
> > For a 256-thread KNL, we go from 15ms to 14ms. For an 896-core SGI
> > machine, from 73ms to 71ms.
> 
> 
> Sorry, I forgot to update some codepaths to properly use readfile yesterday :/
> Here are updated and more precise numbers that show a non-negligible improvement.
> Again, we're measuring the entire hwloc topology discovery, which includes reading
> many sysfs files (improved thanks to readfile) and then building a hierarchy of
> objects describing the machine (not modified).
> 
> Server sysfs files (dual-socket x 20 cores x SMT-2)
> default  43.48ms +/-4.48
> readfile 42.15ms +/-4.58 => 3.1% better
> 1971 readfile calls => 674ns improvement per call
> 
> Knights Landing sysfs stored on local hard drive (64 cores x SMT-4)
> default  14.60ms +/-0.91
> readfile 13.63ms +/-1.05 => 6.6% better
> 2940 readfile calls => 329ns improvement per call
> 
> SGI Altix UV sysfs stored on local hard drive (56 sockets x 8 cores x SMT-2)
> default  69.12ms +/-1.40
> readfile 66.03ms +/-1.35 => 4.5% better
> 14525 readfile calls => 212ns improvement per call
> 
> I don't know why the first case (real sysfs files) gets a much
> higher standard deviation and higher improvement per readfile call.
> The other two cases match what microbenchmarks say
> (about 200ns improvement per readfile call).

Thanks a lot for the detailed numbers.  I think 6.6% is the best that I
saw in my microbenchmarks as well.

While that would be great for a userspace task that does real work,
doing this type of thing is, arguably, not "real work" :)

I'll keep the patch updated over time just for fun, but odds are it's
not going to get merged unless we have some userspace workload that
really needs it.

But the global kernfs mutex you pointed out is an issue that we
should work on resolving.  I've pointed someone at it already and maybe
we will have some results in a few weeks.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/4] drivers core: Introduce CPU type sysfs interface
  2020-11-17 15:55                     ` Brice Goglin
@ 2020-11-18 10:45                       ` Brice Goglin
  2020-11-18 10:57                         ` Greg Kroah-Hartman
  0 siblings, 1 reply; 22+ messages in thread
From: Brice Goglin @ 2020-11-18 10:45 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Ricardo Neri, x86, Borislav Petkov, Ingo Molnar, Thomas Gleixner,
	Rafael J. Wysocki, Tony Luck, Len Brown, Ravi V. Shankar,
	linux-kernel, Andi Kleen, Dave Hansen, Gautham R. Shenoy,
	Kan Liang, Srinivas Pandruvada


Le 17/11/2020 à 16:55, Brice Goglin a écrit :
> Le 12/11/2020 à 11:49, Greg Kroah-Hartman a écrit :
>> On Thu, Nov 12, 2020 at 10:10:57AM +0100, Brice Goglin wrote:
>>> Le 12/11/2020 à 07:42, Greg Kroah-Hartman a écrit :
>>>> On Thu, Nov 12, 2020 at 07:19:48AM +0100, Brice Goglin wrote:
>>>>> Hello
>>>>>
>>>>> Sorry for the late reply. As the first userspace consumer of this
>>>>> interface [1], I can confirm that reading a single file to get the mask
> >>>>> would be better, at least for performance reasons. On large platforms, we
>>>>> already have to read thousands of sysfs files to get CPU topology and
>>>>> cache information, I'd be happy not to read one more file per cpu.
>>>>>
>>>>> Reading these sysfs files is slow, and it does not scale well when
>>>>> multiple processes read them in parallel.
>>>> Really?  Where is the slowdown?  Would something like readfile() work
>>>> better for you for that?
>>>> 	https://lore.kernel.org/linux-api/20200704140250.423345-1-gregkh@linuxfoundation.org/
>>> I guess readfile would improve the sequential case by avoiding syscalls
>>> but it would not improve the parallel case since syscalls shouldn't have
>>> any parallel issue?
>> syscalls should not have parallel issues at all.
>>
>>> We've been watching the status of readfile() since it was posted on LKML
>>> 6 months ago, but we were actually wondering if it would end up being
>>> included at some point.
>> It needs a solid reason to be merged.  My "test" benchmarks are fun to
>> run, but I have yet to find a real need for it anywhere as the
>> open/read/close syscall overhead seems to be lost in the noise on any
>> real application workload that I can find.
>>
>> If you have a real need, and it reduces overhead and cpu usage, I'm more
>> than willing to update the patchset and resubmit it.
>>
>>
> Hello
>
> I updated hwloc to use readfile instead of open+read+close on all those
> small sysfs/procfs files. Unfortunately the improvement is very small,
> only a couple of percent. On a 40-core server, our library starts in 38ms
> instead of 39ms. I can't deploy your patches on larger machines, but I
> tested our code on a copy of their sysfs files saved on a local disk:
> For a 256-thread KNL, we go from 15ms to 14ms. For an 896-core SGI
> machine, from 73ms to 71ms.


Sorry, I forgot to update some codepaths to properly use readfile yesterday :/
Here are updated and more precise numbers that show a non-negligible improvement.
Again, we're measuring the entire hwloc topology discovery, which includes reading
many sysfs files (improved thanks to readfile) and then building a hierarchy of
objects describing the machine (not modified).

Server sysfs files (dual-socket x 20 cores x SMT-2)
default  43.48ms +/-4.48
readfile 42.15ms +/-4.58 => 3.1% better
1971 readfile calls => 674ns improvement per call

Knights Landing sysfs stored on local hard drive (64 cores x SMT-4)
default  14.60ms +/-0.91
readfile 13.63ms +/-1.05 => 6.6% better
2940 readfile calls => 329ns improvement per call

SGI Altix UV sysfs stored on local hard drive (56 sockets x 8 cores x SMT-2)
default  69.12ms +/-1.40
readfile 66.03ms +/-1.35 => 4.5% better
14525 readfile calls => 212ns improvement per call

I don't know why the first case (real sysfs files) gets a much
higher standard deviation and higher improvement per readfile call.
The other two cases match what microbenchmarks say
(about 200ns improvement per readfile call).

Brice




>
> I see 200ns improvement for readfile (2300) vs open+read+close (2500) on
> my server when reading a single cpu topology file. With several
> thousands of sysfs files to read in the above large hwloc tests, it
> confirms an overall improvement in the order of 1ms.
>
> So, just like you said, the overhead seems to be pretty much lost in the
> noise of hwloc doing its own stuff after reading hundreds of sysfs files :/
>
> Brice
>
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/4] drivers core: Introduce CPU type sysfs interface
  2020-11-12 10:49                   ` Greg Kroah-Hartman
@ 2020-11-17 15:55                     ` Brice Goglin
  2020-11-18 10:45                       ` Brice Goglin
       [not found]                     ` <38f290d2-4c3a-d1b0-f3cc-a0897ea10abd@gmail.com>
  1 sibling, 1 reply; 22+ messages in thread
From: Brice Goglin @ 2020-11-17 15:55 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Ricardo Neri, x86, Borislav Petkov, Ingo Molnar, Thomas Gleixner,
	Rafael J. Wysocki, Tony Luck, Len Brown, Ravi V. Shankar,
	linux-kernel, Andi Kleen, Dave Hansen, Gautham R. Shenoy,
	Kan Liang, Srinivas Pandruvada


Le 12/11/2020 à 11:49, Greg Kroah-Hartman a écrit :
> On Thu, Nov 12, 2020 at 10:10:57AM +0100, Brice Goglin wrote:
>> Le 12/11/2020 à 07:42, Greg Kroah-Hartman a écrit :
>>> On Thu, Nov 12, 2020 at 07:19:48AM +0100, Brice Goglin wrote:
>>>>
>>>> Hello
>>>>
>>>> Sorry for the late reply. As the first userspace consumer of this
>>>> interface [1], I can confirm that reading a single file to get the mask
> >>>> would be better, at least for performance reasons. On large platforms, we
>>>> already have to read thousands of sysfs files to get CPU topology and
>>>> cache information, I'd be happy not to read one more file per cpu.
>>>>
>>>> Reading these sysfs files is slow, and it does not scale well when
>>>> multiple processes read them in parallel.
>>> Really?  Where is the slowdown?  Would something like readfile() work
>>> better for you for that?
>>> 	https://lore.kernel.org/linux-api/20200704140250.423345-1-gregkh@linuxfoundation.org/
>>
>> I guess readfile would improve the sequential case by avoiding syscalls
>> but it would not improve the parallel case since syscalls shouldn't have
>> any parallel issue?
> syscalls should not have parallel issues at all.
>
>> We've been watching the status of readfile() since it was posted on LKML
>> 6 months ago, but we were actually wondering if it would end up being
>> included at some point.
> It needs a solid reason to be merged.  My "test" benchmarks are fun to
> run, but I have yet to find a real need for it anywhere as the
> open/read/close syscall overhead seems to be lost in the noise on any
> real application workload that I can find.
>
> If you have a real need, and it reduces overhead and cpu usage, I'm more
> than willing to update the patchset and resubmit it.
>
>

Hello

I updated hwloc to use readfile instead of open+read+close on all those
small sysfs/procfs files. Unfortunately the improvement is very small,
only a couple of percent. On a 40-core server, our library starts in 38ms
instead of 39ms. I can't deploy your patches on larger machines, but I
tested our code on a copy of their sysfs files saved on a local disk:
For a 256-thread KNL, we go from 15ms to 14ms. For an 896-core SGI
machine, from 73ms to 71ms.

I see 200ns improvement for readfile (2300) vs open+read+close (2500) on
my server when reading a single cpu topology file. With several
thousands of sysfs files to read in the above large hwloc tests, it
confirms an overall improvement in the order of 1ms.

So, just like you said, the overhead seems to be pretty much lost in the
noise of hwloc doing its own stuff after reading hundreds of sysfs files :/

Brice



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/4] drivers core: Introduce CPU type sysfs interface
       [not found]                     ` <38f290d2-4c3a-d1b0-f3cc-a0897ea10abd@gmail.com>
@ 2020-11-12 11:34                       ` Greg Kroah-Hartman
  2020-11-19  8:25                       ` Fox Chen
  1 sibling, 0 replies; 22+ messages in thread
From: Greg Kroah-Hartman @ 2020-11-12 11:34 UTC (permalink / raw)
  To: Brice Goglin
  Cc: Ricardo Neri, x86, Borislav Petkov, Ingo Molnar, Thomas Gleixner,
	Rafael J. Wysocki, Tony Luck, Len Brown, Ravi V. Shankar,
	linux-kernel, Andi Kleen, Dave Hansen, Gautham R. Shenoy,
	Kan Liang, Srinivas Pandruvada

On Thu, Nov 12, 2020 at 12:21:43PM +0100, Brice Goglin wrote:
> 
> Le 12/11/2020 à 11:49, Greg Kroah-Hartman a écrit :
> > On Thu, Nov 12, 2020 at 10:10:57AM +0100, Brice Goglin wrote:
> >> Le 12/11/2020 à 07:42, Greg Kroah-Hartman a écrit :
> >>> On Thu, Nov 12, 2020 at 07:19:48AM +0100, Brice Goglin wrote:
> >>>> Le 07/10/2020 à 07:15, Greg Kroah-Hartman a écrit :
> >>>>> On Tue, Oct 06, 2020 at 08:14:47PM -0700, Ricardo Neri wrote:
> >>>>>> On Tue, Oct 06, 2020 at 09:37:44AM +0200, Greg Kroah-Hartman wrote:
> >>>>>>> On Mon, Oct 05, 2020 at 05:57:36PM -0700, Ricardo Neri wrote:
> >>>>>>>> On Sat, Oct 03, 2020 at 10:53:45AM +0200, Greg Kroah-Hartman wrote:
> >>>>>>>>> On Fri, Oct 02, 2020 at 06:17:42PM -0700, Ricardo Neri wrote:
> >>>>>>>>>> Hybrid CPU topologies combine CPUs of different microarchitectures in the
> >>>>>>>>>> same die. Thus, even though the instruction set is compatible among all
> >>>>>>>>>> CPUs, there may still be differences in features (e.g., some CPUs may
> >>>>>>>>>> have counters that other CPUs do not). There may be applications
> >>>>>>>>>> interested in knowing the type of micro-architecture topology of the
> >>>>>>>>>> system to make decisions about process affinity.
> >>>>>>>>>>
> >>>>>>>>>> While the existing sysfs for capacity (/sys/devices/system/cpu/cpuX/
> >>>>>>>>>> cpu_capacity) may be used to infer the types of micro-architecture of the
> >>>>>>>>>> CPUs in the platform, it may not be entirely accurate. For instance, two
> >>>>>>>>>> subsets of CPUs with different types of micro-architecture may have the
> >>>>>>>>>> same capacity due to power or thermal constraints.
> >>>>>>>>>>
> >>>>>>>>>> Create the new directory /sys/devices/system/cpu/types. Under such
> >>>>>>>>>> directory, create individual subdirectories for each type of CPU micro-
> >>>>>>>>>> architecture. Each subdirectory will have cpulist and cpumap files. This
> >>>>>>>>>> makes it convenient for user space to read all the CPUs of the same type
> >>>>>>>>>> at once without having to inspect each CPU individually.
> >>>>>>>>>>
> >>>>>>>>>> Implement a generic interface using weak functions that architectures can
> >>>>>>>>>> override to indicate a) support for CPU types, b) the CPU type number, and
> >>>>>>>>>> c) a string to identify the CPU vendor and type.
> >>>>>>>>>>
> >>>>>>>>>> For example, an x86 system with one Intel Core and four Intel Atom CPUs
> >>>>>>>>>> would look like this (other architectures have the hooks to use whatever
> >>>>>>>>>> directory naming convention below "types" that meets their needs):
> >>>>>>>>>>
> >>>>>>>>>> user@host:~$: ls /sys/devices/system/cpu/types
> >>>>>>>>>> intel_atom_0  intel_core_0
> >>>>>>>>>>
> >>>>>>>>>> user@host:~$ ls /sys/devices/system/cpu/types/intel_atom_0
> >>>>>>>>>> cpulist cpumap
> >>>>>>>>>>
> >>>>>>>>>> user@host:~$ ls /sys/devices/system/cpu/types/intel_core_0
> >>>>>>>>>> cpulist cpumap
> >>>>>>>>>>
> >>>>>>>>>> user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpumap
> >>>>>>>>>> 0f
> >>>>>>>>>>
> >>>>>>>>>> user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpulist
> >>>>>>>>>> 0-3
> >>>>>>>>>>
> >>>>>>>>>> user@ihost:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpumap
> >>>>>>>>>> 10
> >>>>>>>>>>
> >>>>>>>>>> user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpulist
> >>>>>>>>>> 4
> >>>>>>>> Thank you for the quick and detailed reply, Greg!
> >>>>>>>>
> >>>>>>>>> The output of 'tree' sometimes makes it easier to see here, or:
> >>>>>>>>> 	grep -R . *
> >>>>>>>>> also works well.
> >>>>>>>> Indeed, this would definitely make it more readable.
> >>>>>>>>
> >>>>>>>>>> On non-hybrid systems, the /sys/devices/system/cpu/types directory is not
> >>>>>>>>>> created. Add a hook for this purpose.
> >>>>>>>>> Why should these not show up if the system is not "hybrid"?
> >>>>>>>> My thinking was that on a non-hybrid system, it does not make sense to
> >>>>>>>> create this interface, as all the CPUs will be of the same type.
> >>>>>>> Why not just have this an attribute type in the existing cpuX directory?
> >>>>>>> Why do this have to be a totally separate directory and userspace has to
> >>>>>>> figure out to look in two different spots for the same cpu to determine
> >>>>>>> what it is?
> >>>>>> But if the type is located under cpuX, userspace would need to traverse
> >>>>>> all the CPUs and create its own cpu masks. Under the types directory it
> >>>>>> would only need to look once for each type of CPU, IMHO.
> >>>>> What does a "mask" do?  What does userspace care about this?  You would
> >>>>> have to create it by traversing the directories you are creating anyway,
> >>>>> so it's not much different, right?
> >>>> Hello
> >>>>
> >>>> Sorry for the late reply. As the first userspace consumer of this
> >>>> interface [1], I can confirm that reading a single file to get the mask
> >>>> would be better, at least for performance reasons. On large platforms, we
> >>>> already have to read thousands of sysfs files to get CPU topology and
> >>>> cache information, I'd be happy not to read one more file per cpu.
> >>>>
> >>>> Reading these sysfs files is slow, and it does not scale well when
> >>>> multiple processes read them in parallel.
> >>> Really?  Where is the slowdown?  Would something like readfile() work
> >>> better for you for that?
> >>> 	https://lore.kernel.org/linux-api/20200704140250.423345-1-gregkh@linuxfoundation.org/
> >>
> >> I guess readfile would improve the sequential case by avoiding syscalls
> >> but it would not improve the parallel case since syscalls shouldn't have
> >> any parallel issue?
> > syscalls should not have parallel issues at all.
> >
> >> We've been watching the status of readfile() since it was posted on LKML
> >> 6 months ago, but we were actually wondering if it would end up being
> >> included at some point.
> > It needs a solid reason to be merged.  My "test" benchmarks are fun to
> > run, but I have yet to find a real need for it anywhere as the
> > open/read/close syscall overhead seems to be lost in the noise on any
> > real application workload that I can find.
> >
> > If you have a real need, and it reduces overhead and cpu usage, I'm more
> > than willing to update the patchset and resubmit it.
> 
> 
> Good, I'll give it a try.
> 
> 
> >>> How does multiple processes slow anything down, there shouldn't be any
> >>> shared locks here.
> >>
> >> When I benchmarked this in 2016, reading a single (small) sysfs file was
> >> 41x slower when running 64 processes simultaneously on a 64-core Knights
> >> Landing than reading from a single process. On a SGI Altix UV with 12x
> >> 8-core CPUs, reading from one process per CPU (12 total) was 60x slower
> >> (which could mean NUMA affinity matters), and reading from one process
> >> per core (96 total) was 491x slower.
> >>
> >> I will try to find some time to dig further on recent kernels with perf
> >> and readfile (both machines were running RHEL7).
> > 2016 was a long time ago in kernel-land, please retest on a kernel.org
> > release, not a RHEL monstrosity.
> 
> 
> Quick test on 5.8.14 from Debian (fairly close to mainline) on a server
> with 2x20 cores.
> 
> I am measuring the time to do open+read+close of
> /sys/devices/system/cpu/cpu15/topology/die_id 1000 times
> 
> With a single process, it takes 2ms (2us per open+read+close, looks OK).
> 
> With one process per core (with careful binding, etc), it jumps from 2ms
> to 190ms (without much variation).
> 
> It looks like locks in kernfs_iop_permission and kernfs_dop_revalidate
> are causing the issue.
> 
> I am attaching the perf report callgraph output below.

Ouch, yes, we are hitting the single kernfs mutex for all of this, not
nice.  I'll add this to my list of things to look at in the near future,
thanks for the report!
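(For reference, the per-core contention test Brice describes above can be sketched along these lines — a hypothetical reconstruction, with the helper name, path, and counts as placeholders, not his actual code:)

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork one reader per CPU, pin each to its own core (best effort), and
 * have every reader do open+read+close on `path` `iters` times.
 * Returns 0 if every reader succeeded, -1 otherwise. */
static int parallel_readers(const char *path, int ncpus, int iters)
{
    for (int cpu = 0; cpu < ncpus; cpu++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {
            cpu_set_t set;
            CPU_ZERO(&set);
            CPU_SET(cpu, &set);
            sched_setaffinity(0, sizeof(set), &set); /* ignore failure */

            char buf[64];
            for (int i = 0; i < iters; i++) {
                int fd = open(path, O_RDONLY);
                if (fd < 0)
                    _exit(1);
                read(fd, buf, sizeof(buf));
                close(fd);
            }
            _exit(0);
        }
    }

    int status, failed = 0;
    while (wait(&status) > 0)
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failed = 1;
    return failed ? -1 : 0;
}
```

Timing this against a sysfs file with ncpus equal to the core count is what exposes the kernfs lock contention in the perf callgraphs above.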

greg k-h

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/4] drivers core: Introduce CPU type sysfs interface
  2020-11-12  9:10                 ` Brice Goglin
@ 2020-11-12 10:49                   ` Greg Kroah-Hartman
  2020-11-17 15:55                     ` Brice Goglin
       [not found]                     ` <38f290d2-4c3a-d1b0-f3cc-a0897ea10abd@gmail.com>
  0 siblings, 2 replies; 22+ messages in thread
From: Greg Kroah-Hartman @ 2020-11-12 10:49 UTC (permalink / raw)
  To: Brice Goglin
  Cc: Ricardo Neri, x86, Borislav Petkov, Ingo Molnar, Thomas Gleixner,
	Rafael J. Wysocki, Tony Luck, Len Brown, Ravi V. Shankar,
	linux-kernel, Andi Kleen, Dave Hansen, Gautham R. Shenoy,
	Kan Liang, Srinivas Pandruvada

On Thu, Nov 12, 2020 at 10:10:57AM +0100, Brice Goglin wrote:
> 
> Le 12/11/2020 à 07:42, Greg Kroah-Hartman a écrit :
> > On Thu, Nov 12, 2020 at 07:19:48AM +0100, Brice Goglin wrote:
> >> Le 07/10/2020 à 07:15, Greg Kroah-Hartman a écrit :
> >>> On Tue, Oct 06, 2020 at 08:14:47PM -0700, Ricardo Neri wrote:
> >>>> On Tue, Oct 06, 2020 at 09:37:44AM +0200, Greg Kroah-Hartman wrote:
> >>>>> On Mon, Oct 05, 2020 at 05:57:36PM -0700, Ricardo Neri wrote:
> >>>>>> On Sat, Oct 03, 2020 at 10:53:45AM +0200, Greg Kroah-Hartman wrote:
> >>>>>>> On Fri, Oct 02, 2020 at 06:17:42PM -0700, Ricardo Neri wrote:
> >>>>>>>> Hybrid CPU topologies combine CPUs of different microarchitectures in the
> >>>>>>>> same die. Thus, even though the instruction set is compatible among all
> >>>>>>>> CPUs, there may still be differences in features (e.g., some CPUs may
> >>>>>>>> have counters that other CPUs do not). There may be applications
> >>>>>>>> interested in knowing the type of micro-architecture topology of the
> >>>>>>>> system to make decisions about process affinity.
> >>>>>>>>
> >>>>>>>> While the existing sysfs for capacity (/sys/devices/system/cpu/cpuX/
> >>>>>>>> cpu_capacity) may be used to infer the types of micro-architecture of the
> >>>>>>>> CPUs in the platform, it may not be entirely accurate. For instance, two
> >>>>>>>> subsets of CPUs with different types of micro-architecture may have the
> >>>>>>>> same capacity due to power or thermal constraints.
> >>>>>>>>
> >>>>>>>> Create the new directory /sys/devices/system/cpu/types. Under such
> >>>>>>>> directory, create individual subdirectories for each type of CPU micro-
> >>>>>>>> architecture. Each subdirectory will have cpulist and cpumap files. This
> >>>>>>>> makes it convenient for user space to read all the CPUs of the same type
> >>>>>>>> at once without having to inspect each CPU individually.
> >>>>>>>>
> >>>>>>>> Implement a generic interface using weak functions that architectures can
> >>>>>>>> override to indicate a) support for CPU types, b) the CPU type number, and
> >>>>>>>> c) a string to identify the CPU vendor and type.
> >>>>>>>>
> >>>>>>>> For example, an x86 system with one Intel Core and four Intel Atom CPUs
> >>>>>>>> would look like this (other architectures have the hooks to use whatever
> >>>>>>>> directory naming convention below "types" that meets their needs):
> >>>>>>>>
> >>>>>>>> user@host:~$: ls /sys/devices/system/cpu/types
> >>>>>>>> intel_atom_0  intel_core_0
> >>>>>>>>
> >>>>>>>> user@host:~$ ls /sys/devices/system/cpu/types/intel_atom_0
> >>>>>>>> cpulist cpumap
> >>>>>>>>
> >>>>>>>> user@host:~$ ls /sys/devices/system/cpu/types/intel_core_0
> >>>>>>>> cpulist cpumap
> >>>>>>>>
> >>>>>>>> user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpumap
> >>>>>>>> 0f
> >>>>>>>>
> >>>>>>>> user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpulist
> >>>>>>>> 0-3
> >>>>>>>>
> >>>>>>>> user@ihost:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpumap
> >>>>>>>> 10
> >>>>>>>>
> >>>>>>>> user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpulist
> >>>>>>>> 4
> >>>>>> Thank you for the quick and detailed reply, Greg!
> >>>>>>
> >>>>>>> The output of 'tree' sometimes makes it easier to see here, or:
> >>>>>>> 	grep -R . *
> >>>>>>> also works well.
> >>>>>> Indeed, this would definitely make it more readable.
> >>>>>>
> >>>>>>>> On non-hybrid systems, the /sys/devices/system/cpu/types directory is not
> >>>>>>>> created. Add a hook for this purpose.
> >>>>>>> Why should these not show up if the system is not "hybrid"?
> >>>>>> My thinking was that on a non-hybrid system, it does not make sense to
> >>>>>> create this interface, as all the CPUs will be of the same type.
> >>>>> Why not just have this as an attribute type in the existing cpuX directory?
> >>>>> Why does this have to be a totally separate directory and userspace has to
> >>>>> figure out to look in two different spots for the same cpu to determine
> >>>>> what it is?
> >>>> But if the type is located under cpuX, userspace would need to traverse
> >>>> all the CPUs and create its own cpu masks. Under the types directory it
> >>>> would only need to look once for each type of CPU, IMHO.
> >>> What does a "mask" do?  What does userspace care about this?  You would
> >>> have to create it by traversing the directories you are creating anyway,
> >>> so it's not much different, right?
> >>
> >> Hello
> >>
> >> Sorry for the late reply. As the first userspace consumer of this
> >> interface [1], I can confirm that reading a single file to get the mask
> >> would be better, at least for performance reasons. On large platforms, we
> >> already have to read thousands of sysfs files to get CPU topology and
> >> cache information, I'd be happy not to read one more file per cpu.
> >>
> >> Reading these sysfs files is slow, and it does not scale well when
> >> multiple processes read them in parallel.
> > Really?  Where is the slowdown?  Would something like readfile() work
> > better for you for that?
> > 	https://lore.kernel.org/linux-api/20200704140250.423345-1-gregkh@linuxfoundation.org/
> 
> 
> I guess readfile would improve the sequential case by avoiding syscalls
> but it would not improve the parallel case since syscalls shouldn't have
> any parallel issue?

syscalls should not have parallel issues at all.

> We've been watching the status of readfile() since it was posted on LKML
> 6 months ago, but we were actually wondering if it would end up being
> included at some point.

It needs a solid reason to be merged.  My "test" benchmarks are fun to
run, but I have yet to find a real need for it anywhere as the
open/read/close syscall overhead seems to be lost in the noise on any
real application workload that I can find.

If you have a real need, and it reduces overhead and cpu usage, I'm more
than willing to update the patchset and resubmit it.
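[Editor's note: for context, the per-file overhead in question is the conventional three-syscall sequence that readfile() was designed to collapse into a single call. A sketch of that sequence follows; the helper name is hypothetical and is shown only to make the comparison concrete.]

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * The three-syscall pattern under discussion: open(), read(), close()
 * for one small sysfs file. The proposed readfile() syscall would
 * perform the same work in a single call. Hypothetical helper name;
 * returns the number of bytes read, or -1 on error.
 */
static long read_small_file(const char *path, char *buf, size_t len)
{
	int fd = open(path, O_RDONLY);
	long n;

	if (fd < 0)
		return -1;
	n = read(fd, buf, len);
	close(fd);
	return n;
}
```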

> > How do multiple processes slow anything down, there shouldn't be any
> > shared locks here.
> 
> 
> When I benchmarked this in 2016, reading a single (small) sysfs file was
> 41x slower when running 64 processes simultaneously on a 64-core Knights
> Landing than reading from a single process. On an SGI Altix UV with 12x
> 8-core CPUs, reading from one process per CPU (12 total) was 60x slower
> (which could mean NUMA affinity matters), and reading from one process
> per core (96 total) was 491x slower.
> 
> I will try to find some time to dig further on recent kernels with perf
> and readfile (both machines were running RHEL7).

2016 was a long time ago in kernel-land, please retest on a kernel.org
release, not a RHEL monstrosity.

> >> There are ways to avoid these
> >> multiple discoveries by sharing hwloc info through XML or shmem, but it
> >> will take years before developers of all the different runtimes
> >> implement this :)
> > I don't understand, what exactly are you suggesting we do here instead?
> 
> 
> I was just saying userspace has ways to mitigate the issue but it will
> take time because many different projects are involved.

I still don't understand, what issue are you referring to?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/4] drivers core: Introduce CPU type sysfs interface
  2020-11-12  6:42               ` Greg Kroah-Hartman
@ 2020-11-12  9:10                 ` Brice Goglin
  2020-11-12 10:49                   ` Greg Kroah-Hartman
  0 siblings, 1 reply; 22+ messages in thread
From: Brice Goglin @ 2020-11-12  9:10 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Ricardo Neri, x86, Borislav Petkov, Ingo Molnar, Thomas Gleixner,
	Rafael J. Wysocki, Tony Luck, Len Brown, Ravi V. Shankar,
	linux-kernel, Andi Kleen, Dave Hansen, Gautham R. Shenoy,
	Kan Liang, Srinivas Pandruvada


Le 12/11/2020 à 07:42, Greg Kroah-Hartman a écrit :
> On Thu, Nov 12, 2020 at 07:19:48AM +0100, Brice Goglin wrote:
>> Le 07/10/2020 à 07:15, Greg Kroah-Hartman a écrit :
>>> On Tue, Oct 06, 2020 at 08:14:47PM -0700, Ricardo Neri wrote:
>>>> On Tue, Oct 06, 2020 at 09:37:44AM +0200, Greg Kroah-Hartman wrote:
>>>>> On Mon, Oct 05, 2020 at 05:57:36PM -0700, Ricardo Neri wrote:
>>>>>> On Sat, Oct 03, 2020 at 10:53:45AM +0200, Greg Kroah-Hartman wrote:
>>>>>>> On Fri, Oct 02, 2020 at 06:17:42PM -0700, Ricardo Neri wrote:
>>>>>>>> Hybrid CPU topologies combine CPUs of different microarchitectures in the
>>>>>>>> same die. Thus, even though the instruction set is compatible among all
>>>>>>>> CPUs, there may still be differences in features (e.g., some CPUs may
>>>>>>>> have counters that other CPUs do not). There may be applications
>>>>>>>> interested in knowing the type of micro-architecture topology of the
>>>>>>>> system to make decisions about process affinity.
>>>>>>>>
>>>>>>>> While the existing sysfs for capacity (/sys/devices/system/cpu/cpuX/
>>>>>>>> cpu_capacity) may be used to infer the types of micro-architecture of the
>>>>>>>> CPUs in the platform, it may not be entirely accurate. For instance, two
>>>>>>>> subsets of CPUs with different types of micro-architecture may have the
>>>>>>>> same capacity due to power or thermal constraints.
>>>>>>>>
>>>>>>>> Create the new directory /sys/devices/system/cpu/types. Under such
>>>>>>>> directory, create individual subdirectories for each type of CPU micro-
>>>>>>>> architecture. Each subdirectory will have cpulist and cpumap files. This
>>>>>>>> makes it convenient for user space to read all the CPUs of the same type
>>>>>>>> at once without having to inspect each CPU individually.
>>>>>>>>
>>>>>>>> Implement a generic interface using weak functions that architectures can
>>>>>>>> override to indicate a) support for CPU types, b) the CPU type number, and
>>>>>>>> c) a string to identify the CPU vendor and type.
>>>>>>>>
>>>>>>>> For example, an x86 system with one Intel Core and four Intel Atom CPUs
>>>>>>>> would look like this (other architectures have the hooks to use whatever
>>>>>>>> directory naming convention below "types" that meets their needs):
>>>>>>>>
>>>>>>>> user@host:~$ ls /sys/devices/system/cpu/types
>>>>>>>> intel_atom_0  intel_core_0
>>>>>>>>
>>>>>>>> user@host:~$ ls /sys/devices/system/cpu/types/intel_atom_0
>>>>>>>> cpulist cpumap
>>>>>>>>
>>>>>>>> user@host:~$ ls /sys/devices/system/cpu/types/intel_core_0
>>>>>>>> cpulist cpumap
>>>>>>>>
>>>>>>>> user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpumap
>>>>>>>> 0f
>>>>>>>>
>>>>>>>> user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpulist
>>>>>>>> 0-3
>>>>>>>>
>>>>>>>> user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpumap
>>>>>>>> 10
>>>>>>>>
>>>>>>>> user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpulist
>>>>>>>> 4
>>>>>> Thank you for the quick and detailed feedback, Greg!
>>>>>>
>>>>>>> The output of 'tree' sometimes makes it easier to see here, or:
>>>>>>> 	grep -R . *
>>>>>>> also works well.
>>>>>> Indeed, this would definitely make it more readable.
>>>>>>
>>>>>>>> On non-hybrid systems, the /sys/devices/system/cpu/types directory is not
>>>>>>>> created. Add a hook for this purpose.
>>>>>>> Why should these not show up if the system is not "hybrid"?
>>>>>> My thinking was that on a non-hybrid system, it does not make sense to
>>>>>> create this interface, as all the CPUs will be of the same type.
>>>>> Why not just have this as an attribute type in the existing cpuX directory?
>>>>> Why does this have to be a totally separate directory and userspace has to
>>>>> figure out to look in two different spots for the same cpu to determine
>>>>> what it is?
>>>> But if the type is located under cpuX, userspace would need to traverse
>>>> all the CPUs and create its own cpu masks. Under the types directory it
>>>> would only need to look once for each type of CPU, IMHO.
>>> What does a "mask" do?  What does userspace care about this?  You would
>>> have to create it by traversing the directories you are creating anyway,
>>> so it's not much different, right?
>>
>> Hello
>>
>> Sorry for the late reply. As the first userspace consumer of this
>> interface [1], I can confirm that reading a single file to get the mask
>> would be better, at least for performance reasons. On large platforms, we
>> already have to read thousands of sysfs files to get CPU topology and
>> cache information, I'd be happy not to read one more file per cpu.
>>
>> Reading these sysfs files is slow, and it does not scale well when
>> multiple processes read them in parallel.
> Really?  Where is the slowdown?  Would something like readfile() work
> better for you for that?
> 	https://lore.kernel.org/linux-api/20200704140250.423345-1-gregkh@linuxfoundation.org/


I guess readfile would improve the sequential case by avoiding syscalls
but it would not improve the parallel case since syscalls shouldn't have
any parallel issue?

We've been watching the status of readfile() since it was posted on LKML
6 months ago, but we were actually wondering if it would end up being
included at some point.


> How do multiple processes slow anything down, there shouldn't be any
> shared locks here.


When I benchmarked this in 2016, reading a single (small) sysfs file was
41x slower when running 64 processes simultaneously on a 64-core Knights
Landing than reading from a single process. On an SGI Altix UV with 12x
8-core CPUs, reading from one process per CPU (12 total) was 60x slower
(which could mean NUMA affinity matters), and reading from one process
per core (96 total) was 491x slower.

I will try to find some time to dig further on recent kernels with perf
and readfile (both machines were running RHEL7).


>> There are ways to avoid these
>> multiple discoveries by sharing hwloc info through XML or shmem, but it
>> will take years before developers of all the different runtimes
>> implement this :)
> I don't understand, what exactly are you suggesting we do here instead?


I was just saying userspace has ways to mitigate the issue but it will
take time because many different projects are involved.

Brice




* Re: [PATCH 1/4] drivers core: Introduce CPU type sysfs interface
  2020-11-12  6:19             ` Brice Goglin
@ 2020-11-12  6:42               ` Greg Kroah-Hartman
  2020-11-12  9:10                 ` Brice Goglin
  0 siblings, 1 reply; 22+ messages in thread
From: Greg Kroah-Hartman @ 2020-11-12  6:42 UTC (permalink / raw)
  To: Brice Goglin
  Cc: Ricardo Neri, x86, Borislav Petkov, Ingo Molnar, Thomas Gleixner,
	Rafael J. Wysocki, Tony Luck, Len Brown, Ravi V. Shankar,
	linux-kernel, Andi Kleen, Dave Hansen, Gautham R. Shenoy,
	Kan Liang, Srinivas Pandruvada

On Thu, Nov 12, 2020 at 07:19:48AM +0100, Brice Goglin wrote:
> Le 07/10/2020 à 07:15, Greg Kroah-Hartman a écrit :
> > On Tue, Oct 06, 2020 at 08:14:47PM -0700, Ricardo Neri wrote:
> >> On Tue, Oct 06, 2020 at 09:37:44AM +0200, Greg Kroah-Hartman wrote:
> >>> On Mon, Oct 05, 2020 at 05:57:36PM -0700, Ricardo Neri wrote:
> >>>> On Sat, Oct 03, 2020 at 10:53:45AM +0200, Greg Kroah-Hartman wrote:
> >>>>> On Fri, Oct 02, 2020 at 06:17:42PM -0700, Ricardo Neri wrote:
> >>>>>> Hybrid CPU topologies combine CPUs of different microarchitectures in the
> >>>>>> same die. Thus, even though the instruction set is compatible among all
> >>>>>> CPUs, there may still be differences in features (e.g., some CPUs may
> >>>>>> have counters that other CPUs do not). There may be applications
> >>>>>> interested in knowing the type of micro-architecture topology of the
> >>>>>> system to make decisions about process affinity.
> >>>>>>
> >>>>>> While the existing sysfs for capacity (/sys/devices/system/cpu/cpuX/
> >>>>>> cpu_capacity) may be used to infer the types of micro-architecture of the
> >>>>>> CPUs in the platform, it may not be entirely accurate. For instance, two
> >>>>>> subsets of CPUs with different types of micro-architecture may have the
> >>>>>> same capacity due to power or thermal constraints.
> >>>>>>
> >>>>>> Create the new directory /sys/devices/system/cpu/types. Under such
> >>>>>> directory, create individual subdirectories for each type of CPU micro-
> >>>>>> architecture. Each subdirectory will have cpulist and cpumap files. This
> >>>>>> makes it convenient for user space to read all the CPUs of the same type
> >>>>>> at once without having to inspect each CPU individually.
> >>>>>>
> >>>>>> Implement a generic interface using weak functions that architectures can
> >>>>>> override to indicate a) support for CPU types, b) the CPU type number, and
> >>>>>> c) a string to identify the CPU vendor and type.
> >>>>>>
> >>>>>> For example, an x86 system with one Intel Core and four Intel Atom CPUs
> >>>>>> would look like this (other architectures have the hooks to use whatever
> >>>>>> directory naming convention below "types" that meets their needs):
> >>>>>>
> >>>>>> user@host:~$ ls /sys/devices/system/cpu/types
> >>>>>> intel_atom_0  intel_core_0
> >>>>>>
> >>>>>> user@host:~$ ls /sys/devices/system/cpu/types/intel_atom_0
> >>>>>> cpulist cpumap
> >>>>>>
> >>>>>> user@host:~$ ls /sys/devices/system/cpu/types/intel_core_0
> >>>>>> cpulist cpumap
> >>>>>>
> >>>>>> user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpumap
> >>>>>> 0f
> >>>>>>
> >>>>>> user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpulist
> >>>>>> 0-3
> >>>>>>
> >>>>>> user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpumap
> >>>>>> 10
> >>>>>>
> >>>>>> user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpulist
> >>>>>> 4
> >>>> Thank you for the quick and detailed feedback, Greg!
> >>>>
> >>>>> The output of 'tree' sometimes makes it easier to see here, or:
> >>>>> 	grep -R . *
> >>>>> also works well.
> >>>> Indeed, this would definitely make it more readable.
> >>>>
> >>>>>> On non-hybrid systems, the /sys/devices/system/cpu/types directory is not
> >>>>>> created. Add a hook for this purpose.
> >>>>> Why should these not show up if the system is not "hybrid"?
> >>>> My thinking was that on a non-hybrid system, it does not make sense to
> >>>> create this interface, as all the CPUs will be of the same type.
> >>> Why not just have this as an attribute type in the existing cpuX directory?
> >>> Why does this have to be a totally separate directory and userspace has to
> >>> figure out to look in two different spots for the same cpu to determine
> >>> what it is?
> >> But if the type is located under cpuX, userspace would need to traverse
> >> all the CPUs and create its own cpu masks. Under the types directory it
> >> would only need to look once for each type of CPU, IMHO.
> > What does a "mask" do?  What does userspace care about this?  You would
> > have to create it by traversing the directories you are creating anyway,
> > so it's not much different, right?
> 
> 
> Hello
> 
> Sorry for the late reply. As the first userspace consumer of this
> interface [1], I can confirm that reading a single file to get the mask
> would be better, at least for performance reasons. On large platforms, we
> already have to read thousands of sysfs files to get CPU topology and
> cache information, I'd be happy not to read one more file per cpu.
> 
> Reading these sysfs files is slow, and it does not scale well when
> multiple processes read them in parallel.

Really?  Where is the slowdown?  Would something like readfile() work
better for you for that?
	https://lore.kernel.org/linux-api/20200704140250.423345-1-gregkh@linuxfoundation.org/

How do multiple processes slow anything down, there shouldn't be any
shared locks here.

> There are ways to avoid these
> multiple discoveries by sharing hwloc info through XML or shmem, but it
> will take years before developers of all the different runtimes
> implement this :)

I don't understand, what exactly are you suggesting we do here instead?

thanks,

greg k-h


* Re: [PATCH 1/4] drivers core: Introduce CPU type sysfs interface
  2020-10-07  5:15           ` Greg Kroah-Hartman
  2020-10-08  3:34             ` Ricardo Neri
@ 2020-11-12  6:19             ` Brice Goglin
  2020-11-12  6:42               ` Greg Kroah-Hartman
  1 sibling, 1 reply; 22+ messages in thread
From: Brice Goglin @ 2020-11-12  6:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Ricardo Neri
  Cc: x86, Borislav Petkov, Ingo Molnar, Thomas Gleixner,
	Rafael J. Wysocki, Tony Luck, Len Brown, Ravi V. Shankar,
	linux-kernel, Andi Kleen, Dave Hansen, Gautham R. Shenoy,
	Kan Liang, Srinivas Pandruvada

Le 07/10/2020 à 07:15, Greg Kroah-Hartman a écrit :
> On Tue, Oct 06, 2020 at 08:14:47PM -0700, Ricardo Neri wrote:
>> On Tue, Oct 06, 2020 at 09:37:44AM +0200, Greg Kroah-Hartman wrote:
>>> On Mon, Oct 05, 2020 at 05:57:36PM -0700, Ricardo Neri wrote:
>>>> On Sat, Oct 03, 2020 at 10:53:45AM +0200, Greg Kroah-Hartman wrote:
>>>>> On Fri, Oct 02, 2020 at 06:17:42PM -0700, Ricardo Neri wrote:
>>>>>> Hybrid CPU topologies combine CPUs of different microarchitectures in the
>>>>>> same die. Thus, even though the instruction set is compatible among all
>>>>>> CPUs, there may still be differences in features (e.g., some CPUs may
>>>>>> have counters that other CPUs do not). There may be applications
>>>>>> interested in knowing the type of micro-architecture topology of the
>>>>>> system to make decisions about process affinity.
>>>>>>
>>>>>> While the existing sysfs for capacity (/sys/devices/system/cpu/cpuX/
>>>>>> cpu_capacity) may be used to infer the types of micro-architecture of the
>>>>>> CPUs in the platform, it may not be entirely accurate. For instance, two
>>>>>> subsets of CPUs with different types of micro-architecture may have the
>>>>>> same capacity due to power or thermal constraints.
>>>>>>
>>>>>> Create the new directory /sys/devices/system/cpu/types. Under such
>>>>>> directory, create individual subdirectories for each type of CPU micro-
>>>>>> architecture. Each subdirectory will have cpulist and cpumap files. This
>>>>>> makes it convenient for user space to read all the CPUs of the same type
>>>>>> at once without having to inspect each CPU individually.
>>>>>>
>>>>>> Implement a generic interface using weak functions that architectures can
>>>>>> override to indicate a) support for CPU types, b) the CPU type number, and
>>>>>> c) a string to identify the CPU vendor and type.
>>>>>>
>>>>>> For example, an x86 system with one Intel Core and four Intel Atom CPUs
>>>>>> would look like this (other architectures have the hooks to use whatever
>>>>>> directory naming convention below "types" that meets their needs):
>>>>>>
>>>>>> user@host:~$ ls /sys/devices/system/cpu/types
>>>>>> intel_atom_0  intel_core_0
>>>>>>
>>>>>> user@host:~$ ls /sys/devices/system/cpu/types/intel_atom_0
>>>>>> cpulist cpumap
>>>>>>
>>>>>> user@host:~$ ls /sys/devices/system/cpu/types/intel_core_0
>>>>>> cpulist cpumap
>>>>>>
>>>>>> user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpumap
>>>>>> 0f
>>>>>>
>>>>>> user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpulist
>>>>>> 0-3
>>>>>>
>>>>>> user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpumap
>>>>>> 10
>>>>>>
>>>>>> user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpulist
>>>>>> 4
>>>> Thank you for the quick and detailed feedback, Greg!
>>>>
>>>>> The output of 'tree' sometimes makes it easier to see here, or:
>>>>> 	grep -R . *
>>>>> also works well.
>>>> Indeed, this would definitely make it more readable.
>>>>
>>>>>> On non-hybrid systems, the /sys/devices/system/cpu/types directory is not
>>>>>> created. Add a hook for this purpose.
>>>>> Why should these not show up if the system is not "hybrid"?
>>>> My thinking was that on a non-hybrid system, it does not make sense to
>>>> create this interface, as all the CPUs will be of the same type.
>>> Why not just have this as an attribute type in the existing cpuX directory?
>>> Why does this have to be a totally separate directory and userspace has to
>>> figure out to look in two different spots for the same cpu to determine
>>> what it is?
>> But if the type is located under cpuX, userspace would need to traverse
>> all the CPUs and create its own cpu masks. Under the types directory it
>> would only need to look once for each type of CPU, IMHO.
> What does a "mask" do?  What does userspace care about this?  You would
> have to create it by traversing the directories you are creating anyway,
> so it's not much different, right?


Hello

Sorry for the late reply. As the first userspace consumer of this
interface [1], I can confirm that reading a single file to get the mask
would be better, at least for performance reasons. On large platforms, we
already have to read thousands of sysfs files to get CPU topology and
cache information, I'd be happy not to read one more file per cpu.

Reading these sysfs files is slow, and it does not scale well when
multiple processes read them in parallel. There are ways to avoid these
multiple discoveries by sharing hwloc info through XML or shmem, but it
will take years before developers of all the different runtimes
implement this :)

Thanks

Brice

[1] hwloc and lstopo are going to expose this hybrid info to HPC
runtimes (https://www.open-mpi.org/projects/hwloc/)




* Re: [PATCH 1/4] drivers core: Introduce CPU type sysfs interface
  2020-10-06  1:15     ` Ricardo Neri
@ 2020-10-10  3:14       ` Randy Dunlap
  0 siblings, 0 replies; 22+ messages in thread
From: Randy Dunlap @ 2020-10-10  3:14 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Greg Kroah-Hartman, x86, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Rafael J. Wysocki, Tony Luck, Len Brown,
	Ravi V. Shankar, linux-kernel, Andi Kleen, Dave Hansen,
	Gautham R. Shenoy, Kan Liang, Srinivas Pandruvada

On 10/5/20 6:15 PM, Ricardo Neri wrote:
> On Fri, Oct 02, 2020 at 08:27:41PM -0700, Randy Dunlap wrote:
>> On 10/2/20 6:17 PM, Ricardo Neri wrote:
>>> +/**
>>> + * arch_get_cpu_type() - Get the CPU type number
>>> + * @cpu:	Index of the CPU of which the type is needed
>>> + *
>>> + * Get the CPU type number of @cpu, a non-zero unsigned 32-bit number that
> 
> Thank you for your feedback, Randy!
> 
>> Are you sure that @cpu is non-zero?
> 
> This is the intent. Maybe it can be reworked to return -EINVAL instead?
> I guess it is plausible, but less likely, that an arch defines a type as
> 0xffffffea.

You lost me here.

>>
>>> + * uniquely identifies a type of CPU micro-architecture. All CPUs of the same
>>> + * type have the same type number. Type numbers are defined by each CPU
>>> + * architecture.
>>> + */
>>> +u32 __weak arch_get_cpu_type(int cpu)
>>> +{
>>> +	return 0;
>>> +}
>>
>> arch_get_cpu_type() in patch 4/4 allows @cpu to be 0.
> 
> It should not return 0 if the system does not have
> X86_FEATURE_HYBRID_CPU. The currently existing CPU types are all
> non-zero as per the Intel SDM. Am I missing anything?

@cpu is a cpu index, not a cpu type.

-- 
~Randy
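[Editor's note: to make the arch_get_cpu_type() discussion concrete, on hybrid x86 parts the type number is expected to come from CPUID leaf 0x1A, whose EAX[31:24] core-type byte is 0x20 for Atom and 0x40 for Core according to the Intel SDM. The sketch below is a hypothetical helper, not the patch's code; it shows the kind of type-to-name mapping such an override enables.]

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical mapping from an x86 CPU type number (CPUID.1AH
 * EAX[31:24] per the Intel SDM) to a directory-name prefix like the
 * ones in the proposed /sys/devices/system/cpu/types interface.
 * A type of 0 matches the weak default, i.e. no type is known.
 */
static const char *cpu_type_prefix(uint32_t type)
{
	switch (type) {
	case 0x20:
		return "intel_atom";
	case 0x40:
		return "intel_core";
	case 0:
		return NULL;	/* weak arch_get_cpu_type() default */
	default:
		return "unknown";
	}
}
```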



* Re: [PATCH 1/4] drivers core: Introduce CPU type sysfs interface
  2020-10-07  5:15           ` Greg Kroah-Hartman
@ 2020-10-08  3:34             ` Ricardo Neri
  2020-11-12  6:19             ` Brice Goglin
  1 sibling, 0 replies; 22+ messages in thread
From: Ricardo Neri @ 2020-10-08  3:34 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: x86, Borislav Petkov, Ingo Molnar, Thomas Gleixner,
	Rafael J. Wysocki, Tony Luck, Len Brown, Ravi V. Shankar,
	linux-kernel, Andi Kleen, Dave Hansen, Gautham R. Shenoy,
	Kan Liang, Srinivas Pandruvada

On Wed, Oct 07, 2020 at 07:15:46AM +0200, Greg Kroah-Hartman wrote:
> On Tue, Oct 06, 2020 at 08:14:47PM -0700, Ricardo Neri wrote:
> > On Tue, Oct 06, 2020 at 09:37:44AM +0200, Greg Kroah-Hartman wrote:
> > > On Mon, Oct 05, 2020 at 05:57:36PM -0700, Ricardo Neri wrote:
> > > > On Sat, Oct 03, 2020 at 10:53:45AM +0200, Greg Kroah-Hartman wrote:
> > > > > On Fri, Oct 02, 2020 at 06:17:42PM -0700, Ricardo Neri wrote:
> > > > > > Hybrid CPU topologies combine CPUs of different microarchitectures in the
> > > > > > same die. Thus, even though the instruction set is compatible among all
> > > > > > CPUs, there may still be differences in features (e.g., some CPUs may
> > > > > > have counters that other CPUs do not). There may be applications
> > > > > > interested in knowing the type of micro-architecture topology of the
> > > > > > system to make decisions about process affinity.
> > > > > > 
> > > > > > While the existing sysfs for capacity (/sys/devices/system/cpu/cpuX/
> > > > > > cpu_capacity) may be used to infer the types of micro-architecture of the
> > > > > > CPUs in the platform, it may not be entirely accurate. For instance, two
> > > > > > subsets of CPUs with different types of micro-architecture may have the
> > > > > > same capacity due to power or thermal constraints.
> > > > > > 
> > > > > > Create the new directory /sys/devices/system/cpu/types. Under such
> > > > > > directory, create individual subdirectories for each type of CPU micro-
> > > > > > architecture. Each subdirectory will have cpulist and cpumap files. This
> > > > > > makes it convenient for user space to read all the CPUs of the same type
> > > > > > at once without having to inspect each CPU individually.
> > > > > > 
> > > > > > Implement a generic interface using weak functions that architectures can
> > > > > > override to indicate a) support for CPU types, b) the CPU type number, and
> > > > > > c) a string to identify the CPU vendor and type.
> > > > > > 
> > > > > > For example, an x86 system with one Intel Core and four Intel Atom CPUs
> > > > > > would look like this (other architectures have the hooks to use whatever
> > > > > > directory naming convention below "types" that meets their needs):
> > > > > > 
> > > > > > user@host:~$ ls /sys/devices/system/cpu/types
> > > > > > intel_atom_0  intel_core_0
> > > > > > 
> > > > > > user@host:~$ ls /sys/devices/system/cpu/types/intel_atom_0
> > > > > > cpulist cpumap
> > > > > > 
> > > > > > user@host:~$ ls /sys/devices/system/cpu/types/intel_core_0
> > > > > > cpulist cpumap
> > > > > > 
> > > > > > user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpumap
> > > > > > 0f
> > > > > > 
> > > > > > user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpulist
> > > > > > 0-3
> > > > > > 
> > > > > > user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpumap
> > > > > > 10
> > > > > > 
> > > > > > user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpulist
> > > > > > 4
> > > > 
> > > > Thank you for the quick and detailed feedback, Greg!
> > > > 
> > > > > 
> > > > > The output of 'tree' sometimes makes it easier to see here, or:
> > > > > 	grep -R . *
> > > > > also works well.
> > > > 
> > > > Indeed, this would definitely make it more readable.
> > > > 
> > > > > 
> > > > > > On non-hybrid systems, the /sys/devices/system/cpu/types directory is not
> > > > > > created. Add a hook for this purpose.
> > > > > 
> > > > > Why should these not show up if the system is not "hybrid"?
> > > > 
> > > > My thinking was that on a non-hybrid system, it does not make sense to
> > > > create this interface, as all the CPUs will be of the same type.
> > > 
> > > Why not just have this as an attribute type in the existing cpuX directory?
> > > Why does this have to be a totally separate directory and userspace has to
> > > figure out to look in two different spots for the same cpu to determine
> > > what it is?
> > 
> > But if the type is located under cpuX, userspace would need to traverse
> > all the CPUs and create its own cpu masks. Under the types directory it
> > would only need to look once for each type of CPU, IMHO.
> 
> What does a "mask" do?  What does userspace care about this? 

It would only need to consume the cpu masks that sysfs provides, but it
would have to implement the extra step of looking at another directory.

> You would have to create it by traversing the directories you are creating anyway,
> so it's not much different, right?

True, that is a good point, userspace would still end up traversing
directories anyway.

> 
> > > That feels wasteful, it should be much simpler to use the existing
> > > object, right?
> > > 
> > > That way, you also show the "type" of all cpus, no matter if they are
> > > "hybrid" or not, again, making userspace deal with things in a much
> > > simpler manner.
> > 
> > Indeed, that would be simpler to implement, and perhaps a natural extension
> > of the existing interface. 
> > 
> > Lastly, legacy and non-hybrid parts will not have a type defined. Thus,
> > it may not make sense for them to expose a type in sysfs.
> 
> That's fine, not having a sysfs file if you don't know the type is fine.
> Or you can fix that up and show the type of those as well, why wouldn't
> you want to?

What information should be used for the fixup? On legacy x86 parts,
CPUID would not have the leaf that provides this information. Maybe an
"unknown" type?

> 
> > > > I thought of this as maybe intel_atom vs a potential amd_minizen. I
> > > > guess arch is sufficient, as there will never be an amd_atom.
> > > 
> > > Even if there is, that can be part of the cpu "type" that you have for
> > > your enumerated type, right?
> > 
> > Indeed. The interface I propose lets archs define their own string
> > names.
> 
> Arches _should_ define their own names, otherwise who would?

Yes, indeed.

Thanks and BR,
Ricardo


* Re: [PATCH 1/4] drivers core: Introduce CPU type sysfs interface
  2020-10-07  3:14         ` Ricardo Neri
@ 2020-10-07  5:15           ` Greg Kroah-Hartman
  2020-10-08  3:34             ` Ricardo Neri
  2020-11-12  6:19             ` Brice Goglin
  0 siblings, 2 replies; 22+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-07  5:15 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: x86, Borislav Petkov, Ingo Molnar, Thomas Gleixner,
	Rafael J. Wysocki, Tony Luck, Len Brown, Ravi V. Shankar,
	linux-kernel, Andi Kleen, Dave Hansen, Gautham R. Shenoy,
	Kan Liang, Srinivas Pandruvada

On Tue, Oct 06, 2020 at 08:14:47PM -0700, Ricardo Neri wrote:
> On Tue, Oct 06, 2020 at 09:37:44AM +0200, Greg Kroah-Hartman wrote:
> > On Mon, Oct 05, 2020 at 05:57:36PM -0700, Ricardo Neri wrote:
> > > On Sat, Oct 03, 2020 at 10:53:45AM +0200, Greg Kroah-Hartman wrote:
> > > > On Fri, Oct 02, 2020 at 06:17:42PM -0700, Ricardo Neri wrote:
> > > > > Hybrid CPU topologies combine CPUs of different microarchitectures in the
> > > > > same die. Thus, even though the instruction set is compatible among all
> > > > > CPUs, there may still be differences in features (e.g., some CPUs may
> > > > > have counters that other CPUs do not). There may be applications
> > > > > interested in knowing the type of micro-architecture topology of the
> > > > > system to make decisions about process affinity.
> > > > > 
> > > > > While the existing sysfs for capacity (/sys/devices/system/cpu/cpuX/
> > > > > cpu_capacity) may be used to infer the types of micro-architecture of the
> > > > > CPUs in the platform, it may not be entirely accurate. For instance, two
> > > > > subsets of CPUs with different types of micro-architecture may have the
> > > > > same capacity due to power or thermal constraints.
> > > > > 
> > > > > Create the new directory /sys/devices/system/cpu/types. Under such
> > > > > directory, create individual subdirectories for each type of CPU micro-
> > > > > architecture. Each subdirectory will have cpulist and cpumap files. This
> > > > > makes it convenient for user space to read all the CPUs of the same type
> > > > > at once without having to inspect each CPU individually.
> > > > > 
> > > > > Implement a generic interface using weak functions that architectures can
> > > > > override to indicate a) support for CPU types, b) the CPU type number, and
> > > > > c) a string to identify the CPU vendor and type.
> > > > > 
> > > > > For example, an x86 system with one Intel Core and four Intel Atom CPUs
> > > > > would look like this (other architectures have the hooks to use whatever
> > > > > directory naming convention below "types" that meets their needs):
> > > > > 
> > > > > user@host:~$ ls /sys/devices/system/cpu/types
> > > > > intel_atom_0  intel_core_0
> > > > > 
> > > > > user@host:~$ ls /sys/devices/system/cpu/types/intel_atom_0
> > > > > cpulist cpumap
> > > > > 
> > > > > user@host:~$ ls /sys/devices/system/cpu/types/intel_core_0
> > > > > cpulist cpumap
> > > > > 
> > > > > user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpumap
> > > > > 0f
> > > > > 
> > > > > user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpulist
> > > > > 0-3
> > > > > 
> > > > > user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpumap
> > > > > 10
> > > > > 
> > > > > user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpulist
> > > > > 4
> > > 
> > > Thank you for the quick and detailed feedback, Greg!
> > > 
> > > > 
> > > > The output of 'tree' sometimes makes it easier to see here, or:
> > > > 	grep -R . *
> > > > also works well.
> > > 
> > > Indeed, this would definitely make it more readable.
> > > 
> > > > 
> > > > > On non-hybrid systems, the /sys/devices/system/cpu/types directory is not
> > > > > created. Add a hook for this purpose.
> > > > 
> > > > Why should these not show up if the system is not "hybrid"?
> > > 
> > > My thinking was that on a non-hybrid system, it does not make sense to
> > > create this interface, as all the CPUs will be of the same type.
> > 
> > Why not just have this an attribute type in the existing cpuX directory?
> > Why do this have to be a totally separate directory and userspace has to
> > figure out to look in two different spots for the same cpu to determine
> > what it is?
> 
> But if the type is located under cpuX, userspace would need to traverse
> all the CPUs and create its own cpu masks. Under the types directory it
> would only need to look once for each type of CPU, IMHO.

What does a "mask" do?  What does userspace care about this?  You would
have to create it by traversing the directories you are creating anyway,
so it's not much different, right?

> > That feels wasteful, it should be much simpler to use the existing
> > object, right?
> > 
> > That way, you also show the "type" of all cpus, no matter if they are
> > "hybrid" or not, again, making userspace deal with things in a much
> > simpler manner.
> 
> Indeed, that would be simpler to implement, and perhaps a natural extension
> of the existing interface. 
> 
> Lastly, legacy and non-hybrid parts will not have a type defined. Thus,
> it may not make sense for them to expose a type in sysfs.

That's fine, not having a sysfs file if you don't know the type is fine.
Or you can fix that up and show the type of those as well, why wouldn't
you want to?

> > > I thought of this as maybe intel_atom vs a potential amd_minizen. I
> > > guess arch is sufficient, as there will never be an amd_atom.
> > 
> > Even if there is, that can be part of the cpu "type" that you have for
> > your enumerated type, right?
> 
> Indeed. The interface I propose lets archs define their own string
> names.

Arches _should_ define their own names, otherwise who would?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/4] drivers core: Introduce CPU type sysfs interface
  2020-10-06  7:37       ` Greg Kroah-Hartman
@ 2020-10-07  3:14         ` Ricardo Neri
  2020-10-07  5:15           ` Greg Kroah-Hartman
  0 siblings, 1 reply; 22+ messages in thread
From: Ricardo Neri @ 2020-10-07  3:14 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: x86, Borislav Petkov, Ingo Molnar, Thomas Gleixner,
	Rafael J. Wysocki, Tony Luck, Len Brown, Ravi V. Shankar,
	linux-kernel, Andi Kleen, Dave Hansen, Gautham R. Shenoy,
	Kan Liang, Srinivas Pandruvada

On Tue, Oct 06, 2020 at 09:37:44AM +0200, Greg Kroah-Hartman wrote:
> On Mon, Oct 05, 2020 at 05:57:36PM -0700, Ricardo Neri wrote:
> > On Sat, Oct 03, 2020 at 10:53:45AM +0200, Greg Kroah-Hartman wrote:
> > > On Fri, Oct 02, 2020 at 06:17:42PM -0700, Ricardo Neri wrote:
> > > > Hybrid CPU topologies combine CPUs of different microarchitectures in the
> > > > same die. Thus, even though the instruction set is compatible among all
> > > > CPUs, there may still be differences in features (e.g., some CPUs may
> > > > have counters that other CPUs do not). There may be applications
> > > > interested in knowing the type of micro-architecture topology of the
> > > > system to make decisions about process affinity.
> > > > 
> > > > While the existing sysfs for capacity (/sys/devices/system/cpu/cpuX/
> > > > cpu_capacity) may be used to infer the types of micro-architecture of the
> > > > CPUs in the platform, it may not be entirely accurate. For instance, two
> > > > subsets of CPUs with different types of micro-architecture may have the
> > > > same capacity due to power or thermal constraints.
> > > > 
> > > > Create the new directory /sys/devices/system/cpu/types. Under such
> > > > directory, create individual subdirectories for each type of CPU micro-
> > > > architecture. Each subdirectory will have cpulist and cpumap files. This
> > > > makes it convenient for user space to read all the CPUs of the same type
> > > > at once without having to inspect each CPU individually.
> > > > 
> > > > Implement a generic interface using weak functions that architectures can
> > > > override to indicate a) support for CPU types, b) the CPU type number, and
> > > > c) a string to identify the CPU vendor and type.
> > > > 
> > > > For example, an x86 system with one Intel Core and four Intel Atom CPUs
> > > > would look like this (other architectures have the hooks to use whatever
> > > > directory naming convention below "types" that meets their needs):
> > > > 
> > > > user@host:~$ ls /sys/devices/system/cpu/types
> > > > intel_atom_0  intel_core_0
> > > > 
> > > > user@host:~$ ls /sys/devices/system/cpu/types/intel_atom_0
> > > > cpulist cpumap
> > > > 
> > > > user@host:~$ ls /sys/devices/system/cpu/types/intel_core_0
> > > > cpulist cpumap
> > > > 
> > > > user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpumap
> > > > 0f
> > > > 
> > > > user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpulist
> > > > 0-3
> > > > 
> > > > user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpumap
> > > > 10
> > > > 
> > > > user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpulist
> > > > 4
> > 
> > Thank you for the quick and detailed feedback, Greg!
> > 
> > > 
> > > The output of 'tree' sometimes makes it easier to see here, or:
> > > 	grep -R . *
> > > also works well.
> > 
> > Indeed, this would definitely make it more readable.
> > 
> > > 
> > > > On non-hybrid systems, the /sys/devices/system/cpu/types directory is not
> > > > created. Add a hook for this purpose.
> > > 
> > > Why should these not show up if the system is not "hybrid"?
> > 
> > My thinking was that on a non-hybrid system, it does not make sense to
> > create this interface, as all the CPUs will be of the same type.
> 
> Why not just have this an attribute type in the existing cpuX directory?
> Why do this have to be a totally separate directory and userspace has to
> figure out to look in two different spots for the same cpu to determine
> what it is?

But if the type is located under cpuX, userspace would need to traverse
all the CPUs and create its own cpu masks. Under the types directory it
would only need to look once for each type of CPU, IMHO.
> 
> That feels wasteful, it should be much simpler to use the existing
> object, right?
> 
> That way, you also show the "type" of all cpus, no matter if they are
> "hybrid" or not, again, making userspace deal with things in a much
> simpler manner.

Indeed, that would be simpler to implement, and perhaps a natural extension
of the existing interface. 

Lastly, legacy and non-hybrid parts will not have a type defined. Thus,
it may not make sense for them to expose a type in sysfs.

> > > > +/**
> > > > + * arch_get_cpu_type() - Get the CPU type number
> > > > > + * @cpu:	Index of the CPU of which the type is needed
> > > > + *
> > > > + * Get the CPU type number of @cpu, a non-zero unsigned 32-bit number that
> > > > + * uniquely identifies a type of CPU micro-architecture. All CPUs of the same
> > > > + * type have the same type number. Type numbers are defined by each CPU
> > > > + * architecture.
> > > > + */
> > > > +u32 __weak arch_get_cpu_type(int cpu)
> > > 
> > > Can you just have this "cpu type" be an enumerated type so we have a
> > > clue as to what they really are?  Each arch can then define it, right?
> > 
> > Do you mean that each arch can have its own definition of the same
> > enumeration or that all archs share a common enumeration?
> 
> Each arch should have its own definition of this enumerated type.

Sure, understood.

> 
> > > > +/**
> > > > + * arch_get_cpu_type_name() - Get the CPU type name
> > > > + * @cpu_type:	Type of CPU micro-architecture.
> > > > + *
> > > > + * Returns a string name associated with the CPU micro-architecture type as
> > > > + * indicated in @cpu_type. The format shall be <vendor>_<cpu_type>. Returns
> > > > + * NULL if the CPU type is not known.
> > > > + */
> > > > +const char __weak *arch_get_cpu_type_name(u32 cpu_type)
> > > > +{
> > > > +	return NULL;
> > > > +}
> > > 
> > > Why is vendor part of this?  Shouldn't it just be arch?
> > > 
> > > I say this as "vendor" is kind of "interesting" when it comes to other
> > > arches...
> > 
> > I thought of this as maybe intel_atom vs a potential amd_minizen. I
> > guess arch is sufficient, as there will never be an amd_atom.
> 
> Even if there is, that can be part of the cpu "type" that you have for
> your enumerated type, right?

Indeed. The interface I propose lets archs define their own string
names.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/4] drivers core: Introduce CPU type sysfs interface
  2020-10-06  0:57     ` Ricardo Neri
@ 2020-10-06  7:37       ` Greg Kroah-Hartman
  2020-10-07  3:14         ` Ricardo Neri
  0 siblings, 1 reply; 22+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-06  7:37 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: x86, Borislav Petkov, Ingo Molnar, Thomas Gleixner,
	Rafael J. Wysocki, Tony Luck, Len Brown, Ravi V. Shankar,
	linux-kernel, Andi Kleen, Dave Hansen, Gautham R. Shenoy,
	Kan Liang, Srinivas Pandruvada

On Mon, Oct 05, 2020 at 05:57:36PM -0700, Ricardo Neri wrote:
> On Sat, Oct 03, 2020 at 10:53:45AM +0200, Greg Kroah-Hartman wrote:
> > On Fri, Oct 02, 2020 at 06:17:42PM -0700, Ricardo Neri wrote:
> > > Hybrid CPU topologies combine CPUs of different microarchitectures in the
> > > same die. Thus, even though the instruction set is compatible among all
> > > CPUs, there may still be differences in features (e.g., some CPUs may
> > > have counters that other CPUs do not). There may be applications
> > > interested in knowing the type of micro-architecture topology of the
> > > system to make decisions about process affinity.
> > > 
> > > While the existing sysfs for capacity (/sys/devices/system/cpu/cpuX/
> > > cpu_capacity) may be used to infer the types of micro-architecture of the
> > > CPUs in the platform, it may not be entirely accurate. For instance, two
> > > subsets of CPUs with different types of micro-architecture may have the
> > > same capacity due to power or thermal constraints.
> > > 
> > > Create the new directory /sys/devices/system/cpu/types. Under such
> > > directory, create individual subdirectories for each type of CPU micro-
> > > architecture. Each subdirectory will have cpulist and cpumap files. This
> > > makes it convenient for user space to read all the CPUs of the same type
> > > at once without having to inspect each CPU individually.
> > > 
> > > Implement a generic interface using weak functions that architectures can
> > > override to indicate a) support for CPU types, b) the CPU type number, and
> > > c) a string to identify the CPU vendor and type.
> > > 
> > > For example, an x86 system with one Intel Core and four Intel Atom CPUs
> > > would look like this (other architectures have the hooks to use whatever
> > > directory naming convention below "types" that meets their needs):
> > > 
> > > user@host:~$ ls /sys/devices/system/cpu/types
> > > intel_atom_0  intel_core_0
> > > 
> > > user@host:~$ ls /sys/devices/system/cpu/types/intel_atom_0
> > > cpulist cpumap
> > > 
> > > user@host:~$ ls /sys/devices/system/cpu/types/intel_core_0
> > > cpulist cpumap
> > > 
> > > user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpumap
> > > 0f
> > > 
> > > user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpulist
> > > 0-3
> > > 
> > > user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpumap
> > > 10
> > > 
> > > user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpulist
> > > 4
> 
> Thank you for the quick and detailed feedback, Greg!
> 
> > 
> > The output of 'tree' sometimes makes it easier to see here, or:
> > 	grep -R . *
> > also works well.
> 
> Indeed, this would definitely make it more readable.
> 
> > 
> > > On non-hybrid systems, the /sys/devices/system/cpu/types directory is not
> > > created. Add a hook for this purpose.
> > 
> > Why should these not show up if the system is not "hybrid"?
> 
> My thinking was that on a non-hybrid system, it does not make sense to
> create this interface, as all the CPUs will be of the same type.

Why not just have this an attribute type in the existing cpuX directory?
Why do this have to be a totally separate directory and userspace has to
figure out to look in two different spots for the same cpu to determine
what it is?

That feels wasteful, it should be much simpler to use the existing
object, right?

That way, you also show the "type" of all cpus, no matter if they are
"hybrid" or not, again, making userspace deal with things in a much
simpler manner.

> > > +/**
> > > + * arch_get_cpu_type() - Get the CPU type number
> > > + * @cpu:	Index of the CPU of which the type is needed
> > > + *
> > > + * Get the CPU type number of @cpu, a non-zero unsigned 32-bit number that
> > > + * uniquely identifies a type of CPU micro-architecture. All CPUs of the same
> > > + * type have the same type number. Type numbers are defined by each CPU
> > > + * architecture.
> > > + */
> > > +u32 __weak arch_get_cpu_type(int cpu)
> > 
> > Can you just have this "cpu type" be an enumerated type so we have a
> > clue as to what they really are?  Each arch can then define it, right?
> 
> Do you mean that each arch can have its own definition of the same
> enumeration or that all archs share a common enumeration?

Each arch should have its own definition of this enumerated type.

> > > +/**
> > > + * arch_get_cpu_type_name() - Get the CPU type name
> > > + * @cpu_type:	Type of CPU micro-architecture.
> > > + *
> > > + * Returns a string name associated with the CPU micro-architecture type as
> > > + * indicated in @cpu_type. The format shall be <vendor>_<cpu_type>. Returns
> > > + * NULL if the CPU type is not known.
> > > + */
> > > +const char __weak *arch_get_cpu_type_name(u32 cpu_type)
> > > +{
> > > +	return NULL;
> > > +}
> > 
> > Why is vendor part of this?  Shouldn't it just be arch?
> > 
> > I say this as "vendor" is kind of "interesting" when it comes to other
> > arches...
> 
> I thought of this as maybe intel_atom vs a potential amd_minizen. I
> guess arch is sufficient, as there will never be an amd_atom.

Even if there is, that can be part of the cpu "type" that you have for
your enumerated type, right?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/4] drivers core: Introduce CPU type sysfs interface
  2020-10-03  3:27   ` Randy Dunlap
@ 2020-10-06  1:15     ` Ricardo Neri
  2020-10-10  3:14       ` Randy Dunlap
  0 siblings, 1 reply; 22+ messages in thread
From: Ricardo Neri @ 2020-10-06  1:15 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Greg Kroah-Hartman, x86, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Rafael J. Wysocki, Tony Luck, Len Brown,
	Ravi V. Shankar, linux-kernel, Andi Kleen, Dave Hansen,
	Gautham R. Shenoy, Kan Liang, Srinivas Pandruvada

On Fri, Oct 02, 2020 at 08:27:41PM -0700, Randy Dunlap wrote:
> On 10/2/20 6:17 PM, Ricardo Neri wrote:
> > +/**
> > + * arch_get_cpu_type() - Get the CPU type number
> > + * @cpu:	Index of the CPU of which the type is needed
> > + *
> > + * Get the CPU type number of @cpu, a non-zero unsigned 32-bit number that

Thank you for your feedback, Randy!

> Are you sure that @cpu is non-zero?

This is the intent. Maybe it can be reworked to return -EINVAL instead?
I guess it is plausible, though less likely, that an arch defines a type as
0xffffffea.

> 
> > + * uniquely identifies a type of CPU micro-architecture. All CPUs of the same
> > + * type have the same type number. Type numbers are defined by each CPU
> > + * architecture.
> > + */
> > +u32 __weak arch_get_cpu_type(int cpu)
> > +{
> > +	return 0;
> > +}
> 
> arch_get_cpu_type() in patch 4/4 allows @cpu to be 0.

It should not return 0 if the system does not have
X86_FEATURE_HYBRID_CPU. The currently existing CPU types are all
non-zero as per the Intel SDM. Am I missing anything?

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/4] drivers core: Introduce CPU type sysfs interface
  2020-10-03 11:05     ` Greg Kroah-Hartman
@ 2020-10-06  1:08       ` Ricardo Neri
  0 siblings, 0 replies; 22+ messages in thread
From: Ricardo Neri @ 2020-10-06  1:08 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: x86, Borislav Petkov, Ingo Molnar, Thomas Gleixner,
	Rafael J. Wysocki, Tony Luck, Len Brown, Ravi V. Shankar,
	linux-kernel, Andi Kleen, Dave Hansen, Gautham R. Shenoy,
	Kan Liang, Srinivas Pandruvada

On Sat, Oct 03, 2020 at 01:05:48PM +0200, Greg Kroah-Hartman wrote:
> On Sat, Oct 03, 2020 at 10:53:45AM +0200, Greg Kroah-Hartman wrote:
> > On Fri, Oct 02, 2020 at 06:17:42PM -0700, Ricardo Neri wrote:
> > > +/**
> > > + * arch_get_cpu_type_name() - Get the CPU type name
> > > + * @cpu_type:	Type of CPU micro-architecture.
> > > + *
> > > + * Returns a string name associated with the CPU micro-architecture type as
> > > + * indicated in @cpu_type. The format shall be <vendor>_<cpu_type>. Returns
> > > + * NULL if the CPU type is not known.
> > > + */
> > > +const char __weak *arch_get_cpu_type_name(u32 cpu_type)
> > > +{
> > > +	return NULL;
> > > +}
> > 
> > Why is vendor part of this?  Shouldn't it just be arch?
> > 
> > I say this as "vendor" is kind of "interesting" when it comes to other
> > arches...
> > 
> > Speaking of other arches, we all know that other arches have this
> > feature as well, have you worked with any other groups to verify that
> > this interface will also work with them?
> 
> Here's one set of patches for ARM64 for much the same type of cpu
> design:
> 	https://android-review.googlesource.com/c/kernel/common/+/1437098/3
> Yes, it's not been posted to any kernel lists, but this is public so you
> need to work with the ARM developers to come up with an interface that
> works for everyone please.

Thanks for the pointer, Greg! I will study this proposal and work with
the ARM engineers.

BR,
Ricardo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/4] drivers core: Introduce CPU type sysfs interface
  2020-10-03  8:53   ` Greg Kroah-Hartman
  2020-10-03 11:05     ` Greg Kroah-Hartman
@ 2020-10-06  0:57     ` Ricardo Neri
  2020-10-06  7:37       ` Greg Kroah-Hartman
  1 sibling, 1 reply; 22+ messages in thread
From: Ricardo Neri @ 2020-10-06  0:57 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: x86, Borislav Petkov, Ingo Molnar, Thomas Gleixner,
	Rafael J. Wysocki, Tony Luck, Len Brown, Ravi V. Shankar,
	linux-kernel, Andi Kleen, Dave Hansen, Gautham R. Shenoy,
	Kan Liang, Srinivas Pandruvada

On Sat, Oct 03, 2020 at 10:53:45AM +0200, Greg Kroah-Hartman wrote:
> On Fri, Oct 02, 2020 at 06:17:42PM -0700, Ricardo Neri wrote:
> > Hybrid CPU topologies combine CPUs of different microarchitectures in the
> > same die. Thus, even though the instruction set is compatible among all
> > CPUs, there may still be differences in features (e.g., some CPUs may
> > have counters that other CPUs do not). There may be applications
> > interested in knowing the type of micro-architecture topology of the
> > system to make decisions about process affinity.
> > 
> > While the existing sysfs for capacity (/sys/devices/system/cpu/cpuX/
> > cpu_capacity) may be used to infer the types of micro-architecture of the
> > CPUs in the platform, it may not be entirely accurate. For instance, two
> > subsets of CPUs with different types of micro-architecture may have the
> > same capacity due to power or thermal constraints.
> > 
> > Create the new directory /sys/devices/system/cpu/types. Under such
> > directory, create individual subdirectories for each type of CPU micro-
> > architecture. Each subdirectory will have cpulist and cpumap files. This
> > makes it convenient for user space to read all the CPUs of the same type
> > at once without having to inspect each CPU individually.
> > 
> > Implement a generic interface using weak functions that architectures can
> > override to indicate a) support for CPU types, b) the CPU type number, and
> > c) a string to identify the CPU vendor and type.
> > 
> > For example, an x86 system with one Intel Core and four Intel Atom CPUs
> > would look like this (other architectures have the hooks to use whatever
> > directory naming convention below "types" that meets their needs):
> > 
> > user@host:~$ ls /sys/devices/system/cpu/types
> > intel_atom_0  intel_core_0
> > 
> > user@host:~$ ls /sys/devices/system/cpu/types/intel_atom_0
> > cpulist cpumap
> > 
> > user@host:~$ ls /sys/devices/system/cpu/types/intel_core_0
> > cpulist cpumap
> > 
> > user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpumap
> > 0f
> > 
> > user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpulist
> > 0-3
> > 
> > user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpumap
> > 10
> > 
> > user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpulist
> > 4

Thank you for the quick and detailed feedback, Greg!

> 
> The output of 'tree' sometimes makes it easier to see here, or:
> 	grep -R . *
> also works well.

Indeed, this would definitely make it more readable.

> 
> > On non-hybrid systems, the /sys/devices/system/cpu/types directory is not
> > created. Add a hook for this purpose.
> 
> Why should these not show up if the system is not "hybrid"?

My thinking was that on a non-hybrid system, it does not make sense to
create this interface, as all the CPUs will be of the same type.

> 
> > 
> > Cc: Andi Kleen <ak@linux.intel.com>
> > Cc: Dave Hansen <dave.hansen@intel.com>
> > Cc: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
> > Cc: Kan Liang <kan.liang@linux.intel.com>
> > Cc: Len Brown <len.brown@intel.com>
> > Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
> > Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
> > Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> > Cc: linux-kernel@vger.kernel.org
> > Reviewed-by: Tony Luck <tony.luck@intel.com>
> > Suggested-by: Len Brown <len.brown@intel.com> # Necessity of the interface
> > Suggested-by: Dave Hansen <dave.hansen@intel.com> # Details of the interface
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > ---
> >  .../ABI/testing/sysfs-devices-system-cpu      |  13 ++
> >  drivers/base/cpu.c                            | 214 ++++++++++++++++++
> >  include/linux/cpu.h                           |  12 +
> >  include/linux/cpuhotplug.h                    |   1 +
> >  4 files changed, 240 insertions(+)
> > 
> > diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
> > index b555df825447..bcb3d7195102 100644
> > --- a/Documentation/ABI/testing/sysfs-devices-system-cpu
> > +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
> > @@ -614,3 +614,16 @@ Description:	SPURR ticks for cpuX when it was idle.
> >  
> >  		This sysfs interface exposes the number of SPURR ticks
> >  		for cpuX when it was idle.
> > +
> > +What:		/sys/devices/system/cpu/types
> > +		/sys/devices/system/cpu/types/<vendor>_<cpu_type>/cpumap
> > +		/sys/devices/system/cpu/types/<vendor>_<cpu_type>/cpulist
> 
> Please split these up into different entries, because:

Sure, I can split these into two entries.

> 
> > +Date:		Oct 2020
> > +Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
> > +Description:	Types of CPUs in hybrid systems
> > +
> > +		If a platform has CPUs with more than one type of micro-
> > +		architecture, there is one directory for each type of CPU.
> > +		Inside each directory, the files cpumap and cpulist contain
> > +		a mask and a list representing CPUs of the same type of micro-
> > +		architecture. These files only contain CPUs currently online.
> 
> You don't describe what mask and list actualy represent.  I have no clue
> what they mean, so odds are, others will not either.

Sure, I will explain better what mask and list represent.

> 
> > diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
> > index d2136ab9b14a..7e98b11b15a1 100644
> > --- a/drivers/base/cpu.c
> > +++ b/drivers/base/cpu.c
> > @@ -607,11 +607,225 @@ static void __init cpu_register_vulnerabilities(void)
> >  static inline void cpu_register_vulnerabilities(void) { }
> >  #endif
> >  
> > +static ssize_t cpulist_show(struct device *device,
> > +			    struct device_attribute *attr,
> > +			    char *buf)
> > +{
> > +	struct cpumask *mask = dev_get_drvdata(device);
> > +
> > +	if (!mask)
> > +		return -EINVAL;
> 
> How can mask be NULL?

It cannot. I will remove this check.

> 
> > +
> > +	return cpumap_print_to_pagebuf(true, buf, mask);
> > +}
> > +
> > +static ssize_t cpumap_show(struct device *device,
> > +			   struct device_attribute *attr,
> > +			   char *buf)
> > +{
> > +	struct cpumask *mask = dev_get_drvdata(device);
> > +
> > +	if (!mask)
> > +		return -EINVAL;
> 
> Again, how can that be true?

It cannot, I will remove this check.

> 
> > +
> > +	return cpumap_print_to_pagebuf(false, buf, mask);
> > +}
> > +
> > +static DEVICE_ATTR_RO(cpumap);
> > +static DEVICE_ATTR_RO(cpulist);
> > +
> > +static struct attribute *cpu_type_attrs[] = {
> > +	&dev_attr_cpumap.attr,
> > +	&dev_attr_cpulist.attr,
> > +	NULL
> > +};
> > +
> > +static const struct attribute_group cpu_type_attr_group = {
> > +	.attrs = cpu_type_attrs,
> > +};
> > +
> > +static const struct attribute_group *cpu_type_attr_groups[] = {
> > +	&cpu_type_attr_group,
> > +	NULL,
> > +};
> 
> ATTRIBUTE_GROUPS()?

I will use this macro.

> 
> > +
> > +/* Root device for the cpu_type sysfs entries */
> > +static struct device *cpu_type_device;
> > +/*
> 
> No extra blank line?

I will add a blank line here.
> 
> > + * An array of cpu_type devices. One per each type of supported CPU micro-
> > + * architecture.
> > + */
> > +static struct device *cpu_type_devices[CPUTYPES_MAX_NR];
> > +
> > +/**
> > + * arch_get_cpu_type() - Get the CPU type number
> > + * @cpu:	Index of the CPU of which the type is needed
> > + *
> > + * Get the CPU type number of @cpu, a non-zero unsigned 32-bit number that
> > + * uniquely identifies a type of CPU micro-architecture. All CPUs of the same
> > + * type have the same type number. Type numbers are defined by each CPU
> > + * architecture.
> > + */
> > +u32 __weak arch_get_cpu_type(int cpu)
> 
> Can you just have this "cpu type" be an enumerated type so we have a
> clue as to what they really are?  Each arch can then define it, right?

Do you mean that each arch can have its own definition of the same
enumeration or that all archs share a common enumeration?
> 
> > +{
> > +	return 0;
> > +}
> > +
> > +/**
> > + * arch_has_cpu_type() - Check if CPU types are supported
> > + *
> > + * Returns true if the running platform supports CPU types. This is true if the
> > + * platform has CPUs of more than one type of micro-architecture. Otherwise,
> > + * returns false.
> > + */
> > +bool __weak arch_has_cpu_type(void)
> > +{
> > +	return false;
> > +}
> > +
> > +/**
> > + * arch_get_cpu_type_name() - Get the CPU type name
> > + * @cpu_type:	Type of CPU micro-architecture.
> > + *
> > + * Returns a string name associated with the CPU micro-architecture type as
> > + * indicated in @cpu_type. The format shall be <vendor>_<cpu_type>. Returns
> > + * NULL if the CPU type is not known.
> > + */
> > +const char __weak *arch_get_cpu_type_name(u32 cpu_type)
> > +{
> > +	return NULL;
> > +}
> 
> Why is vendor part of this?  Shouldn't it just be arch?
> 
> I say this as "vendor" is kind of "interesting" when it comes to other
> arches...

I thought of this as maybe intel_atom vs a potential amd_minizen. I
guess arch is sufficient, as there will never be an amd_atom.

> 
> Speaking of other arches, we all know that other arches have this
> feature as well, have you worked with any other groups to verify that
> this interface will also work with them?

I was not aware of the effort that you shared in a subsequent email. I
will contact the ARM engineers and discuss with them a potential
interface.

> 
> > +
> > +static int cpu_type_create_device(int cpu, int idx)
> > +{
> > +	u32 cpu_type = arch_get_cpu_type(cpu);
> > +	struct cpumask *mask;
> > +	const char *name;
> > +	char buf[64];
> > +
> > +	mask = kzalloc(cpumask_size(), GFP_KERNEL);
> > +	if (!mask)
> > +		return -ENOMEM;
> > +
> > +	name = arch_get_cpu_type_name(cpu_type);
> > +	if (!name) {
> > +		snprintf(buf, sizeof(buf), "unknown_%u", cpu_type);
> 
> If there is no name, why are you still making one up?
> 
> Shouldn't you just not create it at all?

I think all CPUs in the system must be enumerated. It may happen that an
old kernel runs on a new processor and does not know about a new type in
such processor. In this case, it will simply say that there is a CPU type
it does not know about.
> 
> > +		name = buf;
> > +	}
> > +
> > +	cpu_type_devices[idx] = cpu_device_create(cpu_type_device, mask,
> > +						  cpu_type_attr_groups, name);
> > +	if (IS_ERR(cpu_type_devices[idx]))
> > +		goto free_cpumask;
> > +
> > +	cpumask_set_cpu(cpu, mask);
> > +	return 0;
> > +
> > +free_cpumask:
> > +	kfree(mask);
> > +	return -ENOMEM;
> > +}
> > +
> > +static int cpu_type_sysfs_online(unsigned int cpu)
> > +{
> > +	u32 cpu_type = arch_get_cpu_type(cpu);
> > +	struct cpumask *mask;
> > +	int i;
> > +
> > +	for (i = 0; i < ARRAY_SIZE(cpu_type_devices); i++) {
> > +		u32 this_cpu_type;
> > +
> > +		/*
> > +		 * The first devices in the array are used first. Thus, create
> > +		 * a new device as well as sysfs directory and for the type of
> > +		 * @cpu.
> > +		 */
> > +		if (!cpu_type_devices[i])
> > +			return cpu_type_create_device(cpu, i);
> > +
> > +		mask = dev_get_drvdata(cpu_type_devices[i]);
> > +		if (!mask || !cpumask_weight(mask)) {
> > +			/*
> > +			 * We should not be here. Be paranoid about
> > +			 * NULL pointers.
> > +			 */
> > +			dev_err(cpu_type_devices[i], "CPU mask is invalid");
> > +			return -EINVAL;
> > +		}
> > +
> > +		this_cpu_type = arch_get_cpu_type(cpumask_first(mask));
> > +		if (cpu_type == this_cpu_type) {
> > +			cpumask_set_cpu(cpu, mask);
> > +			return 0;
> > +		}
> > +	}
> > +
> > +	return -ENODEV;
> > +}
> > +
> > +static int cpu_type_sysfs_offline(unsigned int cpu)
> > +{
> > +	u32 cpu_type = arch_get_cpu_type(cpu);
> > +	struct cpumask *mask;
> > +	int i;
> > +
> > +	for (i = 0; i < ARRAY_SIZE(cpu_type_devices); i++) {
> > +		u32 this_cpu_type;
> > +
> > +		if (!cpu_type_devices[i])
> > +			continue;
> > +
> > +		mask = dev_get_drvdata(cpu_type_devices[i]);
> > +		if (!mask || !cpumask_weight(mask)) {
> > +			/*
> > +			 * We should not be here. Be paranoid about
> > +			 * NULL pointers.
> > +			 */
> > +			dev_err(cpu_type_devices[i], "CPU mask is invalid");
> > +			continue;
> > +		}
> > +
> > +		this_cpu_type = arch_get_cpu_type(cpumask_first(mask));
> > +		if (cpu_type == this_cpu_type) {
> > +			cpumask_clear_cpu(cpu, mask);
> > +			return 0;
> > +		}
> > +	}
> > +
> > +	/*
> > +	 * If we are here, no matching cpu_type was found. This CPU was not
> > +	 * accounted for at hotplug online.
> > +	 */
> > +	pr_warn("Unexpected CPU offline!\n");
> > +
> > +	return -ENODEV;
> > +}
> > +
> > +static void __init cpu_type_sysfs_register(void)
> > +{
> > +	struct device *dev = cpu_subsys.dev_root;
> > +	int ret;
> > +
> > +	if (!arch_has_cpu_type())
> > +		return;
> > +
> > +	cpu_type_device = cpu_device_create(dev, NULL, NULL, "types");
> 
> This feels like an abuse of cpu_device_create(), doesn't it?

Would it be better if I use a struct device?

> 
> > +	if (IS_ERR(cpu_type_device))
> > +		return;
> > +
> > +	ret = cpuhp_setup_state(CPUHP_AP_BASE_CPUTYPE_ONLINE,
> > +				"base/cpu_type_sysfs:online",
> > +				cpu_type_sysfs_online, cpu_type_sysfs_offline);
> > +	if (ret)
> > +		dev_warn(dev, "failed to set up CPU type sysfs\n");
> > +}
> > +
> >  void __init cpu_dev_init(void)
> >  {
> >  	if (subsys_system_register(&cpu_subsys, cpu_root_attr_groups))
> >  		panic("Failed to register CPU subsystem");
> >  
> >  	cpu_dev_register_generic();
> > +	cpu_type_sysfs_register();
> >  	cpu_register_vulnerabilities();
> >  }
> > diff --git a/include/linux/cpu.h b/include/linux/cpu.h
> > index 8aa84c052fdf..76fba70c7801 100644
> > --- a/include/linux/cpu.h
> > +++ b/include/linux/cpu.h
> > @@ -228,4 +228,16 @@ static inline int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) { return 0;
> >  extern bool cpu_mitigations_off(void);
> >  extern bool cpu_mitigations_auto_nosmt(void);
> >  
> > +#ifndef CPUTYPES_MAX_NR
> > +/*
> > + * The maximum number of types of cpus. Architecture-specific implementations
> > + * shall override this value.
> > + */
> > +#define CPUTYPES_MAX_NR 0
> > +#endif
> > +
> > +extern u32 arch_get_cpu_type(int cpu);
> > +extern bool arch_has_cpu_type(void);
> > +extern const char *arch_get_cpu_type_name(u32 cpu_type);
> > +
> >  #endif /* _LINUX_CPU_H_ */
> > diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
> > index 6f524bbf71a2..fd90ccb6652c 100644
> > --- a/include/linux/cpuhotplug.h
> > +++ b/include/linux/cpuhotplug.h
> > @@ -186,6 +186,7 @@ enum cpuhp_state {
> >  	CPUHP_AP_WATCHDOG_ONLINE,
> >  	CPUHP_AP_WORKQUEUE_ONLINE,
> >  	CPUHP_AP_RCUTREE_ONLINE,
> > +	CPUHP_AP_BASE_CPUTYPE_ONLINE,
> >  	CPUHP_AP_BASE_CACHEINFO_ONLINE,
> 
> I'm all for messing with enumerated type values, but why put it in this
> part of the list and not right before the next one?

The reason I put CPUHP_AP_BASE_CPUTYPE_ONLINE before
CPUHP_AP_BASE_CACHEINFO_ONLINE is that both deal with enumerating and
exposing the topology of the system. I could also invert the order.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/4] drivers core: Introduce CPU type sysfs interface
  2020-10-03  8:53   ` Greg Kroah-Hartman
@ 2020-10-03 11:05     ` Greg Kroah-Hartman
  2020-10-06  1:08       ` Ricardo Neri
  2020-10-06  0:57     ` Ricardo Neri
  1 sibling, 1 reply; 22+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-03 11:05 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: x86, Borislav Petkov, Ingo Molnar, Thomas Gleixner,
	Rafael J. Wysocki, Tony Luck, Len Brown, Ravi V. Shankar,
	linux-kernel, Andi Kleen, Dave Hansen, Gautham R. Shenoy,
	Kan Liang, Srinivas Pandruvada

On Sat, Oct 03, 2020 at 10:53:45AM +0200, Greg Kroah-Hartman wrote:
> On Fri, Oct 02, 2020 at 06:17:42PM -0700, Ricardo Neri wrote:
> > +/**
> > + * arch_get_cpu_type_name() - Get the CPU type name
> > + * @cpu_type:	Type of CPU micro-architecture.
> > + *
> > + * Returns a string name associated with the CPU micro-architecture type as
> > + * indicated in @cpu_type. The format shall be <vendor>_<cpu_type>. Returns
> > + * NULL if the CPU type is not known.
> > + */
> > +const char __weak *arch_get_cpu_type_name(u32 cpu_type)
> > +{
> > +	return NULL;
> > +}
> 
> Why is vendor part of this?  Shouldn't it just be arch?
> 
> I say this as "vendor" is kind of "interesting" when it comes to other
> arches...
> 
> Speaking of other arches, we all know that other arches have this
> feature as well, have you worked with any other groups to verify that
> this interface will also work with them?

Here's one set of patches for ARM64 for much the same type of cpu
design:
	https://android-review.googlesource.com/c/kernel/common/+/1437098/3
Yes, it has not been posted to any kernel lists, but it is public, so you
need to work with the ARM developers to come up with an interface that
works for everyone, please.

thanks,

greg k-h


* Re: [PATCH 1/4] drivers core: Introduce CPU type sysfs interface
  2020-10-03  1:17 ` [PATCH 1/4] " Ricardo Neri
  2020-10-03  3:27   ` Randy Dunlap
@ 2020-10-03  8:53   ` Greg Kroah-Hartman
  2020-10-03 11:05     ` Greg Kroah-Hartman
  2020-10-06  0:57     ` Ricardo Neri
  1 sibling, 2 replies; 22+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-03  8:53 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: x86, Borislav Petkov, Ingo Molnar, Thomas Gleixner,
	Rafael J. Wysocki, Tony Luck, Len Brown, Ravi V. Shankar,
	linux-kernel, Andi Kleen, Dave Hansen, Gautham R. Shenoy,
	Kan Liang, Srinivas Pandruvada

On Fri, Oct 02, 2020 at 06:17:42PM -0700, Ricardo Neri wrote:
> Hybrid CPU topologies combine CPUs of different microarchitectures in the
> same die. Thus, even though the instruction set is compatible among all
> CPUs, there may still be differences in features (e.g., some CPUs may
> > have counters that other CPUs do not). There may be applications
> interested in knowing the type of micro-architecture topology of the
> system to make decisions about process affinity.
> 
> While the existing sysfs for capacity (/sys/devices/system/cpu/cpuX/
> cpu_capacity) may be used to infer the types of micro-architecture of the
> CPUs in the platform, it may not be entirely accurate. For instance, two
> subsets of CPUs with different types of micro-architecture may have the
> same capacity due to power or thermal constraints.
> 
> Create the new directory /sys/devices/system/cpu/types. Under such
> directory, create individual subdirectories for each type of CPU micro-
> architecture. Each subdirectory will have cpulist and cpumap files. This
> makes it convenient for user space to read all the CPUs of the same type
> at once without having to inspect each CPU individually.
> 
> Implement a generic interface using weak functions that architectures can
> override to indicate a) support for CPU types, b) the CPU type number, and
> c) a string to identify the CPU vendor and type.
> 
> For example, an x86 system with one Intel Core and four Intel Atom CPUs
> would look like this (other architectures have the hooks to use whatever
> directory naming convention below "types" that meets their needs):
> 
> user@host:~$: ls /sys/devices/system/cpu/types
> intel_atom_0  intel_core_0
> 
> user@host:~$ ls /sys/devices/system/cpu/types/intel_atom_0
> cpulist cpumap
> 
> user@host:~$ ls /sys/devices/system/cpu/types/intel_core_0
> cpulist cpumap
> 
> user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpumap
> 0f
> 
> user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpulist
> 0-3
> 
> > user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpumap
> 10
> 
> user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpulist
> 4

The output of 'tree' sometimes makes it easier to see here, or:
	grep -R . *
also works well.

> On non-hybrid systems, the /sys/devices/system/cpu/types directory is not
> created. Add a hook for this purpose.

Why should these not show up if the system is not "hybrid"?

> 
> Cc: Andi Kleen <ak@linux.intel.com>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
> Cc: Kan Liang <kan.liang@linux.intel.com>
> Cc: Len Brown <len.brown@intel.com>
> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
> Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> Cc: linux-kernel@vger.kernel.org
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> Suggested-by: Len Brown <len.brown@intel.com> # Necessity of the interface
> Suggested-by: Dave Hansen <dave.hansen@intel.com> # Details of the interface
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> ---
>  .../ABI/testing/sysfs-devices-system-cpu      |  13 ++
>  drivers/base/cpu.c                            | 214 ++++++++++++++++++
>  include/linux/cpu.h                           |  12 +
>  include/linux/cpuhotplug.h                    |   1 +
>  4 files changed, 240 insertions(+)
> 
> diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
> index b555df825447..bcb3d7195102 100644
> --- a/Documentation/ABI/testing/sysfs-devices-system-cpu
> +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
> @@ -614,3 +614,16 @@ Description:	SPURR ticks for cpuX when it was idle.
>  
>  		This sysfs interface exposes the number of SPURR ticks
>  		for cpuX when it was idle.
> +
> +What:		/sys/devices/system/cpu/types
> +		/sys/devices/system/cpu/types/<vendor>_<cpu_type>/cpumap
> +		/sys/devices/system/cpu/types/<vendor>_<cpu_type>/cpulist

Please split these up into different entries, because:

> +Date:		Oct 2020
> +Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
> +Description:	Types of CPUs in hybrid systems
> +
> +		If a platform has CPUs with more than one type of micro-
> +		architecture, there is one directory for each type of CPU.
> +		Inside each directory, the files cpumap and cpulist contain
> +		a mask and a list representing CPUs of the same type of micro-
> +		architecture. These files only contain CPUs currently online.

You don't describe what mask and list actually represent.  I have no clue
what they mean, so odds are, others will not either.
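A split along the lines requested might look like this (the wording is
purely a suggestion, not from the patch):

```
What:		/sys/devices/system/cpu/types/<vendor>_<cpu_type>/cpumap
Date:		Oct 2020
Description:	Hexadecimal bitmask of the online CPUs whose
		micro-architecture is <cpu_type>; bit N is set if
		cpuN is of this type and currently online.

What:		/sys/devices/system/cpu/types/<vendor>_<cpu_type>/cpulist
Date:		Oct 2020
Description:	Human-readable range list (e.g. "0-3,5") of the online
		CPUs whose micro-architecture is <cpu_type>.
```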

> diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
> index d2136ab9b14a..7e98b11b15a1 100644
> --- a/drivers/base/cpu.c
> +++ b/drivers/base/cpu.c
> @@ -607,11 +607,225 @@ static void __init cpu_register_vulnerabilities(void)
>  static inline void cpu_register_vulnerabilities(void) { }
>  #endif
>  
> +static ssize_t cpulist_show(struct device *device,
> +			    struct device_attribute *attr,
> +			    char *buf)
> +{
> +	struct cpumask *mask = dev_get_drvdata(device);
> +
> +	if (!mask)
> +		return -EINVAL;

How can mask be NULL?

> +
> +	return cpumap_print_to_pagebuf(true, buf, mask);
> +}
> +
> +static ssize_t cpumap_show(struct device *device,
> +			   struct device_attribute *attr,
> +			   char *buf)
> +{
> +	struct cpumask *mask = dev_get_drvdata(device);
> +
> +	if (!mask)
> +		return -EINVAL;

Again, how can that be true?

> +
> +	return cpumap_print_to_pagebuf(false, buf, mask);
> +}
> +
> +static DEVICE_ATTR_RO(cpumap);
> +static DEVICE_ATTR_RO(cpulist);
> +
> +static struct attribute *cpu_type_attrs[] = {
> +	&dev_attr_cpumap.attr,
> +	&dev_attr_cpulist.attr,
> +	NULL
> +};
> +
> +static const struct attribute_group cpu_type_attr_group = {
> +	.attrs = cpu_type_attrs,
> +};
> +
> +static const struct attribute_group *cpu_type_attr_groups[] = {
> +	&cpu_type_attr_group,
> +	NULL,
> +};

ATTRIBUTE_GROUPS()?

> +
> +/* Root device for the cpu_type sysfs entries */
> +static struct device *cpu_type_device;
> +/*

No extra blank line?

> + * An array of cpu_type devices. One per each type of supported CPU micro-
> + * architecture.
> + */
> +static struct device *cpu_type_devices[CPUTYPES_MAX_NR];
> +
> +/**
> + * arch_get_cpu_type() - Get the CPU type number
> > + * @cpu:	Index of the CPU whose type number is needed
> + *
> + * Get the CPU type number of @cpu, a non-zero unsigned 32-bit number that
> + * uniquely identifies a type of CPU micro-architecture. All CPUs of the same
> + * type have the same type number. Type numbers are defined by each CPU
> + * architecture.
> + */
> +u32 __weak arch_get_cpu_type(int cpu)

Can you just have this "cpu type" be an enumerated type so we have a
clue as to what they really are?  Each arch can then define it, right?
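A purely illustrative sketch of what such a per-arch enumerated type
could look like — every name below is made up, taken neither from the
patch nor from any arch code:

```c
/*
 * Hypothetical per-arch enumeration of CPU micro-architecture types.
 * Each architecture would define its own set of values, keeping 0
 * reserved to mean "no distinct type" on homogeneous systems.
 */
enum cpu_uarch {
	CPU_UARCH_NONE = 0,	/* homogeneous system, no distinct types */
	CPU_UARCH_EFFICIENT,	/* e.g. small/efficiency cores */
	CPU_UARCH_PERFORMANT,	/* e.g. big/performance cores */
};
```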

> +{
> +	return 0;
> +}
> +
> +/**
> + * arch_has_cpu_type() - Check if CPU types are supported
> + *
> + * Returns true if the running platform supports CPU types. This is true if the
> + * platform has CPUs of more than one type of micro-architecture. Otherwise,
> + * returns false.
> + */
> +bool __weak arch_has_cpu_type(void)
> +{
> +	return false;
> +}
> +
> +/**
> + * arch_get_cpu_type_name() - Get the CPU type name
> + * @cpu_type:	Type of CPU micro-architecture.
> + *
> + * Returns a string name associated with the CPU micro-architecture type as
> + * indicated in @cpu_type. The format shall be <vendor>_<cpu_type>. Returns
> + * NULL if the CPU type is not known.
> + */
> +const char __weak *arch_get_cpu_type_name(u32 cpu_type)
> +{
> +	return NULL;
> +}

Why is vendor part of this?  Shouldn't it just be arch?

I say this as "vendor" is kind of "interesting" when it comes to other
arches...

Speaking of other arches, we all know that other arches have this
feature as well, have you worked with any other groups to verify that
this interface will also work with them?

> +
> +static int cpu_type_create_device(int cpu, int idx)
> +{
> +	u32 cpu_type = arch_get_cpu_type(cpu);
> +	struct cpumask *mask;
> +	const char *name;
> +	char buf[64];
> +
> +	mask = kzalloc(cpumask_size(), GFP_KERNEL);
> +	if (!mask)
> +		return -ENOMEM;
> +
> +	name = arch_get_cpu_type_name(cpu_type);
> +	if (!name) {
> +		snprintf(buf, sizeof(buf), "unknown_%u", cpu_type);

If there is no name, why are you still making one up?

Shouldn't you just not create it at all?

> +		name = buf;
> +	}
> +
> +	cpu_type_devices[idx] = cpu_device_create(cpu_type_device, mask,
> +						  cpu_type_attr_groups, name);
> +	if (IS_ERR(cpu_type_devices[idx]))
> +		goto free_cpumask;
> +
> +	cpumask_set_cpu(cpu, mask);
> +	return 0;
> +
> +free_cpumask:
> +	kfree(mask);
> +	return -ENOMEM;
> +}
> +
> +static int cpu_type_sysfs_online(unsigned int cpu)
> +{
> +	u32 cpu_type = arch_get_cpu_type(cpu);
> +	struct cpumask *mask;
> +	int i;
> +
> +	for (i = 0; i < ARRAY_SIZE(cpu_type_devices); i++) {
> +		u32 this_cpu_type;
> +
> +		/*
> +		 * The first devices in the array are used first. Thus, create
> +		 * a new device and sysfs directory for the type of @cpu.
> +		 */
> +		if (!cpu_type_devices[i])
> +			return cpu_type_create_device(cpu, i);
> +
> +		mask = dev_get_drvdata(cpu_type_devices[i]);
> +		if (!mask || !cpumask_weight(mask)) {
> +			/*
> +			 * We should not be here. Be paranoid about
> +			 * NULL pointers.
> +			 */
> +			dev_err(cpu_type_devices[i], "CPU mask is invalid");
> +			return -EINVAL;
> +		}
> +
> +		this_cpu_type = arch_get_cpu_type(cpumask_first(mask));
> +		if (cpu_type == this_cpu_type) {
> +			cpumask_set_cpu(cpu, mask);
> +			return 0;
> +		}
> +	}
> +
> +	return -ENODEV;
> +}
> +
> +static int cpu_type_sysfs_offline(unsigned int cpu)
> +{
> +	u32 cpu_type = arch_get_cpu_type(cpu);
> +	struct cpumask *mask;
> +	int i;
> +
> +	for (i = 0; i < ARRAY_SIZE(cpu_type_devices); i++) {
> +		u32 this_cpu_type;
> +
> +		if (!cpu_type_devices[i])
> +			continue;
> +
> +		mask = dev_get_drvdata(cpu_type_devices[i]);
> +		if (!mask || !cpumask_weight(mask)) {
> +			/*
> +			 * We should not be here. Be paranoid about
> +			 * NULL pointers.
> +			 */
> +			dev_err(cpu_type_devices[i], "CPU mask is invalid");
> +			continue;
> +		}
> +
> +		this_cpu_type = arch_get_cpu_type(cpumask_first(mask));
> +		if (cpu_type == this_cpu_type) {
> +			cpumask_clear_cpu(cpu, mask);
> +			return 0;
> +		}
> +	}
> +
> +	/*
> +	 * If we are here, no matching cpu_type was found. This CPU was not
> +	 * accounted for at hotplug online.
> +	 */
> +	pr_warn("Unexpected CPU offline!\n");
> +
> +	return -ENODEV;
> +}
> +
> +static void __init cpu_type_sysfs_register(void)
> +{
> +	struct device *dev = cpu_subsys.dev_root;
> +	int ret;
> +
> +	if (!arch_has_cpu_type())
> +		return;
> +
> +	cpu_type_device = cpu_device_create(dev, NULL, NULL, "types");

This feels like an abuse of cpu_device_create(), doesn't it?

> +	if (IS_ERR(cpu_type_device))
> +		return;
> +
> +	ret = cpuhp_setup_state(CPUHP_AP_BASE_CPUTYPE_ONLINE,
> +				"base/cpu_type_sysfs:online",
> +				cpu_type_sysfs_online, cpu_type_sysfs_offline);
> +	if (ret)
> +		dev_warn(dev, "failed to set up CPU type sysfs\n");
> +}
> +
>  void __init cpu_dev_init(void)
>  {
>  	if (subsys_system_register(&cpu_subsys, cpu_root_attr_groups))
>  		panic("Failed to register CPU subsystem");
>  
>  	cpu_dev_register_generic();
> +	cpu_type_sysfs_register();
>  	cpu_register_vulnerabilities();
>  }
> diff --git a/include/linux/cpu.h b/include/linux/cpu.h
> index 8aa84c052fdf..76fba70c7801 100644
> --- a/include/linux/cpu.h
> +++ b/include/linux/cpu.h
> @@ -228,4 +228,16 @@ static inline int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) { return 0;
>  extern bool cpu_mitigations_off(void);
>  extern bool cpu_mitigations_auto_nosmt(void);
>  
> +#ifndef CPUTYPES_MAX_NR
> +/*
> + * The maximum number of types of cpus. Architecture-specific implementations
> + * shall override this value.
> + */
> +#define CPUTYPES_MAX_NR 0
> +#endif
> +
> +extern u32 arch_get_cpu_type(int cpu);
> +extern bool arch_has_cpu_type(void);
> +extern const char *arch_get_cpu_type_name(u32 cpu_type);
> +
>  #endif /* _LINUX_CPU_H_ */
> diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
> index 6f524bbf71a2..fd90ccb6652c 100644
> --- a/include/linux/cpuhotplug.h
> +++ b/include/linux/cpuhotplug.h
> @@ -186,6 +186,7 @@ enum cpuhp_state {
>  	CPUHP_AP_WATCHDOG_ONLINE,
>  	CPUHP_AP_WORKQUEUE_ONLINE,
>  	CPUHP_AP_RCUTREE_ONLINE,
> +	CPUHP_AP_BASE_CPUTYPE_ONLINE,
>  	CPUHP_AP_BASE_CACHEINFO_ONLINE,

I'm all for messing with enumerated type values, but why put it in this
part of the list and not right before the next one?

thanks,

greg k-h


* Re: [PATCH 1/4] drivers core: Introduce CPU type sysfs interface
  2020-10-03  1:17 ` [PATCH 1/4] " Ricardo Neri
@ 2020-10-03  3:27   ` Randy Dunlap
  2020-10-06  1:15     ` Ricardo Neri
  2020-10-03  8:53   ` Greg Kroah-Hartman
  1 sibling, 1 reply; 22+ messages in thread
From: Randy Dunlap @ 2020-10-03  3:27 UTC (permalink / raw)
  To: Ricardo Neri, Greg Kroah-Hartman, x86, Borislav Petkov,
	Ingo Molnar, Thomas Gleixner, Rafael J. Wysocki
  Cc: Tony Luck, Len Brown, Ravi V. Shankar, linux-kernel, Andi Kleen,
	Dave Hansen, Gautham R. Shenoy, Kan Liang, Srinivas Pandruvada

On 10/2/20 6:17 PM, Ricardo Neri wrote:
> +/**
> + * arch_get_cpu_type() - Get the CPU type number
> + * @cpu:	Index of the CPU whose type number is needed
> + *
> + * Get the CPU type number of @cpu, a non-zero unsigned 32-bit number that

Are you sure that @cpu is non-zero?


> + * uniquely identifies a type of CPU micro-architecture. All CPUs of the same
> + * type have the same type number. Type numbers are defined by each CPU
> + * architecture.
> + */
> +u32 __weak arch_get_cpu_type(int cpu)
> +{
> +	return 0;
> +}

arch_get_cpu_type() in patch 4/4 allows @cpu to be 0.


-- 
~Randy



* [PATCH 1/4] drivers core: Introduce CPU type sysfs interface
  2020-10-03  1:17 [PATCH 0/4] " Ricardo Neri
@ 2020-10-03  1:17 ` Ricardo Neri
  2020-10-03  3:27   ` Randy Dunlap
  2020-10-03  8:53   ` Greg Kroah-Hartman
  0 siblings, 2 replies; 22+ messages in thread
From: Ricardo Neri @ 2020-10-03  1:17 UTC (permalink / raw)
  To: Greg Kroah-Hartman, x86, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Rafael J. Wysocki
  Cc: Tony Luck, Len Brown, Ravi V. Shankar, linux-kernel,
	Ricardo Neri, Andi Kleen, Dave Hansen, Gautham R. Shenoy,
	Kan Liang, Srinivas Pandruvada

Hybrid CPU topologies combine CPUs of different microarchitectures in the
same die. Thus, even though the instruction set is compatible among all
CPUs, there may still be differences in features (e.g., some CPUs may
have counters that other CPUs do not). There may be applications
interested in knowing the type of micro-architecture topology of the
system to make decisions about process affinity.

While the existing sysfs for capacity (/sys/devices/system/cpu/cpuX/
cpu_capacity) may be used to infer the types of micro-architecture of the
CPUs in the platform, it may not be entirely accurate. For instance, two
subsets of CPUs with different types of micro-architecture may have the
same capacity due to power or thermal constraints.

Create the new directory /sys/devices/system/cpu/types. Under such
directory, create individual subdirectories for each type of CPU micro-
architecture. Each subdirectory will have cpulist and cpumap files. This
makes it convenient for user space to read all the CPUs of the same type
at once without having to inspect each CPU individually.

Implement a generic interface using weak functions that architectures can
override to indicate a) support for CPU types, b) the CPU type number, and
c) a string to identify the CPU vendor and type.

For example, an x86 system with one Intel Core and four Intel Atom CPUs
would look like this (other architectures have the hooks to use whatever
directory naming convention below "types" that meets their needs):

user@host:~$: ls /sys/devices/system/cpu/types
intel_atom_0  intel_core_0

user@host:~$ ls /sys/devices/system/cpu/types/intel_atom_0
cpulist cpumap

user@host:~$ ls /sys/devices/system/cpu/types/intel_core_0
cpulist cpumap

user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpumap
0f

user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpulist
0-3

user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpumap
10

user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpulist
4

On non-hybrid systems, the /sys/devices/system/cpu/types directory is not
created. Add a hook for this purpose.

Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Len Brown <len.brown@intel.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Tony Luck <tony.luck@intel.com>
Suggested-by: Len Brown <len.brown@intel.com> # Necessity of the interface
Suggested-by: Dave Hansen <dave.hansen@intel.com> # Details of the interface
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 .../ABI/testing/sysfs-devices-system-cpu      |  13 ++
 drivers/base/cpu.c                            | 214 ++++++++++++++++++
 include/linux/cpu.h                           |  12 +
 include/linux/cpuhotplug.h                    |   1 +
 4 files changed, 240 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index b555df825447..bcb3d7195102 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -614,3 +614,16 @@ Description:	SPURR ticks for cpuX when it was idle.
 
 		This sysfs interface exposes the number of SPURR ticks
 		for cpuX when it was idle.
+
+What:		/sys/devices/system/cpu/types
+		/sys/devices/system/cpu/types/<vendor>_<cpu_type>/cpumap
+		/sys/devices/system/cpu/types/<vendor>_<cpu_type>/cpulist
+Date:		Oct 2020
+Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
+Description:	Types of CPUs in hybrid systems
+
+		If a platform has CPUs with more than one type of micro-
+		architecture, there is one directory for each type of CPU.
+		Inside each directory, the files cpumap and cpulist contain
+		a mask and a list representing CPUs of the same type of micro-
+		architecture. These files only contain CPUs currently online.
diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index d2136ab9b14a..7e98b11b15a1 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -607,11 +607,225 @@ static void __init cpu_register_vulnerabilities(void)
 static inline void cpu_register_vulnerabilities(void) { }
 #endif
 
+static ssize_t cpulist_show(struct device *device,
+			    struct device_attribute *attr,
+			    char *buf)
+{
+	struct cpumask *mask = dev_get_drvdata(device);
+
+	if (!mask)
+		return -EINVAL;
+
+	return cpumap_print_to_pagebuf(true, buf, mask);
+}
+
+static ssize_t cpumap_show(struct device *device,
+			   struct device_attribute *attr,
+			   char *buf)
+{
+	struct cpumask *mask = dev_get_drvdata(device);
+
+	if (!mask)
+		return -EINVAL;
+
+	return cpumap_print_to_pagebuf(false, buf, mask);
+}
+
+static DEVICE_ATTR_RO(cpumap);
+static DEVICE_ATTR_RO(cpulist);
+
+static struct attribute *cpu_type_attrs[] = {
+	&dev_attr_cpumap.attr,
+	&dev_attr_cpulist.attr,
+	NULL
+};
+
+static const struct attribute_group cpu_type_attr_group = {
+	.attrs = cpu_type_attrs,
+};
+
+static const struct attribute_group *cpu_type_attr_groups[] = {
+	&cpu_type_attr_group,
+	NULL,
+};
+
+/* Root device for the cpu_type sysfs entries */
+static struct device *cpu_type_device;
+/*
+ * An array of cpu_type devices. One per each type of supported CPU micro-
+ * architecture.
+ */
+static struct device *cpu_type_devices[CPUTYPES_MAX_NR];
+
+/**
+ * arch_get_cpu_type() - Get the CPU type number
+ * @cpu:	Index of the CPU whose type number is needed
+ *
+ * Get the CPU type number of @cpu, a non-zero unsigned 32-bit number that
+ * uniquely identifies a type of CPU micro-architecture. All CPUs of the same
+ * type have the same type number. Type numbers are defined by each CPU
+ * architecture.
+ */
+u32 __weak arch_get_cpu_type(int cpu)
+{
+	return 0;
+}
+
+/**
+ * arch_has_cpu_type() - Check if CPU types are supported
+ *
+ * Returns true if the running platform supports CPU types. This is true if the
+ * platform has CPUs of more than one type of micro-architecture. Otherwise,
+ * returns false.
+ */
+bool __weak arch_has_cpu_type(void)
+{
+	return false;
+}
+
+/**
+ * arch_get_cpu_type_name() - Get the CPU type name
+ * @cpu_type:	Type of CPU micro-architecture.
+ *
+ * Returns a string name associated with the CPU micro-architecture type as
+ * indicated in @cpu_type. The format shall be <vendor>_<cpu_type>. Returns
+ * NULL if the CPU type is not known.
+ */
+const char __weak *arch_get_cpu_type_name(u32 cpu_type)
+{
+	return NULL;
+}
+
+static int cpu_type_create_device(int cpu, int idx)
+{
+	u32 cpu_type = arch_get_cpu_type(cpu);
+	struct cpumask *mask;
+	const char *name;
+	char buf[64];
+
+	mask = kzalloc(cpumask_size(), GFP_KERNEL);
+	if (!mask)
+		return -ENOMEM;
+
+	name = arch_get_cpu_type_name(cpu_type);
+	if (!name) {
+		snprintf(buf, sizeof(buf), "unknown_%u", cpu_type);
+		name = buf;
+	}
+
+	cpu_type_devices[idx] = cpu_device_create(cpu_type_device, mask,
+						  cpu_type_attr_groups, name);
+	if (IS_ERR(cpu_type_devices[idx]))
+		goto free_cpumask;
+
+	cpumask_set_cpu(cpu, mask);
+	return 0;
+
+free_cpumask:
+	kfree(mask);
+	return -ENOMEM;
+}
+
+static int cpu_type_sysfs_online(unsigned int cpu)
+{
+	u32 cpu_type = arch_get_cpu_type(cpu);
+	struct cpumask *mask;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(cpu_type_devices); i++) {
+		u32 this_cpu_type;
+
+		/*
+		 * The first devices in the array are used first. Thus, create
+		 * a new device and sysfs directory for the type of @cpu.
+		 */
+		if (!cpu_type_devices[i])
+			return cpu_type_create_device(cpu, i);
+
+		mask = dev_get_drvdata(cpu_type_devices[i]);
+		if (!mask || !cpumask_weight(mask)) {
+			/*
+			 * We should not be here. Be paranoid about
+			 * NULL pointers.
+			 */
+			dev_err(cpu_type_devices[i], "CPU mask is invalid");
+			return -EINVAL;
+		}
+
+		this_cpu_type = arch_get_cpu_type(cpumask_first(mask));
+		if (cpu_type == this_cpu_type) {
+			cpumask_set_cpu(cpu, mask);
+			return 0;
+		}
+	}
+
+	return -ENODEV;
+}
+
+static int cpu_type_sysfs_offline(unsigned int cpu)
+{
+	u32 cpu_type = arch_get_cpu_type(cpu);
+	struct cpumask *mask;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(cpu_type_devices); i++) {
+		u32 this_cpu_type;
+
+		if (!cpu_type_devices[i])
+			continue;
+
+		mask = dev_get_drvdata(cpu_type_devices[i]);
+		if (!mask || !cpumask_weight(mask)) {
+			/*
+			 * We should not be here. Be paranoid about
+			 * NULL pointers.
+			 */
+			dev_err(cpu_type_devices[i], "CPU mask is invalid");
+			continue;
+		}
+
+		this_cpu_type = arch_get_cpu_type(cpumask_first(mask));
+		if (cpu_type == this_cpu_type) {
+			cpumask_clear_cpu(cpu, mask);
+			return 0;
+		}
+	}
+
+	/*
+	 * If we are here, no matching cpu_type was found. This CPU was not
+	 * accounted for at hotplug online.
+	 */
+	pr_warn("Unexpected CPU offline!\n");
+
+	return -ENODEV;
+}
+
+static void __init cpu_type_sysfs_register(void)
+{
+	struct device *dev = cpu_subsys.dev_root;
+	int ret;
+
+	if (!arch_has_cpu_type())
+		return;
+
+	cpu_type_device = cpu_device_create(dev, NULL, NULL, "types");
+	if (IS_ERR(cpu_type_device))
+		return;
+
+	ret = cpuhp_setup_state(CPUHP_AP_BASE_CPUTYPE_ONLINE,
+				"base/cpu_type_sysfs:online",
+				cpu_type_sysfs_online, cpu_type_sysfs_offline);
+	if (ret)
+		dev_warn(dev, "failed to set up CPU type sysfs\n");
+}
+
 void __init cpu_dev_init(void)
 {
 	if (subsys_system_register(&cpu_subsys, cpu_root_attr_groups))
 		panic("Failed to register CPU subsystem");
 
 	cpu_dev_register_generic();
+	cpu_type_sysfs_register();
 	cpu_register_vulnerabilities();
 }
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 8aa84c052fdf..76fba70c7801 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -228,4 +228,16 @@ static inline int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) { return 0;
 extern bool cpu_mitigations_off(void);
 extern bool cpu_mitigations_auto_nosmt(void);
 
+#ifndef CPUTYPES_MAX_NR
+/*
+ * The maximum number of types of cpus. Architecture-specific implementations
+ * shall override this value.
+ */
+#define CPUTYPES_MAX_NR 0
+#endif
+
+extern u32 arch_get_cpu_type(int cpu);
+extern bool arch_has_cpu_type(void);
+extern const char *arch_get_cpu_type_name(u32 cpu_type);
+
 #endif /* _LINUX_CPU_H_ */
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 6f524bbf71a2..fd90ccb6652c 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -186,6 +186,7 @@ enum cpuhp_state {
 	CPUHP_AP_WATCHDOG_ONLINE,
 	CPUHP_AP_WORKQUEUE_ONLINE,
 	CPUHP_AP_RCUTREE_ONLINE,
+	CPUHP_AP_BASE_CPUTYPE_ONLINE,
 	CPUHP_AP_BASE_CACHEINFO_ONLINE,
 	CPUHP_AP_ONLINE_DYN,
 	CPUHP_AP_ONLINE_DYN_END		= CPUHP_AP_ONLINE_DYN + 30,
-- 
2.17.1



end of thread, other threads:[~2020-11-19  8:26 UTC | newest]

Thread overview: 22+ messages
2020-10-03  8:07 [PATCH 1/4] drivers core: Introduce CPU type sysfs interface kernel test robot
  -- strict thread matches above, loose matches on Subject: below --
2020-10-03  1:17 [PATCH 0/4] " Ricardo Neri
2020-10-03  1:17 ` [PATCH 1/4] " Ricardo Neri
2020-10-03  3:27   ` Randy Dunlap
2020-10-06  1:15     ` Ricardo Neri
2020-10-10  3:14       ` Randy Dunlap
2020-10-03  8:53   ` Greg Kroah-Hartman
2020-10-03 11:05     ` Greg Kroah-Hartman
2020-10-06  1:08       ` Ricardo Neri
2020-10-06  0:57     ` Ricardo Neri
2020-10-06  7:37       ` Greg Kroah-Hartman
2020-10-07  3:14         ` Ricardo Neri
2020-10-07  5:15           ` Greg Kroah-Hartman
2020-10-08  3:34             ` Ricardo Neri
2020-11-12  6:19             ` Brice Goglin
2020-11-12  6:42               ` Greg Kroah-Hartman
2020-11-12  9:10                 ` Brice Goglin
2020-11-12 10:49                   ` Greg Kroah-Hartman
2020-11-17 15:55                     ` Brice Goglin
2020-11-18 10:45                       ` Brice Goglin
2020-11-18 10:57                         ` Greg Kroah-Hartman
     [not found]                     ` <38f290d2-4c3a-d1b0-f3cc-a0897ea10abd@gmail.com>
2020-11-12 11:34                       ` Greg Kroah-Hartman
2020-11-19  8:25                       ` Fox Chen
