From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: Brice Goglin <brice.goglin@gmail.com>
Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>,
	x86@kernel.org, Borislav Petkov <bp@suse.de>,
	Ingo Molnar <mingo@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	Tony Luck <tony.luck@intel.com>, Len Brown <len.brown@intel.com>,
	"Ravi V. Shankar" <ravi.v.shankar@intel.com>,
	linux-kernel@vger.kernel.org, Andi Kleen <ak@linux.intel.com>,
	Dave Hansen <dave.hansen@intel.com>,
	"Gautham R. Shenoy" <ego@linux.vnet.ibm.com>,
	Kan Liang <kan.liang@linux.intel.com>,
	Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Subject: Re: [PATCH 1/4] drivers core: Introduce CPU type sysfs interface
Date: Thu, 12 Nov 2020 12:34:20 +0100
Message-ID: <X60dvJoT4fURcnsF@kroah.com>
In-Reply-To: <38f290d2-4c3a-d1b0-f3cc-a0897ea10abd@gmail.com>

On Thu, Nov 12, 2020 at 12:21:43PM +0100, Brice Goglin wrote:
> 
> > On 12/11/2020 at 11:49, Greg Kroah-Hartman wrote:
> > On Thu, Nov 12, 2020 at 10:10:57AM +0100, Brice Goglin wrote:
> >> On 12/11/2020 at 07:42, Greg Kroah-Hartman wrote:
> >>> On Thu, Nov 12, 2020 at 07:19:48AM +0100, Brice Goglin wrote:
> >>>> On 07/10/2020 at 07:15, Greg Kroah-Hartman wrote:
> >>>>> On Tue, Oct 06, 2020 at 08:14:47PM -0700, Ricardo Neri wrote:
> >>>>>> On Tue, Oct 06, 2020 at 09:37:44AM +0200, Greg Kroah-Hartman wrote:
> >>>>>>> On Mon, Oct 05, 2020 at 05:57:36PM -0700, Ricardo Neri wrote:
> >>>>>>>> On Sat, Oct 03, 2020 at 10:53:45AM +0200, Greg Kroah-Hartman wrote:
> >>>>>>>>> On Fri, Oct 02, 2020 at 06:17:42PM -0700, Ricardo Neri wrote:
> >>>>>>>>>> Hybrid CPU topologies combine CPUs of different microarchitectures in the
> >>>>>>>>>> same die. Thus, even though the instruction set is compatible among all
> >>>>>>>>>> CPUs, there may still be differences in features (e.g., some CPUs may
> >>>>>>>>>> have counters that other CPUs do not). There may be applications
> >>>>>>>>>> interested in knowing the micro-architecture topology of the system
> >>>>>>>>>> to make decisions about process affinity.
> >>>>>>>>>>
> >>>>>>>>>> While the existing sysfs for capacity (/sys/devices/system/cpu/cpuX/
> >>>>>>>>>> cpu_capacity) may be used to infer the types of micro-architecture of the
> >>>>>>>>>> CPUs in the platform, it may not be entirely accurate. For instance, two
> >>>>>>>>>> subsets of CPUs with different types of micro-architecture may have the
> >>>>>>>>>> same capacity due to power or thermal constraints.
> >>>>>>>>>>
> >>>>>>>>>> Create the new directory /sys/devices/system/cpu/types. Under this
> >>>>>>>>>> directory, create individual subdirectories for each type of CPU micro-
> >>>>>>>>>> architecture. Each subdirectory will have cpulist and cpumap files. This
> >>>>>>>>>> makes it convenient for user space to read all the CPUs of the same type
> >>>>>>>>>> at once without having to inspect each CPU individually.
> >>>>>>>>>>
> >>>>>>>>>> Implement a generic interface using weak functions that architectures can
> >>>>>>>>>> override to indicate a) support for CPU types, b) the CPU type number, and
> >>>>>>>>>> c) a string to identify the CPU vendor and type.
> >>>>>>>>>>
> >>>>>>>>>> For example, an x86 system with one Intel Core and four Intel Atom CPUs
> >>>>>>>>>> would look like this (other architectures have the hooks to use whatever
> >>>>>>>>>> directory naming convention below "types" that meets their needs):
> >>>>>>>>>>
> >>>>>>>>>> user@host:~$ ls /sys/devices/system/cpu/types
> >>>>>>>>>> intel_atom_0  intel_core_0
> >>>>>>>>>>
> >>>>>>>>>> user@host:~$ ls /sys/devices/system/cpu/types/intel_atom_0
> >>>>>>>>>> cpulist cpumap
> >>>>>>>>>>
> >>>>>>>>>> user@host:~$ ls /sys/devices/system/cpu/types/intel_core_0
> >>>>>>>>>> cpulist cpumap
> >>>>>>>>>>
> >>>>>>>>>> user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpumap
> >>>>>>>>>> 0f
> >>>>>>>>>>
> >>>>>>>>>> user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpulist
> >>>>>>>>>> 0-3
> >>>>>>>>>>
> >>>>>>>>>> user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpumap
> >>>>>>>>>> 10
> >>>>>>>>>>
> >>>>>>>>>> user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpulist
> >>>>>>>>>> 4
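
For reference, a minimal sketch of the weak-function interface described
above (the function names here are illustrative guesses, not necessarily
the ones used in the actual patch):

	#include <linux/types.h>

	/* Weak defaults; architectures override these to expose CPU types. */
	bool __weak arch_cpu_type_supported(void)
	{
		return false;	/* default: no "types" directory is created */
	}

	u32 __weak arch_get_cpu_type(int cpu)
	{
		return 0;	/* type number of this CPU */
	}

	const char * __weak arch_get_cpu_type_name(u32 type)
	{
		return NULL;	/* e.g. "intel_atom", "intel_core" on x86 */
	}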
> >>>>>>>> Thank you for the quick and detailed review, Greg!
> >>>>>>>>
> >>>>>>>>> The output of 'tree' sometimes makes it easier to see here, or:
> >>>>>>>>> 	grep -R . *
> >>>>>>>>> also works well.
> >>>>>>>> Indeed, this would definitely make it more readable.
> >>>>>>>>
> >>>>>>>>>> On non-hybrid systems, the /sys/devices/system/cpu/types directory is not
> >>>>>>>>>> created. Add a hook for this purpose.
> >>>>>>>>> Why should these not show up if the system is not "hybrid"?
> >>>>>>>> My thinking was that on a non-hybrid system, it does not make sense to
> >>>>>>>> create this interface, as all the CPUs will be of the same type.
> >>>>>>> Why not just make this an attribute in the existing cpuX directory?
> >>>>>>> Why does this have to be a totally separate directory, forcing userspace
> >>>>>>> to figure out that it must look in two different spots for the same CPU
> >>>>>>> to determine what it is?
> >>>>>> But if the type is located under cpuX, userspace would need to traverse
> >>>>>> all the CPUs and create its own cpu masks. Under the types directory it
> >>>>>> would only need to look once for each type of CPU, IMHO.
> >>>>> What does a "mask" do?  Why would userspace care about this?  You would
> >>>>> have to create it by traversing the directories you are creating anyway,
> >>>>> so it's not much different, right?
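
To make the trade-off concrete, a userspace sketch of the two approaches
being debated (the per-cpuX "type" attribute is hypothetical, and error
handling is trimmed for brevity):

	#include <stdio.h>

	int main(void)
	{
		char path[128], buf[256];
		FILE *f;
		int cpu;

		/* (a) per-cpuX attribute: one open/read/close per CPU */
		for (cpu = 0; cpu < 5; cpu++) {
			snprintf(path, sizeof(path),
				 "/sys/devices/system/cpu/cpu%d/type", cpu);
			f = fopen(path, "r");	/* hypothetical attribute */
			if (f && fgets(buf, sizeof(buf), f))
				printf("cpu%d: %s", cpu, buf);
			if (f)
				fclose(f);
		}

		/* (b) aggregate file: a single read per CPU type */
		f = fopen("/sys/devices/system/cpu/types/intel_atom_0/cpulist",
			  "r");
		if (f && fgets(buf, sizeof(buf), f))
			printf("intel_atom_0: %s", buf);	/* "0-3" */
		if (f)
			fclose(f);
		return 0;
	}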
> >>>> Hello
> >>>>
> >>>> Sorry for the late reply. As the first userspace consumer of this
> >>>> interface [1], I can confirm that reading a single file to get the mask
> >>>> would be better, at least for performance reasons. On large platforms, we
> >>>> already have to read thousands of sysfs files to get CPU topology and
> >>>> cache information; I'd be happy not to read one more file per CPU.
> >>>>
> >>>> Reading these sysfs files is slow, and it does not scale well when
> >>>> multiple processes read them in parallel.
> >>> Really?  Where is the slowdown?  Would something like readfile() work
> >>> better for you for that?
> >>> 	https://lore.kernel.org/linux-api/20200704140250.423345-1-gregkh@linuxfoundation.org/
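
For context, readfile() collapses the open/read/close triple into a single
syscall.  A usage sketch following the RFC linked above (the series is not
merged, so there is no allocated syscall number; __NR_readfile below is a
placeholder):

	#include <fcntl.h>
	#include <stdio.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	#ifndef __NR_readfile
	#define __NR_readfile 440	/* placeholder, not an allocated number */
	#endif

	int main(void)
	{
		char buf[64];

		/* one syscall instead of open() + read() + close() */
		ssize_t n = syscall(__NR_readfile, AT_FDCWD,
				    "/sys/devices/system/cpu/cpu15/topology/die_id",
				    buf, sizeof(buf), 0);
		if (n > 0)
			printf("die_id: %.*s", (int)n, buf);
		return 0;
	}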
> >>
> >> I guess readfile would improve the sequential case by reducing the number
> >> of syscalls, but it would not improve the parallel case, since syscalls
> >> themselves shouldn't have any parallelism issues?
> > syscalls should not have parallelism issues at all.
> >
> >> We've been watching the status of readfile() since it was posted on LKML
> >> 6 months ago, but we were actually wondering if it would end up being
> >> included at some point.
> > It needs a solid reason to be merged.  My "test" benchmarks are fun to
> > run, but I have yet to find a real need for it anywhere as the
> > open/read/close syscall overhead seems to be lost in the noise on any
> > real application workload that I can find.
> >
> > If you have a real need, and it reduces overhead and cpu usage, I'm more
> > than willing to update the patchset and resubmit it.
> 
> 
> Good, I'll give it a try.
> 
> 
> >>> How do multiple processes slow anything down?  There shouldn't be any
> >>> shared locks here.
> >>
> >> When I benchmarked this in 2016, reading a single (small) sysfs file was
> >> 41x slower when running 64 processes simultaneously on a 64-core Knights
> >> Landing than reading from a single process. On an SGI Altix UV with 12x
> >> 8-core CPUs, reading from one process per CPU (12 total) was 60x slower
> >> (which could mean NUMA affinity matters), and reading from one process
> >> per core (96 total) was 491x slower.
> >>
> >> I will try to find some time to dig further on recent kernels with perf
> >> and readfile (both machines were running RHEL7).
> > 2016 was a long time ago in kernel-land, please retest on a kernel.org
> > release, not a RHEL monstrosity.
> 
> 
> Quick test on 5.8.14 from Debian (fairly close to mainline) on a server
> with 2x20 cores.
> 
> I am measuring the time to do open+read+close of
> /sys/devices/system/cpu/cpu15/topology/die_id 1000 times
> 
> With a single process, it takes 2ms (2us per open+read+close, looks OK).
> 
> With one process per core (with careful binding, etc), it jumps from 2ms
> to 190ms (without much variation).
> 
> It looks like locks in kernfs_iop_permission and kernfs_dop_revalidate
> are causing the issue.
> 
> I am attaching the perf report callgraph output below.
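
A rough reconstruction of that micro-benchmark, single-process case only
(the per-core binding and the multi-process setup are omitted):

	#include <fcntl.h>
	#include <stdio.h>
	#include <time.h>
	#include <unistd.h>

	int main(void)
	{
		const char *p = "/sys/devices/system/cpu/cpu15/topology/die_id";
		struct timespec t0, t1;
		char buf[16];
		int i, fd;

		clock_gettime(CLOCK_MONOTONIC, &t0);
		for (i = 0; i < 1000; i++) {
			/* one open+read+close cycle per iteration */
			fd = open(p, O_RDONLY);
			if (fd < 0)
				return 1;
			read(fd, buf, sizeof(buf));
			close(fd);
		}
		clock_gettime(CLOCK_MONOTONIC, &t1);

		printf("%ld us for 1000 iterations\n",
		       (t1.tv_sec - t0.tv_sec) * 1000000L +
		       (t1.tv_nsec - t0.tv_nsec) / 1000L);
		return 0;
	}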

Ouch, yes, we are hitting the single kernfs mutex for all of this, not
nice.  I'll add this to my list of things to look at in the near future,
thanks for the report!

greg k-h
