linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Song Bao Hua (Barry Song)" <song.bao.hua@hisilicon.com>
To: Dave Hansen <dave.hansen@intel.com>,
	"tiantao (H)" <tiantao6@hisilicon.com>,
	"corbet@lwn.net" <corbet@lwn.net>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>
Cc: "linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	"Valentin Schneider" <valentin.schneider@arm.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Daniel Bristot de Oliveira <bristot@redhat.com>
Subject: RE: [PATCH 1/2] CPU, NUMA topology ABIs: clarify the overflow issue of sysfs pagebuf
Date: Thu, 29 Apr 2021 22:32:30 +0000	[thread overview]
Message-ID: <4bf6870f7f3942398e4d1fdaa42184c7@hisilicon.com> (raw)
In-Reply-To: <7c663f7e-07e0-6d95-3012-6e31a1b78f7e@intel.com>



> -----Original Message-----
> From: Dave Hansen [mailto:dave.hansen@intel.com]
> Sent: Friday, April 30, 2021 9:39 AM
> To: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>; tiantao (H)
> <tiantao6@hisilicon.com>; corbet@lwn.net; gregkh@linuxfoundation.org
> Cc: linux-doc@vger.kernel.org; linux-kernel@vger.kernel.org; Rafael J.
> Wysocki <rafael@kernel.org>; Peter Zijlstra <peterz@infradead.org>; Valentin
> Schneider <valentin.schneider@arm.com>; Dave Hansen
> <dave.hansen@linux.intel.com>; Daniel Bristot de Oliveira <bristot@redhat.com>
> Subject: Re: [PATCH 1/2] CPU, NUMA topology ABIs: clarify the overflow issue
> of sysfs pagebuf
> 
> On 4/29/21 2:08 PM, Song Bao Hua (Barry Song) wrote:
> >> Do we think >PAGE_SIZE data out of a sysfs file is a worse ABI break or
> >> something?
> > This kind of cpu list ABIs have been there for many years but have
> > never been documented well.
> >
> > We have two ABIs:
> > xxx_cpus - in format like 3333333333
> > xxx_cpus_list - in format like 0,3,5,7,9,11,13....
> >
> > xxx_cpus_list is another human-readable version of xxx_cpus. It doesn't
> > include any more useful information than xxx_cpus.
> >
> > xxx_cpus won't overflow based on BUILD_BUG_ON and maximum NR_CPUS
> > in kconfig nowadays.
> >
> > if people all agree the trimmed list is a break of ABI, I think we may
> > totally remove this list. For these days, this list probably has never
> > overflowed but literally this could happen.
> >
> > thoughts?
> 
> From what Greg said, it sounds like removing the BUILD_BUG_ON(), making
> it a binary sysfs file, and making it support arbitrary lengths is the
> way to go.

I am actually more concerned on xxx_cpus_list than xxx_cpus.

xxx_cpus has never overflowed. Though there is a BUILD_BUG_ON(), but the
maximum NR_CPUS is 8096, it is still far from overflow.
8096 /32 * 9 = 2277
as 2277 < 4096

While NR_CPUS gets to ~14500, for a 4KB page, xxx_cpus will overflow.
But I don't know when hardware will reach there. If we reach there,
the existing code to describe topology and schedule tasks might also
need rework.

On the other hand, lscpu, hwloc, numactl etc are using the existing
hex bitmap ABI:

$ strace lscpu  2>&1 | grep topology
faccessat(AT_FDCWD,
"/sys/devices/system/cpu/cpu0/topology/thread_siblings", F_OK) = 0
openat(AT_FDCWD,
"/sys/devices/system/cpu/cpu0/topology/thread_siblings",
O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD,
"/sys/devices/system/cpu/cpu0/topology/core_siblings",
O_RDONLY|O_CLOEXEC) = 3
faccessat(AT_FDCWD,
"/sys/devices/system/cpu/cpu0/topology/book_siblings", F_OK) = -1
ENOENT (No such file or directory)
faccessat(AT_FDCWD,
"/sys/devices/system/cpu/cpu1/topology/thread_siblings", F_OK) = 0
openat(AT_FDCWD,
...

$ strace numactl --hardware  2>&1 | grep cpu
openat(AT_FDCWD, "/sys/devices/system/cpu",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/sys/devices/system/node/node0/cpumap", O_RDONLY) = 3
openat(AT_FDCWD, "/sys/devices/system/node/node1/cpumap", O_RDONLY) = 3
openat(AT_FDCWD, "/sys/devices/system/node/node2/cpumap", O_RDONLY) = 3
openat(AT_FDCWD, "/sys/devices/system/node/node3/cpumap", O_RDONLY) = 3

If we move to binary, it means we have to change those applications.

For this moment, I'd argue we keep cpu bitmap ABI as is and defer this
issue to when we really get so many cores.

Thanks
Barry

  reply	other threads:[~2021-04-29 22:32 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-29  7:03 [PATCH 0/2] clarify and cleanup CPU and NUMA topology ABIs Tian Tao
2021-04-29  7:03 ` [PATCH 1/2] CPU, NUMA topology ABIs: clarify the overflow issue of sysfs pagebuf Tian Tao
2021-04-29 14:21   ` Dave Hansen
2021-04-29 15:46     ` Greg KH
2021-04-29 21:08     ` Song Bao Hua (Barry Song)
2021-04-29 21:38       ` Dave Hansen
2021-04-29 22:32         ` Song Bao Hua (Barry Song) [this message]
2021-04-29 22:38           ` Dave Hansen
2021-04-29 22:48             ` Song Bao Hua (Barry Song)
2021-04-30  6:05             ` gregkh
2021-07-23 11:20             ` Song Bao Hua (Barry Song)
2021-07-23 11:28               ` gregkh
2021-07-23 12:48                 ` Song Bao Hua (Barry Song)
2021-05-06 23:16   ` kernel test robot
2021-04-29  7:03 ` [PATCH 2/2] Documentation/ABI: Move the topology-related sysfs interface to the right place Tian Tao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4bf6870f7f3942398e4d1fdaa42184c7@hisilicon.com \
    --to=song.bao.hua@hisilicon.com \
    --cc=bristot@redhat.com \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=rafael@kernel.org \
    --cc=tiantao6@hisilicon.com \
    --cc=valentin.schneider@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).