All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drivers/base/node.c: fix userspace break from using bin_attributes for cpumap and cpulist
@ 2022-07-12 21:43 Phil Auld
  2022-07-12 23:18 ` Barry Song
  2022-07-13  6:06 ` Greg Kroah-Hartman
  0 siblings, 2 replies; 11+ messages in thread
From: Phil Auld @ 2022-07-12 21:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, Rafael J . Wysocki : --cc=, Tian Tao, Barry Song

Using bin_attributes with a 0 size causes fstat and friends to return that 0 size.
This breaks userspace code that retrieves the size before reading the file. Rather
than reverting 75bd50fa841 ("drivers/base/node.c: use bin_attribute to break the size
limitation of cpumap ABI") let's put in a size value at compile time. Use direct
comparison and a worst-case maximum to ensure compile time constants. For cpulist the 
max is on the order of NR_CPUS * (ceil(log10(NR_CPUS)) + 1) which for 8192 is 40960. 
In order to get near that you'd need a system with every other CPU on one node or 
something similar. e.g. (0,2,4,... 1024,1026...). We set it to a min of PAGE_SIZE 
to retain the older behavior. For cpumap, PAGE_SIZE is plenty big.

On an 80 cpu 4-node sytem (NR_CPUS == 8192)

before:

-r--r--r--. 1 root root 0 Jul 12 14:08 /sys/devices/system/node/node0/cpulist
-r--r--r--. 1 root root 0 Jul 11 17:25 /sys/devices/system/node/node0/cpumap

after:

-r--r--r--. 1 root root 40960 Jul 12 16:48 /sys/devices/system/node/node0/cpulist
-r--r--r--. 1 root root  4096 Jul 12 15:50 /sys/devices/system/node/node0/cpumap

Fixes: 75bd50fa841 ("drivers/base/node.c: use bin_attribute to break the size limitation of cpumap ABI")
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Signed-off-by: Phil Auld <pauld@redhat.com>
---
 drivers/base/node.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 0ac6376ef7a1..291c69671f23 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -45,7 +45,7 @@ static inline ssize_t cpumap_read(struct file *file, struct kobject *kobj,
 	return n;
 }
 
-static BIN_ATTR_RO(cpumap, 0);
+static BIN_ATTR_RO(cpumap, PAGE_SIZE);
 
 static inline ssize_t cpulist_read(struct file *file, struct kobject *kobj,
 				   struct bin_attribute *attr, char *buf,
@@ -66,7 +66,7 @@ static inline ssize_t cpulist_read(struct file *file, struct kobject *kobj,
 	return n;
 }
 
-static BIN_ATTR_RO(cpulist, 0);
+static BIN_ATTR_RO(cpulist, (((NR_CPUS * 5) > PAGE_SIZE) ? NR_CPUS *5 : PAGE_SIZE));
 
 /**
  * struct node_access_nodes - Access class device to hold user visible
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] drivers/base/node.c: fix userspace break from using bin_attributes for cpumap and cpulist
  2022-07-12 21:43 [PATCH] drivers/base/node.c: fix userspace break from using bin_attributes for cpumap and cpulist Phil Auld
@ 2022-07-12 23:18 ` Barry Song
  2022-07-13 11:37   ` Phil Auld
  2022-07-13  6:06 ` Greg Kroah-Hartman
  1 sibling, 1 reply; 11+ messages in thread
From: Barry Song @ 2022-07-12 23:18 UTC (permalink / raw)
  To: Phil Auld
  Cc: LKML, Greg Kroah-Hartman, Rafael J . Wysocki : --cc=,
	Tian Tao, Barry Song

On Wed, Jul 13, 2022 at 9:58 AM Phil Auld <pauld@redhat.com> wrote:
>
> Using bin_attributes with a 0 size causes fstat and friends to return that 0 size.
> This breaks userspace code that retrieves the size before reading the file. Rather
> than reverting 75bd50fa841 ("drivers/base/node.c: use bin_attribute to break the size
> limitation of cpumap ABI") let's put in a size value at compile time. Use direct
> comparison and a worst-case maximum to ensure compile time constants. For cpulist the
> max is on the order of NR_CPUS * (ceil(log10(NR_CPUS)) + 1) which for 8192 is 40960.
> In order to get near that you'd need a system with every other CPU on one node or
> something similar. e.g. (0,2,4,... 1024,1026...). We set it to a min of PAGE_SIZE
> to retain the older behavior. For cpumap, PAGE_SIZE is plenty big.
>
> On an 80 cpu 4-node system (NR_CPUS == 8192)
>
> before:
>
> -r--r--r--. 1 root root 0 Jul 12 14:08 /sys/devices/system/node/node0/cpulist
> -r--r--r--. 1 root root 0 Jul 11 17:25 /sys/devices/system/node/node0/cpumap

it is a fundamental problem of bin_attr, isn't it? when we don't know the
exact size of an attribute, and this size might become more than one
PAGE_SIZE, we use bin_attr to break the limitation. but the fact is that
we really don't know or it is really hard to know the actual size of the
attribute.

>
> after:
>
> -r--r--r--. 1 root root 40960 Jul 12 16:48 /sys/devices/system/node/node0/cpulist
> -r--r--r--. 1 root root  4096 Jul 12 15:50 /sys/devices/system/node/node0/cpumap

if we finally set a size which might be improper, it seems we defeat the
purpose we start to move to bin_attr?

>
> Fixes: 75bd50fa841 ("drivers/base/node.c: use bin_attribute to break the size limitation of cpumap ABI")
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Signed-off-by: Phil Auld <pauld@redhat.com>
> ---
>  drivers/base/node.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index 0ac6376ef7a1..291c69671f23 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -45,7 +45,7 @@ static inline ssize_t cpumap_read(struct file *file, struct kobject *kobj,
>         return n;
>  }
>
> -static BIN_ATTR_RO(cpumap, 0);
> +static BIN_ATTR_RO(cpumap, PAGE_SIZE);

PAGE_SIZE is probably big enough, will we still calculate to get it rather than
hard coding?

>
>  static inline ssize_t cpulist_read(struct file *file, struct kobject *kobj,
>                                    struct bin_attribute *attr, char *buf,
> @@ -66,7 +66,7 @@ static inline ssize_t cpulist_read(struct file *file, struct kobject *kobj,
>         return n;
>  }
>
> -static BIN_ATTR_RO(cpulist, 0);
> +static BIN_ATTR_RO(cpulist, (((NR_CPUS * 5) > PAGE_SIZE) ? NR_CPUS *5 : PAGE_SIZE));

I am still not sure why it is NR_CPUS * 5. Is 5 bytes big enough to
describe the number
of cpu id? technically it seems not,  for example,  for cpuid=100000,
we need at least 6
bytes.

BTW, my silly question is that what if we set the size to MAXIMUM int?
Will it fix
the userspace fsstat?

>
>  /**
>   * struct node_access_nodes - Access class device to hold user visible
> --
> 2.31.1
>

Thanks
Barry

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] drivers/base/node.c: fix userspace break from using bin_attributes for cpumap and cpulist
  2022-07-12 21:43 [PATCH] drivers/base/node.c: fix userspace break from using bin_attributes for cpumap and cpulist Phil Auld
  2022-07-12 23:18 ` Barry Song
@ 2022-07-13  6:06 ` Greg Kroah-Hartman
  2022-07-13 11:47   ` Phil Auld
  1 sibling, 1 reply; 11+ messages in thread
From: Greg Kroah-Hartman @ 2022-07-13  6:06 UTC (permalink / raw)
  To: Phil Auld; +Cc: linux-kernel, Rafael J . Wysocki, Tian Tao, Barry Song

On Tue, Jul 12, 2022 at 05:43:01PM -0400, Phil Auld wrote:
> Using bin_attributes with a 0 size causes fstat and friends to return that 0 size.
> This breaks userspace code that retrieves the size before reading the file. Rather
> than reverting 75bd50fa841 ("drivers/base/node.c: use bin_attribute to break the size
> limitation of cpumap ABI") let's put in a size value at compile time. Use direct
> comparison and a worst-case maximum to ensure compile time constants. For cpulist the 
> max is on the order of NR_CPUS * (ceil(log10(NR_CPUS)) + 1) which for 8192 is 40960. 
> In order to get near that you'd need a system with every other CPU on one node or 
> something similar. e.g. (0,2,4,... 1024,1026...). We set it to a min of PAGE_SIZE 
> to retain the older behavior. For cpumap, PAGE_SIZE is plenty big.

Does userspace care about that size, or can we just put any value in
there and it will be ok?  How about just returning to the original
PAGE_SIZE value to keep things looking identical, will userspace not
read more than that size from the file then?

> On an 80 cpu 4-node sytem (NR_CPUS == 8192)

We have systems running Linux with many more cpus than that, and your
company knows this :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] drivers/base/node.c: fix userspace break from using bin_attributes for cpumap and cpulist
  2022-07-12 23:18 ` Barry Song
@ 2022-07-13 11:37   ` Phil Auld
  2022-07-13 12:00     ` Barry Song
  2022-07-13 13:06     ` Greg Kroah-Hartman
  0 siblings, 2 replies; 11+ messages in thread
From: Phil Auld @ 2022-07-13 11:37 UTC (permalink / raw)
  To: Barry Song
  Cc: LKML, Greg Kroah-Hartman, Rafael J . Wysocki : --cc=,
	Tian Tao, Barry Song

On Wed, Jul 13, 2022 at 11:18:59AM +1200 Barry Song wrote:
> On Wed, Jul 13, 2022 at 9:58 AM Phil Auld <pauld@redhat.com> wrote:
> >
> > Using bin_attributes with a 0 size causes fstat and friends to return that 0 size.
> > This breaks userspace code that retrieves the size before reading the file. Rather
> > than reverting 75bd50fa841 ("drivers/base/node.c: use bin_attribute to break the size
> > limitation of cpumap ABI") let's put in a size value at compile time. Use direct
> > comparison and a worst-case maximum to ensure compile time constants. For cpulist the
> > max is on the order of NR_CPUS * (ceil(log10(NR_CPUS)) + 1) which for 8192 is 40960.
> > In order to get near that you'd need a system with every other CPU on one node or
> > something similar. e.g. (0,2,4,... 1024,1026...). We set it to a min of PAGE_SIZE
> > to retain the older behavior. For cpumap, PAGE_SIZE is plenty big.
> >
> > On an 80 cpu 4-node system (NR_CPUS == 8192)
> >
> > before:
> >
> > -r--r--r--. 1 root root 0 Jul 12 14:08 /sys/devices/system/node/node0/cpulist
> > -r--r--r--. 1 root root 0 Jul 11 17:25 /sys/devices/system/node/node0/cpumap
> 
> it is a fundamental problem of bin_attr, isn't it? when we don't know the
> exact size of an attribute, and this size might become more than one
> PAGE_SIZE, we use bin_attr to break the limitation. but the fact is that
> we really don't know or it is really hard to know the actual size of the
> attribute.
>

But it broke userspace applications. I figured rather than revert it maybe
we can find a max size to put in there and make it continue to work.

> >
> > after:
> >
> > -r--r--r--. 1 root root 40960 Jul 12 16:48 /sys/devices/system/node/node0/cpulist
> > -r--r--r--. 1 root root  4096 Jul 12 15:50 /sys/devices/system/node/node0/cpumap
> 
> if we finally set a size which might be improper, it seems we defeat the
> purpose we start to move to bin_attr?
> 
> >
> > Fixes: 75bd50fa841 ("drivers/base/node.c: use bin_attribute to break the size limitation of cpumap ABI")
> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> > Signed-off-by: Phil Auld <pauld@redhat.com>
> > ---
> >  drivers/base/node.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/base/node.c b/drivers/base/node.c
> > index 0ac6376ef7a1..291c69671f23 100644
> > --- a/drivers/base/node.c
> > +++ b/drivers/base/node.c
> > @@ -45,7 +45,7 @@ static inline ssize_t cpumap_read(struct file *file, struct kobject *kobj,
> >         return n;
> >  }
> >
> > -static BIN_ATTR_RO(cpumap, 0);
> > +static BIN_ATTR_RO(cpumap, PAGE_SIZE);
> 
> PAGE_SIZE is probably big enough, will we still calculate to get it rather than
> hard coding?

This one is actually wrong. I did not realize how big a NR_CPUS people were actually using.
It should be something like (NR_CPUS/4 + NR_CPUS/32). 

> 
> >
> >  static inline ssize_t cpulist_read(struct file *file, struct kobject *kobj,
> >                                    struct bin_attribute *attr, char *buf,
> > @@ -66,7 +66,7 @@ static inline ssize_t cpulist_read(struct file *file, struct kobject *kobj,
> >         return n;
> >  }
> >
> > -static BIN_ATTR_RO(cpulist, 0);
> > +static BIN_ATTR_RO(cpulist, (((NR_CPUS * 5) > PAGE_SIZE) ? NR_CPUS *5 : PAGE_SIZE));
> 
> I am still not sure why it is NR_CPUS * 5. Is 5 bytes big enough to
> describe the number
> of cpu id? technically it seems not,  for example,  for cpuid=100000,
> we need at least 6
> bytes.

Sure. As I said in the comment I wanted to do NR_CPUS * (ceil(log10(NR_CPUS)) + 1) but doing
that math in the kernel was messy. So I used 5. Even that is probably way bigger than needed.
Are there really 100000 cpus on one node with discontiguous cpuids? "0-99999" is only, what,
9 characters?

We can put whatever number you want that is >= the size the read will return.

Thanks,
Phil

> 
> BTW, my silly question is that what if we set the size to MAXIMUM int?
> Will it fix
> the userspace fsstat?
> 
> >
> >  /**
> >   * struct node_access_nodes - Access class device to hold user visible
> > --
> > 2.31.1
> >
> 
> Thanks
> Barry
> 

-- 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] drivers/base/node.c: fix userspace break from using bin_attributes for cpumap and cpulist
  2022-07-13  6:06 ` Greg Kroah-Hartman
@ 2022-07-13 11:47   ` Phil Auld
  2022-07-13 13:05     ` Greg Kroah-Hartman
  2022-07-13 13:10     ` Phil Auld
  0 siblings, 2 replies; 11+ messages in thread
From: Phil Auld @ 2022-07-13 11:47 UTC (permalink / raw)
  To: Greg Kroah-Hartman; +Cc: linux-kernel, Rafael J . Wysocki, Tian Tao, Barry Song

Hi Greg,

On Wed, Jul 13, 2022 at 08:06:02AM +0200 Greg Kroah-Hartman wrote:
> On Tue, Jul 12, 2022 at 05:43:01PM -0400, Phil Auld wrote:
> > Using bin_attributes with a 0 size causes fstat and friends to return that 0 size.
> > This breaks userspace code that retrieves the size before reading the file. Rather
> > than reverting 75bd50fa841 ("drivers/base/node.c: use bin_attribute to break the size
> > limitation of cpumap ABI") let's put in a size value at compile time. Use direct
> > comparison and a worst-case maximum to ensure compile time constants. For cpulist the 
> > max is on the order of NR_CPUS * (ceil(log10(NR_CPUS)) + 1) which for 8192 is 40960. 
> > In order to get near that you'd need a system with every other CPU on one node or 
> > something similar. e.g. (0,2,4,... 1024,1026...). We set it to a min of PAGE_SIZE 
> > to retain the older behavior. For cpumap, PAGE_SIZE is plenty big.
> 
> Does userspace care about that size, or can we just put any value in
> there and it will be ok?  How about just returning to the original
> PAGE_SIZE value to keep things looking identical, will userspace not
> read more than that size from the file then?
>

I'll go look. But I think the point of pre-reading the size with fstat is to allocate
a buffer to read into. So that may be a problem. 

That said, I believe in this case it's the cpulist file which given the use of ranges
is very unlikely to actually get that big. 

> > On an 80 cpu 4-node sytem (NR_CPUS == 8192)
> 
> We have systems running Linux with many more cpus than that, and your
> company knows this :)

The 80 cpus here don't matter and we only build with NR_CPUS = 8192 :)

But yes, I realize now that the cpumap part I posted is broken for larger
NR_CPUS.  I originally had it as NR_CPUS, but as I said in my reply to Barry,
it wants to be ~= NR_CPUS/4 + NR_CPUS/32. I'll change that.  

I think we should decide on a max for each and use that. 

Cheers,
Phil

> 
> thanks,
> 
> greg k-h
> 

-- 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] drivers/base/node.c: fix userspace break from using bin_attributes for cpumap and cpulist
  2022-07-13 11:37   ` Phil Auld
@ 2022-07-13 12:00     ` Barry Song
  2022-07-13 12:20       ` Phil Auld
  2022-07-13 13:06     ` Greg Kroah-Hartman
  1 sibling, 1 reply; 11+ messages in thread
From: Barry Song @ 2022-07-13 12:00 UTC (permalink / raw)
  To: Phil Auld
  Cc: LKML, Greg Kroah-Hartman, Rafael J . Wysocki : --cc=,
	Tian Tao, Barry Song

Got it.

On Wed, Jul 13, 2022 at 11:37 PM Phil Auld <pauld@redhat.com> wrote:
>
> On Wed, Jul 13, 2022 at 11:18:59AM +1200 Barry Song wrote:
> > On Wed, Jul 13, 2022 at 9:58 AM Phil Auld <pauld@redhat.com> wrote:
> > >
> > > Using bin_attributes with a 0 size causes fstat and friends to return that 0 size.
> > > This breaks userspace code that retrieves the size before reading the file. Rather
> > > than reverting 75bd50fa841 ("drivers/base/node.c: use bin_attribute to break the size
> > > limitation of cpumap ABI") let's put in a size value at compile time. Use direct
> > > comparison and a worst-case maximum to ensure compile time constants. For cpulist the
> > > max is on the order of NR_CPUS * (ceil(log10(NR_CPUS)) + 1) which for 8192 is 40960.
> > > In order to get near that you'd need a system with every other CPU on one node or
> > > something similar. e.g. (0,2,4,... 1024,1026...). We set it to a min of PAGE_SIZE
> > > to retain the older behavior. For cpumap, PAGE_SIZE is plenty big.
> > >
> > > On an 80 cpu 4-node system (NR_CPUS == 8192)
> > >
> > > before:
> > >
> > > -r--r--r--. 1 root root 0 Jul 12 14:08 /sys/devices/system/node/node0/cpulist
> > > -r--r--r--. 1 root root 0 Jul 11 17:25 /sys/devices/system/node/node0/cpumap
> >
> > it is a fundamental problem of bin_attr, isn't it? when we don't know the
> > exact size of an attribute, and this size might become more than one
> > PAGE_SIZE, we use bin_attr to break the limitation. but the fact is that
> > we really don't know or it is really hard to know the actual size of the
> > attribute.
> >
>
> But it broke userspace applications. I figured rather than revert it maybe
> we can find a max size to put in there and make it continue to work.
>
> > >
> > > after:
> > >
> > > -r--r--r--. 1 root root 40960 Jul 12 16:48 /sys/devices/system/node/node0/cpulist
> > > -r--r--r--. 1 root root  4096 Jul 12 15:50 /sys/devices/system/node/node0/cpumap
> >
> > if we finally set a size which might be improper, it seems we defeat the
> > purpose we start to move to bin_attr?
> >
> > >
> > > Fixes: 75bd50fa841 ("drivers/base/node.c: use bin_attribute to break the size limitation of cpumap ABI")
> > > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> > > Signed-off-by: Phil Auld <pauld@redhat.com>
> > > ---
> > >  drivers/base/node.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/base/node.c b/drivers/base/node.c
> > > index 0ac6376ef7a1..291c69671f23 100644
> > > --- a/drivers/base/node.c
> > > +++ b/drivers/base/node.c
> > > @@ -45,7 +45,7 @@ static inline ssize_t cpumap_read(struct file *file, struct kobject *kobj,
> > >         return n;
> > >  }
> > >
> > > -static BIN_ATTR_RO(cpumap, 0);
> > > +static BIN_ATTR_RO(cpumap, PAGE_SIZE);
> >
> > PAGE_SIZE is probably big enough, will we still calculate to get it rather than
> > hard coding?
>
> This one is actually wrong. I did not realize how big a NR_CPUS people were actually using.
> It should be something like (NR_CPUS/4 + NR_CPUS/32).
>
> >
> > >
> > >  static inline ssize_t cpulist_read(struct file *file, struct kobject *kobj,
> > >                                    struct bin_attribute *attr, char *buf,
> > > @@ -66,7 +66,7 @@ static inline ssize_t cpulist_read(struct file *file, struct kobject *kobj,
> > >         return n;
> > >  }
> > >
> > > -static BIN_ATTR_RO(cpulist, 0);
> > > +static BIN_ATTR_RO(cpulist, (((NR_CPUS * 5) > PAGE_SIZE) ? NR_CPUS *5 : PAGE_SIZE));
> >
> > I am still not sure why it is NR_CPUS * 5. Is 5 bytes big enough to
> > describe the number
> > of cpu id? technically it seems not,  for example,  for cpuid=100000,
> > we need at least 6
> > bytes.
>
> Sure. As I said in the comment I wanted to do NR_CPUS * (ceil(log10(NR_CPUS)) + 1) but doing
> that math in the kernel was messy. So I used 5. Even that is probably way bigger than needed.
> Are there really 100000 cpus on one node with discontiguous cpuids? "0-99999" is only, what,
> 9 characters?
>
> We can put whatever number you want that is >= the size the read will return.

Thanks,
does it mean we can use something like -1UL?

>
> Thanks,
> Phil
>
> >
> > BTW, my silly question is that what if we set the size to MAXIMUM int?
> > Will it fix
> > the userspace fsstat?
> >
> > >
> > >  /**
> > >   * struct node_access_nodes - Access class device to hold user visible
> > > --
> > > 2.31.1
> > >
> >
> > Thanks
> > Barry
> >
>
> --
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] drivers/base/node.c: fix userspace break from using bin_attributes for cpumap and cpulist
  2022-07-13 12:00     ` Barry Song
@ 2022-07-13 12:20       ` Phil Auld
  0 siblings, 0 replies; 11+ messages in thread
From: Phil Auld @ 2022-07-13 12:20 UTC (permalink / raw)
  To: Barry Song; +Cc: LKML, Greg Kroah-Hartman, Rafael J . Wysocki : --cc=, Tian Tao

On Thu, Jul 14, 2022 at 12:00:34AM +1200 Barry Song wrote:
> Got it.
> 
> On Wed, Jul 13, 2022 at 11:37 PM Phil Auld <pauld@redhat.com> wrote:
> >
> > On Wed, Jul 13, 2022 at 11:18:59AM +1200 Barry Song wrote:
> > > On Wed, Jul 13, 2022 at 9:58 AM Phil Auld <pauld@redhat.com> wrote:
> > > >
> > > > Using bin_attributes with a 0 size causes fstat and friends to return that 0 size.
> > > > This breaks userspace code that retrieves the size before reading the file. Rather
> > > > than reverting 75bd50fa841 ("drivers/base/node.c: use bin_attribute to break the size
> > > > limitation of cpumap ABI") let's put in a size value at compile time. Use direct
> > > > comparison and a worst-case maximum to ensure compile time constants. For cpulist the
> > > > max is on the order of NR_CPUS * (ceil(log10(NR_CPUS)) + 1) which for 8192 is 40960.
> > > > In order to get near that you'd need a system with every other CPU on one node or
> > > > something similar. e.g. (0,2,4,... 1024,1026...). We set it to a min of PAGE_SIZE
> > > > to retain the older behavior. For cpumap, PAGE_SIZE is plenty big.
> > > >
> > > > On an 80 cpu 4-node system (NR_CPUS == 8192)
> > > >
> > > > before:
> > > >
> > > > -r--r--r--. 1 root root 0 Jul 12 14:08 /sys/devices/system/node/node0/cpulist
> > > > -r--r--r--. 1 root root 0 Jul 11 17:25 /sys/devices/system/node/node0/cpumap
> > >
> > > it is a fundamental problem of bin_attr, isn't it? when we don't know the
> > > exact size of an attribute, and this size might become more than one
> > > PAGE_SIZE, we use bin_attr to break the limitation. but the fact is that
> > > we really don't know or it is really hard to know the actual size of the
> > > attribute.
> > >
> >
> > But it broke userspace applications. I figured rather than revert it maybe
> > we can find a max size to put in there and make it continue to work.
> >
> > > >
> > > > after:
> > > >
> > > > -r--r--r--. 1 root root 40960 Jul 12 16:48 /sys/devices/system/node/node0/cpulist
> > > > -r--r--r--. 1 root root  4096 Jul 12 15:50 /sys/devices/system/node/node0/cpumap
> > >
> > > if we finally set a size which might be improper, it seems we defeat the
> > > purpose we start to move to bin_attr?
> > >
> > > >
> > > > Fixes: 75bd50fa841 ("drivers/base/node.c: use bin_attribute to break the size limitation of cpumap ABI")
> > > > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > > Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> > > > Signed-off-by: Phil Auld <pauld@redhat.com>
> > > > ---
> > > >  drivers/base/node.c | 4 ++--
> > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/base/node.c b/drivers/base/node.c
> > > > index 0ac6376ef7a1..291c69671f23 100644
> > > > --- a/drivers/base/node.c
> > > > +++ b/drivers/base/node.c
> > > > @@ -45,7 +45,7 @@ static inline ssize_t cpumap_read(struct file *file, struct kobject *kobj,
> > > >         return n;
> > > >  }
> > > >
> > > > -static BIN_ATTR_RO(cpumap, 0);
> > > > +static BIN_ATTR_RO(cpumap, PAGE_SIZE);
> > >
> > > PAGE_SIZE is probably big enough, will we still calculate to get it rather than
> > > hard coding?
> >
> > This one is actually wrong. I did not realize how big a NR_CPUS people were actually using.
> > It should be something like (NR_CPUS/4 + NR_CPUS/32).
> >
> > >
> > > >
> > > >  static inline ssize_t cpulist_read(struct file *file, struct kobject *kobj,
> > > >                                    struct bin_attribute *attr, char *buf,
> > > > @@ -66,7 +66,7 @@ static inline ssize_t cpulist_read(struct file *file, struct kobject *kobj,
> > > >         return n;
> > > >  }
> > > >
> > > > -static BIN_ATTR_RO(cpulist, 0);
> > > > +static BIN_ATTR_RO(cpulist, (((NR_CPUS * 5) > PAGE_SIZE) ? NR_CPUS *5 : PAGE_SIZE));
> > >
> > > I am still not sure why it is NR_CPUS * 5. Is 5 bytes big enough to
> > > describe the number
> > > of cpu id? technically it seems not,  for example,  for cpuid=100000,
> > > we need at least 6
> > > bytes.
> >
> > Sure. As I said in the comment I wanted to do NR_CPUS * (ceil(log10(NR_CPUS)) + 1) but doing
> > that math in the kernel was messy. So I used 5. Even that is probably way bigger than needed.
> > Are there really 100000 cpus on one node with discontiguous cpuids? "0-99999" is only, what,
> > 9 characters?
> >
> > We can put whatever number you want that is >= the size the read will return.
> 
> Thanks,
> does it mean we can use something like -1UL?
>

I suppose we could be that seems like it would be overkill, no?  My understanding is the app
in question uses the reported size to allocate a buffer to read the file into. It needs to
be at least equal to the amount we'll actually read but 4GB seems like it might be a bit much.


Cheers,
Phil

> >
> > Thanks,
> > Phil
> >
> > >
> > > BTW, my silly question is that what if we set the size to MAXIMUM int?
> > > Will it fix
> > > the userspace fsstat?
> > >
> > > >
> > > >  /**
> > > >   * struct node_access_nodes - Access class device to hold user visible
> > > > --
> > > > 2.31.1
> > > >
> > >
> > > Thanks
> > > Barry
> > >
> >
> > --
> >
> 

-- 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] drivers/base/node.c: fix userspace break from using bin_attributes for cpumap and cpulist
  2022-07-13 11:47   ` Phil Auld
@ 2022-07-13 13:05     ` Greg Kroah-Hartman
  2022-07-13 13:14       ` Phil Auld
  2022-07-13 13:10     ` Phil Auld
  1 sibling, 1 reply; 11+ messages in thread
From: Greg Kroah-Hartman @ 2022-07-13 13:05 UTC (permalink / raw)
  To: Phil Auld; +Cc: linux-kernel, Rafael J . Wysocki, Tian Tao, Barry Song

On Wed, Jul 13, 2022 at 07:47:58AM -0400, Phil Auld wrote:
> Hi Greg,
> 
> On Wed, Jul 13, 2022 at 08:06:02AM +0200 Greg Kroah-Hartman wrote:
> > On Tue, Jul 12, 2022 at 05:43:01PM -0400, Phil Auld wrote:
> > > Using bin_attributes with a 0 size causes fstat and friends to return that 0 size.
> > > This breaks userspace code that retrieves the size before reading the file. Rather
> > > than reverting 75bd50fa841 ("drivers/base/node.c: use bin_attribute to break the size
> > > limitation of cpumap ABI") let's put in a size value at compile time. Use direct
> > > comparison and a worst-case maximum to ensure compile time constants. For cpulist the 
> > > max is on the order of NR_CPUS * (ceil(log10(NR_CPUS)) + 1) which for 8192 is 40960. 
> > > In order to get near that you'd need a system with every other CPU on one node or 
> > > something similar. e.g. (0,2,4,... 1024,1026...). We set it to a min of PAGE_SIZE 
> > > to retain the older behavior. For cpumap, PAGE_SIZE is plenty big.
> > 
> > Does userspace care about that size, or can we just put any value in
> > there and it will be ok?  How about just returning to the original
> > PAGE_SIZE value to keep things looking identical, will userspace not
> > read more than that size from the file then?
> >
> 
> I'll go look. But I think the point of pre-reading the size with fstat is to allocate
> a buffer to read into. So that may be a problem. 
> 
> That said, I believe in this case it's the cpulist file which given the use of ranges
> is very unlikely to actually get that big. 

That is why we had to change this to a binary file.  Think about
every-other CPU being there, that's a huge list.  This already was
broken on some systems which is why it had to be changed (i.e. we didn't
change it for no reason at all.)

> > > On an 80 cpu 4-node sytem (NR_CPUS == 8192)
> > 
> > We have systems running Linux with many more cpus than that, and your
> > company knows this :)
> 
> The 80 cpus here don't matter and we only build with NR_CPUS = 8192 :)
> 
> But yes, I realize now that the cpumap part I posted is broken for larger
> NR_CPUS.  I originally had it as NR_CPUS, but as I said in my reply to Barry,
> it wants to be ~= NR_CPUS/4 + NR_CPUS/32. I'll change that.  
> 
> I think we should decide on a max for each and use that. 

Sure, pick a max size please, that's fine with me.

greg k-h

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] drivers/base/node.c: fix userspace break from using bin_attributes for cpumap and cpulist
  2022-07-13 11:37   ` Phil Auld
  2022-07-13 12:00     ` Barry Song
@ 2022-07-13 13:06     ` Greg Kroah-Hartman
  1 sibling, 0 replies; 11+ messages in thread
From: Greg Kroah-Hartman @ 2022-07-13 13:06 UTC (permalink / raw)
  To: Phil Auld
  Cc: Barry Song, LKML, Rafael J . Wysocki : --cc=, Tian Tao, Barry Song

On Wed, Jul 13, 2022 at 07:37:27AM -0400, Phil Auld wrote:
> On Wed, Jul 13, 2022 at 11:18:59AM +1200 Barry Song wrote:
> > On Wed, Jul 13, 2022 at 9:58 AM Phil Auld <pauld@redhat.com> wrote:
> > >
> > > Using bin_attributes with a 0 size causes fstat and friends to return that 0 size.
> > > This breaks userspace code that retrieves the size before reading the file. Rather
> > > than reverting 75bd50fa841 ("drivers/base/node.c: use bin_attribute to break the size
> > > limitation of cpumap ABI") let's put in a size value at compile time. Use direct
> > > comparison and a worst-case maximum to ensure compile time constants. For cpulist the
> > > max is on the order of NR_CPUS * (ceil(log10(NR_CPUS)) + 1) which for 8192 is 40960.
> > > In order to get near that you'd need a system with every other CPU on one node or
> > > something similar. e.g. (0,2,4,... 1024,1026...). We set it to a min of PAGE_SIZE
> > > to retain the older behavior. For cpumap, PAGE_SIZE is plenty big.
> > >
> > > On an 80 cpu 4-node system (NR_CPUS == 8192)
> > >
> > > before:
> > >
> > > -r--r--r--. 1 root root 0 Jul 12 14:08 /sys/devices/system/node/node0/cpulist
> > > -r--r--r--. 1 root root 0 Jul 11 17:25 /sys/devices/system/node/node0/cpumap
> > 
> > it is a fundamental problem of bin_attr, isn't it? when we don't know the
> > exact size of an attribute, and this size might become more than one
> > PAGE_SIZE, we use bin_attr to break the limitation. but the fact is that
> > we really don't know or it is really hard to know the actual size of the
> > attribute.
> >
> 
> But it broke userspace applications. I figured rather than revert it maybe
> we can find a max size to put in there and make it continue to work.

Yes, we need to do this, we can't break userspace.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] drivers/base/node.c: fix userspace break from using bin_attributes for cpumap and cpulist
  2022-07-13 11:47   ` Phil Auld
  2022-07-13 13:05     ` Greg Kroah-Hartman
@ 2022-07-13 13:10     ` Phil Auld
  1 sibling, 0 replies; 11+ messages in thread
From: Phil Auld @ 2022-07-13 13:10 UTC (permalink / raw)
  To: Greg Kroah-Hartman; +Cc: linux-kernel, Rafael J . Wysocki, Tian Tao, Barry Song

On Wed, Jul 13, 2022 at 07:48:00AM -0400 Phil Auld wrote:
> Hi Greg,
> 
> On Wed, Jul 13, 2022 at 08:06:02AM +0200 Greg Kroah-Hartman wrote:
> > On Tue, Jul 12, 2022 at 05:43:01PM -0400, Phil Auld wrote:
> > > Using bin_attributes with a 0 size causes fstat and friends to return that 0 size.
> > > This breaks userspace code that retrieves the size before reading the file. Rather
> > > than reverting 75bd50fa841 ("drivers/base/node.c: use bin_attribute to break the size
> > > limitation of cpumap ABI") let's put in a size value at compile time. Use direct
> > > comparison and a worst-case maximum to ensure compile time constants. For cpulist the 
> > > max is on the order of NR_CPUS * (ceil(log10(NR_CPUS)) + 1) which for 8192 is 40960. 
> > > In order to get near that you'd need a system with every other CPU on one node or 
> > > something similar. e.g. (0,2,4,... 1024,1026...). We set it to a min of PAGE_SIZE 
> > > to retain the older behavior. For cpumap, PAGE_SIZE is plenty big.
> > 
> > Does userspace care about that size, or can we just put any value in
> > there and it will be ok?  How about just returning to the original
> > PAGE_SIZE value to keep things looking identical, will userspace not
> > read more than that size from the file then?
> >
> 
> I'll go look. But I think the point of pre-reading the size with fstat is to allocate
> a buffer to read into. So that may be a problem. 
>

From what I understand the app does use the size from fstat to allocate a buffer.

It also does a lseek to the end and back. This is actaully where it breaks when it gets a 0.

This is before:
8747  10:20:32.584460 fstat(6</sys/devices/system/node/node0/cpulist>, {..., st_size=4096, ...}) = 0 <0.000006>
8747  10:20:32.584502 fstat(6</sys/devices/system/node/node0/cpulist>, {..., st_size=4096, ...}) = 0 <0.000005>
8747  10:20:32.584537 lseek(6</sys/devices/system/node/node0/cpulist>, 4096, SEEK_SET) = 4096 <0.000005>
8747  10:20:32.584560 lseek(6</sys/devices/system/node/node0/cpulist>, 0, SEEK_SET) = 0 <0.000005>
8747  10:20:32.584585 read(6</sys/devices/system/node/node0/cpulist>, "0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46\n", 4096) = 67 <0.000008>
8747  10:20:32.584617 read(6</sys/devices/system/node/node0/cpulist>, "", 4096) = 0 <0.000005>

(I'll note here their system does seem to have a worst case cpu to node list too)

And with the bin_attr change:
8871  09:27:41.034279 fstat(6</sys/devices/system/node/node0/cpulist>, {..., st_size=0, ...}) = 0 <0.000009>
8871  09:27:41.034330 fstat(6</sys/devices/system/node/node0/cpulist>, {..., st_size=0, ...}) = 0 <0.000010>
8871  09:27:41.034377 lseek(6</sys/devices/system/node/node0/cpulist>, 0, SEEK_SET) = 0 <0.000008>
8871  09:27:41.034410 lseek(6</sys/devices/system/node/node0/cpulist>, 0, SEEK_SET) = 0 <0.000008>
And here it spins in a loop. 


> That said, I believe in this case it's the cpulist file which given the use of ranges
> is very unlikely to actually get that big. 
> 
> > > On an 80 cpu 4-node sytem (NR_CPUS == 8192)
> > 
> > We have systems running Linux with many more cpus than that, and your
> > company knows this :)
> 
> The 80 cpus here don't matter and we only build with NR_CPUS = 8192 :)
> 
> But yes, I realize now that the cpumap part I posted is broken for larger
> NR_CPUS.  I originally had it as NR_CPUS, but as I said in my reply to Barry,
> it wants to be ~= NR_CPUS/4 + NR_CPUS/32. I'll change that.  
> 
> I think we should decide on a max for each and use that. 
>

What values of NR_CPUS are people actually using when they build kernels? Barry mentioned cpuid 100000 for
example. I'm not sure if that was real or just illustrating the need for more characters.

I've modified my patch to use NR_CPUS/2 for the cpumap which should be plenty. And to use NR_CPUS*6 for
the cpulist, which covers up to 99999 cpus safely.


Cheers,
Phil


> Cheers,
> Phil
> 
> > 
> > thanks,
> > 
> > greg k-h
> > 
> 
> -- 

-- 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] drivers/base/node.c: fix userspace break from using bin_attributes for cpumap and cpulist
  2022-07-13 13:05     ` Greg Kroah-Hartman
@ 2022-07-13 13:14       ` Phil Auld
  0 siblings, 0 replies; 11+ messages in thread
From: Phil Auld @ 2022-07-13 13:14 UTC (permalink / raw)
  To: Greg Kroah-Hartman; +Cc: linux-kernel, Rafael J . Wysocki, Tian Tao

On Wed, Jul 13, 2022 at 03:05:52PM +0200 Greg Kroah-Hartman wrote:
> On Wed, Jul 13, 2022 at 07:47:58AM -0400, Phil Auld wrote:
> > Hi Greg,
> > 
> > On Wed, Jul 13, 2022 at 08:06:02AM +0200 Greg Kroah-Hartman wrote:
> > > On Tue, Jul 12, 2022 at 05:43:01PM -0400, Phil Auld wrote:
> > > > Using bin_attributes with a 0 size causes fstat and friends to return that 0 size.
> > > > This breaks userspace code that retrieves the size before reading the file. Rather
> > > > than reverting 75bd50fa841 ("drivers/base/node.c: use bin_attribute to break the size
> > > > limitation of cpumap ABI") let's put in a size value at compile time. Use direct
> > > > comparison and a worst-case maximum to ensure compile time constants. For cpulist the 
> > > > max is on the order of NR_CPUS * (ceil(log10(NR_CPUS)) + 1) which for 8192 is 40960. 
> > > > In order to get near that you'd need a system with every other CPU on one node or 
> > > > something similar. e.g. (0,2,4,... 1024,1026...). We set it to a min of PAGE_SIZE 
> > > > to retain the older behavior. For cpumap, PAGE_SIZE is plenty big.
> > > 
> > > Does userspace care about that size, or can we just put any value in
> > > there and it will be ok?  How about just returning to the original
> > > PAGE_SIZE value to keep things looking identical, will userspace not
> > > read more than that size from the file then?
> > >
> > 
> > I'll go look. But I think the point of pre-reading the size with fstat is to allocate
> > a buffer to read into. So that may be a problem. 
> > 
> > That said, I believe in this case it's the cpulist file which given the use of ranges
> > is very unlikely to actually get that big. 
> 
> That is why we had to change this to a binary file.  Think about
> every-other CPU being there, that's a huge list.  This already was
> broken on some systems which is why it had to be changed (i.e. we didn't
> change it for no reason at all.)
>

I didn't think you did and the change made sense. I did not expect this to
cause problems either when I backported it... :)

> > > > On an 80 cpu 4-node sytem (NR_CPUS == 8192)
> > > 
> > > We have systems running Linux with many more cpus than that, and your
> > > company knows this :)
> > 
> > The 80 cpus here don't matter and we only build with NR_CPUS = 8192 :)
> > 
> > But yes, I realize now that the cpumap part I posted is broken for larger
> > NR_CPUS.  I originally had it as NR_CPUS, but as I said in my reply to Barry,
> > it wants to be ~= NR_CPUS/4 + NR_CPUS/32. I'll change that.  
> > 
> > I think we should decide on a max for each and use that. 
> 
> Sure, pick a max size please, that's fine with me.

Right. I had another reply that crossed in the ether.

I can repost with the new version shortly.

It's using cpumap at NR_CPUS/2 and cpulist at NR_CPUS*6. 


Cheers,
Phil

> 
> greg k-h
> 

-- 


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-07-13 13:15 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-12 21:43 [PATCH] drivers/base/node.c: fix userspace break from using bin_attributes for cpumap and cpulist Phil Auld
2022-07-12 23:18 ` Barry Song
2022-07-13 11:37   ` Phil Auld
2022-07-13 12:00     ` Barry Song
2022-07-13 12:20       ` Phil Auld
2022-07-13 13:06     ` Greg Kroah-Hartman
2022-07-13  6:06 ` Greg Kroah-Hartman
2022-07-13 11:47   ` Phil Auld
2022-07-13 13:05     ` Greg Kroah-Hartman
2022-07-13 13:14       ` Phil Auld
2022-07-13 13:10     ` Phil Auld

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.