linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Yury Norov <yury.norov@gmail.com>
To: "Song Bao Hua (Barry Song)" <song.bao.hua@hisilicon.com>
Cc: "gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"andriy.shevchenko@linux.intel.com" 
	<andriy.shevchenko@linux.intel.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"dave.hansen@intel.com" <dave.hansen@intel.com>,
	"linux@rasmusvillemoes.dk" <linux@rasmusvillemoes.dk>,
	"rafael@kernel.org" <rafael@kernel.org>,
	"rdunlap@infradead.org" <rdunlap@infradead.org>,
	"agordeev@linux.ibm.com" <agordeev@linux.ibm.com>,
	"sbrivio@redhat.com" <sbrivio@redhat.com>,
	"jianpeng.ma@intel.com" <jianpeng.ma@intel.com>,
	"valentin.schneider@arm.com" <valentin.schneider@arm.com>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"bristot@redhat.com" <bristot@redhat.com>,
	"guodong.xu@linaro.org" <guodong.xu@linaro.org>,
	tangchengchang <tangchengchang@huawei.com>,
	"Zengtao (B)" <prime.zeng@hisilicon.com>,
	yangyicong <yangyicong@huawei.com>,
	"tim.c.chen@linux.intel.com" <tim.c.chen@linux.intel.com>,
	Linuxarm <linuxarm@huawei.com>,
	"tiantao (H)" <tiantao6@hisilicon.com>
Subject: Re: [PATCH v7 2/4] topology: use bin_attribute to break the size limitation of cpumap ABI
Date: Fri, 16 Jul 2021 18:12:21 -0700	[thread overview]
Message-ID: <YPIudYST90WC4k4U@yury-ThinkPad> (raw)
In-Reply-To: <3d83c6b11dae4a83add9d8dcc04bbd80@hisilicon.com>

On Sat, Jul 17, 2021 at 12:16:48AM +0000, Song Bao Hua (Barry Song) wrote:
> 
> 
> > -----Original Message-----
> > From: Yury Norov [mailto:yury.norov@gmail.com]
> > Sent: Saturday, July 17, 2021 8:04 AM
> > To: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>
> > Cc: gregkh@linuxfoundation.org; akpm@linux-foundation.org;
> > andriy.shevchenko@linux.intel.com; linux-kernel@vger.kernel.org;
> > dave.hansen@intel.com; linux@rasmusvillemoes.dk; rafael@kernel.org;
> > rdunlap@infradead.org; agordeev@linux.ibm.com; sbrivio@redhat.com;
> > jianpeng.ma@intel.com; valentin.schneider@arm.com; peterz@infradead.org;
> > bristot@redhat.com; guodong.xu@linaro.org; tangchengchang
> > <tangchengchang@huawei.com>; Zengtao (B) <prime.zeng@hisilicon.com>;
> > yangyicong <yangyicong@huawei.com>; tim.c.chen@linux.intel.com; Linuxarm
> > <linuxarm@huawei.com>; tiantao (H) <tiantao6@hisilicon.com>
> > Subject: Re: [PATCH v7 2/4] topology: use bin_attribute to break the size
> > limitation of cpumap ABI
> > 
> > On Fri, Jul 16, 2021 at 08:49:58AM +0000, Song Bao Hua (Barry Song) wrote:
> > > Hi Yury,
> > > Not sure if I have totally got your idea. But please see if the below
> > > is closer to what you prefer.
> > >
> > > I haven't really tested it. But this approach can somehow solve the
> > > problem you mentioned(malloc/free and printing is done 1000times for
> > > a 1MB buffer which is read 1K each time).
> > >
> > > Bitmap provides some API to alloc and return print_buf:
> > >
> > > +ssize_t bitmap_get_print_buf(bool list, char **buf, const unsigned long
> > *maskp,
> > > +               int nmaskbits)
> > > +{
> > > +       const char *fmt = list ? "%*pbl\n" : "%*pb\n";
> > > +       ssize_t size;
> > > +
> > > +       size = snprintf(NULL, 0, fmt, nmaskbits, maskp);
> > > +       *buf = kmalloc_track_caller(size + 1, GFP_KERNEL);
> > > +       scnprintf(*buf, size + 1, fmt, nmaskbits, maskp);
> > > +
> > > +       return size + 1;
> > > +}
> > > +
> > > +static inline ssize_t
> > > +cpumap_get_print_buf(bool list, char **buf, const struct cpumask *mask)
> > > +{
> > > +       return bitmap_get_print_buf(list, buf, cpumask_bits(mask),
> > > +                       nr_cpu_ids);
> > > +}
> > > +
> > > +struct bitmap_print_buf
> > > +{
> > > +       char *buf;
> > > +       ssize_t size;
> > > +};
> > > +
> > >
> > > In bin_attribute, move to get and save the buffer while sysfs entry is
> > > read at the first time and free it when file arrives EOF:
> > >
> > >  #define define_id_show_func(name)                                      \
> > >  static ssize_t name##_show(struct device *dev,                         \
> > >                            struct device_attribute *attr, char *buf)    \
> > > @@ -27,9 +53,27 @@ static ssize_t name##_read(struct file *file, struct kobject
> > *kobj,          \
> > >                                   loff_t off, size_t count)                     \
> > >  {                                                                              \
> > >         struct device *dev = kobj_to_dev(kobj);                                 \
> > > -                                                                               \
> > > -       return cpumap_print_to_buf(false, buf, topology_##mask(dev->id),
> > \
> > > -                                  off, count);                                 \
> > > +       struct bitmap_print_buf *bmb = dev_get_drvdata(dev);                    \
> > > +       if (!bmb) {                                                             \
> > > +               bmb = devm_kzalloc(dev, sizeof(*bmb), GFP_KERNEL);              \
> > > +               if (!bmb)                                                       \
> > > +                       return -ENOMEM;                                         \
> > > +               dev_set_drvdata(dev, bmb);                                      \
> > > +       }                                                                       \
> > > +       /* for the 1st time, getting the printed buffer */                \
> > > +       if (!bmb->buf)                                                           \
> > > +               bmb->size = cpumap_get_print_buf(false, &bmb->buf,       \
> > > +                                    topology_##mask(dev->id));                 \
> > > +       /* when we arrive EOF, free the printed buffer */                       \
> > > +       if (off >= bmb->size) {                                                 \
> > > +               kfree(bmb->buf);  bmb->buf = NULL;
> > \
> > > +               return 0;                                                       \
> > > +       }                                                                       \
> > > +       /* while a large printed buffer is read many times, we reuse         \
> > > +        * the buffer we get at the 1st time                                    \
> > > +        */                                                                     \
> > > +       strncpy(buf, bmb->buf + off, count);                                    \
> > > +       return min(count,  bmb->size - off);                                    \
> > >  }                                                                              \
> > >                                                                                 \
> > > This means a huge change in drivers though I am not sure Greg is
> > > a fan of this approach. Anyway, "1000 times" is not a real case.
> > > Typically we will arrive EOF after one time.
> > 
> > Not a real case in your driver doesn't mean not a real case at all.
> > You're adding your code to the very basic core layer, and so you'd
> > consider all possible scenarios, not only your particular one.
> > 
> 
> Generally yes. And generally I agree with your point. That point is
> exactly what I have always adhered to in software design. But my point
> I have been arguing is that the new APIs are for sysfs ABI only. So it
> is not particular one, it is common.
> 
> In my understanding, only ABI things to users will need to print bitmap
> to mask or list. Rarely a kernel module will need it. Kernel modules
> would be running real and/or/andnot bit operations on binary bitmap
> directly.
> 
> Anyway, I'm glad to take a step in the way you prefer.
> 
> > On the other hand, if you add your function(s) at drivers/base/node.c
> > and explain that O(nbits**2) 'is not a real case' in _this_ driver -
> > I think it will be acceptable. Maybe this is your choice...
> > 
> > > Thanks
> > > Barry
> > 
> > > Not sure if I have totally got your idea. But please see if the below
> > > is closer to what you prefer.
> > >
> > > I haven't really tested it. But this approach can somehow solve the
> > > problem you mentioned(malloc/free and printing is done 1000times for
> > > a 1MB buffer which is read 1K each time).
> > >
> > > Bitmap provides some API to alloc and return print_buf:
> > 
> > I'm not too familar to the topology things, and in fact never exposed
> > anything thru the sysfs.
> > 
> > From general knowledge, it's better to allocate memory for the string
> > on file creation to avoid situation when kernel has no memory to allocate
> > it when user tries to read.
> > 
> > So from my perspective, the most straightforward solution would be:
> > 
> > register(cpumask)
> > {
> >         size_t max_size = cpumap_max_string_size(list, cpumask)
> >         void *buf = kmalloc(max_size, ...);
> > 
> >         sysfs_create_file(..., buf)
> > }
> > 
> > unregister()
> > {
> >         kfree(buf);
> > }
> > 
> > show()
> > {
> >         snprintf(buf, max_size, "*pbl", cpumask);
> > }
> > 
> 
> Generally good idea. However, for sysfs ABI entries, it might not be
> that true.
> 
> A sysfs entry might never be read for its whole life. As I explained
> before, a sysfs entry - especially for list,  is randomly "cat" by users.
> Many of them won't be read forever. And after they are read once, they
> will probably never be read again. The operations to read ABI could be
> random and rare. Performance wouldn't be a concern.
> 
> To avoid holding the memory which might never be used, it is better to
> allocate and free the memory during runtime. I mean to allocate in show()
> and free in show(), aka, to do it on demand.
> 
> For example, for a server with 256CPU and each cpu has dozens of sysfs ABI
> entries, only a few of sysfs list entries might be randomly "cat" by users.
> Holding 256*entries memory doesn't look good.

Ok, makes sense.
 
> > This would require to add bitmap_max_string_size(list, bitmap, nbits),
> > but it's O(1), and I think, others will find it helpful.
> 
> What about getting size and memory at the same time?

1. We already have kasprintf()
2. It breaks coding style.

Documentation/process/coding-style.rst:
        Functions should be short and sweet, and do just one thing. 

From practical point of view, there should be some balance between
granularity and ease-of-use. But in this case, bitmap_list cries for 
a function that will help to estimate size of output buffer. And it's
easy to imagine a case where the estimated length of bitmap is needed
explicitly:

        size_t max_size = bitmap_max_string_size(nbits);
        char *buf = kmalloc(PAGE_ALIGN(max_size) * nr_cpus);

Thought, I don't insist. In your driver you can do:

        size_t size = snprintf(NULL, 0, ...);
        void *buf = kmalloc(size);

It will be fully correct, and you already have everything you need.
 
> ssize_t bitmap_get_print_buf(bool list, char **buf, const unsigned long
>  *maskp, int nmaskbits)
> 
> ssize_t cpumap_get_print_buf(bool list, char **buf, const struct cpumask *mask);
>
> This API returns the size of printed buffer, and it also gets the
> printed result saved in *buf. Then drivers don't need to do three
> steps:
> 
> 1. get cpumap buffer size which is your cpumap_max_string_size()
> 2. allocate memory for buffer according to size got in step 1
> 3. print bitmap(cpumap) to buffer by "pbl"
> 
> It will only need to call bitmap_get_print_buf() and all three
> things are done inside bitmap_get_print_buf().
> 
> How to use the size and memory allocated in cpumap_get_print_buf
> will be totally up to users.
> 
> The other benefit for this is that if we get string size during initialization,
> and then we print in show() entries, the size got at the beginning might be not
> enough as system topology might have changed. Sysfs ABI reflects the status of
> system at this moment.
> 
> > 
> > Again, I'm not professional with sysfs, and fully admit that it might
> > be wrong.
> 
> Never mind.
> 
> Thanks
> Barry

  reply	other threads:[~2021-07-17  1:12 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-15 11:58 [PATCH v7 0/4] use bin_attribute to break the size limitation of cpumap ABI Barry Song
2021-07-15 11:58 ` [PATCH v7 1/4] cpumask: introduce cpumap_print_to_buf to support large bitmask and list Barry Song
2021-07-15 15:28   ` Yury Norov
2021-07-15 21:08     ` Song Bao Hua (Barry Song)
2021-07-16  0:57     ` Song Bao Hua (Barry Song)
2021-07-15 11:58 ` [PATCH v7 2/4] topology: use bin_attribute to break the size limitation of cpumap ABI Barry Song
2021-07-16  8:49   ` Song Bao Hua (Barry Song)
2021-07-16 20:04     ` Yury Norov
2021-07-17  0:16       ` Song Bao Hua (Barry Song)
2021-07-17  1:12         ` Yury Norov [this message]
2021-07-19  9:07           ` andriy.shevchenko
2021-07-19 11:10             ` Song Bao Hua (Barry Song)
2021-07-19 17:10               ` Yury Norov
2021-07-21  9:30                 ` Song Bao Hua (Barry Song)
2021-07-15 11:58 ` [PATCH v7 3/4] drivers/base/node.c: " Barry Song
2021-07-15 11:58 ` [PATCH v7 4/4] lib: test_bitmap: add bitmap_print_to_buf test cases Barry Song
2021-07-15 12:09   ` Andy Shevchenko
2021-07-15 20:47     ` Yury Norov
2021-07-15 21:32       ` Andy Shevchenko
2021-07-15 23:23         ` Yury Norov
2021-07-16  0:41           ` Song Bao Hua (Barry Song)
2021-07-21 11:40           ` Greg Kroah-Hartman
2021-07-22 17:09             ` Yury Norov
2021-07-22 17:47               ` Greg Kroah-Hartman
2021-07-22 18:27                 ` Yury Norov
2021-07-22 18:36                   ` Andy Shevchenko
2021-07-22 18:45                   ` Greg Kroah-Hartman
2021-07-28  9:08                     ` Song Bao Hua (Barry Song)
2021-07-28  9:24                       ` Andy Shevchenko
2021-07-15 15:36   ` Yury Norov
2021-07-28 13:41 ` [PATCH v7 0/4] use bin_attribute to break the size limitation of cpumap ABI Greg KH
2021-07-28 14:53   ` Yury Norov
2021-07-28 15:25     ` Greg KH
2021-07-28 18:58 ` [PATCH] bitmap: extend comment to bitmap_print_to_buf Yury Norov
2021-07-29  6:04   ` Song Bao Hua (Barry Song)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YPIudYST90WC4k4U@yury-ThinkPad \
    --to=yury.norov@gmail.com \
    --cc=agordeev@linux.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=andriy.shevchenko@linux.intel.com \
    --cc=bristot@redhat.com \
    --cc=dave.hansen@intel.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=guodong.xu@linaro.org \
    --cc=jianpeng.ma@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@rasmusvillemoes.dk \
    --cc=linuxarm@huawei.com \
    --cc=peterz@infradead.org \
    --cc=prime.zeng@hisilicon.com \
    --cc=rafael@kernel.org \
    --cc=rdunlap@infradead.org \
    --cc=sbrivio@redhat.com \
    --cc=song.bao.hua@hisilicon.com \
    --cc=tangchengchang@huawei.com \
    --cc=tiantao6@hisilicon.com \
    --cc=tim.c.chen@linux.intel.com \
    --cc=valentin.schneider@arm.com \
    --cc=yangyicong@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).