Linux-Block Archive on lore.kernel.org
From: "Ray, Mark C (Global Solutions Engineering (GSE))" <mark.ray@hpe.com>
To: Ming Lei <ming.lei@redhat.com>, Greg KH <gregkh@linuxfoundation.org>
Cc: Jens Axboe <axboe@kernel.dk>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"stable@vger.kernel.org" <stable@vger.kernel.org>,
	"Ray, Mark C (Global Solutions Engineering (GSE))" <mark.ray@hpe.com>
Subject: RE: [PATCH] blk-mq: avoid sysfs buffer overflow by too many CPU cores
Date: Thu, 15 Aug 2019 23:10:35 +0000
Message-ID: <AT5PR8401MB05784C37BAF2939B776103FC99AC0@AT5PR8401MB0578.NAMPRD84.PROD.OUTLOOK.COM> (raw)
In-Reply-To: <20190815124321.GB28032@ming.t460p>

Hi Ming,

In the customer case, the cpu_list file was not needed. It was just part of a SAP HANA script to collect all the block device data (similar to sosreport). So they were just dumping everything, and it picks up the mq-related files.

I know with IRQs we have bitmaps/masks and can represent the list as a range such as "0-27", without listing every CPU. I'm sure there are lots of options to address this, and getting rid of cpu_list is one of them.

Best Regards,

Mark Ray
HPE Global Solutions Engineering
mark.ray@hpe.com



-----Original Message-----
From: Ming Lei [mailto:ming.lei@redhat.com] 
Sent: Thursday, August 15, 2019 9:43 PM
To: Greg KH <gregkh@linuxfoundation.org>
Cc: Jens Axboe <axboe@kernel.dk>; linux-block@vger.kernel.org; stable@vger.kernel.org; Ray, Mark C (Global Solutions Engineering (GSE)) <mark.ray@hpe.com>
Subject: Re: [PATCH] blk-mq: avoid sysfs buffer overflow by too many CPU cores

On Thu, Aug 15, 2019 at 02:35:35PM +0200, Greg KH wrote:
> On Thu, Aug 15, 2019 at 08:29:10PM +0800, Ming Lei wrote:
> > On Thu, Aug 15, 2019 at 02:24:19PM +0200, Greg KH wrote:
> > > On Thu, Aug 15, 2019 at 08:15:18PM +0800, Ming Lei wrote:
> > > > It is reported that sysfs buffer overflow can be triggered in 
> > > > case of too many CPU cores (>841 on 4K PAGE_SIZE) when showing
> > > > CPUs in one hctx.
> > > > 
> > > > So use snprintf for avoiding the potential buffer overflow.
> > > > 
> > > > Cc: stable@vger.kernel.org
> > > > Cc: Mark Ray <mark.ray@hpe.com>
> > > > Fixes: 676141e48af7 ("blk-mq: don't dump CPU -> hw queue map on driver load")
> > > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > > ---
> > > >  block/blk-mq-sysfs.c | 30 ++++++++++++++++++------------
> > > >  1 file changed, 18 insertions(+), 12 deletions(-)
> > > > 
> > > > diff --git a/block/blk-mq-sysfs.c b/block/blk-mq-sysfs.c
> > > > index d6e1a9bd7131..e75f41a98415 100644
> > > > --- a/block/blk-mq-sysfs.c
> > > > +++ b/block/blk-mq-sysfs.c
> > > > @@ -164,22 +164,28 @@ static ssize_t blk_mq_hw_sysfs_nr_reserved_tags_show(struct blk_mq_hw_ctx *hctx,
> > > >  	return sprintf(page, "%u\n", hctx->tags->nr_reserved_tags);
> > > >  }
> > > >  
> > > > +/* avoid overflow by too many CPU cores */
> > > >  static ssize_t blk_mq_hw_sysfs_cpus_show(struct blk_mq_hw_ctx *hctx, char *page)
> > > >  {
> > > > -	unsigned int i, first = 1;
> > > > -	ssize_t ret = 0;
> > > > -
> > > > -	for_each_cpu(i, hctx->cpumask) {
> > > > -		if (first)
> > > > -			ret += sprintf(ret + page, "%u", i);
> > > > -		else
> > > > -			ret += sprintf(ret + page, ", %u", i);
> > > > -
> > > > -		first = 0;
> > > > +	unsigned int cpu = cpumask_first(hctx->cpumask);
> > > > +	ssize_t len = snprintf(page, PAGE_SIZE - 1, "%u", cpu);
> > > > +	int last_len = len;
> > > > +
> > > > +	while ((cpu = cpumask_next(cpu, hctx->cpumask)) < nr_cpu_ids) {
> > > > +		int cur_len = snprintf(page + len, PAGE_SIZE - 1 - len,
> > > > +				       ", %u", cpu);
> > > > +		if (cur_len >= PAGE_SIZE - 1 - len) {
> > > > +			len -= last_len;
> > > > +			len += snprintf(page + len, PAGE_SIZE - 1 - len,
> > > > +					"...");
> > > > +			break;
> > > > +		}
> > > > +		len += cur_len;
> > > > +		last_len = cur_len;
> > > >  	}
> > > >  
> > > > -	ret += sprintf(ret + page, "\n");
> > > > -	return ret;
> > > > +	len += snprintf(page + len, PAGE_SIZE - 1 - len, "\n");
> > > > +	return len;
> > > >  }
> > > >
> > > 
> > > What????
> > > 
> > > sysfs is "one value per file".  You should NEVER have to care 
> > > about the size of the sysfs buffer.  If you do, you are doing something wrong.
> > > 
> > > What exactly are you trying to show in this sysfs file?  I can't
> > > seem to find the Documentation/ABI/ entry for it, am I just
> > > missing it because I don't know the filename for it?
> > 
> > It is /sys/block/$DEV/mq/$N/cpu_list; all CPUs in this hctx ($N) are
> > shown via the sysfs buffer. The buffer size is one page, so how can it
> > hold the list when there are too many CPUs (close to 1K)?
> 
> Looks like I only see 1 cpu listed on my machines in those files, what 
> am I doing wrong?

It depends on the machine. The issue was reported on a machine with 896 CPU cores, where the 4K buffer can only hold 841 of them.

> 
> Also, I don't see cpu_list in any of the documentation files, so I 
> have no idea what you are trying to have this file show.
> 
> And again, "one value per file" is the sysfs rule.  "all cpus in the 
> system" is not "one value" :)

I agree, and this file shouldn't be there, given that each CPU already has one kobject dir under the hctx dir.

We may kill the 'cpu_list' attribute; does anyone object?


Thanks,
Ming


Thread overview: 10+ messages
2019-08-15 12:15 Ming Lei
2019-08-15 12:24 ` Greg KH
2019-08-15 12:29   ` Ming Lei
2019-08-15 12:35     ` Greg KH
2019-08-15 12:43       ` Ming Lei
2019-08-15 13:21         ` Greg KH
2019-08-15 23:10         ` Ray, Mark C (Global Solutions Engineering (GSE)) [this message]
2019-08-16  2:49           ` Ming Lei
2019-08-16  7:12             ` Greg KH
2019-08-16 14:21               ` Jens Axboe
