linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Song Bao Hua (Barry Song)" <song.bao.hua@hisilicon.com>
To: Dave Hansen <dave.hansen@intel.com>,
	"tiantao (H)" <tiantao6@hisilicon.com>,
	"corbet@lwn.net" <corbet@lwn.net>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>
Cc: "linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	"Valentin Schneider" <valentin.schneider@arm.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Linuxarm <linuxarm@huawei.com>
Subject: RE: [PATCH 1/2] CPU, NUMA topology ABIs: clarify the overflow issue of sysfs pagebuf
Date: Fri, 23 Jul 2021 11:20:19 +0000	[thread overview]
Message-ID: <e9610060a8d046e783ab9a229f35410c@hisilicon.com> (raw)
In-Reply-To: <fd78ac30-dd3b-a7d7-eae8-193b09a7d49a@intel.com>



> -----Original Message-----
> From: Dave Hansen [mailto:dave.hansen@intel.com]
> Sent: Friday, April 30, 2021 10:39 AM
> To: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>; tiantao (H)
> <tiantao6@hisilicon.com>; corbet@lwn.net; gregkh@linuxfoundation.org
> Cc: linux-doc@vger.kernel.org; linux-kernel@vger.kernel.org; Rafael J.
> Wysocki <rafael@kernel.org>; Peter Zijlstra <peterz@infradead.org>; Valentin
> Schneider <valentin.schneider@arm.com>; Dave Hansen
> <dave.hansen@linux.intel.com>; Daniel Bristot de Oliveira <bristot@redhat.com>
> Subject: Re: [PATCH 1/2] CPU, NUMA topology ABIs: clarify the overflow issue
> of sysfs pagebuf
> 
> On 4/29/21 3:32 PM, Song Bao Hua (Barry Song) wrote:
> > $ strace numactl --hardware  2>&1 | grep cpu
> > openat(AT_FDCWD, "/sys/devices/system/cpu",
> > O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
> > openat(AT_FDCWD, "/sys/devices/system/node/node0/cpumap", O_RDONLY) = 3
> > openat(AT_FDCWD, "/sys/devices/system/node/node1/cpumap", O_RDONLY) = 3
> > openat(AT_FDCWD, "/sys/devices/system/node/node2/cpumap", O_RDONLY) = 3
> > openat(AT_FDCWD, "/sys/devices/system/node/node3/cpumap", O_RDONLY) = 3
> >
> > If we move to binary, it means we have to change those applications.
> 
> I thought Greg was saying to using a sysfs binary attribute using
> something like like sysfs_create_bin_file().  Those don't have the
> PAGE_SIZE limitation.  But, there's also nothing to keep us from spewing
> nice human-readable text via the "binary" file.
> 
> We don't need to change the file format, just the internal kernel API
> that we produce the files with.

Sorry for waking-up the old thread.

I am not sure how common a regular device_attribute will be larger than
4KB and have to work around by bin_attribute. But I wrote a prototype
which can initially support large regular sysfs entry and be able to
fill the entire buffer by only one show() entry. The other words to say,
we don't need to call read() of bin_attribute multiple times for a large
regular file. Right now, only read-only attribute is supported.

Subject: [RFC PATCH] sysfs: support regular device attr which can be larger than
 PAGE_SIZE

A regular sysfs ABI could be more than 4KB, right now, we are using
bin_attribute to workaround and break this limit. A better solution
would be supporting long device attribute. In this case, we will
still be able to enjoy the advantages of buffering and seeking of
seq file and only need to fill the entire buffer of sysfs entry
once.

Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
---
 drivers/base/core.c      |  2 +-
 fs/seq_file.c            | 22 ++++++++++++++++++++++
 fs/sysfs/file.c          | 40 +++++++++++++++++++++++++++++++++++++++-
 fs/sysfs/group.c         |  4 ++--
 include/linux/device.h   | 20 ++++++++++++++++++++
 include/linux/seq_file.h |  1 +
 include/linux/sysfs.h    |  1 +
 7 files changed, 86 insertions(+), 4 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index cadcade65825..0cd4ed165154 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -2053,7 +2053,7 @@ static ssize_t dev_attr_show(struct kobject *kobj, struct attribute *attr,
 
 	if (dev_attr->show)
 		ret = dev_attr->show(dev, dev_attr, buf);
-	if (ret >= (ssize_t)PAGE_SIZE) {
+	if (ret >= (ssize_t)PAGE_SIZE && !(attr->mode & SYSFS_LONGATTR)) {
 		printk("dev_attr_show: %pS returned bad count\n",
 				dev_attr->show);
 	}
diff --git a/fs/seq_file.c b/fs/seq_file.c
index b117b212ef28..9054615f8f19 100644
--- a/fs/seq_file.c
+++ b/fs/seq_file.c
@@ -84,6 +84,28 @@ int seq_open(struct file *file, const struct seq_operations *op)
 }
 EXPORT_SYMBOL(seq_open);
 
+/**
+ *	seq_open_prealloc_buf -	allow seq_file users to preallocate buf
+ *	@file: file we initialize
+ *	@size: the maximum size of the file
+ *
+ *	apply to those scenerios single_open_size() doesn't apply. for
+ *	example, while a regular sysfs entry is more than PAGE_SIZE;
+ *	this will permit users to fill the entire buffer by calling
+ *	device_attr show() only once
+ */
+int seq_open_prealloc_buf(struct file *file, unsigned long size)
+{
+	void *buf = seq_buf_alloc(size);
+	if (buf)
+		return -ENOMEM;
+
+	((struct seq_file *)file->private_data)->buf = buf;
+	((struct seq_file *)file->private_data)->size = size;
+
+	return 0;
+}
+
 static int traverse(struct seq_file *m, loff_t offset)
 {
 	loff_t pos = 0;
diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index 9aefa7779b29..09ae12c2326c 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -16,6 +16,7 @@
 #include <linux/mutex.h>
 #include <linux/seq_file.h>
 #include <linux/mm.h>
+#include <linux/device.h> /* for device's longattr support */
 
 #include "sysfs.h"
 
@@ -202,6 +203,32 @@ void sysfs_notify(struct kobject *kobj, const char *dir, const char *attr)
 }
 EXPORT_SYMBOL_GPL(sysfs_notify);
 
+static int sysfs_kf_longattr_seq_show(struct seq_file *sf, void *v)
+{
+	struct kernfs_open_file *of = sf->private;
+	struct kobject *kobj = of->kn->parent->priv;
+	const struct sysfs_ops *ops = sysfs_file_ops(of->kn);
+	ssize_t count;
+	char *buf;
+
+	count = seq_get_buf(sf, &buf);
+
+	if (ops->show) {
+		count = ops->show(kobj, of->kn->priv, buf);
+		if (count < 0)
+			return count;
+	}
+
+	seq_commit(sf, count);
+	return 0;
+}
+
+static int sysfs_longattr_open(struct kernfs_open_file *of)
+{
+	struct device_long_attribute *lattr = (struct device_long_attribute *)of->kn->priv;
+	return seq_open_prealloc_buf(of->file, lattr->size);
+}
+
 static const struct kernfs_ops sysfs_file_kfops_empty = {
 };
 
@@ -223,6 +250,11 @@ static const struct kernfs_ops sysfs_prealloc_kfops_ro = {
 	.prealloc	= true,
 };
 
+static struct kernfs_ops sysfs_longattr_kfops_ro = {
+	.open		= sysfs_longattr_open,
+	.seq_show 	= sysfs_kf_longattr_seq_show,
+};
+
 static const struct kernfs_ops sysfs_prealloc_kfops_wo = {
 	.write		= sysfs_kf_write,
 	.prealloc	= true,
@@ -276,6 +308,8 @@ int sysfs_add_file_mode_ns(struct kernfs_node *parent,
 		if (sysfs_ops->show && sysfs_ops->store) {
 			if (mode & SYSFS_PREALLOC)
 				ops = &sysfs_prealloc_kfops_rw;
+			else if (mode & SYSFS_LONGATTR)
+				ops = &sysfs_longattr_kfops_ro;
 			else
 				ops = &sysfs_file_kfops_rw;
 		} else if (sysfs_ops->show) {
@@ -291,7 +325,11 @@ int sysfs_add_file_mode_ns(struct kernfs_node *parent,
 		} else
 			ops = &sysfs_file_kfops_empty;
 
-		size = PAGE_SIZE;
+		if (mode & SYSFS_LONGATTR) {
+			size = ((struct device_long_attribute *)attr)->size;
+		} else {
+			size = PAGE_SIZE;
+		}
 	} else {
 		struct bin_attribute *battr = (void *)attr;
 
diff --git a/fs/sysfs/group.c b/fs/sysfs/group.c
index 64e6a6698935..2d80b05550d1 100644
--- a/fs/sysfs/group.c
+++ b/fs/sysfs/group.c
@@ -56,11 +56,11 @@ static int create_files(struct kernfs_node *parent, struct kobject *kobj,
 					continue;
 			}
 
-			WARN(mode & ~(SYSFS_PREALLOC | 0664),
+			WARN(mode & ~(SYSFS_LONGATTR | SYSFS_PREALLOC | 0664),
 			     "Attribute %s: Invalid permissions 0%o\n",
 			     (*attr)->name, mode);
 
-			mode &= SYSFS_PREALLOC | 0664;
+			mode &= SYSFS_LONGATTR | SYSFS_PREALLOC | 0664;
 			error = sysfs_add_file_mode_ns(parent, *attr, false,
 						       mode, uid, gid, NULL);
 			if (unlikely(error))
diff --git a/include/linux/device.h b/include/linux/device.h
index 59940f1744c1..791a3d25f0bb 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -150,6 +150,26 @@ ssize_t device_store_bool(struct device *dev, struct device_attribute *attr,
 	struct device_attribute dev_attr_##_name =		\
 		__ATTR_IGNORE_LOCKDEP(_name, _mode, _show, _store)
 
+/*
+ * for those devices whose attributes are larger than 4KB but still want
+ * to take the advantages of regular sysfs, like show() method is able to
+ * fill the entire buffer by one read operation
+ */
+struct device_long_attribute {
+	struct device_attribute attr;
+	size_t size;
+};
+
+#define __LONG_ATTR_RO(_name, _size) {                                 \
+       .attr.attr = {.name = __stringify(_name),                       \
+		     .mode = SYSFS_LONGATTR | 0444 },                  \
+       .attr.show   = _name##_show,                                    \
+       .size = _size,                                                  \
+}
+
+#define LONG_ATTR_RO(_name, _size) \
+struct device_long_attribute dev_attr_##_name = __LONG_ATTR_RO(_name, _size)
+
 int device_create_file(struct device *device,
 		       const struct device_attribute *entry);
 void device_remove_file(struct device *dev,
diff --git a/include/linux/seq_file.h b/include/linux/seq_file.h
index dd99569595fd..e7357fc91c1c 100644
--- a/include/linux/seq_file.h
+++ b/include/linux/seq_file.h
@@ -106,6 +106,7 @@ void seq_pad(struct seq_file *m, char c);
 
 char *mangle_path(char *s, const char *p, const char *esc);
 int seq_open(struct file *, const struct seq_operations *);
+int seq_open_prealloc_buf(struct file *, unsigned long);
 ssize_t seq_read(struct file *, char __user *, size_t, loff_t *);
 ssize_t seq_read_iter(struct kiocb *iocb, struct iov_iter *iter);
 loff_t seq_lseek(struct file *, loff_t, int);
diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h
index a12556a4b93a..9198afd46fb0 100644
--- a/include/linux/sysfs.h
+++ b/include/linux/sysfs.h
@@ -97,6 +97,7 @@ struct attribute_group {
  */
 
 #define SYSFS_PREALLOC 010000
+#define SYSFS_LONGATTR 020000
 
 #define __ATTR(_name, _mode, _show, _store) {				\
 	.attr = {.name = __stringify(_name),				\
-- 
2.25.1

very simple way to use it:
Add some long attribute by:

LONG_ATTR_RO(xxx, 2 * PAGE_SIZE);
LONG_ATTR_RO(yyy, 2 * PAGE_SIZE);
....
Then add xxx and yyy to attribute list as we usually do
for normal device_attribute:
struct attribute *attrs[] = {
	&dev_attr_xxx.attr.attr,
	&dev_attr_yyy.attr.attr,
}

Not quite sure if it is valuable. If it is yes, I will
split the code and send a RFC patchset.

Thanks
Barry


  parent reply	other threads:[~2021-07-23 11:36 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-29  7:03 [PATCH 0/2] clarify and cleanup CPU and NUMA topology ABIs Tian Tao
2021-04-29  7:03 ` [PATCH 1/2] CPU, NUMA topology ABIs: clarify the overflow issue of sysfs pagebuf Tian Tao
2021-04-29 14:21   ` Dave Hansen
2021-04-29 15:46     ` Greg KH
2021-04-29 21:08     ` Song Bao Hua (Barry Song)
2021-04-29 21:38       ` Dave Hansen
2021-04-29 22:32         ` Song Bao Hua (Barry Song)
2021-04-29 22:38           ` Dave Hansen
2021-04-29 22:48             ` Song Bao Hua (Barry Song)
2021-04-30  6:05             ` gregkh
2021-07-23 11:20             ` Song Bao Hua (Barry Song) [this message]
2021-07-23 11:28               ` gregkh
2021-07-23 12:48                 ` Song Bao Hua (Barry Song)
2021-05-06 23:16   ` kernel test robot
2021-04-29  7:03 ` [PATCH 2/2] Documentation/ABI: Move the topology-related sysfs interface to the right place Tian Tao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e9610060a8d046e783ab9a229f35410c@hisilicon.com \
    --to=song.bao.hua@hisilicon.com \
    --cc=bristot@redhat.com \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxarm@huawei.com \
    --cc=peterz@infradead.org \
    --cc=rafael@kernel.org \
    --cc=tiantao6@hisilicon.com \
    --cc=valentin.schneider@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).