linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Jens Axboe <axboe@kernel.dk>, Tejun Heo <tj@kernel.org>,
	linux-block@vger.kernel.org, linux-xfs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 13/13] xfs: convert xfs_sysfs attrs to use ->seq_show
Date: Tue, 14 Sep 2021 07:12:43 +0200	[thread overview]
Message-ID: <YUAvSx42abg5S2ym@kroah.com> (raw)
In-Reply-To: <20210914012029.GF2361455@dread.disaster.area>

On Tue, Sep 14, 2021 at 11:20:29AM +1000, Dave Chinner wrote:
> On Mon, Sep 13, 2021 at 08:27:50AM +0200, Greg Kroah-Hartman wrote:
> > On Mon, Sep 13, 2021 at 07:41:21AM +0200, Christoph Hellwig wrote:
> > > Trivial conversion to the seq_file based sysfs attributes.
> > > 
> > > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > > ---
> > >  fs/xfs/xfs_stats.c | 24 +++++-------
> > >  fs/xfs/xfs_stats.h |  2 +-
> > >  fs/xfs/xfs_sysfs.c | 96 +++++++++++++++++++++++-----------------------
> > >  3 files changed, 58 insertions(+), 64 deletions(-)
> > > 
> > > diff --git a/fs/xfs/xfs_stats.c b/fs/xfs/xfs_stats.c
> > > index 20e0534a772c9..71e7a84ba0403 100644
> > > --- a/fs/xfs/xfs_stats.c
> > > +++ b/fs/xfs/xfs_stats.c
> > > @@ -16,10 +16,9 @@ static int counter_val(struct xfsstats __percpu *stats, int idx)
> > >  	return val;
> > >  }
> > >  
> > > -int xfs_stats_format(struct xfsstats __percpu *stats, char *buf)
> > > +void xfs_stats_format(struct xfsstats __percpu *stats, struct seq_file *sf)
> > >  {
> > >  	int		i, j;
> > > -	int		len = 0;
> > >  	uint64_t	xs_xstrat_bytes = 0;
> > >  	uint64_t	xs_write_bytes = 0;
> > >  	uint64_t	xs_read_bytes = 0;
> > > @@ -58,13 +57,12 @@ int xfs_stats_format(struct xfsstats __percpu *stats, char *buf)
> > >  	/* Loop over all stats groups */
> > >  
> > >  	for (i = j = 0; i < ARRAY_SIZE(xstats); i++) {
> > > -		len += scnprintf(buf + len, PATH_MAX - len, "%s",
> > > -				xstats[i].desc);
> > > +		seq_printf(sf, "%s", xstats[i].desc);
> > > +
> > >  		/* inner loop does each group */
> > >  		for (; j < xstats[i].endpoint; j++)
> > > -			len += scnprintf(buf + len, PATH_MAX - len, " %u",
> > > -					counter_val(stats, j));
> > > -		len += scnprintf(buf + len, PATH_MAX - len, "\n");
> > > +			seq_printf(sf, " %u", counter_val(stats, j));
> > > +		seq_printf(sf, "\n");
> > >  	}
> > >  	/* extra precision counters */
> > >  	for_each_possible_cpu(i) {
> > > @@ -74,18 +72,14 @@ int xfs_stats_format(struct xfsstats __percpu *stats, char *buf)
> > >  		defer_relog += per_cpu_ptr(stats, i)->s.defer_relog;
> > >  	}
> > >  
> > > -	len += scnprintf(buf + len, PATH_MAX-len, "xpc %Lu %Lu %Lu\n",
> > > +	seq_printf(sf, "xpc %Lu %Lu %Lu\n",
> > >  			xs_xstrat_bytes, xs_write_bytes, xs_read_bytes);
> > > -	len += scnprintf(buf + len, PATH_MAX-len, "defer_relog %llu\n",
> > > -			defer_relog);
> > > -	len += scnprintf(buf + len, PATH_MAX-len, "debug %u\n",
> > > +	seq_printf(sf, "defer_relog %llu\n", defer_relog);
> > >  #if defined(DEBUG)
> > > -		1);
> > > +	seq_printf(sf, "debug 1\n");
> > >  #else
> > > -		0);
> > > +	seq_printf(sf, "debug 0\n");
> > >  #endif
> > > -
> > > -	return len;
> > >  }
> > 
> > That is a sysfs file?  What happened to the "one value per file" rule
> > here?
> 
> 
> There is no "rule" that says syfs files must contain one value per
> file; the documentation says that one value per file is the
> "preferred" format.  Documentation/filesystems/sysfs.rst:
> 
> [...]
> Attributes
> ...
> Attributes should be ASCII text files, preferably with only one value
> per file. It is noted that it may not be efficient to contain only one
> value per file, so it is socially acceptable to express an array of
> values of the same type.
> [...]
> 

An array of values is one thing like "what is the power states for this
device".  A list of different key/value pairs is a totally different
thing entirely.

> We are exposing a large array of integer values here, so multiple
> values per file are explicitly considered an acceptible format.

Not really, that was not the goal of sysfs at all.

> Further, as there are roughly 200 individual stats in this file and
> calculating each stat requires per-cpu aggregation, the the cost of
> calculating and reading each stat individually is prohibitive, not
> just inefficient.

Have you measured it?  How often does the file get read and by what
tools?

We have learned from our past mistakes in /proc where we did this in the
past and required keeping obsolete values and constantly tweaking
userspace parsers.  That is why we made sysfs one-value-per-file.  If
the file is not there, the value is not there, much easier to handle
future changes.

> So, yes, we might have multiple lines in the file that you can frown
> about, but OTOH the file format has been exposed as a kernel ABI for
> a couple of decades via /proc/fs/xfs/stat.

proc had no such rules, but we have learned :)

> Hence exposing it in
> sysfs to provide a more fine-grained breakdown of the stats (per
> mount instead of global) is a no-brainer. We don't have to rewrite
> the parsing engines in multiple userspace monitoring programs to
> extract this information from the kernel - they just create a new
> instance and read a different file and it all just works.

But then you run into the max size restriction on sysfs files
(PAGE_SIZE) and things break down.

Please don't do this.

> Indeed, there's precedence for such /proc file formats in more
> fine-grained sysfs files. e.g.  /sys/bus/node/devices/node<n>/vmstat
> and /sys/bus/node/devices/node<n>/meminfo retain the same format
> (and hence userspace parsers) for the per-node stats as /proc/vmstat
> and /proc/meminfo use for the global stats...

And I have complained about those files in the past many times.  And
they are running into problems in places dealing with them too.

> tl;dr: the file contains arrays of values, it's inefficient to read
> values one at a time, it's a pre-existing ABI-constrainted file
> format, there's precedence in core kernel statistics
> implementations and the documented guidelines allow this sort of
> usage in these cases.

I would prefer not to do this, and I will not take core sysfs changes to
make this any easier.

Which is one big reason why I don't like just making sysfs use the seq
file api, it would allow stuff like this to propagate to other places in
the kernel.

Maybe I should cut the file size of a sysfs file down to PAGE_SIZE/4 or
less, that might be better :)

thanks,

greg k-h

  reply	other threads:[~2021-09-14  5:13 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-13  5:41 start switching sysfs attributes to expose the seq_file Christoph Hellwig
2021-09-13  5:41 ` [PATCH 01/13] seq_file: mark seq_get_buf as deprecated Christoph Hellwig
2021-09-13 13:19   ` Christian Brauner
2021-09-13 16:22   ` Daniel Wagner
2021-09-13 16:29   ` Tejun Heo
2021-09-13  5:41 ` [PATCH 02/13] kernfs: remove kernfs_create_file and kernfs_create_file_ns Christoph Hellwig
2021-09-13 13:20   ` Christian Brauner
2021-09-13  5:41 ` [PATCH 03/13] kernfs: remove the unused lockdep_key field in struct kernfs_ops Christoph Hellwig
2021-09-13 13:21   ` Christian Brauner
2021-09-13 16:30   ` Tejun Heo
2021-09-13  5:41 ` [PATCH 04/13] sysfs: split out binary attribute handling from sysfs_add_file_mode_ns Christoph Hellwig
2021-09-13 13:26   ` Christian Brauner
2021-09-13  5:41 ` [PATCH 05/13] sysfs: refactor sysfs_add_file_mode_ns Christoph Hellwig
2021-09-13 13:27   ` Christian Brauner
2021-09-13  5:41 ` [PATCH 06/13] sysfs: simplify sysfs_kf_seq_show Christoph Hellwig
2021-09-13  5:41 ` [PATCH 07/13] sysfs: add ->seq_show support to sysfs_ops Christoph Hellwig
2021-09-13 13:30   ` Christian Brauner
2021-09-13  5:41 ` [PATCH 08/13] block: convert the blk_mq_hw_ctx attrs to use ->seq_show Christoph Hellwig
2021-09-13  5:41 ` [PATCH 09/13] block: convert the blk_integrity " Christoph Hellwig
2021-09-13  5:41 ` [PATCH 10/13] block: convert the request_queue " Christoph Hellwig
2021-09-13  5:41 ` [PATCH 11/13] block: convert the elevator_queue " Christoph Hellwig
2021-09-13  5:41 ` [PATCH 12/13] xfs: convert xfs_errortag " Christoph Hellwig
2021-09-13  5:41 ` [PATCH 13/13] xfs: convert xfs_sysfs " Christoph Hellwig
2021-09-13  6:27   ` Greg Kroah-Hartman
2021-09-14  1:20     ` Dave Chinner
2021-09-14  5:12       ` Greg Kroah-Hartman [this message]
2021-09-14 10:56         ` Dave Chinner
2021-09-14  7:30     ` Christoph Hellwig
2021-09-14 15:28       ` Greg Kroah-Hartman
2021-09-14 15:30         ` Christoph Hellwig
2021-09-14 15:41           ` Greg Kroah-Hartman
2021-09-15  7:04             ` Christoph Hellwig
2021-09-15  7:07               ` Greg Kroah-Hartman
2021-09-13 16:39 ` start switching sysfs attributes to expose the seq_file Bart Van Assche
2021-09-13 16:46   ` Greg Kroah-Hartman
2021-09-14  2:53     ` Bart Van Assche
2021-09-13 16:46 ` Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YUAvSx42abg5S2ym@kroah.com \
    --to=gregkh@linuxfoundation.org \
    --cc=axboe@kernel.dk \
    --cc=david@fromorbit.com \
    --cc=hch@lst.de \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=rafael@kernel.org \
    --cc=tj@kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).