From: "NeilBrown" <neilb@suse.de>
To: "Coly Li" <colyli@suse.de>
Cc: linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
linux-raid@vger.kernel.org, nvdimm@lists.linux.dev,
antlists@youngman.org.uk,
"Dan Williams" <dan.j.williams@intel.com>,
"Hannes Reinecke" <hare@suse.de>, "Jens Axboe" <axboe@kernel.dk>,
"Richard Fan" <richard.fan@suse.com>,
"Vishal L Verma" <vishal.l.verma@intel.com>,
"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
rafael@kernel.org
Subject: Re: Too large badblocks sysfs file (was: [PATCH v3 0/7] badblocks improvement for multiple bad block ranges)
Date: Thu, 23 Sep 2021 20:09:21 +1000
Message-ID: <163239176137.2580.11220971146920860651@noble.neil.brown.name>
In-Reply-To: <a0f7b021-4816-6785-a9a4-507464b55895@suse.de>
On Thu, 23 Sep 2021, Coly Li wrote:
> Hi all the kernel gurus, and folks in mailing lists,
>
> This is a question about exporting 4KB+ text information via sysfs
> interface. I need advice on how to handle the problem.
Why do you think there is a problem?
As documented in Documentation/admin-guide/md.rst, the truncation at 1
page is expected and by design.
The "unacknowledged_bad_blocks" file is the important one that is needed
for correct behaviour. Being able to read a single block is sufficient,
though being able to read more than one could provide better performance
in some cases.
The "bad_blocks" file primarily exists to provide visibility into the
state of the system - useful during development. It can be written to
to add bad blocks. It never *needs* to be read from.
The authoritative source of information about the set of bad blocks is
the on-disk data, which can be and should be read directly...
Except that mdadm does read it. That was a mistake. check_for_cleared_bb() is
wrong. I wonder why it was added. The commit message doesn't give any
justification.
NeilBrown
>
> Recently I have been working on improving the bad blocks API
> (block/badblocks.c). There is a sysfs file that exports the bad block
> ranges for md raid. E.g., for an md raid1 device, the file
> /sys/block/md0/md/rd0/bad_blocks
> may contain the following text,
> 64 32
> 128 8
> The above lines mean there are two bad block ranges: one starts at LBA
> 64 with a length of 32 sectors, the other starts at LBA 128 with a length
> of 8 sectors. All the content is generated from the internal bad block
> table, which holds up to 512 records. In my testing, in the worst case
> only 185 of the 512 records can be displayed via the sysfs file if the
> LBA strings are very long, e.g. the following content,
> 17668164135030776 512
> 17668164135029776 512
> 17668164135028776 512
> 17668164135027776 512
> ... ...
> The bad block ranges stored in internal bad blocks array are correct,
> but the output message is truncated. This is the problem I encountered.
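To make the size limit concrete, here is a minimal user-space sketch in Python. It assumes the "<start> <length>" line format shown above and a 4 KiB page. The names parse_bad_blocks() and records_that_fit() are hypothetical helpers invented for this illustration, not kernel or mdadm functions. The exact number of records that survive in practice also depends on how the kernel handles the final partial record, which this sketch does not model.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages; other page sizes exist


def parse_bad_blocks(text):
    """Parse "<start> <length>" lines into (start, length) tuples.

    Only newline-terminated lines are trusted. Anything after the last
    newline may be a record cut off at the page boundary, so it is dropped.
    """
    lines = text.split("\n")
    lines.pop()  # empty string, or an incomplete trailing record
    ranges = []
    for line in lines:
        fields = line.split()
        if len(fields) == 2 and fields[0].isdigit() and fields[1].isdigit():
            ranges.append((int(fields[0]), int(fields[1])))
    return ranges


def records_that_fit(ranges, page_size=PAGE_SIZE):
    """Count how many leading records fit, whole, in one page of text."""
    used = 0
    count = 0
    for start, length in ranges:
        record = f"{start} {length}\n"
        if used + len(record) > page_size:
            break
        used += len(record)
        count += 1
    return count
```

Calling records_that_fit() on a full table of 512 records with long LBAs shows that only a fraction of them can appear in a single read, while parse_bad_blocks() lets a reader ignore a record that was cut in half.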
>
> I don't see that sysfs has seq_file support (correct me if I am wrong),
> and I know it is improper to transfer 4KB+ of text via the sysfs
> interface, but the code has already been there for a long time.
>
> Two possible fixes come to mind,
> 1) Do not fix the problem
> Normally it is rare for a storage medium to have 100+ bad block ranges,
> so in the real world the existing bad block information may never exceed
> the page size limitation of a sysfs file.
> 2) Add seq_file support to the sysfs interface if there is none
>
> There is probably another, better solution, so I would like to get
> hints/advice from you.
>
> Thanks in advance for any comment :-)
>
> Coly Li
>
> On 9/14/21 12:36 AM, Coly Li wrote:
> > This is the second effort to improve badblocks code APIs to handle
> > multiple ranges in bad block table.
> >
> > There are 2 changes from previous version,
> > - Fixes 2 bugs in front_overwrite() which are detected by the user
> > space testing code.
> > - Provide the user space testing code in last patch.
> >
> > There is NO in-memory or on-disk format change in the whole series; all
> > existing APIs and data structures are unchanged. This series only
> > improves the algorithm to handle more corner cases; the interfaces
> > are the same and consistent for all existing callers (md raid and nvdimm
> > drivers).
> >
> > The original motivation for the change is a requirement from our
> > customer: the current badblocks routines don't handle multiple ranges.
> > For example, if the range being set as bad covers multiple ranges in the
> > bad block table, only the first two bad block ranges are merged and the
> > remaining ranges are left intact. The expected behavior is for all the
> > covered ranges to be handled.
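The expected behavior described above can be illustrated with a generic interval merge. The Python sketch below is not the algorithm from the patch series. set_bad_range() is a hypothetical name, the table is modelled as a sorted list of non-overlapping (start, length) tuples, and the sketch ignores the acknowledged flag and any per-record length cap the kernel applies.

```python
def set_bad_range(table, start, length):
    """Return a new sorted table with [start, start + length) marked bad.

    table is a list of (start, length) tuples, sorted and non-overlapping.
    Every existing range that overlaps or touches the new range is folded
    into it, however many there are.
    """
    new_start, new_end = start, start + length
    merged = []
    for s, l in table:
        e = s + l
        if e < new_start or s > new_end:
            # Disjoint and not adjacent: keep the record unchanged.
            merged.append((s, l))
        else:
            # Overlapping or adjacent: absorb it into the new range.
            new_start = min(new_start, s)
            new_end = max(new_end, e)
    merged.append((new_start, new_end - new_start))
    merged.sort()
    return merged
```

For instance, with a table of (64, 32), (128, 8) and (200, 16), setting 100 sectors bad from LBA 80 absorbs the first two records into one range and leaves the third untouched.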
> >
> > All the patches are tested by modified user space code and the code
> > logic works as expected. The modified user space testing code is
> > provided in the last patch. The testing code detected 2 defects in the
> > helper front_overwrite(), which are fixed in this version.
> >
> > The whole change is divided into 6 patches to make the code review
> > clearer and easier. If people prefer, I'd be happy to post a single large
> > patch once the code review is finished.
> >
> > This version has been thoroughly tested, and so far no further defects
> > have been observed.
> >
> >
> > Coly Li
> >
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: Hannes Reinecke <hare@suse.de>
> > Cc: Jens Axboe <axboe@kernel.dk>
> > Cc: NeilBrown <neilb@suse.de>
> > Cc: Richard Fan <richard.fan@suse.com>
> > Cc: Vishal L Verma <vishal.l.verma@intel.com>
> > ---
> > Changelog:
> > v3: add tester Richard Fan <richard.fan@suse.com>
> > v2: the improved version, and with testing code.
> > v1: the first completed version.
> >
> >
> > Coly Li (6):
> > badblocks: add more helper structure and routines in badblocks.h
> > badblocks: add helper routines for badblock ranges handling
> > badblocks: improvement badblocks_set() for multiple ranges handling
> > badblocks: improve badblocks_clear() for multiple ranges handling
> > badblocks: improve badblocks_check() for multiple ranges handling
> > badblocks: switch to the improved badblock handling code
> > Coly Li (1):
> > test: user space code to test badblocks APIs
> >
> > block/badblocks.c | 1599 ++++++++++++++++++++++++++++++-------
> > include/linux/badblocks.h | 32 +
> > 2 files changed, 1340 insertions(+), 291 deletions(-)
> >
>
>