From: "Theodore Ts'o" <tytso@mit.edu>
To: kernel test robot <oliver.sang@intel.com>
Cc: Harshad Shirwadkar <harshadshirwadkar@gmail.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux Memory Management List <linux-mm@kvack.org>,
	lkp@lists.01.org, lkp@intel.com, dm-devel@redhat.com
Subject: Re: [ext4]  21175ca434: mdadm-selftests.enchmarks/mdadm-selftests/tests/01r1fail.fail
Date: Wed, 28 Apr 2021 10:03:16 -0400
Message-ID: <YIlrJCdhVaFPdPgb@mit.edu>
In-Reply-To: <20210427081539.GF32408@xsang-OptiPlex-9020>

(Hmm, why did you cc linux-mm on this report?  I would have thought
dm-devel would have made more sense?)

On Tue, Apr 27, 2021 at 04:15:39PM +0800, kernel test robot wrote:
> 
> FYI, we noticed the following commit (built with gcc-9):
> 
> commit: 21175ca434c5d49509b73cf473618b01b0b85437 ("ext4: make prefetch_block_bitmaps default")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> 

> in testcase: mdadm-selftests
> version: mdadm-selftests-x86_64-5d518de-1_20201008
> with following parameters:
> 
> 	disk: 1HDD
> 	test_prefix: 01r1
> 	ucode: 0x21

So this failure makes no sense to me.  Looking at the kmesg failure
logs, it's failing in the md layer:

kern  :info  : [   99.775514] md/raid1:md0: not clean -- starting background reconstruction
kern  :info  : [   99.783372] md/raid1:md0: active with 3 out of 4 mirrors
kern  :info  : [   99.789735] md0: detected capacity change from 0 to 37888
kern  :info  : [   99.796216] md: resync of RAID array md0
kern  :crit  : [   99.900450] md/raid1:md0: Disk failure on loop2, disabling device.
                              md/raid1:md0: Operation continuing on 2 devices.
kern  :crit  : [   99.918281] md/raid1:md0: Disk failure on loop1, disabling device.
                              md/raid1:md0: Operation continuing on 1 devices.
kern  :info  : [  100.835833] md: md0: resync interrupted.
kern  :info  : [  101.852898] md: resync of RAID array md0
kern  :info  : [  101.858347] md: md0: resync done.
user  :notice: [  102.109684] /lkp/benchmarks/mdadm-selftests/tests/01r1fail... FAILED - see /var/tmp/01r1fail.log and /var/tmp/fail01r1fail.log for details

The referenced commit just turns on block bitmap prefetching by
default in ext4.  This should not cause md to fail; if it does, that's
an md bug, not an ext4 bug.  There should not be anything that the
file system is doing that would cause the kernel to think there is a
disk failure.

By the way, the reproduction instructions aren't working currently:

> To reproduce:
> 
>         git clone https://github.com/intel/lkp-tests.git
>         cd lkp-tests
>         bin/lkp install                job.yaml  # job file is attached in this email

This fails because lkp is trying to apply a patch which does not apply
to the current version of the md tools.

>         bin/lkp split-job --compatible job.yaml
>         bin/lkp run                    compatible-job.yaml

And the current version of lkp doesn't generate a compatible-job.yaml
file when you run "lkp split-job --compatible"; instead it generates a
new yaml file whose name includes a set of random characters to make
it unique.  (What in Multics parlance would be called a "shriek
name"[1] :-)

Since I was having trouble running the reproduction, could you send
the /var/tmp/*fail.log files so we can get a bit more insight into
what is going on?

Thanks!

					- Ted

