Linux-Block Archive on lore.kernel.org
* Regression in v5.0-rc1: Panic at boot
@ 2019-01-07 19:41 Logan Gunthorpe
  2019-01-08 13:19 ` Christoph Hellwig
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Logan Gunthorpe @ 2019-01-07 19:41 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe; +Cc: linux-block, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 564 bytes --]

Hey,

I found a regression in v5.0-rc1 this morning. My system panics on boot.
I've attached a log of the panic.

I bisected to find the problematic commit is:

Fixes: 9d037ad707ed ("block: remove req->timeout_list")

But it makes no sense to me why this commit would cause a problem like
this. I've attached a bisect log. I've also tested v5.0-rc1 with this
commit reverted and that boots fine.

The traceback seems to indicate the problem is on the bip_get_seed()
line in t10_pi_complete(). Which suggests that bio_integrity() is
returning NULL.

Thanks,

Logan

[-- Attachment #2: bisect.log --]
[-- Type: text/x-log, Size: 2748 bytes --]

[-- Attachment #3: panic.log --]
[-- Type: text/x-log, Size: 7413 bytes --]

* Re: Regression in v5.0-rc1: Panic at boot
  2019-01-07 19:41 Regression in v5.0-rc1: Panic at boot Logan Gunthorpe
@ 2019-01-08 13:19 ` Christoph Hellwig
  2019-01-08 17:24   ` Logan Gunthorpe
  2019-01-08 13:49 ` Jeff Moyer
  2019-01-08 20:54 ` Logan Gunthorpe
  2 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2019-01-08 13:19 UTC (permalink / raw)
  To: Logan Gunthorpe; +Cc: Christoph Hellwig, Jens Axboe, linux-block, linux-kernel

On Mon, Jan 07, 2019 at 12:41:06PM -0700, Logan Gunthorpe wrote:
> Hey,
> 
> I found a regression in v5.0-rc1 this morning. My system panics on boot.
> I've attached a log of the panic.
> 
> I bisected to find the problematic commit is:
> 
> Fixes: 9d037ad707ed ("block: remove req->timeout_list")
> 
> But it makes no sense to me why this commit would cause a problem like
> this. I've attached a bisect log. I've also tested v5.0-rc1 with this
> commit reverted and that boots fine.
> 
> The traceback seems to indicate the problem is on the bip_get_seed()
> line in t10_pi_complete(). Which suggests that bio_integrity() is
> returning NULL.

Very odd.  Can you try an experiment?  Can you add padding the size
of the removed timeout_list field to struct request to check if that
makes the problem go away?  Then move the padding from where the field
was to the end and see if that still "helps"?

* Re: Regression in v5.0-rc1: Panic at boot
  2019-01-07 19:41 Regression in v5.0-rc1: Panic at boot Logan Gunthorpe
  2019-01-08 13:19 ` Christoph Hellwig
@ 2019-01-08 13:49 ` Jeff Moyer
  2019-01-08 17:31   ` Logan Gunthorpe
  2019-01-08 20:54 ` Logan Gunthorpe
  2 siblings, 1 reply; 6+ messages in thread
From: Jeff Moyer @ 2019-01-08 13:49 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: Christoph Hellwig, Jens Axboe, linux-block, linux-kernel

Hi, Logan,

Logan Gunthorpe <logang@deltatee.com> writes:

> Hey,
>
> I found a regression in v5.0-rc1 this morning. My system panics on boot.
> I've attached a log of the panic.
>
> I bisected to find the problematic commit is:
>
> Fixes: 9d037ad707ed ("block: remove req->timeout_list")
>
> But it makes no sense to me why this commit would cause a problem like
> this. I've attached a bisect log. I've also tested v5.0-rc1 with this
> commit reverted and that boots fine.
>
> The traceback seems to indicate the problem is on the bip_get_seed()
> line in t10_pi_complete(). Which suggests that bio_integrity() is
> returning NULL.

Does your hardware actually support protection information?

-Jeff

* Re: Regression in v5.0-rc1: Panic at boot
  2019-01-08 13:19 ` Christoph Hellwig
@ 2019-01-08 17:24   ` Logan Gunthorpe
  0 siblings, 0 replies; 6+ messages in thread
From: Logan Gunthorpe @ 2019-01-08 17:24 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, linux-block, linux-kernel



On 2019-01-08 6:19 a.m., Christoph Hellwig wrote:
> On Mon, Jan 07, 2019 at 12:41:06PM -0700, Logan Gunthorpe wrote:
>> Hey,
>>
>> I found a regression in v5.0-rc1 this morning. My system panics on boot.
>> I've attached a log of the panic.
>>
>> I bisected to find the problematic commit is:
>>
>> Fixes: 9d037ad707ed ("block: remove req->timeout_list")
>>
>> But it makes no sense to me why this commit would cause a problem like
>> this. I've attached a bisect log. I've also tested v5.0-rc1 with this
>> commit reverted and that boots fine.
>>
>> The traceback seems to indicate the problem is on the bip_get_seed()
>> line in t10_pi_complete(). Which suggests that bio_integrity() is
>> returning NULL.
> 
> Very odd.  Can you try an experiment?  Can you add padding the size
> of the removed timeout_list field to struct request to check if that
> makes the problem go away?  Then move the padding from where the field
> was to the end and see if that still "helps"?

Ok I tried these things and they all boot without panic:

1) Add two void pointers to where 'timeout_list' was
2) Add two void pointers to the end of the struct
3) Add one void pointer to the end of the struct

So it seems to be a struct size issue...
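
To illustrate why the size of the struct alone can matter, here is a
minimal userspace sketch in C. It assumes, purely for illustration, that
driver-private data is laid out immediately after the request header, so
any bytes appended to the header add slack in front of that data. The
names demo_request, demo_request_padded, payload_offset() and
padding_slack() are hypothetical; this is not kernel code, and the real
struct request has many more fields.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a request header; NOT the real struct request. */
struct demo_request {
	unsigned long deadline;
	void *end_io_data;
};

/* Same header with one pointer of padding appended, as in experiment 3. */
struct demo_request_padded {
	unsigned long deadline;
	void *end_io_data;
	void *pad;
};

/* Assumption: private data starts right after the header. */
static size_t payload_offset(size_t header_size)
{
	return header_size;
}

/* Bytes of slack that the extra pointer adds in front of the payload. */
static size_t padding_slack(void)
{
	size_t plain = payload_offset(sizeof(struct demo_request));
	size_t padded = payload_offset(sizeof(struct demo_request_padded));

	assert(padded > plain);
	return padded - plain;
}
```

If some code were writing slightly past an undersized allocation, slack
like this could hide the overrun, which would be consistent with all
three experiments booting. That is a guess about the mechanism, not
something the experiments prove.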

Logan


* Re: Regression in v5.0-rc1: Panic at boot
  2019-01-08 13:49 ` Jeff Moyer
@ 2019-01-08 17:31   ` Logan Gunthorpe
  0 siblings, 0 replies; 6+ messages in thread
From: Logan Gunthorpe @ 2019-01-08 17:31 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: Christoph Hellwig, Jens Axboe, linux-block, linux-kernel



On 2019-01-08 6:49 a.m., Jeff Moyer wrote:
>> The traceback seems to indicate the problem is on the bip_get_seed()
>> line in t10_pi_complete(). Which suggests that bio_integrity() is
>> returning NULL.
> 
> Does your hardware actually support protection information?

As far as I know, it does not. If I add a printk to t10_pi_complete(), it
doesn't print on a successful boot and does print on a broken boot. So
something is causing it to be called erroneously.

Logan

* Re: Regression in v5.0-rc1: Panic at boot
  2019-01-07 19:41 Regression in v5.0-rc1: Panic at boot Logan Gunthorpe
  2019-01-08 13:19 ` Christoph Hellwig
  2019-01-08 13:49 ` Jeff Moyer
@ 2019-01-08 20:54 ` Logan Gunthorpe
  2 siblings, 0 replies; 6+ messages in thread
From: Logan Gunthorpe @ 2019-01-08 20:54 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe; +Cc: linux-block, linux-kernel



On 2019-01-07 12:41 p.m., Logan Gunthorpe wrote:
> I found a regression in v5.0-rc1 this morning. My system panics on boot.
> I've attached a log of the panic.

I just sent a patch which, I believe, resolves the regression:

http://lkml.kernel.org/r/20190108205043.3122-1-logang@deltatee.com

The problem turned out to be an allocate-too-small bug in the isci driver.
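
This message does not spell out the details, so the following is only a
generic sketch of how an allocate-too-small bug can arise. It is not the
isci code and not the patch linked above. It assumes a per-command
allocation whose size is computed once from a capability flag, and a
later user that expects the optional buffer to be present. All names
(demo_host, demo_cmd, demo_prot_buf, demo_setup_tags() and so on) are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct demo_cmd { int tag; void *sense; };
struct demo_prot_buf { void *table; unsigned int length; };

struct demo_host {
	int prot_enabled;	/* capability flag */
	size_t cmd_size;	/* bytes reserved per command */
};

/* Size the per-command allocation from the flag as it is set right now. */
static void demo_setup_tags(struct demo_host *h)
{
	assert(h != NULL);
	h->cmd_size = sizeof(struct demo_cmd);
	if (h->prot_enabled)
		h->cmd_size += sizeof(struct demo_prot_buf);
}

/* Returns 1 if a demo_prot_buf placed after the command fits. */
static int demo_prot_buf_fits(const struct demo_host *h)
{
	assert(h != NULL);
	return h->cmd_size >=
	       sizeof(struct demo_cmd) + sizeof(struct demo_prot_buf);
}

/* Buggy order: size the allocation first, enable the capability after. */
static int demo_buggy_order(void)
{
	struct demo_host h = { 0, 0 };

	demo_setup_tags(&h);
	h.prot_enabled = 1;
	return demo_prot_buf_fits(&h);
}

/* Fixed order: enable the capability, then size the allocation. */
static int demo_fixed_order(void)
{
	struct demo_host h = { 0, 0 };

	h.prot_enabled = 1;
	demo_setup_tags(&h);
	return demo_prot_buf_fits(&h);
}
```

In the buggy order the optional buffer would land past the end of the
allocation, so whether it corrupts anything depends on what happens to
sit next to it, for example on the size of a neighbouring struct.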

Logan
