linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] qla2xxx: rewrite code to avoid hitting gcc bug 70646
@ 2016-04-15 10:36 Denys Vlasenko
  2016-04-15 14:40 ` James Bottomley
  0 siblings, 1 reply; 8+ messages in thread
From: Denys Vlasenko @ 2016-04-15 10:36 UTC (permalink / raw)
  To: Himanshu Madhani
  Cc: Denys Vlasenko, James Bottomley, qla2xxx-upstream,
	Josh Poimboeuf, linux-scsi, linux-kernel

More info here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Himanshu Madhani <himanshu.madhani@qlogic.com>
CC: James Bottomley <James.Bottomley@HansenPartnership.com>
CC: qla2xxx-upstream@qlogic.com
CC: Josh Poimboeuf <jpoimboe@redhat.com>
CC: linux-scsi@vger.kernel.org
CC: linux-kernel@vger.kernel.org
---
 drivers/scsi/qla2xxx/qla_attr.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/scsi/qla2xxx/qla_attr.c b/drivers/scsi/qla2xxx/qla_attr.c
index 4dc06a13..2dd9c72 100644
--- a/drivers/scsi/qla2xxx/qla_attr.c
+++ b/drivers/scsi/qla2xxx/qla_attr.c
@@ -2003,9 +2003,14 @@ static void
 qla2x00_get_host_fabric_name(struct Scsi_Host *shost)
 {
 	scsi_qla_host_t *vha = shost_priv(shost);
+	/*
+	 * This can trigger gcc 4.9/5.3 bug.
+	 * See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646
 	uint8_t node_name[WWN_SIZE] = { 0xFF, 0xFF, 0xFF, 0xFF, \
 		0xFF, 0xFF, 0xFF, 0xFF};
 	u64 fabric_name = wwn_to_u64(node_name);
+	 */
+	u64 fabric_name = (u64)(s64)-1; /* the same as above */
 
 	if (vha->device_flags & SWITCH_FOUND)
 		fabric_name = wwn_to_u64(vha->fabric_node_name);
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] qla2xxx: rewrite code to avoid hitting gcc bug 70646
  2016-04-15 10:36 [PATCH] qla2xxx: rewrite code to avoid hitting gcc bug 70646 Denys Vlasenko
@ 2016-04-15 14:40 ` James Bottomley
  2016-04-15 18:56   ` Denys Vlasenko
  0 siblings, 1 reply; 8+ messages in thread
From: James Bottomley @ 2016-04-15 14:40 UTC (permalink / raw)
  To: Denys Vlasenko, Himanshu Madhani
  Cc: qla2xxx-upstream, Josh Poimboeuf, linux-scsi, linux-kernel

On Fri, 2016-04-15 at 12:36 +0200, Denys Vlasenko wrote:
> More info here:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646

This bug is under investigation, so I'd rather not alter code for a gcc
bug until we know if we can supply options to fix it rather than
changing code.

James


> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
> CC: Himanshu Madhani <himanshu.madhani@qlogic.com>
> CC: James Bottomley <James.Bottomley@HansenPartnership.com>
> CC: qla2xxx-upstream@qlogic.com
> CC: Josh Poimboeuf <jpoimboe@redhat.com>
> CC: linux-scsi@vger.kernel.org
> CC: linux-kernel@vger.kernel.org
> ---
>  drivers/scsi/qla2xxx/qla_attr.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_attr.c
> b/drivers/scsi/qla2xxx/qla_attr.c
> index 4dc06a13..2dd9c72 100644
> --- a/drivers/scsi/qla2xxx/qla_attr.c
> +++ b/drivers/scsi/qla2xxx/qla_attr.c
> @@ -2003,9 +2003,14 @@ static void
>  qla2x00_get_host_fabric_name(struct Scsi_Host *shost)
>  {
>  	scsi_qla_host_t *vha = shost_priv(shost);
> +	/*
> +	 * This can trigger gcc 4.9/5.3 bug.
> +	 * See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646
>  	uint8_t node_name[WWN_SIZE] = { 0xFF, 0xFF, 0xFF, 0xFF, \
>  		0xFF, 0xFF, 0xFF, 0xFF};
>  	u64 fabric_name = wwn_to_u64(node_name);
> +	 */
> +	u64 fabric_name = (u64)(s64)-1; /* the same as above */
>  
>  	if (vha->device_flags & SWITCH_FOUND)
>  		fabric_name = wwn_to_u64(vha->fabric_node_name);

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] qla2xxx: rewrite code to avoid hitting gcc bug 70646
  2016-04-15 14:40 ` James Bottomley
@ 2016-04-15 18:56   ` Denys Vlasenko
  2016-04-15 19:05     ` James Bottomley
  0 siblings, 1 reply; 8+ messages in thread
From: Denys Vlasenko @ 2016-04-15 18:56 UTC (permalink / raw)
  To: James Bottomley, Himanshu Madhani
  Cc: qla2xxx-upstream, Josh Poimboeuf, linux-scsi, linux-kernel

On 04/15/2016 04:40 PM, James Bottomley wrote:
> On Fri, 2016-04-15 at 12:36 +0200, Denys Vlasenko wrote:
>> More info here:
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646
> 
> This bug is under investigation, so I'd rather not alter code for a gcc
> bug until we know if we can supply options to fix it rather than
> changing code.


Background. The bug exists in gcc for 2 years, but it is rather
hard to trigger, so nobody noticed.

Unfortunately for kernel, these two commits landed in Linus tree
in March 16 and 17:


On 04/13/2016 05:36 AM, Josh Poimboeuf wrote:
> It occurs with the combination of the following two recent commits:
>
> - bc27fb68aaad ("include/uapi/linux/byteorder, swab: force inlining of some byteswap operations")
> - ef3fb2422ffe ("scsi: fc: use get/put_unaligned64 for wwn access")


and now *many* users of qla2x00 and new-ish gcc are going to
very much notice it, as their kernels will start crashing reliably.

The commits can be reverted, sure, but they per se do not contain
anything unusual. They, together with not very typical construct
in qla2x00_get_host_fabric_name, one
which boils down to "swab64p(constant_array_of_8_bytes)",
just happen to nudge gcc in a right way to finally trigger the bug.

So I came with another idea how to forestall the imminent deluge of
qla2x00 oops reports - this patch.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] qla2xxx: rewrite code to avoid hitting gcc bug 70646
  2016-04-15 18:56   ` Denys Vlasenko
@ 2016-04-15 19:05     ` James Bottomley
  2016-04-15 20:02       ` Josh Poimboeuf
  2016-04-15 21:09       ` Denys Vlasenko
  0 siblings, 2 replies; 8+ messages in thread
From: James Bottomley @ 2016-04-15 19:05 UTC (permalink / raw)
  To: Denys Vlasenko, Himanshu Madhani
  Cc: qla2xxx-upstream, Josh Poimboeuf, linux-scsi, linux-kernel

On Fri, 2016-04-15 at 20:56 +0200, Denys Vlasenko wrote:
> On 04/15/2016 04:40 PM, James Bottomley wrote:
> > On Fri, 2016-04-15 at 12:36 +0200, Denys Vlasenko wrote:
> > > More info here:
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646
> > 
> > This bug is under investigation, so I'd rather not alter code for a
> > gcc
> > bug until we know if we can supply options to fix it rather than
> > changing code.
> 
> 
> Background. The bug exists in gcc for 2 years, but it is rather
> hard to trigger, so nobody noticed.

We know this ... linux-scsi is on the cc for the other thread on this.

> Unfortunately for kernel, these two commits landed in Linus tree
> in March 16 and 17:
> 
> 
> On 04/13/2016 05:36 AM, Josh Poimboeuf wrote:
> > It occurs with the combination of the following two recent commits:
> > 
> > - bc27fb68aaad ("include/uapi/linux/byteorder, swab: force inlining
> > of some byteswap operations")
> > - ef3fb2422ffe ("scsi: fc: use get/put_unaligned64 for wwn access")
> 
> 
> and now *many* users of qla2x00 and new-ish gcc are going to
> very much notice it, as their kernels will start crashing reliably.
> 
> The commits can be reverted, sure, but they per se do not contain
> anything unusual. They, together with not very typical construct
> in qla2x00_get_host_fabric_name, one
> which boils down to "swab64p(constant_array_of_8_bytes)",
> just happen to nudge gcc in a right way to finally trigger the bug.
> 
> So I came with another idea how to forestall the imminent deluge of
> qla2x00 oops reports - this patch.

There are actually a raft of checkers that run the upstream code which
aren't seeing any problem; likely because the code is harder to trigger
than you think.  So, lets wait until the resolution of the other thread
before we panic, especially since we're only at -rc3.

James

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] qla2xxx: rewrite code to avoid hitting gcc bug 70646
  2016-04-15 19:05     ` James Bottomley
@ 2016-04-15 20:02       ` Josh Poimboeuf
  2016-04-15 21:15         ` James Bottomley
  2016-04-15 21:09       ` Denys Vlasenko
  1 sibling, 1 reply; 8+ messages in thread
From: Josh Poimboeuf @ 2016-04-15 20:02 UTC (permalink / raw)
  To: James Bottomley
  Cc: Denys Vlasenko, Himanshu Madhani, qla2xxx-upstream, linux-scsi,
	linux-kernel

On Fri, Apr 15, 2016 at 12:05:26PM -0700, James Bottomley wrote:
> On Fri, 2016-04-15 at 20:56 +0200, Denys Vlasenko wrote:
> > and now *many* users of qla2x00 and new-ish gcc are going to
> > very much notice it, as their kernels will start crashing reliably.
> > 
> > The commits can be reverted, sure, but they per se do not contain
> > anything unusual. They, together with not very typical construct
> > in qla2x00_get_host_fabric_name, one
> > which boils down to "swab64p(constant_array_of_8_bytes)",
> > just happen to nudge gcc in a right way to finally trigger the bug.
> > 
> > So I came with another idea how to forestall the imminent deluge of
> > qla2x00 oops reports - this patch.
> 
> There are actually a raft of checkers that run the upstream code which
> aren't seeing any problem; likely because the code is harder to trigger
> than you think.  So, lets wait until the resolution of the other thread
> before we panic, especially since we're only at -rc3.

Regardless of the outcome of the gcc bug, it seems kind of silly to
byteswap a constant value of 0xffffffffffffffff.

	uint8_t node_name[WWN_SIZE] = { 0xFF, 0xFF, 0xFF, 0xFF, \
		0xFF, 0xFF, 0xFF, 0xFF};
	u64 fabric_name = wwn_to_u64(node_name);

Similar to what Denys suggested, it can just be:

	u64 fabric_name = -1;
or
	u64 fabric_name = 0xffffffffffffffff;

Wouldn't that be an improvement to the code regardless?

-- 
Josh

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] qla2xxx: rewrite code to avoid hitting gcc bug 70646
  2016-04-15 19:05     ` James Bottomley
  2016-04-15 20:02       ` Josh Poimboeuf
@ 2016-04-15 21:09       ` Denys Vlasenko
  2016-04-15 21:25         ` James Bottomley
  1 sibling, 1 reply; 8+ messages in thread
From: Denys Vlasenko @ 2016-04-15 21:09 UTC (permalink / raw)
  To: James Bottomley
  Cc: qla2xxx-upstream, Josh Poimboeuf, linux-scsi, linux-kernel

On 04/15/2016 09:05 PM, James Bottomley wrote:
> On Fri, 2016-04-15 at 20:56 +0200, Denys Vlasenko wrote:
>> On 04/15/2016 04:40 PM, James Bottomley wrote:
>>> On Fri, 2016-04-15 at 12:36 +0200, Denys Vlasenko wrote:
>>>> More info here:
>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646
>>>
>>> This bug is under investigation, so I'd rather not alter code for a
>>> gcc
>>> bug until we know if we can supply options to fix it rather than
>>> changing code.
>>
>>
>> Background. The bug exists in gcc for 2 years, but it is rather
>> hard to trigger, so nobody noticed.
> 
> We know this ... linux-scsi is on the cc for the other thread on this.
> 
>> Unfortunately for kernel, these two commits landed in Linus tree
>> in March 16 and 17:
>>
>>
>> On 04/13/2016 05:36 AM, Josh Poimboeuf wrote:
>>> It occurs with the combination of the following two recent commits:
>>>
>>> - bc27fb68aaad ("include/uapi/linux/byteorder, swab: force inlining
>>> of some byteswap operations")
>>> - ef3fb2422ffe ("scsi: fc: use get/put_unaligned64 for wwn access")
>>
>>
>> and now *many* users of qla2x00 and new-ish gcc are going to
>> very much notice it, as their kernels will start crashing reliably.
>>
>> The commits can be reverted, sure, but they per se do not contain
>> anything unusual. They, together with not very typical construct
>> in qla2x00_get_host_fabric_name, one
>> which boils down to "swab64p(constant_array_of_8_bytes)",
>> just happen to nudge gcc in a right way to finally trigger the bug.
>>
>> So I came with another idea how to forestall the imminent deluge of
>> qla2x00 oops reports - this patch.
> 
> There are actually a raft of checkers that run the upstream code which
> aren't seeing any problem; likely because the code is harder to trigger
> than you think.  So, lets wait until the resolution of the other thread
> before we panic, especially since we're only at -rc3.

I'm not panicking, James.

By sending a workaround, I just want to make sure that *other people*
won't be forced to fix up a problem which surfaced because of *my* patch.


I'm afraid "harder to trigger than you think" is not true.
It is nearly trivial to trigger it now.
I just tried the following on a freshly installed Fedora 21 machine:

$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
$ cd linux
$ make defconfig
$ sed '/SCSI_FC_ATTRS/d;/SCSI_LOWLEVEL/d' -i .config
$ make oldconfig    # answer "yes" to everything
$ nice make -j22
$ objdump -dr drivers/scsi/qla2xxx/qla_attr.o | grep -A10 qla2x00_get_host_fabric_name

0000000000001540 <qla2x00_get_host_fabric_name>:
    1540:	55                   	push   %rbp
    1541:	48 89 e5             	mov    %rsp,%rbp
    1544:	66 66 66 2e 0f 1f 84 	data32 data32 nopw %cs:0x0(%rax,%rax,1)
    154b:	00 00 00 00 00

0000000000001550 <qla2x00_fw_state_show>:
    1550:	55                   	push   %rbp
    1551:	48 89 e5             	mov    %rsp,%rbp
    1554:	53                   	push   %rbx
    1555:	48 89 d3             	mov    %rdx,%rbx

See?
I'm sure Fedora 22, 23 and 24 will also do that.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] qla2xxx: rewrite code to avoid hitting gcc bug 70646
  2016-04-15 20:02       ` Josh Poimboeuf
@ 2016-04-15 21:15         ` James Bottomley
  0 siblings, 0 replies; 8+ messages in thread
From: James Bottomley @ 2016-04-15 21:15 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Denys Vlasenko, Himanshu Madhani, qla2xxx-upstream, linux-scsi,
	linux-kernel

On Fri, 2016-04-15 at 15:02 -0500, Josh Poimboeuf wrote:
> On Fri, Apr 15, 2016 at 12:05:26PM -0700, James Bottomley wrote:
> > On Fri, 2016-04-15 at 20:56 +0200, Denys Vlasenko wrote:
> > > and now *many* users of qla2x00 and new-ish gcc are going to
> > > very much notice it, as their kernels will start crashing
> > > reliably.
> > > 
> > > The commits can be reverted, sure, but they per se do not contain
> > > anything unusual. They, together with not very typical construct
> > > in qla2x00_get_host_fabric_name, one
> > > which boils down to "swab64p(constant_array_of_8_bytes)",
> > > just happen to nudge gcc in a right way to finally trigger the
> > > bug.
> > > 
> > > So I came with another idea how to forestall the imminent deluge
> > > of
> > > qla2x00 oops reports - this patch.
> > 
> > There are actually a raft of checkers that run the upstream code
> > which
> > aren't seeing any problem; likely because the code is harder to
> > trigger
> > than you think.  So, lets wait until the resolution of the other
> > thread
> > before we panic, especially since we're only at -rc3.
> 
> Regardless of the outcome of the gcc bug, it seems kind of silly to
> byteswap a constant value of 0xffffffffffffffff.
> 
> 	uint8_t node_name[WWN_SIZE] = { 0xFF, 0xFF, 0xFF, 0xFF, \
> 		0xFF, 0xFF, 0xFF, 0xFF};
> 	u64 fabric_name = wwn_to_u64(node_name);
> 
> Similar to what Denys suggested, it can just be:
> 
> 	u64 fabric_name = -1;
> or
> 	u64 fabric_name = 0xffffffffffffffff;
> 
> Wouldn't that be an improvement to the code regardless?

"Improvement" would be in the eye of the beholder.  Semantically it
would be wrong because we're initialising a CPU local representation of
a big endian structure, so we *should* use the conversion.

James

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] qla2xxx: rewrite code to avoid hitting gcc bug 70646
  2016-04-15 21:09       ` Denys Vlasenko
@ 2016-04-15 21:25         ` James Bottomley
  0 siblings, 0 replies; 8+ messages in thread
From: James Bottomley @ 2016-04-15 21:25 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: qla2xxx-upstream, Josh Poimboeuf, linux-scsi, linux-kernel

On Fri, 2016-04-15 at 23:09 +0200, Denys Vlasenko wrote:
> On 04/15/2016 09:05 PM, James Bottomley wrote:
> > On Fri, 2016-04-15 at 20:56 +0200, Denys Vlasenko wrote:
> > > On 04/15/2016 04:40 PM, James Bottomley wrote:
> > > > On Fri, 2016-04-15 at 12:36 +0200, Denys Vlasenko wrote:
> > > > > More info here:
> > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646
> > > > 
> > > > This bug is under investigation, so I'd rather not alter code
> > > > for a
> > > > gcc
> > > > bug until we know if we can supply options to fix it rather
> > > > than
> > > > changing code.
> > > 
> > > 
> > > Background. The bug exists in gcc for 2 years, but it is rather
> > > hard to trigger, so nobody noticed.
> > 
> > We know this ... linux-scsi is on the cc for the other thread on
> > this.
> > 
> > > Unfortunately for kernel, these two commits landed in Linus tree
> > > in March 16 and 17:
> > > 
> > > 
> > > On 04/13/2016 05:36 AM, Josh Poimboeuf wrote:
> > > > It occurs with the combination of the following two recent
> > > > commits:
> > > > 
> > > > - bc27fb68aaad ("include/uapi/linux/byteorder, swab: force
> > > > inlining
> > > > of some byteswap operations")
> > > > - ef3fb2422ffe ("scsi: fc: use get/put_unaligned64 for wwn
> > > > access")
> > > 
> > > 
> > > and now *many* users of qla2x00 and new-ish gcc are going to
> > > very much notice it, as their kernels will start crashing
> > > reliably.
> > > 
> > > The commits can be reverted, sure, but they per se do not contain
> > > anything unusual. They, together with not very typical construct
> > > in qla2x00_get_host_fabric_name, one
> > > which boils down to "swab64p(constant_array_of_8_bytes)",
> > > just happen to nudge gcc in a right way to finally trigger the
> > > bug.
> > > 
> > > So I came with another idea how to forestall the imminent deluge
> > > of
> > > qla2x00 oops reports - this patch.
> > 
> > There are actually a raft of checkers that run the upstream code
> > which
> > aren't seeing any problem; likely because the code is harder to
> > trigger
> > than you think.  So, lets wait until the resolution of the other
> > thread
> > before we panic, especially since we're only at -rc3.
> 
> I'm not panicking, James.
> 
> By sending a workaround, I just want to make sure that *other people*
> won't be forced to fix up a problem which surfaced because of *my* 
> patch.

Look, if gcc really proves to be intractable, I think what should
happen is revert the triggering patch, which is

commit e3bde9568d992c5f985e6e30731a5f9f9bef7b13
Author: Denys Vlasenko <dvlasenk@redhat.com>
Date:   Thu Mar 17 14:22:47 2016 -0700

    include/linux/unaligned: force inlining of byteswap operations

But, as I've said a couple of times now, there are no bug reports from
the testers about qla2xxx (yet) so we can afford to wait a bit and see
if there's some other resolution that doesn't involve changing kernel
code to work around a local gcc bug.

James

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2016-04-15 21:25 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-15 10:36 [PATCH] qla2xxx: rewrite code to avoid hitting gcc bug 70646 Denys Vlasenko
2016-04-15 14:40 ` James Bottomley
2016-04-15 18:56   ` Denys Vlasenko
2016-04-15 19:05     ` James Bottomley
2016-04-15 20:02       ` Josh Poimboeuf
2016-04-15 21:15         ` James Bottomley
2016-04-15 21:09       ` Denys Vlasenko
2016-04-15 21:25         ` James Bottomley

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).