linux-scsi.vger.kernel.org archive mirror
* qla2xxx panic with 4.19-stable
@ 2020-09-11  2:26 Zhengyuan Liu
  2020-09-11 17:37 ` Himanshu Madhani
  0 siblings, 1 reply; 6+ messages in thread
From: Zhengyuan Liu @ 2020-09-11  2:26 UTC (permalink / raw)
  To: qla2xxx-upstream; +Cc: linux-scsi, gregkh, liuzhengyuan

Hi,

There is a NULL pointer dereference panic on my arm64 server when it
boots with the fabric line plugged into a QLE2692 HBA. Using git
bisect, I found that the panic was introduced by
commit 4984a06bf094 ("scsi: qla2xxx: Remove all rports if fabric scan
retry fails"). Upstream and 4.19-stable both show the same problem
when reset to this commit, but upstream has since fixed it, apparently
unintentionally, with commit da61ef053bcf ("scsi: qla2xxx: Reduce
holding sess_lock to prevent CPU"), while the latest 4.19-stable still
has the issue. The panic looks like this:

[   13.380405][  0] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[   13.390947][  0] Mem abort info:
[   13.395535][  0]   ESR = 0x96000045
[   13.400390][  0]   Exception class = DABT (current EL), IL = 32 bits
[   13.408089][  0]   SET = 0, FnV = 0
[   13.412941][  0]   EA = 0, S1PTW = 0
[   13.416747][  0] Data abort info:
[   13.420048][  0]   ISV = 0, ISS = 0x00000045
[   13.424293][  0]   CM = 0, WnR = 1
[   13.427676][  0] user pgtable: 64k pages, 48-bit VAs, pgdp = (____ptrval____)
[   13.434778][  0] [0000000000000000] pgd=0000000000000000, pud=0000000000000000
[   13.441968][  0] Internal error: Oops: 96000045 [#1] SMP
[   13.447250][  0] Modules linked in: qla2xxx nvme_fc nvme_fabrics scsi_transport_fc igb megaraid_sas dm_snapshot iscsi_tcp libiscsi_tcp libs
[   13.472588][  0] Process kworker/0:2 (pid: 343, stack limit = 0x(____ptrval____))
[   13.472675][  5] audit: type=1130 audit(1599118767.260:14): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=initrd-parse-etc comm="sy'
[   13.480032][  0] CPU: 0 PID: 343 Comm: kworker/0:2 Tainted: G W         4.19.90-19.ky10.aarch64 #1
[   13.480033][  5] Hardware name: GreatWall, BIOS 601FBE28 2020/04/20
[   13.480045][  0] Workqueue: qla2xxx_wq qla2x00_iocb_work_fn [qla2xxx]
[   13.499248][  0] audit: type=1131 audit(1599118767.260:15): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=initrd-parse-etc comm="sy'
[   13.508759][  0] pstate: 40000005 (nZcv daif -PAN -UAO)
[   13.547687][ 24] pc : __memset+0x16c/0x188
[   13.547697][  0] lr : qla24xx_async_gpnft+0x194/0x950 [qla2xxx]
[   13.547701][  0] sp : ffffb2158236bc60
[   13.561388][  0] x29: ffffb2158236bc60 x28: 0000000000000000
[   13.567104][  0] x27: ffff3be824ac0148 x26: ffff3be824ac00b8
[   13.572820][  0] x25: ffff3be824b031e0 x24: 0000000000000028
[   13.578535][  0] x23: ffffb2158600d188 x22: ffffb21586d3ea38
[   13.584251][  0] x21: 0000000000008010 x20: ffffb21586d3ea08
[   13.589968][  0] x19: ffffb2158600d040 x18: 0000000000000400
[   13.595683][  0] x17: 0000000000000000 x16: ffff3be83f9a9500
[   13.601398][  0] x15: 0000000000000400 x14: 0000000000000400
[   13.607114][  0] x13: 0000000000000189 x12: 0000000000000001
[   13.612829][  0] x11: 0000000000000000 x10: 0000000000000b40
[   13.618544][  0] x9 : 0000000000000000 x8 : 0000000000000000
[   13.624259][  0] x7 : 0000000000000000 x6 : 000000000000003f
[   13.629974][  0] x5 : 0000000000000040 x4 : 0000000000000000
[   13.635689][  0] x3 : 0000000000000004 x2 : 0000000000007fd0
[   13.641404][  0] x1 : 0000000000000000 x0 : 0000000000000000
[   13.647119][  0] Call trace:
[   13.649983][  0]  __memset+0x16c/0x188
[   13.653718][  0]  qla2x00_do_work+0x398/0x440 [qla2xxx]
[   13.658920][  0]  qla2x00_iocb_work_fn+0x50/0xe8 [qla2xxx]
[   13.664378][  0]  process_one_work+0x1f0/0x3c8
[   13.668797][  0]  worker_thread+0x48/0x4d0
[   13.672871][  0]  kthread+0x128/0x130
[   13.676514][  0]  ret_from_fork+0x10/0x18
[   13.680503][  0] Code: 91010108 54ffff4a 8b040108 cb050042 (d50b7428)
[   13.687027][  0] ---[ end trace 258cdcdd74a25238 ]---
[   13.692051][  0] Kernel panic - not syncing: Fatal exception
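
Note on reading the abort info in the log above: the ESR value can be
split into its fields by hand. The following is a minimal sketch,
assuming the standard ARMv8-A ESR_ELx layout (EC in bits 31:26, IL in
bit 25, ISS in bits 24:0, and for data aborts WnR in bit 6 and DFSC in
bits 5:0). The function name decode_esr is hypothetical and is not part
of any kernel tooling.

```python
def decode_esr(esr: int) -> dict:
    """Split an arm64 ESR_ELx value into the fields an oops prints.

    Assumes the ARMv8-A data-abort encoding; hypothetical helper.
    """
    iss = esr & 0x1FFFFFF              # bits 24:0, instruction specific syndrome
    return {
        "ec": (esr >> 26) & 0x3F,      # bits 31:26, exception class
        "il": (esr >> 25) & 0x1,       # bit 25, 1 = 32-bit instruction
        "iss": iss,
        "wnr": (iss >> 6) & 0x1,       # data abort: 1 = fault caused by a write
        "dfsc": iss & 0x3F,            # data abort: fault status code
    }


# ESR value taken from the oops in this report.
fields = decode_esr(0x96000045)
print({name: hex(value) for name, value in fields.items()})
```

For the value in this report the fields come out as EC 0x25, IL 1,
ISS 0x45, WnR 1 and DFSC 0x05. That agrees with the "DABT (current EL)",
"IL = 32 bits", "ISS = 0x00000045" and "WnR = 1" lines in the log, and
DFSC 0x05 is a level 1 translation fault: a write to an unmapped
address, which fits a memset on a NULL pointer.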

* Re: qla2xxx panic with 4.19-stable
  2020-09-11  2:26 qla2xxx panic with 4.19-stable Zhengyuan Liu
@ 2020-09-11 17:37 ` Himanshu Madhani
  2020-09-14  2:36   ` Zhengyuan Liu
  0 siblings, 1 reply; 6+ messages in thread
From: Himanshu Madhani @ 2020-09-11 17:37 UTC (permalink / raw)
  To: Zhengyuan Liu; +Cc: linux-scsi, gregkh, liuzhengyuan

Hi,

Have you tried applying commit da61ef053bcf ("scsi: qla2xxx: Reduce holding sess_lock to prevent CPU") to confirm whether it resolves your panic? It does look like the changes in that patch should resolve the panic.

If you are able to verify that, we can request a stable backport with your Reported-by and Tested-by tags.

--
Himanshu Madhani	 Oracle Linux Engineering


* Re: qla2xxx panic with 4.19-stable
  2020-09-11 17:37 ` Himanshu Madhani
@ 2020-09-14  2:36   ` Zhengyuan Liu
  2020-09-15 15:16     ` Himanshu Madhani
  0 siblings, 1 reply; 6+ messages in thread
From: Zhengyuan Liu @ 2020-09-14  2:36 UTC (permalink / raw)
  To: Himanshu Madhani; +Cc: linux-scsi, gregkh, liuzhengyuan

On Sat, Sep 12, 2020 at 1:37 AM Himanshu Madhani
<himanshu.madhani@oracle.com> wrote:
>
> Hi,
>
> Have you tried applying commit da61ef053bcf ("scsi: qla2xxx: Reduce holding sess_lock to prevent CPU") to confirm whether it resolves your panic? It does look like the changes in that patch should resolve the panic.
>
> If you are able to verify that, we can request a stable backport with your Reported-by and Tested-by tags.

Yes, backporting that commit to 4.19-stable did resolve my panic. However,
I could not apply the commit directly; to resolve the conflicts I also
backported these commits:
 3b1e23aacf80 ("scsi: qla2xxx: Update rscn_rcvd field to more meaningful")
 a4863b16c31e ("scsi: qla2xxx: Move rport registration out of internal")


* Re: qla2xxx panic with 4.19-stable
  2020-09-14  2:36   ` Zhengyuan Liu
@ 2020-09-15 15:16     ` Himanshu Madhani
  2020-09-16  7:49       ` Zhengyuan Liu
  0 siblings, 1 reply; 6+ messages in thread
From: Himanshu Madhani @ 2020-09-15 15:16 UTC (permalink / raw)
  To: Zhengyuan Liu; +Cc: linux-scsi, gregkh, liuzhengyuan



> On Sep 13, 2020, at 9:36 PM, Zhengyuan Liu <liuzhengyuang521@gmail.com> wrote:
> 
> Yes, backporting that commit to 4.19-stable did resolve my panic. However,
> I could not apply the commit directly; to resolve the conflicts I also
> backported these commits:
> 3b1e23aacf80 ("scsi: qla2xxx: Update rscn_rcvd field to more meaningful")
> a4863b16c31e ("scsi: qla2xxx: Move rport registration out of internal")
> 

These patches look good for the 4.19-stable backport.

Please post them to stable with Reported-by and Tested-by tags.

Thanks.

--
Himanshu Madhani	 Oracle Linux Engineering


* Re: qla2xxx panic with 4.19-stable
  2020-09-15 15:16     ` Himanshu Madhani
@ 2020-09-16  7:49       ` Zhengyuan Liu
  2020-09-17 14:58         ` Himanshu Madhani
  0 siblings, 1 reply; 6+ messages in thread
From: Zhengyuan Liu @ 2020-09-16  7:49 UTC (permalink / raw)
  To: Himanshu Madhani; +Cc: linux-scsi, gregkh, liuzhengyuan

On Tue, Sep 15, 2020 at 11:16 PM Himanshu Madhani
<himanshu.madhani@oracle.com> wrote:
> These patches look good for the 4.19-stable backport.
>
> Please post them to stable with Reported-by and Tested-by tags.

I posted those patches to stable@vger.kernel.org with you on Cc, but
the mail server rejected my address and I have no idea why.
Could you please help forward the email to the stable list? Thanks.

* Re: qla2xxx panic with 4.19-stable
  2020-09-16  7:49       ` Zhengyuan Liu
@ 2020-09-17 14:58         ` Himanshu Madhani
  0 siblings, 0 replies; 6+ messages in thread
From: Himanshu Madhani @ 2020-09-17 14:58 UTC (permalink / raw)
  To: Zhengyuan Liu; +Cc: linux-scsi, gregkh, liuzhengyuan

It sounds like you need to follow Option 2 from the stable submission rules document:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/stable-kernel-rules.rst
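
For illustration only, an Option 2 request for this thread could be put
together roughly as below. This is a minimal sketch, assuming Option 2
asks for a mail to stable@vger.kernel.org that names the patch subject,
the upstream commit ID, the reason, and the target kernel version; the
linked document has the authoritative wording. The helper name
build_stable_request and the sender address are hypothetical, and the
commit subjects are copied as they appear (truncated) in this thread.

```python
from email.message import EmailMessage

# Commit IDs and subjects exactly as quoted in this thread.
COMMITS = [
    ("da61ef053bcf", "scsi: qla2xxx: Reduce holding sess_lock to prevent CPU"),
    ("3b1e23aacf80", "scsi: qla2xxx: Update rscn_rcvd field to more meaningful"),
    ("a4863b16c31e", "scsi: qla2xxx: Move rport registration out of internal"),
]


def build_stable_request(sender: str, kernel: str = "4.19") -> EmailMessage:
    """Compose a plain-text backport request (hypothetical helper)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = "stable@vger.kernel.org"
    msg["Subject"] = f"qla2xxx: backport request for {kernel}-stable"

    lines = [f"Please consider these upstream commits for {kernel}-stable:", ""]
    lines += [f'  {sha} ("{subject}")' for sha, subject in COMMITS]
    lines += [
        "",
        "Reason: they resolve a NULL pointer dereference panic at boot on a",
        "QLE2692 HBA, introduced by commit 4984a06bf094.",
    ]
    msg.set_content("\n".join(lines))
    return msg


# Placeholder sender; replace with the real reporter address.
print(build_stable_request("reporter@example.com"))
```

The sketch only builds the message text; sending it as plain text from
an address the list accepts is left to the mail client.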


> On Sep 16, 2020, at 2:49 AM, Zhengyuan Liu <liuzhengyuang521@gmail.com> wrote:
> 
> I posted those patches to stable@vger.kernel.org with you on Cc, but
> the mail server rejected my address and I have no idea why.
> Could you please help forward the email to the stable list? Thanks.

--
Himanshu Madhani	 Oracle Linux Engineering

