* Question about coherency of comand context between ufs and scsi
From: Kiwoong Kim @ 2021-06-14  9:52 UTC
  To: linux-scsi
  Cc: 'Can Guo', 'Bart Van Assche',
	'Avri Altman', 'Bean Huo', 'Jaegeuk Kim'

Dear all,

I ran into a symptom that made me wonder how a command context is kept in sync between UFS and SCSI.
When the log below was captured, the lrb structure for tag 10 had no command context.
That is, lrbp->cmd was NULL, which led to this kernel panic.

lrbp->cmd is set when a command is issued and cleared when the command completes.
But what if the command times out and, at the same time, completes because its response arrives?

If SCSI has already added the command to its error command list and woken up scsi_eh even though the command has actually completed, I think scsi_eh will invoke eh_abort_handler and the symptom will be reproduced.

Otherwise, does anyone know how this coherency is guaranteed?


[78843.058729] [3:  kworker/u16:1:27018] exynos-ufs 13100000.ufs: ufshcd_abort: cmd was completed, but without a notifying intr, tag = 10
[78843.058775] [3:  kworker/u16:1:27018] exynos-ufs 13100000.ufs: ufshcd_abort: Device abort task at tag 10
[78843.058793] [3:  kworker/u16:1:27018] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000160
..
[78843.075421] [3:  kworker/u16:1:27018] pc : scsi_print_command+0x24/0x340
[78843.075436] [3:  kworker/u16:1:27018] lr : ufshcd_abort+0x180/0x674
[78843.075444] [3:  kworker/u16:1:27018] sp : ffffffc038ea3c00
[78843.075453] [3:  kworker/u16:1:27018] x29: ffffffc038ea3c10 x28: 0000000000000400 
[78843.075464] [3:  kworker/u16:1:27018] x27: ffffff8934c0a680 x26: ffffff8931560000 
[78843.075474] [3:  kworker/u16:1:27018] x25: 000000000002000a x24: ffffff88a0dd4910 
[78843.075485] [3:  kworker/u16:1:27018] x23: 0000000000000000 x22: ffffff8930f258f0 
[78843.075495] [3:  kworker/u16:1:27018] x21: ffffff8934c0a080 x20: 000000000000000a 
[78843.075505] [3:  kworker/u16:1:27018] x19: ffffff8931560cf8 x18: ffffffc037557030 
[78843.075516] [3:  kworker/u16:1:27018] x17: 0000000000000000 x16: ffffffc010eeba70 
[78843.075526] [3:  kworker/u16:1:27018] x15: ffffffc01187d88f x14: 2067617420746120 
[78843.075536] [3:  kworker/u16:1:27018] x13: 6b7361742074726f x12: 6261206563697665 
[78843.075546] [3:  kworker/u16:1:27018] x11: 44203a74726f6261 x10: 00000000ffffffff 
[78843.075556] [3:  kworker/u16:1:27018] x9 : 0000000000000090 x8 : ffffff8934c0a620 
[78843.075566] [3:  kworker/u16:1:27018] x7 : 0000000000000000 x6 : ffffffc0102a7d6c 
[78843.075576] [3:  kworker/u16:1:27018] x5 : 0000000000000000 x4 : 0000000000000080 
[78843.075585] [3:  kworker/u16:1:27018] x3 : 0000000000000000 x2 : ffffffc0102a7d80 
[78843.075595] [3:  kworker/u16:1:27018] x1 : ffffffc0102a7d80 x0 : 0000000000000000 
[78843.075606] [3:  kworker/u16:1:27018] Call trace:
[78843.075617] [3:  kworker/u16:1:27018]  scsi_print_command+0x24/0x340
[78843.075627] [3:  kworker/u16:1:27018]  ufshcd_abort+0x180/0x674
[78843.075643] [3:  kworker/u16:1:27018]  scmd_eh_abort_handler+0x80/0x15c
[78843.075660] [3:  kworker/u16:1:27018]  process_one_work+0x290/0x4e4
[78843.075669] [3:  kworker/u16:1:27018]  worker_thread+0x258/0x534
[78843.075681] [3:  kworker/u16:1:27018]  kthread+0x178/0x188
[78843.075696] [3:  kworker/u16:1:27018]  ret_from_fork+0x10/0x18

Thanks.
Kiwoong Kim




* Re: Question about coherency of comand context between ufs and scsi
From: Can Guo @ 2021-06-15  1:09 UTC
  To: Kiwoong Kim
  Cc: linux-scsi, 'Bart Van Assche', 'Avri Altman',
	'Bean Huo', 'Jaegeuk Kim'

Hi Kiwoong,

On 2021-06-14 17:52, Kiwoong Kim wrote:
> Dear All
> 
> I saw one symptom and started wondering on how a command context is
> synchronized between ufs and scsi.
> In the situation where the following log happened, the lrb structure
> for tag 10 didn't have a command context.
> That is, lrbp->cmd was null, so it led to this kernel panic.
> 
> lrbp->cmd is set when a command is issued, and cleared when the
> command is completed.
> But what if the command is timed-out and it's completed because its
> response comes in at the same time?
> 
> If scsi added it into its error command list and wakes-up scsi_eh
> though the command is actually completed, scsi_eh will invoke
> eh_abort_handler and the symptom will be duplicated, I think
> 
> Otherwise, is there anyone who know how to guarantee the coherency?
> 
> 
> [78843.058729] [3:  kworker/u16:1:27018] exynos-ufs 13100000.ufs:
> ufshcd_abort: cmd was completed, but without a notifying intr, tag =
> 10
> [78843.058775] [3:  kworker/u16:1:27018] exynos-ufs 13100000.ufs:
> ufshcd_abort: Device abort task at tag 10
> [78843.058793] [3:  kworker/u16:1:27018] Unable to handle kernel NULL
> pointer dereference at virtual address 0000000000000160
> ..
> [78843.075421] [3:  kworker/u16:1:27018] pc : 
> scsi_print_command+0x24/0x340
> [78843.075436] [3:  kworker/u16:1:27018] lr : ufshcd_abort+0x180/0x674
> [78843.075444] [3:  kworker/u16:1:27018] sp : ffffffc038ea3c00
> [78843.075453] [3:  kworker/u16:1:27018] x29: ffffffc038ea3c10 x28:
> 0000000000000400
> [78843.075464] [3:  kworker/u16:1:27018] x27: ffffff8934c0a680 x26:
> ffffff8931560000
> [78843.075474] [3:  kworker/u16:1:27018] x25: 000000000002000a x24:
> ffffff88a0dd4910
> [78843.075485] [3:  kworker/u16:1:27018] x23: 0000000000000000 x22:
> ffffff8930f258f0
> [78843.075495] [3:  kworker/u16:1:27018] x21: ffffff8934c0a080 x20:
> 000000000000000a
> [78843.075505] [3:  kworker/u16:1:27018] x19: ffffff8931560cf8 x18:
> ffffffc037557030
> [78843.075516] [3:  kworker/u16:1:27018] x17: 0000000000000000 x16:
> ffffffc010eeba70
> [78843.075526] [3:  kworker/u16:1:27018] x15: ffffffc01187d88f x14:
> 2067617420746120
> [78843.075536] [3:  kworker/u16:1:27018] x13: 6b7361742074726f x12:
> 6261206563697665
> [78843.075546] [3:  kworker/u16:1:27018] x11: 44203a74726f6261 x10:
> 00000000ffffffff
> [78843.075556] [3:  kworker/u16:1:27018] x9 : 0000000000000090 x8 :
> ffffff8934c0a620
> [78843.075566] [3:  kworker/u16:1:27018] x7 : 0000000000000000 x6 :
> ffffffc0102a7d6c
> [78843.075576] [3:  kworker/u16:1:27018] x5 : 0000000000000000 x4 :
> 0000000000000080
> [78843.075585] [3:  kworker/u16:1:27018] x3 : 0000000000000000 x2 :
> ffffffc0102a7d80
> [78843.075595] [3:  kworker/u16:1:27018] x1 : ffffffc0102a7d80 x0 :
> 0000000000000000
> [78843.075606] [3:  kworker/u16:1:27018] Call trace:
> [78843.075617] [3:  kworker/u16:1:27018]  scsi_print_command+0x24/0x340
> [78843.075627] [3:  kworker/u16:1:27018]  ufshcd_abort+0x180/0x674
> [78843.075643] [3:  kworker/u16:1:27018]  
> scmd_eh_abort_handler+0x80/0x15c
> [78843.075660] [3:  kworker/u16:1:27018]  process_one_work+0x290/0x4e4
> [78843.075669] [3:  kworker/u16:1:27018]  worker_thread+0x258/0x534
> [78843.075681] [3:  kworker/u16:1:27018]  kthread+0x178/0x188
> [78843.075696] [3:  kworker/u16:1:27018]  ret_from_fork+0x10/0x18
> 

In the 5.13 kernel, ufshcd_abort() calls scsi_print_command(cmd),
while in 5.12 and earlier kernels it calls
scsi_print_command(hba->lrb[tag].cmd).
Which kernel are you using here?
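
To illustrate the difference described above, here is a minimal userspace sketch, not kernel code. The names model_cmd, model_lrb, MODEL_NUTRS and both helper functions are hypothetical stand-ins chosen for this example; only the idea (re-reading the pointer from the per-tag slot versus using the command handed to the abort handler) comes from the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a SCSI command. */
struct model_cmd {
	int tag;
};

/* Hypothetical stand-in for a per-tag request block (lrb). */
struct model_lrb {
	struct model_cmd *cmd;	/* set on issue, cleared on completion */
};

#define MODEL_NUTRS 32	/* assumed number of slots, for the sketch only */

/*
 * Slot-based lookup, as in the 5.12-and-earlier call quoted above:
 * the pointer is re-read from the slot, so it is NULL if a completion
 * has already cleared it.
 */
struct model_cmd *cmd_from_slot(struct model_lrb *lrb, int tag)
{
	assert(tag >= 0 && tag < MODEL_NUTRS);
	return lrb[tag].cmd;
}

/*
 * Argument-based lookup, as in the 5.13 call quoted above: the abort
 * handler uses the command it was given, regardless of the slot state.
 */
struct model_cmd *cmd_from_arg(struct model_cmd *cmd)
{
	return cmd;
}
```

In this model, once a completion clears lrb[tag].cmd, cmd_from_slot() returns NULL while cmd_from_arg() still returns the command that was passed in, which is why printing from the slot can dereference a NULL pointer.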

Thanks,
Can Guo.

> Thanks.
> Kiwoong Kim


* RE: Question about coherency of comand context between ufs and scsi
From: Kiwoong Kim @ 2021-06-15  7:56 UTC
  To: 'Can Guo'
  Cc: linux-scsi, 'Bart Van Assche', 'Avri Altman',
	'Bean Huo', 'Jaegeuk Kim'

> If scsi added it into its error command list and wakes-up scsi_eh 
> though the command is actually completed, scsi_eh will invoke 
> eh_abort_handler and the symptom will be duplicated, I think
> 
> Otherwise, is there anyone who know how to guarantee the coherency?

> In 5.13 kernel, it is scsi_print_command(cmd) in ufshcd_abort(), while in
> 5.12 and earlier kernel, it is scsi_print_command(hba->lrb[tag].cmd).
> Which kernel are you using here?
> 
> Thanks,
> Can Guo. 

Thank you for the information. I'm on 5.4.
Yes, you're right about the NULL pointer.
But what do you think about the rest?
In the situation I described, is the race I suggested still possible?

Thanks.
Kiwoong Kim



* Re: Question about coherency of comand context between ufs and scsi
From: Can Guo @ 2021-06-15  8:07 UTC
  To: Kiwoong Kim
  Cc: linux-scsi, 'Bart Van Assche', 'Avri Altman',
	'Bean Huo', 'Jaegeuk Kim'

On 2021-06-15 15:56, Kiwoong Kim wrote:
>> If scsi added it into its error command list and wakes-up scsi_eh
>> though the command is actually completed, scsi_eh will invoke
>> eh_abort_handler and the symptom will be duplicated, I think
>> 
>> Otherwise, is there anyone who know how to guarantee the coherency?
> 
>> In 5.13 kernel, it is scsi_print_command(cmd) in ufshcd_abort(), while 
>> in
>> 5.12 and earlier kernel, it is scsi_print_command(hba->lrb[tag].cmd).
>> Which kernel are you using here?
>> 
>> Thanks,
>> Can Guo.
> 
> Thank you for your information. I'm seeing 5.4.
> Yes, for null pointer, you're right.
> Then, what do you think?
> In the situation I told, is there still the possibility that I 
> suggested?

You can change that line in your project to match 5.13.

Thanks,

Can Guo.

> 
> Thanks.
> Kiwoong Kim


* Re: Question about coherency of comand context between ufs and scsi
From: Can Guo @ 2021-06-15  8:21 UTC
  To: Kiwoong Kim
  Cc: linux-scsi, 'Bart Van Assche', 'Avri Altman',
	'Bean Huo', 'Jaegeuk Kim'

On 2021-06-15 16:07, Can Guo wrote:
> On 2021-06-15 15:56, Kiwoong Kim wrote:
>>> If scsi added it into its error command list and wakes-up scsi_eh
>>> though the command is actually completed, scsi_eh will invoke
>>> eh_abort_handler and the symptom will be duplicated, I think
>>> 
>>> Otherwise, is there anyone who know how to guarantee the coherency?
>> 

scsi_times_out() guarantees that:

		/*
		 * Set the command to complete first in order to prevent a real
		 * completion from releasing the command while error handling
		 * is using it. If the command was already completed, then the
		 * lower level driver beat the timeout handler, and it is safe
		 * to return without escalating error recovery.
		 *
		 * If timeout handling lost the race to a real completion, the
		 * block layer may ignore that due to a fake timeout injection,
		 * so return RESET_TIMER to allow error handling another shot
		 * at this command.
		 */
		if (test_and_set_bit(SCMD_STATE_COMPLETE, &scmd->state))
			return BLK_EH_RESET_TIMER;

Please read the comment above.
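
The guard quoted above can be modeled in userspace with C11 atomics. This is only a sketch under stated assumptions: model_cmd, MODEL_STATE_COMPLETE_BIT, model_test_and_set_complete() and model_claim() are hypothetical names, and atomic_fetch_or() is used here as a stand-in for the kernel's test_and_set_bit(). It assumes that every path that wants to own the command (timeout or real completion) performs the same test-and-set first.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a command; only the state word is modeled. */
struct model_cmd {
	atomic_ulong state;
};

/* Stand-in for SCMD_STATE_COMPLETE; the bit number is an assumption. */
#define MODEL_STATE_COMPLETE_BIT 0UL

/*
 * Atomically set the "complete" bit and report whether it was already
 * set, mirroring the return convention of test_and_set_bit().
 */
bool model_test_and_set_complete(struct model_cmd *cmd)
{
	unsigned long mask = 1UL << MODEL_STATE_COMPLETE_BIT;
	unsigned long old = atomic_fetch_or(&cmd->state, mask);

	return (old & mask) != 0;
}

enum model_owner {
	OWNER_THIS_PATH,	/* caller set the bit first and owns the command */
	OWNER_OTHER_PATH	/* bit was already set; caller must back off */
};

/*
 * Called by whichever path gets there (timeout or completion).
 * Because the read-modify-write is atomic, exactly one caller can
 * observe the bit as previously clear.
 */
enum model_owner model_claim(struct model_cmd *cmd)
{
	return model_test_and_set_complete(cmd) ? OWNER_OTHER_PATH
						: OWNER_THIS_PATH;
}
```

In this model the first caller of model_claim() wins and any later caller sees OWNER_OTHER_PATH, which corresponds to the timeout handler returning BLK_EH_RESET_TIMER instead of escalating to error handling when the bit is already set.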

Can Guo.

>>> In 5.13 kernel, it is scsi_print_command(cmd) in ufshcd_abort(), 
>>> while in
>>> 5.12 and earlier kernel, it is scsi_print_command(hba->lrb[tag].cmd).
>>> Which kernel are you using here?
>>> 
>>> Thanks,
>>> Can Guo.
>> 
>> Thank you for your information. I'm seeing 5.4.
>> Yes, for null pointer, you're right.
>> Then, what do you think?
>> In the situation I told, is there still the possibility that I 
>> suggested?
> 
> You can make the code change to that line in your project same as 5.13.
> 
> Thanks,
> 
> Can Guo.
> 
>> 
>> Thanks.
>> Kiwoong Kim
