* Question about coherency of command context between ufs and scsi
[not found] <CGME20210614095245epcas2p2e8512382423332786f584d5ef1e225d3@epcas2p2.samsung.com>
@ 2021-06-14 9:52 ` Kiwoong Kim
2021-06-15 1:09 ` Can Guo
0 siblings, 1 reply; 5+ messages in thread
From: Kiwoong Kim @ 2021-06-14 9:52 UTC
To: linux-scsi
Cc: 'Can Guo', 'Bart Van Assche',
'Avri Altman', 'Bean Huo', 'Jaegeuk Kim'
Dear all,
I ran into a symptom that made me wonder how a command context is kept in sync between ufs and scsi.
When the log below was captured, the lrb structure for tag 10 had no command context.
That is, lrbp->cmd was NULL, which led to the kernel panic.
lrbp->cmd is set when a command is issued and cleared when the command completes.
But what if the command times out and, at the same moment, completes because its response arrives?
If scsi has already added it to its error command list and woken up scsi_eh even though the command has actually completed, scsi_eh will invoke eh_abort_handler and the same symptom will occur again, I think.
If that is not the case, does anyone know how the coherency is guaranteed?
[78843.058729] [3: kworker/u16:1:27018] exynos-ufs 13100000.ufs: ufshcd_abort: cmd was completed, but without a notifying intr, tag = 10
[78843.058775] [3: kworker/u16:1:27018] exynos-ufs 13100000.ufs: ufshcd_abort: Device abort task at tag 10
[78843.058793] [3: kworker/u16:1:27018] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000160
..
[78843.075421] [3: kworker/u16:1:27018] pc : scsi_print_command+0x24/0x340
[78843.075436] [3: kworker/u16:1:27018] lr : ufshcd_abort+0x180/0x674
[78843.075444] [3: kworker/u16:1:27018] sp : ffffffc038ea3c00
[78843.075453] [3: kworker/u16:1:27018] x29: ffffffc038ea3c10 x28: 0000000000000400
[78843.075464] [3: kworker/u16:1:27018] x27: ffffff8934c0a680 x26: ffffff8931560000
[78843.075474] [3: kworker/u16:1:27018] x25: 000000000002000a x24: ffffff88a0dd4910
[78843.075485] [3: kworker/u16:1:27018] x23: 0000000000000000 x22: ffffff8930f258f0
[78843.075495] [3: kworker/u16:1:27018] x21: ffffff8934c0a080 x20: 000000000000000a
[78843.075505] [3: kworker/u16:1:27018] x19: ffffff8931560cf8 x18: ffffffc037557030
[78843.075516] [3: kworker/u16:1:27018] x17: 0000000000000000 x16: ffffffc010eeba70
[78843.075526] [3: kworker/u16:1:27018] x15: ffffffc01187d88f x14: 2067617420746120
[78843.075536] [3: kworker/u16:1:27018] x13: 6b7361742074726f x12: 6261206563697665
[78843.075546] [3: kworker/u16:1:27018] x11: 44203a74726f6261 x10: 00000000ffffffff
[78843.075556] [3: kworker/u16:1:27018] x9 : 0000000000000090 x8 : ffffff8934c0a620
[78843.075566] [3: kworker/u16:1:27018] x7 : 0000000000000000 x6 : ffffffc0102a7d6c
[78843.075576] [3: kworker/u16:1:27018] x5 : 0000000000000000 x4 : 0000000000000080
[78843.075585] [3: kworker/u16:1:27018] x3 : 0000000000000000 x2 : ffffffc0102a7d80
[78843.075595] [3: kworker/u16:1:27018] x1 : ffffffc0102a7d80 x0 : 0000000000000000
[78843.075606] [3: kworker/u16:1:27018] Call trace:
[78843.075617] [3: kworker/u16:1:27018] scsi_print_command+0x24/0x340
[78843.075627] [3: kworker/u16:1:27018] ufshcd_abort+0x180/0x674
[78843.075643] [3: kworker/u16:1:27018] scmd_eh_abort_handler+0x80/0x15c
[78843.075660] [3: kworker/u16:1:27018] process_one_work+0x290/0x4e4
[78843.075669] [3: kworker/u16:1:27018] worker_thread+0x258/0x534
[78843.075681] [3: kworker/u16:1:27018] kthread+0x178/0x188
[78843.075696] [3: kworker/u16:1:27018] ret_from_fork+0x10/0x18
Thanks.
Kiwoong Kim
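A note on the oops above: x0 is 0 and the faulting virtual address is 0x160, which is consistent with scsi_print_command() reading a structure member at offset 0x160 through a NULL command pointer. The sketch below only illustrates that arithmetic. struct toy_cmnd and its layout are hypothetical and do not reproduce struct scsi_cmnd; which member really sits at that offset depends on the kernel build and is not stated in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: the padding exists only to place 'member' at 0x160. */
struct toy_cmnd {
	unsigned char pad[0x160];
	void *member;
};

/* Compile-time check that the toy layout matches the offset in the oops. */
static_assert(offsetof(struct toy_cmnd, member) == 0x160,
	      "toy layout must place member at 0x160");

/* Address the CPU would try to load when evaluating base->member. */
static uintptr_t toy_member_addr(uintptr_t base)
{
	return base + offsetof(struct toy_cmnd, member);
}
```

With a base of 0 (a NULL pointer), toy_member_addr() yields 0x160, the same value as the faulting address in the log.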
* Re: Question about coherency of command context between ufs and scsi
2021-06-14 9:52 ` Question about coherency of command context between ufs and scsi Kiwoong Kim
@ 2021-06-15 1:09 ` Can Guo
2021-06-15 7:56 ` Kiwoong Kim
0 siblings, 1 reply; 5+ messages in thread
From: Can Guo @ 2021-06-15 1:09 UTC
To: Kiwoong Kim
Cc: linux-scsi, 'Bart Van Assche', 'Avri Altman',
'Bean Huo', 'Jaegeuk Kim'
Hi Kiwoong,
On 2021-06-14 17:52, Kiwoong Kim wrote:
> Dear All
>
> I saw one symptom and started wondering on how a command context is
> synchronized between ufs and scsi.
> In the situation where the following log happened, the lrb structure
> for tag 10 didn't have a command context.
> That is, lrbp->cmd was null, so it led to this kernel panic.
>
> lrbp->cmd is set when a command is issued, and cleared when the
> command is completed.
> But what if the command is timed-out and it's completed because its
> response comes in at the same time?
>
> If scsi added it into its error command list and wakes-up scsi_eh
> though the command is actually completed, scsi_eh will invoke
> eh_abort_handler and the symptom will be duplicated, I think
>
> Otherwise, is there anyone who know how to guarantee the coherency?
>
In the 5.13 kernel, ufshcd_abort() calls scsi_print_command(cmd),
while in 5.12 and earlier kernels it calls
scsi_print_command(hba->lrb[tag].cmd).
Which kernel are you using here?
Thanks,
Can Guo.
> Thanks.
> Kiwoong Kim
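The difference between the two calls can be sketched roughly as follows. This is a userspace model, not the driver code: struct toy_cmd, struct toy_lrb and the toy_* functions are hypothetical stand-ins for struct scsi_cmnd, struct ufshcd_lrb and the issue, completion and abort paths. It shows why reading the command through the slot can find NULL once the completion path has cleared it, while using the pointer handed to the abort handler does not depend on the slot.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the command and the per-tag slot. */
struct toy_cmd { int tag; };
struct toy_lrb { struct toy_cmd *cmd; };

/* Issue path: attach the command context to the slot. */
static void toy_issue(struct toy_lrb *lrb, struct toy_cmd *cmd)
{
	lrb->cmd = cmd;
}

/* Completion path: detach the command context from the slot. */
static void toy_complete(struct toy_lrb *lrb)
{
	lrb->cmd = NULL;
}

/* 5.12-and-earlier style: look the command up through the slot. */
static const struct toy_cmd *toy_cmd_via_slot(const struct toy_lrb *lrb)
{
	return lrb->cmd;	/* NULL if the completion path ran first */
}

/* 5.13 style: use the pointer the caller passed to the abort handler. */
static const struct toy_cmd *toy_cmd_via_arg(const struct toy_cmd *cmd)
{
	assert(cmd != NULL);	/* assumption: the caller never passes NULL */
	return cmd;
}
```

If toy_complete() runs between toy_issue() and the abort path, toy_cmd_via_slot() returns NULL, whereas toy_cmd_via_arg() still returns the command it was given.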
* RE: Question about coherency of command context between ufs and scsi
2021-06-15 1:09 ` Can Guo
@ 2021-06-15 7:56 ` Kiwoong Kim
2021-06-15 8:07 ` Can Guo
0 siblings, 1 reply; 5+ messages in thread
From: Kiwoong Kim @ 2021-06-15 7:56 UTC
To: 'Can Guo'
Cc: linux-scsi, 'Bart Van Assche', 'Avri Altman',
'Bean Huo', 'Jaegeuk Kim'
> If scsi added it into its error command list and wakes-up scsi_eh
> though the command is actually completed, scsi_eh will invoke
> eh_abort_handler and the symptom will be duplicated, I think
>
> Otherwise, is there anyone who know how to guarantee the coherency?
> In 5.13 kernel, it is scsi_print_command(cmd) in ufshcd_abort(), while in
> 5.12 and earlier kernel, it is scsi_print_command(hba->lrb[tag].cmd).
> Which kernel are you using here?
>
> Thanks,
> Can Guo.
Thank you for the information. I'm using 5.4.
Yes, you're right about the NULL pointer.
Then what do you think?
In the situation I described, is the race I suggested still possible?
Thanks.
Kiwoong Kim
* Re: Question about coherency of command context between ufs and scsi
2021-06-15 7:56 ` Kiwoong Kim
@ 2021-06-15 8:07 ` Can Guo
2021-06-15 8:21 ` Can Guo
0 siblings, 1 reply; 5+ messages in thread
From: Can Guo @ 2021-06-15 8:07 UTC
To: Kiwoong Kim
Cc: linux-scsi, 'Bart Van Assche', 'Avri Altman',
'Bean Huo', 'Jaegeuk Kim'
On 2021-06-15 15:56, Kiwoong Kim wrote:
>> If scsi added it into its error command list and wakes-up scsi_eh
>> though the command is actually completed, scsi_eh will invoke
>> eh_abort_handler and the symptom will be duplicated, I think
>>
>> Otherwise, is there anyone who know how to guarantee the coherency?
>
>> In 5.13 kernel, it is scsi_print_command(cmd) in ufshcd_abort(), while
>> in
>> 5.12 and earlier kernel, it is scsi_print_command(hba->lrb[tag].cmd).
>> Which kernel are you using here?
>>
>> Thanks,
>> Can Guo.
>
> Thank you for your information. I'm seeing 5.4.
> Yes, for null pointer, you're right.
> Then, what do you think?
> In the situation I told, is there still the possibility that I
> suggested?
You can change that line in your project to match 5.13.
Thanks,
Can Guo.
>
> Thanks.
> Kiwoong Kim
* Re: Question about coherency of command context between ufs and scsi
2021-06-15 8:07 ` Can Guo
@ 2021-06-15 8:21 ` Can Guo
0 siblings, 0 replies; 5+ messages in thread
From: Can Guo @ 2021-06-15 8:21 UTC
To: Kiwoong Kim
Cc: linux-scsi, 'Bart Van Assche', 'Avri Altman',
'Bean Huo', 'Jaegeuk Kim'
On 2021-06-15 16:07, Can Guo wrote:
> On 2021-06-15 15:56, Kiwoong Kim wrote:
>>> If scsi added it into its error command list and wakes-up scsi_eh
>>> though the command is actually completed, scsi_eh will invoke
>>> eh_abort_handler and the symptom will be duplicated, I think
>>>
>>> Otherwise, is there anyone who know how to guarantee the coherency?
>>
scsi_times_out() guarantees that:
	/*
	 * Set the command to complete first in order to prevent a real
	 * completion from releasing the command while error handling
	 * is using it. If the command was already completed, then the
	 * lower level driver beat the timeout handler, and it is safe
	 * to return without escalating error recovery.
	 *
	 * If timeout handling lost the race to a real completion, the
	 * block layer may ignore that due to a fake timeout injection,
	 * so return RESET_TIMER to allow error handling another shot
	 * at this command.
	 */
	if (test_and_set_bit(SCMD_STATE_COMPLETE, &scmd->state))
		return BLK_EH_RESET_TIMER;
Please read the comment above.
Can Guo.
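The arbitration that comment describes can be modelled in userspace roughly as below. This is a sketch under assumptions, not kernel code: C11 atomic_fetch_or() stands in for test_and_set_bit(), all toy_* names are hypothetical, and the completion side performing the same test-and-set is an assumption about the SCSI midlayer rather than something stated in this thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical return codes standing in for the block-layer timeout results. */
enum toy_eh_ret { TOY_EH_ESCALATE, TOY_EH_RESET_TIMER };

/* Bit 0 of 'state' plays the role of SCMD_STATE_COMPLETE. */
#define TOY_STATE_COMPLETE 1u

struct toy_scmd {
	atomic_uint state;
};

/* Atomically set the bit and report whether it was already set. */
static bool toy_test_and_set_complete(struct toy_scmd *scmd)
{
	unsigned int old = atomic_fetch_or(&scmd->state, TOY_STATE_COMPLETE);

	return (old & TOY_STATE_COMPLETE) != 0;
}

/* Timeout side: escalate only if the timeout was first to claim the command. */
static enum toy_eh_ret toy_times_out(struct toy_scmd *scmd)
{
	if (toy_test_and_set_complete(scmd))
		return TOY_EH_RESET_TIMER;	/* a real completion won the race */
	return TOY_EH_ESCALATE;
}

/* Completion side: returns true if this call owns the completion. */
static bool toy_done(struct toy_scmd *scmd)
{
	return !toy_test_and_set_complete(scmd);
}
```

Whichever side sets the bit first owns the command: if toy_done() wins, toy_times_out() returns TOY_EH_RESET_TIMER and error handling is not escalated; if toy_times_out() wins, a later toy_done() returns false and the late completion is ignored.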