* Question about coherency of command context between ufs and scsi
From: Kiwoong Kim @ 2021-06-14 9:52 UTC
To: linux-scsi
Cc: 'Can Guo', 'Bart Van Assche', 'Avri Altman', 'Bean Huo', 'Jaegeuk Kim'

Dear All,

I saw one symptom and started wondering how a command context is
synchronized between ufs and scsi. In the situation where the following
log happened, the lrb structure for tag 10 didn't have a command context.
That is, lrbp->cmd was NULL, which led to this kernel panic.

lrbp->cmd is set when a command is issued, and cleared when the command
is completed. But what if the command times out and, at the same time,
it is completed because its response comes in?

If scsi adds it to its error command list and wakes up scsi_eh even
though the command has actually completed, scsi_eh will invoke
eh_abort_handler and the symptom will be reproduced, I think.

Otherwise, does anyone know how the coherency is guaranteed?

[78843.058729] [3: kworker/u16:1:27018] exynos-ufs 13100000.ufs: ufshcd_abort: cmd was completed, but without a notifying intr, tag = 10
[78843.058775] [3: kworker/u16:1:27018] exynos-ufs 13100000.ufs: ufshcd_abort: Device abort task at tag 10
[78843.058793] [3: kworker/u16:1:27018] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000160
..
[78843.075421] [3: kworker/u16:1:27018] pc : scsi_print_command+0x24/0x340
[78843.075436] [3: kworker/u16:1:27018] lr : ufshcd_abort+0x180/0x674
[78843.075444] [3: kworker/u16:1:27018] sp : ffffffc038ea3c00
[78843.075453] [3: kworker/u16:1:27018] x29: ffffffc038ea3c10 x28: 0000000000000400
[78843.075464] [3: kworker/u16:1:27018] x27: ffffff8934c0a680 x26: ffffff8931560000
[78843.075474] [3: kworker/u16:1:27018] x25: 000000000002000a x24: ffffff88a0dd4910
[78843.075485] [3: kworker/u16:1:27018] x23: 0000000000000000 x22: ffffff8930f258f0
[78843.075495] [3: kworker/u16:1:27018] x21: ffffff8934c0a080 x20: 000000000000000a
[78843.075505] [3: kworker/u16:1:27018] x19: ffffff8931560cf8 x18: ffffffc037557030
[78843.075516] [3: kworker/u16:1:27018] x17: 0000000000000000 x16: ffffffc010eeba70
[78843.075526] [3: kworker/u16:1:27018] x15: ffffffc01187d88f x14: 2067617420746120
[78843.075536] [3: kworker/u16:1:27018] x13: 6b7361742074726f x12: 6261206563697665
[78843.075546] [3: kworker/u16:1:27018] x11: 44203a74726f6261 x10: 00000000ffffffff
[78843.075556] [3: kworker/u16:1:27018] x9 : 0000000000000090 x8 : ffffff8934c0a620
[78843.075566] [3: kworker/u16:1:27018] x7 : 0000000000000000 x6 : ffffffc0102a7d6c
[78843.075576] [3: kworker/u16:1:27018] x5 : 0000000000000000 x4 : 0000000000000080
[78843.075585] [3: kworker/u16:1:27018] x3 : 0000000000000000 x2 : ffffffc0102a7d80
[78843.075595] [3: kworker/u16:1:27018] x1 : ffffffc0102a7d80 x0 : 0000000000000000
[78843.075606] [3: kworker/u16:1:27018] Call trace:
[78843.075617] [3: kworker/u16:1:27018] scsi_print_command+0x24/0x340
[78843.075627] [3: kworker/u16:1:27018] ufshcd_abort+0x180/0x674
[78843.075643] [3: kworker/u16:1:27018] scmd_eh_abort_handler+0x80/0x15c
[78843.075660] [3: kworker/u16:1:27018] process_one_work+0x290/0x4e4
[78843.075669] [3: kworker/u16:1:27018] worker_thread+0x258/0x534
[78843.075681] [3: kworker/u16:1:27018] kthread+0x178/0x188
[78843.075696] [3: kworker/u16:1:27018] ret_from_fork+0x10/0x18

Thanks.
Kiwoong Kim

* Re: Question about coherency of command context between ufs and scsi
From: Can Guo @ 2021-06-15 1:09 UTC
To: Kiwoong Kim
Cc: linux-scsi, 'Bart Van Assche', 'Avri Altman', 'Bean Huo', 'Jaegeuk Kim'

Hi Kiwoong,

On 2021-06-14 17:52, Kiwoong Kim wrote:
> Dear All,
>
> I saw one symptom and started wondering how a command context is
> synchronized between ufs and scsi. In the situation where the following
> log happened, the lrb structure for tag 10 didn't have a command context.
> That is, lrbp->cmd was NULL, which led to this kernel panic.
>
> lrbp->cmd is set when a command is issued, and cleared when the command
> is completed. But what if the command times out and, at the same time,
> it is completed because its response comes in?
>
> If scsi adds it to its error command list and wakes up scsi_eh even
> though the command has actually completed, scsi_eh will invoke
> eh_abort_handler and the symptom will be reproduced, I think.
>
> Otherwise, does anyone know how the coherency is guaranteed?
>
> [78843.058729] [3: kworker/u16:1:27018] exynos-ufs 13100000.ufs: ufshcd_abort: cmd was completed, but without a notifying intr, tag = 10
> [78843.058775] [3: kworker/u16:1:27018] exynos-ufs 13100000.ufs: ufshcd_abort: Device abort task at tag 10
> [78843.058793] [3: kworker/u16:1:27018] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000160
> ..
> [78843.075421] [3: kworker/u16:1:27018] pc : scsi_print_command+0x24/0x340
> [78843.075436] [3: kworker/u16:1:27018] lr : ufshcd_abort+0x180/0x674
> [78843.075444] [3: kworker/u16:1:27018] sp : ffffffc038ea3c00
> [78843.075453] [3: kworker/u16:1:27018] x29: ffffffc038ea3c10 x28: 0000000000000400
> [78843.075464] [3: kworker/u16:1:27018] x27: ffffff8934c0a680 x26: ffffff8931560000
> [78843.075474] [3: kworker/u16:1:27018] x25: 000000000002000a x24: ffffff88a0dd4910
> [78843.075485] [3: kworker/u16:1:27018] x23: 0000000000000000 x22: ffffff8930f258f0
> [78843.075495] [3: kworker/u16:1:27018] x21: ffffff8934c0a080 x20: 000000000000000a
> [78843.075505] [3: kworker/u16:1:27018] x19: ffffff8931560cf8 x18: ffffffc037557030
> [78843.075516] [3: kworker/u16:1:27018] x17: 0000000000000000 x16: ffffffc010eeba70
> [78843.075526] [3: kworker/u16:1:27018] x15: ffffffc01187d88f x14: 2067617420746120
> [78843.075536] [3: kworker/u16:1:27018] x13: 6b7361742074726f x12: 6261206563697665
> [78843.075546] [3: kworker/u16:1:27018] x11: 44203a74726f6261 x10: 00000000ffffffff
> [78843.075556] [3: kworker/u16:1:27018] x9 : 0000000000000090 x8 : ffffff8934c0a620
> [78843.075566] [3: kworker/u16:1:27018] x7 : 0000000000000000 x6 : ffffffc0102a7d6c
> [78843.075576] [3: kworker/u16:1:27018] x5 : 0000000000000000 x4 : 0000000000000080
> [78843.075585] [3: kworker/u16:1:27018] x3 : 0000000000000000 x2 : ffffffc0102a7d80
> [78843.075595] [3: kworker/u16:1:27018] x1 : ffffffc0102a7d80 x0 : 0000000000000000
> [78843.075606] [3: kworker/u16:1:27018] Call trace:
> [78843.075617] [3: kworker/u16:1:27018] scsi_print_command+0x24/0x340
> [78843.075627] [3: kworker/u16:1:27018] ufshcd_abort+0x180/0x674
> [78843.075643] [3: kworker/u16:1:27018] scmd_eh_abort_handler+0x80/0x15c
> [78843.075660] [3: kworker/u16:1:27018] process_one_work+0x290/0x4e4
> [78843.075669] [3: kworker/u16:1:27018] worker_thread+0x258/0x534
> [78843.075681] [3: kworker/u16:1:27018] kthread+0x178/0x188
> [78843.075696] [3: kworker/u16:1:27018] ret_from_fork+0x10/0x18
>

In the 5.13 kernel, ufshcd_abort() calls scsi_print_command(cmd), while
in 5.12 and earlier kernels it calls scsi_print_command(hba->lrb[tag].cmd).
Which kernel are you using here?

Thanks,
Can Guo.

> Thanks.
> Kiwoong Kim
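
The difference between the two call forms can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `mock_cmd`, `mock_lrb`, `mock_hba`, `mock_print_command()` and the two `abort_print_*()` helpers are hypothetical stand-ins, not the real `struct scsi_cmnd`, `struct ufshcd_lrb` or `scsi_print_command()`. The model only captures what the thread states: the lrb slot's `cmd` pointer is set on issue and cleared on completion, so reading it from the abort path after a completion yields NULL, while the command pointer handed to the abort handler is unaffected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for struct scsi_cmnd and struct ufshcd_lrb. */
struct mock_cmd { int tag; };
struct mock_lrb { struct mock_cmd *cmd; };	/* NULL while the slot is idle */

#define MOCK_NUTRS 32
struct mock_hba { struct mock_lrb lrb[MOCK_NUTRS]; };

/*
 * Stand-in for scsi_print_command(). The real function dereferences its
 * argument; this model reports -1 for NULL instead of crashing.
 */
static int mock_print_command(const struct mock_cmd *cmd)
{
	if (!cmd)
		return -1;
	printf("cmd with tag %d\n", cmd->tag);
	return 0;
}

/* Issue path: the slot starts pointing at the command. */
static void mock_issue(struct mock_hba *hba, struct mock_cmd *cmd)
{
	assert(cmd->tag >= 0 && cmd->tag < MOCK_NUTRS);
	hba->lrb[cmd->tag].cmd = cmd;
}

/* Completion path: the slot's command pointer is cleared. */
static void mock_complete(struct mock_hba *hba, int tag)
{
	assert(tag >= 0 && tag < MOCK_NUTRS);
	hba->lrb[tag].cmd = NULL;
}

/* Form described for 5.12 and earlier: print via the lrb slot. */
static int abort_print_via_lrb(struct mock_hba *hba, struct mock_cmd *cmd)
{
	return mock_print_command(hba->lrb[cmd->tag].cmd);
}

/* Form described for 5.13: print the command passed to the abort handler. */
static int abort_print_via_arg(struct mock_hba *hba, struct mock_cmd *cmd)
{
	(void)hba;
	return mock_print_command(cmd);
}
```

With a command issued on tag 10 and then completed, `abort_print_via_lrb()` sees a NULL slot while `abort_print_via_arg()` still has a valid pointer. The model says nothing about whether the abort itself should proceed in that state; it only shows why the printed argument matters.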

* RE: Question about coherency of command context between ufs and scsi
From: Kiwoong Kim @ 2021-06-15 7:56 UTC
To: 'Can Guo'
Cc: linux-scsi, 'Bart Van Assche', 'Avri Altman', 'Bean Huo', 'Jaegeuk Kim'

> If scsi adds it to its error command list and wakes up scsi_eh even
> though the command has actually completed, scsi_eh will invoke
> eh_abort_handler and the symptom will be reproduced, I think.
>
> Otherwise, does anyone know how the coherency is guaranteed?

> In the 5.13 kernel, ufshcd_abort() calls scsi_print_command(cmd), while
> in 5.12 and earlier kernels it calls scsi_print_command(hba->lrb[tag].cmd).
> Which kernel are you using here?
>
> Thanks,
> Can Guo.

Thank you for the information. I'm on 5.4.
Yes, for the NULL pointer, you're right.
Then, what do you think?
In the situation I described, is the race I suggested still possible?

Thanks.
Kiwoong Kim

* Re: Question about coherency of command context between ufs and scsi
From: Can Guo @ 2021-06-15 8:07 UTC
To: Kiwoong Kim
Cc: linux-scsi, 'Bart Van Assche', 'Avri Altman', 'Bean Huo', 'Jaegeuk Kim'

On 2021-06-15 15:56, Kiwoong Kim wrote:
>> If scsi adds it to its error command list and wakes up scsi_eh even
>> though the command has actually completed, scsi_eh will invoke
>> eh_abort_handler and the symptom will be reproduced, I think.
>>
>> Otherwise, does anyone know how the coherency is guaranteed?
>
>> In the 5.13 kernel, ufshcd_abort() calls scsi_print_command(cmd), while
>> in 5.12 and earlier kernels it calls scsi_print_command(hba->lrb[tag].cmd).
>> Which kernel are you using here?
>>
>> Thanks,
>> Can Guo.
>
> Thank you for the information. I'm on 5.4.
> Yes, for the NULL pointer, you're right.
> Then, what do you think?
> In the situation I described, is the race I suggested still possible?

You can change that line in your project to match 5.13.

Thanks,
Can Guo.

>
> Thanks.
> Kiwoong Kim

* Re: Question about coherency of command context between ufs and scsi
From: Can Guo @ 2021-06-15 8:21 UTC
To: Kiwoong Kim
Cc: linux-scsi, 'Bart Van Assche', 'Avri Altman', 'Bean Huo', 'Jaegeuk Kim'

On 2021-06-15 16:07, Can Guo wrote:
> On 2021-06-15 15:56, Kiwoong Kim wrote:
>>> If scsi adds it to its error command list and wakes up scsi_eh even
>>> though the command has actually completed, scsi_eh will invoke
>>> eh_abort_handler and the symptom will be reproduced, I think.
>>>
>>> Otherwise, does anyone know how the coherency is guaranteed?
>>

scsi_times_out() guarantees that:

	/*
	 * Set the command to complete first in order to prevent a real
	 * completion from releasing the command while error handling
	 * is using it. If the command was already completed, then the
	 * lower level driver beat the timeout handler, and it is safe
	 * to return without escalating error recovery.
	 *
	 * If timeout handling lost the race to a real completion, the
	 * block layer may ignore that due to a fake timeout injection,
	 * so return RESET_TIMER to allow error handling another shot
	 * at this command.
	 */
	if (test_and_set_bit(SCMD_STATE_COMPLETE, &scmd->state))
		return BLK_EH_RESET_TIMER;

Please read the comment above.

Can Guo.

>>> In the 5.13 kernel, ufshcd_abort() calls scsi_print_command(cmd), while
>>> in 5.12 and earlier kernels it calls scsi_print_command(hba->lrb[tag].cmd).
>>> Which kernel are you using here?
>>>
>>> Thanks,
>>> Can Guo.
>>
>> Thank you for the information. I'm on 5.4.
>> Yes, for the NULL pointer, you're right.
>> Then, what do you think?
>> In the situation I described, is the race I suggested still possible?
>
> You can change that line in your project to match 5.13.
>
> Thanks,
>
> Can Guo.
>
>>
>> Thanks.
>> Kiwoong Kim
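
The arbitration described in the quoted scsi_times_out() comment can be sketched in user space with C11 atomics. Everything here is an assumption made for illustration: `mock_scmd`, `MOCK_STATE_COMPLETE`, `mock_times_out()` and `mock_done()` are hypothetical names, `atomic_fetch_or()` stands in for the kernel's test_and_set_bit(), and the sketch assumes the completion path performs the same test-and-set on the same bit, which the thread itself does not show.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the SCMD_STATE_COMPLETE bit. */
#define MOCK_STATE_COMPLETE (1u << 0)

struct mock_scmd { atomic_uint state; };

enum mock_eh_result { MOCK_EH_ESCALATE, MOCK_EH_RESET_TIMER };

/*
 * Atomically set the COMPLETE bit and return its previous value,
 * mirroring the return convention of test_and_set_bit().
 */
static bool mock_test_and_set_complete(struct mock_scmd *scmd)
{
	unsigned int old;

	assert(scmd);
	old = atomic_fetch_or(&scmd->state, MOCK_STATE_COMPLETE);
	return (old & MOCK_STATE_COMPLETE) != 0;
}

/*
 * Timeout path: if the bit was already set, a real completion won the
 * race, so error handling is not escalated for this command.
 */
static enum mock_eh_result mock_times_out(struct mock_scmd *scmd)
{
	if (mock_test_and_set_complete(scmd))
		return MOCK_EH_RESET_TIMER;
	return MOCK_EH_ESCALATE;
}

/*
 * Completion path (assumed): returns true only if this call set the bit,
 * that is, if the completion got there before the timeout handler.
 */
static bool mock_done(struct mock_scmd *scmd)
{
	return !mock_test_and_set_complete(scmd);
}
```

Because the bit is set atomically, exactly one of the two paths observes it as previously clear for a given command. If the completion wins, the timeout path returns the reset-timer result and does not escalate; if the timeout wins, a late completion sees the bit already set and backs off.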