linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH RFC stable 4.14 1/1] mmc: core: fix hung task caused by race condition on context_info
       [not found] ` <dd8d212c48944cb4ba3b58af2efe3723@horizon.ai>
@ 2022-09-29 12:41   ` Ulf Hansson
  2022-09-29 14:07     ` Steven Rostedt
  2022-09-29 16:39     ` Luis Claudio R. Goncalves
  0 siblings, 2 replies; 5+ messages in thread
From: Ulf Hansson @ 2022-09-29 12:41 UTC (permalink / raw)
  To: dinggao.pan
  Cc: bigeasy, tglx, rostedt, ming.yu, yunqian.wang, linux-mmc,
	linux-kernel, linux-rt-users

On Mon, 5 Sept 2022 at 08:22, dinggao.pan <dinggao.pan@horizon.ai> wrote:
>
> Hi,
> After applying rt patches to our 4.14 kernel and enabling preempt-rt, we met a hung task during boot caused by race condition on context_info stored in struct mmc_host.
> From our investigation, context_info should not be changed by threads that have not claimed the host, hence the following fix.
>
> Any comments are much appreciated.
> Dinggao Pan

Hi Dinggao,

Apologize for the delay.

The 4.14 kernel is too old for me to be able to comment. In
particular, the mmc block layer moved to blk-mq in v4.16, which means
the path you are investigating doesn't exist any more, sorry.

Kind regards
Uffe

>
> From: "Dinggao Pan" <mailto:dinggao.pan@horizon.ai>
>
>   A race condition happens under following circumstances:
>     (mmc_thread1)               |              (mmc_thread2)
>     mmc_issue_rq(req1)          |
>       > qcnt++ for req1         |
>         host handling req1      |
>     mmc_queue_thread(req=null)  |
>       > enter queue thread      |
>         again, fetches blk req  |
>         (return null), sets     |
>         is_waiting_last_req 1   |  mmc_request_fn(req1) -> set is_new_req 1
>                                 |                   and wake_up wait_queue
>     mmc_issue_rq(req2)          |   > mmc_thread2 tries to claim host
>       > **qcnt++ for req2**     |
>       mmc_finalize_req(req2)    |
>         > should wait for req1  |
>           done but req2 return  |
>           MMC_BLK_NEW_REQ       |
>           due to is_new_req     |
>           already set to 1      |
>                                 |
>                                 |
>     req1 done                   |
>       > qcnt-- for req1         |
>     mmc_issue_rq(req3)          |
>       > qcnt++ for req3         |
> req2 is not handled but qcnt is already added(noted by **),
> thus mmc_thread1 will never release host, causing mmc_threads
> except thread1 to hung. Fix race by moving wake_up to the front of
> context_info update.
>
> Reviewed By: Yunqian Wang <mailto:yunqian.wang@horizon.ai>
> Signed-off-by: Dinggao Pan <mailto:dinggao.pan@horizon.ai>
> Signed-off-by: Ming Yu <mailto:ming.yu@horizon.ai>
> ---
> drivers/mmc/core/queue.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
> index 0a4e77a5b..58318c102 100644
> --- a/drivers/mmc/core/queue.c
> +++ b/drivers/mmc/core/queue.c
> @@ -107,6 +107,11 @@ static void mmc_request_fn(struct request_queue *q)
>                return;
>       }
>
> +      if (mq->asleep) {
> +               wake_up_process(mq->thread);
> +               return;
> +      }
> +
>       cntx = &mq->card->host->context_info;
>
>       if (cntx->is_waiting_last_req) {
> @@ -114,8 +119,6 @@ static void mmc_request_fn(struct request_queue *q)
>                wake_up_interruptible(&cntx->wait);
>       }
>
> -       if (mq->asleep)
> -                wake_up_process(mq->thread);
> }
>
> static struct scatterlist *mmc_alloc_sg(int sg_len, gfp_t gfp)
> --
> 2.36.1

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH RFC stable 4.14 1/1] mmc: core: fix hung task caused by race condition on context_info
  2022-09-29 12:41   ` [PATCH RFC stable 4.14 1/1] mmc: core: fix hung task caused by race condition on context_info Ulf Hansson
@ 2022-09-29 14:07     ` Steven Rostedt
  2022-09-29 14:13       ` Steven Rostedt
  2022-09-29 16:39     ` Luis Claudio R. Goncalves
  1 sibling, 1 reply; 5+ messages in thread
From: Steven Rostedt @ 2022-09-29 14:07 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: dinggao.pan, bigeasy, tglx, ming.yu, yunqian.wang, linux-mmc,
	linux-kernel, linux-rt-users, Luis Claudio R. Goncalves

On Thu, 29 Sep 2022 14:41:26 +0200
Ulf Hansson <ulf.hansson@linaro.org> wrote:

> On Mon, 5 Sept 2022 at 08:22, dinggao.pan <dinggao.pan@horizon.ai> wrote:
> >
> > Hi,
> > After applying rt patches to our 4.14 kernel and enabling preempt-rt, we met a hung task during boot caused by race condition on context_info stored in struct mmc_host.
> > From our investigation, context_info should not be changed by threads that have not claimed the host, hence the following fix.
> >
> > Any comments are much appreciated.
> > Dinggao Pan  
> 
> Hi Dinggao,
> 
> Apologize for the delay.
> 
> The 4.14 kernel is too old for me to be able to comment. In
> particular, the mmc block layer moved to blk-mq in v4.16, which means
> the path you are investigating doesn't exist any more, sorry.

Luis (Cc'd) is still supporting the 4.14-rt kernel.

-- Steve

> 
> Kind regards
> Uffe
> 
> >
> > From: "Dinggao Pan" <mailto:dinggao.pan@horizon.ai>
> >
> >   A race condition happens under following circumstances:
> >     (mmc_thread1)               |              (mmc_thread2)
> >     mmc_issue_rq(req1)          |  
> >       > qcnt++ for req1         |  
> >         host handling req1      |
> >     mmc_queue_thread(req=null)  |  
> >       > enter queue thread      |  
> >         again, fetches blk req  |
> >         (return null), sets     |
> >         is_waiting_last_req 1   |  mmc_request_fn(req1) -> set is_new_req 1
> >                                 |                   and wake_up wait_queue
> >     mmc_issue_rq(req2)          |   > mmc_thread2 tries to claim host  
> >       > **qcnt++ for req2**     |  
> >       mmc_finalize_req(req2)    |  
> >         > should wait for req1  |  
> >           done but req2 return  |
> >           MMC_BLK_NEW_REQ       |
> >           due to is_new_req     |
> >           already set to 1      |
> >                                 |
> >                                 |
> >     req1 done                   |  
> >       > qcnt-- for req1         |  
> >     mmc_issue_rq(req3)          |  
> >       > qcnt++ for req3         |  
> > req2 is not handled but qcnt is already added(noted by **),
> > thus mmc_thread1 will never release host, causing mmc_threads
> > except thread1 to hung. Fix race by moving wake_up to the front of
> > context_info update.
> >
> > Reviewed By: Yunqian Wang <mailto:yunqian.wang@horizon.ai>
> > Signed-off-by: Dinggao Pan <mailto:dinggao.pan@horizon.ai>
> > Signed-off-by: Ming Yu <mailto:ming.yu@horizon.ai>
> > ---
> > drivers/mmc/core/queue.c | 7 +++++--
> > 1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
> > index 0a4e77a5b..58318c102 100644
> > --- a/drivers/mmc/core/queue.c
> > +++ b/drivers/mmc/core/queue.c
> > @@ -107,6 +107,11 @@ static void mmc_request_fn(struct request_queue *q)
> >                return;
> >       }
> >
> > +      if (mq->asleep) {
> > +               wake_up_process(mq->thread);
> > +               return;
> > +      }
> > +
> >       cntx = &mq->card->host->context_info;
> >
> >       if (cntx->is_waiting_last_req) {
> > @@ -114,8 +119,6 @@ static void mmc_request_fn(struct request_queue *q)
> >                wake_up_interruptible(&cntx->wait);
> >       }
> >
> > -       if (mq->asleep)
> > -                wake_up_process(mq->thread);
> > }
> >
> > static struct scatterlist *mmc_alloc_sg(int sg_len, gfp_t gfp)
> > --
> > 2.36.1  


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH RFC stable 4.14 1/1] mmc: core: fix hung task caused by race condition on context_info
  2022-09-29 14:07     ` Steven Rostedt
@ 2022-09-29 14:13       ` Steven Rostedt
  2022-09-29 14:24         ` bigeasy
  0 siblings, 1 reply; 5+ messages in thread
From: Steven Rostedt @ 2022-09-29 14:13 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: dinggao.pan, bigeasy, tglx, ming.yu, yunqian.wang, linux-mmc,
	linux-kernel, linux-rt-users, Luis Claudio R. Goncalves

On Thu, 29 Sep 2022 10:07:50 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Thu, 29 Sep 2022 14:41:26 +0200
> Ulf Hansson <ulf.hansson@linaro.org> wrote:
> 
> > On Mon, 5 Sept 2022 at 08:22, dinggao.pan <dinggao.pan@horizon.ai> wrote:  
> > >
> > > Hi,
> > > After applying rt patches to our 4.14 kernel and enabling preempt-rt, we met a hung task during boot caused by race condition on context_info stored in struct mmc_host.
> > > From our investigation, context_info should not be changed by threads that have not claimed the host, hence the following fix.
> > >
> > > Any comments are much appreciated.
> > > Dinggao Pan    
> > 
> > Hi Dinggao,
> > 
> > Apologize for the delay.
> > 
> > The 4.14 kernel is too old for me to be able to comment. In
> > particular, the mmc block layer moved to blk-mq in v4.16, which means
> > the path you are investigating doesn't exist any more, sorry.  
> 
> Luis (Cc'd) is still supporting the 4.14-rt kernel.

It appears that I have an old email address for Luis, and it bounced.
Updated with his redhat one.

-- Steve

> 
> > 
> > Kind regards
> > Uffe
> >   
> > >
> > > From: "Dinggao Pan" <mailto:dinggao.pan@horizon.ai>
> > >
> > >   A race condition happens under following circumstances:
> > >     (mmc_thread1)               |              (mmc_thread2)
> > >     mmc_issue_rq(req1)          |    
> > >       > qcnt++ for req1         |    
> > >         host handling req1      |
> > >     mmc_queue_thread(req=null)  |    
> > >       > enter queue thread      |    
> > >         again, fetches blk req  |
> > >         (return null), sets     |
> > >         is_waiting_last_req 1   |  mmc_request_fn(req1) -> set is_new_req 1
> > >                                 |                   and wake_up wait_queue
> > >     mmc_issue_rq(req2)          |   > mmc_thread2 tries to claim host    
> > >       > **qcnt++ for req2**     |    
> > >       mmc_finalize_req(req2)    |    
> > >         > should wait for req1  |    
> > >           done but req2 return  |
> > >           MMC_BLK_NEW_REQ       |
> > >           due to is_new_req     |
> > >           already set to 1      |
> > >                                 |
> > >                                 |
> > >     req1 done                   |    
> > >       > qcnt-- for req1         |    
> > >     mmc_issue_rq(req3)          |    
> > >       > qcnt++ for req3         |    
> > > req2 is not handled but qcnt is already added(noted by **),
> > > thus mmc_thread1 will never release host, causing mmc_threads
> > > except thread1 to hung. Fix race by moving wake_up to the front of
> > > context_info update.
> > >
> > > Reviewed By: Yunqian Wang <mailto:yunqian.wang@horizon.ai>
> > > Signed-off-by: Dinggao Pan <mailto:dinggao.pan@horizon.ai>
> > > Signed-off-by: Ming Yu <mailto:ming.yu@horizon.ai>
> > > ---
> > > drivers/mmc/core/queue.c | 7 +++++--
> > > 1 file changed, 5 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
> > > index 0a4e77a5b..58318c102 100644
> > > --- a/drivers/mmc/core/queue.c
> > > +++ b/drivers/mmc/core/queue.c
> > > @@ -107,6 +107,11 @@ static void mmc_request_fn(struct request_queue *q)
> > >                return;
> > >       }
> > >
> > > +      if (mq->asleep) {
> > > +               wake_up_process(mq->thread);
> > > +               return;
> > > +      }
> > > +
> > >       cntx = &mq->card->host->context_info;
> > >
> > >       if (cntx->is_waiting_last_req) {
> > > @@ -114,8 +119,6 @@ static void mmc_request_fn(struct request_queue *q)
> > >                wake_up_interruptible(&cntx->wait);
> > >       }
> > >
> > > -       if (mq->asleep)
> > > -                wake_up_process(mq->thread);
> > > }
> > >
> > > static struct scatterlist *mmc_alloc_sg(int sg_len, gfp_t gfp)
> > > --
> > > 2.36.1    
> 


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH RFC stable 4.14 1/1] mmc: core: fix hung task caused by race condition on context_info
  2022-09-29 14:13       ` Steven Rostedt
@ 2022-09-29 14:24         ` bigeasy
  0 siblings, 0 replies; 5+ messages in thread
From: bigeasy @ 2022-09-29 14:24 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ulf Hansson, dinggao.pan, tglx, ming.yu, yunqian.wang, linux-mmc,
	linux-kernel, linux-rt-users, Luis Claudio R. Goncalves

On 2022-09-29 10:13:49 [-0400], Steven Rostedt wrote:
> It appears that I have an old email address for Luis, and it bounced.
> Updated with his redhat one.

It would be to know if this affects mainline. If not, does it affect
4.14 or just 4.14-RT. If it is not limited to RT then it should be sent
-stable once we know how it is fixed mainline.

> -- Steve

Sebastian

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH RFC stable 4.14 1/1] mmc: core: fix hung task caused by race condition on context_info
  2022-09-29 12:41   ` [PATCH RFC stable 4.14 1/1] mmc: core: fix hung task caused by race condition on context_info Ulf Hansson
  2022-09-29 14:07     ` Steven Rostedt
@ 2022-09-29 16:39     ` Luis Claudio R. Goncalves
  1 sibling, 0 replies; 5+ messages in thread
From: Luis Claudio R. Goncalves @ 2022-09-29 16:39 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: dinggao.pan, bigeasy, tglx, rostedt, ming.yu, yunqian.wang,
	linux-mmc, linux-kernel, linux-rt-users

On Thu, Sep 29, 2022 at 02:41:26PM +0200, Ulf Hansson wrote:
> On Mon, 5 Sept 2022 at 08:22, dinggao.pan <dinggao.pan@horizon.ai> wrote:
> >
> > Hi,
> > After applying rt patches to our 4.14 kernel and enabling preempt-rt, we met a hung task during boot caused by race condition on context_info stored in struct mmc_host.
> > From our investigation, context_info should not be changed by threads that have not claimed the host, hence the following fix.
> >
> > Any comments are much appreciated.
> > Dinggao Pan
> 
> Hi Dinggao,
> 
> Apologize for the delay.
> 
> The 4.14 kernel is too old for me to be able to comment. In
> particular, the mmc block layer moved to blk-mq in v4.16, which means
> the path you are investigating doesn't exist any more, sorry.

And the new code has the queue operations protected by a spinlock
(queue_lock), which I believe is necessary to fix the issue reported
here.

Luis
 
> Kind regards
> Uffe
> 
> >
> > From: "Dinggao Pan" <mailto:dinggao.pan@horizon.ai>
> >
> >   A race condition happens under following circumstances:
> >     (mmc_thread1)               |              (mmc_thread2)
> >     mmc_issue_rq(req1)          |
> >       > qcnt++ for req1         |
> >         host handling req1      |
> >     mmc_queue_thread(req=null)  |
> >       > enter queue thread      |
> >         again, fetches blk req  |
> >         (return null), sets     |
> >         is_waiting_last_req 1   |  mmc_request_fn(req1) -> set is_new_req 1
> >                                 |                   and wake_up wait_queue
> >     mmc_issue_rq(req2)          |   > mmc_thread2 tries to claim host
> >       > **qcnt++ for req2**     |
> >       mmc_finalize_req(req2)    |
> >         > should wait for req1  |
> >           done but req2 return  |
> >           MMC_BLK_NEW_REQ       |
> >           due to is_new_req     |
> >           already set to 1      |
> >                                 |
> >                                 |
> >     req1 done                   |
> >       > qcnt-- for req1         |
> >     mmc_issue_rq(req3)          |
> >       > qcnt++ for req3         |
> > req2 is not handled but qcnt is already added(noted by **),
> > thus mmc_thread1 will never release host, causing mmc_threads
> > except thread1 to hung. Fix race by moving wake_up to the front of
> > context_info update.
> >
> > Reviewed By: Yunqian Wang <mailto:yunqian.wang@horizon.ai>
> > Signed-off-by: Dinggao Pan <mailto:dinggao.pan@horizon.ai>
> > Signed-off-by: Ming Yu <mailto:ming.yu@horizon.ai>
> > ---
> > drivers/mmc/core/queue.c | 7 +++++--
> > 1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
> > index 0a4e77a5b..58318c102 100644
> > --- a/drivers/mmc/core/queue.c
> > +++ b/drivers/mmc/core/queue.c
> > @@ -107,6 +107,11 @@ static void mmc_request_fn(struct request_queue *q)
> >                return;
> >       }
> >
> > +      if (mq->asleep) {
> > +               wake_up_process(mq->thread);
> > +               return;
> > +      }
> > +
> >       cntx = &mq->card->host->context_info;
> >
> >       if (cntx->is_waiting_last_req) {
> > @@ -114,8 +119,6 @@ static void mmc_request_fn(struct request_queue *q)
> >                wake_up_interruptible(&cntx->wait);
> >       }
> >
> > -       if (mq->asleep)
> > -                wake_up_process(mq->thread);
> > }
> >
> > static struct scatterlist *mmc_alloc_sg(int sg_len, gfp_t gfp)
> > --
> > 2.36.1
> 
---end quoted text---


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-09-29 16:40 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <21f604139a9a4675b9ed49292839dcfb@horizon.ai>
     [not found] ` <dd8d212c48944cb4ba3b58af2efe3723@horizon.ai>
2022-09-29 12:41   ` [PATCH RFC stable 4.14 1/1] mmc: core: fix hung task caused by race condition on context_info Ulf Hansson
2022-09-29 14:07     ` Steven Rostedt
2022-09-29 14:13       ` Steven Rostedt
2022-09-29 14:24         ` bigeasy
2022-09-29 16:39     ` Luis Claudio R. Goncalves

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).