All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH RFC stable 4.14 1/1] mmc: core: fix hung task caused by race condition on context_info
       [not found] ` <dd8d212c48944cb4ba3b58af2efe3723@horizon.ai>
@ 2022-09-29 12:41   ` Ulf Hansson
  2022-09-29 14:07     ` Steven Rostedt
  2022-09-29 16:39     ` Luis Claudio R. Goncalves
  0 siblings, 2 replies; 5+ messages in thread
From: Ulf Hansson @ 2022-09-29 12:41 UTC (permalink / raw)
  To: dinggao.pan
  Cc: bigeasy, tglx, rostedt, ming.yu, yunqian.wang, linux-mmc,
	linux-kernel, linux-rt-users

On Mon, 5 Sept 2022 at 08:22, dinggao.pan <dinggao.pan@horizon.ai> wrote:
>
> Hi,
> After applying rt patches to our 4.14 kernel and enabling preempt-rt, we met a hung task during boot caused by race condition on context_info stored in struct mmc_host.
> From our investigation, context_info should not be changed by threads that have not claimed the host, hence the following fix.
>
> Any comments are much appreciated.
> Dinggao Pan

Hi Dinggao,

Apologize for the delay.

The 4.14 kernel is too old for me to be able to comment. In
particular, the mmc block layer moved to blk-mq in v4.16, which means
the path you are investigating doesn't exist any more, sorry.

Kind regards
Uffe

>
> From: "Dinggao Pan" <mailto:dinggao.pan@horizon.ai>
>
>   A race condition happens under following circumstances:
>     (mmc_thread1)               |              (mmc_thread2)
>     mmc_issue_rq(req1)          |
>       > qcnt++ for req1         |
>         host handling req1      |
>     mmc_queue_thread(req=null)  |
>       > enter queue thread      |
>         again, fetches blk req  |
>         (return null), sets     |
>         is_waiting_last_req 1   |  mmc_request_fn(req1) -> set is_new_req 1
>                                 |                   and wake_up wait_queue
>     mmc_issue_rq(req2)          |   > mmc_thread2 tries to claim host
>       > **qcnt++ for req2**     |
>       mmc_finalize_req(req2)    |
>         > should wait for req1  |
>           done but req2 return  |
>           MMC_BLK_NEW_REQ       |
>           due to is_new_req     |
>           already set to 1      |
>                                 |
>                                 |
>     req1 done                   |
>       > qcnt-- for req1         |
>     mmc_issue_rq(req3)          |
>       > qcnt++ for req3         |
> req2 is not handled but qcnt is already added(noted by **),
> thus mmc_thread1 will never release host, causing mmc_threads
> except thread1 to hung. Fix race by moving wake_up to the front of
> context_info update.
>
> Reviewed By: Yunqian Wang <mailto:yunqian.wang@horizon.ai>
> Signed-off-by: Dinggao Pan <mailto:dinggao.pan@horizon.ai>
> Signed-off-by: Ming Yu <mailto:ming.yu@horizon.ai>
> ---
> drivers/mmc/core/queue.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
> index 0a4e77a5b..58318c102 100644
> --- a/drivers/mmc/core/queue.c
> +++ b/drivers/mmc/core/queue.c
> @@ -107,6 +107,11 @@ static void mmc_request_fn(struct request_queue *q)
>                return;
>       }
>
> +      if (mq->asleep) {
> +               wake_up_process(mq->thread);
> +               return;
> +      }
> +
>       cntx = &mq->card->host->context_info;
>
>       if (cntx->is_waiting_last_req) {
> @@ -114,8 +119,6 @@ static void mmc_request_fn(struct request_queue *q)
>                wake_up_interruptible(&cntx->wait);
>       }
>
> -       if (mq->asleep)
> -                wake_up_process(mq->thread);
> }
>
> static struct scatterlist *mmc_alloc_sg(int sg_len, gfp_t gfp)
> --
> 2.36.1

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH RFC stable 4.14 1/1] mmc: core: fix hung task caused by race condition on context_info
  2022-09-29 12:41   ` [PATCH RFC stable 4.14 1/1] mmc: core: fix hung task caused by race condition on context_info Ulf Hansson
@ 2022-09-29 14:07     ` Steven Rostedt
  2022-09-29 14:13       ` Steven Rostedt
  2022-09-29 16:39     ` Luis Claudio R. Goncalves
  1 sibling, 1 reply; 5+ messages in thread
From: Steven Rostedt @ 2022-09-29 14:07 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: dinggao.pan, bigeasy, tglx, ming.yu, yunqian.wang, linux-mmc,
	linux-kernel, linux-rt-users, Luis Claudio R. Goncalves

On Thu, 29 Sep 2022 14:41:26 +0200
Ulf Hansson <ulf.hansson@linaro.org> wrote:

> On Mon, 5 Sept 2022 at 08:22, dinggao.pan <dinggao.pan@horizon.ai> wrote:
> >
> > Hi,
> > After applying rt patches to our 4.14 kernel and enabling preempt-rt, we met a hung task during boot caused by race condition on context_info stored in struct mmc_host.
> > From our investigation, context_info should not be changed by threads that have not claimed the host, hence the following fix.
> >
> > Any comments are much appreciated.
> > Dinggao Pan  
> 
> Hi Dinggao,
> 
> Apologize for the delay.
> 
> The 4.14 kernel is too old for me to be able to comment. In
> particular, the mmc block layer moved to blk-mq in v4.16, which means
> the path you are investigating doesn't exist any more, sorry.

Luis (Cc'd) is still supporting the 4.14-rt kernel.

-- Steve

> 
> Kind regards
> Uffe
> 
> >
> > From: "Dinggao Pan" <mailto:dinggao.pan@horizon.ai>
> >
> >   A race condition happens under following circumstances:
> >     (mmc_thread1)               |              (mmc_thread2)
> >     mmc_issue_rq(req1)          |  
> >       > qcnt++ for req1         |  
> >         host handling req1      |
> >     mmc_queue_thread(req=null)  |  
> >       > enter queue thread      |  
> >         again, fetches blk req  |
> >         (return null), sets     |
> >         is_waiting_last_req 1   |  mmc_request_fn(req1) -> set is_new_req 1
> >                                 |                   and wake_up wait_queue
> >     mmc_issue_rq(req2)          |   > mmc_thread2 tries to claim host  
> >       > **qcnt++ for req2**     |  
> >       mmc_finalize_req(req2)    |  
> >         > should wait for req1  |  
> >           done but req2 return  |
> >           MMC_BLK_NEW_REQ       |
> >           due to is_new_req     |
> >           already set to 1      |
> >                                 |
> >                                 |
> >     req1 done                   |  
> >       > qcnt-- for req1         |  
> >     mmc_issue_rq(req3)          |  
> >       > qcnt++ for req3         |  
> > req2 is not handled but qcnt is already added(noted by **),
> > thus mmc_thread1 will never release host, causing mmc_threads
> > except thread1 to hung. Fix race by moving wake_up to the front of
> > context_info update.
> >
> > Reviewed By: Yunqian Wang <mailto:yunqian.wang@horizon.ai>
> > Signed-off-by: Dinggao Pan <mailto:dinggao.pan@horizon.ai>
> > Signed-off-by: Ming Yu <mailto:ming.yu@horizon.ai>
> > ---
> > drivers/mmc/core/queue.c | 7 +++++--
> > 1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
> > index 0a4e77a5b..58318c102 100644
> > --- a/drivers/mmc/core/queue.c
> > +++ b/drivers/mmc/core/queue.c
> > @@ -107,6 +107,11 @@ static void mmc_request_fn(struct request_queue *q)
> >                return;
> >       }
> >
> > +      if (mq->asleep) {
> > +               wake_up_process(mq->thread);
> > +               return;
> > +      }
> > +
> >       cntx = &mq->card->host->context_info;
> >
> >       if (cntx->is_waiting_last_req) {
> > @@ -114,8 +119,6 @@ static void mmc_request_fn(struct request_queue *q)
> >                wake_up_interruptible(&cntx->wait);
> >       }
> >
> > -       if (mq->asleep)
> > -                wake_up_process(mq->thread);
> > }
> >
> > static struct scatterlist *mmc_alloc_sg(int sg_len, gfp_t gfp)
> > --
> > 2.36.1  


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH RFC stable 4.14 1/1] mmc: core: fix hung task caused by race condition on context_info
  2022-09-29 14:07     ` Steven Rostedt
@ 2022-09-29 14:13       ` Steven Rostedt
  2022-09-29 14:24         ` bigeasy
  0 siblings, 1 reply; 5+ messages in thread
From: Steven Rostedt @ 2022-09-29 14:13 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: dinggao.pan, bigeasy, tglx, ming.yu, yunqian.wang, linux-mmc,
	linux-kernel, linux-rt-users, Luis Claudio R. Goncalves

On Thu, 29 Sep 2022 10:07:50 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Thu, 29 Sep 2022 14:41:26 +0200
> Ulf Hansson <ulf.hansson@linaro.org> wrote:
> 
> > On Mon, 5 Sept 2022 at 08:22, dinggao.pan <dinggao.pan@horizon.ai> wrote:  
> > >
> > > Hi,
> > > After applying rt patches to our 4.14 kernel and enabling preempt-rt, we met a hung task during boot caused by race condition on context_info stored in struct mmc_host.
> > > From our investigation, context_info should not be changed by threads that have not claimed the host, hence the following fix.
> > >
> > > Any comments are much appreciated.
> > > Dinggao Pan    
> > 
> > Hi Dinggao,
> > 
> > Apologize for the delay.
> > 
> > The 4.14 kernel is too old for me to be able to comment. In
> > particular, the mmc block layer moved to blk-mq in v4.16, which means
> > the path you are investigating doesn't exist any more, sorry.  
> 
> Luis (Cc'd) is still supporting the 4.14-rt kernel.

It appears that I have an old email address for Luis, and it bounced.
Updated with his redhat one.

-- Steve

> 
> > 
> > Kind regards
> > Uffe
> >   
> > >
> > > From: "Dinggao Pan" <mailto:dinggao.pan@horizon.ai>
> > >
> > >   A race condition happens under following circumstances:
> > >     (mmc_thread1)               |              (mmc_thread2)
> > >     mmc_issue_rq(req1)          |    
> > >       > qcnt++ for req1         |    
> > >         host handling req1      |
> > >     mmc_queue_thread(req=null)  |    
> > >       > enter queue thread      |    
> > >         again, fetches blk req  |
> > >         (return null), sets     |
> > >         is_waiting_last_req 1   |  mmc_request_fn(req1) -> set is_new_req 1
> > >                                 |                   and wake_up wait_queue
> > >     mmc_issue_rq(req2)          |   > mmc_thread2 tries to claim host    
> > >       > **qcnt++ for req2**     |    
> > >       mmc_finalize_req(req2)    |    
> > >         > should wait for req1  |    
> > >           done but req2 return  |
> > >           MMC_BLK_NEW_REQ       |
> > >           due to is_new_req     |
> > >           already set to 1      |
> > >                                 |
> > >                                 |
> > >     req1 done                   |    
> > >       > qcnt-- for req1         |    
> > >     mmc_issue_rq(req3)          |    
> > >       > qcnt++ for req3         |    
> > > req2 is not handled but qcnt is already added(noted by **),
> > > thus mmc_thread1 will never release host, causing mmc_threads
> > > except thread1 to hung. Fix race by moving wake_up to the front of
> > > context_info update.
> > >
> > > Reviewed By: Yunqian Wang <mailto:yunqian.wang@horizon.ai>
> > > Signed-off-by: Dinggao Pan <mailto:dinggao.pan@horizon.ai>
> > > Signed-off-by: Ming Yu <mailto:ming.yu@horizon.ai>
> > > ---
> > > drivers/mmc/core/queue.c | 7 +++++--
> > > 1 file changed, 5 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
> > > index 0a4e77a5b..58318c102 100644
> > > --- a/drivers/mmc/core/queue.c
> > > +++ b/drivers/mmc/core/queue.c
> > > @@ -107,6 +107,11 @@ static void mmc_request_fn(struct request_queue *q)
> > >                return;
> > >       }
> > >
> > > +      if (mq->asleep) {
> > > +               wake_up_process(mq->thread);
> > > +               return;
> > > +      }
> > > +
> > >       cntx = &mq->card->host->context_info;
> > >
> > >       if (cntx->is_waiting_last_req) {
> > > @@ -114,8 +119,6 @@ static void mmc_request_fn(struct request_queue *q)
> > >                wake_up_interruptible(&cntx->wait);
> > >       }
> > >
> > > -       if (mq->asleep)
> > > -                wake_up_process(mq->thread);
> > > }
> > >
> > > static struct scatterlist *mmc_alloc_sg(int sg_len, gfp_t gfp)
> > > --
> > > 2.36.1    
> 


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH RFC stable 4.14 1/1] mmc: core: fix hung task caused by race condition on context_info
  2022-09-29 14:13       ` Steven Rostedt
@ 2022-09-29 14:24         ` bigeasy
  0 siblings, 0 replies; 5+ messages in thread
From: bigeasy @ 2022-09-29 14:24 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ulf Hansson, dinggao.pan, tglx, ming.yu, yunqian.wang, linux-mmc,
	linux-kernel, linux-rt-users, Luis Claudio R. Goncalves

On 2022-09-29 10:13:49 [-0400], Steven Rostedt wrote:
> It appears that I have an old email address for Luis, and it bounced.
> Updated with his redhat one.

It would be to know if this affects mainline. If not, does it affect
4.14 or just 4.14-RT. If it is not limited to RT then it should be sent
-stable once we know how it is fixed mainline.

> -- Steve

Sebastian

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH RFC stable 4.14 1/1] mmc: core: fix hung task caused by race condition on context_info
  2022-09-29 12:41   ` [PATCH RFC stable 4.14 1/1] mmc: core: fix hung task caused by race condition on context_info Ulf Hansson
  2022-09-29 14:07     ` Steven Rostedt
@ 2022-09-29 16:39     ` Luis Claudio R. Goncalves
  1 sibling, 0 replies; 5+ messages in thread
From: Luis Claudio R. Goncalves @ 2022-09-29 16:39 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: dinggao.pan, bigeasy, tglx, rostedt, ming.yu, yunqian.wang,
	linux-mmc, linux-kernel, linux-rt-users

On Thu, Sep 29, 2022 at 02:41:26PM +0200, Ulf Hansson wrote:
> On Mon, 5 Sept 2022 at 08:22, dinggao.pan <dinggao.pan@horizon.ai> wrote:
> >
> > Hi,
> > After applying rt patches to our 4.14 kernel and enabling preempt-rt, we met a hung task during boot caused by race condition on context_info stored in struct mmc_host.
> > From our investigation, context_info should not be changed by threads that have not claimed the host, hence the following fix.
> >
> > Any comments are much appreciated.
> > Dinggao Pan
> 
> Hi Dinggao,
> 
> Apologize for the delay.
> 
> The 4.14 kernel is too old for me to be able to comment. In
> particular, the mmc block layer moved to blk-mq in v4.16, which means
> the path you are investigating doesn't exist any more, sorry.

And the new code has the queue operations protected by a spinlock
(queue_lock), which I believe is necessary to fix the issue reported
here.

Luis
 
> Kind regards
> Uffe
> 
> >
> > From: "Dinggao Pan" <mailto:dinggao.pan@horizon.ai>
> >
> >   A race condition happens under following circumstances:
> >     (mmc_thread1)               |              (mmc_thread2)
> >     mmc_issue_rq(req1)          |
> >       > qcnt++ for req1         |
> >         host handling req1      |
> >     mmc_queue_thread(req=null)  |
> >       > enter queue thread      |
> >         again, fetches blk req  |
> >         (return null), sets     |
> >         is_waiting_last_req 1   |  mmc_request_fn(req1) -> set is_new_req 1
> >                                 |                   and wake_up wait_queue
> >     mmc_issue_rq(req2)          |   > mmc_thread2 tries to claim host
> >       > **qcnt++ for req2**     |
> >       mmc_finalize_req(req2)    |
> >         > should wait for req1  |
> >           done but req2 return  |
> >           MMC_BLK_NEW_REQ       |
> >           due to is_new_req     |
> >           already set to 1      |
> >                                 |
> >                                 |
> >     req1 done                   |
> >       > qcnt-- for req1         |
> >     mmc_issue_rq(req3)          |
> >       > qcnt++ for req3         |
> > req2 is not handled but qcnt is already added(noted by **),
> > thus mmc_thread1 will never release host, causing mmc_threads
> > except thread1 to hung. Fix race by moving wake_up to the front of
> > context_info update.
> >
> > Reviewed By: Yunqian Wang <mailto:yunqian.wang@horizon.ai>
> > Signed-off-by: Dinggao Pan <mailto:dinggao.pan@horizon.ai>
> > Signed-off-by: Ming Yu <mailto:ming.yu@horizon.ai>
> > ---
> > drivers/mmc/core/queue.c | 7 +++++--
> > 1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
> > index 0a4e77a5b..58318c102 100644
> > --- a/drivers/mmc/core/queue.c
> > +++ b/drivers/mmc/core/queue.c
> > @@ -107,6 +107,11 @@ static void mmc_request_fn(struct request_queue *q)
> >                return;
> >       }
> >
> > +      if (mq->asleep) {
> > +               wake_up_process(mq->thread);
> > +               return;
> > +      }
> > +
> >       cntx = &mq->card->host->context_info;
> >
> >       if (cntx->is_waiting_last_req) {
> > @@ -114,8 +119,6 @@ static void mmc_request_fn(struct request_queue *q)
> >                wake_up_interruptible(&cntx->wait);
> >       }
> >
> > -       if (mq->asleep)
> > -                wake_up_process(mq->thread);
> > }
> >
> > static struct scatterlist *mmc_alloc_sg(int sg_len, gfp_t gfp)
> > --
> > 2.36.1
> 
---end quoted text---


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-09-29 16:40 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <21f604139a9a4675b9ed49292839dcfb@horizon.ai>
     [not found] ` <dd8d212c48944cb4ba3b58af2efe3723@horizon.ai>
2022-09-29 12:41   ` [PATCH RFC stable 4.14 1/1] mmc: core: fix hung task caused by race condition on context_info Ulf Hansson
2022-09-29 14:07     ` Steven Rostedt
2022-09-29 14:13       ` Steven Rostedt
2022-09-29 14:24         ` bigeasy
2022-09-29 16:39     ` Luis Claudio R. Goncalves

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.