dri-devel.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
* Re: [External] Re: [PATCH 2/2] sched: mark PRINTK_DEFERRED_CONTEXT_MASK in __schedule()
       [not found]       ` <20200928090143.GA2628@hirez.programming.kicks-ass.net>
@ 2020-09-28 10:04         ` Chengming Zhou
  2020-09-28 10:25           ` Peter Zijlstra
  0 siblings, 1 reply; 6+ messages in thread
From: Chengming Zhou @ 2020-09-28 10:04 UTC (permalink / raw)
  To: Peter Zijlstra, maarten.lankhorst, mripard, tzimmermann, airlied,
	daniel, dri-devel
  Cc: juri.lelli, pmladek, vincent.guittot, linux-kernel, rostedt,
	bsegall, sergey.senozhatsky, mingo, mgorman, songmuchun,
	dietmar.eggemann


在 2020/9/28 下午5:01, Peter Zijlstra 写道:
> On Mon, Sep 28, 2020 at 04:54:53PM +0800, Chengming Zhou wrote:
>> 在 2020/9/28 下午3:32, Peter Zijlstra 写道:
>>> On Mon, Sep 28, 2020 at 12:11:30AM +0800, Chengming Zhou wrote:
>>>> The WARN_ON/WARN_ON_ONCE with rq lock held in __schedule() should be
>>>> deferred by marking the PRINTK_DEFERRED_CONTEXT_MASK, or will cause
>>>> deadlock on rq lock in the printk path.
>>> It also shouldn't happen in the first place, so who bloody cares.
>> Yes, but if our box deadlock just because a WARN_ON_ONCE, we have to
>> reboot : (
> You have to reboot anyway to get into the fixed kernel.

Mostly, WARN_ON_ONCE happened in the perf code on our machines, we actually

don't care too much about the perf function works : )   Looks like we
have to find and

fix that perf bug before go on...

>> So these WARN_ON_ONCE have BUG_ON effect ?  Or we should change to use
>> BUG_ON ?
> Or use a sane printk implementation, I never suffer this.

Well, you are lucky. So it's a problem in our printk implementation.

The deadlock path is:

printk

  vprintk_emit

    console_unlock

      vt_console_print

        hide_cursor

          bit_cursor

            soft_cursor

              queue_work_on

                __queue_work

                  try_to_wake_up

                    _raw_spin_lock

                      native_queued_spin_lock_slowpath

Looks like it's introduced by this commit:

eaa434defaca1781fb2932c685289b610aeb8b4b

"drm/fb-helper: Add fb_deferred_io support"

    This adds deferred io support to drm_fb_helper.
    The fbdev framebuffer changes are flushed using the callback
    (struct drm_framebuffer *)->funcs->dirty() by a dedicated worker
    ensuring that it always runs in process context.
   
    For those wondering why we need to be able to handle atomic calling
    contexts: Both panic paths and cursor code and fbcon blanking can run
    from atomic. See
   
    commit bcb39af4486be07e896fc374a2336bad3104ae0a
    Author: Dave Airlie <airlied@redhat.com>
    Date:   Thu Feb 7 11:19:15 2013 +1000
   
        drm/udl: make usage as a console safer
   
    for where this was originally discovered.

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [External] Re: [PATCH 2/2] sched: mark PRINTK_DEFERRED_CONTEXT_MASK in __schedule()
  2020-09-28 10:04         ` [External] Re: [PATCH 2/2] sched: mark PRINTK_DEFERRED_CONTEXT_MASK in __schedule() Chengming Zhou
@ 2020-09-28 10:25           ` Peter Zijlstra
  2020-09-28 23:50             ` Sergey Senozhatsky
  2020-09-29 14:27             ` Petr Mladek
  0 siblings, 2 replies; 6+ messages in thread
From: Peter Zijlstra @ 2020-09-28 10:25 UTC (permalink / raw)
  To: Chengming Zhou
  Cc: juri.lelli, pmladek, vincent.guittot, tzimmermann, john.ogness,
	airlied, linux-kernel, dri-devel, bsegall, sergey.senozhatsky,
	mingo, rostedt, songmuchun, dietmar.eggemann, mgorman

On Mon, Sep 28, 2020 at 06:04:23PM +0800, Chengming Zhou wrote:

> Well, you are lucky. So it's a problem in our printk implementation.

Not lucky; I just kicked it in the groin really hard:

  git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git debug/experimental

> The deadlock path is:
> 
> printk
>   vprintk_emit
>     console_unlock
>       vt_console_print
>         hide_cursor
>           bit_cursor
>             soft_cursor
>               queue_work_on
>                 __queue_work
>                   try_to_wake_up
>                     _raw_spin_lock
>                       native_queued_spin_lock_slowpath
> 
> Looks like it's introduced by this commit:
> 
> eaa434defaca1781fb2932c685289b610aeb8b4b
> 
> "drm/fb-helper: Add fb_deferred_io support"

Oh gawd, yeah, all the !serial consoles are utter batshit.

Please look at John's last printk rewrite, IIRC it farms all that off to
a kernel thread instead of doing it from the printk() caller's context.

I'm not sure where he hides his latests patches, but I'm sure he'll be
more than happy to tell you.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [External] Re: [PATCH 2/2] sched: mark PRINTK_DEFERRED_CONTEXT_MASK in __schedule()
  2020-09-28 10:25           ` Peter Zijlstra
@ 2020-09-28 23:50             ` Sergey Senozhatsky
  2020-09-29 14:27             ` Petr Mladek
  1 sibling, 0 replies; 6+ messages in thread
From: Sergey Senozhatsky @ 2020-09-28 23:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: juri.lelli, pmladek, vincent.guittot, tzimmermann, john.ogness,
	airlied, songmuchun, linux-kernel, bsegall, sergey.senozhatsky,
	mingo, rostedt, dri-devel, mgorman, dietmar.eggemann,
	Chengming Zhou

On (20/09/28 12:25), Peter Zijlstra wrote:
[..]
> > printk
> >   vprintk_emit
> >     console_unlock
> >       vt_console_print
> >         hide_cursor
> >           bit_cursor
> >             soft_cursor
> >               queue_work_on
> >                 __queue_work
> >                   try_to_wake_up
> >                     _raw_spin_lock
> >                       native_queued_spin_lock_slowpath
> > 
> > Looks like it's introduced by this commit:
> > 
> > eaa434defaca1781fb2932c685289b610aeb8b4b
> > 
> > "drm/fb-helper: Add fb_deferred_io support"
> 
> Oh gawd, yeah, all the !serial consoles are utter batshit.
> 
> Please look at John's last printk rewrite, IIRC it farms all that off to
> a kernel thread instead of doing it from the printk() caller's context.

Not yet. Scheduler is still part of the printk() - either in the
form of !serial consoles or console_sem, or both.

	-ss
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [External] Re: [PATCH 2/2] sched: mark PRINTK_DEFERRED_CONTEXT_MASK in __schedule()
  2020-09-28 10:25           ` Peter Zijlstra
  2020-09-28 23:50             ` Sergey Senozhatsky
@ 2020-09-29 14:27             ` Petr Mladek
  2020-09-29 15:09               ` Peter Zijlstra
  1 sibling, 1 reply; 6+ messages in thread
From: Petr Mladek @ 2020-09-29 14:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: juri.lelli, vincent.guittot, tzimmermann, john.ogness, airlied,
	songmuchun, linux-kernel, bsegall, sergey.senozhatsky, mingo,
	rostedt, dri-devel, mgorman, dietmar.eggemann, Chengming Zhou

On Mon 2020-09-28 12:25:59, Peter Zijlstra wrote:
> On Mon, Sep 28, 2020 at 06:04:23PM +0800, Chengming Zhou wrote:
> 
> > Well, you are lucky. So it's a problem in our printk implementation.
> 
> Not lucky; I just kicked it in the groin really hard:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git debug/experimental
> 
> > The deadlock path is:
> > 
> > printk
> >   vprintk_emit
> >     console_unlock
> >       vt_console_print
> >         hide_cursor
> >           bit_cursor
> >             soft_cursor
> >               queue_work_on
> >                 __queue_work
> >                   try_to_wake_up
> >                     _raw_spin_lock
> >                       native_queued_spin_lock_slowpath
> > 
> > Looks like it's introduced by this commit:
> > 
> > eaa434defaca1781fb2932c685289b610aeb8b4b
> > 
> > "drm/fb-helper: Add fb_deferred_io support"
> 
> Oh gawd, yeah, all the !serial consoles are utter batshit.
> 
> Please look at John's last printk rewrite, IIRC it farms all that off to
> a kernel thread instead of doing it from the printk() caller's context.
> 
> I'm not sure where he hides his latests patches, but I'm sure he'll be
> more than happy to tell you.

AFAIK, John is just working on updating the patchset so that it will
be based on the lockless ringbuffer that is finally in the queue
for-5.10.

Upstreaming the console handling will be the next big step. I am sure
that there will be long discussion about it. But there might be
few things that would help removing printk_deferred().

1. Messages will be printed on consoles by dedicated kthreads. It will
   be safe context. No deadlocks.

2. The registration and unregistration of consoles should not longer
   be handled by console_lock (semaphore). It should be possible to
   call most consoles without a sleeping lock. It should remove all
   these deadlocks between printk() and scheduler().

   There might be problems with some consoles. For example, tty would
   most likely still need a sleeping lock because it is using the console
   semaphore also internally.


3. We will try harder to get the messages out immediately during
   panic().


It would take some time until the above reaches upstream. But it seems
to be the right way to go.


About printk_deferred():

It is a whack a mole game. It is easy to miss printk() that might
eventually cause the deadlock.

printk deferred context is more safe. But it is still a what a mole
game. The kthreads will do the same job for sure.

Finally, the deadlock happens "only" when someone is waiting on
console_lock() in parallel. Otherwise, the waitqueue for the semaphore
is empty and scheduler is not called.

It means that there is quite a big change to see the WARN(). It might
be even bigger than with printk_deferred() because WARN() in scheduler
means that the scheduler is big troubles. Nobody guarantees that
the deferred messages will get handled later.

Best Regards,
Petr
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [External] Re: [PATCH 2/2] sched: mark PRINTK_DEFERRED_CONTEXT_MASK in __schedule()
  2020-09-29 14:27             ` Petr Mladek
@ 2020-09-29 15:09               ` Peter Zijlstra
  2020-09-30  0:48                 ` Sergey Senozhatsky
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2020-09-29 15:09 UTC (permalink / raw)
  To: Petr Mladek
  Cc: juri.lelli, vincent.guittot, tzimmermann, john.ogness, airlied,
	songmuchun, linux-kernel, bsegall, sergey.senozhatsky, mingo,
	rostedt, dri-devel, mgorman, dietmar.eggemann, Chengming Zhou

On Tue, Sep 29, 2020 at 04:27:51PM +0200, Petr Mladek wrote:

> Upstreaming the console handling will be the next big step. I am sure
> that there will be long discussion about it. But there might be
> few things that would help removing printk_deferred().
> 
> 1. Messages will be printed on consoles by dedicated kthreads. It will
>    be safe context. No deadlocks.

This, that's what I remember. With sane consoles having a
->write_atomic() callback which is called in-line.

Current -RT has a single kthread that's flushing the consoles.

> 2. The registration and unregistration of consoles should not longer
>    be handled by console_lock (semaphore). It should be possible to
>    call most consoles without a sleeping lock. It should remove all
>    these deadlocks between printk() and scheduler().

I'm confused, who cares about registation? That only happens once at
boot, right?

>    There might be problems with some consoles. For example, tty would
>    most likely still need a sleeping lock because it is using the console
>    semaphore also internally.

But per 1) above, that's done from a kthread, so nobody cares.

> 3. We will try harder to get the messages out immediately during
>    panic().

As long as you guarantee to first write everything to consoles with
->write_atomic() before attempting to flush others that should be fine.

> It would take some time until the above reaches upstream. But it seems
> to be the right way to go.

Yep.

> Finally, the deadlock happens "only" when someone is waiting on
> console_lock() in parallel. Otherwise, the waitqueue for the semaphore
> is empty and scheduler is not called.

There's more deadlocks. In fact the one described earlier in this
thread isn't the one you mention.

> It means that there is quite a big change to see the WARN(). It might
> be even bigger than with printk_deferred() because WARN() in scheduler
> means that the scheduler is big troubles. Nobody guarantees that
> the deferred messages will get handled later.

I only care about ->write_atomic() :-) Anything else is a
best-effort-loss anyway.

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [External] Re: [PATCH 2/2] sched: mark PRINTK_DEFERRED_CONTEXT_MASK in __schedule()
  2020-09-29 15:09               ` Peter Zijlstra
@ 2020-09-30  0:48                 ` Sergey Senozhatsky
  0 siblings, 0 replies; 6+ messages in thread
From: Sergey Senozhatsky @ 2020-09-30  0:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: juri.lelli, Petr Mladek, vincent.guittot, john.ogness, airlied,
	songmuchun, linux-kernel, bsegall, sergey.senozhatsky, mingo,
	rostedt, dri-devel, tzimmermann, mgorman, dietmar.eggemann,
	Chengming Zhou

On (20/09/29 17:09), Peter Zijlstra wrote:
> > 2. The registration and unregistration of consoles should not longer
> >    be handled by console_lock (semaphore). It should be possible to
> >    call most consoles without a sleeping lock. It should remove all
> >    these deadlocks between printk() and scheduler().
> 
> I'm confused, who cares about registation? That only happens once at
> boot, right?

You can modprobe or rmmod (register/unregister) netconsole anytime,
for example.

	-ss
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-09-30  7:50 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20200927161130.33172-1-zhouchengming@bytedance.com>
     [not found] ` <20200927161130.33172-2-zhouchengming@bytedance.com>
     [not found]   ` <20200928073202.GA2611@hirez.programming.kicks-ass.net>
     [not found]     ` <40ab934e-5b8b-735b-da65-3043efab9fdc@bytedance.com>
     [not found]       ` <20200928090143.GA2628@hirez.programming.kicks-ass.net>
2020-09-28 10:04         ` [External] Re: [PATCH 2/2] sched: mark PRINTK_DEFERRED_CONTEXT_MASK in __schedule() Chengming Zhou
2020-09-28 10:25           ` Peter Zijlstra
2020-09-28 23:50             ` Sergey Senozhatsky
2020-09-29 14:27             ` Petr Mladek
2020-09-29 15:09               ` Peter Zijlstra
2020-09-30  0:48                 ` Sergey Senozhatsky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).