From: Michal Hocko <mhocko@kernel.org>
To: Qian Cai <cai@lca.pw>
Cc: akpm@linux-foundation.org, sergey.senozhatsky.work@gmail.com,
	pmladek@suse.com, rostedt@goodmis.org, peterz@infradead.org,
	david@redhat.com, john.ogness@linutronix.de, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] mm/page_isolation: fix a deadlock with printk()
Date: Mon, 7 Oct 2019 10:07:42 +0200	[thread overview]
Message-ID: <20191007080742.GD2381@dhcp22.suse.cz> (raw)
In-Reply-To: <1570228005-24979-1-git-send-email-cai@lca.pw>

On Fri 04-10-19 18:26:45, Qian Cai wrote:
> It is unsafe to call printk() while zone->lock is held, i.e.,
> 
> zone->lock --> console_lock
> 
> because the console could always allocate some memory in different code
> paths and form locking chains in the opposite order,
> 
> console_lock --> * --> zone->lock
> 
> As a result, it triggers lockdep splats like the one below, and in
> different code paths in this thread [1]. has_unmovable_pages() is only
> used in set_migratetype_isolate() and is_pageblock_removable_nolock(),
> and only the former sets the REPORT_FAILURE flag, which leads to
> printk(). Hence, unlock zone->lock just before the dump_page() there;
> when has_unmovable_pages() returns true, there is no need to hold the
> lock in the rest of set_migratetype_isolate() anyway.
> 
> While at it, also remove a problematic debugging-only printk() in
> __offline_isolated_pages(), which would otherwise always disable
> lockdep on debug kernels.
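
For reference, the restructuring described above would look roughly like
this (a simplified sketch, not the actual patch; the has_unmovable_pages()
signature and the isolation details in mm/page_isolation.c are
abbreviated here):

    static int set_migratetype_isolate(struct page *page, int migratetype,
                                       int isol_flags)
    {
            struct zone *zone = page_zone(page);
            struct page *unmovable = NULL;
            unsigned long flags;
            int ret = -EBUSY;

            spin_lock_irqsave(&zone->lock, flags);

            if (!has_unmovable_pages(zone, page, migratetype, isol_flags)) {
                    /* ... do the actual isolation while still under the lock ... */
                    ret = 0;
            } else if (isol_flags & REPORT_FAILURE) {
                    /* remember the offending page, report it later */
                    unmovable = page;
            }

            spin_unlock_irqrestore(&zone->lock, flags);

            /* dump_page() -> printk() only after zone->lock has been dropped */
            if (unmovable)
                    dump_page(unmovable, "unmovable page");

            return ret;
    }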

I do not think that removing the printk is the right long term solution.
While I do agree that removing the debugging printk in
__offline_isolated_pages makes sense, because it is of very limited use,
this doesn't really solve the underlying problem. There are likely other
printks called under zone->lock. It would be much saner to actually
disallow consoles to allocate any memory while printk is called from an
atomic context.
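
To illustrate the direction I mean: a console write path that never
calls into the page allocator cannot close the circle back to
zone->lock. A hypothetical sketch (the names are made up, this is not
an existing printk/console interface; a real driver would preallocate
whatever it needs at registration time):

    /* preallocated at boot; the write path never needs the allocator */
    static char con_buf[4096];

    static void safe_console_write(struct console *con, const char *s,
                                   unsigned int count)
    {
            /*
             * Copy into the static buffer and push it out to the device.
             * Nothing here can take zone->lock, so no console-side lock
             * ever nests inside the allocator's locks.
             */
            count = min_t(unsigned int, count, sizeof(con_buf) - 1);
            memcpy(con_buf, s, count);
            con_buf[count] = '\0';
            /* ... hand con_buf over to the hardware, e.g. via sclp ... */
    }

The sclp path in the report below does the opposite: sclp_init() calls
__get_free_pages() with sclp_lock held, which is exactly what records
the sclp_lock --> zone->lock dependency.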

> The problem has probably been there forever, but few developers run
> memory offline with lockdep enabled, and admins in the field have not
> yet been unlucky enough to hit the perfect timing required to trigger
> a real deadlock. In addition, there aren't many places that call
> printk() while zone->lock is held.
> 
> WARNING: possible circular locking dependency detected
> ------------------------------------------------------
> test.sh/1724 is trying to acquire lock:
> 0000000052059ec0 (console_owner){-...}, at: console_unlock+0x328/0xa30
> 
> but task is already holding lock:
> 000000006ffd89c8 (&(&zone->lock)->rlock){-.-.}, at: start_isolate_page_range+0x216/0x538

I am also wondering what this lockdep report actually says. How come
we have a dependency between a start_kernel path and a syscall?

> which lock already depends on the new lock.
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #2 (&(&zone->lock)->rlock){-.-.}:
>        lock_acquire+0x21a/0x468
>        _raw_spin_lock+0x54/0x68
>        get_page_from_freelist+0x8b6/0x2d28
>        __alloc_pages_nodemask+0x246/0x658
>        __get_free_pages+0x34/0x78
>        sclp_init+0x106/0x690
>        sclp_register+0x2e/0x248
>        sclp_rw_init+0x4a/0x70
>        sclp_console_init+0x4a/0x1b8
>        console_init+0x2c8/0x410
>        start_kernel+0x530/0x6a0
>        startup_continue+0x70/0xd0
> 
> -> #1 (sclp_lock){-.-.}:
>        lock_acquire+0x21a/0x468
>        _raw_spin_lock_irqsave+0xcc/0xe8
>        sclp_add_request+0x34/0x308
>        sclp_conbuf_emit+0x100/0x138
>        sclp_console_write+0x96/0x3b8
>        console_unlock+0x6dc/0xa30
>        vprintk_emit+0x184/0x3c8
>        vprintk_default+0x44/0x50
>        printk+0xa8/0xc0
>        iommu_debugfs_setup+0xf2/0x108
>        iommu_init+0x6c/0x78
>        do_one_initcall+0x162/0x680
>        kernel_init_freeable+0x4e8/0x5a8
>        kernel_init+0x2a/0x188
>        ret_from_fork+0x30/0x34
>        kernel_thread_starter+0x0/0xc
> 
> -> #0 (console_owner){-...}:
>        check_noncircular+0x338/0x3e0
>        __lock_acquire+0x1e66/0x2d88
>        lock_acquire+0x21a/0x468
>        console_unlock+0x3a6/0xa30
>        vprintk_emit+0x184/0x3c8
>        vprintk_default+0x44/0x50
>        printk+0xa8/0xc0
>        __dump_page+0x1dc/0x710
>        dump_page+0x2e/0x58
>        has_unmovable_pages+0x2e8/0x470
>        start_isolate_page_range+0x404/0x538
>        __offline_pages+0x22c/0x1338
>        memory_subsys_offline+0xa6/0xe8
>        device_offline+0xe6/0x118
>        state_store+0xf0/0x110
>        kernfs_fop_write+0x1bc/0x270
>        vfs_write+0xce/0x220
>        ksys_write+0xea/0x190
>        system_call+0xd8/0x2b4
> 
> other info that might help us debug this:
> 
> Chain exists of:
>   console_owner --> sclp_lock --> &(&zone->lock)->rlock
> 
>  Possible unsafe locking scenario:
> 
>        CPU0                    CPU1
>        ----                    ----
>   lock(&(&zone->lock)->rlock);
>                                lock(sclp_lock);
>                                lock(&(&zone->lock)->rlock);
>   lock(console_owner);
> 
>  *** DEADLOCK ***
> 
> 9 locks held by test.sh/1724:
>  #0: 000000000e925408 (sb_writers#4){.+.+}, at: vfs_write+0x201:
>  #1: 0000000050aa4280 (&of->mutex){+.+.}, at: kernfs_fop_write:
>  #2: 0000000062e5c628 (kn->count#198){.+.+}, at: kernfs_fop_write
>  #3: 00000000523236a0 (device_hotplug_lock){+.+.}, at:
> lock_device_hotplug_sysfs+0x30/0x80
>  #4: 0000000062e70990 (&dev->mutex){....}, at: device_offline
>  #5: 0000000051fd36b0 (cpu_hotplug_lock.rw_sem){++++}, at:
> __offline_pages+0xec/0x1338
>  #6: 00000000521ca470 (mem_hotplug_lock.rw_sem){++++}, at:
> percpu_down_write+0x38/0x210
>  #7: 000000006ffd89c8 (&(&zone->lock)->rlock){-.-.}, at:
> start_isolate_page_range+0x216/0x538
>  #8: 000000005205a100 (console_lock){+.+.}, at: vprintk_emit
> 
> stack backtrace:
> Hardware name: IBM 2964 N96 400 (z/VM 6.4.0)
> Call Trace:
> ([<00000000512ae218>] show_stack+0x110/0x1b0)
>  [<0000000051b6d506>] dump_stack+0x126/0x178
>  [<00000000513a4b08>] check_noncircular+0x338/0x3e0
>  [<00000000513aaaf6>] __lock_acquire+0x1e66/0x2d88
>  [<00000000513a7e12>] lock_acquire+0x21a/0x468
>  [<00000000513bb2fe>] console_unlock+0x3a6/0xa30
>  [<00000000513bde2c>] vprintk_emit+0x184/0x3c8
>  [<00000000513be0b4>] vprintk_default+0x44/0x50
>  [<00000000513beb60>] printk+0xa8/0xc0
>  [<000000005158c364>] __dump_page+0x1dc/0x710
>  [<000000005158c8c6>] dump_page+0x2e/0x58
>  [<00000000515d87c8>] has_unmovable_pages+0x2e8/0x470
>  [<000000005167072c>] start_isolate_page_range+0x404/0x538
>  [<0000000051b96de4>] __offline_pages+0x22c/0x1338
>  [<0000000051908586>] memory_subsys_offline+0xa6/0xe8
>  [<00000000518e561e>] device_offline+0xe6/0x118
>  [<0000000051908170>] state_store+0xf0/0x110
>  [<0000000051796384>] kernfs_fop_write+0x1bc/0x270
>  [<000000005168972e>] vfs_write+0xce/0x220
>  [<0000000051689b9a>] ksys_write+0xea/0x190
>  [<0000000051ba9990>] system_call+0xd8/0x2b4
> INFO: lockdep is turned off.
-- 
Michal Hocko
SUSE Labs

