Date: Mon, 7 Oct 2019 14:43:56 +0200
From: Michal Hocko
To: Qian Cai
Cc: akpm@linux-foundation.org, sergey.senozhatsky.work@gmail.com,
    pmladek@suse.com, rostedt@goodmis.org, peterz@infradead.org,
    david@redhat.com, john.ogness@linutronix.de, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] mm/page_isolation: fix a deadlock with printk()
Message-ID: <20191007124356.GJ2381@dhcp22.suse.cz>
References: <20191007080742.GD2381@dhcp22.suse.cz>
    <20191007113710.GH2381@dhcp22.suse.cz>
    <1570450304.5576.283.camel@lca.pw>
In-Reply-To: <1570450304.5576.283.camel@lca.pw>

On Mon 07-10-19 08:11:44, Qian Cai wrote:
> On Mon, 2019-10-07 at 13:37 +0200, Michal Hocko wrote:
> > On Mon 07-10-19 07:04:00, Qian Cai wrote:
> > >
> > >
> > > > On Oct 7, 2019, at 4:07 AM, Michal Hocko wrote:
> > > >
> > > > I do not think that removing the printk is the right long term solution.
> > > > While I do agree that removing the debugging printk __offline_isolated_pages
> > > > does make sense because it is essentially of a very limited use, this
> > > > doesn't really solve the underlying problem. There are likely other
> > > > printks from zone->lock. It would be much more saner to actually
> > > > disallow consoles to allocate any memory while printk is called from an
> > > > atomic context.
> > >
> > > No, there is only a handful of places called printk() from
> > > zone->lock.
> > > It is normal that the callers will quietly process
> > > “struct zone” modification in a short section with zone->lock
> > > held.
> >
> > It is extremely error prone to have any zone->lock vs. printk
> > dependency. I do not want to play an endless whack a mole.
> >
> > > No, it is not about “allocate any memory while printk is called from an
> > > atomic context”. It is opposite lock chain from different processors
> > > which has the same effect. For example,
> > >
> > > CPU0:            CPU1:            CPU2:
> > > console_owner
> > >                  sclp_lock
> > > sclp_lock                         zone_lock
> > >                  zone_lock
> > >                                   console_owner
> >
> > Why would sclp_lock ever take a zone->lock (apart from an allocation).
> > So really if sclp_lock is a lock that might be taken from many contexts
> > and generate very subtle lock dependencies then it should better be
> > really careful what it is calling into.
> >
> > In other words you are trying to fix a wrong end of the problem. Fix the
> > console to not allocate or depend on MM by other means.
>
> It looks there are way too many places that could generate those indirect lock
> chains that are hard to eliminate them all. Here is another example, where it
> has,

Yeah and I strongly suspect they are consoles which are broken and need
to be fixed rather than the problem papered over.

I do realize how tempting it is to remove all printks from the
zone->lock but do realize that as soon as the allocator starts using any
other locks then we are back to square one and the problem is there
again. We would have to drop _all_ printks from any locked section in
the allocator and I do not think this is viable.

Really, the only way forward is to make these consoles be more careful
of external dependencies.

I am also wondering, this code has been there for a long time (or is
there any recent change?), why are we seeing reports only now? Are those
consoles rarely used or are you simply lucky to hit those?
Or are those really representing a deadlock? Maybe the lockdep is just
confused? I am not familiar with the code but console_owner_lock is doing
some complex stuff to hand over the context.

> console_owner -> port_lock
> port_lock -> zone_lock
>
> [  297.425922] -> #3 (&(&zone->lock)->rlock){-.-.}:
> [  297.425925]        __lock_acquire+0x5b3/0xb40
> [  297.425925]        lock_acquire+0x126/0x280
> [  297.425926]        _raw_spin_lock+0x2f/0x40
> [  297.425927]        rmqueue_bulk.constprop.21+0xb6/0x1160
> [  297.425928]        get_page_from_freelist+0x898/0x22c0
> [  297.425928]        __alloc_pages_nodemask+0x2f3/0x1cd0
> [  297.425929]        alloc_pages_current+0x9c/0x110
> [  297.425930]        allocate_slab+0x4c6/0x19c0
> [  297.425931]        new_slab+0x46/0x70
> [  297.425931]        ___slab_alloc+0x58b/0x960
> [  297.425932]        __slab_alloc+0x43/0x70
> [  297.425933]        __kmalloc+0x3ad/0x4b0
> [  297.425933]        __tty_buffer_request_room+0x100/0x250
> [  297.425934]        tty_insert_flip_string_fixed_flag+0x67/0x110
> [  297.425935]        pty_write+0xa2/0xf0
> [  297.425936]        n_tty_write+0x36b/0x7b0
> [  297.425936]        tty_write+0x284/0x4c0
> [  297.425937]        __vfs_write+0x50/0xa0
> [  297.425938]        vfs_write+0x105/0x290
> [  297.425939]        redirected_tty_write+0x6a/0xc0
> [  297.425939]        do_iter_write+0x248/0x2a0
> [  297.425940]        vfs_writev+0x106/0x1e0
> [  297.425941]        do_writev+0xd4/0x180
> [  297.425941]        __x64_sys_writev+0x45/0x50
> [  297.425942]        do_syscall_64+0xcc/0x76c
> [  297.425943]        entry_SYSCALL_64_after_hwframe+0x49/0xbe

-- 
Michal Hocko
SUSE Labs
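
For readers following the CPU0/CPU1/CPU2 chain quoted above, here is a
rough sketch of why those three "held -> acquired" pairs add up to a
possible deadlock. It is a toy model only, written in Python for brevity:
the function name find_lock_cycle and the edge-list representation are
made up for this illustration and are not how lockdep is implemented. It
just looks for a cycle in the lock-order graph with a depth-first search.

```python
from collections import defaultdict


def find_lock_cycle(edges):
    """Return one cycle in the 'held -> acquired' lock graph, or None.

    edges is a list of (held, acquired) pairs. Hypothetical helper for
    illustration only; this is not a kernel or lockdep interface.
    """
    graph = defaultdict(list)
    for held, acquired in edges:
        graph[held].append(acquired)

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the DFS path, finished
    colour = defaultdict(int)     # every lock starts out WHITE
    path = []

    def visit(lock):
        colour[lock] = GREY
        path.append(lock)
        for nxt in graph[lock]:
            if colour[nxt] == GREY:
                # Back edge: nxt is already on the current path, so the
                # path from nxt to here plus this edge closes a cycle.
                return path[path.index(nxt):] + [nxt]
            if colour[nxt] == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        colour[lock] = BLACK
        return None

    for lock in list(graph):
        if colour[lock] == WHITE:
            cycle = visit(lock)
            if cycle:
                return cycle
    return None


# The three orderings from the quoted example, one per CPU.
chain = [
    ("console_owner", "sclp_lock"),  # CPU0
    ("sclp_lock", "zone_lock"),      # CPU1
    ("zone_lock", "console_owner"),  # CPU2
]

print(find_lock_cycle(chain))
```

In this model, dropping any single edge makes the cycle go away. Removing
the printk under zone->lock drops zone_lock -> console_owner, while a
console that does not allocate under its own lock would drop
sclp_lock -> zone_lock instead, which is the end of the problem argued
for above.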