From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.1 required=3.0 tests=DKIM_INVALID,DKIM_SIGNED, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, USER_AGENT_SANE_2 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 268CDECE58C for ; Mon, 7 Oct 2019 13:07:07 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id C4F0521655 for ; Mon, 7 Oct 2019 13:07:06 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=lca.pw header.i=@lca.pw header.b="p9/p1FMN" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C4F0521655 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=lca.pw Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 74D958E0006; Mon, 7 Oct 2019 09:07:06 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 725A88E0003; Mon, 7 Oct 2019 09:07:06 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 63AE28E0006; Mon, 7 Oct 2019 09:07:06 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0048.hostedemail.com [216.40.44.48]) by kanga.kvack.org (Postfix) with ESMTP id 41CF48E0003 for ; Mon, 7 Oct 2019 09:07:06 -0400 (EDT) Received: from smtpin08.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with SMTP id D8B87180AD829 for ; Mon, 7 Oct 2019 13:07:05 +0000 (UTC) X-FDA: 76017014010.08.moon11_5b069cce60013 X-HE-Tag: moon11_5b069cce60013 X-Filterd-Recvd-Size: 9355 Received: from mail-qt1-f194.google.com (mail-qt1-f194.google.com [209.85.160.194]) by imf28.hostedemail.com (Postfix) with ESMTP for ; Mon, 7 Oct 2019 13:07:05 +0000 (UTC) Received: by mail-qt1-f194.google.com with SMTP id m15so19028258qtq.2 for ; Mon, 07 Oct 2019 06:07:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=lca.pw; s=google; h=message-id:subject:from:to:cc:date:in-reply-to:references :mime-version:content-transfer-encoding; bh=+s+M60GubuM9arGmnwNneSr3U0t+wI+6sJoXV3Zv6RU=; b=p9/p1FMNgMiG2rS6Oq8Q5l2X9KfACwjrYJ2skWhF3ADm+ayLu83ZLdZehG+8tCSNCx /9Ttcy5VkIDdIekYelvmRhQZHTY0yc24Sa0jSErJ6HPbpyPG6i8bCjQmC3GuuZS5PV/I B0zdqmq50Yc6yqyYNUHAmzpemTD5S2shYITGAmnp8vg80c1SiZUcZrgriqBex2r7udo8 Mo89lBnB9ASKEN4sa4FN0xm2nYfKM7lMFNMX0PGv/vBCwl4rk1H6C51L2vIEQ8q2HyML YDPVp9no9wzqsAQOLtCmAz2b3+NBA7XaSoMj8gkjIpLfVohpgnwnZTeA9CUpUvNp70D0 l82w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:subject:from:to:cc:date:in-reply-to :references:mime-version:content-transfer-encoding; bh=+s+M60GubuM9arGmnwNneSr3U0t+wI+6sJoXV3Zv6RU=; b=X8Pz/PWSPAr6XKcIvbedbvn+rUEzDpKpoH5ztXawQntLTdy0ilE7p+FMZGafFcrcDh kgYttXVIFEKAnGPHe+w6l6lQHirDN7CkowuosWd9zL+eXWfgHePhIQuU9sH7E8QP92BM WCEGl6EQA8hK6XB4BuR9B2rLmCzldAqQAs36I0W99wQxZvl3KTK9gdJrWW4J9JS4181n Cq3lKf7l++TeL5L9paZdGr8W2UrkmxSDJ7SqKdVS0jR7sLs9pOcMhX3M+vPv27+2neXU LPD5TYBbgNCWqIqZh8ajz9M/J3JS3DdkJeGtLWZ0Q4RhLsdmE7y+ZsNWT7TZ/DEhVvZL F0Hg== X-Gm-Message-State: APjAAAWPpggUSlH8jbip4oQl7Jyb30DKGlPSVkZVslU/uk4wj3c3eTYi M869ik2d7ac3CNm3ak21s5Rn8w== X-Google-Smtp-Source: APXvYqzprZ6K72dnr7RK5nPxTmfCNilo1/EJbF13sRvrSWNs+dDCK/qkLqXDB55Y4BqhB9HYygm1PQ== X-Received: by 2002:a0c:946f:: with SMTP id i44mr3499064qvi.133.1570453624603; Mon, 07 Oct 2019 06:07:04 -0700 (PDT) Received: from dhcp-41-57.bos.redhat.com (nat-pool-bos-t.redhat.com. [66.187.233.206]) by smtp.gmail.com with ESMTPSA id q207sm8310045qke.98.2019.10.07.06.07.03 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 07 Oct 2019 06:07:04 -0700 (PDT) Message-ID: <1570453622.5576.288.camel@lca.pw> Subject: Re: [PATCH v2] mm/page_isolation: fix a deadlock with printk() From: Qian Cai To: Michal Hocko Cc: akpm@linux-foundation.org, sergey.senozhatsky.work@gmail.com, pmladek@suse.com, rostedt@goodmis.org, peterz@infradead.org, david@redhat.com, john.ogness@linutronix.de, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Mon, 07 Oct 2019 09:07:02 -0400 In-Reply-To: <20191007124356.GJ2381@dhcp22.suse.cz> References: <20191007080742.GD2381@dhcp22.suse.cz> <20191007113710.GH2381@dhcp22.suse.cz> <1570450304.5576.283.camel@lca.pw> <20191007124356.GJ2381@dhcp22.suse.cz> Content-Type: text/plain; charset="UTF-8" X-Mailer: Evolution 3.22.6 (3.22.6-10.el7) Mime-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Mon, 2019-10-07 at 14:43 +0200, Michal Hocko wrote: > On Mon 07-10-19 08:11:44, Qian Cai wrote: > > On Mon, 2019-10-07 at 13:37 +0200, Michal Hocko wrote: > > > On Mon 07-10-19 07:04:00, Qian Cai wrote: > > > >=20 > > > >=20 > > > > > On Oct 7, 2019, at 4:07 AM, Michal Hocko wr= ote: > > > > >=20 > > > > > I do not think that removing the printk is the right long term = solution. > > > > > While I do agree that removing the debugging printk __offline_i= solated_pages > > > > > does make sense because it is essentially of a very limited use= , this > > > > > doesn't really solve the underlying problem. There are likely = other > > > > > printks from zone->lock. It would be much more saner to actuall= y > > > > > disallow consoles to allocate any memory while printk is called= from an > > > > > atomic context. > > > >=20 > > > > No, there is only a handful of places called printk() from > > > > zone->lock. It is normal that the callers will quietly process > > > > =E2=80=9Cstruct zone=E2=80=9D modification in a short section wit= h zone->lock > > > > held. > > >=20 > > > It is extremely error prone to have any zone->lock vs. printk > > > dependency. I do not want to play an endless whack a mole. > > >=20 > > > > No, it is not about =E2=80=9Callocate any memory while printk is = called from an > > > > atomic context=E2=80=9D. It is opposite lock chain from differen= t processors which has the same effect. For example, > > > >=20 > > > > CPU0: CPU1: CPU2: > > > > console_owner > > > > sclp_lock > > > > sclp_lock zone_lock > > > > zone_lock > > > > console_owner > > >=20 > > > Why would sclp_lock ever take a zone->lock (apart from an allocatio= n). > > > So really if sclp_lock is a lock that might be taken from many cont= exts > > > and generate very subtle lock dependencies then it should better be > > > really careful what it is calling into. > > >=20 > > > In other words you are trying to fix a wrong end of the problem. Fi= x the > > > console to not allocate or depend on MM by other means. > >=20 > > It looks there are way too many places that could generate those indi= rect lock > > chains that are hard to eliminate them all. Here is anther example, w= here it > > has, >=20 > Yeah and I strongly suspect they are consoles which are broken and need > to be fixed rathert than the problem papered over. >=20 > I do realize how tempting it is to remove all printks from the > zone->lock but do realize that as soon as the allocator starts using an= y > other locks then we are back to square one and the problem is there > again. We would have to drop _all_ printks from any locked section in > the allocator and I do not think this is viable. >=20 > Really, the only way forward is to make these consoles be more careful > of external dependencies. Even with the new printk() Petr proposed. There is no guarantee it will f= ix it properly. It looks like just reduce the chance of this kind of deadlocks = as it may or may not call=C2=A0wake_up_klogd() in=C2=A0vprintk_emit() depends o= n timing. zone->lock printk_deferred() vprintk_emit() wake_up_klogd() wake_up_klogd_work_func() console_unlock() which essentially went into the same path, zone_lock -> console_owner_lock What else options it has here? >=20 > I am also wondering, this code is there for a long time (or is there an= y > recent change?), why are we seeing reports only now? Are those consoles > rarely used or you are simply luck to hit those? Or are those really > representing a deadlock? Maybe the lockdep is just confused? I am not > familiar with the code but console_owner_lock is doing some complex > stuff to hand over the context. As I mentioned in the changelog that almost nobody call printk() with zon= e->lock=20 held except in memory offline. "The problem is probably there forever, but neither many developers will run memory offline with the lockdep enabled nor admins in the field are lucky enough yet to hit a perfect timing which required to trigger a real deadlock. In addition, there aren't many places that call printk() while zone->lock was held." >=20 > > console_owner -> port_lock > > port_lock -> zone_lock > >=20 > > [=C2=A0=C2=A0297.425922] -> #3 (&(&zone->lock)->rlock){-.-.}: > > [=C2=A0=C2=A0297.425925]=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0__lock_acquire+0x5b3/0xb40 > > [=C2=A0=C2=A0297.425925]=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0lock_acquire+0x126/0x280 > > [=C2=A0=C2=A0297.425926]=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0_raw_spin_lock+0x2f/0x40 > > [=C2=A0=C2=A0297.425927]=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0rmqueue_bulk.constprop.21+0xb6/0x1160 > > [=C2=A0=C2=A0297.425928]=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0get_page_from_freelist+0x898/0x22c0 > > [=C2=A0=C2=A0297.425928]=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0__alloc_pages_nodemask+0x2f3/0x1cd0 > > [=C2=A0=C2=A0297.425929]=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0alloc_pages_current+0x9c/0x110 > > [=C2=A0=C2=A0297.425930]=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0allocate_slab+0x4c6/0x19c0 > > [=C2=A0=C2=A0297.425931]=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0new_slab+0x46/0x70 > > [=C2=A0=C2=A0297.425931]=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0___slab_alloc+0x58b/0x960 > > [=C2=A0=C2=A0297.425932]=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0__slab_alloc+0x43/0x70 > > [=C2=A0=C2=A0297.425933]=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0__kmalloc+0x3ad/0x4b0 > > [=C2=A0=C2=A0297.425933]=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0__tty_buffer_request_room+0x100/0x250 > > [=C2=A0=C2=A0297.425934]=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0tty_insert_flip_string_fixed_flag+0x67/0x110 > > [=C2=A0=C2=A0297.425935]=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0pty_write+0xa2/0xf0 > > [=C2=A0=C2=A0297.425936]=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0n_tty_write+0x36b/0x7b0 > > [=C2=A0=C2=A0297.425936]=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0tty_write+0x284/0x4c0 > > [=C2=A0=C2=A0297.425937]=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0__vfs_write+0x50/0xa0 > > [=C2=A0=C2=A0297.425938]=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0vfs_write+0x105/0x290 > > [=C2=A0=C2=A0297.425939]=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0redirected_tty_write+0x6a/0xc0 > > [=C2=A0=C2=A0297.425939]=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0do_iter_write+0x248/0x2a0 > > [=C2=A0=C2=A0297.425940]=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0vfs_writev+0x106/0x1e0 > > [=C2=A0=C2=A0297.425941]=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0do_writev+0xd4/0x180 > > [=C2=A0=C2=A0297.425941]=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0__x64_sys_writev+0x45/0x50 > > [=C2=A0=C2=A0297.425942]=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0do_syscall_64+0xcc/0x76c > > [=C2=A0=C2=A0297.425943]=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0entry_SYSCALL_64_after_hwframe+0x49/0xbe >=20 >=20