References: <20200721063258.17140-1-mhocko@kernel.org> <20200723124749.GA7428@redhat.com>
From: Hugh Dickins
Date: Thu, 23 Jul 2020 20:45:34 -0700
Subject: Re: [RFC PATCH] mm: silence soft lockups from unlock_page
To: Linus Torvalds
Cc: Oleg Nesterov, Michal Hocko, Linux-MM, LKML, Andrew Morton, Tim Chen, Michal Hocko

On Thu, Jul 23, 2020 at 5:47 PM Linus Torvalds wrote:
>
> On Thu, Jul 23, 2020 at 5:07 PM Hugh Dickins wrote:
> >
> > I say that for full disclosure, so you don't wrack your brains
> > too much, when it may still turn out to be a screwup on my part.
>
> Sounds unlikely.
>
> If that patch applied even reasonably closely, I don't see how you'd
> see a list corruption that wasn't due to the patch.
>
> You'd have had to use the wrong spinlock by mistake due to munging it,
> or something crazy like that.
>
> The main list-handling change is
>
>  (a) open-coding of that finish_wait()
>
>  (b) slightly different heuristics for removal in the wakeup function
>
> where (a) was because my original version of finishing the wait needed
> to do that return code checking.
>
> So a normal "finish_wait()" just does
>
>         list_del_init(&wait->entry);
>
> where-as my open-coded one replaced that with
>
>         if (!list_empty(&wait->entry)) {
>                 list_del(&wait->entry);
>                 ret = -EINTR;
>         }
>
> and apart from that "set return to -EINTR because nobody woke us up",
> it also uses just a regular "list_del()" rather than a
> "list_del_init()". That causes the next/prev field to be poisoned
> rather than re-initialized.
> But that second change was because the
> list entry is on the stack, and we're not touching it any more and are
> about to return, so I removed the "init" part.
>
> Anyway, (a) really looks harmless. Unless the poisoning now triggers
> some racy debug test that had been hidden by the "init". Hmm.
>
> In contrast, (b) means that the likely access patterns of irqs
> removing the wait entry from the list might be very different from
> before. The old "autoremove" semantics would only remove the entry
> from the list when try_to_wake_up() actually woke things up. Now, a
> successful bit state _always_ removes it, which was kind of the point.
> But it might cause very different list handling patterns.
>
> All the actual list handling looks "obviously safe" because it's
> protected by the spinlock, though...
>
> If you do get oopses with the new patch too, try to send me a copy,
> and maybe I'll stare at exactly where it happens register contents and
> go "aah".

This new version is doing much better: many hours to go, but all
machines have got beyond the danger point where yesterday's version
was crashing - phew!

Hugh
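
For anyone following the list_del() versus list_del_init() point quoted above, here is a minimal userspace sketch of the difference. It is not the kernel code and not the patch under discussion. Every demo_* name and both DEMO_POISON values are made up for illustration (the kernel's real LIST_POISON1/LIST_POISON2 are defined differently), and there is no locking at all, whereas the real list is protected by the waitqueue spinlock.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head (assumption). */
struct list_head {
    struct list_head *next, *prev;
};

/* Illustrative poison values only; not the kernel's LIST_POISON1/2. */
#define DEMO_POISON1 ((struct list_head *)0x100)
#define DEMO_POISON2 ((struct list_head *)0x122)

static void demo_init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* True when the node points at itself: an empty head or an unlinked,
 * re-initialized entry. */
static bool demo_list_empty(const struct list_head *h)
{
    return h->next == h;
}

static void demo_list_add_tail(struct list_head *entry, struct list_head *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

static void demo_unlink(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
}

/* Models list_del(): unlink, then poison next/prev so that any later
 * use of the entry as a list node is an obvious bug. */
static void demo_list_del(struct list_head *entry)
{
    demo_unlink(entry);
    entry->next = DEMO_POISON1;
    entry->prev = DEMO_POISON2;
}

/* Models list_del_init(): unlink, then re-initialize so that
 * demo_list_empty(entry) is true afterwards. */
static void demo_list_del_init(struct list_head *entry)
{
    demo_unlink(entry);
    demo_init_list_head(entry);
}

/* Models the open-coded finish from the quoted mail: if the entry is
 * still queued, nobody woke us, so remove it and report -EINTR. */
static int demo_finish_wait_open_coded(struct list_head *entry)
{
    int ret = 0;

    if (!demo_list_empty(entry)) {
        demo_list_del(entry);
        ret = -EINTR;
    }
    return ret;
}
```

In this model the list_empty() check in the open-coded finish only works if whoever removed the entry on the wakeup side used the "init" variant. After a plain list_del() the entry holds poison pointers, list_empty() on it is false, and a second removal would dereference the poison values.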