From: Linus Torvalds
Date: Thu, 23 Jul 2020 11:22:44 -0700
Subject: Re: [RFC PATCH] mm: silence soft lockups from unlock_page
To: Oleg Nesterov
Cc: Hugh Dickins, Michal Hocko, Linux-MM, LKML, Andrew Morton, Tim Chen, Michal Hocko
In-Reply-To: <20200723180100.GA21755@redhat.com>
References: <20200721063258.17140-1-mhocko@kernel.org> <20200723124749.GA7428@redhat.com> <20200723180100.GA21755@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jul 23, 2020 at 11:01 AM Oleg Nesterov wrote:
>
> > +      *
> > +      * We _really_ should have a "list_del_init_careful()" to
> > +      * properly pair with the unlocked "list_empty_careful()"
> > +      * in finish_wait().
> > +      */
> > +     smp_mb();
> > +     list_del_init(&wait->entry);
>
> I think smp_wmb() would be enough, but this is minor.

Well, what we _really_ want (and what that comment is about) would be
to get that list_del_init_careful() to use a "smp_store_release()" for
the last store, and then "list_empty_careful()" would use a
"smp_load_acquire()" for the corresponding first load.

On x86, that's free. On most other architectures, it's the minimal
ordering requirement.

And no, I don't think "smp_wmb()" is technically enough.
With crazy memory ordering, one of the earlier *reads* (eg loading
"wait->private" when waking things up) could have been delayed until
after the stores that initialize the list - and thus read stack
contents from another process after it's been released and re-used.

Does that happen in reality? No. There are various conditionals in
there which means that the stores end up being gated on the loads and
cannot actually be re-ordered, but it's the kind of subtlety you don't
want to have to depend on.

So we actually do want to constrain all earlier reads and writes wrt
the final write. Which is exactly what "smp_store_release()" does.

But with our current list_empty_careful(), the smp_mb() is I think
technically sufficient.

> We need a barrier between "wait->flags |= WQ_FLAG_WOKEN" and list_del_init(),

See above: we need more than just that write barrier, although in
_practice_ you're right, and the other barriers actually all already
exist and are part of wake_up_state().

So the smp_mb() is unnecessary, and in fact your smp_wmb() would be
too. But I left it there basically as "documentation".

> But afaics we need another barrier, rmb(), in wait_on_page_bit_common() for
> the case when wait->private was not blocked; we need to ensure that if
> finish_wait() sees list_empty_careful() == T then we can't miss WQ_FLAG_WOKEN.

Again, this is what a proper list_empty_careful() with a
smp_load_acquire() would have automatically gotten for us.

But yes, I think that without that, and with the explicit barriers, we
need an smp_rmb() after the list_empty_careful(). I really think it
should be _in_ list_empty_careful(), though. Or maybe finish_wait().

Hmm. Because looking at all the other finish_wait() uses, the fact
that the waitqueue _list_ is properly ordered isn't really a guarantee
that the rest of the stack space is. In practice, it will be, but I
think this lack of serialization is a potential real bug elsewhere
too.
(Obviously none of this would show on x86, where we already *get* that
smp_store_release/smp_load_acquire behavior for the existing
list_del_init()/list_empty_careful(), since all stores are releases,
and all loads are acquires)

So I think that is a separate issue, generic to our finish_wait() uses.

               Linus