From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.8 required=3.0 tests=BAYES_00, FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 45FA0C4338F for ; Fri, 23 Jul 2021 09:48:44 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 6A5C760EC0 for ; Fri, 23 Jul 2021 09:48:43 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 6A5C760EC0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=sina.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id 09C9D6B005D; Fri, 23 Jul 2021 05:48:43 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 04BAE6B006C; Fri, 23 Jul 2021 05:48:42 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id EA4106B0070; Fri, 23 Jul 2021 05:48:42 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0145.hostedemail.com [216.40.44.145]) by kanga.kvack.org (Postfix) with ESMTP id 535EF6B005D for ; Fri, 23 Jul 2021 05:48:42 -0400 (EDT) Received: from smtpin14.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id F0607180AD82F for ; Fri, 23 Jul 2021 09:48:41 +0000 (UTC) X-FDA: 78393378042.14.B6B4078 Received: from r3-25.sinamail.sina.com.cn (r3-25.sinamail.sina.com.cn [202.108.3.25]) by imf21.hostedemail.com (Postfix) with SMTP id 90040D03911E for ; Fri, 23 Jul 2021 09:48:40 +0000 (UTC) Received: from unknown (HELO localhost.localdomain)([1.24.238.81]) by sina.com (172.16.97.23) with ESMTP id 60FA90740002B9FA; Fri, 23 Jul 2021 17:48:37 +0800 (CST) X-Sender: hdanton@sina.com X-Auth-ID: hdanton@sina.com X-SMAIL-MID: 95559954919302 From: Hillf Danton To: Paolo Bonzini Cc: Thomas Gleixner , Sebastian Andrzej Siewior , "Michael S. Tsirkin" , linux-mm@kvack.org, LKML , Al Viro Subject: Re: 5.13-rt1 + KVM = WARNING: at fs/eventfd.c:74 eventfd_signal() Date: Fri, 23 Jul 2021 17:48:30 +0800 Message-Id: <20210723094830.1375-1-hdanton@sina.com> In-Reply-To: References: <8dfc0ee9-b97a-8ca8-d057-31c8cad3f5b6@redhat.com> <475f84e2-78ee-1a24-ef57-b16c1f2651ed@redhat.com> <20210715102249.2205-1-hdanton@sina.com> <20210716020611.2288-1-hdanton@sina.com> <20210716075539.2376-1-hdanton@sina.com> <20210716093725.2438-1-hdanton@sina.com> <20210718124219.1521-1-hdanton@sina.com> <20210721070452.1008-1-hdanton@sina.com> <20210721101119.1103-1-hdanton@sina.com> <20210723022356.1301-1-hdanton@sina.com> MIME-Version: 1.0 X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 90040D03911E Authentication-Results: imf21.hostedemail.com; dkim=none; dmarc=none; spf=pass (imf21.hostedemail.com: domain of hdanton@sina.com designates 202.108.3.25 as permitted sender) smtp.mailfrom=hdanton@sina.com X-Stat-Signature: rkrwjr1jcjzb8aanroiw7fa4ugcc7jm3 X-HE-Tag: 1627033720-539731 Content-Transfer-Encoding: quoted-printable X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Fri, 23 Jul 2021 09:59:55 +0200 Paolo Bonzini wrote: >On 23/07/21 04:23, Hillf Danton wrote: >> Detect concurrent reader and writer by reading event counter before an= d >> after poll_wait(), and determine feedback with the case of unstable >> counter taken into account. >>=20 >> Cut the big comment as the added barriers speak for themselves. > >First and foremost, I'm not sure what you are trying to fix. The change proposed builds the return value without assuming that the event count is stable across poll_wait(). If it is unstable then we know there are concurrent reader and/or writer who both are ingnored currently= . > >Second, the patch is wrong even without taking into account the lockless >accesses, because the condition for returning EPOLLOUT is certainly wron= g. Given it is detected that event count was consumed, there is room, though as racy as it is, in the event count for writer to make some progress. > >Third, barriers very rarely speak for themselves. In particular what >do they pair with? There is no need to consider pair frankly. Barriers are just readded for removing the seep in the comment. Then the comment goes with the seep. What the comment does not cover is the cases of more-than-two-party race. Hillf >It seems to me that you are basically reintroducing >the same mistake that commit a484c3dd9426 ("eventfd: document lockless >access in eventfd_poll", 2016-03-22) fixed, at the time where the big >comment was introduced: > > Things aren't as simple as the read barrier in eventfd_poll > would suggest. In fact, the read barrier, besides lacking a > comment, is not paired in any obvious manner with another read > barrier, and it is pointless because it is sitting between a write > (deep in poll_wait) and the read of ctx->count. > >Paolo > > >> +++ x/fs/eventfd.c >> @@ -131,49 +131,20 @@ static __poll_t eventfd_poll(struct file >> { >> struct eventfd_ctx *ctx =3D file->private_data; >> __poll_t events =3D 0; >> - u64 count; >> + u64 c0, count; >> + >> + c0 =3D ctx->count; >> + smp_rmb(); >> =20 >> poll_wait(file, &ctx->wqh, wait); >> =20 >> - /* >> - * All writes to ctx->count occur within ctx->wqh.lock. This read >> - * can be done outside ctx->wqh.lock because we know that poll_wait >> - * takes that lock (through add_wait_queue) if our caller will sleep= . >> - * >> - * The read _can_ therefore seep into add_wait_queue's critical >> - * section, but cannot move above it! add_wait_queue's spin_lock ac= ts >> - * as an acquire barrier and ensures that the read be ordered proper= ly >> - * against the writes. The following CAN happen and is safe: >> - * >> - * poll write >> - * ----------------- ------------ >> - * lock ctx->wqh.lock (in poll_wait) >> - * count =3D ctx->count >> - * __add_wait_queue >> - * unlock ctx->wqh.lock >> - * lock ctx->qwh.lock >> - * ctx->count +=3D n >> - * if (waitqueue_active) >> - * wake_up_locked_poll >> - * unlock ctx->qwh.lock >> - * eventfd_poll returns 0 >> - * >> - * but the following, which would miss a wakeup, cannot happen: >> - * >> - * poll write >> - * ----------------- ------------ >> - * count =3D ctx->count (INVALID!) >> - * lock ctx->qwh.lock >> - * ctx->count +=3D n >> - * **waitqueue_active is fals= e** >> - * **no wake_up_locked_poll!*= * >> - * unlock ctx->qwh.lock >> - * lock ctx->wqh.lock (in poll_wait) >> - * __add_wait_queue >> - * unlock ctx->wqh.lock >> - * eventfd_poll returns 0 >> - */ >> - count =3D READ_ONCE(ctx->count); >> + smp_rmb(); >> + count =3D ctx->count; >> + >> + if (c0 < count) >> + return EPOLLIN; >> + if (c0 > count) >> + return EPOLLOUT; >> =20 >> if (count > 0) >> events |=3D EPOLLIN; >>=20