From: Nadav Amit <namit@vmware.com>
To: Minchan Kim
CC: Peter Zijlstra, Ingo Molnar, Stephen Rothwell, Andrew Morton,
 Thomas Gleixner, "Ingo Molnar", "H. Peter Anvin",
 Linux-Next Mailing List, Linux Kernel Mailing List, Linus
Subject: Re: linux-next: manual merge of the akpm-current tree with the tip tree
Date: Mon, 14 Aug 2017 05:07:19 +0000
Message-ID: <0F858068-D41D-46E3-B4A8-8A95B4EDB94F@vmware.com>
References: <20170811175326.36d546dc@canb.auug.org.au>
 <20170811093449.w5wttpulmwfykjzm@hirez.programming.kicks-ass.net>
 <20170811214556.322b3c4e@canb.auug.org.au>
 <20170811115607.p2vgqcp7w3wurhvw@gmail.com>
 <20170811140450.irhxa2bhdpmmhhpv@hirez.programming.kicks-ass.net>
 <20170813125019.ihqjud37ytgri7bn@hirez.programming.kicks-ass.net>
 <20170814031613.GD25427@bbox>
In-Reply-To: <20170814031613.GD25427@bbox>
Minchan Kim wrote:

> On Sun, Aug 13, 2017 at 02:50:19PM +0200, Peter Zijlstra wrote:
>> On Sun, Aug 13, 2017 at 06:06:32AM +0000, Nadav Amit wrote:
>>>> however mm_tlb_flush_nested() is a mystery, it appears to care about
>>>> anything inside the range. For now rely on it doing at least _a_ PTL
>>>> lock instead of taking _the_ PTL lock.
>>>
>>> It does not care about “anything” inside the range, but only about
>>> situations in which there is at least one (same) PT that was modified by
>>> one core and then read by the other. So, yes, it will always be _the_
>>> same PTL, and not _a_ PTL - in the cases in which a flush is really
>>> needed.
>>>
>>> The issue that might require additional barriers is that
>>> inc_tlb_flush_pending() and mm_tlb_flush_nested() are called when the
>>> PTL is not held. IIUC, since the release-acquire might not behave as a
>>> full memory barrier, this requires an explicit memory barrier.
>>
>> So I'm not entirely clear about this yet.
>>
>> How about:
>>
>>      CPU0                            CPU1
>>
>>      tlb_gather_mmu()
>>
>>      lock PTLn
>>      no mod
>>      unlock PTLn
>>
>>                                      tlb_gather_mmu()
>>
>>      lock PTLm
>>      mod
>>      include in tlb range
>>      unlock PTLm
>>
>>                                      lock PTLn
>>                                      mod
>>                                      unlock PTLn
>>
>>      tlb_finish_mmu()
>>        force = mm_tlb_flush_nested(tlb->mm);
>>        arch_tlb_finish_mmu(force);
>>
>>                                      ... more ...
>>
>>                                      tlb_finish_mmu()
>>
>> In this case you also want CPU1's mm_tlb_flush_nested() call to return
>> true, right?
>
> No, because CPU 1 modified the pte and added it into the tlb range, so
> regardless of nesting it will flush the TLB; there is no stale TLB
> problem.
>
>> But even with an smp_mb__after_atomic() at CPU0's tlb_gather_mmu()
>> you're not guaranteed CPU1 sees the increment. The only way to do that
>> is to make the PTL locks RCsc and that is a much more expensive
>> proposition.
>>
>> What about:
>>
>>      CPU0                            CPU1
>>
>>      tlb_gather_mmu()
>>
>>      lock PTLn
>>      no mod
>>      unlock PTLn
>>
>>      lock PTLm
>>      mod
>>      include in tlb range
>>      unlock PTLm
>>
>>                                      tlb_gather_mmu()
>>
>>                                      lock PTLn
>>                                      mod
>>                                      unlock PTLn
>>
>>      tlb_finish_mmu()
>>        force = mm_tlb_flush_nested(tlb->mm);
>>        arch_tlb_finish_mmu(force);
>>
>>                                      ... more ...
>>
>>                                      tlb_finish_mmu()
>>
>> Do we want CPU1 to see it here? If so, where does it end?
>
> Ditto. Since CPU 1 has added the range, it will flush the TLB regardless
> of the nesting condition.
>
>>      CPU0                            CPU1
>>
>>      tlb_gather_mmu()
>>
>>      lock PTLn
>>      no mod
>>      unlock PTLn
>>
>>      lock PTLm
>>      mod
>>      include in tlb range
>>      unlock PTLm
>>
>>      tlb_finish_mmu()
>>        force = mm_tlb_flush_nested(tlb->mm);
>>
>>                                      tlb_gather_mmu()
>>
>>                                      lock PTLn
>>                                      mod
>>                                      unlock PTLn
>>
>>        arch_tlb_finish_mmu(force);
>>
>>                                      ... more ...
>>
>>                                      tlb_finish_mmu()
>>
>> This?
>>
>> Could you clarify under what exact condition mm_tlb_flush_nested() must
>> return true?
>
> mm_tlb_flush_nested() aims at the CPU side where there is no pte update
> but a TLB flush is still needed.
> As I wrote in https://marc.info/?l=linux-mm&m=150267398226529&w=2 ,
> there is a stale TLB problem if we do not flush the TLB even though there
> is no pte modification.

To clarify: the main problem that these patches address is when the first
CPU updates the PTE, and the second CPU sees the updated value and thinks:
“the PTE is already what I wanted - no flush is needed”.

For some reason (I assume intentionally), all the examples here first “do
not modify” the PTE, and only then modify it - which is not an
“interesting” case. However, based on what I understand about the memory
barriers, I think there is indeed a missing barrier in
mm_tlb_flush_nested(), before the read of the pending counter. IIUC, using
smp_mb__after_unlock_lock() in this case, before the read, would solve the
problem with the least impact on systems with strong memory ordering.

Minchan, as for the solution you proposed, it seems to reopen the race,
since the “pending” indication is removed before the actual TLB flush is
performed.

Nadav
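P.S. To make the proposal concrete, here is a minimal sketch of the barrier
placement I have in mind, against mm_tlb_flush_nested() as defined in
include/linux/mm_types.h in linux-next, where mm->tlb_flush_pending is an
atomic_t. Treat it as an illustration rather than a formal patch (for one
thing, smp_mb__after_unlock_lock() is currently private to RCU and would
have to be made generally available):

	static inline bool mm_tlb_flush_nested(struct mm_struct *mm)
	{
		/*
		 * Hypothetical timeline of the race this guards against:
		 *
		 *   CPU0                      CPU1
		 *   inc_tlb_flush_pending()
		 *   lock PTL; clear PTE;
		 *   unlock PTL
		 *                             inc_tlb_flush_pending()
		 *                             lock PTL; PTE already clear,
		 *                             "no mod"; unlock PTL
		 *                             mm_tlb_flush_nested() reads
		 *                             pending == 1 (CPU0's increment
		 *                             not yet visible), skips the
		 *                             forced flush, and keeps using
		 *                             a stale TLB entry
		 *   flush TLB
		 *
		 * By the time we get here, the caller (tlb_finish_mmu())
		 * has locked and unlocked the PTL for every PTE in its
		 * range, including the "no mod" ones. The barrier promotes
		 * that earlier lock/unlock to a full barrier, so the read
		 * below cannot be satisfied from before the PTE reads done
		 * under the PTL. It compiles away on architectures where
		 * unlock+lock already implies a full barrier (e.g., x86),
		 * which is the "least impact" property mentioned above.
		 */
		smp_mb__after_unlock_lock();
		return atomic_read(&mm->tlb_flush_pending) > 1;
	}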