From: Nadav Amit <namit@vmware.com>
To: Minchan Kim
CC: Peter Zijlstra, Ingo Molnar, Stephen Rothwell, Andrew Morton,
 Thomas Gleixner, "Ingo Molnar", "H. Peter Anvin",
 Linux-Next Mailing List, Linux Kernel Mailing List, Linus
Subject: Re: linux-next: manual merge of the akpm-current tree with the tip tree
Date: Mon, 14 Aug 2017 05:07:19 +0000
Message-ID: <0F858068-D41D-46E3-B4A8-8A95B4EDB94F@vmware.com>
References: <20170811175326.36d546dc@canb.auug.org.au>
 <20170811093449.w5wttpulmwfykjzm@hirez.programming.kicks-ass.net>
 <20170811214556.322b3c4e@canb.auug.org.au>
 <20170811115607.p2vgqcp7w3wurhvw@gmail.com>
 <20170811140450.irhxa2bhdpmmhhpv@hirez.programming.kicks-ass.net>
 <20170813125019.ihqjud37ytgri7bn@hirez.programming.kicks-ass.net>
 <20170814031613.GD25427@bbox>
In-Reply-To: <20170814031613.GD25427@bbox>
Minchan Kim wrote:

> On Sun, Aug 13, 2017 at 02:50:19PM +0200, Peter Zijlstra wrote:
>> On Sun, Aug 13, 2017 at 06:06:32AM +0000, Nadav Amit wrote:
>>>> however mm_tlb_flush_nested() is a mystery, it appears to care about
>>>> anything inside the range. For now rely on it doing at least _a_ PTL
>>>> lock instead of taking _the_ PTL lock.
>>>
>>> It does not care about “anything” inside the range, but only about
>>> situations in which there is at least one (same) PT that was modified by
>>> one core and then read by the other. So, yes, it will always be _the_
>>> same PTL, and not _a_ PTL - in the cases in which a flush is really
>>> needed.
>>>
>>> The issue that might require additional barriers is that
>>> inc_tlb_flush_pending() and mm_tlb_flush_nested() are called when the
>>> PTL is not held. IIUC, since the release-acquire might not behave as a
>>> full memory barrier, this requires an explicit memory barrier.
>>
>> So I'm not entirely clear about this yet.
>>
>> How about:
>>
>>      CPU0                            CPU1
>>
>>      tlb_gather_mmu()
>>
>>      lock PTLn
>>      no mod
>>      unlock PTLn
>>
>>                                      tlb_gather_mmu()
>>
>>      lock PTLm
>>      mod
>>      include in tlb range
>>      unlock PTLm
>>
>>                                      lock PTLn
>>                                      mod
>>                                      unlock PTLn
>>
>>      tlb_finish_mmu()
>>        force = mm_tlb_flush_nested(tlb->mm);
>>        arch_tlb_finish_mmu(force);
>>
>>                                      ... more ...
>>
>>                                      tlb_finish_mmu()
>>
>> In this case you also want CPU1's mm_tlb_flush_nested() call to return
>> true, right?
>
> No, because CPU 1 modified the pte and added it into the tlb range, so
> regardless of nesting it will flush the TLB; there is no stale TLB
> problem.
>
>> But even with an smp_mb__after_atomic() at CPU0's tlb_gather_mmu()
>> you're not guaranteed CPU1 sees the increment. The only way to do that
>> is to make the PTL locks RCsc and that is a much more expensive
>> proposition.
>>
>> What about:
>>
>>      CPU0                            CPU1
>>
>>      tlb_gather_mmu()
>>
>>      lock PTLn
>>      no mod
>>      unlock PTLn
>>
>>      lock PTLm
>>      mod
>>      include in tlb range
>>      unlock PTLm
>>
>>                                      tlb_gather_mmu()
>>
>>                                      lock PTLn
>>                                      mod
>>                                      unlock PTLn
>>
>>      tlb_finish_mmu()
>>        force = mm_tlb_flush_nested(tlb->mm);
>>        arch_tlb_finish_mmu(force);
>>
>>                                      ... more ...
>>
>>                                      tlb_finish_mmu()
>>
>> Do we want CPU1 to see it here? If so, where does it end?
>
> Ditto. Since CPU 1 has added the range, it will flush the TLB regardless
> of the nesting condition.
>
>>      CPU0                            CPU1
>>
>>      tlb_gather_mmu()
>>
>>      lock PTLn
>>      no mod
>>      unlock PTLn
>>
>>      lock PTLm
>>      mod
>>      include in tlb range
>>      unlock PTLm
>>
>>      tlb_finish_mmu()
>>        force = mm_tlb_flush_nested(tlb->mm);
>>
>>                                      tlb_gather_mmu()
>>
>>                                      lock PTLn
>>                                      mod
>>                                      unlock PTLn
>>
>>        arch_tlb_finish_mmu(force);
>>
>>                                      ... more ...
>>
>>                                      tlb_finish_mmu()
>>
>> This?
>>
>> Could you clarify under what exact condition mm_tlb_flush_nested() must
>> return true?
>
> mm_tlb_flush_nested() aims at the CPU side where there is no pte update
> but a TLB flush is still needed.
> As I wrote in https://marc.info/?l=linux-mm&m=150267398226529&w=2 ,
> there is a stale TLB problem if we do not flush the TLB even though there
> is no pte modification.

To clarify: the main problem that these patches address is when the first
CPU updates the PTE, and the second CPU sees the updated value and thinks:
“the PTE is already what I wanted - no flush is needed”.

For some reason (I assume intentionally), all the examples here first “do
not modify” the PTE, and only then modify it - which is not an
“interesting” case. However, based on what I understand about the memory
barriers, I think there is indeed a missing barrier in
mm_tlb_flush_nested(), before the read of the pending counter. IIUC, using
smp_mb__after_unlock_lock() in this case, before the read, would solve the
problem with the least impact on systems with strong memory ordering.

Minchan, as for the solution you proposed, it seems to reopen the race,
since the “pending” indication is removed before the actual TLB flush is
performed.

Nadav
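P.S. To make the proposal concrete, here is a minimal sketch of the barrier
placement I have in mind, against mm_tlb_flush_nested() as defined in
include/linux/mm_types.h in linux-next, where mm->tlb_flush_pending is an
atomic_t. Treat it as an illustration rather than a formal patch (for one
thing, smp_mb__after_unlock_lock() is currently private to RCU and would
have to be made generally available):

	static inline bool mm_tlb_flush_nested(struct mm_struct *mm)
	{
		/*
		 * Hypothetical timeline of the race this guards against:
		 *
		 *   CPU0                      CPU1
		 *   inc_tlb_flush_pending()
		 *   lock PTL; clear PTE;
		 *   unlock PTL
		 *                             inc_tlb_flush_pending()
		 *                             lock PTL; PTE already clear,
		 *                             "no mod"; unlock PTL
		 *                             mm_tlb_flush_nested() reads
		 *                             pending == 1 (CPU0's increment
		 *                             not yet visible), skips the
		 *                             forced flush, and keeps using
		 *                             a stale TLB entry
		 *   flush TLB
		 *
		 * By the time we get here, the caller (tlb_finish_mmu())
		 * has locked and unlocked the PTL for every PTE in its
		 * range, including the "no mod" ones. The barrier promotes
		 * that earlier lock/unlock to a full barrier, so the read
		 * below cannot be satisfied from before the PTE reads done
		 * under the PTL. It compiles away on architectures where
		 * unlock+lock already implies a full barrier (e.g., x86),
		 * which is the "least impact" property mentioned above.
		 */
		smp_mb__after_unlock_lock();
		return atomic_read(&mm->tlb_flush_pending) > 1;
	}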