From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH v15 04/13] task_isolation: add initial support
To: Peter Zijlstra
References: <1471382376-5443-1-git-send-email-cmetcalf@mellanox.com>
 <1471382376-5443-5-git-send-email-cmetcalf@mellanox.com>
 <20160829163352.GV10153@twins.programming.kicks-ass.net>
 <20160830075854.GZ10153@twins.programming.kicks-ass.net>
CC: Gilad Ben Yossef, Steven Rostedt, Ingo Molnar, Andrew Morton,
 Rik van Riel, Tejun Heo, Frederic Weisbecker, Thomas Gleixner,
 "Paul E. McKenney", Christoph Lameter, Viresh Kumar, Catalin Marinas,
 Will Deacon, Andy Lutomirski, Michal Hocko
From: Chris Metcalf
Date: Tue, 30 Aug 2016 11:32:16 -0400
In-Reply-To: <20160830075854.GZ10153@twins.programming.kicks-ass.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On 8/30/2016 3:58 AM, Peter Zijlstra wrote:
> On Mon, Aug 29, 2016 at 12:40:32PM -0400, Chris Metcalf wrote:
>> On 8/29/2016 12:33 PM, Peter Zijlstra wrote:
>>> On Tue, Aug 16, 2016 at 05:19:27PM -0400, Chris Metcalf wrote:
>>>> +	/*
>>>> +	 * Request rescheduling unless we are in full dynticks mode.
>>>> +	 * We would eventually get pre-empted without this, and if
>>>> +	 * there's another task waiting, it would run; but by
>>>> +	 * explicitly requesting the reschedule, we may reduce the
>>>> +	 * latency.  We could directly call schedule() here as well,
>>>> +	 * but since our caller is the standard place where schedule()
>>>> +	 * is called, we defer to the caller.
>>>> +	 *
>>>> +	 * A more substantive approach here would be to use a struct
>>>> +	 * completion here explicitly, and complete it when we shut
>>>> +	 * down dynticks, but since we presumably have nothing better
>>>> +	 * to do on this core anyway, just spinning seems plausible.
>>>> +	 */
>>>> +	if (!tick_nohz_tick_stopped())
>>>> +		set_tsk_need_resched(current);
>>> This is broken.. and it would be really good if you don't actually need
>>> to do this.
>> Can you elaborate?  We clearly do want to wait until we are in full
>> dynticks mode before we return to userspace.
>>
>> We could do it just in the prctl() syscall only, but then we lose the
>> ability to implement the NOSIG mode, which can be a convenience.
> So this isn't spelled out anywhere. Why does this need to be in the
> return to user path?

I'm not sure where this should be spelled out, to be honest.  I guess
I can add some commentary to the commit message explaining this part.

The basic idea is just that we don't want to be at risk from the
dyntick getting enabled.  Similarly, we don't want to be at risk of a
later global IPI due to lru_add_drain stuff, for example.  And, we may
want to add additional stuff, like catching kernel TLB flushes and
deferring them when a remote core is in userspace.  To do all of this
kind of stuff, we need to run in the return to user path so we are
late enough to guarantee no further kernel things will happen to
perturb our carefully-arranged isolation state that includes dyntick
off, per-cpu lru cache empty, etc etc.

>> Even without that consideration, we really can't be sure we stay in
>> dynticks mode if we disable the dynamic tick, but then enable interrupts,
>> and end up taking an interrupt on the way back to userspace, and
>> it turns the tick back on.  That's why we do it here, where we know
>> interrupts will stay disabled until we get to userspace.
> But but but.. task_isolation_enter() is explicitly ran with IRQs
> _enabled_!! It even WARNs if they're disabled.

Yes, true!  But if you pop up to the caller, the key thing is the
task_isolation_ready() routine where we are invoked with interrupts
disabled, and we confirm that all our criteria are met (including
tick_nohz_tick_stopped), and then leave interrupts disabled as we
return from there onwards to userspace.

The task_isolation_enter() code just does its best-faith attempt to
make sure all these criteria are met, just like all the other TIF_xxx
flag tests do in exit_to_usermode_loop() on x86, like scheduling,
delivering signals, etc.  As you know, we might run that code, go
around the loop, and discover that the TIF flag has been re-set, and
we have to run the code again before all of that stuff has "quiesced".
The isolation code uses that same model; the only difference is that
we clear the TIF flag manually in the loop by checking
task_isolation_ready().

>> So if we are doing it here, what else can/should we do?  There really
>> shouldn't be any other tasks waiting to run at this point, so there's
>> not a heck of a lot else to do on this core.  We could just spin and
>> check need_resched and signal status manually instead, but that
>> seems kind of duplicative of code already done in our caller here.
> What !? I really don't get this, what are you waiting for? Why is
> rescheduling making things better.

We need to wait for the last dyntick to fire before we can return to
userspace.  There are plenty of options as to what we can do in the
meanwhile.

1. Try to schedule().  Good luck with that in practice, since a
   userspace process that has enabled task isolation is going to be
   alone on its core unless something pretty broken is happening on
   the system.  But, at least folks understand the idiom of scheduling
   out while you wait.

2. Another variant of that: set up a wait completion and have the
   dynticks code complete it when the tick turns off.  But this adds
   complexity to option 1, and really doesn't buy us much in practice
   that I can see.

3. Just admit that we are likely alone on the core, and just burn
   cycles in a busy loop waiting for that last tick to fire.
   Obviously if we do this we also need to test for signals and
   resched so the core remains responsive.  We can either do this in a
   loop just by spinning explicitly, or I could literally just remove
   the line in the current patchset that sets TIF_NEED_RESCHED, at
   which point we busy-wait by just going around and around in
   exit_to_usermode_loop().  The only flaw here is that we don't mark
   the task explicitly as TASK_INTERRUPTIBLE while we are doing this -
   and that's probably worth doing.

--
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com
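
For readers who want the loop model from this reply in concrete form,
here is a toy userspace C sketch of it.  It is not kernel code and not
taken from the patch: struct sim_cpu and every sim_* function are
hypothetical stand-ins, and the "tick" is just a counter.  The sketch
only illustrates the shape described above: a best-effort step that
runs with interrupts enabled, a readiness check that runs with
interrupts disabled, and a loop that goes around (the busy-wait of
option 3) until the check passes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-cpu kernel state; illustration only. */
struct sim_cpu {
	int ticks_until_stopped;	/* ticks left before the dyntick is off */
	bool lru_cache_dirty;		/* per-cpu LRU cache not yet drained */
	bool irqs_enabled;
};

/* Toy analogue of tick_nohz_tick_stopped(). */
static bool sim_tick_stopped(const struct sim_cpu *cpu)
{
	return cpu->ticks_until_stopped == 0;
}

/*
 * Toy analogue of task_isolation_ready(): runs with interrupts
 * disabled and only checks the criteria, it changes nothing.
 */
static bool sim_isolation_ready(const struct sim_cpu *cpu)
{
	assert(!cpu->irqs_enabled);
	return sim_tick_stopped(cpu) && !cpu->lru_cache_dirty;
}

/*
 * Toy analogue of task_isolation_enter(): runs with interrupts
 * enabled and makes a best-effort attempt to meet the criteria.
 */
static void sim_isolation_enter(struct sim_cpu *cpu)
{
	assert(cpu->irqs_enabled);
	cpu->lru_cache_dirty = false;		/* drain the per-cpu cache */
	if (cpu->ticks_until_stopped > 0)
		cpu->ticks_until_stopped--;	/* pretend one tick fired */
}

/*
 * Toy analogue of the exit loop: keep going around until the check
 * passes, then return with interrupts still disabled.  Returns the
 * number of passes through the best-effort step.
 */
static int sim_exit_to_usermode_loop(struct sim_cpu *cpu)
{
	int passes = 0;

	for (;;) {
		cpu->irqs_enabled = false;
		if (sim_isolation_ready(cpu))
			return passes;
		cpu->irqs_enabled = true;
		sim_isolation_enter(cpu);
		passes++;
	}
}
```

A caller would fill in a struct sim_cpu (say three pending ticks and a
dirty cache) and call sim_exit_to_usermode_loop(); the return value is
how many times the loop had to go around before everything had
quiesced.  A real implementation would also have to check for signals
and need_resched on each pass, which this sketch leaves out.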