From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S933339AbcH3RgA (ORCPT ); Tue, 30 Aug 2016 13:36:00 -0400
Received: from mail-db5eur01on0088.outbound.protection.outlook.com
	([104.47.2.88]:53520 "EHLO EUR01-DB5-obe.outbound.protection.outlook.com"
	rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP
	id S932731AbcH3Rfq (ORCPT ); Tue, 30 Aug 2016 13:35:46 -0400
Authentication-Results: spf=none (sender IP is )
	smtp.mailfrom=cmetcalf@mellanox.com;
Subject: Re: [PATCH v15 04/13] task_isolation: add initial support
To: Andy Lutomirski
References: <1471382376-5443-1-git-send-email-cmetcalf@mellanox.com>
	<1471382376-5443-5-git-send-email-cmetcalf@mellanox.com>
	<20160829163352.GV10153@twins.programming.kicks-ass.net>
	<20160830075854.GZ10153@twins.programming.kicks-ass.net>
CC: Peter Zijlstra, Gilad Ben Yossef, Steven Rostedt, Ingo Molnar,
	Andrew Morton, Rik van Riel, Tejun Heo, Frederic Weisbecker,
	Thomas Gleixner, "Paul E. McKenney", Christoph Lameter,
	Viresh Kumar, Catalin Marinas, Will Deacon, Michal Hocko,
	"linux-mm@kvack.org", "linux-doc@vger.kernel.org", Linux API,
	"linux-kernel@vger.kernel.org"
From: Chris Metcalf
Date: Tue, 30 Aug 2016 13:02:20 -0400
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101
	Thunderbird/45.2.0
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"; format=flowed
Content-Transfer-Encoding: 7bit
X-Originating-IP: [12.216.194.146]
X-OriginatorOrg: Mellanox.com
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On 8/30/2016 12:30 PM, Andy Lutomirski wrote:
> On Tue, Aug 30, 2016 at 8:32 AM, Chris Metcalf wrote:
>> On 8/30/2016 3:58 AM, Peter Zijlstra wrote:
>>> On Mon, Aug 29, 2016 at 12:40:32PM -0400, Chris Metcalf wrote:
>>>> On 8/29/2016 12:33 PM, Peter Zijlstra wrote:
>>>>> On Tue, Aug 16, 2016 at 05:19:27PM -0400, Chris Metcalf wrote:
>>>>>> +	/*
>>>>>> +	 * Request rescheduling unless we are in full dynticks mode.
>>>>>> +	 * We would eventually get pre-empted without this, and if
>>>>>> +	 * there's another task waiting, it would run; but by
>>>>>> +	 * explicitly requesting the reschedule, we may reduce the
>>>>>> +	 * latency.  We could directly call schedule() here as well,
>>>>>> +	 * but since our caller is the standard place where schedule()
>>>>>> +	 * is called, we defer to the caller.
>>>>>> +	 *
>>>>>> +	 * A more substantive approach here would be to use a struct
>>>>>> +	 * completion here explicitly, and complete it when we shut
>>>>>> +	 * down dynticks, but since we presumably have nothing better
>>>>>> +	 * to do on this core anyway, just spinning seems plausible.
>>>>>> +	 */
>>>>>> +	if (!tick_nohz_tick_stopped())
>>>>>> +		set_tsk_need_resched(current);
>>>>> This is broken.. and it would be really good if you don't actually need
>>>>> to do this.
>>>> Can you elaborate?  We clearly do want to wait until we are in full
>>>> dynticks mode before we return to userspace.
>>>>
>>>> We could do it just in the prctl() syscall only, but then we lose the
>>>> ability to implement the NOSIG mode, which can be a convenience.
>>> So this isn't spelled out anywhere. Why does this need to be in the
>>> return to user path?
>>
>> I'm not sure where this should be spelled out, to be honest.  I guess
>> I can add some commentary to the commit message explaining this part.
>>
>> The basic idea is just that we don't want to be at risk from the
>> dyntick getting enabled.  Similarly, we don't want to be at risk of a
>> later global IPI due to lru_add_drain stuff, for example.  And, we may
>> want to add additional stuff, like catching kernel TLB flushes and
>> deferring them when a remote core is in userspace.  To do all of this
>> kind of stuff, we need to run in the return to user path so we are
>> late enough to guarantee no further kernel things will happen to
>> perturb our carefully-arranged isolation state that includes dyntick
>> off, per-cpu lru cache empty, etc etc.
> None of the above should need to *loop*, though, AFAIK.

Ordering is a problem, though.
We really want to run task isolation last, so we can guarantee that
all the isolation prerequisites are met (dynticks stopped, per-cpu lru
cache empty, etc).  But achieving that state can require enabling
interrupts - most obviously if we have to schedule, e.g. for vmstat
clearing or whatnot (see the cond_resched in refresh_cpu_vm_stats), or
just while waiting for that last dyntick interrupt to occur.  I'm also
not sure that even something as simple as draining the per-cpu lru
cache can be done holding interrupts disabled throughout - certainly
there's a !SMP code path there that just re-enables interrupts
unconditionally, which gives me pause.

At any rate at that point you need to retest for signals, resched,
etc, all as usual, and then you need to recheck the task isolation
prerequisites once more.

I may be missing something here, but it's really not obvious to me
that there's a way to do this without having task isolation integrated
into the usual return-to-userspace loop.

-- 
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com
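
For readers following the thread, the ordering argument above can be
modelled in a few lines of user-space C.  This is only a sketch under
stated assumptions: struct cpu_state, handle_pending_work(),
isolation_prepare() and return_to_user_loop() are hypothetical names
invented for illustration, not functions from the patch series or the
kernel, and "draining the LRU cache raises new pending work" is a
stand-in for the interrupts-enabled window described in the message.

```c
/* User-space model only; every name here is hypothetical, not a kernel API. */
#include <stdbool.h>

struct cpu_state {
	bool signal_pending;       /* models a pending signal */
	bool need_resched;         /* models a pending reschedule request */
	int  ticks_until_stopped;  /* > 0 means the dyntick is still running */
	int  lru_pages_cached;     /* models per-cpu LRU cache occupancy */
};

/* The usual exit work (signals, resched). Returns true if anything ran. */
static bool handle_pending_work(struct cpu_state *s)
{
	bool did_work = s->signal_pending || s->need_resched;

	s->signal_pending = false;
	s->need_resched = false;
	return did_work;
}

/*
 * Try to reach the isolated state. Returns true only if every
 * prerequisite was already met on entry. In this model, draining the
 * LRU cache makes new work pending, standing in for a step that has
 * to run with interrupts enabled.
 */
static bool isolation_prepare(struct cpu_state *s)
{
	bool ready = true;

	if (s->lru_pages_cached > 0) {
		s->lru_pages_cached = 0;
		s->need_resched = true;   /* assumption: something raced in */
		ready = false;
	}
	if (s->ticks_until_stopped > 0) {
		s->ticks_until_stopped--; /* wait for one more tick */
		ready = false;
	}
	return ready;
}

/* Loop until both checks pass back to back; returns the number of passes. */
static int return_to_user_loop(struct cpu_state *s)
{
	int passes = 0;

	for (;;) {
		passes++;
		if (handle_pending_work(s))
			continue;         /* retest everything from the top */
		if (isolation_prepare(s))
			return passes;    /* isolation ran last and found nothing */
	}
}
```

Starting from a pending signal, two remaining ticks and three cached
LRU pages, this model needs five passes before both checks come back
clean, which illustrates why a single non-looping check at the end
would not be enough under these assumptions.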