From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH v13 00/12] support "task_isolation" mode
To: Christoph Lameter
References: <1468529299-27929-1-git-send-email-cmetcalf@mellanox.com>
CC: Gilad Ben Yossef, Steven Rostedt, Ingo Molnar, Peter Zijlstra,
    Andrew Morton, Rik van Riel, Tejun Heo, Frederic Weisbecker,
    Thomas Gleixner, Paul E. McKenney, Viresh Kumar, Catalin Marinas,
    Will Deacon, Andy Lutomirski, Daniel Lezcano
From: Chris Metcalf
Date: Thu, 21 Jul 2016 10:06:56 -0400
X-Mailing-List: linux-kernel@vger.kernel.org

On 7/20/2016 10:04 PM, Christoph Lameter wrote:
> We are trying to test the patchset on x86 and are getting strange
> backtraces and aborts. It seems that the cpu before the cpu we are running
> on creates an irq_work event that causes a latency event on the next cpu.
>
> This is weird. Is there a new round robin IPI feature in the kernel that I
> am not aware of?

This seems to be from your clocksource declaring itself to be unstable,
and then scheduling work to safely remove that timer. I haven't looked
at this code before (in kernel/time/clocksource.c under
CONFIG_CLOCKSOURCE_WATCHDOG) since the timers on arm64 and tile aren't
unstable. Is it possible to boot your machine with a stable clocksource?
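
As a quick way to see which clocksource the machine is actually running on before rebooting, the standard sysfs files under /sys/devices/system/clocksource/clocksource0 can be read from userspace. The following is only a minimal sketch: the helper name read_clocksource_info and its sysfs_dir parameter are made up for illustration and are not part of the patchset.

```python
from pathlib import Path

# Standard sysfs directory for the clocksource controls on Linux.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksource_info(sysfs_dir=CLOCKSOURCE_DIR):
    """Return (current, available) clocksource names read from sysfs_dir.

    sysfs_dir is a parameter only so that this hypothetical helper can be
    pointed at a scratch directory when testing.
    """
    base = Path(sysfs_dir)
    # current_clocksource holds a single name, e.g. "tsc" or "hpet".
    current = (base / "current_clocksource").read_text().strip()
    # available_clocksource holds a space-separated list of names.
    available = (base / "available_clocksource").read_text().split()
    return current, available


if __name__ == "__main__":
    current, available = read_clocksource_info()
    print("current clocksource:   ", current)
    print("available clocksources:", ", ".join(available))
```

If the current clocksource is not the one expected, clocksource=<name> on the kernel command line selects one at boot, and on x86 tsc=reliable disables the runtime watchdog verification of the TSC. Whether either is appropriate for the machine in the trace is not something the trace itself settles.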

> Backtraces from dmesg:
>
> [ 956.603223] latencytest/7928: task_isolation mode lost due to irq_work
> [ 956.610817] cpu 12: irq_work violating task isolation for latencytest/7928 on cpu 13
> [ 956.619985] CPU: 12 PID: 0 Comm: swapper/12 Not tainted 4.7.0-rc7-stream1 #1
> [ 956.628765] Hardware name: Dell Inc. PowerEdge R630/0CNCJW, BIOS 2.0.2 03/15/2016
> [ 956.637642] 0000000000000086 ce6735c7b39e7b81 ffff88103e783d00 ffffffff8134f6ff
> [ 956.646739] ffff88102c50d700 000000000000000d ffff88103e783d28 ffffffff811986f4
> [ 956.655828] ffff88102c50d700 ffff88203cf97f80 000000000000000d ffff88103e783d68
> [ 956.664924] Call Trace:
> [ 956.667945] [] dump_stack+0x63/0x84
> [ 956.674740] [] task_isolation_debug_task+0xb4/0xd0
> [ 956.682229] [] _task_isolation_debug+0x83/0xc0
> [ 956.689331] [] irq_work_queue_on+0x9c/0x120
> [ 956.696142] [] tick_nohz_full_kick_cpu+0x44/0x50
> [ 956.703438] [] wake_up_nohz_cpu+0x99/0x110
> [ 956.710150] [] internal_add_timer+0x71/0xb0
> [ 956.716959] [] add_timer_on+0xbb/0x140
> [ 956.723283] [] clocksource_watchdog+0x230/0x300
> [ 956.730480] [] ? __clocksource_unstable.isra.2+0x40/0x40
> [ 956.738555] [] call_timer_fn+0x35/0x120
> [ 956.744973] [] ? __clocksource_unstable.isra.2+0x40/0x40
> [ 956.753046] [] run_timer_softirq+0x23c/0x2f0
> [ 956.759952] [] __do_softirq+0xd7/0x2c5
> [ 956.766272] [] irq_exit+0xf5/0x100
> [ 956.772209] [] smp_apic_timer_interrupt+0x42/0x50
> [ 956.779600] [] apic_timer_interrupt+0x8c/0xa0
> [ 956.786602] [] ? poll_idle+0x40/0x80
> [ 956.793490] [] cpuidle_enter_state+0x9c/0x260
> [ 956.800498] [] cpuidle_enter+0x17/0x20
> [ 956.806810] [] cpu_startup_entry+0x2b7/0x3a0
> [ 956.813717] [] start_secondary+0x15c/0x1a0
> [ 1036.601758] cpu 12: irq_work violating task isolation for latencytest/8447 on cpu 13
> [ 1036.610922] CPU: 12 PID: 0 Comm: swapper/12 Not tainted 4.7.0-rc7-stream1 #1
> [ 1036.619692] Hardware name: Dell Inc. PowerEdge R630/0CNCJW, BIOS 2.0.2 03/15/2016
> [ 1036.628551] 0000000000000086 ce6735c7b39e7b81 ffff88103e783d00 ffffffff8134f6ff
> [ 1036.637648] ffff88102dca0000 000000000000000d ffff88103e783d28 ffffffff811986f4
> [ 1036.646741] ffff88102dca0000 ffff88203cf97f80 000000000000000d ffff88103e783d68
> [ 1036.655833] Call Trace:
> [ 1036.658852] [] dump_stack+0x63/0x84
> [ 1036.665649] [] task_isolation_debug_task+0xb4/0xd0
> [ 1036.673136] [] _task_isolation_debug+0x83/0xc0
> [ 1036.680237] [] irq_work_queue_on+0x9c/0x120
> [ 1036.687091] [] tick_nohz_full_kick_cpu+0x44/0x50
> [ 1036.694388] [] wake_up_nohz_cpu+0x99/0x110
> [ 1036.701089] [] internal_add_timer+0x71/0xb0
> [ 1036.707896] [] add_timer_on+0xbb/0x140
> [ 1036.714210] [] clocksource_watchdog+0x230/0x300
> [ 1036.721411] [] ? __clocksource_unstable.isra.2+0x40/0x40
> [ 1036.729478] [] call_timer_fn+0x35/0x120
> [ 1036.735899] [] ? __clocksource_unstable.isra.2+0x40/0x40
> [ 1036.743970] [] run_timer_softirq+0x23c/0x2f0
> [ 1036.750878] [] __do_softirq+0xd7/0x2c5
> [ 1036.757199] [] irq_exit+0xf5/0x100
> [ 1036.763132] [] smp_apic_timer_interrupt+0x42/0x50
> [ 1036.770520] [] apic_timer_interrupt+0x8c/0xa0
> [ 1036.777520] [] ? poll_idle+0x40/0x80
> [ 1036.784410] [] cpuidle_enter_state+0x9c/0x260
> [ 1036.791413] [] cpuidle_enter+0x17/0x20
> [ 1036.797734] [] cpu_startup_entry+0x2b7/0x3a0
> [ 1036.804641] [] start_secondary+0x15c/0x1a0
>

-- 
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com