Subject: Re: [RFC PATCH] mm: kasan: suppress soft lockup in slub when !CONFIG_PREEMPT
From: Andrey Ryabinin
To: Dmitry Vyukov, Matthew Wilcox
Cc: Yang Shi, Alexander Potapenko, Andrew Morton, Linux-MM, kasan-dev, LKML
Date: Fri, 8 Dec 2017 12:16:49 +0300
Message-ID: <57afe220-036a-591c-2acc-56c5f3c6acef@virtuozzo.com>
References: <1512689407-100663-1-git-send-email-yang.s@alibaba-inc.com> <20171207234056.GF26792@bombadil.infradead.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 12/08/2017 11:26 AM, Dmitry Vyukov wrote:
> On Fri, Dec 8, 2017 at 12:40 AM, Matthew Wilcox wrote:
>> On Fri, Dec 08, 2017 at 07:30:07AM +0800, Yang Shi wrote:
>>> When running stress test with KASAN enabled, the below softlockup may
>>> happen occasionally:
>>>
>>> NMI watchdog: BUG: soft lockup - CPU#7 stuck for 22s!
>>> hardirqs last enabled at (0): [< (null)>] (null)
>>> hardirqs last disabled at (0): [] copy_process.part.30+0x5c6/0x1f50
>>> softirqs last enabled at (0): [] copy_process.part.30+0x5c6/0x1f50
>>> softirqs last disabled at (0): [< (null)>] (null)
>>
>>> Call Trace:
>>> [] __slab_free+0x19c/0x270
>>> [] ___cache_free+0xa6/0xb0
>>> [] qlist_free_all+0x47/0x80
>>> [] quarantine_reduce+0x159/0x190
>>> [] kasan_kmalloc+0xaf/0xc0
>>> [] kasan_slab_alloc+0x12/0x20
>>> [] kmem_cache_alloc+0xfa/0x360
>>> [] ? getname_flags+0x4f/0x1f0
>>> [] getname_flags+0x4f/0x1f0
>>> [] getname+0x12/0x20
>>> [] do_sys_open+0xf9/0x210
>>> [] SyS_open+0x1e/0x20
>>> [] entry_SYSCALL_64_fastpath+0x1f/0xc2
>>
>> This feels like papering over a problem. KASAN only calls
>> quarantine_reduce() when it's allowed to block. Presumably it has
>> millions of entries on the free list at this point. I think the right
>> thing to do is for qlist_free_all() to call cond_resched() after freeing
>> every N items.
>
>
> Agree. Adding touch_softlockup_watchdog() to a random low-level
> function looks like a wrong thing to do.
> quarantine_reduce() already has this logic. Look at
> QUARANTINE_BATCHES. It's meant to do exactly this -- limit amount of
> work in quarantine_reduce() and in quarantine_remove_cache() to
> reasonably-sized batches. We could simply increase number of batches
> to make them smaller. But it would be good to understand what exactly
> happens in this case. Batches should be on a par of ~~1MB. Why freeing
> 1MB worth of objects (smallest of which is 32b) takes 22 seconds?
>

I think the problem here is that kernel 4.9.44-003.ali3000.alios7.x86_64.debug
doesn't have commit 64abdcb24351 ("kasan: eliminate long stalls during
quarantine reduction").

We should probably ask for that commit to be included in stable, but it would
be good to hear confirmation from Yang that it really helps.
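
For illustration only, the "yield after freeing every N items" idea from the quoted discussion can be sketched in userspace C. This is not the kernel's qlist_free_all() and not the change made by commit 64abdcb24351. The names struct qnode, qlist_push(), qlist_free_all_batched() and FREE_BATCH are hypothetical, and sched_yield() is only a userspace stand-in for cond_resched(). For scale, if "32b" means 32 bytes, a ~1MB batch holds at most 1048576 / 32 = 32768 objects.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a quarantine list entry. */
struct qnode {
	struct qnode *next;
};

/* Hypothetical batch size: give up the CPU after this many frees. */
#define FREE_BATCH 1024

/* Push one freshly allocated node; returns 0 on success, -1 on failure. */
static int qlist_push(struct qnode **head)
{
	struct qnode *n = malloc(sizeof(*n));

	if (!n)
		return -1;
	n->next = *head;
	*head = n;
	return 0;
}

/*
 * Free every node on the list, yielding after each FREE_BATCH frees so
 * that one long list cannot monopolize the CPU. Returns the number of
 * nodes freed and stores the number of yields in *yields (if non-NULL).
 */
static size_t qlist_free_all_batched(struct qnode **head, size_t *yields)
{
	size_t freed = 0;
	size_t nr_yields = 0;
	struct qnode *cur;

	assert(head != NULL);
	cur = *head;
	while (cur) {
		struct qnode *next = cur->next;

		free(cur);
		cur = next;
		if (++freed % FREE_BATCH == 0) {
			sched_yield();	/* userspace stand-in for cond_resched() */
			nr_yields++;
		}
	}
	*head = NULL;
	if (yields)
		*yields = nr_yields;
	return freed;
}
```

A caller would build a list with qlist_push() and then call qlist_free_all_batched(&head, &yields). With 2500 nodes and FREE_BATCH set to 1024, the loop yields twice, after the 1024th and the 2048th free.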