From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [BUG] Slab corruption during XFS writeback under memory pressure
To: Dave Chinner
References: <28f77d74-5ab4-d913-2921-df90da53f393@fb.com> <20160717000003.GW1922@dastard> <20160718060215.GB16044@dastard> <24d2f83f-5281-ab3c-9e91-985a4b8e2f8b@fb.com>
From: Calvin Owens
Message-ID: <53af895c-7ddb-1e50-6c90-d4d59f5c7a2f@fb.com>
Date: Tue, 19 Jul 2016 14:22:47 -0700
In-Reply-To: <24d2f83f-5281-ab3c-9e91-985a4b8e2f8b@fb.com>
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On 07/18/2016 07:05 PM, Calvin Owens wrote:
> On 07/17/2016 11:02 PM, Dave Chinner wrote:
>> On Sun, Jul 17, 2016 at 10:00:03AM +1000, Dave Chinner wrote:
>>> On Fri, Jul 15, 2016 at 05:18:02PM -0700, Calvin Owens wrote:
>>>> Hello all,
>>>>
>>>> I've found a nasty source of slab corruption. Based on seeing similar symptoms
>>>> on boxes at Facebook, I suspect it's been around since at least 3.10.
>>>>
>>>> It only reproduces under memory pressure so far as I can tell: the issue seems
>>>> to be that XFS reclaims pages from buffers that are still in use by
>>>> scsi/block. I'm not sure which side the bug lies on, but I've only observed it
>>>> with XFS.
>> [....]
>>> But this indicates that the page is under writeback at this point,
>>> so that tends to indicate that the above freeing was incorrect.
>>>
>>> Hmmm - it's clear we've got direct reclaim involved here, and the
>>> suspicion of a dirty page that has had it's bufferheads cleared.
>>> Are there any other warnings in the log from XFS prior to kasan
>>> throwing the error?
>>
>> Can you try the patch below?
>
> Thanks for getting this out so quickly :)
>
> So far so good: I booted Linus' tree as of this morning and reproduced the ASAN
> splat. After applying your patch I haven't triggered it.
>
> I'm a bit wary since it was hard to trigger reliably in the first place... so I
> lined up a few dozen boxes to run the test case overnight. I'll confirm in the
> morning (-0700) they look good.

All right, my testcase ran 2099 times overnight without triggering anything.

For the overnight tests, I booted the boxes with "mem=" to artificially limit
RAM, which makes my repro *much* more reliable (I feel silly for not thinking
of that in the first place). With that setup, I hit the ASAN splat 21 times in
98 runs on vanilla 4.7-rc7.

So I'm sold.

Tested-by: Calvin Owens

Again, really appreciate the quick response :)

Thanks,
Calvin

> Thanks,
> Calvin
>
>> -Dave.
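
For anyone wanting to sanity-check the numbers in the reply above (21 splats in 98 runs on vanilla 4.7-rc7, versus 0 in 2099 runs with the patch), here is a rough sketch of the arithmetic. It assumes the runs are independent and that every run has the same chance of hitting the splat, which is a simplification and not something stated in the thread. The function name is made up for illustration.

```python
import math


def log10_prob_zero_failures(failures: int, baseline_runs: int, clean_runs: int) -> float:
    """Return log10 of the chance of seeing no failures in `clean_runs` runs,
    assuming (hypothetically) the baseline failure rate still applied and
    that runs are independent."""
    rate = failures / baseline_runs
    # P(no failure in n runs) = (1 - rate) ** n; work in log space to avoid underflow.
    return clean_runs * math.log10(1.0 - rate)


# Numbers taken from the mail: 21 splats in 98 unpatched runs, 2099 clean patched runs.
baseline_rate = 21 / 98
log10_p = log10_prob_zero_failures(21, 98, 2099)

print(f"baseline failure rate: {baseline_rate:.3f}")
print(f"log10 P(0 failures in 2099 runs at that rate): {log10_p:.1f}")
```

Under those assumptions, a clean overnight run would be wildly improbable if the bug were still present at the unpatched rate, which is consistent with the "So I'm sold" conclusion.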