From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752310AbcGSCGE (ORCPT ); Mon, 18 Jul 2016 22:06:04 -0400
Received: from mx0b-00082601.pphosted.com ([67.231.153.30]:1170 "EHLO
	mx0a-00082601.pphosted.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org
	with ESMTP id S1752060AbcGSCF6 (ORCPT ); Mon, 18 Jul 2016 22:05:58 -0400
From: Calvin Owens
Subject: Re: [BUG] Slab corruption during XFS writeback under memory pressure
To: Dave Chinner
References: <28f77d74-5ab4-d913-2921-df90da53f393@fb.com>
	<20160717000003.GW1922@dastard> <20160718060215.GB16044@dastard>
Message-ID: <24d2f83f-5281-ab3c-9e91-985a4b8e2f8b@fb.com>
Date: Mon, 18 Jul 2016 19:05:44 -0700
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
	Icedove/45.2.0
MIME-Version: 1.0
In-Reply-To: <20160718060215.GB16044@dastard>
Content-Type: text/plain; charset="windows-1252"; format=flowed
Content-Transfer-Encoding: 7bit
Sender:
 linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On 07/17/2016 11:02 PM, Dave Chinner wrote:
> On Sun, Jul 17, 2016 at 10:00:03AM +1000, Dave Chinner wrote:
>> On Fri, Jul 15, 2016 at 05:18:02PM -0700, Calvin Owens wrote:
>>> Hello all,
>>>
>>> I've found a nasty source of slab corruption. Based on seeing similar symptoms
>>> on boxes at Facebook, I suspect it's been around since at least 3.10.
>>>
>>> It only reproduces under memory pressure so far as I can tell: the issue seems
>>> to be that XFS reclaims pages from buffers that are still in use by
>>> scsi/block. I'm not sure which side the bug lies on, but I've only observed it
>>> with XFS.
> [....]
>> But this indicates that the page is under writeback at this point,
>> so that tends to indicate that the above freeing was incorrect.
>>
>> Hmmm - it's clear we've got direct reclaim involved here, and the
>> suspicion of a dirty page that has had it's bufferheads cleared.
>> Are there any other warnings in the log from XFS prior to kasan
>> throwing the error?
>
> Can you try the patch below?

Thanks for getting this out so quickly :)

So far so good: I booted Linus' tree as of this morning and reproduced the ASAN
splat. After applying your patch I haven't triggered it.

I'm a bit wary since it was hard to trigger reliably in the first place... so I
lined up a few dozen boxes to run the test case overnight. I'll confirm in the
morning (-0700) they look good.

Thanks,
Calvin

> -Dave.