From: Dave Chinner <david@fromorbit.com>
To: linux-xfs@vger.kernel.org
Subject: [PATCH 0/2] xfs: more shutdown/recovery fixes
Date: Mon, 21 Mar 2022 12:23:27 +1100
Message-Id: <20220321012329.376307-1-david@fromorbit.com>

Hi folks,

These two patches are followups to my recent series of
shutdown/recovery fixes. The cluster buffer lock patch addresses a
race condition that started to show up regularly once the fixes in
the previous series were done - it is a regression from the async
inode reclaim work done almost two years ago.

The second patch is something I'm really surprised has taken this
long to uncover. Intent recovery/cancellation checks that no intent
items remain in the AIL once the first non-intent item is found.
That check was correct back when we only had standalone intent items
(i.e. EFI/EFD), but now that complex operations are chained by
intents, recovering an incomplete intent can log and commit new
intents, and those can end up in the AIL before log recovery has
finished processing the deferred items. Hence the ASSERT() that no
intents exist in the AIL after the first non-intent item is simply
invalid.

With these two patches, I'm back to being able to run hundreds of
cycles of g/388 or -g recoveryloop without seeing any failures.

-Dave.
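
P.S. For anyone who hasn't looked at the recovery code in question,
here is a rough, self-contained userspace sketch of the check being
removed. It is not the actual XFS code - the names (recover_intents,
log_item) are made up for illustration - but it models the AIL as an
LSN-ordered list and shows how an intent logged while recovering an
earlier intent can land after a non-intent item and trip the old
"no intents after the first non-intent item" assertion. Running it
is expected to abort on that assert, which is exactly the spurious
failure described above.

/*
 * Illustrative sketch only -- not the XFS source.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct log_item {
	bool		is_intent;	/* stand-in for an EFI/RUI/CUI/BUI style item */
	const char	*name;
};

static void recover_intents(struct log_item *ail, int nr)
{
	int	i;

	for (i = 0; i < nr; i++) {
		if (!ail[i].is_intent) {
			/*
			 * The old check: once we hit a non-intent item,
			 * nothing later in the AIL may be an intent. That
			 * held when intents were standalone (EFI/EFD), but
			 * chained deferred operations log new intents
			 * during recovery, so this now fires spuriously.
			 */
			for (; i < nr; i++)
				assert(!ail[i].is_intent);
			break;
		}
		printf("recovering intent: %s\n", ail[i].name);
		/* recovering this intent may itself log new intents... */
	}
}

int main(void)
{
	/*
	 * An intent logged during recovery of the first intent has
	 * ended up in the AIL after a regular (non-intent) item.
	 */
	struct log_item ail[] = {
		{ true,  "EFI from the crashed mount" },
		{ false, "inode item from a recovery transaction" },
		{ true,  "RUI chained from recovering the EFI" },
	};

	recover_intents(ail, 3);
	return 0;
}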