From: Ming Lin
Subject: xlog_write: reservation ran out
Date: Fri, 28 Apr 2017 13:15:11 -0700
To: xfs@oss.sgi.com, Dave Chinner, Christoph Hellwig
Cc: Ceph Development, ceph-users, "LIU, Fei", xiongwei.jiang@alibaba-inc.com, boqian.zy@alibaba-inc.com

Hi Dave & Christoph,

I ran into the error below during a pre-production Ceph cluster test with an
XFS backend.

Kernel version: CentOS 7.2, 3.10.0-327.el7.x86_64

[146702.392840] XFS (nvme9n1p1): xlog_write: reservation summary:
  trans type   = INACTIVE (3)
  unit res     = 83812 bytes
  current res  = -9380 bytes
  total reg    = 0 bytes (o/flow = 0 bytes)
  ophdrs       = 0 (ophdr space = 0 bytes)
  ophdr + reg  = 0 bytes
  num regions  = 0
[146702.428729] XFS (nvme9n1p1): xlog_write: reservation ran out. Need to up reservation
[146702.436917] XFS (nvme9n1p1): xfs_do_force_shutdown(0x2) called from line 2070 of file fs/xfs/xfs_log.c. Return address = 0xffffffffa0651738
[146702.449969] XFS (nvme9n1p1): Log I/O Error Detected. Shutting down filesystem
[146702.457590] XFS (nvme9n1p1): Please umount the filesystem and rectify the problem(s)
[146702.467903] XFS (nvme9n1p1): xfs_log_force: error -5 returned.
[146732.324308] XFS (nvme9n1p1): xfs_log_force: error -5 returned.
[146762.436923] XFS (nvme9n1p1): xfs_log_force: error -5 returned.
[146792.549545] XFS (nvme9n1p1): xfs_log_force: error -5 returned.

Each XFS filesystem is 1.7T. The cluster was written to about 80% full, then
we deleted the Ceph RBD image, which in turn deletes a large number of files
on the backend XFS.

I'm going to try the quick hack below and see if it helps.

diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c
index 1b754cb..b2702f5 100644
--- a/fs/xfs/libxfs/xfs_trans_resv.c
+++ b/fs/xfs/libxfs/xfs_trans_resv.c
@@ -800,7 +800,7 @@ xfs_trans_resv_calc(
 	resp->tr_link.tr_logcount = XFS_LINK_LOG_COUNT;
 	resp->tr_link.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
 
-	resp->tr_remove.tr_logres = xfs_calc_remove_reservation(mp);
+	resp->tr_remove.tr_logres = xfs_calc_remove_reservation(mp) * 2;
 	resp->tr_remove.tr_logcount = XFS_REMOVE_LOG_COUNT;
 	resp->tr_remove.tr_logflags |= XFS_TRANS_PERM_LOG_RES;

Meanwhile, could you suggest any upstream patches that I could try?

Any help is appreciated.
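By the way, my reading of the reservation summary above is just arithmetic on
the numbers in the log message (the field names come from the message itself,
and I'm assuming "current res" started out equal to "unit res"):

	/* quick sanity check of the numbers in the shutdown message */
	#include <stdio.h>

	int main(void)
	{
		int unit_res    = 83812;	/* "unit res" from the log */
		int current_res = -9380;	/* "current res" from the log */
		int used    = unit_res - current_res;	/* bytes actually consumed */
		int overrun = -current_res;		/* bytes past the reservation */

		printf("used %d bytes, overrun %d bytes (%.1f%% over)\n",
		       used, overrun, 100.0 * overrun / unit_res);
		return 0;
	}

which prints "used 93192 bytes, overrun 9380 bytes (11.2% over)", i.e. the
INACTIVE transaction went roughly 11% past its unit reservation.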
Thanks,
Ming

------

$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                64
On-line CPU(s) list:   0-63
Thread(s) per core:    2
Core(s) per socket:    16
Socket(s):             2
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2682 v4 @ 2.50GHz
Stepping:              1
CPU MHz:               2499.609
BogoMIPS:              4994.43
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              40960K
NUMA node0 CPU(s):     0-63

$ free
              total        used        free      shared  buff/cache   available
Mem:      131451952    40856592    83974824        9832     6620536    84212472
Swap:       2097148           0     2097148
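PS: before putting the hack above on more nodes, I'll probably add a one-line
mount-time message so I can confirm from dmesg that the doubled remove
reservation is actually in effect. Something like the following, on top of the
hack (illustrative only; xfs_notice() is the existing XFS message helper, the
wording is mine):

 	resp->tr_remove.tr_logres = xfs_calc_remove_reservation(mp) * 2;
+	xfs_notice(mp, "tr_remove logres bumped to %u bytes",
+		   resp->tr_remove.tr_logres);
 	resp->tr_remove.tr_logcount = XFS_REMOVE_LOG_COUNT;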