Date: Tue, 13 Jan 2009 23:28:58 -0500 (EST)
From: Mikulas Patocka
To: Dave Chinner
cc: xfs@oss.sgi.com, linux-kernel@vger.kernel.org
Subject: Re: spurious -ENOSPC on XFS
In-Reply-To: <20090113214949.GN8071@disturbed>

> > This misbehavior is apparently caused by delayed allocation, delayed
> > allocation does not exactly know how much space will be occupied by data,
> > so it makes some upper bound guess.
>
> No, we know *exactly* how much space is consumed by the data. What
> we don't know is how much space will be required for additional
> *metadata* to do the allocation so we reserve the worst case need so
> that we should never get an ENOSPC during async writeback when we
> can't report the error to anyone. Worst case is 4 metadata blocks
> per allocation (delalloc extent, really).
>
> If we ENOSPC in the delalloc path, we have two choices:
>
>     1. potentially lock the system up due to OOM and being
>        unable to flush pages
>     2. throw away user data without being able to report an
>        error to the application that wrote it originally.
>
> Personally, I don't like either option, so premature ENOSPC at
> write() time is fine by me....
>
> > Because free space count is only a
> > guess, not the actual data being consumed, XFS should not return -ENOSPC
> > on behalf of it. When the free space overflows, XFS should sync itself,
> > retry allocation and only return -ENOSPC if it fails the second time,
> > after the sync.
>
> It does, by graduated response (see xfs_iomap_write_delay() and
> xfs_flush_space()):
>
>     1. trigger async flush of the inode and retry
>     2. retry again
>     3. start a filesystem wide flush, wait 500ms and try again

The result must not depend on magic timer values. If it does, you end up
with undebuggable nondeterministic failures. Why don't you change that
500ms wait to "wait until the flush finishes"? That would be correct.

>     4. really ENOSPC now.
>
> It could probably be improved but, quite frankly, XFS wasn't designed
> for small filesystems so I don't think this is worth investing any
> major effort in changing/fixing.
>
> Cheers,
>
> Dave.

Mikulas
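
To illustrate the point about the timer, here is a small user-space model of the two retry strategies. It is only a sketch and not XFS code: struct toy_fs, toy_reserve(), toy_advance(), write_timed() and write_wait_done() are all hypothetical names. The model assumes that a flush, once finished, hands every worst-case metadata reservation back to the free pool, which is a deliberate simplification. Only the figure of 4 metadata blocks per delalloc extent comes from the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; none of these names exist in XFS. */
#define META_PER_EXTENT 4   /* worst case per delalloc extent, as quoted above */

struct toy_fs {
    long free_blocks;       /* blocks available for new reservations */
    long reserved_meta;     /* worst-case metadata blocks held by delalloc */
    long flush_ticks_left;  /* simulated time until the running flush ends */
};

/* Reserve the data blocks plus the worst-case metadata for one extent. */
static bool toy_reserve(struct toy_fs *fs, long data_blocks)
{
    long need = data_blocks + META_PER_EXTENT;

    if (fs->free_blocks < need)
        return false;       /* this is the premature ENOSPC */
    fs->free_blocks -= need;
    fs->reserved_meta += META_PER_EXTENT;
    return true;
}

/*
 * Let simulated time pass. Assumption: when the flush completes, all
 * worst-case metadata reservations are returned to the free pool.
 */
static void toy_advance(struct toy_fs *fs, long ticks)
{
    assert(ticks >= 0);
    if (fs->flush_ticks_left == 0)
        return;             /* no flush in progress */
    if (ticks >= fs->flush_ticks_left) {
        fs->flush_ticks_left = 0;
        fs->free_blocks += fs->reserved_meta;
        fs->reserved_meta = 0;
    } else {
        fs->flush_ticks_left -= ticks;
    }
}

/* Retry after a fixed wait: the outcome depends on how long the flush takes. */
static bool write_timed(struct toy_fs *fs, long data_blocks, long wait_ticks)
{
    if (toy_reserve(fs, data_blocks))
        return true;
    toy_advance(fs, wait_ticks);
    return toy_reserve(fs, data_blocks);
}

/* Retry after the flush has finished: the outcome depends only on space. */
static bool write_wait_done(struct toy_fs *fs, long data_blocks)
{
    if (toy_reserve(fs, data_blocks))
        return true;
    toy_advance(fs, fs->flush_ticks_left);
    return toy_reserve(fs, data_blocks);
}
```

In this model, with 2 free blocks, 8 reserved metadata blocks and a flush that needs 800 ticks, write_timed() with a 500-tick wait fails while write_wait_done() succeeds on the same starting state. If the flush needs only 300 ticks, the timed variant succeeds too, so its result is decided by flush speed rather than by the space that is really available.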