Date: Thu, 30 Apr 2015 10:57:10 -0400
From: "Theodore Ts'o"
To: Martin Steigerwald
Cc: Dave Chinner, Mike Galbraith, Daniel Phillips, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, tux3@tux3.org, OGAWA Hirofumi
Subject: Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
Message-ID: <20150430145710.GE12374@thunk.org>
In-Reply-To: <4154074.ZWLyZCMjhl@merkaba>
References: <8f886f13-6550-4322-95be-93244ae61045@phunq.net> <1430334326.7360.25.camel@gmail.com> <20150430002008.GY15810@dastard> <4154074.ZWLyZCMjhl@merkaba>

On Thu, Apr 30, 2015 at 11:00:05AM +0200, Martin Steigerwald wrote:
> > IOWS, XFS just hates your disk. Spend $50 and buy a cheap SSD and
> > the problem goes away. :)
>
> I am quite surprised that a traditional filesystem that was created in the
> age of rotating media does not like this kind of media and even seems to
> excel on BTRFS on the new non rotating media available.

You shouldn't be surprised; XFS was designed in an era where RAID was extremely important. To this day, on very large RAID arrays, I'm pretty sure none of the other file systems will come close to touching XFS, because it was optimized by some really, really good file system engineers for that hardware. And while RAID systems are certainly not identical to SSDs, the fact that you have multiple disk heads means that a good file system will optimize for that parallelism, and that's how SSDs get their speed. (Individual SSD channels aren't really all that fast; it's the fact that you can be reading or writing large numbers of them in parallel that gives high-end flash its really great performance numbers.)

> > Thing is, once you've abused those filesystems for a couple of
> > months, the files in ext4, btrfs and tux3 are not going to be laid
> > out perfectly on the outer edge of the disk. They'll be spread all
> > over the place and so all the filesystems will be seeing large seeks
> > on read. The thing is, XFS will have roughly the same performance as
> > when the filesystem is empty because the spreading of the allocation
> > allows it to maintain better locality and separation and hence
> > doesn't fragment free space nearly as badly as the other filesystems.
> > Free space fragmentation is what leads to performance degradation in
> > filesystems, and all the other filesystem will have degraded to be
> > *much worse* than XFS.

In fact, ext4 doesn't actually lay things out perfectly on the outer edge of the disk either, because we try to do spreading as well. Worse, we use a random algorithm to do the spreading, which means that results from run to run on an empty file system will show a lot more variation. I won't claim that we're best in class with either our spreading techniques or our ability to manage free space fragmentation, although we do a lot of work to manage free space fragmentation as well.
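
To make "free space fragmentation" concrete: two file systems can have the same amount of free space but very different free-extent layouts. Below is a minimal Python sketch, not taken from ext4 or XFS code, that assumes a hypothetical block bitmap (1 = in use, 0 = free) and summarizes free space in the spirit of a free-extent histogram tool such as e2freefrag. The single-number `fragmentation` score (1 minus largest free extent over total free space) is an illustrative assumption, not a standard metric.

```python
from collections import Counter


def free_extents(bitmap):
    """Return (start, length) for each run of free blocks.

    bitmap is a sequence where a true value means the block is in use.
    """
    extents = []
    start = None
    for i, used in enumerate(bitmap):
        if not used and start is None:
            start = i                      # a free run begins here
        elif used and start is not None:
            extents.append((start, i - start))
            start = None                   # the free run just ended
    if start is not None:                  # free run reaches the end of the device
        extents.append((start, len(bitmap) - start))
    return extents


def free_space_report(bitmap):
    """Summarize free space: extent count, largest extent, histogram, score."""
    extents = free_extents(bitmap)
    total_free = sum(length for _, length in extents)
    largest = max((length for _, length in extents), default=0)
    # Power-of-two buckets: bucket k counts extents of 2**k .. 2**(k+1)-1 blocks.
    histogram = Counter(length.bit_length() - 1 for _, length in extents)
    # 0.0 means all free space is one contiguous extent; near 1.0 means shattered.
    score = 1.0 - largest / total_free if total_free else 0.0
    return {
        "free_blocks": total_free,
        "extents": len(extents),
        "largest_extent": largest,
        "fragmentation": score,
        "histogram": dict(sorted(histogram.items())),
    }
```

For example, `free_space_report([1] * 8 + [0] * 8)` and `free_space_report([1, 0] * 8)` both report 8 free blocks, but the first has a single 8-block extent (score 0.0) while the second has eight 1-block extents (score 0.875). An aged file system drifting toward the second shape is the degradation being described, even if a freshly made one looks like the first.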

One of the problems is that it's *hard* to get good benchmarking numbers that take into account file system aging and measure how much the free space has become fragmented over time. Most of the benchmark results that I've seen do a really lousy job at this, and the vast majority don't even try.

This is one of the reasons why I find head-to-head "competitions" between file systems to be not very helpful for anything other than benchmarketing. It's almost certain that the benchmark won't be "fair" in some way, and it doesn't really matter whether the person doing the benchmark was doing it with malice aforethought, or was just incompetent and didn't understand the issues --- or did understand the issues and didn't really care, because what they _really_ wanted to do was to market their file system.

And even if the benchmark is fair, it might not match up with the end user's hardware, or their use case. There will always be some use case where file system A is better than file system B, for pretty much any file system. Don't get me wrong --- I will do comparisons between file systems, but only so I can figure out ways of making _my_ file system better. And more often than not, it's comparisons of the same file system before and after adding some new feature that are the most interesting.

> That are the allocation groups. I always wondered how it can be beneficial
> to spread the allocations onto 4 areas of one partition on expensive seek
> media. Now that makes better sense for me. I always had the gut impression
> that XFS may not be the fastest in all cases, but it is one of the
> filesystem with the most consistent performance over time, but never was
> able to fully explain why that is.

Yep, pretty much all of the traditional update-in-place file systems since the BSD FFS have done this, and for the same reason.
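
The allocation-group idea can be illustrated with a toy allocator. This is a hypothetical sketch: `ToyAllocator`, `mkdir`, `alloc` and the round-robin rotor are illustrative names, not XFS, ext4 or FFS code, and real allocation policies are far more involved. It assumes the device is split into equal groups, each new directory is assigned a home group (round-robin by default, or randomly if an `rng` such as `random.Random` is supplied, loosely echoing the randomized spreading mentioned for ext4), and file blocks are allocated first-fit in the directory's home group before spilling into the others.

```python
import errno


class ToyAllocator:
    """Hypothetical block allocator that splits a device into allocation groups."""

    def __init__(self, total_blocks, num_groups, rng=None):
        self.num_groups = num_groups
        self.group_size = total_blocks // num_groups
        self.used = [False] * (self.group_size * num_groups)
        self.dir_group = {}  # directory name -> home group index
        self.rotor = 0       # next group handed to a new directory (round-robin)
        self.rng = rng       # optional random.Random for randomized spreading

    def mkdir(self, name):
        """Assign a new directory to a group, spreading directories apart."""
        if self.rng is not None:
            group = self.rng.randrange(self.num_groups)
        else:
            group = self.rotor
            self.rotor = (self.rotor + 1) % self.num_groups
        self.dir_group[name] = group
        return group

    def alloc(self, directory, nblocks):
        """Allocate nblocks contiguous blocks, preferring the directory's home group."""
        if not 0 < nblocks <= self.group_size:
            raise ValueError("nblocks must fit inside a single group")
        home = self.dir_group[directory]
        for step in range(self.num_groups):
            group = (home + step) % self.num_groups
            start = self._find_run(group, nblocks)
            if start is not None:
                for block in range(start, start + nblocks):
                    self.used[block] = True
                return start
        raise OSError(errno.ENOSPC, "no contiguous free run large enough")

    def _find_run(self, group, nblocks):
        """First-fit search for nblocks consecutive free blocks inside one group."""
        lo = group * self.group_size
        run = 0
        for block in range(lo, lo + self.group_size):
            run = 0 if self.used[block] else run + 1
            if run == nblocks:
                return block - nblocks + 1
        return None
```

With `a = ToyAllocator(1024, 4)`, two directories land in groups 0 and 1, and their first 10-block files start at blocks 0 and 256. Each directory's data therefore has free space right behind it to keep growing contiguously, instead of interleaving with the other's, which is the locality and separation argument for spreading. With a random `rng`, the same workload produces a different layout on each run, one plausible source of the run-to-run variation noted for ext4 on an empty file system.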

For COW file systems, which are constantly moving data and metadata blocks around, different strategies will be needed to avoid the free space fragmentation problem as the file system ages.

Cheers,

- Ted