From: pg_xf2@xf2.for.sabi.co.UK (Peter Grandi)
To: Linux fs XFS <xfs@oss.sgi.com>
Date: Fri, 24 Feb 2012 14:52:47 +0000
Subject: Re: creating a new 80 TB XFS
In-Reply-To: <4F478818.4050803@cape-horn-eng.com>

[ ... ]

> We are getting now 32 x 3 TB Hitachi SATA HDDs. I plan to
> configure them in a single RAID 6 set with one or two
> hot-standby discs. The raw storage space will then be 28 x 3
> TB = 84 TB. On this one RAID set I will create only one
> volume. Any thoughts on this?

Well, many storage experts would be impressed by and support such
an audacious plan... But I think that wide RAID6 sets and large
RAID6 stripes are a phenomenally bad idea, that large filetrees
are also strikingly bad, and that the two combined are just about
the worst possible setup. It is also remarkably brave to use 32
identical drives in a RAID set. But all this is very popular
because in the beginning "it works" and is really cheap.

The proposed setup has only about 7% redundancy (two parity
drives protecting 28 data drives), read-modify-write (RMW) issues
with large stripe sizes, and 'fsck' time and space issues with
large trees. Consider this series of blog notes:

  http://www.sabi.co.uk/blog/12-two.html#120218
  http://www.sabi.co.uk/blog/12-two.html#120127
  http://www.sabi.co.uk/blog/1104Apr.html#110401
  http://groups.google.com/group/linux.debian.ports.x86-64/msg/fd2b4d46a4c294b5

> This storage will be used as secondary storage for backups. We
> use dirvish (www.dirvish.org, which uses rsync) to run our
> daily backups.

So it will be lots and lots of metadata (mostly directory)
updates. Not a very good match there, especially considering that
you will almost always be only writing to it, even for data, and
presumably from multiple hosts concurrently.

You may benefit considerably from putting the XFS log on a
separate disk, and, if you use Linux MD for the RAID, the
write-intent bitmaps on a separate disk too.

> *MKFS* We also heavily use ACLs for almost all of our files.

That's a daring choice.

> [ ... ] "-i size=512" on XFS creation, so my mkfs.xfs would look
> something like: mkfs.xfs -i size=512 -d su=stripe_size,sw=28
> -L Backup_2 /dev/sdX1

As a rule I specify a sector size of 4096, and in your case an
inode size of 2048 might be appropriate to raise the chance that
ACLs and directories are stored entirely inside the inode, which
seems particularly important in your case. Something like:

  -s size=4096 -b size=4096 -i size=2048,attr=2
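Purely as an illustrative sketch, pulling the above together with
the external-log suggestion (device names, the 256k stripe unit
and the bitmap path below are only placeholders; 'su' has to
match the chunk size the RAID layer actually uses):

  # /dev/sdX1 is the array, /dev/sdY1 a small partition on a
  # separate, non-array disk holding the external XFS log
  mkfs.xfs -s size=4096 -b size=4096 -i size=2048,attr=2 \
      -d su=256k,sw=28 -l logdev=/dev/sdY1 \
      -L Backup_2 /dev/sdX1

  # an external log also has to be named at mount time
  mount -o logdev=/dev/sdY1,noatime,logbufs=8,logbsize=256k,inode64 \
      /dev/sdX1 /mount_point

  # if the array were Linux MD rather than hardware RAID, the
  # write-intent bitmap can likewise be kept off the array, in a
  # file on some other filesystem (not on the array itself)
  mdadm --grow /dev/md0 --bitmap=/var/lib/md0-bitmap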
> mount -o noatime,nobarrier,nofail,logbufs=8,logbsize=256k,inode64
> /dev/sdX1 /mount_point

'nobarrier' seems rather optimistic unless you are very, very
sure there won't be failures.

There are many other details to look into, from readahead to
flusher frequency.
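For instance (the numbers here are only placeholders, to be
measured and tuned against the actual backup load):

  # readahead on the array device, in 512-byte sectors (8 MiB here)
  blockdev --setra 16384 /dev/sdX1

  # make the flusher threads push dirty pages out earlier and more
  # often, instead of letting huge amounts of dirty data pile up
  sysctl -w vm.dirty_background_ratio=5
  sysctl -w vm.dirty_ratio=20
  sysctl -w vm.dirty_expire_centisecs=1000
  sysctl -w vm.dirty_writeback_centisecs=500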