From mboxrd@z Thu Jan  1 00:00:00 1970
From: Andi Kleen
Subject: Re: Btrfs: broken file system design
Date: Fri, 25 Jun 2010 11:15:55 +0200
Message-ID: <87hbkrealw.fsf@basil.nowhere.org>
References: <4C07C321.8010000@redhat.com> <4C1B7560.1000806@gmail.com>
	<4C1BA3E5.7020400@gmail.com> <20100623234031.GF7058@shareable.org>
	<469D2D911E4BF043BFC8AD32E8E30F5B24AEBA@wdscexbe07.sc.wdc.com>
	<469D2D911E4BF043BFC8AD32E8E30F5B24AEBB@wdscexbe07.sc.wdc.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: "Mike Fedyk", "Daniel J Blueman", "Mat", "LKML", "Chris Mason",
	"Ric Wheeler", "Andrew Morton", "Linus Torvalds",
	"The development of BTRFS"
To: "Daniel Taylor"
Return-path:
In-Reply-To: <469D2D911E4BF043BFC8AD32E8E30F5B24AEBB@wdscexbe07.sc.wdc.com>
	(Daniel Taylor's message of "Thu, 24 Jun 2010 15:06:03 -0700")
List-ID:

"Daniel Taylor" writes:

>
> As long as no object smaller than the disk block size is ever
> flushed to media, and all flushed objects are aligned to the disk
> blocks, there should be no real performance hit from that.

The question is just how large such a block needs to be.
Traditionally some RAID controllers (and possibly some SSDs now)
needed very large blocks, up to megabytes.

>
> Otherwise we end up with the damage for the ext[234] family, where
> the file blocks can be aligned, but the 1K inode updates cause
> read-modify-write (RMW) cycles and cost a >10% performance
> hit for creation/update of large numbers of files.

Fixing that doesn't require a new file system layout, just some
effort to read/write inodes in batches of several at a time.
XFS has done similar things for a long time, and I believe there
were some efforts in this direction for ext4 too.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.
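
To illustrate the batching idea in user space: a minimal sketch, assuming
4 KiB physical blocks, 256-byte on-disk inodes, and a plain file standing
in for the inode table. The sizes, file name, and helper names here are
illustrative assumptions, not taken from ext4 or XFS; the point is only
that updating N inodes that share a block costs one read-modify-write of
the block instead of N.

/*
 * Minimal user-space sketch of the inode-batching idea above.  All names
 * and sizes are illustrative assumptions (4 KiB blocks, 256-byte on-disk
 * inodes, a plain file standing in for the inode table); this is not
 * ext4 or XFS code.
 */
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define BLOCK_SIZE 4096u                 /* assumed physical block size */
#define INODE_SIZE  256u                 /* assumed on-disk inode size  */
#define PER_BLOCK  (BLOCK_SIZE / INODE_SIZE)

/* Naive path: each inode update reads and rewrites its whole block. */
static int update_inode_naive(int fd, uint32_t ino, const void *data)
{
    unsigned char block[BLOCK_SIZE];
    off_t blk_off = (off_t)(ino / PER_BLOCK) * BLOCK_SIZE;

    if (pread(fd, block, BLOCK_SIZE, blk_off) != (ssize_t)BLOCK_SIZE)
        return -1;
    memcpy(block + (ino % PER_BLOCK) * INODE_SIZE, data, INODE_SIZE);
    return pwrite(fd, block, BLOCK_SIZE, blk_off) == (ssize_t)BLOCK_SIZE ? 0 : -1;
}

/* Batched path: apply every dirty inode that shares the block, then write
 * the aligned block back once -- one RMW for N inodes instead of N RMWs.
 * The caller is assumed to have grouped the dirty inodes by block. */
static int update_inodes_batched(int fd, const uint32_t *inos,
                                 unsigned char data[][INODE_SIZE], unsigned n)
{
    unsigned char block[BLOCK_SIZE];
    off_t blk_off = (off_t)(inos[0] / PER_BLOCK) * BLOCK_SIZE;

    if (pread(fd, block, BLOCK_SIZE, blk_off) != (ssize_t)BLOCK_SIZE)
        return -1;
    for (unsigned i = 0; i < n; i++)
        memcpy(block + (inos[i] % PER_BLOCK) * INODE_SIZE,
               data[i], INODE_SIZE);
    return pwrite(fd, block, BLOCK_SIZE, blk_off) == (ssize_t)BLOCK_SIZE ? 0 : -1;
}

int main(void)
{
    /* Stand-in for one block of the on-disk inode table. */
    int fd = open("inode-table.img", O_RDWR | O_CREAT | O_TRUNC, 0600);
    unsigned char zero[BLOCK_SIZE] = {0};
    unsigned char one[INODE_SIZE] = {1};
    unsigned char batch[3][INODE_SIZE] = {{2}, {3}, {4}};
    uint32_t inos[3] = {1, 2, 3};        /* three inodes in the same block */

    if (fd < 0 || pwrite(fd, zero, BLOCK_SIZE, 0) != (ssize_t)BLOCK_SIZE)
        return 1;
    update_inode_naive(fd, 0, one);             /* 1 inode  -> 1 block RMW */
    update_inodes_batched(fd, inos, batch, 3);  /* 3 inodes -> 1 block RMW */
    close(fd);
    return 0;
}

The only real work for an existing file system is the grouping step: keep
dirty inodes sorted by their containing block so consecutive updates fall
into the same aligned write.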