* Odd Block allocation behavior on Reiser3
@ 2004-08-09 20:19 Sonny Rao
  2004-08-09 20:30 ` Chris Mason
  0 siblings, 1 reply; 19+ messages in thread
From: Sonny Rao @ 2004-08-09 20:19 UTC (permalink / raw)
  To: reiserfs-list

Hi, I'm investigating filesystem performance on sequential read
patterns of large files, and I discovered an odd pattern of block
allocation and subsequent re-allocation after overwrite under reiser3:

x44016way1:/mnt/tmp1 # dd if=/dev/zero of=filedh bs=1k count=$[ 512 * 1024 ]
524288+0 records in
524288+0 records out
x44016way1:/mnt/tmp1 # filefrag  *
filedh: 134 extents found
x44016way1:/mnt/tmp1 # dd if=/dev/zero of=filedh bs=1k count=$[ 512 * 1024 ]
524288+0 records in
524288+0 records out
x44016way1:/mnt/tmp1 # filefrag  *
filedh: 5 extents found
x44016way1:/mnt/tmp1 # dd if=/dev/zero of=filedh bs=1k count=$[ 512 * 1024 ]
524288+0 records in
524288+0 records out
x44016way1:/mnt/tmp1 # filefrag  *
filedh: 134 extents found
x44016way1:/mnt/tmp1 # dd if=/dev/zero of=filedh bs=1k count=$[ 512 * 1024 ]
524288+0 records in
524288+0 records out
x44016way1:/mnt/tmp1 # filefrag  *
filedh: 5 extents found

This was done on a newly created filesystem with plenty of available
space and no other files.  I tried this test several times and saw the
number of extents for the file vary among 5, 6, 7, and 134, but it
was always different after each over-write.

First, I expect that an extent-based filesystem like reiserfs
would simply allocate one or just a few extents for this file rather
than the 134 extents I see.   XFS and JFS both will create files with
1 extent.  Even ext3 will create the 512MB file with just 5 extents.
134 extents on a brand new filesystem seems ridiculous to me.  

Second, I'm wondering why we see re-allocation going on when we
over-write the file.  The result of just 5 extents is good, but on a
second re-write we could get 134 extents again or even a different
value.  I see that dd opens the file with O_TRUNC; could that be
causing the differences?
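
(I checked that with a quick strace run; assuming strace is installed on
the box, something like this shows the open flags, with a throwaway file
name:)

strace -e trace=open dd if=/dev/zero of=/mnt/tmp1/tracetest bs=1k count=1 2>&1 | grep tracetest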

I've put the output of filefrag -v below for the 134 extent file.

Thanks,

Sonny Rao

filefrag -v filedh 
Checking filedh
Filesystem type is: 52654973
Filesystem cylinder groups is approximately 60672
Blocksize of file filedh is 4096
File size of filedh is 536870912 (131072 blocks)
Discontinuity: Block 956 is at 271842 (was 271839)
Discontinuity: Block 1976 is at 272863 (was 272861)
Discontinuity: Block 2979 is at 273867 (was 273865)
Discontinuity: Block 3999 is at 274888 (was 274886)
Discontinuity: Block 5019 is at 275909 (was 275907)
Discontinuity: Block 6022 is at 276913 (was 276911)
Discontinuity: Block 7042 is at 277934 (was 277932)
Discontinuity: Block 8045 is at 278938 (was 278936)
Discontinuity: Block 9065 is at 279959 (was 279957)
Discontinuity: Block 10068 is at 280963 (was 280961)
Discontinuity: Block 11088 is at 281984 (was 281982)
Discontinuity: Block 12091 is at 282988 (was 282986)
Discontinuity: Block 13111 is at 284009 (was 284007)
Discontinuity: Block 14114 is at 285013 (was 285011)
Discontinuity: Block 15134 is at 286034 (was 286032)
Discontinuity: Block 16137 is at 287038 (was 287036)
Discontinuity: Block 17157 is at 288059 (was 288057)
Discontinuity: Block 18160 is at 289063 (was 289061)
Discontinuity: Block 19180 is at 290084 (was 290082)
Discontinuity: Block 20183 is at 291088 (was 291086)
Discontinuity: Block 21203 is at 292109 (was 292107)
Discontinuity: Block 22223 is at 293130 (was 293128)
Discontinuity: Block 23226 is at 294134 (was 294132)
Discontinuity: Block 24004 is at 294913 (was 294911)
Discontinuity: Block 24242 is at 295152 (was 295150)
Discontinuity: Block 25245 is at 296156 (was 296154)
Discontinuity: Block 26265 is at 297177 (was 297175)
Discontinuity: Block 27268 is at 298181 (was 298179)
Discontinuity: Block 28288 is at 299202 (was 299200)
Discontinuity: Block 29291 is at 300206 (was 300204)
Discontinuity: Block 30311 is at 301227 (was 301225)
Discontinuity: Block 31331 is at 302248 (was 302246)
Discontinuity: Block 32334 is at 303252 (was 303250)
Discontinuity: Block 33354 is at 304273 (was 304271)
Discontinuity: Block 34357 is at 305277 (was 305275)
Discontinuity: Block 35377 is at 306298 (was 306296)
Discontinuity: Block 36380 is at 307302 (was 307300)
Discontinuity: Block 37400 is at 308323 (was 308321)
Discontinuity: Block 38403 is at 309327 (was 309325)
Discontinuity: Block 39423 is at 310348 (was 310346)
Discontinuity: Block 40426 is at 311352 (was 311350)
Discontinuity: Block 41446 is at 312373 (was 312371)
Discontinuity: Block 42449 is at 313377 (was 313375)
Discontinuity: Block 43469 is at 314398 (was 314396)
Discontinuity: Block 44472 is at 315402 (was 315400)
Discontinuity: Block 45492 is at 316423 (was 316421)
Discontinuity: Block 46495 is at 317427 (was 317425)
Discontinuity: Block 47513 is at 318446 (was 318444)
Discontinuity: Block 48533 is at 319467 (was 319465)
Discontinuity: Block 49536 is at 320471 (was 320469)
Discontinuity: Block 50556 is at 321492 (was 321490)
Discontinuity: Block 51559 is at 322496 (was 322494)
Discontinuity: Block 52579 is at 323517 (was 323515)
Discontinuity: Block 53582 is at 324521 (was 324519)
Discontinuity: Block 54602 is at 325542 (was 325540)
Discontinuity: Block 55605 is at 326546 (was 326544)
Discontinuity: Block 56625 is at 327567 (was 327565)
Discontinuity: Block 56738 is at 327681 (was 327679)
Discontinuity: Block 57639 is at 328583 (was 328581)
Discontinuity: Block 58642 is at 329587 (was 329585)
Discontinuity: Block 59662 is at 330608 (was 330606)
Discontinuity: Block 60665 is at 331612 (was 331610)
Discontinuity: Block 61685 is at 332633 (was 332631)
Discontinuity: Block 62688 is at 333637 (was 333635)
Discontinuity: Block 63708 is at 334658 (was 334656)
Discontinuity: Block 64711 is at 335662 (was 335660)
Discontinuity: Block 65731 is at 336683 (was 336681)
Discontinuity: Block 66751 is at 337704 (was 337702)
Discontinuity: Block 67754 is at 338708 (was 338706)
Discontinuity: Block 68774 is at 339729 (was 339727)
Discontinuity: Block 69777 is at 340733 (was 340731)
Discontinuity: Block 70797 is at 341754 (was 341752)
Discontinuity: Block 71800 is at 342758 (was 342756)
Discontinuity: Block 72820 is at 343779 (was 343777)
Discontinuity: Block 73823 is at 344783 (was 344781)
Discontinuity: Block 74843 is at 345804 (was 345802)
Discontinuity: Block 75846 is at 346808 (was 346806)
Discontinuity: Block 76866 is at 347829 (was 347827)
Discontinuity: Block 77869 is at 348833 (was 348831)
Discontinuity: Block 78889 is at 349854 (was 349852)
Discontinuity: Block 79892 is at 350858 (was 350856)
Discontinuity: Block 80912 is at 351879 (was 351877)
Discontinuity: Block 81915 is at 352883 (was 352881)
Discontinuity: Block 82935 is at 353904 (was 353902)
Discontinuity: Block 83955 is at 354925 (was 354923)
Discontinuity: Block 84958 is at 355929 (was 355927)
Discontinuity: Block 85978 is at 356950 (was 356948)
Discontinuity: Block 86981 is at 357954 (was 357952)
Discontinuity: Block 88001 is at 358975 (was 358973)
Discontinuity: Block 89004 is at 359979 (was 359977)
Discontinuity: Block 89473 is at 360449 (was 360447)
Discontinuity: Block 90017 is at 360994 (was 360992)
Discontinuity: Block 91037 is at 362015 (was 362013)
Discontinuity: Block 92040 is at 363019 (was 363017)
Discontinuity: Block 93060 is at 364040 (was 364038)
Discontinuity: Block 94063 is at 365044 (was 365042)
Discontinuity: Block 95083 is at 366065 (was 366063)
Discontinuity: Block 96086 is at 367069 (was 367067)
Discontinuity: Block 97106 is at 368090 (was 368088)
Discontinuity: Block 98109 is at 369094 (was 369092)
Discontinuity: Block 99129 is at 370115 (was 370113)
Discontinuity: Block 100132 is at 371119 (was 371117)
Discontinuity: Block 101152 is at 372140 (was 372138)
Discontinuity: Block 102155 is at 373144 (was 373142)
Discontinuity: Block 103175 is at 374165 (was 374163)
Discontinuity: Block 104195 is at 375186 (was 375184)
Discontinuity: Block 105198 is at 376190 (was 376188)
Discontinuity: Block 106218 is at 377211 (was 377209)
Discontinuity: Block 107221 is at 378215 (was 378213)
Discontinuity: Block 108241 is at 379236 (was 379234)
Discontinuity: Block 109244 is at 380240 (was 380238)
Discontinuity: Block 110264 is at 381261 (was 381259)
Discontinuity: Block 111267 is at 382265 (was 382263)
Discontinuity: Block 112287 is at 383286 (was 383284)
Discontinuity: Block 113290 is at 384290 (was 384288)
Discontinuity: Block 114310 is at 385311 (was 385309)
Discontinuity: Block 115313 is at 386315 (was 386313)
Discontinuity: Block 116333 is at 387336 (was 387334)
Discontinuity: Block 117336 is at 388340 (was 388338)
Discontinuity: Block 118356 is at 389361 (was 389359)
Discontinuity: Block 119359 is at 390365 (was 390363)
Discontinuity: Block 120379 is at 391386 (was 391384)
Discontinuity: Block 121399 is at 392407 (was 392405)
Discontinuity: Block 122208 is at 393217 (was 393215)
Discontinuity: Block 122395 is at 393405 (was 393403)
Discontinuity: Block 123415 is at 394426 (was 394424)
Discontinuity: Block 124435 is at 395447 (was 395445)
Discontinuity: Block 125438 is at 396451 (was 396449)
Discontinuity: Block 126458 is at 397472 (was 397470)
Discontinuity: Block 127461 is at 398476 (was 398474)
Discontinuity: Block 128481 is at 399497 (was 399495)
Discontinuity: Block 129484 is at 400501 (was 400499)
Discontinuity: Block 130500 is at 401518 (was 401516)
filedh: 134 extents found


* Re: Odd Block allocation behavior on Reiser3
  2004-08-09 20:19 Odd Block allocation behavior on Reiser3 Sonny Rao
@ 2004-08-09 20:30 ` Chris Mason
  2004-08-09 22:04   ` Sonny Rao
  0 siblings, 1 reply; 19+ messages in thread
From: Chris Mason @ 2004-08-09 20:30 UTC (permalink / raw)
  To: Sonny Rao; +Cc: reiserfs-list

On Mon, 2004-08-09 at 16:19, Sonny Rao wrote:
> Hi, I'm investigating filesystem performance on sequential read
> patterns of large files, and I discovered an odd pattern of block
> allocation and subsequent re-allocation after overwrite under reiser3:
> 
Exactly which kernel is this?  The block allocator in v3 has changed
recently.

> This was done on a newly created filesystem with plenty of available
> space and no other files.  I tried this test several times and saw the
> number of extents for the file vary from 5,6,7 and 134 extents, but it
> is always different after each over-write.
> 
You've hit a "feature" of the journal.  When you delete a file, the data
blocks aren't available for reuse until the transaction that allocated
them is committed to the log.  If you were to put a sync in between each
run of dd, you should get roughly the same blocks allocated each time. 
ext3 does the same thing, although somewhat differently.  The
asynchronous commit is probably just finishing a little sooner on ext3.
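
Something like the following (untested sketch; adjust the mount point to
match your setup) should show the allocation settling down once the
commit is forced:

for i in 1 2 3 ; do
    dd if=/dev/zero of=/mnt/tmp1/filedh bs=1k count=$[ 512 * 1024 ]
    sync    # force the pending transaction to commit so the freed blocks become reusable
    filefrag /mnt/tmp1/filedh
done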

> First, I expect that an extent-based filesystem like reiserfs

reiser4 is extent based, reiser3 is not.

-chris




* Re: Odd Block allocation behavior on Reiser3
  2004-08-09 20:30 ` Chris Mason
@ 2004-08-09 22:04   ` Sonny Rao
  2004-08-10  7:16     ` Hans Reiser
  2004-08-10 12:53     ` Odd Block allocation behavior on Reiser3 Chris Mason
  0 siblings, 2 replies; 19+ messages in thread
From: Sonny Rao @ 2004-08-09 22:04 UTC (permalink / raw)
  To: Chris Mason; +Cc: reiserfs-list

On Mon, Aug 09, 2004 at 04:30:51PM -0400, Chris Mason wrote:
> On Mon, 2004-08-09 at 16:19, Sonny Rao wrote:
> > Hi, I'm investigating filesystem performance on sequential read
> > patterns of large files, and I discovered an odd pattern of block
> > allocation and subsequent re-allocation after overwrite under reiser3:
> > 
> Exactly which kernel is this?  The block allocator in v3 has changed
> recently.

2.6.7 stock

> > This was done on a newly created filesystem with plenty of available
> > space and no other files.  I tried this test several times and saw the
> > number of extents for the file vary from 5,6,7 and 134 extents, but it
> > is always different after each over-write.
> > 
> You've hit a "feature" of the journal.  When you delete a file, the data
> blocks aren't available for reuse until the transaction that allocated
> them is committed to the log.  If you were to put a sync in between each
> run of dd, you should get roughly the same blocks allocated each time. 
> ext3 does the same things, although somewhat differently.  The
> asynchronous commit is probably just finishing a little sooner on ext3.
> 
> > First, I expect that an extent-based filesystem like reiserfs
> 
> reiser4 is extent based, reiser3 is not.


Ah, I didn't know that.  I'm still confused as to why we get such bad
fragmentation on the first allocation/create.  You can see that even
though the file is fragmented into 134 extents, the extents are very
close together: most of them are only 2 blocks apart.
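
(I worked that figure out from the Discontinuity lines in the filefrag -v
output above; a quick awk pass like the one below, which assumes that
exact output format, reproduces it:)

filefrag -v filedh | awk '
    /^Discontinuity/ {
        newpos = $6 ; oldpos = $8 ; gsub(/[()]/, "", oldpos)
        # gap between this extent and the previous one, in blocks
        print "gap of", newpos - oldpos - 1, "blocks at logical block", $3
    }'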

Sonny


* Re: Odd Block allocation behavior on Reiser3
  2004-08-09 22:04   ` Sonny Rao
@ 2004-08-10  7:16     ` Hans Reiser
  2004-08-10 15:45       ` Sonny Rao
  2004-08-10 12:53     ` Odd Block allocation behavior on Reiser3 Chris Mason
  1 sibling, 1 reply; 19+ messages in thread
From: Hans Reiser @ 2004-08-10  7:16 UTC (permalink / raw)
  To: Sonny Rao; +Cc: Chris Mason, reiserfs-list

Sonny Rao wrote:

>On Mon, Aug 09, 2004 at 04:30:51PM -0400, Chris Mason wrote:
>  
>
>>On Mon, 2004-08-09 at 16:19, Sonny Rao wrote:
>>    
>>
>>>Hi, I'm investigating filesystem performance on sequential read
>>>patterns of large files, and I discovered an odd pattern of block
>>>allocation and subsequent re-allocation after overwrite under reiser3:
>>>
>>>      
>>>
>>Exactly which kernel is this?  The block allocator in v3 has changed
>>recently.
>>    
>>
>
>2.6.7 stock
>
>  
>
>>>This was done on a newly created filesystem with plenty of available
>>>space and no other files.  I tried this test several times and saw the
>>>number of extents for the file vary from 5,6,7 and 134 extents, but it
>>>is always different after each over-write.
>>>
>>>      
>>>
>>You've hit a "feature" of the journal.  When you delete a file, the data
>>blocks aren't available for reuse until the transaction that allocated
>>them is committed to the log.  If you were to put a sync in between each
>>run of dd, you should get roughly the same blocks allocated each time. 
>>ext3 does the same things, although somewhat differently.  The
>>asynchronous commit is probably just finishing a little sooner on ext3.
>>
>>    
>>
>>>First, I expect that an extent-based filesystem like reiserfs
>>>      
>>>
>>reiser4 is extent based, reiser3 is not.
>>    
>>
>
>
>Ah, I didn't know that.  I'm still confused as to why on the first
>allocation/create we get such bad fragmentation, you can see that even
>though the file is fragmented into 134 blocks, the blocks are very
>close together.  Most of the extents are only 2 blocks apart.
>
>Sonny
>
>
>  
>
Interesting. What happens without overwrite, that is, if you write more
files without deleting the old ones?


* Re: Odd Block allocation behavior on Reiser3
  2004-08-09 22:04   ` Sonny Rao
  2004-08-10  7:16     ` Hans Reiser
@ 2004-08-10 12:53     ` Chris Mason
  2004-08-10 16:12       ` Sonny Rao
  1 sibling, 1 reply; 19+ messages in thread
From: Chris Mason @ 2004-08-10 12:53 UTC (permalink / raw)
  To: Sonny Rao; +Cc: reiserfs-list

On Mon, 2004-08-09 at 18:04, Sonny Rao wrote:
> On Mon, Aug 09, 2004 at 04:30:51PM -0400, Chris Mason wrote:
> > On Mon, 2004-08-09 at 16:19, Sonny Rao wrote:
> > > Hi, I'm investigating filesystem performance on sequential read
> > > patterns of large files, and I discovered an odd pattern of block
> > > allocation and subsequent re-allocation after overwrite under reiser3:
> > > 
> > Exactly which kernel is this?  The block allocator in v3 has changed
> > recently.
> 
> 2.6.7 stock
> 
Ok, the block allocator optimizations went in after 2.6.7.  I'd be
curious to see how 2.6.8-rc3 does in your tests.

> > > This was done on a newly created filesystem with plenty of available
> > > space and no other files.  I tried this test several times and saw the
> > > number of extents for the file vary from 5,6,7 and 134 extents, but it
> > > is always different after each over-write.
> > > 
> > You've hit a "feature" of the journal.  When you delete a file, the data
> > blocks aren't available for reuse until the transaction that allocated
> > them is committed to the log.  If you were to put a sync in between each
> > run of dd, you should get roughly the same blocks allocated each time. 
> > ext3 does the same things, although somewhat differently.  The
> > asynchronous commit is probably just finishing a little sooner on ext3.
> > 
> > > First, I expect that an extent-based filesystem like reiserfs
> > 
> > reiser4 is extent based, reiser3 is not.
> 
> 
> Ah, I didn't know that.  I'm still confused as to why on the first
> allocation/create we get such bad fragmentation, you can see that even
> though the file is fragmented into 134 blocks, the blocks are very
> close together.  Most of the extents are only 2 blocks apart.

This could be the metadata mixed in with the file data.  In general this
is a good thing: when you read the file sequentially, the metadata required
to find the next block will already be in the drive's cache.  Still, there
were a number of cases the old allocator didn't do as well with.

Whenever you're doing fragmentation tests, it helps to also identify the
actual effect of the fragmentation on the time it takes to read a file
or set of files.  It's easy to create a directory where all the files
are 99.99% contiguous, but that takes 3x as much time to read in.
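
For a quick check, something along these lines (cold cache, so unmount and
remount first; the device and mount point below are only placeholders) is
usually enough:

umount /mnt/tmp1
mount /dev/sdb1 /mnt/tmp1    # remount so the page cache for this fs is dropped
time dd if=/mnt/tmp1/filedh of=/dev/null bs=1M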

-chris



* Re: Odd Block allocation behavior on Reiser3
  2004-08-10  7:16     ` Hans Reiser
@ 2004-08-10 15:45       ` Sonny Rao
  2004-08-10 17:52         ` Hans Reiser
  0 siblings, 1 reply; 19+ messages in thread
From: Sonny Rao @ 2004-08-10 15:45 UTC (permalink / raw)
  To: Hans Reiser; +Cc: reiserfs-list

On Tue, Aug 10, 2004 at 12:16:39AM -0700, Hans Reiser wrote:
> >
> Interesting.What happens without overwrite, that is, if you write more 
> files without deleting the old ones?

Below I made 24 one-gigabyte files in sequence.
All of them are similarly fragmented:

x44016way1:/mnt/tmp0/data # ls -lh
total 25G
drwx------  2 root root  816 Aug 10 10:05 .
drwxr-xr-x  5 root root   96 Aug 10 10:00 ..
-rwx------  1 root root 1.0G Aug 10 10:00 datafile0
-rwx------  1 root root 1.0G Aug 10 10:00 datafile1
-rwx------  1 root root 1.0G Aug 10 10:02 datafile10
-rwx------  1 root root 1.0G Aug 10 10:02 datafile11
-rwx------  1 root root 1.0G Aug 10 10:03 datafile12
-rwx------  1 root root 1.0G Aug 10 10:03 datafile13
-rwx------  1 root root 1.0G Aug 10 10:03 datafile14
-rwx------  1 root root 1.0G Aug 10 10:03 datafile15
-rwx------  1 root root 1.0G Aug 10 10:03 datafile16
-rwx------  1 root root 1.0G Aug 10 10:04 datafile17
-rwx------  1 root root 1.0G Aug 10 10:04 datafile18
-rwx------  1 root root 1.0G Aug 10 10:04 datafile19
-rwx------  1 root root 1.0G Aug 10 10:01 datafile2
-rwx------  1 root root 1.0G Aug 10 10:04 datafile20
-rwx------  1 root root 1.0G Aug 10 10:04 datafile21
-rwx------  1 root root 1.0G Aug 10 10:05 datafile22
-rwx------  1 root root 1.0G Aug 10 10:05 datafile23
-rwx------  1 root root 1.0G Aug 10 10:01 datafile3
-rwx------  1 root root 1.0G Aug 10 10:01 datafile4
-rwx------  1 root root 1.0G Aug 10 10:01 datafile5
-rwx------  1 root root 1.0G Aug 10 10:01 datafile6
-rwx------  1 root root 1.0G Aug 10 10:02 datafile7
-rwx------  1 root root 1.0G Aug 10 10:02 datafile8
-rwx------  1 root root 1.0G Aug 10 10:02 datafile9

x44016way1:/mnt/tmp0/data # filefrag *
datafile0: 268 extents found
datafile1: 268 extents found
datafile10: 267 extents found
datafile11: 268 extents found
datafile12: 268 extents found
datafile13: 268 extents found
datafile14: 268 extents found
datafile15: 268 extents found
datafile16: 268 extents found
datafile17: 268 extents found
datafile18: 267 extents found
datafile19: 268 extents found
datafile2: 267 extents found
datafile20: 268 extents found
datafile21: 268 extents found
datafile22: 268 extents found
datafile23: 267 extents found
datafile3: 268 extents found
datafile4: 268 extents found
datafile5: 268 extents found
datafile6: 268 extents found
datafile7: 267 extents found
datafile8: 268 extents found
datafile9: 268 extents found



* Re: Odd Block allocation behavior on Reiser3
  2004-08-10 12:53     ` Odd Block allocation behavior on Reiser3 Chris Mason
@ 2004-08-10 16:12       ` Sonny Rao
  0 siblings, 0 replies; 19+ messages in thread
From: Sonny Rao @ 2004-08-10 16:12 UTC (permalink / raw)
  To: reiserfs-list

On Tue, Aug 10, 2004 at 08:53:31AM -0400, Chris Mason wrote:
<snip>
> Ok, the block allocator optimizations went in after 2.6.7.  I'd be
> curious to see how 2.6.8-rc3 does in your tests.

I'll try it.


> This could be the metadata mixed in with the file data.  In general this
> is a good thing: when you read the file sequentially, the metadata required
> to find the next block will already be in the drive's cache.  Still, there
> were a number of cases the old allocator didn't do as well with.
>
> Whenever you're doing fragmentation tests, it helps to also identify the
> actual effect of the fragmentation on the time it takes to read a file
> or set of files.  It's easy to create a directory where all the files
> are 99.99% contiguous, but that takes 3x as much time to read in.
>
> -chris

In this case, I began investigating fragmentation because I noticed
reiserfs was performing worse than the other filesystems I tested for
some reason, so I believe this allocation pattern is hurting performance
on sequential reads.

Sonny


* Re: Odd Block allocation behavior on Reiser3
  2004-08-10 15:45       ` Sonny Rao
@ 2004-08-10 17:52         ` Hans Reiser
  2004-08-10 18:25           ` Chris Mason
  2004-08-10 20:12           ` Why larger extent counts aren't necessarily bad (was Re: Odd Block allocation behavior on Reiser3) Jeff Mahoney
  0 siblings, 2 replies; 19+ messages in thread
From: Hans Reiser @ 2004-08-10 17:52 UTC (permalink / raw)
  To: Sonny Rao; +Cc: reiserfs-list

Sonny Rao wrote:

>On Tue, Aug 10, 2004 at 12:16:39AM -0700, Hans Reiser wrote:
>  
>
>>Interesting.What happens without overwrite, that is, if you write more 
>>files without deleting the old ones?
>>    
>>
>
>Below I made 24 one gigabyte files in sequence
>All of them are similarly fragmented: 
>
>x44016way1:/mnt/tmp0/data # ls -lh
>total 25G
>drwx------  2 root root  816 Aug 10 10:05 .
>drwxr-xr-x  5 root root   96 Aug 10 10:00 ..
>-rwx------  1 root root 1.0G Aug 10 10:00 datafile0
>-rwx------  1 root root 1.0G Aug 10 10:00 datafile1
>-rwx------  1 root root 1.0G Aug 10 10:02 datafile10
>-rwx------  1 root root 1.0G Aug 10 10:02 datafile11
>-rwx------  1 root root 1.0G Aug 10 10:03 datafile12
>-rwx------  1 root root 1.0G Aug 10 10:03 datafile13
>-rwx------  1 root root 1.0G Aug 10 10:03 datafile14
>-rwx------  1 root root 1.0G Aug 10 10:03 datafile15
>-rwx------  1 root root 1.0G Aug 10 10:03 datafile16
>-rwx------  1 root root 1.0G Aug 10 10:04 datafile17
>-rwx------  1 root root 1.0G Aug 10 10:04 datafile18
>-rwx------  1 root root 1.0G Aug 10 10:04 datafile19
>-rwx------  1 root root 1.0G Aug 10 10:01 datafile2
>-rwx------  1 root root 1.0G Aug 10 10:04 datafile20
>-rwx------  1 root root 1.0G Aug 10 10:04 datafile21
>-rwx------  1 root root 1.0G Aug 10 10:05 datafile22
>-rwx------  1 root root 1.0G Aug 10 10:05 datafile23
>-rwx------  1 root root 1.0G Aug 10 10:01 datafile3
>-rwx------  1 root root 1.0G Aug 10 10:01 datafile4
>-rwx------  1 root root 1.0G Aug 10 10:01 datafile5
>-rwx------  1 root root 1.0G Aug 10 10:01 datafile6
>-rwx------  1 root root 1.0G Aug 10 10:02 datafile7
>-rwx------  1 root root 1.0G Aug 10 10:02 datafile8
>-rwx------  1 root root 1.0G Aug 10 10:02 datafile9
>
>x44016way1:/mnt/tmp0/data # filefrag *
>datafile0: 268 extents found
>datafile1: 268 extents found
>datafile10: 267 extents found
>datafile11: 268 extents found
>datafile12: 268 extents found
>datafile13: 268 extents found
>datafile14: 268 extents found
>datafile15: 268 extents found
>datafile16: 268 extents found
>datafile17: 268 extents found
>datafile18: 267 extents found
>datafile19: 268 extents found
>datafile2: 267 extents found
>datafile20: 268 extents found
>datafile21: 268 extents found
>datafile22: 268 extents found
>datafile23: 267 extents found
>datafile3: 268 extents found
>datafile4: 268 extents found
>datafile5: 268 extents found
>datafile6: 268 extents found
>datafile7: 267 extents found
>datafile8: 268 extents found
>datafile9: 268 extents found
>
>
>
>  
>
This could explain some reiser3 performance problems.  This is what
happens when I spend all my time chasing funding and don't spend it
reviewing code and benchmarks, sigh.

Thanks for spotting this.  I would be curious whether this is occurring near
the transition between unformatted nodes and their parents, or something
else.


* Re: Odd Block allocation behavior on Reiser3
  2004-08-10 17:52         ` Hans Reiser
@ 2004-08-10 18:25           ` Chris Mason
  2004-08-10 18:50             ` Chris Mason
  2004-08-10 19:40             ` Hans Reiser
  2004-08-10 20:12           ` Why larger extent counts aren't necessarily bad (was Re: Odd Block allocation behavior on Reiser3) Jeff Mahoney
  1 sibling, 2 replies; 19+ messages in thread
From: Chris Mason @ 2004-08-10 18:25 UTC (permalink / raw)
  To: Hans Reiser; +Cc: Sonny Rao, reiserfs-list

On Tue, 2004-08-10 at 13:52, Hans Reiser wrote:
> Sonny Rao wrote:
> 
> >On Tue, Aug 10, 2004 at 12:16:39AM -0700, Hans Reiser wrote:
> >  
> >
> >>Interesting.What happens without overwrite, that is, if you write more 
> >>files without deleting the old ones?
> >>    
> >>
> >
> >Below I made 24 one gigabyte files in sequence
> >All of them are similarly fragmented: 
> >data # filefrag *
> >datafile0: 268 extents found
> >
> this could explain some reiser3 performance problems.   This is what 
> happens when I spend all my time chasing funding and don't spend it 
> reviewing code and benchmarks, sigh.
> 
> Thanks for spotting this.  I would be curious if this is occuring near 
> the transition between unformatted nodes and their parents, or something 
> else.

There have been a few threads on this on reiserfs-list

singer:/data # dd if=/dev/zero of=foo bs=1MB count=1000
1000+0 records in
1000+0 records out
singer:/data # filefrag foo
foo: 1 extent found

The new allocator really should be doing a better job here.

-chris




* Re: Odd Block allocation behavior on Reiser3
  2004-08-10 18:25           ` Chris Mason
@ 2004-08-10 18:50             ` Chris Mason
  2004-08-10 19:42               ` Hans Reiser
  2004-08-10 23:12               ` Sonny Rao
  2004-08-10 19:40             ` Hans Reiser
  1 sibling, 2 replies; 19+ messages in thread
From: Chris Mason @ 2004-08-10 18:50 UTC (permalink / raw)
  To: Hans Reiser; +Cc: Sonny Rao, reiserfs-list

On Tue, 2004-08-10 at 14:25, Chris Mason wrote:
> On Tue, 2004-08-10 at 13:52, Hans Reiser wrote:
> > Sonny Rao wrote:
> > 
> > >On Tue, Aug 10, 2004 at 12:16:39AM -0700, Hans Reiser wrote:
> > >  
> > >
> > >>Interesting.What happens without overwrite, that is, if you write more 
> > >>files without deleting the old ones?
> > >>    
> > >>
> > >
> > >Below I made 24 one gigabyte files in sequence
> > >All of them are similarly fragmented: 
> > >data # filefrag *
> > >datafile0: 268 extents found
> > >
> > this could explain some reiser3 performance problems.   This is what 
> > happens when I spend all my time chasing funding and don't spend it 
> > reviewing code and benchmarks, sigh.
> > 
> > Thanks for spotting this.  I would be curious if this is occuring near 
> > the transition between unformatted nodes and their parents, or something 
> > else.
> 
> There have been a few threads on this on reiserfs-list
> 
> singer:/data # dd if=/dev/zero of=foo bs=1MB count=1000
> 1000+0 records in
> 1000+0 records out
> singer:/data # filefrag foo
> foo: 1 extent found
> 
> The new allocator really should be doing a better job here.

Hmpf, that's what I get for expecting filefrag to work properly on
amd64.  The actual number of extents is 199, which is still better than
268.  Using fibmap, the fragmentation percentage is still the same as
ext3's (99.99% unfragmented), meaning the length between the extents is
quite small.

If you mount with:

mount -o alloc=skip_busy:oid_groups

You get 8 extents on a 1GB file.

This is because the oid grouping tries much harder to isolate the file
data from data from other files and metadata.  It is far from optimal
for normal usage, but for huge files it works nicely.

-chris




* Re: Odd Block allocation behavior on Reiser3
  2004-08-10 18:25           ` Chris Mason
  2004-08-10 18:50             ` Chris Mason
@ 2004-08-10 19:40             ` Hans Reiser
  2004-08-10 23:00               ` Sonny Rao
  1 sibling, 1 reply; 19+ messages in thread
From: Hans Reiser @ 2004-08-10 19:40 UTC (permalink / raw)
  To: Chris Mason; +Cc: Sonny Rao, reiserfs-list, E. Gryaznova

Chris Mason wrote:

>On Tue, 2004-08-10 at 13:52, Hans Reiser wrote:
>  
>
>>Sonny Rao wrote:
>>
>>    
>>
>>>On Tue, Aug 10, 2004 at 12:16:39AM -0700, Hans Reiser wrote:
>>> 
>>>
>>>      
>>>
>>>>Interesting.What happens without overwrite, that is, if you write more 
>>>>files without deleting the old ones?
>>>>   
>>>>
>>>>        
>>>>
>>>Below I made 24 one gigabyte files in sequence
>>>All of them are similarly fragmented: 
>>>data # filefrag *
>>>datafile0: 268 extents found
>>>
>>>      
>>>
>>this could explain some reiser3 performance problems.   This is what 
>>happens when I spend all my time chasing funding and don't spend it 
>>reviewing code and benchmarks, sigh.
>>
>>Thanks for spotting this.  I would be curious if this is occuring near 
>>the transition between unformatted nodes and their parents, or something 
>>else.
>>    
>>
>
>There have been a few threads on this on reiserfs-list
>
>singer:/data # dd if=/dev/zero of=foo bs=1MB count=1000
>1000+0 records in
>1000+0 records out
>singer:/data # filefrag foo
>foo: 1 extent found
>
>The new allocator really should be doing a better job here.
>
>-chris
>
>
>
>
>  
>
Well, this explains why we haven't cured the problem: we can't reproduce
it, yes?

We need to figure out what Sonny is doing, and maybe have Elena try on
yet another machine.


* Re: Odd Block allocation behavior on Reiser3
  2004-08-10 18:50             ` Chris Mason
@ 2004-08-10 19:42               ` Hans Reiser
  2004-08-10 20:29                 ` Chris Mason
  2004-08-10 23:12               ` Sonny Rao
  1 sibling, 1 reply; 19+ messages in thread
From: Hans Reiser @ 2004-08-10 19:42 UTC (permalink / raw)
  To: Chris Mason; +Cc: Sonny Rao, reiserfs-list

Chris Mason wrote:

>On Tue, 2004-08-10 at 14:25, Chris Mason wrote:
>  
>
>>On Tue, 2004-08-10 at 13:52, Hans Reiser wrote:
>>    
>>
>>>Sonny Rao wrote:
>>>
>>>      
>>>
>>>>On Tue, Aug 10, 2004 at 12:16:39AM -0700, Hans Reiser wrote:
>>>> 
>>>>
>>>>        
>>>>
>>>>>Interesting.What happens without overwrite, that is, if you write more 
>>>>>files without deleting the old ones?
>>>>>   
>>>>>
>>>>>          
>>>>>
>>>>Below I made 24 one gigabyte files in sequence
>>>>All of them are similarly fragmented: 
>>>>data # filefrag *
>>>>datafile0: 268 extents found
>>>>
>>>>        
>>>>
>>>this could explain some reiser3 performance problems.   This is what 
>>>happens when I spend all my time chasing funding and don't spend it 
>>>reviewing code and benchmarks, sigh.
>>>
>>>Thanks for spotting this.  I would be curious if this is occuring near 
>>>the transition between unformatted nodes and their parents, or something 
>>>else.
>>>      
>>>
>>There have been a few threads on this on reiserfs-list
>>
>>singer:/data # dd if=/dev/zero of=foo bs=1MB count=1000
>>1000+0 records in
>>1000+0 records out
>>singer:/data # filefrag foo
>>foo: 1 extent found
>>
>>The new allocator really should be doing a better job here.
>>    
>>
>
>Hmpf, that's what I get for expecting filefrag to work properly on
>amd64.  The actual number of extents is 199, which is still better then
>268.  Using fibmap, the fragmentation percentage is still the same as
>ext3 (99.99% unfragmented) meaning the length between the extents is
>quite small.
>
>If you mount with:
>
>mount -o alloc=skip_busy:oid_groups
>
>You get 8 extents on a 1GB file.
>
>This is because the oid grouping tries much harder to isolate the file
>data from data from other files and metadata.  It is far from optimal
>for normal usage, but for huge files it works nicely.
>
>-chris
>
>
>
>
>  
>
For an empty filesystem there should be no fragmentation at all for big 
dds one after another, except at bitmap block boundaries.  Any other 
result indicates flawed code.  Remember, a hint of flawed code often 
leads to more than a trivial flaw when fully understood.


* Why larger extent counts aren't necessarily bad (was Re: Odd Block allocation behavior on Reiser3)
  2004-08-10 17:52         ` Hans Reiser
  2004-08-10 18:25           ` Chris Mason
@ 2004-08-10 20:12           ` Jeff Mahoney
  2004-09-09 17:04             ` Hans Reiser
  1 sibling, 1 reply; 19+ messages in thread
From: Jeff Mahoney @ 2004-08-10 20:12 UTC (permalink / raw)
  To: Hans Reiser; +Cc: Sonny Rao, reiserfs-list

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hans Reiser wrote:
| Sonny Rao wrote:

|> Below I made 24 one gigabyte files in sequence
|> All of them are similarly fragmented:
| this could explain some reiser3 performance problems.   This is what
| happens when I spend all my time chasing funding and don't spend it
| reviewing code and benchmarks, sigh.
|
| Thanks for spotting this.  I would be curious if this is occuring near
| the transition between unformatted nodes and their parents, or something
| else.

Having a high extent count isn't indicative of fragmentation negatively
affecting performance. In fact, it may be just the opposite.

I modified filefrag.c to display the displacement between the extents
and the average extent length. My disk was only 9 GB, so I had to limit
my test to eight 1 GB files, but the results are the same - it's a
sequential write, so the number of files has no bearing on the result.

For this workload the patterns are so simple that it's distributed almost
perfectly. Even using the skip_busy algorithm by itself (a practice I
warned about over a year ago) produces acceptable results. The results
showed a median extent length of 1023 blocks (1 less than an indirect
pointer block holds), followed by a median extent displacement of
2 blocks. For all intents and purposes, the file is contiguous, with
metadata interspersed.

The pattern of a streaming read/write operation would be like so:
Locate file, locate first indirect pointer block, read blocks, find next
indirect pointer block, read blocks, ...

In the perceived "ideal" fragmentation pattern of 9 fragments
(1024MB/128MB - 4k per bitmap + 8*4k remainder), the metadata is not
interspersed with the file data. It makes the fragmentation extent
number look nice and low, but it really means that every time we need to
read another indirect pointer block, we're seeking outside the data
stream, reading a few blocks (readahead), and seeking back.

In the fragmentation pattern created using the new allocation algorithms
that Chris and I developed, you'll get a higher fragmentation number,
but the extents are close together, and the pointer blocks are already
read in due to readahead. The "actual" fragmentation is lower than the
"ideal" case above since less seeking is required.

- -Jeff

- --
Jeff Mahoney
SuSE Labs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFBGSwaLPWxlyuTD7IRAkwWAJ4uPTDcvSAgKpJm6KA4KMcSDb5iKwCcDjKN
cdMveSkc/zsVdGzsZvx5SsM=
=l45w
-----END PGP SIGNATURE-----


* Re: Odd Block allocation behavior on Reiser3
  2004-08-10 19:42               ` Hans Reiser
@ 2004-08-10 20:29                 ` Chris Mason
  2004-08-10 21:47                   ` Hans Reiser
  0 siblings, 1 reply; 19+ messages in thread
From: Chris Mason @ 2004-08-10 20:29 UTC (permalink / raw)
  To: Hans Reiser; +Cc: Sonny Rao, reiserfs-list

On Tue, 2004-08-10 at 15:42, Hans Reiser wrote:

> >Hmpf, that's what I get for expecting filefrag to work properly on
> >amd64.  The actual number of extents is 199, which is still better then
> >268.  Using fibmap, the fragmentation percentage is still the same as
> >ext3 (99.99% unfragmented) meaning the length between the extents is
> >quite small.
> >
> >If you mount with:
> >
> >mount -o alloc=skip_busy:oid_groups
> >
> >You get 8 extents on a 1GB file.
> >
> >This is because the oid grouping tries much harder to isolate the file
> >data from data from other files and metadata.  It is far from optimal
> >for normal usage, but for huge files it works nicely.

> For an empty filesystem there should be no fragmentation at all for big 
> dds one after another, except at bitmap block boundaries.  Any other 
> result indicates flawed code.  Remember, a hint of flawed code often 
> leads to more than a trivial flaw when fully understood.

Grin, you're welcome to overhaul the v3 allocator once again ;)  I made
a number of trade-offs under the restriction of the disk format and size
of the changes that were reasonable to make to the v3 code base.  For
every workload I tried, it scored as well as or better than the old
allocator in fragmentation and performance.

The only reason I passed on the 1 extent number was that I assumed
filefrag had some fuzziness built in.  I believe that fibmap is a better
measurement overall, since it calculates not only the number of extents
but the total distance between the extents.  I feel the latter is a
better indication of the performance you'll get when you try to read a
given file.

If you examine the actual layout achieved by the new allocator, most of
the fragments are 1023 or so blocks long (Jeff did some quick double
checks of this), which is what you can fit in a single leaf.  

In other words, the leaves are mixed in with the file data they point
to, which makes it quite likely they will be in cache (drive or OS) and
not need any seeks during the read.

-chris




* Re: Odd Block allocation behavior on Reiser3
  2004-08-10 20:29                 ` Chris Mason
@ 2004-08-10 21:47                   ` Hans Reiser
  0 siblings, 0 replies; 19+ messages in thread
From: Hans Reiser @ 2004-08-10 21:47 UTC (permalink / raw)
  To: Chris Mason; +Cc: Sonny Rao, reiserfs-list

Chris Mason wrote:

>On Tue, 2004-08-10 at 15:42, Hans Reiser wrote:
>
>  
>
>>>Hmpf, that's what I get for expecting filefrag to work properly on
>>>amd64.  The actual number of extents is 199, which is still better then
>>>268.  Using fibmap, the fragmentation percentage is still the same as
>>>ext3 (99.99% unfragmented) meaning the length between the extents is
>>>quite small.
>>>
>>>If you mount with:
>>>
>>>mount -o alloc=skip_busy:oid_groups
>>>
>>>You get 8 extents on a 1GB file.
>>>
>>>This is because the oid grouping tries much harder to isolate the file
>>>data from data from other files and metadata.  It is far from optimal
>>>for normal usage, but for huge files it works nicely.
>>>      
>>>
>
>  
>
>>For an empty filesystem there should be no fragmentation at all for big 
>>dds one after another, except at bitmap block boundaries.  Any other 
>>result indicates flawed code.  Remember, a hint of flawed code often 
>>leads to more than a trivial flaw when fully understood.
>>    
>>
>
>Grin, you're welcome to overhaul the v3 allocator once again ;)  I made
>a number of trade-offs under the restriction of the disk format and size
>of the changes that were reasonable to make to the v3 code base.  For
>every workload I tried, it scored as well or better then the old
>allocator in fragmentation and performance.
>
>The only reason I passed on the 1 extent number was that I assumed
>filefrag had some fuzziness built in.  I believe that fibmap is a better
>measurement overall, since it calculates not only the number of extents
>but the total distance between the extents.  I feel the latter is a
>better indication of the performance you'll get when you try to read a
>given file.
>
>If you examine the actual layout achieved by the new allocator, most of
>the fragments are 1023 or so blocks long (jeff did some quick double
>checks of this), which is what you can fit in a single leaf.  
>
>In other words, the leaves are mixed in with the file data they point
>to, which makes it quite likely they will be in cache (drive or OS) and
>not need any seeks during the read.
>
>-chris
>
>
>
>
>  
>
Well, you are right: we should focus on V4 block allocation instead;
your point is taken on that.

If Jeff would like to look into improving V4 allocation, please let me
know.

Hans


* Re: Odd Block allocation behavior on Reiser3
  2004-08-10 19:40             ` Hans Reiser
@ 2004-08-10 23:00               ` Sonny Rao
  0 siblings, 0 replies; 19+ messages in thread
From: Sonny Rao @ 2004-08-10 23:00 UTC (permalink / raw)
  To: Hans Reiser; +Cc: reiserfs-list

On Tue, Aug 10, 2004 at 12:40:19PM -0700, Hans Reiser wrote:
> Chris Mason wrote:
> 
> >On Tue, 2004-08-10 at 13:52, Hans Reiser wrote:
> > 
> >
> >>Sonny Rao wrote:
> >>
> >>   
> >>
> >>>On Tue, Aug 10, 2004 at 12:16:39AM -0700, Hans Reiser wrote:
> >>>
> >>>
> >>>     
> >>>
> >>>>Interesting.What happens without overwrite, that is, if you write more 
> >>>>files without deleting the old ones?
> >>>>  
> >>>>
> >>>>       
> >>>>
> >>>Below I made 24 one gigabyte files in sequence
> >>>All of them are similarly fragmented: 
> >>>data # filefrag *
> >>>datafile0: 268 extents found
> >>>
> >>>     
> >>>
> >>this could explain some reiser3 performance problems.   This is what 
> >>happens when I spend all my time chasing funding and don't spend it 
> >>reviewing code and benchmarks, sigh.
> >>
> >>Thanks for spotting this.  I would be curious if this is occuring near 
> >>the transition between unformatted nodes and their parents, or something 
> >>else.
> >>   
> >>
> >
> >There have been a few threads on this on reiserfs-list
> >
> >singer:/data # dd if=/dev/zero of=foo bs=1MB count=1000
> >1000+0 records in
> >1000+0 records out
> >singer:/data # filefrag foo
> >foo: 1 extent found
> >
> >The new allocator really should be doing a better job here.
> >
> >-chris
> >
> >
> >
> >
> > 
> >
> Well, this explains why we haven't cured the problem, we can't reproduce 
> it, yes?
> 
> We need to figure out what Sonny is doing, and maybe have elena try on 
> yet another machine.

As far as reproducing these results goes, I have been able to reproduce
them on another machine running an older 2.6.4 kernel, using a simple dd
loop to produce the files:

for i in $(seq 0 10) ; do dd if=/dev/zero of=file${i} bs=1k count=$[ 1024 * 1024 ] ; done

All 11 files had 268 extents.

Sonny


* Re: Odd Block allocation behavior on Reiser3
  2004-08-10 18:50             ` Chris Mason
  2004-08-10 19:42               ` Hans Reiser
@ 2004-08-10 23:12               ` Sonny Rao
  2004-08-11  1:31                 ` Jeff Mahoney
  1 sibling, 1 reply; 19+ messages in thread
From: Sonny Rao @ 2004-08-10 23:12 UTC (permalink / raw)
  To: Chris Mason; +Cc: reiserfs-list

On Tue, Aug 10, 2004 at 02:50:13PM -0400, Chris Mason wrote:
> On Tue, 2004-08-10 at 14:25, Chris Mason wrote:
> > On Tue, 2004-08-10 at 13:52, Hans Reiser wrote:
> > > Sonny Rao wrote:
> > > 
> > > >On Tue, Aug 10, 2004 at 12:16:39AM -0700, Hans Reiser wrote:
> > > >  
> > > >
> > > >>Interesting.What happens without overwrite, that is, if you write more 
> > > >>files without deleting the old ones?
> > > >>    
> > > >>
> > > >
> > > >Below I made 24 one gigabyte files in sequence
> > > >All of them are similarly fragmented: 
> > > >data # filefrag *
> > > >datafile0: 268 extents found
> > > >
> > > this could explain some reiser3 performance problems.   This is what 
> > > happens when I spend all my time chasing funding and don't spend it 
> > > reviewing code and benchmarks, sigh.
> > > 
> > > Thanks for spotting this.  I would be curious if this is occuring near 
> > > the transition between unformatted nodes and their parents, or something 
> > > else.
> > 
> > There have been a few threads on this on reiserfs-list
> > 
> > singer:/data # dd if=/dev/zero of=foo bs=1MB count=1000
> > 1000+0 records in
> > 1000+0 records out
> > singer:/data # filefrag foo
> > foo: 1 extent found
> > 
> > The new allocator really should be doing a better job here.
> 
> Hmpf, that's what I get for expecting filefrag to work properly on
> amd64.  The actual number of extents is 199, which is still better then
> 268.  Using fibmap, the fragmentation percentage is still the same as
> ext3 (99.99% unfragmented) meaning the length between the extents is
> quite small.
> 
> If you mount with:
> 
> mount -o alloc=skip_busy:oid_groups
> 
> You get 8 extents on a 1GB file.
> 
> This is because the oid grouping tries much harder to isolate the file
> data from data from other files and metadata.  It is far from optimal
> for normal usage, but for huge files it works nicely.
>

Is that "oid_groups" option present in the stock 2.6.7 kernel?  It
didn't like that mount option, and a cursory grep for "oid_groups" in
linux-2.6.7/fs/reiserfs/* doesn't show me anything.  The "skip_busy"
option worked but didn't seem to change allocation behavior.

Another piece of information that pointed me at possible fragmentation
came from iostat.  During my tests it shows an average request size of
about 745 sectors on reiserfs, while on ext3 I see an average request
size of about 900 sectors.  The I/O size limit in the kernel was 512k,
so the maximum should have been about 1024 sectors.  On JFS and XFS the
average request size was very close to 1024 sectors, along with the best
performance.
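
(I was watching something like this in another terminal while the read
test ran; the column of interest was avgrq-sz, though the exact name may
differ between sysstat versions:)

iostat -x 5    # watch the average request size for the data disk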

So, even though you are probably correct about the 2 block gaps being
metadata, the iostat data could be interpreted to indicate this
behavior is hurting performance relative to the other filesystems on
this particular test.

I'm planning on trying out 2.6.8-rc4 soon.

Sonny 



* Re: Odd Block allocation behavior on Reiser3
  2004-08-10 23:12               ` Sonny Rao
@ 2004-08-11  1:31                 ` Jeff Mahoney
  0 siblings, 0 replies; 19+ messages in thread
From: Jeff Mahoney @ 2004-08-11  1:31 UTC (permalink / raw)
  To: Sonny Rao; +Cc: Chris Mason, reiserfs-list

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Sonny Rao wrote:
| On Tue, Aug 10, 2004 at 02:50:13PM -0400, Chris Mason wrote:
|
|>On Tue, 2004-08-10 at 14:25, Chris Mason wrote:
|>
|>>On Tue, 2004-08-10 at 13:52, Hans Reiser wrote:
|>>
|>>>Sonny Rao wrote:
|>>>
|>>>
|>>>>On Tue, Aug 10, 2004 at 12:16:39AM -0700, Hans Reiser wrote:
|>>>>
|>>>>
|>>>>
|>>>>>Interesting.What happens without overwrite, that is, if you write
more
|>>>>>files without deleting the old ones?
|>>>>>
|>>>>>
|>>>>
|>>>>Below I made 24 one gigabyte files in sequence
|>>>>All of them are similarly fragmented:
|>>>>data # filefrag *
|>>>>datafile0: 268 extents found
|>>>>
|>>>
|>>>this could explain some reiser3 performance problems.   This is what
|>>>happens when I spend all my time chasing funding and don't spend it
|>>>reviewing code and benchmarks, sigh.
|>>>
|>>>Thanks for spotting this.  I would be curious if this is occuring near
|>>>the transition between unformatted nodes and their parents, or
something
|>>>else.
|>>
|>>There have been a few threads on this on reiserfs-list
|>>
|>>singer:/data # dd if=/dev/zero of=foo bs=1MB count=1000
|>>1000+0 records in
|>>1000+0 records out
|>>singer:/data # filefrag foo
|>>foo: 1 extent found
|>>
|>>The new allocator really should be doing a better job here.
|>
|>Hmpf, that's what I get for expecting filefrag to work properly on
|>amd64.  The actual number of extents is 199, which is still better then
|>268.  Using fibmap, the fragmentation percentage is still the same as
|>ext3 (99.99% unfragmented) meaning the length between the extents is
|>quite small.
|>
|>If you mount with:
|>
|>mount -o alloc=skip_busy:oid_groups
|>
|>You get 8 extents on a 1GB file.
|>
|>This is because the oid grouping tries much harder to isolate the file
|>data from data from other files and metadata.  It is far from optimal
|>for normal usage, but for huge files it works nicely.
|>
|
|
| Is that "oid_groups" option present in the stock 2.6.7 kernel?  It
| didn't like that mount option, and a cursory grep for "oid_groups" in
| linux-2.6.7/fs/reiserfs/* doesn't show me anything.  The "skip_busy"
| option worked but didn't seem to change allocation behavior.

The oid_groups option is present in 2.6.8-pre releases, but only before
that from Chris Mason's FTP archive.

As far as skip_busy goes, there are two reasons for that: 1) the skip_busy
behavior was already the default, and 2) skip_busy was never meant to be a
full-scale allocation algorithm, just a way of keeping a pool of available
blocks close to already allocated files. When there's no hinting
involved, it ends up being poor for allocation - it will select the
first block available on disk when the file doesn't already exist. I
warned about this when I first submitted the code.

- -Jeff

- --
Jeff Mahoney
SuSE Labs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFBGXb4LPWxlyuTD7IRAlFrAJ4lvW20RLzhFhLq4V0OMTDVFGsWwACfahaG
Jbb5Cq74+nI5Se90SdWrmsQ=
=hfS5
-----END PGP SIGNATURE-----


* Re: Why larger extent counts aren't necessarily bad (was Re: Odd Block allocation behavior on Reiser3)
  2004-08-10 20:12           ` Why larger extent counts aren't necessarily bad (was Re: Odd Block allocation behavior on Reiser3) Jeff Mahoney
@ 2004-09-09 17:04             ` Hans Reiser
  0 siblings, 0 replies; 19+ messages in thread
From: Hans Reiser @ 2004-09-09 17:04 UTC (permalink / raw)
  To: Jeff Mahoney; +Cc: Sonny Rao, reiserfs-list

Jeff Mahoney wrote:

> Hans Reiser wrote:
> | Sonny Rao wrote:
>
> |> Below I made 24 one gigabyte files in sequence
> |> All of them are similarly fragmented:
> | this could explain some reiser3 performance problems. This is what
> | happens when I spend all my time chasing funding and don't spend it
> | reviewing code and benchmarks, sigh.
> |
> | Thanks for spotting this. I would be curious if this is occuring near
> | the transition between unformatted nodes and their parents, or something
> | else.
>
> Having a high extent count isn't indicative of fragmentation negatively
> affecting performance. In fact, it may be just the opposite.
>
> I modified filefrag.c to display the displacement between the extents,
> and the average extent length. My disk was only 9 GB, so I had to limit
> my test to 8 1 GB files, but the results are the same - it's a
> sequential write. The number of files has no bearing on the result.
>
> For this workload, the patterns are so simple, it's distributed almost
> perfectly. Even using the skip_busy algorithm by itself (a practice I
> warned about over a year ago) produces acceptible results. The results
> showed an median extent length of 1023 extents (1 less than contained in
> an indirect pointer block), followed by a median extent displacement of
> 2 blocks

So the filefrag program considers indirect pointer blocks not to be part
of the file? OK, then that is an error in filefrag. Why 2 blocks and not 1?

> . For all intents and purposes, the file is contiguous, with
> metadata interspersed.
>
> The pattern of a streaming read/write operation would be like so:
> Locate file, locate first indirect pointer block, read blocks, find next
> indirect pointer block, read blocks, ...
>
> In the perceived "ideal" fragementation pattern of 9 fragments
> (1024MB/128MB - 4k per bitmap + 8*4k remainder), the metadata is not
> interspersed with the file data. It makes the fragmentation extent
> number look nice and low, but it really means that every time we need to
> read another indirect pointer block, we're seeking outside the data
> stream, reading a few blocks (readahead), and seeking back.
>
> In the fragmentation pattern created using the new allocation algorithms
> that Chris and I developed, you'll get a higher fragmentation number,
> but the extents are close together, and the pointer blocks are already
> read in due to readahead. The "actual" fragmentation is lower than the
> "ideal" case above since less seeking is required.
>
> -Jeff
>
> --
> Jeff Mahoney
> SuSE Labs



Thread overview: 19+ messages
2004-08-09 20:19 Odd Block allocation behavior on Reiser3 Sonny Rao
2004-08-09 20:30 ` Chris Mason
2004-08-09 22:04   ` Sonny Rao
2004-08-10  7:16     ` Hans Reiser
2004-08-10 15:45       ` Sonny Rao
2004-08-10 17:52         ` Hans Reiser
2004-08-10 18:25           ` Chris Mason
2004-08-10 18:50             ` Chris Mason
2004-08-10 19:42               ` Hans Reiser
2004-08-10 20:29                 ` Chris Mason
2004-08-10 21:47                   ` Hans Reiser
2004-08-10 23:12               ` Sonny Rao
2004-08-11  1:31                 ` Jeff Mahoney
2004-08-10 19:40             ` Hans Reiser
2004-08-10 23:00               ` Sonny Rao
2004-08-10 20:12           ` Why larger extent counts aren't necessarily bad (was Re: Odd Block allocation behavior on Reiser3) Jeff Mahoney
2004-09-09 17:04             ` Hans Reiser
2004-08-10 12:53     ` Odd Block allocation behavior on Reiser3 Chris Mason
2004-08-10 16:12       ` Sonny Rao
