* Alignment size?
@ 2010-08-12 22:10 Michael Tokarev
  2010-08-12 23:49 ` Dave Chinner
  0 siblings, 1 reply; 11+ messages in thread
From: Michael Tokarev @ 2010-08-12 22:10 UTC
  To: xfs

Hello.

I have used XFS for a long time on many different
servers, and it works well.  But now I have encountered
an... unexpected problem.

The question is: on one of our servers, XFS requires
a different alignment size for O_DIRECT operations than
on others.  Usually it's 512 bytes, but on this server
it is 4096 - both min_io and alignment (this is from
the XFS_IOC_DIOINFO ioctl).

I'm not sure what the reason for this is.
On this server, the underlying block device is raid5
(linux sw raid), but we had other machines with raid5
which didn't have those alignment requirements.

The problem with that is that Oracle db, which we use
with XFS a lot, refuses to work on this machine, or,
rather, XFS refuses to process I/O in 512-byte chunks
from Oracle (control files and redo log files).

I know it is a frequent combination which is used in
production in many places, and is used here a lot too,
but I haven't seen anyone mention the issue we
have now, with "larger than usual" alignment size
requirements.

Is there a way to remedy this somehow, without
reformatting the whole 600+ GB?

Thank you!

/mjt

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: Alignment size?
  2010-08-12 22:10 Alignment size? Michael Tokarev
@ 2010-08-12 23:49 ` Dave Chinner
  2010-08-13  6:24   ` Michael Tokarev
  0 siblings, 1 reply; 11+ messages in thread
From: Dave Chinner @ 2010-08-12 23:49 UTC
  To: Michael Tokarev; +Cc: xfs

On Fri, Aug 13, 2010 at 02:10:39AM +0400, Michael Tokarev wrote:
> Hello.
> 
> I have used XFS for a long time on many different
> servers, and it works well.  But now I have encountered
> an... unexpected problem.
> 
> The question is: on one of our servers, XFS requires
> a different alignment size for O_DIRECT operations than
> on others.  Usually it's 512 bytes, but on this server
> it is 4096 - both min_io and alignment (this is from
> the XFS_IOC_DIOINFO ioctl).

It'll be a filesystem set up with a 4k sector size, then.  Check the
output of xfs_info.

> I'm not sure what the reason for this is.
> On this server, the underlying block device is raid5
> (linux sw raid), but we had other machines with raid5
> which didn't have those alignment requirements.
> 
> The problem with that is that Oracle db, which we use
> with XFS a lot, refuses to work on this machine, or,
> rather, XFS refuses to process I/O in 512-byte chunks
> from Oracle (control files and redo log files).

A clear case of application failure. I guess Oracle have some work
to do to support 4k sector drives where they won't be able to do 512
byte direct IOs at all....

> Is there a way to remedy this somehow, without
> reformatting the whole 600+ GB?

Not really. If it is 4k sector size, then there is some extremely
dangerous voodoo that you could do to realign and resize the AG
headers, followed by a full xfs_repair run to fix up all the block
accounting. This is not something I'd recommend anyone ever does,
and for only 600GB of data it would probably take more time to work
out how to do it correctly (using disposable filesystem images) than
it would to dump, mkfs and restore...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Alignment size?
  2010-08-12 23:49 ` Dave Chinner
@ 2010-08-13  6:24   ` Michael Tokarev
  2010-08-13 10:27     ` Stan Hoeppner
                       ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Michael Tokarev @ 2010-08-13  6:24 UTC
  To: Dave Chinner; +Cc: xfs

13.08.2010 03:49, Dave Chinner wrote:
> On Fri, Aug 13, 2010 at 02:10:39AM +0400, Michael Tokarev wrote:
>> Hello.
>>
>> I have used XFS for a long time on many different
>> servers, and it works well.  But now I have encountered
>> an... unexpected problem.
>>
>> The question is: on one of our servers, XFS requires
>> a different alignment size for O_DIRECT operations than
>> on others.  Usually it's 512 bytes, but on this server
>> it is 4096 - both min_io and alignment (this is from
>> the XFS_IOC_DIOINFO ioctl).
> 
> It'll be a filesystem set up with a 4k sector size, then.  Check the
> output of xfs_info.

Yes, xfs_info reports sectsz=4096; I noticed this yesterday.

>> I'm not sure what the reason for this is.
>> On this server, the underlying block device is raid5
>> (linux sw raid), but we had other machines with raid5
>> which didn't have those alignment requirements.
>>
>> The problem with that is that Oracle db, which we use
>> with XFS a lot, refuses to work on this machine, or,
>> rather, XFS refuses to process I/O in 512-byte chunks
>> from Oracle (control files and redo log files).
> 
> A clear case of application failure. I guess Oracle have some work
> to do to support 4k sector drives where they won't be able to do 512
> byte direct IOs at all....

Sure thing; that's Oracle 10, and at least at that time
there was no generic way to determine the required I/O
size.  Now there is, and I hope Oracle 12 will support
different sector sizes.

But this is not the point...
>> Is there a way to remedy this somehow, without
>> reformatting the whole 600+ GB?
> 
> Not really. If it is 4k sector size, then there is some extremely
> dangerous voodoo that you could do to realign and resize the AG
> headers, followed by a full xfs_repair run to fix up all the block
> accounting. This is not something I'd recommend anyone ever does,
> and for only 600GB of data it would probably take more time to work
> out how to do it correctly (using disposable filesystem images) than
> it would to dump, mkfs and restore...

Ugh.  I see.  Well, I was afraid of that, but I'm already
sorta-prepared for that, after "sleeping with this idea"... ;)
It'll take ages for sure, but there's no other choice for
now.

So the question that remains is: why?

It's an old machine (PIV era), with old SCSI disks (74GB,
non-hotswap) - the same disks as used on numerous other
machines out there, where there's no such issue.

Plain old Linux software RAID array, the same as used on many
other systems.

At that time, everything used 512-byte sectors for sure.

The array and filesystem were re-created last year (we
added another drive to it), but I don't think at that
time there was a kernel version that supported >512
sector sizes either (it was 2.6.27 I think).

So why did xfs decide the block size is 4K??

And a related question: is there a way to create an
xfs fs with the right sector size?  The filesystem
has been OK for years, not only on this machine, and I'm
quite afraid to replace it with something else (e.g.
ext4) in a hurry without good prior testing.

By the way, how can one check the "sector size" of a
block device nowadays?  I think I saw something about
sysfs, but I see nothing of that sort in the 2.6.32 kernel
(which is used on this and other systems).

Thanks!

/mjt


* Re: Alignment size?
  2010-08-13  6:24   ` Michael Tokarev
@ 2010-08-13 10:27     ` Stan Hoeppner
  2010-08-13 11:00       ` Michael Tokarev
  2010-08-13 11:36     ` Roger Willcocks
  2010-08-13 11:39     ` Dave Chinner
  2 siblings, 1 reply; 11+ messages in thread
From: Stan Hoeppner @ 2010-08-13 10:27 UTC
  To: xfs

Michael Tokarev put forth on 8/13/2010 1:24 AM:

> So the question that remains is: why?

4096 is the default block size and has been since at least 2.6.26 when I
started using XFS.  From "man mkfs.xfs":

OPTIONS
-b block_size_options
This option specifies the fundamental block size of the filesystem.  The
valid block_size_options are: log=value or size=value and only one can be
supplied.  The block size is specified either as a base two logarithm value
with log=, or in bytes with size=.  The default value is 4096 bytes (4 KiB),
the minimum is 512, and the maximum is 65536 (64 KiB).  XFS on Linux currently
only supports pagesize or smaller blocks.

> So why did xfs decide the block size is 4K??

See above.  It's the default.  Dave, Eric, Alex and others may be able to
explain why 4096 was chosen as the default.  I'm guessing it has to do with
the best all-around performance across a wide variety of storage systems.

> And a related question: is there a way to create an
> xfs fs with the right sector size?

Yes.

-s sector_size
This option specifies the fundamental sector size of the filesystem.  The
sector_size is specified either as a value in bytes with size=value or as a
base two logarithm value with log=value.  The default sector_size is 512
bytes. The minimum value for sector size is 512; the maximum is 32768 (32
KiB). The sector_size must be a power of 2 size and cannot be made larger than
the filesystem block size.

Note that the default is 512.  This would lead me to believe that whoever
created this 600GB XFS filesystem manually specified "-s 4096" on the command
line when creating it.

> The filesystem
> has been OK for years, not only on this machine, and I'm
> quite afraid to replace it with something else (e.g.
> ext4) in a hurry without good prior testing.
> 
> By the way, how can one check the "sector size" of a
> block device nowadays?

cat /sys/block/[device]/queue/hw_sector_size

That will give you the hardware sector size.  As mentioned above, the XFS
sector size can be manually specified during FS creation.  Thus they may not
match, which is likely the case with the 600GB FS you're having the problems with.

-- 
Stan


* Re: Alignment size?
  2010-08-13 10:27     ` Stan Hoeppner
@ 2010-08-13 11:00       ` Michael Tokarev
  0 siblings, 0 replies; 11+ messages in thread
From: Michael Tokarev @ 2010-08-13 11:00 UTC
  To: Stan Hoeppner; +Cc: xfs

13.08.2010 14:27, Stan Hoeppner wrote:
> Michael Tokarev put forth on 8/13/2010 1:24 AM:
> 
>> So the question that remains is: why?
> 
> 4096 is the default block size and has been since at least 2.6.26 when I
> started using XFS.  From "man mkfs.xfs":
> 
> OPTIONS
> -b block_size_options

This is block size.  But XFS_IOC_DIOINFO returns
_sector_ size.  All other XFS filesystems we have
are made with the same 4096 _block_ size.

> [..]  The default value is 4096 bytes (4 KiB)
> 
>> So why did xfs decide the block size is 4K??

That was the wrong question.  The right one is
about _sector_ size, not _block_ size.
The filesystem in question has _sector_ size = 4096;
all the others have 512.

>> And a related question: is there a way to create an
>> xfs fs with the right sector size?
> 
> Yes.
> 
> -s sector_size
> This option specifies the fundamental sector size of the filesystem.  The
> sector_size is specified either as a value in bytes with size=value or as a
> base two logarithm value with log=value.  The default sector_size is 512

Yeah, the default is 512; my manpage agrees.  And yet
I have a filesystem that has 4096...

But maybe it was specified during filesystem creation.
I re-read the mkfs.xfs manpage yesterday, but somehow
missed the sector size option (!), which you quoted
above.  Maybe we used it a year ago when creating the
filesystem, for a yet-to-be-determined reason... ;)

I just tried to create an xfs filesystem on this
machine (on a small reserved partition) - it uses
sector size = 512 as expected.

>> By the way, how can one check the "sector size" of a
>> block device nowadays?
> 
> cat /sys/block/[device]/queue/hw_sector_size

And it shows 512 even for the md array in question.

> That will give you the hardware sector size.  As mentioned above, the XFS
> sector size can be manually specified during FS creation.  Thus they may not
> match, which is likely the case with the 600GB FS you're having the problems with.

Yup!

Thank you all for the information, and please excuse
me for the noise - just too much stuff at once... ;)

/mjt


* Re: Alignment size?
  2010-08-13  6:24   ` Michael Tokarev
  2010-08-13 10:27     ` Stan Hoeppner
@ 2010-08-13 11:36     ` Roger Willcocks
  2010-08-13 11:39     ` Dave Chinner
  2 siblings, 0 replies; 11+ messages in thread
From: Roger Willcocks @ 2010-08-13 11:36 UTC
  To: Michael Tokarev; +Cc: xfs


On Fri, 2010-08-13 at 10:24 +0400, Michael Tokarev wrote:

> So why did xfs decide the block size is 4K??

We had a similar problem with direct io here, with 2.6.9; I quote from
the Bugzilla:

"mkfs.xfs has md-specific code (!) that looks at the raid flavour to
figure stripe parameters, alignment requirements, etc.

"Raid flavours 4,5 and 6 force the alignment to be the same as the file system
block size (which is 4096 bytes)."

Here's a program to test the alignment requirements:

----

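#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <xfs/xfs.h>	/* struct dioattr, XFS_IOC_DIOINFO (xfsprogs headers) */

int main(int argc, char *argv[])
{
	struct dioattr	dio;
	/* Any file or directory on the XFS filesystem will do. */
	const char	*path = (argc == 2) ? argv[1] : "/mnt/disk1";
	int		tfd = open(path, O_RDONLY);

	if (tfd < 0) {
		perror(path);
		return 1;
	}
	if (ioctl(tfd, XFS_IOC_DIOINFO, &dio) < 0) {
		perror("ioctl(XFS_IOC_DIOINFO)");
		close(tfd);
		return 1;
	}
	/* All three fields are unsigned 32-bit values. */
	printf("min io size = %u\n", dio.d_miniosz);
	printf("max io size = %u\n", dio.d_maxiosz);
	printf("align = %u\n", dio.d_mem);
	close(tfd);
	return 0;
}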

----

The same disk set returned 'align = 4096' for raid 5, but 'align = 512' for raid 0.

--
Roger




* Re: Alignment size?
  2010-08-13  6:24   ` Michael Tokarev
  2010-08-13 10:27     ` Stan Hoeppner
  2010-08-13 11:36     ` Roger Willcocks
@ 2010-08-13 11:39     ` Dave Chinner
  2010-08-13 15:15       ` Christoph Hellwig
  2010-08-17  0:18       ` Michael Tokarev
  2 siblings, 2 replies; 11+ messages in thread
From: Dave Chinner @ 2010-08-13 11:39 UTC
  To: Michael Tokarev; +Cc: xfs

On Fri, Aug 13, 2010 at 10:24:46AM +0400, Michael Tokarev wrote:
> 13.08.2010 03:49, Dave Chinner wrote:
> > On Fri, Aug 13, 2010 at 02:10:39AM +0400, Michael Tokarev wrote:
> >> Hello.
> >>
> >> I have used XFS for a long time on many different
> >> servers, and it works well.  But now I have encountered
> >> an... unexpected problem.
> >>
> >> The question is: on one of our servers, XFS requires
> >> a different alignment size for O_DIRECT operations than
> >> on others.  Usually it's 512 bytes, but on this server
> >> it is 4096 - both min_io and alignment (this is from
> >> the XFS_IOC_DIOINFO ioctl).
> > 
> > It'll be a filesystem set up with a 4k sector size, then.  Check the
> > output of xfs_info.
> 
> Yes, xfs_info reports sectsz=4096; I noticed this yesterday.

....

> So the question that remains is: why?
> 
> It's an old machine (PIV era), with old SCSI disks (74GB,
> non-hotswap) - the same disks as used on numerous other
> machines out there, where there's no such issue.

If the software was as old as the machine, then that's the likely
reason.  The old md raid5 implementation did not handle sub-page
size aligned IO very well - a change of IO alignment would cause the
stripe cache to be purged and cause performance to be terrible.
Hence every time XFS wrote the superblock or an AG header it would
purge the stripe cache.

The workaround old versions of mkfs.xfs used was to create the fs
with a sector size of 4k when it detected md raid5 underneath it so
the sb and ag headers were all 4k aligned and sized, just like the
rest of the filesystem....

> And a related question: is there a way to create an
> xfs fs with the right sector size?  The filesystem
> has been OK for years, not only on this machine, and I'm
> quite afraid to replace it with something else (e.g.
> ext4) in a hurry without good prior testing.

# mkfs.xfs -s <size> ....

if you want to set it manually. You shouldn't need to with any
relatively recent mkfs.xfs...

> By the way, how can one check the "sector size" of a
> block device nowadays?  I think I saw something about
> sysfs, but I see nothing of that sort in the 2.6.32 kernel
> (which is used on this and other systems).

/sys/block/<dev>/queue/hw_sector_size

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Alignment size?
  2010-08-13 11:39     ` Dave Chinner
@ 2010-08-13 15:15       ` Christoph Hellwig
  2010-08-17  0:18       ` Michael Tokarev
  1 sibling, 0 replies; 11+ messages in thread
From: Christoph Hellwig @ 2010-08-13 15:15 UTC
  To: Dave Chinner; +Cc: Michael Tokarev, xfs

On Fri, Aug 13, 2010 at 09:39:15PM +1000, Dave Chinner wrote:
> The workaround old versions of mkfs.xfs used was to create the fs
> with a sector size of 4k when it detected md raid5 underneath it so
> the sb and ag headers were all 4k aligned and sized, just like the
> rest of the filesystem....

That workaround is still in the latest mkfs.xfs if you build against
the internal libdisk instead of libblkid.  And that's the case at
least for Debian, and probably a few other distributions as well.


* Re: Alignment size?
  2010-08-13 11:39     ` Dave Chinner
  2010-08-13 15:15       ` Christoph Hellwig
@ 2010-08-17  0:18       ` Michael Tokarev
  2010-08-17  0:30         ` Michael Tokarev
  2010-08-17  0:31         ` Dave Chinner
  1 sibling, 2 replies; 11+ messages in thread
From: Michael Tokarev @ 2010-08-17  0:18 UTC
  To: Dave Chinner; +Cc: xfs

13.08.2010 15:39, Dave Chinner wrote:
> On Fri, Aug 13, 2010 at 10:24:46AM +0400, Michael Tokarev wrote:
[]
>> And a related question: is there a way to create an
>> xfs fs with the right sector size?  The filesystem
>> has been OK for years, not only on this machine, and I'm
>> quite afraid to replace it with something else (e.g.
>> ext4) in a hurry without good prior testing.
> 
> # mkfs.xfs -s <size> ....
> 
> if you want to set it manually. You shouldn't need to with any
> relatively recent mkfs.xfs...

Um.  It appears that mkfs.xfs ignores -s size=512 on this
raid5 array, and silently creates a filesystem with 4096
sector size, regardless of various -s size=nn and -s log=mm
options.

This is xfsprogs  3.1.2-1 (debian squeeze package).

So the question stands...

Thanks!

/mjt


* Re: Alignment size?
  2010-08-17  0:18       ` Michael Tokarev
@ 2010-08-17  0:30         ` Michael Tokarev
  2010-08-17  0:31         ` Dave Chinner
  1 sibling, 0 replies; 11+ messages in thread
From: Michael Tokarev @ 2010-08-17  0:30 UTC
  To: Dave Chinner; +Cc: xfs

17.08.2010 04:18, Michael Tokarev wrote:
>> # mkfs.xfs -s <size> ....
>>
>> if you want to set it manually. You shouldn't need to with any
>> relatively recent mkfs.xfs...
> 
> Um.  It appears that mkfs.xfs ignores -s size=512 on this
> raid5 array, and silently creates a filesystem with 4096
> sector size, regardless of various -s size=nn and -s log=mm
> options.
> 
> This is xfsprogs  3.1.2-1 (debian squeeze package).
> 
> So the question stands...

Debian builds it with the internal libdisk instead of
libblkid.  Rebuilding it against libblkid fixes that.

Thanks!

/mjt


* Re: Alignment size?
  2010-08-17  0:18       ` Michael Tokarev
  2010-08-17  0:30         ` Michael Tokarev
@ 2010-08-17  0:31         ` Dave Chinner
  1 sibling, 0 replies; 11+ messages in thread
From: Dave Chinner @ 2010-08-17  0:31 UTC
  To: Michael Tokarev; +Cc: xfs

On Tue, Aug 17, 2010 at 04:18:28AM +0400, Michael Tokarev wrote:
> 13.08.2010 15:39, Dave Chinner wrote:
> > On Fri, Aug 13, 2010 at 10:24:46AM +0400, Michael Tokarev wrote:
> []
> >> And a related question: is there a way to create an
> >> xfs fs with the right sector size?  The filesystem
> >> has been OK for years, not only on this machine, and I'm
> >> quite afraid to replace it with something else (e.g.
> >> ext4) in a hurry without good prior testing.
> > 
> > # mkfs.xfs -s <size> ....
> > 
> > if you want to set it manually. You shouldn't need to with any
> > relatively recent mkfs.xfs...
> 
> Um.  It appears that mkfs.xfs ignores -s size=512 on this
> raid5 array, and silently creates a filesystem with 4096
> sector size, regardless of various -s size=nn and -s log=mm
> options.
> 
> This is xfsprogs  3.1.2-1 (debian squeeze package).
> 
> So the question stands...

IIRC, the current Debian xfsprogs package is still being built with
the old detection library. If you install libblkid and build
xfsprogs yourself, it should use the newer detection code and
behave as expected.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


Thread overview: 11+ messages
2010-08-12 22:10 Alignment size? Michael Tokarev
2010-08-12 23:49 ` Dave Chinner
2010-08-13  6:24   ` Michael Tokarev
2010-08-13 10:27     ` Stan Hoeppner
2010-08-13 11:00       ` Michael Tokarev
2010-08-13 11:36     ` Roger Willcocks
2010-08-13 11:39     ` Dave Chinner
2010-08-13 15:15       ` Christoph Hellwig
2010-08-17  0:18       ` Michael Tokarev
2010-08-17  0:30         ` Michael Tokarev
2010-08-17  0:31         ` Dave Chinner
