linux-kernel.vger.kernel.org archive mirror
* [PATCH] remove 2TB block device limit
@ 2002-05-10  3:36 Peter Chubb
  2002-05-10  4:05 ` Andrew Morton
                   ` (2 more replies)
  0 siblings, 3 replies; 41+ messages in thread
From: Peter Chubb @ 2002-05-10  3:36 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm, martin, neilb


Hi,
	At present, Linux is limited to 2TB filesystems even on 64-bit
systems, because there are various places where the block offset on
disc is assigned to 32-bit unsigned or int variables.

There's a type, sector_t, that's meant to hold offsets in sectors and
blocks.  It's not used consistently (yet).
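
To make the limit concrete: with 512-byte sectors, a 32-bit unsigned
sector number can address at most 2^32 * 512 bytes = 2 TiB, and any
larger offset loses its high bits when it passes through such a
variable.  The following is a user-space illustration only, not code
from the patch; SECTOR_SIZE, sector32_t, sector64_t and both function
names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512ULL	/* assumed sector size in bytes */

typedef uint32_t sector32_t;	/* stand-in for a 32-bit offset variable */
typedef uint64_t sector64_t;	/* stand-in for a 64-bit sector_t */

/* Bytes addressable when sector numbers are `bits` bits wide. */
static uint64_t addressable_bytes(unsigned bits)
{
	assert(bits > 0 && bits < 55);	/* keep the product within 64 bits */
	return (1ULL << bits) * SECTOR_SIZE;
}

/* Pass a 64-bit sector number through a 32-bit variable. */
static sector64_t through_32bit(sector64_t sector)
{
	sector32_t narrow = (sector32_t)sector;	/* high bits are dropped */
	return narrow;
}
```

addressable_bytes(32) gives 2^41 bytes (2 TiB), and a sector number
just past 2^32 comes back from through_32bit() as a small number,
which is how an I/O aimed beyond 2TB can end up near the start of the
disc.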

The patch at
    http://www.gelato.unsw.edu.au/patches/2.5.14-largefile-patch

(also available from bk://gelato.unsw.edu.au:2023/ for those using
bitkeeper)
has the following changes to address the problem:

	bmap() changes from int bmap(struct address_space *, long)
	to		    sector_t bmap(struct address_space *,
				     sector_t)

	The partitioning code takes sector_t everywhere that makes
	sense (to allow efi, for example, to create partitions on enormous
	discs).

	The block_sizes[] array is sector_t not int.

	get_nr_sectors() and get_start_sect() etc., now return a
	sector_t

	__bread() takes a sector_t as its second argument, and struct
	buffer_head contains a sector_t blocknumber field.

	struct scsi_disk and struct gendisk have a sector_t field for
	capacity.

	The scsi disc code now uses 16-byte commands if they're
	needed.

	ioctl(..BLKGETSIZE..) now fails with EFBIG if the size won't fit
	in a long (at least for devices using the generic version).

Plus a smattering of casts to avoid compilation warnings (mostly so
that printk() works whether sector_t is 64 or 32 bits) and a new
CONFIG_LFS option to turn on 64-bit sector_t on 32-bit platforms.
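
As an illustration of the cast idiom (a user-space sketch using
snprintf() in place of printk(); the typedef below is an assumption
about how sector_t might be selected, and format_sector() is a
hypothetical helper), widening to unsigned long long lets one format
string serve both configurations:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#ifdef CONFIG_LFS
typedef uint64_t sector_t;	/* assumed: 64-bit sectors when the option is on */
#else
typedef unsigned long sector_t;	/* assumed: native word size otherwise */
#endif

/*
 * Format a sector number.  The cast means "%llu" is correct whether
 * sector_t is 32 or 64 bits wide, so no warning in either build.
 */
static int format_sector(char *buf, size_t len, sector_t sector)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%llu", (unsigned long long)sector);
}
```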

On an old Pentium I now have a 15TB file mounted as JFS on the loop
device -- and it seems to work for almost everything.  There are a few
user-mode programs that'll have to be fixed (notably parted, mkfs.???
etc) to cope with the new BLKGETSIZE failure (they should use
alternate mechanisms, e.g., BLKGETSIZE64, or just seek to the end of
the partition and look at the offset).
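
A minimal sketch of what such a fix might look like in a user-mode
tool, assuming a Linux system where <linux/fs.h> provides
BLKGETSIZE64.  The helper names device_size_bytes() and
path_size_bytes() are hypothetical, not taken from parted or any mkfs:

```c
#define _FILE_OFFSET_BITS 64	/* 64-bit off_t on 32-bit platforms */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/fs.h>		/* BLKGETSIZE64 */

/* Store the size in bytes of the object behind fd; 0 on success, -1 on failure. */
static int device_size_bytes(int fd, uint64_t *bytes)
{
	uint64_t size;
	off_t end;

	assert(bytes != NULL);

	/* Preferred: the 64-bit ioctl reports the size directly in bytes. */
	if (ioctl(fd, BLKGETSIZE64, &size) == 0) {
		*bytes = size;
		return 0;
	}

	/* Fallback: seek to the end and use the resulting offset.
	 * Note that this moves the file position of fd. */
	end = lseek(fd, 0, SEEK_END);
	if (end == (off_t)-1)
		return -1;
	*bytes = (uint64_t)end;
	return 0;
}

/* Convenience wrapper: open path read-only, query the size, close. */
static int path_size_bytes(const char *path, uint64_t *bytes)
{
	int fd, ret;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	ret = device_size_bytes(fd, bytes);
	close(fd);
	return ret;
}
```

On a regular file the ioctl fails and the lseek() path is taken, so
the same helper also covers image files used with the loop device.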

As this touches lots of places -- the generic block layer (Andrew?),
the IDE code (Martin?) and RAID (Neil?), plus minor changes to the SCSI
code -- I've CCd a few people directly.

--
Peter Chubb
Gelato@UNSW http://www.gelato.unsw.edu.au/

* Re: [PATCH] remove 2TB block device limit
@ 2002-05-10  3:53 Neil Brown
  0 siblings, 0 replies; 41+ messages in thread
From: Neil Brown @ 2002-05-10  3:53 UTC (permalink / raw)
  To: Peter Chubb; +Cc: linux-kernel, akpm, martin

On Friday May 10, peter@chubb.wattle.id.au wrote:
> 
> Hi,
> 	At present, linux is limited to 2TB filesystems even on 64-bit
> systems, because there are various places where the block offset on
> disc are assigned to unsigned or int 32-bit variables.
> 
> There's a type, sector_t, that's meant to hold offsets in sectors and
> blocks.  It's not used consistently (yet).
> 
> The patch at
>     http://www.gelato.unsw.edu.au/patches/2.5.14-largefile-patch

> 
> As this touches lots of places -- the generic block layer (Andrew?)
> the IDE code (Martin?) and RAID (Neil?) and minor changes to the scsi
> I've CCd a few people directly.
> 

Thanks.
MD part looks sane to me. However I would rather the
 
+#ifdef CONFIG_LFS
+#include <asm/div64.h>
+#else
+#undef do_div
+#define do_div(n, b)({ int _res; _res = (n) % (b); (n) /= (b); _res;})
+#endif
+

part went in linux/raid/md_k.h and defined "sector_div" (or similar)
as either do_div or ({ int _res; _res = (n) % (b); (n) /= (b); _res;})
as appropriate.

NeilBrown

* Re: [PATCH] remove 2TB block device limit
@ 2002-05-15  9:41 Hirotaka Sasaki
  2002-05-15 21:49 ` Steve Lord
  0 siblings, 1 reply; 41+ messages in thread
From: Hirotaka Sasaki @ 2002-05-15  9:41 UTC (permalink / raw)
  To: linux-kernel; +Cc: taka, minoura, alexr

Hi,
My name is Hirotaka Sasaki and I work for VA Linux Japan.  
We've had a need for large disk support as well, and so I've developed
support for 64-bit block numbers and page cache indices. 

I'm not subscribed to this list so please CC on any responses.

All development I've done so far has been tested on 2.4.17 w/XFS
        - linux-2.4.17
        - xfs-1.0.2
        - x86 (p3) architecture

The main revisions my patch includes:
        - 64-bit page cache indices (doesn't support 64-bit mmap)
        - 64-bit block #'s, sector #'s in the block I/O layer
        - 64-bit block device file (support for block #'s)
        - raw and direct I/O support for 64-bit block and sector #'s
        - 64-bit start_sect/nr_sect support in struct hd_struct
        - 64-bit blk_size table
        - 64-bit SCSI device sizes (sd_sizes/sr_sizes)
        - 64-bit loop device

  This patch is at:
  ftp://ftp.valinux.co.jp/pub/people/sasaki/blk64/va-block64-2.4.17.patch

Other revisions (not necessarily involving the kernel):

In order to create a file system larger than 2TB on XFS, I:
        - changed ioctl(BLKGETSIZE) to ioctl(BLKGETSIZE64) in mkfs.xfs
        - in the kernel fixed an error in the handling of ioctl(BLKGETSIZE64)

  These patches are at:
  ftp://ftp.valinux.co.jp/pub/people/sasaki/blk64/va-blkgetsize64-2.4.17.patch
  ftp://ftp.valinux.co.jp/pub/people/sasaki/blk64/xfsprogs-1.3.17-blkgetsize64.patch

In order to display a file system size larger than 16TB using df, I:
        - added a new system call to the kernel, statfs64
        - added statfs64 to struct super
        - modified XFS and NFSv3 to support statfs64
        - created a new library, statvfs64, that uses statfs64 and is
                  then called by the df command

  These patches are at:
  ftp://ftp.valinux.co.jp/pub/people/sasaki/blk64/va-statfs64-2.4.17.patch
  ftp://ftp.valinux.co.jp/pub/people/sasaki/blk64/va-statfs64_xfs-2.4.17.patch
  ftp://ftp.valinux.co.jp/pub/people/sasaki/blk64/va-statfs64_nfsd-2.4.17.patch
  ftp://ftp.valinux.co.jp/pub/people/sasaki/blk64/va-statfs64_nfs-2.4.17.patch        
  ftp://ftp.valinux.co.jp/pub/people/sasaki/blk64/fileutils-4.1-df_statvfs64.patch

I ran several tests on XFS by creating a file system and mounting
it on the loop device.  I noticed that the size of the file system
is limited to 16TB by XFS_MAX_FILE_OFFSET.  I need to test file systems
larger than 16TB, so I changed XFS_MAX_FILE_OFFSET to (long long)((1ULL<<63)-1ULL).
However, XFS internally uses unsigned longs for the page cache indices
which means everything works great until you mount the file system, but
after that it all falls apart.

  This patch is at:
  ftp://ftp.valinux.co.jp/pub/people/sasaki/blk64/va-xfs_max_file_offset-2.4.17.patch
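
The 16TB figure is consistent with simple arithmetic, assuming 4KB
pages and a 32-bit unsigned long as on the x86 test machine: 2^32
page indices * 2^12 bytes per page = 2^44 bytes.  A small sketch of
that calculation (max_file_bytes() is a hypothetical helper, not part
of any patch):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Largest byte range a page cache index can cover, given the width of
 * the index in bits and log2 of the page size.
 */
static uint64_t max_file_bytes(unsigned index_bits, unsigned page_shift)
{
	assert(index_bits + page_shift < 64);	/* result must fit in 64 bits */
	return 1ULL << (index_bits + page_shift);
}
```

max_file_bytes(32, 12) is 2^44 bytes, i.e. 16TB, which matches the
point at which access to the larger file system breaks down.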

Under XFS, I've tested:
        - 16TB XFS file system (successfully mounted and accessed)
        - 32TB XFS file system (successfully mounted but access failed
                as outlined above)

Further improvements I plan on making:
        - 64-bit support for LVM (including LVM tools)
        - SCSI device support for 64-bit in the common layer
        - 16-byte SCSI command support
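
For reference, the reason 16-byte commands matter here is that the
10-byte READ/WRITE commands carry only a 32-bit logical block address,
while READ(16) carries a 64-bit one.  A minimal sketch of filling in a
READ(16) command block follows; build_read16() and the macro names are
hypothetical and this is not code from either patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define READ_16_OPCODE	0x88	/* SCSI READ(16) operation code */
#define CDB16_LEN	16

/*
 * Fill a 16-byte CDB for READ(16): opcode in byte 0, 64-bit LBA in
 * bytes 2-9 and 32-bit transfer length in bytes 10-13, both big-endian.
 * Flag, group and control bytes are left at zero.
 */
static void build_read16(uint8_t cdb[CDB16_LEN], uint64_t lba, uint32_t nblocks)
{
	int i;

	assert(cdb != NULL);
	memset(cdb, 0, CDB16_LEN);
	cdb[0] = READ_16_OPCODE;
	for (i = 0; i < 8; i++)
		cdb[2 + i] = (uint8_t)(lba >> (8 * (7 - i)));
	for (i = 0; i < 4; i++)
		cdb[10 + i] = (uint8_t)(nblocks >> (8 * (3 - i)));
}
```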

Any help or advice you can offer is greatly appreciated!

# Thanks to Alexander Reeder for translating 
--
Hirotaka Sasaki <sasaki@valinux.co.jp>
Engineering Dept.
VA Linux Systems Japan K.K. 


Thread overview: 41+ messages
2002-05-10  3:36 [PATCH] remove 2TB block device limit Peter Chubb
2002-05-10  4:05 ` Andrew Morton
2002-05-10  8:43   ` Anton Altaparmakov
2002-05-10  9:04     ` Andrew Morton
2002-05-16 19:08       ` Daniel Phillips
2002-05-10  9:05     ` Jens Axboe
2002-05-10  9:53       ` Peter Chubb
2002-05-10 10:01         ` Jens Axboe
2002-05-10 11:43         ` Anton Altaparmakov
2002-05-10  4:51 ` Martin Dalecki
     [not found] ` <20020510084713.43ce396e.jeremy@kerneltrap.org>
2002-05-10 19:12   ` Peter Chubb
2002-05-10 23:46     ` Andreas Dilger
2002-05-11  0:07       ` David Mosberger
2002-05-15 22:17         ` Andreas Dilger
2002-05-16 20:22           ` Daniel Phillips
2002-05-16 22:54             ` Andreas Dilger
2002-05-17  1:17               ` Daniel Phillips
2002-05-11  4:40       ` Peter Chubb
2002-05-15 13:49       ` Pavel Machek
2002-05-11 18:13     ` Padraig Brady
2002-05-10  3:53 Neil Brown
     [not found] <1060250300@toto.iv>
2002-05-13 10:28 ` Peter Chubb
2002-05-13 12:13   ` Christoph Hellwig
2002-05-14  0:30     ` Peter Chubb
2002-05-14  1:36       ` Anton Altaparmakov
2002-05-16 20:32         ` Daniel Phillips
2002-05-14  2:09       ` Andrew Morton
2002-05-14  2:58         ` Peter Chubb
2002-05-14  7:22           ` Christoph Hellwig
2002-05-14  7:21         ` Christoph Hellwig
2002-05-15  9:41 Hirotaka Sasaki
2002-05-15 21:49 ` Steve Lord
     [not found] <581856778@toto.iv>
2002-05-17  0:04 ` Peter Chubb
2002-05-17  0:18   ` Daniel Phillips
2002-05-17 13:32     ` Jesse Pollard
2002-05-17 18:02       ` Daniel Phillips
2002-05-17 18:26         ` Jesse Pollard
2002-05-17 18:36       ` Andreas Dilger
2002-05-17 19:52       ` Daniel Phillips
2002-05-17 20:25         ` Andrew Morton
2002-05-17 15:26     ` Jason L Tibbitts III
