From: Erik Jensen <erikjensen@rkjnsn.net>
To: Theodore Ts'o <tytso@mit.edu>, Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: Matthew Wilcox <willy@infradead.org>,
	Linux FS Devel <linux-fsdevel@vger.kernel.org>,
	"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: page->index limitation on 32bit system?
Date: Fri, 19 Feb 2021 18:20:43 -0800
Message-ID: <a79562ac-1b87-8761-05a6-43b911e093a0@rkjnsn.net>
In-Reply-To: <YC/jYW/K9krbfnfl@mit.edu>

On 2/19/21 8:12 AM, Theodore Ts'o wrote:
> On Fri, Feb 19, 2021 at 08:37:30AM +0800, Qu Wenruo wrote:
>> So it means the 32bit archs are already 2nd tier targets for at least
>> upstream linux kernel?
> 
> At least as far as btrfs is concerned, anyway....
> 
>> Or would it be possible to make it an option to make the index u64?
>> So guys who really want large file support can enable it while most
>> other 32bit guys can just keep the existing behavior?
> 
> I think if this is going to be done at all, it would need to be a
> compile-time CONFIG option to make the index be 64-bits.  That's
> because there are a huge number of low-end Android devices (retail
> price ~$30 USD in India, for example --- this set of customers is
> sometimes called "the next billion users" by some folks) that are
> using 32-bit ARM systems.  And they will be using ext4 or f2fs, and it
> would be massively unfortunate/unfair/etc. to impose that performance
> penalty on them.

A CONFIG option would certainly work for my use case. I was also 
wondering (and I ask this as an end user with admittedly no knowledge 
whatsoever about how the page cache works) whether it might be possible 
to treat the top bit as a kind of "extended address" bit, with some kind 
of additional side table that handles indexes more than 31 bits. That 
way, filesystems that are 8TB or less wouldn't lose any performance, 
while still supporting those larger than 16TB.
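
To make the side-table idea concrete, here is a minimal userspace sketch in C. It is not how the page cache works and is not kernel code; it only assumes a 32-bit stored index where values below 2^31 are stored directly, and any larger index is replaced by a handle (top bit set) into a side table that holds the full 64-bit value. All names (`ext_index_encode`, `ext_index_decode`, `side_table`, `SIDE_CAP`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Top bit of the 32-bit stored index marks an "extended" entry. */
#define EXT_FLAG   UINT32_C(0x80000000)
#define DIRECT_MAX UINT32_C(0x7fffffff) /* largest index stored inline (31 bits) */
#define SIDE_CAP   1024                 /* hypothetical side-table capacity */

/* Hypothetical side table holding the full 64-bit indexes. */
static uint64_t side_table[SIDE_CAP];
static size_t side_len;

/* Encode a 64-bit page index into 32 bits. Small indexes are stored
 * as-is (fast path); large ones are appended to the side table and
 * replaced by EXT_FLAG | slot. */
static uint32_t ext_index_encode(uint64_t index)
{
    if (index <= DIRECT_MAX)
        return (uint32_t)index;
    assert(side_len < SIDE_CAP);
    side_table[side_len] = index;
    return EXT_FLAG | (uint32_t)side_len++;
}

/* Decode: only entries with the top bit set pay for the extra lookup. */
static uint64_t ext_index_decode(uint32_t stored)
{
    if (!(stored & EXT_FLAG))
        return stored;
    size_t slot = stored & DIRECT_MAX;
    assert(slot < side_len);
    return side_table[slot];
}
```

With 4 KiB pages, the 31 direct bits cover 2^31 × 4 KiB = 8 TiB, which is where the "8TB or less" fast-path figure comes from. The sketch ignores everything a real design would need: freeing or deduplicating slots (encoding the same large index twice uses two slots), concurrency, and keeping the side table consistent with the cache.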

I assume the 4KiB entry size in the page cache is fundamental, and can't 
be, e.g., increased to 16KiB to allow addressing up to 64TiB of storage?
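
The limits in question are just (number of representable indexes) × (page size). A small C sketch of that arithmetic, assuming a 32-bit index as on 32-bit kernels; the function name `max_addressable_bytes` is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define TIB (UINT64_C(1) << 40)

/* Bytes spanned by 2^index_bits pages of page_size bytes each.
 * The caller must keep the product below 2^64. */
static uint64_t max_addressable_bytes(unsigned index_bits, uint64_t page_size)
{
    assert(index_bits < 64); /* shifting by >= 64 would be undefined */
    return (UINT64_C(1) << index_bits) * page_size;
}
```

For a 32-bit index, `max_addressable_bytes(32, 4096)` is 2^44 bytes (16 TiB) and `max_addressable_bytes(32, 16384)` is 2^46 bytes (64 TiB), matching the figures above.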

> It sounds like what Willy is saying is that supporting a 64-bit page
> index on 32-bit platforms is going to have a lot of downsides, and
> not just the performance / memory overhead issue.  It's also a code
> maintenance concern, and that tax would land on the mm developers.
> And if it's not well-maintained, without regular testing, it's likely
> to be heavily subject to bitrot.  (Although I suppose if we don't mind
> doubling the number of configs that kernelci has to test, this could
> be mitigated.)
> 
> In contrast, changing btrfs to not depend on a single address space
> for all of its metadata might be a lot of work, but it's something
> which lands on the btrfs developers, as opposed to another (perhaps
> more central) kernel subsystem.  Managing this tradeoff is
> something that is going to be between the mm developers and the btrfs
> developers, but as someone who doesn't do any work on either of these
> subsystems, it seems like a pretty obvious choice.
> 
> The final observation I'll make is that if we know which NAS box
> vendor can (properly) support volumes > 16 TB, we can probably find
> the 64-bit page index patch.  It'll probably be against a fairly old
> kernel, so it might not be all _that_ helpful, but it might give folks a
> bit of a head start.
> 
> I can tell you that the NAS box vendor that it _isn't_ is Synology.
> Synology boxes use btrfs, and on 32-bit processors, they have a 16TB
> volume size limit, and this is enforced by the Synology NAS
> software[1].  However, Synology NAS boxes can support multiple
> volumes; until today, I never understood why, since it seemed to be
> unnecessary complexity, but I suspect the real answer was this was how
> Synology handled storage array sizes > 16TB on their older systems.
> (All of their new NAS boxes use 64-bit processors.)
> 
> [1] https://www.reddit.com/r/synology/comments/a62xrx/max_volume_size_of_16tb/
> 
> Cheers,
> 
> 					- Ted
> 
