From: Hannes Reinecke <hare@suse.de>
To: Matthew Wilcox <willy@infradead.org>
Cc: "Luis Chamberlain" <mcgrof@kernel.org>,
	"Keith Busch" <kbusch@kernel.org>,
	"Theodore Ts'o" <tytso@mit.edu>,
	"Pankaj Raghav" <p.raghav@samsung.com>,
	"Daniel Gomez" <da.gomez@samsung.com>,
	"Javier González" <javier.gonz@samsung.com>,
	lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, linux-block@vger.kernel.org
Subject: Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
Date: Mon, 6 Mar 2023 11:05:20 +0100
Message-ID: <d1976bc4-0350-256e-2f88-028278a3b9fa@suse.de>
In-Reply-To: <ZAWi5KwrsYL+0Uru@casper.infradead.org>

On 3/6/23 09:23, Matthew Wilcox wrote:
> On Sun, Mar 05, 2023 at 12:22:15PM +0100, Hannes Reinecke wrote:
>> On 3/4/23 18:54, Matthew Wilcox wrote:
>>> I think we're talking about different things (probably different storage
>>> vendors want different things, or even different people at the same
>>> storage vendor want different things).
>>>
>>> Luis and I are talking about larger LBA sizes.  That is, the minimum
>>> read/write size from the block device is 16kB or 64kB or whatever.
>>> In this scenario, the minimum amount of space occupied by a file goes
>>> up from 512 bytes or 4kB to 64kB.  That's doable, even if somewhat
>>> suboptimal.
>>>
>> And so do I. One can view zones as really large LBAs.
>>
>> Indeed it might be suboptimal from the OS point of view.
>> But from the device point of view it isn't.
>> And, in fact, with devices becoming faster and faster the question is
>> whether sticking with relatively small sectors won't become a limiting
>> factor eventually.
>>
>>> Your concern seems to be more around shingled devices (or their equivalent
>>> in SSD terms) where there are large zones which are append-only, but
>>> you can still random-read 512 byte LBAs.  I think there are different
>>> solutions to these problems, and people are working on both of these
>>> problems.
>>>
>> My point being that zones are just there because the I/O stack can only deal
>> with sectors up to 4k. If the I/O stack were capable of dealing
>> with larger LBAs one could identify a zone with an LBA, and the entire issue
>> of append-only and sequential writes would be moot.
>> Even the entire concept of zones becomes irrelevant as the OS would
>> trivially only write entire zones.
> 
> All current filesystems that I'm aware of require their fs block size
> to be >= LBA size.  That is, you can't take a 512-byte blocksize ext2
> filesystem and put it on a 4kB LBA storage device.
> 
> That means that files can only grow/shrink in 256MB increments.  I
> don't think that amount of wasted space is going to be acceptable.
> So if we're serious about going down this path, we need to tell
> filesystem people to start working out how to support fs block
> size < LBA size.
> 
> That's a big ask, so let's be sure storage vendors actually want
> this.  Both supporting zoned devices & supporting 16k/64k block
> sizes are easier asks.

Yes, I know. And this really is a future goal.
(Possibly a very _distant_ future goal.)

Indeed we should concentrate on getting 16k/64k blocks initially.
Or maybe 128k blocks to help our RAIDed friends.

Cheers,

Hannes



Thread overview: 67+ messages
2023-03-01  3:52 [LSF/MM/BPF TOPIC] Cloud storage optimizations Theodore Ts'o
2023-03-01  4:18 ` Gao Xiang
2023-03-01  4:40   ` Matthew Wilcox
2023-03-01  4:59     ` Gao Xiang
2023-03-01  4:35 ` Matthew Wilcox
2023-03-01  4:49   ` Gao Xiang
2023-03-01  5:01     ` Matthew Wilcox
2023-03-01  5:09       ` Gao Xiang
2023-03-01  5:19         ` Gao Xiang
2023-03-01  5:42         ` Matthew Wilcox
2023-03-01  5:51           ` Gao Xiang
2023-03-01  6:00             ` Gao Xiang
2023-03-02  3:13 ` Chaitanya Kulkarni
2023-03-02  3:50 ` Darrick J. Wong
2023-03-03  3:03   ` Martin K. Petersen
2023-03-02 20:30 ` Bart Van Assche
2023-03-03  3:05   ` Martin K. Petersen
2023-03-03  1:58 ` Keith Busch
2023-03-03  3:49   ` Matthew Wilcox
2023-03-03 11:32     ` Hannes Reinecke
2023-03-03 13:11     ` James Bottomley
2023-03-04  7:34       ` Matthew Wilcox
2023-03-04 13:41         ` James Bottomley
2023-03-04 16:39           ` Matthew Wilcox
2023-03-05  4:15             ` Luis Chamberlain
2023-03-05  5:02               ` Matthew Wilcox
2023-03-08  6:11                 ` Luis Chamberlain
2023-03-08  7:59                   ` Dave Chinner
2023-03-06 12:04               ` Hannes Reinecke
2023-03-06  3:50             ` James Bottomley
2023-03-04 19:04         ` Luis Chamberlain
2023-03-03 21:45     ` Luis Chamberlain
2023-03-03 22:07       ` Keith Busch
2023-03-03 22:14         ` Luis Chamberlain
2023-03-03 22:32           ` Keith Busch
2023-03-03 23:09             ` Luis Chamberlain
2023-03-16 15:29             ` Pankaj Raghav
2023-03-16 15:41               ` Pankaj Raghav
2023-03-03 23:51       ` Bart Van Assche
2023-03-04 11:08       ` Hannes Reinecke
2023-03-04 13:24         ` Javier González
2023-03-04 16:47         ` Matthew Wilcox
2023-03-04 17:17           ` Hannes Reinecke
2023-03-04 17:54             ` Matthew Wilcox
2023-03-04 18:53               ` Luis Chamberlain
2023-03-05  3:06               ` Damien Le Moal
2023-03-05 11:22               ` Hannes Reinecke
2023-03-06  8:23                 ` Matthew Wilcox
2023-03-06 10:05                   ` Hannes Reinecke [this message]
2023-03-06 16:12                   ` Theodore Ts'o
2023-03-08 17:53                     ` Matthew Wilcox
2023-03-08 18:13                       ` James Bottomley
2023-03-09  8:04                         ` Javier González
2023-03-09 13:11                           ` James Bottomley
2023-03-09 14:05                             ` Keith Busch
2023-03-09 15:23                             ` Martin K. Petersen
2023-03-09 20:49                               ` James Bottomley
2023-03-09 21:13                                 ` Luis Chamberlain
2023-03-09 21:28                                   ` Martin K. Petersen
2023-03-10  1:16                                     ` Dan Helmick
2023-03-10  7:59                             ` Javier González
2023-03-08 19:35                 ` Luis Chamberlain
2023-03-08 19:55                 ` Bart Van Assche
2023-03-03  2:54 ` Martin K. Petersen
2023-03-03  3:29   ` Keith Busch
2023-03-03  4:20   ` Theodore Ts'o
2023-07-16  4:09 BELINDA Goodpaster kelly