From: Rohan Puri <rohan.puri15@gmail.com>
To: "Theodore Ts'o" <tytso@mit.edu>
Cc: Linux FS Devel <linux-fsdevel@vger.kernel.org>
Subject: Re: [RFC] Draft Linux kernel interfaces for ZBC drives
Date: Tue, 25 Feb 2014 15:06:58 +0530
Message-ID: <CALJfu6MtBeqrirTgOd1Fm3SwU=OAE0mEZ9JfoDhhO9m24EHw2w@mail.gmail.com>
In-Reply-To: <20140221154956.GG31047@thunk.org>

On Fri, Feb 21, 2014 at 9:19 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> On Fri, Feb 21, 2014 at 03:32:52PM +0530, Rohan Puri wrote:
>> > extern int blkdev_query_zones(struct block_device *bdev,
>> >                               sector_t start_sector,
>> >                               int free_sectors_criteria,
>> >                               struct zone_status *ret_zones,
>> >                               int max_zones);
>>
>> In this api, the caller would allocate the memory for ret_zones as
>> sizeof(struct zone_status) * max_zones, right? There can be a case
>> where the return value is less than max_zones; in this case we would
>> have preallocated extra memory for (max_zones - retval) entries that
>> would not be used (since they would not contain valid zone_status
>> structs). As the hdd ages, it can become prone to failures, and
>> instances where the two values differ can happen. Can we pass a
>> double pointer for ret_zones, so that the api allocates the memory
>> and the caller frees it? Would like to know your views on this. This
>> concern does not apply to the single zone_status example that you
>> gave.
>
> I think you are making the assumption here that max_zones will
> normally be the maximum number of zones available to the disk.
No, not the maximum; anything greater than 1 and less than the maximum
number of available zones. Consider a drive with 32,768 zones where the
kernel queries for 1000 zones. Information might be obtained for only
700-800 of them, with the remaining 200-300 failing due to some error
condition. We would then have preallocated an extra (200-300) * 24
bytes that go unused. This will only happen in the error path, and I am
not quite sure how probable it is. What are your views on this?
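Just to make my assumption concrete, this caller-allocates pattern is
what I had in mind (a sketch only; the z_start field name, the
process_zone() helper, and passing 0 as free_sectors_criteria are my
guesses, not part of your draft):

        static int query_some_zones(struct block_device *bdev,
                                    sector_t start_sector, int max_zones)
        {
                struct zone_status *zones;
                int i, nr;

                zones = kcalloc(max_zones, sizeof(*zones), GFP_KERNEL);
                if (!zones)
                        return -ENOMEM;

                nr = blkdev_query_zones(bdev, start_sector, 0,
                                        zones, max_zones);
                /* only zones[0..nr-1] are valid; the tail stays unused */
                for (i = 0; i < nr; i++)
                        process_zone(&zones[i]);  /* hypothetical consumer */

                kfree(zones);
                return nr < 0 ? nr : 0;
        }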
> In practice, this will never be true.  Consider that an 8TB SMR drive with
> 256 MB zones will have 32,768 zones.  The kernel will *not* want to
> allocate 768k of non-swappable kernel memory on a regular basis.
> (There is no guarantee there will be that number of contiguous pages
> available, and if you use vmalloc() instead, it's slower since it
> involves page table operations.)  Also, when will the kernel ever want
> to see all of the zones all at once, anyway?
>
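(For reference, the arithmetic behind that figure: 32,768 zones * 24
bytes per struct zone_status = 786,432 bytes, i.e. exactly the 768k of
non-swappable memory mentioned above; 24 bytes is the same per-zone
size I assumed earlier.)
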
Any filesystem that is SMR-aware could need this, right? Its block
allocator, for example, to optimise for fragmentation and the like.
> So it's likely that the caller will always be allocating a relatively
> small number of zones (I suspect it will always be less than 128), and
> if the caller needs more zones, it will simply call
> blkdev_query_zones() with a larger start_sector value and get the next
> 128 zones.
Agreed, but this number has to be chosen well, to reduce the number of
disk reads.
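Something like this loop is what I had in mind for walking the whole
device (a sketch; I am assuming blkdev_query_zones() returns the number
of zones filled in or a negative errno, and that zone_status carries
z_start and z_length in sectors):

        struct zone_status *zones;      /* kcalloc'd, 128 entries */
        sector_t start = 0;
        int nr;

        do {
                nr = blkdev_query_zones(bdev, start, 0, zones, 128);
                if (nr <= 0)
                        break;  /* error or end of device */
                /* continue just past the last zone returned */
                start = zones[nr - 1].z_start + zones[nr - 1].z_length;
        } while (nr == 128);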
>
> So your concern about preallocating extra memory for zones that would
> not be used is not, I believe, a major issue.
>
Yes, this could happen only on disk read errors and for requests
covering more than one zone.
>
> My anticipation is that the kernel will be storing the information
> returned by blkdev_query_zones() in a much more compact fashion (since we
> don't need to store the write pointer if the zone is completely full,
> or completely empty, which will very often be the case, I suspect),
> and there will be a different interface that will be used by block
> device drivers to send this information to the block device layer
> library function which will be maintaining this information in a
> compact form.
>
> I know that I still need to spec out some functions to make life
> easier for the block device drivers that will be interfacing into ZBC
> maintenance layer.   They will probably look something like this:
>
> extern int blkdev_set_zone_info(struct block_device *bdev,
>                                 struct zone_status *zone_info);
>
> blkdev_set_zone_info() would get called once per zone when the block
> device is initially set up.
Would this happen on every OS boot-up? If so, won't it increase the
boot time?
> My assumption is that the block device
> layer will query the drive initially, and grab all of this
> information, and keep it in the compressed form.  (Since querying this
> data each time the OS needs it will likely be too expensive; even if
> the ZBC commands don't have the same insanity as the non-queueable TRIM
> command, the fact that we need to go out to the disk means that we
> will need to send a disk command and wait for a command completion
> interrupt, which would be sad.)
>
Agree.
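Something like two bits of state per zone, plus a small side table for
the minority of partially-written zones, is what I would picture for
the compact form (my sketch, not from your draft):

        enum zone_state {               /* 2 bits per zone */
                ZONE_EMPTY   = 0,       /* write pointer at start of zone */
                ZONE_FULL    = 1,       /* write pointer at end of zone */
                ZONE_PARTIAL = 2,       /* write pointer kept in side table */
        };

With 2 bits per zone, the 32,768-zone drive above needs only 8 KB for
the state array, and the side table grows only with the number of
partially-written zones.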
> I suspect we will also need commands such as these for the convenience
> of the block device driver:
>
> extern int blkdev_update_write_ptr(struct block_device *bdev,
>                                    sector_t start_sector,
>                                    u32 write_ptr);
>
> extern int blkdev_update_zone_info(struct block_device *bdev,
>                                    struct zone_status *zone_info);
>
Will this update the state on disk or only in memory, i.e. write-through
or write-back?
> And we will probably want to define that in blockdev_query_zones(), if
> start_sector is not located at the beginning of a zone, that the first
> zone returned will be the zone containing the specified sector.  (We'll
> need this in the event that the T10 committee allows for variable
> sized zones, instead of the much simpler fixed-size zone design, since
> given a sector number, the block driver or the file system above the
> ZBC OS management layer would have no way of mapping a sector number
> to a specific zone.)
>
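For variable-sized zones, I imagine that mapping would end up as a
binary search over the zone table sorted by starting sector (a sketch;
zone_for_sector() and the z_start / z_length fields are my own names):

        /* Return the index of the zone containing 'sector'.
         * zones[] is sorted by z_start and covers the whole device. */
        static int zone_for_sector(const struct zone_status *zones,
                                   int nr_zones, sector_t sector)
        {
                int lo = 0, hi = nr_zones - 1;

                while (lo < hi) {
                        int mid = lo + (hi - lo + 1) / 2;

                        if (zones[mid].z_start <= sector)
                                lo = mid;       /* zone starts at or before sector */
                        else
                                hi = mid - 1;
                }
                return lo;
        }
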
> So I suspect that as we start implementing device mapper SMR simulators
> and actual SAS/SATA block device drivers which will interface with the
> ZBC prototype drives, there may be other functions we will need to
> implement in order to make life easier for both of these systems.
>
I am interested in project core-04, the SMR simulator. I have read a
related project report on research conducted at UCSC:
http://www.ssrc.ucsc.edu/Papers/ssrctr-12-05.pdf
Also, I would like to know your inputs for core-04.

> Cheers,
>
>                                         - Ted

- Regards,
     Rohan
