From: Andreas Dilger <adilger@whamcloud.com>
To: lustre-devel@lists.lustre.org
Subject: [lustre-devel] Compact layouts
Date: Wed, 21 Nov 2018 23:53:03 +0000
Message-ID: <E2B38E8A-55A5-481D-AF44-FF0B4CD8472E@whamcloud.com>
In-Reply-To: <3600968B-94D3-4C3F-857A-D00A5C3BC601@cray.com>

On Nov 16, 2018, at 11:06, Patrick Farrell <paf@cray.com> wrote:
> 
> All,
>  
> There is an old idea for reducing the data required to describe file striping by using a bitmap to record which OSTs are in use.  As best I can tell, this was most recently described here:
> http://wiki.lustre.org/Layout_Enhancement_Solution_Architecture#Compact_Layouts_2
>  
> I'm curious if this has been pursued any further, if there's a JIRA or other place that might have more info or be tracking the idea.  I poked around and didn't find anything.
>  
> In particular, this comment:
> "with enough data that for each OST index set in the bitmap, a corresponding OST object FID may be computed"
> Points at the difficult part of implementing this.
>  
> So, before I get too far considering this problem - Is there more out there somewhere?  Hoping to avoid duplicating work!

Patrick,
as you mention above, the tricky part is that there would need to be sequential FID sequence allocation across all of the OSTs.  Then, each of the compact files would allocate/reserve the same OID in each of the sequences so that the mapping could be compact.  I don't think that is insurmountable - we already have a good mechanism for allocating FID sequences to different targets, but it would need to be extended so that compact layouts would allocate sequences from a different range of values from regular layouts.
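
To make the mapping concrete, here is a minimal Python sketch of how a bitmap plus one shared OID could be expanded into per-OST FIDs. Everything in it is an assumption for illustration, not existing Lustre code: COMPACT_SEQ_BASE is a made-up start of a sequence range reserved for compact layouts, each OST is assumed to own the sequence "base + OST index", and the FID is simplified to (seq, oid, ver).

```python
from typing import List, NamedTuple, Tuple

# Hypothetical start of a sequence range reserved for compact layouts.
# The value is made up for illustration; it is not a real Lustre constant.
COMPACT_SEQ_BASE = 0x7000_0000_0000


class Fid(NamedTuple):
    seq: int      # FID sequence, assumed to be COMPACT_SEQ_BASE + OST index
    oid: int      # object ID, the same on every OST for one file
    ver: int = 0


class CompactLayout(NamedTuple):
    ost_bitmap: int  # bit i set => the file has an object on OST index i
    oid: int         # the OID reserved for this file in every sequence


def make_compact_layout(ost_indices: List[int], oid: int) -> CompactLayout:
    """Record the set of OSTs as a bitmap alongside the shared OID."""
    bitmap = 0
    for idx in ost_indices:
        bitmap |= 1 << idx
    return CompactLayout(bitmap, oid)


def expand(layout: CompactLayout) -> List[Tuple[int, Fid]]:
    """Return (ost_index, Fid) for every bit set, in ascending index order."""
    out = []
    idx = 0
    bits = layout.ost_bitmap
    while bits:
        if bits & 1:
            out.append((idx, Fid(COMPACT_SEQ_BASE + idx, layout.oid)))
        bits >>= 1
        idx += 1
    return out
```

Note that in this sketch the bitmap only records which OSTs are used, so the stripe order is implicitly ascending by OST index; a real design would presumably also need to record a starting index or similar.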

It would also likely need to implement "OST object create on write" so that there aren't large numbers of unused objects on each OST (one for each OID that isn't used on a particular file).

The other issue is that anything like migrating any single object to another OST (e.g. for mirror resync, tiering, etc) would potentially break the compact layout.
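
A small Python sketch of why migration is a problem, under the same kind of assumptions (a hypothetical COMPACT_SEQ_BASE, one sequence per OST index, one shared OID per file, none of which is existing Lustre code): a layout stays compact only while every stripe's FID can be recomputed from its OST index, and a single migrated object with a different OID is enough to break that.

```python
from typing import List, Optional, Tuple

COMPACT_SEQ_BASE = 0x7000_0000_0000  # hypothetical, for illustration only

Stripe = Tuple[int, int, int]  # (ost_index, fid_seq, fid_oid)


def compact_oid(stripes: List[Stripe]) -> Optional[int]:
    """Return the shared OID if the stripes fit the compact form, else None."""
    if not stripes:
        return None
    oid = stripes[0][2]
    seen = set()
    for ost_idx, seq, stripe_oid in stripes:
        if seq != COMPACT_SEQ_BASE + ost_idx:  # sequence must follow OST index
            return None
        if stripe_oid != oid:                  # every object must share one OID
            return None
        if ost_idx in seen:                    # a bitmap holds each OST at most once
            return None
        seen.add(ost_idx)
    return oid


original = [(i, COMPACT_SEQ_BASE + i, 42) for i in (0, 1, 2)]
# Migrate the object on OST 1 to OST 7, where OID 42 is assumed to be taken
# already, so the new object gets a different OID.
migrated = [original[0], (7, COMPACT_SEQ_BASE + 7, 99), original[2]]
```

Here compact_oid(original) returns the shared OID, while compact_oid(migrated) returns None, meaning the file would have to fall back to a regular, fully enumerated layout.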

I guess the question is what the need for compact layouts is?  To handle more than 2000 stripes, to reduce the xattr size/RPC size, to allow more complex PFL layouts to fit into the layout size limit?

In the past we discussed compressing the layout with gzip, which might be quite effective since large parts of it are zero-filled and repetitive.  This would help the xattr/RPC size, and I think that even compact layouts would still be expanded in RAM to allow easier processing.
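
As a rough way to get a feel for the gzip idea, the Python sketch below compresses a synthetic 2000-stripe layout with zlib (the same DEFLATE algorithm gzip uses). The 24-byte per-stripe record is only an approximation of the shape of a layout entry, not the exact Lustre xattr format, and the sequential object IDs are an assumption that makes the data more regular than a real layout would be.

```python
import struct
import zlib

# Synthetic per-stripe record: object ID, sequence, generation, OST index.
# This only approximates the shape of a layout entry; it is not the exact
# Lustre xattr format.
STRIPE_RECORD = struct.Struct("<QQII")  # 24 bytes per stripe


def build_layout(stripe_count: int, first_oid: int = 1000) -> bytes:
    """Pack one record per stripe; most bytes end up being zero."""
    records = []
    for idx in range(stripe_count):
        records.append(STRIPE_RECORD.pack(first_oid + idx, 0, 0, idx))
    return b"".join(records)


def compression_ratio(layout: bytes) -> float:
    """Compressed size divided by original size (smaller is better)."""
    compressed = zlib.compress(layout, 6)
    return len(compressed) / len(layout)


layout = build_layout(2000)
ratio = compression_ratio(layout)
print(f"{len(layout)} bytes uncompressed, ratio {ratio:.2f}")
```

The number this prints should be treated as an optimistic bound; real object IDs are less regular, so measuring on layouts dumped from an actual filesystem would be the meaningful test.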

Cheers, Andreas
---
Andreas Dilger
Principal Lustre Architect
Whamcloud

Thread overview: 8+ messages
2018-11-16 18:06 [lustre-devel] Compact layouts Patrick Farrell
2018-11-21 23:53 ` Andreas Dilger [this message]
2018-11-22  2:27   ` Patrick Farrell
2018-11-22  2:30     ` John Bent
2018-11-22  2:41       ` Patrick Farrell
2018-11-22  2:53         ` Patrick Farrell
2018-11-22  3:29     ` Andreas Dilger
2018-11-22  6:53       ` George Melikov
