From: Brian Foster <bfoster@redhat.com>
To: Glauber Costa <glauber@scylladb.com>
Cc: Avi Kivity <avi@scylladb.com>, xfs@oss.sgi.com
Subject: Re: sleeps and waits during io_submit
Date: Tue, 1 Dec 2015 09:02:07 -0500
Message-ID: <20151201140206.GC26129@bfoster.bfoster>
In-Reply-To: <CAD-J=zYpXui+HarW9RTGyygE9vQn=0FXs__kmVGF3nHN3AfsYA@mail.gmail.com>

On Tue, Dec 01, 2015 at 08:39:06AM -0500, Glauber Costa wrote:
> >
> > The truncate will free blocks and require block allocation on subsequent
> > writes. That might be something you could look into trying to avoid
> > (e.g., keeping files around and reusing space), but that depends on your
> > application design.
> 
> 
> This one is a bit hard. We have a journal-like structure for the
> modifications issued to the data store, which dominates most of our
> write workloads (including this one that I am discussing here). We
> could keep them around by renaming them outside of user visibility and
> then renaming them back, but that would mean that we are now using
> twice as much space. Perhaps we could use a pool that can at least
> guarantee one or two allocations from a pre-existing file. I am
> assuming here that renaming the file won't block. If it does, we are
> better off not doing so.
> 
> > Inode chunks are allocated and freed dynamically by
> > default as well. The 'ikeep' mount option keeps inode chunks around
> > indefinitely (even if individual inodes are all freed) if you wanted to
> > avoid inode chunk reallocation and know you have a fairly stable working
> > set of inodes.
> 
> I believe we do have a fairly stable inode working set, even though
> that depends a bit on what's considered stable. For our journal-like
> structure, we will keep them around until we are sure the information
> is safe and then delete them - creating new ones as we receive more
> data. But that's always bounded in size.
> 
> Am I correct to understand that, with ikeep passed, new allocations
> would just reuse space from the empty chunks on disk?
> 

Yes. The current behavior is that inodes are allocated and freed in chunks
of 64. When the entire chunk of inodes is freed from the namespace, the
chunk is freed (i.e., it is now free space). With ikeep, inode chunks
are never freed. When an individual inode allocation request is made,
the inode is allocated from one of the existing inode chunks before a
new chunk is allocated.

The tradeoff is that you could consume a significant amount of space
with inodes, free a bunch of them, and that space is never returned to
free space. That is something to be aware of for your use case,
particularly if the fs has uses other than the journaling mechanism
described above, because the mount option affects the entire fs.
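
To make the difference concrete, here is a toy model of the behavior
described above. It is only a sketch and not how XFS implements inode
allocation: the class name, the (chunk, slot) inode identifiers, and
the chunk_allocs counter are all made up for illustration. The only
things taken from the description are the chunk size of 64, the rule
that existing chunks are used before a new one is allocated, and that
a fully freed chunk is released unless ikeep is set.

```python
CHUNK_SIZE = 64  # inodes per chunk, as described above


class InodeAllocator:
    """Toy model of chunked inode allocation (hypothetical, not XFS code)."""

    def __init__(self, ikeep=False):
        self.ikeep = ikeep
        self.chunks = {}       # chunk id -> set of free slot numbers
        self.next_chunk = 0
        self.chunk_allocs = 0  # how many times a new chunk had to be allocated

    def alloc(self):
        # Prefer a free slot in an existing chunk before allocating a new one.
        for cid, free in self.chunks.items():
            if free:
                return (cid, free.pop())
        cid = self.next_chunk
        self.next_chunk += 1
        self.chunk_allocs += 1
        self.chunks[cid] = set(range(1, CHUNK_SIZE))  # slot 0 is handed out now
        return (cid, 0)

    def free(self, ino):
        cid, slot = ino
        free = self.chunks[cid]
        free.add(slot)
        # Without ikeep, a chunk whose inodes are all free becomes free space.
        if len(free) == CHUNK_SIZE and not self.ikeep:
            del self.chunks[cid]
```

In this model, allocating 64 inodes, freeing them all and allocating
one more costs a second chunk allocation by default, but none with
ikeep=True; the price is that the emptied chunk stays around.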

> 
> > Per-inode extent size hints might be another option to
> > increase the size of allocations and perhaps reduce the number of them.
> >
> 
> That's absolutely greatastic. Our files for that journal are all more
> or less the same size. That's a great candidate for a hint.
> 
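
For reference, a rough sketch of setting a per-inode extent size hint
from a program. It assumes a kernel that exposes the generic
FS_IOC_FSGETXATTR/FS_IOC_FSSETXATTR ioctls and struct fsxattr from
<linux/fs.h> (older kernels only have the XFS-specific equivalents),
and the _IOC encoding used on x86 and arm64. The helper names are
invented for this example. As far as I know the hint has to be a
multiple of the filesystem block size and is normally set before any
data is written to the file.

```python
import fcntl
import struct

# struct fsxattr: xflags, extsize, nextents, projid, cowextsize (all u32)
# followed by 8 bytes of padding.
FSXATTR = struct.Struct("=5I8s")
FS_XFLAG_EXTSIZE = 0x00000800  # "extent size hint is valid" flag


def _ioc(direction, nr):
    # Assumed encoding: dir << 30 | size << 16 | type << 8 | nr
    # (true on x86/arm64; some architectures differ).
    return (direction << 30) | (FSXATTR.size << 16) | (ord("X") << 8) | nr


FS_IOC_FSGETXATTR = _ioc(2, 31)  # _IOR('X', 31, struct fsxattr)
FS_IOC_FSSETXATTR = _ioc(1, 32)  # _IOW('X', 32, struct fsxattr)


def with_extsize(raw, extsize):
    """Return a packed fsxattr with the extent size hint (in bytes) set."""
    xflags, _old, nextents, projid, cowextsize, pad = FSXATTR.unpack(raw)
    return FSXATTR.pack(xflags | FS_XFLAG_EXTSIZE, extsize,
                        nextents, projid, cowextsize, pad)


def set_extent_size_hint(fd, extsize):
    """Read the current attributes, then write them back with the hint."""
    raw = fcntl.ioctl(fd, FS_IOC_FSGETXATTR, bytes(FSXATTR.size))
    fcntl.ioctl(fd, FS_IOC_FSSETXATTR, with_extsize(raw, extsize))
```

set_extent_size_hint() will raise OSError on filesystems that do not
support the hint, so a caller would want to treat it as best effort.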

You could consider preallocation (fallocate()) as well if you know the
full size in advance.
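
A minimal sketch of what that could look like, assuming the journal
file size is known up front. Python exposes os.posix_fallocate()
rather than fallocate(2) directly; with glibc on Linux it uses the
fallocate system call where the filesystem supports it. Unlike
fallocate() with FALLOC_FL_KEEP_SIZE, it also extends the file size.
The function name and arguments here are made up for the example.

```python
import os


def preallocate(path, size):
    """Create path if needed and reserve size bytes of space for it."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        # Allocate blocks for the byte range [0, size) ahead of the writes.
        os.posix_fallocate(fd, 0, size)
    finally:
        os.close(fd)
```

The intent is that later writes into the preallocated range do not
need to allocate blocks at submission time.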

Brian

> > Brian
> 
> Thanks again, Brian

