From: Dave Chinner <david@fromorbit.com>
To: Avi Kivity <avi@scylladb.com>
Cc: Glauber Costa <glauber@scylladb.com>, xfs@oss.sgi.com
Subject: Re: sleeps and waits during io_submit
Date: Wed, 2 Dec 2015 08:19:14 +1100
Message-ID: <20151201211914.GZ19199@dastard>
In-Reply-To: <565DEFE2.2000308@scylladb.com>

On Tue, Dec 01, 2015 at 09:07:14PM +0200, Avi Kivity wrote:
> On 12/01/2015 08:03 PM, Carlos Maiolino wrote:
> >Hi Avi,
> >
> >>>else is going to execute in our place until this thread can make
> >>>progress.
> >>For us, nothing else can execute in our place, we usually have exactly one
> >>thread per logical core.  So we are heavily dependent on io_submit not
> >>sleeping.
> >>
> >>The case of a contended lock is, to me, less worrying.  It can be reduced by
> >>using more allocation groups, which is apparently the shared resource under
> >>contention.
> >>
> >I apologize if I misread your previous comments, but, IIRC you said you can't
> >change the directory structure your application is using, and IIRC your
> >application does not spread files across several directories.
> 
> I miswrote somewhat: the application writes data files and commitlog
> files.  The data file directory structure is fixed due to
> compatibility concerns (it is not a single directory, but some
> workloads will see most access on files in a single directory).  The
> commitlog directory structure is more relaxed, and we can split it
> to a directory per shard (=cpu) or something else.
> 
> If worst comes to worst, we'll hack around this and distribute the
> data files into more directories, and provide some hack for
> compatibility.
> 
> >XFS spreads files across the allocation groups, based on the directory in
> >which these files are created,
> 
> Idea: create the files in some subdirectory, and immediately move
> them to their required location.

See xfs_fsr.

> 
> >  trying to keep files as close as possible to their
> >metadata.
> 
> This is pointless for an SSD. Perhaps XFS should randomize the ag on
> nonrotational media instead.

Actually, no, it is not pointless. SSDs do not require optimisation
for minimal seek time, but data locality is still just as important
as it is on spinning disks, if not more so. Why? Because the garbage
collection routines in the SSDs are all about locality and we can't
drive garbage collection effectively via discard operations if the
filesystem is not keeping temporally related files close together in
its block address space.

e.g. If the files in a directory are all close together, and the
directory is removed, we then leave a big empty contiguous region in
the filesystem free space map, and when we send discards over that
we end up with a single big trim and the drive handles that far more
effectively than lots of little trims (i.e. one per file) that the
drive cannot do anything useful with because they are all smaller
than the internal SSD page/block sizes and so get ignored.  This is
one of the reasons fstrim is so much more efficient and effective
than using the discard mount option.
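
To make the locality argument concrete, here is a toy model in Python.
It is only a sketch: the erase-block size, the file sizes, and the rule
that the drive ignores any trim smaller than one erase block are
illustrative assumptions, not measured SSD or XFS behaviour. It compares
one trim per deleted file against a single trim over the merged free
space, and against the same files scattered across the address space.

```python
# Toy model only: block numbers, sizes, and the "ignore small trims" rule
# are illustrative assumptions, not measured SSD or XFS behaviour.

ERASE_BLOCK = 1024  # hypothetical SSD erase-block size, in filesystem blocks


def merge_extents(extents):
    """Coalesce (start, length) extents that touch or overlap."""
    merged = []
    for start, length in sorted(extents):
        if merged and start <= merged[-1][0] + merged[-1][1]:
            prev_start, prev_len = merged[-1]
            end = max(prev_start + prev_len, start + length)
            merged[-1] = (prev_start, end - prev_start)
        else:
            merged.append((start, length))
    return merged


def useful_blocks(trims, erase_block=ERASE_BLOCK):
    """Blocks covered by trims large enough for the modelled drive to act on."""
    return sum(length for _, length in trims if length >= erase_block)


# 16 files of 256 blocks each, laid out back to back (good locality).
packed = [(i * 256, 256) for i in range(16)]
# The same 16 files scattered across the address space (poor locality).
scattered = [(i * 100_000, 256) for i in range(16)]

per_file_trims = packed                        # one trim per deleted file
batched_trim = merge_extents(packed)           # one trim over merged free space
scattered_batched = merge_extents(scattered)   # nothing adjacent to merge

print(len(batched_trim), useful_blocks(batched_trim))
print(len(per_file_trims), useful_blocks(per_file_trims))
print(len(scattered_batched), useful_blocks(scattered_batched))
```

In this model only the packed layout, trimmed as one merged range,
yields a discard the drive can act on. Per-file trims and the scattered
layout both leave every request below the assumed erase-block size.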

And, well, XFS is designed to operate on storage devices made up of
more than one drive, so the way AGs are selected is designed to
give long term load balancing (both for space usage and
instantaneous performance). With the existing algorithms we've not
had any issues with SSD lifetimes, long term performance
degradation, etc, so there's no evidence that we actually need to
change the fundamental allocation algorithms specifically for SSDs.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
