From: Milosz Tanski <milosz@adfin.com>
To: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>, Jens Axboe <axboe@kernel.dk>,
	Goldwyn Rodrigues <rgoldwyn@suse.com>,
	Mel Gorman <mgorman@suse.de>,
	Volker Lendecke <Volker.Lendecke@sernet.de>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	linux-block@vger.kernel.org
Subject: Re: non-blocking buffered reads
Date: Thu, 29 Jun 2017 21:12:19 -0400
Message-ID: <CANP1eJG6OUtZ_Vcu0A53SZYKfwSY8n5nVfvXnokPMXzpbdLQ1A@mail.gmail.com>
In-Reply-To: <20170629212503.15110-1-hch@lst.de>

On Thu, Jun 29, 2017 at 5:25 PM, Christoph Hellwig <hch@lst.de> wrote:
>
> This series resurrects the old patches from Milosz to implement
> non-blocking buffered reads.  Thanks to the non-blocking AIO code from
> Goldwyn the implementation becomes pretty much trivial.  As that
> implementation is in the block tree I would suggest that we merge
> these patches through the block tree as well.  I've also forward
> ported the test Milosz sent for recent xfsprogs to verify it works
> properly, but I'll still have to address the review comments for it.
> I'll also volunteer to work with Goldwyn to properly document the
> RWF_NOWAIT flag in the man page including this change.

I had updated patches for the man pages, so I'll check tomorrow if I
can dig up the changes and forward them on.

>
> Here are additional details from the original cover letter from Milosz,
> where the flag was still called RWF_NONBLOCK:
>
>
> Background:
>
>  Using a threadpool to emulate non-blocking operations on regular buffered
>  files is a common pattern today (samba, libuv, etc.). Applications split the
>  work between network-bound threads (epoll) and an IO threadpool. Not every
>  application can use the sendfile syscall (TLS / post-processing).
>
>  This common pattern leads to increased request latency. Latency can come
>  from additional synchronization between the threads, or from a fast request
>  (cached data) stuck behind a slow one (large / uncached data).
>
>  The preadv2 syscall with RWF_NONBLOCK lets userspace applications bypass
>  enqueuing an operation in the threadpool when the data is already available
>  in the page cache.
>
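
For anyone who hasn't seen the pattern, the userspace side ends up
looking roughly like the sketch below. This is a minimal sketch using
the final RWF_NOWAIT name; queue_read_to_threadpool() is a hypothetical
application helper, not a real API, and the glibc preadv2() wrapper
needs _GNU_SOURCE (glibc >= 2.26, kernel >= 4.6 for preadv2 itself):

#define _GNU_SOURCE		/* for preadv2() and RWF_NOWAIT */
#include <sys/types.h>
#include <sys/uio.h>
#include <errno.h>

/* Application-specific fallback path; hypothetical helper, not a real API. */
ssize_t queue_read_to_threadpool(int fd, void *buf, size_t len, off_t off);

/*
 * Try to satisfy the read straight from the page cache.  If the data is
 * not resident, the kernel returns -1/EAGAIN instead of blocking, and
 * only then do we pay for the threadpool round trip.
 */
ssize_t read_fast_or_queue(int fd, void *buf, size_t len, off_t off)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	ssize_t ret;

	ret = preadv2(fd, &iov, 1, off, RWF_NOWAIT);
	if (ret >= 0)
		return ret;		/* served from the page cache */
	if (errno != EAGAIN)
		return -1;		/* genuine I/O error */
	return queue_read_to_threadpool(fd, buf, len, off);
}

The point is that the preadv2() call never sleeps on storage: a cache
miss costs one cheap syscall before the request takes the normal
threadpool path.
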
>
> Performance numbers (newer Samba):
>
>  https://drive.google.com/file/d/0B3maCn0jCvYncndGbXJKbGlhejQ/view?usp=sharing
>  https://docs.google.com/spreadsheets/d/1GGTivi-MfZU0doMzomG4XUo9ioWtRvOGQ5FId042L6s/edit?usp=sharing
>
>
> Performance numbers (older):
>
>  Some perf data generated using fio, comparing the stock posix AIO engine to
>  a version of the posix AIO engine that attempts to perform "fast" reads
>  before submitting the operations to the queue. This workload runs on an ext4
>  partition on raid0 (test / build rig), simulating our database access
>  pattern using 16kb read accesses. Our database uses a home-spun posix
>  AIO-like queue (samba does the same thing.)
>
>  f1: ~73% rand read over mostly cached data (zipf med-size dataset)
>  f2: ~18% rand read over mostly uncached data (uniform large dataset)
>  f3: ~9% seq read over large dataset
>
>  before:
>
>  f1:
>      bw (KB  /s): min=   11, max= 9088, per=0.56%, avg=969.54, stdev=827.99
>      lat (msec) : 50=0.01%, 100=1.06%, 250=5.88%, 500=4.08%, 750=12.48%
>      lat (msec) : 1000=17.27%, 2000=49.86%, >=2000=9.42%
>  f2:
>      bw (KB  /s): min=    2, max= 1882, per=0.16%, avg=273.28, stdev=220.26
>      lat (msec) : 250=5.65%, 500=3.31%, 750=15.64%, 1000=24.59%, 2000=46.56%
>      lat (msec) : >=2000=4.33%
>  f3:
>      bw (KB  /s): min=    0, max=265568, per=99.95%, avg=174575.10,
>                   stdev=34526.89
>      lat (usec) : 2=0.01%, 4=0.01%, 10=0.02%, 20=0.27%, 50=10.82%
>      lat (usec) : 100=50.34%, 250=5.05%, 500=7.12%, 750=6.60%, 1000=4.55%
>      lat (msec) : 2=8.73%, 4=3.49%, 10=1.83%, 20=0.89%, 50=0.22%
>      lat (msec) : 100=0.05%, 250=0.02%, 500=0.01%
>  total:
>     READ: io=102365MB, aggrb=174669KB/s, minb=240KB/s, maxb=173599KB/s,
>           mint=600001msec, maxt=600113msec
>
>  after (with fast read using preadv2 before submit):
>
>  f1:
>      bw (KB  /s): min=    3, max=14897, per=1.28%, avg=2276.69, stdev=2930.39
>      lat (usec) : 2=70.63%, 4=0.01%
>      lat (msec) : 250=0.20%, 500=2.26%, 750=1.18%, 2000=0.22%, >=2000=25.53%
>  f2:
>      bw (KB  /s): min=    2, max= 2362, per=0.14%, avg=249.83, stdev=222.00
>      lat (msec) : 250=6.35%, 500=1.78%, 750=9.29%, 1000=20.49%, 2000=52.18%
>      lat (msec) : >=2000=9.99%
>  f3:
>      bw (KB  /s): min=    1, max=245448, per=100.00%, avg=177366.50,
>                   stdev=35995.60
>      lat (usec) : 2=64.04%, 4=0.01%, 10=0.01%, 20=0.06%, 50=0.43%
>      lat (usec) : 100=0.20%, 250=1.27%, 500=2.93%, 750=3.93%, 1000=7.35%
>      lat (msec) : 2=14.27%, 4=2.88%, 10=1.54%, 20=0.81%, 50=0.22%
>      lat (msec) : 100=0.05%, 250=0.02%
>  total:
>     READ: io=103941MB, aggrb=177339KB/s, minb=213KB/s, maxb=176375KB/s,
>           mint=600020msec, maxt=600178msec
>
>  Interpreting the results, you can see total bandwidth stays the same but
>  overall request latency is decreased in the f1 (random, mostly cached) and
>  f3 (sequential) workloads. There is a slight bump in latency for f2 since
>  it's random data that's unlikely to be cached, but we're always attempting
>  the "fast read" first.
>
>  In our application we have started keeping track of "fast read" hits/misses,
>  and for files / requests that have a low hit ratio we don't attempt "fast
>  reads", mostly getting rid of the extra latency in the uncached cases. In
>  our real-world workload we were able to reduce average response time by 20
>  to 30% (depending on the amount of IO done by the request).
>
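
To make the adaptive part concrete, here's a rough sketch of the kind
of per-file bookkeeping involved; the struct, names, sample count and
~25% threshold are all made up for illustration, not from any real
codebase:

#include <stdbool.h>

/* Hypothetical per-file counters for the RWF_NOWAIT probe. */
struct fastread_stats {
	unsigned long hits;	/* preadv2(..., RWF_NOWAIT) succeeded */
	unsigned long misses;	/* returned EAGAIN, fell back to pool */
};

static bool should_try_fast_read(const struct fastread_stats *s)
{
	unsigned long total = s->hits + s->misses;

	if (total < 128)	/* too few samples, keep probing */
		return true;
	/* Skip the probe once the hit ratio drops below ~25% (assumed). */
	return s->hits * 4 >= total;
}

Files that turn out to be mostly uncached then take the threadpool path
directly and stop paying for the extra syscall.
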
>  I've performed other benchmarks and have not observed any perf regressions
>  in any of the normal (old) code paths.




-- 
Milosz Tanski
CTO
16 East 34th Street, 15th floor
New York, NY 10016

p: 646-253-9055
e: milosz@adfin.com

Thread overview: 9+ messages
2017-06-29 21:25 non-blocking buffered reads Christoph Hellwig
2017-06-29 21:25 ` [PATCH 1/3] fs: pass iocb to do_generic_file_read Christoph Hellwig
2017-06-29 21:25 ` [PATCH 2/3] fs: support IOCB_NOWAIT in generic_file_buffered_read Christoph Hellwig
2017-06-29 21:25 ` [PATCH 3/3] fs: support RWF_NOWAIT for buffered reads Christoph Hellwig
2017-06-30  3:43   ` Goldwyn Rodrigues
2017-06-29 23:46 ` non-blocking " reads Al Viro
2017-06-30  0:34   ` Christoph Hellwig
2017-06-30  1:11 ` Milosz Tanski
2017-06-30  1:12 ` Milosz Tanski [this message]
