From: Mike Snitzer <snitzer@redhat.com>
To: Ming Lin <mlin@kernel.org>
Cc: Ming Lei <ming.lei@canonical.com>,
	dm-devel@redhat.com, Christoph Hellwig <hch@lst.de>,
	Alasdair G Kergon <agk@redhat.com>,
	Lars Ellenberg <drbd-dev@lists.linbit.com>,
	Philip Kelleher <pjk1939@linux.vnet.ibm.com>,
	Joshua Morris <josh.h.morris@us.ibm.com>,
	Christoph Hellwig <hch@infradead.org>,
	Kent Overstreet <kent.overstreet@gmail.com>,
	Nitin Gupta <ngupta@vflare.org>,
	Oleg Drokin <oleg.drokin@intel.com>,
	Al Viro <viro@zeniv.linux.org.uk>, Jens Axboe <axboe@kernel.dk>,
	Andreas Dilger <andreas.dilger@intel.com>,
	Geoff Levand <geoff@infradead.org>, Jiri Kosina <jkosina@suse.cz>,
	lkml <linux-kernel@vger.kernel.org>, Jim Paris <jim@jtan.com>,
	Minchan Kim <minchan@kernel.org>, Dongsu Park <dpark@posteo.net>,
	drbd-user@lists.linbit.com
Subject: Re: [PATCH v4 01/11] block: make generic_make_request handle arbitrarily sized bios
Date: Wed, 10 Jun 2015 17:46:11 -0400
Message-ID: <20150610214611.GA744@redhat.com>
In-Reply-To: <CAF1ivSbo456n+1JxDp7eAHAgKs5qBOcomN4N9hhCEXmy8i4nPQ@mail.gmail.com>

On Wed, Jun 10 2015 at  5:20pm -0400,
Ming Lin <mlin@kernel.org> wrote:

> On Mon, Jun 8, 2015 at 11:09 PM, Ming Lin <mlin@kernel.org> wrote:
> > On Thu, 2015-06-04 at 17:06 -0400, Mike Snitzer wrote:
> >> We need to test on large HW raid setups like a Netapp filer (or even
> >> local SAS drives connected via some SAS controller).  Like a 8+2 drive
> >> RAID6 or 8+1 RAID5 setup.  Testing with MD raid on JBOD setups with 8
> >> devices is also useful.  It is larger RAID setups that will be more
> >> sensitive to IO sizes being properly aligned on RAID stripe and/or chunk
> >> size boundaries.
> >
> > Here are the test results of xfs/ext4/btrfs read/write on HW RAID6/MD RAID6/DM stripe target.
> > Each case ran for 0.5 hours, so it took 36 hours to finish all the tests on 4.1-rc4 and 4.1-rc4-patched kernels.
> >
> > No performance regressions were introduced.
> >
> > Test server: Dell R730xd(2 sockets/48 logical cpus/264G memory)
> > HW RAID6/MD RAID6/DM stripe target were configured with 10 HDDs, each 280G
> > Stripe size 64k and 128k were tested.
> >
> > devs="/dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk"
> > spare_devs="/dev/sdl /dev/sdm"
> > stripe_size=64 (or 128)
> >
> > MD RAID6 was created by:
> > mdadm --create --verbose /dev/md0 --level=6 --raid-devices=10 $devs --spare-devices=2 $spare_devs -c $stripe_size
> >
> > DM stripe target was created by:
> > pvcreate $devs
> > vgcreate striped_vol_group $devs
> > lvcreate -i10 -I${stripe_size} -L2T -nstriped_logical_volume striped_vol_group

DM had a merge_bvec regression that wasn't fixed until recently (the
fix wasn't in 4.1-rc4); see commit 1c220c69ce0 ("dm: fix casting bug
in dm_merge_bvec()").  The regression was introduced in 4.1.

So your 4.1-rc4 DM stripe testing may have effectively been with
merge_bvec disabled.

> > Here is an example of fio script for stripe size 128k:
> > [global]
> > ioengine=libaio
> > iodepth=64
> > direct=1
> > runtime=1800
> > time_based
> > group_reporting
> > numjobs=48
> > gtod_reduce=0
> > norandommap
> > write_iops_log=fs
> >
> > [job1]
> > bs=1280K
> > directory=/mnt
> > size=5G
> > rw=read
> >
> > All results here: http://minggr.net/pub/20150608/fio_results/
> >
> > Results summary:
> >
> > 1. HW RAID6: stripe size 64k
> >                 4.1-rc4         4.1-rc4-patched
> >                 -------         ---------------
> >                 (MB/s)          (MB/s)
> > xfs read:       821.23          812.20  -1.09%
> > xfs write:      753.16          754.42  +0.16%
> > ext4 read:      827.80          834.82  +0.84%
> > ext4 write:     783.08          777.58  -0.70%
> > btrfs read:     859.26          871.68  +1.44%
> > btrfs write:    815.63          844.40  +3.52%
> >
> > 2. HW RAID6: stripe size 128k
> >                 4.1-rc4         4.1-rc4-patched
> >                 -------         ---------------
> >                 (MB/s)          (MB/s)
> > xfs read:       948.27          979.11  +3.25%
> > xfs write:      820.78          819.94  -0.10%
> > ext4 read:      978.35          997.92  +2.00%
> > ext4 write:     853.51          847.97  -0.64%
> > btrfs read:     1013.1          1015.6  +0.24%
> > btrfs write:    854.43          850.42  -0.46%
> >
> > 3. MD RAID6: stripe size 64k
> >                 4.1-rc4         4.1-rc4-patched
> >                 -------         ---------------
> >                 (MB/s)          (MB/s)
> > xfs read:       847.34          869.43  +2.60%
> > xfs write:      198.67          199.03  +0.18%
> > ext4 read:      763.89          767.79  +0.51%
> > ext4 write:     281.44          282.83  +0.49%
> > btrfs read:     756.02          743.69  -1.63%
> > btrfs write:    268.37          265.93  -0.90%
> >
> > 4. MD RAID6: stripe size 128k
> >                 4.1-rc4         4.1-rc4-patched
> >                 -------         ---------------
> >                 (MB/s)          (MB/s)
> > xfs read:       993.04          1014.1  +2.12%
> > xfs write:      293.06          298.95  +2.00%
> > ext4 read:      1019.6          1020.9  +0.12%
> > ext4 write:     371.51          371.47  -0.01%
> > btrfs read:     1000.4          1020.8  +2.03%
> > btrfs write:    241.08          246.77  +2.36%
> >
> > 5. DM: stripe size 64k
> >                 4.1-rc4         4.1-rc4-patched
> >                 -------         ---------------
> >                 (MB/s)          (MB/s)
> > xfs read:       1084.4          1080.1  -0.39%
> > xfs write:      1071.1          1063.4  -0.71%
> > ext4 read:      991.54          1003.7  +1.22%
> > ext4 write:     1069.7          1052.2  -1.63%
> > btrfs read:     1076.1          1082.1  +0.55%
> > btrfs write:    968.98          965.07  -0.40%
> >
> > 6. DM: stripe size 128k
> >                 4.1-rc4         4.1-rc4-patched
> >                 -------         ---------------
> >                 (MB/s)          (MB/s)
> > xfs read:       1020.4          1066.1  +4.47%
> > xfs write:      1058.2          1066.6  +0.79%
> > ext4 read:      990.72          988.19  -0.25%
> > ext4 write:     1050.4          1070.2  +1.88%
> > btrfs read:     1080.9          1074.7  -0.57%
> > btrfs write:    975.10          972.76  -0.23%
> 
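For anyone re-checking the delta column in the summary above, here is a
minimal sketch of the arithmetic, assuming each delta is the relative
change of the patched kernel against the 4.1-rc4 baseline.  The helper
name percent_change is hypothetical, and the three rows are copied from
the quoted tables.  The quoted column appears to truncate rather than
round, so the last digit can differ from what this prints.

```python
def percent_change(baseline, patched):
    """Relative change of patched versus baseline, in percent."""
    return (patched - baseline) / baseline * 100.0


# A few rows copied from the quoted summary:
# (label, 4.1-rc4 MB/s, 4.1-rc4-patched MB/s)
rows = [
    ("HW RAID6 64k btrfs write", 815.63, 844.40),
    ("MD RAID6 64k btrfs read", 756.02, 743.69),
    ("DM 128k xfs read", 1020.4, 1066.1),
]

for label, base, patched in rows:
    # Signed, two decimals, to mirror the layout of the quoted column.
    print(f"{label}: {percent_change(base, patched):+.2f}%")
```
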
> Hi Mike,
> 
> How about these numbers?

Looks fairly good.  I just am not sure the workload is going to test the
code paths in question like we'd hope.  I'll have to set aside some time
to think through scenarios to test.
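
One way to reason about whether an IO size lines up with stripe
boundaries is sketched below.  This is only a rough illustration under
stated assumptions: that "stripe size" in the quoted setup means the
per-device chunk (as with mdadm -c and lvcreate -I), that the fio "K"
suffix means 1024 bytes, and that RAID6 dedicates two chunks per stripe
to parity.  The helper names are hypothetical.  It only compares sizes;
it says nothing about the starting offset of each IO, which also
depends on where the filesystem places file extents.

```python
KIB = 1024


def full_stripe_bytes(num_devices, parity_devices, chunk_kib):
    """Bytes of data in one full stripe: data devices times chunk size."""
    data_devices = num_devices - parity_devices
    return data_devices * chunk_kib * KIB


def is_full_stripe_multiple(block_bytes, stripe_bytes):
    """True if the block size is a whole number of full stripes."""
    return block_bytes % stripe_bytes == 0


# bs=1280K from the quoted fio job for the 128k case (assumes K = 1024).
bs = 1280 * KIB

# 10-device RAID6: 8 data chunks per stripe (assumption: 2 parity chunks).
raid6 = full_stripe_bytes(num_devices=10, parity_devices=2, chunk_kib=128)

# 10-way DM stripe: no parity, so all 10 chunks carry data.
dm_stripe = full_stripe_bytes(num_devices=10, parity_devices=0, chunk_kib=128)

print("RAID6 full stripe (KiB):", raid6 // KIB)
print("DM stripe full stripe (KiB):", dm_stripe // KIB)
print("bs is a RAID6 full-stripe multiple:", is_full_stripe_multiple(bs, raid6))
print("bs is a DM full-stripe multiple:", is_full_stripe_multiple(bs, dm_stripe))
```

Under those assumptions the same block size is a whole stripe for the
DM stripe target but not for the RAID6 layouts, so the two cases may
not exercise the same splitting behaviour.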

My concern remains that at some point in the future we'll regret
not having merge_bvec but it'll be too late.  That is just my own FUD at
this point...

> I'm also happy to run other fio jobs your team used.

I've been busy getting DM changes for the 4.2 merge window finalized.
As such I haven't connected with others on the team to discuss this
issue.

I'll see if we can make time in the next 2 days.  But I also have
RHEL-specific kernel deadlines I'm coming up against.

Seems late to be staging this extensive a change for 4.2... are you
pushing for this code to land in the 4.2 merge window?  Or do we have
time to work this further and target the 4.3 merge?

Mike
