linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Daniel Browning <db@kavod.com>,
	Mike Snitzer <snitzer@redhat.com>,
	Alasdair G Kergon <agk@redhat.com>
Subject: [ 16/61] dm thin: fix queue limits stacking
Date: Tue, 12 Feb 2013 12:34:36 -0800	[thread overview]
Message-ID: <20130212203420.469918899@linuxfoundation.org> (raw)
In-Reply-To: <20130212203417.890993903@linuxfoundation.org>

3.7-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Mike Snitzer <snitzer@redhat.com>

commit 0f640dca08330dfc7820d610578e5935b5e654b2 upstream.

thin_io_hints() is blindly copying the queue limits from the thin-pool
which can lead to incorrect limits being set.  The fix here simply
deletes the thin_io_hints() hook which leaves the existing stacking
infrastructure to set the limits correctly.

When a thin-pool uses an MD device for the data device a thin device
from the thin-pool must respect MD's constraints about disallowing a bio
from spanning multiple chunks.  Otherwise we can see problems.  If the raid0
chunksize is 1152K and thin-pool chunksize is 256K I see the following
md/raid0 error (with extra debug tracing added to thin_endio) when
mkfs.xfs is executed against the thin device:

md/raid0:md99: make_request bug: can't convert block across chunks or bigger than 1152k 6688 127
device-mapper: thin: bio sector=2080 err=-5 bi_size=130560 bi_rw=17 bi_vcnt=32 bi_idx=0

This extra DM debugging shows that the failing bio is spanning across
the first and second logical 1152K chunk (sector 2080 + 255 takes the
bio beyond the first chunk's boundary of sector 2304).  So the bio
splitting that DM is doing clearly isn't respecting the MD limits.

max_hw_sectors_kb is 127 for both the thin-pool and thin device
(queue_max_hw_sectors returns 255 so we'll excuse sysfs's lack of
precision).  So this explains why bi_size is 130560.

But the thin device's max_hw_sectors_kb should be 4 (PAGE_SIZE) given
that it doesn't have a .merge function (for bio_add_page to consult
indirectly via dm_merge_bvec) yet the thin-pool does sit above an MD
device that has a compulsory merge_bvec_fn.  This scenario is exactly
why DM must resort to sending single PAGE_SIZE bios to the underlying
layer. Some additional context for this is available in the header for
commit 8cbeb67a ("dm: avoid unsupported spanning of md stripe boundaries").

Long story short, the reason a thin device doesn't properly get
configured to have a max_hw_sectors_kb of 4 (PAGE_SIZE) is that
thin_io_hints() is blindly copying the queue limits from the thin-pool
device directly to the thin device's queue limits.

Fix this by eliminating thin_io_hints.  Doing so is safe because the
block layer's queue limits stacking already enables the upper level thin
device to inherit the thin-pool device's discard and minimum_io_size and
optimal_io_size limits that get set in pool_io_hints.  But avoiding the
queue limits copy allows the thin and thin-pool limits to be different
where it is important, namely max_hw_sectors_kb.

Reported-by: Daniel Browning <db@kavod.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/md/dm-thin.c |   13 +------------
 1 file changed, 1 insertion(+), 12 deletions(-)

--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -2766,19 +2766,9 @@ static int thin_iterate_devices(struct d
 	return 0;
 }
 
-/*
- * A thin device always inherits its queue limits from its pool.
- */
-static void thin_io_hints(struct dm_target *ti, struct queue_limits *limits)
-{
-	struct thin_c *tc = ti->private;
-
-	*limits = bdev_get_queue(tc->pool_dev->bdev)->limits;
-}
-
 static struct target_type thin_target = {
 	.name = "thin",
-	.version = {1, 5, 0},
+	.version = {1, 5, 1},
 	.module	= THIS_MODULE,
 	.ctr = thin_ctr,
 	.dtr = thin_dtr,
@@ -2787,7 +2777,6 @@ static struct target_type thin_target =
 	.postsuspend = thin_postsuspend,
 	.status = thin_status,
 	.iterate_devices = thin_iterate_devices,
-	.io_hints = thin_io_hints,
 };
 
 /*----------------------------------------------------------------*/



  parent reply	other threads:[~2013-02-12 21:08 UTC|newest]

Thread overview: 70+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-02-12 20:34 [ 00/61] 3.7.8-stable review Greg Kroah-Hartman
2013-02-12 20:34 ` [ 01/61] rtlwifi: Fix the usage of the wrong variable in usb.c Greg Kroah-Hartman
2013-02-12 20:34 ` [ 02/61] rtlwifi: Fix scheduling while atomic bug Greg Kroah-Hartman
2013-02-12 20:34 ` [ 03/61] regulator: max8998: fix incorrect min_uV value for ldo10 Greg Kroah-Hartman
2013-02-12 20:34 ` [ 04/61] regulator: clear state each invocation of of_regulator_match Greg Kroah-Hartman
2013-02-12 20:34 ` [ 05/61] regulator: s2mps11: fix incorrect register for buck10 Greg Kroah-Hartman
2013-02-12 20:34 ` [ 06/61] IB/qib: Fix for broken sparse warning fix Greg Kroah-Hartman
2013-02-12 20:34 ` [ 07/61] virtio_console: Dont access uninitialized data Greg Kroah-Hartman
2013-02-12 20:34 ` [ 08/61] Bluetooth: Fix handling of unexpected SMP PDUs Greg Kroah-Hartman
2013-02-12 20:34 ` [ 09/61] Revert "iwlwifi: fix the reclaimed packet tracking upon flush queue" Greg Kroah-Hartman
2013-02-12 20:34 ` [ 10/61] can: c_can: Set reserved bit in IFx_MASK2 to 1 on write Greg Kroah-Hartman
2013-02-12 20:34 ` [ 11/61] mwifiex: fix incomplete scan in case of IE parsing error Greg Kroah-Hartman
2013-02-12 20:34 ` [ 12/61] e1000e: enable ECC on I217/I218 to catch packet buffer memory errors Greg Kroah-Hartman
2013-02-12 20:34 ` [ 13/61] media: pwc-if: must check vb2_queue_init() success Greg Kroah-Hartman
2013-02-12 20:34 ` [ 14/61] ath9k_hw: fix calibration issues on chainmask that dont include chain 0 Greg Kroah-Hartman
2013-02-12 20:34 ` [ 15/61] mfd: db8500-prcmu: Fix irqdomain usage Greg Kroah-Hartman
2013-02-12 20:34 ` Greg Kroah-Hartman [this message]
2013-02-12 20:34 ` [ 17/61] net: prevent setting ttl=0 via IP_TTL Greg Kroah-Hartman
2013-02-12 20:34 ` [ 18/61] ipv6: fix the noflags test in addrconf_get_prefix_route Greg Kroah-Hartman
2013-02-12 20:34 ` [ 19/61] net, wireless: overwrite default_ethtool_ops Greg Kroah-Hartman
2013-02-12 20:34 ` [ 20/61] tcp: fix a panic on UP machines in reqsk_fastopen_remove Greg Kroah-Hartman
2013-02-12 20:34 ` [ 21/61] MAINTAINERS: Stephen Hemminger email change Greg Kroah-Hartman
2013-02-12 20:34 ` [ 22/61] ipv6: fix header length calculation in ip6_append_data() Greg Kroah-Hartman
2013-02-12 20:34 ` [ 23/61] macvlan: fix macvlan_get_size() Greg Kroah-Hartman
2013-02-12 20:34 ` [ 24/61] net: calxedaxgmac: throw away overrun frames Greg Kroah-Hartman
2013-02-12 20:34 ` [ 25/61] net/mlx4_en: Fix bridged vSwitch configuration for non SRIOV mode Greg Kroah-Hartman
2013-02-12 20:34 ` [ 26/61] net/mlx4_core: Set number of msix vectors under SRIOV mode to firmware defaults Greg Kroah-Hartman
2013-02-12 20:34 ` [ 27/61] tcp: fix incorrect LOCKDROPPEDICMPS counter Greg Kroah-Hartman
2013-02-12 20:34 ` [ 28/61] isdn/gigaset: fix zero size border case in debug dump Greg Kroah-Hartman
2013-02-12 20:34 ` [ 29/61] netxen: fix off by one bug in netxen_release_tx_buffer() Greg Kroah-Hartman
2013-02-12 20:34 ` [ 30/61] r8169: remove the obsolete and incorrect AMD workaround Greg Kroah-Hartman
2013-02-12 20:34 ` [ 31/61] net: loopback: fix a dst refcounting issue Greg Kroah-Hartman
2013-02-12 20:34 ` [ 32/61] IP_GRE: Fix kernel panic in IP_GRE with GRE csum Greg Kroah-Hartman
2013-02-12 20:34 ` [ 33/61] pktgen: correctly handle failures when adding a device Greg Kroah-Hartman
2013-02-12 20:34 ` [ 34/61] ipv6: do not create neighbor entries for local delivery Greg Kroah-Hartman
2013-02-12 20:34 ` [ 35/61] via-rhine: Fix bugs in NAPI support Greg Kroah-Hartman
2013-02-12 20:34 ` [ 36/61] packet: fix leakage of tx_ring memory Greg Kroah-Hartman
2013-02-12 20:34 ` [ 37/61] ipv6/ip6_gre: fix error case handling in ip6gre_tunnel_xmit() Greg Kroah-Hartman
2013-02-12 20:34 ` [ 38/61] atm/iphase: rename fregt_t -> ffreg_t Greg Kroah-Hartman
2013-02-12 20:34 ` [ 39/61] xen/netback: shutdown the ring if it contains garbage Greg Kroah-Hartman
2013-02-12 20:35 ` [ 40/61] xen/netback: dont leak pages on failure in xen_netbk_tx_check_gop Greg Kroah-Hartman
2013-02-12 20:35 ` [ 41/61] xen/netback: free already allocated memory on failure in xen_netbk_get_requests Greg Kroah-Hartman
2013-02-12 20:35 ` [ 42/61] netback: correct netbk_tx_err to handle wrap around Greg Kroah-Hartman
2013-02-12 20:35 ` [ 43/61] ipv4: Remove output route check in ipv4_mtu Greg Kroah-Hartman
2013-02-12 20:35 ` [ 44/61] ipv4: Dont update the pmtu on mtu locked routes Greg Kroah-Hartman
2013-02-12 20:35 ` [ 45/61] ipv6: Add an error handler for icmp6 Greg Kroah-Hartman
2013-02-12 20:35 ` [ 46/61] ipv4: Invalidate the socket cached route on pmtu events if possible Greg Kroah-Hartman
2013-02-12 20:35 ` [ 47/61] ipv4: Add a socket release callback for datagram sockets Greg Kroah-Hartman
2013-02-12 20:35 ` [ 48/61] ipv4: Fix route refcount on pmtu discovery Greg Kroah-Hartman
2013-02-12 20:35 ` [ 49/61] sctp: refactor sctp_outq_teardown to insure proper re-initalization Greg Kroah-Hartman
2013-02-12 20:35 ` [ 50/61] net: sctp: sctp_setsockopt_auth_key: use kzfree instead of kfree Greg Kroah-Hartman
2013-02-12 20:35 ` [ 51/61] net: sctp: sctp_endpoint_free: zero out secret key data Greg Kroah-Hartman
2013-02-12 20:35 ` [ 52/61] tcp: detect SYN/data drop when F-RTO is disabled Greg Kroah-Hartman
2013-02-12 20:35 ` [ 53/61] tcp: fix an infinite loop in tcp_slow_start() Greg Kroah-Hartman
2013-02-12 20:35 ` [ 54/61] tcp: frto should not set snd_cwnd to 0 Greg Kroah-Hartman
2013-02-12 20:35 ` [ 55/61] tcp: fix for zero packets_in_flight was too broad Greg Kroah-Hartman
2013-02-12 20:35 ` [ 56/61] tcp: dont abort splice() after small transfers Greg Kroah-Hartman
2013-02-12 20:35 ` [ 57/61] tcp: splice: fix an infinite loop in tcp_read_sock() Greg Kroah-Hartman
2013-02-12 20:35 ` [ 58/61] tcp: fix splice() and tcp collapsing interaction Greg Kroah-Hartman
2013-02-12 20:35 ` [ 59/61] net: splice: avoid high order page splitting Greg Kroah-Hartman
2013-02-12 20:35 ` [ 60/61] net: splice: fix __splice_segment() Greg Kroah-Hartman
2013-02-12 20:35 ` [ 61/61] drm/nouveau: add lockdep annotations Greg Kroah-Hartman
2013-02-13  3:35   ` Peter Hurley
2013-02-13  9:33     ` Arend van Spriel
2013-02-13  9:43       ` Ben Skeggs
2013-02-13  9:52         ` Arend van Spriel
2013-02-13 17:46         ` Marcin Slusarz
2013-02-13 18:38           ` Marcin Slusarz
2013-02-13  7:06 ` [ 00/61] 3.7.8-stable review Satoru Takeuchi
2013-02-13 15:51 ` Shuah Khan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130212203420.469918899@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=agk@redhat.com \
    --cc=db@kavod.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=snitzer@redhat.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).