linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Jeff Moyer <jmoyer@redhat.com>,
	Hannes Reinecke <hare@suse.de>, Jens Axboe <axboe@kernel.dk>
Subject: [PATCH 3.10 62/80] block: fix race between request completion and timeout handling
Date: Tue, 26 Nov 2013 16:57:31 -0800	[thread overview]
Message-ID: <20131127005645.246080115@linuxfoundation.org> (raw)
In-Reply-To: <20131127005640.934155527@linuxfoundation.org>

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jeff Moyer <jmoyer@redhat.com>

commit 4912aa6c11e6a5d910264deedbec2075c6f1bb73 upstream.

crocode i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support shpchp ioatdma dca be2net sg ses enclosure ext4 mbcache jbd2 sd_mod crc_t10dif ahci megaraid_sas(U) dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]

Pid: 491, comm: scsi_eh_0 Tainted: G        W  ----------------   2.6.32-220.13.1.el6.x86_64 #1 IBM  -[8722PAX]-/00D1461
RIP: 0010:[<ffffffff8124e424>]  [<ffffffff8124e424>] blk_requeue_request+0x94/0xa0
RSP: 0018:ffff881057eefd60  EFLAGS: 00010012
RAX: ffff881d99e3e8a8 RBX: ffff881d99e3e780 RCX: ffff881d99e3e8a8
RDX: ffff881d99e3e8a8 RSI: ffff881d99e3e780 RDI: ffff881d99e3e780
RBP: ffff881057eefd80 R08: ffff881057eefe90 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff881057f92338
R13: 0000000000000000 R14: ffff881057f92338 R15: ffff883058188000
FS:  0000000000000000(0000) GS:ffff880040200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000006d3ec0 CR3: 000000302cd7d000 CR4: 00000000000406b0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process scsi_eh_0 (pid: 491, threadinfo ffff881057eee000, task ffff881057e29540)
Stack:
 0000000000001057 0000000000000286 ffff8810275efdc0 ffff881057f16000
<0> ffff881057eefdd0 ffffffff81362323 ffff881057eefe20 ffffffff8135f393
<0> ffff881057e29af8 ffff8810275efdc0 ffff881057eefe78 ffff881057eefe90
Call Trace:
 [<ffffffff81362323>] __scsi_queue_insert+0xa3/0x150
 [<ffffffff8135f393>] ? scsi_eh_ready_devs+0x5e3/0x850
 [<ffffffff81362a23>] scsi_queue_insert+0x13/0x20
 [<ffffffff8135e4d4>] scsi_eh_flush_done_q+0x104/0x160
 [<ffffffff8135fb6b>] scsi_error_handler+0x35b/0x660
 [<ffffffff8135f810>] ? scsi_error_handler+0x0/0x660
 [<ffffffff810908c6>] kthread+0x96/0xa0
 [<ffffffff8100c14a>] child_rip+0xa/0x20
 [<ffffffff81090830>] ? kthread+0x0/0xa0
 [<ffffffff8100c140>] ? child_rip+0x0/0x20
Code: 00 00 eb d1 4c 8b 2d 3c 8f 97 00 4d 85 ed 74 bf 49 8b 45 00 49 83 c5 08 48 89 de 4c 89 e7 ff d0 49 8b 45 00 48 85 c0 75 eb eb a4 <0f> 0b eb fe 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 1f 44 00 00
RIP  [<ffffffff8124e424>] blk_requeue_request+0x94/0xa0
 RSP <ffff881057eefd60>

The RIP is this line:
        BUG_ON(blk_queued_rq(rq));

After digging through the code, I think there may be a race between the
request completion and the timer handler running.

A timer is started for each request put on the device's queue (see
blk_start_request->blk_add_timer).  If the request does not complete
before the timer expires, the timer handler (blk_rq_timed_out_timer)
will mark the request complete atomically:

static inline int blk_mark_rq_complete(struct request *rq)
{
        return test_and_set_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags);
}

and then call blk_rq_timed_out.  The latter function will call
scsi_times_out, which will return one of BLK_EH_HANDLED,
BLK_EH_RESET_TIMER or BLK_EH_NOT_HANDLED.  If BLK_EH_RESET_TIMER is
returned, blk_clear_rq_complete is called, and blk_add_timer is again
called to simply wait longer for the request to complete.

Now, if the request happens to complete while this is going on, what
happens?  Given that we know the completion handler will bail if it
finds the REQ_ATOM_COMPLETE bit set, we need to focus on the completion
handler running after that bit is cleared.  So, from the above
paragraph, after the call to blk_clear_rq_complete.  If the completion
sets REQ_ATOM_COMPLETE before the BUG_ON in blk_add_timer, we go boom
there (I haven't seen this in the cores).  Next, if we get the
completion before the call to list_add_tail, then the timer will
eventually fire for an old req, which may either be freed or reallocated
(there is evidence that this might be the case).  Finally, if the
completion comes in *after* the addition to the timeout list, I think
it's harmless.  The request will be removed from the timeout list,
req_atom_complete will be set, and all will be well.

This will only actually explain the coredumps *IF* the request
structure was freed, reallocated *and* queued before the error handler
thread had a chance to process it.  That is possible, but it may make
sense to keep digging for another race.  I think that if this is what
was happening, we would see other instances of this problem showing up
as null pointer or garbage pointer dereferences, for example when the
request structure was not re-used.  It looks like we actually do run
into that situation in other reports.

This patch moves the BUG_ON(test_bit(REQ_ATOM_COMPLETE,
&req->atomic_flags)); from blk_add_timer to the only caller that could
trip over it (blk_start_request).  It then inverts the calls to
blk_clear_rq_complete and blk_add_timer in blk_rq_timed_out to address
the race.  I've boot tested this patch, but nothing more.

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Acked-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 block/blk-core.c    |    1 +
 block/blk-timeout.c |    3 +--
 2 files changed, 2 insertions(+), 2 deletions(-)

--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2229,6 +2229,7 @@ void blk_start_request(struct request *r
 	if (unlikely(blk_bidi_rq(req)))
 		req->next_rq->resid_len = blk_rq_bytes(req->next_rq);
 
+	BUG_ON(test_bit(REQ_ATOM_COMPLETE, &req->atomic_flags));
 	blk_add_timer(req);
 }
 EXPORT_SYMBOL(blk_start_request);
--- a/block/blk-timeout.c
+++ b/block/blk-timeout.c
@@ -90,8 +90,8 @@ static void blk_rq_timed_out(struct requ
 		__blk_complete_request(req);
 		break;
 	case BLK_EH_RESET_TIMER:
-		blk_clear_rq_complete(req);
 		blk_add_timer(req);
+		blk_clear_rq_complete(req);
 		break;
 	case BLK_EH_NOT_HANDLED:
 		/*
@@ -173,7 +173,6 @@ void blk_add_timer(struct request *req)
 		return;
 
 	BUG_ON(!list_empty(&req->timeout_list));
-	BUG_ON(test_bit(REQ_ATOM_COMPLETE, &req->atomic_flags));
 
 	/*
 	 * Some LLDs, like scsi, peek at the timeout to prevent a



  parent reply	other threads:[~2013-11-27  1:47 UTC|newest]

Thread overview: 84+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-11-27  0:56 [PATCH 3.10 00/80] 3.10.21-stable review Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 01/80] ACPICA: DeRefOf operator: Update to fully resolve FieldUnit and BufferField refs Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 02/80] libertas: potential oops in debugfs Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 03/80] aacraid: prevent invalid pointer dereference Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 04/80] ACPICA: Return error if DerefOf resolves to a null package element Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 05/80] ACPICA: Fix for a Store->ArgX when ArgX contains a reference to a field Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 06/80] USB: mos7840: fix tiocmget error handling Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 07/80] can: kvaser_usb: fix usb endpoints detection Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 08/80] crypto: ansi_cprng - Fix off by one error in non-block size request Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 09/80] crypto: s390 - Fix aes-cbc IV corruption Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 10/80] can: c_can: Fix RX message handling, handle lost message before EOB Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 11/80] ipc,shm: correct error return value in shmctl (SHM_UNLOCK) Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 12/80] ipc,shm: fix shm_file deletion races Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 13/80] drm/nv50-/disp: remove dcb_outp_match call, and related variables Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 14/80] drm/nva3-/disp: fix hda eld writing, needs to be padded Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 15/80] SUNRPC: dont map EKEYEXPIRED to EACCES in call_refreshresult Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 16/80] sched, idle: Fix the idle polling state logic Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 17/80] PCI: Allow PCIe Capability link-related register access for switches Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 18/80] PCI: Remove PCIe Capability version checks Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 19/80] PCI: Support PCIe Capability Slot registers only for ports with slots Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 20/80] perf/ftrace: Fix paranoid level for enabling function tracer Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 21/80] ACPI / EC: Ensure lock is acquired before accessing ec struct members Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 22/80] ACPI / video: Quirk initial backlight level 0 Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 23/80] ACPI / hotplug: Fix handle_root_bridge_removal() Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 24/80] ACPI / hotplug: Do not execute "insert in progress" _OST Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 25/80] rt2x00: fix a crash bug in the HT descriptor handling fix Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 26/80] rt2x00: check if device is still available on rt2x00mac_flush() Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 27/80] rt2x00: rt2800lib: fix VGC adjustment for RT5592 Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 28/80] rt2x00: fix HT TX descriptor settings regression Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 29/80] Revert "ima: policy for RAMFS" Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 30/80] exec/ptrace: fix get_dumpable() incorrect tests Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 31/80] ALSA: 6fire: Fix probe of multiple cards Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 32/80] ALSA: compress: fix drain calls blocking other compress functions Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 33/80] ALSA: compress: fix drain calls blocking other compress functions (v6) Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 34/80] ALSA: msnd: Avoid duplicated driver name Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 35/80] ALSA: hda - Add support of ALC255 codecs Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 36/80] ALSA: hda - Enable SPDIF for Acer TravelMate 6293 Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 37/80] ALSA: hda - Make sure mute LEDs stay on during runtime suspend (Realtek) Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 38/80] ALSA: hda - Add support for CX20952 Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 39/80] ALSA: hda - Add pincfg fixup for ASUS W5A Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 40/80] ALSA: hda - Fix Line Out automute on Realtek multifunction jacks Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 41/80] ALSA: hda - Check keep_eapd_on before inv_eapd Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 42/80] ALSA: hda - Dont turn off EAPD for headphone on Lenovo N100 Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 43/80] ALSA: hda - Dont clear the power state at snd_hda_codec_reset() Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 44/80] ALSA: hda - Fix unbalanced runtime PM notification at resume Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 45/80] ALSA: hda - Fix the headphone jack detection on Sony VAIO TX Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 46/80] ALSA: hda - Add headset quirk for Dell Inspiron 3135 Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 47/80] ALSA: hda - Provide missing pin configs for VAIO with ALC260 Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 48/80] NFSv4: Fix a use-after-free situation in _nfs4_proc_getlk() Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 49/80] NFSv4: fix NULL dereference in open recover Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 50/80] NFSv4: dont fail on missing fattr " Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 51/80] NFSv4: dont reprocess cached open CLAIM_PREVIOUS Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 52/80] NFSv4: Fix state reference counting in _nfs4_opendata_reclaim_to_nfs4_state Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 53/80] nfsd: return better errors to exportfs Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 54/80] nfsd: split up nfsd_setattr Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 55/80] nfsd: make sure to balance get/put_write_access Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 56/80] x86/microcode/amd: Tone down printk(), dont treat a missing firmware file as an error Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 57/80] KVM: x86: fix emulation of "movzbl %bpl, %eax" Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 58/80] ftrace/x86: skip over the breakpoint for ftrace caller Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 59/80] KVM: IOMMU: hva align mapping page size Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 60/80] arm/arm64: KVM: Fix hyp mappings of vmalloc regions Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 61/80] hwmon: (lm90) Fix max6696 alarm handling Greg Kroah-Hartman
2013-11-27  0:57 ` Greg Kroah-Hartman [this message]
2013-11-27  0:57 ` [PATCH 3.10 63/80] block: fix a probe argument to blk_register_region Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 64/80] block: properly stack underlying max_segment_size to DM device Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 65/80] powerpc/52xx: fix build breakage for MPC5200 LPBFIFO module Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 66/80] powerpc/vio: use strcpy in modalias_show Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 67/80] powerpc/powernv: Add PE to its own PELTV Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 68/80] powerpc: ppc64 address space capped at 32TB, mmap randomisation disabled Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 69/80] powerpc/signals: Mark VSX not saved with small contexts Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 70/80] slub: Handle NULL parameter in kmem_cache_flags Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 71/80] SUNRPC: Fix a data corruption issue when retransmitting RPC calls Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 72/80] mei: nfc: fix memory leak in error path Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 73/80] usb: hub: Clear Port Reset Change during init/resume Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 74/80] rt2800usb: slow down TX status polling Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 75/80] s390/vtime: correct idle time calculation Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 76/80] configfs: fix race between dentry put and lookup Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 77/80] cris: media platform drivers: fix build Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 78/80] dmi: add support for exact DMI matches in addition to substring matching Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 79/80] drm/i915: quirk away phantom LVDS on Intels D510MO mainboard Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 80/80] drm/i915: No LVDS hardware on Intel D410PT and D425KT Greg Kroah-Hartman
2013-11-27 12:57 ` [PATCH 3.10 00/80] 3.10.21-stable review Guenter Roeck
2013-11-27 22:29 ` Shuah Khan
2013-11-28 10:55 ` Satoru Takeuchi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20131127005645.246080115@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=axboe@kernel.dk \
    --cc=hare@suse.de \
    --cc=jmoyer@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).