All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mark Syms <mark.syms@citrix.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock
Date: Mon, 8 Oct 2018 13:36:30 +0100	[thread overview]
Message-ID: <1539002191-40831-2-git-send-email-mark.syms@citrix.com> (raw)
In-Reply-To: <1539002191-40831-1-git-send-email-mark.syms@citrix.com>

During a VM stress test we encountered a system lockup and kern.log contained

kernel: [21389.462707] INFO: task python:15480 blocked for more than 120 seconds.
kernel: [21389.462749] Tainted: G O 4.4.0+10 #1
kernel: [21389.462763] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: [21389.462783] python D ffff88019628bc90 0 15480 1 0x00000000
kernel: [21389.462790] ffff88019628bc90 ffff880198f11c00 ffff88005a509c00 ffff88019628c000
kernel: [21389.462795] ffffc90040226000 ffff88019628bd80 fffffffffffffe58 ffff8801818da418
kernel: [21389.462799] ffff88019628bca8 ffffffff815a1cd4 ffff8801818da5c0 ffff88019628bd68
kernel: [21389.462803] Call Trace:
kernel: [21389.462815] [<ffffffff815a1cd4>] schedule+0x64/0x80
kernel: [21389.462877] [<ffffffffa0663624>] find_insert_glock+0x4a4/0x530 [gfs2]
kernel: [21389.462891] [<ffffffffa0660c20>] ? gfs2_holder_wake+0x20/0x20 [gfs2]
kernel: [21389.462903] [<ffffffffa06639ed>] gfs2_glock_get+0x3d/0x330 [gfs2]
kernel: [21389.462928] [<ffffffffa066cff2>] do_flock+0xf2/0x210 [gfs2]
kernel: [21389.462933] [<ffffffffa0671ad0>] ? gfs2_getattr+0xe0/0xf0 [gfs2]
kernel: [21389.462938] [<ffffffff811ba2fb>] ? cp_new_stat+0x10b/0x120
kernel: [21389.462943] [<ffffffffa066d188>] gfs2_flock+0x78/0xa0 [gfs2]
kernel: [21389.462946] [<ffffffff812021e9>] SyS_flock+0x129/0x170
kernel: [21389.462948] [<ffffffff815a57ee>] entry_SYSCALL_64_fastpath+0x12/0x71

On examination of the code it was determined that this code path
is only taken if the selected glock is marked as dead, the supposition
therefore is that by the time schedule as called the glock had been
cleaned up and therefore nothing woke the schedule. Instead of calling
schedule, call schedule_timeout(HZ) so at least we get a chance to
re-evaluate.

On repeating the stress test, the printk message was seen once in the logs
across four servers but no further occurences nor were there any stuck task
log entries. This indicates that when the timeout occured the code repeated
the lookup and did not find the same glock entry but as we hadn't been
woken this means that we would never have been woken.

Signed-off-by: Mark Syms <mark.syms@citrix.com>
Signed-off-by: Tim Smith <tim.smith@citrix.com>
---
 fs/gfs2/glock.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 4614ee2..0a59a01 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -758,7 +758,8 @@ static struct gfs2_glock *find_insert_glock(struct lm_lockname *name,
 	}
 	if (gl && !lockref_get_not_dead(&gl->gl_lockref)) {
 		rcu_read_unlock();
-		schedule();
+		if (schedule_timeout(HZ) == 0)
+			printk(KERN_INFO "find_insert_glock schedule timed out\n");
 		goto again;
 	}
 out:
-- 
1.8.3.1



  reply	other threads:[~2018-10-08 12:36 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-08 12:36 [Cluster-devel] [PATCH 0/2] GFS2 locking patches Mark Syms
2018-10-08 12:36 ` Mark Syms [this message]
2018-10-08 12:56   ` [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock Steven Whitehouse
2018-10-08 12:59     ` Mark Syms
2018-10-08 13:03       ` Steven Whitehouse
2018-10-08 13:10         ` Tim Smith
2018-10-08 13:13           ` Steven Whitehouse
2018-10-08 13:26             ` Tim Smith
2018-10-09  8:13               ` Mark Syms
2018-10-09  8:41                 ` Steven Whitehouse
2018-10-09  8:50                   ` Mark Syms
2018-10-09 12:34               ` Andreas Gruenbacher
2018-10-09 14:00                 ` Tim Smith
2018-10-09 14:47                   ` Andreas Gruenbacher
2018-10-09 15:34                     ` Tim Smith
2018-10-08 13:25         ` Bob Peterson
2018-10-08 12:36 ` [Cluster-devel] [PATCH 2/2] GFS2: Flush the GFS2 delete workqueue before stopping the kernel threads Mark Syms
2018-10-08 12:57   ` Steven Whitehouse
2018-10-08 20:10   ` Bob Peterson
2018-10-08 20:53     ` Mark Syms
     [not found] ` <1603192.QCZcUHDx0E@dhcp-3-135.uk.xensource.com>
     [not found]   ` <CAHc6FU7AyDQ=yEj9WtN+aqtf7jfWs0TGh=Og0fQFVHyYMKHacA@mail.gmail.com>
2018-10-09 15:08     ` [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock Tim Smith
2018-10-10  8:22       ` Tim Smith
2018-10-11  8:14         ` Mark Syms
2018-10-11 19:28           ` Mark Syms
     [not found]           ` <5bbfa461.1c69fb81.1242f.59a7SMTPIN_ADDED_BROKEN@mx.google.com>
2019-03-06 16:14             ` Andreas Gruenbacher

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1539002191-40831-2-git-send-email-mark.syms@citrix.com \
    --to=mark.syms@citrix.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.