Linux-Fsdevel Archive on lore.kernel.org
From: Eric Wong <e@80x24.org>
To: Deepa Dinamani <deepa.kernel@gmail.com>,
	Arnd Bergmann <arnd@arndb.de>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Jason Baron <jbaron@akamai.com>
Cc: linux-kernel@vger.kernel.org, Omar Kilani <omar.kilani@gmail.com>,
	linux-fsdevel@vger.kernel.org
Subject: Re: Strange issues with epoll since 5.0
Date: Sat, 27 Apr 2019 09:33:19 +0000
Message-ID: <20190427093319.sgicqik2oqkez3wk@dcvr>
In-Reply-To: <20190424193903.swlfmfuo6cqnpkwa@dcvr>

Eric Wong <e@80x24.org> wrote:
> Omar Kilani <omar.kilani@gmail.com> wrote:
> > Hi there,
> > 
> > I’m still trying to piece together a reproducible test that triggers
> > this, but I wanted to post in case someone goes “hmmm... change X
> > might have done this”.
> 
> Maybe Davidlohr knows, since he's responsible for most of the
> epoll changes in 5.0.

Well, I am not sure if I am hitting the same problem Omar is
hitting.  But I did find an epoll_pwait regression in 5.0:

epoll_pwait seems unresponsive to SIGURG in my
heavily-parallelized use case[1] on 5.0.9.  I bisected it to
commit 854a6ed56839a40f6b5d02a2962f48841482eec4
("signal: Add restore_user_sigmask()").

Just reverting the fs/eventpoll.c change in 854a6ed56 seems
enough to fix the non-responsive epoll_pwait for me.  I have not
looked deeply into this, but perhaps the signal_pending check in
restore_user_sigmask is racy w.r.t. epoll.  It has been a while
since I have looked at kernel stuff, myself.

Anyways, this revert works; but I'm not 100% sure why...

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index a5d219d920e7..151739d76801 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2247,7 +2247,20 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, events,
 
 	error = do_epoll_wait(epfd, events, maxevents, timeout);
 
-	restore_user_sigmask(sigmask, &sigsaved);
+	/*
+	 * If we changed the signal mask, we need to restore the original one.
+	 * In case we've got a signal while waiting, we do not restore the
+	 * signal mask yet, and we allow do_signal() to deliver the signal on
+	 * the way back to userspace, before the signal mask is restored.
+	 */
+	if (sigmask) {
+		if (error == -EINTR) {
+			memcpy(&current->saved_sigmask, &sigsaved,
+			       sizeof(sigsaved));
+			set_restore_sigmask();
+		} else
+			set_current_blocked(&sigsaved);
+	}
 
 	return error;
 }
@@ -2272,7 +2285,20 @@ COMPAT_SYSCALL_DEFINE6(epoll_pwait, int, epfd,
 
 	err = do_epoll_wait(epfd, events, maxevents, timeout);
 
-	restore_user_sigmask(sigmask, &sigsaved);
+	/*
+	 * If we changed the signal mask, we need to restore the original one.
+	 * In case we've got a signal while waiting, we do not restore the
+	 * signal mask yet, and we allow do_signal() to deliver the signal on
+	 * the way back to userspace, before the signal mask is restored.
+	 */
+	if (sigmask) {
+		if (err == -EINTR) {
+			memcpy(&current->saved_sigmask, &sigsaved,
+			       sizeof(sigsaved));
+			set_restore_sigmask();
+		} else
+			set_current_blocked(&sigsaved);
+	}
 
 	return err;
 }

Comments and/or a proper fix would be greatly appreciated.

[1] my test case is running the cmogstored 1.7.0 test suite
    in an amd64 Debian stable environment.
    test/mgmt_auto_adjust gets stuck and times out after 60s
    on vanilla v5.0.9

    tgz: https://bogomips.org/cmogstored/files/cmogstored-1.7.0.tar.gz
    # Standard autotools install, N=32 or some high-ish number
    ./configure
    make -j$N
    make check -j$N

    # OR git clone https://bogomips.org/cmogstored.git

So, requoting the rest of Omar's original report here, since
I am not sure whether his use case involves epoll_pwait like mine does:

> Omar Kilani <omar.kilani@gmail.com> wrote:
> > Basically, something’s broken (or at least, has changed enough to
> > cause problems in user space) in epoll since 5.0. It’s still broken in
> > 5.1-rc5.
> > 
> > It doesn’t happen 100% of the time. It’s sort of hard to pin down but
> > I’ve observed the following:
> > 
> > * nginx not accepting connections under load
> > * A java app which uses netty / NIO having strange writability
> > semantics on channels, which confuses netty / java enough to not
> > properly flush written data on the socket.
> > 
> > I went and tested these Linux kernels:
> > 
> > 4.20.17
> > 4.19.32
> > 4.14.111
> > 
> > And the issue(s) do not show up there.
> > 
> > I’m still actively chasing this up, and will report back — I haven’t
> > touched kernel code in 15 years so I’m a little rusty. :)
> > 
> > Regards,
> > Omar
