All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrea Arcangeli <aarcange@redhat.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Huangpeng (Peter)" <peter.huangpeng@huawei.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	qemu-devel@nongnu.org, Pavel Emelyanov <xemul@parallels.com>,
	Hugh Dickins <hughd@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Andres Lagar-Cavilla <andreslc@google.com>,
	Andy Lutomirski <luto@amacapital.net>,
	linux-mm <linux-mm@kvack.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Rik van Riel <riel@redhat.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	zhang.zhanghailiang@huawei.com,
	Sanidhya Kashyap <sanidhya.gatech@gmail.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Peter Feiner <pfeiner@google.com>, Mel Gorman <mgorman@suse.de>,
	KVM list <kvm@vger.kernel.org>
Subject: Re: [PATCH 5/7] userfaultfd: switch to exclusive wakeup for blocking reads
Date: Tue, 16 Jun 2015 14:17:47 +0200	[thread overview]
Message-ID: <20150616121747.GJ18909@redhat.com> (raw)
In-Reply-To: <CA+55aFxZ2_Nix6-PrNE+yN87T02CdqG-y+piHXg=5AMGOiJy2A@mail.gmail.com>

On Mon, Jun 15, 2015 at 08:41:24PM -1000, Linus Torvalds wrote:
> On Mon, Jun 15, 2015 at 12:19 PM, Andrea Arcangeli <aarcange@redhat.com> wrote:
> >
> > Yes, it would leave the other blocked, how is it different from having
> > just 1 reader and it gets killed?
> 
> Either is completely wrong. But the read() code can at least see that
> "I'm returning early due to a signal, so I'll wake up any other
> waiters".
> 
> Poll simply *cannot* do that. Because by definition poll always
> returns without actually clearing the thing that caused the wakeup.
> 
> So for "poll()", using exclusive waits is wrong very much by
> definition. For read(), you *can* use exclusive waits correctly, it
> just requires you to wake up others if you don't read all the data
> (either due to being killed by a signal, or because the read was
> incomplete).

There's no interface to do wakeone with poll so I haven't thought much
about it frankly but intuitively it didn't look radically different as
long as poll checks every fd revent it gets. If I was to patch it to
introduce wakeone in poll I'd think more about it of course. Perhaps
I've been overoptimistic here.

> What does qemu have to do with anything?
> 
> We don't implement kernel interfaces that are broken, and that can
> leave processes blocked when they shouldn't be blocked. We also don't
> implement kernel interfaces that only work with one program and then
> say "if that program is broken, it's not our problem".

I'm testing with the stresstest application not with qemu, qemu cannot
take advantage of this anyway because it uses a single thread so far
and it uses poll not blocking reads. The stresstest suite listens to
events with one thread per CPU and it interleaves poll usage with
blocking reads at every bounce, and it's working correctly so far.

> > I'm not saying doing wakeone is easy [...]
> 
> Bullshit, Andrea.
> 
> That's *exactly* what you said in the commit message for the broken
> patch that I complained about. And I quote:

Please don't quote me out of context, and quote me in full if you
quote me:

"I'm not saying doing wakeone is easy and it's enough to flip a switch
everywhere to get it everywhere"

In the above paragraph (that you quoted in a truncated version of it)
I was talking in general, not specific to userfaultfd. I meant in
general wakeone is not easy.

> patch that I complained about. And I quote:
> 
>   "Blocking reads can easily use exclusive wakeups. Poll in theory
> could too but there's no poll_wait_exclusive in common code yet"

With "I'm not saying doing wakeone is easy and it's enough to flip a
switch everywhere to get it everywhere" I intended exactly to clarify
that "Blocking reads can easily use exclusive wakeups" was in the
context of userfaultfd only.

With regard to the patch you still haven't told what exactly what
runtime breakage I shall expect from my simple change. The case you
mentioned about a thread that gets killed is irrelevant because an
userfault would get missed anyway, if a task listening to userfaultfd
get killed after it received any uffd_msg structure. Wakeone or
wakeall won't move the needle for that case. There's no broadcast of
userfaults to all readers, even with wakeall, only the first reader
that wakes up, gets the messages, the others returns to sleep
immediately.

WARNING: multiple messages have this Message-ID (diff)
From: Andrea Arcangeli <aarcange@redhat.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Huangpeng (Peter)" <peter.huangpeng@huawei.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	qemu-devel@nongnu.org, Pavel Emelyanov <xemul@parallels.com>,
	Hugh Dickins <hughd@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Andres Lagar-Cavilla <andreslc@google.com>,
	Andy Lutomirski <luto@amacapital.net>,
	linux-mm <linux-mm@kvack.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Rik van Riel <riel@redhat.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	zhang.zhanghailiang@huawei.com,
	Sanidhya Kashyap <sanidhya.gatech@gmail.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Peter Feiner <pfeiner@google.com>, Mel Gorman <mgorman@suse.de>,
	KVM list <kvm@vger.kernel.org>
Subject: Re: [PATCH 5/7] userfaultfd: switch to exclusive wakeup for blocking reads
Date: Tue, 16 Jun 2015 14:17:47 +0200	[thread overview]
Message-ID: <20150616121747.GJ18909@redhat.com> (raw)
In-Reply-To: <CA+55aFxZ2_Nix6-PrNE+yN87T02CdqG-y+piHXg=5AMGOiJy2A@mail.gmail.com>

On Mon, Jun 15, 2015 at 08:41:24PM -1000, Linus Torvalds wrote:
> On Mon, Jun 15, 2015 at 12:19 PM, Andrea Arcangeli <aarcange@redhat.com> wrote:
> >
> > Yes, it would leave the other blocked, how is it different from having
> > just 1 reader and it gets killed?
> 
> Either is completely wrong. But the read() code can at least see that
> "I'm returning early due to a signal, so I'll wake up any other
> waiters".
> 
> Poll simply *cannot* do that. Because by definition poll always
> returns without actually clearing the thing that caused the wakeup.
> 
> So for "poll()", using exclusive waits is wrong very much by
> definition. For read(), you *can* use exclusive waits correctly, it
> just requires you to wake up others if you don't read all the data
> (either due to being killed by a signal, or because the read was
> incomplete).

There's no interface to do wakeone with poll so I haven't thought much
about it frankly but intuitively it didn't look radically different as
long as poll checks every fd revent it gets. If I was to patch it to
introduce wakeone in poll I'd think more about it of course. Perhaps
I've been overoptimistic here.

> What does qemu have to do with anything?
> 
> We don't implement kernel interfaces that are broken, and that can
> leave processes blocked when they shouldn't be blocked. We also don't
> implement kernel interfaces that only work with one program and then
> say "if that program is broken, it's not our problem".

I'm testing with the stresstest application not with qemu, qemu cannot
take advantage of this anyway because it uses a single thread so far
and it uses poll not blocking reads. The stresstest suite listens to
events with one thread per CPU and it interleaves poll usage with
blocking reads at every bounce, and it's working correctly so far.

> > I'm not saying doing wakeone is easy [...]
> 
> Bullshit, Andrea.
> 
> That's *exactly* what you said in the commit message for the broken
> patch that I complained about. And I quote:

Please don't quote me out of context, and quote me in full if you
quote me:

"I'm not saying doing wakeone is easy and it's enough to flip a switch
everywhere to get it everywhere"

In the above paragraph (that you quoted in a truncated version of it)
I was talking in general, not specific to userfaultfd. I meant in
general wakeone is not easy.

> patch that I complained about. And I quote:
> 
>   "Blocking reads can easily use exclusive wakeups. Poll in theory
> could too but there's no poll_wait_exclusive in common code yet"

With "I'm not saying doing wakeone is easy and it's enough to flip a
switch everywhere to get it everywhere" I intended exactly to clarify
that "Blocking reads can easily use exclusive wakeups" was in the
context of userfaultfd only.

With regard to the patch you still haven't told what exactly what
runtime breakage I shall expect from my simple change. The case you
mentioned about a thread that gets killed is irrelevant because an
userfault would get missed anyway, if a task listening to userfaultfd
get killed after it received any uffd_msg structure. Wakeone or
wakeall won't move the needle for that case. There's no broadcast of
userfaults to all readers, even with wakeall, only the first reader
that wakes up, gets the messages, the others returns to sleep
immediately.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Andrea Arcangeli <aarcange@redhat.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@intel.com>,
	zhang.zhanghailiang@huawei.com, KVM list <kvm@vger.kernel.org>,
	Pavel Emelyanov <xemul@parallels.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Hugh Dickins <hughd@google.com>,
	"Huangpeng (Peter)" <peter.huangpeng@huawei.com>,
	qemu-devel@nongnu.org,
	Sanidhya Kashyap <sanidhya.gatech@gmail.com>,
	linux-mm <linux-mm@kvack.org>,
	Andres Lagar-Cavilla <andreslc@google.com>,
	Mel Gorman <mgorman@suse.de>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andy Lutomirski <luto@amacapital.net>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Peter Feiner <pfeiner@google.com>
Subject: Re: [Qemu-devel] [PATCH 5/7] userfaultfd: switch to exclusive wakeup for blocking reads
Date: Tue, 16 Jun 2015 14:17:47 +0200	[thread overview]
Message-ID: <20150616121747.GJ18909@redhat.com> (raw)
In-Reply-To: <CA+55aFxZ2_Nix6-PrNE+yN87T02CdqG-y+piHXg=5AMGOiJy2A@mail.gmail.com>

On Mon, Jun 15, 2015 at 08:41:24PM -1000, Linus Torvalds wrote:
> On Mon, Jun 15, 2015 at 12:19 PM, Andrea Arcangeli <aarcange@redhat.com> wrote:
> >
> > Yes, it would leave the other blocked, how is it different from having
> > just 1 reader and it gets killed?
> 
> Either is completely wrong. But the read() code can at least see that
> "I'm returning early due to a signal, so I'll wake up any other
> waiters".
> 
> Poll simply *cannot* do that. Because by definition poll always
> returns without actually clearing the thing that caused the wakeup.
> 
> So for "poll()", using exclusive waits is wrong very much by
> definition. For read(), you *can* use exclusive waits correctly, it
> just requires you to wake up others if you don't read all the data
> (either due to being killed by a signal, or because the read was
> incomplete).

There's no interface to do wakeone with poll so I haven't thought much
about it frankly but intuitively it didn't look radically different as
long as poll checks every fd revent it gets. If I was to patch it to
introduce wakeone in poll I'd think more about it of course. Perhaps
I've been overoptimistic here.

> What does qemu have to do with anything?
> 
> We don't implement kernel interfaces that are broken, and that can
> leave processes blocked when they shouldn't be blocked. We also don't
> implement kernel interfaces that only work with one program and then
> say "if that program is broken, it's not our problem".

I'm testing with the stresstest application not with qemu, qemu cannot
take advantage of this anyway because it uses a single thread so far
and it uses poll not blocking reads. The stresstest suite listens to
events with one thread per CPU and it interleaves poll usage with
blocking reads at every bounce, and it's working correctly so far.

> > I'm not saying doing wakeone is easy [...]
> 
> Bullshit, Andrea.
> 
> That's *exactly* what you said in the commit message for the broken
> patch that I complained about. And I quote:

Please don't quote me out of context, and quote me in full if you
quote me:

"I'm not saying doing wakeone is easy and it's enough to flip a switch
everywhere to get it everywhere"

In the above paragraph (that you quoted in a truncated version of it)
I was talking in general, not specific to userfaultfd. I meant in
general wakeone is not easy.

> patch that I complained about. And I quote:
> 
>   "Blocking reads can easily use exclusive wakeups. Poll in theory
> could too but there's no poll_wait_exclusive in common code yet"

With "I'm not saying doing wakeone is easy and it's enough to flip a
switch everywhere to get it everywhere" I intended exactly to clarify
that "Blocking reads can easily use exclusive wakeups" was in the
context of userfaultfd only.

With regard to the patch you still haven't told what exactly what
runtime breakage I shall expect from my simple change. The case you
mentioned about a thread that gets killed is irrelevant because an
userfault would get missed anyway, if a task listening to userfaultfd
get killed after it received any uffd_msg structure. Wakeone or
wakeall won't move the needle for that case. There's no broadcast of
userfaults to all readers, even with wakeall, only the first reader
that wakes up, gets the messages, the others returns to sleep
immediately.

  reply	other threads:[~2015-06-16 12:18 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-06-15 17:22 [PATCH 0/7] userfault21 update Andrea Arcangeli
2015-06-15 17:22 ` [Qemu-devel] " Andrea Arcangeli
2015-06-15 17:22 ` Andrea Arcangeli
2015-06-15 17:22 ` [PATCH 1/7] userfaultfd: require UFFDIO_API before other ioctls Andrea Arcangeli
2015-06-15 17:22   ` [Qemu-devel] " Andrea Arcangeli
2015-06-15 17:22   ` Andrea Arcangeli
2015-06-15 18:11   ` Linus Torvalds
2015-06-15 18:11     ` [Qemu-devel] " Linus Torvalds
2015-06-15 21:43     ` Andrea Arcangeli
2015-06-15 21:43       ` [Qemu-devel] " Andrea Arcangeli
2015-06-15 21:43       ` Andrea Arcangeli
2015-06-15 21:55       ` Linus Torvalds
2015-06-15 21:55         ` [Qemu-devel] " Linus Torvalds
2015-06-15 17:22 ` [PATCH 2/7] userfaultfd: propagate the full address in THP faults Andrea Arcangeli
2015-06-15 17:22   ` [Qemu-devel] " Andrea Arcangeli
2015-06-15 17:22   ` Andrea Arcangeli
2015-06-15 17:22 ` [PATCH 3/7] userfaultfd: allow signals to interrupt a userfault Andrea Arcangeli
2015-06-15 17:22   ` [Qemu-devel] " Andrea Arcangeli
2015-06-15 17:22   ` Andrea Arcangeli
2015-06-15 17:22 ` [PATCH 4/7] userfaultfd: avoid missing wakeups during refile in userfaultfd_read Andrea Arcangeli
2015-06-15 17:22   ` [Qemu-devel] " Andrea Arcangeli
2015-06-15 17:22   ` Andrea Arcangeli
2015-06-15 17:22 ` [PATCH 5/7] userfaultfd: switch to exclusive wakeup for blocking reads Andrea Arcangeli
2015-06-15 17:22   ` [Qemu-devel] " Andrea Arcangeli
2015-06-15 17:22   ` Andrea Arcangeli
2015-06-15 18:19   ` Linus Torvalds
2015-06-15 18:19     ` [Qemu-devel] " Linus Torvalds
2015-06-15 22:19     ` Andrea Arcangeli
2015-06-15 22:19       ` [Qemu-devel] " Andrea Arcangeli
2015-06-15 22:19       ` Andrea Arcangeli
2015-06-16  6:41       ` Linus Torvalds
2015-06-16  6:41         ` [Qemu-devel] " Linus Torvalds
2015-06-16  6:41         ` Linus Torvalds
2015-06-16 12:17         ` Andrea Arcangeli [this message]
2015-06-16 12:17           ` [Qemu-devel] " Andrea Arcangeli
2015-06-16 12:17           ` Andrea Arcangeli
2015-06-15 17:22 ` [PATCH 6/7] userfaultfd: Revert "userfaultfd: waitqueue: add nr wake parameter to __wake_up_locked_key" Andrea Arcangeli
2015-06-15 17:22   ` [Qemu-devel] " Andrea Arcangeli
2015-06-15 17:22   ` Andrea Arcangeli
2015-06-15 17:22 ` [PATCH 7/7] userfaultfd: selftest Andrea Arcangeli
2015-06-15 17:22   ` [Qemu-devel] " Andrea Arcangeli
2015-06-15 17:22   ` Andrea Arcangeli
2015-10-12 15:04 ` [PATCH 0/7] userfault21 update Patrick Donnelly
2015-10-12 15:04   ` [Qemu-devel] " Patrick Donnelly
2015-10-12 15:04   ` Patrick Donnelly
2015-10-19 21:42   ` Andrea Arcangeli
2015-10-19 21:42     ` [Qemu-devel] " Andrea Arcangeli
2015-10-19 21:42     ` Andrea Arcangeli
2015-10-19 21:42     ` Andrea Arcangeli
2015-10-20 13:44     ` Patrick Donnelly
2015-10-20 13:44       ` [Qemu-devel] " Patrick Donnelly
2015-10-20 13:44       ` Patrick Donnelly

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150616121747.GJ18909@redhat.com \
    --to=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=andreslc@google.com \
    --cc=dave.hansen@intel.com \
    --cc=dgilbert@redhat.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=kirill@shutemov.name \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@amacapital.net \
    --cc=mgorman@suse.de \
    --cc=pbonzini@redhat.com \
    --cc=peter.huangpeng@huawei.com \
    --cc=pfeiner@google.com \
    --cc=qemu-devel@nongnu.org \
    --cc=riel@redhat.com \
    --cc=sanidhya.gatech@gmail.com \
    --cc=torvalds@linux-foundation.org \
    --cc=xemul@parallels.com \
    --cc=zhang.zhanghailiang@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.