From: Peter Zijlstra <peterz@infradead.org>
To: Roland Dreier <rdreier@cisco.com>
Cc: Ingo Molnar <mingo@elte.hu>, Pavel Machek <pavel@ucw.cz>,
	linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org,
	Paul Mackerras <paulus@samba.org>,
	Anton Blanchard <anton@samba.org>,
	general@lists.openfabrics.org, akpm@linux-foundation.org,
	torvalds@linux-foundation.org
Subject: Re: [ofa-general] Re: [GIT PULL] please pull ummunotify
Date: Mon, 12 Oct 2009 19:33:22 +0200
Message-ID: <1255368802.8392.26.camel@twins>
In-Reply-To: <ada3a5uq1dk.fsf@cisco.com>

On Wed, 2009-10-07 at 15:34 -0700, Roland Dreier wrote:
>  > So I looked a little deeper into this, and I don't think (even with the
>  > filtering extensions) that perf events are directly applicable to this
>  > problem.  The first issue is that, assuming I'm understanding the
>  > comment in perf_event.c:
>  > 
>  >         /*
>  >          * Raw tracepoint data is a severe data leak, only allow root to
>  >          * have these.
>  >          */
>  > 
>  > currently tracepoints can only be used by privileged processes.  A key
>  > feature of ummunotify is that ordinary unprivileged processes can use it.
>  > 
>  > So would it be acceptable to add something like PERF_TYPE_MMU_NOTIFIER
>  > as a way of letting unprivileged userspace get access to just MMU events
>  > for their own process?  Clearly this touches core infrastructure and is
>  > not as simple as just adding two tracepoints.
>  > 
>  > Then, assuming we have some way to create an "MMU notifier" perf event,
>  > we need a way for userspace to specify which address ranges it would
>  > like events for (I don't think the string filter expression used by
>  > existing trace filtering works, because if userspace is looking at a few
>  > hundred regions, then the size of the filtering expression explodes, and
>  > adding or removing a single range becomes a pain).  So I guess a new
>  > ioctl() to add/remove ranges for MMU_NOTIFIER perf events?
>  > 
>  > I think filtering is needed, because otherwise events for ranges that
>  > are not of interest are just a waste of resources to generate and
>  > process, and make losing good events because of overflow much more
>  > likely.
>  > 
>  > We still have the problem of lost events if the mmap buffer overflows,
>  > but userspace should be able to size the buffer so that such events are
>  > rare I guess.
>  > 
>  > In the end this seems to just take the ummunotify code I have, and make
>  > it be a new type of perf counter instead of a character special device.
>  > I'd actually be OK with that, since having an oddball new char dev
>  > interface is not particularly nice.  But on the other hand just
>  > multiplexing a new type of thing under perf events is not all that much
>  > better.  What do you think?
> 
> Ingo/Peter/<anyone suggesting perf events> -- can you comment on this
> plan of creating PERF_TYPE_MMU_NOTIFIER for perf events to implement
> ummunotify?  To me it looks like a wash -- the main difference is how
> userspace gets the magic ummunotify file descriptor, either by
> open("/dev/ummunotify") or by perf_event_open(...PERF_TYPE_MMU_NOTIFIER...),
> but pretty much everything else stays pretty much the same in terms of
> how much kernel code is involved.  We do reuse the perf events mmap
> buffer code but I think that ends up being more complicated than
> returning events via read().
> 
> Anyway, before I spend the time converting over to the new
> infrastructure and causing the MPI guys to churn their code, I'd like to
> make sure that this is what you guys have in mind.
> 
> (By the way, after thinking about this more, I really do think that
> filtering events by address range is a must-have -- with filtering,
> userspace can map sufficient buffer space to avoid losing events for a
> given number of regions; without filtering, events might get lost just
> because of invalidate events for ranges userspace didn't even care about)

I think something like one of the following would work:

PERF_TYPE_SOFTWARE, PERF_COUNT_SW_MUNMAP + $filter

or

PERF_TYPE_TRACEPOINT, //events/vm/munmap/id + $filter

As for the read/poll issue, I think we can do something like
PERF_FORMAT_BLOCK which would make read() block when ->count hasn't
changed, and make poll() work without requiring a mmap().

As to the filter, we can do one of two things: add a simple single-range
filter to perf_event_attr, which is something ia64 has hardware support
for IIRC, or possibly use this trace filter muck.
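
A minimal sketch of what the single-range filter check could look like,
purely as an illustration: struct range_filter and range_filter_match()
are hypothetical names, not fields or functions of perf_event_attr or
of any existing kernel code. It assumes the filter is a half-open
interval [start, end) and that an unmap event carries a start address
and a length.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-range filter: half-open interval [start, end). */
struct range_filter {
	uint64_t start;
	uint64_t end;
};

/*
 * Return true if an unmap of [addr, addr + len) touches the filtered
 * range.  Empty unmaps, and unmaps whose end wraps past 2^64, never
 * match.
 */
static bool range_filter_match(const struct range_filter *f,
			       uint64_t addr, uint64_t len)
{
	uint64_t unmap_end = addr + len;

	if (len == 0 || unmap_end < addr)
		return false;

	/* Two half-open intervals overlap iff each starts before the other ends. */
	return addr < f->end && f->start < unmap_end;
}
```

An event would be generated only when the check returns true, so unmaps
outside the range of interest cost a comparison rather than a wakeup.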

Would something like that be sufficient? With such events only
generating a wakeup (poll) when the unmap actually happens, you'd not
even need an mmap() buffer to keep up with that.



