All of lore.kernel.org
 help / color / mirror / Atom feed
From: Gregory Haskins <ghaskins@novell.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	avi@redhat.com, davidel@xmailserver.org,
	paulmck@linux.vnet.ibm.com, mingo@elte.hu
Subject: Re: [KVM-RFC PATCH 1/2] eventfd: add an explicit srcu based	notifier interface
Date: Tue, 16 Jun 2009 12:17:22 -0400	[thread overview]
Message-ID: <4A37C592.2030407@novell.com> (raw)
In-Reply-To: <20090616154150.GA17494@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 8836 bytes --]

Michael S. Tsirkin wrote:
> On Tue, Jun 16, 2009 at 11:20:18AM -0400, Gregory Haskins wrote:
>   
>>>>> eventfd can't detect this state. But the callers know in what context they are.
>>>>> So the *caller* of eventfd_signal_task makes sure of this: if you are in a task,
>>>>> you can call eventfd_signal_task() if not, you must call eventfd_signal.
>>>>>
>>>>>
>>>>>   
>>>>>       
>>>>>           
>>>> Hmm, this is an interesting idea, but I think it would be problematic in
>>>> real-world applications for the long-term.  For instance, the -rt tree
>>>> and irq-threads .config option in the process of merging into mainline
>>>> changes context types for established code.  Therefore, what might be
>>>> "hardirq/softirq" logic today may execute in a kthread tomorrow.
>>>>     
>>>>         
>>> That's OK, it's always safe to call eventfd_signal: eventfd_signal_task is just
>>> an optimization. I think everyone not in the context of a system call or vmexit
>>> can just call eventfd_signal_task.
>>>   
>>>       
>>                                  ^^^^^^^^^^^^^^^^^^^^
>>
>> I assume you meant s/eventfd_signal_task/eventfd_signal there?
>>     
>
> Yea. Sorry.
>   

np.  I knew what you meant ;)
>   
>>>   
>>>       
>>>>  I
>>>> think its dangerous to try to solve the problem with caller provided
>>>> info:  the caller may be ignorant of its true state.
>>>>     
>>>>         
>>> I assume this wasn't clear enough: the idea is that you only
>>> calls eventfd_signal_task if you know you are on a systemcall path.
>>> If you are ignorant of the state, call eventfd_signal.
>>>   
>>>       
>> Well, its not a matter of correctness.  Its more for optimal
>> performance.  If I have PCI pass-though injecting interrupts from
>> hardirq in mainline, clearly eventfd_signal() is proper.  In -rt, the
>> hardirq is transparently converted to a kthread, so technically
>> eventfd_signal_task() would work
>>     
>
> I think it's wrong to sleep for a long time while handling interrupts even if
> you technically can with threaded interrupts.

Well, this is somewhat of an orthogonal issue so I don't want to open
this can of worms per se.

But, one of the long-term goals of the threaded-irq construct is to
eliminate the need for the traditional "bh" contexts to do work.  The
idea, of course, is that the threaded-irq context can do all of the work
traditionally shunted to tasklets/softirqs/workqueues directly, so why
bother switching contexts.  So, the short answer is that its not
necessarily wrong to sleep or to do significant processing from a
threaded-irq.

In any case, threaded-irqs are just one example.  I will try to
highlight others below.

>  For that matter, I think you can
> sleep while within a spinlock if preempt is on
Yes, in fact the spinlocks are mutexes in this mode, so the locks
themselves can sleep.


> , but that does not mean you
> should - and I think you will get warnings in the log if you do. No?
>
>   
Nope, sleeping is fine (voluntary or involuntary).  The idea is its all
governed by priority, and priority-inheritance, and a scheduler that is
free to make fine-grained decisions (due to the broadly preemptible
kernel).  But this is getting off-topic so I will stop.

>> (at least for the can_sleep() part, not
>> for current->mm per se).  But in this case, the PCI logic would not know
>> it was converted to a kthread.  It all happens transparently in the
>> low-level code and the pci code is unmodified.
>>
>> In this case, your proposal would have the passthrough path invoking
>> irqfd with eventfd_signal().  It  would therefore still shunt to a
>> workqueue to inject the interrupt, even though it would have been
>> perfectly fine to just inject it directly because taking
>> mutex_lock(&kvm->irq_lock) is legal.
>>     
>
> This specific issue should just be addressed by making it possible
> to inject the interrupt from an atomic context.
>   

I assume you mean
s/mutex_lock(&kvm->irq_lock)/spin_lock(&kvm->irq_lock)?  If so, I agree
this is a good idea.  TBH: I am more concerned about the iosignalfd path
w.r.t. the premptiblity of the callback.  We can optimize the virtio-net
interface, for instance, once we make this ->signal() callback
preemptible.  Having irqfd ->signal() preemptible is just a bonus, but
we could work around it by fixing irq_lock as well, I agree.

>   
>>  Perhaps I am over-optimizing, but
>> this is the scenario I am concerned about and what I was trying to
>> address with preemptible()/can_sleep().
>>     
>
> What, a path that is never invoked without threaded interrupts?
> Yes, I would say it currently looks like an over-optimization.
>   

You are only seeing part of the picture though.  This is a cascading
pattern.
>   
>> I think your idea is a good one to address the current->mm portion.  It
>> would only ever be safe to access the MM context from syscall/vmexit
>> context, as you point out.  Therefore, I see no problem with
>> implementing something like iosignalfd with eventfd_signal_task().
>>
>> But accessing current->mm is only a subset of the use-cases.  The other
>> use-cases would include the ability to sleep, and possibly the ability
>> to address other->mm.  For these latter cases, I really only need the
>> "can_sleep()" behavior, not the full blown "can_access_current_mm()". 
>> Additionally, the eventfd_signal_task() data at least for iosignalfd is
>> superfluous:  I already know that I can access current->mm by virtue of
>> the design.
>>     
>
> Maybe I misunderstand what iosignalfd is - isn't it true that you get eventfd
> and kvm will signal that when guest performs io write to a specific
> address, and then drivers can get eventfd and poll it to detect
> these writes?
>   

Correct.

> If yes, you have no way to know that the other end of eventfd
> is connected to kvm, so you don't know you can access current->mm.
>   

Well, my intended use for them *does* know its connected to KVM. 
Perhaps there will be others that do not in the future, but like I said
it could be addressed as those actually arise.


>   
>> So since I cannot use it accurately for the hardirq/threaded-irq type
>> case, and I don't actually need it for the iosignalfd case, I am not
>> sure its the right direction (at least for now).  I do think it might
>> have merit for syscal/vmexit uses outside of iosignalfd, but I do not
>> see a current use-case for it so perhaps it can wait until one arises.
>>
>> -Greg
>>     
>
> But, it addresses the CONFIG_PREEMPT off case, which your design doesn't
> seem to.
>   

Well, the problem is that it only addresses it partially in a limited
set of circumstances, and doesn't address the broader problem.  I'll
give you an example:

(First off, lets assume that we are not going to have
eventfd_signal_task(), but rather eventfd_signal() with two option
flags: EVENTFD_SIGNAL_CANSLEEP, and EVENTFD_SIGNAL_CURRENTVALID

Today vbus-enet has a rx-thread and a tx-thread at least partially
because I need process-context to do the copy_to_user(other->mm) type
stuff (and we will need similar for our virtio-net backend as well).  I
also utilize irqfd to do interrupt injection.  Now, since I know that I
have kthread_context, I could modify vbus-enet to inject interrupts with
EVENTFD_SIGNAL_CANSLEEP set, and this is fine.  I know that is safe.

However, the problem is above that!  I would like to optimize out the
tx-thread to begin with with a similar "can_sleep()" pattern whenever
possible.

For instance, if the netif stack calls me to do a transmit and it
happens to be in a sleepable context, it would be nice to just skip
waking up the tx-thread.  I would therefore just do the
copy_to_user(other->mm) + irqfd directly in the netif callback, thereby
skipping the context switch.

So the problem is a pattern that I would like to address generally
outside of just eventfd: "can I be sleepy"?  If yes, do it now, if not
defer it.

So if the stack calls me in a preemptible state, I would like to detect
that and optimize my deferment mechanisms away.  This isn't something
that happens readily today given the way the stacks locking and
softirq-net work, but its moving in that direction (for instance,
threaded softirqs).

This is why I am pushing for a run-time detection mechanism (like
can_sleep()), as opposed to something in the eventfd interface level
(like EVENTFD_SIGNAL_CANSLEEP).  I think at least the CURRENTVALID idea
is a great one, I just don't have a current need for it.  That said, I
do not have a problem with modifing eventfd to provide such information
if Davide et. al. ack it as well.

Does this all make sense?

-Greg



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 266 bytes --]

  reply	other threads:[~2009-06-16 16:17 UTC|newest]

Thread overview: 104+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-06-16  2:29 [KVM-RFC PATCH 0/2] eventfd enhancements for irqfd/iosignalfd Gregory Haskins
2009-06-16  2:29 ` [KVM-RFC PATCH 1/2] eventfd: add an explicit srcu based notifier interface Gregory Haskins
2009-06-16 14:02   ` Michael S. Tsirkin
2009-06-16 14:11     ` Gregory Haskins
2009-06-16 14:38       ` Michael S. Tsirkin
2009-06-16 14:48         ` Gregory Haskins
2009-06-16 14:54           ` Gregory Haskins
2009-06-16 15:16             ` Michael S. Tsirkin
2009-06-16 14:55           ` Michael S. Tsirkin
2009-06-16 15:20             ` Gregory Haskins
2009-06-16 15:41               ` Michael S. Tsirkin
2009-06-16 16:17                 ` Gregory Haskins [this message]
2009-06-16 16:19                   ` Davide Libenzi
2009-06-16 17:01                     ` Gregory Haskins
2009-06-17 16:38                       ` Davide Libenzi
2009-06-17 17:28                         ` Gregory Haskins
2009-06-17 17:44                           ` Davide Libenzi
2009-06-17 19:17                             ` Gregory Haskins
2009-06-17 19:50                               ` Davide Libenzi
2009-06-17 21:48                                 ` Gregory Haskins
2009-06-17 23:21                                   ` Davide Libenzi
2009-06-18  6:23                                     ` Michael S. Tsirkin
2009-06-18 17:52                                       ` Davide Libenzi
2009-06-18 14:01                                     ` Gregory Haskins
2009-06-18 14:01                                       ` Gregory Haskins
2009-06-18 17:44                                       ` Davide Libenzi
2009-06-18 19:04                                         ` Gregory Haskins
2009-06-18 19:04                                           ` Gregory Haskins
2009-06-18 22:03                                           ` Davide Libenzi
2009-06-18 22:47                                             ` Gregory Haskins
2009-06-18 22:47                                               ` Gregory Haskins
2009-06-19 18:51                                             ` Gregory Haskins
2009-06-19 18:51                                               ` [PATCH 1/3] eventfd: Allow waiters to be notified about the eventfd file* going away Gregory Haskins
2009-06-19 18:51                                               ` [PATCH 2/3] eventfd: add generalized notifier interface Gregory Haskins
2009-06-19 18:51                                               ` [PATCH 3/3] eventfd: add internal reference counting to fix notifier race conditions Gregory Haskins
2009-06-19 19:10                                                 ` Davide Libenzi
2009-06-19 21:16                                                   ` Gregory Haskins
2009-06-19 21:26                                                     ` Davide Libenzi
2009-06-19 21:49                                                       ` Gregory Haskins
2009-06-19 21:54                                                         ` Davide Libenzi
2009-06-19 22:47                                                           ` Davide Libenzi
2009-06-20  2:09                                                             ` Gregory Haskins
2009-06-20 21:17                                                               ` Davide Libenzi
2009-06-20 22:11                                                                 ` Davide Libenzi
2009-06-20 23:48                                                                   ` Davide Libenzi
2009-06-21  1:14                                                                     ` Gregory Haskins
2009-06-21 16:51                                                                       ` Davide Libenzi
2009-06-21 18:39                                                                         ` Gregory Haskins
2009-06-21 23:54                                                                           ` Davide Libenzi
2009-06-22 16:05                                                                             ` Gregory Haskins
2009-06-22 16:05                                                                               ` Gregory Haskins
2009-06-22 17:01                                                                               ` Davide Libenzi
2009-06-22 17:43                                                                                 ` Gregory Haskins
2009-06-22 17:43                                                                                   ` Gregory Haskins
2009-06-22 18:03                                                                                   ` Davide Libenzi
2009-06-22 18:31                                                                                     ` Gregory Haskins
2009-06-22 18:31                                                                                       ` Gregory Haskins
2009-06-22 18:40                                                                                       ` Davide Libenzi
2009-06-22 18:41                                                                                     ` Michael S. Tsirkin
2009-06-22 18:51                                                                                       ` Davide Libenzi
2009-06-22 19:05                                                                                         ` Michael S. Tsirkin
2009-06-22 19:26                                                                                           ` Gregory Haskins
2009-06-22 19:29                                                                                             ` Davide Libenzi
2009-06-22 20:06                                                                                               ` Gregory Haskins
2009-06-22 22:53                                                                                                 ` Davide Libenzi
2009-06-23  1:03                                                                                                   ` Gregory Haskins
2009-06-23  1:17                                                                                                     ` Davide Libenzi
2009-06-23  1:26                                                                                                       ` Gregory Haskins
2009-06-23  1:26                                                                                                         ` Gregory Haskins
2009-06-23 14:29                                                                                                         ` Davide Libenzi
2009-06-23 14:37                                                                                                           ` Gregory Haskins
2009-06-23 14:37                                                                                                             ` Gregory Haskins
2009-06-23 14:35                                                                                                             ` Davide Libenzi
2009-06-23 14:42                                                                                                               ` Gregory Haskins
2009-06-23 14:42                                                                                                                 ` Gregory Haskins
2009-06-23 15:04                                                                                                               ` Michael S. Tsirkin
2009-06-22 20:28                                                                                             ` Michael S. Tsirkin
2009-06-22 19:16                                                                                         ` Gregory Haskins
2009-06-22 19:54                                                                                           ` Davide Libenzi
2009-06-24  3:25                                                                                     ` Rusty Russell
2009-06-24 22:45                                                                                       ` Davide Libenzi
2009-06-25 11:42                                                                                         ` Rusty Russell
2009-06-25 16:34                                                                                           ` Davide Libenzi
2009-06-25 17:32                                                                                             ` Gregory Haskins
2009-06-25 18:26                                                                                               ` Michael S. Tsirkin
2009-06-25 18:41                                                                                                 ` Gregory Haskins
2009-06-26 11:23                                                                                                   ` Michael S. Tsirkin
2009-06-23  3:25                                                                             ` Rusty Russell
2009-06-23 14:31                                                                               ` Davide Libenzi
2009-06-25  0:19                                                                                 ` Davide Libenzi
2009-06-21  1:05                                                                 ` Gregory Haskins
2009-06-16 17:54                   ` [KVM-RFC PATCH 1/2] eventfd: add an explicit srcu based notifier interface Michael S. Tsirkin
2009-06-16 18:09                     ` Gregory Haskins
2009-06-17 14:45                       ` Michael S. Tsirkin
2009-06-17 15:02                         ` Gregory Haskins
2009-06-17 16:25                           ` Michael S. Tsirkin
2009-06-17 16:41                             ` Gregory Haskins
2009-06-16 14:17     ` Gregory Haskins
2009-06-16 14:22       ` Gregory Haskins
2009-06-16 14:40     ` Gregory Haskins
2009-06-16 14:46       ` Michael S. Tsirkin
2009-06-18  9:03       ` Avi Kivity
2009-06-18 11:43         ` Gregory Haskins
2009-06-16  2:30 ` [KVM-RFC PATCH 2/2] eventfd: add module reference counting support for registered notifiers Gregory Haskins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4A37C592.2030407@novell.com \
    --to=ghaskins@novell.com \
    --cc=avi@redhat.com \
    --cc=davidel@xmailserver.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=mst@redhat.com \
    --cc=paulmck@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.