* wait queues
@ 2015-04-20  1:23 Ruben Safir
  2015-04-20  1:48 ` Ruben Safir
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Ruben Safir @ 2015-04-20  1:23 UTC (permalink / raw)
  To: kernelnewbies

I'm poring over Love's book in detail, and the section in Chapter 4
on how the wait queue is implemented completely surprised me.

He is recommending that you have to write your own wait queue entry
routine for every process?  Isn't that reckless?

He is suggesting

DEFINE_WAIT(wait);  // what IS wait?

add_wait_queue(q, &wait); // in the current kernel this involves
                          // flag checking and a linked list

while (!condition) {  /* an event we are waiting for */
  prepare_to_wait(&q, &wait, TASK_INTERRUPTIBLE);
  if (signal_pending(current))
        /* handle the signal */;
  schedule();
}

finish_wait(&q, &wait);

He also writes out how this proceeds to function, and one part confuses me:

5.  When the task awakens, it again checks whether the condition is
true.  If it is, it exits the loop.  Otherwise it again calls schedule().


This is not the order that it seems to follow according to the code.

To me it looks like it should:
1 - create the wait queue
2 - add &wait onto queue q
3 - check if the condition is true; if not, enter the while loop
4 - prepare_to_wait(), which changes the status of our &wait to
TASK_INTERRUPTIBLE
5 - check for signals ... notice the process is still moving.  Does it
stop and wait now?
6 - schedule itself on the runtime rbtree... which makes NO sense unless
there was a stoppage I didn't know about.
7 - check the condition again and repeat the while loop
	7a. if the loop ends, finish_wait() ... take it off the queue.



Isn't it reckless to leave it to users to write this code?  You're
begging for a race condition.

Ruben

_______________________________________________
Kernelnewbies mailing list
Kernelnewbies at kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies

^ permalink raw reply	[flat|nested] 16+ messages in thread

* wait queues
  2015-04-20  1:23 wait queues Ruben Safir
@ 2015-04-20  1:48 ` Ruben Safir
  2015-04-20  1:54 ` Fred Chou
  2015-04-20 15:23 ` michi1 at michaelblizek.twilightparadox.com
  2 siblings, 0 replies; 16+ messages in thread
From: Ruben Safir @ 2015-04-20  1:48 UTC (permalink / raw)
  To: kernelnewbies

I assume this is a different wait than the one we covered for
concurrency.


On 04/19/2015 09:23 PM, Ruben Safir wrote:
> I'm poring over Love's book in detail, and the section in Chapter 4
> on how the wait queue is implemented completely surprised me.
> 
> He is recommending that you have to write your own wait queue entry
> routine for every process?  Isn't that reckless?
> 
> He is suggesting
> 
> DEFINE_WAIT(wait);  // what IS wait?
> 
> add_wait_queue(q, &wait); // in the current kernel this involves
>                           // flag checking and a linked list
> 
> while (!condition) {  /* an event we are waiting for */
>   prepare_to_wait(&q, &wait, TASK_INTERRUPTIBLE);
>   if (signal_pending(current))
>         /* handle the signal */;
>   schedule();
> }
> 
> finish_wait(&q, &wait);
> 
> He also writes out how this proceeds to function, and one part confuses me:
> 
> 5.  When the task awakens, it again checks whether the condition is
> true.  If it is, it exits the loop.  Otherwise it again calls schedule().
> 
> 
> This is not the order that it seems to follow according to the code.
> 
> To me it looks like it should:
> 1 - create the wait queue
> 2 - add &wait onto queue q
> 3 - check if the condition is true; if not, enter the while loop
> 4 - prepare_to_wait(), which changes the status of our &wait to
> TASK_INTERRUPTIBLE

See, this here must mean that wait is something else?

> 5 - check for signals ... notice the process is still moving.  Does it
> stop and wait now?
> 6 - schedule itself on the runtime rbtree... which makes NO sense unless
> there was a stoppage I didn't know about.
> 7 - check the condition again and repeat the while loop
> 	7a. if the loop ends, finish_wait() ... take it off the queue.
> 
> 
> 
> Isn't it reckless to leave it to users to write this code?  You're
> begging for a race condition.
> 
> Ruben
> 
> _______________________________________________
> Kernelnewbies mailing list
> Kernelnewbies at kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* wait queues
  2015-04-20  1:23 wait queues Ruben Safir
  2015-04-20  1:48 ` Ruben Safir
@ 2015-04-20  1:54 ` Fred Chou
  2015-04-20  8:57   ` Ruben Safir
  2015-04-20 15:23 ` michi1 at michaelblizek.twilightparadox.com
  2 siblings, 1 reply; 16+ messages in thread
From: Fred Chou @ 2015-04-20  1:54 UTC (permalink / raw)
  To: kernelnewbies



On 20/4/2015 9:23 AM, Ruben Safir wrote:
> I'm poring over Love's book in detail, and the section in Chapter 4
> on how the wait queue is implemented completely surprised me.
> 
> He is recommending that you have to write your own wait queue entry
> routine for every process?  Isn't that reckless?
> 
> He is suggesting
> 
> DEFINE_WAIT(wait);  // what IS wait?
> 
> add_wait_queue(q, &wait); // in the current kernel this involves
>                           // flag checking and a linked list
> 
> while (!condition) {  /* an event we are waiting for */
>   prepare_to_wait(&q, &wait, TASK_INTERRUPTIBLE);
>   if (signal_pending(current))
>         /* handle the signal */;
>   schedule();
> }
> 
> finish_wait(&q, &wait);
> 
> He also writes out how this proceeds to function, and one part confuses me:
> 
> 5.  When the task awakens, it again checks whether the condition is
> true.  If it is, it exits the loop.  Otherwise it again calls schedule().
> 
> 
> This is not the order that it seems to follow according to the code.
> 
> To me it looks like it should:
> 1 - create the wait queue
> 2 - add &wait onto queue q
> 3 - check if the condition is true; if not, enter the while loop
> 4 - prepare_to_wait(), which changes the status of our &wait to
> TASK_INTERRUPTIBLE
> 5 - check for signals ... notice the process is still moving.  Does it
> stop and wait now?
> 6 - schedule itself on the runtime rbtree... which makes NO sense unless
> there was a stoppage I didn't know about.
> 7 - check the condition again and repeat the while loop
> 	7a. if the loop ends, finish_wait() ... take it off the queue.
> 

Could this be a lost wake-up problem?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* wait queues
  2015-04-20  1:54 ` Fred Chou
@ 2015-04-20  8:57   ` Ruben Safir
  0 siblings, 0 replies; 16+ messages in thread
From: Ruben Safir @ 2015-04-20  8:57 UTC (permalink / raw)
  To: kernelnewbies

On 04/19/2015 09:54 PM, Fred Chou wrote:
> Could this be a lost wake-up problem?

That is what it is supposed to solve.  It doesn't help me understand the
code, though.

Ruben

^ permalink raw reply	[flat|nested] 16+ messages in thread

* wait queues
  2015-04-20  1:23 wait queues Ruben Safir
  2015-04-20  1:48 ` Ruben Safir
  2015-04-20  1:54 ` Fred Chou
@ 2015-04-20 15:23 ` michi1 at michaelblizek.twilightparadox.com
  2015-04-20 16:39   ` Ruben Safir
  2 siblings, 1 reply; 16+ messages in thread
From: michi1 at michaelblizek.twilightparadox.com @ 2015-04-20 15:23 UTC (permalink / raw)
  To: kernelnewbies

Hi!

On 21:23 Sun 19 Apr     , Ruben Safir wrote:
> I'm poring over Love's book in detail, and the section in Chapter 4
> on how the wait queue is implemented completely surprised me.
> 
> He is recommending that you have to write your own wait queue entry
> routine for every process?  Isn't that reckless?

I would not recommend that. There are already functions in linux/wait.h for
these purposes like wait_event_interruptible(). 
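
For example, something along these lines (just a minimal sketch, not from
Love's book; the queue head my_wq and the flag data_ready are made-up names
for illustration):

#include <linux/wait.h>
#include <linux/sched.h>

static DECLARE_WAIT_QUEUE_HEAD(my_wq);  /* the wait queue head */
static int data_ready;                  /* the condition we are waiting for */

/* sleeps until data_ready is non-zero or a signal arrives;
 * returns 0 if the condition became true, -ERESTARTSYS on a signal */
static int wait_for_data(void)
{
	return wait_event_interruptible(my_wq, data_ready != 0);
}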

	-Michi
-- 
programing a layer 3+4 network protocol for mesh networks
see http://michaelblizek.twilightparadox.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* wait queues
  2015-04-20 15:23 ` michi1 at michaelblizek.twilightparadox.com
@ 2015-04-20 16:39   ` Ruben Safir
  2015-04-21 15:05     ` michi1 at michaelblizek.twilightparadox.com
  0 siblings, 1 reply; 16+ messages in thread
From: Ruben Safir @ 2015-04-20 16:39 UTC (permalink / raw)
  To: kernelnewbies

On 04/20/2015 11:23 AM, michi1 at michaelblizek.twilightparadox.com wrote:
> I would not recommend that. There are already functions in linux/wait.h for
> these purposes like wait_event_interruptible(). 


Can you do that in the kernel?  Does wait_event_interruptible() create
wait queues?

One of the things that makes this more confusing is that there is
wait()/signal() syntax for concurrency as well.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* wait queues
  2015-04-20 16:39   ` Ruben Safir
@ 2015-04-21 15:05     ` michi1 at michaelblizek.twilightparadox.com
  2015-04-22 11:23       ` wait queues semaphores kernel implementations Ruben Safir
  0 siblings, 1 reply; 16+ messages in thread
From: michi1 at michaelblizek.twilightparadox.com @ 2015-04-21 15:05 UTC (permalink / raw)
  To: kernelnewbies

Hi!

On 12:39 Mon 20 Apr     , Ruben Safir wrote:
> On 04/20/2015 11:23 AM, michi1 at michaelblizek.twilightparadox.com wrote:
> > I would not recommend that. There are already functions in linux/wait.h for
> > these purposes like wait_event_interruptible(). 
> 
> 
>> Can you do that in the kernel?  Does wait_event_interruptible() create
>> wait queues?

No, wait_event_interruptible() waits on an existing waitqueue. If you want to
create a waitqueue, call init_waitqueue_head().
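
Roughly like this (a sketch only; struct mydev and the helper names are
invented for illustration):

#include <linux/wait.h>

struct mydev {
	wait_queue_head_t wq;
	int event_seen;
};

static void mydev_init(struct mydev *dev)
{
	init_waitqueue_head(&dev->wq);      /* create/initialize the waitqueue */
	dev->event_seen = 0;
}

/* waker side, e.g. called from an interrupt handler or another thread */
static void mydev_event(struct mydev *dev)
{
	dev->event_seen = 1;                /* make the condition true ...  */
	wake_up_interruptible(&dev->wq);    /* ... then wake any sleepers   */
}

/* sleeper side */
static int mydev_wait(struct mydev *dev)
{
	return wait_event_interruptible(dev->wq, dev->event_seen != 0);
}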

	-Michi
-- 
programing a layer 3+4 network protocol for mesh networks
see http://michaelblizek.twilightparadox.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* wait queues semaphores kernel implementations
  2015-04-21 15:05     ` michi1 at michaelblizek.twilightparadox.com
@ 2015-04-22 11:23       ` Ruben Safir
  2015-04-22 16:49         ` michi1 at michaelblizek.twilightparadox.com
  0 siblings, 1 reply; 16+ messages in thread
From: Ruben Safir @ 2015-04-22 11:23 UTC (permalink / raw)
  To: kernelnewbies

Ruben QUOTED Previously:

<<< I'm poring over Love's (Kernel) book in detail, and the section in
Chapter 4 on how the wait queue is implemented completely surprised me.

He is recommending that you have to write your own wait queue entry
routine for every process?  Isn't that reckless?

He is suggesting

DEFINE_WAIT(wait);  // what IS wait EXACTLY in this context?

add_wait_queue(q, &wait); // in the current kernel this involves
                          // flag checking and a linked list

while (!condition) {  /* an event we are waiting for */
  prepare_to_wait(&q, &wait, TASK_INTERRUPTIBLE);
  if (signal_pending(current))
        /* handle the signal */;
  schedule();
}

finish_wait(&q, &wait);

He also writes out how this proceeds to function, and one part confuses me:

5.  When the task awakens, it again checks whether the condition is
true.  If it is, it exits the loop.  Otherwise it again calls schedule().


This is not the order that it seems to follow according to the code.

To me it looks like it should:
1 - create the wait queue
2 - add &wait onto queue q
3 - check if the condition is true; if not, enter the while loop
4 - prepare_to_wait(), which changes the status of our &wait to
TASK_INTERRUPTIBLE
5 - check for signals ... notice the process is still moving.  Does it
stop and wait now?
6 - schedule itself on the runtime rbtree... which makes NO sense unless
there was a stoppage I didn't know about.
7 - check the condition again and repeat the while loop
	7a. if the loop ends, finish_wait() ... take it off the queue.



Isn't it reckless to leave it to users to write this code?  You're
begging for a race condition.

Ruben >>



On 04/21/2015 11:05 AM, michi1 at michaelblizek.twilightparadox.com wrote:
> Hi!
> 
> On 12:39 Mon 20 Apr     , Ruben Safir wrote:
>> On 04/20/2015 11:23 AM, michi1 at michaelblizek.twilightparadox.com wrote:
>>> I would not recommend that. There are already functions in linux/wait.h for
>>> these purposes like wait_event_interruptible(). 
>>
>>
>> Can you do that in the kernel?  Does wait_event_interruptible() create
>> wait queues?
> 
> No, wait_event_interruptible() waits on an existing waitqueue. If you want to
> create a waitqueue, call init_waitqueue_head().
> 
> 	-Michi
> 


Here is the confusing part.  This is a discussion of wait and semaphores
in a standard text:

5.6.2  Semaphore Implementation
Recall that the implementation of mutex locks discussed in Section 5.5
suffers from busy waiting. The definitions of the wait() and signal()
semaphore operations just described present the same problem. To
overcome the need for busy waiting, we can modify the definition of the
wait() and signal() operations as follows: When a process executes the
wait() operation and finds that the semaphore value is not positive, it
must wait. However, rather than engaging in busy waiting, the process
can block itself. The block operation places a process into a waiting
queue associated with the semaphore, and the state of the process is
switched to the waiting state. Then control is transferred to the CPU
scheduler, which selects another process to execute.

A process that is blocked, waiting on a semaphore S, should be restarted
when some other process executes a signal() operation. The process is
restarted by a wakeup() operation, which changes the process from the
waiting state to the ready state. The process is then placed in the
ready queue. (The CPU may or may not be switched from the running
process to the newly ready process, depending on the CPU-scheduling
algorithm.)

To implement semaphores under this definition, we define a semaphore as
follows:

typedef struct {
    int value;
    struct process *list;
} semaphore;

Each semaphore has an integer value and a list of processes list. When
a process must wait on a semaphore, it is added to the list of processes. A
signal() operation removes one process from the list of waiting
processes and awakens that process.
Now, the wait() semaphore operation can be defined as

wait(semaphore *S) {
    S->value--;
    if (S->value < 0) {
        add this process to S->list;
        block();
    }
}

and the signal() semaphore operation can be defined as

signal(semaphore *S) {
    S->value++;
    if (S->value <= 0) {
        remove a process P from S->list;
        wakeup(P);
    }
}


Minus the semaphore, that sounds like what we are doing with the wait
list in the scheduler.  But it looks like we are leaving it to the
user.  Why?  It is similar but oddly different, so I'm trying to figure
out what is happening here.

Ruben

^ permalink raw reply	[flat|nested] 16+ messages in thread

* wait queues semaphores kernel implementations
  2015-04-22 11:23       ` wait queues semaphores kernel implementations Ruben Safir
@ 2015-04-22 16:49         ` michi1 at michaelblizek.twilightparadox.com
  0 siblings, 0 replies; 16+ messages in thread
From: michi1 at michaelblizek.twilightparadox.com @ 2015-04-22 16:49 UTC (permalink / raw)
  To: kernelnewbies

Hi!

On 07:23 Wed 22 Apr     , Ruben Safir wrote:
> Ruben QUOTED Previously:
> 
> <<< I'm poring over Love's (Kernel) book in detail, and the section in
> Chapter 4 on how the wait queue is implemented completely surprised me.
> 
> He is recommending that you have to write your own wait queue entry
> routine for every process?  Isn't that reckless?
> 
> He is suggesting
> 
> DEFINE_WAIT(wait);  // what IS wait EXACTLY in this context?

#define DEFINE_WAIT_FUNC(name, function)				\
	wait_queue_t name = {						\
		.private	= current,				\
		.func		= function,				\
		.task_list	= LIST_HEAD_INIT((name).task_list),	\
	}

#define DEFINE_WAIT(name) DEFINE_WAIT_FUNC(name, autoremove_wake_function)

> add_wait_queue(q, &wait); // in the current kernel this involves
>                           // flag checking and a linked list
> 
> while (!condition) {  /* an event we are waiting for */
>   prepare_to_wait(&q, &wait, TASK_INTERRUPTIBLE);
>   if (signal_pending(current))
>         /* handle the signal */;
>   schedule();
> }
> 
> finish_wait(&q, &wait);
> 
> He also writes out how this proceeds to function, and one part confuses me:
> 
> 5.  When the task awakens, it again checks whether the condition is
> true.  If it is, it exits the loop.  Otherwise it again calls schedule().
> 
> 
> This is not the order that it seems to follow according to the code.
> 
> To me it looks like it should:
> 1 - create the wait queue
> 2 - add &wait onto queue q
> 3 - check if the condition is true; if not, enter the while loop
> 4 - prepare_to_wait(), which changes the status of our &wait to
> TASK_INTERRUPTIBLE
> 5 - check for signals ... notice the process is still moving.  Does it
> stop and wait now?
> 6 - schedule itself on the runtime rbtree... which makes NO sense unless
> there was a stoppage I didn't know about.
> 7 - check the condition again and repeat the while loop
> 	7a. if the loop ends, finish_wait() ... take it off the queue.

This is what wait_event_interruptible looks like:
http://lxr.linux.no/linux+*/include/linux/wait.h#L390

Seems like prepare_to_wait is now called before checking the condition, and
there is no separate add_wait_queue() call anymore.
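
Roughly, the macro boils down to something like this (a simplified
paraphrase written as a function, not the actual kernel source; helper
names and details differ between versions):

#include <linux/wait.h>
#include <linux/sched.h>
#include <linux/errno.h>

/* q is the waitqueue head; cond() stands in for the condition expression */
static int my_wait_event_interruptible(wait_queue_head_t *q, int (*cond)(void))
{
	int ret = 0;
	DEFINE_WAIT(wait);

	for (;;) {
		/* prepare_to_wait() (re)adds the entry and sets the task
		 * state in one step, so no separate add_wait_queue() */
		prepare_to_wait(q, &wait, TASK_INTERRUPTIBLE);
		if (cond())
			break;
		if (signal_pending(current)) {
			ret = -ERESTARTSYS;
			break;
		}
		schedule();
	}
	finish_wait(q, &wait);
	return ret;
}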

> Isn't it reckless to leave it to users to write this code?  You're
> begging for a race condition.

I agree. This is why I would not recommend it unless you have a good  reason
to do so.

...
> Minus the semaphore, that sounds like what we are doing with the wait
> list in the scheduler.  But it looks like we are leaving it to the
> user.  Why?  It is similar but oddly different, so I'm trying to figure
> out what is happening here.

The concept behind a waitqueue is not about counting up and down. Basically,
when you call wait_event_* you define what you are waiting for. For example,
you have a socket and want to wait for incoming data. Whenever anything
happens to the socket (e.g. data arrives, an error, ...), somebody calls
wake_up, your thread wakes up, checks whether the condition is true, and then
wait_event_* either goes back to sleep or returns.

The difference is that you can have situations where wait_event_* returns
without anybody even having called wake_up. Also you can have situations with
lots of calls to wake_up, but wait_event_* always goes back to sleep because
the events which happen do not cause your condition to become true.
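
In code, that interaction might look like this (again just a sketch with
invented names; the "socket" is reduced to a flag):

#include <linux/wait.h>

static DECLARE_WAIT_QUEUE_HEAD(sock_wq);
static int sock_has_data;

/* waker: called whenever *anything* happens to the "socket" */
static void socket_event(int data_arrived)
{
	if (data_arrived)
		sock_has_data = 1;       /* only now does the condition hold */
	wake_up_interruptible(&sock_wq); /* sleeper is woken either way ...  */
}

/* sleeper: ... but wait_event_interruptible() only returns once
 * sock_has_data is set (or a signal is pending); after a wake-up with
 * the condition still false it simply goes back to sleep */
static int wait_for_socket_data(void)
{
	return wait_event_interruptible(sock_wq, sock_has_data != 0);
}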

	-Michi
-- 
programing a layer 3+4 network protocol for mesh networks
see http://michaelblizek.twilightparadox.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* wait queues
@ 2015-04-19 10:20 Ruben Safir
  0 siblings, 0 replies; 16+ messages in thread
From: Ruben Safir @ 2015-04-19 10:20 UTC (permalink / raw)
  To: kernelnewbies

I'm poring over Love's book in detail, and the section in Chapter 4
on how the wait queue is implemented completely surprised me.

He is recommending that you have to write your own wait queue entry
routine for every process?  Isn't that reckless?

He is suggesting

DEFINE_WAIT(wait);  // what IS wait?

add_wait_queue(q, &wait); // in the current kernel this involves
                          // flag checking and a linked list

while (!condition) {  /* an event we are waiting for */
  prepare_to_wait(&q, &wait, TASK_INTERRUPTIBLE);
  if (signal_pending(current))
        /* handle the signal */;
  schedule();
}

finish_wait(&q, &wait);

He also writes out how this proceeds to function, and one part confuses me:

5.  When the task awakens, it again checks whether the condition is
true.  If it is, it exits the loop.  Otherwise it again calls schedule().


This is not the order that it seems to follow according to the code.

To me it looks like it should:
1 - create the wait queue
2 - add &wait onto queue q
3 - check if the condition is true; if not, enter the while loop
4 - prepare_to_wait(), which changes the status of our &wait to
TASK_INTERRUPTIBLE
5 - check for signals ... notice the process is still moving.  Does it
stop and wait now?
6 - schedule itself on the runtime rbtree... which makes NO sense unless
there was a stoppage I didn't know about.
7 - check the condition again and repeat the while loop
	7a. if the loop ends, finish_wait() ... take it off the queue.



Isn't it reckless to leave it to users to write this code?  You're
begging for a race condition.

Ruben

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Wait Queues
  2012-11-08 15:39     ` Andres Lagar-Cavilla
@ 2012-11-08 16:48       ` Keir Fraser
  0 siblings, 0 replies; 16+ messages in thread
From: Keir Fraser @ 2012-11-08 16:48 UTC (permalink / raw)
  To: Andres Lagar-Cavilla; +Cc: Tim Deegan, Jan Beulich, xen-devel

On 08/11/2012 15:39, "Andres Lagar-Cavilla" <andreslc@gridcentric.ca> wrote:

>> dom0 vcpu….
> Uhmm. But it seems there is _some_ method to the madness. Luckily mm locks are
> all taken after the p2m lock (and enforced that way). dom0 can grab ... the
> big domain lock? the grant table lock?
> 
> Perhaps we can categorize locks as either reflexive or foreign (not that we have
> abundant space in the spin lock struct to stash more flags) and perform some
> sort of enforcement like what goes on in the mm layer. Xen insults via
> BUG_ON's are a strong conditioning tool for developers. It is certainly
> simpler to tease out the locks that might deadlock dom0 than all possible
> locks, including RCU read-locks.
> 
> What I mean:
> 
> BUG_ON(current->domain != d && lock_is_reflexive)
> An example of a reflexive lock is the per page sharing lock.
> 
> BUG_ON(prepare_to_wait && current->domain->holds_foreign_lock)
> An example of a transitive lock is the grant table lock.
> 
> A third category would entail global locks like the domain list, which are
> identical to a foreign lock wrt this analysis.
> 
> Another benefit of this is that only reflexive locks need to be made
> sleep-capable, not everything under the sun. I.e. the possibility of livelock
> is corralled to apply only to vcpus of the same domain, and then it's avoided
> by making those lock holders re-schedulable.

This sounds possible. RCU read locks will often count as global locks by the
way, as they are most often used as an alternative to taking a global
spinlock or multi-reader lock. So sleeping in RCU critical regions is
generally not going to be a good idea. Perhaps it will turn out that such
regions don't get in your way too often.

> Andres

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Wait Queues
  2012-11-08  7:42   ` Keir Fraser
@ 2012-11-08 15:39     ` Andres Lagar-Cavilla
  2012-11-08 16:48       ` Keir Fraser
  0 siblings, 1 reply; 16+ messages in thread
From: Andres Lagar-Cavilla @ 2012-11-08 15:39 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Andres Lagar-Cavilla, Tim Deegan, Jan Beulich, xen-devel

On Nov 8, 2012, at 2:42 AM, Keir Fraser <keir.xen@gmail.com> wrote:

> On 08/11/2012 03:22, "Andres Lagar-Cavilla" <andreslc@gridcentric.ca> wrote:
> 
>>> I'd like to propose an approach that ensures that as long as some properties are
>>> met, arbitrary wait queue sleep is allowed. Here are the properties:
>>> 1. Third parties servicing a wait queue sleep are indeed third parties. In
>>> other words, dom0 does not do paging.
>>> 2. Vcpus of a wait queue servicing domain may never go to sleep on a wait
>>> queue during a foreign map.
>>> 3. A guest vcpu may go to sleep on a wait queue holding any kinds of locks as
>>> long as it does not hold the p2m lock.
>> 
>> N.B: I understand (now) this may cause any other vcpu contending on a lock
>> held by the wait queue sleeper to not yield to the scheduler and pin its
>> physical cpu.
>> 
>> What I am struggling with is coming up with a solution that doesn't turn
>> hypervisor mm hacking into a locking minefield.
>> 
>> Linux fixes this with many kinds of sleeping synchronization primitives. A
>> task can, for example, hold the mmap semaphore and sleep on a wait queue. Is
>> this the only way out of this mess? Not if wait queues force the vcpu to wake
> up on the same phys cpu it was using at the time of sleeping….
> 
> Well, the forcing to wake up on same phys cpu it slept on is going to be
> fixed. But it's not clear to me how that current restriction makes the
> problem harder? What if you were running on a single-phys-cpu system?
It's not a hard blocker. It's giving up efficiency otherwise. It's a "nice to have" precondition.

> 
> As you have realised, the fact that all locks in Xen are spinlocks makes the
> potential for deadlock very obvious. Someone else gets scheduled and takes
> out the phys cpu by spinning on a lock that someone else is holding while
> they are descheduled.
> 
> Linux-style sleeping mutexes might help. We could add those. They don't help
> as readily as in the Linux case however! In some ways they push the deadlock
> up one level of abstraction, to the virt cpu (vcpu). Consider single-vcpu
> dom0 running a pager -- even if you are careful that the pager itself does
> not acquire any locks that one of its clients may hold-while-sleeping, if
> *anything* running in dom0 can acquire such a lock, you have an obvious
> deadlock, as that will take out the dom0 vcpu and leave it blocked forever
> waiting for a lock that is held while its holder waits for service from the
> dom0 vcpu….
Uhmm. But it seems there is _some_ method to the madness. Luckily mm locks are all taken after the p2m lock (and enforced that way). dom0 can grab ... the big domain lock? the grant table lock?

Perhaps we can categorize locks as either reflexive or foreign (not that we have abundant space in the spin lock struct to stash more flags) and perform some sort of enforcement like what goes on in the mm layer. Xen insults via BUG_ON's are a strong conditioning tool for developers. It is certainly simpler to tease out the locks that might deadlock dom0 than all possible locks, including RCU read-locks.

What I mean:

BUG_ON(current->domain != d && lock_is_reflexive)
An example of a reflexive lock is the per page sharing lock.

BUG_ON(prepare_to_wait && current->domain->holds_foreign_lock)
An example of a transitive lock is the grant table lock.

A third category would entail global locks like the domain list, which are identical to a foreign lock wrt this analysis.

Another benefit of this is that only reflexive locks need to be made sleep-capable, not everything under the sun. I.e. the possibility of livelock is corralled to apply only to vcpus of the same domain, and then it's avoided by making those lock holders re-schedulable.

Andres

> 
> I don't think there is an easy solution here!
> 
> -- Keir
> 
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Wait Queues
  2012-11-08  3:22 ` Andres Lagar-Cavilla
@ 2012-11-08  7:42   ` Keir Fraser
  2012-11-08 15:39     ` Andres Lagar-Cavilla
  0 siblings, 1 reply; 16+ messages in thread
From: Keir Fraser @ 2012-11-08  7:42 UTC (permalink / raw)
  To: Andres Lagar-Cavilla, Tim Deegan, Jan Beulich, xen-devel

On 08/11/2012 03:22, "Andres Lagar-Cavilla" <andreslc@gridcentric.ca> wrote:

>> I'd like to propose an approach that ensures that as long as some properties are
>> met, arbitrary wait queue sleep is allowed. Here are the properties:
>> 1. Third parties servicing a wait queue sleep are indeed third parties. In
>> other words, dom0 does not do paging.
>> 2. Vcpus of a wait queue servicing domain may never go to sleep on a wait
>> queue during a foreign map.
>> 3. A guest vcpu may go to sleep on a wait queue holding any kinds of locks as
>> long as it does not hold the p2m lock.
> 
> N.B: I understand (now) this may cause any other vcpu contending on a lock
> held by the wait queue sleeper to not yield to the scheduler and pin its
> physical cpu.
> 
> What I am struggling with is coming up with a solution that doesn't turn
> hypervisor mm hacking into a locking minefield.
> 
> Linux fixes this with many kinds of sleeping synchronization primitives. A
> task can, for example, hold the mmap semaphore and sleep on a wait queue. Is
> this the only way out of this mess? Not if wait queues force the vcpu to wake
> up on the same phys cpu it was using at the time of sleeping….

Well, the forcing to wake up on same phys cpu it slept on is going to be
fixed. But it's not clear to me how that current restriction makes the
problem harder? What if you were running on a single-phys-cpu system?

As you have realised, the fact that all locks in Xen are spinlocks makes the
potential for deadlock very obvious. Someone else gets scheduled and takes
out the phys cpu by spinning on a lock that someone else is holding while
they are descheduled.

Linux-style sleeping mutexes might help. We could add those. They don't help
as readily as in the Linux case however! In some ways they push the deadlock
up one level of abstraction, to the virt cpu (vcpu). Consider single-vcpu
dom0 running a pager -- even if you are careful that the pager itself does
not acquire any locks that one of its clients may hold-while-sleeping, if
*anything* running in dom0 can acquire such a lock, you have an obvious
deadlock, as that will take out the dom0 vcpu and leave it blocked forever
waiting for a lock that is held while its holder waits for service from the
dom0 vcpu....

I don't think there is an easy solution here!

 -- Keir

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Wait Queues
  2012-11-07 20:54 Wait Queues Andres Lagar-Cavilla
@ 2012-11-08  3:22 ` Andres Lagar-Cavilla
  2012-11-08  7:42   ` Keir Fraser
  0 siblings, 1 reply; 16+ messages in thread
From: Andres Lagar-Cavilla @ 2012-11-08  3:22 UTC (permalink / raw)
  To: Tim Deegan, Keir (Xen.org), Jan Beulich, xen-devel

On Nov 7, 2012, at 3:54 PM, Andres Lagar-Cavilla <andres@gridcentric.ca> wrote:

> Hi all,
> we currently have a problem in the (x86) mm layer. Callers may request the p2m to perform a translation of a gfn to an mfn. Such translation may need to wait for a third party to service it. This happens when:
> 
> - a page needs to be paged in
> - a CoW breaking of a shared page fails due to lack of memory
> 
> Note that paging in may also fail due to lack of memory. In both ENOMEM cases, all the plumbing for a toolstack to be notified and take some corrective action to release some memory and retry is in place. We also have plumbing for pagers in place.
> 
> Ideally we want the internals to be self-contained, so that callers need not be concerned with any of this. A request for a p2m translation may or may not sleep, but on exit from the p2m the caller either has an mfn with a page ref, or an error code due to some other condition.
> 
> Wait queue support in (x86) Xen prevents sleeping on a wait queue if any locks are held, including RCU read-side locks (i.e. BUG_ON(!in_atomic())).
> 
> For this reason, we have not yet implemented sleeping on the p2m. Callers may get errors telling them to retry. A lot of (imho) ugly code is peppered around the hypervisor to deal with the consequences of this. More fundamentally, in some cases there is no possible elegant handling, and guests are crashed (for example, if a page table page is paged out and the hypervisor needs to translate a guest virtual address to a gfn). This limits the applicability of memory paging and sharing.
> 
> One way to solve this would be to ensure no code path liable to sleep in a wait queue is holding any locks at wait queue sleep time. I believe this is doomed. Not just because this is a herculean task. It also makes writing hypervisor code *very* difficult. Anyone trying to throw a p2m translation into a code path needs to think of all possible upstream call sequences. Not even RCU read locks are allowed.
> 
> I'd like to propose an approach that ensures that as long as some properties are met, arbitrary wait queue sleep is allowed. Here are the properties:
> 1. Third parties servicing a wait queue sleep are indeed third parties. In other words, dom0 does not do paging.
> 2. Vcpus of a wait queue servicing domain may never go to sleep on a wait queue during a foreign map.
> 3. A guest vcpu may go to sleep on a wait queue holding any kinds of locks as long as it does not hold the p2m lock.

N.B: I understand (now) this may cause any other vcpu contending on a lock held by the wait queue sleeper to not yield to the scheduler and pin its physical cpu.

What I am struggling with is coming up with a solution that doesn't turn hypervisor mm hacking into a locking minefield.

Linux fixes this with many kinds of sleeping synchronization primitives. A task can, for example, hold the mmap semaphore and sleep on a wait queue. Is this the only way out of this mess? Not if wait queues force the vcpu to wake up on the same phys cpu it was using at the time of sleeping….

Andres

> 4. "Kick" routines in the hypervisor by which service domains effectively wake up a vcpu may only take the p2m lock to do a fix up of the p2m.
> 5. Wait queues can be awakened on a special domain destroy condition.
> 
> Property 1. is hopefully self-evident, and although not enforced in the code it is reasonably simple to do so.
> 
> Property 2. is also self-evident and enforced in the code today.
> 
> Property 3. simplifies reasoning about p2m translations and wait queue sleeping. Provides a clean model for reasoning about what might or might not happen. It will require us to restructure some code paths (i.e. XENMEM_add_to_physmap) that require multiple p2m translations in a single critical region to perform all translations up front.
> 
> Property 4. is already enforced in the code as is right now.
> 
> Property 5. needs adding some logic to the top of domain destruction: set a flag, wake up all vcpus in wait queues. Extra logic on the wait queue side will exit the wait if the destroy flag is set, with an error. Most if not all callers can deal right now with a p2m translation error (other than paging), and unwind and release all their locks.
> 
> I confess my understanding of RCU is not 100% there and I am not sure what will happen to force_quiescent_state. I also understand there is an impedance mismatch wrt "saving" and "restoring" the physical CPU preempt count.
> 
> Thanks,
> Andres

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Wait Queues
@ 2012-11-07 20:54 Andres Lagar-Cavilla
  2012-11-08  3:22 ` Andres Lagar-Cavilla
  0 siblings, 1 reply; 16+ messages in thread
From: Andres Lagar-Cavilla @ 2012-11-07 20:54 UTC (permalink / raw)
  To: Tim Deegan, Keir (Xen.org), Jan Beulich, xen-devel

Hi all,
we currently have a problem in the (x86) mm layer. Callers may request the p2m to perform a translation of a gfn to an mfn. Such translation may need to wait for a third party to service it. This happens when:

- a page needs to be paged in
- a CoW breaking of a shared page fails due to lack of memory

Note that paging in may also fail due to lack of memory. In both ENOMEM cases, all the plumbing for a toolstack to be notified and take some corrective action to release some memory and retry is in place. We also have plumbing for pagers in place.

Ideally we want the internals to be self-contained, so that callers need not be concerned with any of this. A request for a p2m translation may or may not sleep, but on exit from the p2m the caller either has an mfn with a page ref, or an error code due to some other condition.

Wait queue support in (x86) Xen prevents sleeping on a wait queue if any locks are held, including RCU read-side locks (i.e. BUG_ON(!in_atomic())).

For this reason, we have not yet implemented sleeping on the p2m. Callers may get errors telling them to retry. A lot of (imho) ugly code is peppered around the hypervisor to deal with the consequences of this. More fundamentally, in some cases there is no possible elegant handling, and guests are crashed (for example, if a page table page is paged out and the hypervisor needs to translate a guest virtual address to a gfn). This limits the applicability of memory paging and sharing.

One way to solve this would be to ensure no code path liable to sleep in a wait queue is holding any locks at wait queue sleep time. I believe this is doomed. Not just because this is a herculean task. It also makes writing hypervisor code *very* difficult. Anyone trying to throw a p2m translation into a code path needs to think of all possible upstream call sequences. Not even RCU read locks are allowed.

I'd like to propose an approach that ensures that as long as some properties are met, arbitrary wait queue sleep is allowed. Here are the properties:
1. Third parties servicing a wait queue sleep are indeed third parties. In other words, dom0 does not do paging.
2. Vcpus of a wait queue servicing domain may never go to sleep on a wait queue during a foreign map.
3. A guest vcpu may go to sleep on a wait queue holding any kinds of locks as long as it does not hold the p2m lock.
4. "Kick" routines in the hypervisor by which service domains effectively wake up a vcpu may only take the p2m lock to do a fix up of the p2m.
5. Wait queues can be awakened on a special domain destroy condition.

Property 1. is hopefully self-evident, and although not enforced in the code it is reasonably simple to do so.

Property 2. is also self-evident and enforced in the code today.

Property 3. simplifies reasoning about p2m translations and wait queue sleeping. Provides a clean model for reasoning about what might or might not happen. It will require us to restructure some code paths (i.e. XENMEM_add_to_physmap) that require multiple p2m translations in a single critical region to perform all translations up front.

Property 4. is already enforced in the code as is right now.

Property 5. needs adding some logic to the top of domain destruction: set a flag, wake up all vcpus in wait queues. Extra logic on the wait queue side will exit the wait if the destroy flag is set, with an error. Most if not all callers can deal right now with a p2m translation error (other than paging), and unwind and release all their locks.

I confess my understanding of RCU is not 100% there and I am not sure what will happen to force_quiescent_state. I also understand there is an impedance mismatch wrt "saving" and "restoring" the physical CPU preempt count.

Thanks,
Andres

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Wait Queues
@ 2000-12-11 15:43 Carlo Pagano
  0 siblings, 0 replies; 16+ messages in thread
From: Carlo Pagano @ 2000-12-11 15:43 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2019 bytes --]

I am trying to modify a driver that worked great on 2.2.16 to run on 2.4.0-x.
 
My old code was:
 
static struct wait_queue *roundrobin_wait; 
static struct wait_queue *task_stop_wait; 
 
static struct tq_struct roundrobin_task; 
static struct timer_list timeout_timer; 
 
...
 
init_timer(&timeout_timer);              
timeout_timer.function = Timer; 
timeout_timer.data = (unsigned long)&timer_data; 
timeout_timer.expires = jiffies + 3*HZ;
 
void Timer(unsigned long ptr) 
{ 
    struct clientdata *pTimerData = (struct clientdata *) ptr;
 
    if (pTimerData->one_shot_queue_task) {
        // start the main round robin
        queue_task(&roundrobin_task, &tq_scheduler); 
        pTimerData->one_shot_queue_task = FALSE;
    }
 
    /* wake-up the task responsible for the Timeout callbacks round-robin */
    wake_up_interruptible(&roundrobin_wait); 
 
    /* re-schedule this Timer function */
    init_timer(&timeout_timer);              
    timeout_timer.function = Timer; 
    timeout_timer.data = (unsigned long)&timer_data; 
    timeout_timer.expires = jiffies + HZ/100; 
    add_timer(&timeout_timer); 
} 
 
void RoundRobin(void *ptr) 
{ 
    struct clientdata *data = (struct clientdata *) ptr;
 
    interruptible_sleep_on(&roundrobin_wait); 
 
    if (data->queue)    // data->queue set to NULL in Stop()
    {
        /* do whatever you want to do here ... */ 
        OSALTimeoutCallback *pCallback = data->callback; 
 
        pCallback->RoundRobinCallbacks(); 
    }
 
    /* re-register itself, if needed */ 
    roundrobin_task.routine = RoundRobin; //main_round_robin; 
    roundrobin_task.data = (void *) &roundrobin_data; 
    if (data->queue)
    { 
        queue_task(&roundrobin_task, data->queue); 
    } 
    else 
    { 
        wake_up_interruptible(&task_stop_wait); 
 
    } 
} 
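
For comparison, on 2.4 the wait queue declarations would change along these
lines (a sketch only, untested, assuming the rest of the driver stays as
above):

/* 2.4: wait_queue_head_t replaces struct wait_queue * */
static DECLARE_WAIT_QUEUE_HEAD(roundrobin_wait);
static DECLARE_WAIT_QUEUE_HEAD(task_stop_wait);

/* interruptible_sleep_on() is race-prone; waiting on an explicit
 * condition is generally preferred on 2.4, e.g.
 *
 *     wait_event_interruptible(roundrobin_wait, some_condition);
 *
 * while the waker still calls wake_up_interruptible(&roundrobin_wait).
 * If tq_scheduler is gone in your tree, schedule_task(&roundrobin_task)
 * is the usual replacement for queue_task(..., &tq_scheduler). */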
 
 
 
 
 
Carlo Pagano
Software Designer
Trisignal Communications, a division of i-data Technology
(514) 832-3603
carlop@trisignal.com
 

[-- Attachment #2: Type: text/html, Size: 4462 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2015-04-22 16:49 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-20  1:23 wait queues Ruben Safir
2015-04-20  1:48 ` Ruben Safir
2015-04-20  1:54 ` Fred Chou
2015-04-20  8:57   ` Ruben Safir
2015-04-20 15:23 ` michi1 at michaelblizek.twilightparadox.com
2015-04-20 16:39   ` Ruben Safir
2015-04-21 15:05     ` michi1 at michaelblizek.twilightparadox.com
2015-04-22 11:23       ` wait queues semaphores kernel implementations Ruben Safir
2015-04-22 16:49         ` michi1 at michaelblizek.twilightparadox.com
  -- strict thread matches above, loose matches on Subject: below --
2015-04-19 10:20 wait queues Ruben Safir
2012-11-07 20:54 Wait Queues Andres Lagar-Cavilla
2012-11-08  3:22 ` Andres Lagar-Cavilla
2012-11-08  7:42   ` Keir Fraser
2012-11-08 15:39     ` Andres Lagar-Cavilla
2012-11-08 16:48       ` Keir Fraser
2000-12-11 15:43 Carlo Pagano
