* Deterministic behavior for TTY serial
@ 2012-04-17 14:38 Ivo Sieben
  2012-04-19  0:14 ` Greg KH
  2012-04-19 11:19 ` Alan Cox
  0 siblings, 2 replies; 13+ messages in thread
From: Ivo Sieben @ 2012-04-17 14:38 UTC (permalink / raw)
  To: linux-serial; +Cc: Alan Cox, RT

Hello,

We are currently using the TTY framework for serial communication.

We are wondering whether it is possible to give the TTY device more
deterministic behavior (as in "fewer locks & no sleeping").
So, for non-blocking read/write:
- We want to write data directly to the serial_core transmit buffer and
return immediately.
- Incoming data should be buffered; a read takes data directly from
that buffer and returns immediately when no data is available.

Our impression is that the default N_TTY line discipline introduces too
much overhead & locking, which makes it less suitable for
deterministic serial communication on a PREEMPT_RT platform.
Our first thought was that we need some kind of "raw" line
discipline that writes data directly to the serial_core transmit
buffer and buffers incoming data.

But isn't such a line discipline already available?
Or are we missing an option/flag for the N_TTY line discipline that
would achieve the same behavior?

Regards,
Ivo Sieben


* Re: Deterministic behavior for TTY serial
  2012-04-17 14:38 Deterministic behavior for TTY serial Ivo Sieben
@ 2012-04-19  0:14 ` Greg KH
  2012-04-19 15:37   ` Ivo Sieben
  2012-04-19 11:19 ` Alan Cox
  1 sibling, 1 reply; 13+ messages in thread
From: Greg KH @ 2012-04-19  0:14 UTC (permalink / raw)
  To: Ivo Sieben; +Cc: linux-serial, Alan Cox, RT

On Tue, Apr 17, 2012 at 04:38:30PM +0200, Ivo Sieben wrote:
> Hello,
> 
> We are currently using the TTY framework for serial communication.
> 
> We are wondering if it is possible to give the TTY device in more
> deterministic behavior (as in "less locks & no sleeping")

What specifically are you looking for?

> So in case of non blocking read/write behavior:
> - We want directly write data to the serial_core transmit buffer and
> return immediately.

What is "immediately"?

> - Incoming data should be buffered, on a read data is read directly
> from that buffer and when no data available return immediately

That doesn't happen today?

What type of latencies are you seeing today that is bothering you?  What
hardware are you expecting to work in this manner?  What exact UART are
you using?  What happens when the UART buffers data?

thanks,

greg k-h


* Re: Deterministic behavior for TTY serial
  2012-04-17 14:38 Deterministic behavior for TTY serial Ivo Sieben
  2012-04-19  0:14 ` Greg KH
@ 2012-04-19 11:19 ` Alan Cox
  2012-04-19 15:42   ` Ivo Sieben
  1 sibling, 1 reply; 13+ messages in thread
From: Alan Cox @ 2012-04-19 11:19 UTC (permalink / raw)
  To: Ivo Sieben; +Cc: linux-serial, RT

> We are wondering if it is possible to give the TTY device in more
> deterministic behavior (as in "less locks & no sleeping")
> So in case of non blocking read/write behavior:
> - We want directly write data to the serial_core transmit buffer and
> return immediately.

If you have the tty in raw mode then that is basically what the ldisc
code does (plus any flow control you may have selected).

> - Incoming data should be buffered, on a read data is read directly
> from that buffer and when no data available return immediately

Ditto in raw mode, and you can use O_NDELAY or the VMIN/VTIME fields to
optimise block transfer behaviour. We do actually do an additional
memcpy but memory copies of cached memory are so cheap it should be
irrelevant unless trying to do megabit speeds on low end embedded
processors.

> We have the idea that the default N_TTY line discipline introduces too

Based upon what analysis ?

> much overhead & locking behavior what makes it less suitable for
> deterministic serial communication on a PREEMPT_RT platform.

Are you using USB ports ?

Alan


* Re: Deterministic behavior for TTY serial
  2012-04-19  0:14 ` Greg KH
@ 2012-04-19 15:37   ` Ivo Sieben
  2012-04-19 15:46     ` Greg KH
  0 siblings, 1 reply; 13+ messages in thread
From: Ivo Sieben @ 2012-04-19 15:37 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-serial, Alan Cox, RT

Hi Greg,

On 19 April 2012 02:14, Greg KH <gregkh@linuxfoundation.org> wrote:
> On Tue, Apr 17, 2012 at 04:38:30PM +0200, Ivo Sieben wrote:
>> Hello,
>>
>> We are currently using the TTY framework for serial communication.
>>
>> We are wondering if it is possible to give the TTY device in more
>> deterministic behavior (as in "less locks & no sleeping")
>
> What specifically are you looking for?
>

We run an application with a high-priority real-time thread.
This thread does serial communication using non-blocking read/write
file I/O on a tty device, with small amounts of data (24 bytes).
The application runs on a 200 MHz AT2AM9261 processor.
The maximum execution time of both read and write goes up to 200 us.
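
For readers who want to reproduce this kind of setup, here is a minimal
userspace sketch of how a thread might be given a real-time priority
before doing the serial I/O. The helper name make_realtime() is
hypothetical; the use of SCHED_FIFO and of mlockall() is an assumption
about a typical RT application, not something stated in this thread.

```c
#include <errno.h>
#include <sched.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: switch the caller to SCHED_FIFO at priority `prio`
 * and lock its memory so page faults do not add latency.
 * Returns 0 on success, -1 with errno set on failure.
 */
int make_realtime(int prio)
{
    struct sched_param sp = { .sched_priority = prio };
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);

    /* Reject priorities outside the range the system reports. */
    if (lo < 0 || hi < 0 || prio < lo || prio > hi) {
        errno = EINVAL;
        return -1;
    }

    /* Typically fails with EPERM without the needed privileges. */
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
        return -1;

    /* Assumption: the application wants all current and future pages locked. */
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
        return -1;

    return 0;
}
```

A caller would invoke make_realtime(99) once at start-up and treat a
failure as fatal for the real-time path.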

>> So in case of non blocking read/write behavior:
>> - We want directly write data to the serial_core transmit buffer and
>> return immediately.
>
> What is "immediately"?
>

We use non-blocking read & write calls.
We would like every read/write call to complete in less than 100 us.

>> - Incoming data should be buffered, on a read data is read directly
>> from that buffer and when no data available return immediately
>
> That doesn't happen today?
>
> What type of latencies are you seeing today that is bothering you?  What
> hardware are you expecting to work in this manner?  What exact UART are
> you using?  What happens when the UART buffers data?
>
> thanks,
>
> greg k-h

We use a self-written serial_core UART driver for a UART peripheral
in an FPGA on our target board.
--
To unsubscribe from this list: send the line "unsubscribe linux-serial" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Deterministic behavior for TTY serial
  2012-04-19 11:19 ` Alan Cox
@ 2012-04-19 15:42   ` Ivo Sieben
  0 siblings, 0 replies; 13+ messages in thread
From: Ivo Sieben @ 2012-04-19 15:42 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-serial, RT, Greg KH

Hi Alan,

On 19 April 2012 13:19, Alan Cox <alan@linux.intel.com> wrote:
>> We are wondering if it is possible to give the TTY device in more
>> deterministic behavior (as in "less locks & no sleeping")
>> So in case of non blocking read/write behavior:
>> - We want directly write data to the serial_core transmit buffer and
>> return immediately.
>
> If you have the tty in raw mode then that is basically what the ldisc
> code does (plus any flow control you may have selected).
>
>> - Incoming data should be buffered, on a read data is read directly
>> from that buffer and when no data available return immediately
>
> Ditto in raw mode, and you can use O_NDELAY or the VMIN/VTIME fields to
> optimise block transfer behaviour. We do actually do an additional
> memcpy but memory copies of cached memory are so cheap it should be
> irrelevant unless trying to do megabit speeds on low end embedded
> processors.
>

I assume we use raw mode:
- We call the cfmakeraw() function.
- We also set the O_NDELAY flag, VMIN = 1, VTIME = 0
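
A minimal sketch of the setup described above, for reference. The helper
name configure_raw_nonblocking() is hypothetical, and the sketch uses
O_NONBLOCK, which has the same value as O_NDELAY on Linux. It is an
illustration of the termios calls involved, not the actual application
code.

```c
#define _DEFAULT_SOURCE   /* for cfmakeraw() under strict -std= modes */
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

/*
 * Hypothetical helper: put an already opened tty into raw mode with
 * VMIN = 1, VTIME = 0 and make the file descriptor non-blocking.
 * Returns 0 on success, -1 on failure (errno set by the failing call).
 */
int configure_raw_nonblocking(int fd)
{
    struct termios tio;
    int flags;

    if (tcgetattr(fd, &tio) != 0)   /* fails with ENOTTY if fd is not a tty */
        return -1;

    cfmakeraw(&tio);                /* no canonical mode, echo or signals */
    tio.c_cc[VMIN] = 1;
    tio.c_cc[VTIME] = 0;

    if (tcsetattr(fd, TCSANOW, &tio) != 0)
        return -1;

    flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;

    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}
```

With O_NONBLOCK set, a read() on an empty input queue returns -1 with
errno set to EAGAIN regardless of the VMIN/VTIME values.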

>> We have the idea that the default N_TTY line discipline introduces too
>
> Based upon what analysis ?
>

You are right: apart from reading the source code, I have not yet done
any "real" analysis of the N_TTY line discipline.
I will run some extra ftrace tests to find out what causes the delays
in the execution time.
I'll come back to that...

>> much overhead & locking behavior what makes it less suitable for
>> deterministic serial communication on a PREEMPT_RT platform.
>
> Are you using USB ports ?
>

No, it is a UART peripheral in an FPGA on our target board.
We use a self-written serial_core-based driver.

> Alan

Thanks,
Ivo Sieben


* Re: Deterministic behavior for TTY serial
  2012-04-19 15:37   ` Ivo Sieben
@ 2012-04-19 15:46     ` Greg KH
  2012-04-26 14:27       ` Ivo Sieben
  0 siblings, 1 reply; 13+ messages in thread
From: Greg KH @ 2012-04-19 15:46 UTC (permalink / raw)
  To: Ivo Sieben; +Cc: linux-serial, Alan Cox, RT

On Thu, Apr 19, 2012 at 05:37:56PM +0200, Ivo Sieben wrote:
> Hi Greg,
> 
> On 19 April 2012 02:14, Greg KH <gregkh@linuxfoundation.org> wrote:
> > On Tue, Apr 17, 2012 at 04:38:30PM +0200, Ivo Sieben wrote:
> >> Hello,
> >>
> >> We are currently using the TTY framework for serial communication.
> >>
> >> We are wondering if it is possible to give the TTY device in more
> >> deterministic behavior (as in "less locks & no sleeping")
> >
> > What specifically are you looking for?
> >
> 
> We run an application with a real-time thread, running on a high priority.
> This thread does serial communication, using non blocking read/write
> file I/O on a tty device, with small amounts of data (= 24 bytes).
> This application runs on a AT2AM9261 processor, 200 MHz
> The maximum execution time of both the read & write go up to 200 us
> 
> >> So in case of non blocking read/write behavior:
> >> - We want directly write data to the serial_core transmit buffer and
> >> return immediately.
> >
> > What is "immediately"?
> >
> 
> We use non blocking read & write functions
> We would like the read/write functions to always execute less than 100us

Ok, and are you sure that your processor can even do something like
this?  Where is the time being spent when you make these calls?  A read
function should never hit the hardware, only retrieving data from a
buffer in memory, so if your processor can go this fast, it should be
fine.

Have you done profiling to determine exactly what is taking "too long"
for you?  If so, what is the delay?  If not, you should do this :)

> We use a self written serial_core device uart driver that implements a
> driver for a UART peripheral in a FPGA on our target board..

Do you have a pointer to the driver anywhere?  Why isn't it submitted
for inclusion in the main kernel tree?

thanks,

greg k-h


* Re: Deterministic behavior for TTY serial
  2012-04-19 15:46     ` Greg KH
@ 2012-04-26 14:27       ` Ivo Sieben
  2012-05-01 14:30         ` Ivo Sieben
  0 siblings, 1 reply; 13+ messages in thread
From: Ivo Sieben @ 2012-04-26 14:27 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-serial, Alan Cox, RT

Hi

>> We use a self written serial_core device uart driver that implements a
>> driver for a UART peripheral in a FPGA on our target board..
>
> Do you have a pointer to the driver anywhere?  Why isn't it submitted
> for inclusion in the main kernel tree?

This driver is work in progress and not yet mature enough for
inclusion in the main kernel tree.

> Have you done profiling to determine exactly what it taking "too long"
> for you?  If so, what is the delay?  If not, you should do this :)
>

I did some analysis using the ftrace 'function_graph' tracer to find
out what causes the TTY read to take longer than expected.
I use a test application, running at RT priority 99, that writes bursts
of 24 bytes of data to my TTY device. A loopback connector is used, so
the application also reads back these 24 bytes. Non-blocking reads &
writes are used.
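
As an illustration of how the execution time of a single read call can
be measured from userspace (independently of ftrace), here is a minimal
sketch. The helper name timed_read() is hypothetical, and the choice of
CLOCK_MONOTONIC is an assumption; it is not taken from the test
application described above.

```c
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime() */
#include <errno.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: perform one read() and report how long the call
 * took, in microseconds. Returns whatever read() returned; errno from
 * read() is preserved.
 */
ssize_t timed_read(int fd, void *buf, size_t len, int64_t *elapsed_us)
{
    struct timespec t0, t1;
    ssize_t n;
    int saved_errno;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    n = read(fd, buf, len);
    saved_errno = errno;
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *elapsed_us = (int64_t)(t1.tv_sec - t0.tv_sec) * 1000000
                + (t1.tv_nsec - t0.tv_nsec) / 1000;

    errno = saved_errno;
    return n;
}
```

The measured time includes the system call entry and exit, so it is an
upper bound on the time spent inside the TTY layer itself.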

This analysis is still ongoing...
But I found two issues that you might be able to help me explain (but as
I said: I still need to "dive" further into the source).

1)
In some cases the tty_flush_to_ldisc() call (made from
drivers/tty/n_tty.c, line 1599) takes much longer than it does in
other TTY reads...
For trace see: http://pastebin.com/zXCYTLNj

If I understand it correctly, the TTY flip buffer uses a workqueue to
transfer received data from the TTY flip buffer into the buffer of
the line discipline. It seems that the N_TTY line discipline also tries
to actively flush data from the TTY flip buffer into the line discipline
buffer. But my serial device driver already calls tty_flip_buffer_push()
every time a number of bytes has been received (this is initiated
from threaded irq context, by the way).

In the case of this trace: is the workqueue already busy
transferring data to the ldisc because the UART receive interrupt
handling has queued a new work item? So is the N_TTY read()
function actually waiting for that work item to finish? I guess
that in that case a non-blocking read should return EAGAIN, so that
the caller can pick up the data the next time it calls the read
function... right?
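
From the application side, the "return EAGAIN and retry on the next
call" behavior asked about above would be handled roughly as in the
following sketch. Both helper names are hypothetical, and the sketch
says nothing about what N_TTY actually does internally; it only shows
how a caller treats EAGAIN as "no data yet".

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: add O_NONBLOCK to the descriptor's flags. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/*
 * Hypothetical helper: one non-blocking read attempt.
 * Returns 1 and stores the byte count in *got when data was read,
 * 0 when nothing is available right now (EAGAIN / EWOULDBLOCK),
 * -1 on end of file or any other error.
 */
int poll_read(int fd, void *buf, size_t len, size_t *got)
{
    ssize_t n = read(fd, buf, len);

    if (n > 0) {
        *got = (size_t)n;
        return 1;
    }
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;   /* no data yet: try again on the next cycle */
    return -1;
}
```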

2)
In some cases the tty_ldisc_deref() call (made from
drivers/tty/tty_io.c, line 977) takes much longer than it does in
other TTY reads.
For trace see: http://pastebin.com/Nuh5cLGv

After a successful read from the N_TTY line discipline, the TTY
framework drops its reference to the line discipline.
Can the TTY read() block here because other processes are trying
to get or release a reference to the N_TTY line discipline at the
same time?
So in that case: does the high-priority TTY read have to wait for a
lower-priority TTY read/write operation (e.g. from the terminal I/O)?


Regards,
Ivo Sieben

* Re: Deterministic behavior for TTY serial
  2012-04-26 14:27       ` Ivo Sieben
@ 2012-05-01 14:30         ` Ivo Sieben
  2012-05-01 15:04           ` Alan Cox
  0 siblings, 1 reply; 13+ messages in thread
From: Ivo Sieben @ 2012-05-01 14:30 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-serial, Alan Cox, RT

Hello,

2012/4/26 Ivo Sieben <meltedpianoman@gmail.com>:
>
> I did some analyses using the ftrace 'function_graph' tracer to find
> out what causes the TTY read to take longer than expected.
> I use a test application, running on RT prioirty 99 that writes bursts
> of 24 Bytes data to a my TTY device. A loop back connector is used, so
> the application also reads back these 24 bytes. Non blocking reads &
> writes are used.
>
> This analyses is still ongoing...
> Regards,
> Ivo Sieben

I've continued my analysis, and I think I've found one cause of
the non-deterministic read behavior...
(there is also an issue with the flip buffer read handling, but I'm
still investigating that)

On line 47 of tty_ldisc.c a spin lock is defined that guards the line
discipline administration.
It protects two reference counters:
- "users", an atomic counter in the tty_ldisc struct that holds the
  number of active users of the ldisc in each tty instance. Used for
  "idle" handling.
- "refcount", a counter in the tty_ldisc_ops struct that holds the
  number of lines using the discipline (when the user count of a line
  reaches 0, the refcount of the line discipline is decreased by one).
  Only when the refcount is 0 may the line discipline be unregistered.

This spin lock is global. As a result, my high-priority process gets
blocked whenever a lower-priority process holds the lock. Of course
you get priority inheritance, but this adds quite a lot of extra
execution time to the read function (because of the additional spin
lock handling and the scheduling of the lower-priority process).

Since the "users" and "refcount" reference counters have combined
behavior (refcount is decremented when users reaches zero), I don't
see a way to remove this global lock. Any ideas how this can be
improved?
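
To make the coupling between the two counters concrete, here is a small
userspace model of the "decrement, and take the lock only when the count
reaches zero" pattern, loosely modelled on the kernel's
atomic_dec_and_lock() helper. It is a sketch with C11 atomics, not the
tty_ldisc.c code: the function name dec_and_lock() and the use of an
atomic_flag as a stand-in for the global spin lock are assumptions made
for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Userspace model (hypothetical): drop one reference from *users.
 * Returns true with *lock held when the count reached zero, so the
 * caller can update the second counter ("refcount") under the lock.
 * Returns false without holding the lock otherwise.
 */
bool dec_and_lock(atomic_int *users, atomic_flag *lock)
{
    int old = atomic_load(users);

    /* Fast path: the count stays above zero, so the lock is never touched. */
    while (old > 1) {
        if (atomic_compare_exchange_weak(users, &old, old - 1))
            return false;
    }

    /* Slow path: the count may hit zero, so serialize on the global lock. */
    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
        ;   /* spin */

    if (atomic_fetch_sub(users, 1) == 1)
        return true;    /* last user: caller still holds the lock */

    atomic_flag_clear_explicit(lock, memory_order_release);
    return false;
}
```

In this model only the final put of a reference ever contends on the
global lock. Whether the same split is possible for the ldisc counters
depends on the get side as well, which this sketch does not cover.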

We are considering implementing our UART device driver as a "normal"
character device driver and bypassing the TTY framework. But that would
be a pity, since we would have to re-implement some functionality
that is already in the TTY framework.

Regards,
Ivo Sieben


* Re: Deterministic behavior for TTY serial
  2012-05-01 14:30         ` Ivo Sieben
@ 2012-05-01 15:04           ` Alan Cox
       [not found]             ` <CAMSQXEHAyPOF6YghsYmqqyx+N0oMgn5E=znhgFyspMUnaH78ig@mail.gmail.com>
  0 siblings, 1 reply; 13+ messages in thread
From: Alan Cox @ 2012-05-01 15:04 UTC (permalink / raw)
  To: Ivo Sieben; +Cc: Greg KH, linux-serial, Alan Cox, RT

> This Spin Lock is defined globally, This causes that my high priority
> process to get blocked because of a lower priority process that holds
> this spin lock. Of course you get priority inheritance, but this adds
> quite a lot of extra execution time to the read function (because of
> additional spin lock behavior, and scheduling to the lower priority
> process)

It's a spin lock and it's only held across very small numbers of
instructions in any normal path so this rather surprises me - in your
actual capture data can you see what is holding the lock for long times
causing this ?

> Since the "user" and "refcount" reference counters have combined
> behavior (refcount is decremented when users reaches zero), I don't
> see a way to remove this global lock. Any ideas how this can be
> improved?

I've never really thought about it because it should never be contended
in any meaningful way.

Alan


* Re: Deterministic behavior for TTY serial
       [not found]             ` <CAMSQXEHAyPOF6YghsYmqqyx+N0oMgn5E=znhgFyspMUnaH78ig@mail.gmail.com>
@ 2012-05-02  8:38               ` Ivo Sieben
  2012-05-02 12:39                 ` Ivo Sieben
  0 siblings, 1 reply; 13+ messages in thread
From: Ivo Sieben @ 2012-05-02  8:38 UTC (permalink / raw)
  To: Greg KH, linux-serial, Alan Cox, RT

Sorry, here is my response again, now sent to all instead of only to Alan.

2012/5/2 Ivo Sieben <meltedpianoman@gmail.com>:
> Hi,
>
>> It's a spin lock and its only held across very small numbers of
>> instructions in any normal path so this rather surprises me - in your
>> actual capture data can you see what is holding the lock for long times
>> causing this ?
>
> Indeed the lock is only taken for a very short time. So in most
> situations it works fine: the lock is free and can be taken quickly.
> But when I stress my system (e.g. by continuously dumping a lot of
> text data to another serial terminal from the background), my high
> priority task sometimes finds the lock taken. In that case
> additional lock handling comes in: priority inheritance, a context
> switch to the background task that holds the lock, and a context
> switch back after the lock is released again. This makes a normal read
> that takes about 50us increase in execution time to 150us.
>
> The ftrace for a "quick" spin lock:
>
> 0)               |        tty_ldisc_deref() {
> 0)               |          put_ldisc() {
> 0)   8.960 us    |            atomic_dec_and_spin_lock();
> 0)               |            __wake_up() {
> 0)               |              rt_spin_lock() {
> 0)               |                rt_spin_lock_slowlock() {
> 0)   8.720 us    |                  __try_to_take_rt_mutex();
> 0) + 25.360 us   |                }
> 0) + 41.760 us   |              }
> 0)   8.400 us    |              __wake_up_common();
> 0)               |              rt_spin_unlock() {
> 0)   8.240 us    |                rt_spin_lock_slowunlock();
> 0) + 24.320 us   |              }
> 0) ! 107.840 us  |            }
> 0) ! 146.400 us  |          }
> 0) ! 163.120 us  |        }
>
> While the ftrace for a spin lock that is already taken is a lot
> longer, adding a lot of execution time.
>
>  0)               |        tty_ldisc_deref() {
>  0)               |          put_ldisc() {
>  0)   8.640 us    |            atomic_dec_and_spin_lock();
>  0)               |            __wake_up() {
>  0)               |              rt_spin_lock() {
>  0)               |                rt_spin_lock_slowlock() {
>  0)   8.160 us    |                  __try_to_take_rt_mutex();
>  0)               |                  task_blocks_on_rt_mutex() {
>  0)   8.720 us    |                    __rt_mutex_adjust_prio();
>  0)               |                    __rt_mutex_adjust_prio() {
>  0)               |                      task_setprio() {
>  0)               |                        dequeue_task() {
>  0)   9.680 us    |                          update_rq_clock();
>  0)               |                          dequeue_task_rt() {
>  0)   9.200 us    |                            update_curr_rt();
>  0) + 10.400 us   |                            dequeue_rt_stack();
>  0) + 44.560 us   |                          }
>  0) + 80.000 us   |                        }
>  0)               |                        enqueue_task() {
>  0)   9.120 us    |                          update_rq_clock();
>  0)               |                          enqueue_task_rt() {
>  0)   8.400 us    |                            dequeue_rt_stack();
>  0)   9.280 us    |                            __enqueue_rt_entity();
>  0) + 42.240 us   |                          }
>  0) + 75.840 us   |                        }
>  0)   8.560 us    |                        prio_changed_rt();
>  0)   8.000 us    |                        __task_rq_unlock();
>  0) ! 216.240 us  |                      }
>  0) ! 233.280 us  |                    }
>  0) ! 271.360 us  |                  }
>  0)   8.160 us    |                  __try_to_take_rt_mutex();
>  0)               |                  schedule() {
>  0)               |                    __schedule() {
>  0)               |                      rcu_note_context_switch() {
>  0)   9.120 us    |                        rcu_preempt_note_context_switch();
>  0) + 25.920 us   |                      }
>  0)               |                      deactivate_task() {
>  0)               |                        dequeue_task() {
>  0)   9.120 us    |                          update_rq_clock();
>  0)               |                          dequeue_task_rt() {
>  0)   8.400 us    |                            update_curr_rt();
>  0)   8.720 us    |                            dequeue_rt_stack();
>  0) + 40.880 us   |                          }
>  0) + 74.480 us   |                        }
>  0) + 91.200 us   |                      }
>  0)               |                      put_prev_task_rt() {
>  0)   8.480 us    |                        update_curr_rt();
>  0) + 24.560 us   |                      }
>  0)   8.000 us    |                      pick_next_task_stop();
>  0)               |                      pick_next_task_rt() {
>  0)   8.560 us    |                        pick_next_rt_entity();
>  0) + 25.280 us   |                      }
>  0)   ==========> |
>  0)               |                      asm_do_IRQ() {
>  0)               |                        irq_enter() {
>  0)   8.320 us    |                          idle_cpu();
>  0) + 24.160 us   |                        }
>  0)               |                        generic_handle_irq() {
>  0)               |                          handle_level_irq() {
>  0)   8.240 us    |                            at91_aic_mask_irq();
>  0)   8.160 us    |                            at91_aic_mask_irq();
>  0)               |                            handle_irq_event() {
>  0)               |                              handle_irq_event_percpu() {
>  0)               |                                periodic_tick_interrupt() {
>  0)               |                                  roserts_timer_hook() {
>  0)               |                                    system_timer_get_highres_time() {
>  0)   10.000 us   |                                      __get_fpga_time_64();
>  0) + 26.880 us   |                                    }
>  0) + 43.920 us   |                                  }
>  0) + 61.520 us   |                                }
>  0)   8.480 us    |                                note_interrupt();
>  0) + 96.320 us   |                              }
>  0) ! 113.200 us  |                            }
>  0)               |                            unmask_irq() {
>  0)   8.160 us    |                              at91_aic_unmask_irq();
>  0) + 24.240 us   |                            }
>  0) ! 195.840 us  |                          }
>  0) ! 212.080 us  |                        }
>  0)   8.800 us    |                        irq_exit();
>  0) ! 278.240 us  |                      }
>  0)   <========== |
>  0)               |                      atomic_notifier_call_chain() {
>  0)               |                        __atomic_notifier_call_chain() {
>  0)   8.240 us    |                          __rcu_read_lock();
>  0)   8.400 us    |                          notifier_call_chain();
>  0)   8.480 us    |                          __rcu_read_unlock();
>  0) + 58.080 us   |                        }
>  0) + 75.920 us   |                      }
>  ------------------------------------------
>  0)  uart_to-493   =>   ksoftir-3
>  ------------------------------------------
>
>  0) + 10.640 us   |  finish_task_switch();
>  0)               |  rt_spin_unlock() {
>  0)               |    rt_spin_lock_slowunlock() {
>  0)               |      wakeup_next_waiter() {
>  0)               |        wake_up_lock_sleeper() {
>  0)               |          try_to_wake_up() {
>  0)               |            activate_task() {
>  0)               |              enqueue_task() {
>  0)   9.120 us    |                update_rq_clock();
>  0)               |                enqueue_task_rt() {
>  0)   8.240 us    |                  dequeue_rt_stack();
>  0)   8.080 us    |                  __enqueue_rt_entity();
>  0) + 39.360 us   |                }
>  0) + 71.840 us   |              }
>  0) + 87.760 us   |            }
>  0)               |            ttwu_do_wakeup() {
>  0)               |              check_preempt_curr() {
>  0)   7.840 us    |                check_preempt_curr_rt();
>  0) + 24.320 us   |              }
>  0) + 43.520 us   |            }
>  0) ! 156.960 us  |          }
>  0) ! 172.960 us  |        }
>  0) ! 190.080 us  |      }
>  0)               |      rt_mutex_adjust_prio() {
>  0)               |        __rt_mutex_adjust_prio() {
>  0)               |          task_setprio() {
>  0)               |            dequeue_task() {
>  0)   8.480 us    |              update_rq_clock();
>  0)               |              dequeue_task_rt() {
>  0)   8.880 us    |                update_curr_rt();
>  0)   8.080 us    |                dequeue_rt_stack();
>  0) + 39.920 us   |              }
>  0) + 72.000 us   |            }
>  0)               |            put_prev_task_rt() {
>  0) + 12.800 us   |              update_curr_rt();
>  0) + 28.000 us   |            }
>  0)   7.600 us    |            set_curr_task_rt();
>  0)               |            enqueue_task() {
>  0)   8.560 us    |              update_rq_clock();
>  0)               |              enqueue_task_rt() {
>  0)   7.760 us    |                dequeue_rt_stack();
>  0)   7.920 us    |                __enqueue_rt_entity();
>  0) + 38.160 us   |              }
>  0) + 69.440 us   |            }
>  0)   7.920 us    |            prio_changed_rt();
>  0)   8.240 us    |            __task_rq_unlock();
>  0) ! 249.760 us  |          }
>  0) ! 265.600 us  |        }
>  0)               |        __schedule() {
>  0)               |          rcu_note_context_switch() {
>  0)   8.800 us    |            rcu_preempt_note_context_switch();
>  0) + 25.200 us   |          }
>  0)   9.360 us    |          update_rq_clock();
>  0)               |          put_prev_task_rt() {
>  0)   8.560 us    |            update_curr_rt();
>  0) + 24.720 us   |          }
>  0)   8.160 us    |          pick_next_task_stop();
>  0)               |          pick_next_task_rt() {
>  0)   8.560 us    |            pick_next_rt_entity();
>  0) + 24.880 us   |          }
>  0)               |          atomic_notifier_call_chain() {
>  0)               |            __atomic_notifier_call_chain() {
>  0)   7.600 us    |              __rcu_read_lock();
>  0)   7.760 us    |              notifier_call_chain();
>  0)   7.920 us    |              __rcu_read_unlock();
>  0) + 53.760 us   |            }
>  0) + 69.200 us   |          }
>  ------------------------------------------
>  0)   ksoftir-3    =>  uart_to-493
>  ------------------------------------------
>
>  0) + 10.080 us   |                      finish_task_switch();
>  0) ! 1381.120 us |                    }
>  0) ! 1397.680 us |                  } /* schedule */
>  0) + 10.320 us   |                  __try_to_take_rt_mutex();
>  0) ! 1748.240 us |                } /* rt_spin_lock_slowlock */
>  0) ! 1764.320 us |              } /* rt_spin_lock */
>  0)   8.400 us    |              __wake_up_common();
>  0)               |              rt_spin_unlock() {
>  0)   8.160 us    |                rt_spin_lock_slowunlock();
>  0) + 24.320 us   |              }
>  0) ! 1830.720 us |            } /* __wake_up */
>  0) ! 1864.960 us |          } /* put_ldisc */
>  0) ! 1881.760 us |        } /* tty_ldisc_deref */


* Re: Deterministic behavior for TTY serial
  2012-05-02  8:38               ` Ivo Sieben
@ 2012-05-02 12:39                 ` Ivo Sieben
  2012-05-03 15:28                   ` Ivo Sieben
  0 siblings, 1 reply; 13+ messages in thread
From: Ivo Sieben @ 2012-05-02 12:39 UTC (permalink / raw)
  To: Greg KH, linux-serial, Alan Cox, RT

Hi,

> 2012/5/2 Ivo Sieben <meltedpianoman@gmail.com>:
>> Hi,
>>
>> Indeed the lock is only taken for a very short time. So in most
>> situations it works fine, the lock is free, and can be taken quickly.
>> But when I stress my system (e.g. by continuously dumping a lot of
>> text data to another serial terminal from the background), my high
>> priority task sometimes finds the lock is taken. In that case
>> additional lock handling comes in: priority inheritance, a context
>> switch to the background task that holds the lock, and a context
>> switch back after the lock is released again. This makes a normal read
>> that takes about 50us, to increase in execution time to 150us.

PREEMPT_RT implements "normal" spin locks as mutexes that can sleep and
do not disable interrupts...
I'll try using raw spinlocks in this code section and for the tty flip
buffer, to see whether that solves my problem.

If you have other ideas... let me know!

Regards,
Ivo

* Re: Deterministic behavior for TTY serial
  2012-05-02 12:39                 ` Ivo Sieben
@ 2012-05-03 15:28                   ` Ivo Sieben
  2012-05-05  0:32                     ` Greg KH
  0 siblings, 1 reply; 13+ messages in thread
From: Ivo Sieben @ 2012-05-03 15:28 UTC (permalink / raw)
  To: Greg KH, linux-serial, Alan Cox, RT

Hi,

>
> The PREEMPT_RT uses mutexes for "normal" spin locks that do not
> disable interrupts...
> I'll try to use raw spinlocks in this code section and for the tty flip buffer
> See if that can solve my problem.
>
> If you have other ideas... let me know!
>
> Regards,
> Ivo

I've made some small changes to the tty layer (see the 3 RFC
patches I've sent separately).
With my loopback stress test the worst-case read time improved:
- Old situation: average read call lasts 50 us, with peaks up to 230 us
- New situation: average read call still 50 us, peaks up to 60 us
- Write was stable in both situations: average of 90 us, peaks up to 100 us

Only the very first read & write took extra time (128 us for read, 143 us
for write).
I'm still investigating that...
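
For anyone repeating this kind of measurement, average and peak figures
like the ones above can be derived from a series of per-call timings as
in the sketch below. The struct and function names are hypothetical and
this is not the actual test application.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of a series of per-call execution times. */
struct latency_stats {
    int64_t avg_us;   /* arithmetic mean, rounded down */
    int64_t max_us;   /* worst case ("peak") */
};

/* Compute mean and peak of n samples given in microseconds. */
struct latency_stats summarize(const int64_t *samples, size_t n)
{
    struct latency_stats s = { 0, 0 };
    int64_t sum = 0;

    for (size_t i = 0; i < n; i++) {
        sum += samples[i];
        if (samples[i] > s.max_us)
            s.max_us = samples[i];
    }
    if (n > 0)
        s.avg_us = sum / (int64_t)n;

    return s;
}
```

For a deterministic system the peak is the figure that matters; the
average hides exactly the outliers this thread is about.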

Feedback is much appreciated.

Regards,
Ivo Sieben


* Re: Deterministic behavior for TTY serial
  2012-05-03 15:28                   ` Ivo Sieben
@ 2012-05-05  0:32                     ` Greg KH
  0 siblings, 0 replies; 13+ messages in thread
From: Greg KH @ 2012-05-05  0:32 UTC (permalink / raw)
  To: Ivo Sieben; +Cc: linux-serial, Alan Cox, RT

On Thu, May 03, 2012 at 05:28:47PM +0200, Ivo Sieben wrote:
> Hi,
> 
> >
> > The PREEMPT_RT uses mutexes for "normal" spin locks that do not
> > disable interrupts...
> > I'll try to use raw spinlocks in this code section and for the tty flip buffer
> > See if that can solve my problem.
> >
> > If you have other ideas... let me know!
> >
> > Regards,
> > Ivo
> 
> I've changed some small things to the tty layer (see my other 3 RFC
> patches I've send).
> Performance increased with my loopback stress test:
> - Old situation: average read call last for 50us, with peaks up to 230 us
> - New situation: average read call still 50us, peak up to 60 us
> - Write was stable in both situations: average of 90 us, peak up to 100 us
> 
> Only the very first read & write took extra time (128 us for read, 143
> for write)
> I'm still investigating that...
> 
> Feedback is very appreciated.

Why are raw spinlocks "faster" here?  I like the end-result of what you
have accomplished, but I had some questions on your patches, care to
answer them?

thanks,

greg k-h

