All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] tty: fix data race in n_tty_receive_buf_common
@ 2018-01-03 13:48 Gaurav Kohli
  2018-01-03 19:38 ` Alan Cox
  0 siblings, 1 reply; 18+ messages in thread
From: Gaurav Kohli @ 2018-01-03 13:48 UTC (permalink / raw)
  To: jslaby, gregkh, mikey; +Cc: linux-kernel, Gaurav Kohli

There can be a race, if receive_buf call comes before
tty initialization completes in n_tty_open and tty->disc_data
may be NULL.

CPU0                                        cpu1
----                                        ----
 000|n_tty_receive_buf_common()     n_tty_open()
-001|n_tty_receive_buf2()           tty_ldisc_open.isra.3()
-002|tty_ldisc_receive_buf(inline)  tty_ldisc_setup()

If tty->disc_data is NULL, then return from flush_to_ldisc

Signed-off-by: Gaurav Kohli <gkohli@codeaurora.org>

diff --git a/drivers/tty/tty_port.c b/drivers/tty/tty_port.c
index 25d7368..5d49183 100644
--- a/drivers/tty/tty_port.c
+++ b/drivers/tty/tty_port.c
@@ -35,6 +35,9 @@ static int tty_port_default_receive_buf(struct tty_port *port,
 	if (!disc)
 		return 0;
 
+	if (!tty->disc_data)
+		return 0;
+
 	ret = tty_ldisc_receive_buf(disc, p, (char *)f, count);
 
 	tty_ldisc_deref(disc);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] tty: fix data race in n_tty_receive_buf_common
  2018-01-03 13:48 [PATCH] tty: fix data race in n_tty_receive_buf_common Gaurav Kohli
@ 2018-01-03 19:38 ` Alan Cox
  2018-01-04  5:47   ` Kohli, Gaurav
  0 siblings, 1 reply; 18+ messages in thread
From: Alan Cox @ 2018-01-03 19:38 UTC (permalink / raw)
  To: Gaurav Kohli; +Cc: jslaby, gregkh, mikey, linux-kernel

On Wed,  3 Jan 2018 19:18:52 +0530
Gaurav Kohli <gkohli@codeaurora.org> wrote:

> There can be a race, if receive_buf call comes before
> tty initialization completes in n_tty_open and tty->disc_data
> may be NULL.


This makes no sense. If the race exists then the check you do isn't good
enough because the ldsic dsta isn't valid even after the initial
assignment of tty->disc_data.

More to the point no ldisc receive method should ever be getting called
during the ldisc open.  Likewise we must avoid hitting the window of the
old one closing (potentialyl stale disc_data from the old ldisc)

Any change to the ldisc is supposed to occur under tty->ldisc_sem and the
code does an ldisc_ref before invoking the ldisc method.

The only cases I can see where we set an ldisc are

1. tty_set_ldisc

This correctly takes an ldisc_ref so cannot run in parallel with
tty_port_default_receive_buf.

2. tty_init_dev when we set up a new tty

At that point the tty is not supposed to be receiving data and sure
enough we don't call tty->ops->open until it has finished the ldisc set
up, nor do we tty_init_dev a port that is already running.

So given we don't activate the port until tty->ops->open() calls
tty_port_open calls the port activation routine I don't see a bug.

3. tty_release()

Here we take the locks in tty_ldisc_release so that is ok

4. hangup

Again we take the ldisc lock



So unless your driver is stuffing bytes into the tty either before it's
been told too or after it's been told to shut up I don't see a bug. And
if you driver is doing either of those then it's broken. Even then
port->tty ought to be NULL.

What does a full (all CPU) trace of the bug look like and what tty driver
are you using when you capture the trace ?

Alan

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] tty: fix data race in n_tty_receive_buf_common
  2018-01-03 19:38 ` Alan Cox
@ 2018-01-04  5:47   ` Kohli, Gaurav
  2018-01-04 11:09     ` Alan Cox
  0 siblings, 1 reply; 18+ messages in thread
From: Kohli, Gaurav @ 2018-01-04  5:47 UTC (permalink / raw)
  To: Alan Cox; +Cc: jslaby, gregkh, mikey, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2683 bytes --]


Hi Alan ,

In our case we are seeing this, when tty_init_dev call is running on one 
core and flush_to_ldisc is
running on other core:

So in below case tty_port is intact , only tty  is reinitialized again.
    kworker

  -000|n_tty_receive_buf_common()
-001|n_tty_receive_buf2()
-002|tty_ldisc_receive_buf(inline)
-002|receive_buf(inline)
-002|flush_to_ldisc()
-003|__read_once_size(inline)
-003|static_key_count(inline)
-003|static_key_false(inline)
-003|trace_workqueue_execute_end(inline)
-003|process_one_work()
	init

-005|tty_unthrottle()
-006|n_tty_open()
-007|tty_ldisc_open.isra.3()
-008|tty_ldisc_setup()
-009|tty_init_dev()
-010|tty_open_by_driver(inline)
-010|tty_open()
-011|chrdev_open()
-012|do_dentry_open()
-013|vfs_open()
-014|do_last()


>
>
> 2. tty_init_dev when we set up a new tty
>
> At that point the tty is not supposed to be receiving data and sure
> enough we don't call tty->ops->open until it has finished the ldisc set
> up, nor do we tty_init_dev a port that is already running.
>
> So given we don't activate the port until tty->ops->open() calls
> tty_port_open calls the port activation routine I don't see a bug.
So it seems this case only , open call has not completed but we have 
request for receive .
>
> What does a full (all CPU) trace of the bug look like and what tty driver
> are you using when you capture the trace ?
We are using tty for console logging,
     |    tty = 0xFFFFFFFF477AC880 -> (
     |      magic = 21505,
     |      kref = (refcount = (counter = 2)),
     |      dev = 0xFFFFFFFFEDE3DA80,
     |      driver = 0xFFFFFFFFEDE2A480,
     |      ops = 0xFFFFFF9F26F7D0D0,
     |      index = 0,
     |      ldisc_sem = (count = 1, wait_lock = (raw_lock = (owner = 0, 
next = 0), magic = 3735899821, own
     |      termiox = 0x0,
     |      name = "ttyMSM0",
     |      pgrp = 0x0,

We have seen this issue on 4.9 and also one thing i have observed, 
before tty is getting reinit in tty_init_dev(),
there is console service exited before it in all the dumps.
  35206.969644:   <2> init: Service 'console' (pid 7440) exited with 
status 130
  35206.969690:   <2> init: Sending signal 9 to service 'console' (pid 
7440) process group...
  35206.970857:   <2> init: kill(7440, 9) failed: No such process.

So how can we stop request of receive buff, if we already have tty_port 
and tty is getting reinitialized in midway like above
case?

Regards
Gaurav



-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.


[-- Attachment #2: Type: text/html, Size: 4083 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] tty: fix data race in n_tty_receive_buf_common
  2018-01-04  5:47   ` Kohli, Gaurav
@ 2018-01-04 11:09     ` Alan Cox
  2018-01-04 13:46       ` Kohli, Gaurav
  0 siblings, 1 reply; 18+ messages in thread
From: Alan Cox @ 2018-01-04 11:09 UTC (permalink / raw)
  To: Kohli, Gaurav; +Cc: jslaby, gregkh, mikey, linux-kernel

> > What does a full (all CPU) trace of the bug look like and what tty driver
> > are you using when you capture the trace ?  

Which tty driver ? serial/msm_serial.c ?

> We are using tty for console logging,
>      |    tty = 0xFFFFFFFF477AC880 -> (
>      |      magic = 21505,
>      |      kref = (refcount = (counter = 2)),
>      |      dev = 0xFFFFFFFFEDE3DA80,
>      |      driver = 0xFFFFFFFFEDE2A480,
>      |      ops = 0xFFFFFF9F26F7D0D0,
>      |      index = 0,
>      |      ldisc_sem = (count = 1, wait_lock = (raw_lock = (owner = 0, 
> next = 0), magic = 3735899821, own
>      |      termiox = 0x0,
>      |      name = "ttyMSM0",
>      |      pgrp = 0x0,

Ok no what I need to see is a trace of what each CPU is doing at the
point you detect the problem. That way we can see what the path that
races is.

> We have seen this issue on 4.9 and also one thing i have observed, 
> before tty is getting reinit in tty_init_dev(),

When yo stop the DMA is it instantaneous or does it cause a final
interrupt after you return from stop_rx ?

To me it still looks like data is being queued after the port is told to
stop but that's not a certainty.

> there is console service exited before it in all the dumps.
>   35206.969644:   <2> init: Service 'console' (pid 7440) exited with 
> status 130
>   35206.969690:   <2> init: Sending signal 9 to service 'console' (pid 
> 7440) process group...
>   35206.970857:   <2> init: kill(7440, 9) failed: No such process.
> 
> So how can we stop request of receive buff, if we already have tty_port 
> and tty is getting reinitialized in midway like above
> case?

Is the port your console device. If you use a different port as a console
device does the problem go away - that could be a very important detail
as the hangup behaviour for the two is quite different.

Alan

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] tty: fix data race in n_tty_receive_buf_common
  2018-01-04 11:09     ` Alan Cox
@ 2018-01-04 13:46       ` Kohli, Gaurav
  2018-01-04 14:37         ` Alan Cox
  0 siblings, 1 reply; 18+ messages in thread
From: Kohli, Gaurav @ 2018-01-04 13:46 UTC (permalink / raw)
  To: Alan Cox; +Cc: jslaby, gregkh, mikey, linux-kernel, linux-arm-msm


> Which tty driver ? serial/msm_serial.c ?

We are using our internal driver, msm_geni_serial.c
>    
> Ok no what I need to see is a trace of what each CPU is doing at the
> point you detect the problem. That way we can see what the path that
> races is.
Below is stack trace running by init in our case on one core
-006|n_tty_open(
     |    tty = 0xFFFFFFFF477AC880 -> (
     |      disc_data = 0xFFFFFF80197AD000,

     |      port = 0xFFFFFFFFEDE40000))
     |  ldata = 0xFFFFFF80197AD000

     |  trace_printk_fmt = 0xFFFFFF9F275125F8
-007|tty_ldisc_open.isra.3(
     |    tty = 0xFFFFFFFF477AC880)
-008|tty_ldisc_setup(

-009|tty_init_dev(
     |    driver = 0xFFFFFFFFEDE2A480,
     |    idx = 0)

-010|tty_open_by_driver(inline)
-010|tty_open(

Core 2:
-000|n_tty_receive_buf_common(
     |    tty = 0xFFFFFFFF477AC880,

     |  ?)
     |  ldata_=_0x0
     |  __func__ = (110, 95, 116, 116, 121, 95, 114, 101, 99, 101, 105, 
118, 101, 95, 98, 117, 102, 95, 99, 111, 109, 109, 111, 110, 0)
     |  __u = (__val = 7079195495121566464, __c = (0))
     |  c = 127
     |  ldata = 0xFFFFFFFFF40DF97C

     |  c = 0
     |  ldata = 0xFFFFFF9F26F46000

-001|n_tty_receive_buf2(
     |    tty = 0xFFFFFFFF477AC880,

-002|tty_ldisc_receive_buf(inline)
-002|receive_buf(inline)
-002|flush_to_ldisc(

Please let me know in case some other trace required
>> We have seen this issue on 4.9 and also one thing i have observed,
>> before tty is getting reinit in tty_init_dev(),
> When yo stop the DMA is it instantaneous or does it cause a final
> interrupt after you return from stop_rx ?
>
> To me it still looks like data is being queued after the port is told to
> stop but that's not a certainty.
This geni is based on FIFO.
I have also put ftraces and from that we can see open is not able to 
finish but there is request of flushing:
          kworker/-15514   2.... 35206.979226: workqueue_execute_start: 
work struct 0xffffffffede40008: function flush_to_ldisc

          kworker/-15514   2.... 35206.979237: bprint: 
n_tty_receive_buf_common: <Debug>n_tty_receive_buf_common 
tty=0xffffffff477ac880 ldata=(nil)

                    init-1       4.... 35206.979751: 
bprint:               n_tty_open: <Debug>n_tty_open 
tty=0xffffffff477ac880 ldata=0xffffff80197ad000
>> there is console service exited before it in all the dumps.
>>    35206.969644:   <2> init: Service 'console' (pid 7440) exited with
>> status 130
>>    35206.969690:   <2> init: Sending signal 9 to service 'console' (pid
>> 7440) process group...
>>    35206.970857:   <2> init: kill(7440, 9) failed: No such process.
>>
>> So how can we stop request of receive buff, if we already have tty_port
>> and tty is getting reinitialized in midway like above
>> case?
> Is the port your console device. If you use a different port as a console
> device does the problem go away - that could be a very important detail
> as the hangup behaviour for the two is quite different.

Yes , we are using ttymsm0 as console device, this is the only port we 
are using.

So it seems here, we are getting flush request when init reinitialize 
the tty for same port. Please let me know
if some other debug logs are required.

Regards
Gaurav

-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] tty: fix data race in n_tty_receive_buf_common
  2018-01-04 13:46       ` Kohli, Gaurav
@ 2018-01-04 14:37         ` Alan Cox
  2018-01-05  7:34           ` Kohli, Gaurav
  0 siblings, 1 reply; 18+ messages in thread
From: Alan Cox @ 2018-01-04 14:37 UTC (permalink / raw)
  To: Kohli, Gaurav; +Cc: jslaby, gregkh, mikey, linux-kernel, linux-arm-msm

On Thu, 4 Jan 2018 19:16:46 +0530
"Kohli, Gaurav" <gkohli@codeaurora.org> wrote:

> > Which tty driver ? serial/msm_serial.c ?  
> 
> We are using our internal driver, msm_geni_serial.c

Can you make that code available otherwise it's impossible to see what
the problem might be.

> >    
> > Ok no what I need to see is a trace of what each CPU is doing at the
> > point you detect the problem. That way we can see what the path that
> > races is.  
> Below is stack trace running by init in our case on one core
> -006|n_tty_open(
>      |    tty = 0xFFFFFFFF477AC880 -> (
>      |      disc_data = 0xFFFFFF80197AD000,
> 
>      |      port = 0xFFFFFFFFEDE40000))
>      |  ldata = 0xFFFFFF80197AD000
> 
>      |  trace_printk_fmt = 0xFFFFFF9F275125F8
> -007|tty_ldisc_open.isra.3(
>      |    tty = 0xFFFFFFFF477AC880)
> -008|tty_ldisc_setup(
> 
> -009|tty_init_dev(
>      |    driver = 0xFFFFFFFFEDE2A480,
>      |    idx = 0)
> 
> -010|tty_open_by_driver(inline)
> -010|tty_open(

So core 1 is opening the tty from user space and that's a normal looking
trace for an open of a port that was closed

> 
> Core 2:
> -000|n_tty_receive_buf_common(
>      |    tty = 0xFFFFFFFF477AC880,
> 
>      |  ?)
>      |  ldata_=_0x0
>      |  __func__ = (110, 95, 116, 116, 121, 95, 114, 101, 99, 101, 105, 
> 118, 101, 95, 98, 117, 102, 95, 99, 111, 109, 109, 111, 110, 0)
>      |  __u = (__val = 7079195495121566464, __c = (0))
>      |  c = 127
>      |  ldata = 0xFFFFFFFFF40DF97C
> 
>      |  c = 0
>      |  ldata = 0xFFFFFF9F26F46000
> 
> -001|n_tty_receive_buf2(
>      |    tty = 0xFFFFFFFF477AC880,
> 
> -002|tty_ldisc_receive_buf(inline)
> -002|receive_buf(inline)
> -002|flush_to_ldisc(

This is probably the important bit. As you say we are doing a flush to
ldisc for a port even though it is not open.

That's starting to make more sense. Becausee your driver is the console
tty_port_shutdown doesn't stop everything (so console printk still
works), and that means you can receive data and we have a window on
reopening a tty that is only in use as a console where port->tty is valid
but ldisc is not.

I wonder what Jiri thinks but my first thougt is that tty_init_dev in
fact needs to do

	tty_ldisc_lock(tty, 5 * HZ);
	tty_ldisc_setup(tty);
	tty_ldisc_unlock(tty)

with the relevant error handling so that the flush_to_ldisc waits and
either hits 'no ldisc' or 'ldisc valid'

Alan

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] tty: fix data race in n_tty_receive_buf_common
  2018-01-04 14:37         ` Alan Cox
@ 2018-01-05  7:34           ` Kohli, Gaurav
  2018-01-05  7:45             ` Kohli, Gaurav
  0 siblings, 1 reply; 18+ messages in thread
From: Kohli, Gaurav @ 2018-01-05  7:34 UTC (permalink / raw)
  To: Alan Cox; +Cc: jslaby, gregkh, mikey, linux-kernel, linux-arm-msm

[-- Attachment #1: Type: text/plain, Size: 3291 bytes --]



>

>> Can you make that code available otherwise it's impossible to see what
>> the problem might be.

https://source.codeaurora.org/quic/la/kernel/msm-4.9/tree/drivers/tty/serial?h=msm-4.9 


As discussed , there not seems a problem as we are getting print request.
>>>     
>>> Ok no what I need to see is a trace of what each CPU is doing at the
>>> point you detect the problem. That way we can see what the path that
>>> races is.
>> Below is stack trace running by init in our case on one core
>> -006|n_tty_open(
>>       |    tty = 0xFFFFFFFF477AC880 -> (
>>       |      disc_data = 0xFFFFFF80197AD000,
>>
>>       |      port = 0xFFFFFFFFEDE40000))
>>       |  ldata = 0xFFFFFF80197AD000
>>
>>       |  trace_printk_fmt = 0xFFFFFF9F275125F8
>> -007|tty_ldisc_open.isra.3(
>>       |    tty = 0xFFFFFFFF477AC880)
>> -008|tty_ldisc_setup(
>>
>> -009|tty_init_dev(
>>       |    driver = 0xFFFFFFFFEDE2A480,
>>       |    idx = 0)
>>
>> -010|tty_open_by_driver(inline)
>> -010|tty_open(
> So core 1 is opening the tty from user space and that's a normal looking
> trace for an open of a port that was closed
>
>> Core 2:
>> -000|n_tty_receive_buf_common(
>>       |    tty = 0xFFFFFFFF477AC880,
>>
>>       |  ?)
>>       |  ldata_=_0x0
>>       |  __func__ = (110, 95, 116, 116, 121, 95, 114, 101, 99, 101, 105,
>> 118, 101, 95, 98, 117, 102, 95, 99, 111, 109, 109, 111, 110, 0)
>>       |  __u = (__val = 7079195495121566464, __c = (0))
>>       |  c = 127
>>       |  ldata = 0xFFFFFFFFF40DF97C
>>
>>       |  c = 0
>>       |  ldata = 0xFFFFFF9F26F46000
>>
>> -001|n_tty_receive_buf2(
>>       |    tty = 0xFFFFFFFF477AC880,
>>
>> -002|tty_ldisc_receive_buf(inline)
>> -002|receive_buf(inline)
>> -002|flush_to_ldisc(
> This is probably the important bit. As you say we are doing a flush to
> ldisc for a port even though it is not open.
>
> That's starting to make more sense. Becausee your driver is the console
> tty_port_shutdown doesn't stop everything (so console printk still
> works), and that means you can receive data and we have a window on
> reopening a tty that is only in use as a console where port->tty is valid
> but ldisc is not.
>
> I wonder what Jiri thinks but my first thougt is that tty_init_dev in
> fact needs to do
>
> 	tty_ldisc_lock(tty, 5 * HZ);
> 	tty_ldisc_setup(tty);
> 	tty_ldisc_unlock(tty)
>
> with the relevant error handling so that the flush_to_ldisc waits and
> either hits 'no ldisc' or 'ldisc valid'
>

But in above lock there is a chance, when flush_to_ldisc will occur 
first and acquired a lock in
tty_ldisc_ref

So this may fail, I am not much sure here, Please correct me if here i 
am missing something.

tty_ldisc_lock(tty, 5 * HZ);
	tty_ldisc_setup(tty);
	tty_ldisc_unlock(tty)


So can not we simply return from flush_to_ldisc ,when we know disc_data 
is not valid like
we are doing for tty and ldisc already?

if (tty->disc_data == NULL) {
                 tty_ldisc_deref(disc);
                 return;
         }

Regards
Gaurav


-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.


[-- Attachment #2: Type: text/html, Size: 5972 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] tty: fix data race in n_tty_receive_buf_common
  2018-01-05  7:34           ` Kohli, Gaurav
@ 2018-01-05  7:45             ` Kohli, Gaurav
  2018-01-05 13:36               ` Alan Cox
  0 siblings, 1 reply; 18+ messages in thread
From: Kohli, Gaurav @ 2018-01-05  7:45 UTC (permalink / raw)
  To: Alan Cox; +Cc: jslaby, gregkh, mikey, linux-kernel, linux-arm-msm

Hi Alan,
>>>
>>> Can you make that code available otherwise it's impossible to see 
>>> what the problem might be.
>
>
  https://source.codeaurora.org/quic/la/kernel/msm-4.9/tree/drivers/tty/serial?h=msm-4.9
  As discussed , there not seems a problem as we are getting print 
request even when port seems to closed.


tty_ldisc_lock(tty, 5 * HZ);
  tty_ldisc_setup(tty);
  tty_ldisc_unlock(tty)

But in above lock,  there is a chance when flush_to_ldisc will occur 
first and acquired a lock in
tty_ldisc_ref itself.
So this may fail, I am not much sure here, Please correct me, If i am 
missing something here.
> So can not we simply return from flush_to_ldisc ,when we know 
> disc_data is not valid like
> we are doing for tty and ldisc already?
>
> if (tty->disc_data == NULL) {
>                 tty_ldisc_deref(disc);
>                 return;
>         }
>
>
>
>


Regards
Gaurav
-- Qualcomm India Private Limited, on behalf of Qualcomm Innovation 
Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation 
Collaborative Project.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] tty: fix data race in n_tty_receive_buf_common
  2018-01-05  7:45             ` Kohli, Gaurav
@ 2018-01-05 13:36               ` Alan Cox
  2018-01-05 13:56                 ` Kohli, Gaurav
  0 siblings, 1 reply; 18+ messages in thread
From: Alan Cox @ 2018-01-05 13:36 UTC (permalink / raw)
  To: Kohli, Gaurav; +Cc: jslaby, gregkh, mikey, linux-kernel, linux-arm-msm

On Fri, 5 Jan 2018 13:15:45 +0530
"Kohli, Gaurav" <gkohli@codeaurora.org> wrote:

> Hi Alan,
> >>>
> >>> Can you make that code available otherwise it's impossible to see 
> >>> what the problem might be.  
> >
> >  
>   https://source.codeaurora.org/quic/la/kernel/msm-4.9/tree/drivers/tty/serial?h=msm-4.9
>   As discussed , there not seems a problem as we are getting print 
> request even when port seems to closed.
> 
> 
> tty_ldisc_lock(tty, 5 * HZ);
>   tty_ldisc_setup(tty);
>   tty_ldisc_unlock(tty)
> 
> But in above lock,  there is a chance when flush_to_ldisc will occur 
> first and acquired a lock in
> tty_ldisc_ref itself.

Which is fine.

If the flush_to_ldisc gets there first then it will find there is a NULL
ldisc and do nothing. When it finishes the tty_init_dev will run and will
be protected from a further re-entry.

If the init_dev gets there first it will complete the init before the
flush_to_ldisc is permitted to proceed.

In other words we restore the intended invariant that ldisc's do not get
entered while their setup routine is running.

Alan

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] tty: fix data race in n_tty_receive_buf_common
  2018-01-05 13:36               ` Alan Cox
@ 2018-01-05 13:56                 ` Kohli, Gaurav
  2018-01-05 14:15                   ` Alan Cox
  0 siblings, 1 reply; 18+ messages in thread
From: Kohli, Gaurav @ 2018-01-05 13:56 UTC (permalink / raw)
  To: Alan Cox; +Cc: jslaby, gregkh, mikey, linux-kernel, linux-arm-msm



On 1/5/2018 7:06 PM, Alan Cox wrote:
> On Fri, 5 Jan 2018 13:15:45 +0530
> "Kohli, Gaurav" <gkohli@codeaurora.org> wrote:
>
>> Hi Alan,
>>>>> Can you make that code available otherwise it's impossible to see
>>>>> what the problem might be.
>>>   
>>    https://source.codeaurora.org/quic/la/kernel/msm-4.9/tree/drivers/tty/serial?h=msm-4.9
>>    As discussed , there not seems a problem as we are getting print
>> request even when port seems to closed.
>>
>>
>> tty_ldisc_lock(tty, 5 * HZ);
>>    tty_ldisc_setup(tty);
>>    tty_ldisc_unlock(tty)
>>
>> But in above lock,  there is a chance when flush_to_ldisc will occur
>> first and acquired a lock in
>> tty_ldisc_ref itself.
> Which is fine.
>
> If the flush_to_ldisc gets there first then it will find there is a NULL
> ldisc and do nothing. When it finishes the tty_init_dev will run and will
> be protected from a further re-entry.
>
> If the init_dev gets there first it will complete the init before the
> flush_to_ldisc is permitted to proceed.
>
> In other words we restore the intended invariant that ldisc's do not get
> entered while their setup routine is running.
>
>

But in above case , there we can hit another race, if we have a sequence 
like this
tty_init_dev->alloc_tty_struct -> tty_ldisc_init -> this will initialize 
ldisc ,
but at this moment disc_data is still NULL

And if flush_to_ldisc comes in between, it will take ldisc reference and 
proceeds receive buffer.


Regards
Gaurav


-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] tty: fix data race in n_tty_receive_buf_common
  2018-01-05 13:56                 ` Kohli, Gaurav
@ 2018-01-05 14:15                   ` Alan Cox
  2018-01-05 20:14                     ` Kohli, Gaurav
  0 siblings, 1 reply; 18+ messages in thread
From: Alan Cox @ 2018-01-05 14:15 UTC (permalink / raw)
  To: Kohli, Gaurav; +Cc: jslaby, gregkh, mikey, linux-kernel, linux-arm-msm

> But in above case , there we can hit another race, if we have a sequence 
> like this
> tty_init_dev->alloc_tty_struct -> tty_ldisc_init -> this will initialize 
> ldisc ,
> but at this moment disc_data is still NULL
> 
> And if flush_to_ldisc comes in between, it will take ldisc reference and 
> proceeds receive buffer.

So you need to move the lock up one line to protect the assignment to
tty->port->itty. We can do that.

At that point your flush_to_ldisc should see either port->itty = NULL or a
valid initialized ldisc.

Alan

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] tty: fix data race in n_tty_receive_buf_common
  2018-01-05 14:15                   ` Alan Cox
@ 2018-01-05 20:14                     ` Kohli, Gaurav
  2018-01-05 20:24                       ` Kohli, Gaurav
  2018-01-05 20:28                       ` Kohli, Gaurav
  0 siblings, 2 replies; 18+ messages in thread
From: Kohli, Gaurav @ 2018-01-05 20:14 UTC (permalink / raw)
  To: Alan Cox; +Cc: jslaby, gregkh, mikey, linux-kernel, linux-arm-msm

Hi Alan,


On 1/5/2018 7:45 PM, Alan Cox wrote:
>> But in above case , there we can hit another race, if we have a sequence
>> like this
>> tty_init_dev->alloc_tty_struct -> tty_ldisc_init -> this will initialize
>> ldisc ,
>> but at this moment disc_data is still NULL
>>
>> And if flush_to_ldisc comes in between, it will take ldisc reference and
>> proceeds receive buffer.
> So you need to move the lock up one line to protect the assignment to
> tty->port->itty. We can do that.
>
> At that point your flush_to_ldisc should see either port->itty = NULL or a
> valid initialized ldisc.
>
>

Yes , with little modification this should work.

+retval =  tty_ldisc_lock(tty, 5 * HZ);
+if (retval)
+         goto err_release_lock;
tty->port->itty = tty;
/*
* Structures all installed ... call the ldisc open routines.
* If we fail here just call release_tty to clean up.  No need
* to decrement the use counts, as release_tty doesn't care.
*/
retval = tty_ldisc_setup(tty, tty->link);
         if (retval)
             goto err_release_tty;
tty_ldisc_unlock(tty);
err_release_tty:
tty_unlock(tty);
tty_info_ratelimited(tty, "ldisc open failed (%d), clearing slot %d\n",
     retval, idx);
     release_tty(tty, idx);
     return ERR_PTR(retval);

+err_release_lock;
+tty_ldisc_unlock(tty);
+return ERR_PTR(retval);

Please let me know if above modification seems fine , then we can upload 
this and try reproduction of Bug.

Regards
Gaurav

-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] tty: fix data race in n_tty_receive_buf_common
  2018-01-05 20:14                     ` Kohli, Gaurav
@ 2018-01-05 20:24                       ` Kohli, Gaurav
  2018-01-05 21:05                         ` Alan Cox
  2018-01-05 20:28                       ` Kohli, Gaurav
  1 sibling, 1 reply; 18+ messages in thread
From: Kohli, Gaurav @ 2018-01-05 20:24 UTC (permalink / raw)
  To: Alan Cox; +Cc: jslaby, gregkh, mikey, linux-kernel, linux-arm-msm

Hi Alan,

Sorry correcting the typo here:
+retval =  tty_ldisc_lock(tty, 5 * HZ);
+if (retval)
+         goto err_release_lock;
tty->port->itty = tty;
/*
* Structures all installed ... call the ldisc open routines.
* If we fail here just call release_tty to clean up.  No need
* to decrement the use counts, as release_tty doesn't care.
*/
retval = tty_ldisc_setup(tty, tty->link);
         if (retval)
             goto err_release_tty;
tty_ldisc_unlock(tty);
err_release_tty:
tty_info_ratelimited(tty, "ldisc open failed (%d), clearing slot %d\n",
     retval, idx);
+err_release_lock;
+tty_unlock(tty);
+release_tty(tty, idx);
+tty_ldisc_unlock(tty);
+return ERR_PTR(retval);

On 1/6/2018 1:44 AM, Kohli, Gaurav wrote:
> Hi Alan,
>
>
> On 1/5/2018 7:45 PM, Alan Cox wrote:
>>> But in above case , there we can hit another race, if we have a 
>>> sequence
>>> like this
>>> tty_init_dev->alloc_tty_struct -> tty_ldisc_init -> this will 
>>> initialize
>>> ldisc ,
>>> but at this moment disc_data is still NULL
>>>
>>> And if flush_to_ldisc comes in between, it will take ldisc reference 
>>> and
>>> proceeds receive buffer.
>> So you need to move the lock up one line to protect the assignment to
>> tty->port->itty. We can do that.
>>
>> At that point your flush_to_ldisc should see either port->itty = NULL 
>> or a
>> valid initialized ldisc.
>>
>>
>
> Yes , with little modification this should work.
>
> +retval =  tty_ldisc_lock(tty, 5 * HZ);
> +if (retval)
> +         goto err_release_lock;
> tty->port->itty = tty;
> /*
> * Structures all installed ... call the ldisc open routines.
> * If we fail here just call release_tty to clean up.  No need
> * to decrement the use counts, as release_tty doesn't care.
> */
> retval = tty_ldisc_setup(tty, tty->link);
>         if (retval)
>             goto err_release_tty;
> tty_ldisc_unlock(tty);
> err_release_tty:
> tty_unlock(tty);
> tty_info_ratelimited(tty, "ldisc open failed (%d), clearing slot %d\n",
>     retval, idx);
>     release_tty(tty, idx);
>     return ERR_PTR(retval);
>
> +err_release_lock;
> +tty_ldisc_unlock(tty);
> +return ERR_PTR(retval);
>
> Please let me know if above modification seems fine , then we can 
> upload this and try reproduction of Bug.
>
> Regards
> Gaurav
>

-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] tty: fix data race in n_tty_receive_buf_common
  2018-01-05 20:14                     ` Kohli, Gaurav
  2018-01-05 20:24                       ` Kohli, Gaurav
@ 2018-01-05 20:28                       ` Kohli, Gaurav
  1 sibling, 0 replies; 18+ messages in thread
From: Kohli, Gaurav @ 2018-01-05 20:28 UTC (permalink / raw)
  To: Alan Cox; +Cc: jslaby, gregkh, mikey, linux-kernel, linux-arm-msm

HI Alan,

Sorry correcting the typo here:


On 1/6/2018 1:44 AM, Kohli, Gaurav wrote:
> Hi Alan,
>
>
> On 1/5/2018 7:45 PM, Alan Cox wrote:
>>> But in above case , there we can hit another race, if we have a 
>>> sequence
>>> like this
>>> tty_init_dev->alloc_tty_struct -> tty_ldisc_init -> this will 
>>> initialize
>>> ldisc ,
>>> but at this moment disc_data is still NULL
>>>
>>> And if flush_to_ldisc comes in between, it will take ldisc reference 
>>> and
>>> proceeds receive buffer.
>> So you need to move the lock up one line to protect the assignment to
>> tty->port->itty. We can do that.
>>
>> At that point your flush_to_ldisc should see either port->itty = NULL 
>> or a
>> valid initialized ldisc.
>>
>>
>
> Yes , with little modification this should work.
>
> +retval =  tty_ldisc_lock(tty, 5 * HZ);
> +if (retval)
> +         goto err_release_lock;
> tty->port->itty = tty;
> /*
> * Structures all installed ... call the ldisc open routines.
> * If we fail here just call release_tty to clean up.  No need
> * to decrement the use counts, as release_tty doesn't care.
> */
> retval = tty_ldisc_setup(tty, tty->link);
>         if (retval)
>             goto err_release_tty;
> tty_ldisc_unlock(tty);
> err_release_tty:
> tty_info_ratelimited(tty, "ldisc open failed (%d), clearing slot %d\n",
>     retval, idx);
> +err_release_lock;
     +tty_unlock(tty);
     +release_tty(tty, idx);
> +tty_ldisc_unlock(tty);
> +return ERR_PTR(retval);
>
> Please let me know if above modification seems fine , then we can 
> upload this and try reproduction of Bug.
>
> Regards
> Gaurav
>

-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] tty: fix data race in n_tty_receive_buf_common
  2018-01-05 20:24                       ` Kohli, Gaurav
@ 2018-01-05 21:05                         ` Alan Cox
  2018-01-06  7:50                           ` Kohli, Gaurav
  0 siblings, 1 reply; 18+ messages in thread
From: Alan Cox @ 2018-01-05 21:05 UTC (permalink / raw)
  To: Kohli, Gaurav; +Cc: jslaby, gregkh, mikey, linux-kernel, linux-arm-msm

On Sat, 6 Jan 2018 01:54:36 +0530
"Kohli, Gaurav" <gkohli@codeaurora.org> wrote:

> Hi Alan,
> 
> Sorry correcting the typo here:
> +retval =  tty_ldisc_lock(tty, 5 * HZ);
> +if (retval)
> +         goto err_release_lock;
> tty->port->itty = tty;
> /*
> * Structures all installed ... call the ldisc open routines.
> * If we fail here just call release_tty to clean up.  No need
> * to decrement the use counts, as release_tty doesn't care.
> */
> retval = tty_ldisc_setup(tty, tty->link);
>          if (retval)
>              goto err_release_tty;
> tty_ldisc_unlock(tty);
> err_release_tty:
> tty_info_ratelimited(tty, "ldisc open failed (%d), clearing slot %d\n",
>      retval, idx);
> +err_release_lock;
> +tty_unlock(tty);
> +release_tty(tty, idx);
> +tty_ldisc_unlock(tty);
> +return ERR_PTR(retval);

Thanks - can you give that a testing since for some reason you seem to be
the only system able to hit this and confirm that it's now working
properly. Then I'll push it upstream

And thanks for doing all the debug work to find this and identify what
was going on.

Alan

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] tty: fix data race in n_tty_receive_buf_common
  2018-01-05 21:05                         ` Alan Cox
@ 2018-01-06  7:50                           ` Kohli, Gaurav
  2018-01-17 13:25                             ` Kohli, Gaurav
  0 siblings, 1 reply; 18+ messages in thread
From: Kohli, Gaurav @ 2018-01-06  7:50 UTC (permalink / raw)
  To: Alan Cox; +Cc: jslaby, gregkh, mikey, linux-kernel, linux-arm-msm



On 1/6/2018 2:35 AM, Alan Cox wrote:
> On Sat, 6 Jan 2018 01:54:36 +0530
> "Kohli, Gaurav" <gkohli@codeaurora.org> wrote:
>
>> Hi Alan,
>>
>> Sorry correcting the typo here:
>> +retval =  tty_ldisc_lock(tty, 5 * HZ);
>> +if (retval)
>> +         goto err_release_lock;
>> tty->port->itty = tty;
>> /*
>> * Structures all installed ... call the ldisc open routines.
>> * If we fail here just call release_tty to clean up.  No need
>> * to decrement the use counts, as release_tty doesn't care.
>> */
>> retval = tty_ldisc_setup(tty, tty->link);
>>           if (retval)
>>               goto err_release_tty;
>> tty_ldisc_unlock(tty);
>> err_release_tty:
>> tty_info_ratelimited(tty, "ldisc open failed (%d), clearing slot %d\n",
>>       retval, idx);
>> +err_release_lock;
>> +tty_unlock(tty);
>> +release_tty(tty, idx);
>> +tty_ldisc_unlock(tty);
>> +return ERR_PTR(retval);
> Thanks - can you give that a testing since for some reason you seem to be
> the only system able to hit this and confirm that it's now working
> properly. Then I'll push it upstream
>
> And thanks for doing all the debug work to find this and identify what
> was going on.
>
> Alan
> --
> To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Thanks Alan for your support, yes we will try to reproduce and get back 
to you.
Ideally it take 2-3 days for issue reproduction, but yes it is 
consistently reproducible.

Regards
Gaurav

-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] tty: fix data race in n_tty_receive_buf_common
  2018-01-06  7:50                           ` Kohli, Gaurav
@ 2018-01-17 13:25                             ` Kohli, Gaurav
  2018-01-20 18:49                               ` Alan Cox
  0 siblings, 1 reply; 18+ messages in thread
From: Kohli, Gaurav @ 2018-01-17 13:25 UTC (permalink / raw)
  To: Alan Cox; +Cc: jslaby, gregkh, mikey, linux-kernel, linux-arm-msm

On 1/6/2018 1:20 PM, Kohli, Gaurav wrote:

> On 1/6/2018 2:35 AM, Alan Cox wrote:
>> On Sat, 6 Jan 2018 01:54:36 +0530
>> "Kohli, Gaurav" <gkohli@codeaurora.org> wrote:
>>> Hi Alan,
>>> Sorry correcting the typo here:
>>> +retval =  tty_ldisc_lock(tty, 5 * HZ);
>>> +if (retval)
>>> +         goto err_release_lock;
>>> tty->port->itty = tty;
>>> /*
>>> * Structures all installed ... call the ldisc open routines.
>>> * If we fail here just call release_tty to clean up.  No need
>>> * to decrement the use counts, as release_tty doesn't care.
>>> */
>>> retval = tty_ldisc_setup(tty, tty->link);
>>>            if (retval)
>>>                goto err_release_tty;
>>> tty_ldisc_unlock(tty);
>>> err_release_tty:
>>> tty_info_ratelimited(tty, "ldisc open failed (%d), clearing slot %d\n",
>>>        retval, idx);
>>> +err_release_lock;
>>> +tty_unlock(tty);
>>> +release_tty(tty, idx);
>>> +tty_ldisc_unlock(tty);
>>> +return ERR_PTR(retval);
>> Thanks - can you give that a testing since for some reason you seem to be
>> the only system able to hit this and confirm that it's now working
>> properly. Then I'll push it upstream
>> And thanks for doing all the debug work to find this and identify what
>> was going on.
>> Alan
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Thanks Alan for your support, yes we will try to reproduce and get back
> to you.
> Ideally it take 2-3 days for issue reproduction, but yes it is
> consistently reproducible.
> Regards
> Gaurav

Hi Alan,
Sorry for the delayed response, as i was waiting for test results.
I have uploaded the new patch v1 as per your suggestions and result 
looks good.

Thanks
Gaurav
>
-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] tty: fix data race in n_tty_receive_buf_common
  2018-01-17 13:25                             ` Kohli, Gaurav
@ 2018-01-20 18:49                               ` Alan Cox
  0 siblings, 0 replies; 18+ messages in thread
From: Alan Cox @ 2018-01-20 18:49 UTC (permalink / raw)
  To: Kohli, Gaurav; +Cc: jslaby, gregkh, mikey, linux-kernel, linux-arm-msm

> Sorry for the delayed response, as i was waiting for test results.
> I have uploaded the new patch v1 as per your suggestions and result 
> looks good.

Thanks for all the testing. This looks good to me too.

Alan

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2018-01-20 18:49 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-03 13:48 [PATCH] tty: fix data race in n_tty_receive_buf_common Gaurav Kohli
2018-01-03 19:38 ` Alan Cox
2018-01-04  5:47   ` Kohli, Gaurav
2018-01-04 11:09     ` Alan Cox
2018-01-04 13:46       ` Kohli, Gaurav
2018-01-04 14:37         ` Alan Cox
2018-01-05  7:34           ` Kohli, Gaurav
2018-01-05  7:45             ` Kohli, Gaurav
2018-01-05 13:36               ` Alan Cox
2018-01-05 13:56                 ` Kohli, Gaurav
2018-01-05 14:15                   ` Alan Cox
2018-01-05 20:14                     ` Kohli, Gaurav
2018-01-05 20:24                       ` Kohli, Gaurav
2018-01-05 21:05                         ` Alan Cox
2018-01-06  7:50                           ` Kohli, Gaurav
2018-01-17 13:25                             ` Kohli, Gaurav
2018-01-20 18:49                               ` Alan Cox
2018-01-05 20:28                       ` Kohli, Gaurav

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.