Re: [PATCH] tty: vt: make do_con_write() no-op if IRQ is disabled

From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
To: "Fabio M. De Francesco" <fmdefrancesco@gmail.com>,
	Linus Torvalds <torvalds@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Peter Hurley <peter@hurleysoftware.com>,
	Jiri Slaby <jirislaby@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH] tty: vt: make do_con_write() no-op if IRQ is disabled
Date: Fri, 3 Dec 2021 21:32:21 +0900	[thread overview]
Message-ID: <5d055fbd-e94a-fe54-d3e0-982dc455ed1a@i-love.sakura.ne.jp> (raw)
In-Reply-To: <3017492.JFOoIcAZ2s@localhost.localdomain>

On 2021/12/03 20:00, Fabio M. De Francesco wrote:
> On Thursday, December 2, 2021 7:35:16 PM CET Linus Torvalds wrote:
>> On Thu, Dec 2, 2021 at 7:41 AM Tetsuo Handa
>> <penguin-kernel@i-love.sakura.ne.jp> wrote:
>>>
>>>> Looking at the backtrace, I see
>>>>
>>>>    n_hdlc_send_frames+0x24b/0x490 drivers/tty/n_hdlc.c:290
>>>>    tty_wakeup+0xe1/0x120 drivers/tty/tty_io.c:534
>>>>    __start_tty drivers/tty/tty_io.c:806 [inline]
>>>>    __start_tty+0xfb/0x130 drivers/tty/tty_io.c:799
>>>>
>>>> and apparently it's that hdlc line discipline (and
>>>> n_hdlc_send_frames() in particular) that is the problem here.
>>>>
>>>> I think that's where the fix should be.
>>>
>>> Do you mean that we should change the behavior of n_hdlc_send_frames()
>>> rather than trying to make __start_tty() schedulable again?
>>
>> I wouldn't change n_hdlc_send_frames() itself. It does what it says it does.
>>
>> But n_hdlc_tty_wakeup() probably shouldn't call it directly. Other tty
>> line disciplines don't do that kind of thing - although I only looked
>> at a couple. They all seem to just set bits and prepare things. Like a
>> wakeup function should do.
>>
>> So I think n_hdlc_tty_wakeup() should perhaps only do a
>> "schedule_work()" or similar to get that n_hdlc_send_frames() started,
>> rather than doing it itself.
>>
>> Example: net/nfc/nci/uart.c. It does that
>>
>>         schedule_work(&nu->write_work);
>>
>> instead of actually trying to do a write from a wakeup routine
>> (similar examples in ppp - "tasklet_schedule(&ap->tsk)" etc).
>>
>> I mean, it's called "wakeup", not "write". So I think the fundamental
>> confusion here is in hdlc, not the tty layer.
>>
>>               Linus
>>

OK.

> This is what I understand from the above argument: do a schedule_work() to get 
> that n_hdlc_send_frames() started; in this way, n_hdlc_tty_wakeup() can
> return to the caller and n_hdlc_send_frames() is executed asynchronously 
> (i.e., no longer in an atomic context). 

Yes. If we copy how net/nfc/nci/uart.c does, the fix would look like:

--------------------
 drivers/tty/n_hdlc.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/tty/n_hdlc.c b/drivers/tty/n_hdlc.c
index 7e0884ecc74f..a71fcac60925 100644
--- a/drivers/tty/n_hdlc.c
+++ b/drivers/tty/n_hdlc.c
@@ -140,6 +140,8 @@ struct n_hdlc {
 	struct n_hdlc_buf_list	rx_buf_list;
 	struct n_hdlc_buf_list	tx_free_buf_list;
 	struct n_hdlc_buf_list	rx_free_buf_list;
+	struct work_struct	write_work;
+	struct tty_struct	*tty_for_write_work;
 };
 
 /*
@@ -210,6 +212,8 @@ static void n_hdlc_tty_close(struct tty_struct *tty)
 	wake_up_interruptible(&tty->read_wait);
 	wake_up_interruptible(&tty->write_wait);
 
+	cancel_work_sync(&n_hdlc->write_work);
+
 	n_hdlc_free_buf_list(&n_hdlc->rx_free_buf_list);
 	n_hdlc_free_buf_list(&n_hdlc->tx_free_buf_list);
 	n_hdlc_free_buf_list(&n_hdlc->rx_buf_list);
@@ -334,6 +338,20 @@ static void n_hdlc_send_frames(struct n_hdlc *n_hdlc, struct tty_struct *tty)
 		goto check_again;
 }	/* end of n_hdlc_send_frames() */
 
+/**
+ * n_hdlc_tty_write_work - Asynchronous callback for transmit wakeup
+ * @work: pointer to work_struct
+ *
+ * Called when low level device driver can accept more send data.
+ */
+static void n_hdlc_tty_write_work(struct work_struct *work)
+{
+	struct n_hdlc *n_hdlc = container_of(work, struct n_hdlc, write_work);
+	struct tty_struct *tty = n_hdlc->tty_for_write_work;
+
+	n_hdlc_send_frames(n_hdlc, tty);
+}	/* end of n_hdlc_tty_write_work() */
+
 /**
  * n_hdlc_tty_wakeup - Callback for transmit wakeup
  * @tty: pointer to associated tty instance data
@@ -344,7 +362,8 @@ static void n_hdlc_tty_wakeup(struct tty_struct *tty)
 {
 	struct n_hdlc *n_hdlc = tty->disc_data;
 
-	n_hdlc_send_frames(n_hdlc, tty);
+	n_hdlc->tty_for_write_work = tty;
+	schedule_work(&n_hdlc->write_work);
 }	/* end of n_hdlc_tty_wakeup() */
 
 /**
@@ -706,6 +725,7 @@ static struct n_hdlc *n_hdlc_alloc(void)
 	if (!n_hdlc)
 		return NULL;
 
+	INIT_WORK(&n_hdlc->write_work, n_hdlc_tty_write_work);
 	spin_lock_init(&n_hdlc->rx_free_buf_list.spinlock);
 	spin_lock_init(&n_hdlc->tx_free_buf_list.spinlock);
 	spin_lock_init(&n_hdlc->rx_buf_list.spinlock);
--------------------

> 
> I hope that I'm not missing something. If the above summary is correct, 
> please forgive a newbie for the following questions... 
> 
> Commit f9e053dcfc02 ("tty: Serialize tty flow control changes with flow_lock") 
> has introduced spinlocks to serialize flow control changes and avoid the 
> concurrent executions of __start_tty() and __stop_tty().
> 
> This is an excerpt from the above-mentioned commit:
> 
> ->
>     Introduce tty->flow_lock spinlock to serialize tty flow control changes.
>     Split out unlocked __start_tty()/__stop_tty() flavors for use by
>     ioctl(TCXONC) in follow-on patch.
> <-
> 
> This is the reason why we are dealing with this bug. Currently we have __start_tty() 
> called with an acquired spinlock and IRQs disabled and the calls chain leads to 
> console_lock() while in atomic context.

If we hit a race window described in that commit

    CPU 0                          | CPU 1
    stop_tty()                     |
      lock ctrl_lock               |
      tty->stopped = 1             |
      unlock ctrl_lock             |
                                   | start_tty()
                                   |   lock ctrl_lock
                                   |   tty->stopped = 0
                                   |   unlock ctrl_lock
                                   |   driver->start()
      driver->stop()               |

    In this case, the flow control state now indicates the tty has
    been started, but the actual hardware state has actually been stopped.

, the tty->stopped flag remains 0 despite driver->stop() is called after
driver->start() finished. tty->stopped (the flow control state) says "not stopped"
but the actual hardware state is "stopped".

> 
> In summation, my questions are... 
> 
> 1) Why do we still need to protect __start_tty() and __stop_tty() with spin_lock_irq() 
> if the solution to the bug is to execute n_hdlc_send_frames() asynchronously?

Without serialization, tty->stopped flag and the actual hardware state can mismatch.

> 
> 2) If it is true that we need to avoid concurrent executions of __start_tty() and 
> __stop_tty(), can we just use a Mutex in the IOCTL's helper?

Yes if all __start_tty() and __stop_tty() callers were schedulable context.
But stop_tty() says that stop_tty() might be called from atomic context.
Thus, we can't use a mutex for protecting tty->stopped flag.

> 
> Thanks,
> 
> Fabio M. De Francesco

By the way, even with above patch, I think

    CPU 0                  | CPU 1                  |  CPU 2
    stop_tty()             |                        |
      lock flow.lock       |                        |
      tty->stopped = 1     |                        |
      driver->stop()       |                        |
      unlock flow.lock     |                        |
                           | start_tty()            |
                           |   lock flow.lock       |
                           |   tty->stopped = 0     |
                           |   driver->start() => Schedules n_hdlc_send_frames()
                           |   unlock flow.lock     |
    stop_tty()             |                        |
      lock flow.lock       |                        |
      tty->stopped = 1     |                        |
      driver->stop()       |                        |
      unlock flow.lock     |                        |
                           |                        | Starts n_hdlc_send_frames()

(that is, the n_hdlc is writing to consoles despite tty->stopped is 1) can happen
until n_hdlc_send_frames() completes.

Then, even scheduling next n_hdlc_send_frames() while previous n_hdlc_send_frames() is
possible? In the worst case, multiple CPUs can run n_hdlc_send_frames() concurrently?

    CPU 0                  | CPU 1                  |  CPU 2                 | CPU 3
    stop_tty()             |                        |                        |
      lock flow.lock       |                        |                        |
      tty->stopped = 1     |                        |                        |
      driver->stop()       |                        |                        |
      unlock flow.lock     |                        |                        |
                           | start_tty()            |                        |
                           |   lock flow.lock       |                        |
                           |   tty->stopped = 0     |                        |
                           |   driver->start() => Schedules n_hdlc_send_frames()
                           |   unlock flow.lock     |                        |
                           |                        | Starts n_hdlc_send_frames()
    stop_tty()             |                        |                        |
      lock flow.lock       |                        |                        |
      tty->stopped = 1     |                        |                        |
      driver->stop()       |                        |                        |
      unlock flow.lock     |                        |                        |
                           | start_tty()            |                        |
                           |   lock flow.lock       |                        |
                           |   tty->stopped = 0     |                        |
                           |   driver->start() => Schedules n_hdlc_send_frames()
                           |   unlock flow.lock     |                        |
                           |                        |                        | Starts n_hdlc_send_frames()

Ah, OK. n_hdlc->tbusy is there for serialization.