From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 21D37C433F5 for ; Tue, 24 May 2022 11:10:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234382AbiEXLKI (ORCPT ); Tue, 24 May 2022 07:10:08 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42436 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232221AbiEXLKH (ORCPT ); Tue, 24 May 2022 07:10:07 -0400 Received: from mail-qk1-x72e.google.com (mail-qk1-x72e.google.com [IPv6:2607:f8b0:4864:20::72e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AA6C2719D9 for ; Tue, 24 May 2022 04:10:05 -0700 (PDT) Received: by mail-qk1-x72e.google.com with SMTP id p63so545929qkf.0 for ; Tue, 24 May 2022 04:10:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=YCkv9ahwSNCP+j2fJJRQm+rt5MWWMmA/mENdIhGqJXY=; b=eGvNXQSLgeF3wdxMvCn2TNDDh5VHX4j/ZgvAMncDlWtbtVlvLLpw7j+Tf9La+ymScl 3PqJOlnynGaG8+I3uo/CXRuvTWf9zJLjRaXD3v62sHJf7n3t0OOABR3WBlzKYpFCYd8Q OH6UyYoCMyPvQMxB3eGpa24CaVSTyvOinyIdGaO6QkDzYv+M8cBnSxYA+YYhS94Q7cRe jV0lyEItVnEdzHWzirF6sV4D3nBWOL0y0l1+fji4kc02otwE06cBgxmSWXzB95P45d4P qDgcjDjMQC1jKME1I5NOebZZ0Ggab3o9MClqWhpN9GlfTPhE/tVfINQN/ddEGHYm+yao EMAw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=YCkv9ahwSNCP+j2fJJRQm+rt5MWWMmA/mENdIhGqJXY=; b=c6BleAR1QgeUz+9ZT5PejTGBmUe7XXGKHa65r4LSPzvtv0ivIg5LOIHtwpGdPLoNbv qgb7rjoPgGIY3tGxBnRkb4gxM+ZrpxSvXy2ZBymZ2TPZeOK3BlpIYa/EBRTm2JwEzV2g uZIWyaEVbdWWIO2Ug7MwqC5vw1+9U+TE1F/6z+89OmQTBigH2Hl7zSNNNYCUCKyqdlF3 i0wdc7PjrFSAgHDMHeN88S+vZemEpztosXuwna5AqNevsIl4++nBoDad5fotp8fVXhG2 3DdkXGxoEgP5cAyyArm1dZJ4TVaDr0tf8bc/WHU6HC40toEA8O3q7LSHfKafWBNCz2gc 59PQ== X-Gm-Message-State: AOAM530oEtg/APHbRWQpskoVHBrPZut/f4UrKjz1tbHu3QLWdBKia+e9 3uJK244zj7/JKuzJ7c8hCe7jSklLIjppTz1EESS9Uk4T3V8= X-Google-Smtp-Source: ABdhPJwYzSVL7OCRTiGZDX0L/njJch6or68FQSYuYSvzXFHjDyqULth84lCTaQkTvNTUP+4tD+fpr+49GuhQ7EOzkac= X-Received: by 2002:a37:6d5:0:b0:6a0:5e23:7c93 with SMTP id 204-20020a3706d5000000b006a05e237c93mr14796518qkg.721.1653390604812; Tue, 24 May 2022 04:10:04 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: cael Date: Tue, 24 May 2022 19:09:53 +0800 Message-ID: Subject: Re: tty: fix a possible hang on tty device To: =?UTF-8?Q?Ilpo_J=C3=A4rvinen?= Cc: Greg Kroah-Hartman , Jiri Slaby , linux-serial Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-serial@vger.kernel.org Thanks for the answer, yes, there exists a race between reader and kworker, but it's OK. Before checking chars_in_buffer in kworker, ldata->no_room is set true, if reader changes ldata->read_tail in n_tty_read when kworker checks this v= alue which makes the check fail, then when reader reaches end of n_tty_read, n_tty_kick_worker will also be called. Besides, kworker and reader may call n_tty_kick_worker at the same time, this function only queues work on workqueue, so it's harmless. Ilpo J=C3=A4rvinen =E4=BA=8E2022=E5=B9=B45= =E6=9C=8824=E6=97=A5=E5=91=A8=E4=BA=8C 17:11=E5=86=99=E9=81=93=EF=BC=9A > > On Tue, 24 May 2022, cael wrote: > > > We have met a hang on pty device, the reader was blocking at > > epoll on master side, the writer was sleeping at wait_woken inside > > n_tty_write on slave side ,and the write buffer on tty_port was full, = we > > Space after comma. It would be also useful to tone down usage of "we" in > the changelog. > > > found that the reader and writer would never be woken again and block > > forever. > > > > We thought the problem was caused as a race between reader and > > kworker as follows: > > n_tty_read(reader)| n_tty_receive_buf_common(kworker) > > |room =3D N_TTY_BUF_SIZE - (ldata->read_head - tail) > > |room <=3D 0 > > copy_from_read_buf| > > n_tty_kick_worker | > > |ldata->no_room =3D true > > > > After writing to slave device, writer wakes up kworker to flush > > data on tty_port to reader, and the kworker finds that reader > > has no room to store data so room <=3D 0 is met. At this moment, > > reader consumes all the data on reader buffer and call > > n_tty_kick_worker to check ldata->no_room and finds that there > > is no need to call tty_buffer_restart_work to flush data to reader > > and reader quits reading. Then kworker sets ldata->no_room=3Dtrue > > and quits too. > > > > If write buffer is not full, writer will wake kworker to flush data > > again after following writes, but if writer buffer is full and writer > > goes to sleep, kworker will never be woken again and tty device is > > blocked. > > > > We think this problem can be solved with a check for read buffer > > inside function n_tty_receive_buf_common, if read buffer is empty and > > ldata->no_room is true, this means that kworker has more data to flush > > to read buffer, so a call to n_tty_kick_worker is necessary. > > > > Signed-off-by: cael > > --- > > diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c > > index efc72104c840..36c7bc033c78 100644 > > --- a/drivers/tty/n_tty.c > > +++ b/drivers/tty/n_tty.c > > @@ -1663,6 +1663,9 @@ n_tty_receive_buf_common(struct tty_struct *tty, > > const unsigned char *cp, > > } else > > n_tty_check_throttle(tty); > > > > + if (!chars_in_buffer(tty)) > > + n_tty_kick_worker(tty); > > + > > chars_in_buffer() accesses ldata->read_tail in producer context so this > probably just moves the race there? > > > -- > i. >