From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752942AbbLIVOo (ORCPT ); Wed, 9 Dec 2015 16:14:44 -0500 Received: from smtp-out-so.shaw.ca ([64.59.136.139]:47560 "EHLO smtp-out-so.shaw.ca" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751960AbbLIVOn (ORCPT ); Wed, 9 Dec 2015 16:14:43 -0500 X-Greylist: delayed 487 seconds by postgrey-1.27 at vger.kernel.org; Wed, 09 Dec 2015 16:14:43 EST X-Authority-Analysis: v=2.1 cv=He60Nnw8 c=1 sm=1 tr=0 a=qZxK3cM5tHtOUZVZkOzy1Q==:117 a=qZxK3cM5tHtOUZVZkOzy1Q==:17 a=3I1X_3ewAAAA:8 a=kj9zAlcOel0A:10 a=g5zXW9WrAAAA:8 a=WcxHNsqOAAAA:8 a=kjA1WfKlZGSDP-a8gjUA:9 a=CjuIK1q_8ugA:10 Date: Wed, 9 Dec 2015 14:06:32 -0700 (MST) From: Marc Aurele La France To: Peter Hurley , Greg Kroah-Hartman , Jiri Slaby , linux-kernel@vger.kernel.org cc: Volth , Damien Miller Subject: n_tty: Check the other end of pty pair before returning EAGAIN on a read() Message-ID: User-Agent: Alpine 2.00 (LNX 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-CMAE-Envelope: MS4wfIQ9PjtRn2WA+gWtpQdREaSw6TTqVeELeFu2pnLqfd8LuyPl6Gwd8RGkF3y+ar4vsOK4TN0vNVxr9pwqXFM8ot6mXlI+HjxgRHw//rDGiJJzsfM4r2fc 9tkOrZHOrbDmwi4YVBgCCdtV+jysgeWzcZ89YOKoetyocNC+FEGKLy47Q4glyGfOiJTwUxFGw9hQEA1IohJ/Gfc5S8XhotfsV74AN1Y0MHQOdoOZ13CLQGwH QoYGctCe9YtoyQ6de6qFTRwpdjio/dMgoWUdniMjZg8ByscFntiWNzR2+9u0rS/Oj14mok4AACQZ9al0Yp9bi4KUBuIp86HY3Uz48xgfvlE= Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Greetings. The following four commits are some of the changes that have been made to the tty layer since kernel version 3.11: 1) f95499c3030fe1bfad57745f2db1959c5b43dca8 n_tty: Don't wait for buffer work in read() loop 2) f8747d4a466ab2cafe56112c51b3379f9fdb7a12 tty: Fix pty master read() after slave closes 3) 52bce7f8d4fc633c9a9d0646eef58ba6ae9a3b73 pty, n_tty: Simplify input processing on final close 4) 1a48632ffed61352a7810ce089dc5a8bcd505a60 pty: Fix input race when closing Commit "4)" corrected an issue whereby EIO could be prematurely returned on a read of one end of a master/slave pty pair after the other had been completely closed. Yet, I would argue that EAGAIN should not be returned either when there actually is data to be returned. This whether or not the other end has been completely closed. Indeed, the previous code (before commit "1)") checked the other end of the pty pair for any data before returning EAGAIN. This mimics the behaviour of other System-V variants (Solaris, AIX, etc.) in this regard and ensured that EAGAIN really did mean no data was available at the time of the call. Portable OpenSSH, since release 4.6p1 in 2007, relies on this being the case and has been broken since commit "1)" introduced spurious EAGAIN returns (i.e. as of 3.12 kernels). The scenario at hand is as follows. After sshd has been SIGCHLD'ed about the shell's termination, it continues to read the master pty until an error occurs. This error will be EIO if no process has the slave pty open. Otherwise (for example when the shell spawned long-running processes in the background before terminating), that error is expected to be EAGAIN. sshd cannot continue to read until an EIO in all cases, because doing so causes the session to hang until all processes have closed the slave pty, which is not the desired behaviour. Thus a spurious EAGAIN return causes sshd to lose data, whether or not the slave pty is completely closed. I've been using the following script to reproduce the problem. It loops until the issue is detected. #! /bin/bash LOG=sshlog-`date "+%F.%T"` touch ${LOG} while test -z "`grep -n '^Connection' ${LOG} | grep -v '0:Connection'`" do ssh -p 22 -tt root@localhost \ '/bin/bash -c "/bin/ping -c4 8.8.8.8"' 2>&1 | \ tee -a ${LOG} done It should be noted that the problem is extremely rare, but still occurs, on real hardware. This bug is easier to replicate in a virtual machine such as those that can be created using Google Cloud. The patch below is a suggested fix. It was developed using a 4.3.0 kernel and should apply, modulo fuzz, to any release >= 4.0.5. My suggested fix is modeled after commit "2)" mentionned above. Given commit "2)" was later reworked by commit "3)", I fully expect my fix to be reworked as well. I volunteer to backport the fix this ends up being to any stable release >= 3.12 deemed needed. Please Reply-To-All. Thanks and have a great day. Marc. Reported-by: Volth BugLink: https://bugzilla.mindrot.org/show_bug.cgi?id=52 BugLink: https://bugzilla.mindrot.org/show_bug.cgi?id=2492 Signed-off-by: Marc Aurele La France --- a/drivers/tty/n_tty.c +++ a/drivers/tty/n_tty.c @@ -2281,20 +2281,26 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file, if (!timeout) break; if (file->f_flags & O_NONBLOCK) { - retval = -EAGAIN; - break; - } - if (signal_pending(current)) { - retval = -ERESTARTSYS; - break; - } - up_read(&tty->termios_rwsem); + up_read(&tty->termios_rwsem); + flush_work(&tty->port->buf.work); + down_read(&tty->termios_rwsem); + if (!input_available_p(tty, 0)) { + retval = -EAGAIN; + break; + } + } else { + if (signal_pending(current)) { + retval = -ERESTARTSYS; + break; + } + up_read(&tty->termios_rwsem); - timeout = wait_woken(&wait, TASK_INTERRUPTIBLE, - timeout); + timeout = wait_woken(&wait, TASK_INTERRUPTIBLE, + timeout); - down_read(&tty->termios_rwsem); - continue; + down_read(&tty->termios_rwsem); + continue; + } } if (ldata->icanon && !L_EXTPROC(tty)) {