Subject: Re: Race between SIGIO and epoll from SMP host
From: Anton Ivanov
To: YiFei Zhu
Cc: linux-um@lists.infradead.org
Date: Thu, 22 Apr 2021 08:32:25 +0100

On 21/04/2021 16:45, Anton Ivanov wrote:
>
>
> On 21/04/2021 14:35, YiFei Zhu wrote:
>> On Wed, Apr 21, 2021 at 7:32 AM Anton Ivanov
>> wrote:
>>>> Considering that this is a race on the host, what would be the best
>>>> way to fix this?
>>>
>>> Interesting one. I need to think.
>>>
>>> One option would be to wait for epoll events with a timeout which is
>>> larger than zero - f.e. HZ.
>>
>> I was about to say I could reproduce it even with a timeout of 1ms,
>> then I realized that code I pasted above already used 1ms timeout.
>> Assertion failures using 1ms timeout seems much rarer than 0 timeout
>> however.
>>
>> For reference my CONFIG_HZ on the host is 1000. I also use
>> CONFIG_NO_HZ_IDLE if that's relevant (I'm not too familiar with how
>> the kernel ticking works).
>>
>>> If we have received a SIGIO there is an epoll event on the way. The
>>> fact that it is not in the queue right now means that we are due to
>>> process it shortly.
>
> This seems to be limited to ttys. Why - I need to figure it out.
>
> If this ends up as tty specific, we can enable the work-around for ttys
> which was there when they were not producing sigio on write correctly.
>
> This ends up disabled on most modern machines, because modern kernels
> produce sigio on write correctly for ttys.
>
> With the workaround enabled there is an extra IO event which is produced
> after the notification appears on the poll loop in a helper thread. So
> the stall should never happen.

I now have an idea why we see this on ttys.

The TTY IO wake-up, in addition to doing SIGIO before the poll
notifications, also does the poll notifications using a wake-up which
will reschedule.

By comparison, a socket, say, does a sync wake-up which does not
reschedule, and does it before SIGIO.

In either case, we stand a chance of missing an interrupt. In the second
case the chance is just extremely small. So small that I have never seen
it in practice.

The real way of dealing with it will be to add a helper thread which
(e)polls the epoll fd and generates a SIGIO if there is an outstanding
EPOLL notification which has been missed.

This would also take care of the range of conditions which are currently
handled by the SIGIO fd helper, so that would become surplus to
requirements. I think that just polling the epoll fd should do the job
here. So this will also get rid of all the motions needed to register
fds with the async helper.

Brgds,

>
> A.
>
>>>
>>> A.
>>
>> YiFei Zhu
>>
>

-- 
Anton R. Ivanov
https://www.kot-begemot.co.uk/

_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
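
For illustration only, the helper-thread idea described above could be
sketched roughly as follows. This is not the UML code and not a proposed
patch: the names epoll_watch, epoll_watch_thread, start_epoll_watch and
sigio_handler are made up for the example, and the 1 ms back-off is an
arbitrary stand-in for whatever handshake a real helper would use to
learn that the main loop has drained the epoll fd. It relies only on the
fact that an epoll fd is itself pollable.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <signal.h>
#include <sys/epoll.h>   /* for callers that create the epoll fd */
#include <time.h>
#include <unistd.h>

/* Hypothetical counter and handler, only so the effect is observable. */
static volatile sig_atomic_t sigio_count;

static void sigio_handler(int sig)
{
    (void)sig;
    sigio_count++;
}

/* Hypothetical helper state: which epoll fd to watch, whom to signal. */
struct epoll_watch {
    int epoll_fd;
    pthread_t target;
};

static void *epoll_watch_thread(void *arg)
{
    struct epoll_watch *w = arg;
    struct pollfd pfd = { .fd = w->epoll_fd, .events = POLLIN };
    struct timespec backoff = { .tv_sec = 0, .tv_nsec = 1000000 };
    sigset_t set;

    /* Keep SIGIO out of this thread so it lands on the main one. */
    sigemptyset(&set);
    sigaddset(&set, SIGIO);
    pthread_sigmask(SIG_BLOCK, &set, NULL);

    for (;;) {
        /* The epoll fd reads as POLLIN while it has pending events. */
        int n = poll(&pfd, 1, -1);

        if (n < 0) {
            if (errno == EINTR)
                continue;
            break;
        }
        if (pfd.revents & (POLLERR | POLLNVAL))
            break;
        if (pfd.revents & POLLIN) {
            /* Re-raise SIGIO in case the original one was missed. */
            pthread_kill(w->target, SIGIO);
            /* Assumption: crude back-off instead of a real handshake. */
            nanosleep(&backoff, NULL);
        }
    }
    return NULL;
}

/* Starts the single (static) watcher; returns 0 on success. */
static int start_epoll_watch(int epoll_fd, pthread_t target, pthread_t *out)
{
    static struct epoll_watch w;

    assert(out != NULL);
    w.epoll_fd = epoll_fd;
    w.target = target;
    return pthread_create(out, NULL, epoll_watch_thread, &w);
}
```

A caller would install sigio_handler for SIGIO, create the epoll fd as
usual, and call start_epoll_watch(epoll_fd, pthread_self(), &tid) once.
Because the poll is level-triggered, the helper keeps re-raising SIGIO
until the main loop drains the events, which is why some form of
suspend/resume between the two would be needed in practice.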