From mboxrd@z Thu Jan 1 00:00:00 1970 From: Andy Lutomirski Subject: Re: How do I avoid recvmsg races with IP_RECVERR? Date: Tue, 2 Jun 2015 14:33:49 -0700 Message-ID: References: <1433279822.2479038.285136017.7DD69CC6@webmail.messagingengine.com> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Cc: Andy Lutomirski , Network Development To: Hannes Frederic Sowa Return-path: Received: from mail-lb0-f180.google.com ([209.85.217.180]:33313 "EHLO mail-lb0-f180.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751668AbbFBVeL (ORCPT ); Tue, 2 Jun 2015 17:34:11 -0400 Received: by lbcue7 with SMTP id ue7so113092255lbc.0 for ; Tue, 02 Jun 2015 14:34:10 -0700 (PDT) In-Reply-To: <1433279822.2479038.285136017.7DD69CC6@webmail.messagingengine.com> Sender: netdev-owner@vger.kernel.org List-ID: On Tue, Jun 2, 2015 at 2:17 PM, Hannes Frederic Sowa wrote: > On Tue, Jun 2, 2015, at 21:40, Andy Lutomirski wrote: >> As far as I can tell, enabling IP_RECVERR causes the presence of a >> queued error to cause recvmsg, etc to return an error (once). It's >> worse, though: a new error can be queued asynchronously at any time, >> this setting sk_err to a nonzero value. How do I sensibly distinguish >> recvmsg failures to to genuine errors receiving messages from recvmsg >> failures because there's a queued error? >> >> The only way I can see to get reliable error handling is to literally >> call recvmsg in a loop: >> >> while (true /* or while POLLIN is set */) { >> int ret = recvmsg(..., MSG_ERRQUEUE not set); >> if (ret < 0 && /* what goes here? */) { >> whoops! this might be a harmless asynchronous error! >> take no action! >> } > > I see either two possibilities: > > We export the icmp_err_convert tables along with the udp_lib_err error > conversions to user space and spice them up with flags to mark if they > are transient (icmp_err_convert already has a fatal flag). This seems overcomplicated. I'd rather have a flag I pass to tell the kernel that I don't want to see transient errors (nd that I'll clear them myself using POLLERR and either MSG_ERRQUEUE or SO_ERROR. > > Otherwise you should be able to call recvmsg with MSG_ERRQUEUE set after > you got a ret < 0 when calling without MSG_ERRQUEUE and inspect the > sock_extended_err, no? I do this already, which makes me think that there's a bug or another race somewhere. I've only seen a failure once in several years of operation. The failure happened on a ping socket. I suspect that the race is: ping_err: ip_icmp_error(...); user: recvmsg(MSG_ERRQUEUE) and dequeues the error. ping_err: sk_err = err; user: recvmsg(MSG_ERRQUEUE not set), and recvmsg sees and clears the error via sock_error. user: recvmsg(MSG_ERRQUEUE), and recvmsg returns -EAGAIN. Now the user code thinks that it was a real (non-transient) error and aborts. Shouldn't that sk->sk_err = err assignment at least use WRITE_ONCE? Even if this race were fixed, this interface still sucks IMO. --Andy