From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752191AbbINUTl (ORCPT ); Mon, 14 Sep 2015 16:19:41 -0400 Received: from tex.lwn.net ([70.33.254.29]:44544 "EHLO vena.lwn.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752017AbbINUTd (ORCPT ); Mon, 14 Sep 2015 16:19:33 -0400 Date: Mon, 14 Sep 2015 14:19:31 -0600 From: Jonathan Corbet To: Alexander Shishkin Cc: Peter Zijlstra , Ingo Molnar , linux-kernel@vger.kernel.org, vince@deater.net, eranian@google.com, johannes@sipsolutions.net, Arnaldo Carvalho de Melo Subject: Re: [PATCH RFC v3 1/6] exterr: Introduce extended syscall error reporting Message-ID: <20150914141931.39f6be92@lwn.net> In-Reply-To: <1441987205-4021-2-git-send-email-alexander.shishkin@linux.intel.com> References: <1441987205-4021-1-git-send-email-alexander.shishkin@linux.intel.com> <1441987205-4021-2-git-send-email-alexander.shishkin@linux.intel.com> Organization: LWN.net MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, 11 Sep 2015 19:00:00 +0300 Alexander Shishkin wrote: > It has been pointed out several times that certain system calls' error > reporting leaves a lot to be desired [1], [2]. Such system calls would > take complex parameter structures as their input and return -EINVAL if > one or more parameters are invalid or in conflict leaving it up to the > user to figure out exactly what is wrong with their request. One such > syscall is perf_event_open() with its attribute structure containing > 40+ parameters and tens of parameter validation checks. > > This patch introduces a fairly simple infrastructure that allows call > sites to annotate their error codes with arbitrary strings, which the > userspace can fetch using a prctl() along with the module name that > produced the error message, file name, line number and optionally any > amount of additional information in JSON format. So, a couple of comments as I try to figure this stuff out... > diff --git a/include/linux/sched.h b/include/linux/sched.h > index 119823decc..0eeeda62ef 100644 > --- a/include/linux/sched.h > +++ b/include/linux/sched.h > @@ -1776,6 +1776,7 @@ struct task_struct { > #ifdef CONFIG_DEBUG_ATOMIC_SLEEP > unsigned long task_state_change; > #endif > + int ext_err_code; > int pagefault_disabled; Here you add this extended error code, fine. I'll come back to this in a moment, but first... [...] > +static struct ext_err_site *ext_err_site_find(int code) > +{ > + struct ext_err_domain_desc *dom = ext_err_domain_find(code); > + struct ext_err_site *site; > + unsigned long off; > + > + if (!code) > + return NULL; You've already used "code" by this point, and it looks like ext_err_domain_find() will react by doing a full, doomed search for it. > + /* shouldn't happen */ > + if (WARN_ON_ONCE(!dom)) > + return NULL; > + > + code -= dom->first; > + off = (unsigned long)dom->start + dom->err_site_size * code; > + site = (struct ext_err_site *)off; > + > + return site; > +} > + > +int ext_err_copy_to_user(void __user *buf, size_t size) > +{ > + struct ext_err_domain_desc *dom = ext_err_domain_find(current->ext_err_code); > + struct ext_err_site *site = ext_err_site_find(current->ext_err_code); Here too you use ext_err_code without making sure it has a useful value. This will also result in two calls to ext_err_domain_find(), and, thus, a redundant search of the domains array. Why not just pass dom to the second call? [...] > + if (!ret) > + current->ext_err_code = 0; Back to this code: here is the only place you ever clear it, and this seems to be an error response in its own right. So, if I read this correctly, once an extended error has been signalled, it will remain forever in the task state until another extended error overwrites it, right? What that says to me is that there will be no way to know whether an error description returned from prctl() corresponds to an error just reported to the application or not; it could be an old error from last week. That strikes me as being less useful than it could otherwise be. It seems to me that current->ext_err_code needs to be cleared on each system call entry (except for your special prctl() of course!). Or have I missed something somewhere? Thanks, jon