linux-toolchains.vger.kernel.org archive mirror
* [PATCH 0/6] RFC: adding support to GCC for detecting trust boundaries
@ 2021-11-13 20:37 David Malcolm
  2021-11-13 20:37 ` [PATCH 1a/6] RFC: Implement "#pragma GCC custom_address_space" David Malcolm
                   ` (9 more replies)
  0 siblings, 10 replies; 39+ messages in thread
From: David Malcolm @ 2021-11-13 20:37 UTC
  To: gcc-patches, linux-toolchains; +Cc: David Malcolm

[Crossposting between gcc-patches@gcc.gnu.org and
linux-toolchains@vger.kernel.org; sorry about my lack of kernel
knowledge, in case any of the following seems bogus]

I've been trying to turn my prototype from the LPC2021 session on
"Adding kernel-specific test coverage to GCC's -fanalyzer option"
( https://linuxplumbersconf.org/event/11/contributions/1076/ ) into
something that can go into GCC upstream without adding kernel-specific
special cases, or requiring a GCC plugin.  The prototype simply
special-cased "copy_from_user" and "copy_to_user" in GCC, which is
clearly not OK.

This GCC patch kit implements detection of "trust boundaries", aimed at
detection of "infoleaks" and of use of unsanitized attacker-controlled
values ("taint") in the Linux kernel.

For example, here's an infoleak diagnostic (using notes to
express what fields and padding within a struct have not been
initialized):

infoleak-CVE-2011-1078-2.c: In function ‘test_1’:
infoleak-CVE-2011-1078-2.c:28:9: warning: potential exposure of sensitive
  information by copying uninitialized data from stack across trust
  boundary [CWE-200] [-Wanalyzer-exposure-through-uninit-copy]
   28 |         copy_to_user(optval, &cinfo, sizeof(cinfo));
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  ‘test_1’: events 1-3
    |
    |   21 |         struct sco_conninfo cinfo;
    |      |                             ^~~~~
    |      |                             |
    |      |                             (1) region created on stack here
    |      |                             (2) capacity: 6 bytes
    |......
    |   28 |         copy_to_user(optval, &cinfo, sizeof(cinfo));
    |      |         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    |      |         |
    |      |         (3) uninitialized data copied from stack here
    |
infoleak-CVE-2011-1078-2.c:28:9: note: 1 byte is uninitialized
   28 |         copy_to_user(optval, &cinfo, sizeof(cinfo));
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
infoleak-CVE-2011-1078-2.c:14:15: note: padding after field ‘dev_class’ is uninitialized (1 byte)
   14 |         __u8  dev_class[3];
      |               ^~~~~~~~~
infoleak-CVE-2011-1078-2.c:21:29: note: suggest forcing zero-initialization by providing a ‘{0}’ initializer
   21 |         struct sco_conninfo cinfo;
      |                             ^~~~~
      |                                   = {0}
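
To make the diagnostic above concrete, here is a minimal user-space sketch
of where the leaked byte comes from.  The struct has the same shape as the
one in the test case (a 16-bit handle followed by a 3-byte array); the
helper names sco_conninfo_padding_bytes and fill_conninfo are hypothetical,
and the sizes assume a typical ABI where uint16_t is 2-byte aligned.  The
sketch uses memset as one way of giving every byte a defined value before
the struct would be copied out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same shape as the struct in infoleak-CVE-2011-1078-2.c.  */
struct sco_conninfo {
  uint16_t hci_handle;
  uint8_t  dev_class[3];
  /* On common ABIs the compiler adds 1 byte of tail padding here so
     that sizeof is a multiple of the struct's 2-byte alignment.  */
};

/* Hypothetical helper: number of trailing bytes not covered by a field.  */
size_t
sco_conninfo_padding_bytes (void)
{
  size_t end_of_fields
    = offsetof (struct sco_conninfo, dev_class)
      + sizeof (((struct sco_conninfo *) 0)->dev_class);
  return sizeof (struct sco_conninfo) - end_of_fields;
}

/* Hypothetical helper: fill in OUT so that every byte, padding
   included, has been written before the struct is copied anywhere.  */
void
fill_conninfo (struct sco_conninfo *out, uint16_t handle,
               const uint8_t dev_class[3])
{
  assert (out != NULL);
  memset (out, 0, sizeof (*out));   /* covers the padding byte too */
  out->hci_handle = handle;
  memcpy (out->dev_class, dev_class, sizeof (out->dev_class));
}
```

Under those assumptions the struct occupies 6 bytes, of which 5 belong to
fields; the remaining byte is the one the analyzer reports.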

I need to come up with a way of expressing trust boundaries that will
be:
- acceptable to the GCC community (not be too kernel-specific), and
- useful to the Linux kernel community.

At LPC it was pointed out that the kernel already has various
annotations e.g. "__user" for different kinds of pointers, and that it
would be best to reuse those.


Approach 1: Custom Address Spaces
=================================

GCC's C frontend supports target-specific address spaces; see:
  https://gcc.gnu.org/onlinedocs/gcc/Named-Address-Spaces.html
Quoting the N1275 draft of ISO/IEC DTR 18037:
  "Address space names are ordinary identifiers, sharing the same name
  space as variables and typedef names.  Any such names follow the same
  rules for scope as other ordinary identifiers (such as typedef names).
  An implementation may provide an implementation-defined set of
  intrinsic address spaces that are, in effect, predefined at the start
  of every translation unit.  The names of intrinsic address spaces must
  be reserved identifiers (beginning with an underscore and an uppercase
  letter or with two underscores).  An implementation may also
  optionally support a means for new address space names to be defined
  within a translation unit."

Patch 1a in the following patch kit for GCC implements such a means to
define new address space names in a translation unit, via a pragma:
  #pragma GCC custom_address_space(NAME_OF_ADDRESS_SPACE)

For example, the Linux kernel could perhaps write:

  #define __kernel
  #pragma GCC custom_address_space(__user)
  #pragma GCC custom_address_space(__iomem)
  #pragma GCC custom_address_space(__percpu)
  #pragma GCC custom_address_space(__rcu)

and thus the C frontend can complain about code that mismatches __user
and kernel pointers, e.g.:

custom-address-space-1.c: In function ‘test_argpass_to_p’:
custom-address-space-1.c:29:14: error: passing argument 1 of ‘accepts_p’
from pointer to non-enclosed address space
   29 |   accepts_p (p_user);
      |              ^~~~~~
custom-address-space-1.c:21:24: note: expected ‘void *’ but argument is
of type ‘__user void *’
   21 | extern void accepts_p (void *);
      |                        ^~~~~~
custom-address-space-1.c: In function ‘test_cast_k_to_u’:
custom-address-space-1.c:135:12: warning: cast to ‘__user’ address space
pointer from disjoint generic address space pointer
  135 |   p_user = (void __user *)p_kernel;
      |            ^

The patch doesn't yet maintain a good distinction between implicit
target-specific address spaces and user-defined address spaces, has at
least one known major bug, and has only been lightly tested.  I can
fix these issues, but was hoping for feedback that this approach is the
right direction from both the GCC and Linux development communities.

Implementation status: doesn't yet bootstrap; am running into stage2
vs stage3 comparison issues.


Approach 2: An "untrusted" attribute
====================================

Alternatively, patch 1b in the kit implements:

  __attribute__((untrusted))

which can be applied to types as a qualifier (similarly to const,
volatile, etc) to mark a trust boundary, hence the kernel could have:

  #define __user __attribute__((untrusted))

where my patched GCC treats
  T *
vs 
  T __attribute__((untrusted)) *
as being different types and thus the C frontend can complain (even without
-fanalyzer) about e.g.:

extern void accepts_p(void *);

void test_argpass_to_p(void __user *p_user)
{
  accepts_p(p_user);
}

untrusted-pointer-1.c: In function ‘test_argpass_to_p’:
untrusted-pointer-1.c:22:13: error: passing argument 1 of ‘accepts_p’
from pointer with different trust level
   22 |   accepts_p(p_user);
      |              ^~~~~~
untrusted-pointer-1.c:14:23: note: expected ‘void *’ but argument is of
type ‘__attribute__((untrusted)) void *’
   14 | extern void accepts_p(void *);
      |                        ^~~~~~

So you'd get enforcement of __user vs non-__user pointers as part of
GCC's regular type-checking.  (You need an explicit cast to convert
between the untrusted vs trusted types).
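
As a rough sketch of what that explicit cast could look like in practice:
the macro HAVE_UNTRUSTED_ATTRIBUTE below is hypothetical (something the
reader would define only when building with the patched GCC), and
trusted_strlen and measure_user_string are made-up names.  Without the
macro, __user expands to nothing, so the code compiles with an unpatched
compiler but gets no checking.  Real kernel code would copy the data with
copy_from_user rather than dereference through a cast; the point here is
only the type-checking rule.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#ifdef HAVE_UNTRUSTED_ATTRIBUTE   /* hypothetical: patched GCC only */
# define __user __attribute__((untrusted))
#else
# define __user                   /* expands to nothing elsewhere */
#endif

/* Accepts a trusted pointer only.  */
size_t
trusted_strlen (const char *s)
{
  assert (s != NULL);
  return strlen (s);
}

/* Crosses the trust boundary on purpose: the cast is the single place
   where the conversion from untrusted to trusted is spelled out.  */
size_t
measure_user_string (const char __user *u)
{
  /* trusted_strlen (u);
     With the patch applied, the call above would be rejected as
     "pointer with different trust level".  */
  return trusted_strlen ((const char *) u);
}
```

Keeping such casts rare and greppable is the usual benefit of making the
conversion explicit.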

This approach is much less expressive than the custom address space
approach; it would only cover the trust boundary aspect; it wouldn't
cover any differences between generic pointers and __user, vs __iomem,
__percpu, and __rcu which I admit I only dimly understand.

Implementation status: bootstraps and passes regression testing.
Builds most of the kernel, but am running into various conversion
issues.  It would be good to have some clarity on what conversions
the compiler ought to warn about, and what conversions should be OK.


Approach 3: some kind of custom qualifier
=========================================

Approach 1 extends the existing "named address space" machinery to add
new values; approach 2 adds a new flag to cv-qualifiers.  Both of these
approaches work in terms of cv-qualifiers.  We have some spare bits
available for these; perhaps a third approach could be to add a new
kind of user-defined qualifier, like named address spaces, but orthogonal
to them.   I haven't attempted to implement this.


Other attributes
================

Patch 2 in the kit adds:
  __attribute__((returns_zero_on_success))
and
  __attribute__((returns_nonzero_on_success))
as hints to the analyzer that it's worth bifurcating the analysis of
such functions (to explore failure vs success, and thus to better
explore error-handling paths).  It's also a hint to the human reader of
the source code.

Given the above, the kernel could then have:

extern int copy_from_user(void *to, const void __user *from, long n)
  __attribute__((access (write_only, 1, 3),
		 access (read_only, 2, 3),
		 returns_zero_on_success));

extern long copy_to_user(void __user *to, const void *from, unsigned long n)
  __attribute__((access (write_only, 1, 3),
		 access (read_only, 2, 3),
		 returns_zero_on_success));

with suitable macros in compiler.h or whatnot.
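
For illustration, a user-space sketch of the kind of caller that benefits
from splitting the analysis into success and failure paths.  Everything
here is an assumption made for the example: HAVE_RETURNS_ZERO_ATTRS is a
hypothetical macro for selecting the patched GCC, mock_copy_from_user is
a stand-in that uses memcpy, and a NULL source is used to simulate a
fault.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#ifdef HAVE_RETURNS_ZERO_ATTRS    /* hypothetical: patched GCC only */
# define RETURNS_ZERO_ON_SUCCESS __attribute__((returns_zero_on_success))
#else
# define RETURNS_ZERO_ON_SUCCESS  /* expands to nothing elsewhere */
#endif

/* Stand-in for copy_from_user: returns 0 on success, or the number of
   bytes not copied on failure.  */
RETURNS_ZERO_ON_SUCCESS unsigned long
mock_copy_from_user (void *to, const void *from, unsigned long n);

unsigned long
mock_copy_from_user (void *to, const void *from, unsigned long n)
{
  if (from == NULL)               /* simulated fault */
    return n;
  memcpy (to, from, n);
  return 0;
}

/* Both outcomes of the copy are handled; these are the two paths the
   attribute asks the analyzer to explore separately.  */
int
read_request (int *out, const void *user_src)
{
  int tmp;
  assert (out != NULL);
  if (mock_copy_from_user (&tmp, user_src, sizeof (tmp)) != 0)
    return -EFAULT;               /* failure path: tmp must not be used */
  *out = tmp;                     /* success path: tmp is fully written */
  return 0;
}
```

On the failure path tmp is still uninitialized, so a caller that ignored
the return value and used tmp anyway is exactly the sort of bug the
bifurcated analysis is meant to expose.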

("access" is an existing GCC attribute; see
 https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html )

My patched GCC adds a heuristic to -fanalyzer that a 3-argument function
with a read_only buffer, a write_only buffer and a shared size argument
is a "copy function", and treats it as a copy from *from to *to of up to
n bytes that succeeds or, given one of the above attributes, can either
succeed or fail.  I'm wiring things up so that values read from *untrusted_ptr
are tracked as tainted, and values written to *untrusted_ptr are treated
as possible infoleaks (e.g. uninitialized values written to
*untrusted_ptr are specifically called out).  This gets the extra
checking for infoleaks and taint that my earlier prototype had, but is
thus expressed via attributes, without having to have kernel-specific
special cases.
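
A small user-space sketch of the taint half of this, under stated
assumptions: mock_copy_from_user, struct request, handle_request and
table_get are all hypothetical names, and the copy is simulated with
memcpy.  The value req.idx arrives from across the trust boundary, so it
has to be range-checked before it is used as an array index.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define TABLE_SIZE 16
static int table[TABLE_SIZE];

struct request {
  unsigned int idx;
  int value;
};

/* Stand-in for copy_from_user; always succeeds in this sketch.  */
static unsigned long
mock_copy_from_user (void *to, const void *from, unsigned long n)
{
  memcpy (to, from, n);
  return 0;
}

/* Returns 0 on success, or a negative errno value on failure.  */
int
handle_request (const void *user_req)
{
  struct request req;
  if (mock_copy_from_user (&req, user_req, sizeof (req)) != 0)
    return -EFAULT;

  /* req.idx is attacker-controlled ("tainted").  Without this check the
     store below would be an out-of-bounds write at an index chosen by
     the caller.  */
  if (req.idx >= TABLE_SIZE)
    return -EINVAL;

  table[req.idx] = req.value;
  return 0;
}

/* Read back an entry, for checking the sketch.  */
int
table_get (unsigned int idx)
{
  assert (idx < TABLE_SIZE);
  return table[idx];
}
```

Deleting the bounds check gives the pattern that taint detection is
intended to flag: a value read from untrusted memory used as an index
without sanitization.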

Patch 3 of the kit adds infoleak detection to GCC's -fanalyzer (as
in the example above).

Possibly silly question: is it always a bug for the value of a kernel
pointer to leak into user space?  i.e. should I be complaining about an
infoleak if the value of a trusted_ptr itself is written to
*untrusted_ptr?  e.g.

  s.p = some_kernel_ptr;
  copy_to_user(user_p, &s, sizeof (s));
     /* value of some_kernel_ptr is written to user space;
        is this something we should warn for?  */

Patch 4a/4b wire up the different implementations of "untrusted" into
GCC's -fanalyzer, which is used by...

Patch 5 uses this so that "untrusted" values are used in taint detection
in the analyzer, so that it can complain about attacker-controlled
values being used without sanitization.

Patch 6 adds a new __attribute__ ((tainted)) allowing for further
taint detection (e.g. identifying syscalls), with minimal patching of
the kernel, and without requiring a lot of link-time interprocedural
analysis.  I believe that some of this could work independently of
the trust boundary marking from the rest of the patch kit.

The combined patch kit (using approach 2 i.e. the "b" patches)
successfully bootstraps and passes regression testing on
x86_64-pc-linux-gnu.


Which of the 3 approaches looks best to:
- the GCC community?
- the Linux kernel community?

Does clang/LLVM have anything similar?

There are many examples in the patches, some of which are taken from
historical kernel vulnerabilities, and others from my "antipatterns.ko"
project ( https://github.com/davidmalcolm/antipatterns.ko ).

Thoughts?

Dave


David Malcolm (6 or 8, depending how you count):
  1a: RFC: Implement "#pragma GCC custom_address_space"
  1b: Add __attribute__((untrusted))
  2: Add returns_zero_on_success/failure attributes
  3: analyzer: implement infoleak detection
  4a: analyzer: implementation of region::untrusted_p in terms of custom
    address spaces
  4b: analyzer: implement region::untrusted_p in terms of
    __attribute__((untrusted))
  5: analyzer: use region::untrusted_p in taint detection
  6: Add __attribute__ ((tainted))

 gcc/Makefile.in                               |   3 +-
 gcc/analyzer/analyzer.opt                     |  20 +
 gcc/analyzer/checker-path.cc                  | 104 +++
 gcc/analyzer/checker-path.h                   |  47 +
 gcc/analyzer/diagnostic-manager.cc            |  75 +-
 gcc/analyzer/diagnostic-manager.h             |   3 +-
 gcc/analyzer/engine.cc                        | 342 ++++++-
 gcc/analyzer/exploded-graph.h                 |   3 +
 gcc/analyzer/pending-diagnostic.cc            |  30 +
 gcc/analyzer/pending-diagnostic.h             |  24 +
 gcc/analyzer/program-state.cc                 |  26 +-
 gcc/analyzer/region-model-impl-calls.cc       |  26 +-
 gcc/analyzer/region-model.cc                  | 504 ++++++++++-
 gcc/analyzer/region-model.h                   |  46 +-
 gcc/analyzer/region.cc                        |  52 ++
 gcc/analyzer/region.h                         |   4 +
 gcc/analyzer/sm-taint.cc                      | 839 ++++++++++++++++--
 gcc/analyzer/sm.h                             |   9 +
 gcc/analyzer/store.h                          |   1 +
 gcc/analyzer/trust-boundaries.cc              | 615 +++++++++++++
 gcc/c-family/c-attribs.c                      | 132 +++
 gcc/c-family/c-pretty-print.c                 |   2 +
 gcc/c/c-typeck.c                              |  64 ++
 gcc/doc/extend.texi                           |  63 +-
 gcc/doc/invoke.texi                           |  80 +-
 gcc/print-tree.c                              |   3 +
 .../c-c++-common/attr-returns-zero-on-1.c     |  68 ++
 gcc/testsuite/c-c++-common/attr-untrusted-1.c | 165 ++++
 .../gcc.dg/analyzer/attr-tainted-1.c          |  88 ++
 .../gcc.dg/analyzer/attr-tainted-misuses.c    |   6 +
 .../gcc.dg/analyzer/copy-function-1.c         |  98 ++
 .../gcc.dg/analyzer/copy_from_user-1.c        |  45 +
 gcc/testsuite/gcc.dg/analyzer/infoleak-1.c    | 181 ++++
 gcc/testsuite/gcc.dg/analyzer/infoleak-2.c    |  29 +
 gcc/testsuite/gcc.dg/analyzer/infoleak-3.c    | 141 +++
 gcc/testsuite/gcc.dg/analyzer/infoleak-5.c    |  35 +
 .../analyzer/infoleak-CVE-2011-1078-1.c       | 134 +++
 .../analyzer/infoleak-CVE-2011-1078-2.c       |  42 +
 .../analyzer/infoleak-CVE-2014-1446-1.c       | 117 +++
 .../analyzer/infoleak-CVE-2017-18549-1.c      | 101 +++
 .../analyzer/infoleak-CVE-2017-18550-1.c      | 171 ++++
 .../gcc.dg/analyzer/infoleak-antipatterns-1.c | 162 ++++
 .../gcc.dg/analyzer/infoleak-fixit-1.c        |  22 +
 gcc/testsuite/gcc.dg/analyzer/pr93382.c       |   2 +-
 .../analyzer/taint-CVE-2011-0521-1-fixed.c    | 113 +++
 .../gcc.dg/analyzer/taint-CVE-2011-0521-1.c   | 113 +++
 .../analyzer/taint-CVE-2011-0521-2-fixed.c    |  93 ++
 .../gcc.dg/analyzer/taint-CVE-2011-0521-2.c   |  93 ++
 .../analyzer/taint-CVE-2011-0521-3-fixed.c    |  56 ++
 .../gcc.dg/analyzer/taint-CVE-2011-0521-3.c   |  57 ++
 .../gcc.dg/analyzer/taint-CVE-2011-0521-4.c   |  40 +
 .../gcc.dg/analyzer/taint-CVE-2011-0521-5.c   |  42 +
 .../gcc.dg/analyzer/taint-CVE-2011-0521-6.c   |  37 +
 .../gcc.dg/analyzer/taint-CVE-2011-0521.h     | 136 +++
 .../gcc.dg/analyzer/taint-CVE-2011-2210-1.c   |  93 ++
 .../gcc.dg/analyzer/taint-CVE-2020-13143-1.c  |  38 +
 .../gcc.dg/analyzer/taint-CVE-2020-13143-2.c  |  32 +
 .../gcc.dg/analyzer/taint-CVE-2020-13143.h    |  91 ++
 gcc/testsuite/gcc.dg/analyzer/taint-alloc-1.c |  64 ++
 gcc/testsuite/gcc.dg/analyzer/taint-alloc-2.c |  27 +
 gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c |  21 +
 gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c |  31 +
 .../gcc.dg/analyzer/taint-antipatterns-1.c    | 137 +++
 .../gcc.dg/analyzer/taint-divisor-1.c         |  26 +
 .../{taint-1.c => taint-read-index-1.c}       |  19 +-
 .../gcc.dg/analyzer/taint-read-offset-1.c     | 128 +++
 .../taint-read-through-untrusted-ptr-1.c      |  37 +
 gcc/testsuite/gcc.dg/analyzer/taint-size-1.c  |  32 +
 .../gcc.dg/analyzer/taint-write-index-1.c     | 132 +++
 .../gcc.dg/analyzer/taint-write-offset-1.c    | 132 +++
 gcc/testsuite/gcc.dg/analyzer/test-uaccess.h  |  19 +
 .../torture/infoleak-net-ethtool-ioctl.c      |  78 ++
 .../torture/infoleak-vfio_iommu_type1.c       |  39 +
 gcc/tree-core.h                               |   6 +-
 gcc/tree.c                                    |   1 +
 gcc/tree.h                                    |  11 +-
 76 files changed, 6558 insertions(+), 140 deletions(-)
 create mode 100644 gcc/analyzer/trust-boundaries.cc
 create mode 100644 gcc/testsuite/c-c++-common/attr-returns-zero-on-1.c
 create mode 100644 gcc/testsuite/c-c++-common/attr-untrusted-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/attr-tainted-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/attr-tainted-misuses.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/copy-function-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/copy_from_user-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-3.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-5.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-CVE-2011-1078-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-CVE-2011-1078-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-CVE-2014-1446-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-CVE-2017-18549-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-CVE-2017-18550-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-antipatterns-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-fixit-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-1-fixed.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-2-fixed.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-3-fixed.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-3.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-4.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-5.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-6.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521.h
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-2210-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143.h
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-alloc-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-alloc-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-antipatterns-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-divisor-1.c
 rename gcc/testsuite/gcc.dg/analyzer/{taint-1.c => taint-read-index-1.c} (72%)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-read-offset-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-read-through-untrusted-ptr-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-size-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-write-index-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-write-offset-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/test-uaccess.h
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/torture/infoleak-net-ethtool-ioctl.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/torture/infoleak-vfio_iommu_type1.c

-- 
2.26.3



* [PATCH 1a/6] RFC: Implement "#pragma GCC custom_address_space"
  2021-11-13 20:37 [PATCH 0/6] RFC: adding support to GCC for detecting trust boundaries David Malcolm
@ 2021-11-13 20:37 ` David Malcolm
  2021-11-13 20:37 ` [PATCH 1b/6] Add __attribute__((untrusted)) David Malcolm
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 39+ messages in thread
From: David Malcolm @ 2021-11-13 20:37 UTC
  To: gcc-patches, linux-toolchains; +Cc: David Malcolm

This work-in-progress patch adds a new:

  #pragma GCC custom_address_space(NAME_OF_ADDRESS_SPACE)

for use by the C front-end.

Currently the custom address spaces are:

- disjoint from all other address spaces, *including* the generic one

- treated the same as the generic address space at the RTL level (in
  terms of code generation)

- treated as "untrusted" by -fanalyzer in a follow-up patch.

but additional syntax could be added to change those defaults if
needed.
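
The first two defaults can be summarised with a toy model.  This is not
the code from the patch: as_id, AS_FIRST_CUSTOM and the function names
are hypothetical, the boundary value 16 is arbitrary, and the built-in
spaces are simply treated as disjoint where a real compiler would
consult the target.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned char as_id;      /* stand-in for GCC's addr_space_t */

enum {
  AS_GENERIC = 0,
  AS_FIRST_CUSTOM = 16            /* assumed boundary, illustration only */
};

static bool
as_custom_p (as_id as)
{
  return as >= AS_FIRST_CUSTOM;
}

/* May a pointer into SUBSET be implicitly converted to a pointer into
   SUPERSET?  Custom spaces are disjoint from everything but themselves,
   including the generic space.  */
bool
as_subset_p (as_id subset, as_id superset)
{
  if (subset == superset)
    return true;
  if (as_custom_p (subset) || as_custom_p (superset))
    return false;
  /* A real compiler would ask the target about two built-in spaces;
     the toy model just treats them as disjoint.  */
  return false;
}

/* For code generation a custom space behaves like the generic one.  */
as_id
as_for_codegen (as_id as)
{
  as_id result = as_custom_p (as) ? AS_GENERIC : as;
  assert (!as_custom_p (result));
  return result;
}
```

In this model a __user pointer and a plain pointer never convert
implicitly in either direction, yet both produce the same machine code,
which matches the behaviour described in the list above.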

The intended use for this is in Linux kernel code, allowing e.g.:

  #define __kernel
  #pragma GCC custom_address_space(__user)
  #pragma GCC custom_address_space(__iomem)
  #pragma GCC custom_address_space(__percpu)
  #pragma GCC custom_address_space(__rcu)

so that the C front-end can complain about mismatching user-space vs
kernel-space pointers during type-checking (and that -fanalyzer can
detect infoleaks and "taint" as data is copied across trust boundaries).

Known issues:
- addr_space_convert is not implemented.
- there isn't yet a way to forcibly cast between address spaces;
  perhaps this should be a built-in function.
- only tested so far on x86_64 (probably needs to use
  ensure_builtin_addr_space everywhere in the target-specific code that
  tests against specific address space IDs).
- issue in testsuite (custom-address-space-2.c)
- issue with precompiled headers

gcc/ChangeLog:
	* Makefile.in (OBJS): Add addr-space.o.
	(GTFILES): Add addr-space.cc.
	* addr-space.cc: New file.
	* addr-space.h: New file.
	* auto-inc-dec.c: Include "addr-space.h".
	(find_inc): Convert targetm.addr_space. uses into addr_space_
	calls.
	* builtins.c: Include "addr-space.h".
	(get_builtin_sync_mem): Convert targetm.addr_space. use into
	addr_space_ call.
	* cfgexpand.c: Include "addr-space.h".
	(convert_debug_memory_address): Convert targetm.addr_space. uses
	into addr_space_ calls.
	(expand_debug_expr): Likewise.
	* config/i386/i386.c: Include "addr-space.h".
	(ix86_print_operand_address_as): Call ensure_builtin_addr_space.
	* coretypes.h (ADDR_SPACE_T_MAX): New.
	(struct custom_addr_space): New forward decl.
	* doc/extend.texi (Named Address Spaces): Mention the new pragma.
	(Custom Address Space Pragmas): New node and subsection.
	* dwarf2out.c: Include "addr-space.h".
	(modified_type_die): Convert targetm.addr_space. use into
	addr_space_ call.
	* emit-rtl.c: Include "addr-space.h".
	(adjust_address_1): Convert targetm.addr_space. use into
	addr_space_ call.
	* explow.c: Include "addr-space.h".
	(convert_memory_address_addr_space_1): Convert targetm.addr_space.
	use into addr_space_ call.
	(memory_address_addr_space): Likewise.
	(promote_mode): Likewise.
	* expr.c: Include "addr-space.h".
	(store_expr): Convert targetm.addr_space. use into addr_space_ call.
	(expand_expr_addr_expr): Likewise.
	(expand_expr_real_2): Likewise.
	(expand_expr_real_1): Likewise.
	* fold-const.c: Include "addr-space.h".
	(const_unop): Convert targetm.addr_space. use into addr_space_ call.
	* gimple.c: Include "addr-space.h".
	(check_loadstore): Convert targetm.addr_space. use into
	addr_space_ call.
	* lra-constraints.c: Include "addr-space.h".
	(valid_address_p): Convert targetm.addr_space. use into
	addr_space_ call.
	* pointer-query.cc: Include "addr-space.h"; drop include of
	"target.h".
	(compute_objsize_r): Convert targetm.addr_space. use into
	addr_space_ call.
	* recog.c: Include "addr-space.h".
	(memory_address_addr_space_p): Convert targetm.addr_space. use
	into addr_space_ call.
	(offsettable_address_addr_space_p): Likewise.
	* reload.c: Include "addr-space.h".
	(strict_memory_address_addr_space_p): Convert targetm.addr_space.
	use into addr_space_ call.
	(find_reloads_address): Likewise.
	* rtlanal.c: Include "addr-space.h".
	(get_address_mode): Convert targetm.addr_space. use into
	addr_space_ call.
	* tree-ssa-address.c: Include "addr-space.h".
	(addr_for_mem_ref): Convert targetm.addr_space. use into
	addr_space_ call.
	(multiplier_allowed_in_address_p): Likewise.
	(most_expensive_mult_to_index): Likewise.
	* tree-ssa-loop-ivopts.c: Include "addr-space.h".
	(addr_offset_valid_p): Convert targetm.addr_space. use into
	addr_space_ call.
	(produce_memory_decl_rtl): Likewise.
	* tree.c: Include "addr-space.h".
	(build_pointer_type_for_mode): Convert targetm.addr_space. use
	into addr_space_ call.
	(build_reference_type_for_mode): Likewise.
	* varasm.c: Include "addr-space.h".
	(make_decl_rtl): Convert targetm.addr_space. use into addr_space_
	call.
	(output_constant): Likewise.

gcc/c-family/ChangeLog:
	* c-attribs.c: Include "addr-space.h".
	(handle_mode_attribute): Convert targetm.addr_space. use into
	addr_space_ call.
	* c-common.h (c_register_custom_addr_space): New decl.
	* c-pragma.c: Include "addr-space.h".
	(handle_pragma_custom_address_space): New.
	(init_pragma): Register the new pragma "GCC custom_address_space".
	Create the address space manager.

gcc/c/ChangeLog:
	* c-decl.c: Include "addr-space.h".
	(c_build_pointer_type): Convert targetm.addr_space. use into
	addr_space_ call.
	(register_addr_space_identifier): New, split out from...
	(c_register_addr_space): ...this function.
	(c_register_custom_addr_space): New.
	* c-parser.c: Include "addr-space.h".
	(c_lex_one_token): Convert targetm.addr_space. use into
	addr_space_ call.
	* c-typeck.c: Include "addr-space.h".
	(addr_space_superset): Convert targetm.addr_space. uses into
	addr_space_ calls.
	(build_c_cast): Quote the names of named address spaces.
	(convert_for_assignment): Convert targetm.addr_space. use into
	addr_space_ call.  Add auto_diagnostic_group and a note about
	which types were involved when complaining about mismatching
	address spaces.

gcc/cp/ChangeLog:
	* tree.c (c_register_custom_addr_space): New stub.

gcc/testsuite/ChangeLog:
	* gcc.dg/custom-address-space-1.c: New test.
	* gcc.dg/custom-address-space-2.c: New test.
	* gcc.dg/custom-address-space-3.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
---
 gcc/Makefile.in                               |   2 +
 gcc/addr-space.cc                             | 177 ++++++++++++++++++
 gcc/addr-space.h                              | 122 ++++++++++++
 gcc/auto-inc-dec.c                            |   5 +-
 gcc/builtins.c                                |   3 +-
 gcc/c-family/c-attribs.c                      |   3 +-
 gcc/c-family/c-common.h                       |   1 +
 gcc/c-family/c-pragma.c                       |  30 +++
 gcc/c/c-decl.c                                |  59 +++++-
 gcc/c/c-parser.c                              |   3 +-
 gcc/c/c-typeck.c                              |  34 ++--
 gcc/cfgexpand.c                               |   9 +-
 gcc/config/i386/i386.c                        |   3 +
 gcc/coretypes.h                               |   3 +
 gcc/cp/tree.c                                 |   8 +
 gcc/doc/extend.texi                           |  46 +++++
 gcc/dwarf2out.c                               |   3 +-
 gcc/emit-rtl.c                                |   3 +-
 gcc/explow.c                                  |  11 +-
 gcc/expr.c                                    |  17 +-
 gcc/fold-const.c                              |   3 +-
 gcc/gimple.c                                  |   3 +-
 gcc/lra-constraints.c                         |   3 +-
 gcc/pointer-query.cc                          |   4 +-
 gcc/recog.c                                   |   7 +-
 gcc/reload.c                                  |   5 +-
 gcc/rtlanal.c                                 |   3 +-
 gcc/testsuite/gcc.dg/custom-address-space-1.c | 174 +++++++++++++++++
 gcc/testsuite/gcc.dg/custom-address-space-2.c |  21 +++
 gcc/testsuite/gcc.dg/custom-address-space-3.c |  15 ++
 gcc/tree-ssa-address.c                        |   9 +-
 gcc/tree-ssa-loop-ivopts.c                    |   5 +-
 gcc/tree.c                                    |   5 +-
 gcc/varasm.c                                  |   7 +-
 34 files changed, 742 insertions(+), 64 deletions(-)
 create mode 100644 gcc/addr-space.cc
 create mode 100644 gcc/addr-space.h
 create mode 100644 gcc/testsuite/gcc.dg/custom-address-space-1.c
 create mode 100644 gcc/testsuite/gcc.dg/custom-address-space-2.c
 create mode 100644 gcc/testsuite/gcc.dg/custom-address-space-3.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 571e9c28e29..846f44f24fa 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1296,6 +1296,7 @@ OBJS = \
 	insn-recog.o \
 	insn-enums.o \
 	ggc-page.o \
+	addr-space.o \
 	adjust-alignment.o \
 	alias.o \
 	alloc-pool.o \
@@ -2655,6 +2656,7 @@ GTFILES = $(CPPLIB_H) $(srcdir)/input.h $(srcdir)/coretypes.h \
   $(srcdir)/symtab-thunks.h $(srcdir)/symtab-thunks.cc \
   $(srcdir)/symtab-clones.h \
   $(srcdir)/reload.h $(srcdir)/caller-save.c $(srcdir)/symtab.c \
+  $(srcdir)/addr-space.cc \
   $(srcdir)/alias.c $(srcdir)/bitmap.c $(srcdir)/cselib.c $(srcdir)/cgraph.c \
   $(srcdir)/ipa-prop.c $(srcdir)/ipa-cp.c $(srcdir)/ipa-utils.h \
   $(srcdir)/ipa-param-manipulation.h $(srcdir)/ipa-sra.c $(srcdir)/dbxout.c \
diff --git a/gcc/addr-space.cc b/gcc/addr-space.cc
new file mode 100644
index 00000000000..ebb2829171f
--- /dev/null
+++ b/gcc/addr-space.cc
@@ -0,0 +1,177 @@
+/* Support for managing address spaces (both target-specific and custom).
+   Copyright (C) 2021 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tree.h"
+#include "addr-space.h"
+#include "target.h"
+
+/* If AS is a custom address space, return a built-in address space
+   that's equivalent to it at the RTL level.
+   Otherwise, return AS.  */
+
+addr_space_t
+ensure_builtin_addr_space (addr_space_t as)
+{
+  /* For now, map all custom address spaces to the generic address space.  */
+  if (g_addr_space_mgr)
+    if (g_addr_space_mgr->custom_p (as))
+      return ADDR_SPACE_GENERIC;
+  return as;
+}
+
+/* Various functions to act on addr_space_t.
+   These handle custom address spaces, and otherwise call into the
+   corresponding target hook targetm.addr_space.NAME.  */
+
+scalar_int_mode
+addr_space_pointer_mode (addr_space_t address_space)
+{
+  address_space = ensure_builtin_addr_space (address_space);
+  return targetm.addr_space.pointer_mode (address_space);
+}
+
+scalar_int_mode
+addr_space_address_mode (addr_space_t address_space)
+{
+  address_space = ensure_builtin_addr_space (address_space);
+  return targetm.addr_space.address_mode (address_space);
+}
+
+bool
+addr_space_valid_pointer_mode (scalar_int_mode mode,
+			       addr_space_t as)
+{
+  as = ensure_builtin_addr_space (as);
+  return targetm.addr_space.valid_pointer_mode (mode, as);
+}
+
+bool
+addr_space_legitimate_address_p (machine_mode mode, rtx exp,
+				 bool strict, addr_space_t as)
+{
+  as = ensure_builtin_addr_space (as);
+  return targetm.addr_space.legitimate_address_p (mode, exp, strict, as);
+}
+
+rtx
+addr_space_legitimize_address (rtx x, rtx oldx, machine_mode mode,
+			       addr_space_t as)
+{
+  as = ensure_builtin_addr_space (as);
+  return targetm.addr_space.legitimize_address (x, oldx, mode, as);
+}
+
+bool
+addr_space_subset_p (addr_space_t subset, addr_space_t superset)
+{
+  if (subset == superset)
+    return true;
+  if (g_addr_space_mgr)
+    {
+      /* For now, assume all custom address spaces are disjoint
+	 from each other and from builtin address spaces.  */
+      if (g_addr_space_mgr->custom_p (subset)
+	  || g_addr_space_mgr->custom_p (superset))
+	return false;
+    }
+  /* We have a pair of target-defined implicit address spaces.  */
+  return targetm.addr_space.subset_p (subset, superset);
+}
+
+bool
+addr_space_zero_address_valid (addr_space_t as)
+{
+  as = ensure_builtin_addr_space (as);
+  return targetm.addr_space.zero_address_valid (as);
+}
+
+rtx
+addr_space_convert (rtx /*op*/, tree /*from_type*/, tree /*to_type*/)
+{
+  gcc_unreachable (); // TODO
+}
+
+int
+addr_space_debug (addr_space_t as)
+{
+  as = ensure_builtin_addr_space (as);
+  return targetm.addr_space.debug (as);
+}
+
+void
+addr_space_diagnose_usage (addr_space_t as, location_t loc)
+{
+  as = ensure_builtin_addr_space (as);
+  targetm.addr_space.diagnose_usage (as, loc);
+}
+
+/* class addr_space_manager.  */
+
+addr_space_manager::addr_space_manager ()
+: m_custom_addr_spaces (NULL),
+  m_max_static_addr_space (0),
+  m_last_addr_space (0)
+{
+}
+
+/* Hook to be called when a built-in address space is registered.  */
+
+void
+addr_space_manager::on_builtin_addr_space (addr_space_t as)
+{
+  /* All builtin addr spaces should have been created before creating
+     any custom address spaces.  */
+  gcc_assert (m_custom_addr_spaces == NULL);
+
+  m_max_static_addr_space = MAX (m_max_static_addr_space, as);
+  m_last_addr_space = m_max_static_addr_space;
+}
+
+/* Attempt to populate *OUT with a previously unused value.
+   Return true if successful, false otherwise.  */
+
+bool
+addr_space_manager::assign_dynamic_addr_space_t (addr_space_t *out)
+{
+  if (m_last_addr_space == ADDR_SPACE_T_MAX)
+    return false;
+
+  *out = ++m_last_addr_space;
+  return true;
+}
+
+/* Create a new custom_addr_space in the GC heap and stash a pointer to it.  */
+
+custom_addr_space *
+addr_space_manager::create_custom_addr_space (tree name,
+					      addr_space_t as,
+					      location_t loc)
+{
+  custom_addr_space *result
+    = new (ggc_alloc<custom_addr_space> ()) custom_addr_space (name, as, loc);
+  vec_safe_push (m_custom_addr_spaces, result);
+  return result;
+}
+
+/* The singleton instance of addr_space_manager.  */
+
+addr_space_manager *g_addr_space_mgr;
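
To illustrate the two-phase ID allocation done by addr_space_manager above, here is a minimal standalone sketch. It is not GCC code: toy_addr_space_manager, register_custom and the std::vector storage are hypothetical stand-ins (the patch uses a GC-allocated vec and tree identifiers), and only the ordering constraint and the "as > max static ID" test are taken from the patch.

```cpp
// Simplified, standalone model of the two-phase ID allocation.
// All names here are hypothetical; this is not the GCC implementation.
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

typedef unsigned char addr_space_t;   // mirrors the typedef in coretypes.h
static const addr_space_t ADDR_SPACE_T_MAX = 255;

class toy_addr_space_manager
{
public:
  // Phase 1: record a statically-assigned (target) ID.  As in the patch,
  // this must happen before any custom address space is created.
  void on_builtin (addr_space_t as)
  {
    assert (m_custom_names.empty ());
    m_max_static = std::max (m_max_static, as);
    m_last = m_max_static;
  }

  // Phase 2: hand out the next unused ID; return false when exhausted.
  bool assign_dynamic (addr_space_t *out)
  {
    if (m_last == ADDR_SPACE_T_MAX)
      return false;
    *out = ++m_last;
    return true;
  }

  // Assign an ID and remember the name (stand-in for
  // create_custom_addr_space, minus the GC allocation).
  bool register_custom (const std::string &name, addr_space_t *out)
  {
    if (!assign_dynamic (out))
      return false;
    m_custom_names.push_back (name);
    return true;
  }

  // Every ID above the highest static one is a custom address space.
  bool custom_p (addr_space_t as) const { return as > m_max_static; }

private:
  std::vector<std::string> m_custom_names;
  addr_space_t m_max_static = 0;
  addr_space_t m_last = 0;
};
```

With two target address spaces registered as IDs 1 and 2, the first two custom address spaces would receive IDs 3 and 4 in this model.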
diff --git a/gcc/addr-space.h b/gcc/addr-space.h
new file mode 100644
index 00000000000..3da9a70c5d7
--- /dev/null
+++ b/gcc/addr-space.h
@@ -0,0 +1,122 @@
+/* Support for managing address spaces (both target-specific and custom).
+   Copyright (C) 2021 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_ADDR_SPACE_H
+#define GCC_ADDR_SPACE_H
+
+extern addr_space_t ensure_builtin_addr_space (addr_space_t as);
+
+/* Various functions to act on addr_space_t.
+   These handle custom address spaces, and otherwise call into the
+   corresponding target hook targetm.addr_space.NAME.  */
+
+/* MODE to use for a pointer into another address space.  */
+extern scalar_int_mode addr_space_pointer_mode (addr_space_t address_space);
+
+/* MODE to use for an address in another address space.  */
+extern scalar_int_mode addr_space_address_mode (addr_space_t address_space);
+
+/* True if MODE is valid for a pointer in __attribute__((mode("MODE")))
+   in another address space.  */
+extern bool addr_space_valid_pointer_mode (scalar_int_mode mode,
+					   addr_space_t as);
+
+/* True if an address is a valid memory address to a given named address
+   space for a given mode.  */
+extern bool addr_space_legitimate_address_p (machine_mode mode, rtx exp,
+					     bool strict, addr_space_t as);
+
+/* Return an updated address to convert an invalid pointer to a named
+   address space to a valid one.  If NULL_RTX is returned use machine
+   independent methods to make the address valid.  */
+extern rtx addr_space_legitimize_address (rtx x, rtx oldx, machine_mode mode,
+					  addr_space_t as);
+
+/* True if address space SUBSET is a subset of address space SUPERSET.  */
+extern bool addr_space_subset_p (addr_space_t subset, addr_space_t superset);
+
+/* True if 0 is a valid address in the address space, or false if
+   0 is a NULL in the address space.  */
+extern bool addr_space_zero_address_valid (addr_space_t as);
+
+/* Function to convert an rtl expression from one address space to another.  */
+extern rtx addr_space_convert (rtx op, tree from_type, tree to_type);
+
+/* Function to encode an address space into dwarf.  */
+extern int addr_space_debug (addr_space_t as);
+
+/* Function to emit custom diagnostic if an address space is used.  */
+extern void addr_space_diagnose_usage (addr_space_t as, location_t loc);
+
+
+/* Data structures for managing custom address spaces.  */
+
+/* These are GC-managed so that custom address spaces are preserved in
+   PCH files.  */
+
+/* A custom address space.  */
+
+struct GTY(()) custom_addr_space
+{
+  custom_addr_space () {}
+  custom_addr_space (tree id, addr_space_t as, location_t pragma_loc)
+  : m_id (id), m_as (as), m_pragma_loc (pragma_loc)
+  {
+  }
+
+  tree m_id;
+  addr_space_t m_as;
+  /* The location of the #pragma declaring this object.  */
+  location_t m_pragma_loc;
+
+  /* TODO: additional properties of the address space.  */
+};
+
+/* A class to manage addr_space_t IDs and custom_addr_space instances.
+
+   Targets have statically-assigned address space IDs, which are used
+   e.g. as cases in switch statements, so we need to do a two-phase
+   allocation: all statically-assigned addr_space_t IDs, then any
+   dynamically-assigned addr_space_t IDs.  */
+
+class GTY(()) addr_space_manager
+{
+ public:
+  addr_space_manager ();
+  void on_builtin_addr_space (addr_space_t);
+  bool assign_dynamic_addr_space_t (addr_space_t *out);
+  custom_addr_space *create_custom_addr_space (tree name,
+					       addr_space_t as,
+					       location_t loc);
+  bool custom_p (addr_space_t as) const
+  {
+    return as > m_max_static_addr_space;
+  }
+
+ private:
+  vec<custom_addr_space *, va_gc> *m_custom_addr_spaces;
+  addr_space_t m_max_static_addr_space;
+  addr_space_t m_last_addr_space;
+};
+
+extern GTY(()) addr_space_manager *g_addr_space_mgr;
+
+/* TODO: test coverage for PCH.  */
+
+#endif /* GCC_ADDR_SPACE_H */
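
The semantics behind ensure_builtin_addr_space and addr_space_subset_p can be sketched in isolation as follows. This is a hedged model, not the patch itself: toy_config replaces the global manager, and toy_target_subset_p is a hypothetical stand-in for targetm.addr_space.subset_p that assumes a target where every built-in address space is enclosed by the generic one.

```cpp
// Standalone model of "custom address spaces are disjoint for
// type-checking, but generic for code generation".  Hypothetical names.
#include <cassert>

typedef unsigned char addr_space_t;
static const addr_space_t ADDR_SPACE_GENERIC = 0;

// Stand-in for the manager: only the highest static ID matters here.
struct toy_config
{
  addr_space_t max_static;
};

static bool
toy_custom_p (const toy_config &cfg, addr_space_t as)
{
  return as > cfg.max_static;
}

// Hypothetical target hook: assume every built-in address space is a
// subset of the generic address space, and nothing else.
static bool
toy_target_subset_p (addr_space_t subset, addr_space_t superset)
{
  return subset == superset || superset == ADDR_SPACE_GENERIC;
}

// Type-checking view: an address space is a subset of itself; a custom
// address space is disjoint from everything else.
static bool
toy_subset_p (const toy_config &cfg, addr_space_t subset,
	      addr_space_t superset)
{
  if (subset == superset)
    return true;
  if (toy_custom_p (cfg, subset) || toy_custom_p (cfg, superset))
    return false;
  return toy_target_subset_p (subset, superset);
}

// Code-generation view: custom address spaces collapse to generic.
static addr_space_t
toy_ensure_builtin (const toy_config &cfg, addr_space_t as)
{
  return toy_custom_p (cfg, as) ? ADDR_SPACE_GENERIC : as;
}
```

The split is the point of the design: conversions between custom address spaces are rejected by the front end, while everything past the front end sees only address spaces the target already understands.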
diff --git a/gcc/auto-inc-dec.c b/gcc/auto-inc-dec.c
index c531df8815c..04c7480c520 100644
--- a/gcc/auto-inc-dec.c
+++ b/gcc/auto-inc-dec.c
@@ -37,6 +37,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "dbgcnt.h"
 #include "print-rtl.h"
 #include "valtrack.h"
+#include "addr-space.h"
 
 /* This pass was originally removed from flow.c. However there is
    almost nothing that remains of that code.
@@ -1172,7 +1173,7 @@ find_inc (bool first_try)
 		     the inc must be a valid addressing reg.  */
 		  addr_space_t as = MEM_ADDR_SPACE (*mem_insn.mem_loc);
 		  if (GET_MODE (inc_insn.reg_res)
-		      != targetm.addr_space.address_mode (as))
+		      != addr_space_address_mode (as))
 		    {
 		      if (dump_file)
 			fprintf (dump_file, "base reg mode failure.\n");
@@ -1223,7 +1224,7 @@ find_inc (bool first_try)
 	     must be a valid addressing reg.  */
 	  addr_space_t as = MEM_ADDR_SPACE (*mem_insn.mem_loc);
 	  if (GET_MODE (inc_insn.reg_res)
-	      != targetm.addr_space.address_mode (as))
+	      != addr_space_address_mode (as))
 	    {
 	      if (dump_file)
 		fprintf (dump_file, "base reg mode failure.\n");
diff --git a/gcc/builtins.c b/gcc/builtins.c
index 384864bfb3a..213785703fd 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -81,6 +81,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "demangle.h"
 #include "gimple-range.h"
 #include "pointer-query.h"
+#include "addr-space.h"
 
 struct target_builtins default_target_builtins;
 #if SWITCHABLE_TARGET
@@ -5562,7 +5563,7 @@ get_builtin_sync_mem (tree loc, machine_mode mode)
   int addr_space = TYPE_ADDR_SPACE (POINTER_TYPE_P (TREE_TYPE (loc))
 				    ? TREE_TYPE (TREE_TYPE (loc))
 				    : TREE_TYPE (loc));
-  scalar_int_mode addr_mode = targetm.addr_space.address_mode (addr_space);
+  scalar_int_mode addr_mode = addr_space_address_mode (addr_space);
 
   addr = expand_expr (loc, NULL_RTX, addr_mode, EXPAND_SUM);
   addr = convert_memory_address (addr_mode, addr);
diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index 007b928c54b..e957d620651 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -47,6 +47,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimplify.h"
 #include "tree-pretty-print.h"
 #include "gcc-rich-location.h"
+#include "addr-space.h"
 
 static tree handle_packed_attribute (tree *, tree, tree, int, bool *);
 static tree handle_nocommon_attribute (tree *, tree, tree, int, bool *);
@@ -2115,7 +2116,7 @@ handle_mode_attribute (tree *node, tree name, tree args,
 	  tree (*fn)(tree, machine_mode, bool);
 
 	  if (!is_a <scalar_int_mode> (mode, &addr_mode)
-	      || !targetm.addr_space.valid_pointer_mode (addr_mode, as))
+	      || !addr_space_valid_pointer_mode (addr_mode, as))
 	    {
 	      error ("invalid pointer mode %qs", p);
 	      return NULL_TREE;
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index d5dad99ff97..d9d5cc35a4c 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -831,6 +831,7 @@ extern tree (*make_fname_decl) (location_t, tree, int);
 
 /* In c-decl.c and cp/tree.c.  FIXME.  */
 extern void c_register_addr_space (const char *str, addr_space_t as);
+extern custom_addr_space *c_register_custom_addr_space (tree id, location_t loc);
 
 /* In c-common.c.  */
 extern bool in_late_binary_op;
diff --git a/gcc/c-family/c-pragma.c b/gcc/c-family/c-pragma.c
index 3663eb1cfbb..d4c57fb5544 100644
--- a/gcc/c-family/c-pragma.c
+++ b/gcc/c-family/c-pragma.c
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "opts.h"
 #include "plugin.h"
 #include "opt-suggestions.h"
+#include "addr-space.h"
 
 #define GCC_BAD(gmsgid) \
   do { warning (OPT_Wpragmas, gmsgid); return; } while (0)
@@ -1220,6 +1221,30 @@ handle_pragma_message (cpp_reader *ARG_UNUSED(dummy))
 	    TREE_STRING_POINTER (message));
 }
 
+/* Handle #pragma GCC custom_address_space by attempting to register a
+   custom address space.  */
+
+static void
+handle_pragma_custom_address_space (cpp_reader *)
+{
+  location_t loc, id_loc;
+  tree x;
+  tree id;
+  const char *name = "#pragma GCC custom_address_space";
+  if (pragma_lex (&x, &loc) != CPP_OPEN_PAREN)
+    GCC_BAD2_AT (loc, "missing %<(%> after %<%s%> - ignored", name);
+
+  if (pragma_lex (&id, &id_loc) != CPP_NAME)
+    GCC_BAD2_AT (id_loc,
+		 "expected an identifier after %<%s(%> - ignored", name);
+
+  if (pragma_lex (&x) != CPP_CLOSE_PAREN)
+    GCC_BAD2_AT (loc, "malformed %<%s%> - ignored", name);
+
+  c_register_custom_addr_space (id, id_loc);
+  /* FIXME: additional clauses to set properties of addr space?  */
+}
+
 /* Mark whether the current location is valid for a STDC pragma.  */
 
 static bool valid_location_for_stdc_pragma;
@@ -1643,6 +1668,11 @@ init_pragma (void)
 
   c_register_pragma_with_expansion (0, "message", handle_pragma_message);
 
+  c_register_pragma ("GCC", "custom_address_space",
+		     handle_pragma_custom_address_space);
+  gcc_assert (g_addr_space_mgr == NULL);
+  g_addr_space_mgr
+    = new (ggc_alloc<addr_space_manager> ()) addr_space_manager ();
 #ifdef REGISTER_TARGET_PRAGMAS
   REGISTER_TARGET_PRAGMAS ();
 #endif
diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index 186fa1692c1..c670f12ae06 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -61,6 +61,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "context.h"  /* For 'g'.  */
 #include "omp-general.h"
 #include "omp-offload.h"  /* For offload_vars.  */
+#include "addr-space.h"
 
 #include "tree-pretty-print.h"
 
@@ -653,7 +654,7 @@ c_build_pointer_type (tree to_type)
   machine_mode pointer_mode;
 
   if (as != ADDR_SPACE_GENERIC || c_default_pointer_mode == VOIDmode)
-    pointer_mode = targetm.addr_space.pointer_mode (as);
+    pointer_mode = addr_space_pointer_mode (as);
   else
     pointer_mode = c_default_pointer_mode;
   return build_pointer_type_for_mode (to_type, pointer_mode, false);
@@ -12334,23 +12335,67 @@ c_parse_final_cleanups (void)
   ext_block = NULL;
 }
 
+/* Register ID as a reserved word for the given RID.  */
+
+static void
+register_addr_space_identifier (tree id, int rid)
+{
+  C_SET_RID_CODE (id, rid);
+  C_IS_RESERVED_WORD (id) = 1;
+  ridpointers [rid] = id;
+}
+
 /* Register reserved keyword WORD as qualifier for address space AS.  */
 
 void
 c_register_addr_space (const char *word, addr_space_t as)
 {
+  /* Address space qualifiers are only supported
+     in C with GNU extensions enabled.  */
+  if (c_dialect_objc () || flag_no_asm)
+    return;
+
+  tree id = get_identifier (word);
+
   int rid = RID_FIRST_ADDR_SPACE + as;
-  tree id;
+  register_addr_space_identifier (id, rid);
+  gcc_assert (g_addr_space_mgr);
+  g_addr_space_mgr->on_builtin_addr_space (as);
+}
+
+/* Attempt to register a custom address space, reserving ID at LOC for
+   it as a reserved word.
 
+   If successful, return a GC-allocated custom_addr_space, registered
+   with the addr_space_manager.
+   Otherwise emit a diagnostic and return NULL.  */
+
+custom_addr_space *
+c_register_custom_addr_space (tree id, location_t loc)
+{
   /* Address space qualifiers are only supported
      in C with GNU extensions enabled.  */
   if (c_dialect_objc () || flag_no_asm)
-    return;
+    return NULL; // FIXME: diagnostic
 
-  id = get_identifier (word);
-  C_SET_RID_CODE (id, rid);
-  C_IS_RESERVED_WORD (id) = 1;
-  ridpointers [rid] = id;
+  gcc_assert (g_addr_space_mgr);
+
+  addr_space_t as;
+  if (!g_addr_space_mgr->assign_dynamic_addr_space_t (&as))
+    {
+      warning_at (loc, OPT_Wpragmas, "too many custom address spaces");
+      return NULL;
+    }
+
+  int rid = RID_FIRST_ADDR_SPACE + as;
+  if (rid > RID_LAST_ADDR_SPACE)
+    {
+      warning_at (loc, OPT_Wpragmas, "too many custom address spaces");
+      return NULL;
+    }
+
+  register_addr_space_identifier (id, rid);
+  return g_addr_space_mgr->create_custom_addr_space (id, as, loc);
 }
 
 /* Return identifier to look up for omp declare reduction.  */
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 80dd61d599e..88ad30c543a 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -71,6 +71,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pretty-print.h"
 #include "memmodel.h"
 #include "c-family/known-headers.h"
+#include "addr-space.h"
 
 /* We need to walk over decls with incomplete struct/union/enum types
    after parsing the whole translation unit.
@@ -326,7 +327,7 @@ c_lex_one_token (c_parser *parser, c_token *token, bool raw = false)
 	      {
 		addr_space_t as;
 		as = (addr_space_t) (rid_code - RID_FIRST_ADDR_SPACE);
-		targetm.addr_space.diagnose_usage (as, token->location);
+		addr_space_diagnose_usage (as, token->location);
 		token->id_kind = C_ID_ADDRSPACE;
 		token->keyword = rid_code;
 		break;
diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 782414f8c8c..afaa3a63029 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -52,6 +52,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "stringpool.h"
 #include "attribs.h"
 #include "asan.h"
+#include "addr-space.h"
 
 /* Possible cases of implicit conversions.  Used to select diagnostic messages
    and control folding initializers in convert_for_assignment.  */
@@ -308,12 +309,12 @@ addr_space_superset (addr_space_t as1, addr_space_t as2, addr_space_t *common)
       *common = as1;
       return true;
     }
-  else if (targetm.addr_space.subset_p (as1, as2))
+  else if (addr_space_subset_p (as1, as2))
     {
       *common = as2;
       return true;
     }
-  else if (targetm.addr_space.subset_p (as2, as1))
+  else if (addr_space_subset_p (as2, as1))
     {
       *common = as1;
       return true;
@@ -6015,18 +6016,18 @@ build_c_cast (location_t loc, tree type, tree expr)
 	  if (!addr_space_superset (as_to, as_from, &as_common))
 	    {
 	      if (ADDR_SPACE_GENERIC_P (as_from))
-		warning_at (loc, 0, "cast to %s address space pointer "
+		warning_at (loc, 0, "cast to %qs address space pointer "
 			    "from disjoint generic address space pointer",
 			    c_addr_space_name (as_to));
 
 	      else if (ADDR_SPACE_GENERIC_P (as_to))
 		warning_at (loc, 0, "cast to generic address space pointer "
-			    "from disjoint %s address space pointer",
+			    "from disjoint %qs address space pointer",
 			    c_addr_space_name (as_from));
 
 	      else
-		warning_at (loc, 0, "cast to %s address space pointer "
-			    "from disjoint %s address space pointer",
+		warning_at (loc, 0, "cast to %qs address space pointer "
+			    "from disjoint %qs address space pointer",
 			    c_addr_space_name (as_to),
 			    c_addr_space_name (as_from));
 	    }
@@ -7233,8 +7234,10 @@ convert_for_assignment (location_t location, location_t expr_loc, tree type,
       asl = TYPE_ADDR_SPACE (ttl);
       asr = TYPE_ADDR_SPACE (ttr);
       if (!null_pointer_constant_p (rhs)
-	  && asr != asl && !targetm.addr_space.subset_p (asr, asl))
+	  && asr != asl && !addr_space_subset_p (asr, asl))
 	{
+	  auto_diagnostic_group d;
+	  bool diagnosed = true;
 	  switch (errtype)
 	    {
 	    case ic_argpass:
@@ -7242,7 +7245,8 @@ convert_for_assignment (location_t location, location_t expr_loc, tree type,
 		const char msg[] = G_("passing argument %d of %qE from "
 				      "pointer to non-enclosed address space");
 		if (warnopt)
-		  warning_at (expr_loc, warnopt, msg, parmnum, rname);
+		  diagnosed
+		    = warning_at (expr_loc, warnopt, msg, parmnum, rname);
 		else
 		  error_at (expr_loc, msg, parmnum, rname);
 	      break;
@@ -7252,7 +7256,7 @@ convert_for_assignment (location_t location, location_t expr_loc, tree type,
 		const char msg[] = G_("assignment from pointer to "
 				      "non-enclosed address space");
 		if (warnopt)
-		  warning_at (location, warnopt, msg);
+		  diagnosed = warning_at (location, warnopt, msg);
 		else
 		  error_at (location, msg);
 		break;
@@ -7263,7 +7267,7 @@ convert_for_assignment (location_t location, location_t expr_loc, tree type,
 		const char msg[] = G_("initialization from pointer to "
 				      "non-enclosed address space");
 		if (warnopt)
-		  warning_at (location, warnopt, msg);
+		  diagnosed = warning_at (location, warnopt, msg);
 		else
 		  error_at (location, msg);
 		break;
@@ -7273,7 +7277,7 @@ convert_for_assignment (location_t location, location_t expr_loc, tree type,
 		const char msg[] = G_("return from pointer to "
 				      "non-enclosed address space");
 		if (warnopt)
-		  warning_at (location, warnopt, msg);
+		  diagnosed = warning_at (location, warnopt, msg);
 		else
 		  error_at (location, msg);
 		break;
@@ -7281,6 +7285,14 @@ convert_for_assignment (location_t location, location_t expr_loc, tree type,
 	    default:
 	      gcc_unreachable ();
 	    }
+	  if (diagnosed)
+	    {
+	      if (errtype == ic_argpass)
+		inform_for_arg (fundecl, expr_loc, parmnum, type, rhstype);
+	      else
+		inform (location, "expected %qT but pointer is of type %qT",
+			type, rhstype);
+	    }
 	  return error_mark_node;
 	}
 
diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 55ff75bd78e..9dd958a5371 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -74,6 +74,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "output.h"
 #include "builtins.h"
 #include "opts.h"
+#include "addr-space.h"
 
 /* Some systems use __main in a way incompatible with its use in gcc, in these
    cases use the macros NAME__MAIN to give a quoted symbol and SYMBOL__MAIN to
@@ -4248,12 +4249,12 @@ convert_debug_memory_address (scalar_int_mode mode, rtx x,
 {
 #ifndef POINTERS_EXTEND_UNSIGNED
   gcc_assert (mode == Pmode
-	      || mode == targetm.addr_space.address_mode (as));
+	      || mode == addr_space_address_mode (as));
   gcc_assert (GET_MODE (x) == mode || GET_MODE (x) == VOIDmode);
 #else
   rtx temp;
 
-  gcc_assert (targetm.addr_space.valid_pointer_mode (mode, as));
+  gcc_assert (addr_space_valid_pointer_mode (mode, as));
 
   if (GET_MODE (x) == mode || GET_MODE (x) == VOIDmode)
     return x;
@@ -4694,7 +4695,7 @@ expand_debug_expr (tree exp)
 
       as = TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (TREE_OPERAND (exp, 0))));
 
-      op0 = convert_debug_memory_address (targetm.addr_space.address_mode (as),
+      op0 = convert_debug_memory_address (addr_space_address_mode (as),
 					  op0, as);
       if (op0 == NULL_RTX)
 	return NULL;
@@ -4719,7 +4720,7 @@ expand_debug_expr (tree exp)
 	return NULL;
 
       as = TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (TREE_OPERAND (exp, 0))));
-      op0 = convert_debug_memory_address (targetm.addr_space.address_mode (as),
+      op0 = convert_debug_memory_address (addr_space_address_mode (as),
 					  op0, as);
       if (op0 == NULL_RTX)
 	return NULL;
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index e94efdf39fb..181edd9ef40 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -96,6 +96,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "i386-expand.h"
 #include "i386-features.h"
 #include "function-abi.h"
+#include "addr-space.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -13718,6 +13719,8 @@ ix86_print_operand_address_as (FILE *file, rtx addr,
   bool vsib = false;
   int code = 0;
 
+  as = ensure_builtin_addr_space (as);
+
   if (GET_CODE (addr) == UNSPEC && XINT (addr, 1) == UNSPEC_VSIBADDR)
     {
       ok = ix86_decompose_address (XVECEXP (addr, 0, 0), &parts);
diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index b4f530d57ac..f08932a1af7 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -166,11 +166,14 @@ class bitmap_view;
 
 /* Address space number for named address space support.  */
 typedef unsigned char addr_space_t;
+#define ADDR_SPACE_T_MAX 255
 
 /* The value of addr_space_t that represents the generic address space.  */
 #define ADDR_SPACE_GENERIC 0
 #define ADDR_SPACE_GENERIC_P(AS) ((AS) == ADDR_SPACE_GENERIC)
 
+struct custom_addr_space;
+
 /* The major intermediate representations of GCC.  */
 enum ir_type {
   IR_GIMPLE,
diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c
index 32ddf835a91..1c741028a5e 100644
--- a/gcc/cp/tree.c
+++ b/gcc/cp/tree.c
@@ -5961,6 +5961,14 @@ c_register_addr_space (const char * /*word*/, addr_space_t /*as*/)
 {
 }
 
+/* Stub for c-common.  Please keep in sync with c-decl.c.  */
+
+custom_addr_space *
+c_register_custom_addr_space (tree, location_t)
+{
+  return NULL;
+}
+
 /* Return the number of operands in T that we care about for things like
    mangling.  */
 
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 6e6c580e329..bc298da8956 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -1412,6 +1412,10 @@ Address space identifiers may be used exactly like any other C type
 qualifier (e.g., @code{const} or @code{volatile}).  See the N1275
 document for more details.
 
+As a further extension, GNU C supports user-defined address spaces
+via @code{#pragma GCC custom_address_space}; see
+@ref{Custom Address Space Pragmas} for more details.
+
 @anchor{AVR Named Address Spaces}
 @subsection AVR Named Address Spaces
 
@@ -23331,6 +23335,7 @@ information.
 * Push/Pop Macro Pragmas::
 * Function Specific Option Pragmas::
 * Loop-Specific Pragmas::
+* Custom Address Space Pragmas::
 @end menu
 
 @node AArch64 Pragmas
@@ -24008,6 +24013,47 @@ The values of @math{0} and @math{1} block any unrolling of the loop.
 
 @end table
 
+@node Custom Address Space Pragmas
+@subsection Custom Address Space Pragmas
+
+@table @code
+@item #pragma GCC custom_address_space (@var{name})
+@cindex pragma GCC custom_address_space
+
+As an extension, GNU C supports named address spaces on some targets as
+defined in the N1275 draft of ISO/IEC DTR 18037.  Support for named
+address spaces in GCC will evolve as the draft technical report
+changes.
+
+This pragma creates a user-defined address space with the given name
+within the translation unit, supplementing the implicit target-specific
+address spaces, and the ``generic'' address space.
+
+All custom address spaces are disjoint from each other and from all
+built-in address spaces (including the generic address space).
+For example, given:
+
+@smallexample
+#pragma GCC custom_address_space(__kernel)
+#pragma GCC custom_address_space(__user)
+void __kernel *kernel_ptr;
+void __user *user_ptr;
+@end smallexample
+
+then GNU C will issue a diagnostic on attempts to convert between
+@code{void __kernel *} and @code{void __user *}, or between these
+pointers and a plain @code{void *}.
+
+Although new address spaces created this way are disjoint and thus are
+not equivalent in terms of type-checking, they are all equivalent to the
+generic address space in terms of code generation.
+
+The number of user-defined address spaces allowed in a translation unit
+is target-dependent, but very limited - the total number of target-specific
+and user-defined address spaces in a translation unit must not exceed 15.
+
+@end table
+
 @node Unnamed Fields
 @section Unnamed Structure and Union Fields
 @cindex @code{struct}
diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index fb0e3381e5b..028d4235054 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -97,6 +97,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "stringpool.h"
 #include "attribs.h"
 #include "file-prefix-map.h" /* remap_debug_filename()  */
+#include "addr-space.h"
 
 static void dwarf2out_source_line (unsigned int, unsigned int, const char *,
 				   int, bool);
@@ -13771,7 +13772,7 @@ modified_type_die (tree type, int cv_quals, bool reverse,
       addr_space_t as = TYPE_ADDR_SPACE (item_type);
       if (!ADDR_SPACE_GENERIC_P (as))
 	{
-	  int action = targetm.addr_space.debug (as);
+	  int action = addr_space_debug (as);
 	  if (action >= 0)
 	    {
 	      /* Positive values indicate an address_class.  */
diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index e6158f243c0..6419235c8d5 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -63,6 +63,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple.h"
 #include "gimple-ssa.h"
 #include "gimplify.h"
+#include "addr-space.h"
 
 struct target_rtl default_target_rtl;
 #if SWITCHABLE_TARGET
@@ -2349,7 +2350,7 @@ adjust_address_1 (rtx memref, machine_mode mode, poly_int64 offset,
   unsigned HOST_WIDE_INT max_align;
 #ifdef POINTERS_EXTEND_UNSIGNED
   scalar_int_mode pointer_mode
-    = targetm.addr_space.pointer_mode (attrs.addrspace);
+    = addr_space_pointer_mode (attrs.addrspace);
 #endif
 
   /* VOIDmode means no mode change for change_address_1.  */
diff --git a/gcc/explow.c b/gcc/explow.c
index a35423f5d16..81c688bb1c3 100644
--- a/gcc/explow.c
+++ b/gcc/explow.c
@@ -42,6 +42,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "stringpool.h"
 #include "common/common-target.h"
 #include "output.h"
+#include "addr-space.h"
 
 static rtx break_out_memory_refs (rtx);
 
@@ -310,8 +311,8 @@ convert_memory_address_addr_space_1 (scalar_int_mode to_mode ATTRIBUTE_UNUSED,
   if (GET_MODE (x) == to_mode)
     return x;
 
-  pointer_mode = targetm.addr_space.pointer_mode (as);
-  address_mode = targetm.addr_space.address_mode (as);
+  pointer_mode = addr_space_pointer_mode (as);
+  address_mode = addr_space_address_mode (as);
   from_mode = to_mode == pointer_mode ? address_mode : pointer_mode;
 
   /* Here we handle some special cases.  If none of them apply, fall through
@@ -433,7 +434,7 @@ rtx
 memory_address_addr_space (machine_mode mode, rtx x, addr_space_t as)
 {
   rtx oldx = x;
-  scalar_int_mode address_mode = targetm.addr_space.address_mode (as);
+  scalar_int_mode address_mode = addr_space_address_mode (as);
 
   x = convert_memory_address_addr_space (address_mode, x, as);
 
@@ -469,7 +470,7 @@ memory_address_addr_space (machine_mode mode, rtx x, addr_space_t as)
 	 transformations can make better code.  */
       {
 	rtx orig_x = x;
-	x = targetm.addr_space.legitimize_address (x, oldx, mode, as);
+	x = addr_space_legitimize_address (x, oldx, mode, as);
 	if (orig_x != x && memory_address_addr_space_p (mode, x, as))
 	  goto done;
       }
@@ -853,7 +854,7 @@ promote_mode (const_tree type ATTRIBUTE_UNUSED, machine_mode mode,
     case REFERENCE_TYPE:
     case POINTER_TYPE:
       *punsignedp = POINTERS_EXTEND_UNSIGNED;
-      return targetm.addr_space.address_mode
+      return addr_space_address_mode
 	       (TYPE_ADDR_SPACE (TREE_TYPE (type)));
 #endif
 
diff --git a/gcc/expr.c b/gcc/expr.c
index 5673902b1fc..bfecf416aa6 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -64,6 +64,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "rtx-vector-builder.h"
 #include "tree-pretty-print.h"
 #include "flags.h"
+#include "addr-space.h"
 
 
 /* If this is nonzero, we do not bother generating VOLATILE
@@ -6191,7 +6192,7 @@ store_expr (tree exp, rtx target, int call_param_p,
 	  else
 	    {
 	      machine_mode pointer_mode
-		= targetm.addr_space.pointer_mode (MEM_ADDR_SPACE (target));
+		= addr_space_pointer_mode (MEM_ADDR_SPACE (target));
 	      machine_mode address_mode = get_address_mode (target);
 
 	      /* Compute the size of the data to copy from the string.  */
@@ -8537,8 +8538,8 @@ expand_expr_addr_expr (tree exp, rtx target, machine_mode tmode,
   if (POINTER_TYPE_P (TREE_TYPE (exp)))
     {
       as = TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (exp)));
-      address_mode = targetm.addr_space.address_mode (as);
-      pointer_mode = targetm.addr_space.pointer_mode (as);
+      address_mode = addr_space_address_mode (as);
+      pointer_mode = addr_space_pointer_mode (as);
     }
 
   /* We can get called with some Weird Things if the user does silliness
@@ -9125,10 +9126,10 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 
         /* Ask target code to handle conversion between pointers
 	   to overlapping address spaces.  */
-	if (targetm.addr_space.subset_p (as_to, as_from)
-	    || targetm.addr_space.subset_p (as_from, as_to))
+	if (addr_space_subset_p (as_to, as_from)
+	    || addr_space_subset_p (as_from, as_to))
 	  {
-	    op0 = targetm.addr_space.convert (op0, treeop0_type, type);
+	    op0 = addr_space_convert (op0, treeop0_type, type);
 	  }
         else
           {
@@ -10727,7 +10728,7 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode,
 	  /* Writing into CONST_DECL is always invalid, but handle it
 	     gracefully.  */
 	  addr_space_t as = TYPE_ADDR_SPACE (TREE_TYPE (exp));
-	  scalar_int_mode address_mode = targetm.addr_space.address_mode (as);
+	  scalar_int_mode address_mode = addr_space_address_mode (as);
 	  op0 = expand_expr_addr_expr_1 (exp, NULL_RTX, address_mode,
 					 EXPAND_NORMAL, as);
 	  op0 = memory_address_addr_space (mode, op0, as);
@@ -10901,7 +10902,7 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode,
 	    REF_REVERSE_STORAGE_ORDER (exp) = reverse;
 	    return expand_expr (exp, target, tmode, modifier);
 	  }
-	address_mode = targetm.addr_space.address_mode (as);
+	address_mode = addr_space_address_mode (as);
 	if ((def_stmt = get_def_for_expr (base, BIT_AND_EXPR)))
 	  {
 	    tree mask = gimple_assign_rhs2 (def_stmt);
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 90d82257ae7..dc5d8729508 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -84,6 +84,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "vec-perm-indices.h"
 #include "asan.h"
 #include "gimple-range.h"
+#include "addr-space.h"
 
 /* Nonzero if we are folding constants inside an initializer; zero
    otherwise.  */
@@ -1744,7 +1745,7 @@ const_unop (enum tree_code code, tree type, tree arg0)
       /* If the source address is 0, and the source address space
 	 cannot have a valid object at 0, fold to dest type null.  */
       if (integer_zerop (arg0)
-	  && !(targetm.addr_space.zero_address_valid
+	  && !(addr_space_zero_address_valid
 	       (TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (arg0))))))
 	return fold_convert_const (code, type, arg0);
       break;
diff --git a/gcc/gimple.c b/gcc/gimple.c
index 1e0fad92e15..1cf4bc8ffda 100644
--- a/gcc/gimple.c
+++ b/gcc/gimple.c
@@ -49,6 +49,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-modref-tree.h"
 #include "ipa-modref.h"
 #include "dbgcnt.h"
+#include "addr-space.h"
 
 /* All the tuples have their operand vector (if present) at the very bottom
    of the structure.  Therefore, the offset required to find the
@@ -3023,7 +3024,7 @@ check_loadstore (gimple *, tree op, tree, void *data)
     {
       /* Some address spaces may legitimately dereference zero.  */
       addr_space_t as = TYPE_ADDR_SPACE (TREE_TYPE (op));
-      if (targetm.addr_space.zero_address_valid (as))
+      if (addr_space_zero_address_valid (as))
 	return false;
 
       return operand_equal_p (TREE_OPERAND (op, 0), (tree)data, 0);
diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index 0195b4fb9c3..53066fcb675 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -132,6 +132,7 @@
 #include "print-rtl.h"
 #include "function-abi.h"
 #include "rtl-iter.h"
+#include "addr-space.h"
 
 /* Value of LRA_CURR_RELOAD_NUM at the beginning of BB of the current
    insn.  Remember that LRA_CURR_RELOAD_NUM is the number of emitted
@@ -335,7 +336,7 @@ valid_address_p (machine_mode mode ATTRIBUTE_UNUSED,
  win:
   return 1;
 #else
-  return targetm.addr_space.legitimate_address_p (mode, addr, 0, as);
+  return addr_space_legitimate_address_p (mode, addr, 0, as);
 #endif
 }
 
diff --git a/gcc/pointer-query.cc b/gcc/pointer-query.cc
index a0e4543d8a3..fec58093cd0 100644
--- a/gcc/pointer-query.cc
+++ b/gcc/pointer-query.cc
@@ -41,7 +41,7 @@
 #include "pointer-query.h"
 #include "tree-pretty-print.h"
 #include "tree-ssanames.h"
-#include "target.h"
+#include "addr-space.h"
 
 static bool compute_objsize_r (tree, gimple *, int, access_ref *,
 			       ssa_name_limit_t &, pointer_query *);
@@ -1889,7 +1889,7 @@ compute_objsize_r (tree ptr, gimple *stmt, int ostype, access_ref *pref,
 	{
 	  tree deref_type = TREE_TYPE (TREE_TYPE (ptr));
 	  addr_space_t as = TYPE_ADDR_SPACE (deref_type);
-	  if (targetm.addr_space.zero_address_valid (as))
+	  if (addr_space_zero_address_valid (as))
 	    pref->set_max_size_range ();
 	  else
 	    pref->sizrng[0] = pref->sizrng[1] = 0;
diff --git a/gcc/recog.c b/gcc/recog.c
index 5a42c45361d..7deb9c7285e 100644
--- a/gcc/recog.c
+++ b/gcc/recog.c
@@ -41,6 +41,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "reload.h"
 #include "tree-pass.h"
 #include "function-abi.h"
+#include "addr-space.h"
 
 #ifndef STACK_POP_CODE
 #if STACK_GROWS_DOWNWARD
@@ -1791,7 +1792,7 @@ memory_address_addr_space_p (machine_mode mode ATTRIBUTE_UNUSED,
  win:
   return true;
 #else
-  return targetm.addr_space.legitimate_address_p (mode, addr, 0, as);
+  return addr_space_legitimate_address_p (mode, addr, 0, as);
 #endif
 }
 
@@ -2423,9 +2424,9 @@ offsettable_address_addr_space_p (int strictp, machine_mode mode, rtx y,
 
   machine_mode address_mode = GET_MODE (y);
   if (address_mode == VOIDmode)
-    address_mode = targetm.addr_space.address_mode (as);
+    address_mode = addr_space_address_mode (as);
 #ifdef POINTERS_EXTEND_UNSIGNED
-  machine_mode pointer_mode = targetm.addr_space.pointer_mode (as);
+  machine_mode pointer_mode = addr_space_pointer_mode (as);
 #endif
 
   /* ??? How much offset does an offsettable BLKmode reference need?
diff --git a/gcc/reload.c b/gcc/reload.c
index 4c55ca58a5f..b1018ca684a 100644
--- a/gcc/reload.c
+++ b/gcc/reload.c
@@ -106,6 +106,7 @@ a register with any other reload.  */
 #include "reload.h"
 #include "addresses.h"
 #include "function-abi.h"
+#include "addr-space.h"
 
 /* True if X is a constant that can be forced into the constant pool.
    MODE is the mode of the operand, or VOIDmode if not known.  */
@@ -2172,7 +2173,7 @@ strict_memory_address_addr_space_p (machine_mode mode ATTRIBUTE_UNUSED,
  win:
   return true;
 #else
-  return targetm.addr_space.legitimate_address_p (mode, addr, 1, as);
+  return addr_space_legitimate_address_p (mode, addr, 1, as);
 #endif
 }
 \f
@@ -5242,7 +5243,7 @@ find_reloads_address (machine_mode mode, rtx *memrefloc, rtx ad,
     {
       machine_mode address_mode = GET_MODE (ad);
       if (address_mode == VOIDmode)
-	address_mode = targetm.addr_space.address_mode (as);
+	address_mode = addr_space_address_mode (as);
 
       /* If AD is an address in the constant pool, the MEM rtx may be shared.
 	 Unshare it so we can safely alter it.  */
diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index d37f7789b20..1046b359c30 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "rtl-iter.h"
 #include "hard-reg-set.h"
 #include "function-abi.h"
+#include "addr-space.h"
 
 /* Forward declarations */
 static void set_of_1 (rtx, const_rtx, void *);
@@ -6278,7 +6279,7 @@ get_address_mode (rtx mem)
   mode = GET_MODE (XEXP (mem, 0));
   if (mode != VOIDmode)
     return as_a <scalar_int_mode> (mode);
-  return targetm.addr_space.address_mode (MEM_ADDR_SPACE (mem));
+  return addr_space_address_mode (MEM_ADDR_SPACE (mem));
 }
 \f
 /* Split up a CONST_DOUBLE or integer constant rtx
diff --git a/gcc/testsuite/gcc.dg/custom-address-space-1.c b/gcc/testsuite/gcc.dg/custom-address-space-1.c
new file mode 100644
index 00000000000..9ca1157a730
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/custom-address-space-1.c
@@ -0,0 +1,174 @@
+/* Verify that we can create multiple custom address spaces,
+   that they are treated as disjoint from each other and from
+   the generic address space.  */
+
+/* Avoid using "-ansi".  */
+/* { dg-options "" } */
+
+#define __kernel
+#pragma GCC custom_address_space(__user)
+#pragma GCC custom_address_space(__iomem)
+#pragma GCC custom_address_space(__percpu)
+#pragma GCC custom_address_space(__rcu)
+
+void *p;
+void __kernel *p_kernel;
+void __user *p_user;
+void __iomem *p_iomem;
+void __percpu *p_percpu;
+void __rcu *p_rcu;
+
+extern void accepts_p (void *); /* { dg-message "24: expected 'void \\*' but argument is of type '__user void \\*'" } */
+extern void accepts_p_kernel (void __kernel *);
+extern void accepts_p_user (void __user *);
+
+void test_argpass_to_p (void)
+{
+  accepts_p (p);
+  accepts_p (p_kernel);
+  accepts_p (p_user); /* { dg-error "passing argument 1 of 'accepts_p' from pointer to non-enclosed address space" } */
+}
+
+void test_init_p (void)
+{
+  void *local_p_1 = p;
+  void *local_p_2 = p_kernel;
+  void *local_p_3 = p_user; /* { dg-error "initialization from pointer to non-enclosed address space" } */
+  /* { dg-message "expected 'void \\*' but pointer is of type '__user void \\*'" "" { target *-*-* } .-1 } */
+  void *local_p_4 = p_iomem; /* { dg-error "initialization from pointer to non-enclosed address space" } */
+  /* { dg-message "expected 'void \\*' but pointer is of type '__iomem void \\*'" "" { target *-*-* } .-1 } */
+}
+
+void test_init_p_kernel (void)
+{
+  void __kernel *local_p_1 = p;
+  void __kernel *local_p_2 = p_kernel;
+  void __kernel *local_p_3 = p_user; /* { dg-error "initialization from pointer to non-enclosed address space" } */
+  /* { dg-message "expected 'void \\*' but pointer is of type '__user void \\*'" "" { target *-*-* } .-1 } */
+}
+
+void test_init_p_user (void)
+{
+  void __user *local_p_1 = p; /* { dg-error "initialization from pointer to non-enclosed address space" } */
+  /* { dg-message "expected '__user void \\*' but pointer is of type 'void \\*'" "" { target *-*-* } .-1 } */
+  void __user *local_p_2 = p_kernel; /* { dg-error "initialization from pointer to non-enclosed address space" } */
+  /* { dg-message "expected '__user void \\*' but pointer is of type 'void \\*'" "" { target *-*-* } .-1 } */
+  void __user *local_p_3 = p_user;
+}
+
+void test_assign_to_p (void)
+{
+  p = p;
+  p = p_kernel;
+  p = p_user; /* { dg-error "assignment from pointer to non-enclosed address space" } */
+  /* { dg-message "expected 'void \\*' but pointer is of type '__user void \\*'" "" { target *-*-* } .-1 } */
+  // etc
+}
+
+void test_assign_to_p_kernel (void)
+{
+  p_kernel = p;
+  p_kernel = p_kernel;
+  p_kernel = p_user; /* { dg-error "assignment from pointer to non-enclosed address space" } */
+  /* { dg-message "expected 'void \\*' but pointer is of type '__user void \\*'" "" { target *-*-* } .-1 } */
+  // etc
+}
+
+void test_assign_to_p_user (void)
+{
+  p_user = p;  /* { dg-error "assignment from pointer to non-enclosed address space" } */
+  /* { dg-message "expected '__user void \\*' but pointer is of type 'void \\*'" "" { target *-*-* } .-1 } */
+  p_user = p_kernel;  /* { dg-error "assignment from pointer to non-enclosed address space" } */
+  /* { dg-message "expected '__user void \\*' but pointer is of type 'void \\*'" "" { target *-*-* } .-1 } */
+  p_user = p_user;
+  // etc
+}
+
+void *test_return_p (int i)
+{
+  switch (i)
+    {
+    default:
+    case 0:
+      return p;
+    case 1:
+      return p_kernel;
+    case 2:
+      return p_user; /* { dg-error "return from pointer to non-enclosed address space" } */
+      /* { dg-message "expected 'void \\*' but pointer is of type '__user void \\*'" "" { target *-*-* } .-1 } */
+    }
+}
+
+void __kernel *test_return_p_kernel (int i)
+{
+  switch (i)
+    {
+    default:
+    case 0:
+      return p;
+    case 1:
+      return p_kernel;
+    case 2:
+      return p_user; /* { dg-error "return from pointer to non-enclosed address space" } */
+      /* { dg-message "expected 'void \\*' but pointer is of type '__user void \\*'" "" { target *-*-* } .-1 } */
+    }
+}
+
+void __user *test_return_p_user (int i)
+{
+  switch (i)
+    {
+    default:
+    case 0:
+      return p; /* { dg-error "return from pointer to non-enclosed address space" } */
+      /* { dg-message "expected '__user void \\*' but pointer is of type 'void \\*'" "" { target *-*-* } .-1 } */
+    case 1:
+      return p_kernel; /* { dg-error "return from pointer to non-enclosed address space" } */
+      /* { dg-message "expected '__user void \\*' but pointer is of type 'void \\*'" "" { target *-*-* } .-1 } */
+    case 2:
+      return p_user;
+    }
+}
+
+void test_cast_k_to_u (void)
+{
+  p_user = (void __user *)p_kernel; /* { dg-warning "cast to '__user' address space pointer from disjoint generic address space pointer" } */
+}
+
+void test_cast_u_to_k (void)
+{
+  p_kernel = (void __kernel *)p_user; /* { dg-warning "cast to generic address space pointer from disjoint '__user' address space pointer" } */
+}
+
+void test_cast_user_to_iomem (void)
+{
+  p_iomem = (void __iomem *)p_user; /* { dg-warning "cast to '__iomem' address space pointer from disjoint '__user' address space pointer" } */
+}
+
+int test_deref_read (int __user *p)
+{
+  return *p; // FIXME: should we have a way to disallow direct access?
+}
+
+void test_deref_write (int __user *p, int i)
+{
+  *p = i; // FIXME: should we have a way to disallow direct access?
+}
+
+typedef struct foo { int i; } __user *foo_ptr_t;
+
+void __user *
+test_pass_through (void __user *ptr)
+{
+  return ptr;
+}
+
+#define NULL ((void *)0)
+
+void __user *
+test_return_null_p_user ()
+{
+  return NULL;
+}
+
+// etc
diff --git a/gcc/testsuite/gcc.dg/custom-address-space-2.c b/gcc/testsuite/gcc.dg/custom-address-space-2.c
new file mode 100644
index 00000000000..7a07f3b134a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/custom-address-space-2.c
@@ -0,0 +1,21 @@
+/* Verify that we fail gracefully if the user defines too many address spaces.  */
+/* Avoid using "-ansi".  */
+/* { dg-options "" } */
+
+#pragma GCC custom_address_space(__cas_01)
+#pragma GCC custom_address_space(__cas_02)
+#pragma GCC custom_address_space(__cas_03)
+#pragma GCC custom_address_space(__cas_04)
+#pragma GCC custom_address_space(__cas_05)
+#pragma GCC custom_address_space(__cas_06)
+#pragma GCC custom_address_space(__cas_07)
+#pragma GCC custom_address_space(__cas_08)
+#pragma GCC custom_address_space(__cas_09)
+#pragma GCC custom_address_space(__cas_10)
+#pragma GCC custom_address_space(__cas_11)
+#pragma GCC custom_address_space(__cas_12)
+#pragma GCC custom_address_space(__cas_13)
+#pragma GCC custom_address_space(__cas_14) /* { dg-warning "too many custom address spaces" } */
+#pragma GCC custom_address_space(__cas_15) /* { dg-warning "too many custom address spaces" } */
+
+// FIXME: how to filter this by target; it's going to vary by target
diff --git a/gcc/testsuite/gcc.dg/custom-address-space-3.c b/gcc/testsuite/gcc.dg/custom-address-space-3.c
new file mode 100644
index 00000000000..426d6ce5c64
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/custom-address-space-3.c
@@ -0,0 +1,15 @@
+/* Verify that we can successfully compile code with a custom address space.  */
+/* Avoid using "-ansi".  */
+/* { dg-options "" } */
+
+#pragma GCC custom_address_space(__user)
+
+int test_1 (int __user *p)
+{
+  return *p;
+}
+
+void test_2 (int __user *p, int val)
+{
+  *p = val;
+}
diff --git a/gcc/tree-ssa-address.c b/gcc/tree-ssa-address.c
index f35556db2f7..e75ab19c8f0 100644
--- a/gcc/tree-ssa-address.c
+++ b/gcc/tree-ssa-address.c
@@ -48,6 +48,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-affine.h"
 #include "gimplify.h"
 #include "builtins.h"
+#include "addr-space.h"
 
 /* FIXME: We compute address costs using RTL.  */
 #include "tree-ssa-address.h"
@@ -192,8 +193,8 @@ rtx
 addr_for_mem_ref (struct mem_address *addr, addr_space_t as,
 		  bool really_expand)
 {
-  scalar_int_mode address_mode = targetm.addr_space.address_mode (as);
-  scalar_int_mode pointer_mode = targetm.addr_space.pointer_mode (as);
+  scalar_int_mode address_mode = addr_space_address_mode (as);
+  scalar_int_mode pointer_mode = addr_space_pointer_mode (as);
   rtx address, sym, bse, idx, st, off;
   struct mem_addr_template *templ;
 
@@ -576,7 +577,7 @@ multiplier_allowed_in_address_p (HOST_WIDE_INT ratio, machine_mode mode,
   valid_mult = valid_mult_list[data_index];
   if (!valid_mult)
     {
-      machine_mode address_mode = targetm.addr_space.address_mode (as);
+      machine_mode address_mode = addr_space_address_mode (as);
       rtx reg1 = gen_raw_REG (address_mode, LAST_VIRTUAL_REGISTER + 1);
       rtx reg2 = gen_raw_REG (address_mode, LAST_VIRTUAL_REGISTER + 2);
       rtx addr, scaled;
@@ -622,7 +623,7 @@ most_expensive_mult_to_index (tree type, struct mem_address *parts,
 			      aff_tree *addr, bool speed)
 {
   addr_space_t as = TYPE_ADDR_SPACE (type);
-  machine_mode address_mode = targetm.addr_space.address_mode (as);
+  machine_mode address_mode = addr_space_address_mode (as);
   HOST_WIDE_INT coef;
   unsigned best_mult_cost = 0, acost;
   tree mult_elt = NULL_TREE, elt;
diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 4a498abe3b0..0cda74825e8 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -131,6 +131,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "builtins.h"
 #include "tree-vectorizer.h"
 #include "dbgcnt.h"
+#include "addr-space.h"
 
 /* For lang_hooks.types.type_for_mode.  */
 #include "langhooks.h"
@@ -2606,7 +2607,7 @@ addr_offset_valid_p (struct iv_use *use, poly_int64 offset)
   addr = (*addr_list)[list_index];
   if (!addr)
     {
-      addr_mode = targetm.addr_space.address_mode (as);
+      addr_mode = addr_space_address_mode (as);
       reg = gen_raw_REG (addr_mode, LAST_VIRTUAL_REGISTER + 1);
       addr = gen_rtx_fmt_ee (PLUS, addr_mode, reg, NULL_RTX);
       (*addr_list)[list_index] = addr;
@@ -3729,7 +3730,7 @@ static rtx
 produce_memory_decl_rtl (tree obj, int *regno)
 {
   addr_space_t as = TYPE_ADDR_SPACE (TREE_TYPE (obj));
-  machine_mode address_mode = targetm.addr_space.address_mode (as);
+  machine_mode address_mode = addr_space_address_mode (as);
   rtx x;
 
   gcc_assert (obj);
diff --git a/gcc/tree.c b/gcc/tree.c
index 845228a055b..119cef9cfcc 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -69,6 +69,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-fold.h"
 #include "escaped_string.h"
 #include "gimple-range.h"
+#include "addr-space.h"
 
 /* Tree code classes.  */
 
@@ -6829,7 +6830,7 @@ build_pointer_type_for_mode (tree to_type, machine_mode mode,
   if (mode == VOIDmode)
     {
       addr_space_t as = TYPE_ADDR_SPACE (to_type);
-      mode = targetm.addr_space.pointer_mode (as);
+      mode = addr_space_pointer_mode (as);
     }
 
   /* If the pointed-to type has the may_alias attribute set, force
@@ -6901,7 +6902,7 @@ build_reference_type_for_mode (tree to_type, machine_mode mode,
   if (mode == VOIDmode)
     {
       addr_space_t as = TYPE_ADDR_SPACE (to_type);
-      mode = targetm.addr_space.pointer_mode (as);
+      mode = addr_space_pointer_mode (as);
     }
 
   /* If the pointed-to type has the may_alias attribute set, force
diff --git a/gcc/varasm.c b/gcc/varasm.c
index 09316c62050..d328adc0a7b 100644
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -61,6 +61,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "alloc-pool.h"
 #include "toplev.h"
 #include "opts.h"
+#include "addr-space.h"
 
 #ifdef XCOFF_DEBUGGING_INFO
 #include "xcoffout.h"		/* Needed for external data declarations.  */
@@ -1620,7 +1621,7 @@ make_decl_rtl (tree decl)
       if (TREE_TYPE (decl) != error_mark_node)
 	{
 	  addr_space_t as = TYPE_ADDR_SPACE (TREE_TYPE (decl));
-	  address_mode = targetm.addr_space.address_mode (as);
+	  address_mode = addr_space_address_mode (as);
 	}
       x = gen_rtx_SYMBOL_REF (address_mode, name);
     }
@@ -5178,7 +5179,7 @@ output_constant (tree exp, unsigned HOST_WIDE_INT size, unsigned int align,
      resolving it.  */
   if (TREE_CODE (exp) == NOP_EXPR
       && POINTER_TYPE_P (TREE_TYPE (exp))
-      && targetm.addr_space.valid_pointer_mode
+      && addr_space_valid_pointer_mode
 	   (SCALAR_INT_TYPE_MODE (TREE_TYPE (exp)),
 	    TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (exp)))))
     {
@@ -5188,7 +5189,7 @@ output_constant (tree exp, unsigned HOST_WIDE_INT size, unsigned int align,
 	 pointer modes.  */
       while (TREE_CODE (exp) == NOP_EXPR
 	     && POINTER_TYPE_P (TREE_TYPE (exp))
-	     && targetm.addr_space.valid_pointer_mode
+	     && addr_space_valid_pointer_mode
 		  (SCALAR_INT_TYPE_MODE (TREE_TYPE (exp)),
 		   TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (exp)))))
 	exp = TREE_OPERAND (exp, 0);
-- 
2.26.3



* [PATCH 1b/6] Add __attribute__((untrusted))
  2021-11-13 20:37 [PATCH 0/6] RFC: adding support to GCC for detecting trust boundaries David Malcolm
  2021-11-13 20:37 ` [PATCH 1a/6] RFC: Implement "#pragma GCC custom_address_space" David Malcolm
@ 2021-11-13 20:37 ` David Malcolm
  2021-12-09 22:54   ` Martin Sebor
  2021-11-13 20:37 ` [PATCH 2/6] Add returns_zero_on_success/failure attributes David Malcolm
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 39+ messages in thread
From: David Malcolm @ 2021-11-13 20:37 UTC
  To: gcc-patches, linux-toolchains; +Cc: David Malcolm

This patch adds a new:

  __attribute__((untrusted))

for the C front-end.  It is intended for the Linux kernel's "__user"
annotation, but it could also be used by other operating system kernels,
and potentially by other projects.
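
As a rough illustration (not part of the patch), kernel-style code might
use the attribute along the following lines.  The helper names
"fetch_untrusted" and "lookup" are hypothetical, as is the
"__has_attribute" fallback, which only exists so that the sketch also
builds with compilers that lack the proposed attribute.  It assumes that
an explicit cast is the sanctioned way to cross the trust boundary, as the
documentation added by this patch describes; it has not been checked
against a compiler with the patch applied.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: fall back to an empty macro where the proposed attribute
   is unavailable, so the sketch still compiles.  */
#if defined(__has_attribute)
# if __has_attribute(untrusted)
#  define __user __attribute__((untrusted))
# endif
#endif
#ifndef __user
# define __user
#endif

/* Hypothetical helper, loosely modelled on copy_from_user: the single
   place where an untrusted pointer is explicitly cast and read.
   Returns 0 on success, -1 on failure.  */
static int
fetch_untrusted (void *dst, const void __user *src, size_t n)
{
  if (dst == NULL || src == NULL)
    return -1;
  /* The explicit cast marks the deliberate crossing of the boundary.  */
  memcpy (dst, (const void *) src, n);
  return 0;
}

/* Hypothetical caller: copy an attacker-controlled index into trusted
   storage, then validate it before using it.  */
static int
lookup (const int *table, size_t table_len, const size_t __user *uidx,
        int *out)
{
  size_t idx;
  if (fetch_untrusted (&idx, uidx, sizeof idx) != 0)
    return -1;
  if (idx >= table_len)
    return -1;
  *out = table[idx];
  return 0;
}
```

With the patch applied, passing a plain "size_t *" as the third argument
of "lookup" without a cast should be rejected with a "pointer with
different trust level" diagnostic, whereas NULL is accepted for either
kind of pointer.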

Known issues:
- at least one TODO in handle_untrusted_attribute
- should it be permitted to dereference an untrusted pointer?  The patch
  currently allows this

gcc/c-family/ChangeLog:
	* c-attribs.c (c_common_attribute_table): Add "untrusted".
	(build_untrusted_type): New.
	(handle_untrusted_attribute): New.
	* c-pretty-print.c (pp_c_cv_qualifiers): Handle
	TYPE_QUAL_UNTRUSTED.

gcc/c/ChangeLog:
	* c-typeck.c (convert_for_assignment): Complain if the trust
	levels vary when assigning a non-NULL pointer.

gcc/ChangeLog:
	* doc/extend.texi (Common Type Attributes): Add "untrusted".
	* print-tree.c (print_node): Handle TYPE_UNTRUSTED.
	* tree-core.h (enum cv_qualifier): Add TYPE_QUAL_UNTRUSTED.
	(struct tree_type_common): Assign one of the spare bits to a new
	"untrusted_flag".
	* tree.c (set_type_quals): Handle TYPE_QUAL_UNTRUSTED.
	* tree.h (TYPE_QUALS): Likewise.
	(TYPE_QUALS_NO_ADDR_SPACE): Likewise.
	(TYPE_QUALS_NO_ADDR_SPACE_NO_ATOMIC): Likewise.

gcc/testsuite/ChangeLog:
	* c-c++-common/attr-untrusted-1.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
---
 gcc/c-family/c-attribs.c                      |  59 +++++++
 gcc/c-family/c-pretty-print.c                 |   2 +
 gcc/c/c-typeck.c                              |  64 +++++++
 gcc/doc/extend.texi                           |  25 +++
 gcc/print-tree.c                              |   3 +
 gcc/testsuite/c-c++-common/attr-untrusted-1.c | 165 ++++++++++++++++++
 gcc/tree-core.h                               |   6 +-
 gcc/tree.c                                    |   1 +
 gcc/tree.h                                    |  11 +-
 9 files changed, 332 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/attr-untrusted-1.c

diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index 007b928c54b..100c2dabab2 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -136,6 +136,7 @@ static tree handle_warn_unused_result_attribute (tree *, tree, tree, int,
 						 bool *);
 static tree handle_access_attribute (tree *, tree, tree, int, bool *);
 
+static tree handle_untrusted_attribute (tree *, tree, tree, int, bool *);
 static tree handle_sentinel_attribute (tree *, tree, tree, int, bool *);
 static tree handle_type_generic_attribute (tree *, tree, tree, int, bool *);
 static tree handle_alloc_size_attribute (tree *, tree, tree, int, bool *);
@@ -536,6 +537,8 @@ const struct attribute_spec c_common_attribute_table[] =
 			      handle_special_var_sec_attribute, attr_section_exclusions },
   { "access",		      1, 3, false, true, true, false,
 			      handle_access_attribute, NULL },
+  { "untrusted",	      0, 0, false,  true, false, true,
+			      handle_untrusted_attribute, NULL },
   /* Attributes used by Objective-C.  */
   { "NSObject",		      0, 0, true, false, false, false,
 			      handle_nsobject_attribute, NULL },
@@ -5224,6 +5227,62 @@ build_attr_access_from_parms (tree parms, bool skip_voidptr)
   return build_tree_list (name, attrargs);
 }
 
+/* Build (or reuse) a type based on BASE_TYPE, but with
+   TYPE_QUAL_UNTRUSTED.  */
+
+static tree
+build_untrusted_type (tree base_type)
+{
+  int base_type_quals = TYPE_QUALS (base_type);
+  return build_qualified_type (base_type,
+			       base_type_quals | TYPE_QUAL_UNTRUSTED);
+}
+
+/* Handle an "untrusted" attribute; arguments as in
+   struct attribute_spec.handler.  */
+
+static tree
+handle_untrusted_attribute (tree *node, tree ARG_UNUSED (name),
+			    tree ARG_UNUSED (args), int ARG_UNUSED (flags),
+			    bool *no_add_attrs)
+{
+  if (TREE_CODE (*node) == POINTER_TYPE)
+    {
+      tree base_type = TREE_TYPE (*node);
+      tree untrusted_base_type = build_untrusted_type (base_type);
+      *node = build_pointer_type (untrusted_base_type);
+      *no_add_attrs = true; /* OK */
+      return NULL_TREE;
+    }
+  else if (TREE_CODE (*node) == FUNCTION_TYPE)
+    {
+      tree return_type = TREE_TYPE (*node);
+      if (TREE_CODE (return_type) == POINTER_TYPE)
+	{
+	  tree base_type = TREE_TYPE (return_type);
+	  tree untrusted_base_type = build_untrusted_type (base_type);
+	  tree untrusted_return_type = build_pointer_type (untrusted_base_type);
+	  tree fn_type = build_function_type (untrusted_return_type,
+					      TYPE_ARG_TYPES (*node));
+	  *node = fn_type;
+	  *no_add_attrs = true; /* OK */
+	  return NULL_TREE;
+	}
+      else
+	{
+	  gcc_unreachable (); // TODO
+	}
+    }
+  else
+    {
+      tree base_type = *node;
+      tree untrusted_base_type = build_untrusted_type (base_type);
+      *node = untrusted_base_type;
+      *no_add_attrs = true; /* OK */
+      return NULL_TREE;
+    }
+}
+
 /* Handle a "nothrow" attribute; arguments as in
    struct attribute_spec.handler.  */
 
diff --git a/gcc/c-family/c-pretty-print.c b/gcc/c-family/c-pretty-print.c
index a987da46d6d..120e1e6d167 100644
--- a/gcc/c-family/c-pretty-print.c
+++ b/gcc/c-family/c-pretty-print.c
@@ -191,6 +191,8 @@ pp_c_cv_qualifiers (c_pretty_printer *pp, int qualifiers, bool func_type)
   if (qualifiers & TYPE_QUAL_RESTRICT)
     pp_c_ws_string (pp, (flag_isoc99 && !c_dialect_cxx ()
 			 ? "restrict" : "__restrict__"));
+  if (qualifiers & TYPE_QUAL_UNTRUSTED)
+    pp_c_ws_string (pp, "__attribute__((untrusted))");
 }
 
 /* Pretty-print T using the type-cast notation '( type-name )'.  */
diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 782414f8c8c..44de82b99ba 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -7284,6 +7284,70 @@ convert_for_assignment (location_t location, location_t expr_loc, tree type,
 	  return error_mark_node;
 	}
 
+      /* Untrusted vs trusted pointers, but allowing NULL to be used
+	 for everything.  */
+      if (TYPE_UNTRUSTED (ttl) != TYPE_UNTRUSTED (ttr)
+	  && !null_pointer_constant_p (rhs))
+	{
+	  auto_diagnostic_group d;
+	  bool diagnosed = true;
+	  switch (errtype)
+	    {
+	    case ic_argpass:
+	      {
+		const char msg[] = G_("passing argument %d of %qE from "
+				      "pointer with different trust level");
+		if (warnopt)
+		  diagnosed
+		    = warning_at (expr_loc, warnopt, msg, parmnum, rname);
+		else
+		  error_at (expr_loc, msg, parmnum, rname);
+		break;
+	      }
+	    case ic_assign:
+	      {
+		const char msg[] = G_("assignment from pointer with "
+				      "different trust level");
+		if (warnopt)
+		  warning_at (location, warnopt, msg);
+		else
+		  error_at (location, msg);
+		break;
+	      }
+	    case ic_init:
+	      {
+		const char msg[] = G_("initialization from pointer with "
+				      "different trust level");
+		if (warnopt)
+		  warning_at (location, warnopt, msg);
+		else
+		  error_at (location, msg);
+		break;
+	      }
+	    case ic_return:
+	      {
+		const char msg[] = G_("return from pointer with "
+				      "different trust level");
+		if (warnopt)
+		  warning_at (location, warnopt, msg);
+		else
+		  error_at (location, msg);
+		break;
+	      }
+	    default:
+	      gcc_unreachable ();
+	    }
+	  if (diagnosed)
+	    {
+	      if (errtype == ic_argpass)
+		inform_for_arg (fundecl, expr_loc, parmnum, type, rhstype);
+	      else
+		inform (location, "expected %qT but pointer is of type %qT",
+			type, rhstype);
+	    }
+	  return error_mark_node;
+	}
+
       /* Check if the right-hand side has a format attribute but the
 	 left-hand side doesn't.  */
       if (warn_suggest_attribute_format
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 6e6c580e329..e9f47519df2 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -8770,6 +8770,31 @@ pid_t wait (wait_status_ptr_t p)
 @}
 @end smallexample
 
+@item untrusted
+@cindex @code{untrusted} type attribute
+Types marked with this attribute are treated as being ``untrusted'':
+values of such types should be treated as being under attacker control.
+
+The C front end will issue an error diagnostic on attempts to assign
+pointer values between untrusted and trusted pointer types without
+an explicit cast.
+
+For example, when implementing an operating system kernel, one
+might write
+
+@smallexample
+#define __kernel
+#define __user    __attribute__ ((untrusted))
+void __kernel *p_kernel;
+void __user *p_user;
+
+/* With the above, the following assignment should be diagnosed as an error.  */
+p_user = p_kernel;
+@end smallexample
+
+A null pointer constant such as @code{NULL} can be used with both trusted
+and untrusted pointers.
+
 @item unused
 @cindex @code{unused} type attribute
 When attached to a type (including a @code{union} or a @code{struct}),
diff --git a/gcc/print-tree.c b/gcc/print-tree.c
index d1fbd044c27..e5123807521 100644
--- a/gcc/print-tree.c
+++ b/gcc/print-tree.c
@@ -640,6 +640,9 @@ print_node (FILE *file, const char *prefix, tree node, int indent,
       if (TYPE_RESTRICT (node))
 	fputs (" restrict", file);
 
+      if (TYPE_UNTRUSTED (node))
+	fputs (" untrusted", file);
+
       if (TYPE_LANG_FLAG_0 (node))
 	fputs (" type_0", file);
       if (TYPE_LANG_FLAG_1 (node))
diff --git a/gcc/testsuite/c-c++-common/attr-untrusted-1.c b/gcc/testsuite/c-c++-common/attr-untrusted-1.c
new file mode 100644
index 00000000000..84a217fc59f
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/attr-untrusted-1.c
@@ -0,0 +1,165 @@
+#define __kernel
+#define __user __attribute__((untrusted))
+#define __iomem
+#define __percpu
+#define __rcu
+
+void *p;
+void __kernel *p_kernel;
+void __user *p_user;
+void __iomem *p_iomem;
+void __percpu *p_percpu;
+void __rcu *p_rcu;
+
+#define NULL ((void *)0)
+
+extern void accepts_p (void *); /* { dg-message "24: expected 'void \\*' but argument is of type '__attribute__\\(\\(untrusted\\)\\) void \\*'" "" { target c } } */
+/* { dg-message "24:  initializing argument 1 of 'void accepts_p\\(void\\*\\)'" "" { target c++ } .-1 } */
+extern void accepts_p_kernel (void __kernel *);
+extern void accepts_p_user (void __user *);
+
+void test_argpass_to_p (void)
+{
+  accepts_p (p);
+  accepts_p (p_kernel);
+  accepts_p (p_user); /* { dg-error "passing argument 1 of 'accepts_p' from pointer with different trust level" "" { target c } } */
+  /* { dg-error "invalid conversion from '__attribute__\\(\\(untrusted\\)\\) void\\*' to 'void\\*'" "" { target c++ } .-1 } */
+}
+
+void test_init_p (void)
+{
+  void *local_p_1 = p;
+  void *local_p_2 = p_kernel;
+  void *local_p_3 = p_user; /* { dg-error "initialization from pointer with different trust level" "" { target c } } */
+  /* { dg-message "expected 'void \\*' but pointer is of type '__attribute__\\(\\(untrusted\\)\\) void \\*'" "" { target c } .-1 } */
+  /* { dg-error "invalid conversion from '__attribute__\\(\\(untrusted\\)\\) void\\*' to 'void\\*'" "" { target c++ } .-2 } */
+}
+
+void test_init_p_kernel (void)
+{
+  void __kernel *local_p_1 = p;
+  void __kernel *local_p_2 = p_kernel;
+  void __kernel *local_p_3 = p_user; /* { dg-error "initialization from pointer with different trust level" "" { target c } } */
+  /* { dg-message "expected 'void \\*' but pointer is of type '__attribute__\\(\\(untrusted\\)\\) void \\*'" "" { target c } .-1 } */
+  /* { dg-error "invalid conversion from '__attribute__\\(\\(untrusted\\)\\) void\\*' to 'void\\*'" "" { target c++ } .-2 } */
+}
+
+void test_init_p_user (void)
+{
+  void __user *local_p_1 = p; /* { dg-error "initialization from pointer with different trust level" "" { target c } } */
+  /* { dg-message "expected '__attribute__\\(\\(untrusted\\)\\) void \\*' but pointer is of type 'void \\*'" "" { target c } .-1 } */
+  void __user *local_p_2 = p_kernel; /* { dg-error "initialization from pointer with different trust level" "" { target c } } */
+  /* { dg-message "expected '__attribute__\\(\\(untrusted\\)\\) void \\*' but pointer is of type 'void \\*'" "" { target c } .-1 } */
+  void __user *local_p_3 = p_user;
+  void __user *local_p_4 = NULL;
+}
+
+void test_assign_to_p (void)
+{
+  p = p;
+  p = p_kernel;
+  p = p_user; /* { dg-error "assignment from pointer with different trust level" "" { target c } } */
+  /* { dg-message "expected 'void \\*' but pointer is of type '__attribute__\\(\\(untrusted\\)\\) void \\*'" "" { target c } .-1 } */
+  /* { dg-error "invalid conversion from '__attribute__\\(\\(untrusted\\)\\) void\\*' to 'void\\*'" "" { target c++ } .-2 } */
+  // etc
+}
+
+void test_assign_to_p_kernel (void)
+{
+  p_kernel = p;
+  p_kernel = p_kernel;
+  p_kernel = p_user; /* { dg-error "assignment from pointer with different trust level" "" { target c } } */
+  /* { dg-message "expected 'void \\*' but pointer is of type '__attribute__\\(\\(untrusted\\)\\) void \\*'" "" { target c } .-1 } */
+  /* { dg-error "invalid conversion from '__attribute__\\(\\(untrusted\\)\\) void\\*' to 'void\\*'" "" { target c++ } .-2 } */
+  // etc
+}
+
+void test_assign_to_p_user (void)
+{
+  p_user = p;  /* { dg-error "assignment from pointer with different trust level" "" { target c } } */
+  /* { dg-message "expected '__attribute__\\(\\(untrusted\\)\\) void \\*' but pointer is of type 'void \\*'" "" { target c } .-1 } */
+  p_user = p_kernel;  /* { dg-error "assignment from pointer with different trust level" "" { target c } } */
+  /* { dg-message "expected '__attribute__\\(\\(untrusted\\)\\) void \\*' but pointer is of type 'void \\*'" "" { target c } .-1 } */
+  p_user = p_user;
+  p_user = NULL;
+  // etc
+}
+
+void *test_return_p (int i)
+{
+  switch (i)
+    {
+    default:
+    case 0:
+      return p;
+    case 1:
+      return p_kernel;
+    case 2:
+      return p_user; /* { dg-error "return from pointer with different trust level" "" { target c } } */
+      /* { dg-message "expected 'void \\*' but pointer is of type '__attribute__\\(\\(untrusted\\)\\) void \\*'" "" { target c } .-1 } */
+      /* { dg-error "invalid conversion from '__attribute__\\(\\(untrusted\\)\\) void\\*' to 'void\\*'" "" { target c++ } .-2 } */
+    }
+}
+
+void __kernel *test_return_p_kernel (int i)
+{
+  switch (i)
+    {
+    default:
+    case 0:
+      return p;
+    case 1:
+      return p_kernel;
+    case 2:
+      return p_user; /* { dg-error "return from pointer with different trust level" "" { target c } } */
+      /* { dg-message "expected 'void \\*' but pointer is of type '__attribute__\\(\\(untrusted\\)\\) void \\*'" "" { target c } .-1 } */
+      /* { dg-error "invalid conversion from '__attribute__\\(\\(untrusted\\)\\) void\\*' to 'void\\*'" "" { target c++ } .-2 } */
+    }
+}
+
+void __user *
+test_return_p_user (int i)
+{
+  switch (i)
+    {
+    default:
+    case 0:
+      return p; /* { dg-error "return from pointer with different trust level" "" { target c } } */
+      /* { dg-message "expected '__attribute__\\(\\(untrusted\\)\\) void \\*' but pointer is of type 'void \\*'" "" { target c } .-1 } */
+    case 1:
+      return p_kernel; /* { dg-error "return from pointer with different trust level" "" { target c } } */
+      /* { dg-message "expected '__attribute__\\(\\(untrusted\\)\\) void \\*' but pointer is of type 'void \\*'" "" { target c } .-1 } */
+    case 2:
+      return p_user;
+    case 3:
+      return NULL;
+    }
+}
+
+void test_cast_k_to_u (void)
+{
+  p_user = (void __user *)p_kernel;
+}
+
+void test_cast_u_to_k (void)
+{
+  p_kernel = (void __kernel *)p_user;
+}
+
+int test_deref_read (int __user *p)
+{
+  return *p; // FIXME: should this be allowed directly?
+}
+
+void test_deref_write (int __user *p, int i)
+{
+  *p = i; // FIXME: should this be allowed directly?
+}
+
+typedef struct foo { int i; } __user *foo_ptr_t;
+
+void __user *
+test_pass_through (void __user *ptr)
+{
+  return ptr;
+}
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index 8ab119dc9a2..35a7f50c06c 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -604,7 +604,8 @@ enum cv_qualifier {
   TYPE_QUAL_CONST    = 0x1,
   TYPE_QUAL_VOLATILE = 0x2,
   TYPE_QUAL_RESTRICT = 0x4,
-  TYPE_QUAL_ATOMIC   = 0x8
+  TYPE_QUAL_ATOMIC   = 0x8,
+  TYPE_QUAL_UNTRUSTED = 0x10
 };
 
 /* Standard named or nameless data types of the C compiler.  */
@@ -1684,7 +1685,8 @@ struct GTY(()) tree_type_common {
   unsigned typeless_storage : 1;
   unsigned empty_flag : 1;
   unsigned indivisible_p : 1;
-  unsigned spare : 16;
+  unsigned untrusted_flag : 1;
+  unsigned spare : 15;
 
   alias_set_type alias_set;
   tree pointer_to;
diff --git a/gcc/tree.c b/gcc/tree.c
index 845228a055b..3600639d985 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -5379,6 +5379,7 @@ set_type_quals (tree type, int type_quals)
   TYPE_VOLATILE (type) = (type_quals & TYPE_QUAL_VOLATILE) != 0;
   TYPE_RESTRICT (type) = (type_quals & TYPE_QUAL_RESTRICT) != 0;
   TYPE_ATOMIC (type) = (type_quals & TYPE_QUAL_ATOMIC) != 0;
+  TYPE_UNTRUSTED (type) = (type_quals & TYPE_QUAL_UNTRUSTED) != 0;
   TYPE_ADDR_SPACE (type) = DECODE_QUAL_ADDR_SPACE (type_quals);
 }
 
diff --git a/gcc/tree.h b/gcc/tree.h
index f62c00bc870..caab575b210 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -2197,6 +2197,10 @@ extern tree vector_element_bits_tree (const_tree);
    the term.  */
 #define TYPE_RESTRICT(NODE) (TYPE_CHECK (NODE)->type_common.restrict_flag)
 
+/* Nonzero in a type considered "untrusted" - values should be treated as
+   under attacker control.  */
+#define TYPE_UNTRUSTED(NODE) (TYPE_CHECK (NODE)->type_common.untrusted_flag)
+
 /* If nonzero, type's name shouldn't be emitted into debug info.  */
 #define TYPE_NAMELESS(NODE) (TYPE_CHECK (NODE)->base.u.bits.nameless_flag)
 
@@ -2221,6 +2225,7 @@ extern tree vector_element_bits_tree (const_tree);
 	  | (TYPE_VOLATILE (NODE) * TYPE_QUAL_VOLATILE)		\
 	  | (TYPE_ATOMIC (NODE) * TYPE_QUAL_ATOMIC)		\
 	  | (TYPE_RESTRICT (NODE) * TYPE_QUAL_RESTRICT)		\
+	  | (TYPE_UNTRUSTED (NODE) * TYPE_QUAL_UNTRUSTED)	\
 	  | (ENCODE_QUAL_ADDR_SPACE (TYPE_ADDR_SPACE (NODE)))))
 
 /* The same as TYPE_QUALS without the address space qualifications.  */
@@ -2228,14 +2233,16 @@ extern tree vector_element_bits_tree (const_tree);
   ((int) ((TYPE_READONLY (NODE) * TYPE_QUAL_CONST)		\
 	  | (TYPE_VOLATILE (NODE) * TYPE_QUAL_VOLATILE)		\
 	  | (TYPE_ATOMIC (NODE) * TYPE_QUAL_ATOMIC)		\
-	  | (TYPE_RESTRICT (NODE) * TYPE_QUAL_RESTRICT)))
+	  | (TYPE_RESTRICT (NODE) * TYPE_QUAL_RESTRICT)		\
+	  | (TYPE_UNTRUSTED (NODE) * TYPE_QUAL_UNTRUSTED)))
 
 /* The same as TYPE_QUALS without the address space and atomic 
    qualifications.  */
 #define TYPE_QUALS_NO_ADDR_SPACE_NO_ATOMIC(NODE)		\
   ((int) ((TYPE_READONLY (NODE) * TYPE_QUAL_CONST)		\
 	  | (TYPE_VOLATILE (NODE) * TYPE_QUAL_VOLATILE)		\
-	  | (TYPE_RESTRICT (NODE) * TYPE_QUAL_RESTRICT)))
+	  | (TYPE_RESTRICT (NODE) * TYPE_QUAL_RESTRICT)		\
+	  | (TYPE_UNTRUSTED (NODE) * TYPE_QUAL_UNTRUSTED)))
 
 /* These flags are available for each language front end to use internally.  */
 #define TYPE_LANG_FLAG_0(NODE) (TYPE_CHECK (NODE)->type_common.lang_flag_0)
-- 
2.26.3



* [PATCH 2/6] Add returns_zero_on_success/failure attributes
  2021-11-13 20:37 [PATCH 0/6] RFC: adding support to GCC for detecting trust boundaries David Malcolm
  2021-11-13 20:37 ` [PATCH 1a/6] RFC: Implement "#pragma GCC custom_address_space" David Malcolm
  2021-11-13 20:37 ` [PATCH 1b/6] Add __attribute__((untrusted)) David Malcolm
@ 2021-11-13 20:37 ` David Malcolm
  2021-11-15  7:03   ` Prathamesh Kulkarni
  2021-11-13 20:37 ` [PATCH 4a/6] analyzer: implement region::untrusted_p in terms of custom address spaces David Malcolm
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 39+ messages in thread
From: David Malcolm @ 2021-11-13 20:37 UTC (permalink / raw)
  To: gcc-patches, linux-toolchains; +Cc: David Malcolm

This patch adds two new function attributes, "returns_zero_on_success"
and "returns_zero_on_failure".  The followup patch makes use of them in
-fanalyzer.
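
As a rough illustration of the intended usage (a sketch, not part of the
patch): the two helpers below are hypothetical, and the macros fall back
to nothing on compilers that lack the attributes, on the assumption that
__has_attribute is available.  The attributes go on the declarations,
since GCC does not accept trailing attributes on a function definition.

```c
/* Use the proposed attributes where the compiler knows them; otherwise
   expand to nothing so the code still builds (assumes __has_attribute).  */
#if defined(__has_attribute)
# if __has_attribute(returns_zero_on_success)
#  define RETURNS_ZERO_ON_SUCCESS __attribute__((returns_zero_on_success))
# endif
# if __has_attribute(returns_zero_on_failure)
#  define RETURNS_ZERO_ON_FAILURE __attribute__((returns_zero_on_failure))
# endif
#endif
#ifndef RETURNS_ZERO_ON_SUCCESS
# define RETURNS_ZERO_ON_SUCCESS
#endif
#ifndef RETURNS_ZERO_ON_FAILURE
# define RETURNS_ZERO_ON_FAILURE
#endif

/* Hypothetical helper, errno-style: 0 on success, nonzero on failure.  */
int parse_digit (char c, int *out) RETURNS_ZERO_ON_SUCCESS;

/* Hypothetical helper, boolean-style: nonzero on success, 0 on failure.  */
int try_parse_digit (char c, int *out) RETURNS_ZERO_ON_FAILURE;

int
parse_digit (char c, int *out)
{
  if (c < '0' || c > '9')
    return -1;
  *out = c - '0';
  return 0;
}

int
try_parse_digit (char c, int *out)
{
  return parse_digit (c, out) == 0;
}
```

Both attributes require an integral return type, so neither can be
combined with a pointer-returning function.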

gcc/c-family/ChangeLog:
	* c-attribs.c (attr_noreturn_exclusions): Add
	"returns_zero_on_failure" and "returns_zero_on_success".
	(attr_returns_twice_exclusions): Likewise.
	(attr_returns_zero_on_exclusions): New.
	(c_common_attribute_table): Add "returns_zero_on_failure" and
	"returns_zero_on_success".
	(handle_returns_zero_on_attributes): New.

gcc/ChangeLog:
	* doc/extend.texi (Common Function Attributes): Document
	"returns_zero_on_failure" and "returns_zero_on_success".

gcc/testsuite/ChangeLog:
	* c-c++-common/attr-returns-zero-on-1.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
---
 gcc/c-family/c-attribs.c                      | 37 ++++++++++
 gcc/doc/extend.texi                           | 16 +++++
 .../c-c++-common/attr-returns-zero-on-1.c     | 68 +++++++++++++++++++
 3 files changed, 121 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/attr-returns-zero-on-1.c

diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index 100c2dabab2..9e03156de5e 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -153,6 +153,7 @@ static tree handle_argspec_attribute (tree *, tree, tree, int, bool *);
 static tree handle_fnspec_attribute (tree *, tree, tree, int, bool *);
 static tree handle_warn_unused_attribute (tree *, tree, tree, int, bool *);
 static tree handle_returns_nonnull_attribute (tree *, tree, tree, int, bool *);
+static tree handle_returns_zero_on_attributes (tree *, tree, tree, int, bool *);
 static tree handle_omp_declare_simd_attribute (tree *, tree, tree, int,
 					       bool *);
 static tree handle_omp_declare_variant_attribute (tree *, tree, tree, int,
@@ -221,6 +222,8 @@ extern const struct attribute_spec::exclusions attr_noreturn_exclusions[] =
   ATTR_EXCL ("pure", true, true, true),
   ATTR_EXCL ("returns_twice", true, true, true),
   ATTR_EXCL ("warn_unused_result", true, true, true),
+  ATTR_EXCL ("returns_zero_on_failure", true, true, true),
+  ATTR_EXCL ("returns_zero_on_success", true, true, true),
   ATTR_EXCL (NULL, false, false, false),
 };
 
@@ -235,6 +238,8 @@ attr_warn_unused_result_exclusions[] =
 static const struct attribute_spec::exclusions attr_returns_twice_exclusions[] =
 {
   ATTR_EXCL ("noreturn", true, true, true),
+  ATTR_EXCL ("returns_zero_on_failure", true, true, true),
+  ATTR_EXCL ("returns_zero_on_success", true, true, true),
   ATTR_EXCL (NULL, false, false, false),
 };
 
@@ -275,6 +280,16 @@ static const struct attribute_spec::exclusions attr_stack_protect_exclusions[] =
   ATTR_EXCL (NULL, false, false, false),
 };
 
+/* Exclusions that apply to the returns_zero_on_* attributes.  */
+static const struct attribute_spec::exclusions
+  attr_returns_zero_on_exclusions[] =
+{
+  ATTR_EXCL ("noreturn", true, true, true),
+  ATTR_EXCL ("returns_twice", true, true, true),
+  ATTR_EXCL ("returns_zero_on_failure", true, true, true),
+  ATTR_EXCL ("returns_zero_on_success", true, true, true),
+  ATTR_EXCL (NULL, false, false, false),
+};
 
 /* Table of machine-independent attributes common to all C-like languages.
 
@@ -493,6 +508,12 @@ const struct attribute_spec c_common_attribute_table[] =
 			      handle_warn_unused_attribute, NULL },
   { "returns_nonnull",        0, 0, false, true, true, false,
 			      handle_returns_nonnull_attribute, NULL },
+  { "returns_zero_on_failure",0, 0, false, true, true, false,
+			      handle_returns_zero_on_attributes,
+			      attr_returns_zero_on_exclusions },
+  { "returns_zero_on_success",0, 0, false, true, true, false,
+			      handle_returns_zero_on_attributes,
+			      attr_returns_zero_on_exclusions },
   { "omp declare simd",       0, -1, true,  false, false, false,
 			      handle_omp_declare_simd_attribute, NULL },
   { "omp declare variant base", 0, -1, true,  false, false, false,
@@ -5660,6 +5681,22 @@ handle_returns_nonnull_attribute (tree *node, tree name, tree, int,
   return NULL_TREE;
 }
 
+/* Handle "returns_zero_on_failure" and "returns_zero_on_success" attributes;
+   arguments as in struct attribute_spec.handler.  */
+
+static tree
+handle_returns_zero_on_attributes (tree *node, tree name, tree, int,
+				   bool *no_add_attrs)
+{
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (*node)))
+    {
+      error ("%qE attribute on a function not returning an integral type",
+	     name);
+      *no_add_attrs = true;
+    }
+  return NULL_TREE;
+}
+
 /* Handle a "designated_init" attribute; arguments as in
    struct attribute_spec.handler.  */
 
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index e9f47519df2..5a6ef464779 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -3784,6 +3784,22 @@ function.  Examples of such functions are @code{setjmp} and @code{vfork}.
 The @code{longjmp}-like counterpart of such function, if any, might need
 to be marked with the @code{noreturn} attribute.
 
+@item returns_zero_on_failure
+@cindex @code{returns_zero_on_failure} function attribute
+The @code{returns_zero_on_failure} attribute hints that the function
+can succeed or fail, returning non-zero on success and zero on failure.
+This is used by the @option{-fanalyzer} option to consider both outcomes
+separately, which may improve how it explores error-handling paths, and
+how such outcomes are labelled in diagnostics.  It is also a hint
+to the human reader of the source code.
+
+@item returns_zero_on_success
+@cindex @code{returns_zero_on_success} function attribute
+The @code{returns_zero_on_success} attribute is identical to the
+@code{returns_zero_on_failure} attribute, apart from having the
+opposite interpretation of the return value: zero on success, non-zero
+on failure.
+
 @item section ("@var{section-name}")
 @cindex @code{section} function attribute
 @cindex functions in arbitrary sections
diff --git a/gcc/testsuite/c-c++-common/attr-returns-zero-on-1.c b/gcc/testsuite/c-c++-common/attr-returns-zero-on-1.c
new file mode 100644
index 00000000000..5475dfe61db
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/attr-returns-zero-on-1.c
@@ -0,0 +1,68 @@
+/* Verify the parsing of the "returns_zero_on_{success|failure}" attributes.  */
+
+/* Correct usage.  */
+
+extern int test_int_return_s ()
+  __attribute__((returns_zero_on_success));
+extern long test_long_return_f ()
+  __attribute__((returns_zero_on_failure));
+
+/* Should complain if not a function.  */
+
+extern int not_a_function_s
+  __attribute__((returns_zero_on_success)); /* { dg-warning "'returns_zero_on_success' attribute only applies to function types" } */
+extern int not_a_function_f
+  __attribute__((returns_zero_on_failure)); /* { dg-warning "'returns_zero_on_failure' attribute only applies to function types" } */
+
+/* Should complain if return type is not integral.  */
+
+extern void test_void_return_s ()
+  __attribute__((returns_zero_on_success)); /* { dg-error "'returns_zero_on_success' attribute on a function not returning an integral type" } */
+extern void test_void_return_f ()
+  __attribute__((returns_zero_on_failure)); /* { dg-error "'returns_zero_on_failure' attribute on a function not returning an integral type" } */
+
+extern void *test_void_star_return_s ()
+  __attribute__((returns_zero_on_success)); /* { dg-error "'returns_zero_on_success' attribute on a function not returning an integral type" } */
+extern void *test_void_star_return_f ()
+  __attribute__((returns_zero_on_failure)); /* { dg-error "'returns_zero_on_failure' attribute on a function not returning an integral type" } */
+
+/* (and this prevents mixing with returns_nonnull, which requires a pointer).  */
+
+/* Should complain if more than one returns_* attribute.  */
+
+extern int test_void_returns_s_f ()
+  __attribute__((returns_zero_on_success))
+  __attribute__((returns_zero_on_failure)); /* { dg-warning "ignoring attribute 'returns_zero_on_failure' because it conflicts with attribute 'returns_zero_on_success'" } */
+extern int test_void_returns_f_s ()
+  __attribute__((returns_zero_on_failure))
+  __attribute__((returns_zero_on_success)); /* { dg-warning "ignoring attribute 'returns_zero_on_success' because it conflicts with attribute 'returns_zero_on_failure'" } */
+
+/* Should complain if mixed with "noreturn".  */
+
+extern int test_noreturn_returns_s ()
+  __attribute__((noreturn))
+  __attribute__((returns_zero_on_success)); /* { dg-warning "ignoring attribute 'returns_zero_on_success' because it conflicts with attribute 'noreturn'" } */
+extern int test_returns_s_noreturn ()
+  __attribute__((returns_zero_on_success))
+  __attribute__((noreturn)); /* { dg-warning "ignoring attribute 'noreturn' because it conflicts with attribute 'returns_zero_on_success'" } */
+extern int test_noreturn_returns_f ()
+  __attribute__((noreturn))
+  __attribute__((returns_zero_on_failure)); /* { dg-warning "ignoring attribute 'returns_zero_on_failure' because it conflicts with attribute 'noreturn'" } */
+extern int test_returns_f_noreturn ()
+  __attribute__((returns_zero_on_failure))
+  __attribute__((noreturn)); /* { dg-warning "ignoring attribute 'noreturn' because it conflicts with attribute 'returns_zero_on_failure'" } */
+
+/* Should complain if mixed with "returns_twice".  */
+
+extern int test_returns_twice_returns_s ()
+  __attribute__((returns_twice))
+  __attribute__((returns_zero_on_success)); /* { dg-warning "ignoring attribute 'returns_zero_on_success' because it conflicts with attribute 'returns_twice'" } */
+extern int test_returns_s_returns_twice ()
+  __attribute__((returns_zero_on_success))
+  __attribute__((returns_twice)); /* { dg-warning "ignoring attribute 'returns_twice' because it conflicts with attribute 'returns_zero_on_success'" } */
+extern int test_returns_twice_returns_f ()
+  __attribute__((returns_twice))
+  __attribute__((returns_zero_on_failure)); /* { dg-warning "ignoring attribute 'returns_zero_on_failure' because it conflicts with attribute 'returns_twice'" } */
+extern int test_returns_f_returns_twice ()
+  __attribute__((returns_zero_on_failure))
+  __attribute__((returns_twice)); /* { dg-warning "ignoring attribute 'returns_twice' because it conflicts with attribute 'returns_zero_on_failure'" } */
-- 
2.26.3



* [PATCH 4a/6] analyzer: implement region::untrusted_p in terms of custom address spaces
  2021-11-13 20:37 [PATCH 0/6] RFC: adding support to GCC for detecting trust boundaries David Malcolm
                   ` (2 preceding siblings ...)
  2021-11-13 20:37 ` [PATCH 2/6] Add returns_zero_on_success/failure attributes David Malcolm
@ 2021-11-13 20:37 ` David Malcolm
  2021-11-13 20:37 ` [PATCH 4b/6] analyzer: implement region::untrusted_p in terms of __attribute__((untrusted)) David Malcolm
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 39+ messages in thread
From: David Malcolm @ 2021-11-13 20:37 UTC (permalink / raw)
  To: gcc-patches, linux-toolchains; +Cc: David Malcolm

gcc/analyzer/ChangeLog:
	* region.cc (region::untrusted_p): New.

gcc/testsuite/ChangeLog:
	* gcc.dg/analyzer/test-uaccess.h: New header.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
---
 gcc/analyzer/region.cc                       | 13 +++++++++++++
 gcc/testsuite/gcc.dg/analyzer/test-uaccess.h | 19 +++++++++++++++++++
 2 files changed, 32 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/test-uaccess.h

diff --git a/gcc/analyzer/region.cc b/gcc/analyzer/region.cc
index bb4f53b8802..b84504dbe42 100644
--- a/gcc/analyzer/region.cc
+++ b/gcc/analyzer/region.cc
@@ -666,6 +666,19 @@ region::symbolic_for_unknown_ptr_p () const
   return false;
 }
 
+/* Return true if accessing this region crosses a trust boundary,
+   e.g. user-space memory as seen by an OS kernel.  */
+
+bool
+region::untrusted_p () const
+{
+  addr_space_t as = get_addr_space ();
+  /* FIXME: treat all non-generic address spaces as untrusted for now.  */
+  if (!ADDR_SPACE_GENERIC_P (as))
+    return true;
+  return false;
+}
+
 /* region's ctor.  */
 
 region::region (complexity c, unsigned id, const region *parent, tree type)
diff --git a/gcc/testsuite/gcc.dg/analyzer/test-uaccess.h b/gcc/testsuite/gcc.dg/analyzer/test-uaccess.h
new file mode 100644
index 00000000000..0500e20b22b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/test-uaccess.h
@@ -0,0 +1,19 @@
+/* Shared header for testcases for copy_from_user/copy_to_user.  */
+
+/* Adapted from include/linux/compiler.h  */
+
+#pragma GCC custom_address_space(__user)
+
+/* Adapted from include/asm-generic/uaccess.h  */
+
+extern int copy_from_user(void *to, const void __user *from, long n)
+  __attribute__((access (write_only, 1, 3),
+		 access (read_only, 2, 3),
+		 returns_zero_on_success
+		 ));
+
+extern long copy_to_user(void __user *to, const void *from, unsigned long n)
+  __attribute__((access (write_only, 1, 3),
+		 access (read_only, 2, 3),
+		 returns_zero_on_success
+		 ));
-- 
2.26.3



* [PATCH 4b/6] analyzer: implement region::untrusted_p in terms of __attribute__((untrusted))
  2021-11-13 20:37 [PATCH 0/6] RFC: adding support to GCC for detecting trust boundaries David Malcolm
                   ` (3 preceding siblings ...)
  2021-11-13 20:37 ` [PATCH 4a/6] analyzer: implement region::untrusted_p in terms of custom address spaces David Malcolm
@ 2021-11-13 20:37 ` David Malcolm
  2021-11-13 20:37 ` [PATCH 5/6] analyzer: use region::untrusted_p in taint detection David Malcolm
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 39+ messages in thread
From: David Malcolm @ 2021-11-13 20:37 UTC (permalink / raw)
  To: gcc-patches, linux-toolchains; +Cc: David Malcolm

gcc/analyzer/ChangeLog:
	* region.cc (region::untrusted_p): Implement in terms of
	__attribute__((untrusted)).

gcc/testsuite/ChangeLog:
	* gcc.dg/analyzer/test-uaccess.h: Change from custom_address_space
	pragma to __attribute__((untrusted)).
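
The new region::untrusted_p walks from a region towards the nearest
region that has a type and reports that type's "untrusted" flag,
following the original region of a cast and otherwise the parent.  The
following is a standalone C model of that walk, not GCC code: the
structs, the enum and the function name are hypothetical stand-ins for
the analyzer's classes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the analyzer's region kinds.  */
enum region_kind { RK_OTHER, RK_CAST };

/* Hypothetical stand-in for a type carrying TYPE_UNTRUSTED.  */
struct type_info
{
  bool untrusted;
};

struct region
{
  enum region_kind kind;
  const struct type_info *type;   /* NULL if the region is untyped.  */
  const struct region *parent;    /* NULL at the root.  */
  const struct region *original;  /* Only meaningful for RK_CAST.  */
};

/* Model of the walk: the first typed region encountered decides the
   answer; untyped regions defer to the cast's original region or to
   the parent.  Running out of regions means "trusted".  */
bool
region_untrusted_p (const struct region *r)
{
  const struct region *iter = r;
  while (iter)
    {
      if (iter->type)
        return iter->type->untrusted;
      iter = (iter->kind == RK_CAST) ? iter->original : iter->parent;
    }
  return false;
}
```

Note that the first typed region wins, so a typed, trusted child of an
untrusted parent is reported as trusted in this model.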

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
---
 gcc/analyzer/region.cc                       | 19 +++++++++++++++----
 gcc/testsuite/gcc.dg/analyzer/test-uaccess.h |  2 +-
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/gcc/analyzer/region.cc b/gcc/analyzer/region.cc
index b84504dbe42..52e9fa2d1e6 100644
--- a/gcc/analyzer/region.cc
+++ b/gcc/analyzer/region.cc
@@ -672,10 +672,21 @@ region::symbolic_for_unknown_ptr_p () const
 bool
 region::untrusted_p () const
 {
-  addr_space_t as = get_addr_space ();
-  /* FIXME: treat all non-generic address spaces as untrusted for now.  */
-  if (!ADDR_SPACE_GENERIC_P (as))
-    return true;
+  const region *iter = this;
+  while (iter)
+    {
+      if (iter->get_type ())
+	return TYPE_UNTRUSTED (iter->get_type ());
+      switch (iter->get_kind ())
+	{
+	default:
+	  iter = iter->get_parent_region ();
+	  continue;
+	case RK_CAST:
+	  iter = iter->dyn_cast_cast_region ()->get_original_region ();
+	  continue;
+	}
+    }
   return false;
 }
 
diff --git a/gcc/testsuite/gcc.dg/analyzer/test-uaccess.h b/gcc/testsuite/gcc.dg/analyzer/test-uaccess.h
index 0500e20b22b..280f4045418 100644
--- a/gcc/testsuite/gcc.dg/analyzer/test-uaccess.h
+++ b/gcc/testsuite/gcc.dg/analyzer/test-uaccess.h
@@ -2,7 +2,7 @@
 
 /* Adapted from include/linux/compiler.h  */
 
-#pragma GCC custom_address_space(__user)
+#define __user __attribute__((untrusted))
 
 /* Adapted from include/asm-generic/uaccess.h  */
 
-- 
2.26.3



* [PATCH 5/6] analyzer: use region::untrusted_p in taint detection
  2021-11-13 20:37 [PATCH 0/6] RFC: adding support to GCC for detecting trust boundaries David Malcolm
                   ` (4 preceding siblings ...)
  2021-11-13 20:37 ` [PATCH 4b/6] analyzer: implement region::untrusted_p in terms of __attribute__((untrusted)) David Malcolm
@ 2021-11-13 20:37 ` David Malcolm
  2021-11-13 20:37 ` [PATCH 6/6] Add __attribute__ ((tainted)) David Malcolm
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 39+ messages in thread
From: David Malcolm @ 2021-11-13 20:37 UTC (permalink / raw)
  To: gcc-patches, linux-toolchains; +Cc: David Malcolm

This patch wires up the "untrusted" region logic to the analyzer's taint
detection, so that any data copied via a __user pointer (e.g. via a
suitably annotated "copy_from_user" decl) is treated as tainted.

It includes a series of reproducers for detecting CVE-2011-0521.
Unfortunately the analyzer only detects the issue once the code has
been significantly simplified from its original form: currently only
the -5.c and -6.c tests in the series trigger the warning (see the
notes in the individual cases).
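
For readers unfamiliar with the bug class, here is a minimal userspace
sketch of the pattern the reproducers exercise.  It is not taken from
the tests: UNTRUSTED, mock_copy_from_user, struct slot_info and
get_slot_type are hypothetical names, and the mock simply wraps memcpy.
The point is that a signed index copied across the trust boundary needs
both a lower and an upper bound check before use; the unfixed
reproducers check only the upper bound.

```c
#include <string.h>

#define NUM_SLOTS 2

/* Stand-in for the __user marker from the test header; expands to
   nothing in this userspace sketch.  */
#define UNTRUSTED

struct slot_info
{
  int num;   /* Signed, so it can be negative.  */
  int type;
};

/* Userspace mock of copy_from_user: always succeeds, returning 0.  */
static long
mock_copy_from_user (void *to, const void UNTRUSTED *from, unsigned long n)
{
  memcpy (to, from, n);
  return 0;
}

/* Look up the type of the slot named by the request at UARG.
   Returns 0 and writes *OUT on success, -1 if the request is rejected.  */
int
get_slot_type (const void UNTRUSTED *uarg, const int slot_types[NUM_SLOTS],
               int *out)
{
  struct slot_info info;

  if (mock_copy_from_user (&info, uarg, sizeof info))
    return -1;

  /* info.num is attacker-controlled: checking only the upper bound
     would let a negative value index before the array.  */
  if (info.num < 0 || info.num >= NUM_SLOTS)
    return -1;

  *out = slot_types[info.num];
  return 0;
}
```

Dropping the "info.num < 0" test gives the shape of the unfixed
reproducers, where the index can be negative.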

gcc/analyzer/ChangeLog:
	* sm-taint.cc (taint_state_machine::get_default_state): New, using
	region::untrusted_p.

gcc/testsuite/ChangeLog:
	* gcc.dg/analyzer/taint-CVE-2011-0521-1-fixed.c: New test.
	* gcc.dg/analyzer/taint-CVE-2011-0521-1.c: New test.
	* gcc.dg/analyzer/taint-CVE-2011-0521-2-fixed.c: New test.
	* gcc.dg/analyzer/taint-CVE-2011-0521-2.c: New test.
	* gcc.dg/analyzer/taint-CVE-2011-0521-3-fixed.c: New test.
	* gcc.dg/analyzer/taint-CVE-2011-0521-3.c: New test.
	* gcc.dg/analyzer/taint-CVE-2011-0521-4.c: New test.
	* gcc.dg/analyzer/taint-CVE-2011-0521-5.c: New test.
	* gcc.dg/analyzer/taint-CVE-2011-0521-6.c: New test.
	* gcc.dg/analyzer/taint-CVE-2011-0521.h: New header.
	* gcc.dg/analyzer/taint-antipatterns-1.c: New test.
	* gcc.dg/analyzer/taint-read-through-untrusted-ptr-1.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
---
 gcc/analyzer/sm-taint.cc                      |  13 ++
 .../analyzer/taint-CVE-2011-0521-1-fixed.c    | 113 +++++++++++++++
 .../gcc.dg/analyzer/taint-CVE-2011-0521-1.c   | 113 +++++++++++++++
 .../analyzer/taint-CVE-2011-0521-2-fixed.c    |  93 ++++++++++++
 .../gcc.dg/analyzer/taint-CVE-2011-0521-2.c   |  93 ++++++++++++
 .../analyzer/taint-CVE-2011-0521-3-fixed.c    |  56 +++++++
 .../gcc.dg/analyzer/taint-CVE-2011-0521-3.c   |  57 ++++++++
 .../gcc.dg/analyzer/taint-CVE-2011-0521-4.c   |  40 +++++
 .../gcc.dg/analyzer/taint-CVE-2011-0521-5.c   |  42 ++++++
 .../gcc.dg/analyzer/taint-CVE-2011-0521-6.c   |  37 +++++
 .../gcc.dg/analyzer/taint-CVE-2011-0521.h     | 136 +++++++++++++++++
 .../gcc.dg/analyzer/taint-antipatterns-1.c    | 137 ++++++++++++++++++
 .../taint-read-through-untrusted-ptr-1.c      |  37 +++++
 13 files changed, 967 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-1-fixed.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-2-fixed.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-3-fixed.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-3.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-4.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-5.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-6.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521.h
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-antipatterns-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-read-through-untrusted-ptr-1.c

diff --git a/gcc/analyzer/sm-taint.cc b/gcc/analyzer/sm-taint.cc
index 0a51a1fe2ea..53ba6f2b30c 100644
--- a/gcc/analyzer/sm-taint.cc
+++ b/gcc/analyzer/sm-taint.cc
@@ -85,6 +85,19 @@ public:
 				   const extrinsic_state &ext_state)
     const FINAL OVERRIDE;
 
+  state_machine::state_t
+  get_default_state (const svalue *sval) const FINAL OVERRIDE
+  {
+    /* Default to "tainted" when reading through a pointer to an untrusted
+       region.  */
+    if (const initial_svalue *initial_sval = sval->dyn_cast_initial_svalue ())
+      {
+	if (initial_sval->get_region ()->untrusted_p ())
+	  return m_tainted;
+      }
+    return m_start;
+  }
+
   bool on_stmt (sm_context *sm_ctxt,
 		const supernode *node,
 		const gimple *stmt) const FINAL OVERRIDE;
diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-1-fixed.c b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-1-fixed.c
new file mode 100644
index 00000000000..a97896f2266
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-1-fixed.c
@@ -0,0 +1,113 @@
+/* See notes in this header.  */
+#include "taint-CVE-2011-0521.h"
+
+// TODO: remove need for this option
+/* { dg-additional-options "-fanalyzer-checker=taint" } */
+
+/* Adapted from drivers/media/dvb/ttpci/av7110_ca.c  */
+
+int dvb_ca_ioctl(struct file *file, unsigned int cmd, void *parg)
+{
+	struct dvb_device *dvbdev = file->private_data;
+	struct av7110 *av7110 = dvbdev->priv;
+	unsigned long arg = (unsigned long) parg;
+
+	/* case CA_GET_SLOT_INFO:  */
+	{
+		ca_slot_info_t *info=(ca_slot_info_t *)parg;
+
+		if (info->num < 0 || info->num > 1)
+			return -EINVAL;
+		av7110->ci_slot[info->num].num = info->num; /* { dg-bogus "attacker-controlled value" } */
+		av7110->ci_slot[info->num].type = FW_CI_LL_SUPPORT(av7110->arm_app) ?
+							CA_CI_LINK : CA_CI;
+		memcpy(info, &av7110->ci_slot[info->num], sizeof(ca_slot_info_t));
+	}
+	return 0;
+}
+
+static struct dvb_device dvbdev_ca = {
+	.priv		= NULL,
+	/* [...snip...] */
+	.kernel_ioctl	= dvb_ca_ioctl,
+};
+
+/* Adapted from drivers/media/dvb/dvb-core/dvbdev.c  */
+
+static DEFINE_MUTEX(dvbdev_mutex);
+
+int dvb_usercopy(struct file *file,
+		     unsigned int cmd, unsigned long arg,
+		     int (*func)(struct file *file,
+		     unsigned int cmd, void *arg))
+{
+	char    sbuf[128];
+	void    *mbuf = NULL;
+	void    *parg = NULL;
+	int     err  = -1;
+
+	/*  Copy arguments into temp kernel buffer  */
+	switch (_IOC_DIR(cmd)) {
+	case _IOC_NONE:
+		/*
+		 * For this command, the pointer is actually an integer
+		 * argument.
+		 */
+		parg = (void *) arg;
+		break;
+	case _IOC_READ: /* some v4l ioctls are marked wrong ... */
+	case _IOC_WRITE:
+	case (_IOC_WRITE | _IOC_READ):
+		if (_IOC_SIZE(cmd) <= sizeof(sbuf)) {
+			parg = sbuf;
+		} else {
+			/* too big to allocate from stack */
+			mbuf = kmalloc(_IOC_SIZE(cmd),GFP_KERNEL);
+			if (NULL == mbuf)
+				return -ENOMEM;
+			parg = mbuf;
+		}
+
+		err = -EFAULT;
+		if (copy_from_user(parg, (void __user *)arg, _IOC_SIZE(cmd)))
+			goto out;
+		break;
+	}
+
+	/* call driver */
+	mutex_lock(&dvbdev_mutex);
+	if ((err = func(file, cmd, parg)) == -ENOIOCTLCMD)
+		err = -EINVAL;
+	mutex_unlock(&dvbdev_mutex);
+
+	if (err < 0)
+		goto out;
+
+	/*  Copy results into user buffer  */
+	switch (_IOC_DIR(cmd))
+	{
+	case _IOC_READ:
+	case (_IOC_WRITE | _IOC_READ):
+		if (copy_to_user((void __user *)arg, parg, _IOC_SIZE(cmd)))
+			err = -EFAULT;
+		break;
+	}
+
+out:
+	kfree(mbuf);
+	return err;
+}
+
+long dvb_generic_ioctl(struct file *file,
+		       unsigned int cmd, unsigned long arg)
+{
+	struct dvb_device *dvbdev = file->private_data;
+
+	if (!dvbdev)
+		return -ENODEV;
+
+	if (!dvbdev->kernel_ioctl)
+		return -EINVAL;
+
+	return dvb_usercopy(file, cmd, arg, dvbdev->kernel_ioctl);
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-1.c b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-1.c
new file mode 100644
index 00000000000..1279f40d948
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-1.c
@@ -0,0 +1,113 @@
+/* See notes in this header.  */
+#include "taint-CVE-2011-0521.h"
+
+// TODO: remove need for this option
+/* { dg-additional-options "-fanalyzer-checker=taint" } */
+
+/* Adapted from drivers/media/dvb/ttpci/av7110_ca.c  */
+
+int dvb_ca_ioctl(struct file *file, unsigned int cmd, void *parg)
+{
+	struct dvb_device *dvbdev = file->private_data;
+	struct av7110 *av7110 = dvbdev->priv;
+	unsigned long arg = (unsigned long) parg;
+
+	/* case CA_GET_SLOT_INFO:  */
+	{
+		ca_slot_info_t *info=(ca_slot_info_t *)parg;
+
+		if (info->num > 1)
+			return -EINVAL;
+		av7110->ci_slot[info->num].num = info->num; /* { dg-warning "attacker-controlled value" "" { xfail *-*-* } } */
+		av7110->ci_slot[info->num].type = FW_CI_LL_SUPPORT(av7110->arm_app) ?
+							CA_CI_LINK : CA_CI;
+		memcpy(info, &av7110->ci_slot[info->num], sizeof(ca_slot_info_t));
+	}
+	return 0;
+}
+
+static struct dvb_device dvbdev_ca = {
+	.priv		= NULL,
+	/* [...snip...] */
+	.kernel_ioctl	= dvb_ca_ioctl,
+};
+
+/* Adapted from drivers/media/dvb/dvb-core/dvbdev.c  */
+
+static DEFINE_MUTEX(dvbdev_mutex);
+
+int dvb_usercopy(struct file *file,
+		     unsigned int cmd, unsigned long arg,
+		     int (*func)(struct file *file,
+		     unsigned int cmd, void *arg))
+{
+	char    sbuf[128];
+	void    *mbuf = NULL;
+	void    *parg = NULL;
+	int     err  = -1;
+
+	/*  Copy arguments into temp kernel buffer  */
+	switch (_IOC_DIR(cmd)) {
+	case _IOC_NONE:
+		/*
+		 * For this command, the pointer is actually an integer
+		 * argument.
+		 */
+		parg = (void *) arg;
+		break;
+	case _IOC_READ: /* some v4l ioctls are marked wrong ... */
+	case _IOC_WRITE:
+	case (_IOC_WRITE | _IOC_READ):
+		if (_IOC_SIZE(cmd) <= sizeof(sbuf)) {
+			parg = sbuf;
+		} else {
+			/* too big to allocate from stack */
+			mbuf = kmalloc(_IOC_SIZE(cmd),GFP_KERNEL);
+			if (NULL == mbuf)
+				return -ENOMEM;
+			parg = mbuf;
+		}
+
+		err = -EFAULT;
+		if (copy_from_user(parg, (void __user *)arg, _IOC_SIZE(cmd)))
+			goto out;
+		break;
+	}
+
+	/* call driver */
+	mutex_lock(&dvbdev_mutex);
+	if ((err = func(file, cmd, parg)) == -ENOIOCTLCMD)
+		err = -EINVAL;
+	mutex_unlock(&dvbdev_mutex);
+
+	if (err < 0)
+		goto out;
+
+	/*  Copy results into user buffer  */
+	switch (_IOC_DIR(cmd))
+	{
+	case _IOC_READ:
+	case (_IOC_WRITE | _IOC_READ):
+		if (copy_to_user((void __user *)arg, parg, _IOC_SIZE(cmd)))
+			err = -EFAULT;
+		break;
+	}
+
+out:
+	kfree(mbuf);
+	return err;
+}
+
+long dvb_generic_ioctl(struct file *file,
+		       unsigned int cmd, unsigned long arg)
+{
+	struct dvb_device *dvbdev = file->private_data;
+
+	if (!dvbdev)
+		return -ENODEV;
+
+	if (!dvbdev->kernel_ioctl)
+		return -EINVAL;
+
+	return dvb_usercopy(file, cmd, arg, dvbdev->kernel_ioctl);
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-2-fixed.c b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-2-fixed.c
new file mode 100644
index 00000000000..2b06bde4063
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-2-fixed.c
@@ -0,0 +1,93 @@
+/* See notes in this header.  */
+#include "taint-CVE-2011-0521.h"
+
+// TODO: remove need for this option
+/* { dg-additional-options "-fanalyzer-checker=taint" } */
+
+/* Adapted from drivers/media/dvb/ttpci/av7110_ca.c  */
+
+int dvb_ca_ioctl(struct file *file, unsigned int cmd, void *parg)
+{
+	struct dvb_device *dvbdev = file->private_data;
+	struct av7110 *av7110 = dvbdev->priv;
+	unsigned long arg = (unsigned long) parg;
+
+	/* case CA_GET_SLOT_INFO:  */
+	{
+		ca_slot_info_t *info=(ca_slot_info_t *)parg;
+
+		if (info->num < 0 || info->num > 1)
+			return -EINVAL;
+		av7110->ci_slot[info->num].num = info->num; /* { dg-bogus "attacker-controlled value" } */
+		av7110->ci_slot[info->num].type = FW_CI_LL_SUPPORT(av7110->arm_app) ?
+							CA_CI_LINK : CA_CI;
+		memcpy(info, &av7110->ci_slot[info->num], sizeof(ca_slot_info_t));
+	}
+	return 0;
+}
+
+/* Adapted from drivers/media/dvb/dvb-core/dvbdev.c
+   Somewhat simplified: rather than pass in a callback that can
+   be dvb_ca_ioctl, call dvb_ca_ioctl directly.  */
+
+static DEFINE_MUTEX(dvbdev_mutex);
+
+int dvb_usercopy(struct file *file,
+		 unsigned int cmd, unsigned long arg)
+{
+	char    sbuf[128];
+	void    *mbuf = NULL;
+	void    *parg = NULL;
+	int     err  = -1;
+
+	/*  Copy arguments into temp kernel buffer  */
+	switch (_IOC_DIR(cmd)) {
+	case _IOC_NONE:
+		/*
+		 * For this command, the pointer is actually an integer
+		 * argument.
+		 */
+		parg = (void *) arg;
+		break;
+	case _IOC_READ: /* some v4l ioctls are marked wrong ... */
+	case _IOC_WRITE:
+	case (_IOC_WRITE | _IOC_READ):
+		if (_IOC_SIZE(cmd) <= sizeof(sbuf)) {
+			parg = sbuf;
+		} else {
+			/* too big to allocate from stack */
+			mbuf = kmalloc(_IOC_SIZE(cmd),GFP_KERNEL);
+			if (NULL == mbuf)
+				return -ENOMEM;
+			parg = mbuf;
+		}
+
+		err = -EFAULT;
+		if (copy_from_user(parg, (void __user *)arg, _IOC_SIZE(cmd)))
+			goto out;
+		break;
+	}
+
+	/* call driver */
+	mutex_lock(&dvbdev_mutex);
+	if ((err = dvb_ca_ioctl(file, cmd, parg)) == -ENOIOCTLCMD)
+		err = -EINVAL;
+	mutex_unlock(&dvbdev_mutex);
+
+	if (err < 0)
+		goto out;
+
+	/*  Copy results into user buffer  */
+	switch (_IOC_DIR(cmd))
+	{
+	case _IOC_READ:
+	case (_IOC_WRITE | _IOC_READ):
+		if (copy_to_user((void __user *)arg, parg, _IOC_SIZE(cmd)))
+			err = -EFAULT;
+		break;
+	}
+
+out:
+	kfree(mbuf);
+	return err;
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-2.c b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-2.c
new file mode 100644
index 00000000000..c1bf748ae15
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-2.c
@@ -0,0 +1,93 @@
+/* See notes in this header.  */
+#include "taint-CVE-2011-0521.h"
+
+// TODO: remove need for this option
+/* { dg-additional-options "-fanalyzer-checker=taint" } */
+
+/* Adapted from drivers/media/dvb/ttpci/av7110_ca.c  */
+
+int dvb_ca_ioctl(struct file *file, unsigned int cmd, void *parg)
+{
+	struct dvb_device *dvbdev = file->private_data;
+	struct av7110 *av7110 = dvbdev->priv;
+	unsigned long arg = (unsigned long) parg;
+
+	/* case CA_GET_SLOT_INFO:  */
+	{
+		ca_slot_info_t *info=(ca_slot_info_t *)parg;
+
+		if (info->num > 1)
+			return -EINVAL;
+		av7110->ci_slot[info->num].num = info->num; /* { dg-warning "attacker-controlled value" "" { xfail *-*-* } } */
+		av7110->ci_slot[info->num].type = FW_CI_LL_SUPPORT(av7110->arm_app) ?
+							CA_CI_LINK : CA_CI;
+		memcpy(info, &av7110->ci_slot[info->num], sizeof(ca_slot_info_t));
+	}
+	return 0;
+}
+
+/* Adapted from drivers/media/dvb/dvb-core/dvbdev.c
+   Somewhat simplified: rather than pass in a callback that can
+   be dvb_ca_ioctl, call dvb_ca_ioctl directly.  */
+
+static DEFINE_MUTEX(dvbdev_mutex);
+
+int dvb_usercopy(struct file *file,
+		 unsigned int cmd, unsigned long arg)
+{
+	char    sbuf[128];
+	void    *mbuf = NULL;
+	void    *parg = NULL;
+	int     err  = -1;
+
+	/*  Copy arguments into temp kernel buffer  */
+	switch (_IOC_DIR(cmd)) {
+	case _IOC_NONE:
+		/*
+		 * For this command, the pointer is actually an integer
+		 * argument.
+		 */
+		parg = (void *) arg;
+		break;
+	case _IOC_READ: /* some v4l ioctls are marked wrong ... */
+	case _IOC_WRITE:
+	case (_IOC_WRITE | _IOC_READ):
+		if (_IOC_SIZE(cmd) <= sizeof(sbuf)) {
+			parg = sbuf;
+		} else {
+			/* too big to allocate from stack */
+			mbuf = kmalloc(_IOC_SIZE(cmd),GFP_KERNEL);
+			if (NULL == mbuf)
+				return -ENOMEM;
+			parg = mbuf;
+		}
+
+		err = -EFAULT;
+		if (copy_from_user(parg, (void __user *)arg, _IOC_SIZE(cmd)))
+			goto out;
+		break;
+	}
+
+	/* call driver */
+	mutex_lock(&dvbdev_mutex);
+	if ((err = dvb_ca_ioctl(file, cmd, parg)) == -ENOIOCTLCMD)
+		err = -EINVAL;
+	mutex_unlock(&dvbdev_mutex);
+
+	if (err < 0)
+		goto out;
+
+	/*  Copy results into user buffer  */
+	switch (_IOC_DIR(cmd))
+	{
+	case _IOC_READ:
+	case (_IOC_WRITE | _IOC_READ):
+		if (copy_to_user((void __user *)arg, parg, _IOC_SIZE(cmd)))
+			err = -EFAULT;
+		break;
+	}
+
+out:
+	kfree(mbuf);
+	return err;
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-3-fixed.c b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-3-fixed.c
new file mode 100644
index 00000000000..0147759f4df
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-3-fixed.c
@@ -0,0 +1,56 @@
+/* See notes in this header.  */
+#include "taint-CVE-2011-0521.h"
+
+// TODO: remove need for this option
+/* { dg-additional-options "-fanalyzer-checker=taint" } */
+
+/* Adapted from drivers/media/dvb/ttpci/av7110_ca.c  */
+
+int dvb_ca_ioctl(struct file *file, unsigned int cmd, void *parg)
+{
+	struct dvb_device *dvbdev = file->private_data;
+	struct av7110 *av7110 = dvbdev->priv;
+	unsigned long arg = (unsigned long) parg;
+
+	/* case CA_GET_SLOT_INFO:  */
+	{
+		ca_slot_info_t *info=(ca_slot_info_t *)parg;
+
+		if (info->num < 0 || info->num > 1)
+			return -EINVAL;
+		av7110->ci_slot[info->num].num = info->num; /* { dg-bogus "attacker-controlled value" } */
+		av7110->ci_slot[info->num].type = FW_CI_LL_SUPPORT(av7110->arm_app) ?
+							CA_CI_LINK : CA_CI;
+		memcpy(info, &av7110->ci_slot[info->num], sizeof(ca_slot_info_t));
+	}
+	return 0;
+}
+
+/* Adapted from drivers/media/dvb/dvb-core/dvbdev.c
+   Further simplified from -2; always use an on-stack buffer.  */
+
+static DEFINE_MUTEX(dvbdev_mutex);
+
+int dvb_usercopy(struct file *file,
+		 unsigned int cmd, unsigned long arg)
+{
+	char    sbuf[128];
+	void    *parg = sbuf;
+	int     err = -EFAULT;
+	if (copy_from_user(parg, (void __user *)arg, sizeof(sbuf)))
+	  goto out;
+
+	mutex_lock(&dvbdev_mutex);
+	if ((err = dvb_ca_ioctl(file, cmd, parg)) == -ENOIOCTLCMD)
+		err = -EINVAL;
+	mutex_unlock(&dvbdev_mutex);
+
+	if (err < 0)
+		goto out;
+
+	if (copy_to_user((void __user *)arg, parg, sizeof(sbuf)))
+	  err = -EFAULT;
+
+out:
+	return err;
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-3.c b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-3.c
new file mode 100644
index 00000000000..c53071afbab
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-3.c
@@ -0,0 +1,57 @@
+/* See notes in this header.  */
+#include "taint-CVE-2011-0521.h"
+
+// TODO: remove need for this option
+/* { dg-additional-options "-fanalyzer-checker=taint" } */
+
+/* Adapted from drivers/media/dvb/ttpci/av7110_ca.c  */
+
+int dvb_ca_ioctl(struct file *file, unsigned int cmd, void *parg)
+{
+	struct dvb_device *dvbdev = file->private_data;
+	struct av7110 *av7110 = dvbdev->priv;
+	unsigned long arg = (unsigned long) parg;
+
+	/* case CA_GET_SLOT_INFO:  */
+	{
+		ca_slot_info_t *info=(ca_slot_info_t *)parg;
+
+		if (info->num > 1)
+			return -EINVAL;
+		av7110->ci_slot[info->num].num = info->num; /* { dg-warning "attacker-controlled value" "" { xfail *-*-* } } */
+		// TODO(xfail)
+		av7110->ci_slot[info->num].type = FW_CI_LL_SUPPORT(av7110->arm_app) ?
+							CA_CI_LINK : CA_CI;
+		memcpy(info, &av7110->ci_slot[info->num], sizeof(ca_slot_info_t));
+	}
+	return 0;
+}
+
+/* Adapted from drivers/media/dvb/dvb-core/dvbdev.c
+   Further simplified from -2; always use an on-stack buffer.  */
+
+static DEFINE_MUTEX(dvbdev_mutex);
+
+int dvb_usercopy(struct file *file,
+		 unsigned int cmd, unsigned long arg)
+{
+	char    sbuf[128];
+	void    *parg = sbuf;
+	int     err = -EFAULT;
+	if (copy_from_user(parg, (void __user *)arg, sizeof(sbuf)))
+	  goto out;
+
+	mutex_lock(&dvbdev_mutex);
+	if ((err = dvb_ca_ioctl(file, cmd, parg)) == -ENOIOCTLCMD)
+		err = -EINVAL;
+	mutex_unlock(&dvbdev_mutex);
+
+	if (err < 0)
+		goto out;
+
+	if (copy_to_user((void __user *)arg, parg, sizeof(sbuf)))
+	  err = -EFAULT;
+
+out:
+	return err;
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-4.c b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-4.c
new file mode 100644
index 00000000000..eab95929cd6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-4.c
@@ -0,0 +1,40 @@
+/* See notes in this header.  */
+#include "taint-CVE-2011-0521.h"
+
+// TODO: remove need for this option
+/* { dg-additional-options "-fanalyzer-checker=taint" } */
+
+/* Adapted from dvb_ca_ioctl in drivers/media/dvb/ttpci/av7110_ca.c and
+   dvb_usercopy in drivers/media/dvb/dvb-core/dvbdev.c
+
+   Further simplified from -3; merge into a single function; drop the mutex,
+   remove control flow.  */
+
+int test_1(struct file *file, unsigned int cmd, unsigned long arg)
+{
+	char    sbuf[128];
+	void    *parg = sbuf;
+
+	copy_from_user(parg, (void __user *)arg, sizeof(sbuf));
+
+	{
+		struct dvb_device *dvbdev = file->private_data;
+		struct av7110 *av7110 = dvbdev->priv;
+		unsigned long arg = (unsigned long) parg;
+
+		/* case CA_GET_SLOT_INFO:  */
+		ca_slot_info_t *info=(ca_slot_info_t *)parg;
+
+		if (info->num > 1)
+			return -EINVAL;
+		av7110->ci_slot[info->num].num = info->num; /* { dg-warning "attacker-controlled value" "" { xfail *-*-* } } */
+		// TODO(xfail)
+		av7110->ci_slot[info->num].type = FW_CI_LL_SUPPORT(av7110->arm_app) ?
+							CA_CI_LINK : CA_CI;
+		memcpy(info, &av7110->ci_slot[info->num], sizeof(ca_slot_info_t));
+	}
+
+	copy_to_user((void __user *)arg, parg, sizeof(sbuf));
+
+	return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-5.c b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-5.c
new file mode 100644
index 00000000000..9cf465204cc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-5.c
@@ -0,0 +1,42 @@
+/* See notes in this header.  */
+#include "taint-CVE-2011-0521.h"
+
+// TODO: remove need for this option
+/* { dg-additional-options "-fanalyzer-checker=taint" } */
+
+/* Adapted from dvb_ca_ioctl in drivers/media/dvb/ttpci/av7110_ca.c and
+   dvb_usercopy in drivers/media/dvb/dvb-core/dvbdev.c
+
+   Further simplified from -4; avoid parg and the cast to char[128].  */
+
+int test_1(struct file *file, unsigned int cmd, unsigned long arg)
+{
+	ca_slot_info_t sbuf;
+
+	if (copy_from_user(&sbuf, (void __user *)arg, sizeof(sbuf)) != 0)
+		return -1;
+
+	{
+		struct dvb_device *dvbdev = file->private_data;
+		struct av7110 *av7110 = dvbdev->priv;
+
+		/* case CA_GET_SLOT_INFO:  */
+		ca_slot_info_t *info= &sbuf;
+
+		__analyzer_dump_state ("taint", info->num); /* { dg-warning "tainted" } */
+
+		if (info->num > 1)
+			return -EINVAL;
+
+		__analyzer_dump_state ("taint", info->num); /* { dg-warning "has_ub" } */
+
+		av7110->ci_slot[info->num].num = info->num; /* { dg-warning "use of attacker-controlled value '\\*info\\.num' in array lookup without checking for negative" } */
+		av7110->ci_slot[info->num].type = FW_CI_LL_SUPPORT(av7110->arm_app) ?
+							CA_CI_LINK : CA_CI;
+		memcpy(info, &av7110->ci_slot[info->num], sizeof(ca_slot_info_t));
+	}
+
+	copy_to_user((void __user *)arg, &sbuf, sizeof(sbuf));
+
+	return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-6.c b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-6.c
new file mode 100644
index 00000000000..35a16af2316
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-6.c
@@ -0,0 +1,37 @@
+/* See notes in this header.  */
+#include "taint-CVE-2011-0521.h"
+
+// TODO: remove need for this option
+/* { dg-additional-options "-fanalyzer-checker=taint" } */
+
+/* Adapted from dvb_ca_ioctl in drivers/media/dvb/ttpci/av7110_ca.c and
+   dvb_usercopy in drivers/media/dvb/dvb-core/dvbdev.c
+
+   Further simplified from -5; remove all control flow.  */
+
+int test_1(struct file *file, unsigned int cmd, unsigned long arg)
+{
+	ca_slot_info_t sbuf;
+
+	if (copy_from_user(&sbuf, (void __user *)arg, sizeof(sbuf)) != 0)
+		return -1;
+
+	{
+		struct dvb_device *dvbdev = file->private_data;
+		struct av7110 *av7110 = dvbdev->priv;
+
+		/* case CA_GET_SLOT_INFO:  */
+		ca_slot_info_t *info= &sbuf;
+
+		__analyzer_dump_state ("taint", info->num); /* { dg-warning "tainted" } */
+
+		av7110->ci_slot[info->num].num = info->num; /* { dg-warning "use of attacker-controlled value '\\*info\\.num' in array lookup without bounds checking" } */
+		av7110->ci_slot[info->num].type = FW_CI_LL_SUPPORT(av7110->arm_app) ?
+							CA_CI_LINK : CA_CI;
+		memcpy(info, &av7110->ci_slot[info->num], sizeof(ca_slot_info_t));
+	}
+
+	copy_to_user((void __user *)arg, &sbuf, sizeof(sbuf));
+
+	return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521.h b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521.h
new file mode 100644
index 00000000000..0d79f9f9e08
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521.h
@@ -0,0 +1,136 @@
+/* Shared header for the various taint-CVE-2011-0521-*.c tests.
+   These are a series of successively simpler reductions of the reproducer.
+   Ideally the analyzer would detect the issue in all of the testcases,
+   but currently requires some simplification of the code to do so.
+
+   "The dvb_ca_ioctl function in drivers/media/dvb/ttpci/av7110_ca.c in the
+   Linux kernel before 2.6.38-rc2 does not check the sign of a certain integer
+   field, which allows local users to cause a denial of service (memory
+   corruption) or possibly have unspecified other impact via a negative value."
+
+   Adapted from Linux 2.6.38, which is under the GPLv2.
+
+   Fixed in e.g. cb26a24ee9706473f31d34cc259f4dcf45cd0644 on linux-2.6.38.y  */
+
+#include <string.h>
+#include "test-uaccess.h"
+#include "analyzer-decls.h"
+
+typedef unsigned int u32;
+
+/* Adapted from include/linux/compiler.h  */
+
+#define __force
+
+/* Adapted from include/asm-generic/errno-base.h  */
+
+#define	ENOMEM		12	/* Out of memory */
+#define	EFAULT		14	/* Bad address */
+#define	ENODEV		19	/* No such device */
+#define	EINVAL		22	/* Invalid argument */
+
+/* Adapted from include/linux/errno.h  */
+
+#define ENOIOCTLCMD	515	/* No ioctl command */
+
+/* Adapted from include/linux/fs.h  */
+
+struct file {
+	/* [...snip...] */
+	void			*private_data;
+	/* [...snip...] */
+};
+
+/* Adapted from drivers/media/dvb/dvb-core/dvbdev.h  */
+
+struct dvb_device {
+	/* [...snip...] */
+	int (*kernel_ioctl)(struct file *file, unsigned int cmd, void *arg);
+
+	void *priv;
+};
+
+
+/* Adapted from include/linux/dvb/ca.h  */
+
+typedef struct ca_slot_info {
+	int num;               /* slot number */
+
+	int type;              /* CA interface this slot supports */
+#define CA_CI            1     /* CI high level interface */
+#define CA_CI_LINK       2     /* CI link layer level interface */
+	/* [...snip...] */
+} ca_slot_info_t;
+
+
+/* Adapted from drivers/media/dvb/ttpci/av7110.h  */
+
+struct av7110 {
+	/* [...snip...] */
+	ca_slot_info_t		ci_slot[2];
+	/* [...snip...] */
+	u32		    arm_app;
+	/* [...snip...] */
+};
+
+/* Adapted from drivers/media/dvb/ttpci/av7110_hw.h  */
+
+#define FW_CI_LL_SUPPORT(arm_app) ((arm_app) & 0x80000000)
+
+/* Adapted from include/asm-generic/ioctl.h  */
+
+#define _IOC_NRBITS	8
+#define _IOC_TYPEBITS	8
+
+#define _IOC_SIZEBITS	14
+#define _IOC_DIRBITS	2
+
+#define _IOC_SIZEMASK	((1 << _IOC_SIZEBITS)-1)
+#define _IOC_DIRMASK	((1 << _IOC_DIRBITS)-1)
+#define _IOC_NRSHIFT	0
+#define _IOC_TYPESHIFT	(_IOC_NRSHIFT+_IOC_NRBITS)
+#define _IOC_SIZESHIFT	(_IOC_TYPESHIFT+_IOC_TYPEBITS)
+#define _IOC_DIRSHIFT	(_IOC_SIZESHIFT+_IOC_SIZEBITS)
+
+#define _IOC_NONE	0U
+#define _IOC_WRITE	1U
+#define _IOC_READ	2U
+
+#define _IOC_DIR(nr)		(((nr) >> _IOC_DIRSHIFT) & _IOC_DIRMASK)
+#define _IOC_SIZE(nr)		(((nr) >> _IOC_SIZESHIFT) & _IOC_SIZEMASK)
+
+/* Adapted from include/linux/mutex.h  */
+
+struct mutex {
+	/* [...snip...] */
+};
+
+#define __MUTEX_INITIALIZER(lockname) \
+		{ /* [...snip...] */ }
+
+#define DEFINE_MUTEX(mutexname) \
+	struct mutex mutexname = __MUTEX_INITIALIZER(mutexname)
+
+extern void mutex_lock(struct mutex *lock);
+extern void mutex_unlock(struct mutex *lock);
+
+/* Adapted from include/linux/types.h  */
+
+#define __bitwise__
+typedef unsigned __bitwise__ gfp_t;
+
+/* Adapted from include/linux/gfp.h  */
+
+#define ___GFP_WAIT		0x10u
+#define ___GFP_IO		0x40u
+#define ___GFP_FS		0x80u
+#define __GFP_WAIT	((__force gfp_t)___GFP_WAIT)
+#define __GFP_IO	((__force gfp_t)___GFP_IO)
+#define __GFP_FS	((__force gfp_t)___GFP_FS)
+#define GFP_KERNEL  (__GFP_WAIT | __GFP_IO | __GFP_FS)
+
+/* Adapted from include/linux/slab.h  */
+
+void kfree(const void *);
+void *kmalloc(size_t size, gfp_t flags)
+  __attribute__((malloc (kfree)));
diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-antipatterns-1.c b/gcc/testsuite/gcc.dg/analyzer/taint-antipatterns-1.c
new file mode 100644
index 00000000000..5e81410a847
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/taint-antipatterns-1.c
@@ -0,0 +1,137 @@
+// TODO: remove need for this:
+/* { dg-additional-options "-fanalyzer-checker=taint" } */
+
+#include "test-uaccess.h"
+
+/* Adapted and simplified decls from linux kernel headers.  */
+
+typedef unsigned char u8;
+typedef unsigned __INT16_TYPE__ u16;
+typedef unsigned __INT32_TYPE__ u32;
+typedef signed __INT32_TYPE__ s32;
+typedef __SIZE_TYPE__ size_t;
+
+#define   EFAULT          14
+
+typedef unsigned int gfp_t;
+#define GFP_KERNEL 0
+
+void kfree(const void *);
+void *kmalloc(size_t size, gfp_t flags)
+  __attribute__((malloc (kfree)));
+
+/* Adapted from antipatterns.ko:taint.c (GPL-v2.0).   */
+
+struct cmd_1
+{
+  u32 idx;
+  u32 val;
+};
+
+static u32 arr[16];
+
+int taint_array_access(void __user *src)
+{
+  struct cmd_1 cmd;
+  if (copy_from_user(&cmd, src, sizeof(cmd)))
+    return -EFAULT;
+  /*
+   * cmd.idx is an unsanitized value from user-space, hence
+   * this is an arbitrary kernel memory access.
+   */
+  arr[cmd.idx] = cmd.val; /* { dg-warning "use of attacker-controlled value 'cmd.idx' in array lookup without upper-bounds checking" } */
+  return 0;
+}
+
+struct cmd_2
+{
+  s32 idx;
+  u32 val;
+};
+
+int taint_signed_array_access(void __user *src)
+{
+  struct cmd_2 cmd;
+  if (copy_from_user(&cmd, src, sizeof(cmd)))
+    return -EFAULT;
+  if (cmd.idx >= 16)
+    return -EFAULT;
+
+  /*
+   * cmd.idx hasn't been checked for being negative, hence
+   * this is an arbitrary kernel memory access.
+   */
+  arr[cmd.idx] = cmd.val; /* { dg-warning "use of attacker-controlled value 'cmd.idx' in array lookup without checking for negative" } */
+  return 0;
+}
+
+struct cmd_s32_binop
+{
+  s32 a;
+  s32 b;
+  s32 result;
+};
+
+int taint_divide_by_zero_direct(void __user *uptr)
+{
+  struct cmd_s32_binop cmd;
+  if (copy_from_user(&cmd, uptr, sizeof(cmd)))
+    return -EFAULT;
+
+  /* cmd.b is attacker-controlled and could be zero */
+  cmd.result = cmd.a / cmd.b; /* { dg-warning "use of attacker-controlled value 'cmd.b' as divisor without checking for zero" } */
+
+  if (copy_to_user (uptr, &cmd, sizeof(cmd)))
+    return -EFAULT;
+  return 0;
+}
+
+int taint_divide_by_zero_compound(void __user *uptr)
+{
+  struct cmd_s32_binop cmd;
+  if (copy_from_user(&cmd, uptr, sizeof(cmd)))
+    return -EFAULT;
+
+  /*
+   * cmd.b is attacker-controlled and could be -1, hence
+   * the divisor could be zero
+   */
+  cmd.result = cmd.a / (cmd.b + 1); /* { dg-warning "use of attacker-controlled value 'cmd.b \\+ 1' as divisor without checking for zero" } */
+
+  if (copy_to_user (uptr, &cmd, sizeof(cmd)))
+    return -EFAULT;
+  return 0;
+}
+
+int taint_mod_by_zero_direct(void __user *uptr)
+{
+  struct cmd_s32_binop cmd;
+  if (copy_from_user(&cmd, uptr, sizeof(cmd)))
+    return -EFAULT;
+
+  /* cmd.b is attacker-controlled and could be zero */
+  cmd.result = cmd.a % cmd.b; /* { dg-warning "use of attacker-controlled value 'cmd.b' as divisor without checking for zero" } */
+
+  if (copy_to_user (uptr, &cmd, sizeof(cmd)))
+    return -EFAULT;
+  return 0;
+}
+
+int taint_mod_by_zero_compound(void __user *uptr)
+{
+  struct cmd_s32_binop cmd;
+  if (copy_from_user(&cmd, uptr, sizeof(cmd)))
+    return -EFAULT;
+
+  /*
+   * cmd.b is attacker-controlled and could be -1, hence
+   * the divisor could be zero
+   */
+  cmd.result = cmd.a % (cmd.b + 1); /* { dg-warning "use of attacker-controlled value 'cmd.b \\+ 1' as divisor without checking for zero" } */
+
+  if (copy_to_user (uptr, &cmd, sizeof(cmd)))
+    return -EFAULT;
+  return 0;
+}
+
+/* TODO: etc.  */
diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-read-through-untrusted-ptr-1.c b/gcc/testsuite/gcc.dg/analyzer/taint-read-through-untrusted-ptr-1.c
new file mode 100644
index 00000000000..cd2911683e6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/taint-read-through-untrusted-ptr-1.c
@@ -0,0 +1,37 @@
+// TODO: remove need for this:
+/* { dg-additional-options "-fanalyzer-checker=taint" } */
+
+#include "test-uaccess.h"
+
+typedef unsigned __INT32_TYPE__ u32;
+
+struct cmd_1
+{
+  u32 idx;
+  u32 val;
+};
+
+u32 arr[16];
+
+int taint_array_access_1 (struct cmd_1 __user *src)
+{
+  /*
+   * src->idx is an unsanitized value from user-space, hence
+   * this is an arbitrary kernel memory access.
+   */
+  arr[src->idx] = src->val; /* { dg-warning "use of attacker-controlled value '\\*src.idx' in array lookup without upper-bounds checking" } */
+  return 0;
+}
+
+int taint_array_access_2 (struct cmd_1 __user *src)
+{
+  struct cmd_1 cmd;
+  cmd = *src;
+
+  /*
+   * cmd.idx is an unsanitized value from user-space, hence
+   * this is an arbitrary kernel memory access.
+   */
+  arr[cmd.idx] = cmd.val; /* { dg-warning "use of attacker-controlled value '\\*src.idx' in array lookup without upper-bounds checking" } */
+  return 0;
+}
-- 
2.26.3



* [PATCH 6/6] Add __attribute__ ((tainted))
  2021-11-13 20:37 [PATCH 0/6] RFC: adding support to GCC for detecting trust boundaries David Malcolm
                   ` (5 preceding siblings ...)
  2021-11-13 20:37 ` [PATCH 5/6] analyzer: use region::untrusted_p in taint detection David Malcolm
@ 2021-11-13 20:37 ` David Malcolm
  2022-01-06 14:08   ` PING (C/C++): " David Malcolm
  2021-11-13 23:20 ` [PATCH 0/6] RFC: adding support to GCC for detecting trust boundaries Peter Zijlstra
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 39+ messages in thread
From: David Malcolm @ 2021-11-13 20:37 UTC (permalink / raw)
  To: gcc-patches, linux-toolchains; +Cc: David Malcolm

This patch adds a new __attribute__ ((tainted)) to the C/C++ frontends.

It can be used on function decls: the analyzer will treat as tainted
all parameters to the function and all buffers pointed to by those
parameters.  Adding this in one place to the Linux kernel's
__SYSCALL_DEFINEx macro allows the analyzer to treat all syscalls as
having tainted inputs.  This gives additional test coverage beyond
e.g. the __user pointers added by earlier patches.  An example can be
seen in CVE-2011-2210, where given:

 SYSCALL_DEFINE5(osf_getsysinfo, unsigned long, op, void __user *, buffer,
                 unsigned long, nbytes, int __user *, start, void __user *, arg)

the analyzer will treat the nbytes param as under attacker control, and
can complain accordingly:

taint-CVE-2011-2210-1.c: In function ‘sys_osf_getsysinfo’:
taint-CVE-2011-2210-1.c:69:21: warning: use of attacker-controlled value
  ‘nbytes’ as size without upper-bounds checking [CWE-129] [-Wanalyzer-tainted-size]
   69 |                 if (copy_to_user(buffer, hwrpb, nbytes) != 0)
      |                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
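
To make the function-decl case concrete, here is a minimal sketch (not
part of the patch).  The macro HYPOTHETICAL_TAINTED, the buffer "info"
and both copy_info_* functions are invented for illustration; the
attribute spelling "tainted" is the one proposed in this patch, and the
macro only applies it when the compiler reports support for it.  The
sketch does not claim which diagnostics a given build will emit.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper macro: expands to the attribute proposed in this
   patch when the compiler knows it, and to nothing otherwise.  */
#if defined(__has_attribute)
# if __has_attribute(tainted)
#  define HYPOTHETICAL_TAINTED __attribute__((tainted))
# endif
#endif
#ifndef HYPOTHETICAL_TAINTED
# define HYPOTHETICAL_TAINTED
#endif

/* Build sketch, using the flags named in this patch series:
     gcc -fanalyzer -fanalyzer-checker=taint -c demo.c  */

#define INFO_SIZE 16
static const char info[INFO_SIZE] = "0123456789abcde";

/* With the attribute, "buffer", "*buffer" and "nbytes" are all treated
   as attacker-controlled on entry.  Here nbytes is used as a size with
   no upper bound, which is the pattern the taint checker targets.  */
HYPOTHETICAL_TAINTED size_t
copy_info_unchecked (char *buffer, size_t nbytes)
{
  memcpy (buffer, info, nbytes);
  return nbytes;
}

/* Same entry point, but nbytes is clamped before being used as a size,
   so the copy can never read past the end of "info".  */
HYPOTHETICAL_TAINTED size_t
copy_info_checked (char *buffer, size_t nbytes)
{
  if (nbytes > sizeof (info))
    nbytes = sizeof (info);
  memcpy (buffer, info, nbytes);
  return nbytes;
}
```

The clamp in copy_info_checked is only one possible sanitization;
rejecting oversized requests with an error would serve equally well.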

Additionally, the patch allows the attribute to be used on field decls:
specifically function pointers.  Any function used as an initializer
for such a field gets treated as tainted.  An example can be seen in
CVE-2020-13143, where adding __attribute__((tainted)) to the "store"
callback of configfs_attribute:

  struct configfs_attribute {
     /* [...snip...] */
     ssize_t (*store)(struct config_item *, const char *, size_t)
       __attribute__((tainted));
     /* [...snip...] */
  };

allows the analyzer to see:

 CONFIGFS_ATTR(gadget_dev_desc_, UDC);

and treat gadget_dev_desc_UDC_store as tainted, so that it complains:

taint-CVE-2020-13143-1.c: In function ‘gadget_dev_desc_UDC_store’:
taint-CVE-2020-13143-1.c:33:17: warning: use of attacker-controlled value
  ‘len + 18446744073709551615’ as offset without upper-bounds checking [CWE-823] [-Wanalyzer-tainted-offset]
   33 |         if (name[len - 1] == '\n')
      |             ~~~~^~~~~~~~~

Similarly, the attribute could be used on the ioctl callback field,
USB device callbacks, network-handling callbacks etc.  This potentially
gives a lot of test coverage with relatively little code annotation, and
without necessarily needing link-time analysis (which -fanalyzer can
only do at present on trivial examples).

I believe this is the first time we've had an attribute on a field.
If that's an issue, I could prepare a version of the patch that
only allows it on functions themselves.

As before, this currently still needs -fanalyzer-checker=taint (in
addition to -fanalyzer).

gcc/analyzer/ChangeLog:
	* engine.cc: Include "stringpool.h", "attribs.h", and
	"tree-dfa.h".
	(mark_params_as_tainted): New.
	(class tainted_function_custom_event): New.
	(class tainted_function_info): New.
	(exploded_graph::add_function_entry): Handle functions with
	"tainted" attribute.
	(class tainted_field_custom_event): New.
	(class tainted_callback_custom_event): New.
	(class tainted_call_info): New.
	(add_tainted_callback): New.
	(add_any_callbacks): New.
	(exploded_graph::build_initial_worklist): Find callbacks that are
	reachable from global initializers, calling add_any_callbacks on
	them.

gcc/c-family/ChangeLog:
	* c-attribs.c (c_common_attribute_table): Add "tainted".
	(handle_tainted_attribute): New.

gcc/ChangeLog:
	* doc/extend.texi (Function Attributes): Note that "tainted" can
	be used on field decls.
	(Common Function Attributes): Add entry on "tainted" attribute.

gcc/testsuite/ChangeLog:
	* gcc.dg/analyzer/attr-tainted-1.c: New test.
	* gcc.dg/analyzer/attr-tainted-misuses.c: New test.
	* gcc.dg/analyzer/taint-CVE-2011-2210-1.c: New test.
	* gcc.dg/analyzer/taint-CVE-2020-13143-1.c: New test.
	* gcc.dg/analyzer/taint-CVE-2020-13143-2.c: New test.
	* gcc.dg/analyzer/taint-CVE-2020-13143.h: New test.
	* gcc.dg/analyzer/taint-alloc-3.c: New test.
	* gcc.dg/analyzer/taint-alloc-4.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
---
 gcc/analyzer/engine.cc                        | 317 +++++++++++++++++-
 gcc/c-family/c-attribs.c                      |  36 ++
 gcc/doc/extend.texi                           |  22 +-
 .../gcc.dg/analyzer/attr-tainted-1.c          |  88 +++++
 .../gcc.dg/analyzer/attr-tainted-misuses.c    |   6 +
 .../gcc.dg/analyzer/taint-CVE-2011-2210-1.c   |  93 +++++
 .../gcc.dg/analyzer/taint-CVE-2020-13143-1.c  |  38 +++
 .../gcc.dg/analyzer/taint-CVE-2020-13143-2.c  |  32 ++
 .../gcc.dg/analyzer/taint-CVE-2020-13143.h    |  91 +++++
 gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c |  21 ++
 gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c |  31 ++
 11 files changed, 772 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/attr-tainted-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/attr-tainted-misuses.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-2210-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143.h
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c

diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
index 096e219392d..5fab41daf93 100644
--- a/gcc/analyzer/engine.cc
+++ b/gcc/analyzer/engine.cc
@@ -68,6 +68,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "plugin.h"
 #include "target.h"
 #include <memory>
+#include "stringpool.h"
+#include "attribs.h"
+#include "tree-dfa.h"
 
 /* For an overview, see gcc/doc/analyzer.texi.  */
 
@@ -2276,6 +2279,116 @@ exploded_graph::~exploded_graph ()
     delete (*iter).second;
 }
 
+/* Subroutine for use when implementing __attribute__((tainted))
+   on functions and on function pointer fields in structs.
+
+   Called on STATE representing a call to FNDECL.
+   Mark all params of FNDECL in STATE as "tainted".  Mark the value of all
+   regions pointed to by params of FNDECL as "tainted".
+
+   Return true if successful; return false if the "taint" state machine
+   was not found.  */
+
+static bool
+mark_params_as_tainted (program_state *state, tree fndecl,
+			const extrinsic_state &ext_state)
+{
+  unsigned taint_sm_idx;
+  if (!ext_state.get_sm_idx_by_name ("taint", &taint_sm_idx))
+    return false;
+  sm_state_map *smap = state->m_checker_states[taint_sm_idx];
+
+  const state_machine &sm = ext_state.get_sm (taint_sm_idx);
+  state_machine::state_t tainted = sm.get_state_by_name ("tainted");
+
+  region_model_manager *mgr = ext_state.get_model_manager ();
+
+  function *fun = DECL_STRUCT_FUNCTION (fndecl);
+  gcc_assert (fun);
+
+  for (tree iter_parm = DECL_ARGUMENTS (fndecl); iter_parm;
+       iter_parm = DECL_CHAIN (iter_parm))
+    {
+      tree param = iter_parm;
+      if (tree parm_default_ssa = ssa_default_def (fun, iter_parm))
+	param = parm_default_ssa;
+      const region *param_reg = state->m_region_model->get_lvalue (param, NULL);
+      const svalue *init_sval = mgr->get_or_create_initial_value (param_reg);
+      smap->set_state (state->m_region_model, init_sval,
+		       tainted, NULL /*origin_new_sval*/, ext_state);
+      if (POINTER_TYPE_P (TREE_TYPE (param)))
+	{
+	  const region *pointee_reg = mgr->get_symbolic_region (init_sval);
+	  /* Mark "*param" as tainted.  */
+	  const svalue *init_pointee_sval
+	    = mgr->get_or_create_initial_value (pointee_reg);
+	  smap->set_state (state->m_region_model, init_pointee_sval,
+			   tainted, NULL /*origin_new_sval*/, ext_state);
+	}
+    }
+
+  return true;
+}
+
+/* Custom event for use by tainted_function_info when a function
+   has been marked with __attribute__((tainted)).  */
+
+class tainted_function_custom_event : public custom_event
+{
+public:
+  tainted_function_custom_event (location_t loc, tree fndecl, int depth)
+  : custom_event (loc, fndecl, depth),
+    m_fndecl (fndecl)
+  {
+  }
+
+  label_text get_desc (bool can_colorize) const FINAL OVERRIDE
+  {
+    return make_label_text
+      (can_colorize,
+       "function %qE marked with %<__attribute__((tainted))%>",
+       m_fndecl);
+  }
+
+private:
+  tree m_fndecl;
+};
+
+/* Custom exploded_edge info for top-level calls to a function
+   marked with __attribute__((tainted)).  */
+
+class tainted_function_info : public custom_edge_info
+{
+public:
+  tainted_function_info (tree fndecl)
+  : m_fndecl (fndecl)
+  {}
+
+  void print (pretty_printer *pp) const FINAL OVERRIDE
+  {
+    pp_string (pp, "call to tainted function");
+  }
+
+  bool update_model (region_model *,
+		     const exploded_edge *,
+		     region_model_context *) const FINAL OVERRIDE
+  {
+    /* No-op.  */
+    return true;
+  }
+
+  void add_events_to_path (checker_path *emission_path,
+			   const exploded_edge &) const FINAL OVERRIDE
+  {
+    emission_path->add_event
+      (new tainted_function_custom_event
+       (DECL_SOURCE_LOCATION (m_fndecl), m_fndecl, 0));
+  }
+
+private:
+  tree m_fndecl;
+};
+
 /* Ensure that there is an exploded_node representing an external call to
    FUN, adding it to the worklist if creating it.
 
@@ -2302,14 +2415,25 @@ exploded_graph::add_function_entry (function *fun)
   program_state state (m_ext_state);
   state.push_frame (m_ext_state, fun);
 
+  custom_edge_info *edge_info = NULL;
+
+  if (lookup_attribute ("tainted", DECL_ATTRIBUTES (fun->decl)))
+    {
+      if (mark_params_as_tainted (&state, fun->decl, m_ext_state))
+	edge_info = new tainted_function_info (fun->decl);
+    }
+
   if (!state.m_valid)
     return NULL;
 
   exploded_node *enode = get_or_create_node (point, state, NULL);
   if (!enode)
-    return NULL;
+    {
+      delete edge_info;
+      return NULL;
+    }
 
-  add_edge (m_origin, enode, NULL);
+  add_edge (m_origin, enode, NULL, edge_info);
 
   m_functions_with_enodes.add (fun);
 
@@ -2623,6 +2747,184 @@ toplevel_function_p (function *fun, logger *logger)
   return true;
 }
 
+/* Custom event for use by tainted_call_info when a callback field has been
+   marked with __attribute__((tainted)), for labelling the field.  */
+
+class tainted_field_custom_event : public custom_event
+{
+public:
+  tainted_field_custom_event (tree field)
+  : custom_event (DECL_SOURCE_LOCATION (field), NULL_TREE, 0),
+    m_field (field)
+  {
+  }
+
+  label_text get_desc (bool can_colorize) const FINAL OVERRIDE
+  {
+    return make_label_text (can_colorize,
+			    "field %qE of %qT"
+			    " is marked with %<__attribute__((tainted))%>",
+			    m_field, DECL_CONTEXT (m_field));
+  }
+
+private:
+  tree m_field;
+};
+
+/* Custom event for use by tainted_call_info when a callback field has been
+   marked with __attribute__((tainted)), for labelling the function used
+   in that callback.  */
+
+class tainted_callback_custom_event : public custom_event
+{
+public:
+  tainted_callback_custom_event (location_t loc, tree fndecl, int depth,
+				 tree field)
+  : custom_event (loc, fndecl, depth),
+    m_field (field)
+  {
+  }
+
+  label_text get_desc (bool can_colorize) const FINAL OVERRIDE
+  {
+    return make_label_text (can_colorize,
+			    "function %qE used as initializer for field %qE"
+			    " marked with %<__attribute__((tainted))%>",
+			    m_fndecl, m_field);
+  }
+
+private:
+  tree m_field;
+};
+
+/* Custom edge info for use when adding a function used by a callback field
+   marked with '__attribute__((tainted))'.   */
+
+class tainted_call_info : public custom_edge_info
+{
+public:
+  tainted_call_info (tree field, tree fndecl, location_t loc)
+  : m_field (field), m_fndecl (fndecl), m_loc (loc)
+  {}
+
+  void print (pretty_printer *pp) const FINAL OVERRIDE
+  {
+    pp_string (pp, "call to tainted field");
+  }
+
+  bool update_model (region_model *,
+		     const exploded_edge *,
+		     region_model_context *) const FINAL OVERRIDE
+  {
+    /* No-op.  */
+    return true;
+  }
+
+  void add_events_to_path (checker_path *emission_path,
+			   const exploded_edge &) const FINAL OVERRIDE
+  {
+    /* Show the field in the struct declaration
+       e.g. "(1) field 'store' is marked with '__attribute__((tainted))'"  */
+    emission_path->add_event
+      (new tainted_field_custom_event (m_field));
+
+    /* Show the callback in the initializer
+       e.g.
+       "(2) function 'gadget_dev_desc_UDC_store' used as initializer
+       for field 'store' marked with '__attribute__((tainted))'".  */
+    emission_path->add_event
+      (new tainted_callback_custom_event (m_loc, m_fndecl, 0, m_field));
+  }
+
+private:
+  tree m_field;
+  tree m_fndecl;
+  location_t m_loc;
+};
+
+/* Given an initializer at LOC for FIELD marked with '__attribute__((tainted))'
+   initialized with FNDECL, add an entrypoint to FNDECL to EG (and to its
+   worklist) where the params to FNDECL are marked as tainted.  */
+
+static void
+add_tainted_callback (exploded_graph *eg, tree field, tree fndecl,
+		      location_t loc)
+{
+  logger *logger = eg->get_logger ();
+
+  LOG_SCOPE (logger);
+
+  if (!gimple_has_body_p (fndecl))
+    return;
+
+  const extrinsic_state &ext_state = eg->get_ext_state ();
+
+  function *fun = DECL_STRUCT_FUNCTION (fndecl);
+  gcc_assert (fun);
+
+  program_point point
+    = program_point::from_function_entry (eg->get_supergraph (), fun);
+  program_state state (ext_state);
+  state.push_frame (ext_state, fun);
+
+  if (!mark_params_as_tainted (&state, fndecl, ext_state))
+    return;
+
+  if (!state.m_valid)
+    return;
+
+  exploded_node *enode = eg->get_or_create_node (point, state, NULL);
+  if (!enode)
+    {
+      /* Bail out whether or not logging is enabled: add_edge must not
+	 be passed a NULL destination node.  */
+      if (logger)
+	logger->log ("did not create enode for tainted %qE entrypoint",
+		     fndecl);
+      return;
+    }
+  if (logger)
+    logger->log ("created EN %i for tainted %qE entrypoint",
+		 enode->m_index, fndecl);
+
+  tainted_call_info *info = new tainted_call_info (field, fndecl, loc);
+  eg->add_edge (eg->get_origin (), enode, NULL, info);
+}
+
+/* Callback for walk_tree for finding callbacks within initializers;
+   ensure that any callback initializer where the corresponding field is
+   marked with '__attribute__((tainted))' is treated as an entrypoint to the
+   analysis, special-casing that the inputs to the callback are
+   untrustworthy.  */
+
+static tree
+add_any_callbacks (tree *tp, int *, void *data)
+{
+  exploded_graph *eg = (exploded_graph *)data;
+  if (TREE_CODE (*tp) == CONSTRUCTOR)
+    {
+      /* Find fields with the "tainted" attribute.
+	 walk_tree only walks the values, not the index values;
+	 look at the index values.  */
+      unsigned HOST_WIDE_INT idx;
+      constructor_elt *ce;
+
+      for (idx = 0; vec_safe_iterate (CONSTRUCTOR_ELTS (*tp), idx, &ce);
+	   idx++)
+	if (ce->index && TREE_CODE (ce->index) == FIELD_DECL)
+	  if (lookup_attribute ("tainted", DECL_ATTRIBUTES (ce->index)))
+	    {
+	      tree value = ce->value;
+	      if (TREE_CODE (value) == ADDR_EXPR
+		  && TREE_CODE (TREE_OPERAND (value, 0)) == FUNCTION_DECL)
+		add_tainted_callback (eg, ce->index, TREE_OPERAND (value, 0),
+				      EXPR_LOCATION (value));
+	    }
+    }
+
+  return NULL_TREE;
+}
+
 /* Add initial nodes to EG, with entrypoints for externally-callable
    functions.  */
 
@@ -2648,6 +2950,17 @@ exploded_graph::build_initial_worklist ()
 	  logger->log ("did not create enode for %qE entrypoint", fun->decl);
       }
   }
+
+  /* Find callbacks that are reachable from global initializers.  */
+  varpool_node *vpnode;
+  FOR_EACH_VARIABLE (vpnode)
+    {
+      tree decl = vpnode->decl;
+      tree init = DECL_INITIAL (decl);
+      if (!init)
+	continue;
+      walk_tree (&init, add_any_callbacks, this, NULL);
+    }
 }
 
 /* The main loop of the analysis.
diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index 9e03156de5e..835ba6e0e8c 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -117,6 +117,7 @@ static tree handle_no_profile_instrument_function_attribute (tree *, tree,
 							     tree, int, bool *);
 static tree handle_malloc_attribute (tree *, tree, tree, int, bool *);
 static tree handle_dealloc_attribute (tree *, tree, tree, int, bool *);
+static tree handle_tainted_attribute (tree *, tree, tree, int, bool *);
 static tree handle_returns_twice_attribute (tree *, tree, tree, int, bool *);
 static tree handle_no_limit_stack_attribute (tree *, tree, tree, int,
 					     bool *);
@@ -569,6 +570,8 @@ const struct attribute_spec c_common_attribute_table[] =
 			      handle_objc_nullability_attribute, NULL },
   { "*dealloc",                1, 2, true, false, false, false,
 			      handle_dealloc_attribute, NULL },
+  { "tainted",		      0, 0, true,  false, false, false,
+			      handle_tainted_attribute, NULL },
   { NULL,                     0, 0, false, false, false, false, NULL, NULL }
 };
 
@@ -5857,6 +5860,39 @@ handle_objc_nullability_attribute (tree *node, tree name, tree args,
   return NULL_TREE;
 }
 
+/* Handle a "tainted" attribute; arguments as in
+   struct attribute_spec.handler.  */
+
+static tree
+handle_tainted_attribute (tree *node, tree name, tree, int,
+			  bool *no_add_attrs)
+{
+  if (TREE_CODE (*node) != FUNCTION_DECL
+      && TREE_CODE (*node) != FIELD_DECL)
+    {
+      warning (OPT_Wattributes, "%qE attribute ignored; valid only "
+	       "for functions and function pointer fields",
+	       name);
+      *no_add_attrs = true;
+      return NULL_TREE;
+    }
+
+  if (TREE_CODE (*node) == FIELD_DECL
+      && !(TREE_CODE (TREE_TYPE (*node)) == POINTER_TYPE
+	   && TREE_CODE (TREE_TYPE (TREE_TYPE (*node))) == FUNCTION_TYPE))
+    {
+      warning (OPT_Wattributes, "%qE attribute ignored;"
+	       " field must be a function pointer",
+	       name);
+      *no_add_attrs = true;
+      return NULL_TREE;
+    }
+
+  *no_add_attrs = false; /* OK */
+
+  return NULL_TREE;
+}
+
 /* Attempt to partially validate a single attribute ATTR as if
    it were to be applied to an entity OPER.  */
 
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 5a6ef464779..826bbd48e7e 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -2465,7 +2465,8 @@ variable declarations (@pxref{Variable Attributes}),
 labels (@pxref{Label Attributes}),
 enumerators (@pxref{Enumerator Attributes}),
 statements (@pxref{Statement Attributes}),
-and types (@pxref{Type Attributes}).
+types (@pxref{Type Attributes}),
+and on field declarations (for @code{tainted}).
 
 There is some overlap between the purposes of attributes and pragmas
 (@pxref{Pragmas,,Pragmas Accepted by GCC}).  It has been
@@ -3977,6 +3978,25 @@ addition to creating a symbol version (as if
 @code{"@var{name2}@@@var{nodename}"} was used) the version will be also used
 to resolve @var{name2} by the linker.
 
+@item tainted
+@cindex @code{tainted} function attribute
+The @code{tainted} attribute is used to specify that a function is called
+in a way that requires sanitization of its arguments, such as a system
+call in an operating system kernel.  Such a function can be considered part
+of the ``attack surface'' of the program.  The attribute can be used both
+on function declarations, and on field declarations containing function
+pointers.  In the latter case, any function used as an initializer of
+such a callback field will be treated as tainted.
+
+The analyzer will pay particular attention to such functions when both
+@option{-fanalyzer} and @option{-fanalyzer-checker=taint} are supplied,
+potentially issuing warnings guarded by
+@option{-Wanalyzer-exposure-through-uninit-copy},
+@option{-Wanalyzer-tainted-allocation-size},
+@option{-Wanalyzer-tainted-array-index},
+@option{-Wanalyzer-tainted-offset},
+and @option{-Wanalyzer-tainted-size}.
+
 @item target_clones (@var{options})
 @cindex @code{target_clones} function attribute
 The @code{target_clones} attribute is used to specify that a function
diff --git a/gcc/testsuite/gcc.dg/analyzer/attr-tainted-1.c b/gcc/testsuite/gcc.dg/analyzer/attr-tainted-1.c
new file mode 100644
index 00000000000..cc4d5900372
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/attr-tainted-1.c
@@ -0,0 +1,88 @@
+// TODO: remove need for this option
+/* { dg-additional-options "-fanalyzer-checker=taint" } */
+
+#include "analyzer-decls.h"
+
+struct arg_buf
+{
+  int i;
+  int j;
+};
+
+/* Example of marking a function as tainted.  */
+
+void __attribute__((tainted))
+test_1 (int i, void *p, char *q)
+{
+  /* There should be a single enode,
+     for the "tainted" entry to the function.  */
+  __analyzer_dump_exploded_nodes (0); /* { dg-warning "1 processed enode" } */
+
+  __analyzer_dump_state ("taint", i); /* { dg-warning "state: 'tainted'" } */
+  __analyzer_dump_state ("taint", p); /* { dg-warning "state: 'tainted'" } */
+  __analyzer_dump_state ("taint", q); /* { dg-warning "state: 'tainted'" } */
+  __analyzer_dump_state ("taint", *q); /* { dg-warning "state: 'tainted'" } */
+
+  struct arg_buf *args = p;
+  __analyzer_dump_state ("taint", args->i); /* { dg-warning "state: 'tainted'" } */
+  __analyzer_dump_state ("taint", args->j); /* { dg-warning "state: 'tainted'" } */  
+}
+
+/* Example of marking a callback field as tainted.  */
+
+struct s2
+{
+  void (*cb) (int, void *, char *)
+    __attribute__((tainted));
+};
+
+/* Function not marked as tainted.  */
+
+void
+test_2a (int i, void *p, char *q)
+{
+  /* There should be a single enode,
+     for the normal entry to the function.  */
+  __analyzer_dump_exploded_nodes (0); /* { dg-warning "1 processed enode" } */
+
+  __analyzer_dump_state ("taint", i); /* { dg-warning "state: 'start'" } */
+  __analyzer_dump_state ("taint", p); /* { dg-warning "state: 'start'" } */
+  __analyzer_dump_state ("taint", q); /* { dg-warning "state: 'start'" } */
+
+  struct arg_buf *args = p;
+  __analyzer_dump_state ("taint", args->i); /* { dg-warning "state: 'start'" } */
+  __analyzer_dump_state ("taint", args->j); /* { dg-warning "state: 'start'" } */  
+}
+
+/* Function referenced via t2b.cb, marked as "tainted".  */
+
+void
+test_2b (int i, void *p, char *q)
+{
+  /* There should be two enodes: one for the direct call,
+     and one for the "tainted" entry to the function.  */
+  __analyzer_dump_exploded_nodes (0); /* { dg-warning "2 processed enodes" } */
+}
+
+/* Callback used via t2c.cb, marked as "tainted".  */
+void
+__analyzer_test_2c (int i, void *p, char *q)
+{
+  /* There should be a single enode,
+     for the "tainted" entry to the function.  */
+  __analyzer_dump_exploded_nodes (0); /* { dg-warning "1 processed enode" } */
+
+  __analyzer_dump_state ("taint", i); /* { dg-warning "state: 'tainted'" } */
+  __analyzer_dump_state ("taint", p); /* { dg-warning "state: 'tainted'" } */
+  __analyzer_dump_state ("taint", q); /* { dg-warning "state: 'tainted'" } */
+}
+
+struct s2 t2b =
+{
+  .cb = test_2b
+};
+
+struct s2 t2c =
+{
+  .cb = __analyzer_test_2c
+};
diff --git a/gcc/testsuite/gcc.dg/analyzer/attr-tainted-misuses.c b/gcc/testsuite/gcc.dg/analyzer/attr-tainted-misuses.c
new file mode 100644
index 00000000000..6f4cbc82efb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/attr-tainted-misuses.c
@@ -0,0 +1,6 @@
+int not_a_fn __attribute__ ((tainted)); /* { dg-warning "'tainted' attribute ignored; valid only for functions and function pointer fields" } */
+
+struct s
+{
+  int f __attribute__ ((tainted)); /* { dg-warning "'tainted' attribute ignored; field must be a function pointer" } */
+};
diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-2210-1.c b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-2210-1.c
new file mode 100644
index 00000000000..fe6c7ebbb1f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-2210-1.c
@@ -0,0 +1,93 @@
+/* "The osf_getsysinfo function in arch/alpha/kernel/osf_sys.c in the
+   Linux kernel before 2.6.39.4 on the Alpha platform does not properly
+   restrict the data size for GSI_GET_HWRPB operations, which allows
+   local users to obtain sensitive information from kernel memory via
+   a crafted call."
+
+   Fixed in 3d0475119d8722798db5e88f26493f6547a4bb5b on linux-2.6.39.y
+   in linux-stable.  */
+
+// TODO: remove need for this option:
+/* { dg-additional-options "-fanalyzer-checker=taint" } */
+
+#include "analyzer-decls.h"
+#include "test-uaccess.h"
+
+/* Adapted from include/linux/linkage.h.  */
+
+#define asmlinkage
+
+/* Adapted from include/linux/syscalls.h.  */
+
+#define __SC_DECL1(t1, a1)	t1 a1
+#define __SC_DECL2(t2, a2, ...) t2 a2, __SC_DECL1(__VA_ARGS__)
+#define __SC_DECL3(t3, a3, ...) t3 a3, __SC_DECL2(__VA_ARGS__)
+#define __SC_DECL4(t4, a4, ...) t4 a4, __SC_DECL3(__VA_ARGS__)
+#define __SC_DECL5(t5, a5, ...) t5 a5, __SC_DECL4(__VA_ARGS__)
+#define __SC_DECL6(t6, a6, ...) t6 a6, __SC_DECL5(__VA_ARGS__)
+
+#define SYSCALL_DEFINEx(x, sname, ...)				\
+	__SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
+
+#define SYSCALL_DEFINE(name) asmlinkage long sys_##name
+#define __SYSCALL_DEFINEx(x, name, ...)					\
+	asmlinkage __attribute__((tainted)) \
+	long sys##name(__SC_DECL##x(__VA_ARGS__))
+
+#define SYSCALL_DEFINE5(name, ...) SYSCALL_DEFINEx(5, _##name, __VA_ARGS__)
+
+/* Adapted from arch/alpha/include/asm/hwrpb.h.  */
+
+struct hwrpb_struct {
+	unsigned long phys_addr;	/* check: physical address of the hwrpb */
+	unsigned long id;		/* check: "HWRPB\0\0\0" */
+	unsigned long revision;
+	unsigned long size;		/* size of hwrpb */
+	/* [...snip...] */
+};
+
+extern struct hwrpb_struct *hwrpb;
+
+/* Adapted from arch/alpha/kernel/osf_sys.c.  */
+
+SYSCALL_DEFINE5(osf_getsysinfo, unsigned long, op, void __user *, buffer,
+		unsigned long, nbytes, int __user *, start, void __user *, arg)
+{
+	/* [...snip...] */
+
+	__analyzer_dump_state ("taint", nbytes);  /* { dg-warning "tainted" } */
+
+	/* TODO: should have an event explaining why "nbytes" is treated as
+	   attacker-controlled.  */
+
+	/* case GSI_GET_HWRPB: */
+		if (nbytes < sizeof(*hwrpb))
+			return -1;
+
+		__analyzer_dump_state ("taint", nbytes);  /* { dg-warning "has_lb" } */
+
+		if (copy_to_user(buffer, hwrpb, nbytes) != 0) /* { dg-warning "use of attacker-controlled value 'nbytes' as size without upper-bounds checking" } */
+			return -2;
+
+		return 1;
+
+	/* [...snip...] */
+}
+
+/* With the fix for the sense of the size comparison.  */
+
+SYSCALL_DEFINE5(osf_getsysinfo_fixed, unsigned long, op, void __user *, buffer,
+		unsigned long, nbytes, int __user *, start, void __user *, arg)
+{
+	/* [...snip...] */
+
+	/* case GSI_GET_HWRPB: */
+		if (nbytes > sizeof(*hwrpb))
+			return -1;
+		if (copy_to_user(buffer, hwrpb, nbytes) != 0) /* { dg-bogus "attacker-controlled" } */
+			return -2;
+
+		return 1;
+
+	/* [...snip...] */
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-1.c b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-1.c
new file mode 100644
index 00000000000..0b9a94a8d6c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-1.c
@@ -0,0 +1,38 @@
+/* See notes in this header.  */
+#include "taint-CVE-2020-13143.h"
+
+// TODO: remove need for this option
+/* { dg-additional-options "-fanalyzer-checker=taint" } */
+
+struct configfs_attribute {
+	/* [...snip...] */
+	ssize_t (*store)(struct config_item *, const char *, size_t) /* { dg-message "\\(1\\) field 'store' of 'struct configfs_attribute' is marked with '__attribute__\\(\\(tainted\\)\\)'" } */
+		__attribute__((tainted)); /* (this is added).  */
+};
+static inline struct gadget_info *to_gadget_info(struct config_item *item)
+{
+	 return container_of(to_config_group(item), struct gadget_info, group);
+}
+
+static ssize_t gadget_dev_desc_UDC_store(struct config_item *item,
+		const char *page, size_t len)
+{
+	struct gadget_info *gi = to_gadget_info(item);
+	char *name;
+	int ret;
+
+#if 0
+	/* FIXME: this is the fix.  */
+	if (strlen(page) < len)
+		return -EOVERFLOW;
+#endif
+
+	name = kstrdup(page, GFP_KERNEL);
+	if (!name)
+		return -ENOMEM;
+	if (name[len - 1] == '\n') /* { dg-warning "use of attacker-controlled value 'len \[^\n\r\]+' as offset without upper-bounds checking" } */
+		name[len - 1] = '\0'; /* { dg-warning "use of attacker-controlled value 'len \[^\n\r\]+' as offset without upper-bounds checking" } */
+	/* [...snip...] */
+}
+
+CONFIGFS_ATTR(gadget_dev_desc_, UDC); /* { dg-message "\\(2\\) function 'gadget_dev_desc_UDC_store' used as initializer for field 'store' marked with '__attribute__\\(\\(tainted\\)\\)'" } */
diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-2.c b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-2.c
new file mode 100644
index 00000000000..e05da9276c1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-2.c
@@ -0,0 +1,32 @@
+/* See notes in this header.  */
+#include "taint-CVE-2020-13143.h"
+
+// TODO: remove need for this option
+/* { dg-additional-options "-fanalyzer-checker=taint" } */
+
+struct configfs_attribute {
+	/* [...snip...] */
+	ssize_t (*store)(struct config_item *, const char *, size_t) /* { dg-message "\\(1\\) field 'store' of 'struct configfs_attribute' is marked with '__attribute__\\(\\(tainted\\)\\)'" } */
+		__attribute__((tainted)); /* (this is added).  */
+};
+
+/* Highly simplified version.  */
+
+static ssize_t gadget_dev_desc_UDC_store(struct config_item *item,
+		const char *page, size_t len)
+{
+	/* TODO: ought to have state_change_event talking about where the tainted value comes from.  */
+
+	char *name;
+	/* [...snip...] */
+
+	name = kstrdup(page, GFP_KERNEL);
+	if (!name)
+		return -ENOMEM;
+	if (name[len - 1] == '\n') /* { dg-warning "use of attacker-controlled value 'len \[^\n\r\]+' as offset without upper-bounds checking" } */
+		name[len - 1] = '\0';  /* { dg-warning "use of attacker-controlled value 'len \[^\n\r\]+' as offset without upper-bounds checking" } */
+	/* [...snip...] */
+	return 0;
+}
+
+CONFIGFS_ATTR(gadget_dev_desc_, UDC); /* { dg-message "\\(2\\) function 'gadget_dev_desc_UDC_store' used as initializer for field 'store' marked with '__attribute__\\(\\(tainted\\)\\)'" } */
diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143.h b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143.h
new file mode 100644
index 00000000000..0ba023539af
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143.h
@@ -0,0 +1,91 @@
+/* Shared header for the various taint-CVE-2020-13143-*.c tests.
+   
+   "gadget_dev_desc_UDC_store in drivers/usb/gadget/configfs.c in the
+   Linux kernel 3.16 through 5.6.13 relies on kstrdup without considering
+   the possibility of an internal '\0' value, which allows attackers to
+   trigger an out-of-bounds read, aka CID-15753588bcd4."
+
+   Fixed by 15753588bcd4bbffae1cca33c8ced5722477fe1f on linux-5.7.y
+   in linux-stable.  */
+
+// TODO: remove need for this option
+/* { dg-additional-options "-fanalyzer-checker=taint" } */
+
+#include <stddef.h>
+
+/* Adapted from include/uapi/asm-generic/posix_types.h  */
+
+typedef unsigned int     __kernel_size_t;
+typedef int              __kernel_ssize_t;
+
+/* Adapted from include/linux/types.h  */
+
+//typedef __kernel_size_t		size_t;
+typedef __kernel_ssize_t	ssize_t;
+
+/* Adapted from include/linux/kernel.h  */
+
+#define container_of(ptr, type, member) ({				\
+	void *__mptr = (void *)(ptr);					\
+	/* [...snip...] */						\
+	((type *)(__mptr - offsetof(type, member))); })
+
+/* Adapted from include/linux/configfs.h  */
+
+struct config_item {
+	/* [...snip...] */
+};
+
+struct config_group {
+	struct config_item		cg_item;
+	/* [...snip...] */
+};
+
+static inline struct config_group *to_config_group(struct config_item *item)
+{
+	return item ? container_of(item,struct config_group,cg_item) : NULL;
+}
+
+#define CONFIGFS_ATTR(_pfx, _name)				\
+static struct configfs_attribute _pfx##attr_##_name = {	\
+	/* [...snip...] */				\
+	.store		= _pfx##_name##_store,		\
+}
+
+/* Adapted from include/linux/compiler.h  */
+
+#define __force
+
+/* Adapted from include/asm-generic/errno-base.h  */
+
+#define	ENOMEM		12	/* Out of memory */
+
+/* Adapted from include/linux/types.h  */
+
+#define __bitwise__
+typedef unsigned __bitwise__ gfp_t;
+
+/* Adapted from include/linux/gfp.h  */
+
+#define ___GFP_WAIT		0x10u
+#define ___GFP_IO		0x40u
+#define ___GFP_FS		0x80u
+#define __GFP_WAIT	((__force gfp_t)___GFP_WAIT)
+#define __GFP_IO	((__force gfp_t)___GFP_IO)
+#define __GFP_FS	((__force gfp_t)___GFP_FS)
+#define GFP_KERNEL  (__GFP_WAIT | __GFP_IO | __GFP_FS)
+
+/* Adapted from include/linux/compiler_attributes.h  */
+
+#define __malloc                        __attribute__((__malloc__))
+
+/* Adapted from include/linux/string.h  */
+
+extern char *kstrdup(const char *s, gfp_t gfp) __malloc;
+
+/* Adapted from drivers/usb/gadget/configfs.c  */
+
+struct gadget_info {
+	struct config_group group;
+	/* [...snip...] */
+};
diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c b/gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c
new file mode 100644
index 00000000000..4c567b2ffdf
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c
@@ -0,0 +1,21 @@
+// TODO: remove need for this option:
+/* { dg-additional-options "-fanalyzer-checker=taint" } */
+
+#include "analyzer-decls.h"
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+/* malloc with tainted size from a syscall.  */
+
+void *p;
+
+void __attribute__((tainted))
+test_1 (size_t sz) /* { dg-message "\\(1\\) function 'test_1' marked with '__attribute__\\(\\(tainted\\)\\)'" } */
+{
+  /* TODO: should have a message saying why "sz" is tainted, e.g.
+     "treating 'sz' as attacker-controlled because 'test_1' is marked with '__attribute__((tainted))'"  */
+
+  p = malloc (sz); /* { dg-warning "use of attacker-controlled value 'sz' as allocation size without upper-bounds checking" "warning" } */
+  /* { dg-message "\\(\[0-9\]+\\) use of attacker-controlled value 'sz' as allocation size without upper-bounds checking" "final event" { target *-*-* } .-1 } */
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c b/gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c
new file mode 100644
index 00000000000..f52cafcd71d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c
@@ -0,0 +1,31 @@
+// TODO: remove need for this option:
+/* { dg-additional-options "-fanalyzer-checker=taint" } */
+
+#include "analyzer-decls.h"
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+/* malloc with tainted size from a syscall.  */
+
+struct arg_buf
+{
+  size_t sz;
+};
+
+void *p;
+
+void __attribute__((tainted))
+test_1 (void *data) /* { dg-message "\\(1\\) function 'test_1' marked with '__attribute__\\(\\(tainted\\)\\)'" } */
+{
+  /* we should treat pointed-to-structs as tainted.  */
+  __analyzer_dump_state ("taint", data); /* { dg-warning "state: 'tainted'" } */
+  
+  struct arg_buf *args = data;
+
+  __analyzer_dump_state ("taint", args); /* { dg-warning "state: 'tainted'" } */
+  __analyzer_dump_state ("taint", args->sz); /* { dg-warning "state: 'tainted'" } */
+  
+  p = malloc (args->sz); /* { dg-warning "use of attacker-controlled value '\\*args.sz' as allocation size without upper-bounds checking" "warning" } */
+  /* { dg-message "\\(\[0-9\]+\\) use of attacker-controlled value '\\*args.sz' as allocation size without upper-bounds checking" "final event" { target *-*-* } .-1 } */
+}
-- 
2.26.3



* Re: [PATCH 0/6] RFC: adding support to GCC for detecting trust boundaries
  2021-11-13 20:37 [PATCH 0/6] RFC: adding support to GCC for detecting trust boundaries David Malcolm
                   ` (6 preceding siblings ...)
  2021-11-13 20:37 ` [PATCH 6/6] Add __attribute__ ((tainted)) David Malcolm
@ 2021-11-13 23:20 ` Peter Zijlstra
  2021-11-14  2:54   ` David Malcolm
  2021-11-14 13:54 ` Miguel Ojeda
  2021-12-06 18:12 ` Martin Sebor
  9 siblings, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2021-11-13 23:20 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc-patches, linux-toolchains

On Sat, Nov 13, 2021 at 03:37:24PM -0500, David Malcolm wrote:

> This approach is much less expressive than the custom address space
> approach; it would only cover the trust boundary aspect; it wouldn't
> cover any differences between generic pointers and __user, vs __iomem,
> __percpu, and __rcu which I admit I only dimly understand.

__iomem would point at device memory, which can have curious side
effects or is yet another trust boundary, depending on device and usage.

__percpu is an address space that denotes a per-cpu variable's relative
offset; it needs to be combined with a per-cpu offset to get a 'real'
pointer. On x86_64 the %gs segment offset is used for this purpose; other
architectures are less fortunate. The whole per_cpu()/this_cpu_*()
family of APIs accepts such pointers.

__rcu is the regular kernel address space, but denotes that the object
pointed to has RCU lifetime management. The attribute is laundered
through rcu_dereference() to remove the __rcu qualifier.
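
As an illustration of the __percpu description above, here is a small
user-space sketch.  It is not kernel code: the empty __percpu macro,
NR_CPUS_DEMO, demo_areas, demo_per_cpu_ptr and demo_counter_inc are all
hypothetical names made up for the example, and the kernel's real
per_cpu()/this_cpu_*() accessors are architecture-specific macros.  The
sketch only shows the idea that a __percpu "pointer" is really an offset
that has to be combined with a per-CPU base before it can be
dereferenced.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel annotation; it expands to nothing here.  */
#define __percpu

/* Hypothetical number of simulated CPUs.  */
#define NR_CPUS_DEMO 4

/* Layout of the per-CPU area; one copy exists for each CPU.  */
struct percpu_area
{
  long counter;
  long other;
};

/* A "per-cpu pointer" is modelled as an offset into the area.  */
typedef size_t __percpu demo_percpu_off_t;

/* One zero-initialized copy of the area per simulated CPU.  */
static struct percpu_area demo_areas[NR_CPUS_DEMO];

/* Combine a per-CPU offset with the base of CPU's area to obtain a
   'real' pointer that can be dereferenced.  */
static void *
demo_per_cpu_ptr (demo_percpu_off_t offset, int cpu)
{
  assert (cpu >= 0 && cpu < NR_CPUS_DEMO);
  return (char *) &demo_areas[cpu] + offset;
}

/* Increment the counter belonging to CPU and return its new value.  */
static long
demo_counter_inc (int cpu)
{
  long *p = demo_per_cpu_ptr (offsetof (struct percpu_area, counter), cpu);
  return ++*p;
}
```

Calling demo_counter_inc (0) twice touches only CPU 0's copy of the
counter; the copies belonging to the other simulated CPUs are left
alone, which is why the bare offset is meaningless without a CPU.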

> Possibly silly question: is it always a bug for the value of a kernel
> pointer to leak into user space?  i.e. should I be complaining about an
> infoleak if the value of a trusted_ptr itself is written to
> *untrusted_ptr?  e.g.

Yes, always. Leaking kernel pointers is unconditionally bad.


* Re: [PATCH 0/6] RFC: adding support to GCC for detecting trust boundaries
  2021-11-13 23:20 ` [PATCH 0/6] RFC: adding support to GCC for detecting trust boundaries Peter Zijlstra
@ 2021-11-14  2:54   ` David Malcolm
  0 siblings, 0 replies; 39+ messages in thread
From: David Malcolm @ 2021-11-14  2:54 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: gcc-patches, linux-toolchains

On Sun, 2021-11-14 at 00:20 +0100, Peter Zijlstra wrote:
> On Sat, Nov 13, 2021 at 03:37:24PM -0500, David Malcolm wrote:
> 
> > This approach is much less expressive than the custom address space
> > approach; it would only cover the trust boundary aspect; it wouldn't
> > cover any differences between generic pointers and __user, vs __iomem,
> > __percpu, and __rcu which I admit I only dimly understand.
> 
> __iomem would point at device memory, which can have curious side
> effects or is yet another trust boundary, depending on device and usage.
> 
> __percpu is an address space that denotes a per-cpu variable's
> relative offset; it needs to be combined with a per-cpu offset to get
> a 'real' pointer. On x86_64 the %gs segment offset is used for this
> purpose; other architectures are less fortunate. The whole
> per_cpu()/this_cpu_*() family of APIs accepts such pointers.
> 
> __rcu is the regular kernel address space, but denotes that the object
> pointed to has RCU lifetime management. The attribute is laundered
> through rcu_dereference() to remove the __rcu qualifier.

Thanks; this is very helpful.

> 
> > Possibly silly question: is it always a bug for the value of a kernel
> > pointer to leak into user space?  i.e. should I be complaining about an
> > infoleak if the value of a trusted_ptr itself is written to
> > *untrusted_ptr?  e.g.
> 
> Yes, always. Leaking kernel pointers is unconditionally bad.

Thanks.

FWIW I've thrown together a new warning in -fanalyzer for this, e.g.
given:

/* Some kernel space thing, where the address is presumably secret */
struct foo_t
{
} foo;

/* Response struct for some ioctl/syscall  */
struct s1
{
  void *ptr;
};

void test_1 (void __user *p)
{
  struct s1 s = {0};
  s.ptr = &foo;
  copy_to_user (p, &s, sizeof (s));
}

...my code emits...

infoleak-ptr-1.c: In function ‘test_1’:
infoleak-ptr-1.c:17:3: warning: potential exposure of sensitive
  information by copying pointer ‘&foo’ across trust boundary
  [-Wanalyzer-exposure-of-pointer]
   17 |   copy_to_user (p, &s, sizeof (s));
      |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

but it strikes me that there could be other sensitive information
beyond just the values of kernel-space pointers that must not cross a
trust boundary.  GCC's -fanalyzer currently has a state machine for
tracking "sensitive" values, but it's currently just a proof-of-concept
that merely treats the result of the user-space API "getpass" as
sensitive (with a demo of detecting passwords being exposed via
logfiles).  Any ideas on other values in the kernel that it would be
useful to treat as "sensitive"?  (maybe crypto private keys???  other
internal state???)  I can do it by types, by results of functions, etc.
That said, I'm not modeling the kernel's own access model (root vs
regular user etc) in the analyzer, so maybe extending things beyond
kernel space addresses is misguided?


Hope this is constructive
Dave


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/6] RFC: adding support to GCC for detecting trust boundaries
  2021-11-13 20:37 [PATCH 0/6] RFC: adding support to GCC for detecting trust boundaries David Malcolm
                   ` (7 preceding siblings ...)
  2021-11-13 23:20 ` [PATCH 0/6] RFC: adding support to GCC for detecting trust boundaries Peter Zijlstra
@ 2021-11-14 13:54 ` Miguel Ojeda
  2021-12-06 18:12 ` Martin Sebor
  9 siblings, 0 replies; 39+ messages in thread
From: Miguel Ojeda @ 2021-11-14 13:54 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc-patches, linux-toolchains

On Sat, Nov 13, 2021 at 9:37 PM David Malcolm <dmalcolm@redhat.com> wrote:
>
>   #define __user __attribute__((untrusted))
>
> where my patched GCC treats
>   T *
> vs
>   T __attribute__((untrusted)) *
> as being different types and thus the C frontend can complain (even without
> -fanalyzer) about e.g.:

This one sounds similar to the `Untrusted<T>` wrapper I suggested for
the Rust side -- we would have a method to "extract and trust" the
value (instead of a cast).

> Patch 2 in the kit adds:
>   __attribute__((returns_zero_on_success))
> and
>   __attribute__((returns_nonzero_on_success))
> as hints to the analyzer that it's worth bifurcating the analysis of
> such functions (to explore failure vs success, and thus to better
> explore error-handling paths).  It's also a hint to the human reader of
> the source code.

These two sound quite nice to have for most C projects. Would it be
useful to generalize to different values than 0/non-0? e.g.
`returns_on_success(0)` and `returns_on_failure(0)`.

Cheers,
Miguel

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/6] Add returns_zero_on_success/failure attributes
  2021-11-13 20:37 ` [PATCH 2/6] Add returns_zero_on_success/failure attributes David Malcolm
@ 2021-11-15  7:03   ` Prathamesh Kulkarni
  2021-11-15 14:45     ` Peter Zijlstra
  2021-11-15 22:12     ` David Malcolm
  0 siblings, 2 replies; 39+ messages in thread
From: Prathamesh Kulkarni @ 2021-11-15  7:03 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc-patches, linux-toolchains

On Sun, 14 Nov 2021 at 02:07, David Malcolm via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> This patch adds two new attributes.  The followup patch makes use of
> the attributes in -fanalyzer.
>
> gcc/c-family/ChangeLog:
>         * c-attribs.c (attr_noreturn_exclusions): Add
>         "returns_zero_on_failure" and "returns_zero_on_success".
>         (attr_returns_twice_exclusions): Likewise.
>         (attr_returns_zero_on_exclusions): New.
>         (c_common_attribute_table): Add "returns_zero_on_failure" and
>         "returns_zero_on_success".
>         (handle_returns_zero_on_attributes): New.
>
> gcc/ChangeLog:
>         * doc/extend.texi (Common Function Attributes): Document
>         "returns_zero_on_failure" and "returns_zero_on_success".
>
> gcc/testsuite/ChangeLog:
>         * c-c++-common/attr-returns-zero-on-1.c: New test.
>
> Signed-off-by: David Malcolm <dmalcolm@redhat.com>
> ---
>  gcc/c-family/c-attribs.c                      | 37 ++++++++++
>  gcc/doc/extend.texi                           | 16 +++++
>  .../c-c++-common/attr-returns-zero-on-1.c     | 68 +++++++++++++++++++
>  3 files changed, 121 insertions(+)
>  create mode 100644 gcc/testsuite/c-c++-common/attr-returns-zero-on-1.c
>
> diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
> index 100c2dabab2..9e03156de5e 100644
> --- a/gcc/c-family/c-attribs.c
> +++ b/gcc/c-family/c-attribs.c
> @@ -153,6 +153,7 @@ static tree handle_argspec_attribute (tree *, tree, tree, int, bool *);
>  static tree handle_fnspec_attribute (tree *, tree, tree, int, bool *);
>  static tree handle_warn_unused_attribute (tree *, tree, tree, int, bool *);
>  static tree handle_returns_nonnull_attribute (tree *, tree, tree, int, bool *);
> +static tree handle_returns_zero_on_attributes (tree *, tree, tree, int, bool *);
>  static tree handle_omp_declare_simd_attribute (tree *, tree, tree, int,
>                                                bool *);
>  static tree handle_omp_declare_variant_attribute (tree *, tree, tree, int,
> @@ -221,6 +222,8 @@ extern const struct attribute_spec::exclusions attr_noreturn_exclusions[] =
>    ATTR_EXCL ("pure", true, true, true),
>    ATTR_EXCL ("returns_twice", true, true, true),
>    ATTR_EXCL ("warn_unused_result", true, true, true),
> +  ATTR_EXCL ("returns_zero_on_failure", true, true, true),
> +  ATTR_EXCL ("returns_zero_on_success", true, true, true),
>    ATTR_EXCL (NULL, false, false, false),
>  };
>
> @@ -235,6 +238,8 @@ attr_warn_unused_result_exclusions[] =
>  static const struct attribute_spec::exclusions attr_returns_twice_exclusions[] =
>  {
>    ATTR_EXCL ("noreturn", true, true, true),
> +  ATTR_EXCL ("returns_zero_on_failure", true, true, true),
> +  ATTR_EXCL ("returns_zero_on_success", true, true, true),
>    ATTR_EXCL (NULL, false, false, false),
>  };
>
> @@ -275,6 +280,16 @@ static const struct attribute_spec::exclusions attr_stack_protect_exclusions[] =
>    ATTR_EXCL (NULL, false, false, false),
>  };
>
> +/* Exclusions that apply to the returns_zero_on_* attributes.  */
> +static const struct attribute_spec::exclusions
> +  attr_returns_zero_on_exclusions[] =
> +{
> +  ATTR_EXCL ("noreturn", true, true, true),
> +  ATTR_EXCL ("returns_twice", true, true, true),
> +  ATTR_EXCL ("returns_zero_on_failure", true, true, true),
> +  ATTR_EXCL ("returns_zero_on_success", true, true, true),
> +  ATTR_EXCL (NULL, false, false, false),
> +};
>
>  /* Table of machine-independent attributes common to all C-like languages.
>
> @@ -493,6 +508,12 @@ const struct attribute_spec c_common_attribute_table[] =
>                               handle_warn_unused_attribute, NULL },
>    { "returns_nonnull",        0, 0, false, true, true, false,
>                               handle_returns_nonnull_attribute, NULL },
> +  { "returns_zero_on_failure",0, 0, false, true, true, false,
> +                             handle_returns_zero_on_attributes,
> +                             attr_returns_zero_on_exclusions },
> +  { "returns_zero_on_success",0, 0, false, true, true, false,
> +                             handle_returns_zero_on_attributes,
> +                             attr_returns_zero_on_exclusions },
>    { "omp declare simd",       0, -1, true,  false, false, false,
>                               handle_omp_declare_simd_attribute, NULL },
>    { "omp declare variant base", 0, -1, true,  false, false, false,
> @@ -5660,6 +5681,22 @@ handle_returns_nonnull_attribute (tree *node, tree name, tree, int,
>    return NULL_TREE;
>  }
>
> +/* Handle "returns_zero_on_failure" and "returns_zero_on_success" attributes;
> +   arguments as in struct attribute_spec.handler.  */
> +
> +static tree
> +handle_returns_zero_on_attributes (tree *node, tree name, tree, int,
> +                                  bool *no_add_attrs)
> +{
> +  if (!INTEGRAL_TYPE_P (TREE_TYPE (*node)))
> +    {
> +      error ("%qE attribute on a function not returning an integral type",
> +            name);
> +      *no_add_attrs = true;
> +    }
> +  return NULL_TREE;
Hi David,
Just curious if a warning should be emitted if the function is marked
with the attribute but its return value isn't actually 0 ?

There are other constants like -1 or 1 that are often used to indicate
error, so maybe tweak the attribute to
take the integer as an argument ?
Sth like returns_int_on_success(cst) / returns_int_on_failure(cst) ?

Also, would it make sense to extend it for pointers too for returning
NULL on success / failure ?

Thanks,
Prathamesh
> +}
> +
>  /* Handle a "designated_init" attribute; arguments as in
>     struct attribute_spec.handler.  */
>
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index e9f47519df2..5a6ef464779 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -3784,6 +3784,22 @@ function.  Examples of such functions are @code{setjmp} and @code{vfork}.
>  The @code{longjmp}-like counterpart of such function, if any, might need
>  to be marked with the @code{noreturn} attribute.
>
> +@item returns_zero_on_failure
> +@cindex @code{returns_zero_on_failure} function attribute
> +The @code{returns_zero_on_failure} attribute hints that the function
> +can succeed or fail, returning non-zero on success and zero on failure.
> +This is used by the @option{-fanalyzer} option to consider both outcomes
> +separately, which may improve how it explores error-handling paths, and
> +how such outcomes are labelled in diagnostics.  It is also a hint
> +to the human reader of the source code.
> +
> +@item returns_zero_on_success
> +@cindex @code{returns_zero_on_success} function attribute
> +The @code{returns_zero_on_success} attribute is identical to the
> +@code{returns_zero_on_failure} attribute, apart from having the
> +opposite interpretation of the return value: zero on success, non-zero
> +on failure.
> +
>  @item section ("@var{section-name}")
>  @cindex @code{section} function attribute
>  @cindex functions in arbitrary sections
> diff --git a/gcc/testsuite/c-c++-common/attr-returns-zero-on-1.c b/gcc/testsuite/c-c++-common/attr-returns-zero-on-1.c
> new file mode 100644
> index 00000000000..5475dfe61db
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/attr-returns-zero-on-1.c
> @@ -0,0 +1,68 @@
> +/* Verify the parsing of the "returns_zero_on_{success|failure}" attributes.  */
> +
> +/* Correct usage.  */
> +
> +extern int test_int_return_s ()
> +  __attribute__((returns_zero_on_success));
> +extern long test_long_return_f ()
> +  __attribute__((returns_zero_on_failure));
> +
> +/* Should complain if not a function.  */
> +
> +extern int not_a_function_s
> +  __attribute__((returns_zero_on_success)); /* { dg-warning "'returns_zero_on_success' attribute only applies to function types" } */
> +extern int not_a_function_f
> +  __attribute__((returns_zero_on_failure)); /* { dg-warning "'returns_zero_on_failure' attribute only applies to function types" } */
> +
> +/* Should complain if return type is not integral.  */
> +
> +extern void test_void_return_s ()
> +  __attribute__((returns_zero_on_success)); /* { dg-error "'returns_zero_on_success' attribute on a function not returning an integral type" } */
> +extern void test_void_return_f ()
> +  __attribute__((returns_zero_on_failure)); /* { dg-error "'returns_zero_on_failure' attribute on a function not returning an integral type" } */
> +
> +extern void *test_void_star_return_s ()
> +  __attribute__((returns_zero_on_success)); /* { dg-error "'returns_zero_on_success' attribute on a function not returning an integral type" } */
> +extern void *test_void_star_return_f ()
> +  __attribute__((returns_zero_on_failure)); /* { dg-error "'returns_zero_on_failure' attribute on a function not returning an integral type" } */
> +
> +/* (and this prevents mixing with returns_nonnull, which requires a pointer).  */
> +
> +/* Should complain if more than one returns_* attribute.  */
> +
> +extern int test_void_returns_s_f ()
> +  __attribute__((returns_zero_on_success))
> +  __attribute__((returns_zero_on_failure)); /* { dg-warning "ignoring attribute 'returns_zero_on_failure' because it conflicts with attribute 'returns_zero_on_success'" } */
> +extern int test_void_returns_f_s ()
> +  __attribute__((returns_zero_on_failure))
> +  __attribute__((returns_zero_on_success)); /* { dg-warning "ignoring attribute 'returns_zero_on_success' because it conflicts with attribute 'returns_zero_on_failure'" } */
> +
> +/* Should complain if mixed with "noreturn".  */
> +
> +extern int test_noreturn_returns_s ()
> +  __attribute__((noreturn))
> +  __attribute__((returns_zero_on_success)); /* { dg-warning "ignoring attribute 'returns_zero_on_success' because it conflicts with attribute 'noreturn'" } */
> +extern int test_returns_s_noreturn ()
> +  __attribute__((returns_zero_on_success))
> +  __attribute__((noreturn)); /* { dg-warning "ignoring attribute 'noreturn' because it conflicts with attribute 'returns_zero_on_success'" } */
> +extern int test_noreturn_returns_f ()
> +  __attribute__((noreturn))
> +  __attribute__((returns_zero_on_failure)); /* { dg-warning "ignoring attribute 'returns_zero_on_failure' because it conflicts with attribute 'noreturn'" } */
> +extern int test_returns_f_noreturn ()
> +  __attribute__((returns_zero_on_failure))
> +  __attribute__((noreturn)); /* { dg-warning "ignoring attribute 'noreturn' because it conflicts with attribute 'returns_zero_on_failure'" } */
> +
> +/* Should complain if mixed with "returns_twice".  */
> +
> +extern int test_returns_twice_returns_s ()
> +  __attribute__((returns_twice))
> +  __attribute__((returns_zero_on_success)); /* { dg-warning "ignoring attribute 'returns_zero_on_success' because it conflicts with attribute 'returns_twice'" } */
> +extern int test_returns_s_returns_twice ()
> +  __attribute__((returns_zero_on_success))
> +  __attribute__((returns_twice)); /* { dg-warning "ignoring attribute 'returns_twice' because it conflicts with attribute 'returns_zero_on_success'" } */
> +extern int test_returns_twice_returns_f ()
> +  __attribute__((returns_twice))
> +  __attribute__((returns_zero_on_failure)); /* { dg-warning "ignoring attribute 'returns_zero_on_failure' because it conflicts with attribute 'returns_twice'" } */
> +extern int test_returns_f_returns_twice ()
> +  __attribute__((returns_zero_on_failure))
> +  __attribute__((returns_twice)); /* { dg-warning "ignoring attribute 'returns_twice' because it conflicts with attribute 'returns_zero_on_failure'" } */
> --
> 2.26.3
>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/6] Add returns_zero_on_success/failure attributes
  2021-11-15  7:03   ` Prathamesh Kulkarni
@ 2021-11-15 14:45     ` Peter Zijlstra
  2021-11-15 22:30       ` David Malcolm
  2021-11-15 22:12     ` David Malcolm
  1 sibling, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2021-11-15 14:45 UTC (permalink / raw)
  To: Prathamesh Kulkarni; +Cc: David Malcolm, gcc-patches, linux-toolchains

On Mon, Nov 15, 2021 at 12:33:16PM +0530, Prathamesh Kulkarni wrote:
> On Sun, 14 Nov 2021 at 02:07, David Malcolm via Gcc-patches

> > +/* Handle "returns_zero_on_failure" and "returns_zero_on_success" attributes;
> > +   arguments as in struct attribute_spec.handler.  */
> > +
> > +static tree
> > +handle_returns_zero_on_attributes (tree *node, tree name, tree, int,
> > +                                  bool *no_add_attrs)
> > +{
> > +  if (!INTEGRAL_TYPE_P (TREE_TYPE (*node)))
> > +    {
> > +      error ("%qE attribute on a function not returning an integral type",
> > +            name);
> > +      *no_add_attrs = true;
> > +    }
> > +  return NULL_TREE;
> Hi David,
> Just curious if a warning should be emitted if the function is marked
> with the attribute but its return value isn't actually 0 ?
> 
> There are other constants like -1 or 1 that are often used to indicate
> error, so maybe tweak the attribute to
> take the integer as an argument ?
> Sth like returns_int_on_success(cst) / returns_int_on_failure(cst) ?
> 
> Also, would it make sense to extend it for pointers too for returning
> NULL on success / failure ?

Please also consider that in Linux we use the 'last' page for error code
returns. That is, a function returning a pointer could return '(void
*)-EFAULT' also see linux/err.h

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/6] Add returns_zero_on_success/failure attributes
  2021-11-15  7:03   ` Prathamesh Kulkarni
  2021-11-15 14:45     ` Peter Zijlstra
@ 2021-11-15 22:12     ` David Malcolm
  2021-11-17  9:23       ` Prathamesh Kulkarni
  1 sibling, 1 reply; 39+ messages in thread
From: David Malcolm @ 2021-11-15 22:12 UTC (permalink / raw)
  To: Prathamesh Kulkarni; +Cc: gcc-patches, linux-toolchains

On Mon, 2021-11-15 at 12:33 +0530, Prathamesh Kulkarni wrote:
> On Sun, 14 Nov 2021 at 02:07, David Malcolm via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> > 
> > This patch adds two new attributes.  The followup patch makes use of
> > the attributes in -fanalyzer.

[...snip...]

> > +/* Handle "returns_zero_on_failure" and "returns_zero_on_success"
> > attributes;
> > +   arguments as in struct attribute_spec.handler.  */
> > +
> > +static tree
> > +handle_returns_zero_on_attributes (tree *node, tree name, tree,
> > int,
> > +                                  bool *no_add_attrs)
> > +{
> > +  if (!INTEGRAL_TYPE_P (TREE_TYPE (*node)))
> > +    {
> > +      error ("%qE attribute on a function not returning an
> > integral type",
> > +            name);
> > +      *no_add_attrs = true;
> > +    }
> > +  return NULL_TREE;
> Hi David,

Thanks for the ideas.

> Just curious if a warning should be emitted if the function is marked
> with the attribute but its return value isn't actually 0 ?

That sounds like a worthwhile extension of the idea.  It should be
possible to identify functions that can't return zero or non-zero that
have been marked as being able to.

That said:

(a) if you apply the attribute to a function pointer for a callback,
you could have an implementation of the callback that always fails and
returns, say, -1; should the warning complain that the function has the
"returns_zero_on_success" property and is always returning -1?

(b) the attributes introduce a concept of "success" vs "failure", which
might be hard for a machine to determine.  It's only used later on in
terms of the events presented to the user, so that -fanalyzer can emit
e.g. "when 'copy_from_user' fails", which IMHO is easier to read than
"when 'copy_from_user' returns non-zero".

> 
> There are other constants like -1 or 1 that are often used to indicate
> error, so maybe tweak the attribute to
> take the integer as an argument ?
> Sth like returns_int_on_success(cst) / returns_int_on_failure(cst) ?

Those could work nicely; I like the idea of them being supplementary to
the returns_zero_on_* ones.

I got the urge to bikeshed about wording; some ideas:
  success_return_value(CST)
  failure_return_value(CST)
or maybe additionally:
  success_return_range(LOWER_BOUND_CST, UPPER_BOUND_CST)
  failure_return_range(LOWER_BOUND_CST, UPPER_BOUND_CST)

I can also imagine a
  sets_errno_on_failure
attribute being useful (and perhaps a "doesnt_touch_errno"???)

> Also, would it make sense to extend it for pointers too for returning
> NULL on success / failure ?

Possibly expressible by generalizing it to allow pointer types, or by
adding this pair:

  returns_null_on_failure
  returns_null_on_success

or by using the "range" idea above.

In terms of scope, for the trust boundary stuff, I want to be able to
express the idea that a call can succeed vs fail, what the success vs
failure is in terms of nonzero vs zero, and to be able to wire up the
heuristic that if it looks like a "copy function" (use of access
attributes and a size), that success/failure can mean "copies all of
it" vs "copies none of it" (which seems to get decent test coverage on
the Linux kernel with the copy_from/to_user fns).

Thanks
Dave


> 
> Thanks,
> Prathamesh

[...snip...]


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/6] Add returns_zero_on_success/failure attributes
  2021-11-15 14:45     ` Peter Zijlstra
@ 2021-11-15 22:30       ` David Malcolm
  0 siblings, 0 replies; 39+ messages in thread
From: David Malcolm @ 2021-11-15 22:30 UTC (permalink / raw)
  To: Peter Zijlstra, Prathamesh Kulkarni; +Cc: gcc-patches, linux-toolchains

On Mon, 2021-11-15 at 15:45 +0100, Peter Zijlstra wrote:
> On Mon, Nov 15, 2021 at 12:33:16PM +0530, Prathamesh Kulkarni wrote:
> > On Sun, 14 Nov 2021 at 02:07, David Malcolm via Gcc-patches
> 
> > > +/* Handle "returns_zero_on_failure" and "returns_zero_on_success"
> > > attributes;
> > > +   arguments as in struct attribute_spec.handler.  */
> > > +
> > > +static tree
> > > +handle_returns_zero_on_attributes (tree *node, tree name, tree,
> > > int,
> > > +                                  bool *no_add_attrs)
> > > +{
> > > +  if (!INTEGRAL_TYPE_P (TREE_TYPE (*node)))
> > > +    {
> > > +      error ("%qE attribute on a function not returning an
> > > integral type",
> > > +            name);
> > > +      *no_add_attrs = true;
> > > +    }
> > > +  return NULL_TREE;
> > Hi David,
> > Just curious if a warning should be emitted if the function is marked
> > with the attribute but its return value isn't actually 0 ?
> > 
> > There are other constants like -1 or 1 that are often used to
> > indicate
> > error, so maybe tweak the attribute to
> > take the integer as an argument ?
> > Sth like returns_int_on_success(cst) / returns_int_on_failure(cst) ?
> > 
> > Also, would it make sense to extend it for pointers too for returning
> > NULL on success / failure ?
> 
> Please also consider that in Linux we use the 'last' page for error
> code
> returns. That is, a function returning a pointer could return '(void
> *)-EFAULT' also see linux/err.h
> 

Thanks.

Am I right in thinking that such functions return non-NULL, giving
something like:

  __attribute__((returns_ptr_in_range_on_success (0x1, NULL - 4096)))
  __attribute__((returns_ptr_in_range_on_failure (NULL - 4096, NULL - 1)))
  __attribute__((returns_nonnull))

as attributes?  (I have no idea if the above will parse, and I admit
these look ugly as-is, though I suppose they could be hidden behind a
macro).

Looking at include/linux/err.h I see functions:

static inline bool __must_check IS_ERR(__force const void *ptr)
{
	return IS_ERR_VALUE((unsigned long)ptr);
}

static inline bool __must_check IS_ERR_OR_NULL(__force const void *ptr)
{
	return unlikely(!ptr) || IS_ERR_VALUE((unsigned long)ptr);
}

so maybe attribute could refer to predicate functions, something like
this:

  __attribute__((return_value_success_predicate(FUNCTION_DECL)))
  __attribute__((return_value_failure_predicate(FUNCTION_DECL)))

where this case could use something like:

  __attribute__((return_value_failure_predicate(IS_ERR)))

to express the idea "this function can succeed or fail, and the given
function decl expresses whether a given return value is a failure" - or
somesuch.  The predicate function would probably have to be pure.

Obviously I'm just brainstorming here; as noted in my reply to
Prathamesh, all I need for the initial implementation of the trust
boundary work is just being able to express that zero vs non-zero
return is the success vs failure condition for a function.

Dave



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/6] Add returns_zero_on_success/failure attributes
  2021-11-15 22:12     ` David Malcolm
@ 2021-11-17  9:23       ` Prathamesh Kulkarni
  2021-11-17 22:43         ` Joseph Myers
  2021-11-18 23:15         ` David Malcolm
  0 siblings, 2 replies; 39+ messages in thread
From: Prathamesh Kulkarni @ 2021-11-17  9:23 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc-patches, linux-toolchains

On Tue, 16 Nov 2021 at 03:42, David Malcolm <dmalcolm@redhat.com> wrote:
>
> On Mon, 2021-11-15 at 12:33 +0530, Prathamesh Kulkarni wrote:
> > On Sun, 14 Nov 2021 at 02:07, David Malcolm via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > This patch adds two new attributes.  The followup patch makes use of
> > > the attributes in -fanalyzer.
>
> [...snip...]
>
> > > +/* Handle "returns_zero_on_failure" and "returns_zero_on_success"
> > > attributes;
> > > +   arguments as in struct attribute_spec.handler.  */
> > > +
> > > +static tree
> > > +handle_returns_zero_on_attributes (tree *node, tree name, tree,
> > > int,
> > > +                                  bool *no_add_attrs)
> > > +{
> > > +  if (!INTEGRAL_TYPE_P (TREE_TYPE (*node)))
> > > +    {
> > > +      error ("%qE attribute on a function not returning an
> > > integral type",
> > > +            name);
> > > +      *no_add_attrs = true;
> > > +    }
> > > +  return NULL_TREE;
> > Hi David,
>
> Thanks for the ideas.
>
> > Just curious if a warning should be emitted if the function is marked
> > with the attribute but its return value isn't actually 0 ?
>
> That sounds like a worthwhile extension of the idea.  It should be
> possible to identify functions that can't return zero or non-zero that
> have been marked as being able to.
>
> That said:
>
> (a) if you apply the attribute to a function pointer for a callback,
> you could have an implementation of the callback that always fails and
> returns, say, -1; should the warning complain that the function has the
> "returns_zero_on_success" property and is always returning -1?
Ah OK. In that case, emitting a diagnostic if the return value
isn't 0, doesn't make sense for "returns_zero_on_success" since the
function "always fails".
Thanks for pointing out!
>
> (b) the attributes introduce a concept of "success" vs "failure", which
> might be hard for a machine to determine.  It's only used later on in
> terms of the events presented to the user, so that -fanalyzer can emit
> e.g. "when 'copy_from_user' fails", which IMHO is easier to read than
> "when 'copy_from_user' returns non-zero".
Indeed.
>
> >
> > There are other constants like -1 or 1 that are often used to indicate
> > error, so maybe tweak the attribute to
> > take the integer as an argument ?
> > Sth like returns_int_on_success(cst) / returns_int_on_failure(cst) ?
>
> Those could work nicely; I like the idea of them being supplementary to
> the returns_zero_on_* ones.
>
> I got the urge to bikeshed about wording; some ideas:
>   success_return_value(CST)
>   failure_return_value(CST)
> or maybe additionally:
>   success_return_range(LOWER_BOUND_CST, UPPER_BOUND_CST)
>   failure_return_range(LOWER_BOUND_CST, UPPER_BOUND_CST)
Extending to range is a nice idea ;-)
Apart from success / failure, if we just had an attribute
return_range(low_cst, high_cst), I suppose that could
be useful for return value optimization ?
>
> I can also imagine a
>   sets_errno_on_failure
> attribute being useful (and perhaps a "doesnt_touch_errno"???)
More generally, would it be a good idea to provide attributes for
mod/ref analysis ?
So sth like:
void foo(void) __attribute__((modifies(errno)));
which would state that foo modifies errno, but neither reads nor
modifies any other global var.
and
void bar(void) __attribute__((reads(errno)));
which would state that bar only reads errno, and doesn't modify or
read any other global var.
I guess that can benefit optimization, since we can have better
context about side-effects of a function call.
For success / failure context, we could add attributes
modifies_on_success, modifies_on_failure ?

Thanks,
Prathamesh
>
> > Also, would it make sense to extend it for pointers too for returning
> > NULL on success / failure ?
>
> Possibly expressible by generalizing it to allow pointer types, or by
> adding this pair:
>
>   returns_null_on_failure
>   returns_null_on_success
>
> or by using the "range" idea above.
>
> In terms of scope, for the trust boundary stuff, I want to be able to
> express the idea that a call can succeed vs fail, what the success vs
> failure is in terms of nonzero vs zero, and to be able to wire up the
> heuristic that if it looks like a "copy function" (use of access
> attributes and a size), that success/failure can mean "copies all of
> it" vs "copies none of it" (which seems to get decent test coverage on
> the Linux kernel with the copy_from/to_user fns).
>
> Thanks
> Dave
>
>
> >
> > Thanks,
> > Prathamesh
>
> [...snip...]
>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/6] Add returns_zero_on_success/failure attributes
  2021-11-17  9:23       ` Prathamesh Kulkarni
@ 2021-11-17 22:43         ` Joseph Myers
  2021-11-18 20:08           ` Segher Boessenkool
  2021-11-18 23:34           ` David Malcolm
  2021-11-18 23:15         ` David Malcolm
  1 sibling, 2 replies; 39+ messages in thread
From: Joseph Myers @ 2021-11-17 22:43 UTC (permalink / raw)
  To: Prathamesh Kulkarni; +Cc: David Malcolm, gcc-patches, linux-toolchains

On Wed, 17 Nov 2021, Prathamesh Kulkarni via Gcc-patches wrote:

> More generally, would it be a good idea to provide attributes for
> mod/ref analysis ?
> So sth like:
> void foo(void) __attribute__((modifies(errno)));
> which would state that foo modifies errno, but neither reads nor
> modifies any other global var.
> and
> void bar(void) __attribute__((reads(errno)));
> which would state that bar only reads errno, and doesn't modify or
> read any other global var.

Many math.h functions are const except for possibly setting errno, 
possibly raising floating-point exceptions (which might have other effects 
when using alternate exception handling) and possibly reading the rounding 
mode.  To represent that, it might be useful for such attributes to be 
able to describe state (such as the floating-point environment) that 
doesn't correspond to a C identifier.  (errno tends to be a macro, so 
referring to it as such in an attribute may be awkward as well.)

(See also <http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2825.htm> with 
some proposals for features to describe const/pure-like properties of 
functions.)

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/6] Add returns_zero_on_success/failure attributes
  2021-11-17 22:43         ` Joseph Myers
@ 2021-11-18 20:08           ` Segher Boessenkool
  2021-11-18 23:45             ` David Malcolm
  2021-11-18 23:34           ` David Malcolm
  1 sibling, 1 reply; 39+ messages in thread
From: Segher Boessenkool @ 2021-11-18 20:08 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Prathamesh Kulkarni, David Malcolm, gcc-patches, linux-toolchains

On Wed, Nov 17, 2021 at 10:43:58PM +0000, Joseph Myers wrote:
> On Wed, 17 Nov 2021, Prathamesh Kulkarni via Gcc-patches wrote:
> > More generally, would it be a good idea to provide attributes for
> > mod/ref analysis ?
> > So sth like:
> > void foo(void) __attribute__((modifies(errno)));
> > which would state that foo modifies errno, but neither reads nor
> > modifies any other global var.
> > and
> > void bar(void) __attribute__((reads(errno)));
> > which would state that bar only reads errno, and doesn't modify or
> > read any other global var.
> 
> Many math.h functions are const except for possibly setting errno, 
> possibly raising floating-point exceptions (which might have other effects 
> when using alternate exception handling) and possibly reading the rounding 
> mode.  To represent that, it might be useful for such attributes to be 
> able to describe state (such as the floating-point environment) that 
> doesn't correspond to a C identifier.  (errno tends to be a macro, so 
> referring to it as such in an attribute may be awkward as well.)

We need some way to describe these things in Gimple and RTL as well,
and not just on function calls: also on other expressions.  Adding
attributes that allow describing this (partially, only per function) in
C source code does not bring us closer to where we need to be.


Segher


* Re: [PATCH 2/6] Add returns_zero_on_success/failure attributes
  2021-11-17  9:23       ` Prathamesh Kulkarni
  2021-11-17 22:43         ` Joseph Myers
@ 2021-11-18 23:15         ` David Malcolm
  1 sibling, 0 replies; 39+ messages in thread
From: David Malcolm @ 2021-11-18 23:15 UTC (permalink / raw)
  To: Prathamesh Kulkarni; +Cc: gcc-patches, linux-toolchains

On Wed, 2021-11-17 at 14:53 +0530, Prathamesh Kulkarni wrote:
> On Tue, 16 Nov 2021 at 03:42, David Malcolm <dmalcolm@redhat.com>
> wrote:
> > 
> > On Mon, 2021-11-15 at 12:33 +0530, Prathamesh Kulkarni wrote:
> > > On Sun, 14 Nov 2021 at 02:07, David Malcolm via Gcc-patches
> > > <gcc-patches@gcc.gnu.org> wrote:
> > > > 
> > > > This patch adds two new attributes.  The followup patch makes
> > > > use of
> > > > the attributes in -fanalyzer.
> > 
> > [...snip...]
> > 
> > > > +/* Handle "returns_zero_on_failure" and
> > > > "returns_zero_on_success"
> > > > attributes;
> > > > +   arguments as in struct attribute_spec.handler.  */
> > > > +
> > > > +static tree
> > > > +handle_returns_zero_on_attributes (tree *node, tree name,
> > > > tree,
> > > > int,
> > > > +                                  bool *no_add_attrs)
> > > > +{
> > > > +  if (!INTEGRAL_TYPE_P (TREE_TYPE (*node)))
> > > > +    {
> > > > +      error ("%qE attribute on a function not returning an
> > > > integral type",
> > > > +            name);
> > > > +      *no_add_attrs = true;
> > > > +    }
> > > > +  return NULL_TREE;
> > > Hi David,
> > 
> > Thanks for the ideas.
> > 
> > > Just curious if a warning should be emitted if the function is
> > > marked
> > > with the attribute but its return value isn't actually 0 ?
> > 
> > That sounds like a worthwhile extension of the idea.  It should be
> > possible to identify functions that can't return zero or non-zero
> > that
> > have been marked as being able to.
> > 
> > That said:
> > 
> > (a) if you apply the attribute to a function pointer for a
> > callback,
> > you could have an implementation of the callback that always fails
> > and
> > returns, say, -1; should the warning complain that the function has
> > the
> > "returns_zero_on_success" property and is always returning -1?
> Ah OK. In that case, emitting a diagnostic if the return value
> isn't 0, doesn't make sense for "returns_zero_on_success" since the
> function "always fails".
> Thanks for pointing out!
> > 
> > (b) the attributes introduce a concept of "success" vs "failure",
> > which
> > might be hard for a machine to determine.  It's only used later on
> > in
> > terms of the events presented to the user, so that -fanalyzer can
> > emit
> > e.g. "when 'copy_from_user' fails", which IMHO is easier to read
> > than
> > "when 'copy_from_user' returns non-zero".
> Indeed.
> > 
> > > 
> > > There are other constants like -1 or 1 that are often used to
> > > indicate
> > > error, so maybe tweak the attribute to
> > > take the integer as an argument ?
> > > Sth like returns_int_on_success(cst) /
> > > returns_int_on_failure(cst) ?
> > 
> > Those could work nicely; I like the idea of them being
> > supplementary to
> > the returns_zero_on_* ones.
> > 
> > I got the urge to bikeshed about wording; some ideas:
> >   success_return_value(CST)
> >   failure_return_value(CST)
> > or maybe additionally:
> >   success_return_range(LOWER_BOUND_CST, UPPER_BOUND_CST)
> >   failure_return_range(LOWER_BOUND_CST, UPPER_BOUND_CST)
> Extending to range is a nice idea ;-)
> Apart from success / failure, if we just had an attribute
> return_range(low_cst, high_cst), I suppose that could
> be useful for return value optimization ?

Perhaps.  All of this sounds like scope creep beyond my immediate
requirements though :)

> > 
> > I can also imagine a
> >   sets_errno_on_failure
> > attribute being useful (and perhaps a "doesnt_touch_errno"???)
> More generally, would it be a good idea to provide attributes for
> mod/ref analysis ?
> So sth like:
> void foo(void) __attribute__((modifies(errno)));
> which would state that foo modifies errno, but neither reads nor
> modifies any other global var.
> and
> void bar(void) __attribute__((reads(errno)))
> which would state that bar only reads errno, and doesn't modify or
> read any other global var.
> I guess that can benefit optimization, since we can have better
> context about side-effects of a function call.
> For success / failure context, we could add attributes
> modifies_on_success, modifies_on_failure ?

Likewise - sounds potentially useful, but I don't need this for this
kernel trust boundaries patch kit.

Dave

> 
> Thanks,
> Prathamesh
> > 
> > > Also, would it make sense to extend it for pointers too for
> > > returning
> > > NULL on success / failure ?
> > 
> > Possibly expressible by generalizing it to allow pointer types, or
> > by
> > adding this pair:
> > 
> >   returns_null_on_failure
> >   returns_null_on_success
> > 
> > or by using the "range" idea above.
> > 
> > In terms of scope, for the trust boundary stuff, I want to be able
> > to
> > express the idea that a call can succeed vs fail, what the success
> > vs
> > failure is in terms of nonzero vs zero, and to be able to wire up
> > the
> > heuristic that if it looks like a "copy function" (use of access
> > attributes and a size), that success/failure can mean "copies all
> > of
> > it" vs "copies none of it" (which seems to get decent test coverage
> > on
> > the Linux kernel with the copy_from/to_user fns).
> > 
> > Thanks
> > Dave
> > 
> > 
> > > 
> > > Thanks,
> > > Prathamesh
> > 
> > [...snip...]
> > 
> 




* Re: [PATCH 2/6] Add returns_zero_on_success/failure attributes
  2021-11-17 22:43         ` Joseph Myers
  2021-11-18 20:08           ` Segher Boessenkool
@ 2021-11-18 23:34           ` David Malcolm
  2021-12-06 18:34             ` Martin Sebor
  1 sibling, 1 reply; 39+ messages in thread
From: David Malcolm @ 2021-11-18 23:34 UTC (permalink / raw)
  To: Joseph Myers, Prathamesh Kulkarni; +Cc: gcc-patches, linux-toolchains

On Wed, 2021-11-17 at 22:43 +0000, Joseph Myers wrote:
> On Wed, 17 Nov 2021, Prathamesh Kulkarni via Gcc-patches wrote:
> 
> > More generally, would it be a good idea to provide attributes for
> > mod/ref analysis ?
> > So sth like:
> > void foo(void) __attribute__((modifies(errno)));
> > which would state that foo modifies errno, but neither reads nor
> > modifies any other global var.
> > and
> > void bar(void) __attribute__((reads(errno)))
> > which would state that bar only reads errno, and doesn't modify or
> > read any other global var.
> 
> Many math.h functions are const except for possibly setting errno, 
> possibly raising floating-point exceptions (which might have other
> effects 
> when using alternate exception handling) and possibly reading the
> rounding 
> mode.  To represent that, it might be useful for such attributes to
> be 
> able to describe state (such as the floating-point environment) that 
> doesn't correspond to a C identifier.  (errno tends to be a macro, so
> referring to it as such in an attribute may be awkward as well.)
> 
> (See also <http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2825.htm>
> with 
> some proposals for features to describe const/pure-like properties of
> functions.)
> 

Thanks for the link.

As noted in my reply to Prathamesh, these ideas sound interesting, but
this thread seems to be entering scope creep - I don't need these ideas
to implement this patch kit (but I do need the attributes specified in
the patch, or similar).  

Do the specific attributes I posted sound reasonable?  (without
necessarily going into a full review).

If we're thinking longer term, I want the ability to express that a
function can have multiple outcomes (e.g. "success" vs "failure" or
"found" vs "not found", etc), and it might be good to have a way to
attach attributes to those outcomes.  Unfortunately the attribute
syntax is flat, but maybe there could be a two level hierarchy,
something like:

int foo (args)
  __attribute__((outcome("success")
                 __attribute__((return_value(0))))
  __attribute__((outcome("failure")
                 __attribute__((return_value_ne(0)))
                 __attribute__((modifies(errno)))));

Or given that we're enamored by Lisp-ish DSLs we could go the whole hog
and have something like:

int foo (args)
  __attribute ((semantics(
    "(def-outcomes (success (return-value (eq 0))"
    "              (failure (return-value (ne 0)"
    "                        modifies (errno))))")));

which may be over-engineering things :)


Going back to the patch itself, returns_zero_on_success/failure get me
what I want to express for finding trust boundaries in the Linux
kernel, have obvious meaning to a programmer (helpful even w/o compiler
support), and could interoperate with one of the more elaborate ideas in
this thread.
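
A minimal sketch of how such an annotation might look on an ordinary
function.  The attribute is only proposed in this patch kit, so the
sketch assumes it may be absent and guards it with __has_attribute;
the macro RETURNS_ZERO_ON_SUCCESS and the function copy_checked are
hypothetical names used for illustration only:

```c
#include <stddef.h>
#include <string.h>

/* Older compilers may lack __has_attribute; treat everything as absent. */
#ifndef __has_attribute
# define __has_attribute(x) 0
#endif

/* Assumption: the attribute name follows patch 2 of this kit.  On
   compilers without it the macro expands to nothing, so the annotation
   still documents intent for a human reader. */
#if __has_attribute(returns_zero_on_success)
# define RETURNS_ZERO_ON_SUCCESS __attribute__((returns_zero_on_success))
#else
# define RETURNS_ZERO_ON_SUCCESS
#endif

/* Hypothetical bounded copy: returns 0 on success, nonzero on failure.
   On failure nothing is written to dst. */
RETURNS_ZERO_ON_SUCCESS
int copy_checked(void *dst, size_t dst_cap, const void *src, size_t n)
{
    if (n > dst_cap)
        return -1;          /* failure path: dst is left untouched */
    memcpy(dst, src, n);
    return 0;               /* success path: all n bytes copied */
}
```

With the attribute available, an analyzer could explore the "returns 0"
and "returns nonzero" paths of each call separately; without it the
code builds unchanged.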

Hope this is constructive
Dave







* Re: [PATCH 2/6] Add returns_zero_on_success/failure attributes
  2021-11-18 20:08           ` Segher Boessenkool
@ 2021-11-18 23:45             ` David Malcolm
  2021-11-19 21:52               ` Segher Boessenkool
  0 siblings, 1 reply; 39+ messages in thread
From: David Malcolm @ 2021-11-18 23:45 UTC (permalink / raw)
  To: Segher Boessenkool, Joseph Myers
  Cc: Prathamesh Kulkarni, gcc-patches, linux-toolchains

On Thu, 2021-11-18 at 14:08 -0600, Segher Boessenkool wrote:
> On Wed, Nov 17, 2021 at 10:43:58PM +0000, Joseph Myers wrote:
> > On Wed, 17 Nov 2021, Prathamesh Kulkarni via Gcc-patches wrote:
> > > More generally, would it be a good idea to provide attributes for
> > > mod/ref analysis ?
> > > So sth like:
> > > void foo(void) __attribute__((modifies(errno)));
> > > which would state that foo modifies errno, but neither reads nor
> > > modifies any other global var.
> > > and
> > > void bar(void) __attribute__((reads(errno)))
> > > which would state that bar only reads errno, and doesn't modify
> > > or
> > > read any other global var.
> > 
> > Many math.h functions are const except for possibly setting errno, 
> > possibly raising floating-point exceptions (which might have other
> > effects 
> > when using alternate exception handling) and possibly reading the
> > rounding 
> > mode.  To represent that, it might be useful for such attributes to
> > be 
> > able to describe state (such as the floating-point environment)
> > that 
> > doesn't correspond to a C identifier.  (errno tends to be a macro,
> > so 
> > referring to it as such in an attribute may be awkward as well.)
> 
> We need some way to describe these things in Gimple and RTL as well,
> and not just on function calls: also on other expressions.  Adding
> attributes that allow to describe this (partially, only per function)
> in
> C source code does not bring us closer to where we need to be.

Right, but those IR concerns are orthogonal to the needs of the patch
kit, which is a way to express certain *other* things per-function in
the C frontend.  

As noted in my other replies, this thread seems to be turning into
something of a scope-creep pile-on, when I have some specific things I
need for the rest of the patch kit, and they're unrelated to the
problems of errno or floating-point handling.

Dave



* Re: [PATCH 2/6] Add returns_zero_on_success/failure attributes
  2021-11-18 23:45             ` David Malcolm
@ 2021-11-19 21:52               ` Segher Boessenkool
  0 siblings, 0 replies; 39+ messages in thread
From: Segher Boessenkool @ 2021-11-19 21:52 UTC (permalink / raw)
  To: David Malcolm
  Cc: Joseph Myers, Prathamesh Kulkarni, gcc-patches, linux-toolchains

On Thu, Nov 18, 2021 at 06:45:42PM -0500, David Malcolm wrote:
> On Thu, 2021-11-18 at 14:08 -0600, Segher Boessenkool wrote:
> > We need some way to describe these things in Gimple and RTL as well,
> > and not just on function calls: also on other expressions.  Adding
> > attributes that allow to describe this (partially, only per function)
> > in
> > C source code does not bring us closer to where we need to be.
> 
> Right, but those IR concerns are orthogonal to the needs of the patch
> kit, which is a way to express certain *other* things per-function in
> the C frontend.  

My fear is that such band-aids will only make attacking the
long-standing hard problems even harder.

> As noted in my other replies, this thread seems to be turning into
> something of a scope-creep pile-on, when I have some specific things I
> need for the rest of the patch kit, and they're unrelated to the
> problems of errno or floating-point handling.

I am just asking to think about the broader picture, and see how this
fits in there.


Segher


* Re: [PATCH 0/6] RFC: adding support to GCC for detecting trust boundaries
  2021-11-13 20:37 [PATCH 0/6] RFC: adding support to GCC for detecting trust boundaries David Malcolm
                   ` (8 preceding siblings ...)
  2021-11-14 13:54 ` Miguel Ojeda
@ 2021-12-06 18:12 ` Martin Sebor
  2021-12-06 19:40   ` Segher Boessenkool
  2021-12-08 23:11   ` David Malcolm
  9 siblings, 2 replies; 39+ messages in thread
From: Martin Sebor @ 2021-12-06 18:12 UTC (permalink / raw)
  To: David Malcolm, gcc-patches, linux-toolchains

On 11/13/21 1:37 PM, David Malcolm via Gcc-patches wrote:
> [Crossposting between gcc-patches@gcc.gnu.org and
> linux-toolchains@vger.kernel.org; sorry about my lack of kernel
> knowledge, in case any of the following seems bogus]
> 
> I've been trying to turn my prototype from the LPC2021 session on
> "Adding kernel-specific test coverage to GCC's -fanalyzer option"
> ( https://linuxplumbersconf.org/event/11/contributions/1076/ ) into
> something that can go into GCC upstream without adding kernel-specific
> special cases, or requiring a GCC plugin.  The prototype simply
> specialcased "copy_from_user" and "copy_to_user" in GCC, which is
> clearly not OK.
> 
> This GCC patch kit implements detection of "trust boundaries", aimed at
> detection of "infoleaks" and of use of unsanitized attacker-controlled
> values ("taint") in the Linux kernel.
> 
> For example, here's an infoleak diagnostic (using notes to
> express what fields and padding within a struct have not been
> initialized):
> 
> infoleak-CVE-2011-1078-2.c: In function ‘test_1’:
> infoleak-CVE-2011-1078-2.c:28:9: warning: potential exposure of sensitive
>    information by copying uninitialized data from stack across trust
>    boundary [CWE-200] [-Wanalyzer-exposure-through-uninit-copy]
>     28 |         copy_to_user(optval, &cinfo, sizeof(cinfo));
>        |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>    ‘test_1’: events 1-3
>      |
>      |   21 |         struct sco_conninfo cinfo;
>      |      |                             ^~~~~
>      |      |                             |
>      |      |                             (1) region created on stack here
>      |      |                             (2) capacity: 6 bytes
>      |......
>      |   28 |         copy_to_user(optval, &cinfo, sizeof(cinfo));
>      |      |         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>      |      |         |
>      |      |         (3) uninitialized data copied from stack here
>      |
> infoleak-CVE-2011-1078-2.c:28:9: note: 1 byte is uninitialized
>     28 |         copy_to_user(optval, &cinfo, sizeof(cinfo));
>        |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> infoleak-CVE-2011-1078-2.c:14:15: note: padding after field ‘dev_class’ is uninitialized (1 byte)
>     14 |         __u8  dev_class[3];
>        |               ^~~~~~~~~
> infoleak-CVE-2011-1078-2.c:21:29: note: suggest forcing zero-initialization by providing a ‘{0}’ initializer
>     21 |         struct sco_conninfo cinfo;
>        |                             ^~~~~
>        |                                   = {0}
> 
> I have to come up with a way of expressing trust boundaries in a way
> that will be:
> - acceptable to the GCC community (not be too kernel-specific), and
> - useful to the Linux kernel community.
> 
> At LPC it was pointed out that the kernel already has various
> annotations e.g. "__user" for different kinds of pointers, and that it
> would be best to reuse those.
> 
> 
> Approach 1: Custom Address Spaces
> =================================
> 
> GCC's C frontend supports target-specific address spaces; see:
>    https://gcc.gnu.org/onlinedocs/gcc/Named-Address-Spaces.html
> Quoting the N1275 draft of ISO/IEC DTR 18037:
>    "Address space names are ordinary identifiers, sharing the same name
>    space as variables and typedef names.  Any such names follow the same
>    rules for scope as other ordinary identifiers (such as typedef names).
>    An implementation may provide an implementation-defined set of
>    intrinsic address spaces that are, in effect, predefined at the start
>    of every translation unit.  The names of intrinsic address spaces must
>    be reserved identifiers (beginning with an underscore and an uppercase
>    letter or with two underscores).  An implementation may also
>    optionally support a means for new address space names to be defined
>    within a translation unit."
> 
> Patch 1a in the following patch kit for GCC implements such a means to
> define new address spaces names in a translation unit, via a pragma:
>    #pragma GCC custom_address_space(NAME_OF_ADDRESS_SPACE)
> 
> For example, the Linux kernel could perhaps write:
> 
>    #define __kernel
>    #pragma GCC custom_address_space(__user)
>    #pragma GCC custom_address_space(__iomem)
>    #pragma GCC custom_address_space(__percpu)
>    #pragma GCC custom_address_space(__rcu)
> 
> and thus the C frontend can complain about code that mismatches __user
> and kernel pointers, e.g.:
> 
> custom-address-space-1.c: In function ‘test_argpass_to_p’:
> custom-address-space-1.c:29:14: error: passing argument 1 of ‘accepts_p’
> from pointer to non-enclosed address space
>     29 |   accepts_p (p_user);
>        |              ^~~~~~
> custom-address-space-1.c:21:24: note: expected ‘void *’ but argument is
> of type ‘__user void *’
>     21 | extern void accepts_p (void *);
>        |                        ^~~~~~
> custom-address-space-1.c: In function ‘test_cast_k_to_u’:
> custom-address-space-1.c:135:12: warning: cast to ‘__user’ address space
> pointer from disjoint generic address space pointer
>    135 |   p_user = (void __user *)p_kernel;
>        |            ^

This seems like an excellent use of named address spaces :)

I'm familiar with TR 18037 but I'm not an expert on this stuff
so I can't really say a whole lot more.

My only suggestion here is to follow the terminology from
there in the naming of the pragma, unless you have some reason
not to.  I'd also recommend considering other implementations
of named address spaces, if there are any, especially those
that try to be compatible with GCC.  If there are none, rather
than custom_address_space I'd suggest either just address_space
or named_address_space.

I have not yet looked at the implementation so this is just
a high-level comment on the design.

> The patch doesn't yet maintain a good distinction between implicit
> target-specific address spaces and user-defined address spaces, has at
> least one known major bug, and has only been lightly tested.  I can
> fix these issues, but was hoping for feedback that this approach is the
> right direction from both the GCC and Linux development communities.
> 
> Implementation status: doesn't yet bootstrap; am running into stage2
> vs stage3 comparison issues.
> 
> 
> Approach 2: An "untrusted" attribute
> ====================================
> 
> Alternatively, patch 1b in the kit implements:
> 
>    __attribute__((untrusted))
> 
> which can be applied to types as a qualifier (similarly to const,
> volatile, etc) to mark a trust boundary, hence the kernel could have:
> 
>    #define __user __attribute__((untrusted))
> 
> where my patched GCC treats
>    T *
> vs
>    T __attribute__((untrusted)) *
> as being different types and thus the C frontend can complain (even without
> -fanalyzer) about e.g.:
> 
> extern void accepts_p(void *);
> 
> void test_argpass_to_p(void __user *p_user)
> {
>    accepts_p(p_user);
> }
> 
> untrusted-pointer-1.c: In function ‘test_argpass_to_p’:
> untrusted-pointer-1.c:22:13: error: passing argument 1 of ‘accepts_p’
> from pointer with different trust level
>     22 |   accepts_p(p_user);
>        |              ^~~~~~
> untrusted-pointer-1.c:14:23: note: expected ‘void *’ but argument is of
> type ‘__attribute__((untrusted)) void *’
>     14 | extern void accepts_p(void *);
>        |                        ^~~~~~
> 
> So you'd get enforcement of __user vs non-__user pointers as part of
> GCC's regular type-checking.  (You need an explicit cast to convert
> between the untrusted vs trusted types).

As with the named address space idea, this approach also looks
reasonable to me.  If you anticipate using the attribute only
in the analyzer, I would suggest introducing it in
the analyzer's namespace (e.g., analyzer::untrusted, or even
gnu::analyzer::untrusted).

I'll try to look at the patch itself sometime later this week
and comment on the implementation there.

> 
> This approach is much less expressive than the custom address space
> approach; it would only cover the trust boundary aspect; it wouldn't
> cover any differences between generic pointers and __user, vs __iomem,
> __percpu, and __rcu which I admit I only dimly understand.
> 
> Implementation status: bootstraps and passes regression testing.
> Builds most of the kernel, but am running into various conversion
> issues.  It would be good to have some clarity on what conversions
> the compiler ought to warn about, and what conversions should be OK.
> 
> 
> Approach 3: some kind of custom qualifier
> =========================================
> 
> Approach 1 extends the existing "named address space" machinery to add
> new values; approach 2 adds a new flag to cv-qualifiers.  Both of these
> approaches work in terms of cv-qualifiers.  We have some spare bits
> available for these; perhaps a third approach could be to add a new
> kind of user-defined qualifier, like named address spaces, but orthogonal
> to them.   I haven't attempted to implement this.

I'm afraid I don't understand what this would be useful for
enough to comment.

> Other attributes
> ================
> 
> Patch 2 in the kit adds:
>    __attribute__((returns_zero_on_success))
> and
>    __attribute__((returns_zero_on_failure))
> as hints to the analyzer that it's worth bifurcating the analysis of
> such functions (to explore failure vs success, and thus to better
> explore error-handling paths).  It's also a hint to the human reader of
> the source code.

I think being able to express something along these lines would
be useful even outside the analyzer, both for warnings and, when
done right, perhaps also for optimization.  So I'm in favor of
something like this.  I'll just reiterate here the comment on
this attribute I sent you privately some time ago.

A more general attribute would also make it possible to specify
the value (or argument) on success and failure.  With those we
would be able to express the return values of the POSIX read and
write functions and others like it:

   ssize_t read (int fildes, void *buf, size_t nbyte);
   ssize_t write (int fildes, const void *buf, size_t nbyte);

I.e., it would be nice to express that the return value is
also the number of bytes (elements?) of the array the function
wrote into.  This, along with symbolic evaluation in the middle
end, would then let us detect uninitialized reads back in
the function's caller (after read) and similar.
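
A minimal sketch of the caller-side pattern this would help check,
assuming a POSIX system.  The helper names sum_bytes_read and
demo_partial_read are hypothetical; only read, write, pipe and close
are real interfaces.  The point is that only the first n bytes of the
buffer are initialized after read returns n, so the caller must not
look past them:

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: performs one read() into buf (capacity cap) and
   sums only the bytes that read() reported as written.  Returns -1 if
   read() fails. */
long sum_bytes_read(int fd, unsigned char *buf, size_t cap)
{
    ssize_t n = read(fd, buf, cap);
    if (n < 0)
        return -1;
    long sum = 0;
    /* Only buf[0..n-1] is initialized; iterating up to cap instead of n
       would read uninitialized memory when the read is short. */
    for (ssize_t i = 0; i < n; i++)
        sum += buf[i];
    return sum;
}

/* Hypothetical driver: sends 3 bytes through a pipe into a 16-byte,
   deliberately uninitialized buffer, forcing a short read. */
long demo_partial_read(void)
{
    int fds[2];
    unsigned char buf[16];
    const unsigned char msg[3] = { 1, 2, 3 };

    if (pipe(fds) != 0)
        return -1;
    if (write(fds[1], msg, sizeof msg) != (ssize_t) sizeof msg) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    long sum = sum_bytes_read(fds[0], buf, sizeof buf);
    close(fds[0]);
    close(fds[1]);
    return sum;
}
```

An attribute tying the return value to the number of elements written
would let the compiler flag a loop bounded by the buffer capacity
rather than by n.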

This is just an idea, and there may be more general approaches
that would be even more expressive.  But it's probably too late
in the development cycle to design and add those to GCC 12.

As I promised, I'll try to look at the meat of each patch and
give you some comments, hopefully later this week.

Martin

> 
> Given the above, the kernel could then have:
> 
> extern int copy_from_user(void *to, const void __user *from, long n)
>    __attribute__((access (write_only, 1, 3),
> 		 access (read_only, 2, 3),
> 		 returns_zero_on_success));
> 
> extern long copy_to_user(void __user *to, const void *from, unsigned long n)
>    __attribute__((access (write_only, 1, 3),
> 		 access (read_only, 2, 3),
> 		 returns_zero_on_success));
> 
> with suitable macros in compiler.h or whatnot.
> 
> ("access" is an existing GCC attribute; see
>   https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html )
> 
> My patched GCC adds a heuristic to -fanalyzer that a 3-argument function
> with a read_only buffer, a write_only buffer and a shared size argument
> is a "copy function", and treats it as a copy from *from to *to of up to
> n bytes that succeeds, or, given one of the above attributes can succeed
> or fail.  I'm wiring things up so that values read from *untrusted_ptr
> are tracked as tainted, and values written to *untrusted_ptr are treated
> as possible infoleaks (e.g. uninitialized values written to
> *untrusted_ptr are specifically called out).  This gets the extra
> checking for infoleaks and taint that my earlier prototype had, but is
> thus expressed via attributes, without having to have kernel-specific
> special cases.
> 
> Patch 3 of the kit adds infoleak detection to GCC's -fanalyzer (as
> in the example above).
> 
> Possibly silly question: is it always a bug for the value of a kernel
> pointer to leak into user space?  i.e. should I be complaining about an
> infoleak if the value of a trusted_ptr itself is written to
> *untrusted_ptr?  e.g.
> 
>    s.p = some_kernel_ptr;
>    copy_to_user(user_p, &s, sizeof (s));
>       /* value of some_kernel_ptr is written to user space;
>          is this something we should warn for?  */
> 
> Patch 4a/4b wire up the different implementations of "untrusted" into
> GCC's -fanalyzer, which is used by...
> 
> Patch 5 uses this so that "untrusted" values are used in taint detection
> in the analyzer, so that it can complain about attacker-controlled
> values being used without sanitization.
> 
> Patch 6 adds a new __attribute__ ((tainted)) allowing for further
> taint detection (e.g. identifying syscalls), with minimal patching of
> the kernel, and without requiring a lot of link-time interprocedural
> analysis.  I believe that some of this could work independently of
> the trust boundary marking from the rest of the patch kit.
> 
> The combined patch kit (using approach 2 i.e. the "b" patches)
> successfully bootstraps and passes regression testing on
> x86_64-pc-linux-gnu.
> 
> 
> Which of the 3 approaches looks best to:
> - the GCC community?
> - the Linux kernel community?
> 
> Does clang/LLVM have anything similar?
> 
> There are many examples in the patches, some of which are taken from
> historical kernel vulnerabilities, and others from my "antipatterns.ko"
> project ( https://github.com/davidmalcolm/antipatterns.ko ).
> 
> Thoughts?
> 
> Dave
> 
> 
> David Malcolm (6 or 8, depending how you count):
>    1a: RFC: Implement "#pragma GCC custom_address_space"
>    1b: Add __attribute__((untrusted))
>    2: Add returns_zero_on_success/failure attributes
>    3: analyzer: implement infoleak detection
>    4a: analyzer: implemention of region::untrusted_p in terms of custom
>      address spaces
>    4b: analyzer: implement region::untrusted_p in terms of
>      __attribute__((untrusted))
>    5: analyzer: use region::untrusted_p in taint detection
>    6: Add __attribute__ ((tainted))
> 
>   gcc/Makefile.in                               |   3 +-
>   gcc/analyzer/analyzer.opt                     |  20 +
>   gcc/analyzer/checker-path.cc                  | 104 +++
>   gcc/analyzer/checker-path.h                   |  47 +
>   gcc/analyzer/diagnostic-manager.cc            |  75 +-
>   gcc/analyzer/diagnostic-manager.h             |   3 +-
>   gcc/analyzer/engine.cc                        | 342 ++++++-
>   gcc/analyzer/exploded-graph.h                 |   3 +
>   gcc/analyzer/pending-diagnostic.cc            |  30 +
>   gcc/analyzer/pending-diagnostic.h             |  24 +
>   gcc/analyzer/program-state.cc                 |  26 +-
>   gcc/analyzer/region-model-impl-calls.cc       |  26 +-
>   gcc/analyzer/region-model.cc                  | 504 ++++++++++-
>   gcc/analyzer/region-model.h                   |  46 +-
>   gcc/analyzer/region.cc                        |  52 ++
>   gcc/analyzer/region.h                         |   4 +
>   gcc/analyzer/sm-taint.cc                      | 839 ++++++++++++++++--
>   gcc/analyzer/sm.h                             |   9 +
>   gcc/analyzer/store.h                          |   1 +
>   gcc/analyzer/trust-boundaries.cc              | 615 +++++++++++++
>   gcc/c-family/c-attribs.c                      | 132 +++
>   gcc/c-family/c-pretty-print.c                 |   2 +
>   gcc/c/c-typeck.c                              |  64 ++
>   gcc/doc/extend.texi                           |  63 +-
>   gcc/doc/invoke.texi                           |  80 +-
>   gcc/print-tree.c                              |   3 +
>   .../c-c++-common/attr-returns-zero-on-1.c     |  68 ++
>   gcc/testsuite/c-c++-common/attr-untrusted-1.c | 165 ++++
>   .../gcc.dg/analyzer/attr-tainted-1.c          |  88 ++
>   .../gcc.dg/analyzer/attr-tainted-misuses.c    |   6 +
>   .../gcc.dg/analyzer/copy-function-1.c         |  98 ++
>   .../gcc.dg/analyzer/copy_from_user-1.c        |  45 +
>   gcc/testsuite/gcc.dg/analyzer/infoleak-1.c    | 181 ++++
>   gcc/testsuite/gcc.dg/analyzer/infoleak-2.c    |  29 +
>   gcc/testsuite/gcc.dg/analyzer/infoleak-3.c    | 141 +++
>   gcc/testsuite/gcc.dg/analyzer/infoleak-5.c    |  35 +
>   .../analyzer/infoleak-CVE-2011-1078-1.c       | 134 +++
>   .../analyzer/infoleak-CVE-2011-1078-2.c       |  42 +
>   .../analyzer/infoleak-CVE-2014-1446-1.c       | 117 +++
>   .../analyzer/infoleak-CVE-2017-18549-1.c      | 101 +++
>   .../analyzer/infoleak-CVE-2017-18550-1.c      | 171 ++++
>   .../gcc.dg/analyzer/infoleak-antipatterns-1.c | 162 ++++
>   .../gcc.dg/analyzer/infoleak-fixit-1.c        |  22 +
>   gcc/testsuite/gcc.dg/analyzer/pr93382.c       |   2 +-
>   .../analyzer/taint-CVE-2011-0521-1-fixed.c    | 113 +++
>   .../gcc.dg/analyzer/taint-CVE-2011-0521-1.c   | 113 +++
>   .../analyzer/taint-CVE-2011-0521-2-fixed.c    |  93 ++
>   .../gcc.dg/analyzer/taint-CVE-2011-0521-2.c   |  93 ++
>   .../analyzer/taint-CVE-2011-0521-3-fixed.c    |  56 ++
>   .../gcc.dg/analyzer/taint-CVE-2011-0521-3.c   |  57 ++
>   .../gcc.dg/analyzer/taint-CVE-2011-0521-4.c   |  40 +
>   .../gcc.dg/analyzer/taint-CVE-2011-0521-5.c   |  42 +
>   .../gcc.dg/analyzer/taint-CVE-2011-0521-6.c   |  37 +
>   .../gcc.dg/analyzer/taint-CVE-2011-0521.h     | 136 +++
>   .../gcc.dg/analyzer/taint-CVE-2011-2210-1.c   |  93 ++
>   .../gcc.dg/analyzer/taint-CVE-2020-13143-1.c  |  38 +
>   .../gcc.dg/analyzer/taint-CVE-2020-13143-2.c  |  32 +
>   .../gcc.dg/analyzer/taint-CVE-2020-13143.h    |  91 ++
>   gcc/testsuite/gcc.dg/analyzer/taint-alloc-1.c |  64 ++
>   gcc/testsuite/gcc.dg/analyzer/taint-alloc-2.c |  27 +
>   gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c |  21 +
>   gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c |  31 +
>   .../gcc.dg/analyzer/taint-antipatterns-1.c    | 137 +++
>   .../gcc.dg/analyzer/taint-divisor-1.c         |  26 +
>   .../{taint-1.c => taint-read-index-1.c}       |  19 +-
>   .../gcc.dg/analyzer/taint-read-offset-1.c     | 128 +++
>   .../taint-read-through-untrusted-ptr-1.c      |  37 +
>   gcc/testsuite/gcc.dg/analyzer/taint-size-1.c  |  32 +
>   .../gcc.dg/analyzer/taint-write-index-1.c     | 132 +++
>   .../gcc.dg/analyzer/taint-write-offset-1.c    | 132 +++
>   gcc/testsuite/gcc.dg/analyzer/test-uaccess.h  |  19 +
>   .../torture/infoleak-net-ethtool-ioctl.c      |  78 ++
>   .../torture/infoleak-vfio_iommu_type1.c       |  39 +
>   gcc/tree-core.h                               |   6 +-
>   gcc/tree.c                                    |   1 +
>   gcc/tree.h                                    |  11 +-
>   76 files changed, 6558 insertions(+), 140 deletions(-)
>   create mode 100644 gcc/analyzer/trust-boundaries.cc
>   create mode 100644 gcc/testsuite/c-c++-common/attr-returns-zero-on-1.c
>   create mode 100644 gcc/testsuite/c-c++-common/attr-untrusted-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/attr-tainted-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/attr-tainted-misuses.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/copy-function-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/copy_from_user-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-2.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-3.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-5.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-CVE-2011-1078-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-CVE-2011-1078-2.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-CVE-2014-1446-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-CVE-2017-18549-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-CVE-2017-18550-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-antipatterns-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-fixit-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-1-fixed.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-2-fixed.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-2.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-3-fixed.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-3.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-4.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-5.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-6.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521.h
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-2210-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-2.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143.h
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-alloc-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-alloc-2.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-antipatterns-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-divisor-1.c
>   rename gcc/testsuite/gcc.dg/analyzer/{taint-1.c => taint-read-index-1.c} (72%)
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-read-offset-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-read-through-untrusted-ptr-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-size-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-write-index-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-write-offset-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/test-uaccess.h
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/torture/infoleak-net-ethtool-ioctl.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/torture/infoleak-vfio_iommu_type1.c
> 



* Re: [PATCH 2/6] Add returns_zero_on_success/failure attributes
  2021-11-18 23:34           ` David Malcolm
@ 2021-12-06 18:34             ` Martin Sebor
  0 siblings, 0 replies; 39+ messages in thread
From: Martin Sebor @ 2021-12-06 18:34 UTC (permalink / raw)
  To: David Malcolm, Joseph Myers, Prathamesh Kulkarni
  Cc: gcc-patches, linux-toolchains

On 11/18/21 4:34 PM, David Malcolm via Gcc-patches wrote:
> On Wed, 2021-11-17 at 22:43 +0000, Joseph Myers wrote:
>> On Wed, 17 Nov 2021, Prathamesh Kulkarni via Gcc-patches wrote:
>>
>>> More generally, would it be a good idea to provide attributes for
>>> mod/ref analysis?
>>> So something like:
>>> void foo(void) __attribute__((modifies(errno)));
>>> which would state that foo modifies errno, but neither reads nor
>>> modifies any other global var.
>>> and
>>> void bar(void) __attribute__((reads(errno)))
>>> which would state that bar only reads errno, and doesn't modify or
>>> read any other global var.
>>
>> Many math.h functions are const except for possibly setting errno,
>> possibly raising floating-point exceptions (which might have other
>> effects
>> when using alternate exception handling) and possibly reading the
>> rounding
>> mode.  To represent that, it might be useful for such attributes to
>> be
>> able to describe state (such as the floating-point environment) that
>> doesn't correspond to a C identifier.  (errno tends to be a macro, so
>> referring to it as such in an attribute may be awkward as well.)
>>
>> (See also <http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2825.htm>
>> with
>> some proposals for features to describe const/pure-like properties of
>> functions.)
>>
> 
> Thanks for the link.
> 
> As noted in my reply to Prathamesh, these ideas sound interesting, but
> this thread seems to be entering scope creep - I don't need these ideas
> to implement this patch kit (but I do need the attributes specified in
> the patch, or similar).
> 
> Do the specific attributes I posted sound reasonable?  (without
> necessarily going in to a full review).
> 
> If we're thinking longer term, I want the ability to express that a
> function can have multiple outcomes (e.g. "success" vs "failure" or
> "found" vs "not found", etc), and it might be good to have a way to
> attach attributes to those outcomes.  Unfortunately the attribute
> syntax is flat, but maybe there could be a two level hierarchy,
> something like:
> 
> int foo (args)
>    __attribute__((outcome("success")
>                   __attribute__((return_value(0))))
>    __attribute__((outcome("failure")
>                   __attribute__((return_value_ne(0))
>                   __attribute__((modifies(errno)))));
> 
> Or given that we're enamored by Lisp-ish DSLs we could go the whole hog
> and have something like:
> 
> int foo (args)
>    __attribute ((semantics(
>      "(def-outcomes (success (return-value (eq 0))"
>      "              (failure (return-value (ne 0)"
>      "                        modifies (errno))))")));
> 
> which may be over-engineering things :)

For a fully general solution, one that can express (nearly)
arbitrarily complex pre-conditions and invariants, I'd look
at the ideas in the C++ contracts papers.  I don't know if
any of the proposals (there were quite a few) made it possible
to specify postconditions involving function return values,
but I'd think that could be overcome by introducing some
special token like __retval.

Syntactically, one of the nice things about contracts that
I hope should be possible to implement in our attributes is
a way to refer to formal function arguments by name rather
than by their position in the argument list.  With that,
the expressivity goes up dramatically because it becomes
possible to use any C expression.
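
To make the positional-versus-named distinction concrete, here is a
small sketch.  The "access" attribute is the existing positional form;
the named postcondition is only emulated with assert(), since no such
attribute syntax exists in GCC today.  bounded_copy and the ACCESS
macro are names invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Expand to nothing on compilers that lack __has_attribute or the
   "access" attribute.  */
#ifndef __has_attribute
# define __has_attribute(x) 0
#endif
#if __has_attribute(access)
# define ACCESS(mode, ptr_idx, size_idx) \
    __attribute__((access(mode, ptr_idx, size_idx)))
#else
# define ACCESS(mode, ptr_idx, size_idx)
#endif

/* Today: arguments are identified by position (1 = dst, 2 = src, 3 = n).
   A hypothetical named form, NOT valid in any GCC release, might read:
     post (__retval <= n)  */
ACCESS(write_only, 1, 3) ACCESS(read_only, 2, 3)
static size_t bounded_copy(void *dst, const void *src, size_t n,
                           size_t src_len)
{
    /* Copy at most n bytes, and never more than the source holds.  */
    size_t retval = n < src_len ? n : src_len;
    memcpy(dst, src, retval);
    /* Run-time stand-in for the named postcondition.  */
    assert(retval <= n);
    return retval;
}
```

With names instead of indices, the condition could mention n directly
and would survive a reordering of the parameters.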

Martin

> Going back to the patch itself, returns_zero_on_success/failure get me
> what I want to express for finding trust boundaries in the Linux
> kernel, have obvious meaning to a programmer (helpful even w/o compiler
> support), and could interoperate with one of the more elaborate ideas
> in this thread.
> 
> Hope this is constructive
> Dave
> 
> 
> 
> 
> 



* Re: [PATCH 0/6] RFC: adding support to GCC for detecting trust boundaries
  2021-12-06 18:12 ` Martin Sebor
@ 2021-12-06 19:40   ` Segher Boessenkool
  2021-12-09  0:06     ` David Malcolm
  2021-12-09 16:42     ` Martin Sebor
  2021-12-08 23:11   ` David Malcolm
  1 sibling, 2 replies; 39+ messages in thread
From: Segher Boessenkool @ 2021-12-06 19:40 UTC (permalink / raw)
  To: Martin Sebor; +Cc: David Malcolm, gcc-patches, linux-toolchains

On Mon, Dec 06, 2021 at 11:12:00AM -0700, Martin Sebor wrote:
> On 11/13/21 1:37 PM, David Malcolm via Gcc-patches wrote:
> >Approach 1: Custom Address Spaces
> >=================================
> >
> >GCC's C frontend supports target-specific address spaces; see:
> >   https://gcc.gnu.org/onlinedocs/gcc/Named-Address-Spaces.html
> >Quoting the N1275 draft of ISO/IEC DTR 18037:
> >   "Address space names are ordinary identifiers, sharing the same name
> >   space as variables and typedef names.  Any such names follow the same
> >   rules for scope as other ordinary identifiers (such as typedef names).
> >   An implementation may provide an implementation-defined set of
> >   intrinsic address spaces that are, in effect, predefined at the start
> >   of every translation unit.  The names of intrinsic address spaces must
> >   be reserved identifiers (beginning with an underscore and an uppercase
> >   letter or with two underscores).  An implementation may also
> >   optionally support a means for new address space names to be defined
> >   within a translation unit."
> >
> >Patch 1a in the following patch kit for GCC implements such a means to
> >define new address space names in a translation unit, via a pragma:
> >   #pragma GCC custom_address_space(NAME_OF_ADDRESS_SPACE)
> >
> >For example, the Linux kernel could perhaps write:
> >
> >   #define __kernel
> >   #pragma GCC custom_address_space(__user)
> >   #pragma GCC custom_address_space(__iomem)
> >   #pragma GCC custom_address_space(__percpu)
> >   #pragma GCC custom_address_space(__rcu)
> >
> >and thus the C frontend can complain about code that mismatches __user
> >and kernel pointers, e.g.:
> >
> >custom-address-space-1.c: In function ‘test_argpass_to_p’:
> >custom-address-space-1.c:29:14: error: passing argument 1 of 
> >‘accepts_p’
> >from pointer to non-enclosed address space
> >    29 |   accepts_p (p_user);
> >       |              ^~~~~~
> >custom-address-space-1.c:21:24: note: expected ‘void *’ but argument is
> >of type ‘__user void *’
> >    21 | extern void accepts_p (void *);
> >       |                        ^~~~~~
> >custom-address-space-1.c: In function ‘test_cast_k_to_u’:
> >custom-address-space-1.c:135:12: warning: cast to ‘__user’ address 
> >space
> >pointer from disjoint generic address space pointer
> >   135 |   p_user = (void __user *)p_kernel;
> >       |            ^
> 
> This seems like an excellent use of named address spaces :)

It has some big problems though.

Named address spaces are completely target-specific.  Defining them with
a pragma like this does not allow you to set the pointer mode or
anything related to a custom LEGITIMATE_ADDRESS_P.  It does not allow
you to say zero pointers are invalid in some address spaces and not in
others.  You cannot provide any of the DWARF address space stuff this
way.  But most importantly, there are only four bits for the address
space field internally, and they are used by however a backend wants to
use them.

None of this is unsolvable, but all of it will have to be solved.

IMO it will be best to not mix this with address spaces in the user
interface (it is of course fine to *implement* it like that, or with
big overlap at least).

> >The patch doesn't yet maintain a good distinction between implicit
> >target-specific address spaces and user-defined address spaces,

And that will have to be fixed in the user code syntax at least.

> >has at
> >least one known major bug, and has only been lightly tested.  I can
> >fix these issues, but was hoping for feedback that this approach is the
> >right direction from both the GCC and Linux development communities.

Allowing the user to define new address spaces does not jibe well with
how targets do (validly!) use them.

> >Approach 2: An "untrusted" attribute
> >====================================
> >
> >Alternatively, patch 1b in the kit implements:
> >
> >   __attribute__((untrusted))
> >
> >which can be applied to types as a qualifier (similarly to const,
> >volatile, etc) to mark a trust boundary, hence the kernel could have:
> >
> >   #define __user __attribute__((untrusted))
> >
> >where my patched GCC treats
> >   T *
> >vs
> >   T __attribute__((untrusted)) *
> >as being different types and thus the C frontend can complain (even without
> >-fanalyzer) about e.g.:
> >
> >extern void accepts_p(void *);
> >
> >void test_argpass_to_p(void __user *p_user)
> >{
> >   accepts_p(p_user);
> >}
> >
> >untrusted-pointer-1.c: In function ‘test_argpass_to_p’:
> >untrusted-pointer-1.c:22:13: error: passing argument 1 of ‘accepts_p’
> >from pointer with different trust level
> >    22 |   accepts_p(p_user);
> >       |              ^~~~~~
> >untrusted-pointer-1.c:14:23: note: expected ‘void *’ but argument is of
> >type ‘__attribute__((untrusted)) void *’
> >    14 | extern void accepts_p(void *);
> >       |                        ^~~~~~
> >
> >So you'd get enforcement of __user vs non-__user pointers as part of
> >GCC's regular type-checking.  (You need an explicit cast to convert
> >between the untrusted vs trusted types).
> 
> As with the named address space idea, this approach also looks
> reasonable to me.  If you anticipate using the attribute only
> in the analyzer I would suggest to consider introducing it in
> the analyzer's namespace (e.g., analyzer::untrusted, or even
> gnu::analyzer::untrusted).

I don't see any fundamental problems with this approach.  It also is
very much in line with how Perl handles this (and some copycat languages
do as well), the "tainted" flag on data.
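
As a rough illustration of the "tainted flag on data" idea, here is a
run-time analogue in C.  This is only a sketch: the patch kit tracks
taint statically in the analyzer, not with a flag stored next to the
value, and every name below (tainted_size, from_untrusted, sanitize,
checked_read) is made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical run-time analogue of a "tainted" flag on data.  */
struct tainted_size {
    size_t value;
    bool tainted;   /* true until the value has been range-checked */
};

/* Values arriving from outside the trust boundary start out tainted.  */
static struct tainted_size from_untrusted(size_t raw)
{
    struct tainted_size t = { raw, true };
    return t;
}

/* Clear the flag only if the value is below LIMIT.
   Returns true if the value is now safe to use as an index.  */
static bool sanitize(struct tainted_size *t, size_t limit)
{
    if (t->value >= limit)
        return false;
    t->tainted = false;
    return true;
}

/* Returns the element, or -1 if IDX was never sanitized.  */
static int checked_read(const int *table, struct tainted_size idx)
{
    if (idx.tainted)
        return -1;
    return table[idx.value];
}
```

The point of doing this in the type system or the analyzer instead is
that the "forgot to sanitize" case is reported at compile time rather
than refused at run time.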

> >This approach is much less expressive than the custom address space
> >approach; it would only cover the trust boundary aspect; it wouldn't
> >cover any differences between generic pointers and __user, vs __iomem,
> >__percpu, and __rcu which I admit I only dimly understand.

Yes, it does not have any of the big problems that come with those
address spaces either!  :-)

> >Other attributes
> >================
> >
> >Patch 2 in the kit adds:
> >   __attribute__((returns_zero_on_success))
> >and
> >   __attribute__((returns_nonzero_on_success))
> >as hints to the analyzer that it's worth bifurcating the analysis of
> >such functions (to explore failure vs success, and thus to better
> >explore error-handling paths).  It's also a hint to the human reader of
> >the source code.
> 
> I think being able to express something along these lines would
> be useful even outside the analyzer, both for warnings and, when
> done right, perhaps also for optimization.  So I'm in favor of
> something like this.  I'll just reiterate here the comment on
> this attribute I sent you privately some time ago.

What is "success" though?  You probably want it so some checker can make
sure you do handle failure some way, but how do you see what is handling
failure and what is handling the successful case?
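
For concreteness, this is the shape of code the question is about.  It
is only a sketch: fake_copy_from_user is a hypothetical user-space
stand-in that follows the kernel convention of returning the number of
bytes that could not be copied (so zero means success), and
handle_request is likewise invented for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for copy_from_user: copies up to N bytes but
   stops after READABLE bytes.  Returns the number of bytes NOT copied,
   so 0 means success.  */
static unsigned long fake_copy_from_user(void *to, const void *from,
                                         unsigned long n,
                                         unsigned long readable)
{
    unsigned long ok = n < readable ? n : readable;
    memcpy(to, from, ok);
    return n - ok;
}

static int handle_request(int *out, const void *user_buf,
                          unsigned long readable)
{
    int tmp;

    if (fake_copy_from_user(&tmp, user_buf, sizeof tmp, readable) != 0)
        return -EFAULT;   /* failure path: tmp may be only partly written */

    *out = tmp;           /* success path: tmp is fully written */
    return 0;
}
```

Here the branch on "!= 0" is what separates the two outcomes; a checker
still has to work out which arm is which from the attribute plus the
comparison, which is the hard part being asked about.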


Segher


* Re: [PATCH 0/6] RFC: adding support to GCC for detecting trust boundaries
  2021-12-06 18:12 ` Martin Sebor
  2021-12-06 19:40   ` Segher Boessenkool
@ 2021-12-08 23:11   ` David Malcolm
  1 sibling, 0 replies; 39+ messages in thread
From: David Malcolm @ 2021-12-08 23:11 UTC (permalink / raw)
  To: Martin Sebor, gcc-patches, linux-toolchains

On Mon, 2021-12-06 at 11:12 -0700, Martin Sebor wrote:
> On 11/13/21 1:37 PM, David Malcolm via Gcc-patches wrote:
> > [Crossposting between gcc-patches@gcc.gnu.org and
> > linux-toolchains@vger.kernel.org; sorry about my lack of kernel
> > knowledge, in case any of the following seems bogus]
> > 
> > I've been trying to turn my prototype from the LPC2021 session on
> > "Adding kernel-specific test coverage to GCC's -fanalyzer option"
> > ( https://linuxplumbersconf.org/event/11/contributions/1076/ ) into
> > something that can go into GCC upstream without adding kernel-
> > specific
> > special cases, or requiring a GCC plugin.  The prototype simply
> > specialcased "copy_from_user" and "copy_to_user" in GCC, which is
> > clearly not OK.
> > 
> > This GCC patch kit implements detection of "trust boundaries", aimed
> > at
> > detection of "infoleaks" and of use of unsanitized attacker-
> > controlled
> > values ("taint") in the Linux kernel.
> > 
> > For example, here's an infoleak diagnostic (using notes to
> > express what fields and padding within a struct have not been
> > initialized):
> > 
> > infoleak-CVE-2011-1078-2.c: In function ‘test_1’:
> > infoleak-CVE-2011-1078-2.c:28:9: warning: potential exposure of
> > sensitive
> >    information by copying uninitialized data from stack across trust
> >    boundary [CWE-200] [-Wanalyzer-exposure-through-uninit-copy]
> >     28 |         copy_to_user(optval, &cinfo, sizeof(cinfo));
> >        |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >    ‘test_1’: events 1-3
> >      |
> >      |   21 |         struct sco_conninfo cinfo;
> >      |      |                             ^~~~~
> >      |      |                             |
> >      |      |                             (1) region created on stack
> > here
> >      |      |                             (2) capacity: 6 bytes
> >      |......
> >      |   28 |         copy_to_user(optval, &cinfo, sizeof(cinfo));
> >      |      |         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >      |      |         |
> >      |      |         (3) uninitialized data copied from stack here
> >      |
> > infoleak-CVE-2011-1078-2.c:28:9: note: 1 byte is uninitialized
> >     28 |         copy_to_user(optval, &cinfo, sizeof(cinfo));
> >        |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > infoleak-CVE-2011-1078-2.c:14:15: note: padding after field
> > ‘dev_class’ is uninitialized (1 byte)
> >     14 |         __u8  dev_class[3];
> >        |               ^~~~~~~~~
> > infoleak-CVE-2011-1078-2.c:21:29: note: suggest forcing zero-
> > initialization by providing a ‘{0}’ initializer
> >     21 |         struct sco_conninfo cinfo;
> >        |                             ^~~~~
> >        |                                   = {0}
> > 
> > I have to come up with a way of expressing trust boundaries in a way
> > that will be:
> > - acceptable to the GCC community (not be too kernel-specific), and
> > - useful to the Linux kernel community.
> > 
> > At LPC it was pointed out that the kernel already has various
> > annotations e.g. "__user" for different kinds of pointers, and that
> > it
> > would be best to reuse those.
> > 
> > 
> > Approach 1: Custom Address Spaces
> > =================================
> > 
> > GCC's C frontend supports target-specific address spaces; see:
> >    https://gcc.gnu.org/onlinedocs/gcc/Named-Address-Spaces.html
> > Quoting the N1275 draft of ISO/IEC DTR 18037:
> >    "Address space names are ordinary identifiers, sharing the same
> > name
> >    space as variables and typedef names.  Any such names follow the
> > same
> >    rules for scope as other ordinary identifiers (such as typedef
> > names).
> >    An implementation may provide an implementation-defined set of
> >    intrinsic address spaces that are, in effect, predefined at the
> > start
> >    of every translation unit.  The names of intrinsic address spaces
> > must
> >    be reserved identifiers (beginning with an underscore and an
> > uppercase
> >    letter or with two underscores).  An implementation may also
> >    optionally support a means for new address space names to be
> > defined
> >    within a translation unit."
> > 
> > Patch 1a in the following patch kit for GCC implements such a means
> > to
> > define new address space names in a translation unit, via a pragma:
> >    #pragma GCC custom_address_space(NAME_OF_ADDRESS_SPACE)
> > 
> > For example, the Linux kernel could perhaps write:
> > 
> >    #define __kernel
> >    #pragma GCC custom_address_space(__user)
> >    #pragma GCC custom_address_space(__iomem)
> >    #pragma GCC custom_address_space(__percpu)
> >    #pragma GCC custom_address_space(__rcu)
> > 
> > and thus the C frontend can complain about code that mismatches
> > __user
> > and kernel pointers, e.g.:
> > 
> > custom-address-space-1.c: In function ‘test_argpass_to_p’:
> > custom-address-space-1.c:29:14: error: passing argument 1 of
> > ‘accepts_p’
> > from pointer to non-enclosed address space
> >     29 |   accepts_p (p_user);
> >        |              ^~~~~~
> > custom-address-space-1.c:21:24: note: expected ‘void *’ but argument
> > is
> > of type ‘__user void *’
> >     21 | extern void accepts_p (void *);
> >        |                        ^~~~~~
> > custom-address-space-1.c: In function ‘test_cast_k_to_u’:
> > custom-address-space-1.c:135:12: warning: cast to ‘__user’ address
> > space
> > pointer from disjoint generic address space pointer
> >    135 |   p_user = (void __user *)p_kernel;
> >        |            ^
> 
> This seems like an excellent use of named address spaces :)
> 
> I'm familiar with TR 18037 but I'm not an expert on this stuff
> so I can't really say a whole lot more.
> 
> My only suggestion here is to follow the terminology from
> there in the naming of the pragma, unless you have some reason
> not to.  I'd also recommend to consider other implementations
> of named address spaces, if there are any, especially those
> that try to be compatible with GCC.  If there are none, rather
> than custom_address_space I'd suggest either just address_space
> or named_address_space.

True, but the syntax is introducing a new address space.  As noted in
patch 1a, the current implementation of custom address spaces hardcodes
quite a lot, and we might need extra syntax to allow further tweaking.
Currently each custom address space is hardcoded to be:

- disjoint from all other address spaces, *including* the generic one

- treated the same as the generic address space at the RTL level (in
  terms of code generation)

- treated as "untrusted" by -fanalyzer in a follow-up patch.
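
A header could then pick a definition of __user depending on what the
tool supports.  The following is a sketch under assumptions:
HAVE_GCC_CUSTOM_ADDRESS_SPACE is a hypothetical feature macro, the
pragma exists only in patch 1a, and the __CHECKER__ branch is modelled
on how I believe the kernel defines __user for sparse.
user_ptr_is_null is invented for the example.

```c
#include <assert.h>
#include <stddef.h>

#if defined(HAVE_GCC_CUSTOM_ADDRESS_SPACE)  /* hypothetical feature macro */
# pragma GCC custom_address_space(__user)   /* syntax from patch 1a only */
#elif defined(__CHECKER__)                  /* sparse */
# define __user __attribute__((noderef, address_space(__user)))
#else
# define __user                             /* plain compiler: no checking */
#endif

/* Only inspects the pointer value and never dereferences it, so it is
   meaningful whichever definition of __user is in effect.  */
static int user_ptr_is_null(const void __user *p)
{
    return !p;
}
```

Since a custom address space is treated like the generic one at the RTL
level, the generated code should be the same in all three branches; only
the front-end type checking differs.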

> 
> I have not yet looked at the implementation so this is just
> a high-level comment on the design.
> 
> > The patch doesn't yet maintain a good distinction between implicit
> > target-specific address spaces and user-defined address spaces, has
> > at
> > least one known major bug, and has only been lightly tested.  I can
> > fix these issues, but was hoping for feedback that this approach is
> > the
> > right direction from both the GCC and Linux development
> > communities.
> > 
> > Implementation status: doesn't yet bootstrap; am running into
> > stage2
> > vs stage3 comparison issues.
> > 
> > 
> > Approach 2: An "untrusted" attribute
> > ====================================
> > 
> > Alternatively, patch 1b in the kit implements:
> > 
> >    __attribute__((untrusted))
> > 
> > which can be applied to types as a qualifier (similarly to const,
> > volatile, etc) to mark a trust boundary, hence the kernel could
> > have:
> > 
> >    #define __user __attribute__((untrusted))
> > 
> > where my patched GCC treats
> >    T *
> > vs
> >    T __attribute__((untrusted)) *
> > as being different types and thus the C frontend can complain (even
> > without
> > -fanalyzer) about e.g.:
> > 
> > extern void accepts_p(void *);
> > 
> > void test_argpass_to_p(void __user *p_user)
> > {
> >    accepts_p(p_user);
> > }
> > 
> > untrusted-pointer-1.c: In function ‘test_argpass_to_p’:
> > untrusted-pointer-1.c:22:13: error: passing argument 1 of
> > ‘accepts_p’
> > from pointer with different trust level
> >     22 |   accepts_p(p_user);
> >        |              ^~~~~~
> > untrusted-pointer-1.c:14:23: note: expected ‘void *’ but argument
> > is of
> > type ‘__attribute__((untrusted)) void *’
> >     14 | extern void accepts_p(void *);
> >        |                        ^~~~~~
> > 
> > So you'd get enforcement of __user vs non-__user pointers as part
> > of
> > GCC's regular type-checking.  (You need an explicit cast to convert
> > between the untrusted vs trusted types).
> 
> As with the named address space idea, this approach also looks
> reasonable to me.  If you anticipate using the attribute only
> in the analyzer I would suggest to consider introducing it in
> the analyzer's namespace (e.g., analyzer::untrusted, or even
> gnu::analyzer::untrusted).

Is there a way to spell that using the GCC attribute syntax in C?

IIRC C23's attribute syntax allows just one level of namespacing, and
the attribute-prefix is just an identifier, so perhaps
gnu_analyzer::untrusted or gcc_analyzer::untrusted?

> 
> I'll try to look at the patch itself sometime later this week
> and comment on the implementation there.
> 
> > 
> > This approach is much less expressive than the custom address space
> > approach; it would only cover the trust boundary aspect; it
> > wouldn't
> > cover any differences between generic pointers and __user, vs
> > __iomem,
> > __percpu, and __rcu which I admit I only dimly understand.
> > 
> > Implementation status: bootstraps and passes regression testing.
> > Builds most of the kernel, but am running into various conversion
> > issues.  It would be good to have some clarity on what conversions
> > the compiler ought to warn about, and what conversions should be
> > OK.
> > 
> > 
> > Approach 3: some kind of custom qualifier
> > =========================================
> > 
> > Approach 1 extends the existing "named address space" machinery to
> > add
> > new values; approach 2 adds a new flag to cv-qualifiers.  Both of
> > these
> > approaches work in terms of cv-qualifiers.  We have some spare bits
> > available for these; perhaps a third approach could be to add a new
> > kind of user-defined qualifier, like named address spaces, but
> > orthogonal
> > to them.   I haven't attempted to implement this.
> 
> I'm afraid I don't understand what this would be useful for
> enough to comment.
> 
> > Other attributes
> > ================
> > 
> > Patch 2 in the kit adds:
> >    __attribute__((returns_zero_on_success))
> > and
> >    __attribute__((returns_nonzero_on_success))
> > as hints to the analyzer that it's worth bifurcating the analysis
> > of
> > such functions (to explore failure vs success, and thus to better
> > explore error-handling paths).  It's also a hint to the human
> > reader of
> > the source code.
> 
> I think being able to express something along these lines would
> be useful even outside the analyzer, both for warnings and, when
> done right, perhaps also for optimization.  So I'm in favor of
> something like this.  I'll just reiterate here the comment on
> this attribute I sent you privately some time ago.
> 
> A more general attribute would also make it possible to specify
> the value (or argument) on success and failure.  With those we
> would be able to express the return values of the POSIX read and
> write functions and others like it:
> 
>    ssize_t read (int fildes, void *buf, size_t nbyte);
>    ssize_t write (int fildes, const void *buf, size_t nbyte);
> 
> I.e., it would be nice to express that the return value is
> also the number of bytes (elements?) of the array the function
> wrote into.  This, along with symbolic evaluation in the middle
> end, would then let us detect uninitialized reads back in
> the function's caller (after read) and similar.
> 
> This is just an idea, and there may be more general approaches
> that would be even more expressive.  But it's probably too late
> in the development cycle to design and add those to GCC 12.

(nods)
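
To spell out the property such an attribute would need to capture: after
a successful read(), only the first "return value" bytes of the buffer
are initialized.  A minimal sketch (sum_bytes_read is a name invented
for this example):

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Sums only the bytes that read() reports having written into buf.  */
static long sum_bytes_read(int fd)
{
    unsigned char buf[16];
    ssize_t n = read(fd, buf, sizeof buf);

    if (n < 0)
        return -1;                    /* failure: nothing in buf is valid */

    long sum = 0;
    for (ssize_t i = 0; i < n; i++)   /* success: only buf[0..n) is valid */
        sum += buf[i];
    return sum;
}
```

Looping up to sizeof buf instead of n would be the uninitialized read
that the more general attribute could let the compiler detect.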

> 
> As I promised, I'll try to look at the meat of each patch and
> give you some comments, hopefully later this week.

Thanks for the comments so far
Dave

> 
> Martin
> 
> > 
> > Given the above, the kernel could then have:
> > 
> > extern int copy_from_user(void *to, const void __user *from, long
> > n)
> >    __attribute__((access (write_only, 1, 3),
> >                  access (read_only, 2, 3),
> >                  returns_zero_on_success));
> > 
> > extern long copy_to_user(void __user *to, const void *from,
> > unsigned long n)
> >    __attribute__((access (write_only, 1, 3),
> >                  access (read_only, 2, 3),
> >                  returns_zero_on_success));
> > 
> > with suitable macros in compiler.h or whatnot.
> > 
> > ("access" is an existing GCC attribute; see
> >      
> > https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html 
> > )
> > 
> > My patched GCC adds a heuristic to -fanalyzer that a 3-argument
> > function
> > with a read_only buffer, a write_only buffer and a shared size
> > argument
> > is a "copy function", and treats it as a copy from *from to *to of
> > up to
> > n bytes that succeeds, or, given one of the above attributes can
> > succeed
> > or fail.  I'm wiring things up so that values read from
> > *untrusted_ptr
> > are tracked as tainted, and values written to *untrusted_ptr are
> > treated
> > as possible infoleaks (e.g. uninitialized values written to
> > *untrusted_ptr are specifically called out).  This gets the extra
> > checking for infoleaks and taint that my earlier prototype had, but
> > is
> > thus expressed via attributes, without having to have kernel-
> > specific
> > special cases.
> > 
> > Patch 3 of the kit adds infoleak detection to GCC's -fanalyzer (as
> > in the example above).
> > 
> > Possibly silly question: is it always a bug for the value of a
> > kernel
> > pointer to leak into user space?  i.e. should I be complaining
> > about an
> > infoleak if the value of a trusted_ptr itself is written to
> > *untrusted_ptr?  e.g.
> > 
> >    s.p = some_kernel_ptr;
> >    copy_to_user(user_p, &s, sizeof (s));
> >       /* value of some_kernel_ptr is written to user space;
> >          is this something we should warn for?  */
> > 
> > Patch 4a/4b wire up the different implementations of "untrusted"
> > into
> > GCC's -fanalyzer, which is used by...
> > 
> > Patch 5 uses this so that "untrusted" values are used in taint
> > detection
> > in the analyzer, so that it can complain about attacker-controlled
> > values being used without sanitization.
> > 
> > Patch 6 adds a new __attribute__ ((tainted)) allowing for further
> > taint detection (e.g. identifying syscalls), with minimal patching
> > of
> > the kernel, and without requiring a lot of link-time
> > interprocedural
> > analysis.  I believe that some of this could work independently of
> > the trust boundary marking from the rest of the patch kit.
> > 
> > The combined patch kit (using approach 2 i.e. the "b" patches)
> > successfully bootstraps and passes regression testing on
> > x86_64-pc-linux-gnu.
> > 
> > 
> > Which of the 3 approaches looks best to:
> > - the GCC community?
> > - the Linux kernel community?
> > 
> > Does clang/LLVM have anything similar?
> > 
> > There are many examples in the patches, some of which are taken
> > from
> > historical kernel vulnerabilities, and others from my
> > "antipatterns.ko"
> > project ( https://github.com/davidmalcolm/antipatterns.ko ).
> > 
> > Thoughts?
> > 
> > Dave
> > 
> > 
> > David Malcolm (6 or 8, depending how you count):
> >    1a: RFC: Implement "#pragma GCC custom_address_space"
> >    1b: Add __attribute__((untrusted))
> >    2: Add returns_zero_on_success/failure attributes
> >    3: analyzer: implement infoleak detection
> >    4a: analyzer: implementation of region::untrusted_p in terms of
> > custom
> >      address spaces
> >    4b: analyzer: implement region::untrusted_p in terms of
> >      __attribute__((untrusted))
> >    5: analyzer: use region::untrusted_p in taint detection
> >    6: Add __attribute__ ((tainted))
> > 
> >   gcc/Makefile.in                               |   3 +-
> >   gcc/analyzer/analyzer.opt                     |  20 +
> >   gcc/analyzer/checker-path.cc                  | 104 +++
> >   gcc/analyzer/checker-path.h                   |  47 +
> >   gcc/analyzer/diagnostic-manager.cc            |  75 +-
> >   gcc/analyzer/diagnostic-manager.h             |   3 +-
> >   gcc/analyzer/engine.cc                        | 342 ++++++-
> >   gcc/analyzer/exploded-graph.h                 |   3 +
> >   gcc/analyzer/pending-diagnostic.cc            |  30 +
> >   gcc/analyzer/pending-diagnostic.h             |  24 +
> >   gcc/analyzer/program-state.cc                 |  26 +-
> >   gcc/analyzer/region-model-impl-calls.cc       |  26 +-
> >   gcc/analyzer/region-model.cc                  | 504 ++++++++++-
> >   gcc/analyzer/region-model.h                   |  46 +-
> >   gcc/analyzer/region.cc                        |  52 ++
> >   gcc/analyzer/region.h                         |   4 +
> >   gcc/analyzer/sm-taint.cc                      | 839
> > ++++++++++++++++--
> >   gcc/analyzer/sm.h                             |   9 +
> >   gcc/analyzer/store.h                          |   1 +
> >   gcc/analyzer/trust-boundaries.cc              | 615 +++++++++++++
> >   gcc/c-family/c-attribs.c                      | 132 +++
> >   gcc/c-family/c-pretty-print.c                 |   2 +
> >   gcc/c/c-typeck.c                              |  64 ++
> >   gcc/doc/extend.texi                           |  63 +-
> >   gcc/doc/invoke.texi                           |  80 +-
> >   gcc/print-tree.c                              |   3 +
> >   .../c-c++-common/attr-returns-zero-on-1.c     |  68 ++
> >   gcc/testsuite/c-c++-common/attr-untrusted-1.c | 165 ++++
> >   .../gcc.dg/analyzer/attr-tainted-1.c          |  88 ++
> >   .../gcc.dg/analyzer/attr-tainted-misuses.c    |   6 +
> >   .../gcc.dg/analyzer/copy-function-1.c         |  98 ++
> >   .../gcc.dg/analyzer/copy_from_user-1.c        |  45 +
> >   gcc/testsuite/gcc.dg/analyzer/infoleak-1.c    | 181 ++++
> >   gcc/testsuite/gcc.dg/analyzer/infoleak-2.c    |  29 +
> >   gcc/testsuite/gcc.dg/analyzer/infoleak-3.c    | 141 +++
> >   gcc/testsuite/gcc.dg/analyzer/infoleak-5.c    |  35 +
> >   .../analyzer/infoleak-CVE-2011-1078-1.c       | 134 +++
> >   .../analyzer/infoleak-CVE-2011-1078-2.c       |  42 +
> >   .../analyzer/infoleak-CVE-2014-1446-1.c       | 117 +++
> >   .../analyzer/infoleak-CVE-2017-18549-1.c      | 101 +++
> >   .../analyzer/infoleak-CVE-2017-18550-1.c      | 171 ++++
> >   .../gcc.dg/analyzer/infoleak-antipatterns-1.c | 162 ++++
> >   .../gcc.dg/analyzer/infoleak-fixit-1.c        |  22 +
> >   gcc/testsuite/gcc.dg/analyzer/pr93382.c       |   2 +-
> >   .../analyzer/taint-CVE-2011-0521-1-fixed.c    | 113 +++
> >   .../gcc.dg/analyzer/taint-CVE-2011-0521-1.c   | 113 +++
> >   .../analyzer/taint-CVE-2011-0521-2-fixed.c    |  93 ++
> >   .../gcc.dg/analyzer/taint-CVE-2011-0521-2.c   |  93 ++
> >   .../analyzer/taint-CVE-2011-0521-3-fixed.c    |  56 ++
> >   .../gcc.dg/analyzer/taint-CVE-2011-0521-3.c   |  57 ++
> >   .../gcc.dg/analyzer/taint-CVE-2011-0521-4.c   |  40 +
> >   .../gcc.dg/analyzer/taint-CVE-2011-0521-5.c   |  42 +
> >   .../gcc.dg/analyzer/taint-CVE-2011-0521-6.c   |  37 +
> >   .../gcc.dg/analyzer/taint-CVE-2011-0521.h     | 136 +++
> >   .../gcc.dg/analyzer/taint-CVE-2011-2210-1.c   |  93 ++
> >   .../gcc.dg/analyzer/taint-CVE-2020-13143-1.c  |  38 +
> >   .../gcc.dg/analyzer/taint-CVE-2020-13143-2.c  |  32 +
> >   .../gcc.dg/analyzer/taint-CVE-2020-13143.h    |  91 ++
> >   gcc/testsuite/gcc.dg/analyzer/taint-alloc-1.c |  64 ++
> >   gcc/testsuite/gcc.dg/analyzer/taint-alloc-2.c |  27 +
> >   gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c |  21 +
> >   gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c |  31 +
> >   .../gcc.dg/analyzer/taint-antipatterns-1.c    | 137 +++
> >   .../gcc.dg/analyzer/taint-divisor-1.c         |  26 +
> >   .../{taint-1.c => taint-read-index-1.c}       |  19 +-
> >   .../gcc.dg/analyzer/taint-read-offset-1.c     | 128 +++
> >   .../taint-read-through-untrusted-ptr-1.c      |  37 +
> >   gcc/testsuite/gcc.dg/analyzer/taint-size-1.c  |  32 +
> >   .../gcc.dg/analyzer/taint-write-index-1.c     | 132 +++
> >   .../gcc.dg/analyzer/taint-write-offset-1.c    | 132 +++
> >   gcc/testsuite/gcc.dg/analyzer/test-uaccess.h  |  19 +
> >   .../torture/infoleak-net-ethtool-ioctl.c      |  78 ++
> >   .../torture/infoleak-vfio_iommu_type1.c       |  39 +
> >   gcc/tree-core.h                               |   6 +-
> >   gcc/tree.c                                    |   1 +
> >   gcc/tree.h                                    |  11 +-
> >   76 files changed, 6558 insertions(+), 140 deletions(-)
> >   create mode 100644 gcc/analyzer/trust-boundaries.cc
> >   create mode 100644 gcc/testsuite/c-c++-common/attr-returns-zero-
> > on-1.c
> >   create mode 100644 gcc/testsuite/c-c++-common/attr-untrusted-1.c
> >   create mode 100644 gcc/testsuite/gcc.dg/analyzer/attr-tainted-1.c
> >   create mode 100644 gcc/testsuite/gcc.dg/analyzer/attr-tainted-
> > misuses.c
> >   create mode 100644 gcc/testsuite/gcc.dg/analyzer/copy-function-
> > 1.c
> >   create mode 100644 gcc/testsuite/gcc.dg/analyzer/copy_from_user-
> > 1.c
> >   create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-1.c
> >   create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-2.c
> >   create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-3.c
> >   create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-5.c
> >   create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-CVE-
> > 2011-1078-1.c
> >   create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-CVE-
> > 2011-1078-2.c
> >   create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-CVE-
> > 2014-1446-1.c
> >   create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-CVE-
> > 2017-18549-1.c
> >   create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-CVE-
> > 2017-18550-1.c
> >   create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-
> > antipatterns-1.c
> >   create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-fixit-
> > 1.c
> >   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-
> > 0521-1-fixed.c
> >   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-
> > 0521-1.c
> >   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-
> > 0521-2-fixed.c
> >   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-
> > 0521-2.c
> >   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-
> > 0521-3-fixed.c
> >   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-
> > 0521-3.c
> >   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-
> > 0521-4.c
> >   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-
> > 0521-5.c
> >   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-
> > 0521-6.c
> >   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-
> > 0521.h
> >   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-
> > 2210-1.c
> >   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-
> > 13143-1.c
> >   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-
> > 13143-2.c
> >   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-
> > 13143.h
> >   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-alloc-1.c
> >   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-alloc-2.c
> >   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c
> >   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c
> >   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-
> > antipatterns-1.c
> >   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-divisor-
> > 1.c
> >   rename gcc/testsuite/gcc.dg/analyzer/{taint-1.c => taint-read-
> > index-1.c} (72%)
> >   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-read-
> > offset-1.c
> >   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-read-
> > through-untrusted-ptr-1.c
> >   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-size-1.c
> >   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-write-
> > index-1.c
> >   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-write-
> > offset-1.c
> >   create mode 100644 gcc/testsuite/gcc.dg/analyzer/test-uaccess.h
> >   create mode 100644
> > gcc/testsuite/gcc.dg/analyzer/torture/infoleak-net-ethtool-ioctl.c
> >   create mode 100644
> > gcc/testsuite/gcc.dg/analyzer/torture/infoleak-vfio_iommu_type1.c
> > 
> 



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/6] RFC: adding support to GCC for detecting trust boundaries
  2021-12-06 19:40   ` Segher Boessenkool
@ 2021-12-09  0:06     ` David Malcolm
  2021-12-09  0:41       ` Segher Boessenkool
  2021-12-09 16:42     ` Martin Sebor
  1 sibling, 1 reply; 39+ messages in thread
From: David Malcolm @ 2021-12-09  0:06 UTC (permalink / raw)
  To: Segher Boessenkool, Martin Sebor; +Cc: gcc-patches, linux-toolchains

On Mon, 2021-12-06 at 13:40 -0600, Segher Boessenkool wrote:
> On Mon, Dec 06, 2021 at 11:12:00AM -0700, Martin Sebor wrote:
> > On 11/13/21 1:37 PM, David Malcolm via Gcc-patches wrote:
> > > Approach 1: Custom Address Spaces
> > > =================================
> > > 
> > > GCC's C frontend supports target-specific address spaces; see:
> > >   https://gcc.gnu.org/onlinedocs/gcc/Named-Address-Spaces.html
> > > Quoting the N1275 draft of ISO/IEC DTR 18037:
> > >   "Address space names are ordinary identifiers, sharing the same
> > > name
> > >   space as variables and typedef names.  Any such names follow the
> > > same
> > >   rules for scope as other ordinary identifiers (such as typedef
> > > names).
> > >   An implementation may provide an implementation-defined set of
> > >   intrinsic address spaces that are, in effect, predefined at the
> > > start
> > >   of every translation unit.  The names of intrinsic address spaces
> > > must
> > >   be reserved identifiers (beginning with an underscore and an
> > > uppercase
> > >   letter or with two underscores).  An implementation may also
> > >   optionally support a means for new address space names to be
> > > defined
> > >   within a translation unit."
> > > 
> > > Patch 1a in the following patch kit for GCC implements such a means
> > > to
> > > define new address spaces names in a translation unit, via a
> > > pragma:
> > >   #pragma GCC custom_address_space(NAME_OF_ADDRESS_SPACE)
> > > 
> > > For example, the Linux kernel could perhaps write:
> > > 
> > >   #define __kernel
> > >   #pragma GCC custom_address_space(__user)
> > >   #pragma GCC custom_address_space(__iomem)
> > >   #pragma GCC custom_address_space(__percpu)
> > >   #pragma GCC custom_address_space(__rcu)
> > > 
> > > and thus the C frontend can complain about code that mismatches
> > > __user
> > > and kernel pointers, e.g.:
> > > 
> > > custom-address-space-1.c: In function ‘test_argpass_to_p’:
> > > custom-address-space-1.c:29:14: error: passing argument 1 of 
> > > ‘accepts_p’
> > > from pointer to non-enclosed address space
> > >    29 |   accepts_p (p_user);
> > >       |              ^~~~~~
> > > custom-address-space-1.c:21:24: note: expected ‘void *’ but
> > > argument is
> > > of type ‘__user void *’
> > >    21 | extern void accepts_p (void *);
> > >       |                        ^~~~~~
> > > custom-address-space-1.c: In function ‘test_cast_k_to_u’:
> > > custom-address-space-1.c:135:12: warning: cast to ‘__user’ address 
> > > space
> > > pointer from disjoint generic address space pointer
> > >   135 |   p_user = (void __user *)p_kernel;
> > >       |            ^
> > 
> > This seems like an excellent use of named address spaces :)
> 
> It has some big problems though.

Thanks for raising these points.

> 
> Named address spaces are completely target-specific.  Defining them
> with
> a pragma like this does not allow you to set the pointer mode or
> anything related to a custom LEGITIMATE_ADDRESS_P.

My thinking was that each custom address space is based on an existing
address space, but is disjoint from it, where "based on" means "what it
looks like in terms of RTL generation" (clearly I'm handwaving here).  

In patch 1a, the custom address spaces are all based on the generic
address space (but disjoint from it); syntax could be added to base
them on one of the target-specific address spaces.

>   It does not allow
> you to say zero pointers are invalid in some address spaces and not in
> others.

Syntax could be added for this, I suppose.


> You cannot provide any of the DWARF address space stuff this
> way.

True.  I confess that I haven't thought about the debugging experience,
and I'd need to think what would happen at the DWARF level.


> But most importantly, there are only four bits for the address
> space field internally, and they are used by however a backend wants to
> use them.

One of the ideas of patch 1a is to divide up this 4-bit space between
the target-specific and the custom address spaces.  The backend code
would need to be tweaked to decode the 4-bit value to get at the
underlying target-specific address space value.  This is done by the
function ensure_builtin_addr_space in patch 1a, though I've likely
missed some places.

IIRC, the target that's currently using the most address spaces is avr,
which I believe has 8 target-specific address spaces, in addition to
the generic one, i.e. 9 builtin address spaces, which would leave room
for up to 6 user-defined address spaces.  The Linux kernel's sparse
annotations currently effectively introduce 4 custom address spaces,
__user, __iomem, __percpu, and __rcu (assuming that __kernel is the
generic address space), so it's something of a tight squeeze, but it
does fit.  This doesn't account for out-of-tree backends, of course.
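
To make the "divide up the 4-bit space" idea concrete, here is a minimal
sketch.  It is not the code from patch 1a: the names alloc_custom_as,
is_custom_as and underlying_builtin_as are hypothetical, the target with
3 builtin address spaces is made up, and the rule that every custom
space is based on the generic one is an assumption for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical layout for illustration only; not taken from patch 1a.
   Ids below NUM_BUILTIN_AS belong to the target (0 is the generic
   address space); the rest of the 4-bit range is handed out to
   custom address spaces.  */
enum { AS_BITS = 4, AS_LIMIT = 1 << AS_BITS };
enum { NUM_BUILTIN_AS = 3 };  /* made-up target: generic plus two */

static int next_custom_as = NUM_BUILTIN_AS;

/* Hand out the next free custom id, or -1 once the 4-bit range is full.  */
static int alloc_custom_as(void)
{
  if (next_custom_as >= AS_LIMIT)
    return -1;
  return next_custom_as++;
}

/* True if the id lies in the part of the range used for custom spaces.  */
static bool is_custom_as(int as)
{
  return as >= NUM_BUILTIN_AS && as < AS_LIMIT;
}

/* Map an id back to the builtin address space it is based on.  In this
   sketch every custom space is based on the generic space (id 0).  */
static int underlying_builtin_as(int as)
{
  assert(as >= 0 && as < AS_LIMIT);
  return is_custom_as(as) ? 0 : as;
}
```

In a scheme like this, backend code that only understands builtin ids
would have to call something like underlying_builtin_as first, and a
translation unit that asks for more custom spaces than the range allows
would have to be rejected.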

> 
> None of this cannot be solved, but all of it will have to be solved.

(nods)

> 
> IMO it will be best to not mix this with address spaces in the user
> interface (it is of course fine to *implement* it like that, or with
> big overlap at least).

I was thinking the other way around, in that it should look like
address spaces in terms of the user's source code, but has some
implementation differences.

> 
> > > The patch doesn't yet maintain a good distinction between implicit
> > > target-specific address spaces and user-defined address spaces,
> 
> And that will have to be fixed in the user code syntax at least.
> 
> > > has at
> > > least one known major bug, and has only been lightly tested.  I can
> > > fix these issues, but was hoping for feedback that this approach is
> > > the
> > > right direction from both the GCC and Linux development
> > > communities.
> 
> Allowing the user to define new address spaces does not jibe well with
> how targets do (validly!) use them.

I think from a user's perspective it's a nice approach - my feeling is
that it makes certain things easier for the user, whilst complicating
things from a backend implementation perspective.

Plus you've raised various technical issues which I'd have to resolve
if we went in this direction.


> 
> > > Approach 2: An "untrusted" attribute
> > > ====================================
> > > 
> > > Alternatively, patch 1b in the kit implements:
> > > 
> > >   __attribute__((untrusted))
> > > 
> > > which can be applied to types as a qualifier (similarly to const,
> > > volatile, etc) to mark a trust boundary, hence the kernel could
> > > have:
> > > 
> > >   #define __user __attribute__((untrusted))
> > > 
> > > where my patched GCC treats
> > >   T *
> > > vs
> > >   T __attribute__((untrusted)) *
> > > as being different types and thus the C frontend can complain (even
> > > without
> > > -fanalyzer) about e.g.:
> > > 
> > > extern void accepts_p(void *);
> > > 
> > > void test_argpass_to_p(void __user *p_user)
> > > {
> > >   accepts_p(p_user);
> > > }
> > > 
> > > untrusted-pointer-1.c: In function ‘test_argpass_to_p’:
> > > untrusted-pointer-1.c:22:13: error: passing argument 1 of
> > > ‘accepts_p’
> > > from pointer with different trust level
> > >    22 |   accepts_p(p_user);
> > >       |              ^~~~~~
> > > untrusted-pointer-1.c:14:23: note: expected ‘void *’ but argument
> > > is of
> > > type ‘__attribute__((untrusted)) void *’
> > >    14 | extern void accepts_p(void *);
> > >       |                        ^~~~~~
> > > 
> > > So you'd get enforcement of __user vs non-__user pointers as part
> > > of
> > > GCC's regular type-checking.  (You need an explicit cast to convert
> > > between the untrusted vs trusted types).
> > 
> > As with the named address space idea, this approach also looks
> > reasonable to me.  If you anticipate using the attribute only
> > in the analyzer I would suggest to consider introducing it in
> > the analyzer's namespace (e.g., analyzer::untrusted, or even
> > gnu::analyzer::untrusted).
> 
> I don't see any fundamental problems with this approach.  It also is
> very much in line with how Perl handles this (and some copycat
> languages
> do as well), the "tainted" flag on data.

(nods)

> 
> > > This approach is much less expressive than the custom address space
> > > approach; it would only cover the trust boundary aspect; it
> > > wouldn't
> > > cover any differences between generic pointers and __user, vs
> > > __iomem,
> > > __percpu, and __rcu which I admit I only dimly understand.
> 
> Yes, it does not have any of the big problems that come with those
> address spaces either!  :-)

Indeed - I think this "untrusted attribute" approach is much simpler
implementation-wise than the "custom address space" approach, which is
also in its favor.

I'm wondering if anyone from the kernel development community has
strong opinions here, since the custom address space approach is
potentially much more expressive.

Otherwise I think we're both preferring the "untrusted attribute"
approach (patch 1b).
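
As a rough sketch of how annotated code might look under the attribute
approach: the __user macro follows the #define quoted above, but the
__has_attribute guard, the userspace stand-in for copy_from_user, and
the struct req / lookup names are all hypothetical, added so that the
snippet also builds with a compiler that lacks patch 1b (where __user
simply expands to nothing).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Use the attribute only if the compiler knows it; otherwise fall back
   to an empty macro so the code still compiles (assumption: a compiler
   with patch 1b reports the attribute via __has_attribute).  */
#if defined(__has_attribute)
# if __has_attribute(untrusted)
#  define __user __attribute__((untrusted))
# endif
#endif
#ifndef __user
# define __user
#endif

/* Userspace stand-in for the kernel helper; returns 0 on success.
   The explicit cast is where the trust boundary is crossed.  */
static unsigned long copy_from_user(void *to, const void __user *from,
                                    unsigned long n)
{
  assert(to != NULL && from != NULL);
  memcpy(to, (const void *)from, n);
  return 0;
}

struct req { unsigned idx; };

static const int table[4] = { 10, 20, 30, 40 };

/* Fetch a request across the boundary and sanitize the index before
   using it; returns -1 on a failed copy or an out-of-range index.  */
static int lookup(const struct req __user *ureq)
{
  struct req r;
  if (copy_from_user(&r, ureq, sizeof r))
    return -1;
  if (r.idx >= sizeof table / sizeof table[0])
    return -1;  /* attacker-controlled value rejected here */
  return table[r.idx];
}
```

Without the range check on r.idx, this is the kind of use of an
attacker-controlled value that the taint detection in patch 5 is meant
to complain about.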

> 
> > > Other attributes
> > > ================
> > > 
> > > Patch 2 in the kit adds:
> > >   __attribute__((returns_zero_on_success))
> > > and
> > >   __attribute__((returns_nonzero_on_success))
> > > as hints to the analyzer that it's worth bifurcating the analysis
> > > of
> > > such functions (to explore failure vs success, and thus to better
> > > explore error-handling paths).  It's also a hint to the human
> > > reader of
> > > the source code.
> > 
> > I think being able to express something along these lines would
> > be useful even outside the analyzer, both for warnings and, when
> > done right, perhaps also for optimization.  So I'm in favor of
> > something like this.  I'll just reiterate here the comment on
> > this attribute I sent you privately some time ago.
> 
> What is "success" though?  You probably want it so some checker can
> make
> sure you do handle failure some way, but how do you see what is
> handling
> failure and what is handling the successful case?

"success" and "failure" in this case are purely in terms of how we
label events for the user in the analyzer, such as in event (3) in the
following:

infoleak-antipatterns-1.c: In function ‘infoleak_stack_unchecked_err’:
infoleak-antipatterns-1.c:118:10: warning: potential exposure of
sensitive information by copying uninitialized data from stack across
trust boundary [CWE-200] [-Wanalyzer-exposure-through-uninit-copy]
  118 |   err |= copy_to_user (dst, &st, sizeof(st));
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  ‘infoleak_stack_unchecked_err’: events 1-4
    |
    |  110 |   struct infoleak_buf st;
    |      |                       ^~
    |      |                       |
    |      |                       (1) source region created on stack here
    |      |                       (2) capacity: 256 bytes
    |......
    |  117 |   int err = copy_from_user (&st, src, sizeof(st));
    |      |             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    |      |             |
    |      |             (3) when ‘copy_from_user’ fails, returning non-zero
    |  118 |   err |= copy_to_user (dst, &st, sizeof(st));
    |      |          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    |      |          |
    |      |          (4) uninitialized data copied from stack here
    |

i.e. it allows the analyzer to provide a hint to the reader of the
analyzer output.  The attribute is also a hint to the human reader of
the source code.
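
For comparison, a checked variant of the pattern in that diagnostic
might look like the sketch below.  The copy_from_user/copy_to_user
functions here are simplified userspace stand-ins, not the kernel's:
they return 0 on success and nonzero on failure, and the failing read
leaves the destination untouched, which is the case the diagnostic
models.  The fail_next_read flag and the name echo_checked are
hypothetical.

```c
#include <assert.h>
#include <string.h>

struct infoleak_buf { char buf[256]; };

static int fail_next_read;  /* hypothetical hook to simulate a failed read */

/* Stand-in: 0 on success, nonzero on failure (dst left untouched).  */
static long copy_from_user(void *dst, const void *src, unsigned long n)
{
  if (fail_next_read)
    return (long)n;
  memcpy(dst, src, n);
  return 0;
}

/* Stand-in: always succeeds in this sketch.  */
static long copy_to_user(void *dst, const void *src, unsigned long n)
{
  memcpy(dst, src, n);
  return 0;
}

/* Checks the first result before the second copy, so that an
   uninitialized st can never reach dst.  */
static int echo_checked(void *dst, const void *src)
{
  struct infoleak_buf st;
  if (copy_from_user(&st, src, sizeof st))
    return -1;
  if (copy_to_user(dst, &st, sizeof st))
    return -1;
  return 0;
}
```

The unchecked form ("int err = copy_from_user (...); err |=
copy_to_user (...);") differs only in that it carries on to the second
copy on the failure path, which is the path event (3) labels.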

Thanks
Dave



* Re: [PATCH 0/6] RFC: adding support to GCC for detecting trust boundaries
  2021-12-09  0:06     ` David Malcolm
@ 2021-12-09  0:41       ` Segher Boessenkool
  0 siblings, 0 replies; 39+ messages in thread
From: Segher Boessenkool @ 2021-12-09  0:41 UTC (permalink / raw)
  To: David Malcolm; +Cc: Martin Sebor, gcc-patches, linux-toolchains

Hi!

On Wed, Dec 08, 2021 at 07:06:30PM -0500, David Malcolm wrote:
> On Mon, 2021-12-06 at 13:40 -0600, Segher Boessenkool wrote:
> > Named address spaces are completely target-specific.  Defining them
> > with
> > a pragma like this does not allow you to set the pointer mode or
> > anything related to a custom LEGITIMATE_ADDRESS_P.
> 
> My thinking was that each custom address space is based on an existing
> address space, but is disjoint from it, where "based on" means "what it
> looks like in terms of RTL generation" (clearly I'm handwaving here).  
> 
> In patch 1a, the custom address spaces are all based on the generic
> address space (but disjoint from it); syntax could be added to base
> them on one of the target-specific address spaces.
> 
> >   It does not allow
> > you to say zero pointers are invalid in some address spaces and not in
> > others.
> 
> Syntax could be added for this, I suppose.
> 
> > You cannot provide any of the DWARF address space stuff this
> > way.
> 
> True.  I confess that I haven't thought about the debugging experience,
> and I'd need to think what would happen at the DWARF level.
> 
> > But most importantly, there are only four bits for the address
> > space field internally, and they are used by however a backend wants to
> > use them.
> 
> One of the ideas of patch 1a is to divide up this 4-bit space between
> the target-specific and the custom address spaces.  The backend code
> would need to be tweaked to decode the 4-bit value to get at the
> underlying target-specific address space value.  This is done by the
> function ensure_builtin_addr_space in patch 1a, though I've likely
> missed some places.
> 
> IIRC, the target that's currently using the most address spaces is avr,
> which I believe has 8 target-specific address spaces, in addition to
> the generic one, i.e. 9 builtin address spaces, which would leave room
> for up to 6 user-defined address spaces.

Except that a backend is free to use this bitfield any way it pleases.

All of the above says that what you want is something completely
orthogonal to and separate from named address spaces.  Very similar in
some ways, sure, but keeping it apart will work much better and be much
less pain.

> The Linux kernel's sparse
> annotations currently effectively introduce 4 custom address spaces,
> __user, __iomem, __percpu, and __rcu (assuming that __kernel is the
> generic address space), so it's something of a tight squeeze, but it
> does fit.  This doesn't account for out-of-tree backends, of course.

Or any future backends.

> > IMO it will be best to not mix this with address spaces in the user
> > interface (it is of course fine to *implement* it like that, or with
> > big overlap at least).
> 
> I was thinking the other way around, in that it should look like
> address spaces in terms of the user's source code, but has some
> implementation differences.

That does not solve any of the problems I brought up though.  That was
just a list of all the basic features from address spaces btw, from
gccint.

> > Allowing the user to define new address spaces does not jibe well with
> > how targets do (validly!) use them.
> 
> I think from a user's perspective it's a nice approach - my feeling is
> that it makes certain things easier for the user, whilst complicating
> things from a backend implementation perspective.
> 
> Plus you've raised various technical issues which I'd have to resolve
> if we went in this direction.

It is fine to have a (very) similar concept for the user, but it does
not work well at all to equate this to the existing concept of named
address spaces.

> Indeed - I think this "untrusted attribute" approach is much simpler
> implementation-wise than the "custom address space" approach, which is
> also in its favor.
> 
> I'm wondering if anyone from the kernel development community has
> strong opinions here, since the custom address space approach is
> potentially much more expressive.

Anything that is more expressive than what you have thought through the
consequences of is not a feature but a danger.  Anything that does not
fit in structurally with the rest now never will.

> Otherwise I think we're both preferring the "untrusted attribute"
> approach (patch 1b).

That attribute does not interfere with anything else afaics, so that is
much safer.

> > > I think being able to express something along these lines would
> > > be useful even outside the analyzer, both for warnings and, when
> > > done right, perhaps also for optimization.  So I'm in favor of
> > > something like this.  I'll just reiterate here the comment on
> > > this attribute I sent you privately some time ago.
> > 
> > What is "success" though?  You probably want it so some checker can
> > make
> > sure you do handle failure some way, but how do you see what is
> > handling
> > failure and what is handling the successful case?
> 
> "success" and "failure" in this case are purely in terms of how we
> label events for the user in the analyzer, such as in event (3) in the
> following:
> 
> infoleak-antipatterns-1.c: In function ‘infoleak_stack_unchecked_err’:
> infoleak-antipatterns-1.c:118:10: warning: potential exposure of
> sensitive information by copying uninitialized data from stack across
> trust boundary [CWE-200] [-Wanalyzer-exposure-through-uninit-copy]
>   118 |   err |= copy_to_user (dst, &st, sizeof(st));
>       |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>   ‘infoleak_stack_unchecked_err’: events 1-4
>     |
>     |  110 |   struct infoleak_buf st;
>     |      |                       ^~
>     |      |                       |
>     |      |                       (1) source region created on stack here
>     |      |                       (2) capacity: 256 bytes
>     |......
>     |  117 |   int err = copy_from_user (&st, src, sizeof(st));
>     |      |             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>     |      |             |
>     |      |             (3) when ‘copy_from_user’ fails, returning non-zero
>     |  118 |   err |= copy_to_user (dst, &st, sizeof(st));
>     |      |          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>     |      |          |
>     |      |          (4) uninitialized data copied from stack here
>     |
> 
> i.e. it allows the analyzer to provide a hint to the reader of the
> analyzer output.  The attribute is also a hint to the human reader of
> the source code.

But how do you tell the analyser what is success and what is failure?
Do you always count non-zero return values as failure, like here?  There
are other conventions (negative means error, zero means error, etc.)


Segher


* Re: [PATCH 0/6] RFC: adding support to GCC for detecting trust boundaries
  2021-12-06 19:40   ` Segher Boessenkool
  2021-12-09  0:06     ` David Malcolm
@ 2021-12-09 16:42     ` Martin Sebor
  2021-12-09 23:40       ` Segher Boessenkool
  1 sibling, 1 reply; 39+ messages in thread
From: Martin Sebor @ 2021-12-09 16:42 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: David Malcolm, gcc-patches, linux-toolchains

On 12/6/21 12:40 PM, Segher Boessenkool wrote:
> On Mon, Dec 06, 2021 at 11:12:00AM -0700, Martin Sebor wrote:
>> On 11/13/21 1:37 PM, David Malcolm via Gcc-patches wrote:
>>> Approach 1: Custom Address Spaces
>>> =================================
>>>
>>> GCC's C frontend supports target-specific address spaces; see:
>>>    https://gcc.gnu.org/onlinedocs/gcc/Named-Address-Spaces.html
>>> Quoting the N1275 draft of ISO/IEC DTR 18037:
>>>    "Address space names are ordinary identifiers, sharing the same name
>>>    space as variables and typedef names.  Any such names follow the same
>>>    rules for scope as other ordinary identifiers (such as typedef names).
>>>    An implementation may provide an implementation-defined set of
>>>    intrinsic address spaces that are, in effect, predefined at the start
>>>    of every translation unit.  The names of intrinsic address spaces must
>>>    be reserved identifiers (beginning with an underscore and an uppercase
>>>    letter or with two underscores).  An implementation may also
>>>    optionally support a means for new address space names to be defined
>>>    within a translation unit."
>>>
>>> Patch 1a in the following patch kit for GCC implements such a means to
>>> define new address spaces names in a translation unit, via a pragma:
>>>    #pragma GCC custom_address_space(NAME_OF_ADDRESS_SPACE)
>>>
>>> For example, the Linux kernel could perhaps write:
>>>
>>>    #define __kernel
>>>    #pragma GCC custom_address_space(__user)
>>>    #pragma GCC custom_address_space(__iomem)
>>>    #pragma GCC custom_address_space(__percpu)
>>>    #pragma GCC custom_address_space(__rcu)
>>>
>>> and thus the C frontend can complain about code that mismatches __user
>>> and kernel pointers, e.g.:
>>>
>>> custom-address-space-1.c: In function ‘test_argpass_to_p’:
>>> custom-address-space-1.c:29:14: error: passing argument 1 of
>>> ‘accepts_p’
>>> from pointer to non-enclosed address space
>>>     29 |   accepts_p (p_user);
>>>        |              ^~~~~~
>>> custom-address-space-1.c:21:24: note: expected ‘void *’ but argument is
>>> of type ‘__user void *’
>>>     21 | extern void accepts_p (void *);
>>>        |                        ^~~~~~
>>> custom-address-space-1.c: In function ‘test_cast_k_to_u’:
>>> custom-address-space-1.c:135:12: warning: cast to ‘__user’ address
>>> space
>>> pointer from disjoint generic address space pointer
>>>    135 |   p_user = (void __user *)p_kernel;
>>>        |            ^
>>
>> This seems like an excellent use of named address spaces :)
> 
> It has some big problems though.
> 
> Named address spaces are completely target-specific.

My understanding of these kernel/user address spaces that David
is adding for the benefit of the analyzer is that they correspond
to what TR 18037 calls nested address spaces.  They're nested within
the generic address space, which is a union of the two.  With that,
I'd expect them to be fully handled early on and be transparent
afterwards.  Is implementing this idea not feasible in the GCC
design?

Martin

> Defining them with
> a pragma like this does not allow you to set the pointer mode or
> anything related to a custom LEGITIMATE_ADDRESS_P.  It does not allow
> you to say zero pointers are invalid in some address spaces and not in
> others.  You cannot provide any of the DWARF address space stuff this
> way.  But most importantly, there are only four bits for the address
> space field internally, and they are used by however a backend wants to
> use them.
> 
> None of this cannot be solved, but all of it will have to be solved.
> 
> IMO it will be best to not mix this with address spaces in the user
> interface (it is of course fine to *implement* it like that, or with
> big overlap at least).
> 
>>> The patch doesn't yet maintain a good distinction between implicit
>>> target-specific address spaces and user-defined address spaces,
> 
> And that will have to be fixed in the user code syntax at least.
> 
>>> has at
>>> least one known major bug, and has only been lightly tested.  I can
>>> fix these issues, but was hoping for feedback that this approach is the
>>> right direction from both the GCC and Linux development communities.
> 
> Allowing the user to define new address spaces does not jibe well with
> how targets do (validly!) use them.
> 
>>> Approach 2: An "untrusted" attribute
>>> ====================================
>>>
>>> Alternatively, patch 1b in the kit implements:
>>>
>>>    __attribute__((untrusted))
>>>
>>> which can be applied to types as a qualifier (similarly to const,
>>> volatile, etc) to mark a trust boundary, hence the kernel could have:
>>>
>>>    #define __user __attribute__((untrusted))
>>>
>>> where my patched GCC treats
>>>    T *
>>> vs
>>>    T __attribute__((untrusted)) *
>>> as being different types and thus the C frontend can complain (even without
>>> -fanalyzer) about e.g.:
>>>
>>> extern void accepts_p(void *);
>>>
>>> void test_argpass_to_p(void __user *p_user)
>>> {
>>>    accepts_p(p_user);
>>> }
>>>
>>> untrusted-pointer-1.c: In function ‘test_argpass_to_p’:
>>> untrusted-pointer-1.c:22:13: error: passing argument 1 of ‘accepts_p’
>>> from pointer with different trust level
>>>     22 |   accepts_p(p_user);
>>>        |              ^~~~~~
>>> untrusted-pointer-1.c:14:23: note: expected ‘void *’ but argument is of
>>> type ‘__attribute__((untrusted)) void *’
>>>     14 | extern void accepts_p(void *);
>>>        |                        ^~~~~~
>>>
>>> So you'd get enforcement of __user vs non-__user pointers as part of
>>> GCC's regular type-checking.  (You need an explicit cast to convert
>>> between the untrusted vs trusted types).
>>
>> As with the named address space idea, this approach also looks
>> reasonable to me.  If you anticipate using the attribute only
>> in the analyzer I would suggest to consider introducing it in
>> the analyzer's namespace (e.g., analyzer::untrusted, or even
>> gnu::analyzer::untrusted).
> 
> I don't see any fundamental problems with this approach.  It also is
> very much in line with how Perl handles this (and some copycat languages
> do as well), the "tainted" flag on data.
> 
>>> This approach is much less expressive than the custom address space
>>> approach; it would only cover the trust boundary aspect; it wouldn't
>>> cover any differences between generic pointers and __user, vs __iomem,
>>> __percpu, and __rcu which I admit I only dimly understand.
> 
> Yes, it does not have any of the big problems that come with those
> address spaces either!  :-)
> 
>>> Other attributes
>>> ================
>>>
>>> Patch 2 in the kit adds:
>>>    __attribute__((returns_zero_on_success))
>>> and
>>>    __attribute__((returns_nonzero_on_success))
>>> as hints to the analyzer that it's worth bifurcating the analysis of
>>> such functions (to explore failure vs success, and thus to better
>>> explore error-handling paths).  It's also a hint to the human reader of
>>> the source code.
>>
>> I think being able to express something along these lines would
>> be useful even outside the analyzer, both for warnings and, when
>> done right, perhaps also for optimization.  So I'm in favor of
>> something like this.  I'll just reiterate here the comment on
>> this attribute I sent you privately some time ago.
> 
> What is "success" though?  You probably want it so some checker can make
> sure you do handle failure some way, but how do you see what is handling
> failure and what is handling the successful case?
> 
> 
> Segher
> 



* Re: [PATCH 1b/6] Add __attribute__((untrusted))
  2021-11-13 20:37 ` [PATCH 1b/6] Add __attribute__((untrusted)) David Malcolm
@ 2021-12-09 22:54   ` Martin Sebor
  2022-01-06 15:10     ` David Malcolm
  0 siblings, 1 reply; 39+ messages in thread
From: Martin Sebor @ 2021-12-09 22:54 UTC (permalink / raw)
  To: David Malcolm, gcc-patches, linux-toolchains

On 11/13/21 1:37 PM, David Malcolm via Gcc-patches wrote:
> This patch adds a new:
> 
>    __attribute__((untrusted))
> 
> for use by the C front-end, intended for use by the Linux kernel for
> use with "__user", but which could be used by other operating system
> kernels, and potentially by other projects.

It looks like untrusted is a type attribute (rather than one
that applies to variables and/or function return values or
writeable by-reference arguments).  I find that quite surprising.
I'm used to thinking of trusted vs tainted as dynamic properties
of data so I'm having trouble deciding what to think about
the attribute applying to types.  Can you explain why it's
useful on types?

I'd expect the taint property of a type to be quickly lost as
an object of the type is passed through existing APIs (e.g.,
a char array manipulated by string functions like strchr).

(I usually look at tests to help me understand the design of
a change but I couldn't find an answer to my question in those
in the patch.)

Thanks
Martin

PS I found one paper online that discusses type-based taint
analysis in Java but not much more.  I only quickly skimmed
the paper and although it conceptually makes sense I'm still
having difficulties seeing how it would be useful in C.
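
To make the question concrete, here is a minimal sketch in portable C of
what a type-level distinction buys.  It is not the mechanism in the patch:
it wraps the pointer in a distinct struct type rather than using a
qualifier, and the names user_ptr, make_user_ptr and fetch_from_user are
hypothetical.  The point it illustrates is that an untrusted pointer
cannot reach an existing API such as strchr without first going through
one explicit, checkable step.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: a pointer that came from across a trust boundary.
   Being a distinct struct type, it cannot be passed where 'void *' or
   'const char *' is expected without an explicit unwrapping step.  */
typedef struct { const void *raw; } user_ptr;

/* Hypothetical constructor marking a raw pointer as untrusted.  */
static user_ptr
make_user_ptr (const void *p)
{
  user_ptr u = { p };
  return u;
}

/* The single place where untrusted data crosses into trusted storage.
   Returns 0 on success, -1 if LEN does not fit in DST.  */
static int
fetch_from_user (void *dst, size_t dst_size, user_ptr src, size_t len)
{
  if (len > dst_size)
    return -1;
  memcpy (dst, src.raw, len);
  return 0;
}
```

With these definitions a call such as strchr (u, 'x'), where u is a
user_ptr, is rejected by the compiler, whereas the copy made by
fetch_from_user is an ordinary trusted buffer.  A qualifier-style
attribute would give a similar separation without the wrapper.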

> 
> Known issues:
> - at least one TODO in handle_untrusted_attribute
> - should it be permitted to dereference an untrusted pointer?  The patch
>    currently allows this
> 
> gcc/c-family/ChangeLog:
> 	* c-attribs.c (c_common_attribute_table): Add "untrusted".
> 	(build_untrusted_type): New.
> 	(handle_untrusted_attribute): New.
> 	* c-pretty-print.c (pp_c_cv_qualifiers): Handle
> 	TYPE_QUAL_UNTRUSTED.
> 
> gcc/c/ChangeLog:
> 	* c-typeck.c (convert_for_assignment): Complain if the trust
> 	levels vary when assigning a non-NULL pointer.
> 
> gcc/ChangeLog:
> 	* doc/extend.texi (Common Type Attributes): Add "untrusted".
> 	* print-tree.c (print_node): Handle TYPE_UNTRUSTED.
> 	* tree-core.h (enum cv_qualifier): Add TYPE_QUAL_UNTRUSTED.
> 	(struct tree_type_common): Assign one of the spare bits to a new
> 	"untrusted_flag".
> 	* tree.c (set_type_quals): Handle TYPE_QUAL_UNTRUSTED.
> 	* tree.h (TYPE_QUALS): Likewise.
> 	(TYPE_QUALS_NO_ADDR_SPACE): Likewise.
> 	(TYPE_QUALS_NO_ADDR_SPACE_NO_ATOMIC): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 	* c-c++-common/attr-untrusted-1.c: New test.
> 
> Signed-off-by: David Malcolm <dmalcolm@redhat.com>
> ---
>   gcc/c-family/c-attribs.c                      |  59 +++++++
>   gcc/c-family/c-pretty-print.c                 |   2 +
>   gcc/c/c-typeck.c                              |  64 +++++++
>   gcc/doc/extend.texi                           |  25 +++
>   gcc/print-tree.c                              |   3 +
>   gcc/testsuite/c-c++-common/attr-untrusted-1.c | 165 ++++++++++++++++++
>   gcc/tree-core.h                               |   6 +-
>   gcc/tree.c                                    |   1 +
>   gcc/tree.h                                    |  11 +-
>   9 files changed, 332 insertions(+), 4 deletions(-)
>   create mode 100644 gcc/testsuite/c-c++-common/attr-untrusted-1.c
> 
> diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
> index 007b928c54b..100c2dabab2 100644
> --- a/gcc/c-family/c-attribs.c
> +++ b/gcc/c-family/c-attribs.c
> @@ -136,6 +136,7 @@ static tree handle_warn_unused_result_attribute (tree *, tree, tree, int,
>   						 bool *);
>   static tree handle_access_attribute (tree *, tree, tree, int, bool *);
>   
> +static tree handle_untrusted_attribute (tree *, tree, tree, int, bool *);
>   static tree handle_sentinel_attribute (tree *, tree, tree, int, bool *);
>   static tree handle_type_generic_attribute (tree *, tree, tree, int, bool *);
>   static tree handle_alloc_size_attribute (tree *, tree, tree, int, bool *);
> @@ -536,6 +537,8 @@ const struct attribute_spec c_common_attribute_table[] =
>   			      handle_special_var_sec_attribute, attr_section_exclusions },
>     { "access",		      1, 3, false, true, true, false,
>   			      handle_access_attribute, NULL },
> +  { "untrusted",	      0, 0, false,  true, false, true,
> +			      handle_untrusted_attribute, NULL },
>     /* Attributes used by Objective-C.  */
>     { "NSObject",		      0, 0, true, false, false, false,
>   			      handle_nsobject_attribute, NULL },
> @@ -5224,6 +5227,62 @@ build_attr_access_from_parms (tree parms, bool skip_voidptr)
>     return build_tree_list (name, attrargs);
>   }
>   
> +/* Build (or reuse) a type based on BASE_TYPE, but with
> +   TYPE_QUAL_UNTRUSTED.  */
> +
> +static tree
> +build_untrusted_type (tree base_type)
> +{
> +  int base_type_quals = TYPE_QUALS (base_type);
> +  return build_qualified_type (base_type,
> +			       base_type_quals | TYPE_QUAL_UNTRUSTED);
> +}
> +
> +/* Handle an "untrusted" attribute; arguments as in
> +   struct attribute_spec.handler.  */
> +
> +static tree
> +handle_untrusted_attribute (tree *node, tree ARG_UNUSED (name),
> +			    tree ARG_UNUSED (args), int ARG_UNUSED (flags),
> +			    bool *no_add_attrs)
> +{
> +  if (TREE_CODE (*node) == POINTER_TYPE)
> +    {
> +      tree base_type = TREE_TYPE (*node);
> +      tree untrusted_base_type = build_untrusted_type (base_type);
> +      *node = build_pointer_type (untrusted_base_type);
> +      *no_add_attrs = true; /* OK */
> +      return NULL_TREE;
> +    }
> +  else if (TREE_CODE (*node) == FUNCTION_TYPE)
> +    {
> +      tree return_type = TREE_TYPE (*node);
> +      if (TREE_CODE (return_type) == POINTER_TYPE)
> +	{
> +	  tree base_type = TREE_TYPE (return_type);
> +	  tree untrusted_base_type = build_untrusted_type (base_type);
> +	  tree untrusted_return_type = build_pointer_type (untrusted_base_type);
> +	  tree fn_type = build_function_type (untrusted_return_type,
> +					      TYPE_ARG_TYPES (*node));
> +	  *node = fn_type;
> +	  *no_add_attrs = true; /* OK */
> +	  return NULL_TREE;
> +	}
> +      else
> +	{
> +	  gcc_unreachable (); // TODO
> +	}
> +    }
> +  else
> +    {
> +      tree base_type = *node;
> +      tree untrusted_base_type = build_untrusted_type (base_type);
> +      *node = untrusted_base_type;
> +      *no_add_attrs = true; /* OK */
> +      return NULL_TREE;
> +    }
> +}
> +
>   /* Handle a "nothrow" attribute; arguments as in
>      struct attribute_spec.handler.  */
>   
> diff --git a/gcc/c-family/c-pretty-print.c b/gcc/c-family/c-pretty-print.c
> index a987da46d6d..120e1e6d167 100644
> --- a/gcc/c-family/c-pretty-print.c
> +++ b/gcc/c-family/c-pretty-print.c
> @@ -191,6 +191,8 @@ pp_c_cv_qualifiers (c_pretty_printer *pp, int qualifiers, bool func_type)
>     if (qualifiers & TYPE_QUAL_RESTRICT)
>       pp_c_ws_string (pp, (flag_isoc99 && !c_dialect_cxx ()
>   			 ? "restrict" : "__restrict__"));
> +  if (qualifiers & TYPE_QUAL_UNTRUSTED)
> +    pp_c_ws_string (pp, "__attribute__((untrusted))");
>   }
>   
>   /* Pretty-print T using the type-cast notation '( type-name )'.  */
> diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
> index 782414f8c8c..44de82b99ba 100644
> --- a/gcc/c/c-typeck.c
> +++ b/gcc/c/c-typeck.c
> @@ -7284,6 +7284,70 @@ convert_for_assignment (location_t location, location_t expr_loc, tree type,
>   	  return error_mark_node;
>   	}
>   
> +      /* Untrusted vs trusted pointers, but allowing NULL to be used
> +	 for everything.  */
> +      if (TYPE_UNTRUSTED (ttl) != TYPE_UNTRUSTED (ttr)
> +	  && !null_pointer_constant_p (rhs))
> +	{
> +	  auto_diagnostic_group d;
> +	  bool diagnosed = true;
> +	  switch (errtype)
> +	    {
> +	    case ic_argpass:
> +	      {
> +		const char msg[] = G_("passing argument %d of %qE from "
> +				      "pointer with different trust level");
> +		if (warnopt)
> +		  diagnosed
> +		    = warning_at (expr_loc, warnopt, msg, parmnum, rname);
> +		else
> +		  error_at (expr_loc, msg, parmnum, rname);
> +	      break;
> +	      }
> +	    case ic_assign:
> +	      {
> +		const char msg[] = G_("assignment from pointer with "
> +				      "different trust level");
> +		if (warnopt)
> +		  warning_at (location, warnopt, msg);
> +		else
> +		  error_at (location, msg);
> +		break;
> +	      }
> +	    case ic_init:
> +	      {
> +		const char msg[] = G_("initialization from pointer with "
> +				      "different trust level");
> +		if (warnopt)
> +		  warning_at (location, warnopt, msg);
> +		else
> +		  error_at (location, msg);
> +		break;
> +	      }
> +	    case ic_return:
> +	      {
> +		const char msg[] = G_("return from pointer with "
> +				      "different trust level");
> +		if (warnopt)
> +		  warning_at (location, warnopt, msg);
> +		else
> +		  error_at (location, msg);
> +		break;
> +	      }
> +	    default:
> +	      gcc_unreachable ();
> +	    }
> +	  if (diagnosed)
> +	    {
> +	      if (errtype == ic_argpass)
> +		inform_for_arg (fundecl, expr_loc, parmnum, type, rhstype);
> +	      else
> +		inform (location, "expected %qT but pointer is of type %qT",
> +			type, rhstype);
> +	    }
> +	  return error_mark_node;
> +	}
> +
>         /* Check if the right-hand side has a format attribute but the
>   	 left-hand side doesn't.  */
>         if (warn_suggest_attribute_format
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 6e6c580e329..e9f47519df2 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -8770,6 +8770,31 @@ pid_t wait (wait_status_ptr_t p)
>   @}
>   @end smallexample
>   
> +@item untrusted
> +@cindex @code{untrusted} type attribute
> +Types marked with this attribute are treated as being ``untrusted'' -
> +values should be treated as under attacker control.
> +
> +The C front end will issue an error diagnostic on attempts to assign
> +pointer values between untrusted and trusted pointer types without
> +an explicit cast.
> +
> +For example, when implementing an operating system kernel, one
> +might write
> +
> +@smallexample
> +#define __kernel
> +#define __user    __attribute__ ((untrusted))
> +void __kernel *p_kernel;
> +void __user *p_user;
> +
> +/* With the above, the following assignment should be diagnosed as an error.  */
> +p_user = p_kernel;
> +@end smallexample
> +
> +The NULL pointer is treated as being usable with both trusted and
> +untrusted pointers.
> +
>   @item unused
>   @cindex @code{unused} type attribute
>   When attached to a type (including a @code{union} or a @code{struct}),
> diff --git a/gcc/print-tree.c b/gcc/print-tree.c
> index d1fbd044c27..e5123807521 100644
> --- a/gcc/print-tree.c
> +++ b/gcc/print-tree.c
> @@ -640,6 +640,9 @@ print_node (FILE *file, const char *prefix, tree node, int indent,
>         if (TYPE_RESTRICT (node))
>   	fputs (" restrict", file);
>   
> +      if (TYPE_UNTRUSTED (node))
> +	fputs (" untrusted", file);
> +
>         if (TYPE_LANG_FLAG_0 (node))
>   	fputs (" type_0", file);
>         if (TYPE_LANG_FLAG_1 (node))
> diff --git a/gcc/testsuite/c-c++-common/attr-untrusted-1.c b/gcc/testsuite/c-c++-common/attr-untrusted-1.c
> new file mode 100644
> index 00000000000..84a217fc59f
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/attr-untrusted-1.c
> @@ -0,0 +1,165 @@
> +#define __kernel
> +#define __user __attribute__((untrusted))
> +#define __iomem
> +#define __percpu
> +#define __rcu
> +
> +void *p;
> +void __kernel *p_kernel;
> +void __user *p_user;
> +void __iomem *p_iomem;
> +void __percpu *p_percpu;
> +void __rcu *p_rcu;
> +
> +#define NULL ((void *)0)
> +
> +extern void accepts_p (void *); /* { dg-message "24: expected 'void \\*' but argument is of type '__attribute__\\(\\(untrusted\\)\\) void \\*'" "" { target c } } */
> +/* { dg-message "24:  initializing argument 1 of 'void accepts_p\\(void\\*\\)'" "" { target c++ } .-1 } */
> +extern void accepts_p_kernel (void __kernel *);
> +extern void accepts_p_user (void __user *);
> +
> +void test_argpass_to_p (void)
> +{
> +  accepts_p (p);
> +  accepts_p (p_kernel);
> +  accepts_p (p_user); /* { dg-error "passing argument 1 of 'accepts_p' from pointer with different trust level" "" { target c } } */
> +  /* { dg-error "invalid conversion from '__attribute__\\(\\(untrusted\\)\\) void\\*' to 'void\\*'" "" { target c++ } .-1 } */
> +}
> +
> +void test_init_p (void)
> +{
> +  void *local_p_1 = p;
> +  void *local_p_2 = p_kernel;
> +  void *local_p_3 = p_user; /* { dg-error "initialization from pointer with different trust level" "" { target c } } */
> +  /* { dg-message "expected 'void \\*' but pointer is of type '__attribute__\\(\\(untrusted\\)\\) void \\*'" "" { target c } .-1 } */
> +  /* { dg-error "invalid conversion from '__attribute__\\(\\(untrusted\\)\\) void\\*' to 'void\\*'" "" { target c++ } .-2 } */
> +}
> +
> +void test_init_p_kernel (void)
> +{
> +  void __kernel *local_p_1 = p;
> +  void __kernel *local_p_2 = p_kernel;
> +  void __kernel *local_p_3 = p_user; /* { dg-error "initialization from pointer with different trust level" "" { target c } } */
> +  /* { dg-message "expected 'void \\*' but pointer is of type '__attribute__\\(\\(untrusted\\)\\) void \\*'" "" { target c } .-1 } */
> +  /* { dg-error "invalid conversion from '__attribute__\\(\\(untrusted\\)\\) void\\*' to 'void\\*'" "" { target c++ } .-2 } */
> +}
> +
> +void test_init_p_user (void)
> +{
> +  void __user *local_p_1 = p; /* { dg-error "initialization from pointer with different trust level" "" { target c } } */
> +  /* { dg-message "expected '__attribute__\\(\\(untrusted\\)\\) void \\*' but pointer is of type 'void \\*'" "" { target c } .-1 } */
> +  void __user *local_p_2 = p_kernel; /* { dg-error "initialization from pointer with different trust level" "" { target c } } */
> +  /* { dg-message "expected '__attribute__\\(\\(untrusted\\)\\) void \\*' but pointer is of type 'void \\*'" "" { target c } .-1 } */
> +  void __user *local_p_3 = p_user;
> +  void __user *local_p_4 = NULL;
> +}
> +
> +void test_assign_to_p (void)
> +{
> +  p = p;
> +  p = p_kernel;
> +  p = p_user; /* { dg-error "assignment from pointer with different trust level" "" { target c } } */
> +  /* { dg-message "expected 'void \\*' but pointer is of type '__attribute__\\(\\(untrusted\\)\\) void \\*'" "" { target c } .-1 } */
> +  /* { dg-error "invalid conversion from '__attribute__\\(\\(untrusted\\)\\) void\\*' to 'void\\*'" "" { target c++ } .-2 } */
> +  // etc
> +}
> +
> +void test_assign_to_p_kernel (void)
> +{
> +  p_kernel = p;
> +  p_kernel = p_kernel;
> +  p_kernel = p_user; /* { dg-error "assignment from pointer with different trust level" "" { target c } } */
> +  /* { dg-message "expected 'void \\*' but pointer is of type '__attribute__\\(\\(untrusted\\)\\) void \\*'" "" { target c } .-1 } */
> +  /* { dg-error "invalid conversion from '__attribute__\\(\\(untrusted\\)\\) void\\*' to 'void\\*'" "" { target c++ } .-2 } */
> +  // etc
> +}
> +
> +void test_assign_to_p_user (void)
> +{
> +  p_user = p;  /* { dg-error "assignment from pointer with different trust level" "" { target c } } */
> +  /* { dg-message "expected '__attribute__\\(\\(untrusted\\)\\) void \\*' but pointer is of type 'void \\*'" "" { target c } .-1 } */
> +  p_user = p_kernel;  /* { dg-error "assignment from pointer with different trust level" "" { target c } } */
> +  /* { dg-message "expected '__attribute__\\(\\(untrusted\\)\\) void \\*' but pointer is of type 'void \\*'" "" { target c } .-1 } */
> +  p_user = p_user;
> +  p_user = NULL;
> +  // etc
> +}
> +
> +void *test_return_p (int i)
> +{
> +  switch (i)
> +    {
> +    default:
> +    case 0:
> +      return p;
> +    case 1:
> +      return p_kernel;
> +    case 2:
> +      return p_user; /* { dg-error "return from pointer with different trust level" "" { target c } } */
> +      /* { dg-message "expected 'void \\*' but pointer is of type '__attribute__\\(\\(untrusted\\)\\) void \\*'" "" { target c } .-1 } */
> +      /* { dg-error "invalid conversion from '__attribute__\\(\\(untrusted\\)\\) void\\*' to 'void\\*'" "" { target c++ } .-2 } */
> +    }
> +}
> +
> +void __kernel *test_return_p_kernel (int i)
> +{
> +  switch (i)
> +    {
> +    default:
> +    case 0:
> +      return p;
> +    case 1:
> +      return p_kernel;
> +    case 2:
> +      return p_user; /* { dg-error "return from pointer with different trust level" "" { target c } } */
> +      /* { dg-message "expected 'void \\*' but pointer is of type '__attribute__\\(\\(untrusted\\)\\) void \\*'" "" { target c } .-1 } */
> +      /* { dg-error "invalid conversion from '__attribute__\\(\\(untrusted\\)\\) void\\*' to 'void\\*'" "" { target c++ } .-2 } */
> +    }
> +}
> +
> +void __user *
> +test_return_p_user (int i)
> +{
> +  switch (i)
> +    {
> +    default:
> +    case 0:
> +      return p; /* { dg-error "return from pointer with different trust level" "" { target c } } */
> +      /* { dg-message "expected '__attribute__\\(\\(untrusted\\)\\) void \\*' but pointer is of type 'void \\*'" "" { target c } .-1 } */
> +    case 1:
> +      return p_kernel; /* { dg-error "return from pointer with different trust level" "" { target c } } */
> +      /* { dg-message "expected '__attribute__\\(\\(untrusted\\)\\) void \\*' but pointer is of type 'void \\*'" "" { target c } .-1 } */
> +    case 2:
> +      return p_user;
> +    case 3:
> +      return NULL;
> +    }
> +}
> +
> +void test_cast_k_to_u (void)
> +{
> +  p_user = (void __user *)p_kernel;
> +}
> +
> +void test_cast_u_to_k (void)
> +{
> +  p_kernel = (void __kernel *)p_user;
> +}
> +
> +int test_deref_read (int __user *p)
> +{
> +  return *p; // FIXME: should this be allowed directly?
> +}
> +
> +void test_deref_write (int __user *p, int i)
> +{
> +  *p = i; // FIXME: should this be allowed directly?
> +}
> +
> +typedef struct foo { int i; } __user *foo_ptr_t;
> +
> +void __user *
> +test_pass_through (void __user *ptr)
> +{
> +  return ptr;
> +}
> diff --git a/gcc/tree-core.h b/gcc/tree-core.h
> index 8ab119dc9a2..35a7f50c06c 100644
> --- a/gcc/tree-core.h
> +++ b/gcc/tree-core.h
> @@ -604,7 +604,8 @@ enum cv_qualifier {
>     TYPE_QUAL_CONST    = 0x1,
>     TYPE_QUAL_VOLATILE = 0x2,
>     TYPE_QUAL_RESTRICT = 0x4,
> -  TYPE_QUAL_ATOMIC   = 0x8
> +  TYPE_QUAL_ATOMIC   = 0x8,
> +  TYPE_QUAL_UNTRUSTED = 0x10
>   };
>   
>   /* Standard named or nameless data types of the C compiler.  */
> @@ -1684,7 +1685,8 @@ struct GTY(()) tree_type_common {
>     unsigned typeless_storage : 1;
>     unsigned empty_flag : 1;
>     unsigned indivisible_p : 1;
> -  unsigned spare : 16;
> +  unsigned untrusted_flag : 1;
> +  unsigned spare : 15;
>   
>     alias_set_type alias_set;
>     tree pointer_to;
> diff --git a/gcc/tree.c b/gcc/tree.c
> index 845228a055b..3600639d985 100644
> --- a/gcc/tree.c
> +++ b/gcc/tree.c
> @@ -5379,6 +5379,7 @@ set_type_quals (tree type, int type_quals)
>     TYPE_VOLATILE (type) = (type_quals & TYPE_QUAL_VOLATILE) != 0;
>     TYPE_RESTRICT (type) = (type_quals & TYPE_QUAL_RESTRICT) != 0;
>     TYPE_ATOMIC (type) = (type_quals & TYPE_QUAL_ATOMIC) != 0;
> +  TYPE_UNTRUSTED (type) = (type_quals & TYPE_QUAL_UNTRUSTED) != 0;
>     TYPE_ADDR_SPACE (type) = DECODE_QUAL_ADDR_SPACE (type_quals);
>   }
>   
> diff --git a/gcc/tree.h b/gcc/tree.h
> index f62c00bc870..caab575b210 100644
> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -2197,6 +2197,10 @@ extern tree vector_element_bits_tree (const_tree);
>      the term.  */
>   #define TYPE_RESTRICT(NODE) (TYPE_CHECK (NODE)->type_common.restrict_flag)
>   
> +/* Nonzero in a type considered "untrusted" - values should be treated as
> +   under attacker control.  */
> +#define TYPE_UNTRUSTED(NODE) (TYPE_CHECK (NODE)->type_common.untrusted_flag)
> +
>   /* If nonzero, type's name shouldn't be emitted into debug info.  */
>   #define TYPE_NAMELESS(NODE) (TYPE_CHECK (NODE)->base.u.bits.nameless_flag)
>   
> @@ -2221,6 +2225,7 @@ extern tree vector_element_bits_tree (const_tree);
>   	  | (TYPE_VOLATILE (NODE) * TYPE_QUAL_VOLATILE)		\
>   	  | (TYPE_ATOMIC (NODE) * TYPE_QUAL_ATOMIC)		\
>   	  | (TYPE_RESTRICT (NODE) * TYPE_QUAL_RESTRICT)		\
> +	  | (TYPE_UNTRUSTED (NODE) * TYPE_QUAL_UNTRUSTED)	\
>   	  | (ENCODE_QUAL_ADDR_SPACE (TYPE_ADDR_SPACE (NODE)))))
>   
>   /* The same as TYPE_QUALS without the address space qualifications.  */
> @@ -2228,14 +2233,16 @@ extern tree vector_element_bits_tree (const_tree);
>     ((int) ((TYPE_READONLY (NODE) * TYPE_QUAL_CONST)		\
>   	  | (TYPE_VOLATILE (NODE) * TYPE_QUAL_VOLATILE)		\
>   	  | (TYPE_ATOMIC (NODE) * TYPE_QUAL_ATOMIC)		\
> -	  | (TYPE_RESTRICT (NODE) * TYPE_QUAL_RESTRICT)))
> +	  | (TYPE_RESTRICT (NODE) * TYPE_QUAL_RESTRICT)		\
> +	  | (TYPE_UNTRUSTED (NODE) * TYPE_QUAL_UNTRUSTED)))
>   
>   /* The same as TYPE_QUALS without the address space and atomic
>      qualifications.  */
>   #define TYPE_QUALS_NO_ADDR_SPACE_NO_ATOMIC(NODE)		\
>     ((int) ((TYPE_READONLY (NODE) * TYPE_QUAL_CONST)		\
>   	  | (TYPE_VOLATILE (NODE) * TYPE_QUAL_VOLATILE)		\
> -	  | (TYPE_RESTRICT (NODE) * TYPE_QUAL_RESTRICT)))
> +	  | (TYPE_RESTRICT (NODE) * TYPE_QUAL_RESTRICT)		\
> +	  | (TYPE_UNTRUSTED (NODE) * TYPE_QUAL_UNTRUSTED)))
>   
>   /* These flags are available for each language front end to use internally.  */
>   #define TYPE_LANG_FLAG_0(NODE) (TYPE_CHECK (NODE)->type_common.lang_flag_0)
> 



* Re: [PATCH 0/6] RFC: adding support to GCC for detecting trust boundaries
  2021-12-09 16:42     ` Martin Sebor
@ 2021-12-09 23:40       ` Segher Boessenkool
  0 siblings, 0 replies; 39+ messages in thread
From: Segher Boessenkool @ 2021-12-09 23:40 UTC (permalink / raw)
  To: Martin Sebor; +Cc: David Malcolm, gcc-patches, linux-toolchains

On Thu, Dec 09, 2021 at 09:42:04AM -0700, Martin Sebor wrote:
> On 12/6/21 12:40 PM, Segher Boessenkool wrote:
> >Named address spaces are completely target-specific.
> 
> My understanding of these kernel/user address spaces that David
> is adding for the benefit of the analyzer is that they correspond
> to what TR 18037 calls nested address spaces.  They're nested within
> the generic address space that's a union of the two.  With that, I'd
> expect them to be fully handled early on and be transparent
> afterwards.  Is implementing this idea not feasible in the GCC
> design?

As long as you can explain it, it can be implemented.  What I am saying
though is it is imnsho a very bad idea to try to implement this in terms
of named address spaces (which is a GCC extension).


Segher


* PING (C/C++): Re: [PATCH 6/6] Add __attribute__ ((tainted))
  2021-11-13 20:37 ` [PATCH 6/6] Add __attribute__ ((tainted)) David Malcolm
@ 2022-01-06 14:08   ` David Malcolm
  2022-01-10 21:36     ` PING^2 " David Malcolm
  0 siblings, 1 reply; 39+ messages in thread
From: David Malcolm @ 2022-01-06 14:08 UTC (permalink / raw)
  To: gcc-patches, linux-toolchains

On Sat, 2021-11-13 at 15:37 -0500, David Malcolm wrote:
> This patch adds a new __attribute__ ((tainted)) to the C/C++
> frontends.

Ping for GCC C/C++ maintainers for review of the C/C++ FE parts of this
patch (attribute registration, documentation, the name of the
attribute, etc).

(I believe it's independent of the rest of the patch kit, in that it
could go into trunk without needing the prior patches)

Thanks
Dave


> 
> It can be used on function decls: the analyzer will treat as tainted
> all parameters to the function and all buffers pointed to by parameters
> to the function.  Adding this in one place to the Linux kernel's
> __SYSCALL_DEFINEx macro allows the analyzer to treat all syscalls as
> having tainted inputs.  This gives additional testing beyond e.g. __user
> pointers added by earlier patches - an example of the use of this can be
> seen in CVE-2011-2210, where given:
> 
>  SYSCALL_DEFINE5(osf_getsysinfo, unsigned long, op, void __user *, buffer,
>                  unsigned long, nbytes, int __user *, start, void __user *, arg)
> 
> the analyzer will treat the nbytes param as under attacker control, and
> can complain accordingly:
> 
> taint-CVE-2011-2210-1.c: In function ‘sys_osf_getsysinfo’:
> taint-CVE-2011-2210-1.c:69:21: warning: use of attacker-controlled value
>   ‘nbytes’ as size without upper-bounds checking [CWE-129] [-Wanalyzer-tainted-size]
>    69 |                 if (copy_to_user(buffer, hwrpb, nbytes) != 0)
>       |                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
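
The kind of fix this warning asks for can be sketched as follows.  This
is an illustration only, not code from the kernel or from the patch: the
macro TAINTED_INPUTS is a placeholder that expands to nothing (with the
patch applied it would stand for the attribute), and get_info and
INFO_SIZE are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Placeholder: expands to nothing so the sketch builds with any compiler.
   With the patch applied it would be __attribute__((tainted)).  */
#define TAINTED_INPUTS

#define INFO_SIZE 16 /* hypothetical size of the kernel-side record */

/* Copy NBYTES bytes of INFO into OUT.  NBYTES is treated as
   attacker-controlled, so it is checked against an upper bound before
   being used as a size.  Returns 0 on success, -1 on a bad size.  */
TAINTED_INPUTS static int
get_info (char *out, const char *info, size_t nbytes)
{
  if (nbytes > INFO_SIZE)
    return -1;
  memcpy (out, info, nbytes);
  return 0;
}
```

Without the comparison against INFO_SIZE, nbytes would flow unchecked
into the copy, which is the pattern the diagnostic above describes.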
> 
> Additionally, the patch allows the attribute to be used on field decls:
> specifically function pointers.  Any function used as an initializer
> for such a field gets treated as tainted.  An example can be seen in
> CVE-2020-13143, where adding __attribute__((tainted)) to the "store"
> callback of configfs_attribute:
> 
>   struct configfs_attribute {
>      /* [...snip...] */
>      ssize_t (*store)(struct config_item *, const char *, size_t)
>        __attribute__((tainted));
>      /* [...snip...] */
>   };
> 
> allows the analyzer to see:
> 
>  CONFIGFS_ATTR(gadget_dev_desc_, UDC);
> 
> and treat gadget_dev_desc_UDC_store as tainted, so that it complains:
> 
> taint-CVE-2020-13143-1.c: In function ‘gadget_dev_desc_UDC_store’:
> taint-CVE-2020-13143-1.c:33:17: warning: use of attacker-controlled value
>   ‘len + 18446744073709551615’ as offset without upper-bounds checking [CWE-823] [-Wanalyzer-tainted-offset]
>    33 |         if (name[len - 1] == '\n')
>       |             ~~~~^~~~~~~~~
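
The constant 18446744073709551615 in that message is 2^64 - 1, that is,
"len - 1" printed as unsigned 64-bit arithmetic.  A minimal sketch of the
guarded form, assuming a store-style callback that strips one trailing
newline (trimmed_length is a hypothetical helper, not a kernel function):

```c
#include <stddef.h>

/* Strip one trailing newline from a store-style callback's input.
   LEN is treated as attacker-controlled: without the LEN > 0 test,
   NAME[LEN - 1] would index far outside the buffer when LEN is 0,
   because LEN - 1 wraps around.  Returns the length without the
   trailing newline.  */
static size_t
trimmed_length (const char *name, size_t len)
{
  if (len > 0 && name[len - 1] == '\n')
    return len - 1;
  return len;
}
```

Checking len before forming len - 1 is enough here because the index is
only ever read when the buffer holds at least one byte.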
> 
> Similarly, the attribute could be used on the ioctl callback field,
> USB device callbacks, network-handling callbacks etc.  This potentially
> gives a lot of test coverage with relatively little code annotation, and
> without necessarily needing link-time analysis (which -fanalyzer can
> only do at present on trivial examples).
> 
> I believe this is the first time we've had an attribute on a field.
> If that's an issue, I could prepare a version of the patch that
> merely allowed it on functions themselves.
> 
> As before this currently still needs -fanalyzer-checker=taint (in
> addition to -fanalyzer).
> 
> gcc/analyzer/ChangeLog:
>         * engine.cc: Include "stringpool.h", "attribs.h", and
>         "tree-dfa.h".
>         (mark_params_as_tainted): New.
>         (class tainted_function_custom_event): New.
>         (class tainted_function_info): New.
>         (exploded_graph::add_function_entry): Handle functions with
>         "tainted" attribute.
>         (class tainted_field_custom_event): New.
>         (class tainted_callback_custom_event): New.
>         (class tainted_call_info): New.
>         (add_tainted_callback): New.
>         (add_any_callbacks): New.
>         (exploded_graph::build_initial_worklist): Find callbacks that are
>         reachable from global initializers, calling add_any_callbacks on
>         them.
> 
> gcc/c-family/ChangeLog:
>         * c-attribs.c (c_common_attribute_table): Add "tainted".
>         (handle_tainted_attribute): New.
> 
> gcc/ChangeLog:
>         * doc/extend.texi (Function Attributes): Note that "tainted" can
>         be used on field decls.
>         (Common Function Attributes): Add entry on "tainted" attribute.
> 
> gcc/testsuite/ChangeLog:
>         * gcc.dg/analyzer/attr-tainted-1.c: New test.
>         * gcc.dg/analyzer/attr-tainted-misuses.c: New test.
>         * gcc.dg/analyzer/taint-CVE-2011-2210-1.c: New test.
>         * gcc.dg/analyzer/taint-CVE-2020-13143-1.c: New test.
>         * gcc.dg/analyzer/taint-CVE-2020-13143-2.c: New test.
>         * gcc.dg/analyzer/taint-CVE-2020-13143.h: New file.
>         * gcc.dg/analyzer/taint-alloc-3.c: New test.
>         * gcc.dg/analyzer/taint-alloc-4.c: New test.
> 
> Signed-off-by: David Malcolm <dmalcolm@redhat.com>
> ---
>  gcc/analyzer/engine.cc                        | 317 +++++++++++++++++-
>  gcc/c-family/c-attribs.c                      |  36 ++
>  gcc/doc/extend.texi                           |  22 +-
>  .../gcc.dg/analyzer/attr-tainted-1.c          |  88 +++++
>  .../gcc.dg/analyzer/attr-tainted-misuses.c    |   6 +
>  .../gcc.dg/analyzer/taint-CVE-2011-2210-1.c   |  93 +++++
>  .../gcc.dg/analyzer/taint-CVE-2020-13143-1.c  |  38 +++
>  .../gcc.dg/analyzer/taint-CVE-2020-13143-2.c  |  32 ++
>  .../gcc.dg/analyzer/taint-CVE-2020-13143.h    |  91 +++++
>  gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c |  21 ++
>  gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c |  31 ++
>  11 files changed, 772 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/analyzer/attr-tainted-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/analyzer/attr-tainted-misuses.c
>  create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-2210-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143.h
>  create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c
>  create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c
> 
> diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
> index 096e219392d..5fab41daf93 100644
> --- a/gcc/analyzer/engine.cc
> +++ b/gcc/analyzer/engine.cc
> @@ -68,6 +68,9 @@ along with GCC; see the file COPYING3.  If not see
>  #include "plugin.h"
>  #include "target.h"
>  #include <memory>
> +#include "stringpool.h"
> +#include "attribs.h"
> +#include "tree-dfa.h"
>  
>  /* For an overview, see gcc/doc/analyzer.texi.  */
>  
> @@ -2276,6 +2279,116 @@ exploded_graph::~exploded_graph ()
>      delete (*iter).second;
>  }
>  
> +/* Subroutine for use when implementing __attribute__((tainted))
> +   on functions and on function pointer fields in structs.
> +
> +   Called on STATE representing a call to FNDECL.
> +   Mark all params of FNDECL in STATE as "tainted".  Mark the value
> of all
> +   regions pointed to by params of FNDECL as "tainted".
> +
> +   Return true if successful; return false if the "taint" state
> machine
> +   was not found.  */
> +
> +static bool
> +mark_params_as_tainted (program_state *state, tree fndecl,
> +                       const extrinsic_state &ext_state)
> +{
> +  unsigned taint_sm_idx;
> +  if (!ext_state.get_sm_idx_by_name ("taint", &taint_sm_idx))
> +    return false;
> +  sm_state_map *smap = state->m_checker_states[taint_sm_idx];
> +
> +  const state_machine &sm = ext_state.get_sm (taint_sm_idx);
> +  state_machine::state_t tainted = sm.get_state_by_name ("tainted");
> +
> +  region_model_manager *mgr = ext_state.get_model_manager ();
> +
> +  function *fun = DECL_STRUCT_FUNCTION (fndecl);
> +  gcc_assert (fun);
> +
> +  for (tree iter_parm = DECL_ARGUMENTS (fndecl); iter_parm;
> +       iter_parm = DECL_CHAIN (iter_parm))
> +    {
> +      tree param = iter_parm;
> +      if (tree parm_default_ssa = ssa_default_def (fun, iter_parm))
> +       param = parm_default_ssa;
> +      const region *param_reg = state->m_region_model->get_lvalue
> (param, NULL);
> +      const svalue *init_sval = mgr->get_or_create_initial_value
> (param_reg);
> +      smap->set_state (state->m_region_model, init_sval,
> +                      tainted, NULL /*origin_new_sval*/, ext_state);
> +      if (POINTER_TYPE_P (TREE_TYPE (param)))
> +       {
> +         const region *pointee_reg = mgr->get_symbolic_region
> (init_sval);
> +         /* Mark "*param" as tainted.  */
> +         const svalue *init_pointee_sval
> +           = mgr->get_or_create_initial_value (pointee_reg);
> +         smap->set_state (state->m_region_model, init_pointee_sval,
> +                          tainted, NULL /*origin_new_sval*/,
> ext_state);
> +       }
> +    }
> +
> +  return true;
> +}
> +
> +/* Custom event for use by tainted_function_info when a function
> +   has been marked with __attribute__((tainted)).  */
> +
> +class tainted_function_custom_event : public custom_event
> +{
> +public:
> +  tainted_function_custom_event (location_t loc, tree fndecl, int
> depth)
> +  : custom_event (loc, fndecl, depth),
> +    m_fndecl (fndecl)
> +  {
> +  }
> +
> +  label_text get_desc (bool can_colorize) const FINAL OVERRIDE
> +  {
> +    return make_label_text
> +      (can_colorize,
> +       "function %qE marked with %<__attribute__((tainted))%>",
> +       m_fndecl);
> +  }
> +
> +private:
> +  tree m_fndecl;
> +};
> +
> +/* Custom exploded_edge info for top-level calls to a function
> +   marked with __attribute__((tainted)).  */
> +
> +class tainted_function_info : public custom_edge_info
> +{
> +public:
> +  tainted_function_info (tree fndecl)
> +  : m_fndecl (fndecl)
> +  {}
> +
> +  void print (pretty_printer *pp) const FINAL OVERRIDE
> +  {
> +    pp_string (pp, "call to tainted function");
> +  };
> +
> +  bool update_model (region_model *,
> +                    const exploded_edge *,
> +                    region_model_context *) const FINAL OVERRIDE
> +  {
> +    /* No-op.  */
> +    return true;
> +  }
> +
> +  void add_events_to_path (checker_path *emission_path,
> +                          const exploded_edge &) const FINAL
> OVERRIDE
> +  {
> +    emission_path->add_event
> +      (new tainted_function_custom_event
> +       (DECL_SOURCE_LOCATION (m_fndecl), m_fndecl, 0));
> +  }
> +
> +private:
> +  tree m_fndecl;
> +};
> +
>  /* Ensure that there is an exploded_node representing an external
> call to
>     FUN, adding it to the worklist if creating it.
>  
> @@ -2302,14 +2415,25 @@ exploded_graph::add_function_entry (function
> *fun)
>    program_state state (m_ext_state);
>    state.push_frame (m_ext_state, fun);
>  
> +  custom_edge_info *edge_info = NULL;
> +
> +  if (lookup_attribute ("tainted", DECL_ATTRIBUTES (fun->decl)))
> +    {
> +      if (mark_params_as_tainted (&state, fun->decl, m_ext_state))
> +       edge_info = new tainted_function_info (fun->decl);
> +    }
> +
>    if (!state.m_valid)
>      return NULL;
>  
>    exploded_node *enode = get_or_create_node (point, state, NULL);
>    if (!enode)
> -    return NULL;
> +    {
> +      delete edge_info;
> +      return NULL;
> +    }
>  
> -  add_edge (m_origin, enode, NULL);
> +  add_edge (m_origin, enode, NULL, edge_info);
>  
>    m_functions_with_enodes.add (fun);
>  
> @@ -2623,6 +2747,184 @@ toplevel_function_p (function *fun, logger
> *logger)
>    return true;
>  }
>  
> +/* Custom event for use by tainted_call_info when a callback field
> has been
> +   marked with __attribute__((tainted)), for labelling the field. 
> */
> +
> +class tainted_field_custom_event : public custom_event
> +{
> +public:
> +  tainted_field_custom_event (tree field)
> +  : custom_event (DECL_SOURCE_LOCATION (field), NULL_TREE, 0),
> +    m_field (field)
> +  {
> +  }
> +
> +  label_text get_desc (bool can_colorize) const FINAL OVERRIDE
> +  {
> +    return make_label_text (can_colorize,
> +                           "field %qE of %qT"
> +                           " is marked with
> %<__attribute__((tainted))%>",
> +                           m_field, DECL_CONTEXT (m_field));
> +  }
> +
> +private:
> +  tree m_field;
> +};
> +
> +/* Custom event for use by tainted_call_info when a callback field
> has been
> +   marked with __attribute__((tainted)), for labelling the function
> used
> +   in that callback.  */
> +
> +class tainted_callback_custom_event : public custom_event
> +{
> +public:
> +  tainted_callback_custom_event (location_t loc, tree fndecl, int
> depth,
> +                                tree field)
> +  : custom_event (loc, fndecl, depth),
> +    m_field (field)
> +  {
> +  }
> +
> +  label_text get_desc (bool can_colorize) const FINAL OVERRIDE
> +  {
> +    return make_label_text (can_colorize,
> +                           "function %qE used as initializer for
> field %qE"
> +                           " marked with
> %<__attribute__((tainted))%>",
> +                           m_fndecl, m_field);
> +  }
> +
> +private:
> +  tree m_field;
> +};
> +
> +/* Custom edge info for use when adding a function used by a
> callback field
> +   marked with '__attribute__((tainted))'.   */
> +
> +class tainted_call_info : public custom_edge_info
> +{
> +public:
> +  tainted_call_info (tree field, tree fndecl, location_t loc)
> +  : m_field (field), m_fndecl (fndecl), m_loc (loc)
> +  {}
> +
> +  void print (pretty_printer *pp) const FINAL OVERRIDE
> +  {
> +    pp_string (pp, "call to tainted field");
> +  };
> +
> +  bool update_model (region_model *,
> +                    const exploded_edge *,
> +                    region_model_context *) const FINAL OVERRIDE
> +  {
> +    /* No-op.  */
> +    return true;
> +  }
> +
> +  void add_events_to_path (checker_path *emission_path,
> +                          const exploded_edge &) const FINAL
> OVERRIDE
> +  {
> +    /* Show the field in the struct declaration
> +       e.g. "(1) field 'store' is marked with
> '__attribute__((tainted))'"  */
> +    emission_path->add_event
> +      (new tainted_field_custom_event (m_field));
> +
> +    /* Show the callback in the initializer
> +       e.g.
> +       "(2) function 'gadget_dev_desc_UDC_store' used as initializer
> +       for field 'store' marked with '__attribute__((tainted))'". 
> */
> +    emission_path->add_event
> +      (new tainted_callback_custom_event (m_loc, m_fndecl, 0,
> m_field));
> +  }
> +
> +private:
> +  tree m_field;
> +  tree m_fndecl;
> +  location_t m_loc;
> +};
> +
> +/* Given an initializer at LOC for FIELD marked with
> '__attribute__((tainted))'
> +   initialized with FNDECL, add an entrypoint to FNDECL to EG (and
> to its
> +   worklist) where the params to FNDECL are marked as tainted.  */
> +
> +static void
> +add_tainted_callback (exploded_graph *eg, tree field, tree fndecl,
> +                     location_t loc)
> +{
> +  logger *logger = eg->get_logger ();
> +
> +  LOG_SCOPE (logger);
> +
> +  if (!gimple_has_body_p (fndecl))
> +    return;
> +
> +  const extrinsic_state &ext_state = eg->get_ext_state ();
> +
> +  function *fun = DECL_STRUCT_FUNCTION (fndecl);
> +  gcc_assert (fun);
> +
> +  program_point point
> +    = program_point::from_function_entry (eg->get_supergraph (),
> fun);
> +  program_state state (ext_state);
> +  state.push_frame (ext_state, fun);
> +
> +  if (!mark_params_as_tainted (&state, fndecl, ext_state))
> +    return;
> +
> +  if (!state.m_valid)
> +    return;
> +
> +  exploded_node *enode = eg->get_or_create_node (point, state,
> NULL);
> +  if (logger)
> +    {
> +      if (enode)
> +       logger->log ("created EN %i for tainted %qE entrypoint",
> +                    enode->m_index, fndecl);
> +      else
> +       logger->log ("did not create enode for tainted %qE entrypoint",
> +                    fndecl);
> +    }
> +  /* Bail out whether or not logging is enabled.  */
> +  if (!enode)
> +    return;
> +
> +  tainted_call_info *info = new tainted_call_info (field, fndecl,
> loc);
> +  eg->add_edge (eg->get_origin (), enode, NULL, info);
> +}
> +
> +/* Callback for walk_tree for finding callbacks within initializers;
> +   ensure that any callback initializer where the corresponding
> field is
> +   marked with '__attribute__((tainted))' is treated as an
> entrypoint to the
> +   analysis, special-casing that the inputs to the callback are
> +   untrustworthy.  */
> +
> +static tree
> +add_any_callbacks (tree *tp, int *, void *data)
> +{
> +  exploded_graph *eg = (exploded_graph *)data;
> +  if (TREE_CODE (*tp) == CONSTRUCTOR)
> +    {
> +      /* Find fields with the "tainted" attribute.
> +        walk_tree only walks the values, not the index values;
> +        look at the index values.  */
> +      unsigned HOST_WIDE_INT idx;
> +      constructor_elt *ce;
> +
> +      for (idx = 0; vec_safe_iterate (CONSTRUCTOR_ELTS (*tp), idx,
> &ce);
> +          idx++)
> +       if (ce->index && TREE_CODE (ce->index) == FIELD_DECL)
> +         if (lookup_attribute ("tainted", DECL_ATTRIBUTES (ce-
> >index)))
> +           {
> +             tree value = ce->value;
> +             if (TREE_CODE (value) == ADDR_EXPR
> +                 && TREE_CODE (TREE_OPERAND (value, 0)) ==
> FUNCTION_DECL)
> +               add_tainted_callback (eg, ce->index, TREE_OPERAND
> (value, 0),
> +                                     EXPR_LOCATION (value));
> +           }
> +    }
> +
> +  return NULL_TREE;
> +}
> +
>  /* Add initial nodes to EG, with entrypoints for externally-callable
>     functions.  */
>  
> @@ -2648,6 +2950,17 @@ exploded_graph::build_initial_worklist ()
>           logger->log ("did not create enode for %qE entrypoint",
> fun->decl);
>        }
>    }
> +
> +  /* Find callbacks that are reachable from global initializers.  */
> +  varpool_node *vpnode;
> +  FOR_EACH_VARIABLE (vpnode)
> +    {
> +      tree decl = vpnode->decl;
> +      tree init = DECL_INITIAL (decl);
> +      if (!init)
> +       continue;
> +      walk_tree (&init, add_any_callbacks, this, NULL);
> +    }
>  }
>  
>  /* The main loop of the analysis.
> diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
> index 9e03156de5e..835ba6e0e8c 100644
> --- a/gcc/c-family/c-attribs.c
> +++ b/gcc/c-family/c-attribs.c
> @@ -117,6 +117,7 @@ static tree
> handle_no_profile_instrument_function_attribute (tree *, tree,
>                                                              tree,
> int, bool *);
>  static tree handle_malloc_attribute (tree *, tree, tree, int, bool
> *);
>  static tree handle_dealloc_attribute (tree *, tree, tree, int, bool
> *);
> +static tree handle_tainted_attribute (tree *, tree, tree, int, bool
> *);
>  static tree handle_returns_twice_attribute (tree *, tree, tree, int,
> bool *);
>  static tree handle_no_limit_stack_attribute (tree *, tree, tree,
> int,
>                                              bool *);
> @@ -569,6 +570,8 @@ const struct attribute_spec
> c_common_attribute_table[] =
>                               handle_objc_nullability_attribute, NULL
> },
>    { "*dealloc",                1, 2, true, false, false, false,
>                               handle_dealloc_attribute, NULL },
> +  { "tainted",               0, 0, true,  false, false, false,
> +                             handle_tainted_attribute, NULL },
>    { NULL,                     0, 0, false, false, false, false,
> NULL, NULL }
>  };
>  
> @@ -5857,6 +5860,39 @@ handle_objc_nullability_attribute (tree *node,
> tree name, tree args,
>    return NULL_TREE;
>  }
>  
> +/* Handle a "tainted" attribute; arguments as in
> +   struct attribute_spec.handler.  */
> +
> +static tree
> +handle_tainted_attribute (tree *node, tree name, tree, int,
> +                         bool *no_add_attrs)
> +{
> +  if (TREE_CODE (*node) != FUNCTION_DECL
> +      && TREE_CODE (*node) != FIELD_DECL)
> +    {
> +      warning (OPT_Wattributes, "%qE attribute ignored; valid only "
> +              "for functions and function pointer fields",
> +              name);
> +      *no_add_attrs = true;
> +      return NULL_TREE;
> +    }
> +
> +  if (TREE_CODE (*node) == FIELD_DECL
> +      && !(TREE_CODE (TREE_TYPE (*node)) == POINTER_TYPE
> +          && TREE_CODE (TREE_TYPE (TREE_TYPE (*node))) ==
> FUNCTION_TYPE))
> +    {
> +      warning (OPT_Wattributes, "%qE attribute ignored;"
> +              " field must be a function pointer",
> +              name);
> +      *no_add_attrs = true;
> +      return NULL_TREE;
> +    }
> +
> +  *no_add_attrs = false; /* OK */
> +
> +  return NULL_TREE;
> +}
> +
>  /* Attempt to partially validate a single attribute ATTR as if
>     it were to be applied to an entity OPER.  */
>  
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 5a6ef464779..826bbd48e7e 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -2465,7 +2465,8 @@ variable declarations (@pxref{Variable
> Attributes}),
>  labels (@pxref{Label Attributes}),
>  enumerators (@pxref{Enumerator Attributes}),
>  statements (@pxref{Statement Attributes}),
> -and types (@pxref{Type Attributes}).
> +types (@pxref{Type Attributes}),
> +and on field declarations (for @code{tainted}).
>  
>  There is some overlap between the purposes of attributes and pragmas
>  (@pxref{Pragmas,,Pragmas Accepted by GCC}).  It has been
> @@ -3977,6 +3978,25 @@ addition to creating a symbol version (as if
>  @code{"@var{name2}@@@var{nodename}"} was used) the version will be
> also used
>  to resolve @var{name2} by the linker.
>  
> +@item tainted
> +@cindex @code{tainted} function attribute
> +The @code{tainted} attribute is used to specify that a function is called
> +in a way that requires sanitization of its arguments, such as a system
> +call in an operating system kernel.  Such a function can be considered part
> +of the ``attack surface'' of the program.  The attribute can be used both
> +on function declarations, and on field declarations containing function
> +pointers.  In the latter case, any function used as an initializer of
> +such a callback field will be treated as tainted.
> +
> +The analyzer will pay particular attention to such functions when both
> +@option{-fanalyzer} and @option{-fanalyzer-checker=taint} are supplied,
> +potentially issuing warnings guarded by
> +@option{-Wanalyzer-exposure-through-uninit-copy},
> +@option{-Wanalyzer-tainted-allocation-size},
> +@option{-Wanalyzer-tainted-array-index},
> +@option{-Wanalyzer-tainted-offset},
> +and @option{-Wanalyzer-tainted-size}.
> +
>  @item target_clones (@var{options})
>  @cindex @code{target_clones} function attribute
>  The @code{target_clones} attribute is used to specify that a
> function
> diff --git a/gcc/testsuite/gcc.dg/analyzer/attr-tainted-1.c
> b/gcc/testsuite/gcc.dg/analyzer/attr-tainted-1.c
> new file mode 100644
> index 00000000000..cc4d5900372
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/analyzer/attr-tainted-1.c
> @@ -0,0 +1,88 @@
> +// TODO: remove need for this option
> +/* { dg-additional-options "-fanalyzer-checker=taint" } */
> +
> +#include "analyzer-decls.h"
> +
> +struct arg_buf
> +{
> +  int i;
> +  int j;
> +};
> +
> +/* Example of marking a function as tainted.  */
> +
> +void __attribute__((tainted))
> +test_1 (int i, void *p, char *q)
> +{
> +  /* There should be a single enode,
> +     for the "tainted" entry to the function.  */
> +  __analyzer_dump_exploded_nodes (0); /* { dg-warning "1 processed
> enode" } */
> +
> +  __analyzer_dump_state ("taint", i); /* { dg-warning "state:
> 'tainted'" } */
> +  __analyzer_dump_state ("taint", p); /* { dg-warning "state:
> 'tainted'" } */
> +  __analyzer_dump_state ("taint", q); /* { dg-warning "state:
> 'tainted'" } */
> +  __analyzer_dump_state ("taint", *q); /* { dg-warning "state:
> 'tainted'" } */
> +
> +  struct arg_buf *args = p;
> +  __analyzer_dump_state ("taint", args->i); /* { dg-warning "state:
> 'tainted'" } */
> +  __analyzer_dump_state ("taint", args->j); /* { dg-warning "state:
> 'tainted'" } */  
> +}
> +
> +/* Example of marking a callback field as tainted.  */
> +
> +struct s2
> +{
> +  void (*cb) (int, void *, char *)
> +    __attribute__((tainted));
> +};
> +
> +/* Function not marked as tainted.  */
> +
> +void
> +test_2a (int i, void *p, char *q)
> +{
> +  /* There should be a single enode,
> +     for the normal entry to the function.  */
> +  __analyzer_dump_exploded_nodes (0); /* { dg-warning "1 processed
> enode" } */
> +
> +  __analyzer_dump_state ("taint", i); /* { dg-warning "state:
> 'start'" } */
> +  __analyzer_dump_state ("taint", p); /* { dg-warning "state:
> 'start'" } */
> +  __analyzer_dump_state ("taint", q); /* { dg-warning "state:
> 'start'" } */
> +
> +  struct arg_buf *args = p;
> +  __analyzer_dump_state ("taint", args->i); /* { dg-warning "state:
> 'start'" } */
> +  __analyzer_dump_state ("taint", args->j); /* { dg-warning "state:
> 'start'" } */  
> +}
> +
> +/* Function referenced via t2b.cb, marked as "tainted".  */
> +
> +void
> +test_2b (int i, void *p, char *q)
> +{
> +  /* There should be two enodes
> +     for the direct call, and the "tainted" entry to the function. 
> */
> +  __analyzer_dump_exploded_nodes (0); /* { dg-warning "2 processed
> enodes" } */
> +}
> +
> +/* Callback used via t2c.cb, marked as "tainted".  */
> +void
> +__analyzer_test_2c (int i, void *p, char *q)
> +{
> +  /* There should be a single enode,
> +     for the "tainted" entry to the function.  */
> +  __analyzer_dump_exploded_nodes (0); /* { dg-warning "1 processed
> enode" } */
> +
> +  __analyzer_dump_state ("taint", i); /* { dg-warning "state:
> 'tainted'" } */
> +  __analyzer_dump_state ("taint", p); /* { dg-warning "state:
> 'tainted'" } */
> +  __analyzer_dump_state ("taint", q); /* { dg-warning "state:
> 'tainted'" } */
> +}
> +
> +struct s2 t2b =
> +{
> +  .cb = test_2b
> +};
> +
> +struct s2 t2c =
> +{
> +  .cb = __analyzer_test_2c
> +};
> diff --git a/gcc/testsuite/gcc.dg/analyzer/attr-tainted-misuses.c
> b/gcc/testsuite/gcc.dg/analyzer/attr-tainted-misuses.c
> new file mode 100644
> index 00000000000..6f4cbc82efb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/analyzer/attr-tainted-misuses.c
> @@ -0,0 +1,6 @@
> +int not_a_fn __attribute__ ((tainted)); /* { dg-warning "'tainted'
> attribute ignored; valid only for functions and function pointer
> fields" } */
> +
> +struct s
> +{
> +  int f __attribute__ ((tainted)); /* { dg-warning "'tainted'
> attribute ignored; field must be a function pointer" } */
> +};
> diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-2210-1.c
> b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-2210-1.c
> new file mode 100644
> index 00000000000..fe6c7ebbb1f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-2210-1.c
> @@ -0,0 +1,93 @@
> +/* "The osf_getsysinfo function in arch/alpha/kernel/osf_sys.c in
> the
> +   Linux kernel before 2.6.39.4 on the Alpha platform does not
> properly
> +   restrict the data size for GSI_GET_HWRPB operations, which allows
> +   local users to obtain sensitive information from kernel memory
> via
> +   a crafted call."
> +
> +   Fixed in 3d0475119d8722798db5e88f26493f6547a4bb5b on linux-
> 2.6.39.y
> +   in linux-stable.  */
> +
> +// TODO: remove need for this option:
> +/* { dg-additional-options "-fanalyzer-checker=taint" } */
> +
> +#include "analyzer-decls.h"
> +#include "test-uaccess.h"
> +
> +/* Adapted from include/linux/linkage.h.  */
> +
> +#define asmlinkage
> +
> +/* Adapted from include/linux/syscalls.h.  */
> +
> +#define __SC_DECL1(t1, a1)     t1 a1
> +#define __SC_DECL2(t2, a2, ...) t2 a2, __SC_DECL1(__VA_ARGS__)
> +#define __SC_DECL3(t3, a3, ...) t3 a3, __SC_DECL2(__VA_ARGS__)
> +#define __SC_DECL4(t4, a4, ...) t4 a4, __SC_DECL3(__VA_ARGS__)
> +#define __SC_DECL5(t5, a5, ...) t5 a5, __SC_DECL4(__VA_ARGS__)
> +#define __SC_DECL6(t6, a6, ...) t6 a6, __SC_DECL5(__VA_ARGS__)
> +
> +#define SYSCALL_DEFINEx(x, sname, ...)                         \
> +       __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
> +
> +#define SYSCALL_DEFINE(name) asmlinkage long sys_##name
> +#define __SYSCALL_DEFINEx(x, name,
> ...)                                        \
> +       asmlinkage __attribute__((tainted)) \
> +       long sys##name(__SC_DECL##x(__VA_ARGS__))
> +
> +#define SYSCALL_DEFINE5(name, ...) SYSCALL_DEFINEx(5, _##name,
> __VA_ARGS__)
> +
> +/* Adapted from arch/alpha/include/asm/hwrpb.h.  */
> +
> +struct hwrpb_struct {
> +       unsigned long phys_addr;        /* check: physical address of
> the hwrpb */
> +       unsigned long id;               /* check: "HWRPB\0\0\0" */
> +       unsigned long revision;
> +       unsigned long size;             /* size of hwrpb */
> +       /* [...snip...] */
> +};
> +
> +extern struct hwrpb_struct *hwrpb;
> +
> +/* Adapted from arch/alpha/kernel/osf_sys.c.  */
> +
> +SYSCALL_DEFINE5(osf_getsysinfo, unsigned long, op, void __user *,
> buffer,
> +               unsigned long, nbytes, int __user *, start, void
> __user *, arg)
> +{
> +       /* [...snip...] */
> +
> +       __analyzer_dump_state ("taint", nbytes);  /* { dg-warning
> "tainted" } */
> +
> +       /* TODO: should have an event explaining why "nbytes" is
> treated as
> +          attacker-controlled.  */
> +
> +       /* case GSI_GET_HWRPB: */
> +               if (nbytes < sizeof(*hwrpb))
> +                       return -1;
> +
> +               __analyzer_dump_state ("taint", nbytes);  /* { dg-
> warning "has_lb" } */
> +
> +               if (copy_to_user(buffer, hwrpb, nbytes) != 0) /* {
> dg-warning "use of attacker-controlled value 'nbytes' as size without
> upper-bounds checking" } */
> +                       return -2;
> +
> +               return 1;
> +
> +       /* [...snip...] */
> +}
> +
> +/* With the fix for the sense of the size comparison.  */
> +
> +SYSCALL_DEFINE5(osf_getsysinfo_fixed, unsigned long, op, void __user
> *, buffer,
> +               unsigned long, nbytes, int __user *, start, void
> __user *, arg)
> +{
> +       /* [...snip...] */
> +
> +       /* case GSI_GET_HWRPB: */
> +               if (nbytes > sizeof(*hwrpb))
> +                       return -1;
> +               if (copy_to_user(buffer, hwrpb, nbytes) != 0) /* {
> dg-bogus "attacker-controlled" } */
> +                       return -2;
> +
> +               return 1;
> +
> +       /* [...snip...] */
> +}
> diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-1.c
> b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-1.c
> new file mode 100644
> index 00000000000..0b9a94a8d6c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-1.c
> @@ -0,0 +1,38 @@
> +/* See notes in this header.  */
> +#include "taint-CVE-2020-13143.h"
> +
> +// TODO: remove need for this option
> +/* { dg-additional-options "-fanalyzer-checker=taint" } */
> +
> +struct configfs_attribute {
> +       /* [...snip...] */
> +       ssize_t (*store)(struct config_item *, const char *, size_t)
> /* { dg-message "\\(1\\) field 'store' of 'struct configfs_attribute'
> is marked with '__attribute__\\(\\(tainted\\)\\)'" } */
> +               __attribute__((tainted)); /* (this is added).  */
> +};
> +static inline struct gadget_info *to_gadget_info(struct config_item
> *item)
> +{
> +        return container_of(to_config_group(item), struct
> gadget_info, group);
> +}
> +
> +static ssize_t gadget_dev_desc_UDC_store(struct config_item *item,
> +               const char *page, size_t len)
> +{
> +       struct gadget_info *gi = to_gadget_info(item);
> +       char *name;
> +       int ret;
> +
> +#if 0
> +       /* FIXME: this is the fix.  */
> +       if (strlen(page) < len)
> +               return -EOVERFLOW;
> +#endif
> +
> +       name = kstrdup(page, GFP_KERNEL);
> +       if (!name)
> +               return -ENOMEM;
> +       if (name[len - 1] == '\n') /* { dg-warning "use of attacker-
> controlled value 'len \[^\n\r\]+' as offset without upper-bounds
> checking" } */
> +               name[len - 1] = '\0'; /* { dg-warning "use of
> attacker-controlled value 'len \[^\n\r\]+' as offset without upper-
> bounds checking" } */
> +       /* [...snip...] */                              \
> +}
> +
> +CONFIGFS_ATTR(gadget_dev_desc_, UDC); /* { dg-message "\\(2\\)
> function 'gadget_dev_desc_UDC_store' used as initializer for field
> 'store' marked with '__attribute__\\(\\(tainted\\)\\)'" } */
> diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-2.c
> b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-2.c
> new file mode 100644
> index 00000000000..e05da9276c1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-2.c
> @@ -0,0 +1,32 @@
> +/* See notes in this header.  */
> +#include "taint-CVE-2020-13143.h"
> +
> +// TODO: remove need for this option
> +/* { dg-additional-options "-fanalyzer-checker=taint" } */
> +
> +struct configfs_attribute {
> +       /* [...snip...] */
> +       ssize_t (*store)(struct config_item *, const char *, size_t)
> /* { dg-message "\\(1\\) field 'store' of 'struct configfs_attribute'
> is marked with '__attribute__\\(\\(tainted\\)\\)'" } */
> +               __attribute__((tainted)); /* (this is added).  */
> +};
> +
> +/* Highly simplified version.  */
> +
> +static ssize_t gadget_dev_desc_UDC_store(struct config_item *item,
> +               const char *page, size_t len)
> +{
> +       /* TODO: ought to have state_change_event talking about where
> the tainted value comes from.  */
> +
> +       char *name;
> +       /* [...snip...] */
> +
> +       name = kstrdup(page, GFP_KERNEL);
> +       if (!name)
> +               return -ENOMEM;
> +       if (name[len - 1] == '\n') /* { dg-warning "use of attacker-
> controlled value 'len \[^\n\r\]+' as offset without upper-bounds
> checking" } */
> +               name[len - 1] = '\0';  /* { dg-warning "use of
> attacker-controlled value 'len \[^\n\r\]+' as offset without upper-
> bounds checking" } */
> +       /* [...snip...] */
> +       return 0;
> +}
> +
> +CONFIGFS_ATTR(gadget_dev_desc_, UDC); /* { dg-message "\\(2\\)
> function 'gadget_dev_desc_UDC_store' used as initializer for field
> 'store' marked with '__attribute__\\(\\(tainted\\)\\)'" } */
> diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143.h
> b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143.h
> new file mode 100644
> index 00000000000..0ba023539af
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143.h
> @@ -0,0 +1,91 @@
> +/* Shared header for the various taint-CVE-2020-13143-*.c tests.
> +   
> +   "gadget_dev_desc_UDC_store in drivers/usb/gadget/configfs.c in
> the
> +   Linux kernel 3.16 through 5.6.13 relies on kstrdup without
> considering
> +   the possibility of an internal '\0' value, which allows attackers
> to
> +   trigger an out-of-bounds read, aka CID-15753588bcd4."
> +
> +   Fixed by 15753588bcd4bbffae1cca33c8ced5722477fe1f on linux-5.7.y
> +   in linux-stable.  */
> +
> +// TODO: remove need for this option
> +/* { dg-additional-options "-fanalyzer-checker=taint" } */
> +
> +#include <stddef.h>
> +
> +/* Adapted from include/uapi/asm-generic/posix_types.h  */
> +
> +typedef unsigned int     __kernel_size_t;
> +typedef int              __kernel_ssize_t;
> +
> +/* Adapted from include/linux/types.h  */
> +
> +//typedef __kernel_size_t              size_t;
> +typedef __kernel_ssize_t       ssize_t;
> +
> +/* Adapted from include/linux/kernel.h  */
> +
> +#define container_of(ptr, type, member)
> ({                             \
> +       void *__mptr = (void
> *)(ptr);                                   \
> +       /* [...snip...]
> */                                              \
> +       ((type *)(__mptr - offsetof(type, member))); })
> +
> +/* Adapted from include/linux/configfs.h  */
> +
> +struct config_item {
> +       /* [...snip...] */
> +};
> +
> +struct config_group {
> +       struct config_item              cg_item;
> +       /* [...snip...] */
> +};
> +
> +static inline struct config_group *to_config_group(struct
> config_item *item)
> +{
> +       return item ? container_of(item,struct config_group,cg_item)
> : NULL;
> +}
> +
> +#define CONFIGFS_ATTR(_pfx, _name)                             \
> +static struct configfs_attribute _pfx##attr_##_name = {        \
> +       /* [...snip...] */                              \
> +       .store          = _pfx##_name##_store,          \
> +}
> +
> +/* Adapted from include/linux/compiler.h  */
> +
> +#define __force
> +
> +/* Adapted from include/asm-generic/errno-base.h  */
> +
> +#define        ENOMEM          12      /* Out of memory */
> +
> +/* Adapted from include/linux/types.h  */
> +
> +#define __bitwise__
> +typedef unsigned __bitwise__ gfp_t;
> +
> +/* Adapted from include/linux/gfp.h  */
> +
> +#define ___GFP_WAIT            0x10u
> +#define ___GFP_IO              0x40u
> +#define ___GFP_FS              0x80u
> +#define __GFP_WAIT     ((__force gfp_t)___GFP_WAIT)
> +#define __GFP_IO       ((__force gfp_t)___GFP_IO)
> +#define __GFP_FS       ((__force gfp_t)___GFP_FS)
> +#define GFP_KERNEL  (__GFP_WAIT | __GFP_IO | __GFP_FS)
> +
> +/* Adapted from include/linux/compiler_attributes.h  */
> +
> +#define __malloc                        __attribute__((__malloc__))
> +
> +/* Adapted from include/linux/string.h  */
> +
> +extern char *kstrdup(const char *s, gfp_t gfp) __malloc;
> +
> +/* Adapted from drivers/usb/gadget/configfs.c  */
> +
> +struct gadget_info {
> +       struct config_group group;
> +       /* [...snip...] */                              \
> +};
> diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c
> b/gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c
> new file mode 100644
> index 00000000000..4c567b2ffdf
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c
> @@ -0,0 +1,21 @@
> +// TODO: remove need for this option:
> +/* { dg-additional-options "-fanalyzer-checker=taint" } */
> +
> +#include "analyzer-decls.h"
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +
> +/* malloc with tainted size from a syscall.  */
> +
> +void *p;
> +
> +void __attribute__((tainted))
> +test_1 (size_t sz) /* { dg-message "\\(1\\) function 'test_1' marked
> with '__attribute__\\(\\(tainted\\)\\)'" } */
> +{
> +  /* TODO: should have a message saying why "sz" is tainted, e.g.
> +     "treating 'sz' as attacker-controlled because 'test_1' is
> marked with '__attribute__((tainted))'"  */
> +
> +  p = malloc (sz); /* { dg-warning "use of attacker-controlled value
> 'sz' as allocation size without upper-bounds checking" "warning" } */
> +  /* { dg-message "\\(\[0-9\]+\\) use of attacker-controlled value
> 'sz' as allocation size without upper-bounds checking" "final event"
> { target *-*-* } .-1 } */
> +}
> diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c
> b/gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c
> new file mode 100644
> index 00000000000..f52cafcd71d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c
> @@ -0,0 +1,31 @@
> +// TODO: remove need for this option:
> +/* { dg-additional-options "-fanalyzer-checker=taint" } */
> +
> +#include "analyzer-decls.h"
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +
> +/* malloc with tainted size from a syscall.  */
> +
> +struct arg_buf
> +{
> +  size_t sz;
> +};
> +
> +void *p;
> +
> +void __attribute__((tainted))
> +test_1 (void *data) /* { dg-message "\\(1\\) function 'test_1'
> marked with '__attribute__\\(\\(tainted\\)\\)'" } */
> +{
> +  /* we should treat pointed-to-structs as tainted.  */
> +  __analyzer_dump_state ("taint", data); /* { dg-warning "state:
> 'tainted'" } */
> +  
> +  struct arg_buf *args = data;
> +
> +  __analyzer_dump_state ("taint", args); /* { dg-warning "state:
> 'tainted'" } */
> +  __analyzer_dump_state ("taint", args->sz); /* { dg-warning "state:
> 'tainted'" } */
> +  
> +  p = malloc (args->sz); /* { dg-warning "use of attacker-controlled
> value '\\*args.sz' as allocation size without upper-bounds checking"
> "warning" } */
> +  /* { dg-message "\\(\[0-9\]+\\) use of attacker-controlled value
> '\\*args.sz' as allocation size without upper-bounds checking" "final
> event" { target *-*-* } .-1 } */
> +}




* Re: [PATCH 1b/6] Add __attribute__((untrusted))
  2021-12-09 22:54   ` Martin Sebor
@ 2022-01-06 15:10     ` David Malcolm
  2022-01-06 18:59       ` Martin Sebor
  0 siblings, 1 reply; 39+ messages in thread
From: David Malcolm @ 2022-01-06 15:10 UTC (permalink / raw)
  To: Martin Sebor, gcc-patches, linux-toolchains

On Thu, 2021-12-09 at 15:54 -0700, Martin Sebor wrote:
> On 11/13/21 1:37 PM, David Malcolm via Gcc-patches wrote:
> > This patch adds a new:
> > 
> >    __attribute__((untrusted))
> > 
> > for use by the C front-end, intended for use by the Linux kernel for
> > use with "__user", but which could be used by other operating system
> > kernels, and potentially by other projects.
> 
> It looks like untrusted is a type attribute (rather than one
> that applies to variables and/or function return values or
> writeable by-reference arguments).  I find that quite surprising.

FWIW I initially tried implementing it on pointer types, but doing it
on the underlying type was much cleaner.

>   I'm used to thinking of trusted vs tainted as dynamic properties
> of data so I'm having trouble deciding what to think about
> the attribute applying to types.  Can you explain why it's
> useful on types?

A type system *is* a way of detecting problems involving dynamic
properties of data.  Ultimately all we have at runtime is a collection
of bits; the toolchain has the concept of types as a way to allow us to
reason about properties of those bits without requiring a full cross-TU
analysis (to try to figure out that e.g. x is, say, a 32-bit unsigned
integer), and to document these properties clearly to human readers of
the code.

I see this as working like a qualifier (rather like "const" and
"volatile"), in that an
  untrusted char *
when dereferenced gives you an
  untrusted char
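
A minimal sketch of that analogy, using plain "const" as a stand-in,
since __attribute__((untrusted)) only exists in this patch kit.  The
macro and function names are hypothetical; the point is just that the
qualifier lives on the pointed-to type, so it is what a dereference
gives you:

```c
/* Hypothetical helper: "const" stands in for the proposed "untrusted"
   qualifier.  The qualifier is part of the pointed-to type, so the
   lvalue *ptr has type "const char" and &*ptr is again "const char *".
   Requires C11 for _Generic.  */
#define POINTEE_IS_CONST(ptr) \
  _Generic (&*(ptr), const char *: 1, char *: 0)

int
pointee_is_const_for_const_ptr (void)
{
  const char *p = "abc";
  return POINTEE_IS_CONST (p);   /* selects the "const char *" branch */
}

int
pointee_is_const_for_plain_ptr (void)
{
  char buf[] = "abc";
  char *q = buf;
  return POINTEE_IS_CONST (q);   /* selects the "char *" branch */
}
```

With the attribute from the patch, the same reasoning would presumably
apply with "untrusted" in place of "const".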

The intent is to have a way of treating the values as "actively
hostile", so that code analyzers can assume the worst possible values
for such types (or more glibly, that we're dealing with data from Satan
rather than from Murphy).

Such types are also relevant to infoleaks: writing sensitive
information to an untrusted value can be detected relatively easily
with this approach, by checking the type of the value - the types
express the trust boundary.

Doing this with qualifiers allows us to use the C type system to detect
these kinds of issues without having to add a full cross-TU
interprocedural analysis, and documents it to human readers of the
code.   Compare with const-correctness; we can have an analogous
"trust-correctness".

> 
> I'd expect the taint property of a type to be quickly lost as
> an object of the type is passed through existing APIs (e.g.,
> a char array manipulated by string functions like strchr).

FWIW you can't directly pass an attacker-controlled buffer to strchr:
strchr requires there to be a 0-terminator to the array; if the array's
content is untrusted then the attacker might not have 0-terminated it.

As implemented, the patch doesn't complain about this, though maybe it
should.
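
To illustrate the 0-terminator concern, here is a rough sketch of a
bounded search that does not assume the buffer is terminated.  The
helper name "bounded_strchr" is hypothetical (it is not part of the
patch or of any library), and it assumes the caller knows the buffer
length:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: look for C in the first LEN bytes of BUF,
   stopping at a 0-terminator if there is one, but never reading past
   LEN bytes.  Unlike strchr, searching for '\0' itself returns NULL.  */
const char *
bounded_strchr (const char *buf, size_t len, int c)
{
  /* Find where the string ends, if it ends within the buffer at all.  */
  const char *end = memchr (buf, '\0', len);
  size_t n = end ? (size_t) (end - buf) : len;

  /* Only search the bytes before the terminator (or the whole buffer
     if the attacker did not supply one).  */
  return memchr (buf, c, n);
}
```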

The main point here is to support the existing __user annotation used
by the Linux kernel, in particular, copy_from_user and copy_to_user.

> 
> (I usually look at tests to help me understand the design of
> a change but I couldn't find an answer to my question in those
> in the patch.)

The patch kit was rather unclear on this, due to the use of two
different approaches (custom address spaces vs this untrusted
attribute).  Sorry about this.

Patches 4a and 4b in the kit add test-uaccess.h (to
gcc/testsuite/gcc.dg/analyzer) which supplies "__user"; see the tests
that use "test-uaccess.h" in patch 3:
 [PATCH 3/6] analyzer: implement infoleak detection
    https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584377.html
and in patch 5:
 [PATCH 5/6] analyzer: use region::untrusted_p in taint detection
   https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584374.html

(sorry about messing up the order of the patches).

Patch 4a here:
 [PATCH 4a/6] analyzer: implement region::untrusted_p in terms of custom address spaces
   https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584371.html
implements "__user" as a custom address space, 

whereas patch 4b here:

 [PATCH 4b/6] analyzer: implement region::untrusted_p in terms of __attribute__((untrusted))
    https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584373.html

implements "__user" to be __attribute__((untrusted)).

Perhaps I should drop the custom address space versions of the patches
and post a version of the kit that simply uses the attribute?

Dave


> 
> Thanks
> Martin
> 
> PS I found one paper online that discusses type-based taint
> analysis in Java but not much more.  I only quickly skimmed
> the paper and although it conceptually makes sense I'm still
> having difficulties seeing how it would be useful in C.
> 
> > 
> > Known issues:
> > - at least one TODO in handle_untrusted_attribute
> > - should it be permitted to dereference an untrusted pointer?  The
> > patch
> >    currently allows this
> > 
> > gcc/c-family/ChangeLog:
> >         * c-attribs.c (c_common_attribute_table): Add "untrusted".
> >         (build_untrusted_type): New.
> >         (handle_untrusted_attribute): New.
> >         * c-pretty-print.c (pp_c_cv_qualifiers): Handle
> >         TYPE_QUAL_UNTRUSTED.
> > 
> > gcc/c/ChangeLog:
> >         * c-typeck.c (convert_for_assignment): Complain if the trust
> >         levels vary when assigning a non-NULL pointer.
> > 
> > gcc/ChangeLog:
> >         * doc/extend.texi (Common Type Attributes): Add "untrusted".
> >         * print-tree.c (print_node): Handle TYPE_UNTRUSTED.
> >         * tree-core.h (enum cv_qualifier): Add TYPE_QUAL_UNTRUSTED.
> >         (struct tree_type_common): Assign one of the spare bits to a
> > new
> >         "untrusted_flag".
> >         * tree.c (set_type_quals): Handle TYPE_QUAL_UNTRUSTED.
> >         * tree.h (TYPE_QUALS): Likewise.
> >         (TYPE_QUALS_NO_ADDR_SPACE): Likewise.
> >         (TYPE_QUALS_NO_ADDR_SPACE_NO_ATOMIC): Likewise.
> > 
> > gcc/testsuite/ChangeLog:
> >         * c-c++-common/attr-untrusted-1.c: New test.
> > 
> > Signed-off-by: David Malcolm <dmalcolm@redhat.com>
> > ---
> >   gcc/c-family/c-attribs.c                      |  59 +++++++
> >   gcc/c-family/c-pretty-print.c                 |   2 +
> >   gcc/c/c-typeck.c                              |  64 +++++++
> >   gcc/doc/extend.texi                           |  25 +++
> >   gcc/print-tree.c                              |   3 +
> >   gcc/testsuite/c-c++-common/attr-untrusted-1.c | 165
> > ++++++++++++++++++
> >   gcc/tree-core.h                               |   6 +-
> >   gcc/tree.c                                    |   1 +
> >   gcc/tree.h                                    |  11 +-
> >   9 files changed, 332 insertions(+), 4 deletions(-)
> >   create mode 100644 gcc/testsuite/c-c++-common/attr-untrusted-1.c
> > 
> > diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
> > index 007b928c54b..100c2dabab2 100644
> > --- a/gcc/c-family/c-attribs.c
> > +++ b/gcc/c-family/c-attribs.c
> > @@ -136,6 +136,7 @@ static tree handle_warn_unused_result_attribute
> > (tree *, tree, tree, int,
> >                                                  bool *);
> >   static tree handle_access_attribute (tree *, tree, tree, int, bool
> > *);
> >   
> > +static tree handle_untrusted_attribute (tree *, tree, tree, int,
> > bool *);
> >   static tree handle_sentinel_attribute (tree *, tree, tree, int,
> > bool *);
> >   static tree handle_type_generic_attribute (tree *, tree, tree, int,
> > bool *);
> >   static tree handle_alloc_size_attribute (tree *, tree, tree, int,
> > bool *);
> > @@ -536,6 +537,8 @@ const struct attribute_spec
> > c_common_attribute_table[] =
> >                               handle_special_var_sec_attribute,
> > attr_section_exclusions },
> >     { "access",               1, 3, false, true, true, false,
> >                               handle_access_attribute, NULL },
> > +  { "untrusted",             0, 0, false,  true, false, true,
> > +                             handle_untrusted_attribute, NULL },
> >     /* Attributes used by Objective-C.  */
> >     { "NSObject",                     0, 0, true, false, false,
> > false,
> >                               handle_nsobject_attribute, NULL },
> > @@ -5224,6 +5227,62 @@ build_attr_access_from_parms (tree parms, bool
> > skip_voidptr)
> >     return build_tree_list (name, attrargs);
> >   }
> >   
> > +/* Build (or reuse) a type based on BASE_TYPE, but with
> > +   TYPE_QUAL_UNTRUSTED.  */
> > +
> > +static tree
> > +build_untrusted_type (tree base_type)
> > +{
> > +  int base_type_quals = TYPE_QUALS (base_type);
> > +  return build_qualified_type (base_type,
> > +                              base_type_quals |
> > TYPE_QUAL_UNTRUSTED);
> > +}
> > +
> > +/* Handle an "untrusted" attribute; arguments as in
> > +   struct attribute_spec.handler.  */
> > +
> > +static tree
> > +handle_untrusted_attribute (tree *node, tree ARG_UNUSED (name),
> > +                           tree ARG_UNUSED (args), int ARG_UNUSED
> > (flags),
> > +                           bool *no_add_attrs)
> > +{
> > +  if (TREE_CODE (*node) == POINTER_TYPE)
> > +    {
> > +      tree base_type = TREE_TYPE (*node);
> > +      tree untrusted_base_type = build_untrusted_type (base_type);
> > +      *node = build_pointer_type (untrusted_base_type);
> > +      *no_add_attrs = true; /* OK */
> > +      return NULL_TREE;
> > +    }
> > +  else if (TREE_CODE (*node) == FUNCTION_TYPE)
> > +    {
> > +      tree return_type = TREE_TYPE (*node);
> > +      if (TREE_CODE (return_type) == POINTER_TYPE)
> > +       {
> > +         tree base_type = TREE_TYPE (return_type);
> > +         tree untrusted_base_type = build_untrusted_type
> > (base_type);
> > +         tree untrusted_return_type = build_pointer_type
> > (untrusted_base_type);
> > +         tree fn_type = build_function_type (untrusted_return_type,
> > +                                             TYPE_ARG_TYPES
> > (*node));
> > +         *node = fn_type;
> > +         *no_add_attrs = true; /* OK */
> > +         return NULL_TREE;
> > +       }
> > +      else
> > +       {
> > +         gcc_unreachable (); // TODO
> > +       }
> > +    }
> > +  else
> > +    {
> > +      tree base_type = *node;
> > +      tree untrusted_base_type = build_untrusted_type (base_type);
> > +      *node = untrusted_base_type;
> > +      *no_add_attrs = true; /* OK */
> > +      return NULL_TREE;
> > +    }
> > +}
> > +
> >   /* Handle a "nothrow" attribute; arguments as in
> >      struct attribute_spec.handler.  */
> >   
> > diff --git a/gcc/c-family/c-pretty-print.c b/gcc/c-family/c-pretty-
> > print.c
> > index a987da46d6d..120e1e6d167 100644
> > --- a/gcc/c-family/c-pretty-print.c
> > +++ b/gcc/c-family/c-pretty-print.c
> > @@ -191,6 +191,8 @@ pp_c_cv_qualifiers (c_pretty_printer *pp, int
> > qualifiers, bool func_type)
> >     if (qualifiers & TYPE_QUAL_RESTRICT)
> >       pp_c_ws_string (pp, (flag_isoc99 && !c_dialect_cxx ()
> >                          ? "restrict" : "__restrict__"));
> > +  if (qualifiers & TYPE_QUAL_UNTRUSTED)
> > +    pp_c_ws_string (pp, "__attribute__((untrusted))");
> >   }
> >   
> >   /* Pretty-print T using the type-cast notation '( type-name )'.  */
> > diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
> > index 782414f8c8c..44de82b99ba 100644
> > --- a/gcc/c/c-typeck.c
> > +++ b/gcc/c/c-typeck.c
> > @@ -7284,6 +7284,70 @@ convert_for_assignment (location_t location,
> > location_t expr_loc, tree type,
> >           return error_mark_node;
> >         }
> >   
> > +      /* Untrusted vs trusted pointers, but allowing NULL to be used
> > +        for everything.  */
> > +      if (TYPE_UNTRUSTED (ttl) != TYPE_UNTRUSTED (ttr)
> > +         && !null_pointer_constant_p (rhs))
> > +       {
> > +         auto_diagnostic_group d;
> > +         bool diagnosed = true;
> > +         switch (errtype)
> > +           {
> > +           case ic_argpass:
> > +             {
> > +               const char msg[] = G_("passing argument %d of %qE
> > from "
> > +                                     "pointer with different trust
> > level");
> > +               if (warnopt)
> > +                 diagnosed
> > +                   = warning_at (expr_loc, warnopt, msg, parmnum,
> > rname);
> > +               else
> > +                 error_at (expr_loc, msg, parmnum, rname);
> > +             break;
> > +             }
> > +           case ic_assign:
> > +             {
> > +               const char msg[] = G_("assignment from pointer with "
> > +                                     "different trust level");
> > +               if (warnopt)
> > +                 warning_at (location, warnopt, msg);
> > +               else
> > +                 error_at (location, msg);
> > +               break;
> > +             }
> > +           case ic_init:
> > +             {
> > +               const char msg[] = G_("initialization from pointer
> > with "
> > +                                     "different trust level");
> > +               if (warnopt)
> > +                 warning_at (location, warnopt, msg);
> > +               else
> > +                 error_at (location, msg);
> > +               break;
> > +             }
> > +           case ic_return:
> > +             {
> > +               const char msg[] = G_("return from pointer with "
> > +                                     "different trust level");
> > +               if (warnopt)
> > +                 warning_at (location, warnopt, msg);
> > +               else
> > +                 error_at (location, msg);
> > +               break;
> > +             }
> > +           default:
> > +             gcc_unreachable ();
> > +           }
> > +         if (diagnosed)
> > +           {
> > +             if (errtype == ic_argpass)
> > +               inform_for_arg (fundecl, expr_loc, parmnum, type,
> > rhstype);
> > +             else
> > +               inform (location, "expected %qT but pointer is of
> > type %qT",
> > +                       type, rhstype);
> > +           }
> > +         return error_mark_node;
> > +       }
> > +
> >         /* Check if the right-hand side has a format attribute but
> > the
> >          left-hand side doesn't.  */
> >         if (warn_suggest_attribute_format
> > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> > index 6e6c580e329..e9f47519df2 100644
> > --- a/gcc/doc/extend.texi
> > +++ b/gcc/doc/extend.texi
> > @@ -8770,6 +8770,31 @@ pid_t wait (wait_status_ptr_t p)
> >   @}
> >   @end smallexample
> >   
> > +@item untrusted
> > +@cindex @code{untrusted} type attribute
> > +Types marked with this attribute are treated as being ``untrusted''
> > -
> > +values should be treated as under attacker control.
> > +
> > +The C front end will issue an error diagnostic on attempts to assign
> > +pointer values between untrusted and trusted pointer types without
> > +an explicit cast.
> > +
> > +For example, when implementing an operating system kernel, one
> > +might write
> > +
> > +@smallexample
> > +#define __kernel
> > +#define __user    __attribute__ ((untrusted))
> > +void __kernel *p_kernel;
> > +void __user *p_user;
> > +
> > +/* With the above, the following assignment should be diagnosed as
> > an error.  */
> > +p_user = p_kernel;
> > +@end smallexample
> > +
> > +The NULL pointer is treated as being usable with both trusted and
> > +untrusted pointers.
> > +
> >   @item unused
> >   @cindex @code{unused} type attribute
> >   When attached to a type (including a @code{union} or a
> > @code{struct}),
> > diff --git a/gcc/print-tree.c b/gcc/print-tree.c
> > index d1fbd044c27..e5123807521 100644
> > --- a/gcc/print-tree.c
> > +++ b/gcc/print-tree.c
> > @@ -640,6 +640,9 @@ print_node (FILE *file, const char *prefix, tree
> > node, int indent,
> >         if (TYPE_RESTRICT (node))
> >         fputs (" restrict", file);
> >   
> > +      if (TYPE_UNTRUSTED (node))
> > +       fputs (" untrusted", file);
> > +
> >         if (TYPE_LANG_FLAG_0 (node))
> >         fputs (" type_0", file);
> >         if (TYPE_LANG_FLAG_1 (node))
> > diff --git a/gcc/testsuite/c-c++-common/attr-untrusted-1.c
> > b/gcc/testsuite/c-c++-common/attr-untrusted-1.c
> > new file mode 100644
> > index 00000000000..84a217fc59f
> > --- /dev/null
> > +++ b/gcc/testsuite/c-c++-common/attr-untrusted-1.c
> > @@ -0,0 +1,165 @@
> > +#define __kernel
> > +#define __user __attribute__((untrusted))
> > +#define __iomem
> > +#define __percpu
> > +#define __rcu
> > +
> > +void *p;
> > +void __kernel *p_kernel;
> > +void __user *p_user;
> > +void __iomem *p_iomem;
> > +void __percpu *p_percpu;
> > +void __rcu *p_rcu;
> > +
> > +#define NULL ((void *)0)
> > +
> > +extern void accepts_p (void *); /* { dg-message "24: expected 'void
> > \\*' but argument is of type '__attribute__\\(\\(untrusted\\)\\) void
> > \\*'" "" { target c } } */
> > +/* { dg-message "24:  initializing argument 1 of 'void
> > accepts_p\\(void\\*\\)'" "" { target c++ } .-1 } */
> > +extern void accepts_p_kernel (void __kernel *);
> > +extern void accepts_p_user (void __user *);
> > +
> > +void test_argpass_to_p (void)
> > +{
> > +  accepts_p (p);
> > +  accepts_p (p_kernel);
> > +  accepts_p (p_user); /* { dg-error "passing argument 1 of
> > 'accepts_p' from pointer with different trust level" "" { target c }
> > } */
> > +  /* { dg-error "invalid conversion from
> > '__attribute__\\(\\(untrusted\\)\\) void\\*' to 'void\\*'" "" {
> > target c++ } .-1 } */
> > +}
> > +
> > +void test_init_p (void)
> > +{
> > +  void *local_p_1 = p;
> > +  void *local_p_2 = p_kernel;
> > +  void *local_p_3 = p_user; /* { dg-error "initialization from
> > pointer with different trust level" "" { target c } } */
> > +  /* { dg-message "expected 'void \\*' but pointer is of type
> > '__attribute__\\(\\(untrusted\\)\\) void \\*'" "" { target c } .-1 }
> > */
> > +  /* { dg-error "invalid conversion from
> > '__attribute__\\(\\(untrusted\\)\\) void\\*' to 'void\\*'" "" {
> > target c++ } .-2 } */
> > +}
> > +
> > +void test_init_p_kernel (void)
> > +{
> > +  void __kernel *local_p_1 = p;
> > +  void __kernel *local_p_2 = p_kernel;
> > +  void __kernel *local_p_3 = p_user; /* { dg-error "initialization
> > from pointer with different trust level" "" { target c } } */
> > +  /* { dg-message "expected 'void \\*' but pointer is of type
> > '__attribute__\\(\\(untrusted\\)\\) void \\*'" "" { target c } .-1 }
> > */
> > +  /* { dg-error "invalid conversion from
> > '__attribute__\\(\\(untrusted\\)\\) void\\*' to 'void\\*'" "" {
> > target c++ } .-2 } */
> > +}
> > +
> > +void test_init_p_user (void)
> > +{
> > +  void __user *local_p_1 = p; /* { dg-error "initialization from
> > pointer with different trust level" "" { target c } } */
> > +  /* { dg-message "expected '__attribute__\\(\\(untrusted\\)\\) void
> > \\*' but pointer is of type 'void \\*'" "" { target c } .-1 } */
> > +  void __user *local_p_2 = p_kernel; /* { dg-error "initialization
> > from pointer with different trust level" "" { target c } } */
> > +  /* { dg-message "expected '__attribute__\\(\\(untrusted\\)\\) void
> > \\*' but pointer is of type 'void \\*'" "" { target c } .-1 } */
> > +  void __user *local_p_3 = p_user;
> > +  void __user *local_p_4 = NULL;
> > +}
> > +
> > +void test_assign_to_p (void)
> > +{
> > +  p = p;
> > +  p = p_kernel;
> > +  p = p_user; /* { dg-error "assignment from pointer with different
> > trust level" "" { target c } } */
> > +  /* { dg-message "expected 'void \\*' but pointer is of type
> > '__attribute__\\(\\(untrusted\\)\\) void \\*'" "" { target c } .-1 }
> > */
> > +  /* { dg-error "invalid conversion from
> > '__attribute__\\(\\(untrusted\\)\\) void\\*' to 'void\\*'" "" {
> > target c++ } .-2 } */
> > +  // etc
> > +}
> > +
> > +void test_assign_to_p_kernel (void)
> > +{
> > +  p_kernel = p;
> > +  p_kernel = p_kernel;
> > +  p_kernel = p_user; /* { dg-error "assignment from pointer with
> > different trust level" "" { target c } } */
> > +  /* { dg-message "expected 'void \\*' but pointer is of type
> > '__attribute__\\(\\(untrusted\\)\\) void \\*'" "" { target c } .-1 }
> > */
> > +  /* { dg-error "invalid conversion from
> > '__attribute__\\(\\(untrusted\\)\\) void\\*' to 'void\\*'" "" {
> > target c++ } .-2 } */
> > +  // etc
> > +}
> > +
> > +void test_assign_to_p_user (void)
> > +{
> > +  p_user = p;  /* { dg-error "assignment from pointer with different
> > trust level" "" { target c } } */
> > +  /* { dg-message "expected '__attribute__\\(\\(untrusted\\)\\) void
> > \\*' but pointer is of type 'void \\*'" "" { target c } .-1 } */
> > +  p_user = p_kernel;  /* { dg-error "assignment from pointer with
> > different trust level" "" { target c } } */
> > +  /* { dg-message "expected '__attribute__\\(\\(untrusted\\)\\) void
> > \\*' but pointer is of type 'void \\*'" "" { target c } .-1 } */
> > +  p_user = p_user;
> > +  p_user = NULL;
> > +  // etc
> > +}
> > +
> > +void *test_return_p (int i)
> > +{
> > +  switch (i)
> > +    {
> > +    default:
> > +    case 0:
> > +      return p;
> > +    case 1:
> > +      return p_kernel;
> > +    case 2:
> > +      return p_user; /* { dg-error "return from pointer with
> > different trust level" "" { target c } } */
> > +      /* { dg-message "expected 'void \\*' but pointer is of type
> > '__attribute__\\(\\(untrusted\\)\\) void \\*'" "" { target c } .-1 }
> > */
> > +      /* { dg-error "invalid conversion from
> > '__attribute__\\(\\(untrusted\\)\\) void\\*' to 'void\\*'" "" {
> > target c++ } .-2 } */
> > +    }
> > +}
> > +
> > +void __kernel *test_return_p_kernel (int i)
> > +{
> > +  switch (i)
> > +    {
> > +    default:
> > +    case 0:
> > +      return p;
> > +    case 1:
> > +      return p_kernel;
> > +    case 2:
> > +      return p_user; /* { dg-error "return from pointer with
> > different trust level" "" { target c } } */
> > +      /* { dg-message "expected 'void \\*' but pointer is of type
> > '__attribute__\\(\\(untrusted\\)\\) void \\*'" "" { target c } .-1 }
> > */
> > +      /* { dg-error "invalid conversion from
> > '__attribute__\\(\\(untrusted\\)\\) void\\*' to 'void\\*'" "" {
> > target c++ } .-2 } */
> > +    }
> > +}
> > +
> > +void __user *
> > +test_return_p_user (int i)
> > +{
> > +  switch (i)
> > +    {
> > +    default:
> > +    case 0:
> > +      return p; /* { dg-error "return from pointer with different
> > trust level" "" { target c } } */
> > +      /* { dg-message "expected '__attribute__\\(\\(untrusted\\)\\)
> > void \\*' but pointer is of type 'void \\*'" "" { target c } .-1 } */
> > +    case 1:
> > +      return p_kernel; /* { dg-error "return from pointer with
> > different trust level" "" { target c } } */
> > +      /* { dg-message "expected '__attribute__\\(\\(untrusted\\)\\)
> > void \\*' but pointer is of type 'void \\*'" "" { target c } .-1 } */
> > +    case 2:
> > +      return p_user;
> > +    case 3:
> > +      return NULL;
> > +    }
> > +}
> > +
> > +void test_cast_k_to_u (void)
> > +{
> > +  p_user = (void __user *)p_kernel;
> > +}
> > +
> > +void test_cast_u_to_k (void)
> > +{
> > +  p_kernel = (void __kernel *)p_user;
> > +}
> > +
> > +int test_deref_read (int __user *p)
> > +{
> > +  return *p; // FIXME: should this be allowed directly?
> > +}
> > +
> > +void test_deref_write (int __user *p, int i)
> > +{
> > +  *p = i; // FIXME: should this be allowed directly?
> > +}
> > +
> > +typedef struct foo { int i; } __user *foo_ptr_t;
> > +
> > +void __user *
> > +test_pass_through (void __user *ptr)
> > +{
> > +  return ptr;
> > +}
> > diff --git a/gcc/tree-core.h b/gcc/tree-core.h
> > index 8ab119dc9a2..35a7f50c06c 100644
> > --- a/gcc/tree-core.h
> > +++ b/gcc/tree-core.h
> > @@ -604,7 +604,8 @@ enum cv_qualifier {
> >     TYPE_QUAL_CONST    = 0x1,
> >     TYPE_QUAL_VOLATILE = 0x2,
> >     TYPE_QUAL_RESTRICT = 0x4,
> > -  TYPE_QUAL_ATOMIC   = 0x8
> > +  TYPE_QUAL_ATOMIC   = 0x8,
> > +  TYPE_QUAL_UNTRUSTED = 0x10
> >   };
> >   
> >   /* Standard named or nameless data types of the C compiler.  */
> > @@ -1684,7 +1685,8 @@ struct GTY(()) tree_type_common {
> >     unsigned typeless_storage : 1;
> >     unsigned empty_flag : 1;
> >     unsigned indivisible_p : 1;
> > -  unsigned spare : 16;
> > +  unsigned untrusted_flag : 1;
> > +  unsigned spare : 15;
> >   
> >     alias_set_type alias_set;
> >     tree pointer_to;
> > diff --git a/gcc/tree.c b/gcc/tree.c
> > index 845228a055b..3600639d985 100644
> > --- a/gcc/tree.c
> > +++ b/gcc/tree.c
> > @@ -5379,6 +5379,7 @@ set_type_quals (tree type, int type_quals)
> >     TYPE_VOLATILE (type) = (type_quals & TYPE_QUAL_VOLATILE) != 0;
> >     TYPE_RESTRICT (type) = (type_quals & TYPE_QUAL_RESTRICT) != 0;
> >     TYPE_ATOMIC (type) = (type_quals & TYPE_QUAL_ATOMIC) != 0;
> > +  TYPE_UNTRUSTED (type) = (type_quals & TYPE_QUAL_UNTRUSTED) != 0;
> >     TYPE_ADDR_SPACE (type) = DECODE_QUAL_ADDR_SPACE (type_quals);
> >   }
> >   
> > diff --git a/gcc/tree.h b/gcc/tree.h
> > index f62c00bc870..caab575b210 100644
> > --- a/gcc/tree.h
> > +++ b/gcc/tree.h
> > @@ -2197,6 +2197,10 @@ extern tree vector_element_bits_tree
> > (const_tree);
> >      the term.  */
> >   #define TYPE_RESTRICT(NODE) (TYPE_CHECK (NODE)-
> > >type_common.restrict_flag)
> >   
> > +/* Nonzero in a type considered "untrusted" - values should be
> > treated as
> > +   under attacker control.  */
> > +#define TYPE_UNTRUSTED(NODE) (TYPE_CHECK (NODE)-
> > >type_common.untrusted_flag)
> > +
> >   /* If nonzero, type's name shouldn't be emitted into debug info. 
> > */
> >   #define TYPE_NAMELESS(NODE) (TYPE_CHECK (NODE)-
> > >base.u.bits.nameless_flag)
> >   
> > @@ -2221,6 +2225,7 @@ extern tree vector_element_bits_tree
> > (const_tree);
> >           | (TYPE_VOLATILE (NODE) * TYPE_QUAL_VOLATILE)         \
> >           | (TYPE_ATOMIC (NODE) * TYPE_QUAL_ATOMIC)             \
> >           | (TYPE_RESTRICT (NODE) * TYPE_QUAL_RESTRICT)         \
> > +         | (TYPE_UNTRUSTED (NODE) * TYPE_QUAL_UNTRUSTED)       \
> >           | (ENCODE_QUAL_ADDR_SPACE (TYPE_ADDR_SPACE (NODE)))))
> >   
> >   /* The same as TYPE_QUALS without the address space
> > qualifications.  */
> > @@ -2228,14 +2233,16 @@ extern tree vector_element_bits_tree
> > (const_tree);
> >     ((int) ((TYPE_READONLY (NODE) * TYPE_QUAL_CONST)            \
> >           | (TYPE_VOLATILE (NODE) * TYPE_QUAL_VOLATILE)         \
> >           | (TYPE_ATOMIC (NODE) * TYPE_QUAL_ATOMIC)             \
> > -         | (TYPE_RESTRICT (NODE) * TYPE_QUAL_RESTRICT)))
> > +         | (TYPE_RESTRICT (NODE) * TYPE_QUAL_RESTRICT)         \
> > +         | (TYPE_UNTRUSTED (NODE) * TYPE_QUAL_UNTRUSTED)))
> >   
> >   /* The same as TYPE_QUALS without the address space and atomic
> >      qualifications.  */
> >   #define TYPE_QUALS_NO_ADDR_SPACE_NO_ATOMIC(NODE)              \
> >     ((int) ((TYPE_READONLY (NODE) * TYPE_QUAL_CONST)            \
> >           | (TYPE_VOLATILE (NODE) * TYPE_QUAL_VOLATILE)         \
> > -         | (TYPE_RESTRICT (NODE) * TYPE_QUAL_RESTRICT)))
> > +         | (TYPE_RESTRICT (NODE) * TYPE_QUAL_RESTRICT)         \
> > +         | (TYPE_UNTRUSTED (NODE) * TYPE_QUAL_UNTRUSTED)))
> >   
> >   /* These flags are available for each language front end to use
> > internally.  */
> >   #define TYPE_LANG_FLAG_0(NODE) (TYPE_CHECK (NODE)-
> > >type_common.lang_flag_0)
> > 
> 




* Re: [PATCH 1b/6] Add __attribute__((untrusted))
  2022-01-06 15:10     ` David Malcolm
@ 2022-01-06 18:59       ` Martin Sebor
  0 siblings, 0 replies; 39+ messages in thread
From: Martin Sebor @ 2022-01-06 18:59 UTC (permalink / raw)
  To: David Malcolm, gcc-patches, linux-toolchains

On 1/6/22 8:10 AM, David Malcolm wrote:
> On Thu, 2021-12-09 at 15:54 -0700, Martin Sebor wrote:
>> On 11/13/21 1:37 PM, David Malcolm via Gcc-patches wrote:
>>> This patch adds a new:
>>>
>>>     __attribute__((untrusted))
>>>
>>> for use by the C front-end, intended for use by the Linux kernel for
>>> use with "__user", but which could be used by other operating system
>>> kernels, and potentially by other projects.
>>
>> It looks like untrusted is a type attribute (rather than one
>> that applies to variables and/or function return values or
>> writeable by-reference arguments).  I find that quite surprising.
> 
> FWIW I initially tried implementing it on pointer types, but doing it
> on the underlying type was much cleaner.
> 
>>    I'm used to thinking of trusted vs tainted as dynamic properties
>> of data so I'm having trouble deciding what to think about
>> the attribute applying to types.  Can you explain why it's
>> useful on types?
> 
> A type system *is* a way of detecting problems involving dynamic
> properties of data.  Ultimately all we have at runtime is a collection
> of bits; the toolchain has the concept of types as a way to allow us to
> reason about properties of those bits without requiring a full cross-TU
> analysis (to try to figure out that e.g. x is, say, a 32 bit unsigned
> integer), and to document these properties clearly to human readers of
> the code.

I understand that relying on the type system is a way to do it.
It just doesn't seem like a very good way in a loosely typed
language like C (or C++).

> 
> I see this as working like a qualifier (rather like "const" and
> "volatile"), in that an
>    untrusted char *
> when dereferenced gives you an
>    untrusted char

Dereferencing a const char* yields a const char lvalue that
implicitly converts to an unqualified value of the referenced
object.  The qualifier is lost in the conversion, so modeling
taint/trust this way will also lose the property in the same
contexts.  It sounds to me like the concept you're modeling
might be more akin to a type specifier (maybe like _Atomic,
although that still converts to the underlying type).
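
To make the lvalue-conversion point concrete, here is a minimal sketch
using const as the qualifier (the helper names are made up for
illustration).  As I understand it, _Generic applies lvalue conversion
to its controlling expression in GCC and in recent Clang, so it reports
the type a value has once it is actually used:

```c
/* Returns 1 if the value read through a pointer-to-const has plain
   "char" type after lvalue conversion, 2 if "const" survived.  */
int deref_value_kind(const char *p)
{
    return _Generic((*p), char: 1, const char: 2, default: 0);
}

/* Returns 2 if the pointer itself still carries the qualifier on its
   pointee type, 1 if it does not.  */
int pointer_kind(const char *p)
{
    return _Generic((p), char *: 1, const char *: 2, default: 0);
}
```

With those compilers I'd expect deref_value_kind to return 1 and
pointer_kind to return 2: the qualifier stays on the pointer type but
is gone as soon as the pointed-to value is read.  A qualifier-like
"untrusted" would presumably behave the same way.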

> 
> The intent is to have a way of treating the values as "actively
> hostile", so that code analyzers can assume the worst possible values
> for such types (or more glibly, that we're dealing with data from Satan
> rather than from Murphy).
> 
> Such types are also relevant to infoleaks: writing sensitive
> information to an untrusted value can be detected relatively easily
> with this approach, by checking the type of the value - the types
> express the trust boundary.
> 
> Doing this with qualifiers allows us to use the C type system to detect
> these kinds of issues without having to add a full cross-TU
> interprocedural analysis, and documents it to human readers of the
> code.   Compare with const-correctness; we can have an analogous
> "trust-correctness".

The problem with const-correctness in C is that it's so easily
lost (like with strchr, or in the lvalue-rvalue conversion).
This is also why I'm skeptical of the type-based approach here.
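
A small sketch of what I mean, with a hypothetical find_char that has
the same prototype shape as strchr (it is not the library function,
and unlike strchr it does not match the terminating nul):

```c
#include <stddef.h>

/* Same shape as the strchr prototype: takes a pointer to const and
   returns a pointer to non-const into the same buffer.  */
char *find_char(const char *s, int c)
{
    for (; *s != '\0'; s++)
        if (*s == (char) c)
            return (char *) s;   /* the qualifier is dropped here */
    return NULL;
}

/* Overwrites the first occurrence of C in a buffer the caller only
   promised to read.  No cast is needed at this level, so nothing in
   the caller hints that const was lost.  Returns 1 on a match.  */
int overwrite_first(const char *s, int c, char replacement)
{
    char *hit = find_char(s, c);
    if (hit == NULL)
        return 0;
    *hit = replacement;
    return 1;
}
```

Any type-based property that follows the same rules as const would be
laundered by such a function in the same way.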

> 
>>
>> I'd expect the taint property of a type to be quickly lost as
>> an object of the type is passed through existing APIs (e.g.,
>> a char array manipulated by string functions like strchr).
> 
> FWIW you can't directly pass an attacker-controlled buffer to strchr:
> strchr requires there to be a 0-terminator to the array; if the array's
> content is untrusted then the attacker might not have 0-terminated it.

strchr is just an example of the many functions that in my mind
make the type-based approach less than ideal.  Even if the untrusted
string were known to be nul-terminated, strchr still couldn't be
used without losing the property.  Ditto for memchr.  It seems
that all sanitization would either have to be written from
scratch, without relying on existing utility functions, or go
through wrappers that call the common utility functions after
removing the qualifier from the tainted data, even before the
sanitization is complete.  That would obviously be error-prone,
but it's something that would be made much more robust
by tracking the taint independently of the data type.
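
Here is a rough sketch of the kind of wrapper I have in mind.  Released
GCC has no "untrusted" qualifier, so volatile is used purely as a
stand-in (an assumption for illustration only, and not how the patch
works); like the proposed qualifier, it cannot be passed to memchr
without a cast.  The wrapper name is hypothetical:

```c
#include <stddef.h>
#include <string.h>

#define UNTRUSTED volatile   /* stand-in qualifier, illustration only */

/* Hypothetical wrapper: returns the offset of the first occurrence of
   C in the first LEN bytes of BUF, or LEN if there is none.  */
size_t untrusted_find(const UNTRUSTED char *buf, size_t len, int c)
{
    /* The cast strips the qualifier before any validation has been
       done; whatever memchr does with the data from here on is
       invisible to a purely type-based check.  */
    const char *raw = (const char *) buf;
    const char *hit = memchr(raw, c, len);
    return hit ? (size_t) (hit - raw) : len;
}
```

Every such wrapper is a place where the marking is removed by hand,
and nothing in the type system checks that the removal was justified.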

Martin

> 
> As implemented, the patch doesn't complain about this, though maybe it
> should.
> 
> The main point here is to support the existing __user annotation used
> by the Linux kernel, in particular, copy_from_user and copy_to_user.
> 
>>
>> (I usually look at tests to help me understand the design of
>> a change but I couldn't find an answer to my question in those
>> in the patch.)
> 
> The patch kit was rather unclear on this, due to the use of two
> different approaches (custom address spaces vs this untrusted
> attribute).  Sorry about this.
> 
> Patches 4a and 4b in the kit add test-uaccess.h (to
> gcc/testsuite/gcc.dg/analyzer) which supplies "__user"; see the tests
> that use "test-uaccess.h" in patch 3:
>   [PATCH 3/6] analyzer: implement infoleak detection
>      https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584377.html
> and in patch 5:
>   [PATCH 5/6] analyzer: use region::untrusted_p in taint detection
>     https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584374.html
> 
> (sorry about messing up the order of the patches).
> 
> Patch 4a here:
>   [PATCH 4a/6] analyzer: implement region::untrusted_p in terms of custom address spaces
>     https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584371.html
> implements "__user" as a custom address space,
> 
> whereas patch 4b here:
> 
>   [PATCH 4b/6] analyzer: implement region::untrusted_p in terms of __attribute__((untrusted))
>      https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584373.html
> 
> implements "__user" to be __attribute__((untrusted)).
> 
> Perhaps I should drop the custom address space versions of the patches
> and post a version of the kit that simply uses the attribute?
> 
> Dave
> 
> 
>>
>> Thanks
>> Martin
>>
>> PS I found one paper online that discusses type-based taint
>> analysis in Java but not much more.  I only quickly skimmed
>> the paper and although it conceptually makes sense I'm still
>> having difficulties seeing how it would be useful in C.
>>
>>>
>>> Known issues:
>>> - at least one TODO in handle_untrusted_attribute
>>> - should it be permitted to dereference an untrusted pointer?  The
>>> patch
>>>     currently allows this
>>>
>>> gcc/c-family/ChangeLog:
>>>          * c-attribs.c (c_common_attribute_table): Add "untrusted".
>>>          (build_untrusted_type): New.
>>>          (handle_untrusted_attribute): New.
>>>          * c-pretty-print.c (pp_c_cv_qualifiers): Handle
>>>          TYPE_QUAL_UNTRUSTED.
>>>
>>> gcc/c/ChangeLog:
>>>          * c-typeck.c (convert_for_assignment): Complain if the trust
>>>          levels vary when assigning a non-NULL pointer.
>>>
>>> gcc/ChangeLog:
>>>          * doc/extend.texi (Common Type Attributes): Add "untrusted".
>>>          * print-tree.c (print_node): Handle TYPE_UNTRUSTED.
>>>          * tree-core.h (enum cv_qualifier): Add TYPE_QUAL_UNTRUSTED.
>>>          (struct tree_type_common): Assign one of the spare bits to a
>>> new
>>>          "untrusted_flag".
>>>          * tree.c (set_type_quals): Handle TYPE_QUAL_UNTRUSTED.
>>>          * tree.h (TYPE_QUALS): Likewise.
>>>          (TYPE_QUALS_NO_ADDR_SPACE): Likewise.
>>>          (TYPE_QUALS_NO_ADDR_SPACE_NO_ATOMIC): Likewise.
>>>
>>> gcc/testsuite/ChangeLog:
>>>          * c-c++-common/attr-untrusted-1.c: New test.
>>>
>>> Signed-off-by: David Malcolm <dmalcolm@redhat.com>
>>> ---
>>>    gcc/c-family/c-attribs.c                      |  59 +++++++
>>>    gcc/c-family/c-pretty-print.c                 |   2 +
>>>    gcc/c/c-typeck.c                              |  64 +++++++
>>>    gcc/doc/extend.texi                           |  25 +++
>>>    gcc/print-tree.c                              |   3 +
>>>    gcc/testsuite/c-c++-common/attr-untrusted-1.c | 165
>>> ++++++++++++++++++
>>>    gcc/tree-core.h                               |   6 +-
>>>    gcc/tree.c                                    |   1 +
>>>    gcc/tree.h                                    |  11 +-
>>>    9 files changed, 332 insertions(+), 4 deletions(-)
>>>    create mode 100644 gcc/testsuite/c-c++-common/attr-untrusted-1.c
>>>
>>> diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
>>> index 007b928c54b..100c2dabab2 100644
>>> --- a/gcc/c-family/c-attribs.c
>>> +++ b/gcc/c-family/c-attribs.c
>>> @@ -136,6 +136,7 @@ static tree handle_warn_unused_result_attribute
>>> (tree *, tree, tree, int,
>>>                                                   bool *);
>>>    static tree handle_access_attribute (tree *, tree, tree, int, bool
>>> *);
>>>    
>>> +static tree handle_untrusted_attribute (tree *, tree, tree, int,
>>> bool *);
>>>    static tree handle_sentinel_attribute (tree *, tree, tree, int,
>>> bool *);
>>>    static tree handle_type_generic_attribute (tree *, tree, tree, int,
>>> bool *);
>>>    static tree handle_alloc_size_attribute (tree *, tree, tree, int,
>>> bool *);
>>> @@ -536,6 +537,8 @@ const struct attribute_spec
>>> c_common_attribute_table[] =
>>>                                handle_special_var_sec_attribute,
>>> attr_section_exclusions },
>>>      { "access",               1, 3, false, true, true, false,
>>>                                handle_access_attribute, NULL },
>>> +  { "untrusted",             0, 0, false,  true, false, true,
>>> +                             handle_untrusted_attribute, NULL },
>>>      /* Attributes used by Objective-C.  */
>>>      { "NSObject",                     0, 0, true, false, false,
>>> false,
>>>                                handle_nsobject_attribute, NULL },
>>> @@ -5224,6 +5227,62 @@ build_attr_access_from_parms (tree parms, bool
>>> skip_voidptr)
>>>      return build_tree_list (name, attrargs);
>>>    }
>>>    
>>> +/* Build (or reuse) a type based on BASE_TYPE, but with
>>> +   TYPE_QUAL_UNTRUSTED.  */
>>> +
>>> +static tree
>>> +build_untrusted_type (tree base_type)
>>> +{
>>> +  int base_type_quals = TYPE_QUALS (base_type);
>>> +  return build_qualified_type (base_type,
>>> +                              base_type_quals |
>>> TYPE_QUAL_UNTRUSTED);
>>> +}
>>> +
>>> +/* Handle an "untrusted" attribute; arguments as in
>>> +   struct attribute_spec.handler.  */
>>> +
>>> +static tree
>>> +handle_untrusted_attribute (tree *node, tree ARG_UNUSED (name),
>>> +                           tree ARG_UNUSED (args), int ARG_UNUSED
>>> (flags),
>>> +                           bool *no_add_attrs)
>>> +{
>>> +  if (TREE_CODE (*node) == POINTER_TYPE)
>>> +    {
>>> +      tree base_type = TREE_TYPE (*node);
>>> +      tree untrusted_base_type = build_untrusted_type (base_type);
>>> +      *node = build_pointer_type (untrusted_base_type);
>>> +      *no_add_attrs = true; /* OK */
>>> +      return NULL_TREE;
>>> +    }
>>> +  else if (TREE_CODE (*node) == FUNCTION_TYPE)
>>> +    {
>>> +      tree return_type = TREE_TYPE (*node);
>>> +      if (TREE_CODE (return_type) == POINTER_TYPE)
>>> +       {
>>> +         tree base_type = TREE_TYPE (return_type);
>>> +         tree untrusted_base_type = build_untrusted_type
>>> (base_type);
>>> +         tree untrusted_return_type = build_pointer_type
>>> (untrusted_base_type);
>>> +         tree fn_type = build_function_type (untrusted_return_type,
>>> +                                             TYPE_ARG_TYPES
>>> (*node));
>>> +         *node = fn_type;
>>> +         *no_add_attrs = true; /* OK */
>>> +         return NULL_TREE;
>>> +       }
>>> +      else
>>> +       {
>>> +         gcc_unreachable (); // TODO
>>> +       }
>>> +    }
>>> +  else
>>> +    {
>>> +      tree base_type = *node;
>>> +      tree untrusted_base_type = build_untrusted_type (base_type);
>>> +      *node = untrusted_base_type;
>>> +      *no_add_attrs = true; /* OK */
>>> +      return NULL_TREE;
>>> +    }
>>> +}
>>> +
>>>    /* Handle a "nothrow" attribute; arguments as in
>>>       struct attribute_spec.handler.  */
>>>    
>>> diff --git a/gcc/c-family/c-pretty-print.c b/gcc/c-family/c-pretty-
>>> print.c
>>> index a987da46d6d..120e1e6d167 100644
>>> --- a/gcc/c-family/c-pretty-print.c
>>> +++ b/gcc/c-family/c-pretty-print.c
>>> @@ -191,6 +191,8 @@ pp_c_cv_qualifiers (c_pretty_printer *pp, int
>>> qualifiers, bool func_type)
>>>      if (qualifiers & TYPE_QUAL_RESTRICT)
>>>        pp_c_ws_string (pp, (flag_isoc99 && !c_dialect_cxx ()
>>>                           ? "restrict" : "__restrict__"));
>>> +  if (qualifiers & TYPE_QUAL_UNTRUSTED)
>>> +    pp_c_ws_string (pp, "__attribute__((untrusted))");
>>>    }
>>>    
>>>    /* Pretty-print T using the type-cast notation '( type-name )'.  */
>>> diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
>>> index 782414f8c8c..44de82b99ba 100644
>>> --- a/gcc/c/c-typeck.c
>>> +++ b/gcc/c/c-typeck.c
>>> @@ -7284,6 +7284,70 @@ convert_for_assignment (location_t location,
>>> location_t expr_loc, tree type,
>>>            return error_mark_node;
>>>          }
>>>    
>>> +      /* Untrusted vs trusted pointers, but allowing NULL to be used
>>> +        for everything.  */
>>> +      if (TYPE_UNTRUSTED (ttl) != TYPE_UNTRUSTED (ttr)
>>> +         && !null_pointer_constant_p (rhs))
>>> +       {
>>> +         auto_diagnostic_group d;
>>> +         bool diagnosed = true;
>>> +         switch (errtype)
>>> +           {
>>> +           case ic_argpass:
>>> +             {
>>> +               const char msg[] = G_("passing argument %d of %qE
>>> from "
>>> +                                     "pointer with different trust
>>> level");
>>> +               if (warnopt)
>>> +                 diagnosed
>>> +                   = warning_at (expr_loc, warnopt, msg, parmnum,
>>> rname);
>>> +               else
>>> +                 error_at (expr_loc, msg, parmnum, rname);
>>> +             break;
>>> +             }
>>> +           case ic_assign:
>>> +             {
>>> +               const char msg[] = G_("assignment from pointer with "
>>> +                                     "different trust level");
>>> +               if (warnopt)
>>> +                 warning_at (location, warnopt, msg);
>>> +               else
>>> +                 error_at (location, msg);
>>> +               break;
>>> +             }
>>> +           case ic_init:
>>> +             {
>>> +               const char msg[] = G_("initialization from pointer
>>> with "
>>> +                                     "different trust level");
>>> +               if (warnopt)
>>> +                 warning_at (location, warnopt, msg);
>>> +               else
>>> +                 error_at (location, msg);
>>> +               break;
>>> +             }
>>> +           case ic_return:
>>> +             {
>>> +               const char msg[] = G_("return from pointer with "
>>> +                                     "different trust level");
>>> +               if (warnopt)
>>> +                 warning_at (location, warnopt, msg);
>>> +               else
>>> +                 error_at (location, msg);
>>> +               break;
>>> +             }
>>> +           default:
>>> +             gcc_unreachable ();
>>> +           }
>>> +         if (diagnosed)
>>> +           {
>>> +             if (errtype == ic_argpass)
>>> +               inform_for_arg (fundecl, expr_loc, parmnum, type,
>>> rhstype);
>>> +             else
>>> +               inform (location, "expected %qT but pointer is of
>>> type %qT",
>>> +                       type, rhstype);
>>> +           }
>>> +         return error_mark_node;
>>> +       }
>>> +
>>>          /* Check if the right-hand side has a format attribute but
>>> the
>>>           left-hand side doesn't.  */
>>>          if (warn_suggest_attribute_format
>>> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
>>> index 6e6c580e329..e9f47519df2 100644
>>> --- a/gcc/doc/extend.texi
>>> +++ b/gcc/doc/extend.texi
>>> @@ -8770,6 +8770,31 @@ pid_t wait (wait_status_ptr_t p)
>>>    @}
>>>    @end smallexample
>>>    
>>> +@item untrusted
>>> +@cindex @code{untrusted} type attribute
>>> +Types marked with this attribute are treated as being ``untrusted''
>>> -
>>> +values should be treated as under attacker control.
>>> +
>>> +The C front end will issue an error diagnostic on attempts to assign
>>> +pointer values between untrusted and trusted pointer types without
>>> +an explicit cast.
>>> +
>>> +For example, when implementing an operating system kernel, one
>>> +might write
>>> +
>>> +@smallexample
>>> +#define __kernel
>>> +#define __user    __attribute__ ((untrusted))
>>> +void __kernel *p_kernel;
>>> +void __user *p_user;
>>> +
>>> +/* With the above, the following assignment should be diagnosed as
>>> an error.  */
>>> +p_user = p_kernel;
>>> +@end smallexample
>>> +
>>> +The NULL pointer is treated as being usable with both trusted and
>>> +untrusted pointers.
>>> +
>>>    @item unused
>>>    @cindex @code{unused} type attribute
>>>    When attached to a type (including a @code{union} or a
>>> @code{struct}),
>>> diff --git a/gcc/print-tree.c b/gcc/print-tree.c
>>> index d1fbd044c27..e5123807521 100644
>>> --- a/gcc/print-tree.c
>>> +++ b/gcc/print-tree.c
>>> @@ -640,6 +640,9 @@ print_node (FILE *file, const char *prefix, tree
>>> node, int indent,
>>>          if (TYPE_RESTRICT (node))
>>>          fputs (" restrict", file);
>>>    
>>> +      if (TYPE_UNTRUSTED (node))
>>> +       fputs (" untrusted", file);
>>> +
>>>          if (TYPE_LANG_FLAG_0 (node))
>>>          fputs (" type_0", file);
>>>          if (TYPE_LANG_FLAG_1 (node))
>>> diff --git a/gcc/testsuite/c-c++-common/attr-untrusted-1.c
>>> b/gcc/testsuite/c-c++-common/attr-untrusted-1.c
>>> new file mode 100644
>>> index 00000000000..84a217fc59f
>>> --- /dev/null
>>> +++ b/gcc/testsuite/c-c++-common/attr-untrusted-1.c
>>> @@ -0,0 +1,165 @@
>>> +#define __kernel
>>> +#define __user __attribute__((untrusted))
>>> +#define __iomem
>>> +#define __percpu
>>> +#define __rcu
>>> +
>>> +void *p;
>>> +void __kernel *p_kernel;
>>> +void __user *p_user;
>>> +void __iomem *p_iomem;
>>> +void __percpu *p_percpu;
>>> +void __rcu *p_rcu;
>>> +
>>> +#define NULL ((void *)0)
>>> +
>>> +extern void accepts_p (void *); /* { dg-message "24: expected 'void
>>> \\*' but argument is of type '__attribute__\\(\\(untrusted\\)\\) void
>>> \\*'" "" { target c } } */
>>> +/* { dg-message "24:  initializing argument 1 of 'void
>>> accepts_p\\(void\\*\\)'" "" { target c++ } .-1 } */
>>> +extern void accepts_p_kernel (void __kernel *);
>>> +extern void accepts_p_user (void __user *);
>>> +
>>> +void test_argpass_to_p (void)
>>> +{
>>> +  accepts_p (p);
>>> +  accepts_p (p_kernel);
>>> +  accepts_p (p_user); /* { dg-error "passing argument 1 of
>>> 'accepts_p' from pointer with different trust level" "" { target c }
>>> } */
>>> +  /* { dg-error "invalid conversion from
>>> '__attribute__\\(\\(untrusted\\)\\) void\\*' to 'void\\*'" "" {
>>> target c++ } .-1 } */
>>> +}
>>> +
>>> +void test_init_p (void)
>>> +{
>>> +  void *local_p_1 = p;
>>> +  void *local_p_2 = p_kernel;
>>> +  void *local_p_3 = p_user; /* { dg-error "initialization from
>>> pointer with different trust level" "" { target c } } */
>>> +  /* { dg-message "expected 'void \\*' but pointer is of type
>>> '__attribute__\\(\\(untrusted\\)\\) void \\*'" "" { target c } .-1 }
>>> */
>>> +  /* { dg-error "invalid conversion from
>>> '__attribute__\\(\\(untrusted\\)\\) void\\*' to 'void\\*'" "" {
>>> target c++ } .-2 } */
>>> +}
>>> +
>>> +void test_init_p_kernel (void)
>>> +{
>>> +  void __kernel *local_p_1 = p;
>>> +  void __kernel *local_p_2 = p_kernel;
>>> +  void __kernel *local_p_3 = p_user; /* { dg-error "initialization
>>> from pointer with different trust level" "" { target c } } */
>>> +  /* { dg-message "expected 'void \\*' but pointer is of type
>>> '__attribute__\\(\\(untrusted\\)\\) void \\*'" "" { target c } .-1 }
>>> */
>>> +  /* { dg-error "invalid conversion from
>>> '__attribute__\\(\\(untrusted\\)\\) void\\*' to 'void\\*'" "" {
>>> target c++ } .-2 } */
>>> +}
>>> +
>>> +void test_init_p_user (void)
>>> +{
>>> +  void __user *local_p_1 = p; /* { dg-error "initialization from
>>> pointer with different trust level" "" { target c } } */
>>> +  /* { dg-message "expected '__attribute__\\(\\(untrusted\\)\\) void
>>> \\*' but pointer is of type 'void \\*'" "" { target c } .-1 } */
>>> +  void __user *local_p_2 = p_kernel; /* { dg-error "initialization
>>> from pointer with different trust level" "" { target c } } */
>>> +  /* { dg-message "expected '__attribute__\\(\\(untrusted\\)\\) void
>>> \\*' but pointer is of type 'void \\*'" "" { target c } .-1 } */
>>> +  void __user *local_p_3 = p_user;
>>> +  void __user *local_p_4 = NULL;
>>> +}
>>> +
>>> +void test_assign_to_p (void)
>>> +{
>>> +  p = p;
>>> +  p = p_kernel;
>>> +  p = p_user; /* { dg-error "assignment from pointer with different
>>> trust level" "" { target c } } */
>>> +  /* { dg-message "expected 'void \\*' but pointer is of type
>>> '__attribute__\\(\\(untrusted\\)\\) void \\*'" "" { target c } .-1 }
>>> */
>>> +  /* { dg-error "invalid conversion from
>>> '__attribute__\\(\\(untrusted\\)\\) void\\*' to 'void\\*'" "" {
>>> target c++ } .-2 } */
>>> +  // etc
>>> +}
>>> +
>>> +void test_assign_to_p_kernel (void)
>>> +{
>>> +  p_kernel = p;
>>> +  p_kernel = p_kernel;
>>> +  p_kernel = p_user; /* { dg-error "assignment from pointer with
>>> different trust level" "" { target c } } */
>>> +  /* { dg-message "expected 'void \\*' but pointer is of type
>>> '__attribute__\\(\\(untrusted\\)\\) void \\*'" "" { target c } .-1 }
>>> */
>>> +  /* { dg-error "invalid conversion from
>>> '__attribute__\\(\\(untrusted\\)\\) void\\*' to 'void\\*'" "" {
>>> target c++ } .-2 } */
>>> +  // etc
>>> +}
>>> +
>>> +void test_assign_to_p_user (void)
>>> +{
>>> +  p_user = p;  /* { dg-error "assignment from pointer with different
>>> trust level" "" { target c } } */
>>> +  /* { dg-message "expected '__attribute__\\(\\(untrusted\\)\\) void
>>> \\*' but pointer is of type 'void \\*'" "" { target c } .-1 } */
>>> +  p_user = p_kernel;  /* { dg-error "assignment from pointer with
>>> different trust level" "" { target c } } */
>>> +  /* { dg-message "expected '__attribute__\\(\\(untrusted\\)\\) void
>>> \\*' but pointer is of type 'void \\*'" "" { target c } .-1 } */
>>> +  p_user = p_user;
>>> +  p_user = NULL;
>>> +  // etc
>>> +}
>>> +
>>> +void *test_return_p (int i)
>>> +{
>>> +  switch (i)
>>> +    {
>>> +    default:
>>> +    case 0:
>>> +      return p;
>>> +    case 1:
>>> +      return p_kernel;
>>> +    case 2:
>>> +      return p_user; /* { dg-error "return from pointer with
>>> different trust level" "" { target c } } */
>>> +      /* { dg-message "expected 'void \\*' but pointer is of type
>>> '__attribute__\\(\\(untrusted\\)\\) void \\*'" "" { target c } .-1 }
>>> */
>>> +      /* { dg-error "invalid conversion from
>>> '__attribute__\\(\\(untrusted\\)\\) void\\*' to 'void\\*'" "" {
>>> target c++ } .-2 } */
>>> +    }
>>> +}
>>> +
>>> +void __kernel *test_return_p_kernel (int i)
>>> +{
>>> +  switch (i)
>>> +    {
>>> +    default:
>>> +    case 0:
>>> +      return p;
>>> +    case 1:
>>> +      return p_kernel;
>>> +    case 2:
>>> +      return p_user; /* { dg-error "return from pointer with
>>> different trust level" "" { target c } } */
>>> +      /* { dg-message "expected 'void \\*' but pointer is of type
>>> '__attribute__\\(\\(untrusted\\)\\) void \\*'" "" { target c } .-1 }
>>> */
>>> +      /* { dg-error "invalid conversion from
>>> '__attribute__\\(\\(untrusted\\)\\) void\\*' to 'void\\*'" "" {
>>> target c++ } .-2 } */
>>> +    }
>>> +}
>>> +
>>> +void __user *
>>> +test_return_p_user (int i)
>>> +{
>>> +  switch (i)
>>> +    {
>>> +    default:
>>> +    case 0:
>>> +      return p; /* { dg-error "return from pointer with different
>>> trust level" "" { target c } } */
>>> +      /* { dg-message "expected '__attribute__\\(\\(untrusted\\)\\)
>>> void \\*' but pointer is of type 'void \\*'" "" { target c } .-1 } */
>>> +    case 1:
>>> +      return p_kernel; /* { dg-error "return from pointer with
>>> different trust level" "" { target c } } */
>>> +      /* { dg-message "expected '__attribute__\\(\\(untrusted\\)\\)
>>> void \\*' but pointer is of type 'void \\*'" "" { target c } .-1 } */
>>> +    case 2:
>>> +      return p_user;
>>> +    case 3:
>>> +      return NULL;
>>> +    }
>>> +}
>>> +
>>> +void test_cast_k_to_u (void)
>>> +{
>>> +  p_user = (void __user *)p_kernel;
>>> +}
>>> +
>>> +void test_cast_u_to_k (void)
>>> +{
>>> +  p_kernel = (void __kernel *)p_user;
>>> +}
>>> +
>>> +int test_deref_read (int __user *p)
>>> +{
>>> +  return *p; // FIXME: should this be allowed directly?
>>> +}
>>> +
>>> +void test_deref_write (int __user *p, int i)
>>> +{
>>> +  *p = i; // FIXME: should this be allowed directly?
>>> +}
>>> +
>>> +typedef struct foo { int i; } __user *foo_ptr_t;
>>> +
>>> +void __user *
>>> +test_pass_through (void __user *ptr)
>>> +{
>>> +  return ptr;
>>> +}
>>> diff --git a/gcc/tree-core.h b/gcc/tree-core.h
>>> index 8ab119dc9a2..35a7f50c06c 100644
>>> --- a/gcc/tree-core.h
>>> +++ b/gcc/tree-core.h
>>> @@ -604,7 +604,8 @@ enum cv_qualifier {
>>>      TYPE_QUAL_CONST    = 0x1,
>>>      TYPE_QUAL_VOLATILE = 0x2,
>>>      TYPE_QUAL_RESTRICT = 0x4,
>>> -  TYPE_QUAL_ATOMIC   = 0x8
>>> +  TYPE_QUAL_ATOMIC   = 0x8,
>>> +  TYPE_QUAL_UNTRUSTED = 0x10
>>>    };
>>>    
>>>    /* Standard named or nameless data types of the C compiler.  */
>>> @@ -1684,7 +1685,8 @@ struct GTY(()) tree_type_common {
>>>      unsigned typeless_storage : 1;
>>>      unsigned empty_flag : 1;
>>>      unsigned indivisible_p : 1;
>>> -  unsigned spare : 16;
>>> +  unsigned untrusted_flag : 1;
>>> +  unsigned spare : 15;
>>>    
>>>      alias_set_type alias_set;
>>>      tree pointer_to;
>>> diff --git a/gcc/tree.c b/gcc/tree.c
>>> index 845228a055b..3600639d985 100644
>>> --- a/gcc/tree.c
>>> +++ b/gcc/tree.c
>>> @@ -5379,6 +5379,7 @@ set_type_quals (tree type, int type_quals)
>>>      TYPE_VOLATILE (type) = (type_quals & TYPE_QUAL_VOLATILE) != 0;
>>>      TYPE_RESTRICT (type) = (type_quals & TYPE_QUAL_RESTRICT) != 0;
>>>      TYPE_ATOMIC (type) = (type_quals & TYPE_QUAL_ATOMIC) != 0;
>>> +  TYPE_UNTRUSTED (type) = (type_quals & TYPE_QUAL_UNTRUSTED) != 0;
>>>      TYPE_ADDR_SPACE (type) = DECODE_QUAL_ADDR_SPACE (type_quals);
>>>    }
>>>    
>>> diff --git a/gcc/tree.h b/gcc/tree.h
>>> index f62c00bc870..caab575b210 100644
>>> --- a/gcc/tree.h
>>> +++ b/gcc/tree.h
>>> @@ -2197,6 +2197,10 @@ extern tree vector_element_bits_tree
>>> (const_tree);
>>>       the term.  */
>>>    #define TYPE_RESTRICT(NODE) (TYPE_CHECK (NODE)->type_common.restrict_flag)
>>>    
>>> +/* Nonzero in a type considered "untrusted" - values should be treated as
>>> +   under attacker control.  */
>>> +#define TYPE_UNTRUSTED(NODE) (TYPE_CHECK (NODE)->type_common.untrusted_flag)
>>> +
>>>    /* If nonzero, type's name shouldn't be emitted into debug info.  */
>>>    #define TYPE_NAMELESS(NODE) (TYPE_CHECK (NODE)->base.u.bits.nameless_flag)
>>>    
>>> @@ -2221,6 +2225,7 @@ extern tree vector_element_bits_tree
>>> (const_tree);
>>>            | (TYPE_VOLATILE (NODE) * TYPE_QUAL_VOLATILE)         \
>>>            | (TYPE_ATOMIC (NODE) * TYPE_QUAL_ATOMIC)             \
>>>            | (TYPE_RESTRICT (NODE) * TYPE_QUAL_RESTRICT)         \
>>> +         | (TYPE_UNTRUSTED (NODE) * TYPE_QUAL_UNTRUSTED)       \
>>>            | (ENCODE_QUAL_ADDR_SPACE (TYPE_ADDR_SPACE (NODE)))))
>>>    
>>>    /* The same as TYPE_QUALS without the address space
>>> qualifications.  */
>>> @@ -2228,14 +2233,16 @@ extern tree vector_element_bits_tree
>>> (const_tree);
>>>      ((int) ((TYPE_READONLY (NODE) * TYPE_QUAL_CONST)            \
>>>            | (TYPE_VOLATILE (NODE) * TYPE_QUAL_VOLATILE)         \
>>>            | (TYPE_ATOMIC (NODE) * TYPE_QUAL_ATOMIC)             \
>>> -         | (TYPE_RESTRICT (NODE) * TYPE_QUAL_RESTRICT)))
>>> +         | (TYPE_RESTRICT (NODE) * TYPE_QUAL_RESTRICT)         \
>>> +         | (TYPE_UNTRUSTED (NODE) * TYPE_QUAL_UNTRUSTED)))
>>>    
>>>    /* The same as TYPE_QUALS without the address space and atomic
>>>       qualifications.  */
>>>    #define TYPE_QUALS_NO_ADDR_SPACE_NO_ATOMIC(NODE)              \
>>>      ((int) ((TYPE_READONLY (NODE) * TYPE_QUAL_CONST)            \
>>>            | (TYPE_VOLATILE (NODE) * TYPE_QUAL_VOLATILE)         \
>>> -         | (TYPE_RESTRICT (NODE) * TYPE_QUAL_RESTRICT)))
>>> +         | (TYPE_RESTRICT (NODE) * TYPE_QUAL_RESTRICT)         \
>>> +         | (TYPE_UNTRUSTED (NODE) * TYPE_QUAL_UNTRUSTED)))
>>>    
>>>    /* These flags are available for each language front end to use internally.  */
>>>    #define TYPE_LANG_FLAG_0(NODE) (TYPE_CHECK (NODE)->type_common.lang_flag_0)
>>>
>>
> 
> 



* PING^2 (C/C++): Re: [PATCH 6/6] Add __attribute__ ((tainted))
  2022-01-06 14:08   ` PING (C/C++): " David Malcolm
@ 2022-01-10 21:36     ` David Malcolm
  2022-01-12  4:36       ` Jason Merrill
  0 siblings, 1 reply; 39+ messages in thread
From: David Malcolm @ 2022-01-10 21:36 UTC (permalink / raw)
  To: gcc-patches, linux-toolchains

On Thu, 2022-01-06 at 09:08 -0500, David Malcolm wrote:
> On Sat, 2021-11-13 at 15:37 -0500, David Malcolm wrote:
> > This patch adds a new __attribute__ ((tainted)) to the C/C++
> > frontends.
> 
> Ping for GCC C/C++ maintainers for review of the C/C++ FE parts of this
> patch (attribute registration, documentation, the name of the
> attribute, etc).
> 
> (I believe it's independent of the rest of the patch kit, in that it
> could go into trunk without needing the prior patches)
> 
> Thanks
> Dave

We're getting close to the end of stage 3 for GCC 12, so I'm pinging
this patch again...

https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584376.html

Thanks
Dave

> 
> 
> > 
> > It can be used on function decls: the analyzer will treat as tainted
> > all parameters to the function and all buffers pointed to by
> > parameters
> > to the function.  Adding this in one place to the Linux kernel's
> > __SYSCALL_DEFINEx macro allows the analyzer to treat all syscalls as
> > having tainted inputs.  This gives additional testing beyond e.g.
> > __user
> > pointers added by earlier patches - an example of the use of this can
> > be
> > seen in CVE-2011-2210, where given:
> > 
> >  SYSCALL_DEFINE5(osf_getsysinfo, unsigned long, op, void __user *,
> > buffer,
> >                  unsigned long, nbytes, int __user *, start, void
> > __user *, arg)
> > 
> > the analyzer will treat the nbytes param as under attacker control,
> > and
> > can complain accordingly:
> > 
> > taint-CVE-2011-2210-1.c: In function ‘sys_osf_getsysinfo’:
> > taint-CVE-2011-2210-1.c:69:21: warning: use of attacker-controlled
> > value
> >   ‘nbytes’ as size without upper-bounds checking [CWE-129] [-
> > Wanalyzer-tainted-size]
> >    69 |                 if (copy_to_user(buffer, hwrpb, nbytes) != 0)
> >       |                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > 
> > Additionally, the patch allows the attribute to be used on field
> > decls:
> > specifically function pointers.  Any function used as an initializer
> > for such a field gets treated as tainted.  An example can be seen in
> > CVE-2020-13143, where adding __attribute__((tainted)) to the "store"
> > callback of configfs_attribute:
> > 
> >   struct configfs_attribute {
> >      /* [...snip...] */
> >      ssize_t (*store)(struct config_item *, const char *, size_t)
> >        __attribute__((tainted));
> >      /* [...snip...] */
> >   };
> > 
> > allows the analyzer to see:
> > 
> >  CONFIGFS_ATTR(gadget_dev_desc_, UDC);
> > 
> > and treat gadget_dev_desc_UDC_store as tainted, so that it complains:
> > 
> > taint-CVE-2020-13143-1.c: In function ‘gadget_dev_desc_UDC_store’:
> > taint-CVE-2020-13143-1.c:33:17: warning: use of attacker-controlled
> > value
> >   ‘len + 18446744073709551615’ as offset without upper-bounds
> > checking [CWE-823] [-Wanalyzer-tainted-offset]
> >    33 |         if (name[len - 1] == '\n')
> >       |             ~~~~^~~~~~~~~
> > 
> > Similarly, the attribute could be used on the ioctl callback field,
> > USB device callbacks, network-handling callbacks etc.  This
> > potentially
> > gives a lot of test coverage with relatively little code annotation,
> > and
> > without necessarily needing link-time analysis (which -fanalyzer can
> > only do at present on trivial examples).
> > 
> > I believe this is the first time we've had an attribute on a field.
> > If that's an issue, I could prepare a version of the patch that
> > merely allowed it on functions themselves.
> > 
> > As before, this currently still needs -fanalyzer-checker=taint (in
> > addition to -fanalyzer).
> > 
> > gcc/analyzer/ChangeLog:
> >         * engine.cc: Include "stringpool.h", "attribs.h", and
> >         "tree-dfa.h".
> >         (mark_params_as_tainted): New.
> >         (class tainted_function_custom_event): New.
> >         (class tainted_function_info): New.
> >         (exploded_graph::add_function_entry): Handle functions with
> >         "tainted" attribute.
> >         (class tainted_field_custom_event): New.
> >         (class tainted_callback_custom_event): New.
> >         (class tainted_call_info): New.
> >         (add_tainted_callback): New.
> >         (add_any_callbacks): New.
> >         (exploded_graph::build_initial_worklist): Find callbacks that are
> >         reachable from global initializers, calling add_any_callbacks on
> >         them.
> > 
> > gcc/c-family/ChangeLog:
> >         * c-attribs.c (c_common_attribute_table): Add "tainted".
> >         (handle_tainted_attribute): New.
> > 
> > gcc/ChangeLog:
> >         * doc/extend.texi (Function Attributes): Note that "tainted" can
> >         be used on field decls.
> >         (Common Function Attributes): Add entry on "tainted" attribute.
> > 
> > gcc/testsuite/ChangeLog:
> >         * gcc.dg/analyzer/attr-tainted-1.c: New test.
> >         * gcc.dg/analyzer/attr-tainted-misuses.c: New test.
> >         * gcc.dg/analyzer/taint-CVE-2011-2210-1.c: New test.
> >         * gcc.dg/analyzer/taint-CVE-2020-13143-1.c: New test.
> >         * gcc.dg/analyzer/taint-CVE-2020-13143-2.c: New test.
> >         * gcc.dg/analyzer/taint-CVE-2020-13143.h: New test.
> >         * gcc.dg/analyzer/taint-alloc-3.c: New test.
> >         * gcc.dg/analyzer/taint-alloc-4.c: New test.
> > 
> > Signed-off-by: David Malcolm <dmalcolm@redhat.com>
> > ---
> >  gcc/analyzer/engine.cc                        | 317
> > +++++++++++++++++-
> >  gcc/c-family/c-attribs.c                      |  36 ++
> >  gcc/doc/extend.texi                           |  22 +-
> >  .../gcc.dg/analyzer/attr-tainted-1.c          |  88 +++++
> >  .../gcc.dg/analyzer/attr-tainted-misuses.c    |   6 +
> >  .../gcc.dg/analyzer/taint-CVE-2011-2210-1.c   |  93 +++++
> >  .../gcc.dg/analyzer/taint-CVE-2020-13143-1.c  |  38 +++
> >  .../gcc.dg/analyzer/taint-CVE-2020-13143-2.c  |  32 ++
> >  .../gcc.dg/analyzer/taint-CVE-2020-13143.h    |  91 +++++
> >  gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c |  21 ++
> >  gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c |  31 ++
> >  11 files changed, 772 insertions(+), 3 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/analyzer/attr-tainted-1.c
> >  create mode 100644 gcc/testsuite/gcc.dg/analyzer/attr-tainted-
> > misuses.c
> >  create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-
> > 2210-1.c
> >  create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-
> > 13143-1.c
> >  create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-
> > 13143-2.c
> >  create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-
> > 13143.h
> >  create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c
> >  create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c
> > 
> > diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
> > index 096e219392d..5fab41daf93 100644
> > --- a/gcc/analyzer/engine.cc
> > +++ b/gcc/analyzer/engine.cc
> > @@ -68,6 +68,9 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "plugin.h"
> >  #include "target.h"
> >  #include <memory>
> > +#include "stringpool.h"
> > +#include "attribs.h"
> > +#include "tree-dfa.h"
> >  
> >  /* For an overview, see gcc/doc/analyzer.texi.  */
> >  
> > @@ -2276,6 +2279,116 @@ exploded_graph::~exploded_graph ()
> >      delete (*iter).second;
> >  }
> >  
> > +/* Subroutine for use when implementing __attribute__((tainted))
> > +   on functions and on function pointer fields in structs.
> > +
> > +   Called on STATE representing a call to FNDECL.
> > +   Mark all params of FNDECL in STATE as "tainted".  Mark the value of all
> > +   regions pointed to by params of FNDECL as "tainted".
> > +
> > +   Return true if successful; return false if the "taint" state machine
> > +   was not found.  */
> > +
> > +static bool
> > +mark_params_as_tainted (program_state *state, tree fndecl,
> > +                       const extrinsic_state &ext_state)
> > +{
> > +  unsigned taint_sm_idx;
> > +  if (!ext_state.get_sm_idx_by_name ("taint", &taint_sm_idx))
> > +    return false;
> > +  sm_state_map *smap = state->m_checker_states[taint_sm_idx];
> > +
> > +  const state_machine &sm = ext_state.get_sm (taint_sm_idx);
> > +  state_machine::state_t tainted = sm.get_state_by_name ("tainted");
> > +
> > +  region_model_manager *mgr = ext_state.get_model_manager ();
> > +
> > +  function *fun = DECL_STRUCT_FUNCTION (fndecl);
> > +  gcc_assert (fun);
> > +
> > +  for (tree iter_parm = DECL_ARGUMENTS (fndecl); iter_parm;
> > +       iter_parm = DECL_CHAIN (iter_parm))
> > +    {
> > +      tree param = iter_parm;
> > +      if (tree parm_default_ssa = ssa_default_def (fun, iter_parm))
> > +       param = parm_default_ssa;
> > +      const region *param_reg = state->m_region_model->get_lvalue (param, NULL);
> > +      const svalue *init_sval = mgr->get_or_create_initial_value (param_reg);
> > +      smap->set_state (state->m_region_model, init_sval,
> > +                      tainted, NULL /*origin_new_sval*/, ext_state);
> > +      if (POINTER_TYPE_P (TREE_TYPE (param)))
> > +       {
> > +         const region *pointee_reg = mgr->get_symbolic_region (init_sval);
> > +         /* Mark "*param" as tainted.  */
> > +         const svalue *init_pointee_sval
> > +           = mgr->get_or_create_initial_value (pointee_reg);
> > +         smap->set_state (state->m_region_model, init_pointee_sval,
> > +                          tainted, NULL /*origin_new_sval*/, ext_state);
> > +       }
> > +    }
> > +
> > +  return true;
> > +}
> > +
> > +/* Custom event for use by tainted_function_info when a function
> > +   has been marked with __attribute__((tainted)).  */
> > +
> > +class tainted_function_custom_event : public custom_event
> > +{
> > +public:
> > +  tainted_function_custom_event (location_t loc, tree fndecl, int
> > depth)
> > +  : custom_event (loc, fndecl, depth),
> > +    m_fndecl (fndecl)
> > +  {
> > +  }
> > +
> > +  label_text get_desc (bool can_colorize) const FINAL OVERRIDE
> > +  {
> > +    return make_label_text
> > +      (can_colorize,
> > +       "function %qE marked with %<__attribute__((tainted))%>",
> > +       m_fndecl);
> > +  }
> > +
> > +private:
> > +  tree m_fndecl;
> > +};
> > +
> > +/* Custom exploded_edge info for top-level calls to a function
> > +   marked with __attribute__((tainted)).  */
> > +
> > +class tainted_function_info : public custom_edge_info
> > +{
> > +public:
> > +  tainted_function_info (tree fndecl)
> > +  : m_fndecl (fndecl)
> > +  {}
> > +
> > +  void print (pretty_printer *pp) const FINAL OVERRIDE
> > +  {
> > +    pp_string (pp, "call to tainted function");
> > +  }
> > +
> > +  bool update_model (region_model *,
> > +                    const exploded_edge *,
> > +                    region_model_context *) const FINAL OVERRIDE
> > +  {
> > +    /* No-op.  */
> > +    return true;
> > +  }
> > +
> > +  void add_events_to_path (checker_path *emission_path,
> > +                          const exploded_edge &) const FINAL
> > OVERRIDE
> > +  {
> > +    emission_path->add_event
> > +      (new tainted_function_custom_event
> > +       (DECL_SOURCE_LOCATION (m_fndecl), m_fndecl, 0));
> > +  }
> > +
> > +private:
> > +  tree m_fndecl;
> > +};
> > +
> >  /* Ensure that there is an exploded_node representing an external
> > call to
> >     FUN, adding it to the worklist if creating it.
> >  
> > @@ -2302,14 +2415,25 @@ exploded_graph::add_function_entry (function
> > *fun)
> >    program_state state (m_ext_state);
> >    state.push_frame (m_ext_state, fun);
> >  
> > +  custom_edge_info *edge_info = NULL;
> > +
> > +  if (lookup_attribute ("tainted", DECL_ATTRIBUTES (fun->decl)))
> > +    {
> > +      if (mark_params_as_tainted (&state, fun->decl, m_ext_state))
> > +       edge_info = new tainted_function_info (fun->decl);
> > +    }
> > +
> >    if (!state.m_valid)
> >      return NULL;
> >  
> >    exploded_node *enode = get_or_create_node (point, state, NULL);
> >    if (!enode)
> > -    return NULL;
> > +    {
> > +      delete edge_info;
> > +      return NULL;
> > +    }
> >  
> > -  add_edge (m_origin, enode, NULL);
> > +  add_edge (m_origin, enode, NULL, edge_info);
> >  
> >    m_functions_with_enodes.add (fun);
> >  
> > @@ -2623,6 +2747,184 @@ toplevel_function_p (function *fun, logger
> > *logger)
> >    return true;
> >  }
> >  
> > +/* Custom event for use by tainted_call_info when a callback field
> > has been
> > +   marked with __attribute__((tainted)), for labelling the field. 
> > */
> > +
> > +class tainted_field_custom_event : public custom_event
> > +{
> > +public:
> > +  tainted_field_custom_event (tree field)
> > +  : custom_event (DECL_SOURCE_LOCATION (field), NULL_TREE, 0),
> > +    m_field (field)
> > +  {
> > +  }
> > +
> > +  label_text get_desc (bool can_colorize) const FINAL OVERRIDE
> > +  {
> > +    return make_label_text (can_colorize,
> > +                           "field %qE of %qT"
> > +                           " is marked with
> > %<__attribute__((tainted))%>",
> > +                           m_field, DECL_CONTEXT (m_field));
> > +  }
> > +
> > +private:
> > +  tree m_field;
> > +};
> > +
> > +/* Custom event for use by tainted_call_info when a callback field
> > has been
> > +   marked with __attribute__((tainted)), for labelling the function
> > used
> > +   in that callback.  */
> > +
> > +class tainted_callback_custom_event : public custom_event
> > +{
> > +public:
> > +  tainted_callback_custom_event (location_t loc, tree fndecl, int
> > depth,
> > +                                tree field)
> > +  : custom_event (loc, fndecl, depth),
> > +    m_field (field)
> > +  {
> > +  }
> > +
> > +  label_text get_desc (bool can_colorize) const FINAL OVERRIDE
> > +  {
> > +    return make_label_text (can_colorize,
> > +                           "function %qE used as initializer for
> > field %qE"
> > +                           " marked with
> > %<__attribute__((tainted))%>",
> > +                           m_fndecl, m_field);
> > +  }
> > +
> > +private:
> > +  tree m_field;
> > +};
> > +
> > +/* Custom edge info for use when adding a function used by a
> > callback field
> > +   marked with '__attribute__((tainted))'.   */
> > +
> > +class tainted_call_info : public custom_edge_info
> > +{
> > +public:
> > +  tainted_call_info (tree field, tree fndecl, location_t loc)
> > +  : m_field (field), m_fndecl (fndecl), m_loc (loc)
> > +  {}
> > +
> > +  void print (pretty_printer *pp) const FINAL OVERRIDE
> > +  {
> > +    pp_string (pp, "call to tainted field");
> > +  }
> > +
> > +  bool update_model (region_model *,
> > +                    const exploded_edge *,
> > +                    region_model_context *) const FINAL OVERRIDE
> > +  {
> > +    /* No-op.  */
> > +    return true;
> > +  }
> > +
> > +  void add_events_to_path (checker_path *emission_path,
> > +                          const exploded_edge &) const FINAL
> > OVERRIDE
> > +  {
> > +    /* Show the field in the struct declaration
> > +       e.g. "(1) field 'store' is marked with
> > '__attribute__((tainted))'"  */
> > +    emission_path->add_event
> > +      (new tainted_field_custom_event (m_field));
> > +
> > +    /* Show the callback in the initializer
> > +       e.g.
> > +       "(2) function 'gadget_dev_desc_UDC_store' used as initializer
> > +       for field 'store' marked with '__attribute__((tainted))'". 
> > */
> > +    emission_path->add_event
> > +      (new tainted_callback_custom_event (m_loc, m_fndecl, 0,
> > m_field));
> > +  }
> > +
> > +private:
> > +  tree m_field;
> > +  tree m_fndecl;
> > +  location_t m_loc;
> > +};
> > +
> > +/* Given an initializer at LOC for FIELD marked with
> > '__attribute__((tainted))'
> > +   initialized with FNDECL, add an entrypoint to FNDECL to EG (and
> > to its
> > +   worklist) where the params to FNDECL are marked as tainted.  */
> > +
> > +static void
> > +add_tainted_callback (exploded_graph *eg, tree field, tree fndecl,
> > +                     location_t loc)
> > +{
> > +  logger *logger = eg->get_logger ();
> > +
> > +  LOG_SCOPE (logger);
> > +
> > +  if (!gimple_has_body_p (fndecl))
> > +    return;
> > +
> > +  const extrinsic_state &ext_state = eg->get_ext_state ();
> > +
> > +  function *fun = DECL_STRUCT_FUNCTION (fndecl);
> > +  gcc_assert (fun);
> > +
> > +  program_point point
> > +    = program_point::from_function_entry (eg->get_supergraph (),
> > fun);
> > +  program_state state (ext_state);
> > +  state.push_frame (ext_state, fun);
> > +
> > +  if (!mark_params_as_tainted (&state, fndecl, ext_state))
> > +    return;
> > +
> > +  if (!state.m_valid)
> > +    return;
> > +
> > +  exploded_node *enode = eg->get_or_create_node (point, state,
> > NULL);
> > +  /* Bail out if no node was created, whether or not logging is on.  */
> > +  if (!enode)
> > +    {
> > +      if (logger)
> > +       logger->log ("did not create enode for tainted %qE entrypoint",
> > +                    fndecl);
> > +      return;
> > +    }
> > +
> > +  if (logger)
> > +    logger->log ("created EN %i for tainted %qE entrypoint",
> > +                enode->m_index, fndecl);
> > +
> > +  tainted_call_info *info = new tainted_call_info (field, fndecl,
> > loc);
> > +  eg->add_edge (eg->get_origin (), enode, NULL, info);
> > +}
> > +
> > +/* Callback for walk_tree for finding callbacks within initializers;
> > +   ensure that any callback initializer where the corresponding
> > field is
> > +   marked with '__attribute__((tainted))' is treated as an
> > entrypoint to the
> > +   analysis, special-casing that the inputs to the callback are
> > +   untrustworthy.  */
> > +
> > +static tree
> > +add_any_callbacks (tree *tp, int *, void *data)
> > +{
> > +  exploded_graph *eg = (exploded_graph *)data;
> > +  if (TREE_CODE (*tp) == CONSTRUCTOR)
> > +    {
> > +      /* Find fields with the "tainted" attribute.
> > +        walk_tree only walks the values, not the index values;
> > +        look at the index values.  */
> > +      unsigned HOST_WIDE_INT idx;
> > +      constructor_elt *ce;
> > +
> > +      for (idx = 0; vec_safe_iterate (CONSTRUCTOR_ELTS (*tp), idx,
> > &ce);
> > +          idx++)
> > +       if (ce->index && TREE_CODE (ce->index) == FIELD_DECL)
> > +         if (lookup_attribute ("tainted", DECL_ATTRIBUTES (ce->index)))
> > +           {
> > +             tree value = ce->value;
> > +             if (TREE_CODE (value) == ADDR_EXPR
> > +                 && TREE_CODE (TREE_OPERAND (value, 0)) ==
> > FUNCTION_DECL)
> > +               add_tainted_callback (eg, ce->index, TREE_OPERAND
> > (value, 0),
> > +                                     EXPR_LOCATION (value));
> > +           }
> > +    }
> > +
> > +  return NULL_TREE;
> > +}
> > +
> >  /* Add initial nodes to EG, with entrypoints for externally-callable
> >     functions.  */
> >  
> > @@ -2648,6 +2950,17 @@ exploded_graph::build_initial_worklist ()
> >           logger->log ("did not create enode for %qE entrypoint",
> > fun->decl);
> >        }
> >    }
> > +
> > +  /* Find callbacks that are reachable from global initializers.  */
> > +  varpool_node *vpnode;
> > +  FOR_EACH_VARIABLE (vpnode)
> > +    {
> > +      tree decl = vpnode->decl;
> > +      tree init = DECL_INITIAL (decl);
> > +      if (!init)
> > +       continue;
> > +      walk_tree (&init, add_any_callbacks, this, NULL);
> > +    }
> >  }
> >  
> >  /* The main loop of the analysis.
> > diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
> > index 9e03156de5e..835ba6e0e8c 100644
> > --- a/gcc/c-family/c-attribs.c
> > +++ b/gcc/c-family/c-attribs.c
> > @@ -117,6 +117,7 @@ static tree
> > handle_no_profile_instrument_function_attribute (tree *, tree,
> >                                                              tree,
> > int, bool *);
> >  static tree handle_malloc_attribute (tree *, tree, tree, int, bool
> > *);
> >  static tree handle_dealloc_attribute (tree *, tree, tree, int, bool
> > *);
> > +static tree handle_tainted_attribute (tree *, tree, tree, int, bool
> > *);
> >  static tree handle_returns_twice_attribute (tree *, tree, tree, int,
> > bool *);
> >  static tree handle_no_limit_stack_attribute (tree *, tree, tree,
> > int,
> >                                              bool *);
> > @@ -569,6 +570,8 @@ const struct attribute_spec
> > c_common_attribute_table[] =
> >                               handle_objc_nullability_attribute, NULL
> > },
> >    { "*dealloc",                1, 2, true, false, false, false,
> >                               handle_dealloc_attribute, NULL },
> > +  { "tainted",               0, 0, true,  false, false, false,
> > +                             handle_tainted_attribute, NULL },
> >    { NULL,                     0, 0, false, false, false, false,
> > NULL, NULL }
> >  };
> >  
> > @@ -5857,6 +5860,39 @@ handle_objc_nullability_attribute (tree *node,
> > tree name, tree args,
> >    return NULL_TREE;
> >  }
> >  
> > +/* Handle a "tainted" attribute; arguments as in
> > +   struct attribute_spec.handler.  */
> > +
> > +static tree
> > +handle_tainted_attribute (tree *node, tree name, tree, int,
> > +                         bool *no_add_attrs)
> > +{
> > +  if (TREE_CODE (*node) != FUNCTION_DECL
> > +      && TREE_CODE (*node) != FIELD_DECL)
> > +    {
> > +      warning (OPT_Wattributes, "%qE attribute ignored; valid only "
> > +              "for functions and function pointer fields",
> > +              name);
> > +      *no_add_attrs = true;
> > +      return NULL_TREE;
> > +    }
> > +
> > +  if (TREE_CODE (*node) == FIELD_DECL
> > +      && !(TREE_CODE (TREE_TYPE (*node)) == POINTER_TYPE
> > +          && TREE_CODE (TREE_TYPE (TREE_TYPE (*node))) ==
> > FUNCTION_TYPE))
> > +    {
> > +      warning (OPT_Wattributes, "%qE attribute ignored;"
> > +              " field must be a function pointer",
> > +              name);
> > +      *no_add_attrs = true;
> > +      return NULL_TREE;
> > +    }
> > +
> > +  *no_add_attrs = false; /* OK */
> > +
> > +  return NULL_TREE;
> > +}
> > +
> >  /* Attempt to partially validate a single attribute ATTR as if
> >     it were to be applied to an entity OPER.  */
> >  
> > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> > index 5a6ef464779..826bbd48e7e 100644
> > --- a/gcc/doc/extend.texi
> > +++ b/gcc/doc/extend.texi
> > @@ -2465,7 +2465,8 @@ variable declarations (@pxref{Variable
> > Attributes}),
> >  labels (@pxref{Label Attributes}),
> >  enumerators (@pxref{Enumerator Attributes}),
> >  statements (@pxref{Statement Attributes}),
> > -and types (@pxref{Type Attributes}).
> > +types (@pxref{Type Attributes}),
> > +and on field declarations (for @code{tainted}).
> >  
> >  There is some overlap between the purposes of attributes and pragmas
> >  (@pxref{Pragmas,,Pragmas Accepted by GCC}).  It has been
> > @@ -3977,6 +3978,25 @@ addition to creating a symbol version (as if
> >  @code{"@var{name2}@@@var{nodename}"} was used) the version will be
> > also used
> >  to resolve @var{name2} by the linker.
> >  
> > +@item tainted
> > +@cindex @code{tainted} function attribute
> > +The @code{tainted} attribute is used to specify that a function is
> > called
> > +in a way that requires sanitization of its arguments, such as a
> > system
> > +call in an operating system kernel.  Such a function can be
> > considered part
> > +of the ``attack surface'' of the program.  The attribute can be used
> > both
> > +on function declarations, and on field declarations containing
> > function
> > +pointers.  In the latter case, any function used as an initializer
> > of
> > +such a callback field will be treated as tainted.
> > +
> > +The analyzer will pay particular attention to such functions when
> > both
> > +@option{-fanalyzer} and @option{-fanalyzer-checker=taint} are
> > supplied,
> > +potentially issuing warnings guarded by
> > +@option{-Wanalyzer-exposure-through-uninit-copy},
> > +@option{-Wanalyzer-tainted-allocation-size},
> > +@option{-Wanalyzer-tainted-array-index},
> > +@option{-Wanalyzer-tainted-offset},
> > +and @option{-Wanalyzer-tainted-size}.
> > +
> >  @item target_clones (@var{options})
> >  @cindex @code{target_clones} function attribute
> >  The @code{target_clones} attribute is used to specify that a
> > function
> > diff --git a/gcc/testsuite/gcc.dg/analyzer/attr-tainted-1.c
> > b/gcc/testsuite/gcc.dg/analyzer/attr-tainted-1.c
> > new file mode 100644
> > index 00000000000..cc4d5900372
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/analyzer/attr-tainted-1.c
> > @@ -0,0 +1,88 @@
> > +// TODO: remove need for this option
> > +/* { dg-additional-options "-fanalyzer-checker=taint" } */
> > +
> > +#include "analyzer-decls.h"
> > +
> > +struct arg_buf
> > +{
> > +  int i;
> > +  int j;
> > +};
> > +
> > +/* Example of marking a function as tainted.  */
> > +
> > +void __attribute__((tainted))
> > +test_1 (int i, void *p, char *q)
> > +{
> > +  /* There should be a single enode,
> > +     for the "tainted" entry to the function.  */
> > +  __analyzer_dump_exploded_nodes (0); /* { dg-warning "1 processed enode" } */
> > +
> > +  __analyzer_dump_state ("taint", i); /* { dg-warning "state: 'tainted'" } */
> > +  __analyzer_dump_state ("taint", p); /* { dg-warning "state: 'tainted'" } */
> > +  __analyzer_dump_state ("taint", q); /* { dg-warning "state: 'tainted'" } */
> > +  __analyzer_dump_state ("taint", *q); /* { dg-warning "state: 'tainted'" } */
> > +
> > +  struct arg_buf *args = p;
> > +  __analyzer_dump_state ("taint", args->i); /* { dg-warning "state: 'tainted'" } */
> > +  __analyzer_dump_state ("taint", args->j); /* { dg-warning "state: 'tainted'" } */
> > +}
> > +
> > +/* Example of marking a callback field as tainted.  */
> > +
> > +struct s2
> > +{
> > +  void (*cb) (int, void *, char *)
> > +    __attribute__((tainted));
> > +};
> > +
> > +/* Function not marked as tainted.  */
> > +
> > +void
> > +test_2a (int i, void *p, char *q)
> > +{
> > +  /* There should be a single enode,
> > +     for the normal entry to the function.  */
> > +  __analyzer_dump_exploded_nodes (0); /* { dg-warning "1 processed
> > enode" } */
> > +
> > +  __analyzer_dump_state ("taint", i); /* { dg-warning "state:
> > 'start'" } */
> > +  __analyzer_dump_state ("taint", p); /* { dg-warning "state:
> > 'start'" } */
> > +  __analyzer_dump_state ("taint", q); /* { dg-warning "state:
> > 'start'" } */
> > +
> > +  struct arg_buf *args = p;
> > +  __analyzer_dump_state ("taint", args->i); /* { dg-warning "state:
> > 'start'" } */
> > +  __analyzer_dump_state ("taint", args->j); /* { dg-warning "state:
> > 'start'" } */  
> > +}
> > +
> > +/* Function referenced via t2b.cb, marked as "tainted".  */
> > +
> > +void
> > +test_2b (int i, void *p, char *q)
> > +{
> > +  /* There should be two enodes
> > +     for the direct call, and the "tainted" entry to the function. 
> > */
> > +  __analyzer_dump_exploded_nodes (0); /* { dg-warning "2 processed
> > enodes" } */
> > +}
> > +
> > +/* Callback used via t2c.cb, marked as "tainted".  */
> > +void
> > +__analyzer_test_2c (int i, void *p, char *q)
> > +{
> > +  /* There should be a single enode,
> > +     for the "tainted" entry to the function.  */
> > +  __analyzer_dump_exploded_nodes (0); /* { dg-warning "1 processed
> > enode" } */
> > +
> > +  __analyzer_dump_state ("taint", i); /* { dg-warning "state:
> > 'tainted'" } */
> > +  __analyzer_dump_state ("taint", p); /* { dg-warning "state:
> > 'tainted'" } */
> > +  __analyzer_dump_state ("taint", q); /* { dg-warning "state:
> > 'tainted'" } */
> > +}
> > +
> > +struct s2 t2b =
> > +{
> > +  .cb = test_2b
> > +};
> > +
> > +struct s2 t2c =
> > +{
> > +  .cb = __analyzer_test_2c
> > +};
> > diff --git a/gcc/testsuite/gcc.dg/analyzer/attr-tainted-misuses.c
> > b/gcc/testsuite/gcc.dg/analyzer/attr-tainted-misuses.c
> > new file mode 100644
> > index 00000000000..6f4cbc82efb
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/analyzer/attr-tainted-misuses.c
> > @@ -0,0 +1,6 @@
> > +int not_a_fn __attribute__ ((tainted)); /* { dg-warning "'tainted' attribute ignored; valid only for functions and function pointer fields" } */
> > +
> > +struct s
> > +{
> > +  int f __attribute__ ((tainted)); /* { dg-warning "'tainted' attribute ignored; field must be a function pointer" } */
> > +};
> > diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-2210-1.c
> > b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-2210-1.c
> > new file mode 100644
> > index 00000000000..fe6c7ebbb1f
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-2210-1.c
> > @@ -0,0 +1,93 @@
> > +/* "The osf_getsysinfo function in arch/alpha/kernel/osf_sys.c in
> > the
> > +   Linux kernel before 2.6.39.4 on the Alpha platform does not
> > properly
> > +   restrict the data size for GSI_GET_HWRPB operations, which allows
> > +   local users to obtain sensitive information from kernel memory
> > via
> > +   a crafted call."
> > +
> > +   Fixed in 3d0475119d8722798db5e88f26493f6547a4bb5b on linux-
> > 2.6.39.y
> > +   in linux-stable.  */
> > +
> > +// TODO: remove need for this option:
> > +/* { dg-additional-options "-fanalyzer-checker=taint" } */
> > +
> > +#include "analyzer-decls.h"
> > +#include "test-uaccess.h"
> > +
> > +/* Adapted from include/linux/linkage.h.  */
> > +
> > +#define asmlinkage
> > +
> > +/* Adapted from include/linux/syscalls.h.  */
> > +
> > +#define __SC_DECL1(t1, a1)     t1 a1
> > +#define __SC_DECL2(t2, a2, ...) t2 a2, __SC_DECL1(__VA_ARGS__)
> > +#define __SC_DECL3(t3, a3, ...) t3 a3, __SC_DECL2(__VA_ARGS__)
> > +#define __SC_DECL4(t4, a4, ...) t4 a4, __SC_DECL3(__VA_ARGS__)
> > +#define __SC_DECL5(t5, a5, ...) t5 a5, __SC_DECL4(__VA_ARGS__)
> > +#define __SC_DECL6(t6, a6, ...) t6 a6, __SC_DECL5(__VA_ARGS__)
> > +
> > +#define SYSCALL_DEFINEx(x, sname, ...)                         \
> > +       __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
> > +
> > +#define SYSCALL_DEFINE(name) asmlinkage long sys_##name
> > +#define __SYSCALL_DEFINEx(x, name,
> > ...)                                        \
> > +       asmlinkage __attribute__((tainted)) \
> > +       long sys##name(__SC_DECL##x(__VA_ARGS__))
> > +
> > +#define SYSCALL_DEFINE5(name, ...) SYSCALL_DEFINEx(5, _##name,
> > __VA_ARGS__)
> > +
> > +/* Adapted from arch/alpha/include/asm/hwrpb.h.  */
> > +
> > +struct hwrpb_struct {
> > +       unsigned long phys_addr;        /* check: physical address of
> > the hwrpb */
> > +       unsigned long id;               /* check: "HWRPB\0\0\0" */
> > +       unsigned long revision;
> > +       unsigned long size;             /* size of hwrpb */
> > +       /* [...snip...] */
> > +};
> > +
> > +extern struct hwrpb_struct *hwrpb;
> > +
> > +/* Adapted from arch/alpha/kernel/osf_sys.c.  */
> > +
> > +SYSCALL_DEFINE5(osf_getsysinfo, unsigned long, op, void __user *,
> > buffer,
> > +               unsigned long, nbytes, int __user *, start, void
> > __user *, arg)
> > +{
> > +       /* [...snip...] */
> > +
> > +       __analyzer_dump_state ("taint", nbytes);  /* { dg-warning
> > "tainted" } */
> > +
> > +       /* TODO: should have an event explaining why "nbytes" is
> > treated as
> > +          attacker-controlled.  */
> > +
> > +       /* case GSI_GET_HWRPB: */
> > +               if (nbytes < sizeof(*hwrpb))
> > +                       return -1;
> > +
> > +               __analyzer_dump_state ("taint", nbytes);  /* { dg-
> > warning "has_lb" } */
> > +
> > +               if (copy_to_user(buffer, hwrpb, nbytes) != 0) /* {
> > dg-warning "use of attacker-controlled value 'nbytes' as size without
> > upper-bounds checking" } */
> > +                       return -2;
> > +
> > +               return 1;
> > +
> > +       /* [...snip...] */
> > +}
> > +
> > +/* With the fix for the sense of the size comparison.  */
> > +
> > +SYSCALL_DEFINE5(osf_getsysinfo_fixed, unsigned long, op, void __user
> > *, buffer,
> > +               unsigned long, nbytes, int __user *, start, void
> > __user *, arg)
> > +{
> > +       /* [...snip...] */
> > +
> > +       /* case GSI_GET_HWRPB: */
> > +               if (nbytes > sizeof(*hwrpb))
> > +                       return -1;
> > +               if (copy_to_user(buffer, hwrpb, nbytes) != 0) /* {
> > dg-bogus "attacker-controlled" } */
> > +                       return -2;
> > +
> > +               return 1;
> > +
> > +       /* [...snip...] */
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-1.c
> > b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-1.c
> > new file mode 100644
> > index 00000000000..0b9a94a8d6c
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-1.c
> > @@ -0,0 +1,38 @@
> > +/* See notes in this header.  */
> > +#include "taint-CVE-2020-13143.h"
> > +
> > +// TODO: remove need for this option
> > +/* { dg-additional-options "-fanalyzer-checker=taint" } */
> > +
> > +struct configfs_attribute {
> > +       /* [...snip...] */
> > +       ssize_t (*store)(struct config_item *, const char *, size_t)
> > /* { dg-message "\\(1\\) field 'store' of 'struct configfs_attribute'
> > is marked with '__attribute__\\(\\(tainted\\)\\)'" } */
> > +               __attribute__((tainted)); /* (this is added).  */
> > +};
> > +static inline struct gadget_info *to_gadget_info(struct config_item
> > *item)
> > +{
> > +        return container_of(to_config_group(item), struct
> > gadget_info, group);
> > +}
> > +
> > +static ssize_t gadget_dev_desc_UDC_store(struct config_item *item,
> > +               const char *page, size_t len)
> > +{
> > +       struct gadget_info *gi = to_gadget_info(item);
> > +       char *name;
> > +       int ret;
> > +
> > +#if 0
> > +       /* FIXME: this is the fix.  */
> > +       if (strlen(page) < len)
> > +               return -EOVERFLOW;
> > +#endif
> > +
> > +       name = kstrdup(page, GFP_KERNEL);
> > +       if (!name)
> > +               return -ENOMEM;
> > +       if (name[len - 1] == '\n') /* { dg-warning "use of attacker-
> > controlled value 'len \[^\n\r\]+' as offset without upper-bounds
> > checking" } */
> > +               name[len - 1] = '\0'; /* { dg-warning "use of
> > attacker-controlled value 'len \[^\n\r\]+' as offset without upper-
> > bounds checking" } */
> > +       /* [...snip...] */
> > +}
> > +
> > +CONFIGFS_ATTR(gadget_dev_desc_, UDC); /* { dg-message "\\(2\\)
> > function 'gadget_dev_desc_UDC_store' used as initializer for field
> > 'store' marked with '__attribute__\\(\\(tainted\\)\\)'" } */
> > diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-2.c
> > b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-2.c
> > new file mode 100644
> > index 00000000000..e05da9276c1
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-2.c
> > @@ -0,0 +1,32 @@
> > +/* See notes in this header.  */
> > +#include "taint-CVE-2020-13143.h"
> > +
> > +// TODO: remove need for this option
> > +/* { dg-additional-options "-fanalyzer-checker=taint" } */
> > +
> > +struct configfs_attribute {
> > +       /* [...snip...] */
> > +       ssize_t (*store)(struct config_item *, const char *, size_t)
> > /* { dg-message "\\(1\\) field 'store' of 'struct configfs_attribute'
> > is marked with '__attribute__\\(\\(tainted\\)\\)'" } */
> > +               __attribute__((tainted)); /* (this is added).  */
> > +};
> > +
> > +/* Highly simplified version.  */
> > +
> > +static ssize_t gadget_dev_desc_UDC_store(struct config_item *item,
> > +               const char *page, size_t len)
> > +{
> > +       /* TODO: ought to have state_change_event talking about where
> > the tainted value comes from.  */
> > +
> > +       char *name;
> > +       /* [...snip...] */
> > +
> > +       name = kstrdup(page, GFP_KERNEL);
> > +       if (!name)
> > +               return -ENOMEM;
> > +       if (name[len - 1] == '\n') /* { dg-warning "use of attacker-controlled value 'len \[^\n\r\]+' as offset without upper-bounds checking" } */
> > +               name[len - 1] = '\0';  /* { dg-warning "use of attacker-controlled value 'len \[^\n\r\]+' as offset without upper-bounds checking" } */
> > +       /* [...snip...] */
> > +       return 0;
> > +}
> > +
> > +CONFIGFS_ATTR(gadget_dev_desc_, UDC); /* { dg-message "\\(2\\)
> > function 'gadget_dev_desc_UDC_store' used as initializer for field
> > 'store' marked with '__attribute__\\(\\(tainted\\)\\)'" } */
> > diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143.h
> > b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143.h
> > new file mode 100644
> > index 00000000000..0ba023539af
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143.h
> > @@ -0,0 +1,91 @@
> > +/* Shared header for the various taint-CVE-2020-13143-*.c tests.
> > +   
> > +   "gadget_dev_desc_UDC_store in drivers/usb/gadget/configfs.c in
> > the
> > +   Linux kernel 3.16 through 5.6.13 relies on kstrdup without
> > considering
> > +   the possibility of an internal '\0' value, which allows attackers
> > to
> > +   trigger an out-of-bounds read, aka CID-15753588bcd4."
> > +
> > +   Fixed by 15753588bcd4bbffae1cca33c8ced5722477fe1f on linux-5.7.y
> > +   in linux-stable.  */
> > +
> > +// TODO: remove need for this option
> > +/* { dg-additional-options "-fanalyzer-checker=taint" } */
> > +
> > +#include <stddef.h>
> > +
> > +/* Adapted from include/uapi/asm-generic/posix_types.h  */
> > +
> > +typedef unsigned int     __kernel_size_t;
> > +typedef int              __kernel_ssize_t;
> > +
> > +/* Adapted from include/linux/types.h  */
> > +
> > +//typedef __kernel_size_t              size_t;
> > +typedef __kernel_ssize_t       ssize_t;
> > +
> > +/* Adapted from include/linux/kernel.h  */
> > +
> > +#define container_of(ptr, type, member) ({                             \
> > +       void *__mptr = (void *)(ptr);                                   \
> > +       /* [...snip...] */                                              \
> > +       ((type *)(__mptr - offsetof(type, member))); })
> > +
> > +/* Adapted from include/linux/configfs.h  */
> > +
> > +struct config_item {
> > +       /* [...snip...] */
> > +};
> > +
> > +struct config_group {
> > +       struct config_item              cg_item;
> > +       /* [...snip...] */
> > +};
> > +
> > +static inline struct config_group *to_config_group(struct
> > config_item *item)
> > +{
> > +       return item ? container_of(item,struct config_group,cg_item)
> > : NULL;
> > +}
> > +
> > +#define CONFIGFS_ATTR(_pfx, _name)                             \
> > +static struct configfs_attribute _pfx##attr_##_name = {        \
> > +       /* [...snip...] */                              \
> > +       .store          = _pfx##_name##_store,          \
> > +}
> > +
> > +/* Adapted from include/linux/compiler.h  */
> > +
> > +#define __force
> > +
> > +/* Adapted from include/asm-generic/errno-base.h  */
> > +
> > +#define        ENOMEM          12      /* Out of memory */
> > +
> > +/* Adapted from include/linux/types.h  */
> > +
> > +#define __bitwise__
> > +typedef unsigned __bitwise__ gfp_t;
> > +
> > +/* Adapted from include/linux/gfp.h  */
> > +
> > +#define ___GFP_WAIT            0x10u
> > +#define ___GFP_IO              0x40u
> > +#define ___GFP_FS              0x80u
> > +#define __GFP_WAIT     ((__force gfp_t)___GFP_WAIT)
> > +#define __GFP_IO       ((__force gfp_t)___GFP_IO)
> > +#define __GFP_FS       ((__force gfp_t)___GFP_FS)
> > +#define GFP_KERNEL  (__GFP_WAIT | __GFP_IO | __GFP_FS)
> > +
> > +/* Adapted from include/linux/compiler_attributes.h  */
> > +
> > +#define __malloc                        __attribute__((__malloc__))
> > +
> > +/* Adapted from include/linux/string.h  */
> > +
> > +extern char *kstrdup(const char *s, gfp_t gfp) __malloc;
> > +
> > +/* Adapted from drivers/usb/gadget/configfs.c  */
> > +
> > +struct gadget_info {
> > +       struct config_group group;
> > +       /* [...snip...] */                              \
> > +};
> > diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c
> > b/gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c
> > new file mode 100644
> > index 00000000000..4c567b2ffdf
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c
> > @@ -0,0 +1,21 @@
> > +// TODO: remove need for this option:
> > +/* { dg-additional-options "-fanalyzer-checker=taint" } */
> > +
> > +#include "analyzer-decls.h"
> > +#include <stdio.h>
> > +#include <stdlib.h>
> > +#include <string.h>
> > +
> > +/* malloc with tainted size from a syscall.  */
> > +
> > +void *p;
> > +
> > +void __attribute__((tainted))
> > +test_1 (size_t sz) /* { dg-message "\\(1\\) function 'test_1' marked
> > with '__attribute__\\(\\(tainted\\)\\)'" } */
> > +{
> > +  /* TODO: should have a message saying why "sz" is tainted, e.g.
> > +     "treating 'sz' as attacker-controlled because 'test_1' is
> > marked with '__attribute__((tainted))'"  */
> > +
> > +  p = malloc (sz); /* { dg-warning "use of attacker-controlled value
> > 'sz' as allocation size without upper-bounds checking" "warning" } */
> > +  /* { dg-message "\\(\[0-9\]+\\) use of attacker-controlled value
> > 'sz' as allocation size without upper-bounds checking" "final event"
> > { target *-*-* } .-1 } */
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c
> > b/gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c
> > new file mode 100644
> > index 00000000000..f52cafcd71d
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c
> > @@ -0,0 +1,31 @@
> > +// TODO: remove need for this option:
> > +/* { dg-additional-options "-fanalyzer-checker=taint" } */
> > +
> > +#include "analyzer-decls.h"
> > +#include <stdio.h>
> > +#include <stdlib.h>
> > +#include <string.h>
> > +
> > +/* malloc with tainted size from a syscall.  */
> > +
> > +struct arg_buf
> > +{
> > +  size_t sz;
> > +};
> > +
> > +void *p;
> > +
> > +void __attribute__((tainted))
> > +test_1 (void *data) /* { dg-message "\\(1\\) function 'test_1'
> > marked with '__attribute__\\(\\(tainted\\)\\)'" } */
> > +{
> > +  /* we should treat pointed-to-structs as tainted.  */
> > +  __analyzer_dump_state ("taint", data); /* { dg-warning "state:
> > 'tainted'" } */
> > +  
> > +  struct arg_buf *args = data;
> > +
> > +  __analyzer_dump_state ("taint", args); /* { dg-warning "state:
> > 'tainted'" } */
> > +  __analyzer_dump_state ("taint", args->sz); /* { dg-warning "state:
> > 'tainted'" } */
> > +  
> > +  p = malloc (args->sz); /* { dg-warning "use of attacker-controlled
> > value '\\*args.sz' as allocation size without upper-bounds checking"
> > "warning" } */
> > +  /* { dg-message "\\(\[0-9\]+\\) use of attacker-controlled value
> > '\\*args.sz' as allocation size without upper-bounds checking" "final
> > event" { target *-*-* } .-1 } */
> > +}
> 



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: PING^2 (C/C++): Re: [PATCH 6/6] Add __attribute__ ((tainted))
  2022-01-10 21:36     ` PING^2 " David Malcolm
@ 2022-01-12  4:36       ` Jason Merrill
  2022-01-12 15:33         ` David Malcolm
  0 siblings, 1 reply; 39+ messages in thread
From: Jason Merrill @ 2022-01-12  4:36 UTC (permalink / raw)
  To: David Malcolm, gcc-patches, linux-toolchains

On 1/10/22 16:36, David Malcolm via Gcc-patches wrote:
> On Thu, 2022-01-06 at 09:08 -0500, David Malcolm wrote:
>> On Sat, 2021-11-13 at 15:37 -0500, David Malcolm wrote:
>>> This patch adds a new __attribute__ ((tainted)) to the C/C++
>>> frontends.
>>
>> Ping for GCC C/C++ maintainers for review of the C/C++ FE parts of this
>> patch (attribute registration, documentation, the name of the
>> attribute, etc).
>>
>> (I believe it's independent of the rest of the patch kit, in that it
>> could go into trunk without needing the prior patches)
>>
>> Thanks
>> Dave
> 
> Getting close to end of stage 3 for GCC 12, so pinging this patch
> again...
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584376.html

The c-family change is OK.

> Thanks
> Dave
> 
>>
>>
>>>
>>> It can be used on function decls: the analyzer will treat as tainted
>>> all parameters to the function and all buffers pointed to by
>>> parameters
>>> to the function.  Adding this in one place to the Linux kernel's
>>> __SYSCALL_DEFINEx macro allows the analyzer to treat all syscalls as
>>> having tainted inputs.  This gives additional testing beyond e.g.
>>> __user
>>> pointers added by earlier patches - an example of the use of this can
>>> be
>>> seen in CVE-2011-2210, where given:
>>>
>>>   SYSCALL_DEFINE5(osf_getsysinfo, unsigned long, op, void __user *,
>>> buffer,
>>>                   unsigned long, nbytes, int __user *, start, void
>>> __user *, arg)
>>>
>>> the analyzer will treat the nbytes param as under attacker control,
>>> and
>>> can complain accordingly:
>>>
>>> taint-CVE-2011-2210-1.c: In function ‘sys_osf_getsysinfo’:
>>> taint-CVE-2011-2210-1.c:69:21: warning: use of attacker-controlled value
>>>    ‘nbytes’ as size without upper-bounds checking [CWE-129] [-Wanalyzer-tainted-size]
>>>     69 |                 if (copy_to_user(buffer, hwrpb, nbytes) != 0)
>>>        |                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>
>>> Additionally, the patch allows the attribute to be used on field
>>> decls:
>>> specifically function pointers.  Any function used as an initializer
>>> for such a field gets treated as tainted.  An example can be seen in
>>> CVE-2020-13143, where adding __attribute__((tainted)) to the "store"
>>> callback of configfs_attribute:
>>>
>>>    struct configfs_attribute {
>>>       /* [...snip...] */
>>>       ssize_t (*store)(struct config_item *, const char *, size_t)
>>>         __attribute__((tainted));
>>>       /* [...snip...] */
>>>    };
>>>
>>> allows the analyzer to see:
>>>
>>>   CONFIGFS_ATTR(gadget_dev_desc_, UDC);
>>>
>>> and treat gadget_dev_desc_UDC_store as tainted, so that it complains:
>>>
>>> taint-CVE-2020-13143-1.c: In function ‘gadget_dev_desc_UDC_store’:
>>> taint-CVE-2020-13143-1.c:33:17: warning: use of attacker-controlled value
>>>    ‘len + 18446744073709551615’ as offset without upper-bounds checking [CWE-823] [-Wanalyzer-tainted-offset]
>>>     33 |         if (name[len - 1] == '\n')
>>>        |             ~~~~^~~~~~~~~
>>>
>>> Similarly, the attribute could be used on the ioctl callback field,
>>> USB device callbacks, network-handling callbacks etc.  This
>>> potentially
>>> gives a lot of test coverage with relatively little code annotation,
>>> and
>>> without necessarily needing link-time analysis (which -fanalyzer can
>>> only do at present on trivial examples).
>>>
>>> I believe this is the first time we've had an attribute on a field.
>>> If that's an issue, I could prepare a version of the patch that
>>> merely allowed it on functions themselves.
>>>
>>> As before this currently still needs -fanalyzer-checker=taint (in
>>> addition to -fanalyzer).
>>>
>>> gcc/analyzer/ChangeLog:
>>>          * engine.cc: Include "stringpool.h", "attribs.h", and
>>>          "tree-dfa.h".
>>>          (mark_params_as_tainted): New.
>>>          (class tainted_function_custom_event): New.
>>>          (class tainted_function_info): New.
>>>          (exploded_graph::add_function_entry): Handle functions with
>>>          "tainted" attribute.
>>>          (class tainted_field_custom_event): New.
>>>          (class tainted_callback_custom_event): New.
>>>          (class tainted_call_info): New.
>>>          (add_tainted_callback): New.
>>>          (add_any_callbacks): New.
>>>          (exploded_graph::build_initial_worklist): Find callbacks that
>>> are
>>>          reachable from global initializers, calling add_any_callbacks
>>> on
>>>          them.
>>>
>>> gcc/c-family/ChangeLog:
>>>          * c-attribs.c (c_common_attribute_table): Add "tainted".
>>>          (handle_tainted_attribute): New.
>>>
>>> gcc/ChangeLog:
>>>          * doc/extend.texi (Function Attributes): Note that "tainted"
>>> can
>>>          be used on field decls.
>>>          (Common Function Attributes): Add entry on "tainted"
>>> attribute.
>>>
>>> gcc/testsuite/ChangeLog:
>>>          * gcc.dg/analyzer/attr-tainted-1.c: New test.
>>>          * gcc.dg/analyzer/attr-tainted-misuses.c: New test.
>>>          * gcc.dg/analyzer/taint-CVE-2011-2210-1.c: New test.
>>>          * gcc.dg/analyzer/taint-CVE-2020-13143-1.c: New test.
>>>          * gcc.dg/analyzer/taint-CVE-2020-13143-2.c: New test.
>>>          * gcc.dg/analyzer/taint-CVE-2020-13143.h: New test.
>>>          * gcc.dg/analyzer/taint-alloc-3.c: New test.
>>>          * gcc.dg/analyzer/taint-alloc-4.c: New test.
>>>
>>> Signed-off-by: David Malcolm <dmalcolm@redhat.com>
>>> ---
>>>   gcc/analyzer/engine.cc                        | 317
>>> +++++++++++++++++-
>>>   gcc/c-family/c-attribs.c                      |  36 ++
>>>   gcc/doc/extend.texi                           |  22 +-
>>>   .../gcc.dg/analyzer/attr-tainted-1.c          |  88 +++++
>>>   .../gcc.dg/analyzer/attr-tainted-misuses.c    |   6 +
>>>   .../gcc.dg/analyzer/taint-CVE-2011-2210-1.c   |  93 +++++
>>>   .../gcc.dg/analyzer/taint-CVE-2020-13143-1.c  |  38 +++
>>>   .../gcc.dg/analyzer/taint-CVE-2020-13143-2.c  |  32 ++
>>>   .../gcc.dg/analyzer/taint-CVE-2020-13143.h    |  91 +++++
>>>   gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c |  21 ++
>>>   gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c |  31 ++
>>>   11 files changed, 772 insertions(+), 3 deletions(-)
>>>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/attr-tainted-1.c
>>>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/attr-tainted-misuses.c
>>>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-2210-1.c
>>>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-1.c
>>>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-2.c
>>>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143.h
>>>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c
>>>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c
>>>
>>> diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
>>> index 096e219392d..5fab41daf93 100644
>>> --- a/gcc/analyzer/engine.cc
>>> +++ b/gcc/analyzer/engine.cc
>>> @@ -68,6 +68,9 @@ along with GCC; see the file COPYING3.  If not see
>>>   #include "plugin.h"
>>>   #include "target.h"
>>>   #include <memory>
>>> +#include "stringpool.h"
>>> +#include "attribs.h"
>>> +#include "tree-dfa.h"
>>>   
>>>   /* For an overview, see gcc/doc/analyzer.texi.  */
>>>   
>>> @@ -2276,6 +2279,116 @@ exploded_graph::~exploded_graph ()
>>>       delete (*iter).second;
>>>   }
>>>   
>>> +/* Subroutine for use when implementing __attribute__((tainted))
>>> +   on functions and on function pointer fields in structs.
>>> +
>>> +   Called on STATE representing a call to FNDECL.
>>> +   Mark all params of FNDECL in STATE as "tainted".  Mark the value
>>> of all
>>> +   regions pointed to by params of FNDECL as "tainted".
>>> +
>>> +   Return true if successful; return false if the "taint" state
>>> machine
>>> +   was not found.  */
>>> +
>>> +static bool
>>> +mark_params_as_tainted (program_state *state, tree fndecl,
>>> +                       const extrinsic_state &ext_state)
>>> +{
>>> +  unsigned taint_sm_idx;
>>> +  if (!ext_state.get_sm_idx_by_name ("taint", &taint_sm_idx))
>>> +    return false;
>>> +  sm_state_map *smap = state->m_checker_states[taint_sm_idx];
>>> +
>>> +  const state_machine &sm = ext_state.get_sm (taint_sm_idx);
>>> +  state_machine::state_t tainted = sm.get_state_by_name ("tainted");
>>> +
>>> +  region_model_manager *mgr = ext_state.get_model_manager ();
>>> +
>>> +  function *fun = DECL_STRUCT_FUNCTION (fndecl);
>>> +  gcc_assert (fun);
>>> +
>>> +  for (tree iter_parm = DECL_ARGUMENTS (fndecl); iter_parm;
>>> +       iter_parm = DECL_CHAIN (iter_parm))
>>> +    {
>>> +      tree param = iter_parm;
>>> +      if (tree parm_default_ssa = ssa_default_def (fun, iter_parm))
>>> +       param = parm_default_ssa;
>>> +      const region *param_reg = state->m_region_model->get_lvalue
>>> (param, NULL);
>>> +      const svalue *init_sval = mgr->get_or_create_initial_value
>>> (param_reg);
>>> +      smap->set_state (state->m_region_model, init_sval,
>>> +                      tainted, NULL /*origin_new_sval*/, ext_state);
>>> +      if (POINTER_TYPE_P (TREE_TYPE (param)))
>>> +       {
>>> +         const region *pointee_reg = mgr->get_symbolic_region
>>> (init_sval);
>>> +         /* Mark "*param" as tainted.  */
>>> +         const svalue *init_pointee_sval
>>> +           = mgr->get_or_create_initial_value (pointee_reg);
>>> +         smap->set_state (state->m_region_model, init_pointee_sval,
>>> +                          tainted, NULL /*origin_new_sval*/,
>>> ext_state);
>>> +       }
>>> +    }
>>> +
>>> +  return true;
>>> +}
>>> +
>>> +/* Custom event for use by tainted_function_info when a function
>>> +   has been marked with __attribute__((tainted)).  */
>>> +
>>> +class tainted_function_custom_event : public custom_event
>>> +{
>>> +public:
>>> +  tainted_function_custom_event (location_t loc, tree fndecl, int
>>> depth)
>>> +  : custom_event (loc, fndecl, depth),
>>> +    m_fndecl (fndecl)
>>> +  {
>>> +  }
>>> +
>>> +  label_text get_desc (bool can_colorize) const FINAL OVERRIDE
>>> +  {
>>> +    return make_label_text
>>> +      (can_colorize,
>>> +       "function %qE marked with %<__attribute__((tainted))%>",
>>> +       m_fndecl);
>>> +  }
>>> +
>>> +private:
>>> +  tree m_fndecl;
>>> +};
>>> +
>>> +/* Custom exploded_edge info for top-level calls to a function
>>> +   marked with __attribute__((tainted)).  */
>>> +
>>> +class tainted_function_info : public custom_edge_info
>>> +{
>>> +public:
>>> +  tainted_function_info (tree fndecl)
>>> +  : m_fndecl (fndecl)
>>> +  {}
>>> +
>>> +  void print (pretty_printer *pp) const FINAL OVERRIDE
>>> +  {
>>> +    pp_string (pp, "call to tainted function");
>>> +  };
>>> +
>>> +  bool update_model (region_model *,
>>> +                    const exploded_edge *,
>>> +                    region_model_context *) const FINAL OVERRIDE
>>> +  {
>>> +    /* No-op.  */
>>> +    return true;
>>> +  }
>>> +
>>> +  void add_events_to_path (checker_path *emission_path,
>>> +                          const exploded_edge &) const FINAL
>>> OVERRIDE
>>> +  {
>>> +    emission_path->add_event
>>> +      (new tainted_function_custom_event
>>> +       (DECL_SOURCE_LOCATION (m_fndecl), m_fndecl, 0));
>>> +  }
>>> +
>>> +private:
>>> +  tree m_fndecl;
>>> +};
>>> +
>>>   /* Ensure that there is an exploded_node representing an external
>>> call to
>>>      FUN, adding it to the worklist if creating it.
>>>   
>>> @@ -2302,14 +2415,25 @@ exploded_graph::add_function_entry (function
>>> *fun)
>>>     program_state state (m_ext_state);
>>>     state.push_frame (m_ext_state, fun);
>>>   
>>> +  custom_edge_info *edge_info = NULL;
>>> +
>>> +  if (lookup_attribute ("tainted", DECL_ATTRIBUTES (fun->decl)))
>>> +    {
>>> +      if (mark_params_as_tainted (&state, fun->decl, m_ext_state))
>>> +       edge_info = new tainted_function_info (fun->decl);
>>> +    }
>>> +
>>>     if (!state.m_valid)
>>>       return NULL;
>>>   
>>>     exploded_node *enode = get_or_create_node (point, state, NULL);
>>>     if (!enode)
>>> -    return NULL;
>>> +    {
>>> +      delete edge_info;
>>> +      return NULL;
>>> +    }
>>>   
>>> -  add_edge (m_origin, enode, NULL);
>>> +  add_edge (m_origin, enode, NULL, edge_info);
>>>   
>>>     m_functions_with_enodes.add (fun);
>>>   
>>> @@ -2623,6 +2747,184 @@ toplevel_function_p (function *fun, logger
>>> *logger)
>>>     return true;
>>>   }
>>>   
>>> +/* Custom event for use by tainted_call_info when a callback field
>>> has been
>>> +   marked with __attribute__((tainted)), for labelling the field.
>>> */
>>> +
>>> +class tainted_field_custom_event : public custom_event
>>> +{
>>> +public:
>>> +  tainted_field_custom_event (tree field)
>>> +  : custom_event (DECL_SOURCE_LOCATION (field), NULL_TREE, 0),
>>> +    m_field (field)
>>> +  {
>>> +  }
>>> +
>>> +  label_text get_desc (bool can_colorize) const FINAL OVERRIDE
>>> +  {
>>> +    return make_label_text (can_colorize,
>>> +                           "field %qE of %qT"
>>> +                           " is marked with
>>> %<__attribute__((tainted))%>",
>>> +                           m_field, DECL_CONTEXT (m_field));
>>> +  }
>>> +
>>> +private:
>>> +  tree m_field;
>>> +};
>>> +
>>> +/* Custom event for use by tainted_call_info when a callback field
>>> has been
>>> +   marked with __attribute__((tainted)), for labelling the function
>>> used
>>> +   in that callback.  */
>>> +
>>> +class tainted_callback_custom_event : public custom_event
>>> +{
>>> +public:
>>> +  tainted_callback_custom_event (location_t loc, tree fndecl, int
>>> depth,
>>> +                                tree field)
>>> +  : custom_event (loc, fndecl, depth),
>>> +    m_field (field)
>>> +  {
>>> +  }
>>> +
>>> +  label_text get_desc (bool can_colorize) const FINAL OVERRIDE
>>> +  {
>>> +    return make_label_text (can_colorize,
>>> +                           "function %qE used as initializer for
>>> field %qE"
>>> +                           " marked with
>>> %<__attribute__((tainted))%>",
>>> +                           m_fndecl, m_field);
>>> +  }
>>> +
>>> +private:
>>> +  tree m_field;
>>> +};
>>> +
>>> +/* Custom edge info for use when adding a function used by a
>>> callback field
>>> +   marked with '__attribute__((tainted))'.   */
>>> +
>>> +class tainted_call_info : public custom_edge_info
>>> +{
>>> +public:
>>> +  tainted_call_info (tree field, tree fndecl, location_t loc)
>>> +  : m_field (field), m_fndecl (fndecl), m_loc (loc)
>>> +  {}
>>> +
>>> +  void print (pretty_printer *pp) const FINAL OVERRIDE
>>> +  {
>>> +    pp_string (pp, "call to tainted field");
>>> +  };
>>> +
>>> +  bool update_model (region_model *,
>>> +                    const exploded_edge *,
>>> +                    region_model_context *) const FINAL OVERRIDE
>>> +  {
>>> +    /* No-op.  */
>>> +    return true;
>>> +  }
>>> +
>>> +  void add_events_to_path (checker_path *emission_path,
>>> +                          const exploded_edge &) const FINAL
>>> OVERRIDE
>>> +  {
>>> +    /* Show the field in the struct declaration
>>> +       e.g. "(1) field 'store' is marked with
>>> '__attribute__((tainted))'"  */
>>> +    emission_path->add_event
>>> +      (new tainted_field_custom_event (m_field));
>>> +
>>> +    /* Show the callback in the initializer
>>> +       e.g.
>>> +       "(2) function 'gadget_dev_desc_UDC_store' used as initializer
>>> +       for field 'store' marked with '__attribute__((tainted))'".
>>> */
>>> +    emission_path->add_event
>>> +      (new tainted_callback_custom_event (m_loc, m_fndecl, 0,
>>> m_field));
>>> +  }
>>> +
>>> +private:
>>> +  tree m_field;
>>> +  tree m_fndecl;
>>> +  location_t m_loc;
>>> +};
>>> +
>>> +/* Given an initializer at LOC for FIELD marked with
>>> '__attribute__((tainted))'
>>> +   initialized with FNDECL, add an entrypoint to FNDECL to EG (and
>>> to its
>>> +   worklist) where the params to FNDECL are marked as tainted.  */
>>> +
>>> +static void
>>> +add_tainted_callback (exploded_graph *eg, tree field, tree fndecl,
>>> +                     location_t loc)
>>> +{
>>> +  logger *logger = eg->get_logger ();
>>> +
>>> +  LOG_SCOPE (logger);
>>> +
>>> +  if (!gimple_has_body_p (fndecl))
>>> +    return;
>>> +
>>> +  const extrinsic_state &ext_state = eg->get_ext_state ();
>>> +
>>> +  function *fun = DECL_STRUCT_FUNCTION (fndecl);
>>> +  gcc_assert (fun);
>>> +
>>> +  program_point point
>>> +    = program_point::from_function_entry (eg->get_supergraph (),
>>> fun);
>>> +  program_state state (ext_state);
>>> +  state.push_frame (ext_state, fun);
>>> +
>>> +  if (!mark_params_as_tainted (&state, fndecl, ext_state))
>>> +    return;
>>> +
>>> +  if (!state.m_valid)
>>> +    return;
>>> +
>>> +  exploded_node *enode = eg->get_or_create_node (point, state,
>>> NULL);
>>> +  if (logger)
>>> +    {
>>> +      if (enode)
>>> +       logger->log ("created EN %i for tainted %qE entrypoint",
>>> +                    enode->m_index, fndecl);
>>> +      else
>>> +       {
>>> +         logger->log ("did not create enode for tainted %qE
>>> entrypoint",
>>> +                      fndecl);
>>> +         return;
>>> +       }
>>> +    }
>>> +
>>> +  tainted_call_info *info = new tainted_call_info (field, fndecl,
>>> loc);
>>> +  eg->add_edge (eg->get_origin (), enode, NULL, info);
>>> +}
>>> +
>>> +/* Callback for walk_tree for finding callbacks within initializers;
>>> +   ensure that any callback initializer where the corresponding
>>> field is
>>> +   marked with '__attribute__((tainted))' is treated as an
>>> entrypoint to the
>>> +   analysis, special-casing that the inputs to the callback are
>>> +   untrustworthy.  */
>>> +
>>> +static tree
>>> +add_any_callbacks (tree *tp, int *, void *data)
>>> +{
>>> +  exploded_graph *eg = (exploded_graph *)data;
>>> +  if (TREE_CODE (*tp) == CONSTRUCTOR)
>>> +    {
>>> +      /* Find fields with the "tainted" attribute.
>>> +        walk_tree only walks the values, not the index values;
>>> +        look at the index values.  */
>>> +      unsigned HOST_WIDE_INT idx;
>>> +      constructor_elt *ce;
>>> +
>>> +      for (idx = 0; vec_safe_iterate (CONSTRUCTOR_ELTS (*tp), idx,
>>> &ce);
>>> +          idx++)
>>> +       if (ce->index && TREE_CODE (ce->index) == FIELD_DECL)
>>> +         if (lookup_attribute ("tainted", DECL_ATTRIBUTES (ce->index)))
>>> +           {
>>> +             tree value = ce->value;
>>> +             if (TREE_CODE (value) == ADDR_EXPR
>>> +                 && TREE_CODE (TREE_OPERAND (value, 0)) ==
>>> FUNCTION_DECL)
>>> +               add_tainted_callback (eg, ce->index, TREE_OPERAND
>>> (value, 0),
>>> +                                     EXPR_LOCATION (value));
>>> +           }
>>> +    }
>>> +
>>> +  return NULL_TREE;
>>> +}
>>> +
>>>   /* Add initial nodes to EG, with entrypoints for externally-callable
>>>      functions.  */
>>>   
>>> @@ -2648,6 +2950,17 @@ exploded_graph::build_initial_worklist ()
>>>            logger->log ("did not create enode for %qE entrypoint",
>>> fun->decl);
>>>         }
>>>     }
>>> +
>>> +  /* Find callbacks that are reachable from global initializers.  */
>>> +  varpool_node *vpnode;
>>> +  FOR_EACH_VARIABLE (vpnode)
>>> +    {
>>> +      tree decl = vpnode->decl;
>>> +      tree init = DECL_INITIAL (decl);
>>> +      if (!init)
>>> +       continue;
>>> +      walk_tree (&init, add_any_callbacks, this, NULL);
>>> +    }
>>>   }
>>>   
>>>   /* The main loop of the analysis.
>>> diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
>>> index 9e03156de5e..835ba6e0e8c 100644
>>> --- a/gcc/c-family/c-attribs.c
>>> +++ b/gcc/c-family/c-attribs.c
>>> @@ -117,6 +117,7 @@ static tree
>>> handle_no_profile_instrument_function_attribute (tree *, tree,
>>>                                                               tree,
>>> int, bool *);
>>>   static tree handle_malloc_attribute (tree *, tree, tree, int, bool
>>> *);
>>>   static tree handle_dealloc_attribute (tree *, tree, tree, int, bool
>>> *);
>>> +static tree handle_tainted_attribute (tree *, tree, tree, int, bool
>>> *);
>>>   static tree handle_returns_twice_attribute (tree *, tree, tree, int,
>>> bool *);
>>>   static tree handle_no_limit_stack_attribute (tree *, tree, tree,
>>> int,
>>>                                               bool *);
>>> @@ -569,6 +570,8 @@ const struct attribute_spec
>>> c_common_attribute_table[] =
>>>                                handle_objc_nullability_attribute, NULL
>>> },
>>>     { "*dealloc",                1, 2, true, false, false, false,
>>>                                handle_dealloc_attribute, NULL },
>>> +  { "tainted",               0, 0, true,  false, false, false,
>>> +                             handle_tainted_attribute, NULL },
>>>     { NULL,                     0, 0, false, false, false, false,
>>> NULL, NULL }
>>>   };
>>>   
>>> @@ -5857,6 +5860,39 @@ handle_objc_nullability_attribute (tree *node,
>>> tree name, tree args,
>>>     return NULL_TREE;
>>>   }
>>>   
>>> +/* Handle a "tainted" attribute; arguments as in
>>> +   struct attribute_spec.handler.  */
>>> +
>>> +static tree
>>> +handle_tainted_attribute (tree *node, tree name, tree, int,
>>> +                         bool *no_add_attrs)
>>> +{
>>> +  if (TREE_CODE (*node) != FUNCTION_DECL
>>> +      && TREE_CODE (*node) != FIELD_DECL)
>>> +    {
>>> +      warning (OPT_Wattributes, "%qE attribute ignored; valid only "
>>> +              "for functions and function pointer fields",
>>> +              name);
>>> +      *no_add_attrs = true;
>>> +      return NULL_TREE;
>>> +    }
>>> +
>>> +  if (TREE_CODE (*node) == FIELD_DECL
>>> +      && !(TREE_CODE (TREE_TYPE (*node)) == POINTER_TYPE
>>> +          && TREE_CODE (TREE_TYPE (TREE_TYPE (*node))) ==
>>> FUNCTION_TYPE))
>>> +    {
>>> +      warning (OPT_Wattributes, "%qE attribute ignored;"
>>> +              " field must be a function pointer",
>>> +              name);
>>> +      *no_add_attrs = true;
>>> +      return NULL_TREE;
>>> +    }
>>> +
>>> +  *no_add_attrs = false; /* OK */
>>> +
>>> +  return NULL_TREE;
>>> +}
>>> +
>>>   /* Attempt to partially validate a single attribute ATTR as if
>>>      it were to be applied to an entity OPER.  */
>>>   
>>> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
>>> index 5a6ef464779..826bbd48e7e 100644
>>> --- a/gcc/doc/extend.texi
>>> +++ b/gcc/doc/extend.texi
>>> @@ -2465,7 +2465,8 @@ variable declarations (@pxref{Variable
>>> Attributes}),
>>>   labels (@pxref{Label Attributes}),
>>>   enumerators (@pxref{Enumerator Attributes}),
>>>   statements (@pxref{Statement Attributes}),
>>> -and types (@pxref{Type Attributes}).
>>> +types (@pxref{Type Attributes}),
>>> +and on field declarations (for @code{tainted}).
>>>   
>>>   There is some overlap between the purposes of attributes and pragmas
>>>   (@pxref{Pragmas,,Pragmas Accepted by GCC}).  It has been
>>> @@ -3977,6 +3978,25 @@ addition to creating a symbol version (as if
>>>   @code{"@var{name2}@@@var{nodename}"} was used) the version will be
>>> also used
>>>   to resolve @var{name2} by the linker.
>>>   
>>> +@item tainted
>>> +@cindex @code{tainted} function attribute
>>> +The @code{tainted} attribute is used to specify that a function is
>>> called
>>> +in a way that requires sanitization of its arguments, such as a
>>> system
>>> +call in an operating system kernel.  Such a function can be
>>> considered part
>>> +of the ``attack surface'' of the program.  The attribute can be used
>>> both
>>> +on function declarations, and on field declarations containing
>>> function
>>> +pointers.  In the latter case, any function used as an initializer
>>> of
>>> +such a callback field will be treated as tainted.
>>> +
>>> +The analyzer will pay particular attention to such functions when
>>> both
>>> +@option{-fanalyzer} and @option{-fanalyzer-checker=taint} are
>>> supplied,
>>> +potentially issuing warnings guarded by
>>> +@option{-Wanalyzer-exposure-through-uninit-copy},
>>> +@option{-Wanalyzer-tainted-allocation-size},
>>> +@option{-Wanalyzer-tainted-array-index},
>>> +@option{-Wanalyzer-tainted-offset},
>>> +and @option{-Wanalyzer-tainted-size}.
>>> +
>>>   @item target_clones (@var{options})
>>>   @cindex @code{target_clones} function attribute
>>>   The @code{target_clones} attribute is used to specify that a
>>> function
>>> diff --git a/gcc/testsuite/gcc.dg/analyzer/attr-tainted-1.c
>>> b/gcc/testsuite/gcc.dg/analyzer/attr-tainted-1.c
>>> new file mode 100644
>>> index 00000000000..cc4d5900372
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/analyzer/attr-tainted-1.c
>>> @@ -0,0 +1,88 @@
>>> +// TODO: remove need for this option
>>> +/* { dg-additional-options "-fanalyzer-checker=taint" } */
>>> +
>>> +#include "analyzer-decls.h"
>>> +
>>> +struct arg_buf
>>> +{
>>> +  int i;
>>> +  int j;
>>> +};
>>> +
>>> +/* Example of marking a function as tainted.  */
>>> +
>>> +void __attribute__((tainted))
>>> +test_1 (int i, void *p, char *q)
>>> +{
>>> +  /* There should be a single enode,
>>> +     for the "tainted" entry to the function.  */
>>> +  __analyzer_dump_exploded_nodes (0); /* { dg-warning "1 processed enode" } */
>>> +
>>> +  __analyzer_dump_state ("taint", i); /* { dg-warning "state: 'tainted'" } */
>>> +  __analyzer_dump_state ("taint", p); /* { dg-warning "state: 'tainted'" } */
>>> +  __analyzer_dump_state ("taint", q); /* { dg-warning "state: 'tainted'" } */
>>> +  __analyzer_dump_state ("taint", *q); /* { dg-warning "state: 'tainted'" } */
>>> +
>>> +  struct arg_buf *args = p;
>>> +  __analyzer_dump_state ("taint", args->i); /* { dg-warning "state: 'tainted'" } */
>>> +  __analyzer_dump_state ("taint", args->j); /* { dg-warning "state: 'tainted'" } */
>>> +}
>>> +
>>> +/* Example of marking a callback field as tainted.  */
>>> +
>>> +struct s2
>>> +{
>>> +  void (*cb) (int, void *, char *)
>>> +    __attribute__((tainted));
>>> +};
>>> +
>>> +/* Function not marked as tainted.  */
>>> +
>>> +void
>>> +test_2a (int i, void *p, char *q)
>>> +{
>>> +  /* There should be a single enode,
>>> +     for the normal entry to the function.  */
>>> +  __analyzer_dump_exploded_nodes (0); /* { dg-warning "1 processed enode" } */
>>> +
>>> +  __analyzer_dump_state ("taint", i); /* { dg-warning "state: 'start'" } */
>>> +  __analyzer_dump_state ("taint", p); /* { dg-warning "state: 'start'" } */
>>> +  __analyzer_dump_state ("taint", q); /* { dg-warning "state: 'start'" } */
>>> +
>>> +  struct arg_buf *args = p;
>>> +  __analyzer_dump_state ("taint", args->i); /* { dg-warning "state: 'start'" } */
>>> +  __analyzer_dump_state ("taint", args->j); /* { dg-warning "state: 'start'" } */
>>> +}
>>> +
>>> +/* Function referenced via t2b.cb, marked as "tainted".  */
>>> +
>>> +void
>>> +test_2b (int i, void *p, char *q)
>>> +{
>>> +  /* There should be two enodes
>>> +     for the direct call, and the "tainted" entry to the function.  */
>>> +  __analyzer_dump_exploded_nodes (0); /* { dg-warning "2 processed enodes" } */
>>> +}
>>> +
>>> +/* Callback used via t2c.cb, marked as "tainted".  */
>>> +void
>>> +__analyzer_test_2c (int i, void *p, char *q)
>>> +{
>>> +  /* There should be a single enode,
>>> +     for the "tainted" entry to the function.  */
>>> +  __analyzer_dump_exploded_nodes (0); /* { dg-warning "1 processed enode" } */
>>> +
>>> +  __analyzer_dump_state ("taint", i); /* { dg-warning "state: 'tainted'" } */
>>> +  __analyzer_dump_state ("taint", p); /* { dg-warning "state: 'tainted'" } */
>>> +  __analyzer_dump_state ("taint", q); /* { dg-warning "state: 'tainted'" } */
>>> +}
>>> +
>>> +struct s2 t2b =
>>> +{
>>> +  .cb = test_2b
>>> +};
>>> +
>>> +struct s2 t2c =
>>> +{
>>> +  .cb = __analyzer_test_2c
>>> +};
>>> diff --git a/gcc/testsuite/gcc.dg/analyzer/attr-tainted-misuses.c
>>> b/gcc/testsuite/gcc.dg/analyzer/attr-tainted-misuses.c
>>> new file mode 100644
>>> index 00000000000..6f4cbc82efb
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/analyzer/attr-tainted-misuses.c
>>> @@ -0,0 +1,6 @@
>>> +int not_a_fn __attribute__ ((tainted)); /* { dg-warning "'tainted' attribute ignored; valid only for functions and function pointer fields" } */
>>> +
>>> +struct s
>>> +{
>>> +  int f __attribute__ ((tainted)); /* { dg-warning "'tainted' attribute ignored; field must be a function pointer" } */
>>> +};
>>> diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-2210-1.c
>>> b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-2210-1.c
>>> new file mode 100644
>>> index 00000000000..fe6c7ebbb1f
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-2210-1.c
>>> @@ -0,0 +1,93 @@
>>> +/* "The osf_getsysinfo function in arch/alpha/kernel/osf_sys.c in the
>>> +   Linux kernel before 2.6.39.4 on the Alpha platform does not properly
>>> +   restrict the data size for GSI_GET_HWRPB operations, which allows
>>> +   local users to obtain sensitive information from kernel memory via
>>> +   a crafted call."
>>> +
>>> +   Fixed in 3d0475119d8722798db5e88f26493f6547a4bb5b on linux-2.6.39.y
>>> +   in linux-stable.  */
>>> +
>>> +// TODO: remove need for this option:
>>> +/* { dg-additional-options "-fanalyzer-checker=taint" } */
>>> +
>>> +#include "analyzer-decls.h"
>>> +#include "test-uaccess.h"
>>> +
>>> +/* Adapted from include/linux/linkage.h.  */
>>> +
>>> +#define asmlinkage
>>> +
>>> +/* Adapted from include/linux/syscalls.h.  */
>>> +
>>> +#define __SC_DECL1(t1, a1)     t1 a1
>>> +#define __SC_DECL2(t2, a2, ...) t2 a2, __SC_DECL1(__VA_ARGS__)
>>> +#define __SC_DECL3(t3, a3, ...) t3 a3, __SC_DECL2(__VA_ARGS__)
>>> +#define __SC_DECL4(t4, a4, ...) t4 a4, __SC_DECL3(__VA_ARGS__)
>>> +#define __SC_DECL5(t5, a5, ...) t5 a5, __SC_DECL4(__VA_ARGS__)
>>> +#define __SC_DECL6(t6, a6, ...) t6 a6, __SC_DECL5(__VA_ARGS__)
>>> +
>>> +#define SYSCALL_DEFINEx(x, sname, ...)                         \
>>> +       __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
>>> +
>>> +#define SYSCALL_DEFINE(name) asmlinkage long sys_##name
>>> +#define __SYSCALL_DEFINEx(x, name, ...)                                        \
>>> +       asmlinkage __attribute__((tainted)) \
>>> +       long sys##name(__SC_DECL##x(__VA_ARGS__))
>>> +
>>> +#define SYSCALL_DEFINE5(name, ...) SYSCALL_DEFINEx(5, _##name, __VA_ARGS__)
>>> +
>>> +/* Adapted from arch/alpha/include/asm/hwrpb.h.  */
>>> +
>>> +struct hwrpb_struct {
>>> +       unsigned long phys_addr;        /* check: physical address of the hwrpb */
>>> +       unsigned long id;               /* check: "HWRPB\0\0\0" */
>>> +       unsigned long revision;
>>> +       unsigned long size;             /* size of hwrpb */
>>> +       /* [...snip...] */
>>> +};
>>> +
>>> +extern struct hwrpb_struct *hwrpb;
>>> +
>>> +/* Adapted from arch/alpha/kernel/osf_sys.c.  */
>>> +
>>> +SYSCALL_DEFINE5(osf_getsysinfo, unsigned long, op, void __user *, buffer,
>>> +               unsigned long, nbytes, int __user *, start, void __user *, arg)
>>> +{
>>> +       /* [...snip...] */
>>> +
>>> +       __analyzer_dump_state ("taint", nbytes);  /* { dg-warning "tainted" } */
>>> +
>>> +       /* TODO: should have an event explaining why "nbytes" is treated as
>>> +          attacker-controlled.  */
>>> +
>>> +       /* case GSI_GET_HWRPB: */
>>> +               if (nbytes < sizeof(*hwrpb))
>>> +                       return -1;
>>> +
>>> +               __analyzer_dump_state ("taint", nbytes);  /* { dg-warning "has_lb" } */
>>> +
>>> +               if (copy_to_user(buffer, hwrpb, nbytes) != 0) /* { dg-warning "use of attacker-controlled value 'nbytes' as size without upper-bounds checking" } */
>>> +                       return -2;
>>> +
>>> +               return 1;
>>> +
>>> +       /* [...snip...] */
>>> +}
>>> +
>>> +/* With the fix for the sense of the size comparison.  */
>>> +
>>> +SYSCALL_DEFINE5(osf_getsysinfo_fixed, unsigned long, op, void __user *, buffer,
>>> +               unsigned long, nbytes, int __user *, start, void __user *, arg)
>>> +{
>>> +       /* [...snip...] */
>>> +
>>> +       /* case GSI_GET_HWRPB: */
>>> +               if (nbytes > sizeof(*hwrpb))
>>> +                       return -1;
>>> +               if (copy_to_user(buffer, hwrpb, nbytes) != 0) /* { dg-bogus "attacker-controlled" } */
>>> +                       return -2;
>>> +
>>> +               return 1;
>>> +
>>> +       /* [...snip...] */
>>> +}
>>> diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-1.c
>>> b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-1.c
>>> new file mode 100644
>>> index 00000000000..0b9a94a8d6c
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-1.c
>>> @@ -0,0 +1,38 @@
>>> +/* See notes in this header.  */
>>> +#include "taint-CVE-2020-13143.h"
>>> +
>>> +// TODO: remove need for this option
>>> +/* { dg-additional-options "-fanalyzer-checker=taint" } */
>>> +
>>> +struct configfs_attribute {
>>> +       /* [...snip...] */
>>> +       ssize_t (*store)(struct config_item *, const char *, size_t) /* { dg-message "\\(1\\) field 'store' of 'struct configfs_attribute' is marked with '__attribute__\\(\\(tainted\\)\\)'" } */
>>> +               __attribute__((tainted)); /* (this is added).  */
>>> +};
>>> +static inline struct gadget_info *to_gadget_info(struct config_item *item)
>>> +{
>>> +        return container_of(to_config_group(item), struct gadget_info, group);
>>> +}
>>> +
>>> +static ssize_t gadget_dev_desc_UDC_store(struct config_item *item,
>>> +               const char *page, size_t len)
>>> +{
>>> +       struct gadget_info *gi = to_gadget_info(item);
>>> +       char *name;
>>> +       int ret;
>>> +
>>> +#if 0
>>> +       /* FIXME: this is the fix.  */
>>> +       if (strlen(page) < len)
>>> +               return -EOVERFLOW;
>>> +#endif
>>> +
>>> +       name = kstrdup(page, GFP_KERNEL);
>>> +       if (!name)
>>> +               return -ENOMEM;
>>> +       if (name[len - 1] == '\n') /* { dg-warning "use of attacker-controlled value 'len \[^\n\r\]+' as offset without upper-bounds checking" } */
>>> +               name[len - 1] = '\0'; /* { dg-warning "use of attacker-controlled value 'len \[^\n\r\]+' as offset without upper-bounds checking" } */
>>> +       /* [...snip...] */
>>> +}
>>> +
>>> +CONFIGFS_ATTR(gadget_dev_desc_, UDC); /* { dg-message "\\(2\\) function 'gadget_dev_desc_UDC_store' used as initializer for field 'store' marked with '__attribute__\\(\\(tainted\\)\\)'" } */
>>> diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-2.c
>>> b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-2.c
>>> new file mode 100644
>>> index 00000000000..e05da9276c1
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-2.c
>>> @@ -0,0 +1,32 @@
>>> +/* See notes in this header.  */
>>> +#include "taint-CVE-2020-13143.h"
>>> +
>>> +// TODO: remove need for this option
>>> +/* { dg-additional-options "-fanalyzer-checker=taint" } */
>>> +
>>> +struct configfs_attribute {
>>> +       /* [...snip...] */
>>> +       ssize_t (*store)(struct config_item *, const char *, size_t) /* { dg-message "\\(1\\) field 'store' of 'struct configfs_attribute' is marked with '__attribute__\\(\\(tainted\\)\\)'" } */
>>> +               __attribute__((tainted)); /* (this is added).  */
>>> +};
>>> +
>>> +/* Highly simplified version.  */
>>> +
>>> +static ssize_t gadget_dev_desc_UDC_store(struct config_item *item,
>>> +               const char *page, size_t len)
>>> +{
>>> +       /* TODO: ought to have state_change_event talking about where the tainted value comes from.  */
>>> +
>>> +       char *name;
>>> +       /* [...snip...] */
>>> +
>>> +       name = kstrdup(page, GFP_KERNEL);
>>> +       if (!name)
>>> +               return -ENOMEM;
>>> +       if (name[len - 1] == '\n') /* { dg-warning "use of attacker-controlled value 'len \[^\n\r\]+' as offset without upper-bounds checking" } */
>>> +               name[len - 1] = '\0';  /* { dg-warning "use of attacker-controlled value 'len \[^\n\r\]+' as offset without upper-bounds checking" } */
>>> +       /* [...snip...] */
>>> +       return 0;
>>> +}
>>> +
>>> +CONFIGFS_ATTR(gadget_dev_desc_, UDC); /* { dg-message "\\(2\\) function 'gadget_dev_desc_UDC_store' used as initializer for field 'store' marked with '__attribute__\\(\\(tainted\\)\\)'" } */
>>> diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143.h
>>> b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143.h
>>> new file mode 100644
>>> index 00000000000..0ba023539af
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143.h
>>> @@ -0,0 +1,91 @@
>>> +/* Shared header for the various taint-CVE-2020-13143-*.c tests.
>>> +
>>> +   "gadget_dev_desc_UDC_store in drivers/usb/gadget/configfs.c in the
>>> +   Linux kernel 3.16 through 5.6.13 relies on kstrdup without considering
>>> +   the possibility of an internal '\0' value, which allows attackers to
>>> +   trigger an out-of-bounds read, aka CID-15753588bcd4."
>>> +
>>> +   Fixed by 15753588bcd4bbffae1cca33c8ced5722477fe1f on linux-5.7.y
>>> +   in linux-stable.  */
>>> +
>>> +// TODO: remove need for this option
>>> +/* { dg-additional-options "-fanalyzer-checker=taint" } */
>>> +
>>> +#include <stddef.h>
>>> +
>>> +/* Adapted from include/uapi/asm-generic/posix_types.h  */
>>> +
>>> +typedef unsigned int     __kernel_size_t;
>>> +typedef int              __kernel_ssize_t;
>>> +
>>> +/* Adapted from include/linux/types.h  */
>>> +
>>> +//typedef __kernel_size_t              size_t;
>>> +typedef __kernel_ssize_t       ssize_t;
>>> +
>>> +/* Adapted from include/linux/kernel.h  */
>>> +
>>> +#define container_of(ptr, type, member) ({                             \
>>> +       void *__mptr = (void *)(ptr);                                   \
>>> +       /* [...snip...] */                                              \
>>> +       ((type *)(__mptr - offsetof(type, member))); })
>>> +
>>> +/* Adapted from include/linux/configfs.h  */
>>> +
>>> +struct config_item {
>>> +       /* [...snip...] */
>>> +};
>>> +
>>> +struct config_group {
>>> +       struct config_item              cg_item;
>>> +       /* [...snip...] */
>>> +};
>>> +
>>> +static inline struct config_group *to_config_group(struct config_item *item)
>>> +{
>>> +       return item ? container_of(item,struct config_group,cg_item) : NULL;
>>> +}
>>> +
>>> +#define CONFIGFS_ATTR(_pfx, _name)                             \
>>> +static struct configfs_attribute _pfx##attr_##_name = {        \
>>> +       /* [...snip...] */                              \
>>> +       .store          = _pfx##_name##_store,          \
>>> +}
>>> +
>>> +/* Adapted from include/linux/compiler.h  */
>>> +
>>> +#define __force
>>> +
>>> +/* Adapted from include/asm-generic/errno-base.h  */
>>> +
>>> +#define        ENOMEM          12      /* Out of memory */
>>> +
>>> +/* Adapted from include/linux/types.h  */
>>> +
>>> +#define __bitwise__
>>> +typedef unsigned __bitwise__ gfp_t;
>>> +
>>> +/* Adapted from include/linux/gfp.h  */
>>> +
>>> +#define ___GFP_WAIT            0x10u
>>> +#define ___GFP_IO              0x40u
>>> +#define ___GFP_FS              0x80u
>>> +#define __GFP_WAIT     ((__force gfp_t)___GFP_WAIT)
>>> +#define __GFP_IO       ((__force gfp_t)___GFP_IO)
>>> +#define __GFP_FS       ((__force gfp_t)___GFP_FS)
>>> +#define GFP_KERNEL  (__GFP_WAIT | __GFP_IO | __GFP_FS)
>>> +
>>> +/* Adapted from include/linux/compiler_attributes.h  */
>>> +
>>> +#define __malloc                        __attribute__((__malloc__))
>>> +
>>> +/* Adapted from include/linux/string.h  */
>>> +
>>> +extern char *kstrdup(const char *s, gfp_t gfp) __malloc;
>>> +
>>> +/* Adapted from drivers/usb/gadget/configfs.c  */
>>> +
>>> +struct gadget_info {
>>> +       struct config_group group;
>>> +       /* [...snip...] */
>>> +};
>>> diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c
>>> b/gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c
>>> new file mode 100644
>>> index 00000000000..4c567b2ffdf
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c
>>> @@ -0,0 +1,21 @@
>>> +// TODO: remove need for this option:
>>> +/* { dg-additional-options "-fanalyzer-checker=taint" } */
>>> +
>>> +#include "analyzer-decls.h"
>>> +#include <stdio.h>
>>> +#include <stdlib.h>
>>> +#include <string.h>
>>> +
>>> +/* malloc with tainted size from a syscall.  */
>>> +
>>> +void *p;
>>> +
>>> +void __attribute__((tainted))
>>> +test_1 (size_t sz) /* { dg-message "\\(1\\) function 'test_1' marked with '__attribute__\\(\\(tainted\\)\\)'" } */
>>> +{
>>> +  /* TODO: should have a message saying why "sz" is tainted, e.g.
>>> +     "treating 'sz' as attacker-controlled because 'test_1' is marked with '__attribute__((tainted))'"  */
>>> +
>>> +  p = malloc (sz); /* { dg-warning "use of attacker-controlled value 'sz' as allocation size without upper-bounds checking" "warning" } */
>>> +  /* { dg-message "\\(\[0-9\]+\\) use of attacker-controlled value 'sz' as allocation size without upper-bounds checking" "final event" { target *-*-* } .-1 } */
>>> +}
>>> diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c
>>> b/gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c
>>> new file mode 100644
>>> index 00000000000..f52cafcd71d
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c
>>> @@ -0,0 +1,31 @@
>>> +// TODO: remove need for this option:
>>> +/* { dg-additional-options "-fanalyzer-checker=taint" } */
>>> +
>>> +#include "analyzer-decls.h"
>>> +#include <stdio.h>
>>> +#include <stdlib.h>
>>> +#include <string.h>
>>> +
>>> +/* malloc with tainted size from a syscall.  */
>>> +
>>> +struct arg_buf
>>> +{
>>> +  size_t sz;
>>> +};
>>> +
>>> +void *p;
>>> +
>>> +void __attribute__((tainted))
>>> +test_1 (void *data) /* { dg-message "\\(1\\) function 'test_1' marked with '__attribute__\\(\\(tainted\\)\\)'" } */
>>> +{
>>> +  /* we should treat pointed-to-structs as tainted.  */
>>> +  __analyzer_dump_state ("taint", data); /* { dg-warning "state: 'tainted'" } */
>>> +
>>> +  struct arg_buf *args = data;
>>> +
>>> +  __analyzer_dump_state ("taint", args); /* { dg-warning "state: 'tainted'" } */
>>> +  __analyzer_dump_state ("taint", args->sz); /* { dg-warning "state: 'tainted'" } */
>>> +
>>> +  p = malloc (args->sz); /* { dg-warning "use of attacker-controlled value '\\*args.sz' as allocation size without upper-bounds checking" "warning" } */
>>> +  /* { dg-message "\\(\[0-9\]+\\) use of attacker-controlled value '\\*args.sz' as allocation size without upper-bounds checking" "final event" { target *-*-* } .-1 } */
>>> +}
>>
> 
> 



* Re: PING^2 (C/C++): Re: [PATCH 6/6] Add __attribute__ ((tainted))
  2022-01-12  4:36       ` Jason Merrill
@ 2022-01-12 15:33         ` David Malcolm
  2022-01-13 19:08           ` Jason Merrill
  0 siblings, 1 reply; 39+ messages in thread
From: David Malcolm @ 2022-01-12 15:33 UTC (permalink / raw)
  To: Jason Merrill, gcc-patches, linux-toolchains

On Tue, 2022-01-11 at 23:36 -0500, Jason Merrill wrote:
> On 1/10/22 16:36, David Malcolm via Gcc-patches wrote:
> > On Thu, 2022-01-06 at 09:08 -0500, David Malcolm wrote:
> > > On Sat, 2021-11-13 at 15:37 -0500, David Malcolm wrote:
> > > > This patch adds a new __attribute__ ((tainted)) to the C/C++
> > > > frontends.
> > > 
> > > Ping for GCC C/C++ maintainers for review of the C/C++ FE parts of
> > > this patch (attribute registration, documentation, the name of the
> > > attribute, etc).
> > > 
> > > (I believe it's independent of the rest of the patch kit, in that
> > > it could go into trunk without needing the prior patches)
> > > 
> > > Thanks
> > > Dave
> > 
> > Getting close to end of stage 3 for GCC 12, so pinging this patch
> > again...
> > 
> > https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584376.html
> 
> The c-family change is OK.

Thanks.

I'm retesting the patch now, but it now seems to me that
  __attribute__((tainted_args))
would lead to more readable code than:
  __attribute__((tainted))

in that the name "tainted_args" better conveys the idea that all
arguments are under attacker control (as opposed to the body of the
function or the function pointer being under attacker control).
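
As a rough sketch of how the renamed attribute might read at a use site
(assuming the rename goes ahead; TAINTED_ARGS, table and lookup are
hypothetical names for illustration, and the __has_attribute guard is
only there so that the snippet also builds with compilers that lack the
attribute):

```c
#include <stddef.h>

/* Hypothetical helper macro: expands to the attribute only when the
   compiler recognizes it, and to nothing otherwise.  */
#if defined(__has_attribute)
# if __has_attribute(tainted_args)
#  define TAINTED_ARGS __attribute__((tainted_args))
# endif
#endif
#ifndef TAINTED_ARGS
# define TAINTED_ARGS
#endif

static char table[16];

/* The attribute says that every argument is attacker-controlled on
   entry; it says nothing about the function body itself.  */
TAINTED_ARGS int
lookup (size_t idx)
{
  if (idx >= sizeof (table))   /* upper-bounds check on the tainted index */
    return -1;
  return table[idx];
}
```

With -fanalyzer and -fanalyzer-checker=taint, I would expect dropping
the bounds check here to provoke -Wanalyzer-tainted-array-index, though
I haven't verified that on this exact snippet.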

Looking at
  https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html
we already have some attributes with underscores in their names.

Does this sound good?
Dave

> 
> > Thanks
> > Dave
> > 
> > > 
> > > 
> > > > 
> > > > It can be used on function decls: the analyzer will treat as
> > > > tainted all parameters to the function and all buffers pointed to
> > > > by parameters to the function.  Adding this in one place to the
> > > > Linux kernel's __SYSCALL_DEFINEx macro allows the analyzer to
> > > > treat all syscalls as having tainted inputs.  This gives
> > > > additional testing beyond e.g. __user pointers added by earlier
> > > > patches - an example of the use of this can be seen in
> > > > CVE-2011-2210, where given:
> > > > 
> > > >   SYSCALL_DEFINE5(osf_getsysinfo, unsigned long, op, void __user *, buffer,
> > > >                   unsigned long, nbytes, int __user *, start, void __user *, arg)
> > > > 
> > > > the analyzer will treat the nbytes param as under attacker
> > > > control, and can complain accordingly:
> > > > 
> > > > taint-CVE-2011-2210-1.c: In function ‘sys_osf_getsysinfo’:
> > > > taint-CVE-2011-2210-1.c:69:21: warning: use of attacker-controlled value
> > > >    ‘nbytes’ as size without upper-bounds checking [CWE-129] [-Wanalyzer-tainted-size]
> > > >     69 |                 if (copy_to_user(buffer, hwrpb, nbytes) != 0)
> > > >        |                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > > 
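
To spell out the sense of the size comparison in userspace terms, here
is a minimal sketch (not the kernel code: model_hwrpb,
buggy_check_accepts and copy_hwrpb_fixed are hypothetical stand-ins,
and memcpy stands in for copy_to_user):

```c
#include <string.h>

/* Hypothetical stand-in for the kernel's hwrpb structure.  */
struct hwrpb_model
{
  unsigned long phys_addr;
  unsigned long id;
  unsigned long revision;
  unsigned long size;
};

static const struct hwrpb_model model_hwrpb
  = { 1, 2, 3, sizeof (struct hwrpb_model) };

/* The buggy sense of the check: it only establishes a lower bound, so
   any oversized request passes.  Returns 1 if NBYTES would be accepted.
   No copy is done here, to keep the sketch free of out-of-bounds reads.  */
int
buggy_check_accepts (unsigned long nbytes)
{
  if (nbytes < sizeof (model_hwrpb))
    return 0;
  return 1;
}

/* The fixed sense: NBYTES has an upper bound before it is used as a
   size.  Returns 1 on success and -1 if the request is rejected.  */
int
copy_hwrpb_fixed (void *dst, unsigned long nbytes)
{
  if (nbytes > sizeof (model_hwrpb))
    return -1;
  memcpy (dst, &model_hwrpb, nbytes);
  return 1;
}
```

The buggy variant leaves the value in the "has_lb" state that the test
case checks for, which is why the analyzer still complains at the copy.
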
> > > > Additionally, the patch allows the attribute to be used on field
> > > > decls: specifically function pointers.  Any function used as an
> > > > initializer for such a field gets treated as tainted.  An example
> > > > can be seen in CVE-2020-13143, where adding
> > > > __attribute__((tainted)) to the "store" callback of
> > > > configfs_attribute:
> > > > 
> > > >    struct configfs_attribute {
> > > >       /* [...snip...] */
> > > >       ssize_t (*store)(struct config_item *, const char *, size_t)
> > > >         __attribute__((tainted));
> > > >       /* [...snip...] */
> > > >    };
> > > > 
> > > > allows the analyzer to see:
> > > > 
> > > >   CONFIGFS_ATTR(gadget_dev_desc_, UDC);
> > > > 
> > > > and treat gadget_dev_desc_UDC_store as tainted, so that it
> > > > complains:
> > > > 
> > > > taint-CVE-2020-13143-1.c: In function ‘gadget_dev_desc_UDC_store’:
> > > > taint-CVE-2020-13143-1.c:33:17: warning: use of attacker-controlled value
> > > >    ‘len + 18446744073709551615’ as offset without upper-bounds checking [CWE-823] [-Wanalyzer-tainted-offset]
> > > >     33 |         if (name[len - 1] == '\n')
> > > >        |             ~~~~^~~~~~~~~
> > > > 
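
For a self-contained picture of the callback case, here is a userspace
sketch (attr_model, name_store and store_via_attr are hypothetical
names; malloc plus memcpy stands in for kstrdup, and the macro assumes
the "tainted_args" spelling, falling back to nothing on compilers that
don't know it).  It includes the strlen check from the fix, plus a
len == 0 check that is my own addition for the sketch:

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper macro, as an assumption about the final name.  */
#if defined(__has_attribute)
# if __has_attribute(tainted_args)
#  define TAINTED_ARGS __attribute__((tainted_args))
# endif
#endif
#ifndef TAINTED_ARGS
# define TAINTED_ARGS
#endif

/* Any function used to initialize "store" is treated as having
   attacker-controlled arguments.  */
struct attr_model
{
  long (*store) (const char *page, size_t len) TAINTED_ARGS;
};

/* Strip one trailing newline from a copy of PAGE.  Returns the length
   of the resulting name, or -1 on error.  */
static long
name_store (const char *page, size_t len)
{
  size_t n = strlen (page);
  char *name;
  long kept;

  /* Bound LEN by the real string length so that name[len - 1] stays
     inside the copy (the len == 0 test is an addition for this sketch).  */
  if (len == 0 || n < len)
    return -1;

  name = malloc (n + 1);
  if (!name)
    return -1;
  memcpy (name, page, n + 1);

  if (name[len - 1] == '\n')
    name[len - 1] = '\0';

  kept = (long) strlen (name);
  free (name);
  return kept;
}

static const struct attr_model name_attr = { .store = name_store };

/* Call through the annotated field, as a caller of the callback would.  */
long
store_via_attr (const char *page, size_t len)
{
  return name_attr.store (page, len);
}
```

Without the length check, a page containing an internal '\0' together
with a larger len is the out-of-bounds access that the warning above
is about.
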
> > > > Similarly, the attribute could be used on the ioctl callback field,
> > > > USB device callbacks, network-handling callbacks etc.  This
> > > > potentially gives a lot of test coverage with relatively little
> > > > code annotation, and without necessarily needing link-time analysis
> > > > (which -fanalyzer can only do at present on trivial examples).
> > > > 
> > > > I believe this is the first time we've had an attribute on a field.
> > > > If that's an issue, I could prepare a version of the patch that
> > > > merely allowed it on functions themselves.
> > > > 
> > > > As before this currently still needs -fanalyzer-checker=taint (in
> > > > addition to -fanalyzer).
> > > > 
> > > > gcc/analyzer/ChangeLog:
> > > >          * engine.cc: Include "stringpool.h", "attribs.h", and
> > > >          "tree-dfa.h".
> > > >          (mark_params_as_tainted): New.
> > > >          (class tainted_function_custom_event): New.
> > > >          (class tainted_function_info): New.
> > > >          (exploded_graph::add_function_entry): Handle functions with
> > > >          "tainted" attribute.
> > > >          (class tainted_field_custom_event): New.
> > > >          (class tainted_callback_custom_event): New.
> > > >          (class tainted_call_info): New.
> > > >          (add_tainted_callback): New.
> > > >          (add_any_callbacks): New.
> > > >          (exploded_graph::build_initial_worklist): Find callbacks that are
> > > >          reachable from global initializers, calling add_any_callbacks on
> > > >          them.
> > > > 
> > > > gcc/c-family/ChangeLog:
> > > >          * c-attribs.c (c_common_attribute_table): Add "tainted".
> > > >          (handle_tainted_attribute): New.
> > > > 
> > > > gcc/ChangeLog:
> > > >          * doc/extend.texi (Function Attributes): Note that "tainted" can
> > > >          be used on field decls.
> > > >          (Common Function Attributes): Add entry on "tainted" attribute.
> > > > 
> > > > gcc/testsuite/ChangeLog:
> > > >          * gcc.dg/analyzer/attr-tainted-1.c: New test.
> > > >          * gcc.dg/analyzer/attr-tainted-misuses.c: New test.
> > > >          * gcc.dg/analyzer/taint-CVE-2011-2210-1.c: New test.
> > > >          * gcc.dg/analyzer/taint-CVE-2020-13143-1.c: New test.
> > > >          * gcc.dg/analyzer/taint-CVE-2020-13143-2.c: New test.
> > > >          * gcc.dg/analyzer/taint-CVE-2020-13143.h: New test.
> > > >          * gcc.dg/analyzer/taint-alloc-3.c: New test.
> > > >          * gcc.dg/analyzer/taint-alloc-4.c: New test.
> > > > 
> > > > Signed-off-by: David Malcolm <dmalcolm@redhat.com>
> > > > ---
> > > >   gcc/analyzer/engine.cc                        | 317 +++++++++++++++++-
> > > >   gcc/c-family/c-attribs.c                      |  36 ++
> > > >   gcc/doc/extend.texi                           |  22 +-
> > > >   .../gcc.dg/analyzer/attr-tainted-1.c          |  88 +++++
> > > >   .../gcc.dg/analyzer/attr-tainted-misuses.c    |   6 +
> > > >   .../gcc.dg/analyzer/taint-CVE-2011-2210-1.c   |  93 +++++
> > > >   .../gcc.dg/analyzer/taint-CVE-2020-13143-1.c  |  38 +++
> > > >   .../gcc.dg/analyzer/taint-CVE-2020-13143-2.c  |  32 ++
> > > >   .../gcc.dg/analyzer/taint-CVE-2020-13143.h    |  91 +++++
> > > >   gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c |  21 ++
> > > >   gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c |  31 ++
> > > >   11 files changed, 772 insertions(+), 3 deletions(-)
> > > >   create mode 100644 gcc/testsuite/gcc.dg/analyzer/attr-tainted-1.c
> > > >   create mode 100644 gcc/testsuite/gcc.dg/analyzer/attr-tainted-misuses.c
> > > >   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-2210-1.c
> > > >   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-1.c
> > > >   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-2.c
> > > >   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143.h
> > > >   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c
> > > >   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c
> > > > 
> > > > diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
> > > > index 096e219392d..5fab41daf93 100644
> > > > --- a/gcc/analyzer/engine.cc
> > > > +++ b/gcc/analyzer/engine.cc
> > > > @@ -68,6 +68,9 @@ along with GCC; see the file COPYING3.  If not see
> > > >   #include "plugin.h"
> > > >   #include "target.h"
> > > >   #include <memory>
> > > > +#include "stringpool.h"
> > > > +#include "attribs.h"
> > > > +#include "tree-dfa.h"
> > > >   
> > > >   /* For an overview, see gcc/doc/analyzer.texi.  */
> > > >   
> > > > @@ -2276,6 +2279,116 @@ exploded_graph::~exploded_graph ()
> > > >       delete (*iter).second;
> > > >   }
> > > >   
> > > > +/* Subroutine for use when implementing __attribute__((tainted))
> > > > +   on functions and on function pointer fields in structs.
> > > > +
> > > > +   Called on STATE representing a call to FNDECL.
> > > > +   Mark all params of FNDECL in STATE as "tainted".  Mark the value of all
> > > > +   regions pointed to by params of FNDECL as "tainted".
> > > > +
> > > > +   Return true if successful; return false if the "taint" state machine
> > > > +   was not found.  */
> > > > +
> > > > +static bool
> > > > +mark_params_as_tainted (program_state *state, tree fndecl,
> > > > +                       const extrinsic_state &ext_state)
> > > > +{
> > > > +  unsigned taint_sm_idx;
> > > > +  if (!ext_state.get_sm_idx_by_name ("taint", &taint_sm_idx))
> > > > +    return false;
> > > > +  sm_state_map *smap = state->m_checker_states[taint_sm_idx];
> > > > +
> > > > +  const state_machine &sm = ext_state.get_sm (taint_sm_idx);
> > > > +  state_machine::state_t tainted = sm.get_state_by_name ("tainted");
> > > > +
> > > > +  region_model_manager *mgr = ext_state.get_model_manager ();
> > > > +
> > > > +  function *fun = DECL_STRUCT_FUNCTION (fndecl);
> > > > +  gcc_assert (fun);
> > > > +
> > > > +  for (tree iter_parm = DECL_ARGUMENTS (fndecl); iter_parm;
> > > > +       iter_parm = DECL_CHAIN (iter_parm))
> > > > +    {
> > > > +      tree param = iter_parm;
> > > > +      if (tree parm_default_ssa = ssa_default_def (fun, iter_parm))
> > > > +       param = parm_default_ssa;
> > > > +      const region *param_reg = state->m_region_model->get_lvalue (param, NULL);
> > > > +      const svalue *init_sval = mgr->get_or_create_initial_value (param_reg);
> > > > +      smap->set_state (state->m_region_model, init_sval,
> > > > +                      tainted, NULL /*origin_new_sval*/, ext_state);
> > > > +      if (POINTER_TYPE_P (TREE_TYPE (param)))
> > > > +       {
> > > > +         const region *pointee_reg = mgr->get_symbolic_region (init_sval);
> > > > +         /* Mark "*param" as tainted.  */
> > > > +         const svalue *init_pointee_sval
> > > > +           = mgr->get_or_create_initial_value (pointee_reg);
> > > > +         smap->set_state (state->m_region_model, init_pointee_sval,
> > > > +                          tainted, NULL /*origin_new_sval*/, ext_state);
> > > > +       }
> > > > +    }
> > > > +
> > > > +  return true;
> > > > +}
> > > > +
> > > > +/* Custom event for use by tainted_function_info when a function
> > > > +   has been marked with __attribute__((tainted)).  */
> > > > +
> > > > +class tainted_function_custom_event : public custom_event
> > > > +{
> > > > +public:
> > > > +  tainted_function_custom_event (location_t loc, tree fndecl, int depth)
> > > > +  : custom_event (loc, fndecl, depth),
> > > > +    m_fndecl (fndecl)
> > > > +  {
> > > > +  }
> > > > +
> > > > +  label_text get_desc (bool can_colorize) const FINAL OVERRIDE
> > > > +  {
> > > > +    return make_label_text
> > > > +      (can_colorize,
> > > > +       "function %qE marked with %<__attribute__((tainted))%>",
> > > > +       m_fndecl);
> > > > +  }
> > > > +
> > > > +private:
> > > > +  tree m_fndecl;
> > > > +};
> > > > +
> > > > +/* Custom exploded_edge info for top-level calls to a function
> > > > +   marked with __attribute__((tainted)).  */
> > > > +
> > > > +class tainted_function_info : public custom_edge_info
> > > > +{
> > > > +public:
> > > > +  tainted_function_info (tree fndecl)
> > > > +  : m_fndecl (fndecl)
> > > > +  {}
> > > > +
> > > > +  void print (pretty_printer *pp) const FINAL OVERRIDE
> > > > +  {
> > > > +    pp_string (pp, "call to tainted function");
> > > > +  }
> > > > +
> > > > +  bool update_model (region_model *,
> > > > +                    const exploded_edge *,
> > > > +                    region_model_context *) const FINAL OVERRIDE
> > > > +  {
> > > > +    /* No-op.  */
> > > > +    return true;
> > > > +  }
> > > > +
> > > > +  void add_events_to_path (checker_path *emission_path,
> > > > +                          const exploded_edge &) const FINAL
> > > > OVERRIDE
> > > > +  {
> > > > +    emission_path->add_event
> > > > +      (new tainted_function_custom_event
> > > > +       (DECL_SOURCE_LOCATION (m_fndecl), m_fndecl, 0));
> > > > +  }
> > > > +
> > > > +private:
> > > > +  tree m_fndecl;
> > > > +};
> > > > +
> > > >   /* Ensure that there is an exploded_node representing an
> > > > external
> > > > call to
> > > >      FUN, adding it to the worklist if creating it.
> > > >   
> > > > @@ -2302,14 +2415,25 @@ exploded_graph::add_function_entry
> > > > (function
> > > > *fun)
> > > >     program_state state (m_ext_state);
> > > >     state.push_frame (m_ext_state, fun);
> > > >   
> > > > +  custom_edge_info *edge_info = NULL;
> > > > +
> > > > +  if (lookup_attribute ("tainted", DECL_ATTRIBUTES (fun->decl)))
> > > > +    {
> > > > +      if (mark_params_as_tainted (&state, fun->decl,
> > > > m_ext_state))
> > > > +       edge_info = new tainted_function_info (fun->decl);
> > > > +    }
> > > > +
> > > >     if (!state.m_valid)
> > > >       return NULL;
> > > >   
> > > >     exploded_node *enode = get_or_create_node (point, state,
> > > > NULL);
> > > >     if (!enode)
> > > > -    return NULL;
> > > > +    {
> > > > +      delete edge_info;
> > > > +      return NULL;
> > > > +    }
> > > >   
> > > > -  add_edge (m_origin, enode, NULL);
> > > > +  add_edge (m_origin, enode, NULL, edge_info);
> > > >   
> > > >     m_functions_with_enodes.add (fun);
> > > >   
> > > > @@ -2623,6 +2747,184 @@ toplevel_function_p (function *fun,
> > > > logger
> > > > *logger)
> > > >     return true;
> > > >   }
> > > >   
> > > > +/* Custom event for use by tainted_call_info when a callback
> > > > field
> > > > has been
> > > > +   marked with __attribute__((tainted)), for labelling the
> > > > field.
> > > > */
> > > > +
> > > > +class tainted_field_custom_event : public custom_event
> > > > +{
> > > > +public:
> > > > +  tainted_field_custom_event (tree field)
> > > > +  : custom_event (DECL_SOURCE_LOCATION (field), NULL_TREE, 0),
> > > > +    m_field (field)
> > > > +  {
> > > > +  }
> > > > +
> > > > +  label_text get_desc (bool can_colorize) const FINAL OVERRIDE
> > > > +  {
> > > > +    return make_label_text (can_colorize,
> > > > +                           "field %qE of %qT"
> > > > +                           " is marked with
> > > > %<__attribute__((tainted))%>",
> > > > +                           m_field, DECL_CONTEXT (m_field));
> > > > +  }
> > > > +
> > > > +private:
> > > > +  tree m_field;
> > > > +};
> > > > +
> > > > +/* Custom event for use by tainted_call_info when a callback
> > > > field
> > > > has been
> > > > +   marked with __attribute__((tainted)), for labelling the
> > > > function
> > > > used
> > > > +   in that callback.  */
> > > > +
> > > > +class tainted_callback_custom_event : public custom_event
> > > > +{
> > > > +public:
> > > > +  tainted_callback_custom_event (location_t loc, tree fndecl,
> > > > int
> > > > depth,
> > > > +                                tree field)
> > > > +  : custom_event (loc, fndecl, depth),
> > > > +    m_field (field)
> > > > +  {
> > > > +  }
> > > > +
> > > > +  label_text get_desc (bool can_colorize) const FINAL OVERRIDE
> > > > +  {
> > > > +    return make_label_text (can_colorize,
> > > > +                           "function %qE used as initializer for
> > > > field %qE"
> > > > +                           " marked with
> > > > %<__attribute__((tainted))%>",
> > > > +                           m_fndecl, m_field);
> > > > +  }
> > > > +
> > > > +private:
> > > > +  tree m_field;
> > > > +};
> > > > +
> > > > +/* Custom edge info for use when adding a function used by a
> > > > callback field
> > > > +   marked with '__attribute__((tainted))'.   */
> > > > +
> > > > +class tainted_call_info : public custom_edge_info
> > > > +{
> > > > +public:
> > > > +  tainted_call_info (tree field, tree fndecl, location_t loc)
> > > > +  : m_field (field), m_fndecl (fndecl), m_loc (loc)
> > > > +  {}
> > > > +
> > > > +  void print (pretty_printer *pp) const FINAL OVERRIDE
> > > > +  {
> > > > +    pp_string (pp, "call to tainted field");
> > > > +  };
> > > > +
> > > > +  bool update_model (region_model *,
> > > > +                    const exploded_edge *,
> > > > +                    region_model_context *) const FINAL OVERRIDE
> > > > +  {
> > > > +    /* No-op.  */
> > > > +    return true;
> > > > +  }
> > > > +
> > > > +  void add_events_to_path (checker_path *emission_path,
> > > > +                          const exploded_edge &) const FINAL
> > > > OVERRIDE
> > > > +  {
> > > > +    /* Show the field in the struct declaration
> > > > +       e.g. "(1) field 'store' is marked with
> > > > '__attribute__((tainted))'"  */
> > > > +    emission_path->add_event
> > > > +      (new tainted_field_custom_event (m_field));
> > > > +
> > > > +    /* Show the callback in the initializer
> > > > +       e.g.
> > > > +       "(2) function 'gadget_dev_desc_UDC_store' used as
> > > > initializer
> > > > +       for field 'store' marked with
> > > > '__attribute__((tainted))'".
> > > > */
> > > > +    emission_path->add_event
> > > > +      (new tainted_callback_custom_event (m_loc, m_fndecl, 0,
> > > > m_field));
> > > > +  }
> > > > +
> > > > +private:
> > > > +  tree m_field;
> > > > +  tree m_fndecl;
> > > > +  location_t m_loc;
> > > > +};
> > > > +
> > > > +/* Given an initializer at LOC for FIELD marked with
> > > > '__attribute__((tainted))'
> > > > +   initialized with FNDECL, add an entrypoint to FNDECL to EG
> > > > (and
> > > > to its
> > > > +   worklist) where the params to FNDECL are marked as tainted. 
> > > > */
> > > > +
> > > > +static void
> > > > +add_tainted_callback (exploded_graph *eg, tree field, tree
> > > > fndecl,
> > > > +                     location_t loc)
> > > > +{
> > > > +  logger *logger = eg->get_logger ();
> > > > +
> > > > +  LOG_SCOPE (logger);
> > > > +
> > > > +  if (!gimple_has_body_p (fndecl))
> > > > +    return;
> > > > +
> > > > +  const extrinsic_state &ext_state = eg->get_ext_state ();
> > > > +
> > > > +  function *fun = DECL_STRUCT_FUNCTION (fndecl);
> > > > +  gcc_assert (fun);
> > > > +
> > > > +  program_point point
> > > > +    = program_point::from_function_entry (eg->get_supergraph (),
> > > > fun);
> > > > +  program_state state (ext_state);
> > > > +  state.push_frame (ext_state, fun);
> > > > +
> > > > +  if (!mark_params_as_tainted (&state, fndecl, ext_state))
> > > > +    return;
> > > > +
> > > > +  if (!state.m_valid)
> > > > +    return;
> > > > +
> > > > > +  exploded_node *enode = eg->get_or_create_node (point, state, NULL);
> > > > > +  if (!enode)
> > > > > +    {
> > > > > +      /* Bail out whether or not logging is enabled.  */
> > > > > +      if (logger)
> > > > > +       logger->log ("did not create enode for tainted %qE entrypoint",
> > > > > +                    fndecl);
> > > > > +      return;
> > > > > +    }
> > > > > +
> > > > > +  if (logger)
> > > > > +    logger->log ("created EN %i for tainted %qE entrypoint",
> > > > > +                enode->m_index, fndecl);
> > > > +
> > > > +  tainted_call_info *info = new tainted_call_info (field,
> > > > fndecl,
> > > > loc);
> > > > +  eg->add_edge (eg->get_origin (), enode, NULL, info);
> > > > +}
> > > > +
> > > > +/* Callback for walk_tree for finding callbacks within
> > > > initializers;
> > > > +   ensure that any callback initializer where the corresponding
> > > > field is
> > > > +   marked with '__attribute__((tainted))' is treated as an
> > > > entrypoint to the
> > > > +   analysis, special-casing that the inputs to the callback are
> > > > +   untrustworthy.  */
> > > > +
> > > > +static tree
> > > > +add_any_callbacks (tree *tp, int *, void *data)
> > > > +{
> > > > +  exploded_graph *eg = (exploded_graph *)data;
> > > > +  if (TREE_CODE (*tp) == CONSTRUCTOR)
> > > > +    {
> > > > +      /* Find fields with the "tainted" attribute.
> > > > +        walk_tree only walks the values, not the index values;
> > > > +        look at the index values.  */
> > > > +      unsigned HOST_WIDE_INT idx;
> > > > +      constructor_elt *ce;
> > > > +
> > > > +      for (idx = 0; vec_safe_iterate (CONSTRUCTOR_ELTS (*tp),
> > > > idx,
> > > > &ce);
> > > > +          idx++)
> > > > +       if (ce->index && TREE_CODE (ce->index) == FIELD_DECL)
> > > > > +         if (lookup_attribute ("tainted", DECL_ATTRIBUTES (ce->index)))
> > > > +           {
> > > > +             tree value = ce->value;
> > > > +             if (TREE_CODE (value) == ADDR_EXPR
> > > > +                 && TREE_CODE (TREE_OPERAND (value, 0)) ==
> > > > FUNCTION_DECL)
> > > > +               add_tainted_callback (eg, ce->index, TREE_OPERAND
> > > > (value, 0),
> > > > +                                     EXPR_LOCATION (value));
> > > > +           }
> > > > +    }
> > > > +
> > > > +  return NULL_TREE;
> > > > +}
> > > > +
> > > > >   /* Add initial nodes to EG, with entrypoints for externally-callable
> > > > >      functions.  */
> > > >   
> > > > @@ -2648,6 +2950,17 @@ exploded_graph::build_initial_worklist ()
> > > >            logger->log ("did not create enode for %qE
> > > > entrypoint",
> > > > fun->decl);
> > > >         }
> > > >     }
> > > > +
> > > > +  /* Find callbacks that are reachable from global
> > > > initializers.  */
> > > > +  varpool_node *vpnode;
> > > > +  FOR_EACH_VARIABLE (vpnode)
> > > > +    {
> > > > +      tree decl = vpnode->decl;
> > > > +      tree init = DECL_INITIAL (decl);
> > > > +      if (!init)
> > > > +       continue;
> > > > +      walk_tree (&init, add_any_callbacks, this, NULL);
> > > > +    }
> > > >   }
> > > >   
> > > >   /* The main loop of the analysis.
> > > > diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
> > > > index 9e03156de5e..835ba6e0e8c 100644
> > > > --- a/gcc/c-family/c-attribs.c
> > > > +++ b/gcc/c-family/c-attribs.c
> > > > @@ -117,6 +117,7 @@ static tree
> > > > handle_no_profile_instrument_function_attribute (tree *, tree,
> > > >                                                              
> > > > tree,
> > > > int, bool *);
> > > >   static tree handle_malloc_attribute (tree *, tree, tree, int,
> > > > bool
> > > > *);
> > > >   static tree handle_dealloc_attribute (tree *, tree, tree, int,
> > > > bool
> > > > *);
> > > > +static tree handle_tainted_attribute (tree *, tree, tree, int,
> > > > bool
> > > > *);
> > > >   static tree handle_returns_twice_attribute (tree *, tree, tree,
> > > > int,
> > > > bool *);
> > > >   static tree handle_no_limit_stack_attribute (tree *, tree,
> > > > tree,
> > > > int,
> > > >                                               bool *);
> > > > @@ -569,6 +570,8 @@ const struct attribute_spec
> > > > c_common_attribute_table[] =
> > > >                                handle_objc_nullability_attribute,
> > > > NULL
> > > > },
> > > >     { "*dealloc",                1, 2, true, false, false, false,
> > > >                                handle_dealloc_attribute, NULL },
> > > > +  { "tainted",               0, 0, true,  false, false, false,
> > > > +                             handle_tainted_attribute, NULL },
> > > >     { NULL,                     0, 0, false, false, false, false,
> > > > NULL, NULL }
> > > >   };
> > > >   
> > > > @@ -5857,6 +5860,39 @@ handle_objc_nullability_attribute (tree
> > > > *node,
> > > > tree name, tree args,
> > > >     return NULL_TREE;
> > > >   }
> > > >   
> > > > +/* Handle a "tainted" attribute; arguments as in
> > > > +   struct attribute_spec.handler.  */
> > > > +
> > > > +static tree
> > > > +handle_tainted_attribute (tree *node, tree name, tree, int,
> > > > +                         bool *no_add_attrs)
> > > > +{
> > > > +  if (TREE_CODE (*node) != FUNCTION_DECL
> > > > +      && TREE_CODE (*node) != FIELD_DECL)
> > > > +    {
> > > > +      warning (OPT_Wattributes, "%qE attribute ignored; valid
> > > > only "
> > > > +              "for functions and function pointer fields",
> > > > +              name);
> > > > +      *no_add_attrs = true;
> > > > +      return NULL_TREE;
> > > > +    }
> > > > +
> > > > +  if (TREE_CODE (*node) == FIELD_DECL
> > > > +      && !(TREE_CODE (TREE_TYPE (*node)) == POINTER_TYPE
> > > > +          && TREE_CODE (TREE_TYPE (TREE_TYPE (*node))) ==
> > > > FUNCTION_TYPE))
> > > > +    {
> > > > +      warning (OPT_Wattributes, "%qE attribute ignored;"
> > > > +              " field must be a function pointer",
> > > > +              name);
> > > > +      *no_add_attrs = true;
> > > > +      return NULL_TREE;
> > > > +    }
> > > > +
> > > > +  *no_add_attrs = false; /* OK */
> > > > +
> > > > +  return NULL_TREE;
> > > > +}
> > > > +
> > > >   /* Attempt to partially validate a single attribute ATTR as if
> > > >      it were to be applied to an entity OPER.  */
> > > >   
> > > > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> > > > index 5a6ef464779..826bbd48e7e 100644
> > > > --- a/gcc/doc/extend.texi
> > > > +++ b/gcc/doc/extend.texi
> > > > @@ -2465,7 +2465,8 @@ variable declarations (@pxref{Variable
> > > > Attributes}),
> > > >   labels (@pxref{Label Attributes}),
> > > >   enumerators (@pxref{Enumerator Attributes}),
> > > >   statements (@pxref{Statement Attributes}),
> > > > -and types (@pxref{Type Attributes}).
> > > > +types (@pxref{Type Attributes}),
> > > > +and on field declarations (for @code{tainted}).
> > > >   
> > > >   There is some overlap between the purposes of attributes and
> > > > pragmas
> > > >   (@pxref{Pragmas,,Pragmas Accepted by GCC}).  It has been
> > > > @@ -3977,6 +3978,25 @@ addition to creating a symbol version (as
> > > > if
> > > >   @code{"@var{name2}@@@var{nodename}"} was used) the version will
> > > > be
> > > > also used
> > > >   to resolve @var{name2} by the linker.
> > > >   
> > > > +@item tainted
> > > > +@cindex @code{tainted} function attribute
> > > > +The @code{tainted} attribute is used to specify that a function
> > > > is
> > > > called
> > > > +in a way that requires sanitization of its arguments, such as a
> > > > system
> > > > +call in an operating system kernel.  Such a function can be
> > > > considered part
> > > > +of the ``attack surface'' of the program.  The attribute can be
> > > > used
> > > > both
> > > > +on function declarations, and on field declarations containing
> > > > function
> > > > +pointers.  In the latter case, any function used as an
> > > > initializer
> > > > of
> > > > +such a callback field will be treated as tainted.
> > > > +
> > > > +The analyzer will pay particular attention to such functions
> > > > when
> > > > both
> > > > +@option{-fanalyzer} and @option{-fanalyzer-checker=taint} are
> > > > supplied,
> > > > +potentially issuing warnings guarded by
> > > > +@option{-Wanalyzer-exposure-through-uninit-copy},
> > > > +@option{-Wanalyzer-tainted-allocation-size},
> > > > +@option{-Wanalyzer-tainted-array-index},
> > > > > +@option{-Wanalyzer-tainted-offset},
> > > > > +and @option{-Wanalyzer-tainted-size}.
> > > > +
> > > >   @item target_clones (@var{options})
> > > >   @cindex @code{target_clones} function attribute
> > > >   The @code{target_clones} attribute is used to specify that a
> > > > function
> > > > diff --git a/gcc/testsuite/gcc.dg/analyzer/attr-tainted-1.c
> > > > b/gcc/testsuite/gcc.dg/analyzer/attr-tainted-1.c
> > > > new file mode 100644
> > > > index 00000000000..cc4d5900372
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/analyzer/attr-tainted-1.c
> > > > @@ -0,0 +1,88 @@
> > > > +// TODO: remove need for this option
> > > > +/* { dg-additional-options "-fanalyzer-checker=taint" } */
> > > > +
> > > > +#include "analyzer-decls.h"
> > > > +
> > > > +struct arg_buf
> > > > +{
> > > > +  int i;
> > > > +  int j;
> > > > +};
> > > > +
> > > > +/* Example of marking a function as tainted.  */
> > > > +
> > > > +void __attribute__((tainted))
> > > > +test_1 (int i, void *p, char *q)
> > > > +{
> > > > +  /* There should be a single enode,
> > > > +     for the "tainted" entry to the function.  */
> > > > +  __analyzer_dump_exploded_nodes (0); /* { dg-warning "1
> > > > processed
> > > > enode" } */
> > > > +
> > > > +  __analyzer_dump_state ("taint", i); /* { dg-warning "state:
> > > > 'tainted'" } */
> > > > +  __analyzer_dump_state ("taint", p); /* { dg-warning "state:
> > > > 'tainted'" } */
> > > > +  __analyzer_dump_state ("taint", q); /* { dg-warning "state:
> > > > 'tainted'" } */
> > > > +  __analyzer_dump_state ("taint", *q); /* { dg-warning "state:
> > > > 'tainted'" } */
> > > > +
> > > > +  struct arg_buf *args = p;
> > > > +  __analyzer_dump_state ("taint", args->i); /* { dg-warning
> > > > "state:
> > > > 'tainted'" } */
> > > > +  __analyzer_dump_state ("taint", args->j); /* { dg-warning
> > > > "state:
> > > > 'tainted'" } */
> > > > +}
> > > > +
> > > > +/* Example of marking a callback field as tainted.  */
> > > > +
> > > > +struct s2
> > > > +{
> > > > +  void (*cb) (int, void *, char *)
> > > > +    __attribute__((tainted));
> > > > +};
> > > > +
> > > > +/* Function not marked as tainted.  */
> > > > +
> > > > +void
> > > > +test_2a (int i, void *p, char *q)
> > > > +{
> > > > +  /* There should be a single enode,
> > > > +     for the normal entry to the function.  */
> > > > +  __analyzer_dump_exploded_nodes (0); /* { dg-warning "1
> > > > processed
> > > > enode" } */
> > > > +
> > > > +  __analyzer_dump_state ("taint", i); /* { dg-warning "state:
> > > > 'start'" } */
> > > > +  __analyzer_dump_state ("taint", p); /* { dg-warning "state:
> > > > 'start'" } */
> > > > +  __analyzer_dump_state ("taint", q); /* { dg-warning "state:
> > > > 'start'" } */
> > > > +
> > > > +  struct arg_buf *args = p;
> > > > +  __analyzer_dump_state ("taint", args->i); /* { dg-warning
> > > > "state:
> > > > 'start'" } */
> > > > +  __analyzer_dump_state ("taint", args->j); /* { dg-warning
> > > > "state:
> > > > 'start'" } */
> > > > +}
> > > > +
> > > > +/* Function referenced via t2b.cb, marked as "tainted".  */
> > > > +
> > > > +void
> > > > +test_2b (int i, void *p, char *q)
> > > > +{
> > > > +  /* There should be two enodes
> > > > +     for the direct call, and the "tainted" entry to the
> > > > function.
> > > > */
> > > > +  __analyzer_dump_exploded_nodes (0); /* { dg-warning "2
> > > > processed
> > > > enodes" } */
> > > > +}
> > > > +
> > > > +/* Callback used via t2c.cb, marked as "tainted".  */
> > > > +void
> > > > +__analyzer_test_2c (int i, void *p, char *q)
> > > > +{
> > > > +  /* There should be a single enode,
> > > > +     for the "tainted" entry to the function.  */
> > > > +  __analyzer_dump_exploded_nodes (0); /* { dg-warning "1
> > > > processed
> > > > enode" } */
> > > > +
> > > > +  __analyzer_dump_state ("taint", i); /* { dg-warning "state:
> > > > 'tainted'" } */
> > > > +  __analyzer_dump_state ("taint", p); /* { dg-warning "state:
> > > > 'tainted'" } */
> > > > +  __analyzer_dump_state ("taint", q); /* { dg-warning "state:
> > > > 'tainted'" } */
> > > > +}
> > > > +
> > > > +struct s2 t2b =
> > > > +{
> > > > +  .cb = test_2b
> > > > +};
> > > > +
> > > > +struct s2 t2c =
> > > > +{
> > > > +  .cb = __analyzer_test_2c
> > > > +};
> > > > diff --git a/gcc/testsuite/gcc.dg/analyzer/attr-tainted-misuses.c
> > > > b/gcc/testsuite/gcc.dg/analyzer/attr-tainted-misuses.c
> > > > new file mode 100644
> > > > index 00000000000..6f4cbc82efb
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/analyzer/attr-tainted-misuses.c
> > > > @@ -0,0 +1,6 @@
> > > > +int not_a_fn __attribute__ ((tainted)); /* { dg-warning
> > > > "'tainted'
> > > > attribute ignored; valid only for functions and function pointer
> > > > fields" } */
> > > > +
> > > > +struct s
> > > > +{
> > > > +  int f __attribute__ ((tainted)); /* { dg-warning "'tainted'
> > > > attribute ignored; field must be a function pointer" } */
> > > > +};
> > > > > diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-2210-1.c b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-2210-1.c
> > > > new file mode 100644
> > > > index 00000000000..fe6c7ebbb1f
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-2210-1.c
> > > > @@ -0,0 +1,93 @@
> > > > +/* "The osf_getsysinfo function in arch/alpha/kernel/osf_sys.c
> > > > in
> > > > the
> > > > +   Linux kernel before 2.6.39.4 on the Alpha platform does not
> > > > properly
> > > > +   restrict the data size for GSI_GET_HWRPB operations, which
> > > > allows
> > > > +   local users to obtain sensitive information from kernel
> > > > memory
> > > > via
> > > > +   a crafted call."
> > > > +
> > > > > +   Fixed in 3d0475119d8722798db5e88f26493f6547a4bb5b on linux-2.6.39.y
> > > > > +   in linux-stable.  */
> > > > +
> > > > +// TODO: remove need for this option:
> > > > +/* { dg-additional-options "-fanalyzer-checker=taint" } */
> > > > +
> > > > +#include "analyzer-decls.h"
> > > > +#include "test-uaccess.h"
> > > > +
> > > > +/* Adapted from include/linux/linkage.h.  */
> > > > +
> > > > +#define asmlinkage
> > > > +
> > > > +/* Adapted from include/linux/syscalls.h.  */
> > > > +
> > > > +#define __SC_DECL1(t1, a1)     t1 a1
> > > > +#define __SC_DECL2(t2, a2, ...) t2 a2, __SC_DECL1(__VA_ARGS__)
> > > > +#define __SC_DECL3(t3, a3, ...) t3 a3, __SC_DECL2(__VA_ARGS__)
> > > > +#define __SC_DECL4(t4, a4, ...) t4 a4, __SC_DECL3(__VA_ARGS__)
> > > > +#define __SC_DECL5(t5, a5, ...) t5 a5, __SC_DECL4(__VA_ARGS__)
> > > > +#define __SC_DECL6(t6, a6, ...) t6 a6, __SC_DECL5(__VA_ARGS__)
> > > > +
> > > > +#define SYSCALL_DEFINEx(x, sname, ...)                         \
> > > > +       __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
> > > > +
> > > > +#define SYSCALL_DEFINE(name) asmlinkage long sys_##name
> > > > +#define __SYSCALL_DEFINEx(x, name,
> > > > ...)                                        \
> > > > +       asmlinkage __attribute__((tainted)) \
> > > > +       long sys##name(__SC_DECL##x(__VA_ARGS__))
> > > > +
> > > > +#define SYSCALL_DEFINE5(name, ...) SYSCALL_DEFINEx(5, _##name,
> > > > __VA_ARGS__)
> > > > +
> > > > +/* Adapted from arch/alpha/include/asm/hwrpb.h.  */
> > > > +
> > > > +struct hwrpb_struct {
> > > > +       unsigned long phys_addr;        /* check: physical
> > > > address of
> > > > the hwrpb */
> > > > +       unsigned long id;               /* check: "HWRPB\0\0\0"
> > > > */
> > > > +       unsigned long revision;
> > > > +       unsigned long size;             /* size of hwrpb */
> > > > +       /* [...snip...] */
> > > > +};
> > > > +
> > > > +extern struct hwrpb_struct *hwrpb;
> > > > +
> > > > +/* Adapted from arch/alpha/kernel/osf_sys.c.  */
> > > > +
> > > > +SYSCALL_DEFINE5(osf_getsysinfo, unsigned long, op, void __user
> > > > *,
> > > > buffer,
> > > > +               unsigned long, nbytes, int __user *, start, void
> > > > __user *, arg)
> > > > +{
> > > > +       /* [...snip...] */
> > > > +
> > > > +       __analyzer_dump_state ("taint", nbytes);  /* { dg-warning
> > > > "tainted" } */
> > > > +
> > > > +       /* TODO: should have an event explaining why "nbytes" is
> > > > treated as
> > > > +          attacker-controlled.  */
> > > > +
> > > > +       /* case GSI_GET_HWRPB: */
> > > > +               if (nbytes < sizeof(*hwrpb))
> > > > +                       return -1;
> > > > +
> > > > > +               __analyzer_dump_state ("taint", nbytes);  /* { dg-warning "has_lb" } */
> > > > +
> > > > +               if (copy_to_user(buffer, hwrpb, nbytes) != 0) /*
> > > > {
> > > > dg-warning "use of attacker-controlled value 'nbytes' as size
> > > > without
> > > > upper-bounds checking" } */
> > > > +                       return -2;
> > > > +
> > > > +               return 1;
> > > > +
> > > > +       /* [...snip...] */
> > > > +}
> > > > +
> > > > +/* With the fix for the sense of the size comparison.  */
> > > > +
> > > > +SYSCALL_DEFINE5(osf_getsysinfo_fixed, unsigned long, op, void
> > > > __user
> > > > *, buffer,
> > > > +               unsigned long, nbytes, int __user *, start, void
> > > > __user *, arg)
> > > > +{
> > > > +       /* [...snip...] */
> > > > +
> > > > +       /* case GSI_GET_HWRPB: */
> > > > +               if (nbytes > sizeof(*hwrpb))
> > > > +                       return -1;
> > > > +               if (copy_to_user(buffer, hwrpb, nbytes) != 0) /*
> > > > {
> > > > dg-bogus "attacker-controlled" } */
> > > > +                       return -2;
> > > > +
> > > > +               return 1;
> > > > +
> > > > +       /* [...snip...] */
> > > > +}
> > > > > diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-1.c b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-1.c
> > > > new file mode 100644
> > > > index 00000000000..0b9a94a8d6c
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-1.c
> > > > @@ -0,0 +1,38 @@
> > > > +/* See notes in this header.  */
> > > > +#include "taint-CVE-2020-13143.h"
> > > > +
> > > > +// TODO: remove need for this option
> > > > +/* { dg-additional-options "-fanalyzer-checker=taint" } */
> > > > +
> > > > +struct configfs_attribute {
> > > > +       /* [...snip...] */
> > > > +       ssize_t (*store)(struct config_item *, const char *,
> > > > size_t)
> > > > /* { dg-message "\\(1\\) field 'store' of 'struct
> > > > configfs_attribute'
> > > > is marked with '__attribute__\\(\\(tainted\\)\\)'" } */
> > > > +               __attribute__((tainted)); /* (this is added).  */
> > > > +};
> > > > +static inline struct gadget_info *to_gadget_info(struct
> > > > config_item
> > > > *item)
> > > > +{
> > > > +        return container_of(to_config_group(item), struct
> > > > gadget_info, group);
> > > > +}
> > > > +
> > > > +static ssize_t gadget_dev_desc_UDC_store(struct config_item
> > > > *item,
> > > > +               const char *page, size_t len)
> > > > +{
> > > > +       struct gadget_info *gi = to_gadget_info(item);
> > > > +       char *name;
> > > > +       int ret;
> > > > +
> > > > +#if 0
> > > > +       /* FIXME: this is the fix.  */
> > > > +       if (strlen(page) < len)
> > > > +               return -EOVERFLOW;
> > > > +#endif
> > > > +
> > > > +       name = kstrdup(page, GFP_KERNEL);
> > > > +       if (!name)
> > > > +               return -ENOMEM;
> > > > > +       if (name[len - 1] == '\n') /* { dg-warning "use of attacker-controlled value 'len \[^\n\r\]+' as offset without upper-bounds checking" } */
> > > > > +               name[len - 1] = '\0'; /* { dg-warning "use of attacker-controlled value 'len \[^\n\r\]+' as offset without upper-bounds checking" } */
> > > > +       /* [...snip...] */                              \
> > > > +}
> > > > +
> > > > +CONFIGFS_ATTR(gadget_dev_desc_, UDC); /* { dg-message "\\(2\\)
> > > > function 'gadget_dev_desc_UDC_store' used as initializer for
> > > > field
> > > > 'store' marked with '__attribute__\\(\\(tainted\\)\\)'" } */
> > > > > diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-2.c b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-2.c
> > > > new file mode 100644
> > > > index 00000000000..e05da9276c1
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-2.c
> > > > @@ -0,0 +1,32 @@
> > > > +/* See notes in this header.  */
> > > > +#include "taint-CVE-2020-13143.h"
> > > > +
> > > > +// TODO: remove need for this option
> > > > +/* { dg-additional-options "-fanalyzer-checker=taint" } */
> > > > +
> > > > +struct configfs_attribute {
> > > > +       /* [...snip...] */
> > > > +       ssize_t (*store)(struct config_item *, const char *,
> > > > size_t)
> > > > /* { dg-message "\\(1\\) field 'store' of 'struct
> > > > configfs_attribute'
> > > > is marked with '__attribute__\\(\\(tainted\\)\\)'" } */
> > > > +               __attribute__((tainted)); /* (this is added).  */
> > > > +};
> > > > +
> > > > +/* Highly simplified version.  */
> > > > +
> > > > +static ssize_t gadget_dev_desc_UDC_store(struct config_item
> > > > *item,
> > > > +               const char *page, size_t len)
> > > > +{
> > > > +       /* TODO: ought to have state_change_event talking about
> > > > where
> > > > the tainted value comes from.  */
> > > > +
> > > > +       char *name;
> > > > +       /* [...snip...] */
> > > > +
> > > > +       name = kstrdup(page, GFP_KERNEL);
> > > > +       if (!name)
> > > > +               return -ENOMEM;
> > > > > +       if (name[len - 1] == '\n') /* { dg-warning "use of attacker-controlled value 'len \[^\n\r\]+' as offset without upper-bounds checking" } */
> > > > > +               name[len - 1] = '\0';  /* { dg-warning "use of attacker-controlled value 'len \[^\n\r\]+' as offset without upper-bounds checking" } */
> > > > +       /* [...snip...] */
> > > > +       return 0;
> > > > +}
> > > > +
> > > > +CONFIGFS_ATTR(gadget_dev_desc_, UDC); /* { dg-message "\\(2\\)
> > > > function 'gadget_dev_desc_UDC_store' used as initializer for
> > > > field
> > > > 'store' marked with '__attribute__\\(\\(tainted\\)\\)'" } */
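
For readers unfamiliar with the underlying bug, the following user-space sketch shows why indexing the duplicate with len - 1 is unsafe and what the guard from the "#if 0" block in the first test buys. dup_strip_newline is a hypothetical helper written for this note; malloc and memcpy stand in for kstrdup, and the extra len == 0 test is my own addition to avoid wrapping len - 1, not something taken from the kernel fix.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated copy of PAGE with a newline at offset
   LEN - 1 replaced by a terminator, or NULL if LEN is inconsistent
   with the string or allocation fails.  The caller frees the result.  */
static char *
dup_strip_newline (const char *page, size_t len)
{
  /* The duplicate holds only strlen (page) + 1 bytes, so an embedded
     '\0' makes it shorter than LEN; reject that case up front.  */
  size_t n = strlen (page);
  if (len == 0 || n < len)
    return NULL;

  char *name = malloc (n + 1);
  if (!name)
    return NULL;
  memcpy (name, page, n + 1);

  /* LEN - 1 is now known to be a valid index into NAME.  */
  if (name[len - 1] == '\n')
    name[len - 1] = '\0';
  return name;
}
```

Without the n < len test, a buffer such as "ab\0cd\n" with len == 6 would yield a three-byte duplicate that is then read at offset 5, which is the out-of-bounds read the analyzer warning above points at.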
> > > > diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143.h
> > > > b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143.h
> > > > new file mode 100644
> > > > index 00000000000..0ba023539af
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143.h
> > > > @@ -0,0 +1,91 @@
> > > > > +/* Shared header for the various taint-CVE-2020-13143-*.c tests.
> > > > +
> > > > +   "gadget_dev_desc_UDC_store in drivers/usb/gadget/configfs.c
> > > > in
> > > > the
> > > > +   Linux kernel 3.16 through 5.6.13 relies on kstrdup without
> > > > considering
> > > > +   the possibility of an internal '\0' value, which allows
> > > > attackers
> > > > to
> > > > +   trigger an out-of-bounds read, aka CID-15753588bcd4."
> > > > +
> > > > > +   Fixed by 15753588bcd4bbffae1cca33c8ced5722477fe1f on linux-5.7.y
> > > > > +   in linux-stable.  */
> > > > +
> > > > +// TODO: remove need for this option
> > > > +/* { dg-additional-options "-fanalyzer-checker=taint" } */
> > > > +
> > > > +#include <stddef.h>
> > > > +
> > > > +/* Adapted from include/uapi/asm-generic/posix_types.h  */
> > > > +
> > > > +typedef unsigned int     __kernel_size_t;
> > > > +typedef int              __kernel_ssize_t;
> > > > +
> > > > +/* Adapted from include/linux/types.h  */
> > > > +
> > > > +//typedef __kernel_size_t              size_t;
> > > > +typedef __kernel_ssize_t       ssize_t;
> > > > +
> > > > +/* Adapted from include/linux/kernel.h  */
> > > > +
> > > > +#define container_of(ptr, type, member)
> > > > ({                             \
> > > > +       void *__mptr = (void
> > > > *)(ptr);                                   \
> > > > +       /* [...snip...]
> > > > */                                              \
> > > > +       ((type *)(__mptr - offsetof(type, member))); })
> > > > +
> > > > +/* Adapted from include/linux/configfs.h  */
> > > > +
> > > > +struct config_item {
> > > > +       /* [...snip...] */
> > > > +};
> > > > +
> > > > +struct config_group {
> > > > +       struct config_item              cg_item;
> > > > +       /* [...snip...] */
> > > > +};
> > > > +
> > > > +static inline struct config_group *to_config_group(struct
> > > > config_item *item)
> > > > +{
> > > > +       return item ? container_of(item,struct
> > > > config_group,cg_item)
> > > > : NULL;
> > > > +}
> > > > +
> > > > +#define CONFIGFS_ATTR(_pfx, _name)                             \
> > > > +static struct configfs_attribute _pfx##attr_##_name = {        \
> > > > +       /* [...snip...] */                              \
> > > > +       .store          = _pfx##_name##_store,          \
> > > > +}
> > > > +
> > > > +/* Adapted from include/linux/compiler.h  */
> > > > +
> > > > +#define __force
> > > > +
> > > > +/* Adapted from include/asm-generic/errno-base.h  */
> > > > +
> > > > +#define        ENOMEM          12      /* Out of memory */
> > > > +
> > > > +/* Adapted from include/linux/types.h  */
> > > > +
> > > > +#define __bitwise__
> > > > +typedef unsigned __bitwise__ gfp_t;
> > > > +
> > > > +/* Adapted from include/linux/gfp.h  */
> > > > +
> > > > +#define ___GFP_WAIT            0x10u
> > > > +#define ___GFP_IO              0x40u
> > > > +#define ___GFP_FS              0x80u
> > > > +#define __GFP_WAIT     ((__force gfp_t)___GFP_WAIT)
> > > > +#define __GFP_IO       ((__force gfp_t)___GFP_IO)
> > > > +#define __GFP_FS       ((__force gfp_t)___GFP_FS)
> > > > +#define GFP_KERNEL  (__GFP_WAIT | __GFP_IO | __GFP_FS)
> > > > +
> > > > +/* Adapted from include/linux/compiler_attributes.h  */
> > > > +
> > > > > +#define __malloc                       __attribute__((__malloc__))
> > > > +
> > > > +/* Adapted from include/linux/string.h  */
> > > > +
> > > > +extern char *kstrdup(const char *s, gfp_t gfp) __malloc;
> > > > +
> > > > +/* Adapted from drivers/usb/gadget/configfs.c  */
> > > > +
> > > > +struct gadget_info {
> > > > +       struct config_group group;
> > > > > +       /* [...snip...] */
> > > > +};
> > > > diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c
> > > > b/gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c
> > > > new file mode 100644
> > > > index 00000000000..4c567b2ffdf
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c
> > > > @@ -0,0 +1,21 @@
> > > > +// TODO: remove need for this option:
> > > > +/* { dg-additional-options "-fanalyzer-checker=taint" } */
> > > > +
> > > > +#include "analyzer-decls.h"
> > > > +#include <stdio.h>
> > > > +#include <stdlib.h>
> > > > +#include <string.h>
> > > > +
> > > > +/* malloc with tainted size from a syscall.  */
> > > > +
> > > > +void *p;
> > > > +
> > > > +void __attribute__((tainted))
> > > > > +test_1 (size_t sz) /* { dg-message "\\(1\\) function 'test_1' marked with '__attribute__\\(\\(tainted\\)\\)'" } */
> > > > > +{
> > > > > +  /* TODO: should have a message saying why "sz" is tainted, e.g.
> > > > > +     "treating 'sz' as attacker-controlled because 'test_1' is marked with '__attribute__((tainted))'"  */
> > > > > +
> > > > > +  p = malloc (sz); /* { dg-warning "use of attacker-controlled value 'sz' as allocation size without upper-bounds checking" "warning" } */
> > > > > +  /* { dg-message "\\(\[0-9\]+\\) use of attacker-controlled value 'sz' as allocation size without upper-bounds checking" "final event" { target *-*-* } .-1 } */
> > > > +}
> > > > diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c
> > > > b/gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c
> > > > new file mode 100644
> > > > index 00000000000..f52cafcd71d
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c
> > > > @@ -0,0 +1,31 @@
> > > > +// TODO: remove need for this option:
> > > > +/* { dg-additional-options "-fanalyzer-checker=taint" } */
> > > > +
> > > > +#include "analyzer-decls.h"
> > > > +#include <stdio.h>
> > > > +#include <stdlib.h>
> > > > +#include <string.h>
> > > > +
> > > > +/* malloc with tainted size from a syscall.  */
> > > > +
> > > > +struct arg_buf
> > > > +{
> > > > +  size_t sz;
> > > > +};
> > > > +
> > > > +void *p;
> > > > +
> > > > +void __attribute__((tainted))
> > > > > +test_1 (void *data) /* { dg-message "\\(1\\) function 'test_1' marked with '__attribute__\\(\\(tainted\\)\\)'" } */
> > > > > +{
> > > > > +  /* we should treat pointed-to-structs as tainted.  */
> > > > > +  __analyzer_dump_state ("taint", data); /* { dg-warning "state: 'tainted'" } */
> > > > > +
> > > > > +  struct arg_buf *args = data;
> > > > > +
> > > > > +  __analyzer_dump_state ("taint", args); /* { dg-warning "state: 'tainted'" } */
> > > > > +  __analyzer_dump_state ("taint", args->sz); /* { dg-warning "state: 'tainted'" } */
> > > > > +
> > > > > +  p = malloc (args->sz); /* { dg-warning "use of attacker-controlled value '\\*args.sz' as allocation size without upper-bounds checking" "warning" } */
> > > > > +  /* { dg-message "\\(\[0-9\]+\\) use of attacker-controlled value '\\*args.sz' as allocation size without upper-bounds checking" "final event" { target *-*-* } .-1 } */
> > > > +}
> > > 
> > 
> > 
> 




* Re: PING^2 (C/C++): Re: [PATCH 6/6] Add __attribute__ ((tainted))
  2022-01-12 15:33         ` David Malcolm
@ 2022-01-13 19:08           ` Jason Merrill
  2022-01-14  1:25             ` [committed] Add __attribute__ ((tainted_args)) David Malcolm
  0 siblings, 1 reply; 39+ messages in thread
From: Jason Merrill @ 2022-01-13 19:08 UTC (permalink / raw)
  To: David Malcolm, gcc-patches, linux-toolchains

On 1/12/22 10:33, David Malcolm wrote:
> On Tue, 2022-01-11 at 23:36 -0500, Jason Merrill wrote:
>> On 1/10/22 16:36, David Malcolm via Gcc-patches wrote:
>>> On Thu, 2022-01-06 at 09:08 -0500, David Malcolm wrote:
>>>> On Sat, 2021-11-13 at 15:37 -0500, David Malcolm wrote:
>>>>> This patch adds a new __attribute__ ((tainted)) to the C/C++
>>>>> frontends.
>>>>
>>>> Ping for GCC C/C++ maintainers for review of the C/C++ FE parts of
>>>> this patch (attribute registration, documentation, the name of the
>>>> attribute, etc).
>>>>
>>>> (I believe it's independent of the rest of the patch kit, in that it
>>>> could go into trunk without needing the prior patches)
>>>>
>>>> Thanks
>>>> Dave
>>>
>>> Getting close to end of stage 3 for GCC 12, so pinging this patch
>>> again...
>>>
>>> https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584376.html
>>
>> The c-family change is OK.
> 
> Thanks.
> 
> I'm retesting the patch now, but it now seems to me that
>    __attribute__((tainted_args))
> would lead to more readable code than:
>    __attribute__((tainted))
> 
> in that the name "tainted_args" better conveys the idea that all
> arguments are under attacker-control (as opposed to the body of the
> function or the function pointer being under attacker-control).
> 
> Looking at
>    https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html
> we already have some attributes with underscores in their names.
> 
> Does this sound good?

Makes sense to me.
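
For illustration, here is a minimal sketch of how the renamed attribute
might read on a function whose arguments all arrive from an untrusted
caller.  TAINTED_ARGS, MAX_REQUEST, request_buf and handle_request are
hypothetical names made up for this example, not part of the patch; the
macro expands to nothing on compilers that do not report support for the
attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper macro (not part of the patch): expands to the
   attribute only where the compiler reports support for it.  */
#if defined(__has_attribute)
# if __has_attribute(tainted_args)
#  define TAINTED_ARGS __attribute__((tainted_args))
# endif
#endif
#ifndef TAINTED_ARGS
# define TAINTED_ARGS
#endif

#define MAX_REQUEST 64  /* hypothetical upper bound for this sketch */

static char request_buf[MAX_REQUEST];

/* Every argument is meant to be treated as attacker-controlled.
   Returns the number of bytes copied, or -1 if NBYTES is out of range.  */
TAINTED_ARGS int
handle_request (const char *src, size_t nbytes)
{
  assert (src != NULL);
  if (nbytes > MAX_REQUEST)   /* upper-bounds check on the tainted size */
    return -1;
  memcpy (request_buf, src, nbytes);
  return (int) nbytes;
}
```

As the patch description quoted below notes, the taint checker currently
needs both -fanalyzer and -fanalyzer-checker=taint to take any notice of
the attribute.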

>>
>>> Thanks
>>> Dave
>>>
>>>>
>>>>
>>>>>
>>>>> It can be used on function decls: the analyzer will treat as tainted
>>>>> all parameters to the function and all buffers pointed to by
>>>>> parameters to the function.  Adding this in one place to the Linux
>>>>> kernel's __SYSCALL_DEFINEx macro allows the analyzer to treat all
>>>>> syscalls as having tainted inputs.  This gives additional testing
>>>>> beyond e.g. __user pointers added by earlier patches - an example of
>>>>> the use of this can be seen in CVE-2011-2210, where given:
>>>>>
>>>>>    SYSCALL_DEFINE5(osf_getsysinfo, unsigned long, op, void __user *, buffer,
>>>>>                    unsigned long, nbytes, int __user *, start,
>>>>>                    void __user *, arg)
>>>>>
>>>>> the analyzer will treat the nbytes param as under attacker control,
>>>>> and can complain accordingly:
>>>>>
>>>>> taint-CVE-2011-2210-1.c: In function ‘sys_osf_getsysinfo’:
>>>>> taint-CVE-2011-2210-1.c:69:21: warning: use of attacker-controlled value
>>>>>     ‘nbytes’ as size without upper-bounds checking [CWE-129] [-Wanalyzer-tainted-size]
>>>>>      69 |                 if (copy_to_user(buffer, hwrpb, nbytes) != 0)
>>>>>         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>
>>>>> Additionally, the patch allows the attribute to be used on field
>>>>> decls: specifically function pointers.  Any function used as an
>>>>> initializer for such a field gets treated as tainted.  An example can
>>>>> be seen in CVE-2020-13143, where adding __attribute__((tainted)) to
>>>>> the "store" callback of configfs_attribute:
>>>>>
>>>>>     struct configfs_attribute {
>>>>>        /* [...snip...] */
>>>>>        ssize_t (*store)(struct config_item *, const char *, size_t)
>>>>>          __attribute__((tainted));
>>>>>        /* [...snip...] */
>>>>>     };
>>>>>
>>>>> allows the analyzer to see:
>>>>>
>>>>>    CONFIGFS_ATTR(gadget_dev_desc_, UDC);
>>>>>
>>>>> and treat gadget_dev_desc_UDC_store as tainted, so that it complains:
>>>>>
>>>>> taint-CVE-2020-13143-1.c: In function ‘gadget_dev_desc_UDC_store’:
>>>>> taint-CVE-2020-13143-1.c:33:17: warning: use of attacker-controlled value
>>>>>     ‘len + 18446744073709551615’ as offset without upper-bounds checking [CWE-823] [-Wanalyzer-tainted-offset]
>>>>>      33 |         if (name[len - 1] == '\n')
>>>>>         |             ~~~~^~~~~~~~~
>>>>>
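
To make the callback-field case concrete, here is a minimal sketch in
the spirit of the configfs example, using the tainted_args spelling
proposed earlier in this message.  TAINTED_ARGS, struct attr_ops,
name_store, name_ops and MAX_NAME_LEN are hypothetical names for
illustration only; none of them come from the patch or from the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrapper macro (not part of the patch): expands to the
   attribute only where the compiler reports support for it.  */
#if defined(__has_attribute)
# if __has_attribute(tainted_args)
#  define TAINTED_ARGS __attribute__((tainted_args))
# endif
#endif
#ifndef TAINTED_ARGS
# define TAINTED_ARGS
#endif

#define MAX_NAME_LEN 128  /* hypothetical limit for this sketch */

/* Hypothetical ops table: the attribute sits on the function pointer
   field, so a function used to initialize it is an entry point whose
   arguments are treated as attacker-controlled.  */
struct attr_ops
{
  long (*store) (const char *page, size_t len) TAINTED_ARGS;
};

/* Returns LEN with one trailing newline stripped, or -1 if LEN is
   outside the range this sketch accepts.  */
static long
name_store (const char *page, size_t len)
{
  assert (page != NULL);
  /* Constrain LEN before forming "len - 1"; with len == 0 the index
     would wrap around to SIZE_MAX.  */
  if (len == 0 || len > MAX_NAME_LEN)
    return -1;
  if (page[len - 1] == '\n')
    len--;
  return (long) len;
}

/* The initializer is what ties name_store to the annotated field.  */
const struct attr_ops name_ops = { .store = name_store };
```

The range check is one plausible way of constraining len before it is
used as an offset; whether a given analyzer version then stays quiet is
not something this sketch demonstrates.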
>>>>> Similarly, the attribute could be used on the ioctl callback field,
>>>>> USB device callbacks, network-handling callbacks etc.  This
>>>>> potentially gives a lot of test coverage with relatively little code
>>>>> annotation, and without necessarily needing link-time analysis (which
>>>>> -fanalyzer can only do at present on trivial examples).
>>>>>
>>>>> I believe this is the first time we've had an attribute on a
>>>>> field.
>>>>> If that's an issue, I could prepare a version of the patch that
>>>>> merely allowed it on functions themselves.
>>>>>
>>>>> As before this currently still needs -fanalyzer-checker=taint (in
>>>>> addition to -fanalyzer).
>>>>>
>>>>> gcc/analyzer/ChangeLog:
>>>>>           * engine.cc: Include "stringpool.h", "attribs.h", and
>>>>>           "tree-dfa.h".
>>>>>           (mark_params_as_tainted): New.
>>>>>           (class tainted_function_custom_event): New.
>>>>>           (class tainted_function_info): New.
>>>>>           (exploded_graph::add_function_entry): Handle functions with
>>>>>           "tainted" attribute.
>>>>>           (class tainted_field_custom_event): New.
>>>>>           (class tainted_callback_custom_event): New.
>>>>>           (class tainted_call_info): New.
>>>>>           (add_tainted_callback): New.
>>>>>           (add_any_callbacks): New.
>>>>>           (exploded_graph::build_initial_worklist): Find callbacks that
>>>>>           are reachable from global initializers, calling
>>>>>           add_any_callbacks on them.
>>>>>
>>>>> gcc/c-family/ChangeLog:
>>>>>           * c-attribs.c (c_common_attribute_table): Add "tainted".
>>>>>           (handle_tainted_attribute): New.
>>>>>
>>>>> gcc/ChangeLog:
>>>>>           * doc/extend.texi (Function Attributes): Note that "tainted"
>>>>>           can be used on field decls.
>>>>>           (Common Function Attributes): Add entry on "tainted"
>>>>>           attribute.
>>>>>
>>>>> gcc/testsuite/ChangeLog:
>>>>>           * gcc.dg/analyzer/attr-tainted-1.c: New test.
>>>>>           * gcc.dg/analyzer/attr-tainted-misuses.c: New test.
>>>>>           * gcc.dg/analyzer/taint-CVE-2011-2210-1.c: New test.
>>>>>           * gcc.dg/analyzer/taint-CVE-2020-13143-1.c: New test.
>>>>>           * gcc.dg/analyzer/taint-CVE-2020-13143-2.c: New test.
>>>>>           * gcc.dg/analyzer/taint-CVE-2020-13143.h: New test.
>>>>>           * gcc.dg/analyzer/taint-alloc-3.c: New test.
>>>>>           * gcc.dg/analyzer/taint-alloc-4.c: New test.
>>>>>
>>>>> Signed-off-by: David Malcolm <dmalcolm@redhat.com>
>>>>> ---
>>>>>    gcc/analyzer/engine.cc                        | 317 +++++++++++++++++-
>>>>>    gcc/c-family/c-attribs.c                      |  36 ++
>>>>>    gcc/doc/extend.texi                           |  22 +-
>>>>>    .../gcc.dg/analyzer/attr-tainted-1.c          |  88 +++++
>>>>>    .../gcc.dg/analyzer/attr-tainted-misuses.c    |   6 +
>>>>>    .../gcc.dg/analyzer/taint-CVE-2011-2210-1.c   |  93 +++++
>>>>>    .../gcc.dg/analyzer/taint-CVE-2020-13143-1.c  |  38 +++
>>>>>    .../gcc.dg/analyzer/taint-CVE-2020-13143-2.c  |  32 ++
>>>>>    .../gcc.dg/analyzer/taint-CVE-2020-13143.h    |  91 +++++
>>>>>    gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c |  21 ++
>>>>>    gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c |  31 ++
>>>>>    11 files changed, 772 insertions(+), 3 deletions(-)
>>>>>    create mode 100644 gcc/testsuite/gcc.dg/analyzer/attr-tainted-1.c
>>>>>    create mode 100644 gcc/testsuite/gcc.dg/analyzer/attr-tainted-misuses.c
>>>>>    create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-2210-1.c
>>>>>    create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-1.c
>>>>>    create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-2.c
>>>>>    create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143.h
>>>>>    create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c
>>>>>    create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c
>>>>>
>>>>> diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
>>>>> index 096e219392d..5fab41daf93 100644
>>>>> --- a/gcc/analyzer/engine.cc
>>>>> +++ b/gcc/analyzer/engine.cc
>>>>> @@ -68,6 +68,9 @@ along with GCC; see the file COPYING3.  If not
>>>>> see
>>>>>    #include "plugin.h"
>>>>>    #include "target.h"
>>>>>    #include <memory>
>>>>> +#include "stringpool.h"
>>>>> +#include "attribs.h"
>>>>> +#include "tree-dfa.h"
>>>>>    
>>>>>    /* For an overview, see gcc/doc/analyzer.texi.  */
>>>>>    
>>>>> @@ -2276,6 +2279,116 @@ exploded_graph::~exploded_graph ()
>>>>>        delete (*iter).second;
>>>>>    }
>>>>>    
>>>>> +/* Subroutine for use when implementing __attribute__((tainted))
>>>>> +   on functions and on function pointer fields in structs.
>>>>> +
>>>>> +   Called on STATE representing a call to FNDECL.
>>>>> +   Mark all params of FNDECL in STATE as "tainted".  Mark the
>>>>> value
>>>>> of all
>>>>> +   regions pointed to by params of FNDECL as "tainted".
>>>>> +
>>>>> +   Return true if successful; return false if the "taint" state
>>>>> machine
>>>>> +   was not found.  */
>>>>> +
>>>>> +static bool
>>>>> +mark_params_as_tainted (program_state *state, tree fndecl,
>>>>> +                       const extrinsic_state &ext_state)
>>>>> +{
>>>>> +  unsigned taint_sm_idx;
>>>>> +  if (!ext_state.get_sm_idx_by_name ("taint", &taint_sm_idx))
>>>>> +    return false;
>>>>> +  sm_state_map *smap = state->m_checker_states[taint_sm_idx];
>>>>> +
>>>>> +  const state_machine &sm = ext_state.get_sm (taint_sm_idx);
>>>>> +  state_machine::state_t tainted = sm.get_state_by_name
>>>>> ("tainted");
>>>>> +
>>>>> +  region_model_manager *mgr = ext_state.get_model_manager ();
>>>>> +
>>>>> +  function *fun = DECL_STRUCT_FUNCTION (fndecl);
>>>>> +  gcc_assert (fun);
>>>>> +
>>>>> +  for (tree iter_parm = DECL_ARGUMENTS (fndecl); iter_parm;
>>>>> +       iter_parm = DECL_CHAIN (iter_parm))
>>>>> +    {
>>>>> +      tree param = iter_parm;
>>>>> +      if (tree parm_default_ssa = ssa_default_def (fun, iter_parm))
>>>>> +       param = parm_default_ssa;
>>>>> +      const region *param_reg = state->m_region_model->get_lvalue (param, NULL);
>>>>> +      const svalue *init_sval = mgr->get_or_create_initial_value (param_reg);
>>>>> +      smap->set_state (state->m_region_model, init_sval,
>>>>> +                      tainted, NULL /*origin_new_sval*/, ext_state);
>>>>> +      if (POINTER_TYPE_P (TREE_TYPE (param)))
>>>>> +       {
>>>>> +         const region *pointee_reg = mgr->get_symbolic_region (init_sval);
>>>>> +         /* Mark "*param" as tainted.  */
>>>>> +         const svalue *init_pointee_sval
>>>>> +           = mgr->get_or_create_initial_value (pointee_reg);
>>>>> +         smap->set_state (state->m_region_model, init_pointee_sval,
>>>>> +                          tainted, NULL /*origin_new_sval*/, ext_state);
>>>>> +       }
>>>>> +    }
>>>>> +
>>>>> +  return true;
>>>>> +}
>>>>> +
>>>>> +/* Custom event for use by tainted_function_info when a function
>>>>> +   has been marked with __attribute__((tainted)).  */
>>>>> +
>>>>> +class tainted_function_custom_event : public custom_event
>>>>> +{
>>>>> +public:
>>>>> +  tainted_function_custom_event (location_t loc, tree fndecl,
>>>>> int
>>>>> depth)
>>>>> +  : custom_event (loc, fndecl, depth),
>>>>> +    m_fndecl (fndecl)
>>>>> +  {
>>>>> +  }
>>>>> +
>>>>> +  label_text get_desc (bool can_colorize) const FINAL OVERRIDE
>>>>> +  {
>>>>> +    return make_label_text
>>>>> +      (can_colorize,
>>>>> +       "function %qE marked with %<__attribute__((tainted))%>",
>>>>> +       m_fndecl);
>>>>> +  }
>>>>> +
>>>>> +private:
>>>>> +  tree m_fndecl;
>>>>> +};
>>>>> +
>>>>> +/* Custom exploded_edge info for top-level calls to a function
>>>>> +   marked with __attribute__((tainted)).  */
>>>>> +
>>>>> +class tainted_function_info : public custom_edge_info
>>>>> +{
>>>>> +public:
>>>>> +  tainted_function_info (tree fndecl)
>>>>> +  : m_fndecl (fndecl)
>>>>> +  {}
>>>>> +
>>>>> +  void print (pretty_printer *pp) const FINAL OVERRIDE
>>>>> +  {
>>>>> +    pp_string (pp, "call to tainted function");
>>>>> +  };
>>>>> +
>>>>> +  bool update_model (region_model *,
>>>>> +                    const exploded_edge *,
>>>>> +                    region_model_context *) const FINAL OVERRIDE
>>>>> +  {
>>>>> +    /* No-op.  */
>>>>> +    return true;
>>>>> +  }
>>>>> +
>>>>> +  void add_events_to_path (checker_path *emission_path,
>>>>> +                          const exploded_edge &) const FINAL
>>>>> OVERRIDE
>>>>> +  {
>>>>> +    emission_path->add_event
>>>>> +      (new tainted_function_custom_event
>>>>> +       (DECL_SOURCE_LOCATION (m_fndecl), m_fndecl, 0));
>>>>> +  }
>>>>> +
>>>>> +private:
>>>>> +  tree m_fndecl;
>>>>> +};
>>>>> +
>>>>>    /* Ensure that there is an exploded_node representing an
>>>>> external
>>>>> call to
>>>>>       FUN, adding it to the worklist if creating it.
>>>>>    
>>>>> @@ -2302,14 +2415,25 @@ exploded_graph::add_function_entry
>>>>> (function
>>>>> *fun)
>>>>>      program_state state (m_ext_state);
>>>>>      state.push_frame (m_ext_state, fun);
>>>>>    
>>>>> +  custom_edge_info *edge_info = NULL;
>>>>> +
>>>>> +  if (lookup_attribute ("tainted", DECL_ATTRIBUTES (fun->decl)))
>>>>> +    {
>>>>> +      if (mark_params_as_tainted (&state, fun->decl,
>>>>> m_ext_state))
>>>>> +       edge_info = new tainted_function_info (fun->decl);
>>>>> +    }
>>>>> +
>>>>>      if (!state.m_valid)
>>>>>        return NULL;
>>>>>    
>>>>>      exploded_node *enode = get_or_create_node (point, state,
>>>>> NULL);
>>>>>      if (!enode)
>>>>> -    return NULL;
>>>>> +    {
>>>>> +      delete edge_info;
>>>>> +      return NULL;
>>>>> +    }
>>>>>    
>>>>> -  add_edge (m_origin, enode, NULL);
>>>>> +  add_edge (m_origin, enode, NULL, edge_info);
>>>>>    
>>>>>      m_functions_with_enodes.add (fun);
>>>>>    
>>>>> @@ -2623,6 +2747,184 @@ toplevel_function_p (function *fun,
>>>>> logger
>>>>> *logger)
>>>>>      return true;
>>>>>    }
>>>>>    
>>>>> +/* Custom event for use by tainted_call_info when a callback
>>>>> field
>>>>> has been
>>>>> +   marked with __attribute__((tainted)), for labelling the
>>>>> field.
>>>>> */
>>>>> +
>>>>> +class tainted_field_custom_event : public custom_event
>>>>> +{
>>>>> +public:
>>>>> +  tainted_field_custom_event (tree field)
>>>>> +  : custom_event (DECL_SOURCE_LOCATION (field), NULL_TREE, 0),
>>>>> +    m_field (field)
>>>>> +  {
>>>>> +  }
>>>>> +
>>>>> +  label_text get_desc (bool can_colorize) const FINAL OVERRIDE
>>>>> +  {
>>>>> +    return make_label_text (can_colorize,
>>>>> +                           "field %qE of %qT"
>>>>> +                           " is marked with %<__attribute__((tainted))%>",
>>>>> +                           m_field, DECL_CONTEXT (m_field));
>>>>> +  }
>>>>> +
>>>>> +private:
>>>>> +  tree m_field;
>>>>> +};
>>>>> +
>>>>> +/* Custom event for use by tainted_call_info when a callback
>>>>> field
>>>>> has been
>>>>> +   marked with __attribute__((tainted)), for labelling the
>>>>> function
>>>>> used
>>>>> +   in that callback.  */
>>>>> +
>>>>> +class tainted_callback_custom_event : public custom_event
>>>>> +{
>>>>> +public:
>>>>> +  tainted_callback_custom_event (location_t loc, tree fndecl,
>>>>> int
>>>>> depth,
>>>>> +                                tree field)
>>>>> +  : custom_event (loc, fndecl, depth),
>>>>> +    m_field (field)
>>>>> +  {
>>>>> +  }
>>>>> +
>>>>> +  label_text get_desc (bool can_colorize) const FINAL OVERRIDE
>>>>> +  {
>>>>> +    return make_label_text (can_colorize,
>>>>> +                           "function %qE used as initializer for field %qE"
>>>>> +                           " marked with %<__attribute__((tainted))%>",
>>>>> +                           m_fndecl, m_field);
>>>>> +  }
>>>>> +
>>>>> +private:
>>>>> +  tree m_field;
>>>>> +};
>>>>> +
>>>>> +/* Custom edge info for use when adding a function used by a
>>>>> callback field
>>>>> +   marked with '__attribute__((tainted))'.   */
>>>>> +
>>>>> +class tainted_call_info : public custom_edge_info
>>>>> +{
>>>>> +public:
>>>>> +  tainted_call_info (tree field, tree fndecl, location_t loc)
>>>>> +  : m_field (field), m_fndecl (fndecl), m_loc (loc)
>>>>> +  {}
>>>>> +
>>>>> +  void print (pretty_printer *pp) const FINAL OVERRIDE
>>>>> +  {
>>>>> +    pp_string (pp, "call to tainted field");
>>>>> +  };
>>>>> +
>>>>> +  bool update_model (region_model *,
>>>>> +                    const exploded_edge *,
>>>>> +                    region_model_context *) const FINAL OVERRIDE
>>>>> +  {
>>>>> +    /* No-op.  */
>>>>> +    return true;
>>>>> +  }
>>>>> +
>>>>> +  void add_events_to_path (checker_path *emission_path,
>>>>> +                          const exploded_edge &) const FINAL
>>>>> OVERRIDE
>>>>> +  {
>>>>> +    /* Show the field in the struct declaration
>>>>> +       e.g. "(1) field 'store' is marked with
>>>>> '__attribute__((tainted))'"  */
>>>>> +    emission_path->add_event
>>>>> +      (new tainted_field_custom_event (m_field));
>>>>> +
>>>>> +    /* Show the callback in the initializer
>>>>> +       e.g.
>>>>> +       "(2) function 'gadget_dev_desc_UDC_store' used as
>>>>> initializer
>>>>> +       for field 'store' marked with
>>>>> '__attribute__((tainted))'".
>>>>> */
>>>>> +    emission_path->add_event
>>>>> +      (new tainted_callback_custom_event (m_loc, m_fndecl, 0,
>>>>> m_field));
>>>>> +  }
>>>>> +
>>>>> +private:
>>>>> +  tree m_field;
>>>>> +  tree m_fndecl;
>>>>> +  location_t m_loc;
>>>>> +};
>>>>> +
>>>>> +/* Given an initializer at LOC for FIELD marked with
>>>>> '__attribute__((tainted))'
>>>>> +   initialized with FNDECL, add an entrypoint to FNDECL to EG
>>>>> (and
>>>>> to its
>>>>> +   worklist) where the params to FNDECL are marked as tainted.
>>>>> */
>>>>> +
>>>>> +static void
>>>>> +add_tainted_callback (exploded_graph *eg, tree field, tree
>>>>> fndecl,
>>>>> +                     location_t loc)
>>>>> +{
>>>>> +  logger *logger = eg->get_logger ();
>>>>> +
>>>>> +  LOG_SCOPE (logger);
>>>>> +
>>>>> +  if (!gimple_has_body_p (fndecl))
>>>>> +    return;
>>>>> +
>>>>> +  const extrinsic_state &ext_state = eg->get_ext_state ();
>>>>> +
>>>>> +  function *fun = DECL_STRUCT_FUNCTION (fndecl);
>>>>> +  gcc_assert (fun);
>>>>> +
>>>>> +  program_point point
>>>>> +    = program_point::from_function_entry (eg->get_supergraph (),
>>>>> fun);
>>>>> +  program_state state (ext_state);
>>>>> +  state.push_frame (ext_state, fun);
>>>>> +
>>>>> +  if (!mark_params_as_tainted (&state, fndecl, ext_state))
>>>>> +    return;
>>>>> +
>>>>> +  if (!state.m_valid)
>>>>> +    return;
>>>>> +
>>>>> +  exploded_node *enode = eg->get_or_create_node (point, state,
>>>>> NULL);
>>>>> +  if (logger)
>>>>> +    {
>>>>> +      if (enode)
>>>>> +       logger->log ("created EN %i for tainted %qE entrypoint",
>>>>> +                    enode->m_index, fndecl);
>>>>> +      else
>>>>> +       {
>>>>> +         logger->log ("did not create enode for tainted %qE entrypoint",
>>>>> +                      fndecl);
>>>>> +         return;
>>>>> +       }
>>>>> +    }
>>>>> +
>>>>> +  tainted_call_info *info = new tainted_call_info (field,
>>>>> fndecl,
>>>>> loc);
>>>>> +  eg->add_edge (eg->get_origin (), enode, NULL, info);
>>>>> +}
>>>>> +
>>>>> +/* Callback for walk_tree for finding callbacks within
>>>>> initializers;
>>>>> +   ensure that any callback initializer where the corresponding
>>>>> field is
>>>>> +   marked with '__attribute__((tainted))' is treated as an
>>>>> entrypoint to the
>>>>> +   analysis, special-casing that the inputs to the callback are
>>>>> +   untrustworthy.  */
>>>>> +
>>>>> +static tree
>>>>> +add_any_callbacks (tree *tp, int *, void *data)
>>>>> +{
>>>>> +  exploded_graph *eg = (exploded_graph *)data;
>>>>> +  if (TREE_CODE (*tp) == CONSTRUCTOR)
>>>>> +    {
>>>>> +      /* Find fields with the "tainted" attribute.
>>>>> +        walk_tree only walks the values, not the index values;
>>>>> +        look at the index values.  */
>>>>> +      unsigned HOST_WIDE_INT idx;
>>>>> +      constructor_elt *ce;
>>>>> +
>>>>> +      for (idx = 0; vec_safe_iterate (CONSTRUCTOR_ELTS (*tp), idx, &ce);
>>>>> +          idx++)
>>>>> +       if (ce->index && TREE_CODE (ce->index) == FIELD_DECL)
>>>>> +         if (lookup_attribute ("tainted", DECL_ATTRIBUTES (ce->index)))
>>>>> +           {
>>>>> +             tree value = ce->value;
>>>>> +             if (TREE_CODE (value) == ADDR_EXPR
>>>>> +                 && TREE_CODE (TREE_OPERAND (value, 0)) == FUNCTION_DECL)
>>>>> +               add_tainted_callback (eg, ce->index, TREE_OPERAND (value, 0),
>>>>> +                                     EXPR_LOCATION (value));
>>>>> +           }
>>>>> +    }
>>>>> +
>>>>> +  return NULL_TREE;
>>>>> +}
>>>>> +
>>>>>    /* Add initial nodes to EG, with entrypoints for externally-
>>>>> callable
>>>>>       functions.  */
>>>>>    
>>>>> @@ -2648,6 +2950,17 @@ exploded_graph::build_initial_worklist ()
>>>>>             logger->log ("did not create enode for %qE
>>>>> entrypoint",
>>>>> fun->decl);
>>>>>          }
>>>>>      }
>>>>> +
>>>>> +  /* Find callbacks that are reachable from global
>>>>> initializers.  */
>>>>> +  varpool_node *vpnode;
>>>>> +  FOR_EACH_VARIABLE (vpnode)
>>>>> +    {
>>>>> +      tree decl = vpnode->decl;
>>>>> +      tree init = DECL_INITIAL (decl);
>>>>> +      if (!init)
>>>>> +       continue;
>>>>> +      walk_tree (&init, add_any_callbacks, this, NULL);
>>>>> +    }
>>>>>    }
>>>>>    
>>>>>    /* The main loop of the analysis.
>>>>> diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
>>>>> index 9e03156de5e..835ba6e0e8c 100644
>>>>> --- a/gcc/c-family/c-attribs.c
>>>>> +++ b/gcc/c-family/c-attribs.c
>>>>> @@ -117,6 +117,7 @@ static tree
>>>>> handle_no_profile_instrument_function_attribute (tree *, tree,
>>>>>                                                               
>>>>> tree,
>>>>> int, bool *);
>>>>>    static tree handle_malloc_attribute (tree *, tree, tree, int,
>>>>> bool
>>>>> *);
>>>>>    static tree handle_dealloc_attribute (tree *, tree, tree, int,
>>>>> bool
>>>>> *);
>>>>> +static tree handle_tainted_attribute (tree *, tree, tree, int,
>>>>> bool
>>>>> *);
>>>>>    static tree handle_returns_twice_attribute (tree *, tree, tree,
>>>>> int,
>>>>> bool *);
>>>>>    static tree handle_no_limit_stack_attribute (tree *, tree,
>>>>> tree,
>>>>> int,
>>>>>                                                bool *);
>>>>> @@ -569,6 +570,8 @@ const struct attribute_spec
>>>>> c_common_attribute_table[] =
>>>>>                                 handle_objc_nullability_attribute,
>>>>> NULL
>>>>> },
>>>>>      { "*dealloc",                1, 2, true, false, false, false,
>>>>>                                 handle_dealloc_attribute, NULL },
>>>>> +  { "tainted",               0, 0, true,  false, false, false,
>>>>> +                             handle_tainted_attribute, NULL },
>>>>>      { NULL,                     0, 0, false, false, false, false,
>>>>> NULL, NULL }
>>>>>    };
>>>>>    
>>>>> @@ -5857,6 +5860,39 @@ handle_objc_nullability_attribute (tree
>>>>> *node,
>>>>> tree name, tree args,
>>>>>      return NULL_TREE;
>>>>>    }
>>>>>    
>>>>> +/* Handle a "tainted" attribute; arguments as in
>>>>> +   struct attribute_spec.handler.  */
>>>>> +
>>>>> +static tree
>>>>> +handle_tainted_attribute (tree *node, tree name, tree, int,
>>>>> +                         bool *no_add_attrs)
>>>>> +{
>>>>> +  if (TREE_CODE (*node) != FUNCTION_DECL
>>>>> +      && TREE_CODE (*node) != FIELD_DECL)
>>>>> +    {
>>>>> +      warning (OPT_Wattributes, "%qE attribute ignored; valid only "
>>>>> +              "for functions and function pointer fields",
>>>>> +              name);
>>>>> +      *no_add_attrs = true;
>>>>> +      return NULL_TREE;
>>>>> +    }
>>>>> +
>>>>> +  if (TREE_CODE (*node) == FIELD_DECL
>>>>> +      && !(TREE_CODE (TREE_TYPE (*node)) == POINTER_TYPE
>>>>> +          && TREE_CODE (TREE_TYPE (TREE_TYPE (*node))) == FUNCTION_TYPE))
>>>>> +    {
>>>>> +      warning (OPT_Wattributes, "%qE attribute ignored;"
>>>>> +              " field must be a function pointer",
>>>>> +              name);
>>>>> +      *no_add_attrs = true;
>>>>> +      return NULL_TREE;
>>>>> +    }
>>>>> +
>>>>> +  *no_add_attrs = false; /* OK */
>>>>> +
>>>>> +  return NULL_TREE;
>>>>> +}
>>>>> +
>>>>>    /* Attempt to partially validate a single attribute ATTR as if
>>>>>       it were to be applied to an entity OPER.  */
>>>>>    
>>>>> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
>>>>> index 5a6ef464779..826bbd48e7e 100644
>>>>> --- a/gcc/doc/extend.texi
>>>>> +++ b/gcc/doc/extend.texi
>>>>> @@ -2465,7 +2465,8 @@ variable declarations (@pxref{Variable
>>>>> Attributes}),
>>>>>    labels (@pxref{Label Attributes}),
>>>>>    enumerators (@pxref{Enumerator Attributes}),
>>>>>    statements (@pxref{Statement Attributes}),
>>>>> -and types (@pxref{Type Attributes}).
>>>>> +types (@pxref{Type Attributes}),
>>>>> +and on field declarations (for @code{tainted}).
>>>>>    
>>>>>    There is some overlap between the purposes of attributes and
>>>>> pragmas
>>>>>    (@pxref{Pragmas,,Pragmas Accepted by GCC}).  It has been
>>>>> @@ -3977,6 +3978,25 @@ addition to creating a symbol version (as
>>>>> if
>>>>>    @code{"@var{name2}@@@var{nodename}"} was used) the version will
>>>>> be
>>>>> also used
>>>>>    to resolve @var{name2} by the linker.
>>>>>    
>>>>> +@item tainted
>>>>> +@cindex @code{tainted} function attribute
>>>>> +The @code{tainted} attribute is used to specify that a function is called
>>>>> +in a way that requires sanitization of its arguments, such as a system
>>>>> +call in an operating system kernel.  Such a function can be considered part
>>>>> +of the ``attack surface'' of the program.  The attribute can be used both
>>>>> +on function declarations, and on field declarations containing function
>>>>> +pointers.  In the latter case, any function used as an initializer of
>>>>> +such a callback field will be treated as tainted.
>>>>> +
>>>>> +The analyzer will pay particular attention to such functions when both
>>>>> +@option{-fanalyzer} and @option{-fanalyzer-checker=taint} are supplied,
>>>>> +potentially issuing warnings guarded by
>>>>> +@option{-Wanalyzer-exposure-through-uninit-copy},
>>>>> +@option{-Wanalyzer-tainted-allocation-size},
>>>>> +@option{-Wanalyzer-tainted-array-index},
>>>>> +@option{-Wanalyzer-tainted-offset},
>>>>> +and @option{-Wanalyzer-tainted-size}.
>>>>> +
>>>>>    @item target_clones (@var{options})
>>>>>    @cindex @code{target_clones} function attribute
>>>>>    The @code{target_clones} attribute is used to specify that a
>>>>> function
>>>>> diff --git a/gcc/testsuite/gcc.dg/analyzer/attr-tainted-1.c
>>>>> b/gcc/testsuite/gcc.dg/analyzer/attr-tainted-1.c
>>>>> new file mode 100644
>>>>> index 00000000000..cc4d5900372
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.dg/analyzer/attr-tainted-1.c
>>>>> @@ -0,0 +1,88 @@
>>>>> +// TODO: remove need for this option
>>>>> +/* { dg-additional-options "-fanalyzer-checker=taint" } */
>>>>> +
>>>>> +#include "analyzer-decls.h"
>>>>> +
>>>>> +struct arg_buf
>>>>> +{
>>>>> +  int i;
>>>>> +  int j;
>>>>> +};
>>>>> +
>>>>> +/* Example of marking a function as tainted.  */
>>>>> +
>>>>> +void __attribute__((tainted))
>>>>> +test_1 (int i, void *p, char *q)
>>>>> +{
>>>>> +  /* There should be a single enode,
>>>>> +     for the "tainted" entry to the function.  */
>>>>> +  __analyzer_dump_exploded_nodes (0); /* { dg-warning "1
>>>>> processed
>>>>> enode" } */
>>>>> +
>>>>> +  __analyzer_dump_state ("taint", i); /* { dg-warning "state:
>>>>> 'tainted'" } */
>>>>> +  __analyzer_dump_state ("taint", p); /* { dg-warning "state:
>>>>> 'tainted'" } */
>>>>> +  __analyzer_dump_state ("taint", q); /* { dg-warning "state:
>>>>> 'tainted'" } */
>>>>> +  __analyzer_dump_state ("taint", *q); /* { dg-warning "state:
>>>>> 'tainted'" } */
>>>>> +
>>>>> +  struct arg_buf *args = p;
>>>>> +  __analyzer_dump_state ("taint", args->i); /* { dg-warning
>>>>> "state:
>>>>> 'tainted'" } */
>>>>> +  __analyzer_dump_state ("taint", args->j); /* { dg-warning
>>>>> "state:
>>>>> 'tainted'" } */
>>>>> +}
>>>>> +
>>>>> +/* Example of marking a callback field as tainted.  */
>>>>> +
>>>>> +struct s2
>>>>> +{
>>>>> +  void (*cb) (int, void *, char *)
>>>>> +    __attribute__((tainted));
>>>>> +};
>>>>> +
>>>>> +/* Function not marked as tainted.  */
>>>>> +
>>>>> +void
>>>>> +test_2a (int i, void *p, char *q)
>>>>> +{
>>>>> +  /* There should be a single enode,
>>>>> +     for the normal entry to the function.  */
>>>>> +  __analyzer_dump_exploded_nodes (0); /* { dg-warning "1
>>>>> processed
>>>>> enode" } */
>>>>> +
>>>>> +  __analyzer_dump_state ("taint", i); /* { dg-warning "state:
>>>>> 'start'" } */
>>>>> +  __analyzer_dump_state ("taint", p); /* { dg-warning "state:
>>>>> 'start'" } */
>>>>> +  __analyzer_dump_state ("taint", q); /* { dg-warning "state:
>>>>> 'start'" } */
>>>>> +
>>>>> +  struct arg_buf *args = p;
>>>>> +  __analyzer_dump_state ("taint", args->i); /* { dg-warning
>>>>> "state:
>>>>> 'start'" } */
>>>>> +  __analyzer_dump_state ("taint", args->j); /* { dg-warning
>>>>> "state:
>>>>> 'start'" } */
>>>>> +}
>>>>> +
>>>>> +/* Function referenced via t2b.cb, marked as "tainted".  */
>>>>> +
>>>>> +void
>>>>> +test_2b (int i, void *p, char *q)
>>>>> +{
>>>>> +  /* There should be two enodes
>>>>> +     for the direct call, and the "tainted" entry to the
>>>>> function.
>>>>> */
>>>>> +  __analyzer_dump_exploded_nodes (0); /* { dg-warning "2
>>>>> processed
>>>>> enodes" } */
>>>>> +}
>>>>> +
>>>>> +/* Callback used via t2c.cb, marked as "tainted".  */
>>>>> +void
>>>>> +__analyzer_test_2c (int i, void *p, char *q)
>>>>> +{
>>>>> +  /* There should be a single enode,
>>>>> +     for the "tainted" entry to the function.  */
>>>>> +  __analyzer_dump_exploded_nodes (0); /* { dg-warning "1
>>>>> processed
>>>>> enode" } */
>>>>> +
>>>>> +  __analyzer_dump_state ("taint", i); /* { dg-warning "state:
>>>>> 'tainted'" } */
>>>>> +  __analyzer_dump_state ("taint", p); /* { dg-warning "state:
>>>>> 'tainted'" } */
>>>>> +  __analyzer_dump_state ("taint", q); /* { dg-warning "state:
>>>>> 'tainted'" } */
>>>>> +}
>>>>> +
>>>>> +struct s2 t2b =
>>>>> +{
>>>>> +  .cb = test_2b
>>>>> +};
>>>>> +
>>>>> +struct s2 t2c =
>>>>> +{
>>>>> +  .cb = __analyzer_test_2c
>>>>> +};
>>>>> diff --git a/gcc/testsuite/gcc.dg/analyzer/attr-tainted-misuses.c
>>>>> b/gcc/testsuite/gcc.dg/analyzer/attr-tainted-misuses.c
>>>>> new file mode 100644
>>>>> index 00000000000..6f4cbc82efb
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.dg/analyzer/attr-tainted-misuses.c
>>>>> @@ -0,0 +1,6 @@
>>>>> +int not_a_fn __attribute__ ((tainted)); /* { dg-warning
>>>>> "'tainted'
>>>>> attribute ignored; valid only for functions and function pointer
>>>>> fields" } */
>>>>> +
>>>>> +struct s
>>>>> +{
>>>>> +  int f __attribute__ ((tainted)); /* { dg-warning "'tainted'
>>>>> attribute ignored; field must be a function pointer" } */
>>>>> +};
>>>>> diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-2210-
>>>>> 1.c
>>>>> b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-2210-1.c
>>>>> new file mode 100644
>>>>> index 00000000000..fe6c7ebbb1f
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-2210-1.c
>>>>> @@ -0,0 +1,93 @@
>>>>> +/* "The osf_getsysinfo function in arch/alpha/kernel/osf_sys.c
>>>>> in
>>>>> the
>>>>> +   Linux kernel before 2.6.39.4 on the Alpha platform does not
>>>>> properly
>>>>> +   restrict the data size for GSI_GET_HWRPB operations, which
>>>>> allows
>>>>> +   local users to obtain sensitive information from kernel
>>>>> memory
>>>>> via
>>>>> +   a crafted call."
>>>>> +
>>>>> +   Fixed in 3d0475119d8722798db5e88f26493f6547a4bb5b on linux-
>>>>> 2.6.39.y
>>>>> +   in linux-stable.  */
>>>>> +
>>>>> +// TODO: remove need for this option:
>>>>> +/* { dg-additional-options "-fanalyzer-checker=taint" } */
>>>>> +
>>>>> +#include "analyzer-decls.h"
>>>>> +#include "test-uaccess.h"
>>>>> +
>>>>> +/* Adapted from include/linux/linkage.h.  */
>>>>> +
>>>>> +#define asmlinkage
>>>>> +
>>>>> +/* Adapted from include/linux/syscalls.h.  */
>>>>> +
>>>>> +#define __SC_DECL1(t1, a1)     t1 a1
>>>>> +#define __SC_DECL2(t2, a2, ...) t2 a2, __SC_DECL1(__VA_ARGS__)
>>>>> +#define __SC_DECL3(t3, a3, ...) t3 a3, __SC_DECL2(__VA_ARGS__)
>>>>> +#define __SC_DECL4(t4, a4, ...) t4 a4, __SC_DECL3(__VA_ARGS__)
>>>>> +#define __SC_DECL5(t5, a5, ...) t5 a5, __SC_DECL4(__VA_ARGS__)
>>>>> +#define __SC_DECL6(t6, a6, ...) t6 a6, __SC_DECL5(__VA_ARGS__)
>>>>> +
>>>>> +#define SYSCALL_DEFINEx(x, sname, ...)                         \
>>>>> +       __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
>>>>> +
>>>>> +#define SYSCALL_DEFINE(name) asmlinkage long sys_##name
>>>>> +#define __SYSCALL_DEFINEx(x, name,
>>>>> ...)                                        \
>>>>> +       asmlinkage __attribute__((tainted)) \
>>>>> +       long sys##name(__SC_DECL##x(__VA_ARGS__))
>>>>> +
>>>>> +#define SYSCALL_DEFINE5(name, ...) SYSCALL_DEFINEx(5, _##name,
>>>>> __VA_ARGS__)
>>>>> +
>>>>> +/* Adapted from arch/alpha/include/asm/hwrpb.h.  */
>>>>> +
>>>>> +struct hwrpb_struct {
>>>>> +       unsigned long phys_addr;        /* check: physical
>>>>> address of
>>>>> the hwrpb */
>>>>> +       unsigned long id;               /* check: "HWRPB\0\0\0"
>>>>> */
>>>>> +       unsigned long revision;
>>>>> +       unsigned long size;             /* size of hwrpb */
>>>>> +       /* [...snip...] */
>>>>> +};
>>>>> +
>>>>> +extern struct hwrpb_struct *hwrpb;
>>>>> +
>>>>> +/* Adapted from arch/alpha/kernel/osf_sys.c.  */
>>>>> +
>>>>> +SYSCALL_DEFINE5(osf_getsysinfo, unsigned long, op, void __user
>>>>> *,
>>>>> buffer,
>>>>> +               unsigned long, nbytes, int __user *, start, void
>>>>> __user *, arg)
>>>>> +{
>>>>> +       /* [...snip...] */
>>>>> +
>>>>> +       __analyzer_dump_state ("taint", nbytes);  /* { dg-warning
>>>>> "tainted" } */
>>>>> +
>>>>> +       /* TODO: should have an event explaining why "nbytes" is
>>>>> treated as
>>>>> +          attacker-controlled.  */
>>>>> +
>>>>> +       /* case GSI_GET_HWRPB: */
>>>>> +               if (nbytes < sizeof(*hwrpb))
>>>>> +                       return -1;
>>>>> +
>>>>> +               __analyzer_dump_state ("taint", nbytes);  /* {
>>>>> dg-
>>>>> warning "has_lb" } */
>>>>> +
>>>>> +               if (copy_to_user(buffer, hwrpb, nbytes) != 0) /*
>>>>> {
>>>>> dg-warning "use of attacker-controlled value 'nbytes' as size
>>>>> without
>>>>> upper-bounds checking" } */
>>>>> +                       return -2;
>>>>> +
>>>>> +               return 1;
>>>>> +
>>>>> +       /* [...snip...] */
>>>>> +}
>>>>> +
>>>>> +/* With the fix for the sense of the size comparison.  */
>>>>> +
>>>>> +SYSCALL_DEFINE5(osf_getsysinfo_fixed, unsigned long, op, void
>>>>> __user
>>>>> *, buffer,
>>>>> +               unsigned long, nbytes, int __user *, start, void
>>>>> __user *, arg)
>>>>> +{
>>>>> +       /* [...snip...] */
>>>>> +
>>>>> +       /* case GSI_GET_HWRPB: */
>>>>> +               if (nbytes > sizeof(*hwrpb))
>>>>> +                       return -1;
>>>>> +               if (copy_to_user(buffer, hwrpb, nbytes) != 0) /*
>>>>> {
>>>>> dg-bogus "attacker-controlled" } */
>>>>> +                       return -2;
>>>>> +
>>>>> +               return 1;
>>>>> +
>>>>> +       /* [...snip...] */
>>>>> +}
>>>>> diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-
>>>>> 1.c
>>>>> b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-1.c
>>>>> new file mode 100644
>>>>> index 00000000000..0b9a94a8d6c
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-1.c
>>>>> @@ -0,0 +1,38 @@
>>>>> +/* See notes in this header.  */
>>>>> +#include "taint-CVE-2020-13143.h"
>>>>> +
>>>>> +// TODO: remove need for this option
>>>>> +/* { dg-additional-options "-fanalyzer-checker=taint" } */
>>>>> +
>>>>> +struct configfs_attribute {
>>>>> +       /* [...snip...] */
>>>>> +       ssize_t (*store)(struct config_item *, const char *,
>>>>> size_t)
>>>>> /* { dg-message "\\(1\\) field 'store' of 'struct
>>>>> configfs_attribute'
>>>>> is marked with '__attribute__\\(\\(tainted\\)\\)'" } */
>>>>> +               __attribute__((tainted)); /* (this is added).  */
>>>>> +};
>>>>> +static inline struct gadget_info *to_gadget_info(struct
>>>>> config_item
>>>>> *item)
>>>>> +{
>>>>> +        return container_of(to_config_group(item), struct
>>>>> gadget_info, group);
>>>>> +}
>>>>> +
>>>>> +static ssize_t gadget_dev_desc_UDC_store(struct config_item
>>>>> *item,
>>>>> +               const char *page, size_t len)
>>>>> +{
>>>>> +       struct gadget_info *gi = to_gadget_info(item);
>>>>> +       char *name;
>>>>> +       int ret;
>>>>> +
>>>>> +#if 0
>>>>> +       /* FIXME: this is the fix.  */
>>>>> +       if (strlen(page) < len)
>>>>> +               return -EOVERFLOW;
>>>>> +#endif
>>>>> +
>>>>> +       name = kstrdup(page, GFP_KERNEL);
>>>>> +       if (!name)
>>>>> +               return -ENOMEM;
>>>>> +       if (name[len - 1] == '\n') /* { dg-warning "use of
>>>>> attacker-
>>>>> controlled value 'len \[^\n\r\]+' as offset without upper-bounds
>>>>> checking" } */
>>>>> +               name[len - 1] = '\0'; /* { dg-warning "use of
>>>>> attacker-controlled value 'len \[^\n\r\]+' as offset without
>>>>> upper-
>>>>> bounds checking" } */
>>>>> +       /* [...snip...] */                              \
>>>>> +}
>>>>> +
>>>>> +CONFIGFS_ATTR(gadget_dev_desc_, UDC); /* { dg-message "\\(2\\)
>>>>> function 'gadget_dev_desc_UDC_store' used as initializer for
>>>>> field
>>>>> 'store' marked with '__attribute__\\(\\(tainted\\)\\)'" } */
>>>>> diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-
>>>>> 2.c
>>>>> b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-2.c
>>>>> new file mode 100644
>>>>> index 00000000000..e05da9276c1
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-2.c
>>>>> @@ -0,0 +1,32 @@
>>>>> +/* See notes in this header.  */
>>>>> +#include "taint-CVE-2020-13143.h"
>>>>> +
>>>>> +// TODO: remove need for this option
>>>>> +/* { dg-additional-options "-fanalyzer-checker=taint" } */
>>>>> +
>>>>> +struct configfs_attribute {
>>>>> +       /* [...snip...] */
>>>>> +       ssize_t (*store)(struct config_item *, const char *,
>>>>> size_t)
>>>>> /* { dg-message "\\(1\\) field 'store' of 'struct
>>>>> configfs_attribute'
>>>>> is marked with '__attribute__\\(\\(tainted\\)\\)'" } */
>>>>> +               __attribute__((tainted)); /* (this is added).  */
>>>>> +};
>>>>> +
>>>>> +/* Highly simplified version.  */
>>>>> +
>>>>> +static ssize_t gadget_dev_desc_UDC_store(struct config_item
>>>>> *item,
>>>>> +               const char *page, size_t len)
>>>>> +{
>>>>> +       /* TODO: ought to have state_change_event talking about
>>>>> where
>>>>> the tainted value comes from.  */
>>>>> +
>>>>> +       char *name;
>>>>> +       /* [...snip...] */
>>>>> +
>>>>> +       name = kstrdup(page, GFP_KERNEL);
>>>>> +       if (!name)
>>>>> +               return -ENOMEM;
>>>>> +       if (name[len - 1] == '\n') /* { dg-warning "use of
>>>>> attacker-
>>>>> controlled value 'len \[^\n\r\]+' as offset without upper-bounds
>>>>> checking" } */
>>>>> +               name[len - 1] = '\0';  /* { dg-warning "use of
>>>>> attacker-controlled value 'len \[^\n\r\]+' as offset without
>>>>> upper-
>>>>> bounds checking" } */
>>>>> +       /* [...snip...] */
>>>>> +       return 0;
>>>>> +}
>>>>> +
>>>>> +CONFIGFS_ATTR(gadget_dev_desc_, UDC); /* { dg-message "\\(2\\)
>>>>> function 'gadget_dev_desc_UDC_store' used as initializer for
>>>>> field
>>>>> 'store' marked with '__attribute__\\(\\(tainted\\)\\)'" } */
>>>>> diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143.h
>>>>> b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143.h
>>>>> new file mode 100644
>>>>> index 00000000000..0ba023539af
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143.h
>>>>> @@ -0,0 +1,91 @@
>>>>> +/* Shared header for the various taint-CVE-2020-13143.h tests.
>>>>> +
>>>>> +   "gadget_dev_desc_UDC_store in drivers/usb/gadget/configfs.c
>>>>> in
>>>>> the
>>>>> +   Linux kernel 3.16 through 5.6.13 relies on kstrdup without
>>>>> considering
>>>>> +   the possibility of an internal '\0' value, which allows
>>>>> attackers
>>>>> to
>>>>> +   trigger an out-of-bounds read, aka CID-15753588bcd4."
>>>>> +
>>>>> +   Fixed by 15753588bcd4bbffae1cca33c8ced5722477fe1f on linux-
>>>>> 5.7.y
>>>>> +   in linux-stable.  */
>>>>> +
>>>>> +// TODO: remove need for this option
>>>>> +/* { dg-additional-options "-fanalyzer-checker=taint" } */
>>>>> +
>>>>> +#include <stddef.h>
>>>>> +
>>>>> +/* Adapted from include/uapi/asm-generic/posix_types.h  */
>>>>> +
>>>>> +typedef unsigned int     __kernel_size_t;
>>>>> +typedef int              __kernel_ssize_t;
>>>>> +
>>>>> +/* Adapted from include/linux/types.h  */
>>>>> +
>>>>> +//typedef __kernel_size_t              size_t;
>>>>> +typedef __kernel_ssize_t       ssize_t;
>>>>> +
>>>>> +/* Adapted from include/linux/kernel.h  */
>>>>> +
>>>>> +#define container_of(ptr, type, member)
>>>>> ({                             \
>>>>> +       void *__mptr = (void
>>>>> *)(ptr);                                   \
>>>>> +       /* [...snip...]
>>>>> */                                              \
>>>>> +       ((type *)(__mptr - offsetof(type, member))); })
>>>>> +
>>>>> +/* Adapted from include/linux/configfs.h  */
>>>>> +
>>>>> +struct config_item {
>>>>> +       /* [...snip...] */
>>>>> +};
>>>>> +
>>>>> +struct config_group {
>>>>> +       struct config_item              cg_item;
>>>>> +       /* [...snip...] */
>>>>> +};
>>>>> +
>>>>> +static inline struct config_group *to_config_group(struct
>>>>> config_item *item)
>>>>> +{
>>>>> +       return item ? container_of(item,struct
>>>>> config_group,cg_item)
>>>>> : NULL;
>>>>> +}
>>>>> +
>>>>> +#define CONFIGFS_ATTR(_pfx, _name)                             \
>>>>> +static struct configfs_attribute _pfx##attr_##_name = {        \
>>>>> +       /* [...snip...] */                              \
>>>>> +       .store          = _pfx##_name##_store,          \
>>>>> +}
>>>>> +
>>>>> +/* Adapted from include/linux/compiler.h  */
>>>>> +
>>>>> +#define __force
>>>>> +
>>>>> +/* Adapted from include/asm-generic/errno-base.h  */
>>>>> +
>>>>> +#define        ENOMEM          12      /* Out of memory */
>>>>> +
>>>>> +/* Adapted from include/linux/types.h  */
>>>>> +
>>>>> +#define __bitwise__
>>>>> +typedef unsigned __bitwise__ gfp_t;
>>>>> +
>>>>> +/* Adapted from include/linux/gfp.h  */
>>>>> +
>>>>> +#define ___GFP_WAIT            0x10u
>>>>> +#define ___GFP_IO              0x40u
>>>>> +#define ___GFP_FS              0x80u
>>>>> +#define __GFP_WAIT     ((__force gfp_t)___GFP_WAIT)
>>>>> +#define __GFP_IO       ((__force gfp_t)___GFP_IO)
>>>>> +#define __GFP_FS       ((__force gfp_t)___GFP_FS)
>>>>> +#define GFP_KERNEL  (__GFP_WAIT | __GFP_IO | __GFP_FS)
>>>>> +
>>>>> +/* Adapted from include/linux/compiler_attributes.h  */
>>>>> +
>>>>> +#define __malloc
>>>>> __attribute__((__malloc__))
>>>>> +
>>>>> +/* Adapted from include/linux/string.h  */
>>>>> +
>>>>> +extern char *kstrdup(const char *s, gfp_t gfp) __malloc;
>>>>> +
>>>>> +/* Adapted from drivers/usb/gadget/configfs.c  */
>>>>> +
>>>>> +struct gadget_info {
>>>>> +       struct config_group group;
>>>>> +       /* [...snip...] */                              \
>>>>> +};
>>>>> diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c
>>>>> b/gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c
>>>>> new file mode 100644
>>>>> index 00000000000..4c567b2ffdf
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c
>>>>> @@ -0,0 +1,21 @@
>>>>> +// TODO: remove need for this option:
>>>>> +/* { dg-additional-options "-fanalyzer-checker=taint" } */
>>>>> +
>>>>> +#include "analyzer-decls.h"
>>>>> +#include <stdio.h>
>>>>> +#include <stdlib.h>
>>>>> +#include <string.h>
>>>>> +
>>>>> +/* malloc with tainted size from a syscall.  */
>>>>> +
>>>>> +void *p;
>>>>> +
>>>>> +void __attribute__((tainted))
>>>>> +test_1 (size_t sz) /* { dg-message "\\(1\\) function 'test_1'
>>>>> marked
>>>>> with '__attribute__\\(\\(tainted\\)\\)'" } */
>>>>> +{
>>>>> +  /* TODO: should have a message saying why "sz" is tainted,
>>>>> e.g.
>>>>> +     "treating 'sz' as attacker-controlled because 'test_1' is
>>>>> marked with '__attribute__((tainted))'"  */
>>>>> +
>>>>> +  p = malloc (sz); /* { dg-warning "use of attacker-controlled
>>>>> value
>>>>> 'sz' as allocation size without upper-bounds checking" "warning"
>>>>> } */
>>>>> +  /* { dg-message "\\(\[0-9\]+\\) use of attacker-controlled
>>>>> value
>>>>> 'sz' as allocation size without upper-bounds checking" "final
>>>>> event"
>>>>> { target *-*-* } .-1 } */
>>>>> +}
>>>>> diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c
>>>>> b/gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c
>>>>> new file mode 100644
>>>>> index 00000000000..f52cafcd71d
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c
>>>>> @@ -0,0 +1,31 @@
>>>>> +// TODO: remove need for this option:
>>>>> +/* { dg-additional-options "-fanalyzer-checker=taint" } */
>>>>> +
>>>>> +#include "analyzer-decls.h"
>>>>> +#include <stdio.h>
>>>>> +#include <stdlib.h>
>>>>> +#include <string.h>
>>>>> +
>>>>> +/* malloc with tainted size from a syscall.  */
>>>>> +
>>>>> +struct arg_buf
>>>>> +{
>>>>> +  size_t sz;
>>>>> +};
>>>>> +
>>>>> +void *p;
>>>>> +
>>>>> +void __attribute__((tainted))
>>>>> +test_1 (void *data) /* { dg-message "\\(1\\) function 'test_1'
>>>>> marked with '__attribute__\\(\\(tainted\\)\\)'" } */
>>>>> +{
>>>>> +  /* we should treat pointed-to-structs as tainted.  */
>>>>> +  __analyzer_dump_state ("taint", data); /* { dg-warning "state:
>>>>> 'tainted'" } */
>>>>> +
>>>>> +  struct arg_buf *args = data;
>>>>> +
>>>>> +  __analyzer_dump_state ("taint", args); /* { dg-warning "state:
>>>>> 'tainted'" } */
>>>>> +  __analyzer_dump_state ("taint", args->sz); /* { dg-warning
>>>>> "state:
>>>>> 'tainted'" } */
>>>>> +
>>>>> +  p = malloc (args->sz); /* { dg-warning "use of attacker-
>>>>> controlled
>>>>> value '\\*args.sz' as allocation size without upper-bounds
>>>>> checking"
>>>>> "warning" } */
>>>>> +  /* { dg-message "\\(\[0-9\]+\\) use of attacker-controlled
>>>>> value
>>>>> '\\*args.sz' as allocation size without upper-bounds checking"
>>>>> "final
>>>>> event" { target *-*-* } .-1 } */
>>>>> +}
>>>>
>>>
>>>
>>
> 
> 


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [committed] Add __attribute__ ((tainted_args))
  2022-01-13 19:08           ` Jason Merrill
@ 2022-01-14  1:25             ` David Malcolm
  0 siblings, 0 replies; 39+ messages in thread
From: David Malcolm @ 2022-01-14  1:25 UTC (permalink / raw)
  To: gcc-patches, Jason Merrill, linux-toolchains; +Cc: David Malcolm

On Thu, 2022-01-13 at 14:08 -0500, Jason Merrill wrote:
> On 1/12/22 10:33, David Malcolm wrote:
> > On Tue, 2022-01-11 at 23:36 -0500, Jason Merrill wrote:
> > > On 1/10/22 16:36, David Malcolm via Gcc-patches wrote:
> > > > On Thu, 2022-01-06 at 09:08 -0500, David Malcolm wrote:
> > > > > On Sat, 2021-11-13 at 15:37 -0500, David Malcolm wrote:
> > > > > > This patch adds a new __attribute__ ((tainted)) to the
> > > > > > C/C++
> > > > > > frontends.
> > > > > 
> > > > > > Ping for GCC C/C++ maintainers for review of the C/C++ FE
> > > > > parts of
> > > > > this
> > > > > patch (attribute registration, documentation, the name of the
> > > > > attribute, etc).
> > > > > 
> > > > > (I believe it's independent of the rest of the patch kit, in
> > > > > that
> > > > > it
> > > > > could go into trunk without needing the prior patches)
> > > > > 
> > > > > Thanks
> > > > > Dave
> > > > 
> > > > Getting close to end of stage 3 for GCC 12, so pinging this
> > > > patch
> > > > again...
> > > > 
> > > > https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584376.html
> > > 
> > > The c-family change is OK.
> > 
> > Thanks.
> > 
> > I'm retesting the patch now, but it now seems to me that
> >    __attribute__((tainted_args))
> > would lead to more readable code than:
> >    __attribute__((tainted))
> > 
> > in that the name "tainted_args" better conveys the idea that all
> > arguments are under attacker-control (as opposed to the body of the
> > function or the function pointer being under attacker-control).
> > 
> > Looking at
> >   
> > https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html
> > we already have some attributes with underscores in their names.
> > 
> > Does this sound good?
> 
> Makes sense to me.

Thanks.

I updated the patch to use the name "tainted_args" for the attribute,
and there were a few other changes needed due to splitting it out from
the rest of the kit.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk for gcc 12 as b31cec9c22b8dfa40baefd4c2dd774477e8e04c5.

The following is what I committed, for reference:

This patch adds a new __attribute__ ((tainted_args)) to the C/C++ frontends.

It can be used on function decls: the analyzer will treat as tainted
all parameters to the function and all buffers pointed to by parameters
to the function.  Adding this in one place to the Linux kernel's
__SYSCALL_DEFINEx macro allows the analyzer to treat all syscalls as
having tainted inputs.  This gives some coverage of system calls without
needing to "teach" the analyzer about "__user".  An example of the use
of this can be seen in CVE-2011-2210, where given:

 SYSCALL_DEFINE5(osf_getsysinfo, unsigned long, op, void __user *, buffer,
                 unsigned long, nbytes, int __user *, start, void __user *, arg)

the analyzer will treat the nbytes param as under attacker control, and
can complain accordingly:

taint-CVE-2011-2210-1.c: In function 'sys_osf_getsysinfo':
taint-CVE-2011-2210-1.c:69:21: warning: use of attacker-controlled value
  'nbytes' as size without upper-bounds checking [CWE-129] [-Wanalyzer-tainted-size]
   69 |                 if (copy_to_user(buffer, hwrpb, nbytes) != 0)
      |                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
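
A minimal user-space sketch of the same pattern follows.  It is not
taken from the kernel or from the patch: TAINTED_ARGS, secret_table,
fetch_table and demo_fetch_table are hypothetical names, and the
__has_attribute guard is an assumption added so that the file still
builds with compilers that do not know the attribute.  The bounds check
uses the same sense of comparison as the fixed variant in
taint-CVE-2011-2210-1.c, so the size passed on is no larger than the
source object.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: only compilers that report support for "tainted_args"
   get the attribute; elsewhere the macro expands to nothing.  */
#if defined(__has_attribute)
# if __has_attribute(tainted_args)
#  define TAINTED_ARGS __attribute__((tainted_args))
# endif
#endif
#ifndef TAINTED_ARGS
# define TAINTED_ARGS
#endif

/* Hypothetical stand-in for kernel data such as *hwrpb.  */
static const char secret_table[16] = "0123456789abcde";

/* Hypothetical syscall-like entry point.  With the attribute, the
   analyzer treats dst, nbytes, and the buffer dst points to as
   attacker-controlled.  */
TAINTED_ARGS long
fetch_table (char *dst, size_t nbytes)
{
  /* Upper-bound check before nbytes is used as a size.  The vulnerable
     code compared with '<' instead, which only gave a lower bound.  */
  if (nbytes > sizeof (secret_table))
    return -1;

  memcpy (dst, secret_table, nbytes);
  return (long) nbytes;
}

/* Usage: an in-range request copies data, an oversized one is refused.  */
void
demo_fetch_table (void)
{
  char out[sizeof (secret_table)];
  assert (fetch_table (out, 4) == 4);
  assert (memcmp (out, "0123", 4) == 0);
  assert (fetch_table (out, sizeof (secret_table) + 1) == -1);
}
```

Building such a file with -fanalyzer -fanalyzer-checker=taint is what
enables the taint checks described here; without those options the
attribute has no effect on the generated code.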

Additionally, the patch allows the attribute to be used on field decls:
specifically function pointers.  Any function used as an initializer
for such a field gets treated as being called with tainted arguments.
An example can be seen in CVE-2020-13143, where adding
__attribute__((tainted_args)) to the "store" callback of
configfs_attribute:

  struct configfs_attribute {
    /* [...snip...] */
    ssize_t (*store)(struct config_item *, const char *, size_t)
      __attribute__((tainted_args));
    /* [...snip...] */
  };

allows the analyzer to see:

 CONFIGFS_ATTR(gadget_dev_desc_, UDC);

and treat gadget_dev_desc_UDC_store as having tainted arguments, so that
it complains:

taint-CVE-2020-13143-1.c: In function 'gadget_dev_desc_UDC_store':
taint-CVE-2020-13143-1.c:33:17: warning: use of attacker-controlled value
  'len + 18446744073709551615' as offset without upper-bounds checking [CWE-823] [-Wanalyzer-tainted-offset]
   33 |         if (name[len - 1] == '\n')
      |             ~~~~^~~~~~~~~
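
(The value 18446744073709551615 in that message is 2^64 - 1, i.e.
"len - 1" computed in 64-bit unsigned arithmetic.)

The callback-field case can be sketched outside the kernel as well.
Everything named below (TAINTED_ARGS, struct attr_ops, PAGE_BYTES,
name_store, name_attr, demo_name_store) is hypothetical and only loosely
modelled on configfs_attribute.  The fixed-size bound is an assumption
of this sketch; the upstream kernel fix instead compares strlen(page)
against len.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: only compilers that report support for "tainted_args"
   get the attribute; elsewhere the macro expands to nothing.  */
#if defined(__has_attribute)
# if __has_attribute(tainted_args)
#  define TAINTED_ARGS __attribute__((tainted_args))
# endif
#endif
#ifndef TAINTED_ARGS
# define TAINTED_ARGS
#endif

#define PAGE_BYTES 64	/* Hypothetical size of the caller's buffer.  */

/* Hypothetical attribute table.  Any function used as an initializer
   for "store" is analyzed as if called with tainted arguments.  */
struct attr_ops
{
  long (*store) (char *page, size_t len) TAINTED_ARGS;
};

/* Strip one trailing newline, refusing lengths that would index
   outside the buffer.  With len == 0, "len - 1" would wrap around to
   SIZE_MAX, so that case is rejected too.  */
static long
name_store (char *page, size_t len)
{
  if (len == 0 || len > PAGE_BYTES)
    return -1;
  if (page[len - 1] == '\n')
    page[len - 1] = '\0';
  return (long) len;
}

/* The initializer is what connects name_store to the marked field.  */
static const struct attr_ops name_attr = {
  .store = name_store,
};

/* Usage: call through the table, as a framework would.  */
void
demo_name_store (void)
{
  char page[PAGE_BYTES] = "gadget\n";
  assert (name_attr.store (page, 7) == 7);
  assert (page[6] == '\0');
  assert (name_attr.store (page, 0) == -1);
  assert (name_attr.store (page, PAGE_BYTES + 1) == -1);
}
```

Marking the field rather than each callback means one annotation in the
struct definition covers every function installed through it.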

As before, this currently still needs -fanalyzer-checker=taint (in
addition to -fanalyzer).

gcc/analyzer/ChangeLog:
	* engine.cc: Include "stringpool.h", "attribs.h", and
	"tree-dfa.h".
	(mark_params_as_tainted): New.
	(class tainted_args_function_custom_event): New.
	(class tainted_args_function_info): New.
	(exploded_graph::add_function_entry): Handle functions with
	"tainted_args" attribute.
	(class tainted_args_field_custom_event): New.
	(class tainted_args_callback_custom_event): New.
	(class tainted_args_call_info): New.
	(add_tainted_args_callback): New.
	(add_any_callbacks): New.
	(exploded_graph::build_initial_worklist): Find callbacks that are
	reachable from global initializers, calling add_any_callbacks on
	them.

gcc/c-family/ChangeLog:
	* c-attribs.c (c_common_attribute_table): Add "tainted_args".
	(handle_tainted_args_attribute): New.

gcc/ChangeLog:
	* doc/extend.texi (Function Attributes): Note that "tainted_args" can
	be used on field decls.
	(Common Function Attributes): Add entry on "tainted_args" attribute.

gcc/testsuite/ChangeLog:
	* gcc.dg/analyzer/attr-tainted_args-1.c: New test.
	* gcc.dg/analyzer/attr-tainted_args-misuses.c: New test.
	* gcc.dg/analyzer/taint-CVE-2011-2210-1.c: New test.
	* gcc.dg/analyzer/taint-CVE-2020-13143-1.c: New test.
	* gcc.dg/analyzer/taint-CVE-2020-13143-2.c: New test.
	* gcc.dg/analyzer/taint-CVE-2020-13143.h: New test.
	* gcc.dg/analyzer/taint-alloc-3.c: New test.
	* gcc.dg/analyzer/taint-alloc-4.c: New test.
	* gcc.dg/analyzer/test-uaccess.h: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
---
 gcc/analyzer/engine.cc                        | 320 +++++++++++++++++-
 gcc/c-family/c-attribs.c                      |  36 ++
 gcc/doc/extend.texi                           |  23 +-
 .../gcc.dg/analyzer/attr-tainted_args-1.c     |  88 +++++
 .../analyzer/attr-tainted_args-misuses.c      |   6 +
 .../gcc.dg/analyzer/taint-CVE-2011-2210-1.c   |  93 +++++
 .../gcc.dg/analyzer/taint-CVE-2020-13143-1.c  |  38 +++
 .../gcc.dg/analyzer/taint-CVE-2020-13143-2.c  |  32 ++
 .../gcc.dg/analyzer/taint-CVE-2020-13143.h    |  91 +++++
 gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c |  21 ++
 gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c |  31 ++
 gcc/testsuite/gcc.dg/analyzer/test-uaccess.h  |  15 +
 12 files changed, 791 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/attr-tainted_args-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/attr-tainted_args-misuses.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-2210-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143.h
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/test-uaccess.h

diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
index 8b6f4c83f0f..243235e4cd4 100644
--- a/gcc/analyzer/engine.cc
+++ b/gcc/analyzer/engine.cc
@@ -68,6 +68,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "plugin.h"
 #include "target.h"
 #include <memory>
+#include "stringpool.h"
+#include "attribs.h"
+#include "tree-dfa.h"
 
 /* For an overview, see gcc/doc/analyzer.texi.  */
 
@@ -2287,6 +2290,116 @@ exploded_graph::~exploded_graph ()
     delete (*iter).second;
 }
 
+/* Subroutine for use when implementing __attribute__((tainted_args))
+   on functions and on function pointer fields in structs.
+
+   Called on STATE representing a call to FNDECL.
+   Mark all params of FNDECL in STATE as "tainted".  Mark the value of all
+   regions pointed to by params of FNDECL as "tainted".
+
+   Return true if successful; return false if the "taint" state machine
+   was not found.  */
+
+static bool
+mark_params_as_tainted (program_state *state, tree fndecl,
+			const extrinsic_state &ext_state)
+{
+  unsigned taint_sm_idx;
+  if (!ext_state.get_sm_idx_by_name ("taint", &taint_sm_idx))
+    return false;
+  sm_state_map *smap = state->m_checker_states[taint_sm_idx];
+
+  const state_machine &sm = ext_state.get_sm (taint_sm_idx);
+  state_machine::state_t tainted = sm.get_state_by_name ("tainted");
+
+  region_model_manager *mgr = ext_state.get_model_manager ();
+
+  function *fun = DECL_STRUCT_FUNCTION (fndecl);
+  gcc_assert (fun);
+
+  for (tree iter_parm = DECL_ARGUMENTS (fndecl); iter_parm;
+       iter_parm = DECL_CHAIN (iter_parm))
+    {
+      tree param = iter_parm;
+      if (tree parm_default_ssa = ssa_default_def (fun, iter_parm))
+	param = parm_default_ssa;
+      const region *param_reg = state->m_region_model->get_lvalue (param, NULL);
+      const svalue *init_sval = mgr->get_or_create_initial_value (param_reg);
+      smap->set_state (state->m_region_model, init_sval,
+		       tainted, NULL /*origin_new_sval*/, ext_state);
+      if (POINTER_TYPE_P (TREE_TYPE (param)))
+	{
+	  const region *pointee_reg = mgr->get_symbolic_region (init_sval);
+	  /* Mark "*param" as tainted.  */
+	  const svalue *init_pointee_sval
+	    = mgr->get_or_create_initial_value (pointee_reg);
+	  smap->set_state (state->m_region_model, init_pointee_sval,
+			   tainted, NULL /*origin_new_sval*/, ext_state);
+	}
+    }
+
+  return true;
+}
+
+/* Custom event for use by tainted_args_function_info when a function
+   has been marked with __attribute__((tainted_args)).  */
+
+class tainted_args_function_custom_event : public custom_event
+{
+public:
+  tainted_args_function_custom_event (location_t loc, tree fndecl, int depth)
+  : custom_event (loc, fndecl, depth),
+    m_fndecl (fndecl)
+  {
+  }
+
+  label_text get_desc (bool can_colorize) const FINAL OVERRIDE
+  {
+    return make_label_text
+      (can_colorize,
+       "function %qE marked with %<__attribute__((tainted_args))%>",
+       m_fndecl);
+  }
+
+private:
+  tree m_fndecl;
+};
+
+/* Custom exploded_edge info for top-level calls to a function
+   marked with __attribute__((tainted_args)).  */
+
+class tainted_args_function_info : public custom_edge_info
+{
+public:
+  tainted_args_function_info (tree fndecl)
+  : m_fndecl (fndecl)
+  {}
+
+  void print (pretty_printer *pp) const FINAL OVERRIDE
+  {
+    pp_string (pp, "call to tainted_args function");
+  };
+
+  bool update_model (region_model *,
+		     const exploded_edge *,
+		     region_model_context *) const FINAL OVERRIDE
+  {
+    /* No-op.  */
+    return true;
+  }
+
+  void add_events_to_path (checker_path *emission_path,
+			   const exploded_edge &) const FINAL OVERRIDE
+  {
+    emission_path->add_event
+      (new tainted_args_function_custom_event
+       (DECL_SOURCE_LOCATION (m_fndecl), m_fndecl, 0));
+  }
+
+private:
+  tree m_fndecl;
+};
+
 /* Ensure that there is an exploded_node representing an external call to
    FUN, adding it to the worklist if creating it.
 
@@ -2313,14 +2426,25 @@ exploded_graph::add_function_entry (function *fun)
   program_state state (m_ext_state);
   state.push_frame (m_ext_state, fun);
 
+  custom_edge_info *edge_info = NULL;
+
+  if (lookup_attribute ("tainted_args", DECL_ATTRIBUTES (fun->decl)))
+    {
+      if (mark_params_as_tainted (&state, fun->decl, m_ext_state))
+	edge_info = new tainted_args_function_info (fun->decl);
+    }
+
   if (!state.m_valid)
     return NULL;
 
   exploded_node *enode = get_or_create_node (point, state, NULL);
   if (!enode)
-    return NULL;
+    {
+      delete edge_info;
+      return NULL;
+    }
 
-  add_edge (m_origin, enode, NULL);
+  add_edge (m_origin, enode, NULL, edge_info);
 
   m_functions_with_enodes.add (fun);
 
@@ -2634,6 +2758,187 @@ toplevel_function_p (function *fun, logger *logger)
   return true;
 }
 
+/* Custom event for use by tainted_args_call_info when a callback field has
+   been marked with __attribute__((tainted_args)), for labelling the field.  */
+
+class tainted_args_field_custom_event : public custom_event
+{
+public:
+  tainted_args_field_custom_event (tree field)
+  : custom_event (DECL_SOURCE_LOCATION (field), NULL_TREE, 0),
+    m_field (field)
+  {
+  }
+
+  label_text get_desc (bool can_colorize) const FINAL OVERRIDE
+  {
+    return make_label_text (can_colorize,
+			    "field %qE of %qT"
+			    " is marked with %<__attribute__((tainted_args))%>",
+			    m_field, DECL_CONTEXT (m_field));
+  }
+
+private:
+  tree m_field;
+};
+
+/* Custom event for use by tainted_args_call_info when a callback field has
+   been marked with __attribute__((tainted_args)), for labelling the function
+   used in that callback.  */
+
+class tainted_args_callback_custom_event : public custom_event
+{
+public:
+  tainted_args_callback_custom_event (location_t loc, tree fndecl, int depth,
+				 tree field)
+  : custom_event (loc, fndecl, depth),
+    m_field (field)
+  {
+  }
+
+  label_text get_desc (bool can_colorize) const FINAL OVERRIDE
+  {
+    return make_label_text (can_colorize,
+			    "function %qE used as initializer for field %qE"
+			    " marked with %<__attribute__((tainted_args))%>",
+			    m_fndecl, m_field);
+  }
+
+private:
+  tree m_field;
+};
+
+/* Custom edge info for use when adding a function used by a callback field
+   marked with '__attribute__((tainted_args))'.   */
+
+class tainted_args_call_info : public custom_edge_info
+{
+public:
+  tainted_args_call_info (tree field, tree fndecl, location_t loc)
+  : m_field (field), m_fndecl (fndecl), m_loc (loc)
+  {}
+
+  void print (pretty_printer *pp) const FINAL OVERRIDE
+  {
+    pp_string (pp, "call to tainted field");
+  }
+
+  bool update_model (region_model *,
+		     const exploded_edge *,
+		     region_model_context *) const FINAL OVERRIDE
+  {
+    /* No-op.  */
+    return true;
+  }
+
+  void add_events_to_path (checker_path *emission_path,
+			   const exploded_edge &) const FINAL OVERRIDE
+  {
+    /* Show the field in the struct declaration, e.g.
+       "(1) field 'store' is marked with '__attribute__((tainted_args))'"  */
+    emission_path->add_event
+      (new tainted_args_field_custom_event (m_field));
+
+    /* Show the callback in the initializer
+       e.g.
+       "(2) function 'gadget_dev_desc_UDC_store' used as initializer
+       for field 'store' marked with '__attribute__((tainted_args))'".  */
+    emission_path->add_event
+      (new tainted_args_callback_custom_event (m_loc, m_fndecl, 0, m_field));
+  }
+
+private:
+  tree m_field;
+  tree m_fndecl;
+  location_t m_loc;
+};
+
+/* Given an initializer at LOC for FIELD marked with
+   '__attribute__((tainted_args))' initialized with FNDECL, add an
+   entrypoint to FNDECL to EG (and to its worklist) where the params to
+   FNDECL are marked as tainted.  */
+
+static void
+add_tainted_args_callback (exploded_graph *eg, tree field, tree fndecl,
+			   location_t loc)
+{
+  logger *logger = eg->get_logger ();
+
+  LOG_SCOPE (logger);
+
+  if (!gimple_has_body_p (fndecl))
+    return;
+
+  const extrinsic_state &ext_state = eg->get_ext_state ();
+
+  function *fun = DECL_STRUCT_FUNCTION (fndecl);
+  gcc_assert (fun);
+
+  program_point point
+    = program_point::from_function_entry (eg->get_supergraph (), fun);
+  program_state state (ext_state);
+  state.push_frame (ext_state, fun);
+
+  if (!mark_params_as_tainted (&state, fndecl, ext_state))
+    return;
+
+  if (!state.m_valid)
+    return;
+
+  exploded_node *enode = eg->get_or_create_node (point, state, NULL);
+  if (!enode)
+    {
+      /* Bail out whether or not logging is enabled.  */
+      if (logger)
+	logger->log ("did not create enode for tainted_args %qE entrypoint",
+		     fndecl);
+      return;
+    }
+
+  if (logger)
+    logger->log ("created EN %i for tainted_args %qE entrypoint",
+		 enode->m_index, fndecl);
+
+  tainted_args_call_info *info
+    = new tainted_args_call_info (field, fndecl, loc);
+  eg->add_edge (eg->get_origin (), enode, NULL, info);
+}
+
+/* Callback for walk_tree for finding callbacks within initializers;
+   ensure that any callback initializer where the corresponding field is
+   marked with '__attribute__((tainted_args))' is treated as an entrypoint
+   to the analysis, special-casing that the inputs to the callback are
+   untrustworthy.  */
+
+static tree
+add_any_callbacks (tree *tp, int *, void *data)
+{
+  exploded_graph *eg = (exploded_graph *)data;
+  if (TREE_CODE (*tp) == CONSTRUCTOR)
+    {
+      /* Find fields with the "tainted_args" attribute.
+	 walk_tree only walks the values, not the index values;
+	 look at the index values.  */
+      unsigned HOST_WIDE_INT idx;
+      constructor_elt *ce;
+
+      for (idx = 0; vec_safe_iterate (CONSTRUCTOR_ELTS (*tp), idx, &ce);
+	   idx++)
+	if (ce->index && TREE_CODE (ce->index) == FIELD_DECL)
+	  if (lookup_attribute ("tainted_args", DECL_ATTRIBUTES (ce->index)))
+	    {
+	      tree value = ce->value;
+	      if (TREE_CODE (value) == ADDR_EXPR
+		  && TREE_CODE (TREE_OPERAND (value, 0)) == FUNCTION_DECL)
+		add_tainted_args_callback (eg, ce->index,
+					   TREE_OPERAND (value, 0),
+					   EXPR_LOCATION (value));
+	    }
+    }
+
+  return NULL_TREE;
+}
+
 /* Add initial nodes to EG, with entrypoints for externally-callable
    functions.  */
 
@@ -2659,6 +2964,17 @@ exploded_graph::build_initial_worklist ()
 	  logger->log ("did not create enode for %qE entrypoint", fun->decl);
       }
   }
+
+  /* Find callbacks that are reachable from global initializers.  */
+  varpool_node *vpnode;
+  FOR_EACH_VARIABLE (vpnode)
+    {
+      tree decl = vpnode->decl;
+      tree init = DECL_INITIAL (decl);
+      if (!init)
+	continue;
+      walk_tree (&init, add_any_callbacks, this, NULL);
+    }
 }
 
 /* The main loop of the analysis.
diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index bdf72ce385c..4fb5dbd1409 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -117,6 +117,7 @@ static tree handle_no_profile_instrument_function_attribute (tree *, tree,
 							     tree, int, bool *);
 static tree handle_malloc_attribute (tree *, tree, tree, int, bool *);
 static tree handle_dealloc_attribute (tree *, tree, tree, int, bool *);
+static tree handle_tainted_args_attribute (tree *, tree, tree, int, bool *);
 static tree handle_returns_twice_attribute (tree *, tree, tree, int, bool *);
 static tree handle_no_limit_stack_attribute (tree *, tree, tree, int,
 					     bool *);
@@ -548,6 +549,8 @@ const struct attribute_spec c_common_attribute_table[] =
 			      handle_objc_nullability_attribute, NULL },
   { "*dealloc",                1, 2, true, false, false, false,
 			      handle_dealloc_attribute, NULL },
+  { "tainted_args",	      0, 0, true,  false, false, false,
+			      handle_tainted_args_attribute, NULL },
   { NULL,                     0, 0, false, false, false, false, NULL, NULL }
 };
 
@@ -5774,6 +5777,39 @@ handle_objc_nullability_attribute (tree *node, tree name, tree args,
   return NULL_TREE;
 }
 
+/* Handle a "tainted_args" attribute; arguments as in
+   struct attribute_spec.handler.  */
+
+static tree
+handle_tainted_args_attribute (tree *node, tree name, tree, int,
+			       bool *no_add_attrs)
+{
+  if (TREE_CODE (*node) != FUNCTION_DECL
+      && TREE_CODE (*node) != FIELD_DECL)
+    {
+      warning (OPT_Wattributes, "%qE attribute ignored; valid only "
+	       "for functions and function pointer fields",
+	       name);
+      *no_add_attrs = true;
+      return NULL_TREE;
+    }
+
+  if (TREE_CODE (*node) == FIELD_DECL
+      && !(TREE_CODE (TREE_TYPE (*node)) == POINTER_TYPE
+	   && TREE_CODE (TREE_TYPE (TREE_TYPE (*node))) == FUNCTION_TYPE))
+    {
+      warning (OPT_Wattributes, "%qE attribute ignored;"
+	       " field must be a function pointer",
+	       name);
+      *no_add_attrs = true;
+      return NULL_TREE;
+    }
+
+  *no_add_attrs = false; /* OK */
+
+  return NULL_TREE;
+}
+
 /* Attempt to partially validate a single attribute ATTR as if
    it were to be applied to an entity OPER.  */
 
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 637124a7172..20a5944256a 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -2512,7 +2512,8 @@ variable declarations (@pxref{Variable Attributes}),
 labels (@pxref{Label Attributes}),
 enumerators (@pxref{Enumerator Attributes}),
 statements (@pxref{Statement Attributes}),
-and types (@pxref{Type Attributes}).
+types (@pxref{Type Attributes}),
+and on field declarations (for @code{tainted_args}).
 
 There is some overlap between the purposes of attributes and pragmas
 (@pxref{Pragmas,,Pragmas Accepted by GCC}).  It has been
@@ -4009,6 +4010,26 @@ addition to creating a symbol version (as if
 @code{"@var{name2}@@@var{nodename}"} was used) the version will be also used
 to resolve @var{name2} by the linker.
 
+@item tainted_args
+@cindex @code{tainted_args} function attribute
+The @code{tainted_args} attribute is used to specify that a function is called
+in a way that requires sanitization of its arguments, such as a system
+call in an operating system kernel.  Such a function can be considered part
+of the ``attack surface'' of the program.  The attribute can be used both
+on function declarations, and on field declarations containing function
+pointers.  In the latter case, any function used as an initializer of
+such a callback field will be treated as being called with tainted
+arguments.
+
+The analyzer will pay particular attention to such functions when both
+@option{-fanalyzer} and @option{-fanalyzer-checker=taint} are supplied,
+potentially issuing warnings guarded by
+@option{-Wanalyzer-tainted-allocation-size},
+@option{-Wanalyzer-tainted-array-index},
+@option{-Wanalyzer-tainted-divisor},
+@option{-Wanalyzer-tainted-offset},
+and @option{-Wanalyzer-tainted-size}.
+
 @item target_clones (@var{options})
 @cindex @code{target_clones} function attribute
 The @code{target_clones} attribute is used to specify that a function
diff --git a/gcc/testsuite/gcc.dg/analyzer/attr-tainted_args-1.c b/gcc/testsuite/gcc.dg/analyzer/attr-tainted_args-1.c
new file mode 100644
index 00000000000..e1d87c9cece
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/attr-tainted_args-1.c
@@ -0,0 +1,88 @@
+// TODO: remove need for this option
+/* { dg-additional-options "-fanalyzer-checker=taint" } */
+
+#include "analyzer-decls.h"
+
+struct arg_buf
+{
+  int i;
+  int j;
+};
+
+/* Example of marking a function as tainted.  */
+
+void __attribute__((tainted_args))
+test_1 (int i, void *p, char *q)
+{
+  /* There should be a single enode,
+     for the "tainted" entry to the function.  */
+  __analyzer_dump_exploded_nodes (0); /* { dg-warning "1 processed enode" } */
+
+  __analyzer_dump_state ("taint", i); /* { dg-warning "state: 'tainted'" } */
+  __analyzer_dump_state ("taint", p); /* { dg-warning "state: 'tainted'" } */
+  __analyzer_dump_state ("taint", q); /* { dg-warning "state: 'tainted'" } */
+  __analyzer_dump_state ("taint", *q); /* { dg-warning "state: 'tainted'" } */
+
+  struct arg_buf *args = p;
+  __analyzer_dump_state ("taint", args->i); /* { dg-warning "state: 'tainted'" } */
+  __analyzer_dump_state ("taint", args->j); /* { dg-warning "state: 'tainted'" } */  
+}
+
+/* Example of marking a callback field as tainted.  */
+
+struct s2
+{
+  void (*cb) (int, void *, char *)
+    __attribute__((tainted_args));
+};
+
+/* Function not marked as tainted.  */
+
+void
+test_2a (int i, void *p, char *q)
+{
+  /* There should be a single enode,
+     for the normal entry to the function.  */
+  __analyzer_dump_exploded_nodes (0); /* { dg-warning "1 processed enode" } */
+
+  __analyzer_dump_state ("taint", i); /* { dg-warning "state: 'start'" } */
+  __analyzer_dump_state ("taint", p); /* { dg-warning "state: 'start'" } */
+  __analyzer_dump_state ("taint", q); /* { dg-warning "state: 'start'" } */
+
+  struct arg_buf *args = p;
+  __analyzer_dump_state ("taint", args->i); /* { dg-warning "state: 'start'" } */
+  __analyzer_dump_state ("taint", args->j); /* { dg-warning "state: 'start'" } */  
+}
+
+/* Function referenced via t2b.cb, marked as "tainted".  */
+
+void
+test_2b (int i, void *p, char *q)
+{
+  /* There should be two enodes:
+     one for the direct call, and one for the "tainted" entry to the function.  */
+  __analyzer_dump_exploded_nodes (0); /* { dg-warning "2 processed enodes" } */
+}
+
+/* Callback used via t2c.cb, marked as "tainted".  */
+void
+__analyzer_test_2c (int i, void *p, char *q)
+{
+  /* There should be a single enode,
+     for the "tainted" entry to the function.  */
+  __analyzer_dump_exploded_nodes (0); /* { dg-warning "1 processed enode" } */
+
+  __analyzer_dump_state ("taint", i); /* { dg-warning "state: 'tainted'" } */
+  __analyzer_dump_state ("taint", p); /* { dg-warning "state: 'tainted'" } */
+  __analyzer_dump_state ("taint", q); /* { dg-warning "state: 'tainted'" } */
+}
+
+struct s2 t2b =
+{
+  .cb = test_2b
+};
+
+struct s2 t2c =
+{
+  .cb = __analyzer_test_2c
+};
diff --git a/gcc/testsuite/gcc.dg/analyzer/attr-tainted_args-misuses.c b/gcc/testsuite/gcc.dg/analyzer/attr-tainted_args-misuses.c
new file mode 100644
index 00000000000..4b0dc915059
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/attr-tainted_args-misuses.c
@@ -0,0 +1,6 @@
+int not_a_fn __attribute__ ((tainted_args)); /* { dg-warning "'tainted_args' attribute ignored; valid only for functions and function pointer fields" } */
+
+struct s
+{
+  int f __attribute__ ((tainted_args)); /* { dg-warning "'tainted_args' attribute ignored; field must be a function pointer" } */
+};
diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-2210-1.c b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-2210-1.c
new file mode 100644
index 00000000000..b44be993568
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-2210-1.c
@@ -0,0 +1,93 @@
+/* "The osf_getsysinfo function in arch/alpha/kernel/osf_sys.c in the
+   Linux kernel before 2.6.39.4 on the Alpha platform does not properly
+   restrict the data size for GSI_GET_HWRPB operations, which allows
+   local users to obtain sensitive information from kernel memory via
+   a crafted call."
+
+   Fixed in 3d0475119d8722798db5e88f26493f6547a4bb5b on linux-2.6.39.y
+   in linux-stable.  */
+
+// TODO: remove need for this option:
+/* { dg-additional-options "-fanalyzer-checker=taint" } */
+
+#include "analyzer-decls.h"
+#include "test-uaccess.h"
+
+/* Adapted from include/linux/linkage.h.  */
+
+#define asmlinkage
+
+/* Adapted from include/linux/syscalls.h.  */
+
+#define __SC_DECL1(t1, a1)	t1 a1
+#define __SC_DECL2(t2, a2, ...) t2 a2, __SC_DECL1(__VA_ARGS__)
+#define __SC_DECL3(t3, a3, ...) t3 a3, __SC_DECL2(__VA_ARGS__)
+#define __SC_DECL4(t4, a4, ...) t4 a4, __SC_DECL3(__VA_ARGS__)
+#define __SC_DECL5(t5, a5, ...) t5 a5, __SC_DECL4(__VA_ARGS__)
+#define __SC_DECL6(t6, a6, ...) t6 a6, __SC_DECL5(__VA_ARGS__)
+
+#define SYSCALL_DEFINEx(x, sname, ...)				\
+	__SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
+
+#define SYSCALL_DEFINE(name) asmlinkage long sys_##name
+#define __SYSCALL_DEFINEx(x, name, ...)					\
+	asmlinkage __attribute__((tainted_args)) \
+	long sys##name(__SC_DECL##x(__VA_ARGS__))
+
+#define SYSCALL_DEFINE5(name, ...) SYSCALL_DEFINEx(5, _##name, __VA_ARGS__)
+
+/* Adapted from arch/alpha/include/asm/hwrpb.h.  */
+
+struct hwrpb_struct {
+	unsigned long phys_addr;	/* check: physical address of the hwrpb */
+	unsigned long id;		/* check: "HWRPB\0\0\0" */
+	unsigned long revision;
+	unsigned long size;		/* size of hwrpb */
+	/* [...snip...] */
+};
+
+extern struct hwrpb_struct *hwrpb;
+
+/* Adapted from arch/alpha/kernel/osf_sys.c.  */
+
+SYSCALL_DEFINE5(osf_getsysinfo, unsigned long, op, void __user *, buffer,
+		unsigned long, nbytes, int __user *, start, void __user *, arg)
+{
+	/* [...snip...] */
+
+	__analyzer_dump_state ("taint", nbytes);  /* { dg-warning "tainted" } */
+
+	/* TODO: should have an event explaining why "nbytes" is treated as
+	   attacker-controlled.  */
+
+	/* case GSI_GET_HWRPB: */
+		if (nbytes < sizeof(*hwrpb))
+			return -1;
+
+		__analyzer_dump_state ("taint", nbytes);  /* { dg-warning "has_lb" } */
+
+		if (copy_to_user(buffer, hwrpb, nbytes) != 0) /* { dg-warning "use of attacker-controlled value 'nbytes' as size without upper-bounds checking" } */
+			return -2;
+
+		return 1;
+
+	/* [...snip...] */
+}
+
+/* With the fix for the sense of the size comparison.  */
+
+SYSCALL_DEFINE5(osf_getsysinfo_fixed, unsigned long, op, void __user *, buffer,
+		unsigned long, nbytes, int __user *, start, void __user *, arg)
+{
+	/* [...snip...] */
+
+	/* case GSI_GET_HWRPB: */
+		if (nbytes > sizeof(*hwrpb))
+			return -1;
+		if (copy_to_user(buffer, hwrpb, nbytes) != 0) /* { dg-bogus "attacker-controlled" } */
+			return -2;
+
+		return 1;
+
+	/* [...snip...] */
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-1.c b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-1.c
new file mode 100644
index 00000000000..328c5799145
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-1.c
@@ -0,0 +1,38 @@
+/* See notes in this header.  */
+#include "taint-CVE-2020-13143.h"
+
+// TODO: remove need for this option
+/* { dg-additional-options "-fanalyzer-checker=taint" } */
+
+struct configfs_attribute {
+	/* [...snip...] */
+	ssize_t (*store)(struct config_item *, const char *, size_t) /* { dg-message "\\(1\\) field 'store' of 'struct configfs_attribute' is marked with '__attribute__\\(\\(tainted_args\\)\\)'" } */
+		__attribute__((tainted_args)); /* (this is added).  */
+};
+static inline struct gadget_info *to_gadget_info(struct config_item *item)
+{
+	 return container_of(to_config_group(item), struct gadget_info, group);
+}
+
+static ssize_t gadget_dev_desc_UDC_store(struct config_item *item,
+		const char *page, size_t len)
+{
+	struct gadget_info *gi = to_gadget_info(item);
+	char *name;
+	int ret;
+
+#if 0
+	/* FIXME: this is the fix.  */
+	if (strlen(page) < len)
+		return -EOVERFLOW;
+#endif
+
+	name = kstrdup(page, GFP_KERNEL);
+	if (!name)
+		return -ENOMEM;
+	if (name[len - 1] == '\n') /* { dg-warning "use of attacker-controlled value 'len \[^\n\r\]+' as offset without upper-bounds checking" } */
+		name[len - 1] = '\0'; /* { dg-warning "use of attacker-controlled value 'len \[^\n\r\]+' as offset without upper-bounds checking" } */
+	/* [...snip...] */
+}
+
+CONFIGFS_ATTR(gadget_dev_desc_, UDC); /* { dg-message "\\(2\\) function 'gadget_dev_desc_UDC_store' used as initializer for field 'store' marked with '__attribute__\\(\\(tainted_args\\)\\)'" } */
diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-2.c b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-2.c
new file mode 100644
index 00000000000..c74a460b01e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-2.c
@@ -0,0 +1,32 @@
+/* See notes in this header.  */
+#include "taint-CVE-2020-13143.h"
+
+// TODO: remove need for this option
+/* { dg-additional-options "-fanalyzer-checker=taint" } */
+
+struct configfs_attribute {
+	/* [...snip...] */
+	ssize_t (*store)(struct config_item *, const char *, size_t) /* { dg-message "\\(1\\) field 'store' of 'struct configfs_attribute' is marked with '__attribute__\\(\\(tainted_args\\)\\)'" } */
+		__attribute__((tainted_args)); /* (this is added).  */
+};
+
+/* Highly simplified version.  */
+
+static ssize_t gadget_dev_desc_UDC_store(struct config_item *item,
+		const char *page, size_t len)
+{
+	/* TODO: ought to have state_change_event talking about where the tainted value comes from.  */
+
+	char *name;
+	/* [...snip...] */
+
+	name = kstrdup(page, GFP_KERNEL);
+	if (!name)
+		return -ENOMEM;
+	if (name[len - 1] == '\n') /* { dg-warning "use of attacker-controlled value 'len \[^\n\r\]+' as offset without upper-bounds checking" } */
+		name[len - 1] = '\0';  /* { dg-warning "use of attacker-controlled value 'len \[^\n\r\]+' as offset without upper-bounds checking" } */
+	/* [...snip...] */
+	return 0;
+}
+
+CONFIGFS_ATTR(gadget_dev_desc_, UDC); /* { dg-message "\\(2\\) function 'gadget_dev_desc_UDC_store' used as initializer for field 'store' marked with '__attribute__\\(\\(tainted_args\\)\\)'" } */
diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143.h b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143.h
new file mode 100644
index 00000000000..0ba023539af
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143.h
@@ -0,0 +1,91 @@
+/* Shared header for the various taint-CVE-2020-13143-*.c tests.
+   
+   "gadget_dev_desc_UDC_store in drivers/usb/gadget/configfs.c in the
+   Linux kernel 3.16 through 5.6.13 relies on kstrdup without considering
+   the possibility of an internal '\0' value, which allows attackers to
+   trigger an out-of-bounds read, aka CID-15753588bcd4."
+
+   Fixed by 15753588bcd4bbffae1cca33c8ced5722477fe1f on linux-5.7.y
+   in linux-stable.  */
+
+// TODO: remove need for this option
+/* { dg-additional-options "-fanalyzer-checker=taint" } */
+
+#include <stddef.h>
+
+/* Adapted from include/uapi/asm-generic/posix_types.h  */
+
+typedef unsigned int     __kernel_size_t;
+typedef int              __kernel_ssize_t;
+
+/* Adapted from include/linux/types.h  */
+
+//typedef __kernel_size_t		size_t;
+typedef __kernel_ssize_t	ssize_t;
+
+/* Adapted from include/linux/kernel.h  */
+
+#define container_of(ptr, type, member) ({				\
+	void *__mptr = (void *)(ptr);					\
+	/* [...snip...] */						\
+	((type *)(__mptr - offsetof(type, member))); })
+
+/* Adapted from include/linux/configfs.h  */
+
+struct config_item {
+	/* [...snip...] */
+};
+
+struct config_group {
+	struct config_item		cg_item;
+	/* [...snip...] */
+};
+
+static inline struct config_group *to_config_group(struct config_item *item)
+{
+	return item ? container_of(item,struct config_group,cg_item) : NULL;
+}
+
+#define CONFIGFS_ATTR(_pfx, _name)				\
+static struct configfs_attribute _pfx##attr_##_name = {	\
+	/* [...snip...] */				\
+	.store		= _pfx##_name##_store,		\
+}
+
+/* Adapted from include/linux/compiler.h  */
+
+#define __force
+
+/* Adapted from include/asm-generic/errno-base.h  */
+
+#define	ENOMEM		12	/* Out of memory */
+
+/* Adapted from include/linux/types.h  */
+
+#define __bitwise__
+typedef unsigned __bitwise__ gfp_t;
+
+/* Adapted from include/linux/gfp.h  */
+
+#define ___GFP_WAIT		0x10u
+#define ___GFP_IO		0x40u
+#define ___GFP_FS		0x80u
+#define __GFP_WAIT	((__force gfp_t)___GFP_WAIT)
+#define __GFP_IO	((__force gfp_t)___GFP_IO)
+#define __GFP_FS	((__force gfp_t)___GFP_FS)
+#define GFP_KERNEL  (__GFP_WAIT | __GFP_IO | __GFP_FS)
+
+/* Adapted from include/linux/compiler_attributes.h  */
+
+#define __malloc                        __attribute__((__malloc__))
+
+/* Adapted from include/linux/string.h  */
+
+extern char *kstrdup(const char *s, gfp_t gfp) __malloc;
+
+/* Adapted from drivers/usb/gadget/configfs.c  */
+
+struct gadget_info {
+	struct config_group group;
+	/* [...snip...] */
+};
diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c b/gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c
new file mode 100644
index 00000000000..80d8f0b8247
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c
@@ -0,0 +1,21 @@
+// TODO: remove need for this option:
+/* { dg-additional-options "-fanalyzer-checker=taint" } */
+
+#include "analyzer-decls.h"
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+/* malloc with tainted size from a syscall.  */
+
+void *p;
+
+void __attribute__((tainted_args))
+test_1 (size_t sz) /* { dg-message "\\(1\\) function 'test_1' marked with '__attribute__\\(\\(tainted_args\\)\\)'" } */
+{
+  /* TODO: should have a message saying why "sz" is tainted, e.g.
+     "treating 'sz' as attacker-controlled because 'test_1' is marked with '__attribute__((tainted_args))'"  */
+
+  p = malloc (sz); /* { dg-warning "use of attacker-controlled value 'sz' as allocation size without upper-bounds checking" "warning" } */
+  /* { dg-message "\\(\[0-9\]+\\) use of attacker-controlled value 'sz' as allocation size without upper-bounds checking" "final event" { target *-*-* } .-1 } */
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c b/gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c
new file mode 100644
index 00000000000..bd47097b1d5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c
@@ -0,0 +1,31 @@
+// TODO: remove need for this option:
+/* { dg-additional-options "-fanalyzer-checker=taint" } */
+
+#include "analyzer-decls.h"
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+/* malloc with tainted size from a syscall.  */
+
+struct arg_buf
+{
+  size_t sz;
+};
+
+void *p;
+
+void __attribute__((tainted_args))
+test_1 (void *data) /* { dg-message "\\(1\\) function 'test_1' marked with '__attribute__\\(\\(tainted_args\\)\\)'" } */
+{
+  /* We should treat pointed-to structs as tainted.  */
+  __analyzer_dump_state ("taint", data); /* { dg-warning "state: 'tainted'" } */
+  
+  struct arg_buf *args = data;
+
+  __analyzer_dump_state ("taint", args); /* { dg-warning "state: 'tainted'" } */
+  __analyzer_dump_state ("taint", args->sz); /* { dg-warning "state: 'tainted'" } */
+  
+  p = malloc (args->sz); /* { dg-warning "use of attacker-controlled value '\\*args.sz' as allocation size without upper-bounds checking" "warning" } */
+  /* { dg-message "\\(\[0-9\]+\\) use of attacker-controlled value '\\*args.sz' as allocation size without upper-bounds checking" "final event" { target *-*-* } .-1 } */
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/test-uaccess.h b/gcc/testsuite/gcc.dg/analyzer/test-uaccess.h
new file mode 100644
index 00000000000..70c9d6309ef
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/test-uaccess.h
@@ -0,0 +1,15 @@
+/* Shared header for testcases for copy_from_user/copy_to_user.  */
+
+/* Adapted from include/linux/compiler.h  */
+
+#define __user
+
+/* Adapted from include/asm-generic/uaccess.h  */
+
+extern int copy_from_user(void *to, const void __user *from, long n)
+  __attribute__((access (write_only, 1, 3),
+		 access (read_only, 2, 3)));
+
+extern long copy_to_user(void __user *to, const void *from, unsigned long n)
+  __attribute__((access (write_only, 1, 3),
+		 access (read_only, 2, 3)));
-- 
2.26.3
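
For anyone wanting to try the function form of the attribute outside the
testsuite, here is a minimal sketch in the spirit of taint-alloc-3.c.  The
TAINTED_ARGS macro, MAX_REQUEST and the handle_request_* names are
hypothetical and not part of the patch.  The macro expands to nothing on
compilers that lack the attribute, so the file should still build there.
The unchecked variant is the pattern that taint-alloc-3.c expects to be
flagged.  The checked variant adds an upper bound, which by analogy with the
"fixed" case in taint-CVE-2011-2210-1.c is assumed to be what the checker
wants to see.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper macro (not from the patch): use the attribute only
   where the compiler reports support for it.  */
#if defined(__has_attribute)
# if __has_attribute(tainted_args)
#  define TAINTED_ARGS __attribute__((tainted_args))
# endif
#endif
#ifndef TAINTED_ARGS
# define TAINTED_ARGS
#endif

/* Hypothetical upper bound on a request size.  */
#define MAX_REQUEST 4096

/* Entry point with attacker-controlled arguments and no bounds check;
   this mirrors test_1 in taint-alloc-3.c.  */
TAINTED_ARGS void *
handle_request_unchecked (size_t sz)
{
  return malloc (sz);
}

/* Same entry point, but rejecting oversized requests before allocating.  */
TAINTED_ARGS void *
handle_request_checked (size_t sz)
{
  if (sz > MAX_REQUEST)
    return NULL;
  return malloc (sz);
}
```

Building this with -fanalyzer -fanalyzer-checker=taint on a compiler that
carries the patch is the way to see which of the two functions is reported.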

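The callback-field form can be sketched the same way, following the shape of
struct s2 in attr-tainted_args-1.c and the configfs example.  Everything
named here (TAINTED_ARGS, struct attr_ops, name_store, call_store) is
hypothetical.  The strlen comparison mirrors the "#if 0" fix shown in
taint-CVE-2020-13143-1.c.  The extra len == 0 guard is my own assumption, to
keep page[len - 1] from underflowing.  Whether the analyzer treats the
strlen comparison as sanitizing 'len' is not something this sketch claims.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper macro (not from the patch): use the attribute only
   where the compiler reports support for it.  */
#if defined(__has_attribute)
# if __has_attribute(tainted_args)
#  define TAINTED_ARGS __attribute__((tainted_args))
# endif
#endif
#ifndef TAINTED_ARGS
# define TAINTED_ARGS
#endif

/* Hypothetical ops table; any function used to initialize 'store' is
   meant to be analyzed as if called with tainted arguments.  */
struct attr_ops
{
  long (*store) (char *page, size_t len) TAINTED_ARGS;
};

/* Strip a trailing newline, refusing lengths that do not fit the string.  */
static long
name_store (char *page, size_t len)
{
  if (len == 0 || strlen (page) < len)
    return -1;
  if (page[len - 1] == '\n')
    page[len - 1] = '\0';
  return (long) len;
}

/* Global initializer: this is the kind of construct that
   add_any_callbacks in the patch walks.  */
static const struct attr_ops name_ops =
{
  .store = name_store
};

/* Hypothetical dispatcher standing in for the framework's indirect call.  */
long
call_store (char *page, size_t len)
{
  return name_ops.store (page, len);
}
```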

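The size-comparison fix from taint-CVE-2011-2210-1.c can also be exercised
in user space.  This is only a sketch: struct info_block, g_info,
mock_copy_out and get_info_fixed are hypothetical stand-ins for
hwrpb_struct, hwrpb, copy_to_user and the syscall, and the mock simply calls
memcpy.  The point is the sense of the comparison.  The tainted size has to
be rejected when it is larger than the source object, not when it is
smaller.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper macro (not from the patch): use the attribute only
   where the compiler reports support for it.  */
#if defined(__has_attribute)
# if __has_attribute(tainted_args)
#  define TAINTED_ARGS __attribute__((tainted_args))
# endif
#endif
#ifndef TAINTED_ARGS
# define TAINTED_ARGS
#endif

/* Hypothetical stand-in for struct hwrpb_struct.  */
struct info_block
{
  unsigned long phys_addr;
  unsigned long id;
  unsigned long revision;
  unsigned long size;
};

static struct info_block g_info = { 1, 2, 3, sizeof (struct info_block) };

/* Hypothetical stand-in for copy_to_user; returns 0 on success.  */
static long
mock_copy_out (void *to, const void *from, unsigned long n)
{
  memcpy (to, from, n);
  return 0;
}

/* Entry point with an attacker-controlled size; the upper-bound check
   comes before the copy, as in osf_getsysinfo_fixed in the test.  */
TAINTED_ARGS long
get_info_fixed (void *buffer, unsigned long nbytes)
{
  if (nbytes > sizeof (g_info))
    return -1;
  if (mock_copy_out (buffer, &g_info, nbytes) != 0)
    return -2;
  return 1;
}
```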

end of thread, other threads:[~2022-01-14  1:25 UTC | newest]

Thread overview: 39+ messages
2021-11-13 20:37 [PATCH 0/6] RFC: adding support to GCC for detecting trust boundaries David Malcolm
2021-11-13 20:37 ` [PATCH 1a/6] RFC: Implement "#pragma GCC custom_address_space" David Malcolm
2021-11-13 20:37 ` [PATCH 1b/6] Add __attribute__((untrusted)) David Malcolm
2021-12-09 22:54   ` Martin Sebor
2022-01-06 15:10     ` David Malcolm
2022-01-06 18:59       ` Martin Sebor
2021-11-13 20:37 ` [PATCH 2/6] Add returns_zero_on_success/failure attributes David Malcolm
2021-11-15  7:03   ` Prathamesh Kulkarni
2021-11-15 14:45     ` Peter Zijlstra
2021-11-15 22:30       ` David Malcolm
2021-11-15 22:12     ` David Malcolm
2021-11-17  9:23       ` Prathamesh Kulkarni
2021-11-17 22:43         ` Joseph Myers
2021-11-18 20:08           ` Segher Boessenkool
2021-11-18 23:45             ` David Malcolm
2021-11-19 21:52               ` Segher Boessenkool
2021-11-18 23:34           ` David Malcolm
2021-12-06 18:34             ` Martin Sebor
2021-11-18 23:15         ` David Malcolm
2021-11-13 20:37 ` [PATCH 4a/6] analyzer: implement region::untrusted_p in terms of custom address spaces David Malcolm
2021-11-13 20:37 ` [PATCH 4b/6] analyzer: implement region::untrusted_p in terms of __attribute__((untrusted)) David Malcolm
2021-11-13 20:37 ` [PATCH 5/6] analyzer: use region::untrusted_p in taint detection David Malcolm
2021-11-13 20:37 ` [PATCH 6/6] Add __attribute__ ((tainted)) David Malcolm
2022-01-06 14:08   ` PING (C/C++): " David Malcolm
2022-01-10 21:36     ` PING^2 " David Malcolm
2022-01-12  4:36       ` Jason Merrill
2022-01-12 15:33         ` David Malcolm
2022-01-13 19:08           ` Jason Merrill
2022-01-14  1:25             ` [committed] Add __attribute__ ((tainted_args)) David Malcolm
2021-11-13 23:20 ` [PATCH 0/6] RFC: adding support to GCC for detecting trust boundaries Peter Zijlstra
2021-11-14  2:54   ` David Malcolm
2021-11-14 13:54 ` Miguel Ojeda
2021-12-06 18:12 ` Martin Sebor
2021-12-06 19:40   ` Segher Boessenkool
2021-12-09  0:06     ` David Malcolm
2021-12-09  0:41       ` Segher Boessenkool
2021-12-09 16:42     ` Martin Sebor
2021-12-09 23:40       ` Segher Boessenkool
2021-12-08 23:11   ` David Malcolm
