netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Curious bpf regression in 5.18 already fixed in stable 5.18.3
@ 2022-06-15 16:45 Maciej Żenczykowski
  2022-06-15 16:57 ` Maciej Żenczykowski
  0 siblings, 1 reply; 13+ messages in thread
From: Maciej Żenczykowski @ 2022-06-15 16:45 UTC (permalink / raw)
  To: Linux NetDev, BPF Mailing List, Maciej Żenczykowski,
	Stanislav Fomichev, Alexei Starovoitov, Martin KaFai Lau,
	Sasha Levin, Carlos Llamas

Are you folks aware that:

'bpf: Move rcu lock management out of BPF_PROG_RUN routines'

fixes a weird regression where sendmsg with an egress tc bpf program
denying it was returning EFAULT instead of EPERM

I've confirmed vanilla 5.18.0 is broken, and all it takes is
cherrypicking that specific stable 5.18.x patch [
710a8989b4b4067903f5b61314eda491667b6ab3 ] to fix behaviour.

This was not a flaky failure... but a 100% reproducible behavioural
breakage/failure in the test case at
https://cs.android.com/android/platform/superproject/+/master:kernel/tests/net/test/bpf_test.py;l=517
(where 5.18 would return EFAULT instead of EPERM)

---

A non standalone but perhaps useful (for reading) simplification of
the test case follows.

I was planning on reporting it, hence why I was trying to trim it down
and have this ready anyway, only to discover it's already been fixed.
But the commit message seems to be unrelated...  some sort of compiler
optimization shenanigans?  Missing test case opportunity?

Note: that I run this on x86_64 UML - that might matter??

#!/usr/bin/python
# extracted snippet from AOSP, Apache2 licensed

import csocket
import cstruct
import ctypes
import errno
import os
import platform
import socket
import unittest

__NR_bpf = {  # pylint: disable=invalid-name
    "aarch64-32bit": 386,
    "aarch64-64bit": 280,
    "armv7l-32bit": 386,
    "armv8l-32bit": 386,
    "armv8l-64bit": 280,
    "i686-32bit": 357,
    "i686-64bit": 321,
    "x86_64-32bit": 357,
    "x86_64-64bit": 321,
}[os.uname()[4] + "-" + platform.architecture()[0]]

LOG_LEVEL = 1
LOG_SIZE = 65536

BPF_PROG_LOAD = 5
BPF_PROG_ATTACH = 8
BPF_PROG_DETACH = 9

BPF_PROG_TYPE_CGROUP_SKB = 8

BPF_CGROUP_INET_EGRESS = 1

BPF_REG_0 = 0

BPF_JMP = 0x05
BPF_K = 0x00
BPF_ALU64 = 0x07
BPF_MOV = 0xb0
BPF_EXIT = 0x90

BpfAttrProgLoad = cstruct.Struct(
    "bpf_attr_prog_load", "=IIQQIIQI", "prog_type insn_cnt insns"
    " license log_level log_size log_buf kern_version")
BpfAttrProgAttach = cstruct.Struct(
    "bpf_attr_prog_attach", "=III", "target_fd attach_bpf_fd attach_type")
BpfInsn = cstruct.Struct("bpf_insn", "=BBhi", "code dst_src_reg off imm")

libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)


def BpfSyscall(op, attr):
  ret = libc.syscall(__NR_bpf, op, csocket.VoidPointer(attr), len(attr))
  csocket.MaybeRaiseSocketError(ret)
  return ret


def BpfProgLoad(prog_type, instructions, prog_license=b"GPL"):
  bpf_prog = "".join(instructions)
  insn_buff = ctypes.create_string_buffer(bpf_prog)
  gpl_license = ctypes.create_string_buffer(prog_license)
  log_buf = ctypes.create_string_buffer(b"", LOG_SIZE)
  return BpfSyscall(BPF_PROG_LOAD,
                    BpfAttrProgLoad((prog_type, len(insn_buff) / len(BpfInsn),
                                    ctypes.addressof(insn_buff),
                                    ctypes.addressof(gpl_license), LOG_LEVEL,
                                    LOG_SIZE, ctypes.addressof(log_buf), 0)))


# Attach a eBPF filter to a cgroup
def BpfProgAttach(prog_fd, target_fd, prog_type):
  attr = BpfAttrProgAttach((target_fd, prog_fd, prog_type))
  return BpfSyscall(BPF_PROG_ATTACH, attr)


# Detach a eBPF filter from a cgroup
def BpfProgDetach(target_fd, prog_type):
  attr = BpfAttrProgAttach((target_fd, 0, prog_type))
  return BpfSyscall(BPF_PROG_DETACH, attr)


class BpfCgroupEgressTest(unittest.TestCase):
  def setUp(self):
    self._cg_fd = os.open("/sys/fs/cgroup", os.O_DIRECTORY | os.O_RDONLY)
    BpfProgAttach(BpfProgLoad(BPF_PROG_TYPE_CGROUP_SKB, [
        BpfInsn((BPF_ALU64 | BPF_MOV | BPF_K, BPF_REG_0, 0,
0)).Pack(),  # Mov64Imm(REG0, 0)
        BpfInsn((BPF_JMP | BPF_EXIT, 0, 0, 0)).Pack()                    # Exit
    ]), self._cg_fd, BPF_CGROUP_INET_EGRESS)

  def tearDown(self):
    BpfProgDetach(self._cg_fd, BPF_CGROUP_INET_EGRESS)
    os.close(self._cg_fd)

  def testCgroupEgressBlocked(self):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, 0)
    s.bind(("127.0.0.1", 0))
    addr = s.getsockname()
    # previously:   s.sendto("foo", addr)   would fail with EPERM, but
on 5.18+ it EFAULTs
    self.assertRaisesRegexp(EnvironmentError,
os.strerror(errno.EFAULT), s.sendto, "foo", addr)

if __name__ == "__main__":
  unittest.main()

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Curious bpf regression in 5.18 already fixed in stable 5.18.3
  2022-06-15 16:45 Curious bpf regression in 5.18 already fixed in stable 5.18.3 Maciej Żenczykowski
@ 2022-06-15 16:57 ` Maciej Żenczykowski
  2022-06-15 17:37   ` Alexei Starovoitov
  0 siblings, 1 reply; 13+ messages in thread
From: Maciej Żenczykowski @ 2022-06-15 16:57 UTC (permalink / raw)
  To: Linux NetDev, BPF Mailing List, Maciej Żenczykowski,
	Stanislav Fomichev, Alexei Starovoitov, Martin KaFai Lau,
	Sasha Levin, Carlos Llamas

On Wed, Jun 15, 2022 at 9:45 AM Maciej Żenczykowski
<zenczykowski@gmail.com> wrote:
>
> Are you folks aware that:
>
> 'bpf: Move rcu lock management out of BPF_PROG_RUN routines'
>
> fixes a weird regression where sendmsg with an egress tc bpf program
> denying it was returning EFAULT instead of EPERM
>
> I've confirmed vanilla 5.18.0 is broken, and all it takes is
> cherrypicking that specific stable 5.18.x patch [
> 710a8989b4b4067903f5b61314eda491667b6ab3 ] to fix behaviour.
>
> This was not a flaky failure... but a 100% reproducible behavioural
> breakage/failure in the test case at
> https://cs.android.com/android/platform/superproject/+/master:kernel/tests/net/test/bpf_test.py;l=517
> (where 5.18 would return EFAULT instead of EPERM)

I bisected on 5.18.x to find the fixing CL, so I don't know which CL
actually caused the breakage.

sdf says:
5.15 is where they rewrote defines to funcs, so there is still
something else involved it seems

b8bd3ee1971d1edbc53cf322c149ca0227472e56 this is where we added EFAULT in 5.16
(we've added a mechanism to return custom errno, I wonder if some of
that is related)

and that this EFAULT breakage is not something he was expecting to fix...
so it's some sort of unintended consequence.

I recall that:
- vanilla 5.15 and 5.16 are definitely good
- I think the only regression in 5.17 is an unrelated icmp socket one
- so from a bpf perspective it was also good.
- 5.18 had 3 regressions: icmp sockets, the pf_key regression (fixed
via revert in 5.18.4) plus this bpf one

The bad pf_key change being reverted in 5.18.4 is why I even switched
from dev/test against 5.18 to against 5.18.4
and noticed that this was already fixed before I could even report it...

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Curious bpf regression in 5.18 already fixed in stable 5.18.3
  2022-06-15 16:57 ` Maciej Żenczykowski
@ 2022-06-15 17:37   ` Alexei Starovoitov
  2022-06-15 17:46     ` Maciej Żenczykowski
  0 siblings, 1 reply; 13+ messages in thread
From: Alexei Starovoitov @ 2022-06-15 17:37 UTC (permalink / raw)
  To: Maciej Żenczykowski
  Cc: Linux NetDev, BPF Mailing List, Stanislav Fomichev,
	Alexei Starovoitov, Martin KaFai Lau, Sasha Levin, Carlos Llamas

On Wed, Jun 15, 2022 at 9:57 AM Maciej Żenczykowski <maze@google.com> wrote:
> >
> > I've confirmed vanilla 5.18.0 is broken, and all it takes is
> > cherrypicking that specific stable 5.18.x patch [
> > 710a8989b4b4067903f5b61314eda491667b6ab3 ] to fix behaviour.
...
> b8bd3ee1971d1edbc53cf322c149ca0227472e56 this is where we added EFAULT in 5.16

There are no such sha-s in the upstream kernel.
Sorry we cannot help with debugging of android kernels.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Curious bpf regression in 5.18 already fixed in stable 5.18.3
  2022-06-15 17:37   ` Alexei Starovoitov
@ 2022-06-15 17:46     ` Maciej Żenczykowski
  2022-06-15 17:55       ` sdf
  0 siblings, 1 reply; 13+ messages in thread
From: Maciej Żenczykowski @ 2022-06-15 17:46 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Linux NetDev, BPF Mailing List, Stanislav Fomichev,
	Alexei Starovoitov, Martin KaFai Lau, Sasha Levin, Carlos Llamas

On Wed, Jun 15, 2022 at 10:38 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Jun 15, 2022 at 9:57 AM Maciej Żenczykowski <maze@google.com> wrote:
> > >
> > > I've confirmed vanilla 5.18.0 is broken, and all it takes is
> > > cherrypicking that specific stable 5.18.x patch [
> > > 710a8989b4b4067903f5b61314eda491667b6ab3 ] to fix behaviour.
> ...
> > b8bd3ee1971d1edbc53cf322c149ca0227472e56 this is where we added EFAULT in 5.16
>
> There are no such sha-s in the upstream kernel.
> Sorry we cannot help with debugging of android kernels.

Yes, sdf@ quoted the wrong sha1, it's a clean cherrypick to an
internal branch of
'bpf: Add cgroup helpers bpf_{get,set}_retval to get/set syscall return value'
commit b44123b4a3dcad4664d3a0f72c011ffd4c9c4d93.

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.16.y&id=b44123b4a3dcad4664d3a0f72c011ffd4c9c4d93

Anyway, I think it's unrelated - or at least not the immediate root cause.

Also there's *no* Android kernels involved here.
This is the android net tests failing on vanilla 5.18 and passing on 5.18.3.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Curious bpf regression in 5.18 already fixed in stable 5.18.3
  2022-06-15 17:46     ` Maciej Żenczykowski
@ 2022-06-15 17:55       ` sdf
  2022-06-15 20:26         ` sdf
  0 siblings, 1 reply; 13+ messages in thread
From: sdf @ 2022-06-15 17:55 UTC (permalink / raw)
  To: Maciej Żenczykowski
  Cc: Alexei Starovoitov, Linux NetDev, BPF Mailing List,
	Alexei Starovoitov, Martin KaFai Lau, Sasha Levin, Carlos Llamas

On 06/15, Maciej Żenczykowski wrote:
> On Wed, Jun 15, 2022 at 10:38 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Wed, Jun 15, 2022 at 9:57 AM Maciej Żenczykowski <maze@google.com>  
> wrote:
> > > >
> > > > I've confirmed vanilla 5.18.0 is broken, and all it takes is
> > > > cherrypicking that specific stable 5.18.x patch [
> > > > 710a8989b4b4067903f5b61314eda491667b6ab3 ] to fix behaviour.
> > ...
> > > b8bd3ee1971d1edbc53cf322c149ca0227472e56 this is where we added  
> EFAULT in 5.16
> >
> > There are no such sha-s in the upstream kernel.
> > Sorry we cannot help with debugging of android kernels.

> Yes, sdf@ quoted the wrong sha1, it's a clean cherrypick to an
> internal branch of
> 'bpf: Add cgroup helpers bpf_{get,set}_retval to get/set syscall return  
> value'
> commit b44123b4a3dcad4664d3a0f72c011ffd4c9c4d93.

> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.16.y&id=b44123b4a3dcad4664d3a0f72c011ffd4c9c4d93

> Anyway, I think it's unrelated - or at least not the immediate root cause.

> Also there's *no* Android kernels involved here.
> This is the android net tests failing on vanilla 5.18 and passing on  
> 5.18.3.

Yeah, sorry, didn't mean to send those outside :-)

Attached un-android-ified testcase. Passes on bpf-next, trying to see
what happens on vanilla 5.18. Will update once I get more data..

-- 

#!/usr/bin/python2.7
# extracted snippet from AOSP, Apache2 licensed

import ctypes
import ctypes.util
import re
import errno
import os
import platform
import struct
import socket
import unittest

__NR_bpf = {  # pylint: disable=invalid-name
     "aarch64-32bit": 386,
     "aarch64-64bit": 280,
     "armv7l-32bit": 386,
     "armv8l-32bit": 386,
     "armv8l-64bit": 280,
     "i686-32bit": 357,
     "i686-64bit": 321,
     "x86_64-32bit": 357,
     "x86_64-64bit": 321,
}[os.uname()[4] + "-" + platform.architecture()[0]]

LOG_LEVEL = 1
LOG_SIZE = 65536

BPF_PROG_LOAD = 5
BPF_PROG_ATTACH = 8
BPF_PROG_DETACH = 9

BPF_PROG_TYPE_CGROUP_SKB = 8

BPF_CGROUP_INET_EGRESS = 1

BPF_REG_0 = 0

BPF_JMP = 0x05
BPF_K = 0x00
BPF_ALU64 = 0x07
BPF_MOV = 0xb0
BPF_EXIT = 0x90


def CalcSize(fmt):
   if "A" in fmt:
     fmt = fmt.replace("A", "s")
   # Remove the last digital since it will cause error in python3.
   fmt = (re.split('\d+$', fmt)[0])
   return struct.calcsize(fmt)

def CalcNumElements(fmt):
   prevlen = len(fmt)
   fmt = fmt.replace("S", "")
   numstructs = prevlen - len(fmt)
   size = CalcSize(fmt)
   elements = struct.unpack(fmt, b"\x00" * size)
   return len(elements) + numstructs


def Struct(name, fmt, fieldnames, substructs={}):
   """Function that returns struct classes."""

   class Meta(type):

     def __len__(cls):
       return cls._length

     def __init__(cls, unused_name, unused_bases, namespace):
       # Make the class object have the name that's passed in.
       type.__init__(cls, namespace["_name"], unused_bases, namespace)

   class CStruct(object):
     """Class representing a C-like structure."""

     __metaclass__ = Meta

     # Name of the struct.
     _name = name
     # List of field names.
     _fieldnames = fieldnames
     # Dict mapping field indices to nested struct classes.
     _nested = {}
     # List of string fields that are ASCII strings.
     _asciiz = set()

     _fieldnames = _fieldnames.split(" ")

     # Parse fmt into _format, converting any S format characters to "XXs",
     # where XX is the length of the struct type's packed representation.
     _format = ""
     laststructindex = 0
     for i in range(len(fmt)):
       if fmt[i] == "S":
         # Nested struct. Record the index in our struct it should go into.
         index = CalcNumElements(fmt[:i])
         _nested[index] = substructs[laststructindex]
         laststructindex += 1
         _format += "%ds" % len(_nested[index])
       elif fmt[i] == "A":
         # Null-terminated ASCII string.
         index = CalcNumElements(fmt[:i])
         _asciiz.add(index)
         _format += "s"
       else:
         # Standard struct format character.
         _format += fmt[i]

     _length = CalcSize(_format)

     offset_list = [0]
     last_offset = 0
     for i in range(len(_format)):
       offset = CalcSize(_format[:i])
       if offset > last_offset:
         last_offset = offset
         offset_list.append(offset)

     # A dictionary that maps field names to their offsets in the struct.
     _offsets = dict(list(zip(_fieldnames, offset_list)))

     # Check that the number of field names matches the number of fields.
     numfields = len(struct.unpack(_format, b"\x00" * _length))
     if len(_fieldnames) != numfields:
       raise ValueError("Invalid cstruct: \"%s\" has %d elements, \"%s\"  
has %d."
                        % (fmt, numfields, fieldnames, len(_fieldnames)))

     def _SetValues(self, values):
       # Replace self._values with the given list. We can't do direct  
assignment
       # because of the __setattr__ overload on this class.
       super(CStruct, self).__setattr__("_values", list(values))

     def _Parse(self, data):
       data = data[:self._length]
       values = list(struct.unpack(self._format, data))
       for index, value in enumerate(values):
         if isinstance(value, str) and index in self._nested:
           values[index] = self._nested[index](value)
       self._SetValues(values)

     def __init__(self, tuple_or_bytes=None, **kwargs):
       """Construct an instance of this Struct.

       1. With no args, the whole struct is zero-initialized.
       2. With keyword args, the matching fields are populated; rest are  
zeroed.
       3. With one tuple as the arg, the fields are assigned based on  
position.
       4. With one string arg, the Struct is parsed from bytes.
       """
       if tuple_or_bytes and kwargs:
         raise TypeError(
             "%s: cannot specify both a tuple and keyword args" % self._name)

       if tuple_or_bytes is None:
         # Default construct from null bytes.
         self._Parse("\x00" * len(self))
         # If any keywords were supplied, set those fields.
         for k, v in kwargs.items():
           setattr(self, k, v)
       elif isinstance(tuple_or_bytes, str):
         # Initializing from a string.
         if len(tuple_or_bytes) < self._length:
           raise TypeError("%s requires string of length %d, got %d" %
                           (self._name, self._length, len(tuple_or_bytes)))
         self._Parse(tuple_or_bytes)
       else:
         # Initializing from a tuple.
         if len(tuple_or_bytes) != len(self._fieldnames):
           raise TypeError("%s has exactly %d fieldnames (%d given)" %
                           (self._name, len(self._fieldnames),
                            len(tuple_or_bytes)))
         self._SetValues(tuple_or_bytes)

     def _FieldIndex(self, attr):
       try:
         return self._fieldnames.index(attr)
       except ValueError:
         raise AttributeError("'%s' has no attribute '%s'" %
                              (self._name, attr))

     def __getattr__(self, name):
       return self._values[self._FieldIndex(name)]

     def __setattr__(self, name, value):
       # TODO: check value type against self._format and throw here, or else
       # callers get an unhelpful exception when they call Pack().
       self._values[self._FieldIndex(name)] = value

     def offset(self, name):
       if "." in name:
         raise NotImplementedError("offset() on nested field")
       return self._offsets[name]

     @classmethod
     def __len__(cls):
       return cls._length

     def __ne__(self, other):
       return not self.__eq__(other)

     def __eq__(self, other):
       return (isinstance(other, self.__class__) and
               self._name == other._name and
               self._fieldnames == other._fieldnames and
               self._values == other._values)

     @staticmethod
     def _MaybePackStruct(value):
       if hasattr(value, "__metaclass__"):# and value.__metaclass__ == Meta:
         return value.Pack()
       else:
         return value

     def Pack(self):
       values = [self._MaybePackStruct(v) for v in self._values]
       return struct.pack(self._format, *values)

     def __str__(self):
       def FieldDesc(index, name, value):
         if isinstance(value, str):
           if index in self._asciiz:
             value = value.rstrip("\x00")
           elif any(c not in string.printable for c in value):
             value = value.encode("hex")
         return "%s=%s" % (name, value)

       descriptions = [
           FieldDesc(i, n, v) for i, (n, v) in
           enumerate(zip(self._fieldnames, self._values))]

       return "%s(%s)" % (self._name, ", ".join(descriptions))

     def __repr__(self):
       return str(self)

     def CPointer(self):
       """Returns a C pointer to the serialized structure."""
       buf = ctypes.create_string_buffer(self.Pack())
       # Store the C buffer in the object so it doesn't get garbage  
collected.
       super(CStruct, self).__setattr__("_buffer", buf)
       return ctypes.addressof(self._buffer)

   return CStruct




BpfAttrProgLoad = Struct(
     "bpf_attr_prog_load", "=IIQQIIQI", "prog_type insn_cnt insns"
     " license log_level log_size log_buf kern_version")
BpfAttrProgAttach = Struct(
     "bpf_attr_prog_attach", "=III", "target_fd attach_bpf_fd attach_type")
BpfInsn = Struct("bpf_insn", "=BBhi", "code dst_src_reg off imm")

libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)

def VoidPointer(s):
     return ctypes.cast(s.CPointer(), ctypes.c_void_p)

def MaybeRaiseSocketError(ret):
   if ret < 0:
     errno = ctypes.get_errno()
     raise socket.error(errno, os.strerror(errno))

def BpfSyscall(op, attr):
   ret = libc.syscall(__NR_bpf, op, VoidPointer(attr), len(attr))
   MaybeRaiseSocketError(ret)
   return ret


def BpfProgLoad(prog_type, instructions, prog_license=b"GPL"):
   bpf_prog = b"".join(instructions)
   insn_buff = ctypes.create_string_buffer(bpf_prog)
   gpl_license = ctypes.create_string_buffer(prog_license)
   log_buf = ctypes.create_string_buffer(b"", LOG_SIZE)
   return BpfSyscall(BPF_PROG_LOAD,
                     BpfAttrProgLoad((prog_type, len(instructions),
                                     ctypes.addressof(insn_buff),
                                     ctypes.addressof(gpl_license),  
LOG_LEVEL,
                                     LOG_SIZE, ctypes.addressof(log_buf),  
0)))


# Attach a eBPF filter to a cgroup
def BpfProgAttach(prog_fd, target_fd, prog_type):
   attr = BpfAttrProgAttach((target_fd, prog_fd, prog_type))
   return BpfSyscall(BPF_PROG_ATTACH, attr)


# Detach a eBPF filter from a cgroup
def BpfProgDetach(target_fd, prog_type):
   attr = BpfAttrProgAttach((target_fd, 0, prog_type))
   return BpfSyscall(BPF_PROG_DETACH, attr)


class BpfCgroupEgressTest(unittest.TestCase):
   def setUp(self):
     self._cg_fd = os.open("/sys/fs/cgroup", os.O_DIRECTORY | os.O_RDONLY)
     BpfProgAttach(BpfProgLoad(BPF_PROG_TYPE_CGROUP_SKB, [
         BpfInsn((BPF_ALU64 | BPF_MOV | BPF_K, BPF_REG_0, 0,
0)).Pack(),  # Mov64Imm(REG0, 0)
         BpfInsn((BPF_JMP | BPF_EXIT, 0, 0, 0)).Pack()                    #  
Exit
     ]), self._cg_fd, BPF_CGROUP_INET_EGRESS)

   def tearDown(self):
     BpfProgDetach(self._cg_fd, BPF_CGROUP_INET_EGRESS)
     os.close(self._cg_fd)

   def testCgroupEgressBlocked(self):
     s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, 0)
     s.bind(("127.0.0.1", 0))
     addr = s.getsockname()
     self.assertRaisesRegexp(EnvironmentError, os.strerror(errno.EPERM),  
s.sendto, b"foo", addr)

if __name__ == "__main__":
   unittest.main()

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Curious bpf regression in 5.18 already fixed in stable 5.18.3
  2022-06-15 17:55       ` sdf
@ 2022-06-15 20:26         ` sdf
  2022-06-15 20:32           ` sdf
  0 siblings, 1 reply; 13+ messages in thread
From: sdf @ 2022-06-15 20:26 UTC (permalink / raw)
  To: Maciej Żenczykowski
  Cc: Alexei Starovoitov, Linux NetDev, BPF Mailing List,
	Alexei Starovoitov, Martin KaFai Lau, Sasha Levin, Carlos Llamas,
	zhuyifei

On 06/15, sdf@google.com wrote:
> On 06/15, Maciej Żenczykowski wrote:
> > On Wed, Jun 15, 2022 at 10:38 AM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Wed, Jun 15, 2022 at 9:57 AM Maciej Żenczykowski <maze@google.com>
> > wrote:
> > > > >
> > > > > I've confirmed vanilla 5.18.0 is broken, and all it takes is
> > > > > cherrypicking that specific stable 5.18.x patch [
> > > > > 710a8989b4b4067903f5b61314eda491667b6ab3 ] to fix behaviour.
> > > ...
> > > > b8bd3ee1971d1edbc53cf322c149ca0227472e56 this is where we added
> > EFAULT in 5.16
> > >
> > > There are no such sha-s in the upstream kernel.
> > > Sorry we cannot help with debugging of android kernels.

> > Yes, sdf@ quoted the wrong sha1, it's a clean cherrypick to an
> > internal branch of
> > 'bpf: Add cgroup helpers bpf_{get,set}_retval to get/set syscall return
> > value'
> > commit b44123b4a3dcad4664d3a0f72c011ffd4c9c4d93.

> >  
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.16.y&id=b44123b4a3dcad4664d3a0f72c011ffd4c9c4d93

> > Anyway, I think it's unrelated - or at least not the immediate root  
> cause.

> > Also there's *no* Android kernels involved here.
> > This is the android net tests failing on vanilla 5.18 and passing on
> > 5.18.3.

> Yeah, sorry, didn't mean to send those outside :-)

> Attached un-android-ified testcase. Passes on bpf-next, trying to see
> what happens on vanilla 5.18. Will update once I get more data..

I've bisected the original issue to:

b44123b4a3dc ("bpf: Add cgroup helpers bpf_{get,set}_retval to get/set
syscall return value")

And I believe it's these two lines from the original patch:

  #define BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY(array, ctx, func)		\
  	({						\
@@ -1398,10 +1398,12 @@ out:
  		u32 _ret;				\
  		_ret = BPF_PROG_RUN_ARRAY_CG_FLAGS(array, ctx, func, 0, &_flags); \
  		_cn = _flags & BPF_RET_SET_CN;		\
+		if (_ret && !IS_ERR_VALUE((long)_ret))	\
+			_ret = -EFAULT;	

_ret is u32 and ret gets -1 (ffffffff). IS_ERR_VALUE((long)ffffffff) returns
false in this case because it doesn't sign-expand the argument and  
internally
does ffff_ffff >= ffff_ffff_ffff_f001 comparison.

I'll try to see what I've changed in my unrelated patch to fix it. But I  
think
we should audit all these IS_ERR_VALUE((long)_ret) regardless; they don't
seem to work the way we want them to...

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Curious bpf regression in 5.18 already fixed in stable 5.18.3
  2022-06-15 20:26         ` sdf
@ 2022-06-15 20:32           ` sdf
  2022-06-15 20:39             ` sdf
  0 siblings, 1 reply; 13+ messages in thread
From: sdf @ 2022-06-15 20:32 UTC (permalink / raw)
  To: Maciej Żenczykowski
  Cc: Alexei Starovoitov, Linux NetDev, BPF Mailing List,
	Alexei Starovoitov, Martin KaFai Lau, Sasha Levin, Carlos Llamas,
	zhuyifei

On 06/15, Stanislav Fomichev wrote:
> On 06/15, sdf@google.com wrote:
> > On 06/15, Maciej Żenczykowski wrote:
> > > On Wed, Jun 15, 2022 at 10:38 AM Alexei Starovoitov
> > > <alexei.starovoitov@gmail.com> wrote:
> > > >
> > > > On Wed, Jun 15, 2022 at 9:57 AM Maciej Żenczykowski  
> <maze@google.com>
> > > wrote:
> > > > > >
> > > > > > I've confirmed vanilla 5.18.0 is broken, and all it takes is
> > > > > > cherrypicking that specific stable 5.18.x patch [
> > > > > > 710a8989b4b4067903f5b61314eda491667b6ab3 ] to fix behaviour.
> > > > ...
> > > > > b8bd3ee1971d1edbc53cf322c149ca0227472e56 this is where we added
> > > EFAULT in 5.16
> > > >
> > > > There are no such sha-s in the upstream kernel.
> > > > Sorry we cannot help with debugging of android kernels.
> >
> > > Yes, sdf@ quoted the wrong sha1, it's a clean cherrypick to an
> > > internal branch of
> > > 'bpf: Add cgroup helpers bpf_{get,set}_retval to get/set syscall  
> return
> > > value'
> > > commit b44123b4a3dcad4664d3a0f72c011ffd4c9c4d93.
> >
> > >  
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.16.y&id=b44123b4a3dcad4664d3a0f72c011ffd4c9c4d93
> >
> > > Anyway, I think it's unrelated - or at least not the immediate root  
> cause.
> >
> > > Also there's *no* Android kernels involved here.
> > > This is the android net tests failing on vanilla 5.18 and passing on
> > > 5.18.3.
> >
> > Yeah, sorry, didn't mean to send those outside :-)
> >
> > Attached un-android-ified testcase. Passes on bpf-next, trying to see
> > what happens on vanilla 5.18. Will update once I get more data..

> I've bisected the original issue to:

> b44123b4a3dc ("bpf: Add cgroup helpers bpf_{get,set}_retval to get/set
> syscall return value")

> And I believe it's these two lines from the original patch:

>   #define BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY(array, ctx, func)		\
>   	({						\
> @@ -1398,10 +1398,12 @@ out:
>   		u32 _ret;				\
>   		_ret = BPF_PROG_RUN_ARRAY_CG_FLAGS(array, ctx, func, 0, &_flags); \
>   		_cn = _flags & BPF_RET_SET_CN;		\
> +		if (_ret && !IS_ERR_VALUE((long)_ret))	\
> +			_ret = -EFAULT;	

> _ret is u32 and ret gets -1 (ffffffff). IS_ERR_VALUE((long)ffffffff)  
> returns
> false in this case because it doesn't sign-expand the argument and  
> internally
> does ffff_ffff >= ffff_ffff_ffff_f001 comparison.

> I'll try to see what I've changed in my unrelated patch to fix it. But I  
> think
> we should audit all these IS_ERR_VALUE((long)_ret) regardless; they don't
> seem to work the way we want them to...

Ok, and my patch fixes it because I'm replacing 'u32 _ret' with 'int ret'.

So, basically, with u32 _ret we have to do IS_ERR_VALUE((long)(int)_ret).

Sigh..

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Curious bpf regression in 5.18 already fixed in stable 5.18.3
  2022-06-15 20:32           ` sdf
@ 2022-06-15 20:39             ` sdf
  2022-06-16  1:36               ` Maciej Żenczykowski
  0 siblings, 1 reply; 13+ messages in thread
From: sdf @ 2022-06-15 20:39 UTC (permalink / raw)
  To: Maciej Żenczykowski
  Cc: Alexei Starovoitov, Linux NetDev, BPF Mailing List,
	Alexei Starovoitov, Martin KaFai Lau, Sasha Levin, Carlos Llamas,
	zhuyifei

On 06/15, Stanislav Fomichev wrote:
> On 06/15, Stanislav Fomichev wrote:
> > On 06/15, sdf@google.com wrote:
> > > On 06/15, Maciej Żenczykowski wrote:
> > > > On Wed, Jun 15, 2022 at 10:38 AM Alexei Starovoitov
> > > > <alexei.starovoitov@gmail.com> wrote:
> > > > >
> > > > > On Wed, Jun 15, 2022 at 9:57 AM Maciej Żenczykowski  
> <maze@google.com>
> > > > wrote:
> > > > > > >
> > > > > > > I've confirmed vanilla 5.18.0 is broken, and all it takes is
> > > > > > > cherrypicking that specific stable 5.18.x patch [
> > > > > > > 710a8989b4b4067903f5b61314eda491667b6ab3 ] to fix behaviour.
> > > > > ...
> > > > > > b8bd3ee1971d1edbc53cf322c149ca0227472e56 this is where we added
> > > > EFAULT in 5.16
> > > > >
> > > > > There are no such sha-s in the upstream kernel.
> > > > > Sorry we cannot help with debugging of android kernels.
> > >
> > > > Yes, sdf@ quoted the wrong sha1, it's a clean cherrypick to an
> > > > internal branch of
> > > > 'bpf: Add cgroup helpers bpf_{get,set}_retval to get/set syscall  
> return
> > > > value'
> > > > commit b44123b4a3dcad4664d3a0f72c011ffd4c9c4d93.
> > >
> > > >  
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.16.y&id=b44123b4a3dcad4664d3a0f72c011ffd4c9c4d93
> > >
> > > > Anyway, I think it's unrelated - or at least not the immediate root  
> cause.
> > >
> > > > Also there's *no* Android kernels involved here.
> > > > This is the android net tests failing on vanilla 5.18 and passing on
> > > > 5.18.3.
> > >
> > > Yeah, sorry, didn't mean to send those outside :-)
> > >
> > > Attached un-android-ified testcase. Passes on bpf-next, trying to see
> > > what happens on vanilla 5.18. Will update once I get more data..
> >
> > I've bisected the original issue to:
> >
> > b44123b4a3dc ("bpf: Add cgroup helpers bpf_{get,set}_retval to get/set
> > syscall return value")
> >
> > And I believe it's these two lines from the original patch:
> >
> >  #define BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY(array, ctx, func)		\
> >  	({						\
> > @@ -1398,10 +1398,12 @@ out:
> >  		u32 _ret;				\
> >  		_ret = BPF_PROG_RUN_ARRAY_CG_FLAGS(array, ctx, func, 0, &_flags); \
> >  		_cn = _flags & BPF_RET_SET_CN;		\
> > +		if (_ret && !IS_ERR_VALUE((long)_ret))	\
> > +			_ret = -EFAULT;	
> >
> > _ret is u32 and ret gets -1 (ffffffff). IS_ERR_VALUE((long)ffffffff)  
> returns
> > false in this case because it doesn't sign-expand the argument and  
> internally
> > does ffff_ffff >= ffff_ffff_ffff_f001 comparison.
> >
> > I'll try to see what I've changed in my unrelated patch to fix it. But  
> I think
> > we should audit all these IS_ERR_VALUE((long)_ret) regardless; they  
> don't
> > seem to work the way we want them to...

> Ok, and my patch fixes it because I'm replacing 'u32 _ret' with 'int ret'.

> So, basically, with u32 _ret we have to do IS_ERR_VALUE((long)(int)_ret).

> Sigh..

And to follow up on that, the other two places we have are fine:

IS_ERR_VALUE((long)run_ctx.retval))

run_ctx.retval is an int.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Curious bpf regression in 5.18 already fixed in stable 5.18.3
  2022-06-15 20:39             ` sdf
@ 2022-06-16  1:36               ` Maciej Żenczykowski
  2022-06-16 15:57                 ` Stanislav Fomichev
  0 siblings, 1 reply; 13+ messages in thread
From: Maciej Żenczykowski @ 2022-06-16  1:36 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Alexei Starovoitov, Linux NetDev, BPF Mailing List,
	Alexei Starovoitov, Martin KaFai Lau, Sasha Levin, Carlos Llamas,
	YiFei Zhu

> > > I've bisected the original issue to:
> > >
> > > b44123b4a3dc ("bpf: Add cgroup helpers bpf_{get,set}_retval to get/set
> > > syscall return value")
> > >
> > > And I believe it's these two lines from the original patch:
> > >
> > >  #define BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY(array, ctx, func)            \
> > >     ({                                              \
> > > @@ -1398,10 +1398,12 @@ out:
> > >             u32 _ret;                               \
> > >             _ret = BPF_PROG_RUN_ARRAY_CG_FLAGS(array, ctx, func, 0, &_flags); \
> > >             _cn = _flags & BPF_RET_SET_CN;          \
> > > +           if (_ret && !IS_ERR_VALUE((long)_ret))  \
> > > +                   _ret = -EFAULT;
> > >
> > > _ret is u32 and ret gets -1 (ffffffff). IS_ERR_VALUE((long)ffffffff)
> > returns
> > > false in this case because it doesn't sign-expand the argument and
> > internally
> > > does ffff_ffff >= ffff_ffff_ffff_f001 comparison.
> > >
> > > I'll try to see what I've changed in my unrelated patch to fix it. But
> > I think
> > > we should audit all these IS_ERR_VALUE((long)_ret) regardless; they
> > don't
> > > seem to work the way we want them to...
>
> > Ok, and my patch fixes it because I'm replacing 'u32 _ret' with 'int ret'.
>
> > So, basically, with u32 _ret we have to do IS_ERR_VALUE((long)(int)_ret).
>
> > Sigh..
>
> And to follow up on that, the other two places we have are fine:
>
> IS_ERR_VALUE((long)run_ctx.retval))
>
> run_ctx.retval is an int.

I'm guessing this means the regression only affects 64-bit archs,
where long = void* is 8 bytes > u32 of 4 bytes, but not 32-bit ones,
where long = u32 = 4 bytes

Unfortunately my dev machine's 32-bit build capability has somehow
regressed again and I can't check this.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Curious bpf regression in 5.18 already fixed in stable 5.18.3
  2022-06-16  1:36               ` Maciej Żenczykowski
@ 2022-06-16 15:57                 ` Stanislav Fomichev
  2022-06-16 16:41                   ` Maciej Żenczykowski
  0 siblings, 1 reply; 13+ messages in thread
From: Stanislav Fomichev @ 2022-06-16 15:57 UTC (permalink / raw)
  To: Maciej Żenczykowski
  Cc: Alexei Starovoitov, Linux NetDev, BPF Mailing List,
	Alexei Starovoitov, Martin KaFai Lau, Sasha Levin, Carlos Llamas,
	YiFei Zhu

On Wed, Jun 15, 2022 at 6:36 PM Maciej Żenczykowski <maze@google.com> wrote:
>
> > > > I've bisected the original issue to:
> > > >
> > > > b44123b4a3dc ("bpf: Add cgroup helpers bpf_{get,set}_retval to get/set
> > > > syscall return value")
> > > >
> > > > And I believe it's these two lines from the original patch:
> > > >
> > > >  #define BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY(array, ctx, func)            \
> > > >     ({                                              \
> > > > @@ -1398,10 +1398,12 @@ out:
> > > >             u32 _ret;                               \
> > > >             _ret = BPF_PROG_RUN_ARRAY_CG_FLAGS(array, ctx, func, 0, &_flags); \
> > > >             _cn = _flags & BPF_RET_SET_CN;          \
> > > > +           if (_ret && !IS_ERR_VALUE((long)_ret))  \
> > > > +                   _ret = -EFAULT;
> > > >
> > > > _ret is u32 and ret gets -1 (ffffffff). IS_ERR_VALUE((long)ffffffff)
> > > returns
> > > > false in this case because it doesn't sign-expand the argument and
> > > internally
> > > > does ffff_ffff >= ffff_ffff_ffff_f001 comparison.
> > > >
> > > > I'll try to see what I've changed in my unrelated patch to fix it. But
> > > I think
> > > > we should audit all these IS_ERR_VALUE((long)_ret) regardless; they
> > > don't
> > > > seem to work the way we want them to...
> >
> > > Ok, and my patch fixes it because I'm replacing 'u32 _ret' with 'int ret'.
> >
> > > So, basically, with u32 _ret we have to do IS_ERR_VALUE((long)(int)_ret).
> >
> > > Sigh..
> >
> > And to follow up on that, the other two places we have are fine:
> >
> > IS_ERR_VALUE((long)run_ctx.retval))
> >
> > run_ctx.retval is an int.
>
> I'm guessing this means the regression only affects 64-bit archs,
> where long = void* is 8 bytes > u32 of 4 bytes, but not 32-bit ones,
> where long = u32 = 4 bytes
>
> Unfortunately my dev machine's 32-bit build capability has somehow
> regressed again and I can't check this.

Seems so, yes. But I'm actually not sure whether we should at all
treat it as a regression. There is a question of whether that EPERM is
UAPI or not. That's why we most likely haven't caught it in the
selftests; most of the time we only check that syscall has returned -1
and don't pay attention to the particular errno.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Curious bpf regression in 5.18 already fixed in stable 5.18.3
  2022-06-16 15:57                 ` Stanislav Fomichev
@ 2022-06-16 16:41                   ` Maciej Żenczykowski
  2022-06-16 17:39                     ` Stanislav Fomichev
  2022-06-16 21:21                     ` YiFei Zhu
  0 siblings, 2 replies; 13+ messages in thread
From: Maciej Żenczykowski @ 2022-06-16 16:41 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Alexei Starovoitov, Linux NetDev, BPF Mailing List,
	Alexei Starovoitov, Martin KaFai Lau, Sasha Levin, Carlos Llamas,
	YiFei Zhu

On Thu, Jun 16, 2022 at 8:57 AM Stanislav Fomichev <sdf@google.com> wrote:
> On Wed, Jun 15, 2022 at 6:36 PM Maciej Żenczykowski <maze@google.com> wrote:
> > I'm guessing this means the regression only affects 64-bit archs,
> > where long = void* is 8 bytes > u32 of 4 bytes, but not 32-bit ones,
> > where long = u32 = 4 bytes
> >
> > Unfortunately my dev machine's 32-bit build capability has somehow
> > regressed again and I can't check this.
>
> Seems so, yes. But I'm actually not sure whether we should at all
> treat it as a regression. There is a question of whether that EPERM is
> UAPI or not. That's why we most likely haven't caught it in the
> selftests; most of the time we only check that syscall has returned -1
> and don't pay attention to the particular errno.

EFAULT seems like a terrible error to return no matter what, it has a very clear
'memory read/write access violation' semantic (ie. if you'd done from
userspace you'd get a SIGSEGV)

I'm actually surprised to learn you return EFAULT on positive number...
It should rather be some unique error code or EINVAL or something.

I know someone will argue that (most/all) system calls can return EFAULT...
But that's not actually true.  From a userspace developer the expectation is
they will not return EFAULT if you pass in memory you know is good.

#include <sys/utsname.h>
int main() {
  struct utsname uts;
  uname(&uts);
}

The above cannot EFAULT in spite of it being documented as the only
error uname can report,
because obviously the uts structure on the stack is valid memory.

Maybe ENOSYS would at least make it obvious something is very weird.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Curious bpf regression in 5.18 already fixed in stable 5.18.3
  2022-06-16 16:41                   ` Maciej Żenczykowski
@ 2022-06-16 17:39                     ` Stanislav Fomichev
  2022-06-16 21:21                     ` YiFei Zhu
  1 sibling, 0 replies; 13+ messages in thread
From: Stanislav Fomichev @ 2022-06-16 17:39 UTC (permalink / raw)
  To: Maciej Żenczykowski
  Cc: Alexei Starovoitov, Linux NetDev, BPF Mailing List,
	Alexei Starovoitov, Martin KaFai Lau, Sasha Levin, Carlos Llamas,
	YiFei Zhu

On Thu, Jun 16, 2022 at 9:41 AM Maciej Żenczykowski <maze@google.com> wrote:
>
> On Thu, Jun 16, 2022 at 8:57 AM Stanislav Fomichev <sdf@google.com> wrote:
> > On Wed, Jun 15, 2022 at 6:36 PM Maciej Żenczykowski <maze@google.com> wrote:
> > > I'm guessing this means the regression only affects 64-bit archs,
> > > where long = void* is 8 bytes > u32 of 4 bytes, but not 32-bit ones,
> > > where long = u32 = 4 bytes
> > >
> > > Unfortunately my dev machine's 32-bit build capability has somehow
> > > regressed again and I can't check this.
> >
> > Seems so, yes. But I'm actually not sure whether we should at all
> > treat it as a regression. There is a question of whether that EPERM is
> > UAPI or not. That's why we most likely haven't caught it in the
> > selftests; most of the time we only check that syscall has returned -1
> > and don't pay attention to the particular errno.
>
> EFAULT seems like a terrible error to return no matter what, it has a very clear
> 'memory read/write access violation' semantic (ie. if you'd done from
> userspace you'd get a SIGSEGV)
>
> I'm actually surprised to learn you return EFAULT on positive number...
> It should rather be some unique error code or EINVAL or something.
>
> I know someone will argue that (most/all) system calls can return EFAULT...
> But that's not actually true.  From a userspace developer the expectation is
> they will not return EFAULT if you pass in memory you know is good.
>
> #include <sys/utsname.h>
> int main() {
>   struct utsname uts;
>   uname(&uts);
> }
>
> The above cannot EFAULT in spite of it being documented as the only
> error uname can report,
> because obviously the uts structure on the stack is valid memory.
>
> Maybe ENOSYS would at least make it obvious something is very weird.

I'd like to see less of the applications poking into errno and making
some decisions based on that. IMO, the only time where it makes sense
is EINTR/EAGAIN vs the rest; the rest should be logged. Having errnos
hard-coded in the tests is fine to make sure that the condition you're
testing against has been triggered; but still treating them as UAPI
might be too much, idk.

We had to add bpf_set_retval() because some of our and third party
libraries would upgrade to v6/v4 sockets only when they receive some
specific errno, which doesn't make a lot of sense to me.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Curious bpf regression in 5.18 already fixed in stable 5.18.3
  2022-06-16 16:41                   ` Maciej Żenczykowski
  2022-06-16 17:39                     ` Stanislav Fomichev
@ 2022-06-16 21:21                     ` YiFei Zhu
  1 sibling, 0 replies; 13+ messages in thread
From: YiFei Zhu @ 2022-06-16 21:21 UTC (permalink / raw)
  To: Maciej Żenczykowski
  Cc: Stanislav Fomichev, Alexei Starovoitov, Linux NetDev,
	BPF Mailing List, Alexei Starovoitov, Martin KaFai Lau,
	Sasha Levin, Carlos Llamas

On Thu, Jun 16, 2022 at 9:41 AM Maciej Żenczykowski <maze@google.com> wrote:
>
> On Thu, Jun 16, 2022 at 8:57 AM Stanislav Fomichev <sdf@google.com> wrote:
> > On Wed, Jun 15, 2022 at 6:36 PM Maciej Żenczykowski <maze@google.com> wrote:
> > > I'm guessing this means the regression only affects 64-bit archs,
> > > where long = void* is 8 bytes > u32 of 4 bytes, but not 32-bit ones,
> > > where long = u32 = 4 bytes
> > >
> > > Unfortunately my dev machine's 32-bit build capability has somehow
> > > regressed again and I can't check this.
> >
> > Seems so, yes. But I'm actually not sure whether we should at all
> > treat it as a regression. There is a question of whether that EPERM is
> > UAPI or not. That's why we most likely haven't caught it in the
> > selftests; most of the time we only check that syscall has returned -1
> > and don't pay attention to the particular errno.
>
> EFAULT seems like a terrible error to return no matter what, it has a very clear
> 'memory read/write access violation' semantic (ie. if you'd done from
> userspace you'd get a SIGSEGV)

I chose EFAULT because the original code of getsockopt hook returns
-EFAULT if the retval is set to a number that isn't zero or the
original value. i.e., in c4dcfdd406aa^, there was:

  /* BPF programs only allowed to set retval to 0, not some
   * arbitrary value.
   */
  if (ctx.retval != 0 && ctx.retval != retval) {
          ret = -EFAULT;
          goto out;
  }

I understood that as the convention that if a BPF program does
something illegal at runtime, return -EFAULT.

> I'm actually surprised to learn you return EFAULT on positive number...
> It should rather be some unique error code or EINVAL or something.
>
> I know someone will argue that (most/all) system calls can return EFAULT...
> But that's not actually true.  From a userspace developer the expectation is
> they will not return EFAULT if you pass in memory you know is good.
>
> #include <sys/utsname.h>
> int main() {
>   struct utsname uts;
>   uname(&uts);
> }
>
> The above cannot EFAULT in spite of it being documented as the only
> error uname can report,
> because obviously the uts structure on the stack is valid memory.
>
> Maybe ENOSYS would at least make it obvious something is very weird.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2022-06-16 21:22 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-15 16:45 Curious bpf regression in 5.18 already fixed in stable 5.18.3 Maciej Żenczykowski
2022-06-15 16:57 ` Maciej Żenczykowski
2022-06-15 17:37   ` Alexei Starovoitov
2022-06-15 17:46     ` Maciej Żenczykowski
2022-06-15 17:55       ` sdf
2022-06-15 20:26         ` sdf
2022-06-15 20:32           ` sdf
2022-06-15 20:39             ` sdf
2022-06-16  1:36               ` Maciej Żenczykowski
2022-06-16 15:57                 ` Stanislav Fomichev
2022-06-16 16:41                   ` Maciej Żenczykowski
2022-06-16 17:39                     ` Stanislav Fomichev
2022-06-16 21:21                     ` YiFei Zhu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).