On Wed, Jan 6, 2021 at 3:01 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> I triggered the following crash on x86_32 by simply doing a:
>
> (ssh'ing into the box)
>
>   # head -100 /tmp/output-file
>
> Where the /tmp/output-file was the output of a trace-cmd report.
> Even after rebooting and not running the tracing code, simply doing the
> head command still crashed.

The code decodes to

   0:   3b 5d e8                cmp    -0x18(%ebp),%ebx
   3:   0f 47 5d e8             cmova  -0x18(%ebp),%ebx
   7:   c7 45 e0 00 00 00 00    movl   $0x0,-0x20(%ebp)
   e:   8b 7d e0                mov    -0x20(%ebp),%edi
  11:   39 7d e8                cmp    %edi,-0x18(%ebp)
  14:   76 3a                   jbe    0x50
  16:   8b 45 d4                mov    -0x2c(%ebp),%eax
  19:   e8 a4 e4 ff ff          call   0xffffe4c2
  1e:   8b 55 e4                mov    -0x1c(%ebp),%edx
  21:   03 55 e0                add    -0x20(%ebp),%edx
  24:   89 d9                   mov    %ebx,%ecx
  26:   01 c6                   add    %eax,%esi
  28:   89 d7                   mov    %edx,%edi
  2a:*  f3 a4                   rep movsb %ds:(%esi),%es:(%edi)     <-- trapping instruction
  2c:   e8 c9 e4 ff ff          call   0xffffe4fa
  31:   01 5d e0                add    %ebx,-0x20(%ebp)
  34:   8b 5d e8                mov    -0x18(%ebp),%ebx
  37:   b8 00 10 00 00          mov    $0x1000,%eax
  3c:   2b 5d e0                sub    -0x20(%ebp),%ebx

and while it would be good to see the output of
scripts/decode_stacktrace.sh, I strongly suspect that the above is

                                vaddr = kmap_atomic(p);
                                memcpy(to + copied, vaddr + p_off, p_len);
                                kunmap_atomic(vaddr);

(although I wonder how/why the heck you've enabled
CC_OPTIMIZE_FOR_SIZE=y, which is what causes "memcpy()" to be done as
that "rep movsb". I thought we had disabled that because it's so bad
on most CPUs.)

So that first "call" instruction is the kmap_atomic(), the "rep movs"
is the memcpy(), and the "call" instruction immediately after is the
kunmap_atomic().

Anyway, you can see vaddr in register state:

        EAX: fff57000

so we've kmapped that one page at fff57000, but we're accessing past
it into the next page:

> BUG: unable to handle page fault for address: fff58000

with the current source address at fff58000 (ESI), and we still have
248 bytes to go (ECX: 000000f8) even though we've already overflowed
into the next page.

You can see the original count still (EBX: 000005a8), so it really
looks like that skb_frag_foreach_page() logic

                        skb_frag_foreach_page(f,
                                              skb_frag_off(f) + offset - start,
                                              copy, p, p_off, p_len, copied) {
                                vaddr = kmap_atomic(p);
                                memcpy(to + copied, vaddr + p_off, p_len);
                                kunmap_atomic(vaddr);
                        }

must be wrong, and doesn't handle the "each page" part properly. It
must have started in the middle of the page, and p_len (that 0x5a8)
was larger than what was left of that page.
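The register arithmetic behind that can be checked with a small C
sketch. The values are the ones from the oops; the assumptions are
that ESI advances one byte per copied byte (direction flag clear) and
that the copy source was vaddr + p_off, as in the memcpy() above. The
macro and function names are made up for illustration only:

```c
#include <assert.h>
#include <stdint.h>

/* Register values copied from the x86_32 oops discussed above. */
#define OOPS_EAX 0xfff57000u /* vaddr returned by kmap_atomic() */
#define OOPS_ESI 0xfff58000u /* source pointer when rep movsb faulted */
#define OOPS_ECX 0x000000f8u /* bytes rep movsb still had to copy */
#define OOPS_EBX 0x000005a8u /* p_len, the full count given to memcpy() */
#define X86_PAGE_SIZE 0x1000u

/* The first byte past the single kmapped page is exactly the fault address. */
static_assert(OOPS_EAX + X86_PAGE_SIZE == OOPS_ESI,
              "copy ran off the end of the mapped page");

/* Bytes already copied = original count - remaining count. */
static uint32_t oops_bytes_copied(void)
{
	return OOPS_EBX - OOPS_ECX;
}

/* Where the copy started, as an offset into the mapped page (p_off). */
static uint32_t oops_p_off(void)
{
	uint32_t src_start = OOPS_ESI - oops_bytes_copied();

	return src_start - OOPS_EAX;
}

/* p_off + p_len: how far into the mapping the copy was meant to reach. */
static uint32_t oops_copy_end(void)
{
	return oops_p_off() + OOPS_EBX;
}
```

Working it through gives 0x4b0 bytes already copied, a starting
p_off of 0xb50, and p_off + p_len = 0x10f8, i.e. 0xf8 bytes beyond a
single 4kB page.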

IOW, it really looks like p_off + p_len had the value 0x10f8, which is
larger than one page. And looking at the code, in
skb_frag_foreach_page(), I see:

             p_off = (f_off) & (PAGE_SIZE - 1),                         \
             p_len = skb_frag_must_loop(p) ?                            \
             min_t(u32, f_len, PAGE_SIZE - p_off) : f_len,              \

where that "min_t(u32, f_len, PAGE_SIZE - p_off)" looks correct, but
then presumably skb_frag_must_loop() must be wrong.
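To see why the predicate matters, here is a hypothetical userspace
model of the per-page splitting (model_foreach_page(), record_chunk()
and MODEL_PAGE_SIZE are invented names, and page index arithmetic
stands in for struct page pointers). It is a sketch of the idea, not
the kernel macro itself:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u

/* Callback invoked once per chunk with page index, offset and length. */
typedef void (*chunk_fn)(uint32_t page, uint32_t p_off, uint32_t p_len,
                         void *ctx);

/* Hypothetical model of skb_frag_foreach_page(): walk [f_off, f_off + f_len)
 * and hand out one chunk per page when must_loop is true, or one single
 * chunk covering everything when it is false. */
static void model_foreach_page(uint32_t f_off, uint32_t f_len, bool must_loop,
                               chunk_fn fn, void *ctx)
{
	uint32_t page = f_off / MODEL_PAGE_SIZE;
	uint32_t p_off = f_off & (MODEL_PAGE_SIZE - 1);

	while (f_len) {
		uint32_t room = MODEL_PAGE_SIZE - p_off;
		uint32_t p_len = (must_loop && f_len > room) ? room : f_len;

		/* Per-page mode never lets a chunk cross a page boundary. */
		assert(!must_loop || p_off + p_len <= MODEL_PAGE_SIZE);
		fn(page, p_off, p_len, ctx);
		f_len -= p_len;
		page++;
		p_off = 0;
	}
}

/* Collects how many chunks were produced and how far any of them reached. */
struct chunk_stats {
	unsigned int chunks;
	uint32_t max_end;	/* largest p_off + p_len seen */
	uint32_t lens[4];	/* p_len of the first few chunks */
};

static void record_chunk(uint32_t page, uint32_t p_off, uint32_t p_len,
                         void *ctx)
{
	struct chunk_stats *s = ctx;

	(void)page;
	if (s->chunks < 4)
		s->lens[s->chunks] = p_len;
	s->chunks++;
	if (p_off + p_len > s->max_end)
		s->max_end = p_off + p_len;
}
```

Fed the numbers from this oops (f_off = 0xb50, f_len = 0x5a8), the
model produces one chunk reaching 0x10f8 when must_loop is false, and
two chunks of 0x4b0 and 0xf8 bytes when it is true. That split lines
up with the bytes copied before the fault and the ECX left over.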

Oh, and when I look at that, I see

    static inline bool skb_frag_must_loop(struct page *p)
    {
    #if defined(CONFIG_HIGHMEM)
            if (PageHighMem(p))
                    return true;
    #endif
            return false;
    }

and that assumption is no longer true. With kmap debugging enabled,
even non-highmem pages need that "do one page at a time" code,
because even non-highmem pages get remapped by kmap().
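The rule change can be modelled in userspace with both config knobs
replaced by hypothetical booleans (struct model_cfg, must_loop_old(),
must_loop_new() and first_chunk_len() are invented for illustration).
This is only a sketch of the idea, not the attached patch, and the
real fix may well be structured differently:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u

/* Hypothetical stand-ins for kernel state, as plain booleans. */
struct model_cfg {
	bool highmem;		/* stands in for CONFIG_HIGHMEM */
	bool kmap_debug;	/* stands in for kmap debugging that remaps every page */
};

/* The rule quoted above: only highmem pages need per-page handling. */
static bool must_loop_old(const struct model_cfg *cfg, bool page_is_highmem)
{
	return cfg->highmem && page_is_highmem;
}

/* Adjusted rule: if kmap() remaps every page, every page needs it. */
static bool must_loop_new(const struct model_cfg *cfg, bool page_is_highmem)
{
	if (cfg->kmap_debug)
		return true;
	return must_loop_old(cfg, page_is_highmem);
}

/* Length of the first chunk that would be handed to memcpy(). */
static uint32_t first_chunk_len(const struct model_cfg *cfg,
                                bool page_is_highmem,
                                uint32_t p_off, uint32_t f_len)
{
	uint32_t room = MODEL_PAGE_SIZE - p_off;
	uint32_t p_len = (must_loop_new(cfg, page_is_highmem) && f_len > room) ?
			 room : f_len;

	/* With every page remapped, a chunk must fit in the one mapped page. */
	assert(!cfg->kmap_debug || p_off + p_len <= MODEL_PAGE_SIZE);
	return p_len;
}
```

With kmap debugging modelled as on, a lowmem page at p_off 0xb50 now
gets a first chunk of 0x4b0 bytes instead of the full 0x5a8.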

IOW, I think the patch to fix this might be something like the attached.

I wonder whether there is other code that "knows" that kmap() only
affects PageHighMem() pages, which is no longer true.

Looking at some other code, skb_gro_reset_offset() looks suspiciously
like it also thinks highmem pages are special.

Adding the networking people involved in this area to the cc too.

               Linus