From: Nick Terrell <terrelln@fb.com>
To: lkp@lists.01.org
Subject: Re: 5c1aab1dd5 ("btrfs: Add zstd support"): BUG: kernel hang in early-boot stage, last printk: Booting the kernel.
Date: Tue, 29 Aug 2017 23:06:16 +0000
Message-ID: <0271FBD7-803B-4147-B4CF-459D6C89EB8D@fb.com>
In-Reply-To: <59a4e199.wyCxZm/7Ay1mDqgZ%fengguang.wu@intel.com>

On 8/28/17, 8:39 PM, "kernel test robot" <fengguang.wu@intel.com> wrote:
> Greetings,
> 
> 0day kernel testing robot got the below dmesg and the first bad commit is
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
> 
> commit 5c1aab1dd5445ed8bdcdbb575abc1b0d7ee5b2e7
> Author:     Nick Terrell <terrelln@fb.com>
> AuthorDate: Wed Aug 9 19:39:02 2017 -0700
> Commit:     Chris Mason <clm@fb.com>
> CommitDate: Tue Aug 15 09:02:09 2017 -0700
> 
>     btrfs: Add zstd support
>     
>     Add zstd compression and decompression support to BtrFS. zstd at its
>     fastest level compresses almost as well as zlib, while offering much
>     faster compression and decompression, approaching lzo speeds.
>     
>     I benchmarked btrfs with zstd compression against no compression, lzo
>     compression, and zlib compression. I benchmarked two scenarios: copying
>     a set of files to btrfs and then reading the files, and copying a tarball
>     to btrfs, extracting it to btrfs, and then reading the extracted files.
>     After every operation, I call `sync` and include the sync time.
>     Between every pair of operations I unmount and remount the filesystem
>     to avoid caching. The benchmark files can be found in the upstream
>     zstd source repository under
>     `contrib/linux-kernel/{btrfs-benchmark.sh,btrfs-extract-benchmark.sh}`
>     [1] [2].
>     
>     I ran the benchmarks on an Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM.
>     The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor,
>     16 GB of RAM, and an SSD.
>     
>     The first compression benchmark is copying 10 copies of the unzipped
>     Silesia corpus [3] into a BtrFS filesystem mounted with
>     `-o compress-force=Method`. The decompression benchmark times how long
>     it takes to `tar` all 10 copies into `/dev/null`. The compression ratio is
>     measured by comparing the output of `df` and `du`. See the benchmark file
>     [1] for details. I benchmarked multiple zstd compression levels, although
>     the patch uses zstd level 1.
>     
>     | Method  | Ratio | Compression MB/s | Decompression MB/s  |
>     |---------|-------|------------------|---------------------|
>     | None    |  0.99 |              504 |                 686 |
>     | lzo     |  1.66 |              398 |                 442 |
>     | zlib    |  2.58 |               65 |                 241 |
>     | zstd 1  |  2.57 |              260 |                 383 |
>     | zstd 3  |  2.71 |              174 |                 408 |
>     | zstd 6  |  2.87 |               70 |                 398 |
>     | zstd 9  |  2.92 |               43 |                 406 |
>     | zstd 12 |  2.93 |               21 |                 408 |
>     | zstd 15 |  3.01 |               11 |                 354 |
>     
>     The next benchmark first copies `linux-4.11.6.tar` [4] to btrfs. Then it
>     measures the compression ratio, extracts the tar, and deletes the tar.
>     Then it measures the compression ratio again, and `tar`s the extracted
>     files into `/dev/null`. See the benchmark file [2] for details.
>     
>     | Method | Tar Ratio | Extract Ratio | Copy (s) | Extract (s)| Read (s) |
>     |--------|-----------|---------------|----------|------------|----------|
>     | None   |      0.97 |          0.78 |    0.981 |      5.501 |    8.807 |
>     | lzo    |      2.06 |          1.38 |    1.631 |      8.458 |    8.585 |
>     | zlib   |      3.40 |          1.86 |    7.750 |     21.544 |   11.744 |
>     | zstd 1 |      3.57 |          1.85 |    2.579 |     11.479 |    9.389 |
>     
>     [1] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/btrfs-benchmark.sh
>     [2] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/btrfs-extract-benchmark.sh
>     [3] http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia
>     [4] https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.11.6.tar.xz
>     
>     zstd source repository: https://github.com/facebook/zstd
>     
>     Signed-off-by: Nick Terrell <terrelln@fb.com>
>     Signed-off-by: Chris Mason <clm@fb.com>
> 
> 73f3d1b48f  lib: Add zstd modules
> 5c1aab1dd5  btrfs: Add zstd support
> adc4148c10  Add linux-next specific files for 20170828
> +--------------------------------------------------------------------+------------+------------+---------------+
> |                                                                    | 73f3d1b48f | 5c1aab1dd5 | next-20170828 |
> +--------------------------------------------------------------------+------------+------------+---------------+
> | boot_successes                                                     | 31         | 0          | 0             |
> | boot_failures                                                      | 4          | 21         | 23            |
> | IP-Config:Auto-configuration_of_network_failed                     | 4          |            |               |
> | BUG:kernel_hang_in_early-boot_stage,last_printk:Booting_the_kernel | 0          | 21         | 23            |
> +--------------------------------------------------------------------+------------+------------+---------------+
> 
> 
> Decompressing Linux... Parsing ELF... done.
> Booting the kernel.
> 
> 
>                                                           # HH:MM RESULT GOOD BAD GOOD_BUT_DIRTY DIRTY_NOT_BAD
> git bisect start 7d744d9889d3bf6d18152cc7ab2eff4b541e91ac 14ccee78fc82f5512908f4424f541549a5705b89 --
> git bisect good 7a40daca5117a71f893b75929e7d6cce24eeb15a  # 03:50  G     11     0    0   0  Merge 'linux-review/Vadim-Lomovtsev/net-sunrpc-svcsock-fix-NULL-pointer-exception/20170823-212714' into devel-spot-201708240558
> git bisect good 0f3e945501ce5af141983e83078f1aea933ae2ee  # 04:36  G     11     0    0   0  Merge 'linux-review/Dmitry-Fleytman/usb-Add-device-quirk-for-Logitech-HD-Pro-Webcam-C920-C/20170823-145321' into devel-spot-201708240558
> git bisect  bad 5c3677900758dfd87eec1bade63f66853683f00e  # 06:43  B      0     3   16   0  Merge 'char-misc/char-misc-testing' into devel-spot-201708240558
> git bisect good 871b4817f99d09e8294a7f4f6e550603ce23a599  # 08:56  G     10     0    0   2  Merge 'linux-review/Christophe-JAILLET/phy-qcom-usb-hsic-Fix-error-handling/20170823-114654' into devel-spot-201708240558
> git bisect  bad 32771c0528f294c21f373021a6b7facbb6ca3af1  # 08:56  B      0     6   18   0  Merge 'stffrdhrn/openrisc-4.13-smp-qspinlock' into devel-spot-201708240558
> git bisect  bad 202e19b651a4ede36ae1cfbaea0fad6f68a5eee3  # 09:50  B      0    11   33   9  Merge 'linux-review/SF-Markus-Elfring/btrfs-Use-common-error-handling-code-in-update_ref_path/20170823-105603' into devel-spot-201708240558
> git bisect  bad 5c1aab1dd5445ed8bdcdbb575abc1b0d7ee5b2e7  # 10:21  B      0    11   25   2  btrfs: Add zstd support
> git bisect good 73f3d1b48f5069d46ba48aa28c2898dc93185560  # 10:42  G     11     0    0   0  lib: Add zstd modules
> # first bad commit: [5c1aab1dd5445ed8bdcdbb575abc1b0d7ee5b2e7] btrfs: Add zstd support
> git bisect good 73f3d1b48f5069d46ba48aa28c2898dc93185560  # 10:53  G     31     0    0   4  lib: Add zstd modules
> # extra tests with CONFIG_DEBUG_INFO_REDUCED
> git bisect  bad 5c1aab1dd5445ed8bdcdbb575abc1b0d7ee5b2e7  # 11:12  B      0    11   23   0  btrfs: Add zstd support
> # extra tests on HEAD of linux-devel/devel-spot-201708240558
> git bisect  bad 7d744d9889d3bf6d18152cc7ab2eff4b541e91ac  # 11:13  B      0    17   32   0  0day head guard for 'devel-spot-201708240558'
> # extra tests on tree/branch linux-next/master
> git bisect  bad adc4148c101c038bc105ca4539083dcd1a246596  # 11:32  B      0     3   15   0  Add linux-next specific files for 20170828
> 
> ---
> 0-DAY kernel test infrastructure                Open Source Technology Center
> https://lists.01.org/pipermail/lkp                          Intel Corporation
> 

Omar and I have been debugging this issue. Using the attached reproduction
config and script, we can reproduce it with gcc-6.4 but not with gcc-7. The
QEMU and dmesg output of the reproduced crash is:

early console in setup code
early console in extract_kernel
input_data: 0x0000000006382276
input_len: 0x0000000000c7f75c
output: 0x0000000001000000
output_len: 0x0000000003c28878
kernel_total_size: 0x000000000601f000

Decompressing Linux... Parsing ELF... done.
Booting the kernel.
KVM internal error. Suberror: 3
extra data[0]: 80000b0e
extra data[1]: 31
RAX=1ffffffff0859ef4 RBX=ffffffff842cf7c0 RCX=0000000000000001 RDX=0000000000000000
RSI=000000000000000d RDI=ffffffff800bffa8 RBP=ffffffff826a08b0 RSP=ffffffff800bff98
R8 =0000000000000007 R9 =0000000000000004 R10=ffffffff8552db6c R11=ffffffff8552db43
R12=0000000000000008 R13=0000000000000000 R14=0000000000000001 R15=0000000000000001
RIP=ffffffff826a0910 RFL=00010097 [--S-APC] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 0000000000000000 ffffffff 00000000
CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
SS =0000 0000000000000000 ffffffff 00000000
DS =0000 0000000000000000 ffffffff 00000000
FS =0000 0000000000000000 ffffffff 00000000
GS =0000 ffffffff843fa000 ffffffff 00000000
LDT=0000 0000000000000000 ffffffff 00000000
TR =0020 0000000000000000 00000fff 00808b00 DPL=0 TSS64-busy
GDT=     ffffffff8440c000 0000007f
IDT=     ffffffff84829000 00000fff
CR0=80050033 CR2=000000000000000d CR3=00000000046a6000 CR4=000006a0
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000d01
Code=ff ff ff ff 01 00 00 00 00 00 00 00 48 09 6a 82 ff ff ff ff <a7> 71 2b 81 ff ff ff ff 07 00 00 00 00 00 00 00 c0 f7 2c 84 ff ff ff ff 08 00 00 00 00 00

I think the KVM error is a page fault.
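
As a rough sanity check on that reading, here is a minimal sketch that
decodes `extra data[0]` from the dump above. It assumes the value is the VMX
IDT-vectoring information field, which is what I would expect KVM to report
for suberror 3 (event delivery failure); that interpretation is an assumption,
and the helper name is made up for illustration.

```python
# Decode a VMX IDT-vectoring information value (layout per the Intel SDM):
#   bits 7:0   vector
#   bits 10:8  interruption type
#   bit  11    error code valid
#   bit  31    field valid
# ASSUMPTION: "extra data[0]" in the KVM dump is this field.

PAGE_FAULT_VECTOR = 14      # #PF on x86
TYPE_HW_EXCEPTION = 3       # hardware exception


def decode_vectoring_info(value):
    """Split the 32-bit field into its documented components."""
    return {
        "valid": bool((value >> 31) & 1),
        "vector": value & 0xFF,
        "type": (value >> 8) & 0x7,
        "has_error_code": bool((value >> 11) & 1),
    }


info = decode_vectoring_info(0x80000B0E)
is_page_fault = (
    info["valid"]
    and info["type"] == TYPE_HW_EXCEPTION
    and info["vector"] == PAGE_FAULT_VECTOR
)
print(info, "page fault" if is_page_fault else "not a page fault")
```

Under that assumption the vector is 14 (#PF) delivered as a hardware
exception, and CR2=000000000000000d in the register dump would then be the
faulting address.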

We believe that the issue is caused by the size of the kernel image.
The error goes away if you disable any one of XFS, F2FS, FUSE, BTRFS_ASSERT,
or KASAN, and most likely if you remove any other large chunk of code.
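
To put numbers on the image size, here is a small sketch that converts the
values printed by the decompressor above into MiB. It does not prove the
hypothesis; it only makes the sizes easier to read. The end address assumes
the kernel occupies the range starting at `output` and spanning
`kernel_total_size` bytes, which is my reading of those fields and not
something the log states.

```python
# Values copied from the early-boot debug output in the dump above.
boot_info = {
    "input_len": 0x0000000000C7F75C,          # compressed payload size
    "output": 0x0000000001000000,             # load address of the kernel
    "output_len": 0x0000000003C28878,         # decompressed size
    "kernel_total_size": 0x000000000601F000,  # total size reported at boot
}

MIB = 1 << 20


def to_mib(nbytes):
    """Convert a byte count to MiB."""
    return nbytes / MIB


sizes_mib = {
    name: round(to_mib(boot_info[name]), 1)
    for name in ("input_len", "output_len", "kernel_total_size")
}

# ASSUMPTION: the image spans [output, output + kernel_total_size).
image_end = boot_info["output"] + boot_info["kernel_total_size"]

for name, value in sizes_mib.items():
    print(f"{name}: {value} MiB")
print(f"image end (assumed): {image_end:#x}")
```

With this config the decompressed image is about 60 MiB and the total size
is about 96 MiB, so the image would end a little above 112 MiB if it is laid
out as assumed.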

The page fault happens in an unrelated piece of code (partial, manually
gathered stack trace):

kernel/sched/core.c      task_rq_lock()
kernel/sched/core.c:1068 __set_cpus_allowed_ptr()
init/main.c:409          rest_init()
init/main.c:701          start_kernel()
