* Linux 4.18-rc7 @ 2018-07-29 22:09 Linus Torvalds 2018-07-30 6:47 ` Amit Pundir 2018-07-30 13:24 ` Guenter Roeck 0 siblings, 2 replies; 32+ messages in thread From: Linus Torvalds @ 2018-07-29 22:09 UTC (permalink / raw) To: Linux Kernel Mailing List So unless something odd happens, this should be the last rc for 4.18. Nothing particularly odd happened this last week - we got the usual random set of various minor fixes all over. About two thirds of it is drivers - networking, staging and usb stands out, but there's a little bit of stuff all over (clk, block, gpu, nvme..). Outside of drivers, the bulk is some core networking stuff, with random changes elsewhere (minor arch updates, filesystems, core kernel, test scripts). The appended shortlog gives a flavor of the details. Linus --- Adam Thomson (1): usb: typec: tcpm: Fix sink PDO starting index for PPS APDO selection Aleksander Morgado (1): qmi_wwan: fix interface number for DW5821e production firmware Alexander Sverdlin (1): i2c: davinci: Avoid zero value of CLKH Amar Singhal (1): cfg80211: never ignore user regulatory hint Anssi Hannula (7): can: xilinx_can: fix device dropping off bus on RX overrun can: xilinx_can: fix RX loop if RXNEMP is asserted without RXOK can: xilinx_can: fix recovery from error states not being propagated can: xilinx_can: keep only 1-2 frames in TX FIFO to fix TX accounting can: xilinx_can: fix RX overflow interrupt not being enabled can: xilinx_can: fix incorrect clear of non-processed interrupts can: xilinx_can: fix power management handling Antti Seppälä (2): usb: dwc2: Fix DMA alignment to start at allocated boundary usb: dwc2: Fix inefficient copy of unaligned buffers Ariel Levkovich (1): net/mlx5: Adjust clock overflow work period Arnd Bergmann (2): kasan: only select SLUB_DEBUG with SYSFS=y include/linux/eventfd.h: include linux/errno.h Artem Savkov (1): tracing/kprobes: Fix trace_probe flags on enable_trace_kprobe() failure Benjamin Herrenschmidt (4): clk: aspeed: Treat a gate in reset as disabled usb: gadget: aspeed-vhub: Fix SETUP packets with OUT data phase usb: gadget: Fix OS descriptors support usb: gadget: aspeed: Workaround memory ordering issue Bernd Edlinger (1): nl80211: Add a missing break in parse_station_flags Bin Liu (1): usb: core: handle hub C_PORT_OVER_CURRENT condition Carlos Maiolino (1): xfs: Initialize variables in xfs_alloc_get_rec before using them Chen-Yu Tsai (1): Input: i8042 - add Lenovo LaVie Z to the i8042 reset list Clint Taylor (1): drm/i915/glk: Add Quirk for GLK NUC HDMI port issues. Colin Ian King (1): usb: dwc3: gadget: remove redundant variable maxpacket Dan Williams (1): mm: fix exports that inadvertently make put_page() EXPORT_SYMBOL_GPL Daniel Borkmann (3): bpf, ppc64: fix unexpected r0=0 exit path inside bpf_xadd bpf: test case to check whether src/dst regs got mangled by xadd sock: fix sg page frag coalescing in sk_alloc_sg Dave Jiang (1): mm: disallow mappings that conflict for devm_memremap_pages() David Ahern (1): net/ipv6: Fix linklocal to global address with VRF Davidlohr Bueso (1): ipc/sem.c: prevent queue.status tearing in semop Denis Kenzior (1): nl80211/mac80211: allow non-linear skb in rx_control_port Dirk Mueller (1): arm64: Check for errata before evaluating cpu features Dmitry Torokhov (1): usb: dwc2: host: do not delay retries for CONTROL IN transfers Donald Shanty III (1): Input: elan_i2c - add ACPI ID for lenovo ideapad 330 Doron Roberts-Kedes (1): tls: check RCV_SHUTDOWN in tls_wait_data Enric Balletbo i Serra (1): usb: dwc3: rockchip: Fix PHY documentation links. Eran Ben Elisha (2): net/mlx5e: Fix quota counting in aRFS expire flow net/mlx5e: Don't allow aRFS for encapsulated packets Eric Dumazet (6): net: skb_segment() should not return NULL tcp: free batches of packets in tcp_prune_ofo_queue() tcp: avoid collapses in tcp_prune_queue() if possible tcp: detect malicious patterns in tcp_collapse_ofo_queue() tcp: call tcp_drop() from tcp_data_queue_ofo() tcp: add tcp_ooo_try_coalesce() helper Eric Sandeen (1): xfs: properly handle free inodes in extent hint validators Esben Haabendal (1): i2c: imx: Fix reinit_completion() use Eugeniu Rosca (3): usb: gadget: f_uac2: fix error handling in afunc_bind (again) usb: gadget: u_audio: fix pcm/card naming in g_audio_setup() usb: gadget: f_uac2: fix endianness of 'struct cntrl_*_lay3' Fabio Estevam (1): usb: chipidea: Always build ULPI code Faiz Abbas (2): can: m_can: Fix runtime resume call can: m_can: Move accessing of message ram to after clocks are enabled Felix Fietkau (1): MIPS: ath79: fix register address in ath79_ddr_wb_flush() Florian Westphal (7): netfilter: nf_tables: use dev->name directly netfilter: nf_tables: free flow table struct too netfilter: nf_tables: fix memory leaks on chain rename netfilter: nf_tables: don't allow to rename to already-pending name netfilter: conntrack: dccp: treat SYNC/SYNCACK as invalid if no prior state atl1c: reserve min skb headroom netfilter: nf_tables: move dumper state allocation into ->start Geert Uytterhoeven (2): MAINTAINERS: Add file patterns for serio device tree bindings clk: Really show symbolic clock flags in debugfs Greg Edwards (1): block: reset bi_iter.bi_done after splitting bio Gregory CLEMENT (1): clk: mvebu: armada-37xx-periph: Fix switching CPU rate from 300Mhz to 1.2GHz Guenter Roeck (1): media: staging: omap4iss: Include asm/cacheflush.h after generic includes Hangbin Liu (1): multicast: do not restore deleted record source filter mode to new one Hannes Reinecke (2): nvmet: fixup crash on NULL device path nvmet: only check for filebacking on -ENOTBLK Hans de Goede (1): Revert "staging:r8188eu: Use lib80211 to support TKIP" Heiner Kallweit (2): net: phy: consider PHY_IGNORE_INTERRUPT in phy_start_aneg_priv r8169: restore previous behavior to accept BIOS WoL settings Jack Morgenstein (1): net/mlx4_core: Save the qpn from the input modifier in RST2INIT wrapper Jaedon Shin (1): phy: phy-brcm-usb-init: Fix power down USB 3.0 PHY when XHCI reenabled James Smart (2): nvmet-fc: fix target sgl list on large transfers nvme: if_ready checks to fail io to deleting controller Jarod Wilson (1): bonding: set default miimon value for non-arp modes if not set Jerome Brunet (1): clk: meson: audio-divider is one based Jerry Zhang (1): usb: gadget: f_fs: Only return delayed status when len is 0 Jia-Ju Bai (2): usb: gadget: r8a66597: Fix two possible sleep-in-atomic-context bugs in init_controller() usb: gadget: r8a66597: Fix a possible sleep-in-atomic-context bugs in r8a66597_queue() Joel Stanley (2): clk: aspeed: Mark bclk (PCIe) and dclk (VGA) as critical clk: aspeed: Support HPLL strapping on ast2400 Johannes Weiner (1): arm64: fix vmemmap BUILD_BUG_ON() triggering on !vmemmap setups John Hurley (1): nfp: flower: ensure dead neighbour entries are not offloaded John Keeping (1): usb: dwc2: avoid NULL dereferences Josef Bacik (2): nbd: don't requeue the same request twice. nbd: handle unexpected replies better Joshua Frkuska (1): usb: gadget: u_audio: update hw_ptr in iso_complete after data copied KT Liao (1): Input: elan_i2c - add another ACPI ID for Lenovo Ideapad 330-15AST Keith Busch (2): blk-mq: export setting request completion state scsi: set timed out out mq requests to complete Kiran Kumar Modukuri (5): fscache: Allow cancelled operations to be enqueued cachefiles: Fix refcounting bug in backing-file read monitoring fscache: Fix reference overput in fscache_attach_object() error handling cachefiles: Fix missing clear of the CACHEFILES_OBJECT_ACTIVE flag cachefiles: Wait rather than BUG'ing on "Unexpected object collision" Kirill A. Shutemov (3): mm: introduce vma_init() mm: use vma_init() to initialize VMAs on stack and data segments mm: fix vma_is_anonymous() false-positives Li Wang (1): zswap: re-check zswap_is_full() after do zswap_shrink() Linus Torvalds (2): squashfs: be more careful about metadata corruption Linux 4.18-rc7 Linus Walleij (1): gpio: of: Handle fixed regulator flags properly Lubomir Rintel (1): usb: cdc_acm: Add quirk for Castles VEGA3000 Lucas Stach (2): drm/imx: imx-ldb: disable LDB on driver bind drm/imx: imx-ldb: check if channel is enabled before printing warning Martin KaFai Lau (2): bpf: btf: Clean up BTF_INT_BITS() in uapi btf.h bpf: Use option "help" in the llvm-objcopy test Martin Schwidefsky (1): s390: disable gcc plugins Martin Wilck (3): block: bio_iov_iter_get_pages: fix size of last iovec blkdev: __blkdev_direct_IO_simple: fix leak in error case block: bio_iov_iter_get_pages: pin more pages for multi-segment IOs Masahiro Yamada (1): gpio: uniphier: set legitimate irq trigger type in .to_irq hook Masami Hiramatsu (2): ring_buffer: tracing: Inherit the tracing setting to next ring buffer selftests/ftrace: Add snapshot and tracing_on test case Neil Armstrong (1): clk: meson-gxbb: set fclk_div2 as CLK_IS_CRITICAL Nicholas Mc Guire (1): can: mpc5xxx_can: check of_iomap return before use Paolo Abeni (1): ip: hash fragments consistently Paul Burton (1): MIPS: Fix off-by-one in pci_resource_to_user() Peter Senna Tschudin (1): tools: usb: ffs-test: Fix build on big endian systems Raed Salem (1): net/mlx5: Fix 'DON'T_TRAP' functionality Rafael J. Wysocki (1): driver core: Partially revert "driver core: correct device's shutdown order" Rafał Miłecki (1): Revert "MIPS: BCM47XX: Enable 74K Core ExternalSync for PCIe erratum" Randy Dunlap (2): usb/phy: fix PPC64 build errors in phy-fsl-usb.c net: prevent ISA drivers from building on PPC32 Roi Dayan (1): net/mlx5e: Only allow offloading decap egress (egdev) flows Roman Fietze (1): can: m_can.c: fix setup of CCCR register: clear CCCR NISO bit before checking can.ctrlmode Roopa Prabhu (4): rtnetlink: add rtnl_link_state check in rtnl_configure_link vxlan: add new fdb alloc and create helpers vxlan: make netlink notify in vxlan_fdb_destroy optional vxlan: fix default fdb entry netlink notify ordering during netdev create Saeed Mahameed (1): net/mlx5: E-Switch, UBSAN fix undefined behavior in mlx5_eswitch_mode Samuel Thibault (1): staging: speakup: fix wraparound in uaccess length check Schmauss, Erik (1): ACPICA: AML Parser: ignore dispatcher error status during table load Sergio Paracuellos (1): staging: ks7010: call 'hostif_mib_set_request_int' instead of 'hostif_mib_set_request_bool' Shakeel Butt (1): kvm, mm: account shadow page tables to kmemcg Shay Agroskin (1): net/mlx5e: Refine ets validation function Shubhrajyoti Datta (1): net: axienet: Fix double deregister of mdio Snild Dolkow (1): kthread, tracing: Don't expose half-written comm when creating kthreads Stephane Grosjean (1): can: peak_canfd: fix firmware < v3.3.0: limit allocation to 32-bit DMA addr only Steve Longerbeam (1): gpu: ipu-csi: Check for field type alternate Steven Rostedt (VMware) (3): tracing: Fix double free of event_trigger_data tracing: Fix possible double free in event_enable_trigger_func() tracing: Quiet gcc warning about maybe unused link variable Sudarsana Reddy Kalluru (4): qed: Fix link flap issue due to mismatching EEE capabilities. qed: Fix possible race for the link state value. qed: Correct Multicast API to reflect existence of 256 approximate buckets. bnx2x: Fix invalid memory access in rss hash config path. Taehee Yoo (3): netfilter: nf_tables: fix jumpstack depth validation netfilter: nft_set_hash: add rcu_barrier() in the nft_rhash_destroy() netfilter: nft_set_rbtree: fix panic when destroying set by GC Taeung Song (1): tools/bpftool: Fix segfault case regarding 'pin' arguments Tariq Toukan (2): net/mlx5: Fix QP fragmented buffer allocation net/page_pool: Fix inconsistent lock state warning Tejun Heo (1): delayacct: fix crash in delayacct_blkio_end() after delayacct init failure Theodore Ts'o (6): ext4: fix false negatives *and* false positives in ext4_check_descriptors() ext4: clear mmp sequence number when remounting read-only ext4: fix inline data updates with checksums enabled ext4: check for allocation block validity with block group locked random: mix rdrand with entropy sent in from userspace ext4: fix check to prevent initializing reserved inodes Thomas Tai (1): PCI/AER: Work around use-after-free in pcie_do_fatal_recovery() Tony Lindgren (1): phy: mapphone-mdm6600: Fix wrong enum used for status lines Uwe Kleine-König (1): net: dsa: mv88e6xxx: fix races between lock and irq freeing Vinod Koul (1): clk: qcom: gcc-msm8996: Disable halt check on UFS tx clock Vivek Gautam (1): clk/mmcc-msm8996: Make mmagic_bimc_gdsc ALWAYS_ON Vladimir Zapolskiy (3): usb: gadget: u_audio: remove caching of stream buffer parameters usb: gadget: u_audio: remove cached period bytes value usb: gadget: u_audio: protect stream runtime fields with stream spinlock Wei Wang (1): ipv6: use fib6_info_hold_safe() when necessary Willem de Bruijn (1): ip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pull Wolfram Sang (2): i2c: rcar: handle RXDMA HW behaviour on Gen3 i2c: imx: use open drain for recovery GPIO Yuchung Cheng (3): tcp: helpers to send special DCTCP ack tcp: do not cancel delay-AcK on DCTCP special ACK tcp: do not delay ACK in DCTCP upon CE status change YueHaibing (3): net: caif: Add a missing rcu_read_unlock() in caif_flow_cb bpfilter: Fix mismatch in function argument types cpufreq: qcom-kryo: add NULL entry to the end of_device_id array Zhao Chen (1): net-next/hinic: fix a problem in hinic_xmit_frame() Zheng Xiaowei (1): usb: xhci: Fix memory leak in xhci_endpoint_reset() mpubbise@codeaurora.org (1): mac80211: add stations tied to AP_VLANs during hw reconfig ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Linux 4.18-rc7 2018-07-29 22:09 Linux 4.18-rc7 Linus Torvalds @ 2018-07-30 6:47 ` Amit Pundir 2018-07-30 13:01 ` Kirill A. Shutemov 2018-07-30 13:24 ` Guenter Roeck 1 sibling, 1 reply; 32+ messages in thread From: Amit Pundir @ 2018-07-30 6:47 UTC (permalink / raw) To: Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov, aarcange, Linus Torvalds Cc: Greg Kroah-Hartman, John Stultz, linux-mm, lkml On Mon, 30 Jul 2018 at 03:39, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > So unless something odd happens, this should be the last rc for 4.18. > > Nothing particularly odd happened this last week - we got the usual > random set of various minor fixes all over. About two thirds of it is > drivers - networking, staging and usb stands out, but there's a little > bit of stuff all over (clk, block, gpu, nvme..). > > Outside of drivers, the bulk is some core networking stuff, with > random changes elsewhere (minor arch updates, filesystems, core > kernel, test scripts). > > The appended shortlog gives a flavor of the details. > > Linus > > --- > Kirill A. Shutemov (3): > mm: introduce vma_init() > mm: use vma_init() to initialize VMAs on stack and data segments > mm: fix vma_is_anonymous() false-positives Hi, I have run into AOSP userspace crash with v4.18-rc7, leading to above mm patches. bfd40eaff5ab ("mm: fix vma_is_anonymous() false-positives") to be specific. The same userspace is working fine with v4.18-rc6. I didn't yet look into what is going wrong from userspace point of view, but I just wanted to give you a heads up on this. I'll be happy to assist in further debugging/diagnosis if required. Here is the crash log from logcat, if it helps: F DEBUG : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** F DEBUG : Build fingerprint: 'Android/db410c32_only/db410c32_only:Q/OC-MR1/102:userdebug/test-key F DEBUG : Revision: '0' F DEBUG : ABI: 'arm' F DEBUG : pid: 2261, tid: 2261, name: zygote >>> zygote <<< F DEBUG : signal 7 (SIGBUS), code 2 (BUS_ADRERR), fault addr 0xec00008 .. <snip> .. F DEBUG : backtrace: F DEBUG : #00 pc 00001c04 /system/lib/libc.so (memset+48) F DEBUG : #01 pc 0010c513 /system/lib/libart.so (create_mspace_with_base+82) F DEBUG : #02 pc 0015c601 /system/lib/libart.so (art::gc::space::DlMallocSpace::CreateMspace(void*, unsigned int, unsigned int)+40) F DEBUG : #03 pc 0015c3ed /system/lib/libart.so (art::gc::space::DlMallocSpace::CreateFromMemMap(art::MemMap*, std::__1::basic_string<char, std::__ 1::char_traits<char>, std::__1::allocator<char>> const&, unsigned int, unsigned int, unsigned int, unsigned int, bool)+36) F DEBUG : #04 pc 0013c9ab /system/lib/libart.so (art::gc::Heap::Heap(unsigned int, unsigned int, unsigned int, unsigned int, double, double, unsigned int, unsigned int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, art::InstructionSet, art::gc::CollectorType, art::gc::CollectorType, art::gc::space::LargeObjectSpaceType, unsigned int, unsigned int, unsigned int, bool, unsigned int, unsigned int, bool, bool, bool, bool, bool, bool, bool, bool, bool, bool, bool, unsigned long long)+1674) DEBUG : #05 pc 00318201 /system/lib/libart.so (art::Runtime::Init(art::RuntimeArgumentMap&&)+7036) DEBUG : #06 pc 0031af19 /system/lib/libart.so (art::Runtime::Create(std::__1::vector<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, void const*>, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, void const*>>> const&, bool)+68) F DEBUG : #07 pc 0023c353 /system/lib/libart.so (JNI_CreateJavaVM+658) F DEBUG : #08 pc 0000205f /system/lib/libandroid_runtime.so (android::AndroidRuntime::startVm(_JavaVM**, _JNIEnv**, bool)+5038) F DEBUG : #09 pc 00002381 /system/lib/libandroid_runtime.so (android::AndroidRuntime::start(char const*, android::Vector<android::String8> const&, bool)+196) F DEBUG : #10 pc 0000046b /system/bin/app_process32 (main+702) Regards, Amit Pundir ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Linux 4.18-rc7 2018-07-30 6:47 ` Amit Pundir @ 2018-07-30 13:01 ` Kirill A. Shutemov 2018-07-30 13:34 ` Amit Pundir 2018-07-30 17:32 ` Linus Torvalds 0 siblings, 2 replies; 32+ messages in thread From: Kirill A. Shutemov @ 2018-07-30 13:01 UTC (permalink / raw) To: Amit Pundir Cc: Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov, aarcange, Linus Torvalds, Greg Kroah-Hartman, John Stultz, linux-mm, lkml, youling 257 On Mon, Jul 30, 2018 at 12:17:46PM +0530, Amit Pundir wrote: > On Mon, 30 Jul 2018 at 03:39, Linus Torvalds > <torvalds@linux-foundation.org> wrote: > > > > So unless something odd happens, this should be the last rc for 4.18. > > > > Nothing particularly odd happened this last week - we got the usual > > random set of various minor fixes all over. About two thirds of it is > > drivers - networking, staging and usb stands out, but there's a little > > bit of stuff all over (clk, block, gpu, nvme..). > > > > Outside of drivers, the bulk is some core networking stuff, with > > random changes elsewhere (minor arch updates, filesystems, core > > kernel, test scripts). > > > > The appended shortlog gives a flavor of the details. > > > > Linus > > > > --- > > Kirill A. Shutemov (3): > > mm: introduce vma_init() > > mm: use vma_init() to initialize VMAs on stack and data segments > > mm: fix vma_is_anonymous() false-positives > > Hi, I have run into AOSP userspace crash with v4.18-rc7, leading to > above mm patches. bfd40eaff5ab ("mm: fix vma_is_anonymous() > false-positives") to be specific. The same userspace is working fine > with v4.18-rc6. > > I didn't yet look into what is going wrong from userspace point of > view, but I just wanted to give you a heads up on this. I'll be happy > to assist in further debugging/diagnosis if required. Youling reported basically the same bug with zygote crashing, but on x86-64. I think I missed vma_set_anonymous() somewhere, but I fail to see where. Could you check if removing 'vma->vm_ops = &dummy_vm_ops;" from vma_init makes the problem go away? Any chance the code that crashes can be run under strace? > Here is the crash log from logcat, if it helps: > F DEBUG : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** > F DEBUG : Build fingerprint: > 'Android/db410c32_only/db410c32_only:Q/OC-MR1/102:userdebug/test-key > F DEBUG : Revision: '0' > F DEBUG : ABI: 'arm' > F DEBUG : pid: 2261, tid: 2261, name: zygote >>> zygote <<< > F DEBUG : signal 7 (SIGBUS), code 2 (BUS_ADRERR), fault addr 0xec00008 > .. <snip> .. > F DEBUG : backtrace: > F DEBUG : #00 pc 00001c04 /system/lib/libc.so (memset+48) > F DEBUG : #01 pc 0010c513 /system/lib/libart.so > (create_mspace_with_base+82) > F DEBUG : #02 pc 0015c601 /system/lib/libart.so > (art::gc::space::DlMallocSpace::CreateMspace(void*, unsigned int, > unsigned int)+40) > F DEBUG : #03 pc 0015c3ed /system/lib/libart.so > (art::gc::space::DlMallocSpace::CreateFromMemMap(art::MemMap*, > std::__1::basic_string<char, std::__ > 1::char_traits<char>, std::__1::allocator<char>> const&, unsigned int, > unsigned int, unsigned int, unsigned int, bool)+36) > F DEBUG : #04 pc 0013c9ab /system/lib/libart.so > (art::gc::Heap::Heap(unsigned int, unsigned int, unsigned int, > unsigned int, double, double, unsigned int, unsigned int, > std::__1::basic_string<char, std::__1::char_traits<char>, > std::__1::allocator<char>> const&, art::InstructionSet, > art::gc::CollectorType, art::gc::CollectorType, > art::gc::space::LargeObjectSpaceType, unsigned int, unsigned int, > unsigned int, bool, unsigned int, unsigned int, bool, bool, bool, > bool, bool, bool, bool, bool, bool, bool, bool, unsigned long > long)+1674) > DEBUG : #05 pc 00318201 /system/lib/libart.so > (art::Runtime::Init(art::RuntimeArgumentMap&&)+7036) > DEBUG : #06 pc 0031af19 /system/lib/libart.so > (art::Runtime::Create(std::__1::vector<std::__1::pair<std::__1::basic_string<char, > std::__1::char_traits<char>, std::__1::allocator<char>>, void const*>, > std::__1::allocator<std::__1::pair<std::__1::basic_string<char, > std::__1::char_traits<char>, std::__1::allocator<char>>, void > const*>>> const&, bool)+68) > F DEBUG : #07 pc 0023c353 /system/lib/libart.so (JNI_CreateJavaVM+658) > F DEBUG : #08 pc 0000205f /system/lib/libandroid_runtime.so > (android::AndroidRuntime::startVm(_JavaVM**, _JNIEnv**, bool)+5038) > F DEBUG : #09 pc 00002381 /system/lib/libandroid_runtime.so > (android::AndroidRuntime::start(char const*, > android::Vector<android::String8> const&, bool)+196) > F DEBUG : #10 pc 0000046b /system/bin/app_process32 (main+702) > > Regards, > Amit Pundir > -- Kirill A. Shutemov ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Linux 4.18-rc7 2018-07-30 13:01 ` Kirill A. Shutemov @ 2018-07-30 13:34 ` Amit Pundir 2018-07-30 17:32 ` Linus Torvalds 1 sibling, 0 replies; 32+ messages in thread From: Amit Pundir @ 2018-07-30 13:34 UTC (permalink / raw) To: kirill Cc: Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov, aarcange, Linus Torvalds, Greg Kroah-Hartman, John Stultz, linux-mm, lkml, youling 257 On Mon, 30 Jul 2018 at 18:31, Kirill A. Shutemov <kirill@shutemov.name> wrote: > > On Mon, Jul 30, 2018 at 12:17:46PM +0530, Amit Pundir wrote: > > On Mon, 30 Jul 2018 at 03:39, Linus Torvalds > > <torvalds@linux-foundation.org> wrote: > > > > > > So unless something odd happens, this should be the last rc for 4.18. > > > > > > Nothing particularly odd happened this last week - we got the usual > > > random set of various minor fixes all over. About two thirds of it is > > > drivers - networking, staging and usb stands out, but there's a little > > > bit of stuff all over (clk, block, gpu, nvme..). > > > > > > Outside of drivers, the bulk is some core networking stuff, with > > > random changes elsewhere (minor arch updates, filesystems, core > > > kernel, test scripts). > > > > > > The appended shortlog gives a flavor of the details. > > > > > > Linus > > > > > > --- > > > Kirill A. Shutemov (3): > > > mm: introduce vma_init() > > > mm: use vma_init() to initialize VMAs on stack and data segments > > > mm: fix vma_is_anonymous() false-positives > > > > Hi, I have run into AOSP userspace crash with v4.18-rc7, leading to > > above mm patches. bfd40eaff5ab ("mm: fix vma_is_anonymous() > > false-positives") to be specific. The same userspace is working fine > > with v4.18-rc6. > > > > I didn't yet look into what is going wrong from userspace point of > > view, but I just wanted to give you a heads up on this. I'll be happy > > to assist in further debugging/diagnosis if required. > > Youling reported basically the same bug with zygote crashing, but on > x86-64. > > I think I missed vma_set_anonymous() somewhere, but I fail to see where. > > Could you check if removing 'vma->vm_ops = &dummy_vm_ops;" from vma_init > makes the problem go away? Yes removing 'vma->vm_ops = &dummy_vm_ops;" from vma_init() works. Crash is gone with that change. > > Any chance the code that crashes can be run under strace? Running strace on zygote is going to be a pain. I can check logcat again and see if any other relatively less complex process is crashing with similar backtrace and try to run that with strace if that is still required. Regards, Amit Pundir > > > Here is the crash log from logcat, if it helps: > > F DEBUG : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** > > F DEBUG : Build fingerprint: > > 'Android/db410c32_only/db410c32_only:Q/OC-MR1/102:userdebug/test-key > > F DEBUG : Revision: '0' > > F DEBUG : ABI: 'arm' > > F DEBUG : pid: 2261, tid: 2261, name: zygote >>> zygote <<< > > F DEBUG : signal 7 (SIGBUS), code 2 (BUS_ADRERR), fault addr 0xec00008 > > .. <snip> .. > > F DEBUG : backtrace: > > F DEBUG : #00 pc 00001c04 /system/lib/libc.so (memset+48) > > F DEBUG : #01 pc 0010c513 /system/lib/libart.so > > (create_mspace_with_base+82) > > F DEBUG : #02 pc 0015c601 /system/lib/libart.so > > (art::gc::space::DlMallocSpace::CreateMspace(void*, unsigned int, > > unsigned int)+40) > > F DEBUG : #03 pc 0015c3ed /system/lib/libart.so > > (art::gc::space::DlMallocSpace::CreateFromMemMap(art::MemMap*, > > std::__1::basic_string<char, std::__ > > 1::char_traits<char>, std::__1::allocator<char>> const&, unsigned int, > > unsigned int, unsigned int, unsigned int, bool)+36) > > F DEBUG : #04 pc 0013c9ab /system/lib/libart.so > > (art::gc::Heap::Heap(unsigned int, unsigned int, unsigned int, > > unsigned int, double, double, unsigned int, unsigned int, > > std::__1::basic_string<char, std::__1::char_traits<char>, > > std::__1::allocator<char>> const&, art::InstructionSet, > > art::gc::CollectorType, art::gc::CollectorType, > > art::gc::space::LargeObjectSpaceType, unsigned int, unsigned int, > > unsigned int, bool, unsigned int, unsigned int, bool, bool, bool, > > bool, bool, bool, bool, bool, bool, bool, bool, unsigned long > > long)+1674) > > DEBUG : #05 pc 00318201 /system/lib/libart.so > > (art::Runtime::Init(art::RuntimeArgumentMap&&)+7036) > > DEBUG : #06 pc 0031af19 /system/lib/libart.so > > (art::Runtime::Create(std::__1::vector<std::__1::pair<std::__1::basic_string<char, > > std::__1::char_traits<char>, std::__1::allocator<char>>, void const*>, > > std::__1::allocator<std::__1::pair<std::__1::basic_string<char, > > std::__1::char_traits<char>, std::__1::allocator<char>>, void > > const*>>> const&, bool)+68) > > F DEBUG : #07 pc 0023c353 /system/lib/libart.so (JNI_CreateJavaVM+658) > > F DEBUG : #08 pc 0000205f /system/lib/libandroid_runtime.so > > (android::AndroidRuntime::startVm(_JavaVM**, _JNIEnv**, bool)+5038) > > F DEBUG : #09 pc 00002381 /system/lib/libandroid_runtime.so > > (android::AndroidRuntime::start(char const*, > > android::Vector<android::String8> const&, bool)+196) > > F DEBUG : #10 pc 0000046b /system/bin/app_process32 (main+702) > > > > Regards, > > Amit Pundir > > > > -- > Kirill A. Shutemov ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Linux 4.18-rc7 2018-07-30 13:01 ` Kirill A. Shutemov 2018-07-30 13:34 ` Amit Pundir @ 2018-07-30 17:32 ` Linus Torvalds 2018-07-30 21:53 ` Hugh Dickins 1 sibling, 1 reply; 32+ messages in thread From: Linus Torvalds @ 2018-07-30 17:32 UTC (permalink / raw) To: Kirill A. Shutemov, Hugh Dickins, Matthew Wilcox Cc: Amit Pundir, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov, Andrea Arcangeli, Greg Kroah-Hartman, John Stultz, linux-mm, Linux Kernel Mailing List, youling257 On Mon, Jul 30, 2018 at 6:01 AM Kirill A. Shutemov <kirill@shutemov.name> wrote: > > I think I missed vma_set_anonymous() somewhere, but I fail to see where. Honestly, by now we just need to revert that commit. It's not even clear that it was a good idea to begin with. The rest of the commits were cleanups, this one was driven by a incorrect VM_BUG_ON() that triggered, and that checked "vma_is_anonymous(vma)" without any explanations of wht it should matter. I think the biggest problem with vma_is_anonymous() may be its name, not what it does. What the code historically *did* (and what vma_is_anonymous() checks) is not "is this anonymous", but rather "does this have any special operations associated with it". The two are similar. But people have grown opinions about exactly what "anonymous" means. If we had named it just "no_vm_ops()", we wouldn't have random crazy checks for "vma_is_anonymous()" in places where it makes no sense. So what I think we want a real explanation for is why people who use "vma_is_anonymous()" care. Instead of trying to change its very historical meaning, we should look at the users, and perhaps change its name. In this case, for example, I think the *real* problem was described by commit 684283988f70 ("huge pagecache: mmap_sem is unlocked when truncation splits pmd"), and the problem is that an existing check that required that mmap_sem was held was changed to say "only for anonymous mappings". But the fact is, you can truncate mappings that don't have any ops just *fine*. So maybe that original BUG() was entirely bogus to begin with, and it shouldn't exist at all? Or maybe the code should test "do I have a vm_file" instead of testing "do I have vm_ops"? What's the problem with just doing split_huge_pmd() there when it's a pmd_trans_huge or pmd_devmap pmd? Why is that VM_BUG_ON_VMA() there in the first place? Why are allegedly "anonymous" mappings so special here for locking? Adding a few more people to the cc, they were involved the last that time VM_BUG_ON_VMA() was modified. New people: see commit bfd40eaff5ab ("mm: fix vma_is_anonymous() false-positives") for details. Right now I think it's getting reverted, but the oops explanation in the commit is about that kernel BUG at mm/memory.c:1422! which was/is debatable and seems to make no sense (and definitely is still triggerable despite that commit 684283988f70 ("huge pagecache: mmap_sem is unlocked when truncation splits pmd") that limited it a bit - but I think it didn't limit it enough. Linus ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Linux 4.18-rc7 2018-07-30 17:32 ` Linus Torvalds @ 2018-07-30 21:53 ` Hugh Dickins 2018-07-31 1:01 ` Linus Torvalds 0 siblings, 1 reply; 32+ messages in thread From: Hugh Dickins @ 2018-07-30 21:53 UTC (permalink / raw) To: Linus Torvalds Cc: Kirill A. Shutemov, Hugh Dickins, Matthew Wilcox, Amit Pundir, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov, Andrea Arcangeli, Greg Kroah-Hartman, John Stultz, linux-mm, Linux Kernel Mailing List, youling257 On Mon, 30 Jul 2018, Linus Torvalds wrote: > On Mon, Jul 30, 2018 at 6:01 AM Kirill A. Shutemov <kirill@shutemov.name> wrote: > > > > I think I missed vma_set_anonymous() somewhere, but I fail to see where. > > Honestly, by now we just need to revert that commit. > > It's not even clear that it was a good idea to begin with. The rest of > the commits were cleanups, this one was driven by a incorrect > VM_BUG_ON() that triggered, and that checked "vma_is_anonymous(vma)" > without any explanations of wht it should matter. > > I think the biggest problem with vma_is_anonymous() may be its name, > not what it does. > > What the code historically *did* (and what vma_is_anonymous() checks) > is not "is this anonymous", but rather "does this have any special > operations associated with it". > > The two are similar. But people have grown opinions about exactly what > "anonymous" means. If we had named it just "no_vm_ops()", we wouldn't > have random crazy checks for "vma_is_anonymous()" in places where it > makes no sense. > > So what I think we want a real explanation for is why people who use > "vma_is_anonymous()" care. Instead of trying to change its very > historical meaning, we should look at the users, and perhaps change > its name. You make a very good point on the naming. Especially confusing when layered on top of "we call shmem 'file' here, but 'anon' there". I have no problem with reverting -rc7's vma_is_anonymous() series. > > In this case, for example, I think the *real* problem was described by > commit 684283988f70 ("huge pagecache: mmap_sem is unlocked when > truncation splits pmd"), and the problem is that an existing check > that required that mmap_sem was held was changed to say "only for > anonymous mappings". > > But the fact is, you can truncate mappings that don't have any ops just *fine*. > > So maybe that original BUG() was entirely bogus to begin with, and it > shouldn't exist at all? > > Or maybe the code should test "do I have a vm_file" instead of testing > "do I have vm_ops"? > > What's the problem with just doing split_huge_pmd() there when it's a > pmd_trans_huge or pmd_devmap pmd? Why is that VM_BUG_ON_VMA() there in > the first place? Why are allegedly "anonymous" mappings so special > here for locking? > > Adding a few more people to the cc, they were involved the last that > time VM_BUG_ON_VMA() was modified. > > New people: see commit bfd40eaff5ab ("mm: fix vma_is_anonymous() > false-positives") for details. Right now I think it's getting > reverted, but the oops explanation in the commit is about that > > kernel BUG at mm/memory.c:1422! > > which was/is debatable and seems to make no sense (and definitely is > still triggerable despite that commit 684283988f70 ("huge pagecache: > mmap_sem is unlocked when truncation splits pmd") that limited it a > bit - but I think it didn't limit it enough. I'm all for deleting that VM_BUG_ON_VMA() in zap_pmd_range(), it was just a compromise with those who wanted to keep something there; I don't think we even need a WARN_ON_ONCE() now. It's historical: back in the day when only (hugetlbfs which never gets there, and) anon THP used huge pmds for userspace, it did half- document an obscure assumption underlying __split_huge_pmd(), and IIRC was added in response to a trinity bug catch in that area. (It remains quite interesting how exit_mmap() does not come that way, and most syscalls split the vma beforehand in vma_adjust(): it's mostly about madvise(,,MADV_DONTNEED), perhaps others now, which zap ptes without prior splitting.) Once DAX and huge tmpfs came in, the BUG had to be scrapped or weakened because truncation and hole-punching can come that way without mmap_sem. And perhaps other drivers since are taking advantage of huge pmds now, with other assumptions. But I have not yet taken a look to see where the -rc7 changes actually go wrong: I'll spend a little while looking, but don't expect to find it - certainly don't wait on me, I'll only speak up if I find something. Hugh ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Linux 4.18-rc7 2018-07-30 21:53 ` Hugh Dickins @ 2018-07-31 1:01 ` Linus Torvalds 2018-07-31 3:26 ` Hugh Dickins 2018-07-31 6:29 ` Linux 4.18-rc7 Kirill A. Shutemov 0 siblings, 2 replies; 32+ messages in thread From: Linus Torvalds @ 2018-07-31 1:01 UTC (permalink / raw) To: Hugh Dickins Cc: Kirill A. Shutemov, Matthew Wilcox, Amit Pundir, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov, Andrea Arcangeli, Greg Kroah-Hartman, John Stultz, linux-mm, Linux Kernel Mailing List, youling257 On Mon, Jul 30, 2018 at 2:53 PM Hugh Dickins <hughd@google.com> wrote: > > I have no problem with reverting -rc7's vma_is_anonymous() series. I don't think we need to revert the whole series: I think the rest are all fairly obvious cleanups, and shouldn't really have any semantic changes. It's literally only that last patch in the series that then changes that meaning of "vm_ops". And I don't really _mind_ that last step either, but since we don't know exactly what it was that it broke, and we're past rc7, I don't think we really have any option but the revert it. And if we revert it, I think we need to just remove the VM_BUG_ON_VMA() that it was supposed to fix. Because I do think that it is quite likely that the real bug is that overzealous BUG_ON(), since I can't see any reason why anonymous mappings should be special there. But I'm certainly also ok with re-visiting that commit later. I just think that right _now_ the above is my preferred plan. Kirill? > I'm all for deleting that VM_BUG_ON_VMA() in zap_pmd_range(), it was > just a compromise with those who wanted to keep something there; > I don't think we even need a WARN_ON_ONCE() now. So to me it looks like a historical check that simply doesn't "normally" trigger, but there's no reason I can see why we should care about the case it tests against. > (It remains quite interesting how exit_mmap() does not come that way, > and most syscalls split the vma beforehand in vma_adjust(): it's mostly > about madvise(,,MADV_DONTNEED), perhaps others now, which zap ptes > without prior splitting.) Well, in this case it's the ftruncate() path, which fundamentally doesn't split the vma at all (prior *or* later). But yes, madvise() is in the same boat - it doesn't change the vma at all, it just changes the contents of the vma. And exit_mmap() is special because it just tears down everything. So we do have three very distinct cases: (a) changing and thus splitting the vma itself (mprotect, munmap/mmap, mlock), (b) not changing the vma, but changing the underlying mapping (truncate and madvise(MADV_DONTNEED) (c) tearing down everything, and no locking needed because it's the last user (exit_mmap). that are different for what I think are good reasons. Linus ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Linux 4.18-rc7 2018-07-31 1:01 ` Linus Torvalds @ 2018-07-31 3:26 ` Hugh Dickins 2018-07-31 4:25 ` John Stultz 2018-07-31 6:29 ` Linux 4.18-rc7 Kirill A. Shutemov 1 sibling, 1 reply; 32+ messages in thread From: Hugh Dickins @ 2018-07-31 3:26 UTC (permalink / raw) To: Linus Torvalds Cc: Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox, Amit Pundir, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov, Andrea Arcangeli, Greg Kroah-Hartman, John Stultz, linux-mm, Linux Kernel Mailing List, youling257 On Mon, 30 Jul 2018, Linus Torvalds wrote: > On Mon, Jul 30, 2018 at 2:53 PM Hugh Dickins <hughd@google.com> wrote: > > > > I have no problem with reverting -rc7's vma_is_anonymous() series. > > I don't think we need to revert the whole series: I think the rest are > all fairly obvious cleanups, and shouldn't really have any semantic > changes. Okay. > > It's literally only that last patch in the series that then changes > that meaning of "vm_ops". And I don't really _mind_ that last step > either, but since we don't know exactly what it was that it broke, and > we're past rc7, I don't think we really have any option but the revert > it. It took me a long time to grasp what was happening, that that last patch bfd40eaff5ab was fixing. Not quite explained in the commit. I think it was that by mistakenly passing the vma_is_anonymous() test, create_huge_pmd() gave the MAP_PRIVATE kcov mapping a THP (instead of COWing pages from kcov); which the truncate then had to split, and in going to do so, again hit the mistaken vma_is_anonymous() test, BUG. > > And if we revert it, I think we need to just remove the > VM_BUG_ON_VMA() that it was supposed to fix. Because I do think that > it is quite likely that the real bug is that overzealous BUG_ON(), > since I can't see any reason why anonymous mappings should be special > there. Yes, that probably has to go: but it's not clear what state it leaves us in, with an anon THP being split by a truncate, without the expected locking; I don't remember offhand, probably a subtler bug than that BUG, which you may or may not consider an improvement. I fear that Kirill has not missed inserting a vma_set_anonymous() from somewhere that it should be, but rather that zygote is working with some special mapping which used to satisfy vma_is_anonymous(), faults supplying backing pages, but now comes out as !vma_is_anonymous(), so do_fault() finds !dummy_vm_ops.fault hence SIGBUS. If that's so, perhaps dummy_vm_ops needs to be given a back-compatible fault handler; or the driver(?) in question given vm_ops and that fault handler. But when I say "back-compatible", I don't think it should ever go so far as to supply a THP. > > But I'm certainly also ok with re-visiting that commit later. I just > think that right _now_ the above is my preferred plan. > > Kirill? > > > I'm all for deleting that VM_BUG_ON_VMA() in zap_pmd_range(), it was > > just a compromise with those who wanted to keep something there; > > I don't think we even need a WARN_ON_ONCE() now. > > So to me it looks like a historical check that simply doesn't > "normally" trigger, but there's no reason I can see why we should care > about the case it tests against. > > > (It remains quite interesting how exit_mmap() does not come that way, > > and most syscalls split the vma beforehand in vma_adjust(): it's mostly > > about madvise(,,MADV_DONTNEED), perhaps others now, which zap ptes > > without prior splitting.) > > Well, in this case it's the ftruncate() path, which fundamentally > doesn't split the vma at all (prior *or* later). But yes, madvise() is > in the same boat - it doesn't change the vma at all, it just changes > the contents of the vma. > > And exit_mmap() is special because it just tears down everything. > > So we do have three very distinct cases: > > (a) changing and thus splitting the vma itself (mprotect, munmap/mmap, mlock), Yes. > > (b) not changing the vma, but changing the underlying mapping > (truncate and madvise(MADV_DONTNEED) Yes, though I think I would distinguish the truncate & hole-punch case from the madvise zap case, they have different characteristics in other ways (or did, before that awkward case of truncating an anon THP surfaced). > > (c) tearing down everything, and no locking needed because it's the > last user (exit_mmap). Yes; and it goes linearly from start to finish, never jumping into the middle of a vma, so never needing to split a THP. > > that are different for what I think are good reasons. > > Linus ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Linux 4.18-rc7 2018-07-31 3:26 ` Hugh Dickins @ 2018-07-31 4:25 ` John Stultz 2018-07-31 6:40 ` Amit Pundir 0 siblings, 1 reply; 32+ messages in thread From: John Stultz @ 2018-07-31 4:25 UTC (permalink / raw) To: Hugh Dickins Cc: Linus Torvalds, Kirill A. Shutemov, Matthew Wilcox, Amit Pundir, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov, Andrea Arcangeli, Greg Kroah-Hartman, linux-mm, Linux Kernel Mailing List, youling257, Joel Fernandes, Colin Cross On Mon, Jul 30, 2018 at 8:26 PM, Hugh Dickins <hughd@google.com> wrote: > On Mon, 30 Jul 2018, Linus Torvalds wrote: >> On Mon, Jul 30, 2018 at 2:53 PM Hugh Dickins <hughd@google.com> wrote: >> > >> > I have no problem with reverting -rc7's vma_is_anonymous() series. >> >> I don't think we need to revert the whole series: I think the rest are >> all fairly obvious cleanups, and shouldn't really have any semantic >> changes. > > Okay. > >> >> It's literally only that last patch in the series that then changes >> that meaning of "vm_ops". And I don't really _mind_ that last step >> either, but since we don't know exactly what it was that it broke, and >> we're past rc7, I don't think we really have any option but the revert >> it. > > It took me a long time to grasp what was happening, that that last > patch bfd40eaff5ab was fixing. Not quite explained in the commit. > > I think it was that by mistakenly passing the vma_is_anonymous() test, > create_huge_pmd() gave the MAP_PRIVATE kcov mapping a THP (instead of > COWing pages from kcov); which the truncate then had to split, and in > going to do so, again hit the mistaken vma_is_anonymous() test, BUG. > >> >> And if we revert it, I think we need to just remove the >> VM_BUG_ON_VMA() that it was supposed to fix. Because I do think that >> it is quite likely that the real bug is that overzealous BUG_ON(), >> since I can't see any reason why anonymous mappings should be special >> there. > > Yes, that probably has to go: but it's not clear what state it leaves > us in, with an anon THP being split by a truncate, without the expected > locking; I don't remember offhand, probably a subtler bug than that BUG, > which you may or may not consider an improvement. > > I fear that Kirill has not missed inserting a vma_set_anonymous() from > somewhere that it should be, but rather that zygote is working with some > special mapping which used to satisfy vma_is_anonymous(), faults supplying > backing pages, but now comes out as !vma_is_anonymous(), so do_fault() > finds !dummy_vm_ops.fault hence SIGBUS. I've been only casually following this thread (mostly just glad Amit caught it and I could avoid having to bisect the issue in my own Android testing), but this bit starting to shake a few old cobwebs loose in my brain. I'm wondering if Zygote is utilizing ashmem here, and we're somehow traversing ashmem purged memory, or due to some setup issue the initial traverse isn't being zero-filled as expected? ashmem ranges are created using: shmem_file_setup() and shmem_zero_setup() https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/android/ashmem.c#n377 If we purge pages, it punches it out with: vfs_fallocate(range->asma->file, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, start, end - start); here: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/android/ashmem.c#n447 But in ashmem_pin(), we don't do anything other then returning if we purged any page in the range. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/android/ashmem.c#n577 And I believe the future assumption is the if we traverse those pages they will be zero filled (if purged or even during the initial traversal after mmap) Its been a long time, and I've not looked at the code in question but it sounds from Hugh's comments above that we might instead get a SIGBUS here. Looking more at the problematic patch.. Amit: Does adding something like (whitespace damaged, apologies): index a1a0025..1af6915 100644 --- a/drivers/staging/android/ashmem.c +++ b/drivers/staging/android/ashmem.c @@ -402,7 +402,8 @@ static int ashmem_mmap(struct file *file, struct vm_area_struct *vma) fput(asma->file); goto out; } - } + } else + vma_set_anonymous(vma); if (vma->vm_file) fput(vma->vm_file); Seem to resolve it? (Sorry, I'd test it myself, but I'm away from my desk for the night). thanks -john ^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: Linux 4.18-rc7 2018-07-31 4:25 ` John Stultz @ 2018-07-31 6:40 ` Amit Pundir 2018-07-31 6:56 ` Kirill A. Shutemov 2018-07-31 16:29 ` Linus Torvalds 0 siblings, 2 replies; 32+ messages in thread From: Amit Pundir @ 2018-07-31 6:40 UTC (permalink / raw) To: John Stultz Cc: Hugh Dickins, Linus Torvalds, kirill, willy, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov, aarcange, Greg Kroah-Hartman, linux-mm, lkml, youling 257, Joel Fernandes, ccross On Tue, 31 Jul 2018 at 09:55, John Stultz <john.stultz@linaro.org> wrote: > > On Mon, Jul 30, 2018 at 8:26 PM, Hugh Dickins <hughd@google.com> wrote: > > On Mon, 30 Jul 2018, Linus Torvalds wrote: > >> On Mon, Jul 30, 2018 at 2:53 PM Hugh Dickins <hughd@google.com> wrote: > >> > > >> > I have no problem with reverting -rc7's vma_is_anonymous() series. > >> > >> I don't think we need to revert the whole series: I think the rest are > >> all fairly obvious cleanups, and shouldn't really have any semantic > >> changes. > > > > Okay. > > > >> > >> It's literally only that last patch in the series that then changes > >> that meaning of "vm_ops". And I don't really _mind_ that last step > >> either, but since we don't know exactly what it was that it broke, and > >> we're past rc7, I don't think we really have any option but the revert > >> it. > > > > It took me a long time to grasp what was happening, that that last > > patch bfd40eaff5ab was fixing. Not quite explained in the commit. > > > > I think it was that by mistakenly passing the vma_is_anonymous() test, > > create_huge_pmd() gave the MAP_PRIVATE kcov mapping a THP (instead of > > COWing pages from kcov); which the truncate then had to split, and in > > going to do so, again hit the mistaken vma_is_anonymous() test, BUG. > > > >> > >> And if we revert it, I think we need to just remove the > >> VM_BUG_ON_VMA() that it was supposed to fix. Because I do think that > >> it is quite likely that the real bug is that overzealous BUG_ON(), > >> since I can't see any reason why anonymous mappings should be special > >> there. > > > > Yes, that probably has to go: but it's not clear what state it leaves > > us in, with an anon THP being split by a truncate, without the expected > > locking; I don't remember offhand, probably a subtler bug than that BUG, > > which you may or may not consider an improvement. > > > > I fear that Kirill has not missed inserting a vma_set_anonymous() from > > somewhere that it should be, but rather that zygote is working with some > > special mapping which used to satisfy vma_is_anonymous(), faults supplying > > backing pages, but now comes out as !vma_is_anonymous(), so do_fault() > > finds !dummy_vm_ops.fault hence SIGBUS. > > I've been only casually following this thread (mostly just glad Amit > caught it and I could avoid having to bisect the issue in my own > Android testing), but this bit starting to shake a few old cobwebs > loose in my brain. > > I'm wondering if Zygote is utilizing ashmem here, and we're somehow > traversing ashmem purged memory, or due to some setup issue the > initial traverse isn't being zero-filled as expected? > > ashmem ranges are created using: shmem_file_setup() and shmem_zero_setup() > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/android/ashmem.c#n377 > > > If we purge pages, it punches it out with: > vfs_fallocate(range->asma->file, > FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, > start, end - start); > here: > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/android/ashmem.c#n447 > > But in ashmem_pin(), we don't do anything other then returning if we > purged any page in the range. > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/android/ashmem.c#n577 > > And I believe the future assumption is the if we traverse those pages > they will be zero filled (if purged or even during the initial > traversal after mmap) > > Its been a long time, and I've not looked at the code in question but > it sounds from Hugh's comments above that we might instead get a > SIGBUS here. > > Looking more at the problematic patch.. > Amit: Does adding something like (whitespace damaged, apologies): > > index a1a0025..1af6915 100644 > --- a/drivers/staging/android/ashmem.c > +++ b/drivers/staging/android/ashmem.c > @@ -402,7 +402,8 @@ static int ashmem_mmap(struct file *file, struct > vm_area_struct *vma) > fput(asma->file); > goto out; > } > - } > + } else > + vma_set_anonymous(vma); > > if (vma->vm_file) > fput(vma->vm_file); > This ashmem change ^^ worked too. Regards, Amit Pundir > > Seem to resolve it? (Sorry, I'd test it myself, but I'm away from my > desk for the night). > thanks > -john ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Linux 4.18-rc7 2018-07-31 6:40 ` Amit Pundir @ 2018-07-31 6:56 ` Kirill A. Shutemov 2018-07-31 16:29 ` Linus Torvalds 1 sibling, 0 replies; 32+ messages in thread From: Kirill A. Shutemov @ 2018-07-31 6:56 UTC (permalink / raw) To: Amit Pundir Cc: John Stultz, Hugh Dickins, Linus Torvalds, willy, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov, aarcange, Greg Kroah-Hartman, linux-mm, lkml, youling 257, Joel Fernandes, ccross On Tue, Jul 31, 2018 at 12:10:06PM +0530, Amit Pundir wrote: > On Tue, 31 Jul 2018 at 09:55, John Stultz <john.stultz@linaro.org> wrote: > > > > On Mon, Jul 30, 2018 at 8:26 PM, Hugh Dickins <hughd@google.com> wrote: > > > On Mon, 30 Jul 2018, Linus Torvalds wrote: > > >> On Mon, Jul 30, 2018 at 2:53 PM Hugh Dickins <hughd@google.com> wrote: > > >> > > > >> > I have no problem with reverting -rc7's vma_is_anonymous() series. > > >> > > >> I don't think we need to revert the whole series: I think the rest are > > >> all fairly obvious cleanups, and shouldn't really have any semantic > > >> changes. > > > > > > Okay. > > > > > >> > > >> It's literally only that last patch in the series that then changes > > >> that meaning of "vm_ops". And I don't really _mind_ that last step > > >> either, but since we don't know exactly what it was that it broke, and > > >> we're past rc7, I don't think we really have any option but the revert > > >> it. > > > > > > It took me a long time to grasp what was happening, that that last > > > patch bfd40eaff5ab was fixing. Not quite explained in the commit. > > > > > > I think it was that by mistakenly passing the vma_is_anonymous() test, > > > create_huge_pmd() gave the MAP_PRIVATE kcov mapping a THP (instead of > > > COWing pages from kcov); which the truncate then had to split, and in > > > going to do so, again hit the mistaken vma_is_anonymous() test, BUG. > > > > > >> > > >> And if we revert it, I think we need to just remove the > > >> VM_BUG_ON_VMA() that it was supposed to fix. Because I do think that > > >> it is quite likely that the real bug is that overzealous BUG_ON(), > > >> since I can't see any reason why anonymous mappings should be special > > >> there. > > > > > > Yes, that probably has to go: but it's not clear what state it leaves > > > us in, with an anon THP being split by a truncate, without the expected > > > locking; I don't remember offhand, probably a subtler bug than that BUG, > > > which you may or may not consider an improvement. > > > > > > I fear that Kirill has not missed inserting a vma_set_anonymous() from > > > somewhere that it should be, but rather that zygote is working with some > > > special mapping which used to satisfy vma_is_anonymous(), faults supplying > > > backing pages, but now comes out as !vma_is_anonymous(), so do_fault() > > > finds !dummy_vm_ops.fault hence SIGBUS. > > > > I've been only casually following this thread (mostly just glad Amit > > caught it and I could avoid having to bisect the issue in my own > > Android testing), but this bit starting to shake a few old cobwebs > > loose in my brain. > > > > I'm wondering if Zygote is utilizing ashmem here, and we're somehow > > traversing ashmem purged memory, or due to some setup issue the > > initial traverse isn't being zero-filled as expected? > > > > ashmem ranges are created using: shmem_file_setup() and shmem_zero_setup() > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/android/ashmem.c#n377 > > > > > > If we purge pages, it punches it out with: > > vfs_fallocate(range->asma->file, > > FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, > > start, end - start); > > here: > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/android/ashmem.c#n447 > > > > But in ashmem_pin(), we don't do anything other then returning if we > > purged any page in the range. > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/android/ashmem.c#n577 > > > > And I believe the future assumption is the if we traverse those pages > > they will be zero filled (if purged or even during the initial > > traversal after mmap) > > > > Its been a long time, and I've not looked at the code in question but > > it sounds from Hugh's comments above that we might instead get a > > SIGBUS here. > > > > Looking more at the problematic patch.. > > Amit: Does adding something like (whitespace damaged, apologies): > > > > index a1a0025..1af6915 100644 > > --- a/drivers/staging/android/ashmem.c > > +++ b/drivers/staging/android/ashmem.c > > @@ -402,7 +402,8 @@ static int ashmem_mmap(struct file *file, struct > > vm_area_struct *vma) > > fput(asma->file); > > goto out; > > } > > - } > > + } else > > + vma_set_anonymous(vma); > > > > if (vma->vm_file) > > fput(vma->vm_file); > > > > This ashmem change ^^ worked too. Okay. It makes sense. But I'm not convinced that's a legitimate way to get an anonymous mapping. I don't know how ashmem suppose to work. Looks like we get a shmem file associated with the mapping, even if user asked for private mapping. Shouldn't in this case vm_ops point to shmem_vm_ops? Note, we have only one other case when MAP_PRIVATE on a file gets you an anonymous mapping -- /dev/zero. -- Kirill A. Shutemov ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Linux 4.18-rc7 2018-07-31 6:40 ` Amit Pundir 2018-07-31 6:56 ` Kirill A. Shutemov @ 2018-07-31 16:29 ` Linus Torvalds 2018-07-31 16:56 ` John Stultz ` (2 more replies) 1 sibling, 3 replies; 32+ messages in thread From: Linus Torvalds @ 2018-07-31 16:29 UTC (permalink / raw) To: Amit Pundir Cc: John Stultz, Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov, Andrea Arcangeli, Greg Kroah-Hartman, linux-mm, Linux Kernel Mailing List, youling 257, Joel Fernandes, Colin Cross On Mon, Jul 30, 2018 at 11:40 PM Amit Pundir <amit.pundir@linaro.org> wrote: > > This ashmem change ^^ worked too. Ok, let's go for that one and hope it's the only one. John, can I get a proper commit message and sign-off for that ashmem change? Kirill - you mentioned that somebody reproduced a problem on x86-64 too. I didn't see that report. Was that some odd x86 Android setup with Ashmem too, or is there something else pending? Linus ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Linux 4.18-rc7 2018-07-31 16:29 ` Linus Torvalds @ 2018-07-31 16:56 ` John Stultz 2018-07-31 17:03 ` Kirill A. Shutemov 2018-07-31 17:17 ` [PATCH] staging: ashmem: Fix SIGBUS crash when traversing mmaped ashmem pages John Stultz 2 siblings, 0 replies; 32+ messages in thread From: John Stultz @ 2018-07-31 16:56 UTC (permalink / raw) To: Linus Torvalds Cc: Amit Pundir, Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov, Andrea Arcangeli, Greg Kroah-Hartman, linux-mm, Linux Kernel Mailing List, youling 257, Joel Fernandes, Colin Cross On Tue, Jul 31, 2018 at 9:29 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Mon, Jul 30, 2018 at 11:40 PM Amit Pundir <amit.pundir@linaro.org> wrote: >> >> This ashmem change ^^ worked too. > > Ok, let's go for that one and hope it's the only one. > > John, can I get a proper commit message and sign-off for that ashmem change? Will do. Just doing some local testing myself to make sure all is well. > Kirill - you mentioned that somebody reproduced a problem on x86-64 > too. I didn't see that report. Was that some odd x86 Android setup > with Ashmem too, or is there something else pending? Krill mentioned "zygote crashing, but on x86-64" and zygote is Android so I assume it is the same issue. thanks -john ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Linux 4.18-rc7 2018-07-31 16:29 ` Linus Torvalds 2018-07-31 16:56 ` John Stultz @ 2018-07-31 17:03 ` Kirill A. Shutemov 2018-07-31 17:43 ` Luck, Tony 2018-07-31 17:17 ` [PATCH] staging: ashmem: Fix SIGBUS crash when traversing mmaped ashmem pages John Stultz 2 siblings, 1 reply; 32+ messages in thread From: Kirill A. Shutemov @ 2018-07-31 17:03 UTC (permalink / raw) To: Linus Torvalds, Luck, Tony Cc: Amit Pundir, John Stultz, Hugh Dickins, Matthew Wilcox, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov, Andrea Arcangeli, Greg Kroah-Hartman, linux-mm, Linux Kernel Mailing List, youling 257, Joel Fernandes, Colin Cross On Tue, Jul 31, 2018 at 09:29:22AM -0700, Linus Torvalds wrote: > On Mon, Jul 30, 2018 at 11:40 PM Amit Pundir <amit.pundir@linaro.org> wrote: > > > > This ashmem change ^^ worked too. > > Ok, let's go for that one and hope it's the only one. > > John, can I get a proper commit message and sign-off for that ashmem change? > > Kirill - you mentioned that somebody reproduced a problem on x86-64 > too. I didn't see that report. Was that some odd x86 Android setup > with Ashmem too, or is there something else pending? I've got report from youling privately: "mm: fix vma_is_anonymous() false-positives" cause my userspace boot failed, our Androidx86 userspace can running on linux mainline kernel, revert it boot succeed with 4.18rc7 kernel. "mm: fix vma_is_anonymous() false-positives" cause these 07-30 11:04:19.556 1609 1609 F DEBUG : pid: 1304, tid: 1304, name: zygote >>> zygote <<< 07-30 11:04:19.556 1609 1609 F DEBUG : signal 7 (SIGBUS), code 2 (BUS_ADRERR), fault addr 0x7494d008 07-30 11:04:19.556 1609 1609 F DEBUG : eax 00000000 ebx f337bb68 ecx 000001e0 edx 7494d008 07-30 11:04:19.556 1609 1609 F DEBUG : esi 7494d000 edi 00000000 07-30 11:04:19.556 1609 1609 F DEBUG : xcs 00000023 xds 0000002b xes 0000002b xfs 00000003 xss 0000002b 07-30 11:04:19.556 1609 1609 F DEBUG : eip f40f5c76 ebp ffa8d288 esp ffa8d238 flags 00010202 07-30 11:04:19.581 1609 1609 F DEBUG : ------------------------------------------------------------------------- The report also had screenshot attached about system info. It's a Baytrail tablet with LinageOS, so I believe it's the same issue. But it's not the only issue unfortunately. Tony reported issue with booting ia64 with the patch. I have no clue why. I rechecked everything ia64-specific and looks fine to me. :-/ -- Kirill A. Shutemov ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Linux 4.18-rc7 2018-07-31 17:03 ` Kirill A. Shutemov @ 2018-07-31 17:43 ` Luck, Tony 2018-07-31 19:02 ` Linus Torvalds 2018-08-01 17:15 ` Linus Torvalds 0 siblings, 2 replies; 32+ messages in thread From: Luck, Tony @ 2018-07-31 17:43 UTC (permalink / raw) To: Kirill A. Shutemov Cc: Linus Torvalds, Amit Pundir, John Stultz, Hugh Dickins, Matthew Wilcox, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov, Andrea Arcangeli, Greg Kroah-Hartman, linux-mm, Linux Kernel Mailing List, youling 257, Joel Fernandes, Colin Cross On Tue, Jul 31, 2018 at 08:03:28PM +0300, Kirill A. Shutemov wrote: > But it's not the only issue unfortunately. Tony reported issue with > booting ia64 with the patch. I have no clue why. I rechecked everything > ia64-specific and looks fine to me. :-/ If I just revert bfd40eaff5ab ("mm: fix vma_is_anonymous() false-positives") then ia64 boots again. -Tony ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Linux 4.18-rc7 2018-07-31 17:43 ` Luck, Tony @ 2018-07-31 19:02 ` Linus Torvalds 2018-08-01 17:15 ` Linus Torvalds 1 sibling, 0 replies; 32+ messages in thread From: Linus Torvalds @ 2018-07-31 19:02 UTC (permalink / raw) To: Tony Luck Cc: Kirill A. Shutemov, Amit Pundir, John Stultz, Hugh Dickins, Matthew Wilcox, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov, Andrea Arcangeli, Greg Kroah-Hartman, linux-mm, Linux Kernel Mailing List, youling 257, Joel Fernandes, Colin Cross On Tue, Jul 31, 2018 at 10:43 AM Luck, Tony <tony.luck@intel.com> wrote: > > If I just revert bfd40eaff5ab ("mm: fix vma_is_anonymous() false-positives") > then ia64 boots again. Ok, so it's not just the ashmem thing. I think I'll do an rc8 with the revert, just so that we'll have some time to figure this out. It's only Tuesday, but I already have 90 commits since rc7, so this isn't the only issue we're having. I _prefer_ just the regular cadence of releases, but when I have a reason to delay, I'll delay. Linus ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Linux 4.18-rc7 2018-07-31 17:43 ` Luck, Tony 2018-07-31 19:02 ` Linus Torvalds @ 2018-08-01 17:15 ` Linus Torvalds 2018-08-01 18:31 ` Hugh Dickins ` (3 more replies) 1 sibling, 4 replies; 32+ messages in thread From: Linus Torvalds @ 2018-08-01 17:15 UTC (permalink / raw) To: Tony Luck Cc: Kirill A. Shutemov, Amit Pundir, John Stultz, Hugh Dickins, Matthew Wilcox, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov, Andrea Arcangeli, Greg Kroah-Hartman, linux-mm, Linux Kernel Mailing List, youling 257, Joel Fernandes, Colin Cross On Tue, Jul 31, 2018 at 10:43 AM Luck, Tony <tony.luck@intel.com> wrote: > > On Tue, Jul 31, 2018 at 08:03:28PM +0300, Kirill A. Shutemov wrote: > > But it's not the only issue unfortunately. Tony reported issue with > > booting ia64 with the patch. I have no clue why. I rechecked everything > > ia64-specific and looks fine to me. :-/ > > If I just revert bfd40eaff5ab ("mm: fix vma_is_anonymous() false-positives") > then ia64 boots again. Ok, I'd love to have more information about this, but I'm assuming that Tony isn't running some odd ia64 version of Android, so there's definitely something else than just the ashmem thing going on. Either it's some odd ia64-specific special vma, or it's just something triggered on an ia64 boot that nobody else noticed or cared about. And I was just going to do the final revert and started this email to say so, when I looked at the obvious candidate: the ia64_init_addr_space() function. Trivially fixed. But as I was doing that, I also noticed another problem with the vma series: the vma_init() conversion of automatic variables is buggy. Commit 2c4541e24c55 ("mm: use vma_init() to initialize VMAs on stack and data segments") is really bad, because it never grew the memset() that was discussed, and the patch that was applied was the original one - so vma_init() only initializes a couple of fields. As a result, doing things like this: - struct vm_area_struct vma = { .vm_mm = mm }; + struct vm_area_struct vma; + vma_init(&vma, mm); is just completely wrong, because it actually initializes much less than it used to, and leaves most of the vma filled with random stack garbage. In particular, it now fills with garbage the fields that TLB flushing really can care about: things like vm_flags that says "is this perhaps an executable-only mapping that only needs to flush the ITLB?" I don't actually believe that we should do vma_init() on those on-stack vma's anyway, since they aren't "real" vma's. They are literally crafted just for TLB flushing - filling in the vm_mm (and sometimes vm_flags) pointers so that the TLB flushing knows what to do. So using "vma_init()" on them is actively detrimental as things stand right now. The reason I looked at them was that I was trying to see who actually uses "vm_area_alloc()" and "vma_init()" right now that would be affected by that commit bfd40eaff5ab ("mm: fix vma_is_anonymous() false-positives") outside of actual honest-to-goodness device file mmaps. Anyway, the upshot of all this is that I think I know what the ia64 problem was, and John sent the patch for the ashmem case, and I'm going to hold off reverting that vma_is_anonymous() false-positives commit after all. I'm still unhappy about the vma_init() ones, and I have not decided how to go with those. Either the memset() in vma_init(), or just reverting the (imho unnecessary) commit 2c4541e24c55. Kirill, Andrew, comments? Tony, can you please double-check my commit ebad825cdd4e ("ia64: mark special ia64 memory areas anonymous") fixes things for you? I didn't even compile it, but it really looks so obvious that I just committed it directly. It's not pushed out yet (I'm doing the normal full build test because of the ashmem commit first), but it should be out in about 20 minutes when my testing has finished. I'd like to get this sorted out asap, although at this point I still think that I'll have to do an rc8 even though I feel like we may have caught everything. Linus ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Linux 4.18-rc7 2018-08-01 17:15 ` Linus Torvalds @ 2018-08-01 18:31 ` Hugh Dickins 2018-08-01 20:58 ` Kirill A. Shutemov 2018-08-01 18:36 ` Luck, Tony ` (2 subsequent siblings) 3 siblings, 1 reply; 32+ messages in thread From: Hugh Dickins @ 2018-08-01 18:31 UTC (permalink / raw) To: Linus Torvalds Cc: Tony Luck, Kirill A. Shutemov, Amit Pundir, John Stultz, Hugh Dickins, Matthew Wilcox, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov, Andrea Arcangeli, Greg Kroah-Hartman, linux-mm, Linux Kernel Mailing List, youling 257, Joel Fernandes, Colin Cross On Wed, 1 Aug 2018, Linus Torvalds wrote: > > Anyway, the upshot of all this is that I think I know what the ia64 > problem was, and John sent the patch for the ashmem case, and I'm > going to hold off reverting that vma_is_anonymous() false-positives > commit after all. I'd better send deletion of zap_pmd_range()'s VM_BUG_ON_VMA(): below (but I've no proprietorial interest, if you prefer to do your own). John's patch is good, and originally I thought it was safe from that VM_BUG_ON_VMA(), because the /dev/ashmem fd exposed to the user is disconnected from the vm_file in the vma, and madvise(,,MADV_REMOVE) insists on VM_SHARED. But afterwards read John's earlier mail, drawing attention to the vfs_fallocate() in there: I may be wrong, and I don't know if Android has THP in the config anyway, but it looks to me like an unmap_mapping_range() from ashmem's vfs_fallocate() could hit precisely the VM_BUG_ON_VMA(), once it's vma_is_anonymous(). (I'm not familiar with ashmem, and I certainly don't understand the role of MAP_PRIVATE ashmem mappings - hole-punch's zap_pte_range() should end up leaving any anon pages in place; but the presence of the BUG is requiring us all to understand too much too quickly.) [PATCH] mm: delete historical BUG from zap_pmd_range() Delete the old VM_BUG_ON_VMA() from zap_pmd_range(), which asserted that mmap_sem must be held when splitting an "anonymous" vma there. Whether that's still strictly true nowadays is not entirely clear, but the danger of sometimes crashing on the BUG is now fairly clear. Signed-off-by: Hugh Dickins <hughd@google.com> --- mm/memory.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) --- 4.18-rc7/mm/memory.c 2018-06-16 18:48:22.041173422 -0700 +++ linux/mm/memory.c 2018-08-01 11:01:21.397286507 -0700 @@ -1417,11 +1417,9 @@ static inline unsigned long zap_pmd_rang do { next = pmd_addr_end(addr, end); if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) { - if (next - addr != HPAGE_PMD_SIZE) { - VM_BUG_ON_VMA(vma_is_anonymous(vma) && - !rwsem_is_locked(&tlb->mm->mmap_sem), vma); + if (next - addr != HPAGE_PMD_SIZE) __split_huge_pmd(vma, pmd, addr, false, NULL); - } else if (zap_huge_pmd(tlb, vma, pmd, addr)) + else if (zap_huge_pmd(tlb, vma, pmd, addr)) goto next; /* fall through */ } ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Linux 4.18-rc7 2018-08-01 18:31 ` Hugh Dickins @ 2018-08-01 20:58 ` Kirill A. Shutemov 2018-08-01 21:55 ` Hugh Dickins 0 siblings, 1 reply; 32+ messages in thread From: Kirill A. Shutemov @ 2018-08-01 20:58 UTC (permalink / raw) To: Hugh Dickins Cc: Linus Torvalds, Tony Luck, Amit Pundir, John Stultz, Matthew Wilcox, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov, Andrea Arcangeli, Greg Kroah-Hartman, linux-mm, Linux Kernel Mailing List, youling 257, Joel Fernandes, Colin Cross On Wed, Aug 01, 2018 at 11:31:52AM -0700, Hugh Dickins wrote: > On Wed, 1 Aug 2018, Linus Torvalds wrote: > > > > Anyway, the upshot of all this is that I think I know what the ia64 > > problem was, and John sent the patch for the ashmem case, and I'm > > going to hold off reverting that vma_is_anonymous() false-positives > > commit after all. > > I'd better send deletion of zap_pmd_range()'s VM_BUG_ON_VMA(): below > (but I've no proprietorial interest, if you prefer to do your own). Agreed. Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> > John's patch is good, and originally I thought it was safe from that > VM_BUG_ON_VMA(), because the /dev/ashmem fd exposed to the user is > disconnected from the vm_file in the vma, and madvise(,,MADV_REMOVE) > insists on VM_SHARED. But afterwards read John's earlier mail, > drawing attention to the vfs_fallocate() in there: I may be wrong, > and I don't know if Android has THP in the config anyway, but it looks > to me like an unmap_mapping_range() from ashmem's vfs_fallocate() > could hit precisely the VM_BUG_ON_VMA(), once it's vma_is_anonymous(). > > (I'm not familiar with ashmem, and I certainly don't understand the > role of MAP_PRIVATE ashmem mappings - hole-punch's zap_pte_range() > should end up leaving any anon pages in place; but the presence of > the BUG is requiring us all to understand too much too quickly.) Hugh, do you see any reason why ashmem shouldn't have vm_ops == shmem_vm_ops? I don't understand ashmem, but I feel uncomfortable that we have this sneaky way to create an anonymous VMA. It feels wrong to me. -- Kirill A. Shutemov ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Linux 4.18-rc7 2018-08-01 20:58 ` Kirill A. Shutemov @ 2018-08-01 21:55 ` Hugh Dickins 2018-08-02 19:12 ` John Stultz 0 siblings, 1 reply; 32+ messages in thread From: Hugh Dickins @ 2018-08-01 21:55 UTC (permalink / raw) To: Kirill A. Shutemov Cc: Hugh Dickins, Linus Torvalds, Tony Luck, Amit Pundir, John Stultz, Matthew Wilcox, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov, Andrea Arcangeli, Greg Kroah-Hartman, linux-mm, Linux Kernel Mailing List, youling 257, Joel Fernandes, Colin Cross On Wed, 1 Aug 2018, Kirill A. Shutemov wrote: > On Wed, Aug 01, 2018 at 11:31:52AM -0700, Hugh Dickins wrote: > > On Wed, 1 Aug 2018, Linus Torvalds wrote: > > > > > > Anyway, the upshot of all this is that I think I know what the ia64 > > > problem was, and John sent the patch for the ashmem case, and I'm > > > going to hold off reverting that vma_is_anonymous() false-positives > > > commit after all. > > > > I'd better send deletion of zap_pmd_range()'s VM_BUG_ON_VMA(): below > > (but I've no proprietorial interest, if you prefer to do your own). > > Agreed. > > Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Thanks. > > > John's patch is good, and originally I thought it was safe from that > > VM_BUG_ON_VMA(), because the /dev/ashmem fd exposed to the user is > > disconnected from the vm_file in the vma, and madvise(,,MADV_REMOVE) > > insists on VM_SHARED. But afterwards read John's earlier mail, > > drawing attention to the vfs_fallocate() in there: I may be wrong, > > and I don't know if Android has THP in the config anyway, but it looks > > to me like an unmap_mapping_range() from ashmem's vfs_fallocate() > > could hit precisely the VM_BUG_ON_VMA(), once it's vma_is_anonymous(). > > > > (I'm not familiar with ashmem, and I certainly don't understand the > > role of MAP_PRIVATE ashmem mappings - hole-punch's zap_pte_range() > > should end up leaving any anon pages in place; but the presence of > > the BUG is requiring us all to understand too much too quickly.) > > Hugh, do you see any reason why ashmem shouldn't have vm_ops == > shmem_vm_ops? I cannot immediately think of an absolute reason why not, but I'm not giving it much thought; and that might turn it into a stranger beast than it already is. > > I don't understand ashmem, but I feel uncomfortable that we have this > sneaky way to create an anonymous VMA. It feels wrong to me. I agree it's odd, but in this respect it's no odder than /dev/zero: that has exactly the same pattern of shmem_zero_setup() for VM_SHARED, else vma_set_anonymous(): which made me comfortable with John's patch, restoring the way it worked before. Admittedly, the subsequent vfs_fallocate() might generate surprises; and the business of doing a shmem_file_setup() first, and then undoing it with a shmem_zero_setup(), looks weird - from John's old XXX comment, I think it was a quick hack to piece together some functionality they needed in a hurry, which never got revisited (they wanted a name for the area? maybe memfd would be good for that now). But if what's in there is working now, I do not want to mess with it: I'd be adding bugs faster than removing them. Hugh ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Linux 4.18-rc7 2018-08-01 21:55 ` Hugh Dickins @ 2018-08-02 19:12 ` John Stultz 0 siblings, 0 replies; 32+ messages in thread From: John Stultz @ 2018-08-02 19:12 UTC (permalink / raw) To: Hugh Dickins Cc: Kirill A. Shutemov, Linus Torvalds, Tony Luck, Amit Pundir, Matthew Wilcox, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov, Andrea Arcangeli, Greg Kroah-Hartman, linux-mm, Linux Kernel Mailing List, youling 257, Joel Fernandes, Colin Cross On Wed, Aug 1, 2018 at 2:55 PM, Hugh Dickins <hughd@google.com> wrote: > On Wed, 1 Aug 2018, Kirill A. Shutemov wrote: >> On Wed, Aug 01, 2018 at 11:31:52AM -0700, Hugh Dickins wrote: >> > On Wed, 1 Aug 2018, Linus Torvalds wrote: >> > > >> > > Anyway, the upshot of all this is that I think I know what the ia64 >> > > problem was, and John sent the patch for the ashmem case, and I'm >> > > going to hold off reverting that vma_is_anonymous() false-positives >> > > commit after all. >> > >> > I'd better send deletion of zap_pmd_range()'s VM_BUG_ON_VMA(): below >> > (but I've no proprietorial interest, if you prefer to do your own). >> >> Agreed. >> >> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> > > Thanks. > >> >> > John's patch is good, and originally I thought it was safe from that >> > VM_BUG_ON_VMA(), because the /dev/ashmem fd exposed to the user is >> > disconnected from the vm_file in the vma, and madvise(,,MADV_REMOVE) >> > insists on VM_SHARED. But afterwards read John's earlier mail, >> > drawing attention to the vfs_fallocate() in there: I may be wrong, >> > and I don't know if Android has THP in the config anyway, but it looks >> > to me like an unmap_mapping_range() from ashmem's vfs_fallocate() >> > could hit precisely the VM_BUG_ON_VMA(), once it's vma_is_anonymous(). >> > >> > (I'm not familiar with ashmem, and I certainly don't understand the >> > role of MAP_PRIVATE ashmem mappings - hole-punch's zap_pte_range() >> > should end up leaving any anon pages in place; but the presence of >> > the BUG is requiring us all to understand too much too quickly.) >> >> Hugh, do you see any reason why ashmem shouldn't have vm_ops == >> shmem_vm_ops? > > I cannot immediately think of an absolute reason why not, but I'm > not giving it much thought; and that might turn it into a stranger > beast than it already is. > >> >> I don't understand ashmem, but I feel uncomfortable that we have this >> sneaky way to create an anonymous VMA. It feels wrong to me. > > I agree it's odd, but in this respect it's no odder than /dev/zero: > that has exactly the same pattern of shmem_zero_setup() for VM_SHARED, > else vma_set_anonymous(): which made me comfortable with John's patch, > restoring the way it worked before. > > Admittedly, the subsequent vfs_fallocate() might generate surprises; > and the business of doing a shmem_file_setup() first, and then undoing > it with a shmem_zero_setup(), looks weird - from John's old XXX comment, Yes, it is weird. :) > I think it was a quick hack to piece together some functionality they > needed in a hurry, which never got revisited (they wanted a name for > the area? maybe memfd would be good for that now). So my XXX comment is to do with a change I made to ashmem in order for it to go into staging, compared with the original Android implementation. They still carry the patch to undo it here: https://android.googlesource.com/kernel/common.git/+/ebfc8d6476a66dc91a1b30cbfc18e67d4401af09%5E%21/ But it has more to do w/ in the shared mapping case, providing a cleaner way of setting the vma->vm_ops = &shmem_vm_ops without having to use shmem_zero_setup(), and doesn't change the behavior in the private mapping case. This discussion has spurred Joel and I to chat a bit about reviving the effort to "properly" upstream ashmem. We're still in discussion but I'm thinking we might just try to add the pin/unpin functionality to memfd. We'll see how the discussion evolves. thanks -john ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Linux 4.18-rc7 2018-08-01 17:15 ` Linus Torvalds 2018-08-01 18:31 ` Hugh Dickins @ 2018-08-01 18:36 ` Luck, Tony 2018-08-01 20:05 ` Linus Torvalds 2018-08-02 6:59 ` Amit Pundir 3 siblings, 0 replies; 32+ messages in thread From: Luck, Tony @ 2018-08-01 18:36 UTC (permalink / raw) To: Linus Torvalds Cc: Kirill A. Shutemov, Amit Pundir, John Stultz, Hugh Dickins, Matthew Wilcox, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov, Andrea Arcangeli, Greg Kroah-Hartman, linux-mm, Linux Kernel Mailing List, youling 257, Joel Fernandes, Colin Cross On Wed, Aug 01, 2018 at 10:15:05AM -0700, Linus Torvalds wrote: > Tony, can you please double-check my commit ebad825cdd4e ("ia64: mark > special ia64 memory areas anonymous") fixes things for you? I didn't > even compile it, but it really looks so obvious that I just committed > it directly. It's not pushed out yet (I'm doing the normal full build > test because of the ashmem commit first), but it should be out in > about 20 minutes when my testing has finished. > > I'd like to get this sorted out asap, although at this point I still > think that I'll have to do an rc8 even though I feel like we may have > caught everything. I pulled and got HEAD = 44960f2a7b63e224b1091b3e1d6f60e0cdf4be0c which includes ebad825cdd4e. ia64 boots fine with no issues. Thanks -Tony ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Linux 4.18-rc7 2018-08-01 17:15 ` Linus Torvalds 2018-08-01 18:31 ` Hugh Dickins 2018-08-01 18:36 ` Luck, Tony @ 2018-08-01 20:05 ` Linus Torvalds 2018-08-01 20:51 ` Kirill A. Shutemov 2018-08-02 6:59 ` Amit Pundir 3 siblings, 1 reply; 32+ messages in thread From: Linus Torvalds @ 2018-08-01 20:05 UTC (permalink / raw) To: Tony Luck Cc: Kirill A. Shutemov, Amit Pundir, John Stultz, Hugh Dickins, Matthew Wilcox, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov, Andrea Arcangeli, Greg Kroah-Hartman, linux-mm, Linux Kernel Mailing List, youling 257, Joel Fernandes, Colin Cross [-- Attachment #1: Type: text/plain, Size: 1614 bytes --] On Wed, Aug 1, 2018 at 10:15 AM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > I'm still unhappy about the vma_init() ones, and I have not decided > how to go with those. Either the memset() in vma_init(), or just > reverting the (imho unnecessary) commit 2c4541e24c55. Kirill, Andrew, > comments? Ugh. Adding a memset looks simple, but screws up some places that have other initialization. It also requires adding a new include of <linux/string.h>, or we'd need to uninline vma_init() and put it somewhere else. But just reverting commit 2c4541e24c55 ("mm: use vma_init() to initialize VMAs on stack and data segments") entirely isn't good either, because some of the cases aren't about the TLB flush interface, and call down to "real" VM functions. The 'pseudo_vma' use of remove_inode_hugepages() and hugetlbfs_fallocate() in particular is odd, but using vma_init() looks good there. And those places had the memset() already. So I'm inclined to simply mark the TLB-related vma_init() cases special, and use something like this: #define TLB_FLUSH_VMA(mm,flags) { .vm_mm = (mm), .vm_flags = (flags) } to make it very obvious when we're doing that vma initialization for flush_tlb_range(). It's done as an initializer, exactly so that the only valid syntax is to do somethin glike this: struct vm_area_struct vma = TLB_FLUSH_VMA(mm, VM_EXEC); That leaves vma_init() users to be just the actual real allocation path, and a few very specific specual vmas (the hugetlbfs and mempolicy pseudo-vma, and a couple of "gate" vmas). Suggested patch attached. Comments? Linus [-- Attachment #2: patch.diff --] [-- Type: text/x-patch, Size: 4025 bytes --] arch/arm/mach-rpc/ecard.c | 5 +---- arch/arm64/include/asm/tlb.h | 4 +--- arch/arm64/mm/hugetlbpage.c | 10 ++++------ arch/ia64/include/asm/tlb.h | 7 +++---- include/linux/mm.h | 3 +++ 5 files changed, 12 insertions(+), 17 deletions(-) diff --git a/arch/arm/mach-rpc/ecard.c b/arch/arm/mach-rpc/ecard.c index 8db62cc54a6a..04b2f22c2739 100644 --- a/arch/arm/mach-rpc/ecard.c +++ b/arch/arm/mach-rpc/ecard.c @@ -212,7 +212,7 @@ static DEFINE_MUTEX(ecard_mutex); */ static void ecard_init_pgtables(struct mm_struct *mm) { - struct vm_area_struct vma; + struct vm_area_struct vma = TLB_FLUSH_VMA(mm, VM_EXEC); /* We want to set up the page tables for the following mapping: * Virtual Physical @@ -237,9 +237,6 @@ static void ecard_init_pgtables(struct mm_struct *mm) memcpy(dst_pgd, src_pgd, sizeof(pgd_t) * (EASI_SIZE / PGDIR_SIZE)); - vma_init(&vma, mm); - vma.vm_flags = VM_EXEC; - flush_tlb_range(&vma, IO_START, IO_START + IO_SIZE); flush_tlb_range(&vma, EASI_START, EASI_START + EASI_SIZE); } diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h index d87f2d646caa..0ad1cf233470 100644 --- a/arch/arm64/include/asm/tlb.h +++ b/arch/arm64/include/asm/tlb.h @@ -37,9 +37,7 @@ static inline void __tlb_remove_table(void *_table) static inline void tlb_flush(struct mmu_gather *tlb) { - struct vm_area_struct vma; - - vma_init(&vma, tlb->mm); + struct vm_area_struct vma = TLB_FLUSH_VMA(tlb->mm, 0); /* * The ASID allocator will either invalidate the ASID or mark diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c index 1854e49aa18a..192b3ba07075 100644 --- a/arch/arm64/mm/hugetlbpage.c +++ b/arch/arm64/mm/hugetlbpage.c @@ -108,13 +108,10 @@ static pte_t get_clear_flush(struct mm_struct *mm, unsigned long pgsize, unsigned long ncontig) { - struct vm_area_struct vma; pte_t orig_pte = huge_ptep_get(ptep); bool valid = pte_valid(orig_pte); unsigned long i, saddr = addr; - vma_init(&vma, mm); - for (i = 0; i < ncontig; i++, addr += pgsize, ptep++) { pte_t pte = ptep_get_and_clear(mm, addr, ptep); @@ -127,8 +124,10 @@ static pte_t get_clear_flush(struct mm_struct *mm, orig_pte = pte_mkdirty(orig_pte); } - if (valid) + if (valid) { + struct vm_area_struct vma = TLB_FLUSH_VMA(mm, 0); flush_tlb_range(&vma, saddr, addr); + } return orig_pte; } @@ -147,10 +146,9 @@ static void clear_flush(struct mm_struct *mm, unsigned long pgsize, unsigned long ncontig) { - struct vm_area_struct vma; + struct vm_area_struct vma = TLB_FLUSH_VMA(mm, 0); unsigned long i, saddr = addr; - vma_init(&vma, mm); for (i = 0; i < ncontig; i++, addr += pgsize, ptep++) pte_clear(mm, addr, ptep); diff --git a/arch/ia64/include/asm/tlb.h b/arch/ia64/include/asm/tlb.h index db89e7306081..516355a774bf 100644 --- a/arch/ia64/include/asm/tlb.h +++ b/arch/ia64/include/asm/tlb.h @@ -115,12 +115,11 @@ ia64_tlb_flush_mmu_tlbonly(struct mmu_gather *tlb, unsigned long start, unsigned flush_tlb_all(); } else { /* - * XXX fix me: flush_tlb_range() should take an mm pointer instead of a - * vma pointer. + * flush_tlb_range() takes a vma instead of a mm pointer because + * some architectures want the vm_flags for ITLB/DTLB flush. */ - struct vm_area_struct vma; + struct vm_area_struct vma = TLB_FLUSH_VMA(tlb->mm, 0); - vma_init(&vma, tlb->mm); /* flush the address range from the tlb: */ flush_tlb_range(&vma, start, end); /* now flush the virt. page-table area mapping the address range: */ diff --git a/include/linux/mm.h b/include/linux/mm.h index 7ba6d356d18f..68a5121694ef 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -466,6 +466,9 @@ static inline void vma_set_anonymous(struct vm_area_struct *vma) vma->vm_ops = NULL; } +/* flush_tlb_range() takes a vma, not a mm, and can care about flags */ +#define TLB_FLUSH_VMA(mm,flags) { .vm_mm = (mm), .vm_flags = (flags) } + struct mmu_gather; struct inode; ^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: Linux 4.18-rc7 2018-08-01 20:05 ` Linus Torvalds @ 2018-08-01 20:51 ` Kirill A. Shutemov 2018-08-01 20:56 ` Linus Torvalds 0 siblings, 1 reply; 32+ messages in thread From: Kirill A. Shutemov @ 2018-08-01 20:51 UTC (permalink / raw) To: Linus Torvalds Cc: Tony Luck, Amit Pundir, John Stultz, Hugh Dickins, Matthew Wilcox, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov, Andrea Arcangeli, Greg Kroah-Hartman, linux-mm, Linux Kernel Mailing List, youling 257, Joel Fernandes, Colin Cross On Wed, Aug 01, 2018 at 01:05:48PM -0700, Linus Torvalds wrote: > On Wed, Aug 1, 2018 at 10:15 AM Linus Torvalds > <torvalds@linux-foundation.org> wrote: > > > > I'm still unhappy about the vma_init() ones, and I have not decided > > how to go with those. Either the memset() in vma_init(), or just > > reverting the (imho unnecessary) commit 2c4541e24c55. Kirill, Andrew, > > comments? > > Ugh. Adding a memset looks simple, but screws up some places that have > other initialization. It also requires adding a new include of > <linux/string.h>, or we'd need to uninline vma_init() and put it > somewhere else. > > But just reverting commit 2c4541e24c55 ("mm: use vma_init() to > initialize VMAs on stack and data segments") entirely isn't good > either, because some of the cases aren't about the TLB flush > interface, and call down to "real" VM functions. The 'pseudo_vma' use > of remove_inode_hugepages() and hugetlbfs_fallocate() in particular is > odd, but using vma_init() looks good there. And those places had the > memset() already. > > So I'm inclined to simply mark the TLB-related vma_init() cases > special, and use something like this: > > #define TLB_FLUSH_VMA(mm,flags) { .vm_mm = (mm), .vm_flags = (flags) } > > to make it very obvious when we're doing that vma initialization for > flush_tlb_range(). It's done as an initializer, exactly so that the > only valid syntax is to do somethin glike this: > > struct vm_area_struct vma = TLB_FLUSH_VMA(mm, VM_EXEC); > > That leaves vma_init() users to be just the actual real allocation > path, and a few very specific specual vmas (the hugetlbfs and > mempolicy pseudo-vma, and a couple of "gate" vmas). > > Suggested patch attached. Comments? Is there a reason why we pass vma to flush_tlb_range? It's not obvious to me what information from VMA can be useful for an implementation. I see that ecard.c initialize vm_flags too, but it seems unused by flush_tlb_range. Maybe it's cleaner to have generic helper flush_tlb_range_mm() or something? In longer term we can change the interface to take mm instead of vma. -- Kirill A. Shutemov ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Linux 4.18-rc7 2018-08-01 20:51 ` Kirill A. Shutemov @ 2018-08-01 20:56 ` Linus Torvalds 2018-08-01 21:25 ` Kirill A. Shutemov 0 siblings, 1 reply; 32+ messages in thread From: Linus Torvalds @ 2018-08-01 20:56 UTC (permalink / raw) To: Kirill A. Shutemov Cc: Tony Luck, Amit Pundir, John Stultz, Hugh Dickins, Matthew Wilcox, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov, Andrea Arcangeli, Greg Kroah-Hartman, linux-mm, Linux Kernel Mailing List, youling 257, Joel Fernandes, Colin Cross On Wed, Aug 1, 2018 at 1:52 PM Kirill A. Shutemov <kirill@shutemov.name> wrote: > > Is there a reason why we pass vma to flush_tlb_range? Yes. It's even in that patch. The fact is, real MM users *have* a vma, and passing it in to the TLB flushing is the right thing to do. That allows architectures that care (mainly powerpc, I think) to notice that "hey, this range only had execute permissions, so I only need to flush the ITLB". The people who use tlb_flush_range() any other way are doing an arch-specific hack. It's not how tlb_flush_range() was defined, and it's not how you can use it in general. > It's not obvious to me what information from VMA can be useful for an > implementation. See the patch I sent, which had this as part of it: - * XXX fix me: flush_tlb_range() should take an mm pointer instead of a - * vma pointer. + * flush_tlb_range() takes a vma instead of a mm pointer because + * some architectures want the vm_flags for ITLB/DTLB flush. because I wanted to educate people about why the interface was what it was, and the "fixme" was bogus shit. > In longer term we can change the interface to take mm instead of vma. FUCK NO! Goddammit, read the code, or read the patch. The places ytou added those broken vma_init() calls to were architecture-specific hacks. Those architecture-specific hacks do not get to screw up the design for everybody else. Linus ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Linux 4.18-rc7 2018-08-01 20:56 ` Linus Torvalds @ 2018-08-01 21:25 ` Kirill A. Shutemov 0 siblings, 0 replies; 32+ messages in thread From: Kirill A. Shutemov @ 2018-08-01 21:25 UTC (permalink / raw) To: Linus Torvalds Cc: Tony Luck, Amit Pundir, John Stultz, Hugh Dickins, Matthew Wilcox, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov, Andrea Arcangeli, Greg Kroah-Hartman, linux-mm, Linux Kernel Mailing List, youling 257, Joel Fernandes, Colin Cross On Wed, Aug 01, 2018 at 01:56:19PM -0700, Linus Torvalds wrote: > On Wed, Aug 1, 2018 at 1:52 PM Kirill A. Shutemov <kirill@shutemov.name> wrote: > > > > Is there a reason why we pass vma to flush_tlb_range? > > Yes. It's even in that patch. > > The fact is, real MM users *have* a vma, and passing it in to the TLB > flushing is the right thing to do. That allows architectures that care > (mainly powerpc, I think) to notice that "hey, this range only had > execute permissions, so I only need to flush the ITLB". > > The people who use tlb_flush_range() any other way are doing an > arch-specific hack. It's not how tlb_flush_range() was defined, and > it's not how you can use it in general. Okay, I see. ARM, unicore32 and xtensa avoid iTLB flush for non-executable VMAs. > > > It's not obvious to me what information from VMA can be useful for an > > implementation. > > See the patch I sent, which had this as part of it: > > - * XXX fix me: flush_tlb_range() should take an mm > pointer instead of a > - * vma pointer. > + * flush_tlb_range() takes a vma instead of a mm pointer because > + * some architectures want the vm_flags for ITLB/DTLB flush. > > because I wanted to educate people about why the interface was what it > was, and the "fixme" was bogus shit. I didn't noticied this. Sorry. -- Kirill A. Shutemov ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Linux 4.18-rc7 2018-08-01 17:15 ` Linus Torvalds ` (2 preceding siblings ...) 2018-08-01 20:05 ` Linus Torvalds @ 2018-08-02 6:59 ` Amit Pundir 3 siblings, 0 replies; 32+ messages in thread From: Amit Pundir @ 2018-08-02 6:59 UTC (permalink / raw) To: Linus Torvalds Cc: tony.luck, kirill, John Stultz, Hugh Dickins, willy, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov, aarcange, Greg Kroah-Hartman, linux-mm, lkml, youling 257, Joel Fernandes, Colin Cross On Wed, 1 Aug 2018 at 22:45, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > I'd like to get this sorted out asap, although at this point I still > think that I'll have to do an rc8 even though I feel like we may have > caught everything. No AOSP regressions in my limited smoke testing so far with current HEAD: 6b4703768268 ("Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm"). Thanks. Regards, Amit Pundir ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH] staging: ashmem: Fix SIGBUS crash when traversing mmaped ashmem pages 2018-07-31 16:29 ` Linus Torvalds 2018-07-31 16:56 ` John Stultz 2018-07-31 17:03 ` Kirill A. Shutemov @ 2018-07-31 17:17 ` John Stultz 2 siblings, 0 replies; 32+ messages in thread From: John Stultz @ 2018-07-31 17:17 UTC (permalink / raw) To: Linux Kernel Mailing List Cc: John Stultz, Amit Pundir, Kirill A. Shutemov, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov, aarcange, Linus Torvalds, Greg Kroah-Hartman, Hugh Dickins, Joel Fernandes, Colin Cross, Matthew Wilcox, linux-mm, youling 257 Amit Pundir and Youling in parallel reported crashes with recent mainline kernels running Android: F DEBUG : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** F DEBUG : Build fingerprint: 'Android/db410c32_only/db410c32_only:Q/OC-MR1/102:userdebug/test-key F DEBUG : Revision: '0' F DEBUG : ABI: 'arm' F DEBUG : pid: 2261, tid: 2261, name: zygote >>> zygote <<< F DEBUG : signal 7 (SIGBUS), code 2 (BUS_ADRERR), fault addr 0xec00008 ... <snip> ... F DEBUG : backtrace: F DEBUG : #00 pc 00001c04 /system/lib/libc.so (memset+48) F DEBUG : #01 pc 0010c513 /system/lib/libart.so (create_mspace_with_base+82) F DEBUG : #02 pc 0015c601 /system/lib/libart.so (art::gc::space::DlMallocSpace::CreateMspace(void*, unsigned int, unsigned int)+40) F DEBUG : #03 pc 0015c3ed /system/lib/libart.so (art::gc::space::DlMallocSpace::CreateFromMemMap(art::MemMap*, std::__1::basic_string<char, std::__ 1::char_traits<char>, std::__1::allocator<char>> const&, unsigned int, unsigned int, unsigned int, unsigned int, bool)+36) ... This was bisected back to commit bfd40eaff5ab ("mm: fix vma_is_anonymous() false-positives"). create_mspace_with_base() in the trace above, utilizes ashmem, and with ashmem, for shared mappings we use shmem_zero_setup(), which sets the vma->vm_ops to &shmem_vm_ops. But for private ashmem mappings nothing sets the vma->vm_ops. Looking at the problematic patch, it seems to add a requirement that one call vma_set_anonymous() on a vma, otherwise the dummy_vm_ops will be used. Using the dummy_vm_ops seem to triggger SIGBUS when traversing unmapped pages. Thus, this patch adds a call to vma_set_anonymous() for ashmem private mappings and seems to avoid the reported problem. Cc: Amit Pundir <amit.pundir@linaro.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: aarcange@redhat.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Colin Cross <ccross@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: linux-mm@kvack.org Cc: youling 257 <youling257@gmail.com> Fixes: bfd40eaff5ab ("mm: fix vma_is_anonymous() false-positives") Reported-by: Amit Pundir <amit.pundir@linaro.org> Reported-by: Youling 257 <youling257@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org> --- Hopefully my explanation make sense here. Please let me know if it needs corrections. thanks -john --- drivers/staging/android/ashmem.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/staging/android/ashmem.c b/drivers/staging/android/ashmem.c index a1a0025..d5d33e1 100644 --- a/drivers/staging/android/ashmem.c +++ b/drivers/staging/android/ashmem.c @@ -402,6 +402,8 @@ static int ashmem_mmap(struct file *file, struct vm_area_struct *vma) fput(asma->file); goto out; } + } else { + vma_set_anonymous(vma); } if (vma->vm_file) -- 2.7.4 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: Linux 4.18-rc7 2018-07-31 1:01 ` Linus Torvalds 2018-07-31 3:26 ` Hugh Dickins @ 2018-07-31 6:29 ` Kirill A. Shutemov 2018-07-31 14:57 ` Kirill A. Shutemov 1 sibling, 1 reply; 32+ messages in thread From: Kirill A. Shutemov @ 2018-07-31 6:29 UTC (permalink / raw) To: Linus Torvalds Cc: Hugh Dickins, Matthew Wilcox, Amit Pundir, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov, Andrea Arcangeli, Greg Kroah-Hartman, John Stultz, linux-mm, Linux Kernel Mailing List, youling257 On Mon, Jul 30, 2018 at 06:01:26PM -0700, Linus Torvalds wrote: > On Mon, Jul 30, 2018 at 2:53 PM Hugh Dickins <hughd@google.com> wrote: > > > > I have no problem with reverting -rc7's vma_is_anonymous() series. > > I don't think we need to revert the whole series: I think the rest are > all fairly obvious cleanups, and shouldn't really have any semantic > changes. > > It's literally only that last patch in the series that then changes > that meaning of "vm_ops". And I don't really _mind_ that last step > either, but since we don't know exactly what it was that it broke, and > we're past rc7, I don't think we really have any option but the revert > it. > > And if we revert it, I think we need to just remove the > VM_BUG_ON_VMA() that it was supposed to fix. Because I do think that > it is quite likely that the real bug is that overzealous BUG_ON(), > since I can't see any reason why anonymous mappings should be special > there. > > But I'm certainly also ok with re-visiting that commit later. I just > think that right _now_ the above is my preferred plan. > > Kirill? Considering the timing, I'm okay with reverting the last patch with dropping the VM_BUG_ON_VMA(). But in the end I would like to see strong vma_is_anonymous(). The VM_BUG_ON_VMA() is only triggerable by the test case because vma_is_anonymous() false-positive in fault path and we get anon-THP allocated in file-private mapping. I don't see immediately how this may trigger other crashes. But it definitely looks wrong. > > I'm all for deleting that VM_BUG_ON_VMA() in zap_pmd_range(), it was > > just a compromise with those who wanted to keep something there; > > I don't think we even need a WARN_ON_ONCE() now. > > So to me it looks like a historical check that simply doesn't > "normally" trigger, but there's no reason I can see why we should care > about the case it tests against. I'll think more on what could go wrong with __split_huge_pmd() called on anon-THP page without mmap_sem(). It's not yet clear cut to me. -- Kirill A. Shutemov ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Linux 4.18-rc7 2018-07-31 6:29 ` Linux 4.18-rc7 Kirill A. Shutemov @ 2018-07-31 14:57 ` Kirill A. Shutemov 2018-08-01 0:09 ` Hugh Dickins 0 siblings, 1 reply; 32+ messages in thread From: Kirill A. Shutemov @ 2018-07-31 14:57 UTC (permalink / raw) To: Linus Torvalds Cc: Hugh Dickins, Matthew Wilcox, Amit Pundir, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov, Andrea Arcangeli, Greg Kroah-Hartman, John Stultz, linux-mm, Linux Kernel Mailing List, youling257 On Tue, Jul 31, 2018 at 09:29:27AM +0300, Kirill A. Shutemov wrote: > On Mon, Jul 30, 2018 at 06:01:26PM -0700, Linus Torvalds wrote: > > On Mon, Jul 30, 2018 at 2:53 PM Hugh Dickins <hughd@google.com> wrote: > > > > > > I have no problem with reverting -rc7's vma_is_anonymous() series. > > > > I don't think we need to revert the whole series: I think the rest are > > all fairly obvious cleanups, and shouldn't really have any semantic > > changes. > > > > It's literally only that last patch in the series that then changes > > that meaning of "vm_ops". And I don't really _mind_ that last step > > either, but since we don't know exactly what it was that it broke, and > > we're past rc7, I don't think we really have any option but the revert > > it. > > > > And if we revert it, I think we need to just remove the > > VM_BUG_ON_VMA() that it was supposed to fix. Because I do think that > > it is quite likely that the real bug is that overzealous BUG_ON(), > > since I can't see any reason why anonymous mappings should be special > > there. > > > > But I'm certainly also ok with re-visiting that commit later. I just > > think that right _now_ the above is my preferred plan. > > > > Kirill? > > Considering the timing, I'm okay with reverting the last patch with > dropping the VM_BUG_ON_VMA(). > > But in the end I would like to see strong vma_is_anonymous(). > > The VM_BUG_ON_VMA() is only triggerable by the test case because > vma_is_anonymous() false-positive in fault path and we get anon-THP > allocated in file-private mapping. > > I don't see immediately how this may trigger other crashes. > But it definitely looks wrong. > > > > I'm all for deleting that VM_BUG_ON_VMA() in zap_pmd_range(), it was > > > just a compromise with those who wanted to keep something there; > > > I don't think we even need a WARN_ON_ONCE() now. > > > > So to me it looks like a historical check that simply doesn't > > "normally" trigger, but there's no reason I can see why we should care > > about the case it tests against. > > I'll think more on what could go wrong with __split_huge_pmd() called on > anon-THP page without mmap_sem(). It's not yet clear cut to me. I think not having mmap_sem taken at least on read when we call __split_huge_pmd() opens possiblity of race with khugepaged: khugepaged can collapse the page back to THP as soon as we drop ptl. As result pmd_none_or_trans_huge_or_clear_bad() would return true and we basically leave the THP behind, not zapped. -- Kirill A. Shutemov ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Linux 4.18-rc7 2018-07-31 14:57 ` Kirill A. Shutemov @ 2018-08-01 0:09 ` Hugh Dickins 0 siblings, 0 replies; 32+ messages in thread From: Hugh Dickins @ 2018-08-01 0:09 UTC (permalink / raw) To: Kirill A. Shutemov Cc: Linus Torvalds, Hugh Dickins, Matthew Wilcox, Amit Pundir, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov, Andrea Arcangeli, Greg Kroah-Hartman, John Stultz, linux-mm, Linux Kernel Mailing List, youling257 On Tue, 31 Jul 2018, Kirill A. Shutemov wrote: > On Tue, Jul 31, 2018 at 09:29:27AM +0300, Kirill A. Shutemov wrote: > > On Mon, Jul 30, 2018 at 06:01:26PM -0700, Linus Torvalds wrote: > > > > > > So to me it looks like a historical check that simply doesn't > > > "normally" trigger, but there's no reason I can see why we should care > > > about the case it tests against. > > > > I'll think more on what could go wrong with __split_huge_pmd() called on > > anon-THP page without mmap_sem(). It's not yet clear cut to me. > > I think not having mmap_sem taken at least on read when we call > __split_huge_pmd() opens possiblity of race with khugepaged: > khugepaged can collapse the page back to THP as soon as we drop ptl. > As result pmd_none_or_trans_huge_or_clear_bad() would return true and we > basically leave the THP behind, not zapped. I think we don't care deeply about the POSIX truncate semantics on the kind of "file" that has managed to get to this point: in the unlikely event that a THP is immediately recreated there, never mind, so long as we don't crash or leak memory or suchlike (the surplus THP would get freed at exit). I think we're altogether better off just deleting that VM_BUG_ON_VMA(); but I do find it very very hard to arrive at a firm conclusion on the absolute safety of splitting a pmd without mmap_sem there (though any problem unlikely even if real, and more likely a figment of my paranoia). I believe the VM_BUG_ON is a relic from the old days, when anon_vma_lock played a big part in guarding the pmd+page split: remember how mmap_sem is one of the ways you can guarantee that the anon_vma will not vanish beneath you (page_get_anon_vma was added later than anon THP). I'm a little more worried by the nearby zap_huge_pmd() (which could never be covered by a suitable VM_BUG_ON): the way that frees a previously deposited page table, and you have no guarantee of when and where that page table was last used. Again I can't point to an actual problem, just the recollection that it's been found subtly safe in the past, but any change in the conditions might affect that. And a little worried to see how split_huge_page_to_list() uses anon_vma_lock on PageAnon versus i_mmap_lock on !PageAnon: which makes complete sense in itself, but won't protect against a PageAnon THP being concurrently split from the truncate_pagecache() direction, where unmap_mapping_range() uses i_mmap_lock. (simple_setattr() the default setattr: that's a bit of a worry too.) I feel I'm moaning and crying at shadows, rather than providing any useful suggestions or patches; but thought I ought to report back. Hugh ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Linux 4.18-rc7 2018-07-29 22:09 Linux 4.18-rc7 Linus Torvalds 2018-07-30 6:47 ` Amit Pundir @ 2018-07-30 13:24 ` Guenter Roeck 1 sibling, 0 replies; 32+ messages in thread From: Guenter Roeck @ 2018-07-30 13:24 UTC (permalink / raw) To: Linus Torvalds; +Cc: Linux Kernel Mailing List On Sun, Jul 29, 2018 at 03:09:05PM -0700, Linus Torvalds wrote: > So unless something odd happens, this should be the last rc for 4.18. > > Nothing particularly odd happened this last week - we got the usual > random set of various minor fixes all over. About two thirds of it is > drivers - networking, staging and usb stands out, but there's a little > bit of stuff all over (clk, block, gpu, nvme..). > > Outside of drivers, the bulk is some core networking stuff, with > random changes elsewhere (minor arch updates, filesystems, core > kernel, test scripts). > Build results: total: 134 pass: 134 fail: 0 Qemu test results: total: 183 pass: 183 fail: 0 Fetails are available at http://kerneltests.org/builders/. Guenter ^ permalink raw reply [flat|nested] 32+ messages in thread
end of thread, other threads:[~2018-08-02 19:12 UTC | newest] Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-07-29 22:09 Linux 4.18-rc7 Linus Torvalds 2018-07-30 6:47 ` Amit Pundir 2018-07-30 13:01 ` Kirill A. Shutemov 2018-07-30 13:34 ` Amit Pundir 2018-07-30 17:32 ` Linus Torvalds 2018-07-30 21:53 ` Hugh Dickins 2018-07-31 1:01 ` Linus Torvalds 2018-07-31 3:26 ` Hugh Dickins 2018-07-31 4:25 ` John Stultz 2018-07-31 6:40 ` Amit Pundir 2018-07-31 6:56 ` Kirill A. Shutemov 2018-07-31 16:29 ` Linus Torvalds 2018-07-31 16:56 ` John Stultz 2018-07-31 17:03 ` Kirill A. Shutemov 2018-07-31 17:43 ` Luck, Tony 2018-07-31 19:02 ` Linus Torvalds 2018-08-01 17:15 ` Linus Torvalds 2018-08-01 18:31 ` Hugh Dickins 2018-08-01 20:58 ` Kirill A. Shutemov 2018-08-01 21:55 ` Hugh Dickins 2018-08-02 19:12 ` John Stultz 2018-08-01 18:36 ` Luck, Tony 2018-08-01 20:05 ` Linus Torvalds 2018-08-01 20:51 ` Kirill A. Shutemov 2018-08-01 20:56 ` Linus Torvalds 2018-08-01 21:25 ` Kirill A. Shutemov 2018-08-02 6:59 ` Amit Pundir 2018-07-31 17:17 ` [PATCH] staging: ashmem: Fix SIGBUS crash when traversing mmaped ashmem pages John Stultz 2018-07-31 6:29 ` Linux 4.18-rc7 Kirill A. Shutemov 2018-07-31 14:57 ` Kirill A. Shutemov 2018-08-01 0:09 ` Hugh Dickins 2018-07-30 13:24 ` Guenter Roeck
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).