linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Linux 4.18-rc7
@ 2018-07-29 22:09 Linus Torvalds
  2018-07-30  6:47 ` Amit Pundir
  2018-07-30 13:24 ` Guenter Roeck
  0 siblings, 2 replies; 32+ messages in thread
From: Linus Torvalds @ 2018-07-29 22:09 UTC (permalink / raw)
  To: Linux Kernel Mailing List

So unless something odd happens, this should be the last rc for 4.18.

Nothing particularly odd happened this last week - we got the usual
random set of various minor fixes all over. About two thirds of it is
drivers - networking, staging and usb stands out, but there's a little
bit of stuff all over (clk, block, gpu, nvme..).

Outside of drivers, the bulk is some core networking stuff, with
random changes elsewhere (minor arch updates, filesystems, core
kernel, test scripts).

The appended shortlog gives a flavor of the details.

                  Linus

---

Adam Thomson (1):
      usb: typec: tcpm: Fix sink PDO starting index for PPS APDO selection

Aleksander Morgado (1):
      qmi_wwan: fix interface number for DW5821e production firmware

Alexander Sverdlin (1):
      i2c: davinci: Avoid zero value of CLKH

Amar Singhal (1):
      cfg80211: never ignore user regulatory hint

Anssi Hannula (7):
      can: xilinx_can: fix device dropping off bus on RX overrun
      can: xilinx_can: fix RX loop if RXNEMP is asserted without RXOK
      can: xilinx_can: fix recovery from error states not being propagated
      can: xilinx_can: keep only 1-2 frames in TX FIFO to fix TX accounting
      can: xilinx_can: fix RX overflow interrupt not being enabled
      can: xilinx_can: fix incorrect clear of non-processed interrupts
      can: xilinx_can: fix power management handling

Antti Seppälä (2):
      usb: dwc2: Fix DMA alignment to start at allocated boundary
      usb: dwc2: Fix inefficient copy of unaligned buffers

Ariel Levkovich (1):
      net/mlx5: Adjust clock overflow work period

Arnd Bergmann (2):
      kasan: only select SLUB_DEBUG with SYSFS=y
      include/linux/eventfd.h: include linux/errno.h

Artem Savkov (1):
      tracing/kprobes: Fix trace_probe flags on enable_trace_kprobe() failure

Benjamin Herrenschmidt (4):
      clk: aspeed: Treat a gate in reset as disabled
      usb: gadget: aspeed-vhub: Fix SETUP packets with OUT data phase
      usb: gadget: Fix OS descriptors support
      usb: gadget: aspeed: Workaround memory ordering issue

Bernd Edlinger (1):
      nl80211: Add a missing break in parse_station_flags

Bin Liu (1):
      usb: core: handle hub C_PORT_OVER_CURRENT condition

Carlos Maiolino (1):
      xfs: Initialize variables in xfs_alloc_get_rec before using them

Chen-Yu Tsai (1):
      Input: i8042 - add Lenovo LaVie Z to the i8042 reset list

Clint Taylor (1):
      drm/i915/glk: Add Quirk for GLK NUC HDMI port issues.

Colin Ian King (1):
      usb: dwc3: gadget: remove redundant variable maxpacket

Dan Williams (1):
      mm: fix exports that inadvertently make put_page() EXPORT_SYMBOL_GPL

Daniel Borkmann (3):
      bpf, ppc64: fix unexpected r0=0 exit path inside bpf_xadd
      bpf: test case to check whether src/dst regs got mangled by xadd
      sock: fix sg page frag coalescing in sk_alloc_sg

Dave Jiang (1):
      mm: disallow mappings that conflict for devm_memremap_pages()

David Ahern (1):
      net/ipv6: Fix linklocal to global address with VRF

Davidlohr Bueso (1):
      ipc/sem.c: prevent queue.status tearing in semop

Denis Kenzior (1):
      nl80211/mac80211: allow non-linear skb in rx_control_port

Dirk Mueller (1):
      arm64: Check for errata before evaluating cpu features

Dmitry Torokhov (1):
      usb: dwc2: host: do not delay retries for CONTROL IN transfers

Donald Shanty III (1):
      Input: elan_i2c - add ACPI ID for lenovo ideapad 330

Doron Roberts-Kedes (1):
      tls: check RCV_SHUTDOWN in tls_wait_data

Enric Balletbo i Serra (1):
      usb: dwc3: rockchip: Fix PHY documentation links.

Eran Ben Elisha (2):
      net/mlx5e: Fix quota counting in aRFS expire flow
      net/mlx5e: Don't allow aRFS for encapsulated packets

Eric Dumazet (6):
      net: skb_segment() should not return NULL
      tcp: free batches of packets in tcp_prune_ofo_queue()
      tcp: avoid collapses in tcp_prune_queue() if possible
      tcp: detect malicious patterns in tcp_collapse_ofo_queue()
      tcp: call tcp_drop() from tcp_data_queue_ofo()
      tcp: add tcp_ooo_try_coalesce() helper

Eric Sandeen (1):
      xfs: properly handle free inodes in extent hint validators

Esben Haabendal (1):
      i2c: imx: Fix reinit_completion() use

Eugeniu Rosca (3):
      usb: gadget: f_uac2: fix error handling in afunc_bind (again)
      usb: gadget: u_audio: fix pcm/card naming in g_audio_setup()
      usb: gadget: f_uac2: fix endianness of 'struct cntrl_*_lay3'

Fabio Estevam (1):
      usb: chipidea: Always build ULPI code

Faiz Abbas (2):
      can: m_can: Fix runtime resume call
      can: m_can: Move accessing of message ram to after clocks are enabled

Felix Fietkau (1):
      MIPS: ath79: fix register address in ath79_ddr_wb_flush()

Florian Westphal (7):
      netfilter: nf_tables: use dev->name directly
      netfilter: nf_tables: free flow table struct too
      netfilter: nf_tables: fix memory leaks on chain rename
      netfilter: nf_tables: don't allow to rename to already-pending name
      netfilter: conntrack: dccp: treat SYNC/SYNCACK as invalid if no
prior state
      atl1c: reserve min skb headroom
      netfilter: nf_tables: move dumper state allocation into ->start

Geert Uytterhoeven (2):
      MAINTAINERS: Add file patterns for serio device tree bindings
      clk: Really show symbolic clock flags in debugfs

Greg Edwards (1):
      block: reset bi_iter.bi_done after splitting bio

Gregory CLEMENT (1):
      clk: mvebu: armada-37xx-periph: Fix switching CPU rate from
300Mhz to 1.2GHz

Guenter Roeck (1):
      media: staging: omap4iss: Include asm/cacheflush.h after generic includes

Hangbin Liu (1):
      multicast: do not restore deleted record source filter mode to new one

Hannes Reinecke (2):
      nvmet: fixup crash on NULL device path
      nvmet: only check for filebacking on -ENOTBLK

Hans de Goede (1):
      Revert "staging:r8188eu: Use lib80211 to support TKIP"

Heiner Kallweit (2):
      net: phy: consider PHY_IGNORE_INTERRUPT in phy_start_aneg_priv
      r8169: restore previous behavior to accept BIOS WoL settings

Jack Morgenstein (1):
      net/mlx4_core: Save the qpn from the input modifier in RST2INIT wrapper

Jaedon Shin (1):
      phy: phy-brcm-usb-init: Fix power down USB 3.0 PHY when XHCI reenabled

James Smart (2):
      nvmet-fc: fix target sgl list on large transfers
      nvme: if_ready checks to fail io to deleting controller

Jarod Wilson (1):
      bonding: set default miimon value for non-arp modes if not set

Jerome Brunet (1):
      clk: meson: audio-divider is one based

Jerry Zhang (1):
      usb: gadget: f_fs: Only return delayed status when len is 0

Jia-Ju Bai (2):
      usb: gadget: r8a66597: Fix two possible sleep-in-atomic-context
bugs in init_controller()
      usb: gadget: r8a66597: Fix a possible sleep-in-atomic-context
bugs in r8a66597_queue()

Joel Stanley (2):
      clk: aspeed: Mark bclk (PCIe) and dclk (VGA) as critical
      clk: aspeed: Support HPLL strapping on ast2400

Johannes Weiner (1):
      arm64: fix vmemmap BUILD_BUG_ON() triggering on !vmemmap setups

John Hurley (1):
      nfp: flower: ensure dead neighbour entries are not offloaded

John Keeping (1):
      usb: dwc2: avoid NULL dereferences

Josef Bacik (2):
      nbd: don't requeue the same request twice.
      nbd: handle unexpected replies better

Joshua Frkuska (1):
      usb: gadget: u_audio: update hw_ptr in iso_complete after data copied

KT Liao (1):
      Input: elan_i2c - add another ACPI ID for Lenovo Ideapad 330-15AST

Keith Busch (2):
      blk-mq: export setting request completion state
      scsi: set timed out out mq requests to complete

Kiran Kumar Modukuri (5):
      fscache: Allow cancelled operations to be enqueued
      cachefiles: Fix refcounting bug in backing-file read monitoring
      fscache: Fix reference overput in fscache_attach_object() error handling
      cachefiles: Fix missing clear of the CACHEFILES_OBJECT_ACTIVE flag
      cachefiles: Wait rather than BUG'ing on "Unexpected object collision"

Kirill A. Shutemov (3):
      mm: introduce vma_init()
      mm: use vma_init() to initialize VMAs on stack and data segments
      mm: fix vma_is_anonymous() false-positives

Li Wang (1):
      zswap: re-check zswap_is_full() after do zswap_shrink()

Linus Torvalds (2):
      squashfs: be more careful about metadata corruption
      Linux 4.18-rc7

Linus Walleij (1):
      gpio: of: Handle fixed regulator flags properly

Lubomir Rintel (1):
      usb: cdc_acm: Add quirk for Castles VEGA3000

Lucas Stach (2):
      drm/imx: imx-ldb: disable LDB on driver bind
      drm/imx: imx-ldb: check if channel is enabled before printing warning

Martin KaFai Lau (2):
      bpf: btf: Clean up BTF_INT_BITS() in uapi btf.h
      bpf: Use option "help" in the llvm-objcopy test

Martin Schwidefsky (1):
      s390: disable gcc plugins

Martin Wilck (3):
      block: bio_iov_iter_get_pages: fix size of last iovec
      blkdev: __blkdev_direct_IO_simple: fix leak in error case
      block: bio_iov_iter_get_pages: pin more pages for multi-segment IOs

Masahiro Yamada (1):
      gpio: uniphier: set legitimate irq trigger type in .to_irq hook

Masami Hiramatsu (2):
      ring_buffer: tracing: Inherit the tracing setting to next ring buffer
      selftests/ftrace: Add snapshot and tracing_on test case

Neil Armstrong (1):
      clk: meson-gxbb: set fclk_div2 as CLK_IS_CRITICAL

Nicholas Mc Guire (1):
      can: mpc5xxx_can: check of_iomap return before use

Paolo Abeni (1):
      ip: hash fragments consistently

Paul Burton (1):
      MIPS: Fix off-by-one in pci_resource_to_user()

Peter Senna Tschudin (1):
      tools: usb: ffs-test: Fix build on big endian systems

Raed Salem (1):
      net/mlx5: Fix 'DON'T_TRAP' functionality

Rafael J. Wysocki (1):
      driver core: Partially revert "driver core: correct device's
shutdown order"

Rafał Miłecki (1):
      Revert "MIPS: BCM47XX: Enable 74K Core ExternalSync for PCIe erratum"

Randy Dunlap (2):
      usb/phy: fix PPC64 build errors in phy-fsl-usb.c
      net: prevent ISA drivers from building on PPC32

Roi Dayan (1):
      net/mlx5e: Only allow offloading decap egress (egdev) flows

Roman Fietze (1):
      can: m_can.c: fix setup of CCCR register: clear CCCR NISO bit
before checking can.ctrlmode

Roopa Prabhu (4):
      rtnetlink: add rtnl_link_state check in rtnl_configure_link
      vxlan: add new fdb alloc and create helpers
      vxlan: make netlink notify in vxlan_fdb_destroy optional
      vxlan: fix default fdb entry netlink notify ordering during netdev create

Saeed Mahameed (1):
      net/mlx5: E-Switch, UBSAN fix undefined behavior in mlx5_eswitch_mode

Samuel Thibault (1):
      staging: speakup: fix wraparound in uaccess length check

Schmauss, Erik (1):
      ACPICA: AML Parser: ignore dispatcher error status during table load

Sergio Paracuellos (1):
      staging: ks7010: call 'hostif_mib_set_request_int' instead of
'hostif_mib_set_request_bool'

Shakeel Butt (1):
      kvm, mm: account shadow page tables to kmemcg

Shay Agroskin (1):
      net/mlx5e: Refine ets validation function

Shubhrajyoti Datta (1):
      net: axienet: Fix double deregister of mdio

Snild Dolkow (1):
      kthread, tracing: Don't expose half-written comm when creating kthreads

Stephane Grosjean (1):
      can: peak_canfd: fix firmware < v3.3.0: limit allocation to
32-bit DMA addr only

Steve Longerbeam (1):
      gpu: ipu-csi: Check for field type alternate

Steven Rostedt (VMware) (3):
      tracing: Fix double free of event_trigger_data
      tracing: Fix possible double free in event_enable_trigger_func()
      tracing: Quiet gcc warning about maybe unused link variable

Sudarsana Reddy Kalluru (4):
      qed: Fix link flap issue due to mismatching EEE capabilities.
      qed: Fix possible race for the link state value.
      qed: Correct Multicast API to reflect existence of 256
approximate buckets.
      bnx2x: Fix invalid memory access in rss hash config path.

Taehee Yoo (3):
      netfilter: nf_tables: fix jumpstack depth validation
      netfilter: nft_set_hash: add rcu_barrier() in the nft_rhash_destroy()
      netfilter: nft_set_rbtree: fix panic when destroying set by GC

Taeung Song (1):
      tools/bpftool: Fix segfault case regarding 'pin' arguments

Tariq Toukan (2):
      net/mlx5: Fix QP fragmented buffer allocation
      net/page_pool: Fix inconsistent lock state warning

Tejun Heo (1):
      delayacct: fix crash in delayacct_blkio_end() after delayacct init failure

Theodore Ts'o (6):
      ext4: fix false negatives *and* false positives in
ext4_check_descriptors()
      ext4: clear mmp sequence number when remounting read-only
      ext4: fix inline data updates with checksums enabled
      ext4: check for allocation block validity with block group locked
      random: mix rdrand with entropy sent in from userspace
      ext4: fix check to prevent initializing reserved inodes

Thomas Tai (1):
      PCI/AER: Work around use-after-free in pcie_do_fatal_recovery()

Tony Lindgren (1):
      phy: mapphone-mdm6600: Fix wrong enum used for status lines

Uwe Kleine-König (1):
      net: dsa: mv88e6xxx: fix races between lock and irq freeing

Vinod Koul (1):
      clk: qcom: gcc-msm8996: Disable halt check on UFS tx clock

Vivek Gautam (1):
      clk/mmcc-msm8996: Make mmagic_bimc_gdsc ALWAYS_ON

Vladimir Zapolskiy (3):
      usb: gadget: u_audio: remove caching of stream buffer parameters
      usb: gadget: u_audio: remove cached period bytes value
      usb: gadget: u_audio: protect stream runtime fields with stream spinlock

Wei Wang (1):
      ipv6: use fib6_info_hold_safe() when necessary

Willem de Bruijn (1):
      ip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pull

Wolfram Sang (2):
      i2c: rcar: handle RXDMA HW behaviour on Gen3
      i2c: imx: use open drain for recovery GPIO

Yuchung Cheng (3):
      tcp: helpers to send special DCTCP ack
      tcp: do not cancel delay-AcK on DCTCP special ACK
      tcp: do not delay ACK in DCTCP upon CE status change

YueHaibing (3):
      net: caif: Add a missing rcu_read_unlock() in caif_flow_cb
      bpfilter: Fix mismatch in function argument types
      cpufreq: qcom-kryo: add NULL entry to the end of_device_id array

Zhao Chen (1):
      net-next/hinic: fix a problem in hinic_xmit_frame()

Zheng Xiaowei (1):
      usb: xhci: Fix memory leak in xhci_endpoint_reset()

mpubbise@codeaurora.org (1):
      mac80211: add stations tied to AP_VLANs during hw reconfig

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-07-29 22:09 Linux 4.18-rc7 Linus Torvalds
@ 2018-07-30  6:47 ` Amit Pundir
  2018-07-30 13:01   ` Kirill A. Shutemov
  2018-07-30 13:24 ` Guenter Roeck
  1 sibling, 1 reply; 32+ messages in thread
From: Amit Pundir @ 2018-07-30  6:47 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov,
	aarcange, Linus Torvalds
  Cc: Greg Kroah-Hartman, John Stultz, linux-mm, lkml

On Mon, 30 Jul 2018 at 03:39, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So unless something odd happens, this should be the last rc for 4.18.
>
> Nothing particularly odd happened this last week - we got the usual
> random set of various minor fixes all over. About two thirds of it is
> drivers - networking, staging and usb stands out, but there's a little
> bit of stuff all over (clk, block, gpu, nvme..).
>
> Outside of drivers, the bulk is some core networking stuff, with
> random changes elsewhere (minor arch updates, filesystems, core
> kernel, test scripts).
>
> The appended shortlog gives a flavor of the details.
>
>                   Linus
>
> ---
> Kirill A. Shutemov (3):
>       mm: introduce vma_init()
>       mm: use vma_init() to initialize VMAs on stack and data segments
>       mm: fix vma_is_anonymous() false-positives

Hi, I have run into AOSP userspace crash with v4.18-rc7, leading to
above mm patches. bfd40eaff5ab ("mm: fix vma_is_anonymous()
false-positives") to be specific. The same userspace is working fine
with v4.18-rc6.

I didn't yet look into what is going wrong from userspace point of
view, but I just wanted to give you a heads up on this. I'll be happy
to assist in further debugging/diagnosis if required.

Here is the crash log from logcat, if it helps:
F DEBUG   : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
F DEBUG   : Build fingerprint:
'Android/db410c32_only/db410c32_only:Q/OC-MR1/102:userdebug/test-key
F DEBUG   : Revision: '0'
F DEBUG   : ABI: 'arm'
F DEBUG   : pid: 2261, tid: 2261, name: zygote  >>> zygote <<<
F DEBUG   : signal 7 (SIGBUS), code 2 (BUS_ADRERR), fault addr 0xec00008
.. <snip> ..
F DEBUG   : backtrace:
F DEBUG   :     #00 pc 00001c04  /system/lib/libc.so (memset+48)
F DEBUG   :     #01 pc 0010c513  /system/lib/libart.so
(create_mspace_with_base+82)
F DEBUG   :     #02 pc 0015c601  /system/lib/libart.so
(art::gc::space::DlMallocSpace::CreateMspace(void*, unsigned int,
unsigned int)+40)
F DEBUG   :     #03 pc 0015c3ed  /system/lib/libart.so
(art::gc::space::DlMallocSpace::CreateFromMemMap(art::MemMap*,
std::__1::basic_string<char, std::__
1::char_traits<char>, std::__1::allocator<char>> const&, unsigned int,
unsigned int, unsigned int, unsigned int, bool)+36)
F DEBUG   :     #04 pc 0013c9ab  /system/lib/libart.so
(art::gc::Heap::Heap(unsigned int, unsigned int, unsigned int,
unsigned int, double, double, unsigned int, unsigned int,
std::__1::basic_string<char, std::__1::char_traits<char>,
std::__1::allocator<char>> const&, art::InstructionSet,
art::gc::CollectorType, art::gc::CollectorType,
art::gc::space::LargeObjectSpaceType, unsigned int, unsigned int,
unsigned int, bool, unsigned int, unsigned int, bool, bool, bool,
bool, bool, bool, bool, bool, bool, bool, bool, unsigned long
long)+1674)
DEBUG   :     #05 pc 00318201  /system/lib/libart.so
(art::Runtime::Init(art::RuntimeArgumentMap&&)+7036)
DEBUG   :     #06 pc 0031af19  /system/lib/libart.so
(art::Runtime::Create(std::__1::vector<std::__1::pair<std::__1::basic_string<char,
std::__1::char_traits<char>, std::__1::allocator<char>>, void const*>,
std::__1::allocator<std::__1::pair<std::__1::basic_string<char,
std::__1::char_traits<char>, std::__1::allocator<char>>, void
const*>>> const&, bool)+68)
F DEBUG   :     #07 pc 0023c353  /system/lib/libart.so (JNI_CreateJavaVM+658)
F DEBUG   :     #08 pc 0000205f  /system/lib/libandroid_runtime.so
(android::AndroidRuntime::startVm(_JavaVM**, _JNIEnv**, bool)+5038)
F DEBUG   :     #09 pc 00002381  /system/lib/libandroid_runtime.so
(android::AndroidRuntime::start(char const*,
android::Vector<android::String8> const&, bool)+196)
F DEBUG   :     #10 pc 0000046b  /system/bin/app_process32 (main+702)

Regards,
Amit Pundir

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-07-30  6:47 ` Amit Pundir
@ 2018-07-30 13:01   ` Kirill A. Shutemov
  2018-07-30 13:34     ` Amit Pundir
  2018-07-30 17:32     ` Linus Torvalds
  0 siblings, 2 replies; 32+ messages in thread
From: Kirill A. Shutemov @ 2018-07-30 13:01 UTC (permalink / raw)
  To: Amit Pundir
  Cc: Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov,
	aarcange, Linus Torvalds, Greg Kroah-Hartman, John Stultz,
	linux-mm, lkml, youling 257

On Mon, Jul 30, 2018 at 12:17:46PM +0530, Amit Pundir wrote:
> On Mon, 30 Jul 2018 at 03:39, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > So unless something odd happens, this should be the last rc for 4.18.
> >
> > Nothing particularly odd happened this last week - we got the usual
> > random set of various minor fixes all over. About two thirds of it is
> > drivers - networking, staging and usb stands out, but there's a little
> > bit of stuff all over (clk, block, gpu, nvme..).
> >
> > Outside of drivers, the bulk is some core networking stuff, with
> > random changes elsewhere (minor arch updates, filesystems, core
> > kernel, test scripts).
> >
> > The appended shortlog gives a flavor of the details.
> >
> >                   Linus
> >
> > ---
> > Kirill A. Shutemov (3):
> >       mm: introduce vma_init()
> >       mm: use vma_init() to initialize VMAs on stack and data segments
> >       mm: fix vma_is_anonymous() false-positives
> 
> Hi, I have run into AOSP userspace crash with v4.18-rc7, leading to
> above mm patches. bfd40eaff5ab ("mm: fix vma_is_anonymous()
> false-positives") to be specific. The same userspace is working fine
> with v4.18-rc6.
> 
> I didn't yet look into what is going wrong from userspace point of
> view, but I just wanted to give you a heads up on this. I'll be happy
> to assist in further debugging/diagnosis if required.

Youling reported basically the same bug with zygote crashing, but on
x86-64.

I think I missed vma_set_anonymous() somewhere, but I fail to see where.

Could you check if removing 'vma->vm_ops = &dummy_vm_ops;" from vma_init
makes the problem go away?

Any chance the code that crashes can be run under strace?

> Here is the crash log from logcat, if it helps:
> F DEBUG   : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
> F DEBUG   : Build fingerprint:
> 'Android/db410c32_only/db410c32_only:Q/OC-MR1/102:userdebug/test-key
> F DEBUG   : Revision: '0'
> F DEBUG   : ABI: 'arm'
> F DEBUG   : pid: 2261, tid: 2261, name: zygote  >>> zygote <<<
> F DEBUG   : signal 7 (SIGBUS), code 2 (BUS_ADRERR), fault addr 0xec00008
> .. <snip> ..
> F DEBUG   : backtrace:
> F DEBUG   :     #00 pc 00001c04  /system/lib/libc.so (memset+48)
> F DEBUG   :     #01 pc 0010c513  /system/lib/libart.so
> (create_mspace_with_base+82)
> F DEBUG   :     #02 pc 0015c601  /system/lib/libart.so
> (art::gc::space::DlMallocSpace::CreateMspace(void*, unsigned int,
> unsigned int)+40)
> F DEBUG   :     #03 pc 0015c3ed  /system/lib/libart.so
> (art::gc::space::DlMallocSpace::CreateFromMemMap(art::MemMap*,
> std::__1::basic_string<char, std::__
> 1::char_traits<char>, std::__1::allocator<char>> const&, unsigned int,
> unsigned int, unsigned int, unsigned int, bool)+36)
> F DEBUG   :     #04 pc 0013c9ab  /system/lib/libart.so
> (art::gc::Heap::Heap(unsigned int, unsigned int, unsigned int,
> unsigned int, double, double, unsigned int, unsigned int,
> std::__1::basic_string<char, std::__1::char_traits<char>,
> std::__1::allocator<char>> const&, art::InstructionSet,
> art::gc::CollectorType, art::gc::CollectorType,
> art::gc::space::LargeObjectSpaceType, unsigned int, unsigned int,
> unsigned int, bool, unsigned int, unsigned int, bool, bool, bool,
> bool, bool, bool, bool, bool, bool, bool, bool, unsigned long
> long)+1674)
> DEBUG   :     #05 pc 00318201  /system/lib/libart.so
> (art::Runtime::Init(art::RuntimeArgumentMap&&)+7036)
> DEBUG   :     #06 pc 0031af19  /system/lib/libart.so
> (art::Runtime::Create(std::__1::vector<std::__1::pair<std::__1::basic_string<char,
> std::__1::char_traits<char>, std::__1::allocator<char>>, void const*>,
> std::__1::allocator<std::__1::pair<std::__1::basic_string<char,
> std::__1::char_traits<char>, std::__1::allocator<char>>, void
> const*>>> const&, bool)+68)
> F DEBUG   :     #07 pc 0023c353  /system/lib/libart.so (JNI_CreateJavaVM+658)
> F DEBUG   :     #08 pc 0000205f  /system/lib/libandroid_runtime.so
> (android::AndroidRuntime::startVm(_JavaVM**, _JNIEnv**, bool)+5038)
> F DEBUG   :     #09 pc 00002381  /system/lib/libandroid_runtime.so
> (android::AndroidRuntime::start(char const*,
> android::Vector<android::String8> const&, bool)+196)
> F DEBUG   :     #10 pc 0000046b  /system/bin/app_process32 (main+702)
> 
> Regards,
> Amit Pundir
> 

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-07-29 22:09 Linux 4.18-rc7 Linus Torvalds
  2018-07-30  6:47 ` Amit Pundir
@ 2018-07-30 13:24 ` Guenter Roeck
  1 sibling, 0 replies; 32+ messages in thread
From: Guenter Roeck @ 2018-07-30 13:24 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel Mailing List

On Sun, Jul 29, 2018 at 03:09:05PM -0700, Linus Torvalds wrote:
> So unless something odd happens, this should be the last rc for 4.18.
> 
> Nothing particularly odd happened this last week - we got the usual
> random set of various minor fixes all over. About two thirds of it is
> drivers - networking, staging and usb stands out, but there's a little
> bit of stuff all over (clk, block, gpu, nvme..).
> 
> Outside of drivers, the bulk is some core networking stuff, with
> random changes elsewhere (minor arch updates, filesystems, core
> kernel, test scripts).
> 

Build results:
	total: 134 pass: 134 fail: 0
Qemu test results:
	total: 183 pass: 183 fail: 0

Fetails are available at http://kerneltests.org/builders/.

Guenter

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-07-30 13:01   ` Kirill A. Shutemov
@ 2018-07-30 13:34     ` Amit Pundir
  2018-07-30 17:32     ` Linus Torvalds
  1 sibling, 0 replies; 32+ messages in thread
From: Amit Pundir @ 2018-07-30 13:34 UTC (permalink / raw)
  To: kirill
  Cc: Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov,
	aarcange, Linus Torvalds, Greg Kroah-Hartman, John Stultz,
	linux-mm, lkml, youling 257

On Mon, 30 Jul 2018 at 18:31, Kirill A. Shutemov <kirill@shutemov.name> wrote:
>
> On Mon, Jul 30, 2018 at 12:17:46PM +0530, Amit Pundir wrote:
> > On Mon, 30 Jul 2018 at 03:39, Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > >
> > > So unless something odd happens, this should be the last rc for 4.18.
> > >
> > > Nothing particularly odd happened this last week - we got the usual
> > > random set of various minor fixes all over. About two thirds of it is
> > > drivers - networking, staging and usb stands out, but there's a little
> > > bit of stuff all over (clk, block, gpu, nvme..).
> > >
> > > Outside of drivers, the bulk is some core networking stuff, with
> > > random changes elsewhere (minor arch updates, filesystems, core
> > > kernel, test scripts).
> > >
> > > The appended shortlog gives a flavor of the details.
> > >
> > >                   Linus
> > >
> > > ---
> > > Kirill A. Shutemov (3):
> > >       mm: introduce vma_init()
> > >       mm: use vma_init() to initialize VMAs on stack and data segments
> > >       mm: fix vma_is_anonymous() false-positives
> >
> > Hi, I have run into AOSP userspace crash with v4.18-rc7, leading to
> > above mm patches. bfd40eaff5ab ("mm: fix vma_is_anonymous()
> > false-positives") to be specific. The same userspace is working fine
> > with v4.18-rc6.
> >
> > I didn't yet look into what is going wrong from userspace point of
> > view, but I just wanted to give you a heads up on this. I'll be happy
> > to assist in further debugging/diagnosis if required.
>
> Youling reported basically the same bug with zygote crashing, but on
> x86-64.
>
> I think I missed vma_set_anonymous() somewhere, but I fail to see where.
>
> Could you check if removing 'vma->vm_ops = &dummy_vm_ops;" from vma_init
> makes the problem go away?

Yes removing 'vma->vm_ops = &dummy_vm_ops;" from vma_init() works.
Crash is gone with that change.

>
> Any chance the code that crashes can be run under strace?

Running strace on zygote is going to be a pain. I can check logcat
again and see if any other relatively less complex process is crashing
with similar backtrace and try to run that with strace if that is
still required.

Regards,
Amit Pundir

>
> > Here is the crash log from logcat, if it helps:
> > F DEBUG   : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
> > F DEBUG   : Build fingerprint:
> > 'Android/db410c32_only/db410c32_only:Q/OC-MR1/102:userdebug/test-key
> > F DEBUG   : Revision: '0'
> > F DEBUG   : ABI: 'arm'
> > F DEBUG   : pid: 2261, tid: 2261, name: zygote  >>> zygote <<<
> > F DEBUG   : signal 7 (SIGBUS), code 2 (BUS_ADRERR), fault addr 0xec00008
> > .. <snip> ..
> > F DEBUG   : backtrace:
> > F DEBUG   :     #00 pc 00001c04  /system/lib/libc.so (memset+48)
> > F DEBUG   :     #01 pc 0010c513  /system/lib/libart.so
> > (create_mspace_with_base+82)
> > F DEBUG   :     #02 pc 0015c601  /system/lib/libart.so
> > (art::gc::space::DlMallocSpace::CreateMspace(void*, unsigned int,
> > unsigned int)+40)
> > F DEBUG   :     #03 pc 0015c3ed  /system/lib/libart.so
> > (art::gc::space::DlMallocSpace::CreateFromMemMap(art::MemMap*,
> > std::__1::basic_string<char, std::__
> > 1::char_traits<char>, std::__1::allocator<char>> const&, unsigned int,
> > unsigned int, unsigned int, unsigned int, bool)+36)
> > F DEBUG   :     #04 pc 0013c9ab  /system/lib/libart.so
> > (art::gc::Heap::Heap(unsigned int, unsigned int, unsigned int,
> > unsigned int, double, double, unsigned int, unsigned int,
> > std::__1::basic_string<char, std::__1::char_traits<char>,
> > std::__1::allocator<char>> const&, art::InstructionSet,
> > art::gc::CollectorType, art::gc::CollectorType,
> > art::gc::space::LargeObjectSpaceType, unsigned int, unsigned int,
> > unsigned int, bool, unsigned int, unsigned int, bool, bool, bool,
> > bool, bool, bool, bool, bool, bool, bool, bool, unsigned long
> > long)+1674)
> > DEBUG   :     #05 pc 00318201  /system/lib/libart.so
> > (art::Runtime::Init(art::RuntimeArgumentMap&&)+7036)
> > DEBUG   :     #06 pc 0031af19  /system/lib/libart.so
> > (art::Runtime::Create(std::__1::vector<std::__1::pair<std::__1::basic_string<char,
> > std::__1::char_traits<char>, std::__1::allocator<char>>, void const*>,
> > std::__1::allocator<std::__1::pair<std::__1::basic_string<char,
> > std::__1::char_traits<char>, std::__1::allocator<char>>, void
> > const*>>> const&, bool)+68)
> > F DEBUG   :     #07 pc 0023c353  /system/lib/libart.so (JNI_CreateJavaVM+658)
> > F DEBUG   :     #08 pc 0000205f  /system/lib/libandroid_runtime.so
> > (android::AndroidRuntime::startVm(_JavaVM**, _JNIEnv**, bool)+5038)
> > F DEBUG   :     #09 pc 00002381  /system/lib/libandroid_runtime.so
> > (android::AndroidRuntime::start(char const*,
> > android::Vector<android::String8> const&, bool)+196)
> > F DEBUG   :     #10 pc 0000046b  /system/bin/app_process32 (main+702)
> >
> > Regards,
> > Amit Pundir
> >
>
> --
>  Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-07-30 13:01   ` Kirill A. Shutemov
  2018-07-30 13:34     ` Amit Pundir
@ 2018-07-30 17:32     ` Linus Torvalds
  2018-07-30 21:53       ` Hugh Dickins
  1 sibling, 1 reply; 32+ messages in thread
From: Linus Torvalds @ 2018-07-30 17:32 UTC (permalink / raw)
  To: Kirill A. Shutemov, Hugh Dickins, Matthew Wilcox
  Cc: Amit Pundir, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov,
	Oleg Nesterov, Andrea Arcangeli, Greg Kroah-Hartman, John Stultz,
	linux-mm, Linux Kernel Mailing List, youling257

On Mon, Jul 30, 2018 at 6:01 AM Kirill A. Shutemov <kirill@shutemov.name> wrote:
>
> I think I missed vma_set_anonymous() somewhere, but I fail to see where.

Honestly, by now we just need to revert that commit.

It's not even clear that it was a good idea to begin with. The rest of
the commits were cleanups, this one was driven by a incorrect
VM_BUG_ON() that triggered, and that checked "vma_is_anonymous(vma)"
without any explanations of wht it should matter.

I think the biggest problem with vma_is_anonymous() may be its name,
not what it does.

What the code historically *did* (and what vma_is_anonymous() checks)
is not "is this anonymous", but rather "does this have any special
operations associated with it".

The two are similar. But people have grown opinions about exactly what
"anonymous" means. If we had named it just "no_vm_ops()", we wouldn't
have random crazy checks for "vma_is_anonymous()" in places where it
makes no sense.

So what I think we want a real explanation for is why people who use
"vma_is_anonymous()" care. Instead of trying to change its very
historical meaning, we should look at the users, and perhaps change
its name.

In this case, for example, I think the *real* problem was described by
commit 684283988f70 ("huge pagecache: mmap_sem is unlocked when
truncation splits pmd"), and the problem is that an existing check
that required that mmap_sem was held was changed to say "only for
anonymous mappings".

But the fact is, you can truncate mappings that don't have any ops just *fine*.

So maybe that original BUG() was entirely bogus to begin with, and it
shouldn't exist at all?

Or maybe the code should test "do I have a vm_file" instead of testing
"do I have vm_ops"?

What's the problem with just doing split_huge_pmd() there when it's a
pmd_trans_huge or pmd_devmap pmd? Why is that VM_BUG_ON_VMA() there in
the first place? Why are allegedly "anonymous" mappings so special
here for locking?

Adding a few more people to the cc, they were involved the last that
time VM_BUG_ON_VMA() was modified.

New people: see commit bfd40eaff5ab ("mm: fix vma_is_anonymous()
false-positives") for details. Right now I think it's getting
reverted, but the oops explanation in the commit is about that

            kernel BUG at mm/memory.c:1422!

which was/is debatable and seems to make no sense (and definitely is
still triggerable despite that commit 684283988f70 ("huge pagecache:
mmap_sem is unlocked when truncation splits pmd") that limited it a
bit - but I think it didn't limit it enough.

               Linus

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-07-30 17:32     ` Linus Torvalds
@ 2018-07-30 21:53       ` Hugh Dickins
  2018-07-31  1:01         ` Linus Torvalds
  0 siblings, 1 reply; 32+ messages in thread
From: Hugh Dickins @ 2018-07-30 21:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Kirill A. Shutemov, Hugh Dickins, Matthew Wilcox, Amit Pundir,
	Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov,
	Andrea Arcangeli, Greg Kroah-Hartman, John Stultz, linux-mm,
	Linux Kernel Mailing List, youling257

On Mon, 30 Jul 2018, Linus Torvalds wrote:
> On Mon, Jul 30, 2018 at 6:01 AM Kirill A. Shutemov <kirill@shutemov.name> wrote:
> >
> > I think I missed vma_set_anonymous() somewhere, but I fail to see where.
> 
> Honestly, by now we just need to revert that commit.
> 
> It's not even clear that it was a good idea to begin with. The rest of
> the commits were cleanups, this one was driven by a incorrect
> VM_BUG_ON() that triggered, and that checked "vma_is_anonymous(vma)"
> without any explanations of wht it should matter.
> 
> I think the biggest problem with vma_is_anonymous() may be its name,
> not what it does.
> 
> What the code historically *did* (and what vma_is_anonymous() checks)
> is not "is this anonymous", but rather "does this have any special
> operations associated with it".
> 
> The two are similar. But people have grown opinions about exactly what
> "anonymous" means. If we had named it just "no_vm_ops()", we wouldn't
> have random crazy checks for "vma_is_anonymous()" in places where it
> makes no sense.
> 
> So what I think we want a real explanation for is why people who use
> "vma_is_anonymous()" care. Instead of trying to change its very
> historical meaning, we should look at the users, and perhaps change
> its name.

You make a very good point on the naming. Especially confusing when
layered on top of "we call shmem 'file' here, but 'anon' there".

I have no problem with reverting -rc7's vma_is_anonymous() series.

> 
> In this case, for example, I think the *real* problem was described by
> commit 684283988f70 ("huge pagecache: mmap_sem is unlocked when
> truncation splits pmd"), and the problem is that an existing check
> that required that mmap_sem was held was changed to say "only for
> anonymous mappings".
> 
> But the fact is, you can truncate mappings that don't have any ops just *fine*.
> 
> So maybe that original BUG() was entirely bogus to begin with, and it
> shouldn't exist at all?
> 
> Or maybe the code should test "do I have a vm_file" instead of testing
> "do I have vm_ops"?
> 
> What's the problem with just doing split_huge_pmd() there when it's a
> pmd_trans_huge or pmd_devmap pmd? Why is that VM_BUG_ON_VMA() there in
> the first place? Why are allegedly "anonymous" mappings so special
> here for locking?
> 
> Adding a few more people to the cc, they were involved the last that
> time VM_BUG_ON_VMA() was modified.
> 
> New people: see commit bfd40eaff5ab ("mm: fix vma_is_anonymous()
> false-positives") for details. Right now I think it's getting
> reverted, but the oops explanation in the commit is about that
> 
>             kernel BUG at mm/memory.c:1422!
> 
> which was/is debatable and seems to make no sense (and definitely is
> still triggerable despite that commit 684283988f70 ("huge pagecache:
> mmap_sem is unlocked when truncation splits pmd") that limited it a
> bit - but I think it didn't limit it enough.

I'm all for deleting that VM_BUG_ON_VMA() in zap_pmd_range(), it was
just a compromise with those who wanted to keep something there;
I don't think we even need a WARN_ON_ONCE() now.

It's historical: back in the day when only (hugetlbfs which never
gets there, and) anon THP used huge pmds for userspace, it did half-
document an obscure assumption underlying __split_huge_pmd(), and
IIRC was added in response to a trinity bug catch in that area.

(It remains quite interesting how exit_mmap() does not come that way,
and most syscalls split the vma beforehand in vma_adjust(): it's mostly
about madvise(,,MADV_DONTNEED), perhaps others now, which zap ptes
without prior splitting.)

Once DAX and huge tmpfs came in, the BUG had to be scrapped or weakened
because truncation and hole-punching can come that way without mmap_sem.
And perhaps other drivers since are taking advantage of huge pmds now,
with other assumptions.

But I have not yet taken a look to see where the -rc7 changes actually
go wrong: I'll spend a little while looking, but don't expect to find it -
certainly don't wait on me, I'll only speak up if I find something.

Hugh

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-07-30 21:53       ` Hugh Dickins
@ 2018-07-31  1:01         ` Linus Torvalds
  2018-07-31  3:26           ` Hugh Dickins
  2018-07-31  6:29           ` Linux 4.18-rc7 Kirill A. Shutemov
  0 siblings, 2 replies; 32+ messages in thread
From: Linus Torvalds @ 2018-07-31  1:01 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Kirill A. Shutemov, Matthew Wilcox, Amit Pundir,
	Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov,
	Andrea Arcangeli, Greg Kroah-Hartman, John Stultz, linux-mm,
	Linux Kernel Mailing List, youling257

On Mon, Jul 30, 2018 at 2:53 PM Hugh Dickins <hughd@google.com> wrote:
>
> I have no problem with reverting -rc7's vma_is_anonymous() series.

I don't think we need to revert the whole series: I think the rest are
all fairly obvious cleanups, and shouldn't really have any semantic
changes.

It's literally only that last patch in the series that then changes
that meaning of "vm_ops". And I don't really _mind_ that last step
either, but since we don't know exactly what it was that it broke, and
we're past rc7, I don't think we really have any option but the revert
it.

And if we revert it, I think we need to just remove the
VM_BUG_ON_VMA() that it was supposed to fix. Because I do think that
it is quite likely that the real bug is that overzealous BUG_ON(),
since I can't see any reason why anonymous mappings should be special
there.

But I'm certainly also ok with re-visiting that commit later.  I just
think that right _now_ the above is my preferred plan.

Kirill?

> I'm all for deleting that VM_BUG_ON_VMA() in zap_pmd_range(), it was
> just a compromise with those who wanted to keep something there;
> I don't think we even need a WARN_ON_ONCE() now.

So to me it looks like a historical check that simply doesn't
"normally" trigger, but there's no reason I can see why we should care
about the case it tests against.

> (It remains quite interesting how exit_mmap() does not come that way,
> and most syscalls split the vma beforehand in vma_adjust(): it's mostly
> about madvise(,,MADV_DONTNEED), perhaps others now, which zap ptes
> without prior splitting.)

Well, in this case it's the ftruncate() path, which fundamentally
doesn't split the vma at all (prior *or* later). But yes, madvise() is
in the same boat - it doesn't change the vma at all, it just changes
the contents of the vma.

And exit_mmap() is special because it just tears down everything.

So we do have three very distinct cases:

 (a) changing and thus splitting the vma itself (mprotect, munmap/mmap, mlock),

 (b) not changing the vma, but changing the underlying mapping
(truncate and madvise(MADV_DONTNEED)

 (c) tearing down everything, and no locking needed because it's the
last user (exit_mmap).

that are different for what I think are good reasons.

                   Linus

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-07-31  1:01         ` Linus Torvalds
@ 2018-07-31  3:26           ` Hugh Dickins
  2018-07-31  4:25             ` John Stultz
  2018-07-31  6:29           ` Linux 4.18-rc7 Kirill A. Shutemov
  1 sibling, 1 reply; 32+ messages in thread
From: Hugh Dickins @ 2018-07-31  3:26 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox, Amit Pundir,
	Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov,
	Andrea Arcangeli, Greg Kroah-Hartman, John Stultz, linux-mm,
	Linux Kernel Mailing List, youling257

On Mon, 30 Jul 2018, Linus Torvalds wrote:
> On Mon, Jul 30, 2018 at 2:53 PM Hugh Dickins <hughd@google.com> wrote:
> >
> > I have no problem with reverting -rc7's vma_is_anonymous() series.
> 
> I don't think we need to revert the whole series: I think the rest are
> all fairly obvious cleanups, and shouldn't really have any semantic
> changes.

Okay.

> 
> It's literally only that last patch in the series that then changes
> that meaning of "vm_ops". And I don't really _mind_ that last step
> either, but since we don't know exactly what it was that it broke, and
> we're past rc7, I don't think we really have any option but the revert
> it.

It took me a long time to grasp what was happening, that that last
patch bfd40eaff5ab was fixing. Not quite explained in the commit.

I think it was that by mistakenly passing the vma_is_anonymous() test,
create_huge_pmd() gave the MAP_PRIVATE kcov mapping a THP (instead of
COWing pages from kcov); which the truncate then had to split, and in
going to do so, again hit the mistaken vma_is_anonymous() test, BUG.

> 
> And if we revert it, I think we need to just remove the
> VM_BUG_ON_VMA() that it was supposed to fix. Because I do think that
> it is quite likely that the real bug is that overzealous BUG_ON(),
> since I can't see any reason why anonymous mappings should be special
> there.

Yes, that probably has to go: but it's not clear what state it leaves
us in, with an anon THP being split by a truncate, without the expected
locking; I don't remember offhand, probably a subtler bug than that BUG,
which you may or may not consider an improvement.

I fear that Kirill has not missed inserting a vma_set_anonymous() from
somewhere that it should be, but rather that zygote is working with some
special mapping which used to satisfy vma_is_anonymous(), faults supplying
backing pages, but now comes out as !vma_is_anonymous(), so do_fault()
finds !dummy_vm_ops.fault hence SIGBUS.

If that's so, perhaps dummy_vm_ops needs to be given a back-compatible
fault handler; or the driver(?) in question given vm_ops and that fault
handler. But when I say "back-compatible", I don't think it should ever
go so far as to supply a THP.

> 
> But I'm certainly also ok with re-visiting that commit later.  I just
> think that right _now_ the above is my preferred plan.
> 
> Kirill?
> 
> > I'm all for deleting that VM_BUG_ON_VMA() in zap_pmd_range(), it was
> > just a compromise with those who wanted to keep something there;
> > I don't think we even need a WARN_ON_ONCE() now.
> 
> So to me it looks like a historical check that simply doesn't
> "normally" trigger, but there's no reason I can see why we should care
> about the case it tests against.
> 
> > (It remains quite interesting how exit_mmap() does not come that way,
> > and most syscalls split the vma beforehand in vma_adjust(): it's mostly
> > about madvise(,,MADV_DONTNEED), perhaps others now, which zap ptes
> > without prior splitting.)
> 
> Well, in this case it's the ftruncate() path, which fundamentally
> doesn't split the vma at all (prior *or* later). But yes, madvise() is
> in the same boat - it doesn't change the vma at all, it just changes
> the contents of the vma.
> 
> And exit_mmap() is special because it just tears down everything.
> 
> So we do have three very distinct cases:
> 
>  (a) changing and thus splitting the vma itself (mprotect, munmap/mmap, mlock),

Yes.

> 
>  (b) not changing the vma, but changing the underlying mapping
> (truncate and madvise(MADV_DONTNEED)

Yes, though I think I would distinguish the truncate & hole-punch case
from the madvise zap case, they have different characteristics in other
ways (or did, before that awkward case of truncating an anon THP surfaced).

> 
>  (c) tearing down everything, and no locking needed because it's the
> last user (exit_mmap).

Yes; and it goes linearly from start to finish, never jumping into
the middle of a vma, so never needing to split a THP.

> 
> that are different for what I think are good reasons.
> 
>                    Linus

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-07-31  3:26           ` Hugh Dickins
@ 2018-07-31  4:25             ` John Stultz
  2018-07-31  6:40               ` Amit Pundir
  0 siblings, 1 reply; 32+ messages in thread
From: John Stultz @ 2018-07-31  4:25 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Linus Torvalds, Kirill A. Shutemov, Matthew Wilcox, Amit Pundir,
	Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov,
	Andrea Arcangeli, Greg Kroah-Hartman, linux-mm,
	Linux Kernel Mailing List, youling257, Joel Fernandes,
	Colin Cross

On Mon, Jul 30, 2018 at 8:26 PM, Hugh Dickins <hughd@google.com> wrote:
> On Mon, 30 Jul 2018, Linus Torvalds wrote:
>> On Mon, Jul 30, 2018 at 2:53 PM Hugh Dickins <hughd@google.com> wrote:
>> >
>> > I have no problem with reverting -rc7's vma_is_anonymous() series.
>>
>> I don't think we need to revert the whole series: I think the rest are
>> all fairly obvious cleanups, and shouldn't really have any semantic
>> changes.
>
> Okay.
>
>>
>> It's literally only that last patch in the series that then changes
>> that meaning of "vm_ops". And I don't really _mind_ that last step
>> either, but since we don't know exactly what it was that it broke, and
>> we're past rc7, I don't think we really have any option but the revert
>> it.
>
> It took me a long time to grasp what was happening, that that last
> patch bfd40eaff5ab was fixing. Not quite explained in the commit.
>
> I think it was that by mistakenly passing the vma_is_anonymous() test,
> create_huge_pmd() gave the MAP_PRIVATE kcov mapping a THP (instead of
> COWing pages from kcov); which the truncate then had to split, and in
> going to do so, again hit the mistaken vma_is_anonymous() test, BUG.
>
>>
>> And if we revert it, I think we need to just remove the
>> VM_BUG_ON_VMA() that it was supposed to fix. Because I do think that
>> it is quite likely that the real bug is that overzealous BUG_ON(),
>> since I can't see any reason why anonymous mappings should be special
>> there.
>
> Yes, that probably has to go: but it's not clear what state it leaves
> us in, with an anon THP being split by a truncate, without the expected
> locking; I don't remember offhand, probably a subtler bug than that BUG,
> which you may or may not consider an improvement.
>
> I fear that Kirill has not missed inserting a vma_set_anonymous() from
> somewhere that it should be, but rather that zygote is working with some
> special mapping which used to satisfy vma_is_anonymous(), faults supplying
> backing pages, but now comes out as !vma_is_anonymous(), so do_fault()
> finds !dummy_vm_ops.fault hence SIGBUS.

I've been only casually following this thread (mostly just glad Amit
caught it and I could avoid having to bisect the issue in my own
Android testing), but this bit starting to shake a few old cobwebs
loose in my brain.

I'm wondering if Zygote is utilizing ashmem here, and we're somehow
traversing ashmem purged memory, or due to some setup issue the
initial traverse isn't being zero-filled as expected?

ashmem ranges are created using: shmem_file_setup() and shmem_zero_setup()
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/android/ashmem.c#n377


If we purge pages, it punches it out with:
vfs_fallocate(range->asma->file,
     FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
     start, end - start);
here:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/android/ashmem.c#n447

But in ashmem_pin(), we don't do anything other then returning if we
purged any page in the range.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/android/ashmem.c#n577

And I believe the future assumption is the if we traverse those pages
they will be zero filled (if purged or even during the initial
traversal after mmap)

Its been a long time, and I've not looked at the code in question but
it sounds from Hugh's comments above that we might instead get a
SIGBUS here.

Looking more at the problematic patch..
Amit: Does adding something like (whitespace damaged, apologies):

index a1a0025..1af6915 100644
--- a/drivers/staging/android/ashmem.c
+++ b/drivers/staging/android/ashmem.c
@@ -402,7 +402,8 @@ static int ashmem_mmap(struct file *file, struct
vm_area_struct *vma)
                        fput(asma->file);
                        goto out;
                }
-       }
+       } else
+               vma_set_anonymous(vma);

        if (vma->vm_file)
                fput(vma->vm_file);


Seem to resolve it? (Sorry, I'd test it myself, but I'm away from my
desk for the night).
thanks
-john

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-07-31  1:01         ` Linus Torvalds
  2018-07-31  3:26           ` Hugh Dickins
@ 2018-07-31  6:29           ` Kirill A. Shutemov
  2018-07-31 14:57             ` Kirill A. Shutemov
  1 sibling, 1 reply; 32+ messages in thread
From: Kirill A. Shutemov @ 2018-07-31  6:29 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Hugh Dickins, Matthew Wilcox, Amit Pundir, Kirill A. Shutemov,
	Andrew Morton, Dmitry Vyukov, Oleg Nesterov, Andrea Arcangeli,
	Greg Kroah-Hartman, John Stultz, linux-mm,
	Linux Kernel Mailing List, youling257

On Mon, Jul 30, 2018 at 06:01:26PM -0700, Linus Torvalds wrote:
> On Mon, Jul 30, 2018 at 2:53 PM Hugh Dickins <hughd@google.com> wrote:
> >
> > I have no problem with reverting -rc7's vma_is_anonymous() series.
> 
> I don't think we need to revert the whole series: I think the rest are
> all fairly obvious cleanups, and shouldn't really have any semantic
> changes.
> 
> It's literally only that last patch in the series that then changes
> that meaning of "vm_ops". And I don't really _mind_ that last step
> either, but since we don't know exactly what it was that it broke, and
> we're past rc7, I don't think we really have any option but the revert
> it.
> 
> And if we revert it, I think we need to just remove the
> VM_BUG_ON_VMA() that it was supposed to fix. Because I do think that
> it is quite likely that the real bug is that overzealous BUG_ON(),
> since I can't see any reason why anonymous mappings should be special
> there.
> 
> But I'm certainly also ok with re-visiting that commit later.  I just
> think that right _now_ the above is my preferred plan.
> 
> Kirill?

Considering the timing, I'm okay with reverting the last patch with
dropping the VM_BUG_ON_VMA().

But in the end I would like to see strong vma_is_anonymous().

The VM_BUG_ON_VMA() is only triggerable by the test case because
vma_is_anonymous() false-positive in fault path and we get anon-THP
allocated in file-private mapping.

I don't see immediately how this may trigger other crashes.
But it definitely looks wrong.

> > I'm all for deleting that VM_BUG_ON_VMA() in zap_pmd_range(), it was
> > just a compromise with those who wanted to keep something there;
> > I don't think we even need a WARN_ON_ONCE() now.
> 
> So to me it looks like a historical check that simply doesn't
> "normally" trigger, but there's no reason I can see why we should care
> about the case it tests against.

I'll think more on what could go wrong with __split_huge_pmd() called on
anon-THP page without mmap_sem(). It's not yet clear cut to me.


-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-07-31  4:25             ` John Stultz
@ 2018-07-31  6:40               ` Amit Pundir
  2018-07-31  6:56                 ` Kirill A. Shutemov
  2018-07-31 16:29                 ` Linus Torvalds
  0 siblings, 2 replies; 32+ messages in thread
From: Amit Pundir @ 2018-07-31  6:40 UTC (permalink / raw)
  To: John Stultz
  Cc: Hugh Dickins, Linus Torvalds, kirill, willy, Kirill A. Shutemov,
	Andrew Morton, Dmitry Vyukov, Oleg Nesterov, aarcange,
	Greg Kroah-Hartman, linux-mm, lkml, youling 257, Joel Fernandes,
	ccross

On Tue, 31 Jul 2018 at 09:55, John Stultz <john.stultz@linaro.org> wrote:
>
> On Mon, Jul 30, 2018 at 8:26 PM, Hugh Dickins <hughd@google.com> wrote:
> > On Mon, 30 Jul 2018, Linus Torvalds wrote:
> >> On Mon, Jul 30, 2018 at 2:53 PM Hugh Dickins <hughd@google.com> wrote:
> >> >
> >> > I have no problem with reverting -rc7's vma_is_anonymous() series.
> >>
> >> I don't think we need to revert the whole series: I think the rest are
> >> all fairly obvious cleanups, and shouldn't really have any semantic
> >> changes.
> >
> > Okay.
> >
> >>
> >> It's literally only that last patch in the series that then changes
> >> that meaning of "vm_ops". And I don't really _mind_ that last step
> >> either, but since we don't know exactly what it was that it broke, and
> >> we're past rc7, I don't think we really have any option but the revert
> >> it.
> >
> > It took me a long time to grasp what was happening, that that last
> > patch bfd40eaff5ab was fixing. Not quite explained in the commit.
> >
> > I think it was that by mistakenly passing the vma_is_anonymous() test,
> > create_huge_pmd() gave the MAP_PRIVATE kcov mapping a THP (instead of
> > COWing pages from kcov); which the truncate then had to split, and in
> > going to do so, again hit the mistaken vma_is_anonymous() test, BUG.
> >
> >>
> >> And if we revert it, I think we need to just remove the
> >> VM_BUG_ON_VMA() that it was supposed to fix. Because I do think that
> >> it is quite likely that the real bug is that overzealous BUG_ON(),
> >> since I can't see any reason why anonymous mappings should be special
> >> there.
> >
> > Yes, that probably has to go: but it's not clear what state it leaves
> > us in, with an anon THP being split by a truncate, without the expected
> > locking; I don't remember offhand, probably a subtler bug than that BUG,
> > which you may or may not consider an improvement.
> >
> > I fear that Kirill has not missed inserting a vma_set_anonymous() from
> > somewhere that it should be, but rather that zygote is working with some
> > special mapping which used to satisfy vma_is_anonymous(), faults supplying
> > backing pages, but now comes out as !vma_is_anonymous(), so do_fault()
> > finds !dummy_vm_ops.fault hence SIGBUS.
>
> I've been only casually following this thread (mostly just glad Amit
> caught it and I could avoid having to bisect the issue in my own
> Android testing), but this bit starting to shake a few old cobwebs
> loose in my brain.
>
> I'm wondering if Zygote is utilizing ashmem here, and we're somehow
> traversing ashmem purged memory, or due to some setup issue the
> initial traverse isn't being zero-filled as expected?
>
> ashmem ranges are created using: shmem_file_setup() and shmem_zero_setup()
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/android/ashmem.c#n377
>
>
> If we purge pages, it punches it out with:
> vfs_fallocate(range->asma->file,
>      FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
>      start, end - start);
> here:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/android/ashmem.c#n447
>
> But in ashmem_pin(), we don't do anything other then returning if we
> purged any page in the range.
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/android/ashmem.c#n577
>
> And I believe the future assumption is the if we traverse those pages
> they will be zero filled (if purged or even during the initial
> traversal after mmap)
>
> Its been a long time, and I've not looked at the code in question but
> it sounds from Hugh's comments above that we might instead get a
> SIGBUS here.
>
> Looking more at the problematic patch..
> Amit: Does adding something like (whitespace damaged, apologies):
>
> index a1a0025..1af6915 100644
> --- a/drivers/staging/android/ashmem.c
> +++ b/drivers/staging/android/ashmem.c
> @@ -402,7 +402,8 @@ static int ashmem_mmap(struct file *file, struct
> vm_area_struct *vma)
>                         fput(asma->file);
>                         goto out;
>                 }
> -       }
> +       } else
> +               vma_set_anonymous(vma);
>
>         if (vma->vm_file)
>                 fput(vma->vm_file);
>

This ashmem change ^^ worked too.

Regards,
Amit Pundir

>
> Seem to resolve it? (Sorry, I'd test it myself, but I'm away from my
> desk for the night).
> thanks
> -john

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-07-31  6:40               ` Amit Pundir
@ 2018-07-31  6:56                 ` Kirill A. Shutemov
  2018-07-31 16:29                 ` Linus Torvalds
  1 sibling, 0 replies; 32+ messages in thread
From: Kirill A. Shutemov @ 2018-07-31  6:56 UTC (permalink / raw)
  To: Amit Pundir
  Cc: John Stultz, Hugh Dickins, Linus Torvalds, willy,
	Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov,
	aarcange, Greg Kroah-Hartman, linux-mm, lkml, youling 257,
	Joel Fernandes, ccross

On Tue, Jul 31, 2018 at 12:10:06PM +0530, Amit Pundir wrote:
> On Tue, 31 Jul 2018 at 09:55, John Stultz <john.stultz@linaro.org> wrote:
> >
> > On Mon, Jul 30, 2018 at 8:26 PM, Hugh Dickins <hughd@google.com> wrote:
> > > On Mon, 30 Jul 2018, Linus Torvalds wrote:
> > >> On Mon, Jul 30, 2018 at 2:53 PM Hugh Dickins <hughd@google.com> wrote:
> > >> >
> > >> > I have no problem with reverting -rc7's vma_is_anonymous() series.
> > >>
> > >> I don't think we need to revert the whole series: I think the rest are
> > >> all fairly obvious cleanups, and shouldn't really have any semantic
> > >> changes.
> > >
> > > Okay.
> > >
> > >>
> > >> It's literally only that last patch in the series that then changes
> > >> that meaning of "vm_ops". And I don't really _mind_ that last step
> > >> either, but since we don't know exactly what it was that it broke, and
> > >> we're past rc7, I don't think we really have any option but the revert
> > >> it.
> > >
> > > It took me a long time to grasp what was happening, that that last
> > > patch bfd40eaff5ab was fixing. Not quite explained in the commit.
> > >
> > > I think it was that by mistakenly passing the vma_is_anonymous() test,
> > > create_huge_pmd() gave the MAP_PRIVATE kcov mapping a THP (instead of
> > > COWing pages from kcov); which the truncate then had to split, and in
> > > going to do so, again hit the mistaken vma_is_anonymous() test, BUG.
> > >
> > >>
> > >> And if we revert it, I think we need to just remove the
> > >> VM_BUG_ON_VMA() that it was supposed to fix. Because I do think that
> > >> it is quite likely that the real bug is that overzealous BUG_ON(),
> > >> since I can't see any reason why anonymous mappings should be special
> > >> there.
> > >
> > > Yes, that probably has to go: but it's not clear what state it leaves
> > > us in, with an anon THP being split by a truncate, without the expected
> > > locking; I don't remember offhand, probably a subtler bug than that BUG,
> > > which you may or may not consider an improvement.
> > >
> > > I fear that Kirill has not missed inserting a vma_set_anonymous() from
> > > somewhere that it should be, but rather that zygote is working with some
> > > special mapping which used to satisfy vma_is_anonymous(), faults supplying
> > > backing pages, but now comes out as !vma_is_anonymous(), so do_fault()
> > > finds !dummy_vm_ops.fault hence SIGBUS.
> >
> > I've been only casually following this thread (mostly just glad Amit
> > caught it and I could avoid having to bisect the issue in my own
> > Android testing), but this bit starting to shake a few old cobwebs
> > loose in my brain.
> >
> > I'm wondering if Zygote is utilizing ashmem here, and we're somehow
> > traversing ashmem purged memory, or due to some setup issue the
> > initial traverse isn't being zero-filled as expected?
> >
> > ashmem ranges are created using: shmem_file_setup() and shmem_zero_setup()
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/android/ashmem.c#n377
> >
> >
> > If we purge pages, it punches it out with:
> > vfs_fallocate(range->asma->file,
> >      FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
> >      start, end - start);
> > here:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/android/ashmem.c#n447
> >
> > But in ashmem_pin(), we don't do anything other then returning if we
> > purged any page in the range.
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/android/ashmem.c#n577
> >
> > And I believe the future assumption is the if we traverse those pages
> > they will be zero filled (if purged or even during the initial
> > traversal after mmap)
> >
> > Its been a long time, and I've not looked at the code in question but
> > it sounds from Hugh's comments above that we might instead get a
> > SIGBUS here.
> >
> > Looking more at the problematic patch..
> > Amit: Does adding something like (whitespace damaged, apologies):
> >
> > index a1a0025..1af6915 100644
> > --- a/drivers/staging/android/ashmem.c
> > +++ b/drivers/staging/android/ashmem.c
> > @@ -402,7 +402,8 @@ static int ashmem_mmap(struct file *file, struct
> > vm_area_struct *vma)
> >                         fput(asma->file);
> >                         goto out;
> >                 }
> > -       }
> > +       } else
> > +               vma_set_anonymous(vma);
> >
> >         if (vma->vm_file)
> >                 fput(vma->vm_file);
> >
> 
> This ashmem change ^^ worked too.

Okay. It makes sense.

But I'm not convinced that's a legitimate way to get an anonymous mapping.

I don't know how ashmem suppose to work. Looks like we get a shmem file
associated with the mapping, even if user asked for private mapping.

Shouldn't in this case vm_ops point to shmem_vm_ops?

Note, we have only one other case when MAP_PRIVATE on a file gets you an
anonymous mapping -- /dev/zero.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-07-31  6:29           ` Linux 4.18-rc7 Kirill A. Shutemov
@ 2018-07-31 14:57             ` Kirill A. Shutemov
  2018-08-01  0:09               ` Hugh Dickins
  0 siblings, 1 reply; 32+ messages in thread
From: Kirill A. Shutemov @ 2018-07-31 14:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Hugh Dickins, Matthew Wilcox, Amit Pundir, Kirill A. Shutemov,
	Andrew Morton, Dmitry Vyukov, Oleg Nesterov, Andrea Arcangeli,
	Greg Kroah-Hartman, John Stultz, linux-mm,
	Linux Kernel Mailing List, youling257

On Tue, Jul 31, 2018 at 09:29:27AM +0300, Kirill A. Shutemov wrote:
> On Mon, Jul 30, 2018 at 06:01:26PM -0700, Linus Torvalds wrote:
> > On Mon, Jul 30, 2018 at 2:53 PM Hugh Dickins <hughd@google.com> wrote:
> > >
> > > I have no problem with reverting -rc7's vma_is_anonymous() series.
> > 
> > I don't think we need to revert the whole series: I think the rest are
> > all fairly obvious cleanups, and shouldn't really have any semantic
> > changes.
> > 
> > It's literally only that last patch in the series that then changes
> > that meaning of "vm_ops". And I don't really _mind_ that last step
> > either, but since we don't know exactly what it was that it broke, and
> > we're past rc7, I don't think we really have any option but the revert
> > it.
> > 
> > And if we revert it, I think we need to just remove the
> > VM_BUG_ON_VMA() that it was supposed to fix. Because I do think that
> > it is quite likely that the real bug is that overzealous BUG_ON(),
> > since I can't see any reason why anonymous mappings should be special
> > there.
> > 
> > But I'm certainly also ok with re-visiting that commit later.  I just
> > think that right _now_ the above is my preferred plan.
> > 
> > Kirill?
> 
> Considering the timing, I'm okay with reverting the last patch with
> dropping the VM_BUG_ON_VMA().
> 
> But in the end I would like to see strong vma_is_anonymous().
> 
> The VM_BUG_ON_VMA() is only triggerable by the test case because
> vma_is_anonymous() false-positive in fault path and we get anon-THP
> allocated in file-private mapping.
> 
> I don't see immediately how this may trigger other crashes.
> But it definitely looks wrong.
> 
> > > I'm all for deleting that VM_BUG_ON_VMA() in zap_pmd_range(), it was
> > > just a compromise with those who wanted to keep something there;
> > > I don't think we even need a WARN_ON_ONCE() now.
> > 
> > So to me it looks like a historical check that simply doesn't
> > "normally" trigger, but there's no reason I can see why we should care
> > about the case it tests against.
> 
> I'll think more on what could go wrong with __split_huge_pmd() called on
> anon-THP page without mmap_sem(). It's not yet clear cut to me.

I think not having mmap_sem taken at least on read when we call
__split_huge_pmd() opens possiblity of race with khugepaged:
khugepaged can collapse the page back to THP as soon as we drop ptl.
As result pmd_none_or_trans_huge_or_clear_bad() would return true and we
basically leave the THP behind, not zapped.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-07-31  6:40               ` Amit Pundir
  2018-07-31  6:56                 ` Kirill A. Shutemov
@ 2018-07-31 16:29                 ` Linus Torvalds
  2018-07-31 16:56                   ` John Stultz
                                     ` (2 more replies)
  1 sibling, 3 replies; 32+ messages in thread
From: Linus Torvalds @ 2018-07-31 16:29 UTC (permalink / raw)
  To: Amit Pundir
  Cc: John Stultz, Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox,
	Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov,
	Andrea Arcangeli, Greg Kroah-Hartman, linux-mm,
	Linux Kernel Mailing List, youling 257, Joel Fernandes,
	Colin Cross

On Mon, Jul 30, 2018 at 11:40 PM Amit Pundir <amit.pundir@linaro.org> wrote:
>
> This ashmem change ^^ worked too.

Ok, let's go for that one and hope it's the only one.

John, can I get a proper commit message and sign-off for that ashmem change?

Kirill - you mentioned that somebody reproduced a problem on x86-64
too. I didn't see that report. Was that some odd x86 Android setup
with Ashmem too, or is there something else pending?

                       Linus

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-07-31 16:29                 ` Linus Torvalds
@ 2018-07-31 16:56                   ` John Stultz
  2018-07-31 17:03                   ` Kirill A. Shutemov
  2018-07-31 17:17                   ` [PATCH] staging: ashmem: Fix SIGBUS crash when traversing mmaped ashmem pages John Stultz
  2 siblings, 0 replies; 32+ messages in thread
From: John Stultz @ 2018-07-31 16:56 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Amit Pundir, Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox,
	Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov,
	Andrea Arcangeli, Greg Kroah-Hartman, linux-mm,
	Linux Kernel Mailing List, youling 257, Joel Fernandes,
	Colin Cross

On Tue, Jul 31, 2018 at 9:29 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Mon, Jul 30, 2018 at 11:40 PM Amit Pundir <amit.pundir@linaro.org> wrote:
>>
>> This ashmem change ^^ worked too.
>
> Ok, let's go for that one and hope it's the only one.
>
> John, can I get a proper commit message and sign-off for that ashmem change?

Will do. Just doing some local testing myself to make sure all is well.

> Kirill - you mentioned that somebody reproduced a problem on x86-64
> too. I didn't see that report. Was that some odd x86 Android setup
> with Ashmem too, or is there something else pending?

Krill mentioned "zygote crashing, but on x86-64" and zygote is Android
so I assume it is the same issue.

thanks
-john

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-07-31 16:29                 ` Linus Torvalds
  2018-07-31 16:56                   ` John Stultz
@ 2018-07-31 17:03                   ` Kirill A. Shutemov
  2018-07-31 17:43                     ` Luck, Tony
  2018-07-31 17:17                   ` [PATCH] staging: ashmem: Fix SIGBUS crash when traversing mmaped ashmem pages John Stultz
  2 siblings, 1 reply; 32+ messages in thread
From: Kirill A. Shutemov @ 2018-07-31 17:03 UTC (permalink / raw)
  To: Linus Torvalds, Luck, Tony
  Cc: Amit Pundir, John Stultz, Hugh Dickins, Matthew Wilcox,
	Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov,
	Andrea Arcangeli, Greg Kroah-Hartman, linux-mm,
	Linux Kernel Mailing List, youling 257, Joel Fernandes,
	Colin Cross

On Tue, Jul 31, 2018 at 09:29:22AM -0700, Linus Torvalds wrote:
> On Mon, Jul 30, 2018 at 11:40 PM Amit Pundir <amit.pundir@linaro.org> wrote:
> >
> > This ashmem change ^^ worked too.
> 
> Ok, let's go for that one and hope it's the only one.
> 
> John, can I get a proper commit message and sign-off for that ashmem change?
> 
> Kirill - you mentioned that somebody reproduced a problem on x86-64
> too. I didn't see that report. Was that some odd x86 Android setup
> with Ashmem too, or is there something else pending?

I've got report from youling privately:

"mm: fix vma_is_anonymous() false-positives" cause my userspace boot failed,
our Androidx86 userspace can running on linux mainline kernel,
revert it boot succeed with 4.18rc7 kernel.

"mm: fix vma_is_anonymous() false-positives" cause these

07-30 11:04:19.556 1609 1609 F DEBUG : pid: 1304, tid: 1304, name: zygote
>>> zygote <<<
07-30 11:04:19.556 1609 1609 F DEBUG : signal 7 (SIGBUS), code 2
(BUS_ADRERR), fault addr 0x7494d008
07-30 11:04:19.556 1609 1609 F DEBUG : eax 00000000 ebx f337bb68 ecx
000001e0 edx 7494d008
07-30 11:04:19.556 1609 1609 F DEBUG : esi 7494d000 edi 00000000
07-30 11:04:19.556 1609 1609 F DEBUG : xcs 00000023 xds 0000002b xes
0000002b xfs 00000003 xss 0000002b
07-30 11:04:19.556 1609 1609 F DEBUG : eip f40f5c76 ebp ffa8d288 esp
ffa8d238 flags 00010202
07-30 11:04:19.581 1609 1609 F DEBUG :

-------------------------------------------------------------------------

The report also had screenshot attached about system info. It's a Baytrail
tablet with LinageOS, so I believe it's the same issue.

But it's not the only issue unfortunately. Tony reported issue with
booting ia64 with the patch. I have no clue why. I rechecked everything
ia64-specific and looks fine to me. :-/

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH] staging: ashmem: Fix SIGBUS crash when traversing mmaped ashmem pages
  2018-07-31 16:29                 ` Linus Torvalds
  2018-07-31 16:56                   ` John Stultz
  2018-07-31 17:03                   ` Kirill A. Shutemov
@ 2018-07-31 17:17                   ` John Stultz
  2 siblings, 0 replies; 32+ messages in thread
From: John Stultz @ 2018-07-31 17:17 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: John Stultz, Amit Pundir, Kirill A. Shutemov, Kirill A. Shutemov,
	Andrew Morton, Dmitry Vyukov, Oleg Nesterov, aarcange,
	Linus Torvalds, Greg Kroah-Hartman, Hugh Dickins, Joel Fernandes,
	Colin Cross, Matthew Wilcox, linux-mm, youling 257

Amit Pundir and Youling in parallel reported crashes with recent
mainline kernels running Android:

F DEBUG   : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
F DEBUG   : Build fingerprint:
'Android/db410c32_only/db410c32_only:Q/OC-MR1/102:userdebug/test-key
F DEBUG   : Revision: '0'
F DEBUG   : ABI: 'arm'
F DEBUG   : pid: 2261, tid: 2261, name: zygote  >>> zygote <<<
F DEBUG   : signal 7 (SIGBUS), code 2 (BUS_ADRERR), fault addr 0xec00008
... <snip> ...
F DEBUG   : backtrace:
F DEBUG   :     #00 pc 00001c04  /system/lib/libc.so (memset+48)
F DEBUG   :     #01 pc 0010c513  /system/lib/libart.so
(create_mspace_with_base+82)
F DEBUG   :     #02 pc 0015c601  /system/lib/libart.so
(art::gc::space::DlMallocSpace::CreateMspace(void*, unsigned int,
unsigned int)+40)
F DEBUG   :     #03 pc 0015c3ed  /system/lib/libart.so
(art::gc::space::DlMallocSpace::CreateFromMemMap(art::MemMap*,
std::__1::basic_string<char, std::__
1::char_traits<char>, std::__1::allocator<char>> const&, unsigned int,
unsigned int, unsigned int, unsigned int, bool)+36)
...

This was bisected back to commit bfd40eaff5ab ("mm: fix vma_is_anonymous()
false-positives").

create_mspace_with_base() in the trace above, utilizes ashmem, and
with ashmem, for shared mappings we use shmem_zero_setup(), which sets
the vma->vm_ops to &shmem_vm_ops. But for private ashmem mappings
nothing sets the vma->vm_ops.

Looking at the problematic patch, it seems to add a requirement that
one call vma_set_anonymous() on a vma, otherwise the dummy_vm_ops will
be used. Using the dummy_vm_ops seem to triggger SIGBUS when traversing
unmapped pages.

Thus, this patch adds a call to vma_set_anonymous() for ashmem private
mappings and seems to avoid the reported problem.

Cc: Amit Pundir <amit.pundir@linaro.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: aarcange@redhat.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Colin Cross <ccross@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: linux-mm@kvack.org
Cc: youling 257 <youling257@gmail.com>
Fixes: bfd40eaff5ab ("mm: fix vma_is_anonymous() false-positives")
Reported-by: Amit Pundir <amit.pundir@linaro.org>
Reported-by: Youling 257 <youling257@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---

Hopefully my explanation make sense here. Please let me know if it
needs corrections.
thanks
-john

---
 drivers/staging/android/ashmem.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/staging/android/ashmem.c b/drivers/staging/android/ashmem.c
index a1a0025..d5d33e1 100644
--- a/drivers/staging/android/ashmem.c
+++ b/drivers/staging/android/ashmem.c
@@ -402,6 +402,8 @@ static int ashmem_mmap(struct file *file, struct vm_area_struct *vma)
 			fput(asma->file);
 			goto out;
 		}
+	} else {
+		vma_set_anonymous(vma);
 	}
 
 	if (vma->vm_file)
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-07-31 17:03                   ` Kirill A. Shutemov
@ 2018-07-31 17:43                     ` Luck, Tony
  2018-07-31 19:02                       ` Linus Torvalds
  2018-08-01 17:15                       ` Linus Torvalds
  0 siblings, 2 replies; 32+ messages in thread
From: Luck, Tony @ 2018-07-31 17:43 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Linus Torvalds, Amit Pundir, John Stultz, Hugh Dickins,
	Matthew Wilcox, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov,
	Oleg Nesterov, Andrea Arcangeli, Greg Kroah-Hartman, linux-mm,
	Linux Kernel Mailing List, youling 257, Joel Fernandes,
	Colin Cross

On Tue, Jul 31, 2018 at 08:03:28PM +0300, Kirill A. Shutemov wrote:
> But it's not the only issue unfortunately. Tony reported issue with
> booting ia64 with the patch. I have no clue why. I rechecked everything
> ia64-specific and looks fine to me. :-/

If I just revert bfd40eaff5ab ("mm: fix vma_is_anonymous() false-positives")
then ia64 boots again.

-Tony

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-07-31 17:43                     ` Luck, Tony
@ 2018-07-31 19:02                       ` Linus Torvalds
  2018-08-01 17:15                       ` Linus Torvalds
  1 sibling, 0 replies; 32+ messages in thread
From: Linus Torvalds @ 2018-07-31 19:02 UTC (permalink / raw)
  To: Tony Luck
  Cc: Kirill A. Shutemov, Amit Pundir, John Stultz, Hugh Dickins,
	Matthew Wilcox, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov,
	Oleg Nesterov, Andrea Arcangeli, Greg Kroah-Hartman, linux-mm,
	Linux Kernel Mailing List, youling 257, Joel Fernandes,
	Colin Cross

On Tue, Jul 31, 2018 at 10:43 AM Luck, Tony <tony.luck@intel.com> wrote:
>
> If I just revert bfd40eaff5ab ("mm: fix vma_is_anonymous() false-positives")
> then ia64 boots again.

Ok, so it's not just the ashmem thing.

I think I'll do an rc8 with the revert, just so that we'll have some
time to figure this out. It's only Tuesday, but I already have 90
commits since rc7, so this isn't the only issue we're having.

I _prefer_ just the regular cadence of releases, but when I have a
reason to delay, I'll delay.

                 Linus

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-07-31 14:57             ` Kirill A. Shutemov
@ 2018-08-01  0:09               ` Hugh Dickins
  0 siblings, 0 replies; 32+ messages in thread
From: Hugh Dickins @ 2018-08-01  0:09 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Linus Torvalds, Hugh Dickins, Matthew Wilcox, Amit Pundir,
	Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov,
	Andrea Arcangeli, Greg Kroah-Hartman, John Stultz, linux-mm,
	Linux Kernel Mailing List, youling257

On Tue, 31 Jul 2018, Kirill A. Shutemov wrote:
> On Tue, Jul 31, 2018 at 09:29:27AM +0300, Kirill A. Shutemov wrote:
> > On Mon, Jul 30, 2018 at 06:01:26PM -0700, Linus Torvalds wrote:
> > > 
> > > So to me it looks like a historical check that simply doesn't
> > > "normally" trigger, but there's no reason I can see why we should care
> > > about the case it tests against.
> > 
> > I'll think more on what could go wrong with __split_huge_pmd() called on
> > anon-THP page without mmap_sem(). It's not yet clear cut to me.
> 
> I think not having mmap_sem taken at least on read when we call
> __split_huge_pmd() opens possiblity of race with khugepaged:
> khugepaged can collapse the page back to THP as soon as we drop ptl.
> As result pmd_none_or_trans_huge_or_clear_bad() would return true and we
> basically leave the THP behind, not zapped.

I think we don't care deeply about the POSIX truncate semantics on the
kind of "file" that has managed to get to this point: in the unlikely
event that a THP is immediately recreated there, never mind, so long as
we don't crash or leak memory or suchlike (the surplus THP would get
freed at exit).

I think we're altogether better off just deleting that VM_BUG_ON_VMA();
but I do find it very very hard to arrive at a firm conclusion on the
absolute safety of splitting a pmd without mmap_sem there (though any
problem unlikely even if real, and more likely a figment of my paranoia).

I believe the VM_BUG_ON is a relic from the old days, when anon_vma_lock
played a big part in guarding the pmd+page split: remember how mmap_sem
is one of the ways you can guarantee that the anon_vma will not vanish
beneath you (page_get_anon_vma was added later than anon THP).

I'm a little more worried by the nearby zap_huge_pmd() (which could
never be covered by a suitable VM_BUG_ON): the way that frees a
previously deposited page table, and you have no guarantee of when
and where that page table was last used. Again I can't point to an
actual problem, just the recollection that it's been found subtly
safe in the past, but any change in the conditions might affect that.

And a little worried to see how split_huge_page_to_list() uses
anon_vma_lock on PageAnon versus i_mmap_lock on !PageAnon: which
makes complete sense in itself, but won't protect against a PageAnon
THP being concurrently split from the truncate_pagecache() direction,
where unmap_mapping_range() uses i_mmap_lock. (simple_setattr() the
default setattr: that's a bit of a worry too.)

I feel I'm moaning and crying at shadows, rather than providing any
useful suggestions or patches; but thought I ought to report back.

Hugh

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-07-31 17:43                     ` Luck, Tony
  2018-07-31 19:02                       ` Linus Torvalds
@ 2018-08-01 17:15                       ` Linus Torvalds
  2018-08-01 18:31                         ` Hugh Dickins
                                           ` (3 more replies)
  1 sibling, 4 replies; 32+ messages in thread
From: Linus Torvalds @ 2018-08-01 17:15 UTC (permalink / raw)
  To: Tony Luck
  Cc: Kirill A. Shutemov, Amit Pundir, John Stultz, Hugh Dickins,
	Matthew Wilcox, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov,
	Oleg Nesterov, Andrea Arcangeli, Greg Kroah-Hartman, linux-mm,
	Linux Kernel Mailing List, youling 257, Joel Fernandes,
	Colin Cross

On Tue, Jul 31, 2018 at 10:43 AM Luck, Tony <tony.luck@intel.com> wrote:
>
> On Tue, Jul 31, 2018 at 08:03:28PM +0300, Kirill A. Shutemov wrote:
> > But it's not the only issue unfortunately. Tony reported issue with
> > booting ia64 with the patch. I have no clue why. I rechecked everything
> > ia64-specific and looks fine to me. :-/
>
> If I just revert bfd40eaff5ab ("mm: fix vma_is_anonymous() false-positives")
> then ia64 boots again.

Ok, I'd love to have more information about this, but I'm assuming
that Tony isn't running some odd ia64 version of Android, so there's
definitely something else than just the ashmem thing going on. Either
it's some odd ia64-specific special vma, or it's just something
triggered on an ia64 boot that nobody else noticed or cared about.

And I was just going to do the final revert and started this email to
say so, when I looked at the obvious candidate: the
ia64_init_addr_space() function. Trivially fixed.

But as I was doing that, I also noticed another problem with the vma
series: the vma_init() conversion of automatic variables is buggy.
Commit 2c4541e24c55 ("mm: use vma_init() to initialize VMAs on stack
and data segments") is really bad, because it never grew the memset()
that was discussed, and the patch that was applied was the original
one - so vma_init() only initializes a couple of fields. As a result,
doing things like this:

-       struct vm_area_struct vma = { .vm_mm = mm };
+       struct vm_area_struct vma;

+       vma_init(&vma, mm);

is just completely wrong, because it actually initializes much less
than it used to, and leaves most of the vma filled with random stack
garbage. In particular, it now fills with garbage the fields that TLB
flushing really can care about: things like vm_flags that says "is
this perhaps an executable-only mapping that only needs to flush the
ITLB?"

I don't actually believe that we should do vma_init() on those
on-stack vma's anyway, since they aren't "real" vma's. They are
literally crafted just for TLB flushing - filling in the vm_mm (and
sometimes vm_flags) pointers so that the TLB flushing knows what to
do.

So using "vma_init()" on them is actively detrimental as things stand
right now. The reason I looked at them was that I was trying to see
who actually uses "vm_area_alloc()" and "vma_init()" right now that
would be affected by that commit bfd40eaff5ab ("mm: fix
vma_is_anonymous() false-positives") outside of actual
honest-to-goodness device file mmaps.

Anyway, the upshot of all this is that I think I know what the ia64
problem was, and John sent the patch for the ashmem case, and I'm
going to hold off reverting that vma_is_anonymous() false-positives
commit after all.

I'm still unhappy about the vma_init() ones, and I have not decided
how to go with those. Either the memset() in vma_init(), or just
reverting the (imho unnecessary) commit 2c4541e24c55. Kirill, Andrew,
comments?

Tony, can you please double-check my commit ebad825cdd4e ("ia64: mark
special ia64 memory areas anonymous") fixes things for  you? I didn't
even compile it, but it really looks so obvious that I just committed
it directly. It's not pushed out yet (I'm doing the normal full build
test because of the ashmem commit first), but it should be out in
about 20 minutes when my testing has finished.

I'd like to get this sorted out asap, although at this point I still
think that I'll have to do an rc8 even though I feel like we may have
caught everything.

                        Linus

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-08-01 17:15                       ` Linus Torvalds
@ 2018-08-01 18:31                         ` Hugh Dickins
  2018-08-01 20:58                           ` Kirill A. Shutemov
  2018-08-01 18:36                         ` Luck, Tony
                                           ` (2 subsequent siblings)
  3 siblings, 1 reply; 32+ messages in thread
From: Hugh Dickins @ 2018-08-01 18:31 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Tony Luck, Kirill A. Shutemov, Amit Pundir, John Stultz,
	Hugh Dickins, Matthew Wilcox, Kirill A. Shutemov, Andrew Morton,
	Dmitry Vyukov, Oleg Nesterov, Andrea Arcangeli,
	Greg Kroah-Hartman, linux-mm, Linux Kernel Mailing List,
	youling 257, Joel Fernandes, Colin Cross

On Wed, 1 Aug 2018, Linus Torvalds wrote:
> 
> Anyway, the upshot of all this is that I think I know what the ia64
> problem was, and John sent the patch for the ashmem case, and I'm
> going to hold off reverting that vma_is_anonymous() false-positives
> commit after all.

I'd better send deletion of zap_pmd_range()'s VM_BUG_ON_VMA(): below
(but I've no proprietorial interest, if you prefer to do your own).

John's patch is good, and originally I thought it was safe from that
VM_BUG_ON_VMA(), because the /dev/ashmem fd exposed to the user is
disconnected from the vm_file in the vma, and madvise(,,MADV_REMOVE)
insists on VM_SHARED. But afterwards read John's earlier mail,
drawing attention to the vfs_fallocate() in there: I may be wrong,
and I don't know if Android has THP in the config anyway, but it looks
to me like an unmap_mapping_range() from ashmem's vfs_fallocate()
could hit precisely the VM_BUG_ON_VMA(), once it's vma_is_anonymous().

(I'm not familiar with ashmem, and I certainly don't understand the
role of MAP_PRIVATE ashmem mappings - hole-punch's zap_pte_range()
should end up leaving any anon pages in place; but the presence of
the BUG is requiring us all to understand too much too quickly.)


[PATCH] mm: delete historical BUG from zap_pmd_range()

Delete the old VM_BUG_ON_VMA() from zap_pmd_range(), which asserted
that mmap_sem must be held when splitting an "anonymous" vma there.
Whether that's still strictly true nowadays is not entirely clear,
but the danger of sometimes crashing on the BUG is now fairly clear.

Signed-off-by: Hugh Dickins <hughd@google.com>
---

 mm/memory.c |    6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

--- 4.18-rc7/mm/memory.c	2018-06-16 18:48:22.041173422 -0700
+++ linux/mm/memory.c	2018-08-01 11:01:21.397286507 -0700
@@ -1417,11 +1417,9 @@ static inline unsigned long zap_pmd_rang
 	do {
 		next = pmd_addr_end(addr, end);
 		if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) {
-			if (next - addr != HPAGE_PMD_SIZE) {
-				VM_BUG_ON_VMA(vma_is_anonymous(vma) &&
-				    !rwsem_is_locked(&tlb->mm->mmap_sem), vma);
+			if (next - addr != HPAGE_PMD_SIZE)
 				__split_huge_pmd(vma, pmd, addr, false, NULL);
-			} else if (zap_huge_pmd(tlb, vma, pmd, addr))
+			else if (zap_huge_pmd(tlb, vma, pmd, addr))
 				goto next;
 			/* fall through */
 		}

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-08-01 17:15                       ` Linus Torvalds
  2018-08-01 18:31                         ` Hugh Dickins
@ 2018-08-01 18:36                         ` Luck, Tony
  2018-08-01 20:05                         ` Linus Torvalds
  2018-08-02  6:59                         ` Amit Pundir
  3 siblings, 0 replies; 32+ messages in thread
From: Luck, Tony @ 2018-08-01 18:36 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Kirill A. Shutemov, Amit Pundir, John Stultz, Hugh Dickins,
	Matthew Wilcox, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov,
	Oleg Nesterov, Andrea Arcangeli, Greg Kroah-Hartman, linux-mm,
	Linux Kernel Mailing List, youling 257, Joel Fernandes,
	Colin Cross

On Wed, Aug 01, 2018 at 10:15:05AM -0700, Linus Torvalds wrote:
> Tony, can you please double-check my commit ebad825cdd4e ("ia64: mark
> special ia64 memory areas anonymous") fixes things for  you? I didn't
> even compile it, but it really looks so obvious that I just committed
> it directly. It's not pushed out yet (I'm doing the normal full build
> test because of the ashmem commit first), but it should be out in
> about 20 minutes when my testing has finished.
> 
> I'd like to get this sorted out asap, although at this point I still
> think that I'll have to do an rc8 even though I feel like we may have
> caught everything.

I pulled and got HEAD = 44960f2a7b63e224b1091b3e1d6f60e0cdf4be0c which
includes ebad825cdd4e.

ia64 boots fine with no issues.

Thanks

-Tony

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-08-01 17:15                       ` Linus Torvalds
  2018-08-01 18:31                         ` Hugh Dickins
  2018-08-01 18:36                         ` Luck, Tony
@ 2018-08-01 20:05                         ` Linus Torvalds
  2018-08-01 20:51                           ` Kirill A. Shutemov
  2018-08-02  6:59                         ` Amit Pundir
  3 siblings, 1 reply; 32+ messages in thread
From: Linus Torvalds @ 2018-08-01 20:05 UTC (permalink / raw)
  To: Tony Luck
  Cc: Kirill A. Shutemov, Amit Pundir, John Stultz, Hugh Dickins,
	Matthew Wilcox, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov,
	Oleg Nesterov, Andrea Arcangeli, Greg Kroah-Hartman, linux-mm,
	Linux Kernel Mailing List, youling 257, Joel Fernandes,
	Colin Cross

[-- Attachment #1: Type: text/plain, Size: 1614 bytes --]

On Wed, Aug 1, 2018 at 10:15 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I'm still unhappy about the vma_init() ones, and I have not decided
> how to go with those. Either the memset() in vma_init(), or just
> reverting the (imho unnecessary) commit 2c4541e24c55. Kirill, Andrew,
> comments?

Ugh. Adding a memset looks simple, but screws up some places that have
other initialization. It also requires adding a new include of
<linux/string.h>, or we'd need to uninline vma_init() and put it
somewhere else.

But just reverting commit 2c4541e24c55 ("mm: use vma_init() to
initialize VMAs on stack and data segments") entirely isn't good
either, because some of the cases aren't about the TLB flush
interface, and call down to "real" VM functions. The 'pseudo_vma' use
of remove_inode_hugepages() and hugetlbfs_fallocate() in particular is
odd, but using vma_init() looks good there. And those places had the
memset() already.

So I'm inclined to simply mark the TLB-related vma_init() cases
special, and use something like this:

  #define TLB_FLUSH_VMA(mm,flags) { .vm_mm = (mm), .vm_flags = (flags) }

to make it very obvious when we're doing that vma initialization for
flush_tlb_range(). It's done as an initializer, exactly so that the
only valid syntax is to do somethin glike this:

        struct vm_area_struct vma = TLB_FLUSH_VMA(mm, VM_EXEC);

That leaves vma_init() users to be just the actual real allocation
path, and a few very specific specual vmas (the hugetlbfs and
mempolicy pseudo-vma, and a couple of "gate" vmas).

Suggested patch attached. Comments?

                 Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 4025 bytes --]

 arch/arm/mach-rpc/ecard.c    |  5 +----
 arch/arm64/include/asm/tlb.h |  4 +---
 arch/arm64/mm/hugetlbpage.c  | 10 ++++------
 arch/ia64/include/asm/tlb.h  |  7 +++----
 include/linux/mm.h           |  3 +++
 5 files changed, 12 insertions(+), 17 deletions(-)

diff --git a/arch/arm/mach-rpc/ecard.c b/arch/arm/mach-rpc/ecard.c
index 8db62cc54a6a..04b2f22c2739 100644
--- a/arch/arm/mach-rpc/ecard.c
+++ b/arch/arm/mach-rpc/ecard.c
@@ -212,7 +212,7 @@ static DEFINE_MUTEX(ecard_mutex);
  */
 static void ecard_init_pgtables(struct mm_struct *mm)
 {
-	struct vm_area_struct vma;
+	struct vm_area_struct vma = TLB_FLUSH_VMA(mm, VM_EXEC);
 
 	/* We want to set up the page tables for the following mapping:
 	 *  Virtual	Physical
@@ -237,9 +237,6 @@ static void ecard_init_pgtables(struct mm_struct *mm)
 
 	memcpy(dst_pgd, src_pgd, sizeof(pgd_t) * (EASI_SIZE / PGDIR_SIZE));
 
-	vma_init(&vma, mm);
-	vma.vm_flags = VM_EXEC;
-
 	flush_tlb_range(&vma, IO_START, IO_START + IO_SIZE);
 	flush_tlb_range(&vma, EASI_START, EASI_START + EASI_SIZE);
 }
diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
index d87f2d646caa..0ad1cf233470 100644
--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -37,9 +37,7 @@ static inline void __tlb_remove_table(void *_table)
 
 static inline void tlb_flush(struct mmu_gather *tlb)
 {
-	struct vm_area_struct vma;
-
-	vma_init(&vma, tlb->mm);
+	struct vm_area_struct vma = TLB_FLUSH_VMA(tlb->mm, 0);
 
 	/*
 	 * The ASID allocator will either invalidate the ASID or mark
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 1854e49aa18a..192b3ba07075 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -108,13 +108,10 @@ static pte_t get_clear_flush(struct mm_struct *mm,
 			     unsigned long pgsize,
 			     unsigned long ncontig)
 {
-	struct vm_area_struct vma;
 	pte_t orig_pte = huge_ptep_get(ptep);
 	bool valid = pte_valid(orig_pte);
 	unsigned long i, saddr = addr;
 
-	vma_init(&vma, mm);
-
 	for (i = 0; i < ncontig; i++, addr += pgsize, ptep++) {
 		pte_t pte = ptep_get_and_clear(mm, addr, ptep);
 
@@ -127,8 +124,10 @@ static pte_t get_clear_flush(struct mm_struct *mm,
 			orig_pte = pte_mkdirty(orig_pte);
 	}
 
-	if (valid)
+	if (valid) {
+		struct vm_area_struct vma = TLB_FLUSH_VMA(mm, 0);
 		flush_tlb_range(&vma, saddr, addr);
+	}
 	return orig_pte;
 }
 
@@ -147,10 +146,9 @@ static void clear_flush(struct mm_struct *mm,
 			     unsigned long pgsize,
 			     unsigned long ncontig)
 {
-	struct vm_area_struct vma;
+	struct vm_area_struct vma = TLB_FLUSH_VMA(mm, 0);
 	unsigned long i, saddr = addr;
 
-	vma_init(&vma, mm);
 	for (i = 0; i < ncontig; i++, addr += pgsize, ptep++)
 		pte_clear(mm, addr, ptep);
 
diff --git a/arch/ia64/include/asm/tlb.h b/arch/ia64/include/asm/tlb.h
index db89e7306081..516355a774bf 100644
--- a/arch/ia64/include/asm/tlb.h
+++ b/arch/ia64/include/asm/tlb.h
@@ -115,12 +115,11 @@ ia64_tlb_flush_mmu_tlbonly(struct mmu_gather *tlb, unsigned long start, unsigned
 		flush_tlb_all();
 	} else {
 		/*
-		 * XXX fix me: flush_tlb_range() should take an mm pointer instead of a
-		 * vma pointer.
+		 * flush_tlb_range() takes a vma instead of a mm pointer because
+		 * some architectures want the vm_flags for ITLB/DTLB flush.
 		 */
-		struct vm_area_struct vma;
+		struct vm_area_struct vma = TLB_FLUSH_VMA(tlb->mm, 0);
 
-		vma_init(&vma, tlb->mm);
 		/* flush the address range from the tlb: */
 		flush_tlb_range(&vma, start, end);
 		/* now flush the virt. page-table area mapping the address range: */
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7ba6d356d18f..68a5121694ef 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -466,6 +466,9 @@ static inline void vma_set_anonymous(struct vm_area_struct *vma)
 	vma->vm_ops = NULL;
 }
 
+/* flush_tlb_range() takes a vma, not a mm, and can care about flags */
+#define TLB_FLUSH_VMA(mm,flags) { .vm_mm = (mm), .vm_flags = (flags) }
+
 struct mmu_gather;
 struct inode;
 

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-08-01 20:05                         ` Linus Torvalds
@ 2018-08-01 20:51                           ` Kirill A. Shutemov
  2018-08-01 20:56                             ` Linus Torvalds
  0 siblings, 1 reply; 32+ messages in thread
From: Kirill A. Shutemov @ 2018-08-01 20:51 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Tony Luck, Amit Pundir, John Stultz, Hugh Dickins,
	Matthew Wilcox, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov,
	Oleg Nesterov, Andrea Arcangeli, Greg Kroah-Hartman, linux-mm,
	Linux Kernel Mailing List, youling 257, Joel Fernandes,
	Colin Cross

On Wed, Aug 01, 2018 at 01:05:48PM -0700, Linus Torvalds wrote:
> On Wed, Aug 1, 2018 at 10:15 AM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > I'm still unhappy about the vma_init() ones, and I have not decided
> > how to go with those. Either the memset() in vma_init(), or just
> > reverting the (imho unnecessary) commit 2c4541e24c55. Kirill, Andrew,
> > comments?
> 
> Ugh. Adding a memset looks simple, but screws up some places that have
> other initialization. It also requires adding a new include of
> <linux/string.h>, or we'd need to uninline vma_init() and put it
> somewhere else.
> 
> But just reverting commit 2c4541e24c55 ("mm: use vma_init() to
> initialize VMAs on stack and data segments") entirely isn't good
> either, because some of the cases aren't about the TLB flush
> interface, and call down to "real" VM functions. The 'pseudo_vma' use
> of remove_inode_hugepages() and hugetlbfs_fallocate() in particular is
> odd, but using vma_init() looks good there. And those places had the
> memset() already.
> 
> So I'm inclined to simply mark the TLB-related vma_init() cases
> special, and use something like this:
> 
>   #define TLB_FLUSH_VMA(mm,flags) { .vm_mm = (mm), .vm_flags = (flags) }
> 
> to make it very obvious when we're doing that vma initialization for
> flush_tlb_range(). It's done as an initializer, exactly so that the
> only valid syntax is to do somethin glike this:
> 
>         struct vm_area_struct vma = TLB_FLUSH_VMA(mm, VM_EXEC);
> 
> That leaves vma_init() users to be just the actual real allocation
> path, and a few very specific specual vmas (the hugetlbfs and
> mempolicy pseudo-vma, and a couple of "gate" vmas).
> 
> Suggested patch attached. Comments?

Is there a reason why we pass vma to flush_tlb_range?

It's not obvious to me what information from VMA can be useful for an
implementation. I see that ecard.c initialize vm_flags too, but it seems
unused by flush_tlb_range.

Maybe it's cleaner to have generic helper flush_tlb_range_mm() or
something?

In longer term we can change the interface to take mm instead of vma.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-08-01 20:51                           ` Kirill A. Shutemov
@ 2018-08-01 20:56                             ` Linus Torvalds
  2018-08-01 21:25                               ` Kirill A. Shutemov
  0 siblings, 1 reply; 32+ messages in thread
From: Linus Torvalds @ 2018-08-01 20:56 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Tony Luck, Amit Pundir, John Stultz, Hugh Dickins,
	Matthew Wilcox, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov,
	Oleg Nesterov, Andrea Arcangeli, Greg Kroah-Hartman, linux-mm,
	Linux Kernel Mailing List, youling 257, Joel Fernandes,
	Colin Cross

On Wed, Aug 1, 2018 at 1:52 PM Kirill A. Shutemov <kirill@shutemov.name> wrote:
>
> Is there a reason why we pass vma to flush_tlb_range?

Yes. It's even in that patch.

The fact is, real MM users *have* a vma, and passing it in to the TLB
flushing is the right thing to do. That allows architectures that care
(mainly powerpc, I think) to notice that "hey, this range only had
execute permissions, so I only need to flush the ITLB".

The people who use tlb_flush_range() any other way are doing an
arch-specific hack.  It's not how tlb_flush_range() was defined, and
it's not how you can use it in general.

> It's not obvious to me what information from VMA can be useful for an
> implementation.

See the patch I sent, which had this as part of it:

-                * XXX fix me: flush_tlb_range() should take an mm
pointer instead of a
-                * vma pointer.
+                * flush_tlb_range() takes a vma instead of a mm pointer because
+                * some architectures want the vm_flags for ITLB/DTLB flush.

because I wanted to educate people about why the interface was what it
was, and the "fixme" was bogus shit.

> In longer term we can change the interface to take mm instead of vma.

FUCK NO!

Goddammit, read the code, or read the patch. The places ytou added
those broken vma_init() calls to were architecture-specific hacks.

Those architecture-specific hacks do not get to screw up the design
for everybody else.

                     Linus

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-08-01 18:31                         ` Hugh Dickins
@ 2018-08-01 20:58                           ` Kirill A. Shutemov
  2018-08-01 21:55                             ` Hugh Dickins
  0 siblings, 1 reply; 32+ messages in thread
From: Kirill A. Shutemov @ 2018-08-01 20:58 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Linus Torvalds, Tony Luck, Amit Pundir, John Stultz,
	Matthew Wilcox, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov,
	Oleg Nesterov, Andrea Arcangeli, Greg Kroah-Hartman, linux-mm,
	Linux Kernel Mailing List, youling 257, Joel Fernandes,
	Colin Cross

On Wed, Aug 01, 2018 at 11:31:52AM -0700, Hugh Dickins wrote:
> On Wed, 1 Aug 2018, Linus Torvalds wrote:
> > 
> > Anyway, the upshot of all this is that I think I know what the ia64
> > problem was, and John sent the patch for the ashmem case, and I'm
> > going to hold off reverting that vma_is_anonymous() false-positives
> > commit after all.
> 
> I'd better send deletion of zap_pmd_range()'s VM_BUG_ON_VMA(): below
> (but I've no proprietorial interest, if you prefer to do your own).

Agreed.

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

> John's patch is good, and originally I thought it was safe from that
> VM_BUG_ON_VMA(), because the /dev/ashmem fd exposed to the user is
> disconnected from the vm_file in the vma, and madvise(,,MADV_REMOVE)
> insists on VM_SHARED. But afterwards read John's earlier mail,
> drawing attention to the vfs_fallocate() in there: I may be wrong,
> and I don't know if Android has THP in the config anyway, but it looks
> to me like an unmap_mapping_range() from ashmem's vfs_fallocate()
> could hit precisely the VM_BUG_ON_VMA(), once it's vma_is_anonymous().
> 
> (I'm not familiar with ashmem, and I certainly don't understand the
> role of MAP_PRIVATE ashmem mappings - hole-punch's zap_pte_range()
> should end up leaving any anon pages in place; but the presence of
> the BUG is requiring us all to understand too much too quickly.)

Hugh, do you see any reason why ashmem shouldn't have vm_ops ==
shmem_vm_ops?

I don't understand ashmem, but I feel uncomfortable that we have this
sneaky way to create an anonymous VMA. It feels wrong to me.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-08-01 20:56                             ` Linus Torvalds
@ 2018-08-01 21:25                               ` Kirill A. Shutemov
  0 siblings, 0 replies; 32+ messages in thread
From: Kirill A. Shutemov @ 2018-08-01 21:25 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Tony Luck, Amit Pundir, John Stultz, Hugh Dickins,
	Matthew Wilcox, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov,
	Oleg Nesterov, Andrea Arcangeli, Greg Kroah-Hartman, linux-mm,
	Linux Kernel Mailing List, youling 257, Joel Fernandes,
	Colin Cross

On Wed, Aug 01, 2018 at 01:56:19PM -0700, Linus Torvalds wrote:
> On Wed, Aug 1, 2018 at 1:52 PM Kirill A. Shutemov <kirill@shutemov.name> wrote:
> >
> > Is there a reason why we pass vma to flush_tlb_range?
> 
> Yes. It's even in that patch.
> 
> The fact is, real MM users *have* a vma, and passing it in to the TLB
> flushing is the right thing to do. That allows architectures that care
> (mainly powerpc, I think) to notice that "hey, this range only had
> execute permissions, so I only need to flush the ITLB".
> 
> The people who use tlb_flush_range() any other way are doing an
> arch-specific hack.  It's not how tlb_flush_range() was defined, and
> it's not how you can use it in general.

Okay, I see.

ARM, unicore32 and xtensa avoid iTLB flush for non-executable VMAs.

> 
> > It's not obvious to me what information from VMA can be useful for an
> > implementation.
> 
> See the patch I sent, which had this as part of it:
> 
> -                * XXX fix me: flush_tlb_range() should take an mm
> pointer instead of a
> -                * vma pointer.
> +                * flush_tlb_range() takes a vma instead of a mm pointer because
> +                * some architectures want the vm_flags for ITLB/DTLB flush.
> 
> because I wanted to educate people about why the interface was what it
> was, and the "fixme" was bogus shit.

I didn't noticied this. Sorry.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-08-01 20:58                           ` Kirill A. Shutemov
@ 2018-08-01 21:55                             ` Hugh Dickins
  2018-08-02 19:12                               ` John Stultz
  0 siblings, 1 reply; 32+ messages in thread
From: Hugh Dickins @ 2018-08-01 21:55 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Hugh Dickins, Linus Torvalds, Tony Luck, Amit Pundir,
	John Stultz, Matthew Wilcox, Kirill A. Shutemov, Andrew Morton,
	Dmitry Vyukov, Oleg Nesterov, Andrea Arcangeli,
	Greg Kroah-Hartman, linux-mm, Linux Kernel Mailing List,
	youling 257, Joel Fernandes, Colin Cross

On Wed, 1 Aug 2018, Kirill A. Shutemov wrote:
> On Wed, Aug 01, 2018 at 11:31:52AM -0700, Hugh Dickins wrote:
> > On Wed, 1 Aug 2018, Linus Torvalds wrote:
> > > 
> > > Anyway, the upshot of all this is that I think I know what the ia64
> > > problem was, and John sent the patch for the ashmem case, and I'm
> > > going to hold off reverting that vma_is_anonymous() false-positives
> > > commit after all.
> > 
> > I'd better send deletion of zap_pmd_range()'s VM_BUG_ON_VMA(): below
> > (but I've no proprietorial interest, if you prefer to do your own).
> 
> Agreed.
> 
> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

Thanks.

> 
> > John's patch is good, and originally I thought it was safe from that
> > VM_BUG_ON_VMA(), because the /dev/ashmem fd exposed to the user is
> > disconnected from the vm_file in the vma, and madvise(,,MADV_REMOVE)
> > insists on VM_SHARED. But afterwards read John's earlier mail,
> > drawing attention to the vfs_fallocate() in there: I may be wrong,
> > and I don't know if Android has THP in the config anyway, but it looks
> > to me like an unmap_mapping_range() from ashmem's vfs_fallocate()
> > could hit precisely the VM_BUG_ON_VMA(), once it's vma_is_anonymous().
> > 
> > (I'm not familiar with ashmem, and I certainly don't understand the
> > role of MAP_PRIVATE ashmem mappings - hole-punch's zap_pte_range()
> > should end up leaving any anon pages in place; but the presence of
> > the BUG is requiring us all to understand too much too quickly.)
> 
> Hugh, do you see any reason why ashmem shouldn't have vm_ops ==
> shmem_vm_ops?

I cannot immediately think of an absolute reason why not, but I'm
not giving it much thought; and that might turn it into a stranger
beast than it already is.

> 
> I don't understand ashmem, but I feel uncomfortable that we have this
> sneaky way to create an anonymous VMA. It feels wrong to me.

I agree it's odd, but in this respect it's no odder than /dev/zero:
that has exactly the same pattern of shmem_zero_setup() for VM_SHARED,
else vma_set_anonymous(): which made me comfortable with John's patch,
restoring the way it worked before.

Admittedly, the subsequent vfs_fallocate() might generate surprises;
and the business of doing a shmem_file_setup() first, and then undoing
it with a shmem_zero_setup(), looks weird - from John's old XXX comment,
I think it was a quick hack to piece together some functionality they
needed in a hurry, which never got revisited (they wanted a name for
the area? maybe memfd would be good for that now).

But if what's in there is working now, I do not want to mess with it:
I'd be adding bugs faster than removing them.

Hugh

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-08-01 17:15                       ` Linus Torvalds
                                           ` (2 preceding siblings ...)
  2018-08-01 20:05                         ` Linus Torvalds
@ 2018-08-02  6:59                         ` Amit Pundir
  3 siblings, 0 replies; 32+ messages in thread
From: Amit Pundir @ 2018-08-02  6:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: tony.luck, kirill, John Stultz, Hugh Dickins, willy,
	Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov,
	aarcange, Greg Kroah-Hartman, linux-mm, lkml, youling 257,
	Joel Fernandes, Colin Cross

On Wed, 1 Aug 2018 at 22:45, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I'd like to get this sorted out asap, although at this point I still
> think that I'll have to do an rc8 even though I feel like we may have
> caught everything.

No AOSP regressions in my limited smoke testing so far with
current HEAD: 6b4703768268 ("Merge branch 'fixes' of
git://git.armlinux.org.uk/~rmk/linux-arm"). Thanks.

Regards,
Amit Pundir

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-08-01 21:55                             ` Hugh Dickins
@ 2018-08-02 19:12                               ` John Stultz
  0 siblings, 0 replies; 32+ messages in thread
From: John Stultz @ 2018-08-02 19:12 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Kirill A. Shutemov, Linus Torvalds, Tony Luck, Amit Pundir,
	Matthew Wilcox, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov,
	Oleg Nesterov, Andrea Arcangeli, Greg Kroah-Hartman, linux-mm,
	Linux Kernel Mailing List, youling 257, Joel Fernandes,
	Colin Cross

On Wed, Aug 1, 2018 at 2:55 PM, Hugh Dickins <hughd@google.com> wrote:
> On Wed, 1 Aug 2018, Kirill A. Shutemov wrote:
>> On Wed, Aug 01, 2018 at 11:31:52AM -0700, Hugh Dickins wrote:
>> > On Wed, 1 Aug 2018, Linus Torvalds wrote:
>> > >
>> > > Anyway, the upshot of all this is that I think I know what the ia64
>> > > problem was, and John sent the patch for the ashmem case, and I'm
>> > > going to hold off reverting that vma_is_anonymous() false-positives
>> > > commit after all.
>> >
>> > I'd better send deletion of zap_pmd_range()'s VM_BUG_ON_VMA(): below
>> > (but I've no proprietorial interest, if you prefer to do your own).
>>
>> Agreed.
>>
>> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>
> Thanks.
>
>>
>> > John's patch is good, and originally I thought it was safe from that
>> > VM_BUG_ON_VMA(), because the /dev/ashmem fd exposed to the user is
>> > disconnected from the vm_file in the vma, and madvise(,,MADV_REMOVE)
>> > insists on VM_SHARED. But afterwards read John's earlier mail,
>> > drawing attention to the vfs_fallocate() in there: I may be wrong,
>> > and I don't know if Android has THP in the config anyway, but it looks
>> > to me like an unmap_mapping_range() from ashmem's vfs_fallocate()
>> > could hit precisely the VM_BUG_ON_VMA(), once it's vma_is_anonymous().
>> >
>> > (I'm not familiar with ashmem, and I certainly don't understand the
>> > role of MAP_PRIVATE ashmem mappings - hole-punch's zap_pte_range()
>> > should end up leaving any anon pages in place; but the presence of
>> > the BUG is requiring us all to understand too much too quickly.)
>>
>> Hugh, do you see any reason why ashmem shouldn't have vm_ops ==
>> shmem_vm_ops?
>
> I cannot immediately think of an absolute reason why not, but I'm
> not giving it much thought; and that might turn it into a stranger
> beast than it already is.
>
>>
>> I don't understand ashmem, but I feel uncomfortable that we have this
>> sneaky way to create an anonymous VMA. It feels wrong to me.
>
> I agree it's odd, but in this respect it's no odder than /dev/zero:
> that has exactly the same pattern of shmem_zero_setup() for VM_SHARED,
> else vma_set_anonymous(): which made me comfortable with John's patch,
> restoring the way it worked before.
>
> Admittedly, the subsequent vfs_fallocate() might generate surprises;
> and the business of doing a shmem_file_setup() first, and then undoing
> it with a shmem_zero_setup(), looks weird - from John's old XXX comment,

Yes, it is weird. :)

> I think it was a quick hack to piece together some functionality they
> needed in a hurry, which never got revisited (they wanted a name for
> the area? maybe memfd would be good for that now).

So my XXX comment is to do with a change I made to ashmem in order for
it to go into staging, compared with the original Android
implementation. They still carry the patch to undo it here:
https://android.googlesource.com/kernel/common.git/+/ebfc8d6476a66dc91a1b30cbfc18e67d4401af09%5E%21/

But it has more to do w/ in the shared mapping case, providing a
cleaner way of setting the vma->vm_ops = &shmem_vm_ops without having
to use shmem_zero_setup(), and doesn't change the behavior in the
private mapping case.

This discussion has spurred Joel and I to chat a bit about reviving
the effort to "properly" upstream ashmem. We're still in discussion
but I'm thinking we might just try to add the pin/unpin functionality
to memfd. We'll see how the discussion evolves.

thanks
-john

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2018-08-02 19:12 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-29 22:09 Linux 4.18-rc7 Linus Torvalds
2018-07-30  6:47 ` Amit Pundir
2018-07-30 13:01   ` Kirill A. Shutemov
2018-07-30 13:34     ` Amit Pundir
2018-07-30 17:32     ` Linus Torvalds
2018-07-30 21:53       ` Hugh Dickins
2018-07-31  1:01         ` Linus Torvalds
2018-07-31  3:26           ` Hugh Dickins
2018-07-31  4:25             ` John Stultz
2018-07-31  6:40               ` Amit Pundir
2018-07-31  6:56                 ` Kirill A. Shutemov
2018-07-31 16:29                 ` Linus Torvalds
2018-07-31 16:56                   ` John Stultz
2018-07-31 17:03                   ` Kirill A. Shutemov
2018-07-31 17:43                     ` Luck, Tony
2018-07-31 19:02                       ` Linus Torvalds
2018-08-01 17:15                       ` Linus Torvalds
2018-08-01 18:31                         ` Hugh Dickins
2018-08-01 20:58                           ` Kirill A. Shutemov
2018-08-01 21:55                             ` Hugh Dickins
2018-08-02 19:12                               ` John Stultz
2018-08-01 18:36                         ` Luck, Tony
2018-08-01 20:05                         ` Linus Torvalds
2018-08-01 20:51                           ` Kirill A. Shutemov
2018-08-01 20:56                             ` Linus Torvalds
2018-08-01 21:25                               ` Kirill A. Shutemov
2018-08-02  6:59                         ` Amit Pundir
2018-07-31 17:17                   ` [PATCH] staging: ashmem: Fix SIGBUS crash when traversing mmaped ashmem pages John Stultz
2018-07-31  6:29           ` Linux 4.18-rc7 Kirill A. Shutemov
2018-07-31 14:57             ` Kirill A. Shutemov
2018-08-01  0:09               ` Hugh Dickins
2018-07-30 13:24 ` Guenter Roeck

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).