linux-kernel.vger.kernel.org archive mirror
* Linux 4.9-rc6
@ 2016-11-20 22:05 Linus Torvalds
  2016-11-20 22:27 ` Eric Dumazet
  0 siblings, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2016-11-20 22:05 UTC (permalink / raw)
  To: Linux Kernel Mailing List

We're getting further in the rc series, and while things have stayed
pretty calm, I'm not sure if we're quite there yet. There's a few
outstanding issues that just shouldn't be issues at rc6 time, so we'll
just have to see. This may be one of those releases that have an rc8,
which considering the size of 4.9 is perhaps not that unusual.

That said, nothing particular is bothering me all that much, but we've
had some of the VMALLOC_STACK fixups continue to trickle in, so I
worry that we're not quite done there yet. And let's see what
Thorsten's regression list looks like next week. So no decision yet,
it could still go either way.

The fact that rc6 is bigger than rc5 was is not a particularly great
sign, though. But most of that seems to be just the usual timing
fluctuation: rc6 had networking updates, rc5 didn't, for example.
There are also some rdma updates etc that stand out. Nothing that
looks particularly worrisome.

Aside from the aforementioned networking and rdma, there's gpu fixes,
some tooling and build fixes, and various arch updates (x86, powerpc,
arm, xtensa). And misc fixes all over (i2c, sound, fuse, kvm..)

Go forth and test,

                    Linus

---

Aaron Lu (1):
      mremap: fix race between mremap() and page cleanning

Abhi Das (1):
      fix iov_iter_advance() for ITER_PIPE

Adam Ford (2):
      ARM: dts: omap3: Fix memory node in Torpedo board
      ARM: omap3: Add missing memory node in SOM-LV

Alex Deucher (1):
      drm/amdgpu/powerplay: drop a redundant NULL check

Alex Hemme (1):
      i2c: i2c-mux-pca954x: fix deselect enabling for device-tree

Alexander Duyck (1):
      fib_trie: Correct /proc/net/route off by one error

Alexei Starovoitov (1):
      ftrace: Ignore FTRACE_FL_DISABLED while walking dyn_ftrace records

Allan Chou (1):
      Net Driver: Add Cypress GX3 VID=04b4 PID=3610.

Andreas Gruenbacher (1):
      xattr: Fix setting security xattrs on sockfs

Andrew Donnellan (1):
      powerpc/oops: Fix missing pr_cont()s in instruction dump

Andy Gospodarek (1):
      bgmac: stop clearing DMA receive control register right after it is set

Aneesh Kumar K.V (1):
      powerpc/mm: Fix missing update of HID register on secondary CPUs

Arkadi Sharshevsky (1):
      mlxsw: spectrum_router: Correctly dump neighbour activity

Arnd Bergmann (4):
      brcmfmac: avoid maybe-uninitialized warning in brcmf_cfg80211_start_ap
      netfilter: ip_vs_sync: fix bogus maybe-uninitialized warning
      vxlan: hide unused local variable
      crypto: caam - fix type mismatch warning

Axl-zhang (1):
      dmaengine: sun6i: fix the uninitialized value for v_lli

Azhar Shaikh (1):
      mfd: intel-lpss: Do not put device in reset state on suspend

Baoquan He (2):
      Revert "bnx2: Reset device during driver initialization"
      bnx2: Wait for in-flight DMA to complete at probe stage

Bart Van Assche (1):
      nvmet-rdma: Fix possible NULL deref when handling rdma cm events

Baruch Siach (1):
      net: bpqether.h: remove if_ether.h guard

Benjamin Herrenschmidt (1):
      powerpc/64: Fix setting of AIL in hypervisor mode

Benjamin Poirier (1):
      bna: Add synchronization for tx ring.

Bert Kenward (1):
      sfc: clear napi_hash state when copying channels

Bibby Hsieh (3):
      drm/mediatek: fix a typo of OD_CFG to OD_RELAYMODE
      drm/mediatek: set vblank_disable_allowed to true
      drm/mediatek: clear IRQ status before enable OVL interrupt

Borislav Petkov (2):
      x86/efi: Fix EFI memmap pointer size warning
      kbuild: Steal gcc's pie from the very beginning

Chris Metcalf (1):
      tile: handle __ro_after_init like parisc does

Chris Wilson (1):
      drm/i915: Mark CPU cache as dirty when used for rendering

Christoph Hellwig (1):
      nvme-rdma: reject non-connect commands before the queue is live

Christophe JAILLET (2):
      drm/sun4i: Fix error handling
      drm/sun4i: Propagate error to the caller

Christophe Jaillet (1):
      net/mlx5: Simplify a test

Colin Ian King (3):
      ARM: OMAP2+: PRM: initialize en_uart4_mask and grpsel_uart4_mask
      net: ethernet: ixp4xx_eth: fix spelling mistake in debug message
      ps3_gelic: fix spelling mistake in debug message

Cédric Le Goater (1):
      ipmi/bt-bmc: change compatible node to 'aspeed, ast2400-ibt-bmc'

Dan Carpenter (1):
      ntb_perf: potential info leak in debugfs

Daniel Borkmann (2):
      bpf: fix htab map destruction when extra reserve is in use
      bpf: fix map not being uncharged during map creation failure

Daniel Jurgens (2):
      IB/mlx5: Use cache line size to select CQE stride
      IB/mlx4: Check gid_index return value

Dasaratharaman Chandramouli (1):
      IB/hfi1: Fix ECN processing in prescan_rxq

Dave Airlie (2):
      Revert "drm/mediatek: fix a typo of OD_CFG to OD_RELAYMODE"
      Revert "drm/mediatek: set vblank_disable_allowed to true"

Dave Gerlach (1):
      ARM: AM43XX: Select OMAP_INTERCONNECT in Kconfig

Dave Jiang (1):
      ntb: ntb_hw_intel: init peer_addr in struct intel_ntb_dev

David Ahern (4):
      net: tcp: check skb is non-NULL for exact match on lookups
      net: icmp6_send should use dst dev to determine L3 domain
      net: icmp_route_lookup should use rt dev to determine L3 domain
      net: tcp response should set oif only if it is L3 master

Dennis Dalessandro (3):
      IB/rdmavt: rdmavt can handle non aligned page maps
      IB/hfi1: Remove leftover snoop references
      IB/hfi1: Remove incorrect IS_ERR check

Dongli Zhang (2):
      xen-netfront: do not cast grant table reference to signed short
      xen-netfront: cast grant table reference first to type int

Easwar Hariharan (2):
      IB/hfi1: Clean up unused argument
      IB/hfi1: Delete unused lock

Eli Cohen (2):
      IB/mlx5: Fix fatal error dispatching
      IB/mlx5: Fix NULL pointer dereference on debug print

Eli Cooper (2):
      ip6_tunnel: Clear IP6CB in ip6tunnel_xmit()
      ip6_udp_tunnel: remove unused IPCB related codes

Eric Biggers (2):
      fscrypto: don't use on-stack buffer for filename encryption
      fscrypto: don't use on-stack buffer for key derivation

Eric Dumazet (12):
      net: clear sk_err_soft in sk_clone_lock()
      net: mangle zero checksum in skb_checksum_help()
      tcp: fix potential memory corruption
      tcp: fix return value for partial writes
      dccp: do not release listeners too soon
      dccp: do not send reset to already closed sockets
      dccp: fix out of bound access in dccp_v4_err()
      netlink: netlink_diag_dump() runs without locks
      ipv6: dccp: fix out of bound access in dccp_v6_err()
      ipv6: dccp: add missing bind_conflict to dccp_ipv6_mapped
      net: __skb_flow_dissect() must cap its return value
      tcp: take care of truncations done by sk_filter()

Eugeniy Paltsev (1):
      drm/arcpgu: Accommodate adv7511 switch to DRM bridge

Fabian Mewes (1):
      Documentation: networking: dsa: Update tagging protocols

Fabio Estevam (1):
      ARM: dts: imx53-qsb: Fix regulator constraints

Florian Fainelli (1):
      net: stmmac: Fix lack of link transition for fixed PHYs

Florian Westphal (5):
      netfilter: conntrack: avoid excess memory allocation
      dctcp: avoid bogus doubling of cwnd after loss
      netfilter: connmark: ignore skbs with magic untracked conntrack objects
      netfilter: conntrack: fix CT target for UNSPEC helpers
      netfilter: conntrack: refine gc worker heuristics

Gao Feng (1):
      driver: macvlan: Destroy new macvlan port if macvlan_common_newlink failed.

Gregory CLEMENT (1):
      arm64: dts: marvell: Fix typo in label name on Armada 37xx

Guenter Roeck (1):
      r8152: Fix error path in open function

Guilherme G. Piccoli (1):
      ehea: fix operation state report

H. Nikolaus Schaller (4):
      dts: omap5: board-common: add phandle to reference Palmas gpadc
      dts: omap5: board-common: enable twl6040 headset jack detection
      ASoC: omap-abe-twl6040: fix typo in bindings documentation
      ARM: dts: omap5: board-common: fix wrong SMPS6 (VDD-DDR3) voltage

Haim Dreyfuss (1):
      iwlwifi: mvm: comply with fw_restart mod param on suspend

Hariprasad Shenai (1):
      cxgb4: correct device ID of T6 adapter

Heikki Krogerus (1):
      mfd: intel_soc_pmic_bxtwc: Fix usbc interrupt

Herbert Xu (1):
      crypto: algif_hash - Fix NULL hash crash with shash

Hoan Tran (1):
      mailbox: PCC: Fix lockdep warning when request PCC channel

Hugh Dickins (1):
      powerpc: Fix exception vector build with 2.23 era binutils

Hui Wang (1):
      ALSA: hda - add a new condition to check if it is thinkpad

Huy Nguyen (1):
      net/mlx5: Fix invalid pointer reference when prof_sel parameter is invalid

Icenowy Zheng (1):
      ARM: dts: sun8i: fix the pinmux for UART1

Ido Schimmel (2):
      mlxsw: spectrum: Fix incorrect reuse of MID entries
      mlxsw: spectrum_router: Flush FIB tables during fini

Ignacio Alvarado (1):
      KVM: Disable irq while unregistering user notifier

Ira Weiny (1):
      IB/hfi1: Fix rnr_timer addition

Isaac Boukris (1):
      unix: escape all null bytes in abstract unix domain socket

Iyappan Subramanian (2):
      drivers: net: xgene: fix: Disable coalescing on v1 hardware
      drivers: net: xgene: fix: Coalescing values for v2 hardware

Jakub Pawlak (2):
      IB/hfi1: Fix integrity check flags default values
      IB/hfi1: Fix status error code for unsupported packets

Jakub Sitnicki (1):
      ipv6: Don't use ufo handling on later transformed packets

Jarkko Nikula (1):
      mfd: lpss: Fix Intel Kaby Lake PCH-H properties

Javier Martinez Canillas (1):
      rtc: asm9260: fix module autoload

Jianxin Xiong (2):
      IB/hfi1: Fix a potential memory leak in hfi1_create_ctxts()
      IB/hfi1: Prevent hardware counter names from being cut off

Jiri Pirko (2):
      mlxsw: spectrum_router: Fix handling of neighbour structure
      mlxsw: spectrum_router: Ignore FIB notification events for non-init namespaces

Johan Hovold (5):
      phy: fix device reference leaks
      net: ethernet: ti: cpsw: fix device and of_node leaks
      net: ethernet: ti: davinci_emac: fix device reference leak
      net: hns: fix device reference leaks
      mfd: core: Fix device reference leak in mfd_clone_cell

Johannes Berg (1):
      iwlwifi: pcie: mark command queue lock with separate lockdep class

John Allen (1):
      ibmvnic: Start completion queue negotiation at server-provided optimum values

John W. Linville (1):
      netfilter: nf_tables: fix type mismatch with error return from nft_parse_u32_check

Jonathan Liu (1):
      drm/sun4i: rgb: Enable panel after controller

Junzhi Zhao (3):
      drm/mediatek: do mtk_hdmi_send_infoframe after HDMI clock enable
      drm/mediatek: enhance the HDMI driving current
      drm/mediatek: modify the factor to make the pll_rate set in the 1G-2G range

Jérémy Lefaure (1):
      dmaengine: mmp_tdma: add missing select GENERIC_ALLOCATOR in Kconfig

Kan Liang (1):
      perf/x86/intel/uncore: Add more Intel uncore IMC PCI IDs for SkyLake

Keith Busch (1):
      nvme/pci: Don't free queues on error

Keno Fischer (1):
      gpio: Remove GPIO_DEVRES option

Krzysztof Blaszkowski (2):
      IB/hfi1: Return ENODEV for unsupported PCI device ids.
      IB/hfi1: Relocate rcvhdrcnt module parameter check.

LABBE Corentin (1):
      rtc: cmos: remove all __exit_p annotations

Lance Richardson (2):
      ipv4: allow local fragmentation in ip_finish_output_gso()
      ipv4: update comment to document GSO fragmentation cases.

Leon Romanovsky (1):
      IB/core: Set routable RoCE gid type for ipv4/ipv6 networks

Linus Torvalds (3):
      Revert "printk: make reading the kernel log flush pending lines"
      ASoC: lpass-platform: fix uninitialized variable
      Linux 4.9-rc6

Linus Walleij (5):
      video: ARM CLCD: fix Vexpress regression
      i2c: mux: fix up dependencies
      gpio: do not double-check direction on sleeping chips
      gpio: tc3589x: fix up .get_direction()
      mfd: stmpe: Fix RESET regression on STMPE2401

Liping Zhang (6):
      netfilter: nft_dynset: fix panic if NFT_SET_HASH is not enabled
      netfilter: nf_tables: fix *leak* when expr clone fail
      netfilter: nf_tables: fix race when create new element in dynset
      netfilter: nf_tables: destroy the set if fail to add transaction
      netfilter: nft_dup: do not use sreg_dev if the user doesn't specify it
      netfilter: nf_tables: fix oops when inserting an element into a verdict map

Loic Pallardy (1):
      ARM: dts: STiH410-b2260: Fix typo in spi0 chipselect definition

Lokesh Vutla (1):
      rtc: omap: Fix selecting external osc

Luca Coelho (4):
      iwlwifi: mvm: use ssize_t for len in iwl_debugfs_mem_read()
      iwlwifi: mvm: fix d3_test with unified D0/D3 images
      iwlwifi: pcie: fix SPLC structure parsing
      iwlwifi: mvm: fix netdetect starting/stopping for unified images

Lukas Resch (1):
      can: sja1000: plx_pci: Add support for Moxa CAN devices

Lukas Wunner (1):
      x86/platform/intel-mid: Retrofit pci_platform_pm_ops ->get_state hook

Lv Zheng (1):
      tools/power/acpi: Remove direct kernel source include reference

Maciej Żenczykowski (1):
      net-ipv6: on device mtu change do not add mtu to mtu-less routes

Majd Dibbiny (1):
      IB/mlx5: Fix memory leak in query device

Maor Gottlieb (1):
      IB/mlx5: Validate requested RQT size

Marcelo Ricardo Leitner (1):
      sctp: assign assoc_id earlier in __sctp_connect

Marcin Wojtas (2):
      arm64: dts: marvell: fix clocksource for CP110 slave SPI0
      arm64: dts: marvell: add unique identifiers for Armada A8k SPI controllers

Marek Szyprowski (1):
      ARM: 8628/1: dma-mapping: preallocate DMA-debug hash tables in core_initcall

Mario Kleiner (1):
      drm/amdgpu: Attach exclusive fence to prime exported bo's. (v5)

Mark Bloch (3):
      IB/cm: Mark stale CM id's whenever the mad agent was unregistered
      IB/core: Add missing check for addr_resolve callback return value
      IB/core: Avoid unsigned int overflow in sg_alloc_table

Mark Lord (1):
      r8152: Fix broken RX checksums.

Martin KaFai Lau (2):
      bpf: Fix bpf_redirect to an ipip/ip6tnl dev
      bpf: Add test for bpf_redirect to ipip/ip6tnl

Matan Barak (1):
      IB/mlx4: Fix create CQ error flow

Mathias Krause (1):
      rtnl: reset calcit fptr in rtnl_unregister()

Matt Fleming (1):
      x86/efi: Prevent mixed mode boot corruption with CONFIG_VMAP_STACK=y

Mauro Carvalho Chehab (1):
      gp8psk-fe: add missing MODULE_foo() macros

Max Filippov (2):
      xtensa: clean up printk usage for boot/crash logging
      xtensa: wire up new pkey_{mprotect,alloc,free} syscalls

Maxime Ripard (1):
      drm/sun4i: rgb: Remove the bridge enable/disable functions

Michael Chan (2):
      bnxt_en: Fix ring arithmetic in bnxt_setup_tc().
      bnxt_en: Fix VF virtual link state.

Michael Ellerman (3):
      powerpc/oops: Fix missing pr_cont()s in show_stack()
      powerpc/oops: Fix missing pr_cont()s in print_msr_bits() et. al.
      powerpc/oops: Fix missing pr_cont()s in show_regs()

Michael Neuling (1):
      powerpc/mm/radix: Invalidate ERAT on tlbiel for POWER9 DD1

Michael S. Tsirkin (1):
      virtio-net: drop legacy features in virtio 1 mode

Mike Frysinger (1):
      Revert "include/uapi/linux/atm_zatm.h: include linux/time.h"

Mike Marshall (1):
      orangefs: add .owner to debugfs file_operations

Miklos Szeredi (2):
      fuse: fix root dentry initialization
      fuse: fix fuse_write_end() if zero bytes were copied

Mintz, Yuval (2):
      qede: Fix statistics' strings for Tx/Rx queues
      qede: Correctly map aggregation replacement pages

Monk Liu (1):
      drm/amdgpu:fix vpost_needed routine

Moshe Lazer (1):
      IB/mlx5: Resolve soft lock on massive reg MRs

Namhyung Kim (5):
      perf hist browser: Fix hierarchy column counts
      perf hists browser: Fix indentation of folded sign on --hierarchy
      perf hists browser: Show folded sign properly on --hierarchy
      perf hists browser: Fix column indentation on --hierarchy
      perf hists: Fix column length on --hierarchy

Nicholas Mc Guire (2):
      ntb_transport: make DMA_OUT_RESOURCE_TO HZ independent
      ntb: make DMA_OUT_RESOURCE_TO HZ independent

Nicholas Piggin (4):
      kbuild: prevent lib-ksyms.o rebuilds
      kbuild: modversions for EXPORT_SYMBOL() for asm
      kbuild: be more careful about matching preprocessed asm ___EXPORT_SYMBOL
      powerpc/64s: Fix system reset interrupt winkle wakeups

Nicolae Rosia (1):
      ARM: OMAP2+: avoid NULL pointer dereference

Nicolas Pitre (1):
      ARM: 8624/1: proc-v7m.S: fix init section name

Oliver Hartkopp (1):
      can: bcm: fix warning in bcm_connect/proc_register

Or Gerlitz (3):
      net/mlx5e: Disallow changing name-space for VF representors
      net/mlx5e: Handle matching on vlan priority for offloaded TC rules
      net/mlx5: E-Switch, Set the actions for offloaded rules properly

Paolo Bonzini (5):
      KVM: x86: do not go through vcpu in __get_kvmclock_ns
      kvm: kvmclock: let KVM_GET_CLOCK return whether the master clock is in use
      KVM: async_pf: avoid recursive flushing of work items
      KVM: x86: fix missed SRCU usage in kvm_lapic_set_vapic_addr
      kvm: x86: merge kvm_arch_set_irq and kvm_arch_set_irq_inatomic

Pavel Machek (1):
      MAINTAINERS: Add LED subsystem co-maintainer

Peter Rosin (1):
      i2c: Documentation: i2c-topology: fix minor whitespace nit

Phil Reid (2):
      gpio: pca953x: Fix corruption of other gpios in set_multiple.
      gpio: pca953x: Move memcpy into mutex lock for set multiple

Rafael J. Wysocki (1):
      Revert "ACPICA: FADT support cleanup"

Rafał Miłecki (1):
      net: bgmac: fix reversed checks for clock control flag

Ram Amrani (2):
      qed: configure ll2 RoCE v1/v2 flavor correctly
      qed: Correct rdma params configuration

Russell King (3):
      net: mv643xx_eth: ensure coalesce settings survive read-modify-write
      ARM: fix backtrace
      ARM: Fix XIP kernels

Saeed Mahameed (3):
      MAINTAINERS: Update MELLANOX MLX5 core VPI driver maintainers
      net/mlx5e: Fix XDP error path of mlx5e_open_channel()
      net/mlx5e: Re-arrange XDP SQ/CQ creation

Sagi Grimberg (3):
      nvmet: Don't queue fatal error work if csts.cfs is set
      nvmet-rdma: don't forget to delete a queue from the list of connection failed
      nvmet-rdma: drain the queue-pair just before freeing it

Sara Sharon (1):
      iwlwifi: mvm: wake the wait queue when the RX sync counter is zero

Scott Mayhew (1):
      sunrpc: svc_age_temp_xprts_now should not call setsockopt non-tcp transports

Sebastian Andrzej Siewior (3):
      kbuild: add -fno-PIE
      scripts/has-stack-protector: add -fno-PIE
      x86/kexec: add -fno-PIE

Soheil Hassas Yeganeh (1):
      sock: fix sendmmsg for partial sendmsg

Stefan Agner (3):
      drm/fsl-dcu: do not update when modifying irq registers
      drm/fsl-dcu: update all registers on flush
      drm/fsl-dcu: disable planes before disabling CRTC

Stephen Suryaputra Lin (1):
      ipv4: use new_gw for redirect neigh lookup

Steve Wise (3):
      nvme-rdma: stop and free io queues on connect failure
      iw_cxgb4: set *bad_wr for post_send/post_recv errors
      iw_cxgb4: invalidate the mr when posting a read_w_inv wr

Steven Rostedt (Red Hat) (1):
      ftrace: Add more checks for FTRACE_FL_DISABLED in processing ip records

Sven Ebenfeld (1):
      crypto: caam - do not register AES-XTS mode on LP units

Tadeusz Struk (2):
      IB/hfi1: Remove redundant sysfs irq affinity entry
      IB/hfi1: Fix an Oops on pci device force remove

Takashi Iwai (2):
      ALSA: hda - Fix mic regression by ASRock mobo fixup
      ALSA: usb-audio: Fix use-after-free of usb_device at disconnect

Tariq Toukan (2):
      Revert "net/mlx4_en: Fix panic during reboot"
      IB/uverbs: Fix leak of XRC target QPs

Tero Kristo (1):
      rtc: omap: prevent disabling of clock/module during suspend

Theodore Ts'o (1):
      ext4: sanity check the block and cluster size at mount time

Thomas Falcon (2):
      ibmvnic: Unmap ibmvnic_statistics structure
      ibmvnic: Fix size of debugfs name buffer

Thomas Gleixner (2):
      genirq: Use irq type from irqdata instead of irqdesc
      x86/cpu: Deal with broken firmware (VMWare/XEN)

Timur Tabi (3):
      net: qcom/emac: use correct value for SGMII_LN_UCDR_SO_GAIN_MODE0
      net: qcom/emac: configure the external phy to allow pause frames
      net: qcom/emac: enable flow control if requested

Tony Lindgren (5):
      ARM: OMAP3: Fix formatting of features printed
      dmaengine: cppi41: Fix list not empty warning on module removal
      dmaengine: cppi41: Fix unpaired pm runtime when only a USB hub is connected
      dmaengine: cpp41: Fix handling of error path
      dmaengine: cppi41: More PM runtime fixes

Ulrich Weber (1):
      netfilter: nf_conntrack_sip: extend request line validation

Ville Syrjälä (4):
      rtc: cmos: Don't enable interrupts in the middle of the interrupt handler
      drm/i915: Grab the rotation from the passed plane state for VLV sprites
      drm/i915: Refresh that status of MST capable connectors in ->detect()
      drm/i915: Assume non-DP++ port if dvo_port is HDMI and there's no AUX ch specified in the VBT

WANG Cong (4):
      inet: fix sleeping inside inet_wait_for_connect()
      genetlink: fix a memory leak on error path
      taskstats: fix the length of cgroupstats_cmd_get_policy
      ipvs: use IPVS_CMD_ATTR_MAX for family.maxattr

Wei Huang (2):
      arm64: KVM: pmu: Fix AArch32 cycle counter access
      KVM: arm64: Fix the issues when guest PMCCFILTR is configured

Wei Yongjun (4):
      dmaengine: edma: Fix error return code in edma_alloc_chan_resources()
      ntb_pingpong: Fix db_init parameter description
      NTB: ntb_hw_intel: Fix typo in module parameter descriptions
      i2c: digicolor: use clk_disable_unprepare instead of clk_unprepare

Wolfram Sang (1):
      i2c: mux: demux-pinctrl: make drivers with no pinctrl work again

Xin Long (5):
      ipv6: add mtu lock check in __ip6_rt_update_pmtu
      sctp: hold transport instead of assoc in sctp_diag
      sctp: return back transport in __sctp_rcv_init_lookup
      sctp: hold transport instead of assoc when lookup assoc in rx path
      sctp: change sk state only when it has assocs in sctp_shutdown

Yazen Ghannam (1):
      x86/cpu/AMD: Fix cpu_llc_id for AMD Fam17h systems

Yonatan Cohen (4):
      IB/rxe: Fix kernel panic in UDP tunnel with GRO and RX checksum
      IB/rxe: Fix handling of erroneous WR
      IB/rxe: Clear queue buffer when modifying QP to reset
      IB/rxe: Update qp state for user query

Yotam Gigi (1):
      mlxsw: spectrum: Fix refcount bug on span entries


* Re: Linux 4.9-rc6
  2016-11-20 22:05 Linux 4.9-rc6 Linus Torvalds
@ 2016-11-20 22:27 ` Eric Dumazet
  2016-11-20 23:27   ` Linus Torvalds
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2016-11-20 22:27 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel Mailing List

On Sun, 2016-11-20 at 14:05 -0800, Linus Torvalds wrote:

> That said, nothing particular is bothering me all that much, but we've
> had some of the VMALLOC_STACK fixups continue to trickle in, so I
> worry that we're not quite done there yet. And let's see what
> Thorsten's regression list looks like next week. So no decision yet,
> it could still go either way.

Hosts with ~100,000 threads have an issue with /proc/vmallocinfo

It can take about 800 usec to skip over ~100,000 struct vmap_area
in s_start(), while holding vmap_area_lock spinlock, and therefore
blocking fork()/pthread_create().

I presume we can not switch to the rbtree (vmap_area_root)
for /proc/vmallocinfo, because this file is seek-able, right ?
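The cost described above comes from re-finding a position in a linked list on every read. A minimal user-space sketch of that pattern follows; `struct node` and `seek_by_walking()` are hypothetical names for illustration and are not the kernel's s_start(). The step count stands in for the time spent with the lock held.

```c
#include <stddef.h>

/* Hypothetical stand-in for one entry of a list of areas. */
struct node {
	int id;
	struct node *next;
};

/*
 * Return the entry at index pos, or NULL if the list is shorter.
 * *steps counts the entries skipped, which models the work done
 * while the list lock would be held.
 */
static struct node *seek_by_walking(struct node *head, long pos, long *steps)
{
	struct node *n = head;

	*steps = 0;
	while (n && pos > 0) {
		n = n->next;
		pos--;
		(*steps)++;
	}
	return n;
}
```

With roughly 100,000 entries, a reader that resumes near the end of the list pays for close to 100,000 skips on each call, all of it inside the locked region.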


* Re: Linux 4.9-rc6
  2016-11-20 22:27 ` Eric Dumazet
@ 2016-11-20 23:27   ` Linus Torvalds
  2016-11-21  1:35     ` Al Viro
  0 siblings, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2016-11-20 23:27 UTC (permalink / raw)
  To: Eric Dumazet, Al Viro; +Cc: Linux Kernel Mailing List

On Sun, Nov 20, 2016 at 2:27 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> Hosts with ~100,000 threads have an issue with /proc/vmallocinfo
>
> It can take about 800 usec to skip over ~100,000 struct vmap_area
> in s_start(), while holding vmap_area_lock spinlock, and therefore
> blocking fork()/pthread_create().
>
> I presume we can not switch to the rbtree (vmap_area_root)
> for /proc/vmallocinfo, because this file is seek-able, right ?

Well, the good news is that the file is root-only anyway, which means
that at least it won't have the issue that a lot of other /proc files
have had - namely being opened by random user programs or libraries.

Which means that the users of it are likely fairly limited.

Which in turn means that we can probably afford to play more games
with it. Including, for example, possibly marking it non-seekable.

Or even just limit the maximum entries we are willing to walk.

Or we could decide that that file shouldn't be a seq_file at all, use
the old "one page buffer" approach that was so common for /proc files,
and make the position encode the vmalloc address in it (make the lower
PAGE_MASK bits be the offset in the line), and then we *could* just
look things up using the rbtree method.

Al, do you have any clever ideas?

                 Linus
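
The position-encoding idea in the message above can be sketched in user space as follows. This is only an illustration under stated assumptions: 4 KiB pages, page-aligned area addresses, and the hypothetical helper names `pos_encode()`, `pos_area_addr()` and `pos_line_off()`, none of which exist in the kernel.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumes 4 KiB pages */
#define SKETCH_PAGE_SIZE  ((uint64_t)1 << SKETCH_PAGE_SHIFT)

/*
 * Pack a page-aligned area address and a byte offset into that area's
 * output line into one position value. The offset must be smaller than
 * SKETCH_PAGE_SIZE, which matches a one-page output buffer.
 */
static uint64_t pos_encode(uint64_t area_addr, uint64_t line_off)
{
	return (area_addr & ~(SKETCH_PAGE_SIZE - 1)) |
	       (line_off & (SKETCH_PAGE_SIZE - 1));
}

/* Upper bits: the address to look up in the tree. */
static uint64_t pos_area_addr(uint64_t pos)
{
	return pos & ~(SKETCH_PAGE_SIZE - 1);
}

/* Lower bits: how far into that area's line the reader already got. */
static uint64_t pos_line_off(uint64_t pos)
{
	return pos & (SKETCH_PAGE_SIZE - 1);
}
```

A real file position is a signed loff_t, so a kernel version would presumably need to store the address relative to the start of the vmalloc range; that detail is left out of the sketch.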


* Re: Linux 4.9-rc6
  2016-11-20 23:27   ` Linus Torvalds
@ 2016-11-21  1:35     ` Al Viro
  2016-11-21  4:59       ` Eric Dumazet
  0 siblings, 1 reply; 12+ messages in thread
From: Al Viro @ 2016-11-21  1:35 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Eric Dumazet, Linux Kernel Mailing List

On Sun, Nov 20, 2016 at 03:27:07PM -0800, Linus Torvalds wrote:
> On Sun, Nov 20, 2016 at 2:27 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> > Hosts with ~100,000 threads have an issue with /proc/vmallocinfo
> >
> > It can take about 800 usec to skip over ~100,000 struct vmap_area
> > in s_start(), while holding vmap_area_lock spinlock, and therefore
> > blocking fork()/pthread_create().
> >
> > I presume we can not switch to the rbtree (vmap_area_root)
> > for /proc/vmallocinfo, because this file is seek-able, right ?
> 
> Well, the good news is that the file is root-only anyway, which means
> that at least it won't have the issue that a lot of other /proc files
> have had - namely being opened by random user programs or libraries.
> 
> Which means that the users of it are likely fairly limited.
> 
> Which in turn means that we can probably afford to play more games
> with it. Including, for example, possibly marking it non-seekable.
> 
> Or even just limit the maximum entries we are willing to walk.
> 
> Or we could decide that that file shouldn't be a seq_file at all, use
> the old "one page buffer" approach that was so common for /proc files,
> and make the position encode the vmalloc address in it (make the lower
> PAGE_MASK bits be the offset in the line), and then we *could* just
> look things up using the rbtree method.
> 
> Al, do you have any clever ideas?

Umm...  One possibility would be something like fs/namespace.c:m_start() -
if nothing has changed since the last time, just use a cached pointer.
That has sped up the damn thing (/proc/mounts et al.) in a big way, but it's
dependent upon having an event count updated whenever we change the
mount tree - doing the same for vmap_area list might or might not be
a good idea.  /proc/mounts and friends get ->poll() on that as well;
that probably would _not_ be a good idea in this case.
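
The cached-pointer approach described above could look roughly like this in user space. It is a sketch under assumptions, not the fs/namespace.c code: `struct area_list`, `struct cursor` and `cursor_start()` are hypothetical, and the generation field is assumed to be bumped by every insert or remove.

```c
#include <stddef.h>

struct entry {
	struct entry *next;
};

struct area_list {
	struct entry *head;
	unsigned long generation;	/* bumped on every insert or remove */
};

/* Per-reader state remembered between two read() calls. */
struct cursor {
	unsigned long generation;
	long pos;
	struct entry *at;
};

/*
 * Find the entry at index pos. If the list is unchanged and the reader
 * resumes exactly where it stopped, reuse the cached pointer; otherwise
 * walk from the head. *steps reports how many entries were skipped.
 */
static struct entry *cursor_start(struct area_list *l, struct cursor *c,
				  long pos, long *steps)
{
	struct entry *e = l->head;
	long n = pos;

	*steps = 0;
	if (c->at && c->generation == l->generation && c->pos == pos)
		return c->at;

	while (e && n > 0) {
		e = e->next;
		n--;
		(*steps)++;
	}
	c->generation = l->generation;
	c->pos = pos;
	c->at = e;
	return e;
}
```

The cached pointer is only trusted while the generation matches, so a removed entry is never dereferenced. On a list that changes with every fork(), though, the cache would rarely hit.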


* Re: Linux 4.9-rc6
  2016-11-21  1:35     ` Al Viro
@ 2016-11-21  4:59       ` Eric Dumazet
  2016-11-21  8:34         ` David Rientjes
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2016-11-21  4:59 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, Linux Kernel Mailing List

On Mon, 2016-11-21 at 01:35 +0000, Al Viro wrote:

> 
> Umm...  One possibility would be something like fs/namespace.c:m_start() -
> if nothing has changed since the last time, just use a cached pointer.
> That has sped up the damn thing (/proc/mounts et al.) in a big way, but it's
> dependent upon having an event count updated whenever we change the
> mount tree - doing the same for vmap_area list might or might not be
> a good idea.  /proc/mounts and friends get ->poll() on that as well;
> that probably would _not_ be a good idea in this case.

Yes, a generation number could help in some cases.

Another potential issue with CONFIG_VMAP_STACK is that we make no
attempt to allocate 4 consecutive pages.

Even if we have plenty of memory, 4 calls to alloc_page() are likely to
give us 4 pages in completely different locations.

Here I printed the hugepage number of the 4 pages for some stacks :


0xffffc9001a07c000-0xffffc9001a081000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfcac Hfeba Hfec0 Hfc9d N0=4
0xffffc9001a084000-0xffffc9001a089000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfc79 Hfc79 Hfc79 Hfc83 N0=4
0xffffc9001a08c000-0xffffc9001a091000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfc9b Hfe91 Hfebe Hfca2 N0=4
0xffffc9001a094000-0xffffc9001a099000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfcaa Hfcaa Hfca6 Hfebc N0=4
0xffffc9001a09c000-0xffffc9001a0a1000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfe9b Hfe90 Hff09 Hfefb N0=4
0xffffc9001a0a4000-0xffffc9001a0a9000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfe94 Hfe62 Hfea0 Hfe7b N0=4
0xffffc9001a0ac000-0xffffc9001a0b1000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfe78 Hff05 Hff05 Hfc74 N0=4
0xffffc9001a0b4000-0xffffc9001a0b9000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfc9b Hfc9b Hfe83 Hf782 N0=4
0xffffc9001a0bc000-0xffffc9001a0c1000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfe78 Hfe78 Hfc7f Hfc7f N0=4
0xffffc9001a0c4000-0xffffc9001a0c9000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfebe Hfebe Hfe82 Hfe85 N0=4
0xffffc9001a0cc000-0xffffc9001a0d1000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfc6b Hfe62 Hfe62 Hfcaa N0=4
0xffffc9001a0d4000-0xffffc9001a0d9000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfebd Hfebd Hfc92 Hfc92 N0=4
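
How the H values in such a listing could be derived is sketched below. The message does not say how they were computed, so this assumes 4 KiB base pages and 2 MiB huge pages; `hugepage_index()` and `distinct_hugepages()` are hypothetical helpers, not kernel functions.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT  12	/* assumes 4 KiB base pages */
#define SKETCH_HPAGE_SHIFT 21	/* assumes 2 MiB huge pages */

/* Index of the huge-page-sized frame that contains page frame pfn. */
static uint64_t hugepage_index(uint64_t pfn)
{
	return pfn >> (SKETCH_HPAGE_SHIFT - SKETCH_PAGE_SHIFT);
}

/* Count how many distinct huge-page frames a set of pages touches. */
static unsigned int distinct_hugepages(const uint64_t *pfns, unsigned int n)
{
	unsigned int distinct = 0;

	for (unsigned int i = 0; i < n; i++) {
		unsigned int j;

		for (j = 0; j < i; j++)
			if (hugepage_index(pfns[j]) == hugepage_index(pfns[i]))
				break;
		if (j == i)
			distinct++;
	}
	return distinct;
}
```

Under those assumptions, the first stack in the listing touches four distinct huge-page frames, whereas four physically contiguous pages would normally touch one.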

Is this a generic vmalloc() issue that is worth fixing now?

Note this RFC might conflict with NUMA interleave policy.

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index f2481cb4e6b2..0123e97debb9 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1602,9 +1602,10 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 				 pgprot_t prot, int node)
 {
 	struct page **pages;
-	unsigned int nr_pages, array_size, i;
+	unsigned int nr_pages, array_size, i, j;
 	const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
 	const gfp_t alloc_mask = gfp_mask | __GFP_NOWARN;
+	const gfp_t multi_alloc_mask = (gfp_mask & ~__GFP_DIRECT_RECLAIM) | __GFP_NORETRY;
 
 	nr_pages = get_vm_area_size(area) >> PAGE_SHIFT;
 	array_size = (nr_pages * sizeof(struct page *));
@@ -1624,20 +1625,34 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 		return NULL;
 	}
 
-	for (i = 0; i < area->nr_pages; i++) {
-		struct page *page;
-
-		if (node == NUMA_NO_NODE)
-			page = alloc_page(alloc_mask);
-		else
-			page = alloc_pages_node(node, alloc_mask, 0);
+	for (i = 0; i < area->nr_pages;) {
+		struct page *page = NULL;
+		unsigned int chunk_order = min(ilog2(area->nr_pages - i), MAX_ORDER - 1);
+
+		while (chunk_order && !page) {
+			if (node == NUMA_NO_NODE)
+				page = alloc_pages(multi_alloc_mask, chunk_order);
+			else
+				page = alloc_pages_node(node, multi_alloc_mask, chunk_order);
+			if (page)
+				split_page(page, chunk_order);
+			else
+				chunk_order--;
+		}
+		if (!page) {
+			if (node == NUMA_NO_NODE)
+				page = alloc_pages(alloc_mask, 0);
+			else
+				page = alloc_pages_node(node, alloc_mask, 0);
+		}
 
 		if (unlikely(!page)) {
 			/* Successfully allocated i pages, free them in __vunmap() */
 			area->nr_pages = i;
 			goto fail;
 		}
-		area->pages[i] = page;
+		for (j = 0; j < (1 << chunk_order); j++)
+			area->pages[i++] = page++;
 		if (gfpflags_allow_blocking(gfp_mask))
 			cond_resched();
 	}

* Re: Linux 4.9-rc6
  2016-11-21  4:59       ` Eric Dumazet
@ 2016-11-21  8:34         ` David Rientjes
  2016-11-21 13:32           ` Eric Dumazet
  0 siblings, 1 reply; 12+ messages in thread
From: David Rientjes @ 2016-11-21  8:34 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Al Viro, Linus Torvalds, Linux Kernel Mailing List

On Sun, 20 Nov 2016, Eric Dumazet wrote:

> Another potential issue with CONFIG_VMAP_STACK is that we make no
> attempt to allocate 4 consecutive pages.
> 
> Even if we have plenty of memory, 4 calls to alloc_page() are likely to
> give us 4 pages in completely different locations.
> 
> Here I printed the hugepage number of the 4 pages for some stacks :
> 
> 
> 0xffffc9001a07c000-0xffffc9001a081000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfcac Hfeba Hfec0 Hfc9d N0=4
> 0xffffc9001a084000-0xffffc9001a089000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfc79 Hfc79 Hfc79 Hfc83 N0=4
> 0xffffc9001a08c000-0xffffc9001a091000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfc9b Hfe91 Hfebe Hfca2 N0=4
> 0xffffc9001a094000-0xffffc9001a099000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfcaa Hfcaa Hfca6 Hfebc N0=4
> 0xffffc9001a09c000-0xffffc9001a0a1000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfe9b Hfe90 Hff09 Hfefb N0=4
> 0xffffc9001a0a4000-0xffffc9001a0a9000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfe94 Hfe62 Hfea0 Hfe7b N0=4
> 0xffffc9001a0ac000-0xffffc9001a0b1000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfe78 Hff05 Hff05 Hfc74 N0=4
> 0xffffc9001a0b4000-0xffffc9001a0b9000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfc9b Hfc9b Hfe83 Hf782 N0=4
> 0xffffc9001a0bc000-0xffffc9001a0c1000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfe78 Hfe78 Hfc7f Hfc7f N0=4
> 0xffffc9001a0c4000-0xffffc9001a0c9000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfebe Hfebe Hfe82 Hfe85 N0=4
> 0xffffc9001a0cc000-0xffffc9001a0d1000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfc6b Hfe62 Hfe62 Hfcaa N0=4
> 0xffffc9001a0d4000-0xffffc9001a0d9000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfebd Hfebd Hfc92 Hfc92 N0=4
> 
> This is a vmalloc() generic issue that is worth fixing now ?
> 
> Note this RFC might conflict with NUMA interleave policy.
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index f2481cb4e6b2..0123e97debb9 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1602,9 +1602,10 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>  				 pgprot_t prot, int node)
>  {
>  	struct page **pages;
> -	unsigned int nr_pages, array_size, i;
> +	unsigned int nr_pages, array_size, i, j;
>  	const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
>  	const gfp_t alloc_mask = gfp_mask | __GFP_NOWARN;
> +	const gfp_t multi_alloc_mask = (gfp_mask & ~__GFP_DIRECT_RECLAIM) | __GFP_NORETRY;
>  
>  	nr_pages = get_vm_area_size(area) >> PAGE_SHIFT;
>  	array_size = (nr_pages * sizeof(struct page *));

I think multi_alloc_mask wants to use alloc_mask rather than gfp_mask 
before clearing the bit, otherwise the failed high-order allocations with 
no chance to reclaim will spew page allocation failure warnings.  Using 
__GFP_NORETRY here would be a no-op, but it depends on the implementation 
so no problems setting it.
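
A minimal user-space sketch of the mask derivation discussed here. The
DEMO_* bit values and helper names are hypothetical and chosen only for
illustration; the real flags are defined in include/linux/gfp.h. It shows
why starting from alloc_mask keeps the no-warn bit on the high-order
attempts, while starting from gfp_mask drops it:

```c
typedef unsigned int gfp_t;

/* Hypothetical bit values, for illustration only. */
#define DEMO_GFP_DIRECT_RECLAIM 0x01u
#define DEMO_GFP_NOWARN         0x02u
#define DEMO_GFP_NORETRY        0x04u

/* Mask for the order-0 fallback: the caller's flags plus "no warning". */
static gfp_t demo_alloc_mask(gfp_t gfp_mask)
{
	return gfp_mask | DEMO_GFP_NOWARN;
}

/* As posted in the RFC: built from gfp_mask, so the no-warn bit is lost. */
static gfp_t demo_multi_alloc_mask_rfc(gfp_t gfp_mask)
{
	return (gfp_mask & ~DEMO_GFP_DIRECT_RECLAIM) | DEMO_GFP_NORETRY;
}

/* As suggested: built from alloc_mask, so the no-warn bit is inherited. */
static gfp_t demo_multi_alloc_mask(gfp_t gfp_mask)
{
	return (demo_alloc_mask(gfp_mask) & ~DEMO_GFP_DIRECT_RECLAIM) |
	       DEMO_GFP_NORETRY;
}
```

In both variants direct reclaim is cleared for the opportunistic
high-order attempts; only the second one keeps failures silent.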

> @@ -1624,20 +1625,34 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>  		return NULL;
>  	}
>  
> -	for (i = 0; i < area->nr_pages; i++) {
> -		struct page *page;
> -
> -		if (node == NUMA_NO_NODE)
> -			page = alloc_page(alloc_mask);
> -		else
> -			page = alloc_pages_node(node, alloc_mask, 0);
> +	for (i = 0; i < area->nr_pages;) {
> +		struct page *page = NULL;
> +		unsigned int chunk_order = min(ilog2(area->nr_pages - i), MAX_ORDER - 1);
> +
> +		while (chunk_order && !page) {
> +			if (node == NUMA_NO_NODE)
> +				page = alloc_pages(multi_alloc_mask, chunk_order);
> +			else
> +				page = alloc_pages_node(node, multi_alloc_mask, chunk_order);
> +			if (page)
> +				split_page(page, chunk_order);
> +			else
> +				chunk_order--;
> +		}
> +		if (!page) {
> +			if (node == NUMA_NO_NODE)
> +				page = alloc_pages(alloc_mask, 0);
> +			else
> +				page = alloc_pages_node(node, alloc_mask, 0);
> +		}
>  
>  		if (unlikely(!page)) {
>  			/* Successfully allocated i pages, free them in __vunmap() */
>  			area->nr_pages = i;
>  			goto fail;
>  		}
> -		area->pages[i] = page;
> +		for (j = 0; j < (1 << chunk_order); j++)
> +			area->pages[i++] = page++;
>  		if (gfpflags_allow_blocking(gfp_mask))
>  			cond_resched();
>  	}
> 
> 
> 

* Re: Linux 4.9-rc6
  2016-11-21  8:34         ` David Rientjes
@ 2016-11-21 13:32           ` Eric Dumazet
  2016-11-21 13:51             ` Eric Dumazet
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2016-11-21 13:32 UTC (permalink / raw)
  To: David Rientjes; +Cc: Al Viro, Linus Torvalds, Linux Kernel Mailing List

On Mon, 2016-11-21 at 00:34 -0800, David Rientjes wrote:
> On Sun, 20 Nov 2016, Eric Dumazet wrote:
> 
> > Another potential issue with CONFIG_VMAP_STACK is that we make no
> > attempt to allocate 4 consecutive pages.
> > 
> > Even if we have plenty of memory, 4 calls to alloc_page() are likely to
> > give us 4 pages in completely different locations.
> > 
> > Here I printed the hugepage number of the 4 pages for some stacks :
> > 
> > 
> > 0xffffc9001a07c000-0xffffc9001a081000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfcac Hfeba Hfec0 Hfc9d N0=4
> > 0xffffc9001a084000-0xffffc9001a089000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfc79 Hfc79 Hfc79 Hfc83 N0=4
> > 0xffffc9001a08c000-0xffffc9001a091000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfc9b Hfe91 Hfebe Hfca2 N0=4
> > 0xffffc9001a094000-0xffffc9001a099000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfcaa Hfcaa Hfca6 Hfebc N0=4
> > 0xffffc9001a09c000-0xffffc9001a0a1000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfe9b Hfe90 Hff09 Hfefb N0=4
> > 0xffffc9001a0a4000-0xffffc9001a0a9000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfe94 Hfe62 Hfea0 Hfe7b N0=4
> > 0xffffc9001a0ac000-0xffffc9001a0b1000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfe78 Hff05 Hff05 Hfc74 N0=4
> > 0xffffc9001a0b4000-0xffffc9001a0b9000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfc9b Hfc9b Hfe83 Hf782 N0=4
> > 0xffffc9001a0bc000-0xffffc9001a0c1000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfe78 Hfe78 Hfc7f Hfc7f N0=4
> > 0xffffc9001a0c4000-0xffffc9001a0c9000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfebe Hfebe Hfe82 Hfe85 N0=4
> > 0xffffc9001a0cc000-0xffffc9001a0d1000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfc6b Hfe62 Hfe62 Hfcaa N0=4
> > 0xffffc9001a0d4000-0xffffc9001a0d9000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfebd Hfebd Hfc92 Hfc92 N0=4
> > 
> > This is a vmalloc() generic issue that is worth fixing now ?
> > 
> > Note this RFC might conflict with NUMA interleave policy.
> > 
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index f2481cb4e6b2..0123e97debb9 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -1602,9 +1602,10 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
> >  				 pgprot_t prot, int node)
> >  {
> >  	struct page **pages;
> > -	unsigned int nr_pages, array_size, i;
> > +	unsigned int nr_pages, array_size, i, j;
> >  	const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
> >  	const gfp_t alloc_mask = gfp_mask | __GFP_NOWARN;
> > +	const gfp_t multi_alloc_mask = (gfp_mask & ~__GFP_DIRECT_RECLAIM) | __GFP_NORETRY;
> >  
> >  	nr_pages = get_vm_area_size(area) >> PAGE_SHIFT;
> >  	array_size = (nr_pages * sizeof(struct page *));
> 
> I think multi_alloc_mask wants to use alloc_mask rather than gfp_mask 
> before clearing the bit, otherwise the failed high-order allocations with 
> no chance to reclaim will spew page allocation failure warnings.  Using 
> __GFP_NORETRY here would be a no-op, but it depends on the implementation 
> so no problems setting it.

Oh, this was definitely my intent of course, thanks for noticing this
typo ;)

* Re: Linux 4.9-rc6
  2016-11-21 13:32           ` Eric Dumazet
@ 2016-11-21 13:51             ` Eric Dumazet
  2016-11-21 16:49               ` Eric Dumazet
  2016-12-04 10:43               ` Thorsten Leemhuis
  0 siblings, 2 replies; 12+ messages in thread
From: Eric Dumazet @ 2016-11-21 13:51 UTC (permalink / raw)
  To: David Rientjes; +Cc: Al Viro, Linus Torvalds, Linux Kernel Mailing List

On Mon, 2016-11-21 at 05:32 -0800, Eric Dumazet wrote:

> 
> Oh, this was definitely my intent of course, thanks for noticing this
> typo ;)

V2 fixes this and brings back NUMA spreading
(e.g. for alloc_large_system_hash() done at boot time):


lpaa24:~# grep alloc_large /proc/vmallocinfo 
0xffffc90000009000-0xffffc9000000c000   12288 alloc_large_system_hash+0x178/0x238 pages=2 vmalloc N0=1 N1=1
0xffffc9000000c000-0xffffc9000000f000   12288 alloc_large_system_hash+0x178/0x238 pages=2 vmalloc N0=1 N1=1
0xffffc9000001e000-0xffffc9000009f000  528384 alloc_large_system_hash+0x178/0x238 pages=128 vmalloc N0=64 N1=64
0xffffc9000009f000-0xffffc900000e0000  266240 alloc_large_system_hash+0x178/0x238 pages=64 vmalloc N0=32 N1=32
0xffffc900001d3000-0xffffc900101d4000 268439552 alloc_large_system_hash+0x178/0x238 pages=65536 vmalloc vpages N0=32768 N1=32768
0xffffc900101d4000-0xffffc900181d5000 134221824 alloc_large_system_hash+0x178/0x238 pages=32768 vmalloc vpages N0=16384 N1=16384
0xffffc900181d5000-0xffffc900185d6000 4198400 alloc_large_system_hash+0x178/0x238 pages=1024 vmalloc vpages N0=512 N1=512
0xffffc900185d6000-0xffffc900189d7000 4198400 alloc_large_system_hash+0x178/0x238 pages=1024 vmalloc vpages N0=512 N1=512
0xffffc9001b271000-0xffffc9001b672000 4198400 alloc_large_system_hash+0x178/0x238 pages=1024 vmalloc vpages N0=512 N1=512
0xffffc9001b672000-0xffffc9001b675000   12288 alloc_large_system_hash+0x178/0x238 pages=2 vmalloc N0=1 N1=1
0xffffc9001b675000-0xffffc9001b776000 1052672 alloc_large_system_hash+0x178/0x238 pages=256 vmalloc N0=128 N1=128
0xffffc9001b776000-0xffffc9001b977000 2101248 alloc_large_system_hash+0x178/0x238 pages=512 vmalloc N0=256 N1=256
0xffffc9001b977000-0xffffc9001bb78000 2101248 alloc_large_system_hash+0x178/0x238 pages=512 vmalloc N0=256 N1=256
0xffffc9001c075000-0xffffc9001c176000 1052672 alloc_large_system_hash+0x178/0x238 pages=256 vmalloc N0=128 N1=128
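
The order cap that V2 applies under an interleave policy can be sketched
in user space as follows. This is only an illustration under stated
assumptions: demo_ilog2(), demo_max_node_order() and DEMO_MAX_ORDER (set
to 11 here) are stand-ins, not kernel symbols, and the caller passes the
node count directly instead of reading it from a mempolicy.

```c
#include <assert.h>

#define DEMO_MAX_ORDER 11	/* assumed value; configurable in real kernels */

/* floor(log2(n)) for n >= 1, a stand-in for the kernel's ilog2(). */
static int demo_ilog2(unsigned int n)
{
	int order = 0;

	assert(n >= 1);
	while (n >>= 1)
		order++;
	return order;
}

/*
 * Largest allocation order to attempt for an area of nr_pages (>= 1).
 * nr_nodes > 1 models an interleave policy over that many nodes: the
 * order is capped so that every node can still receive a share.
 */
static int demo_max_node_order(unsigned int nr_pages, unsigned int nr_nodes)
{
	int max_order = DEMO_MAX_ORDER - 1;

	if (nr_nodes > 1) {
		/* Same rounding as DIV_ROUND_UP(nr_pages, nr_nodes). */
		unsigned int pages_per_node =
			(nr_pages + nr_nodes - 1) / nr_nodes;
		int cap = demo_ilog2(pages_per_node);

		if (cap < max_order)
			max_order = cap;
	}
	return max_order;
}
```

With 128 pages and 2 nodes the cap is order 6, i.e. chunks of 64 pages,
which is consistent with the N0=64 N1=64 split shown above, assuming the
interleave policy alternates nodes on each allocation.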


 mm/vmalloc.c |   47 +++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 39 insertions(+), 8 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index f2481cb4e6b2..f4b9c9238f86 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -21,6 +21,7 @@
 #include <linux/debugobjects.h>
 #include <linux/kallsyms.h>
 #include <linux/list.h>
+#include <linux/mempolicy.h>
 #include <linux/notifier.h>
 #include <linux/rbtree.h>
 #include <linux/radix-tree.h>
@@ -1602,9 +1603,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 				 pgprot_t prot, int node)
 {
 	struct page **pages;
-	unsigned int nr_pages, array_size, i;
+	unsigned int nr_pages, array_size, i, j;
 	const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
 	const gfp_t alloc_mask = gfp_mask | __GFP_NOWARN;
+	const gfp_t multi_alloc_mask = (alloc_mask & ~__GFP_DIRECT_RECLAIM) | __GFP_NORETRY;
+	int max_node_order = MAX_ORDER - 1;
 
 	nr_pages = get_vm_area_size(area) >> PAGE_SHIFT;
 	array_size = (nr_pages * sizeof(struct page *));
@@ -1624,20 +1627,48 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 		return NULL;
 	}
 
-	for (i = 0; i < area->nr_pages; i++) {
-		struct page *page;
+	if (IS_ENABLED(CONFIG_NUMA) && nr_online_nodes > 1) {
+		struct mempolicy *policy = current->mempolicy;
+		int pages_per_node;
 
-		if (node == NUMA_NO_NODE)
-			page = alloc_page(alloc_mask);
-		else
-			page = alloc_pages_node(node, alloc_mask, 0);
+		if (policy && policy->mode == MPOL_INTERLEAVE) {
+			pages_per_node = DIV_ROUND_UP(nr_pages,
+						      nr_online_nodes);
+			max_node_order = min(max_node_order,
+					     ilog2(pages_per_node));
+		}
+	}
+
+	for (i = 0; i < area->nr_pages;) {
+		unsigned int chunk_order = min(ilog2(area->nr_pages - i),
+					       max_node_order);
+		struct page *page = NULL;
+
+		while (chunk_order) {
+			if (node == NUMA_NO_NODE)
+				page = alloc_pages(multi_alloc_mask, chunk_order);
+			else
+				page = alloc_pages_node(node, multi_alloc_mask, chunk_order);
+			if (page) {
+				split_page(page, chunk_order);
+				break;
+			}
+			chunk_order--;
+		}
+		if (!page) {
+			if (node == NUMA_NO_NODE)
+				page = alloc_pages(alloc_mask, 0);
+			else
+				page = alloc_pages_node(node, alloc_mask, 0);
+		}
 
 		if (unlikely(!page)) {
 			/* Successfully allocated i pages, free them in __vunmap() */
 			area->nr_pages = i;
 			goto fail;
 		}
-		area->pages[i] = page;
+		for (j = 0; j < (1U << chunk_order); j++)
+			area->pages[i++] = page++;
 		if (gfpflags_allow_blocking(gfp_mask))
 			cond_resched();
 	}

* Re: Linux 4.9-rc6
  2016-11-21 13:51             ` Eric Dumazet
@ 2016-11-21 16:49               ` Eric Dumazet
  2016-12-04 10:43               ` Thorsten Leemhuis
  1 sibling, 0 replies; 12+ messages in thread
From: Eric Dumazet @ 2016-11-21 16:49 UTC (permalink / raw)
  To: David Rientjes; +Cc: Al Viro, Linus Torvalds, Linux Kernel Mailing List

On Mon, 2016-11-21 at 05:51 -0800, Eric Dumazet wrote:

> +		while (chunk_order) {
> +			if (node == NUMA_NO_NODE)
> +				page = alloc_pages(multi_alloc_mask, chunk_order);
> +			else
> +				page = alloc_pages_node(node, multi_alloc_mask, chunk_order);
> +			if (page) {
> +				split_page(page, chunk_order);
> +				break;
> +			}
> +			chunk_order--;
> +		}


We could also remember the page order with set_page_private() and
speed up show_numa_info().

I wonder if we could avoid the split_page() and speed up vfree().
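
The fallback loop quoted above can be simulated in user space. In this
sketch, which is an illustration under stated assumptions rather than
kernel code, avail_order stands in for fragmentation: any request above
it is treated as a failed allocation. The recorded per-chunk orders are
the kind of information a later walk (for example in show_numa_info() or
vfree()) could use to skip whole chunks instead of visiting every page.
All names are hypothetical.

```c
#include <stddef.h>

/* floor(log2(n)) for n >= 1. */
static int floor_log2(unsigned int n)
{
	int order = 0;

	while (n >>= 1)
		order++;
	return order;
}

/*
 * Split nr_pages into power-of-two chunks, largest first.
 * Requests above avail_order "fail" and are retried one order lower;
 * order 0 always succeeds in this model.
 * Stores each chunk's order in out[] (at most cap entries) and returns
 * the number of chunks recorded.
 */
static size_t demo_plan_chunks(unsigned int nr_pages, int max_order,
			       int avail_order, int *out, size_t cap)
{
	size_t n = 0;
	unsigned int i = 0;

	while (i < nr_pages && n < cap) {
		int order = floor_log2(nr_pages - i);

		if (order > max_order)
			order = max_order;
		/* Simulated allocation failures: step down one order. */
		while (order > 0 && order > avail_order)
			order--;
		out[n++] = order;
		i += 1u << order;	/* skip the whole chunk */
	}
	return n;
}
```

For a 4-page stack this yields a single order-2 chunk when such a block
is available, and two order-1 chunks when only order 1 can be satisfied.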

* Re: Linux 4.9-rc6
  2016-11-21 13:51             ` Eric Dumazet
  2016-11-21 16:49               ` Eric Dumazet
@ 2016-12-04 10:43               ` Thorsten Leemhuis
       [not found]                 ` <CA+55aFzPiZW4FfWbvM-+AFraa0fkUHv4C1Y9SCzHdXEcUSPqdg@mail.gmail.com>
  1 sibling, 1 reply; 12+ messages in thread
From: Thorsten Leemhuis @ 2016-12-04 10:43 UTC (permalink / raw)
  To: Eric Dumazet, David Rientjes
  Cc: Al Viro, Linus Torvalds, Linux Kernel Mailing List

Lo! On 21.11.2016 14:51, Eric Dumazet wrote:
> On Mon, 2016-11-21 at 05:32 -0800, Eric Dumazet wrote:
>> Oh, this was definitely my intent of course, thanks for noticing this
>> typo ;)
> V2 is fixing this, and brings back NUMA spreading,
> (eg alloc_large_system_hash() done at boot time )

What's the status of the patch below? From the discussion it looks a lot like
it was developed to fix a regression in 4.9, but afaics the patch has
hit neither mainline nor linux-next yet. That's why I'm inclined to add
it to this week's regression report.

Ciao, Thorsten

> lpaa24:~# grep alloc_large /proc/vmallocinfo 
> 0xffffc90000009000-0xffffc9000000c000   12288 alloc_large_system_hash+0x178/0x238 pages=2 vmalloc N0=1 N1=1
> 0xffffc9000000c000-0xffffc9000000f000   12288 alloc_large_system_hash+0x178/0x238 pages=2 vmalloc N0=1 N1=1
> 0xffffc9000001e000-0xffffc9000009f000  528384 alloc_large_system_hash+0x178/0x238 pages=128 vmalloc N0=64 N1=64
> 0xffffc9000009f000-0xffffc900000e0000  266240 alloc_large_system_hash+0x178/0x238 pages=64 vmalloc N0=32 N1=32
> 0xffffc900001d3000-0xffffc900101d4000 268439552 alloc_large_system_hash+0x178/0x238 pages=65536 vmalloc vpages N0=32768 N1=32768
> 0xffffc900101d4000-0xffffc900181d5000 134221824 alloc_large_system_hash+0x178/0x238 pages=32768 vmalloc vpages N0=16384 N1=16384
> 0xffffc900181d5000-0xffffc900185d6000 4198400 alloc_large_system_hash+0x178/0x238 pages=1024 vmalloc vpages N0=512 N1=512
> 0xffffc900185d6000-0xffffc900189d7000 4198400 alloc_large_system_hash+0x178/0x238 pages=1024 vmalloc vpages N0=512 N1=512
> 0xffffc9001b271000-0xffffc9001b672000 4198400 alloc_large_system_hash+0x178/0x238 pages=1024 vmalloc vpages N0=512 N1=512
> 0xffffc9001b672000-0xffffc9001b675000   12288 alloc_large_system_hash+0x178/0x238 pages=2 vmalloc N0=1 N1=1
> 0xffffc9001b675000-0xffffc9001b776000 1052672 alloc_large_system_hash+0x178/0x238 pages=256 vmalloc N0=128 N1=128
> 0xffffc9001b776000-0xffffc9001b977000 2101248 alloc_large_system_hash+0x178/0x238 pages=512 vmalloc N0=256 N1=256
> 0xffffc9001b977000-0xffffc9001bb78000 2101248 alloc_large_system_hash+0x178/0x238 pages=512 vmalloc N0=256 N1=256
> 0xffffc9001c075000-0xffffc9001c176000 1052672 alloc_large_system_hash+0x178/0x238 pages=256 vmalloc N0=128 N1=128
> 
> 
>  mm/vmalloc.c |   47 +++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 39 insertions(+), 8 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index f2481cb4e6b2..f4b9c9238f86 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -21,6 +21,7 @@
>  #include <linux/debugobjects.h>
>  #include <linux/kallsyms.h>
>  #include <linux/list.h>
> +#include <linux/mempolicy.h>
>  #include <linux/notifier.h>
>  #include <linux/rbtree.h>
>  #include <linux/radix-tree.h>
> @@ -1602,9 +1603,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>  				 pgprot_t prot, int node)
>  {
>  	struct page **pages;
> -	unsigned int nr_pages, array_size, i;
> +	unsigned int nr_pages, array_size, i, j;
>  	const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
>  	const gfp_t alloc_mask = gfp_mask | __GFP_NOWARN;
> +	const gfp_t multi_alloc_mask = (alloc_mask & ~__GFP_DIRECT_RECLAIM) | __GFP_NORETRY;
> +	int max_node_order = MAX_ORDER - 1;
>  
>  	nr_pages = get_vm_area_size(area) >> PAGE_SHIFT;
>  	array_size = (nr_pages * sizeof(struct page *));
> @@ -1624,20 +1627,48 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>  		return NULL;
>  	}
>  
> -	for (i = 0; i < area->nr_pages; i++) {
> -		struct page *page;
> +	if (IS_ENABLED(CONFIG_NUMA) && nr_online_nodes > 1) {
> +		struct mempolicy *policy = current->mempolicy;
> +		int pages_per_node;
>  
> -		if (node == NUMA_NO_NODE)
> -			page = alloc_page(alloc_mask);
> -		else
> -			page = alloc_pages_node(node, alloc_mask, 0);
> +		if (policy && policy->mode == MPOL_INTERLEAVE) {
> +			pages_per_node = DIV_ROUND_UP(nr_pages,
> +						      nr_online_nodes);
> +			max_node_order = min(max_node_order,
> +					     ilog2(pages_per_node));
> +		}
> +	}
> +
> +	for (i = 0; i < area->nr_pages;) {
> +		unsigned int chunk_order = min(ilog2(area->nr_pages - i),
> +					       max_node_order);
> +		struct page *page = NULL;
> +
> +		while (chunk_order) {
> +			if (node == NUMA_NO_NODE)
> +				page = alloc_pages(multi_alloc_mask, chunk_order);
> +			else
> +				page = alloc_pages_node(node, multi_alloc_mask, chunk_order);
> +			if (page) {
> +				split_page(page, chunk_order);
> +				break;
> +			}
> +			chunk_order--;
> +		}
> +		if (!page) {
> +			if (node == NUMA_NO_NODE)
> +				page = alloc_pages(alloc_mask, 0);
> +			else
> +				page = alloc_pages_node(node, alloc_mask, 0);
> +		}
>  
>  		if (unlikely(!page)) {
>  			/* Successfully allocated i pages, free them in __vunmap() */
>  			area->nr_pages = i;
>  			goto fail;
>  		}
> -		area->pages[i] = page;
> +		for (j = 0; j < (1U << chunk_order); j++)
> +			area->pages[i++] = page++;
>  		if (gfpflags_allow_blocking(gfp_mask))
>  			cond_resched();
>  	}
> 
> 

* Re: Linux 4.9-rc6
       [not found]                 ` <CA+55aFzPiZW4FfWbvM-+AFraa0fkUHv4C1Y9SCzHdXEcUSPqdg@mail.gmail.com>
@ 2016-12-04 17:17                   ` Eric Dumazet
  2016-12-21 15:30                     ` Eric Dumazet
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2016-12-04 17:17 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thorsten Leemhuis, Linux Kernel Mailing List, Al Viro, David Rientjes

On Sun, 2016-12-04 at 03:10 -0800, Linus Torvalds wrote:
> 
> 
> On Dec 4, 2016 02:43, "Thorsten Leemhuis" <regressions@leemhuis.info>
> wrote:
>         
>         
>         What's the status of the patch below? From the discussion it looks a
>         lot like
>         it was developed to fix a regression in 4.9, but the patch
>         afaics has
>         hit neither mainline nor linux-next yet.
> 
> 
> It's not a regression as far as I can tell. It's a small optimization.
> Maybe.
> 
> 
> It's not going into 4.9, is not even clear it's worth it later either,
> unless somebody had numbers (which I haven't seen)
> 
Right, the patch was not in any way ready for 4.9 ;)

I'll try to complete this for next cycle.

Thanks.

* Re: Linux 4.9-rc6
  2016-12-04 17:17                   ` Eric Dumazet
@ 2016-12-21 15:30                     ` Eric Dumazet
  0 siblings, 0 replies; 12+ messages in thread
From: Eric Dumazet @ 2016-12-21 15:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thorsten Leemhuis, Linux Kernel Mailing List, Al Viro,
	David Rientjes, Hugh Dickins

On Sun, 2016-12-04 at 09:17 -0800, Eric Dumazet wrote:
> On Sun, 2016-12-04 at 03:10 -0800, Linus Torvalds wrote:
> > 
> > 
> > On Dec 4, 2016 02:43, "Thorsten Leemhuis" <regressions@leemhuis.info>
> > wrote:
> >         
> >         
> >         What's the status of the patch below? From the discussion it looks a
> >         lot like
> >         it was developed to fix a regression in 4.9, but the patch
> >         afaics has
> >         hit neither mainline nor linux-next yet.
> > 
> > 
> > It's not a regression as far as I can tell. It's a small optimization.
> > Maybe.
> > 
> > 
> > It's not going into 4.9, is not even clear it's worth it later either,
> > unless somebody had numbers (which I haven't seen)
> > 
> Right, the patch was not in any way ready for 4.9 ;)
> 
> I'll try to complete this for next cycle.

I now have a hacky patch that also adds PMD alignment for large
allocations and supports hugepages (this last part depends on
CONFIG_HAVE_ARCH_HUGE_VMAP at the moment, so x86/arm64 so far).

Toshi Kani added pmd_set_huge() in commit e61ce6ade404e ("mm: change
ioremap to set up huge I/O mappings"). I am not sure why vmalloc() was
not considered (or I might have missed it completely).

It seems to provide a gain of about 25 cycles per random access for large
tables on my x86 lab hosts.

(I did a test with a program having 10 million fds.)

Allocations of 2 MB or more (pages >= 512), such as the Dentry cache,
Inode-cache, the TCP established hash table, or large alloc_fdmem() ones,
might benefit from this.

lpaa23:~# grep large /proc/vmallocinfo 
0xffffc90000009000-0xffffc9000000c000   12288 alloc_large_system_hash+0x189/0x253 pages=2 vmalloc N0=1 N1=1
0xffffc9000000c000-0xffffc9000000f000   12288 alloc_large_system_hash+0x189/0x253 pages=2 vmalloc N0=1 N1=1
0xffffc9000001e000-0xffffc9000009f000  528384 alloc_large_system_hash+0x189/0x253 pages=128 vmalloc N0=64 N1=64
0xffffc9000009f000-0xffffc900000e0000  266240 alloc_large_system_hash+0x189/0x253 pages=64 vmalloc N0=32 N1=32
0xffffc900001d9000-0xffffc900001dc000   12288 alloc_large_system_hash+0x189/0x253 pages=2 vmalloc N0=1 N1=1
0xffffc90000200000-0xffffc90010201000 268439552 alloc_large_system_hash+0x189/0x253 pages=65536 vmalloc vpages N0=32768 N1=32768
0xffffc90010400000-0xffffc90018401000 134221824 alloc_large_system_hash+0x189/0x253 pages=32768 vmalloc vpages N0=16384 N1=16384
0xffffc90018600000-0xffffc90018a01000 4198400 alloc_large_system_hash+0x189/0x253 pages=1024 vmalloc vpages N0=512 N1=512
0xffffc90018c00000-0xffffc90019001000 4198400 alloc_large_system_hash+0x189/0x253 pages=1024 vmalloc vpages N0=512 N1=512
0xffffc9001b249000-0xffffc9001b34a000 1052672 alloc_large_system_hash+0x189/0x253 pages=256 vmalloc N0=128 N1=128
0xffffc9001b400000-0xffffc9001b801000 4198400 alloc_large_system_hash+0x189/0x253 pages=1024 vmalloc vpages N0=512 N1=512
0xffffc9001ba00000-0xffffc9001bc01000 2101248 alloc_large_system_hash+0x189/0x253 pages=512 vmalloc N0=256 N1=256
0xffffc9001bc01000-0xffffc9001bd02000 1052672 alloc_large_system_hash+0x189/0x253 pages=256 vmalloc N0=128 N1=128
0xffffc9001be00000-0xffffc9001c001000 2101248 alloc_large_system_hash+0x189/0x253 pages=512 vmalloc N0=256 N1=256
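
To get a feel for how many PMD-sized mappings an area could use, the
arithmetic can be sketched in user space. DEMO_PMD_SIZE assumes 2 MB
(x86-64 with 4 KB pages) and demo_huge_spans() is a hypothetical helper.
It only counts aligned 2 MB spans of virtual address space, so it is an
upper bound: the WIP patch below also requires each span to be backed by
a compound page of matching order.

```c
#include <stdint.h>

/* Assumed PMD size: 2 MB, as on x86-64 with 4 KB pages. */
#define DEMO_PMD_SIZE (UINT64_C(1) << 21)

/*
 * Number of PMD-aligned, PMD-sized spans fully contained in the
 * virtual range [start, end). Assumes start + DEMO_PMD_SIZE does not
 * overflow 64 bits, which holds for the vmalloc addresses listed above.
 */
static uint64_t demo_huge_spans(uint64_t start, uint64_t end)
{
	/* Round start up and end down to a PMD boundary. */
	uint64_t first = (start + DEMO_PMD_SIZE - 1) & ~(DEMO_PMD_SIZE - 1);
	uint64_t last = end & ~(DEMO_PMD_SIZE - 1);

	return last > first ? (last - first) / DEMO_PMD_SIZE : 0;
}
```

For the 0xffffc90000200000-0xffffc90010201000 area listed above this
gives 128 spans. For an unaligned area of the same size, such as
0xffffc900001d3000-0xffffc900101d4000 from the earlier V2 output, it
gives 127.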


I won't be able to split this patch into 3 parts before January 6th, after
my vacation. I am showing the WIP in case anyone is interested in seeing it.

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index a5584384eabc..055b027ee659 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -21,6 +21,7 @@
 #include <linux/debugobjects.h>
 #include <linux/kallsyms.h>
 #include <linux/list.h>
+#include <linux/mempolicy.h>
 #include <linux/notifier.h>
 #include <linux/rbtree.h>
 #include <linux/radix-tree.h>
@@ -154,6 +155,18 @@ static int vmap_pmd_range(pud_t *pud, unsigned long addr,
 		return -ENOMEM;
 	do {
 		next = pmd_addr_end(addr, end);
+#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
+		if (next - addr == PMD_SIZE) {
+			struct page *page = pages[*nr];
+
+			if (compound_order(page) == PMD_SHIFT - PAGE_SHIFT) {
+				if (pmd_set_huge(pmd, page_to_phys(page), prot)) {
+					(*nr) += 1 << (PMD_SHIFT - PAGE_SHIFT);
+					continue;
+				}
+			}
+		}
+#endif
 		if (vmap_pte_range(pmd, addr, next, prot, pages, nr))
 			return -ENOMEM;
 	} while (pmd++, addr = next, addr != end);
@@ -1349,7 +1362,8 @@ static struct vm_struct *__get_vm_area_node(unsigned long size,
 	if (flags & VM_IOREMAP)
 		align = 1ul << clamp_t(int, get_count_order_long(size),
 				       PAGE_SHIFT, IOREMAP_MAX_ORDER);
-
+	else if (size >= PMD_SIZE)
+		align = PMD_SIZE;
 	area = kzalloc_node(sizeof(*area), gfp_mask & GFP_RECLAIM_MASK, node);
 	if (unlikely(!area))
 		return NULL;
@@ -1482,11 +1496,14 @@ static void __vunmap(const void *addr, int deallocate_pages)
 	if (deallocate_pages) {
 		int i;
 
-		for (i = 0; i < area->nr_pages; i++) {
+		for (i = 0; i < area->nr_pages;) {
 			struct page *page = area->pages[i];
+			unsigned int order;
 
 			BUG_ON(!page);
-			__free_pages(page, 0);
+			order = compound_order(page);
+			__free_pages(page, order);
+			i += 1 << order;
 		}
 
 		kvfree(area->pages);
@@ -1613,16 +1630,39 @@ EXPORT_SYMBOL(vmap);
 static void *__vmalloc_node(unsigned long size, unsigned long align,
 			    gfp_t gfp_mask, pgprot_t prot,
 			    int node, const void *caller);
+
+static int vmalloc_max_order(int node, int nr_pages)
+{
+	int max_node_order = min(PMD_SHIFT - PAGE_SHIFT, MAX_ORDER - 1);
+
+#if defined(CONFIG_NUMA)
+	if (nr_online_nodes > 1 && node == NUMA_NO_NODE) {
+		struct mempolicy *pol = current->mempolicy;
+		int pages_per_node, nr_nodes;
+
+		if (pol && pol->mode == MPOL_INTERLEAVE) {
+			nr_nodes = nodes_weight(pol->v.nodes);
+			pages_per_node = DIV_ROUND_UP(nr_pages, nr_nodes);
+			max_node_order = min(max_node_order,
+					     ilog2(pages_per_node));
+		}
+	}
+#endif
+	return max_node_order;
+}
+
 static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 				 pgprot_t prot, int node)
 {
 	struct page **pages;
-	unsigned int nr_pages, array_size, i;
+	unsigned int nr_pages, array_size, i, j;
 	const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
 	const gfp_t alloc_mask = gfp_mask | __GFP_NOWARN;
+	int max_node_order;
 
 	nr_pages = get_vm_area_size(area) >> PAGE_SHIFT;
 	array_size = (nr_pages * sizeof(struct page *));
+	max_node_order = vmalloc_max_order(node, nr_pages);
 
 	area->nr_pages = nr_pages;
 	/* Please note that the recursion is strictly bounded. */
@@ -1639,20 +1679,31 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 		return NULL;
 	}
 
-	for (i = 0; i < area->nr_pages; i++) {
-		struct page *page;
 
-		if (node == NUMA_NO_NODE)
-			page = alloc_page(alloc_mask);
-		else
-			page = alloc_pages_node(node, alloc_mask, 0);
+	for (i = 0; i < area->nr_pages;) {
+		int order = min(ilog2(area->nr_pages - i), max_node_order);
+		struct page *page;
 
-		if (unlikely(!page)) {
-			/* Successfully allocated i pages, free them in __vunmap() */
-			area->nr_pages = i;
-			goto fail;
+		for (;;) {
+			gfp_t gfp = alloc_mask;
+
+			if (order > 0)
+				gfp = (gfp & ~__GFP_DIRECT_RECLAIM) |
+				      __GFP_NORETRY | __GFP_COMP;
+			if (node == NUMA_NO_NODE)
+				page = alloc_pages(gfp, order);
+			else
+				page = alloc_pages_node(node, gfp, order);
+			if (page)
+				break;
+			if (unlikely(--order < 0)) {
+				/* Successfully allocated i pages, free them in __vunmap() */
+				area->nr_pages = i;
+				goto fail;
+			}
 		}
-		area->pages[i] = page;
+		for (j = 0; j < (1U << order); j++)
+			area->pages[i++] = page++;
 		if (gfpflags_allow_blocking(gfp_mask))
 			cond_resched();
 	}
@@ -2619,9 +2670,13 @@ static void show_numa_info(struct seq_file *m, struct vm_struct *v)
 
 		memset(counters, 0, nr_node_ids * sizeof(unsigned int));
 
-		for (nr = 0; nr < v->nr_pages; nr++)
-			counters[page_to_nid(v->pages[nr])]++;
+		for (nr = 0; nr < v->nr_pages;) {
+			struct page *page = v->pages[nr];
+			int npages = 1 << compound_order(page);
 
+			counters[page_to_nid(page)] += npages;
+			nr += npages;
+		}
 		for_each_node_state(nr, N_HIGH_MEMORY)
 			if (counters[nr])
 				seq_printf(m, " N%u=%u", nr, counters[nr]);

Thread overview: 12+ messages
2016-11-20 22:05 Linux 4.9-rc6 Linus Torvalds
2016-11-20 22:27 ` Eric Dumazet
2016-11-20 23:27   ` Linus Torvalds
2016-11-21  1:35     ` Al Viro
2016-11-21  4:59       ` Eric Dumazet
2016-11-21  8:34         ` David Rientjes
2016-11-21 13:32           ` Eric Dumazet
2016-11-21 13:51             ` Eric Dumazet
2016-11-21 16:49               ` Eric Dumazet
2016-12-04 10:43               ` Thorsten Leemhuis
     [not found]                 ` <CA+55aFzPiZW4FfWbvM-+AFraa0fkUHv4C1Y9SCzHdXEcUSPqdg@mail.gmail.com>
2016-12-04 17:17                   ` Eric Dumazet
2016-12-21 15:30                     ` Eric Dumazet
