* Linux 2.6.39-rc4
@ 2011-04-19  4:57 Linus Torvalds
  2011-04-19 20:04 ` [PATCH] uml: fix hppfs build Randy Dunlap
  2011-04-20 15:39 ` Linux 2.6.39-rc4 (regression: NUMA on multi-node CPUs broken) Andreas Herrmann
  0 siblings, 2 replies; 18+ messages in thread
From: Linus Torvalds @ 2011-04-19  4:57 UTC
  To: Linux Kernel Mailing List

So things have sadly not continued to calm down even further. We had
more commits in -rc4 than we had in -rc3, and I sincerely hope that
upward trend doesn't continue.

That said, so far the only thing that has really caused problems this
release cycle has been the block layer plugging changes, and as of
-rc4 the issues we had with MD should hopefully now be behind us. So
we're making progress on that front too.

The plugging code still seems to trigger some issue with what looks
like an infinite stream of disk-change notifications on CD-ROMs - but
Jens is hopefully going to squish that problem soon. In the meantime,
you can avoid the problem by either running SMP or having preemption
enabled.

Other than that? We may have a few more commits than in -rc3, but it
hasn't been _too_ bad. There's certainly nothing overly exciting:
aside from the block/MD fixups, we've got some filesystem updates
(btrfs, cifs and ubifs) and some driver updates (the largest chunk of
which is actually a duplicate driver removal). USB, some KMS, nothing
really earthshaking.

Shortlog appended for the curious.

                 Linus

---
Abhilash Kesavan (2):
      ARM: S5P: Remove unused s3c_pm_check_resume_pin
      ARM: SAMSUNG: Fix build failure in PM CRC check code

Alan Stern (1):
      USB: EHCI: unlink unused QHs when the controller is stopped

Alberto Mardegan (1):
      samsung-laptop: Samsung R410P backlight driver

Alex Deucher (8):
      drm/radeon/kms: pll tweaks for rv6xx
      drm/radeon/kms: make radeon i2c put/get bytes less noisy
      drm/radeon/kms: clean up gart dummy page handling
      drm/radeon/kms: fix suspend on rv530 asics
      drm/radeon/kms: fix pcie_p callbacks on btc and cayman
      drm/radeon/kms: add voltage type to atom set voltage function
      drm/radeon/kms: properly program vddci on evergreen+
      i2c-algo-bit: Call pre/post_xfer for bit_test

Alexander Clouter (1):
      MAINTAINERS: add ARM/ts78xx-setup platform maintainer

Alexandre Bounine (2):
      RapidIO: add IDT CPS-1432 switch definitions
      RapidIO/mpc85xx: fix possible mport registration problems

Alexey Dobriyan (2):
      kstrtox: fix compile warnings in test
      kstrtox: simpler code in _kstrtoull()

Alexey Khoroshilov (1):
      USB: usb-gadget: unlock data->lock mutex on error path in ep_read()

Andi Kleen (1):
      mm: add VM counters for transparent hugepages

Andiry Xu (2):
      usbcore: Bug fix: system can't suspend with USB3.0 device connected to USB3.0 hub
      xHCI: Implement AMD PLL quirk

Andreas Bießmann (1):
      avr32: add ATAG_BOARDINFO

Aneesh Kumar K.V (5):
      fs/9p: Fix revalidate to return correct value
      fs/9p: Use write_inode for data sync on server
      9p: revert tsyncfs related changes
      fs/9p: Fix error reported by coccicheck
      9p: Fix sparse error

Anton Blanchard (1):
      powerpc: Fix oops if scan_dispatch_log is called too early

Antonio Ospite (1):
      leds/leds-regulator.c: fix handling of already enabled regulators

Arne Jansen (1):
      btrfs: using cached extent_state in set/unlock combinations

Artem Bityutskiy (1):
      UBIFS: fix oops when R/O file-system is fsync'ed

Axel Lin (2):
      Input: twl4030_keypad - fix potential NULL dereference in twl4030_kp_probe()
      drivers/rtc/rtc-mc13xxx.c: fix unterminated platform_device_id table

Ben Hutchings (2):
      avr32: Fix .size directive for cpu_enter_idle
      mm/thp: use conventional format for boolean attributes

Ben Skeggs (5):
      drm/nouveau: implement init table opcode 0x5c
      drm/nouveau: quirk for XFX GT-240X-YA
      drm/nv50: use "nv86" tlb flush method on everything except 0x50/0xac
      drm/nv50-nvc0: remove some code that doesn't belong here
      drm/nvc0: improve vm flush function

Benjamin Herrenschmidt (1):
      powerpc/powermac: Build fix with SMP and CPU hotplug

Bob Liu (1):
      ramfs: fix memleak on no-mmu arch

Catalin Marinas (3):
      ARM: 6866/1: Do not restrict HIGHPTE to !OUTER_CACHE
      ARM: 6867/1: Introduce THREAD_NOTIFY_COPY for copy_thread() hooks
      ARM: 6868/1: Preserve the VFP state during fork

Chase Douglas (1):
      Input: document event types and codes and their intended use

Chris Mason (4):
      Btrfs: make uncache_state unconditional
      Btrfs: don't force chunk allocation in find_free_extent
      Btrfs end_bio_extent_readpage should look for locked bits
      Btrfs: fix free space cache leak

Christian Simon (1):
      USB: ftdi_sio: Added IDs for CTI USB Serial Devices

Christoph Fritz (1):
      Input: h3600_ts - fix error handling at connect

Christoph Hellwig (2):
      block: cleanup the block plug helper functions
      block: add blk_run_queue_async

Christoph Lameter (1):
      vmstat: update comment regarding stat_threshold

Colin Cross (1):
      ARM: tegra: gpio: Fix unused variable warnings

Corentin Chary (3):
      asus-laptop: remove removed features from feature-removal-schedule.txt
      asus-wmi: swap input name and phys
      eeepc-wmi: add keys found on EeePC 1215T

Dan Carpenter (6):
      USB: musb: add missing unlock in cppi_interrupt()
      USB: musb: using 0 instead of NULL
      USB: musb: silence printk format warning
      USB: musb: dereferencing an iomem pointer
      usb: pch_udc: unlock on allocation failure
      USB: xhci: unsigned char never equals -1

Daniel J Blueman (1):
      fix user annotation in ioctl.c

Daniel Kiper (1):
      mm: optimize pfn calculation in online_page()

Darren Hart (1):
      futex: Set FLAGS_HAS_TIMEOUT during futex_wait restart setup

Dave Airlie (3):
      i915: restore only the mode of this driver on lastclose
      Revert "ttm: Utilize the DMA API for pages that have TTM_PAGE_FLAG_DMA32 set."
      Revert "i915: restore only the mode of this driver on lastclose"

Dave Chinner (1):
      nfs: don't call __mark_inode_dirty while holding i_lock

David Brown (2):
      msm: Remove extraneous ffa device check
      msm: timer: fix missing return value

David Dillow (1):
      drm/nv50-nvc0: work around an evo channel hang that some people see

Dmitry Eremin-Solenikov (2):
      pcmcia: limit pxa2xx_balloon3 subdriver to balloon3 platform
      pcmcia: limit pxa2xx_trizeps4 subdriver to trizeps4 platform

Dmitry Torokhov (6):
      USB: fix formatting of SuperSpeed endpoints in /proc/bus/usb/devices
      USB: xhci - fix unsafe macro definitions
      USB: xhci - remove excessive 'inline' markings
      USB: xhci: simplify logic of skipping missed isoc TDs
      USB: xhci - fix math in xhci_get_endpoint_interval()
      USB: xhci - also free streams when resetting devices

Emil Velikov (1):
      nv30: Fix parsing of perf table

Eric B Munson (1):
      powerpc/perf_event: Skip updating kernel counters if register value shrinks

Eric Dumazet (2):
      perf: Fix a build error with some GCC versions
      memcg: fix mem_cgroup_rotate_reclaimable_page()

Eric Miao (1):
      ARM: pxa: convert incorrect IRQ_TO_IRQ() to irq_to_gpio()

Felipe Balbi (2):
      usb: musb: temporarily make it bool
      usb: musb: gadget: check the correct list_head

Feng Tang (1):
      RTC: rtc-mrst: follow on to the change of rtc_device_register()

Geert Uytterhoeven (1):
      m68k,m68knommu: Wire up name_to_handle_at, open_by_handle_at, clock_adjtime, syncfs

Graf Yang (1):
      Blackfin: SMP: make all barriers handle cache issues

Greg Kroah-Hartman (2):
      samsung-laptop: add support for N230 model
      Revert "USB: isp1760-hcd: move imask clear after pending work is done"

Hans J. Koch (1):
      MAINTAINERS: change mail adress of Hans J. Koch

Haojian Zhuang (3):
      ARM: pxa: always clear LPM bits for PXA168 MFPR
      ARM: pxa: align NR_BUILTIN_GPIO with GPIO interrupt number
      ARM: mmp: align NR_BUILTIN_GPIO with gpio interrupt number

Harsh Prateek Bora (1):
      net/9p: nwname should be an unsigned int

Hema HK (1):
      usb: musb: Fix the crash issue during reboot

Hugh Dickins (1):
      tmpfs: fix off-by-one in max_blocks checks

Igor Mammedov (1):
      Input: xen-kbdfront - fix mouse getting stuck after save/restore

Jacob Pan (1):
      x86/mrst: Fix boot crash caused by incorrect pin to irq mapping

Jarod Wilson (1):
      Input: add KEY_IMAGES specifically for AL Image Browser

Jean Delvare (1):
      i2c: Improve deprecation warnings

Jean-Christophe PLAGNIOL-VILLARD (1):
      avr32: At32ap: pio fix typo "))" on gpio_irq_unmask prototype

Jeff Brown (2):
      Input: evdev - indicate buffer overrun with SYN_DROPPED
      Input: estimate number of events per packet

Jeff Layton (9):
      cifs: check for private_data before trying to put it
      cifs: replace /proc/fs/cifs/Experimental with a module parm
      cifs: always do is_path_accessible check in cifs_mount
      cifs: fix broken BCC check in is_valid_oplock_break
      cifs: set ra_pages in backing_dev_info
      cifs: clean up length checks in check2ndT2
      cifs: clean up various nits in unicode routines (try #2)
      cifs: wrap received signature check in srv_mutex
      cifs: don't allow mmap'ed pages to be dirtied while under writeback (try #3)

Jeff Mahoney (1):
      fs/fhandle.c: add <linux/personality.h> for ia64

Jens Axboe (13):
      block: remove block_unplug_timer() trace point
      block: fixup block IO unplug trace call
      block: add comment on why we save and disable interrupts in flush_plug_list()
      block: add callback function for unplug notification
      block: readd plug trace event
      block: kill queue_sync_plugs()
      block: move queue run on unplug to kblockd
      block: only force kblockd unplugging from the schedule() path
      block: let io_schedule() flush the plug inline
      block: make unplug timer trace event correspond to the schedule() unplug
      Revert "block: add callback function for unplug notification"
      block: drop queue lock before calling __blk_run_queue() for kblockd punt
      block: blk_delay_queue() should use kblockd workqueue

Jeremy Fitzhardinge (1):
      xen: just completely disable XSAVE

Jiri Kosina (1):
      brk: COMPAT_BRK: fix detection of randomized brk

Joe Perches (2):
      MAINTAINERS: update m68knommu patterns
      MAINTAINERS: update various tty patterns

Joerg Roedel (2):
      USB host: Fix lockdep warning in AMD PLL quirk
      x86, amd: Disable GartTlbWlkErr when BIOS forgets it

Johan Hovold (2):
      usb: musb: omap2430: fix build failure
      USB: ftdi_sio: add PID for OCT DK201 docking station

John Stultz (1):
      RTC: Fix early irqs caused by calling rtc_set_alarm too early

Josef Bacik (11):
      Btrfs: deal with the case that we run out of space in the cache
      Btrfs: only retry transaction reservation once
      Btrfs: map the inode item when doing fill_inode_item
      Btrfs: do not call btrfs_update_inode in endio if nothing changed
      Btrfs: don't split dio bios if we don't have to
      Btrfs: do not use async submit for small DIO io's
      Btrfs: reuse the extent_map we found when calling btrfs_get_extent
      Btrfs: check for duplicate iov_base's when doing dio reads
      Btrfs: check for duplicate iov_base's when doing dio reads
      Btrfs: avoid taking the trans_mutex in btrfs_end_transaction
      Btrfs: avoid taking the chunk_mutex in do_chunk_alloc

Justin P. Mattock (1):
      ARM: 6872/1: arch:common:Makefile Remove unused config in the Makefile.

KOSAKI Motohiro (3):
      vmscan: all_unreclaimable() use zone->all_unreclaimable as a name
      oom-kill: remove boost_dying_task_prio()
      x86, NUMA: Fix fakenuma boot failure

Keith Packard (1):
      thinkpad-acpi fails to load with newer Thinkpad X201s BIOS

Ken Chen (2):
      sched: Fix sched-domain avg_load calculation
      sched: Fix erroneous all_pinned logic

Konrad Rzeszutek Wilk (1):
      xen/debug: Don't be so verbose with WARN on 1-1 mapping errors.

Konstantin Khlebnikov (1):
      i915: select VIDEO_OUTPUT_CONTROL for ACPI_VIDEO

Kumar Gala (2):
      powerpc/book3e: Fix CPU feature handling on 64-bit e5500
      powerpc/85xx: disable Suspend support if SMP enabled

Lee, Chun-Yi (1):
      acer-wmi: Fix capitalisation of GUID in module alias

Li Zefan (2):
      Btrfs: Check if btrfs_next_leaf() returns error in btrfs_listxattr()
      Btrfs: Check if btrfs_next_leaf() returns error in btrfs_real_readdir()

Linus Torvalds (9):
      Revert "vfs: Export file system uuid via /proc/<pid>/mountinfo"
      vm: fix mlock() on stack guard page
      vfs: Re-introduce s_uuid in the superblock
      vm: fix vm_pgoff wrap in stack expansion
      block: don't flush plugged IO on forced preemtion scheduling
      vfs: fix incorrect dentry_update_name_case() BUG_ON() test
      next_pidmap: fix overflow condition
      proc: do proper range check on readdir offset
      Linux 2.6.39-rc4

Liu Yuan (1):
      block, blk-sysfs: Use the variable directly instead of a function call

Maksim Rayskiy (1):
      UBIFS: fix compilation warnings when compiling with gcc 4.5

Marcin Slusarz (1):
      drm/nouveau: fix oops on unload with disabled LVDS panel

Marco Chiappero (1):
      sony-laptop: keyboard backlight fixes

Marek Vasut (1):
      ARM: pxafb: Fix access to nonexistent member of pxafb_info

Marius B. Kotsbak (1):
      USB: option: Added support for Samsung GT-B3730/GT-B3710 LTE USB modem.

Matt Fleming (1):
      avr32: init cannot ignore signals sent by force_sig_info()

Matthew Garrett (1):
      x86 platform drivers: Build fix for intel_pmic_gpio

Matthew Wilcox (1):
      USB: Fix unplug of device with active streams

Mattia Dongili (2):
      sony-laptop: fix early NULL pointer dereference
      sony-laptop: only show the handles sysfs file in debug mode

Maurus Cuelenaere (1):
      ARM: SAMSUNG: Fix warning 's3c_pm_show_resume_irqs' defined but not used

Mian Yousaf Kaukab (2):
      usb: musb: clear AUTOSET while clearing DMAENAB
      usb: musb: ux500: copy dma mask from platform device to musb device

Miao Xie (2):
      Btrfs: Fix incorrect inode nlink in btrfs_link()
      Btrfs: Check validity before setting an acl

Michael Ellerman (1):
      mm: check that we have the right vma in __access_remote_vm()

Michal Marek (2):
      staging: samsung-laptop has moved to platform/x86
      samsung-laptop: set backlight type

Michal Simek (1):
      usb: Fix Kconfig unmet dependencies for Microblaze EHCI

Michel Dänzer (2):
      radeon: Fix KMS CP writeback on big endian machines.
      drm/radeon: Fix KMS legacy backlight support if CONFIG_BACKLIGHT_CLASS_DEVICE=m.

Mike Frysinger (4):
      RTC: add missing "return 0" in new alarm func for rtc-bfin.c
      USB: musb: blackfin: work around anomaly 05000450
      Blackfin: gptimers: fix thinko when disabling timers
      Blackfin: time-ts: ack gptimer sooner to avoid missing short ints

Milton Miller (1):
      fs: synchronize_rcu when unregister_filesystem success not failure

NeilBrown (8):
      block: splice plug list to local context
      block: Enhance new plugging support to support general callbacks
      md: use new plugging interface for RAID IO.
      md/dm - remove remains of plug_fn callback.
      md - remove old plugging code.
      md: provide generic support for handling unplug callbacks.
      md: incorporate new plugging into raid5.
      md: fix up raid1/raid10 unplugging.

Nicolas Kaiser (2):
      xen: events: fix error checks in bind_*_to_irqhandler()
      arm: tegra: fix error check in tegra2_clocks.c

Nicolas Pitre (3):
      ARM: 6877/1: the ADDR_NO_RANDOMIZE personality flag should be honored with mmap()
      ARM: 6878/1: fix personality flag propagation across an exec
      ARM: 6879/1: fix personality test wrt usage of domain handlers

Nishanth Aravamudan (1):
      powerpc/pseries: Use a kmem cache for DTL buffers

Ole Henrik Jahren (1):
      avr32: fix deadlock when reading clock list in debugfs

Paul Friedrich (1):
      USB: ftdi_sio: add ids for Hameg HO720 and HO730

Paul Gortmaker (1):
      powerpc/kexec: Fix regression causing compile failure on UP

Paul Mundt (1):
      mm/page_alloc.c: silence build_all_zonelists() section mismatch

Prabhakar Kushwaha (2):
      powerpc/85xx: Don't add disabled PCIe devices
      powerpc: Check device status before adding serial device

Rafael J. Wysocki (1):
      PM / Hibernate: Introduce CONFIG_HIBERNATE_CALLBACKS

Randy Dunlap (3):
      msi-laptop: fix config-dependent build error
      usb: fix ips1760-hcd printk format warning
      MAINTAINERS: update STABLE BRANCH info

Richard Henderson (4):
      alpha: Don't force -Werror.
      alpha: Remove set but unused variables.
      alpha: Fix RTC interrupt setup.
      alpha: Fix uninitialized value in read_persistent_clock.

Richard Retanubun (1):
      USB: isp1760-hcd: move imask clear after pending work is done

Richard Weinberger (2):
      um: fix call tracer and bug handler
      um: disable CONFIG_CMPXCHG_LOCAL

Roy Spliet (1):
      drm/nouveau: correct memtiming table parsing for nv4x

Russell King (2):
      ARM: Make consolidated PM sleep code depend on PM_SLEEP
      ARM: Only allow PM_SLEEP with CPUs which support suspend

Sage Weil (1):
      libceph: fix linger request requeueing

Samuel Ortiz (1):
      mfd: Fetch cell pointer from platform_device->mfd_cell

Sarah Sharp (2):
      xhci: Fix NULL pointer deref in handle_port_status()
      xhci: Tell USB core both roothubs lost power.

Scott Wood (1):
      powerpc/e500mc: Remove CPU_FTR_MAYBE_CAN_NAP/CPU_FTR_MAYBE_CAN_DOZE

Sebastian Andrzej Siewior (2):
      x86/ce4100: Add reg property to bridges
      usb/gadget: don't leak hs_descriptors

Sergei Trofimovich (1):
      btrfs: properly handle overlapping areas in memmove_extent_buffer

Shan Haitao (1):
      xen: Allow PV-OPS kernel to detect whether XSAVE is supported

Shriram Rajagopalan (1):
      fix XEN_SAVE_RESTORE Kconfig dependencies

Shubhrajyoti D (1):
      Input: twl4030_keypad - avoid potential NULL-pointer dereference

Sonic Zhang (1):
      Blackfin: SMP: fix cache flush loop

Stefan Roese (1):
      powerpc: Don't write protect kernel text with CONFIG_DYNAMIC_FTRACE enabled

Stephane Eranian (1):
      perf_event: Fix cgrp event scheduling bug in perf_enable_on_exec()

Stephen Boyd (1):
      ARM: 6876/1: Kconfig.debug: Remove unused CONFIG_DEBUG_ERRORS

Steve French (6):
      Allow user names longer than 32 bytes
      Max share size is too small
      Elminate sparse __CHECK_ENDIAN__ warnings on port conversion
      various endian fixes to cifs
      [CIFS] cifs: clarify the meaning of tcpStatus == CifsGood
      [CIFS] Warn on requesting default security (ntlm) on mount

Steven Hardy (3):
      usb: Fix qcserial memory leak on rmmod
      usb: qcserial avoid pointing to freed memory
      usb: qcserial add missing errorpath kfrees

Thomas Gleixner (1):
      platform-drivers: x86: pmic: Restore the dropped buslock/unlock

Tim Chen (1):
      vfs: Fix absolute RCU path walk failures due to uninitialized seq number

Timo Warns (1):
      fs/partitions/ldm.c: fix oops caused by corrupted partition table

Uwe Kleine-König (1):
      don't check platform_get_irq's return value against zero

Valentin Longchamp (1):
      USB: fsl_qe_udc: send ZLP when zero flag and length % maxpacket == 0

Vasily Khoruzhick (1):
      RTC: Fix s3c compile error due to missing s3c_rtc_setpie

Wanlong Gao (2):
      fix the wrong argument of the functions definition
      drivers/misc/sgi-gru/grufile.c: fix the wrong members of gru_chip

Will Deacon (2):
      ARM: 6864/1: hw_breakpoint: clear DBGVCR out of reset
      ARM: 6865/1: perf: ensure pass through zero is counted on overflow

Xin Zhong (1):
      Btrfs: fix subvolume mount by name problem when default mount subvolume is set

Yauheni Kaliuta (1):
      usb: gadget: eem: fix echo command processing

Yoichi Yuasa (1):
      USB: ohci-au1xxx: fix warning "__BIG_ENDIAN" is not defined

Yoshihiro Shimoda (1):
      usb: r8a66597-udc: fix spinlock usage

Yoshinori Sano (1):
      Btrfs: fix memory leaks in btrfs_new_inode()


* [PATCH] uml: fix hppfs build
  2011-04-19  4:57 Linux 2.6.39-rc4 Linus Torvalds
@ 2011-04-19 20:04 ` Randy Dunlap
  2011-04-19 20:09   ` Richard Weinberger
  2011-04-20 15:39 ` Linux 2.6.39-rc4 (regression: NUMA on multi-node CPUs broken) Andreas Herrmann
  1 sibling, 1 reply; 18+ messages in thread
From: Randy Dunlap @ 2011-04-19 20:04 UTC
  To: Linus Torvalds, Simon Danner
  Cc: Linux Kernel Mailing List, Jeff Dike, Richard Weinberger,
	user-mode-linux-devel, Christoph Hellwig

From: Randy Dunlap <randy.dunlap@oracle.com>

Make HoneyPot ProcFS depend on CONFIG_PROC_FS so that it will build.
Recommended by Christoph Hellwig.

Fixes kernel bugzilla #33692:
  https://bugzilla.kernel.org/show_bug.cgi?id=33692

Reported-by: Simon Danner <danner.simon@gmail.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc:	Jeff Dike <jdike@addtoit.com>
Cc:	Richard Weinberger <richard@nod.at>
Cc:	user-mode-linux-devel@lists.sourceforge.net
Cc:	Christoph Hellwig <hch@infradead.org>
---
 arch/um/Kconfig.um |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- lnx-2639-rc4.orig/arch/um/Kconfig.um
+++ lnx-2639-rc4/arch/um/Kconfig.um
@@ -47,7 +47,7 @@ config HOSTFS
 
 config HPPFS
 	tristate "HoneyPot ProcFS (EXPERIMENTAL)"
-	depends on EXPERIMENTAL
+	depends on EXPERIMENTAL && PROC_FS
 	help
 	  hppfs (HoneyPot ProcFS) is a filesystem which allows UML /proc
 	  entries to be overridden, removed, or fabricated from the host.


* Re: [PATCH] uml: fix hppfs build
  2011-04-19 20:04 ` [PATCH] uml: fix hppfs build Randy Dunlap
@ 2011-04-19 20:09   ` Richard Weinberger
  0 siblings, 0 replies; 18+ messages in thread
From: Richard Weinberger @ 2011-04-19 20:09 UTC
  To: Randy Dunlap
  Cc: Linus Torvalds, Simon Danner, Linux Kernel Mailing List,
	Jeff Dike, user-mode-linux-devel, Christoph Hellwig

On Tuesday, 19 April 2011, 22:04:19, Randy Dunlap wrote:
> From: Randy Dunlap <randy.dunlap@oracle.com>
> 
> Make HoneyPot ProcFS depend on CONFIG_PROC_FS so that it will build.
> Recommended by Christoph Hellwig.
> 
> Fixes kernel bugzilla #33692:
>   https://bugzilla.kernel.org/show_bug.cgi?id=33692
> 
> Reported-by: Simon Danner <danner.simon@gmail.com>
> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
> Cc:	Jeff Dike <jdike@addtoit.com>
> Cc:	Richard Weinberger <richard@nod.at>
> Cc:	user-mode-linux-devel@lists.sourceforge.net
> Cc:	Christoph Hellwig <hch@infradead.org>
> ---
>  arch/um/Kconfig.um |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- lnx-2639-rc4.orig/arch/um/Kconfig.um
> +++ lnx-2639-rc4/arch/um/Kconfig.um
> @@ -47,7 +47,7 @@ config HOSTFS
> 
>  config HPPFS
>  	tristate "HoneyPot ProcFS (EXPERIMENTAL)"
> -	depends on EXPERIMENTAL
> +	depends on EXPERIMENTAL && PROC_FS
>  	help
>  	  hppfs (HoneyPot ProcFS) is a filesystem which allows UML /proc
>  	  entries to be overridden, removed, or fabricated from the host.

Applied.

Thanks,
//richard


* Re: Linux 2.6.39-rc4 (regression: NUMA on multi-node CPUs broken)
  2011-04-19  4:57 Linux 2.6.39-rc4 Linus Torvalds
  2011-04-19 20:04 ` [PATCH] uml: fix hppfs build Randy Dunlap
@ 2011-04-20 15:39 ` Andreas Herrmann
  2011-04-21  0:45   ` David Rientjes
  2011-04-21  2:04   ` KOSAKI Motohiro
  1 sibling, 2 replies; 18+ messages in thread
From: Andreas Herrmann @ 2011-04-20 15:39 UTC
  To: Linus Torvalds, KOSAKI Motohiro
  Cc: Linux Kernel Mailing List, Ingo Molnar, Tejun Heo

The following patch breaks real NUMA on multi-node CPUs like AMD
Magny-Cours and should be reverted (or changed to take effect only in
case of numa=fake):

  commit 7d6b46707f2491a94f4bd3b4329d2d7f809e9368
  Author: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
  Date:   Fri Apr 15 20:39:01 2011 +0900

    x86, NUMA: Fix fakenuma boot failure

    ...

    Thus, this patch implements a reassignment of node-ids if buggy firmware
    or numa emulation makes wrong cpu node map. It enforces that all logical
    cpus in the same physical cpu share the same node.

    ...

  +static void __cpuinit check_cpu_siblings_on_same_node(int cpu1, int cpu2)
  +{
  +       int node1 = early_cpu_to_node(cpu1);
  +       int node2 = early_cpu_to_node(cpu2);
  +
  +       /*
  +        * Our CPU scheduler assumes all logical cpus in the same physical cpu
  +        * share the same node. But, buggy ACPI or NUMA emulation might assign
  +        * them to different node. Fix it.
  +        */

   ...

This is a false assumption. Magny-Cours has two nodes in the same
physical package. The scheduler was (kind of) fixed to work around
this boot problem for multi-node CPUs (with 2.6.32). If wrong cpu node
maps are also an issue with NUMA emulation, that might be fixed in a
similar way, or this quirk should only be applied in case of NUMA
emulation.

With this patch Linux shows

   root # numactl  --hardware
   available: 8 nodes (0-7)
   node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11
   node 0 size: 8189 MB
   node 0 free: 7937 MB
   node 1 cpus:
   node 1 size: 16384 MB
   node 1 free: 16129 MB
   node 2 cpus: 12 13 14 15 16 17 18 19 20 21 22 23
   node 2 size: 8192 MB
   node 2 free: 8024 MB
   node 3 cpus:
   node 3 size: 16384 MB
   node 3 free: 16129 MB
   node 4 cpus: 24 25 26 27 28 29 30 31 32 33 34 35
   node 4 size: 8192 MB
   node 4 free: 8013 MB
   node 5 cpus:
   node 5 size: 16384 MB
   node 5 free: 16129 MB
   node 6 cpus: 36 37 38 39 40 41 42 43 44 45 46 47
   node 6 size: 8192 MB
   node 6 free: 8025 MB
   node 7 cpus:
   node 7 size: 16384 MB
   node 7 free: 16128 MB
   node distances:
   node   0   1   2   3   4   5   6   7 
     0:  10  16  16  22  16  22  16  22 
     1:  16  10  22  16  16  22  22  16 
     2:  16  22  10  16  16  16  16  16 
     3:  22  16  16  10  16  16  22  22 
     4:  16  16  16  16  10  16  16  22 
     5:  22  22  16  16  16  10  22  16 
     6:  16  22  16  22  16  22  10  16 
     7:  22  16  16  22  22  16  16  10 


which is bogus. The correct NUMA-information (based on SRAT) (w/o this
patch) is

   linux # numactl --hardware
   available: 8 nodes (0-7)
   node 0 cpus: 0 1 2 3 4 5
   node 0 size: 8189 MB
   node 0 free: 7947 MB
   node 1 cpus: 6 7 8 9 10 11
   node 1 size: 16384 MB
   node 1 free: 16114 MB
   node 2 cpus: 12 13 14 15 16 17
   node 2 size: 8192 MB
   node 2 free: 7941 MB
   node 3 cpus: 18 19 20 21 22 23
   node 3 size: 16384 MB
   node 3 free: 16120 MB
   node 4 cpus: 24 25 26 27 28 29
   node 4 size: 8192 MB
   node 4 free: 8028 MB
   node 5 cpus: 30 31 32 33 34 35
   node 5 size: 16384 MB
   node 5 free: 16116 MB
   node 6 cpus: 36 37 38 39 40 41
   node 6 size: 8192 MB
   node 6 free: 8033 MB
   node 7 cpus: 42 43 44 45 46 47
   node 7 size: 16384 MB
   node 7 free: 16120 MB
   node distances:
   node   0   1   2   3   4   5   6   7 
     0:  10  16  16  22  16  22  16  22 
     1:  16  10  22  16  16  22  22  16 
     2:  16  22  10  16  16  16  16  16 
     3:  22  16  16  10  16  16  22  22 
     4:  16  16  16  16  10  16  16  22 
     5:  22  22  16  16  16  10  22  16 
     6:  16  22  16  22  16  22  10  16 
     7:  22  16  16  22  22  16  16  10 
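
The difference between the two listings can be reproduced with a rough
userspace model. This is only a sketch and not the kernel code: the helper
names, the 12-cpu package and the "first cpu's node wins" rule are
assumptions chosen to mirror the comment quoted from the patch. Starting
from the SRAT layout (cpus 0-5 on node 0, cpus 6-11 on node 1), forcing all
cpus of one package onto a single node leaves the second node without cpus,
which matches the bogus output.

```c
#include <assert.h>

/*
 * Hypothetical model of one Magny-Cours package as seen in the listings:
 * 12 cpus, two internal nodes, 6 cpus per node. None of these helpers
 * exist in the kernel; they only illustrate the effect of the assumption.
 */
#define CPUS_PER_INTERNAL_NODE 6

/* SRAT-style map: cpus 0-5 -> node 0, cpus 6-11 -> node 1. */
static void init_srat_map(int cpu_to_node[], int ncpus)
{
    for (int cpu = 0; cpu < ncpus; cpu++)
        cpu_to_node[cpu] = cpu / CPUS_PER_INTERNAL_NODE;
}

/*
 * ASSUMPTION: model "all logical cpus in the same physical cpu share the
 * same node" by moving every cpu of the package to the node of its first
 * cpu.
 */
static void force_siblings_to_one_node(int cpu_to_node[], int ncpus)
{
    assert(ncpus > 0);
    for (int cpu = 1; cpu < ncpus; cpu++)
        cpu_to_node[cpu] = cpu_to_node[0];
}

/* Count the cpus currently assigned to the given node. */
static int cpus_on_node(const int cpu_to_node[], int ncpus, int node)
{
    int count = 0;

    for (int cpu = 0; cpu < ncpus; cpu++)
        if (cpu_to_node[cpu] == node)
            count++;
    return count;
}
```

Calling init_srat_map() on a 12-entry array and then
force_siblings_to_one_node() moves all 12 cpus to node 0, so
cpus_on_node() reports zero cpus for node 1 even though that node still
owns memory.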



Regards,

Andreas


* Re: Linux 2.6.39-rc4 (regression: NUMA on multi-node CPUs broken)
  2011-04-20 15:39 ` Linux 2.6.39-rc4 (regression: NUMA on multi-node CPUs broken) Andreas Herrmann
@ 2011-04-21  0:45   ` David Rientjes
  2011-04-21  2:04     ` KOSAKI Motohiro
                       ` (2 more replies)
  2011-04-21  2:04   ` KOSAKI Motohiro
  1 sibling, 3 replies; 18+ messages in thread
From: David Rientjes @ 2011-04-21  0:45 UTC
  To: Andreas Herrmann
  Cc: Linus Torvalds, KOSAKI Motohiro, linux-kernel, Ingo Molnar, Tejun Heo

On Wed, 20 Apr 2011, Andreas Herrmann wrote:

> The following patch breaks real NUMA on multi-node CPUs like AMD
> Magny-Cours and should be reverted (or changed to take effect only in
> case of numa=fake):
> 
>   commit 7d6b46707f2491a94f4bd3b4329d2d7f809e9368
>   Author: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>   Date:   Fri Apr 15 20:39:01 2011 +0900
> 
>     x86, NUMA: Fix fakenuma boot failure
> 
>     ...
> 
>     Thus, this patch implements a reassignment of node-ids if buggy firmware
>     or numa emulation makes wrong cpu node map. It enforces that all logical
>     cpus in the same physical cpu share the same node.
> 
>     ...
> 
>   +static void __cpuinit check_cpu_siblings_on_same_node(int cpu1, int cpu2)
>   +{
>   +       int node1 = early_cpu_to_node(cpu1);
>   +       int node2 = early_cpu_to_node(cpu2);
>   +
>   +       /*
>   +        * Our CPU scheduler assumes all logical cpus in the same physical cpu
>   +        * share the same node. But, buggy ACPI or NUMA emulation might assign
>   +        * them to different node. Fix it.
>   +        */
> 
>    ...
> 
> This is a false assumption. Magny-Cours has two nodes in the same
> physical package. The scheduler was (kind of) fixed to work around
> this boot problem for multi-node CPUs (with 2.6.32). If wrong cpu node
> maps are also an issue with NUMA emulation, that might be fixed in a
> similar way, or this quirk should only be applied in case of NUMA
> emulation.
> 

Right, this yields cpuless nodes that the scheduler can't handle.  Prior 
to the unification and cleanup, NUMA emulation would bind cpus to all 
nodes that are allocated on the physical node that it has affinity with on 
the board.  This causes all nodes to have bound cpus such that 
node_to_cpumask() correctly reveals the proximity that cpus have to its 
nodes, either emulated or otherwise.

We usually don't touch NUMA code for real architectures to fix a problem 
that can only happen with NUMA emulation, so 7d6b46707f24 should probably 
be reverted.

With that patch reverted, NUMA emulation works fine for me; for example, 
with numa=fake=8:

	/sys/devices/system/node/node0/cpulist:0-3
	/sys/devices/system/node/node1/cpulist:4-7
	/sys/devices/system/node/node2/cpulist:8-11
	/sys/devices/system/node/node3/cpulist:12-15
	/sys/devices/system/node/node4/cpulist:0-3
	/sys/devices/system/node/node5/cpulist:4-7
	/sys/devices/system/node/node6/cpulist:8-11
	/sys/devices/system/node/node7/cpulist:12-15
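
The cpulist output above is consistent with a simple mapping, sketched here
in userspace C. This is an illustration only: the round-robin placement of
emulated nodes on physical nodes, the 4 physical nodes with 4 cpus each, and
the names cpu_range and emu_node_cpus() are all assumptions inferred from
the listing, not taken from the kernel's NUMA emulation code.

```c
#include <assert.h>

/* Inclusive range of cpu ids, as printed in a sysfs cpulist entry. */
struct cpu_range {
    int first;
    int last;
};

/*
 * ASSUMPTION (inferred from the listing, not from kernel source): emulated
 * node N sits on physical node N % nr_phys_nodes, and each physical node
 * owns a contiguous block of cpus_per_phys_node cpus. An emulated node is
 * bound to all cpus of the physical node it lives on, so no emulated node
 * ends up without cpus.
 */
static struct cpu_range emu_node_cpus(int emu_node, int nr_phys_nodes,
                                      int cpus_per_phys_node)
{
    struct cpu_range r;
    int phys;

    assert(emu_node >= 0 && nr_phys_nodes > 0 && cpus_per_phys_node > 0);

    phys = emu_node % nr_phys_nodes;
    r.first = phys * cpus_per_phys_node;
    r.last = r.first + cpus_per_phys_node - 1;
    return r;
}
```

With 4 physical nodes of 4 cpus each, emu_node_cpus(4, 4, 4) yields cpus
0-3 and emu_node_cpus(7, 4, 4) yields cpus 12-15, as in the node4 and node7
entries.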

I'm not sure what it's trying to address (yes, there is a problem with the 
binding for CONFIG_NUMA_EMU && CONFIG_DEBUG_PER_CPU_MAPS, but not 
otherwise).

KOSAKI-san?


* Re: Linux 2.6.39-rc4 (regression: NUMA on multi-node CPUs broken)
  2011-04-21  0:45   ` David Rientjes
@ 2011-04-21  2:04     ` KOSAKI Motohiro
  2011-04-21  2:17       ` David Rientjes
  2011-04-21  2:19     ` [patch 1/2] x86, numa: Revert "Fix fakenuma boot failure" David Rientjes
  2011-04-21 19:45     ` Linux 2.6.39-rc4 (regression: NUMA on multi-node CPUs broken) David Rientjes
  2 siblings, 1 reply; 18+ messages in thread
From: KOSAKI Motohiro @ 2011-04-21  2:04 UTC
  To: David Rientjes
  Cc: kosaki.motohiro, Andreas Herrmann, Linus Torvalds, linux-kernel,
	Ingo Molnar, Tejun Heo

> Right, this yields cpuless nodes that the scheduler can't handle.  Prior 
> to the unification and cleanup, NUMA emulation would bind cpus to all 
> nodes that are allocated on the physical node that it has affinity with on 
> the board.  This causes all nodes to have bound cpus such that 
> node_to_cpumask() correctly reveals the proximity that cpus have to its 
> nodes, either emulated or otherwise.
> 
> We usually don't touch NUMA code for real architectures to fix a problem 
> that can only happen with NUMA emulation, so 7d6b46707f24 should probably 
> be reverted.
> 
> With that patch reverted, NUMA emulation works fine for me; for example, 
> with numa=fake=8:
> 
> 	/sys/devices/system/node/node0/cpulist:0-3
> 	/sys/devices/system/node/node1/cpulist:4-7
> 	/sys/devices/system/node/node2/cpulist:8-11
> 	/sys/devices/system/node/node3/cpulist:12-15
> 	/sys/devices/system/node/node4/cpulist:0-3
> 	/sys/devices/system/node/node5/cpulist:4-7
> 	/sys/devices/system/node/node6/cpulist:8-11
> 	/sys/devices/system/node/node7/cpulist:12-15
> 
> I'm not sure what it's trying to address (yes, there is a problem with the 
> binding for CONFIG_NUMA_EMU && CONFIG_DEBUG_PER_CPU_MAPS, but not 
> otherwise).
> 
> KOSAKI-san?

Simply reverting 7d6b46707f24 brings back the same boot failure.

[    0.215976] Pid: 1, comm: swapper Not tainted 2.6.39-rc4+ #10 FUJITSU-SV      PRIMERGY                      /D2559-A1
[    0.215976] RIP: 0010:[<ffffffff81085b94>]  [<ffffffff81085b94>] find_busiest_group+0x464/0xea0
[    0.215976] RSP: 0018:ffff88003c67d850  EFLAGS: 00010046
[    0.215976] RAX: 0000000000000000 RBX: 00000000001d2ec0 RCX: 0000000000000000
[    0.215976] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000
[    0.215976] RBP: ffff88003c67da10 R08: 0000000000000000 R09: 0000000000000000
[    0.215976] R10: 0000000000000400 R11: 0000000000000000 R12: 00000000001d2ec0
[    0.215976] R13: 00000000ffffffff R14: ffff88003c640780 R15: 0000000000000001
[    0.215976] FS:  0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[    0.215976] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[    0.215976] CR2: 0000000000000000 CR3: 0000000001a03000 CR4: 00000000000006f0
[    0.215976] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    0.215976] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[    0.215976] Process swapper (pid: 1, threadinfo ffff88003c67c000, task ffff88003c678040)
[    0.215976] Stack:
[    0.215976]  ffff88003c678078 ffff88003c67d9a0 ffff88003c67d880 ffff88003fc00000
[    0.215976]  0000000000000000 00000000001d2ec0 ffff88003c67db00 0100000000000002
[    0.215976]  ffff88003c67dbdc 0000000000000001 ffff88003fc0e4a0 000000003c678040
[    0.215976] Call Trace:
[    0.215976]  [<ffffffff810c24ff>] ? local_clock+0x6f/0x80
[    0.215976]  [<ffffffff8108c875>] load_balance+0xc5/0x990
[    0.215976]  [<ffffffff810d05ed>] ? trace_hardirqs_off+0xd/0x10
[    0.215976]  [<ffffffff810c24ff>] ? local_clock+0x6f/0x80
[    0.215976]  [<ffffffff8107e6a2>] ? update_shares+0x162/0x1a0
[    0.215976]  [<ffffffff8107e6ba>] ? update_shares+0x17a/0x1a0
[    0.215976]  [<ffffffff8107e540>] ? update_cfs_shares+0x1d0/0x1d0
[    0.215976]  [<ffffffff815a2673>] schedule+0xb03/0xb10
[    0.215976]  [<ffffffff810d48e1>] ? __lock_acquire+0x541/0x1e80
[    0.215976]  [<ffffffff810c24ff>] ? local_clock+0x6f/0x80
[    0.215976]  [<ffffffff815a2fa5>] schedule_timeout+0x265/0x320
[    0.215976]  [<ffffffff810d05ed>] ? trace_hardirqs_off+0xd/0x10
[    0.215976]  [<ffffffff810c24ff>] ? local_clock+0x6f/0x80
[    0.215976]  [<ffffffff810d0625>] ? lock_release_holdtime+0x35/0x180
[    0.215976]  [<ffffffff815a59e0>] ? _raw_spin_unlock_irq+0x30/0x40
[    0.215976]  [<ffffffff815a59e0>] ? _raw_spin_unlock_irq+0x30/0x40
[    0.215976]  [<ffffffff815a2a80>] wait_for_common+0x130/0x190
[    0.215976]  [<ffffffff8108ddb0>] ? try_to_wake_up+0x520/0x520
[    0.215976]  [<ffffffff815a2bbd>] wait_for_completion+0x1d/0x20
[    0.215976]  [<ffffffff810bafbc>] kthread_create_on_node+0xac/0x150
[    0.215976]  [<ffffffff810b3870>] ? process_scheduled_works+0x40/0x40
[    0.215976]  [<ffffffff815a299f>] ? wait_for_common+0x4f/0x190
[    0.215976]  [<ffffffff810b5f03>] __alloc_workqueue_key+0x1a3/0x590
[    0.215976]  [<ffffffff81cc2864>] cpuset_init_smp+0x64/0x74
[    0.215976]  [<ffffffff81ca8cd7>] kernel_init+0xa9/0x168
[    0.215976]  [<ffffffff815af4e4>] kernel_thread_helper+0x4/0x10
[    0.215976]  [<ffffffff815a61d4>] ? retint_restore_args+0x13/0x13
[    0.215976]  [<ffffffff81ca8c2e>] ? start_kernel+0x3f6/0x3f6
[    0.215976]  [<ffffffff815af4e0>] ? gs_change+0x13/0x13
[    0.215976] Code: 50 fe ff ff 41 89 50 08 0f 1f 80 00 00 00 00 48 8b 95 b0 fe ff ff 48 8b 7d 98 44 8b 42 08 48 89 f8 31 d2 48 c1 e0 0a 48 8b 4d a0
[    0.215976]  f7 f0 48 85 c9 48 89 c6 49 89 c1 48 89 45 90 74 1f 31 d2 48
[    0.215976] RIP  [<ffffffff81085b94>] find_busiest_group+0x464/0xea0
[    0.215976]  RSP <ffff88003c67d850>
[    0.215976] divide error: 0000 [#2]
[    0.215976] ---[ end trace 93d72a36b9146f22 ]---
[    0.215990] swapper used greatest stack depth: 3608 bytes left
[    0.216000] Kernel panic - not syncing: Attempted to kill init!
[    0.216002] Pid: 1, comm: swapper Tainted: G      D     2.6.39-rc4+ #10
[    0.216003] Call Trace:
[    0.216006]  [<ffffffff815a1816>] panic+0x91/0x1ab
[    0.216009]  [<ffffffff815a5a20>] ? _raw_write_unlock_irq+0x30/0x40
[    0.216011]  [<ffffffff8109b0ca>] ? do_exit+0x80a/0x970
[    0.216013]  [<ffffffff8109b183>] do_exit+0x8c3/0x970
[    0.216016]  [<ffffffff815a71ef>] oops_end+0xaf/0xf0
[    0.216019]  [<ffffffff81040fab>] die+0x5b/0x90
[    0.216021]  [<ffffffff815a68e4>] do_trap+0xc4/0x170
[    0.216023]  [<ffffffff8103de4f>] do_divide_error+0x8f/0xb0
[    0.216025]  [<ffffffff81085b94>] ? find_busiest_group+0x464/0xea0
[    0.216028]  [<ffffffff812c8d2d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[    0.216030]  [<ffffffff815a6204>] ? restore_args+0x30/0x30
[    0.216033]  [<ffffffff815af2fb>] divide_error+0x1b/0x20
[    0.216035]  [<ffffffff81085b94>] ? find_busiest_group+0x464/0xea0
[    0.216038]  [<ffffffff810c24ff>] ? local_clock+0x6f/0x80
[    0.216041]  [<ffffffff8108c875>] load_balance+0xc5/0x990
[    0.216043]  [<ffffffff810d05ed>] ? trace_hardirqs_off+0xd/0x10
[    0.216046]  [<ffffffff810c24ff>] ? local_clock+0x6f/0x80
[    0.216048]  [<ffffffff8107e6a2>] ? update_shares+0x162/0x1a0
[    0.216051]  [<ffffffff8107e6ba>] ? update_shares+0x17a/0x1a0
[    0.216053]  [<ffffffff8107e540>] ? update_cfs_shares+0x1d0/0x1d0
[    0.216055]  [<ffffffff815a2673>] schedule+0xb03/0xb10
[    0.216058]  [<ffffffff810d48e1>] ? __lock_acquire+0x541/0x1e80
[    0.216060]  [<ffffffff810c24ff>] ? local_clock+0x6f/0x80
[    0.216062]  [<ffffffff815a2fa5>] schedule_timeout+0x265/0x320
[    0.216064]  [<ffffffff810d05ed>] ? trace_hardirqs_off+0xd/0x10
[    0.216066]  [<ffffffff810c24ff>] ? local_clock+0x6f/0x80
[    0.216069]  [<ffffffff810d0625>] ? lock_release_holdtime+0x35/0x180
[    0.216071]  [<ffffffff815a59e0>] ? _raw_spin_unlock_irq+0x30/0x40
[    0.216073]  [<ffffffff815a59e0>] ? _raw_spin_unlock_irq+0x30/0x40
[    0.216076]  [<ffffffff815a2a80>] wait_for_common+0x130/0x190
[    0.216078]  [<ffffffff8108ddb0>] ? try_to_wake_up+0x520/0x520
[    0.216080]  [<ffffffff815a2bbd>] wait_for_completion+0x1d/0x20
[    0.216083]  [<ffffffff810bafbc>] kthread_create_on_node+0xac/0x150
[    0.216085]  [<ffffffff810b3870>] ? process_scheduled_works+0x40/0x40
[    0.216088]  [<ffffffff815a299f>] ? wait_for_common+0x4f/0x190
[    0.216090]  [<ffffffff810b5f03>] __alloc_workqueue_key+0x1a3/0x590
[    0.216092]  [<ffffffff81cc2864>] cpuset_init_smp+0x64/0x74
[    0.216095]  [<ffffffff81ca8cd7>] kernel_init+0xa9/0x168
[    0.216097]  [<ffffffff815af4e4>] kernel_thread_helper+0x4/0x10
[    0.216099]  [<ffffffff815a61d4>] ? retint_restore_args+0x13/0x13
[    0.216101]  [<ffffffff81ca8c2e>] ? start_kernel+0x3f6/0x3f6
[    0.216103]  [<ffffffff815af4e0>] ? gs_change+0x13/0x13
[    0.215976] SMP
[    0.215976] last sysfs file:
[    0.215976] CPU 1
[    0.215976] Modules linked in:
[    0.215976]
[    0.215976] Pid: 2, comm: kthreadd Tainted: G      D     2.6.39-rc4+ #10 FUJITSU-SV      PRIMERGY                      /D2559-A1
[    0.215976] RIP: 0010:[<ffffffff81084d65>]  [<ffffffff81084d65>] select_task_rq_fair+0x855/0xb80
[    0.215976] RSP: 0000:ffff88003c67fc40  EFLAGS: 00010046
[    0.215976] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[    0.215976] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000002
[    0.215976] RBP: ffff88003c67fcf0 R08: ffff88007aa133f0 R09: 0000000000000000
[    0.215976] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88007aa133f0
[    0.215976] R13: ffff88007aa133d8 R14: 0000000000000000 R15: 0000000000000000
[    0.215976] FS:  0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[    0.215976] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[    0.215976] CR2: 0000000000000000 CR3: 0000000001a03000 CR4: 00000000000006e0
[    0.215976] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    0.215976] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[    0.215976] Process kthreadd (pid: 2, threadinfo ffff88003c67e000, task ffff88003c680080)
[    0.215976] Stack:
[    0.215976]  ffffffff815a5a20 000000007aa886e8 ffff88007fdd2ed8 0000000000000002
[    0.215976]  0000000000000000 00000000001d2ec0 000000000000007d 0000000000000200
[    0.215976]  ffffffffffffffff 0000000000000000 0000000100000008 ffffffff00000001
[    0.215976] Call Trace:
[    0.215976]  [<ffffffff815a5a20>] ? _raw_write_unlock_irq+0x30/0x40
[    0.215976]  [<ffffffff8108e201>] wake_up_new_task+0x41/0x1b0
[    0.215976]  [<ffffffff810b6cd0>] ? __task_pid_nr_ns+0xc0/0x100
[    0.215976]  [<ffffffff810b6c10>] ? cpumask_weight+0x20/0x20
[    0.215976]  [<ffffffff81095112>] do_fork+0xe2/0x3a0
[    0.215976]  [<ffffffff815a59e0>] ? _raw_spin_unlock_irq+0x30/0x40
[    0.215976]  [<ffffffff815a59e0>] ? _raw_spin_unlock_irq+0x30/0x40
[    0.215976]  [<ffffffff81044885>] ? native_sched_clock+0x15/0x70
[    0.215976]  [<ffffffff810c24ff>] ? local_clock+0x6f/0x80
[    0.215976]  [<ffffffff810456d6>] kernel_thread+0x76/0x80
[    0.215976]  [<ffffffff810bac70>] ? __init_kthread_worker+0x70/0x70
[    0.215976]  [<ffffffff815af4e0>] ? gs_change+0x13/0x13
[    0.215976]  [<ffffffff810bb1c3>] kthreadd+0x113/0x150
[    0.215976]  [<ffffffff815af4e4>] kernel_thread_helper+0x4/0x10
[    0.215976]  [<ffffffff815a61d4>] ? retint_restore_args+0x13/0x13
[    0.215976]  [<ffffffff810bb0b0>] ? tsk_fork_get_node+0x30/0x30
[    0.215976]  [<ffffffff815af4e0>] ? gs_change+0x13/0x13
[    0.215976] Code: ff ff 44 89 fe 89 c7 e8 4a 26 ff ff 8b 8d 68 ff ff ff 8b 95 70 ff ff ff eb 93 0f 1f 40 00 31 d2 48 89 d8 41 8b 4d 08 48 c1 e0 0a
[    0.215976]  f7 f1 45 85 f6 75 43 48 3b 45 90 0f 83 d9 fe ff ff 4c 89 6d
[    0.215976] RIP  [<ffffffff81084d65>] select_task_rq_fair+0x855/0xb80
[    0.215976]  RSP <ffff88003c67fc40>
[    0.215976] ---[ end trace 93d72a36b9146f23 ]---





^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Linux 2.6.39-rc4 (regression: NUMA on multi-node CPUs broken)
  2011-04-20 15:39 ` Linux 2.6.39-rc4 (regression: NUMA on multi-node CPUs broken) Andreas Herrmann
  2011-04-21  0:45   ` David Rientjes
@ 2011-04-21  2:04   ` KOSAKI Motohiro
  2011-04-21  6:04     ` Andreas Herrmann
  1 sibling, 1 reply; 18+ messages in thread
From: KOSAKI Motohiro @ 2011-04-21  2:04 UTC (permalink / raw)
  To: Andreas Herrmann
  Cc: kosaki.motohiro, Linus Torvalds, Linux Kernel Mailing List,
	Ingo Molnar, Tejun Heo

[-- Attachment #1: Type: text/plain, Size: 1864 bytes --]

> Following patch breaks real NUMA on multi-node CPUs like AMD
> Magny-Cours and should be reverted (or changed to just take effect in
> case of numa=fake):
> 
>   commit 7d6b46707f2491a94f4bd3b4329d2d7f809e9368
>   Author: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>   Date:   Fri Apr 15 20:39:01 2011 +0900
> 
>     x86, NUMA: Fix fakenuma boot failure
> 
>     ...
> 
>     Thus, this patch implements a reassignment of node-ids if buggy firmware
>     or numa emulation makes wrong cpu node map. It enforces that all logical
>     cpus in the same physical cpu share the same node.
> 
>     ...
> 
>   +static void __cpuinit check_cpu_siblings_on_same_node(int cpu1, int cpu2)
>   +{
>   +       int node1 = early_cpu_to_node(cpu1);
>   +       int node2 = early_cpu_to_node(cpu2);
>   +
>   +       /*
>   +        * Our CPU scheduler assumes all logical cpus in the same physical cpu
>   +        * share the same node. But, buggy ACPI or NUMA emulation might assign
>   +        * them to different node. Fix it.
>   +        */
> 
>    ...
> 
> This is a false assumption. Magny-Cours has two nodes in the same
> physical package. The scheduler was (kind of) fixed to work around
> this boot problem for multi-node CPUs (with 2.6.32). 

I agree we have to fix this ASAP. I also think we have to avoid reintroducing
the same problem. Can you please tell me the commit id of that scheduler fix?

> If this is also
> an issue with wrong cpu node maps in case of NUMA emulation this might
> be fixed similar or this quirk should only be applied in case of NUMA
> emulation.

Indeed.

Tejun, do you remember that I sent a NUMA-emulation-specific patch at first?
I now side with Andreas, because I bet the current NUMA fallback code (the
one you pointed out) has no users.

Or, please let us know if you have an alternative patch.



Attached are the revert and the fakenuma-specific fix patches.

[-- Attachment #2: 0001-Revert-x86-NUMA-Fix-fakenuma-boot-failure.patch --]
[-- Type: application/octet-stream, Size: 4840 bytes --]

From 8183833bb4b48fdb150f64905b80fd21045946ec Mon Sep 17 00:00:00 2001
From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Date: Thu, 21 Apr 2011 10:32:57 +0900
Subject: [PATCH 1/2] Revert "x86, NUMA: Fix fakenuma boot failure"

This reverts commit 7d6b46707f2491a94f4bd3b4329d2d7f809e9368.
Andreas Herrmann reported that the patch breaks AMD Magny-Cours because
Magny-Cours has two nodes in the same physical package.

He said,
:With this patch Linux shows
:
:   root # numactl  --hardware
:   available: 8 nodes (0-7)
:   node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11
:   node 0 size: 8189 MB
:   node 0 free: 7937 MB
:   node 1 cpus:
:   node 1 size: 16384 MB
:   node 1 free: 16129 MB
:   node 2 cpus: 12 13 14 15 16 17 18 19 20 21 22 23
:   node 2 size: 8192 MB
:   node 2 free: 8024 MB
:   node 3 cpus:
:   node 3 size: 16384 MB
:   node 3 free: 16129 MB
:   node 4 cpus: 24 25 26 27 28 29 30 31 32 33 34 35
:   node 4 size: 8192 MB
:   node 4 free: 8013 MB
:   node 5 cpus:
:   node 5 size: 16384 MB
:   node 5 free: 16129 MB
:   node 6 cpus: 36 37 38 39 40 41 42 43 44 45 46 47
:   node 6 size: 8192 MB
:   node 6 free: 8025 MB
:   node 7 cpus:
:   node 7 size: 16384 MB
:   node 7 free: 16128 MB
:   node distances:
:   node   0   1   2   3   4   5   6   7
:     0:  10  16  16  22  16  22  16  22
:     1:  16  10  22  16  16  22  22  16
:     2:  16  22  10  16  16  16  16  16
:     3:  22  16  16  10  16  16  22  22
:     4:  16  16  16  16  10  16  16  22
:     5:  22  22  16  16  16  10  22  16
:     6:  16  22  16  22  16  22  10  16
:     7:  22  16  16  22  22  16  16  10
:
:which is bogus. The correct NUMA-information (based on SRAT) (w/o this
:patch) is
:
:    linux # numactl --hardware
:   available: 8 nodes (0-7)
:   node 0 cpus: 0 1 2 3 4 5
:   node 0 size: 8189 MB
:   node 0 free: 7947 MB
:   node 1 cpus: 6 7 8 9 10 11
:   node 1 size: 16384 MB
:   node 1 free: 16114 MB
:   node 2 cpus: 12 13 14 15 16 17
:   node 2 size: 8192 MB
:   node 2 free: 7941 MB
:   node 3 cpus: 18 19 20 21 22 23
:   node 3 size: 16384 MB
:   node 3 free: 16120 MB
:   node 4 cpus: 24 25 26 27 28 29
:   node 4 size: 8192 MB
:   node 4 free: 8028 MB
:   node 5 cpus: 30 31 32 33 34 35
:   node 5 size: 16384 MB
:   node 5 free: 16116 MB
:   node 6 cpus: 36 37 38 39 40 41
:   node 6 size: 8192 MB
:   node 6 free: 8033 MB
:   node 7 cpus: 42 43 44 45 46 47
:   node 7 size: 16384 MB
:   node 7 free: 16120 MB
:   node distances:
:   node   0   1   2   3   4   5   6   7
:     0:  10  16  16  22  16  22  16  22
:     1:  16  10  22  16  16  22  22  16
:     2:  16  22  10  16  16  16  16  16
:     3:  22  16  16  10  16  16  22  22
:     4:  16  16  16  16  10  16  16  22
:     5:  22  22  16  16  16  10  22  16
:     6:  16  22  16  22  16  22  10  16
:     7:  22  16  16  22  22  16  16  10

We need a different, less intrusive patch.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 arch/x86/kernel/smpboot.c |   23 -----------------------
 1 files changed, 0 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 8ed8908..c2871d3 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -312,26 +312,6 @@ void __cpuinit smp_store_cpu_info(int id)
 		identify_secondary_cpu(c);
 }
 
-static void __cpuinit check_cpu_siblings_on_same_node(int cpu1, int cpu2)
-{
-	int node1 = early_cpu_to_node(cpu1);
-	int node2 = early_cpu_to_node(cpu2);
-
-	/*
-	 * Our CPU scheduler assumes all logical cpus in the same physical cpu
-	 * share the same node. But, buggy ACPI or NUMA emulation might assign
-	 * them to different node. Fix it.
-	 */
-	if (node1 != node2) {
-		pr_warning("CPU %d in node %d and CPU %d in node %d are in the same physical CPU. forcing same node %d\n",
-			   cpu1, node1, cpu2, node2, node2);
-
-		numa_remove_cpu(cpu1);
-		numa_set_node(cpu1, node2);
-		numa_add_cpu(cpu1);
-	}
-}
-
 static void __cpuinit link_thread_siblings(int cpu1, int cpu2)
 {
 	cpumask_set_cpu(cpu1, cpu_sibling_mask(cpu2));
@@ -340,7 +320,6 @@ static void __cpuinit link_thread_siblings(int cpu1, int cpu2)
 	cpumask_set_cpu(cpu2, cpu_core_mask(cpu1));
 	cpumask_set_cpu(cpu1, cpu_llc_shared_mask(cpu2));
 	cpumask_set_cpu(cpu2, cpu_llc_shared_mask(cpu1));
-	check_cpu_siblings_on_same_node(cpu1, cpu2);
 }
 
 
@@ -382,12 +361,10 @@ void __cpuinit set_cpu_sibling_map(int cpu)
 		    per_cpu(cpu_llc_id, cpu) == per_cpu(cpu_llc_id, i)) {
 			cpumask_set_cpu(i, cpu_llc_shared_mask(cpu));
 			cpumask_set_cpu(cpu, cpu_llc_shared_mask(i));
-			check_cpu_siblings_on_same_node(cpu, i);
 		}
 		if (c->phys_proc_id == cpu_data(i).phys_proc_id) {
 			cpumask_set_cpu(i, cpu_core_mask(cpu));
 			cpumask_set_cpu(cpu, cpu_core_mask(i));
-			check_cpu_siblings_on_same_node(cpu, i);
 			/*
 			 *  Does this new cpu bringup a new core?
 			 */
-- 
1.7.3.1


[-- Attachment #3: 0002-x86-64-NUMA-reimplement-cpu-node-map-initialization-.patch --]
[-- Type: application/octet-stream, Size: 3777 bytes --]

From 209e31ed67190c82f33c00769095b241dde11a6b Mon Sep 17 00:00:00 2001
From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Date: Fri, 8 Apr 2011 23:23:50 +0900
Subject: [PATCH 2/2] x86-64, NUMA: reimplement cpu node map initialization for fake numa.

Currently, the numa=fake boot parameter is broken. If it's used, the kernel
may panic due to a divide-by-zero error, depending on the CPU configuration:

Call Trace:
 [<ffffffff8104ad4c>] find_busiest_group+0x38c/0xd30
 [<ffffffff81086aff>] ? local_clock+0x6f/0x80
 [<ffffffff81050533>] load_balance+0xa3/0x600
 [<ffffffff81050f53>] idle_balance+0xf3/0x180
 [<ffffffff81550092>] schedule+0x722/0x7d0
 [<ffffffff81550538>] ? wait_for_common+0x128/0x190
 [<ffffffff81550a65>] schedule_timeout+0x265/0x320
 [<ffffffff81095815>] ? lock_release_holdtime+0x35/0x1a0
 [<ffffffff81550538>] ? wait_for_common+0x128/0x190
 [<ffffffff8109bb6c>] ? __lock_release+0x9c/0x1d0
 [<ffffffff815534e0>] ? _raw_spin_unlock_irq+0x30/0x40
 [<ffffffff815534e0>] ? _raw_spin_unlock_irq+0x30/0x40
 [<ffffffff81550540>] wait_for_common+0x130/0x190
 [<ffffffff81051920>] ? try_to_wake_up+0x510/0x510
 [<ffffffff8155067d>] wait_for_completion+0x1d/0x20
 [<ffffffff8107f36c>] kthread_create_on_node+0xac/0x150
 [<ffffffff81077bb0>] ? process_scheduled_works+0x40/0x40
 [<ffffffff8155045f>] ? wait_for_common+0x4f/0x190
 [<ffffffff8107a283>] __alloc_workqueue_key+0x1a3/0x590
 [<ffffffff81e0cce2>] cpuset_init_smp+0x6b/0x7b
 [<ffffffff81df3d07>] kernel_init+0xc3/0x182
 [<ffffffff8155d5e4>] kernel_thread_helper+0x4/0x10
 [<ffffffff81553cd4>] ? retint_restore_args+0x13/0x13
 [<ffffffff81df3c44>] ? start_kernel+0x400/0x400
 [<ffffffff8155d5e0>] ? gs_change+0x13/0x13

The divide by zero is caused by the following line (i.e. group->cpu_power == 0):

kernel/sched_fair.c::update_sg_lb_stats()
        /* Adjust by relative CPU power of the group */
        sgs->avg_load = (sgs->group_load * SCHED_LOAD_SCALE) /
                        group->cpu_power;

This is a regression introduced by commit e23bba6044 (x86-64, NUMA: Unify
emulated distance mapping), because it changed the cpu -> node mapping in the
process of dropping fake_physnodes().

  old) all cpus are assigned to node 0
  now) cpus are assigned round robin
       (the logic is implemented by numa_init_array())

  Note: The change happens only if the system has neither an
    ACPI SRAT table nor AMD northbridge NUMA information.

Why doesn't round robin assignment work? Because init_numa_sched_groups_power()
assumes all logical cpus in the same physical cpu share the same node
(so it only accounts for group_first_cpu()), and the simple round
robin breaks that assumption.

This patch makes all cpus use node 0 if fake NUMA is enabled, the same as
v2.6.38 and earlier.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/mm/numa_emulation.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/arch/x86/mm/numa_emulation.c b/arch/x86/mm/numa_emulation.c
index ad091e4..4fda351 100644
--- a/arch/x86/mm/numa_emulation.c
+++ b/arch/x86/mm/numa_emulation.c
@@ -419,6 +419,14 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
 	/* free the copied physical distance table */
 	if (phys_dist)
 		memblock_x86_free_range(__pa(phys_dist), __pa(phys_dist) + phys_size);
+
+	/* Setup cpu node map. */
+	for (i = 0; i < nr_cpu_ids; i++) {
+		if (early_cpu_to_node(i) != NUMA_NO_NODE)
+			continue;
+		numa_set_node(i, 0);
+	}
+
 	return;
 
 no_emu:
-- 
1.7.3.1
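
The failure mode described in the changelog above can be reproduced in a small
userspace model. This is only a sketch under stated assumptions, not kernel
code: node_power() and avg_load() are hypothetical helpers, PKG_POWER is an
invented constant that gives every physical package the same power, and
SCHED_LOAD_SCALE is assumed to be 1024 as in kernels of that era. The guard in
avg_load() is there so the model can be run safely; it is not the fix proposed
in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to match the kernel's SCHED_LOAD_SCALE (1 << 10) of that era. */
#define SCHED_LOAD_SCALE 1024UL
/* Hypothetical: every physical package contributes the same power. */
#define PKG_POWER 1024UL

/*
 * Model of the accounting described above: a package's power is credited
 * to a node only through the first (lowest-numbered) cpu of that package.
 */
static unsigned long node_power(const int cpu_to_pkg[], const int cpu_to_node[],
				int ncpu, int node)
{
	unsigned long power = 0;

	for (int cpu = 0; cpu < ncpu; cpu++) {
		int first = cpu;

		if (cpu_to_node[cpu] != node)
			continue;
		/* Find the lowest-numbered cpu in the same package. */
		for (int j = 0; j < cpu; j++) {
			if (cpu_to_pkg[j] == cpu_to_pkg[cpu]) {
				first = j;
				break;
			}
		}
		if (cpu == first)
			power += PKG_POWER;
	}
	return power;
}

/* Guarded form of the avg_load computation; false means cpu_power was 0. */
static bool avg_load(unsigned long group_load, unsigned long cpu_power,
		     unsigned long *out)
{
	assert(out);
	if (cpu_power == 0)
		return false;	/* the unguarded kernel expression divides here */
	*out = (group_load * SCHED_LOAD_SCALE) / cpu_power;
	return true;
}
```

With two cpus in one package and a round robin node assignment of {0, 1},
node_power() credits the whole package to node 0 and nothing to node 1, so the
group for node 1 ends up with a cpu_power of 0 and the division would trap.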


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: Linux 2.6.39-rc4 (regression: NUMA on multi-node CPUs broken)
  2011-04-21  2:04     ` KOSAKI Motohiro
@ 2011-04-21  2:17       ` David Rientjes
  2011-04-21  5:45         ` KOSAKI Motohiro
  0 siblings, 1 reply; 18+ messages in thread
From: David Rientjes @ 2011-04-21  2:17 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Andreas Herrmann, Linus Torvalds, linux-kernel, Ingo Molnar, Tejun Heo

On Thu, 21 Apr 2011, KOSAKI Motohiro wrote:

> Simply reverting 7d6b46707f24 brings back the same boot failure.
> 

Do you have CONFIG_DEBUG_PER_CPU_MAPS enabled?  If not, please send your 
.config.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [patch 1/2] x86, numa: Revert "Fix fakenuma boot failure"
  2011-04-21  0:45   ` David Rientjes
  2011-04-21  2:04     ` KOSAKI Motohiro
@ 2011-04-21  2:19     ` David Rientjes
  2011-04-21  2:19       ` [patch 2/2] x86, numa: Fix cpu nodemasks for NUMA emulation and CONFIG_DEBUG_PER_CPU_MAPS David Rientjes
                         ` (2 more replies)
  2011-04-21 19:45     ` Linux 2.6.39-rc4 (regression: NUMA on multi-node CPUs broken) David Rientjes
  2 siblings, 3 replies; 18+ messages in thread
From: David Rientjes @ 2011-04-21  2:19 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar
  Cc: Andreas Herrmann, KOSAKI Motohiro, linux-kernel, Tejun Heo, x86

7d6b46707f24 (x86, NUMA: Fix fakenuma boot failure) could cause physical 
NUMA topologies to move sibling cpus to a single node when in reality 
they are in separate domains.  This may result in some nodes being 
completely void of cpus, which doesn't accurately represent the correct 
topology.

This commit was intended as a fix for NUMA emulation, but should not 
cause a regression for real NUMA machines as a side effect.

Reported-by: Andreas Herrmann <herrmann.der.user@googlemail.com>
Signed-off-by: David Rientjes <rientjes@google.com>
---
 arch/x86/kernel/smpboot.c |   23 -----------------------
 1 files changed, 0 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -312,26 +312,6 @@ void __cpuinit smp_store_cpu_info(int id)
 		identify_secondary_cpu(c);
 }
 
-static void __cpuinit check_cpu_siblings_on_same_node(int cpu1, int cpu2)
-{
-	int node1 = early_cpu_to_node(cpu1);
-	int node2 = early_cpu_to_node(cpu2);
-
-	/*
-	 * Our CPU scheduler assumes all logical cpus in the same physical cpu
-	 * share the same node. But, buggy ACPI or NUMA emulation might assign
-	 * them to different node. Fix it.
-	 */
-	if (node1 != node2) {
-		pr_warning("CPU %d in node %d and CPU %d in node %d are in the same physical CPU. forcing same node %d\n",
-			   cpu1, node1, cpu2, node2, node2);
-
-		numa_remove_cpu(cpu1);
-		numa_set_node(cpu1, node2);
-		numa_add_cpu(cpu1);
-	}
-}
-
 static void __cpuinit link_thread_siblings(int cpu1, int cpu2)
 {
 	cpumask_set_cpu(cpu1, cpu_sibling_mask(cpu2));
@@ -340,7 +320,6 @@ static void __cpuinit link_thread_siblings(int cpu1, int cpu2)
 	cpumask_set_cpu(cpu2, cpu_core_mask(cpu1));
 	cpumask_set_cpu(cpu1, cpu_llc_shared_mask(cpu2));
 	cpumask_set_cpu(cpu2, cpu_llc_shared_mask(cpu1));
-	check_cpu_siblings_on_same_node(cpu1, cpu2);
 }
 
 
@@ -382,12 +361,10 @@ void __cpuinit set_cpu_sibling_map(int cpu)
 		    per_cpu(cpu_llc_id, cpu) == per_cpu(cpu_llc_id, i)) {
 			cpumask_set_cpu(i, cpu_llc_shared_mask(cpu));
 			cpumask_set_cpu(cpu, cpu_llc_shared_mask(i));
-			check_cpu_siblings_on_same_node(cpu, i);
 		}
 		if (c->phys_proc_id == cpu_data(i).phys_proc_id) {
 			cpumask_set_cpu(i, cpu_core_mask(cpu));
 			cpumask_set_cpu(cpu, cpu_core_mask(i));
-			check_cpu_siblings_on_same_node(cpu, i);
 			/*
 			 *  Does this new cpu bringup a new core?
 			 */
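
For illustration, the effect of the reverted check on a package that really
contains two nodes can be modeled in userspace. This is a minimal sketch, not
kernel code: the arrays and the helpers force_siblings_on_same_node() and
cpus_on_node() are hypothetical, and the model assumes cpus come online in
ascending order, so each new cpu is moved to the node of an already-online cpu
in the same package.

```c
#include <assert.h>

/*
 * Model of the removed check: whenever a cpu shares a physical package
 * with an earlier cpu but sits on a different node, move it to that node.
 */
static void force_siblings_on_same_node(int cpu_to_node[],
					const int cpu_to_pkg[], int ncpu)
{
	for (int cpu = 0; cpu < ncpu; cpu++) {
		/* Compare against cpus that are already online. */
		for (int i = 0; i < cpu; i++) {
			if (cpu_to_pkg[i] != cpu_to_pkg[cpu])
				continue;
			if (cpu_to_node[cpu] != cpu_to_node[i])
				cpu_to_node[cpu] = cpu_to_node[i];
		}
	}
}

/* Count how many cpus are currently assigned to the given node. */
static int cpus_on_node(const int cpu_to_node[], int ncpu, int node)
{
	int count = 0;

	assert(node >= 0);
	for (int cpu = 0; cpu < ncpu; cpu++)
		if (cpu_to_node[cpu] == node)
			count++;
	return count;
}
```

Feeding it one 12-cpu package whose cpus 0-5 are on node 0 and cpus 6-11 are on
node 1 leaves all twelve cpus on node 0 and none on node 1, which is the same
shape as the bogus numactl output reported for Magny-Cours.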

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [patch 2/2] x86, numa: Fix cpu nodemasks for NUMA emulation and CONFIG_DEBUG_PER_CPU_MAPS
  2011-04-21  2:19     ` [patch 1/2] x86, numa: Revert "Fix fakenuma boot failure" David Rientjes
@ 2011-04-21  2:19       ` David Rientjes
  2011-04-21  5:45         ` KOSAKI Motohiro
  2011-04-21 12:10         ` [tip:x86/urgent] " tip-bot for David Rientjes
  2011-04-21  5:45       ` [patch 1/2] x86, numa: Revert "Fix fakenuma boot failure" KOSAKI Motohiro
  2011-04-21 12:09       ` [tip:x86/urgent] Revert "x86, NUMA: Fix " tip-bot for David Rientjes
  2 siblings, 2 replies; 18+ messages in thread
From: David Rientjes @ 2011-04-21  2:19 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar
  Cc: Andreas Herrmann, KOSAKI Motohiro, linux-kernel, Tejun Heo, x86

cpu nodemasks under CONFIG_DEBUG_PER_CPU_MAPS are currently broken when NUMA 
emulation is enabled because the code does not iterate through every 
emulated node and bind cpus that have affinity to it.  NUMA emulation 
should bind each cpu to every local node to accurately represent the true 
NUMA topology of the underlying machine.

debug_cpumask_set_cpu() needs to be fixed at the same time so that the 
debugging information that it emits shows the new cpumask of the node 
being assigned when the cpu is being added or removed.  It can now take 
responsibility of setting or clearing the cpu itself to remove the need 
for duplicate code.

Also change its last formal parameter, "enable", to the correct bool type, 
since it can only be true or false.

Signed-off-by: David Rientjes <rientjes@google.com>
---
 arch/x86/include/asm/numa.h  |    2 +-
 arch/x86/mm/numa.c           |   27 +++++++++++----------------
 arch/x86/mm/numa_emulation.c |   20 ++++++--------------
 3 files changed, 18 insertions(+), 31 deletions(-)

diff --git a/arch/x86/include/asm/numa.h b/arch/x86/include/asm/numa.h
--- a/arch/x86/include/asm/numa.h
+++ b/arch/x86/include/asm/numa.h
@@ -51,7 +51,7 @@ static inline void numa_remove_cpu(int cpu)		{ }
 #endif	/* CONFIG_NUMA */
 
 #ifdef CONFIG_DEBUG_PER_CPU_MAPS
-struct cpumask __cpuinit *debug_cpumask_set_cpu(int cpu, int enable);
+void debug_cpumask_set_cpu(int cpu, int node, bool enable);
 #endif
 
 #endif	/* _ASM_X86_NUMA_H */
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -213,9 +213,8 @@ int early_cpu_to_node(int cpu)
 	return per_cpu(x86_cpu_to_node_map, cpu);
 }
 
-struct cpumask __cpuinit *debug_cpumask_set_cpu(int cpu, int enable)
+void debug_cpumask_set_cpu(int cpu, int node, bool enable)
 {
-	int node = early_cpu_to_node(cpu);
 	struct cpumask *mask;
 	char buf[64];
 
@@ -227,9 +226,14 @@ struct cpumask __cpuinit *debug_cpumask_set_cpu(int cpu, int enable)
 	if (!mask) {
 		pr_err("node_to_cpumask_map[%i] NULL\n", node);
 		dump_stack();
-		return NULL;
+		return;
 	}
 
+	if (enable)
+		cpumask_set_cpu(cpu, mask);
+	else
+		cpumask_clear_cpu(cpu, mask);
+
 	cpulist_scnprintf(buf, sizeof(buf), mask);
 	printk(KERN_DEBUG "%s cpu %d node %d: mask now %s\n",
 		enable ? "numa_add_cpu" : "numa_remove_cpu",
@@ -238,28 +242,19 @@ struct cpumask __cpuinit *debug_cpumask_set_cpu(int cpu, int enable)
 }
 
 # ifndef CONFIG_NUMA_EMU
-static void __cpuinit numa_set_cpumask(int cpu, int enable)
+static void __cpuinit numa_set_cpumask(int cpu, bool enable)
 {
-	struct cpumask *mask;
-
-	mask = debug_cpumask_set_cpu(cpu, enable);
-	if (!mask)
-		return;
-
-	if (enable)
-		cpumask_set_cpu(cpu, mask);
-	else
-		cpumask_clear_cpu(cpu, mask);
+	debug_cpumask_set_cpu(cpu, early_cpu_to_node(cpu), enable);
 }
 
 void __cpuinit numa_add_cpu(int cpu)
 {
-	numa_set_cpumask(cpu, 1);
+	numa_set_cpumask(cpu, true);
 }
 
 void __cpuinit numa_remove_cpu(int cpu)
 {
-	numa_set_cpumask(cpu, 0);
+	numa_set_cpumask(cpu, false);
 }
 # endif	/* !CONFIG_NUMA_EMU */
 
diff --git a/arch/x86/mm/numa_emulation.c b/arch/x86/mm/numa_emulation.c
--- a/arch/x86/mm/numa_emulation.c
+++ b/arch/x86/mm/numa_emulation.c
@@ -454,10 +454,9 @@ void __cpuinit numa_remove_cpu(int cpu)
 		cpumask_clear_cpu(cpu, node_to_cpumask_map[i]);
 }
 #else	/* !CONFIG_DEBUG_PER_CPU_MAPS */
-static void __cpuinit numa_set_cpumask(int cpu, int enable)
+static void __cpuinit numa_set_cpumask(int cpu, bool enable)
 {
-	struct cpumask *mask;
-	int nid, physnid, i;
+	int nid, physnid;
 
 	nid = early_cpu_to_node(cpu);
 	if (nid == NUMA_NO_NODE) {
@@ -467,28 +466,21 @@ static void __cpuinit numa_set_cpumask(int cpu, int enable)
 
 	physnid = emu_nid_to_phys[nid];
 
-	for_each_online_node(i) {
+	for_each_online_node(nid) {
 		if (emu_nid_to_phys[nid] != physnid)
 			continue;
 
-		mask = debug_cpumask_set_cpu(cpu, enable);
-		if (!mask)
-			return;
-
-		if (enable)
-			cpumask_set_cpu(cpu, mask);
-		else
-			cpumask_clear_cpu(cpu, mask);
+		debug_cpumask_set_cpu(cpu, nid, enable);
 	}
 }
 
 void __cpuinit numa_add_cpu(int cpu)
 {
-	numa_set_cpumask(cpu, 1);
+	numa_set_cpumask(cpu, true);
 }
 
 void __cpuinit numa_remove_cpu(int cpu)
 {
-	numa_set_cpumask(cpu, 0);
+	numa_set_cpumask(cpu, false);
 }
 #endif	/* !CONFIG_DEBUG_PER_CPU_MAPS */
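
The binding rule this patch restores (each cpu is set in the cpumask of every
emulated node that maps to its physical node) can be sketched in userspace as
follows. This is a model under stated assumptions, not the kernel
implementation: model_set_cpumask() and node_cpumask[] are hypothetical
stand-ins, plain bitmasks replace struct cpumask, and the emu_nid_to_phys[]
contents are an assumed numa=fake=8 layout over four physical nodes.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_EMU_NODES 8

/* Assumed layout: emulated nodes 0-3 and 4-7 both map to physical nodes 0-3. */
static const int emu_nid_to_phys[MAX_EMU_NODES] = {0, 1, 2, 3, 0, 1, 2, 3};

/* Hypothetical stand-in for node_to_cpumask_map[]: one bit per cpu. */
static unsigned int node_cpumask[MAX_EMU_NODES];

/*
 * Set or clear the cpu in every emulated node that shares a physical
 * node with cpu_nid, the emulated node the cpu is assigned to.
 */
static void model_set_cpumask(int cpu, int cpu_nid, bool enable)
{
	int physnid;

	assert(cpu >= 0 && cpu < 32);
	assert(cpu_nid >= 0 && cpu_nid < MAX_EMU_NODES);

	physnid = emu_nid_to_phys[cpu_nid];

	/* The loop variable, not the cpu's own node, indexes the table. */
	for (int nid = 0; nid < MAX_EMU_NODES; nid++) {
		if (emu_nid_to_phys[nid] != physnid)
			continue;
		if (enable)
			node_cpumask[nid] |= 1u << cpu;
		else
			node_cpumask[nid] &= ~(1u << cpu);
	}
}
```

Adding cpus 0-15 with cpu N assigned to emulated node N / 4 yields cpus 0-3 in
both node 0 and node 4, cpus 4-7 in both node 1 and node 5, and so on, which is
the same pattern as the cpulist output shown for numa=fake=8 in this thread.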

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Linux 2.6.39-rc4 (regression: NUMA on multi-node CPUs broken)
  2011-04-21  2:17       ` David Rientjes
@ 2011-04-21  5:45         ` KOSAKI Motohiro
  0 siblings, 0 replies; 18+ messages in thread
From: KOSAKI Motohiro @ 2011-04-21  5:45 UTC (permalink / raw)
  To: David Rientjes
  Cc: kosaki.motohiro, Andreas Herrmann, Linus Torvalds, linux-kernel,
	Ingo Molnar, Tejun Heo

> On Thu, 21 Apr 2011, KOSAKI Motohiro wrote:
> 
> > Simply reverting 7d6b46707f24 brings back the same boot failure.
> > 
> 
> Do you have CONFIG_DEBUG_PER_CPU_MAPS enabled?  If not, please send your 
> .config.

Oops. Yes, I have CONFIG_DEBUG_PER_CPU_MAPS=y.




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch 1/2] x86, numa: Revert "Fix fakenuma boot failure"
  2011-04-21  2:19     ` [patch 1/2] x86, numa: Revert "Fix fakenuma boot failure" David Rientjes
  2011-04-21  2:19       ` [patch 2/2] x86, numa: Fix cpu nodemasks for NUMA emulation and CONFIG_DEBUG_PER_CPU_MAPS David Rientjes
@ 2011-04-21  5:45       ` KOSAKI Motohiro
  2011-04-21 12:09       ` [tip:x86/urgent] Revert "x86, NUMA: Fix " tip-bot for David Rientjes
  2 siblings, 0 replies; 18+ messages in thread
From: KOSAKI Motohiro @ 2011-04-21  5:45 UTC (permalink / raw)
  To: David Rientjes
  Cc: kosaki.motohiro, Linus Torvalds, Ingo Molnar, Andreas Herrmann,
	linux-kernel, Tejun Heo, x86

> 7d6b46707f24 (x86, NUMA: Fix fakenuma boot failure) could cause physical 
> NUMA topologies to move sibling cpus to a single node when in reality 
> they are in separate domains.  This may result in some nodes being 
> completely void of cpus, which doesn't accurately represent the correct 
> topology.
> 
> This commit was intended as a fix for NUMA emulation, but should not 
> cause a regression for real NUMA machines as a side effect.
> 
> Reported-by: Andreas Herrmann <herrmann.der.user@googlemail.com>
> Signed-off-by: David Rientjes <rientjes@google.com>

Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
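
For readers following the thread, here is a minimal userspace sketch of the
effect described in the quoted changelog. It is not the kernel code: struct
cpu_topo, force_same_node(), bring_up_all() and cpus_on_node() are
hypothetical names made up for illustration, and the topology (one package
spanning two nodes, as on Magny-Cours) is an assumed example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-cpu record: physical package id and NUMA node id. */
struct cpu_topo {
	int package;
	int node;
};

/* Same rule as the reverted check: on a mismatch, cpu1 moves to cpu2's node. */
static void force_same_node(struct cpu_topo *cpus, int cpu1, int cpu2)
{
	if (cpus[cpu1].node != cpus[cpu2].node)
		cpus[cpu1].node = cpus[cpu2].node;
}

/* Bring cpus up in order, checking each new cpu against those already up. */
static void bring_up_all(struct cpu_topo *cpus, int nr_cpus)
{
	int cpu, i;

	assert(cpus != NULL && nr_cpus >= 0);
	for (cpu = 0; cpu < nr_cpus; cpu++)
		for (i = 0; i < cpu; i++)
			if (cpus[cpu].package == cpus[i].package)
				force_same_node(cpus, cpu, i);
}

/* Count how many cpus currently claim the given node. */
static int cpus_on_node(const struct cpu_topo *cpus, int nr_cpus, int node)
{
	int cpu, count = 0;

	for (cpu = 0; cpu < nr_cpus; cpu++)
		if (cpus[cpu].node == node)
			count++;
	return count;
}
```

With four cpus in one package, cpus 0-1 on node 0 and cpus 2-3 on node 1,
bring_up_all() ends with all four cpus on node 0 and node 1 empty, which is
the "void of cpus" case the revert addresses.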





* Re: [patch 2/2] x86, numa: Fix cpu nodemasks for NUMA emulation and CONFIG_DEBUG_PER_CPU_MAPS
  2011-04-21  2:19       ` [patch 2/2] x86, numa: Fix cpu nodemasks for NUMA emulation and CONFIG_DEBUG_PER_CPU_MAPS David Rientjes
@ 2011-04-21  5:45         ` KOSAKI Motohiro
  2011-04-21 19:43           ` David Rientjes
  2011-04-21 12:10         ` [tip:x86/urgent] " tip-bot for David Rientjes
  1 sibling, 1 reply; 18+ messages in thread
From: KOSAKI Motohiro @ 2011-04-21  5:45 UTC (permalink / raw)
  To: David Rientjes
  Cc: kosaki.motohiro, Linus Torvalds, Ingo Molnar, Andreas Herrmann,
	linux-kernel, Tejun Heo, x86

> cpu nodemasks under CONFIG_DEBUG_PER_CPU_MAPS when NUMA emulation is 
> enabled is currently broken because it does not iterate through every 
> emulated node and bind cpus that have affinity to it.  NUMA emulation 
> should bind each cpu to every local node to accurately represent the true 
> NUMA topology of the underlying machine.
> 
> debug_cpumask_set_cpu() needs to be fixed at the same time so that the 
> debugging information that it emits shows the new cpumask of the node 
> being assigned when the cpu is being added or removed.  It can now take 
> responsibility of setting or clearing the cpu itself to remove the need 
> for duplicate code.
> 
> Also changes its last formal, "enable", to have the correct bool type 
> since it can only be true or false.
> 
> Signed-off-by: David Rientjes <rientjes@google.com>

OK, this is better. I hadn't realized that node_to_cpumask_map[] doesn't
need exclusive cpu maps.

Thank you!
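
To make the non-exclusive mapping concrete, here is a small userspace sketch
of the loop in the quoted patch. It is an illustration only, under stated
assumptions: the node_to_cpumask[] array of plain bitmasks stands in for
node_to_cpumask_map[], the four-node emu_nid_to_phys[] table is a made-up
layout, and numa_set_cpumask() takes the cpu's node as a parameter instead of
calling early_cpu_to_node().

```c
#include <assert.h>
#include <stdbool.h>

#define NR_EMU_NODES	4
#define NUMA_NO_NODE	(-1)

/* Assumed layout: emulated nodes 0,1 on physical node 0; 2,3 on node 1. */
static const int emu_nid_to_phys[NR_EMU_NODES] = { 0, 0, 1, 1 };

/* One bit per cpu; a stand-in for the kernel's node_to_cpumask_map[]. */
static unsigned long node_to_cpumask[NR_EMU_NODES];

/* Set or clear one cpu in one node's mask. */
static void set_cpu_in_node(int cpu, int nid, bool enable)
{
	assert(cpu >= 0 && cpu < (int)(8 * sizeof(unsigned long)));
	if (enable)
		node_to_cpumask[nid] |= 1UL << cpu;
	else
		node_to_cpumask[nid] &= ~(1UL << cpu);
}

/*
 * Walk every emulated node and update each one that shares the
 * physical node of the cpu's own emulated node (cpu_nid).
 */
static void numa_set_cpumask(int cpu, int cpu_nid, bool enable)
{
	int physnid, nid;

	if (cpu_nid == NUMA_NO_NODE)
		return;

	physnid = emu_nid_to_phys[cpu_nid];
	for (nid = 0; nid < NR_EMU_NODES; nid++) {
		if (emu_nid_to_phys[nid] != physnid)
			continue;
		set_cpu_in_node(cpu, nid, enable);
	}
}
```

Calling numa_set_cpumask(5, 1, true) sets cpu 5 in the masks of emulated
nodes 0 and 1 and leaves nodes 2 and 3 untouched, so one cpu can appear in
several node masks at once.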


Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>



However

> -struct cpumask __cpuinit *debug_cpumask_set_cpu(int cpu, int enable)
> +void debug_cpumask_set_cpu(int cpu, int node, bool enable)
>  {
> -	int node = early_cpu_to_node(cpu);
>  	struct cpumask *mask;
>  	char buf[64];
>  
> @@ -227,9 +226,14 @@ struct cpumask __cpuinit *debug_cpumask_set_cpu(int cpu, int enable)
>  	if (!mask) {
>  		pr_err("node_to_cpumask_map[%i] NULL\n", node);
>  		dump_stack();
> -		return NULL;
> +		return;
>  	}
>  
> +	if (enable)
> +		cpumask_set_cpu(cpu, mask);
> +	else
> +		cpumask_clear_cpu(cpu, mask);
> +

Shouldn't the following patch be applied as well?


From aaca24826696f7911bd66380baa18cfbe4f4b18e Mon Sep 17 00:00:00 2001
From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Date: Thu, 21 Apr 2011 14:01:42 +0900
Subject: [PATCH] Fix

debug_cpumask_set_cpu() has three return statements. We have to change
the remaining two return statements as well.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 arch/x86/mm/numa.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 0471b1d6..745258d 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -220,7 +220,7 @@ void debug_cpumask_set_cpu(int cpu, int node, bool enable)
 
 	if (node == NUMA_NO_NODE) {
 		/* early_cpu_to_node() already emits a warning and trace */
-		return NULL;
+		return;
 	}
 	mask = node_to_cpumask_map[node];
 	if (!mask) {
@@ -238,7 +238,7 @@ void debug_cpumask_set_cpu(int cpu, int node, bool enable)
 	printk(KERN_DEBUG "%s cpu %d node %d: mask now %s\n",
 		enable ? "numa_add_cpu" : "numa_remove_cpu",
 		cpu, node, buf);
-	return mask;
+	return;
 }
 
 # ifndef CONFIG_NUMA_EMU
-- 
1.7.3.1






* Re: Linux 2.6.39-rc4 (regression: NUMA on multi-node CPUs broken)
  2011-04-21  2:04   ` KOSAKI Motohiro
@ 2011-04-21  6:04     ` Andreas Herrmann
  0 siblings, 0 replies; 18+ messages in thread
From: Andreas Herrmann @ 2011-04-21  6:04 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Linus Torvalds, Linux Kernel Mailing List, Ingo Molnar, Tejun Heo

On Thu, Apr 21, 2011 at 11:04:27AM +0900, KOSAKI Motohiro wrote:
> > Following patch breaks real NUMA on multi-node CPUs like AMD
> > Magny-Cours and should be reverted (or changed to just take effect in
> > case of numa=fake):
> > 
> >   commit 7d6b46707f2491a94f4bd3b4329d2d7f809e9368
> >   Author: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> >   Date:   Fri Apr 15 20:39:01 2011 +0900
> > 
> >     x86, NUMA: Fix fakenuma boot failure
> > 
> >     ...
> > 
> >     Thus, this patch implements a reassignment of node-ids if buggy firmware
> >     or numa emulation makes a wrong cpu node map. It enforces that all logical
> >     cpus in the same physical cpu share the same node.
> > 
> >     ...
> > 
> >   +static void __cpuinit check_cpu_siblings_on_same_node(int cpu1, int cpu2)
> >   +{
> >   +       int node1 = early_cpu_to_node(cpu1);
> >   +       int node2 = early_cpu_to_node(cpu2);
> >   +
> >   +       /*
> >   +        * Our CPU scheduler assumes all logical cpus in the same physical cpu
> >   +        * share the same node. But, buggy ACPI or NUMA emulation might assign
> >   +        * them to different node. Fix it.
> >   +        */
> > 
> >    ...
> > 
> > This is a false assumption. Magny-Cours has two nodes in the same
> > physical package. The scheduler was (kind of) fixed to work around
> > this boot problem for multi-node CPUs (with 2.6.32). 
> 
> I agree we have to fix this ASAP. I also think we have to avoid reintroducing
> the same problem again. Can you please tell me the commit-id of this one?

It's

  commit 5a925b4282d7f805deafde62001a83dbaf8be275
  Author: Andreas Herrmann <andreas.herrmann3@amd.com>
  Date:   Thu Sep 3 09:44:28 2009 +0200

    x86, sched: Workaround broken sched domain creation for AMD Magny-Cours
 

 
> > If this is also
> > an issue with wrong cpu node maps in case of NUMA emulation this might
> > be fixed similar or this quirk should only be applied in case of NUMA
> > emulation.
> 
> Indeed.
> 
> Tejun, do you remember that I sent a numa-emulation-specific patch at first?
> Now I side with Andreas, because I bet the current numa fallback code (the
> one you pointed out) has no users.
> 
> Or, please let us know if you have an alternative patch.
> 
> Attached are the revert and the fakenuma-specific fix patches.


Andreas


* [tip:x86/urgent] Revert "x86, NUMA: Fix fakenuma boot failure"
  2011-04-21  2:19     ` [patch 1/2] x86, numa: Revert "Fix fakenuma boot failure" David Rientjes
  2011-04-21  2:19       ` [patch 2/2] x86, numa: Fix cpu nodemasks for NUMA emulation and CONFIG_DEBUG_PER_CPU_MAPS David Rientjes
  2011-04-21  5:45       ` [patch 1/2] x86, numa: Revert "Fix fakenuma boot failure" KOSAKI Motohiro
@ 2011-04-21 12:09       ` tip-bot for David Rientjes
  2 siblings, 0 replies; 18+ messages in thread
From: tip-bot for David Rientjes @ 2011-04-21 12:09 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, torvalds, herrmann.der.user, tj, tglx,
	rientjes, kosaki.motohiro, mingo

Commit-ID:  37f8527dbfd05af0f670aa02370d0c4cca7fbda6
Gitweb:     http://git.kernel.org/tip/37f8527dbfd05af0f670aa02370d0c4cca7fbda6
Author:     David Rientjes <rientjes@google.com>
AuthorDate: Wed, 20 Apr 2011 19:19:10 -0700
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 21 Apr 2011 11:30:59 +0200

Revert "x86, NUMA: Fix fakenuma boot failure"

Andreas Herrmann reported that 7d6b46707f24 ("x86, NUMA: Fix fakenuma
boot failure") causes certain physical NUMA topologies (for example
AMD Magny-Cours) to move sibling cpus to a single node when in reality
they are in separate domains.

This may result in some nodes being completely void of cpus, which
doesn't accurately represent the correct topology. The system will
boot, but will have suboptimal NUMA performance.

This commit was intended as a fix for NUMA emulation, but should
not cause a regression for real NUMA machines as a side effect.

( There will be a separate fix for the numa-debug code, which
  will not affect physical topologies. )

Reported-by: Andreas Herrmann <herrmann.der.user@googlemail.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1104201918110.12634@chino.kir.corp.google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kernel/smpboot.c |   23 -----------------------
 1 files changed, 0 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 8ed8908..c2871d3 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -312,26 +312,6 @@ void __cpuinit smp_store_cpu_info(int id)
 		identify_secondary_cpu(c);
 }
 
-static void __cpuinit check_cpu_siblings_on_same_node(int cpu1, int cpu2)
-{
-	int node1 = early_cpu_to_node(cpu1);
-	int node2 = early_cpu_to_node(cpu2);
-
-	/*
-	 * Our CPU scheduler assumes all logical cpus in the same physical cpu
-	 * share the same node. But, buggy ACPI or NUMA emulation might assign
-	 * them to different node. Fix it.
-	 */
-	if (node1 != node2) {
-		pr_warning("CPU %d in node %d and CPU %d in node %d are in the same physical CPU. forcing same node %d\n",
-			   cpu1, node1, cpu2, node2, node2);
-
-		numa_remove_cpu(cpu1);
-		numa_set_node(cpu1, node2);
-		numa_add_cpu(cpu1);
-	}
-}
-
 static void __cpuinit link_thread_siblings(int cpu1, int cpu2)
 {
 	cpumask_set_cpu(cpu1, cpu_sibling_mask(cpu2));
@@ -340,7 +320,6 @@ static void __cpuinit link_thread_siblings(int cpu1, int cpu2)
 	cpumask_set_cpu(cpu2, cpu_core_mask(cpu1));
 	cpumask_set_cpu(cpu1, cpu_llc_shared_mask(cpu2));
 	cpumask_set_cpu(cpu2, cpu_llc_shared_mask(cpu1));
-	check_cpu_siblings_on_same_node(cpu1, cpu2);
 }
 
 
@@ -382,12 +361,10 @@ void __cpuinit set_cpu_sibling_map(int cpu)
 		    per_cpu(cpu_llc_id, cpu) == per_cpu(cpu_llc_id, i)) {
 			cpumask_set_cpu(i, cpu_llc_shared_mask(cpu));
 			cpumask_set_cpu(cpu, cpu_llc_shared_mask(i));
-			check_cpu_siblings_on_same_node(cpu, i);
 		}
 		if (c->phys_proc_id == cpu_data(i).phys_proc_id) {
 			cpumask_set_cpu(i, cpu_core_mask(cpu));
 			cpumask_set_cpu(cpu, cpu_core_mask(i));
-			check_cpu_siblings_on_same_node(cpu, i);
 			/*
 			 *  Does this new cpu bringup a new core?
 			 */


* [tip:x86/urgent] x86, numa: Fix cpu nodemasks for NUMA emulation and CONFIG_DEBUG_PER_CPU_MAPS
  2011-04-21  2:19       ` [patch 2/2] x86, numa: Fix cpu nodemasks for NUMA emulation and CONFIG_DEBUG_PER_CPU_MAPS David Rientjes
  2011-04-21  5:45         ` KOSAKI Motohiro
@ 2011-04-21 12:10         ` tip-bot for David Rientjes
  1 sibling, 0 replies; 18+ messages in thread
From: tip-bot for David Rientjes @ 2011-04-21 12:10 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, torvalds, herrmann.der.user, tj, tglx,
	rientjes, kosaki.motohiro, mingo

Commit-ID:  7a6c6547825a2324faa76cff856db11d78de075e
Gitweb:     http://git.kernel.org/tip/7a6c6547825a2324faa76cff856db11d78de075e
Author:     David Rientjes <rientjes@google.com>
AuthorDate: Wed, 20 Apr 2011 19:19:13 -0700
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 21 Apr 2011 11:31:00 +0200

x86, numa: Fix cpu nodemasks for NUMA emulation and CONFIG_DEBUG_PER_CPU_MAPS

The cpu<->node mappings under CONFIG_DEBUG_PER_CPU_MAPS=y
when NUMA emulation is enabled is currently broken because it does
not iterate through every emulated node and bind cpus that have
affinity to it.

NUMA emulation should bind each cpu to every local node to
accurately represent the true NUMA topology of the underlying
machine.

debug_cpumask_set_cpu() needs to be fixed at the same time so
that the debugging information that it emits shows the new
cpumask of the node being assigned when the cpu is being added
or removed.

It can now take responsibility of setting or clearing the cpu
itself to remove the need for duplicate code.

Also change its last parameter, "enable", to have the correct bool
type since it can only be true or false.

 -v2: Fix the return statements, by Kosaki Motohiro

Acked-and-Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Andreas Herrmann <herrmann.der.user@googlemail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1104201918470.12634@chino.kir.corp.google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/include/asm/numa.h  |    2 +-
 arch/x86/mm/numa.c           |   31 +++++++++++++------------------
 arch/x86/mm/numa_emulation.c |   20 ++++++--------------
 3 files changed, 20 insertions(+), 33 deletions(-)

diff --git a/arch/x86/include/asm/numa.h b/arch/x86/include/asm/numa.h
index 3d4dab4..a50fc9f 100644
--- a/arch/x86/include/asm/numa.h
+++ b/arch/x86/include/asm/numa.h
@@ -51,7 +51,7 @@ static inline void numa_remove_cpu(int cpu)		{ }
 #endif	/* CONFIG_NUMA */
 
 #ifdef CONFIG_DEBUG_PER_CPU_MAPS
-struct cpumask __cpuinit *debug_cpumask_set_cpu(int cpu, int enable);
+void debug_cpumask_set_cpu(int cpu, int node, bool enable);
 #endif
 
 #endif	/* _ASM_X86_NUMA_H */
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 9559d36..745258d 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -213,53 +213,48 @@ int early_cpu_to_node(int cpu)
 	return per_cpu(x86_cpu_to_node_map, cpu);
 }
 
-struct cpumask __cpuinit *debug_cpumask_set_cpu(int cpu, int enable)
+void debug_cpumask_set_cpu(int cpu, int node, bool enable)
 {
-	int node = early_cpu_to_node(cpu);
 	struct cpumask *mask;
 	char buf[64];
 
 	if (node == NUMA_NO_NODE) {
 		/* early_cpu_to_node() already emits a warning and trace */
-		return NULL;
+		return;
 	}
 	mask = node_to_cpumask_map[node];
 	if (!mask) {
 		pr_err("node_to_cpumask_map[%i] NULL\n", node);
 		dump_stack();
-		return NULL;
+		return;
 	}
 
+	if (enable)
+		cpumask_set_cpu(cpu, mask);
+	else
+		cpumask_clear_cpu(cpu, mask);
+
 	cpulist_scnprintf(buf, sizeof(buf), mask);
 	printk(KERN_DEBUG "%s cpu %d node %d: mask now %s\n",
 		enable ? "numa_add_cpu" : "numa_remove_cpu",
 		cpu, node, buf);
-	return mask;
+	return;
 }
 
 # ifndef CONFIG_NUMA_EMU
-static void __cpuinit numa_set_cpumask(int cpu, int enable)
+static void __cpuinit numa_set_cpumask(int cpu, bool enable)
 {
-	struct cpumask *mask;
-
-	mask = debug_cpumask_set_cpu(cpu, enable);
-	if (!mask)
-		return;
-
-	if (enable)
-		cpumask_set_cpu(cpu, mask);
-	else
-		cpumask_clear_cpu(cpu, mask);
+	debug_cpumask_set_cpu(cpu, early_cpu_to_node(cpu), enable);
 }
 
 void __cpuinit numa_add_cpu(int cpu)
 {
-	numa_set_cpumask(cpu, 1);
+	numa_set_cpumask(cpu, true);
 }
 
 void __cpuinit numa_remove_cpu(int cpu)
 {
-	numa_set_cpumask(cpu, 0);
+	numa_set_cpumask(cpu, false);
 }
 # endif	/* !CONFIG_NUMA_EMU */
 
diff --git a/arch/x86/mm/numa_emulation.c b/arch/x86/mm/numa_emulation.c
index ad091e4..de84cc1 100644
--- a/arch/x86/mm/numa_emulation.c
+++ b/arch/x86/mm/numa_emulation.c
@@ -454,10 +454,9 @@ void __cpuinit numa_remove_cpu(int cpu)
 		cpumask_clear_cpu(cpu, node_to_cpumask_map[i]);
 }
 #else	/* !CONFIG_DEBUG_PER_CPU_MAPS */
-static void __cpuinit numa_set_cpumask(int cpu, int enable)
+static void __cpuinit numa_set_cpumask(int cpu, bool enable)
 {
-	struct cpumask *mask;
-	int nid, physnid, i;
+	int nid, physnid;
 
 	nid = early_cpu_to_node(cpu);
 	if (nid == NUMA_NO_NODE) {
@@ -467,28 +466,21 @@ static void __cpuinit numa_set_cpumask(int cpu, int enable)
 
 	physnid = emu_nid_to_phys[nid];
 
-	for_each_online_node(i) {
+	for_each_online_node(nid) {
 		if (emu_nid_to_phys[nid] != physnid)
 			continue;
 
-		mask = debug_cpumask_set_cpu(cpu, enable);
-		if (!mask)
-			return;
-
-		if (enable)
-			cpumask_set_cpu(cpu, mask);
-		else
-			cpumask_clear_cpu(cpu, mask);
+		debug_cpumask_set_cpu(cpu, nid, enable);
 	}
 }
 
 void __cpuinit numa_add_cpu(int cpu)
 {
-	numa_set_cpumask(cpu, 1);
+	numa_set_cpumask(cpu, true);
 }
 
 void __cpuinit numa_remove_cpu(int cpu)
 {
-	numa_set_cpumask(cpu, 0);
+	numa_set_cpumask(cpu, false);
 }
 #endif	/* !CONFIG_DEBUG_PER_CPU_MAPS */


* Re: [patch 2/2] x86, numa: Fix cpu nodemasks for NUMA emulation and CONFIG_DEBUG_PER_CPU_MAPS
  2011-04-21  5:45         ` KOSAKI Motohiro
@ 2011-04-21 19:43           ` David Rientjes
  0 siblings, 0 replies; 18+ messages in thread
From: David Rientjes @ 2011-04-21 19:43 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Linus Torvalds, Ingo Molnar, Andreas Herrmann, linux-kernel,
	Tejun Heo, x86

On Thu, 21 Apr 2011, KOSAKI Motohiro wrote:

> From aaca24826696f7911bd66380baa18cfbe4f4b18e Mon Sep 17 00:00:00 2001
> From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Date: Thu, 21 Apr 2011 14:01:42 +0900
> Subject: [PATCH] Fix
> 
> debug_cpumask_set_cpu() has three return statements. We have to change
> the remaining two return statements as well.
> 
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> ---
>  arch/x86/mm/numa.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> index 0471b1d6..745258d 100644
> --- a/arch/x86/mm/numa.c
> +++ b/arch/x86/mm/numa.c
> @@ -220,7 +220,7 @@ void debug_cpumask_set_cpu(int cpu, int node, bool enable)
>  
>  	if (node == NUMA_NO_NODE) {
>  		/* early_cpu_to_node() already emits a warning and trace */
> -		return NULL;
> +		return;
>  	}
>  	mask = node_to_cpumask_map[node];
>  	if (!mask) {
> @@ -238,7 +238,7 @@ void debug_cpumask_set_cpu(int cpu, int node, bool enable)
>  	printk(KERN_DEBUG "%s cpu %d node %d: mask now %s\n",
>  		enable ? "numa_add_cpu" : "numa_remove_cpu",
>  		cpu, node, buf);
> -	return mask;
> +	return;
>  }
>  
>  # ifndef CONFIG_NUMA_EMU

Yes, it looks like Ingo fixed that up when it was merged in the latest 
git as 7a6c6547825a, thanks for pointing it out.


* Re: Linux 2.6.39-rc4 (regression: NUMA on multi-node CPUs broken)
  2011-04-21  0:45   ` David Rientjes
  2011-04-21  2:04     ` KOSAKI Motohiro
  2011-04-21  2:19     ` [patch 1/2] x86, numa: Revert "Fix fakenuma boot failure" David Rientjes
@ 2011-04-21 19:45     ` David Rientjes
  2 siblings, 0 replies; 18+ messages in thread
From: David Rientjes @ 2011-04-21 19:45 UTC (permalink / raw)
  To: Andreas Herrmann
  Cc: Linus Torvalds, KOSAKI Motohiro, linux-kernel, Ingo Molnar, Tejun Heo

On Wed, 20 Apr 2011, David Rientjes wrote:

> I'm not sure what it's trying to address (yes, there is a problem with the 
> binding for CONFIG_NUMA_EMU && CONFIG_DEBUG_PER_CPU_MAPS, but not 
> otherwise).
> 

Andreas, the revert (37f8527dbfd0) and the new NUMA emulation fix 
(7a6c6547825a) have been merged into the latest -git, please let us know 
if there are any other issues that you notice.  Thanks!


