* Linux 2.6.39-rc3
@ 2011-04-12  0:40 Linus Torvalds
  2011-04-12  9:02 ` Joerg Roedel
                   ` (2 more replies)
  0 siblings, 3 replies; 108+ messages in thread
From: Linus Torvalds @ 2011-04-12  0:40 UTC
  To: Linux Kernel Mailing List

It's been another almost spookily calm week. Usually this kind of
calmness happens much later in the -rc series (during -rc7 or -rc8,
say), but I'm not going to complain. I'm just still waiting for the
other shoe to drop.

And it is possible that this really ended up being a very calm release
cycle. We certainly didn't have any big revolutionary changes like the
name lookup stuff we had last cycle. So I'm quietly optimistic that no
shoe-drop will happen.

Anyway, not only has it been calm, it's been pretty normal. Two thirds
drivers is pretty normal, with the rest being fairly spread out all
over:

   3.5% Documentation/
   2.8% arch/arm/
  10.2% arch/
   2.0% drivers/media/video/
   3.3% drivers/media/
   2.6% drivers/net/wireless/
  10.3% drivers/net/
   5.5% drivers/scsi/
   8.4% drivers/staging/memrar/
  24.2% drivers/staging/
   3.5% drivers/video/
  62.0% drivers/
   2.0% fs/xfs/linux-2.6/
   4.3% fs/xfs/
   9.6% fs/
   2.2% include/linux/
   3.7% include/
   4.4% net/
   2.5% sound/

and the appended shortlog shows more details if you care. Nothing
earth-shattering.

Let's hope the release cycle continues like this. I _like_ it when
people really seem to follow the whole "big changes during the merge
window" rules.

Thanks guys,

                   Linus

---

Aaron Plattner (1):
      ALSA: hda - HDMI: Fix MCP7x audio infoframe checksums

Ajit Khaparde (1):
      be2net: Fix a potential crash during shutdown.

Alan Cox (1):
      staging: sep: remove last memrar remnants

Alexandre Courbot (5):
      sh: mach-ecovec24: support for main lcd backlight
      sh: mach-ap325rxa: move backlight control code
      fbdev: sh_mobile_lcdcfb: add blanking support
      fbdev: sh_mobile_lcdcfb: fix module lock acquisition
      serial: sh-sci: prevent setup of uninitialized serial console

Andre Przywara (2):
      KVM: fix XSAVE bit scanning
      KVM: move and fix substitue search for missing CPUID entries

Antonio Ospite (1):
      ASoC: zylonite: set .codec_dai_name in initializer

Arjan Mels (4):
      staging: usbip: fix shutdown problems.
      staging: usbip: bugfixes related to kthread conversion
      staging: usbip: bugfix add number of packets for isochronous frames
      staging: usbip: bugfix for isochronous packets and optimization

Artem Bityutskiy (10):
      UBIFS: do not read flash unnecessarily
      UBIFS: fix oops on error path in read_pnode
      UBIFS: fix assertion warnings
      UBIFS: do not select KALLSYMS_ALL
      UBIFS: unify error path dbg_debugfs_init_fs
      UBIFS: fix error path in dbg_debugfs_init_fs
      UBIFS: fix debugging failure in dbg_check_space_info
      UBI: check if we are in RO mode in the erase routine
      UBI: do not compare array with NULL
      UBI: do not select KALLSYMS_ALL

Ben Dooks (1):
      gpio/pca953x: fix error handling path in probe() call

Bryan Schumaker (2):
      NFS: Fix a signed vs. unsigned secinfo bug
      NFS: Change initial mount authflavor only when server returns NFS4ERR_WRONGSEC

Chase Douglas (1):
      HID: hid-magicmouse: Increase evdev buffer size

Christian Lamparter (1):
      p54usb: IDs for two new devices

Christoph Hellwig (3):
      xfs: fix variable set but not used warnings
      xfs: fix xfs_debug warnings
      xfs: use proper interfaces for on-stack plugging

Curt Wohlgemuth (1):
      ext4: sync the directory inode in ext4_sync_parent()

Dan Carpenter (1):
      Staging: westbridge/astoria: unlock on error path

Dave Chinner (10):
      xfs: fix unreferenced var error in xfs_buf.c
      xfs: fix extent format buffer allocation size
      xfs: introduce a xfssyncd workqueue
      xfs: convert ENOSPC inode flushing to use new syncd workqueue
      xfs: introduce background inode reclaim work
      xfs: convert the xfsaild threads to a workqueue
      xfs: clean up code layout in xfs_trans_ail.c
      xfs: push the AIL from memory reclaim and periodic sync
      xfs: catch bad block numbers freeing extents.
      xfs: convert log tail checking to a warning

Dave Jones (1):
      staging: hv: fix reversed memset arguments in hv_mouse

David Henningsson (2):
      ALSA: HDA: Fix dock mic for Lenovo X220-tablet
      ALSA: HDA: Fix single internal mic on ALC275 (Sony Vaio VPCSB1C5E)

David Sterba (1):
      netfilter: h323: bug in parsing of ASN1 SEQOF field

Davidlohr Bueso (1):
      efifb: support AMD Radeon HD 6490

Enric Balletbo i Serra (2):
      smsc911x: fix mac_lock acquision before calling smsc911x_mac_read
      can: mcp251x: Allow pass IRQ flags through platform data.

Feng Tang (2):
      rtc, x86/mrst/vrtc: Fix boot crash in rtc_read_alarm()
      x86/mrst/vrtc: Fix boot crash in mrst_rtc_init()

Florian Tobias Schandinat (2):
      viafb: refresh rate bug collection
      viafb: initialize margins correct

Florian Westphal (4):
      netfilter: af_info: add network namespace parameter to route hook
      netfilter: af_info: add 'strict' parameter to limit lookup to .oif
      netfilter: xt_addrtype: replace rt6_lookup with nf_afinfo->route
      netfilter: xt_conntrack: fix inverted conntrack direction test

Gleb Natapov (1):
      KVM: Enable async page fault processing

Greg Kroah-Hartman (2):
      Staging: vt665?: prevent modules from being built into the kernel.
      staging: memrar: remove driver from tree

Guennadi Liakhovetski (1):
      ARM: arch-shmobile: only run HDMI init on respective boards

H. Peter Anvin (1):
      x86, hibernate: Initialize mmu_cr4_features during boot

Haiyang Zhang (1):
      staging: hv: Fix GARP not sent after Quick Migration

Hans Rosenfeld (1):
      x86-32, fpu: Fix FPU exception handling on non-SSE systems

Hans Schillstrom (1):
      IPVS: fix NULL ptr dereference in ip_vs_ctl.c ip_vs_genl_dump_daemons()

Helmut Schaa (1):
      mac80211: Fix duplicate frames on cooked monitor

Hong Xu (2):
      mtd: atmel_nand: fix support for CPUs that do not support DMA access
      mtd: atmel_nand: use CPU I/O when buffer is in vmalloc(ed) region

Ian Campbell (1):
      MAINTAINERS: add entry for Xen network backend

Ira W. Snyder (1):
      dt/fsldma: fix build warning caused by of_platform_device changes

J. Bruce Fields (2):
      nfsd: fix auth_domain reference leak on nlm operations
      nfsd4: fix oops on lock failure

Jan Glauber (1):
      [S390] oprofile s390: prevent stack corruption

Jan Kara (2):
      quota: Don't write quota info in dquot_commit()
      ext4: remove unnecessary [cm]time update of quota file

Javier M. Mellid (1):
      staging: sm7xx: fixed defines

Jingoo Han (1):
      video: s3c-fb: fix checkpatch errors and warning

Jiri Kosina (2):
      HID: add support for Skycable 0x3f07 wireless presenter
      HID: Add support for CH Pro Throttle

Johannes Berg (1):
      mac80211: fix comment regarding aggregation buf_size

John W. Linville (2):
      b43: allocate receive buffers big enough for max frame len + offset
      iwlwifi: accept EEPROM version 0x423 for iwl6000

Jozsef Kadlecsik (2):
      netfilter: ipset: list:set timeout variant fixes
      netfilter: ipset: references are protected by rwlock instead of mutex

Julia Lawall (1):
      drivers/video/bfin-lq035q1-fb.c: introduce missing kfree

Jussi Kivilinna (2):
      zd1211rw: remove URB_SHORT_NOT_OK flag in zd_usb_iowrite16v_async()
      zd1211rw: reset rx idle timer from tasklet

Kazuya Mio (1):
      ext4: Allow indirect-block file to grow the file size to max file size

Kuninori Morimoto (1):
      ARM: arch-shmobile: only run FSI init on respective boards

Larry Finger (1):
      rtlwifi: Fix some warnings/bugs

Linus Torvalds (3):
      mm: avoid wrapping vm_pgoff in mremap()
      pci: fix PCI bus allocation alignment handling
      Linux 2.6.39-rc3

Lucas De Marchi (1):
      Fix common misspellings

Luciano Coelho (2):
      wl12xx: fix module author's email address in the spi and sdio modules
      wl12xx: fix potential buffer overflow in testmode nvs push

Martin Schwidefsky (1):
      [S390] compile fix for latest binutils

Masami Hiramatsu (5):
      perf probe: Fix to ensure function declared file
      perf probe: Fix to remove redundant close
      perf probe: Fix multiple --vars options behavior
      perf probe: Fix to find recursively inlined function
      perf probe: Fix listing incorrect line number with inline function

Matthew Garrett (2):
      fb: Reduce priority of resource conflict message
      efifb: Add override for 11" Macbook Air 3,1

Michael Hennerich (6):
      staging: IIO: IMU: ADIS16400: Fix up SPI messages cs_change behavior
      staging: IIO: IMU: ADIS16400: Add delay after self test
      staging: IIO: IMU: ADIS16400: Fix addresses of GYRO and ACCEL calibration offset
      staging: IIO: IMU: ADIS16400: Make sure only enabled scan_elements are pushed into the ring
      staging: IIO: IMU: ADIS16400: Fix product ID check, skip embedded revision number
      staging: IIO: IMU: ADIS16400: Avoid using printk facility directly

Michael Holzheu (1):
      [S390] Fix parameter passing for smp_switch_to_cpu()

Michael S. Tsirkin (1):
      KVM: fix crash on irqfd deassign

Neil Horman (1):
      ipv6: Enable RFS sk_rxhash tracking for ipv6 sockets (v2)

Nicolas Ferre (2):
      mtd: atmel_nand: trivial: change DMA usage information trace
      mtd: atmel_nand: modify test case for using DMA operations

Nobuhiro Iwamatsu (3):
      sh: sh-sci: Fix double initialization by serial_console_setup
      sh: landisk: Remove mv_nr_irqs
      sh: landisk: Remove whitespace

OGAWA Hirofumi (1):
      ipv4: Fix "Set rt->rt_iif more sanely on output routes."

Olaf Hering (2):
      staging: hv: use sync_bitops when interacting with the hypervisor
      staging: hv: update dist release parsing in hv_kvp_daemon

Ondrej Zary (1):
      s3fb: fix Virge/GX2

Padmanabh Ratnakar (2):
      be2net: Rename some struct members for clarity
      be2net: Fix suspend/resume operation

Paul Mundt (1):
      sh: select ARCH_NO_SYSDEV_OPS.

Peter Jones (1):
      efifb: Support overriding fields FW tells us with the DMI data.

Peter Korsgaard (2):
      dsa/mv88e6131: add support for mv88e6085 switch
      watchdog: mpc8xxx_wdt: fix build

Peter Oberparleiter (1):
      [S390] cio: prevent purging of CCW devices in the online state

Peter Tyser (2):
      gpio/ml_ioh_gpio: Fix output value of ioh_gpio_direction_output()
      gpio/pch_gpio: Fix output value of pch_gpio_direction_output()

Peter Zijlstra (1):
      sched: Clean up rebalance_domains() load-balance interval calculation

Randy Dunlap (4):
      mtd: mtdswap: fix printk format warning
      staging: fix hv_mouse build, needs delay.h
      staging/rtl81*: build as loadable modules only
      signal.c: fix erroneous syscall kernel-doc

Rasesh Mody (1):
      bna: Fix for handling firmware heartbeat failure

Roland Vossen (3):
      staging: brcm80211: fix for 'AC_BE txop..' logs spammed problem
      staging: brcm80211: fix for 'Short CCK' log spam
      staging: brcm80211: removed 'is_amsdu causing toss' log spam

Sascha Silbe (1):
      staging: fix olpc_dcon build errors

Sebastian Ott (1):
      [S390] qdio: fix init sequence

Senthil Balasubramanian (1):
      ath9k: Fix phy info print message with AR9485 chipset.

Sergey Senozhatsky (1):
      fbcon: Remove unused 'display *p' variable from fb_flashcursor()

Simon Horman (1):
      ARM: mach-shmobile: Correctly check for CONFIG_MACH_MACKEREL

Simon Wood (1):
      HID: add FF support for Logitech G25/G27

Stanislaw Gruszka (1):
      rt2x00: fix cancelling uninitialized work

Stefan Achatz (1):
      HID: roccat: Add support for wireless variant of Pyra

Stephen Boyd (1):
      HID: Fix typo Keyoutch -> Keytouch

Stephen Warren (1):
      ASoC: format_register_str: Don't clip register values

Steve Glendinning (1):
      net: Add support for SMSC LAN9530, LAN9730 and LAN89530

Takashi Iwai (1):
      ALSA: hda - Don't query connections for widgets have no connections

Tao Ma (2):
      ext4: fix a double free in ext4_register_li_request
      ext4: init timer earlier to avoid a kernel panic in __save_error_info

Tarek Soliman (1):
      ALSA: usb-audio: define another USB ID for a buggy USB MIDI cable

Tejun Heo (1):
      x86-32, NUMA: Fix ACPI NUMA init broken by recent x86-64 change

Theodore Ts'o (1):
      ext4: fix data corruption regression by reverting commit 6de9843dab3f

Thomas Gleixner (1):
      x86: visws: Fixup irq overhaul fallout

Tony Luck (1):
      xfs_destroy_workqueues() should not be tagged with__exit

Tormod Volden (3):
      savagefb: Replace magic register address with define
      savagefb: Set up I2C based on chip family instead of card id
      savagefb: Remove obsolete else clause in savage_setup_i2c_bus

Trond Myklebust (1):
      Revert "net/sunrpc: Use static const char arrays"

Ulrich Weber (1):
      pppoe: drop PPPOX_ZOMBIEs in pppoe_flush_dev

Vasily Khoruzhick (2):
      ASoC: PXA: Fix oops in __pxa2xx_pcm_prepare
      spi: Fix race condition in stop_queue()

Xiaotian Feng (1):
      genirq: Fix cpumask leak in __setup_irq()

Yevgeny Petrilin (2):
      mlx4: Sensing link type at device initialization
      mlx4_en: Restoring RX buffer pointer in case of failure

Yongqiang Yang (3):
      ext3: Fix writepage credits computation for ordered mode
      ext4: fix credits computing for indirect mapped files
      ext4: allow an active handle to be started when freezing

Yoshihiro Shimoda (2):
      dma: shdma: add checking the DMAOR_AE in sh_dmae_err
      sh: fix build error in board-sh7757lcr.c

Youquan Song (1):
      fix build fail for hv_mouse indefine udelay

Zhang Huan (1):
      jbd2: fix potential memory leak on transaction commit

Zhu Yanhai (1):
      jbd2: move bdget out of critical section

pixo (1):
      staging: ft1000-pcmcia: Fix ft1000_dnld() to work also on 64bit architectures.

wwang (2):
      staging: rts_pstor: modify initial card clock
      staging: rts_pstor: set lun_mode in a different place


* Re: Linux 2.6.39-rc3
  2011-04-12  0:40 Linux 2.6.39-rc3 Linus Torvalds
@ 2011-04-12  9:02 ` Joerg Roedel
  2011-04-12 14:15   ` Alex Deucher
  2011-04-12 19:09 ` Dave Jones
  2011-04-14 20:24 ` Borislav Petkov
  2 siblings, 1 reply; 108+ messages in thread
From: Joerg Roedel @ 2011-04-12  9:02 UTC
  To: Linus Torvalds; +Cc: Linux Kernel Mailing List, Alex Deucher, dri-devel

On Mon, Apr 11, 2011 at 05:40:11PM -0700, Linus Torvalds wrote:
> Let's hope the release cycle continues like this. I _like_ it when
> people really seem to follow the whole "big changes during the merge
> window" rules.

Sorry for disturbing the silence, but radeon seems to have issues. I
tested -rc3 (and after that -rc1 which also has the issue) on my Laptop
and it just reboots after (or while?) GFX initialization. The last lines
of dmesg are:

 Freeing unused kernel memory: 624k freed
 Write protecting the kernel read-only data: 8192k
 Freeing unused kernel memory: 1456k freed
 Freeing unused kernel memory: 16k freed
 udev: starting version 151
 udevd (62): /proc/62/oom_adj is deprecated, please use /proc/62/oom_score_adj instead.
 [drm] Initialized drm 1.1.0 20060810
 [drm] radeon defaulting to kernel modesetting.
 [drm] radeon kernel modesetting enabled.
 radeon 0000:01:05.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
 [drm] initializing kernel modesetting (RS880 0x1002:0x9712).
 [drm] register mmio base: 0xD6400000
 [drm] register mmio size: 65536
 ATOM BIOS: HP_TAG
 radeon 0000:01:05.0: VRAM: 320M 0x00000000C0000000 - 0x00000000D3FFFFFF (320M used)
 radeon 0000:01:05.0: GTT: 512M 0x00000000A0000000 - 0x00000000BFFFFFFF
 [drm] Detected VRAM RAM=320M, BAR=256M
 [drm] RAM width 32bits DDR
 [TTM] Zone  kernel: Available graphics memory: 1896512 kiB.
 usb 7-2: new full speed USB device number 2 using ohci_hcd
 [TTM] Initializing pool allocator.
 [drm] radeon: 320M of VRAM memory ready
 [drm] radeon: 512M of GTT memory ready.
 [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
 [drm] Driver supports precise vblank timestamp query.
 [drm] radeon: irq initialized.
 [drm] GART: num cpu pages 131072, num gpu pages 131072
 [drm] Loading RS780 Microcode
 radeon 0000:01:05.0: WB enabled
 [drm] ring test succeeded in 1 usecs
 [drm] radeon: ib pool ready.

The card is a Radeon Mobility 4200:

01:05.0 VGA compatible controller: ATI Technologies Inc M880G [Mobility Radeon HD 4200]
        Subsystem: Hewlett-Packard Company Device 307e
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 18
        Region 0: Memory at c0000000 (32-bit, prefetchable) [size=256M]
        Region 1: I/O ports at 6000 [size=256]
        Region 2: Memory at d6400000 (32-bit, non-prefetchable) [size=64K]
        Region 5: Memory at d6300000 (32-bit, non-prefetchable) [size=1M]
        Capabilities: [50] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [a0] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable-
                Address: 0000000000000000  Data: 0000
        Kernel driver in use: radeon
        Kernel modules: radeon

The problem does not happen with 2.6.38. I will try to bisect this down to a
commit. Alex, please let me know if you need any further information.

	Joerg



* Re: Linux 2.6.39-rc3
  2011-04-12  9:02 ` Joerg Roedel
@ 2011-04-12 14:15   ` Alex Deucher
  2011-04-12 18:44     ` Joerg Roedel
  0 siblings, 1 reply; 108+ messages in thread
From: Alex Deucher @ 2011-04-12 14:15 UTC
  To: Joerg Roedel; +Cc: Linus Torvalds, Linux Kernel Mailing List, dri-devel

On Tue, Apr 12, 2011 at 5:02 AM, Joerg Roedel <joro@8bytes.org> wrote:
> On Mon, Apr 11, 2011 at 05:40:11PM -0700, Linus Torvalds wrote:
>> Let's hope the release cycle continues like this. I _like_ it when
>> people really seem to follow the whole "big changes during the merge
>> window" rules.
>
> Sorry for disturbing the silence, but radeon seems to have issues. I
> tested -rc3 (and after that -rc1 which also has the issue) on my Laptop
> and it just reboots after (or while?) GFX initialization. The last lines
> of dmesg are:
>
>  Freeing unused kernel memory: 624k freed
>  Write protecting the kernel read-only data: 8192k
>  Freeing unused kernel memory: 1456k freed
>  Freeing unused kernel memory: 16k freed
>  udev: starting version 151
>  udevd (62): /proc/62/oom_adj is deprecated, please use /proc/62/oom_score_adj instead.
>  [drm] Initialized drm 1.1.0 20060810
>  [drm] radeon defaulting to kernel modesetting.
>  [drm] radeon kernel modesetting enabled.
>  radeon 0000:01:05.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
>  [drm] initializing kernel modesetting (RS880 0x1002:0x9712).
>  [drm] register mmio base: 0xD6400000
>  [drm] register mmio size: 65536
>  ATOM BIOS: HP_TAG
>  radeon 0000:01:05.0: VRAM: 320M 0x00000000C0000000 - 0x00000000D3FFFFFF (320M used)
>  radeon 0000:01:05.0: GTT: 512M 0x00000000A0000000 - 0x00000000BFFFFFFF
>  [drm] Detected VRAM RAM=320M, BAR=256M
>  [drm] RAM width 32bits DDR
>  [TTM] Zone  kernel: Available graphics memory: 1896512 kiB.
>  usb 7-2: new full speed USB device number 2 using ohci_hcd
>  [TTM] Initializing pool allocator.
>  [drm] radeon: 320M of VRAM memory ready
>  [drm] radeon: 512M of GTT memory ready.
>  [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
>  [drm] Driver supports precise vblank timestamp query.
>  [drm] radeon: irq initialized.
>  [drm] GART: num cpu pages 131072, num gpu pages 131072
>  [drm] Loading RS780 Microcode
>  radeon 0000:01:05.0: WB enabled
>  [drm] ring test succeeded in 1 usecs
>  [drm] radeon: ib pool ready.
>
> The card is a Radeon Mobility 4200:
>
> 01:05.0 VGA compatible controller: ATI Technologies Inc M880G [Mobility Radeon HD 4200]
>        Subsystem: Hewlett-Packard Company Device 307e
>        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
>        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>        Latency: 0, Cache Line Size: 64 bytes
>        Interrupt: pin A routed to IRQ 18
>        Region 0: Memory at c0000000 (32-bit, prefetchable) [size=256M]
>        Region 1: I/O ports at 6000 [size=256]
>        Region 2: Memory at d6400000 (32-bit, non-prefetchable) [size=64K]
>        Region 5: Memory at d6300000 (32-bit, non-prefetchable) [size=1M]
>        Capabilities: [50] Power Management version 3
>                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
>                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
>        Capabilities: [a0] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable-
>                Address: 0000000000000000  Data: 0000
>        Kernel driver in use: radeon
>        Kernel modules: radeon
>
> The problem does not happen with 2.6.38. I will try to bisect this down to a
> commit. Alex, please let me know if you need any further information.

If you can bisect it, that would be great.  Thanks,

Alex

>
>        Joerg
>
>


* Re: Linux 2.6.39-rc3
  2011-04-12 14:15   ` Alex Deucher
@ 2011-04-12 18:44     ` Joerg Roedel
  2011-04-13  1:27       ` David Rientjes
  2011-04-13  6:46       ` Ingo Molnar
  0 siblings, 2 replies; 108+ messages in thread
From: Joerg Roedel @ 2011-04-12 18:44 UTC
  To: Alex Deucher
  Cc: Linus Torvalds, Linux Kernel Mailing List, dri-devel, Ingo Molnar

On Tue, Apr 12, 2011 at 10:15:11AM -0400, Alex Deucher wrote:
> On Tue, Apr 12, 2011 at 5:02 AM, Joerg Roedel <joro@8bytes.org> wrote:
> > On Mon, Apr 11, 2011 at 05:40:11PM -0700, Linus Torvalds wrote:
> >> Let's hope the release cycle continues like this. I _like_ it when
> >> people really seem to follow the whole "big changes during the merge
> >> window" rules.
> >
> > Sorry for disturbing the silence, but radeon seems to have issues. I
> > tested -rc3 (and after that -rc1 which also has the issue) on my Laptop
> > and it just reboots after (or while?) GFX initialization. The last lines
> > of dmesg are:
> >
> >  Freeing unused kernel memory: 624k freed
> >  Write protecting the kernel read-only data: 8192k
> >  Freeing unused kernel memory: 1456k freed
> >  Freeing unused kernel memory: 16k freed
> >  udev: starting version 151
> >  udevd (62): /proc/62/oom_adj is deprecated, please use /proc/62/oom_score_adj instead.
> >  [drm] Initialized drm 1.1.0 20060810
> >  [drm] radeon defaulting to kernel modesetting.
> >  [drm] radeon kernel modesetting enabled.
> >  radeon 0000:01:05.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
> >  [drm] initializing kernel modesetting (RS880 0x1002:0x9712).
> >  [drm] register mmio base: 0xD6400000
> >  [drm] register mmio size: 65536
> >  ATOM BIOS: HP_TAG
> >  radeon 0000:01:05.0: VRAM: 320M 0x00000000C0000000 - 0x00000000D3FFFFFF (320M used)
> >  radeon 0000:01:05.0: GTT: 512M 0x00000000A0000000 - 0x00000000BFFFFFFF
> >  [drm] Detected VRAM RAM=320M, BAR=256M
> >  [drm] RAM width 32bits DDR
> >  [TTM] Zone  kernel: Available graphics memory: 1896512 kiB.
> >  usb 7-2: new full speed USB device number 2 using ohci_hcd
> >  [TTM] Initializing pool allocator.
> >  [drm] radeon: 320M of VRAM memory ready
> >  [drm] radeon: 512M of GTT memory ready.
> >  [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
> >  [drm] Driver supports precise vblank timestamp query.
> >  [drm] radeon: irq initialized.
> >  [drm] GART: num cpu pages 131072, num gpu pages 131072
> >  [drm] Loading RS780 Microcode
> >  radeon 0000:01:05.0: WB enabled
> >  [drm] ring test succeeded in 1 usecs
> >  [drm] radeon: ib pool ready.
> >
> > The card is a Radeon Mobility 4200:
> >
> > 01:05.0 VGA compatible controller: ATI Technologies Inc M880G [Mobility Radeon HD 4200]
> >        Subsystem: Hewlett-Packard Company Device 307e
> >        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
> >        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> >        Latency: 0, Cache Line Size: 64 bytes
> >        Interrupt: pin A routed to IRQ 18
> >        Region 0: Memory at c0000000 (32-bit, prefetchable) [size=256M]
> >        Region 1: I/O ports at 6000 [size=256]
> >        Region 2: Memory at d6400000 (32-bit, non-prefetchable) [size=64K]
> >        Region 5: Memory at d6300000 (32-bit, non-prefetchable) [size=1M]
> >        Capabilities: [50] Power Management version 3
> >                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
> >                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
> >        Capabilities: [a0] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable-
> >                Address: 0000000000000000  Data: 0000
> >        Kernel driver in use: radeon
> >        Kernel modules: radeon
> >
> > The problem does not happen with 2.6.38. I will try to bisect this down to a
> > commit. Alex, please let me know if you need any further information.
> 
> If you can bisect it, that would be great.  Thanks,

Bisecting actually gave a very weird result. It points to

	d2137d5af4259f50c19addb8246a186c9ffac325

which is a merge-commit in the x86 tree. Even more weird is that this
notebook is the only machine with these symptoms, all my other boxes are
fine.
During the bisect I tested commits from Yinghai which were good. It
seems like the problem appeared with the merge.

	Joerg



* Re: Linux 2.6.39-rc3
  2011-04-12  0:40 Linux 2.6.39-rc3 Linus Torvalds
  2011-04-12  9:02 ` Joerg Roedel
@ 2011-04-12 19:09 ` Dave Jones
  2011-04-12 19:21   ` Dave Jones
  2011-04-12 20:20   ` Eric Sandeen
  2011-04-14 20:24 ` Borislav Petkov
  2 siblings, 2 replies; 108+ messages in thread
From: Dave Jones @ 2011-04-12 19:09 UTC
  To: Linus Torvalds; +Cc: Linux Kernel Mailing List, Eric Sandeen

On Mon, Apr 11, 2011 at 05:40:11PM -0700, Linus Torvalds wrote:
 > It's been another almost spookily calm week. Usually this kind of
 > calmness happens much later in the -rc series (during -rc7 or -rc8,
 > say), but I'm not going to complain. I'm just still waiting for the
 > other shoe to drop.
 
Here's an odd one.

my laptop's fstab has

/dev/mapper/vg_adamo-lv_home /home                   ext4    defaults        1 2

on 2.6.38, /proc/mounts contains ..

/dev/mapper/vg_adamo-lv_home /home ext4 rw,seclabel,relatime,barrier=1,data=ordered 0 0

on 2.6.39rc3 it looks like..

/dev/mapper/vg_adamo-lv_home /home ext4 rw,seclabel,relatime,user_xattr,barrier=1,data=ordered 0 0


Which looks like ea6633369458992241599c9d9ebadffaeddec164, so nothing untoward..

however, the output of mount looks very confused..

.38:
/dev/mapper/vg_adamo-lv_home on /home type ext4 (rw,relatime,seclabel,barrier=1,data=ordered)

.39:
- on /home type 79a9-4526-888c-1f86d35a6704 (rw,relatime,ext4)

It looks like /proc/self/mountinfo broke abi.

.38:
48 45 253:3 / /home rw,relatime - ext4 /dev/mapper/vg_adamo-lv_home rw,seclabel,barrier=1,data=ordered

.39:
46 22 253:3 / /home rw,relatime uuid:f3971858-79a9-4526-888c-1f86d35a6704 - ext4 /dev/mapper/vg_adamo-lv_home rw,seclabel,user_xattr,barrier=1,data=ordered



	Dave
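
One way the confused mount output above could arise, sketched under stated
assumptions: suppose a userspace parser skips the optional fields of a
mountinfo line by scanning up to the first '-' character (written here as
sscanf's "%*[^-]"). The dashes inside the new uuid: field would then end the
skip early. The struct and function names below are hypothetical, and this is
not taken from the source of mount(8); it only shows that a parser of this
shape misreads the .39 line in the same way.

```c
#include <stdio.h>

/* Hypothetical record for the three fields that follow the separator. */
struct mountinfo_tail {
    char fstype[256];
    char source[256];
    char fs_opts[256];
};

/*
 * Naive parse: "%*[^-]" discards everything up to the FIRST '-' character,
 * on the assumption that this '-' is the separator field.
 * Returns 0 if all ten conversions succeeded, -1 otherwise.
 */
static int parse_mountinfo_naive(const char *line, struct mountinfo_tail *out)
{
    unsigned int id, parent, dev_major, dev_minor;
    char root[256], target[256], vfs_opts[256];

    int n = sscanf(line,
                   "%u %u %u:%u %255s %255s %255s" /* fields before optional ones */
                   "%*[^-]"                        /* skip optional fields (naively) */
                   "- %255s %255s %255s",          /* separator, fstype, source, opts */
                   &id, &parent, &dev_major, &dev_minor,
                   root, target, vfs_opts,
                   out->fstype, out->source, out->fs_opts);
    return n == 10 ? 0 : -1;
}
```

Fed the .38 line, this returns fstype "ext4" and the expected source. Fed the
.39 line, the skip stops at the dash after "f3971858", so fstype becomes
"79a9-4526-888c-1f86d35a6704", source becomes "-", and the filesystem options
become "ext4", which matches the fields seen in the broken mount output.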



* Re: Linux 2.6.39-rc3
  2011-04-12 19:09 ` Dave Jones
@ 2011-04-12 19:21   ` Dave Jones
  2011-04-12 19:55     ` Linus Torvalds
  2011-04-14  8:20     ` Aneesh Kumar K.V
  2011-04-12 20:20   ` Eric Sandeen
  1 sibling, 2 replies; 108+ messages in thread
From: Dave Jones @ 2011-04-12 19:21 UTC
  To: Linus Torvalds, Linux Kernel Mailing List, Eric Sandeen,
	Aneesh Kumar K.V

On Tue, Apr 12, 2011 at 03:09:34PM -0400, Dave Jones wrote:

 > however, the output of mount looks very confused..
 > 
 > .38:
 > /dev/mapper/vg_adamo-lv_home on /home type ext4 (rw,relatime,seclabel,barrier=1,data=ordered)
 > 
 > .39:
 > - on /home type 79a9-4526-888c-1f86d35a6704 (rw,relatime,ext4)
 > 
 > It looks like /proc/self/mountinfo broke abi.
 > 
 > .38:
 > 48 45 253:3 / /home rw,relatime - ext4 /dev/mapper/vg_adamo-lv_home rw,seclabel,barrier=1,data=ordered
 > 
 > .39:
 > 46 22 253:3 / /home rw,relatime uuid:f3971858-79a9-4526-888c-1f86d35a6704 - ext4 /dev/mapper/vg_adamo-lv_home rw,seclabel,user_xattr,barrier=1,data=ordered

looks like this was caused by 93f1c20bc8cdb757be50566eff88d65c3b26881f

perhaps adding that string to the end of the line would preserve what mount expects ?

	Dave


* Re: Linux 2.6.39-rc3
  2011-04-12 19:21   ` Dave Jones
@ 2011-04-12 19:55     ` Linus Torvalds
  2011-04-12 20:13       ` Dave Jones
  2011-04-14  8:20     ` Aneesh Kumar K.V
  1 sibling, 1 reply; 108+ messages in thread
From: Linus Torvalds @ 2011-04-12 19:55 UTC
  To: Dave Jones, Linux Kernel Mailing List, Eric Sandeen, Aneesh Kumar K.V

On Tue, Apr 12, 2011 at 12:21 PM, Dave Jones <davej@redhat.com> wrote:
>
> looks like this was caused by 93f1c20bc8cdb757be50566eff88d65c3b26881f
>
> perhaps adding that string to the end of the line would preserve what mount expects ?

Care to test? Otherwise I'll just revert the thing.. It's clearly not
valid behavior to randomly add some new field into the middle of a
/proc file.

                              Linus


* Re: Linux 2.6.39-rc3
  2011-04-12 19:55     ` Linus Torvalds
@ 2011-04-12 20:13       ` Dave Jones
  0 siblings, 0 replies; 108+ messages in thread
From: Dave Jones @ 2011-04-12 20:13 UTC
  To: Linus Torvalds; +Cc: Linux Kernel Mailing List, Eric Sandeen, Aneesh Kumar K.V

On Tue, Apr 12, 2011 at 12:55:24PM -0700, Linus Torvalds wrote:
 > On Tue, Apr 12, 2011 at 12:21 PM, Dave Jones <davej@redhat.com> wrote:
 > >
 > > looks like this was caused by 93f1c20bc8cdb757be50566eff88d65c3b26881f
 > >
 > > perhaps adding that string to the end of the line would preserve what mount expects ?
 > 
 > Care to test? Otherwise I'll just revert the thing.. It's clearly not
 > valid behavior to randomly add some new field into the middle of a
 > /proc file.

Moving it to the EOL seems to restore things to how it looked in .38
I don't know if this breaks anything else. (I haven't dug to see why that
field was added, so I don't know what tool is using it, or what was used
to test the original patch in the first place).

	Dave

-- 

93f1c20bc8cdb757be50566eff88d65c3b26881f added a uuid field in the middle of
a line in /proc/self/mountinfo.  This broke the ABI expected by mount(8).
Moving it to the end restores the output to what it expects.

Signed-off-by: Dave Jones <davej@redhat.com>

diff --git a/fs/namespace.c b/fs/namespace.c
index 7dba2ed..cc2df9d 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1085,10 +1085,6 @@ static int show_mountinfo(struct seq_file *m, void *v)
 	if (IS_MNT_UNBINDABLE(mnt))
 		seq_puts(m, " unbindable");
 
-	if (!uuid_is_nil(mnt->mnt_sb->s_uuid))
-		/* print the uuid */
-		seq_printf(m, " uuid:%pU", mnt->mnt_sb->s_uuid);
-
 	/* Filesystem specific data */
 	seq_puts(m, " - ");
 	show_type(m, sb);
@@ -1105,6 +1101,10 @@ static int show_mountinfo(struct seq_file *m, void *v)
 		goto out;
 	if (sb->s_op->show_options)
 		err = sb->s_op->show_options(m, mnt);
+	if (!uuid_is_nil(mnt->mnt_sb->s_uuid))
+		/* print the uuid */
+		seq_printf(m, " uuid:%pU", mnt->mnt_sb->s_uuid);
+
 	seq_putc(m, '\n');
 out:
 	return err;



* Re: Linux 2.6.39-rc3
  2011-04-12 19:09 ` Dave Jones
  2011-04-12 19:21   ` Dave Jones
@ 2011-04-12 20:20   ` Eric Sandeen
  2011-04-12 20:27     ` Karel Zak
  2011-04-12 20:33     ` Linus Torvalds
  1 sibling, 2 replies; 108+ messages in thread
From: Eric Sandeen @ 2011-04-12 20:20 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Linux Kernel Mailing List,
	Aneesh Kumar, Karel Zak

On 4/12/11 2:09 PM, Dave Jones wrote:
> On Mon, Apr 11, 2011 at 05:40:11PM -0700, Linus Torvalds wrote:
>  > It's been another almost spookily calm week. Usually this kind of
>  > calmness happens much later in the -rc series (during -rc7 or -rc8,
>  > say), but I'm not going to complain. I'm just still waiting for the
>  > other shoe to drop.
>  
> Here's an odd one.
> 
> my laptop's fstab has
> 
> /dev/mapper/vg_adamo-lv_home /home                   ext4    defaults        1 2
> 
> on 2.6.38, /proc/mounts contains ..
> 
> /dev/mapper/vg_adamo-lv_home /home ext4 rw,seclabel,relatime,barrier=1,data=ordered 0 0
> 
> on 2.6.39rc3 it looks like..
> 
> /dev/mapper/vg_adamo-lv_home /home ext4 rw,seclabel,relatime,user_xattr,barrier=1,data=ordered 0 0
> 
> 
> Which looks like ea6633369458992241599c9d9ebadffaeddec164, so nothing untoward..
> 
> however, the output of mount looks very confused..
> 
> .38:
> /dev/mapper/vg_adamo-lv_home on /home type ext4 (rw,relatime,seclabel,barrier=1,data=ordered)
> 
> .39:
> - on /home type 79a9-4526-888c-1f86d35a6704 (rw,relatime,ext4)
> 
> It looks like /proc/self/mountinfo broke abi.
> 
> .38:
> 48 45 253:3 / /home rw,relatime - ext4 /dev/mapper/vg_adamo-lv_home rw,seclabel,barrier=1,data=ordered
> 
> .39:
> 46 22 253:3 / /home rw,relatime uuid:f3971858-79a9-4526-888c-1f86d35a6704 - ext4 /dev/mapper/vg_adamo-lv_home rw,seclabel,user_xattr,barrier=1,data=ordered
> 

so it's supposed to be like this, from Documentation/filesystems/proc.txt:

> This file contains lines of the form:
> 
> 36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue
> (1)(2)(3)   (4)   (5)      (6)      (7)   (8) (9)   (10)         (11)
> 
> (1) mount ID:  unique identifier of the mount (may be reused after umount)
> (2) parent ID:  ID of parent (or of self for the top of the mount tree)
> (3) major:minor:  value of st_dev for files on filesystem
> (4) root:  root of the mount within the filesystem
> (5) mount point:  mount point relative to the process's root
> (6) mount options:  per mount options
> (7) optional fields:  zero or more fields of the form "tag[:value]"
> (8) separator:  marks the end of the optional fields
> (9) filesystem type:  name of filesystem of the form "type[.subtype]"
> (10) mount source:  filesystem specific information or "none"
> (11) super options:  per super block options

it does seem that the new UUID info is in a perfectly fine place (the optional fields slot), at least per the docs, so I guess I might blame the mount binary for not following the aforementioned rules...
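To illustrate the rule quoted above, here is a minimal sketch of a parser that locates the "-" separator (8) instead of assuming a fixed number of fields. It is not the code used by mount(8) or util-linux; the function name and dictionary keys are made up for this example, and it assumes that whitespace inside paths is escaped so that splitting on whitespace is safe.

```python
# Sample lines taken from the report earlier in this thread.
OLD_LINE = ("48 45 253:3 / /home rw,relatime - ext4 "
            "/dev/mapper/vg_adamo-lv_home rw,seclabel,barrier=1,data=ordered")
NEW_LINE = ("46 22 253:3 / /home rw,relatime "
            "uuid:f3971858-79a9-4526-888c-1f86d35a6704 - ext4 "
            "/dev/mapper/vg_adamo-lv_home "
            "rw,seclabel,user_xattr,barrier=1,data=ordered")


def parse_mountinfo_line(line):
    """Split one mountinfo line into named fields (hypothetical helper).

    Fields (1)-(6) are fixed. Field (7) is a variable-length run of
    optional "tag[:value]" items, terminated by the "-" separator (8).
    """
    fields = line.split()
    # Search for the separator starting after the six fixed fields,
    # rather than assuming it sits at a fixed index.
    sep = fields.index("-", 6)
    optional = {}
    for item in fields[6:sep]:
        tag, _, value = item.partition(":")
        optional[tag] = value
    return {
        "mount_id": int(fields[0]),
        "parent_id": int(fields[1]),
        "major_minor": fields[2],
        "root": fields[3],
        "mount_point": fields[4],
        "mount_options": fields[5],
        "optional": optional,
        "fs_type": fields[sep + 1],
        "mount_source": fields[sep + 2],
        "super_options": fields[sep + 3],
    }
```

With this approach both the .38 and the .39 line yield "ext4" as the filesystem type; a parser that reads the type from a fixed position would pick up the wrong field on the .39 line, which matches the confused output reported above.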

Maybe Karel knows?  cc'd...

-Eric

> 
> 	Dave
> 



* Re: Linux 2.6.39-rc3
  2011-04-12 20:20   ` Eric Sandeen
@ 2011-04-12 20:27     ` Karel Zak
  2011-04-12 20:33     ` Linus Torvalds
  1 sibling, 0 replies; 108+ messages in thread
From: Karel Zak @ 2011-04-12 20:27 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: Dave Jones, Linus Torvalds, Linux Kernel Mailing List, Aneesh Kumar

On Tue, Apr 12, 2011 at 03:20:14PM -0500, Eric Sandeen wrote:
> > It looks like /proc/self/mountinfo broke abi.
> > 
> > .38:
> > 48 45 253:3 / /home rw,relatime - ext4 /dev/mapper/vg_adamo-lv_home rw,seclabel,barrier=1,data=ordered
> > 
> > .39:
> > 46 22 253:3 / /home rw,relatime uuid:f3971858-79a9-4526-888c-1f86d35a6704 - ext4 /dev/mapper/vg_adamo-lv_home rw,seclabel,user_xattr,barrier=1,data=ordered
> > 
> 
> so it's supposed to be like this, from Documentation/filesystems/proc.txt:
> 
> > This file contains lines of the form:
> > 
> > 36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue
> > (1)(2)(3)   (4)   (5)      (6)      (7)   (8) (9)   (10)         (11)
> > 
> > (1) mount ID:  unique identifier of the mount (may be reused after umount)
> > (2) parent ID:  ID of parent (or of self for the top of the mount tree)
> > (3) major:minor:  value of st_dev for files on filesystem
> > (4) root:  root of the mount within the filesystem
> > (5) mount point:  mount point relative to the process's root
> > (6) mount options:  per mount options
> > (7) optional fields:  zero or more fields of the form "tag[:value]"
> > (8) separator:  marks the end of the optional fields
> > (9) filesystem type:  name of filesystem of the form "type[.subtype]"
> > (10) mount source:  filesystem specific information or "none"
> > (11) super options:  per super block options
> 
> it does seem that the new UUID info is in a perfectly fine place
> (the optional fields slot), at least per the docs, so I guess I
> might blame the mount binary for not following the aforementioned
> rules...

 Yes, I agree. Already discussed at lkml and linux-fs :-)
 http://thread.gmane.org/gmane.linux.kernel/1121533 

> Maybe Karel knows?  cc'd...

 Already fixed in util-linux upstream.

    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com


* Re: Linux 2.6.39-rc3
  2011-04-12 20:20   ` Eric Sandeen
  2011-04-12 20:27     ` Karel Zak
@ 2011-04-12 20:33     ` Linus Torvalds
  1 sibling, 0 replies; 108+ messages in thread
From: Linus Torvalds @ 2011-04-12 20:33 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: Dave Jones, Linux Kernel Mailing List, Aneesh Kumar, Karel Zak,
	Ram Pai, Miklos Szeredi

On Tue, Apr 12, 2011 at 1:20 PM, Eric Sandeen <sandeen@redhat.com> wrote:
>
> so it's supposed to be like this, from Documentation/filesystems/proc.txt:
>
>> This file contains lines of the form:
>>
>> 36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue
>> (1)(2)(3)   (4)   (5)      (6)      (7)   (8) (9)   (10)         (11)

Gaah, yes. Apparently that placement is correct and documented, and
has been since the beginning.

However, reality always takes precedence, so I think that for now
we'll just have to revert the commit that added the uuid: tag, and we
can re-visit this issue when hopefully the tools have been fixed.

"But it's documented.." sadly doesn't fix actual user installations or
user-visible regressions.

                          Linus


* Re: Linux 2.6.39-rc3
  2011-04-12 18:44     ` Joerg Roedel
@ 2011-04-13  1:27       ` David Rientjes
  2011-04-13  6:46       ` Ingo Molnar
  1 sibling, 0 replies; 108+ messages in thread
From: David Rientjes @ 2011-04-13  1:27 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Alex Deucher, Linus Torvalds, Linux Kernel Mailing List,
	dri-devel, Ingo Molnar, Alexandre Demers, Yinghai Lu

On Tue, 12 Apr 2011, Joerg Roedel wrote:

> Bisecting actually gave a very weird result. It points to
> 
> 	d2137d5af4259f50c19addb8246a186c9ffac325
> 
> which is a merge-commit in the x86 tree. Even more weird is that this
> notebook is the only machine with these symptoms, all my other boxes are
> fine.
> During the bisect I tested commits from Yinghai which were good. It
> seems like the problem appeared with the merge.
> 

Alexandre Demers (cc'd) reports a boot failure bisected to the same merge 
on a 64-bit AMD tricore in 
https://bugzilla.kernel.org/show_bug.cgi?id=33012.  We're awaiting 
earlyprintk= output from that kernel, if possible, and Yinghai asked for 
his .config and dmesg output from the last known working kernel.


* Re: Linux 2.6.39-rc3
  2011-04-12 18:44     ` Joerg Roedel
  2011-04-13  1:27       ` David Rientjes
@ 2011-04-13  6:46       ` Ingo Molnar
  2011-04-13 17:21         ` Joerg Roedel
  1 sibling, 1 reply; 108+ messages in thread
From: Ingo Molnar @ 2011-04-13  6:46 UTC (permalink / raw)
  To: Joerg Roedel, Yinghai Lu
  Cc: Alex Deucher, Linus Torvalds, Linux Kernel Mailing List,
	dri-devel, H. Peter Anvin, Thomas Gleixner, Tejun Heo


* Joerg Roedel <joro@8bytes.org> wrote:

> > > The problem does not happen with 2.6.38. I try to bisect this further 
> > > down to a commit. Alex, please let me know if you need any further 
> > > information.
> > 
> > If you can bisect it, that would be great.  Thanks,
> 
> Bisecting actually gave a very weird result. It points to
> 
> 	d2137d5af4259f50c19addb8246a186c9ffac325
> 
> which is a merge-commit in the x86 tree. Even more weird is that this
> notebook is the only machine with these symptoms, all my other boxes are
> fine.
>
> During the bisect I tested commits from Yinghai which were good. It seems 
> like the problem appeared with the merge.

There's a similar looking bug being debugged here:

  https://bugzilla.kernel.org/show_bug.cgi?id=33012

Could you please send the before/after bootlog (in particular all memory init 
messages included) and your .config?

 before:  f005fe12b90c: x86-64: Move out cleanup higmap [_brk_end, _end) out of init_memory_mapping()
  after:  d2137d5af425: Merge branch 'linus' into x86/bootmem

I've Cc:-ed more people who might have an idea about it.

Thanks,

	Ingo


* Re: Linux 2.6.39-rc3
  2011-04-13  6:46       ` Ingo Molnar
@ 2011-04-13 17:21         ` Joerg Roedel
  2011-04-13 18:39           ` H. Peter Anvin
                             ` (2 more replies)
  0 siblings, 3 replies; 108+ messages in thread
From: Joerg Roedel @ 2011-04-13 17:21 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Yinghai Lu, Alex Deucher, Linus Torvalds,
	Linux Kernel Mailing List, dri-devel, H. Peter Anvin,
	Thomas Gleixner, Tejun Heo

On Wed, Apr 13, 2011 at 08:46:09AM +0200, Ingo Molnar wrote:
> Could you please send the before/after bootlog (in particular all memory init 
> messages included) and your .config?
> 
>  before:  f005fe12b90c: x86-64: Move out cleanup higmap [_brk_end, _end) out of init_memory_mapping()
>   after:  d2137d5af425: Merge branch 'linus' into x86/bootmem
> 
> I've Cc:-ed more people who might have an idea about it.

Okay, I have done some more bisecting and debugging today.

First of all, I bisected between v2.6.37-rc2..f005fe12b90c which were
only a couple of patches and merged v2.6.38-rc4 in at every step. There
was no failure found.
Then I tried this again, but this time I merged v2.6.38-rc5 at every
step and was successful. The bad commit in this branch turned out to be

	1a4a678b12c84db9ae5dce424e0e97f0559bb57c

which is related to memblock.

Then I tried to find out which change between 2.6.38-rc4 and 2.6.38-rc5
is needed to trigger the failure, so I used f005fe12b90c as a base,
bisected between v2.6.38-rc4..v2.6.38-rc5 and merged every bisect step
into the base and tested. Here the bad commit turned out to be

	e6d2e2b2b1e1455df16d68a78f4a3874c7b3ad20

which is related to gart. It turned out that the gart aperture on that
box is at another position with these patches. Before it was at
0xa4000000 and now it is at 0xa0000000. It seems like this has something
to do with the root cause.

Reverting commit 1a4a678b12c84db9ae5dce424e0e97f0559bb57c fixes the
problem btw. and booting with iommu=soft also works, but I have no idea
yet why the aperture at that address is a problem (with the patch
reverted the aperture lands at 0x80000000).

I have put some debug-data online. There is my .config and two
dmesg-files for good (==2.6.39-rc3 + revert) and bad (==2.6.39-rc3)
I also created these dmesg-files again with memblock=debug, maybe that
helps to find the problem. The files are at

	http://www.8bytes.org/~joro/debug/

Or maybe someone else has an idea about the issue...

	Joerg



* Re: Linux 2.6.39-rc3
  2011-04-13 17:21         ` Joerg Roedel
@ 2011-04-13 18:39           ` H. Peter Anvin
  2011-04-13 19:26             ` Joerg Roedel
  2011-04-13 18:51           ` H. Peter Anvin
  2011-04-13 19:14           ` Yinghai Lu
  2 siblings, 1 reply; 108+ messages in thread
From: H. Peter Anvin @ 2011-04-13 18:39 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Ingo Molnar, Yinghai Lu, Alex Deucher, Linus Torvalds,
	Linux Kernel Mailing List, dri-devel, Thomas Gleixner, Tejun Heo

On 04/13/2011 10:21 AM, Joerg Roedel wrote:
> On Wed, Apr 13, 2011 at 08:46:09AM +0200, Ingo Molnar wrote:
>> Could you please send the before/after bootlog (in particular all memory init 
>> messages included) and your .config?
>>
>>  before:  f005fe12b90c: x86-64: Move out cleanup higmap [_brk_end, _end) out of init_memory_mapping()
>>   after:  d2137d5af425: Merge branch 'linus' into x86/bootmem
>>
>> I've Cc:-ed more people who might have an idea about it.
> 
> Okay, I have done some more bisecting and debugging today.
> 

First of all, *huge* thanks for this effort.  At least we need to track
down the bits that need to be reverted -- it is past rc3, and it's time
to see what we should revert and tell the submitter to try again next cycle.

This looks to be the same issue as in bugzilla 33012:

	https://bugzilla.kernel.org/show_bug.cgi?id=33012

... so it would be good if we could keep the information in there.

	-hpa


* Re: Linux 2.6.39-rc3
  2011-04-13 17:21         ` Joerg Roedel
  2011-04-13 18:39           ` H. Peter Anvin
@ 2011-04-13 18:51           ` H. Peter Anvin
  2011-04-13 19:24             ` Joerg Roedel
  2011-04-13 19:14           ` Yinghai Lu
  2 siblings, 1 reply; 108+ messages in thread
From: H. Peter Anvin @ 2011-04-13 18:51 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Ingo Molnar, Yinghai Lu, Alex Deucher, Linus Torvalds,
	Linux Kernel Mailing List, dri-devel, Thomas Gleixner, Tejun Heo

On 04/13/2011 10:21 AM, Joerg Roedel wrote:
> 
> First of all, I bisected between v2.6.37-rc2..f005fe12b90c which where
> only a couple of patches and merged v2.6.38-rc4 in at every step. There
> was no failure found.
> Then I tried this again, but this time I merged v2.6.38-rc5 at every
> step and was successful. The bad commit in this branch turned out to be
> 
> 	1a4a678b12c84db9ae5dce424e0e97f0559bb57c
> 
> which is related to memblock.
> 
> Then I tried to find out which change between 2.6.38-rc4 and 2.6.38-rc5
> is needed to trigger the failure, so I used f005fe12b90c as a base,
> bisected between v2.6.38-rc4..v2.6.38-rc5 and merged every bisect step
> into the base and tested. Here the bad commit turned out to be
> 
> 	e6d2e2b2b1e1455df16d68a78f4a3874c7b3ad20
> 
> which is related to gart. It turned out that the gart aperture on that
> box is on another position with these patches. Before it was as
> 0xa4000000 and now it is at 0xa0000000. It seems like this has something
> to do with the root-cause.
> 
> Reverting commit 1a4a678b12c84db9ae5dce424e0e97f0559bb57c fixes the
> problem btw. and booting with iommu=soft also works, but I have no idea
> yet why the aperture at that address is a problem (with the patch
> reverted the aperture lands at 0x80000000).
> 

Does reverting e6d2e2b2b1e1455df16d68a78f4a3874c7b3ad20 solve the
problem for you?

1a4a678b12c84db9ae5dce424e0e97f0559bb57c is a memory-allocation-order
patch, which have a nasty tendency to unmask bugs elsewhere in the
kernel.  However, e6d2e2b2b1e1455df16d68a78f4a3874c7b3ad20 looks
positively strange (and it doesn't exactly help that the description is
written in Yinghai-ese and is therefore nearly impossible to decode,
never mind tell if it is remotely correct.)

	-hpa




* Re: Linux 2.6.39-rc3
  2011-04-13 17:21         ` Joerg Roedel
  2011-04-13 18:39           ` H. Peter Anvin
  2011-04-13 18:51           ` H. Peter Anvin
@ 2011-04-13 19:14           ` Yinghai Lu
  2011-04-13 19:34             ` Joerg Roedel
                               ` (2 more replies)
  2 siblings, 3 replies; 108+ messages in thread
From: Yinghai Lu @ 2011-04-13 19:14 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Ingo Molnar, Alex Deucher, Linus Torvalds,
	Linux Kernel Mailing List, dri-devel, H. Peter Anvin,
	Thomas Gleixner, Tejun Heo

On 04/13/2011 10:21 AM, Joerg Roedel wrote:
> On Wed, Apr 13, 2011 at 08:46:09AM +0200, Ingo Molnar wrote:
> First of all, I bisected between v2.6.37-rc2..f005fe12b90c which where
> only a couple of patches and merged v2.6.38-rc4 in at every step. There
> was no failure found.
> Then I tried this again, but this time I merged v2.6.38-rc5 at every
> step and was successful. The bad commit in this branch turned out to be
> 
> 	1a4a678b12c84db9ae5dce424e0e97f0559bb57c
> 
> which is related to memblock.
> 
> Then I tried to find out which change between 2.6.38-rc4 and 2.6.38-rc5
> is needed to trigger the failure, so I used f005fe12b90c as a base,
> bisected between v2.6.38-rc4..v2.6.38-rc5 and merged every bisect step
> into the base and tested. Here the bad commit turned out to be
> 
> 	e6d2e2b2b1e1455df16d68a78f4a3874c7b3ad20
> 
> which is related to gart. It turned out that the gart aperture on that
> box is on another position with these patches. Before it was as
> 0xa4000000 and now it is at 0xa0000000. It seems like this has something
> to do with the root-cause.
> 
> Reverting commit 1a4a678b12c84db9ae5dce424e0e97f0559bb57c fixes the
> problem btw. and booting with iommu=soft also works, but I have no idea
> yet why the aperture at that address is a problem (with the patch
> reverted the aperture lands at 0x80000000).
> 
> I have put some debug-data online. There is my .config and two
> dmesg-files for good (==2.6.39-rc3 + revert) and bad (==2.6.39-rc3)
> I also created these dmesg-files again with memblock=debug, maybe that
> helps to find the problem. The files are at
> 
> 	http://www.8bytes.org/~joro/debug/

thanks for the bisecting...

so those two patches uncover some problems.

[    0.000000] Checking aperture...
[    0.000000] No AGP bridge found
[    0.000000] Node 0: aperture @ a0000000 size 32 MB
[    0.000000] Aperture pointing to e820 RAM. Ignoring.
[    0.000000] Your BIOS doesn't leave a aperture memory hole
[    0.000000] Please enable the IOMMU option in the BIOS setup
[    0.000000] This costs you 64 MB of RAM
[    0.000000]     memblock_x86_reserve_range: [0xa0000000-0xa3ffffff]       aperture64
[    0.000000] Mapping aperture over 65536 KB of RAM @ a0000000

So the kernel tries to reallocate the aperture, because the one the BIOS allocated points to RAM or is too small.

But your radeon does use [0xa0000000, 0xbfffffff]:

[    4.281993] radeon 0000:01:05.0: VRAM: 320M 0x00000000C0000000 - 0x00000000D3FFFFFF (320M used)
[    4.290672] radeon 0000:01:05.0: GTT: 512M 0x00000000A0000000 - 0x00000000BFFFFFFF
[    4.298550] [drm] Detected VRAM RAM=320M, BAR=256M
[    4.309857] [drm] RAM width 32bits DDR
[    4.313748] [TTM] Zone  kernel: Available graphics memory: 1896524 kiB.
[    4.320379] [TTM] Initializing pool allocator.
[    4.324948] [drm] radeon: 320M of VRAM memory ready
[    4.329832] [drm] radeon: 512M of GTT memory ready.

And the one that seems to work:

[    0.000000] Checking aperture...
[    0.000000] No AGP bridge found
[    0.000000] Node 0: aperture @ a0000000 size 32 MB
[    0.000000] Aperture pointing to e820 RAM. Ignoring.
[    0.000000] Your BIOS doesn't leave a aperture memory hole
[    0.000000] Please enable the IOMMU option in the BIOS setup
[    0.000000] This costs you 64 MB of RAM
[    0.000000]     memblock_x86_reserve_range: [0x80000000-0x83ffffff]       aperture64
[    0.000000] Mapping aperture over 65536 KB of RAM @ 80000000
[    0.000000]     memblock_x86_reserve_range: [0xacb6bdc0-0xacb6bddf]          BOOTMEM

It uses a different position...

[    4.250159] radeon 0000:01:05.0: VRAM: 320M 0x00000000C0000000 - 0x00000000D3FFFFFF (320M used)
[    4.258830] radeon 0000:01:05.0: GTT: 512M 0x00000000A0000000 - 0x00000000BFFFFFFF
[    4.266742] [drm] Detected VRAM RAM=320M, BAR=256M
[    4.271549] [drm] RAM width 32bits DDR
[    4.275435] [TTM] Zone  kernel: Available graphics memory: 1896526 kiB.
[    4.282066] [TTM] Initializing pool allocator.
[    4.282085] usb 7-2: new full speed USB device number 2 using ohci_hcd
[    4.293076] [drm] radeon: 320M of VRAM memory ready
[    4.298277] [drm] radeon: 512M of GTT memory ready.
[    4.303218] [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
[    4.309854] [drm] Driver supports precise vblank timestamp query.
[    4.315970] [drm] radeon: irq initialized.
[    4.320094] [drm] GART: num cpu pages 131072, num gpu pages 131072

So the question is why radeon is using the address range [0xa0000000 - 0xc0000000), which E820 reports as RAM ....

[    0.000000]  BIOS-e820: 0000000000100000 - 00000000acb8d000 (usable)
[    0.000000]  BIOS-e820: 00000000acb8d000 - 00000000acb8f000 (reserved)
[    0.000000]  BIOS-e820: 00000000acb8f000 - 00000000afce9000 (usable)
[    0.000000]  BIOS-e820: 00000000afce9000 - 00000000afd21000 (reserved)
[    0.000000]  BIOS-e820: 00000000afd21000 - 00000000afd4f000 (usable)
[    0.000000]  BIOS-e820: 00000000afd4f000 - 00000000afdcf000 (reserved)
[    0.000000]  BIOS-e820: 00000000afdcf000 - 00000000afecf000 (ACPI NVS)
[    0.000000]  BIOS-e820: 00000000afecf000 - 00000000afeff000 (ACPI data)
[    0.000000]  BIOS-e820: 00000000afeff000 - 00000000aff00000 (usable)


So it looks like the BIOS programmed a wrong address into the radeon card?
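As a rough illustration of the "Aperture pointing to e820 RAM" condition, the sketch below checks whether an address range intersects any of the usable ranges from the E820 listing above. This is not the kernel's implementation; the names are invented for the example, and it assumes each E820 line describes a half-open range [start, end).

```python
# Usable ranges copied from the BIOS-e820 lines above,
# treated as half-open intervals [start, end).
E820_USABLE = [
    (0x00100000, 0xacb8d000),
    (0xacb8f000, 0xafce9000),
    (0xafd21000, 0xafd4f000),
    (0xafeff000, 0xaff00000),
]


def overlaps_ram(start, end, usable=E820_USABLE):
    """Return True if [start, end) intersects any usable range."""
    return any(start < r_end and r_start < end for r_start, r_end in usable)


MB = 1024 * 1024

# "Node 0: aperture @ a0000000 size 32 MB"
bios_aperture = (0xa0000000, 0xa0000000 + 32 * MB)

# The small reserved hole between the first two usable ranges.
reserved_hole = (0xacb8d000, 0xacb8f000)

print(overlaps_ram(*bios_aperture))   # True: the range lies inside usable RAM
print(overlaps_ram(*reserved_hole))   # False: the range is reserved, not RAM
```

Under these assumptions the 32 MB range at 0xa0000000 falls entirely inside the first usable range, which is consistent with the kernel ignoring the aperture and mapping a new one over RAM.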

Thanks

Yinghai Lu


* Re: Linux 2.6.39-rc3
  2011-04-13 18:51           ` H. Peter Anvin
@ 2011-04-13 19:24             ` Joerg Roedel
  0 siblings, 0 replies; 108+ messages in thread
From: Joerg Roedel @ 2011-04-13 19:24 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Ingo Molnar, Yinghai Lu, Alex Deucher, Linus Torvalds,
	Linux Kernel Mailing List, dri-devel, Thomas Gleixner, Tejun Heo

On Wed, Apr 13, 2011 at 11:51:39AM -0700, H. Peter Anvin wrote:
> On 04/13/2011 10:21 AM, Joerg Roedel wrote:
> > 
> > First of all, I bisected between v2.6.37-rc2..f005fe12b90c which where
> > only a couple of patches and merged v2.6.38-rc4 in at every step. There
> > was no failure found.
> > Then I tried this again, but this time I merged v2.6.38-rc5 at every
> > step and was successful. The bad commit in this branch turned out to be
> > 
> > 	1a4a678b12c84db9ae5dce424e0e97f0559bb57c
> > 
> > which is related to memblock.
> > 
> > Then I tried to find out which change between 2.6.38-rc4 and 2.6.38-rc5
> > is needed to trigger the failure, so I used f005fe12b90c as a base,
> > bisected between v2.6.38-rc4..v2.6.38-rc5 and merged every bisect step
> > into the base and tested. Here the bad commit turned out to be
> > 
> > 	e6d2e2b2b1e1455df16d68a78f4a3874c7b3ad20
> > 
> > which is related to gart. It turned out that the gart aperture on that
> > box is on another position with these patches. Before it was as
> > 0xa4000000 and now it is at 0xa0000000. It seems like this has something
> > to do with the root-cause.
> > 
> > Reverting commit 1a4a678b12c84db9ae5dce424e0e97f0559bb57c fixes the
> > problem btw. and booting with iommu=soft also works, but I have no idea
> > yet why the aperture at that address is a problem (with the patch
> > reverted the aperture lands at 0x80000000).
> > 
> 
> Does reverting e6d2e2b2b1e1455df16d68a78f4a3874c7b3ad20 solve the
> problem for you?

No, reverting that patch doesn't make the problem go away (and the gart
aperture is still at 0xa0000000). I tested this in 39-rc3; I haven't
tested whether it makes a difference on the original bisect commit from
Ingo. Probably it does (I don't know if that matters).
What is strange about this commit is that it fixes an x86 gart aperture
allocation bug in generic memblock code.

> 1a4a678b12c84db9ae5dce424e0e97f0559bb57c is a memory-allocation-order
> patch, which have a nasty tendency to unmask bugs elsewhere in the
> kernel.  However, e6d2e2b2b1e1455df16d68a78f4a3874c7b3ad20 looks
> positively strange (and it doesn't exactly help that the description is
> written in Yinghai-ese and is therefore nearly impossible to decode,
> never mind tell if it is remotely correct.)

I think that the two commits are okay and the bug is somewhere else, but
I have no idea yet where to look next. I spent some time looking at
radeon code and talking to Alex about it (because it seemed suspicious
that the GTT is at 0xa0000000 too, but as Alex explained to me this is an
address in the GPU address space and shouldn't matter).

Regards,

       Joerg	



* Re: Linux 2.6.39-rc3
  2011-04-13 18:39           ` H. Peter Anvin
@ 2011-04-13 19:26             ` Joerg Roedel
  0 siblings, 0 replies; 108+ messages in thread
From: Joerg Roedel @ 2011-04-13 19:26 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Ingo Molnar, Yinghai Lu, Alex Deucher, Linus Torvalds,
	Linux Kernel Mailing List, dri-devel, Thomas Gleixner, Tejun Heo

On Wed, Apr 13, 2011 at 11:39:29AM -0700, H. Peter Anvin wrote:
> On 04/13/2011 10:21 AM, Joerg Roedel wrote:
> > On Wed, Apr 13, 2011 at 08:46:09AM +0200, Ingo Molnar wrote:
> >> Could you please send the before/after bootlog (in particular all memory init 
> >> messages included) and your .config?
> >>
> >>  before:  f005fe12b90c: x86-64: Move out cleanup higmap [_brk_end, _end) out of init_memory_mapping()
> >>   after:  d2137d5af425: Merge branch 'linus' into x86/bootmem
> >>
> >> I've Cc:-ed more people who might have an idea about it.
> > 
> > Okay, I have done some more bisecting and debugging today.
> > 
> 
> First of all, *huge* thanks for this effort.  At least we need to track
> down the bits that need to be reverted -- it is past rc3, and it's time
> to see what we should revert and tell the submitter to try again next cycle.
> 
> This looks to be the same issue as in bugzilla 33012:
> 
> 	https://bugzilla.kernel.org/show_bug.cgi?id=33012
> 
> ... so it would be good if we could keep the information in there.

Yes, I'll try to find my korg bugzilla account again and drop the
information from this email there.

	Joerg



* Re: Linux 2.6.39-rc3
  2011-04-13 19:14           ` Yinghai Lu
@ 2011-04-13 19:34             ` Joerg Roedel
  2011-04-13 20:48               ` Yinghai Lu
  2011-04-13 19:48             ` Alex Deucher
  2011-04-14  1:58               ` H. Peter Anvin
  2 siblings, 1 reply; 108+ messages in thread
From: Joerg Roedel @ 2011-04-13 19:34 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Ingo Molnar, Alex Deucher, Linus Torvalds,
	Linux Kernel Mailing List, dri-devel, H. Peter Anvin,
	Thomas Gleixner, Tejun Heo

On Wed, Apr 13, 2011 at 12:14:55PM -0700, Yinghai Lu wrote:
> thanks for the bisecting...
> 
> so those two patches uncover some problems.
> 
> [    0.000000] Checking aperture...
> [    0.000000] No AGP bridge found
> [    0.000000] Node 0: aperture @ a0000000 size 32 MB
> [    0.000000] Aperture pointing to e820 RAM. Ignoring.
> [    0.000000] Your BIOS doesn't leave a aperture memory hole
> [    0.000000] Please enable the IOMMU option in the BIOS setup
> [    0.000000] This costs you 64 MB of RAM
> [    0.000000]     memblock_x86_reserve_range: [0xa0000000-0xa3ffffff]       aperture64
> [    0.000000] Mapping aperture over 65536 KB of RAM @ a0000000
> 
> so kernel try to reallocate apperture. because BIOS allocated is pointed to RAM or size is too small.

It is actually beyond 4GB on that machine, this value read here is from
the previous kernel-boot. The BIOS does not reset these values on a
reboot.

> but your radeon does use [0xa0000000, 0xbfffffff)

Yes, I suspected that too (and spent a few hours reading radeon code),
but then I talked to Alex Deucher and he explained that these addresses
which the driver prints for GTT and VRAM are in the GPU address space
and do not refer to system RAM. So this shouldn't be the problem.

	Joerg



* Re: Linux 2.6.39-rc3
  2011-04-13 19:14           ` Yinghai Lu
  2011-04-13 19:34             ` Joerg Roedel
@ 2011-04-13 19:48             ` Alex Deucher
  2011-04-14  1:58               ` H. Peter Anvin
  2 siblings, 0 replies; 108+ messages in thread
From: Alex Deucher @ 2011-04-13 19:48 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Joerg Roedel, Ingo Molnar, Linus Torvalds,
	Linux Kernel Mailing List, dri-devel, H. Peter Anvin,
	Thomas Gleixner, Tejun Heo

On Wed, Apr 13, 2011 at 3:14 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On 04/13/2011 10:21 AM, Joerg Roedel wrote:
>> On Wed, Apr 13, 2011 at 08:46:09AM +0200, Ingo Molnar wrote:
>> First of all, I bisected between v2.6.37-rc2..f005fe12b90c which where
>> only a couple of patches and merged v2.6.38-rc4 in at every step. There
>> was no failure found.
>> Then I tried this again, but this time I merged v2.6.38-rc5 at every
>> step and was successful. The bad commit in this branch turned out to be
>>
>>       1a4a678b12c84db9ae5dce424e0e97f0559bb57c
>>
>> which is related to memblock.
>>
>> Then I tried to find out which change between 2.6.38-rc4 and 2.6.38-rc5
>> is needed to trigger the failure, so I used f005fe12b90c as a base,
>> bisected between v2.6.38-rc4..v2.6.38-rc5 and merged every bisect step
>> into the base and tested. Here the bad commit turned out to be
>>
>>       e6d2e2b2b1e1455df16d68a78f4a3874c7b3ad20
>>
>> which is related to gart. It turned out that the gart aperture on that
>> box is on another position with these patches. Before it was as
>> 0xa4000000 and now it is at 0xa0000000. It seems like this has something
>> to do with the root-cause.
>>
>> Reverting commit 1a4a678b12c84db9ae5dce424e0e97f0559bb57c fixes the
>> problem btw. and booting with iommu=soft also works, but I have no idea
>> yet why the aperture at that address is a problem (with the patch
>> reverted the aperture lands at 0x80000000).
>>
>> I have put some debug-data online. There is my .config and two
>> dmesg-files for good (==2.6.39-rc3 + revert) and bad (==2.6.39-rc3)
>> I also created these dmesg-files again with memblock=debug, maybe that
>> helps to find the problem. The files are at
>>
>>       http://www.8bytes.org/~joro/debug/
>
> thanks for the bisecting...
>
> so those two patches uncover some problems.
>
> [    0.000000] Checking aperture...
> [    0.000000] No AGP bridge found
> [    0.000000] Node 0: aperture @ a0000000 size 32 MB
> [    0.000000] Aperture pointing to e820 RAM. Ignoring.
> [    0.000000] Your BIOS doesn't leave a aperture memory hole
> [    0.000000] Please enable the IOMMU option in the BIOS setup
> [    0.000000] This costs you 64 MB of RAM
> [    0.000000]     memblock_x86_reserve_range: [0xa0000000-0xa3ffffff]       aperture64
> [    0.000000] Mapping aperture over 65536 KB of RAM @ a0000000
>
> so kernel try to reallocate apperture. because BIOS allocated is pointed to RAM or size is too small.
>
> but your radeon does use [0xa0000000, 0xbfffffff)
>
> [    4.281993] radeon 0000:01:05.0: VRAM: 320M 0x00000000C0000000 - 0x00000000D3FFFFFF (320M used)
> [    4.290672] radeon 0000:01:05.0: GTT: 512M 0x00000000A0000000 - 0x00000000BFFFFFFF
> [    4.298550] [drm] Detected VRAM RAM=320M, BAR=256M
> [    4.309857] [drm] RAM width 32bits DDR
> [    4.313748] [TTM] Zone  kernel: Available graphics memory: 1896524 kiB.
> [    4.320379] [TTM] Initializing pool allocator.
> [    4.324948] [drm] radeon: 320M of VRAM memory ready
> [    4.329832] [drm] radeon: 512M of GTT memory ready.
>
> and the one seems working:
>
> [    0.000000] Checking aperture...
> [    0.000000] No AGP bridge found
> [    0.000000] Node 0: aperture @ a0000000 size 32 MB
> [    0.000000] Aperture pointing to e820 RAM. Ignoring.
> [    0.000000] Your BIOS doesn't leave a aperture memory hole
> [    0.000000] Please enable the IOMMU option in the BIOS setup
> [    0.000000] This costs you 64 MB of RAM
> [    0.000000]     memblock_x86_reserve_range: [0x80000000-0x83ffffff]       aperture64
> [    0.000000] Mapping aperture over 65536 KB of RAM @ 80000000
> [    0.000000]     memblock_x86_reserve_range: [0xacb6bdc0-0xacb6bddf]          BOOTMEM
>
> It uses a different position...
>
> [    4.250159] radeon 0000:01:05.0: VRAM: 320M 0x00000000C0000000 - 0x00000000D3FFFFFF (320M used)
> [    4.258830] radeon 0000:01:05.0: GTT: 512M 0x00000000A0000000 - 0x00000000BFFFFFFF
> [    4.266742] [drm] Detected VRAM RAM=320M, BAR=256M
> [    4.271549] [drm] RAM width 32bits DDR
> [    4.275435] [TTM] Zone  kernel: Available graphics memory: 1896526 kiB.
> [    4.282066] [TTM] Initializing pool allocator.
> [    4.282085] usb 7-2: new full speed USB device number 2 using ohci_hcd
> [    4.293076] [drm] radeon: 320M of VRAM memory ready
> [    4.298277] [drm] radeon: 512M of GTT memory ready.
> [    4.303218] [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
> [    4.309854] [drm] Driver supports precise vblank timestamp query.
> [    4.315970] [drm] radeon: irq initialized.
> [    4.320094] [drm] GART: num cpu pages 131072, num gpu pages 131072
>
> So the question is why radeon is using the address range [0xa0000000 - 0xc0000000], while in E820 it is RAM ....

The VRAM and GTT addresses in the dmesg are internal GPU addresses, not
system addresses.  The GPU has its own internal address space for
on-chip memory clients (texture samplers, render buffers, display
controllers, etc.).  The GPU sets up two apertures in its internal
address space and on-chip client requests are forwarded to the
appropriate place by the GPU's memory controller.  Addresses in the
GPU's VRAM aperture go to local VRAM on discrete cards, or to the
stolen memory at the top of system memory for IGP cards.  Addresses in
the GPU's GTT aperture hit a page table and get forwarded to the
appropriate DMA pages.

Alex
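
To make the distinction concrete, here is a minimal sketch (not radeon
driver code) of how an address in the GPU's internal address space
might be classified against the two apertures.  The aperture bounds are
copied from the radeon dmesg lines quoted above; the type and function
names are hypothetical.

```c
#include <stdint.h>

/* Where the GPU's memory controller would route an on-chip request. */
enum gpu_target { GPU_TARGET_NONE, GPU_TARGET_VRAM, GPU_TARGET_GTT };

/* An inclusive range in the GPU's internal address space. */
struct gpu_aperture {
	uint64_t start;
	uint64_t end;
};

/* Bounds as printed by radeon in dmesg: GPU addresses, not system addresses. */
static const struct gpu_aperture vram_aperture = { 0xC0000000ULL, 0xD3FFFFFFULL };
static const struct gpu_aperture gtt_aperture  = { 0xA0000000ULL, 0xBFFFFFFFULL };

static int in_aperture(const struct gpu_aperture *ap, uint64_t gpu_addr)
{
	return gpu_addr >= ap->start && gpu_addr <= ap->end;
}

/* Hypothetical helper: decide which aperture a GPU-internal address hits. */
static enum gpu_target classify_gpu_addr(uint64_t gpu_addr)
{
	if (in_aperture(&vram_aperture, gpu_addr))
		return GPU_TARGET_VRAM;	/* local VRAM or stolen system memory */
	if (in_aperture(&gtt_aperture, gpu_addr))
		return GPU_TARGET_GTT;	/* translated through the GPU page table */
	return GPU_TARGET_NONE;
}
```

In this model classify_gpu_addr(0xA0000000) lands in the GTT aperture no
matter what sits at physical address 0xa0000000 in system memory; the
two address spaces merely share a number.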

>
> [    0.000000]  BIOS-e820: 0000000000100000 - 00000000acb8d000 (usable)
> [    0.000000]  BIOS-e820: 00000000acb8d000 - 00000000acb8f000 (reserved)
> [    0.000000]  BIOS-e820: 00000000acb8f000 - 00000000afce9000 (usable)
> [    0.000000]  BIOS-e820: 00000000afce9000 - 00000000afd21000 (reserved)
> [    0.000000]  BIOS-e820: 00000000afd21000 - 00000000afd4f000 (usable)
> [    0.000000]  BIOS-e820: 00000000afd4f000 - 00000000afdcf000 (reserved)
> [    0.000000]  BIOS-e820: 00000000afdcf000 - 00000000afecf000 (ACPI NVS)
> [    0.000000]  BIOS-e820: 00000000afecf000 - 00000000afeff000 (ACPI data)
> [    0.000000]  BIOS-e820: 00000000afeff000 - 00000000aff00000 (usable)
>
>
> So it looks like the BIOS programs a wrong address into the radeon card?
>
> Thanks
>
> Yinghai Lu
>
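
The "Aperture pointing to e820 RAM" message in the quoted dmesg amounts
to an overlap test between the aperture and the usable e820 ranges.  A
minimal sketch of such a test, assuming half-open ranges and using only
the BIOS-e820 entries quoted above (the map in the thread is partial);
all names here are hypothetical and this is not the kernel's own e820
code:

```c
#include <stddef.h>
#include <stdint.h>

enum e820_kind {
	E820_KIND_USABLE,
	E820_KIND_RESERVED,
	E820_KIND_ACPI_NVS,
	E820_KIND_ACPI_DATA,
};

/* Half-open range [start, end), as in the BIOS-e820 lines above. */
struct e820_range {
	uint64_t start;
	uint64_t end;
	enum e820_kind kind;
};

/* Only the entries quoted in this thread. */
static const struct e820_range e820_map[] = {
	{ 0x00100000ULL, 0xacb8d000ULL, E820_KIND_USABLE },
	{ 0xacb8d000ULL, 0xacb8f000ULL, E820_KIND_RESERVED },
	{ 0xacb8f000ULL, 0xafce9000ULL, E820_KIND_USABLE },
	{ 0xafce9000ULL, 0xafd21000ULL, E820_KIND_RESERVED },
	{ 0xafd21000ULL, 0xafd4f000ULL, E820_KIND_USABLE },
	{ 0xafd4f000ULL, 0xafdcf000ULL, E820_KIND_RESERVED },
	{ 0xafdcf000ULL, 0xafecf000ULL, E820_KIND_ACPI_NVS },
	{ 0xafecf000ULL, 0xafeff000ULL, E820_KIND_ACPI_DATA },
	{ 0xafeff000ULL, 0xaff00000ULL, E820_KIND_USABLE },
};

/* Return 1 if any part of [start, end) lies in a usable RAM range. */
static int overlaps_usable_ram(uint64_t start, uint64_t end)
{
	size_t i;

	for (i = 0; i < sizeof(e820_map) / sizeof(e820_map[0]); i++) {
		const struct e820_range *r = &e820_map[i];

		if (r->kind == E820_KIND_USABLE &&
		    start < r->end && r->start < end)
			return 1;
	}
	return 0;
}
```

With this map, a 32 MB aperture at 0xa0000000 overlaps usable RAM, which
is consistent with the kernel ignoring the value it read and allocating
its own aperture.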


* Re: Linux 2.6.39-rc3
  2011-04-13 19:34             ` Joerg Roedel
@ 2011-04-13 20:48               ` Yinghai Lu
  2011-04-13 20:54                 ` Linus Torvalds
  2011-04-13 21:50                 ` Joerg Roedel
  0 siblings, 2 replies; 108+ messages in thread
From: Yinghai Lu @ 2011-04-13 20:48 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Ingo Molnar, Alex Deucher, Linus Torvalds,
	Linux Kernel Mailing List, dri-devel, H. Peter Anvin,
	Thomas Gleixner, Tejun Heo

On 04/13/2011 12:34 PM, Joerg Roedel wrote:
> On Wed, Apr 13, 2011 at 12:14:55PM -0700, Yinghai Lu wrote:
>> thanks for the bisecting...
>>
>> so those two patches uncover some problems.
>>
>> [    0.000000] Checking aperture...
>> [    0.000000] No AGP bridge found
>> [    0.000000] Node 0: aperture @ a0000000 size 32 MB
>> [    0.000000] Aperture pointing to e820 RAM. Ignoring.
>> [    0.000000] Your BIOS doesn't leave a aperture memory hole
>> [    0.000000] Please enable the IOMMU option in the BIOS setup
>> [    0.000000] This costs you 64 MB of RAM
>> [    0.000000]     memblock_x86_reserve_range: [0xa0000000-0xa3ffffff]       aperture64
>> [    0.000000] Mapping aperture over 65536 KB of RAM @ a0000000
>>
>> So the kernel tries to reallocate the aperture, because the BIOS-allocated one points to RAM or its size is too small.
> 
> It is actually beyond 4GB on that machine, this value read here is from
> the previous kernel-boot. The BIOS does not reset these values on a
> reboot.
> 
>> But your radeon does use [0xa0000000, 0xbfffffff]:
> 
> Yes, I suspected that too (and spent a few hours reading radeon code),
> but then I talked to Alex Deucher and he explained that these addresses
> which the driver prints for GTT and VRAM are in the GPU address space
> and do not refer to system ram. So this shouldn't be the problem.


Can you try the following change? It will push the GART to 0x80000000.

diff --git a/arch/x86/kernel/aperture_64.c b/arch/x86/kernel/aperture_64.c
index 86d1ad4..3b6a9d5 100644
--- a/arch/x86/kernel/aperture_64.c
+++ b/arch/x86/kernel/aperture_64.c
@@ -83,7 +83,7 @@ static u32 __init allocate_aperture(void)
 	 * so don't use 512M below as gart iommu, leave the space for kernel
 	 * code for safe
 	 */
-	addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<20);
+	addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<21);
 	if (addr == MEMBLOCK_ERROR || addr + aper_size > 0xffffffff) {
 		printk(KERN_ERR
 			"Cannot allocate aperture memory hole (%lx,%uK)\n",


* Re: Linux 2.6.39-rc3
  2011-04-13 20:48               ` Yinghai Lu
@ 2011-04-13 20:54                 ` Linus Torvalds
  2011-04-13 21:23                   ` Yinghai Lu
  2011-04-13 21:50                 ` Joerg Roedel
  1 sibling, 1 reply; 108+ messages in thread
From: Linus Torvalds @ 2011-04-13 20:54 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Joerg Roedel, Ingo Molnar, Alex Deucher,
	Linux Kernel Mailing List, dri-devel, H. Peter Anvin,
	Thomas Gleixner, Tejun Heo

On Wed, Apr 13, 2011 at 1:48 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>
> can you try following change ? it will push gart to 0x80000000
>
> diff --git a/arch/x86/kernel/aperture_64.c b/arch/x86/kernel/aperture_64.c
> index 86d1ad4..3b6a9d5 100644
> --- a/arch/x86/kernel/aperture_64.c
> +++ b/arch/x86/kernel/aperture_64.c
> @@ -83,7 +83,7 @@ static u32 __init allocate_aperture(void)
>         * so don't use 512M below as gart iommu, leave the space for kernel
>         * code for safe
>         */
> -       addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<20);
> +       addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<21);

What are all the magic numbers, and why would 0x80000000 be special?

Why don't we write code that just works?

Or absent a "just works" set of patches, why don't we revert to code
that has years of testing?

This kind of "I broke things, so now I will jiggle things randomly
until they unbreak" is not acceptable.

Either explain why that fixes a real BUG (and why the magic constants
need to be what they are), or just revert the patch that caused the
problem, and go back to the allocation patterns that have years of
experience.

Guys, we've had this discussion before, in PCI allocation. We don't do
this. We tried switching the PCI region allocations to top-down, and
IT WAS A FAILURE. We reverted it to what we had years of testing with.

Don't just make random changes. There really are only two acceptable
models of development: "think and analyze" or "years and years of
testing on thousands of machines". Those two really do work.

                   Linus


* Re: Linux 2.6.39-rc3
  2011-04-13 20:54                 ` Linus Torvalds
@ 2011-04-13 21:23                   ` Yinghai Lu
  2011-04-13 23:39                     ` Linus Torvalds
  0 siblings, 1 reply; 108+ messages in thread
From: Yinghai Lu @ 2011-04-13 21:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Joerg Roedel, Ingo Molnar, Alex Deucher,
	Linux Kernel Mailing List, dri-devel, H. Peter Anvin,
	Thomas Gleixner, Tejun Heo

On 04/13/2011 01:54 PM, Linus Torvalds wrote:
> On Wed, Apr 13, 2011 at 1:48 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>>
>> can you try following change ? it will push gart to 0x80000000
>>
>> diff --git a/arch/x86/kernel/aperture_64.c b/arch/x86/kernel/aperture_64.c
>> index 86d1ad4..3b6a9d5 100644
>> --- a/arch/x86/kernel/aperture_64.c
>> +++ b/arch/x86/kernel/aperture_64.c
>> @@ -83,7 +83,7 @@ static u32 __init allocate_aperture(void)
>>         * so don't use 512M below as gart iommu, leave the space for kernel
>>         * code for safe
>>         */
>> -       addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<20);
>> +       addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<21);
> 
> What are all the magic numbers, and why would 0x80000000 be special?

That is the old value from when the kernel was doing bottom-up bootmem allocation.

> 
> Why don't we write code that just works?
> 
> Or absent a "just works" set of patches, why don't we revert to code
> that has years of testing?
> 
> This kind of "I broke things, so now I will jiggle things randomly
> until they unbreak" is not acceptable.
> 
> Either explain why that fixes a real BUG (and why the magic constants
> need to be what they are), or just revert the patch that caused the
> problem, and go back to the allocation patterns that have years of
> experience.
> 
> Guys, we've had this discussion before, in PCI allocation. We don't do
> this. We tried switching the PCI region allocations to top-down, and
> IT WAS A FAILURE. We reverted it to what we had years of testing with.
> 
> Don't just make random changes. There really are only two acceptable
> models of development: "think and analyze" or "years and years of
> testing on thousands of machines". Those two really do work.

We did do the analysis, and the only difference seems to be:
the good one is using 0x80000000
and the bad one is using 0xa0000000.

We are trying to figure out whether it needs a low address and only
happened to work because the kernel was doing bottom-up allocation.

Thanks

Yinghai
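
For reference, both addresses fall out of simple rounding arithmetic in
a toy top-down search.  The sketch below is a simplified model, not
memblock: it assumes a single free range (the first usable e820 range
quoted earlier in the thread), ignores reserved regions, and uses
hypothetical names throughout.

```c
#include <stdint.h>

#define NO_FIT UINT64_MAX

/* Round addr down to a multiple of align (align must be a power of two). */
static uint64_t round_down_pow2(uint64_t addr, uint64_t align)
{
	return addr & ~(align - 1);
}

/*
 * Highest aligned base such that [base, base + size) fits inside the
 * free range [start, end), or NO_FIT if there is no such base.
 */
static uint64_t find_top_down(uint64_t start, uint64_t end,
			      uint64_t size, uint64_t align)
{
	uint64_t base;

	if (end < size || end - size < start)
		return NO_FIT;

	base = round_down_pow2(end - size, align);
	return base >= start ? base : NO_FIT;
}
```

With start = 0x100000, end = 0xacb8d000 and a 64 MB aperture, an
alignment of 512ULL<<20 gives 0xa0000000 and an alignment of 512ULL<<21
gives 0x80000000, which is consistent with the "bad" dmesg and with what
the debug patch is said to do.  It says nothing about why one address
works and the other does not.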


* Re: Linux 2.6.39-rc3
  2011-04-13 20:48               ` Yinghai Lu
  2011-04-13 20:54                 ` Linus Torvalds
@ 2011-04-13 21:50                 ` Joerg Roedel
  2011-04-13 21:59                   ` Yinghai Lu
  2011-04-13 22:01                   ` H. Peter Anvin
  1 sibling, 2 replies; 108+ messages in thread
From: Joerg Roedel @ 2011-04-13 21:50 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Ingo Molnar, Alex Deucher, Linus Torvalds,
	Linux Kernel Mailing List, dri-devel, H. Peter Anvin,
	Thomas Gleixner, Tejun Heo

On Wed, Apr 13, 2011 at 01:48:48PM -0700, Yinghai Lu wrote:
> -	addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<20);
> +	addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<21);

Btw, while looking at this code I wondered why the 512M goal is enforced
by the alignment. Start could be set to 512M instead and the alignment
can be aper_size as it should. Any reason for such a big alignment?

	Joerg

P.S.: The box is still in the office, I will try this debug-patch
      tomorrow.
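
To illustrate the question, here is a toy bottom-up search (a simplified
model with hypothetical names, not the bootmem or memblock code) that
takes an explicit lower bound and an alignment.  It assumes one free
range and no reserved regions inside it.

```c
#include <stdint.h>

#define NO_FIT UINT64_MAX

/* Round addr up to a multiple of align (align must be a power of two). */
static uint64_t round_up_pow2(uint64_t addr, uint64_t align)
{
	return (addr + align - 1) & ~(align - 1);
}

/*
 * Lowest aligned base at or above lo such that [base, base + size)
 * still fits below hi, or NO_FIT if there is none.
 */
static uint64_t find_bottom_up(uint64_t lo, uint64_t hi,
			       uint64_t size, uint64_t align)
{
	uint64_t base = round_up_pow2(lo, align);

	if (base > hi || hi - base < size)
		return NO_FIT;
	return base;
}
```

In this model, a free range starting at 0x100000 searched with a 512M
alignment yields 0x20000000, and so does a search that starts at 512M
with a 64M alignment.  A free range starting at exactly 0 with only the
512M alignment would yield 0, so the alignment keeps the result above
512M only indirectly, while an explicit start states the intent.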



* Re: Linux 2.6.39-rc3
  2011-04-13 21:50                 ` Joerg Roedel
@ 2011-04-13 21:59                   ` Yinghai Lu
  2011-04-13 22:11                     ` H. Peter Anvin
  2011-04-13 22:01                   ` H. Peter Anvin
  1 sibling, 1 reply; 108+ messages in thread
From: Yinghai Lu @ 2011-04-13 21:59 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Ingo Molnar, Alex Deucher, Linus Torvalds,
	Linux Kernel Mailing List, dri-devel, H. Peter Anvin,
	Thomas Gleixner, Tejun Heo

On 04/13/2011 02:50 PM, Joerg Roedel wrote:
> On Wed, Apr 13, 2011 at 01:48:48PM -0700, Yinghai Lu wrote:
>> -	addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<20);
>> +	addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<21);
> 
> Btw, while looking at this code I wondered why the 512M goal is enforced
> by the alignment. Start could be set to 512M instead and the alignment
> can be aper_size as it should. Any reason for such a big alignment?
> 

When using bootmem, we tried to use a big alignment (512M) so that we could avoid taking a RAM range below 512M.

commit 7677b2ef6c0c4fddc84f6473f3863f40eb71821b
Author: Yinghai Lu <yhlu.kernel.send@gmail.com>
Date:   Mon Apr 14 20:40:37 2008 -0700

    x86_64: allocate gart aperture from 512M
    
    because we try to reserve dma32 early, so we have chance to get aperture
    from 64M.
    
    with some sequence aperture allocated from RAM, could become E820_RESERVED.
    
    and then if doing a kexec with a big kernel that uncompressed size is above
    64M we could have a range conflict with still using gart.
    
    So allocate gart aperture from 512M instead.
    
    Also change the fallback_aper_order to 5, because we don't have chance to get
    2G or 4G aperture.

We can change it back to 32M or make it equal to size.

> 
> P.S.: The box is still in the office, I will try this debug-patch
>       tomorrow.

Alexandre's system is working at 0xa4000000 with 2.6.38.2.

So it is not a low-address problem. There could be another reason,
e.g. some other code could need the lower address.

Thanks

Yinghai


* Re: Linux 2.6.39-rc3
  2011-04-13 21:50                 ` Joerg Roedel
  2011-04-13 21:59                   ` Yinghai Lu
@ 2011-04-13 22:01                   ` H. Peter Anvin
  2011-04-13 22:22                     ` Joerg Roedel
  1 sibling, 1 reply; 108+ messages in thread
From: H. Peter Anvin @ 2011-04-13 22:01 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Yinghai Lu, Ingo Molnar, Alex Deucher, Linus Torvalds,
	Linux Kernel Mailing List, dri-devel, Thomas Gleixner, Tejun Heo

On 04/13/2011 02:50 PM, Joerg Roedel wrote:
> On Wed, Apr 13, 2011 at 01:48:48PM -0700, Yinghai Lu wrote:
>> -	addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<20);
>> +	addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<21);
> 
> Btw, while looking at this code I wondered why the 512M goal is enforced
> by the alignment. Start could be set to 512M instead and the alignment
> can be aper_size as it should. Any reason for such a big alignment?
> 
> 	Joerg
> 
> P.S.: The box is still in the office, I will try this debug-patch
>       tomorrow.

The only reason that I can think of is that the aperture itself can be
huge, and perhaps 512 MiB is the biggest such known.  512ULL<<21 is of
course a particularly moronic way to write 1 GiB, but it was a debug patch.

The value 512 MiB apparently comes from
7677b2ef6c0c4fddc84f6473f3863f40eb71821b, which is apparently totally ad
hoc; effectively it tries to prevent a collision with kexec by
hardcoding the kdump allocation as it sat at that point in time in the
GART assignment rules.

Yeah.  Brilliant.

	-hpa
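
The shift arithmetic can be spelled out with named sizes.  The macros
and variable names below are hypothetical helpers for illustration, not
kernel definitions; the three expressions are the ones from the patch.

```c
#include <stdint.h>

/* Hypothetical size helpers. */
#define MIB(n) ((uint64_t)(n) << 20)
#define GIB(n) ((uint64_t)(n) << 30)

/* Alignment before the debug patch: 512 << 20 = 0x20000000 = 512 MiB. */
static const uint64_t old_align = 512ULL << 20;

/* Alignment in the debug patch: 512 << 21 = 0x40000000 = 1 GiB. */
static const uint64_t debug_align = 512ULL << 21;

/* Upper bound of the search: 1 << 32 = 4 GiB. */
static const uint64_t search_end = 1ULL << 32;
```

Writing the debug value as GIB(1) rather than 512ULL<<21 would make the
intended size visible at a glance.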



* Re: Linux 2.6.39-rc3
  2011-04-13 21:59                   ` Yinghai Lu
@ 2011-04-13 22:11                     ` H. Peter Anvin
  0 siblings, 0 replies; 108+ messages in thread
From: H. Peter Anvin @ 2011-04-13 22:11 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Joerg Roedel, Ingo Molnar, Alex Deucher, Linus Torvalds,
	Linux Kernel Mailing List, dri-devel, Thomas Gleixner, Tejun Heo

On 04/13/2011 02:59 PM, Yinghai Lu wrote:
> On 04/13/2011 02:50 PM, Joerg Roedel wrote:
>> On Wed, Apr 13, 2011 at 01:48:48PM -0700, Yinghai Lu wrote:
>>> -	addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<20);
>>> +	addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<21);
>>
>> Btw, while looking at this code I wondered why the 512M goal is enforced
>> by the alignment. Start could be set to 512M instead and the alignment
>> can be aper_size as it should. Any reason for such a big alignment?
>>
> 
> when using bootmem, try to use big alignment (512M ), so we could avoid take ram range below 512M.
> 

Yes, his question was why on Earth are you using 0 as start if that is
the purpose.

On top of that, where the hell does the magic 512 MiB come from?  It
looks like it is either completely ad hoc, or it has something to do with
where the kexec kernel was allocated once upon a time.

	-hpa


* Re: Linux 2.6.39-rc3
  2011-04-13 22:01                   ` H. Peter Anvin
@ 2011-04-13 22:22                     ` Joerg Roedel
  2011-04-13 22:31                       ` H. Peter Anvin
  0 siblings, 1 reply; 108+ messages in thread
From: Joerg Roedel @ 2011-04-13 22:22 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Yinghai Lu, Ingo Molnar, Alex Deucher, Linus Torvalds,
	Linux Kernel Mailing List, dri-devel, Thomas Gleixner, Tejun Heo

On Wed, Apr 13, 2011 at 03:01:10PM -0700, H. Peter Anvin wrote:
> On 04/13/2011 02:50 PM, Joerg Roedel wrote:
> > On Wed, Apr 13, 2011 at 01:48:48PM -0700, Yinghai Lu wrote:
> >> -	addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<20);
> >> +	addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<21);
> > 
> > Btw, while looking at this code I wondered why the 512M goal is enforced
> > by the alignment. Start could be set to 512M instead and the alignment
> > can be aper_size as it should. Any reason for such a big alignment?
> > 
> > 	Joerg
> > 
> > P.S.: The box is still in the office, I will try this debug-patch
> >       tomorrow.
> 
> The only reason that I can think of is that the aperture itself can be
> huge, and perhaps 512 MiB is the biggest such known. 

Well, that would work as well by just using aper_size as alignment, the
aperture needs to be aligned on its size anyway. This code only runs
when Linux allocates the aperture itself and, if I am not mistaken, it
always uses 64MB when doing this.

	Joerg



* Re: Linux 2.6.39-rc3
  2011-04-13 22:22                     ` Joerg Roedel
@ 2011-04-13 22:31                       ` H. Peter Anvin
  2011-04-14  8:59                         ` Joerg Roedel
  0 siblings, 1 reply; 108+ messages in thread
From: H. Peter Anvin @ 2011-04-13 22:31 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Yinghai Lu, Ingo Molnar, Alex Deucher, Linus Torvalds,
	Linux Kernel Mailing List, dri-devel, Thomas Gleixner, Tejun Heo

On 04/13/2011 03:22 PM, Joerg Roedel wrote:
> On Wed, Apr 13, 2011 at 03:01:10PM -0700, H. Peter Anvin wrote:
>> On 04/13/2011 02:50 PM, Joerg Roedel wrote:
>>> On Wed, Apr 13, 2011 at 01:48:48PM -0700, Yinghai Lu wrote:
>>>> -	addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<20);
>>>> +	addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<21);
>>>
>>> Btw, while looking at this code I wondered why the 512M goal is enforced
>>> by the alignment. Start could be set to 512M instead and the alignment
>>> can be aper_size as it should. Any reason for such a big alignment?
>>>
>>> 	Joerg
>>>
>>> P.S.: The box is still in the office, I will try this debug-patch
>>>       tomorrow.
>>
>> The only reason that I can think of is that the aperture itself can be
>> huge, and perhaps 512 MiB is the biggest such known. 
> 
> Well, that would work as well by just using aper_size as alignment, the
> aperture needs to be aligned on its size anyway. This code only runs
> when Linux allocates the aperture itself and, if I am not mistaken, it
> always uses 64MB when doing this.

Yes, I would agree with that.  The sane thing would be to set the base
to whatever address needs to be guarded against (WHICH SHOULD BE
MOTIVATED), and use aper_size as alignment, *unless* we are only using
the initial portion of a much larger hardware structure that needs
natural alignment (which isn't clear to me, I do know we sometimes use
only a fraction of the GART, but that doesn't mean we need to
naturally-align the entire thing, nor that 512 MiB is sufficient to do so.)

	-hpa
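
"Natural alignment" here means the base is a multiple of the (power of
two) size.  A minimal sketch of that check, with hypothetical function
names, using addresses that appear in this thread:

```c
#include <stdint.h>

/* Return 1 if x is a nonzero power of two. */
static int is_pow2(uint64_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/* Return 1 if base is a multiple of size, where size is a power of two. */
static int is_naturally_aligned(uint64_t base, uint64_t size)
{
	return is_pow2(size) && (base & (size - 1)) == 0;
}
```

For example, 0xa4000000 is naturally aligned for a 64 MiB aperture but
not for a 512 MiB one, while 0xa0000000 is aligned for both.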




* Re: Linux 2.6.39-rc3
  2011-04-13 21:23                   ` Yinghai Lu
@ 2011-04-13 23:39                     ` Linus Torvalds
  2011-04-14  0:10                       ` Yinghai Lu
  2011-04-14  2:03                       ` H. Peter Anvin
  0 siblings, 2 replies; 108+ messages in thread
From: Linus Torvalds @ 2011-04-13 23:39 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Joerg Roedel, Ingo Molnar, Alex Deucher,
	Linux Kernel Mailing List, dri-devel, H. Peter Anvin,
	Thomas Gleixner, Tejun Heo

On Wed, Apr 13, 2011 at 2:23 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>>
>> What are all the magic numbers, and why would 0x80000000 be special?
>
> that is the old value when kernel was doing bottom-up bootmem allocation.

I understand, BUT THAT IS STILL A TOTALLY MAGIC NUMBER!

It makes it come out the same ON THAT ONE MACHINE.  So no, it's not
"the old value". It's a random value that gets the old value in one
specific case.

>> Why don't we write code that just works?
>>
>> Or absent a "just works" set of patches, why don't we revert to code
>> that has years of testing?
>>
>> This kind of "I broke things, so now I will jiggle things randomly
>> until they unbreak" is not acceptable.
>>
>> Either explain why that fixes a real BUG (and why the magic constants
>> need to be what they are), or just revert the patch that caused the
>> problem, and go back to the allocation patterns that have years of
>> experience.
>>
>> Guys, we've had this discussion before, in PCI allocation. We don't do
>> this. We tried switching the PCI region allocations to top-down, and
>> IT WAS A FAILURE. We reverted it to what we had years of testing with.
>>
>> Don't just make random changes. There really are only two acceptable
>> models of development: "think and analyze" or "years and years of
>> testing on thousands of machines". Those two really do work.
>
> We did do the analyzing, and only difference seems to be:

No.

Yinghai, we have had this discussion before, and dammit, you need to
understand the difference between "understanding the problem" and "put
in random values until it works on one machine".

There was absolutely _zero_ analysis done. You do not actually
understand WHY the numbers matter. You just look at two random
numbers, and one works, the other does not. That's not "analyzing".
That's just "random number games".

If you cannot see and understand the difference between an actual
analytical solution where you _understand_ what the code is doing and
why, and "random numbers that happen to work on one machine", I don't
know what to tell you.

> good one is using 0x80000000
> and bad one is using 0xa0000000.
>
> We try to figure out if it needs low address and it happen to work
> because kernel was doing bottom up allocation.

No.

Let me repeat my point one more time.

You have TWO choices. Not more, not less:

 - choice #1: go back to the old allocation model. It's tested. It
doesn't regress. Admittedly we may not know exactly _why_ it works,
and it might not work on all machines, but it doesn't cause
regressions (ie the machines it doesn't work on it _never_ worked on).

   And this doesn't mean "old value for that _one_ machine". It means
"old value for _every_ machine". So it means we revert the whole
top-down thing entirely. Not just "change one random number so that
the totally different allocation pattern happens to give the same
result on one particular machine".

   Quite frankly, I don't see the point of doing top-to-bottom anyway,
so I think we should do this regardless. Just revert the whole
"allocate from top". It didn't work for PCI, it's not working for this
case either. Stop doing it.

 - Choice #2: understand exactly _what_ goes wrong, and fix it
analytically (ie by _understanding_ the problem, and being able to
solve it exactly, and in a way you can argue about without having to
resort to "magic happens").

Now, the whole analytic approach (aka "computer sciency" approach),
where you can actually think about the problem without having any
pesky "reality" impact the solution is obviously the one we tend to
prefer. Sadly, it's seldom the one we can use in reality when it comes
to things like resource allocation, since we end up starting off with
often buggy approximations of what the actual hardware is all about
(ie broken firmware tables).

So I'd love to know exactly why one random number works, and why
another one doesn't. But as long as we do _not_ know the "Why" of it,
we will have to revert.

It really is that simple. It's _always_ that simple.

So the numbers shouldn't be "magic", they should have real
explanations. And in the absence of real explanation, the model that
works is "this is what we've always done". Including, very much, the
whole allocation order. Not just one random number on one random
machine.

                        Linus


* Re: Linux 2.6.39-rc3
  2011-04-13 23:39                     ` Linus Torvalds
@ 2011-04-14  0:10                       ` Yinghai Lu
  2011-04-14  2:03                       ` H. Peter Anvin
  1 sibling, 0 replies; 108+ messages in thread
From: Yinghai Lu @ 2011-04-14  0:10 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Joerg Roedel, Ingo Molnar, Alex Deucher,
	Linux Kernel Mailing List, dri-devel, H. Peter Anvin,
	Thomas Gleixner, Tejun Heo

On 04/13/2011 04:39 PM, Linus Torvalds wrote:
> On Wed, Apr 13, 2011 at 2:23 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>>>
>>> What are all the magic numbers, and why would 0x80000000 be special?
>>
>> that is the old value when kernel was doing bottom-up bootmem allocation.
> 
> I understand, BUT THAT IS STILL A TOTALLY MAGIC NUMBER!
> 
> It makes it come out the same ON THAT ONE MACHINE.  So no, it's not
> "the old value". It's a random value that gets the old value in one
> specific case.

Alexandre's system works with 2.6.38.2, where the kernel allocates from 0xa4000000.
Joerg's system works with 2.6.39-rc3 when the top-down bootmem patch
	1a4a678b12c84db9ae5dce424e0e97f0559bb57c
is reverted, and the kernel then allocates at 0x80000000.
Alexandre's system also works when the alignment is increased to 1G, which makes
the kernel allocate 0x80000000 for the GART.

They do not work if the kernel allocates from 0xa0000000.

0xa0000000 looks like the same value as the radeon GTT base.


[    4.250159] radeon 0000:01:05.0: VRAM: 320M 0x00000000C0000000 - 0x00000000D3FFFFFF (320M used)
[    4.258830] radeon 0000:01:05.0: GTT: 512M 0x00000000A0000000 - 0x00000000BFFFFFFF
[    4.266742] [drm] Detected VRAM RAM=320M, BAR=256M
[    4.271549] [drm] RAM width 32bits DDR
[    4.275435] [TTM] Zone  kernel: Available graphics memory: 1896526 kiB.
[    4.282066] [TTM] Initializing pool allocator.
[    4.282085] usb 7-2: new full speed USB device number 2 using ohci_hcd
[    4.293076] [drm] radeon: 320M of VRAM memory ready
[    4.298277] [drm] radeon: 512M of GTT memory ready.
[    4.303218] [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
[    4.309854] [drm] Driver supports precise vblank timestamp query.
[    4.315970] [drm] radeon: irq initialized.
[    4.320094] [drm] GART: num cpu pages 131072, num gpu pages 131072

Alex said that 0xa0000000 is OK and is from the GPU address space:
---
The VRAM and GTT addresses in the dmesg are internal GPU addresses, not
system addresses.  The GPU has its own internal address space for
on-chip memory clients (texture samplers, render buffers, display
controllers, etc.).  The GPU sets up two apertures in its internal
address space and on-chip client requests are forwarded to the
appropriate place by the GPU's memory controller.  Addresses in the
GPU's VRAM aperture go to local VRAM on discrete cards, or to the
stolen memory at the top of system memory for IGP cards.  Addresses in
the GPU's GTT aperture hit a page table and get forwarded to the
appropriate DMA pages.
---

> 
>>> Why don't we write code that just works?
>>>
>>> Or absent a "just works" set of patches, why don't we revert to code
>>> that has years of testing?
>>>
>>> This kind of "I broke things, so now I will jiggle things randomly
>>> until they unbreak" is not acceptable.
>>>
>>> Either explain why that fixes a real BUG (and why the magic constants
>>> need to be what they are), or just revert the patch that caused the
>>> problem, and go back to the allocation patterns that have years of
>>> experience.
>>>
>>> Guys, we've had this discussion before, in PCI allocation. We don't do
>>> this. We tried switching the PCI region allocations to top-down, and
>>> IT WAS A FAILURE. We reverted it to what we had years of testing with.
>>>
>>> Don't just make random changes. There really are only two acceptable
>>> models of development: "think and analyze" or "years and years of
>>> testing on thousands of machines". Those two really do work.
>>
>> We did do the analyzing, and only difference seems to be:
> 
> No.
> 
> Yinghai, we have had this discussion before, and dammit, you need to
> understand the difference between "understanding the problem" and "put
> in random values until it works on one machine".
> 
> There was absolutely _zero_ analysis done. You do not actually
> understand WHY the numbers matter. You just look at two random
> numbers, and one works, the other does not. That's not "analyzing".
> That's just "random number games".
> 
> If you cannot see and understand the difference between an actual
> analytical solution where you _understand_ what the code is doing and
> why, and "random numbers that happen to work on one machine", I don't
> know what to tell you.
> 
>> good one is using 0x80000000
>> and bad one is using 0xa0000000.
>>
>> We try to figure out if it needs low address and it happen to work
>> because kernel was doing bottom up allocation.
> 
> No.
> 
> Let me repeat my point one more time.
> 
> You have TWO choices. Not more, not less:
> 
>  - choice #1: go back to the old allocation model. It's tested. It
> doesn't regress. Admittedly we may not know exactly _why_ it works,
> and it might not work on all machines, but it doesn't cause
> regressions (ie the machines it doesn't work on it _never_ worked on).
> 
>    And this doesn't mean "old value for that _one_ machine". It means
> "old value for _every_ machine". So it means we revert the whole
> top-down thing entirely. Not just "change one random number so that
> the totally different allocation pattern happens to give the same
> result on one particular machine".
> 
>    Quite frankly, I don't see the point of doing top-to-bottom anyway,
> so I think we should do this regardless. Just revert the whole
> "allocate from top". It didn't work for PCI, it's not working for this
> case either. Stop doing it.

We added some code to keep bootmem from using the low range.

> 
>  - Choice #2: understand exactly _what_ goes wrong, and fix it
> analytically (ie by _understanding_ the problem, and being able to
> solve it exactly, and in a way you can argue about without having to
> resort to "magic happens").
> 
> Now, the whole analytic approach (aka "computer sciency" approach),
> where you can actually think about the problem without having any
> pesky "reality" impact the solution is obviously the one we tend to
> prefer. Sadly, it's seldom the one we can use in reality when it comes
> to things like resource allocation, since we end up starting off with
> often buggy approximations of what the actual hardware is all about
> (ie broken firmware tables).
> 
> So I'd love to know exactly why one random number works, and why
> another one doesn't. But as long as we do _not_ know the "Why" of it,
> we will have to revert.
> 
> It really is that simple. It's _always_ that simple.
> 
> So the numbers shouldn't be "magic", they should have real
> explanations. And in the absence of real explanation, the model that
> works is "this is what we've always done". Including, very much, the
> whole allocation order. Not just one random number on one random
> machine.

OK, let's try to figure out why 0xa0000000 cannot be used.

If we cannot figure it out, we can revert

1a4a678b12c84db9ae5dce424e0e97f0559bb57c

thanks

Yinghai 

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-13 19:14           ` Yinghai Lu
@ 2011-04-14  1:58               ` H. Peter Anvin
  2011-04-13 19:48             ` Alex Deucher
  2011-04-14  1:58               ` H. Peter Anvin
  2 siblings, 0 replies; 108+ messages in thread
From: H. Peter Anvin @ 2011-04-14  1:58 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Joerg Roedel, Ingo Molnar, Alex Deucher, Linus Torvalds,
	Linux Kernel Mailing List, dri-devel, Thomas Gleixner, Tejun Heo

On 04/13/2011 12:14 PM, Yinghai Lu wrote:
> 
> so those two patches uncover some problems.
> 
> [    0.000000] Checking aperture...
> [    0.000000] No AGP bridge found
> [    0.000000] Node 0: aperture @ a0000000 size 32 MB
> [    0.000000] Aperture pointing to e820 RAM. Ignoring.
> [    0.000000] Your BIOS doesn't leave a aperture memory hole
> [    0.000000] Please enable the IOMMU option in the BIOS setup
> [    0.000000] This costs you 64 MB of RAM
> [    0.000000]     memblock_x86_reserve_range: [0xa0000000-0xa3ffffff]       aperture64
> [    0.000000] Mapping aperture over 65536 KB of RAM @ a0000000
> 
> so the kernel tries to reallocate the aperture, because the one the BIOS allocated points to RAM or is too small.
> 
> but your radeon does use [0xa0000000, 0xbfffffff]
> 
> [    4.281993] radeon 0000:01:05.0: VRAM: 320M 0x00000000C0000000 - 0x00000000D3FFFFFF (320M used)
> [    4.290672] radeon 0000:01:05.0: GTT: 512M 0x00000000A0000000 - 0x00000000BFFFFFFF
> [    4.298550] [drm] Detected VRAM RAM=320M, BAR=256M
> [    4.309857] [drm] RAM width 32bits DDR
> [    4.313748] [TTM] Zone  kernel: Available graphics memory: 1896524 kiB.
> [    4.320379] [TTM] Initializing pool allocator.
> [    4.324948] [drm] radeon: 320M of VRAM memory ready
> [    4.329832] [drm] radeon: 512M of GTT memory ready.
> 
> and the one that seems to work:
> 
> [    0.000000] Checking aperture...
> [    0.000000] No AGP bridge found
> [    0.000000] Node 0: aperture @ a0000000 size 32 MB
> [    0.000000] Aperture pointing to e820 RAM. Ignoring.
> [    0.000000] Your BIOS doesn't leave a aperture memory hole
> [    0.000000] Please enable the IOMMU option in the BIOS setup
> [    0.000000] This costs you 64 MB of RAM
> [    0.000000]     memblock_x86_reserve_range: [0x80000000-0x83ffffff]       aperture64
> [    0.000000] Mapping aperture over 65536 KB of RAM @ 80000000
> [    0.000000]     memblock_x86_reserve_range: [0xacb6bdc0-0xacb6bddf]          BOOTMEM
> 
> it uses a different position...
> 
> [    4.250159] radeon 0000:01:05.0: VRAM: 320M 0x00000000C0000000 - 0x00000000D3FFFFFF (320M used)
> [    4.258830] radeon 0000:01:05.0: GTT: 512M 0x00000000A0000000 - 0x00000000BFFFFFFF
> [    4.266742] [drm] Detected VRAM RAM=320M, BAR=256M
> [    4.271549] [drm] RAM width 32bits DDR
> [    4.275435] [TTM] Zone  kernel: Available graphics memory: 1896526 kiB.
> [    4.282066] [TTM] Initializing pool allocator.
> [    4.282085] usb 7-2: new full speed USB device number 2 using ohci_hcd
> [    4.293076] [drm] radeon: 320M of VRAM memory ready
> [    4.298277] [drm] radeon: 512M of GTT memory ready.
> [    4.303218] [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
> [    4.309854] [drm] Driver supports precise vblank timestamp query.
> [    4.315970] [drm] radeon: irq initialized.
> [    4.320094] [drm] GART: num cpu pages 131072, num gpu pages 131072
> 
> So the question is why radeon is using the address range [0xa0000000 - 0xc0000000], when in E820 it is RAM ....
> 
> [    0.000000]  BIOS-e820: 0000000000100000 - 00000000acb8d000 (usable)
> [    0.000000]  BIOS-e820: 00000000acb8d000 - 00000000acb8f000 (reserved)
> [    0.000000]  BIOS-e820: 00000000acb8f000 - 00000000afce9000 (usable)
> [    0.000000]  BIOS-e820: 00000000afce9000 - 00000000afd21000 (reserved)
> [    0.000000]  BIOS-e820: 00000000afd21000 - 00000000afd4f000 (usable)
> [    0.000000]  BIOS-e820: 00000000afd4f000 - 00000000afdcf000 (reserved)
> [    0.000000]  BIOS-e820: 00000000afdcf000 - 00000000afecf000 (ACPI NVS)
> [    0.000000]  BIOS-e820: 00000000afecf000 - 00000000afeff000 (ACPI data)
> [    0.000000]  BIOS-e820: 00000000afeff000 - 00000000aff00000 (usable)
> 
> so it looks like the BIOS programs the wrong address into the radeon card?
> 

Okay, staring at this, it definitely seems toxic to overlay the GART
over memory areas reserved by the BIOS.  If I were to guess, I would say
that the problem here seems to be that the kernel thinks it is
overlaying 64 MiB of memory, but the actual GART is in fact 512 MiB in
size -- 131072 CPU pages -- which now overlaps the BIOS reserved areas.
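
The arithmetic behind that guess can be checked with a small sketch. This is not kernel code: the helper names are hypothetical, the reserved ranges are copied from the BIOS-e820 lines quoted above, and it assumes 4 KiB pages and that the GTT range and the e820 map live in the same address space, which is exactly the hypothesis being tested here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open physical address range [start, end). */
struct range {
	uint64_t start;
	uint64_t end;
};

/* BIOS-e820 "(reserved)" entries copied from the quoted boot log. */
static const struct range e820_reserved[] = {
	{ 0xacb8d000ULL, 0xacb8f000ULL },
	{ 0xafce9000ULL, 0xafd21000ULL },
	{ 0xafd4f000ULL, 0xafdcf000ULL },
};

/* Bytes covered by `pages` pages of `page_size` bytes each. */
uint64_t pages_to_bytes(uint64_t pages, uint64_t page_size)
{
	return pages * page_size;
}

/* Count the reserved e820 entries that the window [start, end) touches. */
size_t count_reserved_overlaps(uint64_t start, uint64_t end)
{
	size_t i, hits = 0;

	assert(start <= end);
	for (i = 0; i < sizeof(e820_reserved) / sizeof(e820_reserved[0]); i++) {
		/* Two half-open ranges overlap iff each starts before the other ends. */
		if (start < e820_reserved[i].end && e820_reserved[i].start < end)
			hits++;
	}
	return hits;
}
```

Under these assumptions, 131072 pages of 4 KiB come to 512 MiB; a 64 MiB window at 0xa0000000 touches none of the reserved entries, while a 512 MiB window at the same base touches all three.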

Alex D., could you comment on the "num cpu pages" bit?

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-13 23:39                     ` Linus Torvalds
  2011-04-14  0:10                       ` Yinghai Lu
@ 2011-04-14  2:03                       ` H. Peter Anvin
  2011-04-14  2:27                           ` Linus Torvalds
  1 sibling, 1 reply; 108+ messages in thread
From: H. Peter Anvin @ 2011-04-14  2:03 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Yinghai Lu, Joerg Roedel, Ingo Molnar, Alex Deucher,
	Linux Kernel Mailing List, dri-devel, Thomas Gleixner, Tejun Heo

On 04/13/2011 04:39 PM, Linus Torvalds wrote:
> 
>  - Choice #2: understand exactly _what_ goes wrong, and fix it
> analytically (ie by _understanding_ the problem, and being able to
> solve it exactly, and in a way you can argue about without having to
> resort to "magic happens").
> 
> Now, the whole analytic approach (aka "computer sciency" approach),
> where you can actually think about the problem without having any
> pesky "reality" impact the solution is obviously the one we tend to
> prefer. Sadly, it's seldom the one we can use in reality when it comes
> to things like resource allocation, since we end up starting off with
> often buggy approximations of what the actual hardware is all about
> (ie broken firmware tables).
> 
> So I'd love to know exactly why one random number works, and why
> another one doesn't. But as long as we do _not_ know the "Why" of it,
> we will have to revert.
> 

Yes.  However, even if we *do* revert (and the time is running short on
not reverting) I would like to understand this particular one, simply
because I think it may very well be a problem that is manifesting itself
in other ways on other systems.

The other thing that this has uncovered is that we already have a bunch
of complete b*llsh*t magic numbers in this path, some of which are
trivially shown to be wrong or at least completely arbitrary, so there
are more issues here :(

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-14  1:58               ` H. Peter Anvin
  (?)
@ 2011-04-14  2:07               ` Dave Airlie
  2011-04-14  6:10                 ` H. Peter Anvin
  -1 siblings, 1 reply; 108+ messages in thread
From: Dave Airlie @ 2011-04-14  2:07 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Yinghai Lu, Linux Kernel Mailing List, dri-devel, Tejun Heo,
	Linus Torvalds, Thomas Gleixner

On Wed, 2011-04-13 at 18:58 -0700, H. Peter Anvin wrote:
> On 04/13/2011 12:14 PM, Yinghai Lu wrote:
> > 
> > so those two patches uncover some problems.
> > 
> > [    0.000000] Checking aperture...
> > [    0.000000] No AGP bridge found
> > [    0.000000] Node 0: aperture @ a0000000 size 32 MB
> > [    0.000000] Aperture pointing to e820 RAM. Ignoring.
> > [    0.000000] Your BIOS doesn't leave a aperture memory hole
> > [    0.000000] Please enable the IOMMU option in the BIOS setup
> > [    0.000000] This costs you 64 MB of RAM
> > [    0.000000]     memblock_x86_reserve_range: [0xa0000000-0xa3ffffff]       aperture64
> > [    0.000000] Mapping aperture over 65536 KB of RAM @ a0000000
> > 
> > so the kernel tries to reallocate the aperture, because the one the BIOS allocated points to RAM or is too small.
> > 
> > but your radeon does use [0xa0000000, 0xbfffffff]
> > 
> > [    4.281993] radeon 0000:01:05.0: VRAM: 320M 0x00000000C0000000 - 0x00000000D3FFFFFF (320M used)
> > [    4.290672] radeon 0000:01:05.0: GTT: 512M 0x00000000A0000000 - 0x00000000BFFFFFFF
> > [    4.298550] [drm] Detected VRAM RAM=320M, BAR=256M
> > [    4.309857] [drm] RAM width 32bits DDR
> > [    4.313748] [TTM] Zone  kernel: Available graphics memory: 1896524 kiB.
> > [    4.320379] [TTM] Initializing pool allocator.
> > [    4.324948] [drm] radeon: 320M of VRAM memory ready
> > [    4.329832] [drm] radeon: 512M of GTT memory ready.
> > 
> > and the one that seems to work:
> > 
> > [    0.000000] Checking aperture...
> > [    0.000000] No AGP bridge found
> > [    0.000000] Node 0: aperture @ a0000000 size 32 MB
> > [    0.000000] Aperture pointing to e820 RAM. Ignoring.
> > [    0.000000] Your BIOS doesn't leave a aperture memory hole
> > [    0.000000] Please enable the IOMMU option in the BIOS setup
> > [    0.000000] This costs you 64 MB of RAM
> > [    0.000000]     memblock_x86_reserve_range: [0x80000000-0x83ffffff]       aperture64
> > [    0.000000] Mapping aperture over 65536 KB of RAM @ 80000000
> > [    0.000000]     memblock_x86_reserve_range: [0xacb6bdc0-0xacb6bddf]          BOOTMEM
> > 
> > it uses a different position...
> > 
> > [    4.250159] radeon 0000:01:05.0: VRAM: 320M 0x00000000C0000000 - 0x00000000D3FFFFFF (320M used)
> > [    4.258830] radeon 0000:01:05.0: GTT: 512M 0x00000000A0000000 - 0x00000000BFFFFFFF
> > [    4.266742] [drm] Detected VRAM RAM=320M, BAR=256M
> > [    4.271549] [drm] RAM width 32bits DDR
> > [    4.275435] [TTM] Zone  kernel: Available graphics memory: 1896526 kiB.
> > [    4.282066] [TTM] Initializing pool allocator.
> > [    4.282085] usb 7-2: new full speed USB device number 2 using ohci_hcd
> > [    4.293076] [drm] radeon: 320M of VRAM memory ready
> > [    4.298277] [drm] radeon: 512M of GTT memory ready.
> > [    4.303218] [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
> > [    4.309854] [drm] Driver supports precise vblank timestamp query.
> > [    4.315970] [drm] radeon: irq initialized.
> > [    4.320094] [drm] GART: num cpu pages 131072, num gpu pages 131072
> > 
> > So the question is why radeon is using the address range [0xa0000000 - 0xc0000000], when in E820 it is RAM ....
> > 
> > [    0.000000]  BIOS-e820: 0000000000100000 - 00000000acb8d000 (usable)
> > [    0.000000]  BIOS-e820: 00000000acb8d000 - 00000000acb8f000 (reserved)
> > [    0.000000]  BIOS-e820: 00000000acb8f000 - 00000000afce9000 (usable)
> > [    0.000000]  BIOS-e820: 00000000afce9000 - 00000000afd21000 (reserved)
> > [    0.000000]  BIOS-e820: 00000000afd21000 - 00000000afd4f000 (usable)
> > [    0.000000]  BIOS-e820: 00000000afd4f000 - 00000000afdcf000 (reserved)
> > [    0.000000]  BIOS-e820: 00000000afdcf000 - 00000000afecf000 (ACPI NVS)
> > [    0.000000]  BIOS-e820: 00000000afecf000 - 00000000afeff000 (ACPI data)
> > [    0.000000]  BIOS-e820: 00000000afeff000 - 00000000aff00000 (usable)
> > 
> > so it looks like the BIOS programs the wrong address into the radeon card?
> > 
> 
> Okay, staring at this, it definitely seems toxic to overlay the GART
> over memory areas reserved by the BIOS.  If I were to guess, I would say
> that the problem here seems to be that the kernel thinks it is
> overlaying 64 MiB of memory, but the actual GART is in fact 512 MiB in
> size -- 131072 CPU pages -- which now overlaps the BIOS reserved areas.
> 
> Alex D., could you comment on the "num cpu pages" bit?

These are not CPU addresses. I think we've stated that already. Not the
droids.

the num cpu pages is how many CPU pages would be needed to fill the GPU
GTT, for those crazy cases where CPU pagesize != GPU pagesize.
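
A minimal sketch of that relationship, assuming a 4 KiB GPU page and using hypothetical names (this is an illustration, not the radeon driver's code): both counts are derived from the same GTT size, and they only differ when the CPU page size differs from the GPU page size.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed GPU page size for this sketch. */
#define SKETCH_GPU_PAGE_SIZE 4096ULL

struct gart_page_counts {
	uint64_t num_cpu_pages;	/* CPU pages needed to back the whole GTT */
	uint64_t num_gpu_pages;	/* GPU pages the GTT is made of */
};

/* Derive both page counts from the GTT size and the CPU page size. */
struct gart_page_counts gart_count_pages(uint64_t gtt_size, uint64_t cpu_page_size)
{
	struct gart_page_counts c;

	assert(cpu_page_size != 0 && gtt_size % cpu_page_size == 0);
	c.num_cpu_pages = gtt_size / cpu_page_size;
	c.num_gpu_pages = gtt_size / SKETCH_GPU_PAGE_SIZE;
	return c;
}
```

With a 512 MiB GTT and 4 KiB CPU pages both counts are 131072, matching the quoted log line. With a hypothetical 64 KiB CPU page the CPU count would drop to 8192 while the GPU count stays the same.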

Dave.



^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-14  2:03                       ` H. Peter Anvin
@ 2011-04-14  2:27                           ` Linus Torvalds
  0 siblings, 0 replies; 108+ messages in thread
From: Linus Torvalds @ 2011-04-14  2:27 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Yinghai Lu, Joerg Roedel, Ingo Molnar,
	Alex Deucher, Linux Kernel Mailing List, dri-devel,
	Thomas Gleixner, Tejun Heo

On Wednesday, April 13, 2011, H. Peter Anvin <hpa@zytor.com> wrote:
>
> Yes.  However, even if we *do* revert (and the time is running short on
> not reverting) I would like to understand this particular one, simply
> because I think it may very well be a problem that is manifesting itself
> in other ways on other systems.
>
> The other thing that this has uncovered is that we already have a bunch
> of complete b*llsh*t magic numbers in this

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-14  2:27                           ` Linus Torvalds
@ 2011-04-14  2:33                             ` Linus Torvalds
  -1 siblings, 0 replies; 108+ messages in thread
From: Linus Torvalds @ 2011-04-14  2:33 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: H. Peter Anvin, Yinghai Lu, Joerg Roedel, Ingo Molnar,
	Alex Deucher, Linux Kernel Mailing List, dri-devel,
	Thomas Gleixner, Tejun Heo

On Wednesday, April 13, 2011, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Wednesday, April 13, 2011, H. Peter Anvin <hpa@zytor.com> wrote:
>>
>> Yes.  However, even if we *do* revert (and the time is running short on
>> not reverting) I would like to understand this particular one, simply
>> because I think it may very well be a problem that is manifesting itself
>> in other ways on other systems.

 sorry, fingerfart. Anyway, I agree 100%.

 we definitely want to also understand the reason for things not
working, even if we do revert..

        Linus
>> of complete b*llsh*t magic numbers in this
>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-14  2:33                             ` Linus Torvalds
  (?)
@ 2011-04-14  4:03                             ` Tejun Heo
  2011-04-14  9:36                               ` Joerg Roedel
  -1 siblings, 1 reply; 108+ messages in thread
From: Tejun Heo @ 2011-04-14  4:03 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: H. Peter Anvin, Yinghai Lu, Joerg Roedel, Ingo Molnar,
	Alex Deucher, Linux Kernel Mailing List, dri-devel,
	Thomas Gleixner

Hello,

On Wed, Apr 13, 2011 at 07:33:40PM -0700, Linus Torvalds wrote:
> On Wednesday, April 13, 2011, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > On Wednesday, April 13, 2011, H. Peter Anvin <hpa@zytor.com> wrote:
> >>
> >> Yes.  However, even if we *do* revert (and the time is running short on
> >> not reverting) I would like to understand this particular one, simply
> >> because I think it may very well be a problem that is manifesting itself
> >> in other ways on other systems.
> 
>  sorry, fingerfart. Anyway, I agree 100%.
> 
>  we definitely want to also understand the reason for things not
> working, even if we do revert..

There were (and still are) places where memblock callers implemented
ad-hoc top-down allocation by stepping down start limit until
allocation succeeds.  Several of them have been removed since top-down
became the default behavior, so simply reverting the commit is likely
to cause subtle issues.  Maybe the best approach is introducing
@topdown parameter and use it selectively for pure memory allocations.
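
To make the difference concrete, here is a toy model of a range finder with a topdown switch, plus the ad-hoc "step the start limit down" emulation. It is only a sketch under stated assumptions: the function names and the topdown argument are hypothetical, it searches only the "usable" BIOS-e820 entries from the log quoted earlier in the thread, and it ignores reserved regions that the real memblock code would also take into account.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NOT_FOUND UINT64_MAX

/* Half-open range [start, end) of usable RAM. */
struct range {
	uint64_t start;
	uint64_t end;
};

/* "(usable)" BIOS-e820 entries from the quoted log, in ascending order. */
static const struct range usable[] = {
	{ 0x00100000ULL, 0xacb8d000ULL },
	{ 0xacb8f000ULL, 0xafce9000ULL },
	{ 0xafd21000ULL, 0xafd4f000ULL },
	{ 0xafeff000ULL, 0xaff00000ULL },
};
#define NR_USABLE (sizeof(usable) / sizeof(usable[0]))

static uint64_t round_down_to(uint64_t x, uint64_t align)
{
	return x & ~(align - 1);
}

static uint64_t round_up_to(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/* Find `size` bytes aligned to `align` (a power of two) inside [lo, hi). */
uint64_t find_in_range(uint64_t lo, uint64_t hi, uint64_t size,
		       uint64_t align, int topdown)
{
	size_t k;

	assert(align != 0 && (align & (align - 1)) == 0);

	for (k = 0; k < NR_USABLE; k++) {
		/* Walk regions from the top when topdown, else from the bottom. */
		const struct range *r = &usable[topdown ? NR_USABLE - 1 - k : k];
		uint64_t s = r->start > lo ? r->start : lo;
		uint64_t e = r->end < hi ? r->end : hi;
		uint64_t addr;

		if (e < s || e - s < size)
			continue;
		addr = topdown ? round_down_to(e - size, align)
			       : round_up_to(s, align);
		if (addr >= s && addr <= e - size)
			return addr;
	}
	return NOT_FOUND;
}

/* Ad-hoc top-down: lower the start limit until a bottom-up search succeeds. */
uint64_t find_stepping_down(uint64_t hi, uint64_t size, uint64_t align)
{
	uint64_t lo;

	assert(hi >= size);
	lo = round_down_to(hi - size, align);
	for (;;) {
		uint64_t addr = find_in_range(lo, hi, size, align, 0);

		if (addr != NOT_FOUND)
			return addr;
		if (lo < align)
			return NOT_FOUND;
		lo -= align;
	}
}
```

For a 64 MiB request below 4 GiB with 512 MiB alignment, this model returns 0x20000000 bottom-up and 0xa0000000 top-down; raising the alignment to 1 GiB moves the top-down result to 0x80000000. Those last two values match the two aperture locations in the quoted logs. The stepping-down emulation also lands on 0xa0000000, which is why callers written that way behave like a top-down search.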

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-14  2:07               ` Dave Airlie
@ 2011-04-14  6:10                 ` H. Peter Anvin
  0 siblings, 0 replies; 108+ messages in thread
From: H. Peter Anvin @ 2011-04-14  6:10 UTC (permalink / raw)
  To: Dave Airlie
  Cc: Yinghai Lu, Linux Kernel Mailing List, dri-devel, Tejun Heo,
	Linus Torvalds, Thomas Gleixner

On 04/13/2011 07:07 PM, Dave Airlie wrote:
>>
>> Okay, staring at this, it definitely seems toxic to overlay the GART
>> over memory areas reserved by the BIOS.  If I were to guess, I would say
>> that the problem here seems to be that the kernel thinks it is
>> overlaying 64 MiB of memory, but the actual GART is in fact 512 MiB in
>> size -- 131072 CPU pages -- which now overlaps the BIOS reserved areas.
>>
>> Alex D., could you comment on the "num cpu pages" bit?
> 
> These are not CPU addresses. I think we've stated that already. Not the
> droids.
> 
> the num cpu pages is how many CPU pages would be needed to fill the GPU
> GTT, for those crazy cases where CPU pagesize != GPU pagesize.
> 

OK, well, something is still weird.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-14  2:33                             ` Linus Torvalds
@ 2011-04-14  8:09                               ` Alan Cox
  -1 siblings, 0 replies; 108+ messages in thread
From: Alan Cox @ 2011-04-14  8:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: H. Peter Anvin, Yinghai Lu, Joerg Roedel, Ingo Molnar,
	Alex Deucher, Linux Kernel Mailing List, dri-devel,
	Thomas Gleixner, Tejun Heo

On Wed, 13 Apr 2011 19:33:40 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Wednesday, April 13, 2011, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > On Wednesday, April 13, 2011, H. Peter Anvin <hpa@zytor.com> wrote:
> >>
> >> Yes.  However, even if we *do* revert (and the time is running short on
> >> not reverting) I would like to understand this particular one, simply
> >> because I think it may very well be a problem that is manifesting itself
> >> in other ways on other systems.
> 
>  sorry, fingerfart. Anyway, I agree 100%.
> 
>  we definitely want to also understand the reason for things not
> working, even if we do revert..

Definitely because if it fails when the "magic" involves the GART base it
starts to sound like something may be hitting the wrong address space or
not flushing properly.


^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-12 19:21   ` Dave Jones
  2011-04-12 19:55     ` Linus Torvalds
@ 2011-04-14  8:20     ` Aneesh Kumar K.V
  2011-04-18 22:57       ` Kay Sievers
  1 sibling, 1 reply; 108+ messages in thread
From: Aneesh Kumar K.V @ 2011-04-14  8:20 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Linux Kernel Mailing List, Eric Sandeen

On Tue, 12 Apr 2011 15:21:03 -0400, Dave Jones <davej@redhat.com> wrote:
> On Tue, Apr 12, 2011 at 03:09:34PM -0400, Dave Jones wrote:
> 
>  > however, the output of mount looks very confused..
>  > 
>  > .38:
>  > /dev/mapper/vg_adamo-lv_home on /home type ext4 (rw,relatime,seclabel,barrier=1,data=ordered)
>  > 
>  > .39:
>  > - on /home type 79a9-4526-888c-1f86d35a6704 (rw,relatime,ext4)
>  > 
>  > It looks like /proc/self/mountinfo broke abi.
>  > 
>  > .38:
>  > 48 45 253:3 / /home rw,relatime - ext4 /dev/mapper/vg_adamo-lv_home rw,seclabel,barrier=1,data=ordered
>  > 
>  > .39:
>  > 46 22 253:3 / /home rw,relatime uuid:f3971858-79a9-4526-888c-1f86d35a6704 - ext4 /dev/mapper/vg_adamo-lv_home rw,seclabel,user_xattr,barrier=1,data=ordered
> 
> looks like this was caused by 93f1c20bc8cdb757be50566eff88d65c3b26881f
> 
> perhaps adding that string to the end of the line would preserve what mount expects ?

uuid:<value> is an optional field, as per
Documentation/filesystems/proc.txt. There was an error in libmount
parsing, which was fixed upstream recently.

You can find the details here:

http://thread.gmane.org/gmane.linux.kernel/1121533/focus=52474

-aneesh

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-14  1:58               ` H. Peter Anvin
  (?)
  (?)
@ 2011-04-14  8:56               ` Joerg Roedel
  2011-04-14  9:07                 ` Dave Airlie
                                   ` (2 more replies)
  -1 siblings, 3 replies; 108+ messages in thread
From: Joerg Roedel @ 2011-04-14  8:56 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Yinghai Lu, Ingo Molnar, Alex Deucher, Linus Torvalds,
	Linux Kernel Mailing List, dri-devel, Thomas Gleixner, Tejun Heo

On Wed, Apr 13, 2011 at 06:58:46PM -0700, H. Peter Anvin wrote:
> On 04/13/2011 12:14 PM, Yinghai Lu wrote:
> > 
> > so it looks like the BIOS programs the wrong address into the radeon card?
> > 
> 
> Okay, staring at this, it definitely seems toxic to overlay the GART
> over memory areas reserved by the BIOS.  If I were to guess, I would say
> that the problem here seems to be that the kernel thinks it is
> overlaying 64 MiB of memory, but the actual GART is in fact 512 MiB in
> size -- 131072 CPU pages -- which now overlaps the BIOS reserved areas.
> 
> Alex D., could you comment on the "num cpu pages" bit?

Okay, I tried the debug-patch from Yinghai (posted to the bugzilla):

--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -325,6 +325,8 @@ void radeon_gtt_location(struct radeon_device *rdev, struct radeon_mc *mc)
                        mc->gtt_size = size_bf;
                }
                mc->gtt_start = (mc->vram_start & ~mc->gtt_base_align) - mc->gtt_size;
+               if (mc->gtt_start == 0xa0000000)
+                       mc->gtt_start = 0x80000000;
        } else {
                if (mc->gtt_size > size_af) {
                        dev_warn(rdev->dev, "limiting GTT\n");

And this makes a difference: with this change on top of -rc3 the box boots
fine. So there seems to be some dependency between the GART base and the GTT
base even when they are in different address spaces.
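
For reference, the placement that the debug patch overrides can be reproduced with a small standalone sketch. The expression is the one shown in the diff context above; the function names are hypothetical, and the value of gtt_base_align on this machine is not given in the thread, so the sketch assumes a mask of 0.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Place the GTT directly below VRAM, as in the diff context:
 *   gtt_start = (vram_start & ~gtt_base_align) - gtt_size
 * gtt_base_align is used as a bit mask here, as in that expression.
 */
uint64_t gtt_start_below_vram(uint64_t vram_start, uint64_t gtt_base_align,
			      uint64_t gtt_size)
{
	uint64_t aligned = vram_start & ~gtt_base_align;

	assert(aligned >= gtt_size);
	return aligned - gtt_size;
}

/* The two-line debug hack: move the GTT only if it landed on 0xa0000000. */
uint64_t apply_debug_override(uint64_t gtt_start)
{
	return gtt_start == 0xa0000000ULL ? 0x80000000ULL : gtt_start;
}
```

With VRAM starting at 0xC0000000 and a 512 MiB GTT, as reported in the radeon log lines, the expression yields 0xA0000000, which is the same number as the relocated GART aperture base in the failing boot. The override changes only that one value and leaves any other placement alone.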

Alex, can you comment on this?

Regards,

	Joerg


^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-13 22:31                       ` H. Peter Anvin
@ 2011-04-14  8:59                         ` Joerg Roedel
  0 siblings, 0 replies; 108+ messages in thread
From: Joerg Roedel @ 2011-04-14  8:59 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Yinghai Lu, Ingo Molnar, Alex Deucher, Linus Torvalds,
	Linux Kernel Mailing List, dri-devel, Thomas Gleixner, Tejun Heo

On Wed, Apr 13, 2011 at 03:31:09PM -0700, H. Peter Anvin wrote:
> On 04/13/2011 03:22 PM, Joerg Roedel wrote:
> > On Wed, Apr 13, 2011 at 03:01:10PM -0700, H. Peter Anvin wrote:
> >> On 04/13/2011 02:50 PM, Joerg Roedel wrote:
> >>> On Wed, Apr 13, 2011 at 01:48:48PM -0700, Yinghai Lu wrote:
> >>>> -	addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<20);
> >>>> +	addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<21);
> >>>
> >>> Btw, while looking at this code I wondered why the 512M goal is enforced
> >>> by the alignment. Start could be set to 512M instead and the alignment
> >>> can be aper_size as it should. Any reason for such a big alignment?
> >>>
> >>> 	Joerg
> >>>
> >>> P.S.: The box is still in the office, I will try this debug-patch
> >>>       tomorrow.
> >>
> >> The only reason that I can think of is that the aperture itself can be
> >> huge, and perhaps 512 MiB is the biggest such known. 
> > 
> > Well, that would work as well by just using aper_size as alignment, the
> > aperture needs to be aligned on its size anyway. This code only runs
> > > when Linux allocates the aperture itself and, if I am not mistaken, it uses
> > always 64MB when doing this.
> 
> Yes, I would agree with that.  The sane thing would be to set the base
> to whatever address needs to be guarded against (WHICH SHOULD BE
> MOTIVATED), and use aper_size as alignment, *unless* we are only using
> the initial portion of a much larger hardware structure that needs
> natural alignment (which isn't clear to me, I do know we sometimes use
> only a fraction of the GART, but that doesn't mean we need to
> naturally-align the entire thing, nor that 512 MiB is sufficient to do so.)

What's allocated here is the address space for the aperture. The code
actually allocates memory, but all it needs is the physical address
range. This range is later programmed into hardware as the GART aperture
(the area the GART remaps).
The Linux code can split the aperture if necessary for DMA-API usage and
AGP usage. In that case each user gets half of the aperture and manages
it itself.
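
To make the alignment question from the quoted discussion concrete, here
is a minimal user-space sketch (not kernel code). The names
round_up_pow2() and first_fit() are hypothetical and only model a
bottom-up search over a single free range; memblock_find_in_range()
itself is not reproduced. Under those assumptions it shows that a 512M
start limit combined with aper_size alignment keeps the aperture above
512M just as the 512M alignment does.

```c
#include <assert.h>
#include <stdint.h>

#define MiB(x)	((uint64_t)(x) << 20)
#define NO_FIT	UINT64_MAX

/* Round addr up to a multiple of align; align must be a power of two. */
static uint64_t round_up_pow2(uint64_t addr, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}

/*
 * Hypothetical model: lowest aligned base inside the free range
 * [free_start, free_end) that is >= start and leaves room for size bytes.
 */
static uint64_t first_fit(uint64_t free_start, uint64_t free_end,
			  uint64_t start, uint64_t size, uint64_t align)
{
	uint64_t lo = free_start > start ? free_start : start;
	uint64_t base = round_up_pow2(lo, align);

	/* base < lo catches wrap-around in the rounding step. */
	if (base < lo || base > free_end || free_end - base < size)
		return NO_FIT;
	return base;
}
```

With a free range starting at 520M and a 64M aperture, start = 0 with
512M alignment lands at 1024M, while start = 512M with 64M alignment
lands at 576M. Both stay above 512M; only the second avoids the extra
alignment.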

	Joerg


^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-14  8:56               ` Joerg Roedel
@ 2011-04-14  9:07                 ` Dave Airlie
  2011-04-14  9:11                 ` Ingo Molnar
  2011-04-14 14:28                 ` Alex Deucher
  2 siblings, 0 replies; 108+ messages in thread
From: Dave Airlie @ 2011-04-14  9:07 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: H. Peter Anvin, Yinghai Lu, Ingo Molnar, Alex Deucher,
	Linus Torvalds, Linux Kernel Mailing List, dri-devel,
	Thomas Gleixner, Tejun Heo

On Thu, Apr 14, 2011 at 6:56 PM, Joerg Roedel <joro@8bytes.org> wrote:
> On Wed, Apr 13, 2011 at 06:58:46PM -0700, H. Peter Anvin wrote:
>> On 04/13/2011 12:14 PM, Yinghai Lu wrote:
>> >
>> > so looks bios program wrong address to the radon card?
>> >
>>
>> Okay, staring at this, it definitely seems toxic to overlay the GART
>> over memory areas reserved by the BIOS.  If I were to guess, I would say
>> that the problem here seems to be that the kernel thinks it is
>> overlaying 64 MiB of memory, but the actual GART is in fact 512 MiB in
>> size -- 131072 CPU pages -- which now overlaps the BIOS reserved areas.
>>
>> Alex D., could you comment on the "num cpu pages" bit?
>
> Okay, I tried the debug-patch from Yinghai (posted to the bugzilla):
>
> --- a/drivers/gpu/drm/radeon/radeon_device.c
> +++ b/drivers/gpu/drm/radeon/radeon_device.c
> @@ -325,6 +325,8 @@ void radeon_gtt_location(struct radeon_device *rdev, struct radeon_mc *mc)
>                        mc->gtt_size = size_bf;
>                }
>                mc->gtt_start = (mc->vram_start & ~mc->gtt_base_align) - mc->gtt_size;
> +               if (mc->gtt_start == 0xa0000000)
> +                       mc->gtt_start = 0x80000000;
>        } else {
>                if (mc->gtt_size > size_af) {
>                        dev_warn(rdev->dev, "limiting GTT\n");
>
> And this makes a difference, with this change on-top of -rc3 the box boots
> fine. So there seems to be some dependency between the GART base and the GTT
> base even when they are in different address spaces.
>
> Alex, can you comment on this?

Weird, either a hw bug or some access to the GTT is leaking out before
things are set up properly.

I think the RS780/880 docs are on the website, but generally the
address spaces are completely separate so anything getting through is
very unusual.

Dave.

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-14  8:56               ` Joerg Roedel
  2011-04-14  9:07                 ` Dave Airlie
@ 2011-04-14  9:11                 ` Ingo Molnar
  2011-04-14 14:31                   ` H. Peter Anvin
  2011-04-14 14:28                 ` Alex Deucher
  2 siblings, 1 reply; 108+ messages in thread
From: Ingo Molnar @ 2011-04-14  9:11 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: H. Peter Anvin, Yinghai Lu, Alex Deucher, Linus Torvalds,
	Linux Kernel Mailing List, dri-devel, Thomas Gleixner, Tejun Heo


* Joerg Roedel <joro@8bytes.org> wrote:

> On Wed, Apr 13, 2011 at 06:58:46PM -0700, H. Peter Anvin wrote:
> > On 04/13/2011 12:14 PM, Yinghai Lu wrote:
> > > 
> > > so looks bios program wrong address to the radon card?
> > > 
> > 
> > Okay, staring at this, it definitely seems toxic to overlay the GART
> > over memory areas reserved by the BIOS.  If I were to guess, I would say
> > that the problem here seems to be that the kernel thinks it is
> > overlaying 64 MiB of memory, but the actual GART is in fact 512 MiB in
> > size -- 131072 CPU pages -- which now overlaps the BIOS reserved areas.
> > 
> > Alex D., could you comment on the "num cpu pages" bit?
> 
> Okay, I tried the debug-patch from Yinghai (posted to the bugzilla):
> 
> --- a/drivers/gpu/drm/radeon/radeon_device.c
> +++ b/drivers/gpu/drm/radeon/radeon_device.c
> @@ -325,6 +325,8 @@ void radeon_gtt_location(struct radeon_device *rdev, struct radeon_mc *mc)
>                         mc->gtt_size = size_bf;
>                 }
>                 mc->gtt_start = (mc->vram_start & ~mc->gtt_base_align) - mc->gtt_size;
> +               if (mc->gtt_start == 0xa0000000)
> +                       mc->gtt_start = 0x80000000;
>         } else {
>                 if (mc->gtt_size > size_af) {
>                         dev_warn(rdev->dev, "limiting GTT\n");
> 
> And this makes a difference, with this change on-top of -rc3 the box boots
> fine. So there seems to be some dependency between the GART base and the GTT
> base even when they are in different address spaces.
> 
> Alex, can you comment on this?

I'd strongly suggest we revert back to the old and proven allocation order, as 
long as it results in valid layouts. Even if we figure out this particular 
GART/GTT assumption there might be a dozen others in other types of hardware.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-14  4:03                             ` Tejun Heo
@ 2011-04-14  9:36                               ` Joerg Roedel
  0 siblings, 0 replies; 108+ messages in thread
From: Joerg Roedel @ 2011-04-14  9:36 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Linus Torvalds, H. Peter Anvin, Yinghai Lu, Ingo Molnar,
	Alex Deucher, Linux Kernel Mailing List, dri-devel,
	Thomas Gleixner

On Thu, Apr 14, 2011 at 01:03:37PM +0900, Tejun Heo wrote:
> Hello,
> 
> On Wed, Apr 13, 2011 at 07:33:40PM -0700, Linus Torvalds wrote:
> > On Wednesday, April 13, 2011, Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > > On Wednesday, April 13, 2011, H. Peter Anvin <hpa@zytor.com> wrote:
> > >>
> > >> Yes.  However, even if we *do* revert (and the time is running short on
> > >> not reverting) I would like to understand this particular one, simply
> > >> because I think it may very well be a problem that is manifesting itself
> > >> in other ways on other systems.
> > 
> >  sorry, fingerfart. Anyway, I agree 100%.
> > 
> >  we definitely want to also understand the reason for things not
> > working, even if we do revert..
> 
> There were (and still are) places where memblock callers implemented
> ad-hoc top-down allocation by stepping down start limit until
> allocation succeeds.  Several of them have been removed since top-down
> became the default behavior, so simply reverting the commit is likely
> to cause subtle issues.  Maybe the best approach is introducing
> @topdown parameter and use it selectively for pure memory allocations.

Wouldn't it be better to provide a separate memblock allocation
function which operates top-down and use this one in the places that
need it? This way it wouldn't break code that relies on bottom-up.
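
A rough user-space sketch of the two options, to show what is being
compared. Everything here is a toy model: struct free_range and the
find_*() helpers are hypothetical names, the free ranges are assumed to
be sorted and non-overlapping, and none of this is the memblock API.
find_stepping_down() models the ad-hoc pattern Tejun describes (step the
start limit down until the bottom-up finder succeeds), and
find_top_down() models a dedicated top-down function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NO_FIT UINT64_MAX

struct free_range {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
};

static uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }
static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }

/* Bottom-up: lowest aligned base in [start, end) that fits size bytes. */
static uint64_t find_bottom_up(const struct free_range *r, size_t n,
			       uint64_t start, uint64_t end,
			       uint64_t size, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	for (size_t i = 0; i < n; i++) {
		uint64_t lo = max_u64(r[i].start, start);
		uint64_t hi = min_u64(r[i].end, end);
		uint64_t base = (lo + align - 1) & ~(align - 1);

		if (base >= lo && base < hi && hi - base >= size)
			return base;
	}
	return NO_FIT;
}

/* Dedicated top-down variant: highest aligned base in [start, end). */
static uint64_t find_top_down(const struct free_range *r, size_t n,
			      uint64_t start, uint64_t end,
			      uint64_t size, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	for (size_t i = n; i-- > 0; ) {
		uint64_t lo = max_u64(r[i].start, start);
		uint64_t hi = min_u64(r[i].end, end);
		uint64_t base;

		if (hi < lo || hi - lo < size)
			continue;
		base = (hi - size) & ~(align - 1);
		if (base >= lo)
			return base;
	}
	return NO_FIT;
}

/* Ad-hoc pattern: retry the bottom-up finder with a decreasing start. */
static uint64_t find_stepping_down(const struct free_range *r, size_t n,
				   uint64_t start, uint64_t end,
				   uint64_t size, uint64_t step)
{
	uint64_t s;

	assert(step != 0 && (step & (step - 1)) == 0);
	if (end < size || end - size < start)
		return NO_FIT;

	s = (end - size) & ~(step - 1);
	while (s >= start) {
		uint64_t base = find_bottom_up(r, n, s, end, size, step);

		if (base != NO_FIT)
			return base;
		if (s < step)
			break;
		s -= step;
	}
	return NO_FIT;
}
```

In this model both approaches return the same address, but the stepping
variant only behaves top-down because the caller loops. Callers that
plainly use the bottom-up finder keep their old results either way.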

	Joerg


^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-14  8:56               ` Joerg Roedel
  2011-04-14  9:07                 ` Dave Airlie
  2011-04-14  9:11                 ` Ingo Molnar
@ 2011-04-14 14:28                 ` Alex Deucher
  2011-04-14 21:09                   ` Joerg Roedel
  2 siblings, 1 reply; 108+ messages in thread
From: Alex Deucher @ 2011-04-14 14:28 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: H. Peter Anvin, Yinghai Lu, Ingo Molnar, Linus Torvalds,
	Linux Kernel Mailing List, dri-devel, Thomas Gleixner, Tejun Heo

On Thu, Apr 14, 2011 at 4:56 AM, Joerg Roedel <joro@8bytes.org> wrote:
> On Wed, Apr 13, 2011 at 06:58:46PM -0700, H. Peter Anvin wrote:
>> On 04/13/2011 12:14 PM, Yinghai Lu wrote:
>> >
>> > so looks bios program wrong address to the radon card?
>> >
>>
>> Okay, staring at this, it definitely seems toxic to overlay the GART
>> over memory areas reserved by the BIOS.  If I were to guess, I would say
>> that the problem here seems to be that the kernel thinks it is
>> overlaying 64 MiB of memory, but the actual GART is in fact 512 MiB in
>> size -- 131072 CPU pages -- which now overlaps the BIOS reserved areas.
>>
>> Alex D., could you comment on the "num cpu pages" bit?
>
> Okay, I tried the debug-patch from Yinghai (posted to the bugzilla):
>
> --- a/drivers/gpu/drm/radeon/radeon_device.c
> +++ b/drivers/gpu/drm/radeon/radeon_device.c
> @@ -325,6 +325,8 @@ void radeon_gtt_location(struct radeon_device *rdev, struct radeon_mc *mc)
>                        mc->gtt_size = size_bf;
>                }
>                mc->gtt_start = (mc->vram_start & ~mc->gtt_base_align) - mc->gtt_size;
> +               if (mc->gtt_start == 0xa0000000)
> +                       mc->gtt_start = 0x80000000;
>        } else {
>                if (mc->gtt_size > size_af) {
>                        dev_warn(rdev->dev, "limiting GTT\n");
>
> And this makes a difference, with this change on-top of -rc3 the box boots
> fine. So there seems to be some dependency between the GART base and the GTT
> base even when they are in different address spaces.
>
> Alex, can you comment on this?

As Dave said, they are completely different address spaces.  You
could put the GPU aperture at 0 if you wanted (in fact we do on some
chips).  Perhaps there's some strange interaction with the nb gart
since the nb gart on that chipset was designed to be used for graphics
and the rs780/880 can be configured to use an agp aperture.
Unfortunately, I'm not that familiar with the nb gart.

Alex

>
> Regards,
>
>        Joerg
>
>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-14  9:11                 ` Ingo Molnar
@ 2011-04-14 14:31                   ` H. Peter Anvin
  0 siblings, 0 replies; 108+ messages in thread
From: H. Peter Anvin @ 2011-04-14 14:31 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Joerg Roedel, Yinghai Lu, Alex Deucher, Linus Torvalds,
	Linux Kernel Mailing List, dri-devel, Thomas Gleixner, Tejun Heo

On 04/14/2011 02:11 AM, Ingo Molnar wrote:
> 
> I'd strongly suggest we revert back to the old and proven allocation order, as 
> long as it results in valid layouts. Even if we figure out this particular 
> GART/GTT assumption there might be a dozen others in other types of hardware.
> 

Yes, but we might also be hiding a real bug which bites other hardware.
We have found real and very serious bugs in the kernel this way before
-- things where drivers scribble over random memory and allocation order
exposed the failure in a predictable way, as opposed to random crashes.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-12  0:40 Linux 2.6.39-rc3 Linus Torvalds
  2011-04-12  9:02 ` Joerg Roedel
  2011-04-12 19:09 ` Dave Jones
@ 2011-04-14 20:24 ` Borislav Petkov
  2011-04-14 20:55   ` Linus Torvalds
  2 siblings, 1 reply; 108+ messages in thread
From: Borislav Petkov @ 2011-04-14 20:24 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Linus Torvalds, Linux Kernel Mailing List

On Mon, Apr 11, 2011 at 05:40:11PM -0700, Linus Torvalds wrote:
> It's been another almost spookily calm week. Usually this kind of
> calmness happens much later in the -rc series (during -rc7 or -rc8,
> say), but I'm not going to complain. I'm just still waiting for the
> other shoe to drop.

Yep, I had to hit a buglet too. Looks like block layer, ->request_fn
(do_ide_request) called in an atomic region. Process is blkid, i.e.
some udev helper. Kernel is -rc3 + Joerg's USB quirk fix which should be
unrelated.

Happens when I put a cd in the drive. More info upon request.

[20933.365059] BUG: sleeping function called from invalid context at drivers/ide/ide-io.c:468
[20933.365113] in_atomic(): 1, irqs_disabled(): 0, pid: 5817, name: blkid
[20933.365166] no locks held by blkid/5817.
[20933.365193] Pid: 5817, comm: blkid Not tainted 2.6.39-rc3-00001-g1b521ee #9
[20933.365228] Call Trace:
[20933.365282]  [<ffffffff8102db2b>] __might_sleep+0x103/0x108
[20933.365312]  [<ffffffff812e61c8>] do_ide_request+0x4a/0x58e
[20933.365362]  [<ffffffff811951af>] ? cfq_prio_tree_add+0xb3/0xc2
[20933.365390]  [<ffffffff811974f3>] ? cfq_add_rq_rb+0xb1/0xc5
[20933.365437]  [<ffffffff81197579>] ? cfq_insert_request+0x72/0x433
[20933.365465]  [<ffffffff81187fd8>] __blk_run_queue+0x80/0xee
[20933.365511]  [<ffffffff81188141>] flush_plug_list+0xfb/0x139
[20933.365540]  [<ffffffff810a9a86>] ? sleep_on_page+0x12/0x12
[20933.365586]  [<ffffffff81188199>] __blk_flush_plug+0x1a/0x3a
[20933.365613]  [<ffffffff81442913>] schedule+0x3d2/0xb4b
[20933.365663]  [<ffffffff811a2cee>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[20933.365692]  [<ffffffff8100387f>] ? do_softirq+0x77/0x85
[20933.365739]  [<ffffffff81446244>] ? retint_restore_args+0xe/0xe
[20933.365767]  [<ffffffff810a9a86>] ? sleep_on_page+0x12/0x12
[20933.365812]  [<ffffffff81443377>] preempt_schedule+0x37/0x4b
[20933.365839]  [<ffffffff81445f92>] _raw_spin_unlock_irqrestore+0x64/0x69
[20933.365878]  [<ffffffff81057b3b>] prepare_to_wait_exclusive+0x6c/0x77
[20933.365925]  [<ffffffff81443806>] __wait_on_bit_lock+0x34/0x8f
[20933.365953]  [<ffffffff810a9a00>] __lock_page_killable+0x66/0x6d
[20933.366000]  [<ffffffff810579a1>] ? autoremove_wake_function+0x3d/0x3d
[20933.366090]  [<ffffffff810ab56d>] generic_file_aio_read+0x491/0x67c
[20933.366123]  [<ffffffff810e8137>] do_sync_read+0xcb/0x108
[20933.366183]  [<ffffffff81068f4d>] ? trace_hardirqs_on+0xd/0xf
[20933.366217]  [<ffffffff810e8bce>] vfs_read+0xb3/0x13b
[20933.366270]  [<ffffffff810e8d1f>] sys_read+0x4d/0x77
[20933.366303]  [<ffffffff81446aeb>] system_call_fastpath+0x16/0x1b

Thanks.

-- 
Regards/Gruss,
    Boris.

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-14 20:24 ` Borislav Petkov
@ 2011-04-14 20:55   ` Linus Torvalds
  2011-04-15  4:14     ` Christoph Hellwig
  0 siblings, 1 reply; 108+ messages in thread
From: Linus Torvalds @ 2011-04-14 20:55 UTC (permalink / raw)
  To: Borislav Petkov, Jens Axboe, Linus Torvalds, Linux Kernel Mailing List

On Thu, Apr 14, 2011 at 1:24 PM, Borislav Petkov <bp@alien8.de> wrote:
>
> Yep, I had to hit a buglet too. Looks like block layer, ->request_fn
> (do_ide_request) called in IRQ disabled region. Process is blkid, i.e.
> some udev helper. Kernel is -rc3 + Joerg's USB quirk fix which should be
> unrelated.

I think this particular backtrace should be fixed by commit
6631e635c65d ("block: don't flush plugged IO on forced preemtion
scheduling"), although even without preempt scheduling, I don't think
it's at all ok to sleep inside __blk_run_queue.

Jens? Even from a _regular_ schedule, it would not be ok if we end up
sleeping - we're caching things like the request-queue, and we have
preempt_disable() inside the scheduler for a very good reason.

So if the unplugging can cause sleeping, that's a problem. See the

        /* HLD do_request() callback might sleep, make sure it's okay */
        might_sleep();

comment in drivers/ide/ide-io.c. Hmm?
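
As a rough illustration of why the check fires, here is a small
user-space model. It is only a sketch: all toy_* names are hypothetical,
and the real preempt counting and might_sleep() machinery is far more
involved. It assumes the scheduler flushes plugged I/O while preemption
is disabled and that the request function contains a sleep check.

```c
#include <assert.h>
#include <stdbool.h>

static int toy_preempt_count;	/* > 0 means "atomic, must not sleep" */
static int toy_violations;	/* sleep checks that fired while atomic */

static void toy_preempt_disable(void)
{
	toy_preempt_count++;
}

static void toy_preempt_enable(void)
{
	assert(toy_preempt_count > 0);
	toy_preempt_count--;
}

static bool toy_in_atomic(void)
{
	return toy_preempt_count != 0;
}

/* Model of a might_sleep() style check: complain when called while atomic. */
static void toy_might_sleep(void)
{
	if (toy_in_atomic())
		toy_violations++;
}

/* Stand-in for a ->request_fn that may sleep, like do_ide_request(). */
static void toy_request_fn(void)
{
	toy_might_sleep();
}

/* Stand-in for a scheduler that flushes plugged I/O with preemption off. */
static void toy_schedule(bool have_plugged_io)
{
	toy_preempt_disable();
	if (have_plugged_io)
		toy_request_fn();	/* queue is run from atomic context */
	toy_preempt_enable();
}
```

Calling toy_request_fn() directly from "process context" records no
violation; going through toy_schedule(true) records one, which matches
the in_atomic(): 1 line in the trace.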

                    Linus

---
> Happens when I put a cd in the drive. More info upon request.
>
> [20933.365059] BUG: sleeping function called from invalid context at drivers/ide/ide-io.c:468
> [20933.365113] in_atomic(): 1, irqs_disabled(): 0, pid: 5817, name: blkid
> [20933.365166] no locks held by blkid/5817.
> [20933.365193] Pid: 5817, comm: blkid Not tainted 2.6.39-rc3-00001-g1b521ee #9
> [20933.365228] Call Trace:
> [20933.365282]  [<ffffffff8102db2b>] __might_sleep+0x103/0x108
> [20933.365312]  [<ffffffff812e61c8>] do_ide_request+0x4a/0x58e
> [20933.365362]  [<ffffffff811951af>] ? cfq_prio_tree_add+0xb3/0xc2
> [20933.365390]  [<ffffffff811974f3>] ? cfq_add_rq_rb+0xb1/0xc5
> [20933.365437]  [<ffffffff81197579>] ? cfq_insert_request+0x72/0x433
> [20933.365465]  [<ffffffff81187fd8>] __blk_run_queue+0x80/0xee
> [20933.365511]  [<ffffffff81188141>] flush_plug_list+0xfb/0x139
> [20933.365540]  [<ffffffff810a9a86>] ? sleep_on_page+0x12/0x12
> [20933.365586]  [<ffffffff81188199>] __blk_flush_plug+0x1a/0x3a
> [20933.365613]  [<ffffffff81442913>] schedule+0x3d2/0xb4b
> [20933.365663]  [<ffffffff811a2cee>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [20933.365692]  [<ffffffff8100387f>] ? do_softirq+0x77/0x85
> [20933.365739]  [<ffffffff81446244>] ? retint_restore_args+0xe/0xe
> [20933.365767]  [<ffffffff810a9a86>] ? sleep_on_page+0x12/0x12
> [20933.365812]  [<ffffffff81443377>] preempt_schedule+0x37/0x4b
> [20933.365839]  [<ffffffff81445f92>] _raw_spin_unlock_irqrestore+0x64/0x69
> [20933.365878]  [<ffffffff81057b3b>] prepare_to_wait_exclusive+0x6c/0x77
> [20933.365925]  [<ffffffff81443806>] __wait_on_bit_lock+0x34/0x8f
> [20933.365953]  [<ffffffff810a9a00>] __lock_page_killable+0x66/0x6d
> [20933.366000]  [<ffffffff810579a1>] ? autoremove_wake_function+0x3d/0x3d
> [20933.366090]  [<ffffffff810ab56d>] generic_file_aio_read+0x491/0x67c
> [20933.366123]  [<ffffffff810e8137>] do_sync_read+0xcb/0x108
> [20933.366183]  [<ffffffff81068f4d>] ? trace_hardirqs_on+0xd/0xf
> [20933.366217]  [<ffffffff810e8bce>] vfs_read+0xb3/0x13b
> [20933.366270]  [<ffffffff810e8d1f>] sys_read+0x4d/0x77
> [20933.366303]  [<ffffffff81446aeb>] system_call_fastpath+0x16/0x1b
>
> Thanks.
>
> --
> Regards/Gruss,
>    Boris.
>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-14 14:28                 ` Alex Deucher
@ 2011-04-14 21:09                   ` Joerg Roedel
  2011-04-14 21:34                     ` Alex Deucher
  2011-04-15  8:26                       ` Michel Dänzer
  0 siblings, 2 replies; 108+ messages in thread
From: Joerg Roedel @ 2011-04-14 21:09 UTC (permalink / raw)
  To: Alex Deucher
  Cc: H. Peter Anvin, Yinghai Lu, Ingo Molnar, Linus Torvalds,
	Linux Kernel Mailing List, dri-devel, Thomas Gleixner, Tejun Heo

On Thu, Apr 14, 2011 at 10:28:43AM -0400, Alex Deucher wrote:
> On Thu, Apr 14, 2011 at 4:56 AM, Joerg Roedel <joro@8bytes.org> wrote:
> > And this makes a difference, with this change on-top of -rc3 the box boots
> > fine. So there seems to be some dependency between the GART base and the GTT
> > base even when they are in different address spaces.
> >
> > Alex, can you comment on this?
> 
> As Dave said, they are completely different addresses spaces.  You
> could put the GPU aperture at 0 if you wanted (in fact we do on some
> chips).  Perhaps there's some strange interaction with the nb gart
> since the nb gart on that chipset was designed to be used for graphics
> and the rs780/880 can be configured to use an agp aperture.
> Unfortunately, I'm not that familiar with the nb gart.

Actually, the nb gart is part of the cpu. It is part of the cpu north
bridge and can translate io and cpu accesses. In fact, it is a remapper
of physical memory addresses.

The problem seems to be related to specific gpu chips. On another
notebook with an hd3000 card gtt and the nb gart aperture are both on
0xa0000000 too but the box works fine. I haven't tested with an hd5000
yet. The failing notebook has an hd4200 mobility.

Btw. what happens if the gpu accesses an unmapped address in the gtt
range?

Regards,

	Joerg


^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-14 21:09                   ` Joerg Roedel
@ 2011-04-14 21:34                     ` Alex Deucher
  2011-04-15  6:50                       ` Joerg Roedel
  2011-04-15 14:49                       ` Andreas Herrmann
  2011-04-15  8:26                       ` Michel Dänzer
  1 sibling, 2 replies; 108+ messages in thread
From: Alex Deucher @ 2011-04-14 21:34 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: H. Peter Anvin, Yinghai Lu, Ingo Molnar, Linus Torvalds,
	Linux Kernel Mailing List, dri-devel, Thomas Gleixner, Tejun Heo

On Thu, Apr 14, 2011 at 5:09 PM, Joerg Roedel <joro@8bytes.org> wrote:
> On Thu, Apr 14, 2011 at 10:28:43AM -0400, Alex Deucher wrote:
>> On Thu, Apr 14, 2011 at 4:56 AM, Joerg Roedel <joro@8bytes.org> wrote:
>> > And this makes a difference, with this change on-top of -rc3 the box boots
>> > fine. So there seems to be some dependency between the GART base and the GTT
>> > base even when they are in different address spaces.
>> >
>> > Alex, can you comment on this?
>>
>> As Dave said, they are completely different addresses spaces.  You
>> could put the GPU aperture at 0 if you wanted (in fact we do on some
>> chips).  Perhaps there's some strange interaction with the nb gart
>> since the nb gart on that chipset was designed to be used for graphics
>> and the rs780/880 can be configured to use an agp aperture.
>> Unfortunately, I'm not that familiar with the nb gart.
>
> Actually, the nb gart is part of the cpu. It is part of the cpu north
> bridge and can translate io and cpu accesses. In fact, it is a remapper
> of physical memory addresses.

I know what it's for.  On an IGP the graphics chip is also part of the
north bridge, but it may not be related at all.

>
> The problem seems to be related to specific gpu chips. On another
> notebook with an hd3000 card gtt and the nb gart aperture are both on
> 0xa0000000 too but the box works fine. I havn't tested with an hd5000
> yet. The failing notebook has an hd4200 mobility.

What exact model is the hd3000?   Is it an IGP or a discrete GPU?  If
it's an IGP, it's identical to the hd4200 programming-wise.

>
> Btw. what happens if the gpu accesses an unmapped address in the gtt
> range?

It's redirected to a dummy page.

Alex

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-14 20:55   ` Linus Torvalds
@ 2011-04-15  4:14     ` Christoph Hellwig
  2011-04-20 20:12       ` Borislav Petkov
  0 siblings, 1 reply; 108+ messages in thread
From: Christoph Hellwig @ 2011-04-15  4:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Borislav Petkov, Jens Axboe, Linux Kernel Mailing List

On Thu, Apr 14, 2011 at 01:55:02PM -0700, Linus Torvalds wrote:
> On Thu, Apr 14, 2011 at 1:24 PM, Borislav Petkov <bp@alien8.de> wrote:
> >
> > Yep, I had to hit a buglet too. Looks like block layer, ->request_fn
> > (do_ide_request) called in IRQ disabled region. Process is blkid, i.e.
> > some udev helper. Kernel is -rc3 + Joerg's USB quirk fix which should be
> > unrelated.
> 
> I think this particular backtrace should be fixed by commit
> 6631e635c65d ("block: don't flush plugged IO on forced preemtion
> scheduling"), although even without preempt scheduling, I don't think
> it's at all ok to sleep inside __blk_run_queue.
> 
> Jens? Even from a _regular_ schedule, it would not be ok if we end up
> sleeping - we're caching things like the request-queue, and we have
> preempt_disable() inside the scheduler for a very good reason.

Jens already has a fix in his tree to always offload the block I/O
submission to blockd for this case.


^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-14 21:34                     ` Alex Deucher
@ 2011-04-15  6:50                       ` Joerg Roedel
  2011-04-15 14:49                       ` Andreas Herrmann
  1 sibling, 0 replies; 108+ messages in thread
From: Joerg Roedel @ 2011-04-15  6:50 UTC (permalink / raw)
  To: Alex Deucher
  Cc: H. Peter Anvin, Yinghai Lu, Ingo Molnar, Linus Torvalds,
	Linux Kernel Mailing List, dri-devel, Thomas Gleixner, Tejun Heo

On Thu, Apr 14, 2011 at 05:34:46PM -0400, Alex Deucher wrote:
> On Thu, Apr 14, 2011 at 5:09 PM, Joerg Roedel <joro@8bytes.org> wrote:

> > Actually, the nb gart is part of the cpu. It is part of the cpu north
> > bridge and can translate io and cpu accesses. In fact, it is a remapper
> > of physical memory addresses.
> 
> I know what it's for.  In the IGP graphics chip is also part of the
> north bridge, but it may not be related at all.

Okay, just wanted to make clear that it is part of the CPU and not of
the chipset :)

> > The problem seems to be related to specific gpu chips. On another
> > notebook with an hd3000 card gtt and the nb gart aperture are both on
> > 0xa0000000 too but the box works fine. I havn't tested with an hd5000
> > yet. The failing notebook has an hd4200 mobility.
> 
> What exact model is the hd3000?   Is it IGP GPU or a discrete GPU?  It
> it's an IGP, it's identical to the hd4200 programming-wise.

It is an IGP card, an 

	"ATI Technologies Inc RS780M/RS780MN [Radeon HD 3200 Graphics]"

according to lspci.

> > Btw. what happens if the gpu accesses an unmapped address in the gtt
> > range?
> 
> It's redirected to a dummy page.

So there should be no issue there either; this is a very weird bug.

	Joerg


^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-14 21:09                   ` Joerg Roedel
@ 2011-04-15  8:26                       ` Michel Dänzer
  2011-04-15  8:26                       ` Michel Dänzer
  1 sibling, 0 replies; 108+ messages in thread
From: Michel Dänzer @ 2011-04-15  8:26 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Alex Deucher, Yinghai Lu, Linux Kernel Mailing List, dri-devel,
	H. Peter Anvin, Tejun Heo, Linus Torvalds, Thomas Gleixner

On Don, 2011-04-14 at 23:09 +0200, Joerg Roedel wrote: 
> On Thu, Apr 14, 2011 at 10:28:43AM -0400, Alex Deucher wrote:
> > On Thu, Apr 14, 2011 at 4:56 AM, Joerg Roedel <joro@8bytes.org> wrote:
> > > And this makes a difference, with this change on-top of -rc3 the box boots
> > > fine. So there seems to be some dependency between the GART base and the GTT
> > > base even when they are in different address spaces.
> > >
> > > Alex, can you comment on this?
> > 
> > As Dave said, they are completely different addresses spaces.  You
> > could put the GPU aperture at 0 if you wanted (in fact we do on some
> > chips).  Perhaps there's some strange interaction with the nb gart
> > since the nb gart on that chipset was designed to be used for graphics
> > and the rs780/880 can be configured to use an agp aperture.
> > Unfortunately, I'm not that familiar with the nb gart.
> 
> Actually, the nb gart is part of the cpu. It is part of the cpu north
> bridge and can translate io and cpu accesses. In fact, it is a remapper
> of physical memory addresses.
> 
> The problem seems to be related to specific gpu chips. On another
> notebook with an hd3000 card gtt and the nb gart aperture are both on
> 0xa0000000 too but the box works fine.

Wasn't the working theory that the problem occurs if those two values
aren't the same?


-- 
Earthling Michel Dänzer           |                http://www.vmware.com
Libre software enthusiast         |          Debian, X and DRI developer

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-15  8:26                       ` Michel Dänzer
  (?)
@ 2011-04-15  8:55                       ` Joerg Roedel
  -1 siblings, 0 replies; 108+ messages in thread
From: Joerg Roedel @ 2011-04-15  8:55 UTC (permalink / raw)
  To: Michel Dänzer
  Cc: Alex Deucher, Yinghai Lu, Linux Kernel Mailing List, dri-devel,
	H. Peter Anvin, Tejun Heo, Linus Torvalds, Thomas Gleixner

On Fri, Apr 15, 2011 at 10:26:34AM +0200, Michel Dänzer wrote:
> On Don, 2011-04-14 at 23:09 +0200, Joerg Roedel wrote: 
> > On Thu, Apr 14, 2011 at 10:28:43AM -0400, Alex Deucher wrote:
> > > On Thu, Apr 14, 2011 at 4:56 AM, Joerg Roedel <joro@8bytes.org> wrote:
> > > > And this makes a difference, with this change on-top of -rc3 the box boots
> > > > fine. So there seems to be some dependency between the GART base and the GTT
> > > > base even when they are in different address spaces.
> > > >
> > > > Alex, can you comment on this?
> > > 
> > > As Dave said, they are completely different addresses spaces.  You
> > > could put the GPU aperture at 0 if you wanted (in fact we do on some
> > > chips).  Perhaps there's some strange interaction with the nb gart
> > > since the nb gart on that chipset was designed to be used for graphics
> > > and the rs780/880 can be configured to use an agp aperture.
> > > Unfortunately, I'm not that familiar with the nb gart.
> > 
> > Actually, the nb gart is part of the cpu. It is part of the cpu north
> > bridge and can translate io and cpu accesses. In fact, it is a remapper
> > of physical memory addresses.
> > 
> > The problem seems to be related to specific gpu chips. On another
> > notebook with an hd3000 card gtt and the nb gart aperture are both on
> > 0xa0000000 too but the box works fine.
> 
> Wasn't the working theory that the problem occurs if those two values
> aren't the same?

Yes it is, but this doesn't seem to be problematic on all radeon GPU
chips.

	Joerg


^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-14  2:33                             ` Linus Torvalds
                                               ` (2 preceding siblings ...)
  (?)
@ 2011-04-15 13:11                             ` Joerg Roedel
  2011-04-15 13:16                               ` Ingo Molnar
                                                 ` (3 more replies)
  -1 siblings, 4 replies; 108+ messages in thread
From: Joerg Roedel @ 2011-04-15 13:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: H. Peter Anvin, Yinghai Lu, Ingo Molnar, Alex Deucher,
	Linux Kernel Mailing List, dri-devel, Thomas Gleixner, Tejun Heo,
	alexandre.f.demers

On Wed, Apr 13, 2011 at 07:33:40PM -0700, Linus Torvalds wrote:
>  we definitely want to also understand the reason for things not
> working, even if we do revert..

Okay, here it is.

After experimenting with different configurations for the north-bridge
it turned out that a GART related MCE fires at the time the machine
reboots. BIOSes configure the machine to sync-flood in that case which
causes a reboot.

After decoding the MCE it turned out to be a GART TBL Wlk Error. Such
errors can happen if devices (speculatively) access GART ranges that are
mapped as invalid. The AMD BKDG for Fam10h CPUs recommends disabling
these errors entirely. Unfortunately, some BIOSes (including the one on
my laptop) forget to do this.
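
For readers who want to see the arithmetic, here is a small user-space
sketch of computing a per-bank mask MSR address and setting one mask
bit. The two MSR macros are the ones added by the patch further down.
The bank number (4, for the north bridge) and the bit position (10) are
assumptions made for illustration and should be checked against the
BKDG; the toy_* helper is hypothetical and does not touch any real MSR.

```c
#include <assert.h>
#include <stdint.h>

/* Macros as added to msr-index.h by the patch in this mail. */
#define MSR_AMD64_MC0_MASK	0xc0010044u
#define MSR_AMD64_MCx_MASK(x)	(MSR_AMD64_MC0_MASK + (x))

/* Assumptions for illustration only, not taken from this mail. */
#define TOY_NB_BANK		4	/* assumed north-bridge MCE bank */
#define TOY_GART_TBL_WLK_BIT	10	/* assumed GART table walk mask bit */

/* The mask registers are consecutive, one MSR per bank. */
static_assert(MSR_AMD64_MCx_MASK(TOY_NB_BANK) == 0xc0010048u,
	      "bank 4 mask MSR address");

/* Pure helper: return the mask value with the assumed bit set. */
static uint64_t toy_mask_gart_walk_errors(uint64_t mask)
{
	return mask | (1ULL << TOY_GART_TBL_WLK_BIT);
}
```

In a kernel context the value would be read from the MSR, passed
through a helper like this and written back; setting the bit is
idempotent, so it is harmless when the BIOS already did it.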

Below is a patch which disables these errors if the BIOS didn't do it.
It fixes the problem on my side.

Alexandre, can you try this patch on your machine too, please?

Regards,

	Joerg

>From aaacff8db50b6ed4345e337ecbe53e505699c7e5 Mon Sep 17 00:00:00 2001
From: Joerg Roedel <joerg.roedel@amd.com>
Date: Fri, 15 Apr 2011 14:47:40 +0200
Subject: [PATCH] x86/amd: Disable GartTlbWlkErr when BIOS forgets it

This patch disables GartTlbWlk errors on AMD Fam10h CPUs if
the BIOS forgets to do it (or is just too old). Leaving
these errors enabled can cause a sync-flood on the CPU
causing a reboot.

This patch is the fix for

	https://bugzilla.kernel.org/show_bug.cgi?id=33012

on my machine.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/include/asm/msr-index.h |    4 ++++
 arch/x86/kernel/cpu/amd.c        |   19 +++++++++++++++++++
 2 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index fd5a1f3..3cce714 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -96,11 +96,15 @@
 #define MSR_IA32_MC0_ADDR		0x00000402
 #define MSR_IA32_MC0_MISC		0x00000403
 
+#define MSR_AMD64_MC0_MASK		0xc0010044
+
 #define MSR_IA32_MCx_CTL(x)		(MSR_IA32_MC0_CTL + 4*(x))
 #define MSR_IA32_MCx_STATUS(x)		(MSR_IA32_MC0_STATUS + 4*(x))
 #define MSR_IA32_MCx_ADDR(x)		(MSR_IA32_MC0_ADDR + 4*(x))
 #define MSR_IA32_MCx_MISC(x)		(MSR_IA32_MC0_MISC + 4*(x))
 
+#define MSR_AMD64_MCx_MASK(x)		(MSR_AMD64_MC0_MASK + (x))
+
 /* These are consecutive and not in the normal 4er MCE bank block */
 #define MSR_IA32_MC0_CTL2		0x00000280
 #define MSR_IA32_MCx_CTL2(x)		(MSR_IA32_MC0_CTL2 + (x))
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 3ecece0..3532d3b 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -615,6 +615,25 @@ static void __cpuinit init_amd(struct cpuinfo_x86 *c)
 	/* As a rule processors have APIC timer running in deep C states */
 	if (c->x86 >= 0xf && !cpu_has_amd_erratum(amd_erratum_400))
 		set_cpu_cap(c, X86_FEATURE_ARAT);
+
+	/*
+	 * Disable GART TLB Walk Errors on Fam10h. We do this here
+	 * because this is always needed when GART is enabled, even in a
+	 * kernel which has no MCE support built in.
+	 */
+	if (c->x86 == 0x10) {
+		/*
+		 * BIOS should disable GartTlbWlk Errors itself. If it
+		 * doesn't, do it here as suggested by the BKDG.
+		 *
+		 * Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=33012
+		 */
+		u64 mask;
+
+		rdmsrl(MSR_AMD64_MCx_MASK(4), mask);
+		mask |= (1 << 10);
+		wrmsrl(MSR_AMD64_MCx_MASK(4), mask);
+	}
 }
 
 #ifdef CONFIG_X86_32
-- 
1.7.1


^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-15 13:11                             ` Joerg Roedel
@ 2011-04-15 13:16                               ` Ingo Molnar
  2011-04-15 14:33                                 ` Joerg Roedel
  2011-04-15 15:46                                 ` Joerg Roedel
  2011-04-15 14:04                               ` Andreas Herrmann
                                                 ` (2 subsequent siblings)
  3 siblings, 2 replies; 108+ messages in thread
From: Ingo Molnar @ 2011-04-15 13:16 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Linus Torvalds, H. Peter Anvin, Yinghai Lu, Alex Deucher,
	Linux Kernel Mailing List, dri-devel, Thomas Gleixner, Tejun Heo,
	alexandre.f.demers


* Joerg Roedel <joro@8bytes.org> wrote:

> On Wed, Apr 13, 2011 at 07:33:40PM -0700, Linus Torvalds wrote:
> >  we definitely want to also understand the reason for things not
> > working, even if we do revert..
> 
> Okay, here it is.
> 
> After experimenting with different configurations for the north-bridge
> it turned out that a GART related MCE fires at the time the machine
> reboots. BIOSes configure the machine to sync-flood in that case which
> causes a reboot.
> 
> After decoding the MCE it turned out to be a GART TBL Wlk Error. Such
> errors can happen if devices (speculatively) access GART ranges mapped
> invalid. The AMD BKDG for Fam10h CPUs recommends disabling these errors
> entirely. But unfortunately some BIOSes (including the one on my laptop)
> forget to do this.
> 
> Below is a patch which disables these errors if the BIOS didn't do it.
> It fixes the problem on my side.

Ok, but how did the allocation changes start triggering this error in 
v2.6.39-rc1? There must still be some layout specific thing here, right?
Do we understand the details of that as well?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-15 13:11                             ` Joerg Roedel
  2011-04-15 13:16                               ` Ingo Molnar
@ 2011-04-15 14:04                               ` Andreas Herrmann
  2011-04-15 14:28                                 ` Joerg Roedel
  2011-04-15 14:16                               ` Alexandre Demers
  2011-04-16  0:03                               ` [tip:x86/urgent] x86, amd: Disable GartTlbWlkErr when BIOS forgets it tip-bot for Joerg Roedel
  3 siblings, 1 reply; 108+ messages in thread
From: Andreas Herrmann @ 2011-04-15 14:04 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Linus Torvalds, H. Peter Anvin, Yinghai Lu, Ingo Molnar,
	Alex Deucher, Linux Kernel Mailing List, dri-devel,
	Thomas Gleixner, Tejun Heo, alexandre.f.demers

On Fri, Apr 15, 2011 at 03:11:52PM +0200, Joerg Roedel wrote:
> On Wed, Apr 13, 2011 at 07:33:40PM -0700, Linus Torvalds wrote:
> >  we definitely want to also understand the reason for things not
> > working, even if we do revert..
> 
> Okay, here it is.
> 
> After experimenting with different configurations for the north-bridge
> it turned out that a GART related MCE fires at the time the machine
> reboots. BIOSes configure the machine to sync-flood in that case which
> causes a reboot.
> 
> After decoding the MCE it turned out to be a GART TBL Wlk Error. Such
> errors can happen if devices (speculatively) access GART ranges mapped
> invalid. The AMD BKDG for Fam10h CPUs recommends disabling these errors
> entirely. But unfortunately some BIOSes (including the one on my laptop)
> forget to do this.
> 
> Below is a patch which disables these errors if the BIOS didn't do it.
> It fixes the problem on my side.
> 
> Alexandre, can you try this patch on your machine too, please?
> 
> Regards,
> 
> 	Joerg
> 
> From aaacff8db50b6ed4345e337ecbe53e505699c7e5 Mon Sep 17 00:00:00 2001
> From: Joerg Roedel <joerg.roedel@amd.com>
> Date: Fri, 15 Apr 2011 14:47:40 +0200
> Subject: [PATCH] x86/amd: Disable GartTlbWlkErr when BIOS forgets it
> 
> This patch disables GartTlbWlk errors on AMD Fam10h CPUs if
> the BIOS forgets to do it (or is just too old). Leaving
> these errors enabled can cause a sync-flood on the CPU
> causing a reboot.
> 
> This patch is the fix for
> 
> 	https://bugzilla.kernel.org/show_bug.cgi?id=33012
> 
> on my machine.
> 
> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>


Joerg,

What about tagging this patch for stable/longterm releases?

Potentially there are other cases where certain combinations of
hardware(GPUs)/drivers/whatsoever might trigger a GartTlbWlkErr. If
the BIOS doesn't follow the BKDG recommendation to mask these errors,
the system will hang/reboot. Thus I think having this quirk in .32 and
.38 (at least) is useful.


Andreas

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-15 13:11                             ` Joerg Roedel
  2011-04-15 13:16                               ` Ingo Molnar
  2011-04-15 14:04                               ` Andreas Herrmann
@ 2011-04-15 14:16                               ` Alexandre Demers
  2011-04-15 14:27                                   ` Joerg Roedel
  2011-04-16  0:03                               ` [tip:x86/urgent] x86, amd: Disable GartTlbWlkErr when BIOS forgets it tip-bot for Joerg Roedel
  3 siblings, 1 reply; 108+ messages in thread
From: Alexandre Demers @ 2011-04-15 14:16 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Linus Torvalds, H. Peter Anvin, Yinghai Lu, Ingo Molnar,
	Alex Deucher, Linux Kernel Mailing List, dri-devel,
	Thomas Gleixner, Tejun Heo

On 11-04-15 09:11 AM, Joerg Roedel wrote:
> On Wed, Apr 13, 2011 at 07:33:40PM -0700, Linus Torvalds wrote:
>>  we definitely want to also understand the reason for things not
>> working, even if we do revert..
> Okay, here it is.
>
> After experimenting with different configurations for the north-bridge
> it turned out that a GART related MCE fires at the time the machine
> reboots. BIOSes configure the machine to sync-flood in that case which
> causes a reboot.
>
> After decoding the MCE it turned out to be a GART TBL Wlk Error. Such
> errors can happen if devices (speculatively) access GART ranges mapped
> invalid. The AMD BKDG for Fam10h CPUs recommends disabling these errors
> entirely. But unfortunately some BIOSes (including the one on my laptop)
> forget to do this.
>
> Below is a patch which disables these errors if the BIOS didn't do it.
> It fixes the problem on my side.
>
> Alexandre, can you try this patch on your machine too, please?
>
> Regards,
>
> 	Joerg
>
> From aaacff8db50b6ed4345e337ecbe53e505699c7e5 Mon Sep 17 00:00:00 2001
> From: Joerg Roedel <joerg.roedel@amd.com>
> Date: Fri, 15 Apr 2011 14:47:40 +0200
> Subject: [PATCH] x86/amd: Disable GartTlbWlkErr when BIOS forgets it
>
> This patch disables GartTlbWlk errors on AMD Fam10h CPUs if
> the BIOS forgets to do it (or is just too old). Leaving
> these errors enabled can cause a sync-flood on the CPU
> causing a reboot.
>
> This patch is the fix for
>
> 	https://bugzilla.kernel.org/show_bug.cgi?id=33012
>
> on my machine.
>
> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
> ---
>  arch/x86/include/asm/msr-index.h |    4 ++++
>  arch/x86/kernel/cpu/amd.c        |   19 +++++++++++++++++++
>  2 files changed, 23 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index fd5a1f3..3cce714 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -96,11 +96,15 @@
>  #define MSR_IA32_MC0_ADDR		0x00000402
>  #define MSR_IA32_MC0_MISC		0x00000403
>  
> +#define MSR_AMD64_MC0_MASK		0xc0010044
> +
>  #define MSR_IA32_MCx_CTL(x)		(MSR_IA32_MC0_CTL + 4*(x))
>  #define MSR_IA32_MCx_STATUS(x)		(MSR_IA32_MC0_STATUS + 4*(x))
>  #define MSR_IA32_MCx_ADDR(x)		(MSR_IA32_MC0_ADDR + 4*(x))
>  #define MSR_IA32_MCx_MISC(x)		(MSR_IA32_MC0_MISC + 4*(x))
>  
> +#define MSR_AMD64_MCx_MASK(x)		(MSR_AMD64_MC0_MASK + (x))
> +
>  /* These are consecutive and not in the normal 4er MCE bank block */
>  #define MSR_IA32_MC0_CTL2		0x00000280
>  #define MSR_IA32_MCx_CTL2(x)		(MSR_IA32_MC0_CTL2 + (x))
> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
> index 3ecece0..3532d3b 100644
> --- a/arch/x86/kernel/cpu/amd.c
> +++ b/arch/x86/kernel/cpu/amd.c
> @@ -615,6 +615,25 @@ static void __cpuinit init_amd(struct cpuinfo_x86 *c)
>  	/* As a rule processors have APIC timer running in deep C states */
>  	if (c->x86 >= 0xf && !cpu_has_amd_erratum(amd_erratum_400))
>  		set_cpu_cap(c, X86_FEATURE_ARAT);
> +
> +	/*
> +	 * Disable GART TLB Walk Errors on Fam10h. We do this here
> +	 * because this is always needed when GART is enabled, even in a
> +	 * kernel which has no MCE support built in.
> +	 */
> +	if (c->x86 == 0x10) {
> +		/*
> +		 * BIOS should disable GartTlbWlk Errors itself. If it
> +		 * doesn't, do it here as suggested by the BKDG.
> +		 *
> +		 * Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=33012
> +		 */
> +		u64 mask;
> +
> +		rdmsrl(MSR_AMD64_MCx_MASK(4), mask);
> +		mask |= (1 << 10);
> +		wrmsrl(MSR_AMD64_MCx_MASK(4), mask);
> +	}
>  }
>  
>  #ifdef CONFIG_X86_32

Ok, I'll test it today. Should I apply it on a clean rc3 without any of
the other patches?

BTW, may I suggest adding the info under bug 33012 in kernel bugzilla?
This could be useful in the future.

I'll keep you up to date.

-- 
Alexandre Demers


^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-15 14:16                               ` Alexandre Demers
@ 2011-04-15 14:27                                   ` Joerg Roedel
  0 siblings, 0 replies; 108+ messages in thread
From: Joerg Roedel @ 2011-04-15 14:27 UTC (permalink / raw)
  To: Alexandre Demers
  Cc: Linus Torvalds, H. Peter Anvin, Yinghai Lu, Ingo Molnar,
	Alex Deucher, Linux Kernel Mailing List, dri-devel,
	Thomas Gleixner, Tejun Heo

On Fri, Apr 15, 2011 at 10:16:59AM -0400, Alexandre Demers wrote:
> Ok, I'll test it today. Should I apply it on a clean rc3 without any of
> the other patches?

Yes, apply it just on -rc3 without any other patch.

> 
> BTW, may I suggest adding the info under bug 33012 in kernel bugzilla?
> This could be useful in the future.

Cool, thanks


	Joerg

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-15 14:04                               ` Andreas Herrmann
@ 2011-04-15 14:28                                 ` Joerg Roedel
  0 siblings, 0 replies; 108+ messages in thread
From: Joerg Roedel @ 2011-04-15 14:28 UTC (permalink / raw)
  To: Andreas Herrmann
  Cc: Linus Torvalds, H. Peter Anvin, Yinghai Lu, Ingo Molnar,
	Alex Deucher, Linux Kernel Mailing List, dri-devel,
	Thomas Gleixner, Tejun Heo, alexandre.f.demers

On Fri, Apr 15, 2011 at 04:04:45PM +0200, Andreas Herrmann wrote:
> What about tagging this patch for stable/longterm releases?
> 
> Potentially there are other cases where certain combinations of
> hardware(GPUs)/drivers/whatsoever might trigger a GartTlbWlkErr. If
> the BIOS doesn't follow the BKDG recommendation to mask these errors,
> the system will hang/reboot. Thus I think having this quirk in .32 and
> .38 (at least) is useful.

Right, that's certainly a good idea. The problem is not specific to GPUs;
any other device can trigger this too.

	Joerg


^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-15 13:16                               ` Ingo Molnar
@ 2011-04-15 14:33                                 ` Joerg Roedel
  2011-04-15 16:11                                   ` Alex Deucher
  2011-04-15 15:46                                 ` Joerg Roedel
  1 sibling, 1 reply; 108+ messages in thread
From: Joerg Roedel @ 2011-04-15 14:33 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, H. Peter Anvin, Yinghai Lu, Alex Deucher,
	Linux Kernel Mailing List, dri-devel, Thomas Gleixner, Tejun Heo,
	alexandre.f.demers

On Fri, Apr 15, 2011 at 03:16:50PM +0200, Ingo Molnar wrote:
> Ok, but how did the allocation changes start triggering this error in 
> v2.6.39-rc1? There must still be some layout specific thing here, right?
> Do we understand the details of that as well?

No, I must admit that I lack enough knowledge about the GPU hardware to
make a guess how this translation request happened. All I can tell is
the address that was reported in the MCE: it is 0xa0001000 (== the second
page of the GART aperture).
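
For the record, the "second page" arithmetic can be sketched like this.
The aperture base 0xa0000000 and the faulting address are the ones from
this thread; 4 KiB pages and the aperture size used in the usage note
below are assumptions, and the function name is invented.

```c
#include <stdint.h>

#define PAGE_SHIFT	12	/* assumes 4 KiB pages */

/*
 * Return the zero-based page index of addr inside the aperture
 * [base, base + size), or -1 if addr lies outside of it.
 */
static int64_t aperture_page_index(uint64_t base, uint64_t size,
				   uint64_t addr)
{
	if (addr < base || addr - base >= size)
		return -1;

	return (int64_t)((addr - base) >> PAGE_SHIFT);
}
```

With base 0xa0000000 and an assumed 64 MiB aperture, the address
0xa0001000 gives index 1, i.e. the second page.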

Maybe Alex can help here. Alex, may it be possible that the GPU
generates DMA requests in the GTT area before the GTT is activated (or
the activation is completed)? Or can you imagine any other reason?

	Joerg


^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-14 21:34                     ` Alex Deucher
  2011-04-15  6:50                       ` Joerg Roedel
@ 2011-04-15 14:49                       ` Andreas Herrmann
  1 sibling, 0 replies; 108+ messages in thread
From: Andreas Herrmann @ 2011-04-15 14:49 UTC (permalink / raw)
  To: Alex Deucher
  Cc: Joerg Roedel, H. Peter Anvin, Yinghai Lu, Ingo Molnar,
	Linus Torvalds, Linux Kernel Mailing List, dri-devel,
	Thomas Gleixner, Tejun Heo

On Thu, Apr 14, 2011 at 05:34:46PM -0400, Alex Deucher wrote:
> On Thu, Apr 14, 2011 at 5:09 PM, Joerg Roedel <joro@8bytes.org> wrote:
> > On Thu, Apr 14, 2011 at 10:28:43AM -0400, Alex Deucher wrote:
> >> On Thu, Apr 14, 2011 at 4:56 AM, Joerg Roedel <joro@8bytes.org> wrote:
> >> > And this makes a difference, with this change on-top of -rc3 the box boots
> >> > fine. So there seems to be some dependency between the GART base and the GTT
> >> > base even when they are in different address spaces.
> >> >
> >> > Alex, can you comment on this?
> >>
> >> As Dave said, they are completely different address spaces.  You
> >> could put the GPU aperture at 0 if you wanted (in fact we do on some
> >> chips).  Perhaps there's some strange interaction with the nb gart
> >> since the nb gart on that chipset was designed to be used for graphics
> >> and the rs780/880 can be configured to use an agp aperture.
> >> Unfortunately, I'm not that familiar with the nb gart.
> >
> > Actually, the nb gart is part of the cpu. It is part of the cpu north
> > bridge and can translate io and cpu accesses. In fact, it is a remapper
> > of physical memory addresses.
> 
> I know what it's for.  The IGP graphics chip is also part of the
> north bridge, but it may not be related at all.
> 
> >
> > The problem seems to be related to specific gpu chips. On another
> > notebook with an hd3000 card gtt and the nb gart aperture are both on
> > 0xa0000000 too but the box works fine. I haven't tested with an hd5000
> > yet. The failing notebook has an hd4200 mobility.
> 
> What exact model is the hd3000?   Is it IGP GPU or a discrete GPU?  If
> it's an IGP, it's identical to the hd4200 programming-wise.

BTW, first of all the other notebook had a different CPU (it's family
0fh and Joerg's is family 10h). So different CPUs different GARTs
different issues ;-)

(Furthermore for CPU family 0fh reporting of GartTblWalk errors is
already switched off in arch/x86/kernel/cpu/mcheck/mce.c.)
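
In case the family numbers are unclear: as far as I know both are derived
from CPUID leaf 1, where the extended family field is added to the base
family once the base family is 0xf. A minimal sketch, with a made-up
function name and taking the raw EAX value as input instead of executing
CPUID:

```c
#include <stdint.h>

/*
 * Compute the displayed CPU family from CPUID leaf 1 EAX.
 * Bits 11:8 hold the base family, bits 27:20 the extended family,
 * which only counts when the base family is 0xf.
 */
static unsigned int x86_display_family(uint32_t cpuid_eax)
{
	unsigned int base = (cpuid_eax >> 8) & 0xf;
	unsigned int ext = (cpuid_eax >> 20) & 0xff;

	return base == 0xf ? base + ext : base;
}
```

So an EAX of 0x00000f00 gives family 0fh and 0x00100f00 gives family 10h,
which is the value the "c->x86 == 0x10" check in Joerg's patch compares
against.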


Andreas

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-15 13:16                               ` Ingo Molnar
  2011-04-15 14:33                                 ` Joerg Roedel
@ 2011-04-15 15:46                                 ` Joerg Roedel
  2011-04-15 16:11                                   ` Jerome Glisse
  1 sibling, 1 reply; 108+ messages in thread
From: Joerg Roedel @ 2011-04-15 15:46 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, H. Peter Anvin, Yinghai Lu, Alex Deucher,
	Linux Kernel Mailing List, dri-devel, Thomas Gleixner, Tejun Heo,
	alexandre.f.demers

On Fri, Apr 15, 2011 at 03:16:50PM +0200, Ingo Molnar wrote:
> Ok, but how did the allocation changes start triggering this error in 
> v2.6.39-rc1? There must still be some layout specific thing here, right?
> Do we understand the details of that as well?

Well, thinking again about this, the GPU likely generated this DMA
request before too (which has an address in the range configured for the
GTT on the card), but nobody noticed because they just hit main memory.
And with the allocation changes in 39-rc1 the GART aperture started to
be on the same address as the GTT (in their respective address spaces)
so that the DMA request hit the GART. This caused the MCE and the
sync-flood.
The open question is why the GPU generates a DMA request with an address
that is configured as the GTT base (+1 page) on the card.
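
To spell the theory out, here is a toy model in plain C (nothing from the
kernel). Only 0xa0000000 and 0xa0001000 come from this thread; the
aperture size and the "old" aperture base in the usage note are invented
for illustration, as are all the names.

```c
#include <stdint.h>

enum dma_target {
	DMA_HITS_MEMORY,	/* request passes through untranslated */
	DMA_HITS_GART,		/* request is caught by the GART aperture */
};

struct aperture {
	uint64_t base;
	uint64_t size;
};

/* Decide where an untranslated DMA request from the device ends up. */
static enum dma_target classify_dma(const struct aperture *gart,
				    uint64_t bus_addr)
{
	if (bus_addr >= gart->base && bus_addr - gart->base < gart->size)
		return DMA_HITS_GART;

	return DMA_HITS_MEMORY;
}
```

With an aperture somewhere else (say at a made-up base of 0x20000000) a
request for 0xa0001000 just goes to memory and nobody notices. With the
aperture at 0xa0000000 the same request lands in the GART, and if that
range is mapped invalid we get the table walk error.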

	Joerg


^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-15 14:33                                 ` Joerg Roedel
@ 2011-04-15 16:11                                   ` Alex Deucher
  0 siblings, 0 replies; 108+ messages in thread
From: Alex Deucher @ 2011-04-15 16:11 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Ingo Molnar, Linus Torvalds, H. Peter Anvin, Yinghai Lu,
	Linux Kernel Mailing List, dri-devel, Thomas Gleixner, Tejun Heo,
	alexandre.f.demers

On Fri, Apr 15, 2011 at 10:33 AM, Joerg Roedel <joro@8bytes.org> wrote:
> On Fri, Apr 15, 2011 at 03:16:50PM +0200, Ingo Molnar wrote:
>> Ok, but how did the allocation changes start triggering this error in
>> v2.6.39-rc1? There must still be some layout specific thing here, right?
>> Do we understand the details of that as well?
>
> No, I must admit that I lack enough knowledge about the GPU hardware to
> make a guess how this translation request happened. All I can tell is
> the address that was reported in the MCE: it is 0xa0001000 (== the second
> page of the GART aperture).
>
> Maybe Alex can help here. Alex, may it be possible that the GPU
> generates DMA requests in the GTT area before the GTT is activated (or
> the activation is completed)? Or can you imagine any other reason?

It shouldn't.  The driver binds a dummy page to all entries in the
table at init time and whenever the actual pages are unbound.
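
Roughly, the scheme looks like the toy model below. This is not the
radeon code; the structure, the names and the table size are made up to
show the idea that no entry is ever left pointing nowhere.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_GTT_ENTRIES	8	/* tiny table, for illustration only */

struct toy_gtt {
	uint64_t dummy_page;			/* shared fallback page */
	uint64_t entry[TOY_GTT_ENTRIES];	/* one page address per slot */
};

/* At init time every entry points at the dummy page. */
static void toy_gtt_init(struct toy_gtt *g, uint64_t dummy_page)
{
	size_t i;

	g->dummy_page = dummy_page;
	for (i = 0; i < TOY_GTT_ENTRIES; i++)
		g->entry[i] = dummy_page;
}

/* Bind a real page to a slot; returns 0 on success, -1 on a bad slot. */
static int toy_gtt_bind(struct toy_gtt *g, size_t slot, uint64_t page)
{
	if (slot >= TOY_GTT_ENTRIES)
		return -1;
	g->entry[slot] = page;
	return 0;
}

/* Unbinding puts the dummy page back instead of clearing the entry. */
static int toy_gtt_unbind(struct toy_gtt *g, size_t slot)
{
	if (slot >= TOY_GTT_ENTRIES)
		return -1;
	g->entry[slot] = g->dummy_page;
	return 0;
}
```

So a stray access through the GPU's own table should always resolve to
some valid page.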

Alex

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-15 15:46                                 ` Joerg Roedel
@ 2011-04-15 16:11                                   ` Jerome Glisse
  2011-04-16 16:35                                       ` Joerg Roedel
  0 siblings, 1 reply; 108+ messages in thread
From: Jerome Glisse @ 2011-04-15 16:11 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Ingo Molnar, Linus Torvalds, Linux Kernel Mailing List,
	dri-devel, Tejun Heo, H. Peter Anvin, Thomas Gleixner,
	Yinghai Lu, alexandre.f.demers

On Fri, Apr 15, 2011 at 11:46 AM, Joerg Roedel <joro@8bytes.org> wrote:
> On Fri, Apr 15, 2011 at 03:16:50PM +0200, Ingo Molnar wrote:
>> Ok, but how did the allocation changes start triggering this error in
>> v2.6.39-rc1? There must still be some layout specific thing here, right?
>> Do we understand the details of that as well?
>
> Well, thinking again about this, the GPU likely generated this DMA
> request before too (which has an address in the range configured for the
> GTT on the card), but nobody noticed because they just hit main memory.
> And with the allocation changes in 39-rc1 the GART aperture started to
> be on the same address as the GTT (in their respective address spaces)
> so that the DMA request hit the GART. This caused the MCE and the
> sync-flood.
> The open question is why the GPU generates a DMA request with an address
> that is configured as the GTT base (+1 page) on the card.
>
>        Joerg
>

Do you also get the write if you load radeon with radeon.no_wb=1?
I think the wb page is at this address, or maybe the cp, as wb likely
takes only one page.

Cheers,
Jerome

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-15 14:27                                   ` Joerg Roedel
  (?)
@ 2011-04-15 18:59                                   ` Alexandre Demers
  2011-04-15 19:06                                     ` Ingo Molnar
  -1 siblings, 1 reply; 108+ messages in thread
From: Alexandre Demers @ 2011-04-15 18:59 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Linus Torvalds, H. Peter Anvin, Yinghai Lu, Ingo Molnar,
	Alex Deucher, Linux Kernel Mailing List, dri-devel,
	Thomas Gleixner, Tejun Heo

On 11-04-15 10:27 AM, Joerg Roedel wrote:
> On Fri, Apr 15, 2011 at 10:16:59AM -0400, Alexandre Demers wrote:
>> Ok, I'll test it today. Should I apply it on a clean rc3 without any of
>> the other patches?
> Yes, apply it just on -rc3 without any other patch.
>
>> BTW, may I suggest adding the info under bug 33012 in kernel bugzilla?
>> This could be useful in the future.
> Cool, thanks
>
>
> 	Joerg

The patch was applied and tested. It looks fine, I'm able to boot
without problem.

-- 
Alexandre Demers


^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-15 18:59                                   ` Alexandre Demers
@ 2011-04-15 19:06                                     ` Ingo Molnar
  2011-04-15 19:18                                       ` Yinghai Lu
  2011-04-16 12:00                                       ` Joerg Roedel
  0 siblings, 2 replies; 108+ messages in thread
From: Ingo Molnar @ 2011-04-15 19:06 UTC (permalink / raw)
  To: Alexandre Demers
  Cc: Joerg Roedel, Linus Torvalds, H. Peter Anvin, Yinghai Lu,
	Alex Deucher, Linux Kernel Mailing List, dri-devel,
	Thomas Gleixner, Tejun Heo


* Alexandre Demers <alexandre.f.demers@gmail.com> wrote:

> On 11-04-15 10:27 AM, Joerg Roedel wrote:
> > On Fri, Apr 15, 2011 at 10:16:59AM -0400, Alexandre Demers wrote:
> >> Ok, I'll test it today. Should I apply it on a clean rc3 without any of
> >> the other patches?
> > Yes, apply it just on -rc3 without any other patch.
> >
> >> BTW, may I suggest adding the info under bug 33012 in kernel bugzilla?
> >> This could be useful in the future.
> > Cool, thanks
> >
> >
> > 	Joerg
> The patch was applied and tested. It looks fine, I'm able to boot
> without problem.

Joerg, mind submitting it with a changelog that includes everything we learned 
about this bug and all the Tested-by's in place?

Is anyone of the opinion that we should try to revert the allocation 
order/alignment changes in addition to this fix?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-15 19:06                                     ` Ingo Molnar
@ 2011-04-15 19:18                                       ` Yinghai Lu
  2011-04-15 20:22                                         ` H. Peter Anvin
  2011-04-16 12:01                                           ` Joerg Roedel
  2011-04-16 12:00                                       ` Joerg Roedel
  1 sibling, 2 replies; 108+ messages in thread
From: Yinghai Lu @ 2011-04-15 19:18 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Alexandre Demers, Joerg Roedel, Linus Torvalds, H. Peter Anvin,
	Alex Deucher, Linux Kernel Mailing List, dri-devel,
	Thomas Gleixner, Tejun Heo

On 04/15/2011 12:06 PM, Ingo Molnar wrote:

> 
> Joerg, mind submitting it with a changelog that includes everything we learned 
> about this bug and all the Tested-by's in place?
> 
> Is anyone of the opinion that we should try to revert the allocation 
> order/alignment changes in addition to this fix?

We should figure out what is written to 0xa0001000 (main memory) by the GPU before the internal GART is set up.

Joerg,
can you insert some dump code in the drm/radeon code to find out which function causes the problem?

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-15 19:18                                       ` Yinghai Lu
@ 2011-04-15 20:22                                         ` H. Peter Anvin
  2011-04-16 12:01                                           ` Joerg Roedel
  1 sibling, 0 replies; 108+ messages in thread
From: H. Peter Anvin @ 2011-04-15 20:22 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Ingo Molnar, Alexandre Demers, Joerg Roedel, Linus Torvalds,
	Alex Deucher, Linux Kernel Mailing List, dri-devel,
	Thomas Gleixner, Tejun Heo

On 04/15/2011 12:18 PM, Yinghai Lu wrote:
> On 04/15/2011 12:06 PM, Ingo Molnar wrote:
> 
>>
>> Joerg, mind submitting it with a changelog that includes everything we learned 
>> about this bug and all the Tested-by's in place?
>>
>> Is anyone of the opinion that we should try to revert the allocation 
>> order/alignment changes in addition to this fix?
> 
> We should figure out what is written to 0xa0001000 (main memory) by the GPU before the internal GART is set up.
> 
> Joerg,
> can you insert some dump code in the drm/radeon code to find out which function causes the problem?
> 

Yes, I would like to make sure we don't just paper over a real bug
(again).  I think we still should take Joerg's patch since it seems to
be the right thing to do anyway, but I do want to make sure we don't
have a memory-overwrite bug in the kernel that we're papering over.

	-hpa

^ permalink raw reply	[flat|nested] 108+ messages in thread

* [tip:x86/urgent] x86, amd: Disable GartTlbWlkErr when BIOS forgets it
  2011-04-15 13:11                             ` Joerg Roedel
                                                 ` (2 preceding siblings ...)
  2011-04-15 14:16                               ` Alexandre Demers
@ 2011-04-16  0:03                               ` tip-bot for Joerg Roedel
  3 siblings, 0 replies; 108+ messages in thread
From: tip-bot for Joerg Roedel @ 2011-04-16  0:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, joerg.roedel, stable,
	alexandre.f.demers, tglx, hpa

Commit-ID:  5bbc097d890409d8eff4e3f1d26f11a9d6b7c07e
Gitweb:     http://git.kernel.org/tip/5bbc097d890409d8eff4e3f1d26f11a9d6b7c07e
Author:     Joerg Roedel <joerg.roedel@amd.com>
AuthorDate: Fri, 15 Apr 2011 14:47:40 +0200
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Fri, 15 Apr 2011 16:03:16 -0700

x86, amd: Disable GartTlbWlkErr when BIOS forgets it

This patch disables GartTlbWlk errors on AMD Fam10h CPUs if
the BIOS forgets to do it (or is just too old). Leaving
these errors enabled can cause a sync-flood on the CPU
causing a reboot.

The AMD BKDG recommends disabling GART TLB Wlk Error completely.

This patch is the fix for

	https://bugzilla.kernel.org/show_bug.cgi?id=33012

on my machine.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Link: http://lkml.kernel.org/r/20110415131152.GJ18463@8bytes.org
Tested-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Cc: <stable@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/include/asm/msr-index.h |    4 ++++
 arch/x86/kernel/cpu/amd.c        |   19 +++++++++++++++++++
 2 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index fd5a1f3..3cce714 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -96,11 +96,15 @@
 #define MSR_IA32_MC0_ADDR		0x00000402
 #define MSR_IA32_MC0_MISC		0x00000403
 
+#define MSR_AMD64_MC0_MASK		0xc0010044
+
 #define MSR_IA32_MCx_CTL(x)		(MSR_IA32_MC0_CTL + 4*(x))
 #define MSR_IA32_MCx_STATUS(x)		(MSR_IA32_MC0_STATUS + 4*(x))
 #define MSR_IA32_MCx_ADDR(x)		(MSR_IA32_MC0_ADDR + 4*(x))
 #define MSR_IA32_MCx_MISC(x)		(MSR_IA32_MC0_MISC + 4*(x))
 
+#define MSR_AMD64_MCx_MASK(x)		(MSR_AMD64_MC0_MASK + (x))
+
 /* These are consecutive and not in the normal 4er MCE bank block */
 #define MSR_IA32_MC0_CTL2		0x00000280
 #define MSR_IA32_MCx_CTL2(x)		(MSR_IA32_MC0_CTL2 + (x))
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 3ecece0..3532d3b 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -615,6 +615,25 @@ static void __cpuinit init_amd(struct cpuinfo_x86 *c)
 	/* As a rule processors have APIC timer running in deep C states */
 	if (c->x86 >= 0xf && !cpu_has_amd_erratum(amd_erratum_400))
 		set_cpu_cap(c, X86_FEATURE_ARAT);
+
+	/*
+	 * Disable GART TLB Walk Errors on Fam10h. We do this here
+	 * because this is always needed when GART is enabled, even in a
+	 * kernel which has no MCE support built in.
+	 */
+	if (c->x86 == 0x10) {
+		/*
+		 * BIOS should disable GartTlbWlk Errors itself. If it
+		 * doesn't, do it here as suggested by the BKDG.
+		 *
+		 * Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=33012
+		 */
+		u64 mask;
+
+		rdmsrl(MSR_AMD64_MCx_MASK(4), mask);
+		mask |= (1 << 10);
+		wrmsrl(MSR_AMD64_MCx_MASK(4), mask);
+	}
 }
 
 #ifdef CONFIG_X86_32

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-15 19:06                                     ` Ingo Molnar
  2011-04-15 19:18                                       ` Yinghai Lu
@ 2011-04-16 12:00                                       ` Joerg Roedel
  2011-04-16 12:21                                           ` Ingo Molnar
  1 sibling, 1 reply; 108+ messages in thread
From: Joerg Roedel @ 2011-04-16 12:00 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Alexandre Demers, Linus Torvalds, H. Peter Anvin, Yinghai Lu,
	Alex Deucher, Linux Kernel Mailing List, dri-devel,
	Thomas Gleixner, Tejun Heo

On Fri, Apr 15, 2011 at 09:06:41PM +0200, Ingo Molnar wrote:
> 
> * Alexandre Demers <alexandre.f.demers@gmail.com> wrote:
> 
> > On 11-04-15 10:27 AM, Joerg Roedel wrote:
> > > On Fri, Apr 15, 2011 at 10:16:59AM -0400, Alexandre Demers wrote:
> > >> Ok, I'll test it today. Should I apply it on a clean rc3 without any of
> > >> the other patches?
> > > Yes, apply it just on -rc3 without any other patch.
> > >
> > >> BTW, may I suggest adding the info under bug 33012 in kernel bugzilla?
> > >> This could be useful in the future.
> > > Cool, thanks
> > >
> > >
> > > 	Joerg
> > The patch was applied and tested. It looks fine, I'm able to boot
> > without problem.
> 
> Joerg, mind submitting it with a changelog that includes everything we learned 
> about this bug and all the Tested-by's in place?

Looks like I am too late, it is already applied. But the changelog
contains a link to the korg-bugzilla which has all information too. So
the information is not lost.

	Joerg


^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-15 19:18                                       ` Yinghai Lu
@ 2011-04-16 12:01                                           ` Joerg Roedel
  2011-04-16 12:01                                           ` Joerg Roedel
  1 sibling, 0 replies; 108+ messages in thread
From: Joerg Roedel @ 2011-04-16 12:01 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Ingo Molnar, Alexandre Demers, Linus Torvalds, H. Peter Anvin,
	Alex Deucher, Linux Kernel Mailing List, dri-devel,
	Thomas Gleixner, Tejun Heo

On Fri, Apr 15, 2011 at 12:18:02PM -0700, Yinghai Lu wrote:
> On 04/15/2011 12:06 PM, Ingo Molnar wrote:
> 
> > 
> > Joerg, mind submitting it with a changelog that includes everything we learned 
> > about this bug and all the Tested-by's in place?
> > 
> > Is anyone of the opinion that we should try to revert the allocation 
> > order/alignment changes in addition to this fix?
> 
> We should figure out what is written to 0xa0001000 (main memory) by the GPU
> before the internal GART is set up.
> 
> Joerg,
> can you insert some dump code in the drm/radeon code to find out which
> function causes the problem?

I am not a GPU expert, but I will see what I can find out.

	Joerg


^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-16 12:00                                       ` Joerg Roedel
@ 2011-04-16 12:21                                           ` Ingo Molnar
  0 siblings, 0 replies; 108+ messages in thread
From: Ingo Molnar @ 2011-04-16 12:21 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Alexandre Demers, Linus Torvalds, H. Peter Anvin, Yinghai Lu,
	Alex Deucher, Linux Kernel Mailing List, dri-devel,
	Thomas Gleixner, Tejun Heo


* Joerg Roedel <joro@8bytes.org> wrote:

> On Fri, Apr 15, 2011 at 09:06:41PM +0200, Ingo Molnar wrote:
> > 
> > * Alexandre Demers <alexandre.f.demers@gmail.com> wrote:
> > 
> > > On 11-04-15 10:27 AM, Joerg Roedel wrote:
> > > > On Fri, Apr 15, 2011 at 10:16:59AM -0400, Alexandre Demers wrote:
> > > >> Ok, I'll test it today. Should I apply it on a clean rc3 without any of
> > > >> the other patches?
> > > > Yes, apply it just on -rc3 without any other patch.
> > > >
> > > >> BTW, may I suggest adding the info under bug 33012 in kernel bugzilla?
> > > >> This could be useful in the future.
> > > > Cool, thanks
> > > >
> > > >
> > > > 	Joerg
> > > The patch was applied and tested. It looks fine, I'm able to boot
> > > without problem.
> > 
> > Joerg, mind submitting it with a changelog that includes everything we learned 
> > about this bug and all the Tested-by's in place?
> 
> Looks like I am too late, it is already applied. But the changelog
> contains a link to the korg-bugzilla which has all information too. So
> the information is not lost.

Yeah. In this case getting the fix into -rc4 in a timely manner looked more 
important than waiting for an updated changelog :-)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-15 16:11                                   ` Jerome Glisse
@ 2011-04-16 16:35                                       ` Joerg Roedel
  0 siblings, 0 replies; 108+ messages in thread
From: Joerg Roedel @ 2011-04-16 16:35 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Ingo Molnar, Linus Torvalds, Linux Kernel Mailing List,
	dri-devel, Tejun Heo, H. Peter Anvin, Thomas Gleixner,
	Yinghai Lu, alexandre.f.demers

On Fri, Apr 15, 2011 at 12:11:28PM -0400, Jerome Glisse wrote:
> Do you also get the write if you load radeon with radeon.no_wb=1?
> I think at this address it's the wb page, or maybe the cp, as wb likely
> takes only one page.

radeon.no_wb=1 makes no difference. The box still reboots.

	Joerg


^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-16 16:35                                       ` Joerg Roedel
@ 2011-04-16 18:54                                         ` Jerome Glisse
  -1 siblings, 0 replies; 108+ messages in thread
From: Jerome Glisse @ 2011-04-16 18:54 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Ingo Molnar, Linus Torvalds, Linux Kernel Mailing List,
	dri-devel, Tejun Heo, H. Peter Anvin, Thomas Gleixner,
	Yinghai Lu, alexandre.f.demers

On Sat, Apr 16, 2011 at 12:35 PM, Joerg Roedel <joro@8bytes.org> wrote:
> On Fri, Apr 15, 2011 at 12:11:28PM -0400, Jerome Glisse wrote:
>> Do you also get the write if you load radeon with radeon.no_wb=1?
>> I think at this address it's the wb page, or maybe the cp, as wb likely
>> takes only one page.
>
> radeon.no_wb=1 makes no difference. The box still reboots.
>
>        Joerg
>
>

If you want to go the printk way, you can add a printk before each test
(ring_test, ib_test) in r600.c; these 2 functions are the ones that might
trigger the first GPU gart activity.

Cheers,
Jerome

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-16 18:54                                         ` Jerome Glisse
  (?)
@ 2011-04-17 14:09                                         ` Joerg Roedel
  2011-04-18  1:12                                           ` Jerome Glisse
  2011-04-18 15:23                                             ` Alex Deucher
  -1 siblings, 2 replies; 108+ messages in thread
From: Joerg Roedel @ 2011-04-17 14:09 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Ingo Molnar, Linus Torvalds, Linux Kernel Mailing List,
	dri-devel, Tejun Heo, H. Peter Anvin, Thomas Gleixner,
	Yinghai Lu, alexandre.f.demers

On Sat, Apr 16, 2011 at 02:54:04PM -0400, Jerome Glisse wrote:

> If you want to go the printk way, you can add a printk before each test
> (ring_test, ib_test) in r600.c; these 2 functions are the ones that might
> trigger the first GPU gart activity.

Okay, I found the place in source that triggers this. It happens in the
function r600_ib_test. The interesting thing is that not the ib-command
itself is responsible but the fence that is emitted afterwards (proved
by removing the fence command, where the problem went away).
I don't know enough about the command semantics to make a guess what
goes wrong there. But maybe you GPU folks have an idea?

	Joerg


^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-17 14:09                                         ` Joerg Roedel
@ 2011-04-18  1:12                                           ` Jerome Glisse
  2011-04-18 15:23                                             ` Alex Deucher
  1 sibling, 0 replies; 108+ messages in thread
From: Jerome Glisse @ 2011-04-18  1:12 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Ingo Molnar, Linus Torvalds, Linux Kernel Mailing List,
	dri-devel, Tejun Heo, H. Peter Anvin, Thomas Gleixner,
	Yinghai Lu, alexandre.f.demers

On Sun, Apr 17, 2011 at 10:09 AM, Joerg Roedel <joro@8bytes.org> wrote:
> On Sat, Apr 16, 2011 at 02:54:04PM -0400, Jerome Glisse wrote:
>
>> If you want to go the printk way, you can add a printk before each test
>> (ring_test, ib_test) in r600.c; these 2 functions are the ones that might
>> trigger the first GPU gart activity.
>
> Okay, I found the place in source that triggers this. It happens in the
> function r600_ib_test. The interesting thing is that not the ib-command
> itself is responsible but the fence that is emitted afterwards (proved
> by removing the fence command, where the problem went away).
> I don't know enough about the command semantics to make a guess what
> goes wrong there. But maybe you GPU folks have an idea?
>
>        Joerg
>
>

I can't think of any theory. At that point the wb, irq ring, cp buffer
& ib pool are all allocated and pinned into gtt, so they all have a valid
entry backed by a real page. Maybe the GART flush & update is
seriously buggy, but I expect we would have been hurt sooner by such
things. Maybe there is a bug in the hw... wouldn't be surprised. Will
try to think of a crazy theory.

Cheers,
Jerome

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-17 14:09                                         ` Joerg Roedel
@ 2011-04-18 15:23                                             ` Alex Deucher
  2011-04-18 15:23                                             ` Alex Deucher
  1 sibling, 0 replies; 108+ messages in thread
From: Alex Deucher @ 2011-04-18 15:23 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Jerome Glisse, Yinghai Lu, Linux Kernel Mailing List, dri-devel,
	H. Peter Anvin, Tejun Heo, Linus Torvalds, Thomas Gleixner,
	alexandre.f.demers

On Sun, Apr 17, 2011 at 10:09 AM, Joerg Roedel <joro@8bytes.org> wrote:
> On Sat, Apr 16, 2011 at 02:54:04PM -0400, Jerome Glisse wrote:
>
>> If you want to go the printk way, you can add a printk before each test
>> (ring_test, ib_test) in r600.c; these 2 functions are the ones that might
>> trigger the first GPU gart activity.
>
> Okay, I found the place in source that triggers this. It happens in the
> function r600_ib_test. The interesting thing is that not the ib-command
> itself is responsible but the fence that is emitted afterwards (proved
> by removing the fence command, where the problem went away).
> I don't know enough about the command semantics to make a guess what
> goes wrong there. But maybe you GPU folks have an idea?
>

I can't think of anything off hand.  It might be worth disabling the
call to r600_ib_test() in r600_init() and then seeing if you get any
errors when the fences are used later on when X starts or just at that
point in the module load sequence.  What's odd is that when you tested
radeon.no_wb=1 you got the same behavior as that disables shadowing of
fence writes to gpu gart mem, so it wouldn't be writing to memory in
that case.

Alex

>        Joerg
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel
>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-18 15:23                                             ` Alex Deucher
  (?)
@ 2011-04-18 15:29                                             ` Jerome Glisse
  2011-04-18 15:33                                               ` Alex Deucher
  -1 siblings, 1 reply; 108+ messages in thread
From: Jerome Glisse @ 2011-04-18 15:29 UTC (permalink / raw)
  To: Alex Deucher
  Cc: Joerg Roedel, Yinghai Lu, Linux Kernel Mailing List, dri-devel,
	H. Peter Anvin, Tejun Heo, Linus Torvalds, Thomas Gleixner,
	alexandre.f.demers

On Mon, Apr 18, 2011 at 11:23 AM, Alex Deucher <alexdeucher@gmail.com> wrote:
> On Sun, Apr 17, 2011 at 10:09 AM, Joerg Roedel <joro@8bytes.org> wrote:
>> On Sat, Apr 16, 2011 at 02:54:04PM -0400, Jerome Glisse wrote:
>>
>>> If you want to go the printk way, you can add a printk before each test
>>> (ring_test, ib_test) in r600.c; these 2 functions are the ones that might
>>> trigger the first GPU gart activity.
>>
>> Okay, I found the place in source that triggers this. It happens in the
>> function r600_ib_test. The interesting thing is that not the ib-command
>> itself is responsible but the fence that is emitted afterwards (proved
>> by removing the fence command, where the problem went away).
>> I don't know enough about the command semantics to make a guess what
>> goes wrong there. But maybe you GPU folks have an idea?
>>
>
> I can't think of anything off hand.  It might be worth disabling the
> call to r600_ib_test() in r600_init() and then seeing if you get any
> errors when the fences are used later on when X starts or just at that
> point in the module load sequence.  What's odd is that when you tested
> radeon.no_wb=1 you got the same behavior as that disables shadowing of
> fence writes to gpu gart mem, so it wouldn't be writing to memory in
> that case.
>
> Alex
>

It might be the irq ring write that is faulty.

Cheers,
Jerome

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-18 15:29                                             ` Jerome Glisse
@ 2011-04-18 15:33                                               ` Alex Deucher
  2011-04-18 15:59                                                 ` Jerome Glisse
  0 siblings, 1 reply; 108+ messages in thread
From: Alex Deucher @ 2011-04-18 15:33 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Joerg Roedel, Yinghai Lu, Linux Kernel Mailing List, dri-devel,
	H. Peter Anvin, Tejun Heo, Linus Torvalds, Thomas Gleixner,
	alexandre.f.demers

On Mon, Apr 18, 2011 at 11:29 AM, Jerome Glisse <j.glisse@gmail.com> wrote:
> On Mon, Apr 18, 2011 at 11:23 AM, Alex Deucher <alexdeucher@gmail.com> wrote:
>> On Sun, Apr 17, 2011 at 10:09 AM, Joerg Roedel <joro@8bytes.org> wrote:
>>> On Sat, Apr 16, 2011 at 02:54:04PM -0400, Jerome Glisse wrote:
>>>
>>>> If you want to go the printk way, you can add a printk before each test
>>>> (ring_test, ib_test) in r600.c; these 2 functions are the ones that might
>>>> trigger the first GPU gart activity.
>>>
>>> Okay, I found the place in source that triggers this. It happens in the
>>> function r600_ib_test. The interesting thing is that not the ib-command
>>> itself is responsible but the fence that is emitted afterwards (proved
>>> by removing the fence command, where the problem went away).
>>> I don't know enough about the command semantics to make a guess what
>>> goes wrong there. But maybe you GPU folks have an idea?
>>>
>>
>> I can't think of anything off hand.  It might be worth disabling the
>> call to r600_ib_test() in r600_init() and then seeing if you get any
>> errors when the fences are used later on when X starts or just at that
>> point in the module load sequence.  What's odd is that when you tested
>> radeon.no_wb=1 you got the same behavior as that disables shadowing of
>> fence writes to gpu gart mem, so it wouldn't be writing to memory in
>> that case.
>>
>> Alex
>>
>
> It might be the irq ring write that is faulty.

That's disabled with no_wb=1 as well.

Alex

>
> Cheers,
> Jerome
>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-18 15:33                                               ` Alex Deucher
@ 2011-04-18 15:59                                                 ` Jerome Glisse
  2011-04-18 16:35                                                   ` Alex Deucher
  0 siblings, 1 reply; 108+ messages in thread
From: Jerome Glisse @ 2011-04-18 15:59 UTC (permalink / raw)
  To: Alex Deucher
  Cc: Joerg Roedel, Yinghai Lu, Linux Kernel Mailing List, dri-devel,
	H. Peter Anvin, Tejun Heo, Linus Torvalds, Thomas Gleixner,
	alexandre.f.demers

On Mon, Apr 18, 2011 at 11:33 AM, Alex Deucher <alexdeucher@gmail.com> wrote:
> On Mon, Apr 18, 2011 at 11:29 AM, Jerome Glisse <j.glisse@gmail.com> wrote:
>> On Mon, Apr 18, 2011 at 11:23 AM, Alex Deucher <alexdeucher@gmail.com> wrote:
>>> On Sun, Apr 17, 2011 at 10:09 AM, Joerg Roedel <joro@8bytes.org> wrote:
>>>> On Sat, Apr 16, 2011 at 02:54:04PM -0400, Jerome Glisse wrote:
>>>>
>>>>> If you want to go the printk way, you can add a printk before each test
>>>>> (ring_test, ib_test) in r600.c; these 2 functions are the ones that might
>>>>> trigger the first GPU gart activity.
>>>>
>>>> Okay, I found the place in source that triggers this. It happens in the
>>>> function r600_ib_test. The interesting thing is that not the ib-command
>>>> itself is responsible but the fence that is emitted afterwards (proved
>>>> by removing the fence command, where the problem went away).
>>>> I don't know enough about the command semantics to make a guess what
>>>> goes wrong there. But maybe you GPU folks have an idea?
>>>>
>>>
>>> I can't think of anything off hand.  It might be worth disabling the
>>> call to r600_ib_test() in r600_init() and then seeing if you get any
>>> errors when the fences are used later on when X starts or just at that
>>> point in the module load sequence.  What's odd is that when you tested
>>> radeon.no_wb=1 you got the same behavior as that disables shadowing of
>>> fence writes to gpu gart mem, so it wouldn't be writing to memory in
>>> that case.
>>>
>>> Alex
>>>
>>
>> It might be the irq ring write that is faulty.
>
> That's disabled with no_wb=1 as well.
>
> Alex
>

I mean the irq interrupt ring, i don't see this being disabled when no_wb=1

Cheers,
Jerome

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-18 15:59                                                 ` Jerome Glisse
@ 2011-04-18 16:35                                                   ` Alex Deucher
  0 siblings, 0 replies; 108+ messages in thread
From: Alex Deucher @ 2011-04-18 16:35 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Joerg Roedel, Yinghai Lu, Linux Kernel Mailing List, dri-devel,
	H. Peter Anvin, Tejun Heo, Linus Torvalds, Thomas Gleixner,
	alexandre.f.demers

On Mon, Apr 18, 2011 at 11:59 AM, Jerome Glisse <j.glisse@gmail.com> wrote:
> On Mon, Apr 18, 2011 at 11:33 AM, Alex Deucher <alexdeucher@gmail.com> wrote:
>> On Mon, Apr 18, 2011 at 11:29 AM, Jerome Glisse <j.glisse@gmail.com> wrote:
>>> On Mon, Apr 18, 2011 at 11:23 AM, Alex Deucher <alexdeucher@gmail.com> wrote:
>>>> On Sun, Apr 17, 2011 at 10:09 AM, Joerg Roedel <joro@8bytes.org> wrote:
>>>>> On Sat, Apr 16, 2011 at 02:54:04PM -0400, Jerome Glisse wrote:
>>>>>
>>>>>> If you want to go the printk way, you can add a printk before each test
>>>>>> (ring_test, ib_test) in r600.c; these 2 functions are the ones that might
>>>>>> trigger the first GPU gart activity.
>>>>>
>>>>> Okay, I found the place in source that triggers this. It happens in the
>>>>> function r600_ib_test. The interesting thing is that not the ib-command
>>>>> itself is responsible but the fence that is emitted afterwards (proved
>>>>> by removing the fence command, where the problem went away).
>>>>> I don't know enough about the command semantics to make a guess what
>>>>> goes wrong there. But maybe you GPU folks have an idea?
>>>>>
>>>>
>>>> I can't think of anything off hand.  It might be worth disabling the
>>>> call to r600_ib_test() in r600_init() and then seeing if you get any
>>>> errors when the fences are used later on when X starts or just at that
>>>> point in the module load sequence.  What's odd is that when you tested
>>>> radeon.no_wb=1 you got the same behavior as that disables shadowing of
>>>> fence writes to gpu gart mem, so it wouldn't be writing to memory in
>>>> that case.
>>>>
>>>> Alex
>>>>
>>>
>>> It might be the irq ring write that is faulty.
>>
>> That's disabled with no_wb=1 as well.
>>
>> Alex
>>
>
> I mean the irq interrupt ring, i don't see this being disabled when no_wb=1

I meant the IH ring pointer writeback.  The ih ring itself is still in
gart memory.

Alex

>
> Cheers,
> Jerome
>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-14  8:20     ` Aneesh Kumar K.V
@ 2011-04-18 22:57       ` Kay Sievers
  2011-04-18 23:02         ` Dave Jones
  2011-04-19  8:23         ` Aneesh Kumar K.V
  0 siblings, 2 replies; 108+ messages in thread
From: Kay Sievers @ 2011-04-18 22:57 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Dave Jones, Linus Torvalds, Linux Kernel Mailing List, Eric Sandeen

On Thu, Apr 14, 2011 at 10:20, Aneesh Kumar K.V
<aneesh.kumar@linux.vnet.ibm.com> wrote:
> On Tue, 12 Apr 2011 15:21:03 -0400, Dave Jones <davej@redhat.com> wrote:
>> On Tue, Apr 12, 2011 at 03:09:34PM -0400, Dave Jones wrote:
>>
>>  > however, the output of mount looks very confused..
>>  >
>>  > .38:
>>  > /dev/mapper/vg_adamo-lv_home on /home type ext4 (rw,relatime,seclabel,barrier=1,data=ordered)
>>  >
>>  > .39:
>>  > - on /home type 79a9-4526-888c-1f86d35a6704 (rw,relatime,ext4)
>>  >
>>  > It looks like /proc/self/mountinfo broke abi.
>>  >
>>  > .38:
>>  > 48 45 253:3 / /home rw,relatime - ext4 /dev/mapper/vg_adamo-lv_home rw,seclabel,barrier=1,data=ordered
>>  >
>>  > .39:
>>  > 46 22 253:3 / /home rw,relatime uuid:f3971858-79a9-4526-888c-1f86d35a6704 - ext4 /dev/mapper/vg_adamo-lv_home rw,seclabel,user_xattr,barrier=1,data=ordered
>>
>> looks like this was caused by 93f1c20bc8cdb757be50566eff88d65c3b26881f
>>
>> perhaps adding that string to the end of the line would preserve what mount expects ?
>
> uuid:<value> is an optional field as per
> Documentation/filesystems/proc.txt. There was an error in libmount
> parsing which got fixed upstream recently.

Just a simple question about this approach in general? A filesystem
UUID can be changed on disk at any time (tune2fs -U ...).

Your code looks like you copy the bytes to the in-kernel superblock
structure without noticing any later changes on disk? How is that
supposed to work?

Kay

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-18 22:57       ` Kay Sievers
@ 2011-04-18 23:02         ` Dave Jones
  2011-04-18 23:14           ` Kay Sievers
  2011-04-19 11:42           ` Ted Ts'o
  2011-04-19  8:23         ` Aneesh Kumar K.V
  1 sibling, 2 replies; 108+ messages in thread
From: Dave Jones @ 2011-04-18 23:02 UTC (permalink / raw)
  To: Kay Sievers
  Cc: Aneesh Kumar K.V, Linus Torvalds, Linux Kernel Mailing List,
	Eric Sandeen

On Tue, Apr 19, 2011 at 12:57:27AM +0200, Kay Sievers wrote:

 > > uuid:<value> is an optional field as per
 > > Documentation/filesystems/proc.txt. There was an error in libmount
 > > parsing which got fixed upstream recently.
 > 
 > Just a simple question about this approach in general? A filesystem
 > UUID can be changed on disk at any time (tune2fs -U ...).
 > 
 > Your code looks like you copy the bytes to the in-kernel superblock
 > structure without noticing any later changes on disk? How is that
 > supposed to work?

I thought tune2fs on a mounted filesystem was always a
"you get to keep both pieces if it breaks" situation.

	Dave


^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-18 23:02         ` Dave Jones
@ 2011-04-18 23:14           ` Kay Sievers
  2011-04-19 11:42           ` Ted Ts'o
  1 sibling, 0 replies; 108+ messages in thread
From: Kay Sievers @ 2011-04-18 23:14 UTC (permalink / raw)
  To: Dave Jones, Kay Sievers, Aneesh Kumar K.V, Linus Torvalds,
	Linux Kernel Mailing List, Eric Sandeen

On Tue, Apr 19, 2011 at 01:02, Dave Jones <davej@redhat.com> wrote:
> On Tue, Apr 19, 2011 at 12:57:27AM +0200, Kay Sievers wrote:
>
>  > > uuid:<value> is an optional field as per
>  > > Documentation/filesystems/proc.txt. There was an error in libmount
>  > > parsing which got fixed upstream recently.
>  >
>  > Just a simple question about this approach in general? A filesystem
>  > UUID can be changed on disk at any time (tune2fs -U ...).
>  >
>  > Your code looks like you copy the bytes to the in-kernel superblock
>  > structure without noticing any later changes on disk? How is that
>  > supposed to work?
>
> I thought tune2fs on a mounted filesystem was always a
> "you get to keep both pieces if it breaks" situation.

No idea, it works fine that way since forever. :)

$ cat /proc/self/mountinfo | grep sda1
21 1 8:1 / / rw,relatime - ext4 /dev/sda1 rw, ...

$ blkid /dev/sda1
/dev/sda1: LABEL="root" UUID="0e4974cc-6a11-11e0-8d7b-002186a23ce5" TYPE="ext4"

$ tune2fs -U time /dev/sda1
tune2fs 1.41.14 (22-Dec-2010)

$ blkid /dev/sda1
/dev/sda1: LABEL="root" UUID="26be6e7c-6a11-11e0-ad62-002186a23ce5" TYPE="ext4"

I don't think that approach makes any sense without doing a call into
the filesystem, and such calls have no place in mountinfo.

Kay

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: Linux 2.6.39-rc3
  2011-04-18 22:57       ` Kay Sievers
  2011-04-18 23:02         ` Dave Jones
@ 2011-04-19  8:23         ` Aneesh Kumar K.V
  2011-04-19  8:37           ` Steven Whitehouse
  2011-04-19  9:55           ` Kay Sievers
  1 sibling, 2 replies; 108+ messages in thread
From: Aneesh Kumar K.V @ 2011-04-19  8:23 UTC (permalink / raw)
  To: Kay Sievers
  Cc: Dave Jones, Linus Torvalds, Linux Kernel Mailing List, Eric Sandeen

On Tue, 19 Apr 2011 00:57:27 +0200, Kay Sievers <kay.sievers@vrfy.org> wrote:
> On Thu, Apr 14, 2011 at 10:20, Aneesh Kumar K.V
> <aneesh.kumar@linux.vnet.ibm.com> wrote:
> > On Tue, 12 Apr 2011 15:21:03 -0400, Dave Jones <davej@redhat.com> wrote:
> >> On Tue, Apr 12, 2011 at 03:09:34PM -0400, Dave Jones wrote:
> >>
> >>  > however, the output of mount looks very confused..
> >>  >
> >>  > .38:
> >>  > /dev/mapper/vg_adamo-lv_home on /home type ext4 (rw,relatime,seclabel,barrier=1,data=ordered)
> >>  >
> >>  > .39:
> >>  > - on /home type 79a9-4526-888c-1f86d35a6704 (rw,relatime,ext4)
> >>  >
> >>  > It looks like /proc/self/mountinfo broke abi.
> >>  >
> >>  > .38:
> >>  > 48 45 253:3 / /home rw,relatime - ext4 /dev/mapper/vg_adamo-lv_home rw,seclabel,barrier=1,data=ordered
> >>  >
> >>  > .39:
> >>  > 46 22 253:3 / /home rw,relatime uuid:f3971858-79a9-4526-888c-1f86d35a6704 - ext4 /dev/mapper/vg_adamo-lv_home rw,seclabel,user_xattr,barrier=1,data=ordered
> >>
> >> looks like this was caused by 93f1c20bc8cdb757be50566eff88d65c3b26881f
> >>
> >> perhaps adding that string to the end of the line would preserve what mount expects ?
> >
> > uuid:<value> is an optional field as per
> > Documentation/filesystems/proc.txt. There was an error in libmount
> > parsing which got fixed upstream recently.
> 
> Just a simple question about this approach in general? A filesystem
> UUID can be changed on disk at any time (tune2fs -U ...).
> 
> Your code looks like you copy the bytes to the in-kernel superblock
> structure without noticing any later changes on disk? How is that
> supposed to work?
> 

Isn't that true even for the fsid returned by statfs? IIUC tune2fs
won't even change ext4_super_block.s_uuid.

-aneesh

* Re: Linux 2.6.39-rc3
  2011-04-19  8:23         ` Aneesh Kumar K.V
@ 2011-04-19  8:37           ` Steven Whitehouse
  2011-04-19  9:55           ` Kay Sievers
  1 sibling, 0 replies; 108+ messages in thread
From: Steven Whitehouse @ 2011-04-19  8:37 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Kay Sievers, Dave Jones, Linus Torvalds,
	Linux Kernel Mailing List, Eric Sandeen

Hi,

On Tue, 2011-04-19 at 13:53 +0530, Aneesh Kumar K.V wrote:
> On Tue, 19 Apr 2011 00:57:27 +0200, Kay Sievers <kay.sievers@vrfy.org> wrote:
> > On Thu, Apr 14, 2011 at 10:20, Aneesh Kumar K.V
> > <aneesh.kumar@linux.vnet.ibm.com> wrote:
> > > On Tue, 12 Apr 2011 15:21:03 -0400, Dave Jones <davej@redhat.com> wrote:
> > >> On Tue, Apr 12, 2011 at 03:09:34PM -0400, Dave Jones wrote:
> > >>
> > >>  > however, the output of mount looks very confused..
> > >>  >
> > >>  > .38:
> > >>  > /dev/mapper/vg_adamo-lv_home on /home type ext4 (rw,relatime,seclabel,barrier=1,data=ordered)
> > >>  >
> > >>  > .39:
> > >>  > - on /home type 79a9-4526-888c-1f86d35a6704 (rw,relatime,ext4)
> > >>  >
> > >>  > It looks like /proc/self/mountinfo broke abi.
> > >>  >
> > >>  > .38:
> > >>  > 48 45 253:3 / /home rw,relatime - ext4 /dev/mapper/vg_adamo-lv_home rw,seclabel,barrier=1,data=ordered
> > >>  >
> > >>  > .39:
> > >>  > 46 22 253:3 / /home rw,relatime uuid:f3971858-79a9-4526-888c-1f86d35a6704 - ext4 /dev/mapper/vg_adamo-lv_home rw,seclabel,user_xattr,barrier=1,data=ordered
> > >>
> > >> looks like this was caused by 93f1c20bc8cdb757be50566eff88d65c3b26881f
> > >>
> > >> perhaps adding that string to the end of the line would preserve what mount expects ?
> > >
> > > uuid:<value> is an optional field as per
> > > Documentation/filesystems/proc.txt. There was an error in libmount
> > > parsing which got fixed upstream recently.
> > 
> > Just a simple question about this approach in general? A filesystem
> > UUID can be changed on disk at any time (tune2fs -U ...).
> > 
> > Your code looks like you copy the bytes to the in-kernel superblock
> > structure without noticing any later changes on disk? How is that
> > supposed to work?
> > 
> 
> Isn't that true even for the fsid returned by statfs ?.  IIUC tune2fs
> won't change even the ext4_super_block.s_uuid .
> 
> -aneesh

For gfs2 we insist that the volume label (lock table name) and the uuid
do not change during the lifetime of a mount since they are used as
identifiers by the userland infrastructure. We include them with all
uevent messages, for example, and we do not have a "uuid changed"
uevent. So we use exactly the behaviour proposed: copying the info from
the sb at mount time and then never changing it during the lifetime of
the mount.

That doesn't prevent someone from making an on-disk change while the fs
is mounted, though, with tunegfs2 (or gfs2_tool for older gfs2-utils).
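
To make the copy-at-mount behaviour concrete, here is a minimal sketch in
Python. It is only a toy model, not gfs2 or VFS code: the Disk and Mount
classes are hypothetical, and the two UUID values are the ones from Kay's
blkid output earlier in the thread.

```python
class Disk:
    """Hypothetical stand-in for the on-disk superblock."""

    def __init__(self, uuid):
        self.uuid = uuid


class Mount:
    """Hypothetical stand-in for the in-kernel superblock of one mount."""

    def __init__(self, disk):
        self.disk = disk
        # Copied once at mount time and never refreshed afterwards.
        self.uuid = disk.uuid


disk = Disk("0e4974cc-6a11-11e0-8d7b-002186a23ce5")
mount = Mount(disk)

# Simulate an on-disk edit (tune2fs -U, tunegfs2, ...) while mounted.
disk.uuid = "26be6e7c-6a11-11e0-ad62-002186a23ce5"

stale = mount.uuid != disk.uuid
print("mount reports:", mount.uuid)
print("disk contains:", disk.uuid)
print("out of sync:", stale)

# Only a fresh mount picks up the new value.
remounted = Mount(disk)
print("after remount:", remounted.uuid)
```

In this model the identifier seen through the mount stays stable for
userland, at the cost of going stale if someone edits the disk behind it.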

One question, though: why would it be useful to change the uuid of a
mounted filesystem? It seems likely to lead to confusion for no useful
gain.

Steve.



* Re: Linux 2.6.39-rc3
  2011-04-19  8:23         ` Aneesh Kumar K.V
  2011-04-19  8:37           ` Steven Whitehouse
@ 2011-04-19  9:55           ` Kay Sievers
  1 sibling, 0 replies; 108+ messages in thread
From: Kay Sievers @ 2011-04-19  9:55 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Dave Jones, Linus Torvalds, Linux Kernel Mailing List, Eric Sandeen

On Tue, Apr 19, 2011 at 10:23, Aneesh Kumar K.V
<aneesh.kumar@linux.vnet.ibm.com> wrote:
> On Tue, 19 Apr 2011 00:57:27 +0200, Kay Sievers <kay.sievers@vrfy.org> wrote:

>> Just a simple question about this approach in general? A filesystem
>> UUID can be changed on disk at any time (tune2fs -U ...).
>>
>> Your code looks like you copy the bytes to the in-kernel superblock
>> structure without noticing any later changes on disk? How is that
>> supposed to work?
>
> Isn't that true even for the fsid returned by statfs ?.  IIUC tune2fs
> won't change even the ext4_super_block.s_uuid .

What matters is that it's common practice today to change labels
on disk for mounted filesystems.

There should probably be getter/setter (like generic ioctls, or
whatever fits) for uuid/label of a mounted filesystem. That call would
also update this new superblock info. Guess that's needed before the
kernel can export such stuff in mountinfo.

So tools would at least have a chance to do it right here, and the
current on-disk edit could rightfully be deprecated. Exporting possibly
out-of-sync data, with no way to update it short of a "reboot",
really doesn't sound convincing.

Kay

* Re: Linux 2.6.39-rc3
  2011-04-18 23:02         ` Dave Jones
  2011-04-18 23:14           ` Kay Sievers
@ 2011-04-19 11:42           ` Ted Ts'o
  1 sibling, 0 replies; 108+ messages in thread
From: Ted Ts'o @ 2011-04-19 11:42 UTC (permalink / raw)
  To: Dave Jones, Kay Sievers, Aneesh Kumar K.V, Linus Torvalds,
	Linux Kernel Mailing List, Eric Sandeen

On Mon, Apr 18, 2011 at 07:02:55PM -0400, Dave Jones wrote:
>  > Your code looks like you copy the bytes to the in-kernel superblock
>  > structure without noticing any later changes on disk? How is that
>  > supposed to work?
> 
> I thought tune2fs on a mounted filesystem was always a
> "you get to keep both pieces if it breaks" situation.

It's actually something that we've supported for a long time, and we
go to some lengths to make it be safe.  Ext[234] always directly
checks things that could be safely changed by tune2fs directly in the
buffer cache where the superblock is stored, and tune2fs checks to see
if the file system is mounted, and (a) will refuse to make certain
changes that are unsafe, and (b) make the changes to the buffer cache
by seeking to the right place in the superblock and only writing the
1/2/4 bytes which are needed to make the change.
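
A rough illustration of the "seek and write only the needed bytes" idea,
as a Python sketch. It works on a scratch file, not a block device, and
it is not tune2fs code. The helper name patch_u16 is made up, and the
field offset (s_max_mnt_count at byte 54 of a superblock that starts
1024 bytes in) is my understanding of the ext2 layout, so treat it as an
assumption.

```python
import os
import struct
import tempfile
from pathlib import Path

SUPERBLOCK_OFFSET = 1024    # primary superblock starts 1 KiB into the device
MAX_MNT_COUNT_OFFSET = 54   # assumed offset of the 16-bit s_max_mnt_count


def patch_u16(path, offset, value):
    """Overwrite exactly two bytes at `offset` with a little-endian u16."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(struct.pack("<H", value))


# Build a 2 KiB scratch "device" filled with a recognisable byte pattern.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(256)) * 8)
    image_path = tmp.name

before = Path(image_path).read_bytes()
patch_u16(image_path, SUPERBLOCK_OFFSET + MAX_MNT_COUNT_OFFSET, 30)
after = Path(image_path).read_bytes()
os.unlink(image_path)

# Every byte outside the patched field is untouched.
changed = [i for i in range(len(before)) if before[i] != after[i]]
print("changed byte offsets:", changed)
```

The point is the narrow write: nothing is read, modified, and written
back as a whole block, so fields the kernel is updating concurrently are
not clobbered.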

So it is something that we've advertised will work, although some
changes only take effect when the file system is unmounted and
remounted, even if you are allowed to make the change while the file
system is mounted.  The best known example of this is being able to
on-line convert a mounted root file system to add a journal.  You can
do that while it is mounted, but you have to reboot and/or
unmount/remount the file system in order for the journal to start
getting used.

						- Ted

* Re: Linux 2.6.39-rc3
  2011-04-15  4:14     ` Christoph Hellwig
@ 2011-04-20 20:12       ` Borislav Petkov
  0 siblings, 0 replies; 108+ messages in thread
From: Borislav Petkov @ 2011-04-20 20:12 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Linus Torvalds, Jens Axboe, Linux Kernel Mailing List

On Fri, Apr 15, 2011 at 12:14:55AM -0400, Christoph Hellwig wrote:
> Jens already has a fix in his tree to always offload the block I/O
> submission to blockd for this case.

FWIW, I can't trigger the warning with 2.6.39-rc4-00089-g2f666bc
anymore. So either Jens' fixes have trickled up to Linus or it is due to
6631e635c65d he mentioned earlier.

Thanks.

-- 
Regards/Gruss,
    Boris.

* Re: Linux 2.6.39-rc3
  2011-04-14  2:27                           ` Linus Torvalds
  (?)
  (?)
@ 2011-05-06 21:17                           ` Linus Torvalds
  -1 siblings, 0 replies; 108+ messages in thread
From: Linus Torvalds @ 2011-05-06 21:17 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: H. Peter Anvin, Yinghai Lu, Joerg Roedel, Ingo Molnar,
	Alex Deucher, Linux Kernel Mailing List, dri-devel,
	Thomas Gleixner, Tejun Heo

On Wednesday, April 13, 2011, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Wednesday, April 13, 2011, H. Peter Anvin <hpa@zytor.com> wrote:
>>
>> Yes.  However, even if we *do* revert (and the time is running short on
>> not reverting) I would like to understand this particular one, simply
>> because I think it may very well be a problem that is manifesting itself
>> in other ways on other systems.

 sorry, fingerfart. Anyway, I agree 100%.

 we definitely want to also understand the reason for things not
working, even if we do revert..

        Linus

* Re: Linux 2.6.39-rc3
  2011-04-13 14:54 ` Linus Torvalds
@ 2011-04-14 18:28   ` Pavel Machek
  0 siblings, 0 replies; 108+ messages in thread
From: Pavel Machek @ 2011-04-14 18:28 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: George Spelvin, davej, linux-kernel, joro

Hi!

> > Note that the discussion on the libmount mailing list revealed a possible
> > kernel workaround: escape the hyphens as \055.  Damn hard to read for
> > a human, but it does parse correctly, and the workaround can be fixed
> > once the library updates have propagated.
> 
> I'd rather replace it with some non-dash character that is
> human-readable, like '+' or some utf-8 sequence that _looks_ like a
> dash.

Just delete the dashes? IIRC UUIDs are fixed-length binary numbers, so
no info will be lost.
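
A quick check of the "no info will be lost" claim, sketched in Python
with the standard uuid module. The UUID is the one from Dave's
mountinfo output; the variable names are only for illustration.

```python
import uuid

dashed = "f3971858-79a9-4526-888c-1f86d35a6704"

# Drop the dashes: what is left is 32 hex digits, i.e. 16 bytes.
compact = dashed.replace("-", "")

# uuid.UUID accepts the 32-digit form and restores the canonical layout.
restored = uuid.UUID(hex=compact)

print(compact)
print(str(restored))
print(len(restored.bytes))
```

The dashes sit at fixed positions, so they can always be reinserted from
the compact form.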

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: Linux 2.6.39-rc3
  2011-04-13  4:32 George Spelvin
@ 2011-04-13 14:54 ` Linus Torvalds
  2011-04-14 18:28   ` Pavel Machek
  0 siblings, 1 reply; 108+ messages in thread
From: Linus Torvalds @ 2011-04-13 14:54 UTC (permalink / raw)
  To: George Spelvin; +Cc: davej, linux-kernel, joro

On Tue, Apr 12, 2011 at 9:32 PM, George Spelvin <linux@horizon.com> wrote:
>
> Note that the discussion on the libmount mailing list revealed a possible
> kernel workaround: escape the hyphens as \055.  Damn hard to read for
> a human, but it does parse correctly, and the workaround can be fixed
> once the library updates have propagated.

I'd rather replace it with some non-dash character that is
human-readable, like '+' or some utf-8 sequence that _looks_ like a
dash.

So it wouldn't parse as a uuid to some code - big deal. Clearly
neither does the correct dash. Using \055 would be just ugly.
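
For anyone wondering how a dash inside the uuid can confuse a parser,
here is a Python sketch. parse_naive is a guess at the failure mode (it
treats the first '-' character as the separator); it is not the actual
libmount code, but it does reproduce the confused mount output Dave
reported. parse_by_field looks for a field that is exactly "-" instead.
Both function names are made up.

```python
LINE_39 = (
    "46 22 253:3 / /home rw,relatime "
    "uuid:f3971858-79a9-4526-888c-1f86d35a6704 - ext4 "
    "/dev/mapper/vg_adamo-lv_home "
    "rw,seclabel,user_xattr,barrier=1,data=ordered"
)


def parse_naive(line):
    """Assumed bug: split at the first '-' character anywhere in the line."""
    rest = line[line.index("-") + 1:].split()
    return {"fstype": rest[0], "source": rest[1], "super_opts": rest[2]}


def parse_by_field(line):
    """Split into fields first, then find the field that is exactly '-'."""
    fields = line.split(" ")
    sep = fields.index("-")
    fstype, source, super_opts = fields[sep + 1:sep + 4]
    return {
        "optional": fields[6:sep],  # zero or more tag[:value] fields
        "fstype": fstype,
        "source": source,
        "super_opts": super_opts,
    }


print(parse_naive(LINE_39))
print(parse_by_field(LINE_39))
```

The naive version yields source "-", type "79a9-4526-888c-1f86d35a6704"
and options "ext4", which matches the broken .39 mount line quoted in
this thread.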

Of course, not exposing it at all is also a reasonable strategy.

Who uses /proc/self/mountinfo? I see the problem on my alpha Fedora-15
test-machine, but Fedora-14 doesn't seem to do it. Ubuntu? SuSE? Do we
know how widespread the breakage is?

If it is _just_ Fedora-15, then I presume that pushing out the
util-linux fix and waiting a few weeks will fix it (since anybody
using F-15 right now lives on the edge anyway, and a real release
hasn't happened). But if we have real releases using it, then we may
need to look at workarounds using non-'-' characters if people really
want the uuid showing up any time soon.

                      Linus

* Re: Linux 2.6.39-rc3
@ 2011-04-13  4:32 George Spelvin
  2011-04-13 14:54 ` Linus Torvalds
  0 siblings, 1 reply; 108+ messages in thread
From: George Spelvin @ 2011-04-13  4:32 UTC (permalink / raw)
  To: davej, linux-kernel, torvalds; +Cc: joro, linux

> Gaah, yes. Apparently that placement is correct and documented, and
> has been since the beginning.
>
> However, reality always takes precedence, so I think that for now
> we'll just have to revert the commit that added the uuid: tag, and we
> can re-visit this issue when hopefully the tools have been fixed.

Note that the discussion on the libmount mailing list revealed a possible
kernel workaround: escape the hyphens as \055.  Damn hard to read for
a human, but it does parse correctly, and the workaround can be fixed
once the library updates have propagated.

In case it's useful.
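
A small Python sketch of what the escaping would look like, assuming the
usual \NNN octal convention that mount tables already use for spaces
(\040). The function names are hypothetical and this is not kernel or
libmount code.

```python
import re


def escape_hyphens(value):
    """Replace each '-' with the octal escape \\055 (ASCII 45)."""
    return value.replace("-", "\\055")


def unescape_octal(value):
    """Decode \\NNN octal escapes back into single characters."""
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), value)


uuid_field = "uuid:f3971858-79a9-4526-888c-1f86d35a6704"
escaped = escape_hyphens(uuid_field)

print(escaped)
print(unescape_octal(escaped))
```

The escaped form contains no literal '-', so a parser that trips over
dashes has nothing to trip over, and a reader that understands octal
escapes gets the original string back.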

* Re: Linux 2.6.39-rc3
@ 2011-04-12 21:21 Alexandre Demers
  0 siblings, 0 replies; 108+ messages in thread
From: Alexandre Demers @ 2011-04-12 21:21 UTC (permalink / raw)
  To: alexdeucher, joro; +Cc: torvalds, dri-devel

Already tracking it here: https://bugzilla.kernel.org/show_bug.cgi?id=33012

Same problem, same culprit commit.

-- 
Alexandre Demers

Thread overview: 108+ messages
2011-04-12  0:40 Linux 2.6.39-rc3 Linus Torvalds
2011-04-12  9:02 ` Joerg Roedel
2011-04-12 14:15   ` Alex Deucher
2011-04-12 18:44     ` Joerg Roedel
2011-04-13  1:27       ` David Rientjes
2011-04-13  6:46       ` Ingo Molnar
2011-04-13 17:21         ` Joerg Roedel
2011-04-13 18:39           ` H. Peter Anvin
2011-04-13 19:26             ` Joerg Roedel
2011-04-13 18:51           ` H. Peter Anvin
2011-04-13 19:24             ` Joerg Roedel
2011-04-13 19:14           ` Yinghai Lu
2011-04-13 19:34             ` Joerg Roedel
2011-04-13 20:48               ` Yinghai Lu
2011-04-13 20:54                 ` Linus Torvalds
2011-04-13 21:23                   ` Yinghai Lu
2011-04-13 23:39                     ` Linus Torvalds
2011-04-14  0:10                       ` Yinghai Lu
2011-04-14  2:03                       ` H. Peter Anvin
2011-04-14  2:27                         ` Linus Torvalds
2011-04-14  2:27                           ` Linus Torvalds
2011-04-14  2:33                           ` Linus Torvalds
2011-04-14  2:33                             ` Linus Torvalds
2011-04-14  4:03                             ` Tejun Heo
2011-04-14  9:36                               ` Joerg Roedel
2011-04-14  8:09                             ` Alan Cox
2011-04-14  8:09                               ` Alan Cox
2011-04-15 13:11                             ` Joerg Roedel
2011-04-15 13:16                               ` Ingo Molnar
2011-04-15 14:33                                 ` Joerg Roedel
2011-04-15 16:11                                   ` Alex Deucher
2011-04-15 15:46                                 ` Joerg Roedel
2011-04-15 16:11                                   ` Jerome Glisse
2011-04-16 16:35                                     ` Joerg Roedel
2011-04-16 16:35                                       ` Joerg Roedel
2011-04-16 18:54                                       ` Jerome Glisse
2011-04-16 18:54                                         ` Jerome Glisse
2011-04-17 14:09                                         ` Joerg Roedel
2011-04-18  1:12                                           ` Jerome Glisse
2011-04-18 15:23                                           ` Alex Deucher
2011-04-18 15:23                                             ` Alex Deucher
2011-04-18 15:29                                             ` Jerome Glisse
2011-04-18 15:33                                               ` Alex Deucher
2011-04-18 15:59                                                 ` Jerome Glisse
2011-04-18 16:35                                                   ` Alex Deucher
2011-04-15 14:04                               ` Andreas Herrmann
2011-04-15 14:28                                 ` Joerg Roedel
2011-04-15 14:16                               ` Alexandre Demers
2011-04-15 14:27                                 ` Joerg Roedel
2011-04-15 14:27                                   ` Joerg Roedel
2011-04-15 18:59                                   ` Alexandre Demers
2011-04-15 19:06                                     ` Ingo Molnar
2011-04-15 19:18                                       ` Yinghai Lu
2011-04-15 20:22                                         ` H. Peter Anvin
2011-04-16 12:01                                         ` Joerg Roedel
2011-04-16 12:01                                           ` Joerg Roedel
2011-04-16 12:00                                       ` Joerg Roedel
2011-04-16 12:21                                         ` Ingo Molnar
2011-04-16 12:21                                           ` Ingo Molnar
2011-04-16  0:03                               ` [tip:x86/urgent] x86, amd: Disable GartTlbWlkErr when BIOS forgets it tip-bot for Joerg Roedel
2011-05-06 21:17                           ` Linux 2.6.39-rc3 Linus Torvalds
2011-04-13 21:50                 ` Joerg Roedel
2011-04-13 21:59                   ` Yinghai Lu
2011-04-13 22:11                     ` H. Peter Anvin
2011-04-13 22:01                   ` H. Peter Anvin
2011-04-13 22:22                     ` Joerg Roedel
2011-04-13 22:31                       ` H. Peter Anvin
2011-04-14  8:59                         ` Joerg Roedel
2011-04-13 19:48             ` Alex Deucher
2011-04-14  1:58             ` H. Peter Anvin
2011-04-14  1:58               ` H. Peter Anvin
2011-04-14  2:07               ` Dave Airlie
2011-04-14  6:10                 ` H. Peter Anvin
2011-04-14  8:56               ` Joerg Roedel
2011-04-14  9:07                 ` Dave Airlie
2011-04-14  9:11                 ` Ingo Molnar
2011-04-14 14:31                   ` H. Peter Anvin
2011-04-14 14:28                 ` Alex Deucher
2011-04-14 21:09                   ` Joerg Roedel
2011-04-14 21:34                     ` Alex Deucher
2011-04-15  6:50                       ` Joerg Roedel
2011-04-15 14:49                       ` Andreas Herrmann
2011-04-15  8:26                     ` Michel Dänzer
2011-04-15  8:26                       ` Michel Dänzer
2011-04-15  8:55                       ` Joerg Roedel
2011-04-12 19:09 ` Dave Jones
2011-04-12 19:21   ` Dave Jones
2011-04-12 19:55     ` Linus Torvalds
2011-04-12 20:13       ` Dave Jones
2011-04-14  8:20     ` Aneesh Kumar K.V
2011-04-18 22:57       ` Kay Sievers
2011-04-18 23:02         ` Dave Jones
2011-04-18 23:14           ` Kay Sievers
2011-04-19 11:42           ` Ted Ts'o
2011-04-19  8:23         ` Aneesh Kumar K.V
2011-04-19  8:37           ` Steven Whitehouse
2011-04-19  9:55           ` Kay Sievers
2011-04-12 20:20   ` Eric Sandeen
2011-04-12 20:27     ` Karel Zak
2011-04-12 20:33     ` Linus Torvalds
2011-04-14 20:24 ` Borislav Petkov
2011-04-14 20:55   ` Linus Torvalds
2011-04-15  4:14     ` Christoph Hellwig
2011-04-20 20:12       ` Borislav Petkov
2011-04-12 21:21 Alexandre Demers
2011-04-13  4:32 George Spelvin
2011-04-13 14:54 ` Linus Torvalds
2011-04-14 18:28   ` Pavel Machek
