linux-cve-announce.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* CVE-2024-26762: cxl/pci: Skip to handle RAS errors if CXL.mem device is detached
@ 2024-04-03 17:31 Greg Kroah-Hartman
  0 siblings, 0 replies; only message in thread
From: Greg Kroah-Hartman @ 2024-04-03 17:31 UTC (permalink / raw)
  To: linux-cve-announce; +Cc: Greg Kroah-Hartman

Description
===========

In the Linux kernel, the following vulnerability has been resolved:

cxl/pci: Skip to handle RAS errors if CXL.mem device is detached

The PCI AER model is an awkward fit for CXL error handling. While the
expectation is that a PCI device can escalate to link reset to recover
from an AER event, the same reset on CXL amounts to a surprise memory
hotplug of massive amounts of memory.

At present, the CXL error handler attempts some optimistic error
handling to unbind the device from the cxl_mem driver after reaping some
RAS register values. This results in a "hopeful" attempt to unplug the
memory, but there is no guarantee that will succeed.

A subsequent AER notification after the memdev unbind event can no
longer assume the registers are mapped. Check for memdev bind before
reaping status register values to avoid crashes of the form:

 BUG: unable to handle page fault for address: ffa00000195e9100
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 [...]
 RIP: 0010:__cxl_handle_ras+0x30/0x110 [cxl_core]
 [...]
 Call Trace:
  <TASK>
  ? __die+0x24/0x70
  ? page_fault_oops+0x82/0x160
  ? kernelmode_fixup_or_oops+0x84/0x110
  ? exc_page_fault+0x113/0x170
  ? asm_exc_page_fault+0x26/0x30
  ? __pfx_dpc_reset_link+0x10/0x10
  ? __cxl_handle_ras+0x30/0x110 [cxl_core]
  ? find_cxl_port+0x59/0x80 [cxl_core]
  cxl_handle_rp_ras+0xbc/0xd0 [cxl_core]
  cxl_error_detected+0x6c/0xf0 [cxl_core]
  report_error_detected+0xc7/0x1c0
  pci_walk_bus+0x73/0x90
  pcie_do_recovery+0x23f/0x330

Longer term, the unbind and PCI_ERS_RESULT_DISCONNECT behavior might
need to be replaced with a new PCI_ERS_RESULT_PANIC.

The Linux kernel CVE team has assigned CVE-2024-26762 to this issue.


Affected and fixed versions
===========================

	Issue introduced in 6.7 with commit 6ac07883dbb5 and fixed in 6.7.7 with commit 21e5e84f3f63
	Issue introduced in 6.7 with commit 6ac07883dbb5 and fixed in 6.8 with commit eef5c7b28dbe

Please see https://www.kernel.org for a full list of currently supported
kernel versions by the kernel community.

Unaffected versions might change over time as fixes are backported to
older supported kernel versions.  The official CVE entry at
	https://cve.org/CVERecord/?id=CVE-2024-26762
will be updated if fixes are backported, please check that for the most
up to date information about this issue.


Affected files
==============

The file(s) affected by this issue are:
	drivers/cxl/core/pci.c


Mitigation
==========

The Linux kernel CVE team recommends that you update to the latest
stable kernel version for this, and many other bugfixes.  Individual
changes are never tested alone, but rather are part of a larger kernel
release.  Cherry-picking individual commits is not recommended or
supported by the Linux kernel community at all.  If however, updating to
the latest release is impossible, the individual changes to resolve this
issue can be found at these commits:
	https://git.kernel.org/stable/c/21e5e84f3f63fdf44e49642a6e45cd895e921a84
	https://git.kernel.org/stable/c/eef5c7b28dbecd6b141987a96db6c54e49828102

^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2024-04-03 17:32 UTC | newest]

Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-03 17:31 CVE-2024-26762: cxl/pci: Skip to handle RAS errors if CXL.mem device is detached Greg Kroah-Hartman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).