Subject: [PATCH,RFC] iommu/amd: Recover from event log overflow
From: Lennert Buytenhek
Date: 2021-08-21 15:22 UTC
To: iommu, Suthikulpanit, Suravee, Joerg Roedel

The AMD IOMMU logs I/O page faults and such to a ring buffer in
system memory, and this ring buffer can overflow.  The AMD IOMMU
spec has the following to say about the interrupt status bit that
signals this overflow condition:

	EventOverflow: Event log overflow. RW1C. Reset 0b. 1 = IOMMU
	event log overflow has occurred. This bit is set when a new
	event is to be written to the event log and there is no usable
	entry in the event log, causing the new event information to
	be discarded. An interrupt is generated when EventOverflow = 1b
	and MMIO Offset 0018h[EventIntEn] = 1b. No new event log
	entries are written while this bit is set.

	Software Note: To resume logging, clear EventOverflow (W1C),
	and write a 1 to MMIO Offset 0018h[EventLogEn].

The AMD IOMMU driver doesn't currently implement this recovery
sequence, meaning that if a ring buffer overflow occurs, logging of
EVT/PPR/GA events will cease entirely.

This patch implements the spec-mandated reset sequence, with the
minor tweak that the hardware seems to want to have a 0 written to
MMIO Offset 0018h[EventLogEn] first, before writing a 1 into this
field, or the IOMMU won't actually resume logging events.

Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
---
N.B.: I have only tested this change against an older in-house custom
kernel version, where it appears to survive stress testing.  This
version of the patch was only compile-tested.

This patch makes iommu_feature_{disable,enable}() be called at
non-init time (via amd_iommu_restart_event_logging()), and those
functions perform unguarded read-modify-write sequences on the
MMIO_CONTROL_OFFSET register, and I haven't yet verified whether that
is safe to do.
Reproducing this issue is fairly easy if you can easily cause I/O page
faults -- either stick an msleep() in the amd_iommu_int_thread() loop,
or configure a slow serial console (the latter being how this was
initially found).

 drivers/iommu/amd/amd_iommu.h       |  1 +
 drivers/iommu/amd/amd_iommu_types.h |  1 +
 drivers/iommu/amd/init.c            | 10 ++++++++++
 drivers/iommu/amd/iommu.c           | 10 ++++++++--
 4 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 416815a525d6..bb95edf74415 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -14,6 +14,7 @@
 extern irqreturn_t amd_iommu_int_thread(int irq, void *data);
 extern irqreturn_t amd_iommu_int_handler(int irq, void *data);
 extern void amd_iommu_apply_erratum_63(u16 devid);
+extern void amd_iommu_restart_event_logging(struct amd_iommu *iommu);
 extern void amd_iommu_reset_cmd_buffer(struct amd_iommu *iommu);
 extern int amd_iommu_init_devices(void);
 extern void amd_iommu_uninit_devices(void);
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 94c1a7a9876d..34b90f8b2056 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -110,6 +110,7 @@
 #define PASID_MASK		0x0000ffff

 /* MMIO status bits */
+#define MMIO_STATUS_EVT_OVERFLOW_INT_MASK	(1 << 0)
 #define MMIO_STATUS_EVT_INT_MASK	(1 << 1)
 #define MMIO_STATUS_COM_WAIT_INT_MASK	(1 << 2)
 #define MMIO_STATUS_PPR_INT_MASK	(1 << 6)
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 46280e6e1535..44d1536474ed 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -639,6 +639,16 @@ static int __init alloc_command_buffer(struct amd_iommu *iommu)
 	return iommu->cmd_buf ? 0 : -ENOMEM;
 }

+/*
+ * This function restarts event logging in case the IOMMU experienced
+ * an event log buffer overflow.
+ */
+void amd_iommu_restart_event_logging(struct amd_iommu *iommu)
+{
+	iommu_feature_disable(iommu, CONTROL_EVT_LOG_EN);
+	iommu_feature_enable(iommu, CONTROL_EVT_LOG_EN);
+}
+
 /*
  * This function resets the command buffer if the IOMMU stopped fetching
  * commands from it.
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 811a49a95d04..b04f0cd5ffc2 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -736,7 +736,8 @@ amd_iommu_set_pci_msi_domain(struct device *dev, struct amd_iommu *iommu) { }
 #endif /* !CONFIG_IRQ_REMAP */

 #define AMD_IOMMU_INT_MASK	\
-	(MMIO_STATUS_EVT_INT_MASK | \
+	(MMIO_STATUS_EVT_OVERFLOW_INT_MASK | \
+	 MMIO_STATUS_EVT_INT_MASK | \
	 MMIO_STATUS_PPR_INT_MASK | \
	 MMIO_STATUS_GALOG_INT_MASK)

@@ -746,7 +747,7 @@ irqreturn_t amd_iommu_int_thread(int irq, void *data)
 	u32 status = readl(iommu->mmio_base + MMIO_STATUS_OFFSET);

 	while (status & AMD_IOMMU_INT_MASK) {
-		/* Enable EVT and PPR and GA interrupts again */
+		/* Enable interrupt sources again */
		writel(AMD_IOMMU_INT_MASK,
			iommu->mmio_base + MMIO_STATUS_OFFSET);

@@ -767,6 +768,11 @@ irqreturn_t amd_iommu_int_thread(int irq, void *data)
 	}
 #endif

+	if (status & MMIO_STATUS_EVT_OVERFLOW_INT_MASK) {
+		pr_info("AMD-Vi: IOMMU event log overflow\n");
+		amd_iommu_restart_event_logging(iommu);
+	}
+
 	/*
	 * Hardware bug: ERBT1312
	 * When re-enabling interrupt (by writing 1
--
2.31.1

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Subject: Re: [PATCH,RFC] iommu/amd: Recover from event log overflow
From: Joerg Roedel
Date: 2021-09-28 8:45 UTC
To: Lennert Buytenhek
Cc: iommu

Hi Lennert,

On Sat, Aug 21, 2021 at 06:22:10PM +0300, Lennert Buytenhek wrote:
> +/*
> + * This function restarts event logging in case the IOMMU experienced
> + * an event log buffer overflow.
> + */
> +void amd_iommu_restart_event_logging(struct amd_iommu *iommu)
> +{
> +	iommu_feature_disable(iommu, CONTROL_EVT_LOG_EN);
> +	iommu_feature_enable(iommu, CONTROL_EVT_LOG_EN);
> +}

A few more things need to happen here.  First of all, head and tail
are likely equal when the event buffer overflows, so the events will
not be polled and reported.

And next, it is also a good idea to reset the head and tail pointers
of the event log to 0 after the events have been polled.

Regards,

	Joerg
Subject: Re: [PATCH,RFC] iommu/amd: Recover from event log overflow
From: Lennert Buytenhek
Date: 2021-09-30 13:45 UTC
To: Joerg Roedel
Cc: iommu

On Tue, Sep 28, 2021 at 10:45:30AM +0200, Joerg Roedel wrote:
> Hi Lennert,

Hi Joerg,

Thanks for your feedback!

> > +/*
> > + * This function restarts event logging in case the IOMMU experienced
> > + * an event log buffer overflow.
> > + */
> > +void amd_iommu_restart_event_logging(struct amd_iommu *iommu)
> > +{
> > +	iommu_feature_disable(iommu, CONTROL_EVT_LOG_EN);
> > +	iommu_feature_enable(iommu, CONTROL_EVT_LOG_EN);
> > +}
>
> A few more things need to happen here. First of all head and tail are
> likely equal when the event buffer overflows, so the events will not be
> polled and reported.

I applied the following change on top of my patch, to check the state
of the event log ring buffer when we enter the IRQ handler with an
overflow condition pending:

--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -752,6 +752,18 @@ irqreturn_t amd_iommu_int_thread(int irq, void *data)
 	struct amd_iommu *iommu = (struct amd_iommu *) data;
 	u32 status = readl(iommu->mmio_base + MMIO_STATUS_OFFSET);

+	if (status & MMIO_STATUS_EVT_OVERFLOW_INT_MASK) {
+		u32 events;
+
+		events = readl(iommu->mmio_base + MMIO_EVT_TAIL_OFFSET) -
+			 readl(iommu->mmio_base + MMIO_EVT_HEAD_OFFSET);
+		if (events & 0x80000000)
+			events += EVT_BUFFER_SIZE;
+		events /= EVENT_ENTRY_SIZE;
+
+		pr_info("Overflow with %d events queued\n", events);
+	}
+
 	while (status & AMD_IOMMU_INT_MASK) {
 		/* Enable interrupt sources again */
 		writel(AMD_IOMMU_INT_MASK,

And that gives me:

[   33.304821] AMD-Vi: Overflow with 511 events queued
[   35.304812] AMD-Vi: Overflow with 511 events queued
[   39.304791] AMD-Vi: Overflow with 511 events queued
[   40.304792] AMD-Vi: Overflow with 511 events queued
[   41.304782] AMD-Vi: Overflow with 511 events queued
[   44.009920] AMD-Vi: Overflow with 511 events queued
[   45.304768] AMD-Vi: Overflow with 511 events queued
[   46.304782] AMD-Vi: Overflow with 511 events queued
[   46.385028] AMD-Vi: Overflow with 511 events queued
[   51.304733] AMD-Vi: Overflow with 511 events queued
[   53.318892] AMD-Vi: Overflow with 511 events queued
[   60.304681] AMD-Vi: Overflow with 511 events queued
[   63.304676] AMD-Vi: Overflow with 511 events queued

In other words, it seems that the hardware considers the event log to
be full when there is still one free entry in the ring buffer, and it
will not actually fill up that last free entry and thereby make the
head and tail pointers equal.  This behavior seems consistent across
the Ryzen 3700X, Ryzen 5950X, EPYC 3151, EPYC 3251, RX-421ND and
RX-216TD.

The docs talk about "When the Event Log has overflowed", but they
don't define exactly when that happens (i.e. when tail becomes equal
to head, or one entry before that).  I did find this phrase that talks
about the overflow condition:

	The host software must make space in the event log after an
	overflow by reading entries (by adjusting the head pointer) or
	by resizing the log. Event logging may then be restarted.

This seems to suggest that the hardware will never consume the last
entry in the ring buffer and thereby advance tail to be equal to head,
because if it did, there would be no need for "reading entries (by
adjusting the head pointer)" to make space in the event log buffer --
the 'empty' and 'full' conditions would be indistinguishable in that
case.
If there _is_ some variation of the hardware out there that does
advance the tail pointer to coincide with the head pointer when there
are already N-1 entries in the log and it has one more entry to write,
then this patch would still work OK on such hardware -- we would just
lose a few more events in that case than we otherwise would.  The
point of the patch is to be able to recover after a burst of I/O page
faults, at which point we have very likely already lost some events
anyway, so I think such hypothetical lossage would be acceptable,
given that none of the hardware I have access to seems to behave that
way, and that the wording in the docs makes it unlikely to behave that
way.

In fact, thinking about it a bit more, by the time an event log
overflow condition is handled, it is actually possible for the event
log ring buffer to be empty rather than full.  Imagine this scenario:

- We enter the IRQ handler; the EVT status bit is set, and the ring
  buffer is full but hasn't overflowed yet, so OVERFLOW is not set.

- We read the interrupt status and clear the EVT bit.

- Before we call iommu_poll_events(), another event comes in, which
  causes the OVERFLOW flag to be set, since we haven't advanced head
  yet.

- iommu_poll_events() is called, and it processes all events in the
  ring buffer, which is now empty (and will remain empty, since the
  overflow bit is set).

- The re-reading of MMIO_STATUS_OFFSET at the end of the IRQ loop
  finds the OVERFLOW bit set and loops back to the beginning.

- We signal status & MMIO_STATUS_EVT_OVERFLOW_INT_MASK in the next
  loop iteration, but there are actually no events in the ring buffer
  anymore, since we cleared them all out in the previous loop.
In other words, even if there is hardware out there that uses up every
entry in the ring buffer (and will thus let tail become equal to
head), on such hardware we cannot be sure that every entry in the ring
buffer is valid by the time we signal the overflow condition -- it
might also be the case that the ring buffer is entirely empty rather
than full -- and the overflow handling code therefore cannot just go
ahead and report every entry in the ring buffer when an overflow
happens.

(We _could_ deal with this by reading head and tail and then
re-reading the interrupt status bit after that to check for an
overflow condition once again, but I think that's probably more
trouble than it's worth, given that we're talking about hypothetical
hardware here.  The way I did it is probably the simplest way of
handling overflows.)

> And next it is also a good idea to reset the head and tail pointers of
> the event log to 0 after the events have been polled.

What difference would this make?  It would cost two MMIO writes, and I
don't see what the advantage would be -- it's fine for the IOMMU to
resume writing events at any index in the ring buffer, and we will
handle that just fine.

Some more testing suggests that this would be a good change to make: ;-)

@@ -775,7 +775,7 @@ irqreturn_t amd_iommu_int_thread(int irq, void *data)
 #endif

 	if (status & MMIO_STATUS_EVT_OVERFLOW_INT_MASK) {
-		pr_info("AMD-Vi: IOMMU event log overflow\n");
+		pr_info_ratelimited("IOMMU event log overflow\n");
 		amd_iommu_restart_event_logging(iommu);
 	}

I can resubmit the patch with that change applied.

Thanks,
Lennert