All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dmitry Safonov <dima@arista.com>
To: Lu Baolu <baolu.lu@linux.intel.com>,
	linux-kernel@vger.kernel.org, joro@8bytes.org, "Raj,
	Ashok" <ashok.raj@intel.com>
Cc: 0x7f454c46@gmail.com,
	Alex Williamson <alex.williamson@redhat.com>,
	David Woodhouse <dwmw2@infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	iommu@lists.linux-foundation.org
Subject: Re: [PATCHv4 2/2] iommu/vt-d: Limit number of faults to clear in irq handler
Date: Thu, 03 May 2018 03:34:50 +0100	[thread overview]
Message-ID: <1525314890.14025.38.camel@arista.com> (raw)
In-Reply-To: <5AEA70FD.1010209@linux.intel.com>

On Thu, 2018-05-03 at 10:16 +0800, Lu Baolu wrote:
> Hi,
> 
> On 05/03/2018 09:59 AM, Dmitry Safonov wrote:
> > On Thu, 2018-05-03 at 09:32 +0800, Lu Baolu wrote:
> > > Hi,
> > > 
> > > On 05/03/2018 08:52 AM, Dmitry Safonov wrote:
> > > > AFAICS, we're doing fault-clearing in a loop inside irq
> > > > handler.
> > > > That means that while we're clearing if a fault raises, it'll
> > > > make
> > > > an irq level triggered (or on edge) on lapic. So, whenever we
> > > > return
> > > > from the irq handler, irq will raise again.
> > > > 
> > > 
> > > Uhm, double checked with the spec. Interrupts should be generated
> > > since we always clear the fault overflow bit.
> > > 
> > > Anyway, we can't clear faults in a limited loop, as the spec says
> > > in
> > > 7.3.1:
> > 
> > Mind to elaborate?
> > ITOW, I do not see a contradiction. We're still clearing faults in
> > FIFO
> > fashion. There is no limitation to do some spare work in between
> > clearings (return from interrupt, then fault again and continue).
> 
> Hardware maintains an internal index to reference the fault recording
> register in which the next fault can be recorded. When a fault comes,
> hardware will check the Fault bit (bit 31 of the 4th 32-bit register
> recording
> register) referenced by the internal index. If this bit is set,
> hardware will
> not record the fault.
> 
> Since we now don't clear the F bit until a register entry which has
> the F bit
> cleared, we might exit the fault handling with some register entries
> still
> have the F bit set.
> 
>   F
> > 0 |  xxxxxxxxxxxxx|
> > 0 |  xxxxxxxxxxxxx|
> > 0 |  xxxxxxxxxxxxx|  <--- Fault record index in fault status
> > register
> > 0 |  xxxxxxxxxxxxx|
> > 1 |  xxxxxxxxxxxxx|  <--- hardware maintained index
> > 1 |  xxxxxxxxxxxxx|
> > 1 |  xxxxxxxxxxxxx|
> > 0 |  xxxxxxxxxxxxx|
> > 0 |  xxxxxxxxxxxxx|
> > 0 |  xxxxxxxxxxxxx|
> > 0 |  xxxxxxxxxxxxx|
> 
> Take an example as above, hardware could only record 2 more faults
> with
> others all dropped.

Ugh, yeah, I got what you're saying.. Thanks for explanations.
So, we shouldn't mark faults as cleared until we've actually processed
them here:
:        writel(DMA_FSTS_PFO | DMA_FSTS_PPF | DMA_FSTS_PRO,
:               iommu->reg + DMAR_FSTS_REG);

As Joerg mentioned, we do care about latency here, so this fault work
can't be moved entirely into workqueue.. but we might limit loop and
check if we've hit the limit - to proceed servicing faults in a wq,
as in that case we should care about being too long in irq-disabled
section more than about latencies.
Does that makes any sense, what do you think?

I can possibly re-write 2/2 with idea above..
And it would be a bit joy to have 1/1 applied, as it's independent fix
and fixes an issue that happens for real on our devices, heh.

-- 
Thanks,
             Dmitry

WARNING: multiple messages have this Message-ID (diff)
From: Dmitry Safonov via iommu <iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org>
To: Lu Baolu <baolu.lu-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org, "Raj,
	Ashok" <ashok.raj-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Cc: David Woodhouse <dwmw2-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>,
	0x7f454c46-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
	Ingo Molnar <mingo-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
Subject: Re: [PATCHv4 2/2] iommu/vt-d: Limit number of faults to clear in irq handler
Date: Thu, 03 May 2018 03:34:50 +0100	[thread overview]
Message-ID: <1525314890.14025.38.camel@arista.com> (raw)
In-Reply-To: <5AEA70FD.1010209-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>

On Thu, 2018-05-03 at 10:16 +0800, Lu Baolu wrote:
> Hi,
> 
> On 05/03/2018 09:59 AM, Dmitry Safonov wrote:
> > On Thu, 2018-05-03 at 09:32 +0800, Lu Baolu wrote:
> > > Hi,
> > > 
> > > On 05/03/2018 08:52 AM, Dmitry Safonov wrote:
> > > > AFAICS, we're doing fault-clearing in a loop inside irq
> > > > handler.
> > > > That means that while we're clearing if a fault raises, it'll
> > > > make
> > > > an irq level triggered (or on edge) on lapic. So, whenever we
> > > > return
> > > > from the irq handler, irq will raise again.
> > > > 
> > > 
> > > Uhm, double checked with the spec. Interrupts should be generated
> > > since we always clear the fault overflow bit.
> > > 
> > > Anyway, we can't clear faults in a limited loop, as the spec says
> > > in
> > > 7.3.1:
> > 
> > Mind to elaborate?
> > ITOW, I do not see a contradiction. We're still clearing faults in
> > FIFO
> > fashion. There is no limitation to do some spare work in between
> > clearings (return from interrupt, then fault again and continue).
> 
> Hardware maintains an internal index to reference the fault recording
> register in which the next fault can be recorded. When a fault comes,
> hardware will check the Fault bit (bit 31 of the 4th 32-bit register
> recording
> register) referenced by the internal index. If this bit is set,
> hardware will
> not record the fault.
> 
> Since we now don't clear the F bit until a register entry which has
> the F bit
> cleared, we might exit the fault handling with some register entries
> still
> have the F bit set.
> 
>   F
> > 0 |  xxxxxxxxxxxxx|
> > 0 |  xxxxxxxxxxxxx|
> > 0 |  xxxxxxxxxxxxx|  <--- Fault record index in fault status
> > register
> > 0 |  xxxxxxxxxxxxx|
> > 1 |  xxxxxxxxxxxxx|  <--- hardware maintained index
> > 1 |  xxxxxxxxxxxxx|
> > 1 |  xxxxxxxxxxxxx|
> > 0 |  xxxxxxxxxxxxx|
> > 0 |  xxxxxxxxxxxxx|
> > 0 |  xxxxxxxxxxxxx|
> > 0 |  xxxxxxxxxxxxx|
> 
> Take an example as above, hardware could only record 2 more faults
> with
> others all dropped.

Ugh, yeah, I got what you're saying.. Thanks for explanations.
So, we shouldn't mark faults as cleared until we've actually processed
them here:
:        writel(DMA_FSTS_PFO | DMA_FSTS_PPF | DMA_FSTS_PRO,
:               iommu->reg + DMAR_FSTS_REG);

As Joerg mentioned, we do care about latency here, so this fault work
can't be moved entirely into workqueue.. but we might limit loop and
check if we've hit the limit - to proceed servicing faults in a wq,
as in that case we should care about being too long in irq-disabled
section more than about latencies.
Does that makes any sense, what do you think?

I can possibly re-write 2/2 with idea above..
And it would be a bit joy to have 1/1 applied, as it's independent fix
and fixes an issue that happens for real on our devices, heh.

-- 
Thanks,
             Dmitry

  parent reply	other threads:[~2018-05-03  2:35 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-31  0:33 [PATCHv4 1/2] iommu/vt-d: Ratelimit each dmar fault printing Dmitry Safonov
2018-03-31  0:33 ` Dmitry Safonov via iommu
2018-03-31  0:33 ` [PATCHv4 2/2] iommu/vt-d: Limit number of faults to clear in irq handler Dmitry Safonov
2018-03-31  0:33   ` Dmitry Safonov via iommu
2018-05-02  6:34   ` Lu Baolu
2018-05-02  6:34     ` Lu Baolu
2018-05-02 12:38     ` Dmitry Safonov
2018-05-02 12:38       ` Dmitry Safonov via iommu
2018-05-02 23:49       ` Lu Baolu
2018-05-02 23:49         ` Lu Baolu
2018-05-03  0:52         ` Dmitry Safonov
2018-05-03  0:52           ` Dmitry Safonov via iommu
2018-05-03  1:32           ` Lu Baolu
2018-05-03  1:32             ` Lu Baolu
2018-05-03  1:59             ` Dmitry Safonov
2018-05-03  1:59               ` Dmitry Safonov via iommu
2018-05-03  2:16               ` Lu Baolu
2018-05-03  2:16                 ` Lu Baolu
2018-05-03  2:32                 ` Lu Baolu
2018-05-03  2:32                   ` Lu Baolu
2018-05-03  2:34                 ` Dmitry Safonov [this message]
2018-05-03  2:34                   ` Dmitry Safonov via iommu
2018-05-03  2:44                   ` Lu Baolu
2018-05-03  2:44                     ` Lu Baolu
2018-05-02  2:22 ` [PATCHv4 1/2] iommu/vt-d: Ratelimit each dmar fault printing Dmitry Safonov
2018-05-02  2:22   ` Dmitry Safonov via iommu
2018-05-03 12:40   ` Joerg Roedel
2018-05-03 16:12     ` Dmitry Safonov
2018-05-03 16:12       ` Dmitry Safonov via iommu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1525314890.14025.38.camel@arista.com \
    --to=dima@arista.com \
    --cc=0x7f454c46@gmail.com \
    --cc=alex.williamson@redhat.com \
    --cc=ashok.raj@intel.com \
    --cc=baolu.lu@linux.intel.com \
    --cc=dwmw2@infradead.org \
    --cc=iommu@lists.linux-foundation.org \
    --cc=joro@8bytes.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.