From: Lucas Stach <l.stach@pengutronix.de>
To: Alexandre Courbot <acourbot@nvidia.com>
Cc: Alexandre Courbot <gnurou@gmail.com>,
	"nouveau@lists.freedesktop.org" <nouveau@lists.freedesktop.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>,
	Ben Skeggs <bskeggs@redhat.com>,
	"linux-tegra@vger.kernel.org" <linux-tegra@vger.kernel.org>
Subject: Re: [Nouveau] [PATCH v4 2/6] drm/nouveau: map pages using DMA API on platform devices
Date: Fri, 11 Jul 2014 11:53:06 +0200
Message-ID: <1405072386.4630.9.camel@weser.hi.pengutronix.de>
In-Reply-To: <53BF52A0.4070907@nvidia.com>

On Friday, 11.07.2014 at 11:57 +0900, Alexandre Courbot wrote:
[...]
> >> Yeah, I am not familiar with i915 but it seems like we are in a similar boat
> >> here (except that ARM is more constrained as to its memory mappings). The
> >> strategy in this series is, map buffers used by user-space cached and
> >> explicitly synchronize them (since the ownership transition from user to GPU
> >> is always clearly performed by syscalls), and use coherent mappings for
> >> buffers used by the kernel which are accessed more randomly. This has solved
> >> all our coherency issues and resulted in the best performance so far.
> > I wonder if we might want to use unsnooped cached mappings of pages on
> > non-ARM platforms also, to avoid the overhead of the cache snooping?
> 
> You might want to indeed, now that coherency is guaranteed by the sync 
> functions originally introduced by Lucas. The only issue I could see is 
> that they always invalidate the full buffer whereas bus snooping only 
> affects pages that are actually touched. Someone would need to try this 
> on a desktop machine and see how it affects performance.
> 
> I'd be all for it though, since it would also allow us to get rid of 
> this ungraceful nv_device_is_cpu_coherent() function and result in 
> simplifying nouveau_bo.c a bit.

This will need some testing to get hard numbers, but I suspect that
invalidating the whole buffer isn't too bad, as the prefetch machinery
works very well with the access patterns we see in graphics drivers.

Flushing out the whole buffer should be even less problematic, as it
will only flush out dirty lines that would need to be flushed on GPU
read snooping anyway.

In the long run we might want a separate cpu prepare/finish ioctl where
we can indicate the area of interest. This might help to avoid some of
the invalidate overhead especially for userspace suballocated buffers.
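
As a rough sketch of what such an interface might carry: the argument
struct and helper below are purely hypothetical (they are not existing
nouveau or DRM UAPI), and the cache line size is passed in as an assumed
parameter. The helper only shows how an "area of interest" could be
clamped to the buffer and widened to cache-line boundaries, so that a
sub-allocated region is synchronized instead of the whole buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ioctl argument; not part of any existing UAPI. */
struct bo_cpu_prep_args {
	uint32_t handle;   /* buffer object handle */
	uint32_t flags;    /* e.g. read/write intent; unused in this sketch */
	uint64_t offset;   /* start of the area of interest, in bytes */
	uint64_t length;   /* size of the area of interest, in bytes */
};

/* Byte range that would actually need cache maintenance. */
struct sync_range {
	uint64_t start;    /* inclusive, aligned down to a cache line */
	uint64_t end;      /* exclusive, aligned up, capped at the buffer size */
};

/*
 * Validate the requested area against the buffer size and widen it to
 * cache-line boundaries. Returns false if the request is empty or does
 * not fit inside the buffer.
 */
static bool bo_sync_range(const struct bo_cpu_prep_args *args,
			  uint64_t bo_size, uint64_t line_size,
			  struct sync_range *out)
{
	/* line_size must be a non-zero power of two for the masks below */
	assert(line_size != 0 && (line_size & (line_size - 1)) == 0);

	if (args->length == 0 || args->offset >= bo_size ||
	    args->length > bo_size - args->offset)
		return false;

	out->start = args->offset & ~(line_size - 1);
	out->end = (args->offset + args->length + line_size - 1) &
		   ~(line_size - 1);
	if (out->end > bo_size)
		out->end = bo_size;
	return true;
}
```

With a 4096-byte buffer and 64-byte cache lines, a request for 50 bytes
at offset 100 would cover bytes [64, 192) rather than all 4096 bytes.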

Regards,
Lucas

-- 
Pengutronix e.K.             | Lucas Stach                 |
Industrial Linux Solutions   | http://www.pengutronix.de/  |
