* Re: [Qemu-devel] Possibility of unaligned DMA accesses via the QEMU DMA API?
From: Mark Cave-Ayland @ 2013-07-17 12:52 UTC
  To: Kevin Wolf; +Cc: Alexander Graf, qemu-devel

On 17/07/13 09:16, Kevin Wolf wrote:

Hi Kevin,

Thanks for the reply - CC to qemu-devel as requested.

>> I've been testing some of Alex Graf's patches for running Darwin
>> under QEMU PPC and have been experiencing some timeout problems on
>> block devices. My attention is drawn to this commit in particular:
>> https://github.com/qemu/qemu/commit/80fc95d8bdaf3392106b131a97ca701fd374489a.
>>
>> The reason for this commit is that Darwin programs the DBDMA
>> controller to transfer data from the ATA FIFO in chunks that aren't
>> sector aligned, e.g. the ATA command requests 0x10000 (256 sectors)
>> but programs the DMA engine to transfer the data to memory as 3
>> chunks of 0xfffe, 0xfffe and 0x4 bytes.
>
> I'm not familiar with how DMA works for the macio IDE device. Do you
> have any pointers to specs or something?

It works by setting up a DMA descriptor table (a list of commands) whose 
entries are then "executed" when the RUN status bit is set, until a STOP 
command is reached. Things are slightly more complicated in that 
commands can have conditional branches set on them.
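
To make that concrete, here is a minimal self-contained sketch of the
execution model (the opcodes and fields are illustrative only, not the
real hardware layout or the QEMU structures):

#include <stdint.h>
#include <stdio.h>

enum { OP_OUTPUT_MORE, OP_OUTPUT_LAST, OP_INPUT_MORE, OP_INPUT_LAST,
       OP_STOP };

typedef struct {
    uint8_t  opcode;     /* INPUT/OUTPUT/STOP command type */
    uint8_t  branch;     /* nonzero: branch to 'target' if status test passes */
    uint16_t req_count;  /* bytes to transfer for this command */
    uint32_t address;    /* guest buffer address for the transfer */
    uint16_t target;     /* command index to branch to */
} DBDMACmd;

/* Interpret the command list once the RUN bit is set; halt at STOP. */
static void dbdma_run(const DBDMACmd *list)
{
    int pos = 0;
    for (;;) {
        const DBDMACmd *cmd = &list[pos];
        if (cmd->opcode == OP_STOP) {
            break;                    /* channel halts until kicked again */
        }
        printf("cmd %d: move 0x%x bytes at 0x%08x\n",
               pos, (unsigned)cmd->req_count, (unsigned)cmd->address);
        if (cmd->branch) {            /* plus a status condition test */
            pos = cmd->target;        /* conditional branch taken */
        } else {
            pos++;                    /* fall through to the next command */
        }
    }
}

int main(void)
{
    DBDMACmd list[] = {
        { OP_INPUT_MORE, 0, 0xfffe, 0x1000,  0 },
        { OP_INPUT_LAST, 0, 0x0004, 0x21000, 0 },
        { OP_STOP,       0, 0,      0,       0 },
    };
    dbdma_run(list);
    return 0;
}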

> The one important point I'm wondering about is why you call
> dma_bdrv_read() with a single 0xfffe QEMUSGList. Shouldn't it really be
> called with a QEMUSGList { 0xfffe, 0xfffe, 0x4 }, which should enable
> dma-helpers.c to do the right thing?

Hmmm I guess you could perhaps scan down the command list from the 
current position looking for all INPUT/OUTPUT commands until the next 
STOP command, and maybe build up a single QEMUSGList from that? I'm not 
sure exactly how robust that would be with the conditional branching 
though - Alex?
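
Something along these lines, as a self-contained sketch (the types are
made up for illustration; the real code would use QEMUSGList and the
actual DBDMA command layout):

#include <stdint.h>
#include <stdio.h>

enum { OP_INPUT_MORE, OP_INPUT_LAST, OP_STOP };

typedef struct { uint8_t op; uint32_t addr; uint32_t len; } Cmd;
typedef struct { uint32_t addr; uint32_t len; } SGEntry;

/* Linear scan from the current command to the next STOP, collecting
 * every chunk into one scatter-gather list. A conditional branch in
 * the middle would invalidate this, which is the robustness worry. */
static int build_sg(const Cmd *c, SGEntry *sg, int max)
{
    int n = 0;
    for (; c->op != OP_STOP && n < max; c++) {
        sg[n].addr = c->addr;
        sg[n].len = c->len;
        n++;
    }
    return n;
}

int main(void)
{
    /* The Darwin pattern: 0xfffe + 0xfffe + 0x4 = 0x20000 bytes. */
    Cmd list[] = {
        { OP_INPUT_MORE, 0x10000, 0xfffe },
        { OP_INPUT_MORE, 0x1fffe, 0xfffe },
        { OP_INPUT_LAST, 0x2fffc, 0x0004 },
        { OP_STOP,       0,       0 },
    };
    SGEntry sg[8];
    int n = build_sg(list, sg, 8);

    for (int i = 0; i < n; i++) {
        printf("sg[%d]: addr=0x%x len=0x%x\n", i,
               (unsigned)sg[i].addr, (unsigned)sg[i].len);
    }
    return 0;
}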

> In any case, it would be good if you could prepare a (yet failing) qtest
> case that demonstrates how DMA works on this controller and what the
> problematic requests look like.

I'll see what I can do; however, I've not really looked at qtest before, 
so it could take a little time. In the meantime, you can easily see 
these transfers by booting an old Darwin installation ISO.

>> It seems that the DMA API dma_bdrv_read()/dma_bdrv_write() can't
>> handle unaligned transfers in this way, yet I think there is a
>> better solution for this that doesn't mix DMA/non-DMA APIs in this
>> manner. I'd like to try and come up with a better solution, but
>> there seems to be a mix of synchronous/asynchronous/co-routine block
>> APIs that could be used.
>>
>> So my question is: what do you think is the best way to approach
>> solving the unaligned data access for MACIO using a DMA-friendly
>> API?
>
> First, as I said above, I'd like to understand why you need to go with
> unaligned values into the DMA API. Real hardware also only works on a
> sector level.

The main culprit for these transfers is Darwin, which limits large 
transfers to 0xfffe bytes (see http://searchcode.com/codesearch/view/23337208 
line 382). Hence most large disk transactions get broken down into 
irregularly-sized chunks, which highlights this issue.
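
Spelling out the arithmetic for the example above:

  256 sectors x 512 bytes = 0x20000 bytes
  0x20000 = 0xfffe + 0xfffe + 0x4    (chunks under the 0xfffe cap)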

> The block layer is really designed for working with whole sectors. The
> only functions dealing with byte-aligned requests are bdrv_pread/pwrite.
> They are synchronous (i.e. block the vcpu while running) and writes are
> slow (because they first read the whole sector, copy in the modified
> part, and write out the whole sector again), so you want to avoid it.

Yeah, I figured this wasn't the most efficient way of doing it. The 
reason for asking the question was that I'm still struggling with some 
kind of timing/threading issue with Alex's work here, and I'm wondering 
if making the unaligned DMA requests more "atomic" by not mixing 
DMA/non-DMA/synchronous APIs will help solve the issue or not.


ATB,

Mark.


* Re: [Qemu-devel] Possibility of unaligned DMA accesses via the QEMU DMA API?
From: Alexander Graf @ 2013-07-17 12:59 UTC
  To: Mark Cave-Ayland; +Cc: Kevin Wolf, qemu-devel


On 17.07.2013, at 14:52, Mark Cave-Ayland wrote:

> On 17/07/13 09:16, Kevin Wolf wrote:
> 
> Hi Kevin,
> 
> Thanks for the reply - CC to qemu-devel as requested.
> 
>>> I've been testing some of Alex Graf's patches for running Darwin
>>> under QEMU PPC and have been experiencing some timeout problems on
>>> block devices. My attention is drawn to this commit in particular:
>>> https://github.com/qemu/qemu/commit/80fc95d8bdaf3392106b131a97ca701fd374489a.
>>> 
>>> The reason for this commit is that Darwin programs the DBDMA
>>> controller to transfer data from the ATA FIFO in chunks that aren't
>>> sector aligned, e.g. the ATA command requests 0x10000 (256 sectors)
>>> but programs the DMA engine to transfer the data to memory as 3
>>> chunks of 0xfffe, 0xfffe and 0x4 bytes.
>> 
>> I'm not familiar with how DMA works for the macio IDE device. Do you
>> have any pointers to specs or something?
> 
> It works by setting up a DMA descriptor table (a list of commands) whose entries are then "executed" when the RUN status bit is set, until a STOP command is reached. Things are slightly more complicated in that commands can have conditional branches set on them.
> 
>> The one important point I'm wondering about is why you call
>> dma_bdrv_read() with a single 0xfffe QEMUSGList. Shouldn't it really be
>> called with a QEMUSGList { 0xfffe, 0xfffe, 0x4 }, which should enable
>> dma-helpers.c to do the right thing?
> 
> Hmmm I guess you could perhaps scan down the command list from the current position looking for all INPUT/OUTPUT commands until the next STOP command, and maybe build up a single QEMUSGList from that? I'm not sure exactly how robust that would be with the conditional branching though - Alex?

It'd at least be vastly different from how real hardware works, yes. We'd basically have to throw away the current interpretation code and instead emulate the device based on assumptions.

> 
>> In any case, it would be good if you could prepare a (yet failing) qtest
>> case that demonstrates how DMA works on this controller and what the
>> problematic requests look like.
> 
> I'll see what I can do; however, I've not really looked at qtest before, so it could take a little time. In the meantime, you can easily see these transfers by booting an old Darwin installation ISO.
> 
>>> It seems that the DMA API dma_bdrv_read()/dma_bdrv_write() can't
>>> handle unaligned transfers in this way, yet I think there is a
>>> better solution for this that doesn't mix DMA/non-DMA APIs in this
>>> manner. I'd like to try and come up with a better solution, but
>>> there seems to be a mix of synchronous/asynchronous/co-routine block
>>> APIs that could be used.
>>> 
>>> So my question is: what do you think is the best way to approach
>>> solving the unaligned data access for MACIO using a DMA-friendly
>>> API?
>> 
>> First, as I said above, I'd like to understand why you need to go with
>> unaligned values into the DMA API. Real hardware also only works on a
>> sector level.
> 
> The main culprit for these transfers is Darwin, which limits large transfers to 0xfffe bytes (see http://searchcode.com/codesearch/view/23337208 line 382). Hence most large disk transactions get broken down into irregularly-sized chunks, which highlights this issue.

The main issue is that we're dealing with 3 separate pieces of hardware here. There is the IDE controller, which works on a sector level. And then there's the DMA controller, which fetches data from the IDE controller byte-wise (from what I understand). Both work independently, but we try to shoehorn both into the same callback.

> 
>> The block layer is really designed for working with whole sectors. The
>> only functions dealing with byte-aligned requests are bdrv_pread/pwrite.
>> They are synchronous (i.e. block the vcpu while running) and writes are
>> slow (because they first read the whole sector, copy in the modified
>> part, and write out the whole sector again), so you want to avoid it.
> 
> Yeah, I figured this wasn't the most efficient way of doing it. The reason for asking the question was that I'm still struggling with some kind of timing/threading issue with Alex's work here, and I'm wondering if making the unaligned DMA requests more "atomic" by not mixing DMA/non-DMA/synchronous APIs will help solve the issue or not.

I don't think it'll make a difference. But I'd be more than happy to design this more properly too - the current code is vastly ugly.


Alex


* Re: [Qemu-devel] Possibility of unaligned DMA accesses via the QEMU DMA API?
From: Kevin Wolf @ 2013-07-17 13:35 UTC
  To: Alexander Graf; +Cc: Mark Cave-Ayland, qemu-devel

On 17.07.2013, at 14:59, Alexander Graf wrote:
> 
> On 17.07.2013, at 14:52, Mark Cave-Ayland wrote:
> 
> > On 17/07/13 09:16, Kevin Wolf wrote:
> > 
> > Hi Kevin,
> > 
> > Thanks for the reply - CC to qemu-devel as requested.
> > 
> >>> I've been testing some of Alex Graf's patches for running Darwin
> >>> under QEMU PPC and have been experiencing some timeout problems on
> >>> block devices. My attention is drawn to this commit in particular:
> >>> https://github.com/qemu/qemu/commit/80fc95d8bdaf3392106b131a97ca701fd374489a.
> >>> 
> >>> The reason for this commit is that Darwin programs the DBDMA
> >>> controller to transfer data from the ATA FIFO in chunks that aren't
> >>> sector aligned, e.g. the ATA command requests 0x10000 (256 sectors)
> >>> but programs the DMA engine to transfer the data to memory as 3
> >>> chunks of 0xfffe, 0xfffe and 0x4 bytes.
> >> 
> >> I'm not familiar with how DMA works for the macio IDE device. Do you
> >> have any pointers to specs or something?
> > 
> > It works by setting up a DMA descriptor table (a list of commands) whose entries are then "executed" when the RUN status bit is set, until a STOP command is reached. Things are slightly more complicated in that commands can have conditional branches set on them.
> > 
> >> The one important point I'm wondering about is why you call
> >> dma_bdrv_read() with a single 0xfffe QEMUSGList. Shouldn't it really be
> >> called with a QEMUSGList { 0xfffe, 0xfffe, 0x4 }, which should enable
> >> dma-helpers.c to do the right thing?
> > 
> > Hmmm I guess you could perhaps scan down the command list from the current position looking for all INPUT/OUTPUT commands until the next STOP command, and maybe build up a single QEMUSGList from that? I'm not sure exactly how robust that would be with the conditional branching though - Alex?
> 
> It'd at least be vastly different from how real hardware works, yes. We'd basically have to throw away the current interpretation code and instead emulate the device based on assumptions.

Okay, so I've had a quick look at that DMA controller, and it seems that
for a complete emulation, there's no way around using a bounce buffer
(and calling directly into the block layer instead of using
dma-helpers.c) for the general case.

You can have a fast path that is triggered if one or more directly
following INPUT/OUTPUT commands cover the whole IDE command, and that
creates a QEMUSGList as described above and uses dma-helpers.c to
implement zero-copy requests. I suspect that your Darwin requests would
actually fall into this category.
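
As a sketch, the fast-path test could look like this (illustrative
types only, not the real IDE or DBDMA structures):

#include <stdbool.h>
#include <stdint.h>

enum { OP_INPUT_MORE, OP_INPUT_LAST, OP_STOP };
typedef struct { uint8_t op; uint32_t len; } Cmd;

/* The zero-copy path applies only when directly following INPUT/OUTPUT
 * commands add up to exactly the IDE command's length; anything else
 * falls back to a bounce buffer and direct block-layer calls. */
static bool covers_ide_command(const Cmd *c, uint64_t ide_len)
{
    uint64_t total = 0;
    for (; c->op != OP_STOP; c++) {
        total += c->len;
        if (total == ide_len) {
            return true;    /* fast path: build a QEMUSGList, zero copy */
        }
    }
    return false;           /* partial coverage: general slow path */
}

int main(void)
{
    Cmd list[] = { { OP_INPUT_MORE, 0xfffe }, { OP_INPUT_MORE, 0xfffe },
                   { OP_INPUT_LAST, 0x0004 }, { OP_STOP, 0 } };
    return covers_ide_command(list, 0x20000) ? 0 : 1;
}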

Essentially I think Alex's patches are doing something similar, just not
implementing the complete DMA controller feature set and with the
regular slow path hacked as additional code into the fast path. So the
code could be cleaner, it could use asynchronous block layer functions
and handle errors, and it could be more complete, but at the end of
the day you'd still have some fast-path zero-copy I/O and some calls
into the block layer using bounce buffers.

> > The main culprit for these transfers is Darwin, which limits large transfers to 0xfffe bytes (see http://searchcode.com/codesearch/view/23337208 line 382). Hence most large disk transactions get broken down into irregularly-sized chunks, which highlights this issue.
> 
> The main issue is that we're dealing with 3 separate pieces of hardware here. There is the IDE controller, which works on a sector level. And then there's the DMA controller, which fetches data from the IDE controller byte-wise (from what I understand). Both work independently, but we try to shoehorn both into the same callback.

But this is not really visible to software. At some point the bytes are
gathered until they fill up at least a full sector, and only then are they
written to disk. The emulation must do the same.
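
That is, something like this sector-gathering sketch (simplified; the
real PIO code in core.c is the reference):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SECTOR_SIZE 512

typedef struct {
    uint8_t buf[SECTOR_SIZE];
    size_t  fill;               /* bytes gathered so far */
} SectorGather;

/* Accept an arbitrary-sized chunk from the DMA engine; a sector goes
 * out to disk each time 512 bytes have accumulated. */
static void gather(SectorGather *g, const uint8_t *data, size_t len)
{
    while (len > 0) {
        size_t n = SECTOR_SIZE - g->fill;
        if (n > len) {
            n = len;
        }
        memcpy(g->buf + g->fill, data, n);
        g->fill += n;
        data += n;
        len -= n;
        if (g->fill == SECTOR_SIZE) {
            printf("flush one full sector\n");  /* block-layer write here */
            g->fill = 0;
        }
    }
}

int main(void)
{
    static uint8_t chunk[0x300];          /* 768 bytes, not sector aligned */
    SectorGather g = { .fill = 0 };
    gather(&g, chunk, sizeof(chunk));     /* flushes 1 sector, 256 bytes kept */
    gather(&g, chunk, sizeof(chunk));     /* flushes 2 more, buffer empty */
    return 0;
}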

Kevin


* Re: [Qemu-devel] Possibility of unaligned DMA accesses via the QEMU DMA API?
From: Mark Cave-Ayland @ 2013-07-17 20:12 UTC
  To: Kevin Wolf; +Cc: Alexander Graf, qemu-devel

On 17/07/13 14:35, Kevin Wolf wrote:

> Okay, so I've had a quick look at that DMA controller, and it seems that
> for a complete emulation, there's no way around using a bounce buffer
> (and calling directly into the block layer instead of using
> dma-helpers.c) for the general case.
>
> You can have a fast path that is triggered if one or more directly
> following INPUT/OUTPUT commands cover the whole IDE command, and that
> creates a QEMUSGList as described above and uses dma-helpers.c to
> implement zero-copy requests. I suspect that your Darwin requests would
> actually fall into this category.
>
> Essentially I think Alex's patches are doing something similar, just not
> implementing the complete DMA controller feature set and with the
> regular slow path hacked as additional code into the fast path. So the
> code could be cleaner, it could use asynchronous block layer functions
> and handle errors, and it could be more complete, but at the end of
> the day you'd still have some fast-path zero-copy I/O and some calls
> into the block layer using bounce buffers.

I think the key concept to understand here is: at what point does the 
data hit the disk? From the comments in various parts of Darwin/Linux, 
it could be understood that the DMA transfers are between memory and the 
ATA drive *buffer*, so for writes especially there is no guarantee that 
they even hit the disk until some point in the future, unless of course 
the FLUSH flag is set in the control register.

So part of me thinks that maybe we are over-thinking this and we 
should just go with Kevin's original suggestion: what about if we start 
a new QEMUSGList for each IDE transfer, and just keep appending 
QEMUSGList entries until we find an OUTPUT_LAST/INPUT_LAST command?

Why is this valid? We can respond with a complete status for the 
intermediate INPUT_MORE/OUTPUT_MORE commands without touching the disk 
because all that guarantees is that data has been passed between memory 
and the drive *buffer* - not that it has actually hit the disk. And what 
is the point of having explicit _LAST commands if they aren't used to 
signify completion of the whole transfer between drive and memory?


ATB,

Mark.


* Re: [Qemu-devel] Possibility of unaligned DMA accesses via the QEMU DMA API?
From: Kevin Wolf @ 2013-07-18  7:41 UTC
  To: Mark Cave-Ayland; +Cc: Alexander Graf, qemu-devel

On 17.07.2013, at 22:12, Mark Cave-Ayland wrote:
> On 17/07/13 14:35, Kevin Wolf wrote:
> 
> >Okay, so I've had a quick look at that DMA controller, and it seems that
> >for a complete emulation, there's no way around using a bounce buffer
> >(and calling directly into the block layer instead of using
> >dma-helpers.c) for the general case.
> >
> >You can have a fast path that is triggered if one or more directly
> >following INPUT/OUTPUT commands cover the whole IDE command, and that
> >creates a QEMUSGList as described above and uses dma-helpers.c to
> >implement zero-copy requests. I suspect that your Darwin requests would
> >actually fall into this category.
> >
> >Essentially I think Alex's patches are doing something similar, just not
> >implementing the complete DMA controller feature set and with the
> >regular slow path hacked as additional code into the fast path. So the
> >code could be cleaner, it could use asynchronous block layer functions
> >and handle errors, and it could be more complete, but at the end of
> >the day you'd still have some fast-path zero-copy I/O and some calls
> >into the block layer using bounce buffers.
> 
> I think the key concept to understand here is: at what point does the
> data hit the disk? From the comments in various parts of
> Darwin/Linux, it could be understood that the DMA transfers are
> between memory and the ATA drive *buffer*, so for writes especially
> there is no guarantee that they even hit the disk until some point
> in the future, unless of course the FLUSH flag is set in the control
> register.
> 
> So part of me thinks that maybe we are over-thinking this
> and we should just go with Kevin's original suggestion: what about
> if we start a new QEMUSGList for each IDE transfer, and just keep
> appending QEMUSGList entries until we find an OUTPUT_LAST/INPUT_LAST
> command?
> 
> Why is this valid? We can respond with a complete status for the
> intermediate INPUT_MORE/OUTPUT_MORE commands without touching the
> disk because all that guarantees is that data has been passed
> between memory and the drive *buffer* - not that it has actually hit
> the disk. And what is the point of having explicit _LAST commands if
> they aren't used to signify completion of the whole transfer between
> drive and memory?

I don't think there is even a clear relation between the DMA controller
status and whether the data is on disk or not. It's the IDE register's
job to tell the driver when a request has completed. The DMA controller
is only responsible for getting the data from the RAM to the device,
which might start doing a write only after it has received all data and
completed the DMA operation. (cf. PIO operation in core.c where the
bytes are gathered in a bounce buffer and only when the last byte
arrives, the whole sector is written out)

What I would do, however, is to complete even the INPUT/OUTPUT_MORE
commands only at the end of the whole request. This is definitely
allowed behaviour, and it ensures that a memory region isn't already
reused by the OS while e.g. a write request is still running and taking
data from this memory. We should complete the DMA command only once
we no longer touch the memory.
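
As a sketch of that deferred completion (made-up structures, not the
QEMU API):

#include <stdint.h>
#include <stdio.h>

#define MAX_DESC 16

typedef struct {
    int       pending[MAX_DESC]; /* descriptor indices of the running request */
    int       count;
    uint32_t *xfer_status;       /* guest-visible per-descriptor status */
} InFlight;

/* An INPUT/OUTPUT_MORE (or the final _LAST) was consumed: remember it,
 * but do not write its completion status back yet. */
static void defer_completion(InFlight *r, int desc_index)
{
    if (r->count < MAX_DESC) {
        r->pending[r->count++] = desc_index;
    }
}

/* AIO completion callback: the block layer no longer touches guest
 * memory, so all deferred descriptors may now be completed at once. */
static void request_done(InFlight *r)
{
    for (int i = 0; i < r->count; i++) {
        r->xfer_status[r->pending[i]] = 1;  /* illustrative "done" marker */
    }
    r->count = 0;
}

int main(void)
{
    uint32_t status[MAX_DESC] = { 0 };
    InFlight r = { .count = 0, .xfer_status = status };

    defer_completion(&r, 0);   /* OUTPUT_MORE */
    defer_completion(&r, 1);   /* OUTPUT_MORE */
    defer_completion(&r, 2);   /* OUTPUT_LAST */
    request_done(&r);          /* all three complete together */
    printf("descriptor 0 status: %u\n", (unsigned)status[0]);
    return 0;
}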

Kevin


* Re: [Qemu-devel] Possibility of unaligned DMA accesses via the QEMU DMA API?
From: Alexander Graf @ 2013-07-18 13:44 UTC
  To: Kevin Wolf; +Cc: Mark Cave-Ayland, qemu-devel


On 18.07.2013, at 09:41, Kevin Wolf wrote:

> On 17.07.2013, at 22:12, Mark Cave-Ayland wrote:
>> On 17/07/13 14:35, Kevin Wolf wrote:
>> 
>>> Okay, so I've had a quick look at that DMA controller, and it seems that
>>> for a complete emulation, there's no way around using a bounce buffer
>>> (and calling directly into the block layer instead of using
>>> dma-helpers.c) for the general case.
>>> 
>>> You can have a fast path that is triggered if one or more directly
>>> following INPUT/OUTPUT commands cover the whole IDE command, and that
>>> creates a QEMUSGList as described above and uses dma-helpers.c to
>>> implement zero-copy requests. I suspect that your Darwin requests would
>>> actually fall into this category.
>>> 
>>> Essentially I think Alex's patches are doing something similar, just not
>>> implementing the complete DMA controller feature set and with the
>>> regular slow path hacked as additional code into the fast path. So the
>>> code could be cleaner, it could use asynchronous block layer functions
>>> and handle errors, and it could be more complete, but at the end of
>>> the day you'd still have some fast-path zero-copy I/O and some calls
>>> into the block layer using bounce buffers.
>> 
>> I think the key concept to understand here is: at what point does the
>> data hit the disk? From the comments in various parts of
>> Darwin/Linux, it could be understood that the DMA transfers are
>> between memory and the ATA drive *buffer*, so for writes especially
>> there is no guarantee that they even hit the disk until some point
>> in the future, unless of course the FLUSH flag is set in the control
>> register.
>> 
>> So part of me thinks that maybe we are over-thinking this
>> and we should just go with Kevin's original suggestion: what about
>> if we start a new QEMUSGList for each IDE transfer, and just keep
>> appending QEMUSGList entries until we find an OUTPUT_LAST/INPUT_LAST
>> command?
>> 
>> Why is this valid? We can respond with a complete status for the
>> intermediate INPUT_MORE/OUTPUT_MORE commands without touching the
>> disk because all that guarantees is that data has been passed
>> between memory and the drive *buffer* - not that it has actually hit
>> the disk. And what is the point of having explicit _LAST commands if
>> they aren't used to signify completion of the whole transfer between
>> drive and memory?
> 
> I don't think there is even a clear relation between the DMA controller
> status and whether the data is on disk or not. It's the IDE register's
> job to tell the driver when a request has completed. The DMA controller
> is only responsible for getting the data from the RAM to the device,
> which might start doing a write only after it has received all data and
> completed the DMA operation. (cf. PIO operation in core.c where the
> bytes are gathered in a bounce buffer and only when the last byte
> arrives, the whole sector is written out)
> 
> What I would do, however, is to complete even the INPUT/OUTPUT_MORE
> commands only at the end of the whole request. This is definitely
> allowed behaviour, and it ensures that a memory region isn't already
> reused by the OS while e.g. a write request is still running and taking
> data from this memory. We should complete the DMA command only once
> we no longer touch the memory.

Yes, that's the version that I described as "throw away almost all of today's code and rewrite it" :). Keep in mind that the same DMA controller can be used for Ethernet, so coupling it very tightly with IDE doesn't sound overly appealing to me either.


Alex


* Re: [Qemu-devel] Possibility of unaligned DMA accesses via the QEMU DMA API?
From: Kevin Wolf @ 2013-07-18 13:55 UTC
  To: Alexander Graf; +Cc: Mark Cave-Ayland, qemu-devel

On 18.07.2013, at 15:44, Alexander Graf wrote:
> 
> On 18.07.2013, at 09:41, Kevin Wolf wrote:
> 
> > On 17.07.2013, at 22:12, Mark Cave-Ayland wrote:
> >> On 17/07/13 14:35, Kevin Wolf wrote:
> >> 
> >>> Okay, so I've had a quick look at that DMA controller, and it seems that
> >>> for a complete emulation, there's no way around using a bounce buffer
> >>> (and calling directly into the block layer instead of using
> >>> dma-helpers.c) for the general case.
> >>> 
> >>> You can have a fast path that is triggered if one or more directly
> >>> following INPUT/OUTPUT commands cover the whole IDE command, and that
> >>> creates a QEMUSGList as described above and uses dma-helpers.c to
> >>> implement zero-copy requests. I suspect that your Darwin requests would
> >>> actually fall into this category.
> >>> 
> >>> Essentially I think Alex's patches are doing something similar, just not
> >>> implementing the complete DMA controller feature set and with the
> >>> regular slow path hacked as additional code into the fast path. So the
> >>> code could be cleaner, it could use asynchronous block layer functions
> >>> and handle errors, and it could be more complete, but at the end of
> >>> the day you'd still have some fast-path zero-copy I/O and some calls
> >>> into the block layer using bounce buffers.
> >> 
> >> I think the key concept to understand here is: at what point does the
> >> data hit the disk? From the comments in various parts of
> >> Darwin/Linux, it could be understood that the DMA transfers are
> >> between memory and the ATA drive *buffer*, so for writes especially
> >> there is no guarantee that they even hit the disk until some point
> >> in the future, unless of course the FLUSH flag is set in the control
> >> register.
> >> 
> >> So part of me thinks that maybe we are over-thinking this
> >> and we should just go with Kevin's original suggestion: what about
> >> if we start a new QEMUSGList for each IDE transfer, and just keep
> >> appending QEMUSGList entries until we find an OUTPUT_LAST/INPUT_LAST
> >> command?
> >> 
> >> Why is this valid? We can respond with a complete status for the
> >> intermediate INPUT_MORE/OUTPUT_MORE commands without touching the
> >> disk because all that guarantees is that data has been passed
> >> between memory and the drive *buffer* - not that it has actually hit
> >> the disk. And what is the point of having explicit _LAST commands if
> >> they aren't used to signify completion of the whole transfer between
> >> drive and memory?
> > 
> > I don't think there is even a clear relation between the DMA controller
> > status and whether the data is on disk or not. It's the IDE register's
> > job to tell the driver when a request has completed. The DMA controller
> > is only responsible for getting the data from the RAM to the device,
> > which might start doing a write only after it has received all data and
> > completed the DMA operation. (cf. PIO operation in core.c where the
> > bytes are gathered in a bounce buffer and only when the last byte
> > arrives, the whole sector is written out)
> > 
> > What I would do, however, is to complete even the INPUT/OUTPUT_MORE
> > commands only at the end of the whole request. This is definitely
> > allowed behaviour, and it ensures that a memory region isn't already
> > reused by the OS while e.g. a write request is still running and taking
> > data from this memory. We should complete the DMA command only once
> > we no longer touch the memory.
> 
> Yes, that's the version that I described as "throw away almost all of
> today's code and rewrite it" :).

Well, Mark didn't ask me what's easy to implement, but what's the Right
Thing to do. :-)

> Keep in mind that the same DMA controller can be used for Ethernet, so
> coupling it very tightly with IDE doesn't sound overly appealing to me
> either.

Taking this into consideration will make it even harder, but of course,
once designed right, it's also the most useful approach.

Kevin


* Re: [Qemu-devel] Possibility of unaligned DMA accesses via the QEMU DMA API?
From: Mark Cave-Ayland @ 2013-07-22  8:04 UTC
  To: Alexander Graf; +Cc: Kevin Wolf, qemu-devel

On 18/07/13 14:44, Alexander Graf wrote:

>> What I would do, however, is to complete even the INPUT/OUTPUT_MORE
>> commands only at the end of the whole request. This is definitely
>> allowed behaviour, and it ensures that a memory region isn't already
>> reused by the OS while e.g. a write request is still running and taking
>> data from this memory. We should complete the DMA command only once
>> we no longer touch the memory.
>
> Yes, that's the version that I described as "throw away almost all of today's code and rewrite it" :).
> Keep in mind that the same DMA controller can be used for Ethernet, so coupling it very tightly with
> IDE doesn't sound overly appealing to me either.

I had a go at implementing this yesterday afternoon and I managed 
to come up with something that boots both Linux and Darwin. I've 
uploaded it to github so you can take a look at it here: 
https://github.com/mcayland/qemu/tree/mca-macio.

On the plus side, the code seems a lot more readable; but on the 
downside my heisenbug is *still* present, even in this version of the 
code (see more below). Does the revised version look good enough to you 
both? If so, I'll tidy it up a bit further and post it to the qemu-devel 
list.

In order to ensure that things aren't too tightly coupled to IDE, I've 
introduced a new ready() function pointer which is called from the 
modified DBDMA code. Even though it's currently not used for anything 
other than IDE in QEMU, I think this would enable the same controller to 
be used with Ethernet if required.
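
The rough shape of the hook is as follows (heavily simplified here for
illustration; the branch above has the actual code):

#include <stdio.h>

typedef struct DBDMAChannel DBDMAChannel;

struct DBDMAChannel {
    /* Called by the DBDMA core when the channel is kicked; the device
     * behind the channel decides what "ready" means for it. */
    void (*ready)(DBDMAChannel *ch);
    void *opaque;   /* device-private state (IDE, Ethernet, ...) */
};

/* IDE registers its own implementation... */
static void ide_dbdma_ready(DBDMAChannel *ch)
{
    printf("IDE: transfer more data for device state %p\n", ch->opaque);
}

/* ...and the DBDMA core stays device-agnostic. */
static void dbdma_kick(DBDMAChannel *ch)
{
    if (ch->ready) {
        ch->ready(ch);
    }
}

int main(void)
{
    DBDMAChannel ch = { .ready = ide_dbdma_ready, .opaque = NULL };
    dbdma_kick(&ch);
    return 0;
}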

A couple of stylistic points for the patch:

1) Can I swap m->aiocb for s->bus->dma->aiocb? This seems to be the 
field used by the internal IDE core.

2) Should the bdrv_acct_start() calls be removed from 
pmac_ide_transfer()? It looks as if they are already handled in 
ide_sector_start_dma() and ide_atapi_cmd_read_dma().

Now back to the heisenbug: for Kevin's benefit, the reason I started 
looking at this is that I have a bug with Alex's existing patches in 
that Aurelien's test Linux PPC image fails to boot with DMA timeout 
errors (see http://www.ilande.co.uk/tmp/bad-ppc-linux.png). Sadly, if I 
try to debug it (by compiling with -O0 -g or adding debug printf() 
statements) then everything works as normal :/

On the plus side, with my revised version I can trigger the bug fairly 
easily by commenting out DBDMA_kick in ide_dbdma_start() and enabling 
DEBUG_IDE_ATAPI in hw/ide/internal.h. Note that while the timeout occurs 
on the CDROM the majority of the time, I've still seen it very 
occasionally on the HD device too.

With DEBUG_IDE_ATAPI enabled the tail of the debug output I can get 
without the bug disappearing looks like this:

reply: tx_size=36 elem_tx_size=0 index=0
byte_count_limit=36
status=0x58
reply: tx_size=0 elem_tx_size=0 index=36
status=0x50
ATAPI limit=0x8 packet: 46 00 00 00 00 00 00 00 08 00 00 00
reply: tx_size=8 elem_tx_size=0 index=0
byte_count_limit=8
status=0x58
reply: tx_size=0 elem_tx_size=0 index=8
status=0x50
ATAPI limit=0x14 packet: 46 01 00 00 00 00 00 00 14 00 00 00
reply: tx_size=20 elem_tx_size=0 index=0
byte_count_limit=20
status=0x58
reply: tx_size=0 elem_tx_size=0 index=20
status=0x50
ATAPI limit=0xc packet: 43 00 00 00 00 00 01 00 0c 00 00 00
reply: tx_size=12 elem_tx_size=0 index=0
byte_count_limit=12
status=0x58
reply: tx_size=0 elem_tx_size=0 index=12
status=0x50
ATAPI limit=0x14 packet: 43 00 00 00 00 00 01 00 14 00 00 00
reply: tx_size=20 elem_tx_size=0 index=0
byte_count_limit=20
status=0x58
reply: tx_size=0 elem_tx_size=0 index=20
status=0x50
ATAPI limit=0xc packet: 43 00 01 00 00 00 00 00 0c 00 00 00
reply: tx_size=12 elem_tx_size=0 index=0
byte_count_limit=12
status=0x58
reply: tx_size=0 elem_tx_size=0 index=12
status=0x50
ATAPI limit=0x20 packet: 51 00 00 00 00 00 00 00 20 00 00 00
Data ready for CD device: 20 bytes

When the bug appears, the guest doesn't bother to program the DMA 
controller to finish the transfer, which makes me wonder whether one of 
the IDE status flags is not being cleared correctly. Sadly, if I enable 
DEBUG_IDE then of course the delay introduced makes everything work 
again :/

Interestingly enough, the final 0x51 command packet above has s->lba set 
to -1. Given that the code tries to use s->lba as the DMA sector number, 
this is definitely a bug - what should we actually be doing in this case?


ATB,

Mark.

