Discussion:
[Xen-devel] [PATCH v4 00/17] x86/hvm: I/O emulation cleanup and fix
Paul Durrant
2015-06-24 11:24:32 UTC
This patch series re-works much of the code involved in emulation of port
and memory mapped I/O for HVM guests.

The code has become very convoluted and, at least by inspection, certain
emulations will apparently malfunction.

The series is broken down into 17 patches (which are also available in
my xenbits repo: http://xenbits.xen.org/gitweb/?p=people/pauldu/xen.git
on the emulation27 branch) as follows:

0001-x86-hvm-simplify-hvmemul_do_io.patch
0002-x86-hvm-remove-hvm_io_pending-check-in-hvmemul_do_io.patch
0003-x86-hvm-remove-extraneous-parameter-from-hvmtrace_io.patch
0004-x86-hvm-remove-multiple-open-coded-chunking-loops.patch
0005-x86-hvm-re-name-struct-hvm_mmio_handler-to-hvm_mmio_.patch
0006-x86-hvm-unify-internal-portio-and-mmio-intercepts.patch
0007-x86-hvm-add-length-to-mmio-check-op.patch
0008-x86-hvm-unify-dpci-portio-intercept-with-standard-po.patch
0009-x86-hvm-unify-stdvga-mmio-intercept-with-standard-mm.patch
0010-x86-hvm-revert-82ed8716b-fix-direct-PCI-port-I-O-emu.patch
0011-x86-hvm-only-call-hvm_io_assist-from-hvm_wait_for_io.patch
0012-x86-hvm-split-I-O-completion-handling-from-state-mod.patch
0013-x86-hvm-remove-HVMIO_dispatched-I-O-state.patch
0014-x86-hvm-remove-hvm_io_state-enumeration.patch
0015-x86-hvm-use-ioreq_t-to-track-in-flight-state.patch
0016-x86-hvm-always-re-emulate-I-O-from-a-buffer.patch
0017-x86-hvm-track-large-memory-mapped-accesses-by-buffer.patch

v2:
- Removed bogus assertion from patch 16
- Re-worked patch #17 after basic testing of back-port onto XenServer

v3:
- Addressed comments from Jan
- Re-ordered the series to bring a couple of the more trivial patches to
the front
- Backport to XenServer (4.5) now passing automated tests
- Tested on unstable with QEMU upstream and trad, with and without
HAP (to force shadow emulation)

v4:
- Removed previous patch #4 (make sure translated MMIO reads or
writes fall within a page) and rebased the rest of the series.
- Addressed Jan's comments on previous patch #5 (now patch #4)
Paul Durrant
2015-06-24 11:24:34 UTC
The hvm_io_pending() check is done at the wrong point (it is irrelevant
if the I/O is to be handled by the hypervisor) and its functionality can
be covered by returning X86EMUL_UNHANDLEABLE from hvm_send_assist_req()
instead.

This patch also removes the domain_crash() call from
hvm_send_assist_req(). Returning X86EMUL_UNHANDLEABLE allows the
higher layers of emulation to decide what to do instead.
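
As a minimal sketch of the resulting contract (stand-in enum values, not
Xen's real headers), the caller now treats hvm_send_assist_req()'s return
value as follows:

#include <assert.h>

/* Stand-in values; the real ones live in Xen's x86_emulate framework. */
enum x86emul_rc { X86EMUL_OKAY, X86EMUL_UNHANDLEABLE, X86EMUL_RETRY };

/*
 * After this patch, hvm_send_assist_req() returns:
 *   X86EMUL_RETRY         - ioreq handed to the device model; the caller
 *                           keeps its in-flight state and waits
 *   X86EMUL_OKAY          - shutdown deferral failed; the I/O is binned
 *   X86EMUL_UNHANDLEABLE  - bad ioreq state / event channel; the caller
 *                           (rather than domain_crash()) decides what to do
 */
static int caller_keeps_io_state(enum x86emul_rc rc)
{
    return rc == X86EMUL_RETRY;
}

int main(void)
{
    assert(caller_keeps_io_state(X86EMUL_RETRY));
    assert(!caller_keeps_io_state(X86EMUL_OKAY));
    assert(!caller_keeps_io_state(X86EMUL_UNHANDLEABLE));
    return 0;
}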

Signed-off-by: Paul Durrant <***@citrix.com>
Cc: Keir Fraser <***@xen.org>
Cc: Jan Beulich <***@suse.com>
Cc: Andrew Cooper <***@citrix.com>
---
xen/arch/x86/hvm/emulate.c | 10 ++--------
xen/arch/x86/hvm/hvm.c | 16 ++++++----------
xen/include/asm-x86/hvm/hvm.h | 2 +-
3 files changed, 9 insertions(+), 19 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 9d7af0c..b412302 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -121,12 +121,6 @@ static int hvmemul_do_io(
return X86EMUL_UNHANDLEABLE;
}

- if ( hvm_io_pending(curr) )
- {
- gdprintk(XENLOG_WARNING, "WARNING: io already pending?\n");
- return X86EMUL_UNHANDLEABLE;
- }
-
vio->io_state = (data_is_addr || dir == IOREQ_WRITE) ?
HVMIO_dispatched : HVMIO_awaiting_completion;
vio->io_size = size;
@@ -188,8 +182,8 @@ static int hvmemul_do_io(
}
else
{
- rc = X86EMUL_RETRY;
- if ( !hvm_send_assist_req(s, &p) )
+ rc = hvm_send_assist_req(s, &p);
+ if ( rc != X86EMUL_RETRY )
vio->io_state = HVMIO_none;
else if ( data_is_addr || dir == IOREQ_WRITE )
rc = X86EMUL_OKAY;
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index d5e5242..535d622 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -2605,7 +2605,7 @@ int hvm_buffered_io_send(ioreq_t *p)
return 1;
}

-bool_t hvm_send_assist_req(struct hvm_ioreq_server *s, ioreq_t *proto_p)
+int hvm_send_assist_req(struct hvm_ioreq_server *s, ioreq_t *proto_p)
{
struct vcpu *curr = current;
struct domain *d = curr->domain;
@@ -2613,7 +2613,7 @@ bool_t hvm_send_assist_req(struct hvm_ioreq_server *s, ioreq_t *proto_p)

ASSERT(s);
if ( unlikely(!vcpu_start_shutdown_deferral(curr)) )
- return 0; /* implicitly bins the i/o operation */
+ return X86EMUL_OKAY;

list_for_each_entry ( sv,
&s->ioreq_vcpu_list,
@@ -2628,14 +2628,14 @@ bool_t hvm_send_assist_req(struct hvm_ioreq_server *s, ioreq_t *proto_p)
{
gprintk(XENLOG_ERR, "device model set bad IO state %d\n",
p->state);
- goto crash;
+ break;
}

if ( unlikely(p->vp_eport != port) )
{
gprintk(XENLOG_ERR, "device model set bad event channel %d\n",
p->vp_eport);
- goto crash;
+ break;
}

proto_p->state = STATE_IOREQ_NONE;
@@ -2651,15 +2651,11 @@ bool_t hvm_send_assist_req(struct hvm_ioreq_server *s, ioreq_t *proto_p)
*/
p->state = STATE_IOREQ_READY;
notify_via_xen_event_channel(d, port);
- break;
+ return X86EMUL_RETRY;
}
}

- return 1;
-
- crash:
- domain_crash(d);
- return 0;
+ return X86EMUL_UNHANDLEABLE;
}

void hvm_complete_assist_req(ioreq_t *p)
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 77eeac5..57f9605 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -230,7 +230,7 @@ void hvm_vcpu_reset_state(struct vcpu *v, uint16_t cs, uint16_t ip);

struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
ioreq_t *p);
-bool_t hvm_send_assist_req(struct hvm_ioreq_server *s, ioreq_t *p);
+int hvm_send_assist_req(struct hvm_ioreq_server *s, ioreq_t *p);
void hvm_broadcast_assist_req(ioreq_t *p);
void hvm_complete_assist_req(ioreq_t *p);
--
1.7.10.4
Paul Durrant
2015-06-24 11:24:41 UTC
It's clear from the following check in hvmemul_rep_movs:

if ( sp2mt == p2m_mmio_direct || dp2mt == p2m_mmio_direct ||
(sp2mt == p2m_mmio_dm && dp2mt == p2m_mmio_dm) )
return X86EMUL_UNHANDLEABLE;

that mmio <-> mmio copy is not handled. This means the code in the
stdvga mmio intercept that explicitly handles mmio <-> mmio copy when
hvm_copy_to/from_guest_phys() fails is never going to be executed.

This patch therefore adds a check in hvmemul_do_io_addr() to make sure
mmio <-> mmio is disallowed and then registers standard mmio intercept ops
in stdvga_init().

With this patch, all MMIO and port I/O handled within Xen now goes through
hvm_process_io_intercept().

Signed-off-by: Paul Durrant <***@citrix.com>
Cc: Keir Fraser <***@xen.org>
Cc: Jan Beulich <***@suse.com>
Cc: Andrew Cooper <***@citrix.com>
---
xen/arch/x86/hvm/emulate.c | 9 +++
xen/arch/x86/hvm/intercept.c | 7 --
xen/arch/x86/hvm/stdvga.c | 173 +++++++++---------------------------------
xen/include/asm-x86/hvm/io.h | 1 -
4 files changed, 44 insertions(+), 146 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 9ced81b..4e2fdf1 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -267,6 +267,15 @@ static int hvmemul_acquire_page(unsigned long gmfn, struct page_info **page)
return X86EMUL_RETRY;
}

+ /* This code should not be reached if the gmfn is not RAM */
+ if ( p2m_is_mmio(p2mt) )
+ {
+ domain_crash(curr_d);
+
+ put_page(*page);
+ return X86EMUL_UNHANDLEABLE;
+ }
+
return X86EMUL_OKAY;
}

diff --git a/xen/arch/x86/hvm/intercept.c b/xen/arch/x86/hvm/intercept.c
index 5633959..625e585 100644
--- a/xen/arch/x86/hvm/intercept.c
+++ b/xen/arch/x86/hvm/intercept.c
@@ -279,13 +279,6 @@ int hvm_io_intercept(ioreq_t *p)
{
struct hvm_io_handler *handler;

- if ( p->type == IOREQ_TYPE_COPY )
- {
- int rc = stdvga_intercept_mmio(p);
- if ( (rc == X86EMUL_OKAY) || (rc == X86EMUL_RETRY) )
- return rc;
- }
-
handler = hvm_find_io_handler(p);

if ( handler == NULL )
diff --git a/xen/arch/x86/hvm/stdvga.c b/xen/arch/x86/hvm/stdvga.c
index dcd532a..639da6a 100644
--- a/xen/arch/x86/hvm/stdvga.c
+++ b/xen/arch/x86/hvm/stdvga.c
@@ -275,9 +275,10 @@ static uint8_t stdvga_mem_readb(uint64_t addr)
return ret;
}

-static uint64_t stdvga_mem_read(uint64_t addr, uint64_t size)
+static int stdvga_mem_read(struct vcpu *v, unsigned long addr,
+ unsigned long size, unsigned long *p_data)
{
- uint64_t data = 0;
+ unsigned long data = 0;

switch ( size )
{
@@ -313,7 +314,9 @@ static uint64_t stdvga_mem_read(uint64_t addr, uint64_t size)
break;
}

- return data;
+ *p_data = data;
+
+ return X86EMUL_OKAY;
}

static void stdvga_mem_writeb(uint64_t addr, uint32_t val)
@@ -424,8 +427,17 @@ static void stdvga_mem_writeb(uint64_t addr, uint32_t val)
}
}

-static void stdvga_mem_write(uint64_t addr, uint64_t data, uint64_t size)
+static int stdvga_mem_write(struct vcpu *v, unsigned long addr,
+ unsigned long size, unsigned long data)
{
+ ioreq_t p = { .type = IOREQ_TYPE_COPY,
+ .addr = addr,
+ .size = size,
+ .count = 1,
+ .dir = IOREQ_WRITE,
+ .data = data,
+ };
+
/* Intercept mmio write */
switch ( size )
{
@@ -460,153 +472,36 @@ static void stdvga_mem_write(uint64_t addr, uint64_t data, uint64_t size)
gdprintk(XENLOG_WARNING, "invalid io size: %"PRId64"\n", size);
break;
}
-}
-
-static uint32_t read_data;
-
-static int mmio_move(struct hvm_hw_stdvga *s, ioreq_t *p)
-{
- int i;
- uint64_t addr = p->addr;
- p2m_type_t p2mt;
- struct domain *d = current->domain;
- int step = p->df ? -p->size : p->size;

- if ( p->data_is_ptr )
- {
- uint64_t data = p->data, tmp;
-
- if ( p->dir == IOREQ_READ )
- {
- for ( i = 0; i < p->count; i++ )
- {
- tmp = stdvga_mem_read(addr, p->size);
- if ( hvm_copy_to_guest_phys(data, &tmp, p->size) !=
- HVMCOPY_okay )
- {
- struct page_info *dp = get_page_from_gfn(
- d, data >> PAGE_SHIFT, &p2mt, P2M_ALLOC);
- /*
- * The only case we handle is vga_mem <-> vga_mem.
- * Anything else disables caching and leaves it to qemu-dm.
- */
- if ( (p2mt != p2m_mmio_dm) || (data < VGA_MEM_BASE) ||
- ((data + p->size) > (VGA_MEM_BASE + VGA_MEM_SIZE)) )
- {
- if ( dp )
- put_page(dp);
- return 0;
- }
- ASSERT(!dp);
- stdvga_mem_write(data, tmp, p->size);
- }
- data += step;
- addr += step;
- }
- }
- else
- {
- for ( i = 0; i < p->count; i++ )
- {
- if ( hvm_copy_from_guest_phys(&tmp, data, p->size) !=
- HVMCOPY_okay )
- {
- struct page_info *dp = get_page_from_gfn(
- d, data >> PAGE_SHIFT, &p2mt, P2M_ALLOC);
- if ( (p2mt != p2m_mmio_dm) || (data < VGA_MEM_BASE) ||
- ((data + p->size) > (VGA_MEM_BASE + VGA_MEM_SIZE)) )
- {
- if ( dp )
- put_page(dp);
- return 0;
- }
- ASSERT(!dp);
- tmp = stdvga_mem_read(data, p->size);
- }
- stdvga_mem_write(addr, tmp, p->size);
- data += step;
- addr += step;
- }
- }
- }
- else if ( p->dir == IOREQ_WRITE )
- {
- for ( i = 0; i < p->count; i++ )
- {
- stdvga_mem_write(addr, p->data, p->size);
- addr += step;
- }
- }
- else
- {
- ASSERT(p->count == 1);
- p->data = stdvga_mem_read(addr, p->size);
- }
+ if ( !hvm_buffered_io_send(&p) )
+ return X86EMUL_UNHANDLEABLE;

- read_data = p->data;
- return 1;
+ return X86EMUL_OKAY;
}

-int stdvga_intercept_mmio(ioreq_t *p)
+static int stdvga_mem_check(struct vcpu *v, unsigned long addr,
+ unsigned long length)
{
- struct domain *d = current->domain;
- struct hvm_hw_stdvga *s = &d->arch.hvm_domain.stdvga;
- uint64_t start, end, count = p->count, size = p->size;
- int buf = 0, rc;
-
- if ( p->df )
- {
- start = (p->addr - (count - 1) * size);
- end = p->addr + size;
- }
- else
- {
- start = p->addr;
- end = p->addr + count * size;
- }
-
- if ( (start < VGA_MEM_BASE) || (end > (VGA_MEM_BASE + VGA_MEM_SIZE)) )
- return X86EMUL_UNHANDLEABLE;
-
- if ( p->size > 8 )
- {
- gdprintk(XENLOG_WARNING, "invalid mmio size %d\n", (int)p->size);
- return X86EMUL_UNHANDLEABLE;
- }
+ struct hvm_hw_stdvga *s = &v->domain->arch.hvm_domain.stdvga;
+ int rc;

spin_lock(&s->lock);

- if ( s->stdvga && s->cache )
- {
- switch ( p->type )
- {
- case IOREQ_TYPE_COPY:
- buf = mmio_move(s, p);
- if ( !buf )
- s->cache = 0;
- break;
- default:
- gdprintk(XENLOG_WARNING, "unsupported mmio request type:%d "
- "addr:0x%04x data:0x%04x size:%d count:%d state:%d "
- "isptr:%d dir:%d df:%d\n",
- p->type, (int)p->addr, (int)p->data, (int)p->size,
- (int)p->count, p->state,
- p->data_is_ptr, p->dir, p->df);
- s->cache = 0;
- }
- }
- else
- {
- buf = (p->dir == IOREQ_WRITE);
- }
-
- rc = (buf && hvm_buffered_io_send(p));
+ rc = s->stdvga && s->cache &&
+ (addr >= VGA_MEM_BASE) &&
+ ((addr + length) < (VGA_MEM_BASE + VGA_MEM_SIZE));

spin_unlock(&s->lock);

- return rc ? X86EMUL_OKAY : X86EMUL_UNHANDLEABLE;
+ return rc;
}

+static const struct hvm_mmio_ops stdvga_mem_ops = {
+ .check = stdvga_mem_check,
+ .read = stdvga_mem_read,
+ .write = stdvga_mem_write
+};
+
void stdvga_init(struct domain *d)
{
struct hvm_hw_stdvga *s = &d->arch.hvm_domain.stdvga;
@@ -634,6 +529,8 @@ void stdvga_init(struct domain *d)
register_portio_handler(d, 0x3c4, 2, stdvga_intercept_pio);
/* Graphics registers. */
register_portio_handler(d, 0x3ce, 2, stdvga_intercept_pio);
+ /* VGA memory */
+ register_mmio_handler(d, &stdvga_mem_ops);
}
}

diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index afa27bf..00b4a89 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -122,7 +122,6 @@ struct hvm_hw_stdvga {
};

void stdvga_init(struct domain *d);
-int stdvga_intercept_mmio(ioreq_t *p);
void stdvga_deinit(struct domain *d);

extern void hvm_dpci_msi_eoi(struct domain *d, int vector);
--
1.7.10.4
Jan Beulich
2015-06-24 13:59:10 UTC
Post by Paul Durrant
@@ -424,8 +427,17 @@ static void stdvga_mem_writeb(uint64_t addr, uint32_t val)
}
}
-static void stdvga_mem_write(uint64_t addr, uint64_t data, uint64_t size)
+static int stdvga_mem_write(struct vcpu *v, unsigned long addr,
+ unsigned long size, unsigned long data)
{
+ ioreq_t p = { .type = IOREQ_TYPE_COPY,
+ .addr = addr,
+ .size = size,
+ .count = 1,
+ .dir = IOREQ_WRITE,
+ .data = data,
Indentation.
Post by Paul Durrant
- if ( s->stdvga && s->cache )
- {
- switch ( p->type )
- {
- buf = mmio_move(s, p);
- if ( !buf )
- s->cache = 0;
- break;
- gdprintk(XENLOG_WARNING, "unsupported mmio request type:%d "
- "addr:0x%04x data:0x%04x size:%d count:%d state:%d "
- "isptr:%d dir:%d df:%d\n",
- p->type, (int)p->addr, (int)p->data, (int)p->size,
- (int)p->count, p->state,
- p->data_is_ptr, p->dir, p->df);
- s->cache = 0;
- }
I can't see where these cases of clearing s->cache move to.
Post by Paul Durrant
- }
- else
- {
- buf = (p->dir == IOREQ_WRITE);
- }
-
- rc = (buf && hvm_buffered_io_send(p));
+ rc = s->stdvga && s->cache &&
+ (addr >= VGA_MEM_BASE) &&
+ ((addr + length) < (VGA_MEM_BASE + VGA_MEM_SIZE));
Note how the old code also calls hvm_buffered_io_send() when
!s->stdvga || !s->cache but p->dir == IOREQ_WRITE. Do you
really mean to drop that?

Jan
Paul Durrant
2015-06-24 14:12:43 UTC
-----Original Message-----
Sent: 24 June 2015 14:59
To: Paul Durrant
Subject: Re: [PATCH v4 09/17] x86/hvm: unify stdvga mmio intercept with
standard mmio intercept
Post by Paul Durrant
@@ -424,8 +427,17 @@ static void stdvga_mem_writeb(uint64_t addr, uint32_t val)
}
}
-static void stdvga_mem_write(uint64_t addr, uint64_t data, uint64_t size)
+static int stdvga_mem_write(struct vcpu *v, unsigned long addr,
+ unsigned long size, unsigned long data)
{
+ ioreq_t p = { .type = IOREQ_TYPE_COPY,
+ .addr = addr,
+ .size = size,
+ .count = 1,
+ .dir = IOREQ_WRITE,
+ .data = data,
Indentation.
Damn emacs.
Post by Paul Durrant
- if ( s->stdvga && s->cache )
- {
- switch ( p->type )
- {
- buf = mmio_move(s, p);
- if ( !buf )
- s->cache = 0;
- break;
- gdprintk(XENLOG_WARNING, "unsupported mmio request type:%d "
- "addr:0x%04x data:0x%04x size:%d count:%d state:%d "
- "isptr:%d dir:%d df:%d\n",
- p->type, (int)p->addr, (int)p->data, (int)p->size,
- (int)p->count, p->state,
- p->data_is_ptr, p->dir, p->df);
- s->cache = 0;
- }
I can't see where these cases of clearing s->cache move to.
There's only one case AFAICT, which is that s->cache is cleared if the domain goes through a save/restore.
Post by Paul Durrant
- }
- else
- {
- buf = (p->dir == IOREQ_WRITE);
- }
-
- rc = (buf && hvm_buffered_io_send(p));
+ rc = s->stdvga && s->cache &&
+ (addr >= VGA_MEM_BASE) &&
+ ((addr + length) < (VGA_MEM_BASE + VGA_MEM_SIZE));
Note how the old code also calls hvm_buffered_io_send() when
!s->stdvga || !s->cache but p->dir == IOREQ_WRITE. Do you
really mean to drop that?
Hmmm. I'm not sure why it would have done that. It seems wrong. I'll check.

Paul
Jan
Jan Beulich
2015-06-24 14:30:07 UTC
Post by Paul Durrant
Sent: 24 June 2015 14:59
Post by Paul Durrant
- if ( s->stdvga && s->cache )
- {
- switch ( p->type )
- {
- buf = mmio_move(s, p);
- if ( !buf )
- s->cache = 0;
- break;
- gdprintk(XENLOG_WARNING, "unsupported mmio request type:%d "
- "addr:0x%04x data:0x%04x size:%d count:%d state:%d "
- "isptr:%d dir:%d df:%d\n",
- p->type, (int)p->addr, (int)p->data, (int)p->size,
- (int)p->count, p->state,
- p->data_is_ptr, p->dir, p->df);
- s->cache = 0;
- }
I can't see where these cases of clearing s->cache move to.
There's only one case AFAICT, which is that s->cache is cleared if the
domain goes through a save/restore.
There are two cases visible in the context above, which afaict
have nothing to do with save/restore.

Jan
Paul Durrant
2015-06-24 14:43:44 UTC
-----Original Message-----
Sent: 24 June 2015 15:30
To: Paul Durrant
Subject: RE: [PATCH v4 09/17] x86/hvm: unify stdvga mmio intercept with
standard mmio intercept
Post by Paul Durrant
Sent: 24 June 2015 14:59
Post by Paul Durrant
- if ( s->stdvga && s->cache )
- {
- switch ( p->type )
- {
- buf = mmio_move(s, p);
- if ( !buf )
- s->cache = 0;
- break;
- gdprintk(XENLOG_WARNING, "unsupported mmio request
type:%d
Post by Paul Durrant
"
Post by Paul Durrant
- "addr:0x%04x data:0x%04x size:%d count:%d state:%d "
- "isptr:%d dir:%d df:%d\n",
- p->type, (int)p->addr, (int)p->data, (int)p->size,
- (int)p->count, p->state,
- p->data_is_ptr, p->dir, p->df);
- s->cache = 0;
- }
I can't see where these cases of clearing s->cache move to.
There's only one case AFAICT, which is if the domain goes through a
save/restore then s->cache is cleared.
There are two cases visible in the context above, which afaict
have nothing to do with save/restore.
Caching was also disabled on an MMIO <-> MMIO move that was not within VRAM, or on an MMIO <-> RAM move. As we know, MMIO <-> MMIO can't happen anyway, and I don't see why a rep mov should necessarily disable caching. Before this change I guess it had to, since the code did not have the component writes to forward to QEMU as buffered requests. Now it does.

I'd missed the fact that it still forwarded writes as buffered even if it was not caching though.

Paul
Jan
Paul Durrant
2015-06-24 11:24:36 UTC
...in hvmemul_read/write()

Add hvmemul_phys_mmio_access() and hvmemul_linear_mmio_access() functions
to reduce code duplication.

NOTE: This patch also introduces a change in 'chunking' around a page
boundary. Previously (for example) an 8 byte access at the last
byte of a page would get carried out as 8 single-byte accesses.
It will now be carried out as a single-byte access, followed by
a 4-byte access, a 2-byte access and then another single-byte
access.
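
A standalone sketch of that decomposition (plain C, illustrative only;
the real logic lives in hvmemul_phys/linear_mmio_access below):

#include <stdio.h>

#define PAGE_SIZE 4096u

/* Largest power of 2 not exceeding n (n > 0). */
static unsigned int highest_pow2_le(unsigned int n)
{
    while ( n & (n - 1) )
        n &= n - 1;
    return n;
}

int main(void)
{
    unsigned int off = PAGE_SIZE - 1;  /* access starts at the last byte */
    unsigned int size = 8;
    /* First chunk is capped by what remains of the current page. */
    unsigned int chunk = size < PAGE_SIZE - off ? size : PAGE_SIZE - off;

    while ( size )
    {
        printf("%u-byte access\n", chunk);  /* prints 1, 4, 2, 1 */
        size -= chunk;
        if ( !size )
            break;
        /* Subsequent chunks: largest power of 2 fitting the remainder
         * (the real code also caps this at sizeof(long)). */
        chunk = highest_pow2_le(size);
    }
    return 0;
}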

Signed-off-by: Paul Durrant <***@citrix.com>
Cc: Keir Fraser <***@xen.org>
Cc: Jan Beulich <***@suse.com>
Cc: Andrew Cooper <***@citrix.com>
---
xen/arch/x86/hvm/emulate.c | 225 +++++++++++++++++++++++---------------------
1 file changed, 118 insertions(+), 107 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 935eab3..4d11c6c 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -540,6 +540,119 @@ static int hvmemul_virtual_to_linear(
return X86EMUL_EXCEPTION;
}

+static int hvmemul_phys_mmio_access(
+ paddr_t gpa, unsigned int size, uint8_t dir, uint8_t **buffer)
+{
+ unsigned long one_rep = 1;
+ unsigned int chunk;
+ int rc;
+
+ /* Accesses must fall within a page */
+ BUG_ON((gpa & (PAGE_SIZE - 1)) + size > PAGE_SIZE);
+
+ /*
+ * hvmemul_do_io() cannot handle non-power-of-2 accesses or
+ * accesses larger than sizeof(long), so choose the highest power
+ * of 2 not exceeding sizeof(long) as the 'chunk' size.
+ */
+ chunk = 1 << (fls(size) - 1);
+ if ( chunk > sizeof (long) )
+ chunk = sizeof (long);
+
+ for ( ;; )
+ {
+ rc = hvmemul_do_mmio_buffer(gpa, &one_rep, chunk, dir, 0,
+ *buffer);
+ if ( rc != X86EMUL_OKAY )
+ break;
+
+ /* Advance to the next chunk */
+ gpa += chunk;
+ *buffer += chunk;
+ size -= chunk;
+
+ if ( size == 0 )
+ break;
+
+ /*
+ * If the chunk now exceeds the remaining size, choose the next
+ * lowest power of 2 that will fit.
+ */
+ while ( chunk > size )
+ chunk >>= 1;
+ }
+
+ return rc;
+}
+
+static int hvmemul_linear_mmio_access(
+ unsigned long gla, unsigned int size, uint8_t dir, uint8_t *buffer,
+ uint32_t pfec, struct hvm_emulate_ctxt *hvmemul_ctxt, bool_t translate)
+{
+ struct hvm_vcpu_io *vio = &current->arch.hvm_vcpu.hvm_io;
+ unsigned long page_off = gla & (PAGE_SIZE - 1);
+ unsigned int chunk;
+ paddr_t gpa;
+ unsigned long one_rep = 1;
+ int rc;
+
+ chunk = min_t(unsigned int, size, PAGE_SIZE - page_off);
+
+ if ( translate )
+ gpa = pfn_to_paddr(vio->mmio_gpfn) | page_off;
+ else
+ {
+ rc = hvmemul_linear_to_phys(gla, &gpa, chunk, &one_rep, pfec,
+ hvmemul_ctxt);
+ if ( rc != X86EMUL_OKAY )
+ return rc;
+ }
+
+ for ( ;; )
+ {
+ rc = hvmemul_phys_mmio_access(gpa, chunk, dir, &buffer);
+ if ( rc != X86EMUL_OKAY )
+ break;
+
+ gla += chunk;
+ gpa += chunk;
+ size -= chunk;
+
+ if ( size == 0 )
+ break;
+
+ ASSERT((gla & (PAGE_SIZE - 1)) == 0);
+ chunk = min_t(unsigned int, size, PAGE_SIZE);
+ if ( !translate )
+ {
+ rc = hvmemul_linear_to_phys(gla, &gpa, chunk, &one_rep, pfec,
+ hvmemul_ctxt);
+ if ( rc != X86EMUL_OKAY )
+ return rc;
+ }
+ }
+
+ return rc;
+}
+
+static inline int hvmemul_linear_mmio_read(
+ unsigned long gla, unsigned int size, void *buffer,
+ uint32_t pfec, struct hvm_emulate_ctxt *hvmemul_ctxt,
+ bool_t translate)
+{
+ return hvmemul_linear_mmio_access(gla, size, IOREQ_READ, buffer,
+ pfec, hvmemul_ctxt, translate);
+}
+
+static inline int hvmemul_linear_mmio_write(
+ unsigned long gla, unsigned int size, void *buffer,
+ uint32_t pfec, struct hvm_emulate_ctxt *hvmemul_ctxt,
+ bool_t translate)
+{
+ return hvmemul_linear_mmio_access(gla, size, IOREQ_WRITE, buffer,
+ pfec, hvmemul_ctxt, translate);
+}
+
static int __hvmemul_read(
enum x86_segment seg,
unsigned long offset,
@@ -550,51 +663,19 @@ static int __hvmemul_read(
{
struct vcpu *curr = current;
unsigned long addr, reps = 1;
- unsigned int off, chunk = min(bytes, 1U << LONG_BYTEORDER);
uint32_t pfec = PFEC_page_present;
struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
- paddr_t gpa;
int rc;

rc = hvmemul_virtual_to_linear(
seg, offset, bytes, &reps, access_type, hvmemul_ctxt, &addr);
if ( rc != X86EMUL_OKAY )
return rc;
- off = addr & (PAGE_SIZE - 1);
- /*
- * We only need to handle sizes actual instruction operands can have. All
- * such sizes are either powers of 2 or the sum of two powers of 2. Thus
- * picking as initial chunk size the largest power of 2 not greater than
- * the total size will always result in only power-of-2 size requests
- * issued to hvmemul_do_mmio() (hvmemul_do_io() rejects non-powers-of-2).
- */
- while ( chunk & (chunk - 1) )
- chunk &= chunk - 1;
- if ( off + bytes > PAGE_SIZE )
- while ( off & (chunk - 1) )
- chunk >>= 1;
-
if ( ((access_type != hvm_access_insn_fetch
? vio->mmio_access.read_access
: vio->mmio_access.insn_fetch)) &&
(vio->mmio_gva == (addr & PAGE_MASK)) )
- {
- gpa = (((paddr_t)vio->mmio_gpfn << PAGE_SHIFT) | off);
- while ( (off + chunk) <= PAGE_SIZE )
- {
- rc = hvmemul_do_mmio_buffer(gpa, &reps, chunk, IOREQ_READ, 0,
- p_data);
- if ( rc != X86EMUL_OKAY || bytes == chunk )
- return rc;
- addr += chunk;
- off += chunk;
- gpa += chunk;
- p_data += chunk;
- bytes -= chunk;
- if ( bytes < chunk )
- chunk = bytes;
- }
- }
+ return hvmemul_linear_mmio_read(addr, bytes, p_data, pfec, hvmemul_ctxt, 1);

if ( (seg != x86_seg_none) &&
(hvmemul_ctxt->seg_reg[x86_seg_ss].attr.fields.dpl == 3) )
@@ -613,30 +694,8 @@ static int __hvmemul_read(
case HVMCOPY_bad_gfn_to_mfn:
if ( access_type == hvm_access_insn_fetch )
return X86EMUL_UNHANDLEABLE;
- rc = hvmemul_linear_to_phys(addr, &gpa, chunk, &reps, pfec,
- hvmemul_ctxt);
- while ( rc == X86EMUL_OKAY )
- {
- rc = hvmemul_do_mmio_buffer(gpa, &reps, chunk, IOREQ_READ, 0,
- p_data);
- if ( rc != X86EMUL_OKAY || bytes == chunk )
- break;
- addr += chunk;
- off += chunk;
- p_data += chunk;
- bytes -= chunk;
- if ( bytes < chunk )
- chunk = bytes;
- if ( off < PAGE_SIZE )
- gpa += chunk;
- else
- {
- rc = hvmemul_linear_to_phys(addr, &gpa, chunk, &reps, pfec,
- hvmemul_ctxt);
- off = 0;
- }
- }
- return rc;
+
+ return hvmemul_linear_mmio_read(addr, bytes, p_data, pfec, hvmemul_ctxt, 0);
case HVMCOPY_gfn_paged_out:
case HVMCOPY_gfn_shared:
return X86EMUL_RETRY;
@@ -702,43 +761,18 @@ static int hvmemul_write(
container_of(ctxt, struct hvm_emulate_ctxt, ctxt);
struct vcpu *curr = current;
unsigned long addr, reps = 1;
- unsigned int off, chunk = min(bytes, 1U << LONG_BYTEORDER);
uint32_t pfec = PFEC_page_present | PFEC_write_access;
struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
- paddr_t gpa;
int rc;

rc = hvmemul_virtual_to_linear(
seg, offset, bytes, &reps, hvm_access_write, hvmemul_ctxt, &addr);
if ( rc != X86EMUL_OKAY )
return rc;
- off = addr & (PAGE_SIZE - 1);
- /* See the respective comment in __hvmemul_read(). */
- while ( chunk & (chunk - 1) )
- chunk &= chunk - 1;
- if ( off + bytes > PAGE_SIZE )
- while ( off & (chunk - 1) )
- chunk >>= 1;

if ( vio->mmio_access.write_access &&
(vio->mmio_gva == (addr & PAGE_MASK)) )
- {
- gpa = (((paddr_t)vio->mmio_gpfn << PAGE_SHIFT) | off);
- while ( (off + chunk) <= PAGE_SIZE )
- {
- rc = hvmemul_do_mmio_buffer(gpa, &reps, chunk, IOREQ_WRITE, 0,
- p_data);
- if ( rc != X86EMUL_OKAY || bytes == chunk )
- return rc;
- addr += chunk;
- off += chunk;
- gpa += chunk;
- p_data += chunk;
- bytes -= chunk;
- if ( bytes < chunk )
- chunk = bytes;
- }
- }
+ return hvmemul_linear_mmio_write(addr, bytes, p_data, pfec, hvmemul_ctxt, 1);

if ( (seg != x86_seg_none) &&
(hvmemul_ctxt->seg_reg[x86_seg_ss].attr.fields.dpl == 3) )
@@ -753,30 +787,7 @@ static int hvmemul_write(
case HVMCOPY_bad_gva_to_gfn:
return X86EMUL_EXCEPTION;
case HVMCOPY_bad_gfn_to_mfn:
- rc = hvmemul_linear_to_phys(addr, &gpa, chunk, &reps, pfec,
- hvmemul_ctxt);
- while ( rc == X86EMUL_OKAY )
- {
- rc = hvmemul_do_mmio_buffer(gpa, &reps, chunk, IOREQ_WRITE, 0,
- p_data);
- if ( rc != X86EMUL_OKAY || bytes == chunk )
- break;
- addr += chunk;
- off += chunk;
- p_data += chunk;
- bytes -= chunk;
- if ( bytes < chunk )
- chunk = bytes;
- if ( off < PAGE_SIZE )
- gpa += chunk;
- else
- {
- rc = hvmemul_linear_to_phys(addr, &gpa, chunk, &reps, pfec,
- hvmemul_ctxt);
- off = 0;
- }
- }
- return rc;
+ return hvmemul_linear_mmio_write(addr, bytes, p_data, pfec, hvmemul_ctxt, 0);
case HVMCOPY_gfn_paged_out:
case HVMCOPY_gfn_shared:
return X86EMUL_RETRY;
--
1.7.10.4
Jan Beulich
2015-06-24 12:33:59 UTC
Post by Paul Durrant
+static int hvmemul_phys_mmio_access(
+ paddr_t gpa, unsigned int size, uint8_t dir, uint8_t **buffer)
As much as the earlier offset you returned via indirection to the
caller was unnecessary, the indirection here seems pointless too.
All callers know how (or don't care) to update the buffer pointer.
Post by Paul Durrant
+static int hvmemul_linear_mmio_access(
+ unsigned long gla, unsigned int size, uint8_t dir, uint8_t *buffer,
+ uint32_t pfec, struct hvm_emulate_ctxt *hvmemul_ctxt, bool_t translate)
+{
+ struct hvm_vcpu_io *vio = &current->arch.hvm_vcpu.hvm_io;
+ unsigned long page_off = gla & (PAGE_SIZE - 1);
+ unsigned int chunk;
+ paddr_t gpa;
+ unsigned long one_rep = 1;
+ int rc;
+
+ chunk = min_t(unsigned int, size, PAGE_SIZE - page_off);
+
+ if ( translate )
+ gpa = pfn_to_paddr(vio->mmio_gpfn) | page_off;
"translate" as name for the parameter signaling that the translation
is known is kind of odd - "translated" or "known_gpfn" or some such?
Or invert the meaning?
Post by Paul Durrant
+ else
+ {
+ rc = hvmemul_linear_to_phys(gla, &gpa, chunk, &one_rep, pfec,
+ hvmemul_ctxt);
+ if ( rc != X86EMUL_OKAY )
+ return rc;
+ }
+
+ for ( ;; )
+ {
+ rc = hvmemul_phys_mmio_access(gpa, chunk, dir, &buffer);
+ if ( rc != X86EMUL_OKAY )
+ break;
+
+ gla += chunk;
+ gpa += chunk;
+ size -= chunk;
+
+ if ( size == 0 )
+ break;
+
+ ASSERT((gla & (PAGE_SIZE - 1)) == 0);
+ chunk = min_t(unsigned int, size, PAGE_SIZE);
I think you could just assert that size is now less than PAGE_SIZE.
Post by Paul Durrant
+ if ( !translate )
+ {
+ rc = hvmemul_linear_to_phys(gla, &gpa, chunk, &one_rep, pfec,
+ hvmemul_ctxt);
+ if ( rc != X86EMUL_OKAY )
+ return rc;
+ }
This must be done unconditionally (and gpa doesn't need updating
above then), as the known translation is only for the first byte
(and whatever falls on the same page).

Jan
Paul Durrant
2015-06-24 12:59:14 UTC
-----Original Message-----
Sent: 24 June 2015 13:34
To: Paul Durrant
Subject: Re: [PATCH v4 04/17] x86/hvm: remove multiple open coded
'chunking' loops
Post by Paul Durrant
+static int hvmemul_phys_mmio_access(
+ paddr_t gpa, unsigned int size, uint8_t dir, uint8_t **buffer)
As much as the earlier offset you returned via indirection to the
caller was unnecessary, the indirection here seems pointless too.
All callers know how (or don't care) to update the buffer pointer.
Ok. Personally I'd prefer one thing to be in charge of updating the pointer though.
Post by Paul Durrant
+static int hvmemul_linear_mmio_access(
+ unsigned long gla, unsigned int size, uint8_t dir, uint8_t *buffer,
+ uint32_t pfec, struct hvm_emulate_ctxt *hvmemul_ctxt, bool_t translate)
+{
+ struct hvm_vcpu_io *vio = &current->arch.hvm_vcpu.hvm_io;
+ unsigned long page_off = gla & (PAGE_SIZE - 1);
+ unsigned int chunk;
+ paddr_t gpa;
+ unsigned long one_rep = 1;
+ int rc;
+
+ chunk = min_t(unsigned int, size, PAGE_SIZE - page_off);
+
+ if ( translate )
+ gpa = pfn_to_paddr(vio->mmio_gpfn) | page_off;
"translate" as name for the parameter signaling that the translation
is known is kind of odd - "translated" or "known_gpfn" or some such?
Or invert the meaning?
Ok. I think I'll go with the latter.
Post by Paul Durrant
+ else
+ {
+ rc = hvmemul_linear_to_phys(gla, &gpa, chunk, &one_rep, pfec,
+ hvmemul_ctxt);
+ if ( rc != X86EMUL_OKAY )
+ return rc;
+ }
+
+ for ( ;; )
+ {
+ rc = hvmemul_phys_mmio_access(gpa, chunk, dir, &buffer);
+ if ( rc != X86EMUL_OKAY )
+ break;
+
+ gla += chunk;
+ gpa += chunk;
+ size -= chunk;
+
+ if ( size == 0 )
+ break;
+
+ ASSERT((gla & (PAGE_SIZE - 1)) == 0);
+ chunk = min_t(unsigned int, size, PAGE_SIZE);
I think you could just assert that size is now less than PAGE_SIZE.
True.
Post by Paul Durrant
+ if ( !translate )
+ {
+ rc = hvmemul_linear_to_phys(gla, &gpa, chunk, &one_rep, pfec,
+ hvmemul_ctxt);
+ if ( rc != X86EMUL_OKAY )
+ return rc;
+ }
This must be done unconditionally (and gpa doesn't need updating
above then), as the known translation is only for the first byte
(and whatever falls on the same page).
Yes indeed, I somehow missed that before.

Paul
Jan
Jan Beulich
2015-06-24 13:09:51 UTC
Post by Paul Durrant
-----Original Message-----
Sent: 24 June 2015 13:34
To: Paul Durrant
Subject: Re: [PATCH v4 04/17] x86/hvm: remove multiple open coded
'chunking' loops
Post by Paul Durrant
+static int hvmemul_phys_mmio_access(
+ paddr_t gpa, unsigned int size, uint8_t dir, uint8_t **buffer)
As much as the earlier offset you returned via indirection to the
caller was unnecessary, the indirection here seems pointless too.
All callers know how (or don't care) to update the buffer pointer.
Ok. Personally I'd prefer one thing to be in charge of updating the pointer though.
Generally seconded, but not at the expense of producing worse
code (which the extra indirection surely does).

Jan
Paul Durrant
2015-06-24 11:24:38 UTC
The implementation of mmio and portio intercepts is unnecessarily different.
This leads to much code duplication. This patch unifies much of the
intercept handling, leaving only distinct handlers for stdvga mmio and dpci
portio. Subsequent patches will unify those handlers.
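
As a usage illustration of the unified registration interface introduced
here, a hedged sketch follows. All my_* names (and the example port and
MMIO range) are hypothetical; the register_* calls, ops structure and
callback signatures are taken from the patch:

/* Hypothetical MMIO handler plugged into the unified table. */
static int my_mmio_check(struct vcpu *v, unsigned long addr)
{
    return (addr >= 0xf0001000ul) && (addr < 0xf0002000ul); /* made-up range */
}

static int my_mmio_read(struct vcpu *v, unsigned long addr,
                        unsigned long size, unsigned long *val)
{
    *val = 0;                 /* reads as zero */
    return X86EMUL_OKAY;
}

static int my_mmio_write(struct vcpu *v, unsigned long addr,
                         unsigned long size, unsigned long val)
{
    return X86EMUL_OKAY;      /* writes are discarded */
}

static const struct hvm_mmio_ops my_mmio_ops = {
    .check = my_mmio_check,
    .read  = my_mmio_read,
    .write = my_mmio_write
};

/* Hypothetical port I/O handler. */
static int my_portio_action(int dir, uint32_t port, uint32_t bytes,
                            uint32_t *val)
{
    if ( dir == IOREQ_READ )
        *val = ~0;            /* float the bus for reads */
    return X86EMUL_OKAY;
}

void my_device_init(struct domain *d)
{
    /* Each registration consumes one slot of the shared NR_IO_HANDLERS array. */
    register_mmio_handler(d, &my_mmio_ops);
    register_portio_handler(d, 0x402, 1, my_portio_action);
}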

Signed-off-by: Paul Durrant <***@citrix.com>
Cc: Keir Fraser <***@xen.org>
Cc: Jan Beulich <***@suse.com>
Cc: Andrew Cooper <***@citrix.com>
---
xen/arch/x86/hvm/emulate.c | 11 +-
xen/arch/x86/hvm/hpet.c | 4 +-
xen/arch/x86/hvm/hvm.c | 7 +-
xen/arch/x86/hvm/intercept.c | 502 +++++++++++++----------------
xen/arch/x86/hvm/stdvga.c | 30 +-
xen/arch/x86/hvm/vioapic.c | 4 +-
xen/arch/x86/hvm/vlapic.c | 5 +-
xen/arch/x86/hvm/vmsi.c | 7 +-
xen/drivers/passthrough/amd/iommu_guest.c | 30 +-
xen/include/asm-x86/hvm/domain.h | 1 +
xen/include/asm-x86/hvm/io.h | 119 +++----
11 files changed, 350 insertions(+), 370 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 4d11c6c..9ced81b 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -143,16 +143,7 @@ static int hvmemul_do_io(
hvmtrace_io_assist(&p);
}

- if ( is_mmio )
- {
- rc = hvm_mmio_intercept(&p);
- if ( rc == X86EMUL_UNHANDLEABLE )
- rc = hvm_buffered_io_intercept(&p);
- }
- else
- {
- rc = hvm_portio_intercept(&p);
- }
+ rc = hvm_io_intercept(&p);

switch ( rc )
{
diff --git a/xen/arch/x86/hvm/hpet.c b/xen/arch/x86/hvm/hpet.c
index 9585ca8..8958873 100644
--- a/xen/arch/x86/hvm/hpet.c
+++ b/xen/arch/x86/hvm/hpet.c
@@ -504,7 +504,7 @@ static int hpet_range(struct vcpu *v, unsigned long addr)
(addr < (HPET_BASE_ADDRESS + HPET_MMAP_SIZE)) );
}

-const struct hvm_mmio_ops hpet_mmio_ops = {
+static const struct hvm_mmio_ops hpet_mmio_ops = {
.check = hpet_range,
.read = hpet_read,
.write = hpet_write
@@ -659,6 +659,8 @@ void hpet_init(struct domain *d)
h->hpet.comparator64[i] = ~0ULL;
h->pt[i].source = PTSRC_isa;
}
+
+ register_mmio_handler(d, &hpet_mmio_ops);
}

void hpet_deinit(struct domain *d)
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 535d622..c10db78 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1465,11 +1465,12 @@ int hvm_domain_initialise(struct domain *d)
goto fail0;

d->arch.hvm_domain.params = xzalloc_array(uint64_t, HVM_NR_PARAMS);
- d->arch.hvm_domain.io_handler = xmalloc(struct hvm_io_handler);
+ d->arch.hvm_domain.io_handler = xzalloc_array(struct hvm_io_handler,
+ NR_IO_HANDLERS);
rc = -ENOMEM;
if ( !d->arch.hvm_domain.params || !d->arch.hvm_domain.io_handler )
goto fail1;
- d->arch.hvm_domain.io_handler->num_slot = 0;
+ d->arch.hvm_domain.io_handler_count = 0;

/* Set the default IO Bitmap. */
if ( is_hardware_domain(d) )
@@ -1506,6 +1507,8 @@ int hvm_domain_initialise(struct domain *d)

rtc_init(d);

+ msixtbl_init(d);
+
register_portio_handler(d, 0xe9, 1, hvm_print_line);
register_portio_handler(d, 0xcf8, 4, hvm_access_cf8);

diff --git a/xen/arch/x86/hvm/intercept.c b/xen/arch/x86/hvm/intercept.c
index cc44733..4db024e 100644
--- a/xen/arch/x86/hvm/intercept.c
+++ b/xen/arch/x86/hvm/intercept.c
@@ -32,205 +32,97 @@
#include <xen/event.h>
#include <xen/iommu.h>

-static const struct hvm_mmio_ops *const
-hvm_mmio_handlers[HVM_MMIO_HANDLER_NR] =
+static bool_t hvm_mmio_accept(struct hvm_io_handler *handler,
+ uint64_t addr,
+ uint64_t size)
{
- &hpet_mmio_ops,
- &vlapic_mmio_ops,
- &vioapic_mmio_ops,
- &msixtbl_mmio_ops,
- &iommu_mmio_ops
-};
+ BUG_ON(handler->type != IOREQ_TYPE_COPY);
+
+ return handler->u.mmio.ops->check(current, addr);
+}

-static int hvm_mmio_access(struct vcpu *v,
- ioreq_t *p,
- hvm_mmio_read_t read,
- hvm_mmio_write_t write)
+static int hvm_mmio_read(struct hvm_io_handler *handler,
+ uint64_t addr,
+ uint64_t size,
+ uint64_t *data)
{
- struct hvm_vcpu_io *vio = &v->arch.hvm_vcpu.hvm_io;
- unsigned long data;
- int rc = X86EMUL_OKAY, i, step = p->df ? -p->size : p->size;
+ BUG_ON(handler->type != IOREQ_TYPE_COPY);

- if ( !p->data_is_ptr )
- {
- if ( p->dir == IOREQ_READ )
- {
- if ( vio->mmio_retrying )
- {
- if ( vio->mmio_large_read_bytes != p->size )
- return X86EMUL_UNHANDLEABLE;
- memcpy(&data, vio->mmio_large_read, p->size);
- vio->mmio_large_read_bytes = 0;
- vio->mmio_retrying = 0;
- }
- else
- rc = read(v, p->addr, p->size, &data);
- p->data = data;
- }
- else /* p->dir == IOREQ_WRITE */
- rc = write(v, p->addr, p->size, p->data);
- return rc;
- }
+ return handler->u.mmio.ops->read(current, addr, size, data);
+}

- if ( p->dir == IOREQ_READ )
- {
- for ( i = 0; i < p->count; i++ )
- {
- if ( vio->mmio_retrying )
- {
- if ( vio->mmio_large_read_bytes != p->size )
- return X86EMUL_UNHANDLEABLE;
- memcpy(&data, vio->mmio_large_read, p->size);
- vio->mmio_large_read_bytes = 0;
- vio->mmio_retrying = 0;
- }
- else
- {
- rc = read(v, p->addr + step * i, p->size, &data);
- if ( rc != X86EMUL_OKAY )
- break;
- }
- switch ( hvm_copy_to_guest_phys(p->data + step * i,
- &data, p->size) )
- {
- case HVMCOPY_okay:
- break;
- case HVMCOPY_gfn_paged_out:
- case HVMCOPY_gfn_shared:
- rc = X86EMUL_RETRY;
- break;
- case HVMCOPY_bad_gfn_to_mfn:
- /* Drop the write as real hardware would. */
- continue;
- case HVMCOPY_bad_gva_to_gfn:
- ASSERT(0);
- /* fall through */
- default:
- rc = X86EMUL_UNHANDLEABLE;
- break;
- }
- if ( rc != X86EMUL_OKAY)
- break;
- }
+static int hvm_mmio_write(struct hvm_io_handler *handler,
+ uint64_t addr,
+ uint64_t size,
+ uint64_t data)
+{
+ BUG_ON(handler->type != IOREQ_TYPE_COPY);

- if ( rc == X86EMUL_RETRY )
- {
- vio->mmio_retry = 1;
- vio->mmio_large_read_bytes = p->size;
- memcpy(vio->mmio_large_read, &data, p->size);
- }
- }
- else
- {
- for ( i = 0; i < p->count; i++ )
- {
- switch ( hvm_copy_from_guest_phys(&data, p->data + step * i,
- p->size) )
- {
- case HVMCOPY_okay:
- break;
- case HVMCOPY_gfn_paged_out:
- case HVMCOPY_gfn_shared:
- rc = X86EMUL_RETRY;
- break;
- case HVMCOPY_bad_gfn_to_mfn:
- data = ~0;
- break;
- case HVMCOPY_bad_gva_to_gfn:
- ASSERT(0);
- /* fall through */
- default:
- rc = X86EMUL_UNHANDLEABLE;
- break;
- }
- if ( rc != X86EMUL_OKAY )
- break;
- rc = write(v, p->addr + step * i, p->size, data);
- if ( rc != X86EMUL_OKAY )
- break;
- }
+ return handler->u.mmio.ops->write(current, addr, size, data);
+}

- if ( rc == X86EMUL_RETRY )
- vio->mmio_retry = 1;
- }
+static const struct hvm_io_ops mmio_ops = {
+ .accept = hvm_mmio_accept,
+ .read = hvm_mmio_read,
+ .write = hvm_mmio_write
+};

- if ( i != 0 )
- {
- p->count = i;
- rc = X86EMUL_OKAY;
- }
+static bool_t hvm_portio_accept(struct hvm_io_handler *handler,
+ uint64_t addr,
+ uint64_t size)
+{
+ BUG_ON(handler->type != IOREQ_TYPE_PIO);

- return rc;
+ return (addr >= handler->u.portio.start) &&
+ ((addr + size) <= handler->u.portio.end);
}

-bool_t hvm_mmio_internal(paddr_t gpa)
+static int hvm_portio_read(struct hvm_io_handler *handler,
+ uint64_t addr,
+ uint64_t size,
+ uint64_t *data)
{
- struct vcpu *curr = current;
- unsigned int i;
+ uint32_t val = 0;
+ int rc;

- for ( i = 0; i < HVM_MMIO_HANDLER_NR; ++i )
- if ( hvm_mmio_handlers[i]->check(curr, gpa) )
- return 1;
+ BUG_ON(handler->type != IOREQ_TYPE_PIO);

- return 0;
+ rc = handler->u.portio.action(IOREQ_READ, addr, size, &val);
+ if ( rc == X86EMUL_OKAY )
+ *data = val;
+
+ return rc;
}

-int hvm_mmio_intercept(ioreq_t *p)
+static int hvm_portio_write(struct hvm_io_handler *handler,
+ uint64_t addr,
+ uint64_t size,
+ uint64_t data)
{
- struct vcpu *v = current;
- int i;
+ uint32_t val = data;

- for ( i = 0; i < HVM_MMIO_HANDLER_NR; i++ )
- {
- hvm_mmio_check_t check = hvm_mmio_handlers[i]->check;
-
- if ( check(v, p->addr) )
- {
- if ( unlikely(p->count > 1) &&
- !check(v, unlikely(p->df)
- ? p->addr - (p->count - 1L) * p->size
- : p->addr + (p->count - 1L) * p->size) )
- p->count = 1;
-
- return hvm_mmio_access(
- v, p,
- hvm_mmio_handlers[i]->read,
- hvm_mmio_handlers[i]->write);
- }
- }
+ BUG_ON(handler->type != IOREQ_TYPE_PIO);

- return X86EMUL_UNHANDLEABLE;
+ return handler->u.portio.action(IOREQ_WRITE, addr, size, &val);
}

-static int process_portio_intercept(portio_action_t action, ioreq_t *p)
+static const struct hvm_io_ops portio_ops = {
+ .accept = hvm_portio_accept,
+ .read = hvm_portio_read,
+ .write = hvm_portio_write
+};
+
+static int hvm_process_io_intercept(struct hvm_io_handler *handler,
+ ioreq_t *p)
{
struct hvm_vcpu_io *vio = &current->arch.hvm_vcpu.hvm_io;
+ const struct hvm_io_ops *ops =
+ (p->type == IOREQ_TYPE_COPY) ?
+ &mmio_ops :
+ &portio_ops;
int rc = X86EMUL_OKAY, i, step = p->df ? -p->size : p->size;
- uint32_t data;
-
- if ( !p->data_is_ptr )
- {
- if ( p->dir == IOREQ_READ )
- {
- if ( vio->mmio_retrying )
- {
- if ( vio->mmio_large_read_bytes != p->size )
- return X86EMUL_UNHANDLEABLE;
- memcpy(&data, vio->mmio_large_read, p->size);
- vio->mmio_large_read_bytes = 0;
- vio->mmio_retrying = 0;
- }
- else
- rc = action(IOREQ_READ, p->addr, p->size, &data);
- p->data = data;
- }
- else
- {
- data = p->data;
- rc = action(IOREQ_WRITE, p->addr, p->size, &data);
- }
- return rc;
- }
+ uint64_t data;
+ uint64_t addr;

if ( p->dir == IOREQ_READ )
{
@@ -246,31 +138,40 @@ static int process_portio_intercept(portio_action_t action, ioreq_t *p)
}
else
{
- rc = action(IOREQ_READ, p->addr, p->size, &data);
+ addr = (p->type == IOREQ_TYPE_COPY) ?
+ p->addr + step * i :
+ p->addr;
+ rc = ops->read(handler, addr, p->size, &data);
if ( rc != X86EMUL_OKAY )
break;
}
- switch ( hvm_copy_to_guest_phys(p->data + step * i,
- &data, p->size) )
+
+ if ( p->data_is_ptr )
{
- case HVMCOPY_okay:
- break;
- case HVMCOPY_gfn_paged_out:
- case HVMCOPY_gfn_shared:
- rc = X86EMUL_RETRY;
- break;
- case HVMCOPY_bad_gfn_to_mfn:
- /* Drop the write as real hardware would. */
- continue;
- case HVMCOPY_bad_gva_to_gfn:
- ASSERT(0);
- /* fall through */
- default:
- rc = X86EMUL_UNHANDLEABLE;
- break;
+ switch ( hvm_copy_to_guest_phys(p->data + step * i,
+ &data, p->size) )
+ {
+ case HVMCOPY_okay:
+ break;
+ case HVMCOPY_gfn_paged_out:
+ case HVMCOPY_gfn_shared:
+ rc = X86EMUL_RETRY;
+ break;
+ case HVMCOPY_bad_gfn_to_mfn:
+ /* Drop the write as real hardware would. */
+ continue;
+ case HVMCOPY_bad_gva_to_gfn:
+ ASSERT_UNREACHABLE();
+ /* fall through */
+ default:
+ rc = X86EMUL_UNHANDLEABLE;
+ break;
+ }
+ if ( rc != X86EMUL_OKAY )
+ break;
}
- if ( rc != X86EMUL_OKAY)
- break;
+ else
+ p->data = data;
}

if ( rc == X86EMUL_RETRY )
@@ -284,29 +185,37 @@ static int process_portio_intercept(portio_action_t action, ioreq_t *p)
{
for ( i = 0; i < p->count; i++ )
{
- data = 0;
- switch ( hvm_copy_from_guest_phys(&data, p->data + step * i,
- p->size) )
+ if ( p->data_is_ptr )
{
- case HVMCOPY_okay:
- break;
- case HVMCOPY_gfn_paged_out:
- case HVMCOPY_gfn_shared:
- rc = X86EMUL_RETRY;
- break;
- case HVMCOPY_bad_gfn_to_mfn:
- data = ~0;
- break;
- case HVMCOPY_bad_gva_to_gfn:
- ASSERT(0);
- /* fall through */
- default:
- rc = X86EMUL_UNHANDLEABLE;
- break;
+ switch ( hvm_copy_from_guest_phys(&data, p->data + step * i,
+ p->size) )
+ {
+ case HVMCOPY_okay:
+ break;
+ case HVMCOPY_gfn_paged_out:
+ case HVMCOPY_gfn_shared:
+ rc = X86EMUL_RETRY;
+ break;
+ case HVMCOPY_bad_gfn_to_mfn:
+ data = ~0;
+ break;
+ case HVMCOPY_bad_gva_to_gfn:
+ ASSERT_UNREACHABLE();
+ /* fall through */
+ default:
+ rc = X86EMUL_UNHANDLEABLE;
+ break;
+ }
+ if ( rc != X86EMUL_OKAY )
+ break;
}
- if ( rc != X86EMUL_OKAY )
- break;
- rc = action(IOREQ_WRITE, p->addr, p->size, &data);
+ else
+ data = p->data;
+
+ addr = (p->type == IOREQ_TYPE_COPY) ?
+ p->addr + step * i :
+ p->addr;
+ rc = ops->write(handler, addr, p->size, data);
if ( rc != X86EMUL_OKAY )
break;
}
@@ -324,78 +233,133 @@ static int process_portio_intercept(portio_action_t action, ioreq_t *p)
return rc;
}

-/*
- * Check if the request is handled inside xen
- * return value: 0 --not handled; 1 --handled
- */
-int hvm_io_intercept(ioreq_t *p, int type)
+static struct hvm_io_handler *hvm_find_io_handler(ioreq_t *p)
+{
+ struct vcpu *curr = current;
+ struct domain *curr_d = curr->domain;
+ const struct hvm_io_ops *ops =
+ (p->type == IOREQ_TYPE_COPY) ?
+ &mmio_ops :
+ &portio_ops;
+ unsigned int i;
+
+ for ( i = 0; i < curr_d->arch.hvm_domain.io_handler_count; i++ )
+ {
+ struct hvm_io_handler *handler =
+ &curr_d->arch.hvm_domain.io_handler[i];
+ uint64_t start, end, count = p->count, size = p->size;
+
+ if ( handler->type != p->type )
+ continue;
+
+ switch ( handler->type )
+ {
+ case IOREQ_TYPE_PIO:
+ start = p->addr;
+ end = p->addr + size;
+ break;
+ case IOREQ_TYPE_COPY:
+ if ( p->df )
+ {
+ start = (p->addr - (count - 1) * size);
+ end = p->addr + size;
+ }
+ else
+ {
+ start = p->addr;
+ end = p->addr + count * size;
+ }
+ break;
+ default:
+ BUG();
+ }
+
+ if ( ops->accept(handler, start, end - start) )
+ return handler;
+ }
+
+ return NULL;
+}
+
+int hvm_io_intercept(ioreq_t *p)
{
- struct vcpu *v = current;
- struct hvm_io_handler *handler = v->domain->arch.hvm_domain.io_handler;
- int i;
- unsigned long addr, size;
+ struct hvm_io_handler *handler;

- if ( type == HVM_PORTIO )
+ if ( p->type == IOREQ_TYPE_PIO )
{
int rc = dpci_ioport_intercept(p);
if ( (rc == X86EMUL_OKAY) || (rc == X86EMUL_RETRY) )
return rc;
}
-
- for ( i = 0; i < handler->num_slot; i++ )
+ else if ( p->type == IOREQ_TYPE_COPY )
{
- if ( type != handler->hdl_list[i].type )
- continue;
- addr = handler->hdl_list[i].addr;
- size = handler->hdl_list[i].size;
- if ( (p->addr >= addr) &&
- ((p->addr + p->size) <= (addr + size)) )
- {
- if ( type == HVM_PORTIO )
- return process_portio_intercept(
- handler->hdl_list[i].action.portio, p);
+ int rc = stdvga_intercept_mmio(p);
+ if ( (rc == X86EMUL_OKAY) || (rc == X86EMUL_RETRY) )
+ return rc;
+ }

- if ( unlikely(p->count > 1) &&
- (unlikely(p->df)
- ? p->addr - (p->count - 1L) * p->size < addr
- : p->addr + p->count * 1L * p->size - 1 >= addr + size) )
- p->count = 1;
+ handler = hvm_find_io_handler(p);

- return handler->hdl_list[i].action.mmio(p);
- }
- }
+ if ( handler == NULL )
+ return X86EMUL_UNHANDLEABLE;

- return X86EMUL_UNHANDLEABLE;
+ return hvm_process_io_intercept(handler, p);
}

-void register_io_handler(
- struct domain *d, unsigned long addr, unsigned long size,
- void *action, int type)
+static struct hvm_io_handler *hvm_next_io_handler(struct domain *d)
{
- struct hvm_io_handler *handler = d->arch.hvm_domain.io_handler;
- int num = handler->num_slot;
+ unsigned int i = d->arch.hvm_domain.io_handler_count++;

- BUG_ON(num >= MAX_IO_HANDLER);
+ if (i == NR_IO_HANDLERS)
+ domain_crash(d);

- handler->hdl_list[num].addr = addr;
- handler->hdl_list[num].size = size;
- handler->hdl_list[num].action.ptr = action;
- handler->hdl_list[num].type = type;
- handler->num_slot++;
+ return &d->arch.hvm_domain.io_handler[i];
}

-void relocate_io_handler(
- struct domain *d, unsigned long old_addr, unsigned long new_addr,
- unsigned long size, int type)
+void register_mmio_handler(struct domain *d, const struct hvm_mmio_ops *ops)
{
- struct hvm_io_handler *handler = d->arch.hvm_domain.io_handler;
- int i;
-
- for ( i = 0; i < handler->num_slot; i++ )
- if ( (handler->hdl_list[i].addr == old_addr) &&
- (handler->hdl_list[i].size == size) &&
- (handler->hdl_list[i].type == type) )
- handler->hdl_list[i].addr = new_addr;
+ struct hvm_io_handler *handler = hvm_next_io_handler(d);
+
+ handler->type = IOREQ_TYPE_COPY;
+ handler->u.mmio.ops = ops;
+}
+
+void register_portio_handler(struct domain *d, uint32_t addr,
+ uint32_t size, portio_action_t action)
+{
+ struct hvm_io_handler *handler = hvm_next_io_handler(d);
+
+ handler->type = IOREQ_TYPE_PIO;
+ handler->u.portio.start = addr;
+ handler->u.portio.end = addr + size;
+ handler->u.portio.action = action;
+}
+
+void relocate_portio_handler(struct domain *d, uint32_t old_addr,
+ uint32_t new_addr, uint32_t size)
+{
+ ioreq_t p = {
+ .type = IOREQ_TYPE_PIO,
+ .addr = old_addr,
+ .size = size
+ };
+ struct hvm_io_handler *handler = hvm_find_io_handler(&p);
+
+ if ( handler == NULL )
+ return;
+
+ handler->u.portio.start = new_addr;
+ handler->u.portio.end = new_addr + size;
+}
+
+bool_t hvm_mmio_internal(paddr_t gpa)
+{
+ ioreq_t p = {
+ .type = IOREQ_TYPE_COPY,
+ .addr = gpa
+ };
+
+ return (hvm_find_io_handler(&p) != NULL);
}

/*
diff --git a/xen/arch/x86/hvm/stdvga.c b/xen/arch/x86/hvm/stdvga.c
index 13d1029..dcd532a 100644
--- a/xen/arch/x86/hvm/stdvga.c
+++ b/xen/arch/x86/hvm/stdvga.c
@@ -547,12 +547,27 @@ static int mmio_move(struct hvm_hw_stdvga *s, ioreq_t *p)
return 1;
}

-static int stdvga_intercept_mmio(ioreq_t *p)
+int stdvga_intercept_mmio(ioreq_t *p)
{
struct domain *d = current->domain;
struct hvm_hw_stdvga *s = &d->arch.hvm_domain.stdvga;
+ uint64_t start, end, count = p->count, size = p->size;
int buf = 0, rc;

+ if ( p->df )
+ {
+ start = (p->addr - (count - 1) * size);
+ end = p->addr + size;
+ }
+ else
+ {
+ start = p->addr;
+ end = p->addr + count * size;
+ }
+
+ if ( (start < VGA_MEM_BASE) || (end > (VGA_MEM_BASE + VGA_MEM_SIZE)) )
+ return X86EMUL_UNHANDLEABLE;
+
if ( p->size > 8 )
{
gdprintk(XENLOG_WARNING, "invalid mmio size %d\n", (int)p->size);
@@ -619,9 +634,6 @@ void stdvga_init(struct domain *d)
register_portio_handler(d, 0x3c4, 2, stdvga_intercept_pio);
/* Graphics registers. */
register_portio_handler(d, 0x3ce, 2, stdvga_intercept_pio);
- /* MMIO. */
- register_buffered_io_handler(
- d, VGA_MEM_BASE, VGA_MEM_SIZE, stdvga_intercept_mmio);
}
}

@@ -638,3 +650,13 @@ void stdvga_deinit(struct domain *d)
s->vram_page[i] = NULL;
}
}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
index cbbef9f..9ad909b 100644
--- a/xen/arch/x86/hvm/vioapic.c
+++ b/xen/arch/x86/hvm/vioapic.c
@@ -250,7 +250,7 @@ static int vioapic_range(struct vcpu *v, unsigned long addr)
(addr < vioapic->base_address + VIOAPIC_MEM_LENGTH)));
}

-const struct hvm_mmio_ops vioapic_mmio_ops = {
+static const struct hvm_mmio_ops vioapic_mmio_ops = {
.check = vioapic_range,
.read = vioapic_read,
.write = vioapic_write
@@ -456,6 +456,8 @@ int vioapic_init(struct domain *d)
d->arch.hvm_domain.vioapic->domain = d;
vioapic_reset(d);

+ register_mmio_handler(d, &vioapic_mmio_ops);
+
return 0;
}

diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index d5855cc..f2052cf 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -996,7 +996,7 @@ static int vlapic_range(struct vcpu *v, unsigned long addr)
(offset < PAGE_SIZE);
}

-const struct hvm_mmio_ops vlapic_mmio_ops = {
+static const struct hvm_mmio_ops vlapic_mmio_ops = {
.check = vlapic_range,
.read = vlapic_read,
.write = vlapic_write
@@ -1462,6 +1462,9 @@ int vlapic_init(struct vcpu *v)
vlapic_init_sipi_action,
(unsigned long)v);

+ if ( v->vcpu_id == 0 )
+ register_mmio_handler(v->domain, &vlapic_mmio_ops);
+
return 0;
}

diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index f89233d..09ea301 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -344,7 +344,7 @@ static int msixtbl_range(struct vcpu *v, unsigned long addr)
return !!desc;
}

-const struct hvm_mmio_ops msixtbl_mmio_ops = {
+static const struct hvm_mmio_ops msixtbl_mmio_ops = {
.check = msixtbl_range,
.read = msixtbl_read,
.write = msixtbl_write
@@ -481,6 +481,11 @@ found:
spin_unlock_irq(&irq_desc->lock);
}

+void msixtbl_init(struct domain *d)
+{
+ register_mmio_handler(d, &msixtbl_mmio_ops);
+}
+
void msixtbl_pt_cleanup(struct domain *d)
{
struct msixtbl_entry *entry, *temp;
diff --git a/xen/drivers/passthrough/amd/iommu_guest.c b/xen/drivers/passthrough/amd/iommu_guest.c
index 7b0c102..9c3c488 100644
--- a/xen/drivers/passthrough/amd/iommu_guest.c
+++ b/xen/drivers/passthrough/amd/iommu_guest.c
@@ -868,6 +868,20 @@ static void guest_iommu_reg_init(struct guest_iommu *iommu)
iommu->reg_ext_feature.hi = upper;
}

+static int guest_iommu_mmio_range(struct vcpu *v, unsigned long addr)
+{
+ struct guest_iommu *iommu = vcpu_iommu(v);
+
+ return iommu && addr >= iommu->mmio_base &&
+ addr < iommu->mmio_base + IOMMU_MMIO_SIZE;
+}
+
+static const struct hvm_mmio_ops iommu_mmio_ops = {
+ .check = guest_iommu_mmio_range,
+ .read = guest_iommu_mmio_read,
+ .write = guest_iommu_mmio_write
+};
+
/* Domain specific initialization */
int guest_iommu_init(struct domain* d)
{
@@ -894,6 +908,8 @@ int guest_iommu_init(struct domain* d)

spin_lock_init(&iommu->lock);

+ register_mmio_handler(d, &iommu_mmio_ops);
+
return 0;
}

@@ -910,17 +926,3 @@ void guest_iommu_destroy(struct domain *d)

domain_hvm_iommu(d)->arch.g_iommu = NULL;
}
-
-static int guest_iommu_mmio_range(struct vcpu *v, unsigned long addr)
-{
- struct guest_iommu *iommu = vcpu_iommu(v);
-
- return iommu && addr >= iommu->mmio_base &&
- addr < iommu->mmio_base + IOMMU_MMIO_SIZE;
-}
-
-const struct hvm_mmio_ops iommu_mmio_ops = {
- .check = guest_iommu_mmio_range,
- .read = guest_iommu_mmio_read,
- .write = guest_iommu_mmio_write
-};
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index ad68fcf..b612755 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -94,6 +94,7 @@ struct hvm_domain {
struct pl_time pl_time;

struct hvm_io_handler *io_handler;
+ unsigned int io_handler_count;

/* Lock protects access to irq, vpic and vioapic. */
spinlock_t irq_lock;
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index f2aaec5..fecd02d 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -25,10 +25,7 @@
#include <public/hvm/ioreq.h>
#include <public/event_channel.h>

-#define MAX_IO_HANDLER 16
-
-#define HVM_PORTIO 0
-#define HVM_BUFFERED_IO 2
+#define NR_IO_HANDLERS 32

typedef int (*hvm_mmio_read_t)(struct vcpu *v,
unsigned long addr,
@@ -40,81 +37,57 @@ typedef int (*hvm_mmio_write_t)(struct vcpu *v,
unsigned long val);
typedef int (*hvm_mmio_check_t)(struct vcpu *v, unsigned long addr);

-typedef int (*portio_action_t)(
- int dir, uint32_t port, uint32_t bytes, uint32_t *val);
-typedef int (*mmio_action_t)(ioreq_t *);
-struct io_handler {
- int type;
- unsigned long addr;
- unsigned long size;
- union {
- portio_action_t portio;
- mmio_action_t mmio;
- void *ptr;
- } action;
-};
-
-struct hvm_io_handler {
- int num_slot;
- struct io_handler hdl_list[MAX_IO_HANDLER];
-};
-
struct hvm_mmio_ops {
hvm_mmio_check_t check;
hvm_mmio_read_t read;
hvm_mmio_write_t write;
};

-extern const struct hvm_mmio_ops hpet_mmio_ops;
-extern const struct hvm_mmio_ops vlapic_mmio_ops;
-extern const struct hvm_mmio_ops vioapic_mmio_ops;
-extern const struct hvm_mmio_ops msixtbl_mmio_ops;
-extern const struct hvm_mmio_ops iommu_mmio_ops;
-
-#define HVM_MMIO_HANDLER_NR 5
+typedef int (*portio_action_t)(
+ int dir, uint32_t port, uint32_t bytes, uint32_t *val);

-int hvm_io_intercept(ioreq_t *p, int type);
-void register_io_handler(
- struct domain *d, unsigned long addr, unsigned long size,
- void *action, int type);
-void relocate_io_handler(
- struct domain *d, unsigned long old_addr, unsigned long new_addr,
- unsigned long size, int type);
+struct hvm_io_handler {
+ union {
+ struct {
+ const struct hvm_mmio_ops *ops;
+ } mmio;
+ struct {
+ uint32_t start, end;
+ portio_action_t action;
+ } portio;
+ } u;
+ uint8_t type;
+};

-static inline int hvm_portio_intercept(ioreq_t *p)
-{
- return hvm_io_intercept(p, HVM_PORTIO);
-}
+typedef int (*hvm_io_read_t)(struct hvm_io_handler *,
+ uint64_t addr,
+ uint64_t size,
+ uint64_t *data);
+typedef int (*hvm_io_write_t)(struct hvm_io_handler *,
+ uint64_t addr,
+ uint64_t size,
+ uint64_t data);
+typedef bool_t (*hvm_io_accept_t)(struct hvm_io_handler *,
+ uint64_t addr,
+ uint64_t size);
+
+struct hvm_io_ops {
+ hvm_io_accept_t accept;
+ hvm_io_read_t read;
+ hvm_io_write_t write;
+};

-static inline int hvm_buffered_io_intercept(ioreq_t *p)
-{
- return hvm_io_intercept(p, HVM_BUFFERED_IO);
-}
+int hvm_io_intercept(ioreq_t *p);

+void register_mmio_handler(struct domain *d,
+ const struct hvm_mmio_ops *ops);
+void register_portio_handler(struct domain *d, uint32_t addr,
+ uint32_t size, portio_action_t action);
+void relocate_portio_handler(struct domain *d, uint32_t old_addr,
+ uint32_t new_addr, uint32_t size);
bool_t hvm_mmio_internal(paddr_t gpa);
-int hvm_mmio_intercept(ioreq_t *p);
-int hvm_buffered_io_send(ioreq_t *p);

-static inline void register_portio_handler(
- struct domain *d, unsigned long addr,
- unsigned long size, portio_action_t action)
-{
- register_io_handler(d, addr, size, action, HVM_PORTIO);
-}
-
-static inline void relocate_portio_handler(
- struct domain *d, unsigned long old_addr, unsigned long new_addr,
- unsigned long size)
-{
- relocate_io_handler(d, old_addr, new_addr, size, HVM_PORTIO);
-}
-
-static inline void register_buffered_io_handler(
- struct domain *d, unsigned long addr,
- unsigned long size, mmio_action_t action)
-{
- register_io_handler(d, addr, size, action, HVM_BUFFERED_IO);
-}
+int hvm_buffered_io_send(ioreq_t *p);

void send_timeoffset_req(unsigned long timeoff);
void send_invalidate_req(void);
@@ -127,6 +100,7 @@ void hvm_io_assist(ioreq_t *p);
void hvm_dpci_eoi(struct domain *d, unsigned int guest_irq,
const union vioapic_redir_entry *ent);
void msix_write_completion(struct vcpu *);
+void msixtbl_init(struct domain *d);

struct hvm_hw_stdvga {
uint8_t sr_index;
@@ -141,8 +115,19 @@ struct hvm_hw_stdvga {
};

void stdvga_init(struct domain *d);
+int stdvga_intercept_mmio(ioreq_t *p);
void stdvga_deinit(struct domain *d);

extern void hvm_dpci_msi_eoi(struct domain *d, int vector);
#endif /* __ASM_X86_HVM_IO_H__ */

+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
--
1.7.10.4
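For reference, a minimal sketch of how an internal handler would be wired up against the reworked port-I/O interface declared above. The port number (0x42) and the handler body are made up for illustration; only portio_action_t, register_portio_handler(), IOREQ_READ and X86EMUL_OKAY come from the existing code, and the usual Xen headers are assumed.

static int example_portio(int dir, uint32_t port, uint32_t bytes,
                          uint32_t *val)
{
    /* Reads return zero; writes are discarded. */
    if ( dir == IOREQ_READ )
        *val = 0;

    return X86EMUL_OKAY;
}

static void example_portio_init(struct domain *d)
{
    /* Claim a single byte-wide port at 0x42 (hypothetical). */
    register_portio_handler(d, 0x42, 1, example_portio);
}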
Paul Durrant
2015-06-24 11:24:39 UTC
Permalink
When memory mapped I/O is range checked by internal handlers, the length
of the access should be taken into account.

Signed-off-by: Paul Durrant <***@citrix.com>
Cc: Keir Fraser <***@xen.org>
Cc: Jan Beulich <***@suse.com>
Cc: Andrew Cooper <***@citrix.com>
---
xen/arch/x86/hvm/hpet.c | 7 ++++---
xen/arch/x86/hvm/intercept.c | 2 +-
xen/arch/x86/hvm/vioapic.c | 17 ++++++++++++++---
xen/arch/x86/hvm/vlapic.c | 8 +++++---
xen/arch/x86/hvm/vmsi.c | 27 ++++++++++++++++++++-------
xen/drivers/passthrough/amd/iommu_guest.c | 18 +++++++++++++++---
xen/include/asm-x86/hvm/io.h | 4 +++-
7 files changed, 62 insertions(+), 21 deletions(-)

diff --git a/xen/arch/x86/hvm/hpet.c b/xen/arch/x86/hvm/hpet.c
index 8958873..1a1f239 100644
--- a/xen/arch/x86/hvm/hpet.c
+++ b/xen/arch/x86/hvm/hpet.c
@@ -498,10 +498,11 @@ static int hpet_write(
return X86EMUL_OKAY;
}

-static int hpet_range(struct vcpu *v, unsigned long addr)
+static int hpet_range(struct vcpu *v, unsigned long addr,
+ unsigned long length)
{
- return ( (addr >= HPET_BASE_ADDRESS) &&
- (addr < (HPET_BASE_ADDRESS + HPET_MMAP_SIZE)) );
+ return (addr >= HPET_BASE_ADDRESS) &&
+ ((addr + length) < (HPET_BASE_ADDRESS + HPET_MMAP_SIZE));
}

static const struct hvm_mmio_ops hpet_mmio_ops = {
diff --git a/xen/arch/x86/hvm/intercept.c b/xen/arch/x86/hvm/intercept.c
index 4db024e..5e8d8b2 100644
--- a/xen/arch/x86/hvm/intercept.c
+++ b/xen/arch/x86/hvm/intercept.c
@@ -38,7 +38,7 @@ static bool_t hvm_mmio_accept(struct hvm_io_handler *handler,
{
BUG_ON(handler->type != IOREQ_TYPE_COPY);

- return handler->u.mmio.ops->check(current, addr);
+ return handler->u.mmio.ops->check(current, addr, size);
}

static int hvm_mmio_read(struct hvm_io_handler *handler,
diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
index 9ad909b..4a9b33e 100644
--- a/xen/arch/x86/hvm/vioapic.c
+++ b/xen/arch/x86/hvm/vioapic.c
@@ -242,12 +242,13 @@ static int vioapic_write(
return X86EMUL_OKAY;
}

-static int vioapic_range(struct vcpu *v, unsigned long addr)
+static int vioapic_range(struct vcpu *v, unsigned long addr,
+ unsigned long length)
{
struct hvm_hw_vioapic *vioapic = domain_vioapic(v->domain);

- return ((addr >= vioapic->base_address &&
- (addr < vioapic->base_address + VIOAPIC_MEM_LENGTH)));
+ return (addr >= vioapic->base_address) &&
+ ((addr + length) <= (vioapic->base_address + VIOAPIC_MEM_LENGTH));
}

static const struct hvm_mmio_ops vioapic_mmio_ops = {
@@ -466,3 +467,13 @@ void vioapic_deinit(struct domain *d)
xfree(d->arch.hvm_domain.vioapic);
d->arch.hvm_domain.vioapic = NULL;
}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index f2052cf..7421fc5 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -986,14 +986,16 @@ int hvm_x2apic_msr_write(struct vcpu *v, unsigned int msr, uint64_t msr_content)
return vlapic_reg_write(v, offset, (uint32_t)msr_content);
}

-static int vlapic_range(struct vcpu *v, unsigned long addr)
+static int vlapic_range(struct vcpu *v, unsigned long address,
+ unsigned long len)
{
struct vlapic *vlapic = vcpu_vlapic(v);
- unsigned long offset = addr - vlapic_base_address(vlapic);
+ unsigned long offset = address - vlapic_base_address(vlapic);

return !vlapic_hw_disabled(vlapic) &&
!vlapic_x2apic_mode(vlapic) &&
- (offset < PAGE_SIZE);
+ (address >= vlapic_base_address(vlapic)) &&
+ ((offset + len) <= PAGE_SIZE);
}

static const struct hvm_mmio_ops vlapic_mmio_ops = {
diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index 09ea301..61fe391 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -168,14 +168,14 @@ struct msixtbl_entry
static DEFINE_RCU_READ_LOCK(msixtbl_rcu_lock);

static struct msixtbl_entry *msixtbl_find_entry(
- struct vcpu *v, unsigned long addr)
+ struct vcpu *v, unsigned long address, unsigned long len)
{
struct msixtbl_entry *entry;
struct domain *d = v->domain;

list_for_each_entry( entry, &d->arch.hvm_domain.msixtbl_list, list )
- if ( addr >= entry->gtable &&
- addr < entry->gtable + entry->table_len )
+ if ( (address >= entry->gtable) &&
+ ((address + len) <= (entry->gtable + entry->table_len)) )
return entry;

return NULL;
@@ -214,7 +214,7 @@ static int msixtbl_read(

rcu_read_lock(&msixtbl_rcu_lock);

- entry = msixtbl_find_entry(v, address);
+ entry = msixtbl_find_entry(v, address, len);
if ( !entry )
goto out;
offset = address & (PCI_MSIX_ENTRY_SIZE - 1);
@@ -273,7 +273,7 @@ static int msixtbl_write(struct vcpu *v, unsigned long address,

rcu_read_lock(&msixtbl_rcu_lock);

- entry = msixtbl_find_entry(v, address);
+ entry = msixtbl_find_entry(v, address, len);
if ( !entry )
goto out;
nr_entry = (address - entry->gtable) / PCI_MSIX_ENTRY_SIZE;
@@ -333,12 +333,15 @@ out:
return r;
}

-static int msixtbl_range(struct vcpu *v, unsigned long addr)
+static int msixtbl_range(struct vcpu *v, unsigned long address,
+ unsigned long len)
{
+ struct msixtbl_entry *entry;
const struct msi_desc *desc;

rcu_read_lock(&msixtbl_rcu_lock);
- desc = msixtbl_addr_to_desc(msixtbl_find_entry(v, addr), addr);
+ entry = msixtbl_find_entry(v, address, len);
+ desc = msixtbl_addr_to_desc(entry, address);
rcu_read_unlock(&msixtbl_rcu_lock);

return !!desc;
@@ -514,3 +517,13 @@ void msix_write_completion(struct vcpu *v)
if ( msixtbl_write(v, ctrl_address, 4, 0) != X86EMUL_OKAY )
gdprintk(XENLOG_WARNING, "MSI-X write completion failure\n");
}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/drivers/passthrough/amd/iommu_guest.c b/xen/drivers/passthrough/amd/iommu_guest.c
index 9c3c488..dd84281 100644
--- a/xen/drivers/passthrough/amd/iommu_guest.c
+++ b/xen/drivers/passthrough/amd/iommu_guest.c
@@ -868,12 +868,14 @@ static void guest_iommu_reg_init(struct guest_iommu *iommu)
iommu->reg_ext_feature.hi = upper;
}

-static int guest_iommu_mmio_range(struct vcpu *v, unsigned long addr)
+static int guest_iommu_mmio_range(struct vcpu *v, unsigned long addr,
+ unsigned long length)
{
struct guest_iommu *iommu = vcpu_iommu(v);

- return iommu && addr >= iommu->mmio_base &&
- addr < iommu->mmio_base + IOMMU_MMIO_SIZE;
+ return iommu &&
+ (addr >= iommu->mmio_base) &&
+ ((addr + length) <= (iommu->mmio_base + IOMMU_MMIO_SIZE));
}

static const struct hvm_mmio_ops iommu_mmio_ops = {
@@ -926,3 +928,13 @@ void guest_iommu_destroy(struct domain *d)

domain_hvm_iommu(d)->arch.g_iommu = NULL;
}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index fecd02d..b4596fc 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -35,7 +35,9 @@ typedef int (*hvm_mmio_write_t)(struct vcpu *v,
unsigned long addr,
unsigned long length,
unsigned long val);
-typedef int (*hvm_mmio_check_t)(struct vcpu *v, unsigned long addr);
+typedef int (*hvm_mmio_check_t)(struct vcpu *v,
+ unsigned long addr,
+ unsigned long length);

struct hvm_mmio_ops {
hvm_mmio_check_t check;
--
1.7.10.4
Jan Beulich
2015-06-24 13:08:24 UTC
Permalink
Post by Paul Durrant
--- a/xen/arch/x86/hvm/hpet.c
+++ b/xen/arch/x86/hvm/hpet.c
@@ -498,10 +498,11 @@ static int hpet_write(
return X86EMUL_OKAY;
}
-static int hpet_range(struct vcpu *v, unsigned long addr)
+static int hpet_range(struct vcpu *v, unsigned long addr,
+ unsigned long length)
{
- return ( (addr >= HPET_BASE_ADDRESS) &&
- (addr < (HPET_BASE_ADDRESS + HPET_MMAP_SIZE)) );
+ return (addr >= HPET_BASE_ADDRESS) &&
+ ((addr + length) < (HPET_BASE_ADDRESS + HPET_MMAP_SIZE));
<=
Post by Paul Durrant
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -986,14 +986,16 @@ int hvm_x2apic_msr_write(struct vcpu *v, unsigned int msr, uint64_t msr_content)
return vlapic_reg_write(v, offset, (uint32_t)msr_content);
}
-static int vlapic_range(struct vcpu *v, unsigned long addr)
+static int vlapic_range(struct vcpu *v, unsigned long address,
+ unsigned long len)
{
struct vlapic *vlapic = vcpu_vlapic(v);
- unsigned long offset = addr - vlapic_base_address(vlapic);
+ unsigned long offset = address - vlapic_base_address(vlapic);
return !vlapic_hw_disabled(vlapic) &&
!vlapic_x2apic_mode(vlapic) &&
- (offset < PAGE_SIZE);
+ (address >= vlapic_base_address(vlapic)) &&
+ ((offset + len) <= PAGE_SIZE);
I'd prefer to stay with checking just offset here, unless you see
anything wrong with that.
Post by Paul Durrant
return r;
}
-static int msixtbl_range(struct vcpu *v, unsigned long addr)
+static int msixtbl_range(struct vcpu *v, unsigned long address,
+ unsigned long len)
{
+ struct msixtbl_entry *entry;
const struct msi_desc *desc;
rcu_read_lock(&msixtbl_rcu_lock);
- desc = msixtbl_addr_to_desc(msixtbl_find_entry(v, addr), addr);
+ entry = msixtbl_find_entry(v, address, len);
+ desc = msixtbl_addr_to_desc(entry, address);
Again I don't see the need to do more adjustments here than
necessary for your purpose.
Post by Paul Durrant
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -35,7 +35,9 @@ typedef int (*hvm_mmio_write_t)(struct vcpu *v,
unsigned long addr,
unsigned long length,
unsigned long val);
-typedef int (*hvm_mmio_check_t)(struct vcpu *v, unsigned long addr);
+typedef int (*hvm_mmio_check_t)(struct vcpu *v,
+ unsigned long addr,
+ unsigned long length);
I don't think this really needs to be "long"?

Jan
Paul Durrant
2015-06-24 13:14:13 UTC
Permalink
-----Original Message-----
Sent: 24 June 2015 14:08
To: Paul Durrant
Subject: Re: [Xen-devel] [PATCH v4 07/17] x86/hvm: add length to mmio
check op
Post by Paul Durrant
--- a/xen/arch/x86/hvm/hpet.c
+++ b/xen/arch/x86/hvm/hpet.c
@@ -498,10 +498,11 @@ static int hpet_write(
return X86EMUL_OKAY;
}
-static int hpet_range(struct vcpu *v, unsigned long addr)
+static int hpet_range(struct vcpu *v, unsigned long addr,
+ unsigned long length)
{
- return ( (addr >= HPET_BASE_ADDRESS) &&
- (addr < (HPET_BASE_ADDRESS + HPET_MMAP_SIZE)) );
+ return (addr >= HPET_BASE_ADDRESS) &&
+ ((addr + length) < (HPET_BASE_ADDRESS + HPET_MMAP_SIZE));
<=
Yep.
Post by Paul Durrant
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -986,14 +986,16 @@ int hvm_x2apic_msr_write(struct vcpu *v,
unsigned int msr, uint64_t msr_content)
Post by Paul Durrant
return vlapic_reg_write(v, offset, (uint32_t)msr_content);
}
-static int vlapic_range(struct vcpu *v, unsigned long addr)
+static int vlapic_range(struct vcpu *v, unsigned long address,
+ unsigned long len)
{
struct vlapic *vlapic = vcpu_vlapic(v);
- unsigned long offset = addr - vlapic_base_address(vlapic);
+ unsigned long offset = address - vlapic_base_address(vlapic);
return !vlapic_hw_disabled(vlapic) &&
!vlapic_x2apic_mode(vlapic) &&
- (offset < PAGE_SIZE);
+ (address >= vlapic_base_address(vlapic)) &&
+ ((offset + len) <= PAGE_SIZE);
I'd prefer to stay with checking just offset here, unless you see
anything wrong with that.
No, I'm happy with that if you are.
Post by Paul Durrant
return r;
}
-static int msixtbl_range(struct vcpu *v, unsigned long addr)
+static int msixtbl_range(struct vcpu *v, unsigned long address,
+ unsigned long len)
{
+ struct msixtbl_entry *entry;
const struct msi_desc *desc;
rcu_read_lock(&msixtbl_rcu_lock);
- desc = msixtbl_addr_to_desc(msixtbl_find_entry(v, addr), addr);
+ entry = msixtbl_find_entry(v, address, len);
+ desc = msixtbl_addr_to_desc(entry, address);
Again I don't see the need to do more adjustments here than
necessary for your purpose.
Sure.
Post by Paul Durrant
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -35,7 +35,9 @@ typedef int (*hvm_mmio_write_t)(struct vcpu *v,
unsigned long addr,
unsigned long length,
unsigned long val);
-typedef int (*hvm_mmio_check_t)(struct vcpu *v, unsigned long addr);
+typedef int (*hvm_mmio_check_t)(struct vcpu *v,
+ unsigned long addr,
+ unsigned long length);
I don't think this really needs to be "long"?
For consistency with the mmio read and write function types I went with 'long'. Is there any harm in that?

Paul
Jan
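To make the boundary condition concrete: addr + length is one past the last byte touched, so with the agreed "<=" an access whose final byte is the last byte of the HPET window is still claimed. A sketch of the corrected check, using the constants from the patch (illustrative, not the committed code):

static int hpet_range(struct vcpu *v, unsigned long addr,
                      unsigned long length)
{
    /*
     * E.g. a 4-byte access at HPET_BASE_ADDRESS + HPET_MMAP_SIZE - 4
     * gives addr + length == HPET_BASE_ADDRESS + HPET_MMAP_SIZE and
     * must be accepted, hence "<=" rather than "<".
     */
    return (addr >= HPET_BASE_ADDRESS) &&
           ((addr + length) <= (HPET_BASE_ADDRESS + HPET_MMAP_SIZE));
}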
Jan Beulich
2015-06-24 13:25:08 UTC
Permalink
Post by Paul Durrant
Sent: 24 June 2015 14:08
Post by Paul Durrant
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -35,7 +35,9 @@ typedef int (*hvm_mmio_write_t)(struct vcpu *v,
unsigned long addr,
unsigned long length,
unsigned long val);
-typedef int (*hvm_mmio_check_t)(struct vcpu *v, unsigned long addr);
+typedef int (*hvm_mmio_check_t)(struct vcpu *v,
+ unsigned long addr,
+ unsigned long length);
I don't think this really needs to be "long"?
For consistency with the mmio read and write function types I went with
'long'. Is there any harm in that?
Generally generates worse code (due to the need for the REX64
prefix on all involved instructions). Perhaps the other ones don't
need sizes/lengths passed as longs either?

Jan
Paul Durrant
2015-06-24 13:34:45 UTC
Permalink
-----Original Message-----
Sent: 24 June 2015 14:25
To: Paul Durrant
Subject: RE: [Xen-devel] [PATCH v4 07/17] x86/hvm: add length to mmio
check op
Post by Paul Durrant
Sent: 24 June 2015 14:08
Post by Paul Durrant
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -35,7 +35,9 @@ typedef int (*hvm_mmio_write_t)(struct vcpu *v,
unsigned long addr,
unsigned long length,
unsigned long val);
-typedef int (*hvm_mmio_check_t)(struct vcpu *v, unsigned long
addr);
Post by Paul Durrant
Post by Paul Durrant
+typedef int (*hvm_mmio_check_t)(struct vcpu *v,
+ unsigned long addr,
+ unsigned long length);
I don't think this really needs to be "long"?
For consistency with the mmio read and write function types I went with
'long'. Is there any harm in that?
Generally generates worse code (due to the need for the REX64
prefix on all involved instructions). Perhaps the other ones don't
need sizes/lengths passed as longs either?
I'm happy to do it that way round if you don't mind the extra diffs. I'll do it as a separate patch just before this one, to ease review.

Paul
Jan
Jan Beulich
2015-06-24 14:01:17 UTC
Permalink
Post by Paul Durrant
-----Original Message-----
Sent: 24 June 2015 14:25
To: Paul Durrant
Subject: RE: [Xen-devel] [PATCH v4 07/17] x86/hvm: add length to mmio
check op
Post by Paul Durrant
Sent: 24 June 2015 14:08
Post by Paul Durrant
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -35,7 +35,9 @@ typedef int (*hvm_mmio_write_t)(struct vcpu *v,
unsigned long addr,
unsigned long length,
unsigned long val);
-typedef int (*hvm_mmio_check_t)(struct vcpu *v, unsigned long
addr);
Post by Paul Durrant
Post by Paul Durrant
+typedef int (*hvm_mmio_check_t)(struct vcpu *v,
+ unsigned long addr,
+ unsigned long length);
I don't think this really needs to be "long"?
For consistency with the mmio read and write function types I went with
'long'. Is there any harm in that?
Generally generates worse code (due to the need for the REX64
prefix on all involved instructions). Perhaps the other ones don't
need sizes/lengths passed as longs either?
I'm happy to do it that way round if you don't mind the extra diffs. I'll do
it as a separate patch just before this one, to ease review.
Thanks. There's actually a good example of why they shouldn't be
unsigned long in patch 9 I just looked at:

+static int stdvga_mem_write(struct vcpu *v, unsigned long addr,
+ unsigned long size, unsigned long data)
{
+ ioreq_t p = { .type = IOREQ_TYPE_COPY,
+ .addr = addr,
+ .size = size,

This clearly would truncate size if it ever exceeded a uint32_t (and
hence an unsigned int on x86).

Jan
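A standalone illustration (not Xen code) of the truncation being pointed out, assuming a 64-bit unsigned long: initialising a uint32_t field from a wider value silently drops the upper bits.

#include <stdint.h>
#include <stdio.h>

struct fake_ioreq { uint32_t size; };       /* stand-in for ioreq_t's size field */

int main(void)
{
    unsigned long size = (1UL << 32) + 4;   /* wider than uint32_t */
    struct fake_ioreq p = { .size = size }; /* silently truncated */

    printf("%u\n", p.size);                 /* prints 4, not the real size */
    return 0;
}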
Andrew Cooper
2015-06-25 12:21:27 UTC
Permalink
Post by Paul Durrant
When memory mapped I/O is range checked by internal handlers, the length
of the access should be taken into account.
For what purpose? The length of the access doesn't affect which handler
should accept the IO.

This length check now causes an MMIO handler to not claim an access
which straddles the upper boundary.

It is probably fine to terminate such an access early, but it isn't fine
to pass such a straddled access to the default ioreq server.

~Andrew
Jan Beulich
2015-06-25 12:46:15 UTC
Permalink
Post by Andrew Cooper
Post by Paul Durrant
When memory mapped I/O is range checked by internal handlers, the length
of the access should be taken into account.
For what purpose? The length of the access doesn't affect which handler
should accept the IO.
This length check now causes an MMIO handler to not claim an access
which straddles the upper boundary.
It is probably fine to terminate such an access early, but it isn't fine
to pass such a straddled access to the default ioreq server.
No, without involving the length in the check we can end up with
check() saying "Yes, mine" but read() or write() saying "Not me".
What I would agree with is for the generic handler to split the
access if the first byte fits, but the final byte doesn't.

Jan
Paul Durrant
2015-06-25 13:08:15 UTC
Permalink
-----Original Message-----
Sent: 25 June 2015 13:46
To: Andrew Cooper
Subject: Re: [PATCH v4 07/17] x86/hvm: add length to mmio check op
Post by Andrew Cooper
Post by Paul Durrant
When memory mapped I/O is range checked by internal handlers, the
length
Post by Andrew Cooper
Post by Paul Durrant
of the access should be taken into account.
For what purpose? The length of the access doesn't affect which handler
should accept the IO.
This length check now causes an MMIO handler to not claim an access
which straddles the upper boundary.
It is probably fine to terminate such an access early, but it isn't fine
to pass such a straddled access to the default ioreq server.
No, without involving the length in the check we can end up with
check() saying "Yes, mine" but read() or write() saying "Not me".
What I would agree with is for the generic handler to split the
access if the first byte fits, but the final byte doesn't.
That's not a trivial thing to do. Could we, for now, have the check claim based on address but domain_crash() if length does not fit?

Paul
Jan
Jan Beulich
2015-06-25 13:29:26 UTC
Permalink
Post by Paul Durrant
-----Original Message-----
Sent: 25 June 2015 13:46
To: Andrew Cooper
Subject: Re: [PATCH v4 07/17] x86/hvm: add length to mmio check op
Post by Andrew Cooper
Post by Paul Durrant
When memory mapped I/O is range checked by internal handlers, the
length
Post by Andrew Cooper
Post by Paul Durrant
of the access should be taken into account.
For what purpose? The length of the access doesn't affect which handler
should accept the IO.
This length check now causes an MMIO handler to not claim an access
which straddles the upper boundary.
It is probably fine to terminate such an access early, but it isn't fine
to pass such a straddled access to the default ioreq server.
No, without involving the length in the check we can end up with
check() saying "Yes, mine" but read() or write() saying "Not me".
What I would agree with is for the generic handler to split the
access if the first byte fits, but the final byte doesn't.
That's not a trivial thing to do. Could we, for now, have the check claim
based on address but domain_crash() if length does not fit?
Would seem acceptable to me; if problems arise we could drop
that domain_crash() later on with a trivial patch.

Jan
Paul Durrant
2015-06-25 13:30:52 UTC
Permalink
-----Original Message-----
Sent: 25 June 2015 14:29
To: Andrew Cooper; Paul Durrant
Subject: RE: [PATCH v4 07/17] x86/hvm: add length to mmio check op
Post by Paul Durrant
-----Original Message-----
Sent: 25 June 2015 13:46
To: Andrew Cooper
Subject: Re: [PATCH v4 07/17] x86/hvm: add length to mmio check op
Post by Andrew Cooper
Post by Paul Durrant
When memory mapped I/O is range checked by internal handlers, the
length
Post by Andrew Cooper
Post by Paul Durrant
of the access should be taken into account.
For what purpose? The length of the access doesn't affect which
handler
Post by Paul Durrant
Post by Andrew Cooper
should accept the IO.
This length check now causes an MMIO handler to not claim an access
which straddles the upper boundary.
It is probably fine to terminate such an access early, but it isn't fine
to pass such a straddled access to the default ioreq server.
No, without involving the length in the check we can end up with
check() saying "Yes, mine" but read() or write() saying "Not me".
What I would agree with is for the generic handler to split the
access if the first byte fits, but the final byte doesn't.
That's not a trivial thing to do. Could we, for now, have the check claim
based on address but domain_crash() if length does not fit?
Would seem acceptable to me; if problems arise we could drop
that domain_crash() later on with a trivial patch.
Ok. Thanks,

Paul
Jan
Andrew Cooper
2015-06-25 13:34:01 UTC
Permalink
Post by Jan Beulich
Post by Andrew Cooper
Post by Paul Durrant
When memory mapped I/O is range checked by internal handlers, the length
of the access should be taken into account.
For what purpose? The length of the access doesn't affect which handler
should accept the IO.
This length check now causes an MMIO handler to not claim an access
which straddles the upper boundary.
It is probably fine to terminate such an access early, but it isn't fine
to pass such a straddled access to the default ioreq server.
No, without involving the length in the check we can end up with
check() saying "Yes, mine" but read() or write() saying "Not me".
What I would agree with is for the generic handler to split the
access if the first byte fits, but the final byte doesn't.
I discussed this with Paul over lunch. I had not considered how IO gets
forwarded to the device model for shared implementations.

Is it reasonable to split a straddled access and direct the halves at
different handlers? This is not in line with how other hardware behaves
(PCIe will reject any straddled access). Furthermore, given small MMIO
regions and larger registers, there is no guarantee that a single split
will suffice.

I see in the other thread going on that a domain_crash() is deemed ok
for now, which is fine by me.

~Andrew
Paul Durrant
2015-06-25 13:36:13 UTC
Permalink
-----Original Message-----
Sent: 25 June 2015 14:34
To: Jan Beulich
Subject: Re: [PATCH v4 07/17] x86/hvm: add length to mmio check op
Post by Jan Beulich
Post by Andrew Cooper
Post by Paul Durrant
When memory mapped I/O is range checked by internal handlers, the
length
Post by Jan Beulich
Post by Andrew Cooper
Post by Paul Durrant
of the access should be taken into account.
For what purpose? The length of the access doesn't affect which handler
should accept the IO.
This length check now causes an MMIO handler to not claim an access
which straddles the upper boundary.
It is probably fine to terminate such an access early, but it isn't fine
to pass such a straddled access to the default ioreq server.
No, without involving the length in the check we can end up with
check() saying "Yes, mine" but read() or write() saying "Not me".
What I would agree with is for the generic handler to split the
access if the first byte fits, but the final byte doesn't.
I discussed this with Paul over lunch. I had not considered how IO gets
forwarded to the device model for shared implementations.
Is it reasonable to split a straddled access and direct the halves at
different handlers? This is not in line with how other hardware behaves
(PCIe will reject any straddled access). Furthermore, given small MMIO
regions and larger registers, there is no guarantee that a single split
will suffice.
I see in the other thread going on that a domain_crash() is deemed ok
for now, which is fine by me.
I think that also allows me to simplify the patch since I don't have to modify the mmio_check op any more. I simply call it once for the first byte of the access and, if it accepts, verify that it also accepts the last byte of the access.

Paul
~Andrew
Andrew Cooper
2015-06-25 13:37:44 UTC
Permalink
Post by Paul Durrant
-----Original Message-----
Sent: 25 June 2015 14:34
To: Jan Beulich
Subject: Re: [PATCH v4 07/17] x86/hvm: add length to mmio check op
Post by Jan Beulich
Post by Andrew Cooper
Post by Paul Durrant
When memory mapped I/O is range checked by internal handlers, the
length
Post by Jan Beulich
Post by Andrew Cooper
Post by Paul Durrant
of the access should be taken into account.
For what purpose? The length of the access doesn't affect which handler
should accept the IO.
This length check now causes an MMIO handler to not claim an access
which straddles the upper boundary.
It is probably fine to terminate such an access early, but it isn't fine
to pass such a straddled access to the default ioreq server.
No, without involving the length in the check we can end up with
check() saying "Yes, mine" but read() or write() saying "Not me".
What I would agree with is for the generic handler to split the
access if the first byte fits, but the final byte doesn't.
I discussed this with Paul over lunch. I had not considered how IO gets
forwarded to the device model for shared implementations.
Is it reasonable to split a straddled access and direct the halves at
different handlers? This is not in line with how other hardware behaves
(PCIe will reject any straddled access). Furthermore, given small MMIO
regions and larger registers, there is no guarantee that a single split
will suffice.
I see in the other thread going on that a domain_crash() is deemed ok
for now, which is fine by me.
I think that also allows me to simplify the patch since I don't have to modify the mmio_check op any more. I simply call it once for the first byte of the access and, if it accepts, verify that it also accepts the last byte of the access.
At that point, I would say it would be easier to modify the claim check
to return "yes/straddled/no" rather than calling it twice.

~Andrew
Paul Durrant
2015-06-25 13:38:52 UTC
Permalink
-----Original Message-----
Sent: 25 June 2015 14:38
To: Paul Durrant; Jan Beulich
Subject: Re: [PATCH v4 07/17] x86/hvm: add length to mmio check op
Post by Paul Durrant
-----Original Message-----
Sent: 25 June 2015 14:34
To: Jan Beulich
Subject: Re: [PATCH v4 07/17] x86/hvm: add length to mmio check op
Post by Jan Beulich
Post by Andrew Cooper
Post by Paul Durrant
When memory mapped I/O is range checked by internal handlers, the
length
Post by Jan Beulich
Post by Andrew Cooper
Post by Paul Durrant
of the access should be taken into account.
For what purpose? The length of the access doesn't affect which
handler
Post by Paul Durrant
Post by Jan Beulich
Post by Andrew Cooper
should accept the IO.
This length check now causes an MMIO handler to not claim an access
which straddles the upper boundary.
It is probably fine to terminate such an access early, but it isn't fine
to pass such a straddled access to the default ioreq server.
No, without involving the length in the check we can end up with
check() saying "Yes, mine" but read() or write() saying "Not me".
What I would agree with is for the generic handler to split the
access if the first byte fits, but the final byte doesn't.
I discussed this with Paul over lunch. I had not considered how IO gets
forwarded to the device model for shared implementations.
Is it reasonable to split a straddled access and direct the halves at
different handlers? This is not in line with how other hardware behaves
(PCIe will reject any straddled access). Furthermore, given small MMIO
regions and larger registers, there is no guarantee that a single split
will suffice.
I see in the other thread going on that a domain_crash() is deemed ok
for now, which is fine by me.
I think that also allows me to simplify the patch since I don't have to modify
the mmio_check op any more. I simply call it once for the first byte of the
access and, if it accepts, verify that it also accepts the last byte of the access.
At that point, I would say it would be easier to modify the claim check
to return "yes/straddled/no" rather than calling it twice.
That's excessive code churn, I think. The check functions are generally cheap and the second call is only made if the first accepts.

Paul
~Andrew
Andrew Cooper
2015-06-25 13:46:49 UTC
Permalink
Post by Paul Durrant
-----Original Message-----
Sent: 25 June 2015 14:38
To: Paul Durrant; Jan Beulich
Subject: Re: [PATCH v4 07/17] x86/hvm: add length to mmio check op
Post by Paul Durrant
-----Original Message-----
Sent: 25 June 2015 14:34
To: Jan Beulich
Subject: Re: [PATCH v4 07/17] x86/hvm: add length to mmio check op
Post by Jan Beulich
Post by Andrew Cooper
Post by Paul Durrant
When memory mapped I/O is range checked by internal handlers, the
length
Post by Jan Beulich
Post by Andrew Cooper
Post by Paul Durrant
of the access should be taken into account.
For what purpose? The length of the access doesn't affect which
handler
Post by Paul Durrant
Post by Jan Beulich
Post by Andrew Cooper
should accept the IO.
This length check now causes an MMIO handler to not claim an access
which straddles the upper boundary.
It is probably fine to terminate such an access early, but it isn't fine
to pass such a straddled access to the default ioreq server.
No, without involving the length in the check we can end up with
check() saying "Yes, mine" but read() or write() saying "Not me".
What I would agree with is for the generic handler to split the
access if the first byte fits, but the final byte doesn't.
I discussed this with Paul over lunch. I had not considered how IO gets
forwarded to the device model for shared implementations.
Is it reasonable to split a straddled access and direct the halves at
different handlers? This is not in line with how other hardware behaves
(PCIe will reject any straddled access). Furthermore, given small MMIO
regions and larger registers, there is no guarantee that a single split
will suffice.
I see in the other thread going on that a domain_crash() is deemed ok
for now, which is fine by me.
I think that also allows me to simplify the patch since I don't have to modify
the mmio_check op any more. I simply call it once for the first byte of the
access and, if it accepts, verify that it also accepts the last byte of the access.
At that point, I would say it would be easier to modify the claim check
to return "yes/straddled/no" rather than calling it twice.
That's excessive code churn, I think. The check functions are generally cheap and the second call is only made if the first accepts.
You are already churning everything anyway by inserting an extra
parameter. I do think it would make the logic cleaner and easier to
follow (which IMO takes precedence over churn).

~Andrew
Paul Durrant
2015-06-25 13:48:17 UTC
Permalink
-----Original Message-----
Sent: 25 June 2015 14:47
To: Paul Durrant; Jan Beulich
Subject: Re: [PATCH v4 07/17] x86/hvm: add length to mmio check op
Post by Paul Durrant
-----Original Message-----
Sent: 25 June 2015 14:38
To: Paul Durrant; Jan Beulich
Subject: Re: [PATCH v4 07/17] x86/hvm: add length to mmio check op
Post by Paul Durrant
-----Original Message-----
Sent: 25 June 2015 14:34
To: Jan Beulich
Subject: Re: [PATCH v4 07/17] x86/hvm: add length to mmio check op
Post by Jan Beulich
Post by Andrew Cooper
Post by Paul Durrant
When memory mapped I/O is range checked by internal handlers,
the
Post by Paul Durrant
Post by Paul Durrant
length
Post by Jan Beulich
Post by Andrew Cooper
Post by Paul Durrant
of the access should be taken into account.
For what purpose? The length of the access doesn't affect which
handler
Post by Paul Durrant
Post by Jan Beulich
Post by Andrew Cooper
should accept the IO.
This length check now causes an MMIO handler to not claim an
access
Post by Paul Durrant
Post by Paul Durrant
Post by Jan Beulich
Post by Andrew Cooper
which straddles the upper boundary.
It is probably fine to terminate such an access early, but it isn't fine
to pass such a straddled access to the default ioreq server.
No, without involving the length in the check we can end up with
check() saying "Yes, mine" but read() or write() saying "Not me".
What I would agree with is for the generic handler to split the
access if the first byte fits, but the final byte doesn't.
I discussed this with Paul over lunch. I had not considered how IO gets
forwarded to the device model for shared implementations.
Is it reasonable to split a straddled access and direct the halves at
different handlers? This is not in line with how other hardware behaves
(PCIe will reject any straddled access). Furthermore, given small MMIO
regions and larger registers, there is no guarantee that a single split
will suffice.
I see in the other thread going on that a domain_crash() is deemed ok
for now, which is fine by me.
I think that also allows me to simplify the patch since I don't have to
modify
Post by Paul Durrant
the mmio_check op any more. I simply call it once for the first byte of the
access and, if it accepts, verify that it also accepts the last byte of the
access.
Post by Paul Durrant
At that point, I would say it would be easier to modify the claim check
to return "yes/straddled/no" rather than calling it twice.
That's excessive code churn, I think. The check functions are generally
cheap and the second call is only made if the first accepts.
You are already churning everything anyway by inserting an extra
parameter. I do think it would make the logic cleaner and easier to
follow (which IMO takes precedence over churn).
No, my point was that by making the second call I don't need to add the extra parameter. Wait for the revised patch... it's about 6 lines long now ;-)

Paul
~Andrew
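For the record, a sketch of the two-call shape described above, written against the original single-address check op; the function name and what happens after domain_crash() are illustrative, not the revised patch:

static bool_t mmio_accept_sketch(const struct hvm_mmio_ops *ops,
                                 struct vcpu *v,
                                 paddr_t addr, unsigned int size)
{
    /* First byte not claimed: not our handler. */
    if ( !ops->check(v, addr) )
        return 0;

    /* Claimed, but the access straddles the end of the range. */
    if ( size > 1 && !ops->check(v, addr + size - 1) )
    {
        domain_crash(v->domain);
        return 0;
    }

    return 1;
}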
Jan Beulich
2015-06-25 13:47:35 UTC
Permalink
Post by Paul Durrant
I think that also allows me to simplify the patch since I don't have to
modify the mmio_check op any more. I simply call it once for the first byte
of the access and, if it accepts, verify that it also accepts the last byte
of the access.
That's actually not (generally) okay: There could be a hole in the
middle. But as long as instructions don't do accesses wider than
a page, we're fine with that in practice I think. Or wait, no, in the
MSI-X case this could not be okay: a 64-byte read to the 16-byte entry
that sits 32 bytes before a page boundary (and is the last entry
on one device's MSI-X table) would extend into another device's
MSI-X table on the next page. I.e. first and last bytes would be
okay to be accessed, but bytes 16...31 of the access wouldn't.
Of course the MSI-X read/write handlers don't currently permit
such wide accesses, but anyway...

Jan
Paul Durrant
2015-06-25 13:52:35 UTC
Permalink
-----Original Message-----
Sent: 25 June 2015 14:48
To: Paul Durrant
Subject: RE: [PATCH v4 07/17] x86/hvm: add length to mmio check op
Post by Paul Durrant
I think that also allows me to simplify the patch since I don't have to
modify the mmio_check op any more. I simply call it once for the first byte
of the access and, if it accepts, verify that it also accepts the last byte
of the access.
That's actually not (generally) okay: There could be a hole in the
middle. But as long as instructions don't do accesses wider than
a page, we're fine with that in practice I think. Or wait, no, in the
MSI-X case this could not be okay: a 64-byte read to the 16-byte entry
that sits 32 bytes before a page boundary (and is the last entry
on one device's MSI-X table) would extend into another device's
MSI-X table on the next page. I.e. first and last bytes would be
okay to be accessed, but bytes 16...31 of the access wouldn't.
Of course the MSI-X read/write handlers don't currently permit
such wide accesses, but anyway...
We could also verify that, for a rep op, all reads/writes come back with OKAY. I think that would be ok.

Paul
Jan
Jan Beulich
2015-06-25 14:46:08 UTC
Permalink
Post by Paul Durrant
-----Original Message-----
Sent: 25 June 2015 14:48
To: Paul Durrant
Subject: RE: [PATCH v4 07/17] x86/hvm: add length to mmio check op
Post by Paul Durrant
I think that also allows me to simplify the patch since I don't have to
modify the mmio_check op any more. I simply call it once for the first byte
of the access and, if it accepts, verify that it also accepts the last byte
of the access.
That's actually not (generally) okay: There could be a hole in the
middle. But as long as instructions don't do accesses wider than
a page, we're fine with that in practice I think. Or wait, no, in the
MSI-X case this could not be okay: a 64-byte read to the 16-byte entry
that sits 32 bytes before a page boundary (and is the last entry
on one device's MSI-X table) would extend into another device's
MSI-X table on the next page. I.e. first and last bytes would be
okay to be accessed, but bytes 16...31 of the access wouldn't.
Of course the MSI-X read/write handlers don't currently permit
such wide accesses, but anyway...
We could also verify that, for a rep op, all reads/writes come back with
OKAY. I think that would be ok.
I wasn't thinking of a rep op, but of an AVX-512 memory access.

Jan
Paul Durrant
2015-06-25 14:57:58 UTC
Permalink
-----Original Message-----
Sent: 25 June 2015 15:46
To: Paul Durrant
Subject: RE: [PATCH v4 07/17] x86/hvm: add length to mmio check op
Post by Paul Durrant
-----Original Message-----
Sent: 25 June 2015 14:48
To: Paul Durrant
Subject: RE: [PATCH v4 07/17] x86/hvm: add length to mmio check op
Post by Paul Durrant
I think that also allows me to simplify the patch since I don't have to
modify the mmio_check op any more. I simply call it once for the first
byte
Post by Paul Durrant
Post by Paul Durrant
of the access and, if it accepts, verify that it also accepts the last byte
of the access.
That's actually not (generally) okay: There could be a hole in the
middle. But as long as instructions don't do accesses wider than
a page, we're fine with that in practice I think. Or wait, no, in the
MSI-X case this could not be okay: a 64-byte read to the 16-byte entry
that sits 32 bytes before a page boundary (and is the last entry
on one device's MSI-X table) would extend into another device's
MSI-X table on the next page. I.e. first and last bytes would be
okay to be accessed, but bytes 16...31 of the access wouldn't.
Of course the MSI-X read/write handlers don't currently permit
such wide accesses, but anyway...
We could also verify that, for a rep op, all reads/writes come back with
OKAY. I think that would be ok.
I wasn't thinking of a rep op, but of an AVX-512 memory access.
That's ok because it would be chunked at the higher level and accept() would be called for each (up to) 8-byte chunk.

Paul
Jan
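A standalone illustration (not Xen code) of the chunking being referred to: a wide access is emulated as a series of naturally aligned chunks of at most eight bytes, so the accept/check path runs once per chunk. The start address and access width below are made up.

#include <stdio.h>

int main(void)
{
    unsigned long addr = 0xfee00fe0UL;  /* hypothetical start address */
    unsigned int bytes = 64;            /* e.g. an AVX-512 load */

    while ( bytes != 0 )
    {
        unsigned int chunk = bytes < 8 ? bytes : 8;

        /* Shrink the chunk until it is naturally aligned. */
        while ( addr & (chunk - 1) )
            chunk >>= 1;

        printf("chunk at %#lx, size %u\n", addr, chunk);

        addr += chunk;
        bytes -= chunk;
    }

    return 0;
}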
Paul Durrant
2015-06-24 11:24:37 UTC
Permalink
The struct just contains three methods and no data, so the name
hvm_mmio_ops more accurately reflects its content. A subsequent patch
introduces a new structure that better warrants the name
hvm_mmio_handler, so doing the rename in this purely cosmetic patch
avoids conflating functional and cosmetic changes in a single patch.

Signed-off-by: Paul Durrant <***@citrix.com>
Cc: Keir Fraser <***@xen.org>
Acked-by: Jan Beulich <***@suse.com>
Cc: Andrew Cooper <***@citrix.com>
---
tools/tests/vhpet/emul.h | 8 +++---
tools/tests/vhpet/main.c | 6 ++---
xen/arch/x86/hvm/hpet.c | 8 +++---
xen/arch/x86/hvm/intercept.c | 41 ++++++++++++++---------------
xen/arch/x86/hvm/vioapic.c | 8 +++---
xen/arch/x86/hvm/vlapic.c | 8 +++---
xen/arch/x86/hvm/vmsi.c | 8 +++---
xen/drivers/passthrough/amd/iommu_guest.c | 8 +++---
xen/include/asm-x86/hvm/io.h | 18 ++++++-------
9 files changed, 56 insertions(+), 57 deletions(-)

diff --git a/tools/tests/vhpet/emul.h b/tools/tests/vhpet/emul.h
index 09e4611..383acff 100644
--- a/tools/tests/vhpet/emul.h
+++ b/tools/tests/vhpet/emul.h
@@ -237,11 +237,11 @@ typedef int (*hvm_mmio_write_t)(struct vcpu *v,
typedef int (*hvm_mmio_check_t)(struct vcpu *v, unsigned long addr);


-struct hvm_mmio_handler
+struct hvm_mmio_ops
{
- hvm_mmio_check_t check_handler;
- hvm_mmio_read_t read_handler;
- hvm_mmio_write_t write_handler;
+ hvm_mmio_check_t check;
+ hvm_mmio_read_t read;
+ hvm_mmio_write_t write;
};

/* Marshalling and unmarshalling uses a buffer with size and cursor. */
diff --git a/tools/tests/vhpet/main.c b/tools/tests/vhpet/main.c
index fbd7510..6fe65ea 100644
--- a/tools/tests/vhpet/main.c
+++ b/tools/tests/vhpet/main.c
@@ -70,7 +70,7 @@ static int skip_error_on_load;

static char *global_thousep;

-extern const struct hvm_mmio_handler hpet_mmio_handler;
+extern const struct hvm_mmio_ops hpet_mmio_ops;

struct domain dom1;
struct vcpu vcpu0;
@@ -297,13 +297,13 @@ void udelay(int w)
unsigned int hpet_readl(unsigned long a)
{
unsigned long ret = 0;
- hpet_mmio_handler.read_handler(current, a, 4, &ret);
+ hpet_mmio_ops.read(current, a, 4, &ret);
return ret;
}

void hpet_writel(unsigned long d, unsigned long a)
{
- hpet_mmio_handler.write_handler(current, a, 4, d);
+ hpet_mmio_ops.write(current, a, 4, d);
return;
}

diff --git a/xen/arch/x86/hvm/hpet.c b/xen/arch/x86/hvm/hpet.c
index d898169..9585ca8 100644
--- a/xen/arch/x86/hvm/hpet.c
+++ b/xen/arch/x86/hvm/hpet.c
@@ -504,10 +504,10 @@ static int hpet_range(struct vcpu *v, unsigned long addr)
(addr < (HPET_BASE_ADDRESS + HPET_MMAP_SIZE)) );
}

-const struct hvm_mmio_handler hpet_mmio_handler = {
- .check_handler = hpet_range,
- .read_handler = hpet_read,
- .write_handler = hpet_write
+const struct hvm_mmio_ops hpet_mmio_ops = {
+ .check = hpet_range,
+ .read = hpet_read,
+ .write = hpet_write
};


diff --git a/xen/arch/x86/hvm/intercept.c b/xen/arch/x86/hvm/intercept.c
index d52a48c..cc44733 100644
--- a/xen/arch/x86/hvm/intercept.c
+++ b/xen/arch/x86/hvm/intercept.c
@@ -32,20 +32,20 @@
#include <xen/event.h>
#include <xen/iommu.h>

-static const struct hvm_mmio_handler *const
+static const struct hvm_mmio_ops *const
hvm_mmio_handlers[HVM_MMIO_HANDLER_NR] =
{
- &hpet_mmio_handler,
- &vlapic_mmio_handler,
- &vioapic_mmio_handler,
- &msixtbl_mmio_handler,
- &iommu_mmio_handler
+ &hpet_mmio_ops,
+ &vlapic_mmio_ops,
+ &vioapic_mmio_ops,
+ &msixtbl_mmio_ops,
+ &iommu_mmio_ops
};

static int hvm_mmio_access(struct vcpu *v,
ioreq_t *p,
- hvm_mmio_read_t read_handler,
- hvm_mmio_write_t write_handler)
+ hvm_mmio_read_t read,
+ hvm_mmio_write_t write)
{
struct hvm_vcpu_io *vio = &v->arch.hvm_vcpu.hvm_io;
unsigned long data;
@@ -64,11 +64,11 @@ static int hvm_mmio_access(struct vcpu *v,
vio->mmio_retrying = 0;
}
else
- rc = read_handler(v, p->addr, p->size, &data);
+ rc = read(v, p->addr, p->size, &data);
p->data = data;
}
else /* p->dir == IOREQ_WRITE */
- rc = write_handler(v, p->addr, p->size, p->data);
+ rc = write(v, p->addr, p->size, p->data);
return rc;
}

@@ -86,7 +86,7 @@ static int hvm_mmio_access(struct vcpu *v,
}
else
{
- rc = read_handler(v, p->addr + step * i, p->size, &data);
+ rc = read(v, p->addr + step * i, p->size, &data);
if ( rc != X86EMUL_OKAY )
break;
}
@@ -145,7 +145,7 @@ static int hvm_mmio_access(struct vcpu *v,
}
if ( rc != X86EMUL_OKAY )
break;
- rc = write_handler(v, p->addr + step * i, p->size, data);
+ rc = write(v, p->addr + step * i, p->size, data);
if ( rc != X86EMUL_OKAY )
break;
}
@@ -169,7 +169,7 @@ bool_t hvm_mmio_internal(paddr_t gpa)
unsigned int i;

for ( i = 0; i < HVM_MMIO_HANDLER_NR; ++i )
- if ( hvm_mmio_handlers[i]->check_handler(curr, gpa) )
+ if ( hvm_mmio_handlers[i]->check(curr, gpa) )
return 1;

return 0;
@@ -182,21 +182,20 @@ int hvm_mmio_intercept(ioreq_t *p)

for ( i = 0; i < HVM_MMIO_HANDLER_NR; i++ )
{
- hvm_mmio_check_t check_handler =
- hvm_mmio_handlers[i]->check_handler;
+ hvm_mmio_check_t check = hvm_mmio_handlers[i]->check;

- if ( check_handler(v, p->addr) )
+ if ( check(v, p->addr) )
{
if ( unlikely(p->count > 1) &&
- !check_handler(v, unlikely(p->df)
- ? p->addr - (p->count - 1L) * p->size
- : p->addr + (p->count - 1L) * p->size) )
+ !check(v, unlikely(p->df)
+ ? p->addr - (p->count - 1L) * p->size
+ : p->addr + (p->count - 1L) * p->size) )
p->count = 1;

return hvm_mmio_access(
v, p,
- hvm_mmio_handlers[i]->read_handler,
- hvm_mmio_handlers[i]->write_handler);
+ hvm_mmio_handlers[i]->read,
+ hvm_mmio_handlers[i]->write);
}
}

diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
index f903420..cbbef9f 100644
--- a/xen/arch/x86/hvm/vioapic.c
+++ b/xen/arch/x86/hvm/vioapic.c
@@ -250,10 +250,10 @@ static int vioapic_range(struct vcpu *v, unsigned long addr)
(addr < vioapic->base_address + VIOAPIC_MEM_LENGTH)));
}

-const struct hvm_mmio_handler vioapic_mmio_handler = {
- .check_handler = vioapic_range,
- .read_handler = vioapic_read,
- .write_handler = vioapic_write
+const struct hvm_mmio_ops vioapic_mmio_ops = {
+ .check = vioapic_range,
+ .read = vioapic_read,
+ .write = vioapic_write
};

static void ioapic_inj_irq(
diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index fbc51d1..d5855cc 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -996,10 +996,10 @@ static int vlapic_range(struct vcpu *v, unsigned long addr)
(offset < PAGE_SIZE);
}

-const struct hvm_mmio_handler vlapic_mmio_handler = {
- .check_handler = vlapic_range,
- .read_handler = vlapic_read,
- .write_handler = vlapic_write
+const struct hvm_mmio_ops vlapic_mmio_ops = {
+ .check = vlapic_range,
+ .read = vlapic_read,
+ .write = vlapic_write
};

static void set_x2apic_id(struct vlapic *vlapic)
diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index 3dbbe37..f89233d 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -344,10 +344,10 @@ static int msixtbl_range(struct vcpu *v, unsigned long addr)
return !!desc;
}

-const struct hvm_mmio_handler msixtbl_mmio_handler = {
- .check_handler = msixtbl_range,
- .read_handler = msixtbl_read,
- .write_handler = msixtbl_write
+const struct hvm_mmio_ops msixtbl_mmio_ops = {
+ .check = msixtbl_range,
+ .read = msixtbl_read,
+ .write = msixtbl_write
};

static void add_msixtbl_entry(struct domain *d,
diff --git a/xen/drivers/passthrough/amd/iommu_guest.c b/xen/drivers/passthrough/amd/iommu_guest.c
index 98e7b38..7b0c102 100644
--- a/xen/drivers/passthrough/amd/iommu_guest.c
+++ b/xen/drivers/passthrough/amd/iommu_guest.c
@@ -919,8 +919,8 @@ static int guest_iommu_mmio_range(struct vcpu *v, unsigned long addr)
addr < iommu->mmio_base + IOMMU_MMIO_SIZE;
}

-const struct hvm_mmio_handler iommu_mmio_handler = {
- .check_handler = guest_iommu_mmio_range,
- .read_handler = guest_iommu_mmio_read,
- .write_handler = guest_iommu_mmio_write
+const struct hvm_mmio_ops iommu_mmio_ops = {
+ .check = guest_iommu_mmio_range,
+ .read = guest_iommu_mmio_read,
+ .write = guest_iommu_mmio_write
};
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index 886a9d6..f2aaec5 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -59,17 +59,17 @@ struct hvm_io_handler {
struct io_handler hdl_list[MAX_IO_HANDLER];
};

-struct hvm_mmio_handler {
- hvm_mmio_check_t check_handler;
- hvm_mmio_read_t read_handler;
- hvm_mmio_write_t write_handler;
+struct hvm_mmio_ops {
+ hvm_mmio_check_t check;
+ hvm_mmio_read_t read;
+ hvm_mmio_write_t write;
};

-extern const struct hvm_mmio_handler hpet_mmio_handler;
-extern const struct hvm_mmio_handler vlapic_mmio_handler;
-extern const struct hvm_mmio_handler vioapic_mmio_handler;
-extern const struct hvm_mmio_handler msixtbl_mmio_handler;
-extern const struct hvm_mmio_handler iommu_mmio_handler;
+extern const struct hvm_mmio_ops hpet_mmio_ops;
+extern const struct hvm_mmio_ops vlapic_mmio_ops;
+extern const struct hvm_mmio_ops vioapic_mmio_ops;
+extern const struct hvm_mmio_ops msixtbl_mmio_ops;
+extern const struct hvm_mmio_ops iommu_mmio_ops;

#define HVM_MMIO_HANDLER_NR 5
--
1.7.10.4
Paul Durrant
2015-06-24 11:24:33 UTC
Permalink
Currently hvmemul_do_io() handles paging for I/O to/from a guest address
inline. This causes every exit point to have to execute:

if ( ram_page )
put_page(ram_page);

This patch introduces the wrapper functions hvmemul_do_io_addr() and
hvmemul_do_io_buffer(). The latter is used for I/O to/from a Xen buffer,
so the complexity of paging can be confined to the former, making the
common hvmemul_do_io() function less convoluted.

This patch also tightens up some types and introduces pio/mmio wrappers
for the above functions with comments to document their semantics.

Signed-off-by: Paul Durrant <***@citrix.com>
Cc: Keir Fraser <***@xen.org>
Cc: Jan Beulich <***@suse.com>
Cc: Andrew Cooper <***@citrix.com>
---
xen/arch/x86/hvm/emulate.c | 278 ++++++++++++++++++++++++-------------
xen/arch/x86/hvm/io.c | 4 +-
xen/include/asm-x86/hvm/emulate.h | 17 ++-
3 files changed, 198 insertions(+), 101 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index ac9c9d6..9d7af0c 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -51,41 +51,23 @@ static void hvmtrace_io_assist(int is_mmio, ioreq_t *p)
}

static int hvmemul_do_io(
- int is_mmio, paddr_t addr, unsigned long *reps, int size,
- paddr_t ram_gpa, int dir, int df, void *p_data)
+ bool_t is_mmio, paddr_t addr, unsigned long *reps, unsigned int size,
+ uint8_t dir, bool_t df, bool_t data_is_addr, uintptr_t data)
{
struct vcpu *curr = current;
- struct hvm_vcpu_io *vio;
+ struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
ioreq_t p = {
.type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO,
.addr = addr,
.size = size,
.dir = dir,
.df = df,
- .data = ram_gpa,
- .data_is_ptr = (p_data == NULL),
+ .data = data,
+ .data_is_ptr = data_is_addr, /* ioreq_t field name is misleading */
};
- unsigned long ram_gfn = paddr_to_pfn(ram_gpa);
- p2m_type_t p2mt;
- struct page_info *ram_page;
+ void *p_data = (void *)data;
int rc;

- /* Check for paged out page */
- ram_page = get_page_from_gfn(curr->domain, ram_gfn, &p2mt, P2M_UNSHARE);
- if ( p2m_is_paging(p2mt) )
- {
- if ( ram_page )
- put_page(ram_page);
- p2m_mem_paging_populate(curr->domain, ram_gfn);
- return X86EMUL_RETRY;
- }
- if ( p2m_is_shared(p2mt) )
- {
- if ( ram_page )
- put_page(ram_page);
- return X86EMUL_RETRY;
- }
-
/*
* Weird-sized accesses have undefined behaviour: we discard writes
* and read all-ones.
@@ -93,23 +75,10 @@ static int hvmemul_do_io(
if ( unlikely((size > sizeof(long)) || (size & (size - 1))) )
{
gdprintk(XENLOG_WARNING, "bad mmio size %d\n", size);
- ASSERT(p_data != NULL); /* cannot happen with a REP prefix */
- if ( dir == IOREQ_READ )
- memset(p_data, ~0, size);
- if ( ram_page )
- put_page(ram_page);
return X86EMUL_UNHANDLEABLE;
}

- if ( !p.data_is_ptr && (dir == IOREQ_WRITE) )
- {
- memcpy(&p.data, p_data, size);
- p_data = NULL;
- }
-
- vio = &curr->arch.hvm_vcpu.hvm_io;
-
- if ( is_mmio && !p.data_is_ptr )
+ if ( is_mmio && !data_is_addr )
{
/* Part of a multi-cycle read or write? */
if ( dir == IOREQ_WRITE )
@@ -117,11 +86,7 @@ static int hvmemul_do_io(
paddr_t pa = vio->mmio_large_write_pa;
unsigned int bytes = vio->mmio_large_write_bytes;
if ( (addr >= pa) && ((addr + size) <= (pa + bytes)) )
- {
- if ( ram_page )
- put_page(ram_page);
return X86EMUL_OKAY;
- }
}
else
{
@@ -131,8 +96,6 @@ static int hvmemul_do_io(
{
memcpy(p_data, &vio->mmio_large_read[addr - pa],
size);
- if ( ram_page )
- put_page(ram_page);
return X86EMUL_OKAY;
}
}
@@ -144,40 +107,28 @@ static int hvmemul_do_io(
break;
case HVMIO_completed:
vio->io_state = HVMIO_none;
- if ( p_data == NULL )
- {
- if ( ram_page )
- put_page(ram_page);
+ if ( data_is_addr || dir == IOREQ_WRITE )
return X86EMUL_UNHANDLEABLE;
- }
goto finish_access;
case HVMIO_dispatched:
/* May have to wait for previous cycle of a multi-write to complete. */
- if ( is_mmio && !p.data_is_ptr && (dir == IOREQ_WRITE) &&
+ if ( is_mmio && !data_is_addr && (dir == IOREQ_WRITE) &&
(addr == (vio->mmio_large_write_pa +
vio->mmio_large_write_bytes)) )
- {
- if ( ram_page )
- put_page(ram_page);
return X86EMUL_RETRY;
- }
/* fallthrough */
default:
- if ( ram_page )
- put_page(ram_page);
return X86EMUL_UNHANDLEABLE;
}

if ( hvm_io_pending(curr) )
{
gdprintk(XENLOG_WARNING, "WARNING: io already pending?\n");
- if ( ram_page )
- put_page(ram_page);
return X86EMUL_UNHANDLEABLE;
}

- vio->io_state =
- (p_data == NULL) ? HVMIO_dispatched : HVMIO_awaiting_completion;
+ vio->io_state = (data_is_addr || dir == IOREQ_WRITE) ?
+ HVMIO_dispatched : HVMIO_awaiting_completion;
vio->io_size = size;

/*
@@ -190,7 +141,12 @@ static int hvmemul_do_io(
p.count = *reps;

if ( dir == IOREQ_WRITE )
+ {
+ if ( !data_is_addr )
+ memcpy(&p.data, p_data, size);
+
hvmtrace_io_assist(is_mmio, &p);
+ }

if ( is_mmio )
{
@@ -235,7 +191,7 @@ static int hvmemul_do_io(
rc = X86EMUL_RETRY;
if ( !hvm_send_assist_req(s, &p) )
vio->io_state = HVMIO_none;
- else if ( p_data == NULL )
+ else if ( data_is_addr || dir == IOREQ_WRITE )
rc = X86EMUL_OKAY;
}
break;
@@ -245,20 +201,18 @@ static int hvmemul_do_io(
}

if ( rc != X86EMUL_OKAY )
- {
- if ( ram_page )
- put_page(ram_page);
return rc;
- }

finish_access:
if ( dir == IOREQ_READ )
+ {
hvmtrace_io_assist(is_mmio, &p);

- if ( p_data != NULL )
- memcpy(p_data, &vio->io_data, size);
+ if ( !data_is_addr )
+ memcpy(p_data, &vio->io_data, size);
+ }

- if ( is_mmio && !p.data_is_ptr )
+ if ( is_mmio && !data_is_addr )
{
/* Part of a multi-cycle read or write? */
if ( dir == IOREQ_WRITE )
@@ -285,23 +239,153 @@ static int hvmemul_do_io(
}
}

- if ( ram_page )
- put_page(ram_page);
return X86EMUL_OKAY;
}

-int hvmemul_do_pio(
- unsigned long port, unsigned long *reps, int size,
- paddr_t ram_gpa, int dir, int df, void *p_data)
+static int hvmemul_do_io_buffer(
+ bool_t is_mmio, paddr_t addr, unsigned long *reps, unsigned int size,
+ uint8_t dir, bool_t df, void *buffer)
+{
+ int rc;
+
+ BUG_ON(buffer == NULL);
+
+ rc = hvmemul_do_io(is_mmio, addr, reps, size, dir, df, 0,
+ (uintptr_t)buffer);
+ if ( rc == X86EMUL_UNHANDLEABLE && dir == IOREQ_READ )
+ memset(buffer, 0xff, size);
+
+ return rc;
+}
+
+static int hvmemul_acquire_page(unsigned long gmfn, struct page_info **page)
+{
+ struct domain *curr_d = current->domain;
+ p2m_type_t p2mt;
+
+ *page = get_page_from_gfn(curr_d, gmfn, &p2mt, P2M_UNSHARE);
+
+ if ( *page == NULL )
+ return X86EMUL_UNHANDLEABLE;
+
+ if ( p2m_is_paging(p2mt) )
+ {
+ put_page(*page);
+ p2m_mem_paging_populate(curr_d, gmfn);
+ return X86EMUL_RETRY;
+ }
+
+ if ( p2m_is_shared(p2mt) )
+ {
+ put_page(*page);
+ return X86EMUL_RETRY;
+ }
+
+ return X86EMUL_OKAY;
+}
+
+static inline void hvmemul_release_page(struct page_info *page)
{
- return hvmemul_do_io(0, port, reps, size, ram_gpa, dir, df, p_data);
+ put_page(page);
}

-static int hvmemul_do_mmio(
- paddr_t gpa, unsigned long *reps, int size,
- paddr_t ram_gpa, int dir, int df, void *p_data)
+static int hvmemul_do_io_addr(
+ bool_t is_mmio, paddr_t addr, unsigned long *reps,
+ unsigned int size, uint8_t dir, bool_t df, paddr_t ram_gpa)
+{
+ struct page_info *ram_page;
+ int rc;
+
+ rc = hvmemul_acquire_page(paddr_to_pfn(ram_gpa), &ram_page);
+ if ( rc != X86EMUL_OKAY )
+ return rc;
+
+ rc = hvmemul_do_io(is_mmio, addr, reps, size, dir, df, 1,
+ ram_gpa);
+
+ hvmemul_release_page(ram_page);
+
+ return rc;
+}
+
+/*
+ * Perform I/O between <port> and <buffer>. <dir> indicates the
+ * direction: IOREQ_READ means a read from <port> to <buffer> and
+ * IOREQ_WRITE means a write from <buffer> to <port>. Each access has
+ * width <size>.
+ */
+int hvmemul_do_pio_buffer(uint16_t port,
+ unsigned int size,
+ uint8_t dir,
+ void *buffer)
+{
+ unsigned long one_rep = 1;
+
+ return hvmemul_do_io_buffer(0, port, &one_rep, size, dir, 0, buffer);
+}
+
+/*
+ * Perform I/O between <port> and guest RAM starting at <ram_addr>.
+ * <dir> indicates the direction: IOREQ_READ means a read from <port> to
+ * RAM and IOREQ_WRITE means a write from RAM to <port>. Each access has
+ * width <size> and up to *<reps> accesses will be performed. If
+ * X86EMUL_OKAY is returned then <reps> will be updated with the number
+ * of accesses actually performed.
+ * Each access will be done to/from successive RAM addresses, increasing
+ * if <df> is 0 or decreasing if <df> is 1.
+ */
+static int hvmemul_do_pio_addr(uint16_t port,
+ unsigned long *reps,
+ unsigned int size,
+ uint8_t dir,
+ bool_t df,
+ paddr_t ram_addr)
+{
+ return hvmemul_do_io_addr(0, port, reps, size, dir, df, ram_addr);
+}
+
+/*
+ * Perform I/O between MMIO space starting at <mmio_gpa> and <buffer>.
+ * <dir> indicates the direction: IOREQ_READ means a read from MMIO to
+ * <buffer> and IOREQ_WRITE means a write from <buffer> to MMIO. Each
+ * access has width <size> and up to *<reps> accesses will be performed.
+ * If X86EMUL_OKAY is returned then <reps> will be updated with the number
+ * of accesses actually performed.
+ * Each access will be done to/from successive MMIO addresses, increasing
+ * if <df> is 0 or decreasing if <df> is 1.
+ *
+ * NOTE: If *<reps> is greater than 1, each access will use the
+ * <buffer> pointer; there is no implicit iteration over a
+ * block of memory starting at <buffer>.
+ */
+static int hvmemul_do_mmio_buffer(paddr_t mmio_gpa,
+ unsigned long *reps,
+ unsigned int size,
+ uint8_t dir,
+ bool_t df,
+ void *buffer)
+{
+ return hvmemul_do_io_buffer(1, mmio_gpa, reps, size, dir, df, buffer);
+}
+
+/*
+ * Perform I/O between MMIO space starting at <mmio_gpa> and guest RAM
+ * starting at <ram_gpa>. <dir> indicates the direction: IOREQ_READ
+ * means a read from MMIO to RAM and IOREQ_WRITE means a write from RAM
+ * to MMIO. Each access has width <size> and up to *<reps> accesses will
+ * be performed. If X86EMUL_OKAY is returned then <reps> will be updated
+ * with the number of accesses actually performed.
+ * Each access will be done to/from successive RAM *and* MMIO addresses,
+ * increasing if <df> is 0 or decreasing if <df> is 1.
+ */
+static int hvmemul_do_mmio_addr(paddr_t mmio_gpa,
+ unsigned long *reps,
+ unsigned int size,
+ uint8_t dir,
+ bool_t df,
+ paddr_t ram_gpa)
{
- return hvmemul_do_io(1, gpa, reps, size, ram_gpa, dir, df, p_data);
+ return hvmemul_do_io_addr(1, mmio_gpa, reps, size, dir, df, ram_gpa);
}

/*
@@ -503,7 +587,8 @@ static int __hvmemul_read(
gpa = (((paddr_t)vio->mmio_gpfn << PAGE_SHIFT) | off);
while ( (off + chunk) <= PAGE_SIZE )
{
- rc = hvmemul_do_mmio(gpa, &reps, chunk, 0, IOREQ_READ, 0, p_data);
+ rc = hvmemul_do_mmio_buffer(gpa, &reps, chunk, IOREQ_READ, 0,
+ p_data);
if ( rc != X86EMUL_OKAY || bytes == chunk )
return rc;
addr += chunk;
@@ -537,7 +622,8 @@ static int __hvmemul_read(
hvmemul_ctxt);
while ( rc == X86EMUL_OKAY )
{
- rc = hvmemul_do_mmio(gpa, &reps, chunk, 0, IOREQ_READ, 0, p_data);
+ rc = hvmemul_do_mmio_buffer(gpa, &reps, chunk, IOREQ_READ, 0,
+ p_data);
if ( rc != X86EMUL_OKAY || bytes == chunk )
break;
addr += chunk;
@@ -645,7 +731,8 @@ static int hvmemul_write(
gpa = (((paddr_t)vio->mmio_gpfn << PAGE_SHIFT) | off);
while ( (off + chunk) <= PAGE_SIZE )
{
- rc = hvmemul_do_mmio(gpa, &reps, chunk, 0, IOREQ_WRITE, 0, p_data);
+ rc = hvmemul_do_mmio_buffer(gpa, &reps, chunk, IOREQ_WRITE, 0,
+ p_data);
if ( rc != X86EMUL_OKAY || bytes == chunk )
return rc;
addr += chunk;
@@ -675,7 +762,8 @@ static int hvmemul_write(
hvmemul_ctxt);
while ( rc == X86EMUL_OKAY )
{
- rc = hvmemul_do_mmio(gpa, &reps, chunk, 0, IOREQ_WRITE, 0, p_data);
+ rc = hvmemul_do_mmio_buffer(gpa, &reps, chunk, IOREQ_WRITE, 0,
+ p_data);
if ( rc != X86EMUL_OKAY || bytes == chunk )
break;
addr += chunk;
@@ -849,8 +937,8 @@ static int hvmemul_rep_ins(
if ( p2mt == p2m_mmio_direct || p2mt == p2m_mmio_dm )
return X86EMUL_UNHANDLEABLE;

- return hvmemul_do_pio(src_port, reps, bytes_per_rep, gpa, IOREQ_READ,
- !!(ctxt->regs->eflags & X86_EFLAGS_DF), NULL);
+ return hvmemul_do_pio_addr(src_port, reps, bytes_per_rep, IOREQ_READ,
+ !!(ctxt->regs->eflags & X86_EFLAGS_DF), gpa);
}

static int hvmemul_rep_outs(
@@ -887,8 +975,8 @@ static int hvmemul_rep_outs(
if ( p2mt == p2m_mmio_direct || p2mt == p2m_mmio_dm )
return X86EMUL_UNHANDLEABLE;

- return hvmemul_do_pio(dst_port, reps, bytes_per_rep, gpa, IOREQ_WRITE,
- !!(ctxt->regs->eflags & X86_EFLAGS_DF), NULL);
+ return hvmemul_do_pio_addr(dst_port, reps, bytes_per_rep, IOREQ_WRITE,
+ !!(ctxt->regs->eflags & X86_EFLAGS_DF), gpa);
}

static int hvmemul_rep_movs(
@@ -944,12 +1032,12 @@ static int hvmemul_rep_movs(
return X86EMUL_UNHANDLEABLE;

if ( sp2mt == p2m_mmio_dm )
- return hvmemul_do_mmio(
- sgpa, reps, bytes_per_rep, dgpa, IOREQ_READ, df, NULL);
+ return hvmemul_do_mmio_addr(
+ sgpa, reps, bytes_per_rep, IOREQ_READ, df, dgpa);

if ( dp2mt == p2m_mmio_dm )
- return hvmemul_do_mmio(
- dgpa, reps, bytes_per_rep, sgpa, IOREQ_WRITE, df, NULL);
+ return hvmemul_do_mmio_addr(
+ dgpa, reps, bytes_per_rep, IOREQ_WRITE, df, sgpa);

/* RAM-to-RAM copy: emulate as equivalent of memmove(dgpa, sgpa, bytes). */
bytes = *reps * bytes_per_rep;
@@ -1102,8 +1190,8 @@ static int hvmemul_rep_stos(
return X86EMUL_UNHANDLEABLE;

case p2m_mmio_dm:
- return hvmemul_do_mmio(gpa, reps, bytes_per_rep, 0, IOREQ_WRITE, df,
- p_data);
+ return hvmemul_do_mmio_buffer(gpa, reps, bytes_per_rep, IOREQ_WRITE, df,
+ p_data);
}
}

@@ -1140,9 +1228,8 @@ static int hvmemul_read_io(
unsigned long *val,
struct x86_emulate_ctxt *ctxt)
{
- unsigned long reps = 1;
*val = 0;
- return hvmemul_do_pio(port, &reps, bytes, 0, IOREQ_READ, 0, val);
+ return hvmemul_do_pio_buffer(port, bytes, IOREQ_READ, val);
}

static int hvmemul_write_io(
@@ -1151,8 +1238,7 @@ static int hvmemul_write_io(
unsigned long val,
struct x86_emulate_ctxt *ctxt)
{
- unsigned long reps = 1;
- return hvmemul_do_pio(port, &reps, bytes, 0, IOREQ_WRITE, 0, &val);
+ return hvmemul_do_pio_buffer(port, bytes, IOREQ_WRITE, &val);
}

static int hvmemul_read_cr(
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index 68fb890..c0964ec 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -132,7 +132,7 @@ int handle_pio(uint16_t port, unsigned int size, int dir)
{
struct vcpu *curr = current;
struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
- unsigned long data, reps = 1;
+ unsigned long data;
int rc;

ASSERT((size - 1) < 4 && size != 3);
@@ -140,7 +140,7 @@ int handle_pio(uint16_t port, unsigned int size, int dir)
if ( dir == IOREQ_WRITE )
data = guest_cpu_user_regs()->eax;

- rc = hvmemul_do_pio(port, &reps, size, 0, dir, 0, &data);
+ rc = hvmemul_do_pio_buffer(port, size, dir, &data);

switch ( rc )
{
diff --git a/xen/include/asm-x86/hvm/emulate.h b/xen/include/asm-x86/hvm/emulate.h
index b3971c8..594be38 100644
--- a/xen/include/asm-x86/hvm/emulate.h
+++ b/xen/include/asm-x86/hvm/emulate.h
@@ -50,11 +50,22 @@ struct segment_register *hvmemul_get_seg_reg(
enum x86_segment seg,
struct hvm_emulate_ctxt *hvmemul_ctxt);

-int hvmemul_do_pio(
- unsigned long port, unsigned long *reps, int size,
- paddr_t ram_gpa, int dir, int df, void *p_data);
+int hvmemul_do_pio_buffer(uint16_t port,
+ unsigned int size,
+ uint8_t dir,
+ void *buffer);

void hvm_dump_emulation_state(const char *prefix,
struct hvm_emulate_ctxt *hvmemul_ctxt);

#endif /* __ASM_X86_HVM_EMULATE_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
--
1.7.10.4
Paul Durrant
2015-06-24 11:24:35 UTC
Permalink
The is_mmio parameter can be inferred from the ioreq type.

Signed-off-by: Paul Durrant <***@citrix.com>
Cc: Keir Fraser <***@xen.org>
Cc: Jan Beulich <***@suse.com>
Cc: Andrew Cooper <***@citrix.com>
---
xen/arch/x86/hvm/emulate.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index b412302..935eab3 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -23,8 +23,9 @@
#include <asm/hvm/support.h>
#include <asm/hvm/svm/svm.h>

-static void hvmtrace_io_assist(int is_mmio, ioreq_t *p)
+static void hvmtrace_io_assist(ioreq_t *p)
{
+ bool_t is_mmio = (p->type == IOREQ_TYPE_COPY);
unsigned int size, event;
unsigned char buffer[12];

@@ -139,7 +140,7 @@ static int hvmemul_do_io(
if ( !data_is_addr )
memcpy(&p.data, p_data, size);

- hvmtrace_io_assist(is_mmio, &p);
+ hvmtrace_io_assist(&p);
}

if ( is_mmio )
@@ -200,7 +201,7 @@ static int hvmemul_do_io(
finish_access:
if ( dir == IOREQ_READ )
{
- hvmtrace_io_assist(is_mmio, &p);
+ hvmtrace_io_assist(&p);

if ( !data_is_addr )
memcpy(p_data, &vio->io_data, size);
--
1.7.10.4
Paul Durrant
2015-06-24 11:24:40 UTC
Permalink
This patch re-works the dpci portio intercepts so that they can be unified
with standard portio handling thereby removing a substantial amount of
code duplication.
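
For readers skimming the diff below, the unification boils down to giving
every handler an ops table and making dpci portio just another instance of
it. A rough sketch of the shape (simplified from the hunks below; the exact
declarations in the tree may differ):

    struct hvm_io_ops {
        bool_t (*accept)(struct hvm_io_handler *handler,
                         uint64_t addr, uint64_t size);
        int (*read)(struct hvm_io_handler *handler,
                    uint64_t addr, uint64_t size, uint64_t *data);
        int (*write)(struct hvm_io_handler *handler,
                     uint64_t addr, uint64_t size, uint64_t data);
    };

    static const struct hvm_io_ops dpci_portio_ops = {
        .accept = dpci_portio_accept,
        .read   = dpci_portio_read,
        .write  = dpci_portio_write
    };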

Signed-off-by: Paul Durrant <***@citrix.com>
Cc: Keir Fraser <***@xen.org>
Cc: Jan Beulich <***@suse.com>
Cc: Andrew Cooper <***@citrix.com>
---
xen/arch/x86/hvm/hvm.c | 2 +
xen/arch/x86/hvm/intercept.c | 22 ++--
xen/arch/x86/hvm/io.c | 225 +++++++++++++---------------------------
xen/include/asm-x86/hvm/io.h | 8 ++
xen/include/asm-x86/hvm/vcpu.h | 2 +
xen/include/xen/iommu.h | 1 -
6 files changed, 88 insertions(+), 172 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index c10db78..f8486f4 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1486,6 +1486,8 @@ int hvm_domain_initialise(struct domain *d)
else
d->arch.hvm_domain.io_bitmap = hvm_io_bitmap;

+ register_dpci_portio_handler(d);
+
if ( is_pvh_domain(d) )
{
register_portio_handler(d, 0, 0x10003, handle_pvh_io);
diff --git a/xen/arch/x86/hvm/intercept.c b/xen/arch/x86/hvm/intercept.c
index 5e8d8b2..5633959 100644
--- a/xen/arch/x86/hvm/intercept.c
+++ b/xen/arch/x86/hvm/intercept.c
@@ -116,10 +116,7 @@ static int hvm_process_io_intercept(struct hvm_io_handler *handler,
ioreq_t *p)
{
struct hvm_vcpu_io *vio = &current->arch.hvm_vcpu.hvm_io;
- const struct hvm_io_ops *ops =
- (p->type == IOREQ_TYPE_COPY) ?
- &mmio_ops :
- &portio_ops;
+ const struct hvm_io_ops *ops = handler->ops;
int rc = X86EMUL_OKAY, i, step = p->df ? -p->size : p->size;
uint64_t data;
uint64_t addr;
@@ -237,16 +234,13 @@ static struct hvm_io_handler *hvm_find_io_handler(ioreq_t *p)
{
struct vcpu *curr = current;
struct domain *curr_d = curr->domain;
- const struct hvm_io_ops *ops =
- (p->type == IOREQ_TYPE_COPY) ?
- &mmio_ops :
- &portio_ops;
unsigned int i;

for ( i = 0; i < curr_d->arch.hvm_domain.io_handler_count; i++ )
{
struct hvm_io_handler *handler =
&curr_d->arch.hvm_domain.io_handler[i];
+ const struct hvm_io_ops *ops = handler->ops;
uint64_t start, end, count = p->count, size = p->size;

if ( handler->type != p->type )
@@ -285,13 +279,7 @@ int hvm_io_intercept(ioreq_t *p)
{
struct hvm_io_handler *handler;

- if ( p->type == IOREQ_TYPE_PIO )
- {
- int rc = dpci_ioport_intercept(p);
- if ( (rc == X86EMUL_OKAY) || (rc == X86EMUL_RETRY) )
- return rc;
- }
- else if ( p->type == IOREQ_TYPE_COPY )
+ if ( p->type == IOREQ_TYPE_COPY )
{
int rc = stdvga_intercept_mmio(p);
if ( (rc == X86EMUL_OKAY) || (rc == X86EMUL_RETRY) )
@@ -306,7 +294,7 @@ int hvm_io_intercept(ioreq_t *p)
return hvm_process_io_intercept(handler, p);
}

-static struct hvm_io_handler *hvm_next_io_handler(struct domain *d)
+struct hvm_io_handler *hvm_next_io_handler(struct domain *d)
{
unsigned int i = d->arch.hvm_domain.io_handler_count++;

@@ -321,6 +309,7 @@ void register_mmio_handler(struct domain *d, const struct hvm_mmio_ops *ops)
struct hvm_io_handler *handler = hvm_next_io_handler(d);

handler->type = IOREQ_TYPE_COPY;
+ handler->ops = &mmio_ops;
handler->u.mmio.ops = ops;
}

@@ -330,6 +319,7 @@ void register_portio_handler(struct domain *d, uint32_t addr,
struct hvm_io_handler *handler = hvm_next_io_handler(d);

handler->type = IOREQ_TYPE_PIO;
+ handler->ops = &portio_ops;
handler->u.portio.start = addr;
handler->u.portio.end = addr + size;
handler->u.portio.action = action;
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index c0964ec..51ef19a 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -208,185 +208,100 @@ void hvm_io_assist(ioreq_t *p)
}
}

-static int dpci_ioport_read(uint32_t mport, ioreq_t *p)
+static bool_t dpci_portio_accept(struct hvm_io_handler *handler,
+ uint64_t addr,
+ uint64_t size)
{
- struct hvm_vcpu_io *vio = &current->arch.hvm_vcpu.hvm_io;
- int rc = X86EMUL_OKAY, i, step = p->df ? -p->size : p->size;
- uint32_t data = 0;
+ struct vcpu *curr = current;
+ struct hvm_iommu *hd = domain_hvm_iommu(curr->domain);
+ struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
+ struct g2m_ioport *g2m_ioport;
+ uint32_t start, end;
+ uint32_t gport = addr, mport;

- for ( i = 0; i < p->count; i++ )
+ list_for_each_entry( g2m_ioport, &hd->arch.g2m_ioport_list, list )
{
- if ( vio->mmio_retrying )
- {
- if ( vio->mmio_large_read_bytes != p->size )
- return X86EMUL_UNHANDLEABLE;
- memcpy(&data, vio->mmio_large_read, p->size);
- vio->mmio_large_read_bytes = 0;
- vio->mmio_retrying = 0;
- }
- else switch ( p->size )
- {
- case 1:
- data = inb(mport);
- break;
- case 2:
- data = inw(mport);
- break;
- case 4:
- data = inl(mport);
- break;
- default:
- BUG();
- }
-
- if ( p->data_is_ptr )
- {
- switch ( hvm_copy_to_guest_phys(p->data + step * i,
- &data, p->size) )
- {
- case HVMCOPY_okay:
- break;
- case HVMCOPY_gfn_paged_out:
- case HVMCOPY_gfn_shared:
- rc = X86EMUL_RETRY;
- break;
- case HVMCOPY_bad_gfn_to_mfn:
- /* Drop the write as real hardware would. */
- continue;
- case HVMCOPY_bad_gva_to_gfn:
- ASSERT(0);
- /* fall through */
- default:
- rc = X86EMUL_UNHANDLEABLE;
- break;
- }
- if ( rc != X86EMUL_OKAY)
- break;
- }
- else
- p->data = data;
+ start = g2m_ioport->gport;
+ end = start + g2m_ioport->np;
+ if ( (gport >= start) && (gport + size <= end) )
+ goto found;
}

- if ( rc == X86EMUL_RETRY )
- {
- vio->mmio_retry = 1;
- vio->mmio_large_read_bytes = p->size;
- memcpy(vio->mmio_large_read, &data, p->size);
- }
+ return 0;

- if ( i != 0 )
- {
- p->count = i;
- rc = X86EMUL_OKAY;
- }
+ found:
+ mport = (gport - start) + g2m_ioport->mport;

- return rc;
+ vio->g2m_ioport = g2m_ioport;
+ return 1;
}

-static int dpci_ioport_write(uint32_t mport, ioreq_t *p)
+static int dpci_portio_read(struct hvm_io_handler *handler,
+ uint64_t addr,
+ uint64_t size,
+ uint64_t *data)
{
- int rc = X86EMUL_OKAY, i, step = p->df ? -p->size : p->size;
- uint32_t data;
-
- for ( i = 0; i < p->count; i++ )
- {
- data = p->data;
- if ( p->data_is_ptr )
- {
- switch ( hvm_copy_from_guest_phys(&data, p->data + step * i,
- p->size) )
- {
- case HVMCOPY_okay:
- break;
- case HVMCOPY_gfn_paged_out:
- case HVMCOPY_gfn_shared:
- rc = X86EMUL_RETRY;
- break;
- case HVMCOPY_bad_gfn_to_mfn:
- data = ~0;
- break;
- case HVMCOPY_bad_gva_to_gfn:
- ASSERT(0);
- /* fall through */
- default:
- rc = X86EMUL_UNHANDLEABLE;
- break;
- }
- if ( rc != X86EMUL_OKAY)
- break;
- }
-
- switch ( p->size )
- {
- case 1:
- outb(data, mport);
- break;
- case 2:
- outw(data, mport);
- break;
- case 4:
- outl(data, mport);
- break;
- default:
- BUG();
- }
- }
-
- if ( rc == X86EMUL_RETRY )
- current->arch.hvm_vcpu.hvm_io.mmio_retry = 1;
+ struct hvm_vcpu_io *vio = &current->arch.hvm_vcpu.hvm_io;
+ struct g2m_ioport *g2m_ioport = vio->g2m_ioport;
+ uint32_t mport = (addr - g2m_ioport->gport) + g2m_ioport->mport;

- if ( i != 0 )
+ switch ( size )
{
- p->count = i;
- rc = X86EMUL_OKAY;
+ case 1:
+ *data = inb(mport);
+ break;
+ case 2:
+ *data = inw(mport);
+ break;
+ case 4:
+ *data = inl(mport);
+ break;
+ default:
+ BUG();
}

- return rc;
+ return X86EMUL_OKAY;
}

-int dpci_ioport_intercept(ioreq_t *p)
+static int dpci_portio_write(struct hvm_io_handler *handler,
+ uint64_t addr,
+ uint64_t size,
+ uint64_t data)
{
- struct domain *d = current->domain;
- struct hvm_iommu *hd = domain_hvm_iommu(d);
- struct g2m_ioport *g2m_ioport;
- unsigned int mport, gport = p->addr;
- unsigned int s = 0, e = 0;
- int rc;
-
- list_for_each_entry( g2m_ioport, &hd->arch.g2m_ioport_list, list )
- {
- s = g2m_ioport->gport;
- e = s + g2m_ioport->np;
- if ( (gport >= s) && (gport < e) )
- goto found;
- }
-
- return X86EMUL_UNHANDLEABLE;
-
- found:
- mport = (gport - s) + g2m_ioport->mport;
-
- if ( !ioports_access_permitted(d, mport, mport + p->size - 1) )
- {
- gdprintk(XENLOG_ERR, "Error: access to gport=%#x denied!\n",
- (uint32_t)p->addr);
- return X86EMUL_UNHANDLEABLE;
- }
+ struct hvm_vcpu_io *vio = &current->arch.hvm_vcpu.hvm_io;
+ struct g2m_ioport *g2m_ioport = vio->g2m_ioport;
+ uint32_t mport = (addr - g2m_ioport->gport) + g2m_ioport->mport;

- switch ( p->dir )
+ switch ( size )
{
- case IOREQ_READ:
- rc = dpci_ioport_read(mport, p);
+ case 1:
+ outb(data, mport);
break;
- case IOREQ_WRITE:
- rc = dpci_ioport_write(mport, p);
+ case 2:
+ outw(data, mport);
+ break;
+ case 4:
+ outl(data, mport);
break;
default:
- gdprintk(XENLOG_ERR, "Error: couldn't handle p->dir = %d", p->dir);
- rc = X86EMUL_UNHANDLEABLE;
+ BUG();
}

- return rc;
+ return X86EMUL_OKAY;
+}
+
+static const struct hvm_io_ops dpci_portio_ops = {
+ .accept = dpci_portio_accept,
+ .read = dpci_portio_read,
+ .write = dpci_portio_write
+};
+
+void register_dpci_portio_handler(struct domain *d)
+{
+ struct hvm_io_handler *handler = hvm_next_io_handler(d);
+
+ handler->type = IOREQ_TYPE_PIO;
+ handler->ops = &dpci_portio_ops;
}

/*
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index b4596fc..afa27bf 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -48,6 +48,8 @@ struct hvm_mmio_ops {
typedef int (*portio_action_t)(
int dir, uint32_t port, uint32_t bytes, uint32_t *val);

+struct hvm_io_ops;
+
struct hvm_io_handler {
union {
struct {
@@ -58,6 +60,7 @@ struct hvm_io_handler {
portio_action_t action;
} portio;
} u;
+ const struct hvm_io_ops *ops;
uint8_t type;
};

@@ -81,6 +84,8 @@ struct hvm_io_ops {

int hvm_io_intercept(ioreq_t *p);

+struct hvm_io_handler *hvm_next_io_handler(struct domain *d);
+
void register_mmio_handler(struct domain *d,
const struct hvm_mmio_ops *ops);
void register_portio_handler(struct domain *d, uint32_t addr,
@@ -121,6 +126,9 @@ int stdvga_intercept_mmio(ioreq_t *p);
void stdvga_deinit(struct domain *d);

extern void hvm_dpci_msi_eoi(struct domain *d, int vector);
+
+void register_dpci_portio_handler(struct domain *d);
+
#endif /* __ASM_X86_HVM_IO_H__ */


diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index 3d8f4dc..dd08416 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -77,6 +77,8 @@ struct hvm_vcpu_io {
bool_t mmio_retry, mmio_retrying;

unsigned long msix_unmask_address;
+
+ struct g2m_ioport *g2m_ioport;
};

#define VMCX_EADDR (~0ULL)
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index b30bf41..1d00696 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -93,7 +93,6 @@ void pt_pci_init(void);

struct pirq;
int hvm_do_IRQ_dpci(struct domain *, struct pirq *);
-int dpci_ioport_intercept(ioreq_t *p);
int pt_irq_create_bind(struct domain *, xen_domctl_bind_pt_irq_t *);
int pt_irq_destroy_bind(struct domain *, xen_domctl_bind_pt_irq_t *);
--
1.7.10.4
Jan Beulich
2015-06-24 13:46:21 UTC
Permalink
Post by Paul Durrant
@@ -237,16 +234,13 @@ static struct hvm_io_handler *hvm_find_io_handler(ioreq_t *p)
{
struct vcpu *curr = current;
struct domain *curr_d = curr->domain;
- const struct hvm_io_ops *ops =
- (p->type == IOREQ_TYPE_COPY) ?
- &mmio_ops :
- &portio_ops;
unsigned int i;
for ( i = 0; i < curr_d->arch.hvm_domain.io_handler_count; i++ )
{
struct hvm_io_handler *handler =
&curr_d->arch.hvm_domain.io_handler[i];
+ const struct hvm_io_ops *ops = handler->ops;
Which highlights that handler should probably also become const
(right where being introduced).
Post by Paul Durrant
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -208,185 +208,100 @@ void hvm_io_assist(ioreq_t *p)
}
}
-static int dpci_ioport_read(uint32_t mport, ioreq_t *p)
+static bool_t dpci_portio_accept(struct hvm_io_handler *handler,
+ uint64_t addr,
+ uint64_t size)
{
- struct hvm_vcpu_io *vio = &current->arch.hvm_vcpu.hvm_io;
- int rc = X86EMUL_OKAY, i, step = p->df ? -p->size : p->size;
- uint32_t data = 0;
+ struct vcpu *curr = current;
+ struct hvm_iommu *hd = domain_hvm_iommu(curr->domain);
+ struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
+ struct g2m_ioport *g2m_ioport;
+ uint32_t start, end;
+ uint32_t gport = addr, mport;
All four of them should be unsigned int. And the same would again
apply to earlier patches. There's no need for port numbers to be
fixed width other than uint16_t (which would be inefficient for
other reasons).
Post by Paul Durrant
+ mport = (gport - start) + g2m_ioport->mport;
- return rc;
+ vio->g2m_ioport = g2m_ioport;
+ return 1;
}
The value calculated into mport is not used.
Post by Paul Durrant
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -48,6 +48,8 @@ struct hvm_mmio_ops {
typedef int (*portio_action_t)(
int dir, uint32_t port, uint32_t bytes, uint32_t *val);
+struct hvm_io_ops;
This is not needed in an internal header when the use is ...
Post by Paul Durrant
@@ -58,6 +60,7 @@ struct hvm_io_handler {
portio_action_t action;
} portio;
} u;
+ const struct hvm_io_ops *ops;
... in a structure, union, return value of a function, or declaration
of a global variable. Only function parameter types require such.
Post by Paul Durrant
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -77,6 +77,8 @@ struct hvm_vcpu_io {
bool_t mmio_retry, mmio_retrying;
unsigned long msix_unmask_address;
+
+ struct g2m_ioport *g2m_ioport;
};
I think this should be const, documenting that users of the field
won't modify what it points to.

Jan
Paul Durrant
2015-06-24 11:24:48 UTC
Permalink
If memory mapped I/O is 'chunked' then the I/O must be re-emulated,
otherwise only the first chunk will be processed. This patch makes
sure all I/O from a buffer is re-emulated regardless of whether it
is a read or a write.
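
To put it another way: completion must no longer depend on the direction,
only on whether the request came from a buffer. Expressed as a predicate
(an illustrative paraphrase of the HVMIO_NEED_COMPLETION change below, not
literal code from the patch):

    static inline bool_t hvmio_need_completion(const ioreq_t *req)
    {
        /* Any buffered (non data_is_ptr) request must be re-emulated once
         * the result is back, read or write, so later chunks get issued. */
        return (req->state == STATE_IOREQ_READY) && !req->data_is_ptr;
    }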

Signed-off-by: Paul Durrant <***@citrix.com>
Cc: Keir Fraser <***@xen.org>
Cc: Jan Beulich <***@suse.com>
Cc: Andrew Cooper <***@citrix.com>
---
xen/arch/x86/hvm/emulate.c | 4 ++--
xen/arch/x86/hvm/hvm.c | 13 ++++++++-----
xen/include/asm-x86/hvm/vcpu.h | 3 +--
3 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 6c50ef5..aa68787 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -148,7 +148,7 @@ static int hvmemul_do_io(
(p.data_is_ptr != data_is_addr) )
domain_crash(curr->domain);

- if ( data_is_addr || dir == IOREQ_WRITE )
+ if ( data_is_addr )
return X86EMUL_UNHANDLEABLE;
goto finish_access;
default:
@@ -188,7 +188,7 @@ static int hvmemul_do_io(
rc = hvm_send_assist_req(s, &p);
if ( rc != X86EMUL_RETRY )
vio->io_req.state = STATE_IOREQ_NONE;
- else if ( data_is_addr || dir == IOREQ_WRITE )
+ else if ( data_is_addr )
rc = X86EMUL_OKAY;
}
break;
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 8abf29b..c062c9f 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -501,11 +501,14 @@ void hvm_do_resume(struct vcpu *v)
(void)handle_mmio();
break;
case HVMIO_pio_completion:
- if ( vio->io_req.size == 4 ) /* Needs zero extension. */
- guest_cpu_user_regs()->rax = (uint32_t)vio->io_req.data;
- else
- memcpy(&guest_cpu_user_regs()->rax, &vio->io_req.data,
- vio->io_req.size);
+ if ( vio->io_req.dir == IOREQ_READ )
+ {
+ if ( vio->io_req.size == 4 ) /* Needs zero extension. */
+ guest_cpu_user_regs()->rax = (uint32_t)vio->io_req.data;
+ else
+ memcpy(&guest_cpu_user_regs()->rax, &vio->io_req.data,
+ vio->io_req.size);
+ }
vio->io_req.state = STATE_IOREQ_NONE;
break;
case HVMIO_realmode_completion:
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index 7338638..008c8fa 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -49,8 +49,7 @@ struct hvm_vcpu_io {

#define HVMIO_NEED_COMPLETION(_vio) \
( ((_vio)->io_req.state == STATE_IOREQ_READY) && \
- !(_vio)->io_req.data_is_ptr && \
- ((_vio)->io_req.dir == IOREQ_READ) )
+ !(_vio)->io_req.data_is_ptr )

/*
* HVM emulation:
--
1.7.10.4
Jan Beulich
2015-06-25 09:57:44 UTC
Permalink
Post by Paul Durrant
If memory mapped I/O is 'chunked' then the I/O must be re-emulated,
otherwise only the first chunk will be processed. This patch makes
sure all I/O from a buffer is re-emulated regardless of whether it
is a read or a write.
I'm not sure I understand this: Isn't the reason for treating reads
and writes differently due to the fact that MMIO reads may have
side effects, and hence can't be re-issued (whereas writes are
always the last thing an instruction does, and hence can't hold up
retiring of it, and hence don't need retrying)?

Furthermore, doesn't "only the first chunk" get represented correctly
already by informing the caller that only a single iteration of a
repeated instruction was done, such that further repeats will get
carried out anyway (resulting in another, fresh cycle through the
emulator)?

Jan
Paul Durrant
2015-06-25 10:32:45 UTC
Permalink
-----Original Message-----
Sent: 25 June 2015 10:58
To: Paul Durrant
Subject: Re: [PATCH v4 16/17] x86/hvm: always re-emulate I/O from a buffer
Post by Paul Durrant
If memory mapped I/O is 'chunked' then the I/O must be re-emulated,
otherwise only the first chunk will be processed. This patch makes
sure all I/O from a buffer is re-emulated regardless of whether it
is a read or a write.
I'm not sure I understand this: Isn't the reason for treating reads
and writes differently due to the fact that MMIO reads may have
side effects, and hence can't be re-issued (whereas writes are
always the last thing an instruction does, and hence can't hold up
retiring of it, and hence don't need retrying)?
Reads were always re-issued, which is why handle_mmio() is called in hvm_io_assist(). If the underlying MMIO is deferred to QEMU then this is the only way for Xen to pick up the result. This patch adds completion for writes.
If the I/O has been broken down in the underlying hvmemul_write() and a 'chunk' deferred to QEMU then there is actually a need to re-emulate, otherwise any remaining chunks will not be handled.
Furthermore, doesn't "only the first chunk" get represented correctly
already by informing the caller that only a single iteration of a
repeated instruction was done, such that further repeats will get
carried out anyway (resulting in another, fresh cycle through the
emulator)?
No, because we're talking about 'chunks' here and not 'reps'. If a single non-rep I/O is broken down into, say, two chunks then we:

- Issue the I/O for the first chunk to QEMU
- Say we did nothing by returning RETRY
- Re-issue the emulation from hvm_io_assist()
- Pick up the result of the first chunk from the ioreq, add it to the cache, and issue the second chunk to QEMU
- Say we did nothing by returning RETRY
- Re-issue the emulation from hvm_io_assist()
- Pick up the result of the first chunk from the cache and pick up the result of the second chunk from the ioreq
- Say we completed the I/O by returning OKAY

I agree it's not nice, and bouncing would have been preferable, but that's the way 'wide I/O' works.
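
For reference, the shape of the caller-side loop that produces this
behaviour is roughly as follows (a simplified sketch of
hvmemul_phys_mmio_access(), annotated with the flow above; not the literal
code):

    for ( ;; )
    {
        /* Issues the chunk, or picks up its result on re-emulation. */
        rc = hvmemul_do_mmio_buffer(gpa, &one_rep, chunk, dir, 0, buffer);
        if ( rc != X86EMUL_OKAY )
            break;              /* X86EMUL_RETRY: chunk pending in QEMU */

        /* Advance to the next chunk. */
        gpa += chunk;
        buffer += chunk;
        size -= chunk;
        if ( size == 0 )
            break;
    }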

Paul
Jan
Jan Beulich
2015-06-25 10:50:10 UTC
Permalink
Post by Paul Durrant
-----Original Message-----
Sent: 25 June 2015 10:58
To: Paul Durrant
Subject: Re: [PATCH v4 16/17] x86/hvm: always re-emulate I/O from a buffer
Post by Paul Durrant
If memory mapped I/O is 'chunked' then the I/O must be re-emulated,
otherwise only the first chunk will be processed. This patch makes
sure all I/O from a buffer is re-emulated regardless of whether it
is a read or a write.
I'm not sure I understand this: Isn't the reason for treating reads
and writes differently due to the fact that MMIO reads may have
side effects, and hence can't be re-issued (whereas writes are
always the last thing an instruction does, and hence can't hold up
retiring of it, and hence don't need retrying)?
Read were always re-issued, which is why handle_mmio() is called in
hvm_io_assit(). If the underlying MMIO is deferred to QEMU then this is the
only way for Xen to pick up the result. This patch adds completion for
writes.
If the I/O has been broken down in the underlying hvmemul_write() and a
'chunk' deferred to QEMU then there is actually need to re-emulate otherwise
any remaining chunks will not be handled.
Furthermore, doesn't "only the first chunk" get represented correctly
already by informing the caller that only a single iteration of a
repeated instruction was done, such that further repeats will get
carried out anyway (resulting in another, fresh cycle through the
emulator)?
No, because we're talking about 'chunks' here and not 'reps'. If a single
non-rep I/O is broken down into, say, two chunks then we:
- Issue the I/O for the first chunk to QEMU
- Say we did nothing by returning RETRY
- Re-issue the emulation from hvm_io_assist()
- Pick up the result of the first chunk from the ioreq, add it to the cache,
and issue the second chunk to QEMU
- Say we did nothing by returning RETRY
- Re-issue the emulation from hvm_io_assist()
- Pick up the result of the first chunk from the cache and pick up the result
of the second chunk from the ioreq
- Say we completed the I/O by returning OKAY
I agree it's not nice, and bouncing would have been preferable, but that's
the way 'wide I/O' works.
I see. Which means
Acked-by: Jan Beulich <***@suse.com>

Jan
Paul Durrant
2015-06-25 10:52:02 UTC
Permalink
-----Original Message-----
Sent: 25 June 2015 11:50
To: Paul Durrant
Subject: RE: [PATCH v4 16/17] x86/hvm: always re-emulate I/O from a buffer
Post by Paul Durrant
-----Original Message-----
Sent: 25 June 2015 10:58
To: Paul Durrant
Subject: Re: [PATCH v4 16/17] x86/hvm: always re-emulate I/O from a
buffer
Post by Paul Durrant
Post by Paul Durrant
If memory mapped I/O is 'chunked' then the I/O must be re-emulated,
otherwise only the first chunk will be processed. This patch makes
sure all I/O from a buffer is re-emulated regardless of whether it
is a read or a write.
I'm not sure I understand this: Isn't the reason for treating reads
and writes differently due to the fact that MMIO reads may have
side effects, and hence can't be re-issued (whereas writes are
always the last thing an instruction does, and hence can't hold up
retiring of it, and hence don't need retrying)?
Read were always re-issued, which is why handle_mmio() is called in
hvm_io_assit(). If the underlying MMIO is deferred to QEMU then this is
the
Post by Paul Durrant
only way for Xen to pick up the result. This patch adds completion for
writes.
If the I/O has been broken down in the underlying hvmemul_write() and a
'chunk' deferred to QEMU then there is actually need to re-emulate
otherwise
Post by Paul Durrant
any remaining chunks will not be handled.
Furthermore, doesn't "only the first chunk" get represented correctly
already by informing the caller that only a single iteration of a
repeated instruction was done, such that further repeats will get
carried out anyway (resulting in another, fresh cycle through the
emulator)?
No, because we're talking about 'chunks' here and not 'reps'. If a single
- Issue the I/O for the first chunk to QEMU
- Say we did nothing by returning RETRY
- Re-issue the emulation from hvm_io_assist()
- Pick up the result of the first chunk from the ioreq, add it to the cache,
and issue the second chunk to QEMU
- Say we did nothing by returning RETRY
- Re-issue the emulation from hvm_io_assist()
- Pick up the result of the first chunk from the cache and pick up the result
of the second chunk from the ioreq
- Say we completed the I/O by returning OKAY
I agree it's not nice, and bouncing would have been preferable, but that's
the way 'wide I/O' works.
I see. Which means
Thanks.

Paul
Jan
Paul Durrant
2015-06-24 11:24:49 UTC
Permalink
The code in hvmemul_do_io() that tracks large reads or writes, to avoid
re-issue of component I/O, is defeated by accesses across a page boundary
because it uses the physical address. The code is also only relevant to
memory mapped I/O to or from a buffer.

This patch re-factors the code and moves it into hvmemul_phys_mmio_access()
where it is relevant and tracks using buffer offset rather than address.
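
The tracking idea, reduced to its essentials (a minimal sketch with
simplified names; the real patch keeps one such cache per ioreq direction
inside struct hvm_vcpu_io):

    struct mmio_cache {
        unsigned long size;   /* bytes of this access already transferred */
        uint8_t buffer[32];   /* their values, indexed by buffer offset   */
    };

    /* A chunk at buffer offset 'off' has already been done iff it lies
     * entirely within the bytes recorded so far. */
    static bool_t chunk_already_done(const struct mmio_cache *cache,
                                     unsigned int off, unsigned int chunk)
    {
        return (off + chunk) <= cache->size;
    }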

Signed-off-by: Paul Durrant <***@citrix.com>
Cc: Keir Fraser <***@xen.org>
Cc: Jan Beulich <***@suse.com>
Cc: Andrew Cooper <***@citrix.com>
---
xen/arch/x86/hvm/emulate.c | 98 ++++++++++++++++------------------------
xen/include/asm-x86/hvm/vcpu.h | 16 ++++---
2 files changed, 48 insertions(+), 66 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index aa68787..4424dfc 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -107,29 +107,6 @@ static int hvmemul_do_io(
return X86EMUL_UNHANDLEABLE;
}

- if ( is_mmio && !data_is_addr )
- {
- /* Part of a multi-cycle read or write? */
- if ( dir == IOREQ_WRITE )
- {
- paddr_t pa = vio->mmio_large_write_pa;
- unsigned int bytes = vio->mmio_large_write_bytes;
- if ( (addr >= pa) && ((addr + size) <= (pa + bytes)) )
- return X86EMUL_OKAY;
- }
- else
- {
- paddr_t pa = vio->mmio_large_read_pa;
- unsigned int bytes = vio->mmio_large_read_bytes;
- if ( (addr >= pa) && ((addr + size) <= (pa + bytes)) )
- {
- memcpy(p_data, &vio->mmio_large_read[addr - pa],
- size);
- return X86EMUL_OKAY;
- }
- }
- }
-
switch ( vio->io_req.state )
{
case STATE_IOREQ_NONE:
@@ -209,33 +186,6 @@ static int hvmemul_do_io(
memcpy(p_data, &p.data, size);
}

- if ( is_mmio && !data_is_addr )
- {
- /* Part of a multi-cycle read or write? */
- if ( dir == IOREQ_WRITE )
- {
- paddr_t pa = vio->mmio_large_write_pa;
- unsigned int bytes = vio->mmio_large_write_bytes;
- if ( bytes == 0 )
- pa = vio->mmio_large_write_pa = addr;
- if ( addr == (pa + bytes) )
- vio->mmio_large_write_bytes += size;
- }
- else
- {
- paddr_t pa = vio->mmio_large_read_pa;
- unsigned int bytes = vio->mmio_large_read_bytes;
- if ( bytes == 0 )
- pa = vio->mmio_large_read_pa = addr;
- if ( (addr == (pa + bytes)) &&
- ((bytes + size) <= sizeof(vio->mmio_large_read)) )
- {
- memcpy(&vio->mmio_large_read[bytes], p_data, size);
- vio->mmio_large_read_bytes += size;
- }
- }
- }
-
return X86EMUL_OKAY;
}

@@ -601,8 +551,11 @@ static int hvmemul_virtual_to_linear(
}

static int hvmemul_phys_mmio_access(
- paddr_t gpa, unsigned int size, uint8_t dir, uint8_t **buffer)
+ paddr_t gpa, unsigned int size, uint8_t dir, uint8_t *buffer,
+ unsigned int *off)
{
+ struct vcpu *curr = current;
+ struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
unsigned long one_rep = 1;
unsigned int chunk;
int rc;
@@ -621,14 +574,41 @@ static int hvmemul_phys_mmio_access(

for ( ;; )
{
- rc = hvmemul_do_mmio_buffer(gpa, &one_rep, chunk, dir, 0,
- *buffer);
- if ( rc != X86EMUL_OKAY )
- break;
+ /* Have we already done this chunk? */
+ if ( (*off + chunk) <= vio->mmio_cache[dir].size )
+ {
+ ASSERT(*off + chunk <= vio->mmio_cache[dir].size);
+
+ if ( dir == IOREQ_READ )
+ memcpy(&buffer[*off],
+ &vio->mmio_cache[IOREQ_READ].buffer[*off],
+ chunk);
+ else
+ {
+ if ( memcmp(&buffer[*off],
+ &vio->mmio_cache[IOREQ_WRITE].buffer[*off],
+ chunk) != 0 )
+ domain_crash(curr->domain);
+ }
+ }
+ else
+ {
+ ASSERT(*off == vio->mmio_cache[dir].size);
+
+ rc = hvmemul_do_mmio_buffer(gpa, &one_rep, chunk, dir, 0,
+ &buffer[*off]);
+ if ( rc != X86EMUL_OKAY )
+ break;
+
+ /* Note that we have now done this chunk */
+ memcpy(&vio->mmio_cache[dir].buffer[*off],
+ &buffer[*off], chunk);
+ vio->mmio_cache[dir].size += chunk;
+ }

/* Advance to the next chunk */
gpa += chunk;
- *buffer += chunk;
+ *off += chunk;
size -= chunk;

if ( size == 0 )
@@ -651,7 +631,7 @@ static int hvmemul_linear_mmio_access(
{
struct hvm_vcpu_io *vio = &current->arch.hvm_vcpu.hvm_io;
unsigned long page_off = gla & (PAGE_SIZE - 1);
- unsigned int chunk;
+ unsigned int chunk, buffer_off = 0;
paddr_t gpa;
unsigned long one_rep = 1;
int rc;
@@ -670,7 +650,7 @@ static int hvmemul_linear_mmio_access(

for ( ;; )
{
- rc = hvmemul_phys_mmio_access(gpa, chunk, dir, &buffer);
+ rc = hvmemul_phys_mmio_access(gpa, chunk, dir, buffer, &buffer_off);
if ( rc != X86EMUL_OKAY )
break;

@@ -1625,7 +1605,7 @@ static int _hvm_emulate_one(struct hvm_emulate_ctxt *hvmemul_ctxt,
rc = X86EMUL_RETRY;
if ( rc != X86EMUL_RETRY )
{
- vio->mmio_large_read_bytes = vio->mmio_large_write_bytes = 0;
+ memset(&vio->mmio_cache, 0, sizeof(vio->mmio_cache));
vio->mmio_insn_bytes = 0;
}
else
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index 008c8fa..4f41c83 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -61,13 +61,15 @@ struct hvm_vcpu_io {
unsigned long mmio_gva;
unsigned long mmio_gpfn;

- /* We may read up to m256 as a number of device-model transactions. */
- paddr_t mmio_large_read_pa;
- uint8_t mmio_large_read[32];
- unsigned int mmio_large_read_bytes;
- /* We may write up to m256 as a number of device-model transactions. */
- unsigned int mmio_large_write_bytes;
- paddr_t mmio_large_write_pa;
+ /*
+ * We may read or write up to m256 as a number of device-model
+ * transactions.
+ */
+ struct {
+ unsigned long size;
+ uint8_t buffer[32];
+ } mmio_cache[2]; /* Indexed by ioreq type */
+
/* For retries we shouldn't re-fetch the instruction. */
unsigned int mmio_insn_bytes;
unsigned char mmio_insn[16];
--
1.7.10.4
Jan Beulich
2015-06-25 10:46:50 UTC
Permalink
Post by Paul Durrant
@@ -621,14 +574,41 @@ static int hvmemul_phys_mmio_access(
for ( ;; )
{
- rc = hvmemul_do_mmio_buffer(gpa, &one_rep, chunk, dir, 0,
- *buffer);
- if ( rc != X86EMUL_OKAY )
- break;
+ /* Have we already done this chunk? */
+ if ( (*off + chunk) <= vio->mmio_cache[dir].size )
I can see why you would like to get rid of the address check, but
I'm afraid you can't: You have to avoid getting mixed up multiple
same kind (reads or writes) memory accesses that a single
instruction can do. While generally I would assume that
secondary accesses (like the I/O bitmap read associated with an
OUTS) wouldn't go to MMIO, CMPS with both operands being
in MMIO would break even if neither crosses a page boundary
(not to think of when the emulator starts supporting the
scatter/gather instructions, albeit supporting them will require
further changes, or we could choose to do them one element at
a time).
Post by Paul Durrant
+ {
+ ASSERT(*off + chunk <= vio->mmio_cache[dir].size);
I don't see any difference to the if() expression just above.
Post by Paul Durrant
+ if ( dir == IOREQ_READ )
+ memcpy(&buffer[*off],
+ &vio->mmio_cache[IOREQ_READ].buffer[*off],
+ chunk);
+ else
+ {
+ if ( memcmp(&buffer[*off],
"else if" please.
Post by Paul Durrant
+ &vio->mmio_cache[IOREQ_WRITE].buffer[*off],
+ chunk) != 0 )
+ domain_crash(curr->domain);
+ }
+ }
+ else
+ {
+ ASSERT(*off == vio->mmio_cache[dir].size);
+
+ rc = hvmemul_do_mmio_buffer(gpa, &one_rep, chunk, dir, 0,
+ &buffer[*off]);
+ if ( rc != X86EMUL_OKAY )
+ break;
+
+ /* Note that we have now done this chunk */
Missing stop.

Jan
Paul Durrant
2015-06-25 10:51:32 UTC
Permalink
-----Original Message-----
Sent: 25 June 2015 11:47
To: Paul Durrant
Subject: Re: [PATCH v4 17/17] x86/hvm: track large memory mapped
accesses by buffer offset
Post by Paul Durrant
@@ -621,14 +574,41 @@ static int hvmemul_phys_mmio_access(
for ( ;; )
{
- rc = hvmemul_do_mmio_buffer(gpa, &one_rep, chunk, dir, 0,
- *buffer);
- if ( rc != X86EMUL_OKAY )
- break;
+ /* Have we already done this chunk? */
+ if ( (*off + chunk) <= vio->mmio_cache[dir].size )
I can see why you would like to get rid of the address check, but
I'm afraid you can't: You have to avoid getting mixed up multiple
same kind (reads or writes) memory accesses that a single
instruction can do. While generally I would assume that
secondary accesses (like the I/O bitmap read associated with an
OUTS) wouldn't go to MMIO, CMPS with both operands being
in MMIO would break even if neither crosses a page boundary
(not to think of when the emulator starts supporting the
scatter/gather instructions, albeit supporting them will require
further changes, or we could choose to do them one element at
a time).
Ok. Can I assume at most two distinct sets of addresses for read or write? If so then I can just keep two sets of caches in the hvm_io struct.
Post by Paul Durrant
+ {
+ ASSERT(*off + chunk <= vio->mmio_cache[dir].size);
I don't see any difference to the if() expression just above.
That's possible - this has been through a few re-bases.
Post by Paul Durrant
+ if ( dir == IOREQ_READ )
+ memcpy(&buffer[*off],
+ &vio->mmio_cache[IOREQ_READ].buffer[*off],
+ chunk);
+ else
+ {
+ if ( memcmp(&buffer[*off],
"else if" please.
Ok.
Post by Paul Durrant
+ &vio->mmio_cache[IOREQ_WRITE].buffer[*off],
+ chunk) != 0 )
+ domain_crash(curr->domain);
+ }
+ }
+ else
+ {
+ ASSERT(*off == vio->mmio_cache[dir].size);
+
+ rc = hvmemul_do_mmio_buffer(gpa, &one_rep, chunk, dir, 0,
+ &buffer[*off]);
+ if ( rc != X86EMUL_OKAY )
+ break;
+
+ /* Note that we have now done this chunk */
Missing stop.
Ok.

Paul
Jan
Jan Beulich
2015-06-25 11:05:51 UTC
Permalink
Post by Paul Durrant
-----Original Message-----
Sent: 25 June 2015 11:47
To: Paul Durrant
Subject: Re: [PATCH v4 17/17] x86/hvm: track large memory mapped
accesses by buffer offset
Post by Paul Durrant
@@ -621,14 +574,41 @@ static int hvmemul_phys_mmio_access(
for ( ;; )
{
- rc = hvmemul_do_mmio_buffer(gpa, &one_rep, chunk, dir, 0,
- *buffer);
- if ( rc != X86EMUL_OKAY )
- break;
+ /* Have we already done this chunk? */
+ if ( (*off + chunk) <= vio->mmio_cache[dir].size )
I can see why you would like to get rid of the address check, but
I'm afraid you can't: You have to avoid getting mixed up multiple
same kind (reads or writes) memory accesses that a single
instruction can do. While generally I would assume that
secondary accesses (like the I/O bitmap read associated with an
OUTS) wouldn't go to MMIO, CMPS with both operands being
in MMIO would break even if neither crosses a page boundary
(not to think of when the emulator starts supporting the
scatter/gather instructions, albeit supporting them will require
further changes, or we could choose to do them one element at
a time).
Ok. Can I assume at most two distinct set of addresses for read or write? If
so then I can just keep two sets of caches in the hvm_io struct.
If we can leave out implicit accesses (like the one mentioned)
as well as stack ones, then there shouldn't be more than two
(disjoint) reads and one write per instruction, but each possibly
crossing a page boundary.

If we want to support stacks in MMIO, enter and leave would
extend that set, as would said implicit accesses. Of course we
should take into consideration what currently works, and I
think both stack and implicit accesses would currently work as
long as they're aligned (as misalignment would be the only
reason for them to get split up - they're never wider than a
long). I.e. you may want to consider avoiding any ASSERT()s
or other conditionals potentially breaking these special cases.

Jan
Paul Durrant
2015-06-25 10:55:07 UTC
Permalink
-----Original Message-----
From: Paul Durrant
Sent: 25 June 2015 11:52
To: 'Jan Beulich'
Subject: RE: [PATCH v4 17/17] x86/hvm: track large memory mapped
accesses by buffer offset
-----Original Message-----
Sent: 25 June 2015 11:47
To: Paul Durrant
Subject: Re: [PATCH v4 17/17] x86/hvm: track large memory mapped
accesses by buffer offset
Post by Paul Durrant
@@ -621,14 +574,41 @@ static int hvmemul_phys_mmio_access(
for ( ;; )
{
- rc = hvmemul_do_mmio_buffer(gpa, &one_rep, chunk, dir, 0,
- *buffer);
- if ( rc != X86EMUL_OKAY )
- break;
+ /* Have we already done this chunk? */
+ if ( (*off + chunk) <= vio->mmio_cache[dir].size )
I can see why you would like to get rid of the address check, but
I'm afraid you can't: You have to avoid getting mixed up multiple
same kind (reads or writes) memory accesses that a single
instruction can do. While generally I would assume that
secondary accesses (like the I/O bitmap read associated with an
OUTS) wouldn't go to MMIO, CMPS with both operands being
in MMIO would break even if neither crosses a page boundary
(not to think of when the emulator starts supporting the
scatter/gather instructions, albeit supporting them will require
further changes, or we could choose to do them one element at
a time).
Ok. Can I assume at most two distinct set of addresses for read or write? If so
then I can just keep two sets of caches in the hvm_io struct.
Oh, I mean linear addresses here BTW.

Paul
Post by Paul Durrant
+ {
+ ASSERT(*off + chunk <= vio->mmio_cache[dir].size);
I don't see any difference to the if() expression just above.
That's possible - this has been through a few re-bases.
Post by Paul Durrant
+ if ( dir == IOREQ_READ )
+ memcpy(&buffer[*off],
+ &vio->mmio_cache[IOREQ_READ].buffer[*off],
+ chunk);
+ else
+ {
+ if ( memcmp(&buffer[*off],
"else if" please.
Ok.
Post by Paul Durrant
+ &vio->mmio_cache[IOREQ_WRITE].buffer[*off],
+ chunk) != 0 )
+ domain_crash(curr->domain);
+ }
+ }
+ else
+ {
+ ASSERT(*off == vio->mmio_cache[dir].size);
+
+ rc = hvmemul_do_mmio_buffer(gpa, &one_rep, chunk, dir, 0,
+ &buffer[*off]);
+ if ( rc != X86EMUL_OKAY )
+ break;
+
+ /* Note that we have now done this chunk */
Missing stop.
Ok.
Paul
Jan
Jan Beulich
2015-06-25 11:08:09 UTC
Permalink
Post by Paul Durrant
From: Paul Durrant
Sent: 25 June 2015 11:52
Sent: 25 June 2015 11:47
Post by Paul Durrant
@@ -621,14 +574,41 @@ static int hvmemul_phys_mmio_access(
for ( ;; )
{
- rc = hvmemul_do_mmio_buffer(gpa, &one_rep, chunk, dir, 0,
- *buffer);
- if ( rc != X86EMUL_OKAY )
- break;
+ /* Have we already done this chunk? */
+ if ( (*off + chunk) <= vio->mmio_cache[dir].size )
I can see why you would like to get rid of the address check, but
I'm afraid you can't: You have to avoid getting mixed up multiple
same kind (reads or writes) memory accesses that a single
instruction can do. While generally I would assume that
secondary accesses (like the I/O bitmap read associated with an
OUTS) wouldn't go to MMIO, CMPS with both operands being
in MMIO would break even if neither crosses a page boundary
(not to think of when the emulator starts supporting the
scatter/gather instructions, albeit supporting them will require
further changes, or we could choose to do them one element at
a time).
Ok. Can I assume at most two distinct set of addresses for read or write? If
so
then I can just keep two sets of caches in the hvm_io struct.
Oh, I mean linear addresses here BTW.
Yes, that's what I implied - afaics switching to using linear addresses
shouldn't result in any problem (but then again I wonder whether
physical addresses really were chosen originally for no real reason).

Jan
Paul Durrant
2015-06-24 11:24:44 UTC
Permalink
The state of in-flight I/O and how its completion will be handled are
logically separate and conflating the two makes the code unnecessarily
confusing.

Signed-off-by: Paul Durrant <***@citrix.com>
Cc: Keir Fraser <***@xen.org>
Cc: Jan Beulich <***@suse.com>
Cc: Andrew Cooper <***@citrix.com>
---
xen/arch/x86/hvm/hvm.c | 50 ++++++++++++++++++++++++-------------
xen/arch/x86/hvm/io.c | 6 ++---
xen/arch/x86/hvm/vmx/realmode.c | 27 ++++++++++++++------
xen/include/asm-x86/hvm/vcpu.h | 16 ++++++++----
xen/include/asm-x86/hvm/vmx/vmx.h | 1 +
5 files changed, 68 insertions(+), 32 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 3365abb..39f40ad 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -59,6 +59,7 @@
#include <asm/hvm/trace.h>
#include <asm/hvm/nestedhvm.h>
#include <asm/hvm/event.h>
+#include <asm/hvm/vmx/vmx.h>
#include <asm/mtrr.h>
#include <asm/apic.h>
#include <public/sched.h>
@@ -428,26 +429,12 @@ static void hvm_io_assist(ioreq_t *p)
vio->io_state = HVMIO_completed;
vio->io_data = p->data;
break;
- case HVMIO_handle_mmio_awaiting_completion:
- vio->io_state = HVMIO_completed;
- vio->io_data = p->data;
- (void)handle_mmio();
- break;
- case HVMIO_handle_pio_awaiting_completion:
- if ( vio->io_size == 4 ) /* Needs zero extension. */
- guest_cpu_user_regs()->rax = (uint32_t)p->data;
- else
- memcpy(&guest_cpu_user_regs()->rax, &p->data, vio->io_size);
- break;
default:
break;
}

- if ( p->state == STATE_IOREQ_NONE )
- {
- msix_write_completion(curr);
- vcpu_end_shutdown_deferral(curr);
- }
+ msix_write_completion(curr);
+ vcpu_end_shutdown_deferral(curr);
}

static bool_t hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
@@ -482,6 +469,7 @@ void hvm_do_resume(struct vcpu *v)
struct hvm_vcpu_io *vio = &v->arch.hvm_vcpu.hvm_io;
struct domain *d = v->domain;
struct hvm_ioreq_server *s;
+ enum hvm_io_completion io_completion;

check_wakeup_from_wait();

@@ -508,8 +496,36 @@ void hvm_do_resume(struct vcpu *v)
}
}

- if ( vio->mmio_retry )
+ io_completion = vio->io_completion;
+ vio->io_completion = HVMIO_no_completion;
+
+ switch ( io_completion )
+ {
+ case HVMIO_no_completion:
+ break;
+ case HVMIO_mmio_completion:
(void)handle_mmio();
+ break;
+ case HVMIO_pio_completion:
+ if ( vio->io_size == 4 ) /* Needs zero extension. */
+ guest_cpu_user_regs()->rax = (uint32_t)vio->io_data;
+ else
+ memcpy(&guest_cpu_user_regs()->rax, &vio->io_data, vio->io_size);
+ vio->io_state = HVMIO_none;
+ break;
+ case HVMIO_realmode_completion:
+ {
+ struct hvm_emulate_ctxt ctxt;
+
+ hvm_emulate_prepare(&ctxt, guest_cpu_user_regs());
+ vmx_realmode_emulate_one(&ctxt);
+ hvm_emulate_writeback(&ctxt);
+
+ break;
+ }
+ default:
+ BUG();
+ }

/* Inject pending hw/sw trap */
if ( v->arch.hvm_vcpu.inject_trap.vector != -1 )
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index 61df6dd..27150e9 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -92,8 +92,8 @@ int handle_mmio(void)

if ( rc != X86EMUL_RETRY )
vio->io_state = HVMIO_none;
- if ( vio->io_state == HVMIO_awaiting_completion )
- vio->io_state = HVMIO_handle_mmio_awaiting_completion;
+ if ( vio->io_state == HVMIO_awaiting_completion || vio->mmio_retry )
+ vio->io_completion = HVMIO_mmio_completion;
else
vio->mmio_access = (struct npfec){};

@@ -158,7 +158,7 @@ int handle_pio(uint16_t port, unsigned int size, int dir)
return 0;
/* Completion in hvm_io_assist() with no re-emulation required. */
ASSERT(dir == IOREQ_READ);
- vio->io_state = HVMIO_handle_pio_awaiting_completion;
+ vio->io_completion = HVMIO_pio_completion;
break;
default:
gdprintk(XENLOG_ERR, "Weird HVM ioemulation status %d.\n", rc);
diff --git a/xen/arch/x86/hvm/vmx/realmode.c b/xen/arch/x86/hvm/vmx/realmode.c
index fe8b4a0..76ff9a5 100644
--- a/xen/arch/x86/hvm/vmx/realmode.c
+++ b/xen/arch/x86/hvm/vmx/realmode.c
@@ -101,15 +101,19 @@ static void realmode_deliver_exception(
}
}

-static void realmode_emulate_one(struct hvm_emulate_ctxt *hvmemul_ctxt)
+void vmx_realmode_emulate_one(struct hvm_emulate_ctxt *hvmemul_ctxt)
{
struct vcpu *curr = current;
+ struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
int rc;

perfc_incr(realmode_emulations);

rc = hvm_emulate_one(hvmemul_ctxt);

+ if ( vio->io_state == HVMIO_awaiting_completion || vio->mmio_retry )
+ vio->io_completion = HVMIO_realmode_completion;
+
if ( rc == X86EMUL_UNHANDLEABLE )
{
gdprintk(XENLOG_ERR, "Failed to emulate insn.\n");
@@ -177,9 +181,6 @@ void vmx_realmode(struct cpu_user_regs *regs)

hvm_emulate_prepare(&hvmemul_ctxt, regs);

- if ( vio->io_state == HVMIO_completed )
- realmode_emulate_one(&hvmemul_ctxt);
-
/* Only deliver interrupts into emulated real mode. */
if ( !(curr->arch.hvm_vcpu.guest_cr[0] & X86_CR0_PE) &&
(intr_info & INTR_INFO_VALID_MASK) )
@@ -190,8 +191,7 @@ void vmx_realmode(struct cpu_user_regs *regs)

curr->arch.hvm_vmx.vmx_emulate = 1;
while ( curr->arch.hvm_vmx.vmx_emulate &&
- !softirq_pending(smp_processor_id()) &&
- (vio->io_state == HVMIO_none) )
+ !softirq_pending(smp_processor_id()) )
{
/*
* Check for pending interrupts only every 16 instructions, because
@@ -203,7 +203,10 @@ void vmx_realmode(struct cpu_user_regs *regs)
hvm_local_events_need_delivery(curr) )
break;

- realmode_emulate_one(&hvmemul_ctxt);
+ vmx_realmode_emulate_one(&hvmemul_ctxt);
+
+ if ( vio->io_state != HVMIO_none || vio->mmio_retry )
+ break;

/* Stop emulating unless our segment state is not safe */
if ( curr->arch.hvm_vmx.vmx_realmode )
@@ -245,3 +248,13 @@ void vmx_realmode(struct cpu_user_regs *regs)
if ( intr_info & INTR_INFO_VALID_MASK )
__vmwrite(VM_ENTRY_INTR_INFO, intr_info);
}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index 97d78bd..c4d96a8 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -34,11 +34,16 @@ enum hvm_io_state {
HVMIO_none = 0,
HVMIO_dispatched,
HVMIO_awaiting_completion,
- HVMIO_handle_mmio_awaiting_completion,
- HVMIO_handle_pio_awaiting_completion,
HVMIO_completed
};

+enum hvm_io_completion {
+ HVMIO_no_completion = 0,
+ HVMIO_mmio_completion,
+ HVMIO_pio_completion,
+ HVMIO_realmode_completion
+};
+
struct hvm_vcpu_asid {
uint64_t generation;
uint32_t asid;
@@ -46,9 +51,10 @@ struct hvm_vcpu_asid {

struct hvm_vcpu_io {
/* I/O request in flight to device model. */
- enum hvm_io_state io_state;
- unsigned long io_data;
- int io_size;
+ enum hvm_io_state io_state;
+ unsigned long io_data;
+ int io_size;
+ enum hvm_io_completion io_completion;

/*
* HVM emulation:
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index 35f804a..c5f3d24 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -94,6 +94,7 @@ void vmx_asm_do_vmentry(void);
void vmx_intr_assist(void);
void noreturn vmx_do_resume(struct vcpu *);
void vmx_vlapic_msr_changed(struct vcpu *v);
+void vmx_realmode_emulate_one(struct hvm_emulate_ctxt *hvmemul_ctxt);
void vmx_realmode(struct cpu_user_regs *regs);
void vmx_update_debug_state(struct vcpu *v);
void vmx_update_exception_bitmap(struct vcpu *v);
--
1.7.10.4
Jan Beulich
2015-06-25 09:40:38 UTC
Permalink
Post by Paul Durrant
@@ -428,26 +429,12 @@ static void hvm_io_assist(ioreq_t *p)
vio->io_state = HVMIO_completed;
vio->io_data = p->data;
break;
- case HVMIO_handle_mmio_awaiting_completion:
- vio->io_state = HVMIO_completed;
- vio->io_data = p->data;
- (void)handle_mmio();
- break;
- case HVMIO_handle_pio_awaiting_completion:
- if ( vio->io_size == 4 ) /* Needs zero extension. */
- guest_cpu_user_regs()->rax = (uint32_t)p->data;
- else
- memcpy(&guest_cpu_user_regs()->rax, &p->data, vio->io_size);
- break;
default:
break;
}
- if ( p->state == STATE_IOREQ_NONE )
- {
- msix_write_completion(curr);
- vcpu_end_shutdown_deferral(curr);
- }
+ msix_write_completion(curr);
+ vcpu_end_shutdown_deferral(curr);
}
Doesn't this mean that these can now potentially be called more than
once for a single instruction (due to retries)? That's presumably not a
problem for vcpu_end_shutdown_deferral(), but I'm uncertain about
msix_write_completion().
Post by Paul Durrant
@@ -508,8 +496,36 @@ void hvm_do_resume(struct vcpu *v)
}
}
- if ( vio->mmio_retry )
+ io_completion = vio->io_completion;
+ vio->io_completion = HVMIO_no_completion;
+
+ switch ( io_completion )
+ {
+ case HVMIO_no_completion:
+ break;
+ case HVMIO_mmio_completion:
(void)handle_mmio();
+ break;
+ case HVMIO_pio_completion:
+ if ( vio->io_size == 4 ) /* Needs zero extension. */
+ guest_cpu_user_regs()->rax = (uint32_t)vio->io_data;
+ else
+ memcpy(&guest_cpu_user_regs()->rax, &vio->io_data, vio->io_size);
+ vio->io_state = HVMIO_none;
+ break;
+ case HVMIO_realmode_completion:
+ {
+ struct hvm_emulate_ctxt ctxt;
+
+ hvm_emulate_prepare(&ctxt, guest_cpu_user_regs());
+ vmx_realmode_emulate_one(&ctxt);
+ hvm_emulate_writeback(&ctxt);
+
+ break;
+ }
+ default:
+ BUG();
+ }
I'd prefer such to be ASSERT_UNREACHABLE(), as I don't see
getting here to be a reason to crash the hypervisor (and the
guest would likely crash in such a case anyway). Or maybe
fold in the first case and make this
ASSERT(io_completion == HVMIO_no_completion).
Post by Paul Durrant
@@ -46,9 +51,10 @@ struct hvm_vcpu_asid {
struct hvm_vcpu_io {
/* I/O request in flight to device model. */
- enum hvm_io_state io_state;
- unsigned long io_data;
- int io_size;
+ enum hvm_io_state io_state;
+ unsigned long io_data;
+ int io_size;
Unless this can validly be negative, please make this unsigned int
if you already touch it.

Jan
Paul Durrant
2015-06-25 15:59:21 UTC
Permalink
-----Original Message-----
Sent: 25 June 2015 10:41
To: Paul Durrant
Subject: Re: [PATCH v4 12/17] x86/hvm: split I/O completion handling from
state model
Post by Paul Durrant
@@ -428,26 +429,12 @@ static void hvm_io_assist(ioreq_t *p)
vio->io_state = HVMIO_completed;
vio->io_data = p->data;
break;
- vio->io_state = HVMIO_completed;
- vio->io_data = p->data;
- (void)handle_mmio();
- break;
- if ( vio->io_size == 4 ) /* Needs zero extension. */
- guest_cpu_user_regs()->rax = (uint32_t)p->data;
- else
- memcpy(&guest_cpu_user_regs()->rax, &p->data, vio->io_size);
- break;
break;
}
- if ( p->state == STATE_IOREQ_NONE )
- {
- msix_write_completion(curr);
- vcpu_end_shutdown_deferral(curr);
- }
+ msix_write_completion(curr);
+ vcpu_end_shutdown_deferral(curr);
}
Doesn't this mean that these can now potentially be called more than
once for a single instruction (due to retries)? That's presumably not a
problem for vcpu_end_shutdown_deferral(), but I'm uncertain about
msix_write_completion().
Actually, I think this is the wrong place for this now; it should have been moved in, or prior to, patch #11. I found the need for a completion function for stdvga, so I'll use the same mechanism for msix_write_completion(). As you say, vcpu_end_shutdown_deferral() can stay.
Post by Paul Durrant
@@ -508,8 +496,36 @@ void hvm_do_resume(struct vcpu *v)
}
}
- if ( vio->mmio_retry )
+ io_completion = vio->io_completion;
+ vio->io_completion = HVMIO_no_completion;
+
+ switch ( io_completion )
+ {
+ case HVMIO_no_completion:
+ break;
+ case HVMIO_mmio_completion:
(void)handle_mmio();
+ break;
+ case HVMIO_pio_completion:
+ if ( vio->io_size == 4 ) /* Needs zero extension. */
+ guest_cpu_user_regs()->rax = (uint32_t)vio->io_data;
+ else
+ memcpy(&guest_cpu_user_regs()->rax, &vio->io_data, vio->io_size);
+ vio->io_state = HVMIO_none;
+ break;
+ case HVMIO_realmode_completion:
+ {
+ struct hvm_emulate_ctxt ctxt;
+
+ hvm_emulate_prepare(&ctxt, guest_cpu_user_regs());
+ vmx_realmode_emulate_one(&ctxt);
+ hvm_emulate_writeback(&ctxt);
+
+ break;
+ }
+ default:
+ BUG();
I'd prefer such to be ASSERT_UNREACHABLE(), as I don't see
getting here to be a reason to crash the hypervisor (and the
guest would likely crash in such a case anyway). Or maybe
fold in the first case and make this
ASSERT(io_completion == HVMIO_no_completion).
Ok.
Post by Paul Durrant
@@ -46,9 +51,10 @@ struct hvm_vcpu_asid {
struct hvm_vcpu_io {
/* I/O request in flight to device model. */
- enum hvm_io_state io_state;
- unsigned long io_data;
- int io_size;
+ enum hvm_io_state io_state;
+ unsigned long io_data;
+ int io_size;
Unless this can validly be negative, please make this unsigned int
if you already touch it.
Sure.

Paul
Jan
Paul Durrant
2015-06-24 11:24:47 UTC
Permalink
Use an ioreq_t rather than open coded state, size, dir and data fields
in struct hvm_vcpu_io. This also allows PIO completion to be handled
similarly to MMIO completion by re-issuing the handle_pio() call.

Signed-off-by: Paul Durrant <***@citrix.com>
Cc: Keir Fraser <***@xen.org>
Cc: Jan Beulich <***@suse.com>
Cc: Andrew Cooper <***@citrix.com>
---
xen/arch/x86/hvm/emulate.c | 35 +++++++++++++++++++++--------------
xen/arch/x86/hvm/hvm.c | 15 ++++++++-------
xen/arch/x86/hvm/svm/nestedsvm.c | 2 +-
xen/arch/x86/hvm/vmx/realmode.c | 4 ++--
xen/include/asm-x86/hvm/vcpu.h | 12 ++++--------
5 files changed, 36 insertions(+), 32 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 6f538bf..6c50ef5 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -92,6 +92,7 @@ static int hvmemul_do_io(
.df = df,
.data = data,
.data_is_ptr = data_is_addr, /* ioreq_t field name is misleading */
+ .state = STATE_IOREQ_READY,
};
void *p_data = (void *)data;
int rc;
@@ -129,12 +130,24 @@ static int hvmemul_do_io(
}
}

- switch ( vio->io_state )
+ switch ( vio->io_req.state )
{
case STATE_IOREQ_NONE:
break;
case STATE_IORESP_READY:
- vio->io_state = STATE_IOREQ_NONE;
+ vio->io_req.state = STATE_IOREQ_NONE;
+ p = vio->io_req;
+
+ /* Verify the emulation request has been correctly re-issued */
+ if ( (p.type != is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO) ||
+ (p.addr != addr) ||
+ (p.size != size) ||
+ (p.count != reps) ||
+ (p.dir != dir) ||
+ (p.df != df) ||
+ (p.data_is_ptr != data_is_addr) )
+ domain_crash(curr->domain);
+
if ( data_is_addr || dir == IOREQ_WRITE )
return X86EMUL_UNHANDLEABLE;
goto finish_access;
@@ -142,11 +155,6 @@ static int hvmemul_do_io(
return X86EMUL_UNHANDLEABLE;
}

- vio->io_state = STATE_IOREQ_READY;
- vio->io_size = size;
- vio->io_dir = dir;
- vio->io_data_is_addr = data_is_addr;
-
if ( dir == IOREQ_WRITE )
{
if ( !data_is_addr )
@@ -155,13 +163,14 @@ static int hvmemul_do_io(
hvmtrace_io_assist(&p);
}

+ vio->io_req = p;
+
rc = hvm_io_intercept(&p);

switch ( rc )
{
case X86EMUL_OKAY:
- vio->io_data = p.data;
- vio->io_state = STATE_IOREQ_NONE;
+ vio->io_req.state = STATE_IOREQ_NONE;
break;
case X86EMUL_UNHANDLEABLE:
{
@@ -172,15 +181,13 @@ static int hvmemul_do_io(
if ( !s )
{
rc = hvm_process_io_intercept(&null_handler, &p);
- if ( rc == X86EMUL_OKAY )
- vio->io_data = p.data;
- vio->io_state = STATE_IOREQ_NONE;
+ vio->io_req.state = STATE_IOREQ_NONE;
}
else
{
rc = hvm_send_assist_req(s, &p);
if ( rc != X86EMUL_RETRY )
- vio->io_state = STATE_IOREQ_NONE;
+ vio->io_req.state = STATE_IOREQ_NONE;
else if ( data_is_addr || dir == IOREQ_WRITE )
rc = X86EMUL_OKAY;
}
@@ -199,7 +206,7 @@ static int hvmemul_do_io(
hvmtrace_io_assist(&p);

if ( !data_is_addr )
- memcpy(p_data, &vio->io_data, size);
+ memcpy(p_data, &p.data, size);
}

if ( is_mmio && !data_is_addr )
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 7411287..8abf29b 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -421,11 +421,11 @@ static void hvm_io_assist(ioreq_t *p)

if ( HVMIO_NEED_COMPLETION(vio) )
{
- vio->io_state = STATE_IORESP_READY;
- vio->io_data = p->data;
+ vio->io_req.state = STATE_IORESP_READY;
+ vio->io_req.data = p->data;
}
else
- vio->io_state = STATE_IOREQ_NONE;
+ vio->io_req.state = STATE_IOREQ_NONE;

msix_write_completion(curr);
vcpu_end_shutdown_deferral(curr);
@@ -501,11 +501,12 @@ void hvm_do_resume(struct vcpu *v)
(void)handle_mmio();
break;
case HVMIO_pio_completion:
- if ( vio->io_size == 4 ) /* Needs zero extension. */
- guest_cpu_user_regs()->rax = (uint32_t)vio->io_data;
+ if ( vio->io_req.size == 4 ) /* Needs zero extension. */
+ guest_cpu_user_regs()->rax = (uint32_t)vio->io_req.data;
else
- memcpy(&guest_cpu_user_regs()->rax, &vio->io_data, vio->io_size);
- vio->io_state = STATE_IOREQ_NONE;
+ memcpy(&guest_cpu_user_regs()->rax, &vio->io_req.data,
+ vio->io_req.size);
+ vio->io_req.state = STATE_IOREQ_NONE;
break;
case HVMIO_realmode_completion:
{
diff --git a/xen/arch/x86/hvm/svm/nestedsvm.c b/xen/arch/x86/hvm/svm/nestedsvm.c
index 8b165c6..78667a2 100644
--- a/xen/arch/x86/hvm/svm/nestedsvm.c
+++ b/xen/arch/x86/hvm/svm/nestedsvm.c
@@ -1231,7 +1231,7 @@ enum hvm_intblk nsvm_intr_blocked(struct vcpu *v)
* Delay the injection because this would result in delivering
* an interrupt *within* the execution of an instruction.
*/
- if ( v->arch.hvm_vcpu.hvm_io.io_state != STATE_IOREQ_NONE )
+ if ( v->arch.hvm_vcpu.hvm_io.io_req.state != STATE_IOREQ_NONE )
return hvm_intblk_shadow;

if ( !nv->nv_vmexit_pending && n2vmcb->exitintinfo.bytes != 0 ) {
diff --git a/xen/arch/x86/hvm/vmx/realmode.c b/xen/arch/x86/hvm/vmx/realmode.c
index 4135ad4..0f3124d 100644
--- a/xen/arch/x86/hvm/vmx/realmode.c
+++ b/xen/arch/x86/hvm/vmx/realmode.c
@@ -205,7 +205,7 @@ void vmx_realmode(struct cpu_user_regs *regs)

vmx_realmode_emulate_one(&hvmemul_ctxt);

- if ( vio->io_state != STATE_IOREQ_NONE || vio->mmio_retry )
+ if ( vio->io_req.state != STATE_IOREQ_NONE || vio->mmio_retry )
break;

/* Stop emulating unless our segment state is not safe */
@@ -219,7 +219,7 @@ void vmx_realmode(struct cpu_user_regs *regs)
}

/* Need to emulate next time if we've started an IO operation */
- if ( vio->io_state != STATE_IOREQ_NONE )
+ if ( vio->io_req.state != STATE_IOREQ_NONE )
curr->arch.hvm_vmx.vmx_emulate = 1;

if ( !curr->arch.hvm_vmx.vmx_emulate && !curr->arch.hvm_vmx.vmx_realmode )
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index f797518..7338638 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -44,17 +44,13 @@ struct hvm_vcpu_asid {

struct hvm_vcpu_io {
/* I/O request in flight to device model. */
- uint8_t io_state;
- unsigned long io_data;
- int io_size;
+ ioreq_t io_req;
enum hvm_io_completion io_completion;
- uint8_t io_dir;
- uint8_t io_data_is_addr;

#define HVMIO_NEED_COMPLETION(_vio) \
- ( ((_vio)->io_state == STATE_IOREQ_READY) && \
- !(_vio)->io_data_is_addr && \
- ((_vio)->io_dir == IOREQ_READ) )
+ ( ((_vio)->io_req.state == STATE_IOREQ_READY) && \
+ !(_vio)->io_req.data_is_ptr && \
+ ((_vio)->io_req.dir == IOREQ_READ) )

/*
* HVM emulation:
--
1.7.10.4
Jan Beulich
2015-06-25 09:51:10 UTC
Permalink
Post by Paul Durrant
Use an ioreq_t rather than open coded state, size, dir and data fields
in struct hvm_vcpu_io. This also allows PIO completion to be handled
similarly to MMIO completion by re-issuing the handle_pio() call.
Aren't you referring to ...
Post by Paul Durrant
@@ -501,11 +501,12 @@ void hvm_do_resume(struct vcpu *v)
(void)handle_mmio();
... this one as the reference?
Post by Paul Durrant
break;
- if ( vio->io_size == 4 ) /* Needs zero extension. */
- guest_cpu_user_regs()->rax = (uint32_t)vio->io_data;
+ if ( vio->io_req.size == 4 ) /* Needs zero extension. */
+ guest_cpu_user_regs()->rax = (uint32_t)vio->io_req.data;
else
- memcpy(&guest_cpu_user_regs()->rax, &vio->io_data, vio->io_size);
- vio->io_state = STATE_IOREQ_NONE;
+ memcpy(&guest_cpu_user_regs()->rax, &vio->io_req.data,
+ vio->io_req.size);
+ vio->io_req.state = STATE_IOREQ_NONE;
break;
I.e. shouldn't I expect to see a handle_pio() call here? Or where
else is this new handle_pio() call going to show up?

Jan
Paul Durrant
2015-06-25 10:17:41 UTC
Permalink
-----Original Message-----
Sent: 25 June 2015 10:51
To: Paul Durrant
Subject: Re: [PATCH v4 15/17] x86/hvm: use ioreq_t to track in-flight state
Post by Paul Durrant
Use an ioreq_t rather than open coded state, size, dir and data fields
in struct hvm_vcpu_io. This also allows PIO completion to be handled
similarly to MMIO completion by re-issuing the handle_pio() call.
Aren't you referring to ...
Post by Paul Durrant
@@ -501,11 +501,12 @@ void hvm_do_resume(struct vcpu *v)
(void)handle_mmio();
... this one as the reference?
Post by Paul Durrant
break;
- if ( vio->io_size == 4 ) /* Needs zero extension. */
- guest_cpu_user_regs()->rax = (uint32_t)vio->io_data;
+ if ( vio->io_req.size == 4 ) /* Needs zero extension. */
+ guest_cpu_user_regs()->rax = (uint32_t)vio->io_req.data;
else
- memcpy(&guest_cpu_user_regs()->rax, &vio->io_data, vio->io_size);
- vio->io_state = STATE_IOREQ_NONE;
+ memcpy(&guest_cpu_user_regs()->rax, &vio->io_req.data,
+ vio->io_req.size);
+ vio->io_req.state = STATE_IOREQ_NONE;
break;
I.e. shouldn't I expect to see a handle_pio() call here? Or where
else is this new handle_pio() call going to show up?
Damn. Looks like a hunk got dropped in a rebase somewhere. I need to fix this.

Paul
Jan
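For reference, the dropped hunk would presumably re-issue handle_pio() from the PIO completion path along these lines (a sketch only; it assumes the ioreq_t fields introduced in this patch, with io_req.addr carrying the port number):

    case HVMIO_pio_completion:
        (void)handle_pio(vio->io_req.addr, vio->io_req.size,
                         vio->io_req.dir);
        break;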
Paul Durrant
2015-06-24 11:24:43 UTC
Permalink
By removing the calls in hvmemul_do_io() (which is replaced by a single
assignment) and hvm_complete_assist_request() (which is replaced by a
call to process_portio_intercept() with a suitable set of ops) then
hvm_io_assist() can be moved into hvm.c and made static (and hence be a
candidate for inlining).

This patch also fixes the I/O state test at the end of hvm_io_assist()
to check the correct value. Since the ioreq server patch series was
integrated the current ioreq state is no longer an indicator of in-flight
I/O state, since an I/O scheduled by re-emulation may be targeted at a
different ioreq server.

Signed-off-by: Paul Durrant <***@citrix.com>
Cc: Keir Fraser <***@xen.org>
Cc: Jan Beulich <***@suse.com>
Cc: Andrew Cooper <***@citrix.com>
---
xen/arch/x86/hvm/emulate.c | 34 +++++++++++++++++---
xen/arch/x86/hvm/hvm.c | 70 +++++++++++++++++++++++-------------------
xen/arch/x86/hvm/intercept.c | 4 +--
xen/arch/x86/hvm/io.c | 39 -----------------------
xen/include/asm-x86/hvm/io.h | 3 +-
5 files changed, 73 insertions(+), 77 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index eefe860..111987c 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -51,6 +51,32 @@ static void hvmtrace_io_assist(ioreq_t *p)
trace_var(event, 0/*!cycles*/, size, buffer);
}

+static int null_read(struct hvm_io_handler *io_handler,
+ uint64_t addr,
+ uint64_t size,
+ uint64_t *data)
+{
+ *data = ~0ul;
+ return X86EMUL_OKAY;
+}
+
+static int null_write(struct hvm_io_handler *handler,
+ uint64_t addr,
+ uint64_t size,
+ uint64_t data)
+{
+ return X86EMUL_OKAY;
+}
+
+static const struct hvm_io_ops null_ops = {
+ .read = null_read,
+ .write = null_write
+};
+
+static struct hvm_io_handler null_handler = {
+ .ops = &null_ops
+};
+
static int hvmemul_do_io(
bool_t is_mmio, paddr_t addr, unsigned long reps, unsigned int size,
uint8_t dir, bool_t df, bool_t data_is_addr, uintptr_t data)
@@ -140,8 +166,7 @@ static int hvmemul_do_io(
switch ( rc )
{
case X86EMUL_OKAY:
- p.state = STATE_IORESP_READY;
- hvm_io_assist(&p);
+ vio->io_data = p.data;
vio->io_state = HVMIO_none;
break;
case X86EMUL_UNHANDLEABLE:
@@ -152,8 +177,9 @@ static int hvmemul_do_io(
/* If there is no suitable backing DM, just ignore accesses */
if ( !s )
{
- hvm_complete_assist_req(&p);
- rc = X86EMUL_OKAY;
+ rc = hvm_process_io_intercept(&null_handler, &p);
+ if ( rc == X86EMUL_OKAY )
+ vio->io_data = p.data;
vio->io_state = HVMIO_none;
}
else
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 626c431..3365abb 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -411,6 +411,45 @@ bool_t hvm_io_pending(struct vcpu *v)
return 0;
}

+static void hvm_io_assist(ioreq_t *p)
+{
+ struct vcpu *curr = current;
+ struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
+ enum hvm_io_state io_state;
+
+ p->state = STATE_IOREQ_NONE;
+
+ io_state = vio->io_state;
+ vio->io_state = HVMIO_none;
+
+ switch ( io_state )
+ {
+ case HVMIO_awaiting_completion:
+ vio->io_state = HVMIO_completed;
+ vio->io_data = p->data;
+ break;
+ case HVMIO_handle_mmio_awaiting_completion:
+ vio->io_state = HVMIO_completed;
+ vio->io_data = p->data;
+ (void)handle_mmio();
+ break;
+ case HVMIO_handle_pio_awaiting_completion:
+ if ( vio->io_size == 4 ) /* Needs zero extension. */
+ guest_cpu_user_regs()->rax = (uint32_t)p->data;
+ else
+ memcpy(&guest_cpu_user_regs()->rax, &p->data, vio->io_size);
+ break;
+ default:
+ break;
+ }
+
+ if ( p->state == STATE_IOREQ_NONE )
+ {
+ msix_write_completion(curr);
+ vcpu_end_shutdown_deferral(curr);
+ }
+}
+
static bool_t hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
{
/* NB. Optimised for common case (p->state == STATE_IOREQ_NONE). */
@@ -2667,37 +2706,6 @@ int hvm_send_assist_req(struct hvm_ioreq_server *s, ioreq_t *proto_p)
return X86EMUL_UNHANDLEABLE;
}

-void hvm_complete_assist_req(ioreq_t *p)
-{
- switch ( p->type )
- {
- case IOREQ_TYPE_PCI_CONFIG:
- ASSERT_UNREACHABLE();
- break;
- case IOREQ_TYPE_COPY:
- case IOREQ_TYPE_PIO:
- if ( p->dir == IOREQ_READ )
- {
- if ( !p->data_is_ptr )
- p->data = ~0ul;
- else
- {
- int i, step = p->df ? -p->size : p->size;
- uint32_t data = ~0;
-
- for ( i = 0; i < p->count; i++ )
- hvm_copy_to_guest_phys(p->data + step * i, &data,
- p->size);
- }
- }
- /* FALLTHRU */
- default:
- p->state = STATE_IORESP_READY;
- hvm_io_assist(p);
- break;
- }
-}
-
void hvm_broadcast_assist_req(ioreq_t *p)
{
struct domain *d = current->domain;
diff --git a/xen/arch/x86/hvm/intercept.c b/xen/arch/x86/hvm/intercept.c
index 02d7408..d6f1966 100644
--- a/xen/arch/x86/hvm/intercept.c
+++ b/xen/arch/x86/hvm/intercept.c
@@ -112,8 +112,8 @@ static const struct hvm_io_ops portio_ops = {
.write = hvm_portio_write
};

-static int hvm_process_io_intercept(struct hvm_io_handler *handler,
- ioreq_t *p)
+int hvm_process_io_intercept(struct hvm_io_handler *handler,
+ ioreq_t *p)
{
const struct hvm_io_ops *ops = handler->ops;
int rc = X86EMUL_OKAY, i, step = p->df ? -p->size : p->size;
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index 51ef19a..61df6dd 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -169,45 +169,6 @@ int handle_pio(uint16_t port, unsigned int size, int dir)
return 1;
}

-void hvm_io_assist(ioreq_t *p)
-{
- struct vcpu *curr = current;
- struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
- enum hvm_io_state io_state;
-
- p->state = STATE_IOREQ_NONE;
-
- io_state = vio->io_state;
- vio->io_state = HVMIO_none;
-
- switch ( io_state )
- {
- case HVMIO_awaiting_completion:
- vio->io_state = HVMIO_completed;
- vio->io_data = p->data;
- break;
- case HVMIO_handle_mmio_awaiting_completion:
- vio->io_state = HVMIO_completed;
- vio->io_data = p->data;
- (void)handle_mmio();
- break;
- case HVMIO_handle_pio_awaiting_completion:
- if ( vio->io_size == 4 ) /* Needs zero extension. */
- guest_cpu_user_regs()->rax = (uint32_t)p->data;
- else
- memcpy(&guest_cpu_user_regs()->rax, &p->data, vio->io_size);
- break;
- default:
- break;
- }
-
- if ( p->state == STATE_IOREQ_NONE )
- {
- msix_write_completion(curr);
- vcpu_end_shutdown_deferral(curr);
- }
-}
-
static bool_t dpci_portio_accept(struct hvm_io_handler *handler,
uint64_t addr,
uint64_t size)
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index 00b4a89..f913c3b 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -82,6 +82,8 @@ struct hvm_io_ops {
hvm_io_write_t write;
};

+int hvm_process_io_intercept(struct hvm_io_handler *handler,
+ ioreq_t *p);
int hvm_io_intercept(ioreq_t *p);

struct hvm_io_handler *hvm_next_io_handler(struct domain *d);
@@ -103,7 +105,6 @@ int handle_mmio_with_translation(unsigned long gva, unsigned long gpfn,
struct npfec);
int handle_pio(uint16_t port, unsigned int size, int dir);
void hvm_interrupt_post(struct vcpu *v, int vector, int type);
-void hvm_io_assist(ioreq_t *p);
void hvm_dpci_eoi(struct domain *d, unsigned int guest_irq,
const union vioapic_redir_entry *ent);
void msix_write_completion(struct vcpu *);
--
1.7.10.4
Jan Beulich
2015-06-24 15:36:32 UTC
Permalink
Post by Paul Durrant
By removing the calls in hvmemul_do_io() (which is replaced by a single
assignment) and hvm_complete_assist_request() (which is replaced by a
call to process_portio_intercept() with a suitable set of ops) then
With this saying hvm_process_io_intercept() ...
Post by Paul Durrant
hvm_io_assist() can be moved into hvm.c and made static (and hence be a
candidate for inlining).
This patch also fixes the I/O state test at the end of hvm_io_assist()
to check the correct value. Since the ioreq server patch series was
integrated the current ioreq state is no longer an indicator of in-flight
I/O state, since an I/O scheduled by re-emulation may be targeted at a
different ioreq server.
Acked-by: Jan Beulich <***@suse.com>
Paul Durrant
2015-06-24 15:50:53 UTC
Permalink
-----Original Message-----
Sent: 24 June 2015 16:37
To: Paul Durrant
Subject: Re: [Xen-devel] [PATCH v4 11/17] x86/hvm: only call hvm_io_assist()
from hvm_wait_for_io()
Post by Paul Durrant
By removing the calls in hvmemul_do_io() (which is replaced by a single
assignment) and hvm_complete_assist_request() (which is replaced by a
call to process_portio_intercept() with a suitable set of ops) then
With this saying hvm_process_io_intercept() ...
Ah, good spot :-)
Post by Paul Durrant
hvm_io_assist() can be moved into hvm.c and made static (and hence be a
candidate for inlining).
This patch also fixes the I/O state test at the end of hvm_io_assist()
to check the correct value. Since the ioreq server patch series was
integrated the current ioreq state is no longer an indicator of in-flight
I/O state, since an I/O scheduled by re-emulation may be targeted at a
different ioreq server.
Thanks.

Paul
Paul Durrant
2015-06-24 11:24:45 UTC
Permalink
By removing the HVMIO_dispatched state and making all pending emulations
(i.e. all those not handled by the hypervisor) use HVMIO_awaiting_completion,
various code-paths can be simplified.

Signed-off-by: Paul Durrant <***@citrix.com>
Cc: Keir Fraser <***@xen.org>
Cc: Jan Beulich <***@suse.com>
Cc: Andrew Cooper <***@citrix.com>
---
xen/arch/x86/hvm/emulate.c | 12 +++---------
xen/arch/x86/hvm/hvm.c | 12 +++---------
xen/arch/x86/hvm/io.c | 12 ++++--------
xen/arch/x86/hvm/vmx/realmode.c | 2 +-
xen/include/asm-x86/hvm/vcpu.h | 8 +++++++-
5 files changed, 18 insertions(+), 28 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 111987c..c10adad 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -138,20 +138,14 @@ static int hvmemul_do_io(
if ( data_is_addr || dir == IOREQ_WRITE )
return X86EMUL_UNHANDLEABLE;
goto finish_access;
- case HVMIO_dispatched:
- /* May have to wait for previous cycle of a multi-write to complete. */
- if ( is_mmio && !data_is_addr && (dir == IOREQ_WRITE) &&
- (addr == (vio->mmio_large_write_pa +
- vio->mmio_large_write_bytes)) )
- return X86EMUL_RETRY;
- /* fallthrough */
default:
return X86EMUL_UNHANDLEABLE;
}

- vio->io_state = (data_is_addr || dir == IOREQ_WRITE) ?
- HVMIO_dispatched : HVMIO_awaiting_completion;
+ vio->io_state = HVMIO_awaiting_completion;
vio->io_size = size;
+ vio->io_dir = dir;
+ vio->io_data_is_addr = data_is_addr;

if ( dir == IOREQ_WRITE )
{
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 39f40ad..4458fa4 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -416,22 +416,16 @@ static void hvm_io_assist(ioreq_t *p)
{
struct vcpu *curr = current;
struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
- enum hvm_io_state io_state;

p->state = STATE_IOREQ_NONE;

- io_state = vio->io_state;
- vio->io_state = HVMIO_none;
-
- switch ( io_state )
+ if ( HVMIO_NEED_COMPLETION(vio) )
{
- case HVMIO_awaiting_completion:
vio->io_state = HVMIO_completed;
vio->io_data = p->data;
- break;
- default:
- break;
}
+ else
+ vio->io_state = HVMIO_none;

msix_write_completion(curr);
vcpu_end_shutdown_deferral(curr);
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index 27150e9..12fc923 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -90,9 +90,7 @@ int handle_mmio(void)

rc = hvm_emulate_one(&ctxt);

- if ( rc != X86EMUL_RETRY )
- vio->io_state = HVMIO_none;
- if ( vio->io_state == HVMIO_awaiting_completion || vio->mmio_retry )
+ if ( HVMIO_NEED_COMPLETION(vio) || vio->mmio_retry )
vio->io_completion = HVMIO_mmio_completion;
else
vio->mmio_access = (struct npfec){};
@@ -142,6 +140,9 @@ int handle_pio(uint16_t port, unsigned int size, int dir)

rc = hvmemul_do_pio_buffer(port, size, dir, &data);

+ if ( HVMIO_NEED_COMPLETION(vio) )
+ vio->io_completion = HVMIO_pio_completion;
+
switch ( rc )
{
case X86EMUL_OKAY:
@@ -154,11 +155,6 @@ int handle_pio(uint16_t port, unsigned int size, int dir)
}
break;
case X86EMUL_RETRY:
- if ( vio->io_state != HVMIO_awaiting_completion )
- return 0;
- /* Completion in hvm_io_assist() with no re-emulation required. */
- ASSERT(dir == IOREQ_READ);
- vio->io_completion = HVMIO_pio_completion;
break;
default:
gdprintk(XENLOG_ERR, "Weird HVM ioemulation status %d.\n", rc);
diff --git a/xen/arch/x86/hvm/vmx/realmode.c b/xen/arch/x86/hvm/vmx/realmode.c
index 76ff9a5..5e56a1f 100644
--- a/xen/arch/x86/hvm/vmx/realmode.c
+++ b/xen/arch/x86/hvm/vmx/realmode.c
@@ -111,7 +111,7 @@ void vmx_realmode_emulate_one(struct hvm_emulate_ctxt *hvmemul_ctxt)

rc = hvm_emulate_one(hvmemul_ctxt);

- if ( vio->io_state == HVMIO_awaiting_completion || vio->mmio_retry )
+ if ( HVMIO_NEED_COMPLETION(vio) || vio->mmio_retry )
vio->io_completion = HVMIO_realmode_completion;

if ( rc == X86EMUL_UNHANDLEABLE )
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index c4d96a8..2830057 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -32,7 +32,6 @@

enum hvm_io_state {
HVMIO_none = 0,
- HVMIO_dispatched,
HVMIO_awaiting_completion,
HVMIO_completed
};
@@ -55,6 +54,13 @@ struct hvm_vcpu_io {
unsigned long io_data;
int io_size;
enum hvm_io_completion io_completion;
+ uint8_t io_dir;
+ uint8_t io_data_is_addr;
+
+#define HVMIO_NEED_COMPLETION(_vio) \
+ ( ((_vio)->io_state == HVMIO_awaiting_completion) && \
+ !(_vio)->io_data_is_addr && \
+ ((_vio)->io_dir == IOREQ_READ) )

/*
* HVM emulation:
--
1.7.10.4
Jan Beulich
2015-06-24 15:52:52 UTC
Permalink
Post by Paul Durrant
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -138,20 +138,14 @@ static int hvmemul_do_io(
if ( data_is_addr || dir == IOREQ_WRITE )
return X86EMUL_UNHANDLEABLE;
goto finish_access;
- case HVMIO_dispatched:
- /* May have to wait for previous cycle of a multi-write to complete. */
- if ( is_mmio && !data_is_addr && (dir == IOREQ_WRITE) &&
- (addr == (vio->mmio_large_write_pa +
- vio->mmio_large_write_bytes)) )
- return X86EMUL_RETRY;
- /* fallthrough */
The description is plausible, and the rest of the patch looks plausible
too, but I can't seem to follow why the above can go away without
any replacement (or what the replacement would be). Could you
clarify that, please?

Jan
Paul Durrant
2015-06-24 16:00:49 UTC
Permalink
-----Original Message-----
Sent: 24 June 2015 16:53
To: Paul Durrant
Subject: Re: [PATCH v4 13/17] x86/hvm: remove HVMIO_dispatched I/O
state
Post by Paul Durrant
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -138,20 +138,14 @@ static int hvmemul_do_io(
if ( data_is_addr || dir == IOREQ_WRITE )
return X86EMUL_UNHANDLEABLE;
goto finish_access;
- case HVMIO_dispatched:
- /* May have to wait for previous cycle of a multi-write to complete. */
- if ( is_mmio && !data_is_addr && (dir == IOREQ_WRITE) &&
- (addr == (vio->mmio_large_write_pa +
- vio->mmio_large_write_bytes)) )
- return X86EMUL_RETRY;
- /* fallthrough */
The description is plausible, and the rest of the patch looks plausible
too, but I can't seem to follow why the above can go away without
any replacement (or what the replacement would be). Could you
clarify that, please?
The reason is that the dispatched state was only ever used for writes (or data_is_addr, which is disallowed in this case), and the code below this forces writes to complete with X86EMUL_OKAY, meaning that emulation will never be retried (it's only retried directly from hvm_io_assist() when the state was HVMIO_handle_mmio_awaiting_completion). Thus I believe this was either always dead code, or it's been dead for a long time.

Do you want me to stick words to that effect in the comment?

Paul
Jan
Jan Beulich
2015-06-24 16:12:26 UTC
Permalink
Post by Paul Durrant
Sent: 24 June 2015 16:53
Post by Paul Durrant
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -138,20 +138,14 @@ static int hvmemul_do_io(
if ( data_is_addr || dir == IOREQ_WRITE )
return X86EMUL_UNHANDLEABLE;
goto finish_access;
- case HVMIO_dispatched:
- /* May have to wait for previous cycle of a multi-write to complete. */
- if ( is_mmio && !data_is_addr && (dir == IOREQ_WRITE) &&
- (addr == (vio->mmio_large_write_pa +
- vio->mmio_large_write_bytes)) )
- return X86EMUL_RETRY;
- /* fallthrough */
The description is plausible, and the rest of the patch looks plausible
too, but I can't seem to follow why the above can go away without
any replacement (or what the replacement would be). Could you
clarify that, please?
The reason is that the dispatched state was only ever used for writes (or
data_is_addr, which is disallowed in this case), and the code below this
forces writes to be completed with X86EMUL_OKAY meaning that emulation will
never be retried (because it's not retried directly from hvm_io_assist()
unless state was HVMIO_mmio_awaiting_completion). Thus I believe this was
either always dead code, or it's been dead for a long time.
Looks like you're right, albeit I can't see how what the comment says
would be achieved elsewhere.
Post by Paul Durrant
Do you want me to stick words to that effect in the comment?
That would be helpful I think, namely in case we later learn there
was some use for that code despite it looking to be dead. (Did you
btw do some basic verification that it's dead, by sticking a printk()
or ASSERT() in there?)

Jan
Paul Durrant
2015-06-24 17:00:51 UTC
Permalink
-----Original Message-----
Sent: 24 June 2015 17:12
To: Paul Durrant
Subject: Re: [Xen-devel] [PATCH v4 13/17] x86/hvm: remove
HVMIO_dispatched I/O state
Post by Paul Durrant
Sent: 24 June 2015 16:53
Post by Paul Durrant
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -138,20 +138,14 @@ static int hvmemul_do_io(
if ( data_is_addr || dir == IOREQ_WRITE )
return X86EMUL_UNHANDLEABLE;
goto finish_access;
- case HVMIO_dispatched:
- /* May have to wait for previous cycle of a multi-write to complete. */
- if ( is_mmio && !data_is_addr && (dir == IOREQ_WRITE) &&
- (addr == (vio->mmio_large_write_pa +
- vio->mmio_large_write_bytes)) )
- return X86EMUL_RETRY;
- /* fallthrough */
The description is plausible, and the rest of the patch looks plausible
too, but I can't seem to follow why the above can go away without
any replacement (or what the replacement would be). Could you
clarify that, please?
The reason is that the dispatched state was only ever used for writes (or
data_is_addr, which is disallowed in this case), and the code below this
forces writes to be completed with X86EMUL_OKAY meaning that
emulation will
Post by Paul Durrant
never be retried (because it's not retried directly from hvm_io_assist()
unless state was HVMIO_mmio_awaiting_completion). Thus I believe this
was
Post by Paul Durrant
either always dead code, or it's been dead for a long time.
Looks like you're right, albeit I can't see how what the comment says
would be achieved elsewhere.
Post by Paul Durrant
Do you want me to stick words to that effect in the comment?
That would be helpful I think, namely in case we later learn there
was some use for that code despite it looking to be dead. (Did you
btw do some basic verification that it's dead, by sticking a printk()
or ASSERT() in there?)
While I was testing I stuck a BUG_ON() in there to prove to myself it really really was never hit. I never managed to provoke it (using a variety of Windows OS and both QEMUs).

Paul
Jan
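One plausible placement for the temporary check described above (instrumentation only, not part of the series; the exact placement is an assumption):

    case HVMIO_dispatched:
        /* Prove the multi-write retry path really is dead. */
        BUG_ON(is_mmio && !data_is_addr && (dir == IOREQ_WRITE) &&
               (addr == (vio->mmio_large_write_pa +
                         vio->mmio_large_write_bytes)));
        /* fallthrough */
    default:
        return X86EMUL_UNHANDLEABLE;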
Andrew Cooper
2015-06-25 12:29:49 UTC
Permalink
Post by Paul Durrant
+#define HVMIO_NEED_COMPLETION(_vio) \
+ ( ((_vio)->io_state == HVMIO_awaiting_completion) && \
+ !(_vio)->io_data_is_addr && \
+ ((_vio)->io_dir == IOREQ_READ) )
Please can this be a static inline which takes a const pointer.

~Andrew
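Something along these lines would satisfy that request while keeping the macro's logic unchanged (name hypothetical):

    static inline bool_t hvm_io_need_completion(const struct hvm_vcpu_io *vio)
    {
        return (vio->io_state == HVMIO_awaiting_completion) &&
               !vio->io_data_is_addr &&
               (vio->io_dir == IOREQ_READ);
    }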
Paul Durrant
2015-06-24 11:24:46 UTC
Permalink
Emulation request status is already covered by STATE_IOREQ_XXX values so
just use those. The mapping is:

HVMIO_none -> STATE_IOREQ_NONE
HVMIO_awaiting_completion -> STATE_IOREQ_READY
HVMIO_completed -> STATE_IORESP_READY

Signed-off-by: Paul Durrant <***@citrix.com>
Cc: Keir Fraser <***@xen.org>
Cc: Jan Beulich <***@suse.com>
Cc: Andrew Cooper <***@citrix.com>
---
xen/arch/x86/hvm/emulate.c | 14 +++++++-------
xen/arch/x86/hvm/hvm.c | 6 +++---
xen/arch/x86/hvm/svm/nestedsvm.c | 2 +-
xen/arch/x86/hvm/vmx/realmode.c | 4 ++--
xen/include/asm-x86/hvm/vcpu.h | 10 ++--------
5 files changed, 15 insertions(+), 21 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index c10adad..6f538bf 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -131,10 +131,10 @@ static int hvmemul_do_io(

switch ( vio->io_state )
{
- case HVMIO_none:
+ case STATE_IOREQ_NONE:
break;
- case HVMIO_completed:
- vio->io_state = HVMIO_none;
+ case STATE_IORESP_READY:
+ vio->io_state = STATE_IOREQ_NONE;
if ( data_is_addr || dir == IOREQ_WRITE )
return X86EMUL_UNHANDLEABLE;
goto finish_access;
@@ -142,7 +142,7 @@ static int hvmemul_do_io(
return X86EMUL_UNHANDLEABLE;
}

- vio->io_state = HVMIO_awaiting_completion;
+ vio->io_state = STATE_IOREQ_READY;
vio->io_size = size;
vio->io_dir = dir;
vio->io_data_is_addr = data_is_addr;
@@ -161,7 +161,7 @@ static int hvmemul_do_io(
{
case X86EMUL_OKAY:
vio->io_data = p.data;
- vio->io_state = HVMIO_none;
+ vio->io_state = STATE_IOREQ_NONE;
break;
case X86EMUL_UNHANDLEABLE:
{
@@ -174,13 +174,13 @@ static int hvmemul_do_io(
rc = hvm_process_io_intercept(&null_handler, &p);
if ( rc == X86EMUL_OKAY )
vio->io_data = p.data;
- vio->io_state = HVMIO_none;
+ vio->io_state = STATE_IOREQ_NONE;
}
else
{
rc = hvm_send_assist_req(s, &p);
if ( rc != X86EMUL_RETRY )
- vio->io_state = HVMIO_none;
+ vio->io_state = STATE_IOREQ_NONE;
else if ( data_is_addr || dir == IOREQ_WRITE )
rc = X86EMUL_OKAY;
}
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 4458fa4..7411287 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -421,11 +421,11 @@ static void hvm_io_assist(ioreq_t *p)

if ( HVMIO_NEED_COMPLETION(vio) )
{
- vio->io_state = HVMIO_completed;
+ vio->io_state = STATE_IORESP_READY;
vio->io_data = p->data;
}
else
- vio->io_state = HVMIO_none;
+ vio->io_state = STATE_IOREQ_NONE;

msix_write_completion(curr);
vcpu_end_shutdown_deferral(curr);
@@ -505,7 +505,7 @@ void hvm_do_resume(struct vcpu *v)
guest_cpu_user_regs()->rax = (uint32_t)vio->io_data;
else
memcpy(&guest_cpu_user_regs()->rax, &vio->io_data, vio->io_size);
- vio->io_state = HVMIO_none;
+ vio->io_state = STATE_IOREQ_NONE;
break;
case HVMIO_realmode_completion:
{
diff --git a/xen/arch/x86/hvm/svm/nestedsvm.c b/xen/arch/x86/hvm/svm/nestedsvm.c
index be5797a..8b165c6 100644
--- a/xen/arch/x86/hvm/svm/nestedsvm.c
+++ b/xen/arch/x86/hvm/svm/nestedsvm.c
@@ -1231,7 +1231,7 @@ enum hvm_intblk nsvm_intr_blocked(struct vcpu *v)
* Delay the injection because this would result in delivering
* an interrupt *within* the execution of an instruction.
*/
- if ( v->arch.hvm_vcpu.hvm_io.io_state != HVMIO_none )
+ if ( v->arch.hvm_vcpu.hvm_io.io_state != STATE_IOREQ_NONE )
return hvm_intblk_shadow;

if ( !nv->nv_vmexit_pending && n2vmcb->exitintinfo.bytes != 0 ) {
diff --git a/xen/arch/x86/hvm/vmx/realmode.c b/xen/arch/x86/hvm/vmx/realmode.c
index 5e56a1f..4135ad4 100644
--- a/xen/arch/x86/hvm/vmx/realmode.c
+++ b/xen/arch/x86/hvm/vmx/realmode.c
@@ -205,7 +205,7 @@ void vmx_realmode(struct cpu_user_regs *regs)

vmx_realmode_emulate_one(&hvmemul_ctxt);

- if ( vio->io_state != HVMIO_none || vio->mmio_retry )
+ if ( vio->io_state != STATE_IOREQ_NONE || vio->mmio_retry )
break;

/* Stop emulating unless our segment state is not safe */
@@ -219,7 +219,7 @@ void vmx_realmode(struct cpu_user_regs *regs)
}

/* Need to emulate next time if we've started an IO operation */
- if ( vio->io_state != HVMIO_none )
+ if ( vio->io_state != STATE_IOREQ_NONE )
curr->arch.hvm_vmx.vmx_emulate = 1;

if ( !curr->arch.hvm_vmx.vmx_emulate && !curr->arch.hvm_vmx.vmx_realmode )
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index 2830057..f797518 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -30,12 +30,6 @@
#include <asm/hvm/svm/nestedsvm.h>
#include <asm/mtrr.h>

-enum hvm_io_state {
- HVMIO_none = 0,
- HVMIO_awaiting_completion,
- HVMIO_completed
-};
-
enum hvm_io_completion {
HVMIO_no_completion = 0,
HVMIO_mmio_completion,
@@ -50,7 +44,7 @@ struct hvm_vcpu_asid {

struct hvm_vcpu_io {
/* I/O request in flight to device model. */
- enum hvm_io_state io_state;
+ uint8_t io_state;
unsigned long io_data;
int io_size;
enum hvm_io_completion io_completion;
@@ -58,7 +52,7 @@ struct hvm_vcpu_io {
uint8_t io_data_is_addr;

#define HVMIO_NEED_COMPLETION(_vio) \
- ( ((_vio)->io_state == HVMIO_awaiting_completion) && \
+ ( ((_vio)->io_state == STATE_IOREQ_READY) && \
!(_vio)->io_data_is_addr && \
((_vio)->io_dir == IOREQ_READ) )
--
1.7.10.4
Jan Beulich
2015-06-25 09:43:33 UTC
Permalink
Post by Paul Durrant
Emulation request status is already covered by STATE_IOREQ_XXX values so
HVMIO_none -> STATE_IOREQ_NONE
HVMIO_awaiting_completion -> STATE_IOREQ_READY
HVMIO_completed -> STATE_IORESP_READY
Acked-by: Jan Beulich <***@suse.com>
subject to possibly ...
Post by Paul Durrant
@@ -50,7 +44,7 @@ struct hvm_vcpu_asid {
struct hvm_vcpu_io {
/* I/O request in flight to device model. */
- enum hvm_io_state io_state;
+ uint8_t io_state;
unsigned long io_data;
int io_size;
enum hvm_io_completion io_completion;
... finding a better slot for this single byte (not sure what the
structure looks like at this point of the series).

Jan
Paul Durrant
2015-06-25 09:46:02 UTC
Permalink
-----Original Message-----
Sent: 25 June 2015 10:44
To: Paul Durrant
Cc: Andrew Cooper; xen-devel; Keir (Xen.org)
Subject: Re: [PATCH v4 14/17] x86/hvm: remove hvm_io_state enumeration
Post by Paul Durrant
Emulation request status is already covered by STATE_IOREQ_XXX values
so
Post by Paul Durrant
HVMIO_none -> STATE_IOREQ_NONE
HVMIO_awaiting_completion -> STATE_IOREQ_READY
HVMIO_completed -> STATE_IORESP_READY
Thanks.
subject to possibly ...
Post by Paul Durrant
@@ -50,7 +44,7 @@ struct hvm_vcpu_asid {
struct hvm_vcpu_io {
/* I/O request in flight to device model. */
- enum hvm_io_state io_state;
+ uint8_t io_state;
unsigned long io_data;
int io_size;
enum hvm_io_completion io_completion;
... finding a better slot for this single byte (not sure what the
structure looks like at this point of the series).
Probably not worth it. The state gets pulled into an ioreq_t (along with io_data and io_size) in a later patch.

Paul
Jan
Paul Durrant
2015-06-24 11:24:42 UTC
Permalink
...and error handling"

NOTE: A straight reversion was not possible because of subsequent changes
in the code, so this is at least partially a manual reversion.

By limiting hvmemul_do_io_addr() to reps falling within the pages on which
a reference has already been taken, we can guarantee that calls to
hvm_copy_to/from_guest_phys() will not hit the HVMCOPY_gfn_paged_out
or HVMCOPY_gfn_shared cases. Thus we can remove the retry logic from
the intercept code and simplify it significantly.

Normally hvmemul_do_io_addr() will only reference a single page at a time.
It will, however, take an extra page reference for I/O spanning a page
boundary.

It is still important to know, upon returning from x86_emulate(), whether
the number of reps was reduced, so the mmio_retry flag is retained for that
purpose.

Signed-off-by: Paul Durrant <***@citrix.com>
Cc: Keir Fraser <***@xen.org>
Cc: Jan Beulich <***@suse.com>
Cc: Andrew Cooper <***@citrix.com>
---
xen/arch/x86/hvm/emulate.c | 86 +++++++++++++++++++++++++++-------------
xen/arch/x86/hvm/hvm.c | 4 ++
xen/arch/x86/hvm/intercept.c | 52 +++++-------------------
xen/include/asm-x86/hvm/vcpu.h | 2 +-
4 files changed, 74 insertions(+), 70 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 4e2fdf1..eefe860 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -52,7 +52,7 @@ static void hvmtrace_io_assist(ioreq_t *p)
}

static int hvmemul_do_io(
- bool_t is_mmio, paddr_t addr, unsigned long *reps, unsigned int size,
+ bool_t is_mmio, paddr_t addr, unsigned long reps, unsigned int size,
uint8_t dir, bool_t df, bool_t data_is_addr, uintptr_t data)
{
struct vcpu *curr = current;
@@ -61,6 +61,7 @@ static int hvmemul_do_io(
.type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO,
.addr = addr,
.size = size,
+ .count = reps,
.dir = dir,
.df = df,
.data = data,
@@ -126,15 +127,6 @@ static int hvmemul_do_io(
HVMIO_dispatched : HVMIO_awaiting_completion;
vio->io_size = size;

- /*
- * When retrying a repeated string instruction, force exit to guest after
- * completion of the retried iteration to allow handling of interrupts.
- */
- if ( vio->mmio_retrying )
- *reps = 1;
-
- p.count = *reps;
-
if ( dir == IOREQ_WRITE )
{
if ( !data_is_addr )
@@ -148,17 +140,9 @@ static int hvmemul_do_io(
switch ( rc )
{
case X86EMUL_OKAY:
- case X86EMUL_RETRY:
- *reps = p.count;
p.state = STATE_IORESP_READY;
- if ( !vio->mmio_retry )
- {
- hvm_io_assist(&p);
- vio->io_state = HVMIO_none;
- }
- else
- /* Defer hvm_io_assist() invocation to hvm_do_resume(). */
- vio->io_state = HVMIO_handle_mmio_awaiting_completion;
+ hvm_io_assist(&p);
+ vio->io_state = HVMIO_none;
break;
case X86EMUL_UNHANDLEABLE:
{
@@ -236,7 +220,7 @@ static int hvmemul_do_io_buffer(

BUG_ON(buffer == NULL);

- rc = hvmemul_do_io(is_mmio, addr, reps, size, dir, df, 0,
+ rc = hvmemul_do_io(is_mmio, addr, *reps, size, dir, df, 0,
(uintptr_t)buffer);
if ( rc == X86EMUL_UNHANDLEABLE && dir == IOREQ_READ )
memset(buffer, 0xff, size);
@@ -288,17 +272,66 @@ static int hvmemul_do_io_addr(
bool_t is_mmio, paddr_t addr, unsigned long *reps,
unsigned int size, uint8_t dir, bool_t df, paddr_t ram_gpa)
{
- struct page_info *ram_page;
+ struct vcpu *v = current;
+ unsigned long ram_gmfn = paddr_to_pfn(ram_gpa);
+ struct page_info *ram_page[2];
+ int nr_pages = 0;
+ unsigned long count;
int rc;

- rc = hvmemul_acquire_page(paddr_to_pfn(ram_gpa), &ram_page);
+ rc = hvmemul_acquire_page(ram_gmfn, &ram_page[nr_pages]);
if ( rc != X86EMUL_OKAY )
- return rc;
+ goto out;

- rc = hvmemul_do_io(is_mmio, addr, reps, size, dir, df, 1,
+ nr_pages++;
+
+ /* Determine how many reps will fit within this page */
+ for ( count = 0; count < *reps; count++ )
+ {
+ paddr_t start, end;
+
+ if ( df )
+ {
+ start = ram_gpa - count * size;
+ end = ram_gpa + size - 1;
+ }
+ else
+ {
+ start = ram_gpa;
+ end = ram_gpa + (count + 1) * size - 1;
+ }
+
+ if ( paddr_to_pfn(start) != ram_gmfn ||
+ paddr_to_pfn(end) != ram_gmfn )
+ break;
+ }
+
+ if ( count == 0 )
+ {
+ /*
+ * This access must span two pages, so grab a reference to
+ * the next page and do a single rep.
+ */
+ rc = hvmemul_acquire_page(df ? ram_gmfn - 1 : ram_gmfn + 1,
+ &ram_page[nr_pages]);
+ if ( rc != X86EMUL_OKAY )
+ goto out;
+
+ nr_pages++;
+ count = 1;
+ }
+
+ rc = hvmemul_do_io(is_mmio, addr, count, size, dir, df, 1,
ram_gpa);
+ if ( rc == X86EMUL_OKAY )
+ {
+ v->arch.hvm_vcpu.hvm_io.mmio_retry = (count < *reps);
+ *reps = count;
+ }

- hvmemul_release_page(ram_page);
+ out:
+ while ( --nr_pages >= 0 )
+ hvmemul_release_page(ram_page[nr_pages]);

return rc;
}
@@ -1550,7 +1583,6 @@ static int _hvm_emulate_one(struct hvm_emulate_ctxt *hvmemul_ctxt,
}

hvmemul_ctxt->exn_pending = 0;
- vio->mmio_retrying = vio->mmio_retry;
vio->mmio_retry = 0;

if ( cpu_has_vmx )
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index f8486f4..626c431 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -440,6 +440,7 @@ static bool_t hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)

void hvm_do_resume(struct vcpu *v)
{
+ struct hvm_vcpu_io *vio = &v->arch.hvm_vcpu.hvm_io;
struct domain *d = v->domain;
struct hvm_ioreq_server *s;

@@ -468,6 +469,9 @@ void hvm_do_resume(struct vcpu *v)
}
}

+ if ( vio->mmio_retry )
+ (void)handle_mmio();
+
/* Inject pending hw/sw trap */
if ( v->arch.hvm_vcpu.inject_trap.vector != -1 )
{
diff --git a/xen/arch/x86/hvm/intercept.c b/xen/arch/x86/hvm/intercept.c
index 625e585..02d7408 100644
--- a/xen/arch/x86/hvm/intercept.c
+++ b/xen/arch/x86/hvm/intercept.c
@@ -115,7 +115,6 @@ static const struct hvm_io_ops portio_ops = {
static int hvm_process_io_intercept(struct hvm_io_handler *handler,
ioreq_t *p)
{
- struct hvm_vcpu_io *vio = &current->arch.hvm_vcpu.hvm_io;
const struct hvm_io_ops *ops = handler->ops;
int rc = X86EMUL_OKAY, i, step = p->df ? -p->size : p->size;
uint64_t data;
@@ -125,23 +124,12 @@ static int hvm_process_io_intercept(struct hvm_io_handler *handler,
{
for ( i = 0; i < p->count; i++ )
{
- if ( vio->mmio_retrying )
- {
- if ( vio->mmio_large_read_bytes != p->size )
- return X86EMUL_UNHANDLEABLE;
- memcpy(&data, vio->mmio_large_read, p->size);
- vio->mmio_large_read_bytes = 0;
- vio->mmio_retrying = 0;
- }
- else
- {
- addr = (p->type == IOREQ_TYPE_COPY) ?
- p->addr + step * i :
- p->addr;
- rc = ops->read(handler, addr, p->size, &data);
- if ( rc != X86EMUL_OKAY )
- break;
- }
+ addr = (p->type == IOREQ_TYPE_COPY) ?
+ p->addr + step * i :
+ p->addr;
+ rc = ops->read(handler, addr, p->size, &data);
+ if ( rc != X86EMUL_OKAY )
+ break;

if ( p->data_is_ptr )
{
@@ -150,14 +138,12 @@ static int hvm_process_io_intercept(struct hvm_io_handler *handler,
{
case HVMCOPY_okay:
break;
- case HVMCOPY_gfn_paged_out:
- case HVMCOPY_gfn_shared:
- rc = X86EMUL_RETRY;
- break;
case HVMCOPY_bad_gfn_to_mfn:
/* Drop the write as real hardware would. */
continue;
case HVMCOPY_bad_gva_to_gfn:
+ case HVMCOPY_gfn_paged_out:
+ case HVMCOPY_gfn_shared:
ASSERT_UNREACHABLE();
/* fall through */
default:
@@ -170,13 +156,6 @@ static int hvm_process_io_intercept(struct hvm_io_handler *handler,
else
p->data = data;
}
-
- if ( rc == X86EMUL_RETRY )
- {
- vio->mmio_retry = 1;
- vio->mmio_large_read_bytes = p->size;
- memcpy(vio->mmio_large_read, &data, p->size);
- }
}
else /* p->dir == IOREQ_WRITE */
{
@@ -189,14 +168,12 @@ static int hvm_process_io_intercept(struct hvm_io_handler *handler,
{
case HVMCOPY_okay:
break;
- case HVMCOPY_gfn_paged_out:
- case HVMCOPY_gfn_shared:
- rc = X86EMUL_RETRY;
- break;
case HVMCOPY_bad_gfn_to_mfn:
data = ~0;
break;
case HVMCOPY_bad_gva_to_gfn:
+ case HVMCOPY_gfn_paged_out:
+ case HVMCOPY_gfn_shared:
ASSERT_UNREACHABLE();
/* fall through */
default:
@@ -216,15 +193,6 @@ static int hvm_process_io_intercept(struct hvm_io_handler *handler,
if ( rc != X86EMUL_OKAY )
break;
}
-
- if ( rc == X86EMUL_RETRY )
- vio->mmio_retry = 1;
- }
-
- if ( i != 0 )
- {
- p->count = i;
- rc = X86EMUL_OKAY;
}

return rc;
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index dd08416..97d78bd 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -74,7 +74,7 @@ struct hvm_vcpu_io {
* For string instruction emulation we need to be able to signal a
* necessary retry through other than function return codes.
*/
- bool_t mmio_retry, mmio_retrying;
+ bool_t mmio_retry;

unsigned long msix_unmask_address;
--
1.7.10.4
Jan Beulich
2015-06-24 15:21:15 UTC
Permalink
Post by Paul Durrant
@@ -288,17 +272,66 @@ static int hvmemul_do_io_addr(
bool_t is_mmio, paddr_t addr, unsigned long *reps,
unsigned int size, uint8_t dir, bool_t df, paddr_t ram_gpa)
{
- struct page_info *ram_page;
+ struct vcpu *v = current;
+ unsigned long ram_gmfn = paddr_to_pfn(ram_gpa);
+ struct page_info *ram_page[2];
+ int nr_pages = 0;
+ unsigned long count;
int rc;
- rc = hvmemul_acquire_page(paddr_to_pfn(ram_gpa), &ram_page);
+ rc = hvmemul_acquire_page(ram_gmfn, &ram_page[nr_pages]);
if ( rc != X86EMUL_OKAY )
- return rc;
+ goto out;
- rc = hvmemul_do_io(is_mmio, addr, reps, size, dir, df, 1,
+ nr_pages++;
+
+ /* Determine how many reps will fit within this page */
+ for ( count = 0; count < *reps; count++ )
+ {
+ paddr_t start, end;
+
+ if ( df )
+ {
+ start = ram_gpa - count * size;
+ end = ram_gpa + size - 1;
+ }
+ else
+ {
+ start = ram_gpa;
+ end = ram_gpa + (count + 1) * size - 1;
+ }
+
+ if ( paddr_to_pfn(start) != ram_gmfn ||
+ paddr_to_pfn(end) != ram_gmfn )
+ break;
+ }
+
+ if ( count == 0 )
+ {
+ /*
+ * This access must span two pages, so grab a reference to
+ * the next page and do a single rep.
+ */
+ rc = hvmemul_acquire_page(df ? ram_gmfn - 1 : ram_gmfn + 1,
+ &ram_page[nr_pages]);
+ if ( rc != X86EMUL_OKAY )
+ goto out;
+
+ nr_pages++;
+ count = 1;
+ }
+
+ rc = hvmemul_do_io(is_mmio, addr, count, size, dir, df, 1,
ram_gpa);
Looking at this change alone I think calling this a revert is pretty odd.
Yes, you undo some of the original commit, but it looks like about 50%
of the patch is doing things other than reverting. Mentioning the
original commit in the description is certainly fine, but beyond that it
should be an ordinary patch.

As to the code above - do you really think determining "count" in a
loop is efficient? It ought to be possible to obtain this via simple
calculation...

Jan
Paul Durrant
2015-06-24 15:23:29 UTC
Permalink
-----Original Message-----
Sent: 24 June 2015 16:21
To: Paul Durrant
Subject: Re: [PATCH v4 10/17] x86/hvm: revert 82ed8716b "fix direct PCI port
I/O emulation retry...
Post by Paul Durrant
@@ -288,17 +272,66 @@ static int hvmemul_do_io_addr(
bool_t is_mmio, paddr_t addr, unsigned long *reps,
unsigned int size, uint8_t dir, bool_t df, paddr_t ram_gpa)
{
- struct page_info *ram_page;
+ struct vcpu *v = current;
+ unsigned long ram_gmfn = paddr_to_pfn(ram_gpa);
+ struct page_info *ram_page[2];
+ int nr_pages = 0;
+ unsigned long count;
int rc;
- rc = hvmemul_acquire_page(paddr_to_pfn(ram_gpa), &ram_page);
+ rc = hvmemul_acquire_page(ram_gmfn, &ram_page[nr_pages]);
if ( rc != X86EMUL_OKAY )
- return rc;
+ goto out;
- rc = hvmemul_do_io(is_mmio, addr, reps, size, dir, df, 1,
+ nr_pages++;
+
+ /* Detemine how many reps will fit within this page */
+ for ( count = 0; count < *reps; count++ )
+ {
+ paddr_t start, end;
+
+ if ( df )
+ {
+ start = ram_gpa - count * size;
+ end = ram_gpa + size - 1;
+ }
+ else
+ {
+ start = ram_gpa;
+ end = ram_gpa + (count + 1) * size - 1;
+ }
+
+ if ( paddr_to_pfn(start) != ram_gmfn ||
+ paddr_to_pfn(end) != ram_gmfn )
+ break;
+ }
+
+ if ( count == 0 )
+ {
+ /*
+ * This access must span two pages, so grab a reference to
+ * the next page and do a single rep.
+ */
+ rc = hvmemul_acquire_page(df ? ram_gmfn - 1 : ram_gmfn + 1,
+ &ram_page[nr_pages]);
+ if ( rc != X86EMUL_OKAY )
+ goto out;
+
+ nr_pages++;
+ count = 1;
+ }
+
+ rc = hvmemul_do_io(is_mmio, addr, count, size, dir, df, 1,
ram_gpa);
Looking at this change alone I think calling this a revert is pretty odd.
Yes, you undo some of the original commit, but it looks like about 50%
of the patch is doing things other than reverting. Mentioning the
original commit in the description is certainly fine, but beyond that it
should be an ordinary patch.
Ok. I can change the terminology in the message.
As to the code above - do you really think determining "count" in a
loop is efficient? It ought to be possible to obtain this via simple
calculation...
True. It's a bit lazy doing it this way. I'll change it.

Paul
Jan
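For illustration, the loop could be replaced by a direct calculation along these lines (a sketch only, assuming PAGE_SIZE and the existing locals; not necessarily the wording that eventually went in):

    unsigned long page_off = ram_gpa & (PAGE_SIZE - 1);

    if ( (page_off + size) > PAGE_SIZE )
        count = 0;                 /* even a single rep spans two pages */
    else if ( df )
        count = min(*reps, page_off / size + 1);
    else
        count = min(*reps, (PAGE_SIZE - page_off) / size);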
Jan Beulich
2015-06-24 12:16:02 UTC
Permalink
Post by Paul Durrant
This patch series re-works much of the code involved in emulation of port
and memory mapped I/O for HVM guests.
The code has become very convoluted and, at least by inspection, certain
emulations will apparently malfunction.
The series is broken down into 17 patches (which are also available in
my xenbits repo: http://xenbits.xen.org/gitweb/?p=people/pauldu/xen.git
0001-x86-hvm-simplify-hvmemul_do_io.patch
0002-x86-hvm-remove-hvm_io_pending-check-in-hvmemul_do_io.patch
0003-x86-hvm-remove-extraneous-parameter-from-hvmtrace_io.patch
In the event there were any, however minor, changes to these
three - they went in yesterday evening.

Jan
Paul Durrant
2015-06-24 12:52:18 UTC
Permalink
-----Original Message-----
Sent: 24 June 2015 13:16
To: Paul Durrant
Subject: Re: [Xen-devel] [PATCH v4 00/17] x86/hvm: I/O emulation cleanup
and fix
Post by Paul Durrant
This patch series re-works much of the code involved in emulation of port
and memory mapped I/O for HVM guests.
The code has become very convoluted and, at least by inspection, certain
emulations will apparently malfunction.
The series is broken down into 17 patches (which are also available in
http://xenbits.xen.org/gitweb/?p=people/pauldu/xen.git
Post by Paul Durrant
0001-x86-hvm-simplify-hvmemul_do_io.patch
0002-x86-hvm-remove-hvm_io_pending-check-in-hvmemul_do_io.patch
0003-x86-hvm-remove-extraneous-parameter-from-hvmtrace_io.patch
In the event there were any, however minor, changes to these
three - they went in yesterday evening.
Ok. Thanks. Having not seen your acked-by on list, I didn't check. I'll drop these from any subsequent post.

Paul
Jan