Bug#701495: qemu-kvm: Wheezy qemu-kvm fails to give DHCP to Windows XP Guest using isolated network in libvirt-bin
Daniel Dickinson
2013-02-23 19:25:28 UTC
Package: qemu-kvm
Version: 1.1.2+dfsg-5
Severity: normal

With the version of qemu-kvm in Wheezy (testing) (1.1.2+dfsg-5) a Windows XP guest fails to get a DHCP address from dnsmasq started by libvirt-bin (both from Wheezy). Downgrading to the Squeeze version of qemu-kvm works around the issue (i.e. the Windows XP guest gets DHCP). I have also tried upgrading dnsmasq with wheezy qemu-kvm, and tried using experimental qemu-kvm, but neither works.

This is obviously a regression from 0.12<...> (squeeze version).

Oh, I am using virtio networking for kvm and netkvm.sys (virtio for XP) from spice-space (spice client guest-tools for Windows).

-- Package-specific info:


/proc/cpuinfo:

processor : 0
vendor_id : AuthenticAMD
cpu family : 16
model : 4
model name : AMD Phenom(tm) II X4 945 Processor
stepping : 3
microcode : 0x10000c8
cpu MHz : 3000.000
cache size : 512 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt hw_pstate npt lbrv svm_lock nrip_save
bogomips : 6030.47
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

processor : 1
vendor_id : AuthenticAMD
cpu family : 16
model : 4
model name : AMD Phenom(tm) II X4 945 Processor
stepping : 3
microcode : 0x10000c8
cpu MHz : 3000.000
cache size : 512 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 4
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt hw_pstate npt lbrv svm_lock nrip_save
bogomips : 6030.48
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

processor : 2
vendor_id : AuthenticAMD
cpu family : 16
model : 4
model name : AMD Phenom(tm) II X4 945 Processor
stepping : 3
microcode : 0x10000c8
cpu MHz : 800.000
cache size : 512 KB
physical id : 0
siblings : 4
core id : 2
cpu cores : 4
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt hw_pstate npt lbrv svm_lock nrip_save
bogomips : 6030.48
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

processor : 3
vendor_id : AuthenticAMD
cpu family : 16
model : 4
model name : AMD Phenom(tm) II X4 945 Processor
stepping : 3
microcode : 0x10000c8
cpu MHz : 800.000
cache size : 512 KB
physical id : 0
siblings : 4
core id : 3
cpu cores : 4
apicid : 3
initial apicid : 3
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt hw_pstate npt lbrv svm_lock nrip_save
bogomips : 6030.47
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate




-- System Information:
Debian Release: 7.0
APT prefers testing
APT policy: (990, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 3.2.0-4-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_CA.UTF-8, LC_CTYPE=en_CA.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Michael Tokarev
2013-02-23 19:30:50 UTC
Control: tag -1 + unreproducible moreinfo
Post by Daniel Dickinson
Package: qemu-kvm
Version: 1.1.2+dfsg-5
Severity: normal
With the version of qemu-kvm in Wheezy (testing) (1.1.2+dfsg-5) a Windows XP guest fails to get a DHCP address from dnsmasq started by libvirt-bin (both from Wheezy). Downgrading to the Squeeze version of qemu-kvm works around the issue (i.e. the Windows XP guest gets DHCP). I have also tried upgrading dnsmasq with wheezy qemu-kvm, and tried using experimental qemu-kvm, but neither works.
This is obviously a regression from 0.12<...> (squeeze version).
Oh, I am using virtio networking for kvm and netkvm.sys (virtio for XP) from spice-space (spice client guest-tools for Windows).
Well. You have to be *much* more specific here. DHCP in WinXP clients,
with and without libvirt, with dnsmasq or other DHCP servers, with virtio
or other virtual NICs works for many, many users and installations. In
particular, it surely works for me, not only for WinXP but for all other
guests I have (numerous windows, linux, *bsd and some other more exotic
ones).

Please at least provide version number of the virtio drivers, and try with
other kinds of virtual NICs.

Tagging as unreproducible for now, since that's what it really is -
unreproducible.

Thanks,

/mjt
Daniel Dickinson
2013-02-24 00:57:21 UTC
Post by Michael Tokarev
Control: tag -1 + unreproducible moreinfo
Post by Daniel Dickinson
Package: qemu-kvm
Version: 1.1.2+dfsg-5
Severity: normal
With the version of qemu-kvm in Wheezy (testing) (1.1.2+dfsg-5) a Windows XP guest fails to get a DHCP address from dnsmasq started by libvirt-bin (both from Wheezy). Downgrading to the Squeeze version of qemu-kvm works around the issue (i.e. the Windows XP guest gets DHCP). I have also tried upgrading dnsmasq with wheezy qemu-kvm, and tried using experimental qemu-kvm, but neither works.
This is obviously a regression from 0.12<...> (squeeze version).
Oh, I am using virtio networking for kvm and netkvm.sys (virtio for XP) from spice-space (spice client guest-tools for Windows).
Well. You have to be *much* more specific here. DHCP in WinXP clients,
I have found bug #647312. Apparently this is not a new problem, but one
for which you have not found a consistently reproducible case. If you
could have stated that rather than simply marking it as unreproducible and
moreinfo, it'd be appreciated. As a working developer I know that not having
information when you need it is frustrating, but at the same time
sometimes all you need is to know there is a problem. Also, some
Debian developers dump on you when you provide info they consider stupid
to provide or irrelevant, meaningless detail, and I hadn't found that
bug before. In addition, if all you needed was a little info, providing
scads of detail would be a waste of time.

Let me outline what I did:

About three days ago I installed Windows XP Professional (32-bit) SP1
using virt-manager with everything para-virtualized except storage
(upgrade to SP3 using an SP3 CD after the initial install, and before
virtio drivers). After SP3 I installed virtio for everything except
storage, then moved C: to virtio as well (and had to re-activate
Windows). I used networks I had previously defined and used with debian
guests. I have two macvtap 'networks', one to my LAN and the other to a
second NIC in the host which is used only occasionally for manually
configured networking (normally no cable). The third network is a
libvirt 'isolated network' (bridge adapter on the host, tun/tap adapter
on host as part of bridge, dnsmasq serves dhcp to the guests).

There are three cores assigned to the VM (of 4). The CPU is configured
to be kvm32.
Post by Michael Tokarev
with and without libvirt, with dnsmasq or other DHCP servers, with virtio
or other virtual NICs works for many, many users and installations. In
particular, it surely works for me, not only for WinXP but for all other
guests I have (numerous windows, linux, *bsd and some other more exotic
ones).
Please at least provide version number of the virtio drivers, and try with
other kinds of virtual NICs.
Was always intending to, the bug report was an initial 'are you aware of
the problem' report. I found the problem has been reported before and
that you can't reproduce it in your environment. Again, saying so
instead of 'it surely works for me' as if it wasn't a real problem would
have been helpful. It has been apparently seen upstream as well, but
the issue is reproducing it, but the impression I got from you was that
you didn't believe the problem actually existed, not that you really
were looking for more information.

Not sure what purpose my testing all kinds of NICs would be since that
has already been shown not to be the issue by other users. I did try
the e1000 driver and got the same results (DHCP OFFER sent by dnsmasq
but XP doesn't ACK it, and doesn't get an IP address).
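
For reference, in a standard DHCP exchange (RFC 2131) the client answers the server's OFFER with a REQUEST, and only then does the server send the ACK. So the symptom described above, an OFFER from dnsmasq and nothing after it, would mean the handshake stalls at a step the guest is supposed to perform. A minimal Python sketch of that reasoning follows; the function and variable names are made up for illustration and are not part of any tool mentioned in this thread.

```python
# Message order for a fresh lease, as described in RFC 2131.
HANDSHAKE = ["DISCOVER", "OFFER", "REQUEST", "ACK"]
SENDER = {
    "DISCOVER": "client",
    "OFFER": "server",
    "REQUEST": "client",
    "ACK": "server",
}


def stalled_at(seen):
    """Return (missing_message, expected_sender), or None if the handshake completed."""
    for expected in HANDSHAKE:
        if expected not in seen:
            return expected, SENDER[expected]
    return None


# Symptom from the report: the server's OFFER is the last message observed.
print(stalled_at(["DISCOVER", "OFFER"]))  # ('REQUEST', 'client')
```

If this reading is right, the place to look is whether the OFFER actually reaches the guest, and whether the guest's REQUEST ever leaves it.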

Virtio Driver is from RedHat Inc 22/01/2013, version 51.64.104.5200 and
not digitally signed.

I initially had installed the driver from the RedHat iso
virtio-win-01.52.iso from their site, but changed to virtio drivers in
spice-guest-tools-0.52 from spice-space.

You say it is unreproducible. Have you tried using the version of
qemu-kvm in this bug report with WinXP SP1 upgraded to SP3, using the
mentioned virtio drivers, or do you do a 'close enough' attempt to
reproduce? i.e. how much do you reproduce the environment?

Also if you can give me a place to ftp or such the VM I can send it to
you (though it will take a while since I only have 800kbps upload and a
few GB to upload; or I could send a DVD).

libvirt generates the following command line: (with some sanitization)

/usr/bin/kvm -S -M pc-1.1 -cpu kvm32 -enable-kvm -m 2048 -smp
3,sockets=3,cores=1,threads=1 -name sanitizedxp -uuid
7c3c3bea-5435-df5e-ed3a-3a213c7d9f90 -nodefconfig -nodefaults -chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/sanitizedxp.monitor,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime
-no-shutdown -boot order=c,menu=on -device
piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device
virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x9 -drive
file=/home/VM/ISO/winxp_pro_sp1_psk.iso,if=none,id=drive-ide0-0-0,readonly=on,format=raw
-device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -drive
file=/home/VM/Storage/sanitizedxp.img,if=none,id=drive-virtio-disk0,format=qcow2
-device
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0
-netdev tap,fd=20,id=hostnet0,vhost=on,vhostfd=21 -device
virtio-net-pci,netdev=hostnet0,id=net0,mac=21:23:11:e2:e1:36,bus=pci.0,addr=0x3
-netdev tap,fd=22,id=hostnet1,vhost=on,vhostfd=23 -device
virtio-net-pci,netdev=hostnet1,id=net1,mac=52:21:30:40:a3:3c,bus=pci.0,addr=0x7
-netdev tap,fd=24,id=hostnet2 -device
virtio-net-pci,netdev=hostnet2,id=net2,mac=44:23:44:11:ef:3a,bus=pci.0,addr=0x8
-chardev pty,id=charserial0 -device
isa-serial,chardev=charserial0,id=serial0 -chardev
spicevmc,id=charchannel0,name=vdagent -device
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0
-device usb-tablet,id=input0 -spice
port=5900,addr=127.0.0.1,disable-ticketing -vga qxl -global
qxl-vga.vram_size=67108864 -device AC97,id=sound0,bus=pci.0,addr=0x4
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5

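A long libvirt-generated command line like the one above is hard to read by eye. The following Python sketch pairs each `-netdev` backend with the `-device` that references it, so the NIC model and vhost setting of every interface can be seen at a glance. It is only an illustration: `list_nics` is a hypothetical helper, the input is an excerpt of the command line above, and the parser assumes option values contain no escaped commas.

```python
import shlex

# Excerpt of the network options from the command line quoted above.
CMDLINE = (
    "-netdev tap,fd=20,id=hostnet0,vhost=on,vhostfd=21 "
    "-device virtio-net-pci,netdev=hostnet0,id=net0,mac=21:23:11:e2:e1:36,bus=pci.0,addr=0x3 "
    "-netdev tap,fd=22,id=hostnet1,vhost=on,vhostfd=23 "
    "-device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:21:30:40:a3:3c,bus=pci.0,addr=0x7 "
    "-netdev tap,fd=24,id=hostnet2 "
    "-device virtio-net-pci,netdev=hostnet2,id=net2,mac=44:23:44:11:ef:3a,bus=pci.0,addr=0x8"
)


def list_nics(cmdline):
    """Pair each -netdev backend with the -device that references it."""
    args = shlex.split(cmdline)
    backends, nics = {}, []
    for flag, value in zip(args, args[1:]):
        opts = value.split(",")
        # The first token is the backend type or device model; the rest are key=value.
        kv = dict(o.split("=", 1) for o in opts[1:] if "=" in o)
        if flag == "-netdev":
            backends[kv.get("id")] = {"type": opts[0], "vhost": kv.get("vhost") == "on"}
        elif flag == "-device" and "netdev" in kv:
            nics.append({"model": opts[0], "mac": kv.get("mac"), "netdev": kv["netdev"]})
    for nic in nics:
        nic.update(backends.get(nic["netdev"], {}))
    return nics


NICS = list_nics(CMDLINE)
for nic in NICS:
    print(nic["netdev"], nic["model"], nic["type"], "vhost=on" if nic["vhost"] else "vhost=off")
```

On this excerpt it reports three virtio-net-pci NICs on tap backends, the first two with vhost enabled and the third (hostnet2) without.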
I hope that provides sufficient information for you to reproduce the
problem.

Regards,

Daniel
--
<erno> hm. I've lost a machine.. literally _lost_. it responds to ping,
it works completely, I just can't figure out where in my apartment it
is.
Daniel Dickinson
2013-02-24 06:25:37 UTC
I have started the process of recreating the VM. On this (incomplete,
still SP1, unactivated Windows) instance I have used ne2k drivers and
left the CPU type blank in virt-manager (XP reports this as QEMU Virtual
CPU version 1.1.2) and used the 'Microsoft ACPI-compliant system'
instead of something about SMP multiprocessor ACPI system (I can look up
the exact name if needed, but the upshot is that the system type is different).

I'm thinking, based on the other bug that was reported and eventually
closed as unreproducible with no further information, that the issue was
the CPU type, which affected the XP drivers that got installed, and that some
system types that XP detects because of the configuration work with KVM
and others don't.

Bottom line is that manually specifying the cpu type is a bad idea with
XP (but other things need to be manually specified, like virtio if you
want paravirtualized speed).

Regards,

Daniel
Michael Tokarev
2013-02-24 07:42:37 UTC
Post by Daniel Dickinson
I have started the process of recreating the VM on this (incomplete,
still SP1, unactivated Windows) instance I have used ne2k drivers and
left the CPU type blank in virt-manager (XP reports this as QEMU Virtual
CPU version 1.1.2) and used the 'Microsoft ACPI-compliant system'
instead of something about SMP multiprocessor ACPI system (I can look up
the exact name if needed, but the system type is different is the upshot).
Um. That's several misunderstandings really.

The ACPI vs non-ACPI HAL (hardware abstraction layer) is a very important
difference indeed; we should use the ACPI-aware one these days, incl. for winXP.
Non-acpi variant receives _significantly_ less testing and is much more
difficult to deal with.

ne2k is also one of the less tested components, but it does not matter:
the only significant difference (question) is whether the issue is
virtio-net-specific (in this case just _changing_ NIC type to anything
non-virtio will be enough) or it is somewhere deeper.
Post by Daniel Dickinson
I'm thinking based on the other bug that was reported and eventually
closed due to unreproducible and no more information, that the issue was
That bugreport has been closed because we found and _fixed_ the actual bug
in qemu-kvm in its non-acpi part. I remember that story very well because
I spent several weeks with it.
Post by Daniel Dickinson
the CPU type, which affected the XP drivers that got installed, and some
system types that XP detects because of the configuration work with KVM
and others don't.
No, this is not the case. CPU type has nothing to do with that, and the
(NIC) drivers are the same too. Only the "machine" type makes the
difference there, so that the whole driver stack on the win side is different in
its lower component (the HAL), and it interacts differently with the (virtual)
hardware too, even with a different part of that hardware.
Post by Daniel Dickinson
Bottom line is that manually specifying the cpu type is a bad idea with
XP (but other things need to be manually specified, like virtio if you
want paravirtualized speed).
You can change CPU type of a guest (be it winXP or win2k or any other win
machine (or non-win) OS) at will, the only key is to keep cpu feature flags
which are essential for the software inside. For example, you can't mask
`lm' (long mode, essentially 64bit support) bit of the CPU if you run a
64-bit guest, since it will fail to boot. You probably can't mask cmov
flag, but I'm not sure whether winXP uses it or not, and it may load a
slightly different kernel if cmov is non-present. And so on.
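
The rule of thumb above (change the CPU model freely, but keep the feature flags the guest software depends on) can be expressed as a simple set check. This is a hedged Python sketch: `missing_flags` is a hypothetical helper, and the flag string is an excerpt of the host flags from the original report. In practice the flags to check would be the ones the guest actually sees under the chosen `-cpu` model, which this thread does not list.

```python
def missing_flags(flags_line, required):
    """Return the required CPU feature flags absent from a 'flags' value."""
    present = set(flags_line.split())
    return sorted(set(required) - present)


# Excerpt of the host flags quoted in the report (not the guest-visible set).
HOST_FLAGS = (
    "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 "
    "clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm"
)

# Both flags named above are present on the host.
print(missing_flags(HOST_FLAGS, ["lm", "cmov"]))  # []

# Simulate a CPU definition with 'lm' masked: a 64-bit guest would be affected.
MASKED = " ".join(f for f in HOST_FLAGS.split() if f != "lm")
print(missing_flags(MASKED, ["lm", "cmov"]))  # ['lm']
```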

What CPU type _may_ be used for in win is to be counted in the activation
mechanism. I never tried this, since I always use VL versions of Windows,
which don't require activation. But this is not our case either.

FWIW, you can reach me on irc - freenode or oftc, I'm `mjt' there.

/mjt
Daniel Dickinson
2013-02-24 17:48:53 UTC
Post by Michael Tokarev
Post by Daniel Dickinson
I have started the process of recreating the VM on this (incomplete,
still SP1, unactivated Windows) instance I have used ne2k drivers and
left the CPU type blank in virt-manager (XP reports this as QEMU Virtual
CPU version 1.1.2) and used the 'Microsoft ACPI-compliant system'
instead of something about SMP multiprocessor ACPI system (I can look up
the exact name if needed, but the system type is different is the upshot).
Um. That's several misunderstandings really.
The ACPI vs non-ACPI HAL (hardware abstraction layer) is very important
difference indeed, we should use acpi-aware one these days, incl. winXP.
Non-acpi variant receives _significantly_ less testing and is much more
difficult to deal with.
In all cases I have used ACPI, however there are different types of ACPI
systems (I gather), which I thought might be important.
Post by Michael Tokarev
the only significant difference (question) is whether the issue is
virtio-net-specific (in this case just _changing_ NIC type to anything
non-virtio will be enough) or it is somewhere deeper.
Post by Daniel Dickinson
I'm thinking based on the other bug that was reported and eventually
closed due to unreproducible and no more information, that the issue was
That bugreport has been closed because we found and _fixed_ the actual bug
in qemu-kvm in its non-acpi part. I remember that story very well because
I spent several weeks with it.
Post by Daniel Dickinson
the CPU type, which affected the XP drivers that got installed, and some
system types that XP detects because of the configuration work with KVM
and others don't.
No, this is not the case. CPU type has nothing to do with that, and the
(NIC) drivers are the same too. Only the "machine" type makes the
difference there, so that the whole driver stack on the win side is different in
its lower component (the HAL), and it interacts differently with the (virtual)
hardware too, even with a different part of that hardware.
Let me put it this way. The following command (except mac addresses which
are actually the same as the previous commandline) vs. the previous one,
gets a different HAL type (both ACPI, but different name/type for the
driver for the ACPI/system device).

/usr/bin/kvm -S -M pc-1.1 -cpu qemu32 -enable-kvm -m 2048 -smp
3,sockets=3,cores=1,threads=1 -name sanitizedxp -uuid <omitted>
-nodefconfig -nodefaults -chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/sanitizedxp.monitor,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime
-no-shutdown -boot order=c,menu=on -device
piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive
file=/home/VM/ISO/xpsp3_5512.080413-2113_usa_x86fre_spcd.iso,if=none,id=drive-ide0-0-0,readonly=on,format=raw
-device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -drive
file=/home/VM/Storage/sanitizedxp.img,if=none,id=drive-ide0-0-1,format=qcow2,cache=writeback
-device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1 -netdev
tap,fd=20,id=hostnet0 -device
ne2k_pci,netdev=hostnet0,id=net0,mac=XX:XX:XX:XX:XX:XX,bus=pci.0,addr=0x3 -netdev
tap,fd=21,id=hostnet1 -device
ne2k_pci,netdev=hostnet1,id=net1,mac=XX:XX:XX:XX:XX:XX,bus=pci.0,addr=0x7 -netdev
tap,fd=22,id=hostnet2 -device
ne2k_pci,netdev=hostnet2,id=net2,mac=XX:XX:XX:XX:XX:XX,bus=pci.0,addr=0x8 -chardev
pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0
-device usb-tablet,id=input0 -vnc 127.0.0.1:0 -vga std -device
AC97,id=sound0,bus=pci.0,addr=0x4 -device
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5

Unless the issue comes from windows updates (not yet done because I'm
not babysitting the VM, but leaving and looking at it from time to time).
Post by Michael Tokarev
Post by Daniel Dickinson
Bottom line is that manually specifying the cpu type is a bad idea with
XP (but other things need to be manually specified, like virtio if you
want paravirtualized speed).
You can change CPU type of a guest (be it winXP or win2k or any other win
machine (or non-win) OS) at will, the only key is to keep cpu feature flags
which are essential for the software inside. For example, you can't mask
`lm' (long mode, essentially 64bit support) bit of the CPU if you run a
64-bit guest, since it will fail to boot. You probably can't mask cmov
flag, but I'm not sure whether winXP uses it or not, and it may load a
slightly different kernel if cmov is non-present. And so on.
What CPU type _may_ be used for in win is to be counted in the activation
mechanism. I never tried this, since I always use VL version of windows,
which don't require activation. But this is not our case either.
FWIW, you can reach me on irc - freenode or oftc, I'm `mjt' there.
/mjt
Michael Tokarev
2013-02-24 07:59:29 UTC
Post by Daniel Dickinson
I have started the process of recreating the VM on this (incomplete,
still SP1, unactivated Windows)[...]
It seems the process is somehow very slow. In order to dramatically
speed it up, please use cache=unsafe with your drives. It is recommended
for an OS install too. The effect of it is that qemu ignores FLUSH_CACHE
commands from the guest, immediately reporting success, so I/O is dramatically
reduced. Since it is an initial OS install, in case of a crash or anything it
is trivial to restart it from the beginning.
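
As an illustration of the suggestion, here is a small Python sketch that rewrites a `-drive` option string so that it carries `cache=unsafe`. The helper `set_drive_cache` is hypothetical, the input is the disk `-drive` string from the second command line quoted in this thread, and the code assumes the option string contains no escaped commas. With libvirt the setting would normally be changed in the domain configuration rather than on the command line; that part is not shown here.

```python
def set_drive_cache(drive_opts, mode="unsafe"):
    """Return a -drive option string with its cache= setting replaced or appended."""
    parts = [p for p in drive_opts.split(",") if not p.startswith("cache=")]
    parts.append("cache=" + mode)
    return ",".join(parts)


# Disk -drive string taken from the ne2k_pci command line in this thread.
DRIVE = (
    "file=/home/VM/Storage/sanitizedxp.img,if=none,"
    "id=drive-ide0-0-1,format=qcow2,cache=writeback"
)

NEW_DRIVE = set_drive_cache(DRIVE)
print(NEW_DRIVE)
```

This is only sensible for a throwaway install, for the reason given above: with flushes ignored, a host crash can leave the image inconsistent.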

/mjt
Michael Tokarev
2013-02-24 07:31:31 UTC
Post by Daniel Dickinson
Post by Michael Tokarev
Control: tag -1 + unreproducible moreinfo
Post by Daniel Dickinson
Package: qemu-kvm
Version: 1.1.2+dfsg-5
Severity: normal
With the version of qemu-kvm in Wheezy (testing) (1.1.2+dfsg-5) a Windows XP guest fails to get a DHCP address from dnsmasq started by libvirt-bin (both from Wheezy). Downgrading to the Squeeze version of qemu-kvm works around the issue (i.e. the Windows XP guest gets DHCP). I have also tried upgrading dnsmasq with wheezy qemu-kvm, and tried using experimental qemu-kvm, but neither works.
This is obviously a regression from 0.12<...> (squeeze version).
Oh, I am using virtio networking for kvm and netkvm.sys (virtio for XP) from spice-space (spice client guest-tools for Windows).
Well. You have to be *much* more specific here. DHCP in WinXP clients,
I have found bug #647312. Apparently this is not a new problem, but one
for which you have not found a consistently reproducible case. If you
could have stated that rather than simply marking it as unreproducible and
moreinfo, it'd be appreciated. As a working developer I know that not having
information when you need it is frustrating, but at the same time
sometimes all you need is to know there is a problem. Also, some
Debian developers dump on you when you provide info they consider stupid
to provide or irrelevant, meaningless detail, and I hadn't found that
bug before. In addition, if all you needed was a little info, providing
scads of detail would be a waste of time.
Um. That bug -- #647312 -- is about a different issue. I remember spending
lots of time trying various combinations and arguing with Vladimir Stavrinov,
but we finally found the key combination - which only he could know was used
initially (somehow it never occurred to me to actually *try* -no-acpi, even
if I suggested it in the very beginning).

Once we knew the way to trigger it, we found the cause and the actual fix,
so I closed the bugreport, quite some time ago, so I don't have a reason
to re-hash it again.

Now, after re-reading that bugreport, I see there are 2 messages at the end
to which I didn't reply and which I don't see here in my qemu folder. Somehow
I haven't received the two. Thank you for pointing this bugreport to me,
I'll ping Vladimir about his last two emails in that bugreport.

So I wasn't able to "simply state that I found no reproducer for #647312",
because we found at least the original cause and actually _fixed_ it.

"Simply marking it" as unreproducible _is_ stating that there's no reproducer,
nothing more nothing less, -- I didn't close it, and didn't intend to. This
bugreport was in state "can't reproduce, need more info", which is what it
really is on my side, so I don't really understand why you complain. Can you
elaborate please?
Post by Daniel Dickinson
About three days ago I installed Windows XP Professional (32-bit) SP1
using virt-manager with everything para-virtualized except storage
(upgrade to SP3 using an SP3 CD after the initial install, and before
virtio drivers). After SP3 I installed virtio for everything except
storage, then moved C: to virtio as well (and had to re-activate
Windows). I used networks I had previously defined and used with debian
guests. I have two macvtap 'networks', one to my LAN and the other to a
second NIC in the host which is used only occasionally for manually
configured networking (normally no cable). The third network is a
libvirt 'isolated network' (bridge adapter on the host, tun/tap adapter
on host as part of bridge, dnsmasq serves dhcp to the guests).
There are three cores assigned to the VM (of 4). The CPU is configured
to be kvm32.
Ok, this sounds fine. I don't understand for now how these macvtaps are
related, but that's probably not important.
Post by Daniel Dickinson
Post by Michael Tokarev
with and without libvirt, with dnsmasq or other DHCP servers, with virtio
or other virtual NICs works for many, many users and installations. In
particular, it surely works for me, not only for WinXP but for all other
guests I have (numerous windows, linux, *bsd and some other more exotic
ones).
Please at least provide version number of the virtio drivers, and try with
other kinds of virtual NICs.
Was always intending to, the bug report was an initial 'are you aware of
the problem' report.
In that case please state that in your bugreport. From how you see it,
you report a generic "works for all" issue without any details and
complain that the maintainer marks it as "nonreproducible". I didn't
know it is "are you aware of" sort of reports, and even if I did, I really
need more info, exactly the way I marked it, to which you complained.
Post by Daniel Dickinson
I found the problem has been reported before and
that you can't reproduce it in your environment. Again, saying so
instead of 'it surely works for me' as if it wasn't a real problem would
have been helpful.
Yet again, I had no idea that this problem had been reported before. The
issue in #647312 was _fixed_ a long time ago. I wasn't aware of the 2
last emails in there. And "it surely works for me" is true, so I don't
understand, do you want me to lie to you?
Post by Daniel Dickinson
It has been apparently seen upstream as well, but
the issue is reproducing it, but the impression I got from you was that
you didn't believe the problem actually existed, not that you really
were looking for more information.
Blah. Sorry, "this bugreport needs more info and I can't reproduce it"
is exactly what it means. If you read it differently, please clarify.
Even if my English isn't native (I'm Russian), I at least hope these
simple words are read correctly.
Post by Daniel Dickinson
Not sure what purpose my testing all kinds of NICs would be since that
has already been shown not to be the issue by other users. I did try
the e1000 driver and got the same results (DHCP OFFER sent by dnsmasq
but XP doesn't ACK it, and doesn't get an IP address).
When you deal with a problem, you have to isolate it first. If you disagree,
please refrain from filing bugreports, since if you're not cooperative it
is impossible to solve it. I worked in support department for many years,
and you should be able to understand this issue too (as a developer).
Users report all sorts of problems - like, "my favorite application does
not work" - and the causes are sometimes so surprising that it is difficult
to imagine them - like, they forgot to turn on the computer, or the network does
not function (no internet connection) etc etc. This is why the support
team always asks whether this happens on a nearby computer too, or whether
the user is able to run some other application, -- things like this.

It is _essential_ to try to isolate the issue before attempting to "solve"
it, because without understanding _where_ it is, it is impossible to solve.

So again, I don't understand what's wrong with my request to provide more
information.

I didn't ask you to try "all kinds of NICs", -- my only intention was
to understand whether this is virtio-driver-specific or network-subsystem-
specific, or maybe (like in #647312 - at least in the main part of it) -
interrupt-subsystem-specific. I asked to try _any_ other network, not _all_
other.

Now, thank you for trying and actually providing one essential bit. So I
conclude that other NICs behave the same, and it is "a bit" deeper problem
than could be if it were virtio-specific.
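
The isolation step described here can be written down as a small decision rule. The Python sketch below is only an illustration of that reasoning; `classify` is a made-up helper, and the input reflects what has been reported so far in this thread (DHCP failing with both virtio-net-pci and e1000).

```python
def classify(results):
    """results maps a NIC model name to True if DHCP succeeded with that model."""
    failing = {model for model, ok in results.items() if not ok}
    if not failing:
        return "not reproduced"
    if failing == set(results):
        return "not NIC-model-specific; look deeper"
    if all(model.startswith("virtio") for model in failing):
        return "virtio-specific"
    return "model-specific: " + ", ".join(sorted(failing))


# What the reporter has observed so far: both tested models fail.
print(classify({"virtio-net-pci": False, "e1000": False}))
# not NIC-model-specific; look deeper
```

Had e1000 worked while virtio-net-pci failed, the same rule would have pointed at the virtio driver or device instead.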
Post by Daniel Dickinson
Virtio Driver is from RedHat Inc 22/01/2013, version 51.64.104.5200 and
not digitally signed.
I initially had installed the driver from the RedHat iso
virtio-win-01.52.iso from their site, but changed to virtio drivers in
spice-guest-tools-0.52 from spice-space.
So these are latest available. But as you say above, it is not virtio-
specific, so we may forget about virtio and their versions (it was just
one of my two initial guesses).
Post by Daniel Dickinson
You say it is unreproducible. Have you tried using the version of
qemu-kvm in this bug report with WinXP SP1 upgraded to SP3, using the
mentioned virtio drivers, or do you do a 'close enough' attempt to
reproduce? i.e. how much do you reproduce the environment?
I don't have WinXP SP1, only SP3. It's been a long time since I removed
the last SP2 install CD image. Sure, there may be some difference between
winXP installed as SP1 and upgraded to SP3 compared with installed as SP3
from the beginning. However, we don't have reasons to believe that is the
case, at least not sufficient ones to seriously try it right at the arrival
of an "are you aware of the problem?" (as you say) bugreport. At least it is
definitely pointless without other, much easier and more obvious, steps first.

And if we were to mimic your environment _this_ close, I'll have to buy
the same hardware as yours, and follow every step of your host OS life
too, -- maybe _there_ is some key difference to the reproducer? But that
obviously will have to be after an attempt to install SP1 and upgrade it
to SP3, which is, in turn, after first obvious _simple_ steps.

As you have found from #647312, I did numerous installs of numerous windows
in numerous versions of qemu-kvm (that bugreport only mentions winXP, but
since I deal with other bugreports about other windows too, I sure tried
many other versions too). I installed fresh XPsp3 on wheezy qemu-kvm version
too (not exactly 1.1.2+dfsg-5, but indeed very close to that, and the changes
between -5 and what I've tried are irrelevant for our case).
Post by Daniel Dickinson
Also if you can give me a place to ftp or such the VM I can send it to
you (though it will take a while since I only have 800kbps upload and a
few GB to upload; or I could send a DVD).
Again, let's try easier steps first.

Speaking of providing an image -- do you have a reasonably small image
(a fresh WinXP fits into 1 GB IIRC, and can be compressed well further)?
Because at that speed, uploading many gigabytes will be just too slow for
you. I can provide some space for you to upload things; the limiting factor
will be your uplink.

Sending DVDs isn't practical I think: I'm in Russia and you're apparently
in Canada, so it will be slow and unnecessarily expensive.
Post by Daniel Dickinson
libvirt generates the following command line: (with some sanitization)
/usr/bin/kvm -S -M pc-1.1 -cpu kvm32 -enable-kvm -m 2048 -smp
Why -M pc-1.1? You said it is a regression from 0.12? Just to clarify:
are we talking about a fresh install of WinXP (you mentioned it in this
(second) email, but your first email was more about using an old
install)?

Why kvm32? I guess it is lack of documentation in action - there's no
mention anywhere of what all these {qemu,kvm}{32,64} actually mean. Oh
well.
Post by Daniel Dickinson
3,sockets=3,cores=1,threads=1 -name sanitizedxp -uuid
7c3c3bea-5435-df5e-ed3a-3a213c7d9f90 -nodefconfig -nodefaults -chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/sanitizedxp.monitor,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime
-no-shutdown -boot order=c,menu=on -device
piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device
virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x9 -drive
file=/home/VM/ISO/winxp_pro_sp1_psk.iso,if=none,id=drive-ide0-0-0,readonly=on,format=raw
-device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -drive
file=/home/VM/Storage/sanitizedxp.img,if=none,id=drive-virtio-disk0,format=qcow2
-device
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0
-netdev tap,fd=20,id=hostnet0,vhost=on,vhostfd=21 -device
virtio-net-pci,netdev=hostnet0,id=net0,mac=21:23:11:e2:e1:36,bus=pci.0,addr=0x3
-netdev tap,fd=22,id=hostnet1,vhost=on,vhostfd=23 -device
virtio-net-pci,netdev=hostnet1,id=net1,mac=52:21:30:40:a3:3c,bus=pci.0,addr=0x7
-netdev tap,fd=24,id=hostnet2 -device
virtio-net-pci,netdev=hostnet2,id=net2,mac=44:23:44:11:ef:3a,bus=pci.0,addr=0x8
So, you use vhost-net, and you have 3 NICs. Now I think I'm beginning
to understand why you mentioned the other macvtaps. FWIW, macvtaps didn't
work well for a long time, but I haven't tried them recently either
(it is mostly about the kernel; for one thing, they were very slow).

Please try without vhost, and with only one NIC.
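
A rough sketch of what such a reduced test could look like when the guest
is started by hand rather than through libvirt. The tap interface name, the
MAC address and the VNC display are assumed placeholders, not values from
this report, and the tap device has to exist already and be attached to the
isolated network's bridge. The script only prints the command line so it
can be reviewed before running it:

```shell
#!/bin/sh
# Sketch only: print a reduced kvm command line with ONE virtio NIC and
# vhost explicitly switched off. TAP_IF, MAC and the VNC display are
# placeholders (assumptions); DISK is the image path quoted in the report.
TAP_IF="${TAP_IF:-tap0}"
DISK="${DISK:-/home/VM/Storage/sanitizedxp.img}"
MAC="${MAC:-52:54:00:12:34:56}"

build_kvm_cmd() {
    # printf repeats the '%s ' format for every argument, so the result
    # is a single space-separated command line.
    printf '%s ' \
        /usr/bin/kvm -M pc-1.1 -cpu kvm32 -enable-kvm -m 2048 -smp 3 \
        -drive "file=$DISK,if=virtio,format=qcow2" \
        -netdev "tap,id=hostnet0,ifname=$TAP_IF,script=no,downscript=no,vhost=off" \
        -device "virtio-net-pci,netdev=hostnet0,id=net0,mac=$MAC" \
        -vnc 127.0.0.1:0
    printf '\n'
}

build_kvm_cmd
```

If the guest gets its lease with this command line but not with the one
libvirt generates, the difference is in vhost or in the extra NICs.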
Post by Daniel Dickinson
-chardev pty,id=charserial0 -device
isa-serial,chardev=charserial0,id=serial0 -chardev
spicevmc,id=charchannel0,name=vdagent -device
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0
-device usb-tablet,id=input0 -spice
port=5900,addr=127.0.0.1,disable-ticketing -vga qxl -global
qxl-vga.vram_size=67108864 -device AC97,id=sound0,bus=pci.0,addr=0x4
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
The rest does not seem to be very interesting.
Post by Daniel Dickinson
I hope that provides sufficient information for you to reproduce the
problem.
Well. After your second email the main question to answer is whether
you want to argue or to solve the issue. I have mixed feelings about
this part. If you want to solve it, please let's actually try to
find the problematic place/option/component/combination.

Except for the 3 NICs, you've shown a standard way to install a guest in
qemu-kvm & libvirt, which is used by many people worldwide every day,
so I think you know the answer to the "reproduce" hope. We have to
find what on your machine is different from the others, which work.

But before that, please answer a few questions, which I already raised above.

1. Do you observe the issue with fresh install of winXP in qemu-kvm 1.1.2,
or does it only occur when you install winXP in some older version of qemu
and upgrade it to 1.1.2?

2. Does the problem occur only with DHCP packets or with network in general?
I mean, if you configure network manually, does it work?

3. vhost_net and 3 NICs as I mentioned - please try without vhost and with
only one NIC.

4. How do you install Windows? I ask because sometimes people use dedicated
tools (like virt-install for Linux) which use different qemu-kvm options
(like -no-acpi), which leads to surprises. So, is it the same command line
as was used to _install_ Windows?

5. Please confirm that the problem really exists with non-virtio NICs too.
I don't know whether you were just shooting/arguing or actually tried
other NIC(s). (Just one other NIC type is sufficient.)
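
For question 2, one way to see how far the DHCP exchange gets is to watch
the host side of the isolated network while the guest requests a lease.
This is a minimal sketch: the bridge name virbr1 is an assumption (use
whatever bridge libvirt created for the isolated network), and the script
only prints the tcpdump command, which then has to be run as root on the
host:

```shell
#!/bin/sh
# Sketch: build a tcpdump invocation that shows DHCP traffic (UDP ports
# 67 and 68) on the host bridge of the isolated network.
# The bridge name is an assumption; adjust it to the local setup.
BRIDGE="${BRIDGE:-virbr1}"

dhcp_capture_cmd() {
    # -n: no name resolution, -e: show MAC addresses, -vv: decode DHCP options
    echo "tcpdump -n -e -vv -i $1 udp port 67 or udp port 68"
}

dhcp_capture_cmd "$BRIDGE"
```

Which messages show up on the bridge (DISCOVER, OFFER, REQUEST, ACK) tells
us where the exchange stops. A manually configured address in the guest
then shows whether ordinary traffic is affected as well.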

And if all the above "fails", do you have a WinXP SP3 install? There's a
small chance that this (SP1 vs SP3) is what makes the difference (yes, the
chance is small), so we can try to sort this out too. I can give you such
an install image (CD) if you want - your downlink should be enough to grab
one CD quickly.


The thing is: _you_ have the environment where you observe the issue. Many
other people worldwide don't see these issues, me included, so it isn't just
about "install this on this and you'll see it fail". So your work is essential
here; without your help it is impossible to do anything about your issue. If
you want to argue instead of actually helping, that's your choice. But I at
least hope I've shown you that I in no way wanted to offend you, or to avoid
dealing with the issue or something. I have spent a significant amount of time
on you already, answering your every point and describing my reasons in full
detail, and this is definitely not the most useful way to spend time. Maybe my
initial tone was a bit harsh, I dunno - but if it was, it was a natural
reaction to a bug report about something which works for (at least nearly)
everyone, without any details (and without a mention that it is just an "are
you aware of" sort of report). So, we either work on this issue or we don't.
If we do, I need your help.

Thanks,

/mjt
Daniel Dickinson
2013-02-24 16:53:02 UTC
Post by Michael Tokarev
Post by Daniel Dickinson
Post by Michael Tokarev
Control: tag -1 + unreproducible moreinfo
Post by Daniel Dickinson
Package: qemu-kvm
Version: 1.1.2+dfsg-5
Severity: normal
With the version of qemu-kvm in Wheezy (testing) (1.1.2+dfsg-5) a Windows XP guest fails to get a DHCP address from dnsmasq started by libvirt-bin (both from Wheezy). Downgrading to the Squeeze version of qemu-kvm works around the issue (i.e. the Windows XP guest gets DHCP). I have also tried upgrading dnsmasq with wheezy qemu-kvm, and tried using experimental qemu-kvm, but neither works.
This is obviously a regression from 0.12<...> (squeeze version).
Oh, I am using virtio networking for kvm and netkvm.sys (virtio for XP) from spice-space (spice client guest-tools for Windows).
Well. You have to be *much* more specific here. DHCP in WinXP clients,
I have found bug #647312. Apparently this is not a new problem, but one
for which you have not found a consistent reproducible case. If you
could have stated that rather than simply marking it as unreproducible and
moreinfo it'd be appreciated. As a working developer I know that not having
information when you need it is frustrating, but at the same time
sometimes all you need is to know there is a problem, and also some
debian developers dump on you when you provide info they consider stupid
to provide or as irrelevant meaningless details, and I hadn't found that
bug before. In addition if all you needed was a little info providing
scads of detail would be a waste of time.
Um. That bug -- #647312 -- is about a different issue. I remember spending
lots of time trying various combinations and arguing with Vladimir Stavrinov,
but we finally found the key combination - which only he could know was used
initially (somehow it never occurred to me to actually *try* -no-acpi, even
though I suggested it at the very beginning).
Once we knew the way to trigger it, we found the cause and the actual fix,
so I closed the bug report quite some time ago, and I don't have a reason
to rehash it again.
Now, after re-reading that bug report, I see there are 2 messages at the end
to which I didn't reply and which I don't see here in my qemu folder. Somehow
I haven't received the two. Thank you for pointing this bug report out to me,
I'll ping Vladimir about his last two emails in that bug report.
So I wasn't able to "simply state that I found no reproducer for #647312",
because we found at least the original cause and actually _fixed_ it.
Ah, ok, I saw the last two messages and didn't clue in to the fact they
were after the bug was closed and might not have been seen.
Post by Michael Tokarev
"Simply marking it" as unreproducible _is_ stating that there's no reproducer,
nothing more, nothing less -- I didn't close it, and didn't intend to. This
bug report was in the state "can't reproduce, need more info", which is what it
really is on my side, so I don't really understand why you complain. Can you
elaborate please?
I wasn't complaining about moreinfo, but about unreproducible, because to me
unreproducible means "I've tried it, based on the bug report and with the
same version, and can't get the same result", and it seemed to me you were
saying you didn't have enough information to test the scenario, but also
marked it unreproducible even though you hadn't tried to reproduce it.
Post by Michael Tokarev
Post by Daniel Dickinson
About three days ago I installed Windows XP Professional (32-bit) SP1
using virt-manager with everything para-virtualized except storage
(upgrade to SP3 using an SP3 CD after the initial install, and before
virtio drivers). After SP3 I installed virtio for everything except
storage, then moved C: to virtio as well (and had to re-activate
Windows). I used networks I had previously defined and used with debian
guests. I have two macvtap 'networks', one to my LAN and the other to a
second NIC in the host which is used only occasionally for manually
configured networking (normally no cable). The third network is a
libvirt 'isolated network' (bridge adapter on the host, tun/tap adapter
on host as part of bridge, dnsmasq serves dhcp to the guests).
There are three cores assigned to the VM (of 4). The CPU is configured
to be kvm32.
Ok, this sounds fine. I don't understand for now how these macvtaps are
related, but that's probably not important.
Post by Daniel Dickinson
Post by Michael Tokarev
with and without libvirt, with dnsmasq or other DHCP servers, with virtio
or other virtual NICs works for many, many users and installations. In
particular, it surely works for me, not only for WinXP but for all other
guests I have (numerous windows, linux, *bsd and some other more exotic
ones).
Please at least provide version number of the virtio drivers, and try with
other kinds of virtual NICs.
Was always intending to, the bug report was an initial 'are you aware of
the problem' report.
In that case please state that in your bugreport. From how you see it,
you report a generic "works for all" issue without any details and
complain that the maintainer marks it as "nonreproducible". I didn't
know it is "are you aware of" sort of reports, and even if I did, I really
need more info, exactly the way I marked it, to which you complained.
Actually it wasn't the moreinfo that was my complaint, it was the
unreproducible. I guess it means something different to me than to you.
(At my work unreproducible/worksforme means you've actually tried it
after reading the bug report and setting things up the same way (as
closely as possible), and when there is not enough information, getting the
information before marking worksforme.)
Post by Michael Tokarev
Post by Daniel Dickinson
I found the problem has been reported before and
that you can't reproduce it in your environment. Again, saying so
instead of 'it surely works for me' as if it wasn't a real problem would
have been helpful.
Yet again, I had no idea that this problem had been reported before. The
issue in #647312 was _fixed_ a long time ago. I wasn't aware of the 2
last emails in there. And "it surely works for me" is true, so I don't
understand: do you want me to lie to you?
Post by Daniel Dickinson
It has been apparently seen upstream as well, but
the issue is reproducing it, but the impression I got from you was that
you didn't believe the problem actually existed, not that you really
were looking for more information.
Blah. Sorry, "this bugreport needs more info and I can't reproduce it"
is exactly what it means. If you read it differently, please clarify.
Even if my English isn't native (I'm Russian), I at least hope these
simple words are read correctly.
Actually I spend too much time without being around actual people and
start trying to read too much into email (that is, all the stuff that
normally comes from body language and tone).

Sorry.
Post by Michael Tokarev
Post by Daniel Dickinson
Not sure what purpose my testing all kinds of NICs would be since that
has already been shown not to be the issue by other users. I did try
the e1000 driver and got the same results (DHCP OFFER sent by dnsmasq
but XP doesn't ACK it, and doesn't get an IP address).
When you deal with a problem, you have to isolate it first. If you disagree,
please refrain from filing bug reports, since if you're not cooperative it
is impossible to solve it. I worked in a support department for many years,
and you should be able to understand this too (as a developer).
Users report all sorts of problems - like, "my favorite application does
not work" - and the causes are sometimes so surprising they are difficult
to imagine - like, they forgot to turn on the computer, or the network does
not function (no internet connection), etc. This is why the support
team always asks whether this happens on a nearby computer too, or whether
the user is able to run some other application -- things like this.
It is _essential_ to try to isolate the issue before attempting to "solve"
it, because without understanding _where_ it is, it is impossible to solve.
So again, I don't understand what's wrong with my request to provide more
information.
I didn't ask you to try "all kinds of NICs" -- my only intention was
to understand whether this is virtio-driver-specific or network-subsystem-
specific, or maybe (like in #647312 - at least in the main part of it)
interrupt-subsystem-specific. I asked you to try _any_ other NIC, not _all_
others.
Sorry I misread.
Post by Michael Tokarev
Now, thank you for trying and actually providing one essential bit. So I
conclude that other NICs behave the same, and it is "a bit" deeper problem
than could be if it were virtio-specific.
I read your second email, and yes I actually did try the e1000 setting so
it is not virtio-specific.
Post by Michael Tokarev
Post by Daniel Dickinson
Virtio Driver is from RedHat Inc 22/01/2013, version 51.64.104.5200 and
not digitally signed.
I initially had installed the driver from the RedHat iso
virtio-win-01.52.iso from their site, but changed to virtio drivers in
spice-guest-tools-0.52 from spice-space.
So these are latest available. But as you say above, it is not virtio-
specific, so we may forget about virtio and their versions (it was just
one of my two initial guesses).
Post by Daniel Dickinson
You say it is unreproducible. Have you tried using the version of
qemu-kvm in this bug report with WinXP SP1 upgraded to SP3, using the
mentioned virtio drivers, or do you do a 'close enough' attempt to
reproduce? i.e. how much do you reproduce the environment?
I don't have WinXP SP1, only SP3. It's been a long time since I removed
the last SP2 install CD image. Sure, there may be some difference between
WinXP installed as SP1 and upgraded to SP3 and WinXP installed as SP3
from the beginning. However, we have no reason to believe that is the
case, at least none sufficient to seriously try it right at the arrival
of an "are you aware of the problem?" (as you say) bug report. At least it
is definitely pointless without trying other, much easier and more obvious,
steps first.
And if we were to mimic your environment _this_ closely, I'd have to buy
the same hardware as yours, and follow every step of your host OS's life
too -- maybe _there_ is some key difference for the reproducer? But that
obviously would have to come after an attempt to install SP1 and upgrade it
to SP3, which, in turn, comes after the first obvious _simple_ steps.
Certainly, didn't mean to imply otherwise, sorry I'm not being clear.
Post by Michael Tokarev
As you have found from #647312, I did numerous installs of numerous Windows
versions in numerous versions of qemu-kvm (that bug report only mentions
WinXP, but since I deal with other bug reports about other Windows versions
too, I certainly tried many others). I also installed a fresh XP SP3 on the
wheezy qemu-kvm version (not exactly 1.1.2+dfsg-5, but very close to it, and
the changes between -5 and what I tried are irrelevant for our case).
Ok, fair enough. I'm bringing in too much baggage from other
experiences, not your fault. Sorry.
Post by Michael Tokarev
Post by Daniel Dickinson
Also if you can give me a place to ftp or such the VM I can send it to
you (though it will take a while since I only have 800kbps upload and a
few GB to upload; or I could send a DVD).
Again, let's try easier steps first.
Speaking of providing an image -- do you have a reasonably small image
(a fresh WinXP fits into 1 GB IIRC, and can be compressed well further)?
Because at that speed, uploading many gigabytes will be just too slow for
you. I can provide some space for you to upload things; the limiting factor
will be your uplink.
Sending DVDs isn't practical I think: I'm in Russia and you're apparently
in Canada, so it will be slow and unnecessarily expensive.
Actually sending a DVD isn't that bad, but I agree that we should try
the easier steps first. If I were to send you something heavy or big on
the other hand, it'd be a nightmare.

I, unfortunately, didn't make a copy of the VM prior to installing SP3
and doing updates. There's nothing on it except that, but it's still
big enough.
Post by Michael Tokarev
Post by Daniel Dickinson
libvirt generates the following command line: (with some sanitization)
/usr/bin/kvm -S -M pc-1.1 -cpu kvm32 -enable-kvm -m 2048 -smp
That was what was automatically put in place by libvirt. And yes
regression from 0.12.
Post by Michael Tokarev
are we talking about a fresh install of WinXP (you mentioned it in this
(second) email, but your first email was more about using an old
install)?
Ok, the image was installed about three days ago using virt-manager. I
downgraded qemu-kvm to 0.12 to see if it made a difference (e.g. because
I ran out of configuration things it could be; my initial assumption
was that I had some bad config in DHCP or some such).
Post by Michael Tokarev
Why kvm32? I guess it is lack of documentation in action - there's no
mention anywhere of what all these {qemu,kvm}{32,64} actually mean. Oh
well.
Yeah, I was making sure to force 32-bit. Docs would be nice, but I
understand quite well why they don't exist (too bad docs don't write
themselves, eh?)
Post by Michael Tokarev
Post by Daniel Dickinson
3,sockets=3,cores=1,threads=1 -name sanitizedxp -uuid
7c3c3bea-5435-df5e-ed3a-3a213c7d9f90 -nodefconfig -nodefaults -chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/sanitizedxp.monitor,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime
-no-shutdown -boot order=c,menu=on -device
piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device
virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x9 -drive
file=/home/VM/ISO/winxp_pro_sp1_psk.iso,if=none,id=drive-ide0-0-0,readonly=on,format=raw
-device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -drive
file=/home/VM/Storage/sanitizedxp.img,if=none,id=drive-virtio-disk0,format=qcow2
-device
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0
-netdev tap,fd=20,id=hostnet0,vhost=on,vhostfd=21 -device
virtio-net-pci,netdev=hostnet0,id=net0,mac=21:23:11:e2:e1:36,bus=pci.0,addr=0x3
-netdev tap,fd=22,id=hostnet1,vhost=on,vhostfd=23 -device
virtio-net-pci,netdev=hostnet1,id=net1,mac=52:21:30:40:a3:3c,bus=pci.0,addr=0x7
-netdev tap,fd=24,id=hostnet2 -device
virtio-net-pci,netdev=hostnet2,id=net2,mac=44:23:44:11:ef:3a,bus=pci.0,addr=0x8
So, you use vhost-net, and you have 3 NICs. Now I think I'm beginning
to understand why you mentioned the other macvtaps. FWIW, macvtaps didn't
work well for a long time, but I haven't tried them recently either
(it is mostly about the kernel; for one thing, they were very slow).
Please try without vhost, and with only one NIC.
Can you do that from virt-manager (the without vhost bit)? If not, I'll
have to also change to vnc so I can connect without libvirt stuff.
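
For reference, a sketch of how this can be done at the libvirt level
instead: the domain XML accepts a <driver name='qemu'/> element inside
<interface>, which selects the userspace virtio backend rather than vhost.
The network name "isolated" below is a hypothetical placeholder, and the
script only prints the fragment:

```shell
#!/bin/sh
# Sketch: print a libvirt <interface> fragment that keeps the virtio NIC
# model but forces the userspace (non-vhost) backend.
# The network name is a placeholder (assumption), not from the report.
NETWORK="${NETWORK:-isolated}"

print_interface_xml() {
    cat <<EOF
<interface type='network'>
  <source network='$1'/>
  <model type='virtio'/>
  <driver name='qemu'/>
</interface>
EOF
}

print_interface_xml "$NETWORK"
```

The <driver name='qemu'/> line would go into the existing interface
definition via "virsh edit" on the domain, keeping the <mac> and <address>
elements that are already there, so spice and virt-manager can stay in use.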
Post by Michael Tokarev
Post by Daniel Dickinson
-chardev pty,id=charserial0 -device
isa-serial,chardev=charserial0,id=serial0 -chardev
spicevmc,id=charchannel0,name=vdagent -device
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0
-device usb-tablet,id=input0 -spice
port=5900,addr=127.0.0.1,disable-ticketing -vga qxl -global
qxl-vga.vram_size=67108864 -device AC97,id=sound0,bus=pci.0,addr=0x4
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
The rest does not seem to be very interesting.
Post by Daniel Dickinson
I hope that provides sufficient information for you to reproduce the
problem.
Well. After your second email the main question to answer is whether
you want to argue or to solve the issue. I have mixed feelings about
this part. If you want to solve it, please let's actually try to
find the problematic place/option/component/combination.
Except for the 3 NICs, you've shown a standard way to install a guest in
qemu-kvm & libvirt, which is used by many people worldwide every day,
so I think you know the answer to the "reproduce" hope. We have to
find what on your machine is different from the others, which work.
But before that, please answer a few questions, which I already raised above.
1. Do you observe the issue with fresh install of winXP in qemu-kvm 1.1.2,
or does it only occur when you install winXP in some older version of qemu
and upgrade it to 1.1.2?
Fresh install on qemu-kvm 1.1.2
Post by Michael Tokarev
2. Does the problem occur only with DHCP packets or with network in general?
I mean, if you configure network manually, does it work?
3. vhost_net and 3 NICs as I mentioned - please try without vhost and with
only one NIC.
I'll try the one NIC for sure, vhost might be harder (I don't use the
command line to launch kvm).
Post by Michael Tokarev
4. How do you install Windows? I ask because sometimes people use dedicated
tools (like virt-install for Linux) which use different qemu-kvm options
(like -no-acpi), which leads to surprises. So, is it the same command line
as was used to _install_ Windows?
I used virt-manager, with the same config, except disk was IDE.
Post by Michael Tokarev
5. Please confirm that the problem really exists with non-virtio NICs too.
I don't know whether you were just shooting/arguing or actually tried
other NIC(s). (Just one other NIC type is sufficient.)
Actually did try, yes.
Post by Michael Tokarev
And if all the above "fails", do you have a WinXP SP3 install? There's a
small chance that this (SP1 vs SP3) is what makes the difference (yes, the
chance is small), so we can try to sort this out too. I can give you such
an install image (CD) if you want - your downlink should be enough to grab
one CD quickly.
If you could that would be great. I do not have a straight SP3 (but I
think you're right that's probably not the issue).
Post by Michael Tokarev
The thing is: _you_ have the environment where you observe the issue. Many
other people worldwide don't see these issues, me included, so it isn't just
about "install this on this and you'll see it fail". So your work is essential
here; without your help it is impossible to do anything about your issue. If
you want to argue instead of actually helping, that's your choice. But I at
least hope I've shown you that I in no way wanted to offend you, or to avoid
dealing with the issue or something. I have spent a significant amount of time
on you already, answering your every point and describing my reasons in full
detail, and this is definitely not the most useful way to spend time. Maybe my
initial tone was a bit harsh, I dunno - but if it was, it was a natural
reaction to a bug report about something which works for (at least nearly)
everyone, without any details (and without a mention that it is just an "are
you aware of" sort of report). So, we either work on this issue or we don't.
If we do, I need your help.
And my apologies. I've gotten a little squirrelly lately (I work
remotely so I spend most of my day with no human contact, and it's been
at least a year or year and a half since I've seen another developer or
even techie in person). (I'm also in a small town and have no driver's
license).

Regards,

Daniel
--
<erno> hm. I've lost a machine.. literally _lost_. it responds to ping,
it works completely, I just can't figure out where in my apartment it
is.
Daniel Dickinson
2013-02-24 21:32:30 UTC
Hi Michael,

I have taken a working VM (from previous email) and added virtio drivers
from spice-space (guest-tools installer) and ended up with a
non-working system. I will try the same config with ne2k (which was
working) and see if it is broken now that virtio is installed (I'm
wondering this because e1000 drivers didn't work in the other VM). I
will also try removing all ethernet drivers, shutting down the VM, and
making only the offending NIC remain (the macvtap ones work).

Regards,

Daniel
--
<erno> hm. I've lost a machine.. literally _lost_. it responds to ping,
it works completely, I just can't figure out where in my apartment it
is.
Michael Tokarev
2013-02-25 05:53:34 UTC
Post by Daniel Dickinson
Hi Michael,
I have taken a working VM (from previous email) and added virtio drivers
from spice-space (guest-tools installer) and ended up with a
non-working system.
Excellent.

That's exactly what I call "unreproducible" -- it never occurred to me that
these drivers may have this effect.

So now we have a more or less easy way to reproduce it and hence to try to
fix it. Good work!

Note that this is unlikely to be a regression, since these drivers aren't
used with the 0.12 version of qemu-kvm - it just does not provide the
necessary virtual devices for these drivers to run.
Post by Daniel Dickinson
I will try the same config with ne2k (which was
working) and see if it is broken now that virtio is installed (I'm
wondering this because e1000 drivers didn't work in the other VM). I
A better candidate is rtl8139 instead of ne2k.

e1000 needs drivers from Intel for this card; the ones which come with
WinXP won't work as they are too old.

Now that's, again, an interesting point. So, did you actually try other
NICs? You said you tried e1000, and it didn't work - I assumed it had
the same problem as virtio one. But now you're saying it does not work
at all?
Post by Daniel Dickinson
will also try removing all ethernet drivers, shutting down the VM, and
making only the offending NIC remain (the macvtap ones work).
Oww. This is even more interesting. There's no need to remove other
vNICs really, -- it was my (wrong) guess, I thought maybe win behaves
somehow differently when it has more than one NIC and thus enables
some sort of router/firewall mode.

But the difference between macvtap and bridge vNICs is a very good point. It
means more things to try; maybe there are several issues present.

What might also be useful is to try configuring the network manually and
checking whether it is only DHCP or the whole network that has the problem. I
already asked you to do so in the previous email.


I'll try to play with all this, now there's finally quite something to
play with. Good job!

Thank you!

/mjt
Daniel Dickinson
2013-02-25 06:19:12 UTC
On 25/02/13 12:53 AM, Michael Tokarev wrote:
[snip]
Post by Michael Tokarev
Note that this is unlikely to be a regression, since these drivers aren't
used with the 0.12 version of qemu-kvm - it just does not provide the
necessary virtual devices for these drivers to run.
Even the network ones? Not sure how the network was working then
because I had libvirt set to virtio, and the virtio drivers on the guest.
Post by Michael Tokarev
Post by Daniel Dickinson
I will try the same config with ne2k (which was
working) and see if it is broken now that virtio is installed (I'm
wondering this because e1000 drivers didn't work in the other VM). I
A better candidate is rtl8139 instead of ne2k.
e1000 needs drivers from Intel for this card; the ones which come with
WinXP won't work as they are too old.
Yeah, I used intel's driver. Is rtl8139 in XP? I thought it wasn't,
which is why I did ne2k.
Post by Michael Tokarev
Now that's, again, an interesting point. So, did you actually try other
NICs? You said you tried e1000, and it didn't work - I assumed it had
the same problem as virtio one. But now you're saying it does not work
at all?
Sorry, that's me not being clear. It was the same problem as virtio (no
DHCP).
Post by Michael Tokarev
Post by Daniel Dickinson
will also try removing all ethernet drivers, shutting down the VM, and
making only the offending NIC remain (the macvtap ones work).
Oww. This is even more interesting. There's no need to remove other
vNICs really, -- it was my (wrong) guess, I thought maybe win behaves
somehow differently when it has more than one NIC and thus enables
some sort of router/firewall mode.
But the difference between macvtap and bridge vNICs is a very good point. It
means more things to try; maybe there are several issues present.
What might also be useful is to try configuring the network manually and
checking whether it is only DHCP or the whole network that has the problem. I
already asked you to do so in the previous email.
Sorry, forgot. I will have to create a new VM in a bad state to
test that.
Post by Michael Tokarev
I'll try to play with all this, now there's finally quite something to
play with. Good job!
Thank you!
And thank you for being patient despite my complaints, and for looking
into this. In a later email you'll note I discovered how to get out of
the bad state (basically do an "update driver", don't let Windows use
Windows Update, and specifically set the search to the directory
spice-space installs the driver into).

Regards,

Daniel
--
<erno> hm. I've lost a machine.. literally _lost_. it responds to ping,
it works completely, I just can't figure out where in my apartment it
is.
Michael Tokarev
2013-02-26 13:55:37 UTC
Post by Daniel Dickinson
[snip]
Post by Michael Tokarev
Note that this is unlikely to be a regression, since these drivers aren't
used with the 0.12 version of qemu-kvm - it just does not provide the
necessary virtual devices for these drivers to run.
Even the network ones? Not sure how the network was working then
because I had libvirt set to virtio, and the virtio drivers on the guest.
No, the network drivers (netkvm) are obviously being used. I was referring
to the display (qxl) and balloon drivers.
Post by Daniel Dickinson
Post by Michael Tokarev
A better candidate is rtl8139 instead of ne2k.
e1000 needs drivers from Intel for this card; the ones which come with
WinXP won't work as they are too old.
Yeah, I used intel's driver. Is rtl8139 in XP? I thought it wasn't,
which is why I did ne2k.
rtl8139 is supported in XP out of the box.
Post by Daniel Dickinson
Post by Michael Tokarev
Now that's, again, an interesting point. So, did you actually try other
NICs? You said you tried e1000, and it didn't work - I assumed it had
the same problem as virtio one. But now you're saying it does not work
at all?
Sorry, that's me not being clear. It was the same problem as virtio (no
DHCP).
Ah ok.

So, again, I somehow fail to see how this can be a problem. These drivers
(from spice-guest-tools-0.52.exe) don't modify/substitute Windows network
components except for the device drivers themselves - in this case
netkvm.sys. All layers below that remain the same, and the e1000
drivers are not touched either. So network functionality should not be
affected.

Well. If it really is a bug in qemu, it might be some code path which
is executed when we install an additional driver (e.g. vioser, or something
else), and it does something in qemu which "damages" the network part.
Post by Daniel Dickinson
Post by Michael Tokarev
Post by Daniel Dickinson
will also try removing all ethernet drivers, shutting down the VM, and
making only the offending NIC remain (the macvtap ones work).
Oww. This is even more interesting. There's no need to remove other
vNICs really, -- it was my (wrong) guess, I thought maybe win behaves
somehow differently when it has more than one NIC and thus enables
some sort of router/firewall mode.
But the difference between macvtap and bridge vNICs is a very good point. It
means more things to try; maybe there are several issues present.
What might also be useful is to try configuring the network manually and
checking whether it is only DHCP or the whole network that has the problem. I
already asked you to do so in the previous email.
Sorry, forgot. I will have to create a new VM in a bad state to
test that.
Please try it. Because...
Post by Daniel Dickinson
Post by Michael Tokarev
I'll try to play with all this, now there's finally quite something to
play with. Good job!
...because I still can't reproduce the issue, no matter if I start with
XP SP3 or SP1, no matter if I install to virtio right away or if I change
it later, no matter if I use an older version of the virtio drivers first and
so on -- it always Just Works for me.

Maybe these 3 additional NICs in the guest are the key point of this, I
dunno. At least it isn't easy for me to recreate your somewhat complex
setup here.

But the more we dig into this, the more it looks like a problem with the
drivers on the Windows side. I used Windows in the past, but I'm not in any
way an expert in its driver area; I don't know how it all works. Maybe it
is better if you ask more knowledgeable people about this. Mailing
qemu-***@nongnu.org might be appropriate for this.

Thanks,

/mjt
