[Portaudio] Proper fix for Windows Unicode issues, and a couple more things.

Discussion:

Gregorio Litenstein

2018-06-28 17:51:16 UTC

Hey, I’m one of the developers of Performous (cross-platform karaoke game),
we’ve been using portaudio for a while on Unix/Mac/Windows.

I’d initially written a really long post with an issue report but then I
realized a) It was, after pulling most of my hair out, an issue on our side
after all. And b) I sent it to the wrong address so I guess it never actually
made it into the list. Anyway… I’m back with a couple more real issues and
at least one fix.

First, we were facing issues with the display of unicode text in Windows;
essentially the same issue reported here:
https://lists.columbia.edu/pipermail/portaudio/2016-December/000961.html

I took a look at the hostapi implementations and noticed that only some
changed their behavior depending on whether UNICODE was defined or not,
while others always used CP_UTF8.

Initially I thought my issue might have been related to that, so I did some
testing and figured out that in recent versions of Windows, defining
UNICODE (i.e. having everything use CP_UTF8) made the text uniformly
garbled unless I checked the new setting “Use Unicode UTF-8 worldwide” or
something like that. And by contrast, if that setting was off, CP_ACP
properly rendered the text.

It appears that what this setting actually does is set the codepage to
UTF-8 (65001).

With this in mind, I created a patch modifying the behavior so instead of
checking for the definition of UNICODE or _UNICODE, portaudio checks (at
runtime) for the current codepage using GetACP(). If it’s 65001 it uses
CP_UTF8; if not, it uses CP_ACP. I tested it on a laptop running Windows 10
Single Language Spanish and the text rendered appropriately both with the
setting turned on and with the setting turned off.
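
For readers following along, a minimal sketch of the runtime check described above. It is not the patch itself: the function names and the SKETCH_ constants are hypothetical, and the non-Windows stand-in exists only so the logic can be compiled and exercised anywhere. Only GetACP() and the numeric values of CP_ACP (0) and CP_UTF8 (65001) come from the Win32 API.

```c
/* Numeric values of the Win32 constants, repeated so the sketch builds anywhere. */
#define SKETCH_CP_ACP   0u      /* CP_ACP: the system's active ANSI code page */
#define SKETCH_CP_UTF8  65001u  /* CP_UTF8 */

#ifdef _WIN32
#include <windows.h>
/* Ask Windows for the active ANSI code page. */
static unsigned active_code_page(void) { return (unsigned)GetACP(); }
#else
/* Stand-in for other platforms; 1252 is an arbitrary example value. */
static unsigned active_code_page(void) { return 1252u; }
#endif

/* Hypothetical helper: choose the code page used to convert device names.
 * UTF-8 is chosen only when the active code page is already 65001. */
unsigned choose_device_name_code_page(unsigned acp)
{
    return (acp == SKETCH_CP_UTF8) ? SKETCH_CP_UTF8 : SKETCH_CP_ACP;
}

/* Hypothetical convenience wrapper around the runtime query. */
unsigned current_device_name_code_page(void)
{
    return choose_device_name_code_page(active_code_page());
}
```

The result would be passed as the first argument of WideCharToMultiByte() in place of a compile-time choice based on UNICODE.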

I will attach the diff file here but TBH I have no idea whether it is
possible to attach files to these lists, so you can also get it from
Dropbox below. I’m not opening a ticket/submitting a PR because Assembla is
paid.

https://www.dropbox.com/s/le6zyr1zjv6mank/pa_patch.diff?dl=1

Now, on to the “couple more things”… it’s actually just one thing. The
GetVersion() function is (still) giving erroneous results. If compiling
portaudio using mingw-w64 (and thus not using WinRT), WASAPI gets the
windows version using

dwVersion = fnGetVersion();

// Get the Windows version
dwMajorVersion = (DWORD)(LOBYTE(LOWORD(dwVersion)));
dwMinorVersion = (DWORD)(HIBYTE(LOWORD(dwVersion)));

switch (dwMajorVersion)
{
case 0:
case 1:
case 2:
case 3:
case 4:
case 5:
    break; // skip lower
case 6:
    switch (dwMinorVersion)
    {
    case 0: version = WINDOWS_VISTA_SERVER2008; break;
    case 1: version = WINDOWS_7_SERVER2008R2; break;
    case 2: version = WINDOWS_8_SERVER2012; break;
    case 3: version = WINDOWS_8_1_SERVER2012R2; break;
    default: version = WINDOWS_FUTURE; break;
    }
    break;
case 10:
    switch (dwMinorVersion)
    {
    case 0: version = WINDOWS_10_SERVER2016; break;
    default: version = WINDOWS_FUTURE; break;
    }
    break;
default:
    version = WINDOWS_FUTURE;
    break;
}

But, from my tests, I noticed in practice Windows 10 (with latest updates)
returns the same value as Windows 8 (i.e. dwMajorVersion=6,
dwMinorVersion=2) and thus Win10 computers might end up using IAudioClient2.
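
This observation matches documented Windows behavior: since Windows 8.1, GetVersion() and GetVersionEx() report 6.2 to any executable whose manifest does not declare compatibility with the newer release. The sketch below only illustrates the consequence for the switch statement above. The enum and function names are hypothetical stand-ins, not PortAudio's, and the packed values in the usage note are examples.

```c
#include <stdint.h>

/* Hypothetical stand-in for the version enum used in the snippet above. */
typedef enum {
    VER_UNSUPPORTED,
    VER_WINDOWS_VISTA,
    VER_WINDOWS_7,
    VER_WINDOWS_8,
    VER_WINDOWS_8_1,
    VER_WINDOWS_10,
    VER_WINDOWS_FUTURE
} SketchWinVersion;

/* Same bit extraction as LOBYTE(LOWORD(v)) and HIBYTE(LOWORD(v)). */
static unsigned packed_major(uint32_t packed) { return packed & 0xFFu; }
static unsigned packed_minor(uint32_t packed) { return (packed >> 8) & 0xFFu; }

/* Classify a packed GetVersion()-style value. */
SketchWinVersion classify_version(uint32_t packed)
{
    unsigned major = packed_major(packed);
    unsigned minor = packed_minor(packed);

    if (major < 6)
        return VER_UNSUPPORTED;
    if (major == 6) {
        switch (minor) {
        case 0:  return VER_WINDOWS_VISTA;
        case 1:  return VER_WINDOWS_7;
        case 2:  return VER_WINDOWS_8;
        case 3:  return VER_WINDOWS_8_1;
        default: return VER_WINDOWS_FUTURE;
        }
    }
    if (major == 10 && minor == 0)
        return VER_WINDOWS_10;
    return VER_WINDOWS_FUTURE;
}
```

A packed value of 0x23F00206 (major 6, minor 2, build 9200) is classified as Windows 8 no matter which release actually produced it, which is what an unmanifested executable sees on Windows 10. Declaring Windows 10 compatibility in the application manifest is the documented way to get the real version from GetVersion().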

All the best,

Gregorio.

P.S. Are you planning on a new stable release anytime soon?

R0b0t1

2018-06-28 18:44:31 UTC

Hello,

If Portaudio is using CP_UTF8 then it should be changed so that it
does not. UTF-8 support in Windows is horribly broken to the point of
being considered nonexistent. While programs *should* be compiled with
the UNICODE and _UNICODE macros, and *should* use the XxxW (wchar_t)
APIs, they most likely would be best served by using UTF-8 internally
as appropriate.

I recommend anyone following along read http://utf8everywhere.org/. I
realize you, the patch author, may not have the time to remove CP_UTF8
from Portaudio entirely.

My experience has indicated much the same as the above article
suggests, with the caveat that programs being compiled for Windows
only can usually use wchar_t everywhere. If you need to import or
export data generated by the OS, though, you should convert it from
the native codepage to UTF-8, which is why the author recommends the
things they do.

Cheers,
R0b0t1

Gregorio Litenstein

2018-06-28 19:19:54 UTC

I agree that ideally one should use utf8 everywhere and I think that
Microsoft may want to start pushing in that direction eventually (as
suggested by the addition of this setting). My patch provides a better
alternative (from the end user's point of view) than what portaudio is
currently doing.

I only touch Windows when I absolutely have to, so I wouldn't even know where
to begin in order to correct the root issue, but at least this works.

_______________________________________________
Portaudio mailing list
https://lists.columbia.edu/mailman/listinfo/portaudio

Ross Bencina

2018-06-29 03:07:21 UTC

Hello Gregorio,

If I understand correctly, your patch disables UTF-8 when UNICODE is not
defined, is that correct?

Ross.

Gregorio Litenstein

2018-06-29 19:07:42 UTC

This got lost in the void accidentally:

Not really. It completely eliminates UNICODE from the equation.

Currently, the DirectSound hostapi does what you describe, while WASAPI and
WDM-KS use CP_UTF8 always. My patch makes it so all three use ANSI
codepages UNLESS Windows has been configured to try to use UTF8 even for
apps that aren’t entirely Unicode compliant (which is a new setting, marked
as Beta in Windows 10 build 1803).

In doing this, localized device names should be rendered correctly
regardless of the codepage being used by Windows.

Either way, as far as I can tell, the only real impact of this patch is how
device names are displayed.

Ross Bencina

2018-06-30 05:55:15 UTC

Hi Gregorio,

Thanks for the clarification. However I don't fully understand what
you're saying.

Are you saying that without this patch, if Windows is configured in a
certain way, PortAudio does not correctly translate device names
returned by the OS into UTF8?

As a general principle for any patch, it should not matter at all how
the executable or how Windows is configured, PortAudio *MUST* always
return UTF8 strings, because

(1) this is what we have previously agreed

and

(2) if we don't do this, applications that use PortAudio are no longer
portable because they can't rely on PortAudio strings being utf8.

Finally, the reason that we switched to requiring utf8 everywhere was
that there were bugs where device names with special characters were not
displayed correctly. And we agreed that the same encoding needs to be
used everywhere.

Ross.

R0b0t1

2018-06-30 06:22:34 UTC

On Sat, Jun 30, 2018 at 12:55 AM, Ross Bencina

I agree exposing a UTF-8 interface is best, but it is necessary to
interact with Windows using UTF-16. Anything else is broken.
Consequently _UNICODE and UNICODE have to be defined - CP_UTF8 isn't a
real codepage, it only exists for use with MultiByteToWideChar and
WideCharToMultiByte.

Post by Ross Bencina
and
(2) if we don't do this, applications that use PortAudio are no longer
portable because they can't rely on PortAudio strings being utf8.

ANSI fits within UTF-8 so it should work.

Post by Ross Bencina
Finally, the reason that we switched to requiring utf8 everywhere was that
there were bugs where device names with special characters were not
displayed correctly. And we agreed that the same encoding needs to be used
everywhere.
Ross.

I hope to not confuse the conversation, but the issue isn't with
PortAudio, per se - the issue is with Windows. *Not "using" UTF-8* is
the best way to solve these problems as far as the OS is concerned.
This may mean the bug you encountered can't be fixed. If a device
doesn't provide an ASCII or limited-symbol name then there are
potentially OS configurations which can't display the name.

The problem lies in that using any Windows string function with UTF-8
data when not in the UTF-8 codepage can potentially corrupt the data.
This is unfortunate considering that the UTF-8 codepage is not really
implemented and corrupts data anyway.

The ANSI codepages also corrupt data, so they cannot be used to
facilitate OS-mediated UTF-8 transfer. The solution is to pass around
UTF-8 and then require users to convert it to their code page when
interacting with the OS.

Cheers,
R0b0t1

Gregorio Litenstein

2018-06-30 06:30:41 UTC

As portaudio is now, broken names are displayed for some (default)
configurations. My patch should fix that. I can probably send you some
practical examples tomorrow so it's clearer.

Ross Bencina

2018-07-02 05:55:05 UTC

Hi R0b0t1,

That's a very helpful perspective, some of the details are coming back
to me now. I don't entirely agree with your conclusions -- perhaps I am
misguided, please could you correct me...

Post by R0b0t1
I agree exposing a UTF-8 interface is best, but it is necessary to
interact with Windows using UTF-16. Anything else is broken.

I understand this much. Where relevant, PortAudio needs to use the
unicode versions of any Win32 APIs that accept or return strings.

Post by R0b0t1
Consequently _UNICODE and UNICODE have to be defined

No, we just need to always explicitly call the API functions that have
the W suffix. This may not currently be the case everywhere but it is
certainly the approach I have taken when implementing UTF-8 translations
in the PortAudio code that I worked on.

Post by R0b0t1
- CP_UTF8 isn't a
real codepage, it only exists for use with MultiByteToWideChar and
WideCharToMultiByte.

Post by Ross Bencina
and
(2) if we don't do this, applications that use PortAudio are no longer
portable because they can't rely on PortAudio strings being utf8.

ANSI fits within UTF-8 so it should work.

I'm not sure what you're getting at here. The whole point of requiring
PA to return UTF-8 strings is so that drivers with non-ASCII names can
be displayed correctly and consistently. This problem is real, and it
is why we decided to stipulate that PortAudio always returns UTF-8
strings.
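
To make the "always return UTF-8" rule concrete, here is a portable sketch of the UTF-16 to UTF-8 direction, which is what a device name reported by a wide-character Windows API would go through. The function `utf16_to_utf8` is a hypothetical illustration, not PortAudio code; on Windows the same job is normally done by `WideCharToMultiByte` with `CP_UTF8`. Replacing unpaired surrogates with U+FFFD is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode a NUL-terminated UTF-16 string as UTF-8.
   Returns the number of bytes written (excluding the terminator), or
   (size_t)-1 if dst is too small. Unpaired surrogates become U+FFFD. */
static size_t utf16_to_utf8(const uint16_t *src, char *dst, size_t dst_size)
{
    size_t out = 0;
    if (dst_size == 0) return (size_t)-1;
    while (*src) {
        uint32_t cp = *src++;
        if (cp >= 0xD800 && cp <= 0xDBFF && *src >= 0xDC00 && *src <= 0xDFFF) {
            /* Combine a high/low surrogate pair into one code point. */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(*src++ - 0xDC00);
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD; /* unpaired surrogate */
        }
        unsigned char buf[4];
        size_t n;
        if (cp < 0x80) {
            buf[0] = (unsigned char)cp; n = 1;
        } else if (cp < 0x800) {
            buf[0] = (unsigned char)(0xC0 | (cp >> 6));
            buf[1] = (unsigned char)(0x80 | (cp & 0x3F)); n = 2;
        } else if (cp < 0x10000) {
            buf[0] = (unsigned char)(0xE0 | (cp >> 12));
            buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            buf[2] = (unsigned char)(0x80 | (cp & 0x3F)); n = 3;
        } else {
            buf[0] = (unsigned char)(0xF0 | (cp >> 18));
            buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            buf[3] = (unsigned char)(0x80 | (cp & 0x3F)); n = 4;
        }
        /* Always keep one byte free for the terminating NUL. */
        if (out + n + 1 > dst_size) return (size_t)-1;
        for (size_t i = 0; i < n; i++) dst[out++] = (char)buf[i];
    }
    dst[out] = '\0';
    return out;
}
```

The resulting bytes are the same on every platform, which is what lets a client application treat device names uniformly regardless of the Windows code page.
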

Post by R0b0t1

I'm not sure how this is relevant. If Windows can't display a utf-8
string, that's a problem for the client application trying to "display"
the utf-8 name, not for PortAudio.

Also, could you clarify which OS configurations these might be? My
knowledge is limited, so I'm not aware of any. Note that we have tacitly
given up supporting anything older than Windows XP.

Post by R0b0t1
The problem lies in that using any Windows string function with UTF-8
data when not in the UTF-8 codepage can potentially corrupt the data.
This is unfortunate considering that the UTF-8 codepage is not really
implemented and corrupts data anyway.

Absolutely agree with that. That's why PortAudio should (and does)
translate from UTF-8 to UTF-16 and then call the "W" suffix Windows
APIs. PortAudio should never be passing UTF-8 char* buffers to "A"
suffix Windows APIs. I think we both agree on that.
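
The opposite direction, UTF-8 in and UTF-16 out, can be sketched the same way. The function `utf8_to_utf16` below is a hypothetical, portable illustration and not PortAudio code; on Windows the conversion is normally done with `MultiByteToWideChar` and `CP_UTF8`, and the resulting buffer is what would be handed to a "W" suffix API. Rejecting malformed input outright, rather than substituting a replacement character, is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode NUL-terminated UTF-8 into UTF-16.
   Returns the number of 16-bit units written (excluding the terminator),
   or (size_t)-1 on malformed input or if dst is too small. */
static size_t utf8_to_utf16(const char *src, uint16_t *dst, size_t dst_len)
{
    /* Smallest code point that legitimately needs 1, 2, 3 or 4 bytes. */
    static const uint32_t min_cp[4] = {0x0, 0x80, 0x800, 0x10000};
    const unsigned char *s = (const unsigned char *)src;
    size_t out = 0;
    if (dst_len == 0) return (size_t)-1;
    while (*s) {
        uint32_t cp;
        int extra; /* number of continuation bytes expected */
        if (*s < 0x80)                { cp = *s;        extra = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
        else return (size_t)-1;       /* stray continuation or invalid lead */
        s++;
        for (int i = 0; i < extra; i++, s++) {
            /* A NUL or any non-continuation byte here means truncation. */
            if ((*s & 0xC0) != 0x80) return (size_t)-1;
            cp = (cp << 6) | (uint32_t)(*s & 0x3F);
        }
        /* Reject overlong forms, surrogates and out-of-range values. */
        if (cp < min_cp[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF)) return (size_t)-1;
        size_t n = (cp >= 0x10000) ? 2 : 1;
        if (out + n + 1 > dst_len) return (size_t)-1;
        if (n == 1) {
            dst[out++] = (uint16_t)cp;
        } else {
            cp -= 0x10000;
            dst[out++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[out++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }
    dst[out] = 0;
    return out;
}
```

Note that a lone byte such as 0xE9 (the letter é in Windows-1252) is rejected here: only the ASCII range of an ANSI code page is also valid UTF-8.
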

Post by R0b0t1
The ANSI codepages also corrupt data so they cannot be used to
facilitate OS-mediated UTF-8 transfer. The solution is to pass around
UTF-8 and then require users to convert it to their code page when
interacting with the OS.

Erm, isn't the solution to pass around UTF-8 and then require PA to
convert it to UTF-16 and pass it to the UTF-16 Windows APIs? Am I
missing something here? If we use the Wide-string APIs does what you're
referring to as "their code page" have any relevance?

Ross.

Post by R0b0t1
Cheers,
R0b0t1

Post by Ross Bencina

Post by Gregorio Litenstein
Not really. It completely eliminates UNICODE from the equation.
Currently, the DirectSound hostapi does what you describe, while WASAPI
and WDM-KS use CP_UTF8 always. My patch makes it so all three use ANSI
codepages UNLESS Windows has been configured to try to use UTF8 even for
apps that aren’t entirely Unicode compliant (which is a new setting, marked
as Beta in Windows 10 build 1803).
In doing this, localized device names should be rendered correctly
regardless of the codepage being used by Windows.
Either way, as far as I can tell, the only real impact of this patch is
how device names are displayed.
--
Gregorio Litenstein Goldzweig
Médico Cirujano
* Fono: +56 9 96343643

Post by Gregorio Litenstein
Date: June 28, 2018 at 23:07:21
Subject: Re: [Portaudio] Proper fix for Windows Unicode issues, and a
couple more things.

Post by Ross Bencina
Hello Gregorio,
If I understand correctly, your patch disables UTF-8 when UNICODE is not
defined, is that correct?
Ross.

Post by R0b0t1
Hello,
On Thu, Jun 28, 2018 at 12:51 PM, Gregorio Litenstein

Hey, I’m one of the developers of Performous (cross-platform karaoke game),
we’ve been using portaudio for a while on Unix/Mac/Windows.
I’d initially written a really long post with an issue report but then I
realized a) It was, after pulling most of my hair out, an issue on our side
after all. And b) I sent it to the wrong address so I guess it never actually
made it into the list. Anyway… I’m back with a couple more real issues and
at least one fix.
First, we were facing issues with the display of unicode text in Windows;
https://lists.columbia.edu/pipermail/portaudio/2016-December/000961.html
I took a look at the hostapi implementations and noticed that only some
changed their behavior depending on whether UNICODE was defined or not,
while others always used CP_UTF8.
Initially I thought my issue might have been related to that, so I did some
testing and figured out that in recent versions of Windows, defining UNICODE
(i.e. having everything use CP_UTF8) made the text uniformly garbled unless
I checked the new setting “Use Unicode UTF-8 worldwide” or something like
that. And by contrast, if that setting was off, CP_ACP properly rendered the
text.
It appears that what this setting actually does is set the codepage to UTF-8
(65001).
With this in mind, I created a patch modifying the behavior so instead of
checking for the definition of UNICODE or _UNICODE, portaudio checks (at
runtime) for the current codepage using GetACP(); If it’s 65001 it uses
CP_UTF8, if not it uses CP_ACP. I tested it on a laptop running Windows 10
Single Language Spanish and the text rendered appropriately both with the
setting turned on and with the setting turned off.
I will attach the diff file here but TBH I have no idea whether it is
possible to attach files to these lists, so you can also get it from Dropbox
below. I’m not opening a ticket/submitting a PR because Assembla is paid.
https://www.dropbox.com/s/le6zyr1zjv6mank/pa_patch.diff?dl=1

If Portaudio is using CP_UTF8 then it should be changed so that it
does not. UTF-8 support in Windows is horribly broken to the point of
being considered nonexistent. While programs *should* be compiled with
the UNICODE and _UNICODE macros, and *should* use the XxxW (wchar_t)
APIs, they most likely would be best served by using UTF-8 internally
as appropriate.
I recommend anyone following along read http://utf8everywhere.org/. I
realize you, the patch author, may not have the time to remove CP_UTF8
from Portaudio entirely.
My experience has indicated much the same as the above article
suggests, with the caveat that programs being compiled for Windows
only can usually use wchar_t everywhere. If you need to import or
export data generated by the OS, though, you should convert it from
the native codepage to UTF-8, which is why the author recommends the
things they do.
Cheers,
R0b0t1

Robert Bielik

2018-07-02 06:21:56 UTC

Hi all,

Post by Ross Bencina
Erm, isn't the solution to pass around UTF-8 and then require PA to
convert it to UTF-16 and pass it to the UTF-16 Windows APIs? Am I
missing something here? If we use the Wide-string APIs does what you're
referring to as "their code page" have any relevance?

I agree, the correct way to handle this is to convert between UTF-8 and wide char and use the
W suffixed Windows API. UTF-8 is just an encoding.

My 2 cents

Regards,
/Rob

Post by Ross Bencina
Ross.

Post by R0b0t1
Cheers,
R0b0t1

Post by R0b0t1

Post by Gregorio Litenstein
Not really. It completely eliminates UNICODE from the equation.
Currently, the DirectSound hostapi does what you describe, while WASAPI
and WDM-KS use CP_UTF8 always. My patch makes it so all three use ANSI
codepages UNLESS Windows has been configured to try to use UTF8 even for
apps that aren’t entirely Unicode compliant (which is a new setting, marked
as Beta in Windows 10 build 1803).
In doing this, localized device names should be rendered correctly
regardless of the codepage being used by Windows.
Either way, as far as I can tell, the only real impact of this patch is
how device names are displayed.
--
Gregorio Litenstein Goldzweig
Médico Cirujano
* Fono: +56 9 96343643

Post by Gregorio Litenstein
Date: June 28, 2018 at 23:07:21
Subject: Re: [Portaudio] Proper fix for Windows Unicode issues, and a
couple more things.

Post by Ross Bencina
Hello Gregorio,
If I understand correctly, your patch disables UTF-8 when UNICODE is

not