TRAP 0000 in OS2KRNL

t***@antispam.ham

2008-05-14 11:15:18 UTC

My office OS/2 machine stopped today. TRAP 0000 in OS2KRNL was the reason.
I recall seeing a list of OS/2 traps once, but so far, all I've found with
search engines is a "divide by zero error", which appears to be associated
with user programs, not the kernel. What causes a TRAP 0000 in OS2KRNL?

Lars Erdmann

2008-05-15 00:29:19 UTC

Why only user programs? It's perfectly possible that there is a bug in
OS2KRNL where it attempts to divide by zero because of an unexpected divisor
value ...

Lars

<***@antispam.ham> wrote in message
news:482ac9c6$0$7073$***@roadrunner.com...
> My office OS/2 machine stopped today. TRAP 0000 in OS2KRNL was the
> reason.
> I recall seeing a list of OS/2 traps once, but so far, all I've found with
> search engines is a "divide by zero error", which appears to be associated
> with user programs, not the kernel. What causes a TRAP 0000 in OS2KRNL?
>

Steven Levine

2008-05-15 05:06:54 UTC

In <482b83e0$0$7538$***@newsspool1.arcor-online.net>, on 05/15/2008
at 02:29 AM, "Lars Erdmann" <***@arcor.de> said:

>Why only user programs ? It's perfectly possible that there is a bug in
>OS2KRNL whre it attempts to divide by zero due to a non-expected divisor
>value ...

I've always found the naming of trap 0 interesting. My experience is that it
occurs more often when the dividend is too large.

Steven

--
--------------------------------------------------------------------------------------------
Steven Levine <***@earthlink.bogus.net> MR2/ICE 3.00 beta 11pre11 #10183
eCS/Warp/DIY/14.103a_W4 www.scoug.com irc.ca.webbnet.info #scoug (Wed 7pm PST)
--------------------------------------------------------------------------------------------

Paul Ratcliffe

2008-05-15 06:48:58 UTC

On Wed, 14 May 2008 22:06:54 -0700, Steven Levine <***@earthlink.bogus.net>
wrote:

>>Why only user programs ? It's perfectly possible that there is a bug in
>>OS2KRNL whre it attempts to divide by zero due to a non-expected divisor
>>value ...
>
> I've always found the naming of trap 0 interesting. My experience that it
> occurs more often when the dividend is too large.

When the dividend is too large and/or the divisor is too small.
Remember the old Borland runtime library error 200 on faster (of the day) CPUs?
That was a 'divide by zero' error where it wasn't dividing by zero.

The meaning of the exception is actually 'divide overflow' - it's just that
it is usually caused by dividing by zero because that always generates that
exception.
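
To make the 'divide overflow' point concrete, here is a minimal sketch in
Python that models the unsigned 16-bit form of the x86 DIV instruction: a
32-bit dividend, a 16-bit divisor, and a quotient that has to fit in 16 bits.
The names DivideError, div_32_by_16 and traps are made up for this
illustration, and the divisor 55 is just an arbitrary nonzero value; none of
this is taken from OS2KRNL.

```python
class DivideError(Exception):
    """Stands in for the x86 divide error exception (trap 0)."""


def div_32_by_16(dividend: int, divisor: int) -> tuple:
    """Model unsigned DIV with a 32-bit dividend and a 16-bit divisor.

    The quotient must fit in 16 bits; otherwise the CPU raises the same
    exception it raises for a zero divisor.
    """
    if not 0 <= dividend <= 0xFFFFFFFF or not 0 <= divisor <= 0xFFFF:
        raise ValueError("operand out of range for this model")
    if divisor == 0:
        raise DivideError("divisor is zero")
    quotient, remainder = divmod(dividend, divisor)
    if quotient > 0xFFFF:
        raise DivideError("quotient does not fit in 16 bits")
    return quotient, remainder


def traps(dividend: int, divisor: int) -> bool:
    """Return True if the modelled division would raise trap 0."""
    try:
        div_32_by_16(dividend, divisor)
    except DivideError:
        return True
    return False


print(traps(100, 0))            # zero divisor
print(traps(0xFFFF * 55, 55))   # quotient is exactly 0xFFFF, still fits
print(traps(0x10000 * 55, 55))  # nonzero divisor, but quotient overflows
```

The first and third calls report a trap and the second does not, so the same
exception covers both a zero divisor and a dividend that is too large for the
divisor.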

t***@antispam.ham

2008-05-15 08:44:29 UTC

Paul Ratcliffe writes:

> Steven Levine wrote:

>>> Why only user programs ? It's perfectly possible that there is a bug in
>>> OS2KRNL whre it attempts to divide by zero due to a non-expected divisor
>>> value ...

>> I've always found the naming of trap 0 interesting. My experience that it
>> occurs more often when the dividend is too large.

> When the dividend is too large and/or the divisor is too small.
> Remember the old Borland runtime library error 200 on faster (of the day) CPUs?
> That was a 'divide by zero' error where it wasn't dividing by zero.
>
> The meaning of the exception is actually 'divide overflow' - it's just that
> it is usually caused by dividing by zero because that always generates that
> exception.

I can see user programs doing that, and they can crash without taking down
the system. But shouldn't the system protect itself from such a crash by
testing for the exception and avoiding it?

scott g

2008-05-15 13:31:11 UTC

***@antispam.ham wrote:

>> The meaning of the exception is actually 'divide overflow' - it's just that
>> it is usually caused by dividing by zero because that always generates that
>> exception.
>
> I can see user programs doing that, and they can crash without taking down
> the system. But shouldn't the system protect itself from such a crash by
> testing for the exception and avoiding it?
>
Well, why do you think that kernel code is immune from the same kind of problem
as user code? If an unexpected fatal exception (e.g. 0000) happens in ring 0,
that's a trap....

As I used to say fairly regularly -- without the complete trap screen it's
pretty hard to say what's really going on.

t***@antispam.ham

2008-05-15 20:32:42 UTC

scott g <scottegos2-***@sbcglobal.boguspart.net> writes:

>>> The meaning of the exception is actually 'divide overflow' - it's just that
>>> it is usually caused by dividing by zero because that always generates that
>>> exception.

>> I can see user programs doing that, and they can crash without taking down
>> the system. But shouldn't the system protect itself from such a crash by
>> testing for the exception and avoiding it?

> Well, why do you think that kernel code is immune from the same kind of problem
> as user code?

I don't think it's immune. It obviously isn't, given the crash. But I
would expect it to be coded so that it is immune from the same kind of
problem as user code, simply because it's the kernel and therefore more
important than typical user code.

> If an unexpected fatal exception (e.g. 0000) happens in ring 0,
> that's a trap....

The key word here is "unexpected". If it can happen, then it should be
expected to happen, sooner or later, and should be guarded against. I've
done a fair bit of programming myself, so I know about the trade-offs
between speed of code and robustness of code. The more mission-critical
the code, the more I go for robustness. The kernel is the most mission-
critical piece of code on a computer.

> As I used to say fairly regularly -- without the complete trap screen it's
> pretty hard to say what's really going on.

If IBM was still servicing OS/2, then I might have had some motivation
to take the time to write down all that information, but without that,
and given the time constraints I was under, I didn't bother. My
curiosity as to what could cause a TRAP 0000 remains, however.

Ilya Zakharevich

2008-05-16 07:11:35 UTC

[A complimentary Cc of this posting was sent to
<***@antispam.ham>], who wrote in article <482c9dea$0$30502$***@roadrunner.com>:
> > Well, why do you think that kernel code is immune from the same kind of problem
> > as user code?

> I don't think it's immune. It obviously isn't, given the crash. But I
> would expect it to be coded so that it is immune from the same kind of
> problem as user code, simply because it's the kernel and therefore more
> important than typical user code.

So it is much more important to crash (given an appropriate opportunity
;-). Applications may have much larger freedom: "I do not know what
happened, so let me try to ignore this".

> > If an unexpected fatal exception (e.g. 0000) happens in ring 0,
> > that's a trap....

> The key word here is "unexpected". If it can happen, then it should be
> expected to happen, sooner or later, and should be guarded against. I've
> done a fair bit of programming myself

We know this, but your preceding sentence does not ring right given
this assumption.

All programs contain hundreds of bugs. Are you surprised that a
kernel has a bug?

> If IBM was still servicing OS/2, then I might have had some motivation
> to take the time the write down all that information, but without that,
> and given the time constraints I was under, I didn't bother.

It is important to check BEFOREHAND whether your cell phone could make
a readable image of a trap screen (or something else) when you are in
a hurry. (Most cell phone cameras are focus-free, so do not allow a
good rendition of text. But given a large enough screen, they may
have a chance.)

Yours,
Ilya

t***@antispam.ham

2008-05-16 11:18:08 UTC

Ilya Zakharevich writes:

>>> Well, why do you think that kernel code is immune from the same kind of problem
>>> as user code?

>> I don't think it's immune. It obviously isn't, given the crash. But I
>> would expect it to be coded so that it is immune from the same kind of
>> problem as user code, simply because it's the kernel and therefore more
>> important than typical user code.

> So it is much more important to crash (given an approriate opportunity
> ;-). Applications may have much larger freedom: "I do not know what
> happened, so let me try to ignore this".

Why is it more important for the kernel to crash? Such a crash takes
down everything running on the computer: all programs, and in the case
of a multi-user system, all users as well. Would you like to be flying
on an airplane whose control computer crashed because of a kernel bug?

>>> If an unexpected fatal exception (e.g. 0000) happens in ring 0,
>>> that's a trap....

>> The key word here is "unexpected". If it can happen, then it should be
>> expected to happen, sooner or later, and should be guarded against. I've
>> done a fair bit of programming myself

> We know this, but your preceeding sentence does not ring right given
> this assumption.

Why do you say that?

> All programs contain hundreds of bugs.

I disagree.

> Are you surprised that a kernel has a bug?

A divide by zero bug, yes.

>> If IBM was still servicing OS/2, then I might have had some motivation
>> to take the time the write down all that information, but without that,
>> and given the time constraints I was under, I didn't bother.

> It is important to check BEFOREHAND whether your cell phone could make
> a readable image of a trap screen (or something else) when you are in
> a hurry. (Most cell phone cameras are focus-free, so do not allow a
> good rendition of text. But given a large enough screen, they may
> have a chance.)

I had no camera handy either.

Ilya Zakharevich

2008-05-16 11:24:55 UTC

[A complimentary Cc of this posting was sent to
<***@antispam.ham>], who wrote in article <482d6d70$0$30488$***@roadrunner.com>:
> >> I don't think it's immune. It obviously isn't, given the crash. But I
> >> would expect it to be coded so that it is immune from the same kind of
> >> problem as user code, simply because it's the kernel and therefore more
> >> important than typical user code.
>
> > So it is much more important to crash (given an approriate opportunity
> > ;-). Applications may have much larger freedom: "I do not know what
> > happened, so let me try to ignore this".
>
> Why is it more important for the kernel to crash? Such a crash takes
> down everything running on the computer: all programs, and in the case
> of a multi-user system, all users as well. Would you like to be flying
> on an airplane whose control computer crashed because of a kernel bug?

Because non-crashing has a possibility of taking down much more than
crashing.
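
A toy sketch of that argument, in Python. The block allocator below is
entirely hypothetical (blocks_needed, allocate_ignoring_errors and
allocate_fail_fast are invented names, and this says nothing about how
OS2KRNL is written); it only contrasts "ignore the error and continue" with
"stop at the point of failure".

```python
def blocks_needed(size_bytes: int, block_size: int) -> int:
    """Whole blocks needed for size_bytes (ceiling division).

    Raises ZeroDivisionError if block_size is 0, e.g. from corrupt metadata.
    """
    return -(-size_bytes // block_size)


def allocate_ignoring_errors(size_bytes, block_size, free_blocks):
    """'I do not know what happened, so let me try to ignore this.'"""
    try:
        count = blocks_needed(size_bytes, block_size)
    except ZeroDivisionError:
        count = 0  # made-up fallback: caller is told 0 blocks are enough
    taken = free_blocks[:count]
    del free_blocks[:count]
    return taken


def allocate_fail_fast(size_bytes, block_size, free_blocks):
    """Let the error propagate before any state is modified."""
    count = blocks_needed(size_bytes, block_size)
    taken = free_blocks[:count]
    del free_blocks[:count]
    return taken


free = [1, 2, 3]
# Reports success with no blocks at all: the data has nowhere to go,
# and nothing tells the caller that something went wrong.
print(allocate_ignoring_errors(4096, 0, free))
```

With the same corrupt block size, allocate_fail_fast raises
ZeroDivisionError instead, which is the user-level analogue of taking the
trap rather than carrying on with a bad result.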

> > All programs contain hundreds of bugs.

> I disagree.

Well, can't match this with your claim about programming experience...

> > It is important to check BEFOREHAND whether your cell phone could make
> > a readable image of a trap screen (or something else) when you are in
> > a hurry. (Most cell phone cameras are focus-free, so do not allow a
> > good rendition of text. But given a large enough screen, they may
> > have a chance.)

> I had no camera handy either.

There ARE reasons to have cell phone cameras... ;-)

Yours,
Ilya

Peter Brown

2008-05-15 13:36:00 UTC

Hi

Ilya Zakharevich wrote:
> [A complimentary Cc of this posting was sent to
>
> <***@antispam.ham>], who wrote in article <482d6d70$0$30488$***@roadrunner.com>:
>>>> I don't think it's immune. It obviously isn't, given the crash. But I
>>>> would expect it to be coded so that it is immune from the same kind of
>>>> problem as user code, simply because it's the kernel and therefore more
>>>> important than typical user code.
>>> So it is much more important to crash (given an approriate opportunity
>>> ;-). Applications may have much larger freedom: "I do not know what
>>> happened, so let me try to ignore this".
>> Why is it more important for the kernel to crash? Such a crash takes
>> down everything running on the computer: all programs, and in the case
>> of a multi-user system, all users as well. Would you like to be flying
>> on an airplane whose control computer crashed because of a kernel bug?
>
> Because non-crashing has a possibility of taking down much more than
> crashing.
>
>>> All programs contain hundreds of bugs.
>
>> I disagree.
>
> Well, can't match this with your claim about programming experience...
>
>>> It is important to check BEFOREHAND whether your cell phone could make
>>> a readable image of a trap screen (or something else) when you are in
>>> a hurry. (Most cell phone cameras are focus-free, so do not allow a
>>> good rendition of text. But given a large enough screen, they may
>>> have a chance.)
>
>> I had no camera handy either.
>
> There ARE reasons to have cell phone cameras... ;-)
>

Or a formatted floppy disk and the ability to press
Ctrl-Alt-NumLock-NumLock, i.e. hold the Ctrl and Alt keys and press NumLock twice.

That dumps the Trap data to floppy - you will only need to dump to the
1st disk which will have the trap screen data.

There is a tool to extract the trap screen data, I think it is in
DMPTRPSC.ZIP but cannot remember where to find a copy...

Regards

Pete

Peter Brown

2008-05-15 13:44:37 UTC

Hi All

Peter Brown wrote:
> Hi
>

----- snip -----

>
> There is a tool to extract the trap screen data, I think it is in
> DMPTRPSC.ZIP but cannot remember where to find a copy...
>

Found it: http://www.dreamlandbbs.com/files/gfd/systool/DmpTrpSc.zip

Regards

Pete

Steven Levine

2008-05-16 14:56:37 UTC

In <W9gXj.4081$***@newsfe14.ams2>, on 05/15/2008
at 02:44 PM, Peter Brown <losepeteSPAM-ME-***@ntlworld.com> said:

Hi,

>Found it: http://www.dreamlandbbs.com/files/gfd/systool/DmpTrpSc.zip

A more current version can be found at

<http://home.earthlink.net/~steve53/os2diags/DumpTrapScreen.zip>

Steven

t***@antispam.ham

2008-05-16 20:46:40 UTC

Peter Brown <losepeteSPAM-ME-***@ntlworld.com> writes:

> Ilya Zakharevich wrote:

>>>>> I don't think it's immune. It obviously isn't, given the crash. But I
>>>>> would expect it to be coded so that it is immune from the same kind of
>>>>> problem as user code, simply because it's the kernel and therefore more
>>>>> important than typical user code.

>>>> So it is much more important to crash (given an approriate opportunity
>>>> ;-). Applications may have much larger freedom: "I do not know what
>>>> happened, so let me try to ignore this".

>>> Why is it more important for the kernel to crash? Such a crash takes
>>> down everything running on the computer: all programs, and in the case
>>> of a multi-user system, all users as well. Would you like to be flying
>>> on an airplane whose control computer crashed because of a kernel bug?

>> Because non-crashing has a possibility of taking down much more than
>> crashing.

>>>> All programs contain hundreds of bugs.

>>> I disagree.

>> Well, can't match this with your claim about programming experience...

>>>> It is important to check BEFOREHAND whether your cell phone could make
>>>> a readable image of a trap screen (or something else) when you are in
>>>> a hurry. (Most cell phone cameras are focus-free, so do not allow a
>>>> good rendition of text. But given a large enough screen, they may
>>>> have a chance.)

>>> I had no camera handy either.

>> There ARE reasons to have cell phone cameras... ;-)

> Or a formatted floppy disk and the ability to press
> Ctrl-Alt-NumLock-NumLock ie hold Crtl-Alt keys and press NumLock twice.
>
> That dumps the Trap data to floppy - you will only need to dump to the
> 1st disk which will have the trap screen data.

I wasn't aware of that. I have some vague recollection of trying to
write a C-A-NL-NL dump to floppy once many years ago, and it wanted to
write all of physical memory to floppy. Well, I have 2 GB of physical
memory on that machine, and I wasn't about to sit there and feed it
several hundred floppies (nor do I have that many lying around).
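
For a rough sense of scale, the arithmetic can be sketched as follows. It
assumes an uncompressed dump of exactly 2 GB and the nominal 1440 KB capacity
of a 3.5" high-density floppy, and it ignores any per-disk overhead; the
function name floppies_needed is made up for the example.

```python
FLOPPY_BYTES = 1440 * 1024  # nominal 3.5" HD floppy capacity (assumption)


def floppies_needed(memory_bytes: int, floppy_bytes: int = FLOPPY_BYTES) -> int:
    """Number of floppies needed to hold memory_bytes, rounded up."""
    return -(-memory_bytes // floppy_bytes)  # integer ceiling division


print(floppies_needed(2 * 1024**3))  # prints 1457
```

Under those assumptions a full dump of 2 GB would need well over a thousand
disks, so "several hundred" is if anything an understatement.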

> There is a tool to extract the trap screen data, I think it is in
> DMPTRPSC.ZIP but cannot remember where to find a copy...

Lars Erdmann

2008-05-16 21:58:15 UTC

Hi,

>> That dumps the Trap data to floppy - you will only need to dump to the
>> 1st disk which will have the trap screen data.
>
> I wasn't aware of that. I have some vague recollection of trying to
> write a C-A-NL-NL dump to floppy once many years ago, and it wanted to
> write all of physical memory to floppy. Well, I have 2 GB of physical
> memory on that machine, and I wasn't about to sit there and feed it
> several hundred floppies (nor do I have that many lying around).

Things have improved in the meantime: for physical mem < 2GB you can set up
a FAT dump partition (see OS/2 help for TRAPDUMP); for physical mem >= 2 GB
(but it should also work for < 2GB) you can install DUMP FS, a filesystem for
memory dumps:
http://www.os2site.com/sw/drivers/filesystem/dumpfs.zip
Further info:
http://www.os2site.com/sw/hardware/diags/trapdumpref.txt
The nice thing about the latter is that the partition can be anywhere on the
disk.

1.) It installs like any other filesystem driver, but place it as the last
filesystem in your config.sys!
2.) Replace OS2DUMP with OS2DUMP.HD (rename it to OS2DUMP)
3.) Then format a partition >= 2GB with: format x: /FS:DUMPFS
4.) Let TRAPDUMP in config.sys point to that partition letter
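
The size rule above can be written down as a small decision helper. This is
only a Python restatement of what the post says (FAT dump partition below
2 GB of physical memory, DUMPFS for any size); dump_targets and the strings
it returns are invented for the illustration and are not part of any OS/2
tool.

```python
TWO_GB = 2 * 1024**3


def dump_targets(physical_memory_bytes: int) -> list:
    """Return the hard-disk dump options described above for a RAM size."""
    targets = ["DUMPFS partition"]  # described as usable for any size
    if physical_memory_bytes < TWO_GB:
        # The FAT dump partition is only described for less than 2 GB.
        targets.insert(0, "FAT dump partition")
    return targets


print(dump_targets(1 * 1024**3))  # both options
print(dump_targets(TWO_GB))       # DUMPFS only
```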

lars

t***@antispam.ham

2008-05-16 23:39:34 UTC

Lars Erdmann <***@arcor.de> writes:

>>> That dumps the Trap data to floppy - you will only need to dump to the
>>> 1st disk which will have the trap screen data.

>> I wasn't aware of that. I have some vague recollection of trying to
>> write a C-A-NL-NL dump to floppy once many years ago, and it wanted to
>> write all of physical memory to floppy. Well, I have 2 GB of physical
>> memory on that machine, and I wasn't about to sit there and feed it
>> several hundred floppies (nor do I have that many lying around).

> Things have improved in the meantime: for physical mem < 2GB you can set up
> a FAT dump partition (see OS/2 help for TRAMDUMP), for physical mem >= 2 GB
> (but should also work for < 2GB) you can install DUMP FS, a filesystem for
> memory dumps:
> http://www.os2site.com/sw/drivers/filesystem/dumpfs.zip
> and further info:
> http://www.os2site.com/sw/hardware/diags/trapdumpref.txt
> The nice thing about the latter is that the partition can be anywhere on the
> disk.
>
> 1.) It installs as any other filesystem driver but place as last filesystem
> in your config.sys !
> 2.) replace OS2DUMP with OS2DUMP.HD (rename to OS2DUMP)
> 3.) Then format a partition >= 2GB with: format x: /FS:DUMPFS
> 4.) Let TRAPDUMP in config.sys point to that partition letter

Excellent information!

However, once a trap dump has been saved, then what? The above was
very useful back when IBM was still servicing OS/2. I have no idea
whether the eCS folks are interested in any trap dumps from a non-eCS
system.

Lars Erdmann

2008-05-17 01:49:21 UTC

Hi,

>> Things have improved in the meantime: for physical mem < 2GB you can set
>> up
>> a FAT dump partition (see OS/2 help for TRAMDUMP), for physical mem >= 2
>> GB
>> (but should also work for < 2GB) you can install DUMP FS, a filesystem
>> for
>> memory dumps:
>> http://www.os2site.com/sw/drivers/filesystem/dumpfs.zip
>> and further info:
>> http://www.os2site.com/sw/hardware/diags/trapdumpref.txt
>> The nice thing about the latter is that the partition can be anywhere on
>> the
>> disk.
>>
>> 1.) It installs as any other filesystem driver but place as last
>> filesystem
>> in your config.sys !
>> 2.) replace OS2DUMP with OS2DUMP.HD (rename to OS2DUMP)
>> 3.) Then format a partition >= 2GB with: format x: /FS:DUMPFS
>> 4.) Let TRAPDUMP in config.sys point to that partition letter
>
> Excellent information!
>
> However, once a trap dump has been saved, then what? The above was
> very useful back when IBM was still servicing OS/2. I have no idea
> whether the eCS folks are interested in any trap dumps from a non-eCS
> system.

You have to get familiar with the PMDF.EXE utility and symbol files. If you have
a complete system dump, you can load it into PMDF.EXE and trace back from
the point of failure. You can display code in assembly, and if you have a
symbol file (.SYM) for the failing module, which correlates symbols to
addresses, you can even deduce which routine crashed ...
Get the "debugging handbook, Volumes I-IV". It's available as an .inf file,
lists all dump formatter commands (that PMDF builds upon) and gives
failure analysis examples:
http://www.os2site.com/sw/info/redbooks/index.html
It also comes with the latest OS/2 toolkit.
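
The idea of correlating a failing address with a symbol can be sketched in
Python as a nearest-symbol lookup. This is not the .SYM file format and not
what PMDF does internally; the symbol table below is hypothetical, with
made-up names and offsets, and nearest_symbol is an invented helper.

```python
import bisect

# Hypothetical (offset, name) pairs, sorted by offset.
SYMBOLS = [
    (0x0000, "example_init"),
    (0x0120, "example_divide"),
    (0x0340, "example_cleanup"),
]


def nearest_symbol(offset: int, symbols=SYMBOLS):
    """Return (name, displacement) for the last symbol at or below offset.

    Returns None if offset lies below the first symbol.
    """
    offsets = [addr for addr, _ in symbols]
    index = bisect.bisect_right(offsets, offset) - 1
    if index < 0:
        return None
    addr, name = symbols[index]
    return name, offset - addr


name, displacement = nearest_symbol(0x0150)
print(f"{name}+{displacement:#x}")  # prints example_divide+0x30
```

A trap address that resolves to a routine name plus a small displacement is
the kind of information that makes a bug report usable even without source
code.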

Doing it this way, I was able to find a problem within JFS.IFS. It helped that
JFS is open-sourced as OPENJFS, whose code I found closely matched the JFS
code, judging from the JFS disassembly.

Lars

t***@antispam.ham

2008-05-18 01:23:34 UTC

Lars Erdmann <***@arcor.de> writes:

>>> Things have improved in the meantime: for physical mem < 2GB you can set
>>> up
>>> a FAT dump partition (see OS/2 help for TRAMDUMP), for physical mem >= 2
>>> GB
>>> (but should also work for < 2GB) you can install DUMP FS, a filesystem
>>> for
>>> memory dumps:
>>> http://www.os2site.com/sw/drivers/filesystem/dumpfs.zip
>>> and further info:
>>> http://www.os2site.com/sw/hardware/diags/trapdumpref.txt
>>> The nice thing about the latter is that the partition can be anywhere on
>>> the
>>> disk.
>>>
>>> 1.) It installs as any other filesystem driver but place as last
>>> filesystem
>>> in your config.sys !
>>> 2.) replace OS2DUMP with OS2DUMP.HD (rename to OS2DUMP)
>>> 3.) Then format a partition >= 2GB with: format x: /FS:DUMPFS
>>> 4.) Let TRAPDUMP in config.sys point to that partition letter

>> Excellent information!
>>
>> However, once a trap dump has been saved, then what? The above was
>> very useful back when IBM was still servicing OS/2. I have no idea
>> whether the eCS folks are interested in any trap dumps from a non-eCS
>> system.

> You have to get familiar with PMDF.EXE utility and symbol files. If you have
> a complete system dump, you can load it into PMDF.EXE and trace back from
> the point of failure, you can display code in assembly and if you have
> symbol file (.SYM) for failing modules which correlates symbols to
> addresses, you can even deduce what routine crashed ...
> Get the "debugging handbook, Volumes I-IV", it's available as .inf file and
> it lists all dump formatter commands (that PMDF builds upon) and gives
> failure analysis examples:
> http://www.os2site.com/sw/info/redbooks/index.html
> It also comes with the latest OS/2 toolkit.
>
> Doing it this way, I was able to find a problem within JFS.IFS (and because
> JFS is opensourced as OPENJFS code which I found closely matched the JFS
> code, deducing from the JFS disassembly).

Is there any part of OS/2 other than JFS that has been opensourced?
My trap was in the kernel.

Lars Erdmann

2008-05-18 08:07:49 UTC

Hi,
>
> Is there any part of OS/2 other than JFS that has been opensourced?
> My trap was in the kernel.

Certainly not the kernel. But looking at the disassembled code and
identifying the failing routine with the help of the .SYM file gives enough
information to report that bug to the eCS bugtracker. And eCS has access to
the kernel sources.

By the way: have you upgraded to the latest kernel 14.104a ? I'd advise you
to do so. And it's freely available:
for SMP systems: http://www.os2site.com/sw/upgrades/kernel/smp20050811.zip
for UNI systems: http://www.os2site.com/sw/upgrades/kernel/uni20050811.zip
for Warp 4 systems: http://www.os2site.com/sw/upgrades/kernel/w420050811.zip

The differences between these kernels: SMP is for Multiprocessor systems and
needs a .PSD platform driver, UNI is for Singleprocessor systems and needs a
.PSD platform driver, Warp 4 is for Singleprocessor systems and does not
need a .PSD platform driver.
If you don't find a PSD=APIC.PSD or the like in config.sys then it's the
Warp 4 kernel ...
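
Checking config.sys for a PSD statement can be sketched in Python like this.
The helper psd_statements is invented for the example; it only reports
active PSD= lines and skips REM comments. Whether a missing PSD line really
identifies the Warp 4 kernel is the poster's heuristic and is questioned in
the replies, so the sketch deliberately draws no conclusion about the kernel
variant.

```python
def psd_statements(config_text: str) -> list:
    """Return the values of all active PSD= statements in CONFIG.SYS text."""
    found = []
    for line in config_text.splitlines():
        stripped = line.strip()
        tokens = stripped.split(None, 1)
        if tokens and tokens[0].upper() == "REM":
            continue  # commented-out statement
        key, sep, value = stripped.partition("=")
        if sep and key.strip().upper() == "PSD":
            found.append(value.strip())
    return found


sample = "REM PSD=OLD.PSD\nPSD=APIC.PSD\nPAUSEONERROR=NO\n"
print(psd_statements(sample))  # prints ['APIC.PSD']
```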

In any case, if you report a trap, give at least the trap screen and (in
this specific case) the kernel version. No need to take a photo:
1.) either enable trapdump (for R0) with the dump going to floppy (the default)
rather than to hard disk, have a formatted floppy ready and abort the dump
after the first diskette. Then run the trapdumpscreen util on that diskette:
http://home.earthlink.net/~steve53/os2diags/DumpTrapScreen.zip
to display the trap screen
2.) or set up a dump partition, enable trapdump (for R0) and let trapdump
point to the dump partition

Lars

Steve Wendt

2008-05-18 08:15:29 UTC

On 05/18/08 01:07 am, Lars Erdmann wrote:

> UNI is for Singleprocessor systems and needs a .PSD platform driver

? I think you are wrong about the second part of this.

Lars Erdmann

2008-05-18 08:40:23 UTC

Hi,

"Steve Wendt" <***@forgetit.org> wrote in message
news:CARXj.577$***@nlpi067.nbdc.sbc.com...
> On 05/18/08 01:07 am, Lars Erdmann wrote:
>
>> UNI is for Singleprocessor systems and needs a .PSD platform driver
>
> ? I think you are wrong about the second part of this.

Am I ? So what's the difference between the UNI and the Warp4 kernel ? Now,
I am completely confused ...

Lars

Bob Eager

2008-05-18 09:11:19 UTC

On Sun, 18 May 2008 08:40:23 UTC, "Lars Erdmann" <***@arcor.de>
wrote:

> Hi,
>
> "Steve Wendt" <***@forgetit.org> wrote in message
> news:CARXj.577$***@nlpi067.nbdc.sbc.com...
> > On 05/18/08 01:07 am, Lars Erdmann wrote:
> >
> >> UNI is for Singleprocessor systems and needs a .PSD platform driver
> >
> > ? I think you are wrong about the second part of this.
>
> Am I ? So what's the difference between the UNI and the Warp4 kernel ? Now,
> I am completely confused ...

There are virtually no differences between the W4 and UNI kernels;
neither needs a PSD driver. I don't know if the W4 one rejects such
drivers, and perhaps the UNI one accepts them.

The only difference I know of is in the APIs for things such as thread
affinity. They aren't there in W4, but are in UNI. One example might be
DosGetProcessorStatus, but it was a long while ago and I forget the exact
details.

FWIW, I've been running a UNI kernel here for years and there is
definitely no PSD driver installed.

Lars Erdmann

2008-05-18 14:22:28 UTC

Hi,

> There are virtually no differences between the W4 and UNI kernels;
> neither needs a PSD driver. I don't know if the W4 one rejects such
> drivers, and perhaps the UNI one accepts them.
>
> The only difference I know is in the APIs for things such as thread
> affinity. They aren't there in W4, but are in UNI. One example might me
> DosGetProcessorStatus, but it's a long while ago and I forget the exact
> details.

But the UNI kernel is for singleprocessor systems. So "thread affinity"
makes no sense here, there is only 1 CPU, n'est-ce pas ?

> FWIW, I've been running a UNI kernel here for years and there is
> definitley no PSD driver installed.

Ok, that is practical proof ...

Bob Eager

2008-05-18 15:28:55 UTC

On Sun, 18 May 2008 14:22:28 UTC, "Lars Erdmann" <***@arcor.de>
wrote:

> Hi,
>
> > There are virtually no differences between the W4 and UNI kernels;
> > neither needs a PSD driver. I don't know if the W4 one rejects such
> > drivers, and perhaps the UNI one accepts them.
> >
> > The only difference I know is in the APIs for things such as thread
> > affinity. They aren't there in W4, but are in UNI. One example might me
> > DosGetProcessorStatus, but it's a long while ago and I forget the exact
> > details.
>
> But the UNI kernel is for singleprocessor systems. So "thread affinity"
> makes no sense here, there is only 1 CPU, n'est-ce pas ?

Exactly. But the API is there and works the same as the SMP kernel does
if run on one processor.

Steven Levine

2008-05-19 21:46:20 UTC

In <48303ba1$0$7538$***@newsspool1.arcor-online.net>, on 05/18/2008
at 04:22 PM, "Lars Erdmann" <***@arcor.de> said:

Hi,

>But the UNI kernel is for singleprocessor systems. So "thread affinity"
>makes no sense here, there is only 1 CPU, n'est-ce pas ?

TTBOMK, the purpose of the UNI kernel is to allow application code and
maybe drivers that assume an SMP kernel to run on a single-processor
system. Apparently some apps did not fall back nicely when run on a W4
kernel.

Steven

Lars Erdmann

2008-05-20 06:20:02 UTC

Hi,

thanks for the info, I am starting to get the picture. However, that leads to
the next question:
Can you write a driver that works on W4 AND on SMP/UNI? There are some
spinlock devhelps, and I wonder if they also exist in the W4 kernel.
Or would it be better to come up with your own means of serialization, a RAM
semaphore, for example?

Lars

"Steven Levine" <***@earthlink.bogus.net> wrote in message
news:4831f5b6$2$fgrir53$***@news.west.earthlink.net...
> In <48303ba1$0$7538$***@newsspool1.arcor-online.net>, on 05/18/2008
> at 04:22 PM, "Lars Erdmann" <***@arcor.de> said:
>
> Hi,
>
>>But the UNI kernel is for singleprocessor systems. So "thread affinity"
>>makes no sense here, there is only 1 CPU, n'est-ce pas ?
>
> TTBOMK, the purpose of the UNI kernel is to allow application code and
> maybe drivers that assumes an SMP kernel to run on an single processor
> system. Apparently some apps did not fall back nicely when run on a W4
> kernel.
>
> Steven
>

Steven Levine

2008-05-20 06:31:35 UTC

In <48326d8b$0$7551$***@newsspool1.arcor-online.net>, on 05/20/2008
at 08:20 AM, "Lars Erdmann" <***@arcor.de> said:

Hi,

>can you write a driver that works on W4 AND on SMP/UNI ?

Sure you can. The vast majority of the drivers fall in this category.
However, I suspect you really want to know if the driver can automatically
detect which variant it is running under and optimize its operation.

The answer should be yes, although I'm not sure why it would be needed and
I've not thought much about the code that would be needed for the detection.

There are some drivers, such as amouse, that must be told they are running
under SMP to operate correctly.

Steven

Ruediger Ihle

2008-05-20 08:02:30 UTC

On Tue, 20 May 2008 06:31:35 UTC, Steven Levine
<***@earthlink.bogus.net> wrote:

> There are some drivers, such as amouse, that must be told they are running
> under SMP to operate correctly.

AFAIK, this is not directly related to SMP. Instead, it controls
the way the driver acknowledges interrupts, and I think this
option will go away pretty soon.

Take a look at <DDK>\Base\Inc\devhlpp.inc and start to cry :-(.

There are drivers shipping with OS/2 that use this code without
defining SMP, which was as unbelievable to me as the TRAP 0000
in OS2KRNL to the OP.

--
Ruediger "Rudi" Ihle [S&T Systemtechnik GmbH, Germany]
http://www.s-t.de
Please remove all characters left of the "R" in my email address

Steven Levine

2008-05-22 05:05:01 UTC

In <Bd1D8ggkpXsj-pn2-***@Tobias>, on 05/20/2008
at 08:02 AM, "Ruediger Ihle" <***@S-t.De> said:

Hi,

>AFAIK, this is not directly related to SMP.

I thought the need for the option became apparent when amouse had troubles
with acpi.psd?

>Instead, it controls the way
>the driver acknowledges interrupts, and I think this option will go away
>pretty soon.

It makes sense to me that the driver could automatically configure itself based on
the existence of the acpi.psd driver.

>Take a look at <DDK>\Base\Inc\devhlpp.inc and start to cry :-(.

Stuff happens.

>There are drivers shipping with OS/2 that use this code without defining
>SMP, which was as unbelievable to me as the TRAP 0000 in OS2KRNL to the
>OP.

I guess someone thought that the slightly better performance was worth the
configuration issues and the cost of maintaining two sets of drivers.

Steven

--
--------------------------------------------------------------------------------------------
Steven Levine <***@earthlink.bogus.net> MR2/ICE 3.00 beta 11pre11 #10183
eCS/Warp/DIY/14.103a_W4 www.scoug.com irc.ca.webbnet.info #scoug (Wed 7pm PST)
--------------------------------------------------------------------------------------------

Lars Erdmann

2008-05-27 00:15:11 UTC

Ruediger Ihle schrieb:
> On Tue, 20 May 2008 06:31:35 UTC, Steven Levine
> <***@earthlink.bogus.net> wrote:
>
>> There are some drivers, such as amouse, that must be told they are running
>> under SMP to operate correctly.
>
> AFAIK, this is not directly related to SMP. Instead, it controls
> the way the driver acknowledges interrupts, and I think this
> option will go away pretty soon.
>
> Take a look at <DDK>\Base\Inc\devhlpp.inc and start to cry :-(.
>
> There are drivers shipping with OS/2 that use this code without
> defining SMP, which was as unbelievable to me as the TRAP 0000
> in OS2KRNL to the OP.
Yes, the IBM mouse driver is an example, I just realized this now.

Why on planet earth don't ALL drivers just use DevHlp_EOI instead of
directly accessing the interrupt controller registers ?
Doesn't DevHlp_EOI work perfectly well on ALL kernels and ALL systems
(even on the Warp 4 kernel ?). At least that's what the documentation
implies. Why would anyone create different drivers for SMP and the
"normal" kernel ?

Lars

Steven Levine

2008-05-27 18:27:57 UTC

In <483b5291$0$7555$***@newsspool1.arcor-online.net>, on 05/27/2008
at 02:15 AM, Lars Erdmann <***@arcor.de> said:

Hi,

>Why on planet earth don't ALL drivers just use DevHlp_EOI instead of
>directly accessing the interrupt controller registers ?

Because the driver writers used the available samples as best practices.

Because the drivers were written long ago and worked just fine in the
environments in which they were tested.

One needs to keep in mind that most of the driver development we are
discussing was done 10 or 12 years ago. If there had been ongoing
development since then, these issues would not exist.

Steven

--
--------------------------------------------------------------------------------------------
Steven Levine <***@earthlink.bogus.net> MR2/ICE 3.00 beta 11pre11 #10183
eCS/Warp/DIY/14.103a_W4 www.scoug.com irc.ca.webbnet.info #scoug (Wed 7pm PST)
--------------------------------------------------------------------------------------------

Lars Erdmann

2008-05-27 18:56:11 UTC

Steven Levine schrieb:
> In <483b5291$0$7555$***@newsspool1.arcor-online.net>, on 05/27/2008
> at 02:15 AM, Lars Erdmann <***@arcor.de> said:
>
> Hi,
>
>> Why on planet earth don't ALL drivers just use DevHlp_EOI instead of
>> directly accessing the interrupt controller registers ?
>
> Because the driver writers used the available samples as best practices.
>
> Because the drivers were written long ago and worked just fine in the
> environments in which they were tested.
>
> One needs to keep in mind that most of the driver development we are
> discussing was done 10 or 12 years ago. If there had been ongoing
> development since then, these issues would not exist.

I could agree with that if it weren't for the fact that pdd.inf has also
existed for more than 12 years now.
And it describes the DevHlp_EOI service in detail, a service that is
meant to signal an end of interrupt.
It's really annoying that people work around the specs.
Reason: a .PSD implements an entry point called PSD_IRQ_EOI. This will
be called when DevHlp_EOI is executed (if a .PSD is used). And there are
good reasons to have a platform specific implementation for this service
on a multi-processor system.

But I guess you are right, sigh ....

Lars
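
To make the argument above concrete, here is a toy Python model (not OS/2 code) of why calling a kernel EOI service is more portable than writing to the interrupt controller directly. All class and function names are hypothetical, and the default path without a PSD is an assumption of the model. The only hardware fact relied on is that a non-specific EOI on a legacy 8259 PIC is the byte 0x20 written to command port 0x20 (and also to 0xA0 for IRQs 8 to 15). How a real PSD implements PSD_IRQ_EOI is not shown in this thread.

```python
class PortLog:
    """Records port writes instead of touching real hardware."""

    def __init__(self):
        self.writes = []

    def outb(self, port, value):
        self.writes.append((port, value))


def pic_eoi(ports, irq):
    """Non-specific EOI for a legacy 8259 pair: slave first for IRQ 8-15."""
    if irq >= 8:
        ports.outb(0xA0, 0x20)
    ports.outb(0x20, 0x20)


class ToyKernel:
    """Hypothetical stand-in for the kernel's EOI device helper."""

    def __init__(self, ports, psd_irq_eoi=None):
        self.ports = ports
        self.psd_irq_eoi = psd_irq_eoi  # hook supplied by a platform driver, if any

    def devhlp_eoi(self, irq):
        if self.psd_irq_eoi is not None:
            self.psd_irq_eoi(irq)       # platform-specific path
        else:
            pic_eoi(self.ports, irq)    # assumed default path without a PSD


# Without a PSD hook, the helper ends up at the PIC ports.
plain = ToyKernel(PortLog())
plain.devhlp_eoi(12)
print(plain.ports.writes)

# With a PSD hook, the same driver call never touches the PIC ports.
psd_calls = []
with_psd = ToyKernel(PortLog(), psd_irq_eoi=psd_calls.append)
with_psd.devhlp_eoi(12)
print(with_psd.ports.writes, psd_calls)
```

A driver that calls pic_eoi() itself behaves like the first case on every platform, which is the portability problem being complained about here.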

Steven Levine

2008-05-28 08:11:17 UTC

In <483c594c$0$7534$***@newsspool1.arcor-online.net>, on 05/27/2008
at 08:56 PM, Lars Erdmann <***@arcor.de> said:

Hi,

>I could agree with that if it weren't for the fact that pdd.inf has also
>existed for more than 12 years now.
>And it describes the DevHlp_EOI service in detail, a service that is
>meant to signal an end of interrupt.
>It's really annoying that people work around the specs.

I understand what you are saying, but this is just how people are.

While I expect it, I'm still surprised now and then by how limited a scope
of knowledge many programmers have. For example, I just had one ask me
what DOSCALLS entry point 981 was and why it would cause his code to fail.
I would expect this question from a beginner, but not from someone with
over a decade of OS/2 programming experience. However, I have to assume
he knows stuff I don't know, so that's just the way it is.

>But I guess you are right, sigh ....

:-)

Steven

--
--------------------------------------------------------------------------------------------
Steven Levine <***@earthlink.bogus.net> MR2/ICE 3.00 beta 11pre11 #10183
eCS/Warp/DIY/14.103a_W4 www.scoug.com irc.ca.webbnet.info #scoug (Wed 7pm PST)
--------------------------------------------------------------------------------------------

Lars Erdmann

2008-05-27 00:18:37 UTC

Steven Levine schrieb:
> In <48326d8b$0$7551$***@newsspool1.arcor-online.net>, on 05/20/2008
> at 08:20 AM, "Lars Erdmann" <***@arcor.de> said:
>
> Hi,
>
>> can you write a driver that works on W4 AND on SMP/UNI ?
>
> Sure you can. The vast majority of the drivers fall in this category.
> However, I suspect you really want to know if the driver can automatically
> detect which variant it is running under and optimize its operation.

I would not even go that far and ask for optimized operation. I just
would expect that you would not have to deliver 2 different versions of
your driver.
>
> The answer should be yes, although I'm not sure why it would be needed and
> I've not thought much about the code that would be needed for the detection.
See above, it would avoid the need to deliver 2 different versions of
the driver.

As a consequence, if I create a driver from the IBM sources, is it
ALWAYS ok to specify "SMP" as a define and that driver will also run on
any kernel: Warp4, UNI, SMP ?

Lars

Steven Levine

2008-05-27 18:34:59 UTC

In <483b535d$0$7535$***@newsspool1.arcor-online.net>, on 05/27/2008
at 02:18 AM, Lars Erdmann <***@arcor.de> said:

Hi,

>As a consequence, if I create a driver from the IBM sources, is it
>ALWAYS ok to specify "SMP" as a define and that driver will also run on
>any kernel: Warp4, UNI, SMP ?

Good question. The DevHlp is documented in pdd.inf, so that implies it is
available for those that choose to use it.

Steven

--
--------------------------------------------------------------------------------------------
Steven Levine <***@earthlink.bogus.net> MR2/ICE 3.00 beta 11pre11 #10183
eCS/Warp/DIY/14.103a_W4 www.scoug.com irc.ca.webbnet.info #scoug (Wed 7pm PST)
--------------------------------------------------------------------------------------------

Ruediger Ihle

2008-05-20 07:53:49 UTC

On Tue, 20 May 2008 06:20:02 UTC, "Lars Erdmann" <***@arcor.de>
wrote:

> can you write a driver that works on W4 AND on SMP/UNI ?

AFAIK, the story is the following:

OS/2 SMP 2.11 introduced additional device helpers (spinlocks)
that had to be used to make the driver SMP compatible. While
this was technically a good thing, it meant that only very few
device drivers could be used on such a system.

Unfortunately, instead of taking this as a reason to introduce
a new device driver model, IBM decided to take a step back.
They modified the kernels of later versions of OS/2 SMP so
that they could run the old (non-SMP-aware) drivers. AFAIK,
the most prominent consequence of this is that interrupts
are processed only on the first CPU and that there is some
nasty and time-consuming serialisation taking place.

So with current SMP kernels, drivers don't need to know whether
they are running on a single CPU or on a quad core. However,
the way the kernel is handling certain things does degrade the
possible performance.

--
Ruediger "Rudi" Ihle [S&T Systemtechnik GmbH, Germany]
http://www.s-t.de
Please remove all characters left of the "R" in my email address

Allan

2008-05-18 11:31:20 UTC

On Sun, 18 May 2008 08:40:23 UTC, "Lars Erdmann" <***@arcor.de> wrote:

> > On 05/18/08 01:07 am, Lars Erdmann wrote:
> >
> >> UNI is for Singleprocessor systems and needs a .PSD platform driver
> >
> > ? I think you are wrong about the second part of this.
>
> Am I ? So what's the difference between the UNI and the Warp4 kernel ? Now,
> I am completely confused ...

You can use a PSD driver for all kernels - which ACPI.PSD shows.
If you have more than 1 cpu, you _need_ a PSD driver and the SMP kernel
to activate all extras.
W4 kernel is the only one that still supports 486 SX processors.
UNI kernel also has APIC support, and some extra codepage support
(you get an extra Language tab in all settings notebooks).
Some people have claimed that IBM's APM works better with the W4 kernel.

--
Allan.

It is better to close your mouth, and look like a fool,
than to open it, and remove all doubt.

Lars Erdmann

2008-05-18 13:47:51 UTC

Hi,

> You can use a PSD driver for all kernels - which ACPI.PSD shows.

Even with the Warp 4 kernel ? Scott Garfinkle had disabled PSD support in
Warp4 kernel but he thought about putting it back in. Obviously he did ...

> If you have more than 1 cpu, you _need_ a PSD driver and the SMP kernel
> to activate all extras.
> W4 kernel is the only one that still supports 486 SX processors.
> UNI kernel also has APIC support, and some extra codepage support
> (you get an extra Language tab in all settings notebooks).

Where do you guys know all this from ?

> Some people have claimed that IBM's APM works better with the W4 kernel.

Great, that gives an inconsistent view of what is better to use: UNI or Warp
4, jeez ....

Lars

Steve Wendt

2008-05-18 17:52:43 UTC

On 05/18/08 06:47 am, Lars Erdmann wrote:

>> You can use a PSD driver for all kernels - which ACPI.PSD shows.
>
> Even with the Warp 4 kernel ?

At least ACPI.PSD works with the W4 kernel.

>> W4 kernel is the only one that still supports 486 SX processors.
>> UNI kernel also has APIC support, and some extra codepage support
>> (you get an extra Language tab in all settings notebooks).
>
> Where do you guys know all this from ?

There have been threads about it here on Usenet in the past; Scott
supplied some info.

Hendrik Schmieder

2008-05-23 17:09:20 UTC

Allan schrieb:
>
> On Sun, 18 May 2008 08:40:23 UTC, "Lars Erdmann" <***@arcor.de> wrote:
>
> > > On 05/18/08 01:07 am, Lars Erdmann wrote:
> > >
> > >> UNI is for Singleprocessor systems and needs a .PSD platform driver
> > >
> > > ? I think you are wrong about the second part of this.
> >
> > Am I ? So what's the difference between the UNI and the Warp4 kernel ? Now,
> > I am completely confused ...
>
> You can use a PSD driver for all kernels - which ACPI.PSD shows.
> If you have more than 1 cpu, you _need_ a PSD driver and the SMP kernel
> to activate all extras.
> W4 kernel is the only one that still supports 486 SX processors.
> UNI kernel also has APIC support, and some extra codepage support
> (you get an extra Language tab in all settings notebooks).

I use a W4 kernel and also have the extra Language tab in the settings
notebook !

Hendrik

t***@antispam.ham

2008-05-21 23:08:40 UTC

Lars Erdmann writes:

> By the way: have you upgraded to the latest kernel 14.104a ?

No; my system is at 14.103, which, if memory serves, came with the final
FixPak. Is there a list of changes between 14.103 and 14.104 somewhere,
or is that known only to IBM (and maybe Serenity)?

> I'd advise you
> to do so. And it's freely available:
> for SMP systems: http://www.os2site.com/sw/upgrades/kernel/smp20050811.zip
> for UNI systems: http://www.os2site.com/sw/upgrades/kernel/uni20050811.zip
> for Warp 4 systems: http://www.os2site.com/sw/upgrades/kernel/w420050811.zip
>
> The differences between these kernels: SMP is for Multiprocessor systems and
> needs a .PSD platform driver, UNI is for Singleprocessor systems and needs a
> .PSD platform driver, Warp 4 is for Singleprocessor systems and does not
> need a .PSD platform driver.
> If you don't find a PSD=APIC.PSD or the like in config.sys then it's the
> Warp 4 kernel ...

I've been using the W4 kernel on a Warp 4.52 single processor system.

> In any case if you report about a trap

To whom?

> give at least the trap screen and (in
> this specific case) the kernel version. No need to take a photo:
> 1.) either enable trapdump (for R0) without dumping to hard disk but to
> floppy (default), have a formatted floppy ready and abort the dump after the
> first diskette. Then use the trapdumpscreen util on that diskette:
> http://home.earthlink.net/~steve53/os2diags/DumpTrapScreen.zip
> to get the trap dump screen displayed
> 2.) or set up a dump partition and enable trapdump (for R0) and let trapdump
> point to the dump partition

Steve Wendt

2008-05-22 02:52:42 UTC

On 05/21/08 04:08 pm, ***@antispam.ham wrote:

>> By the way: have you upgraded to the latest kernel 14.104a ?
>
> No; my system is at 14.103, which, if memory serves, came with the final
> FixPak. Is there a list of changes between 14.103 and 14.104 somewhere,
> or is that known only to IBM (and maybe Serenity)?

That info comes with the package:

20050317 14.103a
First post xr_c005 private. Starts at that level and adds the fixes
from 14.100m-14.100o.
Note: Pentium Pros are once again supported here. Also, I changed
vmco_s back to its usual size (see note under 14.100e)

PJ30210 - Another -- I hope the final -- attempt to fix doscall1 traps
on P4 and Athlon64s.

20050331 14.103d
****** SEE WARNING UNDER 20041105 (don't use this with Pentium PRO CPUs)
PJ30224 - Netservr hangs from certain win32 client requests (e.g.
QUERY_FILE_COMPRESSION_INFO packets from 7-zip).

200507808 14.104
Basically, a re-build of the above from an "official" build machine.
*This build is safe for Pentium Pros. Note: 486SX CPUs are (and have
always been) safe ONLY on the W4 kernel.

200507808 14.104a
- fixed problem in 14.104 (only) that caused (at least) some VDM
applications (such as COPY) to fail.

Hendrik Schmieder

2008-05-23 16:57:08 UTC

Lars Erdmann schrieb:
>
>
> Certainly not the kernel. But looking at the disassembled code and
> identifying the failing routine with the help of the .SYM file gives enough
> information to report that bug to the eCS bugtracker. And eCS has access to
> the kernel sources.
>

Why do you think that Serenity Systems has access to the kernel sources
?
That would be a big surprise !

Hendrik

Hendrik Schmieder

2008-06-06 19:40:44 UTC

Lars Erdmann schrieb:
>
> Hi,
> >
> > Is there any part of OS/2 other than JFS that has been opensourced?
> > My trap was in the kernel.
>
> Certainly not the kernel. But looking at the disassembled code and
> identifying the failing routine with the help of the .SYM file gives enough
> information to report that bug to the eCS bugtracker. And eCS has access to
> the kernel sources.

This is not true:
Serenity Systems has asked IBM for the source, but IBM denied access.

This is the official statement from Serenity Systems.

Hendrik

Ilya Zakharevich

2008-05-18 18:39:36 UTC

[A complimentary Cc of this posting was NOT [per weedlist] sent to
Lars Erdmann
<***@arcor.de>], who wrote in article <48303ba1$0$7538$***@newsspool1.arcor-online.net>:
> > The only difference I know is in the APIs for things such as thread
> > affinity. They aren't there in W4, but are in UNI. One example might be
> > DosGetProcessorStatus, but it's a long while ago and I forget the exact
> > details.
>
> But the UNI kernel is for singleprocessor systems. So "thread affinity"
> makes no sense here, there is only 1 CPU, n'est-ce pas ?

My memory may be hazy, but IIRC, the intent of UNI is for
MULTIPROCESSOR systems when you want to use only one processor.

Yours,
Ilya

t***@antispam.ham

2008-05-16 20:41:59 UTC

Ilya Zakharevich <nospam-***@ilyaz.org> writes:

>>>> I don't think it's immune. It obviously isn't, given the crash. But I
>>>> would expect it to be coded so that it is immune from the same kind of
>>>> problem as user code, simply because it's the kernel and therefore more
>>>> important than typical user code.

>>> So it is much more important to crash (given an appropriate opportunity
>>> ;-). Applications may have much larger freedom: "I do not know what
>>> happened, so let me try to ignore this".

>> Why is it more important for the kernel to crash? Such a crash takes
>> down everything running on the computer: all programs, and in the case
>> of a multi-user system, all users as well. Would you like to be flying
>> on an airplane whose control computer crashed because of a kernel bug?

> Because non-crashing has a possibility of taking down much more than
> crashing.

Huh? If crashing takes down everything, then how can non-crashing take
down more than everything?

>>> All programs contain hundreds of bugs.

>> I disagree.

> Well, can't match this with your claim about programming experience...

That's your problem.

>>> It is important to check BEFOREHAND whether your cell phone could make
>>> a readable image of a trap screen (or something else) when you are in
>>> a hurry. (Most cell phone cameras are focus-free, so do not allow a
>>> good rendition of text. But given a large enough screen, they may
>>> have a chance.)

>> I had no camera handy either.

> There ARE reasons to have cell phone cameras... ;-)

There ARE reasons not to have a cell phone.

Ilya Zakharevich

2008-05-16 21:28:47 UTC

[A complimentary Cc of this posting was sent to

<***@antispam.ham>], who wrote in article <482df197$0$30525$***@roadrunner.com>:
> > Because non-crashing has a possibility of taking down much more than
> > crashing.
>
> Huh? If crashing takes down everything, then how can non-crashing take
> down more than everything?

Are you kidding, or what? Did you think about overwriting data on the
filesystems, initiating spurious financial transactions, closing
garage doors down on people's heads, etc?

> > There ARE reasons to have cell phone cameras... ;-)
>
> There ARE reasons not to have a cell phone.

Sure. This is what "Airplane mode" is about. ;-)

Yours,
Ilya

t***@antispam.ham

2008-05-16 23:36:16 UTC

Ilya Zakharevich <nospam-***@ilyaz.org> writes:

>>> Because non-crashing has a possibility of taking down much more than
>>> crashing.

>> Huh? If crashing takes down everything, then how can non-crashing take
>> down more than everything?

> Are you kidding, or what?

I could ask you the same thing.

> Did you think about overwriting data on the filesystems,

I've seen that happen in the process of a system crashing. Indeed,
after I rebooted, a JFS partition on a USB drive had to be fixed before
I could access it. That's another good reason for making the kernel
as robust as possible.

> initiating spurious financial transactions, closing
> garage doors down on people's heads, etc?

My computer isn't attached to any garage door opener. But if you
have one that is, then you have another good reason for making the
kernel as robust as possible.

>>> There ARE reasons to have cell phone cameras... ;-)

>> There ARE reasons not to have a cell phone.

> Sure. This is what "Airplane mode" is about. ;-)

Does the cell phone service provider stop charging you for service
when airplane mode is on?

Ilya Zakharevich

2008-05-17 00:33:17 UTC

[A complimentary Cc of this posting was sent to

<***@antispam.ham>], who wrote in article <482e1a6f$0$5178$***@roadrunner.com>:
> > Did you think about overwriting data on the filesystems,
>
> I've seen that happen in the process of a system crashing.

No, you did not.

> Indeed,
> after I rebooted, a JFS partition on a USB drive had to be fixed before
> I could access it.

So you see that that HD was not overwritten.

> That's another good reason for making the kernel as robust as
> possible.

Irrelevant. We are discussing what the kernel should do when it discovers
that it is not as robust as needed.

> > initiating spurious financial transactions, closing
> > garage doors down on people's heads, etc?
>
> My computer isn't attached to any garage door opener.

And how should the kernel know this?

> Does the cell phone service provider stop charging you for service
> when airplane mode is on?

No. But I think you do not need a service provider in such a case.

Hope this helps,
Ilya

t***@antispam.ham

2008-05-17 09:27:58 UTC

Ilya Zakharevich <nospam-***@ilyaz.org> writes:

>>> Did you think about overwriting data on the filesystems,

>> I've seen that happen in the process of a system crashing.

> No, you did not.

Why do you say that? I've had perfectly fine data files immediately
before a crash and ruined data files immediately after rebooting
following a crash.

>> Indeed,
>> after I rebooted, a JFS partition on a USB drive had to be fixed before
>> I could access it.

> So you see that that HD was not overwritten.

On the contrary, information was overwritten, which is why it had
to be rebuilt.

>> That's another good reason for making the kernel as robust as
>> possible.

> Irrelevant.

On the contrary, it's quite relevant.

> We are discussing what the kernel should do when it discovers
> that it is not as robust as needed.

No, we've been discussing why I didn't think that TRAP 0000
was associated with the kernel.

>>> initiating spurious financial transactions, closing
>>> garage doors down on people's heads, etc?

>> My computer isn't attached to any garage door opener.

> And how should the kernel know this?

It doesn't need to.

>> Does the cell phone service provider stop charging you for service
>> when airplane mode is on?

> No.

There's your reason.

> But I think you do not need a service provider in such a case.

An ornamental cell phone?

> Hope this helps,

Sorry to disappoint you.

Ilya Zakharevich

2008-05-17 20:52:07 UTC

[A complimentary Cc of this posting was sent to

<***@antispam.ham>], who wrote in article <482ea51e$0$3378$***@roadrunner.com>:
> >>> Did you think about overwriting data on the filesystems,
>
> >> I've seen that happen in the process of a system crashing.
>
> > No, you did not.
>
> Why do you say that? I've had perfectly fine data files immediately
> before a crash and ruined data files immediately after rebooting
> following a crash.

In which way were they "ruined"? If they were overwritten, then this
happened *before* the crash. (With smart enough chkdsk, typically,
files are just truncated, or lose their names.)

> >> Indeed,
> >> after I rebooted, a JFS partition on a USB drive had to be fixed before
> >> I could access it.
>
> > So you see that that HD was not overwritten.
>
> On the contrary, information was overwritten, which is why it had
> to be rebuilt.

No. The maximum which may have happened after the crash is that the
information was NOT written. (A chkdsk run may have overwritten
stuff, but this is a different topic)

> > We are discussing what the kernel should do when it discovers
> > that it is not as robust as needed.

> No, we've been discussing why I didn't think that TRAP 0000
> was associated with the kernel.

Same difference.

> >>> initiating spurious financial transactions, closing
> >>> garage doors down on people's heads, etc?
>
> >> My computer isn't attached to any garage door opener.
>
> > And how should the kernel know this?
>
> It doesn't need to.

??? Then it must abort.

> > But I think you do not need a service provider in such a case.
>
> An ornamental cell phone?

If a $10 digicam from drugstore is good enough to make readable copies
of such stuff as crash screens, go with it instead.

Hope this helps,
Ilya

t***@antispam.ham

2008-05-17 23:56:13 UTC

Ilya Zakharevich <nospam-***@ilyaz.org> writes:

>>>>> Did you think about overwriting data on the filesystems,

>>>> I've seen that happen in the process of a system crashing.

>>> No, you did not.

>> Why do you say that? I've had perfectly fine data files immediately
>> before a crash and ruined data files immediately after rebooting
>> following a crash.

> In which way were they "ruined"?

Garbage replaced valid data.

> If they were overwritten, then this
> happened *before* the crash.

I've had perfectly fine data files being actively looked at by myself
that are fine up to the instant of the system crashing, and garbage
after a reboot.

> (With smart enough chkdsk, typically,
> files are just truncated, or lose their names.)

And truncation doesn't ruin them?

>>>> Indeed,
>>>> after I rebooted, a JFS partition on a USB drive had to be fixed before
>>>> I could access it.

>>> So you see that that HD was not overwritten.

>> On the contrary, information was overwritten, which is why it had
>> to be rebuilt.

> No. The maximum which may have happened after the crash is that the
> information was NOT written.

I said nothing about damage occurring after a crash. Reread my
statement above, where it says "in the process of a system crashing".

> (A chkdsk run may have overwritten
> stuff, but this is a different topic)

Doesn't that contradict the statement you just made?

>>> We are discussing what the kernel should do when it discovers
>>> that it is not as robust as needed.

>> No, we've been discussing why I didn't think that TRAP 0000
>> was associated with the kernel.

> Same difference.

Incorrect.

>>>>> initiating spurious financial transactions, closing
>>>>> garage doors down on people's heads, etc?

>>>> My computer isn't attached to any garage door opener.

>>> And how should the kernel know this?

>> It doesn't need to.

> ??? Then it must abort.

Abort doing what? I just said that it isn't attached to any
garage door opener.

>>> But I think you do not need a service provider in such a case.

>> An ornamental cell phone?

> If a $10 digicam from drugstore is good enough to make readable copies
> of such stuff as crash screens, go with it instead.

I've not seen any $10 digicams in any drugstores, and if I had, I would
not have trusted the quality of the unit enough to consider purchasing
one.

Lars Erdmann

2008-05-17 01:37:31 UTC

Hi,

>> Because non-crashing has a possibility of taking down much more than
>> crashing.
>
> Huh? If crashing takes down everything, then how can non-crashing take
> down more than everything?

What he wanted to say is that immediate abortion ("crashing") can be the
better choice than continuation (" non-crashing") on unsafe ground ...

Lars
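
A minimal sketch of the point above, in Python with hypothetical names: once an internal invariant is found to be violated, stopping immediately leaves the stored data as it was, whereas carrying on writes results derived from a bad state. This is only an illustration of the fail-fast idea, not a description of how OS2KRNL handles traps.

```python
class KernelPanic(Exception):
    """Stands in for a trap screen: nothing runs after this is raised."""


def update_free_count(disk, free_count, fail_fast=True):
    """Write a bookkeeping record; a negative count means internal state is corrupt."""
    if free_count < 0 and fail_fast:
        raise KernelPanic("free block count is negative")
    # Reached with bad data only when fail_fast is switched off.
    disk["free_blocks"] = free_count
    return disk


# "Crashing": the record on the pretend disk is left untouched.
good = {"free_blocks": 10}
try:
    update_free_count(good, -3)
except KernelPanic:
    pass
print(good)

# "Non-crashing": the corrupt value is written out.
bad = {"free_blocks": 10}
update_free_count(bad, -3, fail_fast=False)
print(bad)
```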

Steven Levine

2008-05-15 15:36:24 UTC

In <482bf7ec$0$3390$***@roadrunner.com>, on 05/15/2008
at 08:44 AM, ***@antispam.ham said:

>I can see user programs doing that, and they can crash without taking
>down the system. But shouldn't the system protect itself from such a
>crash by testing for the exception and avoiding it?

I'm of the opinion that user programs should avoid trapping too. We don't
live in a perfect world with perfect knowledge.

Steven

--
--------------------------------------------------------------------------------------------
Steven Levine <***@earthlink.bogus.net> MR2/ICE 3.00 beta 11pre11 #10183
eCS/Warp/DIY/14.103a_W4 www.scoug.com irc.ca.webbnet.info #scoug (Wed 7pm PST)
--------------------------------------------------------------------------------------------

t***@antispam.ham

2008-05-15 20:39:08 UTC

Steven Levine writes:

>> I can see user programs doing that, and they can crash without taking
>> down the system. But shouldn't the system protect itself from such a
>> crash by testing for the exception and avoiding it?

> I'm of the opinion that user programs should avoid trapping too. We don't
> live in a perfect world with perfect knowledge.

I'm of the opinion that it depends on the trade-off between speed and
robustness. If a trap is a one-in-a-million occurrence, I'd rather live
with that than suffer a guaranteed one-in-a-hundred slowdown of the code.
Really depends on how mission critical the code is. I don't suppose the
military would want a nuke to go off unexpectedly because of a divide by
zero error in some code!
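
As a hedged illustration of the trade-off above: on x86, the DIV instruction raises the divide error (trap 0) both when the divisor is zero and when the quotient does not fit in the destination register. The Python sketch below models the 16-bit form (a 32-bit dividend in DX:AX divided by a 16-bit divisor) and shows the kind of pre-check a caller would have to pay for on every division. The function names are hypothetical, and this models the instruction only; it is not OS2KRNL code.

```python
from typing import Tuple


class DivideError(Exception):
    """Models the x86 divide error exception (vector 0)."""


def div16(dividend: int, divisor: int) -> Tuple[int, int]:
    """Model of unsigned DIV r/m16: 32-bit dividend, 16-bit divisor.

    Returns (quotient, remainder). Raises DivideError when the divisor is
    zero or when the quotient does not fit in 16 bits.
    """
    if not 0 <= dividend <= 0xFFFFFFFF or not 0 <= divisor <= 0xFFFF:
        raise ValueError("operands out of range for the 16-bit form")
    if divisor == 0:
        raise DivideError("divisor is zero")
    quotient, remainder = divmod(dividend, divisor)
    if quotient > 0xFFFF:
        raise DivideError("quotient does not fit in 16 bits")
    return quotient, remainder


def would_trap(dividend: int, divisor: int) -> bool:
    """Pre-check: DIV faults unless the high word of the dividend is below the divisor."""
    return (dividend >> 16) >= divisor


print(div16(0x0001FFFF, 2))        # largest dividend that still fits for divisor 2
print(would_trap(0x00020000, 2))   # quotient would need 17 bits
print(would_trap(10, 0))           # division by zero is just a special case
```

The single comparison in would_trap() covers both the zero divisor and the "dividend too large" case mentioned earlier in the thread, which is why the exception is better described as a divide overflow.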

2008-05-16 19:35:46 UTC

On May 15, 4:44 am, ***@antispam.ham wrote:
> Paul Ratcliffe writes:
> > Steven Levine wrote:
> >>> Why only user programs ? It's perfectly possible that there is a bug in
> >>> OS2KRNL where it attempts to divide by zero due to a non-expected divisor
> >>> value ...
> >> I've always found the naming of trap 0 interesting. My experience is that it
> >> occurs more often when the dividend is too large.
> > When the dividend is too large and/or the divisor is too small.
> > Remember the old Borland runtime library error 200 on faster (of the day) CPUs?
> > That was a 'divide by zero' error where it wasn't dividing by zero.
>
> > The meaning of the exception is actually 'divide overflow' - it's just that
> > it is usually caused by dividing by zero because that always generates that
> > exception.
>
> I can see user programs doing that, and they can crash without taking down
> the system.

What you see is irrelevant, Tholen.

>But shouldn't the system protect itself from such a crash by
> testing for the exception and avoiding it?

Only if the software engineers programmed such a test into the system,
Tholen.

t***@antispam.ham

2008-05-16 20:54:12 UTC

NB <***@gmail.com> writes:

>> Paul Ratcliffe wrote:

>>> Steven Levine wrote:

>>>>> Why only user programs ? It's perfectly possible that there is a bug in
>>>>> OS2KRNL where it attempts to divide by zero due to a non-expected divisor
>>>>> value ...

>>>> I've always found the naming of trap 0 interesting. My experience is that it
>>>> occurs more often when the dividend is too large.

>>> When the dividend is too large and/or the divisor is too small.
>>> Remember the old Borland runtime library error 200 on faster (of the day) CPUs?
>>> That was a 'divide by zero' error where it wasn't dividing by zero.
>>>
>>> The meaning of the exception is actually 'divide overflow' - it's just that
>>> it is usually caused by dividing by zero because that always generates that
>>> exception.

>> I can see user programs doing that, and they can crash without taking down
>> the system.

> What you see is irrelevant, Tholen.

Further proof that you are a troll, nobuyout. You're so obsessed that
you're following me around to another newsgroup.

>> But shouldn't the system protect itself from such a crash by
>> testing for the exception and avoiding it?

> Only if the software engineers programmed such a test into the system,
> Tholen.

Classic begging of the question. But not unexpected from a troll who
has no real intent of addressing the issue at hand.

2008-05-19 13:57:39 UTC

Permalink

On May 16, 4:54 pm, ***@antispam.ham wrote:
> NB <***@gmail.com> writes:
> >> Paul Ratcliffe wrote:
> >>> Steven Levine wrote:
> >>>>> Why only user programs ? It's perfectly possible that there is a bug in
> >>>>> OS2KRNL where it attempts to divide by zero due to a non-expected divisor
> >>>>> value ...
> >>>> I've always found the naming of trap 0 interesting. My experience that it
> >>>> occurs more often when the dividend is too large.
> >>> When the dividend is too large and/or the divisor is too small.
> >>> Remember the old Borland runtime library error 200 on faster (of the day) CPUs?
> >>> That was a 'divide by zero' error where it wasn't dividing by zero.
>
> >>> The meaning of the exception is actually 'divide overflow' - it's just that
> >>> it is usually caused by dividing by zero because that always generates that
> >>> exception.
> >> I can see user programs doing that, and they can crash without taking down
> >> the system.
> > What you see is irrelevant, Tholen.
>
> Further proof that you are a troll, nobuyout.

Classic unsubstantiated and erroneous claim from Tholen.

>You're so obsessed that
> you're following me around to another newsgroup.

Classic erroneous presupposition, given that I have used OS/2 in the
past.

> >> But shouldn't the system protect itself from such a crash by
> >> testing for the exception and avoiding it?
> > Only if the software engineers programmed such a test into the system,
> > Tholen.
>
> Classic begging of the question.

Classic unsubstantiated and erroneous claim from Tholen.

>But not unexpected from a troll

Classic erroneous presupposition.

>who
> has no real intent of addressing the issue at hand.

More proof that Tholen is a troll, given that I provided a valid
answer to his question, and he responded with an antagonistic reply.

t***@antispam.ham

2008-05-19 20:07:31 UTC

Permalink

NB <***@gmail.com> writes:

>>>> Paul Ratcliffe wrote:

>>>>> Steven Levine wrote:

>>>>>>> Why only user programs ? It's perfectly possible that there is a bug in
>>>>>>> OS2KRNL where it attempts to divide by zero due to a non-expected divisor
>>>>>>> value ...

>>>>>> I've always found the naming of trap 0 interesting. My experience that it
>>>>>> occurs more often when the dividend is too large.

>>>>> When the dividend is too large and/or the divisor is too small.
>>>>> Remember the old Borland runtime library error 200 on faster (of the day) CPUs?
>>>>> That was a 'divide by zero' error where it wasn't dividing by zero.

>>>>> The meaning of the exception is actually 'divide overflow' - it's just that
>>>>> it is usually caused by dividing by zero because that always generates that
>>>>> exception.

>>>> I can see user programs doing that, and they can crash without taking down
>>>> the system.

>>> What you see is irrelevant, Tholen.

>> Further proof that you are a troll, nobuyout.

> Classic unsubstantiated and erroneous claim from Tholen.

On the contrary, the substantiation was provided, nobuyout.

>> You're so obsessed that
>> you're following me around to another newsgroup.

> Classic erroneous presupposition,

Classic unsubstantiated and erroneous claim.

> given that I have used OS/2 in the past.

Doesn't change the fact that you followed me to this newsgroup, nobuyout.

>>>> But shouldn't the system protect itself from such a crash by
>>>> testing for the exception and avoiding it?

>>> Only if the software engineers programmed such a test into the system,
>>> Tholen.

>> Classic begging of the question.

> Classic unsubstantiated and erroneous claim from Tholen.

Classic unsubstantiated and erroneous claim from nobuyout.

>> But not unexpected from a troll

> Classic erroneous presupposition.

Classic unsubstantiated and erroneous claim.

>> who has no real intent of addressing the issue at hand.

> More proof that Tholen is a troll,

Classic unsubstantiated and erroneous claim.

> given that I provided a valid answer to his question,

Classic unsubstantiated and erroneous claim.

> and he responded with an antagonistic reply.

Classic unsubstantiated and erroneous claim.

2008-05-19 20:28:35 UTC

Permalink

On May 19, 4:07 pm, ***@antispam.ham wrote:
> NB <***@gmail.com> writes:
> >>>> Paul Ratcliffe wrote:
> >>>>> Steven Levine wrote:
> >>>>>>> Why only user programs ? It's perfectly possible that there is a bug in
> >>>>>>> OS2KRNL where it attempts to divide by zero due to a non-expected divisor
> >>>>>>> value ...
> >>>>>> I've always found the naming of trap 0 interesting. My experience that it
> >>>>>> occurs more often when the dividend is too large.
> >>>>> When the dividend is too large and/or the divisor is too small.
> >>>>> Remember the old Borland runtime library error 200 on faster (of the day) CPUs?
> >>>>> That was a 'divide by zero' error where it wasn't dividing by zero.
> >>>>> The meaning of the exception is actually 'divide overflow' - it's just that
> >>>>> it is usually caused by dividing by zero because that always generates that
> >>>>> exception.
> >>>> I can see user programs doing that, and they can crash without taking down
> >>>> the system.
> >>> What you see is irrelevant, Tholen.
> >> Further proof that you are a troll, nobuyout.
> > Classic unsubstantiated and erroneous claim from Tholen.
>
> On the contrary, the substantiation was provided, nobuyout.

So that means you are a troll, Tholen, given that you have frequently
told people that what they see is irrelevant.

> >> You're so obsessed that
> >> you're following me around to another newsgroup.
> > Classic erroneous presupposition,
>
> Classic unsubstantiated and erroneous claim.

Classic unsubstantiated and erroneous claim from Tholen.

>
> > given that I have used OS/2 in the past.
>
> Doesn't change the fact that you followed me to this newsgroup, nobuyout.

Classic erroneous presupposition, given that I can visit this
newsgroup without seeking you out, Tholen.

> >>>> But shouldn't the system protect itself from such a crash by
> >>>> testing for the exception and avoiding it?
> >>> Only if the software engineers programmed such a test into the system,
> >>> Tholen.
> >> Classic begging of the question.
> > Classic unsubstantiated and erroneous claim from Tholen.
>
> Classic unsubstantiated and erroneous claim from nobuyout.

Classic unsubstantiated and erroneous claim from Tholen.
>
> >> But not unexpected from a troll
> > Classic erroneous presupposition.
>
> Classic unsubstantiated and erroneous claim.

Classic unsubstantiated and erroneous claim from Tholen.
>
> >> who has no real intent of addressing the issue at hand.
> > More proof that Tholen is a troll,
>
> Classic unsubstantiated and erroneous claim.

Classic unsubstantiated and erroneous claim from Tholen.
>
> > given that I provided a valid answer to his question,
>
> Classic unsubstantiated and erroneous claim.

Classic unsubstantiated and erroneous claim from Tholen.
>
> > and he responded with an antagonistic reply.
>
> Classic unsubstantiated and erroneous claim.

Classic unsubstantiated and erroneous claim from Tholen.