Unique Constraint Based on Dual GUID

Discussion:

(too old to reply)

Michael Cole

2015-08-11 04:07:19 UTC

Brief Synopsis: -

I have a table with a GUID as PK (created via NewId())

I also have a table to indicate links between records in this table,
with a dual PK of GUIDS (for the two records that connect to each
other)

Putting a constrain on these two fields will limit it to only one
combination of the two fields, i.e., a link of A to B, but I also need
to ensure that the link is not duplicated as B to A - the link is
non-directional.

My idea was to include a calculated field of the two GUIDs XORed
together, and place a constraint on this calculated field. Can anyone
see any issues with this idea?

For reference, the function would be: -

CREATE FUNCTION GUIDXor ( @guid1 UNIQUEIDENTIFIER, @guid2
UNIQUEIDENTIFIER)
RETURNS UNIQUEIDENTIFIER
AS
BEGIN

-- variables
DECLARE @vb1 BINARY(16), @vb2 BINARY(16), @lo BINARY(8), @hi BINARY(8)

-- split every 8 bytes into a binary(8), which is a bigint, the
largest size usable with XOR
SELECT @vb1 = @guid1, @vb2 = @guid2

-- xor the high and low parts separately
SELECT @hi = CONVERT(binary(8), SUBSTRING(@vb1,1,8)) ^ CONVERT(bigint,
SUBSTRING(@vb2,1,8))
SELECT @lo = CONVERT(binary(8), SUBSTRING(@vb1,9,8)) ^ CONVERT(bigint,
SUBSTRING(@vb2,9,8))

RETURN CONVERT(UNIQUEIDENTIFIER, @hi + @lo)

END
GO

SELECT dbo.GUIDXor('96B4316D-1EA7-4CA3-8D50-FEE8047C1329',
'FFFFFFFF-FFFF-FFFF-FFFF-FFFFFFFFFFFF')
SELECT dbo.GUIDXor('96B4316D-1EA7-4CA3-8D50-FEE8047C1329',
'00000000-0000-0000-0000-000000000000')

--
Michael Cole

Erland Sommarskog

2015-08-11 19:35:46 UTC

Permalink

Post by Michael Cole
I have a table with a GUID as PK (created via NewId())
I also have a table to indicate links between records in this table,
with a dual PK of GUIDS (for the two records that connect to each
other)
Putting a constrain on these two fields will limit it to only one
combination of the two fields, i.e., a link of A to B, but I also need
to ensure that the link is not duplicated as B to A - the link is
non-directional.
My idea was to include a calculated field of the two GUIDs XORed
together, and place a constraint on this calculated field. Can anyone
see any issues with this idea?

There is obviously a risk for false positives:

SELECT CASE WHEN dbo.GUIDXor('DF468CF1-9035-477D-BE8B-3E597509F8E2',
'1B05A03D-B830-43A8-8396-9FE8364F952C') =
dbo.GUIDXor('DF468CF1-9035-43A8-BE8B-3E597509F8E2',
'1B05A03D-B830-477D-8396-9FE8364F952C')
THEN 1
ELSE 0
END

Here I have simply swapped a number of bytes in the GUID. What the
actual probability for this to happen, I don't know though.

You need to create the functon WITH SCHEMABINDING, but you may already
have discovered that.

--
Erland Sommarskog, Stockholm, ***@sommarskog.se

rpresser

2015-08-11 19:46:44 UTC

Permalink

Post by Michael Cole
Putting a constrain on these two fields will limit it to only one
combination of the two fields, i.e., a link of A to B, but I also need
to ensure that the link is not duplicated as B to A - the link is
non-directional.
My idea was to include a calculated field of the two GUIDs XORed
together, and place a constraint on this calculated field. Can anyone
see any issues with this idea?

Rather than an XOR, which as Erland points out could cause false positives, why not use a calculated field that appends the two GUIDs together with the one that is "less" coming first?

CREATE FUNCTION OneWayOnly
(
@guid1 UNIQUEIDENTIFIER ,
@guid2 UNIQUEIDENTIFIER
)
RETURNS VARCHAR(72)
AS
BEGIN
DECLARE @g1 VARCHAR(36) ,
@g2 VARCHAR(36) ,
@result VARCHAR(72);
SET @g1 = CONVERT(VARCHAR(36), @guid1);
SET @g2 = CONVERT(VARCHAR(36), @guid2);
IF @g1 < @g2
SET @result = @g1 + @g2;
ELSE
SET @result = @g2 + @g1;
RETURN @result;
END;

--CELKO--

2015-08-11 22:30:18 UTC

Permalink

I have a table with a GUID as PK (created via NewId()) <<

The purpose of a GUID is to locate a GLOBAL resource, not to serve as an attribute inside the schema. Since it is not an attribute of anything, it cannot be a key by definition. Keys are not created in RDBMS; they are discovered in the data itself.

What you are doing is trying to mimic a 1970's network database with fake pointer chains.

I also have a table to indicate links [sic] between records [sic] in this table, with a dual [sic: compound?] PK of GUIDS (for the two records [sic] that connect [sic: we reference] to each other)<<

In RDBMS, we have references, not links. That is more terminology from pointer chains. Rows are nothing like records. Records are physical, have no constraints, etc. Rows are logical and might not be physical at all, have DRI and CHECK() constraints, etc. Event eh term "connect" is a pointer chain term!

Putting a constraint on these two fields [sic] will limit it to only one combination of the two fields [sic], i.e., a link [sic] of A to B, but I also need to ensure that the link [sic] is not duplicated as B to A - the link [sic] is non-directional. <<

Again, fields are nothing like columns. It is that "physical versus logical" concept again. There is no direction in SQL; it is based on sets and not pointer chains or vectors.

In SQL, we would have UNIQUE (a, b) and CHECK (a < b) instead.

My idea was to include a calculated field [sic] of the two GUIDs XORed together, and place a constraint on this calculated field [sic]. Can anyone see any issues with this idea? <<

This is even bad from a network database model. The first rule was never do pointer arithmetic; always let the DB handle housekeeping and pointer allocations.

Everything you are doing is completely wrong, based on a lack of any understanding of RDBMS. Will your boss pay for an intro course of some kind? I miss the 1970's, too, but not IMS, IDMS, TOTAL, Image/300 and all the other zombie products you are trying to revive :)

rpresser

2015-08-12 03:13:30 UTC

Permalink

Post by --CELKO--
In SQL, we would have UNIQUE (a, b) and CHECK (a < b) instead.

Addressing this and only this point you've made, and not addressing flaws
in the original poster's model (which you have already done):

There is no guarantee that the left side GUID actually is less than the
right side; your check constraint would very likely prevent adding references
to the join table.

--CELKO--

2015-08-13 01:26:13 UTC

Permalink

You missed my point! GUIDs are not for keys; they are locators for external global resources. This guy wants to use them for pointers like we did in the 1970'd network databases.

If you use an actual compound key (a,b), then either they are drawn from two different domains that are independent (longitude, latitude), two different domains that are dependent (clothing size, colors), or the same domain (first color and second color). In the final case, the constraint (a < b) prevents (b,a), (a,a), and (b,b) Which means that the two-tone clothing item is only shown one way and really has two tones to it.

Michael Cole

2015-08-13 01:59:24 UTC

Permalink

Post by --CELKO--
You missed my point! GUIDs are not for keys; they are locators for external
global resources. This guy wants to use them for pointers like we did in the
1970'd network databases.
If you use an actual compound key (a,b), then either they are drawn from two
different domains that are independent (longitude, latitude), two different
domains that are dependent (clothing size, colors), or the same domain (first
color and second color). In the final case, the constraint (a < b) prevents
(b,a), (a,a), and (b,b) Which means that the two-tone clothing item is only
shown one way and really has two tones to it.

Except that sometimes you need to use artificial keys, either due to
the data itself does not lend itself to a nice easy compound key or
becuse the number of fields to be included in the compound key would be
enourmous.

In this particular case, the table is for connections between entity
roles.

There is no way of uniquely identifying an entity - they can be
combinations of first, middle and last names, or legal or trading
names, and without an artificial key, it would be unworkable.

We then add into this mix that each entity could be performing multiple
roles, and that their role allocation may change over time - it gets
rather complicated.

Particularly given that the roles that they may be performing will most
likely be different for the different clients that they have.

We then get to the final piece of the puzzle which is that an entity
performing a specific role may then require a connection to another
entity perfoming a role.

Not everything is as simple as you make out. Regardless of your
critisisms of my terminology, I am developing software for actual
usage, and whilst it may not be perfect, my boss is more concerned with
it being productive and useful.

And no, he's probably not going to pay for an intro course of some
kind. Our clients don't pay for terminology.

Now you are very knowledgable, and there are useful bits in what you
say, but not everything is as pristine as you would like it to be, and
the general lecturing tone does not help.

--
Michael Cole