Nick Hall
2016-10-01 23:07:56 UTC
Devs,
The code generally looks good, but not quite finished.
Firstly, I didn't look at the experimental API code in any detail. This
is part of GEPS 33 and currently unused. It needs to be moved into a
branch.
Next, I have a list of issues:
The autobackup method imports from cli, which is not allowed. The gen
module should be self-contained.
The public API has some new methods which need to be defined in the base
class. e.g. get_summary and get_schema_version
There are some public methods which need to be private or protected.
e.g. update_backlinks and update_secondary_values
BSDDB tables look like dictionaries. A Map class provides this
functionality in the DB-API code.
Only the import code accesses the database in this way. It would be
quite easy to rewrite the code to remove the need for the Map class. It
may be better to make this change now.
The get_*_from_gramps_id, has_*_handle, and get_raw_*_data methods use
the maps. This is not a good idea if we plan to get rid of them.
Why don't the get_raw_*_data methods just call the corresponding
_get_raw_*_data method? Do we plan do deprecate the raw methods?
BSDDB provides cursors to quickly iterate over data in a table. A Cursor
and TreeCursor class provides this functionality in the DB-API code.
The TreeCursor doesn't seem to have been implemented correctly. Cursors
are only used in the list/tree views. Do we want to replace them?
There are new has_handle_for_* methods which perform the same
functionality as has_*_handle. They should be removed. We don't need
duplicates.
There are new has_gramps_id_for_* methods. They should be renamed
has_*_gramps_id for consistency.
Do we really need the new get_*_gramps_ids methods?
Some groups of methods share repeated code. This needs tidying up.
My main concern is with the secondary fields. These are extra columns
that are added for indexing.
Some columns have long names such as
primary_name__surname_list__0__surname. This is rather messy.
There are some duplicates (in addition to duplicating information in the
serialised data). The person table has columns gender and gender_type;
surname and primary_name__surname_list__0__surname for example.
Columns that contain data from other tables are not updated. For
example the father_handle__primary_name__first_name column is not
updated when the name of the father changes.
Secondary columns are added automatically from a schema in the primary
objects. We need to index some fields such as gramps_id, but should we
index all top level fields with a simple type? What is the point of
indexing tag color or media checksum for example? Adding extra indexes
will slow down imports.
Indexing isn't as simple as creating an index for every column.
Sometimes it is better to index on multiple columns for example
(latitude, longitude) or (surname, first_name). In Gramps, the "sort
as" field also has to be taken into account for names.
The secondary field/column design needs to be reviewed. A simpler
implementation may be preferable.
I think that I have said enough for now. I await any comments.
Regards,
Nick.
The code generally looks good, but not quite finished.
Firstly, I didn't look at the experimental API code in any detail. This
is part of GEPS 33 and currently unused. It needs to be moved into a
branch.
Next, I have a list of issues:
The autobackup method imports from cli, which is not allowed. The gen
module should be self-contained.
The public API has some new methods which need to be defined in the base
class. e.g. get_summary and get_schema_version
There are some public methods which need to be private or protected.
e.g. update_backlinks and update_secondary_values
BSDDB tables look like dictionaries. A Map class provides this
functionality in the DB-API code.
Only the import code accesses the database in this way. It would be
quite easy to rewrite the code to remove the need for the Map class. It
may be better to make this change now.
The get_*_from_gramps_id, has_*_handle, and get_raw_*_data methods use
the maps. This is not a good idea if we plan to get rid of them.
Why don't the get_raw_*_data methods just call the corresponding
_get_raw_*_data method? Do we plan do deprecate the raw methods?
BSDDB provides cursors to quickly iterate over data in a table. A Cursor
and TreeCursor class provides this functionality in the DB-API code.
The TreeCursor doesn't seem to have been implemented correctly. Cursors
are only used in the list/tree views. Do we want to replace them?
There are new has_handle_for_* methods which perform the same
functionality as has_*_handle. They should be removed. We don't need
duplicates.
There are new has_gramps_id_for_* methods. They should be renamed
has_*_gramps_id for consistency.
Do we really need the new get_*_gramps_ids methods?
Some groups of methods share repeated code. This needs tidying up.
My main concern is with the secondary fields. These are extra columns
that are added for indexing.
Some columns have long names such as
primary_name__surname_list__0__surname. This is rather messy.
There are some duplicates (in addition to duplicating information in the
serialised data). The person table has columns gender and gender_type;
surname and primary_name__surname_list__0__surname for example.
Columns that contain data from other tables are not updated. For
example the father_handle__primary_name__first_name column is not
updated when the name of the father changes.
Secondary columns are added automatically from a schema in the primary
objects. We need to index some fields such as gramps_id, but should we
index all top level fields with a simple type? What is the point of
indexing tag color or media checksum for example? Adding extra indexes
will slow down imports.
Indexing isn't as simple as creating an index for every column.
Sometimes it is better to index on multiple columns for example
(latitude, longitude) or (surname, first_name). In Gramps, the "sort
as" field also has to be taken into account for names.
The secondary field/column design needs to be reviewed. A simpler
implementation may be preferable.
I think that I have said enough for now. I await any comments.
Regards,
Nick.