Possible improvements for template and converter #124

Open
19 tasks
dalito opened this issue Jul 5, 2023 · 12 comments
Labels
breaking (Changes breaking backward compatibility), discussion (This needs more discussion)

@dalito
Member

dalito commented Jul 5, 2023

Taken over from nfdi4cat/VocExcel#2

Template

This is for discussing possible future structural changes. This is not urgent but may serve as a checklist to review before the next big template-version step (in descending priority):

  • Get rid of "Additional Concept Features" sheet. The columns can be moved to the Concepts sheet.
  • The size of a collection is currently constrained by the maximum number of characters in an xlsx cell (32,767). Depending on the IRI length, this corresponds to a few thousand concepts per collection, which is probably large enough for most applications. However, the UX for adding or editing collection membership is rather poor. It would be much nicer if membership could be marked in the concept sheet, for example by adding one column per collection to the "Concepts" sheet. Excel filters could then be used to edit or review the membership of concepts in collections efficiently. If we change the template like this, we will have a tighter limit on the number of collections, since xlsx sheets are limited to 16,384 columns. However, a limit of roughly 16,000 collections should be even less of a problem than the current limit of a few thousand concepts per collection.
  • Support skos:altLabel in multiple languages. Currently it is assumed that all (comma-separated) altLabels are given in the default language "en".
  • Support different languages for collection prefLabel.
  • Make multi-language data entry easier by specifying just one language per line in the concept sheet.
  • Move the template version to a better place and add information about the minimum version of voc4cat required.
  • Use a different separator than comma in xlsx cells. Users often use a comma as part of the text and/or use a semicolon as separator because they are used to semicolons from Excel formulas. The suggestion is to separate URLs from other URLs or text by a space and/or a newline. For pure text fields (alternate labels), we should consider using a vertical bar | as separator.
  • Add a notes/feedback column (skos:editorialNote) for editorial purposes. It could be used by tools (or humans) to add notes which are relevant for editing the concept/collection. This column may also be used by checking tools.
  • Change the provenance column to a skos:changeNote column to store change notes including date and author. Allow multiple lines, each with <date> <gh-name> <change-note-text>. This structure will be validated so that correct dcterms:provenance data can be created for each concept & collection. (related: dcterms:provenance - Correctly used? #122)
  • Support two ways of giving credit to used sources:
    • (i) verbatim copies; the source should be entered in the columns "Source Vocab" (dct:source), "Source Vocab license", "Source Vocab Rights holder"
    • (ii) definitions influenced by other sources; these should be entered in "Influenced by IRIs" column.
  • Add a status column with the states proposed/accepted/obsolete. Auto-generate skos:historyNote with date & state upon change. Suggested states to track: created, obsoleted because ... (see next point). This information will not be present in Excel but only in Turtle.
  • Add column for reason of obsoletion and provide pre-defined reasons for obsoletion (inspired by https://wiki.geneontology.org/index.php/Obsoleting_an_Existing_Ontology_Term)
    • The term is not clearly defined and usage has been inconsistent.
    • This term was added in error.
    • More specific terms were created.
    • This term was converted to a collection.
    • The meaning of the term is ambiguous.
    • There is no evidence that this function/process/component exists.
  • (maybe) Use tables instead of hard-coded cell-positions and sheet names. Tables can be found independently of their cell position and "home" sheet. This would give users more flexibility to adjust the layout. (previously suggested here)
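
The validation of the proposed change-note lines (<date> <gh-name> <change-note-text>) could look roughly like the sketch below. It is not taken from voc4cat; the ISO date format, the allowed characters in the GitHub name, and the names ChangeNote and parse_change_notes are all assumptions made for illustration.

```python
import re
from datetime import date
from typing import NamedTuple


class ChangeNote(NamedTuple):
    """One validated change-note line (hypothetical structure)."""
    when: date
    gh_name: str
    text: str


# Assumed layout per line: ISO date, GitHub user name (optional leading @), free text.
_LINE = re.compile(r"^(\d{4}-\d{2}-\d{2})\s+@?([A-Za-z0-9-]+)\s+(\S.*)$")


def parse_change_notes(cell: str) -> list[ChangeNote]:
    """Parse a multi-line cell; raise ValueError on the first malformed line."""
    notes = []
    for lineno, raw in enumerate(cell.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between entries
        match = _LINE.match(line)
        if match is None:
            raise ValueError(f"line {lineno}: expected '<date> <gh-name> <text>'")
        iso, name, text = match.groups()
        # date.fromisoformat also rejects impossible dates such as 2024-13-01
        notes.append(ChangeNote(date.fromisoformat(iso), name, text))
    return notes
```

Each parsed entry keeps date, author and text separate, so provenance statements could be generated from it later without re-parsing the cell.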

Converter

  • The user should never change anything in the concept scheme sheet of the template, so the sheet should just be created as an info page but never read. To realize this, we need to extend the vocabulary configuration file with some additional fields (e.g. homepage-URL or issue-tracker-URL).
  • (maybe) Output SKOS-XL. A unique ID for each translation would then allow making statements about the translated concept, e.g. about the provenance of the translation.
  • (maybe) Support skos:OrderedCollection
  • (maybe) Support SKOS relations that are not yet supported, like skos:broaderTransitive and skos:narrowerTransitive

Profile

  • Allow prefLabel in multiple languages (see vocexcel#1). We probably need our own SHACL vocabulary profile.
  • Several changes suggested above, e.g. the use of SKOS note properties, require profile changes.

@dalito added the labels discussion (This needs more discussion) and breaking (Changes breaking backward compatibility) on Jul 5, 2023
@dalito added this to the 1.0.0 milestone on Jul 15, 2023

@dalito
Member Author

dalito commented Aug 4, 2023

Here is a draft for a new xlsx template structure that includes all the changes proposed above.

Note that the help sheet was not yet updated.


Previous versions:

@dalito
Member Author

dalito commented Aug 31, 2023

The proposed new template (2nd draft) cannot express that a skos:Collection may have not only concepts but also other collections as skos:member.

Note that the 0.4.3 template could not express collection_A memberOf collection_B either.

@markdoerr
Contributor

I like esp. the first three items on the list, @dalito :)

@markdoerr
Contributor

Hi @dalito,

regarding the collection item in the checklist above:
I would suggest the following process:

  1. collections (IRI and name) are registered in the "collections" tab
  2. in the "concept" tab each collection gets its own column (column name must match the collection name)
  3. to add a concept to a collection, simply set a cross ("X") in the corresponding column
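
A minimal sketch of how a converter might read such membership columns is shown below. It assumes the concept sheet has already been loaded into a list of dicts (one per row); the "Concept IRI" key, the collection names and the function name collect_members are hypothetical and not part of voc4cat.

```python
def collect_members(concept_rows, collection_names):
    """Map each collection name to the concept IRIs marked with x/X.

    concept_rows: list of dicts, one per row of the concept sheet (assumed layout).
    collection_names: names registered in the collections tab; each one is
    expected to be a column header in the concept sheet.
    """
    members = {name: [] for name in collection_names}
    for row in concept_rows:
        for name in collection_names:
            # Empty cells may arrive as None or ""; surrounding spaces are ignored.
            mark = str(row.get(name) or "").strip()
            if mark.lower() == "x":
                members[name].append(row["Concept IRI"])
    return members


# Illustrative rows only; the IRIs and collection names are made up.
rows = [
    {"Concept IRI": "https://example.org/0000105", "Spectroscopy": "X", "Optics": None},
    {"Concept IRI": "https://example.org/0000106", "Spectroscopy": "x", "Optics": "X"},
    {"Concept IRI": "https://example.org/0000107", "Spectroscopy": "", "Optics": " x "},
]
members = collect_members(rows, ["Spectroscopy", "Optics"])
```

Matching the column header against the registered collection name is what ties steps 1 and 2 together, so a misspelled header would simply yield an empty collection unless it is checked separately.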

@dalito
Member Author

dalito commented Dec 13, 2023

Like the green column Q of the example file (2nd draft)? There would be one column per collection in concept sheet. I only added a single column to show the idea.

@dalito
Member Author

dalito commented Dec 13, 2023

A collection in a collection would be modeled in the collections sheet, just like narrower is modeled for concepts in the concept sheet. This is not yet in the 2nd draft IIRC.

@markdoerr
Contributor

"Like the green column Q of the example file (2nd draft)? There would be one column per collection in concept sheet. I only added a single column to show the idea."

yes, @dalito, and then in the cells the user just adds an "X" (small or capital should be allowed) - boolean would be nicer, but most non-programmers are not so familiar with this concept of True and False ;)

@markdoerr
Contributor

Hi @dalito,
here are some further usability improvement suggestions:
Children IRIs should be referenced by preferred Label (this is more readable and less error prone).

If possible, I would omit the Concept IRI from the Concept tab, completely. The Concept IRIs with the right padding could be automatically generated by the CI-pipeline. As numbering for the IRIs one could then just use the line numbers of the excel sheet. That would simplify the sheet.
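
As a rough illustration of the idea of deriving padded IRIs from row numbers, a pipeline step could do something like the following. The base IRI, the 7-digit width and the function name iri_for_row are assumptions for this sketch, not existing voc4cat behaviour.

```python
def iri_for_row(base_iri: str, row_number: int, width: int = 7) -> str:
    """Build a concept IRI from a sheet row number, zero-padded to `width` digits."""
    if row_number < 1:
        raise ValueError("row numbers start at 1")
    local_id = str(row_number).zfill(width)
    if len(local_id) > width:
        raise ValueError(f"row {row_number} does not fit into {width} digits")
    return f"{base_iri}{local_id}"


# Hypothetical base IRI; row 105 of the sheet becomes the local ID 0000105.
example_iri = iri_for_row("https://example.org/", 105)
```

One consequence of this scheme is that inserting or reordering rows changes the generated IRIs, so it would only work if rows are never moved once an IRI has been published.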

@dalito
Member Author

dalito commented Dec 13, 2023

"Children IRIs should be referenced by preferred Label"

This easily breaks if a label is changed at one place but another is forgotten. In the past there were many problems with misspellings, case, white space or separator use. IDs are the solution to this.

It is possible to use indentation for expressing broader/narrower hierarchy between concepts. This requires a local install of voc4cat-tool. I would suggest to install pipx and then use pipx to install voc4cat-tool with pipx install voc4cat. To get help on the transformation to/from indentation run voc4cat transform --help.

@dalito
Member Author

dalito commented Feb 9, 2024

@markdoerr In the childrenIRI field we could perhaps append the preferred label after each IRI. The label would just be present for convenience but would be stripped off when reading.

https://example.org/0000105 (infrared)
https://example.org/0000106 (visible)
https://example.org/0000107 (ultraviolet)
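
Stripping the convenience labels when reading could be done along these lines. This is only a sketch: it assumes one entry per line, an IRI without whitespace, and an optional label in parentheses; parse_children is a hypothetical name.

```python
import re

# An IRI (no whitespace), optionally followed by a label in parentheses.
_ENTRY = re.compile(r"^(?P<iri>\S+)(?:\s+\((?P<label>.*)\))?$")


def parse_children(cell: str) -> list[str]:
    """Return the IRIs from a children cell and discard any appended labels."""
    iris = []
    for raw in cell.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip empty lines
        match = _ENTRY.match(line)
        if match is None:
            raise ValueError(f"cannot parse entry: {line!r}")
        iris.append(match.group("iri"))
    return iris


cell = (
    "https://example.org/0000105 (infrared)\n"
    "https://example.org/0000106 (visible)\n"
    "https://example.org/0000107"
)
children = parse_children(cell)
```

Because only the IRI is kept, an outdated or misspelled label in the cell would not affect the generated vocabulary.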

@dalito
Member Author

dalito commented Feb 10, 2024

I updated the first message and put a new (3rd) draft of the template "1.0" into the 2nd message, which addresses all issues/ideas that have come up so far.

@markdoerr
Contributor

Thanks @dalito,
sounds good, I will have a look ...
