Common Workflows#
Importing the Combined Nomenclature codes#
This functionality is built into pyst_client. py-semantic-taxonomy must be deployed and available:
from pyst_client.cn import CombinedNomenclatureLoader
CombinedNomenclatureLoader(
year=<year>,
api_key=<pyst-api-key>,
host=<host>,
sample=<True or False>
).write()
Where:
- year is an integer, like 2024. If only installing the sample data, this should be 2024 or 2025 (you can also run both one after the other).
- api_key is the write-enabled API key (PyST_auth_token) for py-semantic-taxonomy.
- host is the URL that py-semantic-taxonomy is running at, e.g. "http://localhost:8000" if running locally.
- sample is a boolean flag on whether only a sample of the available data should be imported. The full import takes more than an hour.
If you are testing locally using the default Docker containers for Postgres and Typesense, then the CombinedNomenclatureLoader can take these values:
CombinedNomenclatureLoader(
year=2024,
api_key="abc123", # Default API key; adjust if needed
host="http://127.0.0.1:8000", # Default Uvicorn port; adjust if needed
sample=True
).write()
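Because the sample data covers both 2024 and 2025, the two imports can be run back to back. The following is a minimal sketch, assuming the constructor arguments shown above. The helper name load_sample_years is hypothetical, and the loader class is passed in as an argument so the snippet itself does not need a running server:

```python
def load_sample_years(loader_cls, api_key, host, years=(2024, 2025)):
    """Run one sample import per year, in the order given.

    `loader_cls` is assumed to accept the keyword arguments shown above
    (year, api_key, host, sample) and to expose a `.write()` method.
    """
    loaded = []
    for year in years:
        loader_cls(year=year, api_key=api_key, host=host, sample=True).write()
        loaded.append(year)
    return loaded


# Usage against a local deployment (needs pyst_client and a running server):
# from pyst_client.cn import CombinedNomenclatureLoader
# load_sample_years(CombinedNomenclatureLoader, "abc123", "http://127.0.0.1:8000")
```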
Creating a new taxonomy#
PyST stores concept schemes, concepts, and relationships between concepts all in different places, so the creation of these objects needs to happen in a defined order:
- First, decide on the URL pattern you will use for concept schemes and concepts. A reasonable pattern is https://<base_url>/<concept-scheme-notation>/<concept-notation>. Many of the EU semantic taxonomies use this pattern, e.g. http://data.europa.eu/xsp/cn2025/970300000080. When following this pattern, the concept scheme notation should be different from version to version or year to year.
- Second, create the concept scheme.
- Third, create the concepts. Although it is possible to provide relationship information among concepts inside the individual concept documents, this is not recommended: concept creation requests are normally submitted in parallel, and we can't run graph integrity checks against graph nodes that don't exist yet.
- Finally, define relationships among concepts. It's best if each request to relationships creates one relationship, and these can be chunked to run in parallel (asyncio doesn't seem to like it when thousands of tasks are submitted at once; it's better to do 20 or 50 at a time).
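The URL pattern and the chunked submission described above can be sketched as follows. This is an illustration only: concept_iri, create_relationship, and submit_in_chunks are hypothetical names, and create_relationship is a stand-in for whatever client call sends one request to relationships.

```python
import asyncio


def concept_iri(base_url, scheme_notation, concept_notation):
    """Build <base_url>/<concept-scheme-notation>/<concept-notation>."""
    return f"{base_url.rstrip('/')}/{scheme_notation}/{concept_notation}"


async def create_relationship(relationship):
    """Hypothetical stand-in for one request that creates one relationship."""
    await asyncio.sleep(0)  # replace with the real HTTP call
    return relationship


async def submit_in_chunks(items, worker, chunk_size=20):
    """Await `worker(item)` for every item, at most `chunk_size` at a time."""
    results = []
    for start in range(0, len(items), chunk_size):
        chunk = items[start:start + chunk_size]
        # Requests within a chunk run concurrently; the next chunk
        # only starts once every request in this one has finished.
        results.extend(await asyncio.gather(*(worker(item) for item in chunk)))
    return results
```

Calling asyncio.run(submit_in_chunks(relationships, create_relationship, chunk_size=20)) then processes the whole list while never having more than 20 tasks pending. Results come back in the same order as the input.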
Updating a Concept or a Concept relationship#
Best practice is to always record the who, what, why, and when of changes, which can be done by adding change, editorial, or history notes to the Concept.
Depending on your institution's review practices, you could also require that change suggestions set Concept.status to draft, and that changes are only accepted after review, at which point the status can be changed to accepted.
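As a rough illustration of recording who, what, why, and when, the sketch below appends a SKOS change note to a concept document and marks it as a draft. Several things here are assumptions rather than confirmed PyST behavior: that the concept is a JSON-LD dictionary, that a plain language-tagged literal is an acceptable note, and that the status is stored under the BIBO status property. The helper name propose_change is hypothetical. Check the shape your deployment validates before relying on it.

```python
import copy
from datetime import datetime, timezone

SKOS_CHANGE_NOTE = "http://www.w3.org/2004/02/skos/core#changeNote"
# Assumption: Concept.status is stored under the BIBO status property.
BIBO_STATUS = "http://purl.org/ontology/bibo/status"


def propose_change(concept, who, what, why, when=None):
    """Return a copy of `concept` with a change note added and status set to draft."""
    when = when or datetime.now(timezone.utc)
    updated = copy.deepcopy(concept)  # leave the caller's document untouched
    note = {
        "@value": f"{when.date().isoformat()} {who}: {what}. Reason: {why}",
        "@language": "en",
    }
    updated.setdefault(SKOS_CHANGE_NOTE, []).append(note)
    updated[BIBO_STATUS] = [{"@id": f"{BIBO_STATUS}/draft"}]
    return updated
```

After review, the same document would be submitted again with the status switched to accepted.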
Creating a new Correspondence#
Creating new Correspondence and ConceptAssociation objects should also follow a set order:
- First, decide on the URL pattern you will use, especially for ConceptAssociations, which can be N-to-1. It would be nice if the URLs for Correspondences were human readable, but ConceptAssociations could use computer-generated ids.
- Second, create the Correspondence object. This doesn't have any references to ConceptAssociation objects.
- Third, create the ConceptAssociation objects. They don't have any information on membership in a Correspondence.
- Finally, link the ConceptAssociation objects to the Correspondence using the made_of endpoint. This can be done in one request as this endpoint takes a list of inputs.
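One possible way to handle the first step is sketched below: a human-readable URL for the Correspondence and UUID-based URLs for the ConceptAssociation objects. The base URL, the notation, the /association/ path segment, and both helper names are hypothetical. The request bodies for the later steps are not shown because their shape is not described here.

```python
import uuid


def correspondence_iri(base_url, notation):
    """Human-readable IRI of the form <base_url>/<notation>."""
    return f"{base_url.rstrip('/')}/{notation}"


def association_iri(base_url):
    """Opaque, computer-generated IRI for one ConceptAssociation."""
    return f"{base_url.rstrip('/')}/association/{uuid.uuid4()}"


base = "https://example.com/taxonomy"  # hypothetical base URL
corr = correspondence_iri(base, "cn2024-to-cn2025")  # hypothetical notation
associations = [association_iri(base) for _ in range(3)]

# Order of operations, following the list above:
# 1. decide on the IRIs (done here)
# 2. create the Correspondence at `corr`
# 3. create each ConceptAssociation in `associations`
# 4. link all of them to `corr` in a single made_of request
```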