PyST (py-semantic-taxonomy)

PyST is opinionated server software for creating, maintaining, and publishing SKOS/XKOS taxonomies.

PyST was built and is maintained by Cauldron Solutions.

Quickstart

1. Install required software

  • Install and configure Postgres
  • Install and configure Typesense
  • pip install py-semantic-taxonomy

If you just want to try out the software and have Docker installed on your machine, you can run Postgres and Typesense in containers using the scripts in the scripts directory:

  • python scripts/start_postgres_container.py
  • python scripts/start_typesense_container.py

These scripts will give you the values of the environment variables needed for step 2. You still need to choose your own PyST_auth_token value and make sure it is set correctly, e.g.:

export PyST_auth_token="supersecret"
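For anything beyond a local trial, a random token is a safer choice than a memorable string. A minimal sketch using Python's standard `secrets` module; the 32-byte length is an illustrative choice, not a PyST requirement:

```python
import secrets

# Generate a URL-safe random token from 32 random bytes.
# The length is an assumption for illustration; PyST does not mandate it.
token = secrets.token_urlsafe(32)

# Print a ready-to-paste shell line for the current session.
print(f'export PyST_auth_token="{token}"')
```

Paste the printed line into your shell, or copy the value into pyst-config.env.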

2. Configure required software

The following parameters must either be specified as environment variables or given in the file pyst-config.env.

Note

We use pydantic-settings for settings management; please note their instructions on dependencies, precedence, and env file location.

  • PyST_db_user : Postgres user. Must have table and index creation rights.
  • PyST_db_pass : Postgres password for given user
  • PyST_db_host : Postgres host URL
  • PyST_db_port : Postgres port
  • PyST_db_name : Postgres database name; default is "PyST"
  • PyST_auth_token : Authorization header token to allow users to change data
  • PyST_typesense_url : Typesense host URL
  • PyST_typesense_api_key : Typesense API key. Must have collection creation rights.
  • PyST_typesense_embedding_model : Typesense embedding model for semantic search. Default is "ts/all-MiniLM-L12-v2"
  • PyST_typesense_prefix : Optional prefix for Typesense collection labels.
  • PyST_languages : List of language codes used in the search engine and web UI. Should be a JSON string, e.g. '["en", "de"]'. Default is '["en", "de", "es", "fr", "pt", "it", "da"]'.
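Before starting the server, it can help to confirm that the environment is complete. The sketch below is not part of PyST. The helper names `missing_settings` and `parse_languages` are hypothetical, and the list of required names is an assumption: it contains every setting above that has no stated default and is not marked optional. It only inspects environment variables, not pyst-config.env.

```python
import json
import os

# Assumed required: settings listed above without a stated default.
REQUIRED = [
    "PyST_db_user",
    "PyST_db_pass",
    "PyST_db_host",
    "PyST_db_port",
    "PyST_auth_token",
    "PyST_typesense_url",
    "PyST_typesense_api_key",
]


def missing_settings(env):
    """Return the required setting names that are absent or empty in env."""
    return [name for name in REQUIRED if not env.get(name)]


def parse_languages(raw):
    """Parse a PyST_languages value, which should be a JSON list of strings."""
    value = json.loads(raw)
    if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
        raise ValueError("PyST_languages must be a JSON list of strings")
    return value


if __name__ == "__main__":
    print("Missing settings:", missing_settings(os.environ))
    raw = os.environ.get("PyST_languages")
    if raw is not None:
        print("Languages:", parse_languages(raw))
```

Checking the JSON up front catches a common mistake, such as setting PyST_languages to a bare `en,de` instead of a JSON list.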

Note

If you are deploying more than one PyST instance, you can set PyST_typesense_prefix to a different value for each instance. This will keep the search results for each instance separate. The PyST_typesense_prefix should only include letters and numbers, and should start with a letter.
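The naming rule in the note above can be checked with a regular expression. This is a minimal sketch with a hypothetical helper, `is_valid_prefix`; it assumes that "letters and numbers" means ASCII letters and digits.

```python
import re

# One ASCII letter followed by any number of ASCII letters or digits.
# Restricting to ASCII is an assumption; the note only says "letters and numbers".
PREFIX_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def is_valid_prefix(prefix):
    """Return True if prefix starts with a letter and is alphanumeric throughout."""
    return PREFIX_PATTERN.fullmatch(prefix) is not None


print(is_valid_prefix("staging2"))   # True
print(is_valid_prefix("2staging"))   # False: starts with a digit
print(is_valid_prefix("my-prefix"))  # False: contains a hyphen
```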

3. Run the server

PyST is a FastAPI app; it can be run like any Python ASGI app, e.g. with uvicorn:

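import uvicorn

# Pass the app as an import string so uvicorn can load it itself.
uvicorn.run(
    "py_semantic_taxonomy.app:create_app",
    host="0.0.0.0",  # listen on all network interfaces
    port=8000,
    log_level="warning",
)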

If you are using the default ASGI app runner and configuration options, you can also run app.py directly:

python <pyst-source-directory>/src/py_semantic_taxonomy/app.py

4. Add data

See common workflows for a guide on adding example data.

Why New Software?

There are a number of great projects for browsing SKOS taxonomies already, including:

JSKOS translates SKOS to JSON and provides validation and publication capabilities. It's an amazing project with a long history, but we started with a strict requirement that data transfer would be valid JSON-LD following SKOS and other RDF specifications.

Our user community is comfortable with Python and relational databases, and our experiments to customize SKOSMOS and write complicated queries in SPARQL proved to be serious barriers to productive software and vocabulary maintenance. We also wanted more flexibility in the choice of search engine.

In py_semantic_taxonomy we have the following goals:

  • Native and rich support for XKOS Correspondence and ConceptAssociation classes
  • A predictable, consistent, and validated set of properties and property uses for SKOS and XKOS terms
  • Web interface to allow for browsing
  • API to allow for the complete set of CRUD operations
  • API provides common graph queries without needing to learn SPARQL
  • IRIs should resolve to HTML or RDF serialized resources, depending on requested media type
  • Web interface supports high quality multilingual search without configuration pain
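The media-type goal above is a case of HTTP content negotiation. The sketch below illustrates the idea only and is not PyST's implementation. The helper name `preferred_representation`, the set of RDF media types, and the fallback to HTML are all assumptions.

```python
def preferred_representation(accept_header):
    """Pick "html" or "rdf" from an HTTP Accept header value.

    Hypothetical helper: the media types and the HTML fallback are assumptions.
    """
    rdf_types = {"text/turtle", "application/ld+json", "application/rdf+xml"}
    best = ("html", 0.0)  # fall back to HTML when nothing recognised is requested
    for part in accept_header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        quality = 1.0  # default weight when no q parameter is given
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    quality = float(field[2:])
                except ValueError:
                    quality = 0.0
        if media_type in rdf_types:
            kind = "rdf"
        elif media_type == "text/html":
            kind = "html"
        else:
            kind = None  # unrecognised media types are ignored
        # Strictly greater: on a tie the earlier entry wins.
        if kind is not None and quality > best[1]:
            best = (kind, quality)
    return best[0]


print(preferred_representation("text/turtle"))                          # rdf
print(preferred_representation("text/html;q=0.5, application/ld+json"))  # rdf
print(preferred_representation("text/html, text/turtle"))               # html
```

A browser sending `text/html` would get the web page for an IRI, while an RDF client asking for `text/turtle` would get serialized data from the same IRI.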

This means that we want the following technical capabilities, which are missing or more difficult than they need to be in SKOSMOS:

  • A set of validation classes and functions for input data to ensure consistency in how objects are described.
  • Better query performance by optimizing database structure and indices for a small set of needed edges
  • Easy customization of the UI
  • Pluggable search index

To put it another way, SKOSMOS is amazing software that can handle knowledge organization systems which are based on SKOS and already exist in a graph database, but which include a lot of inconsistency and variability. PyST has a reduced feature set, but allows for easier data editing and is much pickier about incoming data.