PyST (py-semantic-taxonomy)
PyST is opinionated server software for creating, maintaining, and publishing SKOS/XKOS taxonomies.
- API docs, OpenAPI 3.1 JSON, OpenAPI 3.1 YAML
- GitHub repo
- Client library
- Client library usage guide
- Example notebook
PyST was built and is maintained by Cauldron Solutions.
Quickstart
1. Install required software
- Install and configure Postgres
- Install and configure Typesense
pip install py-semantic-taxonomy
If you just want to try out the software and have Docker installed on your machine, you can run Postgres and Typesense in containers using the scripts in the scripts directory, e.g.:
python scripts/start_postgres_container.py
python scripts/start_typesense_container.py
These scripts will give you the values of the environment variables needed for step 2. You will still need to make up your own PyST_auth_token value and make sure it is set correctly, e.g.:
export PyST_auth_token="supersecret"
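You make up the PyST_auth_token value yourself, and "supersecret" is only a placeholder. One option is to generate a random value. The following is a minimal sketch that uses Python's standard secrets module. The 32-byte length is our own choice, not a PyST requirement.

```python
import secrets

# 32 random bytes, URL-safe base64 encoded without padding (43 characters).
token = secrets.token_urlsafe(32)

# Print a shell line you can paste, matching the export example above.
print(f'export PyST_auth_token="{token}"')
```

Treat the printed value like a password. Anyone holding it can change data through the API.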
2. Configure required software
The following parameters must either be specified as environment variables or given in the file pyst-config.env.
Note
We use pydantic-settings for settings management; please note its instructions on dependencies, precedence, and env file location.
- PyST_db_user: Postgres user. Must have table and index creation rights.
- PyST_db_pass: Postgres password for the given user.
- PyST_db_host: Postgres host URL.
- PyST_db_port: Postgres port.
- PyST_db_name: Postgres database name; default is "PyST".
- PyST_auth_token: Authorization header token that allows users to change data.
- PyST_typesense_url: Typesense host URL.
- PyST_typesense_api_key: Typesense API key. Must have collection creation rights.
- PyST_typesense_embedding_model: Typesense embedding model for semantic search. Default is "ts/all-MiniLM-L12-v2".
- PyST_typesense_prefix: Optional prefix for Typesense collection labels.
- PyST_languages: List of language codes used in the search engine and web UI. Should be a JSON string, e.g. '["en", "de"]'. Default is '["en", "de", "es", "fr", "pt", "it", "da"]'.
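As an illustration, the hypothetical script below renders these settings as KEY='value' lines for pyst-config.env. Every value is a placeholder. The ports are the usual Postgres and Typesense defaults, and the URL and credentials are assumptions to be replaced with your own. PyST_languages is serialized with json.dumps because it must be a JSON string. The single quotes assume dotenv-style quoting, which we expect because pydantic-settings reads env files with python-dotenv.

```python
import json

# Hypothetical placeholder values for a local test setup; replace them with the
# values printed by the container scripts or with your own deployment details.
settings = {
    "PyST_db_user": "postgres",
    "PyST_db_pass": "change-me",
    "PyST_db_host": "localhost",
    "PyST_db_port": "5432",
    "PyST_db_name": "PyST",
    "PyST_auth_token": "supersecret",
    "PyST_typesense_url": "http://localhost:8108",
    "PyST_typesense_api_key": "change-me",
    # PyST_languages must be a JSON string, so serialize a Python list.
    "PyST_languages": json.dumps(["en", "de"]),
}


def render_env_file(values: dict[str, str]) -> str:
    """Render KEY='value' lines; single quotes keep the JSON list as one value."""
    return "".join(f"{key}='{value}'\n" for key, value in values.items())


# Redirect this output into pyst-config.env, e.g. `python make_env.py > pyst-config.env`
# (make_env.py is a hypothetical file name for this script).
print(render_env_file(settings), end="")
```

This simple quoting breaks if a value itself contains a single quote, so choose passwords and tokens accordingly or escape them.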
Note
If you are deploying more than one PyST instance, you can set PyST_typesense_prefix to a different value for each instance. This will keep the search results for each instance separate. The PyST_typesense_prefix should only include letters and numbers, and should start with a letter.
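You can check a candidate prefix against this rule before deploying. The following is a minimal sketch. is_valid_typesense_prefix is a hypothetical helper and is not part of PyST. We also assume that "letters and numbers" means ASCII letters and digits only.

```python
import re

# Rule from the note above: start with a letter, then only letters and digits.
# Restricting this to ASCII is an assumption, not a documented PyST constraint.
_PREFIX_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def is_valid_typesense_prefix(prefix: str) -> bool:
    """Return True if `prefix` matches the whole pattern (empty strings fail)."""
    return _PREFIX_PATTERN.fullmatch(prefix) is not None


for candidate in ["staging", "prod2", "2prod", "my-prefix"]:
    print(candidate, is_valid_typesense_prefix(candidate))
```

Giving each instance its own value, such as "staging" and "prod", keeps their collections apart.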
3. Run the server
PyST is a FastAPI app, so it can be run like any Python ASGI app, e.g. with uvicorn:
import uvicorn

uvicorn.run(
    "py_semantic_taxonomy.app:create_app",
    factory=True,  # tell uvicorn to call create_app to obtain the ASGI app
    host="0.0.0.0",
    port=8000,
    log_level="warning",
)
If you are using the default ASGI app runner and configuration options, you can also do:
python <pyst-source-directory>/src/py_semantic_taxonomy/app.py
4. Add data
See common workflows for a guide on adding example data.
Why New Software?
There are a number of great projects for browsing SKOS taxonomies already, including:
- JSKOS translates SKOS to JSON and provides validation and publication capabilities. It's an amazing project with a long history, but we started with a strict requirement that data transfer be valid JSON-LD that follows SKOS and other RDF specifications.
Our user community is comfortable with Python and relational databases, and our experiments to customize SKOSMOS and write complicated queries in SPARQL proved to be serious barriers to productive software and vocabulary maintenance. We also wanted more flexibility in the choice of search engine.
In py_semantic_taxonomy we have the following goals:
- Native and rich support for the XKOS Correspondence and ConceptAssociation classes
- A predictable, consistent, and validated set of properties and property uses for SKOS and XKOS terms
- Web interface to allow for browsing
- API to allow for the complete set of CRUD operations
- API provides common graph queries without needing to learn SPARQL
- IRIs should resolve to HTML or RDF serialized resources, depending on requested media type
- Web interface supports high quality multilingual search without configuration pain
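The goal of having IRIs resolve to HTML or to serialized RDF, depending on the requested media type, is HTTP content negotiation on the Accept header. The following is a simplified sketch of how a server could pick a representation. It is not PyST's implementation. choose_media_type is a hypothetical helper, and offering text/html and application/ld+json is our assumption.

```python
def choose_media_type(
    accept: str,
    offered: tuple[str, ...] = ("text/html", "application/ld+json"),
) -> str:
    """Return the offered media type with the highest q-value in `accept`.

    Falls back to the first offered type when nothing acceptable matches.
    Simplified: ties go to the first match seen, and more specific media
    ranges do not take precedence over wildcards as RFC 9110 requires.
    """
    best, best_q = offered[0], 0.0
    for part in accept.split(","):
        media_range, *params = [p.strip().lower() for p in part.split(";")]
        q = 1.0  # default quality when no q parameter is given
        for param in params:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        range_type, _, range_sub = media_range.partition("/")
        for candidate in offered:
            candidate_type = candidate.split("/")[0]
            matches = media_range in ("*/*", candidate) or (
                range_sub == "*" and range_type == candidate_type
            )
            # q=0 means "not acceptable", so it can never beat the 0.0 start value.
            if matches and q > best_q:
                best, best_q = candidate, q
    return best


print(choose_media_type("application/ld+json"))  # application/ld+json
print(choose_media_type("text/html,application/xhtml+xml,*/*;q=0.8"))  # text/html
```

A browser's typical Accept header yields the HTML page, while an RDF client that asks for JSON-LD gets the serialized resource from the same IRI.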
This means that we want the following technical capabilities, which are missing from SKOSMOS or more difficult there than they need to be:
- A set of validation classes and functions for input data to ensure consistency in how objects are described.
- Better query performance by optimizing database structure and indices for a small set of needed edges
- Easy customization of the UI
- Pluggable search index
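A pluggable search index generally means that the application depends on an interface rather than on a single engine. The following is a minimal sketch of that idea using typing.Protocol. SearchBackend, InMemorySearch, and find_concepts are hypothetical names, not PyST's actual classes. The toy backend does plain substring matching, not Typesense's semantic search.

```python
from dataclasses import dataclass, field
from typing import Protocol


class SearchBackend(Protocol):
    """Hypothetical interface that any search engine adapter could implement."""

    def index(self, iri: str, label: str, language: str) -> None: ...

    def search(self, query: str, language: str) -> list[str]: ...


@dataclass
class InMemorySearch:
    """Toy backend: case-insensitive substring match on labels, per language."""

    entries: list[tuple[str, str, str]] = field(default_factory=list)

    def index(self, iri: str, label: str, language: str) -> None:
        self.entries.append((iri, label, language))

    def search(self, query: str, language: str) -> list[str]:
        needle = query.casefold()
        return [
            iri
            for iri, label, lang in self.entries
            if lang == language and needle in label.casefold()
        ]


def find_concepts(backend: SearchBackend, query: str, language: str = "en") -> list[str]:
    # Callers depend only on the protocol, so a different engine's adapter
    # could replace InMemorySearch without changes here.
    return backend.search(query, language)


backend = InMemorySearch()
backend.index("http://example.org/concept/1", "Wind turbine", "en")
backend.index("http://example.org/concept/1", "Windkraftanlage", "de")
print(find_concepts(backend, "turbine"))  # ['http://example.org/concept/1']
```

Because Protocol checks structure rather than inheritance, an engine adapter only needs matching index and search methods.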
To put it another way, SKOSMOS is amazing software that can handle knowledge organization systems which are based on SKOS and already exist in a graph database, but which include a lot of inconsistency and variability. PyST has a reduced feature set, but allows for easier data editing and is much pickier about incoming data.
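Being picky about incoming data means rejecting input that breaks the rules instead of storing it. The following is a rough illustration only, not PyST's validation code. The hypothetical validate_concept function checks a simplified, expanded-form JSON-LD concept against SKOS integrity condition S14: no more than one skos:prefLabel per language tag. It also applies a few extra rules that are our own assumptions: an http(s) @id, an explicit skos:Concept type, at least one prefLabel, and a language tag on every label.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"


def validate_concept(obj: dict) -> list[str]:
    """Return a list of problems; an empty list means the concept is accepted."""
    errors: list[str] = []
    # Assumed rule: identifiers must be absolute http(s) IRIs.
    if not str(obj.get("@id", "")).startswith(("http://", "https://")):
        errors.append("@id must be an http(s) IRI")
    # Assumed rule: the type must be declared explicitly.
    if f"{SKOS}Concept" not in obj.get("@type", []):
        errors.append("@type must include skos:Concept")
    labels = obj.get(f"{SKOS}prefLabel", [])
    # Assumed rule: every concept needs at least one preferred label.
    if not labels:
        errors.append("at least one skos:prefLabel is required")
    seen: set[str] = set()
    for label in labels:
        language = label.get("@language")
        if not language:
            # Assumed rule: labels must carry a language tag.
            errors.append(f"prefLabel {label.get('@value')!r} has no language tag")
        elif language.lower() in seen:  # language tags are case-insensitive
            # SKOS integrity condition S14: at most one prefLabel per language tag.
            errors.append(f"more than one prefLabel for language {language!r}")
        else:
            seen.add(language.lower())
    return errors


concept = {
    "@id": "http://example.org/concept/1",
    "@type": [f"{SKOS}Concept"],
    f"{SKOS}prefLabel": [
        {"@value": "Wind turbine", "@language": "en"},
        {"@value": "Windkraftanlage", "@language": "de"},
    ],
}
print(validate_concept(concept))  # []
```

Returning every problem at once, rather than stopping at the first, makes it easier for data editors to fix a rejected submission in one pass.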