This repository contains a collection of Python scripts designed to facilitate the integration of Research Organization Registry (ROR) IDs with a Pure instance. These scripts are useful for extracting data from a Pure instance, querying the ROR API for matching organizations, and updating the external organizations in Pure with ROR IDs.
# Query ROR API for all external orgs in Pure
python getror-docker.py # OR python getror-rorapi.py
# Creates: output.csv with ROR ID matches and confidence scores
Review the output:
- Filter by the Score column (recommend keeping only Score > 0.8)
- Manually verify matches for important/high-profile organizations
- Remove incorrect matches before proceeding
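The score filter described above can be sketched as follows. This is a minimal illustration using only the standard library, not the code the scripts use: the column names (`UUID`, `Name`, `ROR ID`, `Score`), the helper `filter_matches`, and the sample rows are all assumptions, so adjust them to the headers in your own output.csv.

```python
import csv
import io


def filter_matches(rows, min_score=0.8, score_col="Score"):
    """Return only the rows whose score is strictly greater than min_score."""
    kept = []
    for row in rows:
        try:
            score = float(row[score_col])
        except (KeyError, TypeError, ValueError):
            continue  # missing or non-numeric score: leave the row out
        if score > min_score:
            kept.append(row)
    return kept


# In-memory stand-in for output.csv; column names and ROR IDs are placeholders.
sample = io.StringIO(
    "UUID,Name,ROR ID,Score\n"
    "a1,Example University,https://ror.org/00example1,1.0\n"
    "b2,Exmple Univ,https://ror.org/00example1,0.85\n"
    "c3,Unrelated Institute,https://ror.org/00example2,0.4\n"
    "d4,No Score Org,https://ror.org/00example3,\n"
)
rows = list(csv.DictReader(sample))
kept = filter_matches(rows)
for row in kept:
    print(row["UUID"], row["Score"])
```

A high score does not prove that a match is correct, so the manual check of high-profile organizations still applies to the rows that survive the filter.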
# Update Pure with the verified ROR IDs
python writeror2pure.py
# Input: output.csv (filtered from step 1)
# This adds ROR IDs to organizations that don't have one yet
Safety features:
- Auto-detects CSV delimiter
- Skips organizations that already have a ROR ID
- Retry logic for network errors
- Logs all updates to pure_updates.log
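To make the safety features above more concrete, here is a hedged sketch of how delimiter detection, skipping already-tagged organizations, and retrying on network errors might look. It does not reproduce writeror2pure.py: the helpers `detect_delimiter`, `rows_needing_ror`, and `with_retries`, the column names, and the `existing` lookup (standing in for what Pure would report) are all hypothetical, and the built-in `ConnectionError` stands in for whatever exceptions the HTTP client raises.

```python
import csv
import io
import time


def detect_delimiter(text, candidates=",;\t|", default=","):
    """Guess the CSV delimiter from a text sample, falling back to a default."""
    try:
        return csv.Sniffer().sniff(text, delimiters=candidates).delimiter
    except csv.Error:
        return default


def rows_needing_ror(rows, existing):
    """Drop rows whose organization already has a ROR ID.

    `existing` maps a Pure UUID to its current ROR ID (or None); in a real
    run this information would come from Pure, here it is a plain dict.
    """
    return [row for row in rows if not existing.get(row["UUID"])]


def with_retries(func, attempts=3, delay=1.0, retry_on=(ConnectionError,)):
    """Call func(), retrying on the given exceptions with a growing delay."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the error
            time.sleep(delay * attempt)


# Semicolon-separated sample; values are placeholders.
text = (
    "UUID;Name;ROR ID;Score\n"
    "a1;Example University;https://ror.org/00example1;1.0\n"
    "b2;Other Org;https://ror.org/00example2;0.9\n"
)
delimiter = detect_delimiter(text)
rows = list(csv.DictReader(io.StringIO(text), delimiter=delimiter))
todo = rows_needing_ror(rows, existing={"a1": "https://ror.org/00example1"})

# Simulated update that fails twice before succeeding.
calls = {"n": 0}


def flaky_update():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated network error")
    return "updated"


result = with_retries(flaky_update, delay=0)
print(repr(delimiter), [r["UUID"] for r in todo], result)
```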
# Merge orgs with the same ROR ID
python merge_ex_orgs_by_rorid.py
# Input: output.csv (with ROR IDs from step 1 or 2)
Select execution mode:
- DRY-RUN (recommended first!): Preview merges without making changes
- INTERACTIVE: Manually approve each merge individually
- AUTOMATIC: Execute all merges (requires confirmation)
Enable live verification (recommended):
- Fetches current org data from Pure API
- Verifies ROR IDs match between CSV and Pure
- Uses live workflow status (not CSV data)
- Detects deleted/changed organizations
Merge logic:
- Groups organizations by ROR ID
- Requires exactly 1 "Approved" org per ROR ID (becomes merge target)
- Merges all other orgs into the approved one
⚠️ IRREVERSIBLE - Test in staging first!
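The merge logic above can be previewed offline with a small planning function, in the spirit of the DRY-RUN mode. This is a sketch under stated assumptions, not the script's implementation: `plan_merges` and the keys `UUID`, `Workflow`, and `ROR ID` are hypothetical names, and no call to Pure is made.

```python
from collections import defaultdict


def plan_merges(rows):
    """Group rows by ROR ID and decide which groups can be merged.

    Returns (plans, skipped): each plan names one approved target and the
    UUIDs to merge into it; skipped lists ROR IDs that lack exactly one
    "Approved" organization.
    """
    groups = defaultdict(list)
    for row in rows:
        if row.get("ROR ID"):
            groups[row["ROR ID"]].append(row)

    plans, skipped = [], []
    for ror_id, members in groups.items():
        if len(members) < 2:
            continue  # a single organization has nothing to merge
        approved = [m for m in members if m["Workflow"] == "Approved"]
        if len(approved) != 1:
            skipped.append(ror_id)  # zero or several approved orgs: ambiguous
            continue
        target = approved[0]["UUID"]
        sources = [m["UUID"] for m in members if m["UUID"] != target]
        plans.append({"ror_id": ror_id, "target": target, "sources": sources})
    return plans, skipped


# Placeholder data; workflow values other than "Approved" are illustrative.
rows = [
    {"UUID": "a1", "Workflow": "Approved", "ROR ID": "https://ror.org/00example1"},
    {"UUID": "b2", "Workflow": "For approval", "ROR ID": "https://ror.org/00example1"},
    {"UUID": "c3", "Workflow": "For approval", "ROR ID": "https://ror.org/00example2"},
    {"UUID": "d4", "Workflow": "For approval", "ROR ID": "https://ror.org/00example2"},
    {"UUID": "e5", "Workflow": "Approved", "ROR ID": "https://ror.org/00example3"},
]
plans, skipped = plan_merges(rows)
for plan in plans:
    print("would merge", plan["sources"], "into", plan["target"])
print("skipped:", skipped)
```

Reviewing the plan and the skipped groups before any write happens is a cheap way to catch surprises, given that the merges themselves cannot be undone.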
- getror-rorapi.py: Queries the ROR API with external organization names from a Pure instance to find potential matching ROR IDs.
- getror-docker.py: Similar to getror-rorapi.py, but designed to work with a local ROR API instance run via Docker.
- csv-to-ror_docker.py: Reads a CSV file containing organization names and UUIDs, queries a local ROR API Docker instance for matches, and generates an output CSV with ROR IDs.
- writeror2pure.py: Takes the output CSV from the ROR querying scripts and updates the Pure instance with ROR IDs.
- merge_ex_orgs_by_rorid.py: Merges external organizations in Pure based on a CSV file containing the Pure UUID, workflow step, and ROR ID.
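As an illustration of the name-matching step that the getror scripts perform, the sketch below builds an affiliation query URL and picks the best candidate from a response. It is an assumption-laden example rather than the scripts' own code: the base URL, the helpers `affiliation_url` and `best_match`, and the sample payload are hypothetical, and the response shape (`items` with `score`, `chosen`, and `organization.id`) reflects the ROR affiliation endpoint as commonly documented, so verify it against the API version you use. No network request is made here.

```python
from urllib.parse import urlencode

# Assumption: public ROR endpoint; a local Docker instance would use its own base URL.
ROR_API = "https://api.ror.org/v2/organizations"


def affiliation_url(name, base_url=ROR_API):
    """Build the query URL for matching a free-text organization name."""
    return f"{base_url}?{urlencode({'affiliation': name})}"


def best_match(payload, min_score=0.8):
    """Pick the best candidate from an affiliation response, or None.

    Prefers items flagged as chosen; otherwise takes the highest score,
    and rejects anything below min_score.
    """
    items = payload.get("items", [])
    candidates = [i for i in items if i.get("chosen")] or items
    if not candidates:
        return None
    top = max(candidates, key=lambda i: i.get("score", 0))
    if top.get("score", 0) < min_score:
        return None
    return top["organization"]["id"], top["score"]


# Hand-written payload with placeholder IDs, shaped like the assumed response.
payload = {
    "items": [
        {"score": 0.62, "chosen": False,
         "organization": {"id": "https://ror.org/00example2"}},
        {"score": 1.0, "chosen": True,
         "organization": {"id": "https://ror.org/00example1"}},
    ]
}
url = affiliation_url("Example University")
match = best_match(payload)
print(url)
print(match)
```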
- Python 3.x
- Required libraries:
  pip install requests pandas
- API key for Pure with read/write rights to the /external-organizations/* endpoint
- Docker (optional, for local ROR API instance)
- Run in a staging/test environment before moving to production!
- Check the results in the CSV file and filter out wrong IDs before writing data back to Pure.
- Clone this repository to your local machine.
- Ensure you have Python 3.x installed.
- Install the required Python packages:
pip install requests pandas
When you run the scripts, you need to supply values such as API keys, base URLs, and CSV file locations.
To query the ROR API for organization matches, run:
python getror-rorapi.py
Or, if you're using a local ROR API Docker instance:
python getror-docker.py
To generate a CSV with organization names, UUIDs, and their corresponding ROR IDs:
python csv-to-ror_docker.py
To update your Pure instance with ROR IDs from a generated CSV file:
python writeror2pure.py
To merge external organizations in Pure based on a CSV file containing Pure UUID, Workflow step and ROR ID:
python merge_ex_orgs_by_rorid.py
Contributions to this repository are welcome. Please fork the repository and submit a pull request with your changes.