
svidmar/Pure_ROR_scripts


Pure2ROR2Pure: Pure & ROR Integration Scripts

Overview

This repository contains a collection of Python scripts designed to facilitate the integration of Research Organization Registry (ROR) IDs with a Pure instance. The scripts extract external organizations from a Pure instance, query the ROR API for matching organizations, and update the external organizations in Pure with the matched ROR IDs.

Recommended Workflow

1. Identify ROR IDs for Organizations in Pure

```bash
# Query the ROR API for all external organizations in Pure
python getror-docker.py  # or: python getror-rorapi.py

# Creates output.csv with ROR ID matches and confidence scores
```
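A minimal sketch of how a single organization name could be matched, assuming the ROR API's `affiliation` query parameter and its v1 response shape (an `items` list whose entries carry `score`, `chosen`, and an `organization` object with `id` and `name`). `affiliation_url` and `best_match` are hypothetical helpers for illustration, not functions from this repository; `base_url` could point at a local Docker instance instead of the public API.

```python
from __future__ import annotations

from typing import Optional, Tuple
from urllib.parse import urlencode


def affiliation_url(name: str, base_url: str = "https://api.ror.org/organizations") -> str:
    """Build an affiliation-matching query URL (public API by default; assumption)."""
    return f"{base_url}?{urlencode({'affiliation': name})}"


def best_match(response: dict) -> Optional[Tuple[str, str, float]]:
    """Return (ROR ID, name, score) for the best candidate, or None if there is none.

    Prefers the item ROR flagged as `chosen`; otherwise takes the highest score.
    Assumes the v1 response shape; v2 exposes a `names` list instead of `name`.
    """
    items = response.get("items", [])
    if not items:
        return None
    chosen = [item for item in items if item.get("chosen")]
    best = chosen[0] if chosen else max(items, key=lambda item: item.get("score", 0.0))
    org = best["organization"]
    return org["id"], org.get("name", ""), float(best.get("score", 0.0))
```

To try it against the live API, the parsed JSON from `urllib.request.urlopen(affiliation_url("University of Oslo"))` can be passed to `best_match`; the returned score is what ends up in the Score column reviewed below.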

Review the output:

  • Filter on the Score column (keeping only rows with Score > 0.8 is recommended)
  • Manually verify matches for important/high-profile organizations
  • Remove incorrect matches before proceeding
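The score filter can be sketched with the standard library alone. This is a minimal sketch, assuming the CSV has a header row and a column literally named `Score` (the real column names in `output.csv` may differ); `filter_by_score` is a hypothetical helper. It also detects whether the file uses `,`, `;` or tab as its delimiter, which matters when the CSV has been re-saved from a spreadsheet.

```python
from __future__ import annotations

import csv
import io


def filter_by_score(text: str, threshold: float = 0.8, score_column: str = "Score") -> list[dict]:
    """Keep rows whose score is strictly above `threshold`.

    Rows with a missing or non-numeric score are dropped rather than guessed at.
    """
    # Sniff the delimiter from the header line only; data rows may contain
    # commas inside organization names.
    dialect = csv.Sniffer().sniff(text.splitlines()[0], delimiters=",;\t")
    reader = csv.DictReader(io.StringIO(text), dialect=dialect)
    kept = []
    for row in reader:
        try:
            score = float(row[score_column])
        except (KeyError, TypeError, ValueError):
            continue
        if score > threshold:
            kept.append(row)
    return kept
```

Dropping unscored rows is a deliberate choice: an empty score usually means no match was found, and nothing should be written back to Pure for those organizations.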

2. Write ROR IDs Back to Pure

```bash
# Update Pure with the verified ROR IDs
python writeror2pure.py

# Input: output.csv (filtered in step 1)
# Adds ROR IDs only to organizations that don't have one yet
```
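The "only if missing" rule can be sketched as a pure function on an organization record. The record layout below (an `identifiers` list whose entries have `id` and `type.uri`) and `ROR_TYPE_URI` are hypothetical placeholders, not the actual Pure API schema; check your instance's external-organization JSON and identifier classification before adapting it.

```python
from __future__ import annotations

import copy

# Hypothetical identifier type URI; the real value depends on your Pure configuration.
ROR_TYPE_URI = "/example/ror"


def normalize_ror(value: str) -> str:
    """Reduce 'https://ror.org/0ABC12345' or '0abc12345' to the bare lowercase ID."""
    return value.strip().lower().rsplit("/", 1)[-1]


def has_ror(org: dict) -> bool:
    """True if the record already carries an identifier of the ROR type."""
    return any(i.get("type", {}).get("uri") == ROR_TYPE_URI for i in org.get("identifiers", []))


def with_ror(org: dict, ror_id: str) -> dict | None:
    """Return an updated copy with the ROR ID added, or None to skip the update."""
    if has_ror(org):
        return None  # never overwrite an existing ROR ID
    updated = copy.deepcopy(org)  # leave the fetched record untouched
    updated.setdefault("identifiers", []).append(
        {"id": "https://ror.org/" + normalize_ror(ror_id), "type": {"uri": ROR_TYPE_URI}}
    )
    return updated
```

Returning `None` instead of an unchanged record makes the skip explicit, so the caller can log it and avoid sending a needless write request.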

Safety features:

  • Auto-detects CSV delimiter
  • Skips organizations that already have a ROR ID
  • Retry logic for network errors
  • Logs all updates to pure_updates.log
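Retry logic of this kind is commonly written as exponential backoff around each API call. The sketch below is an assumption about the general approach, not the script's actual code; `with_retries` is a hypothetical helper. By default it retries the built-in `ConnectionError` and `TimeoutError`; with `requests`, one would pass `retry_on=(requests.exceptions.ConnectionError, requests.exceptions.Timeout)` instead. The `sleep` parameter exists only so the delays can be inspected in tests.

```python
from __future__ import annotations

import logging
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")
logger = logging.getLogger("pure_updates")


def with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on transient errors with delays of 1s, 2s, 4s, ..."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except retry_on as exc:
            if attempt == attempts:
                logger.error("Giving up after %d attempts: %s", attempts, exc)
                raise
            delay = base_delay * 2 ** (attempt - 1)
            logger.warning("Attempt %d failed (%s); retrying in %.1fs", attempt, exc, delay)
            sleep(delay)
    raise RuntimeError("unreachable")
```

To send these messages to a file, configure logging once at startup, for example `logging.basicConfig(filename="pure_updates.log", level=logging.INFO)`.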

3. Merge Duplicate Organizations

```bash
# Merge external organizations that share the same ROR ID
python merge_ex_orgs_by_rorid.py

# Input: output.csv (with ROR IDs from step 1 or 2)
```

Select execution mode:

  • DRY-RUN (recommended first!): Preview merges without making changes
  • INTERACTIVE: Manually approve each merge individually
  • AUTOMATIC: Execute all merges (requires confirmation)
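The three modes differ only in when a merge is actually executed. A minimal sketch of that dispatch, where `execute`, `approve` and `confirm` are hypothetical callables (the defaults prompt on the terminal); the real script's prompts and structure may differ.

```python
from __future__ import annotations

from enum import Enum
from typing import Callable, List, Sequence, TypeVar

M = TypeVar("M")


class Mode(Enum):
    DRY_RUN = "dry-run"
    INTERACTIVE = "interactive"
    AUTOMATIC = "automatic"


def _ask(question: str) -> bool:
    return input(f"{question} [y/N] ").strip().lower() == "y"


def run_merges(
    merges: Sequence[M],
    mode: Mode,
    execute: Callable[[M], None],
    approve: Callable[[M], bool] = lambda m: _ask(f"Merge {m}?"),
    confirm: Callable[[int], bool] = lambda n: _ask(f"Execute all {n} merges?"),
) -> List[M]:
    """Apply the chosen execution mode; return the merges that were executed."""
    if mode is Mode.DRY_RUN:
        for merge in merges:
            print(f"[dry-run] would merge {merge}")
        return []  # nothing is changed in Pure
    if mode is Mode.AUTOMATIC and not confirm(len(merges)):
        return []  # one confirmation gates the whole batch
    executed: List[M] = []
    for merge in merges:
        if mode is Mode.INTERACTIVE and not approve(merge):
            continue  # the user declined this particular merge
        execute(merge)
        executed.append(merge)
    return executed
```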

Enable live verification (recommended):

  • Fetches current org data from Pure API
  • Verifies ROR IDs match between CSV and Pure
  • Uses live workflow status (not CSV data)
  • Detects deleted/changed organizations
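Live verification amounts to re-reading each organization before merging and trusting Pure over the CSV. In this minimal sketch, `fetch` is a hypothetical callable returning a simplified record (`{"ror_id": ..., "workflow": ...}`) or `None` when the UUID no longer exists; mapping a real Pure API response onto that shape is left as an assumption.

```python
from __future__ import annotations

from typing import Callable, Optional, Tuple


def _norm(ror: Optional[str]) -> str:
    """Treat 'https://ror.org/0abc12345' and '0abc12345' as the same ID."""
    return (ror or "").strip().lower().rsplit("/", 1)[-1]


def check_live(
    uuid: str, csv_ror: str, fetch: Callable[[str], Optional[dict]]
) -> Tuple[bool, str, Optional[str]]:
    """Return (ok, reason, live workflow status) for one CSV row."""
    if not _norm(csv_ror):
        return False, "no ROR ID in CSV", None
    record = fetch(uuid)
    if record is None:
        return False, "not found in Pure (deleted or already merged?)", None
    if _norm(record.get("ror_id")) != _norm(csv_ror):
        return False, f"ROR ID mismatch: CSV={csv_ror!r}, Pure={record.get('ror_id')!r}", None
    # Use the workflow status from Pure, not the possibly stale CSV value.
    return True, "ok", record.get("workflow")
```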

Merge logic:

  • Groups organizations by ROR ID
  • Requires exactly one "Approved" org per ROR ID (this org becomes the merge target)
  • Merges all other orgs into the approved one
  • ⚠️ IRREVERSIBLE - Test in staging first!
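The grouping rule above can be expressed as a planning step that runs before anything is changed in Pure. A minimal sketch, assuming each row is a dict with hypothetical keys `uuid`, `ror_id` and `workflow`; groups without exactly one approved organization are reported rather than merged.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def plan_merges(
    rows: Iterable[dict],
) -> Tuple[List[Tuple[str, str, List[str]]], Dict[str, str]]:
    """Return ([(ror_id, target_uuid, source_uuids)], {ror_id: skip_reason})."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        ror_id = (row.get("ror_id") or "").strip()
        if ror_id:
            groups[ror_id].append(row)

    plans, skipped = [], {}
    for ror_id, orgs in groups.items():
        if len(orgs) < 2:
            continue  # nothing to merge
        approved = [o for o in orgs if o["workflow"].strip().lower() == "approved"]
        if len(approved) != 1:
            skipped[ror_id] = f"{len(approved)} approved orgs; need exactly 1"
            continue
        target = approved[0]["uuid"]
        sources = [o["uuid"] for o in orgs if o["uuid"] != target]
        plans.append((ror_id, target, sources))
    return plans, skipped
```

Computing the full plan first is what makes a dry run possible: the same `plans` list can be printed, approved one by one, or executed in bulk.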

Scripts

  • getror-rorapi.py: Queries the ROR API with external organization names from a Pure instance to find potential matching ROR IDs.
  • getror-docker.py: Similar to getror-rorapi.py, but queries a local ROR API instance run via Docker (see the ROR API documentation for running it locally).
  • csv-to-ror_docker.py: Reads a CSV file containing organization names and UUIDs, queries a local ROR API Docker instance for matches, and generates an output CSV with ROR IDs.
  • writeror2pure.py: Takes the output CSV from the ROR querying scripts and updates the Pure instance with ROR IDs.
  • merge_ex_orgs_by_rorid.py: Merges external organizations in Pure based on a CSV file containing the Pure UUID, workflow step, and ROR ID of each organization.
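The CSV-in, CSV-out flow of csv-to-ror_docker.py can be sketched with the lookup injected as a function, so the same code works against the public API or a local Docker instance (commonly reached at `http://localhost:9292`, the port used in the ror-api repository's Docker setup; treat this as an assumption). The input column names `uuid` and `name`, the output header, and `annotate_csv` itself are hypothetical.

```python
from __future__ import annotations

import csv
import io
from typing import Callable, Optional, Tuple

# A lookup maps an organization name to (ROR ID, score), or None if nothing matched.
Lookup = Callable[[str], Optional[Tuple[str, float]]]


def annotate_csv(in_text: str, lookup: Lookup) -> str:
    """Read rows with 'uuid' and 'name' columns; return a CSV with ROR matches added."""
    reader = csv.DictReader(io.StringIO(in_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["UUID", "Name", "ROR ID", "Score"])
    for row in reader:
        match = lookup(row["name"])
        ror_id, score = match if match else ("", "")  # keep unmatched rows for review
        writer.writerow([row["uuid"], row["name"], ror_id, score])
    return out.getvalue()
```

Keeping unmatched rows (with empty ROR ID and Score) makes it easy to spot organizations that need manual matching.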

Requirements

  • Python 3.x
  • Required libraries: pip install requests pandas
  • API key for Pure with read/write rights to the /external-organizations/* endpoint
  • Docker (optional, for local ROR API instance)

Recommendations

  • Run in a staging/test environment before moving to production!
  • Check the results in the CSV file and filter out incorrect IDs before writing data back to Pure.

Setup

  1. Clone this repository to your local machine.
  2. Ensure you have Python 3.x installed.
  3. Install the required Python packages:
    pip install requests pandas

Usage

When running the scripts, you need to provide values such as your Pure API key, the Pure base URL, and the locations of the input and output CSV files.
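If you run the scripts repeatedly, it can help to read these values from environment variables and only prompt when one is missing. A minimal sketch; the variable names `PURE_API_KEY`, `PURE_BASE_URL` and `ROR_CSV_PATH` are hypothetical and not read by the scripts in this repository.

```python
from __future__ import annotations

import getpass
import os


def get_setting(env_var: str, prompt: str, secret: bool = False) -> str:
    """Return the environment variable if set, otherwise ask the user.

    Secrets such as API keys are read with getpass so they are not echoed.
    """
    value = os.environ.get(env_var, "").strip()
    if value:
        return value
    return getpass.getpass(prompt) if secret else input(prompt).strip()
```

For example, `api_key = get_setting("PURE_API_KEY", "Pure API key: ", secret=True)` keeps the key out of shell history and source code.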

Querying ROR API

To query the ROR API for organization matches, run:

python getror-rorapi.py

Or, if you're using a local ROR API Docker instance:

python getror-docker.py

Generating and Updating ROR IDs

To generate a CSV with organization names, UUIDs, and their corresponding ROR IDs:

python csv-to-ror_docker.py

To update your Pure instance with ROR IDs from a generated CSV file:

python writeror2pure.py

Merging external organizations in Pure based on ROR ID and workflow status

To merge external organizations in Pure based on a CSV file containing the Pure UUID, workflow step, and ROR ID:

python merge_ex_orgs_by_rorid.py

Contributing

Contributions to this repository are welcome. Please fork the repository and submit a pull request with your changes.
