
Expand assay value sets and add BioPortal links#53

Merged: cmungall merged 2 commits into main from expand-assay-valuesets on Feb 22, 2026

@cmungall
Member

Summary

  • Restructure bio/assays.yaml into bio/assays/ subdirectory with domain-specific static enums (enzymology, immunology, protein engineering, toxicology) and dynamic OBI/BAO-backed enums
  • Add BioPortal [View in BioPortal] link to all generated enum documentation pages via enrich_enum_docs.py
  • Update lab-automation view with comprehensive assay section covering all new and existing NF-OSI assay enums

Details

New schema files:

  • bio/assays/bio_assay.yaml -- dynamic enums for OBI:0000070 (assay) and BAO:0000015 (bioassay) hierarchies
  • bio/assays/enzymology.yaml -- 9 PVs (enzyme activity, kinetics, inhibition, substrate specificity)
  • bio/assays/immunology.yaml -- 9 PVs (ELISA, flow cytometry, western blot, immunoprecipitation)
  • bio/assays/protein_engineering.yaml -- 7 PVs (SPR, BLI, thermal shift, display technologies)
  • bio/assays/toxicology.yaml -- 8 PVs (cytotoxicity, genotoxicity, ADME, dose-response)

All ontology CURIEs verified via OLS. Companion enum annotations document microschema coupling (e.g., which enums for substrates, enzymes, detection modes should pair with each assay type).

BioPortal integration:
Each enum doc page now links to its BioPortal class view at https://bioportal.bioontology.org/ontologies/VALUESETS?p=classes&conceptid=...
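A minimal sketch of how such a link could be assembled from the URL pattern above. The helper name `bioportal_link` and the choice of `safe=""` are assumptions for illustration, not code taken from enrich_enum_docs.py.

```python
from urllib.parse import quote

# Base of the class-view URL quoted in this PR description.
BIOPORTAL_BASE = "https://bioportal.bioontology.org/ontologies/VALUESETS"


def bioportal_link(concept_uri: str) -> str:
    """Return a Markdown 'View in BioPortal' link for one concept URI (hypothetical helper)."""
    # safe="" also percent-encodes ':' and '/', so the whole URI
    # travels as a single query-string value.
    encoded = quote(concept_uri, safe="")
    return f"[View in BioPortal]({BIOPORTAL_BASE}?p=classes&conceptid={encoded})"
```

Encoding the full URI, rather than only its fragment, keeps any `?` or `&` inside the concept URI from being read as part of the BioPortal query string.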

Test plan

  • just validate -- no new errors (all failures are pre-existing NF-OSI CURIE issues)
  • just site -- builds successfully, 637 enriched docs
  • BioPortal links render correctly and are idempotent on rebuild
  • Verify BioPortal links resolve once ontology is next published

🤖 Generated with Claude Code

Restructure bio/assays into subdirectory with domain-specific enums:
- Move existing NF-OSI assays to bio/assays/nf_osi_assays.yaml
- Add dynamic enums for OBI and BAO assay hierarchies (bio_assay.yaml)
- Add static enums for enzymology, immunology, protein engineering, toxicology
- All ontology CURIEs verified via OLS
- Companion enum annotations document microschema coupling

Add BioPortal link to all enum documentation pages via enrich_enum_docs.py.
Update lab-automation.md view with new assay section.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings February 22, 2026 20:49
@claude
Contributor

claude bot commented Feb 22, 2026

Claude finished @cmungall's task


Code Review Complete ✅

I've conducted a comprehensive code review of this PR. Here's my detailed analysis and feedback:

Review Checklist

  • Read repository guidelines and setup instructions
  • Analyze PR structure and organization changes
  • Review new schema files for LinkML best practices
  • Examine BioPortal integration implementation
  • Check for potential bugs and issues
  • Assess performance considerations
  • Evaluate security concerns
  • Review test coverage and validation approach
  • Provide final recommendations

Overall Assessment

This is a high-quality PR that demonstrates excellent LinkML schema design practices. The restructuring of assay value sets into domain-specific modules and the addition of BioPortal integration significantly improve the project's organization and usability.

✅ Strengths

Schema Organization & Best Practices

  • Excellent domain separation: The restructuring from a single bio/assays.yaml to bio/assays/ subdirectory with domain-specific files (enzymology, immunology, protein engineering, toxicology) follows solid architectural principles
  • Proper LinkML conventions: All schemas follow repository guidelines with CamelCase enum names, UPPER_CASE permissible values, and consistent use of CURIE mappings
  • Ontology integration: Proper use of meaning: fields mapping to BAO and OBI ontologies with verified CURIEs
  • Dynamic enum implementation: Well-structured dynamic enums using reachable_from with appropriate source ontologies and relationship types

Code Quality

  • Comprehensive companion annotations: The companion_enum_* annotations provide excellent microschema coupling guidance (e.g., which ChemicalEntity, Protein, DetectionModeEnum should pair with each assay)
  • Generated code consistency: Python datamodel files are properly generated and follow consistent patterns
  • Documentation structure: The lab-automation view provides excellent coverage and organization of the assay domain

BioPortal Integration

  • Clean implementation: The enrich_enum_docs.py script adds BioPortal links elegantly without disrupting existing content
  • Idempotent processing: Uses an HTML comment marker to prevent duplicate processing
  • URL encoding: Proper handling of concept URIs with quote() for safe URL construction
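The marker-based idempotency described above can be sketched as follows. The marker text and the function name are hypothetical; the actual marker used by enrich_enum_docs.py is not shown in this thread.

```python
# Hypothetical marker; stands in for whatever HTML comment the script writes.
MARKER = "<!-- bioportal-link -->"


def add_link_once(page: str, link: str) -> str:
    """Append the link to a generated doc page unless the marker is already present."""
    if MARKER in page:
        # Page was enriched on an earlier build; leave it untouched.
        return page
    return f"{page.rstrip()}\n\n{MARKER}\n{link}\n"
```

Running the function twice on the same page returns the same text as running it once, which is the property the test plan calls "idempotent on rebuild".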

🔍 Technical Review

Performance Considerations

  • ✅ Minimal impact: The restructuring primarily affects schema organization without performance implications
  • ✅ Dynamic enum efficiency: Uses established LinkML patterns for ontology-backed enums
  • ✅ Generation process: Script processing is efficient with caching for OLS label lookups

Security Assessment

  • ✅ No security concerns: All changes involve schema definitions and documentation generation
  • ✅ Safe URL handling: BioPortal URL generation uses proper encoding
  • ✅ No credential exposure: No sensitive information in the changes

Test Coverage & Validation

  • ✅ Validation passing: PR description confirms just validate runs successfully
  • ✅ Build verification: just site builds successfully with 637 enriched docs
  • ✅ Schema integrity: All ontology CURIEs verified via OLS

📋 Minor Suggestions

1. Documentation Enhancement

The BioPortal links will show "View in BioPortal" but won't resolve until the ontology is next published. Consider adding a note in the PR or documentation about this temporary limitation.

2. Annotation Consistency

In src/valuesets/schema/bio/assays/toxicology.yaml:77-79, the ADME_ASSAY entry is missing a meaning: field while other entries have them. Consider adding an appropriate BAO or OBI mapping if available.
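A check like this could be automated. The sketch below assumes the schema has already been loaded into a plain dict; the function name is hypothetical and the sample data is a trimmed stand-in, not the real contents of toxicology.yaml.

```python
def values_missing_meaning(permissible_values: dict) -> list[str]:
    """List permissible values that carry no `meaning:` CURIE."""
    return [
        name
        for name, body in permissible_values.items()
        # A bare YAML key loads as None, so treat that as an empty body.
        if not (body or {}).get("meaning")
    ]


# Trimmed stand-in: only the two entries discussed in this review.
sample = {
    "COMET_ASSAY": {"meaning": "OBI:0302736"},
    "ADME_ASSAY": {},
}
print(values_missing_meaning(sample))
```

Some entries may legitimately have no ontology term, so the result is better treated as a review list than as a hard validation failure.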

3. Error Handling Enhancement

The enrich_enum_docs.py script could benefit from more robust error handling around OLS API calls and file I/O operations, though the current implementation is functional.
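One possible shape for that hardening, as a sketch only: `safe_label` and its parameters are hypothetical, and `fetch` stands for whatever function the script already uses to ask OLS for a label.

```python
import logging
from typing import Callable, Optional

log = logging.getLogger("enrich_enum_docs")


def safe_label(
    curie: str,
    fetch: Callable[[str], str],
    cache: dict,
    retries: int = 2,
) -> Optional[str]:
    """Return a label for curie, or None if every attempt fails."""
    if curie in cache:
        return cache[curie]
    for attempt in range(1, retries + 2):
        try:
            label = fetch(curie)
        except (OSError, ValueError) as exc:
            # Network failures and timeouts are OSError subclasses;
            # JSON decoding errors are ValueError subclasses.
            log.warning("lookup failed for %s (attempt %d): %s", curie, attempt, exc)
            continue
        cache[curie] = label
        return label
    return None
```

Returning `None` instead of raising lets the doc build continue with an unlabeled term, at the cost of needing a visible warning in the build log.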

📝 Code Quality Assessment

| Aspect | Rating | Notes |
| --- | --- | --- |
| Schema Design | ⭐⭐⭐⭐⭐ | Excellent domain separation and LinkML practices |
| Code Organization | ⭐⭐⭐⭐⭐ | Clear restructuring with logical hierarchy |
| Documentation | ⭐⭐⭐⭐⭐ | Comprehensive coverage in lab-automation view |
| Error Handling | ⭐⭐⭐⭐ | Good with room for minor improvements |
| Testing | ⭐⭐⭐⭐ | Proper validation, though limited by environment |

✅ Final Recommendation

APPROVE - This PR represents a significant improvement to the valuesets project with excellent technical execution. The domain-specific restructuring, comprehensive ontology mappings, and BioPortal integration enhance both usability and maintainability.

The changes follow all repository guidelines from CLAUDE.md, maintain backward compatibility through proper imports, and demonstrate thorough testing with validation tools.


Contributor

Copilot AI left a comment

Pull request overview

This pull request restructures assay value sets by moving them from a single bio/assays.yaml file into a dedicated bio/assays/ subdirectory with multiple domain-specific schemas. It adds both static curated enums (enzymology, immunology, protein engineering, toxicology) with companion annotations for microschema coupling, and dynamic ontology-backed enums (OBI and BAO). Additionally, it implements BioPortal link integration in enum documentation and creates a comprehensive lab-automation view.

Changes:

  • Restructured assay schemas into bio/assays/ subdirectory with 6 new schema files
  • Added BioPortal links to all enum documentation pages via enrich_enum_docs.py
  • Created new lab-automation documentation view with comprehensive assay coverage

Reviewed changes

Copilot reviewed 21 out of 22 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/valuesets/schema/valuesets.yaml | Added imports for 6 new assay schema modules |
| src/valuesets/schema/bio/assays/nf_osi_assays.yaml | Renamed from bio/assays.yaml, updated ID and title |
| src/valuesets/schema/bio/assays/bio_assay.yaml | New dynamic enums for OBI:0000070 and BAO:0000015 hierarchies |
| src/valuesets/schema/bio/assays/enzymology.yaml | New static enum with 9 enzyme assay types and companion annotations |
| src/valuesets/schema/bio/assays/immunology.yaml | New static enum with 9 immunology assay types and companion annotations |
| src/valuesets/schema/bio/assays/protein_engineering.yaml | New static enum with 7 protein engineering assay types |
| src/valuesets/schema/bio/assays/toxicology.yaml | New static enum with 8 toxicology assay types |
| src/valuesets/enums/bio/assays/*.py | Generated Python enums for all new assay types |
| scripts/enrich_enum_docs.py | Added BioPortal link generation for all enums |
| mkdocs.yml | Changed navigation section from "Subsets" to "Views" |
| docs/views/lab-automation.md | New comprehensive view documenting lab automation value sets |
| Various datamodel and JSON schema files | Updated with new enum definitions |

          companion_enum_compound: ChemicalEntity
      COMET_ASSAY:
        description: Single cell gel electrophoresis assay measuring DNA strand breaks
        meaning: OBI:0302736

Copilot AI Feb 22, 2026

The CURIE OBI:0302736 appears to be malformed. OBI identifiers typically use 7-digit numbers (e.g., OBI:0000070, OBI:0001271), not 6-digit numbers. This should be verified against the OBI ontology. If this is intentional and the CURIE exists, please confirm; otherwise, it should be corrected to the proper OBI identifier for comet assay.
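The digit count itself is easy to check mechanically. A minimal sketch, assuming OBI local IDs are purely numeric; the helper name is hypothetical, and the check looks only at the shape of the identifier, not at whether the term exists in OBI.

```python
import re

# Assumption: an OBI CURIE is the prefix "OBI:" followed by digits only.
OBI_CURIE = re.compile(r"OBI:(\d+)")


def local_id_length(curie: str):
    """Return the number of digits in an OBI CURIE's local ID, or None if it is not an OBI CURIE."""
    match = OBI_CURIE.fullmatch(curie)
    return len(match.group(1)) if match else None


for curie in ("OBI:0302736", "OBI:0000070"):
    print(curie, local_id_length(curie))
```

Under this check both `OBI:0302736` and `OBI:0000070` have seven-digit local IDs, so the remaining question is only whether the term resolves to comet assay in OBI.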

Comment on lines +36 to +40
  - Views:
    - DOE Genesis Mission: views/genesis-mission.md
    - Microbial Ecology: views/microbial-ecology.md
    - Rare Disease: views/rare-disease.md
    - Lab Automation: views/lab-automation.md

Copilot AI Feb 22, 2026

The navigation section heading was changed from "Subsets" to "Views". While this may be intentional as part of a broader renaming effort, this change is not mentioned in the PR description. Please confirm this is intentional and consider updating the PR description to reflect this navigation structure change if it wasn't already planned.

Comment on lines +21 to +85
enums:
  ImmunologyAssayEnum:
    title: Immunology Assay Types
    description: >-
      Assays for detecting and quantifying immune-related molecules
      including antibody-based detection, cell sorting, and blotting methods.
    status: DRAFT
    contributors:
      - orcid:0000-0002-6601-2165
      - https://github.com/anthropics/claude-code
    instantiates:
      - valuesets_meta:ValueSetEnumDefinition
    permissible_values:
      ELISA:
        description: Enzyme-linked immunosorbent assay for detecting antigens or antibodies
        meaning: BAO:0000134
        annotations:
          companion_enum_antibody_target: Protein
          companion_enum_detection_method: DetectionModeEnum
      SANDWICH_ELISA:
        description: Sandwich ELISA using capture and detection antibodies
        annotations:
          companion_enum_antibody_target: Protein
          companion_enum_detection_method: DetectionModeEnum
      IMMUNOPRECIPITATION:
        description: Precipitation of a protein antigen from solution using a specific antibody
        meaning: BAO:0002508
        annotations:
          companion_enum_antibody_target: Protein
          companion_enum_detection_method: DetectionModeEnum
      FLOW_CYTOMETRY:
        description: Analysis of cell populations using fluorescent antibodies and light scattering
        meaning: BAO:0000005
        annotations:
          companion_enum_antibody_target: Protein
          companion_enum_detection_method: DetectionModeEnum
      WESTERN_BLOT:
        description: Protein detection by gel electrophoresis, membrane transfer, and antibody staining
        meaning: OBI:0000854
        aliases:
          - western blot assay
        annotations:
          companion_enum_antibody_target: Protein
          companion_enum_detection_method: DetectionModeEnum
      CYTOKINE_SECRETION_ASSAY:
        description: Measurement of cytokine types and amounts released from cells
        meaning: BAO:0003003
        annotations:
          companion_enum_antibody_target: Protein
          companion_enum_detection_method: DetectionModeEnum
      IMMUNOBLOT:
        description: Detection of proteins using antibodies on a membrane
        meaning: BAO:0002422
        annotations:
          companion_enum_antibody_target: Protein
          companion_enum_detection_method: DetectionModeEnum
      COMPLEMENT_FIXATION:
        description: Serological assay detecting antibodies by complement consumption
        annotations:
          companion_enum_antibody_target: Protein
      MULTIPLEX_BEAD_ASSAY:
        description: Simultaneous detection of multiple analytes using antibody-conjugated beads
        annotations:
          companion_enum_antibody_target: Protein
          companion_enum_detection_method: DetectionModeEnum

Copilot AI Feb 22, 2026

There is duplication of assay types between CellBasedAssayEnum (in nf_osi_assays.yaml) and ImmunologyAssayEnum. Both define ELISA, SANDWICH_ELISA, FLOW_CYTOMETRY, and WESTERN_BLOT. The ImmunologyAssayEnum versions include more detailed ontology mappings and companion enum annotations, which is valuable. However, users may be confused about which enum to use. Consider documenting the intended use case differences or consolidating these enums to avoid confusion.
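An overlap like this can be surfaced with a simple set intersection. In the sketch below the function name is hypothetical, and the `CellBasedAssayEnum` set contains only the four names cited in this comment, since the rest of that enum is not shown in this thread.

```python
def shared_values(a: set[str], b: set[str]) -> list[str]:
    """Return permissible value names defined in both enums, sorted for stable output."""
    return sorted(a & b)


# Names taken from the ImmunologyAssayEnum listing under review.
immunology = {
    "ELISA", "SANDWICH_ELISA", "IMMUNOPRECIPITATION", "FLOW_CYTOMETRY",
    "WESTERN_BLOT", "CYTOKINE_SECRETION_ASSAY", "IMMUNOBLOT",
    "COMPLEMENT_FIXATION", "MULTIPLEX_BEAD_ASSAY",
}
# Partial stand-in for CellBasedAssayEnum: only the names cited in this comment.
cell_based = {"ELISA", "SANDWICH_ELISA", "FLOW_CYTOMETRY", "WESTERN_BLOT"}

print(shared_values(immunology, cell_based))
```

Comparing names alone will miss cases where the two enums use different labels for the same ontology term, so a comparison on `meaning:` CURIEs would be a useful second pass.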

- Rename docs/subsets/genesis-mission.md to docs/views/
- Add microbial-ecology and rare-disease view docs
- Add genesis mission design plan
- Update ontology term caches (chebi, chmo, envo, ncit, obi, pato)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@cmungall merged commit f37bd4c into main on Feb 22, 2026
5 checks passed