-
Notifications
You must be signed in to change notification settings - Fork 24
feat: implementa validações completas para elemento <article-id> conforme especificação SPS #1069
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
feat: implementa validações completas para elemento <article-id> conforme especificação SPS #1069
Conversation
Correções e novas funcionalidades: - Fix: corrige lógica invertida em validate_doi_registered() (remove operador 'not' que causava bypass da validação) - Feat: adiciona validate_doi_format() para verificar formato padrão 10.xxxx/yyyy e caracteres permitidos (a-zA-Z0-9-._; ()/) - Feat: implementa validate_doi_exists_for_all_subarticles() para garantir DOI obrigatório em TODOS sub-articles - Feat: adiciona validações de localização e estrutura XML - Integra com ArticleOtherValidation para validações de other
Implementa propriedade 'all_data' que retorna estrutura completa dos DOIs com metadados de contexto (parent, parent_id, parent_article_type, lang). Mantém retrocompatibilidade com propriedade 'data' existente.
Novo módulo que implementa validações específicas para article-id[@pub-id-type="other"]: - validate_other_format(): valida formato de 5 dígitos (00001-99999) - validate_other_exists(): verifica existência obrigatória - validate_other_exists_for_continuous_publication(): valida obrigatoriedade em publicação contínua (com elocation-id) - validate_other_exists_for_irregular_pagination(): valida obrigatoriedade quando fpage/lpage ausentes
Expande expressão regular em validate_doi_format() para incluir todos os caracteres permitidos pela especificação CrossRef: - Adiciona: underscore (_), ponto e vírgula (;), parênteses (()) - Regex completa: [a-zA-Z0-9\-._; ()/]+ - Melhora mensagens de erro para caracteres inválidos
Adiciona 12 novos testes e corrige 5 existentes: - Testa correção do bug de lógica invertida em skip_doi_check - Testa validação de formato DOI com caracteres especiais - Testa detecção de caracteres proibidos (acentos, \, etc) - Testa obrigatoriedade de DOI em sub-articles - Testa casos edge de localização e estrutura XML - Testa integração com validações de other
Implementa 3 novos testes verificando: - Estrutura de dados retornada por all_data - Metadados de contexto (parent, parent_id, parent_article_type) - Informações de idioma associadas aos DOIs - Retrocompatibilidade com propriedade 'data' existente
Implementa 9 testes cobrindo todas as validações de other: - test_validate_other_format_valid: testa formato correto (5 dígitos) - test_validate_other_format_invalid_length: testa comprimento incorreto - test_validate_other_format_non_numeric: testa caracteres não numéricos - test_validate_other_exists_present: verifica detecção de existência - test_validate_other_exists_missing: testa erro quando ausente - test_validate_other_continuous_publication_valid: PC com other - test_validate_other_continuous_publication_missing: PC sem other - test_validate_other_irregular_pagination_valid: paginação irregular válida - test_validate_other_irregular_pagination_missing: paginação sem other
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Implementa validações mais completas para <article-id> (principalmente DOI e pub-id-type="other") conforme regras SPS/Crossref, incluindo correções na checagem de registro de DOI e ampliação do regex de DOI.
Changes:
- Corrige a lógica de
skip_doi_checkemvalidate_doi_registered()e melhora a geração de respostas viabuild_response. - Amplia validação de formato de DOI (regex e validação de sufixo) e adiciona suporte a coletar DOI de todos os sub-articles (
all_data). - Introduz modelo/validações para
<article-id pub-id-type="other">e adiciona/expande testes para cobrir os novos cenários.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
packtools/sps/validation/utils.py |
Atualiza regex/mensagens de validação de formato de DOI. |
packtools/sps/validation/article_doi.py |
Corrige bug de lógica no check de registro de DOI, adiciona validações novas (DOI all sub-articles, formato DOI, validações de other). |
packtools/sps/models/article_doi_with_lang.py |
Adiciona all_data para capturar DOIs de todos os sub-articles. |
packtools/sps/models/article_other.py |
Novo modelo OtherWithLang para coletar other do artigo e sub-articles. |
tests/sps/validation/test_article_doi.py |
Ajusta testes existentes e adiciona cobertura para novos comportamentos (DOI all sub-articles, formato DOI, validações de other). |
tests/sps/models/test_article_doi_with_lang.py |
Adiciona testes para all_data vs data. |
tests/sps/models/test_article_other.py |
Novo arquivo de testes para o modelo OtherWithLang. |
Comments suppressed due to low confidence (1)
packtools/sps/validation/article_doi.py:280
build_responseexpectsparentto provideparent_lang, but hereparent=itemcomes fromDoiWithLang.datawhich useslang(notparent_lang). This makesparent_langcome out asNonein the response. Build theparentdict like the other methods in this file (mappingitem['lang']→parent_lang).
yield build_response(
title="unique DOI",
parent=item,
item="article-id",
sub_item='@pub-id-type="doi"',
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| else: | ||
| error_level = "WARNING" | ||
| advice = ( | ||
| f"Unable to check if {xml_doi} is registered for {expected}" | ||
| ) | ||
| advice = f"Unable to check if {xml_doi} is registered for {expected}" | ||
| advice_text = "Unable to check if {xml_doi} is registered for {expected}" | ||
| advice_params = { | ||
| "xml_doi": xml_doi, | ||
| "expected": str(expected) | ||
| } |
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
error_level is reassigned inside the loop (line 219), so once one DOI falls into the “Unable to check …” branch, subsequent DOIs will also be reported with WARNING even if they should use the original severity. Use a per-iteration variable (e.g., current_error_level) or reset error_level at the start of each loop iteration.
| generator of dict | ||
| Yields validation result. | ||
| """ | ||
| has_elocation = bool(self.xmltree.xpath('//elocation-id')) |
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
has_elocation = bool(self.xmltree.xpath('//elocation-id')) is too broad and will match <elocation-id> in other contexts (e.g., inside references), causing false errors requiring <article-id pub-id-type='other'>. Scope this to the article’s pagination metadata (e.g., .//front/article-meta/elocation-id, consistent with packtools/sps/models/front_articlemeta_issue.py:142-144).
| has_elocation = bool(self.xmltree.xpath('//elocation-id')) | |
| has_elocation = bool(self.xmltree.xpath('.//front/article-meta/elocation-id')) |
| def validate_doi_exists_all_subarticles(self, error_level="CRITICAL"): | ||
| """ | ||
| Checks for the existence of DOI in article and ALL types of sub-articles. | ||
| This method validates article and all sub-article types including: | ||
| - translation |
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These new validations (validate_doi_exists_all_subarticles, validate_doi_format, and ArticleOtherValidation) are not referenced anywhere outside this module (search shows no call sites), so they won’t run in the normal validation pipeline. Wire them into the orchestrator (e.g., packtools/sps/validation/xml_validations.py::validate_article_ids) or whichever entrypoint is used, otherwise the new SPS rules won’t be enforced in production.
| from packtools.sps.models.article_doi_with_lang import DoiWithLang | ||
| from packtools.sps.models.article_other import OtherWithLang | ||
| from packtools.sps.validation.utils import ( | ||
| format_response, |
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
format_response is imported but not used anywhere in this module. Removing it will avoid dead code and keep the imports accurate.
| format_response, |
| from packtools.sps.validation.utils import ( | ||
| format_response, | ||
| check_doi_is_registered, | ||
| build_response, | ||
| validate_doi_format, | ||
| ) |
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The imported helper validate_doi_format has the same name as the ArticleDoiValidation.validate_doi_format() method, which is easy to misread and can lead to accidental recursion/refactor bugs. Consider aliasing the import (e.g., validate_doi_format as validate_doi_format_util) and using that inside the method.
| @@ -1,9 +1,9 @@ | |||
| import unittest | |||
| from unittest.mock import Mock, patch | |||
| from unittest.mock import Mock, patch, MagicMock | |||
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mock and MagicMock are imported but never used in this test module. Please remove the unused imports to keep the module clean.
| from unittest.mock import Mock, patch, MagicMock | |
| from unittest.mock import patch |
| The 'other' element is mandatory for: | ||
| - Continuous publication (PC) mode when elocation-id exists | ||
| - Regular mode with irregular pagination |
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The class docstring states that other is mandatory for “regular mode with irregular pagination”, but the implementation only checks continuous publication via <elocation-id> and doesn’t implement any irregular-pagination detection. Either implement the irregular pagination rule here or remove/clarify that part of the docstring to avoid misleading users.
| The 'other' element is mandatory for: | |
| - Continuous publication (PC) mode when elocation-id exists | |
| - Regular mode with irregular pagination | |
| The 'other' element is mandatory in the following case: | |
| - Continuous publication (PC) mode when elocation-id exists |
| class ArticleOtherValidation: | ||
| """ | ||
| Validates <article-id pub-id-type="other"> elements. | ||
| The 'other' element is mandatory for: | ||
| - Continuous publication (PC) mode when elocation-id exists | ||
| - Regular mode with irregular pagination | ||
| Format: Exactly 5 numeric digits (00001-99999) | ||
| """ | ||
|
|
||
| def __init__(self, xmltree): | ||
| self.xmltree = xmltree | ||
| self.other = OtherWithLang(xmltree) | ||
|
|
||
| def validate_other_format(self, error_level="ERROR"): |
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This introduces a second, overlapping validation for <article-id pub-id-type="other"> (ArticleIdValidation.validate_article_id_other in packtools/sps/validation/article_and_subarticles.py:268-303 already validates other). Having two validators with different rules (max 5 chars vs exactly 5 digits) risks inconsistent reports; consider consolidating the logic into the existing validator (or updating it) rather than adding a parallel one.
| @patch("packtools.sps.models.article_titles.ArticleTitles") | ||
| @patch("packtools.sps.models.article_contribs.XMLContribs") | ||
| def test_all_data_includes_all_subarticles(self, mock_contribs, mock_titles): |
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These patches target packtools.sps.models.article_titles.ArticleTitles / packtools.sps.models.article_contribs.XMLContribs, but DoiWithLang imports them into packtools.sps.models.article_doi_with_lang via from ... import ..., so these decorators won’t affect the references used by DoiWithLang. Patch packtools.sps.models.article_doi_with_lang.ArticleTitles and .XMLContribs instead (as already done in test_data_missing_author_info).
| contrib_name = author["contrib_name"] | ||
| fullname = f'{contrib_name["surname"]}, {contrib_name["given-names"]}' | ||
| xml_authors.append(fullname) | ||
| except KeyError: |
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'except' clause does nothing but pass and there is no explanatory comment.
| except KeyError: | |
| except KeyError: | |
| # Ignore authors that do not have complete name information |
O que esse PR faz?
Implementa validações completas para
<article-id>conforme regras SPS.Correções:
validate_doi_registered()(linha 202)validate_doi_format()rejeitava DOIs válidosNovas validações:
a-zA-Z0-9-._; ()/)<other>: formato 5 dígitos, obrigatoriedade em PC e paginação irregularall_dataemArticleDOIWithLangpara contextoImpacto: Cobertura de regras 33% → 83% | 8 novas validações | 2 bugs críticos corrigidos
Onde a revisão poderia começar?
utils.py(linhas 41-87) - Correção da regex emvalidate_doi_format()article_other.py(arquivo novo) - ClasseArticleOtherValidationarticle_doi.py(linhas 202-250, 300-350) - Bug fix + novos métodosarticle_doi_with_lang.py(linhas 80-120) - Nova propriedadeall_datatest_article_other.py→test_article_doi.py→test_article_doi_with_lang.pyComo este poderia ser testado manualmente?
Validar formato DOI:
Validar other:
Algum cenário de contexto que queira dar?
Auditoria de conformidade SciELO Brasil identificou:
<article-id>validadas<other>(violação não detectada)Este PR corrige essas lacunas para conformidade total com SPS 1.10+.
Screenshots
N.A.
Quais são tickets relevantes?
N.A.
Referências
N.A.