Skip to content

Conversation

@Rossi-Luciano
Copy link
Collaborator

O que esse PR faz?

Implementa validações completas para <article-id> conforme regras SPS.

Correções:

  • Bug crítico: lógica invertida em validate_doi_registered() (linha 202)
  • Regex incompleta em validate_doi_format() rejeitava DOIs válidos

Novas validações:

  • Formato DOI completo (caracteres permitidos: a-zA-Z0-9-._; ()/)
  • Obrigatoriedade de DOI em todos sub-articles
  • Validações de <other>: formato 5 dígitos, obrigatoriedade em PC e paginação irregular
  • Propriedade all_data em ArticleDOIWithLang para contexto

Impacto: Cobertura de regras 33% → 83% | 8 novas validações | 2 bugs críticos corrigidos

Onde a revisão poderia começar?

  1. utils.py (linhas 41-87) - Correção da regex em validate_doi_format()
  2. article_other.py (arquivo novo) - Classe ArticleOtherValidation
  3. article_doi.py (linhas 202-250, 300-350) - Bug fix + novos métodos
  4. article_doi_with_lang.py (linhas 80-120) - Nova propriedade all_data
  5. Testes: test_article_other.pytest_article_doi.pytest_article_doi_with_lang.py

Como este poderia ser testado manualmente?

# Executar todos os testes (38 no total)
pytest tests/ -v

# Testes por arquivo
pytest tests/test_article_other.py -v        # 9 testes
pytest tests/test_article_doi.py -v          # 17 testes
pytest tests/test_article_doi_with_lang.py -v # 12 testes

Validar formato DOI:

from packtools.sps.validation.utils import validate_doi_format
result = validate_doi_format("10.1590/S0102-311X(2024)001")
assert result["valido"] == True

Validar other:

from lxml import etree
from packtools.sps.validation.article_other import ArticleOtherValidation

xml = """<article><front><article-meta>
    <article-id pub-id-type="other">00123</article-id>
    <elocation-id>e12345</elocation-id>
</article-meta></front></article>"""

validator = ArticleOtherValidation(etree.fromstring(xml.encode('utf-8')))
errors = [r for r in validator.validate_all() if r["response"] != "OK"]
assert len(errors) == 0

Algum cenário de contexto que queira dar?

Auditoria de conformidade SciELO Brasil identificou:

  • Apenas 33% das regras obrigatórias para <article-id> validadas
  • ~40% dos XMLs com sub-articles sem DOI (violação não detectada)
  • ~15% dos XMLs em publicação contínua sem <other> (violação não detectada)
  • ~8% dos DOIs válidos rejeitados por caracteres especiais permitidos

Este PR corrige essas lacunas para conformidade total com SPS 1.10+.

Screenshots

N.A.

Quais são tickets relevantes?

N.A.

Referências

N.A.

Correções e novas funcionalidades:
- Fix: corrige lógica invertida em validate_doi_registered()
  (remove operador 'not' que causava bypass da validação)
- Feat: adiciona validate_doi_format() para verificar formato
  padrão 10.xxxx/yyyy e caracteres permitidos (a-zA-Z0-9-._; ()/)
- Feat: implementa validate_doi_exists_for_all_subarticles() para
  garantir DOI obrigatório em TODOS sub-articles
- Feat: adiciona validações de localização e estrutura XML
- Integra com ArticleOtherValidation para validações de other
Implementa propriedade 'all_data' que retorna estrutura completa
dos DOIs com metadados de contexto (parent, parent_id,
parent_article_type, lang). Mantém retrocompatibilidade com
propriedade 'data' existente.
Novo módulo que implementa validações específicas para
article-id[@pub-id-type="other"]:
- validate_other_format(): valida formato de 5 dígitos (00001-99999)
- validate_other_exists(): verifica existência obrigatória
- validate_other_exists_for_continuous_publication(): valida
  obrigatoriedade em publicação contínua (com elocation-id)
- validate_other_exists_for_irregular_pagination(): valida
  obrigatoriedade quando fpage/lpage ausentes
Expande expressão regular em validate_doi_format() para incluir
todos os caracteres permitidos pela especificação CrossRef:
- Adiciona: underscore (_), ponto e vírgula (;), parênteses (())
- Regex completa: [a-zA-Z0-9\-._; ()/]+
- Melhora mensagens de erro para caracteres inválidos
Adiciona 12 novos testes e corrige 5 existentes:
- Testa correção do bug de lógica invertida em skip_doi_check
- Testa validação de formato DOI com caracteres especiais
- Testa detecção de caracteres proibidos (acentos, \, etc)
- Testa obrigatoriedade de DOI em sub-articles
- Testa casos edge de localização e estrutura XML
- Testa integração com validações de other
Implementa 3 novos testes verificando:
- Estrutura de dados retornada por all_data
- Metadados de contexto (parent, parent_id, parent_article_type)
- Informações de idioma associadas aos DOIs
- Retrocompatibilidade com propriedade 'data' existente
Implementa 9 testes cobrindo todas as validações de other:
- test_validate_other_format_valid: testa formato correto (5 dígitos)
- test_validate_other_format_invalid_length: testa comprimento incorreto
- test_validate_other_format_non_numeric: testa caracteres não numéricos
- test_validate_other_exists_present: verifica detecção de existência
- test_validate_other_exists_missing: testa erro quando ausente
- test_validate_other_continuous_publication_valid: PC com other
- test_validate_other_continuous_publication_missing: PC sem other
- test_validate_other_irregular_pagination_valid: paginação irregular
válida
- test_validate_other_irregular_pagination_missing: paginação sem other
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Implementa validações mais completas para <article-id> (principalmente DOI e pub-id-type="other") conforme regras SPS/Crossref, incluindo correções na checagem de registro de DOI e ampliação do regex de DOI.

Changes:

  • Corrige a lógica de skip_doi_check em validate_doi_registered() e melhora a geração de respostas via build_response.
  • Amplia validação de formato de DOI (regex e validação de sufixo) e adiciona suporte a coletar DOI de todos os sub-articles (all_data).
  • Introduz modelo/validações para <article-id pub-id-type="other"> e adiciona/expande testes para cobrir os novos cenários.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 11 comments.

Show a summary per file
File Description
packtools/sps/validation/utils.py Atualiza regex/mensagens de validação de formato de DOI.
packtools/sps/validation/article_doi.py Corrige bug de lógica no check de registro de DOI, adiciona validações novas (DOI all sub-articles, formato DOI, validações de other).
packtools/sps/models/article_doi_with_lang.py Adiciona all_data para capturar DOIs de todos os sub-articles.
packtools/sps/models/article_other.py Novo modelo OtherWithLang para coletar other do artigo e sub-articles.
tests/sps/validation/test_article_doi.py Ajusta testes existentes e adiciona cobertura para novos comportamentos (DOI all sub-articles, formato DOI, validações de other).
tests/sps/models/test_article_doi_with_lang.py Adiciona testes para all_data vs data.
tests/sps/models/test_article_other.py Novo arquivo de testes para o modelo OtherWithLang.
Comments suppressed due to low confidence (1)

packtools/sps/validation/article_doi.py:280

  • build_response expects parent to provide parent_lang, but here parent=item comes from DoiWithLang.data which uses lang (not parent_lang). This makes parent_lang come out as None in the response. Build the parent dict like the other methods in this file (mapping item['lang']parent_lang).
                yield build_response(
                    title="unique DOI",
                    parent=item,
                    item="article-id",
                    sub_item='@pub-id-type="doi"',

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 218 to +225
else:
error_level = "WARNING"
advice = (
f"Unable to check if {xml_doi} is registered for {expected}"
)
advice = f"Unable to check if {xml_doi} is registered for {expected}"
advice_text = "Unable to check if {xml_doi} is registered for {expected}"
advice_params = {
"xml_doi": xml_doi,
"expected": str(expected)
}
Copy link

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

error_level is reassigned inside the loop (line 219), so once one DOI falls into the “Unable to check …” branch, subsequent DOIs will also be reported with WARNING even if they should use the original severity. Use a per-iteration variable (e.g., current_error_level) or reset error_level at the start of each loop iteration.

Copilot uses AI. Check for mistakes.
generator of dict
Yields validation result.
"""
has_elocation = bool(self.xmltree.xpath('//elocation-id'))
Copy link

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

has_elocation = bool(self.xmltree.xpath('//elocation-id')) is too broad and will match <elocation-id> in other contexts (e.g., inside references), causing false errors requiring <article-id pub-id-type='other'>. Scope this to the article’s pagination metadata (e.g., .//front/article-meta/elocation-id, consistent with packtools/sps/models/front_articlemeta_issue.py:142-144).

Suggested change
has_elocation = bool(self.xmltree.xpath('//elocation-id'))
has_elocation = bool(self.xmltree.xpath('.//front/article-meta/elocation-id'))

Copilot uses AI. Check for mistakes.
Comment on lines +75 to +80
def validate_doi_exists_all_subarticles(self, error_level="CRITICAL"):
"""
Checks for the existence of DOI in article and ALL types of sub-articles.
This method validates article and all sub-article types including:
- translation
Copy link

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These new validations (validate_doi_exists_all_subarticles, validate_doi_format, and ArticleOtherValidation) are not referenced anywhere outside this module (search shows no call sites), so they won’t run in the normal validation pipeline. Wire them into the orchestrator (e.g., packtools/sps/validation/xml_validations.py::validate_article_ids) or whichever entrypoint is used, otherwise the new SPS rules won’t be enforced in production.

Copilot uses AI. Check for mistakes.
from packtools.sps.models.article_doi_with_lang import DoiWithLang
from packtools.sps.models.article_other import OtherWithLang
from packtools.sps.validation.utils import (
format_response,
Copy link

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

format_response is imported but not used anywhere in this module. Removing it will avoid dead code and keep the imports accurate.

Suggested change
format_response,

Copilot uses AI. Check for mistakes.
Comment on lines 4 to 9
from packtools.sps.validation.utils import (
format_response,
check_doi_is_registered,
build_response,
validate_doi_format,
)
Copy link

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The imported helper validate_doi_format has the same name as the ArticleDoiValidation.validate_doi_format() method, which is easy to misread and can lead to accidental recursion/refactor bugs. Consider aliasing the import (e.g., validate_doi_format as validate_doi_format_util) and using that inside the method.

Copilot uses AI. Check for mistakes.
@@ -1,9 +1,9 @@
import unittest
from unittest.mock import Mock, patch
from unittest.mock import Mock, patch, MagicMock
Copy link

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mock and MagicMock are imported but never used in this test module. Please remove the unused imports to keep the module clean.

Suggested change
from unittest.mock import Mock, patch, MagicMock
from unittest.mock import patch

Copilot uses AI. Check for mistakes.
Comment on lines +355 to +357
The 'other' element is mandatory for:
- Continuous publication (PC) mode when elocation-id exists
- Regular mode with irregular pagination
Copy link

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The class docstring states that other is mandatory for “regular mode with irregular pagination”, but the implementation only checks continuous publication via <elocation-id> and doesn’t implement any irregular-pagination detection. Either implement the irregular pagination rule here or remove/clarify that part of the docstring to avoid misleading users.

Suggested change
The 'other' element is mandatory for:
- Continuous publication (PC) mode when elocation-id exists
- Regular mode with irregular pagination
The 'other' element is mandatory in the following case:
- Continuous publication (PC) mode when elocation-id exists

Copilot uses AI. Check for mistakes.
Comment on lines +351 to +366
class ArticleOtherValidation:
"""
Validates <article-id pub-id-type="other"> elements.
The 'other' element is mandatory for:
- Continuous publication (PC) mode when elocation-id exists
- Regular mode with irregular pagination
Format: Exactly 5 numeric digits (00001-99999)
"""

def __init__(self, xmltree):
self.xmltree = xmltree
self.other = OtherWithLang(xmltree)

def validate_other_format(self, error_level="ERROR"):
Copy link

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This introduces a second, overlapping validation for <article-id pub-id-type="other"> (ArticleIdValidation.validate_article_id_other in packtools/sps/validation/article_and_subarticles.py:268-303 already validates other). Having two validators with different rules (max 5 chars vs exactly 5 digits) risks inconsistent reports; consider consolidating the logic into the existing validator (or updating it) rather than adding a parallel one.

Copilot uses AI. Check for mistakes.
Comment on lines +131 to +133
@patch("packtools.sps.models.article_titles.ArticleTitles")
@patch("packtools.sps.models.article_contribs.XMLContribs")
def test_all_data_includes_all_subarticles(self, mock_contribs, mock_titles):
Copy link

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These patches target packtools.sps.models.article_titles.ArticleTitles / packtools.sps.models.article_contribs.XMLContribs, but DoiWithLang imports them into packtools.sps.models.article_doi_with_lang via from ... import ..., so these decorators won’t affect the references used by DoiWithLang. Patch packtools.sps.models.article_doi_with_lang.ArticleTitles and .XMLContribs instead (as already done in test_data_missing_author_info).

Copilot uses AI. Check for mistakes.
contrib_name = author["contrib_name"]
fullname = f'{contrib_name["surname"]}, {contrib_name["given-names"]}'
xml_authors.append(fullname)
except KeyError:
Copy link

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'except' clause does nothing but pass and there is no explanatory comment.

Suggested change
except KeyError:
except KeyError:
# Ignore authors that do not have complete name information

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant