Skip to content

Conversation

@Rossi-Luciano
Copy link
Collaborator

O que esse PR faz?

Este PR implementa a validação completa de abstracts conforme as regras da especificação SPS 1.10, corrigindo bugs críticos e adicionando funcionalidades ausentes:

Problemas Solucionados:

  • Corrige lógica invertida em validate_p_multiple() que validava incorretamente highlights com apenas 1 parágrafo
  • Adiciona validação ausente para keywords proibidas em abstracts especiais (graphical, key-points, summary)
  • Corrige modelo de dados que não retornava campo graphic para Visual Abstract
  • Implementa filtro de resultados None no fluxo de validação

Features Adicionadas:

  • Suporte completo a Visual Abstract (graphic, fig_id, caption)
  • Validação de keywords como elementos irmãos de <abstract> (não dentro)
  • Internacionalização (i18n) com advice_text e advice_params
  • 15 novos testes cobrindo todos os cenários SPS 1.10
  • Validação robusta de estruturas de dados (dict vs string)

Onde a revisão poderia começar?

  1. Modelo de dados: packtools/sps/models/v2/abstract.py

    • Linhas ~161-210: Novas propriedades graphic, fig_id, caption
    • Linhas ~220-235: Inclusão dos novos campos no dict data
  2. Validação crítica: packtools/sps/validation/article_abstract.py

    • Linhas ~265-301: Novo método validate_kwd_not_allowed()
    • Linhas ~395-425: Correção em validate_p_multiple() (linha ~418: > 1 ao invés de <= 1)
    • Linhas ~139-172: Filtro de None no método validate()
  3. Testes: tests/sps/validation/test_article_abstract_additional.py

    • Linhas ~28-120: Testes para Summary abstract
    • Linhas ~122-250: Testes para resumo simples/estruturado
    • Linhas ~252-360: Testes para cenários positivos

Como este poderia ser testado manualmente?

Teste 1: Validação de Visual Abstract com graphic

python3 << 'EOF'
from lxml import etree as ET
from packtools.sps.models.v2.abstract import XMLAbstracts
from packtools.sps.validation.article_abstract import AbstractValidation

xml = """<article xmlns:xlink="http://www.w3.org/1999/xlink" xml:lang="en">
<front><article-meta>
<abstract abstract-type="graphical" xml:lang="en">
    <title>Visual Abstract</title>
    <p><fig id="vs1"><graphic xlink:href="image.jpg"/></fig></p>
</abstract>
</article-meta></front></article>"""

xmltree = ET.fromstring(xml)
abstracts = list(XMLAbstracts(xmltree).visual_abstracts)
validator = AbstractValidation(abstracts[0])
results = list(validator.validate())

# Deve retornar graphic="image.jpg" e validação OK
print("Graphic:", abstracts[0].get("graphic"))
print("Validação graphic:", [r for r in results if r["title"] == "graphic"][0])
EOF

Resultado esperado: graphic: "image.jpg" e response: "OK"

Teste 2: Validação de keywords proibidas em Highlights

python3 << 'EOF'
from lxml import etree as ET
from packtools.sps.models.v2.abstract import XMLAbstracts
from packtools.sps.validation.article_abstract import AbstractValidation

xml = """<article xml:lang="en">
<front><article-meta>
<abstract abstract-type="key-points" xml:lang="en">
    <title>HIGHLIGHTS</title>
    <p>Highlight 1</p>
    <p>Highlight 2</p>
</abstract>
<kwd-group xml:lang="en">
    <title>Keywords:</title>
    <kwd>keyword1</kwd>
</kwd-group>
</article-meta></front></article>"""

xmltree = ET.fromstring(xml)
abstracts = list(XMLAbstracts(xmltree).key_points_abstracts)
validator = AbstractValidation(abstracts[0])
results = list(validator.validate())

# Deve detectar erro: keywords não permitidas
print("Keywords encontradas:", abstracts[0].get("kwds"))
print("Validação:", [r for r in results if r["title"] == "unexpected kwd"][0])
EOF

Resultado esperado: response: "ERROR" com advice para remover keywords

Teste 3: Suite completa de testes

cd /home/luciano/projects/packtools
python -m unittest tests.sps.validation.test_article_abstract

Resultado esperado: OK (skipped=3) - 12 testes passando, 3 skipped

Algum cenário de contexto que queira dar?

Contexto da Mudança

O packtools possui dois modelos de dados para abstracts:

  1. Modelo antigo (packtools/sps/models/article_abstract.py):

    • Buscava keywords dentro de <abstract> (incorreto segundo SPS 1.10)
    • Tinha campo graphic para Visual Abstract
  2. Modelo novo (packtools/sps/models/v2/abstract.py):

    • Busca keywords como irmãos de <abstract> (correto segundo SPS 1.10)
    • Faltava campo graphic (migração incompleta)

Especificação SPS 1.10

Segundo a documentação SPS 1.10 para <abstract>:

  • Resumos simples/estruturados (sem @abstract-type): EXIGEM <kwd-group> irmão
  • Tipos especiais (graphical, key-points, summary): NÃO PERMITEM <kwd-group> associado
  • Highlights (key-points): Devem ter múltiplos <p>, um para cada destaque
  • Visual Abstract (graphical): Deve ter elemento <graphic> obrigatório

Bugs Críticos Encontrados

  1. validate_p_multiple() tinha lógica invertida:
   # ANTES (ERRADO)
   is_valid = len(p_list) <= 1  # Validava OK com 1 parágrafo
   
   # DEPOIS (CORRETO)
   is_valid = len(p_list) > 1   # Valida OK com múltiplos parágrafos
  1. Validação ausente para keywords proibidas em tipos especiais

  2. Modelo incompleto sem campo graphic para Visual Abstract

Screenshots

N.A. - Funcionalidade backend de validação XML

Quais são tickets relevantes?

N.A.

Referências

N.A.

Adiciona suporte completo para Visual Abstract (graphical) conforme SPS
1.10:
- Propriedade graphic: extrai xlink:href do elemento <graphic>
- Propriedade fig_id: extrai ID da figura
- Propriedade caption: extrai legenda da figura
- Inclui campos no dict retornado por data()
Correções críticas:
- Corrige lógica invertida em validate_p_multiple() para highlights
- Implementa validate_kwd_not_allowed() para tipos especiais
- Filtra resultados None no método validate()

Melhorias:
- Adiciona suporte a internacionalização (i18n)
- Valida title como dict (plain_text/html_text)
- Validação robusta de graphic em visual abstracts

Conforme especificação SPS 1.10: keywords proibidas em
graphical, key-points e summary; obrigatórias em simples/estruturado.
Cobertura de testes:
- Summary (In Brief): validação de keywords e title
- Resumo simples/estruturado: keywords obrigatórias
- Highlights: múltiplos <p> e ausência de keywords
- Visual Abstract: presença de graphic e ausência de keywords
- Trans-abstract: associação de keywords por idioma
- Abstract-type: valores válidos/inválidos

Testes robustos com skipTest() para features não disponíveis.
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request implements complete validation of abstracts according to SPS 1.10 specification, fixing critical bugs and adding missing functionality to the packtools abstract validation system.

Changes:

  • Fixed inverted logic in validate_p_multiple() that incorrectly validated highlights with only 1 paragraph (line 404: changed from <= 1 to > 1)
  • Added validate_kwd_not_allowed() method to prohibit keywords in special abstract types (graphical, key-points, summary)
  • Implemented missing graphic, fig_id, and caption properties in the v2 Abstract model for Visual Abstract support
  • Added internationalization support with advice_text and advice_params fields throughout validation methods
  • Implemented None result filtering in the validation flow (lines 133-164)
  • Added comprehensive test suite with 15 new test cases covering all SPS 1.10 abstract scenarios

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File Description
packtools/sps/models/v2/abstract.py Added three new properties (graphic, fig_id, caption) to extract visual abstract elements and included them in the data dictionary output
packtools/sps/validation/article_abstract.py Fixed critical bug in validate_p_multiple(), added validate_kwd_not_allowed() method, implemented i18n support, and restructured validation flow with None filtering
tests/sps/validation/test_article_abstract.py Complete rewrite with 15 new test cases organized into 7 test classes covering summary, simple, structured, highlights, visual abstract, abstract-type, and trans-abstract validation scenarios

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 396 to 409
# Check if graphic field exists in model data
# If model returns graphic, it should have a value
if "graphic" in abstract_data:
self.assertIsNotNone(abstract_data["graphic"],
"Graphic field exists but is None - graphic should be extracted from XML")

validator = AbstractValidation(abstract_data)
obtained = list(validator.validate())

graphic_validation = [v for v in obtained if v["title"] == "graphic"]

if graphic_validation:
# If graphic validation exists, it should pass for valid graphic
self.assertEqual(graphic_validation[0]["response"], "OK")
Copy link

Copilot AI Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test has conditional assertions (lines 398-400 and 407-409) that silently pass if the conditions are not met. This is a weak test pattern that doesn't guarantee the expected behavior is validated. Consider using self.assertIn("graphic", abstract_data) to ensure the field exists, and always check the graphic_validation list is not empty before accessing it, or use self.assertTrue(graphic_validation) to fail if validation is missing. This ensures the test actually validates what it claims to validate.

Copilot uses AI. Check for mistakes.
Comment on lines 204 to 207
graphic_node = self.node.find('.//graphic', namespaces={'xlink': 'http://www.w3.org/1999/xlink'})
if graphic_node is not None:
return graphic_node.get('{http://www.w3.org/1999/xlink}href')
return None
Copy link

Copilot AI Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The find() method doesn't officially support the namespaces parameter according to lxml documentation. The recommended approach is to use xpath() with namespaces or include the namespace directly in the tag name like find('.//{http://www.w3.org/1999/xlink}graphic'). While this pattern appears to work (it's used in the old model), consider using the more standard approach: self.node.xpath('.//graphic/@xlink:href', namespaces={'xlink': 'http://www.w3.org/1999/xlink'}) which would return the href directly, or find('.//{http://www.w3.org/1999/xlink}graphic') for better code clarity and future compatibility.

Suggested change
graphic_node = self.node.find('.//graphic', namespaces={'xlink': 'http://www.w3.org/1999/xlink'})
if graphic_node is not None:
return graphic_node.get('{http://www.w3.org/1999/xlink}href')
return None
hrefs = self.node.xpath(
'.//graphic/@xlink:href',
namespaces={'xlink': 'http://www.w3.org/1999/xlink'},
)
return hrefs[0] if hrefs else None

Copilot uses AI. Check for mistakes.
Comment on lines 227 to 228
# Check if graphic exists and is not empty
has_graphic = bool(graphic and (graphic if isinstance(graphic, str) else True))
Copy link

Copilot AI Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The validation logic bool(graphic and (graphic if isinstance(graphic, str) else True)) doesn't check if string values are non-empty after stripping whitespace. While the model returns xlink:href attribute values that are unlikely to be empty or whitespace-only, consider using bool(graphic and (isinstance(graphic, str) and graphic.strip() or not isinstance(graphic, str))) for more robust validation, or simply bool(graphic) if the model guarantees non-empty strings.

Suggested change
# Check if graphic exists and is not empty
has_graphic = bool(graphic and (graphic if isinstance(graphic, str) else True))
# Check if graphic exists and is not empty (ignoring whitespace-only strings)
if isinstance(graphic, str):
has_graphic = bool(graphic.strip())
else:
has_graphic = bool(graphic)

Copilot uses AI. Check for mistakes.
self.assertEqual(kwd_validation["validation_type"], "exist")
self.assertEqual(kwd_validation["expected_value"], None)
self.assertIsNotNone(kwd_validation["got_value"])
self.assertTrue(len(kwd_validation["got_value"]) > 0)
Copy link

Copilot AI Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assertTrue(a > b) cannot provide an informative message. Using assertGreater(a, b) instead will give more informative messages.

Suggested change
self.assertTrue(len(kwd_validation["got_value"]) > 0)
self.assertGreater(len(kwd_validation["got_value"]), 0)

Copilot uses AI. Check for mistakes.

# Graphic comes as string (xlink:href) from model
# Check if graphic exists and is not empty
has_graphic = bool(graphic and (graphic if isinstance(graphic, str) else True))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Rossi-Luciano dá para ser mais simples só has_graphic = bool(graphic)

"sections": list(self.sections),
"list_items": list(self.list_items),
"kwds": list(self.kwds),
"graphic": self.graphic, # For visual abstracts
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Rossi-Luciano me parece que a intenção é retornar o nome ou url do arquivo, então melhore o nome, por exemplo: graphic_xlink_href, assim mais adiante não fica dúvida se é link ou se é node.

return None

@property
def graphic(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Rossi-Luciano me parece que a intenção é retornar o nome ou url do arquivo, então melhore o nome, por exemplo: graphic_xlink_href, assim mais adiante não fica dúvida se é link ou se é node.

if not kwd_validation:
self.skipTest("Keyword validation not found")

self.assertEqual(kwd_validation[0]["response"], "OK")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Rossi-Luciano os testes não estão testando a nova modificação que é a inserção das novas chaves

Copy link
Member

@robertatakenaka robertatakenaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Rossi-Luciano verificar os comentários

Rossi-Luciano and others added 4 commits January 27, 2026 14:41
- Adiciona propriedade graphic_href para visual abstracts
- Corrige uso incorreto de namespaces no find() do lxml
- Adiciona campos graphic_href, fig_id e caption ao dict data
- Melhora documentação explicando schema JATS/SPS
- Atualiza validador para usar graphic_href ao invés de graphic
- Corrige bug: strings whitespace-only agora são inválidas
- Substitui lógica confusa por validação explícita com strip()
- Remove asserções condicionais que passavam silenciosamente
- Adiciona validações obrigatórias para campo graphic_href
- Adiciona testes para casos inválidos (sem graphic, href vazio)
- Melhora mensagens de erro com contexto detalhado
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +219 to +221
DO NOT use find() with namespaces parameter as it's not officially
supported by lxml and will be ignored silently.
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

O docstring de graphic_href afirma que find() com parâmetro namespaces “não é suportado pelo lxml e será ignorado silenciosamente”, mas lxml.etree.Element.find() aceita namespaces (mapeamento prefixo→URI). Ajuste/remova essa observação para não induzir a uso incorreto da API.

Suggested change
DO NOT use find() with namespaces parameter as it's not officially
supported by lxml and will be ignored silently.

Copilot uses AI. Check for mistakes.
Comment on lines +479 to +499
# MANDATORY VALIDATION: Field 'graphic' MUST be present after model v2 fix
self.assertIn("graphic_href", abstract_data,
"Campo 'graphic' DEVE estar presente em abstract_data após correção do modelo v2. "
"Se este teste falhar, verifique se packtools/sps/models/v2/abstract.py "
"tem a propriedade @graphic implementada.")

# MANDATORY VALIDATION: Field 'graphic' MUST have value (extracted xlink:href)
self.assertIsNotNone(abstract_data["graphic_href"],
"Campo 'graphic' não deve ser None para visual abstract. "
"Deve conter o valor do atributo xlink:href extraído do XML.")

# MANDATORY VALIDATION: Verify type and content
self.assertIsInstance(abstract_data["graphic_href"], str,
"Campo 'graphic' deve ser string (valor de xlink:href)")

self.assertTrue(abstract_data["graphic_href"].strip(),
"Campo 'graphic' não pode ser string vazia")

# MANDATORY VALIDATION: Verify expected value
self.assertEqual(abstract_data["graphic_href"], "1234-5678-va-01.jpg",
"Valor de 'graphic' deve corresponder ao xlink:href do XML")
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mensagens/nomes nos asserts usam “campo 'graphic'” e mencionam “propriedade @Graphic”, mas o modelo v2 expõe graphic_href (e o teste também valida a chave graphic_href). Atualize o texto dessas mensagens para graphic_href para evitar confusão na manutenção/diagnóstico de falhas.

Suggested change
# MANDATORY VALIDATION: Field 'graphic' MUST be present after model v2 fix
self.assertIn("graphic_href", abstract_data,
"Campo 'graphic' DEVE estar presente em abstract_data após correção do modelo v2. "
"Se este teste falhar, verifique se packtools/sps/models/v2/abstract.py "
"tem a propriedade @graphic implementada.")
# MANDATORY VALIDATION: Field 'graphic' MUST have value (extracted xlink:href)
self.assertIsNotNone(abstract_data["graphic_href"],
"Campo 'graphic' não deve ser None para visual abstract. "
"Deve conter o valor do atributo xlink:href extraído do XML.")
# MANDATORY VALIDATION: Verify type and content
self.assertIsInstance(abstract_data["graphic_href"], str,
"Campo 'graphic' deve ser string (valor de xlink:href)")
self.assertTrue(abstract_data["graphic_href"].strip(),
"Campo 'graphic' não pode ser string vazia")
# MANDATORY VALIDATION: Verify expected value
self.assertEqual(abstract_data["graphic_href"], "1234-5678-va-01.jpg",
"Valor de 'graphic' deve corresponder ao xlink:href do XML")
# MANDATORY VALIDATION: Field 'graphic_href' MUST be present after model v2 fix
self.assertIn("graphic_href", abstract_data,
"Campo 'graphic_href' DEVE estar presente em abstract_data após correção do modelo v2. "
"Se este teste falhar, verifique se packtools/sps/models/v2/abstract.py "
"tem a propriedade graphic_href implementada.")
# MANDATORY VALIDATION: Field 'graphic_href' MUST have value (extracted xlink:href)
self.assertIsNotNone(abstract_data["graphic_href"],
"Campo 'graphic_href' não deve ser None para visual abstract. "
"Deve conter o valor do atributo xlink:href extraído do XML.")
# MANDATORY VALIDATION: Verify type and content
self.assertIsInstance(abstract_data["graphic_href"], str,
"Campo 'graphic_href' deve ser string (valor de xlink:href)")
self.assertTrue(abstract_data["graphic_href"].strip(),
"Campo 'graphic_href' não pode ser string vazia")
# MANDATORY VALIDATION: Verify expected value
self.assertEqual(abstract_data["graphic_href"], "1234-5678-va-01.jpg",
"Valor de 'graphic_href' deve corresponder ao xlink:href do XML")

Copilot uses AI. Check for mistakes.
p_validations = [v for v in obtained if v["sub_item"] == "p"]

# Should have validation for <p> count
self.assertTrue(len(p_validations) > 0)
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assertTrue(a > b) cannot provide an informative message. Using assertGreater(a, b) instead will give more informative messages.

Suggested change
self.assertTrue(len(p_validations) > 0)
self.assertGreater(len(p_validations), 0)

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants