Implementa validação completa de abstracts conforme SPS 1.10 #1064

Rossi-Luciano · 2026-01-17T14:49:52Z

O que esse PR faz?

Este PR implementa a validação completa de abstracts conforme as regras da especificação SPS 1.10, corrigindo bugs críticos e adicionando funcionalidades ausentes:

Problemas Solucionados:

Corrige lógica invertida em validate_p_multiple() que validava incorretamente highlights com apenas 1 parágrafo
Adiciona validação ausente para keywords proibidas em abstracts especiais (graphical, key-points, summary)
Corrige modelo de dados que não retornava campo graphic para Visual Abstract
Implementa filtro de resultados None no fluxo de validação

Features Adicionadas:

Suporte completo a Visual Abstract (graphic, fig_id, caption)
Validação de keywords como elementos irmãos de <abstract> (não dentro)
Internacionalização (i18n) com advice_text e advice_params
15 novos testes cobrindo todos os cenários SPS 1.10
Validação robusta de estruturas de dados (dict vs string)

Onde a revisão poderia começar?

Modelo de dados: packtools/sps/models/v2/abstract.py
- Linhas ~161-210: Novas propriedades graphic, fig_id, caption
- Linhas ~220-235: Inclusão dos novos campos no dict data
Validação crítica: packtools/sps/validation/article_abstract.py
- Linhas ~265-301: Novo método validate_kwd_not_allowed()
- Linhas ~395-425: Correção em validate_p_multiple() (linha ~418: > 1 ao invés de <= 1)
- Linhas ~139-172: Filtro de None no método validate()
Testes: tests/sps/validation/test_article_abstract_additional.py
- Linhas ~28-120: Testes para Summary abstract
- Linhas ~122-250: Testes para resumo simples/estruturado
- Linhas ~252-360: Testes para cenários positivos

Como este poderia ser testado manualmente?

Teste 1: Validação de Visual Abstract com graphic

python3 << 'EOF'
from lxml import etree as ET
from packtools.sps.models.v2.abstract import XMLAbstracts
from packtools.sps.validation.article_abstract import AbstractValidation

xml = """<article xmlns:xlink="http://www.w3.org/1999/xlink" xml:lang="en">
<front><article-meta>
<abstract abstract-type="graphical" xml:lang="en">
    <title>Visual Abstract</title>
    <p><fig id="vs1"><graphic xlink:href="image.jpg"/></fig></p>
</abstract>
</article-meta></front></article>"""

xmltree = ET.fromstring(xml)
abstracts = list(XMLAbstracts(xmltree).visual_abstracts)
validator = AbstractValidation(abstracts[0])
results = list(validator.validate())

# Deve retornar graphic="image.jpg" e validação OK
print("Graphic:", abstracts[0].get("graphic"))
print("Validação graphic:", [r for r in results if r["title"] == "graphic"][0])
EOF

Resultado esperado: graphic: "image.jpg" e response: "OK"

Teste 2: Validação de keywords proibidas em Highlights

python3 << 'EOF'
from lxml import etree as ET
from packtools.sps.models.v2.abstract import XMLAbstracts
from packtools.sps.validation.article_abstract import AbstractValidation

xml = """<article xml:lang="en">
<front><article-meta>
<abstract abstract-type="key-points" xml:lang="en">
    <title>HIGHLIGHTS</title>
    <p>Highlight 1</p>
    <p>Highlight 2</p>
</abstract>
<kwd-group xml:lang="en">
    <title>Keywords:</title>
    <kwd>keyword1</kwd>
</kwd-group>
</article-meta></front></article>"""

xmltree = ET.fromstring(xml)
abstracts = list(XMLAbstracts(xmltree).key_points_abstracts)
validator = AbstractValidation(abstracts[0])
results = list(validator.validate())

# Deve detectar erro: keywords não permitidas
print("Keywords encontradas:", abstracts[0].get("kwds"))
print("Validação:", [r for r in results if r["title"] == "unexpected kwd"][0])
EOF

Resultado esperado: response: "ERROR" com advice para remover keywords

Teste 3: Suite completa de testes

cd /home/luciano/projects/packtools
python -m unittest tests.sps.validation.test_article_abstract

Resultado esperado: OK (skipped=3) - 12 testes passando, 3 skipped

Algum cenário de contexto que queira dar?

Contexto da Mudança

O packtools possui dois modelos de dados para abstracts:

Modelo antigo (packtools/sps/models/article_abstract.py):
- Buscava keywords dentro de <abstract> (incorreto segundo SPS 1.10)
- Tinha campo graphic para Visual Abstract
Modelo novo (packtools/sps/models/v2/abstract.py):
- Busca keywords como irmãos de <abstract> (correto segundo SPS 1.10)
- Faltava campo graphic (migração incompleta)

Especificação SPS 1.10

Segundo a documentação SPS 1.10 para <abstract>:

Resumos simples/estruturados (sem @abstract-type): EXIGEM <kwd-group> irmão
Tipos especiais (graphical, key-points, summary): NÃO PERMITEM <kwd-group> associado
Highlights (key-points): Devem ter múltiplos <p>, um para cada destaque
Visual Abstract (graphical): Deve ter elemento <graphic> obrigatório

Bugs Críticos Encontrados

validate_p_multiple() tinha lógica invertida:

   # ANTES (ERRADO)
   is_valid = len(p_list) <= 1  # Validava OK com 1 parágrafo
   
   # DEPOIS (CORRETO)
   is_valid = len(p_list) > 1   # Valida OK com múltiplos parágrafos

Validação ausente para keywords proibidas em tipos especiais
Modelo incompleto sem campo graphic para Visual Abstract

Screenshots

N.A. - Funcionalidade backend de validação XML

Quais são tickets relevantes?

N.A.

Referências

N.A.

Adiciona suporte completo para Visual Abstract (graphical) conforme SPS 1.10: - Propriedade graphic: extrai xlink:href do elemento <graphic> - Propriedade fig_id: extrai ID da figura - Propriedade caption: extrai legenda da figura - Inclui campos no dict retornado por data()

Correções críticas: - Corrige lógica invertida em validate_p_multiple() para highlights - Implementa validate_kwd_not_allowed() para tipos especiais - Filtra resultados None no método validate() Melhorias: - Adiciona suporte a internacionalização (i18n) - Valida title como dict (plain_text/html_text) - Validação robusta de graphic em visual abstracts Conforme especificação SPS 1.10: keywords proibidas em graphical, key-points e summary; obrigatórias em simples/estruturado.

Cobertura de testes: - Summary (In Brief): validação de keywords e title - Resumo simples/estruturado: keywords obrigatórias - Highlights: múltiplos <p> e ausência de keywords - Visual Abstract: presença de graphic e ausência de keywords - Trans-abstract: associação de keywords por idioma - Abstract-type: valores válidos/inválidos Testes robustos com skipTest() para features não disponíveis.

Copilot

Pull request overview

This pull request implements complete validation of abstracts according to SPS 1.10 specification, fixing critical bugs and adding missing functionality to the packtools abstract validation system.

Changes:

Fixed inverted logic in validate_p_multiple() that incorrectly validated highlights with only 1 paragraph (line 404: changed from <= 1 to > 1)
Added validate_kwd_not_allowed() method to prohibit keywords in special abstract types (graphical, key-points, summary)
Implemented missing graphic, fig_id, and caption properties in the v2 Abstract model for Visual Abstract support
Added internationalization support with advice_text and advice_params fields throughout validation methods
Implemented None result filtering in the validation flow (lines 133-164)
Added comprehensive test suite with 15 new test cases covering all SPS 1.10 abstract scenarios

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File	Description
`packtools/sps/models/v2/abstract.py`	Added three new properties (`graphic`, `fig_id`, `caption`) to extract visual abstract elements and included them in the data dictionary output
`packtools/sps/validation/article_abstract.py`	Fixed critical bug in `validate_p_multiple()`, added `validate_kwd_not_allowed()` method, implemented i18n support, and restructured validation flow with None filtering
`tests/sps/validation/test_article_abstract.py`	Complete rewrite with 15 new test cases organized into 7 test classes covering summary, simple, structured, highlights, visual abstract, abstract-type, and trans-abstract validation scenarios

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-01-19T11:41:32Z

tests/sps/validation/test_article_abstract.py

+        # Check if graphic field exists in model data
+        # If model returns graphic, it should have a value
+        if "graphic" in abstract_data:
+            self.assertIsNotNone(abstract_data["graphic"],
+                               "Graphic field exists but is None - graphic should be extracted from XML")
+
+        validator = AbstractValidation(abstract_data)
+        obtained = list(validator.validate())
+
+        graphic_validation = [v for v in obtained if v["title"] == "graphic"]
+
+        if graphic_validation:
+            # If graphic validation exists, it should pass for valid graphic
+            self.assertEqual(graphic_validation[0]["response"], "OK")


The test has conditional assertions (lines 398-400 and 407-409) that silently pass if the conditions are not met. This is a weak test pattern that doesn't guarantee the expected behavior is validated. Consider using self.assertIn("graphic", abstract_data) to ensure the field exists, and always check the graphic_validation list is not empty before accessing it, or use self.assertTrue(graphic_validation) to fail if validation is missing. This ensures the test actually validates what it claims to validate.

Copilot · 2026-01-19T11:41:33Z

packtools/sps/models/v2/abstract.py

+        graphic_node = self.node.find('.//graphic', namespaces={'xlink': 'http://www.w3.org/1999/xlink'})
+        if graphic_node is not None:
+            return graphic_node.get('{http://www.w3.org/1999/xlink}href')
+        return None


The find() method doesn't officially support the namespaces parameter according to lxml documentation. The recommended approach is to use xpath() with namespaces or include the namespace directly in the tag name like find('.//{http://www.w3.org/1999/xlink}graphic'). While this pattern appears to work (it's used in the old model), consider using the more standard approach: self.node.xpath('.//graphic/@xlink:href', namespaces={'xlink': 'http://www.w3.org/1999/xlink'}) which would return the href directly, or find('.//{http://www.w3.org/1999/xlink}graphic') for better code clarity and future compatibility.

Suggested change

graphic_node = self.node.find('.//graphic', namespaces={'xlink': 'http://www.w3.org/1999/xlink'})

if graphic_node is not None:

return graphic_node.get('{http://www.w3.org/1999/xlink}href')

return None

hrefs = self.node.xpath(

'.//graphic/@xlink:href',

namespaces={'xlink': 'http://www.w3.org/1999/xlink'},

)

return hrefs[0] if hrefs else None

Copilot · 2026-01-19T11:41:33Z

packtools/sps/validation/article_abstract.py

+        # Check if graphic exists and is not empty
+        has_graphic = bool(graphic and (graphic if isinstance(graphic, str) else True))


The validation logic bool(graphic and (graphic if isinstance(graphic, str) else True)) doesn't check if string values are non-empty after stripping whitespace. While the model returns xlink:href attribute values that are unlikely to be empty or whitespace-only, consider using bool(graphic and (isinstance(graphic, str) and graphic.strip() or not isinstance(graphic, str))) for more robust validation, or simply bool(graphic) if the model guarantees non-empty strings.

Suggested change

# Check if graphic exists and is not empty

has_graphic = bool(graphic and (graphic if isinstance(graphic, str) else True))

# Check if graphic exists and is not empty (ignoring whitespace-only strings)

if isinstance(graphic, str):

has_graphic = bool(graphic.strip())

else:

has_graphic = bool(graphic)

Copilot · 2026-01-19T11:41:33Z

tests/sps/validation/test_article_abstract.py

+        self.assertEqual(kwd_validation["validation_type"], "exist")
+        self.assertEqual(kwd_validation["expected_value"], None)
+        self.assertIsNotNone(kwd_validation["got_value"])
+        self.assertTrue(len(kwd_validation["got_value"]) > 0)


assertTrue(a > b) cannot provide an informative message. Using assertGreater(a, b) instead will give more informative messages.

Suggested change

self.assertTrue(len(kwd_validation["got_value"]) > 0)

self.assertGreater(len(kwd_validation["got_value"]), 0)

robertatakenaka · 2026-01-26T12:01:01Z

packtools/sps/validation/article_abstract.py

+
+        # Graphic comes as string (xlink:href) from model
+        # Check if graphic exists and is not empty
+        has_graphic = bool(graphic and (graphic if isinstance(graphic, str) else True))


@Rossi-Luciano dá para ser mais simples só has_graphic = bool(graphic)

robertatakenaka · 2026-01-26T13:12:10Z

packtools/sps/models/v2/abstract.py

            "sections": list(self.sections),
            "list_items": list(self.list_items),
            "kwds": list(self.kwds),
+            "graphic": self.graphic,  # For visual abstracts


@Rossi-Luciano me parece que a intenção é retornar o nome ou url do arquivo, então melhore o nome, por exemplo: graphic_xlink_href, assim mais adiante não fica dúvida se é link ou se é node.

robertatakenaka · 2026-01-26T13:12:42Z

packtools/sps/models/v2/abstract.py

+        return None
+
+    @property
+    def graphic(self):


@Rossi-Luciano me parece que a intenção é retornar o nome ou url do arquivo, então melhore o nome, por exemplo: graphic_xlink_href, assim mais adiante não fica dúvida se é link ou se é node.

robertatakenaka · 2026-01-26T13:22:20Z

tests/sps/validation/test_article_abstract.py

+        if not kwd_validation:
+            self.skipTest("Keyword validation not found")
+
+        self.assertEqual(kwd_validation[0]["response"], "OK")


@Rossi-Luciano os testes não estão testando a nova modificação que é a inserção das novas chaves

robertatakenaka

@Rossi-Luciano verificar os comentários

- Adiciona propriedade graphic_href para visual abstracts - Corrige uso incorreto de namespaces no find() do lxml - Adiciona campos graphic_href, fig_id e caption ao dict data - Melhora documentação explicando schema JATS/SPS

- Atualiza validador para usar graphic_href ao invés de graphic - Corrige bug: strings whitespace-only agora são inválidas - Substitui lógica confusa por validação explícita com strip()

- Remove asserções condicionais que passavam silenciosamente - Adiciona validações obrigatórias para campo graphic_href - Adiciona testes para casos inválidos (sem graphic, href vazio) - Melhora mensagens de erro com contexto detalhado

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-01-27T21:53:02Z

packtools/sps/models/v2/abstract.py

+
+            DO NOT use find() with namespaces parameter as it's not officially
+            supported by lxml and will be ignored silently.


O docstring de graphic_href afirma que find() com parâmetro namespaces “não é suportado pelo lxml e será ignorado silenciosamente”, mas lxml.etree.Element.find() aceita namespaces (mapeamento prefixo→URI). Ajuste/remova essa observação para não induzir a uso incorreto da API.

Suggested change

DO NOT use find() with namespaces parameter as it's not officially

supported by lxml and will be ignored silently.

Copilot · 2026-01-27T21:53:02Z

tests/sps/validation/test_article_abstract.py

+        # MANDATORY VALIDATION: Field 'graphic' MUST be present after model v2 fix
+        self.assertIn("graphic_href", abstract_data,
+                      "Campo 'graphic' DEVE estar presente em abstract_data após correção do modelo v2. "
+                      "Se este teste falhar, verifique se packtools/sps/models/v2/abstract.py "
+                      "tem a propriedade @graphic implementada.")
+
+        # MANDATORY VALIDATION: Field 'graphic' MUST have value (extracted xlink:href)
+        self.assertIsNotNone(abstract_data["graphic_href"],
+                             "Campo 'graphic' não deve ser None para visual abstract. "
+                             "Deve conter o valor do atributo xlink:href extraído do XML.")
+
+        # MANDATORY VALIDATION: Verify type and content
+        self.assertIsInstance(abstract_data["graphic_href"], str,
+                              "Campo 'graphic' deve ser string (valor de xlink:href)")
+
+        self.assertTrue(abstract_data["graphic_href"].strip(),
+                        "Campo 'graphic' não pode ser string vazia")
+
+        # MANDATORY VALIDATION: Verify expected value
+        self.assertEqual(abstract_data["graphic_href"], "1234-5678-va-01.jpg",
+                         "Valor de 'graphic' deve corresponder ao xlink:href do XML")


As mensagens/nomes nos asserts usam “campo 'graphic'” e mencionam “propriedade @Graphic”, mas o modelo v2 expõe graphic_href (e o teste também valida a chave graphic_href). Atualize o texto dessas mensagens para graphic_href para evitar confusão na manutenção/diagnóstico de falhas.

Suggested change

# MANDATORY VALIDATION: Field 'graphic' MUST be present after model v2 fix

self.assertIn("graphic_href", abstract_data,

"Campo 'graphic' DEVE estar presente em abstract_data após correção do modelo v2. "

"Se este teste falhar, verifique se packtools/sps/models/v2/abstract.py "

"tem a propriedade @graphic implementada.")

# MANDATORY VALIDATION: Field 'graphic' MUST have value (extracted xlink:href)

self.assertIsNotNone(abstract_data["graphic_href"],

"Campo 'graphic' não deve ser None para visual abstract. "

"Deve conter o valor do atributo xlink:href extraído do XML.")

# MANDATORY VALIDATION: Verify type and content

self.assertIsInstance(abstract_data["graphic_href"], str,

"Campo 'graphic' deve ser string (valor de xlink:href)")

self.assertTrue(abstract_data["graphic_href"].strip(),

"Campo 'graphic' não pode ser string vazia")

# MANDATORY VALIDATION: Verify expected value

self.assertEqual(abstract_data["graphic_href"], "1234-5678-va-01.jpg",

"Valor de 'graphic' deve corresponder ao xlink:href do XML")

# MANDATORY VALIDATION: Field 'graphic_href' MUST be present after model v2 fix

self.assertIn("graphic_href", abstract_data,

"Campo 'graphic_href' DEVE estar presente em abstract_data após correção do modelo v2. "

"Se este teste falhar, verifique se packtools/sps/models/v2/abstract.py "

"tem a propriedade graphic_href implementada.")

# MANDATORY VALIDATION: Field 'graphic_href' MUST have value (extracted xlink:href)

self.assertIsNotNone(abstract_data["graphic_href"],

"Campo 'graphic_href' não deve ser None para visual abstract. "

"Deve conter o valor do atributo xlink:href extraído do XML.")

# MANDATORY VALIDATION: Verify type and content

self.assertIsInstance(abstract_data["graphic_href"], str,

"Campo 'graphic_href' deve ser string (valor de xlink:href)")

self.assertTrue(abstract_data["graphic_href"].strip(),

"Campo 'graphic_href' não pode ser string vazia")

# MANDATORY VALIDATION: Verify expected value

self.assertEqual(abstract_data["graphic_href"], "1234-5678-va-01.jpg",

"Valor de 'graphic_href' deve corresponder ao xlink:href do XML")

Copilot · 2026-01-27T21:53:03Z

tests/sps/validation/test_article_abstract.py

+        p_validations = [v for v in obtained if v["sub_item"] == "p"]
+
+        # Should have validation for <p> count
+        self.assertTrue(len(p_validations) > 0)


assertTrue(a > b) cannot provide an informative message. Using assertGreater(a, b) instead will give more informative messages.

Suggested change

self.assertTrue(len(p_validations) > 0)

self.assertGreater(len(p_validations), 0)

Rossi-Luciano added 3 commits January 17, 2026 11:42

Rossi-Luciano requested a review from robertatakenaka January 17, 2026 14:49

robertatakenaka requested a review from Copilot January 19, 2026 11:36

Copilot started reviewing on behalf of robertatakenaka January 19, 2026 11:36 View session

Copilot AI reviewed Jan 19, 2026

View reviewed changes

robertatakenaka reviewed Jan 26, 2026

View reviewed changes

robertatakenaka requested changes Jan 26, 2026

View reviewed changes

Rossi-Luciano and others added 4 commits January 27, 2026 14:41

fix(validation): corrige validação de graphic_href em abstracts

383040d

- Atualiza validador para usar graphic_href ao invés de graphic - Corrige bug: strings whitespace-only agora são inválidas - Substitui lógica confusa por validação explícita com strip()

Merge branch 'master' into feature/sps-1.10-abstract-validation

edb77ae

Rossi-Luciano requested a review from robertatakenaka January 27, 2026 17:46

robertatakenaka requested a review from Copilot January 27, 2026 21:47

Copilot started reviewing on behalf of robertatakenaka January 27, 2026 21:48 View session

Copilot AI reviewed Jan 27, 2026

View reviewed changes

		# Check if graphic exists and is not empty
		has_graphic = bool(graphic and (graphic if isinstance(graphic, str) else True))

-        # Check if graphic exists and is not empty
-        has_graphic = bool(graphic and (graphic if isinstance(graphic, str) else True))
+        # Check if graphic exists and is not empty (ignoring whitespace-only strings)
+        if isinstance(graphic, str):
+            has_graphic = bool(graphic.strip())
+        else:
+            has_graphic = bool(graphic)

	self.assertTrue(len(kwd_validation["got_value"]) > 0)
	self.assertGreater(len(kwd_validation["got_value"]), 0)


		DO NOT use find() with namespaces parameter as it's not officially
		supported by lxml and will be ignored silently.

	self.assertTrue(len(p_validations) > 0)
	self.assertGreater(len(p_validations), 0)

Implementa validação completa de abstracts conforme SPS 1.10 #1064

Are you sure you want to change the base?

Implementa validação completa de abstracts conforme SPS 1.10 #1064

Uh oh!

Conversation

Rossi-Luciano commented Jan 17, 2026

O que esse PR faz?

Onde a revisão poderia começar?

Como este poderia ser testado manualmente?

Teste 1: Validação de Visual Abstract com graphic

Teste 2: Validação de keywords proibidas em Highlights

Teste 3: Suite completa de testes

Algum cenário de contexto que queira dar?

Contexto da Mudança

Especificação SPS 1.10

Bugs Críticos Encontrados

Screenshots

Quais são tickets relevantes?

Referências

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Jan 19, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 19, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 19, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 19, 2026

Choose a reason for hiding this comment

Uh oh!

robertatakenaka Jan 26, 2026

Choose a reason for hiding this comment

Uh oh!

robertatakenaka Jan 26, 2026

Choose a reason for hiding this comment

Uh oh!

robertatakenaka Jan 26, 2026

Choose a reason for hiding this comment

Uh oh!

robertatakenaka Jan 26, 2026

Choose a reason for hiding this comment

Uh oh!

robertatakenaka left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants