-
Notifications
You must be signed in to change notification settings - Fork 24
feat(validation): implementa validações completas de <long-desc> conforme SPS 1.10 #1063
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
feat(validation): implementa validações completas de <long-desc> conforme SPS 1.10 #1063
Conversation
- Adiciona validação de restrição de mídia (video/mp4 ou audio/mp3) - Adiciona detecção de duplicação de label/caption - Adiciona validação de ocorrência única (0 ou 1 por elemento) - Adiciona verificação de incompatibilidade com alt-text='null' - Implementa suporte completo a i18n (msg_text, adv_text, adv_params) - Atinge 100% de conformidade SPS 1.10 para <long-desc>
- 3 testes para restrição de mídia - 3 testes para detecção de duplicação - 2 testes para validação de ocorrência - 3 testes para incompatibilidade com alt-text null - Verifica internacionalização em todos os testes - Verifica error levels configurados
- long_desc_media_restriction_error_level: ERROR - long_desc_duplication_error_level: WARNING - long_desc_occurrence_error_level: ERROR - long_desc_null_incompatibility_error_level: WARNING
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR implements complete validation rules for the <long-desc> XML element to achieve 100% compliance with SPS 1.10 accessibility specifications. The implementation adds four new validation methods following established patterns from existing <alt-text> validations.
Changes:
- Added 4 new validation methods for
<long-desc>with full internationalization support - Updated configuration to include error level parameters for new validations
- Added comprehensive test coverage (13 new test cases) for all validation scenarios
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| packtools/sps/validation/accessibility_data.py | Implements 4 new validation methods: media restriction, duplication detection, occurrence validation, and null alt-text incompatibility |
| packtools/sps/validation_rules/accessibility_data_rules.json | Adds 4 new error level configuration parameters for the new validations |
| tests/sps/validation/test_accessibility_data.py | Adds 13 comprehensive test cases covering valid and invalid scenarios for all new validations |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Só retorna resultado quando há erro (validação inválida) | ||
| if not valid: |
Copilot
AI
Jan 16, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The validation method uses early return when valid=True but yields when valid=False. This is inconsistent with the similar method validate_alt_text_media_restriction() (lines 120-189) which always yields a response regardless of validity. For consistency and complete validation reporting, consider yielding a response in both cases or document why this validation only reports errors.
| # Só retorna resultado quando há erro (validação inválida) | |
| if not valid: | |
| # Sempre retorna um resultado, indicando se a validação foi bem-sucedida ou não. | |
| if valid: | |
| yield build_response( | |
| title="<long-desc> restriction for media", | |
| parent=self.data, | |
| item="long-desc", | |
| sub_item="media restriction", | |
| is_valid=True, | |
| validation_type="value in list", | |
| expected=f"mime-type/mime-subtype combinations: {valid_combinations}", | |
| obtained=f"{mimetype}/{mime_subtype}", | |
| advice=None, | |
| error_level=self.params.get("long_desc_media_restriction_error_level", "ERROR"), | |
| data=self.data, | |
| ) | |
| else: |
|
|
||
| # Conta quantas vezes <long-desc> aparece no XML | ||
| # Usa uma abordagem simples de contar tags de abertura | ||
| long_desc_count = xml_str.count("<long-desc") |
Copilot
AI
Jan 16, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using string counting to detect multiple <long-desc> elements is fragile and could produce false positives (e.g., if <long-desc> appears in comments or text content). Since self.data comes from AccessibilityData which has access to self.node, consider using XPath len(self.node.xpath('.//long-desc')) for more reliable element counting. Alternatively, if the XML string approach is intentional, this needs proper context validation to ensure accuracy.
| advice_params = { | ||
| "alt_text": alt_text, | ||
| "long_desc": long_desc[:50] + "..." if len(long_desc) > 50 else long_desc | ||
| } |
Copilot
AI
Jan 16, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The advice_params dictionary includes alt_text and long_desc parameters, but the corresponding advice_text template (lines 662-667) does not use these parameters with placeholders. The internationalized message should either use these parameters (e.g., add placeholders like {alt_text} and {long_desc}) or remove unused parameters from advice_params to maintain consistency with other validation methods.
| results = list(validator.validate()) | ||
|
|
||
| duplication = [r for r in results if "duplication" in str(r.get("sub_item", "")) and r["item"] == "long-desc"] | ||
| self.assertEqual(len(duplication), 1) |
Copilot
AI
Jan 16, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test assertion checks for partial string match in the advice message. Consider also verifying the specific fields like sub_item value equals 'label duplication' to make the test more precise and less prone to false positives if similar text appears in other validation messages.
| self.assertEqual(len(duplication), 1) | |
| self.assertEqual(len(duplication), 1) | |
| self.assertEqual(duplication[0]["sub_item"], "label duplication") |
| NOVA VALIDAÇÃO: Valida incompatibilidade entre <long-desc> e <alt-text>null</alt-text>. | ||
| Regra SPS 1.10: | ||
| "Quando houver o uso combinado <alt-text> não pode ter conteúdo ser null, |
Copilot
AI
Jan 16, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Grammar issue in Portuguese comment: 'não pode ter conteúdo ser null' should be 'não pode ter o conteúdo null' or 'não pode ser null'.
| "Quando houver o uso combinado <alt-text> não pode ter conteúdo ser null, | |
| "Quando houver o uso combinado, <alt-text> não pode ter o conteúdo null; |
| """ | ||
| xml_node = etree.fromstring(xml_content) | ||
| validator = XMLAccessibilityDataValidation(xml_node, self.params) | ||
| results = list(validator.validate()) |
Copilot
AI
Jan 16, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assignment to 'results' is unnecessary as it is redefined before this value is used.
| results = list(validator.validate()) |
Resumo das Sugestões do CopilotAceitas e Implementadas
Rejeitadas
Considera, também, sugestões do PR #1062 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Verifica se temos acesso ao nó XML original para contar ocorrências | ||
| xml_str = self.data.get("xml", "") | ||
|
|
||
| # Conta quantas vezes <long-desc> aparece no XML | ||
| # Usa uma abordagem simples de contar tags de abertura | ||
| long_desc_count = xml_str.count("<long-desc") | ||
|
|
Copilot
AI
Jan 19, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The occurrence validation uses the 'xml' field which is truncated at 200 characters (see _get_xml_string() in models/accessibility_data.py line 52). When a graphic/media element contains multiple long-desc elements with lengthy content, the XML string may be cut off before the second occurrence, resulting in an incorrect count. Consider using len(self.node.findall('long-desc')) instead of counting string occurrences, or use the full XML without truncation for this specific validation.
| # Verifica se temos acesso ao nó XML original para contar ocorrências | |
| xml_str = self.data.get("xml", "") | |
| # Conta quantas vezes <long-desc> aparece no XML | |
| # Usa uma abordagem simples de contar tags de abertura | |
| long_desc_count = xml_str.count("<long-desc") | |
| # Prefer counting directly on the XML node (not on a truncated XML string) | |
| long_desc_count = None | |
| node = self.data.get("node") | |
| if node is not None: | |
| try: | |
| # Count how many <long-desc> elements are children of this node | |
| long_desc_count = len(node.findall("long-desc")) | |
| except AttributeError: | |
| # Fallback to string-based count if node does not support .findall | |
| long_desc_count = None | |
| if long_desc_count is None: | |
| # Fallback: use the (possibly truncated) XML string representation | |
| xml_str = self.data.get("xml", "") | |
| # Count how many times <long-desc> appears in the XML string | |
| # Uses a simple approach of counting opening tags | |
| long_desc_count = xml_str.count("<long-desc") |
| # Normaliza strings para comparação (case-insensitive e strip) | ||
| long_desc_normalized = long_desc.strip().lower() if long_desc else "" | ||
| label_normalized = parent_label.strip().lower() if parent_label else "" | ||
| caption_normalized = parent_caption_title.strip().lower() if parent_caption_title else "" |
Copilot
AI
Jan 19, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The normalization approach for duplication detection is inconsistent with the alt-text validation. The alt-text validation (line 209) uses " ".join(text.lower().split()) to normalize all whitespace, but this long-desc validation only uses strip().lower() which preserves internal whitespace differences. This could cause false negatives where content with varying internal spaces is not detected as duplicate. Consider using the same normalization approach: long_desc_normalized = " ".join(long_desc.split()).lower() if long_desc else ""
| # Normaliza strings para comparação (case-insensitive e strip) | |
| long_desc_normalized = long_desc.strip().lower() if long_desc else "" | |
| label_normalized = parent_label.strip().lower() if parent_label else "" | |
| caption_normalized = parent_caption_title.strip().lower() if parent_caption_title else "" | |
| # Normaliza strings para comparação (case-insensitive e normalização de espaços em branco) | |
| long_desc_normalized = " ".join(long_desc.lower().split()) if long_desc else "" | |
| label_normalized = " ".join(parent_label.lower().split()) if parent_label else "" | |
| caption_normalized = " ".join(parent_caption_title.lower().split()) if parent_caption_title else "" |
- Adiciona long_desc_count usando node.findall() no modelo - Remove dependência de XML truncado na validação de ocorrência - Normaliza whitespace consistentemente na detecção de duplicação - Adiciona testes para casos extremos e validação de contagem Resolve: falsos negativos em XML > 200 chars e duplicação com espaços variados.
| try: | ||
| return self.node.xpath('xref[@ref-type="sec"]')[0].get("rid") | ||
| except: | ||
| except Exception: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Rossi-Luciano trocar except por except Exception dá igual. Qual é a motivação para ignorar a exceção? Faz sentido ignorar todas as exceções? Faz sentido ignorar algumas exceções? É esta avaliação que tem que ser feita. E coloque comentário a motivação.
| "long_desc_media_restriction_error_level": "ERROR", | ||
| "long_desc_duplication_error_level": "WARNING", | ||
| "long_desc_occurrence_error_level": "ERROR", | ||
| "long_desc_null_incompatibility_error_level": "WARNING", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Rossi-Luciano quem estabeleceu o grau de gravidade do problema? Isso está no sps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def test_long_desc_multiple_occurrences_failure(self): | ||
| """ATUALIZADO: Múltiplas ocorrências de long-desc devem gerar ERROR | ||
| Nota: Com o XPath simplificado, elementos <invalid> não são mais capturados, | ||
| o que está CORRETO. Este teste agora verifica que elementos válidos sem | ||
| dados de acessibilidade ainda são validados corretamente. | ||
| Este teste agora valida que a contagem é feita corretamente usando | ||
| node.findall() em vez de contar strings no XML truncado. | ||
| """ | ||
| xml_content = """ | ||
| <body> | ||
| <media> | ||
| <alt-text>Valid alt text</alt-text> | ||
| <long-desc>""" + "d" * 130 + """</long-desc> | ||
| </media> | ||
| <graphic> | ||
| <long-desc>First detailed description with more than 121 characters to pass the minimum length validation requirement for long-desc</long-desc> | ||
| <long-desc>Second detailed description with more than 121 characters to pass the minimum length validation requirement for long-desc</long-desc> | ||
| </graphic> | ||
| </body> | ||
| """ | ||
| xml_node = etree.fromstring(xml_content) | ||
| validator = XMLAccessibilityDataValidation(xml_node, self.params) | ||
| results = list(validator.validate()) | ||
|
|
||
| # Verificar que a estrutura é válida (media é um elemento válido) | ||
| structure_res = [res for res in results if res["title"] == "structure"] | ||
| self.assertEqual(len(structure_res), 1) | ||
| self.assertEqual(structure_res[0]["response"], "OK") | ||
|
|
||
| # O elemento media é válido | ||
| self.assertEqual(structure_res[0]["got_value"], "media") | ||
| occurrence = [r for r in results if "occurrence" in str(r.get("sub_item", "")) and r["item"] == "long-desc"] | ||
| self.assertEqual(len(occurrence), 1) | ||
| self.assertEqual(occurrence[0]["response"], "ERROR") | ||
| self.assertIn("2 <long-desc> elements", occurrence[0]["advice"]) | ||
|
|
||
| # Verificar internacionalização | ||
| self.assertIn("adv_text", occurrence[0]) | ||
| self.assertIn("adv_params", occurrence[0]) | ||
| self.assertIn("count", occurrence[0]["adv_params"]) | ||
| self.assertEqual(occurrence[0]["adv_params"]["count"], 2) | ||
|
|
Copilot
AI
Jan 27, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test suite no longer contains coverage for the existing <alt-text> validations (media restriction / duplication / decorative), and there are also no tests for <long-desc> media restriction or label-duplication specifically. Since these behaviors are important and were previously exercised here, consider restoring/adding focused tests for these validation paths to avoid silent regressions.
| <long-desc>Figura mostra crescimento da população</long-desc> | ||
| </graphic> | ||
| <caption> | ||
| <title>Figura mostra crescimento da população</title> |
Copilot
AI
Jan 27, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test_long_desc_duplication_with_multiple_spaces uses a <long-desc> shorter than the 120-char minimum, so the validator will also emit the CRITICAL "" length error alongside the duplication WARNING. To keep this test focused and less brittle, make the <long-desc> content long enough to pass the length validation (e.g., by appending filler text) so only the duplication-related result is relevant.
| <long-desc>Figura mostra crescimento da população</long-desc> | |
| </graphic> | |
| <caption> | |
| <title>Figura mostra crescimento da população</title> | |
| <long-desc>Figura mostra crescimento da população ao longo de várias décadas, considerando dados demográficos detalhados por região e faixa etária.</long-desc> | |
| </graphic> | |
| <caption> | |
| <title>Figura mostra crescimento da população ao longo de várias décadas, considerando dados demográficos detalhados por região e faixa etária.</title> |
| advice_text = _( | ||
| "The <long-desc> content {content} duplicates <label>{element}. " | ||
| "Provide unique descriptive content that adds value beyond the label. " | ||
| "Refer to SPS 1.10 documentation for best practices." | ||
| ) |
Copilot
AI
Jan 27, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The i18n template in advice_text looks malformed/inconsistent with the non-i18n advice: it renders duplicates <label>{element}. (missing closing </label>/punctuation/quoting) while advice uses duplicates <label>'{parent_label}'.. Please align advice_text with the actual message format (or simplify to match the <alt-text> duplication template pattern).
| f"Refer to SPS 1.10 documentation for best practices." | ||
| ) | ||
| advice_text = _( | ||
| "The <long-desc> content {content} duplicates <caption><title>{element}. " |
Copilot
AI
Jan 27, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same issue for the caption duplication message: advice_text renders duplicates <caption><title>{element}. which is malformed/inconsistent with advice (missing closing tags/quoting). Please adjust the i18n template so the rendered message is well-formed and matches the non-i18n advice content.
| "The <long-desc> content {content} duplicates <caption><title>{element}. " | |
| "The <long-desc> content '{content}' duplicates <caption><title>'{element}'. " |
| yield from self.validate_long_desc_media_restriction() # NOVA | ||
| yield from self.validate_long_desc_not_duplicate_label_caption() # NOVA | ||
| yield from self.validate_long_desc_occurrence() # NOVA | ||
| yield from self.validate_long_desc_incompatible_with_null_alt() # NOVA |
Copilot
AI
Jan 27, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
validate() now includes validate_long_desc_media_restriction(), but there are no unit tests covering this new validation (no references found under tests/sps/validation/). Please add tests for at least one invalid case (e.g., mimetype='application' mime-subtype='pdf' with <long-desc>) and one valid case (video/mp4 or audio/mp3) to prevent regressions.
O que esse PR faz?
Implementa 4 novas validações para o elemento
<long-desc>conforme especificação SPS 1.10, atingindo 100% de conformidade com as regras de acessibilidade. As validações adicionadas são:<label>ou<caption><alt-text>null</alt-text>Onde a revisão poderia começar?
packtools/sps/validation/accessibility_data.py- linhas 427-697Começar pelos 4 novos métodos de validação:
validate_long_desc_media_restriction()(linha 427)validate_long_desc_not_duplicate_label_caption()(linha 493)validate_long_desc_occurrence()(linha 591)validate_long_desc_incompatible_with_null_alt()(linha 638)Observar que seguem o mesmo padrão estabelecido pelas validações de
<alt-text>.Como este poderia ser testado manualmente?
test_long_desc_media_restriction_invalid()- erro em PDFtest_long_desc_duplicates_label()- detecção de duplicaçãotest_long_desc_multiple_occurrence_failure()- múltiplas ocorrênciastest_long_desc_with_null_alt_text_failure()- incompatibilidadeAlgum cenário de contexto que queira dar?
Este PR complementa as validações de acessibilidade já existentes para
<alt-text>. Anteriormente,<long-desc>tinha apenas validações básicas (existência e comprimento), cobrindo 43% das regras SPS 1.10. Com este PR, atingimos 100% de conformidade.As validações implementadas reutilizam lógica de
<alt-text>quando aplicável (2 das 4 são adaptações), mantendo consistência no código. Todas as validações incluem suporte completo a internacionalização (i18n) commsg_text,msg_params,adv_texteadv_params.Impacto: Melhora significativa na detecção de problemas de acessibilidade em documentos XML SciELO, especialmente para descrições longas de figuras, gráficos, vídeos e áudios.
Screenshots
N.A.
Quais são tickets relevantes?
N.A.
Referências
<long-desc>