In the pid provider, add deprecated_sps_pkg_name to identify registered packages and improve the identification and removal of duplicates #1256
…a query
- Add cached_property sps_pkg_name and deprecated_sps_pkg_name
- Expand the pkg_name search to include all variants
- Add find_duplicated_v2 to find duplicates by pid v2
- Refactor deduplicate_items to support mark_as_duplicated and deduplicate
- Rename fix_duplicated_pkg_name to fix_duplicated_items
- fix_duplicated_items now searches by pkg_name, v2, or other_pid
- Unify mark_as_duplicated and deduplicate into a single call
- Add find_duplicated_pid_v2 to find duplicates by pid v2
- Refactor deduplicate_items to support mark_as_duplicated and deduplicate
- Remove mark_items_as_duplicated (functionality absorbed into deduplicate_items)
- Rename fix_duplicated_pkg_name to fix_duplicated_items
- find_duplicated_pkg_names returns a QuerySet with values_list
- Unify mark_as_duplicated and deduplicate into a single call
Pull request overview
This pull request updates the packtools dependency version and refactors the deduplication logic for both PidProviderXML and Article models. The changes consolidate duplicate handling methods and add support for deduplication based on both package names and v2 PIDs.
Changes:
- Updated packtools dependency from version 4.13.1 to 4.14.0
- Added new properties for handling different package name variations (sps_pkg_name, deprecated_sps_pkg_name) in query_params.py
- Refactored deduplication methods to consolidate mark_as_duplicated and deduplicate operations into a single method with flags
- Extended deduplication logic to handle both pkg_name and v2/pid_v2 identifiers
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| requirements/base.txt | Updates packtools dependency from 4.13.1 to 4.14.0 |
| pid_provider/query_params.py | Adds new properties for package name variations and updates identifier query logic to check multiple package name sources |
| pid_provider/models.py | Refactors deduplication methods to consolidate operations and add v2-based duplicate detection |
| pid_provider/tasks.py | Consolidates mark_as_duplicated and deduplicate task calls into a single method invocation |
| article/models.py | Refactors deduplication methods similar to pid_provider, adding pid_v2-based duplicate detection |
| article/tasks.py | Consolidates mark_as_duplicated and deduplicate task calls into a single method invocation |
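The consolidation summarized above (one method, two flags) can be illustrated with a minimal, framework-free sketch. Everything here is hypothetical: the `Item` class, the status strings, and the "keep the newest item" rule are assumptions made for illustration and are not taken from the PR's code.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical stand-in for a PidProviderXML or Article row."""
    pkg_name: str
    created: int          # larger means newer (assumption)
    status: str = "OK"    # hypothetical status values


def deduplicate_items(items, mark_as_duplicated=False, deduplicate=False):
    """Group items by pkg_name and act on groups with more than one member."""
    groups = {}
    for item in items:
        groups.setdefault(item.pkg_name, []).append(item)

    kept = []
    for group in groups.values():
        if len(group) == 1:
            kept.extend(group)
            continue
        if mark_as_duplicated:
            # First pass: flag every member of the duplicated group.
            for item in group:
                item.status = "DUPLICATED"
        if deduplicate:
            # Assumption: the newest item survives and is restored to "OK".
            newest = max(group, key=lambda i: i.created)
            newest.status = "OK"
            kept.append(newest)
        else:
            kept.extend(group)
    return kept
```

With both flags set, a single call performs the marking and the resolution that previously required two separate task invocations.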
| """ | ||
| Corrige todos os artigos marcados como DATA_STATUS_DUPLICATED com base nos ISSNs fornecidos. | ||
|
|
||
| Args: | ||
| issns: Lista de ISSNs para verificar duplicatas. | ||
| user: Usuário que está executando a operação. | ||
| """ |
Copilot
AI
Jan 19, 2026
Docstring needs to be updated. It still mentions 'artigos' (articles) but this method operates on PidProviderXML items, not articles. Additionally, the Args section should document the new 'mark_as_duplicated' and 'deduplicate' parameters.
| def fix_duplicated_pkg_name(cls, pkg_name, user): | ||
| def fix_duplicated_items(cls, user, pkg_name, v2): | ||
| """ | ||
| Corrige items marcados como PPXML_STATUS_DUPLICATED com base no pkg_name fornecido. |
Copilot
AI
Jan 19, 2026
The docstring is incomplete. It only mentions 'pkg_name fornecido' (provided pkg_name) but the method now also accepts 'v2' parameter. The description should reflect that it can handle both pkg_name and v2 parameters.
| Corrige items marcados como PPXML_STATUS_DUPLICATED com base no pkg_name fornecido. | |
| Corrige items marcados como PPXML_STATUS_DUPLICATED com base no pkg_name ou v2 fornecidos. |
| exc_traceback=exc_traceback, | ||
| action="pid_provider.models.PidProviderXML.fix_duplicated_pkg_name", | ||
| action="pid_provider.models.PidProviderXML.fix_duplicated_items", | ||
| detail=pkg_name, |
Copilot
AI
Jan 19, 2026
The 'detail' parameter should include v2 when it's provided. Currently it only uses pkg_name, which will be None when fixing by v2.
| detail=pkg_name, | |
| detail=f"pkg_name={pkg_name}, v2={v2}", |
| @cached_property | ||
| def pkg_name(self): | ||
| """Nome do pacote do documento.""" | ||
| """Nome do pacote do documento, parâmtro usado ao instanciar XMLAdapter""" |
Copilot
AI
Jan 19, 2026
Spelling error: 'parâmtro' should be 'parâmetro'
| """Nome do pacote do documento, parâmtro usado ao instanciar XMLAdapter""" | |
| """Nome do pacote do documento, parâmetro usado ao instanciar XMLAdapter""" |
| """ | ||
| Corrige todos os artigos marcados como DATA_STATUS_DUPLICATED com base nos ISSNs fornecidos. | ||
|
|
||
| Args: | ||
| issns: Lista de ISSNs para verificar duplicatas. | ||
| user: Usuário que está executando a operação. | ||
| """ |
Copilot
AI
Jan 19, 2026
Documentation is outdated. The docstring mentions 'issns: Lista de ISSNs' but the method actually accepts 'journal' and 'journal_id' parameters, not 'issns'.
|
|
||
| @classmethod | ||
| def find_duplicated_pid_v2(cls, journal=None, journal_id=None): | ||
| # Busca em ambos os campos de ISSN |
Copilot
AI
Jan 19, 2026
Comment is misleading: 'Busca em ambos os campos de ISSN' (Searches in both ISSN fields) but this method searches by journal/journal_id parameters, not ISSN fields.
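To make the distinction in this comment concrete, here is a small sketch of duplicate detection that filters by journal rather than by ISSN fields. It uses plain dictionaries instead of the Django ORM; the function name `duplicated_pid_v2_values` and the record keys are hypothetical and only mirror the parameter names quoted in the diff.

```python
from collections import Counter


def duplicated_pid_v2_values(records, journal_id=None):
    """Return pid_v2 values that occur more than once, optionally per journal."""
    if journal_id is not None:
        # Filter by journal, not by ISSN fields.
        records = [r for r in records if r["journal_id"] == journal_id]
    # Ignore records with a missing or empty pid_v2.
    counts = Counter(r["pid_v2"] for r in records if r.get("pid_v2"))
    return sorted(pid for pid, n in counts.items() if n > 1)
```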
| return list(set(item["pkg_name"] for item in duplicates)) | ||
|
|
||
| @classmethod | ||
|
|
Copilot
AI
Jan 19, 2026
Normal methods should have 'self', rather than 'cls', as their first parameter.
| @classmethod |
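For context on this warning: a method whose first parameter is `cls` is expected to carry the `@classmethod` decorator, otherwise it is an ordinary instance method and the first argument is the instance. The class below is a hypothetical illustration, not code from the PR.

```python
class Registry:
    """Hypothetical class used only to contrast cls and self."""
    items = ["a", "b", "a"]

    @classmethod
    def find_duplicated(cls):
        # cls is the class itself, so no instance is needed to call this.
        return sorted({x for x in cls.items if cls.items.count(x) > 1})

    def describe(self):
        # self is an instance; this method needs Registry() to be called.
        return f"{type(self).__name__} with {len(self.items)} items"
```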
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (1)
article/models.py:1008
- The docstring and the Args section of `fix_duplicated_items` mention only `pkg_name`, but the method now also accepts and uses `pid_v2` as a deduplication criterion; the description of the parameters and of the behavior should be updated to reflect the use of both.
"""
Corrige artigos marcados como DATA_STATUS_DUPLICATED com base no pkg_name fornecido.
Args:
pkg_name: Nome do pacote para verificar duplicatas.
user: Usuário que está executando a operação.
| @cached_property | ||
| def pkg_name(self): | ||
| """Nome do pacote do documento.""" | ||
| """Nome do pacote do documento, parâmtro usado ao instanciar XMLAdapter""" |
Copilot
AI
Jan 22, 2026
There is a typo, "parâmtro", in the pkg_name docstring; the correct spelling is "parâmetro".
| """Nome do pacote do documento, parâmtro usado ao instanciar XMLAdapter""" | |
| """Nome do pacote do documento, parâmetro usado ao instanciar XMLAdapter""" |
| exc_traceback=exc_traceback, | ||
| action="pid_provider.models.PidProviderXML.fix_duplicated_pkg_name", | ||
| action="pid_provider.models.PidProviderXML.fix_duplicated_items", | ||
| detail=pkg_name, |
Copilot
AI
Jan 22, 2026
In the UnexpectedEvent.create call of fix_duplicated_items, the detail field always receives only pkg_name, so calls that deduplicate by v2 do not record the identifier that was used; to ease debugging, it would be better to include the value of v2 when it is the criterion (for example, something like pkg_name or v2).
| detail=pkg_name, | |
| detail=pkg_name or v2, |
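Two forms for `detail` have been suggested in this conversation. The sketch below only shows what each would record; the helper names `detail_short` and `detail_full` are hypothetical and exist purely for comparison.

```python
def detail_short(pkg_name, v2):
    # Records whichever identifier is set, without saying which one it is.
    return pkg_name or v2


def detail_full(pkg_name, v2):
    # Records both values, labeled, even when one of them is None.
    return f"pkg_name={pkg_name}, v2={v2}"
```

The labeled form is more verbose but removes any ambiguity about which criterion was in use when the event was logged.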
| def deduplicate_items(cls, user, issns, mark_as_duplicated=False, deduplicate=False): | ||
| """ | ||
| Corrige todos os artigos marcados como DATA_STATUS_DUPLICATED com base nos ISSNs fornecidos. | ||
|
|
Copilot
AI
Jan 22, 2026
The PR description mentions the creation of a new deprecated_pkg_name field in PidProviderXML, but there is no corresponding field or migration in the code (no occurrence of deprecated_pkg_name in the pid_provider app); either the field and its migration are missing, or the description needs to be adjusted to reflect the current implementation.
| """Nome do pacote do documento (deprecated).""" | ||
| return self.xml_adapter.sps_pkg_name |
Copilot
AI
Jan 22, 2026
The deprecated_sps_pkg_name property returns self.xml_adapter.sps_pkg_name, which makes it redundant with sps_pkg_name and prevents the use of the deprecated_sps_pkg_name field exposed by packtools 4.14.0 to locate records with old package names; the property should use the adapter's "deprecated"-specific attribute (for example, xml_adapter.deprecated_sps_pkg_name).
| """Nome do pacote do documento (deprecated).""" | |
| return self.xml_adapter.sps_pkg_name | |
| """Nome antigo de pacote do documento (deprecated).""" | |
| return self.xml_adapter.deprecated_sps_pkg_name |
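A minimal sketch of the two properties after the suggested fix, assuming the adapter exposes a `deprecated_sps_pkg_name` attribute as the comment proposes. `FakeXMLAdapter`, `QueryParams`, and the name strings are hypothetical stand-ins; the real adapter comes from packtools, and `functools.cached_property` is used here only to keep the example free of Django.

```python
from functools import cached_property


class FakeXMLAdapter:
    """Hypothetical stand-in for the packtools adapter."""
    sps_pkg_name = "new-name"
    deprecated_sps_pkg_name = "old-name"  # assumed attribute


class QueryParams:
    def __init__(self, xml_adapter):
        self.xml_adapter = xml_adapter

    @cached_property
    def sps_pkg_name(self):
        """Current package name."""
        return self.xml_adapter.sps_pkg_name

    @cached_property
    def deprecated_sps_pkg_name(self):
        """Old package name; must not fall back to sps_pkg_name."""
        return self.xml_adapter.deprecated_sps_pkg_name
```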
Description
Implements a search by `deprecated_sps_pkg_name` to locate PidProviderXML records even when the package name has been corrected. Adds deduplication by PID v2 alongside the existing deduplication by `pkg_name`.
Motivation
Corrections to `sps_pkg_name` in packtools 4.14.0 caused duplicates because the system could not find the existing record under the old name. The main cases fixed:
- `s` before the supplement number (e.g. `v10n2` → `v10n2s1`)
- `order` or `seq` as a suffix to differentiate
With support for `deprecated_sps_pkg_name`, the search considers both the current name and the previous one, which avoids creating duplicate records.
Changes
PidProviderXML (`pid_provider/`)
- New field `deprecated_pkg_name`: stores the previous package name when it differs from the current one, with a conditional index for optimization
- Search covers `pkg_name`, `sps_pkg_name` and `deprecated_sps_pkg_name`, so records are found even after a name correction
- `find_duplicated_v2`: identifies duplicates by the `v2` field
- `deduplicate_items`: accepts the `mark_as_duplicated` and `deduplicate` parameters for granular control
- `fix_duplicated_pkg_name` → `fix_duplicated_items`: supports searching by `pkg_name`, `v2` or `other_pid`
Article (`article/`)
- `find_duplicated_pid_v2`: identifies duplicates by the `pid_v2` field
- `deduplicate_items`: merges `mark_items_as_duplicated` (removed) with the deduplication functionality
- `fix_duplicated_pkg_name` → `fix_duplicated_items`: supports searching by `pkg_name` or `pid_v2`
Tasks
- Unify `mark_as_duplicated` and `deduplicate` into a single call to `deduplicate_items`
Dependencies
- packtools 4.13.1 → 4.14.0 (corrects `sps_pkg_name` and exposes `deprecated_sps_pkg_name`)
Migration required
Checklist
Bug found
- `article/models.py` line 980: `spid_v2` should be `pid_v2`
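The lookup across package name variants described in this PR can be sketched without the ORM as follows. Both helper names (`pkg_name_variants`, `find_registered`) and the record layout are hypothetical; the sketch only illustrates why searching by every known variant finds a record that was registered under the old name.

```python
def pkg_name_variants(pkg_name=None, sps_pkg_name=None,
                      deprecated_sps_pkg_name=None):
    """Return the distinct, non-empty names, keeping the given order."""
    variants = []
    for name in (pkg_name, sps_pkg_name, deprecated_sps_pkg_name):
        if name and name not in variants:
            variants.append(name)
    return variants


def find_registered(records, variants):
    """Return the records whose stored pkg_name matches any variant."""
    wanted = set(variants)
    return [r for r in records if r["pkg_name"] in wanted]


# A record stored under the old name is still found after the correction.
records = [{"pkg_name": "old-name", "v3": "x"}]
variants = pkg_name_variants("new-name", "new-name", "old-name")
matches = find_registered(records, variants)
```

In a Django model the same idea would plausibly become a filter using the `pkg_name__in` lookup with the list of variants, though the exact query used in the PR is not shown in this conversation.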