Feat/methylation filtering #283
base: main
Pull request overview
This PR adds filtering of sex chromosome probes to the UMAP generation pipeline and generates lists of probes that are affected by SNPs or do not map to the genome. The changes enhance the methylation workflow by providing more granular control over probe filtering and making filtered probe lists available as outputs.
Key changes:
- Added sex chromosome probe filtering capability to the UMAP generation
- Generated and output lists of SNP-affected probes and non-genomic probes
- Implemented a batched concatenation mechanism for large probe lists
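
The batched concatenation mentioned in the last bullet can be pictured roughly as follows. This is only a Python sketch of the general idea, not the WDL in this PR: `batched` and `batched_concat` are hypothetical helpers, and the sort-and-deduplicate behavior of `concat_and_uniq` is assumed from the task's name.

```python
from typing import Iterable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def concat_and_uniq(probe_lists: Iterable[Iterable[str]]) -> list[str]:
    """Merge several probe ID lists into one sorted, de-duplicated list.

    Assumption: the real task behaves like `cat ... | sort -u`.
    """
    merged: set[str] = set()
    for probes in probe_lists:
        merged.update(probes)
    return sorted(merged)


def batched_concat(probe_lists: Sequence[Sequence[str]], batch_size: int) -> list[str]:
    """Reduce each batch on its own, then merge the per-batch results."""
    partial = [concat_and_uniq(batch) for batch in batched(probe_lists, batch_size)]
    return concat_and_uniq(partial)
```

Reducing per batch keeps the number of inputs handed to any single merge step bounded, which is presumably the motivation for batching large probe lists.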
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| workflows/methylation/methylation-standard.wdl | Added new outputs and batched probe list concatenation logic |
| workflows/methylation/methylation-preprocess.wdl | Added task to list sex chromosome probes and updated outputs |
| workflows/methylation/methylation-cohort.wdl | Integrated sex probe filtering into the cohort workflow |
| workflows/methylation/CHANGELOG.md | Documented new probe list outputs |
| scripts/methylation/methylation-preprocess.R | Added logic to identify and output SNP-affected and non-genomic probes |
| scripts/methylation/list-sex-probes.R | New script to generate sex chromosome probe list |
| scripts/methylation/filter.py | Added support for excluding probes from additional file sources |
| scripts/CHANGELOG.md | Documented script changes |
| docker/pandas/package.json | Incremented revision for pandas container |
| docker/minfi/package.json | Incremented revision for minfi container |
| docker/minfi/Dockerfile | Added new list-sex-probes.R script to container |
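
The `filter.py` row describes excluding probes listed in additional files. A minimal Python sketch of that kind of exclusion, using only the standard library, might look like the following. It is not the script's actual code: the function names and the one-probe-ID-per-line file format are assumptions.

```python
from typing import Iterable


def parse_probe_lines(lines: Iterable[str]) -> set[str]:
    """Collect probe IDs from lines, ignoring blanks and surrounding whitespace."""
    return {line.strip() for line in lines if line.strip()}


def exclude_probes(
    probe_ids: Iterable[str],
    exclusion_sources: Iterable[Iterable[str]],
) -> list[str]:
    """Return probe_ids, in order, minus any ID found in an exclusion source.

    Each exclusion source is an iterable of lines, so an open file handle
    works as well as a plain list of strings.
    """
    excluded: set[str] = set()
    for source in exclusion_sources:
        excluded |= parse_probe_lines(source)
    return [probe for probe in probe_ids if probe not in excluded]
```

For example, passing the open handles of a sex chromosome probe list and an SNP-affected probe list as `exclusion_sources` would drop both sets in one pass.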
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Ari Frantz <ari.frantz@stjude.org>
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
                File probe_list = probe_files[num]
            }
        }
    }
    scatter (iter_index in range(length(probe_list))){
        call concat_and_uniq { input:
            files_to_combine = select_all(probe_list[iter_index]),
Copilot
AI
Jan 5, 2026
The variable 'probe_list' is used here but refers to a 2D array of optional Files from the nested scatter. Consider renaming to 'probe_list_batches' or 'probe_file_batches' to clarify it contains batches of probe files.
Suggested change:

    -             File probe_list = probe_files[num]
    -         }
    -     }
    - }
    - scatter (iter_index in range(length(probe_list))){
    -     call concat_and_uniq { input:
    -         files_to_combine = select_all(probe_list[iter_index]),
    +             File probe_file_batches = probe_files[num]
    +         }
    +     }
    + }
    + scatter (iter_index in range(length(probe_file_batches))){
    +     call concat_and_uniq { input:
    +         files_to_combine = select_all(probe_file_batches[iter_index]),
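
To illustrate why the comment describes the value as a 2D array of optional Files, here is a small Python model of the pattern. The batching structure (an outer loop over batches, an inner loop over offsets, and a bounds check that leaves missing slots empty) is inferred from the three closing braces in the snippet and is an assumption, as is the helper name `build_batches`. The `select_all` function mirrors the WDL standard library function of the same name, which drops undefined entries.

```python
from typing import Optional


def build_batches(probe_files: list[str], batch_size: int) -> list[list[Optional[str]]]:
    """Model a nested scatter with a conditional inside.

    The outer index picks a batch and the inner index an offset within it.
    Offsets that run past the end of probe_files yield None, just as a
    declaration inside a WDL `if` becomes optional outside of it.
    """
    n_batches = -(-len(probe_files) // batch_size)  # ceiling division
    batches: list[list[Optional[str]]] = []
    for batch_index in range(n_batches):
        row: list[Optional[str]] = []
        for offset in range(batch_size):
            num = batch_index * batch_size + offset
            row.append(probe_files[num] if num < len(probe_files) else None)
        batches.append(row)
    return batches


def select_all(values: list[Optional[str]]) -> list[str]:
    """Drop None entries while preserving order."""
    return [value for value in values if value is not None]
```

Seen this way, each element of the outer array is one batch, which is what a name ending in `_batches` would communicate.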
                File probe_list_non_genomic = non_genomic_probe_list[num_ng]
            }
        }
    }
    scatter (iter_index in range(length(probe_list_non_genomic))){
        call concat_and_uniq as non_genomic_concat { input:
            files_to_combine = select_all(probe_list_non_genomic[iter_index]),
Copilot
AI
Jan 5, 2026
The variable 'probe_list_non_genomic' refers to a 2D array of optional Files from the nested scatter. Consider renaming to 'non_genomic_probe_batches' or 'non_genomic_file_batches' to clarify it contains batches of probe files.
Suggested change:

    -             File probe_list_non_genomic = non_genomic_probe_list[num_ng]
    -         }
    -     }
    - }
    - scatter (iter_index in range(length(probe_list_non_genomic))){
    -     call concat_and_uniq as non_genomic_concat { input:
    -         files_to_combine = select_all(probe_list_non_genomic[iter_index]),
    +             File non_genomic_probe_batches = non_genomic_probe_list[num_ng]
    +         }
    +     }
    + }
    + scatter (iter_index in range(length(non_genomic_probe_batches))){
    +     call concat_and_uniq as non_genomic_concat { input:
    +         files_to_combine = select_all(non_genomic_probe_batches[iter_index]),
Add filtering of sex chromosomes to the UMAP generation. Also generate a list of probes that have SNPs.
Before submitting this PR, please make sure:
- If you changed anything in the `scripts/` or `docker/` directories, please ensure any image versions have been incremented accordingly!