Conversation
agalitsyna
left a comment
There was a problem hiding this comment.
comments after a brief checkout, mostly my misunderstanding plus generally mysterious parts of the code highlighted
pairtools/cli/coverage.py
Outdated
There was a problem hiding this comment.
This description feels a bit incomplete. Maybe add a note on what is the zero point relative to which the shift wil be performed here? Also, what will be the direction of the shift?
| type=click.Choice(["5", "3", "0"]), | ||
| default="0", | ||
| show_default=True, | ||
| help="5', 3' end, or 0 (default) - whatever is reported as pos1 and pos2 in pairs", |
There was a problem hiding this comment.
- What is default "0" here?
- How does this parameter interact with the "side" parameter?
There was a problem hiding this comment.
- Default is whatever is reported in pos1 or pos2 (so depends on how the pairs where generated).
- Well... It calculates coverage of 5', 3' or default fragment ends in the specified side (i.e. --side 1 --end 5 will use pos51, --side 2 --end 3 will use pos32)
| type=str, | ||
| required=False, | ||
| ) | ||
| @click.option( |
There was a problem hiding this comment.
Instead of having two separate parameters "side" and "end", maybe it's better to allow the user to submit the list of columns to use for the coverage calculation? It's just a suggestion, might be not the smartest one.
There was a problem hiding this comment.
Interesting idea. How would you specify it in case you want to have the coverage by midpoint or whole length, as you suggest below and I also wanted to implement anyway?
| default="1", | ||
| show_default=True, | ||
| help="0: both sides, 1: first side, 2: second side", | ||
| ) |
There was a problem hiding this comment.
Might be a good idea to allow using (1) the midpoints of the reads for calculation and (2) whole length of the read (each nucleotide of the read adds +1 to the coverage)
There was a problem hiding this comment.
Yes, will add it! At least the midpoint I was planning anyway for sure.
| "first column lists scaffold names. If not provided, will be extracted " | ||
| "from the header of the input pairs file.", | ||
| ) | ||
| @click.option( |
There was a problem hiding this comment.
Is there a way to disable the output bgzip-/lz4c-file and have bigwig only? If bgzip-/lz4c- output is required, it shall have an enormous burden on the IO if you need bigwig only...
There was a problem hiding this comment.
I think in pairtools we always have a text output, no? How bioframe.to_bigwig works is it saves to text and then converts anyway, so the burden is only double... But I guess we can at least default to saving to stdout and then if you don't want it you dump it to /dev/null?
Add simple calculation of coverage from pairs. Allows to use read1, read2 or both. Can shift each position a certain distance based on strand (e.g. useful for nucleosome dyads from micro-C). If file contains pos51/pos31/... can use that instead of pos1/2. Can do base pair resolution, or some binning.
Still needs tests.
Could be nice to be able to stream output and not generate a giant dataframe for the whole genome, but actually it's only possible for pos1 (since it comes first in sorting). Also atm multiprocessing doesn't seem to do much or anything, perhaps can be optimized somehow.