Expansion_pvalues #39
Merged
7 commits:

- `408fed1` (isabellechan089): tdata expansion_pvalue with tests (brought over from cassiopeia and w…
- `8d8f7c9` (isabellechan089): addressed changes (name change, multiple trees, key_added param, retu…
- `5f8d75e` (isabellechan089): Merge branch 'main' into expansion_pvalues
- `8c45f1e` (isabellechan089): changelog
- `c532f14` (colganwi): added expansion test to docs and changed copy behavior
- `af941bd` (colganwi): fixed failing fitness test
- `fc1a408` (isabellechan089): Merge branch 'expansion_pvalues' of https://github.com/YosefLab/pycea…
````diff
@@ -33,6 +33,7 @@
 tl.n_extant
 tl.fitness
 tl.partition_test
+tl.expansion_test
 ```

 ## Plotting
````
@@ -0,0 +1,128 @@

```python
from collections.abc import Sequence
from typing import Literal, overload

import networkx as nx
import pandas as pd
import treedata as td
from scipy.special import comb as nCk

from pycea.utils import get_keyed_node_data, get_root, get_trees


@overload
def expansion_test(
    tdata: td.TreeData,
    tree: str | Sequence[str] | None = None,
    min_clade_size: int = 10,
    min_depth: int = 1,
    key_added: str = "expansion_pvalue",
    copy: Literal[True, False] = True,
) -> pd.DataFrame: ...
@overload
def expansion_test(
    tdata: td.TreeData,
    tree: str | Sequence[str] | None = None,
    min_clade_size: int = 10,
    min_depth: int = 1,
    key_added: str = "expansion_pvalue",
    copy: Literal[True, False] = False,
) -> None: ...
def expansion_test(
    tdata: td.TreeData,
    tree: str | Sequence[str] | None = None,
    min_clade_size: int = 10,
    min_depth: int = 1,
    key_added: str = "expansion_pvalue",
    copy: Literal[True, False] = False,
) -> pd.DataFrame | None:
    """Compute expansion p-values on a tree.

    Uses the methodology described in :cite:`Yang_2022` to
    assess the expansion probability of a given subclade of a phylogeny.
    Mathematical treatment of the coalescent probability is described in :cite:`Griffiths_1998`.

    The value computed is the probability that, under a simple neutral
    coalescent model, a given subclade contains at least the observed number
    of cells; in other words, a one-sided p-value. A p-value below some
    threshold (e.g., 0.05) may indicate that the expansion can be attributed
    to this subclade or to some subclade beneath it (i.e., the null hypothesis
    that the subclade is undergoing neutral drift can be rejected).

    This function adds an attribute storing the expansion p-value to every tree
    node. Nodes that are not tested (the root, leaves' siblings that fail the
    filters, and any clade failing ``min_clade_size`` or ``min_depth``) keep a
    p-value of 1.0.

    Leaf counts are computed in a single post-order traversal, so the running
    time is linear in the number of nodes regardless of tree balance.

    Parameters
    ----------
    tdata
        TreeData object containing a phylogenetic tree.
    tree
        The `obst` key or keys of the trees to use. If `None`, all trees are used.
    min_clade_size
        Minimum number of leaves in a subtree to be considered. Default is 10.
    min_depth
        Minimum depth of clade to be considered. Depth is the number of edges
        from the root (the root has depth 0), not the sum of branch lengths.
        Default is 1.
    key_added
        Attribute key where expansion p-values will be stored in tree nodes.
        Default is "expansion_pvalue".
    copy
        If True, return a :class:`pandas.DataFrame` of expansion p-values.
        If False, return None. In both cases the p-values are written to the
        tree nodes in place. Default is False.

    Returns
    -------
    Returns `None` if ``copy=False``, otherwise returns a :class:`pandas.DataFrame` with expansion p-values.

    Sets the following fields:

    * tdata.obst[tree].nodes[key_added] : `float`
        - Expansion p-value for each node.
    """
    trees = get_trees(tdata, tree)

    for _tree_key, t in trees.items():
        root = get_root(t)
        # Count leaves below each node; post-order visits children before parents
        leaf_counts = {}
        for node in nx.dfs_postorder_nodes(t, root):
            if t.out_degree(node) == 0:
                leaf_counts[node] = 1
            else:
                leaf_counts[node] = sum(leaf_counts[child] for child in t.successors(node))

        # Depth = number of edges from the root
        depths = {root: 0}
        for u, v in nx.dfs_edges(t, root):
            depths[v] = depths[u] + 1

        # Default p-value for every node that is never tested
        nx.set_node_attributes(t, 1.0, key_added)

        for node in t.nodes():
            n = leaf_counts[node]
            children = list(t.successors(node))
            k = len(children)

            if k == 0:
                continue

            for child in children:
                b = leaf_counts[child]
                depth = depths[child]

                # Apply filters
                if b < min_clade_size:
                    continue
                if depth < min_depth:
                    continue

                # P(child clade holds >= b of the n leaves split among k lineages)
                p = nCk(n - b, k - 1) / nCk(n - 1, k - 1)
                t.nodes[child][key_added] = float(p)

    if copy:
        df = get_keyed_node_data(tdata, keys=key_added, tree=tree, slot="obst")
        if len(trees) == 1:
            df.index = df.index.droplevel(0)
        return df
    else:
        return None
```
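The heart of the test is the closed-form tail probability in the inner loop: for a parent with $n$ leaves and $k$ children, a given child clade holds at least $b$ leaves with probability $p = \binom{n-b}{k-1} / \binom{n-1}{k-1}$. One way to see this: under the neutral coalescent the $k$ clade sizes are uniformly distributed over the $\binom{n-1}{k-1}$ compositions of $n$ into $k$ positive parts, and those with one part $\ge b$ number $\binom{n-b}{k-1}$. The sketch below reproduces only that expression, using the standard library's `math.comb` in place of `scipy.special.comb`; the helper name `expansion_pvalue` is hypothetical and not part of pycea.

```python
from math import comb


def expansion_pvalue(n: int, k: int, b: int) -> float:
    """Probability that one of k child clades holds >= b of n leaves.

    Hypothetical helper mirroring nCk(n - b, k - 1) / nCk(n - 1, k - 1)
    from expansion_test. Every other child needs at least one leaf, so
    b can be at most n - k + 1.
    """
    if not (1 <= k <= n and 1 <= b <= n - k + 1):
        raise ValueError("require 1 <= k <= n and 1 <= b <= n - k + 1")
    return comb(n - b, k - 1) / comb(n - 1, k - 1)


# Parent "0" of the test tree: 11 leaves, 2 children, child "1" holds 8 leaves.
print(expansion_pvalue(11, 2, 8))  # 0.3
# Parent "1": 8 leaves, 3 children, child "3" holds 6 leaves (1/21).
print(round(expansion_pvalue(8, 3, 6), 4))  # 0.0476
```

With $k = 2$ the formula reduces to $(n - b)/(n - 1)$, i.e. the split size is uniform on $1, \dots, n-1$, which is why the two-child cases in the tests come out as simple fractions.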
@@ -0,0 +1,146 @@

```python
import networkx as nx
import pandas as pd
import pytest
import treedata as td

import pycea as py
from pycea.tl.topology import expansion_test


@pytest.fixture
def test_tree():
    """Create a test TreeData object with a tree topology."""
    # Create tree topology
    tree = nx.DiGraph()
    tree.add_edges_from(
        [
            ("0", "1"),
            ("0", "2"),
            ("1", "3"),
            ("1", "4"),
            ("1", "5"),
            ("2", "6"),
            ("2", "7"),
            ("3", "8"),
            ("3", "9"),
            ("3", "16"),
            ("7", "10"),
            ("7", "11"),
            ("8", "12"),
            ("8", "13"),
            ("9", "14"),
            ("9", "15"),
            ("16", "17"),
            ("16", "18"),
        ]
    )

    # Create TreeData object
    tdata = td.TreeData(
        obs=pd.DataFrame(index=["4", "5", "6", "10", "11", "12", "13", "14", "15", "17", "18"]),
        obst={"tree": tree},
    )

    return tdata


def test_expansion_test_min_clade(test_tree):
    """Test that min_clade_size=20 filters out all clades."""
    expansion_test(test_tree, min_clade_size=20)
    node_data = py.get.node_df(test_tree)
    assert (node_data["expansion_pvalue"] == 1.0).all(), "All nodes should be filtered with min_clade_size=20"


def test_expansion_test_basic(test_tree):
    """Test expansion p-values with min_clade_size=2."""
    result = expansion_test(test_tree, min_clade_size=2, copy=True)
    expected_basic = {
        "0": 1.0,
        "1": 0.3,
        "2": 0.8,
        "3": 0.047,
        "4": 1.0,
        "5": 1.0,
        "6": 1.0,
        "7": 0.5,
        "8": 0.6,
        "9": 0.6,
        "10": 1.0,
        "11": 1.0,
        "12": 1.0,
        "13": 1.0,
        "14": 1.0,
        "15": 1.0,
        "16": 0.6,
        "17": 1.0,
        "18": 1.0,
    }
    node_data = py.get.node_df(test_tree)
    assert result.shape == (19, 1)
    for node, expected in expected_basic.items():
        actual = node_data.loc[node, "expansion_pvalue"]
        assert abs(actual - expected) < 0.01, f"Basic: Node {node} expected {expected}, got {actual}"


def test_expansion_test_depth_filter(test_tree):
    """Test filtering with min_depth=3."""
    expansion_test(test_tree, min_clade_size=2, min_depth=3)
    expected_depth = {
        "0": 1.0,
        "1": 1.0,
        "2": 1.0,
        "3": 1.0,
        "4": 1.0,
        "5": 1.0,
        "6": 1.0,
        "7": 1.0,
        "8": 0.6,
        "9": 0.6,
        "10": 1.0,
        "11": 1.0,
        "12": 1.0,
        "13": 1.0,
        "14": 1.0,
        "15": 1.0,
        "16": 0.6,
        "17": 1.0,
        "18": 1.0,
    }
    node_data = py.get.node_df(test_tree)
    for node, expected in expected_depth.items():
        actual = node_data.loc[node, "expansion_pvalue"]
        assert abs(actual - expected) < 0.01, f"Depth filter: Node {node} expected {expected}, got {actual}"


def test_expansion_test_multiple_trees():
    """Test multiple trees."""
    tree1 = nx.DiGraph()
    tree1.add_edges_from([("0", "1"), ("0", "2")])
    tree2 = nx.DiGraph()
    tree2.add_edges_from([("A", "B"), ("A", "C")])
    tdata_multi = td.TreeData(
        obs=pd.DataFrame(index=["1", "2", "B", "C"]),
        obst={"tree1": tree1, "tree2": tree2},
    )
    expansion_test(tdata_multi, min_clade_size=2)
    assert "expansion_pvalue" in tdata_multi.obst["tree1"].nodes["0"]
    assert "expansion_pvalue" in tdata_multi.obst["tree2"].nodes["A"]

    tdata_multi2 = td.TreeData(
        obs=pd.DataFrame(index=["1", "2", "3", "4", "B", "C"]),
        obst={"tree1": tree1.copy(), "tree2": tree2.copy()},
    )
    expansion_test(tdata_multi2, min_clade_size=2, tree="tree1")
    assert "expansion_pvalue" in tdata_multi2.obst["tree1"].nodes["0"]
    assert "expansion_pvalue" not in tdata_multi2.obst["tree2"].nodes["A"]


def test_expansion_test_custom_key(test_tree):
    """Test using custom key_added parameter."""
    expansion_test(test_tree, min_clade_size=2, key_added="custom_pvalue")
    assert "custom_pvalue" in test_tree.obst["tree"].nodes["0"]
    assert "expansion_pvalue" not in test_tree.obst["tree"].nodes["0"]


if __name__ == "__main__":
    pytest.main(["-v", __file__])
```
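The expected values in `test_expansion_test_basic` and `test_expansion_test_depth_filter` can be cross-checked without networkx, scipy, or treedata. The sketch below is a dependency-free re-implementation of the same steps (leaf counts, depth in edges from the root, a default of 1.0, one tail probability per child clade that passes both filters) on a plain child-list dictionary. `expansion_pvalues` and `TEST_TREE` are hypothetical names for this illustration, not pycea APIs, and the sketch assumes every internal node appears as a key in the dictionary.

```python
from math import comb


def expansion_pvalues(
    children: dict[str, list[str]],
    root: str,
    min_clade_size: int = 10,
    min_depth: int = 1,
) -> dict[str, float]:
    """Hypothetical stdlib sketch of the expansion test on a child-list tree."""
    # Iterative pre-order DFS; depth is the number of edges from the root.
    order: list[str] = []
    depth: dict[str, int] = {root: 0}
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node)
        for child in children.get(node, []):
            depth[child] = depth[node] + 1
            stack.append(child)

    # A parent always precedes its children in pre-order, so the reversed
    # order visits children first, which is what the leaf counts need.
    leaves: dict[str, int] = {}
    for node in reversed(order):
        kids = children.get(node, [])
        leaves[node] = 1 if not kids else sum(leaves[c] for c in kids)

    # Untested nodes (root, filtered clades) keep 1.0.
    pvalues = {node: 1.0 for node in order}
    for node in order:
        kids = children.get(node, [])
        n, k = leaves[node], len(kids)
        for child in kids:
            b = leaves[child]
            if b < min_clade_size or depth[child] < min_depth:
                continue
            pvalues[child] = comb(n - b, k - 1) / comb(n - 1, k - 1)
    return pvalues


# Same topology as the test_tree fixture.
TEST_TREE = {
    "0": ["1", "2"],
    "1": ["3", "4", "5"],
    "2": ["6", "7"],
    "3": ["8", "9", "16"],
    "7": ["10", "11"],
    "8": ["12", "13"],
    "9": ["14", "15"],
    "16": ["17", "18"],
}

basic = expansion_pvalues(TEST_TREE, "0", min_clade_size=2)
print(basic["1"], basic["2"], basic["16"])  # 0.3 0.8 0.6
```

An explicit stack sidesteps Python's recursion limit on deep lineage trees, and reversing the pre-order gives the same children-before-parents guarantee that `nx.dfs_postorder_nodes` provides in the pycea version.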