Added UMAP Figures #60
base: main
Conversation
to prevent sampling more samples than available
Check out this pull request on ReviewNB. See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB.
Pull request overview
Adds a workflow to compute UMAP embeddings from sampled JUMP single-cell data and generate labeled UMAP visualizations (treatment type, anomaly score, control type).
Changes:
- Pin umap-learn in the conda environment for compatibility with the repo's scikit-learn version.
- Add a notebook + nbconverted Python script to sample per-plate single cells and compute 2D UMAP coordinates (see the illustrative sketch below).
- Add an nbconverted R script to render and save UMAP figures colored by multiple metadata fields.
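For orientation, here is an illustrative sketch of what that Python step might look like. This is not the PR's actual code: the paths, plate file name, sample size, and column handling are assumptions; only the umap-learn dependency and the umap_0/umap_1 column names come from the changes under review.

```python
# Illustrative sketch only: paths, file names, and the sample size are hypothetical.
import pandas as pd
import umap  # umap-learn, pinned in environment.yml

# Load one plate of single-cell profiles (parquet path is an assumption)
scdf = pd.read_parquet("data/plate_sc_profiles.parquet")

# Sample rows so UMAP stays tractable, without requesting more rows than exist
scdf = scdf.sample(n=min(5000, len(scdf)), random_state=0)

# Split metadata columns (prefixed "Metadata_") from morphology feature columns
meta_cols = [c for c in scdf.columns if c.startswith("Metadata_")]
feature_cols = [c for c in scdf.columns if not c.startswith("Metadata_")]

# Compute a 2D embedding with a fixed seed for reproducibility
umap_data = umap.UMAP(n_components=2, random_state=0).fit_transform(scdf[feature_cols])

# Attach coordinates to the metadata and save for the R plotting script
umapdf = scdf[meta_cols].copy()
umapdf[["umap_0", "umap_1"]] = umap_data
umapdf.to_parquet("data/plate_umap.parquet", index=False)
```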
Reviewed changes
Copilot reviewed 4 out of 8 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| environment.yml | Pins umap-learn to a specific version to support the new UMAP computation workflow. |
| 3.analyze_data/visualize_umaps/nbconverted/compute_plate_umaps.py | Implements plate sampling + UMAP computation and writes a parquet used for plotting. |
| 3.analyze_data/visualize_umaps/compute_plate_umaps.ipynb | Notebook version of the UMAP sampling/computation workflow. |
| 3.analyze_data/visualize_umaps/nbconverted/visualize_plate_umaps.r | Generates and saves UMAP plots labeled by treatment type, anomaly score, and control type. |
```r
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(arrow))
suppressPackageStartupMessages(library(stringr))
```
Copilot AI · Jan 23, 2026
dplyr and stringr are imported but not used anywhere in this script. Removing unused package imports reduces required dependencies and speeds up startup.
Suggested change:

```diff
-suppressPackageStartupMessages(library(dplyr))
 suppressPackageStartupMessages(library(arrow))
-suppressPackageStartupMessages(library(stringr))
```
```python
umapdf = umapdf.dropna(axis=1, how="any")

print("Shape of plate data after sampling:", umapdf.shape)
umapdf["Metadata_control_type"].unique()
```
Copilot AI · Jan 23, 2026
This standalone umapdf["Metadata_control_type"].unique() expression has no effect in a script context (it only displays in a notebook). Consider removing it or converting it to an explicit print(...) if it’s meant as a diagnostic.
Suggested change:

```diff
-umapdf["Metadata_control_type"].unique()
+print(umapdf["Metadata_control_type"].unique())
```
```yaml
# pin to version compatible with scikit-learn 1.1.1
- conda-forge::umap-learn==0.5.3
```
Copilot AI · Jan 23, 2026
The conda dependency pin uses == (pip-style). This repo’s conda environment files consistently use conda’s package=version syntax (e.g., python=3.9 in this file, r-base=4.2.2 in R_environment.yml). Please change this to conda-forge::umap-learn=0.5.3 to match conventions and avoid potential conda parsing issues.
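Expressed as a suggestion diff against the pinned line above (the same one-character change the comment describes):

```diff
- - conda-forge::umap-learn==0.5.3
+ - conda-forge::umap-learn=0.5.3
```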
```r
ggplot(umap_df, aes(x = umap_0, y = umap_1, color = Metadata_Treatment_Type)) +
  geom_point(shape = 20, size = 2, alpha = 4) +
  scale_color_manual(
```
Copilot AI · Jan 23, 2026
alpha in ggplot2 is expected to be in the 0–1 range. Using alpha = 4 likely clamps to 1 (fully opaque) and makes the intent unclear. Please change this to a value between 0 and 1 (e.g., 0.4) for the treatment-type plot.
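In suggestion form, using the 0.4 the comment proposes (any value in (0, 1] is valid):

```diff
-  geom_point(shape = 20, size = 2, alpha = 4) +
+  geom_point(shape = 20, size = 2, alpha = 0.4) +
```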
```r
ggplot(umap_df, aes(x = umap_0, y = umap_1, color = Metadata_control_type)) +
  geom_point(shape = 20, size = 2, alpha = 4) +
  scale_color_manual(
```
Copilot AI · Jan 23, 2026
alpha in ggplot2 should be in the 0–1 range. alpha = 4 is likely unintended and removes transparency. Please use a value between 0 and 1 for the control-type plot as well.
```python
scdf = scdf.groupby(["Metadata_control_type"], group_keys=False).apply(
    lambda grp: grp.sample(n=min(250, len(grp)), random_state=0)
)
```
Copilot AI · Jan 23, 2026
This groupby(...).apply(...) pattern is emitting a pandas deprecation warning (seen in the committed notebook output). To make this forward-compatible and keep logs clean, update the sampling to avoid GroupBy.apply’s changing behavior (e.g., use include_groups=False where available, or restructure sampling so it doesn’t rely on apply).
Suggested change:

```diff
-scdf = scdf.groupby(["Metadata_control_type"], group_keys=False).apply(
-    lambda grp: grp.sample(n=min(250, len(grp)), random_state=0)
-)
+# Sample up to 250 rows per control type without using GroupBy.apply
+group_sizes = scdf["Metadata_control_type"].value_counts()
+large_groups = group_sizes[group_sizes > 250].index
+small_groups = group_sizes[group_sizes <= 250].index
+sampled_large = (
+    scdf[scdf["Metadata_control_type"].isin(large_groups)]
+    .groupby("Metadata_control_type", group_keys=False)
+    .sample(n=250, random_state=0)
+)
+small = scdf[scdf["Metadata_control_type"].isin(small_groups)]
+scdf = pd.concat([sampled_large, small], axis=0)
```
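A note on why the suggestion splits groups at the 250-row threshold: DataFrameGroupBy.sample raises a ValueError when asked for more rows than a group contains (with replace=False), so only groups larger than 250 rows can be sampled directly; smaller groups are passed through whole.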
```python
umapdf["umap_0"], umapdf["umap_1"] = (
    umap_data[:, 0],
    umap_data[:, 1],
)
```
Copilot AI · Jan 23, 2026
Assigning umap_0/umap_1 this way is triggering a pandas PerformanceWarning about a highly-fragmented DataFrame (also visible in the notebook output). Consider defragmenting before assignment (e.g., umapdf = umapdf.copy()) and assigning both columns in a single operation to avoid fragmentation overhead.
Suggested change:

```diff
-umapdf["umap_0"], umapdf["umap_1"] = (
-    umap_data[:, 0],
-    umap_data[:, 1],
-)
+# Defragment before assigning new columns and assign both at once
+umapdf = umapdf.copy()
+umapdf[["umap_0", "umap_1"]] = umap_data[:, :2]
```
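For context: pandas emits this PerformanceWarning when a DataFrame has accumulated many small column blocks, typically from repeated single-column inserts. The .copy() consolidates the existing blocks, and assigning both coordinate columns in one indexing operation avoids adding them one at a time.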
Computed UMAP embeddings from sampled JUMP data and visualized the UMAPs labeled by different categories.