Inconsistencies with R SummarizedExperiment and RangedSummarizedExperiment #77

@DiDeoxy

Making SummarizedExperiments

In Bioconductor, a RangedSummarizedExperiment always has row ranges: if you try to build one without rowRanges, you get a SummarizedExperiment back instead. I think the best way to handle this in BiocPy is to make row_ranges required in RangedSummarizedExperiment, and/or to provide a constructor function that returns either a SummarizedExperiment or a RangedSummarizedExperiment depending on whether row ranges are supplied.
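To illustrate the constructor-function idea, here is a minimal sketch. The classes and the `make_summarized_experiment` function are hypothetical stand-ins written for this example; they are not the BiocPy classes or their signatures.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class SummarizedExperimentSketch:
    """Hypothetical stand-in for SummarizedExperiment (not the BiocPy class)."""

    assays: dict


@dataclass
class RangedSummarizedExperimentSketch(SummarizedExperimentSketch):
    """Hypothetical stand-in in which row_ranges is mandatory."""

    row_ranges: Any = None

    def __post_init__(self):
        # Reject construction without ranges instead of silently allowing it.
        if self.row_ranges is None:
            raise ValueError("row_ranges is required for a ranged experiment")


def make_summarized_experiment(assays, row_ranges=None):
    """Return the plain class when no ranges are given, mirroring the R behavior."""
    if row_ranges is None:
        return SummarizedExperimentSketch(assays=assays)
    return RangedSummarizedExperimentSketch(assays=assays, row_ranges=row_ranges)
```

With something like this, calling the factory without ranges yields the plain experiment, while constructing the ranged class directly without ranges fails loudly.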

Row data in RangedSummarizedExperiment is incorrect:

In Bioconductor the rowData accessor calls the mcols accessor on the SummarizedExperiment: https://github.com/Bioconductor/SummarizedExperiment/blob/c5b7ca2f8d975af13a18b3b5931449f092657f5c/R/SummarizedExperiment-class.R#L121

In Bioconductor the mcols accessor of RangedSummarizedExperiment calls the mcols accessor of the rowRanges: https://github.com/Bioconductor/SummarizedExperiment/blob/c5b7ca2f8d975af13a18b3b5931449f092657f5c/R/RangedSummarizedExperiment-class.R#L215

Thus in Bioconductor the rowData of a RangedSummarizedExperiment is the mcols of the rowRanges.

In BiocPy, by contrast, the get_row_data method accesses the _rows BiocFrame attribute of the RangedSummarizedExperiment. Fixing this will likely require removing the row_data init parameter of RangedSummarizedExperiment. Alternatively, any supplied row_data could be column-concatenated onto the mcols of the required row_ranges.
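A rough sketch of the second option, where row data is just a view of the mcols and any supplied row_data is concatenated column-wise at construction. `RangesSketch` and `RangedExperimentSketch` are hypothetical stand-ins (plain dicts of columns rather than BiocFrame), and rejecting duplicate column names is an assumption about the desired behavior.

```python
class RangesSketch:
    """Hypothetical stand-in for a GenomicRanges-like object carrying mcols."""

    def __init__(self, names, mcols=None):
        self.names = list(names)
        # mcols: column name -> list of per-row values
        self.mcols = {k: list(v) for k, v in (mcols or {}).items()}


class RangedExperimentSketch:
    """Hypothetical stand-in: row data lives only in the ranges' mcols."""

    def __init__(self, row_ranges, row_data=None):
        # Copy the mcols so the caller's ranges object is not mutated.
        merged = {k: list(v) for k, v in row_ranges.mcols.items()}
        for column, values in (row_data or {}).items():
            if column in merged:
                raise ValueError(f"column {column!r} already exists in mcols")
            if len(values) != len(row_ranges.names):
                raise ValueError(f"column {column!r} has the wrong number of rows")
            merged[column] = list(values)
        self.row_ranges = RangesSketch(row_ranges.names, merged)

    @property
    def row_data(self):
        # Single source of truth: the mcols of the row ranges.
        return self.row_ranges.mcols
```

Because `row_data` has no storage of its own here, it cannot drift out of sync with the mcols of the row ranges.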

There are 4+ row indices in RangedSummarizedExperiment

This relates to the issue raised about GenomicRanges: BiocPy/GenomicRanges#121

The IRanges of the GenomicRanges has a names index, the GenomicRanges object itself has a names index, the mcols of the GenomicRanges has a row_names index, and the row_data of the RangedSummarizedExperiment has a row_names index. None of these are cross-validated to ensure they hold the same values in the same order, which will likely confuse users in the future. Which one of these is the correct one? It would be good to resolve this upon object construction.
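One way the check at construction could look, as a sketch only: `check_row_indices` is a hypothetical helper operating on plain sequences, and treating a missing (None) index as "nothing to check" is an assumption.

```python
def check_row_indices(indices):
    """Cross-validate several row-name indices.

    `indices` maps a label (e.g. "ranges.names") to a sequence of names,
    or to None when that index is unset. Returns the agreed-upon names,
    or None if every index is unset.
    """
    present = {
        label: list(names) for label, names in indices.items() if names is not None
    }
    if not present:
        return None

    # Use the first index that is set as the reference for all the others.
    ref_label, reference = next(iter(present.items()))
    for label, names in present.items():
        if names != reference:  # list equality checks both values and order
            raise ValueError(f"{label} does not match {ref_label}")
    return reference
```

A constructor could call this with the four indices listed above and either raise on a mismatch or propagate the returned names to every index that is unset.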