Skip to content

PERF: Remove builtin type subclassing #232

@greglucas

Description

@greglucas

I was profiling some of our packet parsing and also looking into the Rust backend transition again.

Image

There is a lot of performance time spent on creating our underlying data due to our use of __new__ on the builtin types.

class _Parameter:
"""Mixin class for storing access to the raw value of a parsed data item.
The raw value is the closest representation of the data item as it appears in the packet.
e.g. bytes for binary data, int for integer data, etc. It has not been calibrated or
adjusted in any way and is an easy way for user's to debug the transformations that
happened after the fact.
Notes
-----
We need to override the __new__ method to store the raw value of the data item
on immutable built-in types. So this is just a way of allowing us to inject our
own attribute into the built-in types.
"""
def __new__(cls, value: BuiltinDataTypes, raw_value: BuiltinDataTypes = None) -> BuiltinDataTypes:
obj = super().__new__(cls, value)
# Default to the same value as the parsed value if it isn't provided
obj.raw_value = raw_value if raw_value is not None else value
return obj

This was added in v5, and followed the precedence of having the derived value and the raw value be "attached" to one another in the same class/struct, so that you could get one from the other. In v4, we had a ParsedDataItem that had all of this information associated with it.

class ParsedDataItem(xtcedef.AttrComparable):

My proposal is to remove the direct link between raw value and a derived value and rather move that logic into the Packet container itself. So, we would only return built-in types from our parsing functions, not our subclassed types. Rather than returning common.FloatItem(parsed_value) we would return float(parsed_value). The Packet dictionary (Maybe our own MutableMapping instead...) would then actually be a two-dictionary storage structure pointing to a derived mapping and a raw mapping. (something like having the get try the derived version first and then fallback to the raw data if there was no derived value to avoid storing duplicates in each mapping)

class Packet(dict):
    def __init__(self):
        # separate storage location for the second container
        self._derived_values = {}

    def __get__(self, key):
        # Try to return the derived value first, fallback to the raw if there was no derived version
        return self._derived_values.get(key, super().get(key))

    def get_raw(self, key):
        return super().get(key)

The other place this is relevant is if we try and call spp_type + spp_type, we lose the raw_values anyways in the conversion because it doesn't really make sense which raw_value you'd want to keep between the types or both etc. It opens up a bit of a can of worms in terms of combining things together.

The path forward

We need to implement the new container with the new methods for getting raw/derived values, so users have something in place that they know they should use. Along with this, we can add a raw_value property to the project's builtin subclasses that warns whenever the raw_value attribute is accessed.

Then, in a following release, we change over to using builtins directly and remove our subclasses.

Metadata

Metadata

Assignees

No one assigned

    Labels

    breakingThis involves a breaking changeenhancementNew feature or request

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions