-
Notifications
You must be signed in to change notification settings - Fork 8
Description
I was profiling some of our packet parsing and also looking into the Rust backend transition again.
There is a lot of performance time spent on creating our underlying data due to our use of __new__ on the builtin types.
space_packet_parser/space_packet_parser/common.py
Lines 212 to 231 in 073007e
| class _Parameter: | |
| """Mixin class for storing access to the raw value of a parsed data item. | |
| The raw value is the closest representation of the data item as it appears in the packet. | |
| e.g. bytes for binary data, int for integer data, etc. It has not been calibrated or | |
| adjusted in any way and is an easy way for user's to debug the transformations that | |
| happened after the fact. | |
| Notes | |
| ----- | |
| We need to override the __new__ method to store the raw value of the data item | |
| on immutable built-in types. So this is just a way of allowing us to inject our | |
| own attribute into the built-in types. | |
| """ | |
| def __new__(cls, value: BuiltinDataTypes, raw_value: BuiltinDataTypes = None) -> BuiltinDataTypes: | |
| obj = super().__new__(cls, value) | |
| # Default to the same value as the parsed value if it isn't provided | |
| obj.raw_value = raw_value if raw_value is not None else value | |
| return obj |
This was added in v5, and followed the precedence of having the derived value and the raw value be "attached" to one another in the same class/struct, so that you could get one from the other. In v4, we had a ParsedDataItem that had all of this information associated with it.
| class ParsedDataItem(xtcedef.AttrComparable): |
My proposal is to remove the direct link between raw value and a derived value and rather move that logic into the Packet container itself. So, we would only return built-in types from our parsing functions, not our subclassed types. Rather than returning common.FloatItem(parsed_value) we would return float(parsed_value). The Packet dictionary (Maybe our own MutableMapping instead...) would then actually be a two-dictionary storage structure pointing to a derived mapping and a raw mapping. (something like having the get try the derived version first and then fallback to the raw data if there was no derived value to avoid storing duplicates in each mapping)
class Packet(dict):
def __init__(self):
# separate storage location for the second container
self._derived_values = {}
def __get__(self, key):
# Try to return the derived value first, fallback to the raw if there was no derived version
return self._derived_values.get(key, super().get(key))
def get_raw(self, key):
return super().get(key)The other place this is relevant is if we try and call spp_type + spp_type, we lose the raw_values anyways in the conversion because it doesn't really make sense which raw_value you'd want to keep between the types or both etc. It opens up a bit of a can of worms in terms of combining things together.
The path forward
We need to implement the new container with the new methods for getting raw/derived values, so users have something in place that they know they should use. Along with this, we can add a raw_value property to the project's builtin subclasses that warns whenever the raw_value attribute is accessed.
Then, in a following release, we change over to using builtins directly and remove our subclasses.