65 changes: 65 additions & 0 deletions docs/user_guides/fs/feature_group/on_demand_transformations.md
Original file line number Diff line number Diff line change
Expand Up @@ -254,3 +254,68 @@ On-demand transformation functions can also be accessed and executed as normal f
](feature_vector["transaction_time"], datetime.now())

```

## Testing On-Demand Transformations Locally

Hopsworks allows you to test on-demand transformations locally without requiring a connection to the Hopsworks platform.
This is useful for validating transformation logic before deploying it to production.

### Testing individual on-demand transformation functions

Individual on-demand transformation functions can be accessed by name from the feature group and tested using the `execute` or `executor` methods.
Refer to the [Testing Transformation Functions](../transformation_functions.md#testing-transformation-functions) guide for more details.

=== "Python"
!!! example "Accessing and testing an individual on-demand transformation function from a feature group"
```python
from datetime import datetime

import pandas as pd

# Access the transformation function by name
transaction_age_udf = fg["transaction_age"]

# Quick test
result = transaction_age_udf.execute(
    pd.Series([datetime(2023, 1, 1)]),
    pd.Series([datetime(2023, 6, 1)]),
)
```

### Testing all on-demand transformations on a feature group

The `execute_odts` method on a feature group applies all attached on-demand transformations to the provided data.
This allows you to test the complete on-demand transformation pipeline locally.

=== "Python"
!!! example "Testing on-demand transformations on a feature group with a DataFrame"
```python
import hopsworks
import pandas as pd

@hopsworks.udf(return_type=float)
def compute_ratio(amount, quantity):
    return amount / quantity

fg = fs.get_or_create_feature_group(
    name="transactions",
    version=1,
    primary_key=["pk"],
    transformation_functions=[compute_ratio("amount", "quantity")],
)

# Test with a DataFrame (offline mode)
test_df = pd.DataFrame({
    "amount": [100.0, 200.0, 300.0],
    "quantity": [2, 4, 5],
})
result_df = fg.execute_odts(test_df)
```
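
When asserting on the output of `execute_odts`, it can help to have reference values computed independently of Hopsworks. The sketch below repeats the `compute_ratio` arithmetic in plain pandas on the same test data. The name `expected_ratio` is an illustrative assumption, and the name of the output column in `result_df` should be checked against the actual result rather than inferred from this sketch.

```python
import pandas as pd

test_df = pd.DataFrame({
    "amount": [100.0, 200.0, 300.0],
    "quantity": [2, 4, 5],
})

# Same arithmetic as compute_ratio, applied column-wise without Hopsworks.
expected_ratio = test_df["amount"] / test_df["quantity"]

print(expected_ratio.tolist())  # [50.0, 50.0, 60.0]
```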

=== "Python"
!!! example "Testing on-demand transformations on a feature group with a dictionary"
```python
# Test with a dictionary (simulating online inference)
test_dict = {"amount": 100.0, "quantity": 2}
result_dict = fg.execute_odts(test_dict, online=True)
```

The `execute_odts` method accepts the following parameters:

- **`data`**: Input data as a `pd.DataFrame`, `pl.DataFrame`, or `dict[str, Any]`.
- **`online`**: Whether to execute in online mode (single values) or offline mode (batch). Defaults to offline mode.
- **`transformation_context`**: A dictionary (or list of dictionaries for batch) mapping variable names to contextual values accessible via the `context` parameter in transformation functions.
- **`request_parameters`**: A dictionary (or list of dictionaries for batch) of request parameters. These take highest priority when resolving feature values.
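
The priority rule for `request_parameters` can be pictured as a dictionary merge in which request parameters are applied last. This is a plain-Python illustration of the idea only, not the Hopsworks implementation; the helper `resolve_inputs` and the sample values are hypothetical.

```python
from typing import Any


def resolve_inputs(data: dict[str, Any], request_parameters: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: values in request_parameters override those in data."""
    # Later mappings win in a dict merge, so request parameters take priority.
    return {**data, **request_parameters}


data = {"amount": 100.0, "quantity": 2}
request_parameters = {"quantity": 4}

resolved = resolve_inputs(data, request_parameters)
print(resolved)  # {'amount': 100.0, 'quantity': 4}
```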
Expand Up @@ -164,3 +164,94 @@ To achieve this, set the `transform` parameter to False.
transform=False
)
```

## Testing Transformations Locally

Hopsworks allows you to test transformations attached to a feature view locally without requiring a connection to the Hopsworks platform.
This is useful for validating transformation logic before deploying it to production.

### Accessing transformation functions by name

Transformation functions attached to a feature view can be accessed by name using dictionary-style or attribute-style access.
This returns the underlying `HopsworksUdf` object, which can be tested using the `execute` or `executor` methods described in the [Testing Transformation Functions](../transformation_functions.md#testing-transformation-functions) guide.

=== "Python"
!!! example "Accessing and testing an individual transformation function from a feature view"
```python
# Access via dictionary-style syntax
normalize_udf = fv["normalize"]

# Or access via attribute-style syntax
normalize_udf = fv.normalize

# Test with mocked statistics
executor = normalize_udf.executor(statistics={"amount": {"mean": 100.0, "std_dev": 25.0}})
result = executor.execute(pd.Series([100.0, 125.0, 150.0]))
```

### Testing model-dependent transformations locally

The `execute_mdts` method applies all model-dependent transformations attached to the feature view to the provided data.
This method requires that training data statistics have been initialized first by calling one of `create_training_data`, `init_batch_scoring`, or `init_serving`.

=== "Python"
!!! example "Testing model-dependent transformations on a feature view with a DataFrame"
```python
import pandas as pd
from hopsworks import udf
from hopsworks.transformation_statistics import TransformationStatistics

@udf(return_type=float)
def normalize(amount, statistics=TransformationStatistics("amount")):
    return (amount - statistics.amount.mean) / statistics.amount.std_dev

fv = fs.get_or_create_feature_view(
    name="transactions_fv",
    version=1,
    query=fg.select_features(),
    transformation_functions=[normalize("amount")],
)

# Initialize statistics by creating training data
# (create_training_data returns the training dataset version and the job)
version, job = fv.create_training_data()

# Test with a DataFrame (offline mode)
test_df = pd.DataFrame({"amount": [100.0, 200.0, 300.0]})
result_df = fv.execute_mdts(test_df)
```
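
The reason statistics must be initialized first is that a model-dependent transformation uses statistics of the training data, not of the data being transformed. The plain-pandas sketch below illustrates this with a stand-in training frame; `train_df` and its values are assumptions made for the example, and whether Hopsworks computes a sample or population standard deviation is not stated in this guide, so the `ddof=1` convention used here is also an assumption.

```python
import pandas as pd

# Stand-in for the training data whose statistics drive the transformation.
train_df = pd.DataFrame({"amount": [50.0, 100.0, 150.0]})
mean = train_df["amount"].mean()
std_dev = train_df["amount"].std()  # pandas default: sample std (ddof=1)

# Data to transform; its own mean and std are deliberately not used.
test_df = pd.DataFrame({"amount": [100.0, 200.0, 300.0]})
expected = (test_df["amount"] - mean) / std_dev

print(expected.tolist())
```

Values computed this way can serve as a reference when checking the output of `execute_mdts`, provided the statistics match those Hopsworks computed for the training dataset.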

=== "Python"
!!! example "Testing model-dependent transformations simulating online inference"
```python
# Test with a dictionary (simulating online inference)
test_dict = {"amount": 100.0}
result_dict = fv.execute_mdts(test_dict, online=True)
```

The `execute_mdts` method accepts the following parameters:

- **`data`**: Input data as a `pd.DataFrame`, `pl.DataFrame`, or `dict[str, Any]`.
- **`online`**: Whether to execute in online mode (single values) or offline mode (batch). Defaults to offline mode.
- **`transformation_context`**: A dictionary (or list of dictionaries for batch) mapping variable names to contextual values accessible via the `context` parameter in transformation functions.
- **`request_parameters`**: A dictionary (or list of dictionaries for batch) of request parameters. These take highest priority when resolving feature values.

### Testing on-demand transformations locally

If the feature view includes on-demand features from its underlying feature groups, you can test those transformations using the `execute_odts` method.
This method applies all on-demand transformations attached to the feature view to the provided data.

=== "Python"
!!! example "Testing on-demand transformations on a feature view"
```python
# Test with a DataFrame (offline mode)
test_df = pd.DataFrame({
    "amount": [100.0, 200.0, 300.0],
    "quantity": [2, 4, 5],
})
result_df = fv.execute_odts(test_df)

# Test with a dictionary (simulating online inference)
test_dict = {"amount": 100.0, "quantity": 2}
result_dict = fv.execute_odts(test_dict, online=True)
```

The `execute_odts` method accepts the same parameters as `execute_mdts` described above.
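
To exercise both modes with the same test data, each DataFrame row can be turned into a dictionary with pandas' `DataFrame.to_dict(orient="records")`. The sketch below only prepares the inputs and makes no Hopsworks calls. Passing each dictionary to `execute_odts(..., online=True)` in a loop is one possible way to use the result, not something this guide prescribes.

```python
import pandas as pd

test_df = pd.DataFrame({
    "amount": [100.0, 200.0, 300.0],
    "quantity": [2, 4, 5],
})

# One dictionary per row, keyed by column name.
test_dicts = test_df.to_dict(orient="records")

print(test_dicts[0])
```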
135 changes: 135 additions & 0 deletions docs/user_guides/fs/transformation_functions.md
Expand Up @@ -308,6 +308,141 @@ If only the `name` is provided, then the version will default to 1.
plus_one_fn = fs.get_transformation_function(name="plus_one", version=2)
```

## Testing Transformation Functions

Hopsworks provides built-in support for unit testing transformation functions locally, without requiring a connection to the Hopsworks platform.
This enables you to validate your transformation logic before deploying it to feature groups or feature views.

### Quick testing with `execute`

The `execute` method provides a convenient way to quickly test simple transformation functions that do not require statistics or context variables.
It executes the transformation function in offline mode (batch processing).

=== "Python"
!!! example "Quick testing of a transformation function"
```python
from hopsworks import udf
import pandas as pd

@udf(return_type=float)
def add_one(value):
    return value + 1

# Direct execution for simple tests
result = add_one.execute(pd.Series([1.0, 2.0, 3.0]))
assert result.tolist() == [2.0, 3.0, 4.0]
```

### Advanced testing with `executor`

The `executor` method creates a reusable callable object for testing transformation functions that require statistics, context variables, or need to be tested in a specific execution mode.

The `executor` method accepts three optional parameters:

- **`statistics`**: Mock statistics for model-dependent transformations. Accepts three formats: a `TransformationStatistics` object, a `dict[str, dict[str, Any]]` mapping feature names to statistics, or a `list[FeatureDescriptiveStatistics]`.
- **`context`**: A dictionary of contextual variables passed to the transformation function at runtime.
- **`online`**: Whether to execute in online mode (single values) or offline mode (batch/vectorized). Only relevant for transformation functions using the `default` execution mode. Defaults to `False` (offline).

=== "Python"
!!! example "Testing a transformation function with mocked statistics"
```python
from hopsworks import udf
from hopsworks.transformation_statistics import TransformationStatistics
import pandas as pd

@udf(return_type=float)
def normalize(value, statistics=TransformationStatistics("value")):
    return (value - statistics.value.mean) / statistics.value.std_dev

# Test with mock statistics provided as a dictionary
executor = normalize.executor(statistics={"value": {"mean": 100.0, "std_dev": 25.0}})
result = executor.execute(pd.Series([100.0, 125.0, 150.0]))
assert result.tolist() == [0.0, 1.0, 2.0]
```

=== "Python"
!!! example "Testing a transformation function with context variables"
```python
from hopsworks import udf
import pandas as pd

@udf(return_type=float)
def apply_discount(price, context):
    return price * (1 - context["discount_rate"])

executor = apply_discount.executor(context={"discount_rate": 0.1})
result = executor.execute(pd.Series([100.0, 200.0]))
assert result.tolist() == [90.0, 180.0]
```

### Testing online and offline execution modes

Transformation functions using the `default` execution mode are executed as Pandas UDFs during batch processing and as Python UDFs during online inference.
The `executor` method allows you to test both modes by setting the `online` parameter.

=== "Python"
!!! example "Testing both online and offline execution modes"
```python
from hopsworks import udf
import pandas as pd

@udf(return_type=float)
def double_value(value):
    return value * 2

# Offline mode (batch processing with Pandas Series)
offline_executor = double_value.executor(online=False)
batch_result = offline_executor.execute(pd.Series([1.0, 2.0, 3.0]))

# Online mode (single value processing)
online_executor = double_value.executor(online=True)
single_result = online_executor.execute(5.0)
assert single_result == 10.0
```

!!! note
For transformation functions whose `mode` is set to `python` or `pandas`, the `online` parameter has no effect, because those modes always execute as the specified UDF type.
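
Because a `default`-mode function runs on a Pandas Series in batch and on single values online, its body has to give the same answer either way. The sketch below checks that parity on the undecorated function body, so it runs without Hopsworks; it is an illustration of the property worth testing, and the same comparison could then be repeated with the offline and online executors.

```python
import pandas as pd


def double_value(value):
    # Works on both a pandas Series (batch) and a scalar (single value).
    return value * 2


batch_input = pd.Series([1.0, 2.0, 3.0])
batch_result = double_value(batch_input)

# Apply the same logic one value at a time, as online inference would.
single_results = [double_value(v) for v in batch_input.tolist()]

assert batch_result.tolist() == single_results
```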

### Accessing transformation functions by name

Transformation functions attached to feature views and feature groups can be accessed by name using dictionary-style or attribute-style access.
This is useful for testing individual transformation functions in isolation.

=== "Python"
!!! example "Accessing transformation functions from a feature view"
```python
# Access via dictionary-style syntax
normalize_udf = fv["normalize"]

# Access via attribute-style syntax
normalize_udf = fv.normalize

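# Test the accessed transformation function with mocked statistics,
# since `normalize` depends on training data statistics
executor = normalize_udf.executor(statistics={"amount": {"mean": 100.0, "std_dev": 25.0}})
result = executor.execute(pd.Series([100.0, 125.0, 150.0]))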
```

=== "Python"
!!! example "Accessing transformation functions from a feature group"
```python
from datetime import datetime

import pandas as pd

# Access via dictionary-style syntax
transaction_age_udf = fg["transaction_age"]

# Access via attribute-style syntax
transaction_age_udf = fg.transaction_age

# Test the accessed transformation function
result = transaction_age_udf.execute(
    pd.Series([datetime(2023, 1, 1)]),
    pd.Series([datetime(2023, 6, 1)]),
)
```

### Testing transformations attached to feature groups and feature views

In addition to testing individual transformation functions, you can test all transformations attached to a feature group or feature view at once using the `execute_odts` and `execute_mdts` methods.
These methods are described in their respective guides:

- [Testing on-demand transformations on feature groups](./feature_group/on_demand_transformations.md#testing-on-demand-transformations-locally)
- [Testing on-demand transformations on feature views](./feature_view/model-dependent-transformations.md#testing-on-demand-transformations-locally)
- [Testing model-dependent transformations on feature views](./feature_view/model-dependent-transformations.md#testing-model-dependent-transformations-locally)

## Using transformation functions

Transformation functions can be used by attaching them to a feature view to [create model-dependent transformations](./feature_view/model-dependent-transformations.md) or to a feature group to [create on-demand transformations](./feature_group/on_demand_transformations.md).