[Bug]: logic error in json_utils.py #37575

@atheendre130505

What happened?

Description: Beam YAML's JSON schema compatibility validation for objects is effectively disabled due to a logic error in json_utils.py.

The function _validate_compatible (used by row_validator) checks whether a Beam schema is compatible with a provided JSON schema. However, it contains several simple coding bugs:

  • It compares the weak_schema dictionary directly to the string 'object' instead of checking its type field.
  • It attempts to unpack a dictionary during iteration without calling .items().
  • It uses improper string formatting for error messages, leading to unhelpful or crashing error reports.

As a result, Validate transforms in Beam YAML may silently proceed even when schemas are fundamentally incompatible, or fail with distracting internal tracebacks.
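
To make the first two items concrete, here is a minimal standalone illustration in plain Python. It does not import Beam and does not quote json_utils.py; the variable and function names are made up for the demonstration. The third item (string formatting) is left out because the offending line is not quoted in this report.

```python
# Standalone illustration of the two patterns described above.
# Nothing here is copied from json_utils.py; all names are hypothetical.
weak_schema = {'type': 'object', 'properties': {'f': {'type': 'integer'}}}

# 1. A dict never compares equal to a string, so a branch guarded by
#    `weak_schema == 'object'` is never entered.
branch_taken_buggy = weak_schema == 'object'
branch_taken_fixed = weak_schema['type'] == 'object'


# 2. Iterating a dict yields its keys only. Unpacking each key into two
#    names raises ValueError unless the key happens to have length 2.
def unpack_without_items(properties):
    return [(name, spec) for name, spec in properties]


try:
    unpack_without_items(weak_schema['properties'])
    unpack_error = None
except ValueError as exn:
    unpack_error = exn

# Calling .items() yields the (name, spec) pairs the loop expects.
pairs_fixed = list(weak_schema['properties'].items())

print(branch_taken_buggy, branch_taken_fixed)
print(type(unpack_error).__name__)
print(pairs_fixed)
```

Because the first comparison is always false, the second problem stays hidden: the loop over properties is never reached, which matches the silent success described above.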

Beam Version: 2.61.x (Python SDK)

Steps to reproduce
Run the following Python snippet. It should raise a ValueError about incompatible types ('string' vs 'integer'), but it currently completes successfully because the validation logic is skipped.

from apache_beam.yaml import json_utils
from apache_beam.portability.api import schema_pb2
from apache_beam.typehints import schemas

# A schema with a string field
beam_schema = schema_pb2.Schema(fields=[
    schemas.schema_field('f', schema_pb2.STRING)
])

# An incompatible JSON schema expecting an integer for the same field
json_schema = {
    'type': 'object',
    'properties': {
        'f': {'type': 'integer'}
    }
}

# This SHOULD fail, but silently succeeds due to logic error in json_utils.py
json_utils.row_validator(beam_schema, json_schema)
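
For comparison, the expected behavior can be sketched without Beam as a recursive check over two plain JSON-schema dicts. This is only a sketch under assumptions, not the code in json_utils.py: check_compatible and its path argument are hypothetical names, the Beam schema is assumed to have already been converted to an equivalent JSON-schema dict, and the sketch covers only type mismatches (not required or additional properties).

```python
def check_compatible(weak, strong, path='$'):
    """Raise ValueError if the two JSON-schema dicts disagree on a type.

    Hypothetical helper for illustration; `weak` is the user-provided
    schema and `strong` is assumed to be derived from the Beam schema.
    """
    if not weak:
        return  # An empty schema places no constraints.
    weak_type = weak.get('type')
    strong_type = strong.get('type')
    if weak_type != strong_type:
        raise ValueError(
            f'Incompatible types at {path}: {weak_type!r} vs {strong_type!r}')
    if weak_type == 'array':
        check_compatible(
            weak.get('items', {}), strong.get('items', {}), f'{path}[]')
    elif weak_type == 'object':  # Compare the type field, not the dict.
        strong_props = strong.get('properties', {})
        # .items() yields (name, spec) pairs; a bare dict yields keys only.
        for name, spec in weak.get('properties', {}).items():
            if name in strong_props:
                check_compatible(spec, strong_props[name], f'{path}.{name}')


# Plain-dict stand-ins for the schemas in the reproduction above.
from_beam = {'type': 'object', 'properties': {'f': {'type': 'string'}}}
user_json = {'type': 'object', 'properties': {'f': {'type': 'integer'}}}

try:
    check_compatible(user_json, from_beam)
    error = None
except ValueError as exn:
    error = str(exn)

print(error)
```

With the object branch actually reached, the mismatch on field f surfaces as a ValueError that names the field, rather than passing silently.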

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
