feat(report): add machine-readable JSON output for -out=report #2020

x15sr71 · 2026-01-14T21:58:35Z

In raising this pull request, I confirm the following (please check boxes):

I have read and understood the contributors guide.
I have checked that another pull request for this purpose does not exist.
I have considered, and confirmed that this submission will be valuable to others.
I accept that this submission may not be used, and the pull request closed at the will of the maintainer.
I give this submission freely, and claim no ownership to its content.
I have mentioned this change in the changelog.

My familiarity with the project is as follows (check one):

I have never used CCExtractor.
I have used CCExtractor just a couple of times.
I absolutely love CCExtractor, but have not contributed previously.
I am an active contributor to CCExtractor.

Summary

This PR implements machine-readable JSON output for the -out=report feature, addressing issue #1399. Users can now generate structured reports and parse them with tools like jq, so the report can be consumed directly by scripts and automated workflows.

Background

Currently, CCExtractor’s report output is human-readable text that requires custom parsing for automation. While other media analysis tools such as ffprobe and mediainfo provide JSON output, structured closed-caption reporting is not consistently available across tools or versions. This feature enables CCExtractor to expose its existing report data in a structured JSON format.

Use case: Users running CCExtractor in automated environments (e.g., CI/CD pipelines, media processing workflows) need to programmatically determine if streams contain closed captions without writing custom parsers.

Changes

`-out=report` Option

ccextractor -out=report input.ts

Existing Text Output (-out=report)

File: ../20251206ch29FullTS.ts
Stream Mode: Transport Stream
Program Count: 5
Program Numbers: 1 2 3 4 5
PID: 49, Program: 1, MPEG-2 video
PID: 52, Program: 1, AC3 audio
PID: 53, Program: 1, AC3 audio
PID: 65, Program: 2, MPEG-2 video
PID: 68, Program: 2, AC3 audio
PID: 81, Program: 3, MPEG-2 video
PID: 84, Program: 3, AC3 audio
PID: 97, Program: 4, MPEG-2 video
PID: 100, Program: 4, AC3 audio
PID: 113, Program: 5, MPEG-2 video
PID: 116, Program: 5, AC3 audio
//////// Program #5: ////////
DVB Subtitles: No
Teletext: No
ATSC Closed Caption: Yes
EIA-608: Yes
XDS: No
CC1: Yes
CC2: No
CC3: No
CC4: No
CEA-708: Yes
Services: 1 2 3 4 5 6 9
Primary Language Present: Yes
Secondary Language Present: Yes
Width: 704
Height: 480
Aspect Ratio: 03 - 16:9
Frame Rate: 04 - 29.97


(More programs omitted for brevity)

JSON Output Structure (v1.0)

The output follows a versioned JSON report structure:

JSON output via `--report-format json`

ccextractor --report-format json -out=report input.ts

{
  "schema": {
    "name": "ccextractor-report",
    "version": "1.0"
  },
  "input": {
    "source": "file",
    "path": "../20251206ch29FullTS.ts"
  },
  "stream": {
    "mode": "Transport Stream",
    "program_count": 5,
    "program_numbers": [
      1,
      2,
      3,
      4,
      5
    ],
    "pids": [
      {
        "pid": 49,
        "program_number": 1,
        "codec": "MPEG-2 video"
      },
      {
        "pid": 52,
        "program_number": 1,
        "codec": "AC3 audio"
      },
      {
        "pid": 53,
        "program_number": 1,
        "codec": "AC3 audio"
      },
      {
        "pid": 65,
        "program_number": 2,
        "codec": "MPEG-2 video"
      },
      {
        "pid": 68,
        "program_number": 2,
        "codec": "AC3 audio"
      },
      {
        "pid": 81,
        "program_number": 3,
        "codec": "MPEG-2 video"
      },
      {
        "pid": 84,
        "program_number": 3,
        "codec": "AC3 audio"
      },
      {
        "pid": 97,
        "program_number": 4,
        "codec": "MPEG-2 video"
      },
      {
        "pid": 100,
        "program_number": 4,
        "codec": "AC3 audio"
      },
      {
        "pid": 113,
        "program_number": 5,
        "codec": "MPEG-2 video"
      },
      {
        "pid": 116,
        "program_number": 5,
        "codec": "AC3 audio"
      }
    ]
  },
  "programs": [
    {
      "program_number": 1,
      "summary": {
        "has_any_captions": true,
        "has_608": true,
        "has_708": true
      },
      "services": {
        "dvb_subtitles": false,
        "teletext": false,
        "atsc_closed_caption": true
      },
      "captions": {
        "present": true,
        "eia_608": {
          "present": true,
          "xds": false,
          "channels": {
            "cc1": true,
            "cc2": false,
            "cc3": false,
            "cc4": false
          }
        },
        "cea_708": {
          "present": true,
          "services": [
            1,
            2,
            3,
            4,
            5,
            6,
            9
          ]
        }
      },
      "video": {
        "width": 1920,
        "height": 1080,
        "aspect_ratio": "03 - 16:9",
        "frame_rate": "04 - 29.97"
      }
    },

(More programs omitted for brevity)
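As a reading aid, the `stream.pids` array in the sample above can be regrouped per program on the consumer side. The Python sketch below assumes only the field names shown in the sample; the helper name `pids_by_program` is made up for illustration and is not part of this PR.

```python
import json
from collections import defaultdict

# Trimmed copy of the sample report above (three of its PID entries).
SAMPLE = """
{
  "schema": {"name": "ccextractor-report", "version": "1.0"},
  "stream": {
    "mode": "Transport Stream",
    "pids": [
      {"pid": 49, "program_number": 1, "codec": "MPEG-2 video"},
      {"pid": 52, "program_number": 1, "codec": "AC3 audio"},
      {"pid": 65, "program_number": 2, "codec": "MPEG-2 video"}
    ]
  }
}
"""


def pids_by_program(report):
    """Group stream.pids entries by program_number (hypothetical helper)."""
    grouped = defaultdict(list)
    for entry in report.get("stream", {}).get("pids", []):
        grouped[entry["program_number"]].append((entry["pid"], entry["codec"]))
    return dict(grouped)


report = json.loads(SAMPLE)
for program, pids in sorted(pids_by_program(report).items()):
    print(program, pids)
```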

Schema Notes

The JSON schema is intentionally descriptive rather than prescriptive.
Field presence and values depend on the input container, stream type, and available metadata.
Codec strings reflect CCExtractor's internal stream type descriptions and are container-dependent (e.g., "AC3 audio" vs "AC3").
The services object under programs[] indicates which captioning systems are present (DVB, Teletext, ATSC), while captions.cea_708.services[] lists active CEA-708 caption service numbers.

Program Ordering:

JSON output: Programs are sorted in ascending order by program number (1, 2, 3, 4, 5) for predictable parsing
Text output: Programs are displayed in descending order (5, 4, 3, 2, 1) as they're processed
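Because the two formats order programs differently, a consumer is probably safer looking programs up by `program_number` than by array position. A minimal Python sketch, assuming the `programs[]` layout shown earlier; `index_programs` is a hypothetical helper and the report values are illustrative, not taken from the sample file:

```python
def index_programs(report):
    """Map program_number -> program object (hypothetical helper)."""
    return {p["program_number"]: p for p in report.get("programs", [])}


# Illustrative two-program report; values are made up for the example.
report = {
    "programs": [
        {"program_number": 1, "summary": {"has_any_captions": True}},
        {"program_number": 2, "summary": {"has_any_captions": False}},
    ]
}

by_number = index_programs(report)
print(by_number[2]["summary"]["has_any_captions"])
```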

Text Output Field	JSON Field
File	`input.path`
Stream Mode	`stream.mode`
Program Count	`stream.program_count`
Program Numbers	`stream.program_numbers[]`
PID: X, Program: Y, Codec	`stream.pids[]`
DVB Subtitles	`programs[].services.dvb_subtitles`
Teletext	`programs[].services.teletext`
ATSC Closed Caption	`programs[].services.atsc_closed_caption`
EIA-608	`programs[].captions.eia_608.present`
XDS	`programs[].captions.eia_608.xds`
CC1..CC4	`programs[].captions.eia_608.channels.*`
CEA-708	`programs[].captions.cea_708.present`
Services	`programs[].captions.cea_708.services[]`
Primary Language Present	(not in JSON)
Secondary Language Present	(not in JSON)
Width / Height	`programs[].video.width`, `programs[].video.height`
Aspect Ratio	`programs[].video.aspect_ratio`
Frame Rate	`programs[].video.frame_rate`
MPEG-4 Timed Text	`container.mp4.timed_text_tracks`
(JSON-only)	`schema.*`
(JSON-only)	`programs[].summary.*`
(JSON-only)	`programs[].captions.present`
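Since field presence depends on the input, a consumer may want to resolve the dotted paths from this table defensively. The sketch below is one way to do that in Python; `resolve` is a hypothetical helper, it handles plain object keys only, and the `programs[]` paths are applied to a single program object rather than to the array.

```python
def resolve(node, path, default=None):
    """Follow a dotted path such as 'stream.program_count'.

    Returns `default` as soon as a key is missing or a non-object is reached.
    """
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


# Trimmed report using field names from the sample output.
report = {
    "stream": {"program_count": 5},
    "programs": [
        {
            "program_number": 1,
            "captions": {"eia_608": {"xds": False, "channels": {"cc1": True}}},
        }
    ],
}

program = report["programs"][0]
print(resolve(report, "stream.program_count"))
print(resolve(program, "captions.eia_608.channels.cc1"))
# Absent for this input, so the default is returned instead of raising.
print(resolve(report, "container.mp4.timed_text_tracks"))
```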

Key Features:

Structured, machine-readable JSON output for -out=report
Versioned schema (v1.0) for future extensibility
Backward compatible (existing text report remains the default)
Caption presence reporting for:
- ATSC Closed Captions (EIA-608 / CEA-708)
- DVB subtitles (presence flag)
- Teletext (presence flag)
- Note: the has_any_captions summary field reflects EIA-608 / CEA-708 only.
Program-level summary fields for fast closed-caption automation checks
PID and codec metadata per program (preserving CCExtractor’s existing codec string formats)
Guarded video metadata (emitted only when valid)
Multi-program stream support with deterministic ordering
Container-level metadata when available (e.g., MP4 timed text track count)
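Because `has_any_captions` covers EIA-608 / CEA-708 only, a consumer that also cares about DVB subtitles or Teletext would need to combine it with the `services` flags. A minimal Python sketch of that check, assuming the field names from the sample output; `has_any_subtitles` is a hypothetical helper and the DVB-only program is illustrative data:

```python
def has_any_subtitles(program):
    """True if the program reports 608/708 captions, DVB subtitles, or Teletext."""
    summary = program.get("summary", {})
    services = program.get("services", {})
    return bool(
        summary.get("has_any_captions")
        or services.get("dvb_subtitles")
        or services.get("teletext")
    )


# Illustrative program that carries DVB subtitles but no 608/708 captions.
dvb_only = {
    "program_number": 1,
    "summary": {"has_any_captions": False, "has_608": False, "has_708": False},
    "services": {"dvb_subtitles": True, "teletext": False, "atsc_closed_caption": False},
}

print(has_any_subtitles(dvb_only))
```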

Technical Approach

JSON generation is implemented in C and reads CCExtractor’s existing internal data structures; caption extraction and decoding logic are not modified.
String values are escaped to ensure valid JSON output.
Format selection uses case-insensitive comparison (strcasecmp / _stricmp).
Memory allocation and cleanup follow existing project patterns.
Programs are sorted by program number to provide stable and predictable output.
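The PR's escaping code is written in C and is not reproduced here. To illustrate what "properly escaped" means for a value such as `input.path`, the Python sketch below applies the JSON string rules (quote, backslash, and control characters below U+0020) and checks the result by parsing it back. The function name and the example path are hypothetical.

```python
import json

# Two-character escapes defined by the JSON grammar.
_SHORT_ESCAPES = {
    '"': '\\"',
    "\\": "\\\\",
    "\b": "\\b",
    "\f": "\\f",
    "\n": "\\n",
    "\r": "\\r",
    "\t": "\\t",
}


def escape_json_string(text):
    """Escape text for use between double quotes in a JSON document."""
    out = []
    for ch in text:
        if ch in _SHORT_ESCAPES:
            out.append(_SHORT_ESCAPES[ch])
        elif ord(ch) < 0x20:
            # Remaining control characters must use the \uXXXX form.
            out.append("\\u%04x" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)


# Hypothetical Windows-style path containing backslashes and quotes.
path = 'C:\\captures\\"evening" news.ts'
literal = '"' + escape_json_string(path) + '"'
print(literal)
print(json.loads(literal) == path)
```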

Example Testing Commands

# Test JSON output
ccextractor --report-format json -out=report sample.ts | jq .

# Verify caption presence
ccextractor --report-format json -out=report sample.ts | jq '.programs[0].summary.has_any_captions'

# Extract specific caption channels
ccextractor --report-format json -out=report sample.ts | jq '.programs[].captions.eia_608.channels'

# Check which CC channels are active
ccextractor --report-format json -out=report sample.ts | jq '.programs[].captions.eia_608.channels | to_entries | map(select(.value == true)) | .[].key'

# Get video dimensions
ccextractor --report-format json -out=report sample.ts | jq '.programs[].video | select(. != null) | {width, height}'

# Default text format still works
ccextractor -out=report sample.ts
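The jq checks above can also be done from a script. The Python sketch below is one possible approach, not part of the PR: `run_report` invokes the command line shown above and assumes that stdout carries only the JSON document (as the jq pipelines imply), while the two query helpers mirror the caption-presence and active-channel queries. All function names are hypothetical.

```python
import json
import subprocess


def run_report(path, binary="ccextractor"):
    """Run the JSON report and parse it. Assumes stdout contains only JSON."""
    result = subprocess.run(
        [binary, "--report-format", "json", "-out=report", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


def programs_with_captions(report):
    """Program numbers whose summary.has_any_captions is true."""
    return [
        p["program_number"]
        for p in report.get("programs", [])
        if p.get("summary", {}).get("has_any_captions")
    ]


def active_608_channels(program):
    """Names of EIA-608 channels flagged true (e.g. 'cc1')."""
    channels = program.get("captions", {}).get("eia_608", {}).get("channels", {})
    return [name for name, present in channels.items() if present is True]


# Trimmed copy of program 1 from the sample output, used instead of a real run.
sample_report = {
    "programs": [
        {
            "program_number": 1,
            "summary": {"has_any_captions": True, "has_608": True, "has_708": True},
            "captions": {
                "eia_608": {
                    "present": True,
                    "channels": {"cc1": True, "cc2": False, "cc3": False, "cc4": False},
                }
            },
        }
    ]
}

print(programs_with_captions(sample_report))
print(active_608_channels(sample_report["programs"][0]))
# With CCExtractor installed: report = run_report("sample.ts")
```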

Field Value Formats:

String values like aspect_ratio and frame_rate preserve CCExtractor's internal enum formatting (e.g., "03 - 16:9", "04 - 29.97")
This design choice maintains transparency and aids debugging
Users needing normalized values can post-process with simple string operations:
jq '.programs[].video.aspect_ratio | split(" - ")[1]'
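The same post-processing can be sketched in Python for consumers that do not use jq. The helper names below are hypothetical. Strings without the " - " separator are returned unchanged, and when a value contains the separator more than once, everything after the first separator is kept (the jq expression above keeps only the second segment).

```python
def strip_enum_prefix(value):
    """Drop the leading enum code: '03 - 16:9' becomes '16:9'."""
    _prefix, separator, rest = value.partition(" - ")
    return rest if separator else value


def frame_rate_as_float(value):
    """Parse '04 - 29.97' into 29.97; return None if the text is not numeric."""
    try:
        return float(strip_enum_prefix(value))
    except ValueError:
        return None


video = {"aspect_ratio": "03 - 16:9", "frame_rate": "04 - 29.97"}
print(strip_enum_prefix(video["aspect_ratio"]))
print(frame_rate_as_float(video["frame_rate"]))
```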

Benefits

Automation-Friendly: Enables programmatic parsing without regex/custom parsers
Familiar Structure: Uses JSON patterns similar to tools like ffprobe and mediainfo
Extensible: Versioned schema to support future extensions
Backward Compatible: Existing workflows continue to work unchanged
Addresses Real Need: Solves a problem raised by multiple community members in issue #1399 ("[PROPOSAL] - Structured data JSON output of ccextractor -out=report") and related discussions
Quick Caption Detection: Provides has_any_captions summary field for fast EIA-608 / CEA-708 closed-caption checks

Notes

Platform compatibility: uses strcasecmp on POSIX systems and maps to _stricmp on Windows via platform-specific preprocessor guards.
Video and container metadata are emitted conditionally when applicable
Temporary allocations used for program ordering are properly released
The implementation follows existing CCExtractor coding conventions

ccextractor-bot · 2026-01-14T22:46:29Z

CCExtractor CI platform finished running the test files on linux. Below is a summary of the test results, when compared to test for commit 2028754...:

Report Name	Tests Passed
Broken	13/13
CEA-708	14/14
DVB	7/7
DVD	3/3
DVR-MS	2/2
General	25/27
Hardsubx	1/1
Hauppage	3/3
MP4	3/3
NoCC	10/10
Options	81/86
Teletext	21/21
WTV	13/13
XDS	34/34

Your PR breaks these cases:

ccextractor --autoprogram --out=ttxt --latin1 --ucla dab1c1bd65...
ccextractor --out=srt --latin1 --autoprogram 29e5ffd34b...
ccextractor --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
ccextractor --startcreditsnotbefore 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
ccextractor --startcreditsnotafter 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
ccextractor --startcreditsforatleast 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
ccextractor --startcreditsforatmost 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...

Congratulations: Merging this PR would fix the following tests:

ccextractor --autoprogram --out=srt --latin1 --quant 0 85271be4d2..., Last passed: Never
ccextractor --out=spupng c83f765c66..., Last passed: Never

It seems that not all tests were passed completely. This is an indication that the output of some files is not as expected (but might be according to you).

Check the result page for more info.

ccextractor-bot · 2026-01-14T23:12:22Z

CCExtractor CI platform finished running the test files on windows. Below is a summary of the test results, when compared to test for commit 2028754...:

Report Name	Tests Passed
Broken	13/13
CEA-708	14/14
DVB	6/7
DVD	3/3
DVR-MS	2/2
General	25/27
Hardsubx	1/1
Hauppage	3/3
MP4	3/3
NoCC	10/10
Options	81/86
Teletext	21/21
WTV	13/13
XDS	34/34

Your PR breaks these cases:

ccextractor --autoprogram --out=srt --latin1 --quant 0 85271be4d2...
ccextractor --autoprogram --out=ttxt --latin1 --ucla dab1c1bd65...
ccextractor --out=srt --latin1 --autoprogram 29e5ffd34b...
ccextractor --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
ccextractor --startcreditsnotbefore 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
ccextractor --startcreditsnotafter 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
ccextractor --startcreditsforatleast 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
ccextractor --startcreditsforatmost 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...

Congratulations: Merging this PR would fix the following tests:

ccextractor --out=spupng c83f765c66..., Last passed: Never

It seems that not all tests were passed completely. This is an indication that the output of some files is not as expected (but might be according to you).

Check the result page for more info.

x15sr71 added 3 commits January 15, 2026 00:24

feat(report): add machine-readable JSON output for -out=report

73b5ea9

docs(changelog): mention JSON output support for -out=report

a7d3e7c

chore: format Rust code and fix trailing newline

25a5fcb

fix(report): guard JSON report cleanup to prevent test failures

ca55c86

x15sr71 force-pushed the feat/json-report branch from ab2cda5 to ca55c86 Compare January 14, 2026 23:46
