A comprehensive analysis framework for understanding developer-written Java tests at scale
Automated test generation (ATG) has been investigated for decades, producing techniques based on symbolic analysis, search-based, random/adaptive-random, learning-based, and large-language-model-based approaches. Despite this large body of research, there remains a significant gap in understanding the characteristics of developer-written tests.
Hamster bridges this gap through an extensive empirical study of developer-written tests for Java applications, covering 1.7 million test cases. This study is the first of its kind to examine aspects of developer-written tests that are mostly neglected in existing literature:
- Test Scope — Unit, Integration, UI, and API test classification
- Test Fixtures — Setup and teardown complexity analysis
- Assertion Patterns — Types, sequences, and assertion complexity
- Input Types — Structured data inputs (JSON, XML, YAML, SQL, etc.)
- Mocking Usage — Framework detection and mock complexity
Our results highlight that a vast majority of developer-written tests exhibit characteristics and complexity beyond the capabilities of current ATG tools, identifying promising research directions for bridging this gap.
Hamster identifies the focal class (class under test) for each test method and classifies tests into categories:
| Test Type | Description |
|---|---|
| Unit/Module | Tests with a single focal class |
| Integration | Tests involving multiple focal classes |
| UI | Tests using UI frameworks (Selenium, Selenide, Playwright) |
| API | Tests using API testing frameworks (REST Assured, MockMvc) |
| Library | Tests that only exercise library code |
For each test method, Hamster extracts:
- Complexity Metrics — NCLOC, cyclomatic complexity (with/without helpers)
- Fixture Analysis — Setup/teardown methods, execution order, resource cleanup
- Assertion Details — Type classification, call sequences, wrapped vs. direct assertions
- Mocking Information — Framework detection, mock counts, mocked resources
- Input Analysis — Structured input detection (JSON, XML, YAML, CSV, SQL, HTML, etc.)
- Helper Method Tracking — Reachability analysis across test helpers
|
Core Frameworks
|
Assertion Libraries
|
Mocking Frameworks
|
|
BDD Frameworks
|
API Testing
|
UI Testing
|
Hamster classifies assertions into semantic categories:
| Category | Examples |
|---|---|
| Truthiness | assertTrue, assertFalse, isTrue |
| Equality | assertEquals, isEqualTo, equalTo |
| Identity | assertSame, isInstanceOf, sameInstance |
| Nullness | assertNull, isNotNull, nullValue |
| Numeric Tolerance | assertEquals with delta, isCloseTo |
| Throwable | assertThrows, assertThatThrownBy |
| Collection | assertArrayEquals, contains, hasSize |
| String | startsWith, containsString, matches |
| Comparison | isGreaterThan, lessThan, isBetween |
src/hamster/code_analysis/
├── focal_class_method/ # Test scope and focal class identification
│ └── focal_class_method.py
├── test_statistics/ # Comprehensive test analysis
│ ├── test_class_analysis_info.py
│ ├── test_method_analysis_info.py
│ ├── call_and_assertion_sequence_details_info.py
│ ├── input_analysis.py
│ ├── setup_analysis_info.py
│ └── teardown_analysis_info.py
├── model/ # Data models and enums
│ └── models.py
├── common/ # Shared analysis utilities
│ ├── common_analysis.py
│ └── reachability.py
└── utils/ # Constants and configuration
└── constants.py
- Python 3.11+
- CLDK for Java code analysis
-
Download GitHub repositories
# Clone target Java repositories for analysis -
Run CLDK and generate code analysis
# Generate Java analysis using CLDK -
Generate Hamster model
# Process CLDK output through Hamster analysis -
Generate dataset statistics
# Aggregate and export analysis results
Hamster produces structured analysis output including:
ProjectAnalysis
├── dataset_name
├── application_class_count
├── application_method_count
├── test_class_analyses[]
│ ├── qualified_class_name
│ ├── testing_frameworks[]
│ ├── setup_analyses[]
│ ├── teardown_analyses[]
│ └── test_method_analyses[]
│ ├── test_type (unit/integration/ui/api/library)
│ ├── focal_classes[]
│ ├── ncloc / cyclomatic_complexity
│ ├── call_assertion_sequences[]
│ ├── test_inputs[]
│ └── mocking detailsIf you use Hamster in your research, please cite our work:
@misc{pan2025hamsterlargescalestudycharacterization,
title={Hamster: A Large-Scale Study and Characterization of Developer-Written Tests},
author={Rangeet Pan and Tyler Stennett and Raju Pavuluri and Nate Levin and Alessandro Orso and Saurabh Sinha},
year={2025},
eprint={2509.26204},
archivePrefix={arXiv},
primaryClass={cs.SE},
url={https://arxiv.org/abs/2509.26204},
}[TBD]
Q: What Java versions are supported? A: Hamster uses CLDK for analysis, which supports Java 8+.
Q: Can I analyze my own projects? A: Yes! Follow the usage pipeline to analyze any Java project with test suites.
Q: How does focal class detection work? A: Hamster analyzes variable declarations, call sites, and constructor invocations to identify which application classes are exercised by each test, excluding producer-consumer relationships.
