DRAFT: Upgrade to Python 3.13 with libtmux race condition fix #1978
- Update target-version and pythonVersion to 3.13 in root pyproject.toml
- Update Python version in server.yml build matrix to 3.13
- Update Python version in pypi-release.yml to 3.13
- Update Python version in pr-review action to 3.13
- Pin libtmux to the neubig/libtmux#fix/new-session-race-condition branch, which fixes the race condition in new_session() that causes TmuxObjectDoesNotExist errors in Python 3.13 environments

The libtmux fix avoids the race condition by eliminating the separate list-sessions query after session creation, instead parsing the session data directly from the -P output of new-session.

Fixes the Python 3.13 + PyInstaller + Docker compatibility issue reported in libtmux#624.

Co-authored-by: openhands <openhands@all-hands.dev>
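To make the idea in the commit message concrete, here is a minimal sketch of reading session data straight from the `-P` output of `new-session` rather than issuing a follow-up `list-sessions` query. It is not the code from the libtmux branch: the two-field format, the tab separator, and the helper names `parse_new_session_output` and `create_session` are assumptions chosen for illustration. Only the tmux flags `-d`, `-P`, `-F`, and `-s` are taken from tmux itself.

```python
import subprocess

# Hypothetical, deliberately small field list; libtmux requests many more fields.
FIELDS = ("session_id", "session_name")
SEP = "\t"
# Builds the tmux format string "#{session_id}\t#{session_name}".
FORMAT = SEP.join(f"#{{{name}}}" for name in FIELDS)


def parse_new_session_output(stdout: str) -> dict[str, str]:
    """Turn one line of `new-session -P -F FORMAT` output into a dict."""
    values = stdout.strip("\n").split(SEP)
    if len(values) != len(FIELDS):
        # Fail loudly instead of silently dropping fields.
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


def create_session(name: str) -> dict[str, str]:
    """Create a detached session and parse what tmux prints about it."""
    proc = subprocess.run(
        ["tmux", "new-session", "-d", "-P", "-F", FORMAT, "-s", name],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_new_session_output(proc.stdout)
```

Because the session data comes from the same command that created the session, there is no window in which a second query can miss it.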
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.
🧪 Integration Tests Results
Overall Success Rate: 100.0%
📁 Detailed Logs & Artifacts
Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.
📊 Summary
📋 Detailed Results
litellm_proxy_deepseek_deepseek_reasoner
Skipped Tests:
litellm_proxy_moonshot_kimi_k2_thinking
Skipped Tests:
litellm_proxy_gemini_3_pro_preview
litellm_proxy_claude_sonnet_4_5_20250929
Coverage Report
🔄 Running Examples with
| Example | Status | Duration | Cost |
|---|---|---|---|
| 01_standalone_sdk/02_custom_tools.py | ✅ PASS | 28.7s | $0.03 |
| 01_standalone_sdk/03_activate_skill.py | ✅ PASS | 21.6s | $0.03 |
| 01_standalone_sdk/05_use_llm_registry.py | ✅ PASS | 11.9s | $0.01 |
| 01_standalone_sdk/07_mcp_integration.py | ✅ PASS | 48.3s | $0.04 |
| 01_standalone_sdk/09_pause_example.py | ✅ PASS | 19.9s | $0.02 |
| 01_standalone_sdk/10_persistence.py | ✅ PASS | 27.5s | $0.02 |
| 01_standalone_sdk/11_async.py | ✅ PASS | 34.3s | $0.03 |
| 01_standalone_sdk/12_custom_secrets.py | ✅ PASS | 16.5s | $0.02 |
| 01_standalone_sdk/13_get_llm_metrics.py | ✅ PASS | 21.5s | $0.01 |
| 01_standalone_sdk/14_context_condenser.py | ✅ PASS | 4m 25s | $0.53 |
| 01_standalone_sdk/17_image_input.py | ✅ PASS | 18.7s | $0.02 |
| 01_standalone_sdk/18_send_message_while_processing.py | ✅ PASS | 27.9s | $0.02 |
| 01_standalone_sdk/19_llm_routing.py | ✅ PASS | 15.8s | $0.02 |
| 01_standalone_sdk/20_stuck_detector.py | ✅ PASS | 22.3s | $0.02 |
| 01_standalone_sdk/21_generate_extraneous_conversation_costs.py | ✅ PASS | 9.7s | $0.00 |
| 01_standalone_sdk/22_anthropic_thinking.py | ✅ PASS | 22.2s | $0.02 |
| 01_standalone_sdk/23_responses_reasoning.py | ✅ PASS | 1m 27s | $0.01 |
| 01_standalone_sdk/24_planning_agent_workflow.py | ✅ PASS | 5m 5s | $0.41 |
| 01_standalone_sdk/25_agent_delegation.py | ✅ PASS | 2m 16s | $0.18 |
| 01_standalone_sdk/26_custom_visualizer.py | ✅ PASS | 20.6s | $0.02 |
| 01_standalone_sdk/28_ask_agent_example.py | ✅ PASS | 38.4s | $0.03 |
| 01_standalone_sdk/29_llm_streaming.py | ✅ PASS | 36.6s | $0.03 |
| 01_standalone_sdk/30_tom_agent.py | ✅ PASS | 11.2s | $0.01 |
| 01_standalone_sdk/31_iterative_refinement.py | ✅ PASS | 3m 7s | $0.21 |
| 01_standalone_sdk/32_configurable_security_policy.py | ✅ PASS | 19.7s | $0.02 |
| 01_standalone_sdk/34_critic_example.py | ❌ FAIL (Exit code 1) | 3.8s | -- |
| 01_standalone_sdk/36_event_json_to_openai_messages.py | ✅ PASS | 9.8s | $0.00 |
| 01_standalone_sdk/37_llm_profile_store.py | ✅ PASS | 4.0s | $0.00 |
| 02_remote_agent_server/01_convo_with_local_agent_server.py | ✅ PASS | 58.3s | $0.04 |
| 02_remote_agent_server/02_convo_with_docker_sandboxed_server.py | ❌ FAIL (Exit code 1) | 1m 4s | -- |
| 02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py | ❌ FAIL (Exit code 1) | 16.8s | -- |
| 02_remote_agent_server/04_convo_with_api_sandboxed_server.py | ❌ FAIL (Exit code 1) | 1m 1s | -- |
| 02_remote_agent_server/07_convo_with_cloud_workspace.py | ✅ PASS | 31.8s | $0.02 |
| 02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py | ❌ FAIL (Exit code 1) | 4.1s | -- |
| 04_llm_specific_tools/01_gpt5_apply_patch_preset.py | ✅ PASS | 18.7s | $0.02 |
| 04_llm_specific_tools/02_gemini_file_tools.py | ✅ PASS | 44.4s | $0.08 |
| 05_skills_and_plugins/01_loading_agentskills/main.py | ✅ PASS | 12.5s | $0.01 |
| 05_skills_and_plugins/02_loading_plugins/main.py | ✅ PASS | 7.8s | $0.01 |
❌ Some tests failed
Total: 38 | Passed: 33 | Failed: 5 | Total Cost: $1.94
Failed examples:
- examples/01_standalone_sdk/34_critic_example.py: Exit code 1
- examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py: Exit code 1
- examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py: Exit code 1
- examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py: Exit code 1
- examples/02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py: Exit code 1
…HUB_SHA

GitHub Actions sets GITHUB_SHA to the merge commit by default, which differs from the PR head commit. Use a custom variable AGENT_SERVER_SHA to explicitly pass the PR head SHA to example scripts for Docker image selection.

Co-authored-by: openhands <openhands@all-hands.dev>
🔄 Running Examples with
| Example | Status | Duration | Cost |
|---|---|---|---|
| 01_standalone_sdk/02_custom_tools.py | ✅ PASS | 25.3s | $0.03 |
| 01_standalone_sdk/03_activate_skill.py | ✅ PASS | 17.9s | $0.03 |
| 01_standalone_sdk/05_use_llm_registry.py | ✅ PASS | 11.1s | $0.01 |
| 01_standalone_sdk/07_mcp_integration.py | ✅ PASS | 41.2s | $0.04 |
| 01_standalone_sdk/09_pause_example.py | ✅ PASS | 13.5s | $0.01 |
| 01_standalone_sdk/10_persistence.py | ✅ PASS | 27.9s | $0.02 |
| 01_standalone_sdk/11_async.py | ✅ PASS | 28.7s | $0.03 |
| 01_standalone_sdk/12_custom_secrets.py | ✅ PASS | 11.2s | $0.01 |
| 01_standalone_sdk/13_get_llm_metrics.py | ✅ PASS | 20.8s | $0.01 |
| 01_standalone_sdk/14_context_condenser.py | ✅ PASS | 4m 22s | $0.54 |
| 01_standalone_sdk/17_image_input.py | ✅ PASS | 17.2s | $0.02 |
| 01_standalone_sdk/18_send_message_while_processing.py | ✅ PASS | 25.1s | $0.01 |
| 01_standalone_sdk/19_llm_routing.py | ✅ PASS | 13.2s | $0.02 |
| 01_standalone_sdk/20_stuck_detector.py | ✅ PASS | 19.5s | $0.02 |
| 01_standalone_sdk/21_generate_extraneous_conversation_costs.py | ✅ PASS | 14.8s | $0.00 |
| 01_standalone_sdk/22_anthropic_thinking.py | ✅ PASS | 18.1s | $0.01 |
| 01_standalone_sdk/23_responses_reasoning.py | ✅ PASS | 1m 10s | $0.01 |
| 01_standalone_sdk/24_planning_agent_workflow.py | ✅ PASS | 2m 54s | $0.22 |
| 01_standalone_sdk/25_agent_delegation.py | ✅ PASS | 2m 4s | $0.17 |
| 01_standalone_sdk/26_custom_visualizer.py | ✅ PASS | 18.6s | $0.02 |
| 01_standalone_sdk/28_ask_agent_example.py | ✅ PASS | 28.3s | $0.02 |
| 01_standalone_sdk/29_llm_streaming.py | ✅ PASS | 38.6s | $0.03 |
| 01_standalone_sdk/30_tom_agent.py | ✅ PASS | 12.2s | $0.01 |
| 01_standalone_sdk/31_iterative_refinement.py | ❌ FAIL (Timed out after 600 seconds) | 10m 0s | -- |
| 01_standalone_sdk/32_configurable_security_policy.py | ✅ PASS | 23.7s | $0.02 |
| 01_standalone_sdk/34_critic_example.py | ❌ FAIL (Exit code 1) | 3.9s | -- |
| 01_standalone_sdk/36_event_json_to_openai_messages.py | ✅ PASS | 21.3s | $0.01 |
| 01_standalone_sdk/37_llm_profile_store.py | ✅ PASS | 4.1s | $0.00 |
| 02_remote_agent_server/01_convo_with_local_agent_server.py | ✅ PASS | 1m 5s | $0.04 |
| 02_remote_agent_server/02_convo_with_docker_sandboxed_server.py | ❌ FAIL (Exit code 1) | 4.7s | -- |
| 02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py | ❌ FAIL (Exit code 1) | 5.7s | -- |
| 02_remote_agent_server/04_convo_with_api_sandboxed_server.py | ❌ FAIL (Exit code 1) | 5m 11s | -- |
| 02_remote_agent_server/07_convo_with_cloud_workspace.py | ✅ PASS | 28.3s | $0.03 |
| 02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py | ❌ FAIL (Exit code 1) | 4.7s | -- |
| 04_llm_specific_tools/01_gpt5_apply_patch_preset.py | ✅ PASS | 31.8s | $0.04 |
| 04_llm_specific_tools/02_gemini_file_tools.py | ✅ PASS | 1m 10s | $0.05 |
| 05_skills_and_plugins/01_loading_agentskills/main.py | ✅ PASS | 10.5s | $0.01 |
| 05_skills_and_plugins/02_loading_plugins/main.py | ✅ PASS | 7.5s | $0.01 |
❌ Some tests failed
Total: 38 | Passed: 32 | Failed: 6 | Total Cost: $1.52
Failed examples:
- examples/01_standalone_sdk/31_iterative_refinement.py: Timed out after 600 seconds
- examples/01_standalone_sdk/34_critic_example.py: Exit code 1
- examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py: Exit code 1
- examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py: Exit code 1
- examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py: Exit code 1
- examples/02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py: Exit code 1
- Regenerate uv.lock with pinned libtmux git dependency
- Simplify Generator[T, None, None] to Generator[T] in test files

Co-authored-by: openhands <openhands@all-hands.dev>
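The `Generator[T]` simplification relies on Python 3.13 giving the send and return type parameters of `Generator` a default of `None`, so type checkers targeting 3.13 read the one-argument form the same way as `Generator[T, None, None]`. A small illustration follows; the function name `countdown` is made up for this example and does not come from the test files.

```python
from collections.abc import Generator


def countdown(start: int) -> Generator[int]:
    """Yield start, start - 1, ..., 1.

    With a 3.13 target, Generator[int] means Generator[int, None, None]:
    nothing is sent into the generator and it returns None.
    """
    while start > 0:
        yield start
        start -= 1
```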
🔄 Running Examples with
| Example | Status | Duration | Cost |
|---|---|---|---|
| 01_standalone_sdk/02_custom_tools.py | ✅ PASS | 25.7s | $0.03 |
| 01_standalone_sdk/03_activate_skill.py | ✅ PASS | 20.0s | $0.03 |
| 01_standalone_sdk/05_use_llm_registry.py | ✅ PASS | 14.0s | $0.01 |
| 01_standalone_sdk/07_mcp_integration.py | ✅ PASS | 30.2s | $0.02 |
| 01_standalone_sdk/09_pause_example.py | ✅ PASS | 18.5s | $0.01 |
| 01_standalone_sdk/10_persistence.py | ✅ PASS | 35.4s | $0.03 |
| 01_standalone_sdk/11_async.py | ✅ PASS | 31.4s | $0.03 |
| 01_standalone_sdk/12_custom_secrets.py | ✅ PASS | 20.0s | $0.02 |
| 01_standalone_sdk/13_get_llm_metrics.py | ✅ PASS | 21.1s | $0.02 |
| 01_standalone_sdk/14_context_condenser.py | ✅ PASS | 6m 29s | $0.84 |
| 01_standalone_sdk/17_image_input.py | ✅ PASS | 16.4s | $0.02 |
| 01_standalone_sdk/18_send_message_while_processing.py | ✅ PASS | 23.3s | $0.01 |
| 01_standalone_sdk/19_llm_routing.py | ✅ PASS | 15.6s | $0.02 |
| 01_standalone_sdk/20_stuck_detector.py | ✅ PASS | 15.8s | $0.02 |
| 01_standalone_sdk/21_generate_extraneous_conversation_costs.py | ✅ PASS | 11.4s | $0.00 |
| 01_standalone_sdk/22_anthropic_thinking.py | ✅ PASS | 16.7s | $0.01 |
| 01_standalone_sdk/23_responses_reasoning.py | ✅ PASS | 57.8s | $0.01 |
| 01_standalone_sdk/24_planning_agent_workflow.py | ✅ PASS | 51.3s | $0.05 |
| 01_standalone_sdk/25_agent_delegation.py | ✅ PASS | 1m 44s | $0.19 |
| 01_standalone_sdk/26_custom_visualizer.py | ✅ PASS | 22.6s | $0.03 |
| 01_standalone_sdk/28_ask_agent_example.py | ✅ PASS | 35.2s | $0.03 |
| 01_standalone_sdk/29_llm_streaming.py | ✅ PASS | 44.6s | $0.04 |
| 01_standalone_sdk/30_tom_agent.py | ✅ PASS | 10.6s | $0.01 |
| 01_standalone_sdk/31_iterative_refinement.py | ✅ PASS | 3m 7s | $0.22 |
| 01_standalone_sdk/32_configurable_security_policy.py | ✅ PASS | 16.6s | $0.02 |
| 01_standalone_sdk/34_critic_example.py | ❌ FAIL (Exit code 1) | 3.8s | -- |
| 01_standalone_sdk/36_event_json_to_openai_messages.py | ✅ PASS | 13.1s | $0.00 |
| 01_standalone_sdk/37_llm_profile_store.py | ✅ PASS | 4.1s | $0.00 |
| 02_remote_agent_server/01_convo_with_local_agent_server.py | ✅ PASS | 59.4s | $0.04 |
| 02_remote_agent_server/02_convo_with_docker_sandboxed_server.py | ❌ FAIL (Exit code 1) | 4.8s | -- |
| 02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py | ❌ FAIL (Exit code 1) | 4.9s | -- |
| 02_remote_agent_server/04_convo_with_api_sandboxed_server.py | ❌ FAIL (Exit code 1) | 5m 11s | -- |
| 02_remote_agent_server/07_convo_with_cloud_workspace.py | ✅ PASS | 28.6s | $0.02 |
| 02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py | ❌ FAIL (Exit code 1) | 5.6s | -- |
| 04_llm_specific_tools/01_gpt5_apply_patch_preset.py | ✅ PASS | 20.8s | $0.03 |
| 04_llm_specific_tools/02_gemini_file_tools.py | ✅ PASS | 57.7s | $0.07 |
| 05_skills_and_plugins/01_loading_agentskills/main.py | ✅ PASS | 14.0s | $0.02 |
| 05_skills_and_plugins/02_loading_plugins/main.py | ✅ PASS | 7.6s | $0.01 |
❌ Some tests failed
Total: 38 | Passed: 33 | Failed: 5 | Total Cost: $1.90
Failed examples:
- examples/01_standalone_sdk/34_critic_example.py: Exit code 1
- examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py: Exit code 1
- examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py: Exit code 1
- examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py: Exit code 1
- examples/02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py: Exit code 1
The SHORT_SHA extraction was using github.sha (merge commit) instead of the GITHUB_SHA env var (which is set to PR head SHA for pull_request events). This caused Docker images to be tagged with the wrong SHA, making them unfindable by run-examples.yml, which uses the PR head SHA.

Co-authored-by: openhands <openhands@all-hands.dev>
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.
🧪 Integration Tests Results
Overall Success Rate: 100.0%
📁 Detailed Logs & Artifacts
Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.
📊 Summary
📋 Detailed Results
litellm_proxy_deepseek_deepseek_reasoner
Skipped Tests:
litellm_proxy_moonshot_kimi_k2_thinking
Skipped Tests:
litellm_proxy_gemini_3_pro_preview
litellm_proxy_claude_sonnet_4_5_20250929
Co-authored-by: openhands <openhands@all-hands.dev>
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.
GITHUB_SHA is a reserved GitHub Actions environment variable that cannot be overridden at job level. Use SDK_SHA, which is already supported by build.py and takes precedence over GITHUB_SHA. This ensures Docker images are tagged with the PR head SHA (not the merge commit SHA) so run-examples.yml can find the correct images.

Co-authored-by: openhands <openhands@all-hands.dev>
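The precedence rule described in this commit message can be sketched as follows. This is not the code in build.py: the function name `resolve_short_sha` and the 7-character short form are assumptions for illustration. Only the variable names SDK_SHA and GITHUB_SHA and their ordering come from the message above.

```python
import os
from collections.abc import Mapping
from typing import Optional


def resolve_short_sha(env: Optional[Mapping[str, str]] = None, length: int = 7) -> str:
    """Return a short SHA for image tagging, preferring SDK_SHA over GITHUB_SHA."""
    env = os.environ if env is None else env
    # Order matters: an explicitly passed SDK_SHA (the PR head) wins over the
    # GITHUB_SHA that GitHub Actions sets on its own.
    for name in ("SDK_SHA", "GITHUB_SHA"):
        value = env.get(name, "").strip()
        if value:
            return value[:length]
    raise RuntimeError("neither SDK_SHA nor GITHUB_SHA is set")
```

Passing the environment as a parameter keeps the lookup easy to exercise without touching the real process environment.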
🔄 Running Examples with
| Example | Status | Duration | Cost |
|---|---|---|---|
| 01_standalone_sdk/02_custom_tools.py | ✅ PASS | 25.4s | $0.02 |
| 01_standalone_sdk/03_activate_skill.py | ✅ PASS | 17.3s | $0.01 |
| 01_standalone_sdk/05_use_llm_registry.py | ✅ PASS | 12.3s | $0.00 |
| 01_standalone_sdk/07_mcp_integration.py | ✅ PASS | 35.1s | $0.01 |
| 01_standalone_sdk/09_pause_example.py | ✅ PASS | 14.9s | $0.01 |
| 01_standalone_sdk/10_persistence.py | ✅ PASS | 26.0s | $0.01 |
| 01_standalone_sdk/11_async.py | ✅ PASS | 30.0s | $0.03 |
| 01_standalone_sdk/12_custom_secrets.py | ✅ PASS | 19.5s | $0.01 |
| 01_standalone_sdk/13_get_llm_metrics.py | ✅ PASS | 18.0s | $0.01 |
| 01_standalone_sdk/14_context_condenser.py | ✅ PASS | 4m 34s | $0.57 |
| 01_standalone_sdk/17_image_input.py | ✅ PASS | 16.9s | $0.01 |
| 01_standalone_sdk/18_send_message_while_processing.py | ✅ PASS | 25.5s | $0.01 |
| 01_standalone_sdk/19_llm_routing.py | ✅ PASS | 12.0s | $0.02 |
| 01_standalone_sdk/20_stuck_detector.py | ✅ PASS | 15.6s | $0.02 |
| 01_standalone_sdk/21_generate_extraneous_conversation_costs.py | ✅ PASS | 10.1s | $0.00 |
| 01_standalone_sdk/22_anthropic_thinking.py | ✅ PASS | 15.3s | $0.01 |
| 01_standalone_sdk/23_responses_reasoning.py | ✅ PASS | 1m 22s | $0.02 |
| 01_standalone_sdk/24_planning_agent_workflow.py | ✅ PASS | 3m 46s | $0.30 |
| 01_standalone_sdk/25_agent_delegation.py | ❌ FAIL (Timed out after 600 seconds) | 10m 0s | $0.21 |
| 01_standalone_sdk/26_custom_visualizer.py | ✅ PASS | 19.4s | $0.02 |
| 01_standalone_sdk/28_ask_agent_example.py | ✅ PASS | 31.7s | $0.03 |
| 01_standalone_sdk/29_llm_streaming.py | ✅ PASS | 37.9s | $0.03 |
| 01_standalone_sdk/30_tom_agent.py | ✅ PASS | 17.4s | $0.01 |
| 01_standalone_sdk/31_iterative_refinement.py | ✅ PASS | 3m 49s | $0.26 |
| 01_standalone_sdk/32_configurable_security_policy.py | ✅ PASS | 16.6s | $0.02 |
| 01_standalone_sdk/34_critic_example.py | ❌ FAIL (Exit code 1) | 3.3s | -- |
| 01_standalone_sdk/36_event_json_to_openai_messages.py | ✅ PASS | 8.8s | $0.00 |
| 01_standalone_sdk/37_llm_profile_store.py | ✅ PASS | 3.4s | $0.00 |
| 02_remote_agent_server/01_convo_with_local_agent_server.py | ✅ PASS | 57.7s | $0.03 |
| 02_remote_agent_server/02_convo_with_docker_sandboxed_server.py | ❌ FAIL (Exit code 1) | 55.3s | -- |
| 02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py | ❌ FAIL (Exit code 1) | 16.7s | -- |
| 02_remote_agent_server/04_convo_with_api_sandboxed_server.py | ❌ FAIL (Exit code 1) | 1m 9s | -- |
| 02_remote_agent_server/07_convo_with_cloud_workspace.py | ✅ PASS | 26.6s | $0.02 |
| 02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py | ❌ FAIL (Exit code 1) | 3.5s | -- |
| 04_llm_specific_tools/01_gpt5_apply_patch_preset.py | ✅ PASS | 20.5s | $0.02 |
| 04_llm_specific_tools/02_gemini_file_tools.py | ✅ PASS | 50.1s | $0.06 |
| 05_skills_and_plugins/01_loading_agentskills/main.py | ✅ PASS | 11.5s | $0.01 |
| 05_skills_and_plugins/02_loading_plugins/main.py | ✅ PASS | 6.8s | $0.01 |
❌ Some tests failed
Total: 38 | Passed: 32 | Failed: 6 | Total Cost: $1.83
Failed examples:
- examples/01_standalone_sdk/25_agent_delegation.py: Timed out after 600 seconds
- examples/01_standalone_sdk/34_critic_example.py: Exit code 1
- examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py: Exit code 1
- examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py: Exit code 1
- examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py: Exit code 1
- examples/02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py: Exit code 1
🔄 Running Examples with
| Example | Status | Duration | Cost |
|---|---|---|---|
| 01_standalone_sdk/02_custom_tools.py | ✅ PASS | 26.4s | $0.02 |
| 01_standalone_sdk/03_activate_skill.py | ✅ PASS | 17.6s | $0.01 |
| 01_standalone_sdk/05_use_llm_registry.py | ✅ PASS | 10.9s | $0.00 |
| 01_standalone_sdk/07_mcp_integration.py | ✅ PASS | 48.7s | $0.04 |
| 01_standalone_sdk/09_pause_example.py | ✅ PASS | 14.5s | $0.01 |
| 01_standalone_sdk/10_persistence.py | ✅ PASS | 27.5s | $0.01 |
| 01_standalone_sdk/11_async.py | ✅ PASS | 32.9s | $0.03 |
| 01_standalone_sdk/12_custom_secrets.py | ✅ PASS | 18.2s | $0.01 |
| 01_standalone_sdk/13_get_llm_metrics.py | ✅ PASS | 24.0s | $0.02 |
| 01_standalone_sdk/14_context_condenser.py | ✅ PASS | 7m 0s | $0.86 |
| 01_standalone_sdk/17_image_input.py | ✅ PASS | 17.6s | $0.01 |
| 01_standalone_sdk/18_send_message_while_processing.py | ✅ PASS | 26.7s | $0.01 |
| 01_standalone_sdk/19_llm_routing.py | ✅ PASS | 15.6s | $0.01 |
| 01_standalone_sdk/20_stuck_detector.py | ✅ PASS | 19.7s | $0.01 |
| 01_standalone_sdk/21_generate_extraneous_conversation_costs.py | ✅ PASS | 10.3s | $0.00 |
| 01_standalone_sdk/22_anthropic_thinking.py | ✅ PASS | 14.6s | $0.01 |
| 01_standalone_sdk/23_responses_reasoning.py | ✅ PASS | 1m 30s | $0.01 |
| 01_standalone_sdk/24_planning_agent_workflow.py | ✅ PASS | 3m 26s | $0.27 |
| 01_standalone_sdk/25_agent_delegation.py | ❌ FAIL (Timed out after 600 seconds) | 10m 0s | $0.14 |
| 01_standalone_sdk/26_custom_visualizer.py | ✅ PASS | 19.3s | $0.02 |
| 01_standalone_sdk/28_ask_agent_example.py | ✅ PASS | 38.9s | $0.04 |
| 01_standalone_sdk/29_llm_streaming.py | ✅ PASS | 41.4s | $0.03 |
| 01_standalone_sdk/30_tom_agent.py | ✅ PASS | 16.3s | $0.01 |
| 01_standalone_sdk/31_iterative_refinement.py | ✅ PASS | 4m 47s | $0.32 |
| 01_standalone_sdk/32_configurable_security_policy.py | ✅ PASS | 18.3s | $0.01 |
| 01_standalone_sdk/34_critic_example.py | ❌ FAIL (Exit code 1) | 3.8s | -- |
| 01_standalone_sdk/36_event_json_to_openai_messages.py | ✅ PASS | 9.3s | $0.00 |
| 01_standalone_sdk/37_llm_profile_store.py | ✅ PASS | 3.9s | $0.00 |
| 02_remote_agent_server/01_convo_with_local_agent_server.py | ✅ PASS | 59.1s | $0.04 |
| 02_remote_agent_server/02_convo_with_docker_sandboxed_server.py | ❌ FAIL (Exit code 1) | 4.7s | -- |
| 02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py | ❌ FAIL (Exit code 1) | 4.8s | -- |
| 02_remote_agent_server/04_convo_with_api_sandboxed_server.py | ❌ FAIL (Exit code 1) | 5m 10s | -- |
| 02_remote_agent_server/07_convo_with_cloud_workspace.py | ✅ PASS | 32.3s | $0.02 |
| 02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py | ❌ FAIL (Exit code 1) | 4.7s | -- |
| 04_llm_specific_tools/01_gpt5_apply_patch_preset.py | ✅ PASS | 23.9s | $0.02 |
| 04_llm_specific_tools/02_gemini_file_tools.py | ✅ PASS | 43.7s | $0.06 |
| 05_skills_and_plugins/01_loading_agentskills/main.py | ✅ PASS | 12.8s | $0.01 |
| 05_skills_and_plugins/02_loading_plugins/main.py | ✅ PASS | 6.8s | $0.01 |
❌ Some tests failed
Total: 38 | Passed: 32 | Failed: 6 | Total Cost: $2.07
Failed examples:
- examples/01_standalone_sdk/25_agent_delegation.py: Timed out after 600 seconds
- examples/01_standalone_sdk/34_critic_example.py: Exit code 1
- examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py: Exit code 1
- examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py: Exit code 1
- examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py: Exit code 1
- examples/02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py: Exit code 1
🧪 Integration Tests Results
Overall Success Rate: 100.0%
📁 Detailed Logs & Artifacts
Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.
📊 Summary
📋 Detailed Results
litellm_proxy_deepseek_deepseek_reasoner
Skipped Tests:
litellm_proxy_moonshot_kimi_k2_thinking
Skipped Tests:
litellm_proxy_gemini_3_pro_preview
litellm_proxy_claude_sonnet_4_5_20250929
The libtmux race condition fix (libtmux#625) requires a Python 3.13 environment throughout the build process, not just in the base image. Previously, the builder stage used Python 3.12, so dependencies were compiled for Python 3.12 even though the runtime image used Python 3.13.

Changes:
- Update builder FROM to python:3.13-bookworm
- Update uv python install/venv from 3.12 to 3.13
- Update ARG BASE_IMAGE default to python3.13-nodejs22
- Update comment to reflect libtmux fix status

Co-authored-by: openhands <openhands@all-hands.dev>
🔄 Running Examples with
| Example | Status | Duration | Cost |
|---|---|---|---|
| 01_standalone_sdk/02_custom_tools.py | ✅ PASS | 23.8s | $0.03 |
| 01_standalone_sdk/03_activate_skill.py | ✅ PASS | 18.9s | $0.03 |
| 01_standalone_sdk/05_use_llm_registry.py | ✅ PASS | 11.4s | $0.01 |
| 01_standalone_sdk/07_mcp_integration.py | ✅ PASS | 40.4s | $0.04 |
| 01_standalone_sdk/09_pause_example.py | ✅ PASS | 13.0s | $0.01 |
| 01_standalone_sdk/10_persistence.py | ✅ PASS | 26.7s | $0.02 |
| 01_standalone_sdk/11_async.py | ✅ PASS | 36.2s | $0.04 |
| 01_standalone_sdk/12_custom_secrets.py | ✅ PASS | 18.6s | $0.02 |
| 01_standalone_sdk/13_get_llm_metrics.py | ✅ PASS | 21.9s | $0.02 |
| 01_standalone_sdk/14_context_condenser.py | ❌ FAIL (Timed out after 600 seconds) | 10m 0s | -- |
| 01_standalone_sdk/17_image_input.py | ✅ PASS | 16.6s | $0.02 |
| 01_standalone_sdk/18_send_message_while_processing.py | ✅ PASS | 26.3s | $0.02 |
| 01_standalone_sdk/19_llm_routing.py | ✅ PASS | 13.5s | $0.02 |
| 01_standalone_sdk/20_stuck_detector.py | ✅ PASS | 21.0s | $0.02 |
| 01_standalone_sdk/21_generate_extraneous_conversation_costs.py | ✅ PASS | 10.4s | $0.00 |
| 01_standalone_sdk/22_anthropic_thinking.py | ✅ PASS | 20.6s | $0.02 |
| 01_standalone_sdk/23_responses_reasoning.py | ✅ PASS | 1m 3s | $0.02 |
| 01_standalone_sdk/24_planning_agent_workflow.py | ✅ PASS | 4m 13s | $0.31 |
| 01_standalone_sdk/25_agent_delegation.py | ✅ PASS | 1m 50s | $0.16 |
| 01_standalone_sdk/26_custom_visualizer.py | ✅ PASS | 20.4s | $0.03 |
| 01_standalone_sdk/28_ask_agent_example.py | ✅ PASS | 46.1s | $0.03 |
| 01_standalone_sdk/29_llm_streaming.py | ✅ PASS | 33.4s | $0.02 |
| 01_standalone_sdk/30_tom_agent.py | ✅ PASS | 8.9s | $0.01 |
| 01_standalone_sdk/31_iterative_refinement.py | ✅ PASS | 5m 2s | $0.38 |
| 01_standalone_sdk/32_configurable_security_policy.py | ✅ PASS | 22.0s | $0.02 |
| 01_standalone_sdk/34_critic_example.py | ❌ FAIL (Exit code 1) | 3.8s | -- |
| 01_standalone_sdk/36_event_json_to_openai_messages.py | ✅ PASS | 9.6s | $0.00 |
| 01_standalone_sdk/37_llm_profile_store.py | ✅ PASS | 3.8s | $0.00 |
| 02_remote_agent_server/01_convo_with_local_agent_server.py | ✅ PASS | 51.2s | $0.04 |
| 02_remote_agent_server/02_convo_with_docker_sandboxed_server.py | ❌ FAIL (Exit code 1) | 1m 0s | -- |
| 02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py | ❌ FAIL (Exit code 1) | 16.2s | -- |
| 02_remote_agent_server/04_convo_with_api_sandboxed_server.py | ❌ FAIL (Exit code 1) | 1m 40s | -- |
| 02_remote_agent_server/07_convo_with_cloud_workspace.py | ✅ PASS | 28.3s | $0.02 |
| 02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py | ❌ FAIL (Exit code 1) | 3m 3s | -- |
| 04_llm_specific_tools/01_gpt5_apply_patch_preset.py | ✅ PASS | 20.1s | $0.02 |
| 04_llm_specific_tools/02_gemini_file_tools.py | ✅ PASS | 1m 1s | $0.08 |
| 05_skills_and_plugins/01_loading_agentskills/main.py | ✅ PASS | 12.8s | $0.02 |
| 05_skills_and_plugins/02_loading_plugins/main.py | ✅ PASS | 7.2s | $0.01 |
❌ Some tests failed
Total: 38 | Passed: 32 | Failed: 6 | Total Cost: $1.47
Failed examples:
- examples/01_standalone_sdk/14_context_condenser.py: Timed out after 600 seconds
- examples/01_standalone_sdk/34_critic_example.py: Exit code 1
- examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py: Exit code 1
- examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py: Exit code 1
- examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py: Exit code 1
- examples/02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py: Exit code 1
When libtmux's new_session() returns a Session with session_id=None (due to a bug in parse_output() not handling fewer output values), we now retry fetching the session from server.sessions.

This is a workaround for the issue where neubig's fix PR #625 doesn't fully handle cases where tmux returns fewer than the expected 125 format field values, causing the zip() to truncate before reaching session_id at index 92.

The workaround:
1. Check if session_id is None after new_session()
2. If so, retry up to 3 times with increasing delays
3. Fetch the session by name from server.sessions
4. Raise a clear error if all retries fail

See: tmux-python/libtmux#624

Co-authored-by: openhands <openhands@all-hands.dev>
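The four workaround steps can be sketched in a library-agnostic way. This is not the code from the PR: `ensure_session_id`, the `lookup` callable (standing in for a search of server.sessions by name), and the linear delay schedule are assumptions made for illustration.

```python
import time
from typing import Any, Callable, Optional


def ensure_session_id(
    session: Any,
    lookup: Callable[[str], Optional[Any]],
    name: str,
    retries: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Return a session object whose session_id is set, retrying by name."""
    # Step 1: a populated session_id means there is nothing to repair.
    if session.session_id is not None:
        return session
    # Step 2: retry with increasing delays (base_delay, 2 * base_delay, ...).
    for attempt in range(1, retries + 1):
        sleep(base_delay * attempt)
        # Step 3: fetch the session by name (hypothetical lookup callable).
        found = lookup(name)
        if found is not None and found.session_id is not None:
            return found
    # Step 4: fail with a message that names the session and the retry count.
    raise RuntimeError(
        f"tmux session {name!r} has no session_id after {retries} lookups"
    )
```

Injecting `sleep` and `lookup` keeps the retry policy testable without a running tmux server.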
🔄 Running Examples with
| Example | Status | Duration | Cost |
|---|---|---|---|
| 01_standalone_sdk/02_custom_tools.py | ✅ PASS | 29.9s | $0.03 |
| 01_standalone_sdk/03_activate_skill.py | ✅ PASS | 19.5s | $0.03 |
| 01_standalone_sdk/05_use_llm_registry.py | ✅ PASS | 12.4s | $0.01 |
| 01_standalone_sdk/07_mcp_integration.py | ✅ PASS | 42.8s | $0.04 |
| 01_standalone_sdk/09_pause_example.py | ✅ PASS | 17.1s | $0.01 |
| 01_standalone_sdk/10_persistence.py | ✅ PASS | 26.7s | $0.02 |
| 01_standalone_sdk/11_async.py | ✅ PASS | 38.8s | $0.04 |
| 01_standalone_sdk/12_custom_secrets.py | ✅ PASS | 19.0s | $0.02 |
| 01_standalone_sdk/13_get_llm_metrics.py | ✅ PASS | 23.1s | $0.02 |
| 01_standalone_sdk/14_context_condenser.py | ✅ PASS | 2m 58s | $0.38 |
| 01_standalone_sdk/17_image_input.py | ✅ PASS | 14.5s | $0.02 |
| 01_standalone_sdk/18_send_message_while_processing.py | ✅ PASS | 15.9s | $0.01 |
| 01_standalone_sdk/19_llm_routing.py | ✅ PASS | 13.0s | $0.02 |
| 01_standalone_sdk/20_stuck_detector.py | ✅ PASS | 18.4s | $0.03 |
| 01_standalone_sdk/21_generate_extraneous_conversation_costs.py | ✅ PASS | 10.4s | $0.00 |
| 01_standalone_sdk/22_anthropic_thinking.py | ✅ PASS | 13.7s | $0.01 |
| 01_standalone_sdk/23_responses_reasoning.py | ✅ PASS | 1m 11s | $0.01 |
| 01_standalone_sdk/24_planning_agent_workflow.py | ✅ PASS | 3m 33s | $0.26 |
| 01_standalone_sdk/25_agent_delegation.py | ✅ PASS | 2m 23s | $0.19 |
| 01_standalone_sdk/26_custom_visualizer.py | ✅ PASS | 20.2s | $0.02 |
| 01_standalone_sdk/28_ask_agent_example.py | ✅ PASS | 33.3s | $0.02 |
| 01_standalone_sdk/29_llm_streaming.py | ✅ PASS | 41.2s | $0.03 |
| 01_standalone_sdk/30_tom_agent.py | ✅ PASS | 21.9s | $0.02 |
| 01_standalone_sdk/31_iterative_refinement.py | ✅ PASS | 3m 54s | $0.28 |
| 01_standalone_sdk/32_configurable_security_policy.py | ✅ PASS | 24.9s | $0.02 |
| 01_standalone_sdk/34_critic_example.py | ❌ FAIL (Exit code 1) | 3.9s | -- |
| 01_standalone_sdk/36_event_json_to_openai_messages.py | ✅ PASS | 10.0s | $0.00 |
| 01_standalone_sdk/37_llm_profile_store.py | ✅ PASS | 4.0s | $0.00 |
| 02_remote_agent_server/01_convo_with_local_agent_server.py | ✅ PASS | 57.8s | $0.04 |
| 02_remote_agent_server/02_convo_with_docker_sandboxed_server.py | ❌ FAIL (Exit code 1) | 1m 5s | -- |
| 02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py | ❌ FAIL (Exit code 1) | 17.1s | -- |
| 02_remote_agent_server/04_convo_with_api_sandboxed_server.py | ❌ FAIL (Exit code 1) | 1m 11s | -- |
| 02_remote_agent_server/07_convo_with_cloud_workspace.py | ✅ PASS | 39.3s | $0.03 |
| 02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py | ❌ FAIL (Exit code 1) | 2m 59s | -- |
| 04_llm_specific_tools/01_gpt5_apply_patch_preset.py | ✅ PASS | 23.6s | $0.03 |
| 04_llm_specific_tools/02_gemini_file_tools.py | ✅ PASS | 1m 3s | $0.10 |
| 05_skills_and_plugins/01_loading_agentskills/main.py | ✅ PASS | 13.5s | $0.01 |
| 05_skills_and_plugins/02_loading_plugins/main.py | ✅ PASS | 12.6s | $0.01 |
❌ Some tests failed
Total: 38 | Passed: 33 | Failed: 5 | Total Cost: $1.75
Failed examples:
- examples/01_standalone_sdk/34_critic_example.py: Exit code 1
- examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py: Exit code 1
- examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py: Exit code 1
- examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py: Exit code 1
- examples/02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py: Exit code 1
Improved the workaround for the libtmux race condition where new_session() returns a Session with session_id=None.

Changes:
- Wrap entire session creation in retry loop (up to 5 attempts)
- Try to get session from server.sessions if session_id is None
- Add increasing delays between retries
- Clean up orphan sessions before retry
- Better error messages with last error included

The root issue is that neubig's PR #625 fix tries to parse 125+ format fields from tmux output, but tmux may not output all fields in some environments (especially Python 3.13 + PyInstaller + Docker), causing session_id (at index 92) to be missing from the parsed output.

Co-authored-by: openhands <openhands@all-hands.dev>
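The root issue described above comes down to how `zip()` behaves when one input is shorter than the other: it stops at the shorter one without raising. The toy reproduction below uses the field count (125) and the session_id index (92) from the commit message; the placeholder field names and the `parse_row` helpers are assumptions, not libtmux's actual parser.

```python
# Placeholder names for 125 format fields; only index 92 is meaningful here.
FIELDS = [f"field_{i}" for i in range(125)]
FIELDS[92] = "session_id"


def parse_row(values: list) -> dict:
    """Pair field names with values; zip() silently stops at the shorter input."""
    return dict(zip(FIELDS, values))


def parse_row_checked(values: list) -> dict:
    """Same pairing, but refuse rows that are missing fields."""
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} values, got {len(values)}")
    return dict(zip(FIELDS, values))


full = parse_row([str(i) for i in range(125)])
short = parse_row([str(i) for i in range(90)])  # tmux printed only 90 values

print(full["session_id"])       # the value at index 92 is present
print(short.get("session_id"))  # key is absent, so the lookup yields None
```

An explicit length check like the one in `parse_row_checked` turns the silent truncation into an immediate, descriptive error.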
🔄 Running Examples with
| Example | Status | Duration | Cost |
|---|---|---|---|
| 01_standalone_sdk/02_custom_tools.py | ✅ PASS | 26.0s | $0.03 |
| 01_standalone_sdk/03_activate_skill.py | ✅ PASS | 17.3s | $0.03 |
| 01_standalone_sdk/05_use_llm_registry.py | ✅ PASS | 12.5s | $0.01 |
| 01_standalone_sdk/07_mcp_integration.py | ✅ PASS | 40.0s | $0.03 |
| 01_standalone_sdk/09_pause_example.py | ✅ PASS | 15.5s | $0.01 |
| 01_standalone_sdk/10_persistence.py | ✅ PASS | 29.4s | $0.02 |
| 01_standalone_sdk/11_async.py | ✅ PASS | 35.9s | $0.04 |
| 01_standalone_sdk/12_custom_secrets.py | ✅ PASS | 20.5s | $0.02 |
| 01_standalone_sdk/13_get_llm_metrics.py | ✅ PASS | 20.8s | $0.01 |
| 01_standalone_sdk/14_context_condenser.py | ✅ PASS | 4m 48s | $0.56 |
| 01_standalone_sdk/17_image_input.py | ✅ PASS | 15.3s | $0.02 |
| 01_standalone_sdk/18_send_message_while_processing.py | ✅ PASS | 26.4s | $0.02 |
| 01_standalone_sdk/19_llm_routing.py | ✅ PASS | 13.4s | $0.02 |
| 01_standalone_sdk/20_stuck_detector.py | ✅ PASS | 12.8s | $0.02 |
| 01_standalone_sdk/21_generate_extraneous_conversation_costs.py | ✅ PASS | 10.6s | $0.00 |
| 01_standalone_sdk/22_anthropic_thinking.py | ✅ PASS | 19.7s | $0.01 |
| 01_standalone_sdk/23_responses_reasoning.py | ✅ PASS | 1m 31s | $0.01 |
| 01_standalone_sdk/24_planning_agent_workflow.py | ✅ PASS | 5m 12s | $0.39 |
| 01_standalone_sdk/25_agent_delegation.py | ✅ PASS | 1m 58s | $0.17 |
| 01_standalone_sdk/26_custom_visualizer.py | ✅ PASS | 22.8s | $0.02 |
| 01_standalone_sdk/28_ask_agent_example.py | ✅ PASS | 34.8s | $0.03 |
| 01_standalone_sdk/29_llm_streaming.py | ✅ PASS | 40.9s | $0.03 |
| 01_standalone_sdk/30_tom_agent.py | ✅ PASS | 17.4s | $0.02 |
| 01_standalone_sdk/31_iterative_refinement.py | ❌ FAIL: Timed out after 600 seconds | 10m 0s | -- |
| 01_standalone_sdk/32_configurable_security_policy.py | ✅ PASS | 16.7s | $0.02 |
| 01_standalone_sdk/34_critic_example.py | ❌ FAIL: Exit code 1 | 4.1s | -- |
| 01_standalone_sdk/36_event_json_to_openai_messages.py | ✅ PASS | 9.9s | $0.00 |
| 01_standalone_sdk/37_llm_profile_store.py | ✅ PASS | 3.9s | $0.00 |
| 02_remote_agent_server/01_convo_with_local_agent_server.py | ✅ PASS | 46.6s | $0.03 |
| 02_remote_agent_server/02_convo_with_docker_sandboxed_server.py | ❌ FAIL: Exit code 1 | 1m 2s | -- |
| 02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py | ❌ FAIL: Exit code 1 | 17.8s | -- |
| 02_remote_agent_server/04_convo_with_api_sandboxed_server.py | ❌ FAIL: Exit code 1 | 2m 22s | -- |
| 02_remote_agent_server/07_convo_with_cloud_workspace.py | ✅ PASS | 28.0s | $0.02 |
| 02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py | ❌ FAIL: Exit code 1 | 2m 48s | -- |
| 04_llm_specific_tools/01_gpt5_apply_patch_preset.py | ✅ PASS | 21.1s | $0.01 |
| 04_llm_specific_tools/02_gemini_file_tools.py | ✅ PASS | 40.6s | $0.05 |
| 05_skills_and_plugins/01_loading_agentskills/main.py | ✅ PASS | 11.1s | $0.01 |
| 05_skills_and_plugins/02_loading_plugins/main.py | ✅ PASS | 7.1s | $0.01 |
❌ Some tests failed
Total: 38 | Passed: 32 | Failed: 6 | Total Cost: $1.69
Failed examples:
- examples/01_standalone_sdk/31_iterative_refinement.py: Timed out after 600 seconds
- examples/01_standalone_sdk/34_critic_example.py: Exit code 1
- examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py: Exit code 1
- examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py: Exit code 1
- examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py: Exit code 1
- examples/02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py: Exit code 1
|
@OpenHands merge main. then carefully read all the surrounding context and debug why |
|
I'm on it! neubig can track my progress at all-hands.dev |
Co-authored-by: openhands <openhands@all-hands.dev>
The libtmux fix (neubig's PR #625) uses a format string with 125+ fields, but tmux may not output all fields correctly in some environments (Python 3.13 + PyInstaller + Docker), causing session_id (at index 92) to be missing from the parsed output. This adds a fallback that directly queries tmux using a simple format string ('#{session_id}:#{session_name}') to get the session_id when libtmux's complex format parsing fails. Co-authored-by: openhands <openhands@all-hands.dev>
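A fallback along the lines described in this commit message might be sketched as follows. Only the format string `#{session_id}:#{session_name}` comes from the commit message; the function names, the choice of `tmux list-sessions` as the query command, and the error handling are assumptions, not the code that was pushed.

```python
import subprocess
from typing import Optional

# Format string quoted in the commit message above.
SIMPLE_FORMAT = "#{session_id}:#{session_name}"


def parse_session_id(listing: str, session_name: str) -> Optional[str]:
    """Return the id (e.g. "$3") of the named session from "id:name" lines."""
    for line in listing.splitlines():
        session_id, sep, name = line.strip().partition(":")
        if sep and name == session_name and session_id.startswith("$"):
            return session_id
    return None


def query_session_id(session_name: str) -> Optional[str]:
    """Hypothetical helper: ask tmux directly instead of parsing 125+ fields."""
    try:
        proc = subprocess.run(
            ["tmux", "list-sessions", "-F", SIMPLE_FORMAT],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:  # tmux is not installed
        return None
    if proc.returncode != 0:  # e.g. no tmux server is running
        return None
    return parse_session_id(proc.stdout, session_name)
```

Keeping the parsing in its own function means it can be checked without a running tmux server.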
|
Looks like there are a few issues preventing this PR from being merged!
If you'd like me to help, just leave a comment, like Feel free to include any additional details that might help me get this PR into a better state. You can manage your notification settings |
Summary
I successfully debugged and fixed the
Root Cause Analysis
The test was failing with:
The issue was that the libtmux fix (neubig's PR #625) uses a format string with 125+ fields to parse tmux output. The
The existing retry logic tried to get the session from
Fix Implemented
Added a new fallback function
This fallback is tried before the existing
Changes Made
Verification
The remaining 3 failures are unrelated to the libtmux issue:
Commits Pushed
|
all-hands-bot
left a comment
🔴 Critical Design Issues - Do Not Merge Yet
Taste Rating: 🔴 Needs fundamental rework
This PR attempts to fix a real problem (libtmux race condition in Python 3.13) but shows signs that the upstream fix is incomplete. The 80+ lines of workaround/retry logic suggest we're building a band-aid on top of a band-aid.
Core Issue: If the upstream libtmux fix works as intended, why do we need extensive retry logic, fallback mechanisms, and manual session_id extraction? The answer is in your own comments: "tmux may not output all fields" - the upstream fix is incomplete.
Key Insight: Pinning to an unmerged git branch with extensive workarounds is the wrong approach. Either (1) wait for a proper upstream release, or (2) fix the root cause in libtmux properly, or (3) implement a simpler workaround that doesn't require 80 lines of retry spaghetti.
See detailed inline comments below.
|
Thanks for the thorough review! I agree with the assessment:
Leaving the review threads unresolved as reminders for when we revisit this PR after the upstream is ready. |
all-hands-bot
left a comment
🟡 Acceptable for Draft/Testing
Taste Rating: 🟡 Pragmatic for its purpose
This draft PR achieves its stated goal: testing the Python 3.13 upgrade with the libtmux race condition fix in CI. Previous architectural concerns (git branch pin, retry complexity) were discussed and marked resolved - the maintainer understands this is temporary until upstream releases to PyPI.
Key Observations
1. SDK_SHA infrastructure already exists ✅
- build.py was updated in Nov 2025 (commit 204d3a4) to read SDK_SHA with fallback to GITHUB_SHA
- The workflow changes in this PR correctly use this existing mechanism
- No changes to build.py needed
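The fallback order described in this observation (SDK_SHA first, then GITHUB_SHA) can be illustrated with a small sketch. This is not the contents of build.py; the function name `resolve_build_sha` and the treatment of empty values as unset are assumptions.

```python
import os
from typing import Mapping, Optional


def resolve_build_sha(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Prefer SDK_SHA, fall back to GITHUB_SHA; empty strings count as unset.

    Hypothetical helper for illustration, not the actual build.py logic.
    """
    return env.get("SDK_SHA") or env.get("GITHUB_SHA") or None
```

Treating an empty SDK_SHA as unset matters in GitHub Actions, where an undefined expression can expand to an empty string rather than leaving the variable absent.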
2. Variable naming is intentional ✅
- SDK_SHA (in server.yml) = build context for tagging Docker images
- AGENT_SERVER_SHA (in run-examples.yml) = runtime context for examples to pull images
- Different names reflect their different purposes in separate workflows
3. Comprehensive comments ✅
- The added comments explaining SHA handling make the confusing GitHub Actions variable behavior clearer
- Debug echo statements (server.yml:242-244) help verify SHA handling during testing
- Good engineering for maintainability
4. Type hints are clean ✅
- Generator[T] syntax is correct for Python 3.9+
- Changes are consistent across codebase
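For reference, a minimal sketch of the one-argument annotation next to the fully spelled-out form. The function names are illustrative only. One caveat worth keeping in mind: subscripting collections.abc.Generator with a single argument runs on Python 3.9+, but the defaults for the send and return types that make the short form meaningful to type checkers were added in Python 3.13.

```python
from collections.abc import Generator


def count_up(limit: int) -> Generator[int]:
    """Short form: yields ints; send and return types fall back to None."""
    for value in range(limit):
        yield value


def count_up_explicit(limit: int) -> Generator[int, None, None]:
    """Equivalent annotation with all three type arguments written out."""
    yield from count_up(limit)
```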
For Production Merge (when upstream is ready)
Before merging to production:
- ✅ Wait for upstream libtmux PR to be merged and released to PyPI (already noted in PR description)
- 🟢 Consider removing the debug echo statements or gating them behind a debug flag
- 🟡 Evaluate whether the extensive retry logic indicates the upstream fix needs more work
Verdict
✅ Approved for draft/testing purposes
The code is pragmatic for validating the libtmux fix in your CI environment. The previous review comprehensively covered the blockers for production merge, and those concerns remain valid - but this is explicitly a draft PR for testing, not for immediate production use.
The approach is sound for its stated purpose. Good luck with the testing! 🚀
|
Closing in favor of #2092 which has a much simpler fix. The root cause was discovered to be a locale issue, not a race condition in libtmux. When LC_ALL/LANG are not set in PyInstaller builds, tmux converts UTF-8 separator characters to underscores, breaking libtmux's format parsing. The fix is simply adding |
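The failure mode described in this closing comment can be sketched without tmux at all. The separator character below is an assumption chosen for illustration, and the substitution is modeled as one underscore per separator; the exact behavior of tmux may differ.

```python
# Assumed non-ASCII field separator, for illustration only.
SEPARATOR = "\u241e"


def split_fields(line: str, expected: int) -> list[str]:
    """Split one line of formatted output and insist on the expected field count."""
    fields = line.split(SEPARATOR)
    if len(fields) != expected:
        raise ValueError(f"expected {expected} fields, got {len(fields)}")
    return fields


# What the parser expects to see when the separator survives.
healthy = SEPARATOR.join(["$1", "work", "1"])

# Model of the same line after each separator has been turned into "_".
mangled = healthy.replace(SEPARATOR, "_")
```

With the separator intact, `split_fields(healthy, 3)` yields three fields; on the mangled line the split finds no separator at all, so every field after the first appears to be missing, which matches the symptom of a missing session_id.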
Summary
This PR upgrades the project to Python 3.13 and pins libtmux to neubig/libtmux@fix/new-session-race-condition which contains the fix for the race condition reported in libtmux#624.
Context
See upstream PR: tmux-python/libtmux#625
The issue was that new_session() in libtmux would:
1. Run tmux new-session -P -F#{session_id} to create the session
2. Run tmux list-sessions to fetch full session data
This created a race condition in Python 3.13 environments (especially with PyInstaller + Docker) where list-sessions might not see the newly created session yet, causing TmuxObjectDoesNotExist errors.
The fix expands the -F format string to include all Session fields and parses the output directly, eliminating the separate list-sessions query entirely.
Changes
- target-version from py312 to py313 in root pyproject.toml (ruff)
- pythonVersion from 3.12 to 3.13 in root pyproject.toml (pyright)
- server.yml build matrix from 3.12 to 3.13
- pypi-release.yml from 3.12 to 3.13
- pr-review action from 3.12 to 3.13
- libtmux @ git+https://github.com/neubig/libtmux.git@fix/new-session-race-condition
Testing
This PR needs integration tests to verify the libtmux fix works correctly in our CI environment. The integration-test label should trigger those tests.
Note
This is a draft PR to test the libtmux fix. Once the upstream PR is merged and released to PyPI, we should update the dependency to the released version.
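The approach described under Context (reading the session data from the -P output of new-session rather than issuing a second list-sessions query) might be sketched as follows. This is a simplified illustration, not the upstream libtmux change: the three-field list, the ":" separator, and all function names are assumptions.

```python
import subprocess

# Small illustrative subset; the upstream fix is described as using all Session fields.
FIELDS = ["session_id", "session_name", "session_windows"]
SEP = ":"  # assumed separator for this sketch


def build_new_session_cmd(name: str) -> list[str]:
    """Build a tmux command that creates a detached session and prints its fields."""
    fmt = SEP.join("#{" + field + "}" for field in FIELDS)
    return ["tmux", "new-session", "-d", "-P", "-F", fmt, "-s", name]


def parse_created_session(output: str) -> dict[str, str]:
    """Turn the single line printed by -P into a field dictionary."""
    values = output.rstrip("\n").split(SEP)
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


def create_session(name: str) -> dict[str, str]:
    """Create the session and parse its data in one step, with no second query."""
    proc = subprocess.run(
        build_new_session_cmd(name), capture_output=True, text=True, check=True
    )
    return parse_created_session(proc.stdout)
```

Because the data comes from the same command that created the session, there is no window in which a follow-up query could miss it.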
Related issues:
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
- eclipse-temurin:17-jdk
- nikolaik/python-nodejs:python3.13-nodejs22
- golang:1.21-bookworm
Pull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:40e712f-python
Run
All tags pushed for this build
About Multi-Architecture Support
- Each variant tag (e.g. 40e712f-python) is a multi-arch manifest supporting both amd64 and arm64
- Architecture-specific tags (e.g. 40e712f-python-amd64) are also available if needed