Skip to content

Conversation

@r142f
Copy link

@r142f r142f commented Feb 4, 2026

Add automatic nearest datacenter detection for connection pooling

Pull request type

  • Bugfix
  • Feature
  • Code style update (formatting, renaming)
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • Documentation content changes
  • Other (please describe):

What is the current behavior?

The SDK uses round-robin load balancing across all discovered endpoints without considering network latency, which can lead to suboptimal performance in multi-datacenter deployments.

Fixes: #446

What is the new behavior?

Adds opt-in nearest datacenter detection (enabled via DriverConfig(detect_local_dc=True)) that uses TCP connection racing to identify and prioritize the fastest-responding datacenter, reducing request latency in multi-DC deployments.

Other information

  • Fully backward compatible (disabled by default)

@r142f r142f force-pushed the local-dc-detection branch from 1cced85 to 139dde2 Compare February 5, 2026 03:45
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds an opt-in “nearest datacenter” selection mechanism for endpoint discovery, using a TCP connection race to prefer endpoints in the lowest-latency location when DriverConfig(detect_local_dc=True) is enabled.

Changes:

  • Add sync + async nearest_dc modules implementing TCP-race-based location detection.
  • Wire DC detection into sync and async discovery pools to mark endpoints in the detected location as preferred.
  • Add detect_local_dc flag to DriverConfig and introduce unit tests for both sync and async implementations.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
ydb/pool.py Uses detected local DC (when enabled) to set endpoint preference during discovery.
ydb/nearest_dc.py Implements synchronous TCP race DC detection helpers.
ydb/driver.py Adds detect_local_dc configuration flag to DriverConfig.
ydb/aio/pool.py Async discovery now prefers endpoints in the detected local DC.
ydb/aio/nearest_dc.py Implements asynchronous TCP race DC detection helpers.
tests/test_nearest_dc.py Adds unit tests for sync nearest DC detection behavior.
tests/aio/test_nearest_dc.py Adds unit tests for async nearest DC detection behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 266 to 270
if not self._ssl_required and resolved_endpoint.ssl:
continue

preferred = resolve_details.self_location == resolved_endpoint.location
preferred = local_dc == resolved_endpoint.location

Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

detect_local_dc only influences the preferred flag, but ConnectionsCache ignores the preferred list when DriverConfig.use_all_nodes is True (which is the default). As a result, enabling detect_local_dc=True may have no effect on endpoint selection for the default configuration. Consider either (a) updating the selection logic so preferred endpoints are prioritized even when use_all_nodes=True (while still falling back to others), or (b) documenting that detect_local_dc is only effective when use_all_nodes=False.

Copilot uses AI. Check for mistakes.
Comment on lines 179 to 183
if not self._ssl_required and resolved_endpoint.ssl:
continue

preferred = resolve_details.self_location == resolved_endpoint.location
preferred = local_dc == resolved_endpoint.location

Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as sync pool: the preferred flag set from local_dc is only used by ConnectionsCache when use_all_nodes is False. With the default use_all_nodes=True, detect_local_dc=True is unlikely to change endpoint selection. Either prioritize preferred endpoints even in use_all_nodes mode, or clarify in config/docs that detect_local_dc requires use_all_nodes=False to have an effect.

Copilot uses AI. Check for mistakes.
Comment on lines +137 to +162
if len(endpoints_by_location) == 1:
location = list(endpoints_by_location.keys())[0]
logger.info("Only one location found: %s", location)
return location

endpoints_to_test = []
for location, location_endpoints in endpoints_by_location.items():
sample = _get_random_endpoints(location_endpoints, max_per_location)
endpoints_to_test.extend(sample)
logger.debug(
"Selected %d/%d endpoints from location '%s' for testing",
len(sample),
len(location_endpoints),
location,
)

fastest_endpoint = _check_fastest_endpoint(endpoints_to_test, timeout=timeout)

if fastest_endpoint is None:
logger.warning("Failed to detect local DC via TCP race: no endpoint connected in time")
return None

detected_location = fastest_endpoint.location
logger.info("Detected local DC: %s", detected_location)

return detected_location
Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This helper logs at INFO/WARNING for common outcomes (single-location, success, and “no endpoint connected”), but callers (pool.py / aio/pool.py) also log these events, leading to duplicate log lines on every discovery cycle. Consider downgrading these to DEBUG here (and letting the pool layer handle INFO/WARNING), or add a way for the caller to control logging verbosity.

Copilot uses AI. Check for mistakes.
Comment on lines +138 to +163
if len(endpoints_by_location) == 1:
location = list(endpoints_by_location.keys())[0]
logger.info("Only one location found: %s", location)
return location

endpoints_to_test = []
for location, location_endpoints in endpoints_by_location.items():
sample = _get_random_endpoints(location_endpoints, max_per_location)
endpoints_to_test.extend(sample)
logger.debug(
"Selected %d/%d endpoints from location '%s' for testing",
len(sample),
len(location_endpoints),
location,
)

fastest_endpoint = await _check_fastest_endpoint(endpoints_to_test, timeout=timeout)

if fastest_endpoint is None:
logger.warning("Failed to detect local DC via TCP race: no endpoint connected in time")
return None

detected_location = fastest_endpoint.location
logger.info("Detected local DC: %s", detected_location)

return detected_location
Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

detect_local_dc logs INFO/WARNING in this helper, and the pool layer also logs success/failure, which can duplicate log lines during discovery. Consider making these logs DEBUG-only here (or making logging optional) and keeping user-facing logs in the pool layer where server-reported vs detected location is already included.

Copilot uses AI. Check for mistakes.
:param grpc_lb_policy_name: A load balancing policy to be used for discovery channel construction. Default value is `round_round`
:param discovery_request_timeout: A default timeout to complete the discovery. The default value is 10 seconds.
:param disable_discovery: If True, endpoint discovery is disabled and only the start endpoint is used for all requests.
:param detect_local_dc: If True, detect nearest datacenter using TCP latency measurement instead of using server-provided self_location.
Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring for detect_local_dc suggests it can be enabled independently, but with the current pool/cache logic it only affects routing when use_all_nodes=False (since preferred endpoints are ignored when using all nodes). Please either update the docstring to mention this interaction, or adjust the implementation so detect_local_dc=True has an effect even with use_all_nodes=True.

Copilot uses AI. Check for mistakes.
finally:
sock.close()

except (OSError, socket.timeout):
Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'except' clause does nothing but pass and there is no explanatory comment.

Suggested change
except (OSError, socket.timeout):
except (OSError, socket.timeout):
# Ignore expected connection errors; endpoints that fail simply lose the TCP race.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Python Detect and select nearest DB by tcp-pings

1 participant