Skip to content

Conversation

@njshah301
Copy link
Collaborator

@njshah301 njshah301 commented Jan 13, 2026

Introduces the metrics exporter for the MoSAPI system.

  • Implements MosApiMetrics to export TLD and service states to Cloud Monitoring.
  • Maps ICANN MoSAPI status codes to numeric gauges: 1 (UP), 0 (DOWN), and 2 (DISABLED/INCONCLUSIVE).
  • Sets MAX_TIMESERIES_PER_REQUEST to respect Cloud Monitoring API limits

This change is Reviewable

@njshah301 njshah301 changed the title Add MoSApiMetrics exporter Add MosApiMetrics exporter Jan 13, 2026
@njshah301 njshah301 enabled auto-merge January 13, 2026 09:17
@njshah301 njshah301 added kokoro:force-run Force a Kokoro build. and removed kokoro:force-run Force a Kokoro build. labels Jan 13, 2026
@njshah301 njshah301 force-pushed the njshah301/mosapi-metrics-logic branch from ee71228 to 44605b9 Compare January 13, 2026 10:08
Introduces the metrics exporter for the MoSAPI system.

- Implements `MosApiMetrics` to export TLD and service states to Cloud Monitoring.
- Maps ICANN status codes to numeric gauges: 1 (UP), 0 (DOWN), and 2 (DISABLED/INCONCLUSIVE).
- Sets `MAX_TIMESERIES_PER_REQUEST` to 195 to respect Cloud Monitoring API limits
@njshah301 njshah301 force-pushed the njshah301/mosapi-metrics-logic branch from 44605b9 to 9fdf976 Compare January 13, 2026 16:10
@njshah301 njshah301 requested a review from weiminyu January 18, 2026 07:45
Copy link
Collaborator

@gbrodman gbrodman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gbrodman reviewed 4 files and all commit messages, and made 18 comments.
Reviewable status: 4 of 6 files reviewed, 18 unresolved discussions (waiting on @CydeWeys, @njshah301, @ptkach, and @weiminyu).


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 103 at r1 (raw file):

    // Initialize Metric Descriptors once on startup
    ensureMetricDescriptors();

Don't do work in the constructor. It's bad practice, as it can lead to unexpected oddities.


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 146 at r1 (raw file):

          } catch (Exception e) {
            logger.atWarning().withCause(e).log(
                "Failed to create metric descriptors (they may already exist).");

if this is expected to happen regularly, it probably shouldn't be in an exception / warning case


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 156 at r1 (raw file):

      String displayName,
      String description,
      String metricKind,

this is always GAUGE right?


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 161 at r1 (raw file):

      throws IOException {

    List<LabelDescriptor> labelDescriptors = new ArrayList<>();

Use stream-collect (toImmutableList()) instead of manually constructing these


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 190 at r1 (raw file):

          try {
            pushBatchMetrics(states);
          } catch (Throwable t) {

Generally, avoid catching a raw Throwable. It can hide things. We should be more precise.


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 196 at r1 (raw file):

  }

  private void pushBatchMetrics(List<TldServiceState> states) throws IOException {

One batch failure shouldn't cause all the others to fail with an IOException. Catch the exception where it occurs, log it, and then try the next batch.


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 197 at r1 (raw file):

  private void pushBatchMetrics(List<TldServiceState> states) throws IOException {
    List<TimeSeries> allTimeSeries = new ArrayList<>();

Constructing a list piece by piece over many methods is brittle and it's not immediately clear where everything happens. Instead, convert the stream of TldServiceState into a stream of TimeSeries using flatMap (flatMap instead of just map because it seems like one TldServiceState maps to multiple TimeSeries objects)


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 199 at r1 (raw file):

    List<TimeSeries> allTimeSeries = new ArrayList<>();
    TimeInterval interval =
        new TimeInterval().setEndTime(new DateTime(System.currentTimeMillis()).toString());

Inject a clock and use that for the time instead of calling System


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 206 at r1 (raw file):

      // 2. Service-level Metrics
      Map<String, ServiceStatus> services = state.serviceStatuses();

state::serviceStatuses should use an immutable map if it doesn't already

can this ever even be null? if not, don't worry about the null case.


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 227 at r1 (raw file):

  private void addServiceMetrics(
      List<TimeSeries> list,

Use the full type (ImmutableList, ImmutableMap etc) whenever possible for clarity (this goes for elsewhere in the file, too)

If we ever specify the standard List/Map type it's usually because we're forced to by an API we don't control


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 254 at r1 (raw file):

      String suffix, Map<String, String> labels, Number val, TimeInterval interval) {
    Metric metric = new Metric().setType(METRIC_DOMAIN + suffix).setLabels(labels);
    MonitoredResource resource =

i don't think we need to recreate this object every time


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 304 at r1 (raw file):

   */
  private long parseServiceStatus(String status) {
    if (status == null) {

By the time we've reached this point in the code, we should be guaranteed that the status has a non-null value. If the status is null, that means (as far as I'm aware) that the response from MosAPI was bad / wrong in a major way. If that happens, we should be taking care of that error when it happens, not here down the line.


core/src/main/java/google/registry/config/files/default-config.yaml line 651 at r1 (raw file):

  # Metrics Reporting Thread Count
  # Set to 1. Given the current TLD volume and the 5-minute reporting interval,

better phrasing "Defaults to 1".

The rest of this should be phrased in a way that it can apply broadly to any user of Nomulus, not just us. Give guidance on how other people should properly set this field.


core/src/main/java/google/registry/config/RegistryConfigSettings.java line 276 at r1 (raw file):

    public List<String> services;
    public int tldThreadCnt;

no extra line


core/src/main/java/google/registry/mosapi/module/MosApiModule.java line 208 at r1 (raw file):

  /**
   * Provides a single-threaded executor for sequential metrics reporting.

This is a config point, so it's not necessary single-threaded.


core/src/main/java/google/registry/mosapi/module/MosApiModule.java line 210 at r1 (raw file):

   * Provides a single-threaded executor for sequential metrics reporting.
   *
   * <p>Bound to 1 thread because the Google Cloud Monitoring exporter processes batches

"round"?


core/src/test/java/google/registry/mosapi/MosApiMetricsTest.java line 40 at r1 (raw file):

import org.junit.jupiter.api.Test;
import org.mockito.ArgumentCaptor;

nit: link the class being tested in a class-level javadoc comment like we do for other test classes


core/src/main/java/google/registry/config/RegistryConfig.java line 1472 at r1 (raw file):

    @Provides
    @Config("mosapiMetricsThreadCnt")

Use the full name Count, there's no point in saving only two letters when it's already this long

same for the config field too

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants