adds daily task to update global bot leaderboard, refactors command for convenience #3933
base: main
Conversation
The only thing that happens here is factoring out the handle action so that it can be called from a task.
```python
bootstrap_iterations = 30


# SETUP: users to evaluate & questions
print("Initializing...")
```
Prints typically won't be visible in Django command handlers, so I suggest using logging instead.
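A minimal sketch of what the logging-based version could look like (logger name and message are illustrative):

```python
import logging

from django.core.management.base import BaseCommand

logger = logging.getLogger(__name__)


class Command(BaseCommand):
    def handle(self, *args, **options):
        # unlike print(), this goes through Django's LOGGING config,
        # so it shows up wherever cron/worker output is collected
        logger.info("Initializing...")
```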
```python
metadata__bot_details__metac_bot=True,
metadata__bot_details__include_in_calculations=True,
metadata__bot_details__display_in_leaderboard=True,
```
It looks like this might be a slow operation, since we don't index the User.metadata column.
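If User.metadata is a Postgres JSONField (an assumption; the field definition isn't shown in this diff), one option worth evaluating would be a GIN index, sketched below:

```python
# Hypothetical index addition, assuming metadata is a Postgres JSONField.
from django.contrib.auth.models import AbstractUser
from django.contrib.postgres.indexes import GinIndex
from django.db import models


class User(AbstractUser):  # existing model, shown only for context
    metadata = models.JSONField(default=dict)

    class Meta:
        indexes = [
            # GIN indexes accelerate JSONB containment lookups; whether the
            # ORM compiles these key-path filters into index-friendly
            # operators should be verified with EXPLAIN before relying on it.
            GinIndex(fields=["metadata"]),
        ]
```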
```python
related_posts__post__default_project_id__in=[
    3349,   # aib q3 2024
    32506,  # aib q4 2024
    32627,  # aib q1 2025
    32721,  # aib q2 2025
    32813,  # aib fall 2025
]
```
Is there a way to refactor this so we don’t hardcode these values here?
At the very least, I’d move these IDs into constants.py.
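For example (the constant name is a suggestion, not from the PR):

```python
# scoring/constants.py
AIB_TOURNAMENT_PROJECT_IDS = [
    3349,   # aib q3 2024
    32506,  # aib q4 2024
    32627,  # aib q1 2025
    32721,  # aib q2 2025
    32813,  # aib fall 2025
]
```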
```python
.exclude(related_posts__post__default_project__slug__startswith="minibench")
```
As I mentioned here, I don't like this slug approach
📝 Walkthrough

The changes implement daily scheduled updates for the Global Bot Leaderboard. A new cron job is added to trigger at 05:00 UTC, which invokes a wrapper function in the jobs layer that executes refactored leaderboard update logic. The update function is restructured as a callable runner instead of a Django Command class.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Scheduler as Cron Scheduler
    participant Conn as Connection Manager
    participant Jobs as Jobs Layer
    participant LB as Leaderboard DB
    participant Update as Update Function
    Scheduler->>Scheduler: Trigger at 05:00 UTC
    Scheduler->>Conn: close_old_connections(update_gobal_bot_leaderboard)
    Conn->>Jobs: execute update_gobal_bot_leaderboard()
    Jobs->>LB: Query for "Global Bot Leaderboard"
    LB-->>Jobs: Leaderboard object or None
    alt Leaderboard Found
        Jobs->>Update: run_update_global_bot_leaderboard()
        Update->>Update: gather_data() with cache=True
        Update->>Update: Compute scores & rankings
        Update->>LB: Update leaderboard entries
        LB-->>Update: Confirmation
        Update-->>Jobs: Success
    else Leaderboard Not Found
        Jobs->>Jobs: Log warning
        Jobs-->>Conn: Return
    end
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 3 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 7
🤖 Fix all issues with AI agents
In `@misc/management/commands/cron.py`:
- Around line 212-218: The two scheduled jobs update_custom_leaderboards and
update_gobal_bot_leaderboard are both added via scheduler.add_job using
CronTrigger.from_crontab("0 5 * * *") and will run concurrently; change one cron
expression to a staggered time (e.g., set update_gobal_bot_leaderboard to "0 6 *
* *") so the long-running update_gobal_bot_leaderboard does not overlap with
update_custom_leaderboards, keeping id, max_instances, replace_existing handling
the same.
In `@scoring/jobs.py`:
- Line 19: Rename the misspelled function update_gobal_bot_leaderboard to
update_global_bot_leaderboard and update every reference to it (including the
cron job ID and any imports) so names stay consistent; specifically change the
import and usage in misc/management/commands/cron.py to import
update_global_bot_leaderboard and update the cron registration/ID that currently
uses the typo, then run a quick search for update_gobal_bot_leaderboard to
replace remaining occurrences.
In `@scoring/management/commands/update_global_bot_leaderboard.py`:
- Around line 805-821: The loop over ordered_skills is causing an N+1 DB hit via
User.objects.get(id=uid); fix by collecting all integer uids used in that loop
(where uid is not a str) and doing a single query like
User.objects.filter(id__in=collected_ids) to build a mapping uid_to_username,
then inside the loop replace User.objects.get(id=uid).username with
uid_to_username.get(uid, "<missing>") (keeping the existing behavior for str
uids where username = uid) so no per-iteration queries occur; update references
in the same function where unevaluated and player_stats are used to read from
the prebuilt mapping instead of calling User.objects.get.
- Around line 655-665: The per-user loop triggers an N+1 query by calling
Score.objects.filter(user=user,
question__in=question_list).distinct("question_id").count() for each user;
instead run a single aggregated query like Score.objects.filter(user__in=users,
question__in=question_list).values("user").annotate(count=Count("question",
distinct=True)) to get distinct question counts per user, then populate
scored_question_counts from that result (referencing scored_question_counts,
users, question_list and the Score model) rather than counting inside the loop.
- Around line 726-731: The current use of Leaderboard.objects.get_or_create(...)
with multiple fields risks creating duplicates when non-lookup fields change;
change the logic to call Leaderboard.objects.get_or_create(name="Global Bot
Leaderboard") to perform lookup by name only, then set or update the remaining
attributes (project=Project.objects.get(type=Project.ProjectTypes.SITE_MAIN),
score_type=LeaderboardScoreTypes.MANUAL,
bot_status=Project.BotLeaderboardStatus.BOTS_ONLY) on the returned leaderboard
instance and call save() so existing records are updated instead of creating
duplicates; ensure you still handle the created flag appropriately from
Leaderboard.objects.get_or_create.
- Around line 743-747: The loop performs N+1 queries by calling
User.objects.get(id=uid) for each integer uid; instead, before iterating gather
all integer uids used in that loop, fetch users in bulk via
User.objects.filter(id__in=ids) (or User.objects.in_bulk(ids)) in the
update_global_bot_leaderboard logic, build a mapping from id to User, and
replace the per-iteration User.objects.get call with a lookup like
users_by_id[uid] (handle missing users gracefully); this removes the
per-iteration DB hit while preserving the existing checks on
user.metadata["bot_details"] and display_in_leaderboard.
- Around line 618-648: The Post.default_project foreign key lacks an index and
is used heavily in joins (see filters/excludes on
related_posts__post__default_project__* in update_global_bot_leaderboard), so
add db_index=True to the ForeignKey declaration for default_project on the Post
model, create and commit the generated migration (makemigrations), and run/apply
the migration (migrate) to ensure the DB has the new index; this will speed up
joins against Project.default_permission and related filters.
♻️ Duplicate comments (4)
scoring/management/commands/update_global_bot_leaderboard.py (4)
144-180: Hardcoded project IDs create maintenance burden and coupling. These hardcoded project IDs (lines 146-152, 155-159) and the mapping logic are fragile - any new AIB project requires code changes. This was already flagged in a previous review.
627-633: Hardcoded AIB project IDs duplicated from earlier in the file. These IDs (3349, 32506, 32627, 32721, 32813) duplicate lines 146-152. This was flagged in a previous review - consider extracting to constants.
639-639: Slug-based exclusion approach was flagged in a previous review. The `.exclude(related_posts__post__default_project__slug__startswith="minibench")` approach was already discussed as problematic.
778-892: Debug/test output in production code. The extensive print statements for results display (lines 779-835) and statistical tests (lines 843-892) appear to be debug/development artifacts. In a cron job context, these won't be visible and add execution overhead. The previous review flagged that prints won't be visible in Django command handlers.
Consider using logging at DEBUG level or removing these diagnostic outputs for production runs.
🧹 Nitpick comments (4)
scoring/management/commands/update_global_bot_leaderboard.py (2)
107-141: Default parameter change from `False` to `True` may have unintended side effects.

Changing `cache: bool = True` means all callers now default to caching behavior. This could cause stale data issues if the cache file (HtH_score_data.csv) exists but contains outdated information. The code also writes to a file in the current working directory, which may vary depending on execution context (cron job vs manual invocation). Consider:
- Using an absolute path or Django's MEDIA_ROOT/temp directory for cache files
- Adding cache invalidation logic (e.g., timestamp-based expiry)
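A sketch of what both suggestions could look like together (path and max age are illustrative assumptions):

```python
import os
import time

from django.conf import settings

# store the cache under MEDIA_ROOT instead of the current working directory
CACHE_PATH = os.path.join(settings.MEDIA_ROOT, "HtH_score_data.csv")
CACHE_MAX_AGE_SECONDS = 60 * 60 * 24  # one day, matching the daily cron cadence


def cache_is_fresh(path: str = CACHE_PATH) -> bool:
    # a missing or stale file counts as a cache miss
    if not os.path.exists(path):
        return False
    return (time.time() - os.path.getmtime(path)) < CACHE_MAX_AGE_SECONDS
```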
710-718: Consider adding `strict=True` to `zip()` for safety.

Static analysis flagged line 714's `zip()` call without `strict=`. While the lists should be the same length by construction, adding `strict=True` would catch bugs if they diverge.

♻️ Proposed fix

```diff
- for u1id, u2id, qid in zip(user1_ids, user2_ids, question_ids):
+ for u1id, u2id, qid in zip(user1_ids, user2_ids, question_ids, strict=True):
```

scoring/jobs.py (2)
20-25: Existence check queries the leaderboard but doesn't use the result.

The function fetches `global_bot_leaderboard` to check existence but never uses it; `run_update_global_bot_leaderboard()` will perform its own `get_or_create`. This is defensive but slightly wasteful. Consider either:
- Removing the check, since `run_update_global_bot_leaderboard` handles creation
- Passing the leaderboard instance to avoid duplicate queries
26-29: Use `logger.exception()` instead of `logger.error()` to capture the stack trace.

Per static analysis, `logger.exception()` automatically includes the traceback, which is essential for debugging failures in a cron job.

♻️ Proposed fix

```diff
  try:
      run_update_global_bot_leaderboard()
  except Exception as e:
-     logger.error(f"Error updating Global Bot Leaderboard: {e}")
+     logger.exception("Error updating Global Bot Leaderboard")
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- misc/management/commands/cron.py
- scoring/jobs.py
- scoring/management/commands/update_global_bot_leaderboard.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-01-15T19:29:58.940Z
Learnt from: hlbmtc
Repo: Metaculus/metaculus PR: 4075
File: authentication/urls.py:24-26
Timestamp: 2026-01-15T19:29:58.940Z
Learning: In this codebase, DRF is configured to use IsAuthenticated as the default in REST_FRAMEWORK['DEFAULT_PERMISSION_CLASSES'] within metaculus_web/settings.py. Therefore, explicit permission_classes([IsAuthenticated]) decorators are unnecessary on DRF views unless a view needs to override the default. When reviewing Python files, verify that views relying on the default are not redundantly decorated, and flag cases where permissions are being over-specified or when a non-default permission is explicitly required.
Applied to files:
- scoring/management/commands/update_global_bot_leaderboard.py
- misc/management/commands/cron.py
- scoring/jobs.py
🧬 Code graph analysis (3)
scoring/management/commands/update_global_bot_leaderboard.py (3)
scoring/models.py (3)
- Score (17-56)
- Leaderboard (97-280)
- LeaderboardEntry (302-354)

projects/models.py (4)
- Project (212-495)
- BotLeaderboardStatus (248-252)
- save (404-439)
- save (603-608)

scoring/constants.py (1)
- LeaderboardScoreTypes (17-57)
misc/management/commands/cron.py (1)
scoring/jobs.py (1)
- update_gobal_bot_leaderboard (19-29)
scoring/jobs.py (2)
scoring/management/commands/update_global_bot_leaderboard.py (1)
- run_update_global_bot_leaderboard (603-892)

scoring/models.py (1)
- Leaderboard (97-280)
🪛 Ruff (0.14.13)
scoring/management/commands/update_global_bot_leaderboard.py
714-714: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
900-900: Unused method argument: args
(ARG002)
900-900: Unused method argument: options
(ARG002)
scoring/jobs.py
28-28: Do not catch blind exception: Exception
(BLE001)
29-29: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Build Docker Image / Build Docker Image
- GitHub Check: Backend Checks
- GitHub Check: integration-tests
🔇 Additional comments (5)
scoring/management/commands/update_global_bot_leaderboard.py (3)
189-191: Early return for cached questions improves performance. This optimization correctly skips questions that are already in the cache, reducing redundant processing time.
603-614: Refactoring to a standalone function is a good pattern for cron integration. Exposing `run_update_global_bot_leaderboard()` as a standalone callable makes it reusable from both the Django management command and the scheduled job. The baseline player ID (236038) is still hardcoded though - consider making it configurable or documenting why this specific user is the baseline.
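One way the baseline could be made configurable (the setting name is hypothetical):

```python
from django.conf import settings

# falls back to the currently hardcoded baseline user when the setting is absent
BASELINE_PLAYER_ID = getattr(settings, "GLOBAL_BOT_LEADERBOARD_BASELINE_ID", 236038)
```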
895-901: Command class correctly delegates to the runner function. The thin wrapper pattern is appropriate. The unused `args` and `options` parameters flagged by static analysis are part of the standard Django Command signature and can be safely ignored.

misc/management/commands/cron.py (2)
29-29: Import inherits the typo from scoring/jobs.py. Update this import if/when the function name is corrected in scoring/jobs.py.
212-218: Job configuration follows established patterns correctly. The job uses the `close_old_connections` wrapper, `max_instances=1`, and `replace_existing=True`, consistent with other scoring jobs. Good implementation.
```python
scheduler.add_job(
    close_old_connections(update_gobal_bot_leaderboard),
    trigger=CronTrigger.from_crontab("0 5 * * *"),  # Every day at 05:00 UTC
    id="update_gobal_bot_leaderboard",
    max_instances=1,
    replace_existing=True,
)
```
Scheduling conflict: both update_custom_leaderboards and update_gobal_bot_leaderboard run at 05:00 UTC.
Both jobs are scheduled at the same time (lines 207 and 214). Since update_gobal_bot_leaderboard takes ~20 minutes as noted in the PR description, running them concurrently could cause resource contention. Consider staggering the times.
♻️ Suggested fix - stagger to 06:00 UTC

```diff
  scheduler.add_job(
      close_old_connections(update_gobal_bot_leaderboard),
-     trigger=CronTrigger.from_crontab("0 5 * * *"),  # Every day at 05:00 UTC
+     trigger=CronTrigger.from_crontab("0 6 * * *"),  # Every day at 06:00 UTC
      id="update_gobal_bot_leaderboard",
      max_instances=1,
      replace_existing=True,
  )
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
scheduler.add_job(
    close_old_connections(update_gobal_bot_leaderboard),
    trigger=CronTrigger.from_crontab("0 6 * * *"),  # Every day at 06:00 UTC
    id="update_gobal_bot_leaderboard",
    max_instances=1,
    replace_existing=True,
)
```
🤖 Prompt for AI Agents
In `@misc/management/commands/cron.py` around lines 212 - 218, The two scheduled
jobs update_custom_leaderboards and update_gobal_bot_leaderboard are both added
via scheduler.add_job using CronTrigger.from_crontab("0 5 * * *") and will run
concurrently; change one cron expression to a staggered time (e.g., set
update_gobal_bot_leaderboard to "0 6 * * *") so the long-running
update_gobal_bot_leaderboard does not overlap with update_custom_leaderboards,
keeping id, max_instances, replace_existing handling the same.
```python
logger = logging.getLogger(__name__)


def update_gobal_bot_leaderboard():
```
Typo in function name: update_gobal_bot_leaderboard should be update_global_bot_leaderboard.
The function name has a typo ("gobal" vs "global"). This typo propagates to the cron job ID and import. Fix this before it becomes a maintenance headache.
♻️ Proposed fix

```diff
-def update_gobal_bot_leaderboard():
+def update_global_bot_leaderboard():
```

Also update the import in misc/management/commands/cron.py accordingly.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
def update_global_bot_leaderboard():
```
🤖 Prompt for AI Agents
In `@scoring/jobs.py` at line 19, Rename the misspelled function
update_gobal_bot_leaderboard to update_global_bot_leaderboard and update every
reference to it (including the cron job ID and any imports) so names stay
consistent; specifically change the import and usage in
misc/management/commands/cron.py to import update_global_bot_leaderboard and
update the cron registration/ID that currently uses the typo, then run a quick
search for update_gobal_bot_leaderboard to replace remaining occurrences.
```python
questions: QuerySet[Question] = (
    Question.objects.filter(
        Q(
            related_posts__post__default_project__default_permission__in=[
                "viewer",
                "forecaster",
            ]
        )
        | Q(
            related_posts__post__default_project_id__in=[
                3349,   # aib q3 2024
                32506,  # aib q4 2024
                32627,  # aib q1 2025
                32721,  # aib q2 2025
                32813,  # aib fall 2025
            ]
        ),
        related_posts__post__curation_status=Post.CurationStatus.APPROVED,
        resolution__isnull=False,
        scheduled_close_time__lte=timezone.now(),
    )
    .exclude(related_posts__post__default_project__slug__startswith="minibench")
    .exclude(resolution__in=UnsuccessfulResolutionType)
    .filter(Exists(user_forecast_exists))
    .prefetch_related(  # only prefetch forecasts from those users
        Prefetch(
            "user_forecasts", queryset=Forecast.objects.filter(author__in=users)
        )
    )
    .order_by("id")
    .distinct("id")
)

###############
# make sure they have at least 100 resolved questions
print("initialize list")
question_list = list(questions)
print("Filtering users.")
scored_question_counts: dict[int, int] = defaultdict(int)
c = users.count()
i = 0
for user in users:
    i += 1
    print(i, "/", c, end="\r")
    scored_question_counts[user.id] = (
        Score.objects.filter(user=user, question__in=question_list)
        .distinct("question_id")
        .count()
    )
excluded_ids = [
    uid for uid, count in scored_question_counts.items() if count < 100
]
users = users.exclude(id__in=excluded_ids)
###############
print("Initializing... DONE")

# Gather head to head scores
user1_ids, user2_ids, question_ids, scores, weights = gather_data(
    users, questions
)

# choose baseline player if not already chosen
if not baseline_player:
    baseline_player = max(
        set(user1_ids) | set(user2_ids), key=(user1_ids + user2_ids).count
    )

# get variance of average scores (used in rescaling)
avg_scores = get_avg_scores(user1_ids, user2_ids, scores, weights)
var_avg_scores = (
    np.var(np.array(list(avg_scores.values()))) if len(avg_scores) > 1 else 0
)
```
🧩 Analysis chain
🏁 Scripts executed (repository: Metaculus/metaculus):

```shell
find . -name "models.py" | head -20
rg -n "class Post|class Question|class Forecast|class Project" --type python | grep models.py | head -20

# Check for indexes on Post model
rg -n "class Meta" -A 20 posts/models.py | grep -E "class Meta|indexes|db_index" | head -30

# Check for indexes on Question model
rg -n "class Meta" -A 20 questions/models.py | grep -E "class Meta|indexes|db_index" | head -30

# Look for specific field definitions on Post and Project models
rg -n "default_project|default_permission|curation_status" posts/models.py questions/models.py projects/models.py

# Get the Post model indexes around line 1023
sed -n '1017,1070p' posts/models.py

# Get the Question model indexes
sed -n '388,410p' questions/models.py
sed -n '613,640p' questions/models.py
sed -n '703,730p' questions/models.py

# Check if curation_status and default_project fields have db_index
rg -n "curation_status|default_project" posts/models.py | grep "db_index\|models\."

# Get the full Post model definition around curation_status and default_project fields
sed -n '570,590p' posts/models.py
sed -n '810,830p' posts/models.py

# Check for any indexes on Post model itself
sed -n '850,920p' posts/models.py

# Look for the Post model Meta class
rg -n "class Post\(|class Meta:" posts/models.py | head -20

# Get the Post model Meta class (the first one at line 986)
sed -n '986,1016p' posts/models.py

# Find the actual Post class Meta definition - it should be after line 541
sed -n '900,1000p' posts/models.py | grep -n "class Meta" -A 15

# Also check if there's any Meta class for Post model
rg -n "^class Post\(" posts/models.py -A 450 | grep "class Meta" -A 20 | head -40

# Find the end of the Post class definition
sed -n '541,985p' posts/models.py | tail -50

# Check if Post class has any Meta class at all by looking for all Meta classes
rg -n "^class.*\(.*\):" posts/models.py | head -20

# Check if Post class has a Meta definition between lines 541-949
sed -n '541,949p' posts/models.py | grep -n "class Meta" -A 30

# Also check default_project foreign key - does it have db_index?
sed -n '817,830p' posts/models.py

# Check the Question model and its related_posts relationship
rg -n "class Question|related_posts|class Meta" questions/models.py | head -40

# Check if there are any indexes involving default_project or FK relationships in Post model
rg -n "default_project|Index" posts/models.py | head -40

# Check the Post model indexes around line 1024
sed -n '1024,1050p' posts/models.py

# Check if Question model has a related_posts relationship
sed -n '39,200p' questions/models.py | grep -E "related_posts|ForeignKey|ManyToMany"

# Find the related_posts relationship definition
rg -n "related_posts" questions/models.py posts/models.py | head -20

# Check the Project model for default_permission field
rg -n "default_permission|class Meta" projects/models.py | head -40

# Get the default_permission field definition in Project model
sed -n '333,345p' projects/models.py

# Search for related_posts more broadly - might be a reverse relation
rg -n "related_posts\|RelatedPostsQuestion\|through=" questions/models.py posts/models.py

# Look for any model with related_posts in its name or definition
rg -n "class.*Related.*Posts|related_posts.*=" questions/models.py posts/models.py

# Search the entire codebase for related_posts definition
rg -n "related_posts" . --type python | head -20

# Search for related_posts in the codebase
rg -n "related_posts" . | head -20

# Look at the review comment file to understand the query better
sed -n '618,648p' scoring/management/commands/update_global_bot_leaderboard.py

# Find the model that has a ForeignKey to Question with related_name="related_posts"
rg -n "ForeignKey.*Question|related_name.*related_posts" --type-list | head -5

# Search for the related_posts definition
rg -n "related_name\s*=\s*[\"']related_posts" .

# Search for all ForeignKey definitions that might create related_posts
rg -n "related_name" posts/models.py questions/models.py | grep -i related

# Look for any model that has a post ForeignKey
rg -n "post\s*=\s*models\.ForeignKey" . -A 2 | head -40

# Check misc/models.py for related_posts definition
sed -n '1,100p' misc/models.py

# Look for the model that has related_name="related_posts"
rg -n "related_name\s*=\s*" misc/models.py | head -20
```
Add database index on Post.default_project FK for query join performance.
The query filters on multiple join paths through related_posts__post__default_project__ including default_permission and default_project_id, plus traverses this FK in exclude clauses. While Post.curation_status has db_index=True and Project.default_permission is indexed, the critical Post.default_project ForeignKey field itself lacks an explicit index. Since this job processes many questions and takes ~20 minutes, adding db_index=True to the default_project FK definition will improve join performance on these high-cardinality traversals.
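The change itself would be small, assuming a conventional ForeignKey declaration (the field options below are illustrative). Note that Django indexes ForeignKey columns by default, so an explicit db_index=True only changes anything if the index was previously disabled; worth confirming against the actual schema first:

```python
# posts/models.py (sketch; on_delete and related_name are assumptions)
default_project = models.ForeignKey(
    Project,
    on_delete=models.CASCADE,
    related_name="default_posts",
    db_index=True,  # the added index
)
```

Followed by `python manage.py makemigrations posts` and `python manage.py migrate`.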
🤖 Prompt for AI Agents
In `@scoring/management/commands/update_global_bot_leaderboard.py` around lines
618 - 648, The Post.default_project foreign key lacks an index and is used
heavily in joins (see filters/excludes on
related_posts__post__default_project__* in update_global_bot_leaderboard), so
add db_index=True to the ForeignKey declaration for default_project on the Post
model, create and commit the generated migration (makemigrations), and run/apply
the migration (migrate) to ensure the DB has the new index; this will speed up
joins against Project.default_permission and related filters.
```python
scored_question_counts: dict[int, int] = defaultdict(int)
c = users.count()
i = 0
for user in users:
    i += 1
    print(i, "/", c, end="\r")
    scored_question_counts[user.id] = (
        Score.objects.filter(user=user, question__in=question_list)
        .distinct("question_id")
        .count()
    )
```
N+1 query pattern: counting scores per user in a loop.
This loop executes a separate Score.objects.filter(...).count() query for each user, creating an N+1 query problem. With many bot users, this compounds the already slow execution time.
♻️ Suggested optimization using annotation

```diff
- scored_question_counts: dict[int, int] = defaultdict(int)
- c = users.count()
- i = 0
- for user in users:
-     i += 1
-     print(i, "/", c, end="\r")
-     scored_question_counts[user.id] = (
-         Score.objects.filter(user=user, question__in=question_list)
-         .distinct("question_id")
-         .count()
-     )
+ from django.db.models import Count
+ scored_question_counts = dict(
+     Score.objects.filter(user__in=users, question__in=question_list)
+     .values("user_id")
+     .annotate(q_count=Count("question_id", distinct=True))
+     .values_list("user_id", "q_count")
+ )
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
from django.db.models import Count

scored_question_counts = dict(
    Score.objects.filter(user__in=users, question__in=question_list)
    .values("user_id")
    .annotate(q_count=Count("question_id", distinct=True))
    .values_list("user_id", "q_count")
)
```
🤖 Prompt for AI Agents
In `@scoring/management/commands/update_global_bot_leaderboard.py` around lines
655 - 665, The per-user loop triggers an N+1 query by calling
Score.objects.filter(user=user,
question__in=question_list).distinct("question_id").count() for each user;
instead run a single aggregated query like Score.objects.filter(user__in=users,
question__in=question_list).values("user").annotate(count=Count("question",
distinct=True)) to get distinct question counts per user, then populate
scored_question_counts from that result (referencing scored_question_counts,
users, question_list and the Score model) rather than counting inside the loop.
```python
leaderboard, _ = Leaderboard.objects.get_or_create(
    name="Global Bot Leaderboard",
    project=Project.objects.get(type=Project.ProjectTypes.SITE_MAIN),
    score_type=LeaderboardScoreTypes.MANUAL,
    bot_status=Project.BotLeaderboardStatus.BOTS_ONLY,
)
```
get_or_create may create duplicate leaderboards if other fields vary.
Using get_or_create with name, project, score_type, and bot_status means if any of these fields change, a new leaderboard is created rather than updating the existing one. Consider using get_or_create with only name as the lookup, then updating other fields explicitly.
♻️ Safer approach

```diff
- leaderboard, _ = Leaderboard.objects.get_or_create(
-     name="Global Bot Leaderboard",
-     project=Project.objects.get(type=Project.ProjectTypes.SITE_MAIN),
-     score_type=LeaderboardScoreTypes.MANUAL,
-     bot_status=Project.BotLeaderboardStatus.BOTS_ONLY,
- )
+ leaderboard, created = Leaderboard.objects.get_or_create(
+     name="Global Bot Leaderboard",
+     defaults={
+         "project": Project.objects.get(type=Project.ProjectTypes.SITE_MAIN),
+         "score_type": LeaderboardScoreTypes.MANUAL,
+         "bot_status": Project.BotLeaderboardStatus.BOTS_ONLY,
+     },
+ )
```

🤖 Prompt for AI Agents
In `@scoring/management/commands/update_global_bot_leaderboard.py` around lines
726 - 731, The current use of Leaderboard.objects.get_or_create(...) with
multiple fields risks creating duplicates when non-lookup fields change; change
the logic to call Leaderboard.objects.get_or_create(name="Global Bot
Leaderboard") to perform lookup by name only, then set or update the remaining
attributes (project=Project.objects.get(type=Project.ProjectTypes.SITE_MAIN),
score_type=LeaderboardScoreTypes.MANUAL,
bot_status=Project.BotLeaderboardStatus.BOTS_ONLY) on the returned leaderboard
instance and call save() so existing records are updated instead of creating
duplicates; ensure you still handle the created flag appropriately from
Leaderboard.objects.get_or_create.
```python
if isinstance(uid, int):
    user = User.objects.get(id=uid)
    bot_details = user.metadata["bot_details"]
    if not bot_details.get("display_in_leaderboard"):
        excluded = True
```
N+1 query: fetching User inside loop.
User.objects.get(id=uid) is called inside the loop for every integer uid. This creates unnecessary database queries.
♻️ Prefetch users before the loop

```diff
+ user_map = {u.id: u for u in User.objects.filter(id__in=[uid for uid, _ in ordered_skills if isinstance(uid, int)])}
  for uid, skill in ordered_skills:
      contribution_count = len(player_stats[uid][1])
      excluded = False
      if isinstance(uid, int):
-         user = User.objects.get(id=uid)
+         user = user_map.get(uid)
+         if not user:
+             continue
          bot_details = user.metadata["bot_details"]
```

🤖 Prompt for AI Agents
In `@scoring/management/commands/update_global_bot_leaderboard.py` around lines
743 - 747, The loop performs N+1 queries by calling User.objects.get(id=uid) for
each integer uid; instead, before iterating gather all integer uids used in that
loop, fetch users in bulk via User.objects.filter(id__in=ids) (or
User.objects.in_bulk(ids)) in the update_global_bot_leaderboard logic, build a
mapping from id to User, and replace the per-iteration User.objects.get call
with a lookup like users_by_id[uid] (handle missing users gracefully); this
removes the per-iteration DB hit while preserving the existing checks on
user.metadata["bot_details"] and display_in_leaderboard.
```python
print(
    "| 2.5% "
    "| Skill "
    "| 97.5% "
    "| Match "
    "| Quest. "
    "| ID "
    "| Username "
)
for uid, skill in ordered_skills:
    if isinstance(uid, str):
        username = uid
    else:
        username = User.objects.get(id=uid).username
    unevaluated.remove(uid)
    lower = ci_lower.get(uid, 0)
    upper = ci_upper.get(uid, 0)
    print(
        f"| {round(lower, 2):>6} "
        f"| {round(skill, 2):>6} "
        f"| {round(upper, 2):>6} "
        f"| {player_stats[uid][0]:>6} "
        f"| {len(player_stats[uid][1]):>6} "
        f"| {uid if isinstance(uid, int) else '':>6} "
        f"| {username}"
    )
```
Another N+1 query pattern in display loop.
User.objects.get(id=uid) is called again inside this display loop. Same issue as noted above - prefetch all needed users beforehand.
🤖 Prompt for AI Agents
In `@scoring/management/commands/update_global_bot_leaderboard.py` around lines
805 - 821, The loop over ordered_skills is causing an N+1 DB hit via
User.objects.get(id=uid); fix by collecting all integer uids used in that loop
(where uid is not a str) and doing a single query like
User.objects.filter(id__in=collected_ids) to build a mapping uid_to_username,
then inside the loop replace User.objects.get(id=uid).username with
uid_to_username.get(uid, "<missing>") (keeping the existing behavior for str
uids where username = uid) so no per-iteration queries occur; update references
in the same function where unevaluated and player_stats are used to read from
the prebuilt mapping instead of calling User.objects.get.
🚀 Preview Environment: your preview environment is ready!
closes #3808
This task takes ~20 minutes, so it might be worth optimizing the approach.
The only part that takes a long time is the head-to-head scoring for popular questions. The simplest solution would be to store all of the head-to-head scores at question resolution in a new table. This would take the task down to 1-2 minutes from upwards of 30 (and would sidestep some potential memory issues).
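A rough sketch of what such a table could look like (model name and fields are hypothetical, not part of this PR):

```python
from django.conf import settings
from django.db import models


class HeadToHeadScore(models.Model):
    # written once when a question resolves, read by the daily leaderboard job
    question = models.ForeignKey("questions.Question", on_delete=models.CASCADE)
    user1 = models.ForeignKey(
        settings.AUTH_USER_MODEL, on_delete=models.CASCADE, related_name="+"
    )
    user2 = models.ForeignKey(
        settings.AUTH_USER_MODEL, on_delete=models.CASCADE, related_name="+"
    )
    score = models.FloatField()
    weight = models.FloatField(default=1.0)

    class Meta:
        constraints = [
            models.UniqueConstraint(
                fields=["question", "user1", "user2"],
                name="unique_h2h_per_question_pair",
            )
        ]
```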