adds daily task to update global bot leaderboard, refactors command for convenience #3933
base: main
Conversation
The only thing that happens here is factoring out the handle action so that it can be called from a task.
```python
bootstrap_iterations = 30


# SETUP: users to evaluate & questions
print("Initializing...")
```
Prints typically won't be visible in Django command handlers, so I suggest using logging instead.
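A minimal sketch of what the logging-based version could look like (logger name and message are illustrative):

```python
import logging

from django.core.management.base import BaseCommand

logger = logging.getLogger(__name__)


class Command(BaseCommand):
    def handle(self, *args, **options):
        # unlike print(), this goes through Django's LOGGING config,
        # so it shows up wherever cron/worker output is collected
        logger.info("Initializing...")
```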
```python
metadata__bot_details__metac_bot=True,
metadata__bot_details__include_in_calculations=True,
metadata__bot_details__display_in_leaderboard=True,
```
It looks like this might be a slow operation, since we don't index the User.metadata column.
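If User.metadata is a Postgres JSONField (an assumption; the field definition isn't shown in this diff), one option worth evaluating would be a GIN index, sketched below:

```python
# Hypothetical index addition, assuming metadata is a Postgres JSONField.
from django.contrib.auth.models import AbstractUser
from django.contrib.postgres.indexes import GinIndex
from django.db import models


class User(AbstractUser):  # existing model, shown only for context
    metadata = models.JSONField(default=dict)

    class Meta:
        indexes = [
            # GIN indexes accelerate JSONB containment lookups; whether the
            # ORM compiles these key-path filters into index-friendly
            # operators should be verified with EXPLAIN before relying on it.
            GinIndex(fields=["metadata"]),
        ]
```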
```python
related_posts__post__default_project_id__in=[
    3349,   # aib q3 2024
    32506,  # aib q4 2024
    32627,  # aib q1 2025
    32721,  # aib q2 2025
    32813,  # aib fall 2025
]
```
Is there a way to refactor this so we don’t hardcode these values here?
At the very least, I’d move these IDs into constants.py.
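For example (the constant name is a suggestion, not from the PR):

```python
# scoring/constants.py
AIB_TOURNAMENT_PROJECT_IDS = [
    3349,   # aib q3 2024
    32506,  # aib q4 2024
    32627,  # aib q1 2025
    32721,  # aib q2 2025
    32813,  # aib fall 2025
]
```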
```python
.exclude(related_posts__post__default_project__slug__startswith="minibench")
```
As I mentioned here, I don't like this slug approach
📝 Walkthrough

The changes implement daily scheduled updates for the Global Bot Leaderboard. A new cron job is added to trigger at 05:00 UTC, which invokes a wrapper function in the jobs layer that executes refactored leaderboard update logic. The update function is restructured as a callable runner instead of a Django Command class.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Scheduler as Cron Scheduler
    participant Conn as Connection Manager
    participant Jobs as Jobs Layer
    participant LB as Leaderboard DB
    participant Update as Update Function
    Scheduler->>Scheduler: Trigger at 05:00 UTC
    Scheduler->>Conn: close_old_connections(update_gobal_bot_leaderboard)
    Conn->>Jobs: execute update_gobal_bot_leaderboard()
    Jobs->>LB: Query for "Global Bot Leaderboard"
    LB-->>Jobs: Leaderboard object or None
    alt Leaderboard Found
        Jobs->>Update: run_update_global_bot_leaderboard()
        Update->>Update: gather_data() with cache=True
        Update->>Update: Compute scores & rankings
        Update->>LB: Update leaderboard entries
        LB-->>Update: Confirmation
        Update-->>Jobs: Success
    else Leaderboard Not Found
        Jobs->>Jobs: Log warning
        Jobs-->>Conn: Return
    end
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 3 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 7
🤖 Fix all issues with AI agents
In `@misc/management/commands/cron.py`:
- Around line 212-218: The two scheduled jobs update_custom_leaderboards and
update_gobal_bot_leaderboard are both added via scheduler.add_job using
CronTrigger.from_crontab("0 5 * * *") and will run concurrently; change one cron
expression to a staggered time (e.g., set update_gobal_bot_leaderboard to "0 6 *
* *") so the long-running update_gobal_bot_leaderboard does not overlap with
update_custom_leaderboards, keeping id, max_instances, replace_existing handling
the same.
In `@scoring/jobs.py`:
- Line 19: Rename the misspelled function update_gobal_bot_leaderboard to
update_global_bot_leaderboard and update every reference to it (including the
cron job ID and any imports) so names stay consistent; specifically change the
import and usage in misc/management/commands/cron.py to import
update_global_bot_leaderboard and update the cron registration/ID that currently
uses the typo, then run a quick search for update_gobal_bot_leaderboard to
replace remaining occurrences.
In `@scoring/management/commands/update_global_bot_leaderboard.py`:
- Around line 805-821: The loop over ordered_skills is causing an N+1 DB hit via
User.objects.get(id=uid); fix by collecting all integer uids used in that loop
(where uid is not a str) and doing a single query like
User.objects.filter(id__in=collected_ids) to build a mapping uid_to_username,
then inside the loop replace User.objects.get(id=uid).username with
uid_to_username.get(uid, "<missing>") (keeping the existing behavior for str
uids where username = uid) so no per-iteration queries occur; update references
in the same function where unevaluated and player_stats are used to read from
the prebuilt mapping instead of calling User.objects.get.
- Around line 655-665: The per-user loop triggers an N+1 query by calling
Score.objects.filter(user=user,
question__in=question_list).distinct("question_id").count() for each user;
instead run a single aggregated query like Score.objects.filter(user__in=users,
question__in=question_list).values("user").annotate(count=Count("question",
distinct=True)) to get distinct question counts per user, then populate
scored_question_counts from that result (referencing scored_question_counts,
users, question_list and the Score model) rather than counting inside the loop.
- Around line 726-731: The current use of Leaderboard.objects.get_or_create(...)
with multiple fields risks creating duplicates when non-lookup fields change;
change the logic to call Leaderboard.objects.get_or_create(name="Global Bot
Leaderboard") to perform lookup by name only, then set or update the remaining
attributes (project=Project.objects.get(type=Project.ProjectTypes.SITE_MAIN),
score_type=LeaderboardScoreTypes.MANUAL,
bot_status=Project.BotLeaderboardStatus.BOTS_ONLY) on the returned leaderboard
instance and call save() so existing records are updated instead of creating
duplicates; ensure you still handle the created flag appropriately from
Leaderboard.objects.get_or_create.
- Around line 743-747: The loop performs N+1 queries by calling
User.objects.get(id=uid) for each integer uid; instead, before iterating gather
all integer uids used in that loop, fetch users in bulk via
User.objects.filter(id__in=ids) (or User.objects.in_bulk(ids)) in the
update_global_bot_leaderboard logic, build a mapping from id to User, and
replace the per-iteration User.objects.get call with a lookup like
users_by_id[uid] (handle missing users gracefully); this removes the
per-iteration DB hit while preserving the existing checks on
user.metadata["bot_details"] and display_in_leaderboard.
- Around line 618-648: The Post.default_project foreign key lacks an index and
is used heavily in joins (see filters/excludes on
related_posts__post__default_project__* in update_global_bot_leaderboard), so
add db_index=True to the ForeignKey declaration for default_project on the Post
model, create and commit the generated migration (makemigrations), and run/apply
the migration (migrate) to ensure the DB has the new index; this will speed up
joins against Project.default_permission and related filters.
♻️ Duplicate comments (4)
scoring/management/commands/update_global_bot_leaderboard.py (4)
144-180: Hardcoded project IDs create maintenance burden and coupling. These hardcoded project IDs (lines 146-152, 155-159) and the mapping logic are fragile - any new AIB project requires code changes. This was already flagged in a previous review.
627-633: Hardcoded AIB project IDs duplicated from earlier in the file. These IDs (3349, 32506, 32627, 32721, 32813) duplicate lines 146-152. This was flagged in a previous review - consider extracting to constants.
639-639: Slug-based exclusion approach was flagged in a previous review. The `.exclude(related_posts__post__default_project__slug__startswith="minibench")` approach was already discussed as problematic.
778-892: Debug/test output in production code. The extensive print statements for results display (lines 779-835) and statistical tests (lines 843-892) appear to be debug/development artifacts. In a cron job context, these won't be visible and add execution overhead. The previous review flagged that prints won't be visible in Django command handlers.
Consider using logging at DEBUG level or removing these diagnostic outputs for production runs.
🧹 Nitpick comments (4)
scoring/management/commands/update_global_bot_leaderboard.py (2)
107-141: Default parameter change from `False` to `True` may have unintended side effects.

Changing `cache: bool = True` means all callers now default to caching behavior. This could cause stale data issues if the cache file (HtH_score_data.csv) exists but contains outdated information. The code also writes to a file in the current working directory, which may vary depending on execution context (cron job vs manual invocation). Consider:
- Using an absolute path or Django's MEDIA_ROOT/temp directory for cache files
- Adding cache invalidation logic (e.g., timestamp-based expiry)
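A sketch of what both suggestions could look like together (path and max age are illustrative assumptions):

```python
import os
import time

from django.conf import settings

# store the cache under MEDIA_ROOT instead of the current working directory
CACHE_PATH = os.path.join(settings.MEDIA_ROOT, "HtH_score_data.csv")
CACHE_MAX_AGE_SECONDS = 60 * 60 * 24  # one day, matching the daily cron cadence


def cache_is_fresh(path: str = CACHE_PATH) -> bool:
    # a missing or stale file counts as a cache miss
    if not os.path.exists(path):
        return False
    return (time.time() - os.path.getmtime(path)) < CACHE_MAX_AGE_SECONDS
```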
710-718: Consider adding `strict=True` to `zip()` for safety.

Static analysis flagged line 714's `zip()` call without `strict=`. While the lists should be the same length by construction, adding `strict=True` would catch bugs if they diverge.

♻️ Proposed fix

```diff
- for u1id, u2id, qid in zip(user1_ids, user2_ids, question_ids):
+ for u1id, u2id, qid in zip(user1_ids, user2_ids, question_ids, strict=True):
```

scoring/jobs.py (2)
20-25: Existence check queries the leaderboard but doesn't use the result.

The function fetches `global_bot_leaderboard` to check existence but never uses it; `run_update_global_bot_leaderboard()` will perform its own `get_or_create`. This is defensive but slightly wasteful. Consider either:
- Removing the check, since `run_update_global_bot_leaderboard` handles creation
- Passing the leaderboard instance to avoid duplicate queries
26-29: Use `logger.exception()` instead of `logger.error()` to capture the stack trace.

Per static analysis, `logger.exception()` automatically includes the traceback, which is essential for debugging failures in a cron job.

♻️ Proposed fix

```diff
  try:
      run_update_global_bot_leaderboard()
  except Exception as e:
-     logger.error(f"Error updating Global Bot Leaderboard: {e}")
+     logger.exception("Error updating Global Bot Leaderboard")
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- misc/management/commands/cron.py
- scoring/jobs.py
- scoring/management/commands/update_global_bot_leaderboard.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-01-15T19:29:58.940Z
Learnt from: hlbmtc
Repo: Metaculus/metaculus PR: 4075
File: authentication/urls.py:24-26
Timestamp: 2026-01-15T19:29:58.940Z
Learning: In this codebase, DRF is configured to use IsAuthenticated as the default in REST_FRAMEWORK['DEFAULT_PERMISSION_CLASSES'] within metaculus_web/settings.py. Therefore, explicit permission_classes([IsAuthenticated]) decorators are unnecessary on DRF views unless a view needs to override the default. When reviewing Python files, verify that views relying on the default are not redundantly decorated, and flag cases where permissions are being over-specified or when a non-default permission is explicitly required.
Applied to files:
- scoring/management/commands/update_global_bot_leaderboard.py
- misc/management/commands/cron.py
- scoring/jobs.py
🧬 Code graph analysis (3)
scoring/management/commands/update_global_bot_leaderboard.py (3)
scoring/models.py (3)
- Score (17-56)
- Leaderboard (97-280)
- LeaderboardEntry (302-354)

projects/models.py (4)
- Project (212-495)
- BotLeaderboardStatus (248-252)
- save (404-439)
- save (603-608)

scoring/constants.py (1)
- LeaderboardScoreTypes (17-57)
misc/management/commands/cron.py (1)
scoring/jobs.py (1)
- update_gobal_bot_leaderboard (19-29)
scoring/jobs.py (2)
scoring/management/commands/update_global_bot_leaderboard.py (1)
- run_update_global_bot_leaderboard (603-892)

scoring/models.py (1)
- Leaderboard (97-280)
🪛 Ruff (0.14.13)
scoring/management/commands/update_global_bot_leaderboard.py
714-714: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
900-900: Unused method argument: args
(ARG002)
900-900: Unused method argument: options
(ARG002)
scoring/jobs.py
28-28: Do not catch blind exception: Exception
(BLE001)
29-29: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Build Docker Image / Build Docker Image
- GitHub Check: Backend Checks
- GitHub Check: integration-tests
🔇 Additional comments (5)
scoring/management/commands/update_global_bot_leaderboard.py (3)
189-191: Early return for cached questions improves performance. This optimization correctly skips questions that are already in the cache, reducing redundant processing time.
603-614: Refactoring to a standalone function is a good pattern for cron integration. Exposing `run_update_global_bot_leaderboard()` as a standalone callable makes it reusable from both the Django management command and the scheduled job. The baseline player ID (236038) is still hardcoded though - consider making it configurable or documenting why this specific user is the baseline.
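One way the baseline could be made configurable (the setting name is hypothetical):

```python
from django.conf import settings

# falls back to the currently hardcoded baseline user when the setting is absent
BASELINE_PLAYER_ID = getattr(settings, "GLOBAL_BOT_LEADERBOARD_BASELINE_ID", 236038)
```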
895-901: Command class correctly delegates to the runner function. The thin wrapper pattern is appropriate. The unused `args` and `options` parameters flagged by static analysis are part of the standard Django Command signature and can be safely ignored.

misc/management/commands/cron.py (2)
29-29: Import inherits the typo from scoring/jobs.py. Update this import if/when the function name is corrected in scoring/jobs.py.
212-218: Job configuration follows established patterns correctly. The job uses the `close_old_connections` wrapper, `max_instances=1`, and `replace_existing=True`, consistent with other scoring jobs. Good implementation.
```python
scheduler.add_job(
    close_old_connections(update_gobal_bot_leaderboard),
    trigger=CronTrigger.from_crontab("0 5 * * *"),  # Every day at 05:00 UTC
    id="update_gobal_bot_leaderboard",
    max_instances=1,
    replace_existing=True,
)
```
Scheduling conflict: both update_custom_leaderboards and update_gobal_bot_leaderboard run at 05:00 UTC.
Both jobs are scheduled at the same time (lines 207 and 214). Since update_gobal_bot_leaderboard takes ~20 minutes as noted in the PR description, running them concurrently could cause resource contention. Consider staggering the times.
♻️ Suggested fix - stagger to 06:00 UTC

```diff
  scheduler.add_job(
      close_old_connections(update_gobal_bot_leaderboard),
-     trigger=CronTrigger.from_crontab("0 5 * * *"),  # Every day at 05:00 UTC
+     trigger=CronTrigger.from_crontab("0 6 * * *"),  # Every day at 06:00 UTC
      id="update_gobal_bot_leaderboard",
      max_instances=1,
      replace_existing=True,
  )
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
scheduler.add_job(
    close_old_connections(update_gobal_bot_leaderboard),
    trigger=CronTrigger.from_crontab("0 6 * * *"),  # Every day at 06:00 UTC
    id="update_gobal_bot_leaderboard",
    max_instances=1,
    replace_existing=True,
)
```
🤖 Prompt for AI Agents
In `@misc/management/commands/cron.py` around lines 212 - 218, The two scheduled
jobs update_custom_leaderboards and update_gobal_bot_leaderboard are both added
via scheduler.add_job using CronTrigger.from_crontab("0 5 * * *") and will run
concurrently; change one cron expression to a staggered time (e.g., set
update_gobal_bot_leaderboard to "0 6 * * *") so the long-running
update_gobal_bot_leaderboard does not overlap with update_custom_leaderboards,
keeping id, max_instances, replace_existing handling the same.
```python
logger = logging.getLogger(__name__)


def update_gobal_bot_leaderboard():
```
Typo in function name: update_gobal_bot_leaderboard should be update_global_bot_leaderboard.
The function name has a typo ("gobal" vs "global"). This typo propagates to the cron job ID and import. Fix this before it becomes a maintenance headache.
♻️ Proposed fix

```diff
-def update_gobal_bot_leaderboard():
+def update_global_bot_leaderboard():
```

Also update the import in misc/management/commands/cron.py accordingly.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
def update_global_bot_leaderboard():
```
🤖 Prompt for AI Agents
In `@scoring/jobs.py` at line 19, Rename the misspelled function
update_gobal_bot_leaderboard to update_global_bot_leaderboard and update every
reference to it (including the cron job ID and any imports) so names stay
consistent; specifically change the import and usage in
misc/management/commands/cron.py to import update_global_bot_leaderboard and
update the cron registration/ID that currently uses the typo, then run a quick
search for update_gobal_bot_leaderboard to replace remaining occurrences.
```python
questions: QuerySet[Question] = (
    Question.objects.filter(
        Q(
            related_posts__post__default_project__default_permission__in=[
                "viewer",
                "forecaster",
            ]
        )
        | Q(
            related_posts__post__default_project_id__in=[
                3349,   # aib q3 2024
                32506,  # aib q4 2024
                32627,  # aib q1 2025
                32721,  # aib q2 2025
                32813,  # aib fall 2025
            ]
        ),
        related_posts__post__curation_status=Post.CurationStatus.APPROVED,
        resolution__isnull=False,
        scheduled_close_time__lte=timezone.now(),
    )
    .exclude(related_posts__post__default_project__slug__startswith="minibench")
    .exclude(resolution__in=UnsuccessfulResolutionType)
    .filter(Exists(user_forecast_exists))
    .prefetch_related(  # only prefetch forecasts from those users
        Prefetch(
            "user_forecasts", queryset=Forecast.objects.filter(author__in=users)
        )
    )
    .order_by("id")
    .distinct("id")
)

###############
# make sure they have at least 100 resolved questions
print("initialize list")
question_list = list(questions)
print("Filtering users.")
scored_question_counts: dict[int, int] = defaultdict(int)
c = users.count()
i = 0
for user in users:
    i += 1
    print(i, "/", c, end="\r")
    scored_question_counts[user.id] = (
        Score.objects.filter(user=user, question__in=question_list)
        .distinct("question_id")
        .count()
    )
excluded_ids = [
    uid for uid, count in scored_question_counts.items() if count < 100
]
users = users.exclude(id__in=excluded_ids)
###############
print("Initializing... DONE")

# Gather head to head scores
user1_ids, user2_ids, question_ids, scores, weights = gather_data(
    users, questions
)

# choose baseline player if not already chosen
if not baseline_player:
    baseline_player = max(
        set(user1_ids) | set(user2_ids), key=(user1_ids + user2_ids).count
    )

# get variance of average scores (used in rescaling)
avg_scores = get_avg_scores(user1_ids, user2_ids, scores, weights)
var_avg_scores = (
    np.var(np.array(list(avg_scores.values()))) if len(avg_scores) > 1 else 0
)
```
🧩 Analysis chain
🏁 Scripts executed (repository: Metaculus/metaculus):

```shell
find . -name "models.py" | head -20
rg -n "class Post|class Question|class Forecast|class Project" --type python | grep models.py | head -20

# Check for indexes on Post model
rg -n "class Meta" -A 20 posts/models.py | grep -E "class Meta|indexes|db_index" | head -30

# Check for indexes on Question model
rg -n "class Meta" -A 20 questions/models.py | grep -E "class Meta|indexes|db_index" | head -30

# Look for specific field definitions on Post and Project models
rg -n "default_project|default_permission|curation_status" posts/models.py questions/models.py projects/models.py

# Get the Post model indexes around line 1023
sed -n '1017,1070p' posts/models.py

# Get the Question model indexes
sed -n '388,410p' questions/models.py
sed -n '613,640p' questions/models.py
sed -n '703,730p' questions/models.py

# Check if curation_status and default_project fields have db_index
rg -n "curation_status|default_project" posts/models.py | grep "db_index\|models\."

# Get the full Post model definition around curation_status and default_project fields
sed -n '570,590p' posts/models.py
sed -n '810,830p' posts/models.py

# Check for any indexes on Post model itself
sed -n '850,920p' posts/models.py

# Look for the Post model Meta class
rg -n "class Post\(|class Meta:" posts/models.py | head -20

# Get the Post model Meta class (the first one at line 986)
sed -n '986,1016p' posts/models.py

# Find the actual Post class Meta definition - it should be after line 541
sed -n '900,1000p' posts/models.py | grep -n "class Meta" -A 15

# Also check if there's any Meta class for Post model
rg -n "^class Post\(" posts/models.py -A 450 | grep "class Meta" -A 20 | head -40

# Find the end of the Post class definition
sed -n '541,985p' posts/models.py | tail -50

# Check if Post class has any Meta class at all by looking for all Meta classes
rg -n "^class.*\(.*\):" posts/models.py | head -20

# Check if Post class has a Meta definition between lines 541-949
sed -n '541,949p' posts/models.py | grep -n "class Meta" -A 30

# Also check default_project foreign key - does it have db_index?
sed -n '817,830p' posts/models.py

# Check the Question model and its related_posts relationship
rg -n "class Question|related_posts|class Meta" questions/models.py | head -40

# Check if there are any indexes involving default_project or FK relationships in Post model
rg -n "default_project|Index" posts/models.py | head -40

# Check the Post model indexes around line 1024
sed -n '1024,1050p' posts/models.py

# Check if Question model has a related_posts relationship
sed -n '39,200p' questions/models.py | grep -E "related_posts|ForeignKey|ManyToMany"

# Find the related_posts relationship definition
rg -n "related_posts" questions/models.py posts/models.py | head -20

# Check the Project model for default_permission field
rg -n "default_permission|class Meta" projects/models.py | head -40

# Get the default_permission field definition in Project model
sed -n '333,345p' projects/models.py

# Search for related_posts more broadly - might be a reverse relation
rg -n "related_posts\|RelatedPostsQuestion\|through=" questions/models.py posts/models.py

# Look for any model with related_posts in its name or definition
rg -n "class.*Related.*Posts|related_posts.*=" questions/models.py posts/models.py

# Search the entire codebase for related_posts definition
rg -n "related_posts" . --type python | head -20

# Search for related_posts in the codebase
rg -n "related_posts" . | head -20

# Look at the review comment file to understand the query better
sed -n '618,648p' scoring/management/commands/update_global_bot_leaderboard.py

# Find the model that has a ForeignKey to Question with related_name="related_posts"
rg -n "ForeignKey.*Question|related_name.*related_posts" --type-list | head -5

# Search for the related_posts definition
rg -n "related_name\s*=\s*[\"']related_posts" .

# Search for all ForeignKey definitions that might create related_posts
rg -n "related_name" posts/models.py questions/models.py | grep -i related

# Look for any model that has a post ForeignKey
rg -n "post\s*=\s*models\.ForeignKey" . -A 2 | head -40

# Check misc/models.py for related_posts definition
sed -n '1,100p' misc/models.py

# Look for the model that has related_name="related_posts"
rg -n "related_name\s*=\s*" misc/models.py | head -20
```
Add database index on Post.default_project FK for query join performance.
The query filters on multiple join paths through related_posts__post__default_project__ including default_permission and default_project_id, plus traverses this FK in exclude clauses. While Post.curation_status has db_index=True and Project.default_permission is indexed, the critical Post.default_project ForeignKey field itself lacks an explicit index. Since this job processes many questions and takes ~20 minutes, adding db_index=True to the default_project FK definition will improve join performance on these high-cardinality traversals.
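The change itself would be small, assuming a conventional ForeignKey declaration (the field options below are illustrative). Note that Django indexes ForeignKey columns by default, so an explicit db_index=True only changes anything if the index was previously disabled; worth confirming against the actual schema first:

```python
# posts/models.py (sketch; on_delete and related_name are assumptions)
default_project = models.ForeignKey(
    Project,
    on_delete=models.CASCADE,
    related_name="default_posts",
    db_index=True,  # the added index
)
```

Followed by `python manage.py makemigrations posts` and `python manage.py migrate`.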
🤖 Prompt for AI Agents
In `@scoring/management/commands/update_global_bot_leaderboard.py` around lines
618 - 648, The Post.default_project foreign key lacks an index and is used
heavily in joins (see filters/excludes on
related_posts__post__default_project__* in update_global_bot_leaderboard), so
add db_index=True to the ForeignKey declaration for default_project on the Post
model, create and commit the generated migration (makemigrations), and run/apply
the migration (migrate) to ensure the DB has the new index; this will speed up
joins against Project.default_permission and related filters.
```python
scored_question_counts: dict[int, int] = defaultdict(int)
c = users.count()
i = 0
for user in users:
    i += 1
    print(i, "/", c, end="\r")
    scored_question_counts[user.id] = (
        Score.objects.filter(user=user, question__in=question_list)
        .distinct("question_id")
        .count()
    )
```
N+1 query pattern: counting scores per user in a loop.
This loop executes a separate Score.objects.filter(...).count() query for each user, creating an N+1 query problem. With many bot users, this compounds the already slow execution time.
♻️ Suggested optimization using annotation

```diff
- scored_question_counts: dict[int, int] = defaultdict(int)
- c = users.count()
- i = 0
- for user in users:
-     i += 1
-     print(i, "/", c, end="\r")
-     scored_question_counts[user.id] = (
-         Score.objects.filter(user=user, question__in=question_list)
-         .distinct("question_id")
-         .count()
-     )
+ from django.db.models import Count
+ scored_question_counts = dict(
+     Score.objects.filter(user__in=users, question__in=question_list)
+     .values("user_id")
+     .annotate(q_count=Count("question_id", distinct=True))
+     .values_list("user_id", "q_count")
+ )
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
from django.db.models import Count

scored_question_counts = dict(
    Score.objects.filter(user__in=users, question__in=question_list)
    .values("user_id")
    .annotate(q_count=Count("question_id", distinct=True))
    .values_list("user_id", "q_count")
)
```
🤖 Prompt for AI Agents
In `@scoring/management/commands/update_global_bot_leaderboard.py` around lines
655 - 665, The per-user loop triggers an N+1 query by calling
Score.objects.filter(user=user,
question__in=question_list).distinct("question_id").count() for each user;
instead run a single aggregated query like Score.objects.filter(user__in=users,
question__in=question_list).values("user").annotate(count=Count("question",
distinct=True)) to get distinct question counts per user, then populate
scored_question_counts from that result (referencing scored_question_counts,
users, question_list and the Score model) rather than counting inside the loop.
```python
leaderboard, _ = Leaderboard.objects.get_or_create(
    name="Global Bot Leaderboard",
    project=Project.objects.get(type=Project.ProjectTypes.SITE_MAIN),
    score_type=LeaderboardScoreTypes.MANUAL,
    bot_status=Project.BotLeaderboardStatus.BOTS_ONLY,
)
```
get_or_create may create duplicate leaderboards if other fields vary.
Using get_or_create with name, project, score_type, and bot_status means if any of these fields change, a new leaderboard is created rather than updating the existing one. Consider using get_or_create with only name as the lookup, then updating other fields explicitly.
♻️ Safer approach

```diff
- leaderboard, _ = Leaderboard.objects.get_or_create(
-     name="Global Bot Leaderboard",
-     project=Project.objects.get(type=Project.ProjectTypes.SITE_MAIN),
-     score_type=LeaderboardScoreTypes.MANUAL,
-     bot_status=Project.BotLeaderboardStatus.BOTS_ONLY,
- )
+ leaderboard, created = Leaderboard.objects.get_or_create(
+     name="Global Bot Leaderboard",
+     defaults={
+         "project": Project.objects.get(type=Project.ProjectTypes.SITE_MAIN),
+         "score_type": LeaderboardScoreTypes.MANUAL,
+         "bot_status": Project.BotLeaderboardStatus.BOTS_ONLY,
+     },
+ )
```

🤖 Prompt for AI Agents
In `@scoring/management/commands/update_global_bot_leaderboard.py` around lines
726 - 731, The current use of Leaderboard.objects.get_or_create(...) with
multiple fields risks creating duplicates when non-lookup fields change; change
the logic to call Leaderboard.objects.get_or_create(name="Global Bot
Leaderboard") to perform lookup by name only, then set or update the remaining
attributes (project=Project.objects.get(type=Project.ProjectTypes.SITE_MAIN),
score_type=LeaderboardScoreTypes.MANUAL,
bot_status=Project.BotLeaderboardStatus.BOTS_ONLY) on the returned leaderboard
instance and call save() so existing records are updated instead of creating
duplicates; ensure you still handle the created flag appropriately from
Leaderboard.objects.get_or_create.
```python
if isinstance(uid, int):
    user = User.objects.get(id=uid)
    bot_details = user.metadata["bot_details"]
    if not bot_details.get("display_in_leaderboard"):
        excluded = True
```
N+1 query: fetching User inside loop.
User.objects.get(id=uid) is called inside the loop for every integer uid. This creates unnecessary database queries.
♻️ Prefetch users before the loop

```diff
+ user_map = {u.id: u for u in User.objects.filter(id__in=[uid for uid, _ in ordered_skills if isinstance(uid, int)])}
  for uid, skill in ordered_skills:
      contribution_count = len(player_stats[uid][1])
      excluded = False
      if isinstance(uid, int):
-         user = User.objects.get(id=uid)
+         user = user_map.get(uid)
+         if not user:
+             continue
          bot_details = user.metadata["bot_details"]
```

🤖 Prompt for AI Agents
In `@scoring/management/commands/update_global_bot_leaderboard.py` around lines
743 - 747, The loop performs N+1 queries by calling User.objects.get(id=uid) for
each integer uid; instead, before iterating gather all integer uids used in that
loop, fetch users in bulk via User.objects.filter(id__in=ids) (or
User.objects.in_bulk(ids)) in the update_global_bot_leaderboard logic, build a
mapping from id to User, and replace the per-iteration User.objects.get call
with a lookup like users_by_id[uid] (handle missing users gracefully); this
removes the per-iteration DB hit while preserving the existing checks on
user.metadata["bot_details"] and display_in_leaderboard.
```python
print(
    "| 2.5% "
    "| Skill "
    "| 97.5% "
    "| Match "
    "| Quest. "
    "| ID "
    "| Username "
)
for uid, skill in ordered_skills:
    if isinstance(uid, str):
        username = uid
    else:
        username = User.objects.get(id=uid).username
    unevaluated.remove(uid)
    lower = ci_lower.get(uid, 0)
    upper = ci_upper.get(uid, 0)
    print(
        f"| {round(lower, 2):>6} "
        f"| {round(skill, 2):>6} "
        f"| {round(upper, 2):>6} "
        f"| {player_stats[uid][0]:>6} "
        f"| {len(player_stats[uid][1]):>6} "
        f"| {uid if isinstance(uid, int) else '':>6} "
        f"| {username}"
    )
```
Another N+1 query pattern in display loop.
User.objects.get(id=uid) is called again inside this display loop. Same issue as noted above - prefetch all needed users beforehand.
🤖 Prompt for AI Agents
In `@scoring/management/commands/update_global_bot_leaderboard.py` around lines
805 - 821, The loop over ordered_skills is causing an N+1 DB hit via
User.objects.get(id=uid); fix by collecting all integer uids used in that loop
(where uid is not a str) and doing a single query like
User.objects.filter(id__in=collected_ids) to build a mapping uid_to_username,
then inside the loop replace User.objects.get(id=uid).username with
uid_to_username.get(uid, "<missing>") (keeping the existing behavior for str
uids where username = uid) so no per-iteration queries occur; update references
in the same function where unevaluated and player_stats are used to read from
the prebuilt mapping instead of calling User.objects.get.
🚀 Preview Environment: your preview environment is ready!
closes #3808
This task takes ~20 minutes, so it might be worth optimizing the approach.
The only part that takes a long time is the head-to-head scoring for popular questions. The simplest solution would be to store all of the head-to-head scores at question resolution in a new table. This would take the task down to 1-2 minutes from upwards of 30 (and would sidestep some potential memory issues).
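A rough sketch of what such a table could look like (model name and fields are hypothetical, not part of this PR):

```python
from django.conf import settings
from django.db import models


class HeadToHeadScore(models.Model):
    # written once when a question resolves, read by the daily leaderboard job
    question = models.ForeignKey("questions.Question", on_delete=models.CASCADE)
    user1 = models.ForeignKey(
        settings.AUTH_USER_MODEL, on_delete=models.CASCADE, related_name="+"
    )
    user2 = models.ForeignKey(
        settings.AUTH_USER_MODEL, on_delete=models.CASCADE, related_name="+"
    )
    score = models.FloatField()
    weight = models.FloatField(default=1.0)

    class Meta:
        constraints = [
            models.UniqueConstraint(
                fields=["question", "user1", "user2"],
                name="unique_h2h_per_question_pair",
            )
        ]
```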