Skip to content
/ server Public

MDEV-7394 : Making slave_skip_errors writable at runtime when slave is stopped #4634

Open
Mahmoud-kh1 wants to merge 4 commits intoMariaDB:mainfrom
Mahmoud-kh1:dynamic-slave-skip-error
Open

MDEV-7394 : Making slave_skip_errors writable at runtime when slave is stopped #4634
Mahmoud-kh1 wants to merge 4 commits intoMariaDB:mainfrom
Mahmoud-kh1:dynamic-slave-skip-error

Conversation

@Mahmoud-kh1
Copy link

Now we can update slave_skip_errors at runtime when slaves stopped

  • This feature makes the slave_skip_errors system variable dynamic allowing it
    to be changed at runtime when the replication slave is stopped.
  • Previously slave_skip_errors was read only at runtime and required a
    server restart to be changed.
  • Runtime updates are now validated and safely rejected when the slave is
    running preventing inconsistent replication state.

Key Changes

  • Added ON_CHECK handler to verify that updates are only allowed while the
    slave is stopped.
  • Added ON_UPDATE handler to reinitialize the internal skip error state
    when the variable is changed.
  • Added an rpl mtr test that verifies slave_skip_errors can be changed
    dynamically when the slave is stopped and verified that updates are rejected while the slave is running.

behavior now is like that :
test slave2

Feature :
MDEV-7394

@CLAassistant
Copy link

CLAassistant commented Feb 8, 2026

CLA assistant check
All committers have signed the CLA.

@Mahmoud-kh1 Mahmoud-kh1 force-pushed the dynamic-slave-skip-error branch 6 times, most recently from 87dbe8f to 3fa1017 Compare February 9, 2026 07:34
@Mahmoud-kh1
Copy link
Author

Mahmoud-kh1 commented Feb 9, 2026

The following test cases fail because they assume that slave_skip_errors is read only which is no longer true which making their checks fail also
sys_vars.sysvars_server_notembedded
main.variables-notembedded
sys_vars.slave_skip_errors_basic

@gkodinov gkodinov added the External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements. label Feb 9, 2026
Copy link
Member

@gkodinov gkodinov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a preliminary review. I'd like to request changes mostly because there are tests that need to be re-recorded (and possibly fixed too).

The rest of the comments are just my own limited take on the change. Feel free to ignore and leave for the final review.

sql/slave.cc Outdated
/* Make @@slave_skip_errors show the nice human-readable value. */
opt_slave_skip_errors= slave_skip_error_names;
/* we should not touch opt_slave_skip_errors here. we just build the printable string only. */
(void) opt_slave_skip_errors;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove all these lines please.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for your review I will do this

START SLAVE;
--echo # should reduce error because slave is not stopped
--error ER_SLAVE_MUST_STOP
SET GLOBAL slave_skip_errors = "1040";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please also add actual functionality tests for the newly set variable values.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok I will do this

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the great feedback.
I have added a new test I'd be grateful if you take a look on it , and should I fix the tests that fail because it expect the variable to be read-only or leave it for now.

@@ -774,8 +774,15 @@ bool init_slave_skip_errors(const char* arg)
if (!arg || !*arg) // No errors defined
goto end;

if (my_bitmap_init(&slave_error_mask,0,MAX_SLAVE_ERROR))
DBUG_RETURN(1);
if (!slave_error_mask.bitmap)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is bad pattern. I'd initialize the bitmap to empty at the start of the server as it's done now.
And I'd update its contents as needed without de-allocating and re-allocating it.

@grooverdan
Copy link
Member

Thank you so much for implementing my 11 year old bug report. I'd be very grateful if you stick through the review process on this. There's a lot to keep correct in the server to implement this change.

Make slave_skip_errors dynamic so it can be changed while the slave is stopped. Attempts to change it while the slave is running are rejected with a clear error.
@Mahmoud-kh1 Mahmoud-kh1 force-pushed the dynamic-slave-skip-error branch from 3fa1017 to 440eec2 Compare February 12, 2026 12:41
Copy link
Member

@gkodinov gkodinov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please just fix the spacing issues. The rest is pretty much as one expects it to be. Aside from some polishing. Once the space changes are gone I will do another round and approve it.


/* Make @@slave_skip_errors show the nice human-readable value. */
opt_slave_skip_errors= slave_skip_error_names;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why keep 2 empty lines?

if (!(p= str2int(p, 10, 0, LONG_MAX, &err_code)))
break;
if (err_code < MAX_SLAVE_ERROR)
bitmap_set_bit(&slave_error_mask,(uint)err_code);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no white-space only changes please


end:
make_slave_skip_errors_printable();
end:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no whitespace only changes please

}
if (ret)
rgi->last_event_start_time= 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no whitespace only changes please

sql/slave.cc Outdated
SLAVE_WAIT_GROUP_DONE ? FALSE : TRUE;

DBUG_EXECUTE_IF("stop_slave_middle_group",
DBUG_EXECUTE_IF("stop_slave_middle_group",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no whitespace only changes please

#endif

return FALSE;
return FALSE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace

It's necessary to adjust FD_q.(A) at this point because in the following
course FD_q is going to be dumped to RL.
Generally FD_q is derived from a received FD_m (roughly FD_q := FD_m)
Generally FD_q is derived from a received FD_m (roughly FD_q := FD_m)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace

DBUG_ASSERT(mi->rli.relay_log.description_event_for_queue->used_checksum_alg !=
BINLOG_CHECKSUM_ALG_UNDEF);
DBUG_ASSERT(mi->rli.relay_log.relay_log_checksum_alg !=
BINLOG_CHECKSUM_ALG_UNDEF);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

last whitespace remark. you get the gist: don't do that. for the whole file.

mysql_mutex_unlock(&LOCK_global_system_variables);
mysql_mutex_lock(&LOCK_active_mi);

if (give_error_if_slave_running(1))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting question: if the slave's not running, is there something to even access the skip errors list?
I'd guess not? I noticed that the other update methods would just check if the slave is running (and pass 0 here) and if it isn't they'd just do the update of the global values. I know what you did here is better. But I wonder if what the other methods are doing isn't enough?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's just a flag to this function if 0 means we don't hold LOCK_acitve_mi and the function do and release after checking , but here I need to hold this lock to the end until the update is done so I pass 1 to tell it I will control this lock

my_error(ER_SLAVE_MUST_STOP, MYF(0));
res= true;
}
else
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignore please if you want, but this goes a bit against the coding style in the rest of the server.

It's usually like this:

bool err = false;
if (do_something_and_it_fails())
{
   err = true;
}

if (!err && do something_else_and_it_fails())
{
   err = true;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please also fix the failing tests!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will go with your advice for sure and follow the same pattern

Removed unnecessary blank lines before the conditional check.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements.

Development

Successfully merging this pull request may close these issues.

4 participants