Commit 2b13b81

gpshead and claude committed
Remove redundant PAD check in base64 decode fast path
Address review feedback from serhiy-storchaka: the fast path was doing two checks per group - an explicit PAD comparison and the invalid char check in base64_decode_quad(). Change PAD's table entry from 0 to 64 so the existing (v0|v1|v2|v3)&0xc0 check catches it, eliminating 4 comparisons per group.

The slow path is unaffected since it checks for PAD character before the table lookup.

Decode is ~16% faster at 64K (1.62 GB/s → 1.88 GB/s).

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
1 parent f7b27ee commit 2b13b81
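
The mask check named in the commit message can be illustrated in isolation. The following is a minimal sketch, not the code in Modules/binascii.c: the names `sketch_quad_is_clean`, `sketch_selfcheck`, `SKETCH_PAD` and `SKETCH_INVALID` are hypothetical, and it assumes that the `-1` table entries are stored as 0xFF because the table's element type is `unsigned char`. Real base64 digits are 0..63, so bits 6 and 7 are always clear; both 64 (0x40) and 0xFF set at least one of them, which is why a single OR-and-mask rejects padding and invalid characters together.

```c
#include <assert.h>

enum {
    SKETCH_PAD     = 64,    /* table entry for '=' after this commit (0x40) */
    SKETCH_INVALID = 0xFF   /* assumption: -1 stored in an unsigned char table */
};

/* Returns 1 when all four table values are real 6-bit digits (0..63). */
static int sketch_quad_is_clean(unsigned char v0, unsigned char v1,
                                unsigned char v2, unsigned char v3)
{
    return ((v0 | v1 | v2 | v3) & 0xc0) == 0;
}

/* Self-check of the cases described in the commit message. */
static void sketch_selfcheck(void)
{
    assert(sketch_quad_is_clean(0, 63, 17, 42));            /* all digits */
    assert(!sketch_quad_is_clean(0, 63, 17, SKETCH_PAD));   /* padding */
    assert(!sketch_quad_is_clean(SKETCH_INVALID, 1, 2, 3)); /* invalid char */
    /* With the old entry of 0, padding was indistinguishable from 'A',
     * which is why a separate PAD comparison was needed before. */
    assert(sketch_quad_is_clean(0, 63, 17, 0));
}
```

Calling `sketch_selfcheck()` from a test program exercises all four cases; the last assertion shows why the old `PAD->0` mapping could not be caught by the mask alone.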

2 files changed: +3 -15 lines changed

Doc/whatsnew/3.15.rst

Lines changed: 1 addition & 1 deletion
@@ -433,7 +433,7 @@ base64 & binascii
 
 * CPython's underlying base64 implementation now encodes 2x faster and decodes 3x
   faster thanks to simple CPU pipelining optimizations.
-  (Contributed by Gregory P. Smith in :gh:`143262`.)
+  (Contributed by Gregory P. Smith & Serhiy Storchaka in :gh:`143262`.)
 
 calendar
 --------

Modules/binascii.c

Lines changed: 2 additions & 14 deletions
@@ -81,7 +81,7 @@ static const unsigned char table_a2b_base64[] Py_ALIGNED(64) = {
     -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1,
     -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1,
     -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,62, -1,-1,-1,63,
-    52,53,54,55, 56,57,58,59, 60,61,-1,-1, -1, 0,-1,-1, /* Note PAD->0 */
+    52,53,54,55, 56,57,58,59, 60,61,-1,-1, -1,64,-1,-1, /* PAD->64 detected by fast path */
     -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10, 11,12,13,14,
     15,16,17,18, 19,20,21,22, 23,24,25,-1, -1,-1,-1,-1,
     -1,26,27,28, 29,30,31,32, 33,34,35,36, 37,38,39,40,
@@ -177,19 +177,7 @@ base64_decode_fast(const unsigned char *in, Py_ssize_t in_len,
     Py_ssize_t i;
 
     for (i = 0; i < n_quads; i++) {
-        const unsigned char *inp = in + i * 4;
-
-        /* Check for padding - exit fast path to handle it properly.
-         * Four independent comparisons lets the compiler choose the optimal
-         * approach; on modern pipelined CPUs this is faster than bitmask tricks
-         * like XOR+SUB+AND for zero-detection which have data dependencies.
-         */
-        if (inp[0] == BASE64_PAD || inp[1] == BASE64_PAD ||
-            inp[2] == BASE64_PAD || inp[3] == BASE64_PAD) {
-            break;
-        }
-
-        if (!base64_decode_quad(inp, out + i * 3, table)) {
+        if (!base64_decode_quad(in + i * 4, out + i * 3, table)) {
             break;
         }
     }
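
To show how the simplified loop behaves end to end, here is a self-contained sketch of a fast path with the same shape as the new hunk. It is an illustration under stated assumptions rather than the CPython code: `demo_table`, `demo_init_table` and `demo_decode_fast` are hypothetical names, the table is built at run time instead of being a static initializer, and the return convention (number of input bytes consumed, so that a caller can resume with a slower, fully checked decoder) is assumed, since the diff does not show how `base64_decode_fast` reports progress.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical decode table: 0..63 for digits, 64 for '=', 0xFF otherwise. */
static unsigned char demo_table[256];

static void demo_init_table(void)
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    memset(demo_table, 0xFF, sizeof(demo_table));   /* invalid characters */
    for (int i = 0; i < 64; i++) {
        demo_table[(unsigned char)alphabet[i]] = (unsigned char)i;
    }
    demo_table['='] = 64;                            /* PAD */
}

/* Decode whole 4-character groups until one contains padding or an
 * invalid character. Returns the number of input bytes consumed
 * (an assumed convention); `out` must hold 3 bytes per input group. */
static size_t demo_decode_fast(const unsigned char *in, size_t in_len,
                               unsigned char *out)
{
    size_t n_quads = in_len / 4;
    size_t i;

    assert(in != NULL && out != NULL);
    for (i = 0; i < n_quads; i++) {
        const unsigned char *p = in + i * 4;
        unsigned char v0 = demo_table[p[0]], v1 = demo_table[p[1]];
        unsigned char v2 = demo_table[p[2]], v3 = demo_table[p[3]];

        /* One test covers both padding (0x40) and invalid input (0xFF). */
        if ((v0 | v1 | v2 | v3) & 0xc0) {
            break;
        }
        out[i * 3]     = (unsigned char)((v0 << 2) | (v1 >> 4));
        out[i * 3 + 1] = (unsigned char)((v1 << 4) | (v2 >> 2));
        out[i * 3 + 2] = (unsigned char)((v2 << 6) | v3);
    }
    return i * 4;
}
```

After `demo_init_table()`, decoding `"TWFuTWE="` consumes only the first four bytes and writes `Man`; the padded group is left untouched for whatever slower routine handles padding.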
