The all-reduce benchmark results on TPUv7x do not meet expectations (only about 1/10 of the expected bandwidth) #71

@jimoosciuc

My environment is TPUv7x with a 2x2x4 chip topology. The run configuration is as follows:

benchmarks:
- benchmark_name: psum
  benchmark_sweep_params:
  - {matrix_dim_range: {start: 2, end: 32768, multiplier: 2}, dtype: "float32", mesh_shape: "1x1x32", ici_size_range: 32, sharding_strategy: "1x1x32", op_dimension: 1, num_runs: 5}
  trace_dir: "../microbenchmarks/all_reduce_1d"
  csv_path: "../microbenchmarks/all_reduce_1d"
  xlml_metrics_dir: "../microbenchmarks/all_reduce_1d"
  xla_dump_dir: "../microbenchmarks/all_reduce_1d/hlo_graphs"

Benchmark command:

python Ironwood/src/run_benchmark.py --config="Ironwood/configs/collectives/all_reduce_1d.yaml"

Result:

iteration op_type replica_group_type rank mesh_shape op_dimension sharding_strategy input_num_elements matrix_shape transferred_data (GB) dtype_bytes hlo_input_shape hlo_output_shape hlo_replica_groups step_time_ms_p50 step_time_ms_p90 step_time_ms_p95 step_time_ms_p99 step_time_ms_avg step_time_ms_max step_time_ms_num_runs step_time_ms_min achieved_bw (GB/s)_p50 achieved_bw (GB/s)_p90 achieved_bw (GB/s)_p95 achieved_bw (GB/s)_p99 achieved_bw (GB/s)_avg achieved_bw (GB/s)_max achieved_bw (GB/s)_num_runs achieved_bw (GB/s)_min
2 AR non-parallel 32 1x1x32 1 1x1x32 2048 ((2, 8, 128)) 1.54E-05 4 f32[2,8,128] f32[2,8,128]{2,1,0:T(8,128) {{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31}} 0.109164466 0.1103327732 0.1105685476 0.11075716712 0.1094180074 0.110804322 5 0.108681873 0.14070512651983294 0.14126131083306184 0.14129561350020328 0.14132305563391642 0.14038592940175407 0.1413299161673447 5 0.13862275155656836
4 AR non-parallel 32 1x1x32 1 1x1x32 4096 ((4, 8, 128)) 3.07E-05 4 f32[4,8,128] f32[4,8,128]{2,1,0:T(8,128) {{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31}} 0.110981993 0.11134693899999999 0.111434574 0.111504682 0.11087947199999999 0.111522209 5 0.109896759 0.27680166096855013 0.2785112237616809 0.27902315498008695 0.27943269995481174 0.27706407958623314 0.27953508619849293 5 0.2754608277172846
8 AR non-parallel 32 1x1x32 1 1x1x32 8192 ((8, 8, 128)) 6.14E-05 4 f32[8,8,128] f32[8,8,128]{2,1,0:T(8,128) {{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31}} 0.113919568 0.11414333739999999 0.11418751519999999 0.11422285743999999 0.113672269 0.114231693 5 0.112722689 0.5393278879006986 0.5436059545612786 0.5443301878407069 0.5449095744642494 0.5405132192449841 0.545054421120135 5 0.5378542363020044
16 AR non-parallel 32 1x1x32 1 1x1x32 16384 ((16, 8, 128)) 1.23E-04 4 f32[16,8,128] f32[16,8,128]{2,1,0:T(8,128) {{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31}} 0.11437455 0.1148804322 0.1150290516 0.11514794712 0.1144857144 0.115177671 5 0.114127251 1.0743648827470798 1.0759873386246066 1.0763401155961974 1.0766223371734702 1.0733323178709484 1.0766928925677883 5 1.0668734567484006
32 AR non-parallel 32 1x1x32 1 1x1x32 32768 ((32, 8, 128)) 0.00024576000000000003 4 f32[32,8,128] f32[32,8,128]{2,1,0:T(8,128) {{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31}} 0.116130852 0.1166660264 0.1167231692 0.11676888344 0.1161495798 0.116780312 5 0.115402161 2.116233505287639 2.1256447151437268 2.127620442245512 2.12920102392694 2.115927449583728 2.129596169347297 5 2.1044643210064384
64 AR non-parallel 32 1x1x32 1 1x1x32 65536 ((64, 8, 128)) 0.0004915200000000001 4 f32[64,8,128] f32[64,8,128]{2,1,0:T(8,128) {{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31}} 0.118138055 0.118301801 0.11835438200000001 0.1183964468 0.11802857159999999 0.118406963 5 0.117647059 4.160556054524514 4.175655051967856 4.176787522850488 4.177693499556594 4.164437113818866 4.177919993733121 5 4.151107228381494
128 AR non-parallel 32 1x1x32 1 1x1x32 131072 ((128, 8, 128)) 0.0009830400000000001 4 f32[128,8,128] f32[128,8,128]{2,1,0:T(8,128) {{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31}} 0.119304922 0.1196554624 0.1197551022 0.11983481404 0.1193786316 0.119854742 5 0.119147659 8.23972710866028 8.248343140973619 8.249472932891253 8.25037676642536 8.234675074307967 8.250602724808887 5 8.201928297505326
256 AR non-parallel 32 1x1x32 1 1x1x32 262144 ((256, 8, 128)) 0.0019660800000000003 4 f32[256,8,128] f32[256,8,128]{2,1,0:T(8,128) {{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31}} 0.120731092 0.1207939976 0.1207997598 0.12080436956 0.1207058822 0.120805522 5 0.120572629 16.28478602678422 16.30235911447449 16.30427373916804 16.305805438922878 16.28819548241138 16.306188363861587 5 16.2747527385379
512 AR non-parallel 32 1x1x32 1 1x1x32 524288 ((512, 8, 128)) 0.0039321600000000005 4 f32[512,8,128] f32[512,8,128]{2,1,0:T(8,128) {{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31}} 0.123763505 0.1239678268 0.12400672239999999 0.12403783888 0.1237085232 0.124045618 5 0.12337575 31.77156303063654 31.85791032024849 31.864663757640372 31.870066507553876 31.78580364256593 31.87141719503225 5 31.699305976290116
1024 AR non-parallel 32 1x1x32 1 1x1x32 1048576 ((1024, 8, 128)) 0.007864320000000001 4 f32[1024,8,128] f32[1024,8,128]{2,1,0:T(8,128) {{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31}} 0.129909964 0.1300388956 0.1300470588 0.13005358936 0.12971692680000002 0.130055222 5 0.129194478 60.536696015095515 60.831294622193425 60.8516230654165 60.867885819994946 60.62722118667093 60.87195150863956 5 60.469082894649176
2048 AR non-parallel 32 1x1x32 1 1x1x32 2097152 ((2048, 8, 128)) 0.015728640000000002 4 f32[2048,8,128] f32[2048,8,128]{2,1,0:T(8,128) {{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31}} 0.162322929 0.1627126054 0.1627320532 0.16274761144 0.1623879954 0.162751501 5 0.162072029 96.89721653556413 97.03112561532693 97.03917325743454 97.04561137112063 96.85866226002834 97.04722089954215 5 96.64205800473694
4096 AR non-parallel 32 1x1x32 1 1x1x32 4194304 ((4096, 8, 128)) 0.031457280000000004 4 f32[4096,8,128] f32[4096,8,128]{2,1,0:T(8,128) {{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31}} 0.246651861 0.246856903 0.246881633 0.24690141699999998 0.24642304939999998 0.246906363 5 0.24577551 127.53716867354187 127.94545172500672 127.96868705489393 127.98727531880371 127.65601785524038 127.99192238478115 5 127.40570804973547
8192 AR non-parallel 32 1x1x32 1 1x1x32 8388608 ((8192, 8, 128)) 0.06291456000000001 4 f32[8192,8,128] f32[8192,8,128]{2,1,0:T(8,128) {{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31}} 0.439457383 0.44032148860000003 0.4404794718 0.44060585836 0.4394578632 0.440637455 5 0.438445378 143.16418937032628 143.4349783234636 143.4648072506911 143.48867039247313 143.16445945055176 143.49463617791864 5 142.7807810845313
16384 AR non-parallel 32 1x1x32 1 1x1x32 16777216 ((16384, 8, 128)) 0.12582912000000002 4 f32[16384,8,128] f32[16384,8,128]{2,1,0:T(8,128) {{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31}} 0.863623049 0.8642242496 0.8644026408000001 0.8645453537600001 0.8638040816 0.864581032 5 0.863533013 145.6991220251696 145.7101810993082 145.71224719897378 145.71390007870627 145.66861692105448 145.71431329863938 5 145.537682811436
32768 AR non-parallel 32 1x1x32 1 1x1x32 33554432 ((32768, 8, 128)) 0.25165824000000003 4 f32[32768,8,128] f32[32768,8,128]{2,1,0:T(8,128) {{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31}} 1.749894358 1.7536086434 1.7541380551999999 1.75456158464 1.7501519807999997 1.754667467 5 1.744442977 143.81339013380602 144.0882892889745 144.1755479973181 144.24535496399298 143.79275372162158 144.2628067056617 5 143.42218382282232
65536 AR non-parallel 32 1x1x32 1 1x1x32 67108864 ((65536, 8, 128)) 0.5033164800000001 4 f32[65536,8,128] f32[65536,8,128]{2,1,0:T(8,128) {{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31}} 3.7871014405 3.7950558226 3.7967385958 3.79808481436 3.789697479 3.798421369 4 3.786165666 132.9028250942936 132.92688643414817 132.93127959832555 132.93479412966747 132.8120188474508 132.93567276250295 4 132.50675243871294
131072 AR non-parallel 32 1x1x32 1 1x1x32 134217728 ((131072, 8, 128)) 1.0066329600000001 4 f32[131072,8,128] f32[131072,8,128]{2,1,0:T(8,128) {{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31}} 7.5359609845 7.5368315729 7.53694039645 7.5370274552900005 7.5359609845 7.53704922 2 7.534872749 133.57725485333094 133.5926863037881 133.59461523509523 133.59615838014093 133.57725485333094 133.59654416640237 2 133.55796554025954
262144 AR non-parallel 32 1x1x32 1 1x1x32 268435456 ((262144, 8, 128)) 2.0132659200000003 4 f32[262144,8,128] f32[262144,8,128]{2,1,0:T(8,128) {{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31}} 15.081388956 15.081388956 15.081388956 15.081388956 15.081388956 15.081388956 1 15.081388956 133.4934020913929 133.4934020913929 133.4934020913929 133.4934020913929 133.4934020913929 133.4934020913929 1 133.4934020913929
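
To make the reported numbers easier to sanity-check, here is a minimal sketch that recomputes `achieved_bw (GB/s)_p50` from three of the rows above. It assumes that achieved bandwidth is `transferred_data (GB)` divided by the step time in seconds, and that 1 GB is 1e9 bytes. Both are inferred from the table, not taken from the benchmark source. The helper names `achieved_bw_gbps` and `transfer_factor` are hypothetical.

```python
import math

GB = 1e9  # assumption: the benchmark reports decimal gigabytes

# Values copied from the result rows above (matrix_dim 32, 16384, 262144),
# with transferred_data rounded to drop floating-point noise.
# (input_num_elements, dtype_bytes, transferred_data_GB, step_time_ms_p50, reported_bw_p50)
ROWS = [
    (32768, 4, 0.00024576, 0.116130852, 2.116233505287639),
    (16777216, 4, 0.12582912, 0.863623049, 145.6991220251696),
    (268435456, 4, 2.01326592, 15.081388956, 133.4934020913929),
]


def achieved_bw_gbps(transferred_gb, step_time_ms):
    """Bandwidth in GB/s, assuming bw = transferred data / step time."""
    return transferred_gb / (step_time_ms / 1e3)


def transfer_factor(num_elements, dtype_bytes, transferred_gb):
    """Ratio of reported transferred bytes to the input buffer size in bytes."""
    return transferred_gb * GB / (num_elements * dtype_bytes)


for elems, nbytes, data_gb, t_ms, reported in ROWS:
    bw = achieved_bw_gbps(data_gb, t_ms)
    factor = transfer_factor(elems, nbytes, data_gb)
    print(f"elements={elems} bw={bw:.4f} GB/s reported={reported:.4f} factor={factor:.4f}")
```

Under these assumptions the recomputed bandwidth matches the reported p50 values, and the ratio of transferred bytes to input bytes is 1.875 in every row checked. That ratio equals 2(n-1)/n for n = 16. Whether the benchmark actually derives `transferred_data (GB)` this way, and whether 16 is the intended participant count for a `1x1x32` mesh, would need to be confirmed against the benchmark source.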
