@tanishq-chugh (Contributor) commented Dec 30, 2025

…sRead metrics for tables with multiple partitions

What changes were proposed in this pull request?

Fix HiveProtoLoggingHook so that the TablesRead metric contains no duplicate entries for tables with multiple partitions.

Why are the changes needed?

Currently, when a SELECT * query is executed on a table with multiple partitions, the TablesRead metric is populated with duplicate entries of the same table: one for the table itself and one for each partition accessed.

As a result, the TablesReadCount metric also reports an incorrect value.

Does this PR introduce any user-facing change?

Yes, in the generated proto files. Currently, an incorrect value is produced for the TablesRead metric when a SELECT * is run on a table with multiple partitions. For example, when the following queries are run:

CREATE TABLE tbl_test_part(a int) partitioned by (b int) stored as orc tblproperties("transactional"="true");
INSERT INTO tbl_test_part PARTITION (b=1) VALUES (11);
INSERT INTO tbl_test_part PARTITION (b=2) VALUES (22);
INSERT INTO tbl_test_part PARTITION (b=3) VALUES (33);

SELECT * FROM tbl_test_part;

With the current behaviour, in the proto file generated for the final SELECT query, the TablesRead list contains 4 duplicate entries of default.tbl_test_part (one for the table and one for each of its 3 partitions). Correspondingly, the TablesReadCount metric is 4.

After this fix, the list contains only one entry of default.tbl_test_part. Correspondingly, the TablesReadCount metric is 1.
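
The intended behaviour can be sketched as follows. This is a minimal, standalone illustration and not the actual patch: the `Entity` and `Type` classes below are hypothetical stand-ins rather than Hive's own classes, and the choice of a `LinkedHashSet` for deduplication is an assumption about one reasonable way to collapse the repeated names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class TablesReadDedupSketch {
    // Hypothetical stand-in for the entity type; not the real Hive enum.
    enum Type { TABLE, PARTITION, DATABASE }

    // Hypothetical stand-in for a read entity: its type plus the
    // fully qualified name of the table it belongs to.
    static final class Entity {
        final Type type;
        final String qualifiedTableName;

        Entity(Type type, String qualifiedTableName) {
            this.type = type;
            this.qualifiedTableName = qualifiedTableName;
        }
    }

    // Collects table names from TABLE and PARTITION entities, keeping
    // each name once and preserving first-seen order.
    static List<String> tableNames(List<Entity> entities) {
        Set<String> unique = new LinkedHashSet<>();
        for (Entity e : entities) {
            if (e.type == Type.TABLE || e.type == Type.PARTITION) {
                unique.add(e.qualifiedTableName);
            }
        }
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        // Mirrors the SELECT above: one table entity plus three partition entities.
        List<Entity> reads = Arrays.asList(
            new Entity(Type.TABLE, "default.tbl_test_part"),
            new Entity(Type.PARTITION, "default.tbl_test_part"),
            new Entity(Type.PARTITION, "default.tbl_test_part"),
            new Entity(Type.PARTITION, "default.tbl_test_part"));

        List<String> tables = tableNames(reads);
        System.out.println(tables);        // [default.tbl_test_part]
        System.out.println(tables.size()); // 1
    }
}
```

Collecting into a list without the set would yield four identical names for the same input, which matches the count reported before the fix.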

How was this patch tested?

Manually tested

@Aggarwal-Raghav (Contributor)

LGTM +1 (non-binding)
I was able to reproduce the issue (screenshot attached).

@tanishq-chugh, for the SELECT query there are 4 entries (1 table, 3 partitions) instead of 3 as mentioned in the PR description. Please update the PR description.

Just thinking out loud 🤔: is there any benefit to keeping the entity.getType() == PARTITION check, given that the table name will always be present in a SQL query and a partition can't exist on its own?

@tanishq-chugh (Contributor, Author)

@Aggarwal-Raghav
My bad, yes, it's 4. Thanks for catching this; I'm updating the description.

As for the second part, the entity.getType() == PARTITION check was added to fix HIVE-26646. Without it, the TablesWritten field won't get populated for the INSERT query in the description: INSERT INTO tbl_test_part PARTITION (b=1) VALUES (11);
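
The effect described above can be illustrated with a small standalone sketch. It assumes that the write-entity set for a static-partition INSERT holds only a PARTITION entity; that is inferred from this comment and not verified against Hive. The `Entity` and `Type` classes and the `includePartitions` flag are hypothetical and exist only for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class PartitionFilterSketch {
    // Hypothetical stand-in for the entity type; not the real Hive enum.
    enum Type { TABLE, PARTITION }

    // Hypothetical stand-in for a write entity.
    static final class Entity {
        final Type type;
        final String qualifiedTableName;

        Entity(Type type, String qualifiedTableName) {
            this.type = type;
            this.qualifiedTableName = qualifiedTableName;
        }
    }

    // Returns unique table names; partition entities count only
    // when includePartitions is true.
    static Set<String> tableNames(List<Entity> entities, boolean includePartitions) {
        Set<String> names = new LinkedHashSet<>();
        for (Entity e : entities) {
            boolean matches = e.type == Type.TABLE
                || (includePartitions && e.type == Type.PARTITION);
            if (matches) {
                names.add(e.qualifiedTableName);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Assumed write entities for:
        // INSERT INTO tbl_test_part PARTITION (b=1) VALUES (11);
        List<Entity> writes = Arrays.asList(
            new Entity(Type.PARTITION, "default.tbl_test_part"));

        System.out.println(tableNames(writes, false)); // []
        System.out.println(tableNames(writes, true));  // [default.tbl_test_part]
    }
}
```

Under that assumption, dropping the PARTITION check leaves the written-tables list empty, while keeping it and deduplicating by name reports the table exactly once.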
