[Bug] hive catalog, Failed to get first split after waiting for 30 seconds. #60682

@ZJHZH

Description

Search before asking

  • I had searched in the issues and found no similar issues.

Version

doris-3.1.4-rc02-7f5ba43de6

What's Wrong?

If the number of partitions queried exceeds num_partitions_in_batch_mode, the query fails after waiting for 30 seconds with the following error:

ERROR 1105 (HY000): errCode = 2, detailMessage = Failed to get first split after waiting for 30 seconds.

The executor returned by Env.getCurrentEnv().getExtMetaCacheMgr().getScheduleExecutor() holds a number of threads, accumulated from earlier runs and exceeding max_external_cache_loader_thread_pool_size, that loop indefinitely in org.apache.doris.datasource.SplitAssignment#appendBatch, repeatedly calling queue.offer on a queue that is already full. This may be caused by a query terminating abnormally, but it is impossible to determine which query terminated or why.

    private void appendBatch(Multimap<Backend, Split> batch) throws UserException {
        for (Backend backend : batch.keySet()) {
            // ...
            // Retries every 100 ms for as long as needMoreSplit() is true; there is no overall timeout.
            while (needMoreSplit()) {
                BlockingQueue<Collection<TScanRangeLocations>> queue =
                        assignment.computeIfAbsent(backend, be -> new LinkedBlockingQueue<>(10000));
                try {
                    if (queue.offer(locations, 100, TimeUnit.MILLISECONDS)) {
                        break;
                    }
                } catch (InterruptedException e) {
                    addUserException(new UserException("Failed to offer batch split by interrupted", e));
                }
            }
        }
    }

"NotCheckpointscheduleExecutor-0" Id=4862 TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@164e81e
    at java.base@17.0.15/jdk.internal.misc.Unsafe.park(Native Method)
    -  waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@164e81e
    at java.base@17.0.15/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:252)
    at java.base@17.0.15/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1679)
    at java.base@17.0.15/java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:378)
    at app//org.apache.doris.datasource.SplitAssignment.appendBatch(Unknown Source)
    at app//org.apache.doris.datasource.SplitAssignment.addToQueue(Unknown Source)
    at app//org.apache.doris.datasource.hive.source.HiveScanNode.lambda$startSplit$0(Unknown Source)
    at app//org.apache.doris.datasource.hive.source.HiveScanNode$$Lambda$4683/0x00007f1235a97240.run(Unknown Source)
    at java.base@17.0.15/java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1804)
    at java.base@17.0.15/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base@17.0.15/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base@17.0.15/java.lang.Thread.run(Thread.java:840)
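The following standalone sketch reproduces the shape of the retry loop outside Doris, assuming the only ways out are a successful offer() or needMoreSplit() becoming false. The class and method names (StuckOfferDemo, offerLikeAppendBatch) are hypothetical, the queue capacity is shrunk from 10000 to 1, and a supplier that always returns true stands in for a needMoreSplit() that never turns false. The producer stays parked inside offer() until something drains the queue.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

public class StuckOfferDemo {
    /**
     * Simplified version of the retry loop in appendBatch: try offer() with a
     * 100 ms timeout and retry for as long as needMoreSplit says so.
     * Returns true once the item is queued, false if needMoreSplit turned false.
     */
    static boolean offerLikeAppendBatch(BlockingQueue<String> queue, String item,
                                        BooleanSupplier needMoreSplit) throws InterruptedException {
        while (needMoreSplit.getAsBoolean()) {
            if (queue.offer(item, 100, TimeUnit.MILLISECONDS)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        // Capacity 1 instead of 10000 so the queue is full from the start.
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(1);
        queue.offer("unconsumed");

        Thread producer = new Thread(() -> {
            try {
                // Stand-in for a needMoreSplit() that never turns false.
                offerLikeAppendBatch(queue, "split", () -> true);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.setDaemon(true);
        producer.start();

        producer.join(1_000);
        // true: nobody has drained the queue, so every offer() timed out.
        System.out.println("still retrying after 1 s: " + producer.isAlive());

        queue.take();          // a consumer finally drains one element...
        producer.join(1_000);  // ...so the producer's next offer() can succeed
        System.out.println("still retrying after drain: " + producer.isAlive());
    }
}
```

Without a consumer or a stop signal, each iteration simply parks for another 100 ms, which is consistent with the repeated LinkedBlockingQueue.offer frames on the NotCheckpointscheduleExecutor thread.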

What You Expected?

Once the query has finished, needMoreSplit() should return false so that appendBatch stops retrying.
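One way to get that behavior, sketched under assumptions: a stop flag that the query's cleanup path sets and that needMoreSplit() consults. StoppableAssignmentSketch and its stop() method are hypothetical names, not Doris APIs; the sketch only shows that a producer parked in the retry loop leaves within about one 100 ms offer timeout once the flag is set.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class StoppableAssignmentSketch {
    private final BlockingQueue<String> queue;
    // Hypothetical flag, set when the owning query finishes or is cancelled.
    private final AtomicBoolean stopped = new AtomicBoolean(false);

    public StoppableAssignmentSketch(int capacity) {
        this.queue = new LinkedBlockingQueue<>(capacity);
    }

    /** Hypothetical hook: called from the query's cleanup path (finish, cancel, or error). */
    public void stop() {
        stopped.set(true);
    }

    /** Producers keep going only while the query is still alive. */
    public boolean needMoreSplit() {
        return !stopped.get();
    }

    /** Retry loop in the style of appendBatch; returns false if the query stopped first. */
    public boolean append(String split) throws InterruptedException {
        while (needMoreSplit()) {
            if (queue.offer(split, 100, TimeUnit.MILLISECONDS)) {
                return true;
            }
        }
        return false;
    }

    public BlockingQueue<String> queue() {
        return queue;
    }

    public static void main(String[] args) throws Exception {
        StoppableAssignmentSketch assignment = new StoppableAssignmentSketch(1);
        assignment.append("first"); // fills the queue; nobody consumes it

        Thread producer = new Thread(() -> {
            try {
                System.out.println("queued: " + assignment.append("second"));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        Thread.sleep(300);   // producer is retrying against the full queue
        assignment.stop();   // query ends, so needMoreSplit() turns false
        producer.join();     // returns within roughly one 100 ms offer timeout
    }
}
```

The flag only helps if every termination path (normal completion, cancellation, errors) calls stop(); the report suspects an abnormal termination, which is exactly the case where such a call could be skipped.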

Alternatively, appendBatch should have an overall timeout so that it cannot retry forever.
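A minimal sketch of the timeout alternative, assuming the 100 ms offer slices are kept and an overall budget is added on top. offerWithDeadline and the totalTimeoutMs parameter are hypothetical; how Doris would choose the budget (for example, by tying it to the query timeout) is left open. When the budget runs out the method throws TimeoutException, so a caller could record an error and return the executor thread to the pool instead of occupying it indefinitely.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

public class DeadlineOfferSketch {
    /**
     * Retries offer() in slices of at most 100 ms, like appendBatch, but gives up
     * once totalTimeoutMs has elapsed. Returns true if the item was queued and
     * false if needMoreSplit turned false first.
     */
    static boolean offerWithDeadline(BlockingQueue<String> queue, String item,
                                     BooleanSupplier needMoreSplit, long totalTimeoutMs)
            throws InterruptedException, TimeoutException {
        // nanoTime is monotonic, so wall-clock adjustments cannot shorten or extend the budget.
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(totalTimeoutMs);
        while (needMoreSplit.getAsBoolean()) {
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                throw new TimeoutException("split queue stayed full for " + totalTimeoutMs + " ms");
            }
            long sliceNanos = Math.min(remainingNanos, TimeUnit.MILLISECONDS.toNanos(100));
            if (queue.offer(item, sliceNanos, TimeUnit.NANOSECONDS)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(1);
        queue.offer("unconsumed"); // full queue that nobody drains
        try {
            offerWithDeadline(queue, "split", () -> true, 500);
        } catch (TimeoutException e) {
            // A hypothetical caller would record a UserException here and release the thread.
            System.out.println("gave up: " + e.getMessage());
        }
    }
}
```

Keeping the short per-offer slice means needMoreSplit() is still re-checked about every 100 ms, so the stop-flag approach and the deadline can be combined.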

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
