Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

High CPU Usage During Insertions in milvus-sdk-java #1061

Open
LeePui opened this issue Sep 9, 2024 · 4 comments
Open

High CPU Usage During Insertions in milvus-sdk-java #1061

LeePui opened this issue Sep 9, 2024 · 4 comments

Comments

@LeePui
Copy link

LeePui commented Sep 9, 2024

Hi, everyone, thank Hi everyone,

Thank you for providing such a convenient Java SDK; it has been very useful.

While using version 2.4.3 of the milvus-sdk-java, I have encountered some performance issues. Here are some metrics and analysis that I have gathered.

When performing insertions in a single thread, I noticed unusually high CPU usage. After profiling with async-profiler, I pinpointed the most time-consuming operation at this line: AbstractMilvusGrpcClient.java#L1569.

public R<MutationResult> insert(@NonNull InsertParam requestParam) {
        ......
        logDebug(requestParam.toString());
        ......
}

protected void logDebug(String msg, Object... params) {
    if (logLevel.ordinal() <= LogLevel.Debug.ordinal()) {
        logger.debug(msg, params);
    }
}

The attached flame graph can attest to this issue.
image

The high CPU usage seems to be caused by premature calls to toString. In practice, when I set the log level to INFO, there is no need for the toString method to be called. I suggest checking the log level before calling toString.

Thank you for considering this improvement.

@yhmo
Copy link
Contributor

yhmo commented Sep 10, 2024

Thanks for pointing out this problem. I didn't realize it was a problem before.
The InsertParam.toString() is implemented by lombok annotation @tostring, which parses all the vectors to a long text like "[1.1234, 2.2234, ....]". It becomes a bottleneck when the inserted batch is large.

For the requests that could pass large/complicated data, we should manually customize the toString() method. For insertParam, we just want to print out the target collection name, the number of vectors, no need to print out all the vectors.
I will make a change for this, it will take effect in the next minor version.

@xiaofan-luan
Copy link
Contributor

good catch!

@yhmo
Copy link
Contributor

yhmo commented Sep 13, 2024

Fixed by this pr: #1064

@yin-bp
Copy link

yin-bp commented Nov 1, 2024

the good code is:

if(logger.isDebugEnabled()){
               String msg = segmentIDs.getDataCount() + " segments of " + collectionName + " has been flushed";
                logDebug(msg);
}
if(logger.isDebugEnabled()){
              logDebug(requestParam.toString());
}

构建调试日志信息之前,就调用isDebugEnabled进行控制,这样性能才佳

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants