Currently, the total amount of resources for a Flink Sort job is read from the configuration file flink-sort-plugin.properties, so every submitted sort job uses the same amount of resources. When the data scale is large, the resources are insufficient; when it is small, they are wasted.

    # Flink parallelism
    flink.parallelism=1

Therefore, dynamically adjusting the amount of resources according to the data volume is an urgently needed feature.
Resource Adaptive Adjustment
Theoretically, Flink can process about 1000 records/second/core, although the actual figure depends on factors such as the state backend.
Influencing factors:
- Data scale
- Storage IO bottleneck: when the performance of a single client connection to external storage becomes the bottleneck, increasing the parallelism or the number of threads can help.
- Transformation computational complexity: for a fixed LoadNode, this is a deterministic factor.
- Advanced factors: cores per task manager, parallelism per core, and so on.
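
As a rough illustration of how the factors above could be combined, here is a minimal sketch that derives a parallelism value from the expected input rate. It assumes the ~1000 records/second/core figure quoted above; the function names, the `complexity_factor` parameter, and the min/max bounds are hypothetical and not part of InLong or Flink.

```python
import math

# Assumption taken from the issue text: one core handles roughly 1000 records/second.
RECORDS_PER_SECOND_PER_CORE = 1000


def estimate_parallelism(records_per_second, complexity_factor=1.0,
                         min_parallelism=1, max_parallelism=128):
    """Estimate a parallelism value from the expected input rate.

    complexity_factor is a hypothetical multiplier (> 0) that raises the
    required capacity for heavier transformations. The bounds are
    illustrative limits that keep the result within a cluster's capacity.
    """
    if records_per_second < 0:
        raise ValueError("records_per_second must be non-negative")
    if complexity_factor <= 0:
        raise ValueError("complexity_factor must be positive")
    required = math.ceil(
        records_per_second * complexity_factor / RECORDS_PER_SECOND_PER_CORE
    )
    # Clamp into [min_parallelism, max_parallelism].
    return max(min_parallelism, min(max_parallelism, required))


def render_property(parallelism):
    """Format the value as the flink.parallelism line from the properties file."""
    return f"flink.parallelism={parallelism}"


if __name__ == "__main__":
    print(render_property(estimate_parallelism(2500)))
```

A real implementation would also have to account for the storage IO bottleneck and the cores-per-task-manager setting, which this sketch leaves out.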
featzhang changed the title from "[Feature][Sort] Adjust sort resources according to the amount of data" to "[Feature][Sort] Adjust sort resources according to data scale" on Dec 24, 2022
Use case
No response