HIVE-28626: Display MoveTask/StatsTask duration on the query summary #5544

Open · wants to merge 2 commits into base: master
2 changes: 2 additions & 0 deletions common/src/java/org/apache/hadoop/hive/ql/log/PerfLogger.java
@@ -101,6 +101,8 @@ public class PerfLogger {
public static final String LOAD_PARTITION = "LoadPartition";
public static final String LOAD_DYNAMIC_PARTITIONS = "LoadDynamicPartitions";

public static final String STATS_TASK = "StatsTask";

public static final String HIVE_GET_TABLE = "getTablesByType";
public static final String HIVE_GET_DATABASE = "getDatabase";
public static final String HIVE_GET_DATABASE_2 = "getDatabase2";
18 changes: 10 additions & 8 deletions ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java
@@ -78,6 +78,7 @@
import org.apache.hadoop.hive.ql.session.SessionState;
import org.apache.hadoop.hive.ql.util.DirectionUtils;
import org.apache.hadoop.util.StringUtils;
import org.apache.hadoop.util.Time;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

@@ -104,6 +105,7 @@ public class MoveTask extends Task<MoveWork> implements Serializable {

private static final long serialVersionUID = 1L;
private static transient final Logger LOG = LoggerFactory.getLogger(MoveTask.class);
private final PerfLogger perfLogger = SessionState.getPerfLogger();

public MoveTask() {
super();
@@ -169,7 +171,6 @@ public void flattenUnionSubdirectories(Path sourcePath) throws HiveException {
private void moveFile(Path sourcePath, Path targetPath, boolean isDfsDir)
throws HiveException {
try {
PerfLogger perfLogger = SessionState.getPerfLogger();
perfLogger.perfLogBegin("MoveTask", PerfLogger.FILE_MOVES);

String mesg = "Moving data to " + (isDfsDir ? "" : "local ") + "directory "
@@ -569,7 +570,8 @@ public int execute() {
}
releaseLocks(tbd);
}

long moveFilesDuration = perfLogger.getDuration(PerfLogger.FILE_MOVES);
Contributor:
Two questions about this:

1. This method already contains a perfLogEnd call:

      // if _blob_files_kept is present, use it to move the files. Else fall back to normal case.
      if (moveFilesUsingManifestFile(fs, sourcePath, targetPath)) {
        perfLogger.perfLogEnd("MoveTask", PerfLogger.FILE_MOVES);
        return;
      }

Does it stop the perfLogger? If yes, how will this change behave in that case?

2. perfLogEnd is already called at line 196. Can it affect the output of this getDuration call?

@abstractdog (Contributor Author), Nov 15, 2024:
  1. It doesn't stop it; it just registers an end time for the FILE_MOVES duration.
  2. Yeah, depending on the outcome, the ending time should still be registered anyway.

A possible refactoring is to fully encapsulate the whole MoveTask behavior so there is a single begin + end call.
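
For readers following the thread, here is a minimal, illustrative sketch of the bookkeeping being described. It is not Hive's actual PerfLogger; the class name, fields, and clock choice are assumptions made for the example. It only demonstrates the point above: perfLogEnd does not "stop" anything, it just records (or overwrites) an end timestamp, and getDuration reports the gap between the recorded start and the most recent end.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Illustrative stand-in for the begin/end/duration bookkeeping discussed above.
    // Not Hive's PerfLogger: the class name, fields, and clock choice are assumptions.
    public class SimplePerfLog {
      private final Map<String, Long> startTimes = new ConcurrentHashMap<>();
      private final Map<String, Long> endTimes = new ConcurrentHashMap<>();

      // "begin" only records a start timestamp for the key.
      public void perfLogBegin(String caller, String method) {
        startTimes.put(method, System.nanoTime() / 1_000_000L);
      }

      // "end" does not close or reset anything; it just (re-)registers an end
      // timestamp, so a later call for the same key overwrites the earlier one.
      public long perfLogEnd(String caller, String method) {
        endTimes.put(method, System.nanoTime() / 1_000_000L);
        return getDuration(method);
      }

      // Duration is the most recently recorded end minus the recorded start.
      public long getDuration(String method) {
        Long start = startTimes.get(method);
        Long end = endTimes.get(method);
        return (start != null && end != null) ? end - start : 0L;
      }

      public static void main(String[] args) throws InterruptedException {
        SimplePerfLog log = new SimplePerfLog();
        log.perfLogBegin("MoveTask", "FileMoves");
        Thread.sleep(50);
        log.perfLogEnd("MoveTask", "FileMoves");   // early end, e.g. the manifest-file path
        Thread.sleep(50);
        log.perfLogEnd("MoveTask", "FileMoves");   // later end overwrites the first
        System.out.println("FileMoves took " + log.getDuration("FileMoves") + " ms");
      }
    }

Under this model, the early perfLogEnd inside moveFilesUsingManifestFile only records an end time that a later call overwrites, so the getDuration read added in this patch reflects the last registered end, which matches the answer above.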

console.printInfo(String.format("Time taken to move files:\t %d ms", moveFilesDuration));
return 0;
} catch (HiveException he) {
return processHiveException(he);
@@ -685,7 +687,7 @@ private DataContainer handleDynParts(Hive db, Table table, LoadTableDesc tbd,
Map<Path, Utilities.PartitionDetails> dps = Utilities.getFullDPSpecs(conf, dpCtx, dynamicPartitionSpecs);

console.printInfo(System.getProperty("line.separator"));
long startTime = System.currentTimeMillis();
long startTime = Time.monotonicNow();
// load the list of DP partitions and return the list of partition specs
// TODO: In a follow-up to HIVE-1361, we should refactor loadDynamicPartitions
// to use Utilities.getFullDPSpecs() to get the list of full partSpecs.
@@ -710,8 +712,8 @@ private DataContainer handleDynParts(Hive db, Table table, LoadTableDesc tbd,
pushFeed(FeedType.DYNAMIC_PARTITIONS, dp.values());
}

String loadTime = "\t Time taken to load dynamic partitions: " +
(System.currentTimeMillis() - startTime)/1000.0 + " seconds";
String loadTime = String.format("Time taken to load dynamic partitions:\t %.3f seconds",
(Time.monotonicNow() - startTime) / 1000.0);
console.printInfo(loadTime);
LOG.info(loadTime);

@@ -720,7 +722,7 @@ private DataContainer handleDynParts(Hive db, Table table, LoadTableDesc tbd,
" To turn off this error, set hive.error.on.empty.partition=false.");
}

startTime = System.currentTimeMillis();
startTime = Time.monotonicNow();
// for each partition spec, get the partition
// and put it to WriteEntity for post-exec hook
for(Map.Entry<Map<String, String>, Partition> entry : dp.entrySet()) {
@@ -758,8 +760,8 @@ private DataContainer handleDynParts(Hive db, Table table, LoadTableDesc tbd,
}
LOG.info("Loading partition " + entry.getKey());
}
console.printInfo("\t Time taken for adding to write entity : " +
(System.currentTimeMillis() - startTime)/1000.0 + " seconds");
console.printInfo(String.format("Time taken for adding to write entity:\t %.3f seconds",
(Time.monotonicNow() - startTime) / 1000.0));
dc = null; // reset data container to prevent it being added again.
return dc;
}
7 changes: 7 additions & 0 deletions ql/src/java/org/apache/hadoop/hive/ql/exec/StatsTask.java
@@ -28,6 +28,8 @@
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.conf.HiveConf.ConfVars;
import org.apache.hadoop.hive.ql.Context;
import org.apache.hadoop.hive.ql.log.PerfLogger;
import org.apache.hadoop.hive.ql.session.SessionState;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.apache.hadoop.hive.ql.TaskQueue;
@@ -55,6 +57,7 @@
public class StatsTask extends Task<StatsWork> implements Serializable {
private static final long serialVersionUID = 1L;
private static transient final Logger LOG = LoggerFactory.getLogger(StatsTask.class);
private final PerfLogger perfLogger = SessionState.getPerfLogger();

public StatsTask() {
super();
@@ -94,6 +97,7 @@ public int execute() {
}
int ret = 0;
try {
perfLogger.perfLogBegin("StatsTask", PerfLogger.STATS_TASK);

if (work.isFooterScan()) {
work.getBasicStatsNoJobWork().setPartitions(work.getPartitions());
@@ -113,6 +117,9 @@
LOG.error("Failed to run stats task", e);
setException(e);
return 1;
} finally {
perfLogger.perfLogEnd("StatsTask", PerfLogger.STATS_TASK);
console.printInfo(String.format("StatsTask took %d", perfLogger.getDuration(PerfLogger.STATS_TASK)));
}
return 0;
}
2 changes: 1 addition & 1 deletion ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
@@ -5047,7 +5047,7 @@ private static void deleteAndRename(FileSystem destFs, Path destFile, FileStatus
try {
// rename cannot overwrite non empty destination directory, so deleting the destination before renaming.
destFs.delete(destFile);
LOG.info("Deleted destination file" + destFile.toUri());
LOG.info("Deleted destination file: {}", destFile.toUri());
} catch (FileNotFoundException e) {
// no worries
}
Contributor:
Leftover

Contributor Author:
No, an empty line at the start of a method body is not something I'd expect in any method, so I simply removed it :)

@@ -102,7 +102,6 @@ public BasicStatsTask(HiveConf conf, BasicStatsWork work) {

@Override
public int process(Hive db, Table tbl) throws Exception {

ayushtkn marked this conversation as resolved.
LOG.info("Executing stats task");
table = tbl;
return aggregateStats(db, tbl);