Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added an option to split the output proto files by top level category #367

Open
wants to merge 3 commits into
base: xml_converter
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 5 additions & 1 deletion xml_converter/integration_tests/run_tests.py
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@ def run_xml_converter(
output_proto: Optional[List[str]] = None,
split_output_proto: Optional[str] = None,
allow_duplicates: Optional[bool] = None,
split_proto_by_category: Optional[bool] = None,
) -> Tuple[str, str, int]:

# Build the command to execute the C++ program with the desired function and arguments
Expand All @@ -38,6 +39,8 @@ def run_xml_converter(
cmd += ["--output-split-guildpoint-path"] + [split_output_proto]
if allow_duplicates:
cmd += ["--allow-duplicates"]
if split_proto_by_category:
cmd += ["--split-guildpoint-by-category"]

# Run the C++ program and capture its output
result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
Expand Down Expand Up @@ -177,7 +180,8 @@ def main() -> bool:
input_proto=testcase.proto_input_paths,
output_xml=[xml_output_dir_path],
output_proto=[proto_output_dir_path],
allow_duplicates=testcase.allow_duplicates
allow_duplicates=testcase.allow_duplicates,
split_proto_by_category=testcase.split_proto_by_category
)

# Sanitize and denoise the lines
Expand Down
12 changes: 11 additions & 1 deletion xml_converter/integration_tests/src/testcase_loader.py
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@ class Testcase:
expected_stderr: List[str]
expected_returncode: int
allow_duplicates: bool
split_proto_by_category: bool


################################################################################
Expand Down Expand Up @@ -60,6 +61,7 @@ def load_testcase(path: str) -> Optional[Testcase]:
xml_input_paths: List[str] = []
proto_input_paths: List[str] = []
allow_duplicates: bool = False
split_proto_by_category: bool = False
for pack_name, pack_type in data["input_paths"].items():
if not isinstance(pack_name, str):
print(f"Invalid pack name, expecting a string but got {pack_name}")
Expand Down Expand Up @@ -113,6 +115,13 @@ def load_testcase(path: str) -> Optional[Testcase]:
else:
allow_duplicates = data["allow_duplicates"]

if "split_proto_by_category" in data:
if not isinstance(data["split_proto_by_category"], bool):
print(f"Invalid Test, expecting bool value for 'split_proto_by_category' in {path}")
return None
else:
split_proto_by_category = data["split_proto_by_category"]

return Testcase(
name=os.path.basename(path),
xml_input_paths=xml_input_paths,
Expand All @@ -122,7 +131,8 @@ def load_testcase(path: str) -> Optional[Testcase]:
expected_stdout=to_lines(data["expected_stdout"]),
expected_stderr=to_lines(data["expected_stderr"]),
expected_returncode=data["expected_returncode"],
allow_duplicates=allow_duplicates
allow_duplicates=allow_duplicates,
split_proto_by_category=split_proto_by_category
)


Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ expected_stdout: |
The following top level categories were found in more than one pack:
"mycategory" in files:
pack/xml_file.xml
pack2/markers.bin
pack2/markers.guildpoint
pack3/xml_file.xml
expected_stderr: |
expected_returncode: 0
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
<OverlayData>
<MarkerCategory DisplayName="TopLevel1" Name="toplevel1" XPos="169.81" YPos="210.65" ZPos="215.83" MapID="50" ></MarkerCategory>

<MarkerCategory DisplayName="TopLevel2" Name="toplevel2" XPos="169.81" YPos="210.65" ZPos="215.83" MapID="50" ></MarkerCategory>

<MarkerCategory DisplayName="TopLevel3" Name="toplevel3" XPos="169.81" YPos="210.65" ZPos="215.83" MapID="50" ></MarkerCategory>
Comment on lines +2 to +6
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets differentiate these by more than just DisplayName and Name. Otherwise this test does a silly job at not catching the issue of "we write the same top level node but with different display/names for each category". The attributes could also be given to the POIs instead of the categories because XPos YPos ZPos dont make a huge amount of sense as category attributes.

Lets also add some subcategories here too so that we can test that "more than just the top level category gets written". Because we do want the entire category tree.


<POIs>
<POI Type="toplevel1" />
<POI Type="toplevel2" />
<POI Type="toplevel3" />
</POIs>
</OverlayData>
klingbolt marked this conversation as resolved.
Show resolved Hide resolved
klingbolt marked this conversation as resolved.
Show resolved Hide resolved
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Full Diff
--- xml_converter/integration_tests/test_cases/xml_split_by_category/output_proto/toplevel1.guildpoint.textproto._old	2024-12-09 03:53:05.031311821 +0000
+++ xml_converter/integration_tests/test_cases/xml_split_by_category/output_proto/toplevel1.guildpoint.textproto._new	2024-12-09 03:53:05.035311909 +0000
@@ -0,0 +1,12 @@
+category {
+  name: "TopLevel1"
+  icon {
+    map_id: 50
+    position {
+      x: 169.81
+      y: 210.65
+      z: 215.83
+    }
+  }
+  id: "\010\007L]\332B\361\323"
+}

Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@

*
TopLevel1 2B\Ï)Cf¦RC{ÔWCBL]ÚBñÓ
Expand Down
klingbolt marked this conversation as resolved.
Show resolved Hide resolved
klingbolt marked this conversation as resolved.
Show resolved Hide resolved
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Full Diff
--- xml_converter/integration_tests/test_cases/xml_split_by_category/output_proto/toplevel2.guildpoint.textproto._old	2024-12-09 03:53:05.043312083 +0000
+++ xml_converter/integration_tests/test_cases/xml_split_by_category/output_proto/toplevel2.guildpoint.textproto._new	2024-12-09 03:53:05.051312257 +0000
@@ -0,0 +1,12 @@
+category {
+  name: "TopLevel2"
+  icon {
+    map_id: 50
+    position {
+      x: 169.81
+      y: 210.65
+      z: 215.83
+    }
+  }
+  id: "\010\007L]\333B\361\323"
+}

Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@

*
TopLevel2 2B\Ï)Cf¦RC{ÔWCBL]ÛBñÓ
Expand Down
klingbolt marked this conversation as resolved.
Show resolved Hide resolved
klingbolt marked this conversation as resolved.
Show resolved Hide resolved
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Full Diff
--- xml_converter/integration_tests/test_cases/xml_split_by_category/output_proto/toplevel3.guildpoint.textproto._old	2024-12-09 03:53:05.059312431 +0000
+++ xml_converter/integration_tests/test_cases/xml_split_by_category/output_proto/toplevel3.guildpoint.textproto._new	2024-12-09 03:53:05.067312606 +0000
@@ -0,0 +1,12 @@
+category {
+  name: "TopLevel3"
+  icon {
+    map_id: 50
+    position {
+      x: 169.81
+      y: 210.65
+      z: 215.83
+    }
+  }
+  id: "\010\007L]\334B\361\323"
+}

Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@

*
TopLevel3 2B\Ï)Cf¦RC{ÔWCBL]ÜBñÓ
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
<OverlayData>
<MarkerCategory DisplayName="TopLevel1" ID="CAdMXdpC8dM=" Name="toplevel1">
</MarkerCategory>

<MarkerCategory DisplayName="TopLevel2" ID="CAdMXdtC8dM=" Name="toplevel2">
</MarkerCategory>

<MarkerCategory DisplayName="TopLevel3" ID="CAdMXdxC8dM=" Name="toplevel3">
</MarkerCategory>

<POIs>
<POI Type="toplevel1" MapID="50" XPos="169.809998" YPos="210.649994" ZPos="215.830002"/>
<POI Type="toplevel2" MapID="50" XPos="169.809998" YPos="210.649994" ZPos="215.830002"/>
<POI Type="toplevel3" MapID="50" XPos="169.809998" YPos="210.649994" ZPos="215.830002"/>
</POIs>
</OverlayData>
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
input_paths:
"pack": "xml"
expected_stdout: |
expected_stderr: |
expected_returncode: 0
split_proto_by_category: true
45 changes: 43 additions & 2 deletions xml_converter/src/packaging_protobin.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -211,7 +211,6 @@ void _write_protobuf_file(

void write_protobuf_file(
const string& marker_pack_root_directory,
const StringHierarchy& category_filter,
const map<string, Category>* marker_categories,
const vector<Parseable*>* parsed_pois) {
std::map<string, std::vector<Parseable*>> category_to_pois;
Expand All @@ -235,6 +234,8 @@ void write_protobuf_file(
}
}

StringHierarchy category_filter;
category_filter.add_path({}, true);
Comment on lines +237 to +238
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you hardcoding the global wildcard here and preventing it from being passed in to the function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that category_filter is not global in the same sense as the other two large inputs. It is created just before this function is called, and always for a single path. In this sense, it is more similar in scope to the ProtoWriterState, which is also created in this function.

_write_protobuf_file(
join_file_paths(state.marker_pack_root_directory, "markers.guildpoint"),
category_filter,
Expand All @@ -243,10 +244,47 @@ void write_protobuf_file(
&state);
}

// Write protobuf per top level category
void write_protobuf_file_per_category(
const string& marker_pack_root_directory,
const map<string, Category>* marker_categories,
const vector<Parseable*>* parsed_pois) {
std::map<string, std::vector<Parseable*>> category_to_pois;
ProtoWriterState state;
state.marker_pack_root_directory = marker_pack_root_directory;

for (size_t i = 0; i < parsed_pois->size(); i++) {
Parseable* parsed_poi = (*parsed_pois)[i];
if (parsed_poi->classname() == "POI") {
Icon* icon = dynamic_cast<Icon*>(parsed_poi);
// TODO(331): This is the wrong place to lowercase() the category and is hiding some crimes elsewhere
category_to_pois[lowercase(icon->category.category)].push_back(parsed_poi);
}
else if (parsed_poi->classname() == "Trail") {
Trail* trail = dynamic_cast<Trail*>(parsed_poi);
// TODO(331): This is the wrong place to lowercase() the category and is hiding some crimes elsewhere
category_to_pois[lowercase(trail->category.category)].push_back(parsed_poi);
}
else {
std::cout << "Unknown type" << std::endl;
}
}

for (map<string, Category>::const_iterator it = marker_categories->begin(); it != marker_categories->end(); it++) {
StringHierarchy category_filter;
category_filter.add_path({it->first}, true);
_write_protobuf_file(
join_file_paths(state.marker_pack_root_directory, it->first + ".guildpoint"),
category_filter,
marker_categories,
category_to_pois,
&state);
}
}

Comment on lines +248 to +284
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And then this function duplicates all of the logic of write_protobuf_file in order to access the earlier removed ability to pass in a category_fiter.

If the counter argument is the building of the category_to_pois dict for each file, aka: performance optimizations. I would necessarily have expected to see benchmarks somewhere in this PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function actually duplicates write_protobuf_file_per_map_id. Since that is now 3 places, perhaps the creation of category_to_pois should be its own function.

My other thought was that I am already iterating over the marker_categories map in order to get the name and would need a second iteration if I were to do it for the category filter first before this function is called.

// Write protobuf per map id
void write_protobuf_file_per_map_id(
const string& marker_pack_root_directory,
const StringHierarchy& category_filter,
const map<string, Category>* marker_categories,
const vector<Parseable*>* parsed_pois) {
std::map<int, std::map<string, std::vector<Parseable*>>> mapid_to_category_to_pois;
Expand All @@ -268,6 +306,9 @@ void write_protobuf_file_per_map_id(
}
}

StringHierarchy category_filter;
category_filter.add_path({}, true);

Comment on lines +309 to +311
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

Why are you hardcoding the global wildcard here and preventing it from being passed in to the function?

for (auto iterator = mapid_to_category_to_pois.begin(); iterator != mapid_to_category_to_pois.end(); iterator++) {
string output_filepath = join_file_paths(state.marker_pack_root_directory, to_string(iterator->first) + ".guildpoint");

Expand Down
7 changes: 5 additions & 2 deletions xml_converter/src/packaging_protobin.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -19,12 +19,15 @@ std::set<std::string> read_protobuf_file(

void write_protobuf_file(
const std::string& marker_pack_root_directory,
const StringHierarchy& category_filter,
const std::map<std::string, Category>* marker_categories,
const std::vector<Parseable*>* parsed_pois);

void write_protobuf_file_per_category(
const std::string& marker_pack_root_directory,
const std::map<std::string, Category>* marker_categories,
const std::vector<Parseable*>* parsed_pois);

void write_protobuf_file_per_map_id(
const std::string& marker_pack_root_directory,
const StringHierarchy& category_filter,
const std::map<std::string, Category>* marker_categories,
const std::vector<Parseable*>* parsed_pois);
28 changes: 19 additions & 9 deletions xml_converter/src/xml_converter.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -138,16 +138,20 @@ void write_taco_directory(
void write_burrito_directory(
string output_path,
map<string, Category>* marker_categories,
vector<Parseable*>* parsed_pois) {
vector<Parseable*>* parsed_pois,
bool split_proto_by_category) {
if (!filesystem::is_directory(output_path)) {
if (!filesystem::create_directory(output_path)) {
cout << "Error: " << output_path << "is not a valid directory path" << endl;
return;
}
}
StringHierarchy category_filter;
category_filter.add_path({}, true);
write_protobuf_file(output_path, category_filter, marker_categories, parsed_pois);
if (split_proto_by_category) {
write_protobuf_file_per_category(output_path, marker_categories, parsed_pois);
}
else {
write_protobuf_file(output_path, marker_categories, parsed_pois);
}
Comment on lines +149 to +154
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic should probably be nearby the logic for split by map id. It is convoluted that split by map id is in the parent function but split by category is in this child function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have actually considered moving map_id into this function but it would require reworking some other variables so thought it outside this PR scope.

}

////////////////////////////////////////////////////////////////////////////////
Expand All @@ -166,6 +170,9 @@ void process_data(
// allow for splitting out a single markerpack
vector<string> output_taco_paths,
vector<string> output_guildpoint_paths,
// This value will change how the guildpoint files are written so that
// each top level category is in its own file
bool split_proto_by_category,

// This is a special output path used for burrito internal use that splits
// the guildpoint protobins by map id.
Expand Down Expand Up @@ -245,15 +252,13 @@ void process_data(

// Write all of the protobin guildpoint paths
for (size_t i = 0; i < output_guildpoint_paths.size(); i++) {
write_burrito_directory(output_guildpoint_paths[i], &marker_categories, &parsed_pois);
write_burrito_directory(output_guildpoint_paths[i], &marker_categories, &parsed_pois, split_proto_by_category);
}

// Write the special map-split protbin guildpoint file
begin = chrono::high_resolution_clock::now();
if (output_split_guildpoint_dir != "") {
StringHierarchy category_filter;
category_filter.add_path({}, true);
write_protobuf_file_per_map_id(output_split_guildpoint_dir, category_filter, &marker_categories, &parsed_pois);
write_protobuf_file_per_map_id(output_split_guildpoint_dir, &marker_categories, &parsed_pois);
}
end = chrono::high_resolution_clock::now();
dur = end - begin;
Expand All @@ -278,6 +283,7 @@ int main(int argc, char* argv[]) {
vector<string> input_guildpoint_paths;
vector<string> output_guildpoint_paths;
bool allow_duplicates = false;
bool split_proto_by_category = false;

// Typically "~/.local/share/godot/app_userdata/Burrito/protobins" for
// converting from xml markerpacks to internal protobuf files.
Expand Down Expand Up @@ -308,6 +314,10 @@ int main(int argc, char* argv[]) {
allow_duplicates = true;
arg_target = nullptr;
}
else if (!strcmp(argv[i], "--split-guildpoint-by-category")) {
split_proto_by_category = true;
Comment on lines +317 to +318
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if --output-split-guildpoint-path and --split-guildpoint-by-category are both specified. Does the program act in a way that a user could reasonably expect? I think it wont, because it will only split by map id as the current logic for splitting by map id does not interact with the splitting by category logic.

Maybe this is the wrong argument to be exposing to the user? Think through some user stories for this holistically because we can still alter how other command line flags for xml_converter are presented to the user if we think there are better paradigms to use than what we are currently using.

arg_target = nullptr;
}
else {
if (arg_target != nullptr) {
arg_target->push_back(argv[i]);
Expand Down Expand Up @@ -335,7 +345,7 @@ int main(int argc, char* argv[]) {
allow_duplicates,
output_taco_paths,
output_guildpoint_paths,
split_proto_by_category,
output_split_guildpoint_dir);

return 0;
}
Loading