Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨Add list_files function #615 #615

Open
wants to merge 19 commits into
base: develop-pros-4
Choose a base branch
from

Conversation

Gracelu128
Copy link
Contributor

@Gracelu128 Gracelu128 commented Oct 27, 2023

Summary:

Adds functionality to list the files on a sd card.

Motivation:

Made wrapper function for the CPP api that returns a vector of strings of file names. Renamed original function as list_files_raw in both CPP api and C api.

Test Plan:

  • Get vector of file names in the default directory
  • Get vector of file names in a sub directory
  • Test invalid directory

noam987 and others added 6 commits October 23, 2023 19:24
…tring of file names

Will return a vector of std::strings of file names, will return a vector containing one std::string PROS_ERR if error occurs.
@Gracelu128 Gracelu128 changed the title ✨Add list_files function #612 ✨Add list_files function #615 Oct 27, 2023
@noam987 noam987 self-requested a review October 27, 2023 20:17
Function to return vector containing specific error state errno is set to, if none of the error states will set to FILE I/O error
* // Given vector of std::string file names, print each file name
* // Note that if there is an error, the vector will contain one element, PROS_ERR
* // Print each file name
* for (std::string file : files) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for (std::string& file : files) {

That & is very important for performance reasons.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented this suggestion in commit on Oct 30, 3d41ca6

* // Print each file name
* for (std::string file : files) {
* std::cout << file << std::endl;
* }
* }
* \endcode
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surely the docs should explain the params as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added param in header for funcs in commit 3d41ca6 and 4247fb1

return usd_list_files_raw(path, buffer, len);
}

std::vector<std::string> list_files(const char* path, char* buffer, int32_t len) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function should definitely not require users to pass in their own buffer and len. You should just allocate a buffer yourself at the top of this function body then free it after it's done being used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented this change in commit 4247fb1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, can I ask if the number 10000 came from somewhere or was it chosen arbitrarily?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Came from an estimate of the worst case that is realistic to happen. We needed to chose a number, and decided that 100 files in a single directory with an approximate average length of 100 (really 99 if you include the separators) will be enough for every feasible use case we could come up with.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, fair enough. If you're choosing a number, a power of 2 (or sum of powers of 2) might be nice for memory allocator friendliness :)

(I have no idea what allocation strategy VexOS uses, this may be completely irrelevant)

// Store char * buffer as std::string in str
// Store delimiter in std::string, '\n'
std::string str(buffer);
std::string delimiter = "\n";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

char delimeter = '\n';

You definitely don't need a whole std::string for just \n, which is only a single char.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented this change in commit 3d41ca6

// Parse buffer given call successful, split by '/n'
// Store char * buffer as std::string in str
// Store delimiter in std::string, '\n'
std::string str(buffer);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no reason to allocate a whole additional std::string for this, use std::string_view over the already-existing buffer instead. You can still use .find() the same way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented this change in commit 3d41ca6.

// file_name is the string from the beginning of str to the first '\n', excluding '\n'
file_name = str.substr(0, position);
// Add token to files vector
files.push_back(file_name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should move file_name into the vector instead of copying it. This will reuse the already allocated heap buffer instead of copying over a new one, resulting in each file name being copied once instead of twice or more.

files.emplace_back(std::move(file_name));

Alternatively, file_name could be a std::string_view that you emplace_back. Since files::value_type is std::string, that will result in the file name being copied (for the first time). Actually I think I like this solution better than the move anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented this change in commit 3d41ca6. Made file_name a std::string_view.

// Add token to files vector
files.push_back(file_name);
// Erase file_name from str, would be pos + 1 if we wanted to include '\n'
str.erase(0, position + 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work if you use a std::string_view for str (and it's very unnecessarily slow even as a std::string). You should either keep track of the index to start at for the next iteration, or maybe str = str.substr(...)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented this change in commit 3d41ca6. Altered code to instead have index to keep track of start for next iteration.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! Thanks

// Check if call successful, if error return vector containing error state
if (success == PROS_ERR) {
// Check errno to see which error state occurred
// push back error state to files vector as std::string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mmm I'm definitely not convinced that this error handling strategy is a good approach. Maybe an out-param for the vector is better so you have some other way to return errors? That's more inline with how pros usually does things anyway. (returning errors via return type)

@djava
Copy link
Contributor

djava commented Nov 3, 2023

@Gracelu128 it would be helpful if, when you make changes to address review comments I've made, you could respond to the review comment to let me know (maybe with the commit hash that you addressed it in as well). Would you be able to go through and do that here so it's easier to dig through this pile of commits?

Thanks!

Comment on lines 31 to 33
char *buffer = (char *) malloc(10000);
// Call the C function
int32_t success = usd_list_files_raw(path, buffer, len);
int32_t success = usd_list_files_raw(path, buffer, 10000);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you have a constexpr size_t buffer_size = 10000; to reuse between these two instances?

// Erase file_name from str, would be pos + 1 if we wanted to include '\n'
str.erase(0, position + 1);
}
files.emplace_back(file_name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it happens via an implicit conversion, it would be nice to have a comment here noting that this is the point where the string containing the file name is copied into its own std::string, since everything before this was just std::string_views over the pre-existing buffer.

// file_name used to store each file name
size_t position = 0;
std::string file_name;
size_t delimiter_pos, index = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C++ footgun alert!! index is initialized to 0 here, but delimiter_pos is left uninitialized (and therefore in an unspecified state). Admittedly, it doesn't matter here since the first use is an assignment, but I generally advise to avoid multiple declaration and always initialize primitive types to avoid even the potential of such mistakes causes issues, since those can be very hard to track down later.

// Add token to files vector
files.push_back(file_name);
// Erase file_name from str, would be pos + 1 if we wanted to include '\n'
str.erase(0, position + 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! Thanks

@@ -29,6 +29,15 @@ std::vector<std::string> list_files(const char* path) {
std::vector<std::string> files = {};
// malloc buffer to store file names
char *buffer = (char *) malloc(10000);
if (buffer == NULL) {
// try again smaller buffer to see if that works
buffer = (char *) malloc(500);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping a buffer_size variable is even more pertinent with this addition, wouldn't want to put 10000 entries into a buffer of size 500! It can't be constexpr anymore now though, so ignore that part of the above comment.

@djava
Copy link
Contributor

djava commented Nov 3, 2023

Looks a lot better, thanks for the changes! Most of what's left is small or stylistic.

@djava
Copy link
Contributor

djava commented Nov 3, 2023

Just a thought - right now there's no standard prefix to the error messages, so there's not really a clean way to programmatically check if the function errored. Perhaps they can all start with "ERROR - " or similar, so that one can at least write something like this to check for errors?

auto file_list = pros::usd::list_files("/");
if (file_list.front().starts_with("ERROR")) {
    // do error stuff
} else {
    // We know file_list is valid now
}

I'm not sure how you'd do this as is. errno isn't reliable, since it could have been set by a previous function call, and theres no return code from this function that tells us succinctly that we have to check errno, right?

Gosh, thank god for C++20 adding std::string::starts_with. Not sure what the clean way to do this otherwise would be.

@Gracelu128
Copy link
Contributor Author

Looks a lot better, thanks for the changes! Most of what's left is small or stylistic.

I just implemented your suggested stylistic changes in the new commit 490a3ca.

std::vector<std::string> files = {};
// malloc buffer to store file names
size_t buffer_size = 10000;
char *buffer = (char *) malloc(buffer_size);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should use the C++ new char[buffer_size];instead of C's malloc (and later deleteinstead of `free).

It's a bit less scary type-wise and very widely considered better stylistically for C++.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually even better would be std::unique_ptr<char[]> buffer = std::make_shared<char[]>(buffer_size)and then passing buffer.get() to the C function. Then the deletehappens automatically when you exit the function scope.

Altered to return the file path instead of file names. Ex. "/usd/path/file_name".
Copy link
Member

@WillXuCodes WillXuCodes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this review might be slightly outdated... please ignore any outdated comments I just forgot to hit submit 💀

EEXIST, EINVAL, EROFS, ENXIO, ENOBUFS, ENXIO, EIO, EACCES, // FR_LOCKED
ENOBUFS, ENFILE, EINVAL};

int32_t usd_list_files_raw(const char* path, char* buffer, int32_t len) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might be handled by vexFileDirectoryGet already, but can we add nullchecks for these parameters?

return usd_list_files_raw(path, buffer, len);
}

std::vector<std::string> list_files(const char* path, char* buffer, int32_t len) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use a std::string instead of a c_string pls.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::string_view actually

@@ -21,5 +21,80 @@ std::int32_t is_installed(void) {
return usd_is_installed();
}

std::int32_t list_files_raw(const char* path, char* buffer, int32_t len) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for our C++ API please use std::string and .c_str to call the internal C funcs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sure all of the std::strings are also references pls*

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::string_view

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::string_view

I understand that's more C++onic but for the sake of our users I think a std::string would be a little more approachable... saves everyone a google search.

Copy link
Contributor

@djava djava Nov 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😖 wasted allocation for every call with a string literal for path, which I assume is the typical case.

Copy link
Contributor

@djava djava Nov 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sure all of the std::strings are also references pls*

If you choose to use std::string instead of string_view do make sure its const refs, as mutable refs will cause other issues (won't accept temporaries i.e. string literals that convert to std::string)

std::string_view file_name;

// Loop until delimiter '\n' can not be found anymore
while ((delimiter_pos = str.find('\n', index)) != std::string::npos) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we really wanted to optimize this even more you could write a sentinel function (or whatever people call it I'm tired atm) that grabs all the delimter indexes, and then another loop that stores all the delimter positions in a data structure such as a vector, and then another loop that iterates through the data structure. This changes the O(N * M) behavior of this to O(N + M) where N is the size of the string and M is the number of delimeters.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure where you got O(N*M). If I'm reading correctly, you're seeing it as a full loop through all N characters each time we loop? That's not true, though, as index changes where the iteration starts each time (starting further along the string on each iteration), and each iteration stops as soon as it locates '\n'. Therefore, this is only O(N).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure where you got O(N*M). If I'm reading correctly, you're seeing it as a full loop through all N characters each time we loop? That's not true, though, as index changes where the iteration starts each time (starting further along the string on each iteration), and each iteration stops as soon as it locates '\n'. Therefore, this is only O(N).

ah you're right, I didn't notice the index parameter last night.

Comment on lines 48 to 54
std::string_view path_sv(path);
size_t found = path_sv.find("usd");
if (found == 0 || found == 1) {
// Deal with when user prepends path with usd
// as either "usd/..." or "/usd/..."
path_sv.remove_prefix(3);
}
Copy link
Contributor

@djava djava Nov 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does hardcoding the 3 here deal with both cases? Wouldn't it be 3 + found?

Also to avoid the magic-number-ness of this 3, I'd probably write this as:

constexpr std::string_view usd_prefix {"usd"};
std::string_view path_sv {path};
const size_t usd_prefix_idx = path_sv.find(usd_prefix);
if (usd_prefix_idx == 0 || usd_prefix_idx == 1) {
    path_sv.remove_prefix(usd_prefix.length() + usd_prefix_idx);
}

Copy link
Contributor

@djava djava Nov 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also what happens here if they don't include the usd prefix? We just make it work the same as if they did? May be weird, since we also always return paths prefixed with /usd/.

My preference here would be to return a "path not found" error, as just assuming they wanted an implicit /usd/ makes doing anything with a VFS outside of /usd/ in the future into a backwards compatibility break. It also improves consistency, since nothing else in PROS would accept that path.

}

// set path to path_sv.data()
path = path_sv.data();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

path_sv is already pointing to *path. It's a read-only "view" over a string that exists elsewhere in memory, hence the name.

@@ -76,7 +89,7 @@ std::vector<std::string> list_files(const char* path) {
file_name = std::string_view(str.data() + index, delimiter_pos - index);
// This is point where content of the std::string_view file name is copied to its
// own std::string and added to the files vector
files.emplace_back(file_name);
files.emplace_back("/usd" + std::string(path) + "/" + std::string(file_name));
Copy link
Contributor

@djava djava Nov 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't do it like this, this creates 7 different std::strings (each of which "may" allocate and pretty likely at least 2 will, wasting a lot of performance).
I don't believe the version of libstdc++ (the standard library) PROS uses supports std::format but if so you could write the very nice:

files.emplace_back(std::format("/{}/{}", path, file_name));

if not i would recommend something more like

const size_t path_size = 1 + usd_prefix.length() + path.length() + 1 + file_name.length(); // +1's for the slashes
std::string buf;
buf.reserve(path_size); // Reserve enough space in the internal array that we can snprintf right into it

// The %.*s format specifier allows passing the size of a format arg before the value of the arg, so it lets
// us use the string_view into `str` with the C-style snprintf function even though it does not have the
// null terminator that would usually be required to know when a string ends.
// We also add 1 to path_size for the `n` argument for the null terminator, the space for which is
// implicitly included by the std::string::reserve call
std::snprintf(buf.data(), path_size + 1, "/%s/%.*s", path, static_cast<int>(file_name.length()), file_name.data());

files.push_back(std::move(buf));

This retains the property of the old version that it only had to allocate the string once.

(EDIT: removed the + 1 in path_size calculation for null terminator, as I realized that std::string::reserve implicitly adds 1 for that by itself, and add it back in snprintf)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe the version of libstdc++ (the standard library) PROS uses supports std::format

Sadly it does not :(

Copy link
Contributor

@djava djava Nov 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Gracelu128 Ty but

  1. I would recommend keeping the comments (or rewriting them in your own words if you want). Always better to explain what you're doing, especially if it may be non-obvious why (and I think readers of this code may not immediately understand what the %.*s does, and why its needed. I know I wouldn't know what that format specifier means at first glance)
  2. Sorry, I actually made an edit to the code block after you looked at it I think:

(EDIT: removed the + 1 in path_size calculation for null terminator, as I realized that std::string::reserve implicitly adds 1 for that by itself, and add it back in snprintf)

You could also just subtract 1 from the param to the reserve call if you think thats preferable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants