-
Notifications
You must be signed in to change notification settings - Fork 77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
✨Add list_files function #615 #615
base: develop-pros-4
Are you sure you want to change the base?
Conversation
…tring of file names Will return a vector of std::strings of file names, will return a vector containing one std::string PROS_ERR if error occurs.
…c API as well to raw
Function to return vector containing specific error state errno is set to, if none of the error states will set to FILE I/O error
include/pros/misc.hpp
Outdated
* // Given vector of std::string file names, print each file name | ||
* // Note that if there is an error, the vector will contain one element, PROS_ERR | ||
* // Print each file name | ||
* for (std::string file : files) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for (std::string& file : files) {
That &
is very important for performance reasons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented this suggestion in commit on Oct 30, 3d41ca6
* // Print each file name | ||
* for (std::string file : files) { | ||
* std::cout << file << std::endl; | ||
* } | ||
* } | ||
* \endcode | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Surely the docs should explain the params as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/devices/vdml_usd.cpp
Outdated
return usd_list_files_raw(path, buffer, len); | ||
} | ||
|
||
std::vector<std::string> list_files(const char* path, char* buffer, int32_t len) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function should definitely not require users to pass in their own buffer
and len
. You should just allocate a buffer yourself at the top of this function body then free it after it's done being used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented this change in commit 4247fb1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, can I ask if the number 10000 came from somewhere or was it chosen arbitrarily?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Came from an estimate of the worst case that is realistic to happen. We needed to chose a number, and decided that 100 files in a single directory with an approximate average length of 100 (really 99 if you include the separators) will be enough for every feasible use case we could come up with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, fair enough. If you're choosing a number, a power of 2 (or sum of powers of 2) might be nice for memory allocator friendliness :)
(I have no idea what allocation strategy VexOS uses, this may be completely irrelevant)
src/devices/vdml_usd.cpp
Outdated
// Store char * buffer as std::string in str | ||
// Store delimiter in std::string, '\n' | ||
std::string str(buffer); | ||
std::string delimiter = "\n"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
char delimeter = '\n';
You definitely don't need a whole std::string
for just \n
, which is only a single char.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented this change in commit 3d41ca6
src/devices/vdml_usd.cpp
Outdated
// Parse buffer given call successful, split by '/n' | ||
// Store char * buffer as std::string in str | ||
// Store delimiter in std::string, '\n' | ||
std::string str(buffer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no reason to allocate a whole additional std::string
for this, use std::string_view
over the already-existing buffer instead. You can still use .find()
the same way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented this change in commit 3d41ca6.
// file_name is the string from the beginning of str to the first '\n', excluding '\n' | ||
file_name = str.substr(0, position); | ||
// Add token to files vector | ||
files.push_back(file_name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should move file_name into the vector instead of copying it. This will reuse the already allocated heap buffer instead of copying over a new one, resulting in each file name being copied once instead of twice or more.
files.emplace_back(std::move(file_name));
Alternatively, file_name
could be a std::string_view
that you emplace_back
. Since files::value_type
is std::string
, that will result in the file name being copied (for the first time). Actually I think I like this solution better than the move anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented this change in commit 3d41ca6. Made file_name a std::string_view.
src/devices/vdml_usd.cpp
Outdated
// Add token to files vector | ||
files.push_back(file_name); | ||
// Erase file_name from str, would be pos + 1 if we wanted to include '\n' | ||
str.erase(0, position + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't work if you use a std::string_view
for str (and it's very unnecessarily slow even as a std::string
). You should either keep track of the index to start at for the next iteration, or maybe str = str.substr(...)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented this change in commit 3d41ca6. Altered code to instead have index to keep track of start for next iteration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice! Thanks
// Check if call successful, if error return vector containing error state | ||
if (success == PROS_ERR) { | ||
// Check errno to see which error state occurred | ||
// push back error state to files vector as std::string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mmm I'm definitely not convinced that this error handling strategy is a good approach. Maybe an out-param for the vector is better so you have some other way to return errors? That's more inline with how pros usually does things anyway. (returning errors via return type)
Either file i/o error or path not found
@Gracelu128 it would be helpful if, when you make changes to address review comments I've made, you could respond to the review comment to let me know (maybe with the commit hash that you addressed it in as well). Would you be able to go through and do that here so it's easier to dig through this pile of commits? Thanks! |
src/devices/vdml_usd.cpp
Outdated
char *buffer = (char *) malloc(10000); | ||
// Call the C function | ||
int32_t success = usd_list_files_raw(path, buffer, len); | ||
int32_t success = usd_list_files_raw(path, buffer, 10000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you have a constexpr size_t buffer_size = 10000;
to reuse between these two instances?
src/devices/vdml_usd.cpp
Outdated
// Erase file_name from str, would be pos + 1 if we wanted to include '\n' | ||
str.erase(0, position + 1); | ||
} | ||
files.emplace_back(file_name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it happens via an implicit conversion, it would be nice to have a comment here noting that this is the point where the string containing the file name is copied into its own std::string
, since everything before this was just std::string_view
s over the pre-existing buffer
.
src/devices/vdml_usd.cpp
Outdated
// file_name used to store each file name | ||
size_t position = 0; | ||
std::string file_name; | ||
size_t delimiter_pos, index = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
C++ footgun alert!! index
is initialized to 0 here, but delimiter_pos
is left uninitialized (and therefore in an unspecified state). Admittedly, it doesn't matter here since the first use is an assignment, but I generally advise to avoid multiple declaration and always initialize primitive types to avoid even the potential of such mistakes causes issues, since those can be very hard to track down later.
src/devices/vdml_usd.cpp
Outdated
// Add token to files vector | ||
files.push_back(file_name); | ||
// Erase file_name from str, would be pos + 1 if we wanted to include '\n' | ||
str.erase(0, position + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice! Thanks
src/devices/vdml_usd.cpp
Outdated
@@ -29,6 +29,15 @@ std::vector<std::string> list_files(const char* path) { | |||
std::vector<std::string> files = {}; | |||
// malloc buffer to store file names | |||
char *buffer = (char *) malloc(10000); | |||
if (buffer == NULL) { | |||
// try again smaller buffer to see if that works | |||
buffer = (char *) malloc(500); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keeping a buffer_size
variable is even more pertinent with this addition, wouldn't want to put 10000 entries into a buffer of size 500! It can't be constexpr
anymore now though, so ignore that part of the above comment.
Looks a lot better, thanks for the changes! Most of what's left is small or stylistic. |
Just a thought - right now there's no standard prefix to the error messages, so there's not really a clean way to programmatically check if the function errored. Perhaps they can all start with "ERROR - " or similar, so that one can at least write something like this to check for errors? auto file_list = pros::usd::list_files("/");
if (file_list.front().starts_with("ERROR")) {
// do error stuff
} else {
// We know file_list is valid now
} I'm not sure how you'd do this as is. Gosh, thank god for C++20 adding |
I just implemented your suggested stylistic changes in the new commit 490a3ca. |
std::vector<std::string> files = {}; | ||
// malloc buffer to store file names | ||
size_t buffer_size = 10000; | ||
char *buffer = (char *) malloc(buffer_size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should use the C++ new char[buffer_size];
instead of C's malloc
(and later delete
instead of `free).
It's a bit less scary type-wise and very widely considered better stylistically for C++.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually even better would be std::unique_ptr<char[]> buffer = std::make_shared<char[]>(buffer_size)
and then passing buffer.get()
to the C function. Then the delete
happens automatically when you exit the function scope.
Altered to return the file path instead of file names. Ex. "/usd/path/file_name".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this review might be slightly outdated... please ignore any outdated comments I just forgot to hit submit 💀
EEXIST, EINVAL, EROFS, ENXIO, ENOBUFS, ENXIO, EIO, EACCES, // FR_LOCKED | ||
ENOBUFS, ENFILE, EINVAL}; | ||
|
||
int32_t usd_list_files_raw(const char* path, char* buffer, int32_t len) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this might be handled by vexFileDirectoryGet already, but can we add nullchecks for these parameters?
src/devices/vdml_usd.cpp
Outdated
return usd_list_files_raw(path, buffer, len); | ||
} | ||
|
||
std::vector<std::string> list_files(const char* path, char* buffer, int32_t len) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use a std::string instead of a c_string pls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::string_view actually
@@ -21,5 +21,80 @@ std::int32_t is_installed(void) { | |||
return usd_is_installed(); | |||
} | |||
|
|||
std::int32_t list_files_raw(const char* path, char* buffer, int32_t len) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for our C++ API please use std::string and .c_str to call the internal C funcs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make sure all of the std::strings are also references pls*
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::string_view
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::string_view
I understand that's more C++onic but for the sake of our users I think a std::string would be a little more approachable... saves everyone a google search.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😖 wasted allocation for every call with a string literal for path, which I assume is the typical case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make sure all of the std::strings are also references pls*
If you choose to use std::string
instead of string_view
do make sure its const
refs, as mutable refs will cause other issues (won't accept temporaries i.e. string literals that convert to std::string
)
std::string_view file_name; | ||
|
||
// Loop until delimiter '\n' can not be found anymore | ||
while ((delimiter_pos = str.find('\n', index)) != std::string::npos) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we really wanted to optimize this even more you could write a sentinel function (or whatever people call it I'm tired atm) that grabs all the delimter indexes, and then another loop that stores all the delimter positions in a data structure such as a vector, and then another loop that iterates through the data structure. This changes the O(N * M) behavior of this to O(N + M) where N is the size of the string and M is the number of delimeters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure where you got O(N*M)
. If I'm reading correctly, you're seeing it as a full loop through all N characters each time we loop? That's not true, though, as index
changes where the iteration starts each time (starting further along the string on each iteration), and each iteration stops as soon as it locates '\n'
. Therefore, this is only O(N).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure where you got
O(N*M)
. If I'm reading correctly, you're seeing it as a full loop through all N characters each time we loop? That's not true, though, asindex
changes where the iteration starts each time (starting further along the string on each iteration), and each iteration stops as soon as it locates'\n'
. Therefore, this is only O(N).
ah you're right, I didn't notice the index parameter last night.
src/devices/vdml_usd.cpp
Outdated
std::string_view path_sv(path); | ||
size_t found = path_sv.find("usd"); | ||
if (found == 0 || found == 1) { | ||
// Deal with when user prepends path with usd | ||
// as either "usd/..." or "/usd/..." | ||
path_sv.remove_prefix(3); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does hardcoding the 3
here deal with both cases? Wouldn't it be 3 + found
?
Also to avoid the magic-number-ness of this 3, I'd probably write this as:
constexpr std::string_view usd_prefix {"usd"};
std::string_view path_sv {path};
const size_t usd_prefix_idx = path_sv.find(usd_prefix);
if (usd_prefix_idx == 0 || usd_prefix_idx == 1) {
path_sv.remove_prefix(usd_prefix.length() + usd_prefix_idx);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also what happens here if they don't include the usd prefix? We just make it work the same as if they did? May be weird, since we also always return paths prefixed with /usd/
.
My preference here would be to return a "path not found" error, as just assuming they wanted an implicit /usd/
makes doing anything with a VFS outside of /usd/
in the future into a backwards compatibility break. It also improves consistency, since nothing else in PROS would accept that path.
} | ||
|
||
// set path to path_sv.data() | ||
path = path_sv.data(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
path_sv
is already pointing to *path
. It's a read-only "view" over a string that exists elsewhere in memory, hence the name.
src/devices/vdml_usd.cpp
Outdated
@@ -76,7 +89,7 @@ std::vector<std::string> list_files(const char* path) { | |||
file_name = std::string_view(str.data() + index, delimiter_pos - index); | |||
// This is point where content of the std::string_view file name is copied to its | |||
// own std::string and added to the files vector | |||
files.emplace_back(file_name); | |||
files.emplace_back("/usd" + std::string(path) + "/" + std::string(file_name)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can't do it like this, this creates 7 different std::string
s (each of which "may" allocate and pretty likely at least 2 will, wasting a lot of performance).
I don't believe the version of libstdc++ (the standard library) PROS uses supports std::format
but if so you could write the very nice:
files.emplace_back(std::format("/{}/{}", path, file_name));
if not i would recommend something more like
const size_t path_size = 1 + usd_prefix.length() + path.length() + 1 + file_name.length(); // +1's for the slashes
std::string buf;
buf.reserve(path_size); // Reserve enough space in the internal array that we can snprintf right into it
// The %.*s format specifier allows passing the size of a format arg before the value of the arg, so it lets
// us use the string_view into `str` with the C-style snprintf function even though it does not have the
// null terminator that would usually be required to know when a string ends.
// We also add 1 to path_size for the `n` argument for the null terminator, the space for which is
// implicitly included by the std::string::reserve call
std::snprintf(buf.data(), path_size + 1, "/%s/%.*s", path, static_cast<int>(file_name.length()), file_name.data());
files.push_back(std::move(buf));
This retains the property of the old version that it only had to allocate the string once.
(EDIT: removed the + 1
in path_size calculation for null terminator, as I realized that std::string::reserve
implicitly adds 1 for that by itself, and add it back in snprintf
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe the version of libstdc++ (the standard library) PROS uses supports std::format
Sadly it does not :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Gracelu128 Ty but
- I would recommend keeping the comments (or rewriting them in your own words if you want). Always better to explain what you're doing, especially if it may be non-obvious why (and I think readers of this code may not immediately understand what the
%.*s
does, and why its needed. I know I wouldn't know what that format specifier means at first glance) - Sorry, I actually made an edit to the code block after you looked at it I think:
(EDIT: removed the
+ 1
inpath_size
calculation for null terminator, as I realized thatstd::string::reserve
implicitly adds 1 for that by itself, and add it back insnprintf
)
You could also just subtract 1 from the param to the reserve
call if you think thats preferable.
Summary:
Adds functionality to list the files on a sd card.
Motivation:
Made wrapper function for the CPP api that returns a vector of strings of file names. Renamed original function as list_files_raw in both CPP api and C api.
Test Plan: