Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extracted Logic from UserService into RepoUrlValidator #496

Closed
wants to merge 12 commits into from

Conversation

TheDavSmasher
Copy link

@TheDavSmasher TheDavSmasher commented Dec 13, 2024

Overview

Went over UserService and RepoUrlValidator to move methods around and do reformatting, removing unnecessary code blocks or methods. Methods in UserService that belonged in RepoUrlValidator were moved there.

Renamed GitHelper to GitGradingHelper and added a non-grading context specific GitHelper class

Other refactors include adding a Logger to RepoUrlValidator and changing all instances of clean to cleanRepoUrl

@TheDavSmasher TheDavSmasher linked an issue Dec 13, 2024 that may be closed by this pull request
@TheDavSmasher TheDavSmasher changed the title user-service-git-logic-extraction Extracted Logic from UserService into RepoUrlValidator Dec 13, 2024
@frozenfrank
Copy link
Contributor

What is the point of these changes? How is the changed version better than the original version?

Copy link
Contributor

@frozenfrank frozenfrank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to submitting changes, let's focus on making improvements and contributions to the overall design and/or functionality of the product. Currently, these changes represent no functional adjustments and only contain design regressions.

I would recommend closing this PR and trying again to address #463 by first understanding the key problem and then addressing just that issue.

} catch (Exception e) {
LOGGER.error("Error cloning repo during repoPatch: {}", e.getMessage());
throw new InternalServerException("There was an internal server error in verifying the Github Repo", e);
}
if (!valid) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this condition separated from the logic that produces it?

Additionally, if we're now switching to a cleaner version of RepoUrlValidator.isValidRepoUrl(), then we shouldn't need the generic catch (Exception) block anymore either.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because SRP. The idea was that UserService shouldn't do the logic, and Validator shouldn't handle usecase-specific logs and Exceptions.

A reason to keep catch (Exception) might be for the sake of better error messages, but I don't have strong feelings either way

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only kept the exception block because it was already there for what is essentially the same logic, but I can remove it if we think it wasn't necessary in the first place, since it's accomplishing the same

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because SRP. The idea was that UserService shouldn't do the logic, and Validator shouldn't handle usecase-specific logs and Exceptions.

I'm sorry @ThanGerlek for not writing very clearly. I think there has been a misunderstanding. The remaining catch (Exception) is not related to the Single Responsibility Principle. It was originally in the code here because the first edition was manually cloning the repo, and this Exception was likely installed to catch untyped RuntimeException which were present when this code was first introduced. Now that this code is being refactored, the now-unnecessary generic catch should be removed in favor of properly declaring typed exceptions.

As it stands, this generic exception is mostly non-functional since none of the code it calls could throw a non-typed exception anymore. This really represents an incomplete code refactor.

A reason to keep catch (Exception) might be for the sake of better error messages, but I don't have strong feelings either way

The code remaining after the refactor is not the right place to be cleaning up error messages. The error message should be closely related to the condition that causes it; therefore, the descriptive error message should be provided by the root function so that the same error message is provided in all the contexts where it arises.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused on if I should: a) remove the Catch block entirely, b) catch specific-typed exceptions instead, c) use the Logger at all with the exception, or d) something entirely I don't understand.
You've clarified why you want it but I'm not fully getting what to do

/**
* Cleans up and returns the provided GitHub Repo URL for consistent formatting.
*/
private static String requireCleanRepoUrl(String url) throws BadRequestException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function isn't "unnecessary" just because it's currently only used once. It encapsulates and adapts the function call to another class. Previously, it was called multiple times. Now that it's currently only called once, that doesn't make it redundant. It just makes it easier to extend and reintroduce in the future.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function was static and only created an instance to call an instance method only ever called within this method, making this function a wrapper method for what should just be a static method

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It encapsulates and adapts the function call to another class. (Emphasis added)

The function is doing more that merely creating an instance. It is specializing the call to RepoUrlValidator and converting the InvalidRepoUrlException which is generic to all the different contexts where RepoUrlValidator is used into the BadRequestException which is what this specific service needs.

The encapsulation and adaption is all handled in this function so that it can be easily reused within this class. Breaking the code into smaller functions also improves readability of the code overall. Notice that this function is marked as private which indicates its status as a helper function.

This is a proper and useful usage of the Adapter Design Pattern. Please read up on the design pattern and then restore the function to the way it was.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As many CS 240 students have tested, we allow students to inline all of their method calls and write duplicated and poor code to pass the class. That's just fine for them. As TA's working on the AutoGrader, we hold ourselves to higher standards. We have mostly taken additional courses in good software design, and we are building this tool following all the best practices we have learned by study, practice, and/or experience.

The contributors to the project have a done a good job producing high-quality code up to this point. Please try to learn from the principles demonstrated in the code rather than removing the very patterns and techniques intentionally installed to make it easier to maintain in the future.

src/main/java/edu/byu/cs/util/RepoUrlValidator.java Outdated Show resolved Hide resolved
clean(repoUrl);
return true;
} catch (InvalidRepoUrlException e) {
public static boolean isValidRepoUrl(String url) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renaming the static method from RepoUrlValidator.clean() to RepoUrlValidator.cleanRepoUrl() introduces redundancy and duplication in the naming. Additional justification should be provided to defend this as a good change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll push back only against the idea that "additional justification" is needed here.

It's just cause clean() and cleanRepoUrl() were merged, and that happened to be the one that was kept. If we think clean() is a bit more concise, then sure

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just cause clean() and cleanRepoUrl() were merged, and that happened to be the one that was kept.

The problem here is that the two functions named above were merged when they were never intended to be merged. They were intentionally written as separate methods, and a little more effort should be undertaken to understand why they were written separately before zealously merging them together.

Consider the following original call signatures:

  • static boolean isValid(String)
  • static String clean(String) throws InvalidRepoUrlException
  • String clean(String) throws InvalidRepoUrlException

Can you think of any reasons they would be written/exposed as separate call signatures? What additional reasons would you find if you searched for the usages of the functions in the app?

Here are several reasons for the individual and separate existence of the call signature named above:

  • String clean(String) throws ...: This is the non-static function that actually does all of the work. Since this class is intended to be used almost entirely via the static methods, but this method is not static, this is functionally equivalent to a protected function. Most clients of the class will never see or care about this function, but it is provided in the public API so that advanced users could access or override it if desired. This is the main entry point into the working logic.
    • Notice that, in order to provide sufficient context for all the different places this code is used, the base implementation throws errors when the string is invalid, and returns a string when it is valid.
    • This gives future developers maximum freedom to reimplement the guts of this function while still satisfying the original call signature. For example, the current implementation is hard-coded to only accept GitHub.com URLs. It is very feasible that in the future, users of this code would want to allow URLs from other websites (like cs.byu.edu or bitbucket.com etc...). Different parsing rules could be inserted, or the direct parsing could be entirely replaced with a more generic rule validator, all without changing the call signature.
    • Notice that none of the app currently uses this call signature directly at the present time.
  • static String clean(String) throws ...: This static method is a convenience wrapper over the internal function that makes it available via static import.
    • Why don't we just write the function in the static namespace? Because the implementation of the code may want to update state as instance variables of the class (as one example). Consistent with the discussion in Convert Tesing Utilities to Static Imports #495, we are not avoiding static imports entirely, but leveraging the best of both worlds. On the one hand, creating an instance variable allows us to have separate state for each instance along with other potential benefits; and on the other hand, exposing a public static method on the API makes it clean and simple for clients to use.
    • As a static method, it will be invoked with the full name of the class.
    • This is exposed as a Façade over the real implementation. As such, this function can consider the perspective and needs of the calling clients. Therefore, the name is written to reduce duplication and provide a clear behavior when called as RepoUrlValidator.clean(String).
  • static boolean isValid(String): This is provided as net another convenience function over the real implementation.
    • Notice that this function does not throw an error! Instead, this function adapts the error call and alway returns a boolean True/False value to simply indicate the existence of a valid Repo URL.
    • This is used in multiple places in the app to test if the url is valid without needing to see the actual cleaned URL. This operation is performed by the tests and at a few decision points in the algorithm.
    • This saves duplication from multiple clients where they would need to catch the error and convert it into a boolean value manually.
    • If RepoUrl checking were an expensive operation, this function could be implemented differently to provide a lightweight check for validity without actually going through the expensive operation. However, since this is not the case for us, the implementation simply delegates to the root delegation and then adapts the response.

src/main/java/edu/byu/cs/util/RepoUrlValidator.java Outdated Show resolved Hide resolved
src/test/java/edu/byu/cs/util/RepoUrlValidatorTest.java Outdated Show resolved Hide resolved
@19mdavenport
Copy link
Collaborator

@TheDavSmasher don't worry, James does this with everyone's PRs, hopefully you'll get more used to it eventually

Copy link
Contributor

@ThanGerlek ThanGerlek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved, with the understanding that the new GitHelper class will be fleshed out later

Copy link
Contributor

@frozenfrank frozenfrank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gratitude

Thank you for putting yourself out there and attempting to contribute to the AutoGrader. I recognize that this is your first PR, and I applaud you for taking the initiative to not merely observe the action, but also use your keyboard and present some changes. That effort is respectable.

Overview

These changes are a good first try, but they are not going in a direction that will result in positive contributions to the autograder. I have left extensive comments detailing the reasons some of the changes may be going in the wrong direction, incompletely applying refactors, or failing to account for important contextual information.

This isn't entirely your fault. The issue you have likely been referencing as a start (#463), contained only links to comments which referred to changes without explicitly stating exactly what was expected as a resolution. With more experience and time on the team to be aware of the scope and context of all the topics we discuss, you may have had a chance at understanding the root issue; however, this is really my fault for assuming so much context and failing to give a proper written description of the problem and solution.

Updated Description

The description for #463 (reference) has now been expanded with additional detail listing out exactly the expected changes.

Situation

Largely because I failed to provide clear written directions, these proposed changes miss the target of the intended contribution and change several things which should not presently be changed.

This is a great opportunity to exercise the powerful branching abilities supported by git. We can preserve the good portions of this PR (good job making small, individual commits which isolate changes), and then allow the rest of the changes to drift away. GitHub will still preserve them for us so we have a full history of the project, but we'll move forward without them so we can focus on the changes that contribute positively.

The following commit is good and can be used as-is in a new PR:

Recommendation

This is how I would recommend moving forward. I have broken down the recommended process into small, individually accomplishable steps, which should maximize your probability of success:

  1. Create a new branch (i.e. user-service-git-logic-extraction-2)
  2. Cherry-pick the above commit onto the new branch
  3. Revisit Extract git logic from UserService #463 with the expanded description.
  4. Devise a plan to resolve the issue. For this issue, it will not require changing very many files, and the files that do change will only need a few changes.
  5. Review the plan a prior contributor to the project
  6. Implement the plan
  7. Submit a new PR for review & work through any feedback (this should be minimal if you devised a plan with a prior contributor before hand).
  8. Clean up old resources:
    • Close this PR
    • Delete this branch
  9. Celebrate about successfully contributing your first PR to the AutoGrader!

Here are command-line prompts that will implement the steps above. At least, it implements as many steps as can be implemented with git on the command-line.

# Create a new branch based on `main`
git checkout -b user-service-git-logic-extraction-2 origin/main

# Cherry-pick the commit that we want to keep
git cherry-pick 730ff41b

# ... Do work here

# Push and submit new PR
git push

# ... Close the first PR

# Delete the old branch (GitHub still preserves it)
git push origin -d user-service-git-logic-extraction

Post Script

P.S. Many of my actual comments may come off as very direct, probably mean, and definitely emotionally charged. I apologize in advance for that. After going through all the details, and then summarizing the results at the end, I am able to give you a lot more credit that I was giving you in the thick of the review. You've done a good job getting to this point. Now, let's work together to help you properly address and resolve #463!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this entire file renamed?

This GitHelper class is not specific to grading information. It is intentionally left generic and capable of performing multiple tasks. In fact, it is currently used in multiple places which this rename misrepresents:

  • It is used by the commit analytics to generate reports of the commit patterns of students
  • It is used by the CommitVerification system to verify that students meet minimum thresholds
  • Now, it is also being used by the Repo Name changer to verify that a repo can be cloned

Only one of these usages are directly related to grading. Please stop changing things when their full context is not understood. Assume instead that the code was built for a particular reason and try to make the minimal possible changes to bring about an intended behavior.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going off of other TA suggestions but it can be easily reverted

} catch (Exception e) {
LOGGER.error("Error cloning repo during repoPatch: {}", e.getMessage());
throw new InternalServerException("There was an internal server error in verifying the Github Repo", e);
}
if (!valid) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because SRP. The idea was that UserService shouldn't do the logic, and Validator shouldn't handle usecase-specific logs and Exceptions.

I'm sorry @ThanGerlek for not writing very clearly. I think there has been a misunderstanding. The remaining catch (Exception) is not related to the Single Responsibility Principle. It was originally in the code here because the first edition was manually cloning the repo, and this Exception was likely installed to catch untyped RuntimeException which were present when this code was first introduced. Now that this code is being refactored, the now-unnecessary generic catch should be removed in favor of properly declaring typed exceptions.

As it stands, this generic exception is mostly non-functional since none of the code it calls could throw a non-typed exception anymore. This really represents an incomplete code refactor.

A reason to keep catch (Exception) might be for the sake of better error messages, but I don't have strong feelings either way

The code remaining after the refactor is not the right place to be cleaning up error messages. The error message should be closely related to the condition that causes it; therefore, the descriptive error message should be provided by the root function so that the same error message is provided in all the contexts where it arises.

/**
* Cleans up and returns the provided GitHub Repo URL for consistent formatting.
*/
private static String requireCleanRepoUrl(String url) throws BadRequestException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It encapsulates and adapts the function call to another class. (Emphasis added)

The function is doing more that merely creating an instance. It is specializing the call to RepoUrlValidator and converting the InvalidRepoUrlException which is generic to all the different contexts where RepoUrlValidator is used into the BadRequestException which is what this specific service needs.

The encapsulation and adaption is all handled in this function so that it can be easily reused within this class. Breaking the code into smaller functions also improves readability of the code overall. Notice that this function is marked as private which indicates its status as a helper function.

This is a proper and useful usage of the Adapter Design Pattern. Please read up on the design pattern and then restore the function to the way it was.

/**
* Cleans up and returns the provided GitHub Repo URL for consistent formatting.
*/
private static String requireCleanRepoUrl(String url) throws BadRequestException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As many CS 240 students have tested, we allow students to inline all of their method calls and write duplicated and poor code to pass the class. That's just fine for them. As TA's working on the AutoGrader, we hold ourselves to higher standards. We have mostly taken additional courses in good software design, and we are building this tool following all the best practices we have learned by study, practice, and/or experience.

The contributors to the project have a done a good job producing high-quality code up to this point. Please try to learn from the principles demonstrated in the code rather than removing the very patterns and techniques intentionally installed to make it easier to maintain in the future.

clean(repoUrl);
return true;
} catch (InvalidRepoUrlException e) {
public static boolean isValidRepoUrl(String url) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just cause clean() and cleanRepoUrl() were merged, and that happened to be the one that was kept.

The problem here is that the two functions named above were merged when they were never intended to be merged. They were intentionally written as separate methods, and a little more effort should be undertaken to understand why they were written separately before zealously merging them together.

Consider the following original call signatures:

  • static boolean isValid(String)
  • static String clean(String) throws InvalidRepoUrlException
  • String clean(String) throws InvalidRepoUrlException

Can you think of any reasons they would be written/exposed as separate call signatures? What additional reasons would you find if you searched for the usages of the functions in the app?

Here are several reasons for the individual and separate existence of the call signature named above:

  • String clean(String) throws ...: This is the non-static function that actually does all of the work. Since this class is intended to be used almost entirely via the static methods, but this method is not static, this is functionally equivalent to a protected function. Most clients of the class will never see or care about this function, but it is provided in the public API so that advanced users could access or override it if desired. This is the main entry point into the working logic.
    • Notice that, in order to provide sufficient context for all the different places this code is used, the base implementation throws errors when the string is invalid, and returns a string when it is valid.
    • This gives future developers maximum freedom to reimplement the guts of this function while still satisfying the original call signature. For example, the current implementation is hard-coded to only accept GitHub.com URLs. It is very feasible that in the future, users of this code would want to allow URLs from other websites (like cs.byu.edu or bitbucket.com etc...). Different parsing rules could be inserted, or the direct parsing could be entirely replaced with a more generic rule validator, all without changing the call signature.
    • Notice that none of the app currently uses this call signature directly at the present time.
  • static String clean(String) throws ...: This static method is a convenience wrapper over the internal function that makes it available via static import.
    • Why don't we just write the function in the static namespace? Because the implementation of the code may want to update state as instance variables of the class (as one example). Consistent with the discussion in Convert Tesing Utilities to Static Imports #495, we are not avoiding static imports entirely, but leveraging the best of both worlds. On the one hand, creating an instance variable allows us to have separate state for each instance along with other potential benefits; and on the other hand, exposing a public static method on the API makes it clean and simple for clients to use.
    • As a static method, it will be invoked with the full name of the class.
    • This is exposed as a Façade over the real implementation. As such, this function can consider the perspective and needs of the calling clients. Therefore, the name is written to reduce duplication and provide a clear behavior when called as RepoUrlValidator.clean(String).
  • static boolean isValid(String): This is provided as net another convenience function over the real implementation.
    • Notice that this function does not throw an error! Instead, this function adapts the error call and alway returns a boolean True/False value to simply indicate the existence of a valid Repo URL.
    • This is used in multiple places in the app to test if the url is valid without needing to see the actual cleaned URL. This operation is performed by the tests and at a few decision points in the algorithm.
    • This saves duplication from multiple clients where they would need to catch the error and convert it into a boolean value manually.
    • If RepoUrl checking were an expensive operation, this function could be implemented differently to provide a lightweight check for validity without actually going through the expensive operation. However, since this is not the case for us, the implementation simply delegates to the root delegation and then adapts the response.

src/main/java/edu/byu/cs/util/RepoUrlValidator.java Outdated Show resolved Hide resolved
src/test/java/edu/byu/cs/util/RepoUrlValidatorTest.java Outdated Show resolved Hide resolved
return true;
} catch (InvalidRepoUrlException e) {
return false;
GitHelper.fetchRepo(cloningDir, url);
Copy link
Contributor

@frozenfrank frozenfrank Dec 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding this new comment, this implementation is now particularly dangerous because RepoUrlValidator.isValid() and RepoUrlValidator.clean() are based on different conditions. This will cause problems, and if it's not already causing problems, it's because I didn't write tricky enough tests (that's my bad).

The isValid() function and the clean() function are intended to go together. They need to always work together or fail together. If they are evaluating different conditions (isValid() performs a check that the repo clones properly, but clean() skips that check and merely evaluates the string according to predefined string parsing rules, then the app will misbehave very quickly.

Allow me to present a few test cases that would break the system based on the current setup, let alone future changes:

Repo URL isValid() clean() Comment
https://github.com/frozenfrank/chess Baseline. Valid URL.
https://bitbucket.com/frozenfrank/chess Clones successfully, but does not match the current string rules.
https://github.com/not-a-student/chess Does not exist, but does match our string rules
https://github.com/19mdavenport/cs240chessStarter In the future, we hope to reject this URL because it is a fork of the base repo.

The original edition of the code has the problem as well, but this PR, if accepted, would make the problem even worse. Previously, a student could potentially submit the URL into the system based on one set of rules, and then have it fail during evaluation due to a different set of rules performing the work. With these changes, the difference in the code is not partially changed within the autograding system. Some calls would work and some would fail all within the same system which are assumed and intended to be consistent.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Furthermore, this change still does not address the original issue at hand: that code which directly implements a git clone is duplicated in the system and should be replaced with a call to the existing implementation located in GitHelper.java #463 (comment).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It already is using the now extracted method of GitHelper.
And, for the 4 cases that you provided, I'm confused on which ones are failing that shouldn't or passing when they should fail? Your examples follow what should already happen, no? Because the current chain of code calls clean and validifies it (updateRepoUrl calls clean and setRepoUrl, which calls isValidRepoUrl (although I could rename it to isValid))

src/main/java/edu/byu/cs/util/RepoUrlValidator.java Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Extract git logic from UserService
4 participants