diff --git a/_posts/2024-10-24-collecting-and-handling-result-values.html b/_posts/2024-10-24-collecting-and-handling-result-values.html new file mode 100644 index 00000000..5d82ee86 --- /dev/null +++ b/_posts/2024-10-24-collecting-and-handling-result-values.html @@ -0,0 +1,223 @@ +--- +layout: post +title: "Collecting and handling result values" +description: "The answer is traverse. It's always traverse." +date: 2024-10-24 9:25 UTC +tags: [Functional Programming] +--- +{% include JB/setup %} + +
+ {{ page.description }} +
++ I recently came across a Stack Overflow question about collecting and handling sum types (AKA discriminated unions or, in this case, result types). While the question was tagged functional-programming, the overall structure of the code was so imperative with so much interleaved I/O that it hardly qualified as functional architecture. +
++ Instead, I gave an answer which involved a minimal change to the code. Subsequently, the original poster asked to see a more functional version of the code. That's a bit too large a task for a Stack Overflow answer, I think, so I'll do it here on the blog instead. +
++ Further comments and discussion on the original post reveal that the poster is interested in two alternatives. I'll start with the alternative that's only discussed, but not shown in the question. The motivation for that is that this variation is easier to implement than the other one, and I consider it pedagogical to start with the simplest case. +
++ I'll do that in this article, and then follow up with another article that covers the short-circuiting case. +
++ To begin, consider this mostly imperative code snippet: +
++
var storedItems = new List<ShoppingListItem>(); +var failedItems = new List<ShoppingListItem>(); +var errors = new List<Error>(); +var state = (storedItems, failedItems, errors); +foreach (var item in itemsToUpdate) +{ + OneOf<ShoppingListItem, NotFound, Error> updateResult = await UpdateItem(item, dbContext); + state = updateResult.Match<(List<ShoppingListItem>, List<ShoppingListItem>, List<Error>)>( + storedItem => { storedItems.Add(storedItem); return state; }, + notFound => { failedItems.Add(item); return state; }, + error => { errors.Add(error); return state; } + ); +} + +await dbContext.SaveChangesAsync(); + +return Results.Ok(new BulkUpdateResult([.. storedItems], [.. failedItems], [.. errors]));+ +
+ There's quite a few things to take in, and one has to infer most of the types and APIs, since the original post didn't show more code than that. If you're used to engaging with Stack Overflow questions, however, it's not too hard to figure out what most of the moving parts do. +
+
+ The most non-obvious detail is that the code uses a library called OneOf, which supplies general-purpose, but rather abstract, sum types. Both the container type OneOf
, as well as the two indicator types NotFound
and Error
are defined in that library.
+
+ The Match
method implements standard Church encoding, which enables the code to pattern-match on the three alternative values that UpdateItem
returns.
+
+ One more detail also warrants an explicit description: The itemsToUpdate
object is an input argument of the type IEnumerable<ShoppingListItem>
.
+
+ The implementation makes use of mutable state and undisciplined I/O. How do you refactor it to a more functional design? +
++ I'll pretend that we only need to turn the above code snippet into a functional design. Thus, I'm ignoring that the code is most likely part of a larger code base. Because of the implied database interaction, the method isn't a pure function. Unless it's a top-level method (that is, at the boundary of the application), it doesn't exemplify larger-scale functional architecture. +
+
+ That said, my goal is to refactor the code to an Impureim Sandwich: Impure actions first, then the meat of the functionality as a pure function, and then some more impure actions to complete the functionality. This strongly suggests that the first step should be to map over itemsToUpdate
and call UpdateItem
for each.
+
+ If, however, you do that, you get this: +
++
IEnumerable<Task<OneOf<ShoppingListItem, NotFound, Error>>> results = + itemsToUpdate.Select(item => UpdateItem(item, dbContext));+ +
+ The results
object is a sequence of tasks. If we consider Task as a surrogate for IO, each task should be considered impure, as it's either non-deterministic, has side effects, or both. This means that we can't pass results
to a pure function, and that frustrates the ambition to structure the code as an Impureim Sandwich.
+
+ This is one of the most common problems in functional programming, and the answer is usually: Use a traversal. +
++
IEnumerable<OneOf<ShoppingListItem, NotFound<ShoppingListItem>, Error>> results = + await itemsToUpdate.Traverse(item => UpdateItem(item, dbContext));+ +
+ Because this first, impure layer of the sandwich awaits the task, results
is now an immutable value that can be passed to the pure step. This, by the way, assumes that ShoppingListItem
is immutable, too.
+
+ Notice that I adjusted one of the cases of the discriminated union to NotFound<ShoppingListItem>
rather than just NotFound
. While the OneOf library ships with a NotFound
type, it doesn't have a generic container of that name, so I defined it myself:
+
+
internal sealed record NotFound<T>(T Item);+ +
+ I added it to make the next step simpler. +
+
+ The next step is to sort the results
into three 'buckets', as it were.
+
+
// Pure +var seed = + ( + Enumerable.Empty<ShoppingListItem>(), + Enumerable.Empty<ShoppingListItem>(), + Enumerable.Empty<Error>() + ); +var result = results.Aggregate( + seed, + (state, result) => + result.Match( + storedItem => (state.Item1.Append(storedItem), state.Item2, state.Item3), + notFound => (state.Item1, state.Item2.Append(notFound.Item), state.Item3), + error => (state.Item1, state.Item2, state.Item3.Append(error))));+ +
+ It's also possible to inline the seed
value, but here I defined it in a separate expression in an attempt at making the code a little more readable. I don't know if I succeeded, because regardless of where it goes, it's hardly idiomatic to break tuple initialization over multiple lines. I had to, though, because otherwise the code would run too far to the right.
+
+ The lambda expression handles each result in results
and uses Match
to append the value to its proper 'bucket'. The result
is a tuple with the three collections.
+
+ The final, impure step in the sandwich is to save the changes and return the results: +
++
// Impure +await dbContext.SaveChangesAsync(); +return new OkResult( + new BulkUpdateResult([.. result.Item1], [.. result.Item2], [.. result.Item3]));+ +
+ To be honest, the last line of code is pure, but that's not unusual when it comes to Impureim Sandwiches. +
+
+ So far, I've assumed that the final BulkUpdateResult
class is just a simple immutable container without much functionality. If, however, we add some copy-and-update functions to it, we can use that to aggregate the result, instead of an anonymous tuple.
+
+
internal BulkUpdateResult Store(ShoppingListItem item) => + new([.. StoredItems, item], FailedItems, Errors); + +internal BulkUpdateResult Fail(ShoppingListItem item) => + new(StoredItems, [.. FailedItems, item], Errors); + +internal BulkUpdateResult Error(Error error) => + new(StoredItems, FailedItems, [.. Errors, error]);+ +
+ I would have personally preferred the name NotFound
instead of Fail
, but I was going with the original post's failedItems
terminology, and I thought that it made more sense to call a method Fail
when it adds to a collection called FailedItems
.
+
+ Adding these three instance methods to BulkUpdateResult
simplifies the composing code:
+
+
// Impure +IEnumerable<OneOf<ShoppingListItem, NotFound<ShoppingListItem>, Error>> results = + await itemsToUpdate.Traverse(item => UpdateItem(item, dbContext)); + +// Pure +var result = results.Aggregate( + new BulkUpdateResult([], [], []), + (state, result) => + result.Match( + storedItem => state.Store(storedItem), + notFound => state.Fail(notFound.Item), + error => state.Error(error))); + +// Impure +await dbContext.SaveChangesAsync(); +return new OkResult(result);+ +
+ This variation starts with an empty BulkUpdateResult
and then uses Store
, Fail
, or Error
as appropriate to update the state.
+
+ If the tasks you want to traverse are thread-safe, you might consider making it concurrent. You can use Task.WhenAll for that. It has the same type as Sequence
, so if you can live with the extra non-determinism that comes with parallel execution, you can use that instead:
+
+
internal static async Task<IEnumerable<T>> Sequence<T>(this IEnumerable<Task<T>> tasks) +{ + return await Task.WhenAll(tasks); +}+ +
+ Since the method signature doesn't change, the rest of the code remains unchanged. +
+
+ One of the most common stumbling blocks in functional programming is when you have a collection of values, and you need to perform an impure action (typically I/O) for each. This leaves you with a collection of impure values (Task
in C#, Task
or Async
in F#, IO
in Haskell, etc.). What you actually need is a single impure value that contains the collection of results.
+
+ The solution to this kind of problem is to traverse the collection, rather than mapping over it (with Select
, map
, fmap
, or similar). Note that computer scientists often talk about traversing a data structure like a tree. This is a less well-defined use of the word, and not directly related. That said, you can also write Traverse
and Sequence
functions for trees.
+
+ This article used a Stack Overflow question as the starting point for an example showing how to refactor imperative code to an Impureim Sandwich. +
++ This completes the first variation requested in the Stack Overflow question. +
++ Next: Short-circuiting an asynchronous traversal. +
+