diff --git a/.nojekyll b/.nojekyll
index d5204c8..ce80980 100644
--- a/.nojekyll
+++ b/.nojekyll
@@ -1 +1 @@
-8a415019
\ No newline at end of file
+4edfc353
\ No newline at end of file
diff --git a/Building-reproducible-analytical-pipelines-with-R.epub b/Building-reproducible-analytical-pipelines-with-R.epub
index 59a5f36..a22c433 100644
Binary files a/Building-reproducible-analytical-pipelines-with-R.epub and b/Building-reproducible-analytical-pipelines-with-R.epub differ
diff --git a/Building-reproducible-analytical-pipelines-with-R.pdf b/Building-reproducible-analytical-pipelines-with-R.pdf
index f949c55..1fb8258 100644
Binary files a/Building-reproducible-analytical-pipelines-with-R.pdf and b/Building-reproducible-analytical-pipelines-with-R.pdf differ
diff --git a/fprog.html b/fprog.html
index 42ef3cd..f039ac3 100644
--- a/fprog.html
+++ b/fprog.html
@@ -815,8 +815,8 @@
chronicler::read_log(result)
[1] "Complete log:"
-[2] "NOK! sqrt() ran unsuccessfully with following exception: NaNs produced at 2024-04-11 13:04:00"
-[3] "Total running time: 0.000860929489135742 secs"
+[2] "NOK! sqrt() ran unsuccessfully with following exception: NaNs produced at 2024-07-14 15:23:16"
+[3] "Total running time: 0.000793218612670898 secs"
The {purrr} package also comes with function factories that you might find useful ({possibly}, {safely} and {quietly}).
@@ -1679,7 +1679,7 @@
function (x, ...)
UseMethod("print")
-<bytecode: 0x55f662fb0a40>
+<bytecode: 0x558c98972ff0>
<environment: namespace:base>
@@ -1723,7 +1723,7 @@
diff --git a/search.json b/search.json
index 47701b4..b9300f6 100644
--- a/search.json
+++ b/search.json
@@ -214,7 +214,7 @@
"href": "fprog.html#writing-good-functions",
"title": "6 Functional programming",
"section": "6.2 Writing good functions",
- "text": "6.2 Writing good functions\n\n6.2.1 Functions are first-class objects\nIn a functional programming language, functions are first-class objects. Contrary to what the name implies, this means that functions, especially the ones you define yourself, are nothing special. A function is an object like any other, and can thus be manipulated as such. Think of anything that you can do with any object in R, and you can do the same thing with a function. For example, let’s consider the +() function. It takes two numeric objects and returns their sum:\n\n1 + 5.3\n\n[1] 6.3\n\n# or alternatively: `+`(1, 5.3)\n\nYou can replace the numbers with functions that return numbers:\n\nsqrt(1) + log(5.3)\n\n[1] 2.667707\n\n\nIt’s also possible to define a function that explicitly takes another function as an input:\n\nh <- function(number, f){\n f(number)\n}\n\nYou can call then use h() as a wrapper for f():\n\nh(4, sqrt)\n\n[1] 2\n\nh(10, log10)\n\n[1] 1\n\n\nBecause h() takes another function as an argument, h() is called a higher-order function.\nIf you don’t know how many arguments f(), the function you’re wrapping, has, you can use the ...:\n\nh <- function(number, f, ...){\n f(number, ...)\n}\n\n... are simply a place-holder for any potential additional argument that f() might have:\n\nh(c(1, 2, NA, 3), mean, na.rm = TRUE)\n\n[1] 2\n\nh(c(1, 2, NA, 3), mean, na.rm = FALSE)\n\n[1] NA\n\n\nna.rm is an argument of mean(). As the developer of h(), I don’t necessarily know what f() might be, but even if I knew what f() would be and knew all its arguments, I might not want to list them all. So I can use ... instead. 
The following is also possible:\n\nw <- function(...){\n paste0(\"First argument: \", ..1,\n \", second argument: \", ..2,\n \", last argument: \", ..3)\n}\n\nw(1, 2, 3)\n\n[1] \"First argument: 1, second argument: 2, last argument: 3\"\n\n\nIf you want to learn more about ..., type ?dots in an R console.\nBecause functions are nothing special, you can also write functions that return functions. As an illustration, we’ll be writing a function that converts warnings to errors. This can be quite useful if you want your functions to fail early, which often makes debugging easier. For example, try running this:\n\nsqrt(-5)\n\nWarning in sqrt(-5): NaNs produced\n\n\n[1] NaN\n\n\nThis only raises a warning and returns NaN (Not a Number). This can be quite dangerous, especially when working non-interactively, which is what we will be doing a lot later on. It is much better if a pipeline fails early due to an error, than dragging a NaN value. This also happens with log10():\n\nlog10(-10)\n\nWarning: NaNs produced\n\n\n[1] NaN\n\n\nSo it could be useful to redefine these functions to raise an error instead, for example like this:\n\nstrict_sqrt <- function(x){\n\n if(x < 0) stop(\"x is negative\")\n\n sqrt(x)\n\n}\n\nThis function now throws an error for negative x:\n\nstrict_sqrt(-10)\n\nError in strict_sqrt(-10) : x is negative\nHowever, it can be quite tedious to redefine every function that we need in our pipeline, and remember, we don’t want to repeat ourselves. So, because functions are nothing special, we can define a function that takes a function as an argument, converts any warning thrown by that function into an error, and returns a new function. 
For example:\n\nstrictly <- function(f){\n function(...){\n tryCatch({\n f(...)\n },\n warning = function(warning)stop(\"Can't do that chief\"))\n }\n}\n\nThis function makes use of tryCatch() which catches warnings raised by an expression (in this example the expression is f(...)) and then raises an error instead with the stop() function. It is now possible to define new functions like this:\n\ns_sqrt <- strictly(sqrt)\n\n\ns_sqrt(-4)\n\nError in value[[3L]](cond) : Can't do that chief\n\ns_log <- strictly(log)\n\n\ns_log(-4)\n\nError in value[[3L]](cond) : Can't do that chief\nFunctions that return functions are called function factories and they’re incredibly useful. I use this so much that I’ve written a package, available on CRAN, called {chronicler}, that does this:\n\ns_sqrt <- chronicler::record(sqrt)\n\n\nresult <- s_sqrt(-4)\n\nresult\n\nNOK! Value computed unsuccessfully:\n---------------\nNothing\n\n---------------\nThis is an object of type `chronicle`.\nRetrieve the value of this object with pick(.c, \"value\").\nTo read the log of this object, call read_log(.c).\n\n\nBecause the expression above resulted in an error, Nothing is returned. Nothing is a special value defined in the {maybe} package (check it out, a very interesting package!). We can then even read a log to see what went wrong:\n\nchronicler::read_log(result)\n\n[1] \"Complete log:\" \n[2] \"NOK! sqrt() ran unsuccessfully with following exception: NaNs produced at 2024-04-11 13:04:00\"\n[3] \"Total running time: 0.000860929489135742 secs\" \n\n\nThe {purrr} package also comes with function factories that you might find useful ({possibly}, {safely} and {quietly}).\nIn part 2 we will also learn about assertive programming, another way of making our functions safer, as an alternative to using function factories.\n\n\n6.2.2 Optional arguments\nIt is possible to make functions’ arguments optional, by using NULL. 
For example:\n\ng <- function(x, y = NULL){\n if(is.null(y)){\n print(\"optional argument y is NULL\")\n x\n } else {\n if(y == 5) print(\"y is present\"); x+y\n }\n}\n\nCalling g(10) prints the message “Optional argument y is NULL”, and returns 10. Calling g(10, 5) however, prints “y is present” and returns 15. It is also possible to use missing():\n\ng <- function(x, y){\n if(missing(y)){\n print(\"optional argument y is missing\")\n x\n } else {\n if(y == 5) print(\"y is present\"); x+y\n }\n}\n\nI however prefer the first approach, because it is clearer which arguments are optional, which is not the case with the second approach, where you need to read the body of the function.\n\n\n6.2.3 Safe functions\nIt is important that your functions are safe and predictable. You should avoid writing functions that behave like the nchar() base function. Let’s see why this function is not safe:\n\nnchar(\"10000000\")\n\n[1] 8\n\n\nIt returns the expected result of 8. But what if I remove the quotes?\n\nnchar(10000000)\n\n[1] 5\n\n\nWhat is going on here? I’ll give you a hint: simply type 10000000 in the console:\n\n10000000\n\n[1] 1e+07\n\n\n10000000 gets represented as 1e+07 by R. This number in scientific notation gets then converted into the character “1e+07” by nchar(), and this conversion happens silently. nchar() then counts the number of characters, and correctly returns 5. The problem is that it doesn’t make sense to provide a number to a function that expects a character. This function should have returned an error message, or at the very least raised a warning that the number got converted into a character. 
Here is how you could rewrite nchar() to make it safer:\n\nnchar2 <- function(x, result = 0){\n\n if(!isTRUE(is.character(x))){\n stop(paste0(\"x should be of type 'character', but is of type '\",\n typeof(x), \"' instead.\"))\n } else if(x == \"\"){\n result\n } else {\n result <- result + 1\n split_x <- strsplit(x, split = \"\")[[1]]\n nchar2(paste0(split_x[-1],\n collapse = \"\"), result)\n }\n}\n\nThis function now returns an error message if the input is not a character:\n\nnchar2(10000000)\n\nError in nchar2(10000000) : x should be of type 'character', but is of type 'integer' instead.\nThis section is in a sense an introduction to assertive programming. As mentioned in the section on function factories, we will be learning about assertive programming in greater detail in part 2 of the book.\n\n\n6.2.4 Recursive functions\nYou may have noticed in the last lines of nchar2() (defined above) that nchar2() calls itself. A function that calls itself in its own body is called a recursive function. It is sometimes easier to define a function in its recursive form than in an iterative form. The most common example is the factorial function. However, there is an issue with recursive functions (in the R programming language, other programming languages may not have the same problem, like Haskell): while it is sometimes easier to write a function using a recursive algorithm than an iterative algorithm, like for the factorial function, recursive functions in R are quite slow. 
Let’s take a look at two definitions of the factorial function, one recursive, the other iterative:\n\nfact_iter <- function(n){\n result = 1\n for(i in 1:n){\n result = result * i\n }\n result\n}\n\nfact_recur <- function(n){\n if(n == 0 || n == 1){\n result = 1\n } else {\n n * fact_recur(n-1)\n }\n}\n\nUsing the {microbenchmark} package we can benchmark the code:\n\nmicrobenchmark::microbenchmark(\n fact_recur(50),\n fact_iter(50)\n)\n\nUnit: microseconds\n expr min lq mean median uq max neval\n fact_recur(50) 21.501 21.701 23.82701 21.901 22.0515 68.902 100\n fact_iter(50) 2.000 2.101 2.74599 2.201 2.3510 21.000 100\nWe see that the recursive factorial function is 10 times slower than the iterative version. In this particular example it doesn’t make much of a difference, because the functions only take microseconds to run. But if you’re working with more complex functions, this is a problem. If you want to keep using the recursive function and not switch to an iterative algorithm, there are ways to make them faster. The first is called trampolining. I won’t go into details, but if you’re interested, there is an R package that allows you to use trampolining with R, aptly called {trampoline}1. Another solution is using the {memoise}2 package. Again, I won’t go into details. So if you want to use and optimize recursive functions, take a look at these packages.\n\n\n6.2.5 Anonymous functions\nIt is possible to define a function and not give it a name. For example:\n\nfunction(x)(x+1)(10)\n\nSince R version 4.1, there is even a shorthand notation for anonymous functions:\n\n(\\(x)(x+1))(10)\n\nBecause we don’t name them, we cannot reuse them. So why is this useful? Anonymous functions are useful when you need to apply a function somewhere inside a pipe once, and don’t want to define a function just for this. 
This will become clearer once we learn about lists, but before that, let’s philosophize a bit.\n\n\n6.2.6 The Unix philosophy applied to R\n\nThis is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.\n\nDoug McIlroy, in A Quarter Century of Unix3\nWe can take inspiration from the Unix philosophy and rewrite it for our purposes:\nWrite functions that do one thing and do it well. Write functions that work together. Write functions that handle lists, because that is a universal interface.\nStrive for writing simple functions that only perform one task. Don’t hesitate to split a big function into smaller ones. Small functions that only perform one task are easier to maintain, test, document and debug. These smaller functions can then be chained using the |> operator. In other words, it is preferable to have something like:\na |> f() |> g() |> h()\nwhere a is for example a path to a data set, and where f(), g() and h() successively read, clean, and plot the data, than having something like:\nbig_function(a)\nthat does all the steps above in one go.\nThis idea of splitting the problem into smaller chunks, each chunk in turn split into even smaller units that can be handled by functions and then the results of these function combined into a final output is called composition.\nThe advantage of splitting big_function() into f(), g() and h() is that you can eat the elephant one bite at a time, and also reuse these smaller functions in other projects more easily. So what’s important is that you can make small functions work together by sharing a common interface. The list is usually a good candidate for this."
+ "text": "6.2 Writing good functions\n\n6.2.1 Functions are first-class objects\nIn a functional programming language, functions are first-class objects. Contrary to what the name implies, this means that functions, especially the ones you define yourself, are nothing special. A function is an object like any other, and can thus be manipulated as such. Think of anything that you can do with any object in R, and you can do the same thing with a function. For example, let’s consider the +() function. It takes two numeric objects and returns their sum:\n\n1 + 5.3\n\n[1] 6.3\n\n# or alternatively: `+`(1, 5.3)\n\nYou can replace the numbers with functions that return numbers:\n\nsqrt(1) + log(5.3)\n\n[1] 2.667707\n\n\nIt’s also possible to define a function that explicitly takes another function as an input:\n\nh <- function(number, f){\n f(number)\n}\n\nYou can then use h() as a wrapper for f():\n\nh(4, sqrt)\n\n[1] 2\n\nh(10, log10)\n\n[1] 1\n\n\nBecause h() takes another function as an argument, h() is called a higher-order function.\nIf you don’t know how many arguments f(), the function you’re wrapping, has, you can use the ...:\n\nh <- function(number, f, ...){\n f(number, ...)\n}\n\n... are simply a place-holder for any potential additional argument that f() might have:\n\nh(c(1, 2, NA, 3), mean, na.rm = TRUE)\n\n[1] 2\n\nh(c(1, 2, NA, 3), mean, na.rm = FALSE)\n\n[1] NA\n\n\nna.rm is an argument of mean(). As the developer of h(), I don’t necessarily know what f() might be, but even if I knew what f() would be and knew all its arguments, I might not want to list them all. So I can use ... instead. 
The following is also possible:\n\nw <- function(...){\n paste0(\"First argument: \", ..1,\n \", second argument: \", ..2,\n \", last argument: \", ..3)\n}\n\nw(1, 2, 3)\n\n[1] \"First argument: 1, second argument: 2, last argument: 3\"\n\n\nIf you want to learn more about ..., type ?dots in an R console.\nBecause functions are nothing special, you can also write functions that return functions. As an illustration, we’ll be writing a function that converts warnings to errors. This can be quite useful if you want your functions to fail early, which often makes debugging easier. For example, try running this:\n\nsqrt(-5)\n\nWarning in sqrt(-5): NaNs produced\n\n\n[1] NaN\n\n\nThis only raises a warning and returns NaN (Not a Number). This can be quite dangerous, especially when working non-interactively, which is what we will be doing a lot later on. It is much better if a pipeline fails early due to an error, than dragging a NaN value. This also happens with log10():\n\nlog10(-10)\n\nWarning: NaNs produced\n\n\n[1] NaN\n\n\nSo it could be useful to redefine these functions to raise an error instead, for example like this:\n\nstrict_sqrt <- function(x){\n\n if(x < 0) stop(\"x is negative\")\n\n sqrt(x)\n\n}\n\nThis function now throws an error for negative x:\n\nstrict_sqrt(-10)\n\nError in strict_sqrt(-10) : x is negative\nHowever, it can be quite tedious to redefine every function that we need in our pipeline, and remember, we don’t want to repeat ourselves. So, because functions are nothing special, we can define a function that takes a function as an argument, converts any warning thrown by that function into an error, and returns a new function. 
For example:\n\nstrictly <- function(f){\n function(...){\n tryCatch({\n f(...)\n },\n warning = function(warning)stop(\"Can't do that chief\"))\n }\n}\n\nThis function makes use of tryCatch() which catches warnings raised by an expression (in this example the expression is f(...)) and then raises an error instead with the stop() function. It is now possible to define new functions like this:\n\ns_sqrt <- strictly(sqrt)\n\n\ns_sqrt(-4)\n\nError in value[[3L]](cond) : Can't do that chief\n\ns_log <- strictly(log)\n\n\ns_log(-4)\n\nError in value[[3L]](cond) : Can't do that chief\nFunctions that return functions are called function factories and they’re incredibly useful. I use this so much that I’ve written a package, available on CRAN, called {chronicler}, that does this:\n\ns_sqrt <- chronicler::record(sqrt)\n\n\nresult <- s_sqrt(-4)\n\nresult\n\nNOK! Value computed unsuccessfully:\n---------------\nNothing\n\n---------------\nThis is an object of type `chronicle`.\nRetrieve the value of this object with pick(.c, \"value\").\nTo read the log of this object, call read_log(.c).\n\n\nBecause the expression above resulted in an error, Nothing is returned. Nothing is a special value defined in the {maybe} package (check it out, a very interesting package!). We can then even read a log to see what went wrong:\n\nchronicler::read_log(result)\n\n[1] \"Complete log:\" \n[2] \"NOK! sqrt() ran unsuccessfully with following exception: NaNs produced at 2024-07-14 15:23:16\"\n[3] \"Total running time: 0.000793218612670898 secs\" \n\n\nThe {purrr} package also comes with function factories that you might find useful ({possibly}, {safely} and {quietly}).\nIn part 2 we will also learn about assertive programming, another way of making our functions safer, as an alternative to using function factories.\n\n\n6.2.2 Optional arguments\nIt is possible to make functions’ arguments optional, by using NULL. 
For example:\n\ng <- function(x, y = NULL){\n if(is.null(y)){\n print(\"optional argument y is NULL\")\n x\n } else {\n if(y == 5) print(\"y is present\"); x+y\n }\n}\n\nCalling g(10) prints the message “optional argument y is NULL”, and returns 10. Calling g(10, 5), however, prints “y is present” and returns 15. It is also possible to use missing():\n\ng <- function(x, y){\n if(missing(y)){\n print(\"optional argument y is missing\")\n x\n } else {\n if(y == 5) print(\"y is present\"); x+y\n }\n}\n\nHowever, I prefer the first approach, because it is clearer which arguments are optional, which is not the case with the second approach, where you need to read the body of the function.\n\n\n6.2.3 Safe functions\nIt is important that your functions are safe and predictable. You should avoid writing functions that behave like the nchar() base function. Let’s see why this function is not safe:\n\nnchar(\"10000000\")\n\n[1] 8\n\n\nIt returns the expected result of 8. But what if I remove the quotes?\n\nnchar(10000000)\n\n[1] 5\n\n\nWhat is going on here? I’ll give you a hint: simply type 10000000 in the console:\n\n10000000\n\n[1] 1e+07\n\n\n10000000 gets represented as 1e+07 by R. This number in scientific notation then gets converted into the character “1e+07” by nchar(), and this conversion happens silently. nchar() then counts the number of characters, and correctly returns 5. The problem is that it doesn’t make sense to provide a number to a function that expects a character. This function should have returned an error message, or at the very least raised a warning that the number got converted into a character. 
Here is how you could rewrite nchar() to make it safer:\n\nnchar2 <- function(x, result = 0){\n\n if(!isTRUE(is.character(x))){\n stop(paste0(\"x should be of type 'character', but is of type '\",\n typeof(x), \"' instead.\"))\n } else if(x == \"\"){\n result\n } else {\n result <- result + 1\n split_x <- strsplit(x, split = \"\")[[1]]\n nchar2(paste0(split_x[-1],\n collapse = \"\"), result)\n }\n}\n\nThis function now returns an error message if the input is not a character:\n\nnchar2(10000000)\n\nError in nchar2(10000000) : x should be of type 'character', but is of type 'double' instead.\nThis section is in a sense an introduction to assertive programming. As mentioned in the section on function factories, we will be learning about assertive programming in greater detail in part 2 of the book.\n\n\n6.2.4 Recursive functions\nYou may have noticed in the last lines of nchar2() (defined above) that nchar2() calls itself. A function that calls itself in its own body is called a recursive function. It is sometimes easier to define a function in its recursive form than in an iterative form. The most common example is the factorial function. However, there is an issue with recursive functions (in the R programming language, other programming languages may not have the same problem, like Haskell): while it is sometimes easier to write a function using a recursive algorithm than an iterative algorithm, like for the factorial function, recursive functions in R are quite slow. 
Let’s take a look at two definitions of the factorial function, one recursive, the other iterative:\n\nfact_iter <- function(n){\n result = 1\n for(i in 1:n){\n result = result * i\n }\n result\n}\n\nfact_recur <- function(n){\n if(n == 0 || n == 1){\n result = 1\n } else {\n n * fact_recur(n-1)\n }\n}\n\nUsing the {microbenchmark} package we can benchmark the code:\n\nmicrobenchmark::microbenchmark(\n fact_recur(50),\n fact_iter(50)\n)\n\nUnit: microseconds\n expr min lq mean median uq max neval\n fact_recur(50) 21.501 21.701 23.82701 21.901 22.0515 68.902 100\n fact_iter(50) 2.000 2.101 2.74599 2.201 2.3510 21.000 100\nWe see that the recursive factorial function is about 10 times slower than the iterative version. In this particular example it doesn’t make much of a difference, because the functions only take microseconds to run. But if you’re working with more complex functions, this is a problem. If you want to keep using the recursive function and not switch to an iterative algorithm, there are ways to make them faster. The first is called trampolining. I won’t go into details, but if you’re interested, there is an R package that allows you to use trampolining with R, aptly called {trampoline}1. Another solution is using the {memoise}2 package. Again, I won’t go into details. So if you want to use and optimize recursive functions, take a look at these packages.\n\n\n6.2.5 Anonymous functions\nIt is possible to define a function and not give it a name. For example:\n\n(function(x)(x+1))(10)\n\nSince R version 4.1, there is even a shorthand notation for anonymous functions:\n\n(\\(x)(x+1))(10)\n\nBecause we don’t name them, we cannot reuse them. So why is this useful? Anonymous functions are useful when you need to apply a function somewhere inside a pipe once, and don’t want to define a function just for this. 
This will become clearer once we learn about lists, but before that, let’s philosophize a bit.\n\n\n6.2.6 The Unix philosophy applied to R\n\nThis is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.\n\nDoug McIlroy, in A Quarter Century of Unix3\nWe can take inspiration from the Unix philosophy and rewrite it for our purposes:\nWrite functions that do one thing and do it well. Write functions that work together. Write functions that handle lists, because that is a universal interface.\nStrive for writing simple functions that only perform one task. Don’t hesitate to split a big function into smaller ones. Small functions that only perform one task are easier to maintain, test, document and debug. These smaller functions can then be chained using the |> operator. In other words, it is preferable to have something like:\na |> f() |> g() |> h()\nwhere a is for example a path to a data set, and where f(), g() and h() successively read, clean, and plot the data, than having something like:\nbig_function(a)\nthat does all the steps above in one go.\nThis idea of splitting the problem into smaller chunks, each chunk in turn split into even smaller units that can be handled by functions, and then combining the results of these functions into a final output, is called composition.\nThe advantage of splitting big_function() into f(), g() and h() is that you can eat the elephant one bite at a time, and also reuse these smaller functions in other projects more easily. So what’s important is that you can make small functions work together by sharing a common interface. The list is usually a good candidate for this."
},
{
"objectID": "fprog.html#lists-a-powerful-data-structure",
@@ -228,7 +228,7 @@
"href": "fprog.html#functional-programming-in-r",
"title": "6 Functional programming",
"section": "6.4 Functional programming in R",
- "text": "6.4 Functional programming in R\nUp until now I focused on general concepts rather than on specifics of the R programming language when it comes to functional programming. In this section, we will be focusing entirely on R-specific capabilities and packages for functional programming.\n\n6.4.1 Base capabilities\nR is a functional programming language (but not only), and as such it comes with many functions out of the box to write functional code. We have already discussed lapply() and Reduce(). You should know that depending on what you want to achieve, there are other functions that are similar to lapply(): apply(), sapply(), vapply(), mapply() and tapply(). There’s also Map() which is a wrapper around mapply(). Each function performs the same basic task of applying a function over all the elements of a list or list-like structure, but it can be hard to keep them apart and when you should use one over another. This is why {purrr}, which we will discuss in the next section, is quite an interesting alternative to base R’s offering.\nAnother one of the quintessential functional programming functions (alongside Reduce() and Map()) that ships with R is Filter(). If you know dplyr::filter() you should be familiar with the concept of filtering rows of a data frame where the elements of one particular column satisfy a predicate. Filter() works the same way, but focusing on lists instead of data frame:\n\nFilter(is.character,\n list(\n seq(1, 5),\n \"Hey\")\n )\n\n[[1]]\n[1] \"Hey\"\n\n\nThe call above only returns the elements where is.character() evaluates to TRUE.\nAnother useful function is Negate() which is a function factory that takes a boolean function as an input and returns the opposite boolean function. 
As an illustration, suppose that in the example above we wanted to get everything but the character:\n\nFilter(Negate(is.character),\n list(\n seq(1, 5),\n \"Hey\")\n )\n\n[[1]]\n[1] 1 2 3 4 5\n\n\nThere are some other functions like this that you might want to check out: type ?Negate in console to read more about them.\nSometimes you may need to run code with side-effects, but want to avoid any interaction between these side-effects and the global environment. For example, you might want to run some code that creates a plot and saves it to disk, or code that creates some data and writes them to disk. local() can be used for this. local() runs code in a temporary environment that gets discarded at the end:\n\nlocal({\n a <- 2\n})\n\nVariable a was created inside this local environment. Checking if it exists now yields FALSE:\n\nexists(\"a\")\n\n[1] FALSE\n\n\nWe will be using this technique later in the book to keep our scripts pure.\nBefore continuing with R packages that extend R’s functional programming capabilities it’s also important to stress that just as R is a functional programming language, it is also an object oriented language. In fact, R is what John Chambers called a functional OOP language (Chambers (2014)). I won’t delve too much into what this means (read Wickham (2019) for this), but as a short discussion, consider the print() function. 
Depending on what type of object the user gives it, it seems as if somehow print() knows what to do with it:\n\nprint(5)\n\n[1] 5\n\nprint(head(mtcars))\n\n mpg cyl disp hp drat wt qsec vs am\nMazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1\nMazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1\nDatsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1\nHornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0\nHornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0\nValiant 18.1 6 225 105 2.76 3.460 20.22 1 0\n gear carb\nMazda RX4 4 4\nMazda RX4 Wag 4 4\nDatsun 710 4 1\nHornet 4 Drive 3 1\nHornet Sportabout 3 2\nValiant 3 1\n\nprint(str(mtcars))\n\n'data.frame': 32 obs. of 11 variables:\n $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...\n $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...\n $ disp: num 160 160 108 258 360 ...\n $ hp : num 110 110 93 110 175 105 245 62 95 123 ...\n $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...\n $ wt : num 2.62 2.88 2.32 3.21 3.44 ...\n $ qsec: num 16.5 17 18.6 19.4 17 ...\n $ vs : num 0 0 1 1 0 1 0 1 1 1 ...\n $ am : num 1 1 1 0 0 0 0 0 0 0 ...\n $ gear: num 4 4 4 3 3 3 3 4 4 4 ...\n $ carb: num 4 4 1 1 2 1 4 2 2 4 ...\nNULL\n\n\nThis works by essentially mixing both functional and object-oriented programming, hence functional OOP. Let’s take a closer look at the source code of print() by simply typing print without brackets, into a console:\n\nprint\n\nfunction (x, ...) \nUseMethod(\"print\")\n<bytecode: 0x55f662fb0a40>\n<environment: namespace:base>\n\n\nQuite unexpectedly, the source code of print() is one line long and is just UseMethod(\"print\"). So all print() does is use a generic method called “print”. If your text editor has auto-completion enabled, you might see that there are actually many print() functions. 
For example, type print.data.frame into a console:\n\nprint.data.frame\n\nfunction (x, ..., digits = NULL, quote = FALSE, right = TRUE, \n row.names = TRUE, max = NULL) \n{\n n <- length(row.names(x))\n if (length(x) == 0L) {\n cat(sprintf(ngettext(n, \"data frame with 0 columns and %d row\", \n \"data frame with 0 columns and %d rows\"), n), \"\\n\", \n sep = \"\")\n }\n else if (n == 0L) {\n print.default(names(x), quote = FALSE)\n cat(gettext(\"<0 rows> (or 0-length row.names)\\n\"))\n }\n else {\n if (is.null(max)) \n max <- getOption(\"max.print\", 99999L)\n if (!is.finite(max)) \n stop(\"invalid 'max' / getOption(\\\"max.print\\\"): \", \n max)\n omit <- (n0 <- max%/%length(x)) < n\n m <- as.matrix(format.data.frame(if (omit) \n x[seq_len(n0), , drop = FALSE]\n else x, digits = digits, na.encode = FALSE))\n if (!isTRUE(row.names)) \n dimnames(m)[[1L]] <- if (isFALSE(row.names)) \n rep.int(\"\", if (omit) \n n0\n else n)\n else row.names\n print(m, ..., quote = quote, right = right, max = max)\n if (omit) \n cat(\" [ reached 'max' / getOption(\\\"max.print\\\") -- omitted\", \n n - n0, \"rows ]\\n\")\n }\n invisible(x)\n}\n<bytecode: 0x55f664dedb78>\n<environment: namespace:base>\n\n\nThis is the print function for data.frame objects. So what print() does, is look at the class of its argument x, and then look for the right print function to call. In more traditional OOP languages, users would type something like:\n\nmtcars.print()\n\nIn these languages, objects encapsulate methods (the equivalent of our functions), so if mtcars is a data frame, it encapsulates a print() method that then does the printing. R is different, because classes and methods are kept separate. If a package developer creates a new object class, then the developer also must implement the required methods. 
For example, in the {chronicler} package, the chronicler class is defined alongside the print.chronicler() function to print these objects.\nAll of this is to say that if you want to extend R by writing packages, learning some OOP essentials is also important. But for data analysis, functional programming does the job perfectly well. To learn more about R’s different OOP systems (yes, R can do OOP in different ways and the one I sketched here is the simplest, but probably the most used as well), take a look at Wickham (2019).\n\n\n6.4.2 purrr\nThe {purrr} package, developed by Posit (formerly RStudio), contains many functions that make functional programming with R smoother. In the previous section, we discussed the apply() family of functions; they all do a very similar thing, which is looping over a list and applying a function to the elements of the list, but it is not easy to remember which one does what. Also, for some of these functions, like lapply(), the list argument comes first, and then the function, but in the case of mapply(), the function comes first. This kind of inconsistency can be frustrating. Another issue with these functions is that it is not always easy to know what type the output is going to be. List? Atomic vector? Something else?\n{purrr} solves this issue by offering the map() family of functions, which behave in a very consistent way. The basic function is called map() and we’ve already used it:\n\nmap(seq(1, 5), sqrt)\n\n[[1]]\n[1] 1\n\n[[2]]\n[1] 1.414214\n\n[[3]]\n[1] 1.732051\n\n[[4]]\n[1] 2\n\n[[5]]\n[1] 2.236068\n\n\nBut there are many interesting variants:\n\nmap_dbl(seq(1, 5), sqrt)\n\n[1] 1.000000 1.414214 1.732051 2.000000 2.236068\n\n\nmap_dbl() coerces the output to an atomic vector of doubles instead of a list of doubles. 
Then there’s:\n\nmap_chr(letters, toupper)\n\n [1] \"A\" \"B\" \"C\" \"D\" \"E\" \"F\" \"G\" \"H\" \"I\" \"J\" \"K\" \"L\" \"M\" \"N\"\n[15] \"O\" \"P\" \"Q\" \"R\" \"S\" \"T\" \"U\" \"V\" \"W\" \"X\" \"Y\" \"Z\"\n\n\nfor when the output needs to be an atomic vector of characters.\nThere are many others, so take a look at the documentation with ?map. There’s also walk(), which is used if you’re only interested in the side effect of the function (for example, if the function takes paths as input and saves something to disk).\n{purrr} also has functions to replace Reduce(), simply called reduce() and accumulate(), and there are many, many other useful functions. Read through the documentation of the package and take the time to learn about all it has to offer.\n\n\n6.4.3 withr\n{withr} is a powerful package that makes it easy to “purify” functions that behave in a way that can cause problems. Remember the function from the introduction that randomly gave out a dish Bruno liked? Here it is again:\n\nh <- function(name, food_list = list()){\n\n food <- sample(c(\"lasagna\", \"cassoulet\", \"feijoada\"), 1)\n\n food_list <- append(food_list, food)\n\n print(paste0(name, \" likes \", food))\n\n food_list\n}\n\nFor the same input, this function may return different outputs, so it is not referentially transparent. So we improved the function by adding calls to set.seed() like this:\n\nh2 <- function(name, food_list = list(), seed = 123){\n\n # We set the seed, making sure that we get the same selection of food for a given seed\n set.seed(seed)\n food <- sample(c(\"lasagna\", \"cassoulet\", \"feijoada\"), 1)\n\n # We now need to unset the seed, because if we don't, guess what, the seed will stay set for the whole session!\n set.seed(NULL)\n\n food_list <- append(food_list, food)\n\n print(paste0(name, \" likes \", food))\n\n food_list\n}\n\nThe problem with this approach is that we need to modify our function. 
We can instead use withr::with_seed() to achieve the same effect:\n\nwithr::with_seed(seed = 123,\n h(\"Bruno\"))\n\n[1] \"Bruno likes feijoada\"\n\n\n[[1]]\n[1] \"feijoada\"\n\n\nIt is also easy to create a wrapper if needed:\n\nh3 <- function(..., seed){\n withr::with_seed(seed = seed,\n h(...))\n}\n\n\nh3(\"Bruno\", seed = 123)\n\n[1] \"Bruno likes feijoada\"\n\n\n[[1]]\n[1] \"feijoada\"\n\n\nIn a previous example we downloaded a dataset and loaded it into memory; we did so by first creating a temporary file, then downloading the data to it, and then loading it. Suppose that instead of loading this data into our session, we simply wanted to test whether the link was still working. We wouldn’t want to keep the loaded data in our session, so to avoid having to delete it again manually, we could use with_tempfile():\n\nwithr::with_tempfile(\"unemp\", {\n download.file(\n \"https://is.gd/l57cNX\",\n destfile = unemp)\n load(unemp)\n nrow(unemp)\n }\n)\n\n[1] 472\n\n\nThe data got downloaded, and then loaded, and then we computed the number of rows of the data, without touching the global environment, or state, of our current session.\nJust like {purrr}, {withr} has many useful functions which I encourage you to familiarize yourself with."
+ "text": "6.4 Functional programming in R\nUp until now, I have focused on general concepts rather than on specifics of the R programming language when it comes to functional programming. In this section, we will be focusing entirely on R-specific capabilities and packages for functional programming.\n\n6.4.1 Base capabilities\nR is a functional programming language (though not exclusively), and as such it comes with many functions out of the box to write functional code. We have already discussed lapply() and Reduce(). You should know that depending on what you want to achieve, there are other functions that are similar to lapply(): apply(), sapply(), vapply(), mapply() and tapply(). There’s also Map() which is a wrapper around mapply(). Each function performs the same basic task of applying a function over all the elements of a list or list-like structure, but it can be hard to tell them apart and to know when to use one over another. This is why {purrr}, which we will discuss in the next section, is quite an interesting alternative to base R’s offering.\nAnother one of the quintessential functional programming functions (alongside Reduce() and Map()) that ships with R is Filter(). If you know dplyr::filter() you should be familiar with the concept of filtering rows of a data frame where the elements of one particular column satisfy a predicate. Filter() works the same way, but focuses on lists instead of data frames:\n\nFilter(is.character,\n list(\n seq(1, 5),\n \"Hey\")\n )\n\n[[1]]\n[1] \"Hey\"\n\n\nThe call above only returns the elements where is.character() evaluates to TRUE.\nAnother useful function is Negate() which is a function factory that takes a boolean function as an input and returns the opposite boolean function. 
As an illustration, suppose that in the example above we wanted to get everything but the character:\n\nFilter(Negate(is.character),\n list(\n seq(1, 5),\n \"Hey\")\n )\n\n[[1]]\n[1] 1 2 3 4 5\n\n\nThere are some other functions like this that you might want to check out: type ?Negate in the console to read more about them.\nSometimes you may need to run code with side effects, but want to avoid any interaction between these side effects and the global environment. For example, you might want to run some code that creates a plot and saves it to disk, or code that creates some data and writes them to disk. local() can be used for this. local() runs code in a temporary environment that gets discarded at the end:\n\nlocal({\n a <- 2\n})\n\nVariable a was created inside this local environment. Checking if it exists now yields FALSE:\n\nexists(\"a\")\n\n[1] FALSE\n\n\nWe will be using this technique later in the book to keep our scripts pure.\nBefore continuing with R packages that extend R’s functional programming capabilities, it’s also important to stress that just as R is a functional programming language, it is also an object-oriented language. In fact, R is what John Chambers called a functional OOP language (Chambers (2014)). I won’t delve too much into what this means (read Wickham (2019) for this), but as a short discussion, consider the print() function. 
Depending on what type of object the user gives it, it seems as if somehow print() knows what to do with it:\n\nprint(5)\n\n[1] 5\n\nprint(head(mtcars))\n\n mpg cyl disp hp drat wt qsec vs am\nMazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1\nMazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1\nDatsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1\nHornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0\nHornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0\nValiant 18.1 6 225 105 2.76 3.460 20.22 1 0\n gear carb\nMazda RX4 4 4\nMazda RX4 Wag 4 4\nDatsun 710 4 1\nHornet 4 Drive 3 1\nHornet Sportabout 3 2\nValiant 3 1\n\nprint(str(mtcars))\n\n'data.frame': 32 obs. of 11 variables:\n $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...\n $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...\n $ disp: num 160 160 108 258 360 ...\n $ hp : num 110 110 93 110 175 105 245 62 95 123 ...\n $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...\n $ wt : num 2.62 2.88 2.32 3.21 3.44 ...\n $ qsec: num 16.5 17 18.6 19.4 17 ...\n $ vs : num 0 0 1 1 0 1 0 1 1 1 ...\n $ am : num 1 1 1 0 0 0 0 0 0 0 ...\n $ gear: num 4 4 4 3 3 3 3 4 4 4 ...\n $ carb: num 4 4 1 1 2 1 4 2 2 4 ...\nNULL\n\n\nThis works by essentially mixing both functional and object-oriented programming, hence functional OOP. Let’s take a closer look at the source code of print() by simply typing print without brackets, into a console:\n\nprint\n\nfunction (x, ...) \nUseMethod(\"print\")\n<bytecode: 0x558c98972ff0>\n<environment: namespace:base>\n\n\nQuite unexpectedly, the source code of print() is one line long and is just UseMethod(\"print\"). So all print() does is use a generic method called “print”. If your text editor has auto-completion enabled, you might see that there are actually many print() functions. 
For example, type print.data.frame into a console:\n\nprint.data.frame\n\nfunction (x, ..., digits = NULL, quote = FALSE, right = TRUE, \n row.names = TRUE, max = NULL) \n{\n n <- length(row.names(x))\n if (length(x) == 0L) {\n cat(sprintf(ngettext(n, \"data frame with 0 columns and %d row\", \n \"data frame with 0 columns and %d rows\"), n), \"\\n\", \n sep = \"\")\n }\n else if (n == 0L) {\n print.default(names(x), quote = FALSE)\n cat(gettext(\"<0 rows> (or 0-length row.names)\\n\"))\n }\n else {\n if (is.null(max)) \n max <- getOption(\"max.print\", 99999L)\n if (!is.finite(max)) \n stop(\"invalid 'max' / getOption(\\\"max.print\\\"): \", \n max)\n omit <- (n0 <- max%/%length(x)) < n\n m <- as.matrix(format.data.frame(if (omit) \n x[seq_len(n0), , drop = FALSE]\n else x, digits = digits, na.encode = FALSE))\n if (!isTRUE(row.names)) \n dimnames(m)[[1L]] <- if (isFALSE(row.names)) \n rep.int(\"\", if (omit) \n n0\n else n)\n else row.names\n print(m, ..., quote = quote, right = right, max = max)\n if (omit) \n cat(\" [ reached 'max' / getOption(\\\"max.print\\\") -- omitted\", \n n - n0, \"rows ]\\n\")\n }\n invisible(x)\n}\n<bytecode: 0x558c9a2a5728>\n<environment: namespace:base>\n\n\nThis is the print function for data.frame objects. So what print() does, is look at the class of its argument x, and then look for the right print function to call. In more traditional OOP languages, users would type something like:\n\nmtcars.print()\n\nIn these languages, objects encapsulate methods (the equivalent of our functions), so if mtcars is a data frame, it encapsulates a print() method that then does the printing. R is different, because classes and methods are kept separate. If a package developer creates a new object class, then the developer also must implement the required methods. 
For example, in the {chronicler} package, the chronicler class is defined alongside the print.chronicler() function to print these objects.\nAll of this is to say that if you want to extend R by writing packages, learning some OOP essentials is also important. But for data analysis, functional programming does the job perfectly well. To learn more about R’s different OOP systems (yes, R can do OOP in different ways and the one I sketched here is the simplest, but probably the most used as well), take a look at Wickham (2019).\n\n\n6.4.2 purrr\nThe {purrr} package, developed by Posit (formerly RStudio), contains many functions that make functional programming with R smoother. In the previous section, we discussed the apply() family of functions; they all do a very similar thing, which is looping over a list and applying a function to the elements of the list, but it is not easy to remember which one does what. Also, for some of these functions, like lapply(), the list argument comes first, and then the function, but in the case of mapply(), the function comes first. This kind of inconsistency can be frustrating. Another issue with these functions is that it is not always easy to know what type the output is going to be. List? Atomic vector? Something else?\n{purrr} solves this issue by offering the map() family of functions, which behave in a very consistent way. The basic function is called map() and we’ve already used it:\n\nmap(seq(1, 5), sqrt)\n\n[[1]]\n[1] 1\n\n[[2]]\n[1] 1.414214\n\n[[3]]\n[1] 1.732051\n\n[[4]]\n[1] 2\n\n[[5]]\n[1] 2.236068\n\n\nBut there are many interesting variants:\n\nmap_dbl(seq(1, 5), sqrt)\n\n[1] 1.000000 1.414214 1.732051 2.000000 2.236068\n\n\nmap_dbl() coerces the output to an atomic vector of doubles instead of a list of doubles. 
Then there’s:\n\nmap_chr(letters, toupper)\n\n [1] \"A\" \"B\" \"C\" \"D\" \"E\" \"F\" \"G\" \"H\" \"I\" \"J\" \"K\" \"L\" \"M\" \"N\"\n[15] \"O\" \"P\" \"Q\" \"R\" \"S\" \"T\" \"U\" \"V\" \"W\" \"X\" \"Y\" \"Z\"\n\n\nfor when the output needs to be an atomic vector of characters.\nThere are many others, so take a look at the documentation with ?map. There’s also walk(), which is used if you’re only interested in the side effect of the function (for example, if the function takes paths as input and saves something to disk).\n{purrr} also has functions to replace Reduce(), simply called reduce() and accumulate(), and there are many, many other useful functions. Read through the documentation of the package and take the time to learn about all it has to offer.\n\n\n6.4.3 withr\n{withr} is a powerful package that makes it easy to “purify” functions that behave in a way that can cause problems. Remember the function from the introduction that randomly gave out a dish Bruno liked? Here it is again:\n\nh <- function(name, food_list = list()){\n\n food <- sample(c(\"lasagna\", \"cassoulet\", \"feijoada\"), 1)\n\n food_list <- append(food_list, food)\n\n print(paste0(name, \" likes \", food))\n\n food_list\n}\n\nFor the same input, this function may return different outputs, so it is not referentially transparent. So we improved the function by adding calls to set.seed() like this:\n\nh2 <- function(name, food_list = list(), seed = 123){\n\n # We set the seed, making sure that we get the same selection of food for a given seed\n set.seed(seed)\n food <- sample(c(\"lasagna\", \"cassoulet\", \"feijoada\"), 1)\n\n # We now need to unset the seed, because if we don't, guess what, the seed will stay set for the whole session!\n set.seed(NULL)\n\n food_list <- append(food_list, food)\n\n print(paste0(name, \" likes \", food))\n\n food_list\n}\n\nThe problem with this approach is that we need to modify our function. 
We can instead use withr::with_seed() to achieve the same effect:\n\nwithr::with_seed(seed = 123,\n h(\"Bruno\"))\n\n[1] \"Bruno likes feijoada\"\n\n\n[[1]]\n[1] \"feijoada\"\n\n\nIt is also easy to create a wrapper if needed:\n\nh3 <- function(..., seed){\n withr::with_seed(seed = seed,\n h(...))\n}\n\n\nh3(\"Bruno\", seed = 123)\n\n[1] \"Bruno likes feijoada\"\n\n\n[[1]]\n[1] \"feijoada\"\n\n\nIn a previous example we downloaded a dataset and loaded it into memory; we did so by first creating a temporary file, then downloading the data to it, and then loading it. Suppose that instead of loading this data into our session, we simply wanted to test whether the link was still working. We wouldn’t want to keep the loaded data in our session, so to avoid having to delete it again manually, we could use with_tempfile():\n\nwithr::with_tempfile(\"unemp\", {\n download.file(\n \"https://is.gd/l57cNX\",\n destfile = unemp)\n load(unemp)\n nrow(unemp)\n }\n)\n\n[1] 472\n\n\nThe data got downloaded, and then loaded, and then we computed the number of rows of the data, without touching the global environment, or state, of our current session.\nJust like {purrr}, {withr} has many useful functions which I encourage you to familiarize yourself with."
},
{
"objectID": "fprog.html#conclusion",