
Commit

feat!: bump Rust polars to 0.39.0 (#1034)
Co-authored-by: etiennebacher <[email protected]>
eitsupi and etiennebacher authored Apr 14, 2024
1 parent 54feba1 commit 7986aaa
Showing 43 changed files with 443 additions and 236 deletions.
4 changes: 2 additions & 2 deletions DESCRIPTION
@@ -118,5 +118,5 @@ Collate:
'zzz.R'
Config/rextendr/version: 0.3.1
VignetteBuilder: knitr
-Config/polars/LibVersion: 0.38.2
-Config/polars/RustToolchainVersion: nightly-2024-02-23
+Config/polars/LibVersion: 0.39.0
+Config/polars/RustToolchainVersion: nightly-2024-03-28
9 changes: 8 additions & 1 deletion NEWS.md
@@ -63,6 +63,7 @@
- In `$dt$convert_time_zone()` and `$dt$replace_time_zone()`, the `tz`
argument is renamed to `time_zone` (#944).
- In `$str$strptime()`, the argument `datatype` is renamed to `dtype` (#939).
+- In `$str$parse_int()`, argument `radix` is renamed to `base` (#1034).

2. Change in the way arguments are passed:

@@ -85,6 +86,8 @@
`$str$to_time()`, all arguments (except the first one) must be named (#939).
- In `pl$date_range()`, the arguments `closed`, `time_unit`, and `time_zone`
must be named (#950).
+- In `$set_sorted()` and `$sort_by()`, argument `descending` must be named
+(#1034).
- In `pl$Series()`, using positional arguments throws a warning, since the
argument positions will be changed in the future (#966).

@@ -144,6 +147,7 @@
early stage of this package and does not exist in other language APIs (#1028).
- The following deprecated functions are now removed: `pl$threadpool_size()`,
`<DataFrame>$with_row_count()`, `<LazyFrame>$with_row_count()` (#965).
+- In `$group_by_dynamic()`, the first datapoint is always preserved (#1034).


### New features
@@ -181,14 +185,17 @@
when a datetime doesn't exist.
- `mapping_strategy` in `$over()` (#984, #988).
- `raise_if_undetermined` in `$meta$output_name()` (#961).
+- `null_on_oob` in `$arr$get()` and `$list$get()` to determine what happens
+when the index is out of bounds (#1034).
+- `nulls_last`, `multithreaded`, and `maintain_order` in `$sort_by()` (#1034).
- Other:
- `pl$Series()` now calls `as_polars_series()` internally, so it can convert
more classes to Series properly (#1015).
- Export the `Duration` datatype (#955).
- New active binding `<Series>$struct$fields` (#1002).
-- rust-polars is updated to 0.38.3 (#937).
+- rust-polars is updated to 0.39.0 (#937, #1034).
### Bug fixes
11 changes: 4 additions & 7 deletions R/expr__array.R
@@ -136,14 +136,11 @@ ExprArr_unique = function(maintain_order = FALSE) .pr$Expr$arr_unique(self, main
#'
#' This allows to extract one value per array only.
#'
+#' @inherit ExprList_get params return
#' @param index An Expr or something coercible to an Expr, that must return a
#' single index. Values are 0-indexed (so index 0 would return the first item
#' of every sub-array) and negative values start from the end (index `-1`
-#' returns the last item). If the index is out of bounds, it will return a
-#' `null`. Strings are parsed as column names.
-#'
-#' @return Expr
-#' @aliases arr_get
+#' returns the last item).
#' @examples
#' df = pl$DataFrame(
#' values = list(c(1, 2), c(3, 4), c(NA_real_, 6)),
@@ -156,8 +153,8 @@ ExprArr_unique = function(maintain_order = FALSE) .pr$Expr$arr_unique(self, main
#' val_minus_1 = pl$col("values")$arr$get(-1),
#' val_oob = pl$col("values")$arr$get(10)
#' )
-ExprArr_get = function(index) {
-.pr$Expr$arr_get(self, index) |>
+ExprArr_get = function(index, ..., null_on_oob = TRUE) {
+.pr$Expr$arr_get(self, index, null_on_oob) |>
unwrap("in $arr$get():")
}

29 changes: 19 additions & 10 deletions R/expr__expr.R
@@ -1377,15 +1377,13 @@ Expr_mode = use_extendr_wrapper
#'
#' Sort this column. If used in a groupby context, the groups are sorted.
#'
-#' @param descending Sort in descending order. When sorting by multiple columns,
-#' can be specified per column by passing a vector of booleans.
-#' @param nulls_last If `TRUE`, place nulls values last.
+#' @inheritParams Series_sort
#' @return Expr
#' @examples
#' pl$DataFrame(a = c(6, 1, 0, NA, Inf, NaN))$
#' with_columns(sorted = pl$col("a")$sort())
-Expr_sort = function(descending = FALSE, nulls_last = FALSE) {
-.pr$Expr$sort(self, descending, nulls_last)
+Expr_sort = function(..., descending = FALSE, nulls_last = FALSE) {
+.pr$Expr$sort_with(self, descending, nulls_last)
}

#' Top k values
@@ -1477,14 +1475,17 @@ Expr_search_sorted = function(element) {
.pr$Expr$search_sorted(self, wrap_e(element))
}

+# TODO: rewrite `by` to `...` <https://github.com/pola-rs/r-polars/pull/997>
#' Sort Expr by order of others
#'
#' Sort this column by the ordering of another column, or multiple other columns.
#' If used in a groupby context, the groups are sorted.
#'
#' @param by One expression or a list of expressions and/or strings (interpreted
#' as column names).
-#' @inheritParams Expr_sort
+#' @param maintain_order A logical to indicate whether the order should be maintained
+#' if elements are equal.
+#' @inheritParams Series_sort
#' @return Expr
#' @examples
#' df = pl$DataFrame(
@@ -1510,12 +1511,19 @@ Expr_search_sorted = function(element) {
#' df$with_columns(
#' sorted = pl$col("group")$sort_by(pl$col("value1")$sort(descending = TRUE))
#' )
-Expr_sort_by = function(by, descending = FALSE) {
+Expr_sort_by = function(
+by, ..., descending = FALSE,
+nulls_last = FALSE,
+multithreaded = TRUE,
+maintain_order = FALSE) {
.pr$Expr$sort_by(
self,
wrap_elist_result(by, str_to_lit = FALSE),
-result(descending)
-) |> unwrap("in $sort_by:")
+descending,
+nulls_last,
+maintain_order,
+multithreaded
+) |> unwrap("in $sort_by():")
}

#' Gather values by index
@@ -3142,6 +3150,7 @@ Expr_cumulative_eval = function(expr, min_periods = 1L, parallel = FALSE) {
#' This enables downstream code to use fast paths for sorted arrays. WARNING:
#' this doesn't check whether the data is actually sorted, you have to ensure of
#' that yourself.
+#' @param ... Ignored.
#' @param descending Sort the columns in descending order.
#' @return Expr
#' @examples
@@ -3153,7 +3162,7 @@ Expr_cumulative_eval = function(expr, min_periods = 1L, parallel = FALSE) {
#' s2 = pl$select(pl$lit(c(1, 3, 2, 4))$set_sorted()$alias("a"))$get_column("a")
#' s2$sort()
#' s2$flags # returns TRUE while it's not actually sorted
-Expr_set_sorted = function(descending = FALSE) {
+Expr_set_sorted = function(..., descending = FALSE) {
self$map_batches(\(s) {
.pr$Series$set_sorted_mut(s, descending) # use private to bypass mut protection
s
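
The signature change above places `descending` after `...`, which is why NEWS.md says the argument "must be named". A minimal base-R sketch of that argument-matching behaviour follows; `set_sorted_old`, `set_sorted_new`, and `set_sorted_strict` are hypothetical stand-ins, not package functions, and the strict variant is an assumption about how a guard could look rather than what the package does.

```r
# Hypothetical stand-ins for the old and new signatures (not the polars code).
set_sorted_old = function(descending = FALSE) descending
set_sorted_new = function(..., descending = FALSE) descending

# Old signature: a positional TRUE is matched to `descending`.
old_positional = set_sorted_old(TRUE)

# New signature: a positional TRUE is captured by `...` and ignored,
# so `descending` keeps its default.
new_positional = set_sorted_new(TRUE)

# New signature: naming the argument is the only way to set it.
new_named = set_sorted_new(descending = TRUE)

# Assumed variant: fail loudly instead of silently ignoring extra arguments.
set_sorted_strict = function(..., descending = FALSE) {
  if (...length() > 0) {
    stop("`descending` must be named")
  }
  descending
}
strict_message = tryCatch(
  set_sorted_strict(TRUE),
  error = function(e) conditionMessage(e)
)
```

In R, formal arguments that follow `...` are matched only by name, so code written against the old signature as `$set_sorted(TRUE)` would need to become `$set_sorted(descending = TRUE)`.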
27 changes: 18 additions & 9 deletions R/expr__list.R
@@ -112,11 +112,11 @@ ExprList_concat = function(other) {
#' @param index An Expr or something coercible to an Expr, that must return a
#' single index. Values are 0-indexed (so index 0 would return the first item
#' of every sublist) and negative values start from the end (index `-1`
-#' returns the last item). If the index is out of bounds, it will return a
-#' `null`. Strings are parsed as column names.
-#'
-#' @return Expr
-#' @aliases list_get
+#' returns the last item).
+#' @param ... Ignored.
+#' @param null_on_oob If `TRUE`, return `null` if an index is out of bounds.
+#' Otherwise, raise an error.
+#' @return [Expr][Expr_class]
#' @examples
#' df = pl$DataFrame(
#' values = list(c(2, 2, NA), c(1, 2, 3), NA_real_, NULL),
@@ -128,7 +128,10 @@ ExprList_concat = function(other) {
#' val_minus_1 = pl$col("values")$list$get(-1),
#' val_oob = pl$col("values")$list$get(10)
#' )
-ExprList_get = function(index) .pr$Expr$list_get(self, wrap_e(index, str_to_lit = FALSE))
+ExprList_get = function(index, ..., null_on_oob = TRUE) {
+.pr$Expr$list_get(self, index, null_on_oob) |>
+unwrap("in $list$get():")
+}

#' Get several values by index in a list
#'
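
The `index` and `null_on_oob` documentation above describes 0-based indexing, negative indices counted from the end, and a choice between `null` and an error when the index is out of bounds. The following base-R sketch illustrates those documented semantics on a plain vector. `get_item` is a hypothetical helper, not the polars implementation, and `NA` only stands in for a polars `null`.

```r
# Hypothetical helper mirroring the documented semantics (not the polars code).
get_item = function(x, index, null_on_oob = TRUE) {
  n = length(x)
  # Translate a 0-based, possibly negative, index to R's 1-based position.
  pos = if (index < 0) n + index + 1 else index + 1
  if (pos < 1 || pos > n) {
    if (null_on_oob) {
      return(NA) # stands in for a polars null
    }
    stop("index is out of bounds")
  }
  x[[pos]]
}

values = c(1, 2, 3)
first_item = get_item(values, 0)   # index 0 is the first item
last_item = get_item(values, -1)   # index -1 is the last item
oob_item = get_item(values, 10)    # out of bounds, default gives NA
oob_message = tryCatch(
  get_item(values, 10, null_on_oob = FALSE),
  error = function(e) conditionMessage(e)
)
```

With the default `null_on_oob = TRUE` the out-of-bounds lookup yields a missing value. Passing `null_on_oob = FALSE` turns the same lookup into an error, which is the behaviour the new parameter documentation describes.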
@@ -140,7 +143,7 @@ ExprList_get = function(index) .pr$Expr$list_get(self, wrap_e(index, str_to_lit
#' first item of every sublist) and negative values start from the end (index
#' `-1` returns the last item). If the index is out of bounds, it will return
#' a `null`. Strings are parsed as column names.
-#' @param null_on_oob Return a `null` value if index is out of bounds.
+#' @inheritParams ExprList_get
#'
#' @return Expr
#' @aliases list_gather
@@ -196,7 +199,10 @@ ExprList_gather_every = function(n, offset = 0) {
#' df$with_columns(
#' first = pl$col("a")$list$first()
#' )
-ExprList_first = function() .pr$Expr$list_get(self, wrap_e(0L, str_to_lit = FALSE))
+ExprList_first = function() {
+.pr$Expr$list_get(self, 0, null_on_oob = TRUE) |>
+unwrap("in $list$first():")
+}

#' Get the last value in a list
#'
@@ -207,7 +213,10 @@ ExprList_first = function() .pr$Expr$list_get(self, wrap_e(0L, str_to_lit = FALS
#' df$with_columns(
#' last = pl$col("a")$list$last()
#' )
-ExprList_last = function() .pr$Expr$list_get(self, wrap_e(-1L, str_to_lit = FALSE))
+ExprList_last = function() {
+.pr$Expr$list_get(self, -1, null_on_oob = TRUE) |>
+unwrap("in $list$last():")
+}

#' Check if list contains a given value
#'
7 changes: 4 additions & 3 deletions R/expr__string.R
@@ -865,11 +865,12 @@ ExprStr_explode = function() {
unwrap("in str$explode():")
}

+# TODO: rename to `to_integer`
#' Parse integers with base radix from strings
#'
#' @description Parse integers with base 2 by default.
#' @keywords ExprStr
-#' @param radix Positive integer which is the base of the string we are parsing.
+#' @param base Positive integer which is the base of the string we are parsing.
#' Default is 2.
#' @param strict If `TRUE` (default), integer overflow will raise an error.
#' Otherwise, they will be converted to `null`.
@@ -882,8 +883,8 @@ ExprStr_explode = function() {
#' # Convert to null if the string is not a valid integer when `strict = FALSE`
#' df = pl$DataFrame(x = c("1", "2", "foo"))
#' df$select(pl$col("x")$str$parse_int(10, FALSE))
-ExprStr_parse_int = function(radix = 2, strict = TRUE) {
-.pr$Expr$str_parse_int(self, radix, strict) |> unwrap("in str$parse_int():")
+ExprStr_parse_int = function(base = 2, strict = TRUE) {
+.pr$Expr$str_parse_int(self, base, strict) |> unwrap("in str$parse_int():")
}

#' Returns string values in reversed order
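
For readers unfamiliar with what the renamed `base` argument of `$str$parse_int()` controls, base R's `strtoi()` offers a rough analogy. This is only an illustration with a different function: `strtoi()` is not what the package calls, and its `NA` result for invalid input merely resembles the documented `strict = FALSE` behaviour of returning `null`.

```r
# Base-R analogy only: strtoi() is not the polars implementation.
binary_value = strtoi("110", base = 2L)   # digits read in base 2
hex_value = strtoi("ff", base = 16L)      # digits read in base 16
invalid_value = strtoi("foo", base = 10L) # not a valid base-10 integer
```

Here `binary_value` is 6, `hex_value` is 255, and `invalid_value` is `NA`.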
20 changes: 10 additions & 10 deletions R/extendr-wrappers.R
@@ -14,7 +14,7 @@ all_horizontal <- function(dotdotdot) .Call(wrap__all_horizontal, dotdotdot)

any_horizontal <- function(dotdotdot) .Call(wrap__any_horizontal, dotdotdot)

-arg_sort_by <- function(exprs, descending) .Call(wrap__arg_sort_by, exprs, descending)
+arg_sort_by <- function(exprs, descending, nulls_last, multithreaded, maintain_order) .Call(wrap__arg_sort_by, exprs, descending, nulls_last, multithreaded, maintain_order)

arg_where <- function(condition) .Call(wrap__arg_where, condition)

@@ -98,7 +98,7 @@ concat_series <- function(l, rechunk, to_supertypes) .Call(wrap__concat_series,

new_from_csv <- function(path, has_header, separator, comment_prefix, quote_char, skip_rows, dtypes, null_values, ignore_errors, cache, infer_schema_length, n_rows, encoding, low_memory, rechunk, skip_rows_after_header, row_index_name, row_index_offset, try_parse_dates, eol_char, raise_if_empty, truncate_ragged_lines) .Call(wrap__new_from_csv, path, has_header, separator, comment_prefix, quote_char, skip_rows, dtypes, null_values, ignore_errors, cache, infer_schema_length, n_rows, encoding, low_memory, rechunk, skip_rows_after_header, row_index_name, row_index_offset, try_parse_dates, eol_char, raise_if_empty, truncate_ragged_lines)

-import_arrow_ipc <- function(path, n_rows, cache, rechunk, row_name, row_index, memmap) .Call(wrap__import_arrow_ipc, path, n_rows, cache, rechunk, row_name, row_index, memmap)
+import_arrow_ipc <- function(path, n_rows, cache, rechunk, row_name, row_index, memory_map) .Call(wrap__import_arrow_ipc, path, n_rows, cache, rechunk, row_name, row_index, memory_map)

new_from_ndjson <- function(path, infer_schema_length, batch_size, n_rows, low_memory, rechunk, row_index_name, row_index_offset, ignore_errors) .Call(wrap__new_from_ndjson, path, infer_schema_length, batch_size, n_rows, low_memory, rechunk, row_index_name, row_index_offset, ignore_errors)

@@ -484,7 +484,7 @@ RPolarsExpr$to_physical <- function() .Call(wrap__RPolarsExpr__to_physical, self

RPolarsExpr$cast <- function(data_type, strict) .Call(wrap__RPolarsExpr__cast, self, data_type, strict)

-RPolarsExpr$sort <- function(descending, nulls_last) .Call(wrap__RPolarsExpr__sort, self, descending, nulls_last)
+RPolarsExpr$sort_with <- function(descending, nulls_last) .Call(wrap__RPolarsExpr__sort_with, self, descending, nulls_last)

RPolarsExpr$arg_sort <- function(descending, nulls_last) .Call(wrap__RPolarsExpr__arg_sort, self, descending, nulls_last)

@@ -500,7 +500,7 @@ RPolarsExpr$search_sorted <- function(element) .Call(wrap__RPolarsExpr__search_s

RPolarsExpr$gather <- function(idx) .Call(wrap__RPolarsExpr__gather, self, idx)

-RPolarsExpr$sort_by <- function(by, descending) .Call(wrap__RPolarsExpr__sort_by, self, by, descending)
+RPolarsExpr$sort_by <- function(by, descending, nulls_last, maintain_order, multithreaded) .Call(wrap__RPolarsExpr__sort_by, self, by, descending, nulls_last, maintain_order, multithreaded)

RPolarsExpr$backward_fill <- function(limit) .Call(wrap__RPolarsExpr__backward_fill, self, limit)

@@ -690,7 +690,7 @@ RPolarsExpr$list_gather <- function(index, null_on_oob) .Call(wrap__RPolarsExpr_

RPolarsExpr$list_gather_every <- function(n, offset) .Call(wrap__RPolarsExpr__list_gather_every, self, n, offset)

-RPolarsExpr$list_get <- function(index) .Call(wrap__RPolarsExpr__list_get, self, index)
+RPolarsExpr$list_get <- function(index, null_on_oob) .Call(wrap__RPolarsExpr__list_get, self, index, null_on_oob)

RPolarsExpr$list_join <- function(separator, ignore_nulls) .Call(wrap__RPolarsExpr__list_join, self, separator, ignore_nulls)

@@ -742,7 +742,7 @@ RPolarsExpr$arr_arg_min <- function() .Call(wrap__RPolarsExpr__arr_arg_min, self

RPolarsExpr$arr_arg_max <- function() .Call(wrap__RPolarsExpr__arr_arg_max, self)

-RPolarsExpr$arr_get <- function(index) .Call(wrap__RPolarsExpr__arr_get, self, index)
+RPolarsExpr$arr_get <- function(index, null_on_oob) .Call(wrap__RPolarsExpr__arr_get, self, index, null_on_oob)

RPolarsExpr$arr_join <- function(separator, ignore_nulls) .Call(wrap__RPolarsExpr__arr_join, self, separator, ignore_nulls)

@@ -1024,7 +1024,7 @@ RPolarsExpr$str_slice <- function(offset, length) .Call(wrap__RPolarsExpr__str_s

RPolarsExpr$str_explode <- function() .Call(wrap__RPolarsExpr__str_explode, self)

-RPolarsExpr$str_parse_int <- function(radix, strict) .Call(wrap__RPolarsExpr__str_parse_int, self, radix, strict)
+RPolarsExpr$str_parse_int <- function(base, strict) .Call(wrap__RPolarsExpr__str_parse_int, self, base, strict)

RPolarsExpr$str_reverse <- function() .Call(wrap__RPolarsExpr__str_reverse, self)

@@ -1176,7 +1176,7 @@ RPolarsLazyFrame$join_asof <- function(other, left_on, right_on, left_by, right_

RPolarsLazyFrame$join <- function(other, left_on, right_on, how, validate, join_nulls, suffix, allow_parallel, force_parallel) .Call(wrap__RPolarsLazyFrame__join, self, other, left_on, right_on, how, validate, join_nulls, suffix, allow_parallel, force_parallel)

-RPolarsLazyFrame$sort_by_exprs <- function(by, dotdotdot, descending, nulls_last, maintain_order) .Call(wrap__RPolarsLazyFrame__sort_by_exprs, self, by, dotdotdot, descending, nulls_last, maintain_order)
+RPolarsLazyFrame$sort_by_exprs <- function(by, dotdotdot, descending, nulls_last, maintain_order, multithreaded) .Call(wrap__RPolarsLazyFrame__sort_by_exprs, self, by, dotdotdot, descending, nulls_last, maintain_order, multithreaded)

RPolarsLazyFrame$melt <- function(id_vars, value_vars, value_name, variable_name, streamable) .Call(wrap__RPolarsLazyFrame__melt, self, id_vars, value_vars, value_name, variable_name, streamable)

@@ -1198,7 +1198,7 @@ RPolarsLazyFrame$clone_in_rust <- function() .Call(wrap__RPolarsLazyFrame__clone

RPolarsLazyFrame$with_context <- function(contexts) .Call(wrap__RPolarsLazyFrame__with_context, self, contexts)

-RPolarsLazyFrame$rolling <- function(index_column, period, offset, closed, by, check_sorted) .Call(wrap__RPolarsLazyFrame__rolling, self, index_column, period, offset, closed, by, check_sorted)
+RPolarsLazyFrame$rolling <- function(index_column, period, offset, closed, group_by, check_sorted) .Call(wrap__RPolarsLazyFrame__rolling, self, index_column, period, offset, closed, group_by, check_sorted)

RPolarsLazyFrame$group_by_dynamic <- function(index_column, every, period, offset, label, include_boundaries, closed, by, start_by, check_sorted) .Call(wrap__RPolarsLazyFrame__group_by_dynamic, self, index_column, every, period, offset, label, include_boundaries, closed, by, start_by, check_sorted)

@@ -1250,7 +1250,7 @@ RPolarsSeries$n_unique <- function() .Call(wrap__RPolarsSeries__n_unique, self)

RPolarsSeries$name <- function() .Call(wrap__RPolarsSeries__name, self)

-RPolarsSeries$sort_mut <- function(descending, nulls_last) .Call(wrap__RPolarsSeries__sort_mut, self, descending, nulls_last)
+RPolarsSeries$sort <- function(descending, nulls_last, multithreaded) .Call(wrap__RPolarsSeries__sort, self, descending, nulls_last, multithreaded)

RPolarsSeries$value_counts <- function(sort, parallel) .Call(wrap__RPolarsSeries__value_counts, self, sort, parallel)

11 changes: 9 additions & 2 deletions R/functions__lazy.R
@@ -1240,6 +1240,8 @@ pl_arg_where = function(condition) {
#' @param ... Column(s) to arg sort by. Can be Expr(s) or something coercible
#' to Expr(s). Strings are parsed as column names.
#' @inheritParams Expr_sort
+#' @inheritParams Series_sort
+#' @inheritParams LazyFrame_sort
#'
#' @return Expr
#' @seealso [$arg_sort()][Expr_arg_sort()] to find the row indices that would
@@ -1259,7 +1261,12 @@ pl_arg_where = function(condition) {
#' df$with_columns(
#' arg_sort_a = pl$arg_sort_by(pl$col("a") * -1)
#' )
-pl_arg_sort_by = function(..., descending = FALSE) {
+pl_arg_sort_by = function(
+...,
+descending = FALSE,
+nulls_last = FALSE,
+multithreaded = TRUE,
+maintain_order = FALSE) {
dots = list2(...)

# The first argument must be a column, not columns
@@ -1268,7 +1275,7 @@ pl_arg_sort_by = function(..., descending = FALSE) {
dots = unlist(dots, recursive = FALSE)
}

-arg_sort_by(dots, descending) |>
+arg_sort_by(dots, descending = descending, nulls_last = nulls_last, multithreaded = multithreaded, maintain_order = maintain_order) |>
unwrap("in pl$arg_sort_by():")
}

11 changes: 6 additions & 5 deletions R/lazyframe__lazy.R
@@ -1296,15 +1296,15 @@ LazyFrame_join = function(
}


-#' Sort a LazyFrame
-#' @description Sort by one or more Expressions.
+#' Sort the LazyFrame by the given columns
+#'
+#' @inheritParams Series_sort
#' @param by Column(s) to sort by. Can be character vector of column names,
#' a list of Expr(s) or a list with a mix of Expr(s) and column names.
#' @param ... More columns to sort by as above but provided one Expr per argument.
#' @param descending Logical. Sort in descending order (default is `FALSE`). This must be
#' either of length 1 or a logical vector of the same length as the number of
#' Expr(s) specified in `by` and `...`.
-#' @param nulls_last Logical. Place `NULL`s at the end? Default is `FALSE`.
#' @param maintain_order Whether the order should be maintained if elements are
#' equal. If `TRUE`, streaming is not possible and performance might be worse
#' since this requires a stable search.
@@ -1326,10 +1326,11 @@ LazyFrame_sort = function(
...,
descending = FALSE,
nulls_last = FALSE,
-maintain_order = FALSE) {
+maintain_order = FALSE,
+multithreaded = TRUE) {
.pr$LazyFrame$sort_by_exprs(
self, unpack_list(by, .context = "in $sort():"), err_on_named_args(...),
-descending, nulls_last, maintain_order
+descending, nulls_last, maintain_order, multithreaded
) |>
unwrap("in $sort():")
}