feat!: bump Rust polars to 0.39.0 #1034

Merged on Apr 14, 2024 (14 commits)
4 changes: 2 additions & 2 deletions DESCRIPTION
@@ -118,5 +118,5 @@ Collate:
'zzz.R'
Config/rextendr/version: 0.3.1
VignetteBuilder: knitr
Config/polars/LibVersion: 0.38.2
Config/polars/RustToolchainVersion: nightly-2024-02-23
Config/polars/LibVersion: 0.39.0
Config/polars/RustToolchainVersion: nightly-2024-03-28
9 changes: 8 additions & 1 deletion NEWS.md
@@ -63,6 +63,7 @@
- In `$dt$convert_time_zone()` and `$dt$replace_time_zone()`, the `tz`
argument is renamed to `time_zone` (#944).
- In `$str$strptime()`, the argument `datatype` is renamed to `dtype` (#939).
- In `$str$parse_int()`, argument `radix` is renamed to `base` (#1034).

2. Change in the way arguments are passed:

@@ -85,6 +86,8 @@
`$str$to_time()`, all arguments (except the first one) must be named (#939).
- In `pl$date_range()`, the arguments `closed`, `time_unit`, and `time_zone`
must be named (#950).
- In `$set_sorted()` and `$sort_by()`, argument `descending` must be named
(#1034).
- In `pl$Series()`, using positional arguments throws a warning, since the
argument positions will be changed in the future (#966).

@@ -144,6 +147,7 @@
early stage of this package and does not exist in other language APIs (#1028).
- The following deprecated functions are now removed: `pl$threadpool_size()`,
`<DataFrame>$with_row_count()`, `<LazyFrame>$with_row_count()` (#965).
- In `$group_by_dynamic()`, the first datapoint is always preserved (#1034).


### New features
@@ -181,14 +185,17 @@
when a datetime doesn't exist.
- `mapping_strategy` in `$over()` (#984, #988).
- `raise_if_undetermined` in `$meta$output_name()` (#961).
- `null_on_oob` in `$arr$get()` and `$list$get()` to determine what happens
when the index is out of bounds (#1034).
- `nulls_last`, `multithreaded`, and `maintain_order` in `$sort_by()` (#1034).

- Other:

- `pl$Series()` now calls `as_polars_series()` internally, so it can convert
more classes to Series properly (#1015).
- Export the `Duration` datatype (#955).
- New active binding `<Series>$struct$fields` (#1002).
- rust-polars is updated to 0.38.3 (#937).
- rust-polars is updated to 0.39.0 (#937, #1034).


### Bug fixes
11 changes: 4 additions & 7 deletions R/expr__array.R
@@ -136,14 +136,11 @@ ExprArr_unique = function(maintain_order = FALSE) .pr$Expr$arr_unique(self, main
#'
#' This extracts exactly one value per array.
#'
#' @inherit ExprList_get params return
#' @param index An Expr or something coercible to an Expr, that must return a
#' single index. Values are 0-indexed (so index 0 would return the first item
#' of every sub-array) and negative values start from the end (index `-1`
#' returns the last item). If the index is out of bounds, it will return a
#' `null`. Strings are parsed as column names.
#'
#' @return Expr
#' @aliases arr_get
#' returns the last item).
#' @examples
#' df = pl$DataFrame(
#' values = list(c(1, 2), c(3, 4), c(NA_real_, 6)),
@@ -156,8 +153,8 @@ ExprArr_unique = function(maintain_order = FALSE) .pr$Expr$arr_unique(self, main
#' val_minus_1 = pl$col("values")$arr$get(-1),
#' val_oob = pl$col("values")$arr$get(10)
#' )
ExprArr_get = function(index) {
.pr$Expr$arr_get(self, index) |>
ExprArr_get = function(index, ..., null_on_oob = TRUE) {
.pr$Expr$arr_get(self, index, null_on_oob) |>
unwrap("in $arr$get():")
}

29 changes: 19 additions & 10 deletions R/expr__expr.R
@@ -1377,15 +1377,13 @@ Expr_mode = use_extendr_wrapper
#'
#' Sort this column. If used in a groupby context, the groups are sorted.
#'
#' @param descending Sort in descending order. When sorting by multiple columns,
#' can be specified per column by passing a vector of booleans.
#' @param nulls_last If `TRUE`, place nulls values last.
#' @inheritParams Series_sort
#' @return Expr
#' @examples
#' pl$DataFrame(a = c(6, 1, 0, NA, Inf, NaN))$
#' with_columns(sorted = pl$col("a")$sort())
Expr_sort = function(descending = FALSE, nulls_last = FALSE) {
.pr$Expr$sort(self, descending, nulls_last)
Expr_sort = function(..., descending = FALSE, nulls_last = FALSE) {
.pr$Expr$sort_with(self, descending, nulls_last)
}

#' Top k values
@@ -1477,14 +1475,17 @@ Expr_search_sorted = function(element) {
.pr$Expr$search_sorted(self, wrap_e(element))
}

# TODO: rewrite `by` to `...` <https://github.com/pola-rs/r-polars/pull/997>
#' Sort Expr by order of others
#'
#' Sort this column by the ordering of another column, or multiple other columns.
#' If used in a groupby context, the groups are sorted.
#'
#' @param by One expression or a list of expressions and/or strings (interpreted
#' as column names).
#' @inheritParams Expr_sort
#' @param maintain_order A logical to indicate whether the order should be maintained
#' if elements are equal.
#' @inheritParams Series_sort
#' @return Expr
#' @examples
#' df = pl$DataFrame(
@@ -1510,12 +1511,19 @@ Expr_search_sorted = function(element) {
#' df$with_columns(
#' sorted = pl$col("group")$sort_by(pl$col("value1")$sort(descending = TRUE))
#' )
Expr_sort_by = function(by, descending = FALSE) {
Expr_sort_by = function(
by, ..., descending = FALSE,
nulls_last = FALSE,
multithreaded = TRUE,
maintain_order = FALSE) {
.pr$Expr$sort_by(
self,
wrap_elist_result(by, str_to_lit = FALSE),
result(descending)
) |> unwrap("in $sort_by:")
descending,
nulls_last,
maintain_order,
multithreaded
) |> unwrap("in $sort_by():")
}
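
As a usage sketch of the new `$sort_by()` signature above (not part of the diff): `by` stays positional, while `descending` and the new `nulls_last`, `multithreaded`, and `maintain_order` options have to be passed by name. The data frame and its column names are made up for illustration.

```r
library(polars)

# Hypothetical data: names and values are illustrative only
df = pl$DataFrame(
  group = c("a", "b", "c", "d"),
  value = c(2, NA, 1, 3)
)

# `by` is positional; every option after it must be named
out = df$with_columns(
  sorted = pl$col("group")$sort_by(
    pl$col("value"),
    descending = TRUE,
    nulls_last = TRUE,
    maintain_order = TRUE
  )
)
out
```

Code that previously called `$sort_by(by, TRUE)` would need to be updated to `$sort_by(by, descending = TRUE)`.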

#' Gather values by index
@@ -3142,6 +3150,7 @@ Expr_cumulative_eval = function(expr, min_periods = 1L, parallel = FALSE) {
#' This enables downstream code to use fast paths for sorted arrays. WARNING:
#' this doesn't check whether the data is actually sorted; you have to ensure
#' that yourself.
#' @param ... Ignored.
#' @param descending Sort the columns in descending order.
#' @return Expr
#' @examples
@@ -3153,7 +3162,7 @@ Expr_cumulative_eval = function(expr, min_periods = 1L, parallel = FALSE) {
#' s2 = pl$select(pl$lit(c(1, 3, 2, 4))$set_sorted()$alias("a"))$get_column("a")
#' s2$sort()
#' s2$flags # returns TRUE while it's not actually sorted
Expr_set_sorted = function(descending = FALSE) {
Expr_set_sorted = function(..., descending = FALSE) {
self$map_batches(\(s) {
.pr$Series$set_sorted_mut(s, descending) # use private to bypass mut protection
s
27 changes: 18 additions & 9 deletions R/expr__list.R
@@ -112,11 +112,11 @@ ExprList_concat = function(other) {
#' @param index An Expr or something coercible to an Expr, that must return a
#' single index. Values are 0-indexed (so index 0 would return the first item
#' of every sublist) and negative values start from the end (index `-1`
#' returns the last item). If the index is out of bounds, it will return a
#' `null`. Strings are parsed as column names.
#'
#' @return Expr
#' @aliases list_get
#' returns the last item).
#' @param ... Ignored.
#' @param null_on_oob If `TRUE`, return `null` if an index is out of bounds.
#' Otherwise, raise an error.
#' @return [Expr][Expr_class]
#' @examples
#' df = pl$DataFrame(
#' values = list(c(2, 2, NA), c(1, 2, 3), NA_real_, NULL),
@@ -128,7 +128,10 @@ ExprList_concat = function(other) {
#' val_minus_1 = pl$col("values")$list$get(-1),
#' val_oob = pl$col("values")$list$get(10)
#' )
ExprList_get = function(index) .pr$Expr$list_get(self, wrap_e(index, str_to_lit = FALSE))
ExprList_get = function(index, ..., null_on_oob = TRUE) {
.pr$Expr$list_get(self, index, null_on_oob) |>
unwrap("in $list$get():")
}
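
A minimal sketch of how the new `null_on_oob` argument might be exercised, based on the documentation in this hunk (the data and the variable names `oob_null` and `oob_error` are illustrative, not taken from the PR):

```r
library(polars)

# Hypothetical data: two short lists, so index 10 is out of bounds for both
df = pl$DataFrame(values = list(c(2, 2, NA), c(1, 2, 3)))

# Default (null_on_oob = TRUE): out-of-bounds lookups yield null
oob_null = df$select(pl$col("values")$list$get(10))

# null_on_oob = FALSE: per the docs above, the same lookup should raise an error
oob_error = tryCatch(
  df$select(pl$col("values")$list$get(10, null_on_oob = FALSE)),
  error = function(e) conditionMessage(e)
)
```

Because `null_on_oob` comes after `...`, it has to be passed by name.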

#' Get several values by index in a list
#'
@@ -140,7 +143,7 @@ ExprList_get = function(index) .pr$Expr$list_get(self, wrap_e(index, str_to_lit
#' first item of every sublist) and negative values start from the end (index
#' `-1` returns the last item). If the index is out of bounds, it will return
#' a `null`. Strings are parsed as column names.
#' @param null_on_oob Return a `null` value if index is out of bounds.
#' @inheritParams ExprList_get
#'
#' @return Expr
#' @aliases list_gather
@@ -196,7 +199,10 @@ ExprList_gather_every = function(n, offset = 0) {
#' df$with_columns(
#' first = pl$col("a")$list$first()
#' )
ExprList_first = function() .pr$Expr$list_get(self, wrap_e(0L, str_to_lit = FALSE))
ExprList_first = function() {
.pr$Expr$list_get(self, 0, null_on_oob = TRUE) |>
unwrap("in $list$first():")
}

#' Get the last value in a list
#'
@@ -207,7 +213,10 @@ ExprList_first = function() .pr$Expr$list_get(self, wrap_e(0L, str_to_lit = FALS
#' df$with_columns(
#' last = pl$col("a")$list$last()
#' )
ExprList_last = function() .pr$Expr$list_get(self, wrap_e(-1L, str_to_lit = FALSE))
ExprList_last = function() {
.pr$Expr$list_get(self, -1, null_on_oob = TRUE) |>
unwrap("in $list$last():")
}

#' Check if list contains a given value
#'
7 changes: 4 additions & 3 deletions R/expr__string.R
@@ -865,11 +865,12 @@ ExprStr_explode = function() {
unwrap("in str$explode():")
}

# TODO: rename to `to_integer`
#' Parse integers with base radix from strings
#'
#' @description Parse integers with base 2 by default.
#' @keywords ExprStr
#' @param radix Positive integer which is the base of the string we are parsing.
#' @param base Positive integer which is the base of the string we are parsing.
#' Default is 2.
#' @param strict If `TRUE` (default), integer overflow will raise an error.
#' Otherwise, overflowing values are converted to `null`.
@@ -882,8 +883,8 @@ ExprStr_explode = function() {
#' # Convert to null if the string is not a valid integer when `strict = FALSE`
#' df = pl$DataFrame(x = c("1", "2", "foo"))
#' df$select(pl$col("x")$str$parse_int(10, FALSE))
ExprStr_parse_int = function(radix = 2, strict = TRUE) {
.pr$Expr$str_parse_int(self, radix, strict) |> unwrap("in str$parse_int():")
ExprStr_parse_int = function(base = 2, strict = TRUE) {
.pr$Expr$str_parse_int(self, base, strict) |> unwrap("in str$parse_int():")
}
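
To illustrate the `radix` to `base` rename, a short sketch (the column name `bin` and its values are made up for this example):

```r
library(polars)

# Hypothetical data: strings holding binary digits
df = pl$DataFrame(bin = c("110", "101", "010"))

# Before this PR the call was $str$parse_int(radix = 2);
# the argument is now named `base`
out = df$select(pl$col("bin")$str$parse_int(base = 2))
out
```

Positional calls such as `$str$parse_int(2)` are unaffected by the rename; only calls that spell out `radix =` need to change.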

#' Returns string values in reversed order
20 changes: 10 additions & 10 deletions R/extendr-wrappers.R
@@ -14,7 +14,7 @@ all_horizontal <- function(dotdotdot) .Call(wrap__all_horizontal, dotdotdot)

any_horizontal <- function(dotdotdot) .Call(wrap__any_horizontal, dotdotdot)

arg_sort_by <- function(exprs, descending) .Call(wrap__arg_sort_by, exprs, descending)
arg_sort_by <- function(exprs, descending, nulls_last, multithreaded, maintain_order) .Call(wrap__arg_sort_by, exprs, descending, nulls_last, multithreaded, maintain_order)

arg_where <- function(condition) .Call(wrap__arg_where, condition)

@@ -98,7 +98,7 @@ concat_series <- function(l, rechunk, to_supertypes) .Call(wrap__concat_series,

new_from_csv <- function(path, has_header, separator, comment_prefix, quote_char, skip_rows, dtypes, null_values, ignore_errors, cache, infer_schema_length, n_rows, encoding, low_memory, rechunk, skip_rows_after_header, row_index_name, row_index_offset, try_parse_dates, eol_char, raise_if_empty, truncate_ragged_lines) .Call(wrap__new_from_csv, path, has_header, separator, comment_prefix, quote_char, skip_rows, dtypes, null_values, ignore_errors, cache, infer_schema_length, n_rows, encoding, low_memory, rechunk, skip_rows_after_header, row_index_name, row_index_offset, try_parse_dates, eol_char, raise_if_empty, truncate_ragged_lines)

import_arrow_ipc <- function(path, n_rows, cache, rechunk, row_name, row_index, memmap) .Call(wrap__import_arrow_ipc, path, n_rows, cache, rechunk, row_name, row_index, memmap)
import_arrow_ipc <- function(path, n_rows, cache, rechunk, row_name, row_index, memory_map) .Call(wrap__import_arrow_ipc, path, n_rows, cache, rechunk, row_name, row_index, memory_map)

new_from_ndjson <- function(path, infer_schema_length, batch_size, n_rows, low_memory, rechunk, row_index_name, row_index_offset, ignore_errors) .Call(wrap__new_from_ndjson, path, infer_schema_length, batch_size, n_rows, low_memory, rechunk, row_index_name, row_index_offset, ignore_errors)

@@ -484,7 +484,7 @@ RPolarsExpr$to_physical <- function() .Call(wrap__RPolarsExpr__to_physical, self

RPolarsExpr$cast <- function(data_type, strict) .Call(wrap__RPolarsExpr__cast, self, data_type, strict)

RPolarsExpr$sort <- function(descending, nulls_last) .Call(wrap__RPolarsExpr__sort, self, descending, nulls_last)
RPolarsExpr$sort_with <- function(descending, nulls_last) .Call(wrap__RPolarsExpr__sort_with, self, descending, nulls_last)

RPolarsExpr$arg_sort <- function(descending, nulls_last) .Call(wrap__RPolarsExpr__arg_sort, self, descending, nulls_last)

@@ -500,7 +500,7 @@ RPolarsExpr$search_sorted <- function(element) .Call(wrap__RPolarsExpr__search_s

RPolarsExpr$gather <- function(idx) .Call(wrap__RPolarsExpr__gather, self, idx)

RPolarsExpr$sort_by <- function(by, descending) .Call(wrap__RPolarsExpr__sort_by, self, by, descending)
RPolarsExpr$sort_by <- function(by, descending, nulls_last, maintain_order, multithreaded) .Call(wrap__RPolarsExpr__sort_by, self, by, descending, nulls_last, maintain_order, multithreaded)

RPolarsExpr$backward_fill <- function(limit) .Call(wrap__RPolarsExpr__backward_fill, self, limit)

@@ -690,7 +690,7 @@ RPolarsExpr$list_gather <- function(index, null_on_oob) .Call(wrap__RPolarsExpr_

RPolarsExpr$list_gather_every <- function(n, offset) .Call(wrap__RPolarsExpr__list_gather_every, self, n, offset)

RPolarsExpr$list_get <- function(index) .Call(wrap__RPolarsExpr__list_get, self, index)
RPolarsExpr$list_get <- function(index, null_on_oob) .Call(wrap__RPolarsExpr__list_get, self, index, null_on_oob)

RPolarsExpr$list_join <- function(separator, ignore_nulls) .Call(wrap__RPolarsExpr__list_join, self, separator, ignore_nulls)

@@ -742,7 +742,7 @@ RPolarsExpr$arr_arg_min <- function() .Call(wrap__RPolarsExpr__arr_arg_min, self

RPolarsExpr$arr_arg_max <- function() .Call(wrap__RPolarsExpr__arr_arg_max, self)

RPolarsExpr$arr_get <- function(index) .Call(wrap__RPolarsExpr__arr_get, self, index)
RPolarsExpr$arr_get <- function(index, null_on_oob) .Call(wrap__RPolarsExpr__arr_get, self, index, null_on_oob)

RPolarsExpr$arr_join <- function(separator, ignore_nulls) .Call(wrap__RPolarsExpr__arr_join, self, separator, ignore_nulls)

@@ -1024,7 +1024,7 @@ RPolarsExpr$str_slice <- function(offset, length) .Call(wrap__RPolarsExpr__str_s

RPolarsExpr$str_explode <- function() .Call(wrap__RPolarsExpr__str_explode, self)

RPolarsExpr$str_parse_int <- function(radix, strict) .Call(wrap__RPolarsExpr__str_parse_int, self, radix, strict)
RPolarsExpr$str_parse_int <- function(base, strict) .Call(wrap__RPolarsExpr__str_parse_int, self, base, strict)

RPolarsExpr$str_reverse <- function() .Call(wrap__RPolarsExpr__str_reverse, self)

@@ -1176,7 +1176,7 @@ RPolarsLazyFrame$join_asof <- function(other, left_on, right_on, left_by, right_

RPolarsLazyFrame$join <- function(other, left_on, right_on, how, validate, join_nulls, suffix, allow_parallel, force_parallel) .Call(wrap__RPolarsLazyFrame__join, self, other, left_on, right_on, how, validate, join_nulls, suffix, allow_parallel, force_parallel)

RPolarsLazyFrame$sort_by_exprs <- function(by, dotdotdot, descending, nulls_last, maintain_order) .Call(wrap__RPolarsLazyFrame__sort_by_exprs, self, by, dotdotdot, descending, nulls_last, maintain_order)
RPolarsLazyFrame$sort_by_exprs <- function(by, dotdotdot, descending, nulls_last, maintain_order, multithreaded) .Call(wrap__RPolarsLazyFrame__sort_by_exprs, self, by, dotdotdot, descending, nulls_last, maintain_order, multithreaded)

RPolarsLazyFrame$melt <- function(id_vars, value_vars, value_name, variable_name, streamable) .Call(wrap__RPolarsLazyFrame__melt, self, id_vars, value_vars, value_name, variable_name, streamable)

@@ -1198,7 +1198,7 @@ RPolarsLazyFrame$clone_in_rust <- function() .Call(wrap__RPolarsLazyFrame__clone

RPolarsLazyFrame$with_context <- function(contexts) .Call(wrap__RPolarsLazyFrame__with_context, self, contexts)

RPolarsLazyFrame$rolling <- function(index_column, period, offset, closed, by, check_sorted) .Call(wrap__RPolarsLazyFrame__rolling, self, index_column, period, offset, closed, by, check_sorted)
RPolarsLazyFrame$rolling <- function(index_column, period, offset, closed, group_by, check_sorted) .Call(wrap__RPolarsLazyFrame__rolling, self, index_column, period, offset, closed, group_by, check_sorted)

RPolarsLazyFrame$group_by_dynamic <- function(index_column, every, period, offset, label, include_boundaries, closed, by, start_by, check_sorted) .Call(wrap__RPolarsLazyFrame__group_by_dynamic, self, index_column, every, period, offset, label, include_boundaries, closed, by, start_by, check_sorted)

@@ -1250,7 +1250,7 @@ RPolarsSeries$n_unique <- function() .Call(wrap__RPolarsSeries__n_unique, self)

RPolarsSeries$name <- function() .Call(wrap__RPolarsSeries__name, self)

RPolarsSeries$sort_mut <- function(descending, nulls_last) .Call(wrap__RPolarsSeries__sort_mut, self, descending, nulls_last)
RPolarsSeries$sort <- function(descending, nulls_last, multithreaded) .Call(wrap__RPolarsSeries__sort, self, descending, nulls_last, multithreaded)

RPolarsSeries$value_counts <- function(sort, parallel) .Call(wrap__RPolarsSeries__value_counts, self, sort, parallel)

11 changes: 9 additions & 2 deletions R/functions__lazy.R
@@ -1240,6 +1240,8 @@ pl_arg_where = function(condition) {
#' @param ... Column(s) to arg sort by. Can be Expr(s) or something coercible
#' to Expr(s). Strings are parsed as column names.
#' @inheritParams Expr_sort
#' @inheritParams Series_sort
#' @inheritParams LazyFrame_sort
#'
#' @return Expr
#' @seealso [$arg_sort()][Expr_arg_sort()] to find the row indices that would
@@ -1259,7 +1261,12 @@ pl_arg_where = function(condition) {
#' df$with_columns(
#' arg_sort_a = pl$arg_sort_by(pl$col("a") * -1)
#' )
pl_arg_sort_by = function(..., descending = FALSE) {
pl_arg_sort_by = function(
...,
descending = FALSE,
nulls_last = FALSE,
multithreaded = TRUE,
maintain_order = FALSE) {
dots = list2(...)

# The first argument must be a column, not columns
@@ -1268,7 +1275,7 @@ pl_arg_sort_by = function(..., descending = FALSE) {
dots = unlist(dots, recursive = FALSE)
}

arg_sort_by(dots, descending) |>
arg_sort_by(dots, descending = descending, nulls_last = nulls_last, multithreaded = multithreaded, maintain_order = maintain_order) |>
unwrap("in pl$arg_sort_by():")
}
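
A hedged sketch of calling `pl$arg_sort_by()` with the newly exposed options (the data frame below is hypothetical; only the argument names come from this diff):

```r
library(polars)

# Hypothetical data with one missing value in `a`
df = pl$DataFrame(
  a = c(0, 1, NA, 4),
  b = c(3, 2, 3, 2)
)

# Row indices that would sort `a` in descending order, with nulls placed last
out = df$with_columns(
  arg_sort_a = pl$arg_sort_by(
    pl$col("a"),
    descending = TRUE,
    nulls_last = TRUE
  )
)
out
```

Since the columns are collected through `...`, all of the sort options have to be given by name.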

11 changes: 6 additions & 5 deletions R/lazyframe__lazy.R
@@ -1296,15 +1296,15 @@ LazyFrame_join = function(
}


#' Sort a LazyFrame
#' @description Sort by one or more Expressions.
#' Sort the LazyFrame by the given columns
#'
#' @inheritParams Series_sort
#' @param by Column(s) to sort by. Can be character vector of column names,
#' a list of Expr(s) or a list with a mix of Expr(s) and column names.
#' @param ... More columns to sort by as above but provided one Expr per argument.
#' @param descending Logical. Sort in descending order (default is `FALSE`). This must be
#' either of length 1 or a logical vector of the same length as the number of
#' Expr(s) specified in `by` and `...`.
#' @param nulls_last Logical. Place `NULL`s at the end? Default is `FALSE`.
#' @param maintain_order Whether the order should be maintained if elements are
#' equal. If `TRUE`, streaming is not possible and performance might be worse
#' since this requires a stable sort.
@@ -1326,10 +1326,11 @@ LazyFrame_sort = function(
...,
descending = FALSE,
nulls_last = FALSE,
maintain_order = FALSE) {
maintain_order = FALSE,
multithreaded = TRUE) {
.pr$LazyFrame$sort_by_exprs(
self, unpack_list(by, .context = "in $sort():"), err_on_named_args(...),
descending, nulls_last, maintain_order
descending, nulls_last, maintain_order, multithreaded
) |>
unwrap("in $sort():")
}
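
For reference, a small sketch of the new `multithreaded` argument on `<LazyFrame>$sort()` (the frame and its columns are invented for this example):

```r
library(polars)

# Hypothetical data
lf = pl$LazyFrame(
  a = c(3, 1, NA, 2),
  b = c("w", "x", "y", "z")
)

# Sort on a single thread; the default in this diff is multithreaded = TRUE
out = lf$sort(
  "a",
  descending = TRUE,
  nulls_last = TRUE,
  multithreaded = FALSE
)$collect()
out
```

Setting `multithreaded = FALSE` is presumably most useful when the sort already runs inside a parallel context; the diff itself does not state a recommendation.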