Refactor the token module.
This major change refactors most APIs in the `token` module. Here are
some of the primary goals:

- Token APIs better represent and support the tree data structure.
- Token trees structurally distinguish branch and leaf tokens.
- Tokens and token topology are never implicit. Previously,
  concatenations and alternatives had no explicit representation, for
  example.
- Variance computation is more consistent, flexible, and complete.
- Token trees are always traversed iteratively (within the `token`
  module; parsing and encoding are still essentially recursive).

These changes present better APIs and capabilities that enable further
improvements and new features.
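
The goals listed above (explicit branch and leaf tokens, explicit concatenations and alternations, iterative traversal) can be illustrated with a small sketch. The `Token` enum and its `children`, `walk`, and `is_leaf` methods below are hypothetical and much simpler than the real `token` module; they only show the general shape of such a tree and how it can be traversed with an explicit stack instead of recursion.

```rust
// Hypothetical, simplified model. These are NOT the actual `wax::token` types.
#[derive(Debug)]
enum Token {
    // Leaf tokens have no children.
    Literal(String),
    Wildcard,
    // Branch tokens own their children, so topology is always explicit.
    Concatenation(Vec<Token>),
    Alternation(Vec<Token>),
}

impl Token {
    /// Children of a branch token; an empty slice for leaf tokens.
    fn children(&self) -> &[Token] {
        match self {
            Token::Concatenation(tokens) | Token::Alternation(tokens) => tokens.as_slice(),
            Token::Literal(_) | Token::Wildcard => &[],
        }
    }

    fn is_leaf(&self) -> bool {
        matches!(self, Token::Literal(_) | Token::Wildcard)
    }

    /// Collects every token in depth-first pre-order using an explicit stack
    /// rather than recursive calls.
    fn walk(&self) -> Vec<&Token> {
        let mut visited = Vec::new();
        let mut stack = vec![self];
        while let Some(token) = stack.pop() {
            visited.push(token);
            // Push children in reverse so that the leftmost child is popped first.
            stack.extend(token.children().iter().rev());
        }
        visited
    }
}

fn main() {
    // Models the glob expression `a{*,b}`.
    let tree = Token::Concatenation(vec![
        Token::Literal("a".into()),
        Token::Alternation(vec![Token::Wildcard, Token::Literal("b".into())]),
    ]);
    let tokens = tree.walk();
    let leaves = tokens.iter().filter(|token| token.is_leaf()).count();
    println!("{} tokens, {} leaves", tokens.len(), leaves);
}
```

An explicit stack keeps traversal depth independent of the call stack, which is one common reason to prefer iteration for deeply nested trees.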

As part of this change, the MSRV has been bumped to `1.75.0`.
olson-sean-k committed Feb 14, 2024
1 parent a7a02f6 commit 68b94ff
Showing 22 changed files with 4,398 additions and 1,923 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/continuous-integration.yml
@@ -44,7 +44,7 @@ jobs:
matrix:
os: [macOS-latest, ubuntu-latest, windows-latest]
toolchain:
- - 1.66.1 # Minimum.
+ - 1.75.0 # Minimum.
- stable
- beta
- nightly
2 changes: 1 addition & 1 deletion Cargo.toml
@@ -6,7 +6,7 @@ description = "Opinionated and portable globs that can be matched against paths
repository = "https://github.com/olson-sean-k/wax"
readme = "README.md"
edition = "2021"
- rust-version = "1.66.1"
+ rust-version = "1.75.0"
license = "MIT"
keywords = [
"glob",
26 changes: 13 additions & 13 deletions README.md
@@ -153,7 +153,7 @@ occurrence of that literal while `$` stops at the first occurence.
The exactly-one wildcard `?` matches any single character within a component
(**never path separators**). Exactly-one wildcards do not group automatically,
so a pattern of contiguous wildcards such as `???` form distinct captures for
- each `?` wildcard. [An alternative](#alternatives) can be used to group
+ each `?` wildcard. [An alternation](#alternations) can be used to group
exactly-one wildcards into a single capture, such as `{???}`.

The tree wildcard `**` matches any characters across zero or more components.
@@ -210,21 +210,21 @@ concern, but **such patterns should be avoided**.
Character classes have limited utility on their own, but compose well with
[repetitions](#repetitions).

- ### Alternatives
+ ### Alternations

- Alternatives match an arbitrary sequence of one or more comma separated
+ Alternations match an arbitrary sequence of one or more comma separated
sub-globs delimited by curly braces `{...,...}`. For example, `{a?c,x?z,foo}`
- matches any of the sub-globs `a?c`, `x?z`, or `foo`. Alternatives may be
+ matches any of the alternative globs `a?c`, `x?z`, or `foo`. Alternations may be
arbitrarily nested and composed with [repetitions](#repetitions).

- Alternatives form a single capture group regardless of the contents of their
+ Alternations form a single capture group regardless of the contents of their
sub-globs. This capture is formed from the complete match of the sub-glob, so if
- the alternative `{a?c,x?z}` matches `abc`, then the captured text will be `abc`
- (**not** `b`). Alternatives can be used to group captures using a single
- sub-glob, such as `{*.{go,rs}}` to capture an entire file name with a particular
- extension or `{???}` to group a sequence of exactly-one wildcards.
+ the alternation `{a?c,x?z}` matches the path `abc`, then the captured text will
+ be `abc` (**not** `b`). Alternations can be used to group captures using a
+ single sub-glob, such as `{*.{go,rs}}` to capture an entire file name with a
+ particular extension or `{???}` to group a sequence of exactly-one wildcards.

- Alternatives must consider adjacency rules and neighboring patterns. For
+ Alternations must consider adjacency rules and neighboring patterns. For
example, `*{a,b*}` is allowed but `*{a,*b}` is not. Additionally, they may not
contain a sub-glob consisting of a singular tree wildcard `**` and cannot root a
glob expression as this could cause the expression to match or walk overlapping
@@ -238,7 +238,7 @@ precedes the colon and an optional bounds specification follows it. For example,
`<a*/:0,>` matches the sub-glob `a*/` zero or more times. Though not implicit
like tree [wildcards](#wildcards), **repetitions can match across component
boundaries** (and can themselves include tree wildcards). Repetitions may be
- arbitrarily nested and composed with [alternatives](#alternatives).
+ arbitrarily nested and composed with [alternations](#alternations).

Bound specifications are formed from inclusive lower and upper bounds separated
by a comma `,`, such as `:1,4` to match between one and four times. The upper
@@ -284,7 +284,7 @@ let any = wax::any(["**/*.txt", "src/**/*.rs"]).unwrap();
assert!(any.is_match("src/lib.rs"));
```

- Unlike [alternatives](#alternatives), `Any` supports patterns with overlapping
+ Unlike [alternations](#alternations), `Any` supports patterns with overlapping
trees (rooted and unrooted expressions). However, combinators can only perform
logical matches and it is not possible to match an `Any` against a directory
tree (as with `Glob::walk`).
@@ -332,7 +332,7 @@ Error: wax::glob::adjacent_zero_or_more
1 | doc/**/*{.md,.tex,*.txt}
: |^^^^^^^^|^^^^^^^
: | | `-- here
- : | `-- in this alternative
+ : | `-- in this alternation
: `-- here
`----
```
4 changes: 2 additions & 2 deletions src/capture.rs
@@ -89,7 +89,7 @@ impl From<OwnedText> for MaybeOwnedText<'static> {
/// # Examples
///
/// Capturing tokens and matched text can be used to isolate sub-text in a match. For example, the
- /// file name of a match can be extracted using an alternative to group patterns.
+ /// file name of a match can be extracted using an alternation to group patterns.
///
/// ```rust
/// use wax::{CandidatePath, Glob, Program};
@@ -155,7 +155,7 @@ impl<'t> MatchedText<'t> {
/// implicit complete text. For example, the sub-expression `*` in the glob expression `*.txt`
/// is at index one and will exclude the suffix `.txt` in its matched text.
///
- /// Alternative and repetition patterns group their sub-globs into a single capture, so it is
+ /// Alternation and repetition patterns group their sub-globs into a single capture, so it is
/// not possible to isolate matched text from their sub-globs. This can be used to explicitly
/// group matched text, such as isolating an entire matched file name using an expression like
/// `{*.{go,rs}}`.
40 changes: 23 additions & 17 deletions src/diagnostics/miette.rs
@@ -8,9 +8,9 @@ use tardar::{
};
use thiserror::Error;

- use crate::diagnostics::SpanExt as _;
+ use crate::diagnostics::{SpanExt, Spanned};
use crate::rule::{self, Checked};
- use crate::token::{self, TokenKind, TokenTree, Tokenized};
+ use crate::token::{self, Boundary, ExpressionMetadata, TokenTree, Tokenized};
use crate::Glob;

/// APIs for diagnosing globs.
@@ -40,7 +40,7 @@ impl<'t> Glob<'t> {
/// [`Glob::new`]: crate::Glob::new
pub fn diagnosed(expression: &'t str) -> DiagnosticResult<'t, Self> {
parse_and_diagnose(expression).and_then_diagnose(|tree| {
- Glob::compile(tree.as_ref().tokens())
+ Glob::compile::<Tokenized<_>>(tree.as_ref())
.into_error_diagnostic()
.map_output(|program| Glob { tree, program })
})
@@ -84,10 +84,12 @@ pub struct TerminatingSeparatorWarning<'t> {
span: SourceSpan,
}

- fn parse_and_diagnose(expression: &str) -> DiagnosticResult<Checked<Tokenized>> {
+ fn parse_and_diagnose(
+ expression: &str,
+ ) -> DiagnosticResult<Checked<Tokenized<ExpressionMetadata>>> {
token::parse(expression)
.into_error_diagnostic()
- .and_then_diagnose(|tokenized| rule::check(tokenized).into_error_diagnostic())
+ .and_then_diagnose(|tree| rule::check(tree).into_error_diagnostic())
.and_then_diagnose(|checked| {
// TODO: This should accept `&Checked`.
diagnose(checked.as_ref())
@@ -96,37 +98,41 @@ fn parse_and_diagnose(expression: &str) -> DiagnosticResult<Checked<Tokenized>>
})
}

- fn diagnose<'i, 't>(
- tokenized: &'i Tokenized<'t>,
- ) -> impl 'i + Iterator<Item = BoxedDiagnostic<'t>> {
+ fn diagnose<'i, 't, A>(tree: &'i Tokenized<'t, A>) -> impl 'i + Iterator<Item = BoxedDiagnostic<'t>>
+ where
+ A: Spanned,
+ {
+ let token = tree.as_token();
None.into_iter()
.chain(
- token::literals(tokenized.tokens())
+ token
+ .literals()
.filter(|(_, literal)| literal.is_semantic_literal())
.map(|(component, literal)| {
Box::new(SemanticLiteralWarning {
- expression: tokenized.expression().clone(),
+ expression: tree.expression().clone(),
literal: literal.text().clone(),
span: component
.tokens()
.iter()
- .map(|token| *token.annotation())
- .reduce(|left, right| left.union(&right))
+ .map(|token| token.annotation().span())
+ .copied()
+ .reduce(SpanExt::union)
.map(SourceSpan::from)
.expect("no tokens in component"),
}) as BoxedDiagnostic
}),
)
.chain(
- tokenized
- .tokens()
+ token
+ .concatenation()
.last()
.into_iter()
- .filter(|token| matches!(token.kind(), TokenKind::Separator(_)))
+ .filter(|token| matches!(token.boundary(), Some(Boundary::Separator)))
.map(|token| {
Box::new(TerminatingSeparatorWarning {
- expression: tokenized.expression().clone(),
- span: (*token.annotation()).into(),
+ expression: tree.expression().clone(),
+ span: (*token.annotation().span()).into(),
}) as BoxedDiagnostic
}),
)
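
The hunk above folds the spans of a component's tokens into one covering span with `.copied().reduce(SpanExt::union)`. A minimal standalone sketch of that pattern follows. `Span` and `SpanExt` mirror the definitions shown in `src/diagnostics/mod.rs`, while the `covering_span` helper and the sample spans are hypothetical and exist only for illustration.

```rust
use std::cmp;

/// A span as `(start, length)`, mirroring the `Span` alias in the diff.
type Span = (usize, usize);

trait SpanExt {
    fn union(self, other: Self) -> Self;
}

impl SpanExt for Span {
    fn union(self, other: Self) -> Self {
        let start = cmp::min(self.0, other.0);
        let end = cmp::max(self.0 + self.1, other.0 + other.1);
        (start, end - start)
    }
}

/// Hypothetical helper: folds spans into the smallest span that covers all of
/// them, or `None` if there are no spans.
fn covering_span(spans: &[Span]) -> Option<Span> {
    // Because `union` takes its operands by value, it has the shape
    // `fn(Span, Span) -> Span` and can be passed to `reduce` directly.
    spans.iter().copied().reduce(SpanExt::union)
}

fn main() {
    // Three adjacent one-character tokens starting at offset 4.
    assert_eq!(covering_span(&[(4, 1), (5, 1), (6, 1)]), Some((4, 3)));
    // Spans need not be adjacent or ordered; the gap is covered as well.
    assert_eq!(covering_span(&[(10, 2), (0, 3)]), Some((0, 12)));
    // An empty input yields `None`, which the diff handles with `expect`.
    assert_eq!(covering_span(&[]), None);
}
```

With the previous `&self` signature, the same fold needed a closure such as `|left, right| left.union(&right)`, as the removed lines show.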
30 changes: 28 additions & 2 deletions src/diagnostics/mod.rs
@@ -28,17 +28,43 @@ use std::fmt::{self, Display, Formatter};
pub type Span = (usize, usize);

pub trait SpanExt {
- fn union(&self, other: &Self) -> Self;
+ fn union(self, other: Self) -> Self;
}

impl SpanExt for Span {
- fn union(&self, other: &Self) -> Self {
+ fn union(self, other: Self) -> Self {
let start = cmp::min(self.0, other.0);
let end = cmp::max(self.0 + self.1, other.0 + other.1);
(start, end - start)
}
}

+ pub trait Spanned {
+ fn span(&self) -> &Span;
+
+ fn span_mut(&mut self) -> &mut Span;
+
+ fn map_span<F>(mut self, f: F) -> Self
+ where
+ Self: Sized,
+ F: FnOnce(Span) -> Span,
+ {
+ let span = *self.span();
+ *self.span_mut() = f(span);
+ self
+ }
+ }
+
+ impl Spanned for Span {
+ fn span(&self) -> &Span {
+ self
+ }
+
+ fn span_mut(&mut self) -> &mut Span {
+ self
+ }
+ }
+
/// Error associated with a [`Span`] within a glob expression.
///
/// Located errors describe specific instances of an error within a glob expression. Types that
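
The `Spanned` trait added in this hunk lets code operate on any annotation that carries a span, not only on a bare `Span`. The sketch below reproduces the trait as shown in the diff and applies it to an `Annotation` type and a `shift` function; both of those are hypothetical and are not taken from the crate.

```rust
type Span = (usize, usize);

// Reproduced from the diff above.
trait Spanned {
    fn span(&self) -> &Span;

    fn span_mut(&mut self) -> &mut Span;

    fn map_span<F>(mut self, f: F) -> Self
    where
        Self: Sized,
        F: FnOnce(Span) -> Span,
    {
        let span = *self.span();
        *self.span_mut() = f(span);
        self
    }
}

impl Spanned for Span {
    fn span(&self) -> &Span {
        self
    }

    fn span_mut(&mut self) -> &mut Span {
        self
    }
}

// Hypothetical annotation that carries extra data alongside its span.
#[derive(Debug, PartialEq)]
struct Annotation {
    span: Span,
    label: &'static str,
}

impl Spanned for Annotation {
    fn span(&self) -> &Span {
        &self.span
    }

    fn span_mut(&mut self) -> &mut Span {
        &mut self.span
    }
}

/// Hypothetical generic function: moves any spanned value by `offset` while
/// leaving its length and any other data untouched.
fn shift<A: Spanned>(annotation: A, offset: usize) -> A {
    annotation.map_span(|(start, length)| (start + offset, length))
}

fn main() {
    // Works on a bare span...
    assert_eq!(shift((0, 4), 5), (5, 4));
    // ...and on a richer annotation, preserving the label.
    let shifted = shift(Annotation { span: (2, 3), label: "literal" }, 10);
    assert_eq!(shifted, Annotation { span: (12, 3), label: "literal" });
}
```

This matches how `diagnose` in `src/diagnostics/miette.rs` is bounded by `A: Spanned` and reads spans through `token.annotation().span()`.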