Refactor variance.
This large change does a major refactoring of `token::variance`.
Operations are now described by traits, invariants now specify bounds,
and natural invariants (those that model natural "counting" numbers)
have well-defined bounds.

This is not quite working yet, but yields some powerful insights. For
one, the `union` operator may not be necessary. It propagates
unboundedness, which did not happen before when, for example, taking the
conjunction of unbounded and invariant variance. Now, this defers to
`Bound<B>`, where `B` is the bound type of the invariant. For natural
invariants, this yields unbounded when zero and bounded (with an
appropriate range) when non-zero. This works naturally for depth: a
depth of zero and an unbounded depth remain unbounded, but a depth of
one and an unbounded depth yield an open range from that depth to
infinity. Consider `a/**` (0 + INF) and `a/b/**` (0 + 1 + 0 + INF).
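
The depth arithmetic described above can be sketched as follows. This is a minimal model under stated assumptions, not the crate's implementation: the `Depth` type and its methods are hypothetical stand-ins for `Bound<B>`, and an upper bound of `None` stands for INF.

```rust
/// Hypothetical model of a depth range; not the crate's actual `Bound<B>` type.
/// `upper == None` stands for an infinite upper bound (INF).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Depth {
    pub lower: usize,
    pub upper: Option<usize>,
}

impl Depth {
    /// An exact (invariant) depth.
    pub fn exactly(n: usize) -> Self {
        Depth { lower: n, upper: Some(n) }
    }

    /// Zero to infinity, as contributed by a tree wildcard such as `/**`.
    pub fn unbounded() -> Self {
        Depth { lower: 0, upper: None }
    }

    /// Sums two depths: lower bounds add, and an infinite upper bound absorbs
    /// any finite one.
    pub fn plus(self, other: Depth) -> Depth {
        Depth {
            lower: self.lower + other.lower,
            upper: self.upper.zip(other.upper).map(|(a, b)| a + b),
        }
    }

    /// True only when the lower bound is zero and there is no upper bound.
    pub fn is_unbounded(self) -> bool {
        self.lower == 0 && self.upper.is_none()
    }
}
```

In this model, `Depth::exactly(0).plus(Depth::unbounded())` stays unbounded (the `a/**` case), while summing `0 + 1 + 0 + INF` gives a lower bound of one with no upper bound (the `a/b/**` case).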

`Repetition` seems to be quite broken too. For example, an invariant
(converged) repetition should not only repeat invariants, but also
repeat bounds! For example, consider `<<?:1,2>:2>`. The size of the
inner repetition is `Variant(Bounded(1,2))`. The outer repetition is
invariant, but does not apply a repetition to the bounds! The size of
the outer repetition _should_ be `Variant(Bounded(2,4))`.
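
A sketch of the expected behavior, assuming a simplified model: the `Variance` and `Bound` enums below borrow names from this message but are hypothetical and much smaller than the real types in `token::variance`.

```rust
/// Hypothetical, simplified bound over sizes.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Bound {
    Bounded(usize, usize),
    Unbounded,
}

/// Hypothetical, simplified variance of a token's size.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Variance {
    Invariant(usize),
    Variant(Bound),
}

impl Variance {
    /// Applies an invariant (converged) repetition of `n` to this variance.
    /// The repetition scales invariants and also both ends of a bounded range.
    pub fn repeat(self, n: usize) -> Variance {
        match self {
            Variance::Invariant(size) => Variance::Invariant(size * n),
            Variance::Variant(Bound::Bounded(lower, upper)) => {
                Variance::Variant(Bound::Bounded(lower * n, upper * n))
            }
            // Repeating an unbounded size leaves it unbounded.
            Variance::Variant(Bound::Unbounded) => Variance::Variant(Bound::Unbounded),
        }
    }
}
```

For `<<?:1,2>:2>`, the inner size `Variant(Bounded(1, 2))` repeated twice becomes `Variant(Bounded(2, 4))` in this model.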

There may be more to figure out regarding `UnitBound`. Such a bound may
not be possible. Take a look at the conversions from invariants to
`Bound<UnitBound>`.
olson-sean-k committed Feb 3, 2024
1 parent a10f507 commit b032530
Showing 9 changed files with 1,256 additions and 392 deletions.
10 changes: 5 additions & 5 deletions src/encode.rs
@@ -8,7 +8,7 @@ use std::borrow::{Borrow, Cow};
use std::fmt::Display;
use thiserror::Error;

- use crate::token::{Bound, ConcatenationTree, Token, TokenTopology};
+ use crate::token::{ConcatenationTree, Token, TokenTopology};

/// A regular expression that never matches.
///
@@ -296,7 +296,7 @@ fn encode<'t, T>(
Concatenation(_) => unreachable!(),
Repetition(repetition) => {
let encoding = {
- let cardinality = repetition.cardinality();
+ let variance = repetition.variance();
let mut pattern = String::new();
pattern.push_str("(?:");
encode::<Token<_>>(
@@ -305,11 +305,11 @@ fn encode<'t, T>(
&mut pattern,
repetition.token(),
);
- pattern.push_str(&if let Bound::Bounded(upper) = cardinality.upper() {
- format!("){{{},{}}}", cardinality.lower(), upper)
+ pattern.push_str(&if let Some(upper) = variance.upper().into_usize() {
+ format!("){{{},{}}}", variance.lower().into_usize(), upper)
}
else {
- format!("){{{},}}", cardinality.lower())
+ format!("){{{},}}", variance.lower().into_usize())
});
pattern
};
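
The quantifier formatting in the hunk above can be isolated as a small function. This is a sketch only: `close_repetition` is a hypothetical helper, and it assumes the lower bound is available as a plain `usize` and the upper bound as an `Option<usize>`, which the diff does not confirm for the `into_usize` conversions.

```rust
/// Hypothetical helper: closes a non-capturing group and appends a regex
/// repetition quantifier. `None` means there is no upper bound.
pub fn close_repetition(lower: usize, upper: Option<usize>) -> String {
    match upper {
        // Doubled braces are literal braces in `format!`, giving `){lower,upper}`.
        Some(upper) => format!("){{{},{}}}", lower, upper),
        // No upper bound gives the open form `){lower,}`.
        None => format!("){{{},}}", lower),
    }
}
```

Appended to an opening `(?:` and an encoded sub-pattern, `close_repetition(1, Some(3))` yields the suffix `){1,3}` and `close_repetition(0, None)` yields `){0,}`.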
39 changes: 20 additions & 19 deletions src/rule.rs
@@ -28,7 +28,7 @@ use thiserror::Error;
use crate::diagnostics::{CompositeSpan, CorrelatedSpan, SpanExt as _, Spanned};
use crate::token::walk::{self, TokenEntry};
use crate::token::{
- self, BranchKind, Cardinality, ExpressionMetadata, Repetition, Size, Token, TokenTree,
+ self, BranchKind, ExpressionMetadata, NaturalVariance, Repetition, Size, Token, TokenTree,
Tokenized,
};
use crate::{Any, BuildError, Glob, Pattern};
@@ -374,7 +374,7 @@ where
A: Spanned,
{
boundary(&tree)?;
- cardinality(&tree)?;
+ bounds(&tree)?;
group(&tree)?;
size(&tree)?;
Ok(Checked { inner: tree })
@@ -665,20 +665,20 @@ where
fn check_group_repetition<'i, 't, A>(
terminals: Terminals<&'i Token<'t, A>>,
outer: Outer<'i, 't, A>,
- cardinality: Cardinality<usize>,
+ variance: NaturalVariance,
) -> Result<(), CorrelatedError>
where
A: Spanned,
{
let Outer { left, .. } = outer;
- let lower = *cardinality.lower();
+ let lower = variance.lower().into_bound();
match terminals.map(|token| (token, token.as_leaf())) {
// The repetition is preceded by a termination; disallow rooted sub-globs with a zero
// lower bound.
//
// For example, `</foo:0,>`.
Only((inner, Some(Separator(_)))) | StartEnd((inner, Some(Separator(_))), _)
- if left.is_none() && lower == 0 =>
+ if left.is_none() && lower.is_unbounded() =>
{
Err(CorrelatedError::new(
RuleErrorKind::RootedSubGlob,
@@ -692,7 +692,7 @@ where
// For example, `</**/foo>`.
Only((inner, Some(Wildcard(Tree { has_root: true }))))
| StartEnd((inner, Some(Wildcard(Tree { has_root: true }))), _)
- if left.is_none() && lower == 0 =>
+ if left.is_none() && lower.is_unbounded() =>
{
Err(CorrelatedError::new(
RuleErrorKind::RootedSubGlob,
@@ -761,7 +761,7 @@ where
let concatenation = token.concatenation();
if let Some(terminals) = concatenation.terminals() {
check_group(terminals, outer).map_err(diagnose)?;
- check_group_repetition(terminals, outer, repetition.cardinality())
+ check_group_repetition(terminals, outer, repetition.variance())
.map_err(diagnose)?;
}
tokens.push_back(token);
@@ -773,16 +773,15 @@ where
Ok(())
}

- // TODO: `Cardinality` orders its inputs and so repetition expressions like `<a:6,1>` where the
- // finite bounds are in nonconventional order cannot be detected here. The parser disallows
- // any ambiguous syntax, but a question remains: should such bounds be rejected? If so,
- // `Cardinality` may need an unchecked constructor.
+ // Arguably, this function enforces _syntactic_ rules. Any bound specification can be used to
+ // construct a functional repetition token, but this is only because `NaturalVariance` has an
+ // interpretation of these bounds (such as `<_:0,0>` and `<_:10,1>`). These interpretations may be
+ // counterintuitive (especially bounds converged on zero, which are considered entirely unbounded),
+ // so they are rejected here.
//
- // For now, this function only checks for empty bounds like `<a:0,0>`.
- //
- // This highlights the advantages of using separate syntactic and semantic (intermediate)
- // trees.
- fn cardinality<'t, A>(tree: &Tokenized<'t, A>) -> Result<(), RuleError<'t>>
+ // Put another way, the _semantics_ of ranges like `0,0` differ between glob expressions and
+ // `NaturalVariance`, and the semantics of glob expressions must reject such ranges.
+ fn bounds<'t, A>(tree: &Tokenized<'t, A>) -> Result<(), RuleError<'t>>
where
A: Spanned,
{
@@ -791,9 +790,11 @@ where
.find(|token| {
token
.as_repetition()
- .map(Repetition::cardinality)
- .and_then(|cardinality| cardinality.convergence().copied())
- .map_or(false, |n| n == 0)
+ .map(Repetition::bound_specification)
+ .and_then(|(lower, upper)| upper.map(|upper| (lower, upper)))
+ .map_or(false, |(lower, upper)| {
+ (lower > upper) || (lower == 0 && lower == upper)
+ })
})
{
Err(RuleError::new(
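
The predicate that the new `bounds` rule applies to each repetition can be sketched on its own. This assumes the bound specification is a lower `usize` paired with an optional upper `usize`, as the closure in the hunk suggests. The function name `is_rejected` is hypothetical.

```rust
/// Hypothetical stand-alone form of the check in `bounds`: a repetition's bound
/// specification is rejected when its finite bounds are out of order (such as
/// `<_:10,1>`) or converge on zero (such as `<_:0,0>`).
pub fn is_rejected(lower: usize, upper: Option<usize>) -> bool {
    // An open upper bound (`None`) is never rejected by this rule.
    upper.map_or(false, |upper| lower > upper || (lower == 0 && lower == upper))
}
```

Under this sketch, `0,0` and `10,1` are rejected, while `1,2`, `2,2`, and an open range like `0,` are accepted.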