Update comments and various token module fixes.
olson-sean-k committed Feb 14, 2024
1 parent 7f3d8cb commit 076075c
Showing 6 changed files with 96 additions and 58 deletions.
19 changes: 14 additions & 5 deletions src/token/mod.rs
@@ -267,9 +267,10 @@ impl<'t, A> Tokenized<'t, A>
where
A: Default + Spanned,
{
// TODO: This function is limited to immediate concatenations, but probably shouldn't be. As a
// result, expressions like `{.cargo/**/*.crate}` do not partition (in this case, into
// the path `.cargo` and glob `**/*.crate`).
// TODO: This function is limited to immediate concatenations and so expressions like
// `{.cargo/**/*.crate}` do not partition (into the path `.cargo` and glob
// `{**/*.crate}`). This requires a more sophisticated transformation of the token tree,
// but is probably worth supporting. Maybe this can be done via `FoldMap`.
pub fn partition(self) -> (PathBuf, Option<Self>) {
fn pop_expression_bytes(expression: &str, n: usize) -> &str {
let n = cmp::min(expression.len(), n);
@@ -473,7 +474,7 @@ impl<'t, A> Token<'t, A> {
self.as_leaf().and_then(LeafKind::boundary)
}

pub fn variance<'a, T>(&'a self) -> GlobVariance<T>
pub fn variance<T>(&self) -> GlobVariance<T>
where
TreeVariance<T>: Fold<'t, A, Term = GlobVariance<T>>,
T: Conjunction<Output = T> + Invariant,
@@ -502,6 +503,8 @@ impl<'t, A> Token<'t, A> {

let mut tokens = self.concatenation().iter().peekable();
if tokens.peek().map_or(false, |token| {
// This is a very general predicate, but at time of writing amounts to, "Is this a tree
// wildcard?"
token.has_root().is_always() && token.variance::<Text>().is_variant()
}) {
return (0, String::from(Separator::INVARIANT_TEXT));
@@ -594,10 +597,16 @@ impl<'t, A> Token<'t, A> {
}
}

// TODO: Exhaustiveness should be expressed with `When` rather than `bool`. In particular,
// alternations may _sometimes_ be exhaustive.
// TODO: There is a distinction between exhaustiveness of a glob and exhaustiveness of a match
// (this is also true of other properties). The latter can be important for performance
// optimization, but may also be useful in the public API (perhaps as part of
// `MatchedText`).
// NOTE: False positives in this function may cause logic errors and are completely
// unacceptable. The discovery of a false positive here almost certainly indicates a
// serious bug. False positives in negative patterns cause matching to incorrectly
// discard directory trees.
// discard directory trees in the `FileIterator::not` combinator.
pub fn is_exhaustive(&self) -> bool {
self.fold(TreeExhaustiveness)
.as_ref()
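The `partition` TODO above describes splitting a glob into an invariant path prefix and a remaining glob. The following is a minimal, string-based sketch of that idea, not the crate's token-based implementation; the helper names `is_meta` and `partition` and the chosen set of metacharacters are assumptions for illustration. Like the current implementation, it does not partition `{.cargo/**/*.crate}`, here simply because the expression begins with a metacharacter.

```rust
use std::path::PathBuf;

/// Hypothetical: characters treated as glob syntax in this sketch.
fn is_meta(c: char) -> bool {
    matches!(c, '*' | '?' | '[' | '{' | '<' | '\\')
}

/// Splits `expression` into an invariant path prefix and an optional glob
/// remainder. The split is made at the last `/` before the first
/// metacharacter, so only whole components move into the prefix.
fn partition(expression: &str) -> (PathBuf, Option<&str>) {
    match expression.find(is_meta) {
        // No metacharacters: the whole expression is a literal path.
        None => (PathBuf::from(expression), None),
        Some(first) => match expression[..first].rfind('/') {
            Some(sep) => {
                // Keep the root separator for rooted expressions like `/*`.
                let prefix = if sep == 0 { "/" } else { &expression[..sep] };
                (PathBuf::from(prefix), Some(&expression[sep + 1..]))
            }
            // The first component is already variant: no invariant prefix.
            None => (PathBuf::new(), Some(expression)),
        },
    }
}
```

For example, `a/b/*.txt` yields the prefix `a/b` and the glob `*.txt`, while an expression with no metacharacters is returned entirely as a path.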
16 changes: 6 additions & 10 deletions src/token/variance/bound.rs
@@ -8,10 +8,6 @@ use crate::token::variance::Variance;

pub use Boundedness::{Bounded, Unbounded};

// TODO: Reorganize type definitions and `impl`s in this module. Reorder from specific (e.g.,
// `Bound<NonZeroUsize>`) to general (e.g., `Bound<T>`) or in whichever way is consistent
// with other modules in the crate.

pub trait Cobound: Sized {
type Bound;

@@ -53,7 +49,7 @@ pub trait OpenedUpperBound: Sized {
}

#[derive(Clone, Copy, Debug, Eq, Hash, PartialEq)]
pub struct Operand<T> {
pub struct BinaryOperand<T> {
pub lhs: T,
pub rhs: T,
}
@@ -65,7 +61,7 @@ pub struct LowerUpper<L, U> {
}

pub type LowerUpperBound<B> = LowerUpper<Lower<B>, Upper<B>>;
pub type LowerUpperOperand<B> = LowerUpper<Operand<Lower<B>>, Operand<Upper<B>>>;
pub type LowerUpperOperand<B> = LowerUpper<BinaryOperand<Lower<B>>, BinaryOperand<Upper<B>>>;

pub type NaturalBound = Variance<Zero, NonZeroBound>;
pub type NaturalRange = Variance<usize, VariantRange>;
@@ -123,8 +119,8 @@ impl NaturalRange {
// here: the closed bound is considered unbounded when zero, but the open bound is only
// considered unbounded when `None`.
// NOTE: Given the naturals X and Y where X < Y, this defines an unconventional meaning for the
// range [Y,X] and repetitions like `<_:10,1>`: the bounds are reordered, so `<_:10,1>` and
// `<_:1,10>` are the same.
// range [Y,X] and therefore repetitions like `<_:10,1>`: the bounds are reordered, so
// `<_:10,1>` and `<_:1,10>` are the same.
pub fn from_closed_and_open<T>(closed: usize, open: T) -> Self
where
T: Into<Option<usize>>,
@@ -162,11 +158,11 @@ impl NaturalRange {
{
let lhs = self;
let LowerUpper { lower, upper } = f(LowerUpper {
lower: Operand {
lower: BinaryOperand {
lhs: lhs.lower(),
rhs: rhs.lower(),
},
upper: Operand {
upper: BinaryOperand {
lhs: lhs.upper(),
rhs: rhs.upper(),
},
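The reordering rule in the `from_closed_and_open` note, and the role of the renamed `BinaryOperand`, can be illustrated with a simplified range type. This is a hypothetical sketch: `Range` stands in for `NaturalRange` with `upper: None` meaning unbounded, and treating conjunction as the sum of each operand's lower and upper bounds is an assumption made for the example.

```rust
/// Pairs the left- and right-hand operands of a binary operation,
/// in the spirit of the renamed `BinaryOperand<T>`.
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
struct BinaryOperand<T> {
    lhs: T,
    rhs: T,
}

/// Hypothetical stand-in for `NaturalRange`: `upper: None` means unbounded.
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
struct Range {
    lower: usize,
    upper: Option<usize>,
}

impl Range {
    /// Builds the range for a repetition `<_:closed,open>`. With an explicit
    /// open bound, the bounds are reordered, so `<_:10,1>` equals `<_:1,10>`.
    fn from_closed_and_open(closed: usize, open: Option<usize>) -> Self {
        match open {
            None => Range { lower: closed, upper: None },
            Some(open) => Range {
                lower: closed.min(open),
                upper: Some(closed.max(open)),
            },
        }
    }

    /// Assumed conjunction: pair up the lower and upper bounds as operands and
    /// add them; an unbounded upper bound on either side stays unbounded.
    fn conjunction(self, rhs: Self) -> Self {
        let lower = BinaryOperand { lhs: self.lower, rhs: rhs.lower };
        let upper = BinaryOperand { lhs: self.upper, rhs: rhs.upper };
        Range {
            lower: lower.lhs + lower.rhs,
            upper: match (upper.lhs, upper.rhs) {
                (Some(lhs), Some(rhs)) => Some(lhs + rhs),
                _ => None,
            },
        }
    }
}
```

Grouping operands as `lhs`/`rhs` pairs per bound keeps the combining function independent of how each bound is represented, which is the shape of the closure argument in the `NaturalRange` hunk above.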
24 changes: 10 additions & 14 deletions src/token/variance/invariant/mod.rs
@@ -28,26 +28,22 @@ pub trait Invariant: Sized + Identity {
fn into_lower_bound(self) -> Boundedness<Self::Bound>;
}

// Breadth is the size of components in a glob expression (for some notion of size, probably UTF-8
// bytes). For example, the breadth of `{a/,a/b/}<c/d:2>` is two.
// Breadth is the maximum size (UTF-8 bytes) of component text in a glob expression. For example,
// the breadth of `{a/,a/b/}<c/d:2>` is two (bytes).
//
// Composition (i.e., folds, conjunction, etc.) is not implemented for breadth and it is only
// queried in leaf tokens to compute exhaustiveness. No values are associated with its invariant
// nor bounds.
//
// Breadth is probably the least interesting and yet most difficult quantity to compute.
// Determining correct breadth values likely involves:
// Composition is not implemented for breadth and it is only queried from leaf tokens. No values
// are associated with its invariant nor bounds. Breadth is probably the least interesting and yet
// most difficult quantity to compute. Determining correct breadth values likely involves:
//
// - Complex terms that support both running sums and running maximums across boundaries.
// - Searches from the start and end of sub-trees in repetitions, where terminals may interact.
// Consider `<a/b:2>` vs. `<a/b/:2>`, which have breadths of two and one, respectively. These
// searches probably need to cross levels in the token tree somehow.
// Consider `<a/b:2>` vs. `<a/b/:2>`, which have breadths of two and one, respectively.
// - Potentially large sets for bounds. Ranges lose too much information, in particular information
// about boundaries when composing terms and bounds.
// about boundaries when composing terms.
//
// There are likely other difficult composition problems, though this all assumes computation via a
// tree fold. As such, breadth is only implemented as much as needed. Any requirements on more
// complete breadth computations ought to be considered carefully and avoided.
// There are likely other difficult composition problems. As such, breadth is only implemented as
// much as needed. Any requirements on more complete breadth computations ought to be considered
// carefully.
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub struct Breadth;

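The breadth examples in the comment above can be checked by expanding repetitions into literal text and measuring the longest component. This is a hypothetical sketch that only handles literal text (no wildcards or alternations), which sidesteps exactly the composition problems the comment describes; `breadth` and `repetition_breadth` are illustrative names, not crate APIs.

```rust
/// Breadth of literal (already expanded) path text: the maximum size, in
/// UTF-8 bytes, of any component between `/` separators.
fn breadth(text: &str) -> usize {
    text.split('/').map(str::len).max().unwrap_or(0)
}

/// Breadth of a literal repetition `<text:n>`, computed by expanding it.
fn repetition_breadth(text: &str, n: usize) -> usize {
    breadth(&text.repeat(n))
}
```

Expanding `<a/b:2>` yields `a/ba/b`, whose middle component `ba` is two bytes, while `<a/b/:2>` yields `a/b/a/b/`, where every component is one byte. This interaction at repetition boundaries is why breadth resists a simple tree fold.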
82 changes: 59 additions & 23 deletions src/token/variance/invariant/text.rs
@@ -9,25 +9,27 @@ use crate::token::variance::ops::{self, Conjunction, Disjunction, Product};
use crate::token::variance::Variance;
use crate::PATHS_ARE_CASE_INSENSITIVE;

use Semantic::{Nominal, Structural};

pub trait IntoNominalText<'t> {
fn into_nominal_text(self) -> Text<'t>;
}

impl<'t> IntoNominalText<'t> for Cow<'t, str> {
fn into_nominal_text(self) -> Text<'t> {
Fragment::Nominal(self).into()
Nominal(self).into()
}
}

impl<'t> IntoNominalText<'t> for &'t str {
fn into_nominal_text(self) -> Text<'t> {
Fragment::Nominal(self.into()).into()
Nominal(self.into()).into()
}
}

impl IntoNominalText<'static> for String {
fn into_nominal_text(self) -> Text<'static> {
Fragment::Nominal(self.into()).into()
Nominal(self.into()).into()
}
}

@@ -37,26 +39,23 @@ pub trait IntoStructuralText<'t> {

impl<'t> IntoStructuralText<'t> for Cow<'t, str> {
fn into_structural_text(self) -> Text<'t> {
Fragment::Structural(self).into()
Structural(self).into()
}
}

impl<'t> IntoStructuralText<'t> for &'t str {
fn into_structural_text(self) -> Text<'t> {
Fragment::Structural(self.into()).into()
Structural(self.into()).into()
}
}

impl IntoStructuralText<'static> for String {
fn into_structural_text(self) -> Text<'static> {
Fragment::Structural(self.into()).into()
Structural(self.into()).into()
}
}

// TODO: The derived `PartialEq` implementation is incomplete and does not detect contiguous like
// fragments that are equivalent to an aggregated fragment. This works, but relies on
// constructing `InvariantText` by consistently appending fragments.
#[derive(Clone, Debug, Eq, PartialEq)]
#[derive(Clone, Debug, Eq)]
pub struct Text<'t> {
fragments: VecDeque<Fragment<'t>>,
}
@@ -96,6 +95,15 @@ impl<'t> Text<'t> {
.collect(),
}
}

fn bytes(&self) -> impl '_ + Iterator<Item = Byte> {
self.fragments.iter().flat_map(|fragment| {
fragment.bytes().map(match fragment {
Nominal(_) => Nominal,
Structural(_) => Structural,
})
})
}
}

impl<'t> Conjunction for Text<'t> {
@@ -150,6 +158,7 @@ impl<'t> Identity for Text<'t> {
}

impl<'t> Invariant for Text<'t> {
// No bounds are computed for text (only boundedness).
type Bound = UnitBound;

fn bound(_: Self, _: Self) -> Boundedness<Self::Bound> {
@@ -161,6 +170,12 @@ impl<'t> Invariant for Text<'t> {
}
}

impl<'t> PartialEq for Text<'t> {
fn eq(&self, other: &Self) -> bool {
self.bytes().eq(other.bytes())
}
}

impl<'t> Product<usize> for Text<'t> {
type Output = Self;

@@ -185,35 +200,56 @@ impl<'t> Disjunction<Text<'t>> for UnitBound {
}
}

#[derive(Clone, Debug, Eq)]
enum Fragment<'t> {
Nominal(Cow<'t, str>),
Structural(Cow<'t, str>),
#[derive(Clone, Copy, Debug)]
enum Semantic<T> {
Nominal(T),
Structural(T),
}

type Byte = Semantic<u8>;
type Fragment<'t> = Semantic<Cow<'t, str>>;

impl<T> AsRef<T> for Semantic<T> {
fn as_ref(&self) -> &T {
match self {
Nominal(ref inner) | Structural(ref inner) => inner,
}
}
}

impl Eq for Byte {}

impl PartialEq for Byte {
fn eq(&self, other: &Self) -> bool {
match (self, other) {
(Nominal(ref left), Nominal(ref right))
| (Structural(ref left), Structural(ref right)) => left == right,
_ => false,
}
}
}

impl<'t> Fragment<'t> {
pub fn into_owned(self) -> Fragment<'static> {
use Fragment::{Nominal, Structural};

match self {
Nominal(text) => Nominal(text.into_owned().into()),
Structural(text) => Structural(text.into_owned().into()),
}
}

pub fn as_string(&self) -> &Cow<'t, str> {
match self {
Fragment::Nominal(ref text) | Fragment::Structural(ref text) => text,
}
self.as_ref()
}

fn bytes(&self) -> impl '_ + Iterator<Item = u8> {
self.as_string().bytes()
}
}

impl<'t> Conjunction for Fragment<'t> {
type Output = Text<'t>;

fn conjunction(self, other: Self) -> Self::Output {
use Fragment::{Nominal, Structural};

match (self, other) {
(Nominal(left), Nominal(right)) => Text {
fragments: [Nominal(left + right)].into_iter().collect(),
@@ -228,10 +264,10 @@ impl<'t> Conjunction for Fragment<'t> {
}
}

impl<'t> Eq for Fragment<'t> {}

impl<'t> PartialEq for Fragment<'t> {
fn eq(&self, other: &Self) -> bool {
use Fragment::{Nominal, Structural};

match (self, other) {
(Nominal(ref left), Nominal(ref right)) => {
if PATHS_ARE_CASE_INSENSITIVE {
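The new `PartialEq` for `Text` compares flattened byte streams rather than fragment lists, so the way text happens to be split into fragments no longer affects equality. A self-contained sketch of that approach follows. It mirrors the `Semantic`, `Byte`, and `Fragment` definitions in the diff, but derives an exact equality for `Semantic<T>`, uses a hypothetical `as_inner` accessor in place of the `AsRef` impl, and omits the separate, platform-dependent case-insensitive `PartialEq` for `Fragment`.

```rust
use std::borrow::Cow;
use std::collections::VecDeque;

/// A value tagged with its semantics, mirroring `Semantic<T>` in the diff.
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
enum Semantic<T> {
    Nominal(T),
    Structural(T),
}

use self::Semantic::{Nominal, Structural};

type Byte = Semantic<u8>;
type Fragment<'t> = Semantic<Cow<'t, str>>;

impl<T> Semantic<T> {
    /// Hypothetical accessor for the wrapped value (the diff uses `AsRef`).
    fn as_inner(&self) -> &T {
        match self {
            Nominal(inner) | Structural(inner) => inner,
        }
    }
}

#[derive(Clone, Debug)]
struct Text<'t> {
    fragments: VecDeque<Fragment<'t>>,
}

impl<'t> Text<'t> {
    fn new(fragments: impl IntoIterator<Item = Fragment<'t>>) -> Self {
        Text {
            fragments: fragments.into_iter().collect(),
        }
    }

    /// Flattens all fragments into bytes, tagging each byte with the
    /// semantics of the fragment it came from.
    fn bytes(&self) -> impl Iterator<Item = Byte> + '_ {
        self.fragments.iter().flat_map(|fragment| {
            let tag: fn(u8) -> Byte = match fragment {
                Nominal(_) => Nominal,
                Structural(_) => Structural,
            };
            fragment.as_inner().bytes().map(tag)
        })
    }
}

// Equality ignores fragment boundaries: `["ab"]` equals `["a", "b"]` as long
// as corresponding bytes have the same semantics.
impl PartialEq for Text<'_> {
    fn eq(&self, other: &Self) -> bool {
        self.bytes().eq(other.bytes())
    }
}

impl Eq for Text<'_> {}
```

This is what allows the diff to drop the derived `PartialEq` and its TODO: equality no longer depends on `Text` always being built by merging adjacent fragments of the same kind.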
6 changes: 0 additions & 6 deletions src/token/variance/mod.rs
@@ -39,12 +39,6 @@ where
#[derive(Clone, Copy, Debug, Eq)]
pub enum Variance<T, B = Boundedness<UnitBound>> {
Invariant(T),
// When variant, the bound describes the constraints of that variance. For example, the
// expression `**` is unconstrained and matches _any and all_ text, breadths, and depths. On
// the other hand, the expression `a*z` is constrained and matches _some_ text and _some_
// breadths (and is invariant w.r.t. depth). Regarding text, note that bounds do **not**
// consider the length at all (breadth). Only constraints that limit an expression to a known
// set of matches are considered, so both `?` and `*` are unbounded w.r.t. text.
Variant(B),
}

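The comment removed from `Variance` explains that a variant term carries a bound describing its constraints, and that for text only a known set of matches (not length) counts as bounded, so `?` and `*` are unbounded. A hypothetical sketch of how such terms might combine under concatenation follows; the two-state `Boundedness` and the `conjunction` rules are simplifying assumptions, not the crate's `Conjunction` implementation.

```rust
/// Two-state boundedness, a simplification of the crate's bound types.
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
enum Boundedness {
    Bounded,
    Unbounded,
}

/// Hypothetical sketch of `Variance<T, B>`: an invariant value, or a bound
/// describing the constraints of a variant term.
#[derive(Clone, Debug, Eq, PartialEq)]
enum Variance<T, B> {
    Invariant(T),
    Variant(B),
}

impl Variance<String, Boundedness> {
    /// Assumed conjunction (concatenation): invariant text concatenates, any
    /// unbounded operand makes the result unbounded, and otherwise the result
    /// is variant but bounded.
    fn conjunction(self, rhs: Self) -> Self {
        match (self, rhs) {
            (Variance::Invariant(mut lhs), Variance::Invariant(rhs)) => {
                lhs.push_str(&rhs);
                Variance::Invariant(lhs)
            }
            (Variance::Variant(Boundedness::Unbounded), _)
            | (_, Variance::Variant(Boundedness::Unbounded)) => {
                Variance::Variant(Boundedness::Unbounded)
            }
            _ => Variance::Variant(Boundedness::Bounded),
        }
    }

    fn is_variant(&self) -> bool {
        matches!(self, Variance::Variant(_))
    }
}
```

Under these rules, `x{a,b}` stays bounded (a known set of two matches), while `x*` becomes unbounded.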
7 changes: 7 additions & 0 deletions src/walk/glob.rs
@@ -445,6 +445,13 @@ impl GlobWalker {
}
}

// TODO: Partitioned programs are important here, because there is no other way to determine the
// exhaustiveness of a match (which is not the same as the exhaustiveness of a glob). This
// partitioning leaks into the `FileIterator::not` API, which accepts a sequence of
// `Pattern`s rather than one `Pattern` as a best effort attempt to detect exhaustive
// negations. Consider instead a decomposition of token trees that separates the branches of
// level-adjacent alternations from the root. This may allow APIs to accept `Any` and still
// partition in this way when match exhaustiveness is important.
#[derive(Clone, Debug)]
enum FilterAnyProgram {
Empty,
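The TODO above ties match exhaustiveness to pruning in `FileIterator::not`, and the `is_exhaustive` note warns that false positives discard directory trees. The sketch below shows why only exhaustive negations may prune; `Negation`, `can_prune_tree`, and the representation of a pattern as a literal prefix plus an exhaustiveness flag are hypothetical simplifications.

```rust
use std::path::Path;

/// Hypothetical, simplified negation: a literal directory prefix and whether
/// the pattern matches everything beneath it (like `target/**`) or only some
/// of it (like `src/*.rs`).
struct Negation<'a> {
    prefix: &'a Path,
    is_exhaustive: bool,
}

/// Returns `true` if the directory at `path` and all of its descendants can
/// be discarded without being read. Only exhaustive negations may prune:
/// pruning on a non-exhaustive one (a false positive) would also drop files
/// it never matches, such as `src/main.c` for `src/*.rs`.
fn can_prune_tree(path: &Path, negations: &[Negation<'_>]) -> bool {
    negations
        .iter()
        .any(|negation| negation.is_exhaustive && path.starts_with(negation.prefix))
}
```

Accepting separate patterns lets `target/**` be recognized as exhaustive even when it is listed alongside `src/*.rs`; combined into one alternation such as `{target/**,src/*.rs}`, the pattern as a whole is not exhaustive, which is the limitation the TODO describes.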