Hybrid date time index - Initial Proposal #88

Open
wants to merge 17 commits into
base: master

ahmed-mahran
Contributor

This PR contributes a number of time series transformations.
These are essentially transformations of the underlying date-time index, such that:

new_ts = ts.transform is equivalent to new_ts = ts.rebase(ts.index.transform)

new_ts = transform(ts1, ts2) is equivalent to

new_index = transform(ts1.index, ts2.index)
new_ts = ts1.rebase(new_index) + ts2.rebase(new_index)
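
A minimal sketch of the second equivalence, using a toy model rather than the library's classes. `ToySeries`, its `rebase`, and `ToySeriesOps` are hypothetical names introduced for illustration. An index is modelled as a sorted vector of epoch milliseconds, missing instants are assumed to be filled with NaN, and the `+` in the formula above is read as combining the columns of the two rebased series:

```scala
// Toy model only: ToySeries and ToySeriesOps are hypothetical, not the library's API.
// An index is a sorted vector of epoch millis; each key maps to one value column.
final case class ToySeries(index: Vector[Long], columns: Map[String, Vector[Double]]) {
  // Re-align every column to newIndex, filling instants absent from this series with NaN.
  def rebase(newIndex: Vector[Long]): ToySeries = {
    val position = index.zipWithIndex.toMap
    val rebased = columns.map { case (key, values) =>
      key -> newIndex.map(t => position.get(t).map(i => values(i)).getOrElse(Double.NaN))
    }
    ToySeries(newIndex, rebased)
  }
}

object ToySeriesOps {
  // The index-level transformation: sorted union of instants.
  def unionIndex(a: Vector[Long], b: Vector[Long]): Vector[Long] =
    (a ++ b).distinct.sorted

  // The series-level transformation: rebase both inputs onto the new index,
  // then combine their columns (keys are assumed to be disjoint).
  def union(ts1: ToySeries, ts2: ToySeries): ToySeries = {
    val newIndex = unionIndex(ts1.index, ts2.index)
    ToySeries(newIndex, ts1.rebase(newIndex).columns ++ ts2.rebase(newIndex).columns)
  }
}
```

Swapping `unionIndex` for an intersection or difference of the two indices would give the other transformations in the same shape.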

@ahmed-mahran ahmed-mahran force-pushed the hybrid-date-time-index branch from a8e6268 to d2c792a Compare December 3, 2015 20:56
@ahmed-mahran
Contributor Author

Additions to the public API:

  • HybridDateTimeIndex is an implementation of DateTimeIndex that holds a hybrid collection of different types of DateTimeIndex. Indices are assumed to be sorted and disjoint such that for any two consecutive indices i and j: i.last < j.first. This is helpful for transformations on multiple indices. Factory methods for constructing HybridDateTimeIndex:
    • DateTimeIndex.hybrid(indices: Array[DateTimeIndex])
    • DateTimeIndex.hybrid(indices: Array[DateTimeIndex], zone: DateTimeZone)
    • DateTimeIndex.fromString
  • DateTimeIndex helper methods
    • millisIterator(): Iterator[Long]
    • zonedDateTimeIterator(): Iterator[ZonedDateTime]
    • insertionLoc methods to find the location at which the given date-time could be inserted. It is the location of the first date-time that is greater than the given date-time. If the given date-time is greater than or equal to the last date-time in the index, the index size is returned. Used in transformations on multiple indices.
    • atZone(zone: ZoneId) adjusts the time zone of the index. Used in transformations on multiple indices.
    • equals uses match ... case instead of casting to avoid class cast exceptions
  • Transformations on indices added to DateTimeIndex object:
    • union
    • intersect
    • except
  • Transformations added to TimeSeries
    • union merges multiple multivariate time series of disjoint keys into one multivariate time series by applying union to all time indices and rebasing all univariate time series onto the union index.
    • intersect merges multiple multivariate time series of disjoint keys into one multivariate time series, if possible
    • leftJoin merges two multivariate time series of disjoint keys into one multivariate time series, keeping only the instants of the left index
    • rightJoin merges two multivariate time series of disjoint keys into one multivariate time series, keeping only the instants of the right index
    • withIndex rebases a time series using a new index
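
To illustrate the semantics described above for insertionLoc and the index set operations, here is a rough sketch over plain sorted arrays of epoch milliseconds. `IndexSketch` and its methods are hypothetical stand-ins, not the signatures added in this PR, and the real implementations work on DateTimeIndex instances rather than raw arrays:

```scala
// Hypothetical sketch: an index is modelled as a sorted Array[Long] of epoch millis.
object IndexSketch {
  // Location of the first instant strictly greater than t.
  // Returns instants.length when t >= the last instant, as described above.
  def insertionLoc(instants: Array[Long], t: Long): Int = {
    var lo = 0
    var hi = instants.length
    while (lo < hi) {
      val mid = (lo + hi) >>> 1
      if (instants(mid) <= t) lo = mid + 1 else hi = mid
    }
    lo
  }

  // All instants present in either index, sorted and without duplicates.
  def union(a: Array[Long], b: Array[Long]): Array[Long] =
    (a ++ b).distinct.sorted

  // Instants present in both indices, in the order of a.
  def intersect(a: Array[Long], b: Array[Long]): Array[Long] = {
    val inB = b.toSet
    a.filter(inB.contains)
  }

  // Instants of a that are not instants of b.
  def except(a: Array[Long], b: Array[Long]): Array[Long] = {
    val inB = b.toSet
    a.filterNot(inB.contains)
  }
}
```

For the index `Array(10L, 20L, 30L)`, `insertionLoc` gives 0 for 5, 2 for both 20 and 25, and 3 (the index size) for 30.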

Additions to the private API:

  • Generic DateTimeIndex rebaser at TimeSeriesUtils.rebaserGeneric(sourceIndex: DateTimeIndex, targetIndex: DateTimeIndex, defaultValue: Double). Helpful for transformations on multiple indices of different types.
  • DateTimeIndexUtils object holds utility methods for DateTimeIndex
    • dateTimeIndexOrdering defines an ordering on DateTimeIndex s.t. for two DateTimeIndex x and y, x < y iff x.first < y.first || (x.first == y.first && x.size < y.size)
    • simplify(indices: Array[DateTimeIndex]): Array[DateTimeIndex] merges contiguous indices where possible
    • union unions a list of indices into one DateTimeIndex
    • intersect intersects a list of indices and returns a new index if possible
    • except(index1, index2) creates an index with instants of index1 that are not instants of index2
  • TimeSeriesUtils
    • rebaseAndMerge(tss: Array[TimeSeries[K]], newIndex: DateTimeIndex, defaultValue: Double): TimeSeries[K] a utility for rebasing a collection of multivariate time series of disjoint keys and merging them into one multivariate time series
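
The ordering rule given for dateTimeIndexOrdering can be sketched as follows. `ToyIndex` is a hypothetical stand-in that carries only the two properties the rule uses (the first instant and the size); the PR defines the ordering on DateTimeIndex itself:

```scala
// Hypothetical stand-in for DateTimeIndex, reduced to what the ordering needs.
final case class ToyIndex(first: Long, size: Int)

object ToyIndexOrdering {
  // x < y iff x.first < y.first || (x.first == y.first && x.size < y.size),
  // which is the lexicographic ordering on the pair (first, size).
  val byFirstThenSize: Ordering[ToyIndex] =
    Ordering.by((i: ToyIndex) => (i.first, i.size))
}
```

Sorting sub-indices with such an ordering is one way to put them in a sequence where contiguous neighbours can then be examined for merging.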

@sryza
Owner

sryza commented Dec 13, 2015

Apologies for the delay in getting to this, Ahmed. I've been traveling and on vacation the last couple weeks, but I will make sure to look at this this week.

@sryza
Owner

sryza commented Dec 16, 2015

I'm wondering if we can break this out into a few smaller changes? E.g. can HybridDateTimeIndex be its own change, the DateTimeIndex operations like union etc. be its own change, and the TimeSeries API additions be their own change? How difficult would it be to separate these out?

Also, should TimeSeriesRDD get the same new methods that are being added to TimeSeries?

@ahmed-mahran
Contributor Author

If it is for readability, I've tried to organize the commits so that the changes are easy to follow by going through them one by one. If that doesn't work, I'll break it up. Please let me know.

Yes, I think TimeSeriesRDD should get the same methods as TimeSeries, and the Java API as well. I'll submit those, as you recommend, either in this PR or in a separate one.

@sryza
Owner

sryza commented Dec 21, 2015

Ah, yes, the commit history does look nice. My concern is that it'll take me a non-negligible amount of time to get through the whole patch, and if I have suggested revisions for minor changes, it'll hold the whole thing up. If, alternatively, it's broken up into a couple of different changes, we can start merging parts of it sooner, which will save you work rebasing if other things need to get merged in the meantime. Would you be open to submitting all the commits up to 066efe8 as a PR?


private val sizeOnLeft: Array[Int] = {
var sum: Int = 0
(Array(0) ++ indices.init.map(_.size)).map {a =>
Owner

Nit: add space after curly brace

Owner

I think this could probably be written a little more clearly with scanLeft

Contributor Author

Great idea 👍
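
A possible shape for the scanLeft version suggested above. The quoted snippet is truncated, so this sketch assumes it computes, for each sub-index, the total size of the sub-indices to its left. `SizeOnLeftSketch` is a hypothetical helper, and `sizes` stands in for `indices.map(_.size)`:

```scala
// Hypothetical sketch; the intent of the truncated original is assumed, not confirmed.
object SizeOnLeftSketch {
  // For each position, the sum of all sizes strictly to its left.
  // scanLeft yields one running total per prefix (starting with 0 for the empty prefix),
  // and init drops the final grand total, which lies to the left of nothing.
  def sizeOnLeft(sizes: Array[Int]): Array[Int] =
    sizes.scanLeft(0)(_ + _).init
}
```

This avoids the mutable `var sum` and also handles an empty input, for which it returns an empty array.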

@ahmed-mahran ahmed-mahran changed the title Hybrid date time index Hybrid date time index - Initial Proposal Dec 21, 2015
@ahmed-mahran
Contributor Author

I got your point. I've opened a new PR #96 for the HybridDateTimeIndex only.

@ahmed-mahran ahmed-mahran force-pushed the hybrid-date-time-index branch from c29681a to 80c8862 Compare December 21, 2015 23:54
@ahmed-mahran
Contributor Author

Suggested splits:

What do you think?

@sryza
Owner

sryza commented Dec 27, 2015

How about the third - i.e. "Support intersection"?

@ahmed-mahran
Contributor Author

I've issued a new PR up to "Support intersection" #100
