added kpss stationarity test #43

Open
wants to merge 53 commits into
base: master
Changes from 52 commits
6c2e2a6
reworked bgtest to take residuals, added bptest
Jun 12, 2015
75e9932
cleaned up after adding bptest/bgtest
Jun 12, 2015
7fdbafa
cleaned up additional style issues
Jun 15, 2015
e7ed3ae
added EWMA functionality and tests
Jun 16, 2015
c6b5e55
Merge pull request #26 from josepablocam/master
sryza Jun 16, 2015
dcc429c
cleaned up additional style issues
Jun 15, 2015
5cfc79b
Add Hadoop dependency
sryza Jun 16, 2015
1aff3f4
Add seed generation in historical var example
sryza Jun 16, 2015
e1487d0
Convert tab to spaces
sryza Jun 16, 2015
db6a716
changed SSE calculation to use while loop rather than map + sum
Jun 17, 2015
fd45327
changed EWMA fitting to use gradient descent
Jun 17, 2015
7adc5d0
style changes
Jun 17, 2015
88bd843
added note stating unbounded optimization for ewma
Jun 17, 2015
fab45bc
Merge pull request #27 from josepablocam/ewma
sryza Jun 18, 2015
e942aae
Document more functionality in README
sryza Jun 19, 2015
4e56a07
Merge remote-tracking branch 'origin/master'
Jun 22, 2015
d151a6f
Copyright bump
cdalzell Jun 23, 2015
3bd0c28
Quick find & replace: toSamples -> toInstants
cdalzell Jun 23, 2015
ae91035
Merge pull request #28 from cdalzell/feature/#23-rename_toSamples
sryza Jun 23, 2015
5a4aa29
added up/down sampling and cubic spline interpolation
Jun 25, 2015
4b98b63
added up/down sampling tests
Jun 25, 2015
48d9251
Harmonize MLlib dependency version
sryza Jun 26, 2015
dd80326
started on acf plot
Jun 27, 2015
1c595b0
Mention R's equivalents in README
sryza Jun 27, 2015
0655b07
acf plot done
josepablocam Jun 28, 2015
432b10d
added pacf, refactored acf
josepablocam Jun 28, 2015
3c316ee
added scaladocs to acf/pacf
josepablocam Jun 28, 2015
78b5d77
added scaladocs to acf/pacf
josepablocam Jun 28, 2015
7ccc531
Merge remote-tracking branch 'origin/acf_pacf' into acf_pacf
josepablocam Jun 28, 2015
315ff27
changed up/downSample function names to up/downsample (along with rel…
Jun 30, 2015
f939146
Merge pull request #30 from josepablocam/acf_pacf
sryza Jun 30, 2015
87418e8
added scaladoc comments for up/down sampling
Jul 1, 2015
496dc92
Merge pull request #31 from josepablocam/docsampling
sryza Jul 2, 2015
bec7089
Renamed labels to keys
cdalzell Jul 10, 2015
1a15e0f
Merge pull request #33 from cdalzell/feature/#24-TimeSeries_label_to_key
sryza Jul 10, 2015
1684baf
Added scalastyle plugin config
cdalzell Jul 15, 2015
84923bf
Gave maven-surefire-plugin a version since Maven was cranky about it …
cdalzell Jul 15, 2015
67f90dd
Ignoring scalastyle-output.xml
cdalzell Jul 15, 2015
e096f80
Scalastyle now runs as part of mvn compile
cdalzell Jul 15, 2015
a0a7bd6
Added notes about how to disable scalastyle when needed, also gave th…
cdalzell Jul 15, 2015
4f9ac31
Adding notes about how to switch off checking for a specific rule
cdalzell Jul 15, 2015
fefaaa0
Assignment to val is not considered to be a magic number
cdalzell Jul 15, 2015
95529d0
Added the Cloudera file header
cdalzell Jul 16, 2015
f684218
Added newline at end of file
cdalzell Jul 21, 2015
73f4d35
Disabled a couple of checks
cdalzell Jul 24, 2015
4a7a7a3
Merge pull request #37 from cdalzell/feature/scalastyle
sryza Jul 27, 2015
09fdd40
added Ljung-Box test, commonly used with arima to check residuals for s…
josepablocam Aug 3, 2015
5e90bae
Merge pull request #41 from josepablocam/ljungbox
sryza Aug 3, 2015
a5cdfa4
modified easyplot functions to return figure, so that user can 'saveas'
Aug 4, 2015
945b0d0
Merge pull request #42 from josepablocam/return_figures
sryza Aug 5, 2015
c2405ab
added kpss stationarity test
Aug 7, 2015
8b5c188
added source for newey west
Aug 10, 2015
5b3a8f0
fixed spacing for critical values in kpss and style fix for capitaliz…
Aug 19, 2015
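The last three commits add a KPSS stationarity test, cite a source for Newey-West, and adjust the critical values. The KPSS code itself is not part of the diff shown here, so the following is only a minimal sketch of the textbook level-stationarity statistic with a Bartlett-weighted (Newey-West) long-run variance. The object name `KpssSketch`, the method `kpssLevelStat`, and the `lags` parameter are assumptions made for illustration and are not taken from the PR.

```scala
object KpssSketch {
  /**
   * KPSS statistic for the null hypothesis of level stationarity.
   * `lags` is the truncation lag of the Bartlett-weighted long-run
   * variance estimate (hypothetical parameter name).
   */
  def kpssLevelStat(y: Array[Double], lags: Int): Double = {
    require(y.length > 1, "need at least two observations")
    require(lags >= 0 && lags < y.length, "lags must be in [0, n)")
    val n = y.length
    val mean = y.sum / n
    // Residuals from regressing the series on a constant
    val e = y.map(_ - mean)

    // Numerator: squared partial sums of the residuals, scaled by n^2
    var partial = 0.0
    var sumSq = 0.0
    var t = 0
    while (t < n) {
      partial += e(t)
      sumSq += partial * partial
      t += 1
    }
    val eta = sumSq / (n.toDouble * n)

    // Denominator: long-run variance with Bartlett weights 1 - s / (lags + 1)
    var lrv = e.map(x => x * x).sum / n
    var s = 1
    while (s <= lags) {
      var cov = 0.0
      var i = s
      while (i < n) {
        cov += e(i) * e(i - s)
        i += 1
      }
      lrv += 2.0 * (1.0 - s.toDouble / (lags + 1)) * cov / n
      s += 1
    }
    eta / lrv
  }
}
```

Large values of the statistic speak against stationarity. For the level-stationary case, the value usually tabulated for the 5% level is 0.463, so `KpssSketch.kpssLevelStat(series, 4)` above that threshold would reject the null in this sketch.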
2 changes: 1 addition & 1 deletion .gitignore
@@ -5,4 +5,4 @@
target
*.iml
.idea

scalastyle-output.xml
20 changes: 15 additions & 5 deletions README.md
@@ -6,10 +6,12 @@ A Scala / Java library for interacting with time series data on Apache Spark.
Scaladoc is available at http://cloudera.github.io/spark-timeseries.

The aim here is to provide
* A set of abstractions for transforming and summarizing large time series data sets, similar to
* A set of abstractions for manipulating large time series data sets, similar to
what's provided for smaller data sets in
[Pandas](http://pandas.pydata.org/pandas-docs/dev/timeseries.html) and
[Matlab](http://www.mathworks.com/help/matlab/time-series.html).
[Pandas](http://pandas.pydata.org/pandas-docs/dev/timeseries.html),
[Matlab](http://www.mathworks.com/help/matlab/time-series.html), and R's
[zoo](http://cran.r-project.org/web/packages/zoo/index.html) and
[xts](http://cran.r-project.org/web/packages/xts/index.html) packages.
* Models, tests, and functions that enable dealing with time series from a statistical perspective,
similar to what's provided in [StatsModels](http://statsmodels.sourceforge.net/devel/tsa.html)
and a variety of Matlab and R packages.
@@ -52,16 +54,24 @@ TimeSeriesRDDs then support efficient series-wise operations like slicing, imput
val residuals = filled.mapSeries(series => ar(series, 1).removeTimeDependentEffects(series))


Statistical Functionality
Functionality
--------------------------

### Time Series
### Time Series Manipulation
* Aligning
* Slicing by date-time
* Missing value imputation

### Time Series Math and Stats

* Exponentially weighted moving average
* Autoregressive models
* GARCH models
* Missing data imputation
* Augmented Dickey-Fuller test
* Durbin-Watson test
* Breusch-Godfrey test
* Breusch-Pagan test
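
The "Exponentially weighted moving average" entry above corresponds to the EWMA commits earlier in this PR, one of which computes the SSE with a while loop. As an illustration only, and not the library's API, simple exponential smoothing and its one-step-ahead sum of squared errors might look like the sketch below. `EwmaSketch`, `smooth`, `sse`, and `alpha` are hypothetical names.

```scala
object EwmaSketch {
  /** Smoothed series: s(0) = x(0), s(t) = alpha * x(t) + (1 - alpha) * s(t - 1). */
  def smooth(xs: Array[Double], alpha: Double): Array[Double] = {
    require(alpha >= 0.0 && alpha <= 1.0, "alpha must be in [0, 1]")
    val out = new Array[Double](xs.length)
    if (xs.nonEmpty) {
      out(0) = xs(0)
      var t = 1
      while (t < xs.length) {
        out(t) = alpha * xs(t) + (1.0 - alpha) * out(t - 1)
        t += 1
      }
    }
    out
  }

  /** Sum of squared one-step-ahead errors, using s(t - 1) as the forecast of x(t). */
  def sse(xs: Array[Double], alpha: Double): Double = {
    val s = smooth(xs, alpha)
    var total = 0.0
    var t = 1
    while (t < xs.length) {
      val err = xs(t) - s(t - 1)
      total += err * err
      t += 1
    }
    total
  }
}
```

A fitting routine would then search for the `alpha` that minimizes `sse`. The commits mention gradient descent for that step, which this sketch does not attempt.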

### General Prob / Stats

44 changes: 40 additions & 4 deletions pom.xml
@@ -1,6 +1,6 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
Copyright (c) 2014, Cloudera, Inc. All Rights Reserved.
Copyright (c) 2015, Cloudera, Inc. All Rights Reserved.

Cloudera, Inc. licenses this file to you under the Apache License,
Version 2.0 (the "License"). You may not use this file except in
@@ -13,7 +13,9 @@
the specific language governing permissions and limitations under the
License.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.cloudera.datascience</groupId>
<artifactId>sparktimeseries</artifactId>
@@ -72,14 +74,43 @@
</execution>
</executions>
</plugin>

<!-- disable surefire -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<version>2.12</version>
<configuration>
<skipTests>true</skipTests>
</configuration>
</plugin>

<!-- enable scalastyle -->
<plugin>
<groupId>org.scalastyle</groupId>
<artifactId>scalastyle-maven-plugin</artifactId>
<version>0.7.0</version>
<configuration>
<verbose>false</verbose>
<failOnWarning>false</failOnWarning>
<failOnViolation>true</failOnViolation>
<includeTestSourceDirectory>true</includeTestSourceDirectory>
<sourceDirectory>${basedir}/src/main/scala</sourceDirectory>
<testSourceDirectory>${basedir}/src/test/scala</testSourceDirectory>
<configLocation>${basedir}/scalastyle-config.xml</configLocation>
<outputFile>${project.basedir}/scalastyle-output.xml</outputFile>
<outputEncoding>UTF-8</outputEncoding>
</configuration>
<executions>
<execution>
<phase>compile</phase>
<goals>
<goal>check</goal>
</goals>
</execution>
</executions>
</plugin>

<!-- enable scalatest -->
<plugin>
<groupId>org.scalatest</groupId>
@@ -160,6 +191,12 @@
</build>

<dependencies>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-yarn-client</artifactId>
<version>2.6.0</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
@@ -198,8 +235,7 @@
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-mllib_${scala.minor.version}</artifactId>
<!-- TODO -->
<version>1.2.0</version>
<version>${spark.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
160 changes: 160 additions & 0 deletions scalastyle-config.xml
@@ -0,0 +1,160 @@
<!-- If you wish to turn off checking for a section of code, you can put a comment in the source
before and after the section, with the following syntax: -->
<!-- // scalastyle:off -->
<!-- ... -->
<!-- // naughty stuff -->
<!-- ... -->
<!-- // scalastyle:on -->

<!-- You can also switch off checking for a specific rule by specifying the id of the rule to ignore
IDs and such can be found here: http://www.scalastyle.org/rules-0.7.0.html -->
<!--
// scalastyle:off magic.number
var notAtAllAMagicNumber = 1234
// scalastyle:on magic.number
-->
<scalastyle commentFilter="enabled">
<name>Scalastyle standard configuration</name>
<check level="warning" class="org.scalastyle.file.FileTabChecker" enabled="true"></check>
<check level="warning" class="org.scalastyle.file.FileLengthChecker" enabled="true">
<parameters>
<parameter name="maxFileLength"><![CDATA[800]]></parameter>
</parameters>
</check>
<check level="warning" class="org.scalastyle.file.HeaderMatchesChecker" enabled="true">
<parameters>
<parameter name="header"><![CDATA[/**
* Copyright (c) 2015, Cloudera, Inc. All Rights Reserved.
*
* Cloudera, Inc. licenses this file to you under the Apache License,
* Version 2.0 (the "License"). You may not use this file except in
* compliance with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* This software is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
* CONDITIONS OF ANY KIND, either express or implied. See the License for
* the specific language governing permissions and limitations under the
* License.
*/]]></parameter>
</parameters>
</check>
<check level="warning" class="org.scalastyle.scalariform.SpacesAfterPlusChecker" enabled="true"></check>
<check level="warning" class="org.scalastyle.file.WhitespaceEndOfLineChecker" enabled="true"></check>
<check level="warning" class="org.scalastyle.scalariform.SpacesBeforePlusChecker" enabled="true"></check>
<check level="warning" class="org.scalastyle.file.FileLineLengthChecker" enabled="true">
<parameters>
<parameter name="maxLineLength"><![CDATA[120]]></parameter>
<parameter name="tabSize"><![CDATA[2]]></parameter>
</parameters>
</check>
<check level="warning" class="org.scalastyle.scalariform.ClassNamesChecker" enabled="true">
<parameters>
<parameter name="regex"><![CDATA[[A-Z][A-Za-z]*]]></parameter>
</parameters>
</check>
<check level="warning" class="org.scalastyle.scalariform.ObjectNamesChecker" enabled="true">
<parameters>
<parameter name="regex"><![CDATA[[A-Z][A-Za-z]*]]></parameter>
</parameters>
</check>
<check level="warning" class="org.scalastyle.scalariform.PackageObjectNamesChecker" enabled="true">
<parameters>
<parameter name="regex"><![CDATA[^[a-z][A-Za-z]*$]]></parameter>
</parameters>
</check>
<check level="warning" class="org.scalastyle.scalariform.EqualsHashCodeChecker" enabled="true"></check>
<check level="warning" class="org.scalastyle.scalariform.IllegalImportsChecker" enabled="true">
<parameters>
<parameter name="illegalImports"><![CDATA[sun._,java.awt._]]></parameter>
</parameters>
</check>
<check level="warning" class="org.scalastyle.scalariform.ParameterNumberChecker" enabled="true">
<parameters>
<parameter name="maxParameters"><![CDATA[8]]></parameter>
</parameters>
</check>
<!-- Should only affect tests
<check level="warning" class="org.scalastyle.scalariform.MagicNumberChecker" enabled="true">
<parameters>
<parameter name="ignore"><![CDATA[-1,0,1,2,3]]></parameter>
</parameters>
</check>
-->
<check level="warning" class="org.scalastyle.scalariform.NoWhitespaceBeforeLeftBracketChecker" enabled="true"></check>
<check level="warning" class="org.scalastyle.scalariform.NoWhitespaceAfterLeftBracketChecker" enabled="true"></check>
<check level="warning" class="org.scalastyle.scalariform.ReturnChecker" enabled="true"></check>
<check level="warning" class="org.scalastyle.scalariform.NullChecker" enabled="true"></check>
<check level="warning" class="org.scalastyle.scalariform.NoCloneChecker" enabled="true"></check>
<check level="warning" class="org.scalastyle.scalariform.NoFinalizeChecker" enabled="true"></check>
<check level="warning" class="org.scalastyle.scalariform.CovariantEqualsChecker" enabled="true"></check>
<check level="warning" class="org.scalastyle.scalariform.StructuralTypeChecker" enabled="true"></check>
<check level="warning" class="org.scalastyle.file.RegexChecker" enabled="true">
<parameters>
<parameter name="regex"><![CDATA[println]]></parameter>
</parameters>
</check>
<check level="warning" class="org.scalastyle.scalariform.NumberOfTypesChecker" enabled="true">
<parameters>
<parameter name="maxTypes"><![CDATA[30]]></parameter>
</parameters>
</check>
<check level="warning" class="org.scalastyle.scalariform.CyclomaticComplexityChecker" enabled="true">
<parameters>
<parameter name="maximum"><![CDATA[10]]></parameter>
</parameters>
</check>
<check level="warning" class="org.scalastyle.scalariform.UppercaseLChecker" enabled="true"></check>
<check level="warning" class="org.scalastyle.scalariform.SimplifyBooleanExpressionChecker" enabled="true"></check>
<check level="warning" class="org.scalastyle.scalariform.IfBraceChecker" enabled="true">
<parameters>
<parameter name="singleLineAllowed"><![CDATA[true]]></parameter>
<parameter name="doubleLineAllowed"><![CDATA[false]]></parameter>
</parameters>
</check>
<check level="warning" class="org.scalastyle.scalariform.MethodLengthChecker" enabled="true">
<parameters>
<parameter name="maxLength"><![CDATA[50]]></parameter>
</parameters>
</check>
<check level="warning" class="org.scalastyle.scalariform.MethodNamesChecker" enabled="true">
<parameters>
<parameter name="regex"><![CDATA[^[a-z][A-Za-z0-9]*$]]></parameter>
</parameters>
</check>
<check level="warning" class="org.scalastyle.scalariform.NumberOfMethodsInTypeChecker" enabled="true">
<parameters>
<parameter name="maxMethods"><![CDATA[30]]></parameter>
</parameters>
</check>
<check level="warning" class="org.scalastyle.scalariform.PublicMethodsHaveTypeChecker" enabled="true"></check>
<check level="warning" class="org.scalastyle.file.NewLineAtEofChecker" enabled="true"></check>
<check level="warning" class="org.scalastyle.file.NoNewLineAtEofChecker" enabled="false"></check>
<check level="warning" class="org.scalastyle.scalariform.WhileChecker" enabled="false"></check>
<check level="warning" class="org.scalastyle.scalariform.VarFieldChecker" enabled="false"></check>
<check level="warning" class="org.scalastyle.scalariform.VarLocalChecker" enabled="false"></check>
<check level="warning" class="org.scalastyle.scalariform.RedundantIfChecker" enabled="false"></check>
<check level="warning" class="org.scalastyle.scalariform.TokenChecker" enabled="false">
<parameters>
<parameter name="regex"><![CDATA[println]]></parameter>
</parameters>
</check>
<check level="warning" class="org.scalastyle.scalariform.DeprecatedJavaChecker" enabled="true"></check>
<check level="warning" class="org.scalastyle.scalariform.EmptyClassChecker" enabled="true"></check>
<check level="warning" class="org.scalastyle.scalariform.ClassTypeParameterChecker" enabled="true">
<parameters>
<parameter name="regex"><![CDATA[^[A-Z_]$]]></parameter>
</parameters>
</check>
<!-- Wildcard imports are fine
<check level="warning" class="org.scalastyle.scalariform.UnderscoreImportChecker" enabled="true"></check>
-->
<check level="warning" class="org.scalastyle.scalariform.LowercasePatternMatchChecker" enabled="true"></check>
<check level="warning" class="org.scalastyle.scalariform.MultipleStringLiteralsChecker" enabled="true">
<parameters>
<parameter name="allowed"><![CDATA[2]]></parameter>
<parameter name="ignoreRegex"><![CDATA[^""$]]></parameter>
</parameters>
</check>
<check level="warning" class="org.scalastyle.scalariform.ImportGroupingChecker" enabled="true"></check>
</scalastyle>
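The comments at the top of scalastyle-config.xml describe two ways to suppress checks from inside a source file: a blanket `// scalastyle:off` / `// scalastyle:on` pair, and a rule-specific pair such as `// scalastyle:off magic.number`. A small sketch of how those comments might sit in Scala code follows. The object and method names are invented for illustration, and the config above currently leaves the magic-number check commented out.

```scala
object ScalastyleSuppressionSketch {
  // Rule-specific suppression: only the magic.number rule is silenced
  // between these two comments.
  // scalastyle:off magic.number
  def annualize(dailyVol: Double): Double = dailyVol * math.sqrt(252)
  // scalastyle:on magic.number

  // Blanket suppression: every rule is off until the matching "on" comment,
  // which includes the RegexChecker in the config that flags println.
  // scalastyle:off
  def debugDump(xs: Seq[Double]): Unit = println(xs.mkString(", "))
  // scalastyle:on
}
```

Keeping each suppressed region as small as possible means the rest of the file is still checked during `mvn compile`.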
@@ -25,18 +25,19 @@ import com.cloudera.sparkts.TimeSeriesRDD._

import com.github.nscala_time.time.Imports._

import org.apache.commons.math3.random.RandomGenerator
import org.apache.commons.math3.random.{MersenneTwister, RandomGenerator}
import org.apache.commons.math3.stat.regression.OLSMultipleLinearRegression

import org.apache.spark.{SparkConf, SparkContext}

object HistoricalValueAtRiskExample {
def main(args: Array[String]): Unit = {
// Read parameters
val numTrials = if (args.length > 0) args(0).toInt else 10000
val parallelism = if (args.length > 1) args(1).toInt else 10
val factorsDir = if (args.length > 2) args(2) else "factors/"
val instrumentsDir = if (args.length > 3) args(3) else "instruments/"
val factorsDir = if (args.length > 0) args(0) else "factors/"
val instrumentsDir = if (args.length > 1) args(1) else "instruments/"
val numTrials = if (args.length > 2) args(2).toInt else 10000
val parallelism = if (args.length > 3) args(3).toInt else 10
val horizon = if (args.length > 4) args(4).toInt else 1

// Initialize Spark
val conf = new SparkConf().setMaster("local").setAppName("Historical VaR")
@@ -66,13 +67,18 @@ object HistoricalValueAtRiskExample {

// Fit an AR(1) + GARCH(1, 1) model to each factor
val garchModels = factorReturns.mapValues(ARGARCH.fitModel(_)).toMap
val iidFactorReturns = factorReturns.mapSeriesWithLabel { case (symbol, series) =>
val iidFactorReturns = factorReturns.mapSeriesWithKey { case (symbol, series) =>
val model = garchModels(symbol)
model.removeTimeDependentEffects(series, DenseVector.zeros[Double](series.length))
}

// Generate an RDD of simulations
// val rand = new MersenneTwister()
val seeds = sc.parallelize(0 until numTrials, parallelism)
seeds.map { seed =>
val rand = new MersenneTwister(seed)
val factorPaths = simulatedFactorReturns(horizon, rand, iidFactorReturns, garchModels)
}

// val factorsDist = new FilteredHistoricalFactorDistribution(rand, iidFactorReturns.toArray,
// garchModels.asInstanceOf[Array[TimeSeriesFilter]])
// val returns = simulationReturns(0L, factorsDist, numTrials, parallelism, sc,
@@ -101,7 +107,7 @@ object HistoricalValueAtRiskExample {
mat(i, ::) := iidSeries.data(rand.nextInt(iidSeries.data.rows), ::)
}
(0 until models.size).foreach { i =>
models(iidSeries.labels(i)).addTimeDependentEffects(mat(::, i), mat(::, i))
models(iidSeries.keys(i)).addTimeDependentEffects(mat(::, i), mat(::, i))
}
mat
}
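
The `seeds.map` block in this diff builds a fresh generator from each trial's seed instead of sharing one generator across trials. The sketch below shows the same idea without Spark or Commons Math. It is an illustration under stated assumptions: `java.util.Random` stands in for `MersenneTwister`, a plain `Range` stands in for the parallelized RDD of seeds, and the averaged Gaussian draw is a toy placeholder for `simulatedFactorReturns`. All names in it are hypothetical.

```scala
import java.util.Random

object SeededTrialsSketch {
  /** One trial whose randomness depends only on its own seed. */
  def runTrial(seed: Int, draws: Int): Double = {
    // The generator is created inside the trial, so nothing stateful is shared
    val rand = new Random(seed.toLong)
    var sum = 0.0
    var i = 0
    while (i < draws) {
      sum += rand.nextGaussian()
      i += 1
    }
    sum / draws
  }

  /** Runs trials 0 until numTrials; each trial index doubles as its seed. */
  def runTrials(numTrials: Int, draws: Int): Seq[Double] =
    (0 until numTrials).map(seed => runTrial(seed, draws))
}
```

Because each result is a function of its seed alone, rerunning `SeededTrialsSketch.runTrials(numTrials, draws)` reproduces the same values no matter how the trials are partitioned or ordered.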
@@ -71,6 +71,6 @@ object SimpleTickDataExample {
val iidRdd = slicedRdd.mapSeries(series => ar(series, 1).removeTimeDependentEffects(series))

// Regress a stock against all the others
val samples = iidRdd.toSamples()
val samples = iidRdd.toInstants()
}
}