Merge pull request #53 from AMS-QF/taq_data
Taq data
jasonbohne123 authored Feb 13, 2024
2 parents d5777da + 7b6cb8f commit ea48c5e
Showing 34 changed files with 17 additions and 9,307 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -8,6 +8,8 @@ data_preprocessing/__pycache__
.env
.ipynb_checkpoints/
.DS_Store
.vscode/
__pycache__/

# python
feature_generation/__pycache__/
File renamed without changes.
15 changes: 11 additions & 4 deletions Example_Data_NB.ipynb
@@ -299,7 +299,7 @@
{
"data": {
"text/plain": [
"[<matplotlib.lines.Line2D at 0x7f9888886890>]"
"[<matplotlib.lines.Line2D at 0x7f47c4730550>]"
]
},
"execution_count": 5,
@@ -608,7 +608,7 @@
{
"data": {
"text/plain": [
"[<matplotlib.lines.Line2D at 0x7f9888881510>]"
"[<matplotlib.lines.Line2D at 0x7f47c4761190>]"
]
},
"execution_count": 9,
@@ -881,7 +881,7 @@
{
"data": {
"text/plain": [
"[<matplotlib.lines.Line2D at 0x7f9880654cd0>]"
"[<matplotlib.lines.Line2D at 0x7f47c45d7cd0>]"
]
},
"execution_count": 12,
@@ -1215,7 +1215,7 @@
{
"data": {
"text/plain": [
"[<matplotlib.lines.Line2D at 0x7f98884cf5d0>]"
"[<matplotlib.lines.Line2D at 0x7f47c45a0e90>]"
]
},
"execution_count": 16,
@@ -1587,6 +1587,13 @@
"get_ref(symbols, start_date, end_date, row_limit, columns)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
22 changes: 1 addition & 21 deletions README.md
@@ -1,4 +1,4 @@
# TAQ-Query-Scripts
# TAQ-Data
Included are the client-side scripts for accessing the TAQ-Clickhouse Database remotely with Python.

Instructions for access through the SQL UI DBeaver are included in Accessing the TAQ-Clickhouse Database PDF
@@ -18,35 +18,15 @@ Instructions for access through the SQL UI DBeaver are included in Accessing th
db_pass=
```
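The `.env` file above holds the database credentials. Only the `db_pass=` line is visible in this hunk, and the preceding keys are collapsed. The conda environment lists `python-dotenv`, whose `load_dotenv()` is the usual way to read such a file. The sketch below is a dependency-free illustration of what that step does. `load_env_file` is a hypothetical helper, not part of the repo, and it only handles plain `KEY=value` lines.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Hypothetical helper (not part of the repo): a minimal stand-in for
    python-dotenv's load_dotenv().

    Reads simple KEY=value lines, skips blank lines and '#' comments, strips
    one layer of matching quotes, and exports each key to os.environ without
    overriding variables that are already set. Returns the parsed pairs.
    """
    parsed = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        # Drop matching single or double quotes around the value.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        parsed[key] = value
        # setdefault keeps any value already exported in the shell.
        os.environ.setdefault(key, value)
    return parsed
```

After calling `load_env_file()`, `os.environ["db_pass"]` holds the password from the file, unless `db_pass` was already set in the shell.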

### Spark Pipelines
- For feature generation, transformation, and normalization


### PyTorch Models
- For training and testing models

### Conda Tips
- `environment.yml` is a file that contains all the dependencies for the conda environment
- Will have to update path name on this yml file
- `query_user_environment.yml` is a file that contains all the dependencies for the conda environment for the query user on the server
- To install datatable it is required to install from source repo using `pip install git+https://github.com/h2oai/datatable`
- If conda environment is not working, try to update conda using `conda update -n base -c defaults conda`.

### Sample Data
Sample data of 1000 trade and quote samples are included within the `sample_data` directory. The sample data is stored as gzip compressed csv files. All tutorials will use this sample data.

Feel free to create a directory for your own research called `personal_research` in the root directory of the repo. This directory is ignored by git and can be used to store your own scripts and data

**Do not republish, distribute or utilize the sample data found in this repo for any purposes other than academic research**
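The sample data is described as gzip-compressed CSV files under `sample_data`. The sketch below uses only the standard library to peek at those files. The helper names and the `*.csv.gz` glob pattern are assumptions, because the actual file names are not shown in this diff. If pandas is installed, `pd.read_csv(path)` also works, because it infers gzip compression from the `.gz` suffix.

```python
import csv
import gzip
from pathlib import Path


def list_sample_files(directory="sample_data"):
    """Return the gzip-compressed CSVs in `directory`, sorted by file name.

    The '*.csv.gz' pattern is an assumption about how the files are named.
    """
    return sorted(Path(directory).glob("*.csv.gz"))


def read_gzip_csv(path, limit=None):
    """Read a gzip-compressed CSV into a list of dicts keyed by the header row.

    `limit` optionally caps how many data rows are returned.
    """
    rows = []
    # "rt" decompresses and decodes to text; the csv module expects newline="".
    with gzip.open(path, "rt", newline="") as fh:
        for i, row in enumerate(csv.DictReader(fh)):
            if limit is not None and i >= limit:
                break
            rows.append(row)
    return rows


if __name__ == "__main__":
    # For each file, print its name, its column names, and how many of the
    # first five rows were read.
    for path in list_sample_files():
        head = read_gzip_csv(path, limit=5)
        print(path.name, list(head[0].keys()) if head else [], len(head))
```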


### Internally Setting up a new user
- Requires server username and password with DBUserDev permissions; contact Victor Poon
- Server user groups required: docker, condausers, TAQDatabaseCoreDev, TAQDatabaseUsers
- Requires Database username and password
- Database user groups: QUERY_USER, taq_group


More detailed sketch
- TAQ-Query-Scripts activates the conda env and executes scripts in TAQNYSE-Clickhouse to query the DB programmatically
- Saves the file to a local directory and transfers it to the local machine through SCP
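The workflow above ends by copying a result file from the server to a local machine over SCP. The sketch below shows one way to script that step, assuming OpenSSH `scp` is on the PATH. The user, host, paths, and helper names are placeholders, not values from the repo.

```python
import shlex
import subprocess


def build_scp_command(remote_user, remote_host, remote_path, local_path, port=None):
    """Build the argv that copies remote_path on the server to local_path here."""
    cmd = ["scp"]
    if port is not None:
        # scp takes the port with an uppercase -P (ssh uses lowercase -p).
        cmd += ["-P", str(port)]
    cmd += [f"{remote_user}@{remote_host}:{remote_path}", str(local_path)]
    return cmd


def fetch_result(remote_user, remote_host, remote_path, local_path, port=None, dry_run=True):
    """Copy a query result from the server; with dry_run=True only print the command."""
    cmd = build_scp_command(remote_user, remote_host, remote_path, local_path, port)
    if dry_run:
        print(shlex.join(cmd))
    else:
        # check=True raises CalledProcessError if scp exits non-zero.
        subprocess.run(cmd, check=True)
    return cmd
```

For example, `fetch_result("jdoe", "taq-server.example.edu", "/tmp/trades.csv.gz", "trades.csv.gz", dry_run=False)` runs the copy. This call uses hypothetical values. Passing argv as a list rather than a single shell string avoids local quoting problems with paths that contain spaces.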
28 changes: 0 additions & 28 deletions data_preprocessing/README.md

This file was deleted.

19 changes: 3 additions & 16 deletions environment.yml
@@ -5,35 +5,28 @@ channels:
- bioconda
- anaconda
- defaults
- huggingface
dependencies:
- _libgcc_mutex
- blas
- bzip2
- ca-certificates
- certifi
- charset-normalizer
- clickhouse-sqlalchemy
- exchange-calendars
- huggingface_hub
- idna
- intel-openmp
- ipykernel
- ipywidgets
- joblib
- korean_lunar_calendar
- libffi
- matplotlib
- mkl
- mkl_fft
- mkl_random
- nb_conda_kernels
- numpy-base
- numba
- openssl
- packaging
- pandas_market_calendars
- pip
- pre_commit
- pycparser
- pyluach
- pyopenssl
@@ -43,20 +36,14 @@ dependencies:
- python-dateutil
- python_abi
- python-dotenv
- pytorch
- pytorch-lightning
- pytz
- requests
- setuptools
- scp
- six
- sortedcollections
- scikit-learn
- sqlite
- statsmodels
- tk
- toolz
- transformers
- tzdata
- tzlocal
- urllib3
@@ -66,6 +53,7 @@ dependencies:
- pip:
- brotlipy
- cffi
- clickhouse-driver
- configobj
- cryptography
- greenlet
@@ -74,6 +62,5 @@ dependencies:
- mkl-service
- numpy
- pandas
- pyarrow
- torchsummary
- sqlalchemy
prefix: /home/jbohne/anaconda3/envs