SQLDF is a system for SQL on data frames, optimised for memory.
Explore the docs »
Report Bug
·
Request Feature
·
Reference Manual
SQLDF make it easy to query data frames. SQL is the de-facto standard for data manipulation, with ample learning resources and SQL skills are ubiquitous. Although any query that can be done in SQL can also be done with the data frame API, SQL will be easier for most data scientists starting out with Lisp Stat. It is similar to the R package of the same name.
The price for convenient queries is memory. At worst twice the data set memory is used: one for the original data frame, and one for the in-memory SQLite database that is constructed to query. SQLDF automatically creates this in-memory database, creates the tables and schema corresponding to the data frame, transfers the data and performs the query. The query results are returned as another data frame. This is suprisingly fast and memory is typically not an issue on data workstations.
To get a local copy up and running follow these steps:
An ANSI Common Lisp implementation. Developed and tested with SBCL and CCL.
To make the system accessible to ASDF (a build facility, similar to make
in the C world), clone the repository in a directory ASDF knows about. By default the common-lisp
directory in your home directory is known. Create this if it doesn't already exist and then:
- Clone the repositories
cd ~/common-lisp && \
git clone https://github.com/Lisp-Stat/data-frame.git && \
git clone https://github.com/Lisp-Stat/sql-df && \
git clone https://github.com/Lisp-Stat/select && \
git clone https://github.com/TeMPOraL/cl-sqlite.git
- From the REPL reset the ASDF source-registry to find the new systems:
(asdf:clear-source-registry)
- Load the system
(asdf:load-system :sql-df)
If you have installed the slime ASDF extensions, you can invoke this with a comma (',') from the slime REPL.
To get the third party systems that these system may depend on, you can use a dependency manager, such as Quicklisp or CLPM Once installed, get the dependencies with either of:
(clpm-client:sync :sources "clpi") ;sources may vary
(ql:quickload :sql-df)
You need do this only once. After obtaining the dependencies, you can
load the system with ASDF
as described above without first syncing
sources.
Load the iris data set from R:
(asdf:load-system :lisp-stat)
(asdf:load-system :lisp-stat/rdata)
(defdf iris
(read-csv 'rdata:iris)))
and query it:
(pprint (sqldf:sqldf "select species, count(*) from iris group by species"))
;; SPECIES COUNT(*)
;; 0 setosa 50
;; 1 versicolor 50
;; 2 virginica 50
For more examples, please refer to the Documentation.
SQLDF is currently written using an apparently abandoned library, cl-sqlite. Pull requests from 2012 have been made with no response from the author, and the SQLite FFI interface has improved considerably in the 12 years since it was last updated.
We choose CL-SQLite because, at the time of writing, it was the only SQLite library with a commercially acceptable license. Since then, CLSQL has migrated to a BSD license and is a better option for new development. Not only does it support CommonSQL, the de-facto SQL query syntax for Common Lisp, it also supports several additional databases.
All new development on SQLDF will be on CLSQL, possibly including some of the CSV and other extensions available in SQLite. Benchmarks show that SQLite's CSV import is about 15x faster than cl-csv, and a FFI wrapper of SQLite's CSV importer would be a good addition to Lisp-Stat.
Also see the open issues for a list of proposed features (and known issues).
This system is part of the Lisp-Stat project; that should be your first stop for information. Also see the community page for more information.
Contributions are what make the open source community such an amazing place to be learn, inspire, and create. Any contributions you make are greatly appreciated. Please see CONTRIBUTING.md for details on the code of conduct, and the process for submitting pull requests.
Distributed under the MS-PL License. See LICENSE for more information.
Project Link: https://github.com/lisp-stat/sqldf