Replies: 7 comments 25 replies
-
Very nice! I'll add a few links and ideas. Consider supporting Boolean datatypes which are available on the following dbms This is nice because users sometimes want to move data from R to the database and other drivers will preserve the datatype as boolean. So if a user changes drivers their code will not need to change.
SQLite does not have a bool type but apparently will store the keyword TRUE as 1 and FALSE as 0. It also does not have date types. duckdb is now past v1 and might be better than sqlite for many of our use cases. Hive, netezza, impala, and pdw, (and possibly synapse?) aren't really supported since we don't have test servers so it seems like all OHDSI databases except sqlite have boolean datatypes. DBITest We could try to improve DatabaseConnectors DBI support by implementing more tests from the DBITest package. https://dbitest.r-dbi.org/ dbplyr integration This PR might work with dbplyr. Since DatabaseConnector supports so many dialects I guess we need to set the class of the connection type dynamically when the connection is created. I think we should keep renderTranslateQuerySql() and renderTranslateExecuteSql() simply because they are used a lot in ohdsi code. To make the package leaner we could make the dependency on SqlRender optional (Suggests rather than Depends). Possible issues to fix: I got this when I add the PqConnection class to a databaseconnector connection |
Beta Was this translation helpful? Give feedback.
-
I've added the steps we agreed on as issues that are part of the v7 milestone. If possible, I'd like to have this done by end of Q7, but currently I'm waiting on step 1 (swapping out the back ends). |
Beta Was this translation helpful? Give feedback.
-
This approach using dplyr...this will make the api to sql translation an R-exclusive function then? What about those parties that are producing sql via python or Java? |
Beta Was this translation helpful? Give feedback.
-
My experience of dbplyr translations has been that they're almost always inefficient and I have had more than one situation where I've had to do a complete re-write of code. In general, my opinion on ORM style systems (from experience with Python, php, C# and Java based mechanisms) is that you should write SQL. Dbplyr might be fine for quick, user friendly scripts but I don't think it should be used in production ready situations and not for big studies. This should especially be a consideration in 'pay-per-query-credit' databases such as databricks and snowflake. |
Beta Was this translation helpful? Give feedback.
-
I was also wondering if we could discuss the approach to casing of column names (uppercase, lowercase) that is used in DatabaseConnector. I think the way this work is that you can pass any case into insertTable (upper, lower, or mixed). That might end up in the database as uppercase or lowercase depending on the dbms. When you use querySql you will always read the column names out as uppercase. What if we had an option to return column names as upper, lowercase, or whatever they are in the database without casting the case? |
Beta Was this translation helpful? Give feedback.
-
I'm making some progress on the boolean type support #285 . I'm putting all my changes on one branch which is a lot easier for me since I can just focus on getting a single branch in a good state. The PR (work in progress) is here #286. I found it easier to pass integers between R and Java than logical types. The reason is that Java's native boolean type does not have an NA/null value while R's logical type does. Java has a Boolean class that does support NA/null but that does not get automatically converted to R's logical type by RJava. The int type seems to be working fine though. The data is still stored as boolean in the database and in R but is temporarily represented as integers when being passed between java and R. So far I've only been testing on postgres. |
Beta Was this translation helpful? Give feedback.
-
I also realized we can't yet remove OracleTempSchema because FeatureExtraction still requires it. It would be nice to remove this in the next major release. OHDSI/FeatureExtraction#126 |
Beta Was this translation helpful? Give feedback.
-
When DatabaseConnector was first created in April 2014 DBI was still in its infancy, with no drivers for most platforms, and dbplyr did not exist yet. In the 10 years since then, the landscape has changed dramatically, also forcing us to re-evaluate the role of DatatabaseConnector.
I believe there still is a place for DatabaseConnector. DBI is a rather loose interface, leaving many per-platform idiosyncrasies. I propose DatabaseConnector is the place where those idiosyncrasies are resolved, so that anybody wanting to write analytics packages that run on all platforms we support in OHDSI, they only need to know about DatabaseConnector, and not have to worry about each individual platform. In other words, if DatabaseConnector does its job, no other OHDS package should have code that looks like
if dbms == 'postgresql' do this else do this
. All that should be handled by DatabaseConnector. In that sense, I think of DatabaseConnector as the 'last mile' solver: it standardizes what DBI doesn't. Hopefully in the future we won't need DatabaseConnector anymore, and we should work towards that.Translation mechanisms
Each database platform speaks its own SQL dialect. We initially developed SqlRender so OHDSI packages would write SQL once (in what we now call OhdsiSql), and we can translate that to all other dialects. Since then, dbplyr has been developed, which allows constructing platform-specific SQL by chaining R commands. Each has its pros and cons:
SqlRender pros:
DATE_ADD()
,DATE_DIFF()
,YEAR()
)SqlRender cons:
dbplyr pros:
dbplyr cons:
I therefore think for the foreseeable future we want to support both SqlRender and dbplyr.
Database connections
Currently most connections are supported via JDBC drivers, although SQLite and DuckDB now use their respective DBI drivers. I think we should be agnostic to the type of driver used, and perhaps we can switch some of the JDBC drivers to DBI. I do still think that
Proposed high-level changes
For DatabaseConnector 7 I suggest we lean more into the role of last-mile solver. DatabaseConnector should get leaner, offloading as much as possible to other packages (e.g. dbplyr support should be done entirely by back-ends not in DatabaseConnector). I also suggest we more closely align with DBI, and some of its unwritten conventions. (e.g. DatabaseConnector uses
databaseSchema
, whereas DBI unofficially usescatalog
andschema
).I'm unsure what to do with
renderTranslateQuerySql()
andrenderTranslateExecuteSql()
. I use these functions all the time, because they (a) translate on the fly, (b) throw nice shareable error reports, and (c) have progress bars. However, they are not part of the DBI specification.Beta Was this translation helpful? Give feedback.
All reactions