Potential Projects
Warning: Wiki should not be edited directly. Edit the files in the ./wiki/ folder instead and make a PR.
For anyone getting started with developing for Drasil, the following thread discusses good candidate first/early issues: Starting issues.
The projects listed below are intended to improve Drasil; their scope is generally larger than a single issue (Drasil Issue Tracker). Whereas a good issue should generally be closeable with less than a week of effort, the projects listed here will likely take longer. Moreover, not all of the project details have been worked out, so the path to closure for each project still needs to be determined. Each project will likely be completed by decomposing it into a series of issues.
All of the projects are larger than a single issue, but beyond that characterization there is considerable variability in their scope. Some are suitable for a summer student research project, while others would be more appropriate for an MEng, Masters or PhD project.
The information given for each project is just a starting point. All of the potential projects require further thought and refinement. Initial versions of several of the potential projects are given in the SE4SC repo (not public), which also provides some additional brainstormed ideas.
Pandoc is a Haskell library for converting from one markup format to another. Instructions on using Pandoc are available at: Pandoc Web-page for Users, while the code itself is maintained in the following repo: Pandoc GitHub Repo. From the Pandoc GitHub page: > Pandoc has a modular design: it consists of a set of readers, which parse text in a given format and produce a native representation of the document (an abstract syntax tree or AST), and a set of writers, which convert this native representation into a target format.
Pandoc's AST should be compared to the Drasil representation of a document. We could use the comparison to improve DocLang, or perhaps even replace DocLang with Pandoc. The readers and writers may also be helpful for converting between different document formats. The data types that show up in Pandoc overlap with some of the things we do in our printers.
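To make the comparison concrete, the following is a deliberately simplified sketch of the reader/writer idea around a Pandoc-style AST. The real types live in pandoc's `Text.Pandoc.Definition` and carry many more constructors and attributes; the constructors and writer below are illustrative only.

```haskell
-- Simplified sketch of a Pandoc-style native document representation.
-- The real pandoc AST has many more constructors (tables, code blocks, ...).
data Inline = Str String | Emph [Inline]
  deriving (Eq, Show)

data Block = Header Int [Inline] | Para [Inline]
  deriving (Eq, Show)

newtype Doc = Doc [Block]
  deriving (Eq, Show)

-- A "writer" in Pandoc's sense: convert the native AST to one target format.
inlineToMd :: Inline -> String
inlineToMd (Str s)   = s
inlineToMd (Emph is) = "*" ++ concatMap inlineToMd is ++ "*"

blockToMd :: Block -> String
blockToMd (Header n is) = replicate n '#' ++ " " ++ concatMap inlineToMd is
blockToMd (Para is)     = concatMap inlineToMd is

writeMd :: Doc -> String
writeMd (Doc bs) = unlines (map blockToMd bs)

main :: IO ()
main = putStr (writeMd (Doc [Header 1 [Str "Example"], Para [Emph [Str "hello"]]]))
```

A Drasil printer plays the same role as `writeMd` here, which is why the overlap with pandoc's writers is worth investigating.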
The organization of the code for Drasil has been refactored as patterns emerge. In particular, when opportunities for reuse are observed, the code is changed to facilitate this. We are implicitly capturing scientific information and the relationship between different theories, definitions, etc. If we could make this information explicit, it could facilitate future additions of knowledge to Drasil. An ontology of scientific knowledge would be useful in its own right, especially one that is backed up by being (at least partially) formalized. An informal ontology for scientific knowledge is currently implicitly available in the organization of books, papers, journals, curricula etc, but we are not aware of a formal model of scientific knowledge.
Capturing scientific knowledge will require categorizing concepts and determining their properties and the relations between them. As knowledge is gained, errors and inconsistencies in Scientific Computing Software (SCS) can be avoided at earlier and earlier stages. For instance, as an ontology of physics knowledge takes form, Drasil will know that the concept of the length of a beam makes sense, but the "length" of water does not. Similarly, Drasil will be able to generate warnings or error messages when a variable representing length is assigned a negative value, when Poisson's ratio is outside of the admissible range [0, 0.5], or when a fluid mechanics theory is used outside the laminar flow range (Reynolds number less than about 2100 for flow in a pipe).
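The admissible-range checks mentioned above can be sketched very simply. The names and representation below are illustrative, not Drasil's actual constraint machinery; only the Poisson's ratio range comes from the text.

```haskell
-- Sketch of an admissible-range check of the kind an ontology could drive.
-- Nothing means "unbounded" on that side.
data Quantity = Quantity
  { qName :: String
  , qMin  :: Maybe Double
  , qMax  :: Maybe Double
  }

checkValue :: Quantity -> Double -> Either String Double
checkValue q v
  | maybe False (v <) (qMin q) = Left (qName q ++ " below admissible range")
  | maybe False (v >) (qMax q) = Left (qName q ++ " above admissible range")
  | otherwise                  = Right v

-- Poisson's ratio is admissible in [0, 0.5] for physically meaningful materials.
poisson :: Quantity
poisson = Quantity "Poisson's ratio" (Just 0) (Just 0.5)

main :: IO ()
main = do
  print (checkValue poisson 0.3)
  print (checkValue poisson 0.7)
```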
Some sample ontologies that might be relevant include:
An overview of ontologies is given on Wiki Page on Ontology Related Definitions
Developing our own ontology can be driven by the Drasil examples. As the information is organized for reuse, an ontology should naturally arise. For instance, the SSP example has the local concept of stress, but this really is a concept that applies to any example in continuum mechanics. However, the concept of effective stress should be reserved for continuum mechanics problems where the material is granular, such as a soil. This example is discussed as part of a pull request.
This ontology is based on scientific project knowledge that Drasil already captures. It is not the domain-specific scientific knowledge (discussed in another potential project); rather, it covers things like people, documents, expressions, references, etc. The plan is to encode the knowledge using OWL (the Web Ontology Language). OWL languages have a formal semantics and are built on the Resource Description Framework (RDF).
The initial steps are:
- collect all of the objects that Drasil can directly talk about [there are actually very few!]
- collect all the classes of objects that Drasil knows about [there are more]
- determine all the properties that Drasil can encode
These are all in the Drasil code (in multiple packages).
Some useful links include:
- https://en.wikipedia.org/wiki/Upper_ontology
- http://dublincore.org/ and https://en.wikipedia.org/wiki/Dublin_Core
- https://www.w3.org/wiki/Lists_of_ontologies
- http://info.slis.indiana.edu/~dingying/Teaching/S604/OntologyList.html
We will look for ontologies that are close to what we can already talk about, and then point out the differences between the prior work and Drasil.
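As a hypothetical first step, the objects, classes, and properties collected from the Drasil code could be recorded as subject-predicate-object triples, the shape that RDF/OWL would then formalize. All names below are made up for illustration.

```haskell
-- Hypothetical triples recording Drasil-ish classes and properties.
-- RDF/OWL would give these a formal semantics; this is just the shape.
type Triple = (String, String, String)

drasilTriples :: [Triple]
drasilTriples =
  [ ("Person",     "isA",          "Class")
  , ("Document",   "isA",          "Class")
  , ("Expression", "isA",          "Class")
  , ("author",     "isPropertyOf", "Document")
  , ("author",     "hasRange",     "Person")
  ]

-- Look up everything said about one subject.
about :: String -> [Triple] -> [(String, String)]
about s ts = [ (p, o) | (s', p, o) <- ts, s' == s ]

main :: IO ()
main = mapM_ print (about "author" drasilTriples)
```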
Test cases can be generated from the typical inputs, data constraints, and the properties of a correct solution. Generating test cases from the data constraints will involve improving the constraint representation, as partially introduced in #1220.
The test case generation facility should also be incorporated into the Travis continuous integration system, so that the generated code for the case studies is automatically tested with each build. Currently we test that the generated code compiles, but we do not test to see if it passes any test cases.
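One simple way to derive test cases from data constraints is boundary-value generation: values just inside the range should be accepted and values just outside rejected. This sketch assumes a single closed interval; the epsilon is an illustrative choice.

```haskell
-- Sketch: derive candidate test inputs from a data constraint's bounds,
-- paired with whether the generated input should be accepted as valid.
boundaryCases :: Double -> Double -> [(Double, Bool)]
boundaryCases lo hi =
  [ (lo - eps,        False)  -- just below the range
  , (lo,              True)   -- lower boundary
  , ((lo + hi) / 2,   True)   -- interior point
  , (hi,              True)   -- upper boundary
  , (hi + eps,        False)  -- just above the range
  ]
  where eps = 1.0e-3

main :: IO ()
main = mapM_ print (boundaryCases 0 0.5)
```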
Cardiovascular diseases are a leading cause of death globally. Lives could potentially be saved if we had a means of early detection of different diseases. Automatic construction of a 3D model of the aorta from CT scans would help researchers and clinicians. Currently 3D aorta reconstruction is done with a mix of manual and automatic tools. Improving the automation would save significant time, which could mean shorter visits to the doctor, and possibly fewer visits.
A tool for automatic reconstruction does not currently exist. An interesting case study of Drasil would be to build this tool, and the associated documentation, using Drasil facilities. The existing Drasil examples were created a priori and then translated into Drasil, so this would be an opportunity to develop an example within Drasil from the start.
Jupyter notebooks are commonly used as "worksheets" that present the theory, code, and computational results together. This is just a different view of the same information that is already in Drasil. Showing that we can generate Jupyter notebooks would highlight the flexibility of Drasil. It would also highlight the kind of knowledge that we can manipulate.
The examples for the Jupyter notebook could start with simple physics examples, possibly borrowing ideas from Learn You a Physics for Great Good.
Many potential simple physics problems are given at My Physics Lab. Another great resource for physics is Dyna-Kinematics, which has very pretty animations and would make a good showcase.
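Generating a notebook is mostly a serialization problem: the `.ipynb` format is JSON (nbformat 4) with a list of cells. This hand-rolled sketch only handles plain ASCII cell text; a real implementation would use a JSON library with proper escaping.

```haskell
import Data.List (intercalate)

-- Sketch: emit a minimal Jupyter notebook (nbformat 4) as JSON text.
-- No escaping is done, so the cell source must be plain ASCII.
markdownCell :: String -> String
markdownCell src =
  "{\"cell_type\":\"markdown\",\"metadata\":{},\"source\":[\"" ++ src ++ "\"]}"

notebook :: [String] -> String
notebook cells =
  "{\"cells\":[" ++ intercalate "," cells
  ++ "],\"metadata\":{},\"nbformat\":4,\"nbformat_minor\":5}"

main :: IO ()
main = putStrLn (notebook [markdownCell "# Projectile motion"])
```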
Markdown is a small markup language for creating formatted text. It is easier for humans to write, read and edit than html. Jupyter notebooks often use markdown for their text, rather than more complicated html code. Drasil generating Markdown should be relatively easy and it would give another nice example. We could incorporate it into the generated Jupyter notebooks. Moreover, Markdown would give us a nice example of an internal program family, since there are several variants including the GitHub Flavoured Markdown and Markdown Extra.
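The program-family angle can be made concrete with a single construct that varies by flavor. For example, GitHub Flavoured Markdown supports `~~strikethrough~~`, while plain Markdown needs an HTML fallback; the function below is an illustrative sketch, not Drasil code.

```haskell
-- Sketch of Markdown as a small program family: the same element rendered
-- differently depending on the flavor.
data Flavor = Plain | GitHub
  deriving (Eq, Show)

strikethrough :: Flavor -> String -> String
strikethrough GitHub s = "~~" ++ s ++ "~~"       -- GFM syntax
strikethrough Plain  s = "<del>" ++ s ++ "</del>" -- HTML fallback

main :: IO ()
main = do
  putStrLn (strikethrough GitHub "deprecated")
  putStrLn (strikethrough Plain  "deprecated")
```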
If information is missing, Drasil should inform the user. The following information can be checked:
- Necessary information is provided, or explicitly indicated as not applicable. Necessary information could include properties of a correct solution. The properties of a correct solution are easy to neglect at the early stages, but attention to this detail can definitely pay dividends down the road.
- The number of inputs is sufficient to find the output of a given equation.
- Every "chunk" is used at least once. If a theoretical model, for instance, is never referenced elsewhere in the documentation, then it is likely irrelevant for the given problem. As another example, all assumptions should be invoked somewhere, or they shouldn't be in the documentation.
- It seems likely that every instance model should be invoked by at least one requirement. Automatic generation of the traceability between requirements and instance models will help determine whether this is a realistic check.
- Add "sanity checkers" to review the Drasil code. These checkers should prevent "silly" mistakes. For instance, if there is a min and a max specified for data constraints, the min should be less than (or equal to?) the max.
- Check for words that usually indicate a problem with a specification document. The first three categories come from Ron Patton's Software Testing (2nd edition); the last comes from Dr. Smith.
- Potentially unrealistic: always, every, all, none, certainly, therefore, clearly, obviously, evidently
- Potentially vague: some, sometimes, often, usually, ordinarily, customarily, most, mostly, good, high-quality, fast, quickly, cheap, inexpensive, efficient, small, stable
- Potentially incomplete: etc., and so forth, and so on, such as, handled, processed, rejected, skipped, eliminated, if . . . then . . . (without "else" or "otherwise")
- Potentially nonatomic: and (in requirements)
It should be possible to turn the checks off, since there could be cases where the user wants to ignore the warnings.
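Two of the checks above can be sketched directly: the min/max consistency check for data constraints, and a scan for words that often signal a vague specification. The word list below is a small illustrative subset of the lists given above, and the data types are not Drasil's.

```haskell
import Data.Char (toLower)

-- Sanity check: a data constraint's min should not exceed its max.
data Constraint = Constraint { cMin :: Double, cMax :: Double }

minLeqMax :: Constraint -> Bool
minLeqMax c = cMin c <= cMax c

-- Scan specification text for potentially vague words (illustrative subset).
vagueWords :: [String]
vagueWords = ["some", "often", "usually", "fast", "cheap", "efficient"]

flagVague :: String -> [String]
flagVague text = filter (`elem` vagueWords) (words (map toLower text))

main :: IO ()
main = do
  print (minLeqMax (Constraint 0 0.5))
  print (flagVague "The system shall usually respond fast")
```

A real checker would of course report locations and support turning individual warnings off, as noted above.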
The following Drasil case studies do not generate code: Game Physics, SSP and SWHS. These examples should be completed. The SRS for Game Physics also needs to be carefully reviewed. As it stands, the inputs and outputs for the game physics library are neither complete nor consistent.
Getting SSP working will mainly require hooking it into an external library for optimization. The same pattern used for external libraries with noPCM can be applied to SSP, but with optimization libraries.
When revising the game physics example, the following issues are worth considering:
- Boundary case of collision where velocity and surface normal are perpendicular
- TM:NewtonSecLawRotMot should reference DD:torque
The following projects could be added to Drasil. They are suggested for one or more of the following reasons: they would interest potential students, they cover an area not addressed by the current Drasil examples, or they are more ambitious than the current Drasil examples:
- machine learning
- discrete probability density function
- family of data fitting algorithms
- family of finite element analysis programs
- family of convex hull algorithms
Drasil currently focuses on physics-based examples. Adding general purpose research software tools would be helpful, since they provide a bridge between the physics problems and how the problems are solved numerically. For instance, the fitting used in GlassBR and in SFS (Software for Solidification (not in Drasil)) could be made much more generic. We could have a family of fitting algorithms that could be used in any situation where fitting is required. A proper commonality analysis of this domain could expose the design decisions that bridge between the requirements and the design. In the SFS example many different fitting routines were tried. If the experiments could have been done easily via a declarative specification, considerable time would have been saved. If the experiments are combined with automated testing and "properties of a correct solution", human involvement could be reduced, giving partially automated algorithm selection.
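A family of fitting routines could share one interface, with members differing only in the model they fit. As a sketch (not a commonality analysis), here is the interface with a single closed-form linear least-squares member; polynomial or spline members would share the same signature.

```haskell
-- One interface for a family of fitting routines: points in,
-- model parameters out. Here the model is a line, so the parameters
-- are (slope, intercept).
type Fit = [(Double, Double)] -> (Double, Double)

-- Closed-form linear least squares.
linearFit :: Fit
linearFit pts = (slope, intercept)
  where
    n   = fromIntegral (length pts)
    sx  = sum (map fst pts)
    sy  = sum (map snd pts)
    sxy = sum (map (\(x, y) -> x * y) pts)
    sxx = sum (map (\(x, _) -> x * x) pts)
    slope     = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n

main :: IO ()
main = print (linearFit [(0, 1), (1, 3), (2, 5)])
```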
Drasil can currently generate requirements documentation and code. We should be able to write recipes for writing physics based papers. Since we have started examples on projectile motion, we should be able to generate a physics based paper like the following:
Projectile Trajectory of Penguin's Faeces and Rectal Pressure Revisited
(The paper title might sound like a joke, but it is actually a real world application of the use of physics.) 😄
Drasil currently supports one task as implemented by an external library: solving an initial value problem (IVP) for a linear ordinary differential equation (ODE). Adding external library support could start with improving ODEs. From a conversation with Brooks: "All of the ODE variations (coupled, BVPs, higher order) are not yet working, but are close. ... Any of these would require encoding the appropriate library as data, as well as some minor infrastructure changes to have the right data structures in the right places (for example, the `ODEInfo` we currently use is tailored specifically to IVPs, so we would need a new data structure for BVPs, or re-work `ODEInfo` to be general enough to work for both cases). Notably, when I designed `ODEInfo` I had ODE systems in mind, so for example `ODEInfo` supports a list of ODE equations instead of just one, though all of the example library encodings I created were for single ODEs. Thus, coupled ODEs are likely even closer than the other variations you mentioned." After the ODE variations, support for solving a linear system of equations (Ax = b) seems like a good candidate, since linear systems come up often.
To prevent Drasil users from building expressions that have inconsistent units, a proper type system could be added for the units. An example is available for Isabelle. Since scientists and engineers are not always as careful as they should be with units, an option should be available to issue warnings rather than type errors.
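One standard Haskell technique for this is phantom types: tag each quantity with its unit at the type level, so mixing units is a compile-time error rather than a runtime surprise. This is a minimal sketch of the idea, not a full dimensional analysis (it cannot, for example, derive the unit of a product).

```haskell
{-# LANGUAGE DataKinds, KindSignatures #-}
import GHC.TypeLits (Symbol)

-- A quantity tagged with a phantom unit label.
newtype Qty (u :: Symbol) = Qty Double
  deriving (Eq, Show)

-- Addition only type-checks when both operands carry the same unit.
addQ :: Qty u -> Qty u -> Qty u
addQ (Qty a) (Qty b) = Qty (a + b)

distance :: Qty "m"
distance = Qty 3.0

height :: Qty "m"
height = Qty 4.0

duration :: Qty "s"
duration = Qty 2.0

main :: IO ()
main = print (addQ distance height)
  -- addQ distance duration is rejected by the type checker:
  -- couldn't match "s" with "m"
```

Turning the type error into a warning, as suggested above, would require a different mechanism than plain type checking.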
We should be able to add knowledge and recipes to Drasil so that we can reproduce an existing case study on Verifying a Cruise Control System using Simulink and SpaceEx.
We should do what we preach (also known as eating your own dog food). So we should be able to write down the requirements, specification, design, etc., of Drasil inside Drasil. Right now, the first big impediment to that is that our only backend is GOOL, which specifically targets OO languages. We would have to have a Haskell backend as well. A Haskell backend is not really all that difficult, at least if it is done in parallel with GOOL. Having it as part of GOOL seems infeasible, as GOOL is deliberately OO-specific. There are of course parts of the expression language that could be reused, but it is not clear whether that is worth it.
It is, of course, not that simple. The next, even larger, problem is that our specification language really does not let us talk about representations (i.e. what would eventually become data-structures). So that would be needed too.
Luckily, the whole project could be done incrementally, meaning that various pieces of Drasil could be generated and plugged into the hand-written code. drasil-lang is probably "best understood", for example; drasil-docLang is at the opposite end, while drasil-data is close to drasil-lang in being understood.
Such a task would most likely entail de-embedding drasil-example (and drasil-data), meaning that we would need an external syntax for describing the examples. That would definitely be a good thing. The same goes for the fundamental knowledge in drasil-data; it should get an external syntax.
Most importantly, this would require an analysis of all of our own softifacts, classification of what knowledge is in them, how to capture that knowledge, and what recipes to reproduce those softifacts would look like. That would be some of the most interesting parts of the work. Some of the meta-structure of the code in drasil-lang, i.e. what's a data-structure, what is a class, would require some very careful thinking. The wonderful thing about that is that all that information is design information, something that we don't have enough of right now.
We should eventually check Drasil against existing checklists. Using checklists we may find some ways to improve our artifacts and infrastructure. For instance, I don’t think we explicitly mention known bugs (or, in our case, explicitly state the current scope/limitations of Drasil). I also don’t think we have a test coverage metric. Maybe Haskell tooling (such as HPC, the Haskell Program Coverage tool) can tell us which lines of Haskell are exercised when we generate all of our examples? If there are files that aren’t used, that would be useful information.
Checklists can also be applied to our generated programs. We could look to see what exists for our complete examples, like GlassBR. This might give us some "low-hanging fruit" that will improve the relevance of our examples. For instance, we could likely easily add an AUTHORS file and similar artifacts. It would be more work to add a design document, but GlassBR does have an assumed design that we could roughly document.
If we go through the exercise of “grading” Drasil and one of our generated case studies, we could get a nice “to do” list.
Here are some good checklists to start with, in rough order of preference:
Software Sustainability Institute Form
Scottish Covid Response Modelling Software Checklist
DLR Software Engineering Guidelines
We also have a list of more checklists.
The common artifacts recommended by the different software development guidelines are summarized in Table 3 of the "Digging Deeper" paper.
This project can be split into sub-projects:
- grade Drasil
- grade Drasil output
- analyze/prioritize each of the above; priority should weight both ease of implementation and ‘impact’
- create tasks to fix the highest priority items
Drasil currently uses external libraries, like scipy, for their ODE solvers. As Drasil development continues we'll be adding more external libraries. Currently we keep copies of the external libraries we are using as part of our repo. This isn't practical in the long-term, given the amount of space the libraries will consume. The current approach is made even worse because the copies of external libraries aren't currently shared between examples. We have sorted out that problem with symbolic links (#2980), but this is not an elegant solution. The question remains of how we should handle dependencies. Should we use a package manager (like nix, homebrew (for Macs), etc.)? Should we just generate instructions for the user and let them install the dependencies? Should we generate files for the common `./configure`, `make`, `make install` chain of commands?
In the Drasil-generated LaTeX code, long equations can run off the edge of the page. For instance, the equation for IM:calOfAngularAcceleration1 in the double pendulum example is cut off:
Ideally we would like to generate code that is automatically formatted to fit on the page. As discussed in #718, full automation is likely too difficult with the information Drasil currently has access to. Drasil would need to know more, like the page width, font size etc. Rather than full automation, we can aim for providing the user with access to options that more aggressively break equations across lines. This topic, and some ideas for how to split equations across lines are discussed in #718.
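As an illustration of the kind of output such an option could produce, amsmath's `multline` environment lets the generator break an equation at an explicit point. The equation below is the standard double-pendulum angular acceleration formula from the usual derivation, split into two terms over a common denominator; its notation may differ from the actual IM:calOfAngularAcceleration1.

```latex
\begin{multline}
  \alpha_1 =
    \frac{-g(2m_1 + m_2)\sin\theta_1 - m_2 g \sin(\theta_1 - 2\theta_2)}
         {L_1\bigl(2m_1 + m_2 - m_2\cos(2\theta_1 - 2\theta_2)\bigr)} \\
    - \frac{2\sin(\theta_1 - \theta_2)\, m_2
            \bigl(\omega_2^2 L_2 + \omega_1^2 L_1 \cos(\theta_1 - \theta_2)\bigr)}
           {L_1\bigl(2m_1 + m_2 - m_2\cos(2\theta_1 - 2\theta_2)\bigr)}
\end{multline}
```

The `split` and `aligned` environments offer finer control over alignment of the broken lines, which may suit some equations better.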
Drasil could be taught to generate graphs (bar charts, line charts, scatter plots, etc.) from available data. The data could be entered into Drasil directly, or it could be an output of calculations. To do this we would need to understand the vocabulary of graphs. A starting point could be the prior work on infographics.
As with any scientific problem, visualization is helpful, intriguing, and encouraging to those pursuing science. For our case studies that produce code, we've already gone through the work of understanding how the calculation works. As an alternative "view" of our body of knowledge gathered to generate "solving code", we can similarly generate web apps that simulate the same problems using visual information. In particular, generating graphs and diagrams can be helpful in understanding how abstract theories work and how they concretely appear.
For example, myPhysicsLab and JavaLab are both web projects with similar objectives to this idea: simulating physics experiments.
Elm is a functional language, similar to Haskell, and follows a specific Model-View-Update architecture, and a potential target language for our simulation software artifacts. Regarding the MVU architecture, we should be able to create a "model" that contains the notable symbols we have in our case studies, a "view" that displays the options for the simulations and the simulation according to our "model", and an "update" module that updates the "model" according to changes in the options (and over time). The "update" and "model" components should largely be based on our existing "code" generation. Elm does not necessarily need to be our target language for simulation software, but it is a user-friendly and commonly used language.
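The MVU shape itself is language-neutral; here it is sketched in Haskell for illustration (the model fields and messages are made up). In a generated simulation, the model would hold the case study's symbols and `update` would apply the generated calculations.

```haskell
-- Sketch of Elm's Model-View-Update shape, in Haskell for illustration.
data Model = Model { angle :: Double, time :: Double }
  deriving (Eq, Show)

data Msg = SetAngle Double | Tick Double

-- "update": advance the model in response to a message.
update :: Msg -> Model -> Model
update (SetAngle a) m = m { angle = a }
update (Tick dt)    m = m { time = time m + dt }

-- "view": render the model. Elm would produce Html Msg instead of a String.
view :: Model -> String
view m = "angle = " ++ show (angle m) ++ ", t = " ++ show (time m)

main :: IO ()
main = putStrLn (view (update (Tick 0.1) (Model 0.5 0)))
```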
Moved here from Issue #1182.
Currently, Drasil has a home-grown 'units' handling module. This project would investigate whether we can switch to a third-party solution.
dimensional is a potential third-party package.
- Does the package have a notion of Expr? If so, investigate to see compatibility.
- Investigate how to interface with the package
- Add the interface
The issue can be reopened and assigned once the project is taken on.