Update for signac 2.0 and general cleanup (#183)
* Remove unnecessary setup.py

* Update recommended Python version.

* Remove --user recommendation.

* Update quickstart.

* Update tutorial

* projects

* job

* querying

* flowproject

* remove collection

* configuration

* Add pre-commit config

* Add codespell.

* community

* Put back original hooks.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Apply Bradley's suggestions

Co-authored-by: Bradley Dice <[email protected]>

* Fix job reference.

Co-authored-by: Corwin Kerr <[email protected]>

* Fix a/an.

Co-authored-by: Corwin Kerr <[email protected]>

* Clarify analogy to primary keys.

* Remove from "data space" terms

* Small wording

* Update docs/source/flow-project.rst

Co-authored-by: Corwin Kerr <[email protected]>

* Apply Bradley's batch 2 suggestions from code review

Co-authored-by: Bradley Dice <[email protected]>

* Apply suggestions from code review

Co-authored-by: Bradley Dice <[email protected]>

* Don't use tilde in example path

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Corwin Kerr <[email protected]>
Co-authored-by: Bradley Dice <[email protected]>
4 people authored Mar 30, 2023
1 parent 2bc457c commit a4dbccb
Showing 17 changed files with 226 additions and 427 deletions.
4 changes: 4 additions & 0 deletions .codespellrc
@@ -0,0 +1,4 @@
+[codespell]
+builtin = clear
+quiet-level = 2
+ignore-words-list = musil
10 changes: 10 additions & 0 deletions .pre-commit-config.yaml
@@ -59,3 +59,13 @@ repos:
           - --sort-fields=title,shorttitle,author,year,month,day,journal,booktitle,location,on,publisher,address,series,volume,number,pages,doi,isbn,issn,url,urldate,copyright,category,note,metadata
           - --remove-empty-fields
           - --no-remove-dupe-fields
+  - repo: https://github.com/pre-commit/pygrep-hooks
+    rev: 'v1.10.0'
+    hooks:
+      - id: rst-backticks
+      - id: rst-directive-colons
+      - id: rst-inline-touching-normal
+  - repo: https://github.com/codespell-project/codespell
+    rev: v2.2.4
+    hooks:
+      - id: codespell
2 changes: 1 addition & 1 deletion docs/source/aggregation.rst
@@ -12,7 +12,7 @@ This chapter provides information about passing aggregates of jobs to operation
 Definition
 ==========

-An :py:class:`~flow.aggregator` creates generators of aggregates for use in operation functions via `FlowProject.operation`.
+An :py:class:`~flow.aggregator` creates generators of aggregates for use in operation functions via :attr:`FlowProject.operation`.
 Such functions may accept a variable number of positional arguments, ``*jobs``.
 The argument ``*jobs`` is unpacked into an *aggregate*, defined as an ordered tuple of jobs.
 See also the Python documentation about :ref:`argument unpacking <python:tut-unpacking-arguments>`.
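
The ``*jobs`` behavior described in this hunk can be illustrated with a minimal sketch in plain Python. It does not use signac-flow: ``FakeJob`` and ``total_a`` are hypothetical names introduced only for this illustration, and the sketch shows nothing more than how a variadic parameter receives an aggregate as an ordered tuple.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeJob:
    """Hypothetical stand-in for a job; it only carries one parameter ``a``."""

    a: int


def total_a(*jobs):
    """Operation-style function: ``jobs`` arrives as an ordered tuple (the aggregate)."""
    return sum(job.a for job in jobs)


# An aggregate is an ordered tuple of jobs.
aggregate = (FakeJob(1), FakeJob(2), FakeJob(3))

# Unpack the aggregate into positional arguments when calling the function.
result = total_a(*aggregate)
```

Because the function accepts any number of positional arguments, the same definition works for an aggregate of one job or of many.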
77 changes: 0 additions & 77 deletions docs/source/collections.rst

This file was deleted.

4 changes: 2 additions & 2 deletions docs/source/community.rst
@@ -64,9 +64,9 @@ All code contributed via pull request needs to adhere to the following guideline
 * Avoid introducing dependencies -- especially those that might be harder to install in high-performance computing environments.
 * All code needs to adhere to the PEP8_ style guide, with the exception that a line may have up to 100 characters.
 * Create `unit tests <https://en.wikipedia.org/wiki/Unit_testing>`_ and `integration tests <ttps://en.wikipedia.org/wiki/Integration_testing>`_ that cover the common cases and the corner cases of the code.
-* Preserve backwards-compatibility whenever possible, and make clear if something must change.
+* Preserve backwards compatibility whenever possible, and make clear if something must change.
 * Document any portions of the code that might be less clear to others, especially to new developers.
-* Write API documentation as part of the doc-strings of the package, and put usage information, guides, and concept overviews in the `framework documentation <https://docs.signac.io/>`_, the page you are currently on (`source <https://github.com/glotzerlab/signac-docs/>`_).
+* Write API documentation as part of the docstrings of the package, and put usage information, guides, and concept overviews in the `framework documentation <https://docs.signac.io/>`_, the page you are currently on (`source <https://github.com/glotzerlab/signac-docs/>`_).

 .. _GitHub: https://github.com/glotzerlab/
 .. _PEP8: https://www.python.org/dev/peps/pep-0008/
146 changes: 10 additions & 136 deletions docs/source/configuration.rst
@@ -7,153 +7,27 @@ Configuration
 Overview
 ========

-The **signac** framework is configured with configuration files, which are named either ``.signacrc`` or ``signac.rc``.
-These configuration files are searched for at multiple locations in the following order:
+The **signac** framework is configured with configuration files.
+The configuration files are stored using the standard `INI file format <https://en.wikipedia.org/wiki/INI_file>`__.
+In general, two config files are supported:

-1. in the current working directory,
-2. in each directory above the current working directory until a project configuration file is found,
-3. and the user's home directory.
-
-The configuration file follows the standard "ini-style".
-Global configuration options, should be stored in the home directory, while project-specific options should be stored *locally* in a project configuration file.
-
-This is an example for a global configuration file in the user's home directory:
-
-.. code-block:: ini
-
-    # ~/.signarc
-    [hosts]
-    [[localhost]]
-    url = mongodb://localhost
+1. Project-specific configuration uses the ``.signac/config`` file in the project path.
+2. Per-user configuration is stored in a global file at ``$HOME/.signacrc``.

 You can either edit these configuration files manually, or execute ``signac config`` on the command line.
 Please see ``signac config --help`` for more information.

 Project configuration
 =====================

-A project configuration file is defined by containing the keyword *project*.
-Once **signac** found a project configuration file it will stop to search for more configuration files above the current working directory.
-
-For example, to initialize a project named *MyProject*, navigate to the project's root directory and either execute ``$ signac init MyProject`` on the command line, use the :py:func:`signac.init_project` function or create the project configuration file manually.
+A project configuration file is defined as a file named ``config`` contained within a ``.signac`` directory.
+Functions like :py:func:`~signac.get_project` will search upwards from a provided directory until a project configuration is found to indicate the project path.
 This is an example for a project configuration file:

 .. code-block:: ini

-    # signac.rc
-    project = MyProject
-    workspace_dir = $HOME/myproject/workspace
-
-project
-    The name is required for the identification of the project's root directory.
-
-workspace_dir
-    The path to your project's workspace, which defaults to ``$project_root_dir/workspace``.
-    Can be configured relative to the project's root directory or as absolute path and may contain environment variables.
-
-
-Host configuration
-==================
-
-The current version of **signac** supports MongoDB databases as a backend.
-To use **signac** in combination with a MongoDB database, make sure to install ``pymongo``.
-
-Configuring a new host
-----------------------
-
-To configure a new MongoDB database host, create a new entry in the ``[hosts]`` section of the configuration file.
-We can do so manually or by using the ``signac config host`` command.
-
-Assuming that we a have a MongoDB database reachable via *example.com*, which requires a username and a password for login, execute:
-
-.. code-block:: bash
-
-    $ signac config host example mongodb://example.com -u johndoe -p
-    Configuring new host 'example'.
-    Password:
-    Configured host 'example':
-    [hosts]
-    [[example]]
-    url = mongodb://example.com
-    username = johndoe
-    auth_mechanism = SCRAM-SHA-1
-    password = ***
-
-The name of the configured host (here: *example*) can be freely chosen.
-You can omit the ``-p/--password`` argument, in which case the password will not be stored and you will prompted to enter it for each session.
-
-We can now connect to this host with:
-
-.. code-block:: pycon
-
-    >>> import signac
-    >>> db = signac.get_database("mydatabase", hostname="example")
-
-The ``hostname`` argument defaults to the first configured host and can always be omitted if there is only one configured host.
-
-.. note::
-
-    To prevent unauthorized users from obtaining your login credentials, **signac** will update the configuration file permissions such that it is only readable by yourself.
-
-
-Changing the password
----------------------
-
-To change the password for a configured host, execute
-
-.. code-block:: bash
-
-    $ signac host example --update-pw -p
-
-.. warning::
-
-    By default, any password set in this way will be **encrypted**. This means that the actual password is different from the one that you entered.
-    However, while it is practically impossible to guess what you entered, a stored password hash will give any intruder access to the database.
-    This means you need to **treat the hash like a password!**
-
-Copying a configuration
------------------------
-
-In general, in order to copy a configuration from one machine to another, you can simply copy the ``.signacrc`` file as is.
-If you only want to copy a single host configuration, you can either manually copy the associated section or use the ``signac config host`` command for export:
-
-.. code-block:: bash
-
-    $ signac config host example > example_config.rc
-
-Then copy the ``example_config.rc`` file to the new machine and rename or append it to an existing ``.signacrc`` file.
-For security reasons, any stored password is not directly copied in this way.
-To copy the password, follow:
-
-.. code-block:: bash
-
-    # Copy the password from the old machine:
-    johndoe@oldmachine $ signac config host example --show-pw
-    XXXX
-    # Enter it on the new machine:
-    johndoe@newmachine $ signac config host example -p
-
-Manual host configuration
--------------------------
-
-You can configure one or multiple hosts in the ``[hosts]`` section, where each subsection header specifies the host's name.
-
-url
-    The url specifies the MongoDB host url, e.g. ``mongodb://localhost``.
-authentication_method (default=none)
-    Specify the authentication method with the database, possible choices are: ``none`` or ``SCRAM-SHA-1``.
-username
-    A username is required if you authenticate via ``SCRAM-SHA-1``.
-password
-    The password to authenticate via ``SCRAM-SHA-1``.
-db_auth (default=admin)
-    The database to authenticate with.
-password_config
-    In case that you update, but not store your password, the configuration file will contain only meta hashing data, such as the salt.
-    This allows to authenticate by entering the password for each session, which is generally more secure than storing the actual password hash.
-
-.. warning::
+    schema_version = 2

-    **signac** will automatically change the file permissions of the configuration file to *user read-write only* in case that it contains authentication credentials.
-    In case that this fails, you can set the permissions manually, e.g., on UNIX-like operating systems with: ``chmod 600 ~/.signacrc``.
+schema_version
+    Identifier for the current internal schema used by signac. This schema version determines internal details such as the location of configuration files or caches.
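
The upward search for a project configuration described in the new text can be sketched with the standard library alone. This is an illustration of the idea, not signac's implementation: ``find_project_path`` is a hypothetical helper, and the temporary directory layout is invented for the demonstration.

```python
import tempfile
from pathlib import Path


def find_project_path(start):
    """Hypothetical helper: walk upward from ``start`` and return the first
    directory that contains a ``.signac/config`` file, or None if there is none."""
    start = Path(start).resolve()
    for directory in (start, *start.parents):
        if (directory / ".signac" / "config").is_file():
            return directory
    return None


with tempfile.TemporaryDirectory() as tmp:
    # Invented layout: a project directory with a nested job directory.
    root = Path(tmp).resolve() / "my_project"
    nested = root / "workspace" / "some_job"
    nested.mkdir(parents=True)
    (root / ".signac").mkdir()
    (root / ".signac" / "config").write_text("schema_version = 2\n")

    # Searching from the nested directory finds the enclosing project path.
    found = find_project_path(nested)
    found_matches_root = found == root
```

In real code, :py:func:`~signac.get_project` is the function the documentation names for this lookup.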
19 changes: 11 additions & 8 deletions docs/source/flow-project.rst
@@ -52,9 +52,7 @@ Operations
 ==========

 It is highly recommended to divide individual modifications of your project's data space into distinct functions.
-
-In this context, an *operation* is defined as a function whose only positional argument is an instance of :py:class:`~signac.contrib.job.Job` (in the special case of :ref:`aggregate operations <aggregation>`, variable positional arguments ``*jobs`` are permitted).
-
+In this context, an *operation* is defined as a function whose only positional arguments are instances of :py:class:`~signac.job.Job`.
 We will demonstrate this concept with a simple example.
 Let's initialize a project with a few jobs, by executing the following ``init.py`` script within a ``~/my_project`` directory:

@@ -92,6 +90,11 @@ A very simple *operation*, which creates a file called ``hello.txt`` within a jo
     if __name__ == "__main__":
         MyProject().main()

+.. tip::
+
+    By default operations only act on a single job and can simply be defined with the signature ``def op(job)``.
+    When using :ref:`aggregate operations <aggregation>`, it is recommended to allow the operation to accept a variable number of jobs using a variadic parameter ``*jobs``, so that the operation is not restricted to a specific aggregate size.
+

 .. _conditions:

@@ -151,8 +154,8 @@ The entirety of the code is as follows:
 for more information.

 We can define both :py:meth:`~flow.FlowProject.pre` and :py:meth:`~flow.FlowProject.post` conditions, which allow us to define arbitrary workflows as a `directed acyclic graph <https://en.wikipedia.org/wiki/Directed_acyclic_graph>`__.
-A operation is only executed if **all** pre-conditions are met, and at *at least one* post-condition is not met.
-These are added above a `~flow.FlowProject.operation` decorator.
+An operation is only executed if **all** preconditions are met, and at *at least one* postcondition is not met.
+These are added above a :attr:`~flow.FlowProject.operation` decorator.
 Using these decorators before declaring a function an operation is an error.
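
The execution rule stated in this hunk (all preconditions met, at least one postcondition not met) can be written out as a small sketch in plain Python. It does not use signac-flow: ``is_eligible`` is a hypothetical function, and jobs are modeled as plain dictionaries. The sentence does not say what happens for an operation without postconditions; the sketch follows the wording literally and never reports such an operation as eligible, which may differ from what signac-flow does.

```python
def is_eligible(job, preconditions, postconditions):
    """Hypothetical check: True if every precondition holds for ``job`` and
    at least one postcondition does not."""
    all_pre_met = all(cond(job) for cond in preconditions)
    some_post_unmet = any(not cond(job) for cond in postconditions)
    return all_pre_met and some_post_unmet


# Jobs are modeled as plain dicts for this illustration.
job = {"initialized": True, "greeted": False}
pre = [lambda j: j["initialized"]]
post = [lambda j: j["greeted"]]

# Precondition met, postcondition unmet: the operation should run.
eligible_before = is_eligible(job, pre, post)

# Once the postcondition is satisfied, the operation is no longer eligible.
job["greeted"] = True
eligible_after = is_eligible(job, pre, post)
```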

.. tip::
@@ -229,7 +232,7 @@ The Project Status
 The :py:class:`~flow.FlowProject` class allows us to generate a **status** view of our project.
 The status view provides information about which conditions are met and what operations are pending execution.

-A *label-function* is a condition function which will be shown in the **status** view.
+A *label function* is a condition function which will be shown in the **status** view.
 We can convert any condition function into a label function by adding the :py:meth:`~.flow.FlowProject.label` decorator:

 .. code-block:: python
@@ -238,13 +241,13 @@ We can convert any condition function into a label function by adding the :py:me
     def greeted(job):
         return job.isfile("hello.txt")

-We will reset the workflow for only a few jobs to get a more interesting *status* view:
+We will reset the workflow for only a few jobs to get a more interesting status view:

 .. code-block:: bash

     ~/my_project $ signac find a.\$lt 5 | xargs -I{} rm workspace/{}/hello.txt

-We then generate a *detailed* status view with:
+We then generate a detailed status view with:

 .. code-block:: bash
3 changes: 1 addition & 2 deletions docs/source/index.rst
@@ -1,7 +1,7 @@
 .. signac documentation master file, created by
    sphinx-quickstart on Fri Oct 23 17:41:32 2015.
    You can adapt this file completely to your liking, but it should at least
-   contain the root `toctree` directive.
+   contain the root ``toctree`` directive.

 Welcome to the signac framework documentation!
 ==============================================
@@ -49,7 +49,6 @@ If you are new to **signac**, the best place to start is to read the :ref:`intro
    flow-group
    aggregation
    hooks
-   collections
    configuration
    recipes
    tips_and_tricks
8 changes: 2 additions & 6 deletions docs/source/installation.rst
@@ -10,7 +10,7 @@ Installation

 The **signac** framework consists of three packages: **signac**, **signac-flow**, and **signac-dashboard**.
 All packages in the **signac** framework depend on the core **signac** package, which provides the data management functionality used by all other packages.
-Most users should install the **signac** and the **signac-flow** packages, which are tested for Python 3.6+ and are built for all major platforms.
+Most users should install the **signac** and the **signac-flow** packages, which are tested for Python 3.8+ and are built for all major platforms.
 For more details about the functionalities of individual packages, please see :ref:`package-overview`.


@@ -37,11 +37,7 @@ For a standard installation with pip_, execute:

 .. code:: bash

-    $ pip install --user signac signac-flow
-
-.. note::
-
-    If you want to install packages for all users on a machine, you can remove the ``--user`` option in the install command.
+    $ pip install signac signac-flow


 Installation from Source
5 changes: 2 additions & 3 deletions docs/source/intro.rst
@@ -8,12 +8,11 @@ Concept
 The **signac** framework is designed to simplify the storage, generation and analysis of multidimensional data sets associated with large-scale, file-based computational studies.
 Any computational work that requires you to manage files and execute workflows may benefit from an integration with **signac**.
 Typical examples include hyperparameter optimization for machine learning applications and high-throughput screening of material properties with various simulation methods.
-The data model assumes that the work can be divided into so called *projects*, where each project is roughly confined by similarly structured data, e.g., a parameter study.
-
-In **signac**, the elements of a project's data space are called *jobs*.
+In **signac**, collections of parameter values are *jobs* and are stored in a flat directory structure.
 Every job is defined by a unique set of well-defined parameters that define the job's context, and it also contains all the data associated with this metadata.
 This means that all data is uniquely addressable from the associated parameters.
 With **signac**, we define the processes generating and manipulating a specific data set as a sequence of operations on a job.
-Using this abstraction, **signac** can define workflows on an arbitrary **signac** data space.
+Using this abstraction, **signac** can define workflows on an arbitrary **signac** workspace.

 .. image:: images/signac_data_space.png