create_model_code.py

# create_model_code.py
# Author: Matthew Smarte
# Original Creation: 8/21/18
# Updated: 9/16/18

'''

Below is an optional utility function that converts models written in an A+B==>C format 
to code representing a system of differential rate equations. The code generated by this
function can be used when creating the user model function to reduce typing errors / save time.
Also generates code that can be used in the model parameters set up section of the Jupyter notebook.

See comments in 'ex_model_1.py' to identify which portions of the user model code can be
automatically generated by using this utility.  This function does NOT generate all code
necessary for defining the user model function.

See comments in 'ex_notebook_1.ipynb' to identify which portions of the Jupyter notebook code
can be automatically generated by using this utility.  If there are model params than just the
rate constants (for example: initial concentrations), then this function does NOT generate
all code necessary for setting up the model params.

See 'ex_create_model_in.csv' for an example of the input file formatting and 'ex_create_model_out.txt'
for an example of the output file.  The example corresponds to the running H2O2 photolysis example
described in 'ex_notebook_1.ipynb'.

The input file should be a five column comma-separated values (csv) file.  Each row represents
a different reaction.
	Column 1: contains a string that represents the rate constant.
	Column 2: contains a string that represents the reaction in A+B==>C format.
	Column 3: contains the value of the rate constant.
	Column 4: contains the uncertainty in the rate constant (1-sigma).  (Uncertainty not used if fit = True.)
	Column 5: contains True/False where True means to fit this rate constant and False means fix the rate constant.

Leading coefficients in a reaction (ex: '2A==>3B') will be properly interpreted.  If no
products are specified for a reaction (ex: 'C==>'), then it assumed that the products are
completely removed from the reaction system (ex: wall loss).

The create_model_code function has three arguments:
	file_in : str
		The name of the csv file containing the model. (Example: 'ex_create_model_in.csv')
	file_out: str
		The name the txt file that will contain the generated code. (Example: 'ex_create_model_out.txt')
		Any preexisting file with this name is overwritten.
	rxn_delim: str, optional
		The delimeter used to separate reactants and products.  If not specified, then the
		default delimiter of '==>' will be assumed.

The function can be used from the python interpreter command line or a Jupyter notebook.
In either case, the below two lines would reproduce the example:

>>> from create_model_code import create_model_code
>>> create_model_code('ex_create_model_in.csv', 'ex_create_model_out.txt')

The code does NOT check for atom balance between the reactants and products.
The code will throw errors and crash if:
---There are more or less than five columns in a row.
---The same rate constant is used for more than one reaction.
---A rate constant is not specified for a reaction.
---The rxn_delim does not appear in a reaction or appears more than once.
---No reactants are specified for a reaction.
---Fitting True/False not specified (or improperly formatted) for a reaction.

'''

import csv

def create_model_code(file_in, file_out, rxn_delim='==>'):

	k_list = []
	dy_dt = {}
	k_val = {}
	k_err = {}
	k_fit = {}

	with open(file_in) as f:

		reader = csv.reader(f)
		for i, rxn in enumerate(reader):

			if len(rxn) != 5:
				raise ValueError('Input file line {:d} is not properly comma separated!'.format(i+1))

			# Add rate constant to global rate constant list
			k = rxn[0]
			if k in k_list:
				raise ValueError('Rate constant \'{}\' appears in input file more than once!'.format(k))
			if k == '':
				raise ValueError('Input file line {:d} does not contain a rate constant!'.format(i+1))
			k_list.append(k)

			# Break apart the reaction line
			rxn_split = rxn[1].split(rxn_delim)
			if len(rxn_split) != 2:
				raise ValueError('Reaction on line {:d} of input file is not properly formatted!'.format(i+1))
			left = rxn_split[0].split('+')
			right = rxn_split[1].split('+')
			if left == ['']:
				raise ValueError('No reactants specified for the reaction on line {:d} of input file!'.format(i+1))
			if right == ['']:
				right = []

			# Process stoichiometry to create reactant and product lists
			# Example: 2A ==> B + 2C should produce reactants = ['A','A'] and products = ['B','C','C']
			def process_coeff(x):
				idx = 0
				while x[:idx+1].isdigit():
					idx += 1
				return ([x] if idx == 0 else int(x[:idx])*[x[idx:]])

			reactants = []
			for term in left:
				reactants += process_coeff(term)
			products = []
			for term in right:
				products += process_coeff(term)

			# Add a key in dy_dt for any new species
			for species in (reactants+products):
				if species not in dy_dt.keys():
					dy_dt[species] = []

			# Add reaction rate terms to diff eq matrix
			rxn_txt = (k + '*' + '*'.join(reactants))

			for species in reactants:
				dy_dt[species].append('-' + rxn_txt)
			for species in products:
				dy_dt[species].append('+' + rxn_txt)

			# Add rate constant value, error, and fit (True/False) to k_val, k_err, and k_fit respectively
			k_val[k] = rxn[2]
			k_err[k] = rxn[3]

			fit = rxn[4].upper()
			if fit == 'TRUE':
				k_fit[k] = 'True'
			elif fit == 'FALSE':
				k_fit[k] = 'False'
			else:
				raise ValueError('Fitting True/False not specified for rate constant on line {:d} of input file!'.format(i+1))

	species_names = dy_dt.keys()

	with open(file_out, 'w') as f:

		# Header for user model function code blocks
		print('# Code blocks for user model function:', file=f)
		print(file=f)

		# Extraction of rate constants from model_params argument
		for k in k_list:
			print(k + ' = model_params[\'' + k + '\']', file=f)
		print(file=f)

		# Initial concentrations code (right side of equation still needs to be completed by user)
		for species in species_names:
			print(species + '_0 = ', file=f)
		print('y0 = np.array([' + '_0, '.join(species_names) + '_0])', file=f)
		print(file=f)

		# Function defining system of differential rate equations
		print('def calc_dy_dt(y, t_curr):', file=f)
		for i, species in enumerate(species_names):
			print('\t' + species + ' = y[' + str(i) + ']', file=f)
		print(file=f)
		for species in species_names:
			print('\td' + species + ' = ' + ' '.join(dy_dt[species]), file=f)
		print(file=f)
		print('\tdy_dt = np.array([d' + ', d'.join(species_names) +'])', file=f)
		print('\treturn dy_dt', file=f)
		print(file=f)

		# Header for Jupyter notebook code blocks
		print('# Code blocks for Jupyter notebook (model parameters setup):', file=f)
		print(file=f)

		for k in k_list:
			print(k + ' = ' + k_val[k], file=f)
		print(file=f)

		for k in k_list:
			print(k + '_err = ' + k_err[k], file=f)
		print(file=f)

		for k in k_list:
			print('df_model_params[\'' + k + '\'] = {\'val\':' + k + ', \'err\':' + k + '_err, \'fit\':' + k_fit[k] + '}', file=f)