Modify pipeline #39

Open
wants to merge 11 commits into base: master
54 changes: 24 additions & 30 deletions README.md
@@ -1,53 +1,47 @@
![pytest](https://github.com/N3PDF/pycompressor/workflows/pytest/badge.svg)
[![documentation](https://github.com/N3PDF/pycompressor/workflows/docs/badge.svg)](https://n3pdf.github.io/pycompressor/)

### pycompressor
## pycompressor

Fast and efficient python implementation of PDF set **compressor** (https://arxiv.org/abs/1504.06469).
Fast and efficient python implementation of PDF **compression** (https://arxiv.org/abs/1504.06469).

#### New features

Additional new features have been added to this python package. The two main features are:
- **Covariance Matrix Adaptation - Evolution Strategy (CMA-ES):** in addition to the Genetic
Algorithm (GA), it is now possible to choose CMA-ES as the minimizer. The choice of minimizer
is defined in the `runcard.yml` file (see the sketch after this list).
- **Generative Adversarial Networks (GANs):** this is a standalone python [package](https://github.com/N3PDF/ganpdfs/tree/master)
that can enhance the statistics of the prior PDF replicas before compression by generating
synthetic replicas. For more details, refer to the [documentation](https://n3pdf.github.io/ganpdfs/)
(still a work in progress). Similarly, to trigger the enhancement, set the value of
`enhance` in the runcard to `True`; setting it to `False` runs the standard compression.
The GANs also require extra parameters (as shown in the example
[runcard.yml](https://github.com/N3PDF/pycompressor/blob/master/runcard.yml)) that define
the structure of the networks.
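
Below is a minimal sketch of the runcard entries that control these two features. The key names
(`minimizer`, `enhance`) are the ones read in `src/pycompressor/compressing.py`, but the values and
the overall layout are illustrative only; the reference
[runcard.yml](https://github.com/N3PDF/pycompressor/blob/master/runcard.yml) remains the
authoritative template.
```yaml
# Illustrative fragment only -- see the reference runcard.yml for the full layout.
minimizer: genetic   # minimization strategy: "genetic" (GA) or "cma" (CMA-ES)
enhance: False       # True to generate synthetic replicas with ganpdfs before compressing
```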

#### Installation
### How to install

To install `pyCompressor`, just type:
```bash
python setup.py install
```
or if you are a developer:
```bash
python setup.py develop
python setup.py install # or python setup.py develop (if you want development mode)
```

#### How to use
### How to use

#### Standard compression

The input parameters that define the compression is contained in a YAML file. To run
the `pycompressor` code, just type the following:
The input parameters that define the compression are contained in a YAML file. To run the standard compression,
use the reference [runcard](https://github.com/N3PDF/pycompressor/blob/master/runcards/runcard.yml) as it is,
replacing only the entry of the `pdf` key with the name of the PDF set, then run the following:
```bash
pycomp runcards/runcard.yml [--threads NUMB_THREADS]
```
A detailed instruction on how to set the different parameters in the runcard can be found here.
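
For the standard compression, the only entry that has to change relative to the reference runcard is the
PDF set name. A minimal sketch, assuming the layout of the reference
[runcard.yml](https://github.com/N3PDF/pycompressor/blob/master/runcards/runcard.yml) (the values are
examples only):
```yaml
# Minimal sketch -- key names follow those read by src/pycompressor/compressing.py.
pdf: NNPDF40_nnlo_as_0118_1000   # name of the prior PDF set to compress
compressed: 100                  # target number of replicas in the compressed set
```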

#### Generating compressed PDF set & post-analysis
#### Using GAN and/or Compressing from an enhanced set

Although it is advised to run [ganpdfs](https://github.com/N3PDF/ganpdfs) independently, it is possible
to generate enhanced PDF replicas from within `pycompressor`. To do so, just set the entry `enhance` in the
runcard to `True` and specify the total number of replicas (prior + synthetics).

Finally, in order to perform a compression with an enhanced set, set the entry `existing_enhanced` to `True`.

A detailed instruction on how to set the different parameters in the runcard can be found
[here](https://n3pdf.github.io/pycompressor/howto/howto.html).
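
As a rough illustration of the two workflows above (the key names are those read by
`src/pycompressor/compressing.py`; the layout is assumed from the reference runcard, and the total
number of replicas must also be specified under its dedicated key):
```yaml
# Illustrative fragment only.
enhance: True              # generate synthetic replicas with ganpdfs from within pycompressor
existing_enhanced: False   # set to True to compress from an already generated enhanced set
```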

### Generating compressed PDF set & post-analysis

The code will create a folder named after the prior PDF set. To generate the
compressed PDF grid, run the following command:
```bash
get-grid -i <PDF_NAME>/compressed_<PDF_NAME>_<NB_COMPRESSED>_output.dat
```
Note that if the compression is done from an enhanced set, the output folder will be append by `_enhanced`.
Note that if the compression is done from an enhanced set, the name of the output folder is suffixed with `_enhanced`.

Finally, in order to generate ERF plots, go into the `erfs_output` directory and run the following:
```bash
@@ -56,7 +50,7 @@ validate --random erf_randomized.dat --reduced erf_reduced.dat
This script can also plot the ERF validation from the old compressor code by adding the flag
`--format ccomp`.

#### Warning
### Warning

This package cannot be installed with python 3.9 yet due to the numba dependency. This will be resolved
soon according to [#6579](https://github.com/numba/numba/pull/6579).
6 changes: 0 additions & 6 deletions runcards/ganpdfs.yml
@@ -1,8 +1,3 @@
#############################################################################################
# Input PDF #
#############################################################################################
pdf: NNPDF40_nnlo_as_0118_1000

#############################################################################################
# PDF Grids: #
# --------- #
@@ -70,4 +65,3 @@ nd_steps : 4 # Number of steps to train
ng_steps : 3 # Number of steps to train the Generator for one training run
batch_size : 70 # Batch size per epoch in terms of percentage
epochs : 1000 # Number of epochs
pdf: NNPDF40_nnlo_as_0118_1000
216 changes: 115 additions & 101 deletions src/pycompressor/compressing.py
@@ -63,20 +63,6 @@ def check_validity(pdfsetting, compressed, gans, est_dic):
f" {members} members if enhancing is not active.")


@make_argcheck
def check_adiabaticity(pdfsetting, gans, compressed):
""" Check whether we are in an adiabatic optimization and if so if it can be performed """
pdf_name = pdfsetting["pdf"]
if pdfsetting.get("existing_enhanced") and not gans.get("enhanced"):
adiabatic_result = f"{pdf_name}/compress_{pdf_name}_{compressed}_output.dat"
if not pathlib.Path(adiabatic_result).exists():
raise CheckError(
"Adiabatic optimization needs to be ran first with existing_enhanced: False"
f"\nMissing the file: {adiabatic_result}"
)


@check_adiabaticity
@check_validity
def compressing(pdfsetting, compressed, minimizer, est_dic, gans):
"""
@@ -94,7 +80,7 @@ def compressing(pdfsetting, compressed, minimizer, est_dic, gans):
"""

pdf = str(pdfsetting["pdf"])
enhanced_already_exists = pdfsetting.get("existing_enhanced", False)
enhd_exists = pdfsetting.get("existing_enhanced", False)

if gans["enhance"]:
from pycompressor.postgans import postgans
@@ -121,95 +107,123 @@ def compressing(pdfsetting, compressed, minimizer, est_dic, gans):
postgans(str(pdf), outfolder, nbgen)

splash()
# Set seed
rndgen = Generator(PCG64(seed=0))

console.print("\n• Load PDF sets & Printing Summary:", style="bold blue")
xgrid = XGrid().build_xgrid()
# Load Prior Sets
prior = PdfSet(pdf, xgrid, Q0, NF).build_pdf()
rndindex = rndgen.choice(prior.shape[0], compressed, replace=False)
# Load Enhanced Sets
if enhanced_already_exists:
try:
postgan = pdf + "_enhanced"
final_result = {"pdfset_name": postgan}
enhanced = PdfSet(postgan, xgrid, Q0, NF).build_pdf()
except RuntimeError as excp:
raise LoadingEnhancedError(f"{excp}")
nb_iter, ref_estimators = 100000, None
init_index = np.array(extract_index(pdf, compressed))
else:
final_result = {"pdfset_name": pdf}
nb_iter, ref_estimators = 15000, None
init_index, enhanced = rndindex, prior

# Create output folder
outrslt = postgan if enhanced_already_exists else pdf
folder = pathlib.Path().absolute() / outrslt
folder.mkdir(exist_ok=True)
# Create output folder for ERF stats
out_folder = pathlib.Path().absolute() / "erfs_output"
out_folder.mkdir(exist_ok=True)

# Output Summary
table = Table(show_header=True, header_style="bold magenta")
table.add_column("Parameters", justify="left", width=24)
table.add_column("Description", justify="left", width=50)
table.add_row("PDF set name", f"{pdf}")
table.add_row("Size of Prior", f"{prior.shape[0] - 1} replicas")
if enhanced_already_exists:
table.add_row("Size of enhanced", f"{enhanced.shape[0] - 1} replicas")
table.add_row("Size of compression", f"{compressed} replicas")
table.add_row("Input energy Q0", f"{Q0} GeV")
table.add_row(
"x-grid size",
f"{xgrid.shape[0]} points, x=({xgrid[0]:.4e}, {xgrid[-1]:.4e})"
)
table.add_row("Minimizer", f"{minimizer}")
console.print(table)

# Init. Compressor class
comp = Compress(
prior,
enhanced,
est_dic,
compressed,
init_index,
ref_estimators,
out_folder,
rndgen
)
# Start compression depending on the Evolution Strategy
erf_list = []
console.print("\n• Compressing MC PDF replicas:", style="bold blue")
if minimizer == "genetic":
# Run compressor using GA
with trange(nb_iter) as iter_range:
for _ in iter_range:
iter_range.set_description("Compression")
erf, index = comp.genetic_algorithm(nb_mut=5)
erf_list.append(erf)
iter_range.set_postfix(ERF=erf)
elif minimizer == "cma":
# Run compressor using CMA
erf, index = comp.cma_algorithm(std_dev=0.8)
else:
raise ValueError(f"{minimizer} is not a valid minimizer.")

# Prepare output file
final_result["ERFs"] = erf_list
final_result["index"] = index.tolist()
outfile = open(f"{outrslt}/compress_{pdf}_{compressed}_output.dat", "w")
outfile.write(json.dumps(final_result, indent=2))
outfile.close()
# Fetching ERF and construct reduced PDF grid
console.print(f"\n• Final ERF: [bold red]{erf}.", style="bold red")

# Compute final ERFs for the final choosen replicas
final_err_func = comp.final_erfs(index)
serfile = open(f"{out_folder}/erf_reduced.dat", "a+")
serfile.write(f"{compressed}:")
serfile.write(json.dumps(final_err_func))
serfile.write("\n")
serfile.close()

outname = [pdf]
final_result = [{"pdfset_name": pdf}]
nb_iter, ref_estimators = [15000], [None]
init_index, enhanced = [rndindex], [prior]

# Methodological iterations
mtd_iteration = 2 if enhd_exists else 1

for cmtype in range(mtd_iteration):
# necessary to get the same normalization
rndgen = Generator(PCG64(seed=0))
_ = rndgen.choice(prior.shape[0], compressed, replace=False)
# reference log
if cmtype==0:
console.print(
"Standard compression using Input set",
style="bold green underline"
)
elif cmtype==1:
console.print(
"Adiabatic compression using Enhanced set",
style="bold green underline"
)

# Create output folder
outrslt = outname[cmtype]
folder = pathlib.Path().absolute() / outrslt
folder.mkdir(exist_ok=True)
# Create output folder for ERF stats
out_folder = pathlib.Path().absolute() / "erfs_output"
out_folder.mkdir(exist_ok=True)

# Output Summary
console.print("\n• Compression Summary:", style="bold blue")
table = Table(show_header=True, header_style="bold magenta")
table.add_column("Parameters", justify="left", width=24)
table.add_column("Description", justify="left", width=50)
table.add_row("PDF set name", f"{pdf}")
table.add_row("Size of Prior", f"{prior.shape[0] - 1} replicas")
if cmtype!=0 and enhd_exists:
table.add_row(
"Size of enhanced",
f"{enhanced[1].shape[0] - 1} replicas"
)
table.add_row("Size of compression", f"{compressed} replicas")
table.add_row("Input energy Q0", f"{Q0} GeV")
table.add_row(
"x-grid size",
f"{xgrid.shape[0]} points, x=({xgrid[0]:.4e}, {xgrid[-1]:.4e})"
)
table.add_row("Minimizer", f"{minimizer}")
console.print(table)

# Init. Compressor class
comp = Compress(
prior,
enhanced[cmtype],
est_dic,
compressed,
init_index[cmtype],
ref_estimators[cmtype],
out_folder,
rndgen
)
# Start compression depending on the Evolution Strategy
erf_list = []
console.print("\n• Compressing MC PDF replicas:", style="bold blue")
if minimizer == "genetic":
# Run compressor using GA
with trange(nb_iter[cmtype]) as iter_range:
for _ in iter_range:
iter_range.set_description("Compression")
erf, index = comp.genetic_algorithm(nb_mut=5)
erf_list.append(erf)
iter_range.set_postfix(ERF=erf)
elif minimizer == "cma":
# Run compressor using CMA
erf, index = comp.cma_algorithm(std_dev=0.8)
else:
raise ValueError(f"{minimizer} is not a valid minimizer.")

# Prepare output file
final_result[cmtype]["ERFs"] = erf_list
final_result[cmtype]["index"] = index.tolist()
outfile = open(f"{outrslt}/compress_{pdf}_{compressed}_output.dat", "w")
outfile.write(json.dumps(final_result[cmtype], indent=2))
outfile.close()
# Fetching ERF and construct reduced PDF grid
console.print(f"\n• Final ERF: {erf}.", style="bold blue")

if (cmtype!=0 and enhd_exists) or (cmtype==0 and not enhd_exists):
# Compute final ERFs for the final chosen replicas
final_err_func = comp.final_erfs(index)
serfile = open(f"{out_folder}/erf_reduced.dat", "a+")
serfile.write(f"{compressed}:")
serfile.write(json.dumps(final_err_func))
serfile.write("\n")
serfile.close()

# Load Enhanced Sets
if cmtype==0 and enhd_exists:
try:
postgan = pdf + "_enhanced"
outname.append(postgan)
final_result.append({"pdfset_name": postgan})
enhncd = PdfSet(postgan, xgrid, Q0, NF).build_pdf()
enhanced.append(enhncd)
except RuntimeError as excp:
raise LoadingEnhancedError(f"{excp}")
nb_iter.append(100000)
ref_estimators.append(None)
pre_index = np.array(extract_index(pdf, compressed))
init_index.append(pre_index)