Receiving KeyError: '1' when attempting arcrestart #624

Open
calvinp0 opened this issue Mar 25, 2023 · 4 comments

calvinp0 commented Mar 25, 2023

Describe the bug
ARC fails to restart and reports the following traceback:

Traceback (most recent call last):
  File "/Local/ce_dana/Code/ARC//ARC.py", line 69, in <module>
    main()
  File "/Local/ce_dana/Code/ARC//ARC.py", line 65, in main
    arc_object.execute()
  File "/Local/ce_dana/Code/ARC/arc/main.py", line 583, in execute
    fine_only=self.fine_only,
  File "/Local/ce_dana/Code/ARC/arc/scheduler.py", line 484, in __init__
    self.schedule_jobs()
  File "/Local/ce_dana/Code/ARC/arc/scheduler.py", line 500, in schedule_jobs
    self.run_conformer_jobs()
  File "/Local/ce_dana/Code/ARC/arc/scheduler.py", line 1048, in run_conformer_jobs 
    self.process_conformers(label)
  File "/Local/ce_dana/Code/ARC/arc/scheduler.py", line 1748, in process_conformers 
    conformer=i,
  File "/Local/ce_dana/Code/ARC/arc/scheduler.py", line 831, in run_job
    self.save_restart_dict()
  File "/Local/ce_dana/Code/ARC/arc/scheduler.py", line 3468, in save_restart_dict  
    + [self.job_dict[spc.label]['tsg'][get_i_from_job_name(job_name)].as_dict()
  File "/Local/ce_dana/Code/ARC/arc/scheduler.py", line 3467, in <listcomp>
    for job_name in self.running_jobs[spc.label] if 'conformer' in job_name] \
KeyError: 1

How to reproduce
Restart a run of a 20-reaction input file after the initial run failed.

Additional context
Maybe all these are related? #622 #623

Here is the restart file
restart.zip

@calvinp0
Member Author

The error occurs for:

Label: 'r_162_[CH2]c1ccccc1'
self.job_dict[spc.label]: {'conformers': {0: <arc.job.adapters.ga...211622750>}}

In reality, there were 4 conformers completed.
image

@calvinp0
Member Author

Here is part of another restart.

  r_177_[CH]=CC=C:
  - args:
      block: {}
      keyword:
        general: scf=xqc
      trsh: {}
    conformer: 1
    constraints: []
    cpu_cores: 10
    ess_settings: *id005
    ess_trsh_methods:
    - restart_due_to_file_not_found
    execution_type: queue
    fine: false
    initial_time: '2023-03-25 12:15:46'
    job_adapter: gaussian
    job_id: '319902'
    job_memory_gb: 7.0
    job_name: conformer1
    job_num: 79
    job_server_name: a12079
    job_status:
    - running
    - error: ''
      keywords: []
      line: ''
      status: initializing
    job_type: conformers
    level:
      basis: def2svp
      compatible_ess: *id007
      method: wb97xd
      method_type: dft
      software: gaussian
    max_job_time: 120
    project: arc_ll_hab
    project_directory: /home/calvin/Code/arc_restart_debug/20_rows_171_to_190/
    server: local
    server_nodes: []
    species_labels:
    - r_177_[CH]=CC=C

In the debugger, self.running_jobs[spc.label] is ['conformer0', 'conformer1']. So at first glance it appears that, when the YAML file is read in, conformer0 is also being added to the self.running_jobs[spc.label] list.

@calvinp0
Member Author

Okay, so when the restart YAML is parsed, it correctly records which conformers are running, both in self.running_jobs (in this case conformer1) and in self.job_dict (self.job_dict['r_177_[CH]=CC=C'] = {'conformers':{1:<arc.job.adapters>}}).

However, as the run restarts, it prints this out in the terminal:

Running local queue job conformer0 (a89) using gaussian for r_177_[CH]=CC=C

And thus, everything changes.

Now self.job_dict['r_177_[CH]=CC=C'] = {'conformers':{0:<arc.job.adapters>}} and self.running_jobs['r_177_[CH]=CC=C'] = ['conformer1', 'conformer0']. So it appears the conformer counter restarts from 0 rather than continuing from 1: the conformers: 1 entry is removed from self.job_dict, while the new conformer0 is appended to the self.running_jobs list, which still contains conformer1. So when it reaches this line:
https://github.com/ReactionMechanismGenerator/ARC/blob/main/arc/scheduler.py#L3470

It errors out because self.job_dict no longer has an entry for conformer 1.
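
The mismatch can be reproduced outside ARC with plain dictionaries. This is only a sketch of the state described above, not ARC's actual code: `index_from_job_name` is a hypothetical stand-in for `get_i_from_job_name`, `collect_conformer_jobs` is a hypothetical simplification of the list comprehension in `save_restart_dict`, and the `'conformers'` lookup key is an assumption based on the `job_dict` contents shown in this thread.

```python
import re


def index_from_job_name(job_name: str) -> int:
    """Hypothetical stand-in for ARC's get_i_from_job_name: 'conformer1' -> 1."""
    match = re.search(r"(\d+)$", job_name)
    if match is None:
        raise ValueError(f"no trailing index in {job_name!r}")
    return int(match.group(1))


def collect_conformer_jobs(job_dict, running_jobs, label):
    """Look up a job object for every running conformer job of one species."""
    return [job_dict[label]["conformers"][index_from_job_name(name)]
            for name in running_jobs[label] if "conformer" in name]


label = "r_177_[CH]=CC=C"
# State after the restart: the job object for conformer 1 was replaced by a
# new conformer 0, but 'conformer1' is still listed as running.
job_dict = {label: {"conformers": {0: "<job adapter for conformer0>"}}}
running_jobs = {label: ["conformer1", "conformer0"]}

try:
    collect_conformer_jobs(job_dict, running_jobs, label)
except KeyError as err:
    missing_key = err.args[0]
    print(f"KeyError: {missing_key}")  # prints "KeyError: 1"
```

The lookup fails on the first name in the list, `conformer1`, because index 1 is no longer a key of the inner dictionary.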

alongd commented Mar 26, 2023

The conformer counter should indeed start at 0. We need to check why you got self.job_dict['r_177_[CH]=CC=C'] = {'conformers':{1:<arc.job.adapters>}}.
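
One way to narrow this down might be a consistency check run right after the restart file is loaded and again after each job is spawned. The helper below is a hypothetical diagnostic, not part of ARC; it assumes `job_dict` and `running_jobs` have the shapes shown in this thread (label -> {'conformers': {index: job}} and label -> list of job names).

```python
import re


def find_orphaned_conformer_jobs(job_dict, running_jobs):
    """Return {label: [job_name, ...]} for running conformer jobs
    that have no matching job object in job_dict."""
    orphans = {}
    for label, names in running_jobs.items():
        known = job_dict.get(label, {}).get("conformers", {})
        for name in names:
            match = re.fullmatch(r"conformer(\d+)", name)
            if match and int(match.group(1)) not in known:
                orphans.setdefault(label, []).append(name)
    return orphans


label = "r_177_[CH]=CC=C"

# State right after the restart YAML is parsed (consistent).
after_parse = find_orphaned_conformer_jobs(
    {label: {"conformers": {1: "<job>"}}},
    {label: ["conformer1"]},
)

# State after "Running local queue job conformer0 ..." (inconsistent).
after_spawn = find_orphaned_conformer_jobs(
    {label: {"conformers": {0: "<job>"}}},
    {label: ["conformer1", "conformer0"]},
)

print(after_parse)
print(after_spawn)
```

An empty result after parsing and a non-empty one after spawning would point at the spawning step, rather than the YAML parsing, as the place where the two structures diverge.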
