OS: CentOS Linux release 7.9.2009 (Core)
Compiler: GCC 13.2.0
CPU: Intel(R) Xeon(R) Platinum 8352V CPU @ 2.10GHz
NUMA node(s): 2
PyTorch: 1.12.0
LAMMPS version: 29 Sep 2021 stable release
MPI: Intel Parallel Studio XE 2019
When I ran simulated annealing on small Fe clusters (12 atoms in the run below), I got the following error:
LAMMPS (29 Sep 2021)
OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/comm.cpp:98)
using 1 OpenMP thread(s) per MPI task
units metal
atom_style atomic
boundary p p p
newton on
read_data in.data
Reading data file ...
orthogonal box = (0.0000000 0.0000000 0.0000000) to (20.000000 20.000000 20.000000)
1 by 1 by 1 MPI processor grid
reading atoms ...
12 atoms
read_data CPU = 0.003 seconds
#read_restart file.restart.100000
pair_style allegro
pair_coeff * * fe-total.pth Fe
timestep 0.001 # ps
thermo_style custom step dt time temp ke pe etotal press vol
thermo 20
dump 1 all custom 200 dump.lammpstrj id type x y z
restart 100000 file.restart
fix s1 all nvt temp 0.01 1000 $(100.0*dt)
fix s1 all nvt temp 0.01 1000 0.10000000000000000555
run 30000
Neighbor list info ...
update every 1 steps, delay 10 steps, check yes
max neighbors/atom: 2000, page size: 100000
master list distance cutoff = 8
ghost atom cutoff = 8
binsize = 4, bins = 5 5 5
1 neighbor lists, perpetual/occasional/extra = 1 0 0
(1) pair allegro, perpetual
attributes: full, newton on, ghost
pair build: full/bin/ghost
stencil: full/ghost/bin/3d
bin: standard
Per MPI rank memory allocation (min/avg/max) = 4.315 | 4.315 | 4.315 Mbytes
Step Dt Time Temp KinEng PotEng TotEng Press Volume
0 0.001 0 0 0 -77.797695 -77.797695 0 8000
.......
.......
.......
470920 0.001 470.92 676.16539 0.9614136 -83.998843 -83.03743 128.36286 8000
470940 0.001 470.94 668.32156 0.95026076 -83.998562 -83.048301 126.87379 8000
470960 0.001 470.96 676.39779 0.96174404 -83.99844 -83.036696 128.40698 8000
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 18750 RUNNING AT node02
= EXIT CODE: 139
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 18750 RUNNING AT node02
= EXIT CODE: 11
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
Intel(R) MPI Library troubleshooting guide:
https://software.intel.com/node/561764
The input file content is as follows:
units metal
atom_style atomic
boundary p p p
newton on
read_data in.data
#read_restart file.restart.100000
pair_style allegro
pair_coeff * * fe-total.pth Fe
timestep 0.001 # ps
thermo_style custom step dt time temp ke pe etotal press vol
thermo 20
dump 1 all custom 200 dump.lammpstrj id type x y z
restart 100000 file.restart
fix s1 all nvt temp 0.01 1000 $(100.0*dt)
run 30000
unfix s1
fix s2 all nvt temp 1000 1000 $(100.0*dt)
run 100000
unfix s2
fix s3 all nvt temp 1000 50 $(100.0*dt)
run 6000000
unfix s3
write_data out.data
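For reference, the damping time passed to fix nvt through the immediate variable `$(100.0*dt)` corresponds to 100 timesteps, i.e. 0.1 ps with `timestep 0.001` in metal units. The Python sketch below reproduces the value LAMMPS echoes in the log (`0.10000000000000000555`). It assumes that LAMMPS prints immediate variables with the C format `%.20g`. This is consistent with the echoed line but is not stated in the original.

```python
# Sketch: reproduce how the immediate variable $(100.0*dt) expands to the
# Tdamp value echoed in the LAMMPS log.
# Assumption: LAMMPS formats immediate variables as a C double with "%.20g".

dt = 0.001  # ps, from "timestep 0.001" with "units metal"

tdamp = 100.0 * dt  # thermostat damping time = 100 timesteps

# Python's %-formatting follows C printf semantics for "%.20g",
# exposing the binary rounding of 0.1 in the last digits.
echoed = "%.20g" % tdamp

print(tdamp)   # 0.1
print(echoed)  # 0.10000000000000000555
```

The trailing `...555` comes only from printing the nearest double to 0.1 with 20 significant digits. The thermostat itself still uses 0.1 ps.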
The job did not complete. The input requires 6,130,000 timesteps in total (30,000 + 100,000 + 6,000,000), but the run crashes at around step 470,000, i.e. during the third (cooling) stage, at which point the error message above appears.
I then tried to analyze the crash with GDB, although I am not very familiar with it.
The backtrace is as follows:
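To check where the crash falls, the following Python sketch maps a cumulative timestep onto the three `run` stages of the input script. It assumes, though the original does not state it, that the timestep counter is never reset (no `reset_timestep`), so LAMMPS keeps counting across consecutive `run` commands. The stage labels are descriptive names chosen here, not LAMMPS output.

```python
# Sketch: locate a cumulative timestep within the three annealing stages.
# Assumption: no reset_timestep, so the step counter continues across runs.

STAGES = [
    ("s1: heat 0.01 K -> 1000 K", 30_000),
    ("s2: hold at 1000 K", 100_000),
    ("s3: cool 1000 K -> 50 K", 6_000_000),
]


def locate_step(step):
    """Return (stage label, offset within that stage) for a cumulative timestep."""
    start = 0
    for label, nsteps in STAGES:
        if step <= start + nsteps:
            return label, step - start
        start += nsteps
    raise ValueError(f"step {step} is beyond the last stage (total {start})")


total = sum(n for _, n in STAGES)
stage, offset = locate_step(470_960)  # last thermo line printed before the crash

print(total)          # 6130000
print(stage, offset)  # s3: cool 1000 K -> 50 K 340960
```

This places the last printed step roughly 341,000 steps into the final `run 6000000`. That agrees with frame #9 of the backtrace (`Verlet::run ... n=6000000`).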
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) where
#0 0x0000000000000000 in ?? ()
#1 0x00007fffe0ff25ad in torch::jit::InterpreterStateImpl::callstack() const () from /opt/software/python3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so
#2 0x00007fffe0ff3e8e in torch::jit::InterpreterStateImpl::handleError(std::exception const&, bool, c10::NotImplementedError*, c10::optional<std::string>) () from /opt/software/python3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so
#3 0x00007fffe1000fd0 in torch::jit::InterpreterStateImpl::runImpl(std::vector<c10::IValue, std::allocator<c10::IValue> >&) () from /opt/software/python3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so
#4 0x00007fffe0fee44f in torch::jit::InterpreterState::run(std::vector<c10::IValue, std::allocator<c10::IValue> >&) () from /opt/software/python3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so
#5 0x00007fffe0fe167a in torch::jit::GraphExecutorImplBase::run(std::vector<c10::IValue, std::allocator<c10::IValue> >&) () from /opt/software/python3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so
#6 0x00007fffe0c90ade in torch::jit::Method::operator()(std::vector<c10::IValue, std::allocator<c10::IValue> >, std::unordered_map<std::string, c10::IValue, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, c10::IValue> > > const&) const () from /opt/software/python3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so
#7 0x00000000006f3496 in torch::jit::Module::forward (this=this@entry=0x2c83a38, inputs=..., kwargs=...) at /opt/software/python3/lib/python3.7/site-packages/torch/include/torch/csrc/jit/api/module.h:114
#8 0x00000000006ef443 in LAMMPS_NS::PairAllegro::compute (this=0x2c836c0, eflag=, vflag=) at /opt/source/lammps-stable_29Sep2021/src/pair_allegro.cpp:426
#9 0x00000000005379fb in LAMMPS_NS::Verlet::run (this=0x2c82c60, n=6000000) at /opt/source/lammps-stable_29Sep2021/src/verlet.cpp:312
#10 0x00000000004f291b in LAMMPS_NS::Run::command (this=, narg=, arg=) at /opt/source/lammps-stable_29Sep2021/src/run.cpp:180
#11 0x0000000000448614 in LAMMPS_NS::Input::execute_command (this=0x2c68cd0) at /opt/source/lammps-stable_29Sep2021/src/input.cpp:794
#12 0x0000000000448c2c in LAMMPS_NS::Input::file (this=0x2c68cd0) at /opt/source/lammps-stable_29Sep2021/src/input.cpp:273
#13 0x00000000004235a8 in main (argc=, argv=) at /opt/source/lammps-stable_29Sep2021/src/main.cpp:98
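One way to inspect the structure just before the crash is to read the last frame of `dump.lammpstrj`, which the input writes every 200 steps. That frame is step 470800, since step 471000 is never reached. The Python sketch below computes each atom's periodic (minimum-image) nearest-neighbour distance. This could reveal, for example, an atom that has drifted away from the 12-atom cluster at high temperature. That possibility is an assumption, not something the backtrace confirms. The helpers `read_last_frame`, `nearest_neighbour_distances` and `report` are hypothetical names, not part of LAMMPS or Allegro. The sketch assumes an orthogonal box and the exact `id type x y z` column order used in the input.

```python
# Sketch: report per-atom nearest-neighbour distances in the last frame of a
# LAMMPS custom dump such as dump.lammpstrj ("id type x y z" columns).
# Assumptions: orthogonal periodic box ("pp pp pp"), coordinates in Angstrom
# (units metal), and the column order used in the input script.
import math


def read_last_frame(text):
    """Return (timestep, [Lx, Ly, Lz], {atom_id: (x, y, z)}) for the last frame."""
    lines = text.splitlines()
    frame = None
    i = 0
    while i < len(lines):
        if lines[i].startswith("ITEM: TIMESTEP"):
            step = int(lines[i + 1])
            natoms = int(lines[i + 3])
            # Three "lo hi" lines follow "ITEM: BOX BOUNDS".
            bounds = [lines[i + 5 + k].split()[:2] for k in range(3)]
            box = [float(hi) - float(lo) for lo, hi in bounds]
            atoms = {}
            # Atom rows start after the "ITEM: ATOMS id type x y z" header.
            for row in lines[i + 9 : i + 9 + natoms]:
                aid, _atype, x, y, z = row.split()[:5]
                atoms[int(aid)] = (float(x), float(y), float(z))
            frame = (step, box, atoms)
            i += 9 + natoms
        else:
            i += 1
    if frame is None:
        raise ValueError("no 'ITEM: TIMESTEP' frames found")
    return frame


def nearest_neighbour_distances(box, atoms):
    """Minimum-image distance from each atom to its closest other atom."""
    result = {}
    for a, pa in atoms.items():
        best_sq = math.inf
        for b, pb in atoms.items():
            if a == b:
                continue
            sq = 0.0
            for k in range(3):
                d = pa[k] - pb[k]
                d -= box[k] * round(d / box[k])  # wrap into [-L/2, L/2]
                sq += d * d
            best_sq = min(best_sq, sq)
        result[a] = math.sqrt(best_sq)
    return result


def report(path="dump.lammpstrj"):
    """Print nearest-neighbour distances for the last frame stored in `path`."""
    with open(path) as fh:
        step, box, atoms = read_last_frame(fh.read())
    for aid, r in sorted(nearest_neighbour_distances(box, atoms).items()):
        print(f"step {step}  atom {aid:3d}  nearest neighbour {r:.3f} A")
```

Calling `report()` in the job directory would list one distance per atom for the last saved frame. A value much larger than the others would single out an atom that is far from the rest of the cluster.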
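Judging from frames #1 to #3, the segmentation fault seems to occur while TorchScript is handling an exception raised during the model's forward pass (InterpreterStateImpl::handleError), but I'm not sure how to solve this problem. I hope you can provide some help. Thanks!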