Memory leak occurs when sending eager0 messages in pt2pt #702

Open
qizhi45 opened this issue Nov 1, 2023 · 0 comments

Memory leak occurs when sending eager0 messages in pt2pt for sstmacro standalone model

1 - Detailed description of problem or enhancement

When MPI_Send and MPI_Recv are used to exchange point-to-point messages and a message is sent with the eager0 protocol, a memory leak occurs.
Preliminary analysis suggests the following cause. When a message is sent, Eager0::start() allocates smsg_buffer_ with new by calling fillSendBuffer(). When NetworkMessage::putOnWire() runs, it allocates a new wire_buffer_ and copies smsg_buffer_ into it, but the code does not free smsg_buffer_; it sets smsg_buffer_ directly to nullptr. At that point no pointer refers to the previously allocated memory, so it leaks. The memory released in the NetworkMessage destructor is actually that of wire_buffer_.

In addition, the memory management in this part of the code (network_message.cc, eager0.cc) is a bit confusing to me. There are so many new and delete calls that ownership of the memory gets lost. If that is the case, could this part be restructured?
Selected output from the heaptrack analysis follows:

PEAK MEMORY CONSUMERS
1.64G peak memory consumed over 200000 calls from
sumi::MpiProtocol::fillSendBuffer(int, void*, sumi::MpiType*)
  at ../../sumi-mpi/mpi_protocol/mpi_protocol.cc:62
  in /home/WorkSpace/test/SSTMacroBuild/lib/libsstmac.so.12
819.10M consumed over 100000 calls from:
    sumi::Eager0::start(void*, int, int, int, int, sumi::MpiType*, int, long, int, sumi::MpiRequest*)
      at ../../sumi-mpi/mpi_protocol/eager0.cc:71
      in /home/WorkSpace/test/SSTMacroBuild/lib/libsstmac.so.12
    sumi::MpiQueue::send(sumi::MpiRequest*, int, unsigned short, int, int, sumi::MpiComm*, void*)
      at ../../sumi-mpi/mpi_queue/mpi_queue.cc:197
      in /home/WorkSpace/test/SSTMacroBuild/lib/libsstmac.so.12
    sumi::MpiApi::send(void const*, int, unsigned short, int, int, long)
      at ../../sumi-mpi/mpi_api_send_recv.cc:81
      in /home/WorkSpace/test/SSTMacroBuild/lib/libsstmac.so.12
    sstmac_send
      at ../../sumi-mpi/sstmac_mpi.cc:89
      in /home/WorkSpace/test/SSTMacroBuild/lib/libsstmac.so.12
    userSkeletonMain(int, char**)
      in ./osu_latency
    sstmac::sw::App::run()
      at ../../../sstmac/software/process/app.cc:539
      in /home/WorkSpace/test/SSTMacroBuild/lib/libsstmac.so.12
    sstmac::sw::Thread::runRoutine(void*)
      at ../../../sstmac/software/process/thread.cc:141
      in /home/WorkSpace/test/SSTMacroBuild/lib/libsstmac.so.12
    sstmac_make_fcontext
      at ../../../sstmac/software/threading/asm/make_x86_64_sysv_elf_gas.S:49
      in /home/WorkSpace/test/SSTMacroBuild/lib/libsstmac.so.12
819.10M consumed over 100000 calls from:
    sumi::Eager0::start(void*, int, int, int, int, sumi::MpiType*, int, long, int, sumi::MpiRequest*)
      at ../../sumi-mpi/mpi_protocol/eager0.cc:71
      in /home/WorkSpace/test/SSTMacroBuild/lib/libsstmac.so.12
    sumi::MpiQueue::send(sumi::MpiRequest*, int, unsigned short, int, int, sumi::MpiComm*, void*)
      at ../../sumi-mpi/mpi_queue/mpi_queue.cc:197
      in /home/WorkSpace/test/SSTMacroBuild/lib/libsstmac.so.12
    sumi::MpiApi::send(void const*, int, unsigned short, int, int, long)
      at ../../sumi-mpi/mpi_api_send_recv.cc:81
      in /home/WorkSpace/test/SSTMacroBuild/lib/libsstmac.so.12
    sstmac_send
      at ../../sumi-mpi/sstmac_mpi.cc:89
      in /home/WorkSpace/test/SSTMacroBuild/lib/libsstmac.so.12
    userSkeletonMain(int, char**)
      in ./osu_latency
    sstmac::sw::App::run()
      at ../../../sstmac/software/process/app.cc:539
      in /home/WorkSpace/test/SSTMacroBuild/lib/libsstmac.so.12
    sstmac::sw::Thread::runRoutine(void*)
      at ../../../sstmac/software/process/thread.cc:141
      in /home/WorkSpace/test/SSTMacroBuild/lib/libsstmac.so.12
    sstmac_make_fcontext
      at ../../../sstmac/software/threading/asm/make_x86_64_sysv_elf_gas.S:49
      in /home/WorkSpace/test/SSTMacroBuild/lib/libsstmac.so.12
total runtime: 20.91s.
calls to allocation functions: 9589794 (458710/s)
temporary memory allocations: 6111 (292/s)
peak heap memory consumption: 1.68G
peak RSS (including heaptrack overhead): 1.68G
total memory leaked: 1.64G
suppressed leaks: 10.97K

2 - Describe how to reproduce

git clone https://github.com/sstsimulator/sst-macro.git
./configure --prefix=/home/WorkSpace/test/SSTMacroBuild CFLAGS="-fPIC" CXXFLAGS="-fPIC"
make && make install
cd WorkSpace/test/sst-macro/skeletons/osu-micro-benchmarks-5.3.2
cd mpi/pt2pt
vim osu_latency.c
modify line 87 : for(size = 8191; size > 0; size = 0) {
modify line 99 : for(i = 0; i < 100000; i++) {
modify line 110 : for(i = 0; i < 100000; i++) {
/home/WorkSpace/test/SSTMacroBuild/bin/sst++ -o osu_latency osu_latency.c osu_pt2pt.c -I.
heaptrack /home/WorkSpace/test/SSTMacroBuild/bin/sstmac -f parameter.ini
heaptrack --analyse

If heaptrack is not available, you can use another memory leak detection tool, or observe the memory usage of sstmac with top/htop.

The contents of parameter.ini are as follows:

node {
  name = simple
  app1 {
    launch_cmd = aprun -n 2 -N 1
    exe=./osu_latency
    mpi {
      max_vshort_msg_size = 16384
      max_eager_msg_size = 16384
      post_header_delay   = 0.25us
      post_rdma_delay     = 0.3us
      rdma_pin_latency    = 0.3us
      rdma_page_delay     = 0.03ns
    }
  }
  proc {
    frequency = 2.6Ghz
    ncores = 60
    parallelism = 16
  }
  memory {
    name = pisces
    total_bandwidth = 51.2GB/s
    nchannels = 6
    latency = 12.5ns
    arbitrator = cut_through
    max_single_bandwidth = 51.2GB/s
  }
  nic {
    name = pisces
    negligible_size = 0
    injection {
      mtu = 4096
      arbitrator = cut_through
      bandwidth = 100Gb/s
      latency = 300ns
      credits = 64KB
    }
    ejection{
      mtu = 4096
      arbitrator = cut_through
      bandwidth = 100Gb/s
      latency = 300ns
      credits = 64KB
    }
  }
}
switch {
 router {
   name = fat_tree
 }
 name = pisces
 arbitrator = cut_through
 mtu = 512
 link {
  bandwidth = 1Gb/s
  latency = 130ns
  credits = 64KB
 }
 xbar {
  bandwidth = 16Tb/s
 }
 logp {
  bandwidth = 1Gb/s
  hop_latency = 100ns
  out_in_latency = 60ns
 }
}
topology {
 name = fat_tree 
 concentration = 8
 num_core_switches = 64
 down_ports_per_core_switch = 16
 num_agg_subtrees = 16
 agg_switches_per_subtree = 8
 up_ports_per_agg_switch = 8
 down_ports_per_agg_switch = 8
 leaf_switches_per_subtree = 8
 up_ports_per_leaf_switch = 8
}

3 - What Operating system(s) and versions

lsb_release -a
LSB Version:    :core-4.1-amd64:core-4.1-noarch
Distributor ID: CentOS
Description:    CentOS Linux release 8.5.2111
Release:        8.5.2111
Codename:       n/a

4 - What version of external libraries (Boost, MPI)

g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/8/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,fortran,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --with-isl --disable-libmpx --enable-offload-targets=nvptx-none --without-cuda-driver --enable-gnu-indirect-function --enable-cet --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux
Thread model: posix
gcc version 8.5.0 20210514 (Red Hat 8.5.0-4) (GCC)

5 - Provide sha1 of all relevant sst repositories (sst-core, sst-elements, etc)
SSTMAC repo: c30a5ce

6 - Fill out Labels, Milestones, and Assignee fields as best possible
