Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: same TCP connection appears twice #631

Merged
merged 1 commit into from
Apr 15, 2024
Merged

fix: same TCP connection appears twice #631

merged 1 commit into from
Apr 15, 2024

Conversation

weidongkx
Copy link
Contributor

@weidongkx weidongkx commented Apr 9, 2024

/proc/net/{TCP, TCP6, UDP, UDP6} are dynamic ,
and when we read these files, we should read them all at once.
there will be data consistency issues if using line by line reading

fix: #576

```
TCP, TCP6, UDP, and UDP6 are dynamically changing,
and when we read these files, we should read them all at once.
there will be data consistency issues if using line by lin reading

fix: #576
```

Signed-off-by: weidongkl <[email protected]>
@discordianfish
Copy link
Member

Makes sense but I'm still worried about the increased memory usage here. These can be significant sized.. @SuperQ wdyt?

@weidongkx
Copy link
Contributor Author

net-tools used setvbuf _IOFBF to read tcp file at once too. data consistency should be more important than the increased memory usage

@weidongkx
Copy link
Contributor Author

@SuperQ wdyt

@discordianfish
Copy link
Member

Yeah I think we'll need to do this

@SuperQ SuperQ merged commit 987bedc into prometheus:master Apr 15, 2024
9 checks passed
@dswarbrick
Copy link
Contributor

net-tools used setvbuf _IOFBF to read tcp file at once too. data consistency should be more important than the increased memory usage

net-tools does not read the entire file at once. It merely uses buffered IO, with a buffer of PAGE_SIZE:

FILE *proc_fopen(const char *name)
{
    static char *buffer;
    static size_t pagesz;
    FILE *fd = fopen(name, "r");

    if (fd == NULL)
      return NULL;

    if (!buffer) {
      pagesz = getpagesize();
      buffer = malloc(pagesz);
    }   

    setvbuf(fd, buffer, _IOFBF, pagesz);
    return fd; 
}

I suspect that an equivalent approach in Go, which would have only required a single line change, would be to replace the previous io.LimitReader with a bufio.NewReaderSize(f, os.Getpagesize()).

@SuperQ
Copy link
Member

SuperQ commented Apr 16, 2024

Actually, this may be worse. os.ReadFile() attempts to open and read into a buffer the size of the file in one read. But only if it can stat the size of the file. Otherwise it reads 512 bytes at a time.

The proc files in question have no size.

$ stat /proc/net/tcp
  File: /proc/net/tcp
  Size: 0         	Blocks: 0          IO Block: 1024   regular empty file
Device: 17h/23d	Inode: 4026532077  Links: 1
Access: (0444/-r--r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2024-04-16 09:48:28.233648680 +0200
Modify: 2024-04-16 09:48:28.233648680 +0200
Change: 2024-04-16 09:48:28.233648680 +0200
 Birth: -

So this change likely does not fix the problem and may make it worse.

I think I'm going to revert this, without proof that this fix actually works, I think the original code is actually what we want.

SuperQ added a commit that referenced this pull request Apr 16, 2024
@dswarbrick
Copy link
Contributor

@SuperQ The os.Readfile approach indeed results in some unusual read syscall behaviour:

openat(AT_FDCWD, "/proc/net/tcp", O_RDONLY|O_CLOEXEC) = 3
fcntl(3, F_GETFL)                       = 0x8000 (flags O_RDONLY|O_LARGEFILE)
fcntl(3, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE) = 0
epoll_create1(EPOLL_CLOEXEC)            = 4
pipe2([5, 6], O_NONBLOCK|O_CLOEXEC)     = 0
epoll_ctl(4, EPOLL_CTL_ADD, 5, {events=EPOLLIN, data={u32=6294888, u64=6294888}}) = 0
epoll_ctl(4, EPOLL_CTL_ADD, 3, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data={u32=513802241, u64=9100870932907425793}}) = 0
fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(3, "  sl  local_address rem_address "..., 512) = 512
read(3, "0000 00000000   101        0 167"..., 384) = 384
read(3, "   \n   5: 0601A8C0:9BBA 86C8F950"..., 512) = 512
read(3, "00000000 00000000  1000        0"..., 640) = 640
read(3, " 0000000000000000 52 4 26 7 7   "..., 1024) = 1024
read(3, "000  1000        0 128169 2 0000"..., 1024) = 1024
read(3, "00000000 01:00000062 00000004  1"..., 1280) = 1280
read(3, "1 -1                   \n  35: 06"..., 1536) = 1536
read(3, "C0:81D6 A7619125:5D91 01 0000000"..., 2560) = 2560
read(3, "B6F5A2:A5B4 01 00000009:00000000"..., 2816) = 2816
read(3, "           \n  81: 0601A8C0:8E98 "..., 4096) = 462
read(3, "", 3634)                       = 0
epoll_ctl(4, EPOLL_CTL_DEL, 3, 0xc000147bf4) = 0
close(3)

It appears to start with a 512-byte read, but gradually increases (apart from a strange decrease to 384 bytes for the second read).

Compare that with the original io.LimitReader approach:

openat(AT_FDCWD, "/proc/net/tcp", O_RDONLY|O_CLOEXEC) = 3
fcntl(3, F_GETFL)                       = 0x8000 (flags O_RDONLY|O_LARGEFILE)
fcntl(3, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE) = 0
epoll_create1(EPOLL_CLOEXEC)            = 4
pipe2([5, 6], O_NONBLOCK|O_CLOEXEC)     = 0
epoll_ctl(4, EPOLL_CTL_ADD, 5, {events=EPOLLIN, data={u32=6299176, u64=6299176}}) = 0
epoll_ctl(4, EPOLL_CTL_ADD, 3, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data={u32=3198156801, u64=8984441875505086465}}) = 0
read(3, "  sl  local_address rem_address "..., 4096) = 4050
read(3, "  26: 0601A8C0:B45E 86A4E9AA:C8D"..., 4096) = 4050
read(3, "  53: 0601A8C0:A0D6 2EB6F5A2:A5B"..., 4096) = 3300
read(3, "", 4096)                       = 0
epoll_ctl(4, EPOLL_CTL_DEL, 3, 0xc000147c04) = 0
close(3)

Incidentally, my suggested bufio.Reader approach looks identical to the io.LimitReader strace, which is no surprise I guess, if they're both using the same underlying io.Reader.

@dswarbrick
Copy link
Contributor

Also for comparison, the (deprecated) netstat command:

openat(AT_FDCWD, "/proc/net/tcp", O_RDONLY) = 3
read(3, "  sl  local_address rem_address "..., 4096) = 4050
read(3, "  26: 0601A8C0:8938 8416AB51:05A"..., 4096) = 4050
read(3, "  53: 0601A8C0:8102 182F853E:589"..., 4096) = 4050
read(3, "  80: 0601A8C0:B44A B6126749:D98"..., 4096) = 4050
read(3, " 107: 0601A8C0:8890 6D39D766:DE9"..., 4096) = 4050
read(3, " 134: 0601A8C0:A3A4 9630C658:1AE"..., 4096) = 300
read(3, "", 4096)                       = 0
close(3)

So if anything, the original code most closely resembles how net-tools reads /proc/net/tcp.

Now, if both prometheus/procfs and net-tools produce inconsistent / duplicated entries, then I would say that it's time to use rtnetlink to get this information, as iproute2 does with the ss (socket statistics) tool.

@SuperQ
Copy link
Member

SuperQ commented Apr 16, 2024

We already switched to netlink in the node_exporter tcpstat collector.

@dswarbrick
Copy link
Contributor

Exactly - so should these legacy functions that parse /proc/net/tcp{,6} be marked as deprecated to discourage people from using them?

@SuperQ
Copy link
Member

SuperQ commented Apr 16, 2024

Yea, maybe not a bad idea. @discordianfish What do you think of marking this reader/parser as deprecated?

SuperQ added a commit that referenced this pull request Apr 17, 2024
@discordianfish
Copy link
Member

@SuperQ yeah that makes sense

codeboten referenced this pull request in open-telemetry/opentelemetry-collector-contrib Apr 23, 2024
)

[![Mend
Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com)

This PR contains the following updates:

| Package | Change | Age | Adoption | Passing | Confidence |
|---|---|---|---|---|---|
| [github.com/prometheus/procfs](https://togithub.com/prometheus/procfs)
| `v0.13.0` -> `v0.14.0` |
[![age](https://developer.mend.io/api/mc/badges/age/go/github.com%2fprometheus%2fprocfs/v0.14.0?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![adoption](https://developer.mend.io/api/mc/badges/adoption/go/github.com%2fprometheus%2fprocfs/v0.14.0?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![passing](https://developer.mend.io/api/mc/badges/compatibility/go/github.com%2fprometheus%2fprocfs/v0.13.0/v0.14.0?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![confidence](https://developer.mend.io/api/mc/badges/confidence/go/github.com%2fprometheus%2fprocfs/v0.13.0/v0.14.0?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|

---

> [!WARNING]
> Some dependencies could not be looked up. Check the Dependency
Dashboard for more information.

---

### Release Notes

<details>
<summary>prometheus/procfs (github.com/prometheus/procfs)</summary>

###
[`v0.14.0`](https://togithub.com/prometheus/procfs/releases/tag/v0.14.0)

[Compare
Source](https://togithub.com/prometheus/procfs/compare/v0.13.0...v0.14.0)

#### What's Changed

- Synchronize common files from prometheus/prometheus by
[@&#8203;prombot](https://togithub.com/prombot) in
[https://github.com/prometheus/procfs/pull/613](https://togithub.com/prometheus/procfs/pull/613)
- Synchronize common files from prometheus/prometheus by
[@&#8203;prombot](https://togithub.com/prombot) in
[https://github.com/prometheus/procfs/pull/615](https://togithub.com/prometheus/procfs/pull/615)
- Synchronize common files from prometheus/prometheus by
[@&#8203;prombot](https://togithub.com/prombot) in
[https://github.com/prometheus/procfs/pull/616](https://togithub.com/prometheus/procfs/pull/616)
- Synchronize common files from prometheus/prometheus by
[@&#8203;prombot](https://togithub.com/prombot) in
[https://github.com/prometheus/procfs/pull/621](https://togithub.com/prometheus/procfs/pull/621)
- Revert add avgRTT to nfs mountstats
[#&#8203;487](https://togithub.com/prometheus/procfs/issues/487) by
[@&#8203;SuperQ](https://togithub.com/SuperQ) in
[https://github.com/prometheus/procfs/pull/625](https://togithub.com/prometheus/procfs/pull/625)
- style: returns procfs build-in error like other parsing methods by
[@&#8203;weidongkl](https://togithub.com/weidongkl) in
[https://github.com/prometheus/procfs/pull/630](https://togithub.com/prometheus/procfs/pull/630)
- update MAINTAINERS.md by [@&#8203;pgier](https://togithub.com/pgier)
in
[https://github.com/prometheus/procfs/pull/629](https://togithub.com/prometheus/procfs/pull/629)
- \*: `s/(%v|%s)/%w` and use `go1.20` by
[@&#8203;rexagod](https://togithub.com/rexagod) in
[https://github.com/prometheus/procfs/pull/617](https://togithub.com/prometheus/procfs/pull/617)
- chore: Use kernel-compliant types for `{U,G}IDs` by
[@&#8203;rexagod](https://togithub.com/rexagod) in
[https://github.com/prometheus/procfs/pull/620](https://togithub.com/prometheus/procfs/pull/620)
- Update Go versions by [@&#8203;SuperQ](https://togithub.com/SuperQ) in
[https://github.com/prometheus/procfs/pull/632](https://togithub.com/prometheus/procfs/pull/632)
- fix: same TCP connection appears twice by
[@&#8203;weidongkl](https://togithub.com/weidongkl) in
[https://github.com/prometheus/procfs/pull/631](https://togithub.com/prometheus/procfs/pull/631)
- Revert "fix: same TCP connection appears twice" by
[@&#8203;SuperQ](https://togithub.com/SuperQ) in
[https://github.com/prometheus/procfs/pull/633](https://togithub.com/prometheus/procfs/pull/633)

#### New Contributors

- [@&#8203;rexagod](https://togithub.com/rexagod) made their first
contribution in
[https://github.com/prometheus/procfs/pull/617](https://togithub.com/prometheus/procfs/pull/617)

**Full Changelog**:
prometheus/procfs@v0.13.0...v0.14.0

</details>

---

### Configuration

📅 **Schedule**: Branch creation - "on tuesday" (UTC), Automerge - At any
time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you
are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box

---

This PR has been generated by [Mend
Renovate](https://www.mend.io/free-developer-tools/renovate/). View
repository job log
[here](https://developer.mend.io/github/open-telemetry/opentelemetry-collector-contrib).

<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNy4zMTMuMSIsInVwZGF0ZWRJblZlciI6IjM3LjMxMy4xIiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6WyJkZXBlbmRlbmNpZXMiLCJyZW5vdmF0ZWJvdCJdfQ==-->

---------

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: opentelemetrybot <[email protected]>
jritter pushed a commit to jritter/procfs that referenced this pull request Jul 15, 2024
```
TCP, TCP6, UDP, and UDP6 are dynamically changing,
and when we read these files, we should read them all at once.
there will be data consistency issues if using line by lin reading

fix: prometheus#576
```

Signed-off-by: weidongkl <[email protected]>
jritter pushed a commit to jritter/procfs that referenced this pull request Jul 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

procfs has data consistency issues when reading /proc/net/tcp
5 participants