Use more fields to address dynamic address processes #2361
Comments
@doomedraven do you have some insight on how repeated PIDs are handled in CAPE? I remember we did some limited tests for it and did not encounter repetitions.
Just mentioning this here as well: I think CAPE reports have a "first-seen" field for each process and I think that maybe that could be it (that would be similar to
Sysmon uses a GUID associated with each process to deal with this scenario, but I don't think this is exposed in the Windows API. So it does seem like there should be a field that capa can offer to sandbox backends to differentiate colliding PIDs.
@kevoreilly I guess you are the best person to answer that, can you help? I can explain how CAPE handles PIDs in analyzer.py, but not in capemon.
It has always been my understanding that it is not technically possible for a process to have the same PID as its parent. The best source I can find seems to be Raymond Chen: https://devblogs.microsoft.com/oldnewthing/20110107-00/?p=11803 but I have heard this many times over the years, and have certainly never seen anything like it myself. I would be happy to learn something new and be proven wrong, but my first reaction to hearing "we saw a duplicate pid on drakvuf" is to assume there is something wrong with that setup rather than this being a universal behavior of Windows...
@kevoreilly I think the discussion here is about PID reuse, especially when the PIDs of a parent process's children collide after some time (for example, many short-lived processes created over a long period), not about PIDs being reused at the same moment in time.
Apologies, I misunderstood. I see what you mean: a single sandbox run containing two children at different times with the same PID (and parent)... Well, I have never seen that either! I would have thought, as you suggest, that the circumstances in which that might occur would be very unusual, such as creating a very high number of child processes. As such, I don't really see this as a problem that needs to be solved in CAPE, as I still can't believe it would happen in a run that one would otherwise expect the sandbox to handle. I stand to be corrected though! But I would need to see it to believe it.
Sorry, there was a bit of miscommunication on my part as well. The issue at hand is correctly identifying processes. Without an adequate identifier, we might treat two different processes as the same one (in the case of PID reuse), or treat a single process as two different processes (in the case of PPID spoofing when PPID:PID is used to identify processes). For the first case (PID reuse) I think what @kevoreilly said is sufficient, but I think the second case (PPID spoofing/UAC elevation) is still an issue. DRAKVUF sandbox previously used PID, ts_from (when the process was first seen), and ts_to (when the process was last seen) to identify processes, but it has now moved to a sequential ID (PR: CERT-Polska/drakvuf-sandbox#958). capa identifies processes using a combination of PPID and PID, which should be a problem in the case of PPID spoofing/UAC elevation, since the same process would be split in two (before UAC elevation and after). It would be nice to see how CAPE handles this issue (identifying processes), so that if it's similar to what DRAKVUF sandbox does, we could factor that into capa and use it (instead of the current PPID:PID) to identify different processes. If not, then I guess we could add a different optional ( Line 45 in 25111f8
Lots of discussion about the design for a PID replacement over here: elastic/ecs#672
As @yelhamer suggested, I think we should extend the ProcessAddress class with an optional field that can be used to uniquely identify the process. Dynamic analysis backends, such as DRAKVUF, CAPE, VMRay, etc., may provide a sandbox-specific string for this field. If it is present, then capa can differentiate processes whose PIDs collide. If not, then capa may report results that are unexpectedly merged together (not the end of the world, just confusing, and the current behavior). We'll leave it up to the backend authors to figure out the right "unique" data to put into this field. For VMRay, this might be the "monitor thread ID" and for DRAKVUF it might be the "SEQID". We should try to render the "unique" data to users, if possible, because PID/PPID isn't sufficient for either machines or humans to differentiate processes. I don't have strong opinions on the name of the field, but perhaps
Sounds good, although I don't see this as a high-priority item currently.
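To make the proposal above concrete, here is a minimal sketch of an address type with an optional disambiguating field. It is a simplified, hypothetical stand-in and not capa's actual ProcessAddress implementation; the field name `unique` and the example values are assumptions chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProcessAddress:
    """Hypothetical, simplified stand-in for capa.features.address.ProcessAddress."""

    pid: int
    ppid: int = 0
    # Assumed optional field: a sandbox-specific string (for example DRAKVUF's
    # SEQID) supplied by the backend. None means "backend gave us nothing".
    unique: Optional[str] = None

    def __str__(self) -> str:
        text = f"process(ppid: {self.ppid}, pid: {self.pid}"
        if self.unique is not None:
            # Render the extra data so that humans can tell processes apart too.
            text += f", unique: {self.unique}"
        return text + ")"


# Without the extra field, two processes that reuse PPID:PID compare equal
# and collapse into one entry (the current, merged behavior).
a = ProcessAddress(pid=1234, ppid=4)
b = ProcessAddress(pid=1234, ppid=4)
merged = {a, b}

# With a backend-provided value, the same PPID:PID pair stays distinct.
c = ProcessAddress(pid=1234, ppid=4, unique="seq-1")
d = ProcessAddress(pid=1234, ppid=4, unique="seq-2")
distinct = {c, d}
```

Because the field defaults to None, backends that cannot supply such a value keep today's behavior, while those that can get separate entries per process.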
While working on the DRAKVUF sandbox, we noticed that sometimes processes would have the same PID and PPID and would therefore be fused together in the final generated JSON sandbox report. It would be nice to have some way to distinguish between such processes.
DRAKVUF (the monitor) gets around this by specifying more fields while reporting each API call that was made or file that was accessed. These fields include ts_from (time when the process was created), ts_to (time when the process ended), as well as the process name. As for the DRAKVUF sandbox, the devs have now added a new "SEQID" field, an alphanumeric value generated from ts_from and ts_to, so it might be nice to use one of these two ideas to distinguish between processes with the same PID and PPID.
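As a rough illustration of the timestamp idea, the sketch below groups API events first by PPID:PID alone and then by PPID:PID plus the process start time. The Event layout and the sample values are hypothetical and do not reflect DRAKVUF's actual report format.

```python
from collections import defaultdict
from typing import NamedTuple


class Event(NamedTuple):
    """Hypothetical event record; not DRAKVUF's real output schema."""

    pid: int
    ppid: int
    ts_from: float  # assumed: creation time of the process that made the call
    api: str


events = [
    Event(1234, 4, 10.0, "CreateFileW"),
    Event(1234, 4, 10.0, "WriteFile"),
    Event(1234, 4, 95.5, "RegOpenKeyExW"),  # a later process reusing the PID
]

by_pid_only = defaultdict(list)
by_pid_and_start = defaultdict(list)
for event in events:
    # Keying on PPID:PID alone fuses both processes together.
    by_pid_only[(event.ppid, event.pid)].append(event.api)
    # Adding the start time keeps the two process lifetimes apart.
    by_pid_and_start[(event.ppid, event.pid, event.ts_from)].append(event.api)
```

Here `by_pid_only` ends up with a single key holding all three calls, while `by_pid_and_start` has one key per process lifetime.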
I think this issue has come up in the past, and I think that maybe we could add an extra field (maybe call it inner?) to the capa.features.address.ProcessAddress class, and then in the case of the DRAKVUF sandbox we could put the newly added SEQID there and use it to tell which process is which. Alternatively, we could record ts_from and ts_to in ProcessAddress and use them to tell processes apart for all sandboxes?
I am not sure how different sandboxes tackle this issue, so maybe some research is needed to find a common ground between all of them that we can factor out into the ProcessAddress class. I am using this issue to get a conversation started on it.