Review #1
Conversation
* Add minor go code, super broken!
* Add NOTES.md for me
* Update command and args and use env instead
* Fix go code...
* Set SA correct in triggerTemplate
* Set ful path in poddeleter command, tekton changes the WORKSPACE
* Ignore poddeleter in falco so we don't get a loop of delete pods
* Add go.sum
* Clean up NOTES.md
* Add LICENSE
Partial review: the readme proved tricky to review on my phone.
I think I would have split the go main function into two functions: one to create the client and one to maybe kill the pod (i.e. it would take a client, a blacklist and a falco event and maybe do something). That should make it possible to write a few unit tests for the latter.
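For illustration, a minimal sketch of such a split (the `maybeDeletePod` name and the `FalcoEvent` fields are made up for this example, not the PR's actual identifiers):

```go
package main

import (
	"context"

	metaV1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// FalcoEvent holds only the fields needed for this sketch; the real code
// parses the full Falco webhook payload.
type FalcoEvent struct {
	PodName   string
	Namespace string
}

// maybeDeletePod takes an already-constructed client, a set of protected
// namespaces and a parsed Falco event, and deletes the offending pod unless
// it lives in a protected namespace. Because the client is injected, this
// function can be unit tested with a fake clientset
// (k8s.io/client-go/kubernetes/fake).
func maybeDeletePod(client kubernetes.Interface, protected map[string]bool, event FalcoEvent) error {
	if protected[event.Namespace] {
		return nil // protected namespace, nothing to do
	}
	return client.CoreV1().Pods(event.Namespace).Delete(
		context.Background(), event.PodName, metaV1.DeleteOptions{})
}
```

main() would then only build the in-cluster client and call this function, which is the part that is hard to unit test anyway.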
```go
		critical = true
		break
	}
}
```
I believe my calendar says it is 2021. Surely there is a set or hashmap we can use somewhere?
Rather than use an array and iterate over it, we can use a map. Consider:
```go
CriticalNamespaces := map[string]bool{
	"kube-system":     true,
	"kube-public":     true,
	"kube-node-lease": true,
	"falco":           true,
}
// ...
if CriticalNamespaces[ns] {
	log.Printf("Protected namespace %s", ns)
	return
}
```
Golang has no set type, so we have to abuse a map, using bools as values. Once we have a map, we can ask whether something is in it and we do not need to iterate at all. More about maps here: https://blog.golang.org/maps.
Ahh, this was what I couldn't really wrap my head around: what I should use for key/value. But a simple bool seems like an easy enough idea.
I guess you could argue that the []string is easier for people to understand, and in the future, when we support having criticalNamespaces as an input variable, that would probably be a []string and we would need to do some formatting from the input into the hash map.
Once again, way too far in the future for this issue, but still.
But I will update it so I remember how to do it in the future :)
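For reference, a sketch of what that formatting could look like once the namespaces arrive as a `[]string`, e.g. from a comma-separated environment variable (the `FALCO_CRITICAL_NAMESPACES` name is invented for the example; it is not part of this PR):

```go
package main

import (
	"os"
	"strings"
)

// toSet converts a slice of namespace names into a map usable as a set.
func toSet(names []string) map[string]bool {
	set := make(map[string]bool, len(names))
	for _, name := range names {
		set[name] = true
	}
	return set
}

// criticalNamespaceSet builds the set from the defaults, optionally extended
// by a comma-separated environment variable (hypothetical name).
func criticalNamespaceSet() map[string]bool {
	names := []string{"kube-system", "kube-public", "kube-node-lease", "falco"}
	if extra := os.Getenv("FALCO_CRITICAL_NAMESPACES"); extra != "" {
		names = append(names, strings.Split(extra, ",")...)
	}
	return toSet(names)
}
```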
```go
		log.Fatalf("Unable to delete pod due to err %v", err)
		os.Exit(1)
	}
}
```
I like that you try to create the client first, so you get a failure even if we don't need to act. Of course, that assumes there are no kube-system (i.e. cluster-level) events that show up here when the API is temporarily unavailable (i.e. events that should be ignored, but we don't get to the ignoring part because we fail).
I have broken it out now, please have a look.
This is really cool!! 😁😁
```yaml
config:
  webhook:
    address: http://el-falco-listener.falcoresponse.svc.cluster.local:8080
    customHeaders: Falcon:true\,Stuff:yes
```
Should it be `Falcon` or `Falco`?
I have been testing a few things earlier; I should probably delete it. I would like to get a CEL filter working, but Tekton has changed the API and I got rather tired when it didn't work.
For now I will keep it, just to show that you can add a customHeader. The reason why I wrote falcon was that I wanted to be 100% sure I didn't overwrite any headers that are included in falco.
main.go
Outdated
```go
}

func main() {
	var CriticalNamespaces = []string{"kube-system", "kube-public", "kube-node-lease", "falco"}
```
Could be nice to have the ability to append more namespaces from environment variables here?
@simongottschlag That would indeed be useful, but remember that we are reviewing code already on master and should be very careful about feature growth. If we introduce an env var for this, it should be its own PR.
If you look at the bottom of the README you will see a conclusion where I talk about the need for this feature.
The reason why I don't add it is that I want the code to be quick to get an overview of.
This code is just for this blog post; we had a small discussion in the Falco Slack channel about writing a bigger application to be able to manage these kinds of loads. https://kubernetes.slack.com/archives/CMWH3EH32/p1619869899307600?thread_ts=1619847638.297000&cid=CMWH3EH32
So let's see how that evolves in the future.
main.go
Outdated
```go
bodyReq := os.Getenv("BODY")
if bodyReq == "" {
	panic("Need to get environment variable BODY")
}
```
Not sure how big the body can be, but there are limits (even though large) to environment variables. Can this be read from a file instead?
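A rough sketch of the file-based variant, assuming the event body is written to a path passed in a made-up `BODY_FILE` variable (this is a drop-in snippet for main; on Go 1.15 it needs the `io/ioutil` and `log` imports):

```go
// Read the Falco event from a file instead of passing the whole body
// through the environment, which avoids env var size limits.
bodyPath := os.Getenv("BODY_FILE") // hypothetical variable name
if bodyPath == "" {
	log.Fatal("Need to get environment variable BODY_FILE")
}
raw, err := ioutil.ReadFile(bodyPath) // os.ReadFile on Go >= 1.16
if err != nil {
	log.Fatalf("Unable to read event body from %s: %v", bodyPath, err)
}
bodyReq := string(raw)
```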
Hm, I wonder if encoding is correctly handled through the chain?
Overall the event won't be suuuuper big, so we shouldn't hit any issues.
Yeah, I have been able to trigger a bug: instead of using uptime in the exec I used:
`echo '"please dont kill me"'`
The JSON got screwed up and I was unable to parse the input.
At this time I don't think it's worth making it better. But to make this production ready we should see if we can find another way of sending data to our pod through the event-listener, or we should handle the bunch of inputs that can create issues.
main.go
Outdated
```go
bodyReq := os.Getenv("BODY")
if bodyReq == "" {
	panic("Need to get environment variable BODY")
```
Personal preference: never panic; print the error to stderr and `os.Exit(1)`.
I would put most of the code in a `func run() error {}`, return early on this, and then in main do something like:

```go
func main() {
	err := run()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```
`panic` and `os.Exit` are different things. Read e.g. here: https://stackoverflow.com/a/28473339/205674. For example, `os.Exit` will not run your defers. Cleanup and unwinding may not matter a lot in a simple program like this, but if you want to be conscientious, I think this guy has the right idea: https://stackoverflow.com/a/46255965/205674
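A tiny example of the difference (the deferred print never runs when we `os.Exit`, but would run if we panicked instead):

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	// This deferred cleanup is skipped: os.Exit terminates the process
	// immediately, whereas panic unwinds the stack and runs defers first.
	defer fmt.Println("cleaning up")
	fmt.Println("something went wrong")
	os.Exit(1)
}
```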
Interestingly, Simon demonstrates proper use of `os.Exit` in his PR: https://github.com/simongottschlag/mqtt-log-stdout-v2/pull/1/files#diff-9edc738f3908a55100ae0f3f1adbb25e9b341052e68eebf7e792440bdf06ec8aR30. Here, the main function can be trivially judged not to require cleanup, since that happened at a lower level already; thus we can safely `os.Exit`.
I think keeping it short wins in this case, but it's a good point for the future.
This Go Time (podcast) episode goes into it quite well: https://changelog.com/gotime/165
main.go
Outdated
```go
if !critical {
	log.Printf("Deleting pod %s from namespace %s", podName, namespace)
	err := kubeClient.CoreV1().Pods(namespace).Delete(context.Background(), podName, metaV1.DeleteOptions{})
```
Should we force delete? Or maybe make it configurable.
Should the context have a timeout?
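For reference, a force delete would roughly mean dropping the grace period in the delete options. A sketch only, since the PR keeps the default graceful delete:

```go
// Force delete: skip the grace period so the pod object is removed immediately.
gracePeriod := int64(0)
err := kubeClient.CoreV1().Pods(namespace).Delete(
	context.Background(),
	podName,
	metaV1.DeleteOptions{GracePeriodSeconds: &gracePeriod},
)
```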
Since the use case is terminating suspicious pods, it may be that there should be some form of ops notification in case a pod resists arrest? (Would be new PR, tho.)
Hmm, about force delete: I will skip it to keep things shorter.
About the context, it's a good idea. I just don't know the Go k8s client well enough.
I assume that even if the context times out and the application stops waiting, the k8s API will continue to try to delete the pod, just like it would if you ran a kubectl delete command.
If so, it might not be worth adding it, since the call has gone through.
My initial thought is to do something like this. Is 5 seconds too long?

```go
ctx, cancel := context.WithTimeout(context.Background(), time.Duration(time.Second*5))
defer cancel()
err := kubeClient.CoreV1().Pods(namespace).Delete(ctx, podName, metaV1.DeleteOptions{})
```
Well, deleting pods is about changing the ideal state and letting Kubernetes try to reconcile that state. I think it makes sense that this process waits a bit to see if the delete is successful, but we do not want to keep trying forever or this will itself be a vector for DoS. Probably, the process should wait a bit, log whether it succeeded or timed out and move on.
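Putting the two ideas together, a hedged sketch of "wait a bit, log the outcome, move on" (the deadline check may need adjusting depending on how client-go wraps the timeout error; this also needs the standard `errors` and `time` imports):

```go
// Give the API call a bounded amount of time; the deletion itself is
// reconciled asynchronously by Kubernetes either way.
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

err := kubeClient.CoreV1().Pods(namespace).Delete(ctx, podName, metaV1.DeleteOptions{})
switch {
case err == nil:
	log.Printf("Deleted pod %s from namespace %s", podName, namespace)
case errors.Is(err, context.DeadlineExceeded):
	log.Printf("Timed out deleting pod %s from namespace %s, moving on", podName, namespace)
default:
	log.Printf("Unable to delete pod %s from namespace %s: %v", podName, namespace, err)
}
```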
Update naming overall
It's a less general name
Some comments for the readme
README.md
Outdated
This was a rather simple example on how we can use the power of tekton together with Falco to protect us from bad actors that is trying to take over pods in our cluster.

As noted during this post there are allot of potential improvements before this is production ready, for example:
a lot
as an example
Noticed that Golang `1.15` was used to build
* Also remove an unused exit(1)
```go
criticalNamespaces := map[string]bool{
	"kube-system":     true,
	"kube-public":     true,
	"kube-node-lease": true,
	"falco":           true,
}
```
I would actually fail early here if it was a critical namespace and skip passing it to the deletePod() function.
Can't make both you and @bittrance happy on this one :)
He asked me earlier to break out the delete and the if statement into a separate function. At least if I understood him correctly.
The question is: should we test to see if we are in k8s first, or should we check if we aren't in a protected namespace first?
I think a process should generally ascertain its ability to function properly before starting any work. Consider a scenario where it is difficult for me to provoke a Falco event that results in a dead pod. In that scenario, if we are taking decisions (i.e. early return on protected namespaces) before we have done all the setup, it may take a long time before we discover that we have bad credentials. This is basically the fail-fast principle applied to software design.
This was what I meant: #1 (comment)
I was thinking that this could be done earlier, but making sure the credentials work first seems like a good idea. 👍
```go
if criticalNamespaces[namespace] {
	log.Printf("The pod %v won't be deleted due to it's part of the critical ns list: %v ", podName, namespace)
	return nil
}
```
My previous review comment may have been confusing, but I think the namespace check should go "around" this method rather than inside it.
Okay, so if it's not present, call the function to kill the pod?
Yes, something like that. In main(), I think I would do something like:

```go
if criticalNamespaces[namespace] {
	log.Printf("The pod %v won't be deleted due to it's part of the critical ns list: %v ", podName, namespace)
	return
}
err = deletePod(kubeClient, falcoEvent, criticalNamespaces)
if err != nil {
	log.Fatalf("Unable to delete pod due to err %v", err)
}
```
```go
	return nil
}

log.Printf("Deleting pod %s from namespace %s", podName, namespace)
```
I have not yet published my magnum opus on logging statements, but I prefer one "info" level log entry per "request". This method has three. Also, logging should in my opinion be written with the assumption that the code works (you use tests to verify that the code works), which means that the request logging goes after the operation, documenting that it happened. Consider:

```go
log.Printf("Do stuff")
doStuff()
```

vs

```go
doStuff()
log.Printf("Did stuff")
```

The `doStuff()` function is almost guaranteed to be more complex than the logging. Thus, if you see "Did stuff" in your log, you are a lot more sure that stuff actually happened than if you see "Do stuff". The former style is usually a product of a developer trying to find out if the code works and should not go into production - presumably you found out that it works, or you would not put the code in production.
Sometimes "ingress" logging is warranted, but it should generally be at debug or trace level.