Skip to content

Commit

Permalink
Add new Python codemods
Browse files Browse the repository at this point in the history
  • Loading branch information
drdavella committed May 16, 2024
1 parent 4e511b5 commit 757e995
Show file tree
Hide file tree
Showing 43 changed files with 1,228 additions and 36 deletions.
Original file line number Diff line number Diff line change
@@ -0,0 +1,66 @@
---
title: "Harden potentially insecure deserialization operations"
sidebar_position: 1
---

## defectdojo:python/avoid-insecure-deserialization

| Importance | Review Guidance | Requires Scanning Tool |
| ---------- | -------------------------- | ---------------------- |
| High | Merge After Cursory Review | Yes (DefectDojo) |

Use of insecure deserialization can potentially lead to arbitrary code execution. This codemod addresses this issue with two different serialization providers:

- `yaml` (via [PyYAML](https://pyyaml.org/wiki/PyYAMLDocumentation))
- `pickle` (via the standard library [pickle](https://docs.python.org/3/library/pickle.html) module)

Each is described in more detail below.

## PyYAML

The default loaders in PyYAML are not safe to use with untrusted data. They potentially make your application vulnerable to arbitrary code execution attacks. If you open a YAML file from an untrusted source, and the file is loaded with the default loader, an attacker could execute arbitrary code on your machine.

Calling `yaml.load()` without an explicit loader argument is equivalent to calling it with `Loader=yaml.Loader`, which is unsafe. This usage [has been deprecated](https://github.com/yaml/pyyaml/wiki/PyYAML-yaml.load(input\)-Deprecation) since PyYAML 5.1. This codemod will add an explicit `SafeLoader` argument to all `yaml.load()` calls that don't use an explicit loader.

The changes from this codemod look like the following:

```diff
import yaml
data = b'!!python/object/apply:subprocess.Popen \\n- ls'
- deserialized_data = yaml.load(data, yaml.Loader)
+ deserialized_data = yaml.load(data, Loader=yaml.SafeLoader)
```

## Pickle

Python's `pickle` module is notoriouly insecure. While it is very useful for serializing and deserializing Python objects, it is not safe to use `pickle` to load data from untrusted sources. This is because `pickle` can execute arbitrary code when loading data. This can be exploited by an attacker to execute arbitrary code on your system. Unlike `yaml` there is no concept of a "safe" loader in `pickle`. Therefore, it is recommended to avoid `pickle` and to use a different serialization format such as `json` or `yaml` when working with untrusted data.

However, if you must use `pickle` to load data from an untrusted source, we recommend using the open-source `fickling` library. `fickling` is a drop-in replacement for `pickle` that validates the data before loading it and checks for the possibility of code execution. This makes it much safer (although still not entirely safe) to use `pickle` to load data from untrusted sources.

This codemod replaces calls to `pickle.load` with `fickling.load` in Python code. It also adds an import statement for `fickling` if it is not already present.

The changes look like the following:

```diff
- import pickle
+ import fickling

- data = pickle.load(file)
+ data = fickling.load(file)
```

If you have feedback on this codemod, [please let us know](mailto:[email protected])!

## F.A.Q.

### Why is this codemod marked as Merge After Cursory Review?

This change is generally safe and will prevent deserialization vulnerabilities.

## Codemod Settings

N/A

## References

N/A
39 changes: 39 additions & 0 deletions docs/codemods/python/defectdojo_python_django-secure-set-cookie.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
---
title: "Use Safe Parameters in Django Response `set_cookie` Call"
sidebar_position: 1
---

## defectdojo:python/django-secure-set-cookie

| Importance | Review Guidance | Requires Scanning Tool |
| ---------- | -------------------------- | ---------------------- |
| Medium | Merge After Cursory Review | Yes (DefectDojo) |

This codemod sets the most secure parameters when Django applications call `set_cookie` on a response object. Without these parameters, your Django application cookies may be vulnerable to being intercepted and used to gain access to sensitive data.

The changes from this codemod look like this:

```diff
from django.shortcuts import render
def index(request):
resp = render(request, 'index.html')
- resp.set_cookie('custom_cookie', 'value')
+ resp.set_cookie('custom_cookie', 'value', secure=True, httponly=True, samesite='Lax')
return resp
```

If you have feedback on this codemod, [please let us know](mailto:[email protected])!

## F.A.Q.

### Why is this codemod marked as Merge After Cursory Review?

Our change provides the most secure way to create cookies in Django. However, it's possible you have configured your Django application configurations to use secure cookies. In these cases, using the default parameters for `set_cookie` is safe.

## Codemod Settings

N/A

## References

N/A
36 changes: 36 additions & 0 deletions docs/codemods/python/pixee_python_break-or-continue-out-of-loop.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
---
title: "Removed break or continue statement out of loop"
sidebar_position: 1
---

## pixee:python/break-or-continue-out-of-loop

| Importance | Review Guidance | Requires Scanning Tool |
| ---------- | ------------------ | ---------------------- |
| Low | Merge After Review | No |

Any `break` or `continue` statements that are not inside a `for` or `while` loop will result in a `SyntaxError`. This codemod will remove them.

Our changes look something like this:

```diff
def f():
print('not in a loop')
- break
```

If you have feedback on this codemod, [please let us know](mailto:[email protected])!

## F.A.Q.

### Why is this codemod marked as Merge After Review?

While this change will make the code consistent, it is likely that the `break` or `continue` statement is a symptom of an error in program logic.

## Codemod Settings

N/A

## References

- [https://pylint.readthedocs.io/en/stable/user_guide/messages/error/not-in-loop.html](https://pylint.readthedocs.io/en/stable/user_guide/messages/error/not-in-loop.html)
39 changes: 39 additions & 0 deletions docs/codemods/python/pixee_python_combine-isinstance-issubclass.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
---
title: "Simplify Boolean Expressions Using `isinstance` and `issubclass`"
sidebar_position: 1
---

## pixee:python/combine-isinstance-issubclass

| Importance | Review Guidance | Requires Scanning Tool |
| ---------- | -------------------- | ---------------------- |
| Low | Merge Without Review | No |

Many developers are not necessarily aware that the `isinstance` and `issubclass` builtin methods can accept a tuple of classes to match. This means that there is a lot of code that uses boolean expressions such as `isinstance(x, str) or isinstance(x, bytes)` instead of the simpler expression `isinstance(x, (str, bytes))`.

This codemod simplifies the boolean expressions where possible which leads to cleaner and more concise code.

The changes from this codemod look like this:

```diff
x = 'foo'
- if isinstance(x, str) or isinstance(x, bytes):
+ if isinstance(x, (str, bytes)):
...
```

If you have feedback on this codemod, [please let us know](mailto:[email protected])!

## F.A.Q.

### Why is this codemod marked as Merge Without Review?

Simplifying expressions involving `isinstance` or `issubclass` calls is safe.

## Codemod Settings

N/A

## References

N/A
61 changes: 61 additions & 0 deletions docs/codemods/python/pixee_python_disable-graphql-introspection.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,61 @@
---
title: "Disable GraphQL Introspection to Prevent Sensitive Data Leakage"
sidebar_position: 1
---

## pixee:python/disable-graphql-introspection

| Importance | Review Guidance | Requires Scanning Tool |
| ---------- | ------------------ | ---------------------- |
| High | Merge After Review | No |

Introspection allows a client to query the schema of a GraphQL API. Allowing introspection in production code may allow a malicious user to gather information about data types and operations for a potential attack.

Introspection is often enabled by default in GraphQL without authentication. This codemod disables introspection altogether at the view level by introducing a validation rule. The required rules may be dependent on the framework that you are using. Please check your framework documentation for specific rules for disabling introspection.

Our changes look something like this:

```diff
from graphql_server.flask import GraphQLView
from flask import Flask
from graphql import (
GraphQLSchema, GraphQLObjectType, GraphQLField, GraphQLString)
+from graphql.validation import NoSchemaIntrospectionCustomRule

schema = GraphQLSchema(
query=GraphQLObjectType(
name='RootQueryType',
fields={
'hello': GraphQLField(
GraphQLString,
resolve=lambda obj, info: 'world')
}))

app = Flask(__name__)

app.add_url_rule("/api",
view_func=GraphQLView.as_view( # Noncompliant
name="api",
schema=schema,
+ validation_rules = [NoSchemaIntrospectionCustomRule]
),
)
```

If you have feedback on this codemod, [please let us know](mailto:[email protected])!

## F.A.Q.

### Why is this codemod marked as Merge After Review?

This change may disable a feature that was left on purpose. Moreover the rule added may be framework specific.

## Codemod Settings

N/A

## References

- [https://owasp.org/Top10/A05_2021-Security_Misconfiguration/](https://owasp.org/Top10/A05_2021-Security_Misconfiguration/)
- [https://owasp.org/www-project-top-ten/2017/A3_2017-Sensitive_Data_Exposure](https://owasp.org/www-project-top-ten/2017/A3_2017-Sensitive_Data_Exposure)
- [https://owasp.org/www-project-web-security-testing-guide/v42/4-Web_Application_Security_Testing/12-API_Testing/01-Testing_GraphQL#introspection-queries](https://owasp.org/www-project-web-security-testing-guide/v42/4-Web_Application_Security_Testing/12-API_Testing/01-Testing_GraphQL#introspection-queries)
2 changes: 1 addition & 1 deletion docs/codemods/python/pixee_python_django-debug-flag-on.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ sidebar_position: 1
| ---------- | -------------------------- | ---------------------- |
| Medium | Merge After Cursory Review | No |

This codemod will flip django's `DEBUG` flag to `False` if it's `True` on the `settings.py` file within django's default directory structure.
This codemod will flip Django's `DEBUG` flag to `False` if it's `True` on the `settings.py` file within Django's default directory structure.

Having the debug flag on may result in sensitive information exposure. When an exception occurs while the `DEBUG` flag in on, it will dump metadata of your environment, including the settings module. The attacker can purposefully request a non-existing url to trigger an exception and gather information about your system.

Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,61 @@
---
title: "Ensure Django Model Classes Implement a `__str__` Method"
sidebar_position: 1
---

## pixee:python/django-model-without-dunder-str

| Importance | Review Guidance | Requires Scanning Tool |
| ---------- | ------------------ | ---------------------- |
| Low | Merge After Review | No |

If you've ever actively developed or debugged a Django application, you may have noticed that the string representations of Django models and their instances can sometimes be hard to read or to distinguish from one another. Loading models in the interactive Django console or viewing them in the admin interface can be puzzling. This is because the default string representation of Django models is fairly generic.

This codemod is intended to make the string representation of your model objects more human-readable. It will automatically detect all of your model's fields and display them as a descriptive string.

For example, the default string representation of the `Question` model from Django's popular Poll App tutorial looks like this:

```diff
from django.db import models

class Question(models.Model):
question_text = models.CharField(max_length=200)
pub_date = models.DateTimeField("date published")
+
+ def __str__(self):
+ model_name = self.__class__.__name__
+ fields_str = ", ".join((f"{field.name}={getattr(self, field.name)}" for field in self._meta.fields))
+ return f"{model_name}({fields_str})"
```

Without this change, the string representation of `Question` objects look like this in the interactive Django shell:

```
>>> Question.objects.all()
<QuerySet [<Question: Question object (1)>]>
```

With this codemod's addition of `__str__`, it now looks like:

```
>>> Question.objects.all()
<QuerySet [<Question: Question(id=1, question_text=What's new?, pub_date=2024-02-21 14:28:45.631782+00:00)>]>
```

You'll notice this change works great for models with only a handful of fields. We encourage you to use this codemod's change as a starting point for further customization.

If you have feedback on this codemod, [please let us know](mailto:[email protected])!

## F.A.Q.

### Why is this codemod marked as Merge After Review?

This codemod is a great starting point for models with few fields. We encourage you to write custom `__str__` methods that best suit your Django application.

## Codemod Settings

N/A

## References

- [https://docs.djangoproject.com/en/5.0/ref/models/instances/#django.db.models.Model.**str**](https://docs.djangoproject.com/en/5.0/ref/models/instances/#django.db.models.Model.__str__)
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@ sidebar_position: 1

| Importance | Review Guidance | Requires Scanning Tool |
| ---------- | -------------------- | ---------------------- |
| Medium | Merge Without Review | No |
| Low | Merge Without Review | No |

Django uses signals to notify and handle actions that happens elsewhere in the application. You can define a response to a given signal by decorating a function with the `@receiver(signal)` decorator. The order in which the decorators are declared for this function is important. If the `@receiver` decorator is not on top, any decorators before it will be ignored.
Our changes look something like this:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ sidebar_position: 1
| ---------- | -------------------------- | ---------------------- |
| Medium | Merge After Cursory Review | No |

This codemod will set django's `SESSION_COOKIE_SECURE` flag to `True` if it's `False` or missing on the `settings.py` file within django's default directory structure.
This codemod will set Django's `SESSION_COOKIE_SECURE` flag to `True` if it's `False` or missing on the `settings.py` file within Django's default directory structure.

```diff
+ SESSION_COOKIE_SECURE = True
Expand Down
2 changes: 1 addition & 1 deletion docs/codemods/python/pixee_python_fix-assert-tuple.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@ sidebar_position: 1

| Importance | Review Guidance | Requires Scanning Tool |
| ---------- | -------------------------- | ---------------------- |
| Medium | Merge After Cursory Review | No |
| Low | Merge After Cursory Review | No |

An assertion on a non-empty tuple will always evaluate to `True`. This means that `assert` statements involving non-empty tuple literals are likely unintentional and should be rewritten. This codemod rewrites the original `assert` statement by creating a new `assert` for each item in the original tuple.

Expand Down
45 changes: 45 additions & 0 deletions docs/codemods/python/pixee_python_fix-dataclass-defaults.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
---
title: "Replace `dataclass` Mutable Default Values with Call to `field`"
sidebar_position: 1
---

## pixee:python/fix-dataclass-defaults

| Importance | Review Guidance | Requires Scanning Tool |
| ---------- | -------------------- | ---------------------- |
| Low | Merge Without Review | No |

When defining a Python dataclass it is not safe to use mutable datatypes (such as `list`, `dict`, or `set`) as defaults for the attributes. This is because the defined attribute will be shared by all instances of the dataclass type. Using such a mutable default will ultimately result in a `ValueError` at runtime. This codemod updates attributes of `dataclasses.dataclass` with mutable defaults to use `dataclasses.field` instead. The [dataclass documentation](https://docs.python.org/3/library/dataclasses.html#mutable-default-values) providesmore details about why using `field(default_factory=...)` is the recommended pattern.

Our changes look something like this:

```diff
-from dataclasses import dataclass
+from dataclasses import field, dataclass

@dataclass
class Person:
name: str = ""
- phones: list = []
- friends: dict = {}
- family: set = set()
+ phones: list = field(default_factory=list)
+ friends: dict = field(default_factory=dict)
+ family: set = field(default_factory=set)
```

If you have feedback on this codemod, [please let us know](mailto:[email protected])!

## F.A.Q.

### Why is this codemod marked as Merge Without Review?

This change is safe and will prevent runtime `ValueError`.

## Codemod Settings

N/A

## References

- [https://docs.python.org/3/library/dataclasses.html#mutable-default-values](https://docs.python.org/3/library/dataclasses.html#mutable-default-values)
Loading

0 comments on commit 757e995

Please sign in to comment.