Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dtype changes #481

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

Conversation

bavithra-devi
Copy link

Related Issue

Checklist

  • I have read the CONTRIBUTING guidelines.
  • I have added tests to cover my changes.
  • I have updated the documentation (docs folder) accordingly.

Additional Notes

Add any other context about the PR here.

kartik-ganesh and others added 3 commits September 10, 2024 12:16
* implement logging

* add EOF
initialise content in log_handler

* use poetry lock to update

* fix poetry missing dependencies

* fix dependencies

* reformatted

* fix p
fix duration
add span_id

* add temp act

* log act

* add noqa

* reformat

* add ignore

* add d only in response and delight

* use td from thread

* add to delight log

* run black

* null as default values

* remove header from delight
* Adding security workflow for onboarding WASP

* Updated Hardcoded FR Project key as a variable for easy migration
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Summary

This PR adds OpenTelemetry instrumentation, structured logging, and dtype configuration to support the nomic-ai/nomic-embed-text-v1.5 model while enhancing observability.

  • Added new security workflow /.github/workflows/security.yml implementing Wasp (Semgrep) SAST scanning for code safety
  • Introduced structured logging in /libs/infinity_emb/infinity_emb/log_handler.py with OpenTelemetry integration for distributed tracing
  • Modified /libs/infinity_emb/Dockerfile to support configurable dtype and base CUDA image via build arguments
  • Added /libs/infinity_emb/environment_config.sh for OpenTelemetry configuration and service startup
  • Added einops dependency in pyproject.toml to support nomic-ai/nomic-embed-text-v1.5 model

7 file(s) reviewed, 10 comment(s)
Edit PR Review Bot Settings | Greptile

Comment on lines +4 to +6
pull_request_target:
branches:
- main
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: pull_request_target is a security risk as it runs with repository secrets on external PR code. Consider using pull_request instead if repository secrets are not needed, or add explicit ref validation

group: security-lrg
steps:
- name: Setting permission
run: sudo chown runner:runner -R .*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: The wildcard pattern .* in chown is overly broad and could affect hidden files. Use a more specific path like '.'

uses: actions/checkout@v4

- name: Running Wasp scan
uses: freshactions/wasp@latest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Using @latest tag for actions can be dangerous. Pin to a specific version for security and stability

@@ -6,6 +6,10 @@ __pycache__/
# C extensions
*.so

# Pycharn
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

syntax: 'Pycharn' is misspelled, should be 'PyCharm'

@@ -6,6 +6,10 @@ __pycache__/
# C extensions
*.so

# Pycharn
.idea
models/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Ignoring 'models/' directory may prevent tracking of important model files that should be version controlled. Consider being more specific about which model files to ignore

#export OTEL_TRACES_SAMPLER=parentbased_always_off
export OTEL_RESOURCE_ATTRIBUTES=service.name=${SHERLOCK_SERVICE_NAME},host.name=${POD_NAME},host.ip=${POD_IP}
export OTEL_EXPORTER_OTLP_ENDPOINT=http://${HOST_IP}:5680
infinity_emb v2 --model-id /models --dtype ${DTYPE}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: The hardcoded /models path assumes a specific directory structure. Consider making this configurable via environment variable

@@ -129,6 +133,7 @@ async def validate_token(

instrumentator = Instrumentator().instrument(app)
app.add_exception_handler(errors.OpenAIException, errors.openai_exception_handler)
app.add_middleware(StructuredLoggingMiddleware)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: StructuredLoggingMiddleware should be added before other middleware like CORSMiddleware to ensure all requests are properly logged

Comment on lines +108 to +109
extra_index = 9 # obtained from function signature, see above comment.
extra = args[extra_index]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Hardcoding the extra_index is fragile and could break with Python/logging updates. Consider using inspect module to find index dynamically

Comment on lines +94 to +100
def nice_format(data):
if type(data) in (int, str, float):
return data
try:
return json.dumps(data, ensure_ascii=False)
except Exception:
return str(data)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Using type() instead of isinstance() for type checking is not recommended as it breaks inheritance

Comment on lines +43 to +44
opentelemetry-instrumentation-logging = "^0.48b0"
opentelemetry-sdk = "^1.27.0"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: OpenTelemetry dependencies are added as required but should likely be optional since they are only needed for logging instrumentation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants