switch to micromamba #6
Conversation
Walkthrough: The changes involve modifications to several files, primarily focusing on environment management and script functionality.
Actionable comments posted: 6
🧹 Outside diff range and nitpick comments (6)

conda.yaml (1)

12-12: Consider a stricter version constraint for numpy

The current constraint `numpy<2` is quite loose and might lead to compatibility issues. Consider a more specific version range:

```diff
-- numpy<2
+- numpy>=1.24,<2
```

Dockerfile.moeflow_worker (1)

1-4: Consider version pinning and file validation improvements

While the switch to micromamba is good, consider these enhancements:

- Pin the exact micromamba version (e.g., `mambaorg/micromamba:2.0.0-debian12-slim`) to ensure build reproducibility
- Add validation that conda.yaml exists

```diff
-FROM mambaorg/micromamba:2-debian12-slim
+FROM mambaorg/micromamba:2.0.0-debian12-slim

+# Validate conda.yaml exists
+COPY ./conda.yaml ./conda.yaml
+RUN test -f ./conda.yaml || (echo "conda.yaml not found" && exit 1)
-COPY ./conda.yaml /tmp/conda.yaml
 RUN --mount=type=cache,target=/opt/conda/pkgs micromamba env create --yes --file /tmp/conda.yaml
```

Makefile (2)

3-4: Consider making CONDA_ENV configurable

The environment name is hardcoded to "mit-py311". Consider making it configurable, like CONDA_YML, to improve flexibility:

```diff
-CONDA_ENV = mit-py311
+CONDA_ENV ?= mit-py311
```

Line range hint 8-27: Complete the migration to micromamba

Several targets still use `conda run` instead of `micromamba run`. For consistency with the migration to micromamba:

```diff
 run-worker:
-	conda run -n mit-py311 --no-capture-output celery --app moeflow_worker worker --queues mit --loglevel=debug --concurrency=1
+	micromamba run -n $(CONDA_ENV) --no-capture-output celery --app moeflow_worker worker --queues mit --loglevel=debug --concurrency=1

 prepare-models:
-	conda run -n mit-py311 --no-capture-output python3 docker_prepare.py
+	micromamba run -n $(CONDA_ENV) --no-capture-output python3 docker_prepare.py
```

docker_prepare.py (2)

28-42: Refactor the main function for better maintainability

Several improvements are needed:

- Document, or make configurable, the hardcoded exclusion of "sd" from the inpainters
- Pass the continue_on_error flag to the download function
- Add validation for an empty models set

Apply this diff to improve the implementation:

```diff
+# Models that are handled differently and should be excluded
+EXCLUDED_INPAINTERS = {"sd"}
+
 async def main():
     parsed = arg_parser.parse_args()
     models: set[str] = set(filter(None, parsed.models.split(",")))
+
+    if parsed.models and not models:
+        raise ValueError("Invalid --models format. Expected comma-separated list of model names.")
+
     await download(
-        {k: v for k, v in DETECTORS.items() if not models or f"detector.{k}" in models}
+        {k: v for k, v in DETECTORS.items() if not models or f"detector.{k}" in models},
+        continue_on_error=parsed.continue_on_error,
     )
     await download(
-        {k: v for k, v in OCRS.items() if not models or f"ocr.{k}" in models}
+        {k: v for k, v in OCRS.items() if not models or f"ocr.{k}" in models},
+        continue_on_error=parsed.continue_on_error,
     )
     await download(
         {
             k: v
             for k, v in INPAINTERS.items()
-            if (not models or f"inpaint.{k}" in models) and (k not in ["sd"])
-        }
+            if (not models or f"inpaint.{k}" in models) and (k not in EXCLUDED_INPAINTERS)
+        },
+        continue_on_error=parsed.continue_on_error,
     )
```

45-46: Add signal handling for graceful shutdown

Consider adding signal handling to ensure a graceful shutdown when the script is interrupted during downloads:

```diff
+import signal
+import sys
+
+def handle_interrupt(signum, frame):
+    print("\nInterrupted. Cleaning up...")
+    sys.exit(1)
+
 if __name__ == "__main__":
+    signal.signal(signal.SIGINT, handle_interrupt)
     asyncio.run(main())
```
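The `--models` filtering pattern that runs through `main()` can be sketched in isolation. The registry contents below are hypothetical stand-ins, not the project's real model tables:

```python
# Minimal sketch of the "--models" filtering used in docker_prepare.py's main().
# Registry keys and values here are hypothetical stand-ins.
def filter_models(registry: dict, prefix: str, selected: set) -> dict:
    """Keep entries whose qualified name "prefix.key" was selected.

    An empty selection means "keep everything" (the script's default).
    """
    return {
        k: v
        for k, v in registry.items()
        if not selected or f"{prefix}.{k}" in selected
    }


DETECTORS = {"default": object, "ctd": object}  # hypothetical
OCRS = {"48px": object, "mocr": object}         # hypothetical

# Same parsing as the script: split on commas, drop empty strings.
selected = set(filter(None, "ocr.48px,detector.ctd".split(",")))

print(sorted(filter_models(OCRS, "ocr", selected)))            # → ['48px']
print(sorted(filter_models(DETECTORS, "detector", selected)))  # → ['ctd']
print(sorted(filter_models(OCRS, "ocr", set())))               # → ['48px', 'mocr']
```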
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📒 Files selected for processing (5)

- .gitignore (1 hunks)
- Dockerfile.moeflow_worker (1 hunks)
- Makefile (2 hunks)
- conda.yaml (1 hunks)
- docker_prepare.py (1 hunks)

✅ Files skipped from review due to trivial changes (1)

- .gitignore

🔇 Additional comments (4)
conda.yaml (2)

3-5: Verify package availability with the new channel configuration
The channel order looks good with conda-forge prioritized. However, removing the defaults channel might affect package resolution.
✅ Verification successful
Channel configuration is appropriate for PyTorch ecosystem
The channel configuration is correct and sufficient:
- The PyTorch channel contains all required PyTorch-related packages (pytorch, torchvision, torchaudio, pytorch-cuda)
- The order (conda-forge → pytorch → nvidia) is optimal for package resolution
- Removing the defaults channel is safe as all core packages are available in conda-forge
- The configuration aligns with the project's dependencies, particularly the CUDA 12.1 and PyTorch 2.2.2 requirements
🏁 Scripts executed

The following scripts were executed for the analysis:

```shell
#!/bin/bash
# Description: Verify package availability in configured channels
# Expected: All packages should be found in the specified channels

# Check if all packages are available in configured channels
micromamba search -c conda-forge -c pytorch -c nvidia \
  python==3.11 pytorch==2.2.2 torchvision==0.17.2 \
  torchaudio==2.2.2 pytorch-cuda=12.1 numpy \
  --json | jq '.[] | select(.version != null) | {name, version, channel}'
```

Length of output: 295
```shell
#!/bin/bash
# Try conda search instead, since micromamba isn't available.
# First check if conda is available
which conda || echo "conda not available"

# If neither conda nor micromamba is available, verify package availability
# through direct inspection of conda-forge and pytorch channel metadata
curl -s "https://conda.anaconda.org/conda-forge/channeldata.json" | jq '.packages | with_entries(select(.key | test("^(python|pytorch|torchvision|torchaudio|pytorch-cuda|numpy)$"))) | keys'
curl -s "https://conda.anaconda.org/pytorch/channeldata.json" | jq '.packages | with_entries(select(.key | test("^(pytorch|torchvision|torchaudio|pytorch-cuda)$"))) | keys'

# Also check if there are any other conda config files that might affect channels
fd -e yaml -e yml . | xargs grep -l "channel\|conda-forge\|pytorch\|nvidia"
```

Length of output: 779
7-12: Verify PyTorch and CUDA version compatibility

The PyTorch ecosystem versions (2.2.2) and CUDA (12.1) need verification for compatibility.
Makefile (1)

29-31: Verify the hybrid environment setup and dependency isolation

The target creates a Python venv that uses packages from the Conda environment (`--system-site-packages`). This hybrid approach might lead to:

- Dependency conflicts between Conda and venv packages
- Unclear package precedence
- Difficulty reproducing environments
✅ Verification successful

Let me gather more information about the venv setup and related targets.

Let me check the content of these dependency files to better understand the package management setup.

Hybrid environment setup is intentional and correctly configured

The hybrid setup is actually well-designed for this project's needs:

- The Conda environment (conda.yaml) manages core dependencies that benefit from conda-forge builds: PyTorch, CUDA, and the specific numpy version constraint
- Regular pip requirements handle the broader set of Python packages
- The `--system-site-packages` flag is necessary here to make conda-installed PyTorch accessible to the venv

The setup avoids dependency conflicts because:

- Core packages (the PyTorch ecosystem) are strictly versioned in conda.yaml
- Other dependencies in requirements.txt are compatible with these core versions
- Development dependencies are isolated in requirements-dev.txt
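For background on the mechanism being verified here: `--system-site-packages` is recorded as a single flag in the venv's `pyvenv.cfg`, which is what lets the venv see conda-installed packages. A throwaway check (not part of the project):

```python
# Sketch: how --system-site-packages is recorded in a venv.
# Builds a throwaway venv in a temp dir and inspects its pyvenv.cfg.
import pathlib
import tempfile
import venv

with tempfile.TemporaryDirectory() as tmp:
    target = pathlib.Path(tmp) / "demo-venv"
    # Equivalent to: python3 -m venv --system-site-packages demo-venv
    venv.EnvBuilder(system_site_packages=True, with_pip=False).create(target)
    cfg = (target / "pyvenv.cfg").read_text()

flag_recorded = "include-system-site-packages = true" in cfg
print(flag_recorded)  # → True
```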
🏁 Scripts executed

The following scripts were executed for the analysis:

```shell
#!/bin/bash
# Check if the venv/.venv_created target exists and its relationship with conda-venv
rg -l "venv/.venv_created"

# Check for any other venv-related targets that might conflict
rg "venv" Makefile
```

Length of output: 276

```shell
#!/bin/bash
# Check the complete Makefile content to understand the venv setup
cat Makefile

# Look for any Python requirements or dependency files
fd -e txt -e ini -e yml -e yaml -e toml '(requirements|pyproject|setup|environment)'
```

Length of output: 1181

```shell
#!/bin/bash
# Check contents of dependency files
echo "=== requirements.txt ==="
cat requirements.txt
echo -e "\n=== requirements-dev.txt ==="
cat requirements-dev.txt
echo -e "\n=== pyproject.toml ==="
cat pyproject.toml
echo -e "\n=== conda.yaml ==="
cat conda.yaml
```

Length of output: 1832
docker_prepare.py (1)

1-46: Verify script functionality in the micromamba environment

Since this PR switches to micromamba, let's verify that all required packages are available in the new environment.

✅ Verification successful

Script is compatible with the micromamba environment

The script's dependencies are fully satisfied:

- `asyncio` is a built-in module in Python 3.11 (specified in conda.yaml)
- `argparse` is also a standard-library module
- All manga-translator-specific imports are from the local package

The script is correctly integrated into the Docker build process and should work as expected in the micromamba environment.
🏁 Scripts executed

The following scripts were executed for the analysis:

```shell
#!/bin/bash
# Description: Check that all imported packages are available in conda.yaml

# Get all Python imports from the file
IMPORTS=$(cat docker_prepare.py | grep -E '^import |^from .* import' | awk '{print $2}' | cut -d' ' -f1 | cut -d'.' -f1 | grep -v '^manga_translator$' | sort -u)

# Check conda.yaml for these packages
echo "Checking conda.yaml for required packages:"
for pkg in $IMPORTS; do
  if [ "$pkg" != "argparse" ]; then  # Skip standard library modules
    echo "Checking for $pkg..."
    rg -i "name: *$pkg|$pkg *=|$pkg *>|$pkg *<" conda.yaml || echo "Warning: $pkg not found in conda.yaml"
  fi
done
```

Length of output: 607

```shell
#!/bin/bash
# Verify that asyncio is part of the Python standard library
# and check the Python version in conda.yaml
echo "Checking Python version in conda.yaml:"
rg -i "python *[=>]" conda.yaml

echo -e "\nChecking if docker_prepare.py is used in Dockerfile:"
rg "docker_prepare.py" Dockerfile
```

Length of output: 322
```yaml
- python==3.11
- pytorch==2.2.2
- torchvision==0.17.2
- torchaudio==2.2.2
- pytorch-cuda=12.1
- numpy<2
```
🛠️ Refactor suggestion

Review the removal of previously included dependencies

The dependency list has been significantly reduced. While this aligns with the switch to micromamba for a leaner environment, please verify whether essential dependencies are missing. Consider adding these common essential packages if they're required:

```diff
 - python==3.11
 - pytorch==2.2.2
 - torchvision==0.17.2
 - torchaudio==2.2.2
 - pytorch-cuda=12.1
 - numpy<2
+- pip
+- setuptools
+- wheel
```

Committable suggestion skipped: line range outside the PR's diff.
Dockerfile.moeflow_worker (Outdated)

```dockerfile
COPY --chown=mambauser . /app
WORKDIR /app
COPY . /app
RUN --mount=type=cache,target=/root/.conda conda env update --prune --file conda.yaml
RUN conda run -n mit-py311 --no-capture-output python3 docker_prepare.py
RUN micromamba run -n mit-py311 python3 -mvenv venv
RUN venv/bin/pip -r mit-requirements.txt
```
Fix the pip install command and review the environment strategy

Critical issues and considerations:

- The pip command is missing the `install` subcommand
- Creating a venv inside a conda environment is redundant and adds complexity
- Consider using micromamba's environment directly

Fix the immediate syntax error:

```diff
-RUN venv/bin/pip -r mit-requirements.txt
+RUN venv/bin/pip install -r mit-requirements.txt
```

Consider simplifying the environment setup by using micromamba directly instead of creating a nested venv:

```diff
-RUN micromamba run -n mit-py311 python3 -mvenv venv
-RUN venv/bin/pip install -r mit-requirements.txt
+RUN micromamba run -n mit-py311 pip install -r mit-requirements.txt
```
📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test and benchmark the code to ensure it meets the requirements.

```dockerfile
COPY --chown=mambauser . /app
WORKDIR /app
RUN micromamba run -n mit-py311 pip install -r mit-requirements.txt
```
```dockerfile
# RUN apt update \
#     && DEBIAN_FRONTEND=noninteractive TZ=Etc/UTC apt-get -y install tzdata build-essential
```
💡 Codebase verification

Keep the system packages - they are required for custom compilation

The packages should not be commented out, as they are necessary:

- C++ source files (custom_ctc.cc, custom_ctc_cuda_driver.cc) in training/ocr/ require compilation
- build-essential is needed for compiling these custom CTC (Connectionist Temporal Classification) implementations
- tzdata is likely needed for proper timezone handling in the application (multiple Python files use datetime)
- These packages are not provided by conda.yaml, which only includes Python and PyTorch dependencies
- The micromamba base image (debian12-slim) is minimal and doesn't include these by default

🔗 Analysis chain

Verify the removal of system packages

The commented-out system packages (tzdata, build-essential) might still be required. Please verify:

- Are these packages now provided by the micromamba base image?
- Are they installed through conda.yaml instead?
- Are they no longer needed in the new setup?

Let me gather more information about the base image and conda environment setup.

Let me check one final thing about the base image and any potential custom compilation needs.
🏁 Scripts executed

The following scripts were executed for the analysis:

```shell
#!/bin/bash
# Check if these packages are specified in conda.yaml
echo "Checking conda.yaml for system packages..."
rg -i "tzdata|build-essential" conda.yaml

# Check if any build commands or timezone operations exist
echo "Checking for build or timezone dependencies..."
rg -l "datetime|pytz|compile|build" .
```

Length of output: 989

```shell
#!/bin/bash
# Check base image and conda environment setup
echo "Checking Dockerfile base image and conda setup..."
head -n 20 Dockerfile.moeflow_worker
echo "Checking conda.yaml dependencies..."
cat conda.yaml
echo "Checking if there are any compile/build steps in Makefile..."
cat Makefile
```

Length of output: 2020

```shell
#!/bin/bash
# Check if there are any custom compilation needs
echo "Checking for C/C++ files that might need compilation..."
fd -e cc -e cpp -e c

# Check base image details
echo "Checking if base image already includes build tools..."
docker run --rm mambaorg/micromamba:2-debian12-slim dpkg -l | grep -E "build-essential|tzdata"
```

Length of output: 499
```makefile
.conda_env_created: $(CONDA_YML)
	# setup conda environment AND env-wise deps
	micromamba env create -n $(CONDA_ENV) --yes -f $(CONDA_YML)
	touch $@
```
🛠️ Refactor suggestion

Add environment cleanup and a micromamba version check

The target creates a new Conda environment but doesn't handle existing environments. Consider:

- Adding cleanup of any existing environment
- Verifying the micromamba version for compatibility

```diff
 .conda_env_created: $(CONDA_YML)
+	micromamba --version
+	micromamba env remove -n $(CONDA_ENV) --yes || true
 	# setup conda environment AND env-wise deps
 	micromamba env create -n $(CONDA_ENV) --yes -f $(CONDA_YML)
 	touch $@
```
📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test and benchmark the code to ensure it meets the requirements.

```makefile
.conda_env_created: $(CONDA_YML)
	micromamba --version
	micromamba env remove -n $(CONDA_ENV) --yes || true
	# setup conda environment AND env-wise deps
	micromamba env create -n $(CONDA_ENV) --yes -f $(CONDA_YML)
	touch $@
```
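For context on why the target ends with `touch $@`: make decides whether to rerun a recipe by comparing the timestamps of the target and its prerequisite, so the stamp file only needs to be newer than `$(CONDA_YML)`. A small sketch of that freshness check (file names are just stand-ins):

```python
# Sketch of the stamp-file freshness check behind
# ".conda_env_created: $(CONDA_YML)" in the Makefile.
import os
import pathlib
import tempfile


def needs_rebuild(stamp, dep) -> bool:
    """True when the stamp is missing or older than its dependency."""
    if not os.path.exists(stamp):
        return True
    return os.path.getmtime(stamp) < os.path.getmtime(dep)


with tempfile.TemporaryDirectory() as tmp:
    dep = pathlib.Path(tmp, "conda.yaml")
    dep.touch()
    stamp = pathlib.Path(tmp, ".conda_env_created")

    first = needs_rebuild(stamp, dep)   # stamp missing -> rebuild needed
    stamp.touch()
    # Force the stamp's mtime past the dependency's, as "touch $@" does.
    later = os.path.getmtime(dep) + 10
    os.utime(stamp, (later, later))
    second = needs_rebuild(stamp, dep)  # stamp newer -> up to date

print(first, second)  # → True False
```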
docker_prepare.py (Outdated)
```diff
 async def download(dict):
-    for key, value in dict.items():
-        if issubclass(value, ModelWrapper):
-            print(' -- Downloading', key)
-            try:
-                inst = value()
-                await inst.download()
-            except Exception as e:
-                print('Failed to download', key, value)
-                print(e)
+    """ """
+    for key, value in dict.items():
+        if issubclass(value, ModelWrapper):
+            print(" -- Downloading", key)
+            try:
+                inst = value()
+                await inst.download()
+            except Exception as e:
+                print("Failed to download", key, value)
+                print(e)
```
🛠️ Refactor suggestion

Improve the download function implementation

The function needs several improvements:

- Missing docstring documentation
- Error handling doesn't respect the `--continue-on-error` flag
- Missing type hints (and the `dict` parameter shadows the built-in)

Apply this diff to improve the implementation:

```diff
-async def download(dict):
-    """ """
-    for key, value in dict.items():
+async def download(models_dict: dict[str, type[ModelWrapper]], continue_on_error: bool = False) -> None:
+    """Download model files for the specified model classes.
+
+    Args:
+        models_dict: Dictionary mapping model names to their ModelWrapper classes
+        continue_on_error: If True, continue downloading other models when one fails
+
+    Raises:
+        Exception: If a model download fails and continue_on_error is False
+    """
+    for key, value in models_dict.items():
         if issubclass(value, ModelWrapper):
             print(" -- Downloading", key)
             try:
                 inst = value()
                 await inst.download()
             except Exception as e:
                 print("Failed to download", key, value)
                 print(e)
+                if not continue_on_error:
+                    raise
```
📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test and benchmark the code to ensure it meets the requirements.

```python
async def download(models_dict: dict[str, type[ModelWrapper]], continue_on_error: bool = False) -> None:
    """Download model files for the specified model classes.

    Args:
        models_dict: Dictionary mapping model names to their ModelWrapper classes
        continue_on_error: If True, continue downloading other models when one fails

    Raises:
        Exception: If a model download fails and continue_on_error is False
    """
    for key, value in models_dict.items():
        if issubclass(value, ModelWrapper):
            print(" -- Downloading", key)
            try:
                inst = value()
                await inst.download()
            except Exception as e:
                print("Failed to download", key, value)
                print(e)
                if not continue_on_error:
                    raise
```
```python
arg_parser = ArgumentParser()
arg_parser.add_argument("--models", default="")
arg_parser.add_argument("--continue-on-error", action="store_true")
```
🛠️ Refactor suggestion

Enhance the argument parser implementation

The argument parser needs improvements in documentation and validation:

- The `--continue-on-error` flag is defined but never used in the code
- Both arguments lack help text explaining their purpose and format
- The `--models` argument should validate the input format

Apply this diff to improve the implementation:

```diff
 arg_parser = ArgumentParser()
-arg_parser.add_argument("--models", default="")
-arg_parser.add_argument("--continue-on-error", action="store_true")
+arg_parser.add_argument(
+    "--models",
+    default="",
+    help="Comma-separated list of models to download (format: 'detector.name,ocr.name,inpaint.name')",
+)
+arg_parser.add_argument(
+    "--continue-on-error",
+    action="store_true",
+    help="Continue downloading other models if one fails",
+)
```
📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test and benchmark the code to ensure it meets the requirements.

```python
arg_parser = ArgumentParser()
arg_parser.add_argument(
    "--models",
    default="",
    help="Comma-separated list of models to download (format: 'detector.name,ocr.name,inpaint.name')",
)
arg_parser.add_argument(
    "--continue-on-error",
    action="store_true",
    help="Continue downloading other models if one fails",
)
```
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (5)

Dockerfile.moeflow_worker (1)

11-11: Make the model list configurable via a build arg

The model list is currently hardcoded. Consider making it configurable with a Docker build argument for flexibility:

```diff
+ARG MODELS=ocr.48px,ocr.48px_ctc,ocr.32px,ocr.mocr
-RUN venv/bin/python docker_prepare.py --models ocr.48px,ocr.48px_ctc,ocr.32px,ocr.mocr
+RUN venv/bin/python docker_prepare.py --models ${MODELS}
```

Usage example:

```shell
docker build --build-arg MODELS=ocr.48px,ocr.32px -t moeflow-worker .
```

requirements-moeflow.txt (1)

1-52: General recommendations for the requirements file

- Many critical dependencies are commented out, which could lead to installation issues.
- Version pinning is inconsistent across packages.
- Consider using version ranges (e.g., `package>=1.0.0,<2.0.0`) instead of exact versions for better flexibility.
- The commented GitHub dependency for pydensecrf should be reviewed for security implications.

Recommendations:

- Document why certain packages are commented out
- Add version constraints for unpinned packages
- Consider splitting requirements into base and optional dependencies
- Add comments explaining specific version pins

docker_prepare.py (3)

9-15: Clean up unnecessary blank lines

There are multiple consecutive blank lines that can be reduced to a single blank line for better readability:

```diff
 arg_parser = ArgumentParser()
 arg_parser.add_argument("--models", default="")
 arg_parser.add_argument("--continue-on-error", action="store_true")
-
-
 cli_args = arg_parser.parse_args()
-
```

34-34: Remove commented debug code

Remove the commented debug print statement, as it adds noise to the codebase:

```diff
-    # print("parsed.models", models)
```

35-51: Refactor the repeated filtering logic and document model exclusions

The model filtering logic is repeated three times with slight variations. Additionally, the exclusion of the "sd" model should be documented. Consider refactoring to reduce duplication and improve clarity:

```diff
+def filter_models(models_dict: dict, prefix: str, *, exclude: set[str] = None) -> dict:
+    """Filter models based on command line arguments.
+
+    Args:
+        models_dict: Dictionary of available models
+        prefix: Model type prefix (e.g., 'detector', 'ocr', 'inpaint')
+        exclude: Set of model names to exclude (e.g., {'sd'} for inpainters)
+    """
+    exclude = exclude or set()
+    return {
+        k: v
+        for k, v in models_dict.items()
+        if k not in exclude and ((not models) or (f"{prefix}.{k}" in models))
+    }
+
 async def main():
     models: set[str] = set(filter(None, cli_args.models.split(",")))
-    await download(
-        {
-            k: v
-            for k, v in DETECTORS.items()
-            if (not models) or (f"detector.{k}" in models)
-        }
-    )
-    await download(
-        {k: v for k, v in OCRS.items() if (not models) or (f"ocr.{k}" in models)}
-    )
-    await download(
-        {
-            k: v
-            for k, v in INPAINTERS.items()
-            if (not models) or (f"inpaint.{k}" in models) and (k not in ["sd"])
-        }
-    )
+    # Filter and download models by type
+    await download(filter_models(DETECTORS, "detector"), continue_on_error=cli_args.continue_on_error)
+    await download(filter_models(OCRS, "ocr"), continue_on_error=cli_args.continue_on_error)
+    # Note: the 'sd' model is excluded as it requires special handling
+    await download(
+        filter_models(INPAINTERS, "inpaint", exclude={"sd"}),
+        continue_on_error=cli_args.continue_on_error,
+    )
```
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📒 Files selected for processing (4)

- Dockerfile.moeflow_worker (1 hunks)
- Makefile (2 hunks)
- docker_prepare.py (1 hunks)
- requirements-moeflow.txt (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

- Makefile

🔇 Additional comments (6)
Dockerfile.moeflow_worker (3)

1-1 (also applies to 3-4): LGTM! Efficient base image and environment setup

Good practices observed:

- Using a slim base image
- Leveraging the build cache for conda packages
- Proper environment creation from the yaml file

5-6: Uncomment the required system packages

The commented packages are essential:

- build-essential: required for compiling the custom CTC implementations
- tzdata: needed for timezone handling in the application

Apply this diff:

```diff
-# RUN apt update \
-#     && DEBIAN_FRONTEND=noninteractive TZ=Etc/UTC apt-get -y install tzdata build-essential
+RUN apt update \
+    && DEBIAN_FRONTEND=noninteractive TZ=Etc/UTC apt-get -y install tzdata build-essential
```

9-10: 🛠️ Refactor suggestion

Simplify the environment setup by removing the nested venv

Creating a venv inside a conda environment adds unnecessary complexity. Use micromamba's environment directly:

```diff
-RUN micromamba run -n mit-py311 python3 -mvenv --system-site-packages venv
-RUN venv/bin/pip install -r requirements-moeflow.txt
+RUN micromamba run -n mit-py311 pip install -r requirements-moeflow.txt
```
requirements-moeflow.txt (2)

8-11: Verify compatibility between the image-processing libraries

The combination of scikit-image, opencv-python, and shapely is common for image processing, but version compatibility should be ensured.

```shell
#!/bin/bash
# Check for any version constraints in the codebase
rg -A 1 -B 1 "scikit-image|opencv|shapely" "*.py" "*.yaml" "*.txt"
```

Consider adding version constraints to ensure compatibility and reproducibility.

23-23 (also applies to 36-36): Review pinned versions for security

Two packages have pinned versions:

- py3langid==0.2.2
- onnxruntime==1.18.1

Let's verify whether these versions are current and secure:

```shell
#!/bin/bash
# Check PyPI for latest versions and potential security advisories
for pkg in "py3langid" "onnxruntime"; do
  echo "Checking $pkg..."
  curl -s "https://pypi.org/pypi/$pkg/json" | jq -r '.info.version'
done

# Check for known vulnerabilities
gh api graphql -f query='
{
  securityVulnerabilities(first: 5, ecosystem: PIP, package: "onnxruntime") {
    nodes {
      advisory {
        summary
        severity
        publishedAt
      }
      vulnerableVersionRange
    }
  }
}'
```
docker_prepare.py (1)

54-55: LGTM!

The script entry point correctly uses `asyncio.run()` to handle the async main function.
```text
# arabic-reshaper
# pyhyphen
# langcodes
manga-ocr
```
💡 Codebase verification / 🛠️ Refactor suggestion

Pin the manga-ocr version and add missing dependencies

The analysis reveals potential compatibility concerns:

- numpy is pinned to 1.26.4, which currently satisfies manga-ocr's numpy<2 requirement but must be kept in sync with it
- Several required dependencies are missing: fire, fugashi, jaconv, loguru, pyperclip
- The Pillow>=10.0.0 dependency is not specified

Update requirements-moeflow.txt to:

- Pin manga-ocr==1.0.0 (latest stable version)
- Add the missing dependencies with compatible versions
- Ensure the numpy version satisfies both the project's needs and manga-ocr's constraint

🔗 Analysis chain

Verify manga-ocr compatibility

The manga-ocr package might have specific version requirements for its dependencies. We should verify compatibility with other packages. Consider pinning the manga-ocr version to ensure stability.
🏁 Scripts executed

The following scripts were executed for the analysis:

```shell
#!/bin/bash
# Check manga-ocr's dependencies
curl -s https://pypi.org/pypi/manga-ocr/json | jq -r '.info.requires_dist[]'
```

Length of output: 226

```shell
#!/bin/bash
# Check for potential conflicts in requirements files
echo "=== Checking requirements-moeflow.txt ==="
cat requirements-moeflow.txt | grep -E "numpy|Pillow|torch|transformers|fire|fugashi|jaconv|loguru|pyperclip"

echo -e "\n=== Checking other requirement files ==="
fd "requirements.*\.txt" --exec cat {} \; | grep -E "numpy|Pillow|torch|transformers|fire|fugashi|jaconv|loguru|pyperclip"
```

Length of output: 678
```text
# vim: ft=requirements
# networkx
# torch
# groq
# torchvision
# torch-summary
einops
```
💡 Codebase verification

Based on the search results, I can now generate the final response:

Uncomment and maintain the PyTorch dependencies

The codebase heavily relies on PyTorch and related ML libraries. Evidence found:

- Multiple model implementations using PyTorch operations (.cuda, .forward(), .backward())
- Neural network architectures in the OCR, inpainting, and detection modules
- Extensive use of CUDA and device management
- Model definitions across various components

Key files requiring these dependencies:

- manga_translator/ocr/model_*.py files
- manga_translator/inpainting/ modules
- manga_translator/detection/ modules
- manga_translator/colorization/ modules

🔗 Analysis chain

Review the core ML dependencies configuration

Several critical ML dependencies are commented out (networkx, torch, torchvision). This could lead to installation issues or version mismatches if these are required by other packages. Let's check which packages might require these dependencies.

Let me try a different search strategy to verify the dependencies: I'm looking for evidence of ML operations and model definitions that would require them, as well as any dependency specifications in build files.
🏁 Scripts executed
The following scripts were executed for the analysis:

Script:

```shell
#!/bin/bash
# Check for dependencies that might require torch
rg -l "import torch|from torch" --type py
# Check for dependencies that might require torchvision
rg -l "import torchvision|from torchvision" --type py
# Check for dependencies that might require networkx
rg -l "import networkx|from networkx" --type py
```

Length of output: 630

Script:

```shell
#!/bin/bash
# Search for ML-related imports and function calls in Python files
rg -l "\.cuda|\.to\(device\)|\.to\('cuda'\)|model\." --type py
# Search for specific ML operations that typically require torch
rg -l "\.forward\(|\.backward\(|\.train\(|\.eval\(|optimizer\." --type py
# Search for any files that might contain ML model definitions
rg -l "class.*\(.*Model\)|class.*\(.*Net\)" --type py
# Look for setup.py or pyproject.toml to check dependencies
fd "setup.py|pyproject.toml"
```

Length of output: 4821
```diff
 async def download(dict):
+    """ """
     for key, value in dict.items():
         if issubclass(value, ModelWrapper):
-            print(' -- Downloading', key)
+            print(" -- Downloading", key)
             try:
                 inst = value()
                 await inst.download()
             except Exception as e:
-                print('Failed to download', key, value)
-                print(e)
+                print("Failed to download", key, value)
+                print(e)
+                if not cli_args.continue_on_error:
+                    raise
```
🛠️ Refactor suggestion
Avoid using global variables
The function directly accesses the global cli_args
variable, which makes it harder to test and reuse. Consider passing the continue_on_error
flag as a parameter.
```diff
-async def download(dict):
+async def download(dict, *, continue_on_error: bool = False):
     """ """
     for key, value in dict.items():
         if issubclass(value, ModelWrapper):
             print(" -- Downloading", key)
             try:
                 inst = value()
                 await inst.download()
             except Exception as e:
                 print("Failed to download", key, value)
                 print(e)
-                if not cli_args.continue_on_error:
+                if not continue_on_error:
                     raise
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
async def download(dict, *, continue_on_error: bool = False):
    """ """
    for key, value in dict.items():
        if issubclass(value, ModelWrapper):
            print(" -- Downloading", key)
            try:
                inst = value()
                await inst.download()
            except Exception as e:
                print("Failed to download", key, value)
                print(e)
                if not continue_on_error:
                    raise
```
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
Dockerfile.moeflow_worker (2)
Lines 3-4: Consider making uid/gid configurable

The cache mount uses hardcoded uid/gid values (57439), which might cause permission issues in different environments. Consider making these configurable using build arguments.

```diff
+ARG MAMBA_USER_ID=57439
+ARG MAMBA_GROUP_ID=57439
-RUN --mount=type=cache,uid=57439,gid=57439,target=/opt/conda/pkgs micromamba env create --yes --file /tmp/conda.yaml
+RUN --mount=type=cache,uid=${MAMBA_USER_ID},gid=${MAMBA_GROUP_ID},target=/opt/conda/pkgs micromamba env create --yes --file /tmp/conda.yaml
```
Line 11: Make model list configurable

The model list is hardcoded in the docker_prepare.py command. Consider making it configurable using build arguments for better flexibility.

```diff
+ARG MODEL_LIST="ocr.48px,ocr.48px_ctc,ocr.32px,ocr.mocr"
-RUN venv/bin/python docker_prepare.py --models ocr.48px,ocr.48px_ctc,ocr.32px,ocr.mocr
+RUN venv/bin/python docker_prepare.py --models ${MODEL_LIST}
```
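If the build arguments above are adopted, they can be overridden at build time without editing the Dockerfile. A usage sketch, assuming the filename from this PR (the image tag `moeflow-worker` is an arbitrary example):

```shell
# Build with a reduced model set and explicit cache-owner ids
docker build -f Dockerfile.moeflow_worker \
  --build-arg MODEL_LIST=ocr.48px,ocr.mocr \
  --build-arg MAMBA_USER_ID=57439 \
  -t moeflow-worker .
```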
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
Dockerfile.moeflow_worker (1 hunks)
🔇 Additional comments (2)
Dockerfile.moeflow_worker (2)
Lines 5-6: System packages should not be commented out
The previous review comment about keeping system packages is still valid. These packages are required for:
- Custom CTC compilation (build-essential)
- Proper timezone handling (tzdata)
Lines 9-10: Simplify environment management
The previous review comment about avoiding nested environments (venv inside micromamba) is still valid. This adds unnecessary complexity to the setup.
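One way to avoid the nested venv is to let the micromamba base image activate the conda environment directly in `RUN` instructions. This sketch assumes the `MAMBA_DOCKERFILE_ACTIVATE` and `ENV_NAME` mechanisms of the `mambaorg/micromamba` image behave as documented, and that the environment declared in `conda.yaml` is named `moeflow` (an assumption — use whatever name the file actually declares):

```dockerfile
FROM mambaorg/micromamba:debian12-slim
COPY conda.yaml /tmp/conda.yaml
RUN micromamba env create --yes --file /tmp/conda.yaml && \
    micromamba clean --all --yes
# Activate the named env for every following RUN/CMD, so no venv is needed
ARG MAMBA_DOCKERFILE_ACTIVATE=1
ENV ENV_NAME=moeflow
WORKDIR /app
COPY . /app
RUN python docker_prepare.py --models ocr.48px,ocr.48px_ctc,ocr.32px,ocr.mocr
```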
Summary by CodeRabbit

Release Notes

New Features

- Added a `conda-venv` target in the Makefile.

Improvements

- Updated `moeflow_worker` to use Micromamba for environment setup.
- Refined `conda.yaml` with prioritized channels and controlled package versions.
- Updated `requirements-moeflow.txt` with new essential dependencies for enhanced functionality.

Chores

- Added a `.conda_env_created` entry to `.gitignore` for better environment management.