This repository has been archived by the owner on Jul 27, 2023. It is now read-only.

Merge pull request #2 from zhangchi0104/main
Added Docker image configuration
MistEO authored Dec 16, 2022
2 parents 40a620c + 8a34520 commit 3db3fcb
Showing 2 changed files with 36 additions and 0 deletions.
15 changes: 15 additions & 0 deletions Vision/OCR/Dockerfile
@@ -0,0 +1,15 @@
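# VERSION is declared before FROM so it can be used in the base image tag.
ARG VERSION=22.05
FROM nvcr.io/nvidia/paddlepaddle:${VERSION}-py3
# ARGs declared before FROM are not visible after it, so these are declared here for the COPY steps.
ARG OCR_LANG=zh_CN
ARG PRETRAINED_MODEL=ch_PP-OCRv3_rec_distillation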

RUN git clone https://gitee.com/paddlepaddle/PaddleOCR.git /PaddleOCR --depth=1 && \
cd /PaddleOCR && \
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple && \
pip install opencv-python-headless -i https://pypi.tuna.tsinghua.edu.cn/simple

COPY output/$OCR_LANG /workspace/output/$OCR_LANG
COPY output/render/$OCR_LANG /workspace/output/render/$OCR_LANG
COPY pretrained_model/$PRETRAINED_MODEL /workspace/pretrained_model/$PRETRAINED_MODEL
COPY *.yml /workspace/
WORKDIR /workspace
21 changes: 21 additions & 0 deletions Vision/OCR/README.md
@@ -70,6 +70,27 @@

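This is only a rough outline; everything follows the standard PaddleOCR workflow. For detailed parameters and more, refer to the PaddleOCR documentation.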

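## Training (Docker)
If you happen to have nvidia-docker and don't want to fiddle with environment setup, you can try Docker. This tutorial assumes you already know the common basics.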
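0. Dependencies
- `docker` and `nvidia-docker`. For installation steps, see the [NVIDIA documentation](https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html)
- A matching CUDA version. The default version in the Dockerfile provided by this repository supports NVIDIA driver versions >= 515 (CUDA >= 11.7)
1. Build the image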
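```bash
# All --build-arg flags below are optional.
# VERSION: NVIDIA image version, default 22.05; see the link above for available versions
# OCR_LANG: training dataset language; Docker copies that dataset into the image. Default zh_CN; one of zh_CN | ja_JP | zh_TW | en_US
# PRETRAINED_MODEL: name of the pretrained model weights; defaults to the Simplified Chinese knowledge-distillation model
docker build -t maa_train \
    --build-arg VERSION=22.05 \
    --build-arg OCR_LANG=zh_CN \
    --build-arg PRETRAINED_MODEL=ch_PP-OCRv3_rec_distillation .
```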
2. Run the image
```bash
# If startup fails, try removing --ulimit memlock=-1 or running with sudo
docker run --gpus all --shm-size=1g --ulimit memlock=-1 -it maa_train /bin/bash
```
Once inside the container, replace the PaddleOCR path from step 6 with `../PaddleOCR`, i.e.:
```
python ../PaddleOCR/tools/train.py -c ch_PP-OCRv3_rec_distillation.yml
```
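
The build and run steps take several optional values, so it can help to assemble the commands in one place and inspect them before running anything. The following is a minimal dry-run sketch, not part of this repository: the function names `maa_build_cmd` and `maa_run_cmd` and the `IMAGE_TAG` variable are hypothetical, and the defaults are copied from the Dockerfile's `ARG` lines. It only prints the commands; it does not invoke Docker.

```bash
#!/usr/bin/env bash
# Dry-run helper (hypothetical): prints the docker commands instead of running them.
set -eu

# Defaults mirror the ARG defaults in the Dockerfile; override via environment variables.
VERSION="${VERSION:-22.05}"
OCR_LANG="${OCR_LANG:-zh_CN}"
PRETRAINED_MODEL="${PRETRAINED_MODEL:-ch_PP-OCRv3_rec_distillation}"
IMAGE_TAG="${IMAGE_TAG:-maa_train}"  # hypothetical variable; the tag name comes from the build step

# Print the build command for the current settings (build context is the current directory).
maa_build_cmd() {
    echo "docker build -t ${IMAGE_TAG}" \
         "--build-arg VERSION=${VERSION}" \
         "--build-arg OCR_LANG=${OCR_LANG}" \
         "--build-arg PRETRAINED_MODEL=${PRETRAINED_MODEL} ."
}

# Print the run command; note that the `run` subcommand precedes its options.
maa_run_cmd() {
    echo "docker run --gpus all --shm-size=1g --ulimit memlock=-1 -it ${IMAGE_TAG} /bin/bash"
}

maa_build_cmd
maa_run_cmd
```

For example, saving this as a script and calling it with `OCR_LANG=ja_JP` set in the environment would print a build command for the Japanese dataset, which can then be reviewed and pasted into a shell.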
## Open-source libraries

- [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR): Awesome multilingual OCR toolkits based on PaddlePaddle