Unverified Commit 1fc0b76d authored by Xiaomeng Zhao's avatar Xiaomeng Zhao Committed by GitHub

build(docker): update docker build step (#471)

* build(docker): update base image to Ubuntu 22.04 and install PaddlePaddleUpgrade the Docker base image from ubuntu:latest to ubuntu:22.04 for improved
performance and stability.

Additionally, integrate PaddlePaddle GPU version 3.0.0b1
into the Docker build for enhanced AI capabilities. The MinIO configuration file has
also been updated to the latest version.

* build(dockerfile): Updated the Dockerfile

* build(Dockerfile): update Dockerfile

* docs(docker): add instructions for quick deployment with Docker

Include Docker-based deployment instructions in the README for both English and
Chinese locales. This update provides users a quick-start guide to using Docker for
deployment, with notes on GPU VRAM requirements and default acceleration features.

* build(docker): Layer the installation of dependencies, downloading the model, and the setup of the program itself.

* build(docker): Layer the installation of dependencies, downloading the model, and the setup of the program itself.
parent dd19f59e
# Use the official Ubuntu base image # Use the official Ubuntu base image
FROM ubuntu:latest FROM ubuntu:22.04
# Set environment variables to non-interactive to avoid prompts during installation # Set environment variables to non-interactive to avoid prompts during installation
ENV DEBIAN_FRONTEND=noninteractive ENV DEBIAN_FRONTEND=noninteractive
...@@ -29,17 +29,23 @@ RUN python3 -m venv /opt/mineru_venv ...@@ -29,17 +29,23 @@ RUN python3 -m venv /opt/mineru_venv
# Activate the virtual environment and install necessary Python packages # Activate the virtual environment and install necessary Python packages
RUN /bin/bash -c "source /opt/mineru_venv/bin/activate && \ RUN /bin/bash -c "source /opt/mineru_venv/bin/activate && \
pip install --upgrade pip && \ pip3 install --upgrade pip && \
pip install magic-pdf[full-cpu] detectron2 --extra-index-url https://myhloli.github.io/wheels/" wget https://gitee.com/myhloli/MinerU/raw/master/requirements-docker.txt && \
pip3 install -r requirements-docker.txt --extra-index-url https://wheels.myhloli.com -i https://pypi.tuna.tsinghua.edu.cn/simple && \
# Copy the configuration file template and set up the model directory pip3 install paddlepaddle-gpu==3.0.0b1 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/"
COPY magic-pdf.template.json /root/magic-pdf.json
# Copy the configuration file template and install magic-pdf latest
# Set the models directory in the configuration file (adjust the path as needed) RUN /bin/bash -c "wget https://gitee.com/myhloli/MinerU/raw/master/magic-pdf.template.json && \
RUN sed -i 's|/tmp/models|/opt/models|g' /root/magic-pdf.json cp magic-pdf.template.json /root/magic-pdf.json && \
source /opt/mineru_venv/bin/activate && \
# Create the models directory pip3 install magic-pdf==0.7.0b1"
RUN mkdir -p /opt/models
# Download models and update the configuration file
RUN /bin/bash -c "pip3 install modelscope && \
wget https://gitee.com/myhloli/MinerU/raw/master/docs/download_models.py && \
python3 download_models.py && \
sed -i 's|/tmp/models|/root/.cache/modelscope/hub/wanderkid/PDF-Extract-Kit/models|g' /root/magic-pdf.json && \
sed -i 's|cpu|cuda|g' /root/magic-pdf.json"
# Set the entry point to activate the virtual environment and run the command line tool # Set the entry point to activate the virtual environment and run the command line tool
ENTRYPOINT ["/bin/bash", "-c", "source /opt/mineru_venv/bin/activate && exec \"$@\"", "--"] ENTRYPOINT ["/bin/bash", "-c", "source /opt/mineru_venv/bin/activate && exec \"$@\"", "--"]
...@@ -227,6 +227,14 @@ If your device supports CUDA and meets the GPU requirements of the mainline envi ...@@ -227,6 +227,14 @@ If your device supports CUDA and meets the GPU requirements of the mainline envi
- [Ubuntu 22.04 LTS + GPU](docs/README_Ubuntu_CUDA_Acceleration_en_US.md) - [Ubuntu 22.04 LTS + GPU](docs/README_Ubuntu_CUDA_Acceleration_en_US.md)
- [Windows 10/11 + GPU](docs/README_Windows_CUDA_Acceleration_en_US.md) - [Windows 10/11 + GPU](docs/README_Windows_CUDA_Acceleration_en_US.md)
- Quick Deployment with Docker
> Docker requires a GPU with at least 16GB of VRAM, and all acceleration features are enabled by default.
```bash
wget https://github.com/opendatalab/MinerU/raw/master/Dockerfile
docker build -t mineru:0.7.0b1 .
docker run --rm -it --gpus=all mineru:0.7.0b1 /bin/bash
magic-pdf --help
```
## Usage ## Usage
......
...@@ -230,6 +230,15 @@ cp magic-pdf.template.json ~/magic-pdf.json ...@@ -230,6 +230,15 @@ cp magic-pdf.template.json ~/magic-pdf.json
- [Ubuntu22.04LTS + GPU](docs/README_Ubuntu_CUDA_Acceleration_zh_CN.md) - [Ubuntu22.04LTS + GPU](docs/README_Ubuntu_CUDA_Acceleration_zh_CN.md)
- [Windows10/11 + GPU](docs/README_Windows_CUDA_Acceleration_zh_CN.md) - [Windows10/11 + GPU](docs/README_Windows_CUDA_Acceleration_zh_CN.md)
- 使用Docker快速部署
> Docker 需设备gpu显存大于等于16GB,默认开启所有加速功能
```bash
wget https://github.com/opendatalab/MinerU/raw/master/Dockerfile
docker build -t mineru:0.7.0b1 .
docker run --rm -it --gpus=all mineru:0.7.0b1 /bin/bash
magic-pdf --help
```
## 使用 ## 使用
......
boto3>=1.28.43
Brotli>=1.1.0
click>=8.1.7
PyMuPDF>=1.24.9
loguru>=0.6.0
numpy>=1.21.6,<2.0.0
fast-langdetect==0.2.0
wordninja>=2.0.0
scikit-learn>=1.0.2
pdfminer.six==20231228
unimernet==0.1.6
matplotlib
ultralytics
paddleocr==2.7.3
paddlepaddle==3.0.0b1
pypandoc
struct-eqtable==0.1.0
detectron2
\ No newline at end of file
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment