Commit c98df758 authored by xu rui

feat: add static images

parent 338c6814
@@ -16,4 +16,11 @@ Changelog
process, added table recognition functionality
- 2024/08/01: Version 0.6.2b1 released, optimized dependency conflict
issues and installation documentation
- 2024/07/05: Initial open-source release
\ No newline at end of file
.. warning::
fix ``localized deployment version`` and ``front-end interface``
@@ -6,4 +6,6 @@ Glossary
1. jsonl
TODO: add description
2. magic-pdf.json
TODO: add description
@@ -9,5 +9,5 @@ Eager to get started? This page gives a good introduction to MinerU. Follow Inst
:maxdepth: 1
quick_start/command_line
-quick_start/extract_text
+quick_start/to_markdown
@@ -55,6 +55,5 @@ directory. The output file list is as follows:
├── some_pdf_spans.pdf # smallest granularity bbox position information diagram
└── some_pdf_content_list.json # Rich text JSON arranged in reading order
-For more information about the output files, please refer to the `Output
-File Description <docs/output_file_en_us.md>`__.
+For more information about the output files, please refer to the :doc:`../tutorial/output_file_description`
Convert To Markdown
========================
.. code:: python

    import os

    from magic_pdf.data.data_reader_writer import FileBasedDataWriter, FileBasedDataReader
    from magic_pdf.libs.MakeContentConfig import DropMode, MakeMode
    from magic_pdf.pipe.OCRPipe import OCRPipe

    ## args
    model_list = []
    pdf_file_name = "abc.pdf"  # replace with the real pdf path

    ## prepare env
    local_image_dir, local_md_dir = "output/images", "output"
    os.makedirs(local_image_dir, exist_ok=True)

    # writers for the extracted images and for the Markdown output
    image_writer, md_writer = FileBasedDataWriter(local_image_dir), FileBasedDataWriter(
        local_md_dir
    )
    image_dir = str(os.path.basename(local_image_dir))

    reader1 = FileBasedDataReader("")
    pdf_bytes = reader1.read(pdf_file_name)  # read the pdf content

    ## run the OCR pipeline: classify, analyze, parse
    pipe = OCRPipe(pdf_bytes, model_list, image_writer)
    pipe.pipe_classify()
    pipe.pipe_analyze()
    pipe.pipe_parse()

    pdf_info = pipe.pdf_mid_data["pdf_info"]

    ## render Markdown; image links are written relative to image_dir
    md_content = pipe.pipe_mk_markdown(
        image_dir, drop_mode=DropMode.NONE, md_make_mode=MakeMode.MM_MD
    )

    # the result may be a list of strings or a single string
    if isinstance(md_content, list):
        md_writer.write_string(f"{pdf_file_name}.md", "\n".join(md_content))
    else:
        md_writer.write_string(f"{pdf_file_name}.md", md_content)

See :doc:`../data/data_reader_writer` for more reader and writer examples.
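
The final step of the example handles a return value that may be either a list of strings or a single string, and it names the output after the full PDF file name (so ``abc.pdf`` becomes ``abc.pdf.md``). A minimal sketch of how that step could be factored into two small helpers follows. Both ``to_markdown_text`` and ``markdown_file_name`` are hypothetical names introduced here for illustration and are not part of ``magic_pdf``; dropping the ``.pdf`` suffix is likewise an assumption about the desired naming, not something this commit does.

```python
import os
from typing import List, Union


def to_markdown_text(md_content: Union[str, List[str]]) -> str:
    """Return the Markdown as one string.

    Hypothetical helper: joins list items with newlines, mirroring the
    isinstance check in the example above; strings pass through unchanged.
    """
    if isinstance(md_content, list):
        return "\n".join(md_content)
    return md_content


def markdown_file_name(pdf_path: str) -> str:
    """Derive an output name such as 'abc.md' from 'docs/abc.pdf'.

    Hypothetical naming choice: the original example writes
    f"{pdf_file_name}.md" instead, which keeps the '.pdf' suffix.
    """
    stem, _ext = os.path.splitext(os.path.basename(pdf_path))
    return f"{stem}.md"
```

With these helpers, the last four lines of the example could be reduced to a single ``md_writer.write_string(markdown_file_name(pdf_file_name), to_markdown_text(md_content))`` call, assuming ``write_string`` behaves as it does in the example.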
Tutorial
-----------
+===========
This tutorial shows, from start to finish, how to use MinerU in a minimal project.
.. toctree::
:maxdepth: 1
tutorial/output_file_description
\ No newline at end of file