Qin Kaijie / pdf-miner
Commit ac869888 (unverified)
Authored Sep 27, 2024 by Xiaomeng Zhao; committed by GitHub, Sep 27, 2024
Update README.md
Parent: 70e3083b
Showing 1 changed file with 20 additions and 7 deletions
--- a/README.md
+++ b/README.md
@@ -267,13 +267,26 @@ Usage: magic-pdf [OPTIONS]
 Options:
   -v, --version                display the version and exit
   -p, --path PATH              local pdf filepath or directory  [required]
-  -o, --output-dir TEXT        output local directory
-  -m, --method [ocr|txt|auto]  the method for parsing pdf.
-                               ocr: using ocr technique to extract information from pdf,
-                               txt: suitable for the text-based pdf only and outperform ocr,
-                               auto: automatically choose the best method for parsing pdf
-                                  from ocr and txt.
-                               without method specified, auto will be used by default.
+  -o, --output-dir PATH        output local directory  [required]
+  -m, --method [ocr|txt|auto]  the method for parsing pdf. ocr: using ocr
+                               technique to extract information from pdf. txt:
+                               suitable for the text-based pdf only and
+                               outperform ocr. auto: automatically choose the
+                               best method for parsing pdf from ocr and txt.
+                               without method specified, auto will be used by
+                               default.
+  -l, --lang TEXT              Input the languages in the pdf (if known) to
+                               improve OCR accuracy.  Optional. You should
+                               input "Abbreviation" with language form url: ht
+                               tps://paddlepaddle.github.io/PaddleOCR/en/ppocr
+                               /blog/multi_languages.html#5-support-languages-
+                               and-abbreviations
+  -d, --debug BOOLEAN          Enables detailed debugging information during
+                               the execution of the CLI commands.
+  -s, --start INTEGER          The starting page for PDF parsing, beginning
+                               from 0.
+  -e, --end INTEGER            The ending page for PDF parsing, beginning from
+                               0.
   --help                       Show this message and exit.
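To make the updated options concrete, here is a minimal Python sketch that assembles a `magic-pdf` command line from the flags listed in the new help text. The helper name `build_magic_pdf_args`, the file names, and the `en` language abbreviation are illustrative assumptions, not part of the project. The sketch only builds the argument list and does not run the tool. It leaves out `--debug` because the help text does not say which literal values its BOOLEAN argument accepts.

```python
import shlex
from typing import List, Optional

# Values accepted by -m/--method according to the help text.
VALID_METHODS = ("ocr", "txt", "auto")


def build_magic_pdf_args(
    path: str,
    output_dir: str,
    method: str = "auto",
    lang: Optional[str] = None,
    start: Optional[int] = None,
    end: Optional[int] = None,
) -> List[str]:
    """Hypothetical helper: build an argv list for the magic-pdf CLI."""
    if method not in VALID_METHODS:
        raise ValueError(f"method must be one of {VALID_METHODS}, got {method!r}")
    # Page indices begin from 0 according to the help text, so reject negatives.
    for name, value in (("start", start), ("end", end)):
        if value is not None and value < 0:
            raise ValueError(f"{name} must be >= 0, got {value}")
    if start is not None and end is not None and end < start:
        raise ValueError("end must not be smaller than start")

    # -p and -o are both marked [required] after this commit.
    args = ["magic-pdf", "-p", path, "-o", output_dir, "-m", method]
    if lang is not None:
        args += ["-l", lang]  # abbreviation from the PaddleOCR language list
    if start is not None:
        args += ["-s", str(start)]
    if end is not None:
        args += ["-e", str(end)]
    return args


if __name__ == "__main__":
    argv = build_magic_pdf_args(
        "paper.pdf", "out", method="ocr", lang="en", start=0, end=4
    )
    # Print a shell-safe rendering of the command for inspection.
    print(shlex.join(argv))
```

The resulting list can be passed to `subprocess.run(argv, check=True)` on a machine where `magic-pdf` is installed. Building a list rather than a single string avoids shell quoting problems with paths that contain spaces.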