Commit e3ef9b20
authored Jul 17, 2024 by myhloli
docs(readme): add link to FAQ for common issue resolution
parent 6d65855c
Showing 2 changed files with 32 additions and 2 deletions
README_zh-CN.md (+7 -2)
docs/FAQ_zh_cn.md (+25 -0)
README_zh-CN.md
@@ -168,7 +168,7 @@ magic-pdf --help
```python
 image_writer = DiskReaderWriter(local_image_dir)
 image_dir = str(os.path.basename(local_image_dir))
-jso_useful_key = {"_pdf_type": "", "model_list": []}
+jso_useful_key = {"_pdf_type": "", "model_list": model_json}
 pipe = UNIPipe(pdf_bytes, jso_useful_key, image_writer)
 pipe.pipe_classify()
 pipe.pipe_parse()
```
@@ -181,7 +181,7 @@ s3pdf_cli = S3ReaderWriter(pdf_ak, pdf_sk, pdf_endpoint)
 image_dir = "s3://img_bucket/"
 s3image_cli = S3ReaderWriter(img_ak, img_sk, img_endpoint, parent_path=image_dir)
 pdf_bytes = s3pdf_cli.read(s3_pdf_path, mode=s3pdf_cli.MODE_BIN)
-jso_useful_key = {"_pdf_type": "", "model_list": []}
+jso_useful_key = {"_pdf_type": "", "model_list": model_json}
 pipe = UNIPipe(pdf_bytes, jso_useful_key, s3image_cli)
 pipe.pipe_classify()
 pipe.pipe_parse()
@@ -191,6 +191,11 @@ md_content = pipe.pipe_mk_markdown(image_dir, drop_mode="none")
 For the detailed implementation, see [demo.py](demo/demo.py)
+### Common Issues and Solutions
+See the [FAQ](docs/FAQ_zh_cn.md)
 # Magic-Doc
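The README change above replaces the empty `model_list` with a `model_json` variable, but the hunks shown do not say where that variable comes from. The following is a minimal sketch, assuming the model output is stored as a JSON array in a file; `build_jso_useful_key` and its `model_json_path` parameter are hypothetical names for illustration, not part of magic-pdf.

```python
import json
from pathlib import Path


def build_jso_useful_key(model_json_path=None):
    """Build the dict passed to UNIPipe (hypothetical helper, not a magic-pdf API)."""
    # An empty list is the value the README used before this commit.
    model_json = []
    if model_json_path is not None:
        # Assumption: the file holds a JSON array of model results.
        text = Path(model_json_path).read_text(encoding="utf-8")
        model_json = json.loads(text)
        if not isinstance(model_json, list):
            raise ValueError("expected a JSON array of model results")
    return {"_pdf_type": "", "model_list": model_json}
```

The returned dict has the same two keys as `jso_useful_key` in the README snippet, so it could be passed to `UNIPipe` in the same position.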
docs/FAQ_zh_cn.md
0 → 100644
# Frequently Asked Questions
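##### 1. The first run of an offline deployment fails with `urllib.error.URLError: <urlopen error [Errno 101] Network is unreachable>`
The first run downloads a small language-detection model from the internet. For an offline deployment, download the model manually and place it in the required directory.
See: https://github.com/opendatalab/MinerU/issues/121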
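The failure mode described in this entry can be illustrated with a short sketch: check for a manually placed file first, and turn the `URLError` into a message that points at the offline workaround. `ensure_model_file`, its parameters, and the injectable `opener` are hypothetical and only illustrate the pattern; the actual model URL and target directory are described in the linked issue, not here.

```python
import urllib.error
import urllib.request
from pathlib import Path


def ensure_model_file(url, dest, opener=urllib.request.urlopen):
    """Return dest if it exists; otherwise try to download it (hypothetical helper)."""
    dest = Path(dest)
    if dest.exists():
        # Offline deployments: the file was copied here by hand.
        return dest
    try:
        with opener(url) as response:
            data = response.read()
    except urllib.error.URLError as exc:
        # Raised, for example, when the network is unreachable.
        raise RuntimeError(
            f"Cannot download {url} ({exc.reason}); "
            f"copy the model to {dest} manually for offline use"
        ) from exc
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(data)
    return dest
```

Checking for the file before touching the network means a machine with the model already in place never attempts a download.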
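##### 2. On newer versions of macOS, `pip install magic-pdf[full-cpu]` fails with `zsh: no matches found: magic-pdf[full-cpu]`
On macOS the default shell changed from Bash to Z shell (zsh). zsh treats the square brackets as a filename-matching (glob) pattern and, when no file matches, aborts the command with a "no matches found" error.
Turn off zsh's `nomatch` option so that unmatched patterns are passed to the command unchanged, then run the install command again:
```bash
setopt no_nomatch
pip install magic-pdf[full-cpu]
```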
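When the install is driven from a script rather than typed into zsh, another option is to avoid the shell entirely. The sketch below builds the pip command as an argument list, which is handed to the operating system without any glob expansion; `pip_install_command` is a hypothetical helper name.

```python
import shlex
import sys


def pip_install_command(requirement):
    """Argument list for installing one requirement with the current interpreter."""
    # A list is passed to the OS as-is; no shell ever expands the brackets.
    return [sys.executable, "-m", "pip", "install", requirement]


requirement = "magic-pdf[full-cpu]"
# shlex.quote shows the form that is safe to paste into a shell.
print(shlex.quote(requirement))
# To actually run the install:
#   import subprocess
#   subprocess.run(pip_install_command(requirement), check=True)
```

In an interactive zsh session, quoting the requirement, as in `pip install 'magic-pdf[full-cpu]'`, has the same effect without changing any shell option.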
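##### 3. On a Mac with an Intel CPU, installing the latest magic-pdf[full-cpu] (>=0.6.1) fails
The full-feature package depends on the formula-parsing library unimernet, which requires PyTorch 2.3.0 or later. PyTorch does not publish official prebuilt 2.3.0 packages for macOS on Intel CPUs, so the dependencies cannot be resolved.
Try installing an older version of unimernet first, then install the remaining dependencies of the full-feature package. (To avoid dependency conflicts, activate a fresh virtual environment.)
```bash
pip install magic-pdf
pip install unimernet==0.1.0
pip install matplotlib ultralytics paddleocr==2.7.3 paddlepaddle
```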
\ No newline at end of file
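An install script could decide whether the workaround in FAQ entry 3 applies by inspecting the platform. This is a minimal sketch using the standard `platform` module; `is_intel_mac` is a hypothetical helper, and the check assumes that an Intel Mac reports `Darwin` and `x86_64`.

```python
import platform


def is_intel_mac(system=None, machine=None):
    """True when running on macOS with an x86_64 CPU (hypothetical helper)."""
    # Parameters default to the current machine; pass values to test other cases.
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    return system == "Darwin" and machine == "x86_64"


if is_intel_mac():
    print("Intel Mac detected: install unimernet==0.1.0 before the other dependencies")
```

A Python interpreter running under Rosetta on Apple silicon can also report `x86_64`, so treat the result as a hint rather than a guarantee.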