Commit 892f522a authored by 赵小蒙

update

parent 26e19fd2
@@ -208,14 +208,14 @@ def line_to_standard_format(line):
 def ocr_mk_mm_standard_format(pdf_info_dict: dict):
-    '''
+    """
     content_list
     type string image/text/table/equation (display equations are extracted as separate items; inline equations are merged into text)
     latex string LaTeX text field.
     text string Text data in plain-text format.
     md string Text data in Markdown format.
     img_path string s3://full/path/to/img.jpg
-    '''
+    """
     content_list = []
     for _, page_info in pdf_info_dict.items():
         blocks = page_info.get("preproc_blocks")
......
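The docstring in the hunk above lists the fields of a content_list item (type, latex, text, md, img_path) but the rest of the function body is elided. As a rough illustration only, the sketch below shows one way such items might look. The `ContentItem` class, the helper names, and the sample values are all hypothetical and are not taken from the repository; only the field names and the sample `img_path` come from the docstring.

```python
from typing import List, TypedDict


class ContentItem(TypedDict, total=False):
    """Hypothetical item shape, inferred only from the docstring fields."""
    type: str      # one of "image", "text", "table", "equation"
    latex: str     # LaTeX text field
    text: str      # plain-text data
    md: str        # Markdown data
    img_path: str  # e.g. s3://full/path/to/img.jpg


def make_example_content_list() -> List[ContentItem]:
    """Build a small, made-up content list for illustration."""
    return [
        # An inline equation stays merged into its text item.
        {
            "type": "text",
            "text": "Energy is given by E = mc^2.",
            "md": "Energy is given by $E = mc^2$.",
        },
        # A display equation is kept as its own item.
        {"type": "equation", "latex": "E = mc^2"},
        {"type": "image", "img_path": "s3://full/path/to/img.jpg"},
    ]


def items_of_type(content_list: List[ContentItem], kind: str) -> List[ContentItem]:
    """Return the items whose "type" field equals `kind`."""
    return [item for item in content_list if item.get("type") == kind]
```

Using `total=False` keeps every field optional, which matches the assumption here that an item only carries the fields relevant to its type.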