Qin Kaijie / pdf-miner / Commits / 7e468d52
Commit 7e468d52, authored Sep 26, 2024 by dechen lin
feat: conflict
Parent: f5c431cc
Changes: 8 changed files with 317 additions and 36 deletions (+317 -36)
projects/web/.gitignore (+24 -0)
projects/web/README.md (+52 -9)
projects/web/README_zh-CN.md (+61 -0)
projects/web/src/api/extract.ts (+28 -0)
projects/web/src/components/SaveStatus.tsx (+67 -0)
...ects/web/src/pages/extract/components/md-viewer/index.tsx (+11 -3)
projects/web/src/pages/home.tsx (+19 -23)
projects/web/src/store/mdStore.ts (+55 -1)
projects/web/.gitignore
0 → 100644
# Logs
logs
*.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*
pnpm-debug.log*
lerna-debug.log*
node_modules
dist
dist-ssr
*.local
# Editor directories and files
.vscode/*
!.vscode/extensions.json
.idea
.DS_Store
*.suo
*.ntvs*
*.njsproj
*.sln
*.sw?
projects/web/README.md
## Local Frontend Development
# MinerU web
1. Install Node.js 18 and pnpm;
```javascript
npm install -g pnpm
## Table of Contents
- [Local Frontend Development](#local-frontend-development)
- [Technology Stack](#technology-stack)
## Local Frontend Development
### Prerequisites
- Node.js 18.x
- pnpm
### Installation Steps
1. Install Node.js 18
   - Visit the [Node.js official website](https://nodejs.org/) to download and install Node.js version 18.x
2. Install pnpm
```bash
npm install -g pnpm
```
3. Clone the repository
```
git clone https://github.com/opendatalab/MinerU
cd ./projects/web
```
4. Install dependencies
```
pnpm install
```
5. Run the development server
```
pnpm run dev
```
6. ⚠️ Note: This command is for local development only; do not use it for deployment!
Open your browser and visit http://localhost:5173 (or another address printed in the console)
7. Ensure that the backend service in ./projects/web_demo is running
8. If you encounter an error when executing `pnpm install`, you can switch to an alternative package manager.
```
npm install -g yarn
yarn
yarn start
```
## Building the Project
```
2. Run `pnpm install && pnpm run dev`. ⚠️ Note: this command is for local development only; do not use it for deployment!
3. build
pnpm run build
```
1.pnpm run build
2.npm run build
```
\ No newline at end of file
## Technology Stack
- React
- Tailwind CSS
- TypeScript
- zustand
- ahooks
projects/web/README_zh-CN.md
0 → 100644
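# MinerU web
## Table of Contents
- [Local Frontend Development](#local-frontend-development)
- [Technology Stack](#technology-stack)
## Local Frontend Development
### Prerequisites
- Node.js 18.x
- pnpm
### Installation Steps
1. Install Node.js 18
   - Visit the [Node.js official website](https://nodejs.org/) to download and install Node.js 18.x
2. Install pnpm
```bash
npm install -g pnpm
```
3. Clone the repository
```
git clone https://github.com/opendatalab/MinerU
cd ./projects/web
```
4. Install dependencies
```
pnpm install
```
5. Run the development server
```
pnpm run dev
```
6. ⚠️ Note: This command is for local development only; do not use it for deployment!
Open your browser and visit http://localhost:5173 (or another address printed in the console)
Building the project
To build the production version, run the following command:
```
pnpm run build
```
7. Make sure the backend service in ./projects/web_demo is running
8. If `pnpm install` fails with an error, you can switch to another package manager
```
npm install -g yarn
yarn
yarn start
```
## Technology Stack
- React
- Tailwind CSS
- TypeScript
- zustand
- ahooks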
projects/web/src/api/extract.ts
@@ -95,6 +95,7 @@ export interface TaskIdResItem {
  type: ExtractTaskType | "unknown";
  state: "running" | "done" | "pending" | "failed" | "unknown";
  markdownUrl: string[];
  file_key?: string;
}

export type TaskIdRes = TaskIdResItem[];
@@ -166,3 +167,30 @@ export const localUpload = (file: File) => {
    },
  });
};

export interface UpdateMarkdownRequest {
  file_key: string;
  data: {
    [pageNumber: string]: string;
  };
}

export interface UpdateMarkdownResponse {
  success: boolean;
  message?: string;
}

export const updateMarkdownContent = async (
  params: UpdateMarkdownRequest
): Promise<UpdateMarkdownResponse | null> => {
  return axios
    .put<UpdateMarkdownResponse>("/api/v2/extract/markdown", params)
    .then((res) => {
      if (!res?.data?.error) {
        return res.data.data;
      } else {
        handleErrorMsg(res);
        return null;
      }
    });
};
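The new `updateMarkdownContent` reads `res.data.error` and returns `res.data.data`, which suggests the backend wraps its payload in an envelope, although the call itself is typed as `axios.put<UpdateMarkdownResponse>`. A minimal sketch of typing that envelope explicitly is shown below. `ApiEnvelope` and `unwrapEnvelope` are hypothetical names, and the envelope fields are an assumption inferred from the diff rather than a documented contract.

```typescript
// Hypothetical envelope shape, inferred from the `res.data.error` and
// `res.data.data` accesses in the diff. The real backend contract may differ.
interface ApiEnvelope<T> {
  error?: string | null;
  data?: T;
}

interface UpdateMarkdownResponse {
  success: boolean;
  message?: string;
}

// Return the payload when the envelope carries no error, otherwise null.
function unwrapEnvelope<T>(body: ApiEnvelope<T> | undefined): T | null {
  if (!body || body.error || body.data === undefined) {
    return null;
  }
  return body.data;
}

const ok = unwrapEnvelope<UpdateMarkdownResponse>({ data: { success: true } });
const failed = unwrapEnvelope<UpdateMarkdownResponse>({ error: "file not found" });
console.log(ok?.success, failed);
```

With a helper like this, the request could be typed as returning `ApiEnvelope<UpdateMarkdownResponse>`, so the `error` and `data` accesses are checked by the compiler.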
projects/web/src/components/SaveStatus.tsx
0 → 100644
import React, {
  useState,
  useEffect,
  useImperativeHandle,
  forwardRef,
} from "react";

interface SaveStatusProps {
  className?: string;
}

export interface SaveStatusRef {
  triggerSave: () => void;
}

const SaveStatus = forwardRef<SaveStatusRef, SaveStatusProps>(
  ({ className }, ref) => {
    const [lastSaveTime, setLastSaveTime] = useState<Date | null>(null);
    const [showSaved, setShowSaved] = useState(false);
    const [timeSinceLastSave, setTimeSinceLastSave] = useState("");

    // Expose triggerSave() to the parent through the ref.
    useImperativeHandle(ref, () => ({
      triggerSave: () => {
        setLastSaveTime(new Date());
        setShowSaved(true);
      },
    }));

    // Hide the "saved" indicator again after 10 seconds.
    useEffect(() => {
      if (showSaved) {
        const timer = setTimeout(() => {
          setShowSaved(false);
        }, 10000);
        return () => clearTimeout(timer);
      }
    }, [showSaved]);

    // Refresh the elapsed-time label once per minute.
    useEffect(() => {
      const updateTimeSinceLastSave = () => {
        if (lastSaveTime) {
          const now = new Date();
          const diffInMinutes = Math.floor(
            (now.getTime() - lastSaveTime.getTime()) / 60000
          );
          if (diffInMinutes > 0) {
            // "分钟前" means "minutes ago"
            setTimeSinceLastSave(`${diffInMinutes} 分钟前`);
          }
        }
      };

      const timer = setInterval(updateTimeSinceLastSave, 60000);
      updateTimeSinceLastSave(); // Update once immediately

      return () => clearInterval(timer);
    }, [lastSaveTime]);

    // "已保存" means "Saved"; "最近修改:" means "Last modified:"
    return (
      <div className={className}>
        {showSaved && <span>已保存</span>}
        {!showSaved && lastSaveTime && (
          <span>最近修改:{timeSinceLastSave}</span>
        )}
      </div>
    );
  }
);

export default SaveStatus;
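As committed, `timeSinceLastSave` is only updated once at least a full minute has elapsed, so shortly after a save the "最近修改:" label can appear with an empty value, or with the value left over from an earlier save. One way to make that behaviour explicit is to move the formatting into a pure function. `formatTimeSinceSave`, `MS_PER_MINUTE`, and the English label strings are assumptions for illustration, not part of the commit.

```typescript
const MS_PER_MINUTE = 60_000;

// Hypothetical helper: always returns a label, including for the first minute.
function formatTimeSinceSave(lastSave: Date, now: Date = new Date()): string {
  const diffInMinutes = Math.floor(
    (now.getTime() - lastSave.getTime()) / MS_PER_MINUTE
  );
  if (diffInMinutes <= 0) {
    return "just now";
  }
  return diffInMinutes === 1 ? "1 minute ago" : `${diffInMinutes} minutes ago`;
}

const savedAt = new Date("2024-09-26T10:00:00Z");
console.log(formatTimeSinceSave(savedAt, new Date("2024-09-26T10:00:30Z")));
console.log(formatTimeSinceSave(savedAt, new Date("2024-09-26T10:05:10Z")));
```

Because the function takes `now` as a parameter, it can be tested without timers; the component would call it from the existing interval callback and store the result unconditionally.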
projects/web/src/pages/extract/components/md-viewer/index.tsx
@@ -17,6 +17,7 @@ import { TaskIdResItem } from "@/api/extract";
import useMdStore from "@/store/mdStore";
import CodeMirror from "@/components/code-mirror";
import { useParams } from "react-router-dom";
import SaveStatus, { SaveStatusRef } from "@/components/SaveStatus";

interface IMdViewerProps {
  md?: string;
@@ -48,10 +49,12 @@ const MdViewer: React.FC<IMdViewerProps> = ({
    allMdContentWithAnchor,
    setMdUrlArr,
    mdContents,
    updateMdContent,
  } = useMdStore();

  const [lineWrap, setLineWrap] = useState(false);
  const threshold = 562 - 427;
  const statusRef = useRef<SaveStatusRef>(null);

  const menuList = [
    {
@@ -137,8 +140,12 @@ const MdViewer: React.FC<IMdViewerProps> = ({
    }
  }, [taskInfo?.markdownUrl, params?.jobID]);

  const handleContentChange = (val: string) => {
  const handleContentChange = (val: string, index: number) => {
    setAllMdContentWithAnchor(val);
    statusRef?.current?.triggerSave();
    if (taskInfo?.file_key) {
      updateMdContent(taskInfo.file_key!, index, val);
    }
  };

  return (
@@ -161,12 +168,13 @@ const MdViewer: React.FC<IMdViewerProps> = ({
          </li>
        ))}
      </ul>
      <SaveStatus ref={statusRef} />
      {displayType === "code" && (
        <>
          <Tooltip
            title={
              fullScreen
                ? formatMessage({ id: "extractor.button.exitFullScreen" })
                ? formatMessage({ id: "extractor.button.lineWrap" })
                : formatMessage({
                    id: "extractor.button.lineWrap",
                  })
@@ -253,7 +261,7 @@ const MdViewer: React.FC<IMdViewerProps> = ({
          <CodeMirror
            value={md}
            lineWrapping={lineWrap}
            onChange={handleContentChange}
            onChange={(val) => handleContentChange(val, index)}
            editable
            className="w-full h-full"
          />
...
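`handleContentChange` now calls `updateMdContent` on every editor change, which appears to issue one PUT request per edit. A common mitigation is to debounce the save so that only the last edit in a burst is sent. The sketch below is one such approach under stated assumptions: `debounce`, `Scheduler`, and the 500 ms delay are illustrative and do not appear in the commit, and the scheduler is injectable only so the behaviour can be exercised without real timers.

```typescript
type Cancel = () => void;
// A scheduler runs `fn` after `delayMs` and returns a function that cancels it.
type Scheduler = (fn: () => void, delayMs: number) => Cancel;

const timeoutScheduler: Scheduler = (fn, delayMs) => {
  const id = setTimeout(fn, delayMs);
  return () => clearTimeout(id);
};

// Hypothetical helper: delays `fn` until no call has arrived for `delayMs`.
function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  delayMs: number,
  schedule: Scheduler = timeoutScheduler
): (...args: A) => void {
  let cancelPending: Cancel | null = null;
  return (...args: A) => {
    // A newer call replaces the pending one.
    if (cancelPending) cancelPending();
    cancelPending = schedule(() => {
      cancelPending = null;
      fn(...args);
    }, delayMs);
  };
}

// Usage: the console.log stands in for the real save call.
const debouncedSave = debounce((page: number, text: string) => {
  console.log(`saving page ${page}: ${text}`);
}, 500);
debouncedSave(0, "# Dra");
debouncedSave(0, "# Draft");
```

In the component, the debounced function would need a stable identity across renders (for example via `useRef` or `useMemo`), otherwise each render would create a fresh timer.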
projects/web/src/pages/home.tsx
@@ -3,23 +3,24 @@
import ErrorBoundary from "@/components/error-boundary";
import styles from "./home.module.scss";
import { SlotID, Path } from "@/constant/route";
import { HashRouter, Routes, Route, Outlet } from "react-router-dom";
import {
  BrowserRouter,
  Routes,
  Route,
  Outlet,
  Navigate,
  useLocation,
  HashRouter,
} from "react-router-dom";
import { ExtractorSide } from "./extract-side";
import { LanguageProvider } from "@/context/language-provider";
import PDFUpload from "@/pages/extract/components/pdf-upload";
import PDFExtractionJob from "@/pages/extract/components/pdf-extraction";

// judge if the app has hydrated
// const useHasHydrated = () => {
//   const [hasHydrated, setHasHydrated] = useState<boolean>(false);
//   useEffect(() => {
//     setHasHydrated(true);
//   }, []);
//   return hasHydrated;
// };

export function WindowContent() {
  const location = useLocation();
  const isHome = location.pathname === Path.Home;
  return (
    <>
      <ExtractorSide className={isHome ? styles["sidebar-show"] : ""} />
@@ -31,13 +32,15 @@ export function WindowContent() {
}

function Screen() {
  // if you do not need to use the renderContent for rendering router, you can use the other render function to interrupt before the renderContent
  const renderContent = () => {
    return (
      <div className="w-full h-full flex" id={SlotID.AppBody}>
        <Routes>
          <Route path="/" element={<WindowContent />}>
            <Route
              index
              element={<Navigate to="/OpenSourceTools/Extractor/PDF" replace />}
            />
            <Route
              path="/OpenSourceTools/Extractor/PDF"
              element={<PDFUpload />}
@@ -46,15 +49,13 @@ function Screen() {
              path="/OpenSourceTools/Extractor/PDF/:jobID"
              element={<PDFExtractionJob />}
            />
            {/* <Route path="*" element={<PDFUpload />} /> */}
            <Route
              path="*"
              element={<Navigate to="/OpenSourceTools/Extractor/PDF" replace />}
            />
          </Route>
        </Routes>
      </div>
      // <ExtractorSide className={isHome ? styles["sidebar-show"] : ""} />
      // <WindowContent className="flex-1">
      //   <AppRoutes />
      // </WindowContent>
    );
  };
@@ -62,11 +63,6 @@ function Screen() {
}

export function Home() {
  // leave this comment to check if the app has hydrated
  // if (!useHasHydrated()) {
  //   return <LoadingAnimation />;
  // }
  return (
    <ErrorBoundary>
      <LanguageProvider>
...
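The route table in this diff sends both the index route and any unmatched path to `/OpenSourceTools/Extractor/PDF`, and treats one extra path segment as a `jobID`. The sketch below models that table as a plain function so the intended outcomes are easy to check. It is not how react-router matches routes internally; `resolveRoute`, `Resolution`, and `PDF_ROOT` are hypothetical names.

```typescript
const PDF_ROOT = "/OpenSourceTools/Extractor/PDF";

type Resolution =
  | { kind: "upload" }
  | { kind: "job"; jobID: string }
  | { kind: "redirect"; to: string };

// Hypothetical model of the route table; exact-case matching is a simplification.
function resolveRoute(pathname: string): Resolution {
  // Ignore trailing slashes so "<root>/" behaves like "<root>".
  const path = pathname.length > 1 ? pathname.replace(/\/+$/, "") : pathname;
  if (path === PDF_ROOT) {
    return { kind: "upload" };
  }
  if (path.startsWith(PDF_ROOT + "/")) {
    const rest = path.slice(PDF_ROOT.length + 1);
    // Exactly one extra segment is interpreted as the job ID.
    if (rest.length > 0 && !rest.includes("/")) {
      return { kind: "job", jobID: rest };
    }
  }
  // Index route and unknown paths both redirect to the upload page.
  return { kind: "redirect", to: PDF_ROOT };
}

console.log(resolveRoute("/"));
console.log(resolveRoute("/OpenSourceTools/Extractor/PDF/abc123"));
```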
projects/web/src/store/mdStore.ts
// mdStore.ts
import { create } from "zustand";
import axios from "axios";
import { updateMarkdownContent, UpdateMarkdownRequest } from "@/api/extract"; // Make sure the path is correct

interface MdContent {
  content: string;
@@ -48,6 +49,11 @@ interface MdState {
  ) => string;
  jumpToAnchor: (anchorId: string) => number;
  reset: () => void;
  updateMdContent: (
    fileKey: string,
    pageNumber: string | number,
    newContent: string
  ) => Promise<void>;
}

const MAX_CONCURRENT_REQUESTS = 2;
@@ -122,7 +128,6 @@ const useMdStore = create<MdState>((set, get) => ({
      const results = await fetchWithConcurrency(urls);

      // Only update the state when this is the latest request
      if (get().currentRequestId === requestId) {
        const newMdContents: Record<string, MdContent> = {};
        results.forEach(([url, content]) => {
@@ -191,6 +196,55 @@ const useMdStore = create<MdState>((set, get) => ({
    }
    return -1; // Anchor not found
  },
  updateMdContent: async (
    fileKey: string,
    pageNumber: string,
    newContent: string
  ) => {
    try {
      const params: UpdateMarkdownRequest = {
        file_key: fileKey,
        data: {
          [pageNumber]: newContent,
        },
      };

      const result = await updateMarkdownContent(params);

      if (result && result.success) {
        // Update local state
        set((state) => {
          const updatedMdContents = { ...state.mdContents };
          if (updatedMdContents[fileKey]) {
            updatedMdContents[fileKey] = {
              ...updatedMdContents[fileKey],
              content: newContent,
            };
          }

          // Recompute allMdContent and allMdContentWithAnchor
          const contentArray = Object.values(updatedMdContents).map(
            (content) => content.content
          );
          const newAllMdContent = state.getAllMdContent(contentArray);
          const newAllMdContentWithAnchor =
            state.getContentWithAnchors(contentArray);

          return {
            mdContents: updatedMdContents,
            allMdContent: newAllMdContent,
            allMdContentWithAnchor: newAllMdContentWithAnchor,
          };
        });
      } else {
        throw new Error("Failed to update Markdown content");
      }
    } catch (error) {
      set({ error: error as Error });
      throw error;
    }
  },
}));

export default useMdStore;
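The `set` callback in `updateMdContent` copies `mdContents`, replaces one entry, and recomputes the derived strings. The same update can be written as a pure function, which makes the no-op case (an unknown key) easy to see. `applyContentEdit` and `joinContents` are hypothetical helpers, and joining entries with a blank line is an assumption standing in for `getAllMdContent`, whose implementation is not shown in the diff.

```typescript
interface MdContent {
  content: string;
}
type MdContents = Record<string, MdContent>;

// Return a new map with one entry replaced; unknown keys leave the map untouched.
function applyContentEdit(
  contents: MdContents,
  key: string,
  newContent: string
): MdContents {
  if (!Object.prototype.hasOwnProperty.call(contents, key)) {
    return contents;
  }
  return { ...contents, [key]: { ...contents[key], content: newContent } };
}

// Assumed stand-in for the store's getAllMdContent.
function joinContents(contents: MdContents): string {
  return Object.values(contents)
    .map((entry) => entry.content)
    .join("\n\n");
}

const before: MdContents = {
  "page-1": { content: "# Title" },
  "page-2": { content: "Body" },
};
const after = applyContentEdit(before, "page-1", "# New title");
console.log(joinContents(after));
```

One thing worth checking against the full file: the store looks entries up by `fileKey`, while the fetch path builds `newMdContents` from `[url, content]` pairs. If the map turns out to be keyed by URL (the assignment is elided in the diff), the `updatedMdContents[fileKey]` lookup would not match and the local state would stay unchanged after a successful save.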