Commit f4ac09b

feat: support PP-DocLayoutV3
1 parent d8cf3b1 commit f4ac09b

5 files changed

Lines changed: 35 additions & 4 deletions

docs/blog/posts/support_pp_doc_layout.md

Lines changed: 25 additions & 2 deletions
@@ -42,6 +42,8 @@ paddle2onnx --model_dir=models/PP-DocLayoutV2 --model_filename inference.json
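 
 I inserted the following code into `/xxxx/miniforge3/envs/wjh_debug/lib/python3.10/site-packages/paddlex/inference/models/layout_analysis/predictor.py` (around line **103**) so that both models receive the same input and their outputs can be compared.
 
+#### PP-DocLayoutV2
+
 After converting directly as described above, the inference results of the ONNX model and the Paddle model differ by **14.8%** on the same input. In my view, that is a fairly large error.
 
 However, the visualized example results show no obvious difference between the two. Some images may still show larger differences.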
@@ -53,7 +55,7 @@ paddle2onnx --model_dir=models/PP-DocLayoutV2 --model_filename inference.json
 import onnxruntime
 import numpy as np
 
-model_path = "models/PP-DocLayoutV2/inference_v5_op15_pd_cpu_fixed.onnx"
+model_path = "models/PP-DocLayoutV2/inference.onnx"
 ort_session = onnxruntime.InferenceSession(model_path)
 ort_inputs = {
     "im_shape": batch_inputs[0],
@@ -105,6 +107,25 @@ Max relative difference: 194.
 
 For now I will keep using this ONNX model. The problem has been reported in Paddle2ONNX issue [#1608](https://github.com/PaddlePaddle/Paddle2ONNX/issues/1608#issuecomment-3875561303).
 
+#### PP-DocLayoutV3
+
+With the same environment and the same conversion code as PP-DocLayoutV2, this model's error is much smaller: only **1.57%**.
+
+```bash
+AssertionError:
+Not equal to tolerance rtol=0, atol=0.001
+
+Mismatched elements: 33 / 2100 (1.57%)
+Max absolute difference among violations: 1.
+Max relative difference among violations: 0.01754386
+ACTUAL: array([[2.200000e+01, 9.658169e-01, 3.387792e+01, ..., 3.626684e+02,
+        8.528884e+02, 1.540000e+02],
+       [2.200000e+01, 9.657925e-01, 3.363610e+01, ..., 3.633332e+02,...
+DESIRED: array([[2.200000e+01, 9.658167e-01, 3.387791e+01, ..., 3.626685e+02,
+        8.528885e+02, 1.530000e+02],
+       [2.200000e+01, 9.657924e-01, 3.363615e+01, ..., 3.633333e+02,...
+```
+
 ### Extracting the inference code
 
 The PaddleOCR library has to stay compatible with a large amount of inference code, so it is big and all-encompassing, which also makes it somewhat bloated. That is hard to avoid. But if you look only at the PP-DocLayout inference code, many things become quite simple.
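
Editor's sketch: the mismatch figures in the PP-DocLayoutV3 assertion output in this hunk follow the element-wise criterion `|actual - desired| <= atol + rtol * |desired|` used by `np.testing.assert_allclose`. The pure-Python restatement below is only an illustration, not code from the commit. `mismatch_report` is a hypothetical helper, and the sample values are the visible entries of the first ACTUAL and DESIRED rows (the elided middle is left out).

```python
def mismatch_report(actual, desired, rtol=0.0, atol=1e-3):
    """Count elements that violate |actual - desired| <= atol + rtol * |desired|."""
    violations = [
        abs(a - d)
        for a, d in zip(actual, desired)
        if abs(a - d) > atol + rtol * abs(d)
    ]
    total = len(actual)
    return {
        "mismatched": len(violations),
        "total": total,
        "percent": 100.0 * len(violations) / total,
        # Largest absolute difference among the violating elements only
        "max_abs_diff": max(violations, default=0.0),
    }


# Visible entries of the first row in the assertion output (middle elided there)
actual = [22.0, 0.9658169, 33.87792, 362.6684, 852.8884, 154.0]
desired = [22.0, 0.9658167, 33.87791, 362.6685, 852.8885, 153.0]

report = mismatch_report(actual, desired)
print(report)

# The headline figure in the log is the same ratio over all 2100 elements
print(f"{100.0 * 33 / 2100:.2f}%")
```

On this sample only the last column (154 versus 153) exceeds the tolerance, which matches the reported maximum absolute difference of 1.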
@@ -202,9 +223,11 @@ $ python write_dict.py
 ['abstract', 'algorithm', 'aside_text', 'chart', 'content', 'display_formula', 'doc_title', 'figure_title', 'footer', 'footer_image', 'footnote', 'formula_number', 'header', 'header_image', 'image', 'inline_formula', 'number', 'paragraph_title', 'reference', 'reference_content', 'seal', 'table', 'text', 'vertical_text', 'vision_footnote']
 ```
 
+PP-DocLayoutV2 and PP-DocLayoutV3 use the same dictionary.
+
 ### Usage
 
-Currently supported in `rapid-layout>=1.1.0`. Usage example:
+PP-DocLayoutV2 is supported in `rapid_layout>=1.1.0`, and PP-DocLayoutV3 in `rapid_layout>=1.2.0`. Usage example:
 
 ```python linenums="1"
 from rapid_layout import EngineType, ModelType, RapidLayout

docs/models.md

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -12,7 +12,8 @@ hide:
 
 | `model_type` | Layout type | Supported categories |
 | :------ | :----- | :----- |
-|`pp_doc_layoutv2 (rapid_layout>=1.1.0)`|General|`['abstract', 'algorithm', 'aside_text', 'chart', 'content', 'display_formula', 'doc_title', 'figure_title', 'footer', 'footer_image', 'footnote', 'formula_number', 'header', 'header_image', 'image', 'inline_formula', 'number', 'paragraph_title', 'reference', 'reference_content', 'seal', 'table', 'text', 'vertical_text', 'vision_footnote']`|
+|`pp_doc_layoutv3 (rapid_layout>=1.2.0)`|Document|`['abstract', 'algorithm', 'aside_text', 'chart', 'content', 'display_formula', 'doc_title', 'figure_title', 'footer', 'footer_image', 'footnote', 'formula_number', 'header', 'header_image', 'image', 'inline_formula', 'number', 'paragraph_title', 'reference', 'reference_content', 'seal', 'table', 'text', 'vertical_text', 'vision_footnote']`|
+|`pp_doc_layoutv2 (rapid_layout>=1.1.0)`|Document|`['abstract', 'algorithm', 'aside_text', 'chart', 'content', 'display_formula', 'doc_title', 'figure_title', 'footer', 'footer_image', 'footnote', 'formula_number', 'header', 'header_image', 'image', 'inline_formula', 'number', 'paragraph_title', 'reference', 'reference_content', 'seal', 'table', 'text', 'vertical_text', 'vision_footnote']`|
 ||||
 | `pp_layout_table` | Table | `["table"]` |
 | `pp_layout_publaynet` | English | `["text", "title", "list", "table", "figure"]` |
@@ -29,6 +30,8 @@ hide:
 
 ## Model sources
 
+**🔥 PP-DocLayoutV3**: [PP-DocLayoutV3](https://huggingface.co/PaddlePaddle/PP-DocLayoutV3)
+
 **🔥 PP-DocLayoutV2**: [PP-DocLayoutV2](https://huggingface.co/PaddlePaddle/PP-DocLayoutV2)
 
 **PP models**: [PaddleOCR layout analysis](https://github.com/PaddlePaddle/PaddleOCR/blob/133d67f27dc8a241d6b2e30a9f047a0fb75bebbe/ppstructure/layout/README_ch.md)

overrides/partials/comments.html

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,7 @@ <h2 id="__comments">{{ lang.t("meta.comments") }}</h2>
 data-repo-id="R_kgDOMLOtcQ"
 data-category="General"
 data-category-id="DIC_kwDOMLOtcc4CgMBG"
-data-mapping="url"
+data-mapping="title"
 data-strict="0"
 data-reactions-enabled="1"
 data-emit-metadata="0"

rapid_layout/configs/default_models.yaml

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,7 @@
+pp_doc_layoutv3:
+  model_dir_or_path: https://www.modelscope.cn/models/RapidAI/RapidLayout/resolve/master/onnx/pp_doc_layout/pp_doc_layoutv3.onnx
+  SHA256: 250dbad1dfb9e4983fab75e1bf5085cd56ec3f41d5c7d0f8623ec74856e7aa67
+
 pp_doc_layoutv2:
   model_dir_or_path: https://www.modelscope.cn/models/RapidAI/RapidLayout/resolve/v1.1.0/onnx/pp_doc_layout/pp_doc_layoutv2.onnx
   SHA256: 0bd2ea0997fe0789f0300292291f8bbf897d890b44a9a3bd5be72afd6198aa90
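
Editor's sketch: each entry in `default_models.yaml` pairs a download URL with a SHA256 digest. The snippet below shows one way such a digest could be checked against a downloaded file. It is an illustration under assumptions, not the code rapid_layout uses: `sha256_of_file` and `verify_model` are hypothetical helpers, and the local file name is assumed.

```python
import hashlib
from pathlib import Path

# Digest copied from the pp_doc_layoutv3 entry in the diff
EXPECTED_SHA256 = "250dbad1dfb9e4983fab75e1bf5085cd56ec3f41d5c7d0f8623ec74856e7aa67"


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large model files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model(path, expected_sha256):
    """Return True when the file's SHA256 matches the expected hex digest."""
    return sha256_of_file(path) == expected_sha256.lower()


# Assumed local download location; the check is skipped if the file is absent
model_path = Path("pp_doc_layoutv3.onnx")
if model_path.exists():
    print("checksum ok:", verify_model(model_path, EXPECTED_SHA256))
```

Note that the `pp_doc_layoutv3` URL resolves against `master` while `pp_doc_layoutv2` is pinned to `v1.1.0`, so a digest check of this kind is what would catch the file changing upstream.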

rapid_layout/utils/typings.py

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -27,6 +27,7 @@ class ModelType(Enum):
     DOCLAYOUT_D4LA = "doclayout_d4la"
     DOCLAYOUT_DOCSYNTH = "doclayout_docsynth"
     PP_DOC_LAYOUTV2 = "pp_doc_layoutv2"
+    PP_DOC_LAYOUTV3 = "pp_doc_layoutv3"
 
 
 class EngineType(Enum):
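
Editor's sketch: the new enum value `"pp_doc_layoutv3"` is the same string as the top-level key added to `default_models.yaml`. Assuming the string is used to select a model, a by-value lookup on the enum turns a plain string into a member. The `ModelType` below is a standalone stand-in containing only members visible in this diff, and `parse_model_type` is a hypothetical helper, not part of rapid_layout.

```python
from enum import Enum


class ModelType(Enum):
    # Stand-in with only the members shown in the diff, not the real module
    DOCLAYOUT_D4LA = "doclayout_d4la"
    DOCLAYOUT_DOCSYNTH = "doclayout_docsynth"
    PP_DOC_LAYOUTV2 = "pp_doc_layoutv2"
    PP_DOC_LAYOUTV3 = "pp_doc_layoutv3"


def parse_model_type(name: str) -> ModelType:
    """Map a user-supplied string to a ModelType member by its value."""
    try:
        return ModelType(name.strip().lower())
    except ValueError:
        valid = ", ".join(m.value for m in ModelType)
        raise ValueError(
            f"unknown model_type {name!r}; expected one of: {valid}"
        ) from None


selected = parse_model_type("pp_doc_layoutv3")
print(selected, selected.value)
```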
