-DataFlow-Eval-Process is a data evaluation and processing system designed to evaluate data quality from multiple dimensions and filter out high-quality data. We mainly support SOTA algorithms within academic papers with strong theoretical support.
+DataFlow is a data evaluation and processing system designed to (1) evaluate data quality from multiple dimensions, (2) filter out high-quality data, and (3) generate chain-of-thought or other types of augmentation. We mainly support SOTA algorithms within academic papers with strong theoretical support.
+
+<!-- We now support text, image, video, and multimodality data types. -->
+Specifically, we first build various `operators` based on rules, LLMs, and LLM APIs, which are then assembled into six `pipelines`. These pipelines form the complete `Dataflow` system. We also build an `agent` that can flexibly compose new pipelines from existing `operators` on demand.
+
+Current pipelines in Dataflow are as follows:
+- **Reasoning Pipeline**: Enhances existing question–answer pairs with (1) extended chain-of-thought, (2) category classification, and (3) difficulty estimation.
+- **Text2SQL Pipeline**: Translates natural language questions into SQL queries, supplemented with explanations, chain-of-thought reasoning, and contextual schema information.
 
-We now support text, image, video, and multimodality data types.
 
 ## News
 - [2025-07-25] 🎉 We release the dataflow-agent.
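As a rough illustration of the operator-and-pipeline design described in the overview above, the Python sketch below chains two toy operators into a small pipeline. Every name in it (`Record`, `Operator`, `drop_empty_answers`, `estimate_difficulty`, `run_pipeline`) is hypothetical and is not DataFlow's actual API. The difficulty heuristic is only a placeholder for what would presumably be an LLM-based scorer.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical types for illustration only; not DataFlow's real interfaces.
Record = Dict[str, object]
Operator = Callable[[Record], Optional[Record]]


def drop_empty_answers(record: Record) -> Optional[Record]:
    """Rule-based filter: discard records whose answer is blank."""
    answer = str(record.get("answer", "")).strip()
    return record if answer else None


def estimate_difficulty(record: Record) -> Record:
    """Toy stand-in for an LLM-based scorer: longer questions count as harder."""
    words = len(str(record.get("question", "")).split())
    return {**record, "difficulty": "hard" if words > 12 else "easy"}


def run_pipeline(records: List[Record], operators: List[Operator]) -> List[Record]:
    """Apply operators in order; an operator returning None drops the record."""
    kept: List[Record] = []
    for record in records:
        current: Optional[Record] = record
        for op in operators:
            current = op(current)
            if current is None:
                break
        if current is not None:
            kept.append(current)
    return kept


samples = [
    {"question": "What is 2 + 2?", "answer": "4"},
    {"question": "Name a prime number.", "answer": "  "},
]
result = run_pipeline(samples, [drop_empty_answers, estimate_difficulty])
print(result)
```

Modelling each operator as a plain function that returns either an updated record or `None` keeps filtering and annotation steps interchangeable, which is one simple way an agent could recombine existing operators into a new pipeline.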
@@ -35,13 +41,16 @@ We now support text, image, video, and multimodality data types.
 ## Installation
 For environment setup, please use the following commands👇
 
-```
+```shell
 conda create -n dataflow python=3.10
 conda activate dataflow
 pip install -e .
 ```
 
 ## Features
 ### 1. Reasoning Pipeline
+
+
+For demo inputs and outputs, you can refer to our [Reasoning Pipeline sample](https://huggingface.co/datasets/Open-Dataflow/dataflow-demo-Reasonning/) on Hugging Face.