121 changes: 121 additions & 0 deletions research/ReFound/README.md

## Introduction

This repo is the [PaddlePaddle](https://www.paddlepaddle.org.cn/en) implementation of the KDD 2024 Research Track paper "ReFound: Crafting a Foundation Model for Urban Region Understanding upon Language and Visual Foundations" ([paper link](https://dl.acm.org/doi/pdf/10.1145/3637528.3671992)).

## Requirements

* Python >= 3.7
* paddlepaddle == 2.4.2


## Pre-trained Model
Pretrained model weights of ReFound can be downloaded [here](https://www.dropbox.com/scl/fo/d6rj3r0b2plavmikjsldz/APMM-9LT-DrYx4A_b4scJVk?rlkey=5k5zrfjpxfiu1kuyevmgmvgch&st=f8vkc1xy&dl=0).



## Evaluation Dataset
We provide the processed dataset of two downstream tasks in our paper: Urban Village Detection (UVD) and Population Prediction (POP).
<u>The link is coming soon.</u>



## Folder Structure
Please create folders with the following directory structure:
```
ReFound
|- bert-based-chinese
|- code
|- data
|- checkpoint
|- region_embed
|- log
|- log_feature
|- prob
|- prob_feature
```
- ./bert-based-chinese/ : put the downloaded BERT tokenizer files here
- ./code/ : the source code in this repo
- ./data/ : put the downloaded evaluation dataset here
- ./checkpoint/ : the pre-trained ReFound model is loaded from this dir
- ./region_embed/ : saves the region features extracted by the ReFound model (for feature-based evaluation)
- ./log/ : saves log files (for fine-tuning evaluation)
- ./log_feature/ : saves log files (for feature-based evaluation)
- ./prob/ : saves the model's output probabilities for the UVD binary classification task (for fine-tuning evaluation)
- ./prob_feature/ : saves the model's output probabilities for the UVD binary classification task (for feature-based evaluation)
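The layout above can be created in one step from the directory that will contain ReFound (a convenience sketch, not part of the repo itself):

```shell
# Create every working directory expected by the training scripts
mkdir -p ReFound/bert-based-chinese \
         ReFound/code \
         ReFound/data \
         ReFound/checkpoint \
         ReFound/region_embed \
         ReFound/log \
         ReFound/log_feature \
         ReFound/prob \
         ReFound/prob_feature
```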


## Usage
The pre-trained ReFound model can be applied to downstream urban region understanding tasks in two ways: *fine-tuning* and *feature-based prediction*.

### Preparation

**Step 1:** download the evaluation data and put it in ./data/

**Step 2:** download the pre-trained ReFound model and put it in ./checkpoint/

**Step 3:** download the BERT tokenizer and put it in ./bert-based-chinese/


### Fine-tuning

Check the hyper-parameters in script_finetune.sh, then fine-tune the pre-trained model with:

```shell
# Urban Village Detection (UVD) task
sh script_finetune.sh uvd [city] [param1] [param2] ...

# Population Prediction (POP) task
sh script_finetune.sh pop [city] [param1] [param2] ...
```


### Feature-based Prediction
Extract the region feature using the pre-trained model by:
```shell
sh feature_extraction.sh [city]
```


Check the hyper-parameters in script_feature_based.sh, then train the task-specific prediction head with:

```shell
# Urban Village Detection (UVD) task
sh script_feature_based.sh uvd [city] [param1] [param2] ...

# Population Prediction (POP) task
sh script_feature_based.sh pop [city] [param1] [param2] ...
```

## Reference

If you find this code or any of the ideas in the paper useful, please cite:

```bibtex
@inproceedings{xiao2024refound,
title={ReFound: Crafting a Foundation Model for Urban Region Understanding upon Language and Visual Foundations},
author={Xiao, Congxi and Zhou, Jingbo and Xiao, Yixiong and Huang, Jizhou and Xiong, Hui},
booktitle={Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
pages={3527--3538},
year={2024}
}
```







174 changes: 174 additions & 0 deletions research/ReFound/code/MoGETransformer.py
import math
import paddle
import paddle.nn as nn
import paddle.nn.functional as F



class MoGESelfAttention(nn.Layer):
    def __init__(self, config):
        super().__init__()
        assert config['hidden_size'] % config['num_attention_heads'] == 0

        self.num_attention_heads = config['num_attention_heads']
        self.attention_head_size = config['hidden_size'] // config['num_attention_heads']
        self.all_head_size = self.num_attention_heads * self.attention_head_size

        self.query = nn.Linear(config['hidden_size'], self.all_head_size)
        self.key = nn.Linear(config['hidden_size'], self.all_head_size)
        self.value = nn.Linear(config['hidden_size'], self.all_head_size)

        self.dropout = nn.Dropout(config['attention_probs_dropout_prob'])

    def transpose_for_scores(self, x):
        # (batch, seq_len, hidden) -> (batch, heads, seq_len, head_size)
        new_x_shape = tuple(x.shape[:-1]) + (self.num_attention_heads, self.attention_head_size)
        x = x.reshape(new_x_shape)
        return x.transpose((0, 2, 1, 3))

    def forward(self, hidden_states, attention_mask):
        query_layer = self.transpose_for_scores(self.query(hidden_states))
        key_layer = self.transpose_for_scores(self.key(hidden_states))
        value_layer = self.transpose_for_scores(self.value(hidden_states))

        # Scaled dot-product attention
        attention_scores = paddle.matmul(query_layer, key_layer, transpose_y=True)
        attention_scores = attention_scores / math.sqrt(self.attention_head_size)

        if attention_mask is not None:
            attention_scores = attention_scores + attention_mask

        attention_probs = F.softmax(attention_scores, axis=-1)
        attention_probs = self.dropout(attention_probs)

        context_layer = paddle.matmul(attention_probs, value_layer)

        # Merge heads back: (batch, seq_len, hidden)
        context_layer = context_layer.transpose((0, 2, 1, 3))
        new_context_layer_shape = tuple(context_layer.shape[:-2]) + (self.all_head_size,)
        context_layer = context_layer.reshape(new_context_layer_shape)

        return context_layer
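# For reference, the attention math computed above can be sketched
# framework-independently in NumPy. This is an illustrative mirror of the
# shape logic only (random input, identity Q/K/V projections, no dropout or
# mask), not the actual Paddle module; shapes assume hidden_size=768 and
# num_attention_heads=12 from the repo's config.
#
# import numpy as np
#
# def multi_head_attention(hidden, num_heads):
#     b, s, h = hidden.shape
#     d = h // num_heads                              # attention_head_size
#     # (b, s, h) -> (b, heads, s, d), mirroring transpose_for_scores
#     split = hidden.reshape(b, s, num_heads, d).transpose(0, 2, 1, 3)
#     q = k = v = split
#     scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(d)
#     probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
#     probs /= probs.sum(axis=-1, keepdims=True)      # softmax over keys
#     ctx = probs @ v                                 # (b, heads, s, d)
#     # merge heads back: (b, s, h)
#     return ctx.transpose(0, 2, 1, 3).reshape(b, s, h)
#
# out = multi_head_attention(np.random.randn(2, 5, 768), num_heads=12)
# print(out.shape)  # (2, 5, 768)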


class MoGESelfOutput(nn.Layer):
    def __init__(self, config):
        super().__init__()
        self.dense = nn.Linear(config['hidden_size'], config['hidden_size'])
        self.LayerNorm = nn.LayerNorm(config['hidden_size'], epsilon=config['layer_norm_eps'])
        self.dropout = nn.Dropout(config['hidden_dropout_prob'])

    def forward(self, hidden_states, input_tensor):
        # Projection and dropout, then residual connection + LayerNorm
        hidden_states = self.dense(hidden_states)
        hidden_states = self.dropout(hidden_states)
        hidden_states = self.LayerNorm(hidden_states + input_tensor)
        return hidden_states


class MoGEAttention(nn.Layer):
    def __init__(self, config):
        super().__init__()
        self.selfattn = MoGESelfAttention(config)
        self.output = MoGESelfOutput(config)

    def forward(self, hidden_states, attention_mask):
        selfattn_outputs = self.selfattn(hidden_states, attention_mask)
        attention_output = self.output(selfattn_outputs, hidden_states)
        return attention_output


class MoGEIntermediate(nn.Layer):
    def __init__(self, config):
        super().__init__()
        self.dense = nn.Linear(config['hidden_size'], config['intermediate_size'])
        self.act_fn = nn.GELU()

    def forward(self, hidden_states):
        hidden_states = self.dense(hidden_states)
        hidden_states = self.act_fn(hidden_states)
        return hidden_states


class MoGEOutput(nn.Layer):
    def __init__(self, config):
        super().__init__()
        self.dense = nn.Linear(config['intermediate_size'], config['hidden_size'])
        self.LayerNorm = nn.LayerNorm(config['hidden_size'], epsilon=config['layer_norm_eps'])
        self.dropout = nn.Dropout(config['hidden_dropout_prob'])

    def forward(self, hidden_states, input_tensor):
        hidden_states = self.dense(hidden_states)
        hidden_states = self.dropout(hidden_states)
        hidden_states = self.LayerNorm(hidden_states + input_tensor)
        return hidden_states



class MoGELayer(nn.Layer):
    def __init__(self, config, is_ps_expert):
        super().__init__()
        self.attention = MoGEAttention(config)

        # Modality-specific feed-forward experts
        self.poi_intermediate = MoGEIntermediate(config)
        self.poi_output = MoGEOutput(config)

        self.img_intermediate = MoGEIntermediate(config)
        self.img_output = MoGEOutput(config)

        # Shared POI-satellite expert, only present in the upper layers
        if is_ps_expert:
            self.ps_intermediate = MoGEIntermediate(config)
            self.ps_output = MoGEOutput(config)

    def forward(self, hidden_states, attention_mask, expert_selection, split_idx):
        attention_output = self.attention(hidden_states, attention_mask)

        if expert_selection == 'p_and_s':
            # Route POI tokens and image tokens to their respective experts
            poi_attention_output = attention_output[:, :split_idx]
            img_attention_output = attention_output[:, split_idx:]

            poi_intermediate_output = self.poi_intermediate(poi_attention_output)
            poi_mlp_output = self.poi_output(poi_intermediate_output, poi_attention_output)

            img_intermediate_output = self.img_intermediate(img_attention_output)
            img_mlp_output = self.img_output(img_intermediate_output, img_attention_output)

            mlp_output = paddle.concat([poi_mlp_output, img_mlp_output], axis=1)

        elif expert_selection == 'ps':
            # All tokens go through the shared POI-satellite expert
            ps_intermediate_output = self.ps_intermediate(attention_output)
            ps_mlp_output = self.ps_output(ps_intermediate_output, attention_output)
            mlp_output = ps_mlp_output

        else:
            raise ValueError(f"unknown expert_selection: {expert_selection}")

        return mlp_output




class MoGEEncoder(nn.Layer):
    def __init__(self, config):
        super().__init__()
        self.config = config
        self.ps_layer_start_idx = config['ps_layer_start_idx']
        self.layer = nn.LayerList([
            MoGELayer(config=config, is_ps_expert=(i >= self.ps_layer_start_idx))
            for i in range(config['num_hidden_layers'])
        ])

    def forward(self, hidden_states, attention_mask, split_idx):
        for i, layer_module in enumerate(self.layer):
            if i < self.ps_layer_start_idx:
                # Lower layers: separate POI / image experts
                hidden_states = layer_module(
                    hidden_states=hidden_states,
                    attention_mask=attention_mask,
                    expert_selection='p_and_s',
                    split_idx=split_idx,
                )
            else:
                # Upper layers: shared POI-satellite expert
                hidden_states = layer_module(
                    hidden_states=hidden_states,
                    attention_mask=attention_mask,
                    expert_selection='ps',
                    split_idx=split_idx,
                )
        return hidden_states
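# The routing schedule implemented by MoGEEncoder.forward -- modality-specific
# experts ('p_and_s') in the lower layers, a shared POI-satellite expert ('ps')
# from ps_layer_start_idx onward -- can be illustrated standalone:
#
# def expert_schedule(num_hidden_layers, ps_layer_start_idx):
#     """Expert selection per layer, mirroring the branch in forward()."""
#     return ['p_and_s' if i < ps_layer_start_idx else 'ps'
#             for i in range(num_hidden_layers)]
#
# # With the repo's default config (12 layers, shared experts from layer 10),
# # layers 0-9 keep separate POI and image FFN experts while layers 10-11
# # fuse both modalities through the shared expert:
# print(expert_schedule(12, 10))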


28 changes: 28 additions & 0 deletions research/ReFound/code/configuration.py


config = {
    "attention_probs_dropout_prob": 0.1,
    "num_attention_heads": 12,
    "num_hidden_layers": 12,
    "ps_layer_start_idx": 10,
    "initializer_range": 0.02,
    "vocab_size": 21128,
    "type_vocab_size": 2,
    "max_len_poi": 512,
    "max_len_token": 512,
    "hidden_size": 768,
    "intermediate_size": 3072,
    "layer_norm_eps": 1e-12,
    "hidden_dropout_prob": 0.1,
    "chunk_size_feed_forward": 0,
    "poi_cate_num": 130,
    "image_size": 256,
    "patch_size": 16,
    "num_grid_x": 16,
    "num_grid_y": 16,
    "visual_vocab_size": 8192,
    "mask_ratio_poi": 0.15,
    "mask_ratio_img": 0.4,
    "dvlfm_temp": 0.07,
}
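# A few quantities implied by the configuration above can be sanity-checked
# quickly (these are derived values, not extra settings):
#
# head_size = config["hidden_size"] // config["num_attention_heads"]  # per-head dim
# num_patches = (config["image_size"] // config["patch_size"]) ** 2   # image patches
#
# assert config["hidden_size"] % config["num_attention_heads"] == 0
# assert num_patches == config["num_grid_x"] * config["num_grid_y"]
# print(head_size, num_patches)  # 64 256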
