121 changes: 121 additions & 0 deletions research/ReFound/README.md

## Introduction

This repo is the [PaddlePaddle](https://www.paddlepaddle.org.cn/en) implementation of the KDD 2024 Research Track paper "ReFound: Crafting a Foundation Model for Urban Region Understanding upon Language and Visual Foundations" ([paper link](https://dl.acm.org/doi/pdf/10.1145/3637528.3671992)).

## Requirements

* Python >= 3.7
* paddlepaddle == 2.4.2


## Pre-trained Model
Pretrained model weights of ReFound can be downloaded [here](https://www.dropbox.com/scl/fo/d6rj3r0b2plavmikjsldz/APMM-9LT-DrYx4A_b4scJVk?rlkey=5k5zrfjpxfiu1kuyevmgmvgch&st=f8vkc1xy&dl=0).



## Evaluation Dataset
We provide the processed dataset of two downstream tasks in our paper: Urban Village Detection (UVD) and Population Prediction (POP).
<u>The link is coming soon.</u>



## Folder Structure
Please create folders with the following directory structure:
```
ReFound
|- bert-based-chinese
|- code
|- data
|- checkpoint
|- region_embed
|- log
|- log_feature
|- prob
|- prob_feature
```
- ./bert-based-chinese/ : put the downloaded BERT tokenizer files here
- ./code/ : the source code in this repo
- ./data/ : put the downloaded evaluation dataset here
- ./checkpoint/ : the pre-trained ReFound model is loaded from this dir
- ./region_embed/ : saves the region features extracted by the ReFound model (for feature-based evaluation)
- ./log/ : saves log files (for fine-tuning evaluation)
- ./log_feature/ : saves log files (for feature-based evaluation)
- ./prob/ : saves the model's output probabilities for the UVD binary classification task (for fine-tuning evaluation)
- ./prob_feature/ : saves the model's output probabilities for the UVD binary classification task (for feature-based evaluation)
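The layout above can be created in one step from the directory that will contain ReFound (a convenience sketch, not part of the repo itself):

```shell
# Create every working directory expected by the training scripts
mkdir -p ReFound/bert-based-chinese \
         ReFound/code \
         ReFound/data \
         ReFound/checkpoint \
         ReFound/region_embed \
         ReFound/log \
         ReFound/log_feature \
         ReFound/prob \
         ReFound/prob_feature
```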


## Usage
The pre-trained ReFound model can be applied to downstream urban region understanding tasks in two ways: *fine-tuning* and *feature-based prediction*.

### Preparation

**Step 1:** download the evaluation data and put it in ./data/

**Step 2:** download the pre-trained ReFound model and put it in ./checkpoint/

**Step 3:** download the BERT tokenizer and put it in ./bert-based-chinese/


### Fine-tuning

Check the hyper-parameters in script_finetune.sh, then fine-tune the pre-trained model with:

```shell
# Urban Village Detection (UVD) task
sh script_finetune.sh uvd [city] [param1] [param2] ...

# Population Prediction (POP) task
sh script_finetune.sh pop [city] [param1] [param2] ...
```


### Feature-based Prediction
Extract the region feature using the pre-trained model by:
```shell
sh feature_extraction.sh [city]
```


Check the hyper-parameters in script_feature_based.sh, then train the task-specific prediction head with:

```shell
# Urban Village Detection (UVD) task
sh script_feature_based.sh uvd [city] [param1] [param2] ...

# Population Prediction (POP) task
sh script_feature_based.sh pop [city] [param1] [param2] ...
```

## Reference

If you find this code or any of the ideas in the paper useful, please cite:

```bibtex
@inproceedings{xiao2024refound,
title={ReFound: Crafting a Foundation Model for Urban Region Understanding upon Language and Visual Foundations},
author={Xiao, Congxi and Zhou, Jingbo and Xiao, Yixiong and Huang, Jizhou and Xiong, Hui},
booktitle={Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
pages={3527--3538},
year={2024}
}
```







174 changes: 174 additions & 0 deletions research/ReFound/code/MoGETransformer.py
import math
import paddle
import paddle.nn as nn
import paddle.nn.functional as F



class MoGESelfAttention(nn.Layer):
    def __init__(self, config):
        super().__init__()
        assert config['hidden_size'] % config['num_attention_heads'] == 0

        self.num_attention_heads = config['num_attention_heads']
        self.attention_head_size = config['hidden_size'] // config['num_attention_heads']
        self.all_head_size = self.num_attention_heads * self.attention_head_size

        self.query = nn.Linear(config['hidden_size'], self.all_head_size)
        self.key = nn.Linear(config['hidden_size'], self.all_head_size)
        self.value = nn.Linear(config['hidden_size'], self.all_head_size)

        self.dropout = nn.Dropout(config['attention_probs_dropout_prob'])

    def transpose_for_scores(self, x):
        # (batch, seq_len, hidden) -> (batch, heads, seq_len, head_size)
        new_x_shape = tuple(x.shape[:-1]) + (self.num_attention_heads, self.attention_head_size)
        x = x.reshape(new_x_shape)
        return x.transpose((0, 2, 1, 3))

    def forward(self, hidden_states, attention_mask):
        query_layer = self.transpose_for_scores(self.query(hidden_states))
        key_layer = self.transpose_for_scores(self.key(hidden_states))
        value_layer = self.transpose_for_scores(self.value(hidden_states))

        # Scaled dot-product attention
        attention_scores = paddle.matmul(query_layer, key_layer, transpose_y=True)
        attention_scores = attention_scores / math.sqrt(self.attention_head_size)

        if attention_mask is not None:
            attention_scores = attention_scores + attention_mask

        attention_probs = F.softmax(attention_scores, axis=-1)
        attention_probs = self.dropout(attention_probs)

        context_layer = paddle.matmul(attention_probs, value_layer)

        # Merge heads back: (batch, seq_len, hidden)
        context_layer = context_layer.transpose((0, 2, 1, 3))
        new_context_layer_shape = tuple(context_layer.shape[:-2]) + (self.all_head_size,)
        context_layer = context_layer.reshape(new_context_layer_shape)

        return context_layer
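# For reference, the attention math computed above can be sketched
# framework-independently in NumPy. This is an illustrative mirror of the
# shape logic only (random input, identity Q/K/V projections, no dropout or
# mask), not the actual Paddle module; shapes assume hidden_size=768 and
# num_attention_heads=12 from the repo's config.
#
# import numpy as np
#
# def multi_head_attention(hidden, num_heads):
#     b, s, h = hidden.shape
#     d = h // num_heads                              # attention_head_size
#     # (b, s, h) -> (b, heads, s, d), mirroring transpose_for_scores
#     split = hidden.reshape(b, s, num_heads, d).transpose(0, 2, 1, 3)
#     q = k = v = split
#     scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(d)
#     probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
#     probs /= probs.sum(axis=-1, keepdims=True)      # softmax over keys
#     ctx = probs @ v                                 # (b, heads, s, d)
#     # merge heads back: (b, s, h)
#     return ctx.transpose(0, 2, 1, 3).reshape(b, s, h)
#
# out = multi_head_attention(np.random.randn(2, 5, 768), num_heads=12)
# print(out.shape)  # (2, 5, 768)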


class MoGESelfOutput(nn.Layer):
    def __init__(self, config):
        super().__init__()
        self.dense = nn.Linear(config['hidden_size'], config['hidden_size'])
        self.LayerNorm = nn.LayerNorm(config['hidden_size'], epsilon=config['layer_norm_eps'])
        self.dropout = nn.Dropout(config['hidden_dropout_prob'])

    def forward(self, hidden_states, input_tensor):
        # Projection and dropout, then residual connection + LayerNorm
        hidden_states = self.dense(hidden_states)
        hidden_states = self.dropout(hidden_states)
        hidden_states = self.LayerNorm(hidden_states + input_tensor)
        return hidden_states


class MoGEAttention(nn.Layer):
    def __init__(self, config):
        super().__init__()
        self.selfattn = MoGESelfAttention(config)
        self.output = MoGESelfOutput(config)

    def forward(self, hidden_states, attention_mask):
        selfattn_outputs = self.selfattn(hidden_states, attention_mask)
        attention_output = self.output(selfattn_outputs, hidden_states)
        return attention_output


class MoGEIntermediate(nn.Layer):
    def __init__(self, config):
        super().__init__()
        self.dense = nn.Linear(config['hidden_size'], config['intermediate_size'])
        self.act_fn = nn.GELU()

    def forward(self, hidden_states):
        hidden_states = self.dense(hidden_states)
        hidden_states = self.act_fn(hidden_states)
        return hidden_states


class MoGEOutput(nn.Layer):
    def __init__(self, config):
        super().__init__()
        self.dense = nn.Linear(config['intermediate_size'], config['hidden_size'])
        self.LayerNorm = nn.LayerNorm(config['hidden_size'], epsilon=config['layer_norm_eps'])
        self.dropout = nn.Dropout(config['hidden_dropout_prob'])

    def forward(self, hidden_states, input_tensor):
        hidden_states = self.dense(hidden_states)
        hidden_states = self.dropout(hidden_states)
        hidden_states = self.LayerNorm(hidden_states + input_tensor)
        return hidden_states



class MoGELayer(nn.Layer):
    def __init__(self, config, is_ps_expert):
        super().__init__()
        self.attention = MoGEAttention(config)

        # Modality-specific feed-forward experts
        self.poi_intermediate = MoGEIntermediate(config)
        self.poi_output = MoGEOutput(config)

        self.img_intermediate = MoGEIntermediate(config)
        self.img_output = MoGEOutput(config)

        # Shared POI-satellite expert, only present in the upper layers
        if is_ps_expert:
            self.ps_intermediate = MoGEIntermediate(config)
            self.ps_output = MoGEOutput(config)

    def forward(self, hidden_states, attention_mask, expert_selection, split_idx):
        attention_output = self.attention(hidden_states, attention_mask)

        if expert_selection == 'p_and_s':
            # Route POI tokens and image tokens to their respective experts
            poi_attention_output = attention_output[:, :split_idx]
            img_attention_output = attention_output[:, split_idx:]

            poi_intermediate_output = self.poi_intermediate(poi_attention_output)
            poi_mlp_output = self.poi_output(poi_intermediate_output, poi_attention_output)

            img_intermediate_output = self.img_intermediate(img_attention_output)
            img_mlp_output = self.img_output(img_intermediate_output, img_attention_output)

            mlp_output = paddle.concat([poi_mlp_output, img_mlp_output], axis=1)

        elif expert_selection == 'ps':
            # All tokens go through the shared POI-satellite expert
            ps_intermediate_output = self.ps_intermediate(attention_output)
            ps_mlp_output = self.ps_output(ps_intermediate_output, attention_output)
            mlp_output = ps_mlp_output

        else:
            raise ValueError(f"unknown expert_selection: {expert_selection}")

        return mlp_output




class MoGEEncoder(nn.Layer):
    def __init__(self, config):
        super().__init__()
        self.config = config
        self.ps_layer_start_idx = config['ps_layer_start_idx']
        self.layer = nn.LayerList([
            MoGELayer(config=config, is_ps_expert=(i >= self.ps_layer_start_idx))
            for i in range(config['num_hidden_layers'])
        ])

    def forward(self, hidden_states, attention_mask, split_idx):
        for i, layer_module in enumerate(self.layer):
            if i < self.ps_layer_start_idx:
                # Lower layers: separate POI / image experts
                hidden_states = layer_module(
                    hidden_states=hidden_states,
                    attention_mask=attention_mask,
                    expert_selection='p_and_s',
                    split_idx=split_idx,
                )
            else:
                # Upper layers: shared POI-satellite expert
                hidden_states = layer_module(
                    hidden_states=hidden_states,
                    attention_mask=attention_mask,
                    expert_selection='ps',
                    split_idx=split_idx,
                )
        return hidden_states
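# The routing schedule implemented by MoGEEncoder.forward -- modality-specific
# experts ('p_and_s') in the lower layers, a shared POI-satellite expert ('ps')
# from ps_layer_start_idx onward -- can be illustrated standalone:
#
# def expert_schedule(num_hidden_layers, ps_layer_start_idx):
#     """Expert selection per layer, mirroring the branch in forward()."""
#     return ['p_and_s' if i < ps_layer_start_idx else 'ps'
#             for i in range(num_hidden_layers)]
#
# # With the repo's default config (12 layers, shared experts from layer 10),
# # layers 0-9 keep separate POI and image FFN experts while layers 10-11
# # fuse both modalities through the shared expert:
# print(expert_schedule(12, 10))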


28 changes: 28 additions & 0 deletions research/ReFound/code/configuration.py


config = {
    "attention_probs_dropout_prob": 0.1,
    "num_attention_heads": 12,
    "num_hidden_layers": 12,
    "ps_layer_start_idx": 10,
    "initializer_range": 0.02,
    "vocab_size": 21128,
    "type_vocab_size": 2,
    "max_len_poi": 512,
    "max_len_token": 512,
    "hidden_size": 768,
    "intermediate_size": 3072,
    "layer_norm_eps": 1e-12,
    "hidden_dropout_prob": 0.1,
    "chunk_size_feed_forward": 0,
    "poi_cate_num": 130,
    "image_size": 256,
    "patch_size": 16,
    "num_grid_x": 16,
    "num_grid_y": 16,
    "visual_vocab_size": 8192,
    "mask_ratio_poi": 0.15,
    "mask_ratio_img": 0.4,
    "dvlfm_temp": 0.07,
}
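# A few quantities implied by the configuration above can be sanity-checked
# quickly (these are derived values, not extra settings):
#
# head_size = config["hidden_size"] // config["num_attention_heads"]  # per-head dim
# num_patches = (config["image_size"] // config["patch_size"]) ** 2   # image patches
#
# assert config["hidden_size"] % config["num_attention_heads"] == 0
# assert num_patches == config["num_grid_x"] * config["num_grid_y"]
# print(head_size, num_patches)  # 64 256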
