Commit 451ce7a

Merge pull request #40 from freelw/wangli_dev_20250617_1
update tools
2 parents 53cb213 + a5a886e commit 451ce7a

5 files changed

Lines changed: 3331 additions & 0 deletions

.gitignore

Lines changed: 6 additions & 0 deletions
@@ -14,3 +14,9 @@ vocab/fra_vocab_builder/tgt_vocab.txt
 vocab/fra_vocab_builder/src_vocab.txt
 backends/gpu/metal/metal-cpp/.DS_Store
 handwritten_recognition
+tools/vocab_builder/vocab.txt
+tools/vocab_builder/builder.dSYM/Contents/Resources/DWARF/builder
+tools/vocab_builder/builder.dSYM/Contents/Resources/Relocations/aarch64/builder.yml
+tools/vocab_builder/builder.dSYM/Contents/Info.plist
+tools/vocab_builder/builder
+tools/vocab_builder/builder.dSYM

tools/pre_process/preprocess.py

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+import re
+
+def get_timemachine():
+    with open("./timemachine.txt") as f:
+        return f.read()
+
+def preprocess(text):
+    """Defined in :numref:`sec_text-sequence`"""
+    return re.sub('[^A-Za-z]+', ' ', text).lower()
+
+if '__main__' == __name__:
+
+    with open("./timemachine_preprocessed.txt", "w") as f:
+        f.write(preprocess(get_timemachine()))
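
For reviewers who want to check what `preprocess` does without having `timemachine.txt` on hand, here is a minimal sketch that applies the same regular expression to a short string. The `sample` text is made up for illustration and is not taken from the repository. Note that leading and trailing whitespace is not stripped, so the output can end with a space.

```python
import re

def preprocess(text):
    # Same expression as in the diff: every run of non-letter characters
    # (digits, punctuation, newlines) collapses to one space, then lowercase.
    return re.sub('[^A-Za-z]+', ' ', text).lower()

# Hypothetical input, chosen only to exercise punctuation, digits and newlines.
sample = "The Time Machine, by H. G. Wells [1898]\n\nI\n"
result = preprocess(sample)
print(repr(result))  # 'the time machine by h g wells i '
```

If the trailing space matters downstream (for example when splitting into tokens by a single space), calling `.strip()` on the result would be one option; the committed script does not do this.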
