Conversation
freelw:
The issue has been fixed. The cause was that an int was used instead of an unsigned int when computing the offset into the Metal buffer. I have also restored the default batch_size to 16; the program runs correctly at that setting, but it triggers swap on my MacBook and makes the machine run a bit slowly, so I generally recommend reducing it, e.g. with -b 8. batch_size can be set somewhat larger, but not by much, because buffer offsets beyond the range of a uint are not supported. Please try the latest code from the main branch and check whether the loss decreases with the default parameters on your MacBook. @dratman
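To illustrate the kind of bug described above, here is a minimal, self-contained sketch; it is not the actual cpp-transformer code, and the sizes and names (per_sample_bytes, batch_index) are made up for the example:

#include <cstdint>
#include <cstdio>

int main() {
    // With large tensors, a byte offset into a Metal buffer can exceed
    // INT_MAX (2^31 - 1); a signed int then wraps negative (strictly
    // implementation-defined, but this is what common targets do), while
    // an unsigned 32-bit offset is still valid up to 4 GB.
    const uint64_t per_sample_bytes = 200u * 1024u * 1024u; // 200 MB, made-up figure
    const uint64_t batch_index = 15;                        // last sample of a 16-sample batch

    int      bad_offset  = (int)(batch_index * per_sample_bytes);      // wraps negative
    uint32_t good_offset = (uint32_t)(batch_index * per_sample_bytes); // correct below 4 GB

    printf("signed int offset  : %d\n", bad_offset);   // -1149239296
    printf("unsigned int offset: %u\n", good_offset);  // 3145728000
    return 0;
}

The same arithmetic explains the 4 GB ceiling mentioned above: as long as offsets are stored in a 32-bit uint, supporting buffers beyond 4 GB would require widening them to a 64-bit type such as size_t.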
fix #35

(base) ➜ cpp-transformer git:(main) ./lm
dratman:
Fortunately my MacBook has 64 GByte of shared RAM, so that is not likely to be a problem. I will continue testing in the morning.

freelw:
I'm so envious of you. By the way, should I support video memory exceeding 4GB? Hahahaha
dratman:
Dear freelw...
(I'd like to know your Chinese name, or English name if you prefer, so that
I can address you properly)
Don't be too envious of the equipment I'm fortunate to have -- you see, my
working time was in the 1980s. I am 73 years old, and my mind works much
more slowly now. You have youth, which is the best gift in the whole world.
Enjoy it!
But back to work...
Here is my latest run; the only difference after pulling your latest
version is that (as you will see below) I have added some sentences to
test_lm.txt:
--->time ./lm -e 5
corpus : ./resources/time_machine/timemachine_preprocessed.txt
epochs : 5
batch_size : 16
dropout : 0.2
gpu : 1
learning rate : 0.001
checkpoint :
max_words_cnt : 256
token_ids_size : 256
Allocating memory
for tensors : 36609236 bytes,
for c_tensors: 3194706336 bytes
for grad_tensors: 1241779004 bytes
epoch 0 : [192/224]loss : 5.99219
epoch 1 : [192/224]loss : 2.0627
epoch 2 : [192/224]loss : 0.410259
epoch 3 : [192/224]loss : 0.0856831
epoch 4 : [192/224]loss : 0.0341517
checkpoint saved : ./checkpoints/checkpoint_20250617_223325_4.bin
real 0m57.148s
user 0m19.555s
sys 0m6.291s
Tue Jun 17 22:33:26 EDT 2025
--->time ./lm -e 0 -c ./checkpoints/checkpoint_20250617_223325_4.bin
corpus : ./resources/time_machine/timemachine_preprocessed.txt
epochs : 0
batch_size : 16
dropout : 0.2
gpu : 1
learning rate : 0.001
checkpoint : ./checkpoints/checkpoint_20250617_223325_4.bin
max_words_cnt : 256
token_ids_size : 256
Allocating memory
for tensors : 36355416 bytes,
for c_tensors: 17206908 bytes
for grad_tensors: 14209596 bytes
loading from checkpoint : ./checkpoints/checkpoint_20250617_223325_4.bin
loaded from checkpoint
serving mode
test file : ./test_lm.txt
sentence : crystalline substance and now i must be explicit
convenient to speak of him was expounding a recondite matter to us his grey
eyes shone and twinkled and his usually pale face was flushed and animated
the fire burned brightly and animated the fire burned brightly and animated
the time traveller for so it will be convenient to speak
-----------------
sentence : a small shaded lamp the bright light of which
was expounding a recondite matter to us his grey eyes shone and twinkled
and his usually pale face was flushed and animated the fire burned brightly
and animated the fire burned brightly and animated the time traveller for
so it will be convenient to speak of him was expounding a
-----------------
sentence : candlesticks
wells i the time traveller for so it will be convenient to speak of him was
expounding a recondite matter to us his grey eyes shone and twinkled and
his usually pale face was flushed and animated the fire burned brightly and
animated the fire burned brightly and animated the
-----------------
sentence : expounding
a recondite matter to us his grey eyes shone and twinkled and his usually
pale face was flushed and animated the fire his usually pale face was
flushed and animated the fire burned brightly and animated the fire burned
brightly and animated the time traveller for so it will be
-----------------
sentence : the time
machine by h g wells i the time traveller for so it will be convenient to
speak of him was expounding a recondite matter to us his grey eyes shone
and twinkled and his usually pale face was flushed and animated the fire
burned brightly and animated the fire burned
-----------------
sentence : the fire burned
time traveller for so it will be convenient to speak of him was expounding
a recondite matter to us his grey eyes shone and twinkled and his usually
pale face was flushed and animated the fire burned brightly and animated
the fire burned brightly and animated the time traveller for
-----------------
sentence : between that thursday
it will be convenient to speak of him was expounding a recondite matter to
us his grey eyes shone and twinkled and his usually pale face was flushed
and animated the fire burned brightly and animated the fire burned brightly
and animated the time traveller for so it will be
-----------------
sentence : the blowing out of the candle
him was expounding a recondite matter to us his grey eyes shone and
twinkled and his usually pale face was flushed and animated the fire burned
brightly and animated the fire burned brightly and animated the time
traveller for so it will be convenient to speak of him was expounding
-----------------
sentence : between that thursday
it will be convenient to speak of him was expounding a recondite matter to
us his grey eyes shone and twinkled and his usually pale face was flushed
and animated the fire burned brightly and animated the fire burned brightly
and animated the time traveller for so it will be
-----------------
sentence : a small shaded lamp the bright light of which
was expounding a recondite matter to us his grey eyes shone and twinkled
and his usually pale face was flushed and animated the fire burned brightly
and animated the fire burned brightly and animated the time traveller for
so it will be convenient to speak of him was expounding a
-----------------
sentence : candlesticks
wells i the time traveller for so it will be convenient to speak of him was
expounding a recondite matter to us his grey eyes shone and twinkled and
his usually pale face was flushed and animated the fire burned brightly and
animated the fire burned brightly and animated the
-----------------
sentence : expounding
a recondite matter to us his grey eyes shone and twinkled and his usually
pale face was flushed and animated the fire his usually pale face was
flushed and animated the fire burned brightly and animated the fire burned
brightly and animated the time traveller for so it will be
-----------------
sentence : the time
machine by h g wells i the time traveller for so it will be convenient to
speak of him was expounding a recondite matter to us his grey eyes shone
and twinkled and his usually pale face was flushed and animated the fire
burned brightly and animated the fire burned
-----------------
sentence : the fire burned
time traveller for so it will be convenient to speak of him was expounding
a recondite matter to us his grey eyes shone and twinkled and his usually
pale face was flushed and animated the fire burned brightly and animated
the fire burned brightly and animated the time traveller for
-----------------
sentence : the blowing out of the candle
him was expounding a recondite matter to us his grey eyes shone and
twinkled and his usually pale face was flushed and animated the fire burned
brightly and animated the fire burned brightly and animated the time
traveller for so it will be convenient to speak of him was expounding
-----------------
sentence : crystalline substance and now i must be explicit
convenient to speak of him was expounding a recondite matter to us his grey
eyes shone and twinkled and his usually pale face was flushed and animated
the fire burned brightly and animated the fire burned brightly and animated
the time traveller for so it will be convenient to speak
-----------------
real 0m2.630s
user 0m0.395s
sys 0m0.074s
Tue Jun 17 22:33:53 EDT 2025
-----------------
As you can see, the processing runs very fast, and everything seems to
work correctly, except that the generated output does not seem to vary much
depending on the prompt (or "sentence"). I am continuing to investigate
that.
Your work is excellent!
Ralph
freelw:
@dratman I am amazed that you still maintain such a passion for learning; it is truly admirable! First of all, I wish you good health. May I contact you directly via the email published on your GitHub? My email is freelw81@qq.com. Regarding the issue mentioned above, that the output did not change significantly when the prompt was modified, I think there might be two reasons:
1. The training data we used only contains 256 words, which may cause the model's output to lack diversity. You can use the parameter -m 10000000 to let the program train on the complete Time Machine text. On my machine, one epoch takes 4 hours.
2. The positional encoding I used is absolute positional encoding, which should differ from standard GPT-2. I am still learning about this field and plan to try other positional encodings in future versions; see the sketch after this list.
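For readers unfamiliar with point 2, here is a minimal sketch of sinusoidal absolute positional encoding in the style of the original Transformer paper; it illustrates the general idea and is not the actual encoding code in cpp-transformer:

#include <cmath>
#include <vector>

// Fill a (max_len x d_model) table with sinusoidal absolute positional
// encodings: PE(pos, 2i)   = sin(pos / 10000^(2i/d_model)),
//            PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)).
std::vector<std::vector<float>> make_positional_encoding(int max_len, int d_model) {
    std::vector<std::vector<float>> pe(max_len, std::vector<float>(d_model, 0.0f));
    for (int pos = 0; pos < max_len; ++pos) {
        for (int i = 0; i < d_model; i += 2) {
            double div = std::pow(10000.0, (double)i / d_model);
            pe[pos][i] = (float)std::sin(pos / div);
            if (i + 1 < d_model) {
                pe[pos][i + 1] = (float)std::cos(pos / div);
            }
        }
    }
    return pe;
}

The table is added to the token embeddings before the first attention layer. GPT-2, by contrast, learns its position embeddings as ordinary parameters; both variants are absolute, so the same phrase at different offsets still produces different inputs.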
dratman:
Of course, feel free to send email to my regular address. --- By "the full
Time Machine data" do you mean the 178 KByte novel in the file
timemachine.txt? I assumed I was already training with that. To train with
the whole novel, do I just add "-m 10000000" to the training command? Four
hours per epoch is no problem. I can easily let it run overnight or even
for several days.
About my continued interest in this field: I spent two years as an
undergraduate at UC Berkeley in 1969-1971, concentrating on physics and
math. Later I found a career in both logic design and software development.
Many years and events went by. My wife and I raised two children but lost
one to a drug overdose. Over time I began to feel old, and my ability to
learn new technical material was actually declining until about 2022, when
I first found out about the astonishing architecture of the GPT-type
language models. The high-dimensional vectors I read about in connection
with GPT-3, successively modified through dozens of processing layers, were
unlike anything I could have imagined as a way of representing word,
sub-word, or character tokens. The idea of turning an integer representing
an English letter, an English word, or a Chinese character into a
thousand-dimensional vector of floating-point numbers seemed to defy common
sense. I was immediately determined to understand what was going on. I
started reading extensively and watching YouTube videos, and gradually --
to my surprise -- some of my technical acumen returned. I began playing
with Andrej Karpathy's makemore and similar small models.
I am still trying to fully grasp how it is possible that this bizarre
system of vectors and weights can understand what I write, and then reply
in ways that often help me understand some topic more quickly than by the
old methods of study.
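To make the token-to-vector step concrete, here is a toy embedding lookup; the structure and sizes are illustrative assumptions, not any particular model's implementation:

#include <cstdio>
#include <vector>

// A toy embedding table: each token id owns a row of d_model floats,
// learned during training. "Changing an integer into a vector" is just
// selecting that row; the meaning lives in the learned values.
struct Embedding {
    int d_model;
    std::vector<float> table; // vocab_size * d_model, row-major

    Embedding(int vocab_size, int d_model)
        : d_model(d_model), table((size_t)vocab_size * (size_t)d_model, 0.0f) {}

    const float* lookup(int token_id) const {
        return &table[(size_t)token_id * (size_t)d_model];
    }
};

int main() {
    Embedding emb(/*vocab_size=*/32775, /*d_model=*/1024); // sizes chosen for illustration
    const float* v = emb.lookup(42); // token id 42 -> a 1024-dimensional vector
    printf("first component of token 42's vector: %f\n", v[0]);
    return 0;
}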
freelw:
If you add the parameter -m 10000000, the program trains on the complete text, and the output changes accordingly. What you need to note is that the denominator of the "epoch" progress line becomes 32743, and token_ids_size becomes 32775.
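Assuming -m sets max_words_cnt (as the header lines in the runs above suggest), a full-text run would start out something like this; everything except the 32743 and 32775 figures is a placeholder, not real output:

--->time ./lm -e 1 -m 10000000
corpus : ./resources/time_machine/timemachine_preprocessed.txt
epochs : 1
...
max_words_cnt : 10000000
token_ids_size : 32775
...
epoch 0 : [.../32743]loss : ...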
freelw:
You're really amazing! My project was also inspired by two of Andrej Karpathy's projects: one is llm.c, and the other is micrograd. I see you've already paid attention to llm.c. It's indeed a remarkable project, but I don't think it's particularly suitable for learning how deep learning works. I highly recommend micrograd; its implementation is simple, and if you have the basic concepts of the backpropagation mechanism, it might suddenly make things click for you, just like it did for me. It's extremely inspiring: simple to implement yet perfect for learning. A sketch of the idea follows.
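Since micrograd itself is Python, here is a minimal transcription of its core idea into C++ for readers of this repo; a sketch for illustration, not production autograd:

#include <cmath>
#include <cstdio>
#include <functional>
#include <memory>
#include <vector>

// A scalar autograd node: a value, its gradient, its parents in the
// expression graph, and a closure that applies the local chain rule.
struct Value {
    double data = 0.0;
    double grad = 0.0;
    std::vector<std::shared_ptr<Value>> parents;
    std::function<void()> backward_fn = [] {};
    explicit Value(double d) : data(d) {}
};

using V = std::shared_ptr<Value>;

V val(double d) { return std::make_shared<Value>(d); }

V add(V a, V b) {
    V out = val(a->data + b->data);
    out->parents = {a, b};
    out->backward_fn = [a, b, o = out.get()] {
        a->grad += o->grad; // d(a+b)/da = 1
        b->grad += o->grad; // d(a+b)/db = 1
    };
    return out;
}

V mul(V a, V b) {
    V out = val(a->data * b->data);
    out->parents = {a, b};
    out->backward_fn = [a, b, o = out.get()] {
        a->grad += b->data * o->grad; // d(a*b)/da = b
        b->grad += a->data * o->grad; // d(a*b)/db = a
    };
    return out;
}

// Reverse-mode sweep: topologically order the graph, then apply each
// node's local rule from the output back to the leaves.
void backward(V root) {
    std::vector<Value*> topo, seen;
    std::function<void(Value*)> build = [&](Value* v) {
        for (Value* s : seen) if (s == v) return; // O(n^2), fine for a sketch
        seen.push_back(v);
        for (auto& p : v->parents) build(p.get());
        topo.push_back(v);
    };
    build(root.get());
    root->grad = 1.0;
    for (auto it = topo.rbegin(); it != topo.rend(); ++it) (*it)->backward_fn();
}

int main() {
    V x = val(3.0), w = val(-2.0), b = val(5.0);
    V y = add(mul(w, x), b); // y = w*x + b = -1
    backward(y);
    printf("y = %g, dy/dw = %g, dy/dx = %g\n", y->data, w->grad, x->grad);
    // expected: y = -1, dy/dw = x = 3, dy/dx = w = -2
    return 0;
}

Training a network is then just: run the expression forward, call backward once on the loss, and nudge every parameter against its grad.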
fix #35