
Make stream_token can break prefill, too #94

Closed

Conversation

@zeerd (Contributor) commented Mar 12, 2024

When we give gemma a large input (e.g. a file), we need a way to stop it.

@austinvhuang (Collaborator) left a comment

Thanks, the need to be able to exit makes sense.

Probably beyond the scope of this PR, but I wonder if in the long term this starts to get messy trying to track this in stream_token. It's certainly doable, but there's probably a breaking point where stream_token is trying to accomplish too much control-flow logic.

There are some related design choices around customizable hooks/callbacks in the context of trainer abstractions (e.g. in PyTorch Lightning); it might be worth looking at how other frameworks deal with this.

Specific tactical items to get this merged:

  • Can you use a separate variable to track the stream_token return value instead of overwriting token? (unless there's a rationale I'm not seeing)
  • Also, may need to merge/test with the latest state of the dev branch.

gemma.cc (Outdated)

@@ -529,7 +529,7 @@ void GenerateImpl(GemmaImpl<TConfig>& gemma, const InferenceArgs& args,

   // Prefill stops before prompt.size() - 1 since the last prompt token is the
   // first input token for generation.
-  while (pos_offset < prompt.size() - 1) {
+  while (pos_offset < prompt.size() - 1 && token != EOS_ID) {
A Collaborator commented on the diff:

Is there a reason to override the token value vs. tracking the stream_token return state with its own bool variable? Keeping the two forms of state separate seems easier to reason about than having to track "Is token EOS_ID because that's what was generated, or because stream_token returned false?"
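For illustration, a minimal sketch of the separate-variable approach being suggested (it assumes the GenerateImpl loop context from the diff above, i.e. prompt, pos, pos_offset, and a bool-returning stream_token; names are hypothetical and may differ from the actual code):

  // Sketch only: relies on the surrounding GenerateImpl locals from the diff.
  bool keep_prefilling = true;
  while (pos_offset < prompt.size() - 1 && keep_prefilling) {
    const int prompt_token = prompt[pos_offset];
    // ... run the prefill step for prompt_token ...
    // Track the callback's verdict separately, so `token` still only ever
    // means "what the model emitted", never "the caller asked to stop".
    keep_prefilling = stream_token(prompt_token, 0.0f);
    ++pos;
    ++pos_offset;
  }
  if (!keep_prefilling) return;  // caller requested an early exit during prefill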

@zeerd (Contributor, Author) replied:

You are right. I wanted to change as little code as possible, but that is not a meaningful reason here.
I will think about it some more.

@zeerd (Contributor, Author) commented Mar 22, 2024

In fact, I don't like returning in the middle of a function. However, it also seems strange to wrap the second half of the entire function in a conditional.

On the other hand, I would suggest splitting GenerateImpl into two functions, but that might be a big change.

@jan-wassenberg (Member) replied:

Yes, GenerateImpl does not fit on a single screen and I agree it would make sense to split into two functions.
We just did that with Attention as well. Would you prefer lambdas or actual functions? A function probably only makes sense if there are relatively few parameters.

@zeerd (Contributor, Author) commented Mar 25, 2024

I don't have much knowledge about neural networks, but I am thinking about a use case:

There is a source file which has 5 functions. I want gemma to help me understand what the 3rd function does.

We know that if gemma reads the whole file first, it can summarize the 3rd function better.

So, maybe I could do:

  • 1st, let gemma read the whole file but produce no output (to save time, as I do not care about that output).
  • 2nd, let gemma read the 3rd function again, summarize it, and output the summary.

I want to know: if we split GenerateImpl into two parts, would that help me implement this case?

If the answer is no (or if it would not help me save time), I think a lambda is enough.
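Purely to illustrate the "read but do not output" half of this use case (a hypothetical caller-side helper, assuming the bool(int, float) callback shape used around the diff above; not gemma.cpp's actual API): the stream_token callback can stay silent while the prompt is being consumed and still return false to abort, which is what this PR makes prefill honor.

  #include <atomic>
  #include <cstddef>
  #include <cstdio>
  #include <functional>
  #include <memory>

  // Hypothetical helper: prints nothing for roughly the first prompt_size
  // calls (the prompt being prefilled), prints raw token ids afterwards (a
  // real caller would detokenize), and returns false once `stop` is set,
  // which this PR now also honors during prefill.
  std::function<bool(int, float)> MakeStreamCallback(size_t prompt_size,
                                                     const std::atomic<bool>& stop) {
    auto seen = std::make_shared<size_t>(0);
    return [seen, prompt_size, &stop](int token, float /*prob*/) -> bool {
      if (++*seen >= prompt_size) {
        std::printf("%d ", token);  // past the prompt: show output
      }
      return !stop.load();  // false asks generation (and now prefill) to stop
    };
  }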

@jan-wassenberg (Member) replied:

Hm, I do not believe splitting the function up helps this case specifically. GenerateImpl consists of the prefill loop, bounds checks, then the decode loop. Splitting it up could improve readability and make the "early return" in this pull request easier to understand. We can see how it looks as a lambda :)
