Skip to content

Conversation

@borgr
Copy link

@borgr borgr commented Aug 11, 2025

Added rnn architecture type

Added rnn architecture type
Copy link
Collaborator

@damian1996 damian1996 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure we are need Architecture class at all? Probably model_name and provider_name can identify the model for us and I'm not sure if we need information about underlying architecture from evaluation logging point of view.

@nelaturuharsha
Copy link
Collaborator

I think the way things are setup - model_name is sufficient information for a checkpoint, and there will be edge-cases like Nemotron-H which are hybrid and this class will bloat up without adding other information.

I think it might be okay to remove it completely.

@damian1996
Copy link
Collaborator

In my opinion, this full class is not very helpful. Architecture like Harsha mentioned can be probably removed completely. Number of parameters often is included in model_name directly. If not, there probably this information is not necessary. Context_window/instruct also is dependant on the model directly (instruct often is given in model name as well).

class Configuration(BaseModel):
    architecture: Optional[Architecture] = Field(
        None, description='Model architecture type'
    )
    parameters: Optional[conint(ge=1)] = Field(
        None, description='Number of parameters in billions'
    )
    context_window: conint(ge=1) = Field(
        ..., description='Maximum context window size in tokens'
    )
    is_instruct: Optional[bool] = Field(
        None, description='Whether the model is instruction-tuned'
    )
    hf_path: Optional[str] = Field(None, description='HuggingFace model path')
    revision: Optional[str] = Field(None, description='Model revision/commit hash')

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants