LogLLM
Is your ML script too cluttered for manual logging? 😵 No worries: here's the seamless solution you've been looking for! ✨ Automate all your machine learning experiment logging with LLMs.
System Flow
A package that automates the extraction of experimental conditions from your Python scripts with GPT-4o-mini and logs the results to Weights & Biases (W&B).
Installation
```shell
git clone https://github.com/shure-dev/logllm.git
cd logllm
pip install -e .

export OPENAI_API_KEY="your-openai-api-key"
wandb login
```
Usage
```python
# In your Jupyter Notebook (sample-script.ipynb)
from logllm import log_llm

notebook_path = "sample-script.ipynb"
project_name = "sample-project"

log_llm(notebook_path, project_name)
```
How it works
LLM("Our prompt" + "Your ML script") = "Extracted experimental conditions"
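The first half of that pipeline — pulling the code out of a notebook and combining it with the prompt — can be sketched as follows. This is an illustrative sketch only; the function names are assumptions, and the actual logllm implementation may differ.

```python
import json


def read_notebook_code(notebook_path: str) -> str:
    """Concatenate the source of all code cells in a Jupyter notebook.

    A .ipynb file is JSON with a top-level "cells" list; code cells
    carry their source as a list of line strings.
    """
    with open(notebook_path, encoding="utf-8") as f:
        nb = json.load(f)
    cells = [
        "".join(cell["source"])
        for cell in nb.get("cells", [])
        if cell.get("cell_type") == "code"
    ]
    return "\n\n".join(cells)


def build_llm_input(prompt: str, script: str) -> str:
    """Combine the instruction prompt with the user's ML script."""
    return prompt + "\n\n" + script
```

The combined string is then sent to the LLM, whose JSON reply supplies the parameters logged to W&B.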
Our Prompt
```
You are an advanced machine learning experiment designer.
Extract all experimental conditions and results for logging via the wandb API.
Add your original params in your JSON response if you want to log other params.
Extract all information you can find in the given script as int, bool, or float values.
If you cannot describe a condition with an int, bool, or float value, use a list of natural language.
Give advice to improve the accuracy.
If you use natural language, the answer should be very short.
Do not include information already provided in param_name_1 for `condition_as_natural_langauge`.
Output JSON schema example:
This is just an example; change it as you want.
{{
    "method": "str",
    "dataset": "str",
    "task": "str",
    "is_advanced_method": bool,
    "is_latest_method": "",
    "accuracy": "",
    "other_param_here": "",
    "other_param_here": "",
    ...
    "condition_as_natural_langauge": ["Small dataset."],
    "advice_to_improve_acc": ["Use bigger dataset.", "Use more simple model."]
}}
```
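The doubled braces in the prompt above suggest it is used as a Python `str.format` template, where `{{` escapes to a literal `{` so the JSON schema survives formatting while `{script}`-style placeholders are substituted. A minimal sketch of that mechanism (the template text and placeholder name here are assumptions, not the actual logllm prompt):

```python
# {{ and }} render as literal braces after .format();
# {script} is replaced with the user's code.
PROMPT_TEMPLATE = (
    "Output JSON schema example:\n"
    "{{\n"
    '    "method": "str",\n'
    '    "accuracy": ""\n'
    "}}\n"
    "Script:\n"
    "{script}\n"
)

rendered = PROMPT_TEMPLATE.format(script="print('hello')")
```

After formatting, `rendered` contains single braces and the inlined script, ready to send to the model.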
Your ML Script
```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Binary classification: keep only classes 0 and 1 of the Iris dataset
iris = datasets.load_iris()
X = iris.data[iris.target != 2]
y = iris.target[iris.target != 2]

# 80/20 train/test split with a fixed seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Linear-kernel SVM
model = SVC(kernel='linear')
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
```
Extracted Experimental Conditions
```json
{
    "method": "SVC",
    "dataset": "Iris",
    "task": "classification",
    "is_advanced_method": false,
    "is_latest_method": "",
    "accuracy": 1.00,
    "kernel": "linear",
    "test_size": 0.2,
    "random_state": 42,
    "condition_as_natural_langauge": [
        "Using linear kernel on SVC model.",
        "Excluding class 2 from Iris dataset.",
        "Splitting data into 80% training and 20% testing."
    ],
    "advice_to_improve_acc": [
        "Confirm dataset consistency.",
        "Consider cross-validation for validation."
    ]
}
```
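Once the LLM returns a dict like the one above, logging it to W&B typically means sending static parameters to the run config and numeric results as metrics. The split below is a sketch under an assumed convention (result-like keys become metrics, everything else config); the actual logllm logic may differ.

```python
def split_conditions(conditions: dict) -> tuple[dict, dict]:
    """Split extracted conditions into W&B config params and logged metrics.

    Assumption: keys naming results (accuracy, loss, ...) are metrics;
    all other keys are treated as experiment configuration.
    """
    metric_keys = {"accuracy", "loss", "f1", "precision", "recall"}
    config = {k: v for k, v in conditions.items() if k not in metric_keys}
    metrics = {k: v for k, v in conditions.items() if k in metric_keys}
    return config, metrics


# Typical W&B usage with the split (requires `wandb login` beforehand):
# import wandb
# run = wandb.init(project="sample-project")
# run.config.update(config)
# run.log(metrics)
```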
Contributing
Contributions are welcome! If you have suggestions or improvements, please feel free to submit an issue or a pull request.
License
This project is licensed under the MIT License.