LLM Fine-Tuning Experiment#

An experiment in fine-tuning an open-source LLM to generate XML documentation comments for C# code. The goal was to determine whether a small, locally-runnable model could reliably produce formatted code comments,reducing reliance on larger hosted models for a narrow, repetitive task.

Approach#

Fine-tuned Llama 3.1 8B using Unsloth with QLoRA (4-bit quantization) on a free Google Colab T4 GPU. The training data consisted of approximately 1,000 rows of practice questions and 1,000 rows of commented C# code formatted as instruction-response pairs.

Training Configuration#

Parameter	Value
Base model	Llama 3.1 8B (4-bit quantized)
Method	QLoRA with LoRA rank 16
Target modules	`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
Training steps	70
Batch size	2 (with 4 gradient accumulation steps)
Learning rate	2e-4
Hardware	Tesla T4 (free Colab tier)

Resource Usage#

Metric	Value
Training time	~7.7 minutes
Peak memory	7.9 GB of 14.7 GB available
Memory for training	1.9 GB (13% of GPU)
Trainable parameters	41.9M (LoRA adapters only)

Results#

The fine-tuned model could generate structurally correct XML documentation comments for C# code:

Input:  C# method with no comments
Output: /// <summary>, /// <param>, /// <returns> tags
        with contextually appropriate descriptions

Example output from the model given a simple C# program:

/// <summary>
/// Demonstrates a simple interest calculation.
/// </summary>
class Program
{
    /// <summary>
    /// The entry point for the program.
    /// </summary>
    /// <param name="args">An array of command-line arguments.</param>
    static void Main(string[] args) { ... }

    /// <summary>
    /// Calculates the simple interest given principal and rate.
    /// </summary>
    ...
}

Code Quality Review Script#

A separate script was written to use the model for automated quality review of C# code blocks across a codebase. The script walks every .cs file, chunks it into 50-line segments, and feeds each chunk to the model with a structured prompt covering five review categories:

Security vulnerabilities (injection points, credential exposure, Azure-specific concerns)
Performance issues (redundant code, inefficient patterns, concurrency bottlenecks)
Logical errors and edge cases
Maintainability and readability
C# / .NET / Azure best practices

Results are written to a CSV with file path, line range, and analysis for each chunk.

Full script

#!/usr/bin/env python
import os
import glob
import csv
import torch
from unsloth import FastLanguageModel

# Configuration
CODEBASE_DIR = os.getenv("CODEBASE_DIR", os.path.join(os.getcwd(), "codebase"))
FILE_PATTERNS = ["**/*.cs"]
LINES_PER_CHUNK = 50
MAX_NEW_TOKENS = 128
TEMPERATURE = 0.7
TOP_P = 0.9
OUTPUT_FILE = "analysis_results.csv"
MODEL_NAME = "unsloth/mistral-7b-v0.3-bnb-4bit"


def chunk_file_lines(file_path, lines_per_chunk=50):
    """Yield (chunk_text, start_line, end_line) for every chunk of lines."""
    with open(file_path, "r", encoding="utf-8", errors="ignore") as f:
        lines_buffer = []
        line_numbers = []
        for idx, line in enumerate(f, start=1):
            lines_buffer.append(line)
            line_numbers.append(idx)
            if len(lines_buffer) >= lines_per_chunk:
                yield "".join(lines_buffer), line_numbers[0], line_numbers[-1]
                lines_buffer = []
                line_numbers = []
        if lines_buffer:
            yield "".join(lines_buffer), line_numbers[0], line_numbers[-1]


def main():
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=MODEL_NAME,
        max_seq_length=8000,
        load_in_4bit=True,
        dtype=None,
    )
    model = FastLanguageModel.for_inference(model)
    model.eval()
    torch.set_grad_enabled(False)

    results = []

    for pattern in FILE_PATTERNS:
        matching_files = glob.glob(
            os.path.join(CODEBASE_DIR, pattern), recursive=True
        )
        for file_path in matching_files:
            if not os.path.isfile(file_path):
                continue
            for chunk_idx, (chunk_text, start_line, end_line) in enumerate(
                chunk_file_lines(file_path, LINES_PER_CHUNK)
            ):
                prompt = f"""
You are an expert code reviewer for C#, .NET, and Azure.

Review the code snippet below. If context is missing (dependencies, hosting model, Azure service, config, auth model), state assumptions explicitly and do not invent details.

Return your review in this exact structure:

1) Summary (2-4 bullets)
2) Security
   - Finding: ...
   - Severity: Low/Med/High/Critical
   - Evidence: cite relevant line(s) or exact code fragment
   - Fix: concrete change (code-level when possible)
3) Performance
   - (same sub-structure)
4) Correctness (logic, edge cases)
   - (same sub-structure)
5) Maintainability/Readability
   - (same sub-structure)
6) C#/.NET/Azure best practices
   - (same sub-structure)
7) Suggested tests (unit/integration) to validate fixes (bullets)

Code snippet:
{chunk_text}
""".strip()

                inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
                generated_ids = model.generate(
                    **inputs,
                    max_new_tokens=MAX_NEW_TOKENS,
                    temperature=TEMPERATURE,
                    top_p=TOP_P,
                    do_sample=True,
                )
                raw_output = tokenizer.decode(
                    generated_ids[0], skip_special_tokens=True
                ).strip()

                if "Analysis:" in raw_output:
                    output_text = raw_output.split("Analysis:", 1)[-1].strip()
                else:
                    output_text = raw_output

                results.append({
                    "file": file_path,
                    "chunk_index": chunk_idx,
                    "start_line": start_line,
                    "end_line": end_line,
                    "analysis": output_text,
                })

    with open(OUTPUT_FILE, "w", encoding="utf-8", newline="") as csvfile:
        writer = csv.DictWriter(
            csvfile,
            fieldnames=["file", "chunk_index", "start_line", "end_line", "analysis"],
        )
        writer.writeheader()
        for row in results:
            writer.writerow(row)

    print(f"Analysis complete! Results saved to '{OUTPUT_FILE}'.")


if __name__ == "__main__":
    main()

Outcome#

This was an exploratory experiment. The model produced reasonable output for the narrow task of XML documentation, but we did not end up using it in production. The code quality review script demonstrated a second use case, running the model against an entire codebase to flag potential issues across security, performance, and maintainability. Still, the work informed decisions about where local models fit (and don’t fit) in the broader pipeline.