C++ · CUDA · Infra · Open Source

Title

Nott

Infrastructure

Modern C++/CUDA deep-learning framework focused on fast prototyping and explicit control over kernels, memory layout, and runtime execution.

Description

A C++-first deep-learning framework built on LibTorch with CUDA-oriented runtime controls, providing a strongly typed, graph-based API for building, training, and evaluating models, with reproducibility and low-jitter execution as primary goals.

Latency

5–10% faster than LibTorch

Latency jitter (CV)

0.001 [Nott] vs 0.142 [LibTorch]

Lines vs LibTorch

~50% fewer

Prebuilt Layers

24

Prebuilt Transformers

8

Prebuilt Activations

14

Prebuilt Losses

12

Prebuilt Optimizers

12

Prebuilt Evaluation Metrics

105

Developed Description

Nott is an infrastructure project: a reusable C++ deep-learning framework that layers a cohesive, strongly typed API over LibTorch while preserving explicit control over GPU execution and runtime behavior. The goal is to keep research iteration close to the metal (predictable performance, controllable memory layouts, explicit execution modes) without sacrificing safety, composability, or reproducibility across repeated training runs.

Architecture and authoring center on a directed acyclic graph (DAG) representation of models. Networks are expressed as layers and higher-order blocks that compose into linear or non-linear topologies. This graph-first design supports both rapid composition of standard architectures and explicit wiring of more complex structures, while maintaining a consistent, descriptor-driven configuration surface across the framework.

A core differentiator is the deliberate exposure of GPU-oriented runtime controls that typically get abstracted away. Nott is designed to make performance-critical knobs explicit and auditable: CUDA Graph execution to reduce launch overhead and stabilize step time, Tensor Core-oriented execution where applicable, and controllable memory formats/layouts (e.g., NHWC vs NCHW) to match kernel expectations and hardware efficiency. These controls are intended to reduce latency variance (jitter) and improve predictability, which is essential both for fair experimental comparisons and for production-like constraints.

Nott also treats the training workflow as a first-class system rather than a loose collection of scripts. Data ingestion and preprocessing are integrated via dataset loaders, data-manipulation utilities (augmentation, normalization, splitting, shuffling), and data checks that keep assumptions explicit and help prevent silent dataset issues. Training and evaluation are exposed through unified APIs with configurable runtime options (batching, validation behavior, mixed-precision and graph modes when enabled) and a large metrics suite to standardize measurement across experiments.

For monitoring and diagnosis, Nott integrates visualization and telemetry through a GNUplot wrapper to keep experiment observability close to execution. For model understanding and validation, interpretability tools (e.g., Grad-CAM, LIME, Shapley-style explanations) are included to support debugging, failure analysis, and research reporting.
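To make the graph-first, descriptor-driven authoring style concrete, here is a minimal sketch of how a DAG of typed layer descriptors could be composed and wired in C++. Every name in the nott:: namespace below (Graph, LinearDesc, add_linear, connect, compile) is an illustrative assumption, not Nott's actual API.

    // Hypothetical sketch of descriptor-driven, graph-first model authoring.
    // All nott::* names are illustrative assumptions, not the real Nott API.
    #include <cstdint>
    #include <iostream>
    #include <string>
    #include <utility>
    #include <vector>

    namespace nott {

    // Typed descriptors keep layer configuration explicit and auditable.
    struct LinearDesc { int64_t in_features; int64_t out_features; bool bias = true; };
    struct ReluDesc   { bool inplace = false; };

    class Graph {
    public:
        // Nodes are created from descriptors and addressed by integer ids.
        int add_linear(const std::string& name, const LinearDesc&) { return add(name); }
        int add_relu(const std::string& name, const ReluDesc&)     { return add(name); }

        // Explicit wiring supports non-linear topologies (skips, branches).
        void connect(int from, int to) { edges_.push_back({from, to}); }

        // A real framework would shape-check and verify acyclicity here;
        // this toy version just prints the wiring.
        void compile() const {
            for (const auto& [from, to] : edges_)
                std::cout << names_[from] << " -> " << names_[to] << '\n';
        }

    private:
        int add(const std::string& name) {
            names_.push_back(name);
            return static_cast<int>(names_.size()) - 1;
        }
        std::vector<std::string> names_;
        std::vector<std::pair<int, int>> edges_;
    };

    } // namespace nott

    int main() {
        nott::Graph g;

        // Linear chain fc1 -> act1 -> fc2; branches or skips would simply add
        // extra connect() calls on the same node ids.
        int fc1 = g.add_linear("fc1", {784, 256});
        int act = g.add_relu("act1", {});
        int fc2 = g.add_linear("fc2", {256, 10});

        g.connect(fc1, act);
        g.connect(act, fc2);

        g.compile();
        return 0;
    }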
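The runtime controls described above correspond to concrete LibTorch/CUDA primitives. The sketch below shows what those knobs look like in raw LibTorch C++ (TF32 Tensor Core paths, channels-last memory format, and CUDA Graph capture/replay); it illustrates the underlying mechanisms such a framework can build on, not Nott's own wrappers.

    // Sketch of the LibTorch/CUDA primitives behind the runtime controls above:
    // Tensor Core paths, memory-format control, CUDA Graph capture/replay.
    // Raw LibTorch C++ API, not Nott's wrappers around it.
    #include <torch/torch.h>
    #include <ATen/cuda/CUDAGraph.h>
    #include <c10/cuda/CUDAGuard.h>
    #include <c10/cuda/CUDAStream.h>

    int main() {
        torch::NoGradGuard no_grad;  // inference-only sketch

        // Tensor Core-oriented execution: allow TF32 math in cuBLAS/cuDNN kernels.
        at::globalContext().setAllowTF32CuBLAS(true);
        at::globalContext().setAllowTF32CuDNN(true);

        // Memory layout control: keep activations NHWC (channels-last) so
        // convolution kernels see their preferred layout.
        auto input = torch::randn({8, 3, 224, 224},
                                  torch::device(torch::kCUDA).dtype(torch::kFloat))
                         .contiguous(torch::MemoryFormat::ChannelsLast);

        auto conv = torch::nn::Conv2d(torch::nn::Conv2dOptions(3, 64, 3).padding(1));
        conv->to(torch::kCUDA);

        // Warm-up iteration so lazy initialization does not happen during capture.
        auto out = conv->forward(input);
        torch::cuda::synchronize();

        // CUDA Graph execution: capture the step once on a side stream, then
        // replay it to cut per-step launch overhead and stabilize step time.
        at::cuda::CUDAGraph graph;
        auto stream = c10::cuda::getStreamFromPool();
        {
            c10::cuda::CUDAStreamGuard guard(stream);
            graph.capture_begin();
            out = conv->forward(input);   // work recorded into the graph
            graph.capture_end();
        }

        // Replays reuse the captured kernels; inputs are updated in place
        // between steps rather than reallocated.
        for (int step = 0; step < 100; ++step) {
            graph.replay();
        }
        torch::cuda::synchronize();
        return 0;
    }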
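The latency-jitter figure quoted earlier is a coefficient of variation: the standard deviation of per-step latency divided by the mean. For reference, here is a small, self-contained C++ sketch of how such a statistic can be computed from step timings; it is illustrative only and not taken from Nott's benchmarking code, and the timed workload is a stand-in.

    // Illustrative computation of latency jitter as a coefficient of variation
    // (CV = stddev / mean of per-step latencies). The "training step" here is
    // a placeholder; a real benchmark would time a synchronized GPU step.
    #include <chrono>
    #include <cmath>
    #include <iostream>
    #include <numeric>
    #include <thread>
    #include <vector>

    int main() {
        std::vector<double> step_ms;

        // Time a placeholder step many times.
        for (int step = 0; step < 200; ++step) {
            auto t0 = std::chrono::steady_clock::now();
            std::this_thread::sleep_for(std::chrono::milliseconds(2));  // stand-in work
            auto t1 = std::chrono::steady_clock::now();
            step_ms.push_back(std::chrono::duration<double, std::milli>(t1 - t0).count());
        }

        const double mean =
            std::accumulate(step_ms.begin(), step_ms.end(), 0.0) / step_ms.size();

        double var = 0.0;
        for (double x : step_ms) var += (x - mean) * (x - mean);
        var /= step_ms.size();

        const double cv = std::sqrt(var) / mean;  // lower CV => more stable step time
        std::cout << "mean(ms)=" << mean << "  CV=" << cv << '\n';
        return 0;
    }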

Images