DRUT

AI Infrastructure Engineer (Lab & Benchmarks): Onsite, Nashua, NH

Overview

This is a full-time, onsite position based in Nashua, NH (3–5 days per week). In this role, you’ll work with our lab team to build and scale the AI development, test, and validation environment. Responsibilities include rack-and-stack, high-speed networking, optical interconnects, bare-metal bring-up, and reproducible LLM/GPU benchmarking to support Engineering, QA, and Marketing. This is a hands-on, fast-paced startup role where you’ll take on multiple responsibilities, collaborate closely across teams, and contribute directly to advancing our AI infrastructure.

Key Responsibilities

  • Lab operations: Assist with racking servers/GPUs/switches/PDUs; neat LC/MPO fiber work; labeling, inventory, ESD/safety; maintain as-built diagrams and runbooks.
  • Platform bring-up: Partner on BIOS/BMC configuration; firmware updates (GPU/NIC/storage); NUMA tuning; validate PCIe topology and links.
  • OS provisioning & automation: Contribute to automated bare-metal installs (PXE/iPXE, kickstart, cloud-init) and improve Ansible playbooks.
  • VM provisioning & orchestration: Support creation, management, and benchmarking of virtualized environments (OpenStack, VMware, KVM); validate GPU passthrough, SR-IOV, and network/storage integration.
  • AI stack setup: Install/verify CUDA/NCCL (and/or ROCm), PyTorch, vLLM with Ray; optional Slurm/Kubernetes for multi-node runs.
  • Benchmarking & validation: Co-design and run repeatable tests (throughput, tokens/s, latency, utilization, power/thermals) on single- and multi-GPU nodes; track and visualize results.
  • Troubleshooting: With the team, triage performance issues across GPU/PCIe/NVLink/RoCE/Ethernet, GPUDirect storage, and CPU/NUMA; document fixes and prevention steps.
  • Reporting & enablement: Produce concise reports and reproducible configs for Dev/QA/Marketing (methods, graphs, exact steps); help prep demo/POC images and scripts. 
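
As an illustration of the kind of repeatable benchmark summary described above, here is a minimal Python sketch. It is not Drut's tooling: the `Run` record, its fields, and the `summarize` helper are hypothetical names chosen for this example, and it assumes requests were issued one at a time so that total wall-clock time is the sum of per-request times.

```python
import statistics
from dataclasses import dataclass


@dataclass
class Run:
    """One benchmark request (hypothetical schema): tokens generated and seconds taken."""
    tokens: int
    seconds: float


def summarize(runs):
    """Aggregate throughput and latency figures for a list of sequential runs."""
    if not runs:
        raise ValueError("need at least one run")
    latencies = [r.seconds for r in runs]
    total_tokens = sum(r.tokens for r in runs)
    # Assumption: runs were sequential, so total time is the sum of latencies.
    total_seconds = sum(latencies)
    return {
        "runs": len(runs),
        "tokens_per_s": total_tokens / total_seconds,
        "latency_median_s": statistics.median(latencies),
        "latency_max_s": max(latencies),
    }


if __name__ == "__main__":
    sample = [Run(tokens=100, seconds=1.0), Run(tokens=200, seconds=2.0)]
    print(summarize(sample))
```

Recording the exact inputs alongside a summary like this (model, batch size, driver and library versions) is what makes a result reproducible for Dev, QA, and Marketing.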

Required Qualifications

  • 3–7+ years in lab/validation/SRE/solutions roles supporting Linux on bare metal.
  • Hands-on with NVIDIA GPUs (driver, CUDA, NCCL) and PyTorch; able to stand up an LLM serving stack with team guidance.
  • Solid Linux administration and networking fundamentals (VLANs, LACP, MTU/jumbo frames, routing basics).
  • Automation with Bash and/or Python; config management with Ansible.
  • Familiar with BMC/IPMI/Redfish, firmware lifecycles, and disciplined documentation.
  • Able to lift/move ~40–50 lbs; onsite lab presence required.

Preferred Skills

  • Fabrics: 100–400G Ethernet/InfiniBand (ConnectX-6/7), RoCE/IB, SR-IOV; iperf experience.
  • Storage: GPUDirect Storage, NVMe/NVMe-oF, fio profiling, latency tuning.
  • Optics/Photonics: MPO/LC cleaning/inspection, basic optical power checks; exposure to co-packaged optics or photonic links is a plus.
  • PCIe & switches: Practical experience with PCIe topology and hot-plug; familiarity with PCIe switches is a bonus.
  • Scheduling/containers: Slurm, Kubernetes GPU operators, Helm.
  • Observability: Prometheus/Grafana, basic perf tracing.
  • AMD GPUs/ROCm, NVIDIA MIG/MPS; CI for benchmarks.

Join Our Team

If you're interested in one of our open positions, start by applying here and attaching your resume.

Drut Technologies Inc.

200 Innovative Way, Suite 1390, Nashua, New Hampshire 03062

©2025 Drut Technologies Inc. All Rights Reserved.