I started with a simple home prototype monitoring a handful of spots where temperature matters and where something going wrong may be costly. My goal was continuous readings, meaningful history, and an alarm when something drifts. Once it worked, the more interesting target was boats: engine compartments, cabins, and bilges that need to stay in range while vessels sit unattended. I want an alert before a problem becomes expensive, and so do many owners.
To make that possible I needed efficient sensors and the infrastructure to go with them. The standard path for this kind of project runs through an MQTT broker, TLS, a WebSocket bridge, Kafka or Kinesis for buffering, container-based consumers, and Kubernetes to hold it all together. That stack may make sense at high commercial scale with a dedicated team. But for a prototype evolving toward moderate scale, the infrastructure needs to prove itself before time and budget are committed; otherwise the risk is high.
The stack to avoid
The MQTT + Kafka + containers model isn't bad software. It's just heavy. An MQTT broker needs to be running all the time, authenticated, load-balanced if you care about availability, and patched regularly. TLS on embedded devices requires a PKI (managed or self-hosted) and certificates that expire on a schedule your devices need to follow. Kafka (or its managed equivalents) adds another deployment with its own operational surface. By the time you've stood up consumers in containers, you have four or five always-on services to keep healthy, just to receive numbers from sensors that fire once every ten seconds.
The cost model is equally unfriendly. AWS IoT Core charges per connection-minute and per message, so at small scale it is inexpensive but requires a rules engine, certificate provisioning per device, and a downstream processing stack to actually do anything useful with the data. A managed Kafka cluster (Amazon MSK) starts around $600/month in broker instance charges alone for a minimal 3-broker deployment, before storage or data transfer. MSK Serverless removes broker provisioning but still charges per cluster-hour, so the idle cost doesn't go away. Kubernetes on EKS adds $0.10/hour for the control plane — roughly $73/month before a single worker node. Container runtimes, worker nodes, and the people-time to operate them raise the floor further. None of this scales down to zero when nothing is happening — it all runs whether the sensors are sending or not.
For a prototype, the risk is that you build out this stack to get something working, but then the operational and cost overheads make it impractical to maintain or scale. You have to prove the value of the data before committing to the infrastructure, but you also need the infrastructure to prove the value of the data. It's a chicken-and-egg problem.
Goals
Hard requirements:
- Sample temperature continuously; send readings to the backend in configurable batches
- Historical charts, plus alarms when something goes out of range or offline
- Battery life measured in years, not months
Nice-to-haves:
- BOM cost low enough that deploying additional sensors doesn't require a business case
- Transport that can be secured, with no hardcoded credentials and no per-device certificate management
- Backend cost that scales sub-linearly — near-zero at small scale, still cheap at hundreds of sensors
Phase 1: the CYD prototype
Before committing to a power-optimized design, I wanted to validate the full system end to end: firmware → WireGuard → UDP Gateway → backend → charts. The "Cheap Yellow Display" board — an ESP32-2432S028R with a built-in 2.8" touch display — is ideal for this phase. The touchscreen makes the firmware state machine visible without needing a serial monitor attached; you can watch WiFi connect, WireGuard negotiate, and packets being sent, all on the device itself.
Hardware is minimal:
- CYD board (ESP32 + 240×320 TFT + resistive touchscreen, all in one unit)
- DS18B20 temperature sensor wired to GPIO 27 with a 4.7kΩ pull-up to VCC
- LDR on GPIO 34 for ambient light sensing (used to dim the backlight automatically)
The firmware is built with PlatformIO and follows a straightforward state machine:
- STATE_INIT: load config.txt from LittleFS
- STATE_WIFI_BEGIN: connect to WiFi
- STATE_TIME_CONFIG: sync time via NTP
- STATE_WIREGUARD_BEGIN: initialize the WireGuard tunnel
- STATE_UDP_BEGIN: start UDP telemetry
- STATE_READY: normal operation, sample and send
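The progression through these states can be sketched in plain C++. The state names come from the list above; the transition rule (advance on success, stay put and retry on failure) is an assumption for illustration, not necessarily what the firmware does.

```cpp
#include <cassert>

// State names are taken from the list above.
enum class State {
    STATE_INIT,
    STATE_WIFI_BEGIN,
    STATE_TIME_CONFIG,
    STATE_WIREGUARD_BEGIN,
    STATE_UDP_BEGIN,
    STATE_READY
};

// Advance one step when the current stage succeeded. On failure, stay in the
// same state so the caller can retry (retry-in-place is an assumption).
State nextState(State current, bool stageSucceeded) {
    if (!stageSucceeded || current == State::STATE_READY) {
        return current;
    }
    return static_cast<State>(static_cast<int>(current) + 1);
}

// Human-readable label, e.g. for drawing the current stage on the display.
const char* stateName(State s) {
    switch (s) {
        case State::STATE_INIT:            return "STATE_INIT";
        case State::STATE_WIFI_BEGIN:      return "STATE_WIFI_BEGIN";
        case State::STATE_TIME_CONFIG:     return "STATE_TIME_CONFIG";
        case State::STATE_WIREGUARD_BEGIN: return "STATE_WIREGUARD_BEGIN";
        case State::STATE_UDP_BEGIN:       return "STATE_UDP_BEGIN";
        case State::STATE_READY:           return "STATE_READY";
    }
    assert(false && "unknown state");
    return "?";
}
```

Keeping the transition in one small function is what makes it cheap to show the current stage on the touchscreen: the display code only needs `stateName()`.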
Configuration lives in a config.txt file stored in LittleFS, the on-device filesystem. It holds WiFi credentials, the WireGuard private key, the peer public key, and the Gateway endpoint. There are no hardcoded secrets in the firmware binary. Uploading a new config.txt via PlatformIO's filesystem upload is all it takes to reconfigure a device; no firmware change is needed.
WireGuard is handled by the WireGuard-ESP32 library. Key generation uses standard WireGuard tooling on any desktop:
    wg genkey > privatekey
    wg pubkey < privatekey > publickey
The device private key goes in config.txt. The public key is registered as a peer in the
WireGuard Listener configuration on UDP Gateway. No certificate authority, no renewal schedule.
One practical wrinkle: when the ESP32 establishes a full WireGuard tunnel, all UDP traffic — including NTP — goes through it. The device clock never syncs because public NTP servers on port 123 aren't reachable past the Listener. The fix, described in detail in Adding WireGuard-Tunneled NTP, is to handle NTP requests inside the Lambda destination. It's a 48-byte exchange, straightforward to implement, and means the device syncs its clock through the same tunnel it uses for telemetry — no external NTP dependency.
The full CYD example — firmware, wiring diagram, and WireGuard configuration — is at proxylity/examples/wireguard-iot-device.
Phase 2: the XIAO ESP32-C3 for the field
With the backend validated end-to-end, the next step was something small enough to mount in a junction box and run for years on a battery. The Seeed XIAO ESP32-C3 fits: postage-stamp sized, around $4, more than enough processing power. No display, and WireGuard is off the table — the handshake latency after each deep sleep wake costs more in battery life than the encryption is worth for this use case. So just plain fire-and-forget UDP.
The wiring is slightly more involved than on the CYD because sensor power is actively managed:
- DS18B20 GND → GPIO3 (driven LOW during measurement, floated otherwise)
- DS18B20 DATA → GPIO4 (4.7kΩ pull-up required — the internal pull-up is too weak for reliable OneWire)
- DS18B20 VCC → GPIO5 (driven HIGH during measurement, floated otherwise)
Powering the sensor completely off between readings prevents current leakage through the GPIO pins during deep sleep. This is one of several small optimizations that together produce an average consumption of about 139 mAh/month — giving roughly 1.8 years of runtime on a 3000 mAh 18650 cell.
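The runtime figure follows directly from the two numbers in the paragraph above. A minimal sketch of the arithmetic, assuming a 730-hour month and ignoring self-discharge, temperature derating, and regulator cutoff voltage (all of which shorten real-world runtime):

```cpp
#include <cassert>

// Figures from the text: 139 mAh/month average draw, 3000 mAh 18650 cell.
constexpr double kMonthlyDrawMah = 139.0;
constexpr double kCellCapacityMah = 3000.0;

// Runtime in months for a given capacity and average monthly consumption.
double runtimeMonths(double capacityMah, double monthlyDrawMah) {
    assert(monthlyDrawMah > 0.0);
    return capacityMah / monthlyDrawMah;
}

// Average current in mA implied by a monthly draw (730-hour month assumed).
double averageCurrentMa(double monthlyDrawMah) {
    return monthlyDrawMah / 730.0;
}
```

With the values above, `runtimeMonths(kCellCapacityMah, kMonthlyDrawMah)` is about 21.6 months, or roughly 1.8 years, and the implied average current is about 0.19 mA.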
The firmware structure is unusual if you're used to traditional Arduino sketches: loop() never runs. The chip fully resets on every wake from deep sleep. All state that needs to survive across cycles lives in RTC_DATA_ATTR variables, which are preserved across deep sleep but lost on full power-off:
    RTC_DATA_ATTR int bootCount = 0;
    RTC_DATA_ATTR float readings[SAMPLE_COUNT * 2]; // alternating external/internal
    RTC_DATA_ATTR IPAddress cachedLocalIP;
    RTC_DATA_ATTR IPAddress cachedGatewayIP;
    RTC_DATA_ATTR IPAddress cachedSubnetMask;
Every ten seconds, the chip wakes, reads the DS18B20 external temperature and the ESP32-C3 internal temperature, appends both to the RTC buffer, and returns to deep sleep. On the 30th reading, it also brings up WiFi, sends the batch, resets the counter, and sleeps again, so WiFi is active for about 64 milliseconds every five minutes.
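The sample-then-batch cycle can be simulated on a desktop. In this sketch an ordinary struct stands in for the RTC-retained variables, and using `bootCount` as the buffer index is an assumption; the actual firmware may track its position differently.

```cpp
#include <cassert>

constexpr int SAMPLE_COUNT = 30;  // readings per batch, from the text

// Stand-in for the RTC_DATA_ATTR variables. On the real chip these survive
// deep sleep; here they are a plain struct so the logic runs anywhere.
struct RetainedState {
    int bootCount = 0;
    float readings[SAMPLE_COUNT * 2] = {};  // alternating external/internal
};

// One wake cycle: store a pair of readings and report whether the batch is
// now full and should be transmitted. The counter resets after a full batch.
bool onWake(RetainedState& rtc, float externalC, float internalC) {
    assert(rtc.bootCount >= 0 && rtc.bootCount < SAMPLE_COUNT);
    rtc.readings[rtc.bootCount * 2] = externalC;
    rtc.readings[rtc.bootCount * 2 + 1] = internalC;
    rtc.bootCount++;
    if (rtc.bootCount < SAMPLE_COUNT) {
        return false;  // not full yet: back to deep sleep, WiFi stays off
    }
    rtc.bootCount = 0;  // batch complete: caller sends readings[] over UDP
    return true;
}
```

Calling `onWake()` sixty times returns `true` exactly twice, on the 30th and 60th calls, which is the point at which the firmware would turn on WiFi.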
WiFi reconnection uses static IP caching. The first boot performs DHCP and saves the resulting IP, subnet, gateway, and DNS to RTC memory. Every subsequent connection skips DHCP entirely and completes in roughly 63 milliseconds rather than the typical 2 seconds. That matters: every extra second at WiFi-active power levels (~65 mA) is paid again on each of the roughly 288 transmissions per day, and it adds up over years of runtime.
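To see why the connection time matters, the charge spent connecting can be worked out from the figures in the text (65 mA, 2 seconds versus 63 milliseconds, one transmission every five minutes). This is a rough sketch that assumes a 30-day month and a constant current during the connection phase:

```cpp
#include <cassert>

// Charge used by one WiFi connection phase, in mAh.
double connectChargeMah(double currentMa, double seconds) {
    assert(currentMa >= 0.0 && seconds >= 0.0);
    return currentMa * seconds / 3600.0;
}

// Monthly charge spent on connecting alone, over a 30-day month (assumption).
double monthlyConnectMah(double currentMa, double seconds, int sendsPerDay) {
    return connectChargeMah(currentMa, seconds) * sendsPerDay * 30;
}
```

At 288 transmissions per day, `monthlyConnectMah(65.0, 2.0, 288)` comes to about 312 mAh/month, more than twice the whole 139 mAh/month budget, while `monthlyConnectMah(65.0, 0.063, 288)` is just under 10 mAh/month.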
The UDP packet itself is a compact binary format:
- Bytes 0-7: DS18B20 sensor address (8-byte unique hardware ID, eliminates manual device assignment)
- Bytes 8-11: uint32_t sample count (30)
- Bytes 12+: 30 pairs of float, external temperature then internal temperature, in °C
Total: 252 bytes per transmission, once every five minutes.
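A sketch of how a packet with this layout could be assembled. The function and constant names are hypothetical, and the fields are copied in the host's native byte order, which is little-endian on the ESP32-C3; the byte order on the wire is an assumption, since the text does not state it.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kSampleCount = 30;                      // pairs per packet
constexpr std::size_t kFloatCount = kSampleCount * 2;         // external + internal
constexpr std::size_t kPacketSize = 8 + 4 + kFloatCount * 4;  // address + count + floats
static_assert(kPacketSize == 252, "layout must total 252 bytes");
static_assert(sizeof(float) == 4, "packet format assumes 32-bit floats");

// Build one packet: 8-byte sensor address, uint32_t sample count, then
// alternating external/internal readings copied straight from the buffer.
std::array<std::uint8_t, kPacketSize> buildPacket(const std::uint8_t address[8],
                                                   const float* readings,
                                                   std::size_t floatCount) {
    assert(floatCount == kFloatCount);
    std::array<std::uint8_t, kPacketSize> packet{};
    const std::uint32_t count = static_cast<std::uint32_t>(kSampleCount);
    std::memcpy(packet.data(), address, 8);
    std::memcpy(packet.data() + 8, &count, sizeof(count));
    std::memcpy(packet.data() + 12, readings, floatCount * sizeof(float));
    return packet;
}
```

Because the readings buffer already alternates external and internal values, the payload is a single `memcpy` with no per-sample formatting work.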
The full project — including power consumption analysis, wiring notes, and the debug mode trick that simulates deep sleep without losing the serial connection — is at mlhpdx/xiao-esp32c3-wifi-temp-sensor.
Security: right-sized for each platform
The CYD uses a WireGuard tunnel. It has the CPU headroom, it's typically plugged into mains power, and the full tunnel approach means the device behaves as if it's on a private network — NTP resolves, the Lambda can respond to the device, and all communication is encrypted and authenticated with perfect forward secrecy. No certificates, no renewal schedule, no CA to manage.
WireGuard is also significantly more CPU-efficient than TLS. ChaCha20-Poly1305 is designed for speed on devices without dedicated crypto accelerators. Building the raptor CLI, we measured WireGuard-over-UDP at roughly 10x less CPU time per transfer than TLS-over-HTTPS at the same data size — less energy per packet, which matters even on mains-powered hardware.
The XIAO uses plain UDP. The reason is handshake latency. After waking from deep sleep, re-establishing a WireGuard tunnel requires a handshake exchange before any data flows. At WiFi-active power levels (~65 mA), those extra milliseconds of connection setup add meaningful cost across hundreds of daily transmissions, and they show up directly in battery life projections. The packet payload is temperature readings, which are not inherently sensitive data. If the threat model changes (a less trusted network, or firmware that can afford to stay awake a bit longer), a WireGuard Listener can replace or be added alongside the plain UDP Listener with no changes to the downstream pipeline.
On the backend, the Listeners (one WireGuard, one plain UDP) point to a single Firehose Destination. The downstream pipeline is identical regardless of how the sample data arrived. The DS18B20 hardware address in the binary payload identifies the device; no per-packet routing logic is needed.
The backend: packets as events
The entire backend is a CloudFormation stack with no always-on compute. When sensors are not sending, it costs nothing in compute. When they are, it runs for the duration of packet processing and stops.
The Proxylity side is two resources: a WireGuard Listener and/or a plain UDP Listener, both configured with the same Firehose Destination. The Firehose Destination uses the base64 formatter to preserve the binary payload. Each Firehose record contains the full JSON packet envelope: source IP, port, timestamp, batch tag, and the base64-encoded payload.
Firehose buffers and writes compressed batches to S3. An S3 event notification triggers a Lambda function on each new object. The Lambda:
- Reads the Firehose object from S3
- Splits the newline-delimited records and parses the JSON envelope from each
- Base64-decodes and parses the binary UDP payload: sensor address, sample count, and float pairs
- Writes structured readings (timestamp, sensor ID, external temp, internal temp) to a DynamoDB table for query access
- Regenerates pre-cooked charts — SVG line charts and bitmap thumbnails — stored in S3 and served directly as static assets
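The decode step in the list above (base64, then the binary layout) can be sketched as follows. The text does not say which language the Lambda is written in; C++ is used here only to match the firmware examples. The struct and function names are hypothetical, JSON envelope parsing is left out, and the parser assumes a little-endian host so that it matches the device's native byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

struct Reading { float externalC; float internalC; };

struct Batch {
    std::uint8_t sensorAddress[8];
    std::vector<Reading> readings;
};

// Decode standard base64 (RFC 4648 alphabet, '=' padding) into raw bytes.
// Characters outside the alphabet, such as newlines, are skipped.
std::vector<std::uint8_t> base64Decode(const std::string& text) {
    static const std::string alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::vector<std::uint8_t> out;
    std::uint32_t bits = 0;
    int bitCount = 0;
    for (char c : text) {
        if (c == '=') break;
        const std::size_t value = alphabet.find(c);
        if (value == std::string::npos) continue;
        bits = (bits << 6) | static_cast<std::uint32_t>(value);
        bitCount += 6;
        if (bitCount >= 8) {
            bitCount -= 8;
            out.push_back(static_cast<std::uint8_t>((bits >> bitCount) & 0xFF));
        }
    }
    return out;
}

// Parse address, sample count, and float pairs. Returns false when the
// payload length does not match the count it declares.
bool parseBatch(const std::vector<std::uint8_t>& bytes, Batch& batch) {
    static_assert(sizeof(Reading) == 8, "Reading must be two packed 32-bit floats");
    if (bytes.size() < 12) return false;
    std::uint32_t count = 0;
    std::memcpy(&count, bytes.data() + 8, sizeof(count));
    if (bytes.size() != 12 + static_cast<std::size_t>(count) * sizeof(Reading)) {
        return false;
    }
    std::memcpy(batch.sensorAddress, bytes.data(), 8);
    batch.readings.resize(count);
    if (count > 0) {
        std::memcpy(batch.readings.data(), bytes.data() + 12, count * sizeof(Reading));
    }
    return true;
}
```

Checking the declared count against the actual length rejects truncated or foreign packets before anything is written to the table.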
This is the ordinary event-driven pattern: an S3 object notification triggers Lambda, and Lambda does the work and writes output. A UDP packet arriving at the Gateway is an event with a payload, the same pattern as an HTTP request hitting API Gateway. The transport changes but the processing doesn't.
The CloudFormation stack for the backend covers: the two Proxylity Listeners and shared Destination (defined via the UDP Gateway resource type), a Firehose delivery stream with S3 buffering and GZIP compression, an S3 bucket with object notification configured, and a Lambda function with the S3 trigger and DynamoDB write permissions. The Firehose Destination docs and WireGuard Listener docs cover the Proxylity side of the configuration in detail.
Alarms without custom metrics
CloudWatch custom metrics cost $0.30 per metric per month, and every unique combination of metric name and dimensions counts as a separate metric. At 1,000 sensors with two temperature channels each, that's 2,000 metrics, or $600/month in CloudWatch metrics alone, before writing a single alarm rule. The math makes custom metrics impractical at any meaningful scale.
Instead, presence is tracked in DynamoDB. The table has a GSI with the date (yyyy-MM-dd) as the partition key and sensor ID as the sort key. Every time the Lambda processes a packet batch, it updates the sensor's main record (latest readings, last-seen timestamp) and writes a presence marker that appears in the GSI under today's date. To find sensors that haven't reported today, a scheduled Lambda queries the GSI for yesterday's date and for today's date. Any sensor ID that appears in yesterday's partition but not in today's hasn't checked in. That list drives the alarm.
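The comparison between the two date partitions is a set difference. A minimal sketch, with a hypothetical function name, assuming the sensor IDs from each query have already been collected into vectors. A DynamoDB Query returns items in sort-key order within a partition, so both lists arrive sorted.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Sensor IDs present in yesterday's partition but absent from today's.
// Both inputs must be sorted ascending.
std::vector<std::string> missingToday(const std::vector<std::string>& yesterday,
                                      const std::vector<std::string>& today) {
    assert(std::is_sorted(yesterday.begin(), yesterday.end()));
    assert(std::is_sorted(today.begin(), today.end()));
    std::vector<std::string> missing;
    std::set_difference(yesterday.begin(), yesterday.end(),
                        today.begin(), today.end(),
                        std::back_inserter(missing));
    return missing;
}
```

A sensor that first reports today never appears in the result, and a sensor that has been silent for more than a day drops out of yesterday's partition, so it is flagged once rather than every day.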
Daily granularity is sufficient for this use case. If I'd needed finer resolution, an alternative would have been to put a TTL on the GSI presence records and attach a DynamoDB Stream filtered to REMOVE events, so that each expiring record triggers a check for that specific sensor. More real-time, more moving parts. For once-a-day alerting, the scheduled query is simpler and costs almost nothing to run.
Scale and cost
The table below estimates monthly backend cost at various sensor counts, assuming ten-second sample intervals and thirty-sample batches (one UDP packet every five minutes per sensor). The XIAO sends no acknowledgment responses, so only inbound packets are billed. AWS costs cover Firehose ingestion (billed at 5 KB minimum per record), DynamoDB on-demand writes (one update per sensor per Firehose delivery), and Lambda invocations (within free tier at these volumes).
| Sensors | Packets/day | Data/day | Est. monthly backend cost |
|---|---|---|---|
| 1 | 288 | ~73 KB | ~$1 |
| 10 | 2,880 | ~730 KB | ~$1 |
| 100 | 28,800 | ~7.3 MB | ~$2–3 |
| 1,000 | 288,000 | ~73 MB | ~$20–25 |
The ~$1 floor in every row is the Proxylity listener port charge (~$0.00139/hr, or about $1.01 over a 730-hour month). Packet volume stays well under the 1 million/month free tier until approaching 1,000 sensors, where Proxylity packet charges (~$9.55) and DynamoDB writes (~$10.80) each become significant. Lambda invocations and S3 storage remain negligible throughout.
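The packet and data columns of the table, and the point at which the free tier runs out, can be reproduced from the figures already given: one 252-byte packet every five minutes per sensor and 1 million free packets per month. This sketch assumes a 30-day month and deliberately leaves out prices, since per-unit rates are not stated in full here.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::int64_t kPacketsPerSensorPerDay = 24 * 60 / 5;  // one every five minutes
constexpr std::int64_t kBytesPerPacket = 252;
constexpr std::int64_t kFreePacketsPerMonth = 1000000;
constexpr std::int64_t kDaysPerMonth = 30;  // assumption

// Packets arriving per day for a given fleet size.
std::int64_t packetsPerDay(std::int64_t sensors) {
    assert(sensors >= 0);
    return sensors * kPacketsPerSensorPerDay;
}

// Raw payload bytes per day, before any per-record billing minimums.
std::int64_t payloadBytesPerDay(std::int64_t sensors) {
    return packetsPerDay(sensors) * kBytesPerPacket;
}

// Packets per month above the free tier.
std::int64_t billablePacketsPerMonth(std::int64_t sensors) {
    const std::int64_t total = packetsPerDay(sensors) * kDaysPerMonth;
    return total > kFreePacketsPerMonth ? total - kFreePacketsPerMonth : 0;
}
```

One sensor produces 288 packets and 72,576 bytes (~73 KB) per day. At 100 sensors the monthly total is 864,000 packets, still inside the free tier; at 1,000 sensors it is 8,640,000, of which 7,640,000 are billable.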
Compare this to the cost floor of the traditional stack. A minimal managed Kafka cluster on Amazon MSK runs around $600/month in broker instance charges before adding storage or data transfer — and MSK Serverless still charges per cluster-hour even when idle. Add the EKS control plane ($0.10/hour, ~$73/month), worker nodes, and an MQTT broker or IoT Core rules pipeline, and the always-on baseline is well over $700/month — regardless of how many sensors you have, regardless of whether any of them are sending.
With the serverless UDP approach, beyond the ~$1/month listener port, every other cost scales directly with traffic. When nothing is sending, there is nothing else to pay.
Links
- proxylity/examples/wireguard-iot-device — CYD firmware, wiring, LVGL GUI, and WireGuard configuration
- mlhpdx/xiao-esp32c3-wifi-temp-sensor — XIAO ESP32-C3 firmware with deep sleep, power analysis, and static IP caching
- Adding WireGuard-Tunneled NTP — handling NTP for embedded devices behind a full WireGuard tunnel
- WireGuard Listener documentation
- Firehose Destination documentation