This is a continuation of my previous post.

(Figure 1.1) Example of a real-time (IoT) data architecture
We are at the second step of our architecture diagram: the Azure Function.
In pure SQL terms, an Azure Function can be thought of as a trigger and a stored procedure working together, firing on every new row of data in our raw layer (the RawEvents table). This Function is our stream-transformer layer.
Example of the trigger:
CREATE TRIGGER EnrichEvent
ON RawEvents
AFTER INSERT
AS
EXEC EnrichEventProcedure;
Example of the stored procedure:
CREATE PROCEDURE EnrichEventProcedure
AS
BEGIN
-- Parse JSON
-- Validate fields
-- Lookup metadata (JOIN to Device table)
-- Add derived fields
-- Normalize timestamps
-- Insert into RefinedEvents
END;
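That trigger-plus-procedure pair maps onto the body of an event-triggered Function. A minimal sketch in Python (the actual handler signature depends on your runtime and bindings; the DEVICE_TABLE lookup, field names, and enrich_event are hypothetical stand-ins):

```python
import json
from datetime import datetime, timezone

# Hypothetical in-memory stand-in for the Device metadata table
DEVICE_TABLE = {"abc123": {"region": "us-east", "model": "TH-100"}}

def enrich_event(raw: str) -> dict:
    """Roughly what the trigger + stored procedure pair does, as one function."""
    event = json.loads(raw)                         # Parse JSON

    for field in ("deviceId", "timestamp"):         # Validate fields
        if field not in event:
            raise ValueError(f"missing required field: {field}")

    meta = DEVICE_TABLE.get(event["deviceId"], {})  # Lookup metadata (the JOIN)
    event.update(meta)
    event["hasMetadata"] = bool(meta)               # Add a derived field

    # Normalize the timestamp to ISO-8601 UTC
    ts = datetime.fromisoformat(event["timestamp"].replace("Z", "+00:00"))
    event["timestamp"] = ts.astimezone(timezone.utc).isoformat()

    return event  # The real Function would send this to Event Hub #2
```

Unlike the SQL trigger, which fires inside the database, this code runs once per incoming event in the Function host, outside either Event Hub.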
The stored procedure in our example represents our data enrichment and normalization logic. In real-time (or near real-time) terms, this happens very quickly and (hopefully) accurately.
Why is this Function sitting between two Event Hubs? Because we only want clean, enriched, and consistent data downstream in our second Event Hub so that consumers have data they can rely on and have confidence in.
What if we didn’t have a Function between our two Event Hubs? Event Hub #2 would carry the same raw data as Event Hub #1, and every consumer would have to duplicate the enrichment work itself. That means lost time (latency), greater costs (Azure resources), and a higher likelihood of errors and failed processes.
Our Azure Function essentially becomes our single source of truth for data quality.
It is important to note here that Azure Functions (specifically in our use case, where one sits between our two Event Hubs) run on .NET or Node runtimes that are optimized for JSON, allowing a Function to parse a payload in microseconds.
Azure Functions also scale out automatically with increasing volumes of data. For example, if we begin receiving 10 times the data we normally see, Azure can automatically spin up additional Function instances to absorb the load.
Each Function instance can also process events in parallel, so Functions are well suited to ever-increasing data volumes: events are handled concurrently both within and across instances.
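A rough illustration of that per-instance concurrency, using asyncio as a stand-in for the Functions host handing one instance a batch of events (handle_event and its 10 ms "lookup" are hypothetical):

```python
import asyncio

async def handle_event(event: dict) -> dict:
    # Simulate an I/O-bound step (e.g. a metadata lookup) that the
    # instance can overlap across many events at once
    await asyncio.sleep(0.01)
    return {**event, "enriched": True}

async def handle_batch(events: list[dict]) -> list[dict]:
    # All events in the batch are in flight concurrently, so the batch
    # takes roughly as long as one event, not len(events) events
    return await asyncio.gather(*(handle_event(e) for e in events))

results = asyncio.run(handle_batch([{"id": i} for i in range(100)]))
```

One hundred simulated 10 ms lookups complete in roughly 10 ms of wall-clock time rather than a full second, which is the same effect the Functions host gets by dispatching events concurrently.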
An important note on bottlenecks with Azure Functions: they are rarely caused by JSON parsing. If we were to experience a bottleneck, it would most likely come from one of the following:
- External lookups (Cosmos, SQL, Redis)
- Network latency
- Cold starts (if we are using a consumption plan)
- Large payloads
Otherwise, JSON parsing is a trivial activity for Functions.
Suppose our Function is parsing JSON similar to the following example:
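A quick back-of-the-envelope check of that claim, timing json.loads against a simulated external lookup (the 10 ms sleep is an assumption standing in for network latency to Cosmos, SQL, or Redis):

```python
import json
import time

payload = json.dumps({"deviceId": "abc123", "temp": "72.5",
                      "pressure": "101.3", "timestamp": "2026-02-22T23:59:12Z"})

# Time 1,000 JSON parses of our sample payload
start = time.perf_counter()
for _ in range(1000):
    json.loads(payload)
parse_time = time.perf_counter() - start

# Time one simulated external lookup (10 ms of "network latency")
start = time.perf_counter()
time.sleep(0.01)
lookup_time = time.perf_counter() - start

# A single simulated lookup dwarfs a thousand parses
print(f"1,000 parses: {parse_time * 1000:.2f} ms; one lookup: {lookup_time * 1000:.2f} ms")
```

On typical hardware the thousand parses finish in a millisecond or two, well under the single simulated lookup, which is why the bottleneck list above is dominated by I/O rather than parsing.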
{
  "deviceId": "abc123",
  "temp": "72.5",
  "pressure": "101.3",
  "timestamp": "2026-02-22T23:59:12Z"
}
We might have the following steps:
- Validation
  - Are the required fields present?
  - Is the timestamp valid?
  - Are the values in range?
- Normalization
  - Convert strings to numbers
  - Convert Fahrenheit to Celsius
  - Convert psi to kPa
  - Standardize timestamp format
- Enrich
  - Lookup device metadata
  - Add region, customer, model
  - Add derived fields (price, status flags)
- Output clean event
  - Send to Event Hub #2
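The steps above can be sketched end to end. This is illustrative only: the conversion factors are standard (°F to °C, psi to kPa), but the field names, the accepted temperature range, the status flag, and the metadata store are assumptions:

```python
import json
from datetime import datetime, timezone

# Hypothetical metadata store (in production: Cosmos, SQL, or Redis)
DEVICE_METADATA = {
    "abc123": {"region": "us-east", "customer": "Contoso", "model": "TH-100"}
}

def transform(raw: str) -> dict:
    event = json.loads(raw)

    # 1. Validation: required fields, parseable timestamp, values in range
    for field in ("deviceId", "temp", "pressure", "timestamp"):
        if field not in event:
            raise ValueError(f"missing required field: {field}")
    ts = datetime.fromisoformat(event["timestamp"].replace("Z", "+00:00"))
    temp_f = float(event["temp"])
    pressure_psi = float(event["pressure"])
    if not -40.0 <= temp_f <= 257.0:  # assumed valid sensor range
        raise ValueError("temperature out of range")

    # 2. Normalization: strings -> numbers, F -> C, psi -> kPa, ISO-8601 UTC
    clean = {
        "deviceId": event["deviceId"],
        "tempC": round((temp_f - 32) * 5 / 9, 2),
        "pressureKpa": round(pressure_psi * 6.89476, 2),
        "timestamp": ts.astimezone(timezone.utc).isoformat(),
    }

    # 3. Enrichment: metadata lookup plus a derived status flag
    clean.update(DEVICE_METADATA.get(clean["deviceId"], {}))
    clean["overheating"] = clean["tempC"] > 80.0

    # 4. Output the clean event (here returned; the Function sends it to Event Hub #2)
    return clean
```

Feeding the sample payload above through this function yields numeric readings in metric units, a UTC timestamp, and the device's region, customer, and model attached, which is exactly the shape of data we want consumers to see in Event Hub #2.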