What is a broker topology and how does it work?
Broker topology explained

Audience

This document is intended for stakeholders without a technical background. It explains how a broker-based architecture works, what the key concepts mean, and how it differs from more familiar approaches such as APIs and GraphQL.

The core idea: a shared noticeboard

The easiest way to understand a broker topology is to think of a noticeboard in a shared office. Anyone can pin a note to the board. Anyone else can walk up and read the notes that interest them. The person who pinned the note does not need to know who will read it, and the reader does not need to know who wrote it. The noticeboard is the broker.

This is fundamentally different from a phone call (an API), where one party dials another directly, waits for an answer, and the conversation is private between the two.

Publishers, subscribers, and messages

Three roles make up a broker topology.

Publisher

A publisher is any system or device that sends data to the broker. On the shop floor this is typically a machine, a sensor, or a controller. In an enterprise context it might be an ERP system, a production planning tool, or a quality management application. The publisher does not know who will receive its data. It simply sends a message and moves on.

Broker

The broker sits in the middle. It receives messages from publishers, organizes them by topic, and delivers them to any subscriber that has expressed interest in that topic. The broker handles routing so that publishers and subscribers never need to know about each other.

Subscriber

A subscriber is any system or application that has told the broker "I want to receive messages on this topic." When a new message arrives on that topic, the broker delivers it. A subscriber can follow many topics at once, and many subscribers can follow the same topic simultaneously.

    graph LR
        subgraph Publishers
            p1[Machine sensor]
            p2[Production controller]
            p3[Quality system]
        end
        subgraph broker["📡 Broker"]
            t1("Topic: machine/temperature")
            t2("Topic: production/output")
            t3("Topic: quality/rejects")
        end
        subgraph Subscribers
            s1[MES application]
            s2[Reporting dashboard]
            s3[Maintenance alert]
        end
        p1 -->|publishes| t1
        p2 -->|publishes| t2
        p3 -->|publishes| t3
        t1 -->|delivers| s1
        t1 -->|delivers| s3
        t2 -->|delivers| s1
        t2 -->|delivers| s2
        t3 -->|delivers| s2

What is a message?

A message is a small, self-contained packet of information. It has a topic (the address it belongs to) and a payload (the actual content). A payload might be a temperature reading, a production count, an alarm event, or a quality measurement. Messages are typically lightweight and structured so that any system can read them without needing to know the internals of the sender.

Do messages stay on the broker?

This depends on the type of broker. The answer is one of the most important practical differences between the two main broker technologies used in manufacturing: MQTT and Kafka.

MQTT: live data, last known value

MQTT is designed for lightweight, real-time communication, making it ideal for sensors and machines that send frequent updates. By default, a message sent over MQTT is delivered to whoever is currently subscribed and then discarded. If no one is subscribed at that moment, the message is lost.

MQTT does support a concept called a retained message. The broker keeps the single most recent message on a topic, so a new subscriber immediately receives the latest known value when it connects. Think of it as a whiteboard that always shows the current reading, overwritten each time a new value arrives.

MQTT is therefore well suited to answering "What is the current value right now?"
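For readers who want to see the mechanics, the publish-subscribe flow and the retained-message behaviour described above can be sketched in a few lines of Python. This is a toy in-memory model under simplifying assumptions, not a real MQTT client: the class `ToyBroker`, its `subscribe` and `publish` methods, and the `retain` flag are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# A handler receives (topic, payload); both are plain strings in this sketch.
Handler = Callable[[str, str], None]


class ToyBroker:
    """In-memory stand-in for a broker (illustrative only, not real MQTT)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)
        self._retained: Dict[str, str] = {}

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)
        # A new subscriber immediately receives the retained value, if any.
        if topic in self._retained:
            handler(topic, self._retained[topic])

    def publish(self, topic: str, payload: str, retain: bool = False) -> None:
        if retain:
            # The "whiteboard": only the latest value is kept per topic.
            self._retained[topic] = payload
        # Deliver to whoever is subscribed right now; the publisher
        # never learns who (if anyone) received the message.
        for handler in self._subscribers[topic]:
            handler(topic, payload)


broker = ToyBroker()
dashboard: List[str] = []
alerts: List[str] = []
late: List[str] = []

# Published before anyone subscribes, but retained.
broker.publish("machine/temperature", "74.2", retain=True)
broker.subscribe("machine/temperature", lambda t, p: dashboard.append(p))
broker.subscribe("machine/temperature", lambda t, p: alerts.append(p))
broker.publish("machine/temperature", "74.5", retain=True)

# Published without retain and with no subscriber: the message is lost.
broker.publish("production/output", "120")
broker.subscribe("production/output", lambda t, p: late.append(p))

print(dashboard, alerts, late)
```

Both `dashboard` and `alerts` end up with the retained value followed by the live update, while `late` stays empty because the unretained message was published before anyone was listening.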
Kafka: a durable, replayable log

Kafka (and its equivalent, Redpanda) works differently. Every message is written to a persistent log on disk and kept for a configurable period: hours, days, or indefinitely. Subscribers do not just receive new messages; they can also go back and read messages from the past.

Think of Kafka as a journal rather than a whiteboard. The journal keeps every entry made within the retention period. A new reader can start from the beginning and read the full history, or start from any point in time, or simply follow along from the current moment.

This makes Kafka well suited to answering "What happened, and in what order?" It also means that if a downstream application goes offline temporarily, it can catch up on everything it missed when it comes back.

    graph TD
        subgraph mqtt["📡 MQTT broker"]
            direction LR
            mw[Whiteboard\nshows latest value only]
        end
        subgraph kafka["📋 Kafka / Redpanda"]
            direction LR
            kl[Persistent log\nfull history retained]
        end
        p1[Machine sensor] -->|publishes| mw
        p1 -->|publishes| kl
        mw -->|delivers current value| s1[Live dashboard]
        kl -->|delivers from any point in history| s2[Reporting application]
        kl -->|replays missed messages| s3[System that was offline]

Summary: MQTT vs Kafka

| | MQTT | Kafka / Redpanda |
|---|---|---|
| Primary use | Real-time sensor data, OT layer | Event streaming, IT layer |
| Message retention | Last value only (optional) | Full history, configurable duration |
| Can replay past messages? | No. The protocol offers retained messages (latest value per topic) and persistent sessions (messages queued for a known offline subscriber), but no replay of history | Yes |
| Designed for | Lightweight devices, low bandwidth | High-throughput enterprise systems |
| Best question it answers | What is the value right now? | What happened, and when? |

In a well-designed manufacturing architecture, both are used together: MQTT at plant level for machine communication, Kafka at enterprise level for durable event streaming and integration.

How is this different from an API?
An API (application programming interface) works on a request-response model. One application asks another for data, the second application responds, and the exchange is complete. Nothing further happens until a new request is made.

    sequenceDiagram
        participant app as Application
        participant api as API / Server
        app->>api: Request: give me the current temperature
        api->>app: Response: 74.2°C
        Note over app,api: Nothing happens until the next request

A broker topology works on a publish-subscribe model. Data flows continuously. The subscriber does not ask; it simply receives new data the moment the publisher sends it.

    sequenceDiagram
        participant sensor as Machine sensor
        participant broker as Broker
        participant app as Application
        sensor->>broker: 74.2°C
        broker->>app: 74.2°C
        sensor->>broker: 74.5°C
        broker->>app: 74.5°C
        sensor->>broker: 75.1°C
        broker->>app: 75.1°C
        Note over sensor,app: Data flows continuously without requests

In practice, APIs come in different forms. The two most common are REST and SOAP. A REST API communicates over standard web protocols and is widely used in modern applications and cloud services. A SOAP API is an older, more formal standard that is still common in enterprise systems such as ERP and LIMS platforms. Despite their differences in technical style, both follow the same fundamental principle: one party sends a request, the other responds, and the exchange ends there.

The practical difference is coupling and timing. With an API, the caller and the provider are directly connected. The caller must know the address of the provider, both must be available at the same time, and if the provider is slow the caller waits. With a broker, the publisher and subscriber are independent. They do not know each other, either side can change without affecting the other, and with a durable broker such as Kafka they do not even need to be online at the same time.

| | API (request-response) | Broker (publish-subscribe) |
|---|---|---|
| Who initiates? | The consumer (caller) | The producer (publisher) |
| When is data received? | Only when requested | Whenever new data arrives |
| Are sender and receiver coupled? | Yes, directly connected | No, fully independent |
| What if the receiver is offline? | Request fails | Kafka: message is stored; MQTT: message may be lost |
| Best suited for | Retrieving specific data on demand | Reacting to events as they happen |

How is this different from GraphQL?

GraphQL is a type of API. Like a standard API, it follows the request-response model, but it gives the consumer much more control over exactly what data is returned. Instead of receiving a fixed set of fields, the caller writes a precise query describing exactly which fields it needs, and nothing more.

    sequenceDiagram
        participant app as Dashboard
        participant gql as GraphQL API
        app->>gql: Query: give me machine ID, temperature, and shift output for line 3
        gql->>app: Exactly those three fields for line 3
        Note over app,gql: Data shape is defined by the caller, not the provider

GraphQL is excellent for building dashboards and reporting interfaces that need to pull specific, structured data on demand. It is efficient because callers only receive what they asked for, and flexible because different callers can request different shapes of data from the same API.

Where GraphQL ends and broker topology begins

GraphQL answers the question "Show me a specific set of data as it stands right now." A broker answers the question "Tell me whenever something changes or happens."

The difference is querying versus streaming.

A GraphQL query is a snapshot. You ask, you receive, you move on. If you want to know whether the temperature changed, you must ask again. You can automate this by polling repeatedly, but this is wasteful and introduces a delay between the real event and your awareness of it.

A broker subscription is a continuous feed. The moment a machine temperature changes, the broker delivers that change to every interested subscriber, without any polling. If an alarm threshold is breached, a broker-connected system can react within
milliseconds.

    graph TD
        subgraph polling["GraphQL polling for changes"]
            q1["Query at 09:00"] --> r1["74.2°C"]
            q2["Query at 09:01"] --> r2["74.2°C"]
            q3["Query at 09:02"] --> r3["75.8°C ⚠️"]
            note1["Change happened at 09:01:30 but was only discovered at 09:02"]
        end
        subgraph streaming["Broker event-driven"]
            e1["09:01:30 sensor publishes 75.8°C"]
            e2["Broker delivers instantly"]
            e3["System reacts at 09:01:30 ⚡"]
            e1 --> e2 --> e3
        end

When GraphQL and APIs are the right tool

GraphQL and APIs shine whenever the consumer needs to ask a specific question and receive a precise, structured answer. The consumer is in control: it decides what it wants, when it wants it, and what shape the response should take.

Good examples in a manufacturing context:

Reporting and analytics

A shift report needs to show total output, downtime, and scrap rate for a specific line over a specific eight-hour window. The reporting tool queries the database at the end of the shift with exactly those parameters and receives exactly those numbers. There is no need for a continuous stream; the data already exists and the question is well defined.

Retrieving results from a LIMS

A laboratory information management system (LIMS) stores test results, certifications, and quality verdicts. When a production operator or an ERP system needs to know whether a batch has passed quality release, it asks the LIMS a direct question: "Give me the test results for batch 2024-B-0471." The LIMS responds with the full result record. This is a point-in-time lookup against a system of record. The consumer cares about a specific batch, at a specific moment, and wants a complete and authoritative answer. An API or GraphQL query is the natural fit.

Building flexible dashboards

A management dashboard may need to combine data from multiple sources: production counts from the MES, energy consumption from a utility system, and quality metrics from the LIMS. GraphQL allows the dashboard to request precisely the fields it needs from each source in a single query,
without receiving large amounts of irrelevant data.

    sequenceDiagram
        participant erp as ERP system
        participant lims as LIMS API
        erp->>lims: Query release status for batch 2024-B-0471
        lims->>erp: Batch PASSED, certificate number QC-8812, released by J. Peeters, date 2024-06-09
        Note over erp,lims: A specific question, a specific answer, at a specific moment

In all of these cases the consumer knows exactly what it is looking for. The data already exists somewhere. The exchange is intentional and bounded.

When a broker is the right tool

A broker works on a fundamentally different principle. The publisher sends a message to the broker and immediately moves on. It does not wait for a response. It does not know who will receive the message. It does not care what the receiver will do with the data. Its only responsibility is to publish accurately and consistently.

This is what makes a broker powerful in a manufacturing environment. A temperature sensor does not know whether its data will be used by a dashboard, a maintenance alert, an energy optimization algorithm, or a compliance logging system. It does not need to know. It simply publishes the reading, and the broker takes care of delivery to whoever is interested.

    graph LR
        subgraph Publisher
            s["Temperature sensor publishes 74.2°C"]
        end
        subgraph broker["📡 Broker"]
            t("Topic: line3/temperature")
        end
        subgraph Subscribers
            d[Live dashboard displays the value]
            a[Alerting system checks against threshold]
            e[Energy monitor logs for efficiency analysis]
            h[Historian writes to long-term storage]
        end
        s -->|publishes once| t
        t --> d
        t --> a
        t --> e
        t --> h
        style s fill:#f0f0f0
        style t fill:#e8f4f8

The sensor publishes once. Four completely different systems receive and use the data for four completely different purposes. Adding a fifth subscriber (say, a new AI-based anomaly detection system) requires no changes to the sensor, typically no changes to the broker configuration, and no changes to any of the existing subscribers. The new system simply subscribes to the topic and
starts receiving data.

This decoupling is what separates a broker from an API. With an API, the provider must anticipate its consumers and build an interface for each use case. With a broker, the provider publishes data once and the organization decides independently how to use it.

A broker is therefore the right tool when:

- Something is happening now and multiple systems need to react to it.
- The producer cannot or should not need to know who consumes its data.
- New consumers may be added in the future without changing the source system.
- Speed of delivery matters more than the precision of the question being answered.

Using both together

GraphQL and broker topologies are not competing approaches. They address different moments in the data lifecycle and are most effective when used in combination.

The broker handles the flow of live events. Something happens on the shop floor, the message travels through the broker to every interested system, and each system acts on it in its own way. A historian subscribes and writes every event to a database. An alerting system subscribes and checks whether thresholds have been crossed.

Once the data is in the database, a GraphQL API can expose it for structured queries. A reporting tool, a LIMS integration, or an ERP system can then ask precise questions about historical data without needing to connect to the broker directly.

    graph TD
        subgraph live["Real-time layer (broker)"]
            sensor[Machine or sensor]
            broker[Broker]
            alert[Alerting system]
            historian[Historian]
        end
        subgraph historical["Historical layer (API)"]
            db[(Database)]
            gql[GraphQL API]
            report[Reporting tool]
            lims[LIMS / ERP query]
        end
        sensor -->|publishes event| broker
        broker -->|delivers instantly| alert
        broker -->|delivers instantly| historian
        historian -->|writes to| db
        db -->|exposed via| gql
        gql -->|answers specific questions| report
        gql -->|answers specific questions| lims

The broker answers "What is happening right now, and who needs to know?"
The API answers "What happened, what does it mean, and what exactly do I need to know about it?"

Both questions are valid. Both need to be answered. The skill lies in knowing which tool is responsible for which question.

| | GraphQL / API | Broker (pub-sub) |
|---|---|---|
| Model | Query on demand | Continuous event stream |
| Who initiates the exchange? | The consumer | The producer |
| Data freshness | As of the moment of the query | Real time, pushed immediately |
| Does the producer know its consumers? | Usually, because the API is built for specific use cases | No, the publisher does not know or care |
| Best for | Reporting, lab results, point-in-time lookups | Automation, alerts, live monitoring |
| Can new consumers be added easily? | Only if the API supports their use case | Yes, any system can subscribe without changes |
| Handles history? | Yes, queries the database directly | Kafka: yes; MQTT: last value only |

Can they work together? Yes. The broker feeds data and GraphQL exposes it for queries.

Putting it all together

A modern manufacturing data architecture typically combines all of these approaches, each used where it fits best.

    graph TD
        subgraph ot["🏭 Plant floor"]
            machines[Machines & sensors]
            mqtt[MQTT broker\nReal-time OT data]
        end
        subgraph it["🏢 Enterprise layer"]
            kafka[Kafka / Redpanda\nDurable event log]
            db[(Database\nHistorical storage)]
            gql[GraphQL API\nQuery interface]
        end
        subgraph consumers["📊 Applications"]
            live[Live monitoring\ndashboard]
            report[Reporting &\nanalytics]
            alert[Alerting &\nautomation]
        end
        machines -->|publishes live data| mqtt
        mqtt -->|selected data promoted| kafka
        kafka -->|streams events| alert
        kafka -->|writes history| db
        db -->|queried via| gql
        gql -->|serves structured data| report
        kafka -->|streams live| live

Each layer plays a different role:

- MQTT handles fast, lightweight communication on the plant floor.
- Kafka provides a durable, replayable log of everything that matters at enterprise level.
- A database stores historical data in a queryable form.
- GraphQL gives dashboards and reports a flexible, efficient way to
retrieve exactly what they need.

This combination means that real-time alerts fire the moment something happens, reporting tools can query history at any time, and every layer remains independent and maintainable on its own.
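
As a rough illustration of how the durable log, the historical store, and the on-demand query fit together, the Python sketch below models them in memory. It is a simplification under stated assumptions and does not use Kafka or GraphQL: `EventLog`, `Historian`, and `max_temperature` are hypothetical names, and the readings are the example values used earlier in this document.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Event = Tuple[str, float]  # (timestamp as HH:MM:SS, temperature in °C)


@dataclass
class EventLog:
    """Append-only log; each consumer tracks its own read offset."""
    events: List[Event] = field(default_factory=list)

    def append(self, event: Event) -> None:
        self.events.append(event)

    def read_from(self, offset: int) -> List[Event]:
        # Nothing is removed on read, so any consumer can replay from any offset.
        return self.events[offset:]


@dataclass
class Historian:
    """Consumer that copies events from the log into queryable storage."""
    stored: List[Event] = field(default_factory=list)
    offset: int = 0

    def catch_up(self, log: EventLog) -> int:
        new_events = log.read_from(self.offset)
        self.stored.extend(new_events)
        self.offset += len(new_events)
        return len(new_events)


def max_temperature(history: List[Event], start: str, end: str) -> float:
    """On-demand query (the API's role): highest reading between start and end."""
    return max(temp for ts, temp in history if start <= ts <= end)


log = EventLog()
historian = Historian()

log.append(("09:00:00", 74.2))
first = historian.catch_up(log)   # historian is online and reads 1 event

log.append(("09:01:00", 74.5))    # historian is offline while these arrive
log.append(("09:01:30", 75.8))
missed = historian.catch_up(log)  # back online: replays the 2 events it missed

peak = max_temperature(historian.stored, "09:00:00", "09:02:00")
print(first, missed, peak)        # prints: 1 2 75.8
```

The log plays the journal role (nothing is lost while the historian is offline), and the query function plays the API role (a precise question answered from stored history).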
