<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.9.3">Jekyll</generator><link href="https://blog.deterministic6g.eu/feed.xml" rel="self" type="application/atom+xml" /><link href="https://blog.deterministic6g.eu/" rel="alternate" type="text/html" /><updated>2025-05-05T11:19:51+02:00</updated><id>https://blog.deterministic6g.eu/feed.xml</id><title type="html">The DETERMINISTIC6G Blog</title><subtitle>Blog of the DETERMINISTIC6G Project</subtitle><entry><title type="html">Building a Programmable Software TSN Switch with P4</title><link href="https://blog.deterministic6g.eu/posts/2025/04/29/programmable-software-tsn-switch.html" rel="alternate" type="text/html" title="Building a Programmable Software TSN Switch with P4" /><published>2025-04-29T01:00:00+02:00</published><updated>2025-04-29T01:00:00+02:00</updated><id>https://blog.deterministic6g.eu/posts/2025/04/29/programmable-software-tsn-switch</id><content type="html" xml:base="https://blog.deterministic6g.eu/posts/2025/04/29/programmable-software-tsn-switch.html">&lt;p&gt;Time-Sensitive Networking (TSN) enhances traditional Ethernet with capabilities for reliable and predictable communication. It’s widely used in systems where precise timing and low latency are critical, such as industrial automation, automotive networks, and professional media streaming.&lt;/p&gt;

&lt;p&gt;One key component of TSN is the Time-Aware Shaper (TAS), which schedules traffic based on time slots to ensure that critical data is delivered exactly when needed. Traditionally, implementing TSN features like TAS required specialized hardware. However, recent developments in the Linux kernel have introduced native support for some TSN capabilities, such as &lt;a href=&quot;https://www.man7.org/linux/man-pages/man8/tc-taprio.8.html&quot;&gt;TAPRIO&lt;/a&gt; (Time-Aware Priority Scheduler).&lt;/p&gt;

&lt;p&gt;This blog post explores how to build a software-based TSN switch that’s not only fully functional but also programmable using the P4 language. This combination brings the benefits of deterministic networking to a flexible, software-defined environment.&lt;/p&gt;

&lt;h1 id=&quot;ltdr--takeaway-messages&quot;&gt;tl;dr – Takeaway Messages&lt;/h1&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;A fully software-based TSN switch is now possible using Linux’s built-in TSN capabilities. This opens up opportunities for testing and prototyping TSN features without requiring specialized hardware.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;By combining Linux’s traffic shaping features with P4 programmability, we can create a TSN switch that reacts more intelligently and quickly to changing network conditions, offering better performance and control than traditional setups.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;motivation&quot;&gt;Motivation&lt;/h1&gt;

&lt;h3 id=&quot;motivation-for-building-a-programmable-tsn-switch&quot;&gt;Motivation for Building a Programmable TSN Switch&lt;/h3&gt;

&lt;p&gt;TSN has traditionally relied on dedicated hardware that supports precise timing and scheduling. While powerful, such hardware is often costly and lacks flexibility. In response, recent versions of the Linux kernel have introduced support for features such as the TAPRIO qdisc, which allows time-aware traffic shaping in software.&lt;/p&gt;

&lt;p&gt;Building on this, our earlier &lt;a href=&quot;https://blog.deterministic6g.eu/posts/2024/11/02/software_tsn_switch.html&quot;&gt;blog post&lt;/a&gt; showed how to configure a Linux bridge as a basic TSN switch. However, that approach had some limitations:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Cumbersome configuration: It relies on auxiliary components, like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;qdisc&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tc&lt;/code&gt; filters, to classify traffic, which makes configuration and management cumbersome.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Limited flexibility: The bridge’s behavior is controlled via the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tc&lt;/code&gt; tool from the control plane, which may introduce delays when rapid traffic adaptation is required.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To overcome these limitations, we introduce a programmable software TSN switch built on Linux’s existing TSN features, with added programmability using the P4 language. This allows dynamic, real-time control of packet behavior directly from the data plane.&lt;/p&gt;

&lt;h1 id=&quot;background&quot;&gt;Background&lt;/h1&gt;

&lt;h2 id=&quot;tsn&quot;&gt;TSN&lt;/h2&gt;

&lt;p&gt;TSN is a set of IEEE 802.1 standards that extend Ethernet to support deterministic communication with bounded latency, low jitter, and minimal packet loss.&lt;/p&gt;

&lt;p&gt;TAS, defined in IEEE 802.1Qbv, is one of the core components of TSN. It enables time-division-based scheduling of network traffic by assigning transmission gates to traffic queues; the gates open and close according to a periodic schedule synchronized across the network. This allows critical traffic to be transmitted at precise time intervals, ensuring predictability.&lt;/p&gt;

&lt;h2 id=&quot;p4-programming-protocol-independent-packet-processors&quot;&gt;P4: Programming Protocol-independent Packet Processors&lt;/h2&gt;

&lt;p&gt;Programmable data planes fundamentally change the architecture of network devices by enabling direct control over packet processing logic at the forwarding layer. Unlike traditional fixed-function switches that rely on pre-defined behaviors hardcoded in hardware, programmable data planes allow users to define how packets are parsed, matched, modified and forwarded.&lt;/p&gt;

&lt;p&gt;The core of this paradigm is P4, a domain-specific language. In contrast to a general-purpose language such as C or Python, P4 is optimized for network data processing.
P4 programs are compiled to run on a variety of targets, including software switches (e.g., BMv2), programmable hardware switches based on FPGAs, or ASICs.&lt;/p&gt;

&lt;h1 id=&quot;a-p4-based-programmable-software-tsn-switch&quot;&gt;A P4-based Programmable Software TSN switch&lt;/h1&gt;

&lt;p&gt;The P4 software switch (BMv2) is the key component, serving as the data plane responsible for packet processing. By utilizing the P4 programming language, we remove the dependency on manual traffic control commands, e.g., &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tc&lt;/code&gt;, enabling flexible adjustments at runtime.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://blog.deterministic6g.eu/assets/images/switch.png&quot; alt=&quot;A P4-based Programmable Software TSN switch&quot; title=&quot;A P4-based Programmable Software TSN switch&quot; class=&quot;align-center&quot; width=&quot;450px&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Specifically, we integrate BMv2 with the TAPRIO qdisc. When a packet enters BMv2, it is parsed as defined by the P4 program. This parsing step allows us to extract relevant fields from the packet header and perform logical processing as required. For instance, operations like In-band Network Telemetry (INT) can be implemented to monitor the packet’s journey through the network, collecting data on latency, jitter, etc.&lt;/p&gt;

&lt;p&gt;After the logical processing completes, the VLAN Priority Code Point (PCP) in the packet header is updated to reflect the packet’s traffic class. At the same time, BMv2 updates the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;skb-&amp;gt;priority&lt;/code&gt; value of the packet in the Linux kernel. The TAPRIO qdisc uses this &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;skb-&amp;gt;priority&lt;/code&gt; value to determine the appropriate traffic class for the packet. Packets of the same traffic class are sent to the same transmit (TX) queue, which is served according to the preconfigured time slots of the time-aware schedule.&lt;/p&gt;

&lt;p&gt;The figure below shows how each outgoing packet is classified and assigned to the corresponding TX queue based on its PCP:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://blog.deterministic6g.eu/assets/images/map.png&quot; alt=&quot;Mapping of traffic class&quot; title=&quot;Mapping of traffic class&quot; class=&quot;align-center&quot; width=&quot;650px&quot; /&gt;&lt;/p&gt;

&lt;p&gt;As PCP is a 3-bit value, there are at most 8 traffic classes.&lt;/p&gt;
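
&lt;p&gt;To make the 3-bit limit concrete, the shell sketch below extracts the PCP from a 16-bit VLAN Tag Control Information (TCI) value. It assumes the standard IEEE 802.1Q layout (3 bits of PCP, 1 DEI bit, 12 bits of VLAN ID). The helper name &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pcp_of_tci&lt;/code&gt; is purely illustrative and is not part of the switch code.&lt;/p&gt;

```shell
# Hypothetical helper (not part of switch.p4): print the PCP of a 16-bit VLAN TCI.
# The PCP occupies the top 3 bits, so an integer division by 8192 (2^13)
# drops the DEI bit and the 12-bit VLAN ID.
pcp_of_tci() {
    echo $(( $1 / 8192 ))
}

# Example: PCP 5, DEI 0, VLAN ID 100 gives TCI = 5 * 8192 + 100 = 41060
pcp_of_tci 41060    # prints 5
```

&lt;p&gt;Whatever the VLAN ID, the result always lies between 0 and 7, which is why no more than 8 traffic classes can be distinguished this way.&lt;/p&gt;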

&lt;h2 id=&quot;environment-setup&quot;&gt;Environment Setup&lt;/h2&gt;

&lt;p&gt;In this experiment, we use Ubuntu 22.04 installed on a Dell laptop.&lt;/p&gt;

&lt;p&gt;Because we use the P4 language to program the switch, we need to install its compiler, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;p4c&lt;/code&gt;, and a target that executes the compiled program, BMv2.&lt;/p&gt;

&lt;p&gt;First, we clone the repository that contains the supporting files:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git clone https://github.com/DETERMINISTIC6G/det6g-blog-prog-soft-tsn-switch.git
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;p4-compiler&quot;&gt;P4 compiler&lt;/h3&gt;

&lt;p&gt;For further information, see the &lt;a href=&quot;https://github.com/p4lang/p4c?tab=readme-ov-file#ubuntu-dependencies&quot;&gt;p4c README&lt;/a&gt;.&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nb&quot;&gt;source&lt;/span&gt; /etc/lsb-release
&lt;span class=&quot;nb&quot;&gt;echo&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;deb http://download.opensuse.org/repositories/home:/p4lang/xUbuntu_&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;${&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;DISTRIB_RELEASE&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;/ /&quot;&lt;/span&gt; | &lt;span class=&quot;nb&quot;&gt;sudo tee&lt;/span&gt; /etc/apt/sources.list.d/home:p4lang.list
curl &lt;span class=&quot;nt&quot;&gt;-fsSL&lt;/span&gt; https://download.opensuse.org/repositories/home:p4lang/xUbuntu_&lt;span class=&quot;k&quot;&gt;${&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;DISTRIB_RELEASE&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;}&lt;/span&gt;/Release.key | gpg &lt;span class=&quot;nt&quot;&gt;--dearmor&lt;/span&gt; | &lt;span class=&quot;nb&quot;&gt;sudo tee&lt;/span&gt; /etc/apt/trusted.gpg.d/home_p4lang.gpg &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; /dev/null
&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;apt-get update
&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;apt &lt;span class=&quot;nb&quot;&gt;install &lt;/span&gt;p4lang-p4c
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;p4-software-switch---bmv2&quot;&gt;P4 software switch - BMv2&lt;/h3&gt;

&lt;p&gt;A pre-compiled version of BMv2 is available &lt;a href=&quot;https://github.com/p4lang/behavioral-model&quot;&gt;here&lt;/a&gt;. However, because we need to patch it so that it can communicate with the TAPRIO qdisc via &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;skb-&amp;gt;priority&lt;/code&gt;, we build and install it from source with our patch applied.&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;# install requirements&lt;/span&gt;
&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;apt-get &lt;span class=&quot;nb&quot;&gt;install&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-y&lt;/span&gt; automake cmake libgmp-dev &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
    libpcap-dev libboost-dev libboost-test-dev libboost-program-options-dev &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
    libboost-system-dev libboost-filesystem-dev libboost-thread-dev &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
    libevent-dev libtool flex bison pkg-config g++ libssl-dev
&lt;span class=&quot;c&quot;&gt;# clone source code&lt;/span&gt;
git clone https://github.com/p4lang/behavioral-model.git
&lt;span class=&quot;c&quot;&gt;# apply our patch&lt;/span&gt;
&lt;span class=&quot;nb&quot;&gt;cd &lt;/span&gt;behavioral-model
&lt;span class=&quot;c&quot;&gt;# the latest patch is available here: https://github.com/p4lang/behavioral-model/compare/main...montimage-projects:behavioral-model:main&lt;/span&gt;
git checkout 199af48 &lt;span class=&quot;c&quot;&gt;# the commit our patch was created against&lt;/span&gt;
git apply ../bmv2/bmv2.patch
&lt;span class=&quot;c&quot;&gt;# compile and install (takes a few minutes)&lt;/span&gt;
./autogen.sh &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; ./configure &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; make &lt;span class=&quot;nt&quot;&gt;-j&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;make &lt;span class=&quot;nb&quot;&gt;install&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;cd&lt;/span&gt; ..
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;testbed&quot;&gt;Testbed&lt;/h3&gt;

&lt;p&gt;As an example, we will implement a software TSN switch having an input port and an output port to connect a talker and a listener as shown in the following figure.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://blog.deterministic6g.eu/assets/images/testbed.png&quot; alt=&quot;Testbed&quot; title=&quot;Testbed&quot; class=&quot;align-center&quot; width=&quot;500px&quot; /&gt;&lt;/p&gt;

&lt;p&gt;To implement this testbed on a single machine, the talker and listener are isolated in two separate containers to prevent direct communication between them. Each container, here a Linux network namespace, connects to the switch via a virtual Ethernet (veth) link that, like a cable, has two ends: a packet sent into one end becomes available at the other.&lt;/p&gt;

&lt;p&gt;We first create 2 namespaces, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;talker&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;listener&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;ip netns add talker
&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;ip netns add listener
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Then create 2 virtual Ethernet links.&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;ip &lt;span class=&quot;nb&quot;&gt;link &lt;/span&gt;add veth-ta &lt;span class=&quot;nb&quot;&gt;type &lt;/span&gt;veth peer name veth-tb
&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;ip &lt;span class=&quot;nb&quot;&gt;link &lt;/span&gt;add veth-la &lt;span class=&quot;nb&quot;&gt;type &lt;/span&gt;veth peer name veth-lb
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We attach one end of each link to its corresponding container.&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;ip &lt;span class=&quot;nb&quot;&gt;link set &lt;/span&gt;veth-ta netns talker
&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;ip &lt;span class=&quot;nb&quot;&gt;link set &lt;/span&gt;veth-la netns listener
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We must also activate the links by bringing up both of their ends:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;ip netns &lt;span class=&quot;nb&quot;&gt;exec &lt;/span&gt;talker   ip &lt;span class=&quot;nb&quot;&gt;link set &lt;/span&gt;veth-ta up
&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;ip netns &lt;span class=&quot;nb&quot;&gt;exec &lt;/span&gt;listener ip &lt;span class=&quot;nb&quot;&gt;link set &lt;/span&gt;veth-la up
&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;ip &lt;span class=&quot;nb&quot;&gt;link set &lt;/span&gt;veth-tb up
&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;ip &lt;span class=&quot;nb&quot;&gt;link set &lt;/span&gt;veth-lb up
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Then set IP addresses for the ends inside &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;talker&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;listener&lt;/code&gt; namespaces:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;ip netns &lt;span class=&quot;nb&quot;&gt;exec &lt;/span&gt;talker   ip address add 10.0.0.1/24 dev veth-ta
&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;ip netns &lt;span class=&quot;nb&quot;&gt;exec &lt;/span&gt;listener ip address add 10.0.0.2/24 dev veth-la
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We also need to disable the offload features of these interfaces, which can otherwise cause the Linux kernel to calculate packet checksums incorrectly:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;ip netns &lt;span class=&quot;nb&quot;&gt;exec &lt;/span&gt;talker   ethtool &lt;span class=&quot;nt&quot;&gt;--offload&lt;/span&gt; veth-ta tx off rx off
&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;ip netns &lt;span class=&quot;nb&quot;&gt;exec &lt;/span&gt;listener ethtool &lt;span class=&quot;nt&quot;&gt;--offload&lt;/span&gt; veth-la tx off rx off
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In this demo, we assume that there are 2 traffic classes, TC0 and TC1, which are sent to two separate TX queues. We therefore set the number of TX queues to 2:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;ethtool &lt;span class=&quot;nt&quot;&gt;-L&lt;/span&gt; veth-lb tx 2
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Finally, we attach the TAPRIO qdisc to the output port of the switch:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;tc qdisc replace dev veth-lb parent root handle 100 taprio &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
     num_tc 2 &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
     map 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
     queues 1@0 1@1 &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
     base-time 1554445635681310809 &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
     sched-entry S 01 100000000 sched-entry S 03 50000000 &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
     clockid CLOCK_TAI
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The essential parameters are as follows:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;num_tc 2&lt;/code&gt;: there are 2 traffic classes&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;map 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0&lt;/code&gt;: maps skb priorities 0..15 to a specified traffic class (TC). Specifically,
    &lt;ul&gt;
      &lt;li&gt;map priority 0 (the first entry from the left) to TC0&lt;/li&gt;
      &lt;li&gt;map priority 1 to TC1&lt;/li&gt;
      &lt;li&gt;and priorities 2-15 to TC0 (16 entries, one for each of the 16 possible skb priorities).&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;queues 1@0 1@1&lt;/code&gt;: map traffic classes to TX queues of the network device.
 Its values use the format &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;count@offset&lt;/code&gt;. Specifically,
    &lt;ul&gt;
      &lt;li&gt;map the first traffic class (TC0) to 1 queue starting at offset 0 (first queue)&lt;/li&gt;
      &lt;li&gt;map the second traffic class (TC1) to 1 queue starting at offset 1 (second queue)&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sched-entry S 01 100000000 sched-entry S 03 50000000&lt;/code&gt;: defines the intervals, in nanoseconds, during which gates are open or closed. For the first 100 ms, only the gate of the first TX queue is open. For the next 50 ms, the gates of both the first and second TX queues are open (indicated by the two lowest bits of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;03&lt;/code&gt;). As a result, the TX queue for TC0 is always available, while the one for TC1 is unavailable for 100 ms and then available for 50 ms (the cycle time is 150 ms).&lt;/li&gt;
&lt;/ul&gt;
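
&lt;p&gt;The parameters above can be checked by hand with a small shell sketch. The helper names (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tc_for_priority&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;gate_open&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cycle_ns&lt;/code&gt;) are hypothetical and only mimic how the configuration is read; they do not call &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tc&lt;/code&gt; or touch any interface. The sketch assumes that the gate mask is written in hexadecimal and that bit n refers to traffic class n, which in this setup maps one-to-one onto TX queue n.&lt;/p&gt;

```shell
# Values copied from the tc command above.
PRIO_MAP="0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0"

# Hypothetical helper: print the traffic class for an skb priority (0..15)
# by selecting the matching entry of the map.
tc_for_priority() {
    prio=$1
    set -- $PRIO_MAP      # split the map into positional parameters
    shift "$prio"         # skip the entries before the requested priority
    echo "$1"
}

# Hypothetical helper: print 1 if the gate of a traffic class is open
# in a given gate mask, else 0.
# $1 = gate mask as written in sched-entry (assumed hexadecimal), $2 = traffic class
gate_open() {
    mask=$(( 0x$1 ))
    class_idx=$2
    while [ "$class_idx" -gt 0 ]; do
        mask=$(( mask / 2 ))          # move the wanted bit to the lowest position
        class_idx=$(( class_idx - 1 ))
    done
    echo $(( mask % 2 ))
}

# Hypothetical helper: the cycle time is the sum of all sched-entry intervals.
cycle_ns() {
    total=0
    for interval in "$@"; do
        total=$(( total + interval ))
    done
    echo "$total"
}

tc_for_priority 1              # prints 1 (priority 1 goes to TC1)
gate_open 01 1                 # prints 0 (TC1 closed during the first entry)
gate_open 03 1                 # prints 1 (TC1 open during the second entry)
cycle_ns 100000000 50000000    # prints 150000000 (150 ms)
```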

&lt;h2 id=&quot;test&quot;&gt;Test&lt;/h2&gt;

&lt;p&gt;Based on the TAPRIO qdisc configuration described above, TC0 packets can be transmitted at any time, whereas TC1 packets are allowed to transmit only during a 50 ms window in each 150 ms cycle. This setup may lead to underutilization of the egress link’s bandwidth, especially when only TC1 packets are present.&lt;/p&gt;
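
&lt;p&gt;As a rough illustration of this schedule, the following shell sketch reports whether the TC1 gate is open at a given instant. The helper &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tc1_gate_open_at&lt;/code&gt; is hypothetical, and the time is assumed to be given in nanoseconds elapsed since the configured base time.&lt;/p&gt;

```shell
CYCLE_NS=150000000          # 100 ms + 50 ms, the two sched-entry intervals
TC1_OPENS_AT_NS=100000000   # TC1 is closed during the first 100 ms of each cycle

# Hypothetical helper: print 1 if the TC1 gate is open at the given time
# (nanoseconds elapsed since base-time), else 0.
tc1_gate_open_at() {
    offset=$(( $1 % CYCLE_NS ))
    if [ "$offset" -ge "$TC1_OPENS_AT_NS" ]; then
        echo 1
    else
        echo 0
    fi
}

tc1_gate_open_at 20000000     # 20 ms into a cycle: prints 0
tc1_gate_open_at 120000000    # 120 ms into a cycle: prints 1
tc1_gate_open_at 170000000    # 20 ms into the next cycle: prints 0
```

&lt;p&gt;With a 50 ms window in every 150 ms cycle, TC1 can transmit for at most one third of the time, no matter how idle the link is otherwise.&lt;/p&gt;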

&lt;p&gt;In this test, we demonstrate that TC1 traffic can opportunistically use TC0’s transmission queue when no TC0 packet has been seen for more than one second. For simplicity, we assign UDP packets with destination port 1000 to TC0, and all other traffic to TC1. We use UDP instead of TCP to avoid the influence of TCP’s congestion control mechanisms, which may affect the traffic rate.&lt;/p&gt;

&lt;p&gt;The switch’s behavior is controlled by the &lt;a href=&quot;https://github.com/DETERMINISTIC6G/det6g-blog-prog-soft-tsn-switch/blob/main/switch.p4&quot;&gt;switch.p4&lt;/a&gt; program. It contains multiple &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;control&lt;/code&gt; blocks to parse Ethernet, VLAN, IPv4, and UDP headers; perform basic routing; and dynamically adjust the PCP value of each packet.
We won’t cover all of these components here; instead, we focus on the most relevant part, the dynamic PCP adjustment, shown in the snippet below:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;//a register array with a single 48-bit element&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;//  that stores the timestamp of the most recent packet belonging to traffic class 0, TC0&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;register&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bit&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;48&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;last_tc0_packet_ts&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;control&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;myEgress&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;inout&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;headers&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hdr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;inout&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;metadata&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;meta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                 &lt;span class=&quot;n&quot;&gt;inout&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;standard_metadata_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std_data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;bit&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;48&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ts&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;bit&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;48&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;last_ts&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;apply&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;c1&quot;&gt;//add a VLAN header if the packet does not have one&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hdr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;vlan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;isValid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;){&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;hdr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;vlan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;setValid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;hdr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;vlan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;etherType&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hdr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ethernet&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;etherType&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;hdr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ethernet&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;etherType&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TYPE_VLAN&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

        &lt;span class=&quot;err&quot;&gt;@&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;atomic&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;c1&quot;&gt;//get ingress timestamp (in microseconds) of the current packet&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;ts&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std_data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ingress_global_timestamp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
            &lt;span class=&quot;c1&quot;&gt;//dynamically adjust VLAN PCP&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hdr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;udp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;isValid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hdr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;udp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dstPort&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1000&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;){&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;hdr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;vlan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pcp&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
                &lt;span class=&quot;c1&quot;&gt;// remember timestamp of the last packet of TC0&lt;/span&gt;
                &lt;span class=&quot;c1&quot;&gt;//   to the first element of the array&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;last_tc0_packet_ts&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;write&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ts&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
            &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
                &lt;span class=&quot;c1&quot;&gt;// get the timestamp from the first element of the array&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;last_tc0_packet_ts&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;read&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;last_ts&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
                &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ts&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;last_ts&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1000&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1000&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
                    &lt;span class=&quot;n&quot;&gt;hdr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;vlan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pcp&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
                &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt;
                    &lt;span class=&quot;n&quot;&gt;hdr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;vlan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pcp&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
            &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We begin by declaring a &lt;em&gt;global&lt;/em&gt; variable named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;last_tc0_packet_ts&lt;/code&gt; to record the ingress timestamp of the most recent TC0 packet observed.&lt;/p&gt;

&lt;p&gt;The core logic is implemented inside the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;apply&lt;/code&gt; block. It first ensures that a VLAN header is present in the output packet. Next, if the current packet is a UDP packet with destination port 1000, its PCP field is set to 0, indicating traffic class TC0. For all other packets, the switch compares the current packet’s ingress timestamp with the timestamp of the last observed TC0 packet. If more than 1 second has elapsed, the packet is treated as TC0 by setting its PCP to 0; otherwise, it is classified as TC1 by setting the PCP to 1.&lt;/p&gt;

&lt;p&gt;This logic is enclosed within an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;atomic&lt;/code&gt; block to ensure that the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;write&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;read&lt;/code&gt; operations on the global variable &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;last_tc0_packet_ts&lt;/code&gt; are executed sequentially and consistently.&lt;/p&gt;

&lt;p&gt;This mechanism effectively implements a simple, time-aware packet prioritization strategy directly in the programmable data plane, enabling more flexible traffic handling without relying on the control plane.&lt;/p&gt;
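
&lt;p&gt;To make the decision rule concrete, here is a minimal Python model of the classification described above. It is only a sketch and not the P4 program itself: the names &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;classify&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TC0_IDLE_TIMEOUT_US&lt;/code&gt; are ours, timestamps are assumed to be in microseconds (so that 1*1000*1000 corresponds to 1 second), and we assume that the stored timestamp is refreshed only by UDP packets with destination port 1000.&lt;/p&gt;

```python
TC0_IDLE_TIMEOUT_US = 1 * 1000 * 1000  # 1 second, assuming microsecond timestamps


def classify(is_udp, dst_port, ts_us, state):
    """Return the PCP value for one packet and update the shared state.

    `state` is a dict standing in for the one-element register
    last_tc0_packet_ts (a hypothetical Python model, not the P4 API).
    """
    if is_udp and dst_port == 1000:
        # Critical flow: always TC0, and remember when we saw it.
        state["last_tc0_packet_ts"] = ts_us
        return 0
    # Other traffic may borrow TC0 only after TC0 has been idle long enough.
    idle_us = ts_us - state["last_tc0_packet_ts"]
    return 0 if idle_us > TC0_IDLE_TIMEOUT_US else 1


state = {"last_tc0_packet_ts": 0}
print(classify(True, 1000, 5_000_000, state))  # critical flow
print(classify(True, 2000, 5_500_000, state))  # 0.5 s after the last TC0 packet
print(classify(True, 2000, 6_200_000, state))  # 1.2 s after the last TC0 packet
```

&lt;p&gt;In this model, the second packet is classified as TC1 because TC0 was seen recently, while the third one is promoted to TC0 because the idle time exceeds the timeout.&lt;/p&gt;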

&lt;p&gt;We need to compile the P4 code:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;p4c &lt;span class=&quot;nt&quot;&gt;--target&lt;/span&gt;  bmv2  &lt;span class=&quot;nt&quot;&gt;--arch&lt;/span&gt;  v1model switch.p4
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;After compiling, we get a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;switch.json&lt;/code&gt; file, which is used to start BMv2:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;simple_switch &lt;span class=&quot;nt&quot;&gt;-i&lt;/span&gt; 1@veth-tb &lt;span class=&quot;nt&quot;&gt;-i&lt;/span&gt; 2@veth-lb switch.json &amp;amp;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Start 2 iPerf servers:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;ip netns &lt;span class=&quot;nb&quot;&gt;exec &lt;/span&gt;listener iperf3 &lt;span class=&quot;nt&quot;&gt;--server&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;--port&lt;/span&gt; 1000 &lt;span class=&quot;nt&quot;&gt;--daemon&lt;/span&gt;
&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;ip netns &lt;span class=&quot;nb&quot;&gt;exec &lt;/span&gt;listener iperf3 &lt;span class=&quot;nt&quot;&gt;--server&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;--port&lt;/span&gt; 2000 &lt;span class=&quot;nt&quot;&gt;--daemon&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We use tcpdump to capture the output packets of the switch so we can check its traffic shaping:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;ip netns &lt;span class=&quot;nb&quot;&gt;exec &lt;/span&gt;listener tcpdump &lt;span class=&quot;nt&quot;&gt;-i&lt;/span&gt; veth-la &lt;span class=&quot;nt&quot;&gt;-w&lt;/span&gt; trace.pcap &lt;span class=&quot;nt&quot;&gt;--time-stamp-precision&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;nano &lt;span class=&quot;nt&quot;&gt;--snap&lt;/span&gt; 100 tcp or udp &amp;amp;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Then start the iPerf clients inside &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;talker&lt;/code&gt; namespace to generate UDP traffic:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;ip netns &lt;span class=&quot;nb&quot;&gt;exec &lt;/span&gt;talker bash &lt;span class=&quot;nt&quot;&gt;-c&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;iperf3 --client 10.0.0.2 --port 1000 --udp --time 3 &amp;amp; iperf3 --client 10.0.0.2 --port 2000 --udp --time 5&apos;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Stop tcpdump, then plot the traffic shaping:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;python3 ./plot.py
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We obtain the following figure which illustrates the arrival times of packets at the listener:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://blog.deterministic6g.eu/assets/images/arrival_time.png&quot; alt=&quot;Time-shaped traffic&quot; title=&quot;Time-shaped traffic&quot; class=&quot;align-center&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Each vertical line represents the arrival of a single packet. The traffic throughput for each traffic class is set to 1 Mbps, resulting in a total of 789 packets transmitted during the experiment.&lt;/p&gt;

&lt;p&gt;The results confirm that TAS is effective and functions as expected. During the first 3 seconds, both traffic classes, TC0 and TC1, are active. As configured in the TAPRIO qdisc, TC1 is transmitted only within its allocated 50 ms time slots in each 150 ms cycle, while TC0 packets are sent without restriction.&lt;/p&gt;

&lt;p&gt;After 3 seconds, TC0 traffic ends, leaving its time slots unused. During this period, TC1 continues to transmit only during its scheduled windows, resulting in the egress link being utilized at only one-third of its capacity.&lt;/p&gt;

&lt;p&gt;However, after 1 additional second, the system detects that TC0 has been inactive, and TC1 begins to utilize all available time slots, demonstrating the switch’s ability to opportunistically reuse idle transmission queues to improve bandwidth efficiency.&lt;/p&gt;</content><author><name>Huu-Nghia Nguyen</name></author><category term="P4" /><category term="Programmable" /><category term="Linux" /><category term="TSN" /><category term="networking" /><category term="edge" /><category term="cloud" /><category term="P4" /><category term="Programmable" /><category term="Linux" /><category term="TSN" /><category term="networking" /><category term="edge" /><category term="cloud" /><summary type="html">Time-Sensitive Networking (TSN) enhances traditional Ethernet with capabilities for reliable and predictable communication. It’s widely used in systems where precise timing and low latency are critical, such as industrial automation, automotive networks, and professional media streaming.</summary></entry><entry><title type="html">TSN on the Edge: Software TSN Bridge for the Edge Cloud</title><link href="https://blog.deterministic6g.eu/posts/2024/11/02/software_tsn_switch.html" rel="alternate" type="text/html" title="TSN on the Edge: Software TSN Bridge for the Edge Cloud" /><published>2024-11-02T00:00:00+01:00</published><updated>2024-11-02T00:00:00+01:00</updated><id>https://blog.deterministic6g.eu/posts/2024/11/02/software_tsn_switch</id><content type="html" xml:base="https://blog.deterministic6g.eu/posts/2024/11/02/software_tsn_switch.html">&lt;p&gt;In this blog post, we explain how to set up a software TSN bridge implementing time-aware shaping with Linux on an edge cloud server.&lt;/p&gt;

&lt;h1 id=&quot;tldr--takeaway-messages&quot;&gt;tl;dr – Takeaway Messages&lt;/h1&gt;

&lt;ul&gt;
  &lt;li&gt;Many networked real-time systems utilize edge computing servers to offload computations.&lt;/li&gt;
  &lt;li&gt;Time-Sensitive Networking (TSN) is one technology to support deterministic real-time communication with its Time-Aware Shaper to communicate with edge cloud servers.&lt;/li&gt;
  &lt;li&gt;Using virtualization technologies like containers or virtual machines (VM) on edge servers requires software bridges to connect containers, VMs, etc. The TSN network effectively extends onto the edge server.&lt;/li&gt;
  &lt;li&gt;The Time-Aware Priority Shaper (TAPRIO) is one technology to implement the Time-Aware Shaper on Linux software bridges.&lt;/li&gt;
  &lt;li&gt;In this blog post, we explain and demonstrate how to combine TAPRIO and Linux virtual bridges into software TSN bridges.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;motivation&quot;&gt;Motivation&lt;/h1&gt;

&lt;p&gt;Networked real-time systems often utilize an edge cloud computing environment to execute components on edge cloud servers. Prominent examples are networked control systems, where the controller of the control system is offloaded to an edge cloud server that communicates with the ‘plant’ consisting of sensors and actuators over the network as shown in the following figure.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://blog.deterministic6g.eu/assets/images/networked_control_system.png&quot; alt=&quot;Networked control system&quot; title=&quot;Networked control system&quot; class=&quot;align-center&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The DETERMINISTIC6G project describes a number of such applications in &lt;a href=&quot;https://deterministic6g.eu/images/deliverables/DETERMINISTIC6G-D1.1-v1.0.pdf&quot;&gt;this document&lt;/a&gt;, including:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Automated guided vehicles moving on a shop floor in a factory and communicating with edge cloud servers in the factory.&lt;/li&gt;
  &lt;li&gt;Exoskeletons assisting workers on a shop floor, which are remotely controlled from an edge cloud server in the factory.&lt;/li&gt;
  &lt;li&gt;Extended reality devices like Augmented Reality (AR) headsets offloading compute-intensive tasks like rendering of images to an edge server.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Real-time communication technologies are used to guarantee bounds on the network delay between plant and controller. Time-Sensitive Networking (TSN) is a popular technology to implement real-time communication over IEEE 802.3 (Ethernet) networks consisting of TSN bridges. In particular, the so-called Time-Aware Shaper (TAS) implemented by bridges is able to guarantee very low bounds on network delay and delay variation (jitter).&lt;/p&gt;

&lt;p&gt;Today, edge cloud servers often utilize virtualization technologies such as virtual machines (VM) or containers to host and isolate application components such as the controller of a networked control system as shown in the figure above. Typically, VMs or containers are connected to the physical network through a software bridge. That is, the network does not end at the network interface of the host, but extends onto the host up to the network interfaces of the VMs or containers, including in particular the software bridge. Consequently, real-time end-to-end communication must also include the software bridge. In particular, like the hardware bridges, the software bridge should also implement the TAS.&lt;/p&gt;

&lt;p&gt;The Linux operating system comes with an implementation of the TAS called the Time-Aware Priority Shaper (TAPRIO) which can be combined with the Linux software bridge to implement a software TSN bridge. However, the usage of TAPRIO and in particular the combination of TAPRIO and software bridges is not trivial. Therefore, the goal of this blog post is to explain how to implement a software TSN bridge with TAPRIO with Linux.&lt;/p&gt;

&lt;p&gt;If you have never heard of TSN or the Time-aware Shaper, you should start reading the following TSN background section first, where we give a brief overview of the TAS. All TSN experts can safely skip this section and directly jump to the description of TAPRIO, the Linux queuing discipline implementing the Time-aware Shaper. Finally, we will show how to integrate a Linux bridge with TAPRIO into a software TSN bridge.&lt;/p&gt;

&lt;h1 id=&quot;tsn-background-time-aware-shaper-for-scheduled-traffic&quot;&gt;TSN Background: Time-aware Shaper for Scheduled Traffic&lt;/h1&gt;

&lt;p&gt;Time-sensitive Networking (TSN) is a collection of IEEE standards to enable real-time communication over IEEE 802.3 networks (Ethernet). Although several real-time Ethernet technologies have existed for some time, TSN now brings real-time communication to standard Ethernet as defined by IEEE. With TSN, a TSN-enabled Ethernet can now transport both real-time and non-real-time traffic over one converged network.&lt;/p&gt;

&lt;p&gt;At the center of the TSN standards are different so-called shapers, which some people would call schedulers, and others queuing disciplines, so don’t be confused if we use these words interchangeably. Deterministic real-time communication with very low delay and jitter is the realm of the so-called Time-aware Shaper (TAS). Basically, the TAS implements a TDMA scheme, by giving packets (or frames as they are called on the data link layer) of different traffic classes access to the medium within different time slots. To understand the technical details better, let’s have a look at how a packet traverses a bridge. The following figure shows a simplified but sufficiently accurate view onto the data path of a TSN bridge.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;          incoming packet from ingress network interface
                             |
                             v
+------------------------------------------------------------+
|                      Forwarding Logic                      +
+------------------------------------------------------------+
               | output on port 1   ...    | output on port n       
               v                           v
+--------------------------------+
+          Classifier            +
+--------------------------------+
    |          |             |
    v          v             v
+-------+  +-------+     +-------+  
|       |  |       |     |       |
| Queue |  | Queue | ... | Queue |
|  TC0  |  |  TC1  |     |  TC7  |  
|       |  |       |     |       |
+-------+  +-------+     +-------+
    |          |             |         +-------------------+ 
    v          v             v         | Gate Control List |
+-------+  +-------+     +-------+     | t1 10000000       |  
| Gate  |&amp;lt;-| Gate  | ... | Gate  |&amp;lt;----| t2 01111111       |
+-------+  +-------+     +-------+     | ...               |
    |          |             |         | repeat            |
    v          v             v         +-------------------+
+--------------------------------+     
|     Transmission Selection     |     
+--------------------------------+
               |
	       v
outgoing packet to egress network interface 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;First, the packet enters the bridge through the incoming port or the network interface controller (NIC). Then, the forwarding logic decides on which outgoing port to forward the packet. So far, this is not different from an ordinary bridge.&lt;/p&gt;

&lt;p&gt;Then comes the more interesting part from the point of view of a TSN switch. For the following discussion, we zoom into one outgoing port (this part of the figure should be replicated n times, once for each outgoing port). First, the classifier decides which traffic class the packet belongs to. To this end, the VLAN tag of the packet contains a three-bit Priority Code Point (PCP) field. So it should not come as a big surprise that eight different traffic classes are supported, each having its own outgoing FIFO queue, i.e., eight queues per outgoing port.&lt;/p&gt;

&lt;p&gt;The TAS controls when packets from which class are eligible for forwarding. Behind each queue is a gate. If the gate of a queue is open, the first packet in the queue is eligible for transmission. If the gate is closed, the queue cannot transmit. Whether a gate is open or closed is defined by the time schedule stored in the Gate Control List (GCL). Each entry in the GCL has a timestamp defining the time when the state of the gates should change to a given state. For instance the entry &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t1 10000000&lt;/code&gt; says that at time t1 gate 0 should be open (1) and gates 1-7 should be closed (0). After the end of the schedule, the schedule repeats in a cyclic fashion, i.e., t1, t2, etc. define relative times with respect to the start of a cycle. For a defined behavior, the clocks of all switches need to be synchronized, so all switches refer to the same cycle base time with their schedules. This is the job of the PTP (Precision Time Protocol).&lt;/p&gt;

&lt;p&gt;The idea is that gates along the path of a time-sensitive packet are opened and closed such that upper bounds on the end-to-end network delay and jitter can be guaranteed despite concurrent traffic, which might need to wait behind closed gates. How to calculate schedules that guarantee a desired upper bound on end-to-end network delay and jitter is out of the scope of the IEEE standard. Actually, it is a hard problem and subject to active research. We will not go into detail here, but refer the interested reader to a &lt;a href=&quot;https://arxiv.org/abs/2211.10954&quot;&gt;recent survey&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;It is also allowed to open multiple gates at the same time. For instance, the scheduling entry &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t2 01111111&lt;/code&gt; would open 7 gates all at the same time. Then Transmission Selection will decide which queue with an open gate is allowed to transmit a packet, e.g., using strict priority queuing.&lt;/p&gt;
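
&lt;p&gt;The interplay of gates and Transmission Selection can be sketched in a few lines of Python. This is an illustrative model only: the function names are ours, the gate string uses the notation of the figure (leftmost character is gate 0), and we assume strict priority in which the highest-numbered traffic class with an open gate and a waiting packet wins.&lt;/p&gt;

```python
def open_gates(gate_states):
    """Return the indices of open gates; the leftmost character is gate 0,
    matching the notation of the figure (e.g. "10000000")."""
    return [i for i, bit in enumerate(gate_states) if bit == "1"]


def select_queue(gate_states, queue_lengths):
    """Pick the queue that may transmit next, or None if nothing is eligible.

    Assumption for this sketch: strict priority, where the highest-numbered
    traffic class with an open gate and a waiting packet wins.
    """
    eligible = [tc for tc in open_gates(gate_states) if queue_lengths[tc] > 0]
    return max(eligible) if eligible else None


queues = [3, 0, 5, 0, 0, 0, 0, 1]        # packets waiting in TC0..TC7
print(select_queue("10000000", queues))  # only gate 0 is open
print(select_queue("01111111", queues))  # gates 1-7 are open
```

&lt;p&gt;With the first entry only TC0 may send, even though packets are waiting in other queues; with the second entry TC0 has to wait and the choice is made among the queues behind open gates.&lt;/p&gt;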

&lt;h1 id=&quot;the-linux-time-aware-priority-shaper&quot;&gt;The Linux Time-aware Priority Shaper&lt;/h1&gt;

&lt;p&gt;The Time-aware Shaper (TAS) as defined by IEEE standards introduced in the previous section is implemented by the Linux Queuing Discipline (QDisc) Time-aware Priority Shaper (TAPRIO). QDiscs are a powerful Linux concept to arrange packets to be output over a network interface to which the QDisc is attached. You can even define chains of QDiscs that a packet passes through on its way to the egress interface. Next, we describe briefly how to configure TAPRIO for a network interface.&lt;/p&gt;

&lt;p&gt;The configuration of a QDisc is done with the tc (traffic control) tool. Let’s assume that we want to set up TAPRIO for all traffic leaving through the network interface enp2s0f1. Then, the tc command could look as follows:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ tc qdisc replace dev enp2s0f1 parent root handle 100 taprio \
num_tc 2 \
map 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 \
queues 1@0 1@1 \
base-time 1554445635681310809 \
sched-entry S 01 800000 sched-entry S 02 200000 \
clockid CLOCK_TAI
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Here, we replace the existing QDisc (maybe the default one) of the device enp2s0f1 by a TAPRIO QDisc, which is placed right at the root of the device. We need to provide a unique handle (100) for this QDisc.&lt;/p&gt;

&lt;p&gt;We define two traffic classes (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;num_tc 2&lt;/code&gt;). As you have seen above, an IEEE switch might have queues for up to 8 traffic classes. TAPRIO supports up to 16 traffic classes, although your NIC then also would need as many transmit (TX) queues (see below).&lt;/p&gt;

&lt;p&gt;Then, we need to define how to classify packets, i.e., how to assign packets to traffic classes. To this end, TAPRIO uses the priority field of the sk_buff structure (socket buffer, SKB for short). The SKB is the internal kernel data structure for managing packets. Since the SKB is a kernel structure, you cannot directly set it from user space. One way of setting the priority field of an SKB from user space is to use the SO_PRIORITY socket option by the sending application. Another option, which is particularly important for a bridge, is to map the PCP value from a packet to the SKB priority as shown further below. For now, let’s assume the priority is set somehow. Then, the map parameter defines the mapping of SKB priority values to traffic classes (TC) using a vector of positional parameters (a bit vector in our example, since there are only two traffic classes). You can read the bit vector &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1&lt;/code&gt; as follows: map priority 0 (first bit from the left) to TC1, priority 1 to TC0, and priorities 2-15 to TC1 (16 mappings for the 16 possible SKB priority values).&lt;/p&gt;
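
&lt;p&gt;If it helps, the positional reading of the map parameter can be written down as a tiny Python helper. The function name &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;parse_prio_map&lt;/code&gt; is ours and only illustrates the lookup; it is not part of tc.&lt;/p&gt;

```python
def parse_prio_map(map_arg):
    """Turn the positional `map` argument into {skb_priority: traffic_class}."""
    classes = [int(tc) for tc in map_arg.split()]
    return dict(enumerate(classes))


prio_to_tc = parse_prio_map("1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1")
# Traffic classes for SKB priorities 0, 1 and 15
print(prio_to_tc[0], prio_to_tc[1], prio_to_tc[15])
```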

&lt;p&gt;Next, we map traffic classes to TX queues of the network device. Modern network devices typically implement more than one TX queue for outgoing traffic. You can find out how many TX queues your device supports with the following command:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ ls /sys/class/net/enp2s0f1/queues/
rx-0 rx-1 rx-2 rx-3 rx-4 rx-5 rx-6 rx-7 tx-0 tx-1 tx-2 tx-3 tx-4 tx-5 tx-6 tx-7
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Here, the network device enp2s0f1 supports 8 TX queues.&lt;/p&gt;

&lt;p&gt;The parameter &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;queues 1@0 1@1&lt;/code&gt; of TAPRIO reads like this: The first entry &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1@0&lt;/code&gt; defines the mapping of the first traffic class (TC0) to TX queues, the second entry &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1@1&lt;/code&gt; the mapping of the second traffic class (TC1), and so on. Each entry defines a range of queues to which the traffic class should be mapped using the schema queue_count@queue_offset. That is, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1@0&lt;/code&gt; means map (the traffic class) to 1 TX queue starting at queue index 0, i.e., queue range [0,0]. The second class is also mapped to 1 queue at queue index 1 (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1@1&lt;/code&gt;). You can also map one traffic class to several TX queues by increasing the count parameter beyond 1. Make sure that queue ranges do not overlap.&lt;/p&gt;
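
&lt;p&gt;The count@offset schema can be checked with a short Python sketch that expands each entry into the TX queue indices it covers and rejects overlapping ranges. The helper &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;parse_queues&lt;/code&gt; is a hypothetical name used for illustration only.&lt;/p&gt;

```python
def parse_queues(queues_arg):
    """Map each traffic class to its TX queue indices.

    Each entry has the form count@offset; entry i belongs to traffic class i.
    """
    ranges = {}
    used = set()
    for tc, entry in enumerate(queues_arg.split()):
        count, offset = (int(part) for part in entry.split("@"))
        queue_ids = list(range(offset, offset + count))
        if used.intersection(queue_ids):
            raise ValueError(f"queue range of TC{tc} overlaps an earlier range")
        used.update(queue_ids)
        ranges[tc] = queue_ids
    return ranges


print(parse_queues("1@0 1@1"))  # the mapping from the example above
print(parse_queues("2@0 2@2"))  # two TX queues per traffic class
```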

&lt;p&gt;Next, we define the schedule of the TAS implemented by TAPRIO. First of all, we need to define a base time as a reference for the cyclic schedule. Every scheduling cycle starts at base_time + k*cycle_time. The cycle time (duration of the cycle until it repeats) is implicitly defined by the sum of the times (interval durations) of the schedule entries (see below), in our example 800000 ns + 200000 ns = 1000000 ns = 1 ms. The base time is defined in nanoseconds according to some clock. The reference clock to be used is defined by parameter clockid. CLOCK_TAI is the International Atomic Time. The advantages of TAI are: TAI is not adjusted by leap seconds in contrast to CLOCK_REALTIME, and TAI refers to a well-defined starting time in contrast to CLOCK_MONOTONIC.&lt;/p&gt;

&lt;p&gt;Finally, we need to define the entries of the Gate Control List, i.e., the points in time when gates should open or close (or in other words: the time intervals during which gates are open or closed). For instance, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sched-entry S 01 800000&lt;/code&gt; says that the gate of TC0 (least significant bit in bit vector) opens at the start of the cycle for 800000 ns duration, and all other gates are closed for this interval. Then, 800000 ns after the start of the cycle, the entry &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sched-entry S 02 200000&lt;/code&gt; defines that the gate of TC1 (second bit of bit vector) opens for 200000 ns, and all other gates are closed.&lt;/p&gt;
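
&lt;p&gt;To double-check a schedule before installing it, the following Python sketch decodes the schedule entries from the example, computes the cycle time, and reports which gates are open at a given time. The function names are ours, and we assume that the gate mask is written in hexadecimal, as the examples in the tc-taprio man page suggest.&lt;/p&gt;

```python
def parse_sched_entries(entries):
    """Parse (gate_mask_hex, duration_ns) pairs of `sched-entry S ...` items.

    Bit i of the mask (least significant bit = TC0) tells whether the gate
    of traffic class i is open during the interval.
    """
    schedule = []
    for mask_hex, duration_ns in entries:
        mask = int(mask_hex, 16)
        open_tcs = [tc for tc in range(16) if (mask >> tc) % 2 == 1]
        schedule.append((open_tcs, duration_ns))
    return schedule


def cycle_time_ns(schedule):
    """The cycle time is the sum of all interval durations."""
    return sum(duration_ns for _, duration_ns in schedule)


def open_gates_at(schedule, base_time_ns, now_ns):
    """Return the traffic classes whose gates are open at time now_ns."""
    offset_ns = (now_ns - base_time_ns) % cycle_time_ns(schedule)
    for open_tcs, duration_ns in schedule:
        if duration_ns > offset_ns:
            return open_tcs
        offset_ns -= duration_ns
    return []


base = 1554445635681310809
schedule = parse_sched_entries([("01", 800000), ("02", 200000)])
print(cycle_time_ns(schedule))                       # cycle time in ns
print(open_gates_at(schedule, base, base + 500000))  # inside the first interval
print(open_gates_at(schedule, base, base + 900000))  # inside the second interval
```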

&lt;h1 id=&quot;software-tsn-bridge&quot;&gt;Software TSN Bridge&lt;/h1&gt;

&lt;p&gt;Now that we know how to use the TAPRIO QDisc, we next set up a software TSN bridge. The software TSN bridge integrates three parts:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;A software bridge taking care of forwarding packets to the right outgoing port.&lt;/li&gt;
  &lt;li&gt;Per outgoing switch port (network interface) a TAPRIO QDisc.&lt;/li&gt;
  &lt;li&gt;A mapper for mapping PCP values to SKB priorities as used by TAPRIO for classifying packets.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As an example, we would like to implement the following scenario with two network namespaces hosting applications—network namespaces are typically used by containers, however, we could also use VMs:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://blog.deterministic6g.eu/assets/images/software_tsn_bridge_scenario.png&quot; alt=&quot;Scenario&quot; title=&quot;Scenario&quot; class=&quot;align-center&quot; /&gt;&lt;/p&gt;

&lt;p&gt;If you want to follow this example, you can set up the namespaces as follows:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ sudo ip netns add talker
$ sudo ip netns add listener
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We set up a software bridge called vbridge and bring it up:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ sudo ip link add name vbridge type bridge
$ sudo ip link set dev vbridge up
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We assume that each namespace is attached to the virtual bridge with a virtual Ethernet (veth) interface. veth interfaces actually come as pairs of devices, like a virtual cable with two ends. One end (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;veth-...-a&lt;/code&gt;) will be attached to the container (namespace), the other end (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;veth-...-b&lt;/code&gt;) to the virtual bridge, like you would connect a physical host to a physical bridge. We also need to make sure that veth devices with a TAPRIO QDisc have the required number of TX queues as mentioned above using option &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;numtxqueues&lt;/code&gt;. In our system, only the virtual TSN bridge uses TAPRIO to schedule packets towards the listener namespace. Therefore, we only set the number of TX queues for the bridge-side veth device &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;veth-listener-b&lt;/code&gt; towards the listener:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ sudo ip link add veth-talker-a type veth peer name veth-talker-b
$ sudo ip link add veth-listener-a numtxqueues 8 type veth peer name veth-listener-b numtxqueues 8
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Since TSN requires VLAN tags to carry PCP values, we also create VLAN interfaces (VLAN id 100) for one end of the virtual cable (the end attached to the namespace). All packets coming out of this VLAN device (from the namespace to the bridge) will carry a VLAN tag; for all incoming packets (from the bridge to the container), the VLAN tag is removed. The sending application (called talker in the following) defines SKB priorities as described above using the SO_PRIORITY socket option. This SKB priority is mapped to the PCP value of the VLAN header using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;egress&lt;/code&gt; option of the VLAN device. This ensures that all packets arriving at the bridge have a VLAN tag with a defined PCP value. This also applies to packets coming from the physical network outside of the host, i.e., the bridge receives VLAN-tagged packets with a PCP field over all attached interfaces:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ sudo ip link add link veth-talker-a name veth-t-a.100 type vlan id 100
$ sudo ip link add link veth-listener-a name veth-l-a.100 type vlan id 100  
$ sudo ip link set veth-t-a.100 type vlan egress 0:0 1:1
$ sudo ip link set veth-l-a.100 type vlan egress 0:0 1:1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
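
&lt;p&gt;On the application side, setting the SKB priority could look like the following Python sketch. It assumes a Linux host (SO_PRIORITY is Linux-specific); the function name &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;make_talker_socket&lt;/code&gt; and the address and port in the comment are placeholders of ours and are not taken from the setup.&lt;/p&gt;

```python
import socket


def make_talker_socket(priority):
    """Create a UDP socket whose packets carry the given SKB priority.

    SO_PRIORITY is Linux-specific; values 0 to 6 need no extra privileges.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_PRIORITY, priority)
    return sock


sock = make_talker_socket(1)
print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_PRIORITY))
# With the egress map 1:1, frames sent through the VLAN device would then
# carry PCP 1. A send could look like this (placeholder address and port):
# sock.sendto(b"hello", ("10.0.1.2", 6789))
sock.close()
```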

&lt;p&gt;We also assign a physical network interface (enp2s0f0) to the bridge to connect the bridge to the physical network of the edge server. We put this interface into promiscuous mode, so the virtual bridge will see all incoming packets (similar to physical bridges):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ sudo ip link set dev enp2s0f0 promisc on
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Then, we assign all network interfaces to the bridge and to the containers, respectively:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ sudo ip link set veth-t-a.100 netns talker
$ sudo ip link set veth-talker-b master vbridge
$ sudo ip link set veth-l-a.100 netns listener
$ sudo ip link set veth-listener-b master vbridge
$ sudo ip link set enp2s0f0 master vbridge
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We must also bring all interfaces up:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ sudo ip link set veth-talker-a up
$ sudo ip link set veth-talker-b up
$ sudo ip link set veth-listener-a up
$ sudo ip link set veth-listener-b up
$ sudo ip netns exec talker ip link set veth-t-a.100 up
$ sudo ip netns exec listener ip link set veth-l-a.100 up
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;To perform traffic shaping on egress traffic, we need to assign TAPRIO QDiscs to all network interfaces shaping traffic in egress direction. In our example, we only define a TAPRIO QDisc for the virtual bridge interface towards the listener namespace to demonstrate the idea (in practice, all interfaces attached to the TSN bridge might have a TAPRIO QDisc). SKB priority 0 is mapped to traffic class 0; SKB priority 1 is mapped to traffic class 1 (all other SKB priorities are mapped to traffic class 0). The gate for traffic class 0 is always open (least significant bit always set); the gate for traffic class 1 is 100 ms closed and 50 ms open (cycle time is 150 ms):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ sudo tc qdisc replace dev veth-listener-b parent root handle 100 taprio \
num_tc 2 \
map 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \
queues 1@0 1@1 \
base-time 1554445635681310809 \
sched-entry S 01 100000000 sched-entry S 03 50000000 \
clockid CLOCK_TAI
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
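
&lt;p&gt;To double-check how the gate masks and intervals are read, the following Python sketch replays the two schedule entries from the command above. It is only an illustration and not part of the setup: the helper names are made up for this example, and it assumes that the queried time is not earlier than the base time.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;from bisect import bisect_right
from itertools import accumulate

# Gate control list from the tc command above: (gate mask as hex text, interval in ns).
SCHED_ENTRIES = [('01', 100_000_000), ('03', 50_000_000)]
BASE_TIME_NS = 1554445635681310809
NUM_TC = 2


def gate_is_open(mask_hex, tc):
    # Bit i of the mask, counted from the least significant bit, controls traffic class i.
    bits = format(int(mask_hex, 16), 'b').zfill(tc + 1)
    return bits[-1 - tc] == '1'


def cycle_time_ns(entries):
    # The cycle time is the sum of all entry intervals.
    return sum(interval for _, interval in entries)


def open_time_ns(entries, tc):
    # Total time per cycle during which the gate of traffic class tc is open.
    return sum(interval for mask, interval in entries if gate_is_open(mask, tc))


def open_classes_at(t_ns, entries=SCHED_ENTRIES, base_time_ns=BASE_TIME_NS):
    # The schedule repeats every cycle, starting at the base time.
    offset = (t_ns - base_time_ns) % cycle_time_ns(entries)
    # End offset of each entry within the cycle, used to find the active entry.
    ends = list(accumulate(interval for _, interval in entries))
    mask, _ = entries[bisect_right(ends, offset)]
    return [tc for tc in range(NUM_TC) if gate_is_open(mask, tc)]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;For this schedule, the cycle time evaluates to 150 ms, traffic class 0 is open for the whole cycle, and traffic class 1 is open only during the last 50 ms of each cycle.&lt;/p&gt;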

&lt;p&gt;Finally, we need to remember that the virtual bridge receives VLAN-tagged packets with PCP values, but TAPRIO needs SKB priorities to classify packets, i.e., we need to perform a mapping from PCP values to SKB priorities. An elegant method that works without removing the VLAN tag is to use a so-called ingress QDisc on all interfaces of the virtual bridge together with the action &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;skbedit&lt;/code&gt;. This action does exactly what its name suggests: it modifies the SKB. In our case, we specify a mapping from PCP value 1 (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;vlan_prio 1&lt;/code&gt;) to SKB priority 1 (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;priority 1&lt;/code&gt;). For the sake of a short description, we only do this for the interface from namespace &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;talker&lt;/code&gt;, but you would typically do this on all interfaces of the bridge:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ sudo tc qdisc add dev veth-talker-b ingress
$ sudo tc filter add dev veth-talker-b ingress prio 1 protocol 802.1Q flower vlan_prio 1 action skbedit priority 1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h1 id=&quot;test&quot;&gt;Test&lt;/h1&gt;

&lt;p&gt;Finally, we can test the software TSN bridge that we have set up above. We start a talker in namespace &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;talker&lt;/code&gt; sending UDP messages as fast as possible to the listener in namespace &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;listener&lt;/code&gt; with SKB priority 1, which will be mapped to PCP 1 by the VLAN device. PCP 1 will be mapped by the ingress QDisc to SKB priority 1 at the virtual bridge. TAPRIO maps SKB priority 1 to traffic class 1. The gate for traffic class 1 is open for 50 ms and closed for 100 ms. Therefore, we would expect to see this pattern in the traffic arriving at the listener.&lt;/p&gt;

&lt;p&gt;We start root shells in the talker and listener namespaces:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ sudo ip netns exec talker /bin/bash
$ sudo ip netns exec listener /bin/bash
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Then set private IP addresses for talker and listener:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;(talker) $ ip address add 10.0.1.1/24 dev veth-t-a.100
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;(listener) $ ip address add 10.0.1.2/24 dev veth-l-a.100
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;On the listener side, we use netcat to receive all packets sent to port 6666 and discard them:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;(listener) $ nc -u -l -p 6666 &amp;gt; /dev/null
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;On the talker side, we use a custom C application, which sends UDP packets at a rate of 1000 pkt/s and sets the priority to 1 using the socket option SO_PRIORITY (an alternative application that can also set the required SO_PRIORITY socket option would be socat).&lt;/p&gt;
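
&lt;p&gt;The custom C application is not shown here. As a rough stand-in, the following Python sketch sets the same socket option and paces UDP datagrams with a simple sleep loop. The function names, the payload size, and the pacing method are assumptions made for this illustration, and software pacing like this is only approximate. It is Linux-only, since it relies on SO_PRIORITY.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;import socket
import time


def make_priority_socket(priority):
    # UDP socket whose packets are sent with the given SKB priority (Linux only).
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_PRIORITY, priority)
    return sock


def send_paced(sock, dst, rate_pps, count, payload=b'x' * 64):
    # Send count datagrams to dst, spaced 1/rate_pps seconds apart.
    interval = 1.0 / rate_pps
    next_send = time.monotonic()
    for _ in range(count):
        sock.sendto(payload, dst)
        next_send += interval
        # Sleep until the next deadline; never pass a negative value to sleep().
        time.sleep(max(0.0, next_send - time.monotonic()))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Inside the talker namespace, calling &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;send_paced(make_priority_socket(1), ('10.0.1.2', 6666), 1000, 10000)&lt;/code&gt; would send roughly ten seconds of traffic with priority 1 to the listener.&lt;/p&gt;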

&lt;p&gt;Traffic is captured with tcpdump at the virtual listener interface, i.e., before VLAN tags are removed, so we can also check the correct tagging of packets with VLAN ID 100 and PCP value 1:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ sudo tcpdump -i veth-listener-a -w trace.pcap --time-stamp-precision=nano
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The following figure shows the arrival times of packets at the listener. We draw a vertical line whenever a packet of the stream is received by the listener (the individual lines blend together at higher data rates).&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://blog.deterministic6g.eu/assets/images/taprio-traffic.png&quot; alt=&quot;Time-shaped traffic&quot; title=&quot;Time-shaped traffic&quot; class=&quot;align-center&quot; /&gt;&lt;/p&gt;

&lt;p&gt;As we can see, the packets arrive with the anticipated pattern in bursts of 50 ms length (gate for prio 1 open), so time-aware shaping is effective and working correctly.&lt;/p&gt;

&lt;p&gt;Using Wireshark, we can also inspect the packets received at the listener interface &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;veth-listener-a&lt;/code&gt;, i.e., before VLAN tags are removed:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://blog.deterministic6g.eu/assets/images/taprio-wireshark.png&quot; alt=&quot;Packet inspection with Wireshark&quot; title=&quot;Packet inspection with Wireshark&quot; class=&quot;align-center&quot; /&gt;&lt;/p&gt;

&lt;p&gt;We can see that packets indeed carry a VLAN tag with ID 100, and the PCP value is 1, so the VLAN tagging and PCP mapping also work as intended.&lt;/p&gt;</content><author><name>Frank Dürr</name></author><category term="Linux" /><category term="TSN" /><category term="networking" /><category term="edge" /><category term="cloud" /><category term="Linux" /><category term="TSN" /><category term="networking" /><category term="edge" /><category term="cloud" /><summary type="html">In this blog post, we explain how to set up a software TSN bridge implementing time-aware shaping with Linux on an edge cloud server.</summary></entry><entry><title type="html">Network Delay Emulator: Emulating the Characteristic 5G/6G Network Delay with Linux</title><link href="https://blog.deterministic6g.eu/posts/2024/10/26/network_delay_emulator.html" rel="alternate" type="text/html" title="Network Delay Emulator: Emulating the Characteristic 5G/6G Network Delay with Linux" /><published>2024-10-26T01:00:00+02:00</published><updated>2024-10-26T01:00:00+02:00</updated><id>https://blog.deterministic6g.eu/posts/2024/10/26/network_delay_emulator</id><content type="html" xml:base="https://blog.deterministic6g.eu/posts/2024/10/26/network_delay_emulator.html">&lt;p&gt;In this blog post, we present the DETERMINISTIC6G Network Delay Emulator for Linux.&lt;/p&gt;

&lt;h1 id=&quot;tldr--takeaway-messages&quot;&gt;tl;dr – Takeaway Messages&lt;/h1&gt;

&lt;ul&gt;
  &lt;li&gt;The performance of networked real-time systems depends on the network delay between distributed components.&lt;/li&gt;
  &lt;li&gt;Novel networked real-time systems require wireless communication, using for instance 5G/6G mobile networks.&lt;/li&gt;
  &lt;li&gt;The &lt;a href=&quot;https://github.com/DETERMINISTIC6G/NetworkDelayEmulator&quot;&gt;DETERMINISTIC6G Network Delay Emulator&lt;/a&gt; for Linux is an open-source tool to evaluate the performance of networked real-time systems with emulated characteristic network delay, including the possibility to use &lt;a href=&quot;https://github.com/DETERMINISTIC6G/deterministic6g_data&quot;&gt;open data&lt;/a&gt; from delay measurements in real 5G networks.&lt;/li&gt;
  &lt;li&gt;This blog post presents the DETERMINISTIC6G Network Delay Emulator, its usage, and a showcase (evaluation of an inverted pendulum).&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;motivation&quot;&gt;Motivation&lt;/h1&gt;

&lt;p&gt;Networked real-time systems are typically sensitive to network delay, i.e., their performance and safety depend on the delay of communicating messages between distributed system components. Moreover, many novel networked real-time systems include mobile devices that communicate with remote components over a wireless network such as 5G or future 6G mobile networks. The DETERMINISTIC6G project describes a number of such applications in &lt;a href=&quot;https://deterministic6g.eu/images/deliverables/DETERMINISTIC6G-D1.1-v1.0.pdf&quot;&gt;this document&lt;/a&gt;, including:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Automated guided vehicles moving on a shop floor in a factory and communicating with machines and edge cloud servers in the factory.&lt;/li&gt;
  &lt;li&gt;Smart farming where, for instance, drones monitor the field in front of a harvester to protect animals.&lt;/li&gt;
  &lt;li&gt;Exoskeletons assisting workers on a shop floor, which are remotely controlled from an edge cloud server in the factory.&lt;/li&gt;
  &lt;li&gt;Extended reality devices like Augmented Reality (AR) headsets displaying remotely rendered images.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One major challenge comes from the fact that the characteristic network delay of 5G/6G mobile networks is fundamentally different from wired networks, such as a wired Ethernet network. The following figure compares the delay measured for a single TSN Ethernet bridge (left) to the delay of a wireless 5G bridge (right). Open data of these and other delay measurements from the DETERMINISTIC6G project can be found in &lt;a href=&quot;https://github.com/DETERMINISTIC6G/deterministic6g_data&quot;&gt;this GitHub repository&lt;/a&gt;. The delay of the wireless bridge is substantially larger (milliseconds vs. microseconds), stochastic in nature, and heavy-tailed, i.e., the probability of large delay values decreases more slowly than exponentially.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://blog.deterministic6g.eu/assets/images/port-to-port-delay.png&quot; alt=&quot;Port-to-port delay&quot; title=&quot;Port-to-port delay of wired bridge (left) compared to wireless bridge (right)&quot; class=&quot;align-center&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This brings up the crucial question: How would a networked real-time system perform with such a given network delay? For instance, would a networked control system still be stable? What is the Quality of Control (QoC)? Or what Quality of Experience (QoE) would users of a distributed XR system have with such a network delay?&lt;/p&gt;

&lt;p&gt;Answering these and other questions requires tools to evaluate the performance with characteristic network delay distributions. Network delay emulation is an attractive method to this end, since it allows for testing the real application with an emulated network behaving like a real network. In particular, we want to emulate the characteristic end-to-end network delay of a 5G/6G (or other) network, and see how the real application under test behaves. Testing the real application (if available) is a huge benefit, since building simulation models of the application, such as the plant of a control system (e.g., the exoskeleton mentioned above) or even the user in the XR use case, might be very difficult. Instead, we can use the real application “under test” and expose it to the emulated network delay as shown in the example of a networked control system in the following figure. On the left-hand side and right-hand side, we see real system components, namely the plant of the control system (mobile exoskeleton, drone, automated guided vehicle, etc.), and the controller hosted in an edge cloud environment, respectively. The wireless 5G/6G network is emulated and induces the characteristic network delay of the wireless network.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://blog.deterministic6g.eu/assets/images/network_emulation.png&quot; alt=&quot;Network emulation&quot; title=&quot;Network emulation&quot; class=&quot;align-center&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The &lt;a href=&quot;https://github.com/DETERMINISTIC6G/NetworkDelayEmulator&quot;&gt;DETERMINISTIC6G Network Delay Emulator&lt;/a&gt; is an open-source network delay emulator for Linux. In particular, it can emulate network delay from histograms captured through measurements in real networks, such as the openly available &lt;a href=&quot;https://github.com/DETERMINISTIC6G/deterministic6g_data&quot;&gt;DETERMINISTIC6G delay measurement data&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Next, we describe the architecture and usage of the Network Delay Emulator, and showcase it with a textbook example of a networked control system: an inverted pendulum. For more details on how to compile and use the emulator, please visit the &lt;a href=&quot;https://github.com/DETERMINISTIC6G/NetworkDelayEmulator&quot;&gt;GitHub page of the DETERMINISTIC6G Network Delay Emulator&lt;/a&gt;.&lt;/p&gt;

&lt;h1 id=&quot;architecture-of-the-network-delay-emulator&quot;&gt;Architecture of the Network Delay Emulator&lt;/h1&gt;

&lt;p&gt;The core of the emulator is a Linux Queueing Discipline (QDisc) called sch_delay that can be assigned to network interfaces to add artificial delay to all packets leaving through this network interface.&lt;/p&gt;

&lt;p&gt;The following figure shows the system architecture consisting of two major parts: the QDisc running in kernel space, and a user-space application providing individual delays for each transmitted packet through a character device. The provided delays are buffered in the QDisc, such that delay values are available immediately when new packets arrive. Whenever a packet is to be transmitted through the network interface, the next delay value is dequeued and applied to the packet before passing it on to the network interface (TX queue).&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://blog.deterministic6g.eu/assets/images/network-emulator-architecture.png&quot; alt=&quot;Network Emulator Architecture&quot; title=&quot;Network Emulator Architecture&quot; class=&quot;align-center&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Providing delays through a user-space application allows for a flexible and convenient definition of delays without touching any kernel code. The project contains a sample user-space application implemented in Python to define delays as constant values, normal distributions (probability density function), or histograms. This application can be easily extended to calculate other delay distributions.&lt;/p&gt;
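
&lt;p&gt;To give an idea of what drawing delays from a histogram can look like, here is a minimal Python sketch. It is not the code of the sample application, and it deliberately leaves out writing the values to the character device, whose format is documented in the emulator repository. The function name and the choice to draw uniformly within a bin are assumptions made for this illustration.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;import random


def sample_delays(bin_edges, counts, n, rng=random):
    # bin_edges holds one more value than counts: bin i spans bin_edges[i] to bin_edges[i + 1].
    if len(bin_edges) != len(counts) + 1:
        raise ValueError('expected one more bin edge than counts')
    bins = list(zip(bin_edges[:-1], bin_edges[1:]))
    # Pick a bin with probability proportional to its count ...
    chosen = rng.choices(bins, weights=counts, k=n)
    # ... then draw a delay uniformly inside the chosen bin.
    return [rng.uniform(low, high) for low, high in chosen]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Each call draws independent samples, which matches the buffering model of the QDisc: delay values are produced ahead of time and do not depend on earlier packets.&lt;/p&gt;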

&lt;p&gt;The QDisc can also be applied to network interfaces that are assigned to a virtual bridge as described below to apply individual delay distributions to packets forwarded through different egress interfaces. This allows for emulating the end-to-end network delay of a whole emulated network with a single Linux machine.&lt;/p&gt;

&lt;p&gt;One limitation of this approach based on pre-calculating and buffering delays is that it is restricted to independent and identically distributed (i.i.d.) delays. If the delay of a specific packet depends on the delay of an earlier packet, this cannot be easily modelled by this approach, since delays were already calculated and buffered, possibly long before the packet to be delayed actually arrives. Changing the delay distribution at runtime is also not easily possible due to the buffering of delays from the old distribution.&lt;/p&gt;

&lt;p&gt;QDiscs are typically configured with the tc (traffic control) command in Linux. Since the sch_delay QDisc requires specific parameters as shown next, a modified version of tc is required (a patch for tc is also included in the GitHub repository of the Network Delay Emulator). The following parameters are available to configure the QDisc:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Option&lt;/th&gt;
      &lt;th&gt;Type&lt;/th&gt;
      &lt;th&gt;Default&lt;/th&gt;
      &lt;th&gt;Explanation&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;limit&lt;/td&gt;
      &lt;td&gt;int&lt;/td&gt;
      &lt;td&gt;1000&lt;/td&gt;
      &lt;td&gt;The size of the internal queue for buffering delayed packets. If this queue overflows, packets will get dropped. For instance, if packets are delayed by a constant value of 10 ms and arrive at a rate of 1000 pkt/s, then a queue of at least 1000 pkt/s * 10e-3 s = 10 pkts would be required. A warning will be posted to the kernel log if packets are dropped.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;reorder&lt;/td&gt;
      &lt;td&gt;bool&lt;/td&gt;
      &lt;td&gt;true&lt;/td&gt;
      &lt;td&gt;Whether packet reordering is allowed in order to closely follow the given delay values, or whether the packet order is kept as received. If packet reordering is allowed, a packet with a smaller random delay might overtake an earlier packet with a larger random delay in the QDisc. If packet reordering is not allowed, additional delay might be added to the given delay values to avoid packet reordering.&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
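
&lt;p&gt;The two options can be illustrated with a small Python model. This is a sketch of the behaviour described in the table, not the kernel code: the function names are made up, and the rule used when reordering is disabled (a packet leaves no earlier than its predecessor) is an assumption about how the additional delay comes about.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;import math


def min_queue_limit(rate_pps, max_delay_s):
    # Number of packets that arrive while one packet is held for max_delay_s.
    return math.ceil(rate_pps * max_delay_s)


def departure_times(arrivals, delays, reorder):
    # arrivals and delays use the same time unit, one entry per packet in arrival order.
    departures = []
    previous = float('-inf')
    for arrival, delay in zip(arrivals, delays):
        departure = arrival + delay
        if not reorder:
            # Assumed rule: hold the packet until its predecessor has left.
            departure = max(departure, previous)
            previous = departure
        departures.append(departure)
    return departures
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;With a random delay distribution, the largest delay that can occur (rather than the mean) should be used for sizing the queue.&lt;/p&gt;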

&lt;h1 id=&quot;emulating-end-to-end-network-delay-for-multiple-end-to-end-paths&quot;&gt;Emulating End-to-End Network Delay for Multiple End-to-End Paths&lt;/h1&gt;

&lt;p&gt;Assume you want to emulate the individual end-to-end network delay in different directions (upstream and downstream to/from hosts) and/or between different pairs of hosts. To this end, you can use a virtual bridge on a central emulation node and assign individual delays to each outgoing (downstream) port.&lt;/p&gt;

&lt;p&gt;The following figure shows a sample topology with two hosts, Host1 and Host2. The end-to-end delay of packets from Host1 to Host2 shall be different from the end-to-end delay from Host2 to Host1. The machine in the middle (Hemu) is the host implementing network emulation to introduce the artificial delay between Host1 and Host2. A Linux virtual bridge is used to forward packets from eth0 to eth1 and vice versa. The two depicted QDiscs add the delay to egress packets individually in each direction.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://blog.deterministic6g.eu/assets/images/network_emulation_topology.png&quot; alt=&quot;Sample topology&quot; title=&quot;Sample topology&quot; class=&quot;align-center&quot; /&gt;&lt;/p&gt;

&lt;p&gt;To implement this scenario, we first create a virtual bridge on Hemu and assign the two physical network interfaces eth0 and eth1 to the virtual bridge. We also bring the interfaces up in case they were previously down:&lt;/p&gt;

&lt;div class=&quot;language-console highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;gp&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;ip &lt;span class=&quot;nb&quot;&gt;link &lt;/span&gt;add name vbridge &lt;span class=&quot;nb&quot;&gt;type &lt;/span&gt;bridge
&lt;span class=&quot;gp&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;ip &lt;span class=&quot;nb&quot;&gt;link set &lt;/span&gt;dev vbridge up
&lt;span class=&quot;gp&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;ip &lt;span class=&quot;nb&quot;&gt;link set &lt;/span&gt;eth0 master vbridge
&lt;span class=&quot;gp&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;ip &lt;span class=&quot;nb&quot;&gt;link set &lt;/span&gt;eth1 master vbridge
&lt;span class=&quot;gp&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;ip &lt;span class=&quot;nb&quot;&gt;link set &lt;/span&gt;eth0 up
&lt;span class=&quot;gp&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;ip &lt;span class=&quot;nb&quot;&gt;link set &lt;/span&gt;eth1 up
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Next, we add the delay QDiscs to eth0 and eth1 on Hemu using the tc command:&lt;/p&gt;

&lt;div class=&quot;language-console highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;gp&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;cd&lt;/span&gt; ~/NetworkDelayEmulator/tc
&lt;span class=&quot;gp&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;sudo&lt;/span&gt; ./iproute2/tc/tc  qdisc add dev eth0 root handle 1:0 delay reorder True limit 1000
&lt;span class=&quot;gp&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;sudo&lt;/span&gt; ./iproute2/tc/tc  qdisc add dev eth1 root handle 1:0 delay reorder True limit 1000
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Finally, we start two instances of the user-space application, one providing delays for messages leaving through port eth0 (character device &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/dev/sch_delay/eth0-1_0&lt;/code&gt;) to emulate the delay from Host2 towards Host1, and one for port eth1 (character device &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/dev/sch_delay/eth1-1_0&lt;/code&gt;) to emulate the delay from Host1 to Host2.&lt;/p&gt;

&lt;div class=&quot;language-console highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;gp&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;cd&lt;/span&gt; ~/NetworkDelayEmulator/userspace_delay
&lt;span class=&quot;gp&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;python3 userspace_delay.py /dev/sch_delay/eth0-1_0
&lt;span class=&quot;gp&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;python3 userspace_delay.py /dev/sch_delay/eth1-1_0
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The sample user-space application provided with the emulator is an interactive Python application presenting the user with different options to specify delays. Of course, you could also use your own custom application to create delay values programmatically and send them to the QDisc. In particular, the provided user-space application can load delay histograms created from measurements. The DETERMINISTIC6G project provides different delay data sets measured in real 5G networks in &lt;a href=&quot;https://github.com/DETERMINISTIC6G/deterministic6g_data&quot;&gt;this GitHub repository&lt;/a&gt;. How to integrate these measurements with the emulator is described in detail in the README file of the emulator.&lt;/p&gt;

&lt;h1 id=&quot;per-traffic-class-delay-qdiscs&quot;&gt;Per-Traffic-Class Delay QDiscs&lt;/h1&gt;

&lt;p&gt;QDiscs are a sophisticated concept in Linux. We can exploit this concept to build a hierarchy of QDiscs to assign different delays to packets of different traffic classes sent through the same network interface.&lt;/p&gt;

&lt;p&gt;For instance, the QDisc hierarchy could look as follows:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;            ETS
	   (1:0)
	  /     \
	 /       \
       ETS       ETS
       1:1       1:2
        |         |
	|         |
      Delay     Delay
     (10:0)     (20:0)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;At the top of the hierarchy, we place an ETS (Enhanced Transmission Selection) QDisc. ETS is a so-called classful QDisc with sub-classes to which other QDiscs can be assigned. The traffic of these sub-classes is scheduled by ETS either using deficit round-robin (weighted fair bandwidth sharing) or priority queuing. We use fair bandwidth sharing since we do not want to penalize any of the traffic classes (other than applying some delay to them). The ETS QDisc with two sub-classes is set up as follows on a network interface, say &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;eth0&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ sudo tc qdisc add dev eth0 root handle 1: ets bands 2 quanta 1000 1000
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Each band receives the same “quanta” for fair sharing (same bandwidth for all sub-classes), and is implicitly associated with a sub-class. The first sub-class gets the handle 1:1 and the second 1:2 (parent:bandnumber).&lt;/p&gt;

&lt;p&gt;Next, we add two delay QDiscs to sub-class 1:1 and 1:2:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ sudo tc qdisc add dev eth0 parent 1:1 handle 10: delay reorder True limit 1000
$ sudo tc qdisc add dev eth0 parent 1:2 handle 20: delay reorder True limit 1000
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The first delay QDisc gets the handle 10: and the second the handle 20:.&lt;/p&gt;

&lt;p&gt;Finally, we set up filters to classify egress traffic from eth0 as follows:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ sudo tc filter add dev eth0 protocol ip parent 1: prio 1 u32 match ip dst 192.168.1.1/24 flowid 1:1
$ sudo tc filter add dev eth0 protocol ip parent 1: prio 2 matchall flowid 1:2
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The first filter has the highest priority (prio 1). The second filter has a lower priority (prio 2) and, therefore, is only effective if the first filter does not match.&lt;/p&gt;

&lt;p&gt;The first filter uses the versatile u32 filter, which matches on bits of the packet header. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ip dst ...&lt;/code&gt; is just a shorthand for matching on the bits of the IP destination address of IPv4 packets. Such packets are assigned to traffic class 1:1 and, therefore, will be delayed by the delay QDisc 10:.&lt;/p&gt;

&lt;p&gt;The second filter catches all other packets (matchall) and assigns them to the second traffic class, which is delayed by the second delay QDisc 20:.&lt;/p&gt;
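
&lt;p&gt;The interplay of the two filters can be mimicked with a few lines of Python. This is only a model of the matching order for IPv4 destination addresses, with names invented for this example; it does not talk to tc. Note that the prefix length /24 means the first filter matches every destination address in 192.168.1.0/24, not just 192.168.1.1.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;import ipaddress

# (prio, destination network or None for matchall, flowid), as in the tc filters above.
FILTERS = [
    (1, ipaddress.ip_network('192.168.1.1/24', strict=False), '1:1'),
    (2, None, '1:2'),
]


def classify(dst_ip):
    # Filters are tried in order of ascending prio; the first match decides the class.
    addr = ipaddress.ip_address(dst_ip)
    for _prio, network, flowid in sorted(FILTERS, key=lambda f: f[0]):
        if network is None or addr in network:
            return flowid
    return None
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In this model, a packet to 192.168.1.77 ends up in class 1:1, whereas a packet to 10.0.0.1 falls through to class 1:2.&lt;/p&gt;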

&lt;p&gt;For other classful QDiscs and filters, please check the man page of tc.&lt;/p&gt;

&lt;h1 id=&quot;evaluation-of-delay-emulation-accuracy&quot;&gt;Evaluation of Delay Emulation Accuracy&lt;/h1&gt;

&lt;p&gt;To give an impression of the accuracy to be expected with the Network Delay Emulator, we performed measurements with the following virtual bridge setup. We used network taps in fiber optic cables (marked with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt;) and an FPGA network measurement card from Napatech (NT40E3-4-PTP) to capture the traffic from H1 (sender) to H2 (receiver) with nanosecond precision.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;       pcap (sender)
         ^
         |
 ----    |     -------------------------------------          ----
|    |   |    |              ---------              |        |    |
|    |---x---&amp;gt;|------------&amp;gt;|         |&amp;lt;------------|&amp;lt;-------|    |
| H1 |        | eth0        | vBridge |        eth1 |        | H2 |
|    |&amp;lt;-------|&amp;lt;------------|         |------QDisc-&amp;gt;|---x---&amp;gt;|    |
|    |        |              ---------              |   |    |    |
|    |        |                Hemu                 |   |    |    |
 ----          -------------------------------------    |     ----
                                                        |
                                                        v
                                                       pcap (receiver)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The sender app on H1 sends minimum-size (64 B) UDP packets at a rate of 100 pkt/s over a 10 Gbps link to the receiver app on H2.&lt;/p&gt;

&lt;p&gt;The QDisc is configured with a normal distribution with mean = 10 ms and stddev = 1 ms.&lt;/p&gt;

&lt;p&gt;As baseline, we also capture a trace with zero delay emulation (w/o QDisc on eth1).&lt;/p&gt;

&lt;p&gt;Traces were captured for about 10 min.&lt;/p&gt;

&lt;p&gt;The specs of the Hemu host are:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz&lt;/li&gt;
  &lt;li&gt;16 GB RAM&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The following figure shows the histogram of the measured (emulated) end-to-end delay overlaid with the input histogram. We see that the emulated delay closely follows the input with a small offset (see below).&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://blog.deterministic6g.eu/assets/images/emulation-accuracy.png&quot; alt=&quot;Network emulation accuracy&quot; title=&quot;Network emulation accuracy&quot; class=&quot;align-center&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The following figures show the histograms of the actual delay between the measurement points. Theoretically, we would need to subtract:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;One transmission delay (64*8 bit / 10 Gbps = 51.2 ns) since Hemu will only start processing the packet when it has been fully received, and the measurement card takes the timestamp at the header.&lt;/li&gt;
  &lt;li&gt;The propagation delay for about 6 m fiber cable (about 6 m / (2/3*3e8 m/s) = 30 ns) connecting the tap to the measurement card.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, since these corrections are far below one microsecond, we report values as measured.&lt;/p&gt;
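
&lt;p&gt;The two corrections can be reproduced with a few lines of Python. The constants are taken from the list above; the variable names are only chosen for this example.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;PACKET_BITS = 64 * 8              # minimum-size packet of 64 bytes
LINK_RATE_BPS = 10e9              # 10 Gbps link
FIBER_LENGTH_M = 6.0              # cable between tap and measurement card
SPEED_IN_FIBER_MPS = 2 / 3 * 3e8  # about two thirds of the speed of light

# Time to put the whole packet on the wire.
transmission_delay_ns = PACKET_BITS / LINK_RATE_BPS * 1e9
# Time the signal needs to travel through the fiber.
propagation_delay_ns = FIBER_LENGTH_M / SPEED_IN_FIBER_MPS * 1e9

print(round(transmission_delay_ns, 1), 'ns transmission delay')
print(round(propagation_delay_ns, 1), 'ns propagation delay')
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Together, both corrections add up to roughly 81 ns, which is about 0.1 % of the mean delay measured without delay emulation.&lt;/p&gt;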

&lt;p&gt;&lt;img src=&quot;https://blog.deterministic6g.eu/assets/images/e2e_delay_normal_distribution.png&quot; alt=&quot;End-to-end delay normal distribution&quot; title=&quot;End-to-end delay with normal distribution&quot; class=&quot;align-center&quot; /&gt;
&lt;img src=&quot;https://blog.deterministic6g.eu/assets/images/e2e_delay_null_delay.png&quot; alt=&quot;End-to-end delay null distribution&quot; title=&quot;End-to-end delay null distribution&quot; class=&quot;align-center&quot; /&gt;&lt;/p&gt;

&lt;p&gt;For the normal distribution, the following values were measured:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;mean = 0.010126313739089609 s&lt;/li&gt;
  &lt;li&gt;stddev = 0.0009962691742813061 s&lt;/li&gt;
  &lt;li&gt;99 % confidence interval of the mean = [0.010115940597227065 s, 0.010136686880952152 s]&lt;/li&gt;
  &lt;li&gt;min = 0.005947589874267578 s&lt;/li&gt;
  &lt;li&gt;max = 0.014116764068603516 s&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without delay emulation, the delay was:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;mean = 8.166161013459658e-05 s&lt;/li&gt;
  &lt;li&gt;99 % confidence interval of the mean = [8.159266669171887e-05 s, 8.173055357747428e-05 s]&lt;/li&gt;
  &lt;li&gt;min = 1.0251998901367188e-05 s&lt;/li&gt;
  &lt;li&gt;max = 9.942054748535156e-05 s&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We see that the mean delay without any emulated delay is around 82 µs. This could be considered as an offset when creating delay distributions.&lt;/p&gt;
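
&lt;p&gt;One simple way to account for this offset, sketched below in Python, is to subtract the measured baseline from every target delay before handing it to the QDisc. This is a suggestion rather than a feature of the emulator; the function name is hypothetical, and target delays below the baseline cannot be reached and are clamped to zero.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Mean delay in seconds measured without the delay QDisc (see above).
BASELINE_OFFSET_S = 81.66e-6


def compensate(target_delays_s, offset_s=BASELINE_OFFSET_S):
    # Reduce each target delay by the baseline offset, but never below zero.
    return [max(0.0, delay - offset_s) for delay in target_delays_s]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;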

&lt;h1 id=&quot;evaluation-of-a-networked-control-system-with-emulated-network-delay&quot;&gt;Evaluation of a Networked Control System with Emulated Network Delay&lt;/h1&gt;

&lt;p&gt;As motivated in the beginning, network emulation is a good method for evaluating the influence of network delay on existing, real applications. To showcase this ability, we use a textbook example of a networked control system: an inverted pendulum that is remotely controlled from an edge cloud server.&lt;/p&gt;

&lt;p&gt;The following figure shows the evaluation setup. As the inverted pendulum, we use a software implementation (simulation) available &lt;a href=&quot;https://github.com/duerrfk/Inverted-Pendulum&quot;&gt;here on GitHub&lt;/a&gt;. Of course, you could also use a real pendulum, as long as it can communicate through the emulator host with a remote controller on another machine in the network.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://blog.deterministic6g.eu/assets/images/network-emulator-pendulum.png&quot; alt=&quot;Emulator with inverted pendulum&quot; title=&quot;Emulator with inverted pendulum&quot; class=&quot;align-center&quot; /&gt;&lt;/p&gt;

&lt;p&gt;We run the experiment twice: once without extra emulated network delay (only the delay of a wired 10 Gbps Ethernet network and the software bridge running on the emulator host), and once with the characteristic network delay of a 5G system, emulated from the data made openly available in &lt;a href=&quot;https://github.com/DETERMINISTIC6G/deterministic6g_data&quot;&gt;this GitHub repository&lt;/a&gt; by the DETERMINISTIC6G project. The pendulum is initialized at an angle of 5 degrees (deviation from the zero-degree setpoint).&lt;/p&gt;

&lt;p&gt;The following figure shows the angle of the pendulum over time, which corresponds to the error of the controlled system (deviation from the upright, zero-degree position of the pendulum). We can see that with the emulated delay of the 5G system, it takes significantly longer to bring the pendulum to the zero-degree setpoint, although the pendulum did not tip over in either run.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://blog.deterministic6g.eu/assets/images/inverted-pendulum-evaluation.png&quot; alt=&quot;Inverted pendulum evaluation&quot; title=&quot;Inverted pendulum evaluation&quot; class=&quot;align-center&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Of course, a more detailed evaluation and a rigorous analysis of the stability and performance of the system would be required for any real system. However, this demonstration should suffice to show how the network delay emulator can be used.&lt;/p&gt;
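
&lt;p&gt;To experiment with the effect of loop delay without any hardware, a toy model can be useful. The following sketch is not the simulator linked above and does not use the measured 5G delay data: it assumes a linearized pendulum, hand-picked PD gains, and a constant delay given as a number of 1 ms steps. All names and parameter values are assumptions for illustration only.&lt;/p&gt;

```python
import math
from collections import deque

G_OVER_L = 9.81      # g / l for an assumed 1 m pendulum, in 1/s^2
KP, KD = 30.0, 10.0  # hand-picked PD gains (assumption, not tuned for a real system)
DT = 0.001           # integration step of 1 ms


def simulate(delay_steps, duration_s=5.0, theta0_deg=5.0):
    """Return the angle trajectory in degrees for a loop delay of delay_steps * DT."""
    theta, omega = math.radians(theta0_deg), 0.0
    # The controller only sees states that are delay_steps old.
    history = deque([(theta, omega)] * (delay_steps + 1), maxlen=delay_steps + 1)
    trajectory = []
    for _ in range(round(duration_s / DT)):
        seen_theta, seen_omega = history[0]        # oldest entry = delayed state
        u = -(KP * seen_theta + KD * seen_omega)   # PD control on the delayed state
        alpha = G_OVER_L * theta + u               # linearized pendulum dynamics
        omega += alpha * DT                        # semi-implicit Euler step
        theta += omega * DT
        history.append((theta, omega))
        trajectory.append(math.degrees(theta))
    return trajectory


no_delay = simulate(delay_steps=0)
delayed = simulate(delay_steps=10)  # constant 10 ms loop delay
print("final angle without delay:", no_delay[-1])
print("final angle with delay:   ", delayed[-1])
```

&lt;p&gt;Trying larger values for delay_steps, or replacing the constant delay with samples from a delay distribution, shows how this simplified loop reacts to more delay.&lt;/p&gt;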

&lt;h1 id=&quot;acknowledgments&quot;&gt;Acknowledgments&lt;/h1&gt;

&lt;p&gt;NetworkDelayEmulator was originally developed by Lorenz Grohmann in his &lt;a href=&quot;https://elib.uni-stuttgart.de/handle/11682/14240&quot;&gt;Bachelor’s thesis&lt;/a&gt; at the University of Stuttgart in the context of the DETERMINISTIC6G project, which has received funding from the European Union’s Horizon Europe research and innovation programme under grant agreement No. 101096504.&lt;/p&gt;</content><author><name>Frank Dürr</name></author><category term="linux" /><category term="tsn" /><category term="5g" /><category term="6g" /><category term="networking" /><category term="emulation" /><category term="linux" /><category term="TSN" /><category term="networking" /><category term="emulation" /><category term="5G" /><category term="6G" /><summary type="html">In this blog post, we present the DETERMINISTIC6G Network Delay Emulator for Linux.</summary></entry></feed>