Commit 77079bf2 authored by Zygmunt Krynicki

Numerous suggestions from Amit and Andrei


Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@huawei.com>
parent db894ea8
1 merge request: !26 Add OTA documentation
Pipeline #10908 passed
...@@ -7,7 +7,7 @@ Over The Air (OTA) Updates ...@@ -7,7 +7,7 @@ Over The Air (OTA) Updates
========================== ==========================
|main_project_name| provides support for updating Linux devices in the field. |main_project_name| provides support for updating Linux devices in the field.
With certain preparations, derivative projects can prepare and distribute With certain modifications, derivative projects can prepare and distribute
periodic updates to bring in up-to-date security patches, new features and periodic updates to bring in up-to-date security patches, new features and
capabilities. capabilities.
...@@ -28,22 +28,21 @@ This chapter contains specific advice to the implementer of the update system. ...@@ -28,22 +28,21 @@ This chapter contains specific advice to the implementer of the update system.
complete product must tune and adjust a number of elements. complete product must tune and adjust a number of elements.
Failure to understand and correctly implement the following advice can cause Failure to understand and correctly implement the following advice can cause
catastrophic failure in the field. When in doubt, re-test and re-check. significant failure in the field. When in doubt, re-test and re-check.
Partitions Partitions
.......... ..........
|main_project_name| devices are using an A/B model with two immutable system |main_project_name| devices use an A/B model with two immutable system
partitions, separate boot partition, separate application data partition and partitions and separate partitions for boot, application data, system data and immutable device data.
separate system data partition and separate immutable device data partition. The roles for these partitions was determined at the design stage and should
The roles for those partitions were determined at the design stage and should
be used in according with the intent. be used in according with the intent.
OS, not apps OS, not apps
............ ............
The update stack is designed to update the operating system, not applications. The update stack is designed to update the operating system, not applications.
Applications _may_ be embedded into the operating system image but _should_ be Applications _may_ be embedded into the operating system image but ideally _should_ be
delivered as separate entities, for example, as system containers, because that delivered as separate entities, for example, as system containers, because that
de-couples their life-cycle and upgrade frequency from that of the base system. de-couples their life-cycle and upgrade frequency from that of the base system.
...@@ -52,8 +51,8 @@ Care should be taken to plan ahead, so that their sizes are not a constraining ...@@ -52,8 +51,8 @@ Care should be taken to plan ahead, so that their sizes are not a constraining
factor during the evolution of the system software. This is also related to any factor during the evolution of the system software. This is also related to any
applications that may be bundled in the system image. applications that may be bundled in the system image.
Each update involves system re-boot. In case of failure (total or partial), Each update requires a system re-boot. In case of failure (total or partial),
another reboot is performed for the rollback operation. In contrast some another reboot is performed for the rollback operation. In contrast, some
application update stacks may be able to achieve zero-downtime updates. application update stacks may be able to achieve zero-downtime updates.
Plan your updates such, so that least downtime and interruption occurs for the Plan your updates such, so that least downtime and interruption occurs for the
...@@ -75,7 +74,7 @@ product recall. ...@@ -75,7 +74,7 @@ product recall.
Space Requirements Space Requirements
.................. ..................
Update involves downloading the complete copy of the system partition. The An update involves downloading the complete copy of the system partition. The
device must either use the data partition (which should have enough storage for device must either use the data partition (which should have enough storage for
typical use-cases) or must combine having enough memory in RAM-based file system typical use-cases) or must combine having enough memory in RAM-based file system
_and_ use small enough images to ensure that the copy may be fully downloaded. _and_ use small enough images to ensure that the copy may be fully downloaded.
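The space requirement described in this hunk can be checked before a download starts. The following is a minimal sketch, not part of the update stack: the function names, the staging directory argument and the 10% safety margin are all assumptions made for illustration.

```python
import shutil


def image_fits(free_bytes: int, image_bytes: int, margin_percent: int = 10) -> bool:
    """Return True if free_bytes can hold the image plus a safety margin.

    margin_percent is a hypothetical head-room figure, not a documented value.
    """
    needed = image_bytes + image_bytes * margin_percent // 100
    return free_bytes >= needed


def staging_has_room(staging_dir: str, image_bytes: int) -> bool:
    """Check the file system backing staging_dir (data partition or tmpfs)."""
    free_bytes = shutil.disk_usage(staging_dir).free
    return image_fits(free_bytes, image_bytes)
```

A client could call `staging_has_room()` with the size advertised by the server and refuse to start a download that cannot complete.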
...@@ -91,16 +90,16 @@ Time Requirements ...@@ -91,16 +90,16 @@ Time Requirements
Update frequency incurs proportional load on the update server. A large enough Update frequency incurs proportional load on the update server. A large enough
fleet of devices merely _checking_ for an update can take down any single fleet of devices merely _checking_ for an update can take down any single
server. To alleviate this product design should balance update frequency (in server. To alleviate this, product design should balance update frequency (in
some cases it can be controlled remotely post-deployment) and to spread the load some cases it can be controlled remotely post-deployment) and to spread the load
over time. It is strongly advisable to evenly distribute update checks with a over time. It is strongly advisable to evenly distribute update checks with a
random element. If any potential updates must occur at a specific local time random element. If any potential updates must occur at a specific local time
(e.g. between three and four AM), then the system must be correctly configured (e.g. between three and four AM), then the system must be correctly configured
to observe the correct time zone. The update server can be scaled horizontally, to observe the correct time zone. The update server can be scaled horizontally,
to an extent. At least for NetOTA care was taken to allow efficiency at scale, to an extent. At least for NetOTA, care was taken to allow efficiency at scale,
with stateless operation and no need for a traditional database. Any number of with stateless operation and no need for a traditional database. Any number of
geographically distributed replicas, behind load balancers and geo-routing can geographically distributed replicas, behind load balancers and geo-routing, can
take arbitrarily large load. The update server (both HawkBit and NetOTA) uses withstand an arbitrarily large load. The update server (both HawkBit and NetOTA) uses
separates meta-data from file storage, allowing to offload network traffic to separates meta-data from file storage, allowing to offload network traffic to
optimized CDN solutions. optimized CDN solutions.
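The advice to spread update checks over time with a random element can be illustrated as follows. This is a sketch under assumptions: the function name, the base interval and the jitter fraction are hypothetical and do not describe how SysOTA schedules its checks.

```python
import random


def next_check_delay(base_interval_s: float, jitter_fraction: float = 0.5,
                     rng: random.Random = None) -> float:
    """Return the delay before the next update check, in seconds.

    The delay is drawn uniformly from
    [base * (1 - jitter_fraction), base * (1 + jitter_fraction)],
    so a fleet of devices that booted together does not poll together.
    """
    rng = rng or random.Random()
    spread = base_interval_s * jitter_fraction
    return base_interval_s + rng.uniform(-spread, spread)
```

With a one-hour base interval and the default jitter fraction, each device waits between 30 and 90 minutes, which keeps the average load on the server unchanged while flattening the peaks.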
...@@ -125,22 +124,22 @@ The disk is partitioned into the following partitions: ...@@ -125,22 +124,22 @@ The disk is partitioned into the following partitions:
- system-data (ext4) - system-data (ext4)
- app-data (ext4) - app-data (ext4)
The update stack interacts with the boot partition, the system a and b The update stack interacts with the boot partition, the system-a and system-b
partitions and the system data partition. Remaining partitions may be used by partitions and the system-data partition. Remaining partitions may be used by
other parts of the system, but are not directly affected by anything that other parts of the system, but are not directly affected by anything that
happens during the update process. happens during the system (base OS) update process.
Boot and update process Boot and update process
----------------------- -----------------------
Platform specific boot loader chooses one of the system partitions, either A or The platform-specific boot loader chooses one of the system partitions, either A or
B, and boots into it. On EFI systems the kernel is loaded from the system B, and boots into it. On EFI systems the kernel is loaded from the system
partition. Other boot loaders may need to load the kernel from the boot partition. Other boot loaders may need to load the kernel from the boot
partition. Appropriate redundancy scheme is used, to allow more than one kernel partition. Appropriate redundancy scheme is used, to allow more than one kernel
to co-exist. to co-exist.
During early initialization of userspace, the immutable system partition mounted During early initialization of userspace, the immutable system partition mounted
at `/` is augmented with bind mounts to other partitions. In general application at `/` is augmented with bind mounts to other partitions. In general, application
data (e.g. containers and other large data sets) is meant to live on the data (e.g. containers and other large data sets) is meant to live on the
application data partition, which does not use the A/B update model. application data partition, which does not use the A/B update model.
...@@ -150,7 +149,7 @@ Applications that are compiled into the image need overrides for their Yocto to ...@@ -150,7 +149,7 @@ Applications that are compiled into the image need overrides for their Yocto to
allow them to persist state. This is handled with the ``WRITABLES`` system which allow them to persist state. This is handled with the ``WRITABLES`` system which
is not documented here. is not documented here.
When an update is initiated, a complete image is downloaded to temporary When an update is initiated, a complete system image is downloaded to temporary
storage. The image is cryptographically verified against the RAUC signing key or storage. The image is cryptographically verified against the RAUC signing key or
key-chain. Compatibility is checked against the RAUC ``COMPATIBLE`` string. key-chain. Compatibility is checked against the RAUC ``COMPATIBLE`` string.
...@@ -163,7 +162,7 @@ active, then the image is copied to *slot B*. Platform-specific logic is then ...@@ -163,7 +162,7 @@ active, then the image is copied to *slot B*. Platform-specific logic is then
used to configure the boot system to boot into the newly written slot **once**. used to configure the boot system to boot into the newly written slot **once**.
This acts as a safety mechanism, ensuring that power loss anywhere during the This acts as a safety mechanism, ensuring that power loss anywhere during the
update process has the effect of reverting back to the known-good image. After update process has the effect of reverting back to the known-good image. After
the image is written, platform specific post-install schedules the device to the image is written, a platform-specific post-install hook schedules the device to
reboot, perhaps in a special way to ensure the boot-once constraint. reboot, perhaps in a special way to ensure the boot-once constraint.
During boot-up, platform firmware or GRUB EFI application detects the boot-once During boot-up, platform firmware or GRUB EFI application detects the boot-once
...@@ -173,13 +172,13 @@ applications. On successful boot, late userspace takes the decision to commit ...@@ -173,13 +172,13 @@ applications. On successful boot, late userspace takes the decision to commit
the update transaction. A committed transaction atomically swaps the the update transaction. A committed transaction atomically swaps the
active-inactive role of the two system partitions. active-inactive role of the two system partitions.
If failure, for example power loss or unexpected software error, prevents If failure, for example due to power loss or unexpected software error, prevents
reaching the commit stage, then update commit will not happen. Depending on the reaching the commit stage, then update commit will not happen. Depending on the
nature of the failure the device may restart automatically, or may need to nature of the failure the device may restart automatically, or may need to
restarted externally. It is recommended to equip and configure a hardware restarted externally. It is recommended to equip and configure a hardware
watchdog to avoid the need of manual recovery during this critical step. watchdog to avoid the need for manual recovery during this critical step, while ensuring the the watchdog doesn't result in a reboot loop.
Once restarted the good slot is booted into automatically and the upgrade is Once restarted the known-good slot is booted into automatically and the upgrade is
aborted. Temporary data saved during the update process is removed, so that it aborted. Temporary data saved during the update process is removed, so that it
does not accumulate in the boot partition. does not accumulate in the boot partition.
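The boot-once, commit and rollback behaviour described in the hunks above can be modelled as a small state machine. This is a simplified illustration only: the class and method names are hypothetical, and the real logic lives in RAUC, the boot loader and platform-specific hooks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BootState:
    active: str = "A"                # known-good slot
    boot_once: Optional[str] = None  # slot to try on the next boot only

    def inactive(self) -> str:
        return "B" if self.active == "A" else "A"

    def install(self) -> str:
        """Write the image to the inactive slot and arm the boot-once flag."""
        target = self.inactive()
        self.boot_once = target
        return target

    def boot(self) -> str:
        """Choose the slot for this boot; the boot-once flag is consumed."""
        slot = self.boot_once or self.active
        self.boot_once = None
        return slot

    def commit(self, booted: str) -> None:
        """Late userspace commits: the booted slot becomes the active one."""
        self.active = booted


state = BootState()
state.install()         # image written to slot B, boot-once armed
tried = state.boot()    # boots B once
# If commit() is never reached (power loss, crash), the next boot is A again.
rolled_back = state.boot()
```

Because the flag is consumed on the first attempt, any failure before `commit()` falls back to the known-good slot without extra bookkeeping.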
...@@ -191,11 +190,11 @@ Supported update servers ...@@ -191,11 +190,11 @@ Supported update servers
HawkBit is a mature solution and recommended for scenarios where devices are HawkBit is a mature solution and recommended for scenarios where devices are
managed centrally by a single authority. The device manufacturer may sell managed centrally by a single authority. The device manufacturer may sell
white-label boxes, deferring all management to the integrator or reseller. The white-label boxes, deferring all management to the integrator or reseller. The
integrator must deploy operate and maintain a HawkBit installation for the integrator must deploy, operate and maintain a HawkBit installation for the
lifetime of the product. All devices deployed in the field must be explicitly lifetime of the product. All devices deployed in the field must be explicitly
provisioned with location and credentials before updates can be distributed. provisioned with location and credentials before updates can be distributed.
NetOTA is not as mature but is recommended for scenarios where no central NetOTA is still under development but is recommended for scenarios where no central
authority manages devices, but the device manufacturer or vendor still maintains authority manages devices, but the device manufacturer or vendor still maintains
the software over time, releasing updates that devices may install at any time. the software over time, releasing updates that devices may install at any time.
The manufacturer may pre-provision all devices with the location of the update The manufacturer may pre-provision all devices with the location of the update
...@@ -238,7 +237,7 @@ architecture and deploy a scalable installation across multiple machines. ...@@ -238,7 +237,7 @@ architecture and deploy a scalable installation across multiple machines.
Deploying HawkBit Deploying HawkBit
................. .................
To deploy HawkBit for a evaluation it is best to use the ``hawkbit`` snap In order to evaluate HawkBit, it is best to use the ``hawkbit`` snap
package. The package offers several stability levels expressed as distinct snap package. The package offers several stability levels expressed as distinct snap
tracks. Installation instructions can be found on the `hawkbit snap information tracks. Installation instructions can be found on the `hawkbit snap information
page <https://snapcraft.io/hawkbit>`_. page <https://snapcraft.io/hawkbit>`_.
...@@ -264,7 +263,7 @@ HawkBit for evaluation, set the listen address to `0.0.0.0` or `::`, so that the ...@@ -264,7 +263,7 @@ HawkBit for evaluation, set the listen address to `0.0.0.0` or `::`, so that the
service is reachable from all the network interfaces. This can be done with service is reachable from all the network interfaces. This can be done with
``snap set hawkbit address=0.0.0.0``. ``snap set hawkbit address=0.0.0.0``.
Once HawkBit is installed, either using the snap or in any other way, it should Once HawkBit is installed, it should
be configured in one of several ways. The primary deciding factor is how devices be configured in one of several ways. The primary deciding factor is how devices
authenticate to HawkBit. The full documentation is beyond the scope of this authenticate to HawkBit. The full documentation is beyond the scope of this
document, but for simple deployments we recommend either using *per-device document, but for simple deployments we recommend either using *per-device
...@@ -279,7 +278,7 @@ menu. In HawkBit nomenclature, a device is called a _target_. Targets may be ...@@ -279,7 +278,7 @@ menu. In HawkBit nomenclature, a device is called a _target_. Targets may be
clustered into target types, which aid in maintaining a heterogeneous fleet more clustered into target types, which aid in maintaining a heterogeneous fleet more
easily. Each target has a *controller ID*, which is an unique string identifying easily. Each target has a *controller ID*, which is an unique string identifying
the device in the system. In some authentication modes, devices need to be the device in the system. In some authentication modes, devices need to be
provisioned with not only the URL of the HawkBit server, but also with their provisioned not only with the URL of the HawkBit server, but also with their
*controller ID* and *security token*. Mass deployments can be performed using *controller ID* and *security token*. Mass deployments can be performed using
bulk upload or using the management API. bulk upload or using the management API.
...@@ -297,7 +296,7 @@ Provisioning Devices for HawkBit ...@@ -297,7 +296,7 @@ Provisioning Devices for HawkBit
SysOTA does not contain a native HawkBit client yet, so it leverages the SysOTA does not contain a native HawkBit client yet, so it leverages the
``rauc-hawkbit-updater`` program for this role. Said program reads a ``rauc-hawkbit-updater`` program for this role. Said program reads a
configuration file ``/etc/rauc-hawkbit-updater/config.conf``, which must be configuration file ``/etc/rauc-hawkbit-updater/config.conf``, which must be
owned by the ``rauc-hawkbit`` user, connects to a given HawkBit server owned by the ``rauc-hawkbit`` user, connects to a given HawkBit server,
authenticates using either device or gateway token and then listens for events. authenticates using either device or gateway token and then listens for events.
|main_project_name| images contain a sample configuration file in |main_project_name| images contain a sample configuration file in
...@@ -327,8 +326,8 @@ Working with HawkBit ...@@ -327,8 +326,8 @@ Working with HawkBit
.................... ....................
HawkBit has both the web dashboard and a complex set of REST APIs covering all HawkBit has both the web dashboard and a complex set of REST APIs covering all
aspects of the management story. During exploration and evaluation it is aspects of the management story. During exploration and evaluation, it is
recommended to use the graphical user interface. As the workflow solidifies it recommended to use the graphical user interface. As the workflow solidifies, it
is encouraged to switch the REST APIs and automation. is encouraged to switch the REST APIs and automation.
The general data model related to updates is as follows: The general data model related to updates is as follows:
...@@ -340,7 +339,7 @@ The general data model related to updates is as follows: ...@@ -340,7 +339,7 @@ The general data model related to updates is as follows:
The |main_project_name| project has created the ``hawkbitctl`` utility, which The |main_project_name| project has created the ``hawkbitctl`` utility, which
easily create the required scaffolding and to upload the bundle to the server. easily create the required scaffolding and to upload the bundle to the server.
While useful, the tool does not cover the entire API surface yet and you may While useful, the tool does not cover the entire API surface yet and you may
find that specific functionality is missing. In cases like that custom find that specific functionality is missing. In cases like that, custom
solutions, for example scripts using ``curl`` may be used as a stop-gap solutions, for example scripts using ``curl`` may be used as a stop-gap
measure. measure.
...@@ -391,7 +390,7 @@ is deployed and devices are provisioned, is as follows: ...@@ -391,7 +390,7 @@ is deployed and devices are provisioned, is as follows:
- Bind the *software module* to the *distribution set* (by drag-and-drop). - Bind the *software module* to the *distribution set* (by drag-and-drop).
At this stage, the update is uploaded and can be rolled out or assigned to At this stage, the update is uploaded and can be rolled out or assigned to
individual devices. Once a device is asked to update it will download and individual devices. Once a device is asked to update, it will download and
install the bundle. Basic information about the process is relayed from the install the bundle. Basic information about the process is relayed from the
device to HawkBit and can be seen in per-device action history. device to HawkBit and can be seen in per-device action history.
...@@ -408,7 +407,7 @@ HTTPS to check if an update is available. In this mode whoever operates the ...@@ -408,7 +407,7 @@ HTTPS to check if an update is available. In this mode whoever operates the
NetOTA server chooses the composition and number of available system images and NetOTA server chooses the composition and number of available system images and
devices can be configured to follow a specific image name and stability level. devices can be configured to follow a specific image name and stability level.
Unlike in the HawkBit model, the central server has no control over the devices. Unlike in the HawkBit model, the central server has no control over the devices.
Instead anyone controlling individual devices chooses the server, the image name Instead, anyone controlling individual devices chooses the server, the image name
and the stability level and then follows along at the pace determined by the and the stability level and then follows along at the pace determined by the
device. device.
...@@ -430,7 +429,7 @@ documentation <https://gitlab.com/zygoon/netota>`_. ...@@ -430,7 +429,7 @@ documentation <https://gitlab.com/zygoon/netota>`_.
Deploying NetOTA Deploying NetOTA
................ ................
To deploy NetOTA for a evaluation it is best to use the ``netota`` snap To deploy NetOTA for evaluation, it is best to use the ``netota`` snap
package. The package offers several stability levels expressed as distinct snap package. The package offers several stability levels expressed as distinct snap
tracks. Installation instructions can be found on the `netota snap information tracks. Installation instructions can be found on the `netota snap information
page <https://snapcraft.io/netota>`_. page <https://snapcraft.io/netota>`_.
...@@ -441,7 +440,7 @@ This version is annotated with the git commit hash and a sequential number ...@@ -441,7 +440,7 @@ This version is annotated with the git commit hash and a sequential number
counted since the most recent tag. counted since the most recent tag.
Once ``netota`` snap is installed, consult the ``snap info netota`` command and Once ``netota`` snap is installed, consult the ``snap info netota`` command and
read the description explaining available configuration options. Those are read the description explaining available configuration options. Those options are
managed through the snap configuration system. By default NetOTA listens on managed through the snap configuration system. By default NetOTA listens on
``localhost``, port ``8000`` and is meant to be exposed by a reverse http ``localhost``, port ``8000`` and is meant to be exposed by a reverse http
proxy. Evaluation installations can use the insecure http protocol directly and proxy. Evaluation installations can use the insecure http protocol directly and
...@@ -453,7 +452,7 @@ address=0.0.0.0:8000``. ...@@ -453,7 +452,7 @@ address=0.0.0.0:8000``.
NetOTA does not offer any graphical dashboards and is configured by placing NetOTA does not offer any graphical dashboards and is configured by placing
files in the file system. The snap package uses the directory files in the file system. The snap package uses the directory
``/var/snap/netota/common/repository`` as the root of the data set. Upon ``/var/snap/netota/common/repository`` as the root of the data set. Upon
installation an ``example`` package is copied there. It can be used to installation, an ``example`` package is copied there. It can be used to
understand the data structure used by NetOTA. Evaluation deployments can edit understand the data structure used by NetOTA. Evaluation deployments can edit
the data in place with a text editor. Production deployments are advised to use the data in place with a text editor. Production deployments are advised to use
a git repository to track deployment operations. Updates to the repository do a git repository to track deployment operations. Updates to the repository do
...@@ -514,7 +513,7 @@ is deployed and devices are provisioned, is as follows: ...@@ -514,7 +513,7 @@ is deployed and devices are provisioned, is as follows:
- Choose which stream to publish the bundle to. You can create additional streams - Choose which stream to publish the bundle to. You can create additional streams
at will, by touching a ``foo.stream`` file. Make sure to create the at will, by touching a ``foo.stream`` file. Make sure to create the
corresponding ``foo.stream.d`` directory as well. This will create the stream corresponding ``foo.stream.d`` directory as well. This will create the stream
``foo``. If you choose an existing stream remember that all the *archives* ``foo``. If you choose an existing stream, remember that all the *archives*
present in that stream must have the exact same version. This means you may present in that stream must have the exact same version. This means you may
need to perform additional builds, if the package is built for more than one need to perform additional builds, if the package is built for more than one
architecture or ``MACHINE`` value. architecture or ``MACHINE`` value.
...@@ -529,7 +528,7 @@ is deployed and devices are provisioned, is as follows: ...@@ -529,7 +528,7 @@ is deployed and devices are provisioned, is as follows:
services for your fleet. services for your fleet.
- If you are doing this for the first time, make sure to read the upstream - If you are doing this for the first time, make sure to read the upstream
documentation of the NetOTA project and consult the sample repository created documentation of the NetOTA project and consult the sample repository created
by the ``netota`` snap package on first install. Ideally keep the changes by the ``netota`` snap package on first install. Ideally, keep the changes
you've made in a git repository, so that you can both track any changes or you've made in a git repository, so that you can both track any changes or
revert back to previous state. revert back to previous state.
- Restart the NetOTA service or sent ``SIGHUP`` to the ``netotad`` process. - Restart the NetOTA service or sent ``SIGHUP`` to the ``netotad`` process.
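The stream creation step from the list above (touching a ``foo.stream`` file and creating the matching ``foo.stream.d`` directory) can be scripted. The sketch below assumes both entries live side by side in a package directory inside the repository. That layout, and the function name, are assumptions; consult the ``example`` package shipped with the snap for the authoritative structure.

```python
from pathlib import Path


def create_stream(package_dir: Path, name: str) -> None:
    """Create an empty stream called `name` inside package_dir.

    package_dir is a hypothetical location inside the NetOTA repository.
    """
    package_dir.mkdir(parents=True, exist_ok=True)
    (package_dir / f"{name}.stream").touch()                 # the stream marker file
    (package_dir / f"{name}.stream.d").mkdir(exist_ok=True)  # holds the stream's archives
```

After running this, the NetOTA service still has to be restarted or sent ``SIGHUP``, as the last step of the list states.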