diff --git a/ota.rst b/ota.rst index 166f89cc366a54fbb00295459ac71c5060090a3d..b03454e7abf34fa048e6e2097cc9b0ad148c9f0f 100644 --- a/ota.rst +++ b/ota.rst @@ -7,7 +7,7 @@ Over The Air (OTA) Updates ========================== |main_project_name| provides support for updating Linux devices in the field. -With certain preparations, derivative projects can prepare and distribute +With certain modifications, derivative projects can prepare and distribute periodic updates to bring in up-to-date security patches, new features and capabilities. @@ -28,22 +28,21 @@ This chapter contains specific advice to the implementer of the update system. complete product must tune and adjust a number of elements. Failure to understand and correctly implement the following advice can cause -catastrophic failure in the field. When in doubt, re-test and re-check. +significant failure in the field. When in doubt, re-test and re-check. Partitions .......... -|main_project_name| devices are using an A/B model with two immutable system -partitions, separate boot partition, separate application data partition and -separate system data partition and separate immutable device data partition. -The roles for those partitions were determined at the design stage and should +|main_project_name| devices use an A/B model with two immutable system +partitions and separate partitions for boot, application data, system data and immutable device data. +The roles for these partitions were determined at the design stage and should be used in according with the intent. OS, not apps ............ The update stack is designed to update the operating system, not applications. -Applications _may_ be embedded into the operating system image but _should_ be +Applications _may_ be embedded into the operating system image but ideally _should_ be delivered as separate entities, for example, as system containers, because that de-couples their life-cycle and upgrade frequency from that of the base system. 
@@ -52,8 +51,8 @@ Care should be taken to plan ahead, so that their sizes are not a constraining factor during the evolution of the system software. This is also related to any applications that may be bundled in the system image. -Each update involves system re-boot. In case of failure (total or partial), -another reboot is performed for the rollback operation. In contrast some +Each update requires a system re-boot. In case of failure (total or partial), +another reboot is performed for the rollback operation. In contrast, some application update stacks may be able to achieve zero-downtime updates. Plan your updates such, so that least downtime and interruption occurs for the @@ -75,7 +74,7 @@ product recall. Space Requirements .................. -Update involves downloading the complete copy of the system partition. The +An update involves downloading the complete copy of the system partition. The device must either use the data partition (which should have enough storage for typical use-cases) or must combine having enough memory in RAM-based file system _and_ use small enough images to ensure that the copy may be fully downloaded. @@ -91,16 +90,16 @@ Time Requirements Update frequency incurs proportional load on the update server. A large enough fleet of devices merely _checking_ for an update can take down any single -server. To alleviate this product design should balance update frequency (in +server. To alleviate this, product design should balance update frequency (in some cases it can be controlled remotely post-deployment) and to spread the load over time. It is strongly advisable to evenly distribute update checks with a random element. If any potential updates must occur at a specific local time (e.g. between three and four AM), then the system must be correctly configured to observe the correct time zone. The update server can be scaled horizontally, -to an extent. At least for NetOTA care was taken to allow efficiency at scale, +to an extent. 
At least for NetOTA, care was taken to allow efficiency at scale, with stateless operation and no need for a traditional database. Any number of -geographically distributed replicas, behind load balancers and geo-routing can -take arbitrarily large load. The update server (both HawkBit and NetOTA) uses +geographically distributed replicas, behind load balancers and geo-routing, can +withstand an arbitrarily large load. The update server (both HawkBit and NetOTA) separates meta-data from file storage, allowing to offload network traffic to optimized CDN solutions. @@ -125,22 +124,22 @@ The disk is partitioned into the following partitions: - system-data (ext4) - app-data (ext4) -The update stack interacts with the boot partition, the system a and b -partitions and the system data partition. Remaining partitions may be used by +The update stack interacts with the boot partition, the system-a and system-b +partitions and the system-data partition. Remaining partitions may be used by other parts of the system, but are not directly affected by anything that -happens during the update process. +happens during the system (base OS) update process. Boot and update process ----------------------- -Platform specific boot loader chooses one of the system partitions, either A or +The platform-specific boot loader chooses one of the system partitions, either A or B, and boots into it. On EFI systems the kernel is loaded from the system partition. Other boot loaders may need to load the kernel from the boot partition. Appropriate redundancy scheme is used, to allow more than one kernel to co-exist. During early initialization of userspace, the immutable system partition mounted -at `/` is augmented with bind mounts to other partitions. In general application +at `/` is augmented with bind mounts to other partitions. In general, application data (e.g. 
containers and other large data sets) is meant to live on the application data partition, which does not use the A/B update model. @@ -150,7 +149,7 @@ Applications that are compiled into the image need overrides for their Yocto to allow them to persist state. This is handled with the ``WRITABLES`` system which is not documented here. -When an update is initiated, a complete image is downloaded to temporary +When an update is initiated, a complete system image is downloaded to temporary storage. The image is cryptographically verified against the RAUC signing key or key-chain. Compatibility is checked against the RAUC ``COMPATIBLE`` string. @@ -163,7 +162,7 @@ active, then the image is copied to *slot B*. Platform-specific logic is then used to configure the boot system to boot into the newly written slot **once**. This acts as a safety mechanism, ensuring that power loss anywhere during the update process has the effect of reverting back to the known-good image. After -the image is written, platform specific post-install schedules the device to +the image is written, a platform-specific post-install hook schedules the device to reboot, perhaps in a special way to ensure the boot-once constraint. During boot-up, platform firmware or GRUB EFI application detects the boot-once @@ -173,13 +172,13 @@ applications. On successful boot, late userspace takes the decision to commit the update transaction. A committed transaction atomically swaps the active-inactive role of the two system partitions. -If failure, for example power loss or unexpected software error, prevents +If failure, for example due to power loss or unexpected software error, prevents reaching the commit stage, then update commit will not happen. Depending on the nature of the failure the device may restart automatically, or may need to restarted externally. It is recommended to equip and configure a hardware -watchdog to avoid the need of manual recovery during this critical step. 
+watchdog to avoid the need for manual recovery during this critical step, while ensuring the watchdog doesn't result in a reboot loop. -Once restarted the good slot is booted into automatically and the upgrade is +Once restarted the known-good slot is booted into automatically and the upgrade is aborted. Temporary data saved during the update process is removed, so that it does not accumulate in the boot partition. @@ -191,11 +190,11 @@ Supported update servers HawkBit is a mature solution and recommended for scenarios where devices are managed centrally by a single authority. The device manufacturer may sell white-label boxes, deferring all management to the integrator or reseller. The -integrator must deploy operate and maintain a HawkBit installation for the +integrator must deploy, operate and maintain a HawkBit installation for the lifetime of the product. All devices deployed in the field must be explicitly provisioned with location and credentials before updates can be distributed. -NetOTA is not as mature but is recommended for scenarios where no central +NetOTA is still under development but is recommended for scenarios where no central authority manages devices, but the device manufacturer or vendor still maintains the software over time, releasing updates that devices may install at any time. The manufacturer may pre-provision all devices with the location of the update @@ -238,7 +237,7 @@ architecture and deploy a scalable installation across multiple machines. Deploying HawkBit ................. -To deploy HawkBit for a evaluation it is best to use the ``hawkbit`` snap +In order to evaluate HawkBit, it is best to use the ``hawkbit`` snap package. The package offers several stability levels expressed as distinct snap tracks. Installation instructions can be found on the `hawkbit snap information page <https://snapcraft.io/hawkbit>`_. 
@@ -264,7 +263,7 @@ HawkBit for evaluation, set the listen address to `0.0.0.0` or `::`, so that the service is reachable from all the network interfaces. This can be done with ``snap set hawkbit address=0.0.0.0``. -Once HawkBit is installed, either using the snap or in any other way, it should +Once HawkBit is installed, it should be configured in one of several ways. The primary deciding factor is how devices authenticate to HawkBit. The full documentation is beyond the scope of this document, but for simple deployments we recommend either using *per-device @@ -279,7 +278,7 @@ menu. In HawkBit nomenclature, a device is called a _target_. Targets may be clustered into target types, which aid in maintaining a heterogeneous fleet more easily. Each target has a *controller ID*, which is an unique string identifying the device in the system. In some authentication modes, devices need to be -provisioned with not only the URL of the HawkBit server, but also with their +provisioned not only with the URL of the HawkBit server, but also with their *controller ID* and *security token*. Mass deployments can be performed using bulk upload or using the management API. @@ -297,7 +296,7 @@ Provisioning Devices for HawkBit SysOTA does not contain a native HawkBit client yet, so it leverages the ``rauc-hawkbit-updater`` program for this role. Said program reads a configuration file ``/etc/rauc-hawkbit-updater/config.conf``, which must be -owned by the ``rauc-hawkbit`` user, connects to a given HawkBit server +owned by the ``rauc-hawkbit`` user, connects to a given HawkBit server, authenticates using either device or gateway token and then listens for events. |main_project_name| images contain a sample configuration file in @@ -327,8 +326,8 @@ Working with HawkBit .................... HawkBit has both the web dashboard and a complex set of REST APIs covering all -aspects of the management story. 
During exploration and evaluation it is -recommended to use the graphical user interface. As the workflow solidifies it +aspects of the management story. During exploration and evaluation, it is +recommended to use the graphical user interface. As the workflow solidifies, it is encouraged to switch the REST APIs and automation. The general data model related to updates is as follows: @@ -340,7 +339,7 @@ The general data model related to updates is as follows: The |main_project_name| project has created the ``hawkbitctl`` utility, which easily create the required scaffolding and to upload the bundle to the server. While useful, the tool does not cover the entire API surface yet and you may -find that specific functionality is missing. In cases like that custom +find that specific functionality is missing. In cases like that, custom solutions, for example scripts using ``curl`` may be used as a stop-gap measure. @@ -391,7 +390,7 @@ is deployed and devices are provisioned, is as follows: - Bind the *software module* to the *distribution set* (by drag-and-drop). At this stage, the update is uploaded and can be rolled out or assigned to -individual devices. Once a device is asked to update it will download and +individual devices. Once a device is asked to update, it will download and install the bundle. Basic information about the process is relayed from the device to HawkBit and can be seen in per-device action history. @@ -408,7 +407,7 @@ HTTPS to check if an update is available. In this mode whoever operates the NetOTA server chooses the composition and number of available system images and devices can be configured to follow a specific image name and stability level. Unlike in the HawkBit model, the central server has no control over the devices. 
-Instead anyone controlling individual devices chooses the server, the image name +Instead, anyone controlling individual devices chooses the server, the image name and the stability level and then follows along at the pace determined by the device. @@ -430,7 +429,7 @@ documentation <https://gitlab.com/zygoon/netota>`_. Deploying NetOTA ................ -To deploy NetOTA for a evaluation it is best to use the ``netota`` snap +To deploy NetOTA for evaluation, it is best to use the ``netota`` snap package. The package offers several stability levels expressed as distinct snap tracks. Installation instructions can be found on the `netota snap information page <https://snapcraft.io/netota>`_. @@ -441,7 +440,7 @@ This version is annotated with the git commit hash and a sequential number counted since the most recent tag. Once ``netota`` snap is installed, consult the ``snap info netota`` command and -read the description explaining available configuration options. Those are +read the description explaining available configuration options. Those options are managed through the snap configuration system. By default NetOTA listens on ``localhost``, port ``8000`` and is meant to be exposed by a reverse http proxy. Evaluation installations can use the insecure http protocol directly and @@ -453,7 +452,7 @@ address=0.0.0.0:8000``. NetOTA does not offer any graphical dashboards and is configured by placing files in the file system. The snap package uses the directory ``/var/snap/netota/common/repository`` as the root of the data set. Upon -installation an ``example`` package is copied there. It can be used to +installation, an ``example`` package is copied there. It can be used to understand the data structure used by NetOTA. Evaluation deployments can edit the data in place with a text editor. Production deployments are advised to use a git repository to track deployment operations. 
Updates to the repository do @@ -514,7 +513,7 @@ is deployed and devices are provisioned, is as follows: - Choose which stream to publish the bundle to. You can create additional streams at will, by touching a ``foo.stream`` file. Make sure to create the corresponding ``foo.stream.d`` directory as well. This will create the stream - ``foo``. If you choose an existing stream remember that all the *archives* + ``foo``. If you choose an existing stream, remember that all the *archives* present in that stream must have the exact same version. This means you may need to perform additional builds, if the package is built for more than one architecture or ``MACHINE`` value. @@ -529,7 +528,7 @@ is deployed and devices are provisioned, is as follows: services for your fleet. - If you are doing this for the first time, make sure to read the upstream documentation of the NetOTA project and consult the sample repository created - by the ``netota`` snap package on first install. Ideally keep the changes + by the ``netota`` snap package on first install. Ideally, keep the changes you've made in a git repository, so that you can both track any changes or revert back to previous state. - Restart the NetOTA service or sent ``SIGHUP`` to the ``netotad`` process.