Commit Graph

498 Commits

Author SHA1 Message Date
Jan Čermák 2a622a929d
Limit reporting of errors in Supervisor logs fallback (#5040)
We do not need to capture HostNotSupported errors to Sentry. The only
possible code path this error might come from is where the Journal
Gateway Daemon socket is unavailable, which is already reported as an
"Unsupported system" repair.
2024-04-25 10:44:21 +02:00
Mike Degatano 8d18d2d9c6
Use signals to recognize new disks immediately (#5023)
* Use signals to recognize new disks immediately

* Add test for disabled data disk issue

* Add mock of UDisks2 base service to test

* Apply suggestions from code review

* Shutdown manager first to avoid potential race conditions

* Update tests/dbus_service_mocks/udisks2.py

Co-authored-by: Jan Čermák <sairon@users.noreply.github.com>

---------

Co-authored-by: Stefan Agner <stefan@agner.ch>
Co-authored-by: Jan Čermák <sairon@users.noreply.github.com>
2024-04-22 16:35:03 +02:00
Jan Čermák 18d9d32bca
Fix Supervisor logs fallback (#5022)
Supervisor logs fallback in get_supervisor_logs didn't work properly
because the exception was caught in api_process_raw instead. This was
not discovered in tests because the side effect raised OSError, which
isn't handled there.

To address that, I split the advanced_logs to two functions, one being a
wrapped API handler, one being plain function returning response without
any additional error handling. The tests now check for both cases of
errors (HassioError and random generic Python error).

Refs #5021
2024-04-22 09:42:12 +02:00
Mike Degatano 06513e88c6
Allow restarting core in safe mode (#5017) 2024-04-17 08:54:56 +02:00
Mike Degatano dfd8fe84e0
Mount manager reload mounts all failed mounts (#5014)
* Mount manager reload mounts all failed mounts

* Remove invalid part of mount manager reload test
2024-04-12 12:02:29 +02:00
Mike Degatano 672a7621f9
Adopt a disabled data disk (#5010) 2024-04-11 13:53:19 -04:00
Mike Degatano f0e2fb3f57
Addon load should not fail due to docker error (#5011) 2024-04-11 15:06:57 +02:00
Mike Degatano 50a2e8fde3
Allow adoption of existing data disk (#4991)
* Allow adoption of existing data disk

* Fix existing tests

* Add test cases and fix image issues

* Fix addon build test

* Run checks during setup not startup

* Addon load mimics plugin and HA load for docker part

* Default image accessible in except
2024-04-10 10:25:22 +02:00
Jan Čermák a894c4589e
Use Systemd Journal API for all logs endpoints in API (#4972)
* Use Systemd Journal API for all logs endpoints in API

Replace all logs endpoints using container logs with wrapped
advanced_logs function, adding possibility to get logs from previous
boots and following the logs. Supervisor logs are an excetion where
Docker logs are still used - in case an exception is raised while
accessing the Systemd logs, they're used as fallback - otherwise we
wouldn't have an easy way to see what went wrong.

* Refactor testing of advanced logs endpoints to a common method

* Send error while fetching Supervisor logs to Sentry; minor cleanup

* Properly handle errors and use consistent content type in logs endpoints

* Replace api_process_custom with reworked api_process_raw per @mdegat01 suggestion
2024-04-04 12:09:08 +02:00
Jan Čermák 906e400ab7
Fix submounts of /dev being read-only with Docker 25+ (#4997)
As described in #4996, Docker 25+ changes made sub-mounts of the /dev
filesystem to be mounted read-only. Revert to the previous behavior by
adjusting the ReadOnlyNonRecursive option. Cleaner way would be to
upstream support for setting this option via Mount class arguments, so
this change is meant to be rather a hotfix for the issue. Even better
approach would be mounting /dev non-recursively, and taking care of
creating all necessary filesystems when creating containers in
Supervisor.
2024-04-02 21:07:53 +02:00
Mike Degatano 90c971f9f1
Unsupported if wrong image used on virtualization (#4968)
* Unsupported if wrong image used on virtualization

* Add generic-aarch64 as supported image

* Add virtualization field to API

* Change startup to setup in check
2024-03-21 18:08:48 +01:00
Jan Čermák 0814552b2a
Use Journal Export Format for host (advanced) logs (#4963)
* Use Journal Export Format for host (advanced) logs

Add methods for handling Journal Export Format and use it for fetching
of host logs. This is foundation for colored streaming logs for other
endpoints as well.

* Make pylint happier - remove extra pass statement

* Rewrite journal gateway tests to mock ClientResponse's StreamReader

* Handle connection refused error when connecting to journal-gatewayd

* Use SYSTEMD_JOURNAL_GATEWAYD_SOCKET global path also for connection

* Use parsing algorithm suggested by @agners in review

* Fix timestamps in formatting, always use UTC for now

* Add tests for Accept header in host logs

* Apply suggestions from @agners

Co-authored-by: Stefan Agner <stefan@agner.ch>

* Bail out of parsing earlier if field is not in required fields

* Fix parsing issue discovered in the wild and add test case

* Make verbose formatter more tolerant

* Use some bytes' native functions for some minor optimizations

* Move MalformedBinaryEntryError to exceptions module, add test for it

---------

Co-authored-by: Stefan Agner <stefan@agner.ch>
2024-03-20 09:00:45 +01:00
Mike Degatano 0e0fadd72d
Fix some expected boot slot fields are optional (#4964)
* Fix some expected boot slot fields are optional

* Move stuff around to make pylint happy
2024-03-18 18:30:10 +01:00
Mike Degatano a8af04ff82
Cache existence of addon paths (#4944)
* Cache existence of addon paths

* Always update submodules

* Switch to an always cached model

* Cache on store addon only

* Fix tests

* refresh_cache to refresh_path_cache

* Fix name change in test

* Move logic into StoreManager
2024-03-15 16:43:26 +01:00
Mike Degatano 2148de45a0
Allow client to change boot slot via API (#4945)
* Allow client to change boot slot via API

* Wrap call to rauc in job that checks for OS

* Reboot after changing the active boot slot

* Add test cases and clean up

* BootName to BootSlot

* Fix test

* Rename boot_name to boot_slot

* Fix tests after field change
2024-03-15 10:36:37 -04:00
Mike Degatano 9ca927dbe7
Watchdog does not start core before supervisor (#4955) 2024-03-13 09:08:27 +01:00
Mike Degatano 74a5899626
Remove discovery config validation from supervisor (#4937)
* Remove discovery config validation from supervisor

* Remove invalid test

* Change validation to require a dictionary for compatibility
2024-03-05 16:25:15 +01:00
Mike Degatano 202ebf6d4e
Set core timeout from S6_SERVICES_GRACETIME (#4938) 2024-03-04 11:14:51 -05:00
Mike Degatano 2c7b417e25
APIForbidden should result in 403 status (#4943) 2024-03-04 11:09:17 -05:00
Mike Degatano 9d4848ee77
Add an admin only device wipe API (#4934)
* Add an admin only device wipe API

* Fix pylint issue
2024-02-29 10:29:52 -05:00
Mike Degatano 5126820619
Allow removing addon config on uninstall (#4913) 2024-02-29 10:24:51 -05:00
Mike Degatano 8b5c808e8c
Allow listing of HA users via admin CLI (#4912)
* Allow listing of HA users via admin CLI

* Filter out system generated users and fields
2024-02-28 13:30:37 -05:00
dependabot[bot] d493ccde28
Bump pytest from 7.4.4 to 8.0.1 (#4901)
* Bump pytest from 7.4.4 to 8.0.1

Bumps [pytest](https://github.com/pytest-dev/pytest) from 7.4.4 to 8.0.1.
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest/compare/7.4.4...8.0.1)

---
updated-dependencies:
- dependency-name: pytest
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* Update pytest-asyncio to 0.23.5

* Set scope to function on fixture

* Unthrottle by patching last call to prevent carryover

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Stefan Agner <stefan@agner.ch>
Co-authored-by: Mike Degatano <michael.degatano@gmail.com>
2024-02-27 09:57:44 +01:00
J. Nick Koston 1f19f84edd
Create backups files without having to copy inner tarballs (#4884)
* Create backups files without having to copy inner tarballs

needs https://github.com/pvizeli/securetar/pull/33

* fix writing json

* fix writing json

* fixes

* fixes

* ensure cleaned up

* need ./

* fix type

* Bump securetar to 2024.2.0

changelog: https://github.com/pvizeli/securetar/compare/2023.12.0...2024.2.0

* backup file is now created sooner

* reorder so comment still makes sense
2024-02-14 09:24:43 +01:00
Mike Degatano b5bf270d22
Mount status checks look at connection (#4882)
* Mount status checks look at connection

* Fix tests and refactor to fixture

* Fix test
2024-02-12 17:32:54 +01:00
Mike Degatano 4c573991d2
Improve error handling when mounts fail (#4872) 2024-02-05 16:24:53 -05:00
Mike Degatano 7fd6dce55f
Migrate to Ruff for lint and format (#4852)
* Migrate to Ruff for lint and format

* Fix pylint issues

* DBus property sets into normal awaitable methods

* Fix tests relying on separate tasks in connect

* Fixes from feedback
2024-02-05 11:37:39 -05:00
Stefan Agner 190894010c
Reset failed API call counter on successful API call (#4862)
* Reset failed API call counter on successful API call

Make sure to reset the failed API call counter after a successful
API call. While at it also update the log messages a bit to make it
clearer what the problem is exactly.

* Address pytest changes
2024-01-31 11:41:21 -05:00
Mike Degatano 140b769a42
Auto updates to new version delay for 24 hours (#4838) 2024-01-30 08:58:28 -05:00
Mike Degatano 88d718271d
Fix serialization issue adding error to job (#4853) 2024-01-30 12:21:02 +01:00
Mike Degatano ddadbec7e3
Addon devs can block auto update for breaking versions (#4832) 2024-01-26 08:08:03 +01:00
Mike Degatano 480b383782
Add background option to backup APIs (#4802)
* Add background option to backup APIs

* Fix decorator tests

* Working error handling, initial test cases

* Change to schedule_job and always return job id

* Add tests

* Reorder call at/later args

* Validation errors return immediately in background

* None is invalid option for background

* Must pop the background option from body
2024-01-22 12:09:15 -05:00
J. Nick Koston eb85be2770
Improve json performance by porting core orjson utils (#4816)
* Improve json performance by porting core orjson utils

* port relevant tests

* pylint

* add test for read_json_file

* add test for read_json_file

* remove workaround for core issue we do not have here

---------

Co-authored-by: Pascal Vizeli <pvizeli@syshack.ch>
2024-01-13 19:19:01 +01:00
Mike Degatano 2da27937a5
Update python to 3.12 (#4815)
* Update python to 3.12

* Fix tests and deprecations

* Fix other references to 3.11

* build.json doesn't exist
2024-01-13 16:35:07 +01:00
Mike Degatano 5baf19f7a3
Migrate to pyproject.toml where possible (#4770)
* Migrate to pyproject.toml where possible

* Share requirements and fix version import

* Fix issues with timezone in tests
2023-12-29 11:46:01 +01:00
Mike Degatano 6c66a7ba17
Improve error handling in backup restore (#4791) 2023-12-29 11:45:50 +01:00
Jeff Oakley e08c8ca26d
Add support for setting target path in map config (#4694)
* Added support for setting addon target path in map config

* Updated addon target path mapping to use dataclass

* Added check before adding string folder maps

* Moved enum to addon/const, updated map_volumes logic, fixed test

* Removed log used for debugging

* Use more readable approach to determine addon_config_used

Co-authored-by: Mike Degatano <michael.degatano@gmail.com>

* Use cleaner approach for checking volume config

Co-authored-by: Mike Degatano <michael.degatano@gmail.com>

* Use dict syntax and ATTR_TYPE

Co-authored-by: Mike Degatano <michael.degatano@gmail.com>

* Use coerce for validating mapping type

Co-authored-by: Mike Degatano <michael.degatano@gmail.com>

* Default read_only to true in schema

Co-authored-by: Mike Degatano <michael.degatano@gmail.com>

* Use ATTR_TYPE and ATTR_READ_ONLY instead of static strings

Co-authored-by: Mike Degatano <michael.degatano@gmail.com>

* Use constants instead of in-line strings

Co-authored-by: Mike Degatano <michael.degatano@gmail.com>

* Correct type for path

Co-authored-by: Mike Degatano <michael.degatano@gmail.com>

* Added read_only and path constants

* Fixed small syntax error and added includes for constants

* Simplify logic for handling string and dict entries in map config

* Use ATTR_PATH instead of inline string

Co-authored-by: Mike Degatano <michael.degatano@gmail.com>

* Add missing ATTR_PATH reference

* Moved FolderMapping dataclass to data.py

* Fix edge case where "data" map type is used but optional path is not set

* Move FolderMapping dataclass to configuration.py to prevent circular reference

---------

Co-authored-by: Jeff Oakley <jeff.oakley@LearningCircleSoftware.com>
Co-authored-by: Mike Degatano <michael.degatano@gmail.com>
Co-authored-by: Pascal Vizeli <pvizeli@syshack.ch>
2023-12-27 15:14:23 -05:00
Stefan Agner 3e760f0d85
Always pass explicit architecture of installed add-ons (#4786)
* Pass architecture of installed add-on on update

When using multi-architecture container images, the architecture of the
add-on is not passed to Docker in all cases. This causes the
architecture of the Supervisor container to be used, which potentially
is not supported by the add-on in question.

This commit passes the architecture of the add-on to Docker, so that
the correct image is pulled.

* Call update with architecture

* Also pass architecture on add-on restore

* Fix pytest
2023-12-21 16:52:25 -05:00
Mike Degatano 3cc6bd19ad
Mark system as unhealthy on OSError Bad message errors (#4750)
* Bad message error marks system as unhealthy

* Finish adding test cases for changes

* Rename test file for uniqueness

* bad_message to oserror_bad_message

* Omit some checks and check for network mounts
2023-12-21 18:05:29 +01:00
Mike Degatano b7ddfba71d
Set max reanimation attempts on HA watchdog (#4784) 2023-12-21 16:44:39 +01:00
Jan Čermák ed7edd9fe0
Adjust "retry in ..." log messages to avoid confusion (#4783)
As shown in home-assistant/operating-system#3007, error messages printed
to logs when container installation fails can cause some confusion,
because they are sometimes printed to the log on the landing page.
Adjust all wordings of "retry in" to "retrying in" to make it obvious
this happens automatically.
2023-12-20 18:34:42 +01:00
Stefan Agner 7fef92c480
Fix fallback to non-SSL whoami call (#4751)
* Fix fallback to non-SSL whoami call

In case of an exception "data" is not set leading to an error:
cannot access local variable 'data' where it is not associated with a value

Make sure to fallback to the non-SSL whoami call properly.

* Add pytests

* Ignore protected access in pytests

* Add test when system time is behind by more than 3 days

* Fix test_adjust_system_datetime_if_time_behind test and cleanup
2023-12-12 15:24:46 -05:00
Mike Degatano c64744dedf
Refactor addons init to addons manager (#4760)
Co-authored-by: Stefan Agner <stefan@agner.ch>
2023-12-12 09:36:05 +01:00
Stefan Agner a0429179a0
Add Raspberry Pi 5 (#4757) 2023-12-11 11:14:04 +01:00
Stefan Agner 96f4ba5d25
Check/get ingress port on add-on load (#4744)
Instead of setting the ingress port on install, make sure to set
the port when the add-on gets loaded (on Supervisor startup and
before installation). This is necessary since the dynamic ingress
ports are not stored as part of the add-on data storage themself
but in the ingress data store. So on every Supervisor start the
port needs to be transferred to the add-on model.

Note that we still need to check the port on add-on update since
the add-on potentially added (dynamic) ingress on update. Same
applies to add-on restore (the restored version might use a dynamic
ingress port).
2023-12-06 10:46:47 +01:00
Stefan Agner 883e54f989
Make check_port an async function (#4677)
* Make check_port asyncio

This requires to change the ingress_port property to a async method.

* Avoid using wait_for

* Add missing async

* Really await

* Set dynamic ingress port on add-on installation/update

* Fix pytest issue

* Rename async_check_port back to check_port

* Raise RuntimeError in case port is not set

* Make sure port gets set on add-on restore

* Drop unnecessary async

* Simplify check_port by using asyncio.get_running_loop()
2023-12-05 15:49:35 -05:00
Stefan Agner 11ec6dd9ac
Wait until mount unit is deactivated on unmount (#4733)
* Wait until mount unit is deactivated on unmount

The current code does not wait until the (bind) mount unit has been
actually deactivated (state "inactive"). This is especially problematic
when restoring a backup, where we deactivate all bind mounts before
restoring the target folder. Before the tarball is actually restored,
we delete all contents of the target folder. This lead to the situation
where the "rm -rf" command got executed before the bind mount actually
got unmounted.

The current code polls the state using an exponentially increasing
delay. Wait up to 30s for the bind mount to actually deactivate.

* Fix function name

* Fix missing await

* Address pytest errors

Change state of systemd unit according to use cases. Note that this
is currently rather fragile, and ideally we should have a smarter
mock service instead.

* Fix pylint

* Fix remaining

* Check transition fo failed as well

* Used alternative mocking mechanism

* Remove state lists in test_manager

---------

Co-authored-by: Mike Degatano <michael.degatano@gmail.com>
2023-12-01 00:35:15 +01:00
Erik Montnemery 95ac53d780
Bump core shutdown timeout for new pre-stopping core state (#4736)
* Bump core shutdown timeout

* Clarify comment

* Update tests
2023-11-28 15:03:25 -05:00
Stefan Agner 9088810b49
Improve D-Bus error handling for NetworkManager (#4720)
* Improve D-Bus error handling for NetworkManager

There are quite some errors captured which are related by seemingly a
suddenly missing NetworkManager. Errors appear as:
23-11-21 17:42:50 ERROR (MainThread) [supervisor.dbus.network] Error while processing /org/freedesktop/NetworkManager/Devices/10: Remote peer disconnected
...
23-11-21 17:42:50 ERROR (MainThread) [supervisor.dbus.network] Error while processing /org/freedesktop/NetworkManager/Devices/35: The name is not activatable

Both errors seem to already happen at introspection time, however
the current code doesn't converts these errors to Supervisor issues.
This PR uses the already existing `DBus.from_dbus_error()`.

Furthermore this adds a new Exception `DBusNoReplyError` for the
`ErrorType.NO_REPLY` (or `org.freedesktop.DBus.Error.NoReply` in
D-Bus terms, which is the type of the first of the two issues above).

And finally it separates the `ErrorType.SERVICE_UNKNOWN` (or
`org.freedesktop.DBus.Error.ServiceUnknown` in D-Bus terms, which is
the second of the above issue) from `DBusInterfaceError` into a new
`DBusServiceUnkownError`.

This allows to handle errors more specifically.

To avoid too much churn, all instances where `DBusInterfaceError`
got handled, we are now also handling `DBusServiceUnkownError`.

The `DBusNoReplyError` and `DBusServiceUnkownError` appear when
the NetworkManager service stops or crashes. Instead of retrying
every interface we know, just give up if one of these issues appear.
This should significantly lower error messages users are seeing
and Sentry events.

* Remove unnecessary statement

* Fix pytests

* Make sure error strings are compared correctly

* Fix typo/remove unnecessary pylint exception

* Fix DBusError typing

* Add pytest for from_dbus_error

* Revert "Make sure error strings are compared correctly"

This reverts commit 10dc2e4c38.

* Add test cases

---------

Co-authored-by: Mike Degatano <michael.degatano@gmail.com>
2023-11-27 23:32:11 +01:00
Mike Degatano c74f87ca12
Fix ingress session cleanup (#4719) 2023-11-21 11:56:01 -05:00