mirror of https://github.com/home-assistant/core (synced 2024-09-06 10:29:55 +02:00)
scrape: extract strings from new non-text tags (#35021)
With the upgrade of beautifulsoup4 to 4.9.0 (#34007), certain tags (`<style>`, `<script>` and `<template>`) are no longer treated as having text content (see https://www.crummy.com/software/BeautifulSoup/bs4/doc/#comments-and-other-special-strings and the reported bug https://bugs.launchpad.net/beautifulsoup/+bug/1868861), so the content of these tags became inaccessible to HA. Where the previous code could read `.text` on the tag, bs4 4.9 now yields an empty string; these tags require accessing `.string` instead. This PR checks the tag name (which will always be lowercase given how the parser works; see https://www.crummy.com/software/BeautifulSoup/bs4/doc/#other-parser-problems) and uses `.string` to get the content of these tags. All other tags are handled as before.
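The access strategy described above can be sketched in isolation. This is a minimal illustration, not the integration's actual code: `extract_value` and `NON_TEXT_TAGS` are hypothetical names, and `SimpleNamespace` objects stand in for bs4 `Tag` objects (mimicking the 4.9.0 behaviour reported above) so the snippet runs without beautifulsoup4 installed.

```python
from types import SimpleNamespace

# Tag names whose content bs4 4.9.0 no longer reports through `.text`
# (per the commit message; the name of this constant is an assumption).
NON_TEXT_TAGS = ("style", "script", "template")


def extract_value(tag):
    """Return the content of a tag-like object (hypothetical helper)."""
    if tag.name in NON_TEXT_TAGS:
        # For these tags the content is read through `.string`.
        return tag.string
    # Every other tag keeps the original `.text` access.
    return tag.text


# Stand-ins for bs4 Tag objects: `.text` is empty on the <script> stand-in,
# as described for bs4 4.9.0 above.
script = SimpleNamespace(name="script", string="var x = 1;", text="")
paragraph = SimpleNamespace(name="p", string="hello", text="hello")

print(extract_value(script))     # var x = 1;
print(extract_value(paragraph))  # hello
```

With a real `Tag`, `.string` is `None` when the tag has more than one child, so a caller may want to handle that case separately.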
parent 49979d0a75
commit 7a73c6adf7
@@ -132,7 +132,11 @@ class ScrapeSensor(Entity):
             if self._attr is not None:
                 value = raw_data.select(self._select)[self._index][self._attr]
             else:
-                value = raw_data.select(self._select)[self._index].text
+                tag = raw_data.select(self._select)[self._index]
+                if tag.name in ("style", "script", "template"):
+                    value = tag.string
+                else:
+                    value = tag.text
             _LOGGER.debug(value)
         except IndexError:
             _LOGGER.error("Unable to extract data from HTML")