Compare revisions

Changes are shown as if the source revision was being merged into the target revision.

Project: opendata/dcat-catalog-check
Commits on Source (28)
Showing with 324 additions and 96 deletions
[run]
omit =
tests/*
\ No newline at end of file
......@@ -20,6 +20,9 @@ ruff:
image: python:3.10
stage: lint
before_script:
# Install libgdal-dev
- apt-get update
- apt-get install -y libgdal-dev
# Install pipx
- python3 -m pip install --user pipx
- python3 -m pipx ensurepath
......@@ -36,6 +39,9 @@ test:
image: python:3.10
stage: test
before_script:
# Install libgdal-dev
- apt-get update
- apt-get install -y libgdal-dev
# Install pipx
- python3 -m pip install --user pipx
- python3 -m pipx ensurepath
......
......@@ -5,6 +5,27 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [1.1.0] - 2025-01-09
### Added
- **Unit Tests**:
- URI replacements and resource clearing functionality.
- Support for multiple formats: Atom, DOCX, GeoTIFF, GeoJSON, JPEG, ODT, ODS, PDF, RDF, TXT, WMTS, XLSX.
- Frictionless Data Resource validation.
- **Report Generation**:
- Added columns for **HTTP status** and **error message** in the generated reports.
- Implemented **filters** for table columns, allowing users to refine data views.
### Changed
- **Coverage Configuration**:
- Updated coverage settings to better manage test file inclusion/exclusion.
- Test files are now excluded from coverage reports to focus on measuring application code quality.
- **Dockerfile**: Switched base image to `python:3.10` and updated installation steps for dependencies, pipx, and Poetry.
## [1.0.0] - 2024-12-20
### Added
......
FROM alpine
FROM python:3.10
# Install necessary system dependencies
RUN apk add --no-cache poetry proj-util gdal-dev gcc python3-dev musl-dev geos-dev proj-dev libmagic
RUN apt-get update && \
apt-get install -y \
libgdal-dev \
libmagic-dev \
gcc \
python3-dev \
musl-dev \
libgeos-dev \
libproj-dev \
&& python3 -m pip install --upgrade pip \
&& python3 -m pip install pipx \
&& python3 -m pipx ensurepath
# Set the PATH for pipx
# Ensure pipx is in the PATH
ENV PATH="/root/.local/bin:${PATH}"
# Install poetry using pipx
RUN pipx install poetry
# Set the working directory inside the container
WORKDIR /app
......
......@@ -30,16 +30,26 @@ The following format checks are currently being carried out:
| Format | Check |
| --------- | ------- |
| `GEOJSON` | Load the file using [`GeoPandas`](https://geopandas.org). |
| `GML` | Load the file using [`GeoPandas`](https://geopandas.org). |
| `JPEG` | Load the image. |
| `JSON` | Is it syntactically correct JSON? If it is a *Frictionless Data Resource*, it is checked with the Frictionless Tools. |
| `PNG` | Load the image. |
| `PDF` | Load the document using [`pypdf`](https://pypi.org/project/pypdf/). |
| `SHP` | Load the file using [`GeoPandas`](https://geopandas.org). |
| `WFS` | Is it a valid well-formed `WFS_Capabilities` XML document? If the address does not contain the `request=GetCapabilities` parameter, a `GetCapabilities` request is performed. This response is then checked. |
| `WMS` | Is it a valid well-formed `WMS_Capabilities` XML document? If the address does not contain the `request=GetCapabilities` parameter, a `GetCapabilities` request is performed. This response is then checked. |
| `XML` | Is it well-formed XML? |
| `ATOM` | Validates whether the file content is a valid ATOM feed by confirming the root element is `<feed>` in the Atom XML namespace. |
| `DOCX` | Verifies that the file is a valid DOCX by ensuring the ZIP archive contains the necessary XML files (`document.xml` and `styles.xml`). |
| `GEOJSON` | Loads and validates the file using [`GeoPandas`](https://geopandas.org). |
| `GEOTIFF` | Verifies the file is a valid GeoTIFF by checking its GeoTransform information and supports both standalone and ZIP-compressed GeoTIFF formats. |
| `GML` | Loads and validates the file using [`GeoPandas`](https://geopandas.org). |
| `JPEG` | Loads and validates the image file. |
| `JSON` | Verifies that the file is syntactically correct JSON and, if it is a *Frictionless Data Resource*, checks it using Frictionless Tools. |
| `ODS` | Validates that the file is a valid ODS (OpenDocument Spreadsheet) by checking the ZIP structure, required files, and correct MIME type. |
| `ODT` | Validates that the file is a valid ODT (OpenDocument Text) by confirming the ZIP structure, required files, and correct MIME type. |
| `PARQUET` | Verifies that the file is a readable Apache Parquet file by loading it using [`pandas`](https://pandas.pydata.org/). |
| `PDF` | Loads and validates the PDF document using [`pypdf`](https://pypi.org/project/pypdf/). |
| `PNG` | Loads and validates the image file. |
| `RDF` | Verifies the file is a valid RDF (Resource Description Framework) document and contains more than two statements. |
| `SHP` | Loads and validates the file using [`GeoPandas`](https://geopandas.org). |
| `WFS` | Validates if the file is a well-formed `WFS_Capabilities` XML document. If not, a `GetCapabilities` request is made and validated. |
| `WMS` | Validates if the file is a well-formed `WMS_Capabilities` XML document. If not, a `GetCapabilities` request is made and validated. |
| `WMTS` | Validates if the file contains a valid WMTS (Web Map Tile Service) capabilities XML response, either directly or by performing a `GetCapabilities` request. |
| `XLSX` | Verifies that the file is a ZIP archive and contains the required files (`xl/workbook.xml` and `xl/styles.xml`) typical of a valid XLSX file. |
| `XML` | Verifies if the file is well-formed XML. |
| `ZIP` | Verifies if the file is a valid ZIP archive using Python's `zipfile.is_zipfile()` method. |
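
Each of these checks lives in its own module under the `formats` package and exposes an `is_valid(resource, file)` function that returns a boolean and records failures in `resource["error"]`; the main script imports `formats.<format>_format` dynamically. As a rough sketch of that contract (the `CSV` check shown here is purely illustrative and not part of the project):

```python
# Hypothetical formats/csv_format.py -- illustrative only, not an existing module.
# Only the is_valid(resource, file) contract is taken from the real format checks.
import csv


def is_valid(resource, file):
    """Check if the content is parseable CSV."""
    try:
        with open(file.name, newline="", encoding="utf-8") as f:
            reader = csv.reader(f)
            # reading a few rows is enough to confirm the file is parseable
            for _ in range(10):
                if next(reader, None) is None:
                    break
        return True
    except Exception as e:
        resource["error"] = str(e)
        return False
```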
## Installation
......@@ -208,8 +218,6 @@ In this example:
2. Define the desired replacements in the JSON array format described above.
3. Run the script as usual. If the file exists, replacements will be applied automatically.
By using `uri_replacements.json`, you can streamline URL handling and ensure consistent preprocessing for your link-checking tasks.
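
For illustration, a minimal sketch of how such a replacement file might be applied before each URL is checked. The key names `regex` and `replaced_by` are assumptions made for this example; the authoritative field names are those documented above.

```python
# Illustrative sketch only: apply regex-based URI replacements before a check.
# The "regex" and "replaced_by" keys are assumptions for this example; use the
# field names documented in the section above.
import json
import re


def apply_uri_replacements(url, replacements_path="uri_replacements.json"):
    with open(replacements_path, encoding="utf-8") as f:
        rules = json.load(f)
    for rule in rules:
        url = re.sub(rule["regex"], rule["replaced_by"], url)
    return url
```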
## Docker
You can run the script in a Docker container. See the [Dockerfile](./Dockerfile) for more information.
......@@ -225,7 +233,7 @@ You can run the script in a Docker container. See the [Dockerfile](./Dockerfile)
2. Run the Docker container:
```sh
docker run --rm dcat-catalog-check --url https://example.com
docker run --rm dcat-catalog-check --url https://example.com
```
## Tests
......
......@@ -154,7 +154,8 @@ class DcatCatalogCheck:
"error",
"etag",
"http_status",
"last_check" "mimetype",
"last_check",
"mimetype",
"mimetype_mismatch",
"valid",
]:
......@@ -174,8 +175,7 @@ class DcatCatalogCheck:
format = resource["format"].lower()
try:
# dynamically import the corresponding module for the format
format_check_module = importlib.import_module(
f"formats.{format}_format")
format_check_module = importlib.import_module(f"formats.{format}_format")
except ModuleNotFoundError:
format_check_module = None
......@@ -194,6 +194,9 @@ class DcatCatalogCheck:
if "etag" in response.headers:
resource["etag"] = response.headers["etag"]
if "content-length" in response.headers:
resource["size"] = response.headers["content-length"]
except requests.exceptions.RequestException as err:
# Handle connection, timeout, or other request errors
resource["accessible"] = False
......@@ -210,8 +213,7 @@ class DcatCatalogCheck:
# write the content of the HTTP response into a temporary file
original_file_name = url.split("/")[-1]
suffix = original_file_name.split(
".")[-1] if "." in original_file_name else ""
suffix = original_file_name.split(".")[-1] if "." in original_file_name else ""
with tempfile.NamedTemporaryFile(
delete=False, suffix="." + suffix
) as temp_file:
......@@ -234,8 +236,7 @@ class DcatCatalogCheck:
decompressor = decompressors.get(resource["mimetype"])
if not decompressor:
self.logger.warning(
f"Unknown compression {resource['mimetype']}.")
self.logger.warning(f"Unknown compression {resource['mimetype']}.")
else:
with tempfile.NamedTemporaryFile(delete=False) as decompressed_file:
with decompressor.open(temp_file.name, "rb") as compressed_file:
......@@ -245,9 +246,10 @@ class DcatCatalogCheck:
temp_file = decompressed_file
resource["mimetype"] = self._guess_mime_type(temp_file.name)
if self._is_container(resource["mimetype"], resource["format"]):
self._check_container_file(
resource, temp_file, format_check_module)
if self._is_container(resource["mimetype"], resource["format"]) and resource[
"format"
] not in ["GTFS", "GEOTIFF", "SHP"]:
self._check_container_file(resource, temp_file, format_check_module)
else:
self._check_single_file(resource, temp_file, format_check_module)
......@@ -275,8 +277,7 @@ class DcatCatalogCheck:
temp_file.write(file.read())
temp_file.flush()
resource["mimetype"] = self._guess_mime_type(
temp_file.name)
resource["mimetype"] = self._guess_mime_type(temp_file.name)
validation_result = (
validation_result
and self._check_single_file(
......@@ -290,14 +291,12 @@ class DcatCatalogCheck:
return contains_at_least_one_relevant_file and validation_result
else:
self.logger.error(
f"Unsupported container format {resource['mimetype']}")
self.logger.error(f"Unsupported container format {resource['mimetype']}")
def _check_single_file(self, resource, temp_file, format_check_module):
if format_check_module:
# call the `is_valid` function that is defined in every format module
resource["valid"] = format_check_module.is_valid(
resource, temp_file)
resource["valid"] = format_check_module.is_valid(resource, temp_file)
else:
# There is no specialized check for the specified format.
# Does the returned MIME type match the promised format?
......@@ -322,8 +321,7 @@ class DcatCatalogCheck:
):
hash_algorithm = hashlib.md5()
else:
print(
f"WARNING: unknown checksum algorithm {algo_name}", file=sys.stderr)
print(f"WARNING: unknown checksum algorithm {algo_name}", file=sys.stderr)
return
with open(temp_file.name, "rb") as f:
......@@ -418,8 +416,7 @@ class DcatCatalogCheck:
publisher = graph.value(dataset, DCTERMS.publisher)
if not publisher:
self.logger.warning(
f"Publisher not found for dataset: {dataset}")
self.logger.warning(f"Publisher not found for dataset: {dataset}")
return None
# Attempt to get the publisher's name
......@@ -433,8 +430,7 @@ class DcatCatalogCheck:
except Exception as e:
# Log any unexpected errors
self.logger.error(
f"Error retrieving publisher for dataset {dataset}: {e}")
self.logger.error(f"Error retrieving publisher for dataset {dataset}: {e}")
return None
def _process_datasets(self, datasets, g):
......@@ -459,8 +455,7 @@ class DcatCatalogCheck:
url = str(resource["url"])
if self._needs_check(url):
checksum_resource = g.value(
distribution, SPDX.checksum)
checksum_resource = g.value(distribution, SPDX.checksum)
if checksum_resource:
resource["checksum_algorithm"] = str(
g.value(checksum_resource, SPDX.algorithm)
......@@ -481,7 +476,8 @@ class DcatCatalogCheck:
def read_previous_results(self, file_path):
if not os.path.exists(file_path):
self.logger.warning(
f"File '{file_path}' does not exist. No previous results loaded.")
f"File '{file_path}' does not exist. No previous results loaded."
)
return
loaded_count = 0
......@@ -500,7 +496,8 @@ class DcatCatalogCheck:
url = json_object.get("url")
if not url:
self.logger.warning(
f"Line {line_number} is missing 'url': {line}")
f"Line {line_number} is missing 'url': {line}"
)
skipped_count += 1
continue
......@@ -508,12 +505,12 @@ class DcatCatalogCheck:
loaded_count += 1
except json.JSONDecodeError as e:
self.logger.error(
f"Invalid JSON at line {line_number}: {e}")
self.logger.error(f"Invalid JSON at line {line_number}: {e}")
skipped_count += 1
self.logger.info(
f"Loaded {loaded_count} results from '{file_path}', skipped {skipped_count} lines.")
f"Loaded {loaded_count} results from '{file_path}', skipped {skipped_count} lines."
)
def read_dcat_catalog(self, url):
while url:
......@@ -536,8 +533,7 @@ class DcatCatalogCheck:
self._process_datasets(datasets, g)
paged_collection = g.value(
predicate=RDF.type, object=HYDRA.PagedCollection)
paged_collection = g.value(predicate=RDF.type, object=HYDRA.PagedCollection)
next_page = g.value(paged_collection, HYDRA.nextPage)
url = str(next_page) if next_page else None
......@@ -562,12 +558,9 @@ if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("--url", help="DCAT catalog URL")
parser.add_argument("--log_file", help="Log file path")
parser.add_argument(
"--results", help="File from which the results are loaded")
parser.add_argument("--verbose", action="store_true",
help="Enable verbose logging")
parser.add_argument("--debug", action="store_true",
help="Enable debug logging")
parser.add_argument("--results", help="File from which the results are loaded")
parser.add_argument("--verbose", action="store_true", help="Enable verbose logging")
parser.add_argument("--debug", action="store_true", help="Enable debug logging")
parser.add_argument(
"--recheck",
action="store_true",
......@@ -578,8 +571,7 @@ if __name__ == "__main__":
action="store_true",
help="Just check new entries from the catalog. Do not re-check existing results.",
)
parser.add_argument(
"--check-format", help="Only check the specified format")
parser.add_argument("--check-format", help="Only check the specified format")
parser.add_argument(
"--force-check-format",
help="Check distributinons with the specified format regardless of previous results",
......
version: '3.8'
services:
lint:
image: node
command: >
sh -c "
npm install -g markdownlint markdownlint-cli &&
markdownlint '**/*.md' --ignore node_modules | tee lint.log
"
volumes:
- .:/app
- /app/node_modules
ruff:
image: python:3.10
command: >
bash -c "
apt-get update &&
apt-get install -y libgdal-dev &&
python3 -m pip install --user pipx &&
python3 -m pipx ensurepath &&
source ~/.bashrc &&
pipx install poetry &&
poetry install &&
poetry run ruff check .
"
volumes:
- .:/app
working_dir: /app
import xml.etree.ElementTree as ET
def is_valid(resource, file):
"""Check if the HTTP response is an ATOM feed."""
with open(file.name, "rb") as f:
try:
xml = ET.parse(f).getroot()
if xml.tag == "{http://www.w3.org/2005/Atom}feed":
return True
else:
resource["error"] = (
"Root element is not {http://www.w3.org/2005/Atom}feed"
)
return False
except Exception as e:
resource["error"] = str(e)
return False
import zipfile
def is_valid(resource, file):
"""Check if the content is a DOCX file."""
if not zipfile.is_zipfile(file.name):
resource["error"] = "Not a ZIP file."
return False
with zipfile.ZipFile(file.name, "r") as zip_ref:
zip_contents = zip_ref.namelist()
required_files = ["word/document.xml", "word/styles.xml"]
if not all(file in zip_contents for file in required_files):
resource["error"] = "That does not look like an DOCX file."
return False
return True
import geopandas
from pyogrio.errors import DataSourceError
from shapely.errors import GEOSException
import geojson
def is_valid(resource, file):
......@@ -8,9 +6,11 @@ def is_valid(resource, file):
with open(file.name, "rb") as f:
try:
geopandas.read_file(f)
return True
except DataSourceError:
return False
except GEOSException:
return False
geojson_data = geojson.load(f)
if isinstance(geojson_data, dict) and "type" in geojson_data:
return True
else:
resource["error"] = "JSON is not GeoJSON."
return False
except Exception as e:
resource["error"] = str(e)
from osgeo import gdal
import zipfile
import tempfile
import os
def is_geotiff(resource, file_name):
dataset = gdal.Open(file_name)
if not dataset:
resource["error"] = f"could not read file {file_name}"
return False
geotransform = dataset.GetGeoTransform()
default_transform = (0.0, 1.0, 0.0, 0.0, 0.0, 1.0)
if geotransform == default_transform:
resource["error"] = "missing transformation"
return False
return True
def is_valid(resource, file):
"""Check if the content is a GeoTIFF file."""
# Some GeoTIFF files consist of two files in a ZIP file:
# - the TIFF image itself
# - a TFW world file with the transform information
if zipfile.is_zipfile(file.name):
with tempfile.TemporaryDirectory() as temp_dir:
with zipfile.ZipFile(file.name, "r") as zip_ref:
file_list = zip_ref.namelist()
relevant_files = [
file
for file in file_list
if file.lower().endswith(".tiff") or file.lower().endswith(".tif")
]
contains_at_least_one_relevant_file = len(relevant_files) > 0
if contains_at_least_one_relevant_file:
zip_ref.extractall(temp_dir)
for tif_name in relevant_files:
tif_path = os.path.join(temp_dir, tif_name)
if is_geotiff(resource, tif_path):
# the ZIP file contains at least one valid GeoTIFF
return True
else:
resource["error"] = "ZIP file contains not TIFF image"
return False
else:
return is_geotiff(resource, file.name)
import geopandas
from pyogrio.errors import DataSourceError
from shapely.errors import GEOSException
def is_valid(resource, file):
......@@ -10,12 +8,6 @@ def is_valid(resource, file):
try:
geopandas.read_file(f)
return True
except DataSourceError as e:
resource["error"] = str(e)
return False
except GEOSException as e:
resource["error"] = str(e)
return False
except Exception as e:
resource["error"] = str(e)
return False
......@@ -23,9 +23,6 @@ def is_valid(resource, file):
return resource["schema_valid"]
return True
except json.JSONDecodeError as e:
resource["error"] = str(e)
return False
except UnicodeDecodeError as e:
except Exception as e:
resource["error"] = str(e)
return False
import zipfile
def is_valid(resource, file):
"""Check if the content is a ODS file."""
if not zipfile.is_zipfile(file.name):
resource["error"] = "Not a ZIP file."
return False
with zipfile.ZipFile(file.name, "r") as zip_ref:
zip_contents = zip_ref.namelist()
required_files = ["mimetype", "content.xml", "meta.xml", "styles.xml"]
if not all(file in zip_contents for file in required_files):
resource["error"] = "That does not look like an ODS file."
return False
with zip_ref.open("mimetype") as mimetype_file:
mimetype_content = mimetype_file.read().decode("utf-8").strip()
if mimetype_content != "application/vnd.oasis.opendocument.spreadsheet":
resource["error"] = f"Incorrect MIME type: {mimetype_content}"
return False
return True
import zipfile
def is_valid(resource, file):
"""Check if the content is a ODT file."""
if not zipfile.is_zipfile(file.name):
resource["error"] = "Not a ZIP file."
return False
with zipfile.ZipFile(file.name, "r") as zip_ref:
zip_contents = zip_ref.namelist()
required_files = ["mimetype", "content.xml", "meta.xml", "styles.xml"]
if not all(file in zip_contents for file in required_files):
resource["error"] = "That does not look like an ODT file."
return False
with zip_ref.open("mimetype") as mimetype_file:
mimetype_content = mimetype_file.read().decode("utf-8").strip()
if mimetype_content != "application/vnd.oasis.opendocument.text":
resource["error"] = f"Incorrect MIME type: {mimetype_content}"
return False
return True
from pypdf import PdfReader
from pypdf.errors import PyPdfError
def is_valid(resource, file):
......@@ -9,5 +8,6 @@ def is_valid(resource, file):
try:
PdfReader(f)
return True
except PyPdfError:
except Exception as e:
resource["error"] = str(e)
return False
from PIL import Image, UnidentifiedImageError
from PIL import Image
def is_valid(resource, file):
......@@ -7,5 +7,6 @@ def is_valid(resource, file):
try:
with Image.open(file.name, formats=["PNG"]):
return True
except UnidentifiedImageError:
except Exception as e:
resource["error"] = str(e)
return False
from rdflib import Graph
def is_valid(resource, file):
"""Check if file is a valid RDF document."""
try:
graph = Graph()
graph.parse(file.name)
# even an empty RDF document contains two statements
if len(graph) > 2:
return True
else:
resource["error"] = "RDF document does not contain any statements."
return False
except Exception as e:
resource["error"] = str(e)
return False
import geopandas
from pyogrio.errors import DataSourceError
from shapely.errors import GEOSException
import zipfile
......@@ -24,10 +22,7 @@ def is_valid(resource, file):
with open(file.name, "rb") as f:
try:
geopandas.read_file(f)
except DataSourceError as e:
resource["error"] = str(e)
return False
except GEOSException as e:
except Exception as e:
resource["error"] = str(e)
return False
return True
......@@ -37,10 +32,7 @@ def is_valid(resource, file):
with z.open(shp) as f:
try:
geopandas.read_file(f"zip://{file.name}!{shp}")
except DataSourceError as e:
resource["error"] = str(e)
return False
except GEOSException as e:
except Exception as e:
resource["error"] = str(e)
return False
return True
......@@ -12,21 +12,26 @@ def _load_into_file(url):
return temp_file
def _is_capabilites_response(file):
def _is_capabilites_response(resource, file):
with open(file.name, "rb") as f:
try:
xml = ET.parse(f).getroot()
return (
if (
xml.tag == "{http://www.opengis.net/wfs/2.0}WFS_Capabilities"
or xml.tag == "{http://www.opengis.net/wfs}WFS_Capabilities"
)
except ET.ParseError:
):
return True
else:
resource["error"] = "Root element is not WFS_Capabilities"
return False
except Exception as e:
resource["error"] = str(e)
return False
def is_valid(resource, file):
if _is_capabilites_response(file):
if _is_capabilites_response(resource, file):
return True
# The response is not a capabilities XML file. That is allowed.
......@@ -38,7 +43,12 @@ def is_valid(resource, file):
url = url + "?"
url = url + "service=WFS&request=GetCapabilities"
return _is_capabilites_response(_load_into_file(url))
try:
return _is_capabilites_response(resource, _load_into_file(url))
except Exception as e:
resource["error"] = str(e)
return False
else:
# The URL already contains a GetCapabilities request, but the response was not a valid capabilities document.
return False