Verified Commit b072c066 authored by Jesper Zedlitz

Initial commit

Pipeline #1394 passed
Showing with 841 additions and 0 deletions
.coverage
__pycache__/
.idea/
.env/
*~
uri_replacements.json
*.log
\ No newline at end of file
stages:
  - lint
  - test
lint:
  image: node
  stage: lint
  before_script:
    - "npm install -g markdownlint"
    - "npm install -g markdownlint-cli"
  script:
    - markdownlint '**/*.md' --ignore node_modules | tee lint.log # Lint markdown files
  artifacts:
    when: always
    paths:
      - lint.log
  allow_failure: true # Continue pipeline even if markdownlint fails
ruff:
  image: python:3.10
  stage: lint
  before_script:
    # Install pipx
    - python3 -m pip install --user pipx
    - python3 -m pipx ensurepath
    - source ~/.bashrc
    # Install Poetry using pipx
    - pipx install poetry
    # Install dependencies using Poetry
    - poetry install
  script:
    - poetry run ruff check . # Run ruff linter
  allow_failure: true # Continue pipeline even if ruff fails
test:
  image: python:3.10
  stage: test
  before_script:
    # Install pipx
    - python3 -m pip install --user pipx
    - python3 -m pipx ensurepath
    - source ~/.bashrc
    # Install Poetry using pipx
    - pipx install poetry
    # Install dependencies using Poetry
    - poetry install
  script:
    - poetry run coverage run -m unittest # Run unit tests with coverage
    - poetry run coverage report # Output coverage report to console
    - poetry run coverage xml # Write coverage report to an XML file
  coverage: '/TOTAL.*\s+([0-9]{1,3})%/' # Extract coverage percentage
{
  "MD013": false,
  "MD024": {
    "siblings_only": true
  }
}
\ No newline at end of file
# Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [1.0.0] - 2024-12-20
### Added
- Initial project files
FROM alpine
# Install necessary system dependencies
RUN apk add --no-cache poetry proj-util gdal-dev gcc python3-dev musl-dev geos-dev proj-dev libmagic
# Add the user-level bin directory to the PATH
ENV PATH="/root/.local/bin:${PATH}"
# Set the working directory inside the container
WORKDIR /app
# Copy the entire project to the container's working directory
COPY . .
# Install project dependencies with Poetry
RUN poetry install --no-interaction --no-ansi
# Make the main script executable
RUN chmod +x dcat_catalog_check.py
# Specify the entry point to run the script by default
ENTRYPOINT ["poetry", "run", "./dcat_catalog_check.py"]
LICENSE 0 → 100644
EUROPEAN UNION PUBLIC LICENCE v. 1.2
EUPL © the European Union 2007, 2016
This European Union Public Licence (the ‘EUPL’) applies to the Work (as defined below) which is provided under the
terms of this Licence. Any use of the Work, other than as authorised under this Licence is prohibited (to the extent such
use is covered by a right of the copyright holder of the Work).
The Work is provided under the terms of this Licence when the Licensor (as defined below) has placed the following
notice immediately following the copyright notice for the Work:
Licensed under the EUPL
or has expressed by any other means his willingness to license under the EUPL.
1.Definitions
In this Licence, the following terms have the following meaning:
— ‘The Licence’:this Licence.
— ‘The Original Work’:the work or software distributed or communicated by the Licensor under this Licence, available
as Source Code and also as Executable Code as the case may be.
— ‘Derivative Works’:the works or software that could be created by the Licensee, based upon the Original Work or
modifications thereof. This Licence does not define the extent of modification or dependence on the Original Work
required in order to classify a work as a Derivative Work; this extent is determined by copyright law applicable in
the country mentioned in Article 15.
— ‘The Work’:the Original Work or its Derivative Works.
— ‘The Source Code’:the human-readable form of the Work which is the most convenient for people to study and
modify.
— ‘The Executable Code’:any code which has generally been compiled and which is meant to be interpreted by
a computer as a program.
— ‘The Licensor’:the natural or legal person that distributes or communicates the Work under the Licence.
— ‘Contributor(s)’:any natural or legal person who modifies the Work under the Licence, or otherwise contributes to
the creation of a Derivative Work.
— ‘The Licensee’ or ‘You’:any natural or legal person who makes any usage of the Work under the terms of the
Licence.
— ‘Distribution’ or ‘Communication’:any act of selling, giving, lending, renting, distributing, communicating,
transmitting, or otherwise making available, online or offline, copies of the Work or providing access to its essential
functionalities at the disposal of any other natural or legal person.
2.Scope of the rights granted by the Licence
The Licensor hereby grants You a worldwide, royalty-free, non-exclusive, sublicensable licence to do the following, for
the duration of copyright vested in the Original Work:
— use the Work in any circumstance and for all usage,
— reproduce the Work,
— modify the Work, and make Derivative Works based upon the Work,
— communicate to the public, including the right to make available or display the Work or copies thereof to the public
and perform publicly, as the case may be, the Work,
— distribute the Work or copies thereof,
— lend and rent the Work or copies thereof,
— sublicense rights in the Work or copies thereof.
Those rights can be exercised on any media, supports and formats, whether now known or later invented, as far as the
applicable law permits so.
In the countries where moral rights apply, the Licensor waives his right to exercise his moral right to the extent allowed
by law in order to make effective the licence of the economic rights here above listed.
The Licensor grants to the Licensee royalty-free, non-exclusive usage rights to any patents held by the Licensor, to the
extent necessary to make use of the rights granted on the Work under this Licence.
3.Communication of the Source Code
The Licensor may provide the Work either in its Source Code form, or as Executable Code. If the Work is provided as
Executable Code, the Licensor provides in addition a machine-readable copy of the Source Code of the Work along with
each copy of the Work that the Licensor distributes or indicates, in a notice following the copyright notice attached to
the Work, a repository where the Source Code is easily and freely accessible for as long as the Licensor continues to
distribute or communicate the Work.
4.Limitations on copyright
Nothing in this Licence is intended to deprive the Licensee of the benefits from any exception or limitation to the
exclusive rights of the rights owners in the Work, of the exhaustion of those rights or of other applicable limitations
thereto.
5.Obligations of the Licensee
The grant of the rights mentioned above is subject to some restrictions and obligations imposed on the Licensee. Those
obligations are the following:
Attribution right: The Licensee shall keep intact all copyright, patent or trademarks notices and all notices that refer to
the Licence and to the disclaimer of warranties. The Licensee must include a copy of such notices and a copy of the
Licence with every copy of the Work he/she distributes or communicates. The Licensee must cause any Derivative Work
to carry prominent notices stating that the Work has been modified and the date of modification.
Copyleft clause: If the Licensee distributes or communicates copies of the Original Works or Derivative Works, this
Distribution or Communication will be done under the terms of this Licence or of a later version of this Licence unless
the Original Work is expressly distributed only under this version of the Licence — for example by communicating
‘EUPL v. 1.2 only’. The Licensee (becoming Licensor) cannot offer or impose any additional terms or conditions on the
Work or Derivative Work that alter or restrict the terms of the Licence.
Compatibility clause: If the Licensee Distributes or Communicates Derivative Works or copies thereof based upon both
the Work and another work licensed under a Compatible Licence, this Distribution or Communication can be done
under the terms of this Compatible Licence. For the sake of this clause, ‘Compatible Licence’ refers to the licences listed
in the appendix attached to this Licence. Should the Licensee's obligations under the Compatible Licence conflict with
his/her obligations under this Licence, the obligations of the Compatible Licence shall prevail.
Provision of Source Code: When distributing or communicating copies of the Work, the Licensee will provide
a machine-readable copy of the Source Code or indicate a repository where this Source will be easily and freely available
for as long as the Licensee continues to distribute or communicate the Work.
Legal Protection: This Licence does not grant permission to use the trade names, trademarks, service marks, or names
of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and
reproducing the content of the copyright notice.
6.Chain of Authorship
The original Licensor warrants that the copyright in the Original Work granted hereunder is owned by him/her or
licensed to him/her and that he/she has the power and authority to grant the Licence.
Each Contributor warrants that the copyright in the modifications he/she brings to the Work are owned by him/her or
licensed to him/her and that he/she has the power and authority to grant the Licence.
Each time You accept the Licence, the original Licensor and subsequent Contributors grant You a licence to their contributions
to the Work, under the terms of this Licence.
7.Disclaimer of Warranty
The Work is a work in progress, which is continuously improved by numerous Contributors. It is not a finished work
and may therefore contain defects or ‘bugs’ inherent to this type of development.
For the above reason, the Work is provided under the Licence on an ‘as is’ basis and without warranties of any kind
concerning the Work, including without limitation merchantability, fitness for a particular purpose, absence of defects or
errors, accuracy, non-infringement of intellectual property rights other than copyright as stated in Article 6 of this
Licence.
This disclaimer of warranty is an essential part of the Licence and a condition for the grant of any rights to the Work.
8.Disclaimer of Liability
Except in the cases of wilful misconduct or damages directly caused to natural persons, the Licensor will in no event be
liable for any direct or indirect, material or moral, damages of any kind, arising out of the Licence or of the use of the
Work, including without limitation, damages for loss of goodwill, work stoppage, computer failure or malfunction, loss
of data or any commercial damage, even if the Licensor has been advised of the possibility of such damage. However,
the Licensor will be liable under statutory product liability laws as far such laws apply to the Work.
9.Additional agreements
While distributing the Work, You may choose to conclude an additional agreement, defining obligations or services
consistent with this Licence. However, if accepting obligations, You may act only on your own behalf and on your sole
responsibility, not on behalf of the original Licensor or any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against such Contributor by
the fact You have accepted any warranty or additional liability.
10.Acceptance of the Licence
The provisions of this Licence can be accepted by clicking on an icon ‘I agree’ placed under the bottom of a window
displaying the text of this Licence or by affirming consent in any other similar way, in accordance with the rules of
applicable law. Clicking on that icon indicates your clear and irrevocable acceptance of this Licence and all of its terms
and conditions.
Similarly, you irrevocably accept this Licence and all of its terms and conditions by exercising any rights granted to You
by Article 2 of this Licence, such as the use of the Work, the creation by You of a Derivative Work or the Distribution
or Communication by You of the Work or copies thereof.
11.Information to the public
In case of any Distribution or Communication of the Work by means of electronic communication by You (for example,
by offering to download the Work from a remote location) the distribution channel or media (for example, a website)
must at least provide to the public the information requested by the applicable law regarding the Licensor, the Licence
and the way it may be accessible, concluded, stored and reproduced by the Licensee.
12.Termination of the Licence
The Licence and the rights granted hereunder will terminate automatically upon any breach by the Licensee of the terms
of the Licence.
Such a termination will not terminate the licences of any person who has received the Work from the Licensee under
the Licence, provided such persons remain in full compliance with the Licence.
13.Miscellaneous
Without prejudice of Article 9 above, the Licence represents the complete agreement between the Parties as to the
Work.
If any provision of the Licence is invalid or unenforceable under applicable law, this will not affect the validity or
enforceability of the Licence as a whole. Such provision will be construed or reformed so as necessary to make it valid
and enforceable.
The European Commission may publish other linguistic versions or new versions of this Licence or updated versions of
the Appendix, so far this is required and reasonable, without reducing the scope of the rights granted by the Licence.
New versions of the Licence will be published with a unique version number.
All linguistic versions of this Licence, approved by the European Commission, have identical value. Parties can take
advantage of the linguistic version of their choice.
14.Jurisdiction
Without prejudice to specific agreement between parties,
— any litigation resulting from the interpretation of this License, arising between the European Union institutions,
bodies, offices or agencies, as a Licensor, and any Licensee, will be subject to the jurisdiction of the Court of Justice
of the European Union, as laid down in article 272 of the Treaty on the Functioning of the European Union,
— any litigation arising between other parties and resulting from the interpretation of this License, will be subject to
the exclusive jurisdiction of the competent court where the Licensor resides or conducts its primary business.
15.Applicable Law
Without prejudice to specific agreement between parties,
— this Licence shall be governed by the law of the European Union Member State where the Licensor has his seat,
resides or has his registered office,
— this licence shall be governed by Belgian law if the Licensor has no seat, residence or registered office inside
a European Union Member State.
Appendix
‘Compatible Licences’ according to Article 5 EUPL are:
— GNU General Public License (GPL) v. 2, v. 3
— GNU Affero General Public License (AGPL) v. 3
— Open Software License (OSL) v. 2.1, v. 3.0
— Eclipse Public License (EPL) v. 1.0
— CeCILL v. 2.0, v. 2.1
— Mozilla Public Licence (MPL) v. 2
— GNU Lesser General Public Licence (LGPL) v. 2.1, v. 3
— Creative Commons Attribution-ShareAlike v. 3.0 Unported (CC BY-SA 3.0) for works other than software
— European Union Public Licence (EUPL) v. 1.1, v. 1.2
— Québec Free and Open-Source Licence — Reciprocity (LiLiQ-R) or Strong Reciprocity (LiLiQ-R+).
The European Commission may update this Appendix to later versions of the above licences without producing
a new version of the EUPL, as long as they provide the rights granted in Article 2 of this Licence and protect the
covered Source Code from exclusive appropriation.
All other changes or additions to this Appendix require the production of a new EUPL version.
README.md 0 → 100644
# DCAT Catalog Check
![pipeline status](https://code.schleswig-holstein.de/opendata/dcat-catalog-check/badges/main/pipeline.svg)
![Coverage](https://code.schleswig-holstein.de/opendata/dcat-catalog-check/badges/main/coverage.svg?job=test)
This project is a Python script designed to monitor and validate links in a DCAT catalog.
The script is particularly useful for maintaining the integrity of distributions by ensuring that links are active and files are correctly formatted, thus helping to avoid issues related to broken links and invalid file types.
## Table of Contents
- [Features](#features)
- [Installation](#installation)
- [Usage](#usage)
- [Configuration](#configuration)
- [Docker](#docker)
- [Tests](#tests)
- [Contributing](#contributing)
- [License](#license)
## Features
- Retrieves the DCAT catalog.
- Checks if the URLs associated with the resources are alive or dead.
- If a file is downloaded successfully, it is checked against the **format specified in the metadata**.
- Validates the MIME type of the distributions if no specialized check is available.
- Logs the results.
The following format checks are currently being carried out:
| Format | Check |
| --------- | ------- |
| `GEOJSON` | Load the file using [`GeoPandas`](https://geopandas.org). |
| `GML` | Load the file using [`GeoPandas`](https://geopandas.org). |
| `JPEG` | Load the image. |
| `JSON` | Is it syntactically correct JSON? If it is a *Frictionless Data Resource*, it is checked with the Frictionless Tools. |
| `PNG` | Load the image. |
| `PDF` | Load the document using [`pypdf`](https://pypi.org/project/pypdf/). |
| `SHP` | Load the file using [`GeoPandas`](https://geopandas.org). |
| `WFS` | Is it a well-formed `WFS_Capabilities` XML document? If the address does not contain the `request=GetCapabilities` parameter, a `GetCapabilities` request is performed. This response is then checked. |
| `WMS` | Is it a well-formed `WMS_Capabilities` XML document? If the address does not contain the `request=GetCapabilities` parameter, a `GetCapabilities` request is performed. This response is then checked. |
| `XML` | Is it well-formed XML? |
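The `GetCapabilities` fallback described for `WFS` and `WMS` can be sketched as follows. This is a minimal illustration, not the script's own implementation: the helper name `capabilities_url` is hypothetical, and it assumes that the `service` and `request` parameters are matched case-insensitively.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def capabilities_url(url: str, service: str) -> str:
    """Return `url` with a GetCapabilities request added if it has no `request` parameter.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    # Compare parameter names case-insensitively (assumption, see lead-in).
    keys = {key.lower() for key, _ in query}
    if "request" in keys:
        # The URL already names a request; leave it untouched.
        return url
    if "service" not in keys:
        query.append(("service", service))
    query.append(("request", "GetCapabilities"))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(capabilities_url("https://example.com/wfs", "WFS"))
print(capabilities_url("https://example.com/wms?map=demo", "WMS"))
```

Building the query with `urllib.parse` avoids producing a second `?` when the address already carries parameters.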
## Installation
Follow the steps below to set up the **DCAT Catalog Check** on your local machine.
### Installation with Poetry
Using **Poetry** is recommended for dependency management and virtual environment handling.
1. **Install Dependencies**

   Navigate to the project directory and install the project’s dependencies (including development dependencies) using Poetry:

   ```sh
   poetry install
   ```

   This command will create a virtual environment and install all necessary packages as specified in the [`pyproject.toml`](./pyproject.toml) file.

2. **Activating the Virtual Environment**

   Poetry automatically manages virtual environments. You can activate the virtual environment with:

   ```sh
   poetry shell
   ```

   To exit the virtual environment, simply run:

   ```sh
   exit
   ```
## Usage
### Parameters
The **DCAT Catalog Check** script accepts several command-line arguments to customize its behavior. Below is a detailed explanation of each parameter:
| Parameter | Description | Type | Default |
| --------- | ----------- | ---- | ------- |
| `--url` | The URL of the DCAT catalog to check. | String | Required |
| `--log_file` | Path to the log file for storing detailed output. | String | None |
| `--results` | File path to load results from previous runs. | String | None |
| `--verbose` | Enable verbose logging for more detailed output. | Flag | Off |
| `--debug` | Enable debug logging for troubleshooting purposes. | Flag | Off |
| `--recheck` | Use the previous results (specified by `--results`) as input for rechecking only. | Flag | Off |
| `--no-recheck` | Only check new entries from the catalog without rechecking existing results. | Flag | Off |
| `--check-format` | Specify a single format to check (e.g., `JSON`, `JPEG`). | String | None |
| `--force-check-format`| Force checking distributions with the specified format, regardless of previous results. | String | None |
| `--check-http-5xx` | Recheck entries that encountered HTTP 5xx errors in previous runs. | Flag | Off |
### Example Usage
**Basic Run:**
To check a DCAT catalog and save the results:
```sh
poetry run python dcat_catalog_check.py --url https://example.com/catalog.xml > results.jsonl
```
The catalog (including possible subsequent pages) is completely downloaded and checked. The result is written to the file `results.jsonl` in *JSON Lines text file format*.
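Because the output is JSON Lines, each line can be parsed on its own. The sketch below counts records with and without an `error` key. It assumes that output records carry the `url` and `error` fields that the checker modules set on a resource; the exact record layout is not documented here, so treat the field names and the `summarize` helper as illustrative.

```python
import json
from collections import Counter


def summarize(lines):
    """Count records with and without an `error` key in JSON Lines input."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # Skip blank lines.
        record = json.loads(line)
        counts["error" if "error" in record else "no_error"] += 1
    return counts


# Made-up sample records; real output may contain more fields.
sample = [
    '{"url": "https://example.com/a.json"}',
    '{"url": "https://example.com/b.pdf", "error": "HTTP 404"}',
]
print(summarize(sample))
```

To summarize a real run, pass an open file instead of the sample list, for example `summarize(open("results.jsonl", encoding="utf-8"))`.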
**Recheck Previous Results:**
To recheck only existing results from a previous run:
```sh
poetry run python dcat_catalog_check.py --url https://example.com/catalog.xml --results results.jsonl --recheck
```
**Check New Entries Only:**
To check only new entries without rechecking the existing ones:
```sh
poetry run python dcat_catalog_check.py --url https://example.com/catalog.xml --results results.jsonl --no-recheck > new.jsonl
mv new.jsonl results.jsonl
```
The results of a previous run are read from the file `results.jsonl`. The catalog is processed completely, but only new data records are checked. All results (new ones as well as old ones that have not been checked again) are written to the file `new.jsonl`. Once the check is complete, the old results file is overwritten with the new one.
**Debugging and Verbose Output:**
To enable verbose and debug logging:
```sh
poetry run python dcat_catalog_check.py --url https://example.com/catalog.xml --verbose --debug
```
**Format-Specific Checks:**
To check only a specific format (e.g., `JSON`):
```sh
poetry run python dcat_catalog_check.py --url https://example.com/catalog.xml --check-format JSON
```
**Force Format Check:**
To force-check a specific format regardless of previous results:
```sh
poetry run python dcat_catalog_check.py --url https://example.com/catalog.xml --force-check-format JSON
```
## Configuration
### File Formats
The script reads the allowed file formats from the [`resources/file_types.json`](./resources/file_types.json)
file. This file defines the MIME types that are considered valid for each
format and should be placed in the same directory as the script.
#### Example `file_types.json`
```json
{
  "HTML": [
    "text/html"
  ],
  "JPEG": [
    "image/jpeg"
  ],
  "JSON": [
    "application/json", "text/plain"
  ]
}
```
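A lookup against this mapping could look like the sketch below. The helper `mime_type_allowed` is hypothetical and not part of the script; it assumes that format names are compared in upper case and that parameters such as `; charset=utf-8` are ignored.

```python
import json

# Inline copy of the example above, used here instead of reading the real file.
FILE_TYPES = json.loads("""
{
  "HTML": ["text/html"],
  "JPEG": ["image/jpeg"],
  "JSON": ["application/json", "text/plain"]
}
""")


def mime_type_allowed(fmt, mime_type, file_types=FILE_TYPES):
    """Return True if `mime_type` is listed as valid for format `fmt`.

    Hypothetical helper for illustration only.
    """
    allowed = file_types.get(fmt.upper(), [])
    # Drop parameters such as "; charset=utf-8" before comparing (assumption).
    return mime_type.split(";")[0].strip().lower() in allowed


print(mime_type_allowed("JSON", "text/plain; charset=utf-8"))
print(mime_type_allowed("JPEG", "image/png"))
```

A format that is missing from the mapping has no allowed MIME types in this sketch, so the lookup returns `False` for it.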
### URI Replacements (optional)
The `uri_replacements.json` file is an optional configuration file that provides a way to preprocess and modify URLs before they are checked by the script. This can be useful for standardizing, correcting, or transforming URLs to match specific patterns or to comply with expected formats.
#### Example `uri_replacements.json`
The file is a JSON array, where each element is an object containing two keys:
- `regex`: A regular expression (in Python regex syntax) that matches parts of the URL that need to be replaced.
- `replaced_by`: A string specifying the replacement value for the matched parts of the URL.
Example:
```json
[
  {
    "regex": "http://example.com/old-path",
    "replaced_by": "http://example.com/new-path"
  },
  {
    "regex": "https://(.*)/deprecated",
    "replaced_by": "https://\\1/updated"
  }
]
```
In this example:
- URLs starting with `http://example.com/old-path` will be replaced with `http://example.com/new-path`.
- Any URL containing `/deprecated` after the domain will have `/deprecated` replaced with `/updated`.
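The effect of these two rules can be reproduced with Python's `re.sub`. This is a sketch under the assumption that rules are applied one after another in file order; the helper `apply_replacements` is hypothetical and the script's actual handling may differ.

```python
import json
import re

# Inline copy of the example rules. The raw string keeps the JSON escape
# "\\1" intact, which json.loads turns into the backreference "\1".
REPLACEMENTS = json.loads(r"""
[
  {"regex": "http://example.com/old-path", "replaced_by": "http://example.com/new-path"},
  {"regex": "https://(.*)/deprecated", "replaced_by": "https://\\1/updated"}
]
""")


def apply_replacements(url, replacements=REPLACEMENTS):
    """Apply every rule to `url` in order (hypothetical helper)."""
    for rule in replacements:
        url = re.sub(rule["regex"], rule["replaced_by"], url)
    return url


print(apply_replacements("http://example.com/old-path/file.json"))
print(apply_replacements("https://example.org/deprecated/data.csv"))
```

Note that an unescaped `.` in a pattern matches any character, so write `\\.` in the JSON file when a literal dot matters.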
#### How to Use
1. Create a file named `uri_replacements.json` in the script's directory.
2. Define the desired replacements in the JSON array format described above.
3. Run the script as usual. If the file exists, replacements will be applied automatically.
By using `uri_replacements.json`, you can streamline URL handling and ensure consistent preprocessing for your link-checking tasks.
## Docker
You can run the script in a Docker container. See the [Dockerfile](./Dockerfile) for more information.
### Build and Run
1. Build the Docker image:

   ```sh
   docker build -t dcat-catalog-check .
   ```

2. Run the Docker container:

   ```sh
   docker run --rm dcat-catalog-check --url https://example.com
   ```
## Tests
To ensure the quality of the code, we utilize **unittest** for testing and **coverage** to measure code coverage. Follow the instructions below to run the tests and generate coverage reports.
### Running Tests
To run the tests with coverage, you can use either of the following commands:
```sh
# Using Python directly
python3 -m coverage run -m unittest
```
or
```sh
# Using Poetry
poetry run coverage run -m unittest
```
### Generating a Coverage Report
After running the tests, you can generate a coverage report to see which parts of your code were exercised during testing:
```sh
# Using Python directly
python3 -m coverage report
```
or
```sh
# Using Poetry
poetry run coverage report
```
### Code Linting
For code linting, we use **ruff** to enforce style and catch potential issues. Run the following command to lint your code:
```sh
# Using Python directly
python3 -m ruff check .
```
or
```sh
# Using Poetry
poetry run ruff check .
```
## Contributing
Contributions are welcome! Please open an issue or submit a pull request
with your changes.
## License
This project is licensed under the European Union Public Licence (EUPL) 1.2.
See the [`LICENSE`](./LICENSE) file for details.
import geopandas
from pyogrio.errors import DataSourceError
from shapely.errors import GEOSException


def is_valid(resource, file):
    """Check if the content is a readable GeoJSON file."""
    with open(file.name, "rb") as f:
        try:
            geopandas.read_file(f)
            return True
        except DataSourceError:
            return False
        except GEOSException:
            return False
import geopandas
from pyogrio.errors import DataSourceError
from shapely.errors import GEOSException


def is_valid(resource, file):
    """Check if the content is a readable GML file."""
    with open(file.name, "rb") as f:
        try:
            geopandas.read_file(f)
            return True
        except DataSourceError as e:
            resource["error"] = str(e)
            return False
        except GEOSException as e:
            resource["error"] = str(e)
            return False
        except Exception as e:
            resource["error"] = str(e)
            return False
from PIL import Image, UnidentifiedImageError


def is_valid(resource, file):
    """Check if the content is a readable JPEG image."""
    try:
        with Image.open(file.name, formats=["JPEG"]):
            return True
    except UnidentifiedImageError:
        return False
import json
from frictionless import Resource


def is_valid(resource, file):
    """Check if the HTTP response is a valid JSON document."""
    with open(file.name, "rb") as f:
        try:
            json_data = json.load(f)
            if (
                "path" in json_data
                and "schema" in json_data
                and (
                    json_data.get("profile") == "tabular-data-resource"
                    or json_data.get("type") == "table"
                )
            ):
                # There is a good chance that this is a Frictionless Data Resource.
                res = Resource(json_data)
                resource["schema_valid"] = res.validate().valid
                return resource["schema_valid"]
            return True
        except json.JSONDecodeError as e:
            resource["error"] = str(e)
            return False
        except UnicodeDecodeError as e:
            resource["error"] = str(e)
            return False
import pandas as pd


def is_valid(resource, file):
    """Check if the content is a readable Apache Parquet file."""
    try:
        pd.read_parquet(file.name)
        return True
    except Exception as e:
        resource["error"] = str(e)
        return False
from pypdf import PdfReader
from pypdf.errors import PyPdfError


def is_valid(resource, file):
    """Check if the content is a readable PDF document."""
    with open(file.name, "rb") as f:
        try:
            PdfReader(f)
            return True
        except PyPdfError:
            return False
from PIL import Image, UnidentifiedImageError


def is_valid(resource, file):
    """Check if the content is a readable PNG image."""
    try:
        with Image.open(file.name, formats=["PNG"]):
            return True
    except UnidentifiedImageError:
        return False
import geopandas
from pyogrio.errors import DataSourceError
from shapely.errors import GEOSException
import zipfile


def is_valid(resource, file):
    """Check if the content is a readable shape file."""
    # There are some strange 'shape files' that are actually ZIP files
    # containing multiple SHP files.
    try:
        with zipfile.ZipFile(file.name, "r") as z:
            shapefiles = [f for f in z.namelist() if f.endswith(".shp")]
    except zipfile.BadZipFile as e:
        resource["error"] = str(e)
        return False
    if len(shapefiles) == 0:
        resource["error"] = "File contains no .shp file"
        return False
    elif (len(shapefiles) == 1) and ("/" not in shapefiles[0]):
        # The normal case: a single SHP file at the top level of the archive.
        with open(file.name, "rb") as f:
            try:
                geopandas.read_file(f)
            except DataSourceError as e:
                resource["error"] = str(e)
                return False
            except GEOSException as e:
                resource["error"] = str(e)
                return False
        return True
    else:
        # Several SHP files, or one inside a subdirectory: load each by its path.
        with zipfile.ZipFile(file.name, "r") as z:
            for shp in shapefiles:
                with z.open(shp) as f:
                    try:
                        geopandas.read_file(f"zip://{file.name}!{shp}")
                    except DataSourceError as e:
                        resource["error"] = str(e)
                        return False
                    except GEOSException as e:
                        resource["error"] = str(e)
                        return False
        return True
import xml.etree.ElementTree as ET
import requests
import tempfile


def _load_into_file(url):
    """Download `url` into a temporary file and return the closed file object."""
    response = requests.get(url)
    response.raise_for_status()
    with tempfile.NamedTemporaryFile(delete=False) as temp_file:
        temp_file.write(response.content)
    return temp_file


def _is_capabilites_response(file):
    """Check if the file is an XML document with a WFS_Capabilities root element."""
    with open(file.name, "rb") as f:
        try:
            xml = ET.parse(f).getroot()
            return (
                xml.tag == "{http://www.opengis.net/wfs/2.0}WFS_Capabilities"
                or xml.tag == "{http://www.opengis.net/wfs}WFS_Capabilities"
            )
        except ET.ParseError:
            return False


def is_valid(resource, file):
    if _is_capabilites_response(file):
        return True
    # The response is not a capabilities XML file. That is allowed.
    # Let's add the request parameters to the URL and try again.
    url = resource["url"]
    if "request=" not in url.lower():
        # Pick the separator depending on whether a query string already exists.
        if "?" not in url:
            url = url + "?"
        elif not url.endswith(("?", "&")):
            url = url + "&"
        url = url + "service=WFS&request=GetCapabilities"
        return _is_capabilites_response(_load_into_file(url))
    else:
        # The URL already contains a GetCapabilities request but the result was not a correct answer.
        return False
import xml.etree.ElementTree as ET
import requests
import tempfile


def _load_into_file(url):
    """Download `url` into a temporary file and return the closed file object."""
    response = requests.get(url)
    response.raise_for_status()
    with tempfile.NamedTemporaryFile(delete=False) as temp_file:
        temp_file.write(response.content)
    return temp_file


def _is_capabilites_response(file):
    """Check if the file is an XML document with a WMS_Capabilities root element."""
    with open(file.name, "rb") as f:
        try:
            xml = ET.parse(f).getroot()
            return xml.tag == "{http://www.opengis.net/wms}WMS_Capabilities"
        except ET.ParseError:
            return False


def is_valid(resource, file):
    if _is_capabilites_response(file):
        return True
    # The response is not a capabilities XML file. That is allowed.
    # Let's add the request parameters to the URL and try again.
    url = resource["url"]
    if "request=" not in url.lower():
        # Pick the separator depending on whether a query string already exists.
        if "?" not in url:
            url = url + "?"
        elif not url.endswith(("?", "&")):
            url = url + "&"
        url = url + "service=WMS&request=GetCapabilities"
        return _is_capabilites_response(_load_into_file(url))
    else:
        # The URL already contains a GetCapabilities request but the result was not a correct answer.
        return False
import xml.etree.ElementTree as ET


def is_valid(resource, file):
    """Check if the HTTP response is a well-formed XML document."""
    with open(file.name, "rb") as f:
        try:
            ET.parse(f)
            return True
        except ET.ParseError as e:
            resource["error"] = str(e)
            return False