@@ -30,16 +30,26 @@ The following format checks are currently being carried out:
| Format | Check |
| --------- | ------- |
| `ATOM` | Checks whether the file content is a valid ATOM feed by confirming that the root element is `<feed>` in the Atom XML namespace. |
| `DOCX` | Verifies that the file is a valid DOCX by ensuring the ZIP archive contains the necessary XML files (`document.xml` and `styles.xml`). |
| `GEOJSON` | Loads and validates the file using [`GeoPandas`](https://geopandas.org). |
| `GEOTIFF` | Verifies that the file is a valid GeoTIFF by checking its GeoTransform information. Both standalone and ZIP-compressed GeoTIFF files are supported. |
| `GML` | Loads and validates the file using [`GeoPandas`](https://geopandas.org). |
| `JPEG` | Loads and validates the image file. |
| `JSON` | Verifies that the file is syntactically correct JSON and, if it is a *Frictionless Data Resource*, checks it using Frictionless Tools. |
| `ODS` | Verifies that the file is a valid ODS (OpenDocument Spreadsheet) by checking the ZIP structure, required files, and correct MIME type. |
| `ODT` | Verifies that the file is a valid ODT (OpenDocument Text) by checking the ZIP structure, required files, and correct MIME type. |
| `PARQUET` | Verifies that the file is a readable Apache Parquet file by loading it using [`pandas`](https://pandas.pydata.org/). |
| `PDF` | Loads and validates the PDF document using [`pypdf`](https://pypi.org/project/pypdf/). |
| `PNG` | Loads and validates the image file. |
| `RDF` | Verifies that the file is a valid RDF (Resource Description Framework) document and contains at least two statements. |
| `SHP` | Loads and validates the file using [`GeoPandas`](https://geopandas.org). |
| `WFS` | Checks whether the response is a well-formed `WFS_Capabilities` XML document. If the address does not contain the `request=GetCapabilities` parameter, a `GetCapabilities` request is performed and its response is checked instead. |
| `WMS` | Checks whether the response is a well-formed `WMS_Capabilities` XML document. If the address does not contain the `request=GetCapabilities` parameter, a `GetCapabilities` request is performed and its response is checked instead. |
| `WMTS` | Checks whether the file contains a valid WMTS (Web Map Tile Service) capabilities XML response, either directly or by performing a `GetCapabilities` request. |
| `XLSX` | Verifies that the file is a ZIP archive and contains the required files (`xl/workbook.xml` and `xl/styles.xml`) typical of a valid XLSX file. |
| `XML` | Verifies that the file is well-formed XML. |
| `ZIP` | Verifies that the file is a valid ZIP archive using Python's `zipfile.is_zipfile()` function. |
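
Several of the checks in the table need only the Python standard library. The following is a minimal sketch of how the `ZIP`, `XLSX`, and `ATOM` checks and the `GetCapabilities` handling could look. It is not the project's actual implementation: the function names (`is_zip`, `looks_like_xlsx`, `is_atom_feed`, `with_get_capabilities`) are hypothetical, and the case-insensitive match on the `request` parameter is an assumption.

```python
import io
import xml.etree.ElementTree as ET
import zipfile
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

ATOM_NS = "http://www.w3.org/2005/Atom"


def is_zip(data: bytes) -> bool:
    """ZIP row: delegate to zipfile.is_zipfile()."""
    return zipfile.is_zipfile(io.BytesIO(data))


def looks_like_xlsx(data: bytes) -> bool:
    """XLSX row: a ZIP archive containing the workbook and styles parts."""
    if not is_zip(data):
        return False
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        names = set(archive.namelist())
    return {"xl/workbook.xml", "xl/styles.xml"} <= names


def is_atom_feed(data: bytes) -> bool:
    """ATOM row: the root element must be <feed> in the Atom namespace."""
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        return False
    return root.tag == f"{{{ATOM_NS}}}feed"


def with_get_capabilities(url: str) -> str:
    """WFS/WMS/WMTS rows: add request=GetCapabilities if the URL lacks it."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    has_request = any(
        key.lower() == "request" and value.lower() == "getcapabilities"
        for key, value in query
    )
    if not has_request:
        query.append(("request", "GetCapabilities"))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Working on in-memory bytes keeps each check independent of how the file was downloaded. The capabilities document returned for the rewritten URL would then go through a root-element check similar to `is_atom_feed`.
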
## Installation
...
@@ -208,8 +218,6 @@ In this example:
2. Define the desired replacements in the JSON array format described above.
3. Run the script as usual. If the file exists, replacements will be applied automatically.
By using `uri_replacements.json`, you can streamline URL handling and ensure consistent preprocessing for your link-checking tasks.
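
The "apply if the file exists" behaviour can be sketched as below. This is only an illustration: the helper names `load_replacements` and `apply_replacements` are hypothetical, and the rule keys `regex` and `replaced_by` are assumptions that should be checked against the JSON array format described above.

```python
import json
import re
from pathlib import Path


def load_replacements(path: str = "uri_replacements.json") -> list[dict]:
    """Return the replacement rules, or an empty list if the file is absent."""
    file = Path(path)
    if not file.exists():
        return []
    return json.loads(file.read_text(encoding="utf-8"))


def apply_replacements(uri: str, rules: list[dict]) -> str:
    """Apply each rule in order (the key names are assumed, not confirmed)."""
    for rule in rules:
        uri = re.sub(rule["regex"], rule["replaced_by"], uri)
    return uri
```

With an empty rule list, `apply_replacements` returns the URI unchanged, so a missing file simply means no preprocessing.
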
## Docker
You can run the script in a Docker container. See the [Dockerfile](./Dockerfile) for more information.
...
@@ -225,7 +233,7 @@ You can run the script in a Docker container. See the [Dockerfile](./Dockerfile)
2. Run the Docker container:
```sh
docker run --rm dcat-catalog-check --url https://example.com