Verified Commit 0b3d00cf authored by Thorge Petersen

docs: updated format check descriptions

parent 52900efc
1 merge request: !1 Update Formats, Dependencies, and Dockerfile Configuration
Pipeline #1424 passed
@@ -30,16 +30,26 @@ The following format checks are currently being carried out:
| Format | Check |
| --------- | ------- |
-| `GEOJSON` | Load the file using [`GeoPandas`](https://geopandas.org). |
-| `GML` | Load the file using [`GeoPandas`](https://geopandas.org). |
-| `JPEG` | Load the image. |
-| `JSON` | Is it syntactically correct JSON? If it is a *Frictionless Data Resource*, it is checked with the Frictionless Tools. |
-| `PNG` | Load the image. |
-| `PDF` | Load the document using [`pypdf`](https://pypi.org/project/pypdf/). |
-| `SHP` | Load the file using [`GeoPandas`](https://geopandas.org). |
-| `WFS` | Is it a valid well-formed `WMS_Capabilities` XML document? If the address does not contain the `request=GetCapabilities` parameter, a `GetCapabilities` request is performed. This response is then checked. |
-| `WMS` | Is it a valid well-formed `WFS_Capabilities` XML document? If the address does not contain the `request=GetCapabilities` parameter, a `GetCapabilities` request is performed. This response is then checked. |
-| `XML` | Is it well-formed XML? |
+| `ATOM` | Validates that the content is an Atom feed by confirming that the root element is `<feed>` in the Atom XML namespace. |
+| `DOCX` | Verifies that the file is a valid DOCX document by checking that the ZIP archive contains the required XML files (`document.xml` and `styles.xml`). |
+| `GEOJSON` | Loads and validates the file using [`GeoPandas`](https://geopandas.org). |
+| `GEOTIFF` | Verifies that the file is a valid GeoTIFF by checking its GeoTransform information. Both standalone and ZIP-compressed GeoTIFF files are supported. |
+| `GML` | Loads and validates the file using [`GeoPandas`](https://geopandas.org). |
+| `JPEG` | Loads and validates the image file. |
+| `JSON` | Verifies that the file is syntactically correct JSON and, if it is a *Frictionless Data Resource*, validates it with the Frictionless Tools. |
+| `ODS` | Validates that the file is an ODS (OpenDocument Spreadsheet) file by checking the ZIP structure, the required files, and the MIME type. |
+| `ODT` | Validates that the file is an ODT (OpenDocument Text) file by checking the ZIP structure, the required files, and the MIME type. |
+| `PARQUET` | Verifies that the file is a readable Apache Parquet file by loading it with [`pandas`](https://pandas.pydata.org/). |
+| `PDF` | Loads and validates the PDF document using [`pypdf`](https://pypi.org/project/pypdf/). |
+| `PNG` | Loads and validates the image file. |
+| `RDF` | Verifies that the file is a valid RDF (Resource Description Framework) document containing at least two statements. |
+| `SHP` | Loads and validates the file using [`GeoPandas`](https://geopandas.org). |
+| `WFS` | Checks whether the file is a well-formed `WFS_Capabilities` XML document. If the address does not contain the `request=GetCapabilities` parameter, a `GetCapabilities` request is performed first and its response is validated. |
+| `WMS` | Checks whether the file is a well-formed `WMS_Capabilities` XML document. If the address does not contain the `request=GetCapabilities` parameter, a `GetCapabilities` request is performed first and its response is validated. |
+| `WMTS` | Checks whether the file is a valid WMTS (Web Map Tile Service) capabilities XML document, either as provided directly or by performing a `GetCapabilities` request. |
+| `XLSX` | Verifies that the file is a ZIP archive containing the files required for a valid XLSX workbook (`xl/workbook.xml` and `xl/styles.xml`). |
+| `XML` | Verifies that the file is well-formed XML. |
+| `ZIP` | Verifies that the file is a valid ZIP archive using Python's `zipfile.is_zipfile()` function. |
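The ZIP-based rows in the table above (`DOCX`, `XLSX`, `ODS`, `ODT`, `ZIP`) share one pattern. The check first confirms that the file is a ZIP archive and then looks for required members. The following is a minimal sketch of that pattern under stated assumptions, not the tool's actual code. The helper `check_zip_container` and the `REQUIRED_MEMBERS` and `ODF_MIMETYPES` tables are hypothetical. The `word/` prefix for the DOCX members is assumed from the usual OOXML package layout. The ODF member list (`mimetype`, `content.xml`, `META-INF/manifest.xml`) is assumed from the OpenDocument packaging conventions.

```python
import io
import zipfile

# Hypothetical table of members each format must contain. DOCX/XLSX follow the
# format table (DOCX paths assume the usual "word/" folder); the ODF entries
# are an assumption based on OpenDocument packaging conventions.
REQUIRED_MEMBERS = {
    "DOCX": {"word/document.xml", "word/styles.xml"},
    "XLSX": {"xl/workbook.xml", "xl/styles.xml"},
    "ODS": {"mimetype", "content.xml", "META-INF/manifest.xml"},
    "ODT": {"mimetype", "content.xml", "META-INF/manifest.xml"},
}

# Expected content of the ODF "mimetype" member.
ODF_MIMETYPES = {
    "ODS": "application/vnd.oasis.opendocument.spreadsheet",
    "ODT": "application/vnd.oasis.opendocument.text",
}


def check_zip_container(data: bytes, fmt: str) -> bool:
    """Return True if `data` looks like a container of type `fmt` (hypothetical helper)."""
    buffer = io.BytesIO(data)
    # Step shared by every ZIP-based format: is this a ZIP archive at all?
    if not zipfile.is_zipfile(buffer):
        return False
    with zipfile.ZipFile(buffer) as archive:
        names = set(archive.namelist())
        # Plain "ZIP" has no required members, so only the archive test applies.
        if not REQUIRED_MEMBERS.get(fmt, set()) <= names:
            return False
        if fmt in ODF_MIMETYPES:
            # The ODF "mimetype" member names the document type as plain text.
            mimetype = archive.read("mimetype").decode("ascii", errors="replace").strip()
            return mimetype == ODF_MIMETYPES[fmt]
    return True
```

With `fmt="ZIP"` the sketch reduces to the single `zipfile.is_zipfile()` call that the `ZIP` row describes. The other formats add a set-inclusion test on `namelist()`.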
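The XML-based rows (`ATOM`, `WFS`, `WMS`, `WMTS`) come down to two steps. The first parses the document and inspects its root element. The second builds a `GetCapabilities` URL when the address does not already request one. Below is a minimal standard-library sketch of both steps; it is not the tool's implementation. All function names are hypothetical. Adding a `service` parameter next to `request=GetCapabilities` is an assumption, based on the OGC convention that a `GetCapabilities` request names its service.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

ATOM_NS = "http://www.w3.org/2005/Atom"


def _root_tag(data: bytes):
    """Return the root element's tag, or None if `data` is not well-formed XML."""
    try:
        return ET.fromstring(data).tag
    except ET.ParseError:
        return None


def is_atom_feed(data: bytes) -> bool:
    # ElementTree reports namespaced tags as "{namespace}localname".
    return _root_tag(data) == f"{{{ATOM_NS}}}feed"


def has_capabilities_root(data: bytes, root_name: str) -> bool:
    tag = _root_tag(data)
    if tag is None:
        return False
    # Compare only the local name and ignore the (version-dependent) namespace.
    return tag.rsplit("}", 1)[-1] == root_name


def capabilities_url(url: str, service: str) -> str:
    """Return `url` unchanged if it already requests GetCapabilities, else add the parameters."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    # Parameter names and values are matched case-insensitively here.
    if any(k.lower() == "request" and v.lower() == "getcapabilities" for k, v in query):
        return url
    # Drop any conflicting REQUEST/SERVICE values before adding our own.
    query = [(k, v) for k, v in query if k.lower() not in ("request", "service")]
    query += [("service", service), ("request", "GetCapabilities")]
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, `capabilities_url("https://example.com/wms", "WMS")` yields `https://example.com/wms?service=WMS&request=GetCapabilities`. One caveat applies to the root-name comparison: WMS 1.1.1 servers answer with a `WMT_MS_Capabilities` root rather than `WMS_Capabilities`. A check that accepts only the latter would reject them.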
## Installation
@@ -208,8 +218,6 @@ In this example:
2. Define the desired replacements in the JSON array format described above.
3. Run the script as usual. If the file exists, replacements will be applied automatically.
By using `uri_replacements.json`, you can streamline URL handling and ensure consistent preprocessing for your link-checking tasks.
## Docker
You can run the script in a Docker container. See the [Dockerfile](./Dockerfile) for more information.
@@ -225,7 +233,7 @@ You can run the script in a Docker container. See the [Dockerfile](./Dockerfile)
2. Run the Docker container:
```sh
-docker run --rm dcat-catalog-check --url https://example.com
+docker run --rm dcat-catalog-check --url https://example.com
```
## Tests