diff --git a/README.md b/README.md
index 12d3898b6cc38f35ad59ace2bf9c477abcb21f2f..e891a00ad61d9c21bebce4eab83fdb04fc874b27 100644
--- a/README.md
+++ b/README.md
@@ -30,16 +30,26 @@ The following format checks are currently being carried out:
 
 | Format | Check |
 | --------- | ------- |
-| `GEOJSON` | Load the file using [`GeoPandas`](https://geopandas.org). |
-| `GML` | Load the file using [`GeoPandas`](https://geopandas.org). |
-| `JPEG` | Load the image. |
-| `JSON` | Is it syntactically correct JSON? If it is a *Frictionless Data Resource*, it is checked with the Frictionless Tools. |
-| `PNG` | Load the image. |
-| `PDF` | Load the document using [`pypdf`](https://pypi.org/project/pypdf/). |
-| `SHP` | Load the file using [`GeoPandas`](https://geopandas.org). |
-| `WFS` | Is it a valid well-formed `WMS_Capabilities` XML document? If the address does not contain the `request=GetCapabilities` parameter, a `GetCapabilities` request is performed. This response is then checked. |
-| `WMS` | Is it a valid well-formed `WFS_Capabilities` XML document? If the address does not contain the `request=GetCapabilities` parameter, a `GetCapabilities` request is performed. This response is then checked. |
-| `XML` | Is it well-formed XML? |
+| `ATOM` | Checks that the file is an Atom feed: the root element must be `<feed>` in the Atom XML namespace. |
+| `DOCX` | Checks that the file is a ZIP archive containing the XML files a DOCX document requires (`document.xml` and `styles.xml`). |
+| `GEOJSON` | Checks that the file can be loaded with [`GeoPandas`](https://geopandas.org). |
+| `GEOTIFF` | Checks that the file is a valid GeoTIFF by reading its GeoTransform information. Both standalone and ZIP-compressed GeoTIFF files are supported. |
+| `GML` | Checks that the file can be loaded with [`GeoPandas`](https://geopandas.org). |
+| `JPEG` | Checks that the image can be loaded. |
+| `JSON` | Checks that the file is syntactically correct JSON. If it is a *Frictionless Data Resource*, it is additionally checked with the Frictionless Tools. |
+| `ODS` | Checks that the file is a valid ODS (OpenDocument Spreadsheet) file: a ZIP archive with the required files and the correct MIME type. |
+| `ODT` | Checks that the file is a valid ODT (OpenDocument Text) file: a ZIP archive with the required files and the correct MIME type. |
+| `PARQUET` | Checks that the file is a readable Apache Parquet file by loading it with [`pandas`](https://pandas.pydata.org/). |
+| `PDF` | Checks that the document can be loaded with [`pypdf`](https://pypi.org/project/pypdf/). |
+| `PNG` | Checks that the image can be loaded. |
+| `RDF` | Checks that the file is a valid RDF (Resource Description Framework) document containing at least two statements. |
+| `SHP` | Checks that the file can be loaded with [`GeoPandas`](https://geopandas.org). |
+| `WFS` | Checks that the response is a well-formed `WFS_Capabilities` XML document. If the URL does not contain the `request=GetCapabilities` parameter, a `GetCapabilities` request is performed and its response is checked. |
+| `WMS` | Checks that the response is a well-formed `WMS_Capabilities` XML document. If the URL does not contain the `request=GetCapabilities` parameter, a `GetCapabilities` request is performed and its response is checked. |
+| `WMTS` | Checks that the URL returns a valid WMTS (Web Map Tile Service) capabilities XML document, either directly or via a `GetCapabilities` request. |
+| `XLSX` | Checks that the file is a ZIP archive containing the files a valid XLSX file requires (`xl/workbook.xml` and `xl/styles.xml`). |
+| `XML` | Checks that the file is well-formed XML. |
+| `ZIP` | Checks that the file is a valid ZIP archive using Python's `zipfile.is_zipfile()` function. |
 
 ## Installation
 
@@ -208,8 +218,6 @@ In this example:
 2. Define the desired replacements in the JSON array format described above.
 3. Run the script as usual. If the file exists, replacements will be applied automatically.
 
-By using `uri_replacements.json`, you can streamline URL handling and ensure consistent preprocessing for your link-checking tasks.
-
 ## Docker
 
 You can run the script in a Docker container. See the [Dockerfile](./Dockerfile) for more information.
@@ -225,7 +233,7 @@ You can run the script in a Docker container. See the [Dockerfile](./Dockerfile)
 2. Run the Docker container:
 
    ```sh
-   docker run --rm dcat-catalog-check --url https://example.com
+   docker run --rm dcat-catalog-check --url https://example.com
    ```
 
 ## Tests
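The `ATOM` check in the table above comes down to a namespace-aware look at the root element. Below is a minimal sketch that uses only the standard library. The function name `is_atom_feed` is hypothetical, and the tool's actual implementation may differ.

```python
import xml.etree.ElementTree as ET

# Atom 1.0 namespace (RFC 4287).
ATOM_NS = "http://www.w3.org/2005/Atom"


def is_atom_feed(content: bytes) -> bool:
    """Return True if `content` is XML whose root element is an Atom <feed>."""
    try:
        root = ET.fromstring(content)
    except ET.ParseError:
        # Not well-formed XML, so it cannot be an Atom feed.
        return False
    # ElementTree renders namespaced tags as "{namespace}localname".
    return root.tag == f"{{{ATOM_NS}}}feed"


feed = b'<feed xmlns="http://www.w3.org/2005/Atom"><title>Example</title></feed>'
print(is_atom_feed(feed))        # True
print(is_atom_feed(b"<feed/>"))  # False: <feed> without the Atom namespace
```

Comparing the fully qualified tag, rather than only the local name `feed`, is what separates an Atom feed from an unrelated XML document that happens to use a `<feed>` element.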
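The `ZIP` and `XLSX` rows describe a two-step check: confirm the bytes form a ZIP archive, then look for specific member names. Here is a minimal sketch of that idea, operating on in-memory bytes. The helper names `is_zip` and `is_xlsx` are hypothetical. The member list is taken from the `XLSX` row, and the code is not the tool's own implementation.

```python
import io
import zipfile

# Members listed in the README's XLSX row.
XLSX_REQUIRED = ("xl/workbook.xml", "xl/styles.xml")


def is_zip(content: bytes) -> bool:
    # zipfile.is_zipfile accepts file-like objects as well as paths.
    return zipfile.is_zipfile(io.BytesIO(content))


def is_xlsx(content: bytes) -> bool:
    """Return True if `content` is a ZIP archive containing the XLSX members."""
    if not is_zip(content):
        return False
    with zipfile.ZipFile(io.BytesIO(content)) as archive:
        names = set(archive.namelist())
    return all(member in names for member in XLSX_REQUIRED)


# Build a tiny archive in memory to exercise the checks.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as archive:
    archive.writestr("xl/workbook.xml", "<workbook/>")
    archive.writestr("xl/styles.xml", "<styleSheet/>")
print(is_xlsx(buf.getvalue()))  # True
print(is_zip(b"plain text"))    # False
```

The `DOCX` row follows the same pattern with a different member list.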
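The `ODS` and `ODT` rows mention "required files" and the "correct MIME type" without naming them. The sketch below assumes that the required files are `mimetype`, `content.xml` and `META-INF/manifest.xml`, which is an assumption since the README does not list them. It also assumes the MIME type is read from the package's `mimetype` member. The function `is_odf` and its `kind` parameter are hypothetical.

```python
import io
import zipfile

# MIME types defined by the OpenDocument (ODF) specification.
ODF_MIMETYPES = {
    "ods": "application/vnd.oasis.opendocument.spreadsheet",
    "odt": "application/vnd.oasis.opendocument.text",
}
# Assumption: the README does not name the "required files"; these are
# members that ODF packages normally contain.
ODF_REQUIRED = ("mimetype", "content.xml", "META-INF/manifest.xml")


def is_odf(content: bytes, kind: str) -> bool:
    """Return True if `content` looks like an OpenDocument file of `kind`."""
    if not zipfile.is_zipfile(io.BytesIO(content)):
        return False
    with zipfile.ZipFile(io.BytesIO(content)) as archive:
        names = set(archive.namelist())
        if not all(member in names for member in ODF_REQUIRED):
            return False
        # The "mimetype" member stores the document's MIME type as plain text.
        mimetype = archive.read("mimetype").decode("ascii", errors="replace").strip()
    return mimetype == ODF_MIMETYPES[kind]


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as archive:
    archive.writestr("mimetype", ODF_MIMETYPES["ods"])
    archive.writestr("content.xml", "<office:document-content/>")
    archive.writestr("META-INF/manifest.xml", "<manifest:manifest/>")
print(is_odf(buf.getvalue(), "ods"))  # True
print(is_odf(buf.getvalue(), "odt"))  # False: the MIME type is the ODS one
```

A stricter variant could also verify that `mimetype` is the first archive entry and is stored uncompressed, as the ODF packaging rules expect.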