88 changes: 84 additions & 4 deletions README.md
@@ -44,10 +44,26 @@ python3 -m pip install databusclient
You can then use the client in the command line:

```bash
# Python
databusclient --help
databusclient deploy --help
databusclient delete --help
databusclient download --help

# Example output:
# Usage: databusclient [OPTIONS] COMMAND [ARGS]...
#
# Options:
# --install-completion [bash|zsh|fish|powershell|pwsh] Install completion for the specified shell.
# --show-completion [bash|zsh|fish|powershell|pwsh] Show completion for the specified shell.
# --help Show this message and exit.
#
# Commands:
# deploy
# download
# delete
# mkdist
# completion
```

### Download command
```
Comment on lines +66 to 67

⚠️ Potential issue | 🟡 Minor

Complete the code block or remove the incomplete heading.

Line 66 contains a heading "### Download command" followed by an incomplete code fence. Either complete this section with the intended content or remove it if it was added in error.

🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

67-67: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
In README.md around lines 66 to 67, the heading "### Download command" is
followed by an incomplete code fence; complete the markdown by either adding the
intended download command(s) inside a properly closed code block (triple
backticks) and any explanatory text, or remove the heading and the stray
backtick if this section was added by mistake so the file has no unterminated
code fence.
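This comment describes two fence checks: an unterminated fence and MD040, which flags a fence without a language. A minimal sketch of both follows. It assumes only backtick fences indented by at most three spaces. `check_fences` is a hypothetical helper, not part of markdownlint or databusclient.

```python
import re

# Simplified: only ``` fences, indented by at most three spaces.
FENCE = re.compile(r"^ {0,3}```(.*)$")


def check_fences(markdown: str) -> list[str]:
    """Report fences without a language and a fence that is never closed."""
    problems: list[str] = []
    open_line = None  # line number of the fence that opened the current block
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        match = FENCE.match(line)
        if not match:
            continue
        info = match.group(1).strip()
        if open_line is None:
            # This fence opens a block; an empty info string means no language.
            open_line = lineno
            if not info:
                problems.append(f"{lineno}: fenced code block without a language")
        elif not info:
            # A bare ``` closes the open block; ```lang inside a block is content.
            open_line = None
    if open_line is not None:
        problems.append(f"{open_line}: unterminated code fence")
    return problems
```

Run against the two lines flagged above (`### Download command` followed by a bare fence), it reports both problems for line 2.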


### Docker
@@ -283,7 +299,7 @@ Usage: databusclient deploy [OPTIONS] [DISTRIBUTIONS]...
- Upload & deploy via Nextcloud (--webdav-url, --remote, --path)

Options:
--version-id TEXT Target databus version/dataset identifier of the form <h
--versionid TEXT Target databus version/dataset identifier of the form <h
ttps://databus.dbpedia.org/$ACCOUNT/$GROUP/$ARTIFACT/$VE
RSION> [required]
--title TEXT Dataset title [required]
@@ -297,6 +313,18 @@ Options:
--remote TEXT rclone remote name (e.g., 'nextcloud')
--path TEXT Remote path on Nextcloud (e.g., 'datasets/mydataset')
--help Show this message and exit.
<<<<<<< HEAD

```
#### Examples of using deploy command
##### Mode 1: Classic Deploy (Distributions)
```
databusclient deploy --versionid https://databus.dbpedia.org/user1/group1/artifact1/2022-05-18 --title title1 --abstract abstract1 --description description1 --license http://dalicc.net/licenselibrary/AdaptivePublicLicense10 --apikey MYSTERIOUS 'https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml|type=swagger'
```

```
databusclient deploy --versionid https://dev.databus.dbpedia.org/denis/group1/artifact1/2022-05-18 --title "Client Testing" --abstract "Testing the client...." --description "Testing the client...." --license http://dalicc.net/licenselibrary/AdaptivePublicLicense10 --apikey MYSTERIOUS 'https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml|type=swagger'
=======
Comment on lines +316 to +327

⚠️ Potential issue | 🔴 Critical

Resolve merge conflicts before merging.

The README contains unresolved merge conflict markers at multiple locations (lines 316-327, 368-389, and 419-444). These must be resolved to ensure the documentation is consistent and complete.

Also applies to: 368-389, 419-444

🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

321-321: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


325-325: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
In README.md around lines 316-327 (also review and fix the similar markers at
368-389 and 419-444), there are unresolved Git conflict markers (<<<<<<<,
=======, >>>>>>>) left in the file; remove the conflict markers, decide which of
the conflicting blocks to keep (or merge their contents manually into a single
correct example), ensure the surrounding Markdown is valid (code fences,
headings, and example commands are coherent), then save and commit the resolved
file.
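Before committing the resolution, you can scan the file to confirm that no markers remain. A minimal sketch, assuming Git's default marker style: seven `<`, `|` (diff3 style), `=` or `>` characters at the start of a line. `find_conflict_markers` is a hypothetical helper. A Setext heading underlined with exactly seven `=` characters would be a false positive.

```python
import re

# <<<<<<< ours, ||||||| base (diff3 style), =======, >>>>>>> theirs
MARKER = re.compile(r"^(?:<{7}|\|{7}|={7}|>{7})(?:\s.*)?$")


def find_conflict_markers(text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) for every line that looks like a conflict marker."""
    return [
        (lineno, line)
        for lineno, line in enumerate(text.splitlines(), start=1)
        if MARKER.match(line)
    ]
```

You could call it on the README contents in CI and fail the job if the list is non-empty. Git can also catch this: `git diff --check` warns when a change introduces leftover conflict markers.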

```

### Mode 1: Classic Deploy (Distributions)
@@ -320,6 +348,7 @@ docker run --rm -v $(pwd):/data dbpedia/databus-python-client deploy \
--license http://dalicc.net/licenselibrary/AdaptivePublicLicense10 \
--apikey MYSTERIOUS \
'https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml|type=swagger
>>>>>>> upstream/main
```
A few more notes for CLI usage:

@@ -336,6 +365,10 @@ All files referenced there will be registered on the Databus.
```bash
# Python
databusclient deploy \
<<<<<<< HEAD
--metadata /home/metadata.json \
--versionid https://databus.org/user/dataset/version/1.0 \
=======
--metadata ./metadata.json \
--version-id https://databus.dbpedia.org/user1/group1/artifact1/1.0 \
--title "Metadata Deploy Example" \
@@ -347,6 +380,7 @@ databusclient deploy \
docker run --rm -v $(pwd):/data dbpedia/databus-python-client deploy \
--metadata ./metadata.json \
--version-id https://databus.dbpedia.org/user1/group1/artifact1/1.0 \
>>>>>>> upstream/main
--title "Metadata Deploy Example" \
--abstract "This is a short abstract of the dataset." \
--description "This dataset was uploaded using metadata.json." \
@@ -382,6 +416,9 @@ databusclient deploy \
--webdav-url https://cloud.example.com/remote.php/webdav \
--remote nextcloud \
--path datasets/mydataset \
<<<<<<< HEAD
--versionid https://databus.org/user/dataset/version/1.0 \
=======
--version-id https://databus.dbpedia.org/user1/group1/artifact1/1.0 \
--title "Test Dataset" \
--abstract "Short abstract of dataset" \
@@ -396,6 +433,7 @@ docker run --rm -v $(pwd):/data dbpedia/databus-python-client deploy \
--remote nextcloud \
--path datasets/mydataset \
--version-id https://databus.dbpedia.org/user1/group1/artifact1/1.0 \
>>>>>>> upstream/main
--title "Test Dataset" \
--abstract "Short abstract of dataset" \
--description "This dataset was uploaded for testing the Nextcloud → Databus pipeline." \
@@ -481,6 +519,48 @@ databusclient delete https://databus.dbpedia.org/dbpedia/collections/dbpedia-sna
docker run --rm -v $(pwd):/data dbpedia/databus-python-client delete https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2022-12 --databus-key YOUR_API_KEY
```

### mkdist command

Create a distribution string from components.

Usage:
```
databusclient mkdist URL --cv key=value --cv key2=value2 --format ttl --compression gz --sha-length <sha256hex>:<length>
```

Example:
```
python -m databusclient mkdist "https://example.org/file.ttl" --cv lang=en --cv part=sorted --format ttl --compression gz --sha-length aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa:12345
```

## Completion

Enable shell completion (bash example):
```
eval "$(_DATABUSCLIENT_COMPLETE=source_bash python -m databusclient)"
```

### mkdist command

Create a distribution string from components.

Usage:
```
databusclient mkdist URL --cv key=value --cv key2=value2 --format ttl --compression gz --sha-length <sha256hex>:<length>
```

Example:
```
python -m databusclient mkdist "https://example.org/file.ttl" --cv lang=en --cv part=sorted --format ttl --compression gz --sha-length aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa:12345
```

## Completion

Enable shell completion (bash example):
```
eval "$(_DATABUSCLIENT_COMPLETE=source_bash python -m databusclient)"
```

Comment on lines +522 to +563

⚠️ Potential issue | 🟡 Minor

Remove duplicate documentation sections.

The mkdist command documentation (lines 301-314 and 322-335) and Completion documentation (lines 315-321 and 336-342) appear twice in the file. Please remove the duplicate sections.

🔎 Proposed fix

Remove lines 322-342 to eliminate the duplication:

-### mkdist command
-
-Create a distribution string from components.
-
-Usage:
-```
-databusclient mkdist URL --cv key=value --cv key2=value2 --format ttl --compression gz --sha-length <sha256hex>:<length>
-```
-
-Example:
-```
-python -m databusclient mkdist "https://example.org/file.ttl" --cv lang=en --cv part=sorted --format ttl --compression gz --sha-length aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa:12345
-```
-
-## Completion
-
-Enable shell completion (bash example):
-```
-eval "$(_DATABUSCLIENT_COMPLETE=source_bash python -m databusclient)"
-```
-
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

306-306: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


311-311: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


318-318: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


327-327: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


332-332: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


336-336: Multiple headings with the same content

(MD024, no-duplicate-heading)


339-339: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
In README.md around lines 301 to 342 there are duplicate "mkdist command" and
"Completion" sections; remove the repeated block (lines 322-342) so only one
instance of the mkdist usage/example and one Completion section remain, ensuring
surrounding spacing and formatting stay consistent after deletion.
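To confirm the duplication is gone after the deletion, here is a minimal sketch of an MD024-style check. It assumes ATX (`#`) headings only. It skips lines inside ``` fences so that shell comments such as `# Python` are not mistaken for headings. `duplicate_headings` is a hypothetical helper and is looser than markdownlint's own rule.

```python
import re
from collections import defaultdict

# ATX heading: 1-6 '#', text, optional closing '#' run preceded by whitespace.
HEADING = re.compile(r"^(#{1,6})\s+(.*?)(?:\s+#+)?\s*$")


def duplicate_headings(markdown: str) -> dict[str, list[int]]:
    """Map each repeated heading text to the line numbers where it appears."""
    seen: dict[str, list[int]] = defaultdict(list)
    in_fence = False
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # '#' lines inside code blocks are not headings
            continue
        if in_fence:
            continue
        match = HEADING.match(line)
        if match:
            seen[match.group(2)].append(lineno)
    return {text: lines for text, lines in seen.items() if len(lines) > 1}
```

Keying on the heading text alone, regardless of level, makes `### mkdist command` and `## Completion` each report two line numbers for the duplicated README block.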

## Module Usage

<a id="module-deploy"></a>
47 changes: 47 additions & 0 deletions databusclient/cli.py
@@ -4,6 +4,7 @@
from typing import List

import click
import re

import databusclient.api.deploy as api_deploy
from databusclient.api.delete import delete as api_delete
@@ -213,5 +214,51 @@ def delete(databusuris: List[str], databus_key: str, dry_run: bool, force: bool)
)


@app.command()
@click.argument("url")
@click.option("--cv", "cvs", multiple=True, help="Content variant like key=value (repeatable). Keys must not contain '|' or '_'")
@click.option("--format", "file_format", help="Format extension (e.g. ttl)")
@click.option("--compression", help="Compression (e.g. gzip)")
@click.option("--sha-length", help="sha256:length (64 hex chars followed by ':' and integer length)")
@click.option("--json-output", is_flag=True, help="Output JSON distribution object instead of plain string")
def mkdist(url, cvs, file_format, compression, sha_length, json_output):
    """Create a distribution string from components."""
    # Validate CVs
    cvs_dict = {}
    for cv in cvs:
        if "=" not in cv:
            raise click.BadParameter(f"Invalid content variant '{cv}': expected key=value")
        key, val = cv.split("=", 1)
        if any(ch in key for ch in ("|", "_")):
            raise click.BadParameter("Invalid characters in content-variant key (forbidden: '|' and '_')")
        if key in cvs_dict:
            raise click.BadParameter(f"Duplicate content-variant key '{key}'")
        cvs_dict[key] = val

    # Validate sha-length
    sha_tuple = None
    if sha_length:
        if not re.match(r'^[A-Fa-f0-9]{64}:\d+$', sha_length):
            raise click.BadParameter("Invalid --sha-length; expected SHA256HEX:length")
        sha, length = sha_length.split(":", 1)
        sha_tuple = (sha, int(length))

    # Deterministic ordering
    sorted_cvs = {k: cvs_dict[k] for k in sorted(cvs_dict)}

    dist = api_deploy.create_distribution(url=url, cvs=sorted_cvs, file_format=file_format, compression=compression, sha256_length_tuple=sha_tuple)
    if json_output:
        import json as _json
        click.echo(_json.dumps({"distribution": dist}))
    else:
        click.echo(dist)
Comment on lines +250 to +254

⚠️ Potential issue | 🟡 Minor

Remove redundant local import.

The json module is already imported at line 2. The local import `import json as _json` on line 149 is unnecessary and can cause confusion.

🔎 Proposed fix

Since json is already imported at the module level (line 2), use it directly:

     dist = api_deploy.create_distribution(url=url, cvs=sorted_cvs, file_format=file_format, compression=compression, sha256_length_tuple=sha_tuple)
     if json_output:
-        import json as _json
-        click.echo(_json.dumps({"distribution": dist}))
+        click.echo(json.dumps({"distribution": dist}))
     else:
         click.echo(dist)
🤖 Prompt for AI Agents
In databusclient/cli.py around lines 148 to 152, there is a redundant local
import "import json as _json" even though json is already imported at the module
level; remove the local import and use the module-level json directly (replace
_json.dumps with json.dumps or simply call json.dumps) so the code echoes JSON
without the unnecessary alias.

Comment on lines +217 to +254

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Find the api_deploy module and create_distribution function
find . -name "*.py" -type f | xargs grep -l "def create_distribution" 2>/dev/null

Repository: dbpedia/databus-python-client

Length of output: 102


🏁 Script executed:

# Search for distribution string parsing/parsing logic
rg -n "parse.*distribution|distribution.*parse" --type=py -A 3 -B 3

Repository: dbpedia/databus-python-client

Length of output: 55


🏁 Script executed:

# Search for how distribution strings are split or used with "|" and "_"
rg -n "split.*['\"]\\|['\"]|split.*['\"]_['\"]" --type=py -A 5 -B 3

Repository: dbpedia/databus-python-client

Length of output: 4006


🏁 Script executed:

# Look at the api_deploy module structure
find . -path "*/api_deploy*" -type f 2>/dev/null | head -20

Repository: dbpedia/databus-python-client

Length of output: 55


🏁 Script executed:

rg -n "def create_distribution" --type=py -A 50 databusclient/api/deploy.py

Repository: dbpedia/databus-python-client

Length of output: 2307


🏁 Script executed:

# Also check if there's any encoding/escaping of CV values in the create_distribution function
rg -n "def create_distribution|def mkdist|encode|escape" --type=py -B 2 -A 20 databusclient/api/deploy.py

Repository: dbpedia/databus-python-client

Length of output: 1230


Validate content-variant values for forbidden characters.

The validation at lines 232-233 correctly forbids "|" and "_" in content-variant keys, but it does not validate the values. The distribution string format (url|key1=value1_key2=value2|format|compression|sha:length) uses "|" as the field separator and "_" as the CV pair separator, so allowing these characters in values breaks downstream parsing. For example, a value containing "_" would be split incorrectly at line 38 in `_get_content_variants()`, and a value containing "|" would disrupt the initial split at line 28, producing malformed distribution strings.

Add validation for values:

 for cv in cvs:
     if "=" not in cv:
         raise click.BadParameter(f"Invalid content variant '{cv}': expected key=value")
     key, val = cv.split("=", 1)
     if any(ch in key for ch in ("|", "_")):
         raise click.BadParameter("Invalid characters in content-variant key (forbidden: '|' and '_')")
+    if any(ch in val for ch in ("|", "_")):
+        raise click.BadParameter("Invalid characters in content-variant value (forbidden: '|' and '_')")
     if key in cvs_dict:
         raise click.BadParameter(f"Duplicate content-variant key '{key}'")
     cvs_dict[key] = val
🤖 Prompt for AI Agents
In databusclient/cli.py around lines 217 to 254, the current validation forbids
'|' and '_' only in content-variant keys but not in values, which can break the
distribution string format; add the same checks for the cv value after splitting
(forbid '|' and '_' in val) and raise click.BadParameter with a clear message
(e.g. "Invalid characters in content-variant value (forbidden: '|' and '_')") so
values containing those separators are rejected before building the
distribution.
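Below is a minimal sketch of the proposed validation, pulled out of Click so it can be tested on its own. It also includes a naive parser that shows why a `_` in a value breaks the format. The helper names are hypothetical. They only mirror the layout asserted in `tests/test_cli.py` (`url|a=1_b=2|ttl|gz|sha:length`) and are not the databusclient implementation of `create_distribution` or `_get_content_variants()`.

```python
FORBIDDEN = ("|", "_")  # field separator and CV pair separator


def validate_cvs(pairs: list[str]) -> dict[str, str]:
    """Parse key=value pairs, rejecting separators in keys AND values."""
    cvs: dict[str, str] = {}
    for pair in pairs:
        if "=" not in pair:
            raise ValueError(f"Invalid content variant '{pair}': expected key=value")
        key, val = pair.split("=", 1)
        for label, text in (("key", key), ("value", val)):
            if any(ch in text for ch in FORBIDDEN):
                raise ValueError(
                    f"Invalid characters in content-variant {label} "
                    "(forbidden: '|' and '_')"
                )
        if key in cvs:
            raise ValueError(f"Duplicate content-variant key '{key}'")
        cvs[key] = val
    # Sort keys so the same inputs always produce the same string.
    return {k: cvs[k] for k in sorted(cvs)}


def cv_field(cvs: dict[str, str]) -> str:
    """Join content variants as key=value pairs separated by '_'."""
    return "_".join(f"{k}={v}" for k, v in cvs.items())


def parse_cv_field(field: str) -> dict[str, str]:
    """Naive inverse of cv_field: split on '_', then on the first '='."""
    return dict(part.split("=", 1) for part in field.split("_") if part)
```

For example, `cv_field(validate_cvs(["b=2", "a=1"]))` yields `a=1_b=2`, matching the expected output in `test_mkdist_multiple_cv`. By contrast, `parse_cv_field("lang=en_US")` raises `ValueError` because the trailing `US` has no `=`. The sketch raises `ValueError` rather than `click.BadParameter` to stay dependency-free. Inside the command, the same checks would raise `click.BadParameter`.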



@app.command()
@click.argument("shell", type=click.Choice(["bash","zsh","fish","powershell"]), required=False)
def completion(shell="bash"):
    click.echo(f"Run: eval \"$(_DATABUSCLIENT_COMPLETE=source_{shell} python -m databusclient)\"")

Comment on lines +257 to +261

⚠️ Potential issue | 🟡 Minor

Fix default parameter behavior.

When no argument is provided to the completion command, Click will pass None (not "bash") because the argument is marked required=False. The default value in the function signature shell="bash" is never used in this scenario.

🔎 Proposed fix

Handle the None case explicitly or set a default in the function body:

 @app.command()
 @click.argument("shell", type=click.Choice(["bash","zsh","fish","powershell"]), required=False)
-def completion(shell="bash"):
+def completion(shell):
+    if shell is None:
+        shell = "bash"
     click.echo(f"Run: eval \"$(_DATABUSCLIENT_COMPLETE=source_{shell} python -m databusclient)\"")
🤖 Prompt for AI Agents
In databusclient/cli.py around lines 155 to 159, the completion command assumes
a default "bash" via the function signature but Click passes None when the
argument is omitted; update the function to accept shell=None (or keep
signature) and explicitly set shell = shell or "bash" at the start of the
function (or check if shell is None then assign "bash") before using it in the
echo so the default is applied correctly when no argument is provided.
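The mechanism can be reproduced without Click. Click calls the command callback with every declared parameter as a keyword argument, so an omitted optional argument arrives as `shell=None` and overrides the signature default. A minimal stdlib sketch follows. The function names are hypothetical, and the message template is copied from the PR's `completion` command. Click 8 documents values such as `bash_source` for the completion environment variable (Click 7 used `source_bash`), so check the template against the installed Click version.

```python
ALLOWED_SHELLS = ("bash", "zsh", "fish", "powershell")


def completion_with_signature_default(shell="bash"):
    # The default only applies when the caller omits the argument entirely.
    return shell


def resolve_shell(shell=None):
    """Apply the fallback inside the function body, then validate it."""
    shell = shell or "bash"
    if shell not in ALLOWED_SHELLS:
        raise ValueError(f"unsupported shell: {shell}")
    return shell


def completion_message(shell=None):
    return (
        f'Run: eval "$(_DATABUSCLIENT_COMPLETE=source_{resolve_shell(shell)} '
        'python -m databusclient)"'
    )


# Simulate Click invoking the callback with the omitted argument as None.
print(completion_with_signature_default(shell=None))  # None
print(completion_message(shell=None))
```

An alternative that keeps the default declarative is to pass `default="bash"` to `click.argument`, which Click then supplies when the argument is omitted.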


if __name__ == "__main__":
    app()
2 changes: 1 addition & 1 deletion test.sh
@@ -1,7 +1,7 @@
#!/usr/bin/env bash

databusclient deploy \
--version-id "https://d8lr.tools.dbpedia.org/hopver/testGroup/testArtifact/1.0-alpha/" \
--versionid "https://d8lr.tools.dbpedia.org/hopver/testGroup/testArtifact/1.0-alpha/" \
--title "Test Title" \
--abstract "Test Abstract" \
--description "Test Description" \
42 changes: 42 additions & 0 deletions tests/test_cli.py
@@ -0,0 +1,42 @@
from click.testing import CliRunner
from databusclient import cli


def test_mkdist_multiple_cv():
    runner = CliRunner()
    sha = 'a' * 64
    res = runner.invoke(cli.app, [
        'mkdist',
        'https://example.org/file',
        '--cv', 'b=2',
        '--cv', 'a=1',
        '--format', 'ttl',
        '--compression', 'gz',
        '--sha-length', f'{sha}:42'
    ])
    assert res.exit_code == 0, res.output
    # keys should be sorted alphabetically: a then b
    assert res.output.strip() == f'https://example.org/file|a=1_b=2|ttl|gz|{sha}:42'


def test_mkdist_invalid_cv():
    runner = CliRunner()
    res = runner.invoke(cli.app, ['mkdist', 'https://example.org/file', '--cv', 'badcv'])
    assert res.exit_code != 0
    assert 'Invalid content variant' in res.output


def test_mkdist_invalid_sha():
    runner = CliRunner()
    res = runner.invoke(cli.app, [
        'mkdist', 'https://example.org/file', '--cv', 'k=v', '--sha-length', 'abc:123'
    ])
    assert res.exit_code != 0
    assert 'Invalid --sha-length' in res.output


def test_completion_output():
    runner = CliRunner()
    res = runner.invoke(cli.app, ['completion', 'bash'])
    assert res.exit_code == 0
    assert '_DATABUSCLIENT_COMPLETE' in res.output