Merged
22 changes: 21 additions & 1 deletion CHANGELOG.md
````diff
@@ -1,6 +1,26 @@
 # Changelog
 
-## Version 0.4 (development)
+## Version 0.5.0
+
+- SQLAlchemy session management
+    * Implemented proper session handling
+    * Fixed `DetachedInstanceError` issues and added helper method `_get_detached_resource` for consistent session management
+    * Improved transaction handling with commits and rollbacks
+
+- New features
+    * Added cache statistics with `get_stats()` method
+    * Implemented resource tagging
+    * Added cache size management
+    * Added support for file compression
+    * Added resource validation with checksums
+    * Improved search
+    * Added metadata export/import functionality
+
+## Version 0.4.1
+
+- Method to list all resources.
+
+## Version 0.4
 
 - Migrate the schema to match R/Bioconductor's BiocFileCache (Check out [this issue](https://github.com/BiocPy/pyBiocFileCache/issues/11)). Thanks to [@khoroshevskyi ](https://github.com/khoroshevskyi) for the PR.
 
````
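The changelog entry "Added resource validation with checksums" does not say which hash function or API the package uses. As a generic illustration only, checksum validation usually means recording a digest when a file enters the cache and recomputing it before the file is trusted again. The sketch below assumes SHA-256 from Python's standard `hashlib`. The helpers `file_checksum` and `is_valid` are hypothetical and are not part of pyBiocFileCache.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a b"" sentinel keeps reading until read() hits end of file
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_valid(path: Path, expected: str) -> bool:
    """Hypothetical check: does the file on disk still match the recorded digest?"""
    return path.exists() and file_checksum(path) == expected


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        cached = Path(tmp) / "file.txt"
        cached.write_text("hello")
        recorded = file_checksum(cached)   # digest stored when the resource is added
        print(is_valid(cached, recorded))  # True: file unchanged
        cached.write_text("tampered")
        print(is_valid(cached, recorded))  # False: contents no longer match
```

Reading in fixed-size chunks keeps memory use flat regardless of how large the cached file is.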
107 changes: 62 additions & 45 deletions README.md
````diff
@@ -4,74 +4,91 @@
 
 # pyBiocFileCache
 
-File system based cache for resources & metadata. Compatible with [BiocFileCache R package](https://github.com/Bioconductor/BiocFileCache)
+`pyBiocFileCache` is a Python package that provides a robust file caching system with resource validation, cache size management, file compression, and resource tagging. Compatible with [BiocFileCache R package](https://github.com/Bioconductor/BiocFileCache).
 
-***Note: Package is in development. Use with caution!!***
+## Installation
 
-### Installation
+Install from [PyPI](https://pypi.org/project/pyBiocFileCache/),
 
-Package is published to [PyPI](https://pypi.org/project/pyBiocFileCache/)
-
-```
+```bash
 pip install pybiocfilecache
 ```
 
-#### Initialize a cache directory
+## Quick Start
 
-```
-from pybiocfilecache import BiocFileCache
-import os
-
-bfc = BiocFileCache(cache_dir = os.getcwd() + "/cache")
-```
+```python
+from biocfilecache import BiocFileCache
 
-Once the cache directory is created, the library provides methods to
-- `add`: Add a resource or artifact to cache
-- `get`: Get the resource from cache
-- `remove`: Remove a resource from cache
-- `update`: update the resource in cache
-- `purge`: purge the entire cache, removes all files in the cache directory
+# Initialize cache
+cache = BiocFileCache("path/to/cache/directory")
 
-### Add a resource to cache
+# Add a file to cache
+resource = cache.add("myfile", "path/to/file.txt")
 
-(for testing use the temp files in the `tests/data` directory)
+# Retrieve a file from cache
+resource = cache.get("myfile")
 
-```
-rec = bfc.add("test1", os.getcwd() + "/test1.txt")
-print(rec)
+# Use the cached file
+print(resource.rpath) # Path to cached file
 ```
 
-### Get resource from cache
+## Advanced Usage
 
-```
-rec = bfc.get("test1")
-print(rec)
-```
+### Configuration
 
-### Remove resource from cache
+```python
+from biocfilecache import BiocFileCache, CacheConfig
+from datetime import timedelta
+from pathlib import Path
 
-```
-rec = bfc.remove("test1")
-print(rec)
+# Create custom configuration
+config = CacheConfig(
+    cache_dir=Path("cache_directory"),
+    max_size_bytes=1024 * 1024 * 1024, # 1GB
+    cleanup_interval=timedelta(days=7),
+    compression=True
+)
+
+# Initialize cache with configuration
+cache = BiocFileCache(config=config)
 ```
 
-### Update resource in cache
+### Resource Management
 
-```
-rec = bfc.get("test1"m os.getcwd() + "test2.txt")
-print(rec)
-```
+```python
+# Add file with tags and expiration
+from datetime import datetime, timedelta
 
-### purge the cache
+resource = cache.add(
+    "myfile",
+    "path/to/file.txt",
+    tags=["data", "raw"],
+    expires=datetime.now() + timedelta(days=30)
+)
 
-```
-bfc.purge()
+# List resources by tag
+resources = cache.list_resources(tag="data")
+
+# Search resources
+results = cache.search("myfile", field="rname")
+
+# Update resource
+cache.update("myfile", "path/to/new_file.txt")
+
+# Remove resource
+cache.remove("myfile")
 ```
 
+### Cache Statistics and Maintenance
 
-<!-- pyscaffold-notes -->
+```python
+# Get cache statistics
+stats = cache.get_stats()
+print(stats)
 
-## Note
+# Clean up expired resources
+removed_count = cache.cleanup()
 
-This project has been set up using PyScaffold 4.1. For details and usage
-information on PyScaffold see https://pyscaffold.org/.
+# Purge entire cache
+cache.purge()
+```
````
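The README calls `cache.cleanup()` to remove expired resources (it returns a count) and `cache.list_resources(tag="data")` to filter by tag, but it does not show the underlying logic. The in-memory sketch below models that bookkeeping under stated assumptions. Each record has an optional `expires` timestamp and a list of tags, and a record counts as expired once `expires` is no later than the current time. `Record`, `cleanup_expired`, and `filter_by_tag` are hypothetical names. The package itself keeps this metadata in its SQLAlchemy-backed database, and a real cleanup would also delete the cached file from disk.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass
class Record:
    """Hypothetical stand-in for one cached resource's metadata."""
    rname: str
    rpath: str
    tags: List[str] = field(default_factory=list)
    expires: Optional[datetime] = None  # None means the record never expires


def cleanup_expired(records: List[Record], now: datetime) -> Tuple[List[Record], int]:
    """Drop records whose expiry time has passed; return the survivors and the removed count."""
    kept = [r for r in records if r.expires is None or r.expires > now]
    return kept, len(records) - len(kept)


def filter_by_tag(records: List[Record], tag: str) -> List[Record]:
    """Return the records carrying the given tag (analogous to a tag filter on listing)."""
    return [r for r in records if tag in r.tags]


if __name__ == "__main__":
    now = datetime(2024, 1, 1)
    records = [
        Record("raw", "cache/raw.txt", tags=["data", "raw"], expires=now + timedelta(days=30)),
        Record("tmp", "cache/tmp.txt", tags=["scratch"], expires=now - timedelta(hours=1)),
        Record("ref", "cache/ref.fa", tags=["data"]),
    ]
    records, removed = cleanup_expired(records, now)
    print(removed)                                            # 1
    print([r.rname for r in filter_by_tag(records, "data")])  # ['raw', 'ref']
```

Passing `now` in explicitly, rather than calling `datetime.now()` inside the function, keeps the expiry rule deterministic and easy to test.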
28 changes: 28 additions & 0 deletions docs/best_practices.md
````diff
@@ -0,0 +1,28 @@
+# Best Practices
+
+1. Use context managers for cleanup:
+```python
+with BiocFileCache("cache_directory") as cache:
+    cache.add("myfile", "path/to/file.txt")
+```
+
+2. Add tags for better organization:
+```python
+cache.add("data.csv", "data.csv", tags=["raw", "csv", "2024"])
+```
+
+3. Set expiration dates for temporary files:
+```python
+cache.add("temp.txt", "temp.txt", expires=datetime.now() + timedelta(hours=1))
+```
+
+4. Regular maintenance:
+```python
+# Periodically clean up expired resources
+cache.cleanup()
+
+# Monitor cache size
+stats = cache.get_stats()
+if stats["cache_size_bytes"] > threshold:
+    # Take action
+```
````
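Best practice 4 leaves `threshold` and the follow-up action undefined. One possible action is to evict the least recently modified files until the cache directory fits within a byte budget. The sketch below works directly on the file system with `pathlib` and does not use the pyBiocFileCache API. `cache_size_bytes` and `trim_to_threshold` are hypothetical helpers. Deleting files behind the cache's back would leave its metadata database out of sync, so against a real cache you would remove entries through `cache.remove(...)` instead.

```python
import os
from pathlib import Path
from typing import List


def cache_size_bytes(cache_dir: Path) -> int:
    """Total size in bytes of all regular files under cache_dir (recursively)."""
    return sum(p.stat().st_size for p in cache_dir.rglob("*") if p.is_file())


def trim_to_threshold(cache_dir: Path, threshold: int) -> List[Path]:
    """Delete the oldest-modified files until the total size is <= threshold.

    Returns the deleted paths, oldest first.
    """
    files = sorted(
        (p for p in cache_dir.rglob("*") if p.is_file()),
        key=lambda p: p.stat().st_mtime,
    )
    total = sum(p.stat().st_size for p in files)
    removed: List[Path] = []
    for path in files:
        if total <= threshold:
            break
        total -= path.stat().st_size
        path.unlink()
        removed.append(path)
    return removed


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for i, name in enumerate(["old.bin", "mid.bin", "new.bin"]):
            f = root / name
            f.write_bytes(b"x" * 100)
            os.utime(f, (1_000_000 + i, 1_000_000 + i))  # fixed mtimes give a deterministic order
        print(cache_size_bytes(root))                                    # 300
        print([p.name for p in trim_to_threshold(root, threshold=150)])  # ['old.bin', 'mid.bin']
        print(cache_size_bytes(root))                                    # 100
```

Sorting by modification time is a simple approximation of least-recently-used eviction. Tracking access times in the cache metadata would match actual usage more closely.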
2 changes: 1 addition & 1 deletion setup.cfg
````diff
@@ -47,7 +47,7 @@ package_dir =
 # For more information, check out https://semver.org/.
 install_requires =
     importlib-metadata; python_version<"3.8"
-    sqlalchemy>=2,<2.1
+    sqlalchemy
 
 [options.packages.find]
 where = src
````
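The `setup.cfg` change replaces the pin `sqlalchemy>=2,<2.1` with a bare `sqlalchemy`. Any released version now satisfies the requirement, whereas the old specifier admitted only 2.0.x releases. The snippet below is a simplified illustration of the removed range check. It handles only plain numeric release strings such as `2.0.36` and ignores the pre-release, post-release, and local-version rules of PEP 440 that installers actually apply. `parse_release`, `compare`, and `in_old_range` are hypothetical helpers.

```python
from itertools import zip_longest
from typing import Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Parse a plain numeric release such as '2.0.36' into (2, 0, 36)."""
    return tuple(int(part) for part in version.split("."))


def compare(a: Tuple[int, ...], b: Tuple[int, ...]) -> int:
    """Compare release tuples, padding the shorter with zeros so that '2' == '2.0'."""
    for x, y in zip_longest(a, b, fillvalue=0):
        if x != y:
            return -1 if x < y else 1
    return 0


def in_old_range(version: str) -> bool:
    """True if the version satisfies the removed pin '>=2,<2.1' (numeric releases only)."""
    v = parse_release(version)
    return compare(v, (2,)) >= 0 and compare(v, (2, 1)) < 0


if __name__ == "__main__":
    for candidate in ["1.4.52", "2.0", "2.0.36", "2.1.0"]:
        print(candidate, in_old_range(candidate))  # only 2.0 and 2.0.36 print True
```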