From b1fec3d0d2c898e0a0223ea635d30fefbcc3c429 Mon Sep 17 00:00:00 2001 From: Muhammad Muqarrab Date: Fri, 8 Aug 2025 15:26:59 +0500 Subject: [PATCH 1/2] Update _index.md --- content/home/english/_index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/home/english/_index.md b/content/home/english/_index.md index 7e0dd6c..6534b89 100644 --- a/content/home/english/_index.md +++ b/content/home/english/_index.md @@ -37,7 +37,7 @@ Learn document automation and report generation with our practical guides coveri ## Data Extraction & Security Tutorials -### [GroupDocs.Parser Cloud Tutorials](#) +### [GroupDocs.Parser Cloud Tutorials](./parser/) Discover techniques for extracting text, images, and metadata from various document formats with our comprehensive guides for data extraction and document parsing. ### [GroupDocs.Signature Cloud Tutorials](#) From 351475eaaa4e15928e04094b5a0c16a009962e1a Mon Sep 17 00:00:00 2001 From: Muhammad Muqarrab Date: Fri, 8 Aug 2025 15:31:13 +0500 Subject: [PATCH 2/2] Updated Parser tutorials --- content/parser/english/_index.md | 229 +++++++ .../parser/english/data_operations/_index.md | 42 ++ .../get_document_information_tutorial.md | 269 ++++++++ .../get_supported_file_types_tutorial.md | 224 +++++++ .../parser/english/parse-operations/_index.md | 55 ++ .../extract-formatted-text-tutorial.md | 445 +++++++++++++ ...ages-document-inside-container-tutorial.md | 620 ++++++++++++++++++ ...tract-images-page-number-range-tutorial.md | 488 ++++++++++++++ .../extract-images-whole-document-tutorial.md | 387 +++++++++++ ...extract-text-page-number-range-tutorial.md | 339 ++++++++++ .../extract-text-whole-document-tutorial.md | 264 ++++++++ .../english/storage-operations/_index.md | 53 ++ .../working-with-files-tutorial.md | 470 +++++++++++++ .../working-with-folder-tutorial.md | 464 +++++++++++++ .../working-with-storage-tutorial.md | 290 ++++++++ .../english/template-operations/_index.md | 65 ++ 
.../create-or-update-template-tutorial.md | 311 +++++++++ .../delete-template-tutorial.md | 204 ++++++ .../get-template/get-template-tutorial.md | 252 +++++++ 19 files changed, 5471 insertions(+) create mode 100644 content/parser/english/_index.md create mode 100644 content/parser/english/data_operations/_index.md create mode 100644 content/parser/english/data_operations/get-document-information/get_document_information_tutorial.md create mode 100644 content/parser/english/data_operations/get-supported-file-types/get_supported_file_types_tutorial.md create mode 100644 content/parser/english/parse-operations/_index.md create mode 100644 content/parser/english/parse-operations/extract-formatted-text/extract-formatted-text-tutorial.md create mode 100644 content/parser/english/parse-operations/extract-images-document-inside-container/extract-images-document-inside-container-tutorial.md create mode 100644 content/parser/english/parse-operations/extract-images-page-number-range/extract-images-page-number-range-tutorial.md create mode 100644 content/parser/english/parse-operations/extract-images-whole-document/extract-images-whole-document-tutorial.md create mode 100644 content/parser/english/parse-operations/extract-text-page-number-range/extract-text-page-number-range-tutorial.md create mode 100644 content/parser/english/parse-operations/extract-text-whole-document/extract-text-whole-document-tutorial.md create mode 100644 content/parser/english/storage-operations/_index.md create mode 100644 content/parser/english/storage-operations/working-with-files/working-with-files-tutorial.md create mode 100644 content/parser/english/storage-operations/working-with-folder/working-with-folder-tutorial.md create mode 100644 content/parser/english/storage-operations/working-with-storage/working-with-storage-tutorial.md create mode 100644 content/parser/english/template-operations/_index.md create mode 100644 
content/parser/english/template-operations/create-or-update-template/create-or-update-template-tutorial.md create mode 100644 content/parser/english/template-operations/delete-template/delete-template-tutorial.md create mode 100644 content/parser/english/template-operations/get-template/get-template-tutorial.md diff --git a/content/parser/english/_index.md b/content/parser/english/_index.md new file mode 100644 index 0000000..d8ad868 --- /dev/null +++ b/content/parser/english/_index.md @@ -0,0 +1,229 @@ +--- +id: "developer-guide" +url: /parser/ +title: "Document Parsing API - Extract Text, Images & Metadata" +linktitle: "Document Parsing API" +productName: "GroupDocs.Parser Cloud" +weight: 2 +description: "Complete resource for document parsing API integration. Extract text, images, and metadata from 50+ formats using GroupDocs.Parser Cloud with SDKs and examples." +keywords: "document parsing API, cloud document extraction, text extraction API, document metadata extraction, PDF text extraction, document parser SDK" +date: "2025-01-02" +lastmod: "2025-01-02" +categories: ["Developer Tools"] +tags: ["document-parsing", "cloud-api", "text-extraction", "api-integration"] +--- + +# Complete Document Parsing API + +Looking to extract data from documents in your application? You're in the right place. This comprehensive guide walks you through everything you need to know about implementing GroupDocs.Parser Cloud - a powerful document parsing API that handles 50+ file formats. + +Whether you're building a content management system, automating document workflows, or creating data extraction pipelines, this guide will get you up and running quickly with practical examples and proven best practices. + +## Why Choose GroupDocs.Parser Cloud for Document Extraction? + +**Simplified Integration**: No need to install complex libraries or worry about format compatibility. One API handles everything from Word documents to PDFs, emails to eBooks. 
+ +**Cloud-Native Architecture**: Scale automatically based on your parsing volume. No server maintenance, no storage concerns - just reliable document processing. + +**Developer-Friendly**: SDKs available in 6+ programming languages with comprehensive documentation and code examples. + +## Getting Started with Document Parsing API + +Ready to start extracting data from your documents? Here's your roadmap: + +### Essential Tutorials for Implementation + +1. [Cloud API Document Data Operations Tutorials](./data-operations/) - Master the fundamentals of extracting text, metadata, and structured data from documents. Perfect starting point for new developers. + +2. [Cloud API Document Parse Operations Tutorials](./parse-operations/) - Dive deeper into advanced parsing techniques including table extraction, barcode recognition, and custom data parsing workflows. + +3. [Cloud API Document Storage Operations Tutorials](./storage-operations/) - Learn efficient document storage management, batch processing, and optimization strategies for large-scale operations. + +4. [Cloud API Document Template Operations Tutorials](./template-operations/) - Unlock the power of template-based parsing for consistent data extraction from similar document structures. 
+ +### Core Setup Requirements + +Before diving into the tutorials, you'll need to handle these essentials: + +- **Authentication**: Secure your API requests with proper authentication tokens +- **SDK Installation**: Choose your preferred programming language and install the corresponding SDK +- **API Endpoints**: Familiarize yourself with the RESTful endpoints and their specific use cases + +## Document Parsing API Features That Save Development Time + +### Text Extraction Made Simple + +Extract text in multiple formats depending on your needs: +- **Raw text**: Perfect for search indexing and content analysis +- **Formatted text**: Preserves styling for display purposes +- **Structured text**: Maintains document hierarchy for complex processing + +**Common Use Case**: Content management systems use raw text extraction for search functionality while preserving formatted text for user display. + +### Metadata Extraction for Document Intelligence + +Beyond just text, you can extract valuable document properties: +- Creation dates and modification timestamps +- Author information and document statistics +- Custom properties specific to different file formats +- Security settings and permissions + +**Pro Tip**: Metadata extraction is incredibly useful for document classification and automated filing systems. + +### Image and Media Extraction + +Pull out embedded images, charts, and graphics from documents: +- High-quality image preservation +- Batch extraction from multi-page documents +- Format conversion capabilities +- Coordinate and positioning data + +### Advanced Data Parsing Capabilities + +**Table Extraction**: Convert document tables into structured data formats like JSON or CSV. Essential for processing invoices, reports, and financial documents. + +**Barcode Recognition**: Automatically identify and decode various barcode types. Perfect for inventory management and document tracking systems. 
+ +**Text Search**: Perform precise text searches within documents before extraction. Saves processing time and reduces bandwidth usage. + +## Supported Document Formats (50+ Types) + +The document parsing API handles virtually any file format you'll encounter: + +### Office Documents +- **Microsoft Office**: DOCX, XLSX, PPTX, DOC, XLS, PPT +- **OpenOffice**: ODT, ODS, ODP +- **Legacy formats**: Works with older Office versions seamlessly + +### Digital Documents +- **PDF**: All versions including password-protected files +- **Email formats**: EML, MSG, EMLX with attachment support +- **eBooks**: EPUB, FB2, CHM with metadata preservation + +### Web and Markup +- **HTML, XML, RTF**: Perfect for web scraping and content migration projects +- **Archive formats**: ZIP, RAR with recursive extraction capabilities + +**Implementation Note**: The API automatically detects file formats, so you don't need to specify the document type in most cases. + +## Language-Specific Implementation Examples + +### Popular SDK Options + +Choose the SDK that matches your development stack: + +- **C#**: Full .NET Framework and .NET Core support +- **Java**: Compatible with Java 8+ and all major frameworks +- **PHP**: PSR-4 compliant with Composer integration +- **Python**: Works with Python 3.6+ and popular frameworks like Django, Flask +- **Ruby**: Rails-friendly implementation with gem packaging +- **Node.js**: Promise-based API with async/await support + +**Best Practice**: Start with the SDK for your primary language, then expand to others as needed for microservices architectures. 
+ +## Common Use Cases and Applications + +### Enterprise Document Processing +- **Invoice Processing**: Extract vendor information, amounts, and line items +- **Contract Analysis**: Pull key terms, dates, and parties from legal documents +- **Report Generation**: Aggregate data from multiple document sources + +### Content Management Systems +- **Document Search**: Index text content for full-text search capabilities +- **Automated Tagging**: Use metadata extraction for automatic categorization +- **Version Control**: Track document changes through metadata comparison + +### Data Migration Projects +- **Legacy System Modernization**: Extract data from old document formats +- **Database Population**: Convert document content into structured database records +- **Archive Digitization**: Process large volumes of scanned documents + +## Implementation Best Practices + +### Performance Optimization Strategies + +**Batch Processing**: Group similar documents together to reduce API calls and improve throughput. The API handles concurrent requests efficiently. + +**Selective Extraction**: Only extract the data you need. If you just need text, don't request images and metadata - it'll speed up processing significantly. + +**Caching Results**: Implement local caching for frequently accessed documents to reduce API usage and improve response times. + +### Error Handling and Reliability + +**Graceful Degradation**: Always implement fallback logic for unsupported formats or corrupted files. + +**Retry Logic**: Network issues happen - implement exponential backoff retry mechanisms for failed requests. + +**Validation**: Verify extracted data quality, especially for critical business processes. + +### Security Considerations + +**Token Management**: Rotate API keys regularly and store them securely (never in source code). + +**Data Privacy**: Understand data retention policies and ensure compliance with regulations like GDPR. 
+ +**Transmission Security**: All API communications use HTTPS encryption, but verify this in your implementation. + +## Troubleshooting Common Issues + +### Authentication Problems +**Issue**: "Unauthorized" or "Invalid credentials" errors +**Solution**: Double-check your API key and ensure it's properly included in request headers. Verify the key hasn't expired. + +### Large File Processing +**Issue**: Timeouts with large documents (>50MB) +**Solution**: Consider breaking large documents into smaller chunks or using asynchronous processing endpoints. + +### Format-Specific Errors +**Issue**: Extraction fails for specific document types +**Solution**: Verify the document isn't corrupted by testing with a known-good file of the same format. + +### Rate Limiting +**Issue**: "Too Many Requests" responses +**Solution**: Implement proper rate limiting in your application and consider upgrading your plan for higher throughput. + +## Performance Optimization Tips + +**Document Size Considerations**: Files under 10MB process fastest. For larger files, expect proportionally longer processing times. + +**Concurrent Requests**: Most plans support multiple simultaneous requests. Check your plan limits and optimize accordingly. + +**Regional Endpoints**: Use the API endpoint closest to your users' location for best performance. + +**Format Optimization**: PDF and DOCX files generally process faster than image-heavy presentations or complex spreadsheets. + +## Advanced Implementation Topics + +### Custom Parsing Templates +Create reusable templates for documents with consistent structures. This dramatically improves accuracy and processing speed for repetitive document types. + +### Webhook Integration +Set up real-time notifications for document processing completion, especially useful for large batch operations. + +### Multi-Language Support +The API handles documents in multiple languages automatically, with special optimizations for RTL languages and complex scripts. 
+ +## Frequently Asked Questions + +**How accurate is the text extraction from scanned PDFs?** +OCR accuracy depends on document quality, but typically ranges from 95-99% for clear, well-scanned documents. + +**Can I extract data from password-protected documents?** +Yes, you can provide passwords through the API for encrypted PDFs and Office documents. + +**What's the maximum file size supported?** +Individual files up to 500MB are supported, though processing time increases with file size. + +**How do I handle documents with multiple languages?** +The API automatically detects and processes multi-language documents without additional configuration. + +**Is there a way to preview extraction results before processing?** +Yes, you can use the document information endpoint to get metadata and basic structure before full extraction. + +## Next Steps and Resources + +### Essential Resources for Success + +- [API Reference Documentation](https://apireference.groupdocs.cloud/parser/) Complete technical specifications for all endpoints +- [Interactive API Explorer](https://apireference.groupdocs.cloud/parser/) Test API calls directly in your browser +- [Community Forum](https://forum.groupdocs.com/) Get help from other developers and GroupDocs experts diff --git a/content/parser/english/data_operations/_index.md b/content/parser/english/data_operations/_index.md new file mode 100644 index 0000000..f2788a9 --- /dev/null +++ b/content/parser/english/data_operations/_index.md @@ -0,0 +1,42 @@ +--- +title: GroupDocs.Parser Cloud API Document Data Operations Tutorials +url: /data-operations/ +weight: 1 +description: Step-by-step tutorials for extracting and processing document data with GroupDocs.Parser Cloud API +--- + +# GroupDocs.Parser Cloud API Document Data Operations Tutorials + +Welcome to our hands-on tutorial series for developers learning to work with document data operations using GroupDocs.Parser Cloud API. 
These tutorials are designed to take you from basic document information retrieval to advanced container operations through practical, step-by-step instructions. + +## Learning Path: From Basics to Advanced Document Parsing + +This tutorial series presents a structured learning path to help you master GroupDocs.Parser Cloud API document operations. Each tutorial builds upon knowledge gained in previous lessons, gradually increasing in complexity while providing practical implementations you can apply to your own projects. + +### Getting Started with Document Information Operations + +Begin your journey with these foundational tutorials: + +1. [Learn to Get Supported File Types](/data-operations/get-supported-file-types/) - Master how to retrieve the complete list of file formats supported by GroupDocs.Parser Cloud. + +2. [Tutorial: How to Get Document Information](/data-operations/get-document-information/) - Learn to extract essential document metadata including file format, size, and page count. + +Each tutorial includes complete code examples in multiple programming languages, detailed explanations, and practical scenarios to enhance your learning experience. 
+
+## Prerequisites
+
+Before starting these tutorials, you should have:
+
+- A GroupDocs.Cloud account (if you don't have one, [sign up for a free trial](https://dashboard.groupdocs.cloud/#/apps))
+- Basic knowledge of REST APIs and your preferred programming language
+- Your GroupDocs application Client ID and Client Secret from the [dashboard](https://dashboard.groupdocs.cloud/#/apps)
+
+## Helpful Resources
+
+- [Product Page](https://products.groupdocs.cloud/parser/)
+- [Documentation](https://docs.groupdocs.cloud/parser/)
+- [Live Demo](https://products.groupdocs.app/parser/family)
+- [API Reference](https://reference.groupdocs.cloud/parser/)
+- [Blog](https://blog.groupdocs.cloud/categories/groupdocs.parser-cloud-product-family/)
+- [Free Support](https://forum.groupdocs.cloud/c/parser/19/)
+- [Free Trial](https://dashboard.groupdocs.cloud/#/apps)
diff --git a/content/parser/english/data_operations/get-document-information/get_document_information_tutorial.md b/content/parser/english/data_operations/get-document-information/get_document_information_tutorial.md
new file mode 100644
index 0000000..2127ee7
--- /dev/null
+++ b/content/parser/english/data_operations/get-document-information/get_document_information_tutorial.md
@@ -0,0 +1,269 @@
+---
+title: How to Get Document Information with GroupDocs.Parser Cloud API Tutorial
+url: /data-operations/get-document-information/
+weight: 2
+description: Learn to extract document metadata including file format, size, and page count using GroupDocs.Parser Cloud API in this developer tutorial
+---
+
+# Tutorial: How to Get Document Information with GroupDocs.Parser Cloud API
+
+## Learning Objectives
+
+In this tutorial, you'll learn how to:
+- Retrieve basic document metadata using GroupDocs.Parser Cloud API
+- Extract information such as file format, size, and page count
+- Handle standard and password-protected documents
+- Implement this functionality in multiple programming languages
+
+## Prerequisites
+
+Before starting this tutorial, you should have:
+
+1. A GroupDocs.Cloud account (if you don't have one, [sign up for a free trial](https://dashboard.groupdocs.cloud/#/apps))
+2. Your Client ID and Client Secret credentials from the [dashboard](https://dashboard.groupdocs.cloud/#/apps)
+3. Basic understanding of REST API concepts
+4. A document uploaded to your GroupDocs storage (or you can use the sample documents provided in the API)
+5. Familiarity with your chosen programming language (C#, Java, or cURL)
+
+## Real-World Scenario
+
+Imagine you're developing a document management system that needs to display metadata for each uploaded file. Before processing any document, you want to retrieve basic information like file type, size, and page count to help users identify their documents and to plan processing resources. This tutorial shows you how to implement this functionality.
+
+## Step 1: Obtain an Access Token
+
+Before making API calls, you need to authenticate with the GroupDocs.Parser Cloud API using your Client ID and Client Secret.
+
+### Try it yourself:
+
+Use the following cURL command to obtain a JWT token:
+
+```bash
+curl -v "https://api.groupdocs.cloud/connect/token" \
+-X POST \
+-d "grant_type=client_credentials&client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET" \
+-H "Content-Type: application/x-www-form-urlencoded" \
+-H "Accept: application/json"
+```
+
+Remember to replace `YOUR_CLIENT_ID` and `YOUR_CLIENT_SECRET` with your actual credentials.
+
+The response will include an access token that you'll use in subsequent API calls:
+
+```json
+{
+  "access_token": "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9...",
+  "expires_in": 3600,
+  "token_type": "bearer"
+}
+```
+
+## Step 2: Prepare the Request
+
+To get document information, you'll need to call the `/info` endpoint with the path to your document in storage.
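
To make the shape of this request concrete, here is a minimal Java sketch that assembles the JSON body and the POST request without sending it. The endpoint URL, the headers, and the `FileInfo`, `FilePath`, and `Password` field names follow the cURL examples in this tutorial. The class and method names (`InfoRequestSketch`, `buildBody`, `buildRequest`) are hypothetical, and the hand-rolled escaping only covers quotes and backslashes, so a real application would more likely use a JSON library or the SDK. It assumes Java 11 or later for `java.net.http`.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper (not part of the GroupDocs SDK): builds, but does not send,
// the POST request for the /info endpoint.
final class InfoRequestSketch {

    // Endpoint as it appears in this tutorial's cURL examples.
    static final String INFO_URL = "https://api.groupdocs.cloud/v1.0/parser/info";

    // Minimal escaping: only backslashes and double quotes are handled.
    static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds the JSON body; pass null as password for unprotected documents.
    static String buildBody(String filePath, String password) {
        StringBuilder json = new StringBuilder("{\"FileInfo\":{\"FilePath\":\"")
                .append(escape(filePath)).append('"');
        if (password != null) {
            json.append(",\"Password\":\"").append(escape(password)).append('"');
        }
        return json.append("}}").toString();
    }

    // Attaches the same headers as the cURL example, including the bearer token.
    static HttpRequest buildRequest(String accessToken, String filePath, String password) {
        return HttpRequest.newBuilder(URI.create(INFO_URL))
                .header("Content-Type", "application/json")
                .header("Accept", "application/json")
                .header("Authorization", "Bearer " + accessToken)
                .POST(HttpRequest.BodyPublishers.ofString(buildBody(filePath, password)))
                .build();
    }

    public static void main(String[] args) {
        System.out.println(buildBody("/words/four-pages.docx", null));
    }
}
```

Passing the resulting request to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` would perform the call; the sketch stops short of that so it can be run without credentials.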
+
+### Understanding the Request Parameters
+
+| Name | Description | Comment |
+|---|---|---|
+| FileInfo.FilePath | The path of the document in storage | Required |
+| FileInfo.StorageName | Storage name | Optional (omit for default storage) |
+| FileInfo.Password | The password to open file | Required only for password-protected documents |
+| ContainerItemInfo.RelativePath | The relative path of the container | Required only for container files |
+| ContainerItemInfo.Password | Password for container items | Required only for password-protected container items |
+
+## Step 3: Call the Get Document Information API
+
+Now that you have your authentication token, you can call the API endpoint to get document information.
+
+### cURL Example
+
+```bash
+curl -v "https://api.groupdocs.cloud/v1.0/parser/info" \
+-X POST \
+-H "Content-Type: application/json" \
+-H "Accept: application/json" \
+-H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
+-d "{ \"FileInfo\": { \"FilePath\": \"/words/four-pages.docx\" } }"
+```
+
+### Response
+
+The API returns a JSON response with the document information:
+
+```json
+{
+  "fileType": {
+    "fileFormat": "Microsoft Word Open XML Document",
+    "extension": ".docx"
+  },
+  "size": 8651,
+  "pageCount": 4
+}
+```
+
+## Step 4: Handle Password-Protected Documents
+
+If you need to get information from a password-protected document, you'll need to include the password in your request.
+
+### cURL Example for Password-Protected Document
+
+```bash
+curl -v "https://api.groupdocs.cloud/v1.0/parser/info" \
+-X POST \
+-H "Content-Type: application/json" \
+-H "Accept: application/json" \
+-H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
+-d "{ \"FileInfo\": { \"FilePath\": \"/words/protected-document.docx\", \"Password\": \"password123\" } }"
+```
+
+## Step 5: Implement in Your Application
+
+Let's implement this functionality using SDK examples in different programming languages.
+
+### C# Example
+
+```csharp
+using System;
+using GroupDocs.Parser.Cloud.Sdk.Api;
+using GroupDocs.Parser.Cloud.Sdk.Client;
+using GroupDocs.Parser.Cloud.Sdk.Model;
+using GroupDocs.Parser.Cloud.Sdk.Model.Requests;
+
+namespace GetDocumentInformation
+{
+    class Program
+    {
+        static void Main(string[] args)
+        {
+            // Configure API client
+            var configuration = new Configuration
+            {
+                AppSid = "YOUR_CLIENT_ID",
+                AppKey = "YOUR_CLIENT_SECRET"
+            };
+
+            // Create API instance
+            var infoApi = new InfoApi(configuration);
+
+            try
+            {
+                // Create request object
+                var fileInfo = new FileInfo
+                {
+                    FilePath = "/words/four-pages.docx"
+                    // Add Password property if the document is password-protected
+                    // Password = "password123"
+                };
+
+                var request = new GetInfoRequest(fileInfo);
+
+                // Get document information
+                var response = infoApi.GetInfo(request);
+
+                // Display document information
+                Console.WriteLine($"File Format: {response.FileType.FileFormat}");
+                Console.WriteLine($"Extension: {response.FileType.Extension}");
+                Console.WriteLine($"Size: {response.Size} bytes");
+                Console.WriteLine($"Page Count: {response.PageCount}");
+            }
+            catch (Exception e)
+            {
+                Console.WriteLine("Error: " + e.Message);
+            }
+        }
+    }
+}
+```
+
+### Java Example
+
+```java
+import com.groupdocs.parser.cloud.sdk.api.InfoApi;
+import com.groupdocs.parser.cloud.sdk.client.ApiException;
+import com.groupdocs.parser.cloud.sdk.client.Configuration;
+import com.groupdocs.parser.cloud.sdk.model.FileInfo;
+import com.groupdocs.parser.cloud.sdk.model.InfoResult;
+import com.groupdocs.parser.cloud.sdk.model.requests.GetInfoRequest;
+
+public class GetDocumentInformation {
+    public static void main(String[] args) {
+        // Configure API client
+        Configuration configuration = new Configuration();
+        configuration.setAppSid("YOUR_CLIENT_ID");
+        configuration.setAppKey("YOUR_CLIENT_SECRET");
+
+        try {
+            // Create InfoApi instance
+            InfoApi infoApi = new InfoApi(configuration);
+
+            // Create request object
+            FileInfo fileInfo = new FileInfo();
+            fileInfo.setFilePath("/words/four-pages.docx");
+            // Add password if the document is password-protected
+            // fileInfo.setPassword("password123");
+
+            GetInfoRequest request = new GetInfoRequest(fileInfo);
+
+            // Get document information
+            InfoResult response = infoApi.getInfo(request);
+
+            // Display document information
+            System.out.println("File Format: " + response.getFileType().getFileFormat());
+            System.out.println("Extension: " + response.getFileType().getExtension());
+            System.out.println("Size: " + response.getSize() + " bytes");
+            System.out.println("Page Count: " + response.getPageCount());
+        } catch (ApiException e) {
+            System.err.println("Error: " + e.getMessage());
+            e.printStackTrace();
+        }
+    }
+}
+```
+
+## Learning Checkpoint
+
+Let's verify what you've learned so far:
+
+1. What endpoint is used to retrieve document information?
+2. What parameters must be included in the request?
+3. How do you handle password-protected documents?
+4. What information does the API return about a document?
+
+## Troubleshooting Common Issues
+
+1. File Not Found: If you receive a 404 error, check that the document path is correct and that the file exists in your storage.
+
+2. Authentication Errors: If you receive a 401 Unauthorized error, ensure your access token is valid and hasn't expired.
+
+3. Password Required: If you receive an error indicating that a password is required, ensure you've included the correct password in your request for protected documents.
+
+4. Unsupported Format: If you receive an error about an unsupported format, check if the file type is supported by GroupDocs.Parser Cloud. You can use the [Get Supported File Types](/data-operations/get-supported-file-types/) API to verify.
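
For the 401 case in particular, one way to avoid sending an expired token is to track its lifetime locally. The sketch below is an illustration only and not part of the GroupDocs SDK: the class `TokenExpirySketch` and its methods are hypothetical, and it relies solely on the `expires_in` value (in seconds) shown in the Step 1 token response.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper: remembers when an access token stops being valid.
final class TokenExpirySketch {
    private final Clock clock;
    private final Instant expiresAt;

    // expiresInSeconds is the "expires_in" field of the token response.
    TokenExpirySketch(long expiresInSeconds, Clock clock) {
        this.clock = clock;
        this.expiresAt = clock.instant().plusSeconds(expiresInSeconds);
    }

    // True while the token stays valid for longer than the given safety margin.
    boolean isUsable(Duration safetyMargin) {
        return clock.instant().plus(safetyMargin).isBefore(expiresAt);
    }

    public static void main(String[] args) {
        TokenExpirySketch token = new TokenExpirySketch(3600, Clock.systemUTC());
        System.out.println("Usable now: " + token.isUsable(Duration.ofSeconds(60)));
    }
}
```

Checking `isUsable` before each call and requesting a fresh token when it returns false keeps the retry path out of the common case. Injecting the `Clock` is a design choice that makes the expiry logic testable without waiting.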
+
+## What You've Learned
+
+In this tutorial, you've learned how to:
+- Authenticate with the GroupDocs.Parser Cloud API
+- Make API calls to retrieve document information
+- Process responses to extract metadata like file format, size, and page count
+- Handle password-protected documents
+- Implement this functionality using C# and Java SDKs
+
+This knowledge is essential for building document management systems that need to process and validate documents before performing more complex operations.
+
+## Further Practice
+
+To reinforce your learning:
+1. Try retrieving information for different document types (PDF, Excel, images)
+2. Create a simple application that displays document thumbnails along with metadata
+3. Implement error handling for various scenarios (file not found, unsupported format, etc.)
+
+
+## Helpful Resources
+
+- [Product Page](https://products.groupdocs.cloud/parser/)
+- [Documentation](https://docs.groupdocs.cloud/parser/)
+- [Live Demo](https://products.groupdocs.app/parser/family)
+- [API Reference](https://reference.groupdocs.cloud/parser/)
+- [Blog](https://blog.groupdocs.cloud/categories/groupdocs.parser-cloud-product-family/)
+- [Free Support](https://forum.groupdocs.cloud/c/parser/19/)
+- [Free Trial](https://dashboard.groupdocs.cloud/#/apps)
diff --git a/content/parser/english/data_operations/get-supported-file-types/get_supported_file_types_tutorial.md b/content/parser/english/data_operations/get-supported-file-types/get_supported_file_types_tutorial.md
new file mode 100644
index 0000000..da0dc3e
--- /dev/null
+++ b/content/parser/english/data_operations/get-supported-file-types/get_supported_file_types_tutorial.md
@@ -0,0 +1,224 @@
+---
+title: How to Get Supported File Types with GroupDocs.Parser Cloud API Tutorial
+url: /data-operations/get-supported-file-types/
+weight: 1
+description: Learn to retrieve all supported file formats with GroupDocs.Parser Cloud API in this step-by-step tutorial for developers
+---
+
+# Tutorial: How to Get Supported File Types with GroupDocs.Parser Cloud API
+
+## Learning Objectives
+
+In this tutorial, you'll learn how to:
+- Authenticate with the GroupDocs.Parser Cloud API
+- Make REST API calls to retrieve supported file formats
+- Implement this functionality in various programming languages
+- Handle and process the response data
+
+## Prerequisites
+
+Before starting this tutorial, you should have:
+
+1. A GroupDocs.Cloud account (if you don't have one, [sign up for a free trial](https://dashboard.groupdocs.cloud/#/apps))
+2. Your Client ID and Client Secret credentials from the [dashboard](https://dashboard.groupdocs.cloud/#/apps)
+3. Basic understanding of REST API concepts
+4. Familiarity with your chosen programming language (C#, Java, or cURL)
+
+## Real-World Scenario
+
+Imagine you're developing a document management application that needs to validate file types before processing. Before accepting user uploads, you want to check if the file format is supported by the parsing system. This tutorial shows you how to retrieve the complete list of supported formats to implement this validation.
+
+## Step 1: Obtain an Access Token
+
+Before making API calls, you need to authenticate with the GroupDocs.Parser Cloud API using your Client ID and Client Secret.
+
+### Try it yourself:
+
+Use the following cURL command to obtain a JWT token:
+
+```bash
+curl -v "https://api.groupdocs.cloud/connect/token" \
+-X POST \
+-d "grant_type=client_credentials&client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET" \
+-H "Content-Type: application/x-www-form-urlencoded" \
+-H "Accept: application/json"
+```
+
+Remember to replace `YOUR_CLIENT_ID` and `YOUR_CLIENT_SECRET` with your actual credentials.
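
The same token request can be sketched in Java. This is a minimal illustration under stated assumptions, not the SDK's own mechanism: the URL, form fields, and headers mirror the cURL command above, while the class name `TokenRequestSketch`, its methods, and the environment variable names `GROUPDOCS_CLIENT_ID` and `GROUPDOCS_CLIENT_SECRET` are hypothetical. It needs Java 11 or later and only builds the request; it does not send it.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds the client-credentials token request.
final class TokenRequestSketch {

    // Token endpoint as used in the cURL command of this tutorial.
    static final String TOKEN_URL = "https://api.groupdocs.cloud/connect/token";

    // URL-encodes the credentials so characters such as '&' cannot break the form body.
    static String formBody(String clientId, String clientSecret) {
        return "grant_type=client_credentials"
                + "&client_id=" + URLEncoder.encode(clientId, StandardCharsets.UTF_8)
                + "&client_secret=" + URLEncoder.encode(clientSecret, StandardCharsets.UTF_8);
    }

    static HttpRequest buildRequest(String clientId, String clientSecret) {
        return HttpRequest.newBuilder(URI.create(TOKEN_URL))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .header("Accept", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(clientId, clientSecret)))
                .build();
    }

    public static void main(String[] args) {
        // Assumed variable names; reading from the environment keeps secrets out of source code.
        String id = System.getenv().getOrDefault("GROUPDOCS_CLIENT_ID", "YOUR_CLIENT_ID");
        String secret = System.getenv().getOrDefault("GROUPDOCS_CLIENT_SECRET", "YOUR_CLIENT_SECRET");
        HttpRequest request = buildRequest(id, secret);
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Sending the request with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` would return the JSON token response as a string.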
+
+The response will include an access token that you'll use in subsequent API calls:
+
+```json
+{
+  "access_token": "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9...",
+  "expires_in": 3600,
+  "token_type": "bearer"
+}
+```
+
+## Step 2: Call the Get Supported File Types API
+
+Now that you have your authentication token, you can call the API endpoint to get the list of supported file formats.
+
+### cURL Example
+
+```bash
+curl -v "https://api.groupdocs.cloud/v1.0/parser/formats" \
+-X GET \
+-H "Content-Type: application/json" \
+-H "Accept: application/json" \
+-H "Authorization: Bearer YOUR_ACCESS_TOKEN"
+```
+
+### Response
+
+The API returns a JSON response with all supported file formats:
+
+```json
+{
+  "formats": [
+    {
+      "extension": ".doc",
+      "fileFormat": "Microsoft Word Document"
+    },
+    {
+      "extension": ".docm",
+      "fileFormat": "Word Open XML Macro-Enabled Document"
+    },
+    {
+      "extension": ".docx",
+      "fileFormat": "Microsoft Word Open XML Document"
+    },
+    /* Additional formats listed here */
+    {
+      "extension": ".xlsx",
+      "fileFormat": "Microsoft Excel Open XML Spreadsheet"
+    }
+  ]
+}
+```
+
+## Step 3: Implement in Your Application
+
+Let's implement this functionality using SDK examples in different programming languages.
+ +### C# Example + +```csharp +using System; +using GroupDocs.Parser.Cloud.Sdk.Api; +using GroupDocs.Parser.Cloud.Sdk.Client; +using GroupDocs.Parser.Cloud.Sdk.Model; + +namespace GetSupportedFileTypes +{ + class Program + { + static void Main(string[] args) + { + // Configure API client + var configuration = new Configuration + { + AppSid = "YOUR_CLIENT_ID", + AppKey = "YOUR_CLIENT_SECRET" + }; + + // Create API instance + var infoApi = new InfoApi(configuration); + + try + { + // Get supported file formats + var response = infoApi.GetSupportedFileFormats(); + + // Display supported formats + Console.WriteLine("Supported File Formats:"); + foreach (var format in response.Formats) + { + Console.WriteLine($"Extension: {format.Extension}, Format: {format.FileFormat}"); + } + } + catch (Exception e) + { + Console.WriteLine("Error: " + e.Message); + } + } + } +} +``` + +### Java Example + +```java +import com.groupdocs.parser.cloud.sdk.api.InfoApi; +import com.groupdocs.parser.cloud.sdk.client.ApiException; +import com.groupdocs.parser.cloud.sdk.client.Configuration; +import com.groupdocs.parser.cloud.sdk.model.FormatsResult; +import com.groupdocs.parser.cloud.sdk.model.Format; + +public class GetSupportedFileTypes { + public static void main(String[] args) { + // Configure API client + Configuration configuration = new Configuration(); + configuration.setAppSid("YOUR_CLIENT_ID"); + configuration.setAppKey("YOUR_CLIENT_SECRET"); + + try { + // Create InfoApi instance + InfoApi infoApi = new InfoApi(configuration); + + // Get supported file formats + FormatsResult response = infoApi.getSupportedFileFormats(); + + // Display supported formats + System.out.println("Supported File Formats:"); + for (Format format : response.getFormats()) { + System.out.println("Extension: " + format.getExtension() + + ", Format: " + format.getFileFormat()); + } + } catch (ApiException e) { + System.err.println("Error: " + e.getMessage()); + e.printStackTrace(); + } + } +} +``` + +## 
Troubleshooting Common Issues + +1. Authentication Errors: If you receive a 401 Unauthorized error, ensure your Client ID and Client Secret are correct and that your token hasn't expired. + +2. Connection Issues: If you can't connect to the API, check your network connection and ensure the API endpoint URL is correct. + +3. Parsing Response: If you have trouble processing the response, verify that you're correctly deserializing the JSON response according to your programming language's conventions. + +## What You've Learned + +In this tutorial, you've learned how to: +- Authenticate with the GroupDocs.Parser Cloud API +- Make API calls to retrieve the list of supported file formats +- Process and display the formats information +- Implement this functionality using C# and Java SDKs + +This knowledge forms the foundation for working with document formats in your applications using GroupDocs.Parser Cloud. + +## Further Practice + +To reinforce your learning: +1. Try implementing the same functionality using a different programming language +2. Create a simple file validator that checks if a given extension is supported +3. Build a user interface that displays all supported formats categorized by document type + +## Next Steps + +Ready to continue your learning journey? Check out our next tutorial: [How to Get Document Information](/data-operations/get-document-information/) to learn how to extract metadata from specific documents. 
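
As a starting point for the file validator suggested under Further Practice, the following is a minimal sketch rather than a definitive implementation. It assumes you have already collected the `extension` values from the formats response into a collection; the class name `ExtensionValidator` and the sample extensions are illustrative and not part of the SDK.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: checks file names against a set of supported extensions.
public class ExtensionValidator {
    private final Set<String> supported = new HashSet<>();

    // Extensions may be passed with or without the leading dot, in any case.
    public ExtensionValidator(Iterable<String> extensions) {
        for (String ext : extensions) {
            supported.add(normalize(ext));
        }
    }

    // Lower-cases the extension and makes sure it starts with a dot.
    private static String normalize(String ext) {
        String e = ext.trim().toLowerCase(Locale.ROOT);
        return e.startsWith(".") ? e : "." + e;
    }

    // Returns true when the file name ends with a supported extension.
    public boolean isSupported(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false; // no extension at all
        }
        return supported.contains(normalize(fileName.substring(dot)));
    }

    public static void main(String[] args) {
        // Sample values only; in practice, fill this from the API response.
        ExtensionValidator validator =
                new ExtensionValidator(Arrays.asList(".doc", ".docx", ".xlsx"));
        System.out.println(validator.isSupported("report.DOCX"));
        System.out.println(validator.isSupported("archive.unknown"));
    }
}
```

Caching the format list and validating locally avoids one API round trip per upload; how often to refresh the cache is a design choice left to your application.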
+ +## Helpful Resources + +- [Product Page](https://products.groupdocs.cloud/parser/) +- [Documentation](https://docs.groupdocs.cloud/parser/) +- [Live Demo](https://products.groupdocs.app/parser/family) +- [API Reference](https://reference.groupdocs.cloud/parser/) +- [Blog](https://blog.groupdocs.cloud/categories/groupdocs.parser-cloud-product-family/) +- [Free Support](https://forum.groupdocs.cloud/c/parser/19/) +- [Free Trial](https://dashboard.groupdocs.cloud/#/apps) diff --git a/content/parser/english/parse-operations/_index.md b/content/parser/english/parse-operations/_index.md new file mode 100644 index 0000000..a40ac7b --- /dev/null +++ b/content/parser/english/parse-operations/_index.md @@ -0,0 +1,55 @@ +--- +title: GroupDocs.Parser Cloud API Document Parse Operations Tutorials +url: /parse-operations/ +weight: 1 +description: Hands-on tutorials for developers to learn and implement various document parsing operations using GroupDocs.Parser Cloud API +--- + +# Tutorial: Document Parse Operations with GroupDocs.Parser Cloud API + +Welcome to our hands-on tutorial series on document parsing operations using GroupDocs.Parser Cloud API. These tutorials are designed to help developers learn how to extract and process content from various document formats efficiently and effectively. + +## Learning Path + +This tutorial series follows a progressive learning path, starting with basic operations and moving toward more advanced techniques: + +1. Beginner Level: Learn the fundamentals of extracting text and images from documents +2. Intermediate Level: Master working with specific page ranges and formatted text +3. 
Advanced Level: Implement template-based parsing and work with documents in containers
+
+## Available Tutorials
+
+### Text Extraction Tutorials
+
+- [Tutorial: Extract Text from the Whole Document](/parse-operations/extract-text-whole-document/) - Learn how to extract all text content from documents with simple API calls
+- [Tutorial: Extract Text by a Page Number Range](/parse-operations/extract-text-page-number-range/) - Master extracting text from specific document pages by defining page ranges
+- [Tutorial: Extract Formatted Text](/parse-operations/extract-formatted-text/) - Discover how to preserve text formatting (HTML, Markdown) when extracting document content
+
+### Image Extraction Tutorials
+
+- [Tutorial: Extract Images from the Whole Document](/parse-operations/extract-images-whole-document/) - Learn to extract all images from various document formats
+- [Tutorial: Extract Images by a Page Number Range](/parse-operations/extract-images-page-number-range/) - Master extracting images from specific document pages
+- [Tutorial: Extract Images from a Document Inside a Container](/parse-operations/extract-images-document-inside-container/) - Discover advanced techniques for extracting images from nested documents
+
+## Prerequisites
+
+Before starting these tutorials, you should have:
+
+- A GroupDocs.Parser Cloud account (free trial available)
+- Basic understanding of REST APIs
+- Familiarity with one of our supported programming languages (C#, Java, etc.)
+- Your Client ID and Client Secret from the GroupDocs Cloud Dashboard
+
+## Estimated Time
+
+Each tutorial is designed to be completed in approximately 15-30 minutes, depending on your familiarity with the concepts.
+ +## Helpful Resources + +- [Product Page](https://products.groupdocs.cloud/parser/) +- [Documentation](https://docs.groupdocs.cloud/parser/) +- [Live Demo](https://products.groupdocs.app/parser/family) +- [API Reference](https://reference.groupdocs.cloud/parser/) +- [Blog](https://blog.groupdocs.cloud/categories/groupdocs.parser-cloud-product-family/) +- [Free Support](https://forum.groupdocs.cloud/c/parser/19/) +- [Free Trial](https://dashboard.groupdocs.cloud/#/apps) diff --git a/content/parser/english/parse-operations/extract-formatted-text/extract-formatted-text-tutorial.md b/content/parser/english/parse-operations/extract-formatted-text/extract-formatted-text-tutorial.md new file mode 100644 index 0000000..678fe57 --- /dev/null +++ b/content/parser/english/parse-operations/extract-formatted-text/extract-formatted-text-tutorial.md @@ -0,0 +1,445 @@ +--- +title: How to Extract Formatted Text Tutorial +url: /parse-operations/extract-formatted-text/ +weight: 3 +description: Learn how to extract formatted text from documents with preserved HTML, Markdown, or plain text formatting using GroupDocs.Parser Cloud API +--- + +# Tutorial: How to Extract Formatted Text + +## Learning Objectives + +In this tutorial, you'll learn how to: +- Extract text from documents while preserving formatting +- Use different text formatting modes (HTML, Markdown, Plain Text) +- Process formatted text in different programming languages + +## Prerequisites + +Before starting this tutorial, make sure you have: + +1. A GroupDocs.Parser Cloud account (if you don't have one, [register for a free trial](https://dashboard.groupdocs.cloud/#/apps)) +2. Your Client ID and Client Secret (available from the [dashboard](https://dashboard.groupdocs.cloud/#/apps)) +3. 
A formatted document (e.g., a DOCX file with various formatting) uploaded to your cloud storage + +## The Practical Scenario + +Imagine you're developing an application that needs to: +- Convert documents to web content while preserving formatting +- Export document text as HTML for rendering in a browser +- Preserve document structure including headings, lists, tables, and text styling + +This tutorial will show you how to implement this functionality step by step. + +## Step 1: Obtain Authorization Token + +Before making any API calls, you need to authenticate with the GroupDocs API using your Client ID and Client Secret. + +```bash +# First get JSON Web Token +curl -v "https://api.groupdocs.cloud/connect/token" \ +-X POST \ +-d "grant_type=client_credentials&client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET" \ +-H "Content-Type: application/x-www-form-urlencoded" \ +-H "Accept: application/json" +``` + +This will return a JWT token that you'll use in subsequent requests. + +## Step 2: Prepare Your API Request + +To extract formatted text, you'll make a POST request to the text endpoint with the following parameters: + +```bash +curl -v "https://api.groupdocs.cloud/v1.0/parser/text" \ +-X POST \ +-H "Content-Type: application/json" \ +-H "Accept: application/json" \ +-H "Authorization: Bearer YOUR_JWT_TOKEN" \ +-d "{ + \"FormattedTextOptions\": { + \"Mode\": \"Html\" + }, + \"FileInfo\": { + \"FilePath\": \"words/docx/formatted-document.docx\" + } +}" +``` + +The key parameter here is `FormattedTextOptions.Mode`, which can be set to one of the following values: +- `Html`: Extracts text with HTML formatting +- `Markdown`: Extracts text with Markdown formatting +- `PlainText`: Extracts text without formatting (default) + +## Step 3: Execute the Request and Process the Response + +When you execute the request, the API will return a JSON response containing the formatted text: + +```json +{ + "text": " +
+<p><b>Bold text</b></p>
+<p><i>Italic text</i></p>
+<ol>
+<li>First element</li>
+<li>Second element</li>
+<li>Third element</li>
+</ol>
+<h1>Heading 1</h1>
+<p><a>Hyperlink</a> targetwebsite.domain</p>
+<table>
+<tr><td><p>table</p></td><td><p>Cell 1</p></td><td><p>Cell 2</p></td></tr>
+<tr><td><p>Cell 3</p></td><td><p>Cell 4</p></td><td><p>Cell 5</p></td></tr>
+</table>
+<p>\f</p>
+<p><b>Second page bold text</b></p>
+<h1>Second page heading</h1>
+"
+}
+```
+
+Notice that the HTML response preserves:
+- Paragraph structure with `<p>` tags
+- Text styling with `<b>` and `<i>` tags
+- Lists with `<ol>` and `<li>` tags
+- Headings with `<h1>` tags
+- Links with `<a>` tags
+- Tables with `<table>`, `<tr>`, and `<td>` tags
+- Page breaks with the `\f` character
+
+## Try It Yourself
+
+Now it's your turn to try extracting formatted text:
+
+1. Replace `YOUR_CLIENT_ID` and `YOUR_CLIENT_SECRET` with your actual credentials
+2. Update the `FilePath` parameter to point to a formatted document in your storage
+3. Try different values for `Mode` (Html, Markdown, PlainText) and observe how the response changes
+4. Execute the curl command and analyze the formatted output
+
+## Implementation in Different Languages
+
+### C# Example
+
+```csharp
+using System;
+using System.Collections.Generic;
+using System.Net.Http;
+using System.Text;
+using System.Threading.Tasks;
+using Newtonsoft.Json;
+
+namespace GroupDocsParserCloudTutorial
+{
+    class Program
+    {
+        static async Task Main(string[] args)
+        {
+            // Get your ClientID and ClientSecret from https://dashboard.groupdocs.cloud
+            string clientId = "YOUR_CLIENT_ID";
+            string clientSecret = "YOUR_CLIENT_SECRET";
+
+            // Get JWT token
+            string token = await GetAuthToken(clientId, clientSecret);
+
+            // Extract formatted text (HTML mode)
+            await ExtractFormattedText(token, "words/docx/formatted-document.docx", "Html");
+
+            // You can also try other modes
+            // await ExtractFormattedText(token, "words/docx/formatted-document.docx", "Markdown");
+            // await ExtractFormattedText(token, "words/docx/formatted-document.docx", "PlainText");
+        }
+
+        static async Task<string> GetAuthToken(string clientId, string clientSecret)
+        {
+            using (var client = new HttpClient())
+            {
+                // Prepare request
+                var requestBody = $"grant_type=client_credentials&client_id={clientId}&client_secret={clientSecret}";
+                var content = new StringContent(requestBody, Encoding.UTF8, "application/x-www-form-urlencoded");
+
+                // Send request
+                var response = await client.PostAsync("https://api.groupdocs.cloud/connect/token", content);
+
+                // Process response: read the token fields into a string dictionary
+                var jsonString = await response.Content.ReadAsStringAsync();
+                var token = JsonConvert.DeserializeObject<Dictionary<string, string>>(jsonString);
+
+                return token["access_token"];
+            }
+        }
+
+        static async Task ExtractFormattedText(string token, string filePath, string mode)
+        {
+            using (var client = new HttpClient())
+            {
+                // Prepare request
+                client.DefaultRequestHeaders.Add("Authorization", $"Bearer {token}");
+
+                var requestBody = new
+                {
+                    FormattedTextOptions = new
+                    {
+                        Mode = mode
+                    },
+                    FileInfo = new
+                    {
+                        FilePath = filePath
+                    }
+                };
+
+                var content = new StringContent(JsonConvert.SerializeObject(requestBody), Encoding.UTF8, "application/json");
+
+                // Send request
+                var response = await client.PostAsync("https://api.groupdocs.cloud/v1.0/parser/text", content);
+
+                // Process response: map the "text" field onto FormattedTextResponse.Text
+                var jsonString = await response.Content.ReadAsStringAsync();
+                var result = JsonConvert.DeserializeObject<FormattedTextResponse>(jsonString);
+
+                Console.WriteLine($"Extracted text in {mode} format:");
+                Console.WriteLine(result.Text);
+
+                // If extracting HTML, you can save it to an HTML file
+                if (mode == "Html")
+                {
+                    System.IO.File.WriteAllText("extracted.html", result.Text);
+                    Console.WriteLine("HTML content saved to extracted.html");
+                }
+            }
+        }
+
+        class FormattedTextResponse
+        {
+            public string Text { get; set; }
+        }
+    }
+}
+```
+
+### Java Example
+
+```java
+import java.io.FileWriter;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.net.HttpURLConnection;
+import java.net.URL;
+import java.nio.charset.StandardCharsets;
+import java.util.Scanner;
+import org.json.JSONObject;
+
+public class ExtractFormattedTextTutorial {
+
+    private static final String BASE_URL = "https://api.groupdocs.cloud/v1.0/parser";
+    private static final String AUTH_URL = "https://api.groupdocs.cloud/connect/token";
+
+    public static void main(String[] args) throws IOException {
+        // Get your ClientID and ClientSecret from https://dashboard.groupdocs.cloud
+        String clientId = "YOUR_CLIENT_ID";
+        String clientSecret = "YOUR_CLIENT_SECRET";
+
+        // Get JWT token
+        String token = getAuthToken(clientId, clientSecret);
+
+        // Extract formatted text (HTML mode)
+        extractFormattedText(token, "words/docx/formatted-document.docx", "Html");
+
+        // You can also try other modes
+        // extractFormattedText(token, "words/docx/formatted-document.docx", "Markdown");
+        // extractFormattedText(token, "words/docx/formatted-document.docx", "PlainText");
+    }
+
+    private static String getAuthToken(String clientId, String clientSecret) throws IOException {
+        URL url = new URL(AUTH_URL);
+        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
+        conn.setRequestMethod("POST");
+        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
+        conn.setDoOutput(true);
+
+        String requestBody = "grant_type=client_credentials&client_id=" + clientId + "&client_secret=" + clientSecret;
+        try (OutputStream os = conn.getOutputStream()) {
+            os.write(requestBody.getBytes(StandardCharsets.UTF_8));
+        }
+
+        try (Scanner scanner = new Scanner(conn.getInputStream(), StandardCharsets.UTF_8.name())) {
+            String jsonResponse = scanner.useDelimiter("\\A").next();
+            JSONObject jsonObject = new JSONObject(jsonResponse);
+            return jsonObject.getString("access_token");
+        }
+    }
+
+    private static void extractFormattedText(String token, String filePath, String mode) throws IOException {
+        URL url = new URL(BASE_URL + "/text");
+        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
+        conn.setRequestMethod("POST");
+        conn.setRequestProperty("Content-Type", "application/json");
+        conn.setRequestProperty("Accept", "application/json");
+        conn.setRequestProperty("Authorization", "Bearer " + token);
+        conn.setDoOutput(true);
+
+        String requestBody = String.format(
+            "{\"FormattedTextOptions\":{\"Mode\":\"%s\"},\"FileInfo\":{\"FilePath\":\"%s\"}}",
+            mode, filePath
+        );
+
+        try (OutputStream os = conn.getOutputStream()) {
+            os.write(requestBody.getBytes(StandardCharsets.UTF_8));
+        }
+
+        try (Scanner scanner = new Scanner(conn.getInputStream(), StandardCharsets.UTF_8.name())) {
+            String jsonResponse = scanner.useDelimiter("\\A").next();
+            JSONObject responseObj = new JSONObject(jsonResponse);
+            String extractedText = responseObj.getString("text");
+
+            System.out.println("Extracted text in " + mode + " format:");
+            System.out.println(extractedText);
+
+            // If extracting HTML, you can save it to an HTML file
+            if (mode.equals("Html")) {
+                try (FileWriter writer = new FileWriter("extracted.html")) {
+                    writer.write(extractedText);
+                }
+                System.out.println("HTML content saved to extracted.html");
+            }
+        }
+    }
+}
+```
+
+## Learning Checkpoint
+
+Take a moment to test your understanding:
+
+1. What are the three available modes for formatted text extraction?
+2. How does the HTML mode preserve document structure compared to plain text?
+3. What types of formatting elements are preserved in the HTML output?
+
+## Using the Extracted Formatted Text
+
+Here are some practical uses for the formatted text:
+
+### HTML Mode
+The HTML-formatted output can be embedded directly in a webpage or application. You might need to add some CSS styling to make it look better:
+
+```html
+<!DOCTYPE html>
+<html>
+<head>
+    <title>Extracted Document</title>
+    <style>
+        /* Add your own CSS rules here */
+    </style>
+</head>
+<body>
+    [EXTRACTED_HTML_CONTENT]
+</body>
+</html>
+```
+
+### Markdown Mode
+The Markdown-formatted output can be used in Markdown editors, GitHub README files, or converted to other formats using Markdown processors.
+
+## Common Issues and Troubleshooting
+
+- Missing Formatting: Some complex formatting may not be preserved exactly as in the original document. The API focuses on preserving the most common formatting elements.
+- Special Characters: Special characters in HTML might need to be escaped when you display the content. Most modern frameworks handle this automatically.
+- Rendering Differences: Different browsers or Markdown renderers might display the formatted content slightly differently.
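
To make the note on special characters concrete, here is a minimal Java sketch of escaping text before placing it in a page, for example when showing `PlainText` output or a document title inside HTML. It is an illustration under stated assumptions, not part of the GroupDocs API: the class `HtmlEmbedding` and its methods `escapeHtml` and `wrapInPage` are hypothetical names, and output extracted in `Html` mode is assumed to be markup already, so it is inserted as-is.

```java
// Hypothetical helper for embedding extracted content in an HTML page.
public class HtmlEmbedding {

    // Escapes the five characters that have special meaning in HTML.
    public static String escapeHtml(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Wraps already-formatted HTML in a minimal page; only the title is escaped.
    public static String wrapInPage(String title, String bodyHtml) {
        return "<!DOCTYPE html>\n<html>\n<head>\n<title>" + escapeHtml(title)
                + "</title>\n</head>\n<body>\n" + bodyHtml + "\n</body>\n</html>\n";
    }

    public static void main(String[] args) {
        // Plain text must be escaped before it is shown inside HTML.
        System.out.println(escapeHtml("Terms & Conditions <draft>"));
        // HTML-mode output is inserted unchanged.
        System.out.println(wrapInPage("Q&A", "<p><b>Bold text</b></p>"));
    }
}
```

If the source documents come from untrusted users, consider running HTML-mode output through a sanitizer before rendering it, since escaping alone does not apply to content that is meant to remain markup.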
+
+## What You've Learned
+
+In this tutorial, you've learned:
+- How to extract text with preserved formatting from documents
+- How to specify different formatting modes (HTML, Markdown, Plain Text)
+- How to process and use the formatted text in your applications
+
+## Next Steps
+
+Now that you know how to extract formatted text, you can:
+- Combine this with [Page Range Extraction](/parse-operations/extract-text-page-number-range/) to extract formatted text from specific pages
+- Learn about [Extracting Images](/parse-operations/extract-images-whole-document/) to complement your formatted text with images
+
+## Further Practice
+
+Try creating an application that:
+1. Extracts formatted text from a document
+2. Processes the HTML to enhance it (e.g., adding custom CSS classes)
+3. Renders the formatted content in a web application
+4. Allows the user to toggle between HTML and Markdown views
+
+## Helpful Resources
+
+- [Product Page](https://products.groupdocs.cloud/parser/)
+- [Documentation](https://docs.groupdocs.cloud/parser/)
+- [Live Demo](https://products.groupdocs.app/parser/family)
+- [API Reference](https://reference.groupdocs.cloud/parser/)
+- [Blog](https://blog.groupdocs.cloud/categories/groupdocs.parser-cloud-product-family/)
+- [Free Support](https://forum.groupdocs.cloud/c/parser/19/)
+- [Free Trial](https://dashboard.groupdocs.cloud/#/apps)
diff --git a/content/parser/english/parse-operations/extract-images-document-inside-container/extract-images-document-inside-container-tutorial.md b/content/parser/english/parse-operations/extract-images-document-inside-container/extract-images-document-inside-container-tutorial.md
new file mode 100644
index 0000000..157c238
--- /dev/null
+++ b/content/parser/english/parse-operations/extract-images-document-inside-container/extract-images-document-inside-container-tutorial.md
@@ -0,0 +1,620 @@
+---
+title: How to Extract Images from a Document Inside a Container Tutorial
+url: 
/parse-operations/extract-images-document-inside-container/ +weight: 3 +description: Learn how to extract images from documents stored within containers like ZIP archives, emails, or PDF portfolios using GroupDocs.Parser Cloud API +--- + +# Tutorial: How to Extract Images from a Document Inside a Container + +## Learning Objectives + +In this tutorial, you'll learn how to: +- Extract images from documents stored within container formats +- Access and process images from nested documents +- Handle password-protected containers and documents +- Download and use images extracted from container documents + +## Prerequisites + +Before starting this tutorial, make sure you have: + +1. A GroupDocs.Parser Cloud account (if you don't have one, [register for a free trial](https://dashboard.groupdocs.cloud/#/apps)) +2. Your Client ID and Client Secret (available from the [dashboard](https://dashboard.groupdocs.cloud/#/apps)) +3. A container file (ZIP archive, email, or PDF portfolio) with embedded documents containing images uploaded to your cloud storage + +## The Practical Scenario + +Imagine you're developing an application that needs to: +- Process images from email attachments without saving them to disk +- Extract visuals from documents in compressed archives +- Access images in PDF portfolios that contain multiple embedded documents + +This tutorial will show you how to implement this functionality step by step. + +## Step 1: Obtain Authorization Token + +Before making any API calls, you need to authenticate with the GroupDocs API using your Client ID and Client Secret. + +```bash +# First get JSON Web Token +curl -v "https://api.groupdocs.cloud/connect/token" \ +-X POST \ +-d "grant_type=client_credentials&client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET" \ +-H "Content-Type: application/x-www-form-urlencoded" \ +-H "Accept: application/json" +``` + +This will return a JWT token that you'll use in subsequent requests. 
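
The token request sends the credentials as a form-encoded body, so a secret containing characters such as `&`, `+`, or `=` would be misread unless it is percent-encoded. The sketch below shows one way to build that body in Java. It assumes the same three form fields as the cURL call above; the class `TokenRequest` and the method `buildTokenRequestBody` are illustrative names, not part of any SDK.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper that builds the form-encoded body for the token endpoint.
public class TokenRequest {

    // Percent-encodes both credentials so reserved characters survive transport.
    public static String buildTokenRequestBody(String clientId, String clientSecret) {
        try {
            return "grant_type=client_credentials"
                    + "&client_id=" + URLEncoder.encode(clientId, "UTF-8")
                    + "&client_secret=" + URLEncoder.encode(clientSecret, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform.
            throw new IllegalStateException("UTF-8 encoding not available", e);
        }
    }

    public static void main(String[] args) {
        // Placeholder values; replace with your own credentials.
        System.out.println(buildTokenRequestBody("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET"));
    }
}
```

The resulting string can be written to the connection's output stream with the `Content-Type: application/x-www-form-urlencoded` header, as in the cURL command.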
+ +## Step 2: Prepare Your API Request + +To extract images from a document inside a container, you'll make a POST request to the images endpoint with the following parameters: + +```bash +curl -v "https://api.groupdocs.cloud/v1.0/parser/images" \ +-X POST \ +-H "Content-Type: application/json" \ +-H "Accept: application/json" \ +-H "Authorization: Bearer YOUR_JWT_TOKEN" \ +-d "{ + \"StartPageNumber\": 1, + \"CountPagesToExtract\": 2, + \"FileInfo\": { + \"FilePath\": \"pdf/PDF with attachements.pdf\", + \"StorageName\": \"\", + \"Password\": \"password\" + }, + \"ContainerItemInfo\": { + \"RelativePath\": \"template-document.pdf\", + \"Password\": \"password\" + } +}" +``` + +The key parameters here are: +- `FileInfo`: Information about the container file + - `FilePath`: The path to the container file in your storage + - `Password`: If the container is password-protected (optional) +- `ContainerItemInfo`: Information about the document inside the container + - `RelativePath`: The relative path to the document inside the container + - `Password`: If the embedded document is password-protected (optional) +- `StartPageNumber` and `CountPagesToExtract`: Optional parameters if you want to extract images from specific pages of the embedded document + +## Step 3: Execute the Request and Process the Response + +When you execute the request, the API will return a JSON response containing information about the extracted images, organized by page: + +```json +{ + "pages": [ + { + "pageIndex": 1, + "images": [ + { + "path": "parser/images/template-document_pdf/page_1/image_0.jpeg", + "downloadUrl": "https://api.groupdocs.cloud/v1.0/parser/storage/file/parser/images/template-document_pdf/page_1/image_0.jpeg" + } + ] + }, + { + "pageIndex": 2, + "images": [ + { + "path": "parser/images/template-document_pdf/page_2/image_0.jpeg", + "downloadUrl": "https://api.groupdocs.cloud/v1.0/parser/storage/file/parser/images/template-document_pdf/page_2/image_0.jpeg" + } + ] + } + ] +} +``` 
+
+The response includes:
+- `pages`: An array of page objects, each containing:
+  - `pageIndex`: The index of the page
+  - `images`: An array of images found on that page, each containing:
+    - `path`: The storage path where the extracted image is saved
+    - `downloadUrl`: A direct URL to download the extracted image
+
+## Try It Yourself
+
+Now it's your turn to try extracting images from a document inside a container:
+
+1. Replace `YOUR_CLIENT_ID` and `YOUR_CLIENT_SECRET` with your actual credentials
+2. Update the `FilePath` parameter to point to a container file in your storage
+3. Update the `RelativePath` parameter to specify the document inside the container
+4. Add passwords if necessary
+5. Execute the curl command and observe the response
+6. Try downloading some of the extracted images using the provided `downloadUrl`
+
+## Implementation in Different Languages
+
+### C# Example
+
+```csharp
+using System;
+using System.Collections.Generic;
+using System.IO;
+using System.Net.Http;
+using System.Text;
+using System.Threading.Tasks;
+using Newtonsoft.Json;
+
+namespace GroupDocsParserCloudTutorial
+{
+    class Program
+    {
+        static async Task Main(string[] args)
+        {
+            // Get your ClientID and ClientSecret from https://dashboard.groupdocs.cloud
+            string clientId = "YOUR_CLIENT_ID";
+            string clientSecret = "YOUR_CLIENT_SECRET";
+
+            // Get JWT token
+            string token = await GetAuthToken(clientId, clientSecret);
+
+            // Extract images from a document inside a container
+            await ExtractImagesFromContainer(
+                token,
+                "pdf/PDF with attachements.pdf",
+                "template-document.pdf",
+                1, // Start page (optional)
+                2, // Page count (optional)
+                "container_pass", // Container password if needed
+                "document_pass" // Document password if needed
+            );
+        }
+
+        static async Task<string> GetAuthToken(string clientId, string clientSecret)
+        {
+            using (var client = new HttpClient())
+            {
+                // Prepare request
+                var requestBody = $"grant_type=client_credentials&client_id={clientId}&client_secret={clientSecret}";
+                var content = new StringContent(requestBody, Encoding.UTF8, "application/x-www-form-urlencoded");
+
+                // Send request
+                var response = await client.PostAsync("https://api.groupdocs.cloud/connect/token", content);
+
+                // Process response: read the token fields into a string dictionary
+                var jsonString = await response.Content.ReadAsStringAsync();
+                var token = JsonConvert.DeserializeObject<Dictionary<string, string>>(jsonString);
+
+                return token["access_token"];
+            }
+        }
+
+        static async Task ExtractImagesFromContainer(
+            string token,
+            string containerPath,
+            string documentPath,
+            int? startPage = null,
+            int? pageCount = null,
+            string containerPassword = null,
+            string documentPassword = null)
+        {
+            using (var client = new HttpClient())
+            {
+                // Prepare request
+                client.DefaultRequestHeaders.Add("Authorization", $"Bearer {token}");
+
+                // Create request object
+                var requestObj = new Dictionary<string, object>();
+
+                // Add FileInfo
+                var fileInfo = new Dictionary<string, string>
+                {
+                    { "FilePath", containerPath },
+                    { "StorageName", "" }
+                };
+
+                if (!string.IsNullOrEmpty(containerPassword))
+                {
+                    fileInfo.Add("Password", containerPassword);
+                }
+
+                requestObj.Add("FileInfo", fileInfo);
+
+                // Add ContainerItemInfo
+                var containerItemInfo = new Dictionary<string, string>
+                {
+                    { "RelativePath", documentPath }
+                };
+
+                if (!string.IsNullOrEmpty(documentPassword))
+                {
+                    containerItemInfo.Add("Password", documentPassword);
+                }
+
+                requestObj.Add("ContainerItemInfo", containerItemInfo);
+
+                // Add page range parameters if specified
+                if (startPage.HasValue)
+                {
+                    requestObj.Add("StartPageNumber", startPage.Value);
+                }
+
+                if (pageCount.HasValue)
+                {
+                    requestObj.Add("CountPagesToExtract", pageCount.Value);
+                }
+
+                var content = new StringContent(JsonConvert.SerializeObject(requestObj), Encoding.UTF8, "application/json");
+
+                // Send request
+                var response = await client.PostAsync("https://api.groupdocs.cloud/v1.0/parser/images", content);
+
+                // Process response
+                var jsonString = await response.Content.ReadAsStringAsync();
+                Console.WriteLine("API Response:");
+                Console.WriteLine(jsonString);
+
+                // Parse the response into the typed model defined below
+                var result = JsonConvert.DeserializeObject<PagedImageExtractionResponse>(jsonString);
+
+                if (result.Pages != null && result.Pages.Count > 0)
+                {
+                    Console.WriteLine($"Extracted images from {result.Pages.Count} pages.");
+
+                    // Process each page
+                    foreach (var page in result.Pages)
+                    {
+                        Console.WriteLine($"Page {page.PageIndex} has {page.Images.Count} image(s).");
+
+                        // Download the first image from each page (if available)
+                        if (page.Images.Count > 0)
+                        {
+                            var imageUrl = page.Images[0].DownloadUrl;
+                            string fileName = $"container_image_page_{page.PageIndex}_image_0.jpeg";
+                            await DownloadImage(client, imageUrl, fileName);
+                        }
+                    }
+                }
+                else
+                {
+                    Console.WriteLine("No images found in the specified document inside the container.");
+                }
+            }
+        }
+
+        static async Task DownloadImage(HttpClient client, string imageUrl, string localPath)
+        {
+            Console.WriteLine($"Downloading image from: {imageUrl}");
+
+            var imageBytes = await client.GetByteArrayAsync(imageUrl);
+            File.WriteAllBytes(localPath, imageBytes);
+
+            Console.WriteLine($"Image downloaded to: {localPath}");
+        }
+
+        class PagedImageExtractionResponse
+        {
+            public List<PageImageInfo> Pages { get; set; }
+        }
+
+        class PageImageInfo
+        {
+            public int PageIndex { get; set; }
+            public List<ImageInfo> Images { get; set; }
+        }
+
+        class ImageInfo
+        {
+            public string Path { get; set; }
+            public string DownloadUrl { get; set; }
+        }
+    }
+}
+```
+
+### Java Example
+
+```java
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.net.HttpURLConnection;
+import java.net.URL;
+import java.nio.charset.StandardCharsets;
+import java.util.Scanner;
+import org.json.JSONArray;
+import org.json.JSONObject;
+
+public class ExtractImagesFromContainerTutorial {
+
+    private static final String BASE_URL = "https://api.groupdocs.cloud/v1.0/parser";
+    private static final String 
AUTH_URL = "https://api.groupdocs.cloud/connect/token"; + + public static void main(String[] args) throws IOException { + // Get your ClientID and ClientSecret from https://dashboard.groupdocs.cloud + String clientId = "YOUR_CLIENT_ID"; + String clientSecret = "YOUR_CLIENT_SECRET"; + + // Get JWT token + String token = getAuthToken(clientId, clientSecret); + + // Extract images from a document inside a container + extractImagesFromContainer( + token, + "pdf/PDF with attachements.pdf", + "template-document.pdf", + 1, // Start page (optional) + 2, // Page count (optional) + "container_pass", // Container password if needed + "document_pass" // Document password if needed + ); + } + + private static String getAuthToken(String clientId, String clientSecret) throws IOException { + URL url = new URL(AUTH_URL); + HttpURLConnection conn = (HttpURLConnection) url.openConnection(); + conn.setRequestMethod("POST"); + conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded"); + conn.setDoOutput(true); + + String requestBody = "grant_type=client_credentials&client_id=" + clientId + "&client_secret=" + clientSecret; + try (OutputStream os = conn.getOutputStream()) { + os.write(requestBody.getBytes(StandardCharsets.UTF_8)); + } + + try (Scanner scanner = new Scanner(conn.getInputStream(), StandardCharsets.UTF_8.name())) { + String jsonResponse = scanner.useDelimiter("\\A").next(); + JSONObject jsonObject = new JSONObject(jsonResponse); + return jsonObject.getString("access_token"); + } + } + + private static void extractImagesFromContainer( + String token, + String containerPath, + String documentPath, + Integer startPage, + Integer pageCount, + String containerPassword, + String documentPassword) throws IOException { + + URL url = new URL(BASE_URL + "/images"); + HttpURLConnection conn = (HttpURLConnection) url.openConnection(); + conn.setRequestMethod("POST"); + conn.setRequestProperty("Content-Type", "application/json"); + conn.setRequestProperty("Accept", 
"application/json"); + conn.setRequestProperty("Authorization", "Bearer " + token); + conn.setDoOutput(true); + + // Create request JSON + JSONObject requestJson = new JSONObject(); + + // Add FileInfo + JSONObject fileInfo = new JSONObject(); + fileInfo.put("FilePath", containerPath); + fileInfo.put("StorageName", ""); + if (containerPassword != null && !containerPassword.isEmpty()) { + fileInfo.put("Password", containerPassword); + } + requestJson.put("FileInfo", fileInfo); + + // Add ContainerItemInfo + JSONObject containerItemInfo = new JSONObject(); + containerItemInfo.put("RelativePath", documentPath); + if (documentPassword != null && !documentPassword.isEmpty()) { + containerItemInfo.put("Password", documentPassword); + } + requestJson.put("ContainerItemInfo", containerItemInfo); + + // Add page range parameters if specified + if (startPage != null) { + requestJson.put("StartPageNumber", startPage); + } + + if (pageCount != null) { + requestJson.put("CountPagesToExtract", pageCount); + } + + String requestBody = requestJson.toString(); + try (OutputStream os = conn.getOutputStream()) { + os.write(requestBody.getBytes(StandardCharsets.UTF_8)); + } + + try (Scanner scanner = new Scanner(conn.getInputStream(), StandardCharsets.UTF_8.name())) { + String jsonResponse = scanner.useDelimiter("\\A").next(); + System.out.println("API Response:"); + System.out.println(jsonResponse); + + // Parse the response + JSONObject responseObj = new JSONObject(jsonResponse); + + if (responseObj.has("pages")) { + JSONArray pages = responseObj.getJSONArray("pages"); + System.out.println("Extracted images from " + pages.length() + " pages."); + + // Process each page + for (int i = 0; i < pages.length(); i++) { + JSONObject page = pages.getJSONObject(i); + int pageIndex = page.getInt("pageIndex"); + JSONArray images = page.getJSONArray("images"); + + System.out.println("Page " + pageIndex + " has " + images.length() + " image(s)."); + + // Download the first image from each page 
(if available) + if (images.length() > 0) { + JSONObject firstImage = images.getJSONObject(0); + String downloadUrl = firstImage.getString("downloadUrl"); + String fileName = "container_image_page_" + pageIndex + "_image_0.jpeg"; + downloadImage(token, downloadUrl, fileName); + } + } + } else { + System.out.println("No images found in the specified document inside the container."); + } + } + } + + private static void downloadImage(String token, String imageUrl, String localPath) throws IOException { + System.out.println("Downloading image from: " + imageUrl); + + URL url = new URL(imageUrl); + HttpURLConnection conn = (HttpURLConnection) url.openConnection(); + conn.setRequestMethod("GET"); + conn.setRequestProperty("Authorization", "Bearer " + token); + + try (InputStream is = conn.getInputStream(); + FileOutputStream fos = new FileOutputStream(localPath)) { + + byte[] buffer = new byte[4096]; + int bytesRead; + + while ((bytesRead = is.read(buffer)) != -1) { + fos.write(buffer, 0, bytesRead); + } + } + + System.out.println("Image downloaded to: " + localPath); + } +} +``` + +## Understanding Container Types and Document Paths + +Different container types require different approaches for specifying the relative path: + +### ZIP Archives + +For ZIP archives, the relative path is the path of the file inside the archive: + +```json +"ContainerItemInfo": { + "RelativePath": "folder/document.docx" +} +``` + +### Email Attachments + +For emails, the relative path is the name of the attachment: + +```json +"ContainerItemInfo": { + "RelativePath": "attachment.pdf" +} +``` + +### PDF Portfolios + +For PDF portfolios, the relative path is the name of the embedded document: + +```json +"ContainerItemInfo": { + "RelativePath": "embedded-document.pdf" +} +``` + +## Working with Password-Protected Documents + +If either the container or the embedded document is password-protected, you need to provide the passwords: + +```json +"FileInfo": { + "FilePath": 
"archives/protected-archive.zip", + "Password": "container-password" +}, +"ContainerItemInfo": { + "RelativePath": "protected-document.pdf", + "Password": "document-password" +} +``` + +## Practical Use Cases + +### Email Attachment Processing + +One common application is to process images from email attachments without saving them to disk: + +```csharp +// C# example: Process images from email attachments +async Task ProcessEmailAttachmentImages(string emailFilePath) +{ + // Get list of attachments in the email (using other API methods) + var attachments = await GetEmailAttachments(emailFilePath); + + foreach (var attachment in attachments) + { + // Check if attachment is a document that might contain images + if (IsDocumentWithPotentialImages(attachment.FileName)) + { + // Extract images from this attachment + await ExtractImagesFromContainer( + token, + emailFilePath, // Email as container + attachment.FileName // Attachment as document inside container + ); + } + } +} +``` + +### Batch Processing ZIP Archives + +Another application is to extract images from all documents in a ZIP archive: + +```java +// Java example: Process all documents in a ZIP archive +void processAllDocumentsInZip(String zipFilePath) throws IOException { + // Get list of files in the ZIP (using other API methods) + List zipContents = getZipContents(zipFilePath); + + for (String filePath : zipContents) { + // Check if file is a document that might contain images + if (isDocumentWithPotentialImages(filePath)) { + // Extract images from this document + extractImagesFromContainer( + token, + zipFilePath, // ZIP as container + filePath // Document inside ZIP + ); + } + } +} +``` + +## Learning Checkpoint + +Take a moment to test your understanding: + +1. What is the purpose of the `ContainerItemInfo` parameter? +2. How do you handle password-protected containers and documents? +3. What types of containers are supported by the API? +4. 
How would you extract images from a specific page range of a document inside a container? + +## Common Issues and Troubleshooting + +- Invalid Relative Path: Ensure the relative path to the document inside the container is correct. The path is case-sensitive. +- Password Issues: If the container or document is password-protected, make sure you're providing the correct passwords. +- Unsupported Container Types: Not all container formats may be supported. Check the documentation for a complete list of supported formats. +- No Images Found: The document inside the container might not have any embedded images. Try with a different document. + +## What You've Learned + +In this tutorial, you've learned: +- How to extract images from documents stored inside containers +- How to handle password-protected containers and documents +- How to extract images from specific page ranges of documents inside containers +- How to download and process the extracted images + +## Further Practice + +Try creating an application that: +1. Monitors an email inbox for new messages with attachments +2. Automatically extracts all images from documents in attachments +3. Organizes the extracted images by email sender, date, and document name +4. 
Creates a searchable database of all extracted images with metadata + +## Helpful Resources + +- [Product Page](https://products.groupdocs.cloud/parser/) +- [Documentation](https://docs.groupdocs.cloud/parser/) +- [Live Demo](https://products.groupdocs.app/parser/family) +- [API Reference](https://reference.groupdocs.cloud/parser/) +- [Blog](https://blog.groupdocs.cloud/categories/groupdocs.parser-cloud-product-family/) +- [Free Support](https://forum.groupdocs.cloud/c/parser/19/) +- [Free Trial](https://dashboard.groupdocs.cloud/#/apps) diff --git a/content/parser/english/parse-operations/extract-images-page-number-range/extract-images-page-number-range-tutorial.md b/content/parser/english/parse-operations/extract-images-page-number-range/extract-images-page-number-range-tutorial.md new file mode 100644 index 0000000..400aec8 --- /dev/null +++ b/content/parser/english/parse-operations/extract-images-page-number-range/extract-images-page-number-range-tutorial.md @@ -0,0 +1,488 @@ +--- +title: How to Extract Images by a Page Number Range Tutorial +url: /parse-operations/extract-images-page-number-range/ +weight: 2 +description: Learn how to extract images from specific pages in documents using GroupDocs.Parser Cloud API in this step-by-step tutorial for developers +--- + +# Tutorial: How to Extract Images by a Page Number Range + +## Learning Objectives + +In this tutorial, you'll learn how to: +- Extract images from specific pages in a document +- Define page ranges for targeted image extraction +- Process and save page-specific images in different programming languages + +## Prerequisites + +Before starting this tutorial, make sure you have: + +1. A GroupDocs.Parser Cloud account (if you don't have one, [register for a free trial](https://dashboard.groupdocs.cloud/#/apps)) +2. Your Client ID and Client Secret (available from the [dashboard](https://dashboard.groupdocs.cloud/#/apps)) +3. 
A multi-page document with images uploaded to your cloud storage + +## The Practical Scenario + +Imagine you're building an application that needs to: +- Extract images only from specific sections of a document +- Process only visual elements from relevant pages instead of the entire document +- Create visual presentations focused on particular document pages + +This tutorial will show you how to implement this functionality step by step. + +## Step 1: Obtain Authorization Token + +Before making any API calls, you need to authenticate with the GroupDocs API using your Client ID and Client Secret. + +```bash +# First get JSON Web Token +curl -v "https://api.groupdocs.cloud/connect/token" \ +-X POST \ +-d "grant_type=client_credentials&client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET" \ +-H "Content-Type: application/x-www-form-urlencoded" \ +-H "Accept: application/json" +``` + +This will return a JWT token that you'll use in subsequent requests. + +## Step 2: Prepare Your API Request + +To extract images from specific pages, you'll make a POST request to the images endpoint with the following parameters: + +```bash +curl -v "https://api.groupdocs.cloud/v1.0/parser/images" \ +-X POST \ +-H "Content-Type: application/json" \ +-H "Accept: application/json" \ +-H "Authorization: Bearer YOUR_JWT_TOKEN" \ +-d "{ + \"StartPageNumber\": 1, + \"CountPagesToExtract\": 2, + \"FileInfo\": { + \"FilePath\": \"pdf/template-document.pdf\", + \"StorageName\": \"\" + } +}" +``` + +The key parameters here are: +- `StartPageNumber`: The zero-based index of the first page to extract images from (1 = second page) +- `CountPagesToExtract`: The number of pages to extract images from, starting from the start page + +## Step 3: Execute the Request and Process the Response + +When you execute the request, the API will return a JSON response containing information about the extracted images, organized by page: + +```json +{ + "pages": [ + { + "pageIndex": 1, + "images": [ + { + 
"path": "parser/images/pdf/template-document_pdf/page_1/image_0.jpeg", + "downloadUrl": "https://api.groupdocs.cloud/v1.0/parser/storage/file/parser/images/pdf/template-document_pdf/page_1/image_0.jpeg" + } + ] + }, + { + "pageIndex": 2, + "images": [ + { + "path": "parser/images/pdf/template-document_pdf/page_2/image_0.jpeg", + "downloadUrl": "https://api.groupdocs.cloud/v1.0/parser/storage/file/parser/images/pdf/template-document_pdf/page_2/image_0.jpeg" + } + ] + } + ] +} +``` + +The response includes: +- `pages`: An array of page objects, each containing: + - `pageIndex`: The index of the page + - `images`: An array of images found on that page, each containing: + - `path`: The storage path where the extracted image is saved + - `downloadUrl`: A direct URL to download the extracted image + +## Try It Yourself + +Now it's your turn to try extracting images from specific pages: + +1. Replace `YOUR_CLIENT_ID` and `YOUR_CLIENT_SECRET` with your actual credentials +2. Update the `FilePath` parameter to point to a multi-page document with images in your storage +3. Adjust the `StartPageNumber` and `CountPagesToExtract` parameters to extract images from different page ranges +4. Execute the curl command and observe how the response changes +5. 
Try downloading some of the extracted images using the provided `downloadUrl` + +## Implementation in Different Languages + +### C# Example + +```csharp +using System; +using System.Collections.Generic; +using System.IO; +using System.Net.Http; +using System.Text; +using System.Threading.Tasks; +using Newtonsoft.Json; + +namespace GroupDocsParserCloudTutorial +{ + class Program + { + static async Task Main(string[] args) + { + // Get your ClientID and ClientSecret from https://dashboard.groupdocs.cloud + string clientId = "YOUR_CLIENT_ID"; + string clientSecret = "YOUR_CLIENT_SECRET"; + + // Get JWT token + string token = await GetAuthToken(clientId, clientSecret); + + // Extract images from specific pages + await ExtractImagesByPageRange(token, "pdf/template-document.pdf", 1, 2); + } + + static async Task<string> GetAuthToken(string clientId, string clientSecret) + { + using (var client = new HttpClient()) + { + // Prepare request + var requestBody = $"grant_type=client_credentials&client_id={clientId}&client_secret={clientSecret}"; + var content = new StringContent(requestBody, Encoding.UTF8, "application/x-www-form-urlencoded"); + + // Send request + var response = await client.PostAsync("https://api.groupdocs.cloud/connect/token", content); + + // Process response + var jsonString = await response.Content.ReadAsStringAsync(); + var token = JsonConvert.DeserializeObject<Dictionary<string, string>>(jsonString); + + return token["access_token"]; + } + } + + static async Task ExtractImagesByPageRange(string token, string filePath, int startPage, int pageCount) + { + using (var client = new HttpClient()) + { + // Prepare request + client.DefaultRequestHeaders.Add("Authorization", $"Bearer {token}"); + + var requestBody = new + { + StartPageNumber = startPage, + CountPagesToExtract = pageCount, + FileInfo = new + { + FilePath = filePath, + StorageName = "" + } + }; + + var content = new StringContent(JsonConvert.SerializeObject(requestBody), Encoding.UTF8, "application/json"); + + // Send request + 
var response = await client.PostAsync("https://api.groupdocs.cloud/v1.0/parser/images", content); + + // Process response + var jsonString = await response.Content.ReadAsStringAsync(); + Console.WriteLine("API Response:"); + Console.WriteLine(jsonString); + + // Parse the response + var result = JsonConvert.DeserializeObject<PagedImageExtractionResponse>(jsonString); + + if (result.Pages != null && result.Pages.Count > 0) + { + Console.WriteLine($"Extracted images from {result.Pages.Count} pages."); + + // Process each page + foreach (var page in result.Pages) + { + Console.WriteLine($"Page {page.PageIndex} has {page.Images.Count} image(s)."); + + // Download the first image from each page (if available) + if (page.Images.Count > 0) + { + var imageUrl = page.Images[0].DownloadUrl; + string fileName = $"page_{page.PageIndex}_image_0.jpeg"; + await DownloadImage(client, imageUrl, fileName); + } + } + } + else + { + Console.WriteLine("No images found in the specified page range."); + } + } + } + + static async Task DownloadImage(HttpClient client, string imageUrl, string localPath) + { + Console.WriteLine($"Downloading image from: {imageUrl}"); + + var imageBytes = await client.GetByteArrayAsync(imageUrl); + File.WriteAllBytes(localPath, imageBytes); + + Console.WriteLine($"Image downloaded to: {localPath}"); + } + + class PagedImageExtractionResponse + { + public List<PageImageInfo> Pages { get; set; } + } + + class PageImageInfo + { + public int PageIndex { get; set; } + public List<ImageInfo> Images { get; set; } + } + + class ImageInfo + { + public string Path { get; set; } + public string DownloadUrl { get; set; } + } + } +} +``` + +### Java Example + +```java +import java.io.FileOutputStream; +import java.io.IOException; +import java.io.InputStream; +import java.io.OutputStream; +import java.net.HttpURLConnection; +import java.net.URL; +import java.nio.charset.StandardCharsets; +import java.util.Scanner; +import org.json.JSONArray; +import org.json.JSONObject; + +public class ExtractImagesByPageRangeTutorial { + + 
private static final String BASE_URL = "https://api.groupdocs.cloud/v1.0/parser"; + private static final String AUTH_URL = "https://api.groupdocs.cloud/connect/token"; + + public static void main(String[] args) throws IOException { + // Get your ClientID and ClientSecret from https://dashboard.groupdocs.cloud + String clientId = "YOUR_CLIENT_ID"; + String clientSecret = "YOUR_CLIENT_SECRET"; + + // Get JWT token + String token = getAuthToken(clientId, clientSecret); + + // Extract images from specific pages + extractImagesByPageRange(token, "pdf/template-document.pdf", 1, 2); + } + + private static String getAuthToken(String clientId, String clientSecret) throws IOException { + URL url = new URL(AUTH_URL); + HttpURLConnection conn = (HttpURLConnection) url.openConnection(); + conn.setRequestMethod("POST"); + conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded"); + conn.setDoOutput(true); + + String requestBody = "grant_type=client_credentials&client_id=" + clientId + "&client_secret=" + clientSecret; + try (OutputStream os = conn.getOutputStream()) { + os.write(requestBody.getBytes(StandardCharsets.UTF_8)); + } + + try (Scanner scanner = new Scanner(conn.getInputStream(), StandardCharsets.UTF_8.name())) { + String jsonResponse = scanner.useDelimiter("\\A").next(); + JSONObject jsonObject = new JSONObject(jsonResponse); + return jsonObject.getString("access_token"); + } + } + + private static void extractImagesByPageRange(String token, String filePath, int startPage, int pageCount) throws IOException { + URL url = new URL(BASE_URL + "/images"); + HttpURLConnection conn = (HttpURLConnection) url.openConnection(); + conn.setRequestMethod("POST"); + conn.setRequestProperty("Content-Type", "application/json"); + conn.setRequestProperty("Accept", "application/json"); + conn.setRequestProperty("Authorization", "Bearer " + token); + conn.setDoOutput(true); + + String requestBody = String.format( + 
"{\"StartPageNumber\":%d,\"CountPagesToExtract\":%d,\"FileInfo\":{\"FilePath\":\"%s\",\"StorageName\":\"\"}}", + startPage, pageCount, filePath + ); + + try (OutputStream os = conn.getOutputStream()) { + os.write(requestBody.getBytes(StandardCharsets.UTF_8)); + } + + try (Scanner scanner = new Scanner(conn.getInputStream(), StandardCharsets.UTF_8.name())) { + String jsonResponse = scanner.useDelimiter("\\A").next(); + System.out.println("API Response:"); + System.out.println(jsonResponse); + + // Parse the response + JSONObject responseObj = new JSONObject(jsonResponse); + + if (responseObj.has("pages")) { + JSONArray pages = responseObj.getJSONArray("pages"); + System.out.println("Extracted images from " + pages.length() + " pages."); + + // Process each page + for (int i = 0; i < pages.length(); i++) { + JSONObject page = pages.getJSONObject(i); + int pageIndex = page.getInt("pageIndex"); + JSONArray images = page.getJSONArray("images"); + + System.out.println("Page " + pageIndex + " has " + images.length() + " image(s)."); + + // Download the first image from each page (if available) + if (images.length() > 0) { + JSONObject firstImage = images.getJSONObject(0); + String downloadUrl = firstImage.getString("downloadUrl"); + String fileName = "page_" + pageIndex + "_image_0.jpeg"; + downloadImage(token, downloadUrl, fileName); + } + } + } else { + System.out.println("No images found in the specified page range."); + } + } + } + + private static void downloadImage(String token, String imageUrl, String localPath) throws IOException { + System.out.println("Downloading image from: " + imageUrl); + + URL url = new URL(imageUrl); + HttpURLConnection conn = (HttpURLConnection) url.openConnection(); + conn.setRequestMethod("GET"); + conn.setRequestProperty("Authorization", "Bearer " + token); + + try (InputStream is = conn.getInputStream(); + FileOutputStream fos = new FileOutputStream(localPath)) { + + byte[] buffer = new byte[4096]; + int bytesRead; + + while 
((bytesRead = is.read(buffer)) != -1) { + fos.write(buffer, 0, bytesRead); + } + } + + System.out.println("Image downloaded to: " + localPath); + } +} +``` + +## Understanding Page Indexing + +It's important to remember that page indexing in the API is zero-based: +- Page 1 of the document is represented as index 0 +- Page 2 of the document is represented as index 1 +- And so on... + +When setting the `StartPageNumber` parameter, make sure to use the zero-based index of the page you want to start from. + +## Learning Checkpoint + +Take a moment to test your understanding: + +1. If you want to extract images from pages 3-5 of a document, what values should you use for `StartPageNumber` and `CountPagesToExtract`? +2. How would you modify the request to extract images only from the last page of a document if you don't know how many pages it has? +3. What's the difference between the response structure of extracting images from the whole document vs. extracting images by page range? + +## Practical Use Cases + +Here are some practical applications for page-specific image extraction: + +### Document Previews +Create preview images for specific pages of documents in a document management system. + +```javascript +// Frontend code example (using extracted image URLs) +function createDocumentPreview(pageImages) { + const previewContainer = document.getElementById('preview-container'); + + pageImages.forEach(page => { + const pageDiv = document.createElement('div'); + pageDiv.className = 'page-preview'; + + page.images.forEach(image => { + const img = document.createElement('img'); + img.src = image.downloadUrl; + img.alt = `Image from page ${page.pageIndex}`; + pageDiv.appendChild(img); + }); + + previewContainer.appendChild(pageDiv); + }); +} +``` + +### Targeted Image Processing +Extract and process images from only the relevant sections of a document, saving time and resources. 
+ + ```python +# Python example for image processing +import requests +from PIL import Image +from io import BytesIO + +def process_document_images(download_urls, token): + processed_images = [] + + for url in download_urls: + # Download the image, sending the Bearer token as the C# and Java examples do + response = requests.get(url, headers={"Authorization": f"Bearer {token}"}) + response.raise_for_status() + img = Image.open(BytesIO(response.content)) + + # Process the image (example: convert to grayscale) + processed = img.convert('L') + + processed_images.append(processed) + + return processed_images +``` + +## Common Issues and Troubleshooting + +- Page Range Out of Bounds: If you specify a page range that's outside the document's boundaries, you'll receive an error. Make sure your `StartPageNumber` and `CountPagesToExtract` parameters are valid for your document. + +- No Images on Specified Pages: If the specified pages don't contain any images, the API will return empty image arrays for those pages. Always check if the `images` array has elements before trying to access them. + +- Page Numbering Mismatch: Remember that page numbering starts from 0 in the API, but usually starts from 1 in a document viewer. Subtract 1 from the viewer's page number when setting `StartPageNumber`. + +## What You've Learned + +In this tutorial, you've learned: +- How to extract images from specific pages in a document +- How to specify page ranges using `StartPageNumber` and `CountPagesToExtract` +- How to process the page-specific images in your application +- How to download and use the extracted images + +## Next Steps + +Now that you know how to extract images from specific pages, you can: +- Learn about [Extracting Images from Documents in Containers](/parse-operations/extract-images-document-inside-container/) for more advanced scenarios +- Combine text and image extraction techniques to create comprehensive document processing workflows + +## Further Practice + +Try creating an application that: +1. Extracts images from each page of a presentation (PPTX) file +2. Creates a thumbnail gallery of all images organized by page +3. 
Adds page number overlays to each extracted image +4. Provides a UI to navigate through the document images by page + +## Helpful Resources + +- [Product Page](https://products.groupdocs.cloud/parser/) +- [Documentation](https://docs.groupdocs.cloud/parser/) +- [Live Demo](https://products.groupdocs.app/parser/family) +- [API Reference](https://reference.groupdocs.cloud/parser/) +- [Blog](https://blog.groupdocs.cloud/categories/groupdocs.parser-cloud-product-family/) +- [Free Support](https://forum.groupdocs.cloud/c/parser/19/) +- [Free Trial](https://dashboard.groupdocs.cloud/#/apps) diff --git a/content/parser/english/parse-operations/extract-images-whole-document/extract-images-whole-document-tutorial.md b/content/parser/english/parse-operations/extract-images-whole-document/extract-images-whole-document-tutorial.md new file mode 100644 index 0000000..40b122c --- /dev/null +++ b/content/parser/english/parse-operations/extract-images-whole-document/extract-images-whole-document-tutorial.md @@ -0,0 +1,387 @@ +--- +title: How to Extract Images from the Whole Document Tutorial +url: /parse-operations/extract-images-whole-document/ +weight: 1 +description: Learn how to extract all images from documents using GroupDocs.Parser Cloud API in this step-by-step tutorial for developers +--- + +# Tutorial: How to Extract Images from the Whole Document + +## Learning Objectives + +In this tutorial, you'll learn how to: +- Set up the GroupDocs.Parser Cloud API for image extraction +- Extract all images from a document with a simple API call +- Download and process the extracted images in different programming languages + +## Prerequisites + +Before starting this tutorial, make sure you have: + +1. A GroupDocs.Parser Cloud account (if you don't have one, [register for a free trial](https://dashboard.groupdocs.cloud/#/apps)) +2. Your Client ID and Client Secret (available from the [dashboard](https://dashboard.groupdocs.cloud/#/apps)) +3. 
A document with images uploaded to your cloud storage + +## The Practical Scenario + +Imagine you need to extract all images from a document to: +- Create a gallery of document images +- Process images for OCR or other image analysis +- Archive document visual assets separately + +This tutorial will show you how to implement this functionality step by step. + +## Step 1: Obtain Authorization Token + +Before making any API calls, you need to authenticate with the GroupDocs API using your Client ID and Client Secret. + +```bash +# First get JSON Web Token +curl -v "https://api.groupdocs.cloud/connect/token" \ +-X POST \ +-d "grant_type=client_credentials&client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET" \ +-H "Content-Type: application/x-www-form-urlencoded" \ +-H "Accept: application/json" +``` + +This will return a JWT token that you'll use in subsequent requests. + +## Step 2: Prepare Your API Request + +To extract images from a document, you'll make a POST request to the images endpoint with the following parameters: + +```bash +curl -v "https://api.groupdocs.cloud/v1.0/parser/images" \ +-X POST \ +-H "Content-Type: application/json" \ +-H "Accept: application/json" \ +-H "Authorization: Bearer YOUR_JWT_TOKEN" \ +-d "{ + \"FileInfo\": { + \"FilePath\": \"containers/archive/zip-eml-jpg-pdf.zip\", + \"StorageName\": \"\" + } +}" +``` + +Note that you only need to specify the `FilePath` parameter to extract images from the entire document. 
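
As a rough illustration of the note above, the sketch below builds the request body in Python. The helper name `build_images_request` is hypothetical (it is not part of any GroupDocs SDK); the field names `FileInfo`, `FilePath`, `StorageName`, `StartPageNumber`, and `CountPagesToExtract` are the ones used in the curl examples of these tutorials. It assumes that omitting the page-range fields is what requests the whole document, as the curl example suggests.

```python
import json


def build_images_request(file_path, storage_name="", start_page=None, page_count=None):
    """Hypothetical helper: build the JSON body for a POST to the images endpoint."""
    body = {"FileInfo": {"FilePath": file_path, "StorageName": storage_name}}
    # The page-range fields are optional; they are only added when provided,
    # so the default call describes a whole-document extraction.
    if start_page is not None:
        body["StartPageNumber"] = start_page
    if page_count is not None:
        body["CountPagesToExtract"] = page_count
    return body


# Whole-document request: only FileInfo is present.
print(json.dumps(build_images_request("containers/archive/zip-eml-jpg-pdf.zip"), indent=2))
```

Keeping the optional fields out of the dictionary, rather than sending them as null, mirrors the curl request, which simply leaves them out.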
+ +## Step 3: Execute the Request and Process the Response + +When you execute the request, the API will return a JSON response containing information about the extracted images: + +```json +{ + "images": [ + { + "path": "parser/images/containers/archive/zip-eml-jpg-pdf_zip/template-document_pdf/image_0.jpeg", + "downloadUrl": "https://api.groupdocs.cloud/v1.0/parser/storage/file/parser/images/containers/archive/zip-eml-jpg-pdf_zip/template-document_pdf/image_0.jpeg" + }, + { + "path": "parser/images/containers/archive/zip-eml-jpg-pdf_zip/template-document_pdf/image_1.jpeg", + "downloadUrl": "https://api.groupdocs.cloud/v1.0/parser/storage/file/parser/images/containers/archive/zip-eml-jpg-pdf_zip/template-document_pdf/image_1.jpeg" + }, + { + "path": "parser/images/containers/archive/zip-eml-jpg-pdf_zip/template-document_pdf/image_2.jpeg", + "downloadUrl": "https://api.groupdocs.cloud/v1.0/parser/storage/file/parser/images/containers/archive/zip-eml-jpg-pdf_zip/template-document_pdf/image_2.jpeg" + }, + { + "path": "parser/images/containers/archive/zip-eml-jpg-pdf_zip/template-document_pdf/image_3.jpeg", + "downloadUrl": "https://api.groupdocs.cloud/v1.0/parser/storage/file/parser/images/containers/archive/zip-eml-jpg-pdf_zip/template-document_pdf/image_3.jpeg" + } + ] +} +``` + +The response includes: +- `path`: The storage path where the extracted image is saved +- `downloadUrl`: A direct URL to download the extracted image + +## Try It Yourself + +Now it's your turn to try extracting images from your own document: + +1. Replace `YOUR_CLIENT_ID` and `YOUR_CLIENT_SECRET` with your actual credentials +2. Update the `FilePath` parameter to point to a document with images in your storage +3. Execute the curl command and observe the response +4. 
Try downloading one of the extracted images using the provided `downloadUrl` + +## Implementation in Different Languages + +### C# Example + +```csharp +using System; +using System.Collections.Generic; +using System.IO; +using System.Net.Http; +using System.Text; +using System.Threading.Tasks; +using Newtonsoft.Json; + +namespace GroupDocsParserCloudTutorial +{ + class Program + { + static async Task Main(string[] args) + { + // Get your ClientID and ClientSecret from https://dashboard.groupdocs.cloud + string clientId = "YOUR_CLIENT_ID"; + string clientSecret = "YOUR_CLIENT_SECRET"; + + // Get JWT token + string token = await GetAuthToken(clientId, clientSecret); + + // Extract images from the document + await ExtractImages(token, "documents/document-with-images.pdf"); + } + + static async Task<string> GetAuthToken(string clientId, string clientSecret) + { + using (var client = new HttpClient()) + { + // Prepare request + var requestBody = $"grant_type=client_credentials&client_id={clientId}&client_secret={clientSecret}"; + var content = new StringContent(requestBody, Encoding.UTF8, "application/x-www-form-urlencoded"); + + // Send request + var response = await client.PostAsync("https://api.groupdocs.cloud/connect/token", content); + + // Process response + var jsonString = await response.Content.ReadAsStringAsync(); + var token = JsonConvert.DeserializeObject<Dictionary<string, string>>(jsonString); + + return token["access_token"]; + } + } + + static async Task ExtractImages(string token, string filePath) + { + using (var client = new HttpClient()) + { + // Prepare request + client.DefaultRequestHeaders.Add("Authorization", $"Bearer {token}"); + + var requestBody = new + { + FileInfo = new + { + FilePath = filePath, + StorageName = "" + } + }; + + var content = new StringContent(JsonConvert.SerializeObject(requestBody), Encoding.UTF8, "application/json"); + + // Send request + var response = await client.PostAsync("https://api.groupdocs.cloud/v1.0/parser/images", content); + + // Process 
response + var jsonString = await response.Content.ReadAsStringAsync(); + Console.WriteLine("API Response:"); + Console.WriteLine(jsonString); + + // Parse the response + var result = JsonConvert.DeserializeObject<ImageExtractionResponse>(jsonString); + + if (result.Images != null && result.Images.Count > 0) + { + Console.WriteLine($"Extracted {result.Images.Count} images."); + + // Download the first image as an example + if (result.Images.Count > 0) + { + var imageUrl = result.Images[0].DownloadUrl; + await DownloadImage(client, imageUrl, "extracted_image_0.jpeg"); + } + } + else + { + Console.WriteLine("No images found in the document."); + } + } + } + + static async Task DownloadImage(HttpClient client, string imageUrl, string localPath) + { + Console.WriteLine($"Downloading image from: {imageUrl}"); + + var imageBytes = await client.GetByteArrayAsync(imageUrl); + File.WriteAllBytes(localPath, imageBytes); + + Console.WriteLine($"Image downloaded to: {localPath}"); + } + + class ImageExtractionResponse + { + public List<ImageInfo> Images { get; set; } + } + + class ImageInfo + { + public string Path { get; set; } + public string DownloadUrl { get; set; } + } + } +} +``` + +### Java Example + +```java +import java.io.FileOutputStream; +import java.io.IOException; +import java.io.InputStream; +import java.io.OutputStream; +import java.net.HttpURLConnection; +import java.net.URL; +import java.nio.charset.StandardCharsets; +import java.util.Scanner; +import org.json.JSONArray; +import org.json.JSONObject; + +public class ExtractImagesTutorial { + + private static final String BASE_URL = "https://api.groupdocs.cloud/v1.0/parser"; + private static final String AUTH_URL = "https://api.groupdocs.cloud/connect/token"; + + public static void main(String[] args) throws IOException { + // Get your ClientID and ClientSecret from https://dashboard.groupdocs.cloud + String clientId = "YOUR_CLIENT_ID"; + String clientSecret = "YOUR_CLIENT_SECRET"; + + // Get JWT token + String token = getAuthToken(clientId, 
clientSecret); + + // Extract images from the document + extractImages(token, "documents/document-with-images.pdf"); + } + + private static String getAuthToken(String clientId, String clientSecret) throws IOException { + URL url = new URL(AUTH_URL); + HttpURLConnection conn = (HttpURLConnection) url.openConnection(); + conn.setRequestMethod("POST"); + conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded"); + conn.setDoOutput(true); + + String requestBody = "grant_type=client_credentials&client_id=" + clientId + "&client_secret=" + clientSecret; + try (OutputStream os = conn.getOutputStream()) { + os.write(requestBody.getBytes(StandardCharsets.UTF_8)); + } + + try (Scanner scanner = new Scanner(conn.getInputStream(), StandardCharsets.UTF_8.name())) { + String jsonResponse = scanner.useDelimiter("\\A").next(); + JSONObject jsonObject = new JSONObject(jsonResponse); + return jsonObject.getString("access_token"); + } + } + + private static void extractImages(String token, String filePath) throws IOException { + URL url = new URL(BASE_URL + "/images"); + HttpURLConnection conn = (HttpURLConnection) url.openConnection(); + conn.setRequestMethod("POST"); + conn.setRequestProperty("Content-Type", "application/json"); + conn.setRequestProperty("Accept", "application/json"); + conn.setRequestProperty("Authorization", "Bearer " + token); + conn.setDoOutput(true); + + String requestBody = "{\"FileInfo\":{\"FilePath\":\"" + filePath + "\",\"StorageName\":\"\"}}"; + try (OutputStream os = conn.getOutputStream()) { + os.write(requestBody.getBytes(StandardCharsets.UTF_8)); + } + + try (Scanner scanner = new Scanner(conn.getInputStream(), StandardCharsets.UTF_8.name())) { + String jsonResponse = scanner.useDelimiter("\\A").next(); + System.out.println("API Response:"); + System.out.println(jsonResponse); + + // Parse the response + JSONObject responseObj = new JSONObject(jsonResponse); + + if (responseObj.has("images")) { + JSONArray images = 
responseObj.getJSONArray("images"); + System.out.println("Extracted " + images.length() + " images."); + + // Download the first image as an example + if (images.length() > 0) { + JSONObject firstImage = images.getJSONObject(0); + String downloadUrl = firstImage.getString("downloadUrl"); + downloadImage(token, downloadUrl, "extracted_image_0.jpeg"); + } + } else { + System.out.println("No images found in the document."); + } + } + } + + private static void downloadImage(String token, String imageUrl, String localPath) throws IOException { + System.out.println("Downloading image from: " + imageUrl); + + URL url = new URL(imageUrl); + HttpURLConnection conn = (HttpURLConnection) url.openConnection(); + conn.setRequestMethod("GET"); + conn.setRequestProperty("Authorization", "Bearer " + token); + + try (InputStream is = conn.getInputStream(); + FileOutputStream fos = new FileOutputStream(localPath)) { + + byte[] buffer = new byte[4096]; + int bytesRead; + + while ((bytesRead = is.read(buffer)) != -1) { + fos.write(buffer, 0, bytesRead); + } + } + + System.out.println("Image downloaded to: " + localPath); + } +} +``` + +## What You Can Do with Extracted Images + +Once you've extracted the images, you can: + +1. Display them in your application: Use the download URLs to display the images in your web or mobile application. + +2. Process them for analysis: Download the images and process them using image analysis libraries for tasks like OCR, object detection, or facial recognition. + +3. Store them separately: Save the images to a dedicated image storage system or database for better organization and retrieval. + +4. Create image galleries: Build image galleries that showcase all the visuals contained in a document. + +## Common Issues and Troubleshooting + +- No Images Extracted: If no images are returned, ensure that the document actually contains embedded images. Some documents might have vector graphics or other non-extractable visual elements. 
+ +- Image Quality: The extracted images are saved as JPEG files, which might result in some quality loss for certain types of images. If you need lossless extraction, you might need to use a different approach. + +- Access Denied: Ensure that you're using a valid token and that you have the necessary permissions to access the document in storage. + +## What You've Learned + +In this tutorial, you've learned: +- How to authenticate with the GroupDocs.Parser Cloud API +- How to extract all images from a document +- How to download and process the extracted images +- How to implement this functionality in C# and Java + +## Next Steps + +Now that you've mastered extracting images from an entire document, you can: +- Follow our tutorial on [Extracting Images by Page Number Range](/parse-operations/extract-images-page-number-range/) to learn how to extract images from specific pages +- Learn about combining [Text Extraction](/parse-operations/extract-text-whole-document/) with image extraction for comprehensive document processing + +## Further Practice + +Try extracting images from various document formats (PDF, DOCX, PPTX, etc.) to understand how the API handles different document structures. Experiment with combining multiple API operations to build a more comprehensive document processing solution. 
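When experimenting with different source formats, it can be useful to confirm what kind of image a downloaded file really contains before passing it to other tools. The sketch below is only an illustration: `ImageTypeSniffer` and `detect` are hypothetical names, not part of the GroupDocs API, and the default file name is simply the one used in the examples above. It inspects the well-known leading signature bytes of JPEG, PNG, and GIF files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper (not a GroupDocs class): guesses an image type
// from the leading signature bytes of a downloaded file.
final class ImageTypeSniffer {

    static String detect(byte[] data) {
        if (startsWith(data, 0xFF, 0xD8, 0xFF)) {
            return "jpeg";
        }
        if (startsWith(data, 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A)) {
            return "png";
        }
        if (startsWith(data, 0x47, 0x49, 0x46, 0x38)) { // ASCII "GIF8"
            return "gif";
        }
        return "unknown";
    }

    // Compares the first bytes of data with the expected signature values.
    private static boolean startsWith(byte[] data, int... signature) {
        if (data == null || data.length < signature.length) {
            return false;
        }
        for (int i = 0; i < signature.length; i++) {
            if ((data[i] & 0xFF) != signature[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Assumes the file was already downloaded, e.g. by the examples above.
        String path = args.length > 0 ? args[0] : "extracted_image_0.jpeg";
        byte[] bytes = Files.readAllBytes(Paths.get(path));
        System.out.println(path + " looks like: " + detect(bytes));
    }
}
```

A check like this runs entirely on the local file, so it does not depend on any particular response field or file extension.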
+ +## Helpful Resources + +- [Product Page](https://products.groupdocs.cloud/parser/) +- [Documentation](https://docs.groupdocs.cloud/parser/) +- [Live Demo](https://products.groupdocs.app/parser/family) +- [API Reference](https://reference.groupdocs.cloud/parser/) +- [Blog](https://blog.groupdocs.cloud/categories/groupdocs.parser-cloud-product-family/) +- [Free Support](https://forum.groupdocs.cloud/c/parser/19/) +- [Free Trial](https://dashboard.groupdocs.cloud/#/apps) diff --git a/content/parser/english/parse-operations/extract-text-page-number-range/extract-text-page-number-range-tutorial.md b/content/parser/english/parse-operations/extract-text-page-number-range/extract-text-page-number-range-tutorial.md new file mode 100644 index 0000000..4110e23 --- /dev/null +++ b/content/parser/english/parse-operations/extract-text-page-number-range/extract-text-page-number-range-tutorial.md @@ -0,0 +1,339 @@ +--- +title: How to Extract Text by a Page Number Range Tutorial +url: /parse-operations/extract-text-page-number-range/ +weight: 2 +description: Learn how to extract text from specific pages in documents using GroupDocs.Parser Cloud API in this step-by-step tutorial for developers +--- + +# Tutorial: How to Extract Text by a Page Number Range + +## Learning Objectives + +In this tutorial, you'll learn how to: +- Extract text from specific pages in a document +- Define page ranges for targeted text extraction +- Process page-specific text in different programming languages + +## Prerequisites + +Before starting this tutorial, make sure you have: + +1. A GroupDocs.Parser Cloud account (if you don't have one, [register for a free trial](https://dashboard.groupdocs.cloud/#/apps)) +2. Your Client ID and Client Secret (available from the [dashboard](https://dashboard.groupdocs.cloud/#/apps)) +3. 
A multi-page document uploaded to your cloud storage + +## The Practical Scenario + +Imagine you're building an application that needs to: +- Extract text from specific sections of a long document +- Process only relevant pages instead of the entire document +- Allow users to navigate through document content page by page + +This tutorial will show you how to implement this functionality step by step. + +## Step 1: Obtain Authorization Token + +Before making any API calls, you need to authenticate with the GroupDocs API using your Client ID and Client Secret. + +```bash +# First get JSON Web Token +curl -v "https://api.groupdocs.cloud/connect/token" \ +-X POST \ +-d "grant_type=client_credentials&client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET" \ +-H "Content-Type: application/x-www-form-urlencoded" \ +-H "Accept: application/json" +``` + +This will return a JWT token that you'll use in subsequent requests. + +## Step 2: Prepare Your API Request + +To extract text from specific pages, you'll make a POST request to the text endpoint with the following parameters: + +```bash +curl -v "https://api.groupdocs.cloud/v1.0/parser/text" \ +-X POST \ +-H "Content-Type: application/json" \ +-H "Accept: application/json" \ +-H "Authorization: Bearer YOUR_JWT_TOKEN" \ +-d "{ + \"StartPageNumber\": 0, + \"CountPagesToExtract\": 4, + \"FileInfo\": { + \"FilePath\": \"pdf/pages-document.pdf\" + } +}" +``` + +The key parameters here are: +- `StartPageNumber`: The zero-based index of the first page to extract (0 = first page) +- `CountPagesToExtract`: The number of pages to extract starting from the start page + +## Step 3: Execute the Request and Process the Response + +When you execute the request, the API will return a JSON response containing the extracted text for each page: + +```json +{ + "pages": [ + { + "pageIndex": 0, + "text": "Text inside bookmark 0\r\n\r\n Page 0 heading\r\nP a g e T e x t - P a g e 0\r\n" + }, + { + "pageIndex": 1, + "text": "Text inside 
bookmark 1\r\n\r\n Page 1 heading\r\nP a g e T e x t - P a g e 1\r\n" + }, + { + "pageIndex": 2, + "text": "Text inside bookmark 2\r\n\r\n Page 2 heading\r\nP a g e T e x t - P a g e 2\r\n" + }, + { + "pageIndex": 3, + "text": "Text inside bookmark 3\r\n\r\n Page 3 heading\r\nP a g e T e x t - P a g e 3\r\n" + } + ] +} +``` + +Notice that the response includes the `pageIndex` for each extracted page, making it easy to identify which text belongs to which page. + +## Try It Yourself + +Now it's your turn to try extracting text from specific pages: + +1. Replace `YOUR_CLIENT_ID` and `YOUR_CLIENT_SECRET` with your actual credentials +2. Update the `FilePath` parameter to point to a multi-page document in your storage +3. Adjust the `StartPageNumber` and `CountPagesToExtract` parameters to extract different page ranges +4. Execute the curl command and observe how the response changes + +## Implementation in Different Languages + +### C# Example + +```csharp +using System; +using System.Collections.Generic; +using System.Net.Http; +using System.Text; +using System.Threading.Tasks; +using Newtonsoft.Json; + +namespace GroupDocsParserCloudTutorial +{ + class Program + { + static async Task Main(string[] args) + { + // Get your ClientID and ClientSecret from https://dashboard.groupdocs.cloud + string clientId = "YOUR_CLIENT_ID"; + string clientSecret = "YOUR_CLIENT_SECRET"; + + // Get JWT token + string token = await GetAuthToken(clientId, clientSecret); + + // Extract text from specific pages + await ExtractTextByPageRange(token, "pdf/pages-document.pdf", 0, 4); + } + + static async Task<string> GetAuthToken(string clientId, string clientSecret) + { + using (var client = new HttpClient()) + { + // Prepare request + var requestBody = $"grant_type=client_credentials&client_id={clientId}&client_secret={clientSecret}"; + var content = new StringContent(requestBody, Encoding.UTF8, "application/x-www-form-urlencoded"); + + // Send request + var response = await 
client.PostAsync("https://api.groupdocs.cloud/connect/token", content); + + // Process response + var jsonString = await response.Content.ReadAsStringAsync(); + var token = JsonConvert.DeserializeObject<Dictionary<string, string>>(jsonString); + + return token["access_token"]; + } + } + + static async Task ExtractTextByPageRange(string token, string filePath, int startPage, int pageCount) + { + using (var client = new HttpClient()) + { + // Prepare request + client.DefaultRequestHeaders.Add("Authorization", $"Bearer {token}"); + + var requestBody = new + { + StartPageNumber = startPage, + CountPagesToExtract = pageCount, + FileInfo = new + { + FilePath = filePath + } + }; + + var content = new StringContent(JsonConvert.SerializeObject(requestBody), Encoding.UTF8, "application/json"); + + // Send request + var response = await client.PostAsync("https://api.groupdocs.cloud/v1.0/parser/text", content); + + // Process response + var jsonString = await response.Content.ReadAsStringAsync(); + Console.WriteLine(jsonString); + + // You can process each page individually + var result = JsonConvert.DeserializeObject<PageTextResponse>(jsonString); + foreach (var page in result.Pages) + { + Console.WriteLine($"Page {page.PageIndex}:"); + Console.WriteLine(page.Text); + Console.WriteLine("--------------------"); + } + } + } + + class PageTextResponse + { + public List<PageData> Pages { get; set; } + } + + class PageData + { + public int PageIndex { get; set; } + public string Text { get; set; } + } + } +} +``` + +### Java Example + +```java +import java.io.IOException; +import java.io.OutputStream; +import java.net.HttpURLConnection; +import java.net.URL; +import java.nio.charset.StandardCharsets; +import java.util.Scanner; +import org.json.JSONArray; +import org.json.JSONObject; + +public class ExtractTextByPageRangeTutorial { + + private static final String BASE_URL = "https://api.groupdocs.cloud/v1.0/parser"; + private static final String AUTH_URL = "https://api.groupdocs.cloud/connect/token"; + + public static void 
main(String[] args) throws IOException { + // Get your ClientID and ClientSecret from https://dashboard.groupdocs.cloud + String clientId = "YOUR_CLIENT_ID"; + String clientSecret = "YOUR_CLIENT_SECRET"; + + // Get JWT token + String token = getAuthToken(clientId, clientSecret); + + // Extract text from specific pages + extractTextByPageRange(token, "pdf/pages-document.pdf", 0, 4); + } + + private static String getAuthToken(String clientId, String clientSecret) throws IOException { + URL url = new URL(AUTH_URL); + HttpURLConnection conn = (HttpURLConnection) url.openConnection(); + conn.setRequestMethod("POST"); + conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded"); + conn.setDoOutput(true); + + String requestBody = "grant_type=client_credentials&client_id=" + clientId + "&client_secret=" + clientSecret; + try (OutputStream os = conn.getOutputStream()) { + os.write(requestBody.getBytes(StandardCharsets.UTF_8)); + } + + try (Scanner scanner = new Scanner(conn.getInputStream(), StandardCharsets.UTF_8.name())) { + String jsonResponse = scanner.useDelimiter("\\A").next(); + JSONObject jsonObject = new JSONObject(jsonResponse); + return jsonObject.getString("access_token"); + } + } + + private static void extractTextByPageRange(String token, String filePath, int startPage, int pageCount) throws IOException { + URL url = new URL(BASE_URL + "/text"); + HttpURLConnection conn = (HttpURLConnection) url.openConnection(); + conn.setRequestMethod("POST"); + conn.setRequestProperty("Content-Type", "application/json"); + conn.setRequestProperty("Accept", "application/json"); + conn.setRequestProperty("Authorization", "Bearer " + token); + conn.setDoOutput(true); + + String requestBody = String.format( + "{\"StartPageNumber\":%d,\"CountPagesToExtract\":%d,\"FileInfo\":{\"FilePath\":\"%s\"}}", + startPage, pageCount, filePath + ); + + try (OutputStream os = conn.getOutputStream()) { + os.write(requestBody.getBytes(StandardCharsets.UTF_8)); + } + + try 
(Scanner scanner = new Scanner(conn.getInputStream(), StandardCharsets.UTF_8.name())) { + String jsonResponse = scanner.useDelimiter("\\A").next(); + System.out.println(jsonResponse); + + // Process each page individually + JSONObject responseObj = new JSONObject(jsonResponse); + JSONArray pages = responseObj.getJSONArray("pages"); + + for (int i = 0; i < pages.length(); i++) { + JSONObject page = pages.getJSONObject(i); + int pageIndex = page.getInt("pageIndex"); + String text = page.getString("text"); + + System.out.println("Page " + pageIndex + ":"); + System.out.println(text); + System.out.println("--------------------"); + } + } + } +} +``` + +## Learning Checkpoint + +Take a moment to test your understanding: + +1. What is the purpose of the `StartPageNumber` parameter? +2. If you want to extract pages 5-10 of a document, what values should you use for `StartPageNumber` and `CountPagesToExtract`? +3. How would you modify the request to extract only the last page of a document if you don't know how many pages it has? + +## Common Issues and Troubleshooting + +- Page Range Out of Bounds: If you specify a page range that's outside the document's boundaries, you'll receive an error. Make sure your `StartPageNumber` and `CountPagesToExtract` parameters are valid for your document. +- Zero Pages Returned: Ensure that the document actually has content on the specified pages. Some documents might have blank pages that return empty text. +- Different Page Counts: Remember that page numbering starts from 0 in the API, but may start from 1 in the document viewer. 
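Because the API counts pages from 0 while most document viewers count from 1, a small conversion helper can prevent off-by-one mistakes. The sketch below is illustrative only: `PageRange` and its methods are hypothetical names, not part of any GroupDocs SDK. It reuses only the `StartPageNumber`, `CountPagesToExtract`, and `FilePath` fields shown in Step 2, and, like the Java example above, it does not JSON-escape the file path.

```java
// Hypothetical helper (not a GroupDocs class): converts a 1-based, inclusive
// page range, as shown in a document viewer, into the zero-based
// StartPageNumber / CountPagesToExtract pair used by the text endpoint.
final class PageRange {

    final int startPageNumber;
    final int countPagesToExtract;

    private PageRange(int startPageNumber, int countPagesToExtract) {
        this.startPageNumber = startPageNumber;
        this.countPagesToExtract = countPagesToExtract;
    }

    static PageRange fromOneBased(int firstPage, int lastPage) {
        if (firstPage < 1 || lastPage < firstPage) {
            throw new IllegalArgumentException(
                "Expected 1 <= firstPage <= lastPage, got " + firstPage + ".." + lastPage);
        }
        return new PageRange(firstPage - 1, lastPage - firstPage + 1);
    }

    // Builds the JSON request body; filePath is assumed to need no escaping.
    String toRequestBody(String filePath) {
        return "{\"StartPageNumber\":" + startPageNumber
            + ",\"CountPagesToExtract\":" + countPagesToExtract
            + ",\"FileInfo\":{\"FilePath\":\"" + filePath + "\"}}";
    }

    public static void main(String[] args) {
        // Viewer pages 3 through 7 map to start index 2 and a count of 5.
        PageRange range = PageRange.fromOneBased(3, 7);
        System.out.println(range.toRequestBody("pdf/pages-document.pdf"));
    }
}
```

Validating the range on the client side does not replace the API's own bounds checking, since the helper has no knowledge of how many pages the document contains.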
+ +## What You've Learned + +In this tutorial, you've learned: +- How to extract text from specific pages in a document +- How to specify page ranges using `StartPageNumber` and `CountPagesToExtract` +- How to process the page-specific text in your application + +## Next Steps + +Now that you know how to extract text from specific pages, you can: +- Explore [Extracting Formatted Text](/parse-operations/extract-formatted-text) to preserve document formatting + +## Further Practice + +Try creating an application that: +1. First determines the total number of pages in a document +2. Extracts text page by page using a loop +3. Performs analysis on each page (such as word count, keyword detection, etc.) +4. Compiles a summary report of the document's content structure + +## Helpful Resources + +- [Product Page](https://products.groupdocs.cloud/parser/) +- [Documentation](https://docs.groupdocs.cloud/parser/) +- [Live Demo](https://products.groupdocs.app/parser/family) +- [API Reference](https://reference.groupdocs.cloud/parser/) +- [Blog](https://blog.groupdocs.cloud/categories/groupdocs.parser-cloud-product-family/) +- [Free Support](https://forum.groupdocs.cloud/c/parser/19/) +- [Free Trial](https://dashboard.groupdocs.cloud/#/apps) diff --git a/content/parser/english/parse-operations/extract-text-whole-document/extract-text-whole-document-tutorial.md b/content/parser/english/parse-operations/extract-text-whole-document/extract-text-whole-document-tutorial.md new file mode 100644 index 0000000..d74bf33 --- /dev/null +++ b/content/parser/english/parse-operations/extract-text-whole-document/extract-text-whole-document-tutorial.md @@ -0,0 +1,264 @@ +--- +title: How to Extract Text from the Whole Document Tutorial +url: /parse-operations/extract-text-whole-document/ +weight: 1 +description: Learn how to extract all text content from documents using GroupDocs.Parser Cloud API in this step-by-step tutorial for developers +--- + +# Tutorial: How to Extract Text from the 
Whole Document + +## Learning Objectives + +In this tutorial, you'll learn how to: +- Set up the GroupDocs.Parser Cloud API for text extraction +- Extract text from an entire document with a simple API call +- Process the extracted text in different programming languages + +## Prerequisites + +Before starting this tutorial, make sure you have: + +1. A GroupDocs.Parser Cloud account (if you don't have one, [register for a free trial](https://dashboard.groupdocs.cloud/#/apps)) +2. Your Client ID and Client Secret (available from the [dashboard](https://dashboard.groupdocs.cloud/#/apps)) +3. At least one document uploaded to your cloud storage + +## The Practical Scenario + +Imagine you need to extract all text from a document to: +- Create a searchable database of document content +- Perform analysis on document text +- Enable full-text search functionality in your application + +This tutorial will show you how to implement this functionality step by step. + +## Step 1: Obtain Authorization Token + +Before making any API calls, you need to authenticate with the GroupDocs API using your Client ID and Client Secret. + +```bash +# First get JSON Web Token +curl -v "https://api.groupdocs.cloud/connect/token" \ +-X POST \ +-d "grant_type=client_credentials&client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET" \ +-H "Content-Type: application/x-www-form-urlencoded" \ +-H "Accept: application/json" +``` + +This will return a JWT token that you'll use in subsequent requests. 
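If a Client ID or Client Secret ever contains characters such as `&`, `=`, or spaces, the form body sent to the token endpoint needs URL encoding. The following is a minimal sketch under that assumption: `TokenRequest` and `formBody` are hypothetical names introduced for illustration, and the code assumes Java 10 or later for `URLEncoder.encode(String, Charset)`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not a GroupDocs class): builds the
// application/x-www-form-urlencoded body for the token request in Step 1.
final class TokenRequest {

    static String formBody(String clientId, String clientSecret) {
        return "grant_type=client_credentials"
            + "&client_id=" + encode(clientId)
            + "&client_secret=" + encode(clientSecret);
    }

    // Form-encodes a single value (spaces become '+', reserved characters become %XX).
    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Placeholders only; substitute the credentials from your dashboard.
        System.out.println(formBody("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET"));
    }
}
```

The resulting string can be written to the request stream in place of a hand-concatenated body.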
+ +## Step 2: Prepare Your API Request + +To extract text from a document, you'll make a POST request to the text endpoint with the following parameters: + +```bash +curl -v "https://api.groupdocs.cloud/v1.0/parser/text" \ +-X POST \ +-H "Content-Type: application/json" \ +-H "Accept: application/json" \ +-H "Authorization: Bearer YOUR_JWT_TOKEN" \ +-d "{ + \"FileInfo\": { + \"FilePath\": \"words/docx/document.docx\" + } +}" +``` + +Note that you only need to specify the `FilePath` parameter to extract text from the entire document. + +## Step 3: Execute the Request and Process the Response + +When you execute the request, the API will return a JSON response containing the extracted text: + +```json +{ + "text": "First Page\r\r\f" +} +``` + +The text is returned in a simple format, with page breaks represented by the `\f` character. + +## Try It Yourself + +Now it's your turn to try extracting text from your own document: + +1. Replace `YOUR_CLIENT_ID` and `YOUR_CLIENT_SECRET` with your actual credentials +2. Update the `FilePath` parameter to point to a document in your storage +3. 
Execute the curl command and observe the response + +## Implementation in Different Languages + +### C# Example + +```csharp +using System; +using System.Collections.Generic; +using System.Net.Http; +using System.Text; +using System.Threading.Tasks; +using Newtonsoft.Json; + +namespace GroupDocsParserCloudTutorial +{ + class Program + { + static async Task Main(string[] args) + { + // Get your ClientID and ClientSecret from https://dashboard.groupdocs.cloud + string clientId = "YOUR_CLIENT_ID"; + string clientSecret = "YOUR_CLIENT_SECRET"; + + // Get JWT token + string token = await GetAuthToken(clientId, clientSecret); + + // Extract text from the document + await ExtractText(token, "words/docx/document.docx"); + } + + static async Task<string> GetAuthToken(string clientId, string clientSecret) + { + using (var client = new HttpClient()) + { + // Prepare request + var requestBody = $"grant_type=client_credentials&client_id={clientId}&client_secret={clientSecret}"; + var content = new StringContent(requestBody, Encoding.UTF8, "application/x-www-form-urlencoded"); + + // Send request + var response = await client.PostAsync("https://api.groupdocs.cloud/connect/token", content); + + // Process response + var jsonString = await response.Content.ReadAsStringAsync(); + var token = JsonConvert.DeserializeObject<Dictionary<string, string>>(jsonString); + + return token["access_token"]; + } + } + + static async Task ExtractText(string token, string filePath) + { + using (var client = new HttpClient()) + { + // Prepare request + client.DefaultRequestHeaders.Add("Authorization", $"Bearer {token}"); + + var requestBody = new + { + FileInfo = new + { + FilePath = filePath + } + }; + + var content = new StringContent(JsonConvert.SerializeObject(requestBody), Encoding.UTF8, "application/json"); + + // Send request + var response = await client.PostAsync("https://api.groupdocs.cloud/v1.0/parser/text", content); + + // Process response + var jsonString = await response.Content.ReadAsStringAsync(); + 
Console.WriteLine(jsonString); + } + } + } +} +``` + +### Java Example + +```java +import java.io.IOException; +import java.io.OutputStream; +import java.net.HttpURLConnection; +import java.net.URL; +import java.nio.charset.StandardCharsets; +import java.util.Scanner; +import org.json.JSONObject; + +public class ExtractTextTutorial { + + private static final String BASE_URL = "https://api.groupdocs.cloud/v1.0/parser"; + private static final String AUTH_URL = "https://api.groupdocs.cloud/connect/token"; + + public static void main(String[] args) throws IOException { + // Get your ClientID and ClientSecret from https://dashboard.groupdocs.cloud + String clientId = "YOUR_CLIENT_ID"; + String clientSecret = "YOUR_CLIENT_SECRET"; + + // Get JWT token + String token = getAuthToken(clientId, clientSecret); + + // Extract text from the document + extractText(token, "words/docx/document.docx"); + } + + private static String getAuthToken(String clientId, String clientSecret) throws IOException { + URL url = new URL(AUTH_URL); + HttpURLConnection conn = (HttpURLConnection) url.openConnection(); + conn.setRequestMethod("POST"); + conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded"); + conn.setDoOutput(true); + + String requestBody = "grant_type=client_credentials&client_id=" + clientId + "&client_secret=" + clientSecret; + try (OutputStream os = conn.getOutputStream()) { + os.write(requestBody.getBytes(StandardCharsets.UTF_8)); + } + + try (Scanner scanner = new Scanner(conn.getInputStream(), StandardCharsets.UTF_8.name())) { + String jsonResponse = scanner.useDelimiter("\\A").next(); + JSONObject jsonObject = new JSONObject(jsonResponse); + return jsonObject.getString("access_token"); + } + } + + private static void extractText(String token, String filePath) throws IOException { + URL url = new URL(BASE_URL + "/text"); + HttpURLConnection conn = (HttpURLConnection) url.openConnection(); + conn.setRequestMethod("POST"); + 
conn.setRequestProperty("Content-Type", "application/json"); + conn.setRequestProperty("Accept", "application/json"); + conn.setRequestProperty("Authorization", "Bearer " + token); + conn.setDoOutput(true); + + String requestBody = "{\"FileInfo\":{\"FilePath\":\"" + filePath + "\"}}"; + try (OutputStream os = conn.getOutputStream()) { + os.write(requestBody.getBytes(StandardCharsets.UTF_8)); + } + + try (Scanner scanner = new Scanner(conn.getInputStream(), StandardCharsets.UTF_8.name())) { + String jsonResponse = scanner.useDelimiter("\\A").next(); + System.out.println(jsonResponse); + } + } +} +``` + +## Common Issues and Troubleshooting + +- Authentication Error: If you receive a 401 Unauthorized error, check that your Client ID and Client Secret are correct and that your token hasn't expired. +- File Not Found: Ensure the specified file path is correct and the file exists in your storage. +- Empty Text Response: Some document formats may not contain extractable text. Try with a different document format. + +## What You've Learned + +In this tutorial, you've learned: +- How to authenticate with the GroupDocs.Parser Cloud API +- How to extract text from an entire document +- How to implement this functionality in C# and Java + +## Next Steps + +Now that you've mastered extracting text from an entire document, you can: +- Follow our tutorial on [Extracting Text by Page Number Range](/parse-operations/extract-text-page-number-range) to learn how to extract text from specific pages +- Explore how to [Extract Formatted Text](/parse-operations/extract-formatted-text) to preserve document formatting +## Further Practice + +Try extracting text from various document formats (PDF, DOCX, XLSX, etc.) to understand how the API handles different document structures. Experiment with combining multiple API operations to build a more comprehensive document processing solution. 
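One simple experiment with the response: since Step 3 notes that page breaks in the `text` value are represented by the `\f` character, the extracted text can be split back into per-page strings. This is a sketch only; `PageSplitter` and `splitPages` are hypothetical names, and it assumes the `text` value has already been read out of the JSON response as a Java `String`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not a GroupDocs class): splits whole-document text
// into pages using the form feed character as the page separator.
final class PageSplitter {

    static List<String> splitPages(String text) {
        // A limit of -1 keeps trailing empty strings so they can be handled explicitly.
        List<String> pages = new ArrayList<>(Arrays.asList(text.split("\f", -1)));

        // A form feed at the very end closes the last page rather than opening
        // a new one, so drop the empty remainder it produces.
        if (!pages.isEmpty() && pages.get(pages.size() - 1).isEmpty()) {
            pages.remove(pages.size() - 1);
        }
        return pages;
    }

    public static void main(String[] args) {
        // Sample value taken from the response shown in Step 3.
        List<String> pages = splitPages("First Page\r\r\f");
        System.out.println("Pages found: " + pages.size());
    }
}
```

Whether every format emits exactly one `\f` per page break is something to verify against your own documents.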
+ +## Helpful Resources + +- [Product Page](https://products.groupdocs.cloud/parser/) +- [Documentation](https://docs.groupdocs.cloud/parser/) +- [Live Demo](https://products.groupdocs.app/parser/family) +- [API Reference](https://reference.groupdocs.cloud/parser/) +- [Blog](https://blog.groupdocs.cloud/categories/groupdocs.parser-cloud-product-family/) +- [Free Support](https://forum.groupdocs.cloud/c/parser/19/) +- [Free Trial](https://dashboard.groupdocs.cloud/#/apps) diff --git a/content/parser/english/storage-operations/_index.md b/content/parser/english/storage-operations/_index.md new file mode 100644 index 0000000..c74bc82 --- /dev/null +++ b/content/parser/english/storage-operations/_index.md @@ -0,0 +1,53 @@ +--- +title: GroupDocs.Parser Cloud API Document Storage Operations Tutorials +url: /storage-operations/ +weight: 4 +description: Learn how to manage document storage operations with these step-by-step tutorials for GroupDocs.Parser Cloud API +--- + +# GroupDocs.Parser Cloud API Storage Operations Tutorials + +Welcome to our comprehensive tutorial series on Storage Operations with GroupDocs.Parser Cloud API. These hands-on tutorials are designed specifically for developers who want to efficiently manage documents in cloud storage using GroupDocs.Parser. + +## Learning Path + +This tutorial series is structured to take you from basic to advanced storage operations. We recommend following the tutorials in the order presented below to build your knowledge progressively: + +1. [Tutorial: Getting Started with Storage Management](/storage-operations/working-with-storage/) - Learn the fundamentals of connecting to and managing your cloud storage. +2. [Tutorial: Learn to Work with Files in Cloud Storage](/storage-operations/working-with-files/) - Master the essential operations for uploading, downloading, moving, and copying files. +3. 
[Tutorial: How to Work with Folders in Cloud Storage](/storage-operations/working-with-folder/) - Discover techniques for creating folder structures and managing document organization. + +## What You'll Learn + +By completing these tutorials, you'll gain hands-on experience with: + +- Setting up and configuring GroupDocs.Parser Cloud API for storage operations +- Implementing file upload and download functionality +- Managing document versions and storage +- Creating and navigating folder structures +- Moving and copying files between storage locations +- Checking storage space usage and availability +- Implementing best practices for cloud storage management + +## Prerequisites + +Before starting these tutorials, please ensure you have: + +- A GroupDocs Cloud account ([sign up for free](https://dashboard.groupdocs.cloud/#/apps)) +- Basic understanding of REST APIs and HTTP requests +- Familiarity with your preferred programming language (C#, Java, or Python) +- API credentials (Client ID and Client Secret) + +## Estimated Time to Complete + +Each tutorial takes approximately 30-45 minutes to complete, depending on your prior experience with cloud APIs. 
+ +## Helpful Resources + +- [Product Page](https://products.groupdocs.cloud/parser/) +- [Documentation](https://docs.groupdocs.cloud/parser/) +- [Live Demo](https://products.groupdocs.app/parser/family) +- [API Reference](https://reference.groupdocs.cloud/parser/) +- [Blog](https://blog.groupdocs.cloud/categories/groupdocs.parser-cloud-product-family/) +- [Free Support](https://forum.groupdocs.cloud/c/parser/19/) +- [Free Trial](https://dashboard.groupdocs.cloud/#/apps) diff --git a/content/parser/english/storage-operations/working-with-files/working-with-files-tutorial.md b/content/parser/english/storage-operations/working-with-files/working-with-files-tutorial.md new file mode 100644 index 0000000..7f96ab3 --- /dev/null +++ b/content/parser/english/storage-operations/working-with-files/working-with-files-tutorial.md @@ -0,0 +1,470 @@ +--- +title: Learn to Work with Files in GroupDocs.Parser Cloud Tutorial +url: /storage-operations/working-with-files/ +weight: 3 +description: Master file operations in cloud storage with this comprehensive tutorial on uploading, downloading, copying, moving, and deleting files using GroupDocs.Parser Cloud API. +--- + +# Tutorial: Learn to Work with Files in GroupDocs.Parser Cloud + +## Learning Objectives + +In this tutorial, you'll learn how to: +- Download files from cloud storage +- Upload files to cloud storage +- Delete files from storage +- Copy files between locations +- Move files to different paths + +## Prerequisites + +Before you begin this tutorial, you need: + +1. A GroupDocs Cloud account (if you don't have one, [register for free](https://dashboard.groupdocs.cloud/#/apps)) +2. Your Client ID and Client Secret (available from the [Dashboard](https://dashboard.groupdocs.cloud/#/apps)) +3. Basic knowledge of REST APIs +4. Familiarity with your preferred programming language (C#, Java, or cURL) + +## Introduction + +Efficient file management is essential when working with documents in the cloud. 
GroupDocs.Parser Cloud API provides comprehensive file operations that let you upload, download, copy, move, and delete files in your cloud storage. In this tutorial, we'll explore how to perform these essential file operations to manage your documents effectively. + +## Step 1: Setting Up Your Environment + +Before working with file operations, you need to set up authentication to access the GroupDocs.Parser Cloud API. + +### Try it yourself + +Create a new project in your preferred development environment and add the required dependencies: + +For C#: +```csharp +// Install via NuGet: +// Install-Package GroupDocs.Parser-Cloud +``` + +For Java: +```java +// Add to pom.xml: +// <dependency> +//     <groupId>com.groupdocs</groupId> +//     <artifactId>groupdocs-parser-cloud</artifactId> +//     <version>latest-version</version> +// </dependency> +``` + +Then authenticate with your Client ID and Client Secret: + +```csharp +// C# authentication example +string ClientId = "your-client-id"; +string ClientSecret = "your-client-secret"; +var configuration = new Configuration(ClientId, ClientSecret); +``` + +## Step 2: Downloading Files from Cloud Storage + +Let's start by learning how to download files from your cloud storage. + +### Understanding the API + +The Download File API allows you to retrieve a file from your GroupDocs Cloud Storage to your local environment.
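As a rough illustration of how a download request URL is put together, here is a minimal Java sketch. The `StorageUrlSketch` class and its `downloadUrl` method are hypothetical helpers, not part of the GroupDocs SDK; the base address and the `storageName` query parameter follow the cURL example in this step, and the per-segment encoding is an assumption about how paths with spaces should be handled.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of the GroupDocs SDK): builds the REST URL
// for downloading a file, following the pattern of this tutorial's cURL call.
final class StorageUrlSketch {
    // Base address as used in the tutorial's cURL examples.
    static final String BASE = "https://api.groupdocs.cloud/v1.0/parser/storage/file/";

    static String downloadUrl(String filePath, String storageName) {
        // Encode each path segment separately so the "/" separators survive.
        StringBuilder encodedPath = new StringBuilder();
        for (String segment : filePath.split("/")) {
            if (segment.isEmpty()) {
                continue; // skip leading or doubled slashes
            }
            if (encodedPath.length() > 0) {
                encodedPath.append('/');
            }
            encodedPath.append(encode(segment));
        }
        return BASE + encodedPath + "?storageName=" + encode(storageName);
    }

    private static String encode(String value) {
        // URLEncoder produces form encoding, so turn its "+" (space) into "%20".
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }

    public static void main(String[] args) {
        System.out.println(downloadUrl("parserdocs/sample.docx", "MyStorage"));
    }
}
```

Encoding segment by segment keeps folder separators readable while still escaping spaces and other reserved characters in file names.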
+ +### Implementation Example + +Here's how to download files using different methods: + +#### Using cURL + +```bash +curl -X GET "https://api.groupdocs.cloud/v1.0/parser/storage/file/parserdocs/sample.docx?storageName=MyStorage" \ +-H "accept: multipart/form-data" \ +-H "authorization: Bearer {access_token}" \ +-o "downloaded_sample.docx" +``` + +#### Using C# SDK + +```csharp +// Create FileApi instance +var apiInstance = new FileApi(configuration); + +// File path in the storage +string filePath = "parserdocs/sample.docx"; +// Storage name +string storageName = "MyStorage"; + +// Call API to download file +var response = apiInstance.DownloadFile(new DownloadFileRequest(filePath, storageName)); + +// Save file to local disk +string localFilePath = "downloaded_sample.docx"; +using (var fileStream = System.IO.File.Create(localFilePath)) +{ + response.CopyTo(fileStream); +} + +Console.WriteLine($"File downloaded to {localFilePath} successfully."); +``` + +#### Using Java SDK + +```java +// Create FileApi instance +FileApi apiInstance = new FileApi(configuration); + +// File path in the storage +String filePath = "parserdocs/sample.docx"; +// Storage name +String storageName = "MyStorage"; + +// Call API to download file +DownloadFileRequest request = new DownloadFileRequest(filePath, storageName); +File response = apiInstance.downloadFile(request); + +// Process the downloaded file +System.out.println("File downloaded successfully."); +System.out.println("File size: " + response.length() + " bytes"); +``` + +### Try it yourself + +1. Replace `{access_token}` with your actual token in the cURL example +2. Modify "parserdocs/sample.docx" to a file path in your storage +3. Run the code and check if the file is downloaded to your local environment + +## Step 3: Uploading Files to Cloud Storage + +Now let's learn how to upload files to your cloud storage. 
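Because uploading to a path that already exists replaces the stored file (as the Learning Checkpoint in this step notes), it can help to pick a destination path that is not yet taken. The sketch below shows one way to do that in plain Java. `UploadPathSketch`, `uniquePath`, and the `_1`, `_2` suffix scheme are illustrative assumptions; in practice the set of existing paths would come from a listing or existence check against your storage.

```java
import java.util.Set;

// Hypothetical helper: chooses an upload path that does not collide with
// paths already present in storage. Not part of the GroupDocs SDK.
final class UploadPathSketch {
    static String uniquePath(String desiredPath, Set<String> existingPaths) {
        if (!existingPaths.contains(desiredPath)) {
            return desiredPath; // nothing to overwrite
        }
        int slash = desiredPath.lastIndexOf('/');
        int dot = desiredPath.lastIndexOf('.');
        // Treat the dot as an extension separator only if it sits inside
        // the file name and is not the file name's first character.
        boolean hasExtension = dot > slash + 1;
        String stem = hasExtension ? desiredPath.substring(0, dot) : desiredPath;
        String extension = hasExtension ? desiredPath.substring(dot) : "";

        int counter = 1;
        String candidate;
        do {
            candidate = stem + "_" + counter + extension;
            counter++;
        } while (existingPaths.contains(candidate));
        return candidate;
    }

    public static void main(String[] args) {
        Set<String> existing = Set.of("parserdocs/uploaded_document.docx");
        System.out.println(uniquePath("parserdocs/uploaded_document.docx", existing));
    }
}
```

This only avoids collisions known at the time of the check; another client could still upload the same name in between.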
+ +### Implementation Example + +#### Using cURL + +```bash +curl -X POST "https://api.groupdocs.cloud/v1.0/parser/storage/file/parserdocs/uploaded_document.docx?storageName=MyStorage" \ +-H "accept: application/json" \ +-H "authorization: Bearer {access_token}" \ +-H "Content-Type: multipart/form-data" \ +-F "file=@local_document.docx" +``` + +#### Using C# SDK + +```csharp +// Create FileApi instance +var apiInstance = new FileApi(configuration); + +// Local file to upload +string localFilePath = "local_document.docx"; +// Destination path in storage +string uploadPath = "parserdocs/uploaded_document.docx"; +// Storage name +string storageName = "MyStorage"; + +// Prepare file content +var fileStream = System.IO.File.OpenRead(localFilePath); + +// Call API to upload file +var response = apiInstance.UploadFile(new UploadFileRequest(uploadPath, fileStream, storageName)); + +// Check the result +if (response.Uploaded != null && response.Uploaded.Count > 0) +{ + Console.WriteLine("File uploaded successfully:"); + foreach (var file in response.Uploaded) + { + Console.WriteLine(file); + } +} +else +{ + Console.WriteLine("File upload failed."); +} +``` + +#### Using Java SDK + +```java +// Create FileApi instance +FileApi apiInstance = new FileApi(configuration); + +// Local file to upload +String localFilePath = "local_document.docx"; +// Destination path in storage +String uploadPath = "parserdocs/uploaded_document.docx"; +// Storage name +String storageName = "MyStorage"; + +// Prepare file content +File fileToUpload = new File(localFilePath); +FileInputStream fileStream = new FileInputStream(fileToUpload); + +// Call API to upload file +UploadFileRequest request = new UploadFileRequest(uploadPath, fileStream, storageName); +FilesUploadResult response = apiInstance.uploadFile(request); + +// Check the result +if (response.getUploaded() != null && response.getUploaded().size() > 0) { + System.out.println("File uploaded successfully:"); + for (String file : 
response.getUploaded()) { + System.out.println(file); + } +} else { + System.out.println("File upload failed."); +} +``` + +### Expected Response + +```json +{ + "Uploaded": [ + "parserdocs/uploaded_document.docx" + ], + "Errors": [] +} +``` + +### Learning Checkpoint + +Question: What happens if you upload a file to a path that already exists in your storage? +Answer: If you upload a file to a path that already exists, the existing file will be overwritten with the new content. There is no automatic versioning, so make sure to use proper file naming or check for existence first if you want to prevent overwriting. + +## Step 4: Deleting Files from Storage + +Let's learn how to delete files from your cloud storage. + +### Implementation Example + +#### Using cURL + +```bash +curl -X DELETE "https://api.groupdocs.cloud/v1.0/parser/storage/file/parserdocs/document_to_delete.docx?storageName=MyStorage" \ +-H "accept: application/json" \ +-H "authorization: Bearer {access_token}" +``` + +#### Using C# SDK + +```csharp +// Create FileApi instance +var apiInstance = new FileApi(configuration); + +// File path to delete +string filePath = "parserdocs/document_to_delete.docx"; +// Storage name +string storageName = "MyStorage"; +// Version ID (optional) +string versionId = null; + +// Call API to delete file +apiInstance.DeleteFile(new DeleteFileRequest(filePath, storageName, versionId)); + +Console.WriteLine($"File '{filePath}' deleted successfully."); +``` + +#### Using Java SDK + +```java +// Create FileApi instance +FileApi apiInstance = new FileApi(configuration); + +// File path to delete +String filePath = "parserdocs/document_to_delete.docx"; +// Storage name +String storageName = "MyStorage"; +// Version ID (optional) +String versionId = null; + +// Call API to delete file +DeleteFileRequest request = new DeleteFileRequest(filePath, storageName, versionId); +apiInstance.deleteFile(request); + +System.out.println("File '" + filePath + "' deleted successfully."); 
+``` + +### Expected Response + +```json +{ + "Code": 200, + "Status": "OK" +} +``` + +## Step 5: Copying Files in Cloud Storage + +Next, let's learn how to copy files between locations in your cloud storage. + +### Implementation Example + +#### Using cURL + +```bash +curl -X PUT "https://api.groupdocs.cloud/v1.0/parser/storage/file/copy/parserdocs/source.docx?destPath=parserdocs/destination.docx&srcStorageName=MyStorage&destStorageName=MyStorage" \ +-H "accept: application/json" \ +-H "authorization: Bearer {access_token}" +``` + +#### Using C# SDK + +```csharp +// Create FileApi instance +var apiInstance = new FileApi(configuration); + +// Source file path +string srcPath = "parserdocs/source.docx"; +// Destination file path +string destPath = "parserdocs/destination.docx"; +// Source storage name +string srcStorageName = "MyStorage"; +// Destination storage name +string destStorageName = "MyStorage"; +// Version ID (optional) +string versionId = null; + +// Call API to copy file +apiInstance.CopyFile(new CopyFileRequest(srcPath, destPath, srcStorageName, destStorageName, versionId)); + +Console.WriteLine($"File copied from '{srcPath}' to '{destPath}' successfully."); +``` + +#### Using Java SDK + +```java +// Create FileApi instance +FileApi apiInstance = new FileApi(configuration); + +// Source file path +String srcPath = "parserdocs/source.docx"; +// Destination file path +String destPath = "parserdocs/destination.docx"; +// Source storage name +String srcStorageName = "MyStorage"; +// Destination storage name +String destStorageName = "MyStorage"; +// Version ID (optional) +String versionId = null; + +// Call API to copy file +CopyFileRequest request = new CopyFileRequest(srcPath, destPath, srcStorageName, destStorageName, versionId); +apiInstance.copyFile(request); + +System.out.println("File copied from '" + srcPath + "' to '" + destPath + "' successfully."); +``` + +### Expected Response + +```json +{ + "Code": 200, + "Status": "OK" +} +``` + +## Step 6: 
Moving Files in Cloud Storage + +Finally, let's learn how to move files between locations in your cloud storage. + +### Implementation Example + +#### Using cURL + +```bash +curl -X PUT "https://api.groupdocs.cloud/v1.0/parser/storage/file/move/parserdocs/source.docx?destPath=parserdocs/moved.docx&srcStorageName=MyStorage&destStorageName=MyStorage" \ +-H "accept: application/json" \ +-H "authorization: Bearer {access_token}" +``` + +#### Using C# SDK + +```csharp +// Create FileApi instance +var apiInstance = new FileApi(configuration); + +// Source file path +string srcPath = "parserdocs/source.docx"; +// Destination file path +string destPath = "parserdocs/moved.docx"; +// Source storage name +string srcStorageName = "MyStorage"; +// Destination storage name +string destStorageName = "MyStorage"; +// Version ID (optional) +string versionId = null; + +// Call API to move file +apiInstance.MoveFile(new MoveFileRequest(srcPath, destPath, srcStorageName, destStorageName, versionId)); + +Console.WriteLine($"File moved from '{srcPath}' to '{destPath}' successfully."); +``` + +#### Using Java SDK + +```java +// Create FileApi instance +FileApi apiInstance = new FileApi(configuration); + +// Source file path +String srcPath = "parserdocs/source.docx"; +// Destination file path +String destPath = "parserdocs/moved.docx"; +// Source storage name +String srcStorageName = "MyStorage"; +// Destination storage name +String destStorageName = "MyStorage"; +// Version ID (optional) +String versionId = null; + +// Call API to move file +MoveFileRequest request = new MoveFileRequest(srcPath, destPath, srcStorageName, destStorageName, versionId); +apiInstance.moveFile(request); + +System.out.println("File moved from '" + srcPath + "' to '" + destPath + "' successfully."); +``` + +### Expected Response + +```json +{ + "Code": 200, + "Status": "OK" +} +``` + +## Troubleshooting Tips + +If you encounter issues while working with file operations, consider these common solutions: + +1. 
File Not Found Errors: Ensure that the file path is correct and the file exists. Use the Object Exists API to check. +2. Permission Errors: Verify that your account has the necessary permissions for the specified storage. +3. Invalid File Types: Ensure that the file type is supported by GroupDocs.Parser Cloud. +4. Destination Already Exists: When copying or moving files, check if the destination already exists to avoid unexpected overwrites. + +## What You've Learned + +In this tutorial, you've learned how to: +- Download files from cloud storage to your local environment +- Upload local files to your cloud storage +- Delete files from your storage when they're no longer needed +- Copy files between different locations in the cloud +- Move files to reorganize your document storage + +These file operations are essential for effectively managing your documents in the cloud with GroupDocs.Parser. + +## Further Practice + +To reinforce your learning, try these exercises: +1. Create a batch file uploader that uploads multiple files at once +2. Build a simple file synchronization tool that compares local and cloud files +3. 
Implement a file backup system that copies important files to a backup folder + + +## Helpful Resources + +- [Product Page](https://products.groupdocs.cloud/parser/) +- [Documentation](https://docs.groupdocs.cloud/parser/) +- [API Reference](https://reference.groupdocs.cloud/parser/) +- [Free Support](https://forum.groupdocs.cloud/c/parser/19/) +- [Free Trial](https://dashboard.groupdocs.cloud/#/apps) diff --git a/content/parser/english/storage-operations/working-with-folder/working-with-folder-tutorial.md b/content/parser/english/storage-operations/working-with-folder/working-with-folder-tutorial.md new file mode 100644 index 0000000..473c8b9 --- /dev/null +++ b/content/parser/english/storage-operations/working-with-folder/working-with-folder-tutorial.md @@ -0,0 +1,464 @@ +--- +title: How to Work with Folders in GroupDocs.Parser Cloud Tutorial +url: /storage-operations/working-with-folder/ +weight: 2 +description: Learn how to create, list, copy, move, and delete folders in cloud storage with this step-by-step GroupDocs.Parser Cloud API tutorial. +--- + +# Tutorial: How to Work with Folders in GroupDocs.Parser Cloud + +## Learning Objectives + +In this tutorial, you'll learn how to: +- List all files within a specific folder +- Create new folders in cloud storage +- Delete folders from your storage +- Copy folders between storage locations +- Move folders to different paths + +## Prerequisites + +Before you begin this tutorial, you need: + +1. A GroupDocs Cloud account (if you don't have one, [register for free](https://dashboard.groupdocs.cloud/#/apps)) +2. Your Client ID and Client Secret (available from the [Dashboard](https://dashboard.groupdocs.cloud/#/apps)) +3. Basic knowledge of REST APIs +4. Familiarity with your preferred programming language (C#, Java, or cURL) + +## Introduction + +Organizing your documents in cloud storage is essential for efficient document management. 
GroupDocs.Parser Cloud API provides powerful folder operations to help you structure and manage your documents effectively. In this tutorial, we'll explore how to list files, create folders, and copy, move, or delete folders in your cloud storage. + +## Step 1: Setting Up Your Environment + +Before working with folder operations, you need to set up authentication to access the GroupDocs.Parser Cloud API. + +### Try it yourself + +Create a new project in your preferred development environment and add the required dependencies: + +For C#: +```csharp +// Install via NuGet: +// Install-Package GroupDocs.Parser-Cloud +``` + +For Java: +```java +// Add to pom.xml: +// <dependency> +//     <groupId>com.groupdocs</groupId> +//     <artifactId>groupdocs-parser-cloud</artifactId> +//     <version>latest-version</version> +// </dependency> +``` + +Then authenticate with your Client ID and Client Secret: + +```csharp +// C# authentication example +string ClientId = "your-client-id"; +string ClientSecret = "your-client-secret"; +var configuration = new Configuration(ClientId, ClientSecret); +``` + +## Step 2: Listing Files in a Folder + +Let's start by learning how to retrieve a list of all files in a specific folder. + +### Understanding the API + +The Get File Listing API returns the files and subfolders within a specified folder in your cloud storage.
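To make the shape of a listing concrete, here is a small self-contained Java sketch that models entries with the fields shown in this step's expected response (`name`, `isFolder`, `size`, `path`) and summarizes them. `ListingSketch` and `Entry` are hypothetical stand-ins written for this illustration, not the SDK's own classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical model of a folder listing; field names mirror the JSON
// response shown in this step, but these are not the SDK's types.
final class ListingSketch {
    static final class Entry {
        final String name;
        final boolean isFolder;
        final long size;
        final String path;

        Entry(String name, boolean isFolder, long size, String path) {
            this.name = name;
            this.isFolder = isFolder;
            this.size = size;
            this.path = path;
        }
    }

    // Sums the sizes of files only; folders are skipped.
    static long totalFileSize(List<Entry> entries) {
        long total = 0;
        for (Entry entry : entries) {
            if (!entry.isFolder) {
                total += entry.size;
            }
        }
        return total;
    }

    // Returns the names of files whose name ends with the given extension.
    static List<String> fileNamesWithExtension(List<Entry> entries, String extension) {
        String wanted = extension.toLowerCase(Locale.ROOT);
        List<String> names = new ArrayList<>();
        for (Entry entry : entries) {
            if (!entry.isFolder && entry.name.toLowerCase(Locale.ROOT).endsWith(wanted)) {
                names.add(entry.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<Entry> entries = List.of(
                new Entry("four-pages.docx", false, 8651, "/parserdocs/four-pages.docx"),
                new Entry("one-page.docx", false, 351348, "/parserdocs/one-page.docx"),
                new Entry("sample.pdf", false, 12345, "/parserdocs/sample.pdf"));
        System.out.println("Total bytes: " + totalFileSize(entries));
        System.out.println("DOCX files: " + fileNamesWithExtension(entries, ".docx"));
    }
}
```

With the real SDK you would read the same values from the objects the listing call returns instead of constructing them by hand.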
+ +### Implementation Example + +Here's how to list files in a folder using different methods: + +#### Using cURL + +```bash +curl -X GET "https://api.groupdocs.cloud/v1.0/parser/storage/folder/parserdocs?storageName=MyStorage" \ +-H "accept: application/json" \ +-H "authorization: Bearer {access_token}" +``` + +#### Using C# SDK + +```csharp +// Create FolderApi instance +var apiInstance = new FolderApi(configuration); + +// Folder path +string folderPath = "parserdocs"; +// Storage name +string storageName = "MyStorage"; + +// Call API to get files list +FilesList response = apiInstance.GetFilesList(new GetFilesListRequest(folderPath, storageName)); + +// Display the file list +Console.WriteLine($"Files in '{folderPath}':"); +foreach (var item in response.Value) +{ + Console.WriteLine($"Name: {item.Name}"); + Console.WriteLine($"Is Folder: {item.IsFolder}"); + Console.WriteLine($"Size: {item.Size} bytes"); + Console.WriteLine($"Path: {item.Path}"); + Console.WriteLine($"Modified Date: {item.ModifiedDate}"); + Console.WriteLine(); +} +``` + +#### Using Java SDK + +```java +// Create FolderApi instance +FolderApi apiInstance = new FolderApi(configuration); + +// Folder path +String folderPath = "parserdocs"; +// Storage name +String storageName = "MyStorage"; + +// Call API to get files list +GetFilesListRequest request = new GetFilesListRequest(folderPath, storageName); +FilesList response = apiInstance.getFilesList(request); + +// Display the file list +System.out.println("Files in '" + folderPath + "':"); +for (FileInfo item : response.getValue()) { + System.out.println("Name: " + item.getName()); + System.out.println("Is Folder: " + item.getIsFolder()); + System.out.println("Size: " + item.getSize() + " bytes"); + System.out.println("Path: " + item.getPath()); + System.out.println("Modified Date: " + item.getModifiedDate()); + System.out.println(); +} +``` + +### Expected Response + +```json +{ + "value": [ + { + "name": "four-pages.docx", + "isFolder": false, 
+ "modifiedDate": "2022-03-20T12:35:38+00:00", + "size": 8651, + "path": "/parserdocs/four-pages.docx" + }, + { + "name": "one-page.docx", + "isFolder": false, + "modifiedDate": "2022-03-20T12:17:34+00:00", + "size": 351348, + "path": "/parserdocs/one-page.docx" + }, + { + "name": "sample.pdf", + "isFolder": false, + "modifiedDate": "2022-03-20T12:29:10+00:00", + "size": 12345, + "path": "/parserdocs/sample.pdf" + } + ] +} +``` + +### Try it yourself + +1. Replace `{access_token}` with your actual token in the cURL example +2. Modify "parserdocs" to a folder path in your storage +3. Run the code and observe the list of files in your folder + +## Step 3: Creating a New Folder + +Now let's learn how to create a new folder in your cloud storage. + +### Implementation Example + +#### Using cURL + +```bash +curl -X POST "https://api.groupdocs.cloud/v1.0/parser/storage/folder/parserdocs/newfolder?storageName=MyStorage" \ +-H "accept: application/json" \ +-H "authorization: Bearer {access_token}" +``` + +#### Using C# SDK + +```csharp +// Create FolderApi instance +var apiInstance = new FolderApi(configuration); + +// Folder path to create +string folderPath = "parserdocs/newfolder"; +// Storage name +string storageName = "MyStorage"; + +// Call API to create folder +apiInstance.CreateFolder(new CreateFolderRequest(folderPath, storageName)); + +Console.WriteLine($"Folder '{folderPath}' created successfully."); +``` + +#### Using Java SDK + +```java +// Create FolderApi instance +FolderApi apiInstance = new FolderApi(configuration); + +// Folder path to create +String folderPath = "parserdocs/newfolder"; +// Storage name +String storageName = "MyStorage"; + +// Call API to create folder +CreateFolderRequest request = new CreateFolderRequest(folderPath, storageName); +apiInstance.createFolder(request); + +System.out.println("Folder '" + folderPath + "' created successfully."); +``` + +### Expected Response + +```json +{ + "code": 200, + "status": "OK" +} +``` + +### 
Learning Checkpoint + +Question: Can you create nested folders with a single API call? +Answer: Yes, GroupDocs.Parser Cloud API can create nested folders with a single call by specifying the full path including all subfolder levels (e.g., "folder1/folder2/folder3"). The folders will be created recursively. + +## Step 4: Deleting a Folder + +Let's learn how to delete a folder from your cloud storage. + +### Implementation Example + +#### Using cURL + +```bash +curl -X DELETE "https://api.groupdocs.cloud/v1.0/parser/storage/folder/parserdocs/newfolder?storageName=MyStorage&recursive=true" \ +-H "accept: application/json" \ +-H "authorization: Bearer {access_token}" +``` + +#### Using C# SDK + +```csharp +// Create FolderApi instance +var apiInstance = new FolderApi(configuration); + +// Folder path to delete +string folderPath = "parserdocs/newfolder"; +// Storage name +string storageName = "MyStorage"; +// Delete recursively (true to remove all contents) +bool recursive = true; + +// Call API to delete folder +apiInstance.DeleteFolder(new DeleteFolderRequest(folderPath, storageName, recursive)); + +Console.WriteLine($"Folder '{folderPath}' deleted successfully."); +``` + +#### Using Java SDK + +```java +// Create FolderApi instance +FolderApi apiInstance = new FolderApi(configuration); + +// Folder path to delete +String folderPath = "parserdocs/newfolder"; +// Storage name +String storageName = "MyStorage"; +// Delete recursively (true to remove all contents) +Boolean recursive = true; + +// Call API to delete folder +DeleteFolderRequest request = new DeleteFolderRequest(folderPath, storageName, recursive); +apiInstance.deleteFolder(request); + +System.out.println("Folder '" + folderPath + "' deleted successfully."); +``` + +### Expected Response + +```json +{ + "code": 200, + "status": "OK" +} +``` + +## Step 5: Copying a Folder + +Next, let's learn how to copy a folder to another location in your cloud storage. 
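The copy request takes a source path in the URL plus `destPath`, `srcStorageName`, and `destStorageName` as query parameters. The Java sketch below assembles such a URL so the roles of the parameters are easier to see. `FolderOperationUrlSketch` and `operationUrl` are hypothetical names, not SDK members, and the sketch percent-encodes the slash inside `destPath`, which is a choice of this example rather than something the tutorial prescribes.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds the REST URL for a folder copy or move,
// following the parameter names used in this tutorial's cURL calls.
final class FolderOperationUrlSketch {
    static final String BASE = "https://api.groupdocs.cloud/v1.0/parser/storage/folder/";

    static String operationUrl(String operation, String srcPath, String destPath,
                               String srcStorageName, String destStorageName) {
        if (!operation.equals("copy") && !operation.equals("move")) {
            throw new IllegalArgumentException("Unsupported operation: " + operation);
        }
        return BASE + operation + "/" + encodePath(srcPath)
                + "?destPath=" + encode(destPath)
                + "&srcStorageName=" + encode(srcStorageName)
                + "&destStorageName=" + encode(destStorageName);
    }

    // Encodes each path segment but keeps "/" as the separator.
    private static String encodePath(String path) {
        StringBuilder result = new StringBuilder();
        for (String segment : path.split("/")) {
            if (segment.isEmpty()) {
                continue;
            }
            if (result.length() > 0) {
                result.append('/');
            }
            result.append(encode(segment));
        }
        return result.toString();
    }

    private static String encode(String value) {
        // Form encoding uses "+" for spaces; URLs are safer with "%20".
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }

    public static void main(String[] args) {
        System.out.println(operationUrl("copy", "parserdocs/source",
                "parserdocs/destination", "MyStorage", "MyStorage"));
    }
}
```

Passing `"move"` instead of `"copy"` yields the URL pattern used by the move step later in this tutorial.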
+ +### Implementation Example + +#### Using cURL + +```bash +curl -X PUT "https://api.groupdocs.cloud/v1.0/parser/storage/folder/copy/parserdocs/source?destPath=parserdocs/destination&srcStorageName=MyStorage&destStorageName=MyStorage" \ +-H "accept: application/json" \ +-H "authorization: Bearer {access_token}" +``` + +#### Using C# SDK + +```csharp +// Create FolderApi instance +var apiInstance = new FolderApi(configuration); + +// Source folder path +string srcPath = "parserdocs/source"; +// Destination folder path +string destPath = "parserdocs/destination"; +// Source storage name +string srcStorageName = "MyStorage"; +// Destination storage name +string destStorageName = "MyStorage"; + +// Call API to copy folder +apiInstance.CopyFolder(new CopyFolderRequest(srcPath, destPath, srcStorageName, destStorageName)); + +Console.WriteLine($"Folder copied from '{srcPath}' to '{destPath}' successfully."); +``` + +#### Using Java SDK + +```java +// Create FolderApi instance +FolderApi apiInstance = new FolderApi(configuration); + +// Source folder path +String srcPath = "parserdocs/source"; +// Destination folder path +String destPath = "parserdocs/destination"; +// Source storage name +String srcStorageName = "MyStorage"; +// Destination storage name +String destStorageName = "MyStorage"; + +// Call API to copy folder +CopyFolderRequest request = new CopyFolderRequest(srcPath, destPath, srcStorageName, destStorageName); +apiInstance.copyFolder(request); + +System.out.println("Folder copied from '" + srcPath + "' to '" + destPath + "' successfully."); +``` + +### Expected Response + +```json +{ + "code": 200, + "status": "OK" +} +``` + +## Step 6: Moving a Folder + +Finally, let's learn how to move a folder to a different location in your cloud storage. 
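Before issuing a move, it is generally worth ruling out a destination that is the source folder itself or lies inside it. The tutorial does not say how the API reacts in that case, so the Java sketch below is only a client-side sanity check under that assumption; `MovePathCheckSketch` and `isInsideSource` are hypothetical names.

```java
// Hypothetical client-side check: detects a destination path that equals
// the source folder or is nested inside it. Not part of the GroupDocs SDK.
final class MovePathCheckSketch {
    static boolean isInsideSource(String srcPath, String destPath) {
        String src = trimSlashes(srcPath);
        String dest = trimSlashes(destPath);
        // Compare on whole segments so "source2" is not mistaken for a child of "source".
        return dest.equals(src) || dest.startsWith(src + "/");
    }

    private static String trimSlashes(String path) {
        // Remove leading and trailing slashes so "/a/b/" and "a/b" compare equal.
        return path.replaceAll("^/+|/+$", "");
    }

    public static void main(String[] args) {
        System.out.println(isInsideSource("parserdocs/source", "parserdocs/newlocation"));
        System.out.println(isInsideSource("parserdocs/source", "parserdocs/source/archive"));
    }
}
```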
+ +### Implementation Example + +#### Using cURL + +```bash +curl -X PUT "https://api.groupdocs.cloud/v1.0/parser/storage/folder/move/parserdocs/source?destPath=parserdocs/newlocation&srcStorageName=MyStorage&destStorageName=MyStorage" \ +-H "accept: application/json" \ +-H "authorization: Bearer {access_token}" +``` + +#### Using C# SDK + +```csharp +// Create FolderApi instance +var apiInstance = new FolderApi(configuration); + +// Source folder path +string srcPath = "parserdocs/source"; +// Destination folder path +string destPath = "parserdocs/newlocation"; +// Source storage name +string srcStorageName = "MyStorage"; +// Destination storage name +string destStorageName = "MyStorage"; + +// Call API to move folder +apiInstance.MoveFolder(new MoveFolderRequest(srcPath, destPath, srcStorageName, destStorageName)); + +Console.WriteLine($"Folder moved from '{srcPath}' to '{destPath}' successfully."); +``` + +#### Using Java SDK + +```java +// Create FolderApi instance +FolderApi apiInstance = new FolderApi(configuration); + +// Source folder path +String srcPath = "parserdocs/source"; +// Destination folder path +String destPath = "parserdocs/newlocation"; +// Source storage name +String srcStorageName = "MyStorage"; +// Destination storage name +String destStorageName = "MyStorage"; + +// Call API to move folder +MoveFolderRequest request = new MoveFolderRequest(srcPath, destPath, srcStorageName, destStorageName); +apiInstance.moveFolder(request); + +System.out.println("Folder moved from '" + srcPath + "' to '" + destPath + "' successfully."); +``` + +### Expected Response + +```json +{ + "code": 200, + "status": "OK" +} +``` + +## Troubleshooting Tips + +If you encounter issues while working with folder operations, consider these common solutions: + +1. Path Not Found Errors: Ensure that all folder paths exist before performing operations. Use the Object Exists API to check. +2. 
Permission Errors: Verify that your account has the necessary permissions for the specified storage. +3. Recursive Delete Issues: When deleting a non-empty folder without setting `recursive=true`, you'll get an error. Always use recursive delete for non-empty folders. + +## What You've Learned + +In this tutorial, you've learned how to: +- List all files within a specific folder +- Create new folders in cloud storage +- Delete folders from your storage +- Copy folders between storage locations +- Move folders to different paths + +These folder operations are essential for organizing and managing your documents efficiently in the cloud. + +## Further Practice + +To reinforce your learning, try these exercises: +1. Create a folder structure with multiple nested levels +2. Build a simple file browser application using the folder and file APIs +3. Implement a folder synchronization function between two different storage locations + +## Helpful Resources + +- [Product Page](https://products.groupdocs.cloud/parser/) +- [Documentation](https://docs.groupdocs.cloud/parser/) +- [API Reference](https://reference.groupdocs.cloud/parser/) +- [Free Support](https://forum.groupdocs.cloud/c/parser/19/) +- [Free Trial](https://dashboard.groupdocs.cloud/#/apps) diff --git a/content/parser/english/storage-operations/working-with-storage/working-with-storage-tutorial.md b/content/parser/english/storage-operations/working-with-storage/working-with-storage-tutorial.md new file mode 100644 index 0000000..11b0d94 --- /dev/null +++ b/content/parser/english/storage-operations/working-with-storage/working-with-storage-tutorial.md @@ -0,0 +1,290 @@ +--- +title: How to Work with Storage in GroupDocs.Parser Cloud Tutorial +url: /storage-operations/working-with-storage/ +weight: 1 +description: Learn how to check storage existence, manage storage objects, and monitor space usage with this step-by-step GroupDocs.Parser Cloud API tutorial. 
+--- + +# Tutorial: How to Work with Storage in GroupDocs.Parser Cloud + +## Learning Objectives + +In this tutorial, you'll learn how to: +- Check if a specific cloud storage exists +- Verify the existence of files and folders in your storage +- Monitor storage space usage +- Work with file versions in GroupDocs.Parser Cloud storage + +## Prerequisites + +Before you begin this tutorial, you need: + +1. A GroupDocs Cloud account (if you don't have one, [register for free](https://dashboard.groupdocs.cloud/#/apps)) +2. Your Client ID and Client Secret (available from the [Dashboard](https://dashboard.groupdocs.cloud/#/apps)) +3. Basic knowledge of REST APIs +4. Familiarity with C# or Java, or with cURL for making raw REST calls + +## Introduction + +Managing your cloud storage efficiently is crucial when working with documents in the cloud. GroupDocs.Parser Cloud API provides a set of powerful operations to help you manage your storage resources. In this tutorial, we'll explore how to check storage existence, verify objects in storage, monitor space usage, and work with file versions. + +## Step 1: Setting Up Your Environment + +Before working with storage operations, you need to set up authentication to access the GroupDocs.Parser Cloud API. + +### Try it yourself + +Create a new project in your preferred development environment and add the required dependencies: + +For C#: +```csharp +// Install via NuGet: +// Install-Package GroupDocs.Parser-Cloud +``` + +For Java: +```java +// Add to pom.xml: +// <dependency> +//     <groupId>com.groupdocs</groupId> +//     <artifactId>groupdocs-parser-cloud</artifactId> +//     <version>latest-version</version> +// </dependency> +``` + +Then authenticate with your Client ID and Client Secret: + +```csharp +// C# authentication example +string ClientId = "your-client-id"; +string ClientSecret = "your-client-secret"; +var configuration = new Configuration(ClientId, ClientSecret); +``` + +## Step 2: Checking Storage Existence + +Let's start by learning how to check if a specific cloud storage exists.
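The raw REST calls in this tutorial pass a bearer token in the `authorization` header, while the SDK `Configuration` object handles that for you. As a hedged sketch of what a token request might carry, the Java code below builds a form body from the Client ID and Client Secret. The token address `https://api.groupdocs.cloud/connect/token` and the `client_credentials` grant are assumptions not stated in this tutorial, so confirm them against the official documentation; `TokenRequestSketch` and its methods are hypothetical names, and no network call is made.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: prepares the pieces of a token request and the
// resulting header value. It does not contact any server.
final class TokenRequestSketch {
    // Assumed token address; verify against the official documentation.
    static final String TOKEN_URL = "https://api.groupdocs.cloud/connect/token";

    // Builds an application/x-www-form-urlencoded body (assumed grant type).
    static String formBody(String clientId, String clientSecret) {
        return "grant_type=client_credentials"
                + "&client_id=" + URLEncoder.encode(clientId, StandardCharsets.UTF_8)
                + "&client_secret=" + URLEncoder.encode(clientSecret, StandardCharsets.UTF_8);
    }

    // Formats the header value used by the cURL examples in this tutorial.
    static String authorizationHeader(String accessToken) {
        return "Bearer " + accessToken;
    }

    public static void main(String[] args) {
        System.out.println("POST " + TOKEN_URL);
        System.out.println(formBody("your-client-id", "your-client-secret"));
    }
}
```

Keeping the secret out of source control, for example by reading it from an environment variable, is advisable regardless of how the request is sent.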
+ +### Understanding the API + +The Storage Existence API lets you verify if a named storage configuration exists in your GroupDocs Cloud account. + +### Implementation Example + +Here's how to check storage existence using different methods: + +#### Using cURL + +```bash +curl -X GET "https://api.groupdocs.cloud/v1.0/parser/storage/MyStorage/exist" \ +-H "accept: application/json" \ +-H "authorization: Bearer {access_token}" +``` + +#### Using C# SDK + +```csharp +// Create StorageApi instance +var apiInstance = new StorageApi(configuration); + +// Storage name +string storageName = "MyStorage"; + +// Call API to check if storage exists +StorageExist response = apiInstance.StorageExists(new StorageExistsRequest(storageName)); + +// Check the result +Console.WriteLine("Storage exists: " + response.Exists); +``` + +#### Expected Response + +```json +{ + "exists": true +} +``` + +### Try it yourself + +1. Replace `{access_token}` with your actual token in the cURL example +2. In the C# example, replace "MyStorage" with your storage name +3. Run the code and check if your storage exists + +## Step 3: Checking Object Existence + +Next, let's learn how to verify if a specific file or folder exists in your storage. 
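One common use of an existence check is to confirm that every input is present before a batch job starts. A minimal Java sketch of that idea follows. `ExistenceCheckSketch` and `findMissing` are hypothetical names, and the `Predicate<String>` stands in for whatever call you use to ask the storage whether a path exists; here it is backed by an in-memory set purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical helper: collects the paths that an existence check reports
// as absent, so a batch can stop early with a clear message.
final class ExistenceCheckSketch {
    static List<String> findMissing(List<String> paths, Predicate<String> exists) {
        List<String> missing = new ArrayList<>();
        for (String path : paths) {
            if (!exists.test(path)) {
                missing.add(path);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Stand-in for the storage: only this one path "exists".
        Set<String> stored = Set.of("documents/sample.docx");
        List<String> wanted = List.of("documents/sample.docx", "documents/other.pdf");
        System.out.println("Missing: " + findMissing(wanted, stored::contains));
    }
}
```

Swapping the set lookup for a real existence call keeps the batch logic unchanged, which also makes it easy to test without network access.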
+ +### Implementation Example + +#### Using cURL + +```bash +curl -X GET "https://api.groupdocs.cloud/v1.0/parser/storage/exist/documents/sample.docx?storageName=MyStorage" \ +-H "accept: application/json" \ +-H "authorization: Bearer {access_token}" +``` + +#### Using C# SDK + +```csharp +// Create StorageApi instance +var apiInstance = new StorageApi(configuration); + +// File path +string path = "documents/sample.docx"; +// Storage name +string storageName = "MyStorage"; + +// Call API to check if file exists +ObjectExist response = apiInstance.ObjectExists(new ObjectExistsRequest(path, storageName)); + +// Check the result +Console.WriteLine($"Object exists: {response.Exists}"); +Console.WriteLine($"Is folder: {response.IsFolder}"); +``` + +### Expected Response + +```json +{ + "exists": true, + "isFolder": false +} +``` + +## Step 4: Checking Storage Space Usage + +Monitoring your storage space is important to ensure you have enough resources for your documents. + +### Implementation Example + +#### Using cURL + +```bash +curl -X GET "https://api.groupdocs.cloud/v1.0/parser/storage/disc?storageName=MyStorage" \ +-H "accept: application/json" \ +-H "authorization: Bearer {access_token}" +``` + +#### Using C# SDK + +```csharp +// Create StorageApi instance +var apiInstance = new StorageApi(configuration); + +// Storage name +string storageName = "MyStorage"; + +// Call API to get disc usage +DiscUsage response = apiInstance.GetDiscUsage(new GetDiscUsageRequest(storageName)); + +// Print disc usage info +Console.WriteLine($"Total space: {response.TotalSize} bytes"); +Console.WriteLine($"Used space: {response.UsedSize} bytes"); +``` + +### Expected Response + +```json +{ + "usedSize": 31032368, + "totalSize": 3221225472 +} +``` + +### Learning Checkpoint + +Question: Why is monitoring storage space important when working with cloud document processing? 
+Answer: Monitoring storage space helps you manage resources efficiently, avoid running out of space during document processing operations, and plan for scaling your application as needed. + +## Step 5: Working with File Versions + +GroupDocs.Parser Cloud allows you to work with different versions of the same file. + +### Implementation Example + +#### Using cURL + +```bash +curl -X GET "https://api.groupdocs.cloud/v1.0/parser/storage/version/documents/sample.docx?storageName=MyStorage" \ +-H "accept: application/json" \ +-H "authorization: Bearer {access_token}" +``` + +#### Using C# SDK + +```csharp +// Create StorageApi instance +var apiInstance = new StorageApi(configuration); + +// File path +string path = "documents/sample.docx"; +// Storage name +string storageName = "MyStorage"; + +// Call API to get file versions +FileVersions response = apiInstance.GetFileVersions(new GetFileVersionsRequest(path, storageName)); + +// Print versions info +foreach (var version in response.Value) +{ + Console.WriteLine($"Version: {version.VersionId}"); + Console.WriteLine($"Is latest: {version.IsLatest}"); + Console.WriteLine($"Name: {version.Name}"); + Console.WriteLine($"Modified date: {version.ModifiedDate}"); + Console.WriteLine($"Size: {version.Size} bytes"); + Console.WriteLine(); +} +``` + +### Expected Response + +```json +{ + "value": [ + { + "versionId": "null", + "isLatest": true, + "name": "sample.docx", + "isFolder": false, + "modifiedDate": "2022-08-16T19:45:05+00:00", + "size": 347612, + "path": "/documents/sample.docx" + } + ] +} +``` + +## Troubleshooting Tips + +If you encounter issues while working with storage operations, consider these common solutions: + +1. Authentication Errors: Ensure your Client ID and Client Secret are correct and that your access token is valid. +2. 404 Not Found Errors: Check that the storage name or file path is spelled correctly. +3. 
Permission Issues: Verify that your account has the necessary permissions to access the specified storage. + +## What You've Learned + +In this tutorial, you've learned how to: +- Check if a specific cloud storage exists +- Verify the existence of files and folders in your storage +- Monitor storage space usage +- Retrieve and work with file versions + +These storage operations are foundational for efficiently managing your documents in the cloud with GroupDocs.Parser. + +## Further Practice + +To reinforce your learning, try these exercises: +1. Create a storage monitoring tool that alerts you when space usage exceeds 80% +2. Build a version history viewer for a specific document +3. Implement a function to verify multiple files exist before processing them + +## Helpful Resources + +- [Product Page](https://products.groupdocs.cloud/parser/) +- [Documentation](https://docs.groupdocs.cloud/parser/) +- [API Reference](https://reference.groupdocs.cloud/parser/) +- [Free Support](https://forum.groupdocs.cloud/c/parser/19/) +- [Free Trial](https://dashboard.groupdocs.cloud/#/apps) diff --git a/content/parser/english/template-operations/_index.md b/content/parser/english/template-operations/_index.md new file mode 100644 index 0000000..4980304 --- /dev/null +++ b/content/parser/english/template-operations/_index.md @@ -0,0 +1,65 @@ +--- +id: "template-operations-tutorials" +url: /template-operations/ +title: "GroupDocs.Parser Cloud API Document Template Operations Tutorials" +productName: "GroupDocs.Parser Cloud" +weight: 3 +description: "Learn how to work with document templates in GroupDocs.Parser Cloud with these comprehensive step-by-step tutorials for developers" +keywords: "parser cloud, document parsing, templates, template operations, parsing tutorial, groupdocs tutorial, cloud api" +toc: True +--- + +# GroupDocs.Parser Cloud API Document Template Operations Tutorials + +Welcome to our hands-on tutorial series for developers working with GroupDocs.Parser Cloud 
API's template operations. These tutorials are designed to help you effectively implement document parsing solutions using templates in your applications. + +## Learning Path: From Basics to Advanced Template Operations + +This comprehensive tutorial series follows a structured learning path that will take you from basic template creation to advanced template operations. Each tutorial builds upon skills learned in previous lessons to help you master the complete template workflow. + +### What You'll Learn + +In these tutorials, you'll discover how to: +- Create and update document parsing templates +- Retrieve templates for use in your parsing operations +- Delete templates that are no longer needed +- Apply templates to extract structured data from documents + +## Template Operations Tutorials + +Here are the step-by-step guides available in this tutorial series: + +1. [Learn to Create or Update Templates]({{< ref "/template-operations/create-or-update-template" >}}) + Create custom templates to define data extraction patterns or update existing templates with new fields and tables. + +2. [Tutorial: How to Retrieve Templates]({{< ref "/template-operations/get-template" >}}) + Master the process of retrieving saved templates to use in your document parsing operations. + +3. [Tutorial: Deleting Unused Templates]({{< ref "/template-operations/delete-template" >}}) + Learn the proper methods for removing templates that are no longer needed in your workflow. + +## Prerequisites + +To get the most from these tutorials, you should have: +- A GroupDocs.Parser Cloud account +- Basic understanding of REST APIs +- Familiarity with your preferred programming language (C# or Java) +- Development environment set up for your chosen SDK + +## Getting Started + +Before diving into the tutorials, make sure you have set up your GroupDocs account and have your Client ID and Client Secret ready. 
If you haven't done this yet, sign up for a [free trial](https://dashboard.groupdocs.cloud/#/apps). + +Each tutorial includes code examples in multiple languages to help you quickly implement the concepts in your preferred development environment. + +## Helpful Resources + +- [Product Page](https://products.groupdocs.cloud/parser/) +- [Documentation](https://docs.groupdocs.cloud/parser/) +- [Live Demo](https://products.groupdocs.app/parser/family) +- [API Reference](https://reference.groupdocs.cloud/parser/) +- [Blog](https://blog.groupdocs.cloud/categories/groupdocs.parser-cloud-product-family/) +- [Free Support](https://forum.groupdocs.cloud/c/parser/19/) +- [Free Trial](https://dashboard.groupdocs.cloud/#/apps) + +We welcome your feedback on these tutorials! If you have questions or suggestions, please share them in the [support forum](https://forum.groupdocs.cloud/c/parser/19/). diff --git a/content/parser/english/template-operations/create-or-update-template/create-or-update-template-tutorial.md b/content/parser/english/template-operations/create-or-update-template/create-or-update-template-tutorial.md new file mode 100644 index 0000000..7282923 --- /dev/null +++ b/content/parser/english/template-operations/create-or-update-template/create-or-update-template-tutorial.md @@ -0,0 +1,311 @@ +--- +id: "create-or-update-template-tutorial" +url: /template-operations/create-or-update-template/ +title: Learn to Create or Update Templates in GroupDocs.Parser Cloud Tutorial +weight: 1 +description: This tutorial teaches you how to create and update document parsing templates using GroupDocs.Parser Cloud API to extract structured data from your documents. +--- + +# Tutorial: Learn to Create or Update Templates in GroupDocs.Parser Cloud + +## Learning Objectives + +In this tutorial, you'll learn how to create and update document parsing templates using the GroupDocs.Parser Cloud API. 
By the end, you'll be able to: + +- Define custom templates for extracting data from documents +- Create templates with fields for specific data points +- Add tables to your templates for structured data extraction +- Save templates for future use in parsing operations +- Update existing templates with new fields or parameters + +Estimated completion time: 20-25 minutes + +## Prerequisites + +Before starting this tutorial, make sure you have: + +1. A GroupDocs.Parser Cloud account (if not, [register for a free trial](https://dashboard.groupdocs.cloud/#/apps)) +2. Your Client ID and Client Secret from the [dashboard](https://dashboard.groupdocs.cloud/applications) +3. Basic understanding of REST API concepts +4. Familiarity with your preferred programming language (C#, Java, or cURL) +5. Development environment set up for your chosen SDK + +## Understanding Templates in Document Parsing + +Templates are powerful tools that define patterns for extracting specific data from documents. They're especially useful when you need to extract data from similarly structured documents like invoices, contracts, or forms. + +### Why Use Templates? + +Templates allow you to: +- Extract data from fixed positions in a document +- Find data linked to other elements in the document +- Define tables for structured data extraction +- Process batches of similarly structured documents consistently + +## Creating Your First Template + +Let's start by creating a simple template that extracts an address and company name from a document. + +### Step 1: Plan Your Template Structure + +First, identify what data you need to extract: +1. Address - located at a fixed position +2. Company Name - positioned below the address +3. 
Totals Table - containing financial summary data + +### Step 2: Obtain Authentication Token + +Before making any API calls, you need to authenticate with the GroupDocs.Parser Cloud API: + +```bash +# First get JSON Web Token +curl -v "https://api.groupdocs.cloud/connect/token" \ +-X POST \ +-d "grant_type=client_credentials&client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET" \ +-H "Content-Type: application/x-www-form-urlencoded" \ +-H "Accept: application/json" +``` + +Save the JWT token from the response for use in subsequent API calls. + +### Step 3: Define Template Fields + +Now, let's create a template with fields for the address and company name: + +#### Try it yourself: + +Use the following cURL command to create a new template: + +```bash +curl -v "https://api.groupdocs.cloud/v1.0/parser/template" \ +-X PUT \ +-H "Content-Type: application/json" \ +-H "Accept: application/json" \ +-H "Authorization: Bearer YOUR_JWT_TOKEN" \ +-d "{ + \"Template\": { + \"Fields\": [ + { + \"FieldName\": \"Address\", + \"FieldPosition\": { + \"FieldPositionType\": \"Fixed\", + \"Rectangle\": { + \"Position\": { + \"X\": 13.0, + \"Y\": 35.0 + }, + \"Size\": { + \"Height\": 10.0, + \"Width\": 100.0 + } + }, + \"MatchCase\": false, + \"IsLeftLinked\": false, + \"IsRightLinked\": false, + \"IsTopLinked\": false, + \"IsBottomLinked\": false, + \"AutoScale\": false + } + }, + { + \"FieldName\": \"Company\", + \"FieldPosition\": { + \"FieldPositionType\": \"Linked\", + \"MatchCase\": false, + \"LinkedFieldName\": \"Address\", + \"IsLeftLinked\": false, + \"IsRightLinked\": false, + \"IsTopLinked\": false, + \"IsBottomLinked\": true, + \"SearchArea\": { + \"Height\": 15.0, + \"Width\": 100.0 + }, + \"AutoScale\": true + } + } + ], + \"Tables\": [ + { + \"TableName\": \"Totals\", + \"DetectorParameters\": { + \"Rectangle\": { + \"Position\": { + \"X\": 300.0, + \"Y\": 385.0 + }, + \"Size\": { + \"Height\": 220.0, + \"Width\": 65.0 + } + } + } + } + ] + }, + \"TemplatePath\": 
\"templates/my_first_template.json\" +}" +``` + +### Step 4: Understanding the Template Structure + +Let's break down what each part of the template does: + +1. Fixed Position Field (Address): + - Positioned at coordinates X:13, Y:35 + - Covers an area of width 100 and height 10 + - Uses absolute positioning on the page + +2. Linked Field (Company): + - Linked to the "Address" field + - Positioned below the address (IsBottomLinked: true) + - Search area of width 100 and height 15 + - Will automatically scale based on the size of the linked field + +3. Table (Totals): + - Located within a rectangle at position X:300, Y:385 + - Covers an area with width 65 and height 220 + +## SDK Implementation + +Let's implement the same template creation using the SDK of your choice. + +### C# Example + +{{< gist groupdocscloud 39135fbf5cfb74deeeae6c47eafb2473 Parser_CSharp_Create_Update_Template_Tutorial.cs >}} + +### Java Example + +{{< gist groupdocscloud c8b8e01a52ef2bae6fa5d78aba152238 Parser_Java_Create_Update_Template_Tutorial.java >}} + +## Updating an Existing Template + +Now let's learn how to update an existing template by adding a new field. 
+ +### Step 5: Add a New Field to Your Template + +To update a template, use the same API endpoint but specify the path of the existing template: + +```bash +curl -v "https://api.groupdocs.cloud/v1.0/parser/template" \ +-X PUT \ +-H "Content-Type: application/json" \ +-H "Accept: application/json" \ +-H "Authorization: Bearer YOUR_JWT_TOKEN" \ +-d "{ + \"Template\": { + \"Fields\": [ + { + \"FieldName\": \"Address\", + \"FieldPosition\": { + \"FieldPositionType\": \"Fixed\", + \"Rectangle\": { + \"Position\": { + \"X\": 13.0, + \"Y\": 35.0 + }, + \"Size\": { + \"Height\": 10.0, + \"Width\": 100.0 + } + }, + \"MatchCase\": false + } + }, + { + \"FieldName\": \"Company\", + \"FieldPosition\": { + \"FieldPositionType\": \"Linked\", + \"MatchCase\": false, + \"LinkedFieldName\": \"Address\", + \"IsBottomLinked\": true, + \"SearchArea\": { + \"Height\": 15.0, + \"Width\": 100.0 + }, + \"AutoScale\": true + } + }, + { + \"FieldName\": \"InvoiceNumber\", + \"FieldPosition\": { + \"FieldPositionType\": \"Regex\", + \"Regex\": \"Invoice #(\\\\d+)\", + \"MatchCase\": false + } + } + ], + \"Tables\": [ + { + \"TableName\": \"Totals\", + \"DetectorParameters\": { + \"Rectangle\": { + \"Position\": { + \"X\": 300.0, + \"Y\": 385.0 + }, + \"Size\": { + \"Height\": 220.0, + \"Width\": 65.0 + } + } + } + } + ] + }, + \"TemplatePath\": \"templates/my_first_template.json\" +}" +``` + +Notice we've added an "InvoiceNumber" field that uses a regular expression to find the invoice number. + +## Troubleshooting Tips + +### Common Issues and Solutions + +1. Authentication Errors: + - Ensure your Client ID and Client Secret are correct + - Check that your JWT token hasn't expired + +2. Template Not Saving: + - Verify that the storage path exists + - Ensure all required fields have valid values + +3. 
Field Not Found During Parsing: + - Double-check coordinates and dimensions + - Try expanding the search area for linked fields + - Test regex patterns separately before using them in templates + +## What You've Learned + +In this tutorial, you've learned how to: +- Create a new document parsing template +- Define fixed-position fields for specific data extraction +- Create linked fields that relate to other elements +- Add tables to your templates +- Update existing templates with new fields +- Implement template operations using cURL, C#, and Java + +## Further Practice + +To reinforce your learning, try these exercises: +1. Create a template that extracts data from an invoice PDF +2. Add a regex field to find dates in various formats +3. Create a template with multiple tables on different pages +4. Update an existing template to add search capabilities for email addresses + +## Next Steps + +Now that you know how to create and update templates, continue your learning journey with our next tutorial: [How to Retrieve Templates]({{< ref "/template-operations/get-template" >}}). + +## Helpful Resources + +- [Product Page](https://products.groupdocs.cloud/parser/) +- [Documentation](https://docs.groupdocs.cloud/parser/) +- [Live Demo](https://products.groupdocs.app/parser/family) +- [API Reference](https://reference.groupdocs.cloud/parser/) +- [Blog](https://blog.groupdocs.cloud/categories/groupdocs.parser-cloud-product-family/) +- [Free Support](https://forum.groupdocs.cloud/c/parser/19/) +- [Free Trial](https://dashboard.groupdocs.cloud/#/apps) + +Have questions about this tutorial? Feel free to ask in our [support forum](https://forum.groupdocs.cloud/c/parser/19/). 
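
The troubleshooting tip about testing regex patterns separately can be tried from the shell before a template is saved. The sketch below is only an approximation: it assumes a POSIX-style `grep`, rewrites `\d` as `[0-9]` because extended regular expressions have no `\d` shorthand, and uses a made-up sample line. The regex engine used by the Parser service may behave differently.

```bash
#!/usr/bin/env bash
# Hypothetical sample text standing in for a line extracted from an invoice.
sample="Payment due for INVOICE #20417 by 2025-09-01"

# Same idea as the template's "Invoice #(\d+)" field with MatchCase=false:
# -i ignores case, -o prints only the matched part, -E enables extended syntax.
match=$(printf '%s\n' "$sample" | grep -oiE 'invoice #[0-9]+')

# Keep only the text after '#', mirroring the capture group in the template regex.
invoice_number=${match##*#}

echo "Matched text: $match"
echo "Invoice number: $invoice_number"
```

If the pattern matches nothing, both variables stay empty, which is a quick signal to adjust the expression before putting it into the template.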
diff --git a/content/parser/english/template-operations/delete-template/delete-template-tutorial.md b/content/parser/english/template-operations/delete-template/delete-template-tutorial.md new file mode 100644 index 0000000..4c5615d --- /dev/null +++ b/content/parser/english/template-operations/delete-template/delete-template-tutorial.md @@ -0,0 +1,204 @@ +--- +id: "delete-template-tutorial" +url: /template-operations/delete-template/ +title: "Tutorial: Deleting Unused Templates in GroupDocs.Parser Cloud" +productName: "GroupDocs.Parser Cloud" +weight: 3 +description: "Learn how to efficiently manage your document parsing templates by deleting unused templates with GroupDocs.Parser Cloud API in this step-by-step tutorial." +keywords: "parser cloud tutorial, delete template, remove template, template management, groupdocs tutorial, document parsing" +toc: True +--- + +# Tutorial: Deleting Unused Templates in GroupDocs.Parser Cloud + +## Learning Objectives + +In this tutorial, you'll learn how to properly delete document parsing templates that are no longer needed in your GroupDocs.Parser Cloud workflow. By the end, you'll be able to: + +- Remove unwanted templates from your storage +- Manage your template library efficiently +- Implement template deletion in your applications +- Troubleshoot common template deletion issues + +Estimated completion time: 15 minutes + +## Prerequisites + +Before starting this tutorial, make sure you have: + +1. A GroupDocs.Parser Cloud account (if not, [register for a free trial](https://dashboard.groupdocs.cloud/#/apps)) +2. Your Client ID and Client Secret from the [dashboard](https://dashboard.groupdocs.cloud/applications) +3. At least one template saved in your storage (see our [Create Template Tutorial]({{< ref "/template-operations/create-or-update-template" >}})) +4. Basic understanding of REST API concepts +5. Familiarity with your preferred programming language (C#, Java, or cURL) + +## Why Delete Templates? 
+ +Managing your template library is an important part of maintaining an efficient document parsing workflow. Reasons to delete templates include: + +- Removing obsolete templates that no longer match your document formats +- Cleaning up test templates after development +- Organizing your storage by removing duplicate or unnecessary templates +- Ensuring that your parsing operations only use current, valid templates + +## Deleting a Template + +Let's walk through the process of deleting a template from your storage. + +### Step 1: Obtain Authentication Token + +Before making any API calls, you need to authenticate with the GroupDocs.Parser Cloud API: + +```bash +# First get JSON Web Token +curl -v "https://api.groupdocs.cloud/connect/token" \ +-X POST \ +-d "grant_type=client_credentials&client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET" \ +-H "Content-Type: application/x-www-form-urlencoded" \ +-H "Accept: application/json" +``` + +Save the JWT token from the response for use in subsequent API calls. + +### Step 2: Delete the Template + +Now, let's delete a template using the DELETE Template endpoint: + +#### Try it yourself: + +```bash +curl -v "https://api.groupdocs.cloud/v1.0/parser/template" \ +-X PUT \ +-H "Content-Type: application/json" \ +-H "Accept: application/json" \ +-H "Authorization: Bearer YOUR_JWT_TOKEN" \ +-d "{ + \"TemplatePath\": \"templates/template_1.json\" +}" +``` + +Note that despite this being a delete operation, the API uses the PUT HTTP method with specific parameters to identify the template to be deleted. + +### Step 3: Understanding the Response + +A successful deletion will return a 204 No Content response, indicating that the template has been successfully removed from your storage. 
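
A script can act on that status code instead of reading the verbose output. The sketch below assumes the code has already been captured (for example with curl's `-o /dev/null -w '%{http_code}'` options). The helper name `describe_delete_status` is hypothetical, and the 401 and 404 branches follow general HTTP conventions rather than documented Parser Cloud behavior.

```bash
# Hypothetical helper: turn an HTTP status code from the delete call into a message.
describe_delete_status() {
  case "$1" in
    204) echo "Template deleted" ;;
    401) echo "Authentication failed: request a new JWT token" ;;
    404) echo "Template not found: check the template path" ;;
    *)   echo "Unexpected status $1: inspect the response body" ;;
  esac
}

# Example calls with fixed codes so the sketch runs without network access.
describe_delete_status 204
describe_delete_status 404
```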
+ +If you receive an error response, it might indicate that: +- The template path is incorrect +- The template doesn't exist +- You don't have permission to delete the file +- There's an issue with your storage + +## SDK Implementation + +Let's implement the same template deletion using the SDK of your choice. + +### C# Example + +{{< gist groupdocscloud 39135fbf5cfb74deeeae6c47eafb2473 Parser_CSharp_Delete_Template_Tutorial.cs >}} + +### Java Example + +{{< gist groupdocscloud c8b8e01a52ef2bae6fa5d78aba152238 Parser_Java_Delete_Template_Tutorial.java >}} + +## Alternative Deletion Methods + +In addition to using the dedicated Delete Template endpoint, you can also use the standard storage operations to delete template files: + +### Step 4: Delete Using Storage API + +```bash +curl -v "https://api.groupdocs.cloud/v1.0/parser/storage/file/templates/template_1.json" \ +-X DELETE \ +-H "Content-Type: application/json" \ +-H "Accept: application/json" \ +-H "Authorization: Bearer YOUR_JWT_TOKEN" +``` + +This storage operation achieves the same result as the dedicated Delete Template endpoint. + +## Implementing Template Lifecycle Management + +In a production environment, it's beneficial to implement a complete template lifecycle management strategy: + +1. Creation: Create templates for your document types +2. Retrieval: Use templates for parsing operations +3. Updates: Modify templates as document formats change +4. Deletion: Remove templates when they become obsolete + +### Best Practices for Template Management + +- Maintain a versioning scheme for your templates +- Document which templates are used for which document types +- Regularly audit your template storage to identify unused templates +- Back up important templates before deletion +- Use a naming convention that makes template purpose clear + +## Troubleshooting Tips + +### Common Issues and Solutions + +1.
Template Not Found: + - Verify that the template path is correct + - Check that the storage name is correct (if not using default storage) + - Ensure the template file exists in your storage + +2. Authentication Errors: + - Ensure your Client ID and Client Secret are correct + - Check that your JWT token hasn't expired + +3. Permission Errors: + - Verify that you have the necessary permissions to delete files + - Check if the file is locked by another process + +### Learning Checkpoint + +Let's verify your understanding with a few questions: + +1. What HTTP method is used for the Delete Template operation? + - Answer: PUT + +2. What status code indicates a successful template deletion? + - Answer: 204 No Content + +3. What alternative API can be used to delete template files? + - Answer: The Storage API + +## What You've Learned + +In this tutorial, you've learned how to: +- Delete templates from your GroupDocs.Parser Cloud storage +- Implement template deletion using cURL, C#, and Java +- Use alternative methods for template deletion +- Follow best practices for template lifecycle management +- Troubleshoot common template deletion issues + +## Further Practice + +To reinforce your learning, try these exercises: +1. Create a template management utility that lists, retrieves, and deletes templates +2. Implement a version control system for your templates with automatic deletion of outdated versions +3. Build a template cleanup script that identifies and removes unused templates +4. Create a workflow that checks if a template exists before trying to delete it + +## What You've Learned in This Tutorial Series + +Congratulations on completing our tutorial series on Template Operations! You've now learned the complete lifecycle of working with templates: + +1. Creating and updating templates to define data extraction patterns +2. Retrieving templates for use in document parsing +3. 
Deleting templates when they're no longer needed + +You now have the skills to implement a robust document parsing solution using templates in your applications. + +## Helpful Resources + +- [Product Page](https://products.groupdocs.cloud/parser/) +- [Documentation](https://docs.groupdocs.cloud/parser/) +- [Live Demo](https://products.groupdocs.app/parser/family) +- [API Reference](https://reference.groupdocs.cloud/parser/) +- [Blog](https://blog.groupdocs.cloud/categories/groupdocs.parser-cloud-product-family/) +- [Free Support](https://forum.groupdocs.cloud/c/parser/19/) +- [Free Trial](https://dashboard.groupdocs.cloud/#/apps) + +Have questions about this tutorial? Feel free to ask in our [support forum](https://forum.groupdocs.cloud/c/parser/19/). We're here to help you master template-based document parsing! diff --git a/content/parser/english/template-operations/get-template/get-template-tutorial.md b/content/parser/english/template-operations/get-template/get-template-tutorial.md new file mode 100644 index 0000000..1c1f57c --- /dev/null +++ b/content/parser/english/template-operations/get-template/get-template-tutorial.md @@ -0,0 +1,252 @@ +--- +id: "get-template-tutorial" +url: /template-operations/get-template/ +title: "Tutorial: How to Retrieve Templates in GroupDocs.Parser Cloud" +productName: "GroupDocs.Parser Cloud" +weight: 2 +description: "Learn how to retrieve document parsing templates using GroupDocs.Parser Cloud API in this step-by-step tutorial for developers." +keywords: "parser cloud tutorial, get template, retrieve template, template tutorial, groupdocs tutorial, document parsing" +toc: True +--- + +# Tutorial: How to Retrieve Templates in GroupDocs.Parser Cloud + +## Learning Objectives + +In this tutorial, you'll learn how to retrieve document parsing templates that you've previously created with the GroupDocs.Parser Cloud API. 
By the end, you'll be able to: + +- Access templates stored in your cloud storage +- Use retrieved templates for document parsing operations +- Handle template retrieval through both REST API calls and SDK implementations +- Troubleshoot common template retrieval issues + +Estimated completion time: 15-20 minutes + +## Prerequisites + +Before starting this tutorial, make sure you have: + +1. A GroupDocs.Parser Cloud account (if not, [register for a free trial](https://dashboard.groupdocs.cloud/#/apps)) +2. Your Client ID and Client Secret from the [dashboard](https://dashboard.groupdocs.cloud/applications) +3. At least one template saved in your storage (see our [Create Template Tutorial]({{< ref "/template-operations/create-or-update-template" >}})) +4. Basic understanding of REST API concepts +5. Familiarity with your preferred programming language (C#, Java, or cURL) + +## Why Retrieve Templates? + +Templates are valuable assets for document parsing operations. Once created, you can reuse them across multiple parsing jobs. Retrieving templates allows you to: + +- Use the same extraction patterns across multiple documents +- Ensure consistency in your parsing operations +- Save time by not having to redefine extraction rules +- Build a library of templates for different document types + +## Retrieving a Template + +Let's walk through the process of retrieving a template from your storage. + +### Step 1: Obtain Authentication Token + +Before making any API calls, you need to authenticate with the GroupDocs.Parser Cloud API: + +```bash +# First get JSON Web Token +curl -v "https://api.groupdocs.cloud/connect/token" \ +-X POST \ +-d "grant_type=client_credentials&client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET" \ +-H "Content-Type: application/x-www-form-urlencoded" \ +-H "Accept: application/json" +``` + +Save the JWT token from the response for use in subsequent API calls. 
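
To reuse the token in later commands, it helps to store it in a shell variable. The sketch below works on a hard-coded sample response so it runs offline. The `access_token` field name follows the usual OAuth 2.0 client-credentials response and the token value is a placeholder, so treat both as assumptions to check against your actual response.

```bash
# Placeholder response; in practice this would be the body returned by the token request.
response='{"access_token":"eyJhbGciOi.placeholder.token","expires_in":86400,"token_type":"Bearer"}'

# Pull out the access_token value with sed (no JSON tool required).
JWT_TOKEN=$(printf '%s' "$response" \
  | sed -n 's/.*"access_token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')

# Later requests can then send it in the Authorization header.
auth_header="Authorization: Bearer $JWT_TOKEN"
echo "$auth_header"
```

A JSON-aware tool such as `jq` is a more robust choice when it is available, since the `sed` expression only handles a flat, single-line response.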
+ +### Step 2: Retrieve the Template + +Now, let's retrieve a template using the GET Template endpoint: + +#### Try it yourself: + +```bash +curl -v "https://api.groupdocs.cloud/v1.0/parser/template" \ +-X POST \ +-H "Content-Type: application/json" \ +-H "Accept: application/json" \ +-H "Authorization: Bearer YOUR_JWT_TOKEN" \ +-d "{ + \"TemplatePath\": \"templates/template_1.json\" +}" +``` + +This request will return the template content as a JSON response. You can then use this template in your document parsing operations. + +### Step 3: Understanding the Response + +The API response will contain the complete template definition including: + +- Fields with their positions and properties +- Tables with their detection parameters +- All configuration needed for data extraction + +Let's examine a sample response: + +```json +{ + "fields": [ + { + "fieldName": "Address", + "fieldPosition": { + "fieldPositionType": "Fixed", + "rectangle": { + "position": { + "x": 13.0, + "y": 35.0 + }, + "size": { + "height": 10.0, + "width": 100.0 + } + }, + "matchCase": false, + "isLeftLinked": false, + "isRightLinked": false, + "isTopLinked": false, + "isBottomLinked": false, + "autoScale": false + } + }, + { + "fieldName": "Company", + "fieldPosition": { + "fieldPositionType": "Linked", + "matchCase": false, + "linkedFieldName": "Address", + "isLeftLinked": false, + "isRightLinked": false, + "isTopLinked": false, + "isBottomLinked": true, + "searchArea": { + "height": 15.0, + "width": 100.0 + }, + "autoScale": true + } + } + ], + "tables": [ + { + "tableName": "Totals", + "detectorParameters": { + "rectangle": { + "position": { + "x": 300.0, + "y": 385.0 + }, + "size": { + "height": 220.0, + "width": 65.0 + } + } + } + } + ] +} +``` + +## SDK Implementation + +Let's implement the same template retrieval using the SDK of your choice. 
+ +### C# Example + +{{< gist groupdocscloud 39135fbf5cfb74deeeae6c47eafb2473 Parser_CSharp_Get_Template_Tutorial.cs >}} + +### Java Example + +{{< gist groupdocscloud c8b8e01a52ef2bae6fa5d78aba152238 Parser_Java_Get_Template_Tutorial.java >}} + +## Using the Retrieved Template + +After retrieving the template, you can use it in the Parse operation. Here's how you can incorporate the template in your parsing workflow: + +### Step 4: Parse a Document Using the Retrieved Template + +```bash +curl -v "https://api.groupdocs.cloud/v1.0/parser/parse" \ +-X POST \ +-H "Content-Type: application/json" \ +-H "Accept: application/json" \ +-H "Authorization: Bearer YOUR_JWT_TOKEN" \ +-d "{ + \"FileInfo\": { + \"FilePath\": \"invoices/sample.pdf\" + }, + \"Template\": { + \"TemplatePath\": \"templates/template_1.json\" + } +}" +``` + +This will parse the document using the fields and tables defined in your template. + +## Troubleshooting Tips + +### Common Issues and Solutions + +1. Template Not Found: + - Verify that the template path is correct + - Check that the storage name is correct (if not using default storage) + - Ensure the template file exists in your storage + +2. Authentication Errors: + - Ensure your Client ID and Client Secret are correct + - Check that your JWT token hasn't expired + +3. Invalid Template Format: + - Confirm that the template was created correctly + - Verify that the template is a valid JSON file + +### Learning Checkpoint + +Let's verify your understanding with a few questions: + +1. What is the HTTP method used for retrieving a template? + - Answer: POST + +2. What are the required parameters for the Get Template endpoint? + - Answer: TemplatePath (and optionally StorageName if not using the default storage) + +3. What can you do with a retrieved template? 
+ - Answer: Use it in Parse operations to extract data from documents + +## What You've Learned + +In this tutorial, you've learned how to: +- Retrieve templates from your GroupDocs.Parser Cloud storage +- Understand the structure of retrieved templates +- Implement template retrieval using cURL, C#, and Java +- Use retrieved templates in your document parsing workflow +- Troubleshoot common template retrieval issues + +## Further Practice + +To reinforce your learning, try these exercises: +1. Retrieve templates from different storage locations +2. Create a template library by retrieving and listing all available templates +3. Implement error handling for template retrieval in your application +4. Create a workflow that retrieves a template, checks if it exists, and creates it if it doesn't + +## Next Steps + +Now that you know how to retrieve templates, continue your learning journey with our next tutorial: [Deleting Unused Templates]({{< ref "/template-operations/delete-template" >}}). + +## Helpful Resources + +- [Product Page](https://products.groupdocs.cloud/parser/) +- [Documentation](https://docs.groupdocs.cloud/parser/) +- [Live Demo](https://products.groupdocs.app/parser/family) +- [API Reference](https://reference.groupdocs.cloud/parser/) +- [Blog](https://blog.groupdocs.cloud/categories/groupdocs.parser-cloud-product-family/) +- [Free Support](https://forum.groupdocs.cloud/c/parser/19/) +- [Free Trial](https://dashboard.groupdocs.cloud/#/apps) + +Have questions about this tutorial? Feel free to ask in our [support forum](https://forum.groupdocs.cloud/c/parser/19/). We're here to help you master template-based document parsing!