diff --git a/README.md b/README.md
index 01d7bb5ff..55ae71515 100644
--- a/README.md
+++ b/README.md
@@ -151,11 +151,11 @@ Install-Package ManagedCode.MarkItDown
 dotnet add package ManagedCode.MarkItDown
 
 # PackageReference (add to your .csproj)
-
+
 ```
 
 ### Prerequisites
-- .NET 9.0 SDK or later
+- .NET 9.0 SDK or later (project targets net9.0)
 - Compatible with .NET 9 apps and libraries
 
 ### Optional Dependencies for Advanced Features
@@ -219,21 +219,17 @@ Console.WriteLine(urlResult.Title);
 ### Customise the pipeline with options
 
 ```csharp
-using Azure;
 using MarkItDown;
 
 var options = new MarkItDownOptions
 {
-    // Plug in your own services (Azure AI, OpenAI, etc.)
+    // Plug in your own services (custom image captioning, audio transcription, etc.)
     ImageCaptioner = async (bytes, info, token) =>
         await myCaptionService.DescribeAsync(bytes, info, token),
     AudioTranscriber = async (bytes, info, token) =>
         await speechClient.TranscribeAsync(bytes, info, token),
-    DocumentIntelligence = new DocumentIntelligenceOptions
-    {
-        Endpoint = "https://.cognitiveservices.azure.com/",
-        Credential = new AzureKeyCredential("")
-    }
+    // Note: Azure Document Intelligence integration is planned but not yet implemented
+    ExifToolPath = "/usr/local/bin/exiftool"
 };
 
 var markItDown = new MarkItDown(options);
@@ -309,20 +305,20 @@ markItDown.RegisterConverter(new MyCustomConverter());
 git clone https://github.com/managedcode/markitdown.git
 cd markitdown
 
-# Build the solution
-dotnet build
+# Build the solution (requires .NET 9 SDK for .slnx support)
+dotnet build src/MarkItDown/MarkItDown.csproj
 
 # Run tests
-dotnet test
+dotnet test tests/MarkItDown.Tests/MarkItDown.Tests.csproj
 
 # Create NuGet package
-dotnet pack --configuration Release
+dotnet pack src/MarkItDown/MarkItDown.csproj --configuration Release
 ```
 
 ### Tests & Coverage
 
 ```bash
-dotnet test --collect:"XPlat Code Coverage"
+dotnet test tests/MarkItDown.Tests/MarkItDown.Tests.csproj --collect:"XPlat Code Coverage"
 ```
 
 The command emits standard test results plus a Cobertura coverage report at
@@ -334,13 +330,12 @@ HTML or Markdown dashboards.
 
 ```
 ├── src/
-│   ├── MarkItDown/                # Core library
-│   │   ├── Converters/            # Format-specific converters (HTML, PDF, audio, etc.)
-│   │   ├── MarkItDown.cs          # Main conversion engine
-│   │   ├── StreamInfoGuesser.cs   # MIME/charset/extension detection helpers
-│   │   ├── MarkItDownOptions.cs   # Runtime configuration flags
-│   │   └── ...                    # Shared utilities (UriUtilities, MimeMapping, etc.)
-│   └── MarkItDown.Cli/            # CLI host (under active development)
+│   └── MarkItDown/                # Core library
+│       ├── Converters/            # Format-specific converters (HTML, PDF, audio, etc.)
+│       ├── MarkItDown.cs          # Main conversion engine
+│       ├── StreamInfoGuesser.cs   # MIME/charset/extension detection helpers
+│       ├── MarkItDownOptions.cs   # Runtime configuration flags
+│       └── ...                    # Shared utilities (UriUtilities, MimeMapping, etc.)
 ├── tests/
 │   └── MarkItDown.Tests/          # xUnit + Shouldly tests, Python parity vectors (WIP)
 ├── Directory.Build.props          # Shared build + packaging settings
@@ -359,9 +354,9 @@ HTML or Markdown dashboards.
 ## 🗺️ Roadmap
 
 ### 🎯 Near-Term
-- Azure Document Intelligence converter (options already scaffolded)
+- Azure Document Intelligence converter (options already scaffolded, implementation pending)
+- CLI tool for command-line usage
 - Outlook `.msg` ingestion via MIT-friendly dependencies
-- Expanded CLI commands (batch mode, globbing, JSON output)
 - Richer regression suite mirroring Python test vectors
 
 ### 🎯 Future Ideas