diff --git a/README.md b/README.md
index e907db2..238e02e 100644
--- a/README.md
+++ b/README.md
@@ -5,12 +5,12 @@
 [![codecov](https://codecov.io/gh/JuliaIO/ZipArchives.jl/branch/main/graph/badge.svg?token=K3J0T9BZ42)](https://codecov.io/gh/JuliaIO/ZipArchives.jl)
 [![Aqua QA](https://raw.githubusercontent.com/JuliaTesting/Aqua.jl/master/badge.svg)](https://github.com/JuliaTesting/Aqua.jl)
-Read and write Zip archives in julia.
+Read and write Zip archives in Julia.
 Like Tar.jl, it is designed to use the Zip format to share data between multiple computers, not to backup a directory and preserve all local filesystem metadata.
-All public functions are exported, non exported functions and struct fields are internal.
+All public functions are exported. Non-exported functions and struct fields are internal.
 See [test/test_simple-usage.jl](https://github.com/JuliaIO/ZipArchives.jl/blob/main/test/test_simple-usage.jl) for more examples.
@@ -20,14 +20,14 @@ See [test/test_simple-usage.jl](https://github.com/JuliaIO/ZipArchives.jl/blob/m
 An archive contains a list of named entries.
 These entries represent archived files or empty directories.
 Internally there is no file system like tree structure; however,
-the entry name may have "/"s to represented a relative path.
+the entry name may have "/"s to represent a relative path.
-At the end of the archive there is a "central directory" of all entry names, sizes,
+At the end of the archive, there is a "central directory" of all entry names, sizes,
 and other metadata.
 The central directory gets parsed first when reading an archive.
-The central directory makes it fast to read just one random entry out of a very large archive.
+The central directory makes reading just one random entry out of a large archive fast.
 When writing it is important to close the writer so the central directory gets written out.
@@ -35,9 +35,9 @@ More details on the file format can be found at https://pkware.cachefly.net/webd
 ### Reading Zip archives
-Archives can be read from any `AbstractVector{UInt8}` containing the data of a zip archive.
+Archives can be read from any `AbstractVector{UInt8}` containing the data of a Zip archive.
-For example if you download this repo as a ".zip" from github https://github.com/JuliaIO/ZipArchives.jl/archive/refs/heads/main.zip you can read this README in julia.
+For example, if you download this repo as a ".zip" from GitHub https://github.com/JuliaIO/ZipArchives.jl/archive/refs/heads/main.zip you can read this README in julia.
 ```julia
 using ZipArchives: ZipReader, zip_names, zip_readentry
@@ -64,9 +64,9 @@ using Test: @test_throws
 filename = tempname()
 ```
 Open a new zip file with `ZipWriter`
-If a file already exists at filename, it will be replaced.
+If a file already exists at `filename`, it will be replaced.
 Using the do syntax ensures the file will be closed.
-Otherwise make sure to close the ZipWriter to finish writing the file.
+Otherwise, make sure to close the ZipWriter to finish writing the file.
 ```julia
 ZipWriter(filename) do w
@@ -87,6 +87,22 @@ ZipWriter(filename) do w
 end
 ```
+### Streaming one entry in a large archive file
+If your archive is in a file, `mmap` can be used to treat the file as a `Vector{UInt8}`.
+
+An entry in the archive can be opened as an `IO` stream using `zip_openentry`.
+
+```julia
+using ZipArchives: ZipReader, zip_openentry
+using Downloads: download
+using Mmap: mmap
+zip_file_path = download("https://github.com/JuliaIO/ZipArchives.jl/archive/refs/heads/main.zip");
+archive = ZipReader(mmap(open(zip_file_path)))
+readme_n_lines = zip_openentry(archive, "ZipArchives.jl-main/README.md") do io
+    countlines(io)
+end
+```
+
 ### Supported Compression Methods
 | Compression Method | Reading | Writing |
@@ -100,7 +116,7 @@ end
 1. Cannot directly extract all files in an archive and write those files to disk.
 1. Ignores time stamps.
 1. Cannot write an archive fully in streaming mode. See https://github.com/madler/zipflow if you need this functionality.
-1. Encryption and decryption not supported.
+1. Encryption and decryption are not supported.
 1. Multi disk archives not supported.
 1. Cannot recover data from a corrupted archive. Especially if the end of the archive is corrupted.
@@ -115,12 +131,12 @@ It is just a wrapper of p7zip, and must be run as an external program.
 ZipFile is very similar to ZipArchives at a high level.
-Currently ZipArchives has the following benefits over ZipFile:
+Currently, ZipArchives has the following benefits over ZipFile:
 1. Full ZIP64 support: archives larger than 4GB can be written.
 2. UTF-8 file name support: entry names correctly mark that they are UTF-8.
-3. Safe multi threaded reading of different entries in a single archive.
+3. Safe multi-threaded reading of different entries in a single archive.
 4. Files can be marked as executable. Permissions are handled like in https://github.com/JuliaIO/Tar.jl#permissions
-5. By default when writing an archive, entry names are checked to avoid some common issues if the archive would be extracted on windows.
+5. By default when writing an archive, entry names are checked to avoid some common issues if the archive is extracted on Windows.
 6. Ability to append to an existing zip archive, in an `IO` or in a file on disk.
 ZipArchives currently has the following limitations compared to ZipFile:
@@ -132,11 +148,11 @@ ZipArchives currently has the following limitations compared to ZipFile:
-## Is there a unzip function for a whole archive?
+## Is there an unzip function for a whole archive?
 This package cannot unzip a whole archive to disk with a single function.
-This is quite complicated to do in a cross platform manner that also handles all potential errors or malicious zip archives in a safe way.
+This is quite complicated to do in a cross-platform manner that also handles all potential errors or malicious Zip archives safely.
-So this could be done in a separate package that depends on this package. Or using existing well tested C libraries such as `p7zip_jll`
+So this could be done in a separate package that depends on this package. Or using existing well-tested C libraries such as `p7zip_jll`
-I am happy to add other high level functions for creating zip archives to this package.
+I am happy to add other high-level functions for creating zip archives to this package.