From d5d7b607403835b0f31eb8ccf9092f0473681728 Mon Sep 17 00:00:00 2001
From: Nathan Zimmerberg <39104088+nhz2@users.noreply.github.com>
Date: Fri, 20 Sep 2024 17:45:14 -0400
Subject: [PATCH 1/3] Add example of streaming only one file in the archive

---
 README.md | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/README.md b/README.md
index e907db2..52945f8 100644
--- a/README.md
+++ b/README.md
@@ -87,6 +87,22 @@ ZipWriter(filename) do w
 end
 ```
 
+### Streaming one entry in a large archive file
+If your archive is in a file, `mmap` can be used to treat the file as a `Vector{UInt8}`
+
+An entry can also be opened as an `IO` stream.
+
+```julia
+using ZipArchives: ZipReader, zip_openentry
+using Downloads: download
+using Mmap: mmap
+zip_file_path = download("https://github.com/JuliaIO/ZipArchives.jl/archive/refs/heads/main.zip");
+archive = ZipReader(mmap(open(zip_file_path)))
+readme_n_lines = zip_openentry(archive, "ZipArchives.jl-main/README.md") do io
+    countlines(io)
+end
+```
+
 ### Supported Compression Methods
 
 | Compression Method | Reading | Writing |

From 53a9bf1a2383f3e5577394e7bd5ba593344c4757 Mon Sep 17 00:00:00 2001
From: Nathan Zimmerberg <39104088+nhz2@users.noreply.github.com>
Date: Sun, 22 Sep 2024 00:19:00 -0400
Subject: [PATCH 2/3] Other readme fixes

---
 README.md | 34 +++++++++++++++++-----------------
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/README.md b/README.md
index 52945f8..7042b49 100644
--- a/README.md
+++ b/README.md
@@ -5,12 +5,12 @@
 [![codecov](https://codecov.io/gh/JuliaIO/ZipArchives.jl/branch/main/graph/badge.svg?token=K3J0T9BZ42)](https://codecov.io/gh/JuliaIO/ZipArchives.jl)
 [![Aqua QA](https://raw.githubusercontent.com/JuliaTesting/Aqua.jl/master/badge.svg)](https://github.com/JuliaTesting/Aqua.jl)
 
-Read and write Zip archives in julia.
+Read and write Zip archives in Julia.
 Like Tar.jl, it is designed to use the Zip format to share data between multiple computers, not to backup a directory and preserve all local filesystem metadata.
 
-All public functions are exported, non exported functions and struct fields are internal.
+All public functions are exported. Non-exported functions and struct fields are internal.
 
 See [test/test_simple-usage.jl](https://github.com/JuliaIO/ZipArchives.jl/blob/main/test/test_simple-usage.jl) for more examples.
 
@@ -20,14 +20,14 @@ See [test/test_simple-usage.jl](https://github.com/JuliaIO/ZipArchives.jl/blob/m
 An archive contains a list of named entries.
 These entries represent archived files or empty directories.
 Internally there is no file system like tree structure; however,
-the entry name may have "/"s to represented a relative path.
+the entry name may have "/"s to represent a relative path.
 
-At the end of the archive there is a "central directory" of all entry names, sizes,
+At the end of the archive, there is a "central directory" of all entry names, sizes,
 and other metadata.
 The central directory gets parsed first when reading an archive.
 
-The central directory makes it fast to read just one random entry out of a very large archive.
+The central directory makes reading just one random entry out of a large archive fast.
 
 When writing it is important to close the writer so the central directory gets written out.
 
@@ -37,7 +37,7 @@ More details on the file format can be found at https://pkware.cachefly.net/webd
 
 Archives can be read from any `AbstractVector{UInt8}` containing the data of a zip archive.
 
-For example if you download this repo as a ".zip" from github https://github.com/JuliaIO/ZipArchives.jl/archive/refs/heads/main.zip you can read this README in julia.
+For example, if you download this repo as a ".zip" from GitHub https://github.com/JuliaIO/ZipArchives.jl/archive/refs/heads/main.zip you can read this README in julia.
 
 ```julia
 using ZipArchives: ZipReader, zip_names, zip_readentry
@@ -64,9 +64,9 @@ using Test: @test_throws
 filename = tempname()
 ```
 Open a new zip file with `ZipWriter`
-If a file already exists at filename, it will be replaced.
+If a file already exists at `filename`, it will be replaced.
 Using the do syntax ensures the file will be closed.
-Otherwise make sure to close the ZipWriter to finish writing the file.
+Otherwise, make sure to close the ZipWriter to finish writing the file.
 
 ```julia
 ZipWriter(filename) do w
@@ -90,7 +90,7 @@ end
 ### Streaming one entry in a large archive file
 If your archive is in a file, `mmap` can be used to treat the file as a `Vector{UInt8}`
 
-An entry can also be opened as an `IO` stream.
+An entry in the archive can be opened as an `IO` stream using `zip_openentry`.
 
 ```julia
 using ZipArchives: ZipReader, zip_openentry
@@ -116,7 +116,7 @@ end
 1. Cannot directly extract all files in an archive and write those files to disk.
 1. Ignores time stamps.
 1. Cannot write an archive fully in streaming mode. See https://github.com/madler/zipflow if you need this functionality.
-1. Encryption and decryption not supported.
+1. Encryption and decryption are not supported.
 1. Multi disk archives not supported.
 1. Cannot recover data from a corrupted archive. Especially if the end of the archive is corrupted.
 
@@ -131,12 +131,12 @@ It is just a wrapper of p7zip, and must be run as an external program.
 
 ZipFile is very similar to ZipArchives at a high level.
 
-Currently ZipArchives has the following benefits over ZipFile:
+Currently, ZipArchives has the following benefits over ZipFile:
 1. Full ZIP64 support: archives larger than 4GB can be written.
 2. UTF-8 file name support: entry names correctly mark that they are UTF-8.
-3. Safe multi threaded reading of different entries in a single archive.
+3. Safe multi-threaded reading of different entries in a single archive.
 4. Files can be marked as executable.
 Permissions are handled like in https://github.com/JuliaIO/Tar.jl#permissions
-5. By default when writing an archive, entry names are checked to avoid some common issues if the archive would be extracted on windows.
+5. By default when writing an archive, entry names are checked to avoid some common issues if the archive is extracted on Windows.
 6. Ability to append to an existing zip archive, in an `IO` or in a file on disk.
 
 ZipArchives currently has the following limitations compared to ZipFile:
@@ -148,11 +148,11 @@ ZipArchives currently has the following limitations compared to ZipFile:
 
 
 
-## Is there a unzip function for a whole archive?
+## Is there an unzip function for a whole archive?
 
 This package cannot unzip a whole archive to disk with a single function.
-This is quite complicated to do in a cross platform manner that also handles all potential errors or malicious zip archives in a safe way.
+This is quite complicated to do in a cross-platform manner that also handles all potential errors or malicious zip archives safely.
 
-So this could be done in a separate package that depends on this package. Or using existing well tested C libraries such as `p7zip_jll`
+So this could be done in a separate package that depends on this package. Or using existing well-tested C libraries such as `p7zip_jll`
 
-I am happy to add other high level functions for creating zip archives to this package.
+I am happy to add other high-level functions for creating zip archives to this package.

From d58e5dfc289ac2f9193ac5e44c1333e4ce296fbf Mon Sep 17 00:00:00 2001
From: Nathan Zimmerberg <39104088+nhz2@users.noreply.github.com>
Date: Sun, 22 Sep 2024 00:29:33 -0400
Subject: [PATCH 3/3] more typo fixes

---
 README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 7042b49..238e02e 100644
--- a/README.md
+++ b/README.md
@@ -35,7 +35,7 @@ More details on the file format can be found at https://pkware.cachefly.net/webd
 
 ### Reading Zip archives
 
-Archives can be read from any `AbstractVector{UInt8}` containing the data of a zip archive.
+Archives can be read from any `AbstractVector{UInt8}` containing the data of a Zip archive.
 
 For example, if you download this repo as a ".zip" from GitHub https://github.com/JuliaIO/ZipArchives.jl/archive/refs/heads/main.zip you can read this README in julia.
 
@@ -88,7 +88,7 @@ end
 ```
 
 ### Streaming one entry in a large archive file
-If your archive is in a file, `mmap` can be used to treat the file as a `Vector{UInt8}`
+If your archive is in a file, `mmap` can be used to treat the file as a `Vector{UInt8}`.
 
 An entry in the archive can be opened as an `IO` stream using `zip_openentry`.
 
@@ -151,7 +151,7 @@ ZipArchives currently has the following limitations compared to ZipFile:
 ## Is there an unzip function for a whole archive?
 
 This package cannot unzip a whole archive to disk with a single function.
-This is quite complicated to do in a cross-platform manner that also handles all potential errors or malicious zip archives safely.
+This is quite complicated to do in a cross-platform manner that also handles all potential errors or malicious Zip archives safely.
 
 So this could be done in a separate package that depends on this package. Or using existing well-tested C libraries such as `p7zip_jll`
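
Note on the streaming example from PATCH 1/3: it depends on a network download, which makes it awkward to try offline. The following is only a sketch of the same `mmap` plus `zip_openentry` pattern on a locally written archive, not part of the patch. It assumes `zip_newfile` from ZipArchives (exported by the package, though not shown in these hunks), and the entry name `docs/notes.txt` and its contents are hypothetical.

```julia
using ZipArchives: ZipWriter, ZipReader, zip_newfile, zip_names, zip_openentry
using Mmap: mmap

# Write a small archive to a temporary file so no download is needed.
# The do syntax closes the writer, which writes out the central directory.
zip_file_path = tempname()
ZipWriter(zip_file_path) do w
    zip_newfile(w, "docs/notes.txt")  # hypothetical entry name
    write(w, "first line\nsecond line\nthird line\n")
end

# Memory-map the file so ZipReader can treat it as a Vector{UInt8},
# mirroring the README example in the patch.
archive = ZipReader(mmap(open(zip_file_path)))

# Stream a single entry as an IO and count its lines.
n_lines = zip_openentry(archive, "docs/notes.txt") do io
    countlines(io)
end
```

Here `n_lines` should equal the number of newline-terminated lines written to the entry, and `zip_names(archive)` lists the single entry.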