From 7c1f6a1ad609ecd33ceda5655cd8fc02137f3e5e Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Tue, 25 Oct 2022 10:52:08 -0400 Subject: [PATCH 01/15] config: add config values for packed-refs v2 When updating the file format version for something as critical as ref storage, the file format version must come with an extension change. The extensions.refFormat config value is a multi-valued config value that defaults to the pair "files" and "packed". Add "packed-v2" as a possible value to extensions.refFormat. This value specifies that the packed-refs file may exist in the version 2 format. (If the "packed" value does not exist, then the packed-refs file must exist in version 2, not version 1.) In order to select version 2 for writing, the user will have two options. First, the user could remove "packed" and add "packed-v2" to the extensions.refFormat list. This would imply that version 2 is the only format available. However, this also means that version 1 files would be ignored at read time, so this does not allow users to upgrade repositories with existing packed-refs files. Add a new refs.packedRefsVersion config option which allows specifying which version to use during writes. Thus, when both "packed" and "packed-v2" are in the extensions.refFormat list, the user can upgrade from version 1 to version 2, or downgrade from 2 to 1. Currently, the implementation does not use refs.packedRefsVersion, as that is delayed until we have the code to write that file format version. However, we can add the necessary enum values and flag constants to communicate the presence of "packed-v2" in the extensions.refFormat list. Signed-off-by: Derrick Stolee --- Documentation/config.txt | 2 ++ Documentation/config/extensions.txt | 27 ++++++++++++++++++++++----- Documentation/config/refs.txt | 13 +++++++++++++ refs.c | 4 +++- refs/packed-backend.c | 17 ++++++++++++++++- refs/refs-internal.h | 5 +++-- repository.h | 1 + setup.c | 2 ++ t/t3212-ref-formats.sh | 19 +++++++++++++++++++ 9 files changed, 81 insertions(+), 9 deletions(-) create mode 100644 Documentation/config/refs.txt diff --git a/Documentation/config.txt b/Documentation/config.txt index 0e93aef86264db..e480f99c3e16bc 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -493,6 +493,8 @@ include::config/rebase.txt[] include::config/receive.txt[] +include::config/refs.txt[] + include::config/remote.txt[] include::config/remotes.txt[] diff --git a/Documentation/config/extensions.txt b/Documentation/config/extensions.txt index 18071c336d0f42..05abb821e0773f 100644 --- a/Documentation/config/extensions.txt +++ b/Documentation/config/extensions.txt @@ -35,17 +35,34 @@ indicate the existence of different layers: `files`, the `packed` format will only be used to group multiple loose object files upon request via the `git pack-refs` command or via the `pack-refs` maintenance task. + +`packed-v2`;; + When present, references may be stored as a group in a + `packed-refs` file in its version 2 format. This file is in the + same position and interacts with loose refs the same as when the + `packed` value exists. Both `packed` and `packed-v2` must exist to + upgrade an existing `packed-refs` file from version 1 to version 2 + or to downgrade from version 2 to version 1. When both are + present, the `refs.packedRefsVersion` config value indicates which + file format version is used during writes, but both versions are + understood when reading the file. -- + The following combinations are supported by this version of Git: + -- -`files` and `packed`;; +`files` and (`packed` and/or `packed-v2`);; This set of values indicates that references are stored both as - loose reference files and in the `packed-refs` file in its v1 - format. Loose references are preferred, and the `packed-refs` file - is updated only when deleting a reference that is stored in the - `packed-refs` file or during a `git pack-refs` command. + loose reference files and in the `packed-refs` file. Loose + references are preferred, and the `packed-refs` file is updated + only when deleting a reference that is stored in the `packed-refs` + file or during a `git pack-refs` command. ++ +The presence of `packed` and `packed-v2` specifies whether the `packed-refs` +file is allowed to be in its v1 or v2 formats, respectively. When only one +is present, Git will refuse to read the `packed-refs` file that do not +match the expected format. When both are present, the `refs.packedRefsVersion` +config option indicates which file format is used during writes. `files`;; When only this value is present, Git will ignore the `packed-refs` diff --git a/Documentation/config/refs.txt b/Documentation/config/refs.txt new file mode 100644 index 00000000000000..b2fdb2923f7652 --- /dev/null +++ b/Documentation/config/refs.txt @@ -0,0 +1,13 @@ +refs.packedRefsVersion:: + Specifies the file format version to use when writing a `packed-refs` + file. Defaults to `1`. ++ +The only other value currently allowed is `2`, which uses a structured file +format to result in a smaller `packed-refs` file. In order to write this +file format version, the repository must also have the `packed-v2` extension +enabled. The most typical setup will include the +`core.repositoryFormatVersion=1` config value and the `extensions.refFormat` +key will have three values: `files`, `packed`, and `packed-v2`. ++ +If `extensions.refFormat` has the value `packed-v2` and not `packed`, then +`refs.packedRefsVersion` defaults to `2`. diff --git a/refs.c b/refs.c index 21441ddb1620a8..bf53d1445f28ee 100644 --- a/refs.c +++ b/refs.c @@ -1987,6 +1987,8 @@ static int add_ref_format_flags(enum ref_format_flags flags, int caps) { caps |= REF_STORE_FORMAT_FILES; if (flags & REF_FORMAT_PACKED) caps |= REF_STORE_FORMAT_PACKED; + if (flags & REF_FORMAT_PACKED_V2) + caps |= REF_STORE_FORMAT_PACKED_V2; return caps; } @@ -2006,7 +2008,7 @@ static struct ref_store *ref_store_init(struct repository *repo, flags = add_ref_format_flags(repo->ref_format, flags); if (!(flags & REF_STORE_FORMAT_FILES) && - (flags & REF_STORE_FORMAT_PACKED)) + packed_refs_enabled(flags)) be_name = "packed"; be = find_ref_storage_backend(be_name); diff --git a/refs/packed-backend.c b/refs/packed-backend.c index 7ed9475812c81d..655aab939bea0d 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -236,7 +236,13 @@ static struct snapshot *create_snapshot(struct packed_ref_store *refs) if (!load_contents(snapshot)) return snapshot; - if (parse_packed_format_v1_header(refs, snapshot, &sorted)) { + /* + * If this is a v1 file format, but we don't have v1 enabled, + * then ignore it the same way we would as if we didn't + * understand it. + */ + if (parse_packed_format_v1_header(refs, snapshot, &sorted) || + !(refs->store_flags & REF_STORE_FORMAT_PACKED)) { clear_snapshot(refs); return NULL; } @@ -310,6 +316,12 @@ static int packed_read_raw_ref(struct ref_store *ref_store, const char *refname, packed_downcast(ref_store, REF_STORE_READ, "read_raw_ref"); struct snapshot *snapshot = get_snapshot(refs); + if (!snapshot) { + /* refname is not a packed reference. */ + *failure_errno = ENOENT; + return -1; + } + return packed_read_raw_ref_v1(refs, snapshot, refname, oid, type, failure_errno); } @@ -410,6 +422,9 @@ static struct ref_iterator *packed_ref_iterator_begin( */ snapshot = get_snapshot(refs); + if (!snapshot) + return empty_ref_iterator_begin(); + if (prefix && *prefix) start = find_reference_location_v1(snapshot, prefix, 0); else diff --git a/refs/refs-internal.h b/refs/refs-internal.h index a1900848a878e4..39b93fce97c993 100644 --- a/refs/refs-internal.h +++ b/refs/refs-internal.h @@ -522,11 +522,12 @@ struct ref_store; REF_STORE_MAIN) #define REF_STORE_FORMAT_FILES (1 << 8) /* can use loose ref files */ -#define REF_STORE_FORMAT_PACKED (1 << 9) /* can use packed-refs file */ +#define REF_STORE_FORMAT_PACKED (1 << 9) /* can use v1 packed-refs file */ +#define REF_STORE_FORMAT_PACKED_V2 (1 << 10) /* can use v2 packed-refs file */ static inline int packed_refs_enabled(int flags) { - return flags & REF_STORE_FORMAT_PACKED; + return flags & (REF_STORE_FORMAT_PACKED | REF_STORE_FORMAT_PACKED_V2); } /* diff --git a/repository.h b/repository.h index 5cfde4282c50c0..ee3a90efc72b72 100644 --- a/repository.h +++ b/repository.h @@ -64,6 +64,7 @@ struct repo_path_cache { enum ref_format_flags { REF_FORMAT_FILES = (1 << 0), REF_FORMAT_PACKED = (1 << 1), + REF_FORMAT_PACKED_V2 = (1 << 2), }; struct repository { diff --git a/setup.c b/setup.c index a5e63479558a94..72bfa289ade820 100644 --- a/setup.c +++ b/setup.c @@ -582,6 +582,8 @@ static enum extension_result handle_extension(const char *var, data->ref_format |= REF_FORMAT_FILES; else if (!strcmp(value, "packed")) data->ref_format |= REF_FORMAT_PACKED; + else if (!strcmp(value, "packed-v2")) + data->ref_format |= REF_FORMAT_PACKED_V2; else return error(_("invalid value for '%s': '%s'"), "extensions.refFormat", value); diff --git a/t/t3212-ref-formats.sh b/t/t3212-ref-formats.sh index 67aa65c116f642..cd1b399bbb8e02 100755 --- a/t/t3212-ref-formats.sh +++ b/t/t3212-ref-formats.sh @@ -56,4 +56,23 @@ test_expect_success 'extensions.refFormat=files only' ' ) ' +test_expect_success 'extensions.refFormat=files,packed-v2' ' + test_commit Q && + git pack-refs --all && + git init no-packed-v1 && + ( + cd no-packed-v1 && + git config core.repositoryFormatVersion 1 && + git config extensions.refFormat files && + git config --add extensions.refFormat packed-v2 && + test_commit A && + test_commit B && + + # Refuse to parse a v1 packed-refs file. + cp ../.git/packed-refs .git/packed-refs && + test_must_fail git rev-parse refs/tags/Q && + rm -f .git/packed-refs + ) +' + test_done From eb6152f1d39310b2e0527bbe199e25b941744753 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Tue, 25 Oct 2022 11:35:37 -0400 Subject: [PATCH 02/15] packed-backend: create shell of v2 writes Signed-off-by: Derrick Stolee --- Makefile | 1 + refs/packed-backend.c | 75 +++++++++++++++++++++++++++++++++++------ refs/packed-backend.h | 7 ++++ refs/packed-format-v2.c | 38 +++++++++++++++++++++ 4 files changed, 110 insertions(+), 11 deletions(-) create mode 100644 refs/packed-format-v2.c diff --git a/Makefile b/Makefile index 3dc887941d44bb..16cd245e0adfc9 100644 --- a/Makefile +++ b/Makefile @@ -1058,6 +1058,7 @@ LIB_OBJS += refs/files-backend.o LIB_OBJS += refs/iterator.o LIB_OBJS += refs/packed-backend.o LIB_OBJS += refs/packed-format-v1.o +LIB_OBJS += refs/packed-format-v2.o LIB_OBJS += refs/ref-cache.o LIB_OBJS += refspec.o LIB_OBJS += remote.o diff --git a/refs/packed-backend.c b/refs/packed-backend.c index 655aab939bea0d..09f7b74584f462 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -692,6 +692,45 @@ int merge_iterator_and_updates(struct packed_ref_store *refs, return ok; } +static int write_with_updates_v1(struct packed_ref_store *refs, + struct string_list *updates, + struct strbuf *err) +{ + FILE *out; + + out = fdopen_tempfile(refs->tempfile, "w"); + if (!out) { + strbuf_addf(err, "unable to fdopen packed-refs tempfile: %s", + strerror(errno)); + goto error; + } + + if (write_packed_file_header_v1(out) < 0) { + add_write_error(refs, err); + goto error; + } + + return merge_iterator_and_updates(refs, updates, err, + write_packed_entry_v1, out); + +error: + return -1; +} + +static int write_with_updates_v2(struct packed_ref_store *refs, + struct string_list *updates, + struct strbuf *err) +{ + struct write_packed_refs_v2_context *ctx = create_v2_context(refs, updates, err); + int ok = -1; + + if ((ok = write_packed_refs_v2(ctx)) < 0) + add_write_error(refs, err); + + free_v2_context(ctx); + return ok; +} + /* * Write the packed refs from the current snapshot to the packed-refs * tempfile, incorporating any changes from `updates`. `updates` must @@ -707,9 +746,9 @@ static int write_with_updates(struct packed_ref_store *refs, struct strbuf *err) { int ok; - FILE *out; struct strbuf sb = STRBUF_INIT; char *packed_refs_path; + int version; if (!is_lock_file_locked(&refs->lock)) BUG("write_with_updates() called while unlocked"); @@ -731,21 +770,35 @@ static int write_with_updates(struct packed_ref_store *refs, } strbuf_release(&sb); - out = fdopen_tempfile(refs->tempfile, "w"); - if (!out) { - strbuf_addf(err, "unable to fdopen packed-refs tempfile: %s", - strerror(errno)); - goto error; + if (git_config_get_int("refs.packedrefsversion", &version)) { + /* + * Set the default depending on the current extension + * list. Default to version 1 if available, but allow a + * default of 2 if only "packed-v2" exists. + */ + if (refs->store_flags & REF_STORE_FORMAT_PACKED) + version = 1; + else if (refs->store_flags & REF_STORE_FORMAT_PACKED_V2) + version = 2; + else + BUG("writing a packed-refs file without an extension"); } - if (write_packed_file_header_v1(out) < 0) { - add_write_error(refs, err); + switch (version) { + case 1: + ok = write_with_updates_v1(refs, updates, err); + break; + + case 2: + ok = write_with_updates_v2(refs, updates, err); + break; + + default: + strbuf_addf(err, "unknown packed-refs version: %d", + version); goto error; } - ok = merge_iterator_and_updates(refs, updates, err, - write_packed_entry_v1, out); - if (ok != ITER_DONE) { strbuf_addstr(err, "unable to write packed-refs file: " "error iterating over old contents"); diff --git a/refs/packed-backend.h b/refs/packed-backend.h index b6908bb002c153..e76f26bfc4690d 100644 --- a/refs/packed-backend.h +++ b/refs/packed-backend.h @@ -243,4 +243,11 @@ int write_packed_entry_v1(const char *refname, const struct object_id *peeled, void *write_data); +struct write_packed_refs_v2_context; +struct write_packed_refs_v2_context *create_v2_context(struct packed_ref_store *refs, + struct string_list *updates, + struct strbuf *err); +int write_packed_refs_v2(struct write_packed_refs_v2_context *ctx); +void free_v2_context(struct write_packed_refs_v2_context *ctx); + #endif /* REFS_PACKED_BACKEND_H */ diff --git a/refs/packed-format-v2.c b/refs/packed-format-v2.c new file mode 100644 index 00000000000000..ecf3cc93694316 --- /dev/null +++ b/refs/packed-format-v2.c @@ -0,0 +1,38 @@ +#include "../cache.h" +#include "../config.h" +#include "../refs.h" +#include "refs-internal.h" +#include "packed-backend.h" +#include "../iterator.h" +#include "../lockfile.h" +#include "../chdir-notify.h" + +struct write_packed_refs_v2_context { + struct packed_ref_store *refs; + struct string_list *updates; + struct strbuf *err; +}; + +struct write_packed_refs_v2_context *create_v2_context(struct packed_ref_store *refs, + struct string_list *updates, + struct strbuf *err) +{ + struct write_packed_refs_v2_context *ctx; + CALLOC_ARRAY(ctx, 1); + + ctx->refs = refs; + ctx->updates = updates; + ctx->err = err; + + return ctx; +} + +int write_packed_refs_v2(struct write_packed_refs_v2_context *ctx) +{ + return 0; +} + +void free_v2_context(struct write_packed_refs_v2_context *ctx) +{ + free(ctx); +} From 740c2f6e6d1e628a84dc4e1927fef70b5d8d624c Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Tue, 25 Oct 2022 13:00:40 -0400 Subject: [PATCH 03/15] packed-refs: write file format version 2 TODO: add writing tests. Signed-off-by: Derrick Stolee --- refs/packed-backend.c | 3 +- refs/packed-format-v2.c | 108 ++++++++++++++++++++++++++++++++++++++++ t/t3212-ref-formats.sh | 6 ++- 3 files changed, 115 insertions(+), 2 deletions(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index 09f7b74584f462..3429e63620a764 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -790,7 +790,8 @@ static int write_with_updates(struct packed_ref_store *refs, break; case 2: - ok = write_with_updates_v2(refs, updates, err); + /* Convert the normal error codes to ITER_DONE. */ + ok = write_with_updates_v2(refs, updates, err) ? -2 : ITER_DONE; break; default: diff --git a/refs/packed-format-v2.c b/refs/packed-format-v2.c index ecf3cc93694316..044cc9f629ac02 100644 --- a/refs/packed-format-v2.c +++ b/refs/packed-format-v2.c @@ -6,11 +6,30 @@ #include "../iterator.h" #include "../lockfile.h" #include "../chdir-notify.h" +#include "../chunk-format.h" +#include "../csum-file.h" + +#define OFFSET_IS_PEELED (((uint64_t)1) << 63) + +#define PACKED_REFS_SIGNATURE 0x50524546 /* "PREF" */ +#define CHREFS_CHUNKID_OFFSETS 0x524F4646 /* "ROFF" */ +#define CHREFS_CHUNKID_REFS 0x52454653 /* "REFS" */ struct write_packed_refs_v2_context { struct packed_ref_store *refs; struct string_list *updates; struct strbuf *err; + + struct hashfile *f; + struct chunkfile *cf; + + /* + * As we stream the ref names to the refs chunk, store these + * values in-memory. These arrays are populated one for every ref. + */ + uint64_t *offsets; + size_t nr; + size_t offsets_alloc; }; struct write_packed_refs_v2_context *create_v2_context(struct packed_ref_store *refs, @@ -24,15 +43,104 @@ struct write_packed_refs_v2_context *create_v2_context(struct packed_ref_store * ctx->updates = updates; ctx->err = err; + if (!fdopen_tempfile(refs->tempfile, "w")) { + strbuf_addf(err, "unable to fdopen packed-refs tempfile: %s", + strerror(errno)); + return ctx; + } + + ctx->f = hashfd(refs->tempfile->fd, refs->tempfile->filename.buf); + ctx->cf = init_chunkfile(ctx->f); + return ctx; } +static int write_packed_entry_v2(const char *refname, + const struct object_id *oid, + const struct object_id *peeled, + void *write_data) +{ + struct write_packed_refs_v2_context *ctx = write_data; + size_t reflen = strlen(refname) + 1; + size_t i = ctx->nr; + + ALLOC_GROW(ctx->offsets, i + 1, ctx->offsets_alloc); + + /* Write entire ref, including null terminator. */ + hashwrite(ctx->f, refname, reflen); + hashwrite(ctx->f, oid->hash, the_hash_algo->rawsz); + if (peeled) + hashwrite(ctx->f, peeled->hash, the_hash_algo->rawsz); + + if (i) + ctx->offsets[i] = (ctx->offsets[i - 1] & (~OFFSET_IS_PEELED)); + else + ctx->offsets[i] = 0; + ctx->offsets[i] += reflen + the_hash_algo->rawsz; + + if (peeled) { + ctx->offsets[i] += the_hash_algo->rawsz; + ctx->offsets[i] |= OFFSET_IS_PEELED; + } + + ctx->nr++; + return 0; +} + +static int write_refs_chunk_refs(struct hashfile *f, + void *data) +{ + struct write_packed_refs_v2_context *ctx = data; + int ok; + + trace2_region_enter("refs", "refs-chunk", the_repository); + ok = merge_iterator_and_updates(ctx->refs, ctx->updates, ctx->err, + write_packed_entry_v2, ctx); + trace2_region_leave("refs", "refs-chunk", the_repository); + + return ok != ITER_DONE; +} + +static int write_refs_chunk_offsets(struct hashfile *f, + void *data) +{ + struct write_packed_refs_v2_context *ctx = data; + size_t i; + + trace2_region_enter("refs", "offsets", the_repository); + for (i = 0; i < ctx->nr; i++) + hashwrite_be64(f, ctx->offsets[i]); + + trace2_region_leave("refs", "offsets", the_repository); + return 0; +} + int write_packed_refs_v2(struct write_packed_refs_v2_context *ctx) { + unsigned char file_hash[GIT_MAX_RAWSZ]; + + add_chunk(ctx->cf, CHREFS_CHUNKID_REFS, 0, write_refs_chunk_refs); + add_chunk(ctx->cf, CHREFS_CHUNKID_OFFSETS, 0, write_refs_chunk_offsets); + + hashwrite_be32(ctx->f, PACKED_REFS_SIGNATURE); + hashwrite_be32(ctx->f, 2); + hashwrite_be32(ctx->f, the_hash_algo->format_id); + + if (write_chunkfile(ctx->cf, CHUNKFILE_TRAILING_TOC, ctx)) + goto failure; + + finalize_hashfile(ctx->f, file_hash, FSYNC_COMPONENT_REFERENCE, + CSUM_HASH_IN_STREAM | CSUM_FSYNC); + return 0; + +failure: + return -1; } void free_v2_context(struct write_packed_refs_v2_context *ctx) { + if (ctx->cf) + free_chunkfile(ctx->cf); free(ctx); } diff --git a/t/t3212-ref-formats.sh b/t/t3212-ref-formats.sh index cd1b399bbb8e02..03c713ac4f6e89 100755 --- a/t/t3212-ref-formats.sh +++ b/t/t3212-ref-formats.sh @@ -71,7 +71,11 @@ test_expect_success 'extensions.refFormat=files,packed-v2' ' # Refuse to parse a v1 packed-refs file. cp ../.git/packed-refs .git/packed-refs && test_must_fail git rev-parse refs/tags/Q && - rm -f .git/packed-refs + rm -f .git/packed-refs && + + # Create a v2 packed-refs file + git pack-refs --all && + test_path_exists .git/packed-refs ) ' From 701c5ad22e7787cfd27628ff613b7849e24fc675 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Tue, 25 Oct 2022 16:31:18 -0400 Subject: [PATCH 04/15] packed-refs: read file format v2 Signed-off-by: Derrick Stolee --- refs/packed-backend.c | 129 ++++++++++++++++--------- refs/packed-backend.h | 72 ++++++++++++-- refs/packed-format-v2.c | 209 ++++++++++++++++++++++++++++++++++++++++ t/t3212-ref-formats.sh | 17 +++- 4 files changed, 372 insertions(+), 55 deletions(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index 3429e63620a764..549cce1f84abbf 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -66,7 +66,7 @@ void clear_snapshot_buffer(struct snapshot *snapshot) * Decrease the reference count of `*snapshot`. If it goes to zero, * free `*snapshot` and return true; otherwise return false. */ -static int release_snapshot(struct snapshot *snapshot) +int release_snapshot(struct snapshot *snapshot) { if (!--snapshot->referrers) { stat_validity_clear(&snapshot->validity); @@ -142,7 +142,6 @@ static int load_contents(struct snapshot *snapshot) { int fd; struct stat st; - size_t size; ssize_t bytes_read; if (!packed_refs_enabled(snapshot->refs->store_flags)) @@ -168,25 +167,25 @@ static int load_contents(struct snapshot *snapshot) if (fstat(fd, &st) < 0) die_errno("couldn't stat %s", snapshot->refs->path); - size = xsize_t(st.st_size); + snapshot->buflen = xsize_t(st.st_size); - if (!size) { + if (!snapshot->buflen) { close(fd); return 0; - } else if (mmap_strategy == MMAP_NONE || size <= SMALL_FILE_SIZE) { - snapshot->buf = xmalloc(size); - bytes_read = read_in_full(fd, snapshot->buf, size); - if (bytes_read < 0 || bytes_read != size) + } else if (mmap_strategy == MMAP_NONE || snapshot->buflen <= SMALL_FILE_SIZE) { + snapshot->buf = xmalloc(snapshot->buflen); + bytes_read = read_in_full(fd, snapshot->buf, snapshot->buflen); + if (bytes_read < 0 || bytes_read != snapshot->buflen) die_errno("couldn't read %s", snapshot->refs->path); snapshot->mmapped = 0; } else { - snapshot->buf = xmmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0); + snapshot->buf = xmmap(NULL, snapshot->buflen, PROT_READ, MAP_PRIVATE, fd, 0); snapshot->mmapped = 1; } close(fd); snapshot->start = snapshot->buf; - snapshot->eof = snapshot->buf + size; + snapshot->eof = snapshot->buf + snapshot->buflen; return 1; } @@ -232,46 +231,52 @@ static struct snapshot *create_snapshot(struct packed_ref_store *refs) snapshot->refs = refs; acquire_snapshot(snapshot); snapshot->peeled = PEELED_NONE; + snapshot->version = 1; if (!load_contents(snapshot)) return snapshot; - /* - * If this is a v1 file format, but we don't have v1 enabled, - * then ignore it the same way we would as if we didn't - * understand it. - */ - if (parse_packed_format_v1_header(refs, snapshot, &sorted) || - !(refs->store_flags & REF_STORE_FORMAT_PACKED)) { - clear_snapshot(refs); - return NULL; - } + if ((refs->store_flags & REF_STORE_FORMAT_PACKED) && + !detect_packed_format_v2_header(refs, snapshot)) { + parse_packed_format_v1_header(refs, snapshot, &sorted); + snapshot->version = 1; + verify_buffer_safe_v1(snapshot); - verify_buffer_safe_v1(snapshot); + if (!sorted) { + sort_snapshot_v1(snapshot); - if (!sorted) { - sort_snapshot_v1(snapshot); + /* + * Reordering the records might have moved a short one + * to the end of the buffer, so verify the buffer's + * safety again: + */ + verify_buffer_safe_v1(snapshot); + } - /* - * Reordering the records might have moved a short one - * to the end of the buffer, so verify the buffer's - * safety again: - */ - verify_buffer_safe_v1(snapshot); + if (mmap_strategy != MMAP_OK && snapshot->mmapped) { + /* + * We don't want to leave the file mmapped, so we are + * forced to make a copy now: + */ + char *buf_copy = xmalloc(snapshot->buflen); + + memcpy(buf_copy, snapshot->start, snapshot->buflen); + clear_snapshot_buffer(snapshot); + snapshot->buf = snapshot->start = buf_copy; + snapshot->eof = buf_copy + snapshot->buflen; + } + + return snapshot; } - if (mmap_strategy != MMAP_OK && snapshot->mmapped) { + if (refs->store_flags & REF_STORE_FORMAT_PACKED_V2) { /* - * We don't want to leave the file mmapped, so we are - * forced to make a copy now: + * Assume we are in v2 format mode, now. + * + * fill_snapshot_v2() will die() if parsing fails. */ - size_t size = snapshot->eof - snapshot->start; - char *buf_copy = xmalloc(size); - - memcpy(buf_copy, snapshot->start, size); - clear_snapshot_buffer(snapshot); - snapshot->buf = snapshot->start = buf_copy; - snapshot->eof = buf_copy + size; + fill_snapshot_v2(snapshot); + snapshot->version = 2; } return snapshot; @@ -322,8 +327,18 @@ static int packed_read_raw_ref(struct ref_store *ref_store, const char *refname, return -1; } - return packed_read_raw_ref_v1(refs, snapshot, refname, - oid, type, failure_errno); + switch (snapshot->version) { + case 1: + return packed_read_raw_ref_v1(refs, snapshot, refname, + oid, type, failure_errno); + + case 2: + return packed_read_raw_ref_v2(refs, snapshot, refname, + oid, type, failure_errno); + + default: + return -1; + } } /* @@ -335,7 +350,16 @@ static int packed_read_raw_ref(struct ref_store *ref_store, const char *refname, */ static int next_record(struct packed_ref_iterator *iter) { - return next_record_v1(iter); + switch (iter->version) { + case 1: + return next_record_v1(iter); + + case 2: + return next_record_v2(iter); + + default: + return -1; + } } static int packed_ref_iterator_advance(struct ref_iterator *ref_iterator) @@ -410,6 +434,7 @@ static struct ref_iterator *packed_ref_iterator_begin( struct packed_ref_iterator *iter; struct ref_iterator *ref_iterator; unsigned int required_flags = REF_STORE_READ; + size_t v2_row = 0; if (!(flags & DO_FOR_EACH_INCLUDE_BROKEN)) required_flags |= REF_STORE_ODB; @@ -422,13 +447,21 @@ static struct ref_iterator *packed_ref_iterator_begin( */ snapshot = get_snapshot(refs); - if (!snapshot) + if (!snapshot || snapshot->version < 0 || snapshot->version > 2) return empty_ref_iterator_begin(); - if (prefix && *prefix) - start = find_reference_location_v1(snapshot, prefix, 0); - else - start = snapshot->start; + if (prefix && *prefix) { + if (snapshot->version == 1) + start = find_reference_location_v1(snapshot, prefix, 0); + else + start = find_reference_location_v2(snapshot, prefix, 0, + &v2_row); + } else { + if (snapshot->version == 1) + start = snapshot->start; + else + start = snapshot->refs_chunk; + } if (start == snapshot->eof) return empty_ref_iterator_begin(); @@ -439,6 +472,8 @@ static struct ref_iterator *packed_ref_iterator_begin( iter->snapshot = snapshot; acquire_snapshot(snapshot); + iter->version = snapshot->version; + iter->row = v2_row; iter->pos = start; iter->eof = snapshot->eof; diff --git a/refs/packed-backend.h b/refs/packed-backend.h index e76f26bfc4690d..3a8649857f1365 100644 --- a/refs/packed-backend.h +++ b/refs/packed-backend.h @@ -72,6 +72,9 @@ struct snapshot { /* Is the `packed-refs` file currently mmapped? */ int mmapped; + /* which file format version is this file? */ + int version; + /* * The contents of the `packed-refs` file: * @@ -96,6 +99,14 @@ struct snapshot { */ enum { PEELED_NONE, PEELED_TAGS, PEELED_FULLY } peeled; + /************************* + * packed-refs v2 values * + *************************/ + size_t nr; + size_t buflen; + const unsigned char *offset_chunk; + const char *refs_chunk; + /* * Count of references to this instance, including the pointer * from `packed_ref_store::snapshot`, if any. The instance @@ -112,6 +123,8 @@ struct snapshot { struct stat_validity validity; }; +int release_snapshot(struct snapshot *snapshot); + /* * If the buffer in `snapshot` is active, then either munmap the * memory and close the file, or free the memory. Then set the buffer @@ -175,21 +188,30 @@ struct packed_ref_store { */ struct packed_ref_iterator { struct ref_iterator base; - struct snapshot *snapshot; + struct repository *repo; + unsigned int flags; + int version; + + /* Scratch space for current values: */ + struct object_id oid, peeled; + struct strbuf refname_buf; /* The current position in the snapshot's buffer: */ const char *pos; + /*********************************** + * packed-refs v1 iterator values. * + ***********************************/ + /* The end of the part of the buffer that will be iterated over: */ const char *eof; - /* Scratch space for current values: */ - struct object_id oid, peeled; - struct strbuf refname_buf; - - struct repository *repo; - unsigned int flags; + /*********************************** + * packed-refs v2 iterator values. * + ***********************************/ + size_t nr; + size_t row; }; typedef int (*write_ref_fn)(const char *refname, @@ -243,6 +265,42 @@ int write_packed_entry_v1(const char *refname, const struct object_id *peeled, void *write_data); +/** + * Parse the buffer at the given snapshot to verify that it is a + * packed-refs file in version 1 format. Update the snapshot->peeled + * value according to the header information. Update the given + * 'sorted' value with whether or not the packed-refs file is sorted. + */ +int parse_packed_format_v1_header(struct packed_ref_store *refs, + struct snapshot *snapshot, + int *sorted); + +int detect_packed_format_v2_header(struct packed_ref_store *refs, + struct snapshot *snapshot); +/* + * Find the place in `snapshot->buf` where the start of the record for + * `refname` starts. If `mustexist` is true and the reference doesn't + * exist, then return NULL. If `mustexist` is false and the reference + * doesn't exist, then return the point where that reference would be + * inserted, or `snapshot->eof` (which might be NULL) if it would be + * inserted at the end of the file. In the latter mode, `refname` + * doesn't have to be a proper reference name; for example, one could + * search for "refs/replace/" to find the start of any replace + * references. + * + * The record is sought using a binary search, so `snapshot->buf` must + * be sorted. + */ +const char *find_reference_location_v2(struct snapshot *snapshot, + const char *refname, int mustexist, + size_t *pos); + +int packed_read_raw_ref_v2(struct packed_ref_store *refs, struct snapshot *snapshot, + const char *refname, struct object_id *oid, + unsigned int *type, int *failure_errno); +int next_record_v2(struct packed_ref_iterator *iter); +void fill_snapshot_v2(struct snapshot *snapshot); + struct write_packed_refs_v2_context; struct write_packed_refs_v2_context *create_v2_context(struct packed_ref_store *refs, struct string_list *updates, diff --git a/refs/packed-format-v2.c b/refs/packed-format-v2.c index 044cc9f629ac02..d75df9545ec496 100644 --- a/refs/packed-format-v2.c +++ b/refs/packed-format-v2.c @@ -15,6 +15,215 @@ #define CHREFS_CHUNKID_OFFSETS 0x524F4646 /* "ROFF" */ #define CHREFS_CHUNKID_REFS 0x52454653 /* "REFS" */ +int detect_packed_format_v2_header(struct packed_ref_store *refs, + struct snapshot *snapshot) +{ + /* + * packed-refs v1 might not have a header, so check instead + * that the v2 signature is not present. + */ + return get_be32(snapshot->buf) == PACKED_REFS_SIGNATURE; +} + +static const char *get_nth_ref(struct snapshot *snapshot, + size_t n) +{ + uint64_t offset; + + if (n >= snapshot->nr) + BUG("asking for position %"PRIu64" outside of bounds (%"PRIu64")", + (uint64_t)n, (uint64_t)snapshot->nr); + + if (n) + offset = get_be64(snapshot->offset_chunk + (n-1) * sizeof(uint64_t)) + & ~OFFSET_IS_PEELED; + else + offset = 0; + + return snapshot->refs_chunk + offset; +} + +/* + * Find the place in `snapshot->buf` where the start of the record for + * `refname` starts. If `mustexist` is true and the reference doesn't + * exist, then return NULL. If `mustexist` is false and the reference + * doesn't exist, then return the point where that reference would be + * inserted, or `snapshot->eof` (which might be NULL) if it would be + * inserted at the end of the file. In the latter mode, `refname` + * doesn't have to be a proper reference name; for example, one could + * search for "refs/replace/" to find the start of any replace + * references. + * + * The record is sought using a binary search, so `snapshot->buf` must + * be sorted. + */ +const char *find_reference_location_v2(struct snapshot *snapshot, + const char *refname, int mustexist, + size_t *pos) +{ + size_t lo = 0, hi = snapshot->nr; + + while (lo != hi) { + const char *rec; + int cmp; + size_t mid = lo + (hi - lo) / 2; + + rec = get_nth_ref(snapshot, mid); + cmp = strcmp(rec, refname); + if (cmp < 0) { + lo = mid + 1; + } else if (cmp > 0) { + hi = mid; + } else { + if (pos) + *pos = mid; + return rec; + } + } + + if (mustexist) { + return NULL; + } else { + const char *ret; + /* + * We are likely doing a prefix match, so use the current + * 'lo' position as the indicator. + */ + if (pos) + *pos = lo; + if (lo >= snapshot->nr) + return NULL; + + ret = get_nth_ref(snapshot, lo); + return ret; + } +} + +int packed_read_raw_ref_v2(struct packed_ref_store *refs, struct snapshot *snapshot, + const char *refname, struct object_id *oid, + unsigned int *type, int *failure_errno) +{ + const char *rec; + + *type = 0; + + rec = find_reference_location_v2(snapshot, refname, 1, NULL); + + if (!rec) { + /* refname is not a packed reference. */ + *failure_errno = ENOENT; + return -1; + } + + hashcpy(oid->hash, (const unsigned char *)rec + strlen(rec) + 1); + oid->algo = hash_algo_by_ptr(the_hash_algo); + + *type = REF_ISPACKED; + return 0; +} + +static int packed_refs_read_offsets(const unsigned char *chunk_start, + size_t chunk_size, void *data) +{ + struct snapshot *snapshot = data; + + snapshot->offset_chunk = chunk_start; + snapshot->nr = chunk_size / sizeof(uint64_t); + return 0; +} + +void fill_snapshot_v2(struct snapshot *snapshot) +{ + uint32_t file_signature, file_version, hash_version; + struct chunkfile *cf; + + file_signature = get_be32(snapshot->buf); + if (file_signature != PACKED_REFS_SIGNATURE) + die(_("%s file signature %X does not match signature %X"), + "packed-ref", file_signature, PACKED_REFS_SIGNATURE); + + file_version = get_be32(snapshot->buf + sizeof(uint32_t)); + if (file_version != 2) + die(_("format version %u does not match expected file version %u"), + file_version, 2); + + hash_version = get_be32(snapshot->buf + 2 * sizeof(uint32_t)); + if (hash_version != the_hash_algo->format_id) + die(_("hash version %X does not match expected hash version %X"), + hash_version, the_hash_algo->format_id); + + cf = init_chunkfile(NULL); + + if (read_trailing_table_of_contents(cf, (const unsigned char *)snapshot->buf, snapshot->buflen)) { + release_snapshot(snapshot); + snapshot = NULL; + goto cleanup; + } + + read_chunk(cf, CHREFS_CHUNKID_OFFSETS, packed_refs_read_offsets, snapshot); + pair_chunk(cf, CHREFS_CHUNKID_REFS, (const unsigned char**)&snapshot->refs_chunk); + + /* TODO: add error checks for invalid chunk combinations. */ + +cleanup: + free_chunkfile(cf); +} + +/* + * Move the iterator to the next record in the snapshot, without + * respect for whether the record is actually required by the current + * iteration. Adjust the fields in `iter` and return `ITER_OK` or + * `ITER_DONE`. This function does not free the iterator in the case + * of `ITER_DONE`. + */ +int next_record_v2(struct packed_ref_iterator *iter) +{ + uint64_t offset; + const char *pos = iter->pos; + strbuf_reset(&iter->refname_buf); + + if (iter->row == iter->snapshot->nr) + return ITER_DONE; + + iter->base.flags = REF_ISPACKED; + + strbuf_addstr(&iter->refname_buf, pos); + iter->base.refname = iter->refname_buf.buf; + pos += strlen(pos) + 1; + + hashcpy(iter->oid.hash, (const unsigned char *)pos); + iter->oid.algo = hash_algo_by_ptr(the_hash_algo); + pos += the_hash_algo->rawsz; + + if (check_refname_format(iter->base.refname, REFNAME_ALLOW_ONELEVEL)) { + if (!refname_is_safe(iter->base.refname)) + die("packed refname is dangerous: %s", + iter->base.refname); + oidclr(&iter->oid); + iter->base.flags |= REF_BAD_NAME | REF_ISBROKEN; + } + + /* We always know the peeled value! */ + iter->base.flags |= REF_KNOWS_PEELED; + + offset = get_be64(iter->snapshot->offset_chunk + sizeof(uint64_t) * iter->row); + if (offset & OFFSET_IS_PEELED) { + hashcpy(iter->peeled.hash, (const unsigned char *)pos); + iter->peeled.algo = hash_algo_by_ptr(the_hash_algo); + } else { + oidclr(&iter->peeled); + } + + /* TODO: somehow all tags are getting OFFSET_IS_PEELED even though + * some are not annotated tags. + */ + iter->pos = iter->snapshot->refs_chunk + (offset & (~OFFSET_IS_PEELED)); + + iter->row++; + + return ITER_OK; +} + struct write_packed_refs_v2_context { struct packed_ref_store *refs; struct string_list *updates; diff --git a/t/t3212-ref-formats.sh b/t/t3212-ref-formats.sh index 03c713ac4f6e89..571ba518ef1fa5 100755 --- a/t/t3212-ref-formats.sh +++ b/t/t3212-ref-formats.sh @@ -73,9 +73,24 @@ test_expect_success 'extensions.refFormat=files,packed-v2' ' test_must_fail git rev-parse refs/tags/Q && rm -f .git/packed-refs && + git for-each-ref --format="%(refname) %(objectname)" >expect-all && + git for-each-ref --format="%(refname) %(objectname)" \ + refs/tags/* >expect-tags && + # Create a v2 packed-refs file git pack-refs --all && - test_path_exists .git/packed-refs + test_path_exists .git/packed-refs && + for t in A B + do + test_path_is_missing .git/refs/tags/$t && + git rev-parse refs/tags/$t || return 1 + done && + + git for-each-ref --format="%(refname) %(objectname)" >actual-all && + test_cmp expect-all actual-all && + git for-each-ref --format="%(refname) %(objectname)" \ + refs/tags/* >actual-tags && + test_cmp expect-tags actual-tags ) ' From 9b3bd93e51e5ed4358c76263e96c4b4e218987b7 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Tue, 25 Oct 2022 16:52:40 -0400 Subject: [PATCH 05/15] packed-refs: read optional prefix chunks Signed-off-by: Derrick Stolee --- refs/packed-backend.c | 2 + refs/packed-backend.h | 9 +++ refs/packed-format-v2.c | 159 ++++++++++++++++++++++++++++++++++++++++ 3 files changed, 170 insertions(+) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index 549cce1f84abbf..ae904de9014e03 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -475,6 +475,8 @@ static struct ref_iterator *packed_ref_iterator_begin( iter->version = snapshot->version; iter->row = v2_row; + init_iterator_prefix_info(prefix, iter); + iter->pos = start; iter->eof = snapshot->eof; strbuf_init(&iter->refname_buf, 0); diff --git a/refs/packed-backend.h b/refs/packed-backend.h index 3a8649857f1365..1936bb5c76c141 100644 --- a/refs/packed-backend.h +++ b/refs/packed-backend.h @@ -103,9 +103,12 @@ struct snapshot { * packed-refs v2 values * *************************/ size_t nr; + size_t prefixes_nr; size_t buflen; const unsigned char *offset_chunk; const char *refs_chunk; + const unsigned char *prefix_offsets_chunk; + const char *prefix_chunk; /* * Count of references to this instance, including the pointer @@ -212,6 +215,9 @@ struct packed_ref_iterator { ***********************************/ size_t nr; size_t row; + size_t prefix_row_end; + size_t prefix_i; + const char *cur_prefix; }; typedef int (*write_ref_fn)(const char *refname, @@ -308,4 +314,7 @@ struct write_packed_refs_v2_context *create_v2_context(struct packed_ref_store * int write_packed_refs_v2(struct write_packed_refs_v2_context *ctx); void free_v2_context(struct write_packed_refs_v2_context *ctx); +void init_iterator_prefix_info(const char *prefix, + struct packed_ref_iterator *iter); + #endif /* REFS_PACKED_BACKEND_H */ diff --git a/refs/packed-format-v2.c b/refs/packed-format-v2.c index d75df9545ec496..0ab277f7ad42df 100644 --- a/refs/packed-format-v2.c +++ b/refs/packed-format-v2.c @@ -14,6 +14,79 @@ #define PACKED_REFS_SIGNATURE 0x50524546 /* "PREF" */ #define CHREFS_CHUNKID_OFFSETS 0x524F4646 /* "ROFF" */ #define CHREFS_CHUNKID_REFS 0x52454653 /* "REFS" */ +#define CHREFS_CHUNKID_PREFIX_DATA 0x50465844 /* "PFXD" */ +#define CHREFS_CHUNKID_PREFIX_OFFSETS 0x5046584F /* "PFXO" */ + +static const char *get_nth_prefix(struct snapshot *snapshot, + size_t n, size_t *len) +{ + uint64_t offset, next_offset; + + if (n >= snapshot->prefixes_nr) + BUG("asking for prefix %"PRIu64" outside of bounds (%"PRIu64")", + (uint64_t)n, (uint64_t)snapshot->prefixes_nr); + + if (n) + offset = get_be32(snapshot->prefix_offsets_chunk + + 2 * sizeof(uint32_t) * (n - 1)); + else + offset = 0; + + if (len) { + next_offset = get_be32(snapshot->prefix_offsets_chunk + + 2 * sizeof(uint32_t) * n); + + /* Prefix includes null terminator. */ + *len = next_offset - offset - 1; + } + + return snapshot->prefix_chunk + offset; +} + +/* + * Find the place in `snapshot->buf` where the start of the record for + * `refname` starts. If `mustexist` is true and the reference doesn't + * exist, then return NULL. If `mustexist` is false and the reference + * doesn't exist, then return the point where that reference would be + * inserted, or `snapshot->eof` (which might be NULL) if it would be + * inserted at the end of the file. In the latter mode, `refname` + * doesn't have to be a proper reference name; for example, one could + * search for "refs/replace/" to find the start of any replace + * references. + * + * The record is sought using a binary search, so `snapshot->buf` must + * be sorted. + */ +static const char *find_prefix_location(struct snapshot *snapshot, + const char *refname, size_t *pos) +{ + size_t lo = 0, hi = snapshot->prefixes_nr; + + while (lo != hi) { + const char *rec; + int cmp; + size_t len; + size_t mid = lo + (hi - lo) / 2; + + rec = get_nth_prefix(snapshot, mid, &len); + cmp = strncmp(rec, refname, len); + if (cmp < 0) { + lo = mid + 1; + } else if (cmp > 0) { + hi = mid; + } else { + /* we have a prefix match! */ + *pos = mid; + return rec; + } + } + + *pos = lo; + if (lo < snapshot->prefixes_nr) + return get_nth_prefix(snapshot, lo, NULL); + else + return NULL; +} int detect_packed_format_v2_header(struct packed_ref_store *refs, struct snapshot *snapshot) @@ -63,6 +136,46 @@ const char *find_reference_location_v2(struct snapshot *snapshot, { size_t lo = 0, hi = snapshot->nr; + if (snapshot->prefix_chunk) { + size_t prefix_row; + const char *prefix; + int found = 1; + + prefix = find_prefix_location(snapshot, refname, &prefix_row); + + if (!prefix || !starts_with(refname, prefix)) { + if (mustexist) + return NULL; + found = 0; + } + + /* The second 4-byte column of the prefix offsets */ + if (prefix_row) { + /* if prefix_row == 0, then lo = 0, which is already true. */ + lo = get_be32(snapshot->prefix_offsets_chunk + + 2 * sizeof(uint32_t) * (prefix_row - 1) + sizeof(uint32_t)); + } + + if (!found) { + const char *ret; + /* Terminate early with this lo position as the insertion point. */ + if (pos) + *pos = lo; + + if (lo >= snapshot->nr) + return NULL; + + ret = get_nth_ref(snapshot, lo); + return ret; + } + + hi = get_be32(snapshot->prefix_offsets_chunk + + 2 * sizeof(uint32_t) * prefix_row + sizeof(uint32_t)); + + if (prefix) + refname += strlen(prefix); + } + while (lo != hi) { const char *rec; int cmp; @@ -132,6 +245,16 @@ static int packed_refs_read_offsets(const unsigned char *chunk_start, return 0; } +static int packed_refs_read_prefix_offsets(const unsigned char *chunk_start, + size_t chunk_size, void *data) +{ + struct snapshot *snapshot = data; + + snapshot->prefix_offsets_chunk = chunk_start; + snapshot->prefixes_nr = chunk_size / sizeof(uint64_t); + return 0; +} + void fill_snapshot_v2(struct snapshot *snapshot) { uint32_t file_signature, file_version, hash_version; @@ -163,6 +286,9 @@ void fill_snapshot_v2(struct snapshot *snapshot) read_chunk(cf, CHREFS_CHUNKID_OFFSETS, packed_refs_read_offsets, snapshot); pair_chunk(cf, CHREFS_CHUNKID_REFS, (const unsigned char**)&snapshot->refs_chunk); + read_chunk(cf, CHREFS_CHUNKID_PREFIX_OFFSETS, packed_refs_read_prefix_offsets, snapshot); + pair_chunk(cf, CHREFS_CHUNKID_PREFIX_DATA, (const unsigned char**)&snapshot->prefix_chunk); + /* TODO: add error checks for invalid chunk combinations. */ cleanup: @@ -187,6 +313,8 @@ int next_record_v2(struct packed_ref_iterator *iter) iter->base.flags = REF_ISPACKED; + if (iter->cur_prefix) + strbuf_addstr(&iter->refname_buf, iter->cur_prefix); strbuf_addstr(&iter->refname_buf, pos); iter->base.refname = iter->refname_buf.buf; pos += strlen(pos) + 1; @@ -221,9 +349,40 @@ int next_record_v2(struct packed_ref_iterator *iter) iter->row++; + if (iter->row == iter->prefix_row_end && iter->snapshot->prefix_chunk) { + size_t prefix_pos = get_be32(iter->snapshot->prefix_offsets_chunk + + 2 * sizeof(uint32_t) * iter->prefix_i); + iter->cur_prefix = iter->snapshot->prefix_chunk + prefix_pos; + iter->prefix_i++; + iter->prefix_row_end = get_be32(iter->snapshot->prefix_offsets_chunk + + 2 * sizeof(uint32_t) * iter->prefix_i + sizeof(uint32_t)); + } + return ITER_OK; } +void init_iterator_prefix_info(const char *prefix, + struct packed_ref_iterator *iter) +{ + struct snapshot *snapshot = iter->snapshot; + + if (snapshot->version != 2 || !snapshot->prefix_chunk) { + iter->prefix_row_end = snapshot->nr; + return; + } + + if (prefix) + iter->cur_prefix = find_prefix_location(snapshot, prefix, &iter->prefix_i); + else { + iter->cur_prefix = snapshot->prefix_chunk; + iter->prefix_i = 0; + } + + iter->prefix_row_end = get_be32(snapshot->prefix_offsets_chunk + + 2 * sizeof(uint32_t) * iter->prefix_i + + sizeof(uint32_t)); +} + struct write_packed_refs_v2_context { struct packed_ref_store *refs; struct string_list *updates; From 36f9aa02ebfb967799036c4a0a648ab332c2612b Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Tue, 25 Oct 2022 16:55:33 -0400 Subject: [PATCH 06/15] packed-refs: write prefix chunks Tests already cover that we will start reading these prefixes. TODO: discuss time and space savings over typical approach. Signed-off-by: Derrick Stolee --- refs/packed-format-v2.c | 103 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 103 insertions(+) diff --git a/refs/packed-format-v2.c b/refs/packed-format-v2.c index 0ab277f7ad42df..2cd45a5987a64c 100644 --- a/refs/packed-format-v2.c +++ b/refs/packed-format-v2.c @@ -398,6 +398,18 @@ struct write_packed_refs_v2_context { uint64_t *offsets; size_t nr; size_t offsets_alloc; + + int write_prefixes; + const char *cur_prefix; + size_t cur_prefix_len; + + char **prefixes; + uint32_t *prefix_offsets; + uint32_t *prefix_rows; + size_t prefix_nr; + size_t prefixes_alloc; + size_t prefix_offsets_alloc; + size_t prefix_rows_alloc; }; struct write_packed_refs_v2_context *create_v2_context(struct packed_ref_store *refs, @@ -434,6 +446,56 @@ static int write_packed_entry_v2(const char *refname, ALLOC_GROW(ctx->offsets, i + 1, ctx->offsets_alloc); + if (ctx->write_prefixes) { + if (ctx->cur_prefix && starts_with(refname, ctx->cur_prefix)) { + /* skip ahead! */ + refname += ctx->cur_prefix_len; + reflen -= ctx->cur_prefix_len; + } else { + size_t len; + const char *slash, *slashslash = NULL; + if (ctx->prefix_nr) { + /* close out the old prefix. */ + ctx->prefix_rows[ctx->prefix_nr - 1] = ctx->nr; + } + + /* Find the new prefix. */ + slash = strchr(refname, '/'); + if (slash) + slashslash = strchr(slash + 1, '/'); + /* If there are two slashes, use that. */ + slash = slashslash ? slashslash : slash; + /* + * If there is at least one slash, use that, + * and include the slash in the string. + * Otherwise, use the end of the ref. + */ + slash = slash ? slash + 1 : refname + strlen(refname); + + len = slash - refname; + ALLOC_GROW(ctx->prefixes, ctx->prefix_nr + 1, ctx->prefixes_alloc); + ALLOC_GROW(ctx->prefix_offsets, ctx->prefix_nr + 1, ctx->prefix_offsets_alloc); + ALLOC_GROW(ctx->prefix_rows, ctx->prefix_nr + 1, ctx->prefix_rows_alloc); + + if (ctx->prefix_nr) + ctx->prefix_offsets[ctx->prefix_nr] = ctx->prefix_offsets[ctx->prefix_nr - 1] + len + 1; + else + ctx->prefix_offsets[ctx->prefix_nr] = len + 1; + + ctx->prefixes[ctx->prefix_nr] = xstrndup(refname, len); + ctx->cur_prefix = ctx->prefixes[ctx->prefix_nr]; + ctx->prefix_nr++; + + refname += len; + reflen -= len; + ctx->cur_prefix_len = len; + } + + /* Update the last row continually. */ + ctx->prefix_rows[ctx->prefix_nr - 1] = i + 1; + } + + /* Write entire ref, including null terminator. */ hashwrite(ctx->f, refname, reflen); hashwrite(ctx->f, oid->hash, the_hash_algo->rawsz); @@ -483,13 +545,54 @@ static int write_refs_chunk_offsets(struct hashfile *f, return 0; } +static int write_refs_chunk_prefix_data(struct hashfile *f, + void *data) +{ + struct write_packed_refs_v2_context *ctx = data; + size_t i; + + trace2_region_enter("refs", "prefix-data", the_repository); + for (i = 0; i < ctx->prefix_nr; i++) { + size_t len = strlen(ctx->prefixes[i]) + 1; + hashwrite(f, ctx->prefixes[i], len); + + /* TODO: assert the prefix lengths match the stored offsets? */ + } + + trace2_region_leave("refs", "prefix-data", the_repository); + return 0; +} + +static int write_refs_chunk_prefix_offsets(struct hashfile *f, + void *data) +{ + struct write_packed_refs_v2_context *ctx = data; + size_t i; + + trace2_region_enter("refs", "prefix-offsets", the_repository); + for (i = 0; i < ctx->prefix_nr; i++) { + hashwrite_be32(f, ctx->prefix_offsets[i]); + hashwrite_be32(f, ctx->prefix_rows[i]); + } + + trace2_region_leave("refs", "prefix-offsets", the_repository); + return 0; +} + int write_packed_refs_v2(struct write_packed_refs_v2_context *ctx) { unsigned char file_hash[GIT_MAX_RAWSZ]; + ctx->write_prefixes = git_env_bool("GIT_TEST_WRITE_PACKED_REFS_PREFIXES", 1); + add_chunk(ctx->cf, CHREFS_CHUNKID_REFS, 0, write_refs_chunk_refs); add_chunk(ctx->cf, CHREFS_CHUNKID_OFFSETS, 0, write_refs_chunk_offsets); + if (ctx->write_prefixes) { + add_chunk(ctx->cf, CHREFS_CHUNKID_PREFIX_DATA, 0, write_refs_chunk_prefix_data); + add_chunk(ctx->cf, CHREFS_CHUNKID_PREFIX_OFFSETS, 0, write_refs_chunk_prefix_offsets); + } + hashwrite_be32(ctx->f, PACKED_REFS_SIGNATURE); hashwrite_be32(ctx->f, 2); hashwrite_be32(ctx->f, the_hash_algo->format_id); From 6fe60ef2f53e680b047581eefc629144048b2224 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Wed, 26 Oct 2022 13:44:20 -0400 Subject: [PATCH 07/15] packed-backend: create GIT_TEST_PACKED_REFS_VERSION When set, this will create a default value for the packed-refs file version on writes. When set to "2", it will automatically add the "packed-v2" value to extensions.refFormat. Not all tests pass with GIT_TEST_PACKED_REFS_VERSION=2 because they care specifically about the content of the packed-refs file. These tests will be updated in following changes. To start, though, disable the GIT_TEST_PACKED_REFS_VERSION environment variable in t3212-ref-formats.sh, since that script already tests both versions, including upgrade scenarios. Signed-off-by: Derrick Stolee --- refs/packed-backend.c | 3 ++- setup.c | 5 ++++- t/t3212-ref-formats.sh | 3 +++ 3 files changed, 9 insertions(+), 2 deletions(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index ae904de9014e03..e84f669c42e466 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -807,7 +807,8 @@ static int write_with_updates(struct packed_ref_store *refs, } strbuf_release(&sb); - if (git_config_get_int("refs.packedrefsversion", &version)) { + if (!(version = git_env_ulong("GIT_TEST_PACKED_REFS_VERSION", 0)) && + git_config_get_int("refs.packedrefsversion", &version)) { /* * Set the default depending on the current extension * list. Default to version 1 if available, but allow a diff --git a/setup.c b/setup.c index 72bfa289ade820..a4525732fe9aa3 100644 --- a/setup.c +++ b/setup.c @@ -732,8 +732,11 @@ int read_repository_format(struct repository_format *format, const char *path) clear_repository_format(format); /* Set default ref_format if no extensions.refFormat exists. */ - if (!format->ref_format_count) + if (!format->ref_format_count) { format->ref_format = REF_FORMAT_FILES | REF_FORMAT_PACKED; + if (git_env_ulong("GIT_TEST_PACKED_REFS_VERSION", 0) == 2) + format->ref_format |= REF_FORMAT_PACKED_V2; + } return format->version; } diff --git a/t/t3212-ref-formats.sh b/t/t3212-ref-formats.sh index 571ba518ef1fa5..5583f16db41fc9 100755 --- a/t/t3212-ref-formats.sh +++ b/t/t3212-ref-formats.sh @@ -2,6 +2,9 @@ test_description='test across ref formats' +GIT_TEST_PACKED_REFS_VERSION=0 +export GIT_TEST_PACKED_REFS_VERSION + . ./test-lib.sh test_expect_success 'extensions.refFormat requires core.repositoryFormatVersion=1' ' From 188a55ddcb876aab4e5476234da5412ace053b7b Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Wed, 26 Oct 2022 13:45:55 -0400 Subject: [PATCH 08/15] t1409: test with packed-refs v2 t1409-avoid-packing-refs.sh seeks to test that the packed-refs file is not modified unnecessarily. One way it does this is by creating a packed-refs file, then munging its contents and verifying that the munged data remains after other commands. For packed-refs v1, it suffices to add a line that is similar to a comment. For packed-refs v2, we cannot even add to the file without messing up the trailing table of contents of its chunked format. However, we can manipulate the last bytes that are within the trailing hash and use 'tail -c 4' to read them. This makes t1409 pass with GIT_TEST_PACKED_REFS_VERSION=2. Signed-off-by: Derrick Stolee --- t/t1409-avoid-packing-refs.sh | 22 +++++++++++++++++++--- 1 file changed, 19 insertions(+), 3 deletions(-) diff --git a/t/t1409-avoid-packing-refs.sh b/t/t1409-avoid-packing-refs.sh index be12fb63506e2a..dc8d58432c87b5 100755 --- a/t/t1409-avoid-packing-refs.sh +++ b/t/t1409-avoid-packing-refs.sh @@ -8,13 +8,29 @@ test_description='avoid rewriting packed-refs unnecessarily' # shouldn't upset readers, and it should be omitted if the file is # ever rewritten. mark_packed_refs () { - sed -e "s/^\(#.*\)/\1 t1409 /" .git/packed-refs >.git/packed-refs.new && - mv .git/packed-refs.new .git/packed-refs + if test "$GIT_TEST_PACKED_REFS_VERSION" = "2" + then + size=$(wc -c < .git/packed-refs) && + pos=$(expr $size - 4) && + printf "FAKE" | dd of=".git/packed-refs" bs=1 seek="$pos" conv=notrunc + else + sed -e "s/^\(#.*\)/\1 t1409 /" .git/packed-refs >.git/packed-refs.new && + mv .git/packed-refs.new .git/packed-refs + fi } # Verify that the packed-refs file is still marked. check_packed_refs_marked () { - grep -q '^#.* t1409 ' .git/packed-refs + if test "$GIT_TEST_PACKED_REFS_VERSION" = "2" + then + size=$(wc -c < .git/packed-refs) && + pos=$(expr $size - 4) && + tail -c 4 .git/packed-refs >actual && + printf "FAKE" >expect && + test_cmp expect actual + else + grep -q '^#.* t1409 ' .git/packed-refs + fi } test_expect_success 'setup' ' From 191ad7fdef6880738c25c307bbc3c1d66b6378b5 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Wed, 26 Oct 2022 13:59:04 -0400 Subject: [PATCH 09/15] t5312: allow packed-refs v2 format One test in t5312 uses 'grep' to detect that a ref is written in the packed-refs file instead of a loose object. This does not work when the packed-refs file is in v2 format, such as when GIT_TEST_PACKED_REFS_VERSION=2. Since the test already checks that the loose ref is missing, it suffices to check that 'git rev-parse' succeeds. Signed-off-by: Derrick Stolee --- t/t3210-pack-refs.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/t/t3210-pack-refs.sh b/t/t3210-pack-refs.sh index 577f32dc71ff2c..fe6c97d9087cd5 100755 --- a/t/t3210-pack-refs.sh +++ b/t/t3210-pack-refs.sh @@ -159,7 +159,7 @@ test_expect_success 'delete ref while another dangling packed ref' ' test_expect_success 'pack ref directly below refs/' ' git update-ref refs/top HEAD && git pack-refs --all --prune && - grep refs/top .git/packed-refs && + git rev-parse refs/top && test_path_is_missing .git/refs/top ' From f8b9f355f0fd11857928de90e79eb8fc284fb009 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Wed, 26 Oct 2022 15:22:15 -0400 Subject: [PATCH 10/15] t5502: add PACKED_REFS_V1 prerequisite The last test in t5502-quickfetch.sh exploits the packed-refs v1 file format by appending 1000 lines to the packed-refs file. If the packed-refs file is in the v2 format, this corrupts the file as unreadable. Instead of making the test slower, let's ignore it when GIT_TEST_PACKED_REFS_VERSION=2. The test is really about 'git fetch', not the packed-refs format. Create a prerequisite in case we want to use this technique again in the future. An alternative would be to write those 1000 refs using a different mechanism, but let's opt for the simpler case for now. Signed-off-by: Derrick Stolee --- t/t5502-quickfetch.sh | 2 +- t/test-lib.sh | 4 ++++ 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/t/t5502-quickfetch.sh b/t/t5502-quickfetch.sh index b160f8b7fb7e1f..0c4aadebae697b 100755 --- a/t/t5502-quickfetch.sh +++ b/t/t5502-quickfetch.sh @@ -122,7 +122,7 @@ test_expect_success 'quickfetch should not copy from alternate' ' ' -test_expect_success 'quickfetch should handle ~1000 refs (on Windows)' ' +test_expect_success PACKED_REFS_V1 'quickfetch should handle ~1000 refs (on Windows)' ' git gc && head=$(git rev-parse HEAD) && diff --git a/t/test-lib.sh b/t/test-lib.sh index 6db377f68b82c4..a244cd75c062b4 100644 --- a/t/test-lib.sh +++ b/t/test-lib.sh @@ -1954,3 +1954,7 @@ test_lazy_prereq FSMONITOR_DAEMON ' git version --build-options >output && grep "feature: fsmonitor--daemon" output ' + +test_lazy_prereq PACKED_REFS_V1 ' + test "$GIT_TEST_PACKED_REFS_VERSION" -ne "2" +' \ No newline at end of file From e6ed6e7607c4ce81876f281152c675fc14f0498c Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Wed, 26 Oct 2022 15:36:20 -0400 Subject: [PATCH 11/15] t3210: require packed-refs v1 for some tests Three tests in t3210-pack-refs.sh corrupt a packed-refs file to test that Git properly discovers and handles those failures. These tests assume that the file is in the v1 format, so add the PACKED_REFS_V1 prereq to skip these tests when GIT_TEST_PACKED_REFS_VERSION=2. Signed-off-by: Derrick Stolee --- t/t3210-pack-refs.sh | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/t/t3210-pack-refs.sh b/t/t3210-pack-refs.sh index fe6c97d9087cd5..76251dfe05a0e2 100755 --- a/t/t3210-pack-refs.sh +++ b/t/t3210-pack-refs.sh @@ -197,7 +197,7 @@ test_expect_success 'notice d/f conflict with existing ref' ' test_must_fail git branch foo/bar/baz/lots/of/extra/components ' -test_expect_success 'reject packed-refs with unterminated line' ' +test_expect_success PACKED_REFS_V1 'reject packed-refs with unterminated line' ' cp .git/packed-refs .git/packed-refs.bak && test_when_finished "mv .git/packed-refs.bak .git/packed-refs" && printf "%s" "$HEAD refs/zzzzz" >>.git/packed-refs && @@ -206,7 +206,7 @@ test_expect_success 'reject packed-refs with unterminated line' ' test_cmp expected_err err ' -test_expect_success 'reject packed-refs containing junk' ' +test_expect_success PACKED_REFS_V1 'reject packed-refs containing junk' ' cp .git/packed-refs .git/packed-refs.bak && test_when_finished "mv .git/packed-refs.bak .git/packed-refs" && printf "%s\n" "bogus content" >>.git/packed-refs && @@ -215,7 +215,7 @@ test_expect_success 'reject packed-refs containing junk' ' test_cmp expected_err err ' -test_expect_success 'reject packed-refs with a short SHA-1' ' +test_expect_success PACKED_REFS_V1 'reject packed-refs with a short SHA-1' ' cp .git/packed-refs .git/packed-refs.bak && test_when_finished "mv .git/packed-refs.bak .git/packed-refs" && printf "%.7s %s\n" $HEAD refs/zzzzz >>.git/packed-refs && From 5aa0d4080291dda854fc1ea7655037822b53111a Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Wed, 26 Oct 2022 15:37:34 -0400 Subject: [PATCH 12/15] t*: skip packed-refs v2 over http tests The GIT_TEST_PACKED_REFS_VERSION=2 environment variable helps us test the packed-refs file format in its v2 version. This variable makes the Git process act as if the extensions.refFormat config key has "packed-v2" in its list. This means that if the environment variable is removed, the repository is in a bad state. This is sufficient for most test cases. However, tests that fetch over HTTP appear to lose this environment variable when executed through the HTTP server. Since the repositories are created via Git commands in the tests, the packed-refs files end up in the v2 format, but the server processes do not understand this and start serving empty payloads since they do not recognize any refs. The preferred long-term solution would be to ensure that the GIT_TEST_* environment variable persists into the HTTP server. However, these tests are not exercising any particularly tricky parts of the packed-refs file format. It may not be worth the effort to pass the environment variable and instead we can unset the environment variable (with a comment explaining why) in these tests. Signed-off-by: Derrick Stolee --- t/t5539-fetch-http-shallow.sh | 7 +++++++ t/t5541-http-push-smart.sh | 7 +++++++ t/t5542-push-http-shallow.sh | 7 +++++++ t/t5551-http-fetch-smart.sh | 7 +++++++ t/t5558-clone-bundle-uri.sh | 7 +++++++ 5 files changed, 35 insertions(+) diff --git a/t/t5539-fetch-http-shallow.sh b/t/t5539-fetch-http-shallow.sh index 3ea75d34ca0e7a..5e3b43043678d3 100755 --- a/t/t5539-fetch-http-shallow.sh +++ b/t/t5539-fetch-http-shallow.sh @@ -5,6 +5,13 @@ test_description='fetch/clone from a shallow clone over http' GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME +# If GIT_TEST_PACKED_REFS_VERSION=2, then the packed-refs file will +# be written in v2 format without extensions.refFormat=packed-v2. This +# causes issues for the HTTP server which does not carry over the +# environment variable to the server process. +GIT_TEST_PACKED_REFS_VERSION=0 +export GIT_TEST_PACKED_REFS_VERSION + . ./test-lib.sh . "$TEST_DIRECTORY"/lib-httpd.sh start_httpd diff --git a/t/t5541-http-push-smart.sh b/t/t5541-http-push-smart.sh index fbad2d5ff5e9e3..495437dd3c74a0 100755 --- a/t/t5541-http-push-smart.sh +++ b/t/t5541-http-push-smart.sh @@ -7,6 +7,13 @@ test_description='test smart pushing over http via http-backend' GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME +# If GIT_TEST_PACKED_REFS_VERSION=2, then the packed-refs file will +# be written in v2 format without extensions.refFormat=packed-v2. This +# causes issues for the HTTP server which does not carry over the +# environment variable to the server process. +GIT_TEST_PACKED_REFS_VERSION=0 +export GIT_TEST_PACKED_REFS_VERSION + . ./test-lib.sh ROOT_PATH="$PWD" diff --git a/t/t5542-push-http-shallow.sh b/t/t5542-push-http-shallow.sh index c2cc83182f9a6b..c47b18b9faa52d 100755 --- a/t/t5542-push-http-shallow.sh +++ b/t/t5542-push-http-shallow.sh @@ -5,6 +5,13 @@ test_description='push from/to a shallow clone over http' GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME +# If GIT_TEST_PACKED_REFS_VERSION=2, then the packed-refs file will +# be written in v2 format without extensions.refFormat=packed-v2. This +# causes issues for the HTTP server which does not carry over the +# environment variable to the server process. +GIT_TEST_PACKED_REFS_VERSION=0 +export GIT_TEST_PACKED_REFS_VERSION + . ./test-lib.sh . "$TEST_DIRECTORY"/lib-httpd.sh start_httpd diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh index 6a38294a47671d..61f2e90eabeab9 100755 --- a/t/t5551-http-fetch-smart.sh +++ b/t/t5551-http-fetch-smart.sh @@ -4,6 +4,13 @@ test_description='test smart fetching over http via http-backend' GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME +# If GIT_TEST_PACKED_REFS_VERSION=2, then the packed-refs file will +# be written in v2 format without extensions.refFormat=packed-v2. This +# causes issues for the HTTP server which does not carry over the +# environment variable to the server process. +GIT_TEST_PACKED_REFS_VERSION=0 +export GIT_TEST_PACKED_REFS_VERSION + . ./test-lib.sh . "$TEST_DIRECTORY"/lib-httpd.sh start_httpd diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh index 9155f31fa2cb58..3e35322155e4f2 100755 --- a/t/t5558-clone-bundle-uri.sh +++ b/t/t5558-clone-bundle-uri.sh @@ -2,6 +2,13 @@ test_description='test fetching bundles with --bundle-uri' +# If GIT_TEST_PACKED_REFS_VERSION=2, then the packed-refs file will +# be written in v2 format without extensions.refFormat=packed-v2. This +# causes issues for the HTTP server which does not carry over the +# environment variable to the server process. +GIT_TEST_PACKED_REFS_VERSION=0 +export GIT_TEST_PACKED_REFS_VERSION + . ./test-lib.sh test_expect_success 'fail to clone from non-existent file' ' From 9d261a55403f8c9d207cfb363689ba9964a57c57 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Wed, 26 Oct 2022 15:42:18 -0400 Subject: [PATCH 13/15] ci: run GIT_TEST_PACKED_REFS_VERSION=2 in some builds The linux-TEST-vars CI build helps us check that certain opt-in features are still exercised in at least one environment. The new GIT_TEST_PACKED_REFS_VERSION environment variable now passes the test suite when set to "2", so add this to that list of variables. This provides nearly the same coverage of the v2 format as we had in the v1 format. Signed-off-by: Derrick Stolee --- ci/run-build-and-tests.sh | 1 + 1 file changed, 1 insertion(+) diff --git a/ci/run-build-and-tests.sh b/ci/run-build-and-tests.sh index 8ebff4259676e3..e93574ca262ea6 100755 --- a/ci/run-build-and-tests.sh +++ b/ci/run-build-and-tests.sh @@ -30,6 +30,7 @@ linux-TEST-vars) export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master export GIT_TEST_WRITE_REV_INDEX=1 export GIT_TEST_CHECKOUT_WORKERS=2 + export GIT_TEST_PACKED_REFS_VERSION=2 ;; linux-clang) export GIT_TEST_DEFAULT_HASH=sha1 From a4a69d8ee91d23465f945488ad42fa38818c2651 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Thu, 3 Nov 2022 10:23:00 -0400 Subject: [PATCH 14/15] p1401: create performance test for ref operations TBD Signed-off-by: Derrick Stolee --- t/perf/p1401-ref-operations.sh | 47 ++++++++++++++++++++++++++++++++++ 1 file changed, 47 insertions(+) create mode 100755 t/perf/p1401-ref-operations.sh diff --git a/t/perf/p1401-ref-operations.sh b/t/perf/p1401-ref-operations.sh new file mode 100755 index 00000000000000..1c372ba0ee884d --- /dev/null +++ b/t/perf/p1401-ref-operations.sh @@ -0,0 +1,47 @@ +#!/bin/sh + +test_description="Tests performance of ref operations" + +. ./perf-lib.sh + +test_perf_large_repo + +test_perf 'git pack-refs (v1)' ' + git commit --allow-empty -m "change one ref" && + git pack-refs --all +' + +test_perf 'git for-each-ref (v1)' ' + git for-each-ref --format="%(refname)" >/dev/null +' + +test_perf 'git for-each-ref prefix (v1)' ' + git for-each-ref --format="%(refname)" refs/tags/ >/dev/null +' + +test_expect_success 'configure packed-refs v2' ' + git config core.repositoryFormatVersion 1 && + git config --add extensions.refFormat files && + git config --add extensions.refFormat packed && + git config --add extensions.refFormat packed-v2 && + git config refs.packedRefsVersion 2 && + git commit --allow-empty -m "change one ref" && + git pack-refs --all && + test_copy_bytes 16 .git/packed-refs | xxd >actual && + grep PREF actual +' + +test_perf 'git pack-refs (v2)' ' + git commit --allow-empty -m "change one ref" && + git pack-refs --all +' + +test_perf 'git for-each-ref (v2)' ' + git for-each-ref --format="%(refname)" >/dev/null +' + +test_perf 'git for-each-ref prefix (v2)' ' + git for-each-ref --format="%(refname)" refs/tags/ >/dev/null +' + +test_done From 37fb4e73ca711f642351d10e1db51c330a1544f1 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Thu, 3 Nov 2022 14:23:45 -0400 Subject: [PATCH 15/15] refs: skip hashing when writing packed-refs v2 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The 'skip_hash' option in 'struct hashfile' indicates that we want to use the hashfile API as a buffered writer, and not use the hash function to create a trailing hash. We still write a trailing null hash to indicate that we do not have a checksum at the end. This feature is enabled for index writes using the 'index.computeHash' config key. Create a similar (currently hidden) option for the packed-refs v2 file format: refs.hashPackedRefs. This defaults to false because performance is compared to the packed-refs v1 file format which does have a checksum anywhere. This change results in improvements to p1401 when using a repository with a 42 MB packed-refs file (600,000+ refs). Test HEAD~1 HEAD -------------------------------------------------------------------- 1401.1: git pack-refs (v1) 0.38(0.31+0.52) 0.37(0.28+0.52) -2.6% 1401.5: git pack-refs (v2) 0.39(0.33+0.52) 0.30(0.28+0.46) -23.1% Note that these tests update a ref and then repack the packed-refs file. The following benchmarks are from a hyperfine experiment that only ran the 'git pack-refs --all' command for the two formats, but also compared the effect when refs.hashPackedRefs=true. Benchmark 1: v1 Time (mean ± σ): 163.5 ms ± 18.1 ms [User: 117.8 ms, System: 38.1 ms] Range (min … max): 131.3 ms … 190.4 ms 50 runs Benchmark 2: v2-no-hash Time (mean ± σ): 95.8 ms ± 15.1 ms [User: 72.5 ms, System: 23.0 ms] Range (min … max): 82.9 ms … 131.2 ms 50 runs Benchmark 3: v2-hashing Time (mean ± σ): 100.8 ms ± 16.4 ms [User: 77.2 ms, System: 23.1 ms] Range (min … max): 83.0 ms … 131.1 ms 50 runs Summary 'v2-no-hash' ran 1.05 ± 0.24 times faster than 'v2-hashing' 1.71 ± 0.33 times faster than 'v1' In this case of repeatedly rewriting the same refs seems to demonstrate a smaller improvement than the p1401 test. However, the overall reduction from v1 matches the expected reduction in file size. In my tests, the 42 MB packed-refs (v1) file was compacted to 28 MB in the v2 format. Signed-off-by: Derrick Stolee --- refs/packed-format-v2.c | 7 +++++++ t/perf/p1401-ref-operations.sh | 5 +++++ 2 files changed, 12 insertions(+) diff --git a/refs/packed-format-v2.c b/refs/packed-format-v2.c index 2cd45a5987a64c..ada34bf9bf05f2 100644 --- a/refs/packed-format-v2.c +++ b/refs/packed-format-v2.c @@ -417,6 +417,7 @@ struct write_packed_refs_v2_context *create_v2_context(struct packed_ref_store * struct strbuf *err) { struct write_packed_refs_v2_context *ctx; + int do_skip_hash; CALLOC_ARRAY(ctx, 1); ctx->refs = refs; @@ -430,6 +431,12 @@ struct write_packed_refs_v2_context *create_v2_context(struct packed_ref_store * } ctx->f = hashfd(refs->tempfile->fd, refs->tempfile->filename.buf); + + /* Default to true, so skip_hash if not set. */ + if (git_config_get_maybe_bool("refs.hashpackedrefs", &do_skip_hash) || + do_skip_hash) + ctx->f->skip_hash = 1; + ctx->cf = init_chunkfile(ctx->f); return ctx; diff --git a/t/perf/p1401-ref-operations.sh b/t/perf/p1401-ref-operations.sh index 1c372ba0ee884d..0b88a2f531a310 100755 --- a/t/perf/p1401-ref-operations.sh +++ b/t/perf/p1401-ref-operations.sh @@ -36,6 +36,11 @@ test_perf 'git pack-refs (v2)' ' git pack-refs --all ' +test_perf 'git pack-refs (v2;hashing)' ' + git commit --allow-empty -m "change one ref" && + git -c refs.hashPackedRefs=true pack-refs --all +' + test_perf 'git for-each-ref (v2)' ' git for-each-ref --format="%(refname)" >/dev/null '