Skip to content

Database Improvements#88

Merged
theyoyojo merged 10 commits into
masterfrom
db_improvements
Apr 25, 2024
Merged

Database Improvements#88
theyoyojo merged 10 commits into
masterfrom
db_improvements

Conversation

@charliemirabile

Copy link
Copy Markdown
Contributor

In preparation for switching from raw sql queries to an orm, we should clean up the existing database schema and make sure the existing code works with that schema, so that the schema can be more easily translated into a strictly typed class.

Each commit in this pr involves changing the schema of the database which means that the the binary blob containing a gzip'd tar of an empty orbit db file in the tests must be updated. This script will be helpful when reviewing. It takes arguments of two git objects (commit shas, or expressions that evaluate to one like master or v0.1 or HEAD or HEAD^), and shows a diff between the schemas (both in a traditional line base diff, and a word diff) and also calls the sqldiff utility from sqlite to show if any values changed (no values should change at any point since the db is and should remain empty).

#!/bin/sh

tmp1=$(mktemp)
tmp2=$(mktemp)
git show $1:test.sh | sed -n "/'EOF'/,/EOF/p" | head -n -1 | tail -n +2 | xxd -r | tar -zxO orbit.db > $tmp1
git show $2:test.sh | sed -n "/'EOF'/,/EOF/p" | head -n -1 | tail -n +2 | xxd -r | tar -zxO orbit.db > $tmp2

diff -U3 --color=auto <(sqlite3 $tmp1 .schema) <(sqlite3 $tmp2 .schema)
wdiff -w "$(printf '\033[31m')" -x "$(printf '\033[0m')" -y "$(printf '\033[32m')" -z "$(printf '\033[0m')" <(sqlite3 $tmp1 .schema) <(sqlite3 $tmp2 .schema)

sqldiff $tmp1 $tmp2

rm $tmp1 $tmp2

In the users table the student id was being stored as an integer, but
in the registration table it was being stored as a string. Use a string
since the exact format of what a program might use as a student id is not
consistent. E.g. while UML ids are integers, we may want to use emails
for LFX students, and other institutions may have alphanumeric ids etc.

Since the registration table also puts a unique constraint on the column
(since the id is the lookup key for registration), that constraint should
also be applied to the column in the user table.

empty orbit db binary blob in tests needed updating again to reflect new
schema. The timestamp changed in the tar metadata, but as for the actual
contents, only and the schema was changed.
since the expiry time is a floating point value, it should be
stored as one to avoid converions back and forth to string.

Again empty orbit db binary blob needed updating.
string is not a type in sqlite - text is. If you put string, it
treats the column like an integer but also can accommodate strings
because sqlite is not strict with the column values.

binary blob in test needed updating again
sqlite databases do not enforce strict types for columns by default
instead it is possible to insert any sort of value into any column.
Using the keyword "strict" after each table forces sqlite to error
if it cannot losslessly convert a provided value into the type that
a column specifies.
This query returns the credentials for a given student ID
while simultaenously deleting the entry for them so that only
one db call is needed to register a user. If no user is found
that matches the credentials, nothing is deleted and this case
can still be detected by the empty return.
This simplifies the code nicely, and makes it faster since
only one call to the db is needed.
now that the combined query exists, it fully supplants
the two prior single purpose querries.
The primary key was always really the student id and now
that we have the combo query to fetch and delete entries
there is no longer any need to track this id explicitly.
This field isn't really used for anything. Its value is
managed by sqlite, and it only exists to get printed as
part of the user info... but that is just printing all fields
its not like it was actually there for any reason. Just cargo
cult. There is a rowid column provided automatically by sqlite
that it can fully handle behind the scenes, so why include this
column explicitly.
as of previous commit this no longer is used for anything so we
can remove it from the schema.

binary blob needed updating as always.
@charliemirabile

Copy link
Copy Markdown
Contributor Author

This script was helpful to me for updating the binary blob in the tests. Not needed for reviewing the changes, but useful for anyone else who will change the schema.

#!/bin/sh
set -ex

#create a new empty db from the schema using version of sqlite from alpine
podman run \
	-i \
	--rm \
	alpine:3.19 \
	sh -c '
		apk add sqlite >/dev/null 2>/dev/null
		sqlite3 db
		cat db
	' \
	< orbit/init-db.sql \
	> /tmp/orbit.db

trap 'rm /tmp/orbit.db' EXIT

#create hexdump of db. Use options suggested here https://reproducible-builds.org/docs/archives/ to make it idempotent
tar \
	--sort=name \
	--mtime='@0' \
	--owner=100 \
	--group=100 \
	--numeric-owner \
	--pax-option=exthdr.name=%d/PaxHeaders/%f,delete=atime,delete=ctime \
	-C /tmp/ \
	-cz \
	orbit.db \
| xxd \
> /tmp/orbit.db.hex

trap 'rm /tmp/orbit.db /tmp/orbit.db.hex' EXIT

# delete current hexdump and replace with new one
# commands are:
# 1) seek to line after one matching /'EOF'/
# 2) delete from curr line to one before one matching /EOF/
# 3) read in hex file starting at line before the one matching /EOF/
# 4) write and quit
ed -s test.sh <<'EOF'
/'EOF'/+
.,/EOF/- d
- r /tmp/orbit.db.hex
wq
EOF

@charliemirabile

charliemirabile commented Apr 24, 2024

Copy link
Copy Markdown
Contributor Author

this script however will be useful. It does a full cycle of bringing containers up/down rebuilding and testing. The real innovation here is that it knows how to wait until synapse is finished initializing before starting the tests.

#!/bin/sh
set -ex

podman-compose down || true # not a big deal if this fails since they might not be up
yes | podman volume prune #slightly scary in case you have other volumes you care about. FIXME: could name them explicitly
podman-compose build
podman-compose up -d
# wait until synapse is done initializing
podman-compose logs -f submatrix 2>&1 | sed '/Synapse now listening on TCP port 8008/ q'
./test.sh
podman-compose down

@theyoyojo theyoyojo merged commit 5521454 into master Apr 25, 2024
@theyoyojo theyoyojo deleted the db_improvements branch April 25, 2024 15:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants