This is a collection of assumptions, API / implementation differences
and comments about the ndb rewrite process.
The primary differences come from:
- Absence of "legacy" APIs provided by Google App Engine (e.g.
google.appengine.api.datastore_types) as well as other environment specific features (e.g. theAPPLICATION_IDenvironment variable) - Differences in Datastore APIs between the versions provided by Google App Engine and Google Clould Platform.
- Presence of new features in Python 3 like keyword only arguments and async support
The biggest difference is in establishing a runtime context for your NDB application. The Google App Engine Python 2.7 runtime had a strong assumption that all code executed inside a web framework request-response cycle, in a single thread per request. In order to decouple from that assumption, Cloud NDB implements explicit clients and contexts. This is consistent with other Cloud client libraries.
The `Client` class has been introduced which by and large works the same as
Datastore's `Client` class and uses `google.auth` for authentication. You
can pass a `credentials` parameter to `Client` or use the
`GOOGLE_APPLICATION_CREDENTIALS` environment variable (recommended). See
<https://cloud.google.com/docs/authentication/getting-started> for details.
Once a client has been obtained, you still need to establish a runtime context,
which you can do using the `Client.context` method.
```python
from google.cloud import ndb

# Assume GOOGLE_APPLICATION_CREDENTIALS is set in environment
client = ndb.Client()
with client.context() as context:
    do_stuff_with_ndb()
```
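To make the explicit client/context design concrete, here is a simplified, stdlib-only sketch of the mechanism. This is an illustration of the pattern, not Cloud NDB's actual implementation; the names `_state`, `Client`, and `current_context` are hypothetical stand-ins:

```python
import contextlib
import threading

# Thread-local storage: each thread sees only the context it installed.
_state = threading.local()


class Client:
    """Toy client illustrating the explicit-context pattern."""

    @contextlib.contextmanager
    def context(self):
        # Install this client as the active context for the current thread,
        # and remove it again when the `with` block exits.
        _state.context = self
        try:
            yield self
        finally:
            _state.context = None


def current_context():
    """Look up the active context; fail loudly if none is installed."""
    ctx = getattr(_state, "context", None)
    if ctx is None:
        raise RuntimeError("No context is active; use client.context().")
    return ctx


client = Client()
with client.context():
    assert current_context() is client
```

The key design point is that library calls look up state explicitly installed by a context manager, rather than assuming a web-framework request cycle.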
Because the Google App Engine Memcache service is not a part of the Google
Cloud Platform, it was necessary to refactor the "memcache" functionality of
NDB. The concept of a memcache has been generalized to that of a "global cache"
and defined by the `GlobalCache` interface, which is an abstract base class.
NDB provides a single concrete implementation of `GlobalCache`, `RedisCache`,
which uses Redis.
In order to enable the global cache, a `GlobalCache` instance must be passed
into the context. The bootstrapping example can be amended as follows:
```python
from google.cloud import ndb

# Assume GOOGLE_APPLICATION_CREDENTIALS is set in environment.
client = ndb.Client()

# Assume REDIS_CACHE_URL is set in environment (or not).
# If left unset, this will return `None`, which effectively allows you to turn
# the global cache on or off using the environment.
global_cache = ndb.RedisCache.from_environment()
with client.context(global_cache=global_cache) as context:
    do_stuff_with_ndb()
```
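To illustrate the abstract-base-class design, here is a hedged, stdlib-only sketch of what a global-cache interface can look like. The method names and signatures are simplified illustrations, not the exact `GlobalCache` API, and `InMemoryCache` is a hypothetical stand-in for `RedisCache`:

```python
import abc


class GlobalCacheSketch(abc.ABC):
    """Illustrative shape of a global-cache interface (not the real API)."""

    @abc.abstractmethod
    def get(self, keys):
        """Return cached values for `keys`, with None for misses."""

    @abc.abstractmethod
    def set(self, items, expires=None):
        """Store a mapping of key -> value, optionally with an expiry."""


class InMemoryCache(GlobalCacheSketch):
    """Toy concrete implementation, e.g. for tests without a Redis server."""

    def __init__(self):
        self._data = {}

    def get(self, keys):
        return [self._data.get(key) for key in keys]

    def set(self, items, expires=None):
        self._data.update(items)


cache = InMemoryCache()
cache.set({b"k": b"v"})
values = cache.get([b"k", b"missing"])  # misses come back as None
```

Because the interface is an abstract base class, any backend (Redis, an in-process dict, something else entirely) can be plugged in as long as it implements the required methods.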
`context.Context` had a number of methods that were direct pass-throughs to GAE
Memcache. These are no longer implemented. The affected methods of
`context.Context` are: `memcache_add`, `memcache_cas`, `memcache_decr`,
`memcache_delete`, `memcache_get`, `memcache_gets`, `memcache_incr`,
`memcache_replace`, and `memcache_set`.
- The "standard" exceptions from App Engine are no longer available. Instead,
we'll create "shims" for them in
google.cloud.ndb.exceptionsto match the class names and emulate behavior. - There is no replacement for
google.appengine.api.namespace_managerwhich is used to determine the default namespace when not passed in toKey() - The
Key()constructor (and helpers) make a distinction betweenunicodeandstrtypes (in Python 2). These are nowunicode->strandstr->bytes. However,google.cloud.datastore.Key()(the actual type we use under the covers), only allows thestrtype in Python 3, so much of the "type-check and branch" from the original implementation is gone. This may cause some slight differences. Key.from_old_key()andKey.to_old_key()always raiseNotImplementedError. Without the actual types from the legacy runtime, these methods are impossible to implement. Also, since this code won't run on legacy Google App Engine, these methods aren't needed.Key.app()may not preserve the prefix from the constructor (this is noted in the docstring)Key.__eq__previously claimed to be "performance-conscious" and directly usedself.__app == other.__appand similar comparisons. We don't store the same data on ourKey(we just make a wrapper aroundgoogle.cloud.datastore.Key), so these are replaced by functions callsself.app() == self.app()which incur some overhead.- The verification of kind / string ID fails when they exceed 1500 bytes. The original implementation didn't allow in excess of 500 bytes, but it seems the limit has been raised by the backend. (FWIW, Danny's opinion is that the backend should enforce these limits, not the library.)
- `Property.__creation_counter_global` has been removed, as it seems to have
  been included for a feature that was never implemented. See Issue #175 for
  the original rationale for including it and Issue #6317 for discussion of
  its removal.
- `ndb` uses "private" instance attributes in many places, e.g. `Key.__app`.
  The current implementation (for now) just uses "protected" attribute names,
  e.g. `Key._key` (the implementation has changed in the rewrite). We may want
  to keep the old "private" names around for compatibility. However, in some
  cases the underlying representation of the class has changed (such as `Key`)
  due to newly available helper libraries or due to missing behavior from the
  legacy runtime.
- `query.PostFilterNode.__eq__` compares `self.predicate` to `other.predicate`
  rather than using `self.__dict__ == other.__dict__`.
- `__slots__` have been added to most non-exception types for a number of
  reasons. The first is the naive "performance" win and the second is that
  this will make it transparent whenever `ndb` users refer to non-existent
  "private" or "protected" instance attributes.
- I dropped `Property._positional` since keyword-only arguments are native
  Python 3 syntax, and dropped `Property._attributes` in favor of an approach
  using `inspect.signature()`.
- A bug in `Property._find_methods` was fixed where `reverse=True` was applied
  before caching and then not respected when pulling from the cache.
- The `Property._find_methods_cache` has been changed. Previously it would be
  set on each `Property` subclass and populated dynamically on first use. Now
  `Property._FIND_METHODS_CACHE` is set to `{}` when the `Property` class is
  created, and there is another level of keys (based on fully-qualified class
  name) in the cache.
- `BlobProperty._datastore_type` has not been implemented; the base class
  implementation is sufficient. The original implementation wrapped a byte
  string in a `google.appengine.api.datastore_types.ByteString` instance, but
  that type was mostly an alias for `str` in Python 2.
- `BlobProperty._validate` used to special-case "too long when indexed" if
  `isinstance(self, TextProperty)`. We have removed this check since the
  implementation does the same check in `TextProperty._validate`.
- The `BlobProperty` constructor only sets `_compressed` if explicitly passed.
  The original set `_compressed` always (and used `False` as the default). In
  the exact same fashion, the `JsonProperty` constructor only sets
  `_json_type` if explicitly passed. Similarly, the `DateTimeProperty`
  constructor only sets `_auto_now` and `_auto_now_add` if explicitly passed.
- `TextProperty(indexed=True)` and `StringProperty(indexed=False)` are no
  longer supported (see docstrings for more info).
- `model.GeoPt` is an alias for `google.cloud.datastore.helpers.GeoPoint`
  rather than an alias for `google.appengine.api.datastore_types.GeoPt`. These
  classes have slightly different characteristics.
- The `Property()` constructor (and subclasses) originally accepted both
  `unicode` and `str` (the Python 2 versions) for `name` (and `kind`), but we
  only accept `str`.
- The `Parameter()` constructor (and subclasses) originally accepted `int`,
  `unicode` and `str` (the Python 2 versions) for `key`, but we only accept
  `int` and `str`.
- When a `Key` is used to create a query "node", e.g. via
  `MyModel.my_value == some_key`, the underlying behavior has changed.
  Previously a `FilterNode` would be created with the actual value set to
  `some_key.to_old_key()`. Now, we set it to `some_key._key`.
- The `google.appengine.api.users.User` class is missing, so there is a
  replacement in `google.cloud.ndb.model.User` that is also available as
  `google.cloud.ndb.User`. This does not support federated identity and has
  new support for adding such a user to a `google.cloud.datastore.Entity` and
  for reading one from a new-style `Entity`.
- The `UserProperty` class no longer supports `auto_current_user(_add)`.
- `Model.__repr__` will use `_key` to describe the entity's key when there is
  also a user-defined property named `key`. For an example, see the class
  docstring for `Model`.
- `Future.set_exception` no longer takes a `tb` argument. Python 3 does a good
  job of remembering the original traceback for an exception, and there is no
  longer any value added by manually keeping track of the traceback ourselves.
  This method shouldn't generally be called by user code, anyway.
- `Future.state` is omitted as it is redundant. Call `Future.done()` or
  `Future.running()` to get the state of a future.
- `StringProperty` properties were previously stored as blobs
  (`entity_pb2.Value.blob_value`) in Datastore. They are now properly stored
  as strings (`entity_pb2.Value.string_value`). At read time, a
  `StringProperty` will accept either a string or blob value, so compatibility
  is maintained with legacy databases.
- The `QueryOptions` class from `google.cloud.ndb.query` has been
  reimplemented, since
  `google.appengine.datastore.datastore_rpc.Configuration` is no longer
  available. It still uses the same signature, but does not support the
  original `Configuration` methods.
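Several of the bullets above lean on keyword-only arguments being native Python 3 syntax. A minimal sketch of how a constructor can enforce keyword-only arguments without a `_positional` helper, and how `inspect.signature()` can discover the declared arguments (the `ExampleProperty` class here is hypothetical, not the real `ndb.Property`):

```python
import inspect


class ExampleProperty:
    """Hypothetical property-style class; not the real ndb.Property."""

    def __init__(self, name=None, *, indexed=None, repeated=None):
        # Arguments after the bare `*` may only be passed by keyword.
        self._name = name
        self._indexed = indexed
        self._repeated = repeated


prop = ExampleProperty("tag", indexed=True)

try:
    ExampleProperty("tag", True)  # positional use of `indexed` fails up front
except TypeError:
    pass

# inspect.signature() can recover the declared arguments, which is the kind
# of introspection that can replace a hand-maintained `_attributes` list.
params = inspect.signature(ExampleProperty.__init__).parameters
keyword_only = [n for n, p in params.items()
                if p.kind is inspect.Parameter.KEYWORD_ONLY]
# keyword_only == ["indexed", "repeated"]
```

Enforcing keyword-only use in the language itself means misuse is a `TypeError` at call time, with no decorator machinery required.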
- Because `google.appengine.datastore.datastore_query.Order` is no longer
  available, the `ndb.query.PropertyOrder` class has been created to replace
  it.
- Transaction propagation is no longer supported. This was a feature of the
older Datastore RPC library which is no longer used. Starting a new
transaction when a transaction is already in progress in the current context
will result in an error, as will passing a value for the `propagation` option
when starting a transaction.
- The `xg` option for transactions is ignored. Previously, setting this to
  `True` allowed writes to up to 5 entity groups in a transaction, as opposed
  to only being able to write to a single entity group. Currently, Datastore
  supports writing to up to 25 entity groups in a transaction by default, and
  there is no option to change this.
- The Datastore API no longer supports entity group metadata queries, so
  `google.cloud.ndb.metadata.EntityGroup` and
  `google.cloud.ndb.metadata.get_entity_group_version` both raise a
  `google.cloud.ndb.exceptions.NoLongerImplementedError` exception when used.
- The `batch_size` and `prefetch_size` arguments to `Query.fetch` and
  `Query.fetch_async` are no longer supported. These were passed through
  directly to Datastore, which no longer supports these options.
- The `index_list` method of `QueryIterator` is not implemented. Datastore no
  longer returns this data with query results, so it is not available from
  the API in this way.
- The `produce_cursors` query option is deprecated. Datastore always returns
  cursors where it can, and NDB always makes them available when possible.
  This option can be passed in, but it will be ignored.
- The `max` argument to `Model.allocate_ids` and `Model.allocate_ids_async`
  is no longer supported. The Google Datastore API does not support setting a
  maximum ID, a feature that GAE Datastore presumably had.
- `model.get_indexes()` and `model.get_indexes_async()` are no longer
  implemented, as the support in Datastore for these functions has
  disappeared in the move from GAE to GCP.
- The `max_memcache_items` option is no longer supported.
- The `force_writes` option is no longer supported.
- The `blobstore` module is no longer supported.
- The `pass_batch_into_callback` argument to `Query.map` and
  `Query.map_async` is no longer supported.
- The `merge_future` argument to `Query.map` and `Query.map_async` is no
  longer supported.
- `Key.urlsafe()` output is subtly different: the original NDB included a GAE
  Datastore-specific "location prefix", but that string is neither necessary
  nor available on Cloud Datastore. For applications that require `urlsafe()`
  strings to be exactly consistent between versions, use
  `Key.to_legacy_urlsafe(location_prefix)` and pass in your location prefix
  as an argument. Location prefixes are most commonly `"s~"` (or `"e~"` in
  Europe), but the easiest way to find your prefix is to base64-decode any
  urlsafe key produced by the original NDB and manually inspect it. The
  location prefix will be consistent for an App Engine project and its
  corresponding Datastore instance over its entire lifetime.
- `Key.urlsafe` outputs a `bytes` object in Python 3. This is consistent
  behavior and actually just a change in nomenclature: in Python 2, the `str`
  type referred to a bytestring, and in Python 3 the corresponding type is
  called `bytes`. Users may notice a difficulty in incorporating `urlsafe()`
  strings in JSON objects in Python 3; that is due to a change in the
  `json.JSONEncoder` default behavior between Python 2 and Python 3 (in
  Python 2, `json.JSONEncoder` accepted bytestrings and attempted to convert
  them to unicode automatically, which can result in corrupted data and as
  such is no longer done) and does not reflect a change in NDB behavior.
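The manual-inspection step described above can be sketched as follows. This is a hedged illustration: the `decode_urlsafe` helper is hypothetical, and the input should be a urlsafe key string produced by the original NDB:

```python
import base64


def decode_urlsafe(urlsafe):
    """Base64-decode a urlsafe key so its raw bytes can be inspected.

    Hypothetical helper: urlsafe key strings have their '=' padding
    stripped, so we restore it before decoding.
    """
    if isinstance(urlsafe, str):
        urlsafe = urlsafe.encode("ascii")
    padded = urlsafe + b"=" * (-len(urlsafe) % 4)
    return base64.urlsafe_b64decode(padded)


# Round-trip demonstration with a made-up payload: a key from a "s~"
# location would begin with those two bytes once decoded.
raw = b"s~example-app"
encoded = base64.urlsafe_b64encode(raw).rstrip(b"=")
assert decode_urlsafe(encoded).startswith(b"s~")
```

For a real key, decode it this way and look at the leading bytes to find the location prefix to pass to `Key.to_legacy_urlsafe`.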
App Engine NDB exposed some internal utilities as part of the public API. A few bits of the nominally public API have been found to be de facto private. These are pieces that are omitted from public facing documentation and which have no apparent use outside of NDB internals. These pieces have been formally renamed as part of the private API:
- `eventloop` has been renamed to `_eventloop`.
- `tasklets.get_return_value` has been renamed to
  `tasklets._get_return_value` and is no longer among top level exports.
- `tasklets.MultiFuture` has been renamed to `tasklets._MultiFuture`, removed
  from top level exports, and has a much simpler interface.
These options classes appear not to have been used directly by users and are
not implemented; the public-facing API used keyword arguments instead, which
are still supported:
- `ContextOptions`
- `TransactionOptions`
The following pieces appear to have been only used internally and are no longer implemented due to the features they were used for having been refactored:
- `Query.run_to_queue`
- `tasklets.add_flow_exception`
- `tasklets.make_context`
- `tasklets.make_default_context`
- `tasklets.QueueFuture`
- `tasklets.ReducingFuture`
- `tasklets.SerialQueueFuture`
- `tasklets.set_context`
A number of functions in the `utils` package appear to have only been used
internally and have been made obsolete either by API changes, internal
refactoring, or new features of Python 3, and are no longer implemented:
- `utils.code_info()`
- `utils.decorator()`
- `utils.frame_info()`
- `utils.func_info()`
- `utils.gen_info()`
- `utils.get_stack()`
- `utils.logging_debug()`
- `utils.positional()`
- `utils.tweak_logging()`
- `utils.wrapping()` (use `functools.wraps` instead)
- `utils.threading_local()`
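For example, the replacement for `utils.wrapping()` is the standard library's `functools.wraps`, which copies a wrapped function's metadata onto its wrapper. A minimal sketch (the `trace` decorator here is a hypothetical stand-in for an NDB-style wrapper):

```python
import functools


def trace(func):
    """Hypothetical decorator illustrating functools.wraps usage."""
    @functools.wraps(func)  # replaces the old utils.wrapping() helper
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


@trace
def fetch_page(cursor=None):
    """Fetch one page of results."""
    return []


# The wrapper keeps the original name and docstring:
# fetch_page.__name__ == "fetch_page"
```

Without `functools.wraps`, the decorated function would report `wrapper` as its name and lose its docstring, which breaks introspection and documentation tools.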
One of the largest classes of differences comes from the use of the current
Datastore API, rather than the legacy App Engine Datastore. In general, for
users coding to the public interface, this won't be an issue, but users relying
on pieces of the ostensibly private API that are exposed to the bare metal of
the original datastore implementation will have to rewrite those pieces.
Specifically, any function or method that dealt directly with protocol buffers
will no longer work. The Datastore protocol buffer definitions have changed
significantly from the barely public API used by App Engine to the current
published API. Additionally, this version of NDB mostly delegates to
`google.cloud.datastore` for parsing data returned by RPCs, which is a
significant internal refactoring.
- `ModelAdapter` is no longer used. In legacy NDB, this was passed to the
  Datastore RPC client so that calls to Datastore RPCs could yield NDB
  entities directly from Datastore RPC calls. AFAIK, Datastore no longer
  accepts an adapter for adapting entities. At any rate, we no longer do it
  that way.
- `Property._db_get_value` and `Property._db_set_value` are no longer used.
  They worked directly with Datastore protocol buffers, work which is now
  delegated to `google.cloud.datastore`.
- `Property._db_set_compressed_meaning` and
  `Property._db_set_uncompressed_meaning` were used by
  `Property._db_set_value` and are no longer used.
- `Model._deserialize` and `Model._serialize` are no longer used. They worked
  directly with protocol buffers, so weren't really salvageable.
  Unfortunately, there were comments indicating they were overridden by
  subclasses. Hopefully this isn't broadly the case.
- `model.make_connection` is no longer implemented.
- There is rampant use (and abuse) of `__new__` rather than `__init__` as a
  constructor in the original implementation. By using `__new__`, sometimes a
  different type is returned from the constructor. It seems that feature,
  along with the fact that `pickle` only calls `__new__` (and never
  `__init__`), is why `__init__` is almost never used.
- The `Key.__getnewargs__()` method isn't needed. For pickle protocols 0 and
  1, `__new__` is not invoked on a class during unpickling; the state
  "unpacking" is handled solely via `__setstate__`. However, for pickle
  protocols 2, 3 and 4, during unpickling an instance will first be created
  via `Key.__new__()` and then `__setstate__` would be called on that
  instance. The addition of `__getnewargs__` allows the (positional)
  arguments to be stored in the pickled bytes. All of the work of the
  constructor happens in `__new__`, so the call to `__setstate__` is
  redundant. In our implementation `__setstate__` is sufficient, hence
  `__getnewargs__` isn't needed.
- Key parts (i.e. kind, string ID and / or integer ID) are verified when a
  `Reference` is created. However, this won't occur when the corresponding
  protobuf for the underlying `google.cloud.datastore.Key` is created. This
  is because the `Reference` is a legacy protobuf message type from App
  Engine, while the latest (`google/datastore/v1`) RPC definition uses a
  `Key`.
- There is a `Property._CREATION_COUNTER` that gets incremented every time a
  new `Property()` instance is created. This increment is not threadsafe.
  However, `ndb` was designed for `Property()` instances to be created at
  import time, so this may not be an issue.
- `ndb.model._BaseValue` for "wrapping" non-user values should probably be
  dropped or redesigned if possible.
- Since we want "compatibility", suggestions in `TODO` comments have not been
  implemented. However, that policy can be changed if desired.
- It seems that `query.ConjunctionNode.__new__` had an unreachable line that
  returned a `FalseNode`. This return has been changed to a `RuntimeError`
  just in case it is actually reached.
- For `AND` and `OR` to compare equal, the nodes must come in the same order.
  So `AND(a > 7, b > 6)` is not equal to `AND(b > 6, a > 7)`.
- The whole `bytes` vs. `str` issue needs to be considered package-wide. For
  example, the `Property()` constructor always encoded Python 2 `unicode` to
  a Python 2 `str` (i.e. `bytes`) with the `utf-8` encoding. This fits in
  some sense: the property name in the protobuf definition is a `string`
  (i.e. UTF-8 encoded text). However, there is a bit of a disconnect with
  other types that use property names, e.g. `FilterNode`.
- There is a giant web of module interdependency, so runtime imports (to
  avoid import cycles) are very common. For example, `model.Property` depends
  on `query` but `query` depends on `model`.
- Will need to sort out dependencies on old RPC implementations and port to
  modern gRPC. (Issue #6363)