Type Mismatch – The New Stack
Bobbie Cochrane
Bobbie is a seasoned research scientist with a proven track record in the information technology and services industry, specializing in blockchain, scalability, IBM DB2, cloud, IT and enterprise software.
Typeless systems, such as NoSQL databases, JSON, JavaScript, and Python, have grown in popularity and are very useful in practice because the world’s data does not conform perfectly to a consistent, rigid structure.
Even if it did, it is nearly impossible to capture all facets of the data and predict its future uses.
However, the data is also not completely schema-free, and any application that uses the data will apply a well-defined schema.
Opponents of NoSQL complain that its popularity takes application development back to the hierarchical databases of the 1960s. That complaint ignores the reality that the data in these typeless systems does, in fact, have a consistent structure, and that its schema is well designed and evolves in an orderly way with the application development process.
With the advent of GraphQL, we can introduce a type system to these systems without taking away the flexibility they introduced, bringing order to what appears to be clutter.
Let’s walk through this using one of our favorite backend databases, MongoDB.
Suppose we have a GraphQL schema for data that represents a customer:
type Customer {
  id: ID!
  name: String!
  email: String
  age: Int
  joined: Date!
}
The above indicates that a customer has five attributes, each of which must be of the indicated type. The presence of “!” in GraphQL indicates that the field is non-nullable, similar to NOT NULL in SQL.
Now suppose that your MongoDB collection contains the following three JSON-style documents:
[
  {
    "id": "12345",
    "name": "John Doe",
    "email": "john.doe@example.com",
    "age": 20,
    "joined": "2021-01-10",
    "children": ["John", "Mary"]
  },
  {
    "id": "23456",
    "name": "Jane Doe",
    "email": "jane.doe@example.com",
    "age": 30,
    "joined": "2020-01-10",
    "hobbies": ["surfing", "reading"]
  },
  {
    "id": "34567",
    "name": "John Smith",
    "email": "john.smith@example.com",
    "age": 40.1,
    "offices": ["StepZen"]
  }
]
As you can see, the data has a fairly consistent structure, with a few exceptions. The most obvious is that each document has its own distinguishing field: John Doe has children, Jane Doe has hobbies, and John Smith has offices. The “joined” property seems to be common, but on closer inspection, we see that one of the documents is missing this field. And finally, if we only considered Mr. and Mrs. Doe, we might think that age could be represented as an integer, but Mr. Smith’s age is clearly not an integer.
Herman Camarena
Herman is a Senior Software Engineer at StepZen and has over 25 years of technology experience. Herman was a data scientist and senior software engineer at Google. Prior to that, he served as CTO at Spribo and QoS Labs and began his tech journey by holding leadership positions at Apple, WebMethods and BEA Systems.
How would this data fit our GraphQL model? There are obvious problems:
- The joined field cannot be marked with ! because it is missing from the last document. Easy fix, right? Declare it as “joined: Date” instead of “joined: Date!”. But what if, out of a million documents, this is the only record that has no joined value? From that wider perspective, removing the ! now seems like a bad idea, because the anomaly could represent bad data.
- The documents have various additional fields. This is a fairly common pattern even when the data lives in strictly typed systems such as relational database management systems, which is why BLOBs (Binary Large Objects), XML and JSON were introduced there. With GraphQL we could do something like:
type Customer {
  …
  children: [String!]
  hobbies: [String!]
  offices: [String!]
}
to ensure that all three records can be mapped. Note that the children, hobbies, and offices fields do not have a ! after them, indicating that they are optional. (The String! inside the list means that an element of the list cannot be null.) However, this also seems suboptimal; after all, we could end up with hundreds of such fields across millions of records, which leads to excessive bloat in the schema and in application logic. We’ll show you a better way below.
- Our GraphQL schema assumes the Int type for age, which works for Mr. and Mrs. Doe but is not compatible with Mr. Smith’s age, which would be better handled with the Float type unless it is cast to Int.
These are just a few simple examples that illustrate the problems of marrying a strongly typed system (GraphQL) with a weakly typed one (here, JSON).
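To make the three mismatches concrete, here is a minimal Python sketch that checks the sample documents against a hand-written mirror of the Customer type. The `SCHEMA` table and the `check` helper are illustrative names, not part of GraphQL or of any StepZen tool, and the Date type is approximated as a string.

```python
# Hand-written mirror of the Customer type (an assumption for illustration):
# field name -> (expected Python type, declared non-nullable?)
SCHEMA = {
    "id": (str, True),
    "name": (str, True),
    "email": (str, False),
    "age": (int, False),
    "joined": (str, True),  # Date approximated as an ISO date string
}

DOCS = [
    {"id": "12345", "name": "John Doe", "email": "john.doe@example.com",
     "age": 20, "joined": "2021-01-10", "children": ["John", "Mary"]},
    {"id": "23456", "name": "Jane Doe", "email": "jane.doe@example.com",
     "age": 30, "joined": "2020-01-10", "hobbies": ["surfing", "reading"]},
    {"id": "34567", "name": "John Smith", "email": "john.smith@example.com",
     "age": 40.1, "offices": ["StepZen"]},
]


def check(doc):
    """Return a list of mismatches between one document and SCHEMA."""
    problems = []
    for field, (py_type, required) in SCHEMA.items():
        value = doc.get(field)
        if value is None:
            if required:
                problems.append(f"{field}: missing but declared non-nullable")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(value, bool) or not isinstance(value, py_type):
            problems.append(
                f"{field}: expected {py_type.__name__}, got {type(value).__name__}"
            )
    # Fields present in the document but absent from the schema.
    for field in doc:
        if field not in SCHEMA:
            problems.append(f"{field}: not declared in the schema")
    return problems


for doc in DOCS:
    print(doc["name"], check(doc))
```

Run against the sample, the two Doe documents report only their undeclared extra field, while John Smith's document reports all three problems: a non-integer age, a missing joined value, and the undeclared offices field.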
However, our team at StepZen has found great solutions using GraphQL for these and other issues.
- A GraphQL type can be JSON. This means that even though the data contained in this JSON is opaque to GraphQL, any data in the form of JSON can be assigned to this field, preserving the flexibility of the underlying typeless system. For example, if there was a field:
type Customer {
  …
  extras: JSON
}
then all records could go through the GraphQL layer.
- A GraphQL type can be a UNION type. So we could say:
union Extra = Hobbies | Children | Offices
type Customer {
  …
  extras: [Extra!]
}
and this keeps the representation compact. Records also remain strongly typed, except for the extras field, whose items can each be any of the three types. Additionally, we can use fragments with type conditions to select the fields specific to each Extra type:
{
  customers(id: "12345") {
    name
    extras {
      __typename
      ... on Hobbies {
        level
      }
      ... on Children {
        name
      }
      ... on Offices {
        location
      }
    }
  }
}
In the above, the GraphQL query explicitly states: “If an extras item is of type Hobbies, then return the level,” and so on. This allows the GraphQL API to do whatever it takes to return the data in the correct form, taking that burden off the application.
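The effect of a type condition can be mimicked in a few lines of Python. This is only a sketch of the selection semantics, not how a GraphQL server is implemented: the `FRAGMENTS` table and `select` function are hypothetical names, and the sample items (including their `since` and `age` fields) are made up for illustration.

```python
# Fields requested per union member, mirroring the fragments in the query.
FRAGMENTS = {
    "Hobbies": ["level"],
    "Children": ["name"],
    "Offices": ["location"],
}


def select(items):
    """Keep __typename plus only the fields requested for each item's type."""
    result = []
    for item in items:
        type_name = item["__typename"]
        row = {"__typename": type_name}
        for field in FRAGMENTS.get(type_name, []):
            row[field] = item.get(field)
        result.append(row)
    return result


# Hypothetical extras items, each already tagged with its concrete type.
extras = [
    {"__typename": "Hobbies", "level": "expert", "since": 2019},
    {"__typename": "Children", "name": "Mary", "age": 7},
]
print(select(extras))
```

Each item comes back with only the fields its fragment asked for, so the application never has to branch on fields that do not apply to a given type.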
In all of the above, the GraphQL API developer makes explicit choices about the types they want that roughly match the types of the backends. What if we had the opposite problem? What if we inspect the backend and automatically derive GraphQL types from the underlying data? For such a system:
- Sample enough records to see diversity. MongoDB has a find operation that can retrieve multiple records. REST APIs can have pagination models. SQL systems have LIMIT and OFFSET. Either way, a dozen records is a good sample.
- Make wise choices. As mentioned above, the data in these systems is not completely schemaless; they contain lightweight patterns. Find the largest common substructures and declare the rarer ones as UNION or JSON.
- Choose the right types and, when in doubt, drop the !
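The three guidelines above can be sketched as a small Python function. This is a deliberately naive illustration under stated assumptions, not StepZen's JSON2SDL: the names `graphql_type` and `infer_sdl`, the `common` threshold, and the choice to fold rare fields into a single `extras: JSON` field are all invented for this example.

```python
from collections import Counter

SAMPLE = [
    {"id": "12345", "name": "John Doe", "email": "john.doe@example.com",
     "age": 20, "joined": "2021-01-10", "children": ["John", "Mary"]},
    {"id": "23456", "name": "Jane Doe", "email": "jane.doe@example.com",
     "age": 30, "joined": "2020-01-10", "hobbies": ["surfing", "reading"]},
    {"id": "34567", "name": "John Smith", "email": "john.smith@example.com",
     "age": 40.1, "offices": ["StepZen"]},
]


def graphql_type(values):
    """Pick the narrowest scalar that fits every sampled value."""
    if not values:
        return "JSON"
    kinds = {type(v) for v in values}
    if kinds <= {bool}:
        return "Boolean"
    if kinds <= {int}:
        return "Int"
    if kinds <= {int, float}:
        return "Float"  # widen Int to Float when any value is fractional
    if kinds <= {str}:
        return "String"
    return "JSON"  # anything else stays opaque


def infer_sdl(type_name, docs, common=0.5):
    """Derive a GraphQL type from sampled documents (illustrative only)."""
    counts = Counter(field for doc in docs for field in doc)
    lines = [f"type {type_name} {{"]
    has_rare = False
    for field, seen in counts.items():
        if seen / len(docs) < common:
            has_rare = True  # rare field: leave it to the JSON catch-all
            continue
        values = [d[field] for d in docs if d.get(field) is not None]
        # When in doubt, drop the "!": only fields present everywhere get it.
        bang = "!" if len(values) == len(docs) else ""
        lines.append(f"  {field}: {graphql_type(values)}{bang}")
    if has_rare:
        lines.append("  extras: JSON")
    lines.append("}")
    return "\n".join(lines)


print(infer_sdl("Customer", SAMPLE))
```

On the three sample documents this yields String! for id, name, and email, Float! for age, a nullable String for joined, and an extras: JSON field that absorbs children, hobbies, and offices. Recognizing that id should be ID or that joined is a Date would need extra heuristics that this sketch omits.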
If you look at JSON2SDL, you will find such an implementation built by the company we work for, StepZen. In fact, we’ve built a set of introspectors that make these kinds of smart choices against a wide variety of REST, SQL, and NoSQL backends. And we are currently releasing our REST2GraphQL introspector to the public.
The problem of type mismatches is an important problem to solve for GraphQL to finally realize its potential. However, as a language, it has many features that make the mismatch problem less serious. And tools like StepZen’s JSON2SDL are making the job more and more automatic. The paradigm mismatch between a NoSQL database such as MongoDB and GraphQL can be easily bridged. And this is also true for other backends like REST and SQL. All signs point to GraphQL becoming the default strongly-typed API layer for accessing all sorts of backends.
Image by mac231 from Pixabay