Technical Things for Non-Technical People Today: Data Formats (YAML and Its Siblings)

In the past, I have often been asked to explain quite complex things to people who usually don't have any technical background, e.g. a marketer or an economist. Often, they don't understand why certain things are complex by nature and why no one can make them simple. It is not that they are stupid; the contrary is true. They just lack certain basics (just as I do when it comes to business economics). Therefore, I try to explain the basics to them, so that they can grasp the fundamental connections and dependencies. It makes both our lives easier, and it's really fun to explain things to someone who is eager to learn.

Today I want to explain data structures. What do I mean by that? Data are usually stored in a more or less structured way. Let me use an example to explain what that means. Let us assume we want to open a store for high-quality clothes. Users can define some preferences, so that special offers for entire outfits can be created for them automatically. They define their favorite color, their preferred trouser and shirt styles, and which kind of shoes they like. So what would such a user profile look like in a more or less structured way?

  • User Identification
    • Favorite Color
    • Shirt
      • Collar style
      • Material
      • Tailoring
    • Trousers
      • Material
      • Tailoring
    • Shoes
      • Heel
      • Toe-Cap
      • Material
        • Inside
        • Outside

We need such a structure to pass the corresponding information from one service to another, e.g. from the user configuration service to the shop, or from the backend service to the web or mobile client.
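The list above is already a data structure. As a minimal sketch, here is how a programmer might model it in Python with dataclasses. Python is simply my choice for the examples in this post, and the class and field names (`UserProfile`, `Shirt`, `ShoeMaterial` and so on) are hypothetical; the sample values match the ones used in the examples further down.

```python
from dataclasses import dataclass


@dataclass
class Shirt:
    collar_style: str
    material: str
    tailoring: str


@dataclass
class Trousers:
    material: str
    tailoring: str


@dataclass
class ShoeMaterial:
    inside: str
    outside: str


@dataclass
class Shoes:
    heel: str
    toe_cap: str
    material: ShoeMaterial  # a group nested inside another group


@dataclass
class UserProfile:
    user_id: str  # the "User Identification", here an e-mail address
    favorite_color: str
    shirt: Shirt
    trousers: Trousers
    shoes: Shoes


profile = UserProfile(
    user_id="john.doe@somewhere.com",
    favorite_color="Red",
    shirt=Shirt("Button-Down", "Cotton", "Comfort-fit"),
    trousers=Trousers("Cotton", "Five-Pocket"),
    shoes=Shoes("High", "Metal", ShoeMaterial("Leather", "Leather")),
)

# Following the indentation of the list: shoes -> material -> inside
print(profile.shoes.material.inside)  # Leather
```

Each indentation level of the list becomes one level of nesting, which is exactly what the data formats below have to express somehow.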

If you imagine such a structure in a flat format like Excel, it would probably look like this:

Field | Value
User | Annegret.junker@adesso.de
Favorite Color | Red
Shirt Collar Style | Button-Down
Shirt Material | Cotton
Shirt Tailoring | Comfort-fit
Trousers Material | Cotton
Trousers Tailoring | Five-Pocket
Shoes Heel | High
Shoes Toe-Cap | Metal
Shoes Material Inside | Leather
Shoes Material Outside | Leather

As you can see, you always need to transport both the name of each item (the key) and its value. Such a list is called key-value pairs. In our example, flat key-value pairs do not look very efficient, and they are rarely used for data like this. But for settings that are seldom changed, and usually only in single places, key-value pairs make sense, e.g. for service configurations.
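The flat table can be produced mechanically from the nested structure: you walk down every branch and glue the names along the way into one long key. Below is a minimal sketch; the function name `flatten` is hypothetical, and joining names with a dot is my assumption (it is a common convention in simple configuration files such as Java `.properties` files, not a rule).

```python
def flatten(data: dict, prefix: str = "") -> dict:
    """Turn nested dictionaries into flat key-value pairs.

    Nested names are joined with dots, e.g. {"shirt": {"material": "Cotton"}}
    becomes {"shirt.material": "Cotton"}.
    """
    flat = {}
    for key, value in data.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))  # descend into the nested group
        else:
            flat[name] = value  # a leaf: one key-value pair
    return flat


profile = {
    "user": "john.doe@somewhere.com",
    "favoriteColor": "Red",
    "shirt": {"collar": "Button-Down", "material": "Cotton", "tailoring": "Comfort-fit"},
    "trousers": {"material": "Cotton", "tailoring": "Five-Pocket"},
    "shoes": {
        "heels": "High",
        "toe": "Metal",
        "material": {"inside": "Leather", "outside": "Leather"},
    },
}

for key, value in flatten(profile).items():
    print(f"{key} = {value}")  # e.g. "shoes.material.inside = Leather"
```

The result has one line per leaf, eleven in total, just like the eleven rows of the table. Every line repeats the full path of names, which is why this layout gets long for deeply nested data.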

So it might be better to transport the entire information, but to structure it by indentation:

user: john.doe@somewhere.com
favoriteColor: Red
shirt:
  collar: Button-Down
  material: Cotton
  tailoring: Comfort-fit
trousers:
  material: Cotton
  tailoring: Five-Pocket
shoes:
  heels: High
  toe: Metal
  material:
    inside: Leather
    outside: Leather

As you can see, you can only interpret this correctly if the indentation is correct. Otherwise, the user might end up with a leather shirt instead of a cotton one.
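To make the role of indentation concrete, here is a toy parser written as a minimal sketch. The function name `parse_indented` is hypothetical, and it only understands `key: value` lines, where a line ending in a colon opens a nested group. It is not a real YAML parser (real programs would use a library such as PyYAML); it only shows how the number of leading spaces decides where a value belongs.

```python
def parse_indented(text: str) -> dict:
    """Toy parser: 'key: value' lines, nested by indentation (spaces only).

    Not a real YAML parser: no lists, quoting, comments or multi-line values.
    """
    root: dict = {}
    stack = [(-1, root)]  # (indentation of the group's key, group dictionary)
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        indent = len(line) - len(line.lstrip(" "))
        key, _, value = line.strip().partition(":")
        value = value.strip()
        # Close every group whose key is indented as deep as this line or deeper
        while indent <= stack[-1][0]:
            stack.pop()
        parent = stack[-1][1]
        if value:
            parent[key] = value  # a leaf, e.g. "material: Cotton"
        else:
            parent[key] = {}  # e.g. "shirt:" opens a nested group
            stack.append((indent, parent[key]))
    return root


correct = """
shirt:
  material: Cotton
shoes:
  material: Leather
"""

# "shoes:" indented by mistake: it becomes part of the shirt, and the next
# line is read as the shirt's material, overwriting "Cotton".
mistake = """
shirt:
  material: Cotton
  shoes:
  material: Leather
"""

print(parse_indented(correct))
# {'shirt': {'material': 'Cotton'}, 'shoes': {'material': 'Leather'}}
print(parse_indented(mistake))
# {'shirt': {'material': 'Leather', 'shoes': {}}}
```

Two spaces in the wrong place are enough to produce the leather shirt, and the parser has no way to notice that anything went wrong.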

And there actually is a format that works like this: YAML, which stands for “YAML Ain’t Markup Language”. It is often used for configuration files and for interface descriptions (describing the interface itself, not the data that flows through it). You can find it widely used for web and mobile application interface definitions, e.g. swagger.io.

So a better variation would be to keep naming what each value stands for (as in the key-value pairs), but to mark the structure with explicit brackets instead of relying on indentation alone:

{
  "user": "john.doe@somewhere.com",
  "favoriteColor": "Red",
  "shirt": {
    "collar": "Button-Down",
    "material": "Cotton",
    "tailoring": "Comfort-fit"
  },
  "trousers": {
    "material": "Cotton",
    "tailoring": "Five-Pocket"
  },
  "shoes": {
    "heels": "High",
    "toe": "Metal",
    "material": {
      "inside": "Leather",
      "outside": "Leather"
    }
  }
}

Such a format is called JSON, which stands for JavaScript Object Notation. It is used, for example, to exchange data between a web or mobile client (e.g. the app) and the server side (e.g. some cloud service). The format is widespread and especially popular for microservices and REST interfaces (which I will explain in my next post). As you can see, we still have the key-value pairs, but now they belong to something: the grouping is done by curly braces, while the indentation only makes it easier for humans to read. However, it is not always obvious where an object ends. An entry ends with a comma or a closing brace, and in a long document it is easy to lose track of which brace closes which object. So there can be content errors even though the data are syntactically correct.
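Python's standard `json` module can illustrate both kinds of error. In this minimal sketch, the helper `shirt_material` and the shortened profiles are hypothetical. A brace closed too early still parses without complaint, but the material ends up attached to the user instead of the shirt; a missing comma, by contrast, is a syntax error and is rejected.

```python
import json

# Correct: "material" sits inside the shirt object.
good = '{"user": "john.doe@somewhere.com", "shirt": {"collar": "Button-Down", "material": "Cotton"}}'

# Shirt's closing brace placed too early: still valid JSON,
# but "material" now belongs to the user, not to the shirt.
shifted = '{"user": "john.doe@somewhere.com", "shirt": {"collar": "Button-Down"}, "material": "Cotton"}'

# Missing comma after the user: this is a syntax error.
broken = '{"user": "john.doe@somewhere.com" "shirt": {}}'


def shirt_material(document: str):
    """Return the shirt's material, or None if the shirt has none."""
    profile = json.loads(document)
    return profile.get("shirt", {}).get("material")


print(shirt_material(good))     # Cotton
print(shirt_material(shifted))  # None: parsed fine, but the content is wrong

try:
    shirt_material(broken)
except json.JSONDecodeError as err:
    print("rejected:", err.msg)
```

The parser protects you against the third case only; the second one has to be caught by whoever reads the data.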

To be sure where a certain object ends, we can repeat its name at both its start and its end:

<userPreferences user="john.doe@somewhere.com">
  <favoriteColor>Red</favoriteColor>
  <shirt>
    <collar>Button-Down</collar>
    <material>Cotton</material>
    <tailoring>Comfort-fit</tailoring>
  </shirt>
  <trousers>
    <material>Cotton</material>
    <tailoring>Five-Pocket</tailoring>
  </trousers>
  <shoes>
    <heels>High</heels>
    <toe>Metal</toe>
    <material>
      <inside>Leather</inside>
      <outside>Leather</outside>
    </material>
  </shoes>
</userPreferences>


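Such a format is called XML, short for Extensible Markup Language. It is quite reliable, but that reliability comes at the cost of heaviness: every value is wrapped in an opening and a closing tag, so the same data take up noticeably more space. Therefore, interfaces that use XML, such as SOAP interfaces, are usually slower and cannot be changed as quickly as other interfaces.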
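Because every element is closed by name, an XML parser can check the structure for you. This minimal sketch uses Python's standard `xml.etree.ElementTree` on a shortened version of the user preferences; the helper `is_well_formed` is hypothetical. Reading values works by following the element names, and closing tags in the wrong order are rejected outright.

```python
import xml.etree.ElementTree as ET

document = """
<userPreferences user="john.doe@somewhere.com">
  <favoriteColor>Red</favoriteColor>
  <shirt>
    <material>Cotton</material>
  </shirt>
  <shoes>
    <material>
      <inside>Leather</inside>
    </material>
  </shoes>
</userPreferences>
"""


def is_well_formed(text: str) -> bool:
    """Return True if the XML parser accepts the text (e.g. all tags properly closed)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


root = ET.fromstring(document.strip())
print(root.get("user"))                         # john.doe@somewhere.com
print(root.find("shirt/material").text)         # Cotton
print(root.find("shoes/material/inside").text)  # Leather

# </shirt> closes before </material>: the parser refuses the document.
print(is_well_formed("<shirt><material>Cotton</shirt></material>"))  # False
```

The price of this safety is visible in the example itself: each value needs its name written twice.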

I hope you now better understand the differences between these formats. Please let me know if you liked the post and if you have more questions about technical things. Always with the motto: “How do I explain it to my manager?”
