We'll be sharing lessons learned from the field of enterprise (big) data and AI on an ongoing basis.
Data Dynamics · April 10, 2026 · 2 min read
Hello, we're Data Dynamics.
Starting today, we'll regularly share what we've learned in the field about enterprise (big) data and AI here on our tech blog.
What we write about
We cover three main topics:
Data Platforms: Operational and development know-how for Apache Airflow, Impala, Kudu, Hive, Hadoop, Spark, NiFi, Databricks, Delta Lake, Unity Catalog, and more
AI / MLOps: Know-how for Kubernetes (K8s), AI platforms, generative AI pipelines, model deployment and monitoring, RAG architecture, and more
Engineering in the Field: Know-how for architecture design, large-scale migrations, post-incident analysis, performance tuning, and more
In other words, we document "stories that aren't in the official docs but are essential in production."
Code examples
Since this is a tech blog, code naturally comes with the territory.
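```python
from pyspark.sql import functions as F


def enrich_events(df):
    return (
        # Derive a calendar date from the event timestamp
        df.withColumn("event_date", F.to_date("event_ts"))
        # Flag rows that have a user_id
        .withColumn("is_valid", F.col("user_id").isNotNull())
        # Keep only the flagged rows
        .filter("is_valid")
    )
```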
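If you want to reason about what `enrich_events` does without a Spark session, here is a rough plain-Python mirror of the same three steps. It is an illustration only, not part of the original example: `enrich_events_local` is a hypothetical name, rows are assumed to be dicts with an ISO-8601 `event_ts` string and an optional `user_id`, and the sketch makes no attempt to reproduce Spark's handling of malformed timestamps.

```python
from datetime import date, datetime
from typing import Any, Optional


def enrich_events_local(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical plain-Python mirror of enrich_events (illustration only)."""
    enriched = []
    for row in rows:
        ts = row.get("event_ts")
        # Like F.to_date: keep only the calendar date (None if the timestamp is missing)
        event_date: Optional[date] = datetime.fromisoformat(ts).date() if ts else None
        # Like F.col("user_id").isNotNull()
        is_valid = row.get("user_id") is not None
        out = {**row, "event_date": event_date, "is_valid": is_valid}
        # Like .filter("is_valid"): drop rows without a user_id
        if is_valid:
            enriched.append(out)
    return enriched


rows = [
    {"event_ts": "2026-04-10T09:30:00", "user_id": 42},
    {"event_ts": "2026-04-10T10:00:00", "user_id": None},
]
result = enrich_events_local(rows)
print(len(result))              # 1
print(result[0]["event_date"])  # 2026-04-10
```

The row without a `user_id` is dropped, just as `.filter("is_valid")` drops it in the Spark version.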
```sql
-- OPTIMIZE only the last 7 days of partitions in Delta Lake
OPTIMIZE events
WHERE event_date >= current_date() - INTERVAL 7 DAYS
ZORDER BY (user_id);
```
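One detail worth checking with a predicate like this: `>=` is inclusive, so on a table partitioned by day it selects today's partition plus the seven before it. Whether that matches "the last 7 days" depends on how you count. The small Python sketch below shows which partition dates the `WHERE` clause would match. It assumes `event_date` is the partition column (Delta Lake only accepts partition-column filters in `OPTIMIZE ... WHERE`), and `partitions_selected` is a hypothetical helper, not part of any library.

```python
from datetime import date, timedelta


def partitions_selected(today: date, partition_dates: list[date], days: int = 7) -> list[date]:
    """Hypothetical helper: dates matching event_date >= today - INTERVAL `days` DAYS."""
    cutoff = today - timedelta(days=days)
    # The predicate is inclusive (>=), so the cutoff date itself is included
    return sorted(d for d in partition_dates if d >= cutoff)


today = date(2026, 4, 10)
# Assume one partition per day for the last 10 days
all_partitions = [today - timedelta(days=i) for i in range(10)]
selected = partitions_selected(today, all_partitions)
print(len(selected))  # 8: today plus the 7 previous days
print(selected[0])    # 2026-04-03
```

If you want exactly seven daily partitions including today, a strict `>` comparison against the same cutoff would do it.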
What's ahead
We'll publish 1-2 posts per week, with engineers from the team taking turns writing.
If you have feedback or topics you'd like us to cover, feel free to let us know via Contact Us.
We'll unpack the complexity of the field honestly, but always in a way that helps.