
Launching the Data Dynamics Tech Blog

We'll be sharing lessons learned from the field of enterprise (big) data and AI on an ongoing basis.

Data Dynamics | April 10, 2026 | 2 min read

Hello, we're Data Dynamics. Starting today, we'll regularly publish what we've learned from enterprise (big) data and AI work in the field on this tech blog.

What we write about

We cover three main topics:

Data Platforms: Operating and developing on Apache Airflow, Impala, Kudu, Hive, Hadoop, Spark, NiFi, Databricks, Delta Lake, Unity Catalog, and more
AI / MLOps: Kubernetes (K8s), AI platforms, generative AI pipelines, model deployment and monitoring, RAG architecture, and more
Engineering in the Field: Architecture design, large-scale migrations, post-incident analysis, performance tuning, and more

In other words, we document "stories that aren't in the official docs but are essential in production."

Code examples

Since this is a tech blog, code naturally comes with the territory. Two small samples, one in PySpark and one in Delta Lake SQL:

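from pyspark.sql import DataFrame, functions as F

def enrich_events(df: DataFrame) -> DataFrame:
    """Add event_date and is_valid columns, then keep only rows that have a user_id."""
    return (
        df.withColumn("event_date", F.to_date("event_ts"))       # timestamp -> calendar date
          .withColumn("is_valid", F.col("user_id").isNotNull())  # flag rows that carry a user_id
          .filter(F.col("is_valid"))                             # drop rows without a user_id
    )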

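-- OPTIMIZE only the last 7 days of partitions in Delta Lake.
-- The WHERE predicate of OPTIMIZE may only reference partition columns,
-- so this assumes the events table is partitioned by event_date.
OPTIMIZE events
WHERE event_date >= current_date() - INTERVAL 7 DAYS
ZORDER BY (user_id);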
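
A statement like the one above is usually run on a schedule, so the table name and look-back window tend to become parameters. The following is a minimal sketch of that idea, not code from our pipelines: the helper names `build_optimize_sql` and `_check_identifier` are hypothetical, and the identifier check is a deliberately strict assumption (plain or dotted names only) to keep arbitrary text out of the generated SQL.

```python
import re

# Hypothetical helper (not part of any library): renders the OPTIMIZE
# statement shown above for a given table and look-back window.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*")


def _check_identifier(name: str) -> str:
    """Accept only plain or dotted identifiers such as 'events' or 'db.events'."""
    if not isinstance(name, str) or not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def build_optimize_sql(table, partition_col, days, zorder_cols):
    """Build an OPTIMIZE ... WHERE ... ZORDER BY statement as a string."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(days, bool) or not isinstance(days, int) or days <= 0:
        raise ValueError("days must be a positive integer")
    if not zorder_cols:
        raise ValueError("zorder_cols must not be empty")
    cols = ", ".join(_check_identifier(c) for c in zorder_cols)
    return (
        f"OPTIMIZE {_check_identifier(table)}\n"
        f"WHERE {_check_identifier(partition_col)} "
        f">= current_date() - INTERVAL {days} DAYS\n"
        f"ZORDER BY ({cols})"
    )


if __name__ == "__main__":
    print(build_optimize_sql("events", "event_date", 7, ["user_id"]))
```

In a scheduled job, the returned string could be handed to `spark.sql(...)`. Building the text separately keeps the validation testable without a cluster.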

What's ahead

We'll publish 1-2 posts per week, with engineers from the team taking turns writing. If you have feedback or topics you'd like us to cover, feel free to let us know via Contact Us.

We'll unpack the complexity of the field honestly, but always in a way that helps.

— The Data Dynamics Team