Imply’s Post

15,275 followers

2w Edited

🔎 #DruidSummit Speaker Spotlight: Imply's Abhishek Balaji Radhakrishnan – Ingesting Delta Lake tables into Druid
Level: 🟠 Intermediate
https://bit.ly/4dBxZsm

More Relevant Posts

Dr. Benedikt Koehler

CEO & Founder @ DataLion | Dashboards and Automation for Market Research and Consumer Insights
4mo
Did you know how easy it is to create a stacked bar chart in DataLion? Here’s everything you need to get started. #dashboards #datavisualization
Miles Cole

Principal Program Manager @ Microsoft, Azure Data CAT | Spark | Lakehouse | Blogger on all things Big Data
3mo
UPDATE: Release 0.1.2 of onelake-shortcut-tools is out, and it includes the updated Delta Lake 3.1 features available in Fabric Runtime 1.3 Preview (liquid clustering, default columns). Need to evaluate which external tables can be read from and written to in Fabric? Just install my library via pip and get a clean compatibility report. See the blog link in the comments for how to run it. #deltalake
Nnaemezue Obi-Eyisi

Managing Delivery Architect at Capgemini with expertise in Azure Databricks and Data Engineering. I teach Azure Data Engineering and Databricks!
1mo
It took me a while to learn this: for Delta tables that use Hive-style partitioning instead of Liquid Clustering, running an OPTIMIZE command will not compact smaller files if the table is over-partitioned. This is because files cannot be combined or compacted across partition boundaries. As a result, you will experience poor performance when querying that Delta table unless you perform a full rewrite of the table with a new partition scheme or implement Liquid Clustering. #dataengineering
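The following toy Python model shows why over-partitioning defeats compaction. Bin-packing happens separately inside each partition, so a partition that holds only 10 MB of data can never produce a file larger than 10 MB, no matter how often compaction runs. This is an illustrative sketch and not Delta Lake's actual OPTIMIZE implementation. The 1024 MB target size and both file layouts are assumed numbers.

```python
from collections import defaultdict

TARGET_FILE_MB = 1024  # assumed target size for compacted files


def compact(files, target_mb=TARGET_FILE_MB):
    """Bin-pack files into outputs of up to target_mb, never mixing partitions.

    files: list of (partition_key, size_mb) tuples.
    Returns a list of (partition_key, size_mb) tuples for the compacted files.
    """
    by_partition = defaultdict(list)
    for partition, size in files:
        by_partition[partition].append(size)

    result = []
    for partition, sizes in by_partition.items():
        current = 0
        for size in sizes:
            # Close the current output file when the next input would overflow it.
            if current and current + size > target_mb:
                result.append((partition, current))
                current = 0
            current += size
        if current:
            result.append((partition, current))
    return result


# Over-partitioned: 1,000 partitions, each holding ten 1 MB files.
over_partitioned = [(p, 1) for p in range(1000) for _ in range(10)]
# The same 10,000 MB of data with no partition boundaries.
unpartitioned = [(None, 1) for _ in range(10_000)]

for name, files in [("over-partitioned", over_partitioned), ("unpartitioned", unpartitioned)]:
    out = compact(files)
    avg = sum(size for _, size in out) / len(out)
    print(f"{name}: {len(files)} files -> {len(out)} files, avg {avg:.0f} MB")
```

Both layouts hold the same 10,000 MB. The over-partitioned layout still ends with 1,000 files of 10 MB each, because no file can cross a partition boundary. The unpartitioned layout packs into 10 files.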
Ashutosh Tamrakar

Python developer | Programmer | Data science | Data analyst | MySQL | Pandas | NumPy | Jupyter | BCA from Virendra Swarup Institute of Computer Studies | MCA from Asian International University
1mo
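A program to generate the multiplication tables from 2 to 20 and write each one to a separate file.

import os

def generate_table(n):
    # Build the 1-to-10 multiplication table for n and save it to tables/table_<n>.txt.
    table = ""
    for i in range(1, 11):
        table += f"{n} X {i} = {n * i}\n"
    with open(f"tables/table_{n}.txt", "w") as f:
        f.write(table)

# open() fails if the output directory does not exist, so create it first.
os.makedirs("tables", exist_ok=True)
for i in range(2, 21):
    generate_table(i)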
Venkat Talapaneni

Technical Lead 👨‍💻 | Data Engineer 📊 | Data Analysis 📈| Business Intelligence 🤖 | Cloud Engineering ☁️ | Snowflake ❄️ | AWS 🌨️ | ETL 💠 | IICS 📑|Databricks
4mo
Hello everyone! In this presentation, I discussed three ways to convert Parquet tables to Delta tables, depending on the use case: "CONVERT TO DELTA", "shallow cloning", and "deep cloning".

Convert Parquet Tables into Delta Tables

https://www.loom.com
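The three approaches named in the post map to SQL statements along these lines. This sketch only builds the statement strings, so it runs without a Spark cluster. The path and table name are hypothetical placeholders. The SHALLOW CLONE and DEEP CLONE forms with a Parquet source follow Databricks SQL, so check that your Delta Lake runtime supports them before passing the strings to spark.sql(...).

```python
def parquet_to_delta_sql(parquet_path: str, target_table: str) -> dict[str, str]:
    """Return example SQL for the three Parquet-to-Delta approaches.

    parquet_path and target_table are hypothetical placeholders.
    """
    source = f"parquet.`{parquet_path}`"
    return {
        # In place: adds a Delta transaction log over the existing Parquet files.
        # A partitioned Parquet table also needs a PARTITIONED BY (...) clause.
        "convert": f"CONVERT TO DELTA {source}",
        # New Delta table whose log references the source files; no data is copied,
        # so it depends on the source files staying where they are.
        "shallow_clone": f"CREATE OR REPLACE TABLE {target_table} SHALLOW CLONE {source}",
        # New Delta table with its own copy of the data, independent of the source.
        "deep_clone": f"CREATE OR REPLACE TABLE {target_table} DEEP CLONE {source}",
    }


for name, statement in parquet_to_delta_sql("/mnt/raw/events", "events_delta").items():
    print(f"{name}: {statement}")
```

Which method fits depends on the trade-off. CONVERT TO DELTA reuses the files in place. A shallow clone is cheap to create but tied to the source files. A deep clone costs a full copy but stands on its own.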

Jaromir Hamala

Software Engineer
4mo
I screwed up. I implemented the change. Microbenchmarks were promising, but real SQL benchmarks with serious datasets were disappointing: some queries ran three times slower. It felt awful, and I felt awful! I thought I would document my findings, blame pointer-chasing for the slowdown, and move on. But then my dear #QuestDB colleagues Andrei and Vlad pushed me to investigate a bit more. Sure enough, the slowdown was not inherent to the design but was merely due to an implementation bug, one subtle enough to go undetected in tests with a limited dataset size. After fixing the bug, the SQL benchmarks showed a nice 30% speed-up in some queries.
Moral of the story:
1. Microbenchmarks are fine, but nothing replaces real-world-like datasets.
2. Do not give up.
3. Benchmark numbers are just data. To gain reusable insights, you need to follow up on why the numbers are the way they are ;-)
Andrei Pechkurov

Core Database Engineer at QuestDB
4mo

Jaromir Hamala is cooking something fancy. That's a specialized single string key hash table for the #QuestDB query engine. The idea is to avoid redundant allocations and memcpy and, instead, store mmapped memory pointers in the table entries. The memory footprint is much lower and the table is more efficient.
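The following Python sketch illustrates the idea of keeping references in the table instead of copied keys. Each entry stores an (offset, length) pair that points into one shared buffer, so inserting a key copies no string bytes. QuestDB's engine is written in Java and its real implementation is not shown here. The memoryview stands in for mmapped memory, and the class, method names, and sample column are hypothetical.

```python
class OffsetKeyHashMap:
    """Open-addressing hash map whose entries hold (offset, length) references
    into one shared buffer instead of private copies of the key bytes."""

    def __init__(self, buffer: bytes, capacity: int = 8):
        # memoryview stands in for mmapped column memory; slicing it copies nothing.
        self.buf = memoryview(buffer)
        self.slots = [None] * capacity  # capacity must stay a power of two
        self.size = 0

    def _find_slot(self, key, h):
        # Linear probing: stop at an empty slot or at the slot holding an equal key.
        mask = len(self.slots) - 1
        i = h & mask
        while self.slots[i] is not None:
            sh, off, ln, _ = self.slots[i]
            if sh == h and ln == len(key) and self.buf[off:off + ln] == key:
                break
            i = (i + 1) & mask
        return i

    def _grow(self):
        # Double the table and reinsert the existing (hash, offset, length, value) entries.
        entries = [s for s in self.slots if s is not None]
        self.slots = [None] * (len(self.slots) * 2)
        mask = len(self.slots) - 1
        for entry in entries:
            i = entry[0] & mask
            while self.slots[i] is not None:
                i = (i + 1) & mask
            self.slots[i] = entry

    def put(self, offset: int, length: int, value):
        # Keep the load factor at or below 0.5 so probing always finds a free slot.
        if (self.size + 1) * 2 > len(self.slots):
            self._grow()
        key = self.buf[offset:offset + length]
        h = hash(key)  # a read-only byte view hashes like the equivalent bytes
        i = self._find_slot(key, h)
        if self.slots[i] is None:
            self.size += 1
        self.slots[i] = (h, offset, length, value)  # only a reference is stored

    def get(self, key, default=None):
        key = memoryview(key)  # accepts bytes or a read-only memoryview
        slot = self.slots[self._find_slot(key, hash(key))]
        return default if slot is None else slot[3]

    def __len__(self):
        return self.size


# Hypothetical string column laid out back to back in one buffer.
column = b"applebananaapplecherry"
view = memoryview(column)
counts = OffsetKeyHashMap(column)
for off, ln in [(0, 5), (5, 6), (11, 5), (16, 6)]:
    counts.put(off, ln, counts.get(view[off:off + ln], 0) + 1)
```

Each table entry is a small fixed-size tuple, whatever the string length. The trade-off is an extra indirection into the buffer on every key comparison, which is the pointer-chasing cost mentioned in the post above.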
Karthik Paila

UG-CSE'27 || AI Enthusiast | SQL | Passionate about Problem Solving (DSA) | C, C++, Java, Python | Looking for an Internship
1mo
#Day27 of #MonthOfGraphs: LeetCode 1319, Number of Operations to Make Network Connected (Medium). Today I solved a question related to spanning trees, which are part of graph theory, and I used Kruskal's algorithm to solve it. Initially I attempted it with my own logic and passed many test cases, but one case was still rejected. Later I looked at the discussion and saw that many others were facing the same situation, so I watched Striver's approach. He used the same logic with a small change, and I corrected my solution by analyzing some more examples and got it accepted. #DFS #BFS #GraphAlgorithms #Undirected #LearningDSA Raj Vikramaditya, Mani Bhargavi Bendalam, Pradeep Kumar Puvvala, Praveen Kumar Inti, Sahu Akshaya
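For this problem, the union-find (disjoint-set) structure at the heart of Kruskal's algorithm is enough on its own. There are no edge weights to sort, so you union every cable and count the connected components. Joining k components takes k - 1 spare cables. The answer is -1 when there are fewer than n - 1 cables in total. This is a minimal sketch of that standard approach, not the author's or Striver's code, and the function name `make_connected` is chosen here for illustration.

```python
def make_connected(n: int, connections: list[list[int]]) -> int:
    """Minimum number of cable moves to connect all n computers, or -1 if impossible."""
    # Connecting n nodes needs at least n - 1 cables in total.
    if len(connections) < n - 1:
        return -1

    parent = list(range(n))

    def find(x: int) -> int:
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = n
    for a, b in connections:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge two components
            components -= 1
    # Every cable that did not merge components is spare and can be moved,
    # and the count check above guarantees enough spares for the remaining joins.
    return components - 1
```

For example, `make_connected(4, [[0, 1], [0, 2], [1, 2]])` returns 1. The redundant cable 1-2 can be moved to connect computer 3.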
Ali Pala

Senior QA Engineer @ ABN AMRO Bank N.V. | Test Automation Coach | DevOps, REST APIs | GenAI Enthusiast
1w
All of us are trying to make our models more efficient and more reliable. To do that, we either fine-tune them or implement a RAG pipeline. Here is a good article on how to build a GraphRAG with LlamaIndex: https://lnkd.in/e7wdUyUf
Delta Lake

58,749 followers
1mo
The column mapping feature allows Delta table columns and the underlying #Parquet file columns to use different names. 🙌
📌 This enables Delta schema evolution operations such as RENAME COLUMN and DROP COLUMNS on a Delta table without rewriting the underlying Parquet files. ✍
📌 It also lets users name Delta table columns with characters that Parquet does not allow, such as spaces, so CSV or JSON data can be ingested directly into Delta without renaming columns to satisfy the old character constraints.
🔗 Check out the documentation for more: https://lnkd.in/eviHRkRr #opensource #oss #linuxfoundation #deltalake

Delta column mapping

docs.delta.io
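The following toy Python model shows the idea behind column mapping. Queries use logical names, the data files store columns under physical names, and a rename or drop only edits the mapping between the two. In real Delta Lake the feature is enabled with the table property 'delta.columnMapping.mode' = 'name'. The class below, its col-N physical names, and the sample data are hypothetical, and they illustrate only why no Parquet rewrite is needed.

```python
class ColumnMappedTable:
    """Toy model of name-based column mapping: logical names resolve to
    physical column names that never change once data is written."""

    def __init__(self, columns: dict[str, list]):
        self.mapping = {}        # logical name -> physical name
        self.physical_data = {}  # physical name -> values (stands in for Parquet files)
        for i, (name, values) in enumerate(columns.items(), start=1):
            physical = f"col-{i}"  # hypothetical; real Delta generates its own identifiers
            self.mapping[name] = physical
            self.physical_data[physical] = list(values)

    def rename_column(self, old: str, new: str) -> None:
        # Metadata-only: the physical column keeps its name and its data.
        self.mapping[new] = self.mapping.pop(old)

    def drop_column(self, name: str) -> None:
        # Metadata-only: the values stay in the files but are no longer visible.
        del self.mapping[name]

    def read(self) -> dict[str, list]:
        # Resolve each visible logical name to its physical column.
        return {name: self.physical_data[phys] for name, phys in self.mapping.items()}


# "user id" contains a space, which the logical layer allows.
table = ColumnMappedTable({"user id": [1, 2], "country": ["NL", "IN"]})
table.rename_column("country", "country_code")
table.drop_column("user id")
print(table.read())                 # {'country_code': ['NL', 'IN']}
print(sorted(table.physical_data))  # ['col-1', 'col-2']
```

After the rename and the drop, the physical columns are unchanged. Only the logical-to-physical mapping was edited.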

More from this author

Imply Polaris is now on Microsoft Azure

How to Monitor Your Data in Real Time with AWS IoT Core & Imply

Druid Summit, Druid 28.0, and the Developer Center
