PostgreSQL and pgvector: Now Faster than Pinecone, 75% cheaper, 100% open-source.
Introducing pgvectorscale, an open-source PostgreSQL extension that builds on pgvector, enabling greater performance and scalability. Here’s how pgvectorscale helps pgvector outperform specialized vector databases like Pinecone:
1️⃣ StreamingDiskANN: A new vector search index that overcomes limitations of in-memory indexes like HNSW by storing part of the index on disk, making it more cost-efficient to run and scale as vector workloads grow. Inspired by the DiskANN paper from Microsoft.
2️⃣ Statistical Binary Quantization (SBQ): Developed by researchers at Timescale, this technique improves on standard binary quantization by preserving more accuracy when quantization is used to reduce the space needed for vector storage.
3️⃣ Written in Rust, giving the PostgreSQL community a new way to contribute to vector support.
📈 The result? On our benchmark of 50 million Cohere embeddings (768 dimensions each), PostgreSQL with pgvector and pgvectorscale achieves 28x lower p95 latency and 16x higher query throughput compared to Pinecone for approximate nearest neighbor queries at 99% recall, all at 75% less cost when self-hosted on AWS EC2. We also tested it against Pinecone’s p2 high-performance index; see the blog post linked at the end of this post for full results (spoiler: it’s just as impressive).
Pgvectorscale is open-source under the PostgreSQL license and free for you to use on any PostgreSQL database for your AI projects. To get started, see the pgvectorscale GitHub repo: https://lnkd.in/ghXj2e-U Or try it on Timescale Cloud on any new database service.
Eager to learn more about pgvectorscale and how it works? Head over to our blog post with all the details: https://lnkd.in/gcMcxrVb
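The post does not spell out how SBQ works, so the following is only a minimal sketch of the standard binary quantization it says SBQ improves on: each dimension is reduced to one bit and vectors are compared by Hamming distance. The per-dimension mean threshold shown next to it is an illustrative assumption about how a data-derived cutoff can preserve more information, not Timescale's implementation. All function names and the toy data are hypothetical.

```python
from typing import List, Sequence


def binary_quantize(vector: Sequence[float], thresholds: Sequence[float]) -> int:
    """Pack one bit per dimension into an int: bit i is 1 when vector[i] > thresholds[i]."""
    bits = 0
    for i, (value, cutoff) in enumerate(zip(vector, thresholds)):
        if value > cutoff:
            bits |= 1 << i
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two quantized codes differ."""
    return bin(a ^ b).count("1")


def per_dimension_means(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Mean of each dimension across the dataset (assumed data-derived cutoff)."""
    dims = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dims)]


# Toy 3-dimensional "embeddings" whose values are all positive.
vectors = [
    [0.9, 0.1, 0.5],
    [0.8, 0.5, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.8, 0.2],
]

# Standard binary quantization: cutoff at 0.0 for every dimension.
zero_codes = [binary_quantize(v, [0.0, 0.0, 0.0]) for v in vectors]

# Illustrative variant: cutoff at each dimension's mean.
mean_codes = [binary_quantize(v, per_dimension_means(vectors)) for v in vectors]

print(zero_codes)  # every vector collapses to the same code
print(mean_codes)  # codes now distinguish the vectors
```

With a fixed cutoff of zero, every all-positive vector maps to the same code, so Hamming distance cannot tell them apart; a cutoff derived from the data keeps them distinct while still storing one bit per dimension.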
Timescale
Software Development
New York, New York · 10,950 followers
Timescale is the modern cloud platform built on PostgreSQL for time series, events, and analytics.
About us
Timescale is addressing one of the largest challenges (and opportunities) in databases for years to come: helping developers, businesses, and society make sense of the data that humans and their machines are generating in copious amounts. TimescaleDB is the only open-source time-series database that natively supports full-SQL, combining the power, reliability, and ease-of-use of a relational database with the scalability typically seen in NoSQL systems. It is built on PostgreSQL and optimized for fast ingest and complex queries. TimescaleDB is deployed for powering mission-critical applications, including industrial data analysis, complex monitoring systems, operational data warehousing, financial risk management, and geospatial asset tracking across industries as varied as manufacturing, space, utilities, oil & gas, logistics, mining, ad tech, finance, telecom, and more. Timescale is backed by NEA, Benchmark, Icon Ventures, Redpoint Ventures, Two Sigma Ventures, and Tiger Global. Documentation: https://docs.timescale.com GitHub: https://github.com/timescale/timescaledb Twitter: https://twitter.com/timescaledb
- Website: https://www.timescale.com/
- Industry: Software Development
- Company size: 51-200 employees
- Headquarters: New York, New York
- Type: Privately Held
- Founded: 2015
- Specialties: RDBMS, OpenTelemetry, Observability, Promscale, Technology, PostgreSQL, SQL, Data Historian, Geospatial Data, Time-Series Data, Databases, IoT, Sensor Data, Metrics, Developer Community, Software Development, Open Source, Software, and Data Management
Products
Timescale Cloud
Time Series Databases (TSDB)
TimescaleDB is a time-series SQL database that provides fast analytics and scalability, with automated data management on a proven storage engine.
Locations
-
Primary
335 Madison Ave.
Floor 5, Suite E
New York, New York 10017, US
Updates
-
Timescale reposted this
Building AI and Developer Products at Timescale | I talk about vector databases, RAG, search, AI agents and PostgreSQL
If you're building a RAG system, embeddings search alone won't cut it. Go beyond vector search in our new blog by Jason Liu, where we deep dive into how truly useful RAG systems need a multi-layered approach to address limitations of vector search and provide real value to users. Here's an overview of some of the techniques you'll learn to improve your RAG application:
- Text to SQL
- Hybrid search
- Time-based filtering
- Structured extraction with LLMs
- Metadata augmentation and filtering
- Parallel tool calling (semantic search, SQL generation)
- And more!
https://lnkd.in/eRXfCKyX #rag #vectorsearch #embeddingsearch #vectordatabase
RAG Is More Than Just Vector Search
timescale.com
-
How can you leverage PostgreSQL's JSONB data type for flexible data storage? 📋 Explore the versatility of PostgreSQL's JSONB data type for storing semi-structured data, enabling dynamic schema design and simplified data modeling. JSONB allows efficient storage, indexing, and querying of JSON data within PostgreSQL, making it ideal for applications requiring flexible data structures.
Code (SQL):
CREATE TABLE user_data (id SERIAL PRIMARY KEY, info JSONB);
INSERT INTO user_data (info) VALUES ('{"name": "John", "age": 30}');
-- ->> returns text, so age comes back as the string '30'
SELECT info->>'name', info->>'age' FROM user_data;
#JSONB #FlexibleDataStorage
-
Timescale reposted this
👀 Seen in Timescale Slack: "The bulk of PostgreSQL 17 support for timescaledb is merged...never have we been so far along to support the next major postgres version before it is even released" The planned date for PostgreSQL 17 is September 26... the team did this *weeks* before This team is 🔥
-
Timescale reposted this
Building AI and Developer Products at Timescale | I talk about vector databases, RAG, search, AI agents and PostgreSQL
If you're building an AI app with PostgreSQL/pgvector, you'll probably need to add filters to your semantic search to get better results. Here's an overview of the 5 most useful filter options for building search and RAG apps:
1/ Metadata filter. Use case: A technical documentation search system for a software company with multiple products. This query searches for documents related to the CRM Software product, specifically API reference documents.
2/ Composite filter. Use case: An e-commerce product recommendation system that considers both user preferences and product attributes. This query combines multiple filters to find relevant products within specific categories, price range, stock status, and rating.
3/ Time-based filter. Use case: A news article recommendation system that prioritizes recent content. This query retrieves semantically similar news articles published within the last 7 days.
4/ Permissions-based filter. Use case: A company-wide knowledge base where access to certain documents is restricted based on user roles. (This is super common for internal RAG apps!) This query ensures that only documents the user has permission to access are included in the search results.
5/ Geo-spatial filter. Use case: A location-based service recommendation system for tourists. This query combines semantic search with geospatial filtering to find tourist attractions within a 5km radius of the user's location, ordered by semantic relevance and distance.
Pro-tip: Use pgvectorscale and the StreamingDiskANN index to get higher search accuracy for filtered search. You can learn more about how streaming filtering works in the blog post linked in the comments. pgvectorscale is open-source and free to use under the PostgreSQL license. Head over to GitHub to get started.
I hope this thread was helpful. Let me know what other topics you'd like me to cover regarding building RAG and search systems with Postgres and pgvector.
#semanticsearch #vectordatabase #postgresql #rag #pgvector #devtool #pinecone #search
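The queries the thread refers to are not included in the text, so here is a minimal sketch of what the metadata and time-based filters could look like when combined with a pgvector similarity search. It only builds a parameterized SQL string and never connects to a database. The `documents` table, its columns, and the `build_filtered_search` helper are hypothetical; `<=>` is pgvector's cosine distance operator, and the `%s` placeholders assume a psycopg-style driver.

```python
from typing import Any, List, Optional, Tuple


def build_filtered_search(
    query_embedding: List[float],
    product: Optional[str] = None,
    doc_type: Optional[str] = None,
    max_age_days: Optional[int] = None,
    limit: int = 5,
) -> Tuple[str, List[Any]]:
    """Return (sql, params) for a filtered nearest-neighbor query.

    Assumed (hypothetical) schema:
    documents(id, content, product, doc_type, published_at, embedding vector)
    """
    conditions: List[str] = []
    params: List[Any] = []

    # Metadata filters: plain equality predicates on ordinary columns.
    if product is not None:
        conditions.append("product = %s")
        params.append(product)
    if doc_type is not None:
        conditions.append("doc_type = %s")
        params.append(doc_type)

    # Time-based filter: keep only rows newer than N days.
    if max_age_days is not None:
        conditions.append("published_at > now() - make_interval(days => %s)")
        params.append(max_age_days)

    where = f"WHERE {' AND '.join(conditions)} " if conditions else ""

    # pgvector accepts a vector as text in the form '[x1,x2,...]'.
    params.append("[" + ",".join(str(x) for x in query_embedding) + "]")
    params.append(limit)

    sql = (
        "SELECT id, content FROM documents "
        f"{where}"
        "ORDER BY embedding <=> %s::vector "
        "LIMIT %s"
    )
    return sql, params


sql, params = build_filtered_search(
    [0.1, 0.2],
    product="CRM Software",
    doc_type="API reference",
    max_age_days=7,
    limit=3,
)
print(sql)
print(params)
```

Filter values are passed as parameters rather than interpolated into the string, which keeps user-supplied metadata out of the SQL text. The resulting pair could be handed to a driver call such as `cursor.execute(sql, params)`.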
-
Just ICYMI: The 2024 State of PostgreSQL survey is now open!💥🤯 It aims to gather insights from the PostgreSQL community and help track trends such as the increasing popularity of PostgreSQL, growth in managed services adoption, and evolving learning preferences.✨ Last year's report showed a rise in PostgreSQL usage among small businesses and developers, with its open-source nature, reliability, and strong documentation being top reasons for its use. The survey will remain open until September 30, 2024. Take it now! https://lnkd.in/gER82F4y
State of PostgreSQL 2024 survey 🐘
https://typeform.com
-
Timescale reposted this
Data/AI folks in New York - Data Driven NYC #114 just announced (September 18 at 6pm) 🚨 🚨 🚨 * Captions AI with co-founder Dwight Churchill * Timescale with co-founder 🐯 Michael Freedman * TitanML with co-founder Meryem Arik Everyone welcome! But you must RSVP, link below 👇 #AI #ML #data #startups #LLM
Data Driven NYC featuring Captions AI, Timescale and TitanML
eventbrite.com
-
Timescale reposted this
New webinar alert! 🚨 https://lu.ma/0v7nwfxd
Building AI applications with PostgreSQL: A busy developer's guide · Zoom · Luma
lu.ma
-
Timescale reposted this
Me and Sven (Timescale colleague) gave a talk about improving #PostgreSQL plans by “constifying” expressions at #pgibiza2024. It’s an optimization we’ve implemented in TimescaleDB. References: https://lnkd.in/dvJb_pmi https://lnkd.in/d8SJjpQ4
How We Fixed Long-Running PostgreSQL now( ) Queries (and Made Them Lightning Fast)
timescale.com
-
Timescale reposted this
I came across this article about "SQL as fast as NoSQL" today (as always great content Franck Pachot and Yugabyte). I know it's not the point of the article, but what took my eye was the table showing independent ingest benchmarking, with TimescaleDB topping the list 👀. I got old mate ChatGPT to work out the relative ingest rates as percentages 🔥:
---
Here are the relative insert speeds as percentages compared to TimescaleDB:
#TimescaleDB: 100.00% (600,000 inserts/sec)
#InfluxDB: 76.67% (460,000 inserts/sec)
#PostgreSQL: 71.33% (428,000 inserts/sec)
#Cassandra async inserts: 68.33% (410,000 inserts/sec)
#Cassandra sync inserts: 64.83% (389,000 inserts/sec)
#YugabyteDB YSQL: 49.17% (295,000 inserts/sec)
#YugabyteDB YCQL: 48.00% (288,000 inserts/sec)
#Elasticsearch: 28.33% (170,000 inserts/sec)
#ArangoDB: 22.83% (137,000 inserts/sec)
#CockroachDB: 15.17% (91,000 inserts/sec)
SQL as fast as NoSQL, Bulk Loads, Covering and Partial Indexes
dev.to
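The relative percentages in the post above can be reproduced with a few lines of arithmetic. This is a small sketch that uses only the insert rates quoted in the post; the `relative_percentages` helper is a hypothetical name introduced for illustration.

```python
from typing import Dict

# Insert rates (inserts/sec) exactly as quoted in the post.
INSERT_RATES: Dict[str, int] = {
    "TimescaleDB": 600_000,
    "InfluxDB": 460_000,
    "PostgreSQL": 428_000,
    "Cassandra async inserts": 410_000,
    "Cassandra sync inserts": 389_000,
    "YugabyteDB YSQL": 295_000,
    "YugabyteDB YCQL": 288_000,
    "Elasticsearch": 170_000,
    "ArangoDB": 137_000,
    "CockroachDB": 91_000,
}


def relative_percentages(rates: Dict[str, int], baseline: str) -> Dict[str, float]:
    """Express every rate as a percentage of the baseline system's rate."""
    base = rates[baseline]
    return {name: round(100 * rate / base, 2) for name, rate in rates.items()}


percentages = relative_percentages(INSERT_RATES, "TimescaleDB")
for name, pct in percentages.items():
    print(f"{name}: {pct:.2f}% ({INSERT_RATES[name]:,} inserts/sec)")
```

Running it yields the same figures as the list in the post, for example 76.67% for InfluxDB and 15.17% for CockroachDB.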