+$PostgreSQL$
+
Gin for PostgreSQL
==================
$PostgreSQL$
+GiST Indexing
+=============
+
This directory contains an implementation of GiST indexing for Postgres.
GiST stands for Generalized Search Tree. It was introduced in the seminal paper
a problem of uncompleted insertions when recovering after crash, which
was not touched in the paper.
-SEARCH ALGORITHM
+Search Algorithm
+----------------
Function gettuple finds a tuple which satisfies the search
predicate. It stores its state and returns the next tuple under
end
-INSERT ALGORITHM
+Insert Algorithm
+----------------
INSERT guarantees that the GiST tree remains balanced. User defined key method
Penalty is used for choosing a subtree to insert; method PickSplit is used for
$PostgreSQL$
+Hash Indexing
+=============
+
This directory contains an implementation of hash indexing for Postgres. Most
of the core ideas are taken from Margo Seltzer and Ozan Yigit, A New Hashing
Package for UNIX, Proceedings of the Winter USENIX Conference, January 1991.
There is no provision for reducing the number of buckets, either.
-Page addressing
+Page Addressing
---------------
There are four kinds of pages in a hash index: the meta page (page zero),
the initially created buckets.
-Lock definitions
+Lock Definitions
----------------
We use both lmgr locks ("heavyweight" locks) and buffer context locks
stronger than necessary, but it makes the proof of no deadlock obvious.)
-Pseudocode algorithms
+Pseudocode Algorithms
---------------------
The operations we need to support are: readers scanning the index for
we can just error out without any great harm being done.
-Free space management
+Free Space Management
---------------------
(Question: why is this so complicated? Why not just have a linked list
locks. Since they need no lmgr locks, deadlock is not possible.
-Other notes
+Other Notes
-----------
All the shenanigans with locking prevent a split occurring while *another*
$PostgreSQL$
+Btree Indexing
+==============
+
This directory contains a correct implementation of Lehman and Yao's
high-concurrency B-tree management algorithm (P. Lehman and S. Yao,
Efficient Locking for Concurrent Operations on B-Trees, ACM Transactions
Shasha (V. Lanin and D. Shasha, A Symmetric Concurrent B-Tree Algorithm,
Proceedings of 1986 Fall Joint Computer Conference, pp 380-389).
-The Lehman and Yao algorithm and insertions
+The Lehman and Yao Algorithm and Insertions
-------------------------------------------
We have made the following changes in order to incorporate the L&Y algorithm
this calculation, otherwise it is possible to find that the incoming
item doesn't fit on the split page where it needs to go!
-The deletion algorithm
+The Deletion Algorithm
----------------------
Before deleting a leaf item, we get a super-exclusive lock on the target
possible to implement the test with a small counter value stored on each
index page.
-On-the-fly deletion of index tuples
+On-the-Fly Deletion of Index Tuples
-----------------------------------
If a process visits a heap tuple and finds that it's dead and removable
btbulkdelete has to get super-exclusive lock on every leaf page, not only
the ones where it actually sees items to delete.
-WAL considerations
+WAL Considerations
------------------
The insertion and deletion algorithms in themselves don't guarantee btree
immediately deleted due to a subsequent crash, there is no loss of
consistency, and the empty page will be picked up by the next VACUUM.
-Other things that are handy to know
+Other Things That Are Handy to Know
-----------------------------------
Page zero of every btree is a meta-data page. This page stores the
scanned to decide whether to return the entry and whether the scan can
stop (see _bt_checkkeys()).
-Notes about data representation
+Notes About Data Representation
-------------------------------
The right-sibling link required by L&Y is kept in the page "opaque
corresponds to the fact that an L&Y non-leaf page has one more pointer
than key.
-Notes to operator class implementors
+Notes to Operator Class Implementors
------------------------------------
With this implementation, we require each supported combination of
takes care of initializing the memory subsystem at main transaction start.
-Subtransaction handling
+Subtransaction Handling
-----------------------
Subtransactions are implemented using a stack of TransactionState structures,
explicit transaction block has been established, while DefineSavepoint is not.
-Transaction and subtransaction numbering
+Transaction and Subtransaction Numbering
----------------------------------------
Transactions and subtransactions are assigned permanent XIDs only when/if
own VXIDs; they use the parent top transaction's VXID.
-Interlocking transaction begin, transaction end, and snapshots
+Interlocking Transaction Begin, Transaction End, and Snapshots
--------------------------------------------------------------
We try hard to minimize the amount of overhead and lock contention involved
clog.c. pg_subtrans is contained completely in subtrans.c.
-Write-Ahead Log coding
+Write-Ahead Log Coding
----------------------
The WAL subsystem (also called XLOG in the code) exists to guarantee crash
$PostgreSQL$
+System Catalog
+--------------
+
This directory contains .c files that manipulate the system catalogs;
src/include/catalog contains the .h files that define the structure
of the system catalogs.
(a separate FreeExprContext call is not necessary)
-EvalPlanQual (READ COMMITTED update checking)
+EvalPlanQual (READ COMMITTED Update Checking)
---------------------------------------------
For simple SELECTs, the executor need only pay attention to tuples that are
-*******************************************************************************
-* *
-* EXPLANATION OF THE NODE STRUCTURES *
-* - Andrew Yu (11/94) *
-* *
-* Copyright (c) 1994, Regents of the University of California *
-* *
-* $PostgreSQL$
-* *
-*******************************************************************************
-
-INTRODUCTION
+$PostgreSQL$
+
+Node Structures
+===============
+
+Andrew Yu (11/94)
+
+Introduction
+------------
The current node structures are plain old C structures. "Inheritance" is
achieved by convention. No additional functions will be generated. Functions
memnodes.h - memory nodes
-STEPS TO ADD A NODE
+Steps to Add a Node
+-------------------
Suppose you want to define a node Foo:
bother writing a creator function in makefuncs.c)
-HISTORICAL NOTE
+Historical Note
+---------------
Prior to the current simple C structure definitions, the Node structures
used a pseudo-inheritance system which automatically generated creator and
-Summary
--------
+$PostgreSQL$
+
+Optimizer
+---------
These directories take the Query structure returned by the parser, and
generate a plan used by the executor. The /plan directory generates the
JOIN_INNER, JOIN_LEFT, etc.)
-Valid OUTER JOIN optimizations
+Valid OUTER JOIN Optimizations
------------------------------
The planner's treatment of outer join reordering is based on the following
preventing it from being formed before the lower OJ is.)
-Pulling up subqueries
+Pulling Up Subqueries
---------------------
As we described above, a subquery appearing in the range table is planned
-Subselect notes from Vadim.
+$PostgreSQL$
+Subselects
+----------
+
+Vadim B. Mikheev
From owner-pgsql-hackers@hub.org Fri Feb 13 09:01:19 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id JAA11576
for <maillist@candle.pha.pa.us>; Fri, 13 Feb 1998 09:01:17 -0500 (EST)
-Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$Revision: 1.1 $) with ESMTP id IAA09761 for <maillist@candle.pha.pa.us>; Fri, 13 Feb 1998 08:41:22 -0500 (EST)
+Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$Revision: 1.2 $) with ESMTP id IAA09761 for <maillist@candle.pha.pa.us>; Fri, 13 Feb 1998 08:41:22 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id IAA08135; Fri, 13 Feb 1998 08:40:17 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 13 Feb 1998 08:38:42 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id IAA06646 for pgsql-hackers-outgoing; Fri, 13 Feb 1998 08:38:35 -0500 (EST)
+$PostgreSQL$
+
+Parser
+------
+
This directory does more than tokenize and parse SQL queries. It also
creates Query structures for the various complex queries that are passed
to the optimizer and then executor.
+$PostgreSQL$
+
+Darwin
+------
+
The file system.c included herein is taken directly from Apple's Darwin
open-source CVS archives, and is redistributed under the BSD copyright
notice it bears. (According to Apple's CVS logs, their version is
-Snowball-based stemming
+$PostgreSQL$
+
+Snowball-Based Stemming
-----------------------
This module uses the word stemming code developed by the Snowball project,
$PostgreSQL$
-Notes about shared buffer access rules
+Notes About Shared Buffer Access Rules
--------------------------------------
There are two separate access control mechanisms for shared disk buffers:
single relation anyway.
-Buffer manager's internal locking
+Buffer Manager's Internal Locking
---------------------------------
Before PostgreSQL 8.1, all operations of the shared buffer manager itself
a field to show which backend is doing its I/O).
-Normal buffer replacement strategy
+Normal Buffer Replacement Strategy
----------------------------------
There is a "free list" of buffers that are prime candidates for replacement.
of the basic select-a-victim-buffer algorithm.)
-Buffer ring replacement strategy
+Buffer Ring Replacement Strategy
---------------------------------
When running a query that needs to access a large number of pages just once,
256KB between WAL flushes should be more efficient.
-Background writer's processing
+Background Writer's Processing
------------------------------
The background writer is designed to write out pages that are likely to be
$PostgreSQL$
-Mon Jul 18 11:09:22 PDT 1988 W.KLAS
-Cache invalidation synchronization routines:
+Cache Invalidation Synchronization Routines
===========================================
+Mon Jul 18 11:09:22 PDT 1988 W.KLAS
+
The cache synchronization is done using a message queue. Every
backend can register a message which then has to be read by
all backends. A message read by all backends is removed from the
$PostgreSQL$
-
-LOCKING OVERVIEW
+Locking Overview
+----------------
Postgres uses three types of interprocess locks:
The rest of this README file discusses the regular lock manager in detail.
-LOCK DATA STRUCTURES
+Lock Data Structures
+--------------------
Lock methods describe the overall locking behavior. Currently there are
two lock methods: DEFAULT and USER.
---------------------------------------------------------------------------
-LOCK MANAGER INTERNAL LOCKING
+Lock Manager Internal Locking
+-----------------------------
Before PostgreSQL 8.2, all of the shared-memory data structures used by
the lock manager were protected by a single LWLock, the LockMgrLock;
when needed.
-THE DEADLOCK DETECTION ALGORITHM
+The Deadlock Detection Algorithm
+--------------------------------
Since we allow user transactions to request locks in any order, deadlock
is possible. We use a deadlock detection/breaking algorithm that is
Got that?
-Miscellaneous notes:
+Miscellaneous Notes
+-------------------
1. It is easily proven that no deadlock will be missed due to our
asynchronous invocation of deadlock checking. A deadlock cycle in the WFG
principle that autovacuum has a low locking priority (eg it must not block
DDL on the table).
-USER LOCKS
+User Locks
+----------
User locks are handled totally on the application side as long term
cooperative locks which extend beyond the normal transaction boundaries.
# $PostgreSQL$
+Storage Manager
+---------------
+
In the original Berkeley Postgres system, there were several storage managers,
of which only the "magnetic disk" manager remains. (At Berkeley there were
also managers for the Sony WORM optical disk jukebox and persistent main
-Proposal for function-manager redesign 19-Nov-2000
+$PostgreSQL$
+
+Function Manager
+================
+
+Proposal for Function-Manager Redesign 19-Nov-2000
--------------------------------------
We know that the existing mechanism for calling Postgres functions needs
backward compatibility for user-written C functions.
-Changes in pg_proc (system data about a function)
+Changes in pg_proc (System Data About a Function)
-------------------------------------------------
A new column "proisstrict" will be added to the system pg_proc table.
am open to arguments for the other choice.
-The new function-manager interface
+The New Function-Manager Interface
----------------------------------
The core of the new design is revised data structures for representing
should have no portability or optimization problems.
-Function coding conventions
+Function Coding Conventions
---------------------------
As an example, int4 addition goes from old-style
syntactic-sugar macros for these cases is useful.
-Call-site coding conventions
+Call-Site Coding Conventions
----------------------------
There are many places in the system that call either a specific function
continue to support the same external appearance.
-Support for TOAST-able data types
+Support for TOAST-able Data Types
---------------------------------
For TOAST-able data types, the PG_GETARG macro will deliver a de-TOASTed
tuple toaster will decide whether toasting is needed.
-Functions accepting or returning sets
+Functions Accepting or Returning Sets
-------------------------------------
[ this section revised 29-Aug-2002 for 7.3 ]
be called multiple times, once for each element of the input set.
-Notes about function handlers
+Notes About Function Handlers
-----------------------------
Handlers for classes of functions should find life much easier and
FmgrInfo itself.
-Telling the difference between old- and new-style functions
+Telling the Difference Between Old- and New-Style Functions
-----------------------------------------------------------
During the conversion process, we carried two different pg_language
+$PostgreSQL$
+
+Encodings
+---------
+
encnames.c: public functions for both the backend and the frontend.
conv.c: static functions and a public table for code conversion
wchar.c: mostly static functions and a public table for mb string and
$PostgreSQL$
-
-GUC IMPLEMENTATION NOTES
+GUC Implementation Notes
+========================
The GUC (Grand Unified Configuration) module implements configuration
variables of multiple types (currently boolean, enum, int, float, and string).
determining which setting is used.
-PER-VARIABLE HOOKS
+Per-Variable Hooks
+------------------
Each variable known to GUC can optionally have an assign_hook and/or
a show_hook to provide customized behavior. Assign hooks are used to
by SHOW.
-SAVING/RESTORING GUC VARIABLE VALUES
+Saving/Restoring GUC Variable Values
+------------------------------------
Prior values of configuration variables must be remembered in order to deal
with several special cases: RESET (a/k/a SET TO DEFAULT), rollback of SET
changed.
-STRING MEMORY HANDLING
+String Memory Handling
+----------------------
String option values are allocated with strdup, not with the
pstrdup/palloc mechanisms. We would need to keep them in a permanent
$PostgreSQL$
-Notes about memory allocation redesign
---------------------------------------
+Notes About Memory Allocation Redesign
+======================================
Up through version 7.0, Postgres had serious problems with memory leakage
during large queries that process a lot of pass-by-reference data. There
after each tuple.
-Some notes about the palloc API versus standard C library
+Some Notes About the palloc API Versus Standard C Library
---------------------------------------------------------
The behavior of palloc and friends is similar to the standard C library's
* pfree and repalloc do not accept a NULL pointer. This is intentional.
-pfree/repalloc no longer depend on CurrentMemoryContext
+pfree/repalloc No Longer Depend on CurrentMemoryContext
-------------------------------------------------------
In this proposal, pfree() and repalloc() can be applied to any chunk
temporary-allocation context. That might as well be CurrentMemoryContext.
-Additions to the memory-context mechanism
+Additions to the Memory-Context Mechanism
-----------------------------------------
If we are going to have more contexts, we need more mechanism for keeping
itself".
-Globally known contexts
+Globally Known Contexts
-----------------------
There will be several widely-known contexts that will typically be
to be treated as a normal ERROR condition, not a FATAL error.
-Contexts for prepared statements and portals
+Contexts for Prepared Statements and Portals
--------------------------------------------
A prepared-statement object has an associated private context, in which
and won't actually need any storage allocated in their private contexts.
-Transient contexts during execution
+Transient Contexts During Execution
-----------------------------------
When creating a prepared statement, the parse and plan trees will be built
nested transactions, but this'll do fine for now.)
-Mechanisms to allow multiple types of contexts
+Mechanisms to Allow Multiple Types of Contexts
----------------------------------------------
We may want several different types of memory contexts with different
squeezing out that last little bit ...
-More control over aset.c behavior
+More Control Over aset.c Behavior
---------------------------------
Currently, aset.c allocates an 8K block upon the first allocation in
thrashing.
-Other notes
+Other Notes
-----------
The original version of this proposal suggested that functions returning
$PostgreSQL$
-Notes about resource owners
+Notes About Resource Owners
---------------------------
ResourceOwner objects are a concept invented to simplify management of
as query parsing) when no associated Portal exists yet.
-API overview
+API Overview
------------
The basic operations on a ResourceOwner are: