Diffstat (limited to 'doc/TODO.detail/logging')
-rw-r--r-- | doc/TODO.detail/logging | 207 |
1 files changed, 207 insertions, 0 deletions
diff --git a/doc/TODO.detail/logging b/doc/TODO.detail/logging
new file mode 100644
index 00000000000..2decf2a529c
--- /dev/null
+++ b/doc/TODO.detail/logging
@@ -0,0 +1,207 @@
+From owner-pgsql-hackers@hub.org Fri Nov 13 13:24:37 1998
+Received: from hub.org (majordom@hub.org [209.47.148.200])
+ by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id NAA13457
+ for <maillist@candle.pha.pa.us>; Fri, 13 Nov 1998 13:24:35 -0500 (EST)
+Received: from localhost (majordom@localhost)
+ by hub.org (8.9.1/8.9.1) with SMTP id NAA02464;
+ Fri, 13 Nov 1998 13:22:52 -0500 (EST)
+ (envelope-from owner-pgsql-hackers@hub.org)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 13 Nov 1998 13:21:14 +0000 (EST)
+Received: (from majordom@localhost)
+ by hub.org (8.9.1/8.9.1) id NAA02331
+ for pgsql-hackers-outgoing; Fri, 13 Nov 1998 13:21:12 -0500 (EST)
+ (envelope-from owner-pgsql-hackers@postgreSQL.org)
+Received: from orion.SAPserv.Hamburg.dsh.de (Tpolaris2.sapham.debis.de [53.2.131.8])
+ by hub.org (8.9.1/8.9.1) with SMTP id NAA02316
+ for <pgsql-hackers@postgreSQL.org>; Fri, 13 Nov 1998 13:21:06 -0500 (EST)
+ (envelope-from wieck@sapserv.debis.de)
+Received: by orion.SAPserv.Hamburg.dsh.de
+ for pgsql-hackers@postgreSQL.org
+ id m0zeOEf-000EBPC; Fri, 13 Nov 98 19:46 MET
+Message-Id: <m0zeOEf-000EBPC@orion.SAPserv.Hamburg.dsh.de>
+From: jwieck@debis.com (Jan Wieck)
+Subject: [HACKERS] shmem limits and redolog
+To: pgsql-hackers@postgreSQL.org (PostgreSQL HACKERS)
+Date: Fri, 13 Nov 1998 19:46:20 +0100 (MET)
+Reply-To: jwieck@debis.com (Jan Wieck)
+X-Mailer: ELM [version 2.4 PL25]
+Content-Type: text
+Sender: owner-pgsql-hackers@postgreSQL.org
+Precedence: bulk
+Status: ROr
+
+Hi,
+
+ I'm currently hacking around on a solution for logging all
+ database operations at query level that can recover a crashed
+ database from the last successful backup by redoing all the
+ commands.
+
+ Well, I wanted it to be as flexible as can. So I decided to
+ make it per database configurable. One could say which
+ databases are logged and if a database is, if it is logged
+ sync or async (in sync mode, every COMMIT forces an fsync of
+ the actual logfile and controlfiles).
+
+ To make async mode as fast as can, I'm using a shared memory
+ of 32K per database (not per backend) that is used as a wrap
+ around buffer from the backends to place their query
+ information. So the log writer can fall a little behind if
+ there are many backends doing different things that don't
+ lock each other.
+
+ Now I'm a little in doubt about the shared memory limits
+ reported. Was it a good decision to use shared memory? Am I
+ better off using socket's?
+
+ The bad thing in what I have up to now (it's far from
+ complete) is, that even if a database isn't currently logged,
+ a redolog writer is started and creates the 32K shmem segment
+ (plus a semaphore set with 5 semaphores). This is because I
+ plan to create commands like
+
+ ALTER DATABASE LOG MODE=ASYNC LOGDIR='/somewhere/dbname';
+
+ and the like that can be used at runtime (while more than one
+ backend is connected to the database) to turn logging on/off,
+ switch to/from backup mode (all other activity is stopped)
+ etc.
+
+ So every 32 databases will require another megabyte of shared
+ memory. The logging master controls which databases have
+ activity and kills redolog writers after some time of
+ inactivity, and the shmem is freed then. But it can hurt if
+ someone really has many many databases that are all used at
+ the same time.
+
+ What do the others say?
+
+
+Jan
+
+--
+
+#======================================================================#
+# It's easier to get forgiveness for being wrong than for being right. #
+# Let's break this rule - forgive me.                                  #
+#======================================== jwieck@debis.com (Jan Wieck) #
+
+
+
+
+From owner-pgsql-hackers@hub.org Wed Dec 16 15:46:41 1998
+Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
+ by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id PAA00521
+ for <maillist@candle.pha.pa.us>; Wed, 16 Dec 1998 15:46:40 -0500 (EST)
+Received: from hub.org (majordom@hub.org [209.47.145.100]) by renoir.op.net (o1/$Revision: 1.1 $) with ESMTP id PAA08772 for <maillist@candle.pha.pa.us>; Wed, 16 Dec 1998 15:10:01 -0500 (EST)
+Received: from localhost (majordom@localhost)
+ by hub.org (8.9.1/8.9.1) with SMTP id PAA01254;
+ Wed, 16 Dec 1998 15:06:56 -0500 (EST)
+ (envelope-from owner-pgsql-hackers@hub.org)
+Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 16 Dec 1998 14:58:11 +0000 (EST)
+Received: (from majordom@localhost)
+ by hub.org (8.9.1/8.9.1) id OAA00660
+ for pgsql-hackers-outgoing; Wed, 16 Dec 1998 14:58:10 -0500 (EST)
+ (envelope-from owner-pgsql-hackers@postgreSQL.org)
+Received: from orion.SAPserv.Hamburg.dsh.de (Tpolaris2.sapham.debis.de [53.2.131.8])
+ by hub.org (8.9.1/8.9.1) with SMTP id OAA00643
+ for <pgsql-hackers@postgreSQL.org>; Wed, 16 Dec 1998 14:58:05 -0500 (EST)
+ (envelope-from wieck@sapserv.debis.de)
+Received: by orion.SAPserv.Hamburg.dsh.de
+ for pgsql-hackers@postgreSQL.org
+ id m0zqNDo-000EBTC; Wed, 16 Dec 98 21:07 MET
+Message-Id: <m0zqNDo-000EBTC@orion.SAPserv.Hamburg.dsh.de>
+From: jwieck@debis.com (Jan Wieck)
+Subject: Re: [HACKERS] redolog - for discussion
+To: vadim@krs.ru (Vadim Mikheev)
+Date: Wed, 16 Dec 1998 21:07:00 +0100 (MET)
+Cc: jwieck@debis.com, pgsql-hackers@postgreSQL.org
+Reply-To: jwieck@debis.com (Jan Wieck)
+In-Reply-To: <3677B71D.C67462B3@krs.ru> from "Vadim Mikheev" at Dec 16, 98 08:35:25 pm
+X-Mailer: ELM [version 2.4 PL25]
+Content-Type: text
+Sender: owner-pgsql-hackers@postgreSQL.org
+Precedence: bulk
+Status: RO
+
+Vadim wrote:
+
+>
+> Jan Wieck wrote:
+> >
+> > RECOVER DATABASE {ALL | UNTIL 'datetime' | RESET};
+> >
+> ...
+> >
+> > For the others, the backend starts the recovery program
+> > which reads the redolog files, establishes database
+> > connections as required and reruns all the commands in
+>                               ^^^^^^^^^^^^^^^^^^^^^^^^^^
+> > them. If a required logfile isn't found, it tells the
+>   ^^^^^
+>
+> I foresee problems with using _commands_ logging for
+> recovery/replication -:((
+>
+> Let's consider two concurrent updates in READ COMMITTED mode:
+>
+> update test set x = 2 where y = 1;
+>
+> and
+>
+> update test set x = 3 where y = 1;
+>
+> The result of both committed transaction will be x = 2
+> if the 1st transaction updated row _after_ 2nd transaction
+> and x = 3 if the 2nd transaction gets row after 1st one.
+> Order of updates is not defined by order in which commands
+> begun and so order in which commands should be rerun
+> will be unknown...
+
+ Yepp, the order in which commands begun is absolutely not of
+ interest. Locking could already delay the execution of one
+ command until another one started later has finished and
+ released the lock. It's a classic race condition.
+
+ Thus, my plan was to log the queries just before the call to
+ CommitTransactionCommand() in tcop. This has the advantage,
+ that queries which bail out with errors don't get into the
+ log at all and must not get rerun. And I can set a static
+ flag to false before starting the command, which is set to
+ true in the buffer manager when a buffer is written (marked
+ dirty), so filtering out queries that do no updates at all is
+ easy.
+
+ Unfortunately query level logging get's hit by the current
+ implementation of sequence numbers. If a query that get's
+ aborted somewhere in the middle (maybe by a trigger) called
+ nextval() for rows processed earlier, the sequence number
+ isn't advanced at recovery time, because the query is
+ suppressed at all. And sequences aren't locked, so for
+ concurrently running queries getting numbers from the same
+ sequence, the results aren't reproduceable. If some
+ application selects a value resulting from a sequence and
+ uses that later in another query, how could the redolog know
+ that this has changed? It's a Const in the query logged, and
+ all that corrupts the whole thing.
+
+ All that is painful and I don't see another solution yet than
+ to hook into nextval(), log out the numbers generated in
+ normal operation and getting back the same numbers in redo
+ mode.
+
+ The whole thing gets more and more complicated :-(
+
+
+Jan
+
+--
+
+#======================================================================#
+# It's easier to get forgiveness for being wrong than for being right. #
+# Let's break this rule - forgive me.                                  #
+#======================================== jwieck@debis.com (Jan Wieck) #
+
+
+
+