LumoSQL
diff --git a/doc/www/lumo-architecture.md b/doc/www/lumo-architecture.md
Lines changed: 8 additions & 8 deletions
diff --git a/doc/www/lumo-benchmarking.md b/doc/www/lumo-benchmarking.md
Lines changed: 2 additions & 2 deletions
diff --git a/doc/www/lumo-corruption-detection-and-magic.md b/doc/www/lumo-corruption-detection-and-magic.md
Lines changed: 1 addition & 1 deletion
diff --git a/doc/www/lumo-landscape.md b/doc/www/lumo-landscape.md
Lines changed: 4 additions & 4 deletions
diff --git a/doc/www/lumo-legal-aspects.md b/doc/www/lumo-legal-aspects.md
Lines changed: 1 addition & 1 deletion
diff --git a/doc/www/lumo-not-forking.md b/doc/www/lumo-not-forking.md
Lines changed: 17 additions & 3 deletions
diff --git a/doc/www/lumo-project-aims.md b/doc/www/lumo-project-aims.md
Lines changed: 1 addition & 1 deletion
diff --git a/doc/www/lumo-relevant-codebases.md b/doc/www/lumo-relevant-codebases.md
Lines changed: 2 additions & 2 deletions
diff --git a/doc/www/lumo-relevant-knowledgebase.md b/doc/www/lumo-relevant-knowledgebase.md
Lines changed: 1 addition & 1 deletion
@@ -195,25 +195,25 @@ Single-level store concepts are well-explained in [Howard Chu's 2013 MDB Paper](
 > Store". The basic idea is to treat all of computer memory as a single address
 > space. Pages of storage may reside in primary storage (RAM) or in secondary
 > storage (disk) but the actual location is unimportant to the application. If
-> a referenced page is currently in primary storagethe application can use it
+> a referenced page is currently in primary storage the application can use it
 > immediately, if not a page fault occurs and the operating system brings the
 > page into primary storage. The concept was introduced in 1964 in the Multics
 > operating system but was generally abandoned by the early 1990s as data
 > volumes surpassed the capacity of 32 bit address spaces. (We last knew of it
-> in the ApolloDOMAIN operating system, though many other Multics-influenced
+> in the Apollo DOMAIN operating system, though many other Multics-influenced
 > designs carried it on.) With the ubiquity of 64 bit processors today this
 > concept can again be put to good use. (Given a virtual address space limit of
-> 63 bits that puts the upper bound of database size at 8exabytes. Commonly
-> available processors today only implement 48 bit address spaces,limiting us
+> 63 bits that puts the upper bound of database size at 8 exabytes. Commonly
+> available processors today only implement 48 bit address spaces, limiting us
 > to 47 bits or 128 terabytes.) Another operating system requirement for this
 > approach to be viable is a Unified BufferCache. While most POSIX-based
 > operating systems have supported an mmap() system call for many years, their
-> initial implementations kept memory managed by the VM subsystemseparate from
+> initial implementations kept memory managed by the VM subsystem separate from
 > memory managed by the filesystem cache. This was not only wasteful
-> (again,keeping data cached in two places at once) but also led to coherency
+> (again, keeping data cached in two places at once) but also led to coherency
 > problems - data modified through a memory map was not visible using
-> filesystem read() calls, or data modifiedthrough a filesystem write() was not
-> visible in the memory map. Most modern operatingsystems now have filesystem
+> filesystem read() calls, or data modified through a filesystem write() was not
+> visible in the memory map. Most modern operating systems now have filesystem
 > and VM paging unified, so this should not be a concern in most deployments.
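
The size limits quoted above follow directly from powers of two. As a minimal sketch (the function and constant names are illustrative and not part of LumoSQL or LMDB), the arithmetic can be checked like this:

```python
# Illustrative arithmetic only: an address space of N bits can address 2**N bytes.
EXBIBYTE = 2 ** 60  # bytes in one exabyte (binary definition)
TEBIBYTE = 2 ** 40  # bytes in one terabyte (binary definition)


def address_space_bytes(bits):
    """Return the number of bytes addressable with the given number of bits."""
    return 2 ** bits


# 63 usable bits of virtual address space: the 8 exabyte upper bound.
print(address_space_bytes(63) // EXBIBYTE, "exabytes")

# 47 usable bits on processors that implement 48-bit addressing: 128 terabytes.
print(address_space_bytes(47) // TEBIBYTE, "terabytes")
```

The quote's figures use binary units, so "8 exabytes" here means 8 × 2^60 bytes and "128 terabytes" means 128 × 2^40 bytes.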
 
 
@@ -37,7 +37,7 @@ field of SQL databases, and certainly in the SQLite landscape.
 Benchmarking is a big part of LumoSQL, to determine if changes are an
 improvement. The trouble is that SQLite and other top databases are not really
 benchmarked in a realistic and consistent way, despite SQL server benchmarking
-using tools like TPCC from [tpc.org](https://tpc.org) being an obsessive
+using tools like TPC-C from [tpc.org](http://tpc.org) being an obsessive
 industry in itself, and many testing tools released with SQLite, Postgresql,
 MariaDB etc. But in practical terms there is no way of comparing the most-used
 databases with each other, or even of being sure that the tests that do exist
@@ -223,7 +223,7 @@ groups, those using:
  deployments, who are likely to a wider range of the supported SQL features
 
 The embedded style of statement is typically used within the application process
-space,the code written by these developers is often tightly coupled with the
+space, the code written by these developers is often tightly coupled with the
 SQLite library. The online style of SQL statement is typically more loosely
 coupled with the database implementation and these developers may switch to
 execute similar SQL statements on different databases. Further this second style
 
@@ -87,7 +87,7 @@ SQLite needs row-level integrity checking even more than the online databases be
 * it is easy to backup an SQLite database partway through a transaction, meaning that the restore will be corrupted
 * SQLite does not have robust locking mechanisms available for access by multiple processes at once, since it relies on lockfiles and Posix advisory locking 
 * SQLite provides the [VFS API Interface](https://www.sqlite.org/vfs.html) which users can easily misuse to ignore locking via the sql3_*v2 APIs
-* the on-disk file format is seemingly often corrupted regardless of use case. Better evidence on this is needed but authors of SQLite data file recovery software (see listing in [SQLite Relevant Knowledgebase](./lumo-relevant-knowledebase)) indicates high demand for their servies. Informal shows of hands at conferences indicates that SQLite users expect corruption.
+* the on-disk file format is seemingly often corrupted regardless of use case. Better evidence on this is needed but authors of SQLite data file recovery software (see listing in [SQLite Relevant Knowledgebase](./lumo-relevant-knowledebase)) indicate high demand for their services. Informal shows of hands at conferences indicate that SQLite users expect corruption.
 
 sqlite.org has a much more detailed, but still incomplete, summary of [How to Corrupt an SQLite Database](https://www.sqlite.org/howtocorrupt.html).
 
 
@@ -85,7 +85,7 @@ profile and attracts technical review.
 
 **Sqlite.org is entirely focussed on its existing scope and traditional user
 needs** and [SQLite imposes strict limits](https://sqlite.org/about.html) for
-example “Think of SQLite not as a replacement for but as a replacement for
+example “Think of SQLite not as a replacement for Oracle but as a replacement for
 fopen()” which eliminates many of the possibilities LumoSQL is now exploring
 that go beyond any version of fopen(). Many things that SQLite can used
 for are [declared out of scope](https://sqlite.org/whentouse.html) by the
@@ -98,7 +98,7 @@ project.
 **Sqlite has a very strict and reliable view on maintaining backwards
 compatibility both binary and API (except when it comes to encryption, see
 further down.)** The Sqlite foundation aims to keep SQLite3 interfaces and
-formats stable until the year 2050 years, according to Richard Hipp in the
+formats stable until the year 2050, according to Richard Hipp in the
 podcast interview, as once requested by an airframe construction company
 (Airbus). Whatever happens in years to come SQLite has definitely delivered on
 this to date. This means that there are many things SQLite cannot do which
@@ -156,7 +156,7 @@ serious problems with it too:
  SQLite binary format has almost zero internal integrity checking.
  LumoSQL aims to add options to address this problem.
 
-**Sqlite is less open source than it appears**. The existance of so many SQLite
+**Sqlite is less open source than it appears**. The existence of so many SQLite
 spin-offs is evidence that SQLite code is highly available. However there are
 several aspects of SQLite that mean it cannot be considered open source, in
 ways that are increasingly important in the 21st century:
@@ -235,7 +235,7 @@ SQLite Downstreams
 
 There is still a lot for LumoSQL to explore because there is just so much code, but
 as of March 2020 we are confident code could be assembled from here and there
-and there on the internet to demonstrate the following features:
+on the internet to demonstrate the following features:
 
 - SQLite with Berkely bdb backend
 - SQLite with LevelDB backend
 
@@ -155,7 +155,7 @@ has been released as "Public Domain"
 
 SQLite is not available with encryption. There are two common ways of adding encryption to SQLite, both of which have legal implications: 
 
-1. Purchasing the [SQLite Encryption Extension](https://www.hwaci.com/sw/sqlite/see.html)(SEE) from Richard Hipp's company Hwaci. The SEE is proprietary software, and cannot be use with open source applications. 
+1. Purchasing the [SQLite Encryption Extension](https://www.hwaci.com/sw/sqlite/see.html) (SEE) from Richard Hipp's company Hwaci. The SEE is proprietary software, and cannot be used with open source applications.
 2. [SQLcipher](https://www.zetetic.net/sqlcipher/) which has a open core model. The BSD-licensed open source version requires users to publish copyright notices, and the more capable commercial editions are available on similar terms to SEE, and therefore cannot be used with open source applications. 
 
 There are many other ways of adding encryption to SQLite, some of which are listed in the [Knowledgebase Relevant to LumoSQL](./lumo-relevant-knowledgebase.md).
 
@@ -88,7 +88,7 @@ numbers in order are:
 
 We may extend this definition to deal with version numbering schemes
 used by normal software, however it will never work correctly with the
-version numbers used by INTERCAL compilers.
+version numbers used by [INTERCAL](https://en.wikipedia.org/wiki/INTERCAL) compilers.
 
 The `subtree` key indicates a directory inside the sources to use instead
 of the top level.
@@ -104,7 +104,8 @@ keys need to be present:
 either a single string which is prefixed to the version number, or two
 strings separated by space, the first one is prefixed and the second appended.
 - optionally, `user` and `password` can be specified to obtain access to the
-repository.
+repository (this is currently not implemented; all repositories must be
+accessible without authentication).
 
 A software version can be identified by a generic git commit ID, or by a
 version string similar to the one described for the `compare` key, if the
@@ -147,7 +148,9 @@ a modification is only necessary up to a particular version, because
 for example that modification has been accepted by upstream and is
 no longer necessary. Another use of this key is to identify versions
 in which substantial upstream changes make it difficult to specify a
-modification which works for every possible version.
+modification which works for every possible version. Specifying this
+keyword is essentially equivalent to putting the whole `.mod` file in
+a conditional.
 - `method`; the method used to specify the modification; currently, the
 value can be either `patch`, indicating that the final part of the file is
 in a format suitable for passing as standard input to the "patch" program;
@@ -162,6 +165,10 @@ currently no other keys for the `replace` method, and the following for
 the `patch` method:
 
 - `options`: options to pass to the "patch" program (default: "-Nsp1")
+- `list`: extra options to the "patch" program to list what it would do
+instead of actually doing it (this is used internally to figure out
+what changes; the default currently assumes the "patch" program provided
+by most Linux distributions)
 
 # Example Configuration directory <a name="example"></a>
 
@@ -282,3 +289,10 @@ not been modified since; in this case, delete the output directory
 completely, or rename it to something else, and run the program again.
 There is currently no option to override this safety feature.
 
+We plan to add logging to the notforking tool, in which all messages are
+written to a log file (under control of configuration), while the subset
+of messages selected by the verbosity setting will go to standard output;
+this will allow us to increase the amount of information provided and make
+it available if there is a processing error; however in the current version
+this is just planned, and not yet implemented.
+
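
The planned logging behaviour described in the added paragraph (every message goes to a log file, while only the subset selected by the verbosity setting reaches standard output) can be sketched as follows. This is a hypothetical illustration in Python using the standard `logging` module; it is not the notforking tool's implementation, and the names `make_logger`, `log_path`, `verbosity` and `stream` are assumptions made for the example.

```python
import logging
import sys


def make_logger(log_path, verbosity=logging.WARNING, stream=None):
    """Build a logger that records everything to a file and echoes a subset.

    Hypothetical helper: log_path stands in for a configured log file and
    verbosity for the tool's verbosity setting.
    """
    logger = logging.getLogger("notfork_logging_sketch")
    logger.setLevel(logging.DEBUG)   # let every message reach the handlers
    logger.handlers.clear()          # avoid duplicate handlers on repeated calls
    logger.propagate = False

    formatter = logging.Formatter("%(levelname)s: %(message)s")

    # The log file receives all messages, whatever the verbosity.
    file_handler = logging.FileHandler(log_path)
    file_handler.setLevel(logging.DEBUG)
    file_handler.setFormatter(formatter)
    logger.addHandler(file_handler)

    # Standard output receives only messages at or above the verbosity level.
    console = logging.StreamHandler(stream if stream is not None else sys.stdout)
    console.setLevel(verbosity)
    console.setFormatter(formatter)
    logger.addHandler(console)

    return logger


if __name__ == "__main__":
    log = make_logger("notfork-sketch.log", verbosity=logging.WARNING)
    log.debug("details kept for diagnosing a processing error")  # file only
    log.warning("something the user should see")                 # file and stdout
```

Keeping the two destinations as separate handlers means the amount of detail in the file can grow without changing what the user sees at a given verbosity.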
@@ -100,7 +100,7 @@ nearly all 30 million lines of the Linux kernel can be exclude giving just 200k
 lines. Runtime modularity will be controlled through the same user interfaces 
 as the rest of LumoSQL.
 
-* LumoSQL will ensure that new code can all be active at one, eg
+* LumoSQL will ensure that new code can all be active at once, eg
 multiple backends or frontends for conversion between/upgrading from one
 format or protocol to another. This is crucial to provide continuity and
 supported upgrade paths for users, for example, users who want to become
 
@@ -30,7 +30,7 @@ What is a Relevant Codebase?
 
 There are three dimensions to codebases relevant to LumoSQL:
 
-1. Code that is a derivitive of SQLite code adding a feature or improvement, and
+1. Code that is a derivative of SQLite code adding a feature or improvement, and
 2. Code that has nothing to do with SQLite but implements an interesting database feature we want to use in LumoSQL
 3. Code that supports the development of LumoSQL such as testing, benchmarking or analysing relevant codebases
 
@@ -126,7 +126,7 @@ The on-disk file format is important to many SQLite use cases, and introspection
 
 # List of Relevant Benchmarking and Test Knowledge
 
-Benchmarking is a big part of LumoSQL, to determine if changes are an improvement. The trouble is that SQLite and other top databases are not really benchmarked in realistic and consistent way, despite SQL server benchmarking using tools like TCP being an obsessive industry in itself, and there being myriad of testing tools released with SQLite, Postgresql, MariaDB etc. But in practical terms there is no way of comparing the most-used databases with each other, or even of being sure that the tests that do exist are in any way realistic, or even of simply reproducing results that other people have found. LumoSQL covers so many codebases and use cases that better SQL benchmarking is a project requirement. Benchmarking and testing overlap, which is addressed in the code and docs.
+Benchmarking is a big part of LumoSQL, to determine if changes are an improvement. The trouble is that SQLite and other top databases are not really benchmarked in a realistic and consistent way, despite SQL server benchmarking using tools like TPC being an obsessive industry in itself, and there being a myriad of testing tools released with SQLite, Postgresql, MariaDB etc. But in practical terms there is no way of comparing the most-used databases with each other, or even of being sure that the tests that do exist are in any way realistic, or even of simply reproducing results that other people have found. LumoSQL covers so many codebases and use cases that better SQL benchmarking is a project requirement. Benchmarking and testing overlap, which is addressed in the code and docs.
 
 The well-described [testing of SQLite](https://sqlite.org/testing.html) involves some open code, some closed code, and many ad hoc processes. Clearly the SQLite team have an internal culture of testing that has benefitted the world. However that is very different to reproducible testing, which is in turn very different to reproducible benchmarking, and that is even without considering whether the benchmarking is a reasonable approximation of actual use cases.
 
 
@@ -78,7 +78,7 @@ Analyser, DB Browser for SQLite, Magnet AXIOM and Oxygen Forensic Detective.)
 
 # List of Relevant Benchmarking and Test Knowledge
 
-Benchmarking is a big part of LumoSQL, to determine if changes are an improvement. The trouble is that SQLite and other top databases are not really benchmarked in realistic and consistent way, despite SQL server benchmarking using tools like TCP being an obsessive industry in itself, and there being myriad of testing tools released with SQLite, Postgresql, MariaDB etc. But in practical terms there is no way of comparing the most-used databases with each other, or even of being sure that the tests that do exist are in any way realistic, or even of simply reproducing results that other people have found. LumoSQL covers so many codebases and use cases that better SQL benchmarking is a project requirement. Benchmarking and testing overlap, which is addressed in the code and docs.
+Benchmarking is a big part of LumoSQL, to determine if changes are an improvement. The trouble is that SQLite and other top databases are not really benchmarked in a realistic and consistent way, despite SQL server benchmarking using tools like TPC being an obsessive industry in itself, and there being a myriad of testing tools released with SQLite, Postgresql, MariaDB etc. But in practical terms there is no way of comparing the most-used databases with each other, or even of being sure that the tests that do exist are in any way realistic, or even of simply reproducing results that other people have found. LumoSQL covers so many codebases and use cases that better SQL benchmarking is a project requirement. Benchmarking and testing overlap, which is addressed in the code and docs.
 
 The well-described [testing of SQLite](https://sqlite.org/testing.html) involves some open code, some closed code, and many ad hoc processes. Clearly the SQLite team have an internal culture of testing that has benefitted the world. However that is very different to reproducible testing, which is in turn very different to reproducible benchmarking, and that is even without considering whether the benchmarking is a reasonable approximation of actual use cases.