Changes

1,306 bytes added, 16:09, 1 October 2013

updating notes

Line 3: Line 3:

==Article information==

−

We will ~~need to decide if we want to~~ preserve book information. ~~One approach is to build~~ 'canonical' book content in a separate namespace~~. But this will effectively be obsolete as soon as the wiki version is edited~~ (~~Nupedia/Wikipedia story).~~ SPE did this. SEG ~~have~~ sort of ~~done~~ this too~~. It could be confusing for readers (two versions of everything~~).

+

We will preserve book information. We will not maintain 'canonical' book content in a separate namespace (SPE did this; SEG sort of did this too).

−

The article info is semantically tagged (yay!). Everything after <code><nowiki><HR></nowiki></code> and before <code><CJSTEXT></code> is header information about the article — book name, volume number, page range, etc:

+

The article info in Datapages is semantically tagged (yay!). This is very useful indeed. Everything after <code><nowiki><HR></nowiki></code> and before <code><CJSTEXT></code> is header information about the article — book name, volume number, page range, etc.:

<pre>

Line 17: Line 17:

Author(s): <CJSAUTHOR>Stephen A. Holditch</CJSAUTHOR>

Text:

+

</pre>

+

Most of this is repeated in the page's <code><nowiki><head></nowiki></code>, and this is the easiest place to get the URLs for the HTML and PDF resources:

+

<pre>

+

+

+

+

+

+

+

+

+

+

+

+

+

+

</pre>
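
As a rough sketch of pulling resource URLs out of the page head, the snippet below collects every named <code>meta</code> tag into a dictionary using Python's standard <code>html.parser</code>. The tag names in the sample (<code>example_pdf_url</code>, <code>example_html_url</code>) are placeholders we made up for illustration; the real names have to be read from the Datapages page itself.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag != "meta":
            return
        attrs = dict(attrs)
        name, content = attrs.get("name"), attrs.get("content")
        if name and content:
            # Keep the first value if a name is repeated.
            self.meta.setdefault(name, content)


def collect_meta(html):
    """Return a dict of meta name -> content for the given HTML string."""
    parser = MetaCollector()
    parser.feed(html)
    return parser.meta


# Hypothetical sample; the meta names are placeholders, not Datapages' own.
sample = (
    '<html><head>'
    '<meta name="example_pdf_url" content="http://example.com/a.pdf">'
    '<meta name="example_html_url" content="http://example.com/a.htm" />'
    '</head><body></body></html>'
)
meta = collect_meta(sample)
```

Collecting everything first and picking out the wanted keys afterwards means the code does not need to change if we later want more of the header fields.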

Line 22: Line 41:

Everything after <code><CJSTEXT></code> and before <code></CJSTEXT></code> is the text of the article.
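
A minimal sketch of splitting a page into header and article text, assuming the tags appear exactly once and in the order described above. The function names <code>split_article</code> and <code>get_field</code> are our own, and the sample page is invented apart from the <code>CJSAUTHOR</code> line.

```python
import re


def split_article(html):
    """Return (header, body), or None if the expected tags are missing."""
    match = re.search(
        r"<HR>(.*?)<CJSTEXT>(.*?)</CJSTEXT>",
        html,
        flags=re.DOTALL | re.IGNORECASE,
    )
    if match is None:
        return None
    return match.group(1).strip(), match.group(2).strip()


def get_field(header, tag):
    """Return the contents of a semantic tag such as CJSAUTHOR, or None."""
    pattern = r"<{0}>(.*?)</{0}>".format(re.escape(tag))
    match = re.search(pattern, header, flags=re.DOTALL)
    return match.group(1).strip() if match else None


# Invented sample page, for illustration only.
sample = (
    "<html><body>ignored<HR>\n"
    "Author(s): <CJSAUTHOR>Stephen A. Holditch</CJSAUTHOR>\n"
    "Text:\n"
    "<CJSTEXT>Body of the article.</CJSTEXT></body></html>"
)
header, body = split_article(sample)
author = get_field(header, "CJSAUTHOR")
```

Regular expressions are enough here because the semantic tags are flat; a full HTML parser would only be needed if they turn out to nest.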

−

~~Some~~ things need to be converted:

+

Many things need to be converted:

+

* Remove all the <code><nowiki> </nowiki></code> and <code><nowiki> </nowiki></code> tags

+

* Remove the page breaks, which are contained in <code><nowiki><blockquote></nowiki></code> tags

* Delete the <code><nowiki></nowiki></code> and <code><nowiki> </nowiki></code> tags

Line 31: Line 52:

* H3, subsection name — <code><nowiki>Expendable Gun</nowiki></code>

−

They all need to be converted to sentence case.

+

They all need to be converted to sentence case, but leaving proper names, all-uppercase words, and parentheses alone.
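
One way the sentence-case rule could look, as a sketch only. Proper names cannot be detected reliably, so this version assumes we pass in a list of them; <code>sentence_case</code> and its <code>proper_names</code> argument are hypothetical names, not part of the existing script.

```python
import re


def sentence_case(heading, proper_names=()):
    """Lower-case every word after the first, except protected ones.

    Protected: words in proper_names, all-uppercase words of two or more
    characters, and anything inside parentheses.
    """
    protected = set(proper_names)
    # Tokens: parenthesised chunks, whitespace runs, other runs, stray "(".
    tokens = re.findall(r"\([^)]*\)|\s+|[^\s(]+|\(", heading)
    out = []
    seen_word = False
    for tok in tokens:
        if tok.isspace() or tok.startswith("("):
            out.append(tok)  # leave spacing and parentheses alone
            continue
        is_first = not seen_word
        seen_word = True
        core = tok.strip(".,;:")
        if is_first or core in protected or (core.isupper() and len(core) > 1):
            out.append(tok)
        else:
            out.append(tok.lower())
    return "".join(out)
```

For example, <code>sentence_case("Expendable Gun")</code> gives <code>Expendable gun</code>. A heading that is entirely upper case would pass through unchanged under this rule, so those would need separate handling.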

===Figures===

−

~~First mention of a figure, e.g. '''Figure 1''', should trigger a file call:~~

+

The actual figure references — which might come before or after the mention in the text — look like this:

−

~~: <code>[[File:<filename.jpg>|thumb|Fig. 1 — Caption.]]</code>~~

−

The actual figure references — which might come before or after the mention in the text~~, looks~~ like this:

<pre>

Line 43: Line 61:

</pre>

−

~~If we're very cunning, we can gather the~~ file ~~calls, gather~~ the ~~actual~~ figure ~~references,~~ and ~~match them up~~, so ~~that the figure caption~~ is ~~inserted into the file call.~~

+

We make a file name out of the figure caption and the author's name, so every file is unique:

−

~~An alternative approach~~, ~~which would require us to write an extension I think (I can~~'~~t find one)~~, ~~would be to upload the images using their captions as~~ the file ~~description (if available). Then we could ask for the description when we~~ call ~~the file, either with a magic word or via a template (less good, because it breaks the way to make an image call)~~:

+

Ideally, the first mention of a figure, e.g. '''Figure 1''', should trigger the file call:

−

+

: <code>[[File:<filename.jpg>|thumb|Fig. 1 — Caption.]]</code>
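
The exact naming scheme is not recorded in these notes, so the following is only a sketch of the idea of building a file name from the author's name and the figure caption. <code>make_filename</code>, the underscore separator, and the eight-word cap on the caption are all our assumptions, and the sample caption is invented.

```python
import re


def make_filename(author, caption, ext="jpg", max_words=8):
    """Build a wiki-safe file name from the author's name and the caption."""

    def slug(text):
        # Keep ASCII letters and digits; collapse everything else to hyphens.
        return re.sub(r"[^A-Za-z0-9]+", "-", text).strip("-")

    caption_words = slug(caption).split("-")[:max_words]
    return "{}_{}.{}".format(slug(author), "-".join(caption_words), ext)


# Invented caption, for illustration only.
name = make_filename("Stephen A. Holditch", "Fig. 1 Typical perforating gun.")
```

File names built this way are only unique if no author reuses a caption, or the same first few caption words, so a collision check before upload would still be prudent. Non-ASCII characters in names are dropped by this version.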

−

<~~pre~~>

−

[[File:~~Myfile~~.jpg|thumb|~~{{DESCRIPTION}}~~]]

−

~~</pre>~~

−

or

−

~~<pre>~~

−

~~{{fig | 3.2~~

−

~~| myfile.jpg~~

−

~~| Caption text.~~

−

~~| Smith et al. 2006~~

−

}}

−

</~~pre~~>

===Lists===

Line 70: Line 75:

</pre>

−

This ~~will become~~:

+

This becomes:

−

<pre>

* Slotted liner

Line 77: Line 81:

* Cemented liner

</pre>

−

~~To do this:~~

−

* Interpret such a block as a list: perhaps lines that start with <code>--[SPACE]</code>

−

* Delete the <code><nowiki></nowiki></code> and <code><nowiki> </nowiki></code> tags.

−

* Replace <code>--</code> with <code>*</code>

Ordered lists are handled similarly:

−

<pre>

1. Expendable gun

Line 91: Line 89:

</pre>

−

~~Some things can be removed~~:

+

which becomes...

−

* ~~Everything in~~ <~~code~~><~~nowiki~~><~~BLOCKQUOTE~~></~~nowiki~~></~~code~~> ~~tags is page information we won't want in the wiki~~

+

<pre>

+

# Expendable gun

+

# Semi-expendable gun

+

# Retrievable, hollow carrier gun

+

</pre>

+

Here's how we deal with all of this:

+

<pre>

+

# Convert unordered lists

+

text = re.sub(r"( )*(\n)*\n-- ",r"\n* ",text)

+

text = re.sub(r"\n\n\*",r"\n*",text) # to handle double newlines

+

# Convert ordered lists

+

text = re.sub(r"( )*(\n)*\n[0-9][0-9]?\. ",r"\n# ",text)

+

text = re.sub(r"\n\n#",r"\n#",text) # to handle double newlines

+

# Convert description lists

+

text = re.sub(r"<dd>",r"",text)

+

text = re.sub(r"</dd>",r"",text)

+

text = re.sub(r"<dl>",r"",text)

+

text = re.sub(r"</dl>",r"",text)

+

text = re.sub(r"<dt>(?:[0-9]\. )?(.+?)</dt>(\n)?",r"\n\n====\1====\n",text)

+

text = re.sub(r"<dt>-- (.+?)</dt>(\n)?",r"* \1\n",text)

+

text = re.sub(r"<dt>(.+?)</dt>(\n)?",r"* \1\n",text)

+

# Any more s are probably unnecessary line breaks and can be ordinary text lists.

+

text = re.sub(r"\n(.+?) \n",r"\n* \1 \n",text)

+

text = re.sub(r" \n?",r"\n* ",text) # to handle double newlines

+

</pre>
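
To check the unordered and ordered rules in isolation, they can be wrapped in a function and run on a small sample. The regular expressions are the ones from the notes above; the wrapper name <code>convert_lists</code> and the sample strings are ours.

```python
import re


def convert_lists(text):
    """Apply the unordered- and ordered-list rules from the notes."""
    # Unordered lists: "-- item" becomes "* item".
    text = re.sub(r"( )*(\n)*\n-- ", r"\n* ", text)
    text = re.sub(r"\n\n\*", r"\n*", text)  # to handle double newlines
    # Ordered lists: "1. item" (one or two digits) becomes "# item".
    text = re.sub(r"( )*(\n)*\n[0-9][0-9]?\. ", r"\n# ", text)
    text = re.sub(r"\n\n#", r"\n#", text)  # to handle double newlines
    return text


unordered = convert_lists(
    "Completion types:\n\n-- Slotted liner\n\n-- Cemented liner\n"
)
ordered = convert_lists(
    "Gun types:\n1. Expendable gun\n2. Semi-expendable gun\n"
)
```

Two things to watch. These rules need a newline before the marker, so a list on the very first line of the text is not converted, and any ordinary line that happens to start with a number and a full stop will be. Also, as written, the first <code><nowiki><dt></nowiki></code> rule has an optional number prefix, so it matches unnumbered terms too and the two later <code><nowiki><dt></nowiki></code> rules never get anything to match; if only numbered terms should become headings, the prefix probably needs to be mandatory.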

Matt

User:Matt/Content conversion

Revision as of 16:09, 1 October 2013
