Discussion:
[Wikisource-l] Budget for Wikisource
Andrea Zanni
2017-05-10 13:38:21 UTC
Dear all,
Wikimedia Italia has set aside €3000 in its budget for Wikisource-related work.
When we discussed this months ago, we thought about paying a developer to fix
the DjVu issue in the IA-Upload tool,
which has since been resolved by our beloved Sam Wilson.

The tool is still not perfect (I often get errors), so maybe some
development is still needed, but I'd ask you (especially technically
skilled people like Tpt, Sam, Philippe, etc.) if you think there is some
low-hanging fruit that could be reached with that kind of budget.
Of course, we will be looking for developers, so if you want to propose
yourself for something, please do! ;-)

Aubrey
Philippe Elie
2017-05-10 14:54:03 UTC
It's possible that Tpt, I, and perhaps Sam will be interested, but the scope
of the work is still unclear. Do you have some recent examples of errors?
--
Phe
Andrea Zanni
2017-05-10 15:14:49 UTC
You can see in the queue that a lot of processes just freeze,
e.g. https://tools.wmflabs.org/ia-upload/log/bullettinodella04italgoog

Also, there is an issue with HTML tags: sometimes they are present in the IA
description,
and so they also get copied into the Commons Book template during the
workflow.
When that happens, you get an error before the book is uploaded to Commons.
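
One possible way to handle the HTML-tag problem described above is to reduce the IA description to plain text before it goes into the Book template. The following is only a minimal sketch in Python using the standard library; the function name `strip_html` is hypothetical, and this is not how the IA-Upload tool itself is implemented.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects only the text nodes of an HTML fragment, dropping all tags."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(description: str) -> str:
    """Return the description without tags and with whitespace collapsed.

    Hypothetical helper: the name and behaviour are assumptions made for
    illustration, not part of IA-Upload.
    """
    parser = _TextOnly()
    parser.feed(description)
    parser.close()  # flush any text still buffered by the parser
    return " ".join("".join(parser.parts).split())


if __name__ == "__main__":
    print(strip_html("<p>Book from <b>Google</b> Books</p>"))
```

Characters that are special in wikitext (pipes, braces) would still need separate escaping before the text is placed in a template parameter; that is outside this sketch.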

Aubrey
Philippe Elie
2017-05-10 15:59:09 UTC
Isn't there also a tendency, when converting from JP2 --> PDF, to produce
DjVu files that are too big?
--
Phe
Andrea Zanni
2017-05-10 16:00:43 UTC
Could you please explain that better? I don't understand.
Philippe Elie
2017-05-10 16:01:52 UTC
Aren't the DjVu files that get produced often too big?

--
Phe
Andrea Zanni
2017-05-10 16:03:00 UTC
It may be.
Not sure how Sam and Tpt solved that issue.

Aubrey
Thomas PT
2017-05-10 16:07:31 UTC
Changing the topic as the conversation has diverged.
It's not solved yet, to my knowledge.

Thomas
_______________________________________________
Wikisource-l mailing list
https://lists.wikimedia.org/mailman/listinfo/wikisource-l
Sam Wilson
2017-05-10 23:23:24 UTC
Yeah, it's still not fixed for books of more than about 500 pages. :-(

But it's on my list to work on! Along with
https://phabricator.wikimedia.org/T159796, which I hope to get to
before the hackathon next week. I've been having some dramas with
getting JP2 things working on my new computer...

Unfortunately, at the moment, xtools is taking priority.

—sam

PS: For any IA Upload bugs, feel free to add the community-tech tag in
Phabricator, so they get a bit more visibility.
Sam Wilson
2017-05-10 23:30:23 UTC
This is very cool news. :)

One possibly not-too-onerous feature would be to permit upload of
file types other than DjVu (e.g. PDF). Or there's the whole topic of
creating/finding Wikidata items for the books uploaded, and updating
them with the IA identifier. That'd probably require the uploading user
to specify a Wikidata ID though, which is what the {{book}} template on
Commons should work from anyway, in my opinion (because it can't be done
via a sitelink).
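
As a rough illustration of the "finding Wikidata items" part, a lookup by IA identifier could be expressed as a SPARQL query for the Wikidata Query Service. The sketch below only builds the query string; it assumes that P724 is the "Internet Archive ID" property (worth verifying on Wikidata before relying on it), and the function name is hypothetical.

```python
def build_ia_lookup_query(ia_identifier: str) -> str:
    """Build a SPARQL query that looks up items by Internet Archive ID.

    Assumption: wdt:P724 is the "Internet Archive ID" property.
    The query is only constructed here, not sent anywhere.
    """
    # Escape backslashes and double quotes so the identifier stays
    # a valid SPARQL string literal.
    escaped = ia_identifier.replace("\\", "\\\\").replace('"', '\\"')
    return f'SELECT ?item WHERE {{ ?item wdt:P724 "{escaped}" . }} LIMIT 5'


if __name__ == "__main__":
    print(build_ia_lookup_query("bullettinodella04italgoog"))
```

If the query returns no item, the uploader would still have to create one or supply a Wikidata ID by hand, as suggested above.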
I'm very happy to help with whatever I can!

—sam
Andrea Zanni
2017-06-30 07:23:54 UTC
Hello everyone, before we return to this topic, let me say that I think we
have a "major" bug in IA-Upload:
sometimes the OCR is not aligned with the pages, meaning you get the
right OCR text but it is shown for the following page...
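
To check whether a given book really has this kind of shift, one could compare the per-page text of the source with the per-page text layer of the generated file and look for the offset that gives the most matches. This is only a sketch under stated assumptions: both inputs are plain lists of page strings that have already been extracted somehow, and the function name and `max_shift` parameter are hypothetical.

```python
def best_page_offset(source_pages, djvu_pages, max_shift=3):
    """Return the shift s for which djvu_pages[i + s] most often equals
    source_pages[i]. A result of 1 means the text appears one page late."""

    def normalise(text):
        # Ignore case and whitespace differences between the two layers.
        return " ".join(text.split()).lower()

    src = [normalise(p) for p in source_pages]
    out = [normalise(p) for p in djvu_pages]

    best_shift, best_hits = 0, -1
    # Try small shifts first so that ties favour the smallest offset.
    for shift in sorted(range(-max_shift, max_shift + 1), key=abs):
        hits = sum(
            1
            for i, text in enumerate(src)
            if text and 0 <= i + shift < len(out) and out[i + shift] == text
        )
        if hits > best_hits:
            best_shift, best_hits = shift, hits
    return best_shift


if __name__ == "__main__":
    source = ["title", "page one", "page two", "page three"]
    shifted = ["", "title", "page one", "page two"]
    print(best_page_offset(source, shifted))
```

Exact string comparison is a simplification; real OCR layers may differ slightly, in which case a fuzzy comparison would be needed.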

Aubrey
Sam Wilson
2017-06-30 08:00:11 UTC
This is indeed a bug! I can't replicate it though. Does it happen for
every book for you? Or only sometimes? Do you know what is different
about the ones that fail? Is it related to removing (or not) the Google
cover page?
I think I can find time this weekend to work on this.
Andrea Zanni
2017-06-30 08:10:44 UTC
Unfortunately, only sometimes, and apparently it's not related to the Google
cover page (at least, I removed a page from one book and it doesn't have the
problem, while another book is misaligned without the cover having been removed).

Look at this:
https://it.wikisource.org/wiki/Indice:Decio_Albini_-_La_spedizione_di_Sapri,_Tip._delle_Terme_diocleziane_di_G._Balbi,_Roma_1891.djvu
Alex Brollo
2017-06-30 10:20:48 UTC
Take a look at this case:
https://archive.org/details/GiacomoRacioppiLAgiografiaDiSanLaverioDel1162Images

Here the OCR (as you can see from the _djvu.xml file) seems severely bugged, and
obviously the DjVu file built by the IA Upload tool can't be better than its source.

Aubrey, please keep notifying me of any case of faulty DjVu files coming from IA or
built from IA files by the IA Upload tool.

Alex
Alex Brollo
2017-06-30 10:23:49 UTC
Oops... I only *presume* that _djvu.xml is bugged; really, I only examined the
whole-text file (derived, I think, from the _djvu.xml file). I'll take a deeper
look, also examining the searchable PDF.

Alex