ArchiveBox or similar for shared archiving of research project

Stopwatch1986@lemmy.ml · 2 days ago

In other news (well, from April): https://post.parliament.uk/facial-recognition-technology-in-policing

Stopwatch1986@lemmy.ml · 3 days ago

Thanks for doing the digging. An archivist may know something more. Or the archive.is people.

Stopwatch1986@lemmy.ml · 3 days ago

I have been using Zotero every day for more than two decades and somehow it hadn’t crossed my mind. You may be on to something.

Zotero supports public and private shared bibliographies that you can subscribe to through the client or their web interface. Each entry contains the bibliographical details, attached notes, file attachments and links to local files. It also captures web pages and metadata through the browser addon. The local database can be backed up and, if self-hosted, you have control. The best part is that academic researchers will be familiar with the software and process. One downside is that the cached file is not independently archived, so it could be tampered with. Thanks for the idea.
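One way to mitigate the tampering concern might be to keep a checksum manifest somewhere separate from the cached files. The sketch below is only an illustration under that assumption; the function names and the folder layout are hypothetical and have nothing to do with Zotero's own storage format.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> Dict[str, str]:
    """Map each file name directly inside `folder` to its digest."""
    return {p.name: sha256_of(p) for p in sorted(folder.iterdir()) if p.is_file()}


def changed_files(folder: Path, manifest: Dict[str, str]) -> List[str]:
    """List manifest entries that are missing or no longer match their digest."""
    changed = []
    for name, expected in manifest.items():
        target = folder / name
        if not target.is_file() or sha256_of(target) != expected:
            changed.append(name)
    return changed
```

The manifest only helps if it is stored where the person who can edit the cache cannot also edit the manifest, for example in a separate repository.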

Stopwatch1986@lemmy.ml · 3 days ago

A wiki is a good idea. Putting a SingleFile or similar all-in-one file in a repository and providing index numbers organised as a look-up table would also work for easy retrieval by a random research user. Both require some admin and more effort from the researchers.
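The look-up table idea could be as small as a CSV file kept next to the saved pages. This is a rough sketch, assuming sequential index numbers and a flat folder of files; the `Entry` fields and file names are made up for illustration.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Entry:
    index: int      # number cited in the report
    url: str        # original source URL
    filename: str   # saved all-in-one file in the repository


def add_entry(table: List[Entry], url: str, filename: str) -> Entry:
    """Append a new entry using the next free index number."""
    entry = Entry(index=len(table) + 1, url=url, filename=filename)
    table.append(entry)
    return entry


def lookup(table: List[Entry], index: int) -> Optional[Entry]:
    """Return the entry with the given index number, or None if absent."""
    for entry in table:
        if entry.index == index:
            return entry
    return None


def to_csv(table: List[Entry]) -> str:
    """Serialise the table as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["index", "url", "filename"])
    for entry in table:
        writer.writerow([entry.index, entry.url, entry.filename])
    return buffer.getvalue()
```

A plain CSV has the advantage that it stays readable without any of this code, which matters if the project outlives its tooling.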

I wish there was a hostable version of archive.is for near-zero maintenance. You just submit a URL over the internet and the web page is cached once along with a screenshot. Then, anyone can access the archived version. This can be done already with archive.is but we have no control over its future, which is critical for long-term dependable archiving.

Stopwatch1986@lemmy.ml · 4 days ago

Having Webrecorder host our archived pages is both an advantage and a disadvantage: the archive may outlive our project, or it may not last as long.

I have been using SingleFile for years. It’s great, but not for seamlessly making cached web pages available to members of the general public who read our reports and find that cited links are now dead. It also doesn’t support URLs that point to PDF or CSV files. A public-facing repository of SingleFile files with an index for ToC might do it though. Simplicity is good for future-proofing an archive.

Something like archive.org and archive.is would be ideal, but we have no control over their future and practices.

Stopwatch1986@lemmy.ml · 4 days ago

I wonder if an authorised remote user (ie an affiliated researcher) can easily instruct ArchiveBox to store a URL and later retrieve it. Also, ideally a random user should be able to retrieve the archived web page or file (eg a PDF, CSV etc). The idea is that authorised researchers can get URLs archived, and then any user reading our reports can click on a citation and get our archived source if the original is not available any more. I’ll need to run it and see, but it looks promising.
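The citation fallback I have in mind could be sketched like this. It is not ArchiveBox's API, just a hypothetical resolver: `archive` is an assumed mapping from original URLs to our archived copies, and `head_check` is one possible liveness test.

```python
import urllib.request
from typing import Callable, Mapping, Optional


def head_check(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request succeeds with a non-error status."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError):
        # Covers HTTP errors, DNS failures, timeouts and malformed URLs.
        return False


def resolve_citation(
    url: str,
    archive: Mapping[str, str],
    is_alive: Callable[[str], bool] = head_check,
) -> Optional[str]:
    """Return the original URL if it still responds, else the archived copy."""
    if is_alive(url):
        return url
    return archive.get(url)
```

A live URL is not proof that the content is unchanged, so a real deployment might prefer to always offer the archived copy alongside the original.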

Keeping the archive alive for years later, possibly after funding dries up, is another challenge but there are public repositories that may be suitable for that.

Stopwatch1986@lemmy.ml · 5 days ago

ArchiveBox or similar for shared archiving of research project

Stopwatch1986@lemmy.ml · edit-2 5 days ago

I had missed the proxies and it looks great. It would work well with FoxyProxy in my current setup. But I can’t get it to work with my network socks5 proxy. I enter something like http://ip:port (which works well with FoxyProxy) but I get ‘The proxy server is refusing connections’. Does the URL pattern look correct?

EDIT: I get the web page only when Firefox VPN is running, presumably because the connection is then routed through Firefox VPN rather than my socks proxy, so the proxy feature is not working here for some reason.
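One thing that may be worth checking is the scheme part of the URL, since `http://ip:port` declares an HTTP proxy rather than a SOCKS one. The snippet below only shows what a given proxy URL declares; the addresses are placeholders, and whether the add-on accepts a `socks5://` or `socks://` prefix is an assumption to verify in its documentation.

```python
from typing import Dict, Optional, Union
from urllib.parse import urlsplit


def describe_proxy(proxy_url: str) -> Dict[str, Optional[Union[str, int]]]:
    """Split a proxy URL into the scheme, host and port it declares."""
    parts = urlsplit(proxy_url)
    return {"scheme": parts.scheme, "host": parts.hostname, "port": parts.port}


# Placeholder addresses from the documentation range 192.0.2.0/24.
as_entered = describe_proxy("http://192.0.2.10:1080")
as_socks = describe_proxy("socks5://192.0.2.10:1080")
```

Here `as_entered["scheme"]` is `"http"` while `as_socks["scheme"]` is `"socks5"`, with the same host and port in both cases.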

Stopwatch1986@lemmy.ml · 5 days ago

Isn’t that what MAC + Simple Tab Groups do? Isn’t it still the case that if we have, say, 10 tabs in the same container they can read each other’s cookies and possibly other data? If that’s true, it seems one way to stop this is to create a separate container for each site. That would be very impractical unless the browser does it for us.

Stopwatch1986@lemmy.ml · 6 days ago

Doesn’t clicking on the headphones switch to an audio test like with regular captcha? That’s what I do and it works first time instead of getting an endless number of images when I use VPN. The words you enter don’t even have to be 100% correct.

Stopwatch1986@lemmy.ml · 8 days ago

Unless I misunderstand this, the problem with Multi-Account Containers is that sites sharing the same container can read each other’s cookies, probably because MAC is designed for multi-account scenarios rather than for isolating each site. I wish there was a way to containerise each site by default without breaking the web.

Konform Browser

Looks good, but I would miss new Firefox features, and the stricter the restrictions, the more friction. I used NoScript for a while in the past and it was a nightmare.

Stopwatch1986@lemmy.ml · 10 days ago

Well yes, but I am trying to figure out if Strict Protection and the different containers overlap in what they do. If yes, I don’t see why FF maintains and promotes all of them. Facebook Container seems like a single-case Multi-Account Container, and Strict Protection seems to block cross-site cookies anyway.

On fingerprinting, perhaps switching languages, addons etc on/off would confuse trackers. FF could even do that automatically.

Stopwatch1986@lemmy.ml · 10 days ago

Firefox containers

Stopwatch1986@lemmy.ml · 13 days ago

It is, but creeping privatisation may change that, as may the legislation that has become more hostile to unionisation since the 1980s.

The broader point is that individuals can try all they want to preserve their privacy, but then friends, family and organisations spy on them, often unwittingly, eg when we share with them calendar events or email messages. The only way forward is collective resistance, building alliances and influencing public policy. But it’s always been like that with systemic issues.

Stopwatch1986@lemmy.ml · edit-2 15 days ago

And resistance can only be collective. Another reason unionisation is as important as it’s ever been.

Stopwatch1986@lemmy.ml · 18 days ago

The one thing holding me back from switching from Windows to Linux was the very poor PDF support in Linux. Every time I raised this several people told me I use PDF wrong. Others would tell me to use Inkscape, Draw, Okular etc.

Office workers, publishers, academics and many more are expected to edit several PDFs every day. It may be simply crossing things out in a draft, adding/deleting/extracting/converting pages, OCRing or dewarping images. Telling colleagues, clients and line managers they shouldn’t do it is not an option. Adobe does all this and more very well. This workflow is so common and important in so many contexts, I am surprised it’s not a separate application in the LibreOffice suite. What is more surprising is some of the attitudes.

I have now switched to Linux anyway, but I had to create scripts to do things with Ghostscript. Not very user-friendly and I wouldn’t recommend Linux to people who rely heavily on PDF handling.
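To give an idea of what such scripts look like, here is a minimal sketch of a wrapper that builds a Ghostscript command for extracting a page range or recompressing a file. It is not my actual script; the function names are made up, and it assumes the `gs` binary is on the PATH.

```python
import subprocess
from typing import List, Optional


def gs_command(
    source: str,
    target: str,
    first_page: Optional[int] = None,
    last_page: Optional[int] = None,
    preset: Optional[str] = None,
) -> List[str]:
    """Build a Ghostscript pdfwrite command as an argument list."""
    command = ["gs", "-sDEVICE=pdfwrite", "-dNOPAUSE", "-dBATCH", "-dQUIET"]
    if preset is not None:
        # Presets such as "screen", "ebook" or "printer" trade size for quality.
        command.append(f"-dPDFSETTINGS=/{preset}")
    if first_page is not None:
        command.append(f"-dFirstPage={first_page}")
    if last_page is not None:
        command.append(f"-dLastPage={last_page}")
    command.append(f"-sOutputFile={target}")
    command.append(source)
    return command


def run_gs(command: List[str]) -> None:
    """Run the command and raise CalledProcessError if Ghostscript fails."""
    subprocess.run(command, check=True)
```

For example, `run_gs(gs_command("in.pdf", "out.pdf", first_page=2, last_page=4))` would write pages 2 to 4 to a new file. It works, but it is exactly the kind of thing an office user should not have to do.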

Stopwatch1986@lemmy.ml · 18 days ago

Same situation here. For heavier editing I now use local Stirling PDF and BentoPDF. As I say above, both run in Docker, but Stirling PDF also comes as an AppImage. They are powerful but don’t feel like integrated applications.

But there is a surprising gap in Linux for PDF editing. Available tools are like toys for the task or geared towards techies. I would expect a PDF reader/editor to be a separate application in the LibreOffice suite. (No, Draw or Inkscape won’t cut it, sorry)

Stopwatch1986@lemmy.ml · 18 days ago

I stopped recommending Master PDF Editor when I realised they were trying to lock me in by letting me know only after the event that watermarks would be added.

PDF4QT is awkward in many ways but the latest version has the best compression, even allowing you to select one-by-one which images will be compressed and how.

Other options for editing are local Stirling PDF and BentoPDF. Both run in Docker, but Stirling PDF also comes as an AppImage. They are powerful but don’t feel like integrated applications.

Stopwatch1986@lemmy.ml · 18 days ago

I agree it’s great but I had issues running it under Wine. An alternative is Xodo for Linux which apparently is Qoppa PDF Studio revamped. These three look strangely very similar. But Xodo has other issues, including crashes.

Stopwatch1986@lemmy.ml · 19 days ago

The implication is that sending links to encrypted files with the decryption key added to the URL (eg Thunderbird Send, Mega etc) is not zero-trust. Decryption may take place locally and the key part of the URL may not be sent to the file hosting service, but when the recipient clicks on the link and is served one-off code by the web site, that code may be compromised.
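The point about the key part of the URL can be illustrated with a small sketch. The link below is a made-up example, not a real service URL; the fragment after `#` stays in the browser, while only the path and query go into the HTTP request.

```python
from urllib.parse import urlsplit


def request_target(url: str) -> str:
    """Return the path and query a browser would send in the request line."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


def fragment_key(url: str) -> str:
    """Return the fragment, which is kept client-side and not sent."""
    return urlsplit(url).fragment


# Hypothetical share link with the decryption key in the fragment.
link = "https://files.example.org/download/abc123#k3y-material"
sent_to_server = request_target(link)
kept_locally = fragment_key(link)
```

The server never receives the fragment in the request, but the JavaScript it serves can read it, which is exactly why the served code has to be trusted.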

As we know, the best way to be sure is to do your own separate encryption but without secure-by-design most people will think you are very odd demanding that decryption is done separately and keys are shared through a different channel. Speaking from experience, no matter how much training they are given at work, most people, including HR, would rather you sent them sensitive documents (like passport scans) in the clear as email attachments or at least in a way that involves a single click (Wetransfer etc).

Stopwatch1986@lemmy.ml · 19 days ago

Zero-trust services and web access

Stopwatch1986@lemmy.ml · 1 month ago

I thought it was Autonomy. You installed a program, instructed puppy agents, logged out, and while you were offline the puppies searched through several engines. Next time you logged in, the findings were waiting for you. That was the time of 56k modems and metered connections.

Stopwatch1986@lemmy.ml · 1 month ago

Yes, and you might also say that time-starved humans just reviewing LLM output may generate more accurate reports than having to write them from scratch in a rush. That’s until humans get complacent or are expected to do even more per minute. But there is a fundamental difference. Unlike humans, LLMs don’t understand context and don’t do sanity checks. When they hallucinate they can do so wildly, without a sense of implications, but always with confidence.

Stopwatch1986@lemmy.ml · 8 months ago

Different installation methods and system stability

Stopwatch1986@lemmy.ml · 8 months ago

Is there a simple GUI application alerting the user when a process is not running?

Stopwatch1986@lemmy.ml · edit-2 9 months ago

SOLVED: Ethernet stopped working hours after installation. Wifi works OK.