Hacker News | mushiake's comments

Maybe you're thinking of Archy[1]?

[1] https://en.wikipedia.org/wiki/Archy_(software)


It's not emulation; it's a wrapper on top of the native UI toolkit: Win32 on Windows, Cocoa on macOS.


I am speaking about when Tk mattered, around the 2000s.


Satellogic[0] is using Cuis Smalltalk to process satellite images.

[0]https://en.wikipedia.org/wiki/Satellogic


There is a proposal for a new OCaml macro system inspired by Racket.[0]

[0]http://www.lpw25.net/ocaml2015-abs1.pdf


Very exciting - thanks for sharing. Now we just need proper multicore and that's a versatile modern language.


Fantastic news.

Archive Team's take on this[0]

[0]http://www.archiveteam.org/index.php?title=Robots.txt


It is great news in general, but seems to be done in a clumsy and counterproductive manner that may cause the Internet Archive to be banned from crawling some websites.

The problem: when robots.txt for a website is found to have been made more restrictive, the IA retrospectively applies its new restrictions to already-archived pages and hides them from view. This can also cause entire domains to vanish into the deep-archive. No-one outside IA thinks this is sensible.

Their solution: ignore robots.txt altogether. What? That will just annoy many website operators.

My proposed solution: keep parsing robots.txt on each crawl and obey it progressively, without applying the changes to existing archived material. This is actually less work than what they currently do. If the new robots.txt says to ignore about_iphone.html you just do that and ignore it. Older versions aren't affected.
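A minimal sketch of that idea in Python, using the standard library's urllib.robotparser. The archive structure (a dict mapping URL to a list of crawl dates), the function names, and the example.com URLs are all hypothetical, made up for illustration; this is not how the IA actually stores or crawls anything.

```python
from urllib.robotparser import RobotFileParser


def allowed_urls(robots_txt, candidate_urls, user_agent="ia_archiver"):
    """Return the candidate URLs that the *current* robots.txt permits."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in candidate_urls if parser.can_fetch(user_agent, u)]


def crawl(archive, crawl_date, robots_txt, candidate_urls):
    """Record new snapshots for allowed URLs only.

    Existing entries in `archive` (hypothetical: url -> list of crawl dates)
    are never removed or hidden, whatever the new robots.txt says.
    """
    for url in allowed_urls(robots_txt, candidate_urls):
        archive.setdefault(url, []).append(crawl_date)
    return archive


# Hypothetical data: one page was archived before the robots.txt changed.
archive = {"http://example.com/about_iphone.html": ["2015-01-01"]}
new_robots = "User-agent: *\nDisallow: /about_iphone.html"
crawl(
    archive,
    "2017-04-01",
    new_robots,
    ["http://example.com/about_iphone.html", "http://example.com/index.html"],
)
```

After this crawl the old about_iphone.html snapshot is still there, no new one is taken, and index.html gets a fresh snapshot.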

Basically they're switching from being excessively obedient to completely ignoring robots.txt in order to fix a self-made problem. I can only see that antagonising operators.


Archive Team is not associated with Internet Archive. AT does not crawl the web at large, it only targets specific sites.


There's some value in allowing site operators to retroactively remove content which was never intended to be public. A common and unfortunate example is backups (like SQL dumps) being stored in web-accessible directories, then subsequently being indexed and archived when a crawler finds the appropriate directory index.

What needs to be fixed first is just the really common case mentioned in the blog post, where a domain changes ownership and a restrictive robots.txt is applied to the parking page.


Here's a slight modification to the GP proposal:

- Respect robots.txt at the time you crawl it.

- If robots.txt appears later, stop archiving from that date forwards.

- Preserve access to old archived copies of the site by default.

- Offer a mechanism that allows a proven site owner to explicitly request retrospective access removal.

If archive.org have recorded the date that they first observed a robots.txt on the sites currently unavailable, they could even consider applying the above logic today retrospectively. Perhaps after a couple of warning emails to the current Administrative Contact for the domain.
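The four rules above could be sketched roughly like this in Python. Everything here is an assumption for illustration: the SitePolicy record, its field names, and the simplification that a robots.txt blocks the whole site rather than individual paths. How a site owner gets "proven" is left out entirely.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class SitePolicy:
    # Hypothetical per-site record; field names are made up for this sketch.
    robots_first_seen: Optional[date] = None   # when a blocking robots.txt was first observed
    owner_requested_removal: bool = False      # set only after the owner has been verified


def should_archive(policy, crawl_date):
    """Stop archiving from the date a blocking robots.txt first appeared."""
    if policy.robots_first_seen is None:
        return True
    return crawl_date < policy.robots_first_seen


def is_visible(policy, snapshot_date):
    """Old snapshots stay visible by default; only an explicit, verified
    owner request hides them. The snapshot date is deliberately not
    consulted: a later robots.txt never hides earlier copies on its own."""
    return not policy.owner_requested_removal


policy = SitePolicy(robots_first_seen=date(2017, 4, 1))
print(should_archive(policy, date(2017, 3, 31)))  # crawl before robots.txt appeared
print(should_archive(policy, date(2017, 4, 1)))   # crawl on or after that date
print(is_visible(policy, date(2015, 1, 1)))       # old copy, no removal request
```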


>mechanism that allows a proven site owner to explicitly request retrospective access removal. //

It should be "a proven content owner"; just buying a site shouldn't allow someone to remove it from the archive.


How about you respect the robots.txt until the IP address where it is hosted changes. Once the IP has changed, then any new robots.txt exclusions apply only to the new pages not the archived pages under the old IP, which continue respecting the old archived robots.txt.

The IP address changing is a pretty solid indicator that control of that content has moved to a new organisation. Note this does not always coincide with the domain name owner changing.
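Roughly, in Python: each archived snapshot remembers the IP it was crawled from, and is governed by the robots.txt last archived under that same IP. The Snapshot record, the robots_by_ip mapping, and the addresses (taken from the documentation ranges) are assumptions for this sketch; it also ignores shared hosting and IP reuse, which would need handling in practice.

```python
from dataclasses import dataclass
from urllib.robotparser import RobotFileParser


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical archive record for one captured page.
    url_path: str   # e.g. "/product.html"
    crawl_ip: str   # IP the domain resolved to when the page was crawled


def is_viewable(snapshot, robots_by_ip, user_agent="ia_archiver"):
    """Apply the robots.txt archived under the snapshot's own IP.

    A robots.txt published later under a different IP is never consulted
    for this snapshot. A missing entry is treated as "no restrictions".
    """
    robots_txt = robots_by_ip.get(snapshot.crawl_ip, "")
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, snapshot.url_path)


# Hypothetical history: the old host had no restrictions, the new one blocks everything.
robots_by_ip = {
    "192.0.2.1": "",
    "198.51.100.7": "User-agent: *\nDisallow: /",
}
old_page = Snapshot("/product.html", "192.0.2.1")
new_page = Snapshot("/product.html", "198.51.100.7")
print(is_viewable(old_page, robots_by_ip))
print(is_viewable(new_page, robots_by_ip))
```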

A scenario that I can imagine becoming litigious: company owns a domain for promoting some product and they use robots.txt to prevent copies. The product reaches end of life and domain is allowed to expire. Someone else buys the domain and starts hosting content with no robots restriction. Archive.org start to display pages from the old company. Company then sues archive.org for copyright violation.


>may cause the Internet Archive to be banned from crawling some websites.

It looks like Facebook banned ia_archiver (recently? I recall it worked a few weeks ago):

>User-agent: ia_archiver

>Disallow: /

https://www.facebook.com/robots.txt
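What those two lines mean for a crawler can be checked with Python's standard urllib.robotparser. This sketch parses only the two quoted lines, not the live file (which contains many more rules and may have changed); "SomeOtherBot" is a made-up user agent for comparison.

```python
from urllib.robotparser import RobotFileParser

# Only the two lines quoted above; the real robots.txt is much longer.
quoted_rules = """\
User-agent: ia_archiver
Disallow: /
"""

parser = RobotFileParser()
parser.parse(quoted_rules.splitlines())

# ia_archiver is shut out of every path on the site.
ia_allowed = parser.can_fetch("ia_archiver", "https://www.facebook.com/")
# An agent with no matching rule is allowed under these two lines alone.
other_allowed = parser.can_fetch("SomeOtherBot", "https://www.facebook.com/")

print(ia_allowed, other_allowed)  # False True
```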


The logic is sound, and I see that it was mostly written in 2011, but I can also see it being harmful.

How about an IETF RFC to clarify?

Libraries operate under a lot of unwritten social conventions, perhaps even more than most other institutions. (robots.txt, even if largely ignored, is a popular convention.) Aggressive or confrontational wording, regardless of whether it is "right", doesn't seem to be in libraries' interests.


sl [0] is always my favorite.

[0]https://github.com/mtoyoda/sl


Is there a way to lolcat sl?


FFTW[0] is also written like that (generator written in OCaml emitting C).

[0]http://www.fftw.org/


GTK3 has a native file chooser API (Windows only; no Mac one yet).


Racket uses Cocoa on macOS, Win32 on Windows, and GTK+ on Linux. But it is really minimal and you may find features lacking.


For anyone interested in how Racket (Matthew Flatt) pulled it off : http://blog.racket-lang.org/2010/12/rebuilding-rackets-graph...


Smoke (from KDE) was supposed to be a contender to gobject-introspection; however, it is barely maintained.

It is only(?) used by Common Lisp[0].

And there was claro[1].

[0]https://github.com/Shinmera/qtools

[1]https://github.com/Araq/Claro


It too seems to be stuck on Qt4.

Only python bindings seem to have made it up to Qt5.

