Feature request: Add a duration/timedelta type #514

Open
JelteF opened this issue Jan 14, 2018 · 64 comments

Comments

@JelteF
Contributor

JelteF commented Jan 14, 2018

I think it would be very useful to have a duration type natively in toml. It's a thing I use a lot in my web service configs, for cache TTL or timeouts. Right now I resort to using integers and making the key include the resolution (e.g. timeout_ms, ttl_hours). This has a couple of disadvantages:

  1. Requires extra code every time to convert the value to the actual language-specific duration type.
  2. If the resolution chosen was wrong you have at least one of these problems:
    1. You have to change the key, resulting in backwards incompatibility
    2. You have to add zeros, which decreases readability
    3. You have to do calculations, e.g. if you have ttl_hours and want 9 days, you have to enter 216. That makes the value (at least to me) not obvious when quickly looking at the config.

I would propose the following basic and IMHO natural syntax (inspired by go duration parsing/formatting):

day = 1d
hour = 1h
minute = 1m
second = 1s
milli = 1ms
micro1 = 1µs # U+00B5 = micro symbol
micro2 = 1μs # U+03BC = Greek letter mu
nano = 1ns

# allows floats
micro3 = 0.1ms

# allows combining
two_and_a_half_hours = 2h30m
# not advised but possible
five_seconds = 2s3s

# can be negative
minus_one_seconds = -1s

# allows underscores
hundred_thousand_hours = 100_000h

This notably doesn't include months and years because they can differ in duration and are quite easily approximated in days. I'm also fine with the following changes:

  1. Taking out/changing the µ for micro seconds. I think it's fine to use 0.1ms in most cases, so it's not strictly needed. I mainly put it in because Go duration parsing and formatting allows/uses it as well.
  2. I'm fine with adding a prefix to make differentiation with numbers easier. For instance D which would result in D2h30m.
  3. Removing the duplication possibility for 2s3s. Again I mainly put this in because the Go duration parsing allows it.
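A minimal sketch of how the proposed syntax could be parsed, written in Python rather than Go. The function name `parse_duration_ns`, the `UNIT_NS` table, and the choice to return whole nanoseconds (similar in spirit to Go's integer duration) are assumptions made for illustration, not part of the proposal or of the linked fork.

```python
import re
from decimal import Decimal

# Nanoseconds per unit. "us" is an ASCII spelling for microseconds;
# "\u00b5s" (micro sign) and "\u03bcs" (Greek mu) are the two spellings in the proposal.
UNIT_NS = {
    "d": 86_400 * 10**9, "h": 3_600 * 10**9, "m": 60 * 10**9, "s": 10**9,
    "ms": 10**6, "us": 10**3, "\u00b5s": 10**3, "\u03bcs": 10**3, "ns": 1,
}
# Longer unit names first so that "ms" is not read as "m" followed by a stray "s".
_UNITS = "|".join(sorted(UNIT_NS, key=len, reverse=True))
_TERM = re.compile(rf"(\d+(?:_\d+)*(?:\.\d+)?)({_UNITS})")


def parse_duration_ns(text: str) -> int:
    """Return the duration in whole nanoseconds (hypothetical helper)."""
    sign, body = (-1, text[1:]) if text.startswith("-") else (1, text)
    if not body:
        raise ValueError(f"empty duration: {text!r}")
    pos, total = 0, 0
    while pos < len(body):
        term = _TERM.match(body, pos)
        if term is None:
            raise ValueError(f"invalid duration: {text!r}")
        number, unit = term.groups()
        # Decimal keeps values such as 0.1ms exact; sub-nanosecond parts are dropped.
        total += int(Decimal(number.replace("_", "")) * UNIT_NS[unit])
        pos = term.end()
    return sign * total
```

This lenient version accepts repeated units such as 2s3s, as in the example list; a stricter grammar would reject them.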

I really hope this is considered for inclusion as it would be really useful to me and my colleagues. (Much more so than the already supported datetime type, which I've never had an actual use for in a config).

PS. I created a modified fork of https://github.com/pelletier/go-toml that supports this: https://github.com/JelteF/go-toml (see the last couple of commits)

@forember

A prefix is a good idea, especially if #427 gets accepted. Microseconds could be represented with us. And TOML tends to be pretty strict about formatting (to reduce the chance of confusion when reading a doc), so combining should probably require descending order of units with no duplication.
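The ordering rule suggested here (units in descending order, none repeated) could be checked roughly as follows. This is only a sketch: the function name `units_in_strict_order`, the integer-only amounts, and the exact unit list (with `us` for microseconds) are assumptions, and the optional prefix is left out.

```python
import re

# Units from largest to smallest; the position in this list is the rank.
_ORDER = ["d", "h", "m", "s", "ms", "us", "ns"]
_TERM = re.compile(r"(\d+)(ms|us|ns|d|h|m|s)")


def units_in_strict_order(text: str) -> bool:
    """True if every unit is strictly smaller than the one before it."""
    pos, last_rank = 0, -1
    while pos < len(text):
        term = _TERM.match(text, pos)
        if term is None:
            return False  # not a sequence of <integer><unit> terms
        rank = _ORDER.index(term.group(2))
        if rank <= last_rank:
            return False  # repeated unit or ascending order
        last_rank = rank
        pos = term.end()
    return last_rank >= 0  # reject the empty string
```

Under this rule 2h30m passes, while 2s3s and 30m2h are rejected.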

@JelteF
Contributor Author

JelteF commented Jan 22, 2018

@NighttimeDriver50000, thanks for the input. The us suffix sounds like a good idea indeed, much easier to type than µs. And the reasoning for the other two points makes sense as well.

@jongiddy

The date-time type is derived from RFC3339, which is a subset of ISO8601. It would be great to define any other time types using similar standards. ISO8601 has a duration representation and RFC5545 defines a subset.

This would make your example:

day = P1D
hour = PT1H
minute = PT1M
second = PT1S
milli = PT0.001S
micro = PT0.000001S
nano = PT0.000000001S

# allows floats
micro3 = PT0.001S

# allows combining
two_and_a_half_hours = PT2H30M
# not supported
five_seconds = PT2S3S

# can be negative
minus_one_seconds = -PT1S

# allowing underscores would be a non-standard extension
hundred_thousand_hours = PT100_000H

The benefit is the use of a recognised standard, and that the P prefix makes parsing simpler and keeps more space for other types that may one day be added.

The downsides are that sub-second units are only supported as decimals (while ISO8601 supports decimals for any time unit, RFC5545 doesn't allow them at all), and the decimals do not support underscores.
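As a rough illustration of the ISO 8601 style, the sketch below parses only the subset used in the examples above (days, hours, minutes, seconds, with a decimal allowed on seconds). The name `parse_iso_duration` is hypothetical, weeks, months and years are deliberately left out, and treating a day as exactly 24 hours is an assumption of this sketch rather than something the standard guarantees.

```python
import re
from datetime import timedelta

_ISO = re.compile(
    r"(?P<sign>-)?P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?"
)


def parse_iso_duration(text: str) -> timedelta:
    """Parse a small ISO 8601 duration subset into a timedelta."""
    match = _ISO.fullmatch(text)
    if match is None:
        raise ValueError(f"invalid duration: {text!r}")
    parts = {k: v for k, v in match.groupdict().items()
             if k != "sign" and v is not None}
    # Reject "P", "PT" and a dangling "T" as in "P1DT".
    if not parts or text.endswith("T"):
        raise ValueError(f"invalid duration: {text!r}")
    value = timedelta(
        days=int(parts.get("days", 0)),     # assumption: 1 day == 24 hours
        hours=int(parts.get("hours", 0)),
        minutes=int(parts.get("minutes", 0)),
        seconds=float(parts.get("seconds", 0)),
    )
    return -value if match["sign"] else value
```

Note that Python's `timedelta` only resolves microseconds, so a value like PT0.000000001S would need a different target type.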

@pnathan

pnathan commented May 20, 2018 via email

@JelteF
Contributor Author

JelteF commented May 21, 2018

Good to see some activity on this issue again. Usually I agree that using standards is preferable. However, I don't agree that ISO8601 or RFC5545 would be a better fit for this than something similar to the Go duration parsing, for the following reasons:

  1. Having sub second resolution is really nice for defining short timeouts.
  2. Days and weeks in those standards are defined only relative to the date that you subtract them from. This means P1D is not always 24 hours, so you cannot use the built-in language duration types.
  3. The use of capital letters makes it harder to see the units at a glance.
  4. Requiring a T between the days and time adds extra clutter for no good reason (unless you allow M for months, which is not consistent in number of days)
  5. I've never seen anybody use this standard, which suggests that people don't really like it. (yes this is not scientific of course, feel free to dispute this)

Fixing some of these is of course possible, but that would result in a custom standard. So it would lose the benefit of using the standard.

PS. I'm fine with having the P prefix (or any other one to avoid confusion with the standards). So don't take that as a reason to prefer the standard.

@Falkon1313

A suggestion: a simple quantity unit may be better than trying to specify all possibilities, and/or a limited subset with arbitrary exclusions.

This notably doesn't include months and years because they can differ in duration and are quite easily specified in days.

But that is exactly why you can't specify them in days. If something is due every 3 months, how do you express that? Or that something has been owned for 5 years?

And time is not the only measurement, they all have units. Whether it be disk space/RAM, distance, volume, temperature, currency, etc.

A more general quantity unit measurement type would cover every possible combination, without having to specify them all or include all the conversion factors. Actually interpreting them (and converting if necessary) isn't really the job of a configuration or data file parser, that belongs to the application interpreting the configuration or data.

I like the idea of the combining - you could express something like 8 lb 5 oz or 2 months 3 days.

But generally, this sort of thing belongs more in the domain of the consuming application, not the data file format. And all measurements could be expressed as strings and handled by the application as appropriate for the application. What is the benefit of making it more complicated?

@JelteF
Contributor Author

JelteF commented Jul 3, 2018

@Falkon1313 I agree that a general quantity unit should be part of the consuming application. However, there's a big difference between durations and the other quantities you mention: there's a standard library type for durations in almost every programming language (at least the ones that also have a datetime type, which is already part of the toml spec). And like I said before, the main advantage would be to directly generate that type.

But that is exactly why you can't specify them in days. If something is due every 3 months, how do you express that? Or that something has been owned for 5 years?

Yes I messed up there. I meant to say they are quite easily approximated in days. (fixed up that comment now)

@Hugo-Trentesaux

Maybe we should just add a type which is a pair (number, string), the unit of the number being represented in the string. It would be very useful for any physical quantity.

length = 1 mm
size = 720 px
duration = 1 day

But the question is how it would be loaded in the program so that we can not add 1 apple with 2 watermelons...
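One way an application could load such a pair and still refuse to mix units is sketched below. The `Quantity` class and `parse_quantity` helper are hypothetical names, and the sketch does no unit conversion at all; it only compares the unit strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """A (number, unit) pair; the unit is an opaque string."""
    value: float
    unit: str

    def __add__(self, other):
        if not isinstance(other, Quantity):
            return NotImplemented
        if self.unit != other.unit:
            # No conversion table here: differing units are simply an error.
            raise ValueError(f"cannot add {self.unit!r} to {other.unit!r}")
        return Quantity(self.value + other.value, self.unit)


def parse_quantity(text: str) -> Quantity:
    """Split a value such as '720 px' into its number and unit."""
    number, unit = text.split()
    return Quantity(float(number), unit)
```

With this, adding `parse_quantity("1 apple")` to `parse_quantity("2 watermelon")` raises `ValueError`, while two values in mm add normally.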

@willstott101

willstott101 commented Apr 9, 2019

I personally think generic quantities are best parsed and considered by language and/or domain specific libraries. Any number with a unit may well get coerced to a native type differently depending on that language's native type, and the application's specific understanding of the units of a given domain. For instance a language's Decimal type might be most appropriate when handling quantities (especially currencies).

An example of a quantity library which I think handles them well is Python's Pint, which keeps original strings around as long as possible (extremely useful for any kind of user input, including configs): https://pint.readthedocs.io/en/0.9/

But because durations are so often representable in standard libraries, and so ubiquitous when configuring services, I think it makes a lot of sense to have an intuitive, obvious format in Toml. For the record, that standard mentioned above is far from obvious to me.

I'm not sure number, string tuples will offer much benefit beyond strings, if the application has to decide how those strings transform the number anyway - parsing numbers is easy, handling units is a fiddle.

@workingjubilee
Contributor

The USA standard for characters to use in lieu of μ is mc, as in 150mcg = "150 micrograms", which, while not intuitive, is in fact standard. The precise reason for that standard is that people might confuse the mu symbol with a u and be misled as to its meaning. Accepting both the SI and USA forms transparently would be fine, but us would be undesirable.

Of course, ISO8601 favors the .000001 second style of notation as of the previous issuance... it was recently revised in 2019 and I am not sure how, exactly, yet.

@yeongjet

Need this feature!

@Felk

Felk commented Mar 6, 2020

Days and weeks in those [ISO8601] standards are defined only relative to the date that you subtract them from. This means P1D is not always 24 hours, so you cannot use the built-in language duration types.

Strictly separating time and calendar periods tends to be a good thing. Using a calendar unit ("day" and everything above) in the context of time is almost always a bug. Though I don't have any numbers on how common actual use cases of calendar units for time are, work experience shows that whenever people used instant.plus(1, ChronoUnit.DAYS) instead of instant.atZone(...).plusDays(1) in Java, it always caused a timezone bug.

Given that ISO8601 makes a clear distinction between periods and time in the form of P<period>T<time>, I am strongly in favor of using it. Also since the already built-in support for datetimes RFC 3339 is also ISO8601-compatible, it seems intuitive to stick with it

@eksortso
Contributor

eksortso commented Mar 8, 2020

The more I look at ISO 8601's duration standards, the more I like them. The P prefix identifies durations immediately, and T instantly separates the times and dates. And the smallest unit can have a fractional value. They don't have the brevity that @JelteF's original proposal offered. But maybe for the sake of easier configuration writing, we can accommodate a few modest extensions?

Here are some proposals. I tried to cover all the bases touched upon so far, and I hope I didn't stretch things out too far. What do you think?

  • Allow numbers to have underscores in them, the same way we allow them in integers and floats. This would only be useful for very small fractions like PT0.000_001S, but more on this later.
  • Allow an underscore between units, and between "P"/"T" and the following unit. That would make, e.g., PT_2H_45M valid. They wouldn't be allowed between the number and its unit. We'll want those to stick together.
  • Let the letters be case-insensitive. For instance, PT30S would be the same as pt30s. The height difference between the letters and numbers would make those numbers pop out to human readers.
  • A negative proposal: let's not allow weeks. We don't use the ISO 8601 week-based calendar, and so we wouldn't allow something like P1W to be legal in TOML.
  • A radical change: Drop the P if there is no date part, so that T5M for five minutes would be valid. (We're going to need that T for time durations.)
  • Most radically: Introduce 2- or 3-character unit names for sub-second durations. These would follow after S but would work the same way as other units. In order of magnitude, these would be MS for milliseconds, MCS and its variants for micro, and NS for nano. So something like PT100MS for a 100-millisecond interval would be valid.

@ChristianSi
Contributor

@eksortso In my view, if we follow the ISO 8601 standard, we should stay close to it. One or two small changes may be fine, but if we're to deviate as far as you suggest, we can as well start from scratch and roll our own solution – maybe as @JelteF suggested or something close to it. Or we look for another fine standard/convention that fits our needs better without requiring as many changes as you suggest.

@eksortso
Contributor

eksortso commented Mar 9, 2020

Fair enough. No need for additional units. Not now, anyway.

But, and I was trying to address some of @JelteF's concerns: I still recommend allowing the underscores, being case-insensitive, and prepending P if T starts a time duration. TOML would gain readability and brevity, and these somewhat intermediate forms can be converted to ISO 8601-compliant durations with trivial string munging.

@JelteF
Contributor Author

JelteF commented Mar 9, 2020

Strictly separating time and calendar periods tends to be a good thing. Using a calendar unit ("day" and everything above) in the context of time is almost always a bug.

@Felk Thinking about this again, you are absolutely right. However, what I'm trying to say is that this separation is not very useful in practice if it means you cannot convert the string into the built-in duration type of the language. This is the case for Go and Python at least (and I expect more languages).

@JelteF
Contributor Author

JelteF commented Mar 9, 2020

@eksortso I agree that if we add a pass that removes all underscores and capitalizes all letters, we can still use parsing libraries for the standard.
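Such a normalisation pass might look like the sketch below: strip underscores, upper-case the letters, and prepend P when the value starts with T. The function name `to_iso8601` is hypothetical, and the function does no validation; it assumes the result is handed to an existing ISO 8601 duration parser.

```python
def to_iso8601(text: str) -> str:
    """Rewrite a relaxed duration spelling into plain ISO 8601 form."""
    normalized = text.replace("_", "").upper()
    sign = ""
    if normalized.startswith("-"):
        sign, normalized = "-", normalized[1:]
    # "T5M" is shorthand for "PT5M" when there is no date part.
    if normalized.startswith("T"):
        normalized = "P" + normalized
    return sign + normalized
```

For example, `to_iso8601("pt_2h_45m")` gives `"PT2H45M"` and `to_iso8601("t5m")` gives `"PT5M"`.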

@eksortso
Contributor

eksortso commented Mar 9, 2020

Thanks, @JelteF.

The notion to allow T instead of PT for intervals with no date components wasn't done to create a distinction between date-based and time-based durations. It's just that I would confuse myself sometimes, that P5M is five months and PT5M is five minutes. That "T" makes a difference, but I guess I still need to convince folks that letting it stand without the "P" would be valuable.

@eksortso
Contributor

So I know we're trying to get v1.0 out the door, but since there's little along those lines that I can help with, I'd like to move this along, in anticipation of v1.1.

I've sat on a PR for this for a while. Would it cause any problems (e.g. distract from v1.0 release efforts) if it were submitted for future consideration?

@salmangano

salmangano commented Mar 30, 2020

I needed a way to represent durations sooner rather than later. I also did not want to fork the implementations I use, as I rely on both cpptoml and python toml. So for the time being I am using an inline table like so:

delta={count=-15, unit="secs"}

I have a simple C++ utility to convert this into the std::chrono::duration types.
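The same conversion can be sketched on the Python side. Here a parsed inline table is assumed to arrive as a plain dict; only the "secs" unit appears in the example above, so the other unit names in the table, and the helper name `table_to_timedelta`, are made up for illustration.

```python
from datetime import timedelta

# Maps the table's unit string to a timedelta keyword argument.
# Only "secs" comes from the example; the rest are assumed spellings.
_UNIT_KEYWORD = {
    "msecs": "milliseconds",
    "secs": "seconds",
    "mins": "minutes",
    "hours": "hours",
    "days": "days",
}


def table_to_timedelta(table: dict) -> timedelta:
    """Convert {'count': ..., 'unit': ...} into a timedelta."""
    try:
        keyword = _UNIT_KEYWORD[table["unit"]]
    except KeyError:
        raise ValueError(f"unknown duration table: {table!r}") from None
    return timedelta(**{keyword: table["count"]})
```

The table from the example, `{"count": -15, "unit": "secs"}`, becomes `timedelta(seconds=-15)`.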

But I'd love to see first class support for this.

@abelbraaksma
Contributor

Copied from #717:

But I'd only allow the last expressed unit to have a fractional part, per ISO 8601.

While 8601 allows it on the last part, whichever part that is, I propose to only allow it on seconds. This is also the approach that the standards body of W3.org adopted.

It's non-trivial what it means to have fractional minutes, hours or even days, months or years. You're better off keeping it simple, and users will quickly come to understand that only the seconds part can be decimal (or double).

Btw, 8601 allows weeks, and an abbreviated format (without the letters). I wouldn't use either of those either (but I think there was already consensus on that in the main thread).

even though ISO 8601 doesn't appear to allow them

They don't disallow them, which in standards parlance usually means that they allow them. My suggestion would be, again, to keep it simple: either the whole duration is negative, or the whole duration is positive. Subtracting parts is complicated, and without timezone information not even reliably possible. What's more, you'll get a different number of days depending on the time of year (daylight saving time) if you allow subtractions of parts.

Ending up with a duration that's either years and months, or days and time means you have ordered types. These types are exact. Once you mix these, they mean something else depending on time of year.

That's OK, and ultimately up to implementers, but doing all that for positive or negative durations is already quite some work. If independent parts can be positive and negative it's that much harder. And likewise, that much harder to explain to end users and in spec prose.

@abelbraaksma
Contributor

Note that my point of limiting scope of individual members is not about date, time, duration calculations in toml, but that it can be reasonably expected to be the main use case where these types will be applied.

(though I can sympathize with an opposing argument that we should be inclusive and allow each duration segment to be negative, many existing implementations of such types don't support such flexibility, but also, those that do either chose to support that the whole duration can be negative, or support that individual segments can be negative, but not both)

@eksortso
Contributor

Copied from #717:

But I'd only allow the last expressed unit to have a fractional part, per ISO 8601.

While 8601 allows it on the last part, whichever part that is, I propose to only allow it on seconds. This is also the approach that the standards body of W3.org adopted.

The significance of W3.org standards only carries so far. Web technologies operate in second-based time intervals anyway. But not everything does. So I have no problems with using ISO 8601's approach to fractional units, which seems reasonable enough to me.

It's non-trivial what it means to have fractional minutes, hours or even days, months or years. You're better off keeping it simple, and users will quickly come to understand that only the seconds part can be decimal (or double).

I can understand the value of simplicity, but I also want to create a standard that's eminently usable. I wouldn't exclude half-hours for general use when over 8750 hours each year would interpret 0.5 hours the exact same way.

Allowing such niceties creates challenges to devise simple, precise definitions. This is partly done; I already have ABNF code that takes fractions into account. And once I submit a PR (with language that's not on the computer I'm currently typing on), you can assess that for yourself.

My suggestion would be, again, to keep it simple: either the whole duration is negative, or the whole duration is positive.

I agree with you here. Plus or minus the whole duration. That way, we can safely look past the fine points of duration arithmetic.

@abelbraaksma
Contributor

abelbraaksma commented May 21, 2020

The significance of W3.org standards only carries so far. Web technologies operate in second-based time intervals anyway.

Perhaps true for http (but that doesn't support durations, iirc), my work was in the xml, xsd, XPath, xslt area, and those transcend the area of just "web technologies".

(and also, the W3 mention was merely as an illustrating example how "some other standards body" did it, I'm fully aware that their approach has been, and often still is, with its own flaws)

But I understand your points. Besides, most discussion in the W3 groups was wrt date, time, tz, era, calendar and duration arithmetic, which can fill a bookshelf by itself ;). It's dauntingly complex...

I now realize that data manipulation is not something toml concerns itself with. I understand you need to be able to support applications that would want to express fractional time units, while other applications might want to prohibit that. That is kind of in the same league as an application expecting a numeric value in the range 1-10, while toml will allow any 64 bit integer.

In other words, @eksortso, I see now why you'd generally prefer a broader definition over a more limiting one, allowing a wider range of potential scenarios.

@abelbraaksma
Contributor

Btw, would we want to differentiate between time span and duration? The first is defined by a start- and end-datetime, the second by a period without reference to, or bearing on, a given datetime. They are semantically equivalent, but serve different scenarios, and are expressed and interpreted differently. (apologies if this has already been decided).

@abelbraaksma
Contributor

I personally have no problems at all with the ISO syntax, but I might be biased: I've seen it often enough (and have implemented it myself a few times) that it comes naturally to me. In the end, whatever people consider most human-friendly and still easily machine-definable is totally fine with me. It's good that you (@marzer, @arp242) took a deeper dive into this and came out strongly in favor of the xxm/xxh syntax. Though I do have a few comments and concerns:

Things like 1h30m don't need to be supported, IMHO; just 1.5h is good enough.

I think you should definitely support it, as time cannot be precisely defined in decimals. You cannot express 1 hour and 10 min as 1.1666666666666h, as it's never a precise mapping, while 1h10m is.

You may even consider, just as with times, to only allow fractions in the seconds part, as it may prove hard to define a proper translation from inexact decimal (which may be represented as floats internally as some languages don't support fixed decimals) to exact duration.

Assume for a moment we have to deal with floats. Now, language X may support 80 bit float on one CPU but 64 bit float on another. After reading a TOML file, this may mean that two identically written durations do not compare equal.

Anyway, that's a bit of a tangent, but bottom line is, from decimal or floating point to sexagesimal (which is what time is) can be harder than it seems at first and when calculations come into place, may lead to unexpected, incomparable results.
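The point about 1 hour 10 minutes can be checked with exact rational arithmetic. This is only a sketch using Python's `fractions` module; `hours_to_seconds` is a hypothetical helper that interprets a decimal string as hours without any floating-point rounding.

```python
from fractions import Fraction


def hours_to_seconds(decimal_hours: str) -> Fraction:
    """Exact number of seconds in a decimal-hours string, no float rounding."""
    return Fraction(decimal_hours) * 3600


# 1h10m is exactly 4200 seconds.
exact = Fraction(1 * 3600 + 10 * 60)

# 7/6 hours has no finite decimal expansion, so any written-out
# decimal falls slightly short of the intended value.
approx = hours_to_seconds("1.1666666666666")
shortfall = exact - approx  # strictly positive

# Values with a finite decimal expansion, such as 1.5h, are fine.
half_past = hours_to_seconds("1.5")  # exactly 5400 seconds
```

Whether a shortfall this small matters depends on the application, which is essentially the trade-off being discussed.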

My suggestion would be then to use \d+h\d+m[0-9.]+s (not a precise definition, but you get the idea).

@abelbraaksma
Contributor

abelbraaksma commented May 20, 2022

While we're at it, how are we going to interpret mixed hour/days/months/year durations? In most specifications and implementations I've seen, it's either time+days or months+years, but not both (and if both are allowed, they are not allowed to be normalized).

Consider 1m2d, and 33d. These cannot be combined or be considered the same. There's more to this, but I'll leave it at this simple example in case this can of worms has been opened before and been sorted out ;).

Edit, just missed this suggestion:

Let's focus on units smaller than a day. Let's consider only allowing durations to have a single one of these units.

Probably a good idea indeed to start small. Though I don't think we should disallow durations longer than 24h, just disallow durations with d, m and y in it (though frankly, d isn't the trouble maker, it is m and y).

We should also be explicit in allowing 1h5m to be the same as 65m, for instance.

@arp242
Contributor

arp242 commented May 20, 2022

You cannot express 1 hour and 10 min as 1.1666666666666h, as it's never a precise mapping, while 1h10m is.

You can use 70minutes; I think that's "good enough", and it's a good trade-off with keeping both the implementation and syntax simpler. Some small possible inaccuracy with floats is also fine I think; we're not concerned with precision time-keeping, and if guaranteed precision is really needed you can use milliseconds or nanoseconds similar to 70minutes instead of 1.166..hours

Consider 1m2d, and 33d. These cannot be combined or be considered the same. There's more to this, but I'll leave it at this simple example in case this can of worms has been opened before and been sorted out ;).

I think we shouldn't include "month" at all, or if we do, simply define it as "30 days".

There is no good way to deal with this in a context-less "duration" unless you force implementations to parse it as an object which keeps track of this (e.g. instead of merely storing it as an int64 you need some class/struct with an hour, day, month, etc. field), but many stdlib "duration" types don't (at least, Python's and Go's don't).

@abelbraaksma
Contributor

abelbraaksma commented May 20, 2022

There is no good way to deal with this in a context-less "duration"

Well, there is (keep month and year separate, basically), and there isn't (some might not call this a "good way"). But I agree, as I mentioned in my other comment, which I may have just edited while you were typing.

and it's a good trade-off with keeping both the implementation and syntax simpler.

You may have misunderstood why I made the suggestion. It is precisely to keep the implementation simpler, as there's no way of knowing what happens if we try to force a decimal time-duration system on people. It's just not what time is. I think it'll be much harder to formalize decimal minutes (which you'll need to do if you were to allow it) than it is to allow only integer hours, minutes and decimal seconds.

I don't think it'll be hard to create, parse and interpret [Xh][Xm][Fs] where X denotes an integer and F a float/decimal. Each section is optional, of course, and the order h-m-s is required. Overflow travels from right-to-left where it concerns comparisons.
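A minimal sketch of an [Xh][Xm][Fs] parser, assuming Python's `timedelta` as the target type. The name `parse_hms` is hypothetical, and rejecting the empty string is a choice made here rather than something stated in the thread.

```python
import re
from datetime import timedelta

# Integer hours, integer minutes, decimal seconds; each optional, order fixed.
_HMS = re.compile(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+(?:\.\d+)?)s)?")


def parse_hms(text: str) -> timedelta:
    """Parse strings such as '1h10m' or '90.5s' into a timedelta."""
    match = _HMS.fullmatch(text)
    if not text or match is None:
        raise ValueError(f"invalid duration: {text!r}")
    hours, minutes, seconds = match.groups()
    # timedelta normalizes overflow, so 65m and 1h5m compare equal.
    return timedelta(
        hours=int(hours or 0),
        minutes=int(minutes or 0),
        seconds=float(seconds or 0),
    )
```

Because the result is normalized, `parse_hms("1h5m") == parse_hms("65m")` holds, while `1.5h` and `10m1h` are rejected.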

@Falkon1313

I think we shouldn't include "month" at all, or if we do, simply define it as "30 days".

Just want to point out that excluding months (or days or years) could be a WTF for users if you have other units. Arbitrarily assigning a non-standard value for them (like 30 days per month) would be even worse, since it would falsely appear to be able to do the right thing, but actually only sometimes would and other times you'd have apparently random bugs.

Lots of things in both business and tech operate in terms of months. Whether it be things like quarterly reports (3 months) or monthly billing (1st of every month) or checking when something is due or if something is more than a month overdue, etc. You might have monthly log rotations, quarterly batch processes, semi-annual things (6 months), etc. I don't know how often people would reach for a duration (aside from the next due/overdue case, which is actually very common), but if it's there they'd expect to be able to use it.

If this type is specifically going to exclude things like that or handle them in non-standard ways, then it needs to at least be very clearly documented that people who need standard durations should not use it but instead use a string and their standard libraries to handle it. And that it's not meant for things like scheduling, etc. That it's really only meant to measure durations in contiguous real seconds regardless of timezones and DST? In which case you only really need the seconds unit, right? Well, maybe microseconds too.

Because I'd also second abelbraaksma in saying that decimal durations would be a bad idea.

Which brings me to a suggestion.

If it's not considering DST or month durations etc., then anything above 1 hour is ambiguous. If not accounting for leap seconds, then even 1 minute is ambiguous. So if the intent is to specify a duration in raw seconds, or less, then those are the only units that should be available. Whether it is seconds, milliseconds, nanoseconds, whatever unit precision makes the most sense; as an integer. And documentation should make clear that it's raw time, not clock time or calendar time, so people don't use 86400s to intend a day, etc. I'd suggest calling it something like 'raw duration' instead of just 'duration' to make it clear.

I think that would simplify and clarify it. Maybe something like /RD(\d+s|\d+ms|\d+µs|\d+ns)/
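A sketch of how such a "raw duration" could be read into a plain integer, with the RD prefix required for every unit. The name `parse_raw_duration` and the choice of nanoseconds as the integer unit are assumptions for illustration.

```python
import re

# "\u00b5" is the micro sign; amounts are integers only, one unit per value.
_RAW = re.compile(r"RD(\d+)(s|ms|\u00b5s|ns)")
_NANOSECONDS = {"s": 10**9, "ms": 10**6, "\u00b5s": 10**3, "ns": 1}


def parse_raw_duration(text: str) -> int:
    """Return a raw duration as an integer number of nanoseconds."""
    match = _RAW.fullmatch(text)
    if match is None:
        raise ValueError(f"invalid raw duration: {text!r}")
    amount, unit = match.groups()
    return int(amount) * _NANOSECONDS[unit]
```

Since there is a single unit and no fraction, the mapping to an integer is exact.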

@arp242
Contributor

arp242 commented May 21, 2022

Re: @abelbraaksma; it's indeed not hard to parse 1h10m, it just seems to me that the alternative is simpler.

At any rate, I just looked at what seems to work well for Varnish; that was the only config file format I could think of with native duration types (might be worth looking at what other formats are out there, I can't recall any off the top of my head).

I'm not opposed to the 1h10m format. If we do go with that then fractions should be forbidden IMHO, as I find mixing decimal fractions and base-24/base-60 confusing (e.g. 1.5m and 1m30s would be the same), with the possible exception of seconds.


Re: @Falkon1313: I'm not sure how common those scenarios really are for TOML; what a duration is useful for is mostly things like timeouts, cache durations, how often to run some background jobs, things like that.

Things like "send report every quarter" or "send invoice 1st of every month" can't easily be expressed in a time duration; the first issue is that many standard libraries use an integer or some variant thereof, so the only way this can work is if TOML implementations provide a custom "duration" type which keeps a record of what the TOML file actually has, and which won't integrate all that well in most stdlibs. Personally, I'd really like to avoid that: TOML should be easily parsed to the native types of most common languages.

The second issue is what does "3 months" really mean? 3 months from when the application starts? 3 months from now? 3 months from Jan 1st? For something like cache-duration = 1week or connect-timeout = 10s this doesn't matter, but for "send reports every 3 months" or "send invoice every month" it does. Personally I'd never encode this kind of thing as a duration, but rather as send-invoice-day = [1..31] and send-report = "[daily/weekly/monthly/quarterly]".

A small ambiguity also exists due to leap seconds and leap days, but for many (not all) use cases these can essentially be ignored.

@septatrix

Usually the relevant time scale is pretty well known and does not differ by more than an order of magnitude, which is why I do not think this feature is too critical. In most cases a well-chosen field name is sufficient, like TimeoutSec = 5 as systemd does... There are situations where the duration might span a wider range, e.g. BackupInterval could be anything from 12 hours to every 4 weeks, but at least in this case it is probably better to use a crontab-like syntax anyhow. Sure, it would be nifty to have this at hand sometimes, but most situations could be solved with a well-chosen field name.


Generally this seems like a very niche feature while resulting in more complex implementations.
A good chunk of languages have no native duration support making supporting this even more annoying.
Short letter abbreviations can result in unexpected type conversions which is already very annoying in YAML and I hope TOML will avoid this.
There are many interpretation issues, like whether a year is 365 or 365.25 days long, and restricting the allowed units only makes this feature even more niche.

@abelbraaksma
Contributor

abelbraaksma commented May 24, 2022

I'm not opposed to the 1h10m format. If we do go with that then fractions should be forbidden IMHO,

@arp242 I agree, that's why I suggested to use integers for h and m and floats for s.

With respect to the side-discussion on allowing months and years, if (big if?) we go that route, just do what NodaTime and other libraries do and don't mix year-month durations with day-time durations. Durations are irrespective of a timezone or a starting date/time. Hence a minute is 60 seconds, an hour is 60 minutes. But a month has undefined length (it must be irrespective of starting date/time), so a year is 12 months, but what a month is, we don't define.

If you have any date or date-time value, you can add a year-month duration to it and a day-time duration. You can also add a year-month-day-time duration, but only by adding year and month first and then adding day and time.

That way it is an unambiguous definition.
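To make the ordering rule concrete, here is a minimal Python sketch. The `add_months` helper is hypothetical (the standard library has no month arithmetic), and clamping to the last day of the target month is an assumption borrowed from common date libraries, not something proposed in this thread.

```python
import calendar
from datetime import date, timedelta


def add_months(d: date, months: int) -> date:
    """Hypothetical helper: add calendar months, clamping the day to the month's end."""
    total = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return d.replace(year=year, month=month, day=min(d.day, last_day))


start = date(2023, 1, 30)

# Year-month part first, then the day-time part (the order described above).
month_first = add_months(start, 1) + timedelta(days=1)
# The reverse order gives a different answer, which is why the order must be fixed.
day_first = add_months(start + timedelta(days=1), 1)

print(month_first)  # 2023-03-01
print(day_first)    # 2023-02-28
```

The two results differ by a day, so a combined year-month-day-time duration is only well defined once the order of application is pinned down.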

@eksortso
Contributor

I'm not opposed to the 1h10m format. If we do go with that then fractions should be forbidden IMHO,

@arp242 I agree, that's why I suggested to use integers for h and m and floats for s.

@abelbraaksma I'd recommend using the same precision that we define for time types.

From v1.0.0:

Millisecond precision is required. Further precision of fractional seconds is implementation-specific. If the value contains greater precision than the implementation can support, the additional precision must be truncated, not rounded.
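As a rough illustration of "truncated, not rounded", the sketch below works on the decimal text rather than a float so that no rounding can sneak in. The function name is made up for this example, and it assumes a well-formed, non-negative decimal seconds string.

```python
def truncate_to_millis(seconds_text: str) -> int:
    """Return whole milliseconds for a decimal seconds string, dropping extra digits.

    Assumes a well-formed, non-negative value such as "1.2349" or "30".
    """
    whole, _, frac = seconds_text.partition(".")
    # Keep at most three fractional digits; pad short fractions with zeros.
    frac_ms = (frac + "000")[:3]
    return int(whole) * 1000 + int(frac_ms)


print(truncate_to_millis("1.2349"))  # 1234 (rounding would have given 1235)
print(truncate_to_millis("1.2"))     # 1200
print(truncate_to_millis("30"))      # 30000
```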

@abelbraaksma
Contributor

abelbraaksma commented May 25, 2022

@eksortso, you're absolutely right, my main point was to have integers for hours and mins; secs should be the same as for time, of course.

@eksortso
Contributor

@abelbraaksma Well, taking a hint from the current spec, we could use a similar approach for hours, minutes, and seconds. Values falling within well-defined boundaries will be accepted as is. And if the time values fall out of bounds, are fractional float values, or are specified out of order, then the parsing behavior would be implementation-specific. Someone will want to use 0.5h instead of 30m. It's inevitable. But if they do that, then they must know it won't be standard behavior. It will be defined by the parser and the language, not by us.

But, any potential reliance on implementation-specific behaviors does beg the question posed by @pradyunsg of whether durations ought to be standardized in TOML at all. Do we want to bear the burden of defining time delta standards that all parsers must adhere to? We got away with that for dates and times. But we'd have to impose TOML-specific duration standards that are not as clear-cut as what exists for datetimes.

@abelbraaksma
Contributor

@eksortso, that might be a viable approach. Also, I totally understand the reluctance of implementing this in the first place. I don't really have a strong opinion on that. I do like strong, useful types in TOML, but at the same time, where do you draw the line? Whether this is feature-creep or not is probably anybody's guess. Yet at the same time, it's useful and a relatively small addition. And people are not required to use it (heck, I know many people using TOML without using tables...).

@abcdehc

abcdehc commented May 27, 2022

I agree with it, whichever way it goes! When I use datetime.timedelta in Python, I have to write it like this:
"self.moving_validity = datetime.timedelta(minutes=self.config['param']['moving_validity_m'], seconds=self.config['param']['moving_validity_s'])"
That's so inelegant.

@eksortso
Contributor

@abcdehc I get where you're coming from. In Python it'd be nice to write self.moving_validity = self.config['param']['moving_validity'] and have a timedelta set straightaway. Short of that, the most elegant way would be to break it down a bit first. (Assuming the same config as before.)

mv_units = self.config['param']
m = mv_units['moving_validity_m']
s = mv_units['moving_validity_s']
self.moving_validity = datetime.timedelta(minutes=m, seconds=s)

And even then, Python will normalize all that to days, seconds, and microseconds anyway. Which points to the fact that TOML durations' fundamental nature has not yet been agreed upon, if it ever will be.

The timedelta documentation in Python explicitly says that seconds are stored internally, but not minutes. A TOML parser could naively lean on Python's own implementation, or it could introduce a standardized duration object that would be at odds with how timedelta works. So if we had a duration value assigned in TOML like param.moving_validity = 2m, would we end up with a timedelta of 120 seconds in our program, or a quantity that would preserve 2 exact minutes no matter what? If we added it to an aware datetime like 2016-12-31 23:59:00Z (i.e. 60 seconds before a leap second was added to the UTC timestream) in our program, would we expect to see it become 2017-01-01 00:01:00Z, or would we take the leap second into account and expect 2017-01-01 00:00:59Z instead? If we did it the Python way, we'd get the former datetime, since Python's datetime arithmetic ignores leap seconds; the latter is what is actually 120 elapsed seconds later in UTC.
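For reference, a quick check of what the standard library actually does with this example. Python's datetime module has no notion of leap seconds, so the arithmetic is purely calendar-based and 2 minutes and 120 seconds are the same value.

```python
from datetime import datetime, timedelta, timezone

two_minutes = timedelta(minutes=2)
start = datetime(2016, 12, 31, 23, 59, 0, tzinfo=timezone.utc)
result = start + two_minutes

# Minutes are normalized to seconds internally, so these compare equal.
print(two_minutes == timedelta(seconds=120))  # True
# The leap second at the end of 2016 is not taken into account.
print(result.isoformat())  # 2017-01-01T00:01:00+00:00
```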

But no proposal so far about durations in TOML has defined what units are preserved in implementation. We haven't even discussed normalization. Too much is left to the parser or the language to scrape together. It's not like how we had RFC 3339 and implementations of it to rely on for dates and times.

This is how deep this subject goes. I haven't even looked into how C++ or Golang represent their typical time duration data types or how they interoperate with time types. Is there any sort of agreed-upon standard? Is there an RFC that we could point to, to smooth this whole thing out? What is so minimal about any of these efforts? Complexity underlies the simplest implementations.

So I regret to admit, short of a well-accepted standard (sorry, ISO 8601) or implementation, that we ought to abandon time durations as being not obvious enough for the TOML standard to embrace.

@arp242
Contributor

arp242 commented May 27, 2022

I haven't even looked into how C++ or Golang represent their typical time duration data types or how they interoperate with time types.

In Go time.Duration is just an int64 (type Duration int64), representing a number of nanoseconds.

Personally I think that's not really a show-stopper though, as leap seconds can be ignored for many purposes (it is a show-stopper for supporting at least months though), and in practice many applications (including those written in Go, but probably also Python) already ignore leap seconds with durations since they don't contain a database of when leap seconds occurred. Even time-specific applications don't always fully implement leap seconds "the right way"; for example Google's NTP doesn't apply leap seconds, OpenBSD just pretends they don't exist, etc.

What I'm saying is that defining a "minute" to be "60 seconds" will be fine for practically all use cases, and we don't need to worry about leap seconds at all.
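A minimal sketch of what that looks like in a parser, assuming the 1h10m-style syntax discussed in this thread with integer values only. The function name, the accepted unit set, and the error handling are all assumptions for illustration; none of this is settled syntax.

```python
import re
from datetime import timedelta

# Fixed unit sizes in milliseconds; a minute is always 60 seconds here.
UNIT_MS = {"d": 86_400_000, "h": 3_600_000, "m": 60_000, "s": 1_000, "ms": 1}

# "ms" is listed first so it is matched whole rather than as "m" followed by "s".
_PART = re.compile(r"(\d+)(ms|d|h|m|s)")
_WHOLE = re.compile(r"(?:\d+(?:ms|d|h|m|s))+")


def parse_duration(text: str) -> timedelta:
    """Parse strings like '1h10m' or '-90s' into a timedelta (hypothetical syntax)."""
    sign = -1 if text.startswith("-") else 1
    body = text[1:] if text.startswith("-") else text
    if not _WHOLE.fullmatch(body):
        raise ValueError(f"invalid duration: {text!r}")
    total_ms = sum(int(n) * UNIT_MS[u] for n, u in _PART.findall(body))
    return timedelta(milliseconds=sign * total_ms)


print(parse_duration("1h10m"))  # 1:10:00
print(parse_duration("2m") == timedelta(seconds=120))  # True
```

Fractions such as 0.5h are rejected here, which matches the "integers only" position taken earlier in the thread; a parser that allowed them would need its own rules.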

@eksortso
Contributor

@arp242 A little more comforting! But still no common standard. Seconds are the common standard, and we need millisecond precision guaranteed. If in TOML we fixed minutes to 60 seconds and hours to 3600 seconds, could we confidently assert that common time durations in all languages can handle a sufficiently large number of seconds, positive or negative? And what would that limit be in order to ensure compatibility across platforms?

@arp242
Contributor

arp242 commented May 27, 2022

For numbers TOML already specifies that "Arbitrary 64-bit signed integers should be accepted and handled losslessly"; for int64 we'd be talking about 2.9 million years, or 292 years if we allow nanoseconds. Using int64 nanoseconds probably makes sense.
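A back-of-the-envelope check of the int64 ranges, assuming a 365.25-day year. By this arithmetic nanoseconds give about 292 years, microseconds about 292 thousand years, and milliseconds about 292 million years in either direction.

```python
INT64_MAX = 2**63 - 1
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # assumption: Julian year

# Ticks per second for each candidate resolution.
resolutions = {"ms": 10**3, "us": 10**6, "ns": 10**9}

ranges = {
    unit: INT64_MAX / per_second / SECONDS_PER_YEAR
    for unit, per_second in resolutions.items()
}

for unit, years in ranges.items():
    print(f"int64 {unit}: about {years:,.0f} years either side of zero")
```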

@marzer
Contributor

marzer commented May 27, 2022

@eksortso

I haven't even looked into how C++ [represents] typical time duration types

Using chrono::duration<>. Bunch of C++ template soup that ultimately distills down to a single integer or float, depending on what you want it to represent and what precision you need. Typically you'd use nanoseconds (64-bit integer backing) or milliseconds (integer of at least 45 bits, which is almost always a 64-bit integer in practice).

or how they interoperate with time types.

In one of the newer versions of the standard there's new date/time types, with duration interop, but I have absolutely no idea how it works and it seems confusing as hell, tbh. All I can say is that there is some interop.

@abelbraaksma
Contributor

we don't need to worry about leap seconds at all.

Indeed. But just to emphasize, durations should be agnostic to leap seconds, minutes, years or even Era or calendar. That's why it's important to separate months + year and day + time. The only moment leap seconds or leap years come into play is when a duration is added to a date-time, which itself already has all the information (i.e., adding 1 month to Feb 1 2004 is 1 March 2004; adding 28 days to Feb 1 2004 is 29 Feb 2004, adding it to Feb 1 2005 is 1 March 2005).
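The 28-day case can be checked directly in Python: the duration itself is fixed, and the calendar only matters at the moment it is added to a date.

```python
from datetime import date, timedelta

four_weeks = timedelta(days=28)

leap_year_result = date(2004, 2, 1) + four_weeks
common_year_result = date(2005, 2, 1) + four_weeks

print(leap_year_result)    # 2004-02-29
print(common_year_result)  # 2005-03-01
```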

Luckily, TOML doesn't do calculations, so we don't have to worry about that.

By making durations (which is not the same as timedelta!) agnostic of the current time, we bypass any of these potential issues and only need a very simple datatype.

@eksortso
Contributor

eksortso commented May 31, 2022

durations (which is not the same as timedelta!)

I'm used to the timedelta type in Python, which is about the equivalent of the TOML duration type that we are working to propose. I don't know what you're referring to, @abelbraaksma. But we rejected doing intervals between timestamps a long time ago.

@Felk

Felk commented Jun 1, 2022

I think he just means that if you say "1 month" that it cannot be translated to a fixed duration (say 30.43 days or something), but is only applicable in the context of a calendar (say "February 3rd" + "1 month" = "March 3rd"). And that such calculations are not TOML's responsibility.

@abelbraaksma
Contributor

But we rejected doing intervals between timestamps a long time ago.

@eksortso, I may be wrong. What I meant is that timedelta is a delta between two time instances (i.e., the delta between 23:59 and 0:02) and therefore potentially dependent on leap seconds and years, Era, Calendar and the like.

To me, a duration is agnostic to any time instance (i.e., 3 minutes) and therefore not dependent on or influenced by such notions.

What different programming languages use for duration or timedelta or other (i.e. it could be Interval just the same) ultimately shouldn't matter for TOML, but I do think that the notion "a delta between two times" or a "duration of certain amount of seconds" are two similar, yet distinct notions.

And that such calculations are not TOML's responsibility.

Indeed @Felk, that's what I meant ;).

@eksortso
Contributor

eksortso commented Jun 6, 2022

For numbers TOML already specifies that "Arbitrary 64-bit signed integers should be accepted and handled losslessly";

@arp242 That specifies an expected range for integers. It's not an implementation detail necessarily. That's important to remember because smaller integer ranges might be permitted, against advice, for things like embedded systems. I think we need to keep the specification logic separate from the implementation details that a parser may use.

for int64 we'd be talking about 2.9 million years, or 292 years if we allow nanoseconds. Using int64 nanoseconds probably makes sense.

Again, we're not dictating the implementation details. But your calculations suggest a good expected range for durations. In whichever way a parser may implement a TOML duration, it would guarantee millisecond precision, even though most implementations we've seen allow for an integral number of nanoseconds. How would we state this? 290,000 years in either direction?

I think, though, that we ought to require this one thing of compliant parsers. It should be readily apparent, if not downright obvious, how a duration's value can be added to or subtracted from a timestamp's value, once each value is parsed. For instance, Python's timedelta class lives in the same standard library module as the date/time classes, which permit addition and subtraction of timedeltas from their objects. Surely other libraries have similar connections between their timestamp and duration concepts. I don't know how to codify this requirement, just that it's not a MUST, it's not as strong as a SHOULD, but it's stronger than a MAY, I think, if I may abuse the RFC 2119 terminology.

@tintin10q

tintin10q commented Sep 11, 2023

I do think that the syntax proposed by arp242 is quite elegant:

m[illi]s[econds]   milliseconds
s[econds]          seconds
m[inutes]          minutes
h[ours]            hours
d[ays]             days
w[eeks]            weeks
y[ears]            years

But I think it is a bad idea to go anywhere above days. Leap years are already not nice, and there are many ways to represent years; see, for example, https://altalang.com/beyond-words/6-calendars-around-the-world/. Time in general is just hard.

Yes it is nice to have something like this in python:

release_date = datetime.date.today() + parsed_toml["time-till-release"]

But is that really that much better than:

release_date = datetime.date.today() + datetime.timedelta(days=parsed_toml["days-till-release"])

I don't think it warrants the extra complexity. Similar to the file sizes proposal, the answer is to just put the unit in the name.

We should really try to keep prettier/prettier#40 in mind as well. I know it talks about formatting, but still.

@arp242
Contributor

arp242 commented Sep 23, 2023

One major issue any implementation will run into with durations (but also sizes, or any other suffix) is compatibility. Consider an existing file with:

timeout = 5000  # Timeout, in ms

You upgrade to a new TOML version with durations, and you want to support:

timeout = 5s  # or 500ms, or any other duration.

Great, but ... you don't want everyone to update their config files, so timeout = 5000, with ms implied, should still work.

Turns out this is a bit tricky; in e.g. Python I guess you'll end up with:

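if isinstance(config.timeout, (int, float)):
    # Bare number from an old config file: assume ms
    timeout = datetime.timedelta(milliseconds=config.timeout)
else:
    # Already a datetime.timedelta
    timeout = config.timeout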

But in other, more statically typed languages, parsers will end up having to create their own struct or class or whatever the language has, so you can do:

if config.timeout.DurationUnspecified() {
    // Assume ms
} else {
}

And/or maybe:

var config Config
config.timeout.Default(time.Millisecond)
toml.Decode(..)

But it all pushes some amount of complexity to both the parser and application, at least if you want v = 5 and v = 5s to both work (which you do for existing keys). You typically can't "just" use the stdlib's duration/timedelta type.

Although for new keys it's okay to only support the suffixed variant, you still want to make sure ONLY that variant is allowed. I can foresee subtle confusion with things where people do:

timeout = 5 # which means "5ms" instead of "5s" the user may have expected.

And then the application just does:

app.set_timeout(config.timeout)

And this "duck types" out alright and it "works", except it does something unexpected, which isn't even immediately obvious (low timeout which works fine on your local machine but times out in production ... sounds like a fun time).
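One way an application could guard against this for new keys is to reject bare numbers outright. This is only a sketch: it assumes a hypothetical TOML parser that hands back datetime.timedelta for duration values, and require_duration is a made-up helper name.

```python
from datetime import timedelta


def require_duration(key: str, value: object) -> timedelta:
    """Reject bare numbers so `timeout = 5` fails loudly instead of meaning 5ms."""
    if not isinstance(value, timedelta):
        raise TypeError(
            f"{key} must be a duration with a unit (e.g. 5s), got {value!r}"
        )
    return value


# A value with a unit passes through unchanged.
timeout = require_duration("timeout", timedelta(seconds=5))
print(timeout)  # 0:00:05
```

Calling require_duration("timeout", 5) raises TypeError, so the mistake shows up at config load time rather than as a mysterious timeout in production.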

I suppose type hints and the like will prevent that, and things have been moving in that direction over the last few years, but still...


Long story short, I started prototyping this in my TOML library and writing a concrete proposal, but after encountering these issues I'm less sure if we really want this.

That said, it is commonly implemented in many config files. I did a survey of some common software, based on "what I could think of" and looking at the top 500 packages in https://popcon.debian.org – this is perhaps a bit biased, and some software supports neither format (e.g. ALSA configuration has no use for either durations or sizes).

Overall, I think it's more widely supported than datetimes, which TOML already supports:

Config Time units Fractions Size units Fractions
Apache ms ⹋
Caddy ns, us/µs, ms, s, m, h, d 1.5h 1h30m 1000: kB, mB - 1024: kiB, miB ?
MariaDB For some options; but can't find docs ?
OpenSSH s, m, h, d, w 1h30m ?: K, M, G ?
Postfix s, m, h, d, w no
PostgreSQL us, ms, s, min, h, d no 1024: B, kB, MB, GB, TB (w/ fractions) 1.5M (round to K)
Redis 1000: k, m, g - 1024: kb, mb, gb ?
Varnish ms, s, m, h, d, w, y 1.5h n/a
git 1024: k, m, g no
haproxy us, ms, s, m, h, d no 1024: k, m, g (no fractions) -
nginx ms, s, m, h, d, w, M (30d), y 1h30m ?: k/K, m/M ?
samba
systemd n/a

n/a: Not applicable; there are no settings that could use this unit.
†: Unit is implied.
‡: Unit is in the key name (e.g. podInitialBackoffSeconds, JobTimeoutSec).
⹋: Unit is usually implied, but a few values do allow changing the unit.

In many cases where a unit isn't supported, it would be better if it was. For example (default values):

  • Postfix: lmdb_map_size = 16777216 (16MB)
  • Samba: deadtime = 10080 (1 week)
  • Apache: LimitRequestBody = 1073741824 (1GB)
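The raw numbers do decode to the values in parentheses, assuming binary (1024-based) sizes and assuming minutes as the unit for deadtime, which is what makes 10080 come out as one week:

```python
from datetime import timedelta

lmdb_map_size = 16 * 1024**2                             # 16 MiB in bytes
deadtime = timedelta(weeks=1) // timedelta(minutes=1)    # one week in minutes
limit_request_body = 1024**3                             # 1 GiB in bytes

print(lmdb_map_size)       # 16777216
print(deadtime)            # 10080
print(limit_request_body)  # 1073741824
```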

Some are also inconsistent; e.g. Redis's lfu-decay-time is in minutes, but repl-timeout is in seconds.

Of course, this isn't TOML, but "how many TOML files actually need this?" is a bit harder to answer as it's harder to find projects which support TOML. I suppose I could check package list contents, but I haven't bothered (because that's a bit of work).

@eksortso
Contributor

eksortso commented Oct 4, 2023

Let's put this feature request on hold until after v1.1.0 is released.
