Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xlsx2csv - controlling DATE output format #159

Open
tseemann opened this issue Aug 24, 2021 · 6 comments
Open

xlsx2csv - controlling DATE output format #159

tseemann opened this issue Aug 24, 2021 · 6 comments

Comments

@tseemann
Copy link

tseemann commented Aug 24, 2021

I have used csvtk xlsx2csv a few times.
It ususally outputs dates as YYYY-MM-DD
but yesterday i got one in USA format MM-DD-YY

Does XLSX store the date in a locale independent epoch format?

Ifo so, could you provide option for --date-format '%Y-%m-%d %H:%M' etc please?

The ISO-8601 standard would be the best default choice.

@shenwei356
Copy link
Owner

All data (as string matrix) are exported by xlsx.GetRows(sheetName) [][]string (supported by https://github.com/qax-os/excelize), and I also check the code, there's nothing I can do.

But after I format the cells with different date formats, they are outputted as they are shown in Excel/Libre Office.

test.xlsx

image

$ csvtk xlsx2csv test.xlsx 
data,value
2021-09-06 19:21:21,1
09/06/21,2

Looks like every cell stores its format and locale information, and excelize parses and shows it. You might need to format the original .xlsx file.

@shenwei356
Copy link
Owner

Or add a new command for formatting time and date. Gosh...

@tseemann
Copy link
Author

Excel is very annoying! I assumed it would store a "format independent" version of the date in the XML as well as the formatted one. Thank you for checking this.

This is also related to the DATE bug in filter2 you recently patched. We need xlsx2csv to produce ISO-8601 date format so filter2 can work :-)

shenwei356 added a commit that referenced this issue Aug 25, 2021
@shenwei356
Copy link
Owner

Added a new command fmtdate.

Example:

image

$ csvtk xlsx2csv date.xlsx  | csvtk pretty 
data                  value
-------------------   -----
2021-08-25 11:24:21   1
08/25/21 11:24 p8     2
NA                    3
                      4

$ csvtk xlsx2csv date.xlsx  \
    | csvtk fmtdate --format "YYYY-MM-DD hh:mm:ss" \
    | csvtk pretty 
data                  value
-------------------   -----
2021-08-25 11:24:21   1
2021-08-25 11:24:00   2
                      3
                      4

$ csvtk xlsx2csv date.xlsx  \
    | csvtk fmtdate --format "YYYY-MM-DD hh:mm:ss" -k \
    | csvtk pretty 
data                  value
-------------------   -----
2021-08-25 11:24:21   1
2021-08-25 11:24:00   2
NA                    3
                      4

Usage:

$ csvtk fmtdate -h
format date of selected fields

Date parsing is supported by: https://github.com/araddon/dateparse
Date formating is supported by: https://github.com/metakeule/fmtdate

Time zones: 
    format: Asia/Shanghai
    whole list: https://en.wikipedia.org/wiki/List_of_tz_database_time_zones

Output format is in MS Excel (TM) syntax.
Placeholders:

    M    - month (1)
    MM   - month (01)
    MMM  - month (Jan)
    MMMM - month (January)
    D    - day (2)
    DD   - day (02)
    DDD  - day (Mon)
    DDDD - day (Monday)
    YY   - year (06)
    YYYY - year (2006)
    hh   - hours (15)
    mm   - minutes (04)
    ss   - seconds (05)

    AM/PM hours: 'h' followed by optional 'mm' and 'ss' followed by 'pm', e.g.

    hpm        - hours (03PM)
    h:mmpm     - hours:minutes (03:04PM)
    h:mm:sspm  - hours:minutes:seconds (03:04:05PM)

    Time zones: a time format followed by 'ZZZZ', 'ZZZ' or 'ZZ', e.g.

    hh:mm:ss ZZZZ (16:05:06 +0100)
    hh:mm:ss ZZZ  (16:05:06 CET)
    hh:mm:ss ZZ   (16:05:06 +01:00)

Usage:
  csvtk fmtdate [flags]

Flags:
  -f, --fields string      select only these fields. e.g -f 1,2 or -f columnA,columnB (default "1")
      --format string      output date format in MS Excel (TM) syntax, type "csvtk fmtdate -h" for details (default "YYYY-MM-DD hh:mm:ss")
  -F, --fuzzy-fields       using fuzzy fields, e.g., -F -f "*name" or -F -f "id123*"
  -h, --help               help for fmtdate
  -k, --keep-unparsed      keep the key as value when no value found for the key
  -z, --time-zone string   timezone aka "Asia/Shanghai" or "America/Los_Angeles" formatted time-zone, type "csvtk fmtdate -h" for details

@tolot27
Copy link

tolot27 commented Aug 27, 2021

Hmm, csvtk mutate2 can calculate with dates, at least partially. But it does not return a date-formatted string. It returns a number. Formatting it with csvtk fmtdate returns an empty column:

echo -e "date\n2021-08-26" | csvtk mutate2 -e '$date - 86400' -n res
date,res
2021-08-26,1629842400.00

echo -e "date\n2021-08-26" | csvtk mutate2 -e '$date - 86400' -n res | csvtk fmtdate -f res --format "YYYY-MM-DD"
date,res
2021-08-26,

@shenwei356
Copy link
Owner

Let's leave this to v0.25.0, after finishing #172

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants